Stanford NER, extracting & visualizing patterns

This is just a quick note while I’m thinking about this. I say ‘visualizing’ patterns, but there are of course many ways of doing that. Here, I’m just going quick’n’dirty into a network.

Say you have the diplomatic correspondence of the Republic of Texas, and you suspect that there might be interesting patterns in the places named over time. You can use the Stanford Named Entity Recognition package to extract locations. Then, using some regular expressions, you can transform that output into a network file. BUT – and this is important – it’s a format that carries some baggage of its own. Anyway, first you’ll want the Correspondence. Over at The Macroscope, we’ve already written about how you can extract the patterns of correspondence between individuals using regex patterns. This doesn’t need the Stanford NER because there is an index to that correspondence, and the regex grabs & parses that information for you.

But there is no such index for locations named. So grab that document, and feed it into the NER as Michelle Moravec instructs on her blog here. In the terminal window, as the classifier classifies Persons, Organizations, and Locations, you’ll spot blank lines between batches of categorized items (edit: there’s a classifier that’ll grab time too; that’d be quite handy to incorporate here – SG). These blanks correspond to the blanks between the letters in the original document. Copy all of the terminal output into a new Notepad++ or Textwrangler document. We’re going to trim away every line that isn’t led by LOCATION:

\n(?!LOCATION).+

and replace with nothing. This will delete everything that doesn’t have the location tag in front (the (?!LOCATION) is a negative lookahead, so only lines beginning with LOCATION survive). Now, let’s mark those blank lines as the start of a new letter. A thread on Stack Overflow suggests this regex to find those blank lines:

^\s*$

where:

^ is the beginning of string anchor
$ is the end of string anchor
\s is the whitespace character class
* is zero-or-more repetition

and we replace with the string new-letter.

Now we want to get all of the locations for a single letter into a single line. Replace ‘LOCATION’ with a comma, grabbing the line break in front of the tag as well (that is, find \nLOCATION). This pulls everything for a letter up onto a single line, so we need to reintroduce line breaks, by replacing ‘new-letter’ with the new line character:

find: (new-letter)
replace: \n\1

I could’ve just replaced new-letter with a new-line, but I wanted to make sure that every new line did in fact start with new-letter. Now find and replace new-letter so that it’s removed. You now have a document with the same number of lines as there were letters in the original correspondence file. Now to turn it into a network file! Add the following information at the start of the file:

DL
n=721
format = nodelist1
labels embedded:
data:

DL will tell a network analysis program that we are dealing with UCINET’s DL format. N equals the number of nodes. Format=nodelist1 says, ‘this is a format where the first item on the line is connected to all the subsequent items on that line’. As a historian or archaeologist, you can see that there’s a big assumption in that format. Is it justified? That’s something to mull over. Gephi only accepts DL in format=edgelist1, that is, binary pairs. If that describes the relationship in your data, there’s a lot of legwork involved in moving from nodelist1 to edgelist1, and I’m not covering that here. Let’s imagine that, on historical grounds, nodelist1 accurately describes the relationship between locations mentioned in letters, that the first location mentioned is probably the place where the letter is being written from, or the most important place, or….

“labels embedded:” tells a network program that the labels themselves are being used as data points, and “data:” indicates that everything afterwards is the data. But how did we know how many nodes there were? You could tally up by hand; you could copy and paste your data (back when each LOCATION was listed) into a spreadsheet and use its COUNT function to find the uniques; I’m lazy and just bang any old number in there, and then save it with a .dl extension. Then I open it using a small program called Keyplayer. This isn’t what the program is for, but it will give you an error message that tells you the correct number of nodes! Put that number into your DL file, and try again. If you’ve got it right, Keyplayer won’t do anything – its silence speaks volumes (you can then run an analysis in Keyplayer; if your DL file is not formatted correctly, no results!).
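If you’d rather script that whole find-and-replace dance (and skip the Keyplayer guessing game), here’s a minimal Python sketch of the same workflow. The filenames are placeholders, and it assumes the NER output has one tagged entity per line, with blank lines between the letters:

import re

# raw terminal output copied from the NER run (placeholder filename)
with open('ner-output.txt') as f:
    text = f.read()

# blank lines in the output mark the breaks between the letters
letters = re.split(r'\n\s*\n', text)

rows = []
for letter in letters:
    # keep only the lines led by LOCATION, and strip the tag itself
    locs = [re.sub(r'^LOCATION[:\s]*', '', line).strip()
            for line in letter.splitlines() if line.startswith('LOCATION')]
    if locs:
        rows.append(locs)

# every distinct place name is a node; len(nodes) is the number for n=
nodes = set(place for row in rows for place in row)

# write a DL nodelist1 file; underscores stand in for spaces so each label stays one token
with open('texas-locations.dl', 'w') as f:
    f.write('DL\nn=%d\nformat = nodelist1\nlabels embedded:\ndata:\n' % len(nodes))
    for row in rows:
        f.write(' '.join(place.replace(' ', '_') for place in row) + '\n')

Even if you stick with the hand-rolled file, print(len(nodes)) gives you the number to paste in after n=.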

You now have a DL file that you can analyze in Pajek or UCINET. If you want to visualize in Gephi, you have to get it into a DL format that Gephi can use (edgelist) or else into .net format. Open your DL file in Pajek, and then save as Pajek format (which is .net). Then open in Gephi. (Alternatively, going back a step, you can open in Keyplayer, and then within Keyplayer, hit the ‘visualize in Pajek’ button, and you’ll automatically get that transformation). (edit: if you’re on a Mac, you have to run Pajek or Ucinet with something like Winebottler. Forgot to mention that).

Ta da!

Locations mentioned in letters of the Republic of Texas

-ing history!

Still playing with videogrep. I downloaded 25 heritage minute commercials (non-Canadians: a series of 1 minute or so clips that teach us Canucks about the morally uplifting things we’ve done in the past, things we’ve invented, bad-things-we-did-but-we’ve-patched-over-now. You get the gist.). I ran them through various pattern matches based on parts-of-speech tagging. It was hard to do anything more than that because the closed captioning (on which this all rests) was simply awful. Anyway, there’s a healthy dose of serendipity in all of this, as even after the search is done, the exact sequence the clips are reassembled in is more or less random.

And with that, I give you the result of my pattern matching for gerunds:

-ing history! A Heritage Minute Auto-Supercut.

Historical Maps into Minecraft

Dow’s Lake area, settlement by 1847. Map source: Bruce Elliott, Nepean, The City Beyond, page 23, posted on http://www.bytown.net/dowslake.htm

The folks over at the New York Public Library published an excellent & comprehensive tutorial for digitizing historical maps, and then importing them into Minecraft.

First: thank you!

Unfortunately, for me, it’s not working. I document here what I’ve been doing and ideally someone far more clever than me will figure out what needs to happen…

The first parts of the tutorial – working with QGIS & Inkscape – go very well (although there might be a problem with colours, but more on that anon). Let’s look at the python script for combining the elevation map (generated from QGIS) with the blocks map (generated from Inkscape). Oh, you also need to install imagemagick, which you then run from the command line, to convert SVG to TIF.
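For what it’s worth, the ImageMagick step is a one-liner at the command prompt, something along these lines (the filenames and the density setting are placeholders for whatever you exported from Inkscape):

convert -density 90 hogs-features.svg hogs-features.tif

How faithfully the SVG gets rendered depends on the SVG support your particular ImageMagick build has, which may be part of the Windows trouble I describe further down.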

“The script for generating the worlds uses PIL to load the TIFF bitmaps into memory, and pymclevel to generate a Minecraft worlds, one block at a time. It’s run successfully on both Mac OS X and Linux.”

After digitizing, looks like this.

I’ve tried both Mac and Linux, with python installed, and PIL, and pymclevel. No joy (for the same reasons as for Windows, detailed below). Like most things computational, there are dependencies that we only uncover quite by accident…

Anyway, when you’ve got python installed on Windows, you can just type the python file name at the command prompt and you’re off. So I download pymclevel, unzip it, open a command prompt in that folder (shift + right click, ‘open command prompt here’), and type ‘setup.py’. Error message. Turns out, I need setuptools. Which I obtain from:

https://pypi.python.org/pypi/setuptools#windows-7-or-graphical-install

Download, install. Works. Ok, back to the pymclevel folder, setup.py, and new error message. Looks like I need something called ‘cython’.

http://cython.org/#download

I download, unzip, go to that folder, setup.py. Problem. Some file called ‘vcvarsall.bat’ is needed. Solution? Turns out I need to download Microsoft Visual Studio 10. Then, I needed to create an environment variable called ‘vs90comntools’, which I did by typing this at the command prompt:

set VS90COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\Tools\

Wunderbar. I go back to the pymclevel folder, I run setup.py again, and hooray! It installs. I had PIL installed from a previous foray into things pythonesque, so at least I didn’t have to fight with that again.

I copy the generate_map.py script into notepad++, change the file names within it (so that it finds my own elevation.tif and features.tif files, which are called hogs-elevation.tif and hogs-features.tif; the area I’m looking at is the Hogsback Falls section of the Rideau. In the script, just change ‘fort-washington’ to ‘hogs’ or whatever your files are called). In my folder, at the command prompt, I type generate_map.py and get a whole bunch of error messages: various ‘yaml’ files can’t be found.

Did I mention PyYaml has to be installed? Fortunately, it has a windows installer.  Oh, and by the way – PyWin is also needed; I got that error message at one point (something obscure about win32api), and downloading/installing from here solved it: http://sourceforge.net/projects/pywin32/files/pywin32/

Ok, so where were we? Right, missing yaml files, like ‘minecraft.yaml’ and ‘classic.yaml’, and ‘indev.yaml’ and ‘pocket.yaml’. These files were there in the original repository, but for whatever reason, they didn’t install into the pymclevel that now lives in the Python directory. So I went to the pymclevel repo on github, copied-and-pasted the code into new documents in notepad++, and saved them thus:

c:\Python27\Lib\site-packages\pymclevel-0.1-py2.7-win32.egg\pymclevel\minecraft.yaml

Phew. Back to where I was working on my maps, where I have my generate_map.py, which I duly run and…. error: can’t find ‘tree import Tree, treeObjs’. Googling around to solve this is a fool’s errand: ‘tree’ is such a common word and concept in programming that I just can’t figure out what’s going on here. So I turned that line off with a # in the code. Run it again…. and it seems to work (but is this the key glitch that kills all that follows?).

(update: as Jonathan Goodwin points out, ‘tree.py’ is there, in the NYPL repo

…so I uncommented the line in generate_map.py, saved tree.py in the same directory, and ran the script again. Everything that follows still happens. So perhaps there’s something screwed-up with my map itself.)

The script tells me I need to tell it whether I’m creating a creative mode map or a survival mode:

so for creative mode: c:>generate_map.py map

for survival: c:>generate_map.py game

And it chugs along. All is good with the world. Then: error message. KeyError: 255 in line 241, block_id, block_data, depth = block_id_lookup[block_id]. This is the bit of code that tells the script how to map Minecraft blocks to the colour scheme I used in Inkscape to paint the information from the map into my features.tif. Thing is, I never used an RGB R value of 255. Where’s it getting this from? I go back over my drawing, inspecting each element, trying to figure it out. All seems good with the drawing. So I just add this line to the code in the table:

block_id_lookup = {

[..existing code...]

255 : (m.Water.ID, 0, 1),

}

And run it again. Now it’s 254. And then 253. Then 249. 246. 244. 241. Now 238.

At which point, I say piss on this, and I provide you with my features tif and elevation tif and if you can please tell me what I’m doing wrong, I’d be ever so appreciative (and here’s the svg with the drawing layers, for good measure).

….when I first saw the tutorial from the NYPL, I figured, hey! I could use this with my students! I think not, at least, not yet.

(update 2: have downloaded the original map tifs that the NYPL folks used, and am running the script on them. So far, so good: which shows that, once all this stuff is installed, that it’s my maps that are the problem. This is good to know!)

Part Two:

(updated about 30 minutes after initial post) So after some to-and-fro on Twitter, we’ve got the tree.py problem sorted out. Thinking that the maps are where the problem is, I’ve opened the original Fort Washington features.tif in MS Paint (which is really an underappreciated piece of software). I’ve zoomed in on some of the features, and compared the edges with my own map (similarly opened and zoomed upon). In my map, there are extremely faint colour differentiations/gradations where blocks of colour meet. This, I think, is what has gone wrong. So, back to Inkscape I go…

Update the Third: looks like I made (another) silly error – big strip of white on the left hand side of my features.tif. So I’ve stripped that out. But I can’t seem to suss the pixel antialiasing issue. Grrrrr! Am now adding all of the pixels into the dictionary, thus:

block_id_lookup = {
0 : (m.Grass.ID, None, 2),
10 : (m.Dirt.ID, 1, 1), # blockData 1 == grass can’t spread
11 : (m.Dirt.ID, 1, 1), # blockData 1 == grass can’t spread
12 : (m.Dirt.ID, 1, 1), # blockData 1 == grass can’t spread
14 : (m.Dirt.ID, 1, 1), # blockData 1 == grass can’t spread
16 : (m.Grass.ID, None, 2),
20 : (m.Grass.ID, None, 2),
30 : (m.Cobblestone.ID, None, 1),
40 : (m.StoneBricks.ID, None, 3),
200 : (m.Water.ID, 0, 2), # blockData 0 == normal state of water
210 : (m.WaterActive.ID, 0, 1),
220 : (m.Water.ID, 0, 1),
49 : (m.StoneBricks.ID, None, 3),
43 : (m.StoneBricks.ID, None, 3),
}

…there’s probably a far more elegant way of dealing with this. Rounding? Range lookup? I’m not v. python-able…
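One possible rounding approach, sketched here rather than tested against the NYPL script (it reuses the block_id_lookup and block_id names from the snippet above): snap any pixel value that isn’t in the dictionary to the nearest key that is, so the antialiased in-between values land on whichever colour they’re closest to.

def lookup_block(block_id, table=block_id_lookup):
    # exact match first; otherwise snap to the nearest key in the table,
    # so stray antialiased values (254, 253, 249...) fall into the closest defined colour
    if block_id in table:
        return table[block_id]
    nearest = min(table, key=lambda k: abs(k - block_id))
    return table[nearest]

# and then, in place of block_id_lookup[block_id]:
# block_id, block_data, depth = lookup_block(block_id)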

Update, 2.20pm: Ok. I can run the script on the Fort Washington maps and end up with a playable map (yay!). But my own maps continue to contain pixels of colours the script doesn’t want to play with. I suppose I could just add 255 lines worth, as above, but that seems silly. The imagemagick command, I’m told, works fine on a mac, but doesn’t seem to achieve anything on my PC. So something to look into (and perhaps try this http://www.graphicsmagick.org/ instead). In the meantime, I’ve opened the Fort Washington map in good ol’ Paint, grabbing snippets of the colours to paste into my own map (also open in Paint). Then, I use Paint’s tools to clean up the colour gradients at the edges on my map. In essence, I trace the outlines.

Then, I save, run the script and…… success!

I have a folder with everything I need (and you can have it, too.) I move it to

C:\Users\[me]\AppData\Roaming\.minecraft\saves and fire up the game:

Rideau River in Minecraft!

Does it actually look like the Hogs’ Back to Dow’s Lake section of the Rideau Canal and the Rideau River? Well, not quite. Some issues with my basic elevation points. But – BUT! – the workflow works! So now to find some better maps and to start again…

Shared Authority & the Return of the Human Curated Web

A few years ago, I wrote a piece on Why Academic Blogging Matters: A structural argument. This was the text for a presentation as part of the SAA in Sacramento that year. In the years since, the web has changed (again). It is no longer enough for us to create strong signals in the noise, trusting in the algorithms to connect us with our desired publics. (That’s the short version. The long version is rather more nuanced and sophisticated, trust me).

The war between the botnets and the SEO specialists has outstripped us.

In recent months, I have noticed an upsurge of new ‘followers’ on this blog with emails and handles that really do not seem to be those of actual humans. Similarly, on Twitter, I find odd tweets directed at me filled with gibberish web addresses (which I dare not touch). Digital Humanities Now highlighted an interesting post in recent days that explains what’s going on, discusses this ‘war’, and, in the very way this post came to my attention, points the way forward for the humanistic use of the web.

In ‘Crowd-Frauding: Why the Internet is Fake’, Eric Hellman discusses a new avenue for power (assuming that power ‘derives from the ability to get people to act together’). In this case, it’s ‘cooperative traffic generation’, or software-organized crime. Hellman was finding a surge of fake users on his site, and he began to investigate why this was. Turns out, if you want to promote your website and jack up its traffic, you can install a program that manufactures fake visitors to your sites, who click around, click on adverts, register… and your computer in turn does this for other users of the software. Money is involved.

“In short, your computer has become part of a botnet. You get paid for your participation with web traffic. What you thought was something innocuous to increase your Alexa- ranking has turned you into a foot-soldier in a software-organized crime syndicate. If you forgot to run it in a sandbox, you might be running other programs as well. And who knows what else.

The thing that makes cooperative traffic generation so difficult to detect is that the advertising is really being advertised. The only problem for advertisers is that they’re paying to be advertised to robots, and robots do everything except buy stuff. The internet ad networks work hard to battle this sort of click fraud, but they have incentives to do a middling job of it. Ad networks get a cut of those ad dollars, after all.

The crowd wants to make money and organizes via the internet to shake down the merchants who think they’re sponsoring content. Turns out, content isn’t king, content is cattle.”

Hellman goes on to describe how the arms race, the red queen effect, between these botnets and advertising models that depend on clickrates etc will push those of us without the computing resources to fight in these battles into the arms of the Googles, the Amazons, the Facebooks: and their power will increase correspondingly.

“So with the crowd-frauders attacking advertising, the small advertiser will shy away from most publishers except for the least evil ones- Google or maybe Facebook. Ad networks will become less and less efficient because of the expense of dealing with click-fraud. The rest of the internet will become fake as collateral damage. Do you think you know how many users you have? Think again, because half of them are already robots, soon it will be 90%. Do you think you know how much visitors you have? Sorry, 60% of it is already robots.”

I sometimes try explaining around the department here that when we use the internet, we’re not using a tool, we’re sharing authority with countless engineers, companies, criminals, folks-in-their-parents-basement, ordinary folks, students, algorithms whose interactions with other algorithms can lead to rather unintended outcomes. We can’t naively rely on the goodwill of the search engine to help us get our stuff out there. This I think is an opportunity for a return of the human curated web. No, I don’t mean building directories and indices. I mean, a kind of supervised learning algorithm (as it were).

Digital Humanities Now provides one such model (and there are of course others, such as Reddit, etc). A combination of algorithm and human editorial oversight, DHNow is a cybernetic attempt to bring to the surface the best in the week’s digital humanities work, wherever on the net it may reside. We should have the same in archaeology. An Archaeology Now! The infrastructure is already there. PressForward, the outfit from the RRCHNM, has developed a workflow for folding volunteer editors into the weekly task of separating the wheat from the chaff, using a custom-built plugin for WordPress. Ages ago we talked about a quarterly journal where people would nominate their own posts and we would spider the web looking for these nominations, but the technology wasn’t really there at that time (and perhaps the idea was too soon). With the example of DHNow, and the emergence of this new front in botnets/SEO/clickfraud and the dangers that this poses, perhaps it’s time to revisit the idea of the human-computer curated archaeoweb?

Hollis Peirce, George Garth Graham Research Fellow

Hollis Peirce on Twitter: https://twitter.com/HollPeirce

I am pleased to announce that the first George Garth Graham Undergraduate Digital History Research Fellow will be Mr. Hollis Peirce.

Hollis is a remarkable fellow. He attended the Digital Humanities Summer Institute at the University of Victoria in the summer of 2012. At DHSI he successfully completed a course called “Digitization Fundamentals and Their Application”. In the fall semester of 2012 he was the impetus behind, and helped to organize,  THATCamp Accessibility on the subject of the impact of digital history on accessibility in every sense of the word.

Hollis writes,

Life for me has been riddled with challenges.  The majority of them coming on account of the fact that I, Hollis Peirce, am living life as a disabled individual with Congenital Muscular Dystrophy as many things are not accessible to me.  However, I have never let this fact hold me back from accomplishing my goals.  Because of this, when I first started studying history I knew I was not choosing an easy subject for a disabled individual such as myself.  All those old, heavy, books on high library shelves that history is known for made it one of the most inaccessible subjects possible to study.  All that changed however, when I discovered digital history.

It was thanks to a new mandatory class for history majors at Carleton University called The Historian’s Craft taught by a professor named Dr Shawn Graham.  This course was aimed at teaching students all about how to become a historian, and how a historian is evolving through technology.  At that moment the idea for ‘Accessibility & Digital History’ came to mind.  From that point on many steps have been taken to advance my studies in this field, which has led to being selected as the first George Garth Graham Undergraduate Digital History Research Fellow.

Hollis and I have had our first meeting, about what his project might entail. When I initially cooked this idea up, I thought it would allow students the opportunity to work on my projects, or those of my colleagues around the university. As we chatted about Hollis’ ideas (and I batted around some of my own), I realized that I had the directionality of this relationship completely backwards.

It’s not that Hollis gets to work on my projects. It’s that I get to work on his.

Here’s what we came up with.

At THATCamp Accessibility, we recorded every session. We bounced around the idea of transcribing those sessions, but realized that that was not really feasible for us. We started talking about zeroing in on certain segments, to tell a history of the future of an accessible digital humanities… and ideas started to fizz. I showed Hollis some of Jentery Sayers’ stuff, especially his work with Scalar.

Jentery writes,

the platform particularly facilitates work with visual materials and dynamic media (such as video and audio)….it enables writers to assemble content from multiple sources and juxtapose them with their own compositions.

Can we use Scalar to tell a story of THATCamp Accessibility that captures the spontaneity, creativity, and excitement of that day in a way that highlights the issues of accessibility that Hollis wants to explore? And if we can, how can we make it accessible for others (screen readers, text-to-speech, etc.)? And if we focus on telling history with an eye to accessibility (oh, how our metaphors privilege certain senses, ways of knowing!) maybe there will be lessons for telling history, full stop?

Stay tuned! Hollis is setting up his blog this week, but he’ll be posting over at http://hollispeirce.grahamresearchfellow.org/

Historian’s Macroscope – how we’re organizing things

‘One of the sideshows was wrestling’ from National Library of Scotland on Flickr Commons; found by running this post through http://serendipomatic.org

How do you coordinate something as massive as a book project, between three authors across two countries?

Writing is a bit like sausage making. I write this, thinking of Otto von Bismarck, but Wikipedia tells me:

  • Laws, like sausages, cease to inspire respect in proportion as we know how they are made.
    • As quoted in University Chronicle. University of Michigan (27 March 1869) books.google.de, Daily Cleveland Herald (29 March 1869), McKean Miner (22 April 1869), and “Quote… Misquote” by Fred R. Shapiro in The New York Times (21 July 2008); similar remarks have long been attributed to Otto von Bismarck, but this is the earliest known quote regarding laws and sausages, and according to Shapiro’s research, such remarks only began to be attributed to Bismarck in the 1930s.

I was thinking just about the messiness rather than inspiring respect; but we think there is a lot to gain when we reveal the messiness of writing. Nevertheless, there are some messy first-first-first drafts that really ought not to see the light of day. We want to do a bit of writing ‘behind the curtain’, before we make the bits and pieces visible on our Commentpress site, themacroscope.org.  We are all fans of Scrivener, too, for the way it allows the bits and pieces to be moved around, annotated, rejected, resurrected and so on. Two of us are Windows folks, the other is on a Mac. We initially tried using Scrivener and Github, as a way of managing version control over time and to provide access to the latest version simultaneously. This worked fine, for about three days, until I detached the head.

Who knew that decapitation was possible? Then, we started getting weird line breaks and dropped index cards happening. So we switched tack and moved our project into a shared Dropbox folder. We know that with Dropbox we absolutely can’t have more than one of us be in the project at the same time. We started emailing each other to say, ‘hey, I’m in the project….now. It’s 2.05 pm’ but that got very messy. We installed yshout and set it up to log our chats. Now, we can just check to see who’s in, and leave quick memos about what we were up to.

Once we’ve got a bit of the mess cleaned up, we’ll push bits and pieces to our Commentpress site for comments. Then, we’ll incorporate that feedback back in our Scrivener, and perhaps re-push it out for further thoughts.

One promising avenue that we are not going down, at least for now, is to use Draft.  Draft has many attractive features, such as multiple authors, side-by-side comparisons, and automatic pushing to places such as WordPress. It even does footnotes! I’m cooking up an assignment for one of my classes that will require students to collaboratively write something, using Draft. More on that some other day.

Historical Friction

edit June 6 – following on from collaboration with Stu Eve, we’ve got a version of this at http://graeworks.net/historicalfriction/

I want to develop an app that makes it difficult to move through the historically ‘thick’ places – think Zombies, Run!, but with a lot of noise when you are in a place that is historically dense with information. I want to ‘visualize’ history, but not bother with the usual ‘augmented reality’ malarkey where we hold up a screen in front of our face. I want to hear the thickness, the discords, of history. I want to be arrested by the noise, and to stop still in my tracks, be forced to take my headphones off, and to really pay attention to my surroundings.

So here’s how that might work.

1. Find wikipedia articles about the place where you’re at. Happily, inkdroid.org has some code that does that, called ‘Ici’. Here’s the output from that for my office (on the Carleton campus):

http://inkdroid.org/ici/#lat=45.382&lon=-75.6984

2. I copied that page (so not the full wikipedia articles, just the opening bits displayed by Ici). Convert these wikipedia snippets into numbers. Let A=1, B=2, and so on (a short script sketching this conversion follows these steps). This site will do that:

http://rumkin.com/tools/cipher/numbers.php

3. Replace dashes with commas. Convert those numbers into music. Musical Algorithms is your friend for that. I used the default settings, though I sped it up to 220 beats per minute. Listen for yourself here. There are a lot of wikipedia articles about the places around here; presumably if I did this on, say, my home village, the resulting music would be much less complex: sparse, quiet, slow. So if we increased the granularity, you’d start to get an acoustic soundscape of quiet/loud, pleasant/harsh sounds as you moved through space – a cost surface, a slope. Would it push you from the noisy areas to the quiet? Would you discover places you hadn’t known about? Would the quiet places begin to fill up as people discovered them?
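If you’d rather skip the website, that letters-to-numbers step is only a couple of lines of Python. A rough sketch (comma-separated output, since that’s what the next step wants):

def text_to_numbers(snippet):
    # A=1, B=2 ... Z=26; everything else (spaces, punctuation) is simply dropped
    return ','.join(str(ord(c) - ord('a') + 1)
                    for c in snippet.lower() if 'a' <= c <= 'z')

print(text_to_numbers('Carleton University'))
# 3,1,18,12,5,20,15,14,21,14,9,22,5,18,19,9,20,25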

Right now, each wikipedia article is played in succession. What I really need to do is feed the entirety of each article through the musical algorithm, and play them all at once. And I need a way to do all this automatically, and feed it to my smartphone. Maybe by building upon this tutorial from MIT’s App Inventor. Perhaps there’s someone out there who’d enjoy the challenge?

I mooted all this at the NCPH THATCamp last week – which prompted a great discussion about haptics, other ways of engaging the senses, for communicating public history. I hope to play at this over the summer, but it’s looking to be a very long summer of writing new courses, applying for tenure, y’know, stuff like that.

Edit April 26th – Stuart and I have been playing around with this idea this morning, and have been making some headway per his idea in the comments. Here’s a quick screengrab of it in action: http://www.screencast.com/t/DyN91yZ0

Practical Necromancy talk @Scholarslab – part I

Below is a draft of the first part of my talk for Scholarslab this week, at the University of Virginia. It needs to be whittled down, but I thought that those of you who can’t drop by on Thursday might enjoy this sneak peek.

Thursday, March 21 at 2:00pm
in Scholars’ Lab, 4th floor Alderman Library.

When I go to parties, people will ask me, ‘what do you do?’. I’ll say, I’m in the history department at Carleton. If they don’t walk away, sometimes they’ll follow that up with, ‘I love history! I always wanted to be an archaeologist!’, to which I’ll say, ‘So did I!’

My background is in Roman archaeology. Somewhere along the line, I became a ‘digital humanist’, so I am honoured to be here to speak with you today, here at the epicentre, where the digital humanities movement all began.

If the digital humanities were a zombie flick, somewhere in this room would be patient zero.

Somewhere along the line, I became interested in the fossilized traces of social networks that I could find in the archaeology. I became deeply interested – I’m still interested – in exploring those networks with social network analysis. But I became disenchanted with the whole affair, because all I could develop were static snapshots of the networks at different times. I couldn’t fill in the gaps. Worse, I couldn’t really explore what flowed over those networks, or how those networks intersected with broader social & physical environments.

It was this problem that got me interested in agent based modeling. At the time, I had just won a postdoc in Roman Archaeology at the University of Manitoba with Lea Stirling. When pressed about what I was actually doing, I would glibly respond, ‘Oh, just a bit of practical necromancy, raising the dead, you know how it is’. Lea would just laugh, and once said to me, ‘I have no idea what it is you’re doing, but it seems cool, so let’s see what happens next!’

How amazing to meet someone with the confidence to dance out on a limb like that!

But there was truth in that glib response. It really is a form of practical necromancy, and the connections with actual necromancy and technologies of death are a bit more profound than I first considered.

So today, let me take you through a bit of the deep history of divination, necromancy, and talking with the dead; then we’ll consider modern simulation technologies as a form of divination in the same mold; and then I’ll discuss how we can use this power for good instead of evil, of how it fits into the oft-quoted digital humanities ethos of ‘hacking as a way of knowing’ (which is rather like experimental archaeology, when you think about it), and how I’m able to generate a probabilistic historiography through this technique.

And like all good necromancers, it’s important to test things out on unwilling victims, so I would also like to thank the students of HIST3812 who’ve had all of the ideas road-tested on them earlier this term.

Zombies clearly fill a niche in modern western culture. The president of the University of Toronto recently spoke about ‘zombie ideas’ that despite our best efforts, persist, infect administrators, politicians, and students alike, trying to eat the brains of university education.

Zombies emerge in popular culture in times of angst, fear, and uncertainty. If Hollywood has taught us anything, it’s that Zombies are bad news. Sometimes the zombies are formerly dead humans; sometimes they are humans who have been transformed. Sometimes we deliberately create a zombie. The zombie can be controlled, and made to do useful work; zombie as a kind of slavery. More often, the zombies break loose, or are the result of interfering with things humanity ought not to; apocalypse beckons. But sometimes, like ‘Fido’, a zombie can be useful, can be harnessed, and somehow, be more human than the humans. [Fido]

If you’d like to raise the dead yourself, the answer is always just a click away [ehow].

There are other uses for the restless dead. Before our current fixation with apocalypse, the restless dead could be useful for keeping the world from ending.

In video games, we call this ‘the problem space’ – what is it that a particular simulation or interaction is trying to achieve? For humanity, at a cosmological level, the response to that problem is through necromancy and divination.

I’m generalizing horribly, of course, and the anthropologists in the audience are probably gritting their teeth. Nevertheless, when we look at the deep history and archaeology of many peoples, a lot can be tied to this problem of keeping the world from ending. A solution to the problem was to converse with those who had gone before, those who were currently inhabiting another realm. Shamanism was one such response. The agony of shamanism ties well into subsequent elaborations such as the ball games of Mesoamerica, or other ‘game’ like experiences. The ritualized agony of the athlete was one portal into recreating the cosmogonies and cosmologies of a people, thus keeping the world going.

The bull-leaping game at Knossos is perhaps one example of this, according to some commentators. Some have seen in the plan of the middle Minoan phase of this palace (towards the end of the 2nd millennium BC) a replication in architecture of a broader cosmology, that its very layout reflects the way the Minoans saw the world (this is partly also because this plan seems to replicate in other Minoan centres around the Aegean). Jeffrey Soles, pointing to the architectural play of light and shadow throughout the various levels of Knossos, argues that this maze-like structure was all part of the ecstatic journey, and ties shamanism directly to the agonies of sport & game in this location. We don’t have the Minoans’ own stories, of course, but we do have these frescoes of bull-leaping, and other paraphernalia which tie in nicely with the later dark-age myths of Greece.

So I’m making a connection here between the way a people see the world working, and their games & rituals. I’m arguing that the deep history of games  is a simulation of how the world works.

This carries through to more recent periods as well. Herodotus wrote about the coming of the Etruscans to Italy: “In the reign of Atys son of Menes there was a great scarcity of food in all Lydia. For a while the Lydians bore this with patience; but soon, when the famine continued, they looked for remedies, and various plans were suggested. It was then that they invented the games of dice, knucklebones, and ball, and all the other games of pastime, except for checkers, which the Lydians do not claim to have invented. Then, using their discovery to forget all about the famine, they would play every other day, all day, so that they would not have to eat… This was their way of life for eighteen years. Since the famine still did not end, however, but grew worse, the king at last divided the people into two groups and made them draw lots, so that one should stay and the other leave the country.”

Here I think Herodotus misses the import of the games: not as a pastime, but as a way of trying to control, predict, solve, or otherwise intercede with the divine, to resolve the famine. In later Etruscan and Roman society, gladiatorial games for instance were not about entertainment but rather about cleansing society of disruptive elements, about bringing everything into balance again, hence the elaborate theatre of death that developed.

The specialist never disappears though, the one who has that special connection with the other side and intercedes for broader society as it navigates that original problem space. These were the magicians and priests. But there is an important distinction here. The priest is passive in reading signs, portents, and omens. Religion is revealed, at its proper time and place, through proper observation of the rituals. The magician is active – he (and she) compels the numinous to reveal itself, the spirits are dragged into this realm; it is the magician’s skill and knowledge which causes the future to unfurl before her eye.

The priest was holy, the magician was unholy.

Straddling this divide is the Oracle. The oracle has both elements of revelation and compulsion. Any decent oracle worth its salt would not give a straight-up answer, either, but rather would require layers of revelation and interpretation. At Delphi, the God spoke to the Pythia, the priestess, who sat on the stool over the crack in the earth. When the god spoke, the fumes from below would overcome her, causing her to babble and writhe uncontrollably. Priests would then ‘interpret’ the prophecy, in the form of a riddle.

Why riddles? Riddles are ancient. They appear on cuneiform texts. Even Gollum knew what a true riddle should look like – a kind of lyric poem asking a question that guards the right answer in hints and wordplay.

‘I tremble at each breath of air / And yet can heaviest burdens bear.’ [implicit question being asked is who am I? – water]

Bilbo cheated.

We could not get away from a discussion of riddles in the digital humanities without of course mentioning the I-ching. It’s a collection of texts that, depending on dice throws, get combined and read in particular ways. Because this is essentially a number of yes-or-no answers, the book can be easily coded onto a computer or represented mechanically. In which case, it’s not really a ‘book’ at all, but a machine for producing riddles.

Ruth Wehlau writes, “Riddlers, like poets, imitate God by creating their own cosmos; they recreate through words, making familiar objects into something completely new, rearranging the parts of pieces of things to produce creatures with strange combinations of arms, legs, eyes and mouths. In this transformed world, a distorted mirror of the real world, the riddler is in control, but the reader has the ability to break the code and solve the mystery” (Wehlau 1997).

Riddles & divination are related, and are dangerous. But they also create a simulation, of how the world can come to be, of how it can be controlled.

One can almost see the impetus for necromancy, when living in a world described by riddles. Saul visits the Witch of Endor; Odysseus goes straight to the source.

…and Professor Hix prefers the term ‘post mortem communications’. However you spin it, though, the element of compulsion, of speaking with the dead, marks it out as a transgression; necromancers and those who seek their aid never end well.

It remains true today that those who practice simulation are similarly held in dubious regard. If that were not the case, tongue-in-cheek article titles such as this would not be necessary.

I am making the argument that modern computational simulation, especially in the humanities, is more akin to necromancy than it is to divination, for all of these reasons.

But it’s also the fact that we do our simulation through computation itself that marks this out as a kind of necromancy.

The history of the modern digital computer is tied up with the need to accurately simulate the yields of atomic bombs, of blast zones, and potential fallout, of death and war. Modern technoculture has its roots in the need to accurately model the outcome of nuclear war, an inversion of the age-old problem space, ‘how can we keep the world from ending’, through the doctrines of mutually assured destruction.

The playfulness of those scientists, and the acceleration of hardware technology, led to video games, but that’s a talk for another day (and indeed, has been recently well treated by Rob MacDougall of Western University).

‘But wait! Are you implying that you can simulate humans just as you could individual bits of uranium and atoms, and so on, like the nuclear physicists?’ No, I’m not saying that, but it’s not for nothing that Isaac Asimov gave the world Hari Seldon & the idea of ‘psychohistory’ in the 1950s. As Wikipedia so ably puts it, “Psychohistory is a fictional science in Isaac Asimov’s Foundation universe which combines history, sociology, etc., and mathematical statistics to make general predictions about the future behavior of very large groups of people, such as the Galactic Empire.”

Even if you could do Seldon’s psychohistorical approach, it’s predicated on a population of an entire galaxy. One planetfull, or one empire-full, or one region-full, of people just isn’t enough. Remember, this is a talk on ‘practical’ necromancy, not science-fiction.

Well what about so-called ‘cliodynamics’? Cliodynamics looks for recurring patterns in aggregate statistics of human culture. It may well find such patterns, but it doesn’t really have anything to say about ‘why’ such patterns might emerge. Both psychohistory and cliodynamics are concerned with large aggregates of people. As an archaeologist, all I ever find are the traces of individuals, of individual decisions in the past. It always requires some sort of leap to jump from these individual traces to something larger like ‘the group’ or ‘the state’. A Roman aqueduct is, at base, still the result of many individual actions.

A practical necromancy therefore is a simulation of the individual.

There are many objections to simulation of human beings, rather than things like atoms, nuclear bombs, or the weather. Our simulations can only do what we program them to do. So they are only simulations of how we believe the world works (ah! Cosmology!). In some cases, like weather, our beliefs and reality match quite well, at least for a few days, and we know much about how the variables intersect. But, as complexity theory tells us, starting conditions strongly affect how things transpire. Therefore we forecast from multiple runs with slightly different starting conditions. That’s what a 10% chance of rain really means: We ran the simulation 100 times, and in 10 of them, rain emerged.

And humans are a whole lot more complex than the water cycle. In the case of humans, we don’t know all the variables; we don’t know how free will works; we don’t know how a given individual will react; we don’t understand how individuals and society influence each other. We do have theories though.

This isn’t a bug, it’s a feature. The direction of simulation is misplaced. We cannot really simulate the future, except in extremely circumscribed situations, such as pedestrian flow. So let us not simulate the future, as humanists. Let us create some zombies, and see how they interact. Let our zombies represent individuals in the past. Give these zombies rules for interacting that represent our best beliefs, our best stories, of how some aspect of the past worked. Let them interact. The resulting range of possible outcomes becomes a kind of probabilistic historiography. We end up with not just a story about the past, but also about other possible pasts that could have happened if our initial story we are telling about how individuals in the past acted is true, for a given value of true.

 We create simulacra, zombies, empty husks representing past actors. We give them rules to be interpreted given local conditions. We set them in motion from various starting positions. We watch what emerges, and thus can sweep the entire behavior space, the entire realm of possible outcomes given this understanding. We map what did occur (as best as we understand it) against the predictions of the model. For the archaeologist, for the historian, the strength of agent based modeling is that it allows us to explore the unintended consequences inherent in the stories we tell about the past. This isn’t easy. But it can be done. And compared to actually raising the dead, it is indeed practical.

[and here begins part II, which runs through some of my published ABMS, what they do, why they do it. All of this has to fit within an hour, so I need to do some trimming.]

[Postscriptum, March 23: the image of the book of random digits came from Mark Sample's 'An Account of Randomness in Literary Computing', and was meant to remind me to talk about some of the things Mark brought up. As it happens, I didn't do that when I presented the other day, but you really should go read his post.]

p3d.in for hosting your 3d scans

I’m playing with p3d.in to host some three dimensional models I’ve been making with 123D Catch. These are models that I have been using in conjunction with Junaio to create augmented reality pop-up books (and other things; more on that anon). Putting these 3d objects onto a webpage (or heaven forbid, a pdf) has been, strangely, much more complicated and time-consuming. P3d.in serves a very useful purpose, then!

Below are two models that I made using 123D catch. The first is the end of a log recovered from anaerobic conditions at the bottom of the Ottawa River (which is very, very deep in places). The Ottawa was used as a conduit for floating timber from its enormous watershed to markets in the US and the UK for nearly two hundred years. Millions of logs floated down annually…. so there’s a lot of money sitting down there. A local company, Log’s End, has been recovering these old growth logs and turning them into high-end wide plank flooring. They can’t use the ends of the logs as they are usually quite damaged, so my father picked some up and gave them to me, knowing my interest in all things stamped. This one carries an S within a V, which dates it to the time and timber limits of J.R. Booth I believe.

logend-edit2 (Click to view in 3D)

And here we have one of the models that my students made last year from the Mesoamerican materials conserved at the Canadian Museum of Civilization (soon-to-be-repurposed as the Museum of Canadian History; what will happen to these awkward materials that no longer fit the new mandate?)

mesoamerican (Click to view in 3D)

PS
Incidentally, I’ve now embedded these in a Neatline exhibition I am building:

3d manipulable objects in time and space

Why I Play Games

(originally posted at #HIST3812, my course blog for this term’s History3812: Gaming and Simulations for Historians, at Carleton University).

I play because I enjoy video games, obviously, but I also get something else out of it.  Games are a ‘lively art’; they are an expressive art, and the artistry lies in encoding rules (descriptions) about how the world works at some microlevel: and then watching how this artistry is further expressed in the unintended consequences of those rules, their intersections, their cancellations, causing new phenomena to emerge.

This strikes me as the most profound use of humanities computation out there. Physicists tell us that the world is made of itty bitty things that interact in particular ways. In which case, everything else is emergent: including history. I’m not saying that there are ‘laws’ of human action; but we do live in this universe. So, if I can understand some small part of the way life was lived in the past, I can model that understanding, and explore the unintended outcomes of that understanding… and go back to the beginning and model those.

I grew up with the video game industry. Adventure? I played that. We had a VIC-20. If you wanted to play a game, you had to type it in yourself. There used to be a magazine (Compute!) that would have all of the code printed within, along with screenshots. Snake, Tank Wars – yep. My older brother would type, and I would read the individual letters (and spaces, and characters) out. After about a week, we’d have a game.

And there would be bugs. O lord, there were bugs.

When we could afford games, we’d buy text adventures from Infocom. In high school, my older brother programmed a quiz game as his history project for the year. Gosh, we were cool. But it was! Here we were, making the machine do things.

As the years went on, I stopped programming my own games. Graphics & technology had moved too fast. In college, we used to play Doom (in a darkened room, with the computer wired to the stereo. Beer often figured). We played SimCity. We played the original Civilization.

These are the games that framed my interactions with computers. Then, after I finished my PhD, I returned to programming when I realized that I could use the incredible artificial intelligences, the simulation engines, of modern games, to do research. To enhance my teaching.

I got into Agent Based Modeling, using the Netlogo platform. This turned my career around: I ceased to be a run-of-the-mill materials specialist (Roman archaeology), and became this new thing, a ‘digital humanist’. Turns out, I’m now an expert on simulation and history.

Cool, eh?

And it’s all down to the fact that I’m a crappy player of games. I get more out of opening the hood, looking at how the thing works. Civilization IV and V are incredible simulation engines. So: what kinds of history are appropriate to simulate? What kinds of questions can we ask? That’s what I’m looking forward to exploring with you (and of course, seeing what you come up with in your final projects).

But maybe a more fruitful question to start with, in the context of the final project of this course, is, ‘what is the strangest game you’ve ever played?’

What made it strange? Was it the content, the mechanics, the interface?

I played one once where you had to draw the platform with crayons, and then the physics engine would take over. The point was to try to get a ball to roll up to a star. Draw a teeter-totter under the star, and perhaps the ball would fall on it, shooting the star up to fall down on the ball, for instance. A neat way of interacting with the underlying physics of game engines.

I’d encourage everyone to think differently about what the games might be. For instance, I could imagine a game that shows real-time documents (grabbed from a database), and you have to dive into it, following the connected discourses (procedurally generated using topic models and network graphing software to find these – and if this makes no sense to you, take a quick peek at the Programming Historian) within it to free the voices trapped within…

This is why I play. Because it makes me think differently about the materials I encounter.