Getting Data out of Open Context & Doing Useful Things With It: Part 2

If you recall, at the end of part 1 I said ‘oh, by the way, Open Context lets you download data as csv anyway’. You might have gotten frustrated with me there – why are we bothering with the json, then? The reason is that the full data is exposed via json, and who knows, there might be things in there that you find you need, or that catch your interest, or that need to be explored further. (Note also that Open Context has unique URIs – identifiers – for every piece of data it has; these unique URIs are captured in the json, which can also be useful for you.)

Json is not easy to work with. Fortunately, Matthew Lincoln has written an excellent tutorial on json and jq over at The Programming Historian, which you should go read now. Read the ‘what is json?’ part, at the very least. In essence, json is a text file where keys are paired with values. JQ is a piece of software that enables us to reach into a json file, grab the data we want, and create either new json or a csv. If you intend to visualize and explore data using some sort of spreadsheet program, then you’ll need to extract the data you want into a csv that your spreadsheet can digest. If you want to try something like d3 or some other dynamic library for generating web-based visualizations (e.g. p5js), you’ll need json.

jqplay

JQ lets us do some fun filtering and parsing, but we won’t download and install it yet. Instead, we’ll load some sample data into a web-toy called jqplay. This will let us try different ideas out and see the results immediately. In this file called sample.json I have the query results from Open Context – Github recognizes that it is json and that it has geographic data within it, and turns it automatically into a map! To see the raw json, click on the < > button. Copy that data into the json box at jqplay.org.

JQPlay will colour-code the json. Everything in red is a key, everything in black is a value. Keys can be nested, as represented by the indentation. Scroll down through the json – do you see any interesting key:value pairs? Matthew Lincoln’s tutorial at The Programming Historian is one of the most cogent explanations of how this works, and I do recommend you read that piece. Suffice to say, for now, that if you see an interesting key:value pair that you’d like to extract, you need to figure out just how deeply nested it is. For instance, there is a properties key that seems to have interesting information within it about dates, wares, contexts and so on. Perhaps we’d like to build a query using JQ that extracts that information into a csv. It’s nested within the features key, so try entering the following in the filter box:

.features [ ] | .properties

You should get something like this:

{
  "id": "#geo-disc-tile-12023202222130313322",
  "href": "https://opencontext.org/search/?disc-geotile=12023202222130313322&prop=oc-gen-cat-object&rows=5&q=Poggio",
  "label": "Discovery region (1)",
  "feature-type": "discovery region (facet)",
  "count": 12,
  "early bce/ce": -700,
  "late bce/ce": -535
}
{
  "id": "#geo-disc-tile-12023202222130313323",
  "href": "https://opencontext.org/search/?disc-geotile=12023202222130313323&prop=oc-gen-cat-object&rows=5&q=Poggio",
  "label": "Discovery region (2)",
  "feature-type": "discovery region (facet)",
  "count": 25,
  "early bce/ce": -700,
  "late bce/ce": -535
}

For an explanation of why that works, see Lincoln’s tutorial. I’m going to just jump to the conclusion now. Let’s say we wanted to grab some of those keys within properties, and turn them into a csv. We tell it to look inside features and find properties; then we tell it to make a new array with just those keys within properties we want; and then we tell it to pipe that information into comma-separated values. Try the following on the sample data:

.features [ ] | .properties | [.label, .href, ."context label", ."early bce/ce", ."late bce/ce", ."item category", .snippet] | @csv

…and make sure to tick the ‘raw output’ box at the top right. Ta da! You’ve culled the information of interest from a json file, into a csv. There’s a lot more you can do with jq, but this will get you started.

get jq and run the query from the terminal or command line

Install on OS X – instructions from Lincoln

Install on Windows – instructions from Lincoln

Got JQ installed? Good. Open your terminal or command prompt in the directory where you’ve got your json file with the data you extracted in part 1. Here we go:

jq -r '.features [ ] | .properties | [.label, .href, ."context label", ."early bce/ce", ."late bce/ce", ."item category", .snippet] | @csv' data.json > data.csv

So, we invoke jq, we tell it we want the raw output (-r), we give it the filter to apply, we give it the file to apply it to, and we tell it what to name the output.

one last thing

Take a look at how Lincoln pipes the output of a wget command into jq at the end of the section on ‘invoking jq’. Do you see how we might accelerate this entire process the next time you want data out of Open Context?
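One possible answer, sketched here in Python rather than as a wget-and-jq pipeline (so treat this as a hypothetical alternative, not the ‘official’ route): fetch a page of the Poggio results from part 1 and write the same fields straight out to csv in a single step.

#fetch-and-filter.py
#a sketch: grab one page of Open Context results and pull out the same
#fields as the jq filter above, straight into a csv
import csv
import json
import urllib.request

URL = ('https://opencontext.org/subjects-search/.json'
       '?rows=100&prop=oc-gen-cat-object&q=Poggio')

fields = ['label', 'href', 'context label', 'early bce/ce',
          'late bce/ce', 'item category', 'snippet']

with urllib.request.urlopen(URL) as response:
    data = json.loads(response.read().decode('utf-8'))

with open('data.csv', 'w', newline='', encoding='utf-8') as out:
    writer = csv.writer(out)
    writer.writerow(fields)
    for feature in data.get('features', []):
        props = feature.get('properties', {})
        writer.writerow([props.get(f, '') for f in fields])

Any field missing from a given record simply comes through as an empty cell.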

Now what?

Well, how about you take that csv data and see what stories you can tell with it? A good place to start is with wtfcsv, or Raw, or Plot.ly or, heaven help me, Excel or Numbers. Then, enter our contest maybe?

At the very least, you’ve now learned some powerful skills for working with the tsunami of open data now flooding the web. Happy wading!

Getting Data out of Open Context & Doing Useful Things With It: Part 1

a walkthrough for extracting and manipulating data from opencontext.org

Search for something interesting. I put ‘poggio’ in the search box, and then clicked on the various options to get the architectural fragments. Look at the URL:
https://opencontext.org/subjects-search/?prop=oc-gen-cat-object&q=Poggio#15/43.1526/11.4090/19/any/Google-Satellite
See all that stuff after the word ‘Poggio’? That’s to generate the map view. We don’t need it.

We’re going to ask for the search results w/o all of the website extras, no maps, no shiny interface. To do that, we take advantage of the API. With Open Context, if you have a search with a ‘?’ in the URL, you can put .json in front of the question mark, and delete all of the stuff from the # sign on, like so:

https://opencontext.org/subjects-search/.json?prop=oc-gen-cat-object&q=Poggio

Put that in the address bar. Boom! Lots of stuff! But only one page’s worth, which isn’t a lot of data. To get more, we have to add another parameter for the number of rows: rows=100&. Slot that in just after the ? and before the p in prop= and see what happens.

Now, that isn’t all of the records though. Remove the .json and see what happens when you click on the arrows to page through the NEXT 100 rows. You get a URL like this:

https://opencontext.org/subjects-search/?rows=100&prop=oc-gen-cat-object&start=100&q=Poggio#15/43.1526/11.4090/19/any/Google-Satellite

So – to recap, the URL is searching for 100 rows at a time, in the general object category, starting from row 100, and grabbing materials from Poggio. We now know enough about how Open Context’s API works to grab material.

Couple of ways one could grab it:

  1. You could copy n’ paste -> but that will only get you one page’s worth of data (and if you tried to put, say, 10791 into the ‘rows’ parameter, you’ll just get a time-out error). You’d have to go back to the search page, hit the ‘next’ button, reinsert the .json etc over and over again.
  2. Automatically. We’ll use a program called wget to do this. (To install wget on your machine, see The Programming Historian.) Wget will interact with the Open Context site to retrieve the data. We feed wget a file that contains all of the urls that we wish to grab, and it saves the data from each url into its own file. So, open a new text file and paste our search URLs in there like so:
https://opencontext.org/subjects-search/.json?rows=100&prop=oc-gen-cat-object---oc-gen-cat-arch-element&q=Poggio
https://opencontext.org/subjects-search/.json?rows=100&prop=oc-gen-cat-object---oc-gen-cat-arch-element&start=100&q=Poggio
https://opencontext.org/subjects-search/.json?rows=100&prop=oc-gen-cat-object---oc-gen-cat-arch-element&start=200&q=Poggio

…and so on until we’ve covered the full 4000 objects. Tedious? You bet. So we’ll get the computer to generate those URLs for us. Open a new text file, and copy the following in:

#URL-Generator.py

urls = ''
f = open('urls.txt', 'w')
for x in range(1, 4000, 100):
    # one search url per page of 100 results; %d gets the next starting row
    urls = 'https://opencontext.org/subjects-search/.json?rows=100&prop=oc-gen-cat-object---oc-gen-cat-arch-element&start=%d&q=Poggio/\n' % (x)
    f.write(urls)
f.close()

and save it as url-generator.py. This program is in the Python language. If you’re on a Mac, Python is already installed. If you’re on a Windows machine, you’ll have to download and install it. To run the program, open your terminal (mac) or command prompt (windows) and make sure you’re in the same folder where you saved the program. Then type at the prompt:

python url-generator.py

This little program defines an empty container called ‘urls’; it then creates a new file called ‘urls.txt’; then we tell it to write the address of our search into the urls container. See the %d in there? Each time through the loop, the program substitutes a number between 1 and 4000 for the %d, counting up by 100, so that each new address has the correct starting point. It writes each of those addresses into the file urls.txt. Go ahead, open it up, and you’ll see.

Now we’ll feed it to wget like so. At the prompt in your terminal or command line, type:

wget -i urls.txt -r --no-parent -nd -w 2 --limit-rate=10k
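(An aside: if wget proves troublesome on your machine, here is a rough Python alternative, a sketch only, using just the standard library. It reads urls.txt, pauses between requests, and saves each page straight to a sensibly named .json file, which also lets you skip the renaming step described below.)

#fetch-urls.py
#a sketch of a polite downloader: read urls.txt, pause between requests,
#and save each response as results-0.json, results-1.json, and so on
import time
import urllib.request

with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

for i, url in enumerate(urls):
    with urllib.request.urlopen(url) as response:
        body = response.read()
    with open('results-%d.json' % i, 'wb') as out:
        out.write(body)
    time.sleep(2)  # roughly what -w 2 does for wget: don't hammer the server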

You’ll end up with a lot of files that have no file extension in your folder, eg,

.json?rows=100&prop=oc-gen-cat-object---oc-gen-cat-arch-element&start=61&q=Poggio%2F

Select all of these and rename them in your Finder (instructions) or Windows Explorer (instructions), such that they have a sensible file name and the extension .json. We are now going to concatenate these files into a single, properly formatted, .json file. (Note that it is possible for wget to push all of the downloaded information into a single json file, but it won’t be a properly formatted json file – it’ll just be a bunch of lumps of different json hanging out together, which we don’t want).

We are going to use a piece of software written for NodeJS to concatenate our json files (NodeJS lets us run javascript outside the browser; it’s useful for lots of other things too). Go to the NodeJS download page and download and install it for your machine. (Windows users, make sure you select the npm package manager as well during the install procedure). Once it’s installed, open a terminal or command prompt and type

npm install -g json-concat (mac users, you might need sudo npm install -g json-concat).

This installs the json-concat tool. We’ll now join our files together:

# As simple as this. Output file should be last
$ json-concat file1.json file2.json file3.json file4.json output.json

… for however many json files you have.
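(No Node? A rough Python sketch that does roughly the same job might look like the following; it assumes your renamed files all end in .json and sit in the current folder. It reads each one and writes them out together as a single, properly formatted json array.)

#concat-json.py
#a sketch: gather every .json file in this folder into one properly
#formatted json array called output.json
import glob
import json

combined = []
for path in sorted(glob.glob('*.json')):
    if path == 'output.json':  # don't swallow our own output on a re-run
        continue
    with open(path, encoding='utf-8') as f:
        combined.append(json.load(f))

with open('output.json', 'w', encoding='utf-8') as out:
    json.dump(combined, out, indent=2)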

Congratulations

You have now downloaded data from Open Context as json, and you’ve compiled that data into a single json file. This ability for data to be called and retrieved programmatically also enables things like the Open Context package for the R statistical software environment. If you’re feeling adventurous, take a look at that.

In Part Two I’ll walk you through using JQ to massage the json into a csv file that can be explored in common spreadsheet software. (For a detailed lesson on JQ, see The Programming Historian, which also explains why json matters in the first place). Of course, lots of the more interesting data viz packages can deal with json itself, but more on that later.

And of course, if you’re looking for some quick and dirty data export, Open Context has recently implemented a ‘cloud download’ button that will export a simplified version of the data direct to csv on your desktop. Look for a little cloud icon with a down arrow at the bottom of your search results page. Now, you might wonder why I didn’t mention that at the outset, but look at it this way: now you know how to get the complete data, and with this knowledge, you could even begin building far more complicated visualizations or websites. It was good for you, right? Right? Right.

PS Eric adds: “Also, you can request different types of results from Open Context (see: https://opencontext.org/about/services#tab_query-meta). For instance, if you only want GeoJSON for the results of a search, add “response=geo-record” to the request. That will return just a list of geospatial features, without the metadata about the search, and without the facets. If you want a really simple list of URIs from a search, then add “response=uri”. Finally, if you want a simple list of search results with some descriptive attributes, add “response=uri-meta” to the search result.”

Open Context & Carleton Prize for Archaeological Visualization

Increasingly, archaeology data are being made available openly on the web. But what do these data show? How can we interrogate them? How can we visualize them? How can we re-use data visualizations?

We’d like to know. This is why we have created the Open Context and Carleton University Prize for Archaeological Visualization and we invite you to build, make, and hack the Open Context data and API for fun and prizes.

Who Can Enter?

Anyone! Wherever you are in the world, we invite you to participate. All entries will be publicly accessible and promoted via a context gallery on the Open Context website.

Sponsors

The prize competition is sponsored by the following:

  • The Alexandria Archive Institute (the nonprofit that runs Open Context)
  • The Digital Archaeology at Carleton University Project, led by Shawn Graham

Categories

We have prizes for the following categories of entries:

  • Individual entry: project developed by a single individual
  • Team entry: project developed by a collaborative group (2-3 people)
  • Individual student entry: project developed by a single student
  • Student team entry: project developed by a team of (2-3) students

Prizes

All prizes are awarded in the form of cash awards or gift vouchers of equivalent value. Note that the currency depends on the award type:

  • Best individual entry: $US200
  • Best team entry (teams of 2 or 3): $US300 (split accordingly)
  • Best student entry: $C200
  • Best student team entry (teams of 2 or 3): $C300 (split accordingly)

We will also note “Honorable Mentions” for each award category.

Entry Requirements

We want this prize competition to raise awareness of open data and reproducible research methods by highlighting some great examples of digital data in practice. To meet these goals, specific project entry requirements include the following:

  • The visualization should be publicly accessible/viewable, live on the open Web
  • The source code should be made available via Github or similar public software repository
  • The project needs to incorporate and/or create open source code, under licensing approved by the Free Software Foundation.
  • The source code must be well-commented and documented
  • The visualization must make use of the Open Context API; other data sources may also be utilized in addition to Open Context
  • A readme file should be provided (as .txt or .md or .rtf), which will include:
    • Instructions for reproducing the visualization from scratch
    • Interesting observations about the data that the visualization makes possible
    • Documentation of your process and methods (that is to say, ‘paradata’ as per the London Charter, section 4)

All entries have to meet the minimum requirements described in ‘Entry Requirements’ to be considered.

Entries are submitted by filling in a Web form (http://goo.gl/forms/stmnS73qCznv1n4v1) that will ask for your particulars, the URL to your ‘live’ entry, and the URL to your code repository. You will also be required to attest that the entry is your own creation.

Important Dates

  • Closing date for entry submissions: December 16, 2016
  • Winners announced: January 16, 2017

Criteria for Judging

  • Potential archaeological insight provided by the visualization
  • Reproducibility
  • Aesthetic impact
  • Rhetorical impact
  • Appropriate recognition for/of data stakeholders (creators and other publics)

Attention will be paid in particular to entries that explore novel ways of visualizing archaeological data, or innovative re-uses of data, or work that takes advantage of the linked nature of Open Context data, or work that features robust/reproducible code for visualizations that could be easily/widely applied to other datasets.

Judges

The judges for this competition are drawn from across North America:

Resources

Digital Humanities is Archaeology

Caution: pot stirring ahead

I’m coming up on my first sabbatical. It’s been six years since I first came to Carleton – terrified – to interview for a position in the history department, in this thing, ‘digital humanities’. The previous eight years had been hard, hustling for contracts, short term jobs, precarious jobs, jobs that seemed a thousand miles away from what I had become expert in. I had had precisely one other academic interview prior to Carleton (six years earlier. Academia? I’d given up by then). Those eight years taught me much (and will be a post for another day, why I decided to give one more kick at the can).

The point of this morning’s reflection is to think about what it was that I was doing then that seemed appropriate-enough that I could spin my job application around it. At that time, it was agent based modeling.

During those previous eight years, I had had one academic year as a postdoc at the U Manitoba. The story of how I got that position is a story for another day, but essentially, it mirrors the xkcd cartoon Scott linked to the other day. I said ‘fuck it’. And I wrote an application that said, in essence, I’ve got all these networks; I want to reanimate them; agent modeling might be the ticket. (If you’ve ever spent any time in the world of stamped brick studies, this is NOT what we do…). So I did. And that’s what I had in hand when I applied to Carleton.

‘Agent modeling is digital humanities’, I said. Given that nobody else had much idea what DH was/is/could be, it worked. I then spent the next six years learning how to be a digital humanist. Learning all about what the literary historians are doing, learning about corpus linguistics, learning about natural language processing, learning learning learning. I could program in Netlogo; I learned a bit of python. I learned a bit of R. It seemed, for a long time, that my initial pitch to the department was wrong though. DH didn’t do the agent modeling schtick. Or at least, nobody I saw who called themselves a ‘digital humanist’. Maybe some digital archaeologists did (and are they DH? and how does DA differ from the use of computation in archaeology?)

But. I think there’s a change in the air.

I think, maybe, the digital humanities are starting to come around to what I’ve been arguing for over a decade, in my lonely little corners of the academy. Here’s some stuff I wrote in 2009 based on work I did in 2006 which was founded on archaeological work I did in 2001:

In any given social situation there are a number of behavioural options an individual may choose. The one chosen becomes “history,” the others become “counter-factual history.” As archaeologists, we find the traces of these individual decisions. In literature we read of Cicero’s decision to help his friend with a gift of money. What is the importance of the decision that Cicero did not make to not help his friend? How can we bridge the gap between the archaeological traces of an individual’s decision, and the option he or she chose not to pursue in order to understand the society that emerged from countless instances of individual decision-making? Compounding the problem is that the society that emerged influenced individual decision-making in a recursive, iterative fashion. The problem, simply stated, is one of facing up to complexity. A major tool for this problem is the agent based simulation.

[..]

[A]gent-based modeling […] requires modellers to make explicit their assumptions about how the world operates (Epstein). This is the same argument made by Bogost for the video game: it is an argument in code, a rhetoric for a particular view of the world. As historians, we make our own models every day when we conceive how a particular event occurred. The key difference is that the assumptions underlying our descriptions are often implicit.

The rules that we used to encode the model are behaviours derived from archaeology, from the discovered traces of individual interactions and the historical literature. Once the rules for agents in this model and others are encoded, the modeller initiates the simulation and lets the agents interact over and over again. As they interact, larger-scale behaviours – an artificial society – begins to emerge. In using an ABM, our central purpose is to generate the macro by describing the micro.

[…] It is worth repeating that agent-based modelling forces us to formalise our thoughts about the phenomenon under consideration. There is no room for fuzzy thinking. We make the argument in code. Doing so allows us to experiment with past and present human agents in ways that could never be done in the real world. Some ABMs, for example, infect agents with a “disease” to determine how fast it spreads. An ABM allows us to connect individual interactions with globally emergent behaviours. It allows us to create data for statistical study that would be impossible to obtain from real-world phenomena.

That’s a long quote; sorry. But.

Compare with what Sinclair & Rockwell write in their new book, Hermeneutica (pp. 41-42):

…we can say that computers force us to formalize what we know about texts and what we want to know. We have to formally represent a text – something which may seem easy, but which raises questions… Computing also forces us to write programs that formalize forms of analysis and ways of asking questions of a text. Finally, computing forces us to formalize how we want answers to our questions displayed for further reading and exploration. Formalization, not quantification, is the foundation of computer-assisted interpretation.

[…] In text analysis you make models, manipulate them, break them, and then talk about them. Counting things can be part of modeling, but is not an essential model of text analysis. Modeling is also part of the hermeneutical circle; there are formal models in the loop. […] thinking through modeling and formalization is itself a useful discipline that pushes you to understand your evidence differently, in greater depth, while challenging assumptions. We might learn the most when the computer model fails to answer our questions.

The act of modeling becomes a path disciplined by formalization, which frustrates notions of textual knowledge. When you fail at formalizing a claim, or when your model fails to answer questions, you learn something about what is demonstrably and quantifiably there. Formalizing enables interrogation. Others can engage with and interrogate your insights. Much humanities prose supports claims with quotations, providing an argument by association or with general statements about what is in the text – vagaries that cannot be tested by others except with more assertions and quotations. Formalization and modeling, by contrast, can be exposed openly in ways that provide new affordances for interaction between interpretations.

That’s a long quote; sorry. But.

Compare with what Piper writes in the inaugural issue of Cultural Analytics:

One of the key concepts operative in computational research that has so far been missing from traditional studies of culture is that of modeling. A model is a metonymical tool – a miniature that represents a larger whole. But it is also recursive in that it can be modified in relationship to its “fit,” how well it represents this whole. There is a great deal of literature on the role of modeling in knowledge creation and this should become core reading for anyone undertaking cultural analytics. The more we think about our methods as models the further we will move from the confident claims of empiricism to the contingent ones of representation. Under certain conditions, it is true that (i.e. replicable and stable)…

That’s not as long a quote. I’m getting better. But.

Compare with Underwood’s abstract for (and watch the video of) his talk on ‘Predicting the Past’:

We’re certainly comfortable searching and browsing [libraries], and we’re beginning to get used to the idea of mining patterns: we can visualise maps and networks and trends. On the other hand, interpreting the patterns we’ve discovered often remains a challenge. To address that problem, a number of literary scholars have begun to borrow methods of predictive modelling from social science. Instead of tracing a trend and then speculating about what it means, these scholars start with a specific question they want to understand — for instance, how firm is the boundary between fiction and biography? Or, how are men and women described differently in novels? The categories involved don’t have to be stable or binary. As long as you have sources of testimony that allow you to group texts, you can model the boundaries between the groups. Then you can test your models of the past by asking them to make blind predictions about unlabelled examples. Since the past already happened, the point of predicting it is not really to be right. Instead we trace the transformation of cultural categories by observing how our models work, and where they go wrong.

It feels like something is going on. It feels like there’s been a bit of a sea-change in what DH sees as its relationship to the wider world. I feel like there is an arc to my story now that makes sense, that where this field is going fits squarely in where I myself have come from. What is ‘digital humanities’?

It might be that DH is really a branch of archaeology.

Postscriptum

Here’s a thought:

If DH is archaeology in its use of modeling as a core method, and given that modeling inherently builds its theoretical perspectives into its core operations, then the only appropriate way of writing DH must be in simulation. Games. Playful interactions.

Discuss.


BTW: There’s a rich literature in archaeology on modeling, on moving from the incomplete evidence to the rich stories we want to tell. All archaeological data is necessarily incomplete; it’s the foundational problem of archaeology. DH folks might want to give that literature a read. Recently, Ted Underwood posted on ‘the real problem with distant reading‘ and the objections folk raise concerning the complexity of human life if considered computationally. Ted comes around to essentially a ‘screw that’ position, and writes,

It’s okay to simplify the world in order to investigate a specific question. That’s what smart qualitative scholars do themselves, when they’re not busy giving impractical advice to their quantitative friends. Max Weber and Hannah Arendt didn’t make an impact on their respective fields — or on public life — by adding the maximum amount of nuance to everything, so their models could represent every aspect of reality at once, and also function as self-operating napkins.

The problems that literary scholars are finding in presenting their models and approaches to their (non-computational) peers have their parallels in archaeological debates from the 70s onwards; I think they might find useful material in those debates. Again: DH is archaeology.

Reactions to Battlefield Recovery episode 1

Battlefield Recovery, an execrable show that turns the looting of war dead into ‘entertainment’, was shown on Saturday on Channel 5 in the UK. I won’t dignify it by linking to it; instead see this article in the Guardian.

I wondered however what the tweeting public thought about the show – keeping in mind that Channel 5 viewers may or may not be the same kinds of folks who engage with Twitter. I used Ed Summers’ TWARC to collect approximately 3600 tweets (there are likely many more, but the system timed out). The file containing the IDs of all of these tweets is available here. You can use this file in conjunction with TWARC to recover all of the tweets and their associated metadata for yourself (which is approximately 19 mb worth of text). You can explore the language of the tweets for yourself via Voyant-Tools.

So the most retweeted interventions show a pretty strong signal of disapproval. I have not looked into users’ profiles to see whether or not folks identify as archaeologists. Nor have I mapped users’ networks to see how far these messages percolated, and into what kinds of communities. This is entirely possible to do of course, but this post just represents a first pass at the data.

Let’s look at the patterns of language in the corpus of tweets as a whole. I used the LDAVis package for R to create an interactive visualization of topics within the corpus, fitting it to 20 topics as a first stab. You can play with the visualization here. If you haven’t encountered topic modeling yet, it’s a technique to reverse engineer a corpus into the initial ‘topics’ from which the writers wrote (could have written). So, it’s worth pointing out that it’s not ‘truth’ we’re seeing here, but a kind of intellectual thought exercise: if there were 20 topics that capture the variety of discourse expressed in these tweets, what would they look like? The answer is, quite a lot of outrage, dismay, and disappointment that this TV show was aired. Look in particular at, say, topic 8 or topic 3, and ‘disgust’. Topic 1, which accounts for the largest slice of the corpus, clearly shows how the discussants on Twitter were unpacking the rebranding of this show from its previous incarnation as ‘Nazi War Diggers’, and the pointed comments at Clearstory UK, the producers of Battlefield Recovery.

We can also look at patterns in the corpus from the point of view of individual words, imagining the interrelationships of word use as a kind of spatial map (see Ben Schmidt, Word Embeddings). If you give it a word – or a list of words – the approach will return to you words that are close in terms of their use. It’s a complementary approach to topic models. So, I wanted to see what terms were in the same vector as the name of the show & its producers (I’m using R). I give it this:

library(wordVectors)  # Ben Schmidt's word2vec package; 'model' is a vector model trained on the tweet corpus
some_terms = nearest_to(model,model[[c("battlefieldrecovery", "naziwardiggers", "clearstoryuks")]],150)
plot(filter_to_rownames(model,names(some_terms)))

And I see the interrelationships like so:

…a pretty clear statement about what 3600 tweets felt, in aggregate along this particular vector. Of the tweets I saw personally (I follow a lot of archaeologists), there was an unequivocal agreement that what this show was doing was no better than looting. With word vectors, I can explore the space between pairs of binaries. So let’s assume that ‘archaeologist’ and ‘looter’ are opposite ends of a spectrum. I can plot this using this code:

library(wordVectors)  # cosineSimilarity
library(dplyr)        # %>% and filter
library(ggplot2)

actor_vector = model[["archaeologists"]] - model[["looters"]]
word_scores = data.frame(word=rownames(model))
word_scores$actor_score = model %>% cosineSimilarity(actor_vector) %>% as.vector

ggplot(word_scores %>% filter(abs(actor_score)>.725)) + geom_bar(aes(y=actor_score,x=reorder(word,actor_score),fill=actor_score<0),stat="identity") + coord_flip()+scale_fill_discrete("words associated with",labels=c("archaeologist","looter")) + labs(title="The words showing the strongest skew along the archaeologist-looter binary")

which gives us:

You can see some individual usernames in there; to be clear, this isn’t equating those individuals with ‘archaeologist’ or ‘looter’; rather, tweets mentioning those individuals tend to be RTs of them, or those individuals are themselves using this language or discussing these particular aspects of the show. I’m at a loss to explain ‘muppets’. Perhaps that’s a term of derision.

So, as far as this analysis goes – and one ought really to map how far and into what communities these messages penetrate – I’d say on balance, the twittersphere was outraged at this television ‘show’. As Nick said,


The humane hack – a snippet of an argument

[this is the snippet of an argument, and all that I’ve managed to produce today for #AcWriMo. I kinda like it though and offer it up for consumption, rough edges, warts, and all.  It emerges out of something Shawn Anctil said recently about ‘the Laws of Cool‘ when we were talking about his comps which happen this Thursday. In an effort to get my head around what he said, I started to write. This might make it into a piece on some of my recent sound work. Alan Liu’s stuff is always wonderful to read because it turns my head inside out, and I make no warrant that I am doing justice to Alan’s ideas. It’s been a while since I last looked, and I realize I really need to block out several days to do this properly. Anyway, working in public, fail gloriously, etc etc, i give you a snippet of an argument:]

Alan Liu, in 2004, wondered what the role of the arts and humanities was in an age of knowledge work, of deliverables, of an historical event horizon that only goes back to the last financial quarter.  He examined the idea of ‘knowledge work’ and teased out how much of the driving force behind it is in pursuit of the ‘cool’. Through a deft plumbing of the history of the early internet (and in particular, riffing on Netscape’s ‘what’s cool?’ page from 1996 and their inability to define it except to say that they’d know it when they saw it), Liu argues that cool is ‘the aporia of information… cool is information designed to resist information [emphasis original]… information fed back into its own signal to create a standing interference pattern, a paradox pattern’ (Liu, 2004: 179).  The latest web design, the latest app, the latest R package for statistics, the latest acronym on Twitter where all the digital humanists play: cool, and dividing the world.

That is, Liu argued that ‘cool’ was amongst other things a politics of knowledge work, a practice and ethos. He wondered how we might ‘challenge knowledge work to open a space, as yet culturally sterile (coopted, jejune, anarchistic, terroristic), for a more humane hack of contemporary knowledge?’ (Liu 2004: 9). Liu goes on to discuss how the tensions of ‘cool’ in knowledge work (for us, read: digital archaeology) also intersects with an ethos of the unknown, that is, of knowledge workers who work nowhere else somehow manage to stand outside that system of knowledge production. (Is alt-ac ‘alt’ partially because it is the cool work?). This matters for us as archaeologists. There are many ‘cool’ things happening in digital archaeology that somehow do not penetrate into the mainstream (such as it is). The utilitarian dots-on-a-map were once cool, but are now pedestrian. The ‘cool’ things that could be, linger on the fringes. If they did not, they wouldn’t be cool, one supposes. They resist.

To get that more humane hack that Liu seeks, Liu suggests that the historical depth that the humanities provides counters the shallowness of cool:

“The humanities thus have an explanation for the new arts of the information age, whose inheritance of a frantic sequence of artistic modernisms, postmodernisms, and post-postmodernists is otherwise only a displaced encounter with the raw process of historicity. Inversely, the arts offer the humanities serious ways of engaging – both practically and theoretically – with “cool”. Together, the humanities and arts might be able to offer a persuasive argument for the humane arts in the age of knowledge work” (2004: 381).

In which case, the emergence of digital archaeologists and historians in the last decade might be the loci of the humane hacks – if we move into that space where we engage the arts.

We need to be making art.


The Video Game and the Archaeologist – draft

[this is a draft of a short piece I am writing for a society journal, hence not peer reviewed. I would therefore welcome comments, keeping in mind that I wrote it in one sitting this AM. When it comes out formally – if – I’ll post the link here and direct folks to read the final product there. I think it hangs together more or less ok.]

Tell the colleagues in your department, in your company, that you play video games, and you will be greeted with one of only two reactions: a polite murmur accompanied by the dying look of ‘this person is not serious’, or the enthusiastic embrace of the true believer. There appears to be no middle ground. Yet, there is a long history of using games in education, in museum outreach, and in public archaeology. There is even a (much shorter) history of using games to persuade (as ‘serious games’ or ‘news games’). But there is practically no history at all of games being used to make a scholarly argument. This is to miss an opportunity.

It is important however to ask, at the outset, what do games teach? What do games do?

“The game, or any computer game for that matter, is ultimately about mechanics, and not about content. The content is window dressing, and deep playing of a game such as Civilization teaches little about history, but everything about how to manipulate the complex algorithms that model the simulation” (Kee & Graham, 274)

Let us dispense with the notion that there is anything inherently gauche about archaeologists interested in the possibilities of video games, or any ‘natural’ reason why archaeology as a discipline should not be concerned with them. Manipulating algorithms, modelling societies through simulation: archaeologists have been doing this for years, within the ambit of GIS and Agent Based Models. The difference is, games have better eye-candy and production values. They should. Gaming as an industry generates more money than all of Hollywood.

A potted synopsis of game studies

Broadly, there are two camps when it comes to analyzing the affective import of games. The ludologists, as the name implies, are interested in the rules of the games, the possibilities (or not) for action within the game. Narratologists on the other hand consider the story of the game, the story that emerges, or the story within which the game action takes place. Both approaches are useful for situating what a game does, or what a game achieves.

Another (rather archaeological) approach is to consider typologies of games. This is not to be confused with ‘genre’, as genres (‘first person shooter’; ‘rogue-like’; ‘management sim’; ‘casual’) are largely marketing categories that conflate issues of game play, or perspective, or agency, for the purposes of gaining space in the various venues where games are bought and sold. There is a voluminous literature on the typologies of games which try to distill essential features in order to understand the crucial ways in which games differ (the better to understand their narratological or ludological aspects). In the context of ‘historical’ games, a typology that helps us consider what aspects about the past we wish to communicate, to teach, focuses on categorizing how the game treats time and space.

Within ‘space’, we can ask how the game treats perspective, topography, and the environment. Within ‘time’, we can wonder about pace, representation, and teleology. Consider the games ‘Civilization IV’ and ‘Caesar IV’ as in Kee and Graham xxxx:

                 Caesar IV       Civilization IV

Space
  Perspective    Omni-Present    Vagrant
  Topography     Topological     Geometrical
  Environment    Dynamic         Dynamic

Time
  Pace           Real-Time       Turn-Based
  Representation Arbitrary       Mimetic
  Teleology      Finite          Finite

The value of this kind of typology is that it would allow us to consider our archaeological representations of space and time in that light, to work out what conventions of game design would be most affective in communicating the argument about the past that we wish to impart.

Third Space

Despite the neat breakdown between ‘narratology’ and ‘ludology’, which would seem to capture all there is to know about video games, there is a third space that games-about-history inhabit. Elliott and Kapell’s recent ‘Playing with the Past’ (2013) neatly captures this aspect. They point out that while games are systems of rules interpreted by the computer referee, and while these systems are enveloped within a narrative, games-about-the-past have a larger historical narrative within which the game’s narrative must take place. That is to say, the players and designers are working within historical frameworks from the outset that inform their understanding of the past. Hence to make the game, to play the game, necessarily involves the kind of historical thinking (about contingency, about causality, about equifinality) that characterizes professional thinking about the past. ‘Why did that happen? What would happen if?’ are questions players ask about the game, which are very nearly the same thing that we ask of the past.

The fact of the matter is, while the content of a game is important, it is not as important as the system of rules and relationships that govern the emergent play; reflecting on why game play evolves the way it does forces the player to understand the rules of representation. This means that game players think about the past in ways that are the same as the kind of thinking about the past that we want in our students and publics. If one studies the communities of players that coalesce around particular games (especially games that allow for ‘modding’, or re-writing of the game rules, e.g, the Civilization franchise), one finds quite heated discussions about how best to represent the past, debates over the consequences and meanings of modifications to the games, and – while maybe sometimes not the most factually informed debates – a keen understanding of process in the past (Graham, rolling own article).

Flow

The training of archaeologists has long had an emphasis on the practical – we learn how to be archaeologists by doing archaeology. We perform the learning. Where, and from whom, we learn the hands-on aspects of archaeology has a deep influence on how we think archaeologically, how we understand the past. This is of course why we speak of ‘schools’ of thought. To play a video game well involves that same aspect of performance, and the ‘who made this and how did they imagine the world’ matters equally as much. When we play a game well, we have internalized how that game represents its world. We have internalized an understanding of the system of rules and relationships that we might not even be aware of. The learning that happens through video games is deep, and is tied to what psychologists call ‘flow’. Games don’t just represent a world: they actively watch the player. The best games adjust their difficulty in such a way as to achieve a flow state, a sense of mastery that sits in the sweet spot where the challenge is just hard enough to be difficult, but not so difficult that the player gives up in frustration.  The best learning, in whatever context, is tied to that same sense.

In representing a world to us, the system of rules and relationships that govern the emergent game play are akin to the systems of rules and relationships that we as scholars use to construct our ideas about the past: game rules are historiography. They are method and theory, all in one.  In the same way that an agent based simulation of the past encodes our ideas about how phenomenon x worked in the past (so that we can see what the consequences are of that idea for household formation amongst the Anasazi, say), game rules encode ideas about (inter alia) power, ideology, action, colonialism, and empire. The game theorist Ian Bogost calls these ‘procedural rhetorics’, the arguments made by code (2007); the historian William Uricchio explicitly called code historiography (2005).  Games about the past will be played, experienced, and internalized by orders of magnitude more people than will ever read our formal archaeologies. And the experience will resonate far more deeply than any visit to a site or museum. We ignore games as a venue for our scholarship at our peril.

The Payoff

I have been arguing by omission that the content, the window dressing (the pretty graphics; the hyper-realistic depictions of textures and atmospheres, the 3d sound, the voice acting) does not matter nearly as much as close experience and engagement with the code and its emergent outcomes. That engagement allows a connection here with the kind of archaeology argued for by scholars such as Stuart Eve (xxxx) that seeks to use the mechanics of games and allied technologies such as mixed or augmented realities to focus on understanding the systems of relationships amongst the full sensory experience of the past. Eve calls this an ‘embodied GIS’ which does not focus on the archaeologist’s subjective experience of place, but rather, explores how sound, views, lighting (and indeed, smell and touch) combine or are constrained by the archaeology of a place experienced in that place.  This suggests a way forward for the use of games as both a tool for research on the past, and a way to communicate that research to our various publics.

Finally, we can turn our critical apparatus back to front and consider games as a venue within which we may do archaeology. Search online for ‘archaeogaming’. The most succinct definition of what this can be comes from Meghan Dennis:

Archaeogaming is the utilization and treatment of immaterial space to study created culture, specifically through videogames.

It requires treating a game world, a world bounded and defined by the limitations of its hardware, software and coding choices, as both a closed universe and as an extension of the external culture that created it. Everything that goes into the immaterial space comes from its external cultural source, in one way or another. Because of this, we see the same problems in studying culture in games as in studying culture in the material world.

Archaeogaming is a subdiscipline that requires the same standards of practice as the physical collection of excavated data, only with a different toolset. It also provides the opportunity to use game worlds to reflect on practice, theory and the perceptions of our discipline.

Video games are an extraordinarily rich tool, area of research, and affective mode of communication whose possibilities we haven’t even begun to explore. Yet, they are not so foreign to the archaeologist’s ‘formal’ computational experience, with ties to GIS, Agent Based Models, and reconstructions. Play on!

[yah, I need to work on that ending.]

[update Oct 28: I made a few changes, added a wee bit, nuked the table, and sent the thing off. That version lives on my open notebook].

Animating Watling Street

In a previous post I shared with you the first stab at using Brian Foo’s ‘Two Trains’. That experiment was mostly so that I understood what the code was doing. In the version I’m sharing below, I’ve got better data: counts of inscriptions at points mentioned in the second Antonine Itinerary (i.e., Watling Street-ish), and counts of inscriptions for the surrounding county as a whole (from romaninscriptionsofbritain.org). The difference in those two numbers is the fodder for Foo’s algorithm for selecting instruments, pitch, tempo, etc.

I will write more eventually about what these choices do for the sonification, and what they imply as a means of ‘visualizing’ Roman Britain. (Right now, I’m working with instruments that Foo selected for his piece, albeit more of the percussion instruments and a few of the woodwinds; up to now all recreations of Roman instruments I’ve found are gawdawful. So, by selecting these few instruments, at least I’ve got a bit of sound that might’ve made sense to a Roman. Fodder for reflection on this point.)

Foo also provides a processing script that grabs the latitude and longitude for each stop along the way, scaling appropriately to match the changes in the music. It’s quite clever – procedurally generated music matched by a procedurally generated visualization. I also like that this movement along a line is much closer to Roman conceptions of space – a sequence of what comes next, ie, an itinerary – rather than a top-down birds-eye view. Now, Foo also provides code to generate that kind of view, too, and I’ll probably play with that, just to see. But I don’t think it’ll make it into the final version of this project.

Listening to Watling Street

I greatly admire the work of Brian Foo, the ‘Data Driven DJ‘. His ‘Two Trains: A Sonification of Income Inequality on the NYC Subway’ uses data on incomes around the stops on the subway as fodder for an algorithmically generated soundscape that captures (to my mind; I’ve never been to NY) the dynamics of that city. Brian released all of his code and data on Github, and I’ve been playing around with it.

I’ve got big plans.

But I thought you’d enjoy, to start with, my first experiment, which is a sonification of the epigraphic density of Watling Street (also known as Route II of the Antonine Itinerary in Roman Britain). My data is extremely rough (mere counts of inscriptions per town), as I was just trying to understand in the first place how the scripts work. (I’ve got big plans for all of this, as I said). I’ve found a bit of a marching beat; when the song really picks up we’re at the big centres like Eboracum (York), Verulamium (St Albans), Londinium. The script is set for 100 BPM, with 3000 m per beat (the script takes the longitude and latitude for each place and figures out the distance between each one in the sequence, to work out the length etc of the song).
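(For the curious, the ‘figures out the distance’ step is just great-circle distance between successive latitude/longitude pairs. The sketch below is not Foo’s code – his scripts are all on Github – it is just an illustration of the kind of calculation involved, with rough coordinates for two of the stops.)

#haversine.py
#a sketch of the distance step: great-circle distance (km) between two
#stops, given latitude and longitude in decimal degrees
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# e.g. roughly Londinium (London) to Verulamium (St Albans)
print(haversine_km(51.51, -0.09, 51.75, -0.34))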

It’s pretty catchy. Hope you enjoy; I’m excited to play with these scripts some more. Thank you Data Driven Dj Brian Foo!

(See the continuation of this experiment here)

historical maps into Unity3d

This should work.

Say there’s a historical map that you want to digitize.  It may or may not have contour lines on it, but there is some indication of the topography (hatching or shading or what not). Say you wanted to digitize it such that a person could explore its conception of geography from a first person perspective.

Here’s a workflow for making that happen.

Some time ago, the folks at the NYPL put together a tutorial explaining how to turn such a map into a minecraft world. So let’s do the first part of their tutorial. In essence, what we do is take the georectified map (which you could georectify using something like the Harvard Map Warper), load that into QGIS, add elevation points, generate a surface from that elevation, turn it into grayscale, export that image, convert to raw format, import into Unity3d.

Easy peasy.

For the first part, we follow the NYPL:

Requirements

QGIS 2.2.0 ( http://qgis.org )

  • Activate Contour plugin
  • Activate GRASS plugin if not already activated

A map image to work from

  • We used a geo-rectified TIFF exported from this map but any high rez scan of a map with elevation data and features will suffice.

Process:

Layer > Add Raster Layer > [select rectified tiff]

  • Repeat for each tiff to be analyzed

Layer > New > New Shapefile Layer

  • Type: Point
  • New Attribute: add ‘elevation’ type whole number
  • remove id

Contour (plugin)

  • Vector Layer: choose points layer just created
  • Data field: elevation
  • Number: at least 20 (maybe.. number of distinct elevations + 2)
  • Layer name: default is fine

Export and import contours as vector layer:

  • right click save (e.g. port-washington-contours.shp)
  • May report error like “Only 19 of 20 features written.” Doesn’t seem to matter much

Layer > Add Vector Layer > [add .shp layer just exported]

Edit Current Grass Region (to reduce rendering time)

  • clip to minimal lat longs

Open Grass Tools

  • Modules List: Select “v.in.ogr.qgis”
  • Select recently added contours layer
  • Run, View output, and close

Open Grass Tools

  • Modules List: Select “v.to.rast.attr”
  • Name of input vector map: (layer just generated)
  • Attribute field: elevation
  • Run, View output, and close

Open Grass Tools

  • Modules List: Select “r.surf.contour”
  • Name of existing raster map containing colors: (layer just generated)
  • Run (will take a while), View output, and close

Hide points and contours (and anything else above the b/w elevation image), then Project > Save as Image

You may want to create a cropped version of the result to remove un-analyzed/messy edges

As I noted a while ago, there are some “hidden, tacit bits [concerning] installing the Contour plugin, and working with GRASS tools (especially the bit about ‘editing the current grass region’, which always is fiddly, I find).”  Unhelpfully, I didn’t write down what these were.

Anyway, now that you have a grayscale image, open it in Gimp (or Photoshop; if you do have Photoshop, go watch this video and you’re done).

For those of us without Photoshop, this next bit comes from the addendum to a previous post of mine:

    1. Open the grayscale image in Gimp.
    2. Resize the image to a power of 2 + 1 (*shrug* everything indicates this is what you do, with Unity); in this case I chose 1025.
    3. Save as file type RAW. IMPORTANT: in the dialogue that opens, set ‘RGB save type’ to ‘planar’.
    4. Change the file extension from .data to .raw in mac Finder or windows Explorer. (A scripted alternative to these steps follows below.)
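(A scripted alternative: if you’d rather not click through Gimp, here is a rough Python sketch using the Pillow imaging library. The filename heightmap.png is just an assumption – swap in your own grayscale export. It resizes to 1025 x 1025 and dumps the 8-bit bytes that Unity’s raw importer will read.)

#png-to-raw.py
#a sketch: grayscale heightmap image -> 8-bit .raw file for Unity
from PIL import Image  # pip install Pillow

SIZE = 1025  # power of 2 + 1, matching the Unity terrain settings below

img = Image.open('heightmap.png').convert('L')  # force 8-bit grayscale
img = img.resize((SIZE, SIZE))
with open('heightmap.raw', 'wb') as out:
    out.write(img.tobytes())  # raw pixel bytes, row by row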

Now you can import this historical elevation map into Unity. In Unity, add a GameObject -> 3D Object -> Terrain to the project. In the inspector window, there’s a cogwheel. Click this; it opens the settings. One of the options will be ‘Import Raw’. Click this.

Select your .raw grayscale image.

  1. On the import dialogue, change it to 8-bit image rather than 16-bit.
  2. Change the width, height, x and z to all be 1025. Change the y to be 75 (yours will be different; look at the range between the highest and lowest points in your original map, and input that). For reference, please also see this post which saved me: http://newton64.github.io/blog/2013-07-24-gimp-unity-terrain.html

Ta da – a white glacial landscape with your elevation data.

Now the fun stuff can happen. But – before someone can ‘walk’ around your landscape, you have to add controls to your project. So, in Unity3d, go to:

Assets – Import package – characters.

Once that’s all done, you’ll drag-and-drop a ‘FPSController’ into your project. You’ll find it as below:

[screenshot: locating the FPSController in the imported characters package]

Click and grab that blue box and move it up into your project (just drop it in the main window). Make sure that the control is above (and also, not intersecting any part of) your landscape, or when you go to play, you’ll either be stuck or indeed falling to the centre of the earth. We don’t want that. Also, delete the ‘camera’ from the hierarchy; the fpscontroller has its own camera. My interface looks like this:

[screenshot: the Unity interface, with the FPSController placed above the terrain]

You do the grass and trees etc from the terrain inspector, as in the window there on the right. I’ll play some more with that aspect, report back soonish. Notice the column drum in the right foreground, and the tombstone in the back? Those were made with 3d photogrammetry; both are hosted on Sketchfab, as it happens. Anyway, in Meshlab I converted from .obj to .dae, after having reduced the polygons with quadric edge collapse decimation, to make them a bit simpler. You can add such models to your landscape by dropping the folder into the ‘assets’ folder of your Unity project (via the mac Finder or windows explorer).  Then, as you did with the fpscontroller block, you drag them into your scene and reposition them as you want.

Here’s my version, pushed to webGL

Enjoy!

(by the way, it occurs to me that you could use that workflow to visualize damned near anything that can be mapped, not just geography. Convert the output of a topic model into a grayscale elevation map; take a network and add elevation points to match betweenness metrics…)
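(To make that last thought concrete, here is a hypothetical sketch using numpy and Pillow: take any 2d grid of numbers – topic weights, betweenness scores, whatever – rescale it to 0-255, and write it out as the same sort of 8-bit raw heightmap used above. The random matrix is just a stand-in for your own data.)

#values-to-heightmap.py
#a sketch: any 2d matrix of values -> 8-bit raw heightmap for Unity
import numpy as np
from PIL import Image

values = np.random.rand(200, 200)  # stand-in: swap in your own matrix of scores

scaled = (values - values.min()) / (values.max() - values.min())  # rescale to 0..1
img = Image.fromarray((scaled * 255).astype(np.uint8), mode='L')
img = img.resize((1025, 1025))  # power of 2 + 1, as before
with open('data-terrain.raw', 'wb') as out:
    out.write(img.tobytes())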