On haunts & low-friction AR – thinking out loud

The frightening news is that we are living in a story. The reassuring part is that it’s a story we’re writing ourselves. Alas, though, most of us don’t even know it – or are afraid to accept it. Never before did we have so much access to the tools of storytelling – yet few of us are willing to participate in their creation.

– Douglas Rushkoff, ‘Renaissance Now! The Gamers’ Perspective’ in Handbook of Computer Game Studies, MIT Press, 2005: 415.

Haunts is about the secret stories of spaces.

Haunts is about locative trauma.

Haunts is about the production of what Foucault calls “heterotopias”—a single real place in which incompatible counter-sites are layered upon or juxtaposed against one another.

The general idea behind Haunts is this: students work in teams, visiting various public places and tagging them with fragments of either a real life-inspired or fictional trauma story. Each team will work from an overarching traumatic narrative that they’ve created, but because the place-based tips are limited to text-message-sized bits, the story will emerge only in glimpses and traces, across a series of spaces.

– Mark Sample, “Haunts: Place, Play, and Trauma” Sample Reality http://www.samplereality.com/2010/06/01/haunts-place-play-and-trauma/

It’s been a while since I’ve delved into the literature surrounding locative place-based games. I’ve been doing so as I try to get my head in gear for this summer’s Digital Archaeology Institute where I’ll be teaching augmented reality for archaeology.

Archaeology and archaeological practice are so damned broad though; in order to do justice to the time spent, I feel like I have to cover lots of different possibilities for how AR could be used in archaeological practice, from several different perspectives. I know that I do want to spend a lot of time looking at AR from a game/playful perspective though.  A lot of what I do is a kind of digital bricolage, as I use whatever I have to hand to do whatever it is I do. I make no pretense that what I’m doing/using is the best method for x, only that it is a method, and one that works for me. So for augmented reality in archaeology, I’m thinking that what I need to teach are ways to get the maximum amount of storytelling/reality making into the greatest number of hands. (Which makes me think of this tweet from Colleen Morgan this am:

…but I digress.)

So much about what we find in archaeology is about trauma. Houses burn down: archaeology is created. Things are deliberately buried: archaeology is created. Materials are broken: archaeology is created.

Sample’s Haunts then provides a potential framework for doing archaeological AR. He goes on to write:

The narrative and geographic path of a single team’s story should alone be engaging enough to follow, but even more promising is a kind of cross-pollination between haunts, in which each team builds upon one or two shared narrative events, exquisite corpse style. Imagine the same traumatic kernel, being told again and again, from different points of views. Different narrative and geographic points of views. Eventually these multiple paths could be aggregated onto a master narrative—or more likely, a master database—so that Haunts could be seen (if not experienced) in its totality.

It was more of a proof of concept than anything else, but my ‘low-friction AR’ piece, ‘The Ottawa Anomaly’, tries not so much to tell a story as to provide echoes of events in key areas around Ottawa’s downtown, such that each player’s experience of the story would be different: the sequence of geotriggers encountered would colour the emotional content of each subsequent trigger. If you hear the gunshot first, and then the crying, that implies a different story than if you heard them the other way around. The opening tries to frame a storyworld where it makes sense to hear these echoes of the past in the present, so that the technological mediation of the smartphone fits the world. It also tries to make the player stop and look at the world around them with new eyes (something ‘Historical Friction’ tries to do as well).
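The order-dependence of the geotriggers is easy to sketch. What follows is a minimal, hypothetical illustration, not the Ottawa Anomaly’s actual implementation: the trigger names, coordinates and 30 m radii are made up, and distances use the haversine formula on a spherical Earth. Each trigger fires once, the first time a GPS fix falls inside its radius, and the function returns the order in which the player heard them.

```python
import math

# Hypothetical trigger points: name -> (latitude, longitude, radius in metres).
# These are illustrative only, not the real Ottawa Anomaly triggers.
TRIGGERS = {
    "gunshot": (45.4236, -75.7009, 30.0),
    "crying": (45.4250, -75.6990, 30.0),
}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def walk(path, triggers=TRIGGERS):
    """Return the order in which triggers fire along a path of (lat, lon) fixes.

    A trigger fires at most once, the first time the player is inside its radius.
    """
    heard = []
    for lat, lon in path:
        for name, (tlat, tlon, radius) in triggers.items():
            if name not in heard and haversine_m(lat, lon, tlat, tlon) <= radius:
                heard.append(name)
    return heard
```

Walking the same two points in opposite directions returns `['gunshot', 'crying']` in one case and `['crying', 'gunshot']` in the other: the same fragments, two different stories.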

I once set a treasure hunt around campus for my first-year students. One group, however, interpreted a clue as meaning a particular statue in downtown Ottawa; they returned to campus much later and told me a stunning tale of the Illuminati and the secret history of Official Ottawa that they had crafted to make sense of the clues. Same clues, different geographical setting (by mistake) = oddly compelling story. What I’m getting at: my audio fragments could evoke very different experiences, not just through their order of encounter but also through the background of the person listening. I suggested in a tweet that

creating another level of storytelling on top of my own.

I imagine my low-friction AR as a way to hold multiple stories within the same geographic frame, with ‘rechoes’ or ‘fieldnotes’ as ways of cross-connecting different stories. I once toyed with the idea of printing out QR codes that could be pasted overtop of ‘official Ottawa’ for similar purposes…

Somewhere in the desert… a temple

My Minecraft expedition was a success. Let me share some observations.

First, I seeded the wrong world. I used ‘Double Village’ as the seed for ‘large biomes’ when I should have used it for ‘default’. Reading the map incorrectly happens all the time in landscape archaeology, though. Transpose some digits, and soon you’re hundreds of metres from the right spot.

Framing my expedition in my mind as a kind of steam-punk exploration helped get me back ‘in the game’:


I found the village quite easily this time. It was filled with NPCs going about their mysterious business. I, a stranger, wandered into their midst and had no impact on their lives. Doesn’t that often seem the way of a ‘foreign’ expedition? When I was excavating at Forum Novum as a graduate student, our world and that of the people whose local market ground we were digging up really did not intersect, except in very particular contexts: the bar and the restaurant. On market day, we would all head back to Rome. Canadian lad flies in, digs, figures it all out, writes a paper, never explains or connects with the locals. As I remarked at the time,





And so I bumbled away, trying to record stratigraphically what I was up to. The different kinds of blocks do help differentiate contexts: sand fill is quite different from the sandstone blocks the temple was built with. Unfortunately, sandstone is also part of Minecraft’s geology, and typically appears around 3 or 4 blocks below the surface in this biome. So it became difficult to figure out where the temple ended and the local geology began. Since the temple is of a common ‘type’ in Minecraft, I could just dig to exhume that preexisting type-idea and poof: complete temple. The act of excavation creates the archaeology in more ways than one, it seems.




Channeling my inner Howard Carter there. But in this world with no ‘rules’ and no overarching ‘story’, deciding to go on an archaeological expedition forces a story on us. Interacting with the NPCs, and with the crude excavation tools, pushes us towards a 19th-century frame of mind. In the steam-punk narrative I was constructing on Twitter, the archaeologist-as-better-class-of-looter trope seemed to emerge naturally out of my interaction with the game mechanics.

And then this happened.





We’ll come back to that. Suffice to say, this encounter with the ‘otherness’ of the inhabitants of the village was oddly discomfiting.



Clearly, Notch has watched too many Indiana Jones films. Meanwhile, the villagers continued to trouble me.



And then night fell. I decided to try to spend it with the villagers.


I broke the door, quite by accident. Clumsy foreigner. Interfering.



From above, I watched the zombies and creepers and who knows what else hunt each NPC down and kill them.





So I managed to set in motion a chain of events that resulted in the death of the entire village. Obviously *real* archaeological excavation rarely results in the deaths of the locals, but our interventions do have unintended consequences. Here, the game holds a distorted fun-house mirror up to life. Were I doing this with a class, though, this would be a teachable moment to consider the impact of academic archaeology in those ‘distant’ lands we study.

For my Minecraft adventure, I left the expedition and struck out on my own. Soon I discovered more temples, more villages, more ruins. If you’re exploring too, you can find them here:




266.9 66.87 1036.99
-219.24 65.270 13.56
58 67 347
487.73 46 560.3
247.76 66 784
430 63 929.8
692 70 1256.7

Now, one could use those coordinates to begin mapping, and perhaps working out, something of the landscape archaeology in this world. One of those coordinates belongs to a vine-covered stone temple in the jungle. Here, our expectations of what ‘archaeology’ is (informed by the movies) come to the fore.
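As a first step toward that mapping, here is a minimal sketch that reads the coordinates above as Minecraft (x, y, z) positions, with y as height, and finds each site’s nearest neighbour on the horizontal x-z plane. Mean nearest-neighbour distance is one conventional starting point for asking whether sites cluster or spread out; whether Minecraft’s generator produces anything like a settlement pattern is exactly the open question, so treat the numbers as a curiosity rather than a result.

```python
import math

# The coordinates listed above, read as Minecraft (x, y, z) with y as height.
SITES = [
    (266.9, 66.87, 1036.99),
    (-219.24, 65.270, 13.56),
    (58, 67, 347),
    (487.73, 46, 560.3),
    (247.76, 66, 784),
    (430, 63, 929.8),
    (692, 70, 1256.7),
]


def plan_distance(a, b):
    """Horizontal (x-z plane) distance in blocks, ignoring elevation."""
    return math.hypot(a[0] - b[0], a[2] - b[2])


def nearest_neighbours(sites):
    """Map each site index to (index of its nearest other site, distance)."""
    result = {}
    for i, a in enumerate(sites):
        candidates = ((j, plan_distance(a, b)) for j, b in enumerate(sites) if j != i)
        result[i] = min(candidates, key=lambda pair: pair[1])
    return result


def mean_nn_distance(sites):
    """Average distance from each site to its nearest neighbour."""
    nn = nearest_neighbours(sites)
    return sum(d for _, d in nn.values()) / len(nn)


if __name__ == "__main__":
    for i, (j, d) in nearest_neighbours(SITES).items():
        print(f"site {i} -> nearest is site {j}, {d:.0f} blocks away")
    print(f"mean nearest-neighbour distance: {mean_nn_distance(SITES):.0f} blocks")
```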


Now, it may be that I should mod this world more in order to enable a post-colonial kind of archaeology within it. But the act of modding is itself colonialist…







So what have I learned? I have often argued in my video games for historians class that it’s not so much the ‘skin’ of a game that should concern historians, but rather the rules. The rules encode the historiographic approach of the game’s designers. You’re good at the game? You’re performing the worldview of the game’s creators. But in a game like Minecraft, where the rules are a bit more low-level (for lack of a better term), what’s interesting is the way player agency in the game intersects and merges with the player’s own story, the story the player tells to make sense of the action within the world. It’s poiesis. Mimesis. Practomimetic? So while some of the game’s embedded worldview seems drawn straight from the Indiana Jones canon, other elements, like the agency of NPCs, discomfit us precisely because they intersect our own worldviews (the sociocultural practice of academic archaeology) in such a way as to draw us up short.

It will be interesting to see what Andrew’s expedition uncovers…






Somewhere in the desert…

A lost village

At the upcoming SAA in San Francisco, Andrew Reinhard and I are participating in a forum on digital public archaeology. Our piece, ‘Playing Pedagogy: Videogaming as site and vehicle for digital public archaeology’, is still in the process of becoming. Our original abstract:

While there is an extensive literature on the pedagogical uses of video games in STEM education, and a comparatively smaller literature for languages, literature, and history, there is a serious dearth of scholarship surrounding videogames in their role as vectors for public archaeology. Moreover, video games work as ‘digital public archaeology’ in the ways their imagined pasts within the games deal with monuments, monumentality, and their own ‘lore’. In this presentation, we play the past to illustrate twin poles of ‘public’ archaeology, as both worlds in which archaeology is constructed and worlds wherein archaeological knowledge may be communicated.

We had initially thought to write a game to explore these ideas, so that our entire presentation would involve the session participants playing it. But writing games is tough. In fact, it would be hard to top ‘Buried’, the game made by Tara Copplestone for the 2014 Heritage Jam. However, another venue presents itself: Andrew recently proposed to the makers of No Man’s Sky that he be allowed to lead an archaeological expedition therein.

“What!” I hear you exclaim. Well, think of it like this. We’re used to the idea of reception studies, of how the past is portrayed in games, movies, novels. We’re also used to the idea of games as being the locus for pedagogy, or for persuading, or making arguments. What happens then, in a game like No Man’s Sky, where the entire world is generated algorithmically from a seed? That is, no human designs it: it emerges. Rather like our own universe, eh? Such procedural games are quite common, though none perhaps are as complex in their world building as Dwarf Fortress (which evolves not just the world, but also culture & individual family/clan/culture lineages!)

What, then, does such xenoarchaeology look like? How does that intersect with digital public archaeology? Well, if archaeological method has any truth to it, then in these worlds we might be faced with something profoundly other, something profoundly different (which may also account for why the writers of Star Trek placed such stock in archaeology).

We’ve got a month to sort these thoughts out. But it was in this frame of mind that I started thinking about what archaeology in Minecraft would look like, could look like, and what it might find. Not in Minecraft worlds that have been lovingly built from scratch by a human. No, I mean the ones grown from seeds. It’s quite interesting: since the ‘randomness’ in a computer program is only ever pseudo-random, if you know the seed from which all calculations and algorithms are run, you can recreate the exact sequence that gives rise to a particular world (in this, and indeed in all, computational simulations). There is, it turns out, quite a thriving subculture of Minecraft players who share interesting seeds. And so, as I searched for seeds that might prove fertile for our talk, I came across ‘Double Village’ for Minecraft 1.6.4. (See method 5 for spawning worlds from seeds.) If you’ve got Minecraft 1.6.4, you too can join me on my expedition to a strange desert land….
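To see why a seed is enough to regrow a world, consider this minimal sketch. Minecraft (Java Edition) is commonly described as using a seed that parses as a number directly, and otherwise turning the text into a number with Java’s `String.hashCode`; I assume that here and re-implement the hash in Python (for ASCII text). The `toy_world` function is a hypothetical stand-in using Python’s own `random` module, not Minecraft’s terrain generator: it only shows that the same seed always yields the same ‘landscape’.

```python
import random


def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): h = 31*h + char, wrapped to a signed 32-bit int.

    Matches Java for ASCII text (Java iterates UTF-16 code units).
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def toy_world(seed: int, width: int = 16) -> list:
    """A toy 1-D 'terrain' of block heights; same seed, same heights, every time."""
    rng = random.Random(seed)
    return [rng.randint(60, 70) for _ in range(width)]


if __name__ == "__main__":
    seed = java_string_hash("Double Village")
    print(seed, toy_world(seed))
```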


The texts all say the same thing. Set the portal to ‘Double Village’ and soon you’ll find the exotic and lost desert villages. I put on the archaeotrancerebretron, grabbed my kit bag, and gritted my teeth. My companions all had theirs on too. We stepped into the charmed circle…















Interview by Ben Meredith, for his article on procedurally generated archaeology sims

I was interviewed by Ben Meredith for Kill Screen Magazine, on procedurally generated game worlds and their affinities with archaeology. The piece was published this morning. It’s a good read, and an interesting take on one of the more intriguing recent developments in gaming. I asked Ben if I could post the unedited exchange on which he drew for his article. He said ‘yes!’, so here it is.

Hi Ben,

It seems to me that archaeology and video games share a number of affinities, not least because both are procedurally generated. There is a method for field archaeology; follow the method, and you will have correctly excavated the site/surveyed the landscape/recorded the standing remains/etc. These procedures contain within them various ways of looking at the world, and emphasize certain kinds of values over others, which is why it is possible to have a Marxist archaeology, or a gendered archaeology, and so on. Thus, it also seems obvious to me that you can have an archaeology within video games (not to be confused with media archaeology, or an archaeology of video games). A great example of this kind of work is Andrew Reinhard’s exploration of the beta of Elder Scrolls Online; you should touch base with him, too. http://archaeogaming.wordpress.com/2014/01/22/beta-testing-archaeology-in-elder-scrolls-online-taken-down/

On to your questions!

What motivated you to become an archaeologist?

Romance, mystery, allure, the ‘other’, the desire to travel… my initial impetus for getting into archaeology comes from the fact that I’m ‘from the bush’ in rural Canada and as a teenager I wanted so much more from the world. I now recognize that there’s some amazing archaeology in my own backyard (as it were) but I was too young and immature to recognize it then. The Greek Bronze Age, the Mycenaean heroes, the Minoans, Thera… all these captured my imagination. And there was no snow!

Personally, what single facet of archaeology captures the spirit of the field most effectively?

Check out the work of Colleen Morgan http://middlesavagery.wordpress.com/2014/03/05/stop-saying-archaeology-is-actually-boring/ and Sophie Hay http://pompei79.wordpress.com/2014/03/05/scratching-the-surface/ and Lorna Richardson http://digipubarch.org/2014/03/14/all-the-swears-for-this/ If there is a ‘spirit of the field’, I think these three scholars capture it admirably. They are curious, reflective, aware of the impact that the doing of archaeology has in the wider world. Archaeology produces powerful narratives, powerful ways of framing our current situation regarding the past and the present. I aspire to be more like these three remarkable women.

Which game do you think, so far, best achieves this?

A hard question to answer. But I think I’d go with Minecraft, for its community and especially its ability to be adopted in educational circles, for the way it requires the player to build and engage with the environments created. The world is what you make it, in Minecraft. So too in archaeology.

If a game attempted to procedurally generate ancient civilizations, what do you think would be the three most important elements that had to be generated?

I’ve done a lot of agent-based simulation (http://www.graeworks.net/category/simulations/). Such a game would have to be built on an agent-based framework for the NPCs. Each NPC would have to be unique. The rules of behaviour that describe how the NPCs interact with each other, the environment, and the player would have to accurately capture the target ancient civilization. You can’t just have an ‘ancient civilization’; you’ll have to consider one very particular culture in one very particular time and place. That’s what a procedural rhetoric is all about: an argument in code about how this aspect of the world worked/is/existed.

Would investigation play an integral part in a video game interpretation?

I’m not sure I follow. Procedural generation on its own is still meaningless; it would have to be interpreted. The act of playing the game (see the work of Roger Travis on practomimesis at http://playthepast.org) sings it into existence.

Conversely, for you, would stumbling blindly upon a ruin diminish the effect?

If the world is procedurally generated, then there would be clues in the landscape that would attune the attentive player to the presence of the past in that location. If there is no rhyme or reason (we stumble blindly), then the procedures do not describe an ancient (or any) civilization.

Do you think an archaeology simulator would be best implemented in first person (e.g. Minecraft) or third person (e.g. Terraria)? Would it be more important to convey an intimate atmosphere or impressive scale?

I like first person, but on a screen, first person can induce nausea in the player. Maybe with an Oculus Rift that’s not a concern, in which case I’d say go first person! On a screen, I think third is better. Why not go AR and put your procedurally generated civilization into the local landscape?

Archeology versus Archaeology versus #Blogarch

I’m working on a paper that maps the archaeological blogosphere. I thought this morning it might be good to take a quick detour into the Twitterverse.


‘archaeology’ on twitter, april 7 2014

‘archaeology’ on twitter

Here we have every Twitter username, connected by referring to each other in a tweet. There’s a seriously strong spine of tweeting, but it doesn’t make for a unified graph. The folks keeping this clump all together, measured by betweenness centrality:


top replied-to


Photographer Klaus Leidorf’s Aerial Archaeology


Top hashtags:
archaeology 325
Pompeii 90
fresco 90
Archaeology 77
Herculaneum 40
Israel 24
nowplaying 20
roman 18
newslocker 16
Roman 14
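The graphs above link users who mention or reply to each other, and the hashtag counts keep ‘archaeology’ and ‘Archaeology’ as separate entries. A minimal sketch of how such edges and counts can be pulled from tweet text, assuming tweets arrive as hypothetical (author, text) pairs; real Twitter data carries structured mention and hashtag fields, and the network software I used may parse things differently.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w{1,15})")  # Twitter handles are at most 15 word characters
HASHTAG = re.compile(r"#(\w+)")


def mention_edges(tweets):
    """Count directed author -> mentioned-user edges from (author, text) pairs.

    Handles are lowercased so the same user is one node; self-mentions are skipped.
    """
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():
                edges[(author.lower(), target.lower())] += 1
    return edges


def hashtag_counts(tweets):
    """Count hashtags case-sensitively, so 'archaeology' and 'Archaeology' stay separate."""
    return Counter(tag for _, text in tweets for tag in HASHTAG.findall(text))


if __name__ == "__main__":
    sample = [  # hypothetical tweets
        ("alice", "RT @bob: #archaeology at #Pompeii"),
        ("bob", "@alice @carol #Archaeology"),
    ]
    print(mention_edges(sample))
    print(hashtag_counts(sample).most_common())
```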


Let’s look at American archeology, as signified by the dropped ‘e’.

‘archeology’ on twitter, april 7

An awful lot more fragmented: less popular consciousness of archaeology-as-a-community?
Top by betweenness centrality, the ones who hold this together:

Top urls:
Archeology girl


Top hashtags:

Top replied-to

#Blogarch on twitter

twitter search ‘#blogarch’ april 7 2014

And now, the archaeologists themselves, as indicated by #blogarch

We talk to ourselves – but with the nature of the hashtag, I suppose that’s to be expected?

Top by betweenness centrality

top urls

Top hashtags

Top replied to
electricarchaeo (yay me!)

Top mentioned:

Put them all together now…

And now, we put them all together to get ‘archaeology’ on the twitterverse today:

‘archaeology, archeology, and #blogarch’ on twitter, april 7

Visually, it’s apparent that the #blogarch crew are the ones tying together the wider Twitter worlds of archaeology & archeology, though it’s still pretty fragmented. There are 460 folks in this graph.

Top by betweenness centrality:


Top urls


top hashtags (not useful, given the nature of the search, right? But anyway)


Top word pairs in those largest groups:

archeology,professor 30
started,yesterday 21
yesterday,battle 21
battle,towton 21
towton,weapon 21
weapon,tests 21
tests,forensic 21
forensic,archeology 21
museum,archeology 19
blogging,archaeology 17

second group:
blogging,archaeology 13
future,blogging 12
archaeology,go 7
archaeology,future 7
archaeology,final 6
final,review 6
review,blogarch 6
hopes,dreams 6
dreams,fears 6
fears,blogging 6

third group:
space,age 6
age,archaeology 6
archaeology,future 6
future,know 6
know,going 6
saa2014,blogarch 6
going,blogarch 5
blogarch,post 3
post,future 3
future,blogging 3

fourth group:
easterisland,ancient 10
ancient,mystery 10
mystery,easter 10
easter,slave 10
slave,history 10
history,esoteric 10
esoteric,archeology 10
archeology,egypt 10
rt,illumynous 9
illumynous,easterisland 9

fifth group:
costa,rica 8
rt,archeologynow 7
archeologynow,modern 4
modern,archeology 4
archeology,researching 4
researching,dive 4
dive,bars 4
bars,costa 4
rica,costa 4
rica,star 4

(Once I saw ‘bars’, I stopped. Archaeological stereotypes, maybe.)
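The pair counts above came out of the network-analysis software; I don’t know its exact tokenizing rules, so what follows is only an approximation of the idea: lowercase the text, split on anything that isn’t a letter or digit, drop a small stopword list (a hypothetical one, chosen for illustration), and count adjacent word pairs in order. Note that ‘rt’ survives, as it does in the lists above, and that ‘costa,rica’ and ‘rica,costa’ count as different pairs.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "to", "and", "for", "is"}


def word_pairs(texts, stopwords=STOPWORDS):
    """Count ordered adjacent word pairs across texts, after removing stopwords."""
    pairs = Counter()
    for text in texts:
        words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stopwords]
        pairs.update(zip(words, words[1:]))
    return pairs


if __name__ == "__main__":
    sample = ["Forensic archeology tests at the battle of Towton"]  # hypothetical tweet
    for (first, second), n in word_pairs(sample).most_common():
        print(f"{first},{second} {n}")
```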

Top mentioned in the entire graph

illumynous 9 bonesdonotlie 8
drspacejunk 8 drkillgrove 4
bonesdonotlie 8 capmsu 4
archeologynow 7 yagumboya 3
openaccessarch 7 drspacejunk 3
macbrunson 6 archeowebby 3
swbts 6 allarchaeology 3
archeowebby 6 openaccessarch 3
algenpfleger 5 cmount1 3
youtube 5 brennawalks 2

So what does this all mean? Answers on a postcard, please…

(My network files will be on figshare.com eventually).

Topic modeling the things that fell out of pockets

Modern Districts by Modularity, overlain with hand-drawn 1st century civitas boundaries

Topic modeling is very popular at the moment in the digital humanities. Ian, Scott and I described them as tools for extracting topics or injecting semantic meaning into vocabularies: “Topic models represent a family of computer programs that extract topics from texts. A topic to the computer is a list of words that occur in statistically meaningful ways. A text can be an email, a blog post, a book chapter, a journal article, a diary entry – that is, any kind of unstructured text” (Graham, Weingart, and Milligan 2012). In that tutorial, ‘unstructured’ means that there is no encoding in the text by which a computer can model any of its semantic meaning.

But there are topic models of ships’ logs, of computer code. So why not archaeological databases?

Archaeological datasets are rich, largely unstructured bodies of text. While there are examples of archaeological datasets that are coded with semantic meaning through XML and Text Encoding Initiative practices, many of these are done after the fact of excavation or collection. Day to day, things can be rather different, and this material can be considered ‘largely unstructured’ despite the use of databases, controlled vocabularies, and other means of maintaining standardized descriptions of what is excavated, collected, and analyzed. This is because of the human factor. Not all archaeologists are equally skilled. Not all data gets recorded according to the standards. Where some see few differences in a particular clay fabric type, others might see many, and vice versa. Archaeological custom might call a particular vessel type a ‘casserole’, thus suggesting a particular use, only because when that vessel type was first encountered in the 19th century it reminded the archaeologist of what was in his kitchen; there is no necessary correlation between what we as archaeologists call things and what those things were originally used for. Further, once data is recorded (and the site has been destroyed through the excavation process), we tend to analyze these materials in isolation. That is, we write our analyses based on all of the examples of a particular type, rather than considering the interrelationships amongst the data found in the same context or locus.

David Mimno in 2009 turned the tools of data analysis on the databases of household materials recovered and recorded room by room at Pompeii. He treated each room as a ‘document’ and the artefacts therein as the ‘tokens’ or ‘words’ within that document, for the purposes of topic modeling. The resulting ‘topics’ of this analysis are what he calls ‘vocabularies’ of object types, which, taken together, can suggest the mixture of functions particular rooms may have had in Pompeii. He writes, ‘the purpose of this tool is not to show that topic modeling is the best tool for archaeological investigation, but that it is an appropriate tool that can provide a complement to human analysis….mathematically concrete in its biases’. The ‘casseroles’ of Pompeii turn out to have nothing to do with food preparation, in Mimno’s analysis. To date, I believe this is the only example of topic modeling applied to archaeological data.
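To make the room-as-document idea concrete, here is a minimal sketch that turns (room, artefact type) records into one bag of words per room, then writes them one per line in the ‘name label text’ layout that, as I understand it, MALLET’s import-file command reads by default. The room identifiers and the placeholder label `X` are hypothetical; this is not Mimno’s data or code.

```python
from collections import Counter


def room_documents(finds):
    """Treat each room as a document and each artefact type as a word.

    finds is an iterable of (room_id, artefact_type) pairs, one per recorded object.
    Multi-word type names are joined with underscores so each stays a single token.
    Returns room_id -> Counter of type tokens (a bag of words per room).
    """
    docs = {}
    for room, artefact_type in finds:
        token = "_".join(artefact_type.lower().split())
        docs.setdefault(room, Counter())[token] += 1
    return docs


def mallet_lines(docs):
    """Yield one instance per line: name, a placeholder label, then the tokens."""
    for room, bag in sorted(docs.items()):
        tokens = " ".join(t for t, n in sorted(bag.items()) for _ in range(n))
        yield f"{room} X {tokens}"


if __name__ == "__main__":
    finds = [  # hypothetical records
        ("I.10.1_room3", "casserole"),
        ("I.10.1_room3", "bronze lamp"),
        ("I.10.1_room3", "casserole"),
    ]
    for line in mallet_lines(room_documents(finds)):
        print(line)
```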

Directly inspired by that example, I’ve been exploring the use of topic models on another rich archaeological dataset, the Portable Antiquities Scheme database in the UK. The Portable Antiquities Scheme is a project “to encourage the voluntary recording of archaeological objects found by members of the public in England and Wales”. To date, there are over half a million unique records in the Scheme’s database. These are small things, things that fell out of pockets, things that often get found via metal-detecting.

Here’s what I’ve been doing.

1. I downloaded a nightly dump of the PAS data back in April; it came as a csv file. I opened the file and discovered over a million lines of records. Upon closer examination, I think the problem has to do with the encoding: there are line breaks, carriage returns, and other non-printing characters (as well as commas used within fields), so that when I open the file a single record (say, a coin hoard) ends up occupying tens of lines, or its fields shift at the extraneous commas.

2. I cleaned this data up using Notepad++ and liberal use of regular expressions to put everything back together again. The entire file is something like 385 MB.

3. I imported it into MS Access so that I could begin to filter it. I’ve been playing with Palaeolithic, Mesolithic, and Neolithic records; Bronze Age records; and Roman records. The Roman material alone occupies somewhere around 100 000 unique records.

4. I exported my queries so that I would have a simpler table with dates, descriptions, and measurements.

5. I filtered this table in Excel so that I could copy and paste out all of the records found within a particular district (which left me with a folder of 275 files, totalling something like 25 MB of text).

6. Meanwhile, I began topic modeling the unfiltered total PAS database (just after #2 above). Each run takes about 3 hours, as I’ve been running diagnostics to explore the patterns. The problem I have here though is what, precisely, am I finding? What does a cluster of records who share a topic actually mean, archaeologically? Do topics sort themselves out by period, by place, by material, by finds officer…?

7. As that’s been going on, I’ve been topic modeling the folders that contain the districts of England and Wales for a given period. Let’s look at the Roman period.
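Steps 1 to 5 could also be collapsed into a single script. A minimal sketch in Python, assuming the dump quotes any field that contains commas or line breaks (if it doesn’t, no parser can reliably recover the records, and regex surgery like my Notepad++ session is the fallback). The column names `district` and `description` are placeholders; I haven’t checked them against the actual PAS export.

```python
import csv
import io
import re
from collections import defaultdict
from pathlib import Path


def clean_field(value):
    """Turn embedded line breaks and other control characters into single spaces."""
    return re.sub(r"\s+", " ", re.sub(r"[\x00-\x1f\x7f]", " ", value)).strip()


def records_by_district(csv_text, district_field="district", text_fields=("description",)):
    """Group cleaned record text by district.

    The csv module understands quoted fields containing commas or newlines, so a
    multi-line record comes back as one row rather than being split across lines.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        district = clean_field(row.get(district_field) or "unknown")
        text = " ".join(clean_field(row.get(f) or "") for f in text_fields)
        groups[district].append(text)
    return groups


def write_district_files(groups, out_dir):
    """Write one plain-text file per district, one record per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for district, texts in groups.items():
        safe_name = re.sub(r"[^\w-]+", "_", district.lower())
        (out / f"{safe_name}.txt").write_text("\n".join(texts), encoding="utf-8")
```

A folder of per-district files like this is the shape MALLET’s directory import expects, one document per file.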

There are 275 files; a handful have *a lot* of data (> 1000 KB), while the vast majority are fairly small (< 100 KB). Perhaps that replicates patterns of metal detecting; see Bevan on biases in the PAS. The remaining districts seem to have no records in the database, so I’ve got 80% coverage for all of England and Wales. I’ve been iterating over all of this data, so I’ll just describe the most recent run, as it seems to be a typical result. Using MALLET 2.0.7, I made a topic model with 50 topics (and turned on hyperparameter optimization with the ‘optimize interval’ option, to shake out the useful from the not-so-useful topics). Last night, as I did this, the topic diagnostics package just wouldn’t work for me (you run it from the MALLET directory, but it lives at the MALLET site; perhaps they were working on it). So I’ll probably want to run all of these again.

If I sort the topic keys by their prominence (see ‘optimize interval’), the top 14 all seem to describe different kinds of objects: brooches, denarii, nummus, sherds, lead weights, radiates, coin dates, and the ‘heads’ sides of coins (which emperor). Then we get to the next topic, which reads: “record central database recording usual standards fall created scheme aware portable began antiquities rectify working corroded ae worn century”. This meta-note about data quality appears throughout the database, and refers to materials collected before the Scheme got going.
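Sorting by prominence is short work once the keys file is parsed. A minimal sketch, assuming the tab-separated layout MALLET writes to its `--output-topic-keys` file when hyperparameter optimization is on: topic number, the topic’s Dirichlet parameter, then its top words. A larger parameter means the topic is more prevalent across the corpus.

```python
def topics_by_prominence(lines):
    """Sort MALLET topic-key lines by Dirichlet parameter, largest first.

    Assumes tab-separated lines: topic id, parameter, space-separated keywords.
    Returns a list of (topic_id, parameter, [keywords]).
    """
    topics = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        topic_id, alpha, words = line.rstrip("\n").split("\t", 2)
        topics.append((int(topic_id), float(alpha), words.split()))
    return sorted(topics, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    import sys

    # usage: python sort_keys.py topic-keys.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for topic_id, alpha, words in topics_by_prominence(f):
            print(f"{topic_id:>3} {alpha:.4f} {' '.join(words[:10])}")
```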

After that, the remaining topics all seem to deal with the epigraphy of coins, and the various inscriptions, figurative devices, their weights & materials. A number of these topics also include allusions to the work of Guest and Wells, whose work on Iron Age Coins is frequently cited in the database.

Let’s look at the individual districts now, and how these topics play out over geographic space. Given that these are modern districts, it’d perhaps be better to do this over again with the materials sorted into geographic entities that make sense from a Roman perspective. Perhaps do it by major Roman roads (sorting the records so that the districts Watling Street traverses are gathered into a single document). Often, when people want to visualize the patterns of topic interconnections in a corpus, they trim the composition document so that only topics above a certain threshold are imported into a package like Gephi.

My suspicion is that that would throw out a lot of useful data. It may be that it’s the very weak connections that matter. A very strong topic-document relationship might just mean that a coin hoard found in the area is blocking the other signals.

In which case, let’s bring the whole composition document into Gephi. Start with this:

adur 4 0.238806 15 0.19403 22 0.179104 13 0.119403 17 0.089552

and delete out the edge weights. (I’m trying to figure out how to do what follows without deleting those edge weights, but bear with me.)

You end up with something like this:

adur 4 15 22  […etc…]

Save the file with a new name, as csv.

Open in Notepad++ (or similar) and replace the commas with ;

Go to Gephi. Under ‘open graph file’, select your csv file. This is not the same as ‘import spreadsheet’ under the data table tab. You can open a comma-separated file where the first item on a line is a node, and each subsequent item is another node to which it is attached. If you tried to open that file with the ‘import spreadsheet’ button, you’d get an error message: in that dialogue, you need two columns, source and target, where each row describes a single relationship. See the difference?

This is why, if you left the edge weights in the csv file (let’s call it an adjacency file), you’d end up with weights becoming nodes, which is a mess. If you want to keep the weights, you have to use the second option.

I’ve tried it both ways. Ultimately, while the first option is much, much faster, the second option is the one to go for, because the edge weights (the proportion of a document made up by a topic) are extremely important. So I created a single edge list that included up to seven topic-weight pairs for each district. (This doesn’t create a graph where k=7, because not every document had that many topics. But why seven? In truth, after that point the topics all seemed to be well under 1% of each document’s composition.)
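A weighted edge list of this kind might be produced as follows. This is a minimal sketch, not the exact procedure used here: it assumes the cleaned `<district> <topic> <weight> ...` line format, assumes each line's pairs are already sorted by descending weight (as the 'adur' line is), and uses the Source, Target, Weight column names that Gephi's spreadsheet importer recognizes.

```python
import csv
import io

def weighted_edges(line, k=7):
    """Turn '<doc> <topic> <weight> ...' into up to k (Source, Target, Weight) rows."""
    parts = line.split()
    doc = parts[0]
    # assumes pairs are already in descending order of weight
    pairs = list(zip(parts[1::2], parts[2::2]))[:k]
    return [(doc, topic, float(weight)) for topic, weight in pairs]

def write_edge_list(lines, out, k=7):
    """Write a header row, then every district's weighted topic edges."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for line in lines:
        writer.writerows(weighted_edges(line, k))

buf = io.StringIO()
write_edge_list(["adur 4 0.238806 15 0.19403 22 0.179104"], buf)
print(buf.getvalue())
```

Documents with fewer than seven topics simply contribute fewer rows, which is why the result is not a graph where every district has degree seven.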

With me so far? Great.

Now that I have a two-mode network in Gephi, I can begin to analyze the pattern of topics in the documents. Using the multi-mode plugin, I separate this network into two one-mode networks: topic-to-topic (linked by appearing in the same district) and district-to-district (linked by sharing the same topics, in different strengths).
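The idea of splitting a two-mode network into two one-mode networks can be sketched in a few lines. The weighting rule here (sum, over shared neighbours, of the product of the two edge weights, i.e. A·Aᵀ and Aᵀ·A) is one common convention and an assumption on my part; it is not necessarily what the Gephi multi-mode plugin computes. The district names and weights are toy values, not the real data.

```python
from collections import defaultdict
from itertools import combinations

def project(edges):
    """Project weighted (district, topic, weight) edges onto each mode.

    Assumed rule: link weight = sum over shared neighbours of the product
    of the two edge weights.
    """
    by_topic = defaultdict(dict)     # topic -> {district: weight}
    by_district = defaultdict(dict)  # district -> {topic: weight}
    for district, topic, w in edges:
        by_topic[topic][district] = w
        by_district[district][topic] = w

    def fold(groups):
        out = defaultdict(float)
        for members in groups.values():
            # every pair of nodes sharing this neighbour gets linked
            for (a, wa), (b, wb) in combinations(sorted(members.items()), 2):
                out[(a, b)] += wa * wb
        return dict(out)

    # districts linked via shared topics; topics linked via shared districts
    return fold(by_topic), fold(by_district)

edges = [("adur", 4, 0.5), ("adur", 15, 0.2), ("arun", 4, 0.4), ("arun", 22, 0.3)]
district_net, topic_net = project(edges)
# district_net -> {('adur', 'arun'): 0.2}
```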

Network visualization doesn’t offer anything useful here (although Wales always stands out quite distinctly when you do visualize it; that’s because of the coin hoards). Instead, I simply compute useful network metrics. For instance, ‘betweenness’ measures how often a node lies on the shortest paths between all other pairs of nodes. In a network of words drawn from a text, high-betweenness words do the heavy semantic lifting. So identifying the topics that are most ‘in between’ in the topic-topic network should be a useful thing to do. But what does betweenness imply for the district-district network? I’m not sure yet. Pivotal areas in the formation of material culture?
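Gephi's statistics panel computes betweenness for you; the sketch below only illustrates what is being counted. It follows Brandes' algorithm for an unweighted, undirected graph (edge weights are ignored here), and the toy graphs are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes-style betweenness for an unweighted, undirected graph.

    adj maps each node to a set of neighbours. For every pair of other nodes
    (s, t), each node v is credited with the fraction of shortest s-t paths
    that pass through it; the result is the raw (unnormalized) sum.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                     # breadth-first search outward from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                     # back-propagate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # in an undirected graph each pair is counted once from each end
    return {v: c / 2 for v, c in bc.items()}

# toy path graph a - b - c: only b lies between another pair
print(betweenness({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```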

What is perhaps more useful is ‘modularity’. It’s just one of a number of algorithms one could use to try to find structural sub-groups in a network (NodeXL has many more). But perhaps there are interesting geographical patterns to be found if we examine the pattern of links. So I ran modularity and uploaded the results to OpenHeatMap to visualize them geographically. Network analysis doesn’t need to produce network visualizations, by the way.
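Modularity is, at bottom, a score for a given division of the network into groups: the share of edges falling inside groups, minus the share expected if edges were placed at random with the same degrees. Gephi searches for a partition that scores well; the sketch below only scores a partition you supply, for an unweighted, undirected toy graph (hypothetical data).

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a given partition of an undirected, unweighted graph.

    Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the
    number of edges, L_c the edges inside c, and d_c the total degree of c.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# toy graph: two triangles joined by a single bridge edge (c-d)
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, groups), 4))  # 0.3571
```

Putting every node in one group gives Q = 0, so positive values indicate more internal linking than chance would predict.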

See the result for yourself here: http://www.openheatmap.com/embed.html?map=AnteriorsFrijolsHermetists

It colours each district based on the group it belongs to. If you mouse over a district, it’ll give you that group’s number; the numbers are just labels and shouldn’t be read as anything else. I’d normally do this in QGIS, but this was quicker for getting a sense of what’s going on.

I asked on Twitter (referencing a slightly earlier version of the map) whether these patterns suggested anything to any of the Romano-British crowd.


Modularity for the topic-topic network also suggests some interesting groupings, but these seem to mirror what one would expect from their prominence in the keys.txt file. So that’s where I am now, soon to try out Phil’s suggestion.

As Paul Harvey was wont to say, ‘…and now you know… the REST of the story’.  At DH2013 I hope to be able to tell you what all of this may mean.

Hodder’s ‘Tanglegram’ as Network

Hodder’s fig 9.2 as network

I am reading Ian Hodder’s book, ‘Entangled: An Archaeology of the Relationship between Humans and Things’. Hodder writes that the tanglegram cannot be represented as a network, since a network doesn’t consider the nature of the relationships or the nodes. This is not, in fact, the case. Representing these complex relationships as a network is quite possible, and it allows the ‘tanglegram’ to become an object to query in its own right, rather than a suggestive illustration. I’ve uploaded the network data to Figshare.

I used NodeXL to enter the data. If there was a bidirectional tie, I made two entries: A -> B and B -> A. If it was only one way, I entered it with the directionality of the original tanglegram. I saved it as a .net file, opened it in Gephi, and ran Gephi’s statistics.
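For anyone without NodeXL to hand, the same encoding rule (two arcs for a bidirectional tie, one arc otherwise) can be sketched as a small script that writes a Pajek .net file, which Gephi can open. The ties below are hypothetical toy entries, not read off Hodder’s figure, and the function name is illustrative.

```python
def to_pajek(ties):
    """Write (source, target, bidirectional) ties as a Pajek .net string of arcs."""
    names = sorted({n for s, t, _ in ties for n in (s, t)})
    index = {name: i + 1 for i, name in enumerate(names)}  # Pajek vertex ids are 1-based
    arcs = []
    for source, target, both_ways in ties:
        arcs.append((index[source], index[target]))
        if both_ways:  # bidirectional tie: record A -> B and B -> A
            arcs.append((index[target], index[source]))
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{index[n]} "{n}"' for n in names]
    lines.append("*Arcs")
    lines += [f"{a} {b}" for a, b in arcs]
    return "\n".join(lines) + "\n"

ties = [("hearth", "fuel", False), ("clay", "oven", True)]
print(to_pajek(ties))
```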

This was all rather rough and ready: I was working from a blown-up photocopy of the original figure, and I’m trying to get ready for a trip, so there could be errors. One would need Hodder’s original data to do this properly, but I offer it up here to show that it’s possible, and indeed worthwhile: why else would you bother drawing a tanglegram, if not to use it to help your analysis?

In the image, I resize the nodes to represent betweenness centrality (which elements of the tanglegram are doing the heavy lifting?) and recolour them according to modularity. Modularity finds five groups (nodes listed in descending order of betweenness centrality):

Group 0: house, groundstone, burial, plaster, figurines, pigment, skins, painting, personal artefacts, animal heads, food storage, human heads, special food, human body parts, burials, storage rooms, bins

Group 1: hoard, chipped stone, sheep, mats, dung, wild animals, fields, bone, cereals, wooden object, weeds

Group 2: food, hearth, fuel, ash, clay balls, oven, traps, wood

Group 3: clay, baskets, extraction pits, wetland, reeds, birds, dryland, marl, ditches, fish, clean water, landscape, field, eggs

Group 4: midden, dogs, colluvium, mortar, pen, mudbrick
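Lists like these can be generated rather than read off the image. This is a minimal sketch, assuming you have each node’s modularity class and betweenness score as dictionaries (for instance from the columns Gephi adds to its node table after running the statistics); the scores below are hypothetical placeholders, not the real values.

```python
from collections import defaultdict

def groups_by_betweenness(modularity_class, betweenness):
    """List each modularity group's nodes in descending order of betweenness."""
    groups = defaultdict(list)
    for node, group in modularity_class.items():
        groups[group].append(node)
    return {g: sorted(nodes, key=lambda n: betweenness[n], reverse=True)
            for g, nodes in sorted(groups.items())}

# hypothetical scores standing in for exported Gephi node-table columns
classes = {"midden": 4, "dogs": 4, "pen": 4, "food": 2, "hearth": 2}
scores = {"midden": 12.0, "dogs": 3.5, "pen": 0.0, "food": 20.0, "hearth": 8.0}
for group, nodes in groups_by_betweenness(classes, scores).items():
    print(f"Group {group}: {', '.join(nodes)}")
# Group 2: food, hearth
# Group 4: midden, dogs, pen
```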

Seems quite suggestive! To explore the files for yourself, please see:

Hodder’s Figure 9.2, Entangled, as network. Shawn Graham. figshare.

Retrieved 17:47, Mar 19, 2013 (GMT)