SAA 2015: Macroscopic approaches to archaeological histories: Insights into archaeological practice from digital methods

Ben Marwick and I are organizing a session for the SAA2015 (the 80th edition, this year in San Francisco) on “Macroscopic approaches to archaeological histories: Insights into archaeological practice from digital methods”. It’s a pretty big tent. Below is the session ID and the abstract. If this sounds like something you’d be interested in, why don’t you get in touch?

Session ID 743.

The history of archaeology, like most disciplines, is often presented as a sequence of influential individuals and a discussion of their greatest hits in the literature.  Two problems with this traditional approach are that it sidelines the majority of participants in the archaeological literature who are excluded from these discussions, and it does not capture the conversations outside of the canonical literature.  Recently developed computationally intensive methods as well as creative uses of existing digital tools can address these problems by efficiently enabling quantitative analyses of large volumes of text and other digital objects, and enabling large scale analysis of non-traditional research products such as blogs, images and other media. This session explores these methods, their potentials, and their perils, as we employ so-called ‘big data’ approaches to our own discipline.


Assessing my upcoming seminar on the Illicit Antiquities trade, HIST4805b

So I’m putting together the syllabus for my illicit antiquities seminar. This is where I think I’m going with the course, which starts in less than a month (eep!). The first part is an attempt to revitalize my classroom blogging, and to formally tie it into the discussion within the classroom – that is, something done in advance of class in order to make the classroom discussion richer. In the second term, I want to make as much time as possible for students to pursue their own independent research, which I’m framing as an ‘unessay’ following the O’Donnell model.


Daylight: The Journal of #HIST4805b Studying Looted Heritage

Rationale: What we are studying is important, and what we are learning needs to be disseminated as widely as possible. In a world where ‘American Diggers’ can be a TV show, where National Geographic (for heaven’s sake!) can seriously contemplate putting on a show that desecrates war dead for entertainment, there is a need to shed daylight. The fall term major assessment piece does this. You will be writing and curating a Flipboard magazine that ties our readings and discussions into the current news regarding heritage crime.

There are a number of steps to this.

  1. Each week, everyone logs into our crowdmap and puts three new reports on the map.
  2. Each week, a different subset of the class will be the lead editors for our journal.
    1. lead editors each write an editorial that explores the issues raised in the readings, with specific reference to new reports on our crowdmap. Editorials should be 750-1000 words long.
    2. lead editors curate the Flipboard magazine so that it contains:
      1. the editorials
      2. the crowdmap reports
      3. the readings
  3. This should be completed before Monday’s class where we will discuss those readings. The lead editors will begin the class by discussing their edition of Daylight.*
  4. Each student will be a lead editor three times.

At the end of term you will nominate your two best pieces for grading. I will grade these for how you’ve framed your argument, for your use of evidence, and for your understanding of the issues. I will also take into account your in-class discussion of your edition of Daylight.

At the end of term you will also nominate two of your peers’ best pieces for consideration for bonus, with a single line explaining why.

This is worth 40% of your final grade.


The Unessay Research Project

‘Unessay’ noun - as described by Daniel Paul O’Donnell,

“[...] the unessay is an assignment that attempts to undo the damage done by [traditional essay writing at the university level]. It works by throwing out all the rules you have learned about essay writing in the course of your primary, secondary, and post secondary education and asks you to focus instead solely on your intellectual interests and passions. In an unessay you choose your own topic, present it any way you please, and are evaluated on how compelling and effective you are.”

Which means for us:

The second term is an opportunity for exploration, and for you to use the time that you would normally spend in a classroom listening as time for active planning, researching, and learning the necessary skills to effectively craft an ‘unessay’ of original research on a topic connected with the illicit antiquities trade. I will put together a schedule for weekly one-on-one or small group meetings where I can help you develop your project.

For this to work, you will have to come prepared to these meetings. This means keeping a research journal to which I will have access. You may choose to make this publicly accessible as well (and we’ll talk about why and how you might want to do that).  Periodically, we will meet as an entire class to discuss the issues we are having in our research. You will present your research formally to the class and invited visitors at the end of term – your project might not be finished at that point, but your presentation can take this into account. The project is due on the final day of term.


Pass/Fail: Research Journal (ie, no complete research journal, no assessment for this project). We will discuss what is involved in a research journal. A Zotero library with notes would also be acceptable.

5% Presentation in class

45% Project

O’Donnell writes,

“If unessays can be about anything and there are no restrictions on format and presentation, how are they graded?

The main criteria is how well it all fits together. That is to say, how compelling and effective your work is.

An unessay is compelling when it shows some combination of the following:

  • it is as interesting as its topic and approach allows
  • it is as complete as its topic and approach allows (it doesn’t leave the audience thinking that important points are being skipped over or ignored)
  • it is truthful (any questions, evidence, conclusions, or arguments you raise are honestly and accurately presented)

In terms of presentation, an unessay is effective when it shows some combination of these attributes:

  • it is readable/watchable/listenable (i.e. the production values are appropriately high and the audience is not distracted by avoidable lapses in presentation)
  • it is appropriate (i.e. it uses a format and medium that suits its topic and approach)
  • it is attractive (i.e. it is presented in a way that leads the audience to trust the author and his or her arguments, examples, and conclusions).”


So that's what I'm going with. I'm not giving points out for participation, as that never has really worked for me. There will of course be much more going on in the classroom than just what is described here, including technical tutorials on various digital tools that I think are useful and beta-testing some other things, but my thinking is that these will see their expression in the quality of the independent research that takes place in the Winter term.

So Fall term: much reading, much discussion. Winter term: self-direction along trajectories established in the Fall.

Desert Island Archaeologies

You’ve been castaway on an uncharted desert isle… but friendly dolphins deposit a steamer trunk full of books on the shore to keep you occupied, the exact ten you’d pick. Thus the premise of Lorna Richardson’s new public archaeology project: Desert Island Archaeologies. Turns out, I was the first castaway. You can read my ten picks alongside those of other castaways, or just keep reading here.

Let’s see. Ah. Here we go. Goodness: the exact ten books I would want to be reading. First up: Ray Laurence, Roman Pompeii: Space and Society, 1994. This was the book that convinced me to go to grad school – we had a whole seminar built on it in my final year, back in ’96. It was unlike anything else I was reading as an undergraduate, and showed me that there were ways of looking at something as well-trod as Pompeii that were completely askew of what I’d come to expect. The geek in me loved the space-syntax, the way of reading street life. Hell, it was fun!

Next, Stephen Shennan, Genes, Memes and Human History – Darwinian Archaeology and Cultural Evolution (2002). By the time I came across this, I was getting very much into complex systems and simulation, and this was something that helped me make sense of what I was doing. And it’s a fun read. Oh look, here’s Amanda Claridge’s ‘Rome: An Oxford Archaeological Guide’ (1998). I hear Amanda’s dry wit every time I open this thing. This was my constant companion on my first trip to Rome. I can’t imagine going there without it.

What else, what else… It’s interesting how nostalgic I am about these items. Each one seems tied to a particular chapter of my life. Matthew Johnson’s ‘Archaeological Theory’ (1999) still makes me laugh and provides guidance through the thorny thickets of theory. Sybille Haynes’ ‘Etruscan Civilization’ is a treat for sore eyes, filled with the beauty and magic of that people. I expect it can also be used for self-defence, in case of wild animal attack on this island. I used it for the first class I ever taught, at the school of continuing education at Reading.

Harry Evans, ‘Water Distribution in Ancient Rome’ (1997) reminds me of adventures through the Roman countryside on a dangerously lunatic vespa, trying to identify the standing ruins, with A. Trevor Hodge’s ‘Roman Aqueducts and Water Supply’ (1992) in the other hand. Hodge’s book was a bible for me while writing my MA; I had the opportunity to meet Hodge at Carleton University shortly after I started working there. Sadly, a trivial over-long meeting prevented that from happening. Hodge died later that week. I will regret that always.

Back to Ray Laurence. The man has had a profound impact on me as a scholar. His ‘Roads of Roman Italy: Mobility and Cultural Change’ (1999) and all that space-economy stuff: fantastic! Totally connected with the ORBIS simulation of the Roman world by Meeks and Scheidel, by the way, in terms of how it changes our perspective on the Roman world (ORBIS isn’t a book, but maybe there’s a tablet in this steamer trunk somewhere?). In the intro to Roads of Roman Italy, Laurence mentions my name, which was the first time I’d seen my name in print, in an academic context. A real thrill! No less of a thrill than how I came to be mentioned in the first place: driving the British School at Rome’s death-trap ducato for Ray as we explored the remains of the Roman roads in the outskirts of town. If there is no tablet in this steamer trunk (with wifi provided by an unseen Google blimp, obviously), I think the ‘Baths of Caracalla’ by Janet DeLaine (1997) might be buried down here somewhere… ah, here it is. When I first pitched my MA idea to Janet, she kept finishing my sentences. I wanted to do a quantity survey of the Roman aqueducts. Turned out, she was waaaaay ahead of me. She let me use the manuscript to this as I puttered away on the Aqua Claudia and the Anio Novus. It’s actually quite a fun read, especially when you start thinking about nuts-and-bolts type questions like, how the hell did they build this damned thing anyway?

Final book? It’s not archaeological, but it’s a good read. ‘Complexity: A Guided Tour’ by Melanie Mitchell, 2011. I’m quite into simulation and games, and the emergent behaviours of both AI and humans when they conspire together to create (ancient) history (as distinct from the past). That’s a whole lot of interdisciplinariness, so this volume by Mitchell always provides clarity and illumination.

The Web of Authors for Wikipedia’s Archaeology Page

I’m playing with a new toy, WikiImporter, which allows me to download the network of authorship on MediaWiki-powered sites. I fired it up and set it to grab the user-article network, which the tool describes as follows: “The Hyperlink Coauthorship network will analyze all the links found in the seed article and create an edge between each user that edited the article found in that link and the article”.
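For anyone wondering what a two-mode (user-article) network looks like as data, here is a minimal Python sketch. It does not use WikiImporter and does not touch Wikipedia: the article titles and usernames are made up, and `build_two_mode_edges` and `project_to_coauthors` are hypothetical helper names of my own, not anything the tool provides.

```python
from collections import defaultdict


def build_two_mode_edges(editors_by_article):
    """Return sorted (user, article) edges from a mapping of article -> editors."""
    edges = set()
    for article, users in editors_by_article.items():
        for user in users:
            edges.add((user, article))
    return sorted(edges)


def project_to_coauthors(edges):
    """Collapse the two-mode network into user-user ties.

    The weight of a tie is the number of articles both users edited.
    """
    users_by_article = defaultdict(set)
    for user, article in edges:
        users_by_article[article].add(user)

    weights = defaultdict(int)
    for users in users_by_article.values():
        ordered = sorted(users)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                weights[(first, second)] += 1
    return dict(weights)


# Hypothetical sample data, for illustration only.
sample = {
    "Archaeology": ["UserA", "UserB"],
    "Excavation": ["UserB", "UserC"],
    "Stratigraphy": ["UserA", "UserB"],
}

edges = build_two_mode_edges(sample)
coauthors = project_to_coauthors(edges)

for (first, second), weight in sorted(coauthors.items()):
    print(first, second, weight)
```

The projection step is optional; it is only there to show how a two-mode edge list can be turned into a one-mode network of editors if that is the question you care about.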

Naturally, I pointed it at ‘archaeology’ on Wikipedia. I’ve posted the resulting two-mode network on figshare for all and sundry to analyze.

I also asked it to download the article to article links (which is slightly different than my spidering results, as my spiders also included the wiki pages themselves, like the ‘this page is a stub’ or ‘this page needs citations’, which gives me an interesting perspective on the quality of the articles. More on that another day). This file is also on figshare here.

Just remember to cite the files. Enjoy!


Interview by Ben Meredith, for his article on procedurally generated archaeology sims

I was interviewed by Ben Meredith on procedurally generated game worlds and their affinities with archaeology, for Kill Screen Magazine. The piece was published this morning. It’s a good read, and an interesting take on one of the more interesting recent developments in gaming. I asked Ben if I could post the unedited communication we had, which he drew on for his article. He said ‘yes!’, so here it is.

It seems to me that archaeology and video games share a number of affinities, not least of which because they are both procedurally generated. There is a method for field archaeology; follow the method, and you will have correctly excavated the site/surveyed the landscape/recorded the standing remains/etc. These procedures contain within them various ways of looking at the world, and emphasize certain kinds of values over others, which is why it is possible to have a Marxist archaeology, or a gendered archaeology, or so on. Thus, it also seems obvious to me that you can have an archaeology within video games (not to be confused with media archaeology, or an archaeology of video games). A great example of this kind of work is Andrew Reinhard’s exploration of the beta of Elder Scrolls Online – you should touch base with him, too.

On to your questions!

What motivated you to become an archaeologist?

Romance, mystery, allure, the ‘other’, the desire to travel… my initial impetus for getting into archaeology comes from the fact that I’m ‘from the bush’ in rural Canada and as a teenager I wanted so much more from the world. I now recognize that there’s some amazing archaeology in my own backyard (as it were) but I was too young and immature to recognize it then. The Greek Bronze Age, the Mycenaean heroes, the Minoans, Thera… all these captured my imagination. And there was no snow!

Personally, what single facet of archaeology captures the spirit of the field most effectively?

Check out the work of Colleen Morgan and Sophie Hay and Lorna Richardson. If there is a ‘spirit of the field’, I think these three scholars capture it admirably. They are curious, reflective, aware of the impact that the doing of archaeology has in the wider world. Archaeology produces powerful narratives, powerful ways of framing our current situation regarding the past and the present. I aspire to be more like these three remarkable women.

Which game do you think, so far, best achieves this?

A hard question to answer. But I think I’d go with Minecraft, for its community and especially its ability to be adopted in educational circles, for the way it requires the player to build and engage with the environments created. The world is what you make it, in Minecraft. So too in archaeology.

If a game attempted to procedurally generate ancient civilizations, what do you think would be the three most important elements that had to be generated?

I’ve done a lot of agent-based simulation. Such a game would have to be built on an agent-based framework, for the NPCs. Each NPC would have to be unique. Those rules of behaviours that describe how the NPCs interact with each other, the environment, and the player would have to accurately capture the target ancient civilization. You can’t just have an ‘ancient civilization’; you’ll have to consider one very particular culture in one very particular time and place. That’s what a procedural rhetoric is all about: an argument in code about how this aspect of the world worked/is/existed.

Would investigation play an integral part in a video game interpretation?

I’m not sure I follow. Procedural generation on its own still is meaningless; it would have to be interpreted. The act of playing the game (and see the work of Roger Travis on practicomimetics) sings it into existence.

Conversely, for you would stumbling blindly upon a ruin diminish the effect?

If the world is procedurally generated, then there would be clues in the landscape that would attune the attentive player to the presence of the past in that location. If there is no rhyme or reason – we stumble blindly – then the procedures do not describe an ancient (or any) civilization.

Do you think an archaeology simulator would be best implemented in first person (e.g. Minecraft) or third person (e.g. Terraria)? Would it be more important to convey an intimate atmosphere or impressive scale?

I like first person, but on a screen, first person can just induce nausea in the player. Maybe with an Oculus Rift that’s not a concern, in which case I’d say go first person! On a screen, I think third is better. Why not go AR and put your procedurally generated civilization into the local landscape?

Archeology versus Archaeology versus #Blogarch

I’m working on a paper that maps the archaeological blogosphere. I thought this morning it might be good to take a quick detour into the Twitterverse.


‘archaeology’ on twitter, april 7 2014

‘archaeology’ on twitter

Here we have every twitter username, connected by referring to each other in a tweet. There’s a seriously strong spine of tweeting, but it doesn’t make for a unified graph. The folks keeping this clump all together, measured by betweenness centrality:
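For the curious, here is roughly what ‘connected by referring to each other’ and ‘betweenness centrality’ mean in code. This is a minimal Python sketch and not the tool I used to make these graphs. It assumes an undirected, unweighted graph, uses Brandes’ algorithm for the centrality scores, and runs on made-up tweets; `build_mention_graph` is a hypothetical helper name.

```python
import re
from collections import defaultdict, deque


def build_mention_graph(tweets):
    """Link each author to every @username mentioned in their tweet (undirected)."""
    adjacency = defaultdict(set)
    for author, text in tweets:
        author = author.lower()
        adjacency[author]  # make sure authors with no mentions still appear
        for mentioned in re.findall(r"@(\w+)", text.lower()):
            if mentioned != author:
                adjacency[author].add(mentioned)
                adjacency[mentioned].add(author)
    return dict(adjacency)


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an undirected, unweighted graph (unnormalised)."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search, counting shortest paths from the source.
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        path_counts = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_counts[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_counts[w] += path_counts[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                share = path_counts[v] / path_counts[w]
                dependency[v] += share * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Every pair was counted from both ends, so halve the totals.
    return {node: score / 2 for node, score in centrality.items()}


# Made-up tweets, for illustration only.
tweets = [
    ("alice", "@bob nice trench!"),
    ("bob", "thanks, see also @carol"),
    ("dave", "@bob what season is this?"),
]

graph = build_mention_graph(tweets)
scores = betweenness_centrality(graph)
for user, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(user, score)
```

In the toy data every shortest path between the other three users runs through ‘bob’, which is the sense in which a high-betweenness account ‘holds the clump together’.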


top replied-to


Top hashtags:
archaeology 325
Pompeii 90
fresco 90
Archaeology 77
Herculaneum 40
Israel 24
nowplaying 20
roman 18
newslocker 16
Roman 14
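
Notice that ‘archaeology’ and ‘Archaeology’ (and ‘roman’ and ‘Roman’) are counted separately in that list. A minimal Python sketch of folding them together, using the counts above; `merge_case` is a hypothetical helper name, and whether case should be ignored at all is a judgment call.

```python
from collections import Counter

# Counts copied from the hashtag list above.
raw_counts = {
    "archaeology": 325,
    "Pompeii": 90,
    "fresco": 90,
    "Archaeology": 77,
    "Herculaneum": 40,
    "Israel": 24,
    "nowplaying": 20,
    "roman": 18,
    "newslocker": 16,
    "Roman": 14,
}


def merge_case(counts):
    """Add together the counts of hashtags that differ only by capitalisation."""
    merged = Counter()
    for tag, count in counts.items():
        merged[tag.lower()] += count
    return merged


merged = merge_case(raw_counts)
for tag, count in merged.most_common():
    print(tag, count)
```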


Let’s look at American archeology – as signified by the dropped ‘e’.

‘archeology’ on twitter, april 7

An awful lot more fragmented – less popular consciousness of archaeology-as-a-community?
Top by betweenness centrality – the ones who hold this together:

Top urls:

Top hashtags:

Top replied-to

#Blogarch on twitter

twitter search ‘#blogarch’ april 7 2014

And now, the archaeologists themselves, as indicated by #blogarch

We talk to ourselves – but with the nature of the hashtag, I suppose that’s to be expected?

Top by betweenness centrality

top urls

Top hashtags

Top replied to
electricarchaeo (yay me!)

Top mentioned:

Put them all together now…

And now, we put them all together to get ‘archaeology’ on the twitterverse today:

‘archaeology, archeology, and #blogarch’ on twitter, april 7

Visually, it’s apparent that the #blogarch crew are the ones tying together the wider twitter worlds of archaeology & archeology, though it’s still pretty fragmented. There’re 460 folks in this graph.

Top by betweenness centrality:


Top urls

top hashtags (not useful, given the nature of the search, right? But anyway)


Top word pairs in those largest groups:

archeology,professor 30
started,yesterday 21
yesterday,battle 21
battle,towton 21
towton,weapon 21
weapon,tests 21
tests,forensic 21
forensic,archeology 21
museum,archeology 19
blogging,archaeology 17

second group:
blogging,archaeology 13
future,blogging 12
archaeology,go 7
archaeology,future 7
archaeology,final 6
final,review 6
review,blogarch 6
hopes,dreams 6
dreams,fears 6
fears,blogging 6

third group:
space,age 6
age,archaeology 6
archaeology,future 6
future,know 6
know,going 6
saa2014,blogarch 6
going,blogarch 5
blogarch,post 3
post,future 3
future,blogging 3

fourth group:
easterisland,ancient 10
ancient,mystery 10
mystery,easter 10
easter,slave 10
slave,history 10
history,esoteric 10
esoteric,archeology 10
archeology,egypt 10
rt,illumynous 9
illumynous,easterisland 9

fifth group:
costa,rica 8
rt,archeologynow 7
archeologynow,modern 4
modern,archeology 4
archeology,researching 4
researching,dive 4
dive,bars 4
bars,costa 4
rica,costa 4
rica,star 4
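
For reference, word pairs like the ones above are just counts of adjacent words. A minimal Python sketch follows; it is not the tool that produced these lists. The tweets are made up, the stopword list is an assumption (the pairs above look as though small words such as ‘of’ were dropped first), and `word_pairs` and `top_word_pairs` are hypothetical helper names.

```python
import re
from collections import Counter


def word_pairs(text, stopwords=frozenset()):
    """Return adjacent (word, next_word) pairs after lowercasing and dropping stopwords."""
    words = [
        word
        for word in re.findall(r"[a-z0-9#']+", text.lower())
        if word not in stopwords
    ]
    return list(zip(words, words[1:]))


def top_word_pairs(tweets, stopwords=frozenset(), n=10):
    """Count word pairs across all tweets and return the n most frequent."""
    counts = Counter()
    for tweet in tweets:
        counts.update(word_pairs(tweet, stopwords))
    return counts.most_common(n)


# Made-up tweets and an assumed stopword list, for illustration only.
tweets = [
    "The future of blogging archaeology",
    "Blogging archaeology: the future",
]
stopwords = frozenset({"the", "of"})

top = top_word_pairs(tweets, stopwords)
for (first, second), count in top:
    print(f"{first},{second} {count}")
```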

Top mentioned in the entire graph

illumynous 9 bonesdonotlie 8
drspacejunk 8 drkillgrove 4
bonesdonotlie 8 capmsu 4
archeologynow 7 yagumboya 3
openaccessarch 7 drspacejunk 3
macbrunson 6 archeowebby 3
swbts 6 allarchaeology 3
archeowebby 6 openaccessarch 3
algenpfleger 5 cmount1 3
youtube 5 brennawalks 2

HIST4805b Looted Heritage: The Illicit Antiquities Trade

I’m teaching a fourth year seminar next year dealing with issues surrounding the illicit antiquities trade. This seminar will be in conjunction with a larger project spearheaded by the investigative reporter and author Jason Felch, of Chasing Aphrodite. I’m quite excited about this; as an undergraduate, I once had the opportunity to work on a term project that looked at the antiquities market. That was twenty years ago; I’ve never really had the opportunity to scratch that itch since. So, when I was asked to suggest a seminar topic, I jumped at the chance to plumb the depths of my own ignorance together with my students. What better way to teach than to be learning right along with your students?

As ever, I turned to twitter, to see what folks there had to say.

Many folks chimed in with suggestions, including:

I’m keeping all of these in a Zotero library for eventual sharing with my students (wider world too), but for now, this is the kind of stuff that’s come in:

Legal & Academic Frameworks

Renfrew, Colin. Loot, Legitimacy and Ownership: The Ethical Crisis in Archaeology. Duckworth, 2000.

Lazrus, Paula K., and A. Barker (eds). All the King’s Horses: Essays on the Impact of Looting and the Illicit Antiquities Trade on Our Knowledge of the Past. SAA 2012.

Marlowe, Elizabeth. Shaky Ground: Context, Connoisseurship and the History of Roman Art. Debates in Archaeology. London: Bloomsbury Academic, 2013.

Hoffman, Barbara T., ed. Art and Cultural Heritage: Law, Policy, and Practice. Cambridge ; New York: Cambridge University Press, 2006.

Green, Penny, and S. R. M. Mackenzie, eds. Criminology and Archaeology: Studies in Looted Antiquities. Oñati International Series in Law and Society. Oxford ; Portland, Or: Hart Publishing, 2009.

RealTime Delphi Study on the Future of Cultural Heritage Research

Campbell, Peter B. ‘The Illicit Antiquities Trade as a Transnational Criminal Network: Characterizing and Anticipating Trafficking of Cultural Heritage’. International Journal of Cultural Property 20, no. 02 (2013): 113–153. doi:10.1017/S0940739113000015.

World War II

Nicholas, Lynn H. The Rape of Europa: The Fate of Europe’s Treasures in the Third Reich and the Second World War. 1st ed. New York: Knopf, 1994.

Edsel, Robert M, and Bret Witter. The Monuments Men: Allied Heroes, Nazi Thieves and the Greatest Treasure Hunt in History. New York: Center Street / Hachette Book Group, 2010.

Edsel, Robert M. Saving Italy: The Race to Rescue a Nation’s Treasures from the Nazis. 1st ed. New York: W. W. Norton & Company, 2013.

Current State

Felch, Jason, and Ralph Frammolino. Chasing Aphrodite: The Hunt for Looted Antiquities at the World’s Richest Museum. Houghton Mifflin Harcourt, 2011.

Watson, Peter, and Cecilia Todeschini. The Medici Conspiracy: The Illicit Journey of Looted Antiquities from Italy’s Tomb Raiders to the World’s Greatest Museums. PublicAffairs, 2007.

Waxman, Sharon. Loot: The Battle over the Stolen Treasures of the Ancient World. Macmillan, 2010.

‘Trafficking Culture’. Accessed 12 March 2014.

and an entire special issue of Internet Archaeology: Issue 33 – Portable Antiquities: archaeology, collecting, metal detecting, Edited by Stuart Campbell and Suzie Thomas

And from Donna Yates, the exciting news that she and her collaborators at Trafficking Culture are going to write a textbook on the subject:


If you have suggestions for things the students should be reading/looking at/exploring, please do drop me a line or leave a comment.

Shared Authority & the Return of the Human Curated Web

A few years ago, I wrote a piece on Why Academic Blogging Matters: A structural argument. This was the text for a presentation as part of the SAA in Sacramento that year. In the years since, the web has changed (again). It is no longer enough for us to create strong signals in the noise, trusting in the algorithms to connect us with our desired publics. (That’s the short version. The long version is rather more nuanced and sophisticated, trust me).

The war between the botnets and the SEO specialists has outstripped us.

In recent months, I have noticed an upsurge of new ‘followers’ on this blog with emails and handles that really do not seem to be those of actual humans. Similarly, on Twitter, I find odd tweets directed at me filled with gibberish web addresses (which I dare not touch). Digital Humanities Now highlighted an interesting post in recent days that explains what’s going on, discusses this ‘war’, and in how this post came to my attention, points the way forward for the humanistic use of the web.

In ‘Crowd-Frauding: Why the Internet is Fake’, Eric Hellman discusses a new avenue for power (assuming that power ‘derives from the ability to get people to act together’): in this case, ‘cooperative traffic generation’, or software-organized crime. Hellman was finding a surge of fake users on his site, and he began to investigate why this was. Turns out, if you want to promote your website and jack up its traffic, you can install a program that manufactures fake visitors to your sites, who click around, click on adverts, register… and in turn does this for other users of the software. Money is involved.

“In short, your computer has become part of a botnet. You get paid for your participation with web traffic. What you thought was something innocuous to increase your Alexa-ranking has turned you into a foot-soldier in a software-organized crime syndicate. If you forgot to run it in a sandbox, you might be running other programs as well. And who knows what else.

The thing that makes cooperative traffic generation so difficult to detect is that the advertising is really being advertised. The only problem for advertisers is that they’re paying to be advertised to robots, and robots do everything except buy stuff. The internet ad networks work hard to battle this sort of click fraud, but they have incentives to do a middling job of it. Ad networks get a cut of those ad dollars, after all.

The crowd wants to make money and organizes via the internet to shake down the merchants who think they’re sponsoring content. Turns out, content isn’t king, content is cattle.”

Hellman goes on to describe how the arms race, the red queen effect, between these botnets and advertising models that depend on clickrates etc will push those of us without the computing resources to fight in these battles into the arms of the Googles, the Amazons, the Facebooks: and their power will increase correspondingly.

“So with the crowd-frauders attacking advertising, the small advertiser will shy away from most publishers except for the least evil ones- Google or maybe Facebook. Ad networks will become less and less efficient because of the expense of dealing with click-fraud. The rest of the internet will become fake as collateral damage. Do you think you know how many users you have? Think again, because half of them are already robots, soon it will be 90%. Do you think you know how much visitors you have? Sorry, 60% of it is already robots.”

I sometimes try explaining around the department here that when we use the internet, we’re not using a tool, we’re sharing authority with countless engineers, companies, criminals, folks-in-their-parents-basement, ordinary folks, students, algorithms whose interactions with other algorithms can lead to rather unintended outcomes. We can’t naively rely on the goodwill of the search engine to help us get our stuff out there. This I think is an opportunity for a return of the human curated web. No, I don’t mean building directories and indices. I mean, a kind of supervised learning algorithm (as it were).

Digital Humanities Now provides one such model (and there are of course others, such as Reddit, etc). A combination of algorithm and human editorial oversight, DHNow is a cybernetic attempt to bring to the surface the best in the week’s digital humanities work, wherever on the net it may reside. We should have the same in archaeology. An Archaeology Now! The infrastructure is already there. PressForward, the outfit from the RRCHNM, has developed a workflow for folding volunteer editors into the weekly task of separating the wheat from the chaff, using a custom built plugin for WordPress. Ages ago we talked about a quarterly journal where people would nominate their own posts and we would spider the web looking for these nominations, but the technology wasn’t really there at that time (and perhaps the idea was too soon). With the example of DHNow, and the emergence of this new front in botnets/SEO/clickfraud and the dangers that that poses, perhaps it’s time to revisit the idea of the human-computer curated archaeoweb?

Exploring Trends in Archaeology: Professional, Public, and Media Discourses

The following is a piece by Joe Aitken, a student in my CLCV3202a Roman Archaeology for Historians class at Carleton University. His slides may be found here. I asked Joe if I could share his work with the wider world, because I thought it an interesting example of using simple text analysis to explore broader trends in public archaeology. Happily, he said yes.

An immense shift in content and terminology emerges when analysing the text of several documents relating to the archaeology of Colchester, as information grows from its genesis as an archaeological report, through the stage of public archaeology, and finally to mass media. Many inconsistencies emerge as the form in which archaeological information is presented changes.

This analysis was done with the help of Voyant Tools, “a web-based text analysis environment.”[1] Z-score, representing the number of standard deviations above the mean at which each term appears, will be used as the basic marker of frequency. Skew, “A measure of the asymmetry of relative frequency values for each document in the corpus,”[2] will also be used. Having a skew close to zero suggests that the term appears with relative consistency throughout the documents. This means that in comparison to, for example, “piggery,” with a skew of 11, terms with a low skew are not only frequent in the corpus as a whole, but are prevalent in many of the documents that make up the corpus.
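
As a rough illustration of the two measures, here is a minimal Python sketch. It is not Voyant Tools' code, and the exact formulas Voyant uses are an assumption here: the z-score is taken against the mean and population standard deviation of all term counts in the corpus, and skew is computed as the Fisher-Pearson coefficient of a term's per-document relative frequencies. The counts and frequencies are invented for the example.

```python
from statistics import mean, pstdev


def z_scores(term_counts):
    """z-score of each term's count against all term counts in the corpus.

    Assumes at least two different counts, so the standard deviation is not zero.
    """
    values = list(term_counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    return {term: (count - mu) / sigma for term, count in term_counts.items()}


def skewness(values):
    """Fisher-Pearson skewness: third central moment over variance ** 1.5.

    Assumes the values are not all identical.
    """
    mu = mean(values)
    m2 = mean([(v - mu) ** 2 for v in values])
    m3 = mean([(v - mu) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Invented corpus-wide counts, for illustration only.
corpus_counts = {"site": 9, "pottery": 5, "piggery": 1}
scores = z_scores(corpus_counts)

# Invented per-document relative frequencies for two kinds of term.
evenly_spread = [1, 2, 3]        # appears at similar rates in every document
concentrated = [0, 0, 0, 0, 10]  # appears heavily in a single document

print(scores)
print(skewness(evenly_spread))
print(skewness(concentrated))
```

The evenly spread term has a skew of zero, while the term concentrated in one document has a clearly positive skew, which is the contrast the “piggery” example is drawing.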

A text analysis of Colchester Archaeological Trust Reports 585-743 (February 2011 to 22nd October 2013)[3] is the basis of this comparison. Dominant in this corpus are terms related to archaeological excavations. The term “report” has a z-score of 8.69, “finds” has a z-score of 6.43, and “site” has a z-score of 8.81. The same terms, respectively, have skews of 0.93, 0, and 0.88. Another relatively consistent term is “pottery,” which has a skew of 1 and a z-score of 5.26. “Brick”, with a skew of 2.17 and a z-score of 3.1, is similarly consistent.

The relevance of these figures becomes clearer upon a comparison with the public archaeological writings as they appear on the Colchester Archaeologist blog. The blog exists on the public-facing website of the Colchester Archaeological Trust, which has been blogging about its archaeological discoveries since 2011. This analysis will use the Voyant-Tools difference function, which returns a value based on a comparison between the z-scores of two corpora,[4] as well as a direct comparison of the z-score and skew of each term between the two corpora.

Some of the most consistent terms from the archaeological corpus appear very infrequently in the public archaeology. “Pottery” has a skew of 9.49 and a z-score of 0.25, and appears at about one-fifth of the frequency it does in the reports. “Brick” similarly disappears: in the public archaeology, it has a skew of 9.56 and a z-score of -0.02, compared to a skew of 2.17 and a z-score of 3.1 in the archaeological reports.

Terms relating to the excavation also disappear. “Finds,” which in the archaeological reports has a skew of 0 and a z-score of 6.43, has a skew of 4.94 and a z-score of 0.42 in the public archaeology. “Report” similarly changes from a skew of 0.93 to 9.87, with its z-score dropping from 8.69 to -0.06. “Site” follows this trend to a lesser extent, although this is likely due to it appearing in the public archaeology in the context of “website,” rather than as an archaeological term. Still, the shifts in z-score and skew are significant, and in the same direction: an archaeological z-score of 8.81 to a public z-score of 3.83, and an archaeological skew of 0.88 to a public skew of 1.28. In each case, these commonly used terms from the archaeological reports appear less frequently and less consistently in the blog.

On the other hand, some terms are much more common in the public archaeology. Compared to the corpus of archaeological reports, the public archaeology texts contain the term “circus” at 5 times the frequency. In the blog, “circus” has a z-score of 5.77 and a relatively stable skew of 1.79, compared to a minimal z-score of 0.69 and a volatile skew of 6.3 in the archaeological reports. A similar change occurs to the term “burial,” although to a lesser extent: from report to blog, the z-score rises from 0.25 to 0.86, and the skew drops from 3.84 to 3.65.

Terms with a high skew and a non-negligible z-score in the archaeological reports seem to be the most prevalent terms altogether in the public archaeology, while terms with a skew closer to zero in the reports disappear in the public archaeology: that is, the terms that appear in only a few reports, but in large numbers when they do, are the ones selected for representation in the blog. This emphasises rare and exciting discoveries, such as the circus and large burials, while ignoring the more regular and consistent discoveries of pottery and bricks. For terms with high skew, there is a consistent rise in z-score and drop in skew between the archaeological and public corpora. For terms with a skew closer to zero, there is a consistent decline in z-score. The two trends that terms follow with regard to their relative frequencies between the two corpora can be defined as follows: low-skew terms, which tend to disappear, and significant-z-score/high-skew terms, which tend to be emphasised in the public archaeology.
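The two trends can be checked directly against the figures quoted above. The sketch below is only a tabulation of those reported numbers; the labels "emphasised" and "de-emphasised" and the function names are my own shorthand, not part of Voyant Tools.

```python
# (z-score, skew) as reported in the text: archaeological reports first, blog second.
FIGURES = {
    "report":  ((8.69, 0.93), (-0.06, 9.87)),
    "finds":   ((6.43, 0.00), (0.42, 4.94)),
    "site":    ((8.81, 0.88), (3.83, 1.28)),
    "pottery": ((5.26, 1.00), (0.25, 9.49)),
    "brick":   ((3.10, 2.17), (-0.02, 9.56)),
    "circus":  ((0.69, 6.30), (5.77, 1.79)),
    "burial":  ((0.25, 3.84), (0.86, 3.65)),
}

def shift(term):
    """Change in (z-score, skew) when moving from the reports to the blog."""
    (z_rep, skew_rep), (z_blog, skew_blog) = FIGURES[term]
    return round(z_blog - z_rep, 2), round(skew_blog - skew_rep, 2)

def trend(term):
    """Label the direction of the shift (labels are illustrative shorthand)."""
    dz, dskew = shift(term)
    if dz < 0 and dskew > 0:
        return "de-emphasised"  # rarer and less evenly spread in the blog
    if dz > 0 and dskew < 0:
        return "emphasised"     # more frequent and more evenly spread in the blog
    return "mixed"

TRENDS = {term: trend(term) for term in FIGURES}
```

Run against these seven terms, every low-skew term from the reports comes out as "de-emphasised", and "circus" and "burial" come out as "emphasised", with the shift for "burial" being much smaller than the one for "circus".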

Archaeology in the media seems to follow the public archaeology rather than the archaeological reports in most respects. The media corpus contains articles about the archaeology of Colchester from sources ranging from local to national media, including the BBC, the Colchester Daily Gazette, the Essex County Standard, and the Independent, in addition to international archaeological publications. In these articles, “circus” has a low skew of 1.51, although its z-score, at 1.64, isn’t as overwhelmingly high as it is in the public archaeology. Still, it is much greater than the z-score of 0.69 for “circus” in the reports, and this z-score most likely reflects a greater lexical variety rather than a focus on other aspects of the archaeology, as this is the fifth-highest z-score in the entire media corpus. Even so, there is less emphasis on the circus here than in the blog.

In common between the public and media corpora is their near complete removal of non-Roman archaeological terminology. The term “medieval” appears 1555 times in the archaeological corpus, with a z-score of 3.42 and a skew of 2.64. In the public corpus, the same term appears twice, with a z-score of -0.09 and a skew of 10.30. In the selection of news about the archaeology of Colchester, the term never appears. This follows the same trends of selection as the public archaeology: “medieval,” a low-skew term in the archaeological corpus, is ignored in favour of high-skew terms.

Although the media and public corpora contain writings about the same discoveries and use similar language, the frequency at which they do so differs. The media, unlike the blog, is unlikely to repeatedly write about the circus even when no new information is available. Rather, each media outlet seems to be inspired by the archaeological reports, but takes its information from the public archaeology. That is, instead of simply repeating the blog, the media is prompted to write by the actual archaeological discovery, but draws its information about that discovery from the blog rather than directly from the report.

Altogether, archaeological writing about Colchester appears to become much narrower as it moves from report to blog to media. While the archaeological reports presumably reflect accurately what is found, the public archaeology and, in turn, the media do not. Instead, they focus on more marketable and exciting aspects of the archaeology: these can be recognized as the high-skew/high-z-score terms in the analysis. As a result, the particulars of the excavation, as well as the majority of findings, are de-emphasised; these are the low-skew terms. By the stage of public presentation, only a very narrow view of the archaeology of Colchester has been presented. It is almost exclusively monumental and Roman, and is at odds with the multiplicity of archaeological findings that are seen in the reports.


Patterns in Roman Inscriptions

Update, August 22: I’ve now analyzed all 1385 inscriptions. I’ve put an interactive browser of the visualized topic model at

See how nicely the Latin clusters?

I’ve played with topic modeling inscriptions before. I’ve now got a very effective script in R that runs the topic model and produces various kinds of output (I’ll be sharing the script once the relevant bit from our book project goes live). For instance, I’ve grabbed 220 inscriptions from Miko Flohr’s database of inscriptions regarding various occupations in the Roman world (there are many more; like everything else I do, this is a work in progress).

Above is the dendrogram of the resulting topics. Remember, those aren’t phrases, and I’ve made no accounting for case endings. (Now, it’s worth pointing out that I didn’t include any of the metadata for these inscriptions; just the text of the inscription itself, with the diacritical marks removed.) Nevertheless, you get a sense of both the structure and content of the inscriptions, reading from left to right, top to bottom.

We can also look at which inscriptions group together based on the similarity matrix of their topics, and graph the result.
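For a rough idea of what that step involves, here is a minimal sketch that links documents whose topic proportions are similar. It is not my R script: the cosine measure, the 0.8 threshold, the inscription labels and the topic proportions are all assumptions made up for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical topic proportions for three inscriptions (not real output).
DOC_TOPICS = {
    "inscription_a": [0.80, 0.10, 0.10],
    "inscription_b": [0.70, 0.20, 0.10],
    "inscription_c": [0.05, 0.05, 0.90],
}

def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_edges(doc_topics, threshold=0.8):
    """Return (doc, doc, score) edges for pairs at or above the threshold."""
    edges = []
    for (a, u), (b, v) in combinations(sorted(doc_topics.items()), 2):
        score = cosine(u, v)
        if score >= threshold:
            edges.append((a, b, round(score, 3)))
    return edges

EDGES = similarity_edges(DOC_TOPICS)
```

With these made-up numbers only the first two inscriptions are linked, since they share a dominant topic. The resulting edge list is the kind of thing that can be written out and opened in a network package.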


Inscriptions, linked based on similarity of the language of the inscription, via topics. If the image appears wonky, just click through.

So let’s look at these groups in a bit more depth. I can take the graph exported by R and import it into Gephi (or another package) to do some exploratory statistical analysis.

I’ve often put a lot of stock in ‘betweenness centrality’, reckoning that if a document is highly between in a network representation of the patterns of similarity of topics, then that document is representative of the kinds of discourses that run through it. What do we get, then?

We get this (here’s the page in the database):

aurifices Roma CIL 6, 9207 Inscription Occupation
M(arcus) Caedicius Iucundus / aurifex de / sacra via vix(it) a(nnos) XXX // Clodia …

But there are a lot of subgroupings in this graph. Something like ‘closeness’ might indicate more locally important inscriptions. In this case, the two with the highest ‘closeness’ measures are

aurifices Roma CIL 6, 9203 Inscription Occupation
Protogeni / aurfici / vix(it) an(nos) LXXX / et Claudiae / Pyrallidi con(iugi) …


aurifices Roma CIL 6, 3950 Inscription Occupation
Lucifer v(ixit) a(nnum) I et d(ies) XLV / Hesper v(ixit) a(nnos) II / Callistus …
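If you’d like to see what those two measures are actually counting, here is a minimal sketch for a small unweighted, undirected graph. The five-node graph is a toy example, not the inscription network, and Gephi may normalise its scores differently; betweenness here follows the standard Brandes approach, and closeness is the number of reachable nodes divided by the total distance to them.

```python
from collections import deque

# Hypothetical toy graph as an adjacency list (undirected).
GRAPH = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c"],
}

def shortest_path_data(graph, source):
    """BFS from source: distances, shortest-path counts, predecessors, visit order."""
    dist = {source: 0}
    sigma = {node: 0 for node in graph}
    sigma[source] = 1
    preds = {node: [] for node in graph}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order

def betweenness(graph):
    """Unnormalised betweenness: shortest paths passing through each node."""
    score = {node: 0.0 for node in graph}
    for s in graph:
        _, sigma, preds, order = shortest_path_data(graph, s)
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair is counted from both ends, so halve the totals.
    return {node: value / 2 for node, value in score.items()}

def closeness(graph):
    """Reachable nodes divided by the summed distance to them."""
    result = {}
    for s in graph:
        dist, _, _, _ = shortest_path_data(graph, s)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result

BETWEENNESS = betweenness(GRAPH)
CLOSENESS = closeness(GRAPH)
```

In this toy graph node "c" scores highest on both measures; in a graph with many subgroupings, like the one above, the two rankings can diverge, which is why I look at both.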

If we look for subgroupings based on the patterning of connections, the biggest subgroup has 22 inscriptions:
Dis Manibus Felix publicus Brundisinorum servus aquarius vixit…
Dis Manibus Laetus publicus populi Romani 3 aquarius aquae An{n}ionis…
Dis Manibus sacrum Euporo servo vilico Caesaris aquario fecit Vestoria Olympias…
Nymphis Sanctis sacrum Epictetus aquarius Augusti nostri
Dis Manibus Agathemero Augusti liberto fecerunt Asia coniugi suo bene…
Agatho Aquarius Caesaris sibi et Anniae Myrine et suis ex parte parietis mediani…
Dis Manibus Sacrum Doiae Palladi coniugi dignissimae Caius Octavius…
Dis Manibus Tito Aelio Martiali architecto equitum singularium …
Dis Manibus Aureliae Fortunatae feminae incomparabili et de se bene merenti..
Dis Manibus Auliae Laodices filiae dulcissimae Rusticus Augusti libertus…
Dis Manibus Tychico Imperatoris Domitiani servo architecto Crispinilliano.
Dis Manibus Caio Iulio 3 architecto equitum singularium…
Dis Manibus Marco Claudio Tryphoni Augustali dupliciario negotiatori…
Dis Manibus Bromius argentarius
Faustus 3ae argentari
Dis Manibus sacrum Tiberius Claudius Hymeneus aurarius argentarius…
Dis Manibus Silio Victori filio et Naebiae Amoebae coniugi et Siliae…
Dis Manibus 3C3 argentari Allia coniugi? bene merenti fecit…
Dis Manibus Marco Ulpio Augusti liberto Martiali coactori argentario…
Suavis 3 aurarius
Dis Manibus sacrum Tiberius Claudius Hymeneus aurarius argentarius…
Dis Manibus Tito Aurelio Aniceto Augusti liberto aurifici Aurelia…

What ties these together? Well, ‘dis manibus’ is good, but it’s pretty common. The occupations in this group are all argentarii, architecti, or aquarii. So that’s a bit tighter. Many of these folks are mentioned in conjunction with their spouses.

In the next largest group, we get what must be a family (or familia, extended slave family) grouping:
Caius Flaminius Cai libertus Atticus argentarius Reatinus
Caius Octavius Parthenio Cai Octavi Chresti libertus argentarius
Musaeus argentarius
Caius Caicius Cai libertus Heracla argentarius de foro Esquilino sibi…
Caius Iunius Cai libertus Salvius Caius Iunius Cai libertus Aprodisi…
Caius Vedennius Cai filius Quirina Moderatus Antio militavit in legione…
Aurifex brattarius
Caius Acilius Luci filius Trebonia natus architectus
Caius Postumius Pollio architectus
Caius Camonius Cai libertus Gratus faber anularius
Caius Antistius Isochrysus architectus
Elegans architectus
Caius Cuppienus Cai filius Pollia Terminalis praefectus cohortis…
Cresces architectus
Cresces architectus
Caius Vedennius Cai filius Quirina Moderatus Antio militavit in legione…
Pompeia Memphis fecit sibi et Cnaeo Pompeio Iucundo coniugi suo aurifici…
Caius Papius Cai libertus Salvius Caius Papius Cai libertus Apelles…
Caius Flaminius Cai libertus Atticus argentarius Reatinus

The outliers here are graffiti, or must be being picked up by the algorithm due to the formation of the words; the inclusion of Pompeia in here is interesting, and must be due to the overall structure of that inscription. Perhaps a stretch too far to wonder why these would be similar…?

This small experiment demonstrates, I think, the potential of topic modeling for digging out patterns in archaeological/epigraphic materials. In due time I will do Flohr’s entire database. Here are my files to play with yourself.

Giant component at the centre of these 220 inscriptions.