Carleton has an annual ‘academic retreat’, which is happening this weekend. I’m not sure what, precisely, occurs there, but I’ve been asked to talk about things digital/history/archaeological. In the wake of the recent SAA #blogarch session, and in advance of the upcoming special issue in Internet Archaeology on blogging archaeology, I thought I’d talk about one aspect of what I found when I set out to map the shape of ‘roman archaeology’ on the web.  It’s an update to what I did in 2011, for the #blogarch session at that year’s SAAs (you can read what I thought then here.)

I give you, ‘Shouting into the Void? Social Media and the Construction of Cultural Heritage Knowledge Online’

Archeology versus Archaeology versus #Blogarch

I’m working on a paper that maps the archaeological blogosphere. I thought this morning it might be good to take a quick detour into the Twitterverse.


‘archaeology’ on twitter

Here we have every twitter username, connected by referring to each other in a tweet. There’s a seriously strong spine of tweeting, but it doesn’t make for a unified graph. The folks keeping this clump all together, measured by betweenness centrality:


top replied-to


Let’s look at American archeology – as signified by the dropped ‘a’.

An awful lot more fragmented – less popular consciousness of archaeology-as-a-community?
Top by betweenness centrality – the ones who hold this together:

twitter search '#blogarch' april 7 2014

'archaeology, archeology, and #blogarch' on twitter, april 7
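For anyone who wants to poke at this kind of graph themselves, here is a minimal sketch in plain Python of how a mention network and its betweenness scores could be computed. It is an illustration only, not a record of how the figures above were produced: the sample tweets and usernames are invented, the graph is treated as undirected, and the scoring is Brandes' algorithm without normalisation.

```python
import re
from collections import deque

MENTION = re.compile(r"@(\w+)")

def mention_graph(tweets):
    """Build an undirected graph linking each author to every @username they mention."""
    graph = {}
    for author, text in tweets:
        author = author.lower()
        graph.setdefault(author, set())
        for name in MENTION.findall(text):
            name = name.lower()
            if name != author:
                graph[author].add(name)
                graph.setdefault(name, set()).add(author)
    return graph

def betweenness(graph):
    """Unnormalised betweenness centrality (Brandes) for an unweighted, undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                      # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # every undirected path was counted once from each end
    return {v: s / 2 for v, s in score.items()}

# Invented sample data: (author, tweet text)
tweets = [
    ("alice", "Great session with @bob and @carol #blogarch"),
    ("carol", "@dave have you seen this?"),
    ("erin", "thanks @carol"),
]
scores = betweenness(mention_graph(tweets))
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, value)
```

The accounts with the highest scores are the ones that sit on the most shortest paths between everybody else, which is what I mean by the folks holding a clump together.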

So what does this all mean? Answers on a postcard, please…

(My network files will be on eventually).

A Tale of Two Conferences: CAA UK and SAA 2011, as experienced on Twitter

Two conferences at the same time, opposite sides of the world (give or take), and you can’t get to either? There’s an app for that, and it’s called Twitter.

Nicolas Laracuente has been curating tweets relating to the Society for American Archaeology Annual Meeting in Sacramento via Storify – you can see his reporting on the conference here.

Inspired by Nicolas’ work, Jessica Ogden performed the same service at the Computer Applications in Archaeology UK edition conference, here.

Some of the things going on in the UK in terms of digital archaeology are very exciting indeed. What with my own interest and work in agent-based modeling, I’m perhaps a bit biased. But I was also excited to see (‘read about’) some interesting work being done in terms of using game engines for archaeological visualization and outreach. I’m working on a project at the moment using the Web.Alive product (it’s built on Unreal) to render archaeological knowledge in an immersive environment. I’ve applied for funding to see if I can procedurally generate immersive worlds from archaeological repositories. Stay tuned!

Signal Versus Noise: Why Academic Blogging Matters: A Structural Argument. SAA 2011

Signal versus Noise: Why Academic Blogging Matters. Shawn Graham, Carleton University, Ottawa Canada. [presentation with voice-over here; 15 mb] (Comic from the New York Times article is by David G. Klein)

“Omnia disce; postea videbis nihil esse superfluum” said Hugh of St Victor in the 12th century. ‘Learn everything; later it will all be useful somehow’. The irony of course is that I would in all probability have never come across this epigram (not being a medievalist) if it hadn’t been for the magic of the internet and my faculty’s Dean’s blog. Hugh goes on to say, ‘coartata scientia iucunda non est’, ‘narrow knowledge is not pleasant’. That phrase fits neatly with one of the standard criticisms of blogging, that blogs are narrowly focused, shrill, and often an echo-chamber for their (and their readers’) own views.

In a final neat connection, this phrase of Hugh’s is the epitaph on the tomb of Father Leonard Boyle. Father Boyle is buried at San Clemente in Rome, in the ruins of the 4th century church. This ‘lower church’ was found in the mid 19th century underneath the present basilica (which dates from the 12th century).  Father Boyle was the Irish Dominicans’ (who manage the site) archivist and historian, and it is indeed a moving testament to his life’s work that he should be buried in the ancient basilica. The epigram then is very much an archaeological sentiment, both in its context of display, and how it implores us to learn everything: for what else is an excavation but the careful recording of everything on the chance that it will be useful later on?

But it’s also directly useful to us who blog archaeology, who take on the mantle of public archaeology. It could, in a sense, be a motto for Google, who try to ‘learn everything’ with no idea of what will be useful to whom or in what way. But that’s the problem right there – deciding what is useful, and finding it. ‘Narrow knowledge is not pleasant’ I think neatly describes the results of search engines in that first phase of the internet, when the world wide web had just been created and people were still trying to produce human-curated guides to the ‘net. Google of course changed everything with the invention of ‘PageRank’. The mathematics of ‘PageRank’ are based on graph theory and network analysis. In essence, PageRank considers each link on a page as a kind of vote on the relative importance of the page being linked to. It also weighs each vote by the relative importance of the page casting it, and so it’s a recursive process. This was Google’s original insight: that the importance of a page depends on the kind and quality and number of its relations to all other pages on the net.
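To make the ‘links as votes’ idea concrete, here is a toy sketch of the recursive calculation in plain Python. It is the textbook power-iteration version, not whatever Google runs today; the damping factor of 0.85 is the conventional value, and the page names and the way I handle pages with no outgoing links are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iterations):
        # every page gets a small baseline share regardless of links
        new = dict.fromkeys(pages, (1.0 - damping) / n)
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page splits its own importance among the pages it votes for
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # assumption: a page with no outgoing links spreads its vote evenly
                share = damping * rank[page] / n
                for t in pages:
                    new[t] += share
        rank = new
    return rank

# Hypothetical mini-web: a blog and a department page both link to a wiki
# article, and the wiki article links back to the blog only.
toy_web = {"blog": ["wiki"], "dept": ["wiki"], "wiki": ["blog"]}
for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy web the department page, which nobody links to, ends up at the bottom, while the blog inherits importance from the wiki article that links to it. That inheritance is the recursive part.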

Learn everything: but that’s only half the battle. The other part is determining what is useful, of extracting the signal from the noise of not only the search query, but of all those millions of pages of information.  And in this, Google benefits from the billions of searches that we the users perform every week. In essence, we are teaching the machine what is useful when we skip over the first page of results, looking for the one that *really* seems to match what we were looking for. Google observes this. Wired Magazine not long ago looked under the hood to see how Google learns from user behaviors. Google isn’t a search engine, or a catalog, or an index: it’s a massive experiment in prediction. Apparently, Google uses over 200 signals to match useful information to each individual user (who each have their own idea of what constitutes ‘useful’). PageRank is one signal; the title of a webpage another; the actual text of a hyperlink; freshness; and geolocation of the person doing the search.

This isn’t foolproof however. The system can be gamed. In November 2010, the New York Times published a story about an online seller of glasses and eyewear run by one Vitaly Borker. Borker discovered that if he offered poor service to some of his customers, those customers would complain on the internet (especially in forums), linking to his site in warning to others. One would think this would be poison to his business, but on the contrary, Borker discovered that it made his site’s listing on Google search results improve. That is, all publicity is good publicity, as the algorithm powering the search did not consider the semantic meaning of those mentions. So Borker would then go out of his way to aggravate certain of his customers to such a degree that they would generate more web traffic to his store. Once the New York Times broke the story, Google made some changes to its algorithm. Google did not reveal what changes it had made, in order to prevent other unscrupulous individuals from similarly gaming the system. Borker’s website dropped from its number one position to somewhere deep on the twentieth page of results in the immediate aftermath of the changes.

This story is illuminating on a number of levels. As educators, we’re already familiar with the fact that our students turn to the internet, and more specifically, Google, when they begin their research. How deep do they go on a search results page? Search Engine Optimization is a bit of a black art, but all agree that appearing in the first five search results is the key: people do not click on results much after the fifth result. If one’s website does not appear in that golden group, it might as well not exist.

The story about Borker illustrates the way human interaction and Google search are linked. Google looks for actively updated materials; materials that are semantically tight; and materials that people link to. People link to the materials that Google serves up in its top five, thus creating a positive feedback loop. Wikipedia and Google were made for one another. Wikipedia is simultaneously the product of enormous human energy, and enormous human laziness. Wikipedia produces strong signals – whether good or bad, Google doesn’t care. Google returns a Wikipedia page, and a human reads it; a fraction of readers edit it, and another fraction link to it, whether to praise, disparage, or simply use it as a kind of glossary of terms, thus creating signals that Google picks up. (This incidentally is also an argument for why academics must engage with Wikipedia and actively work to improve its content! It’s also an argument for why Wikipedia cannot be displaced: it’s here to stay, and will only become more dominant through this positive feedback loop).

So how does blogging fit into this? Blogging is a medium, not a genre, and so content itself is a bit secondary. What is important is that blogging as a medium also creates strong signals. ‘Academic Blogging’, as a genre, creates very strong signals. That is, it should. Academic blogs tend to have a very tight focus. They are updated fairly regularly, as the academic incorporates them into his or her work cycles. The anchor text for linking tends to be rather unique combinations of words, what Amazon would call ‘statistically improbable phrases’, and thus provides more signal to Google’s robots. Contrast that with a static department website, for instance. It’s blogging that brings the latest research to that golden group of 5 results.

Let’s look at some structure. I searched ‘Blogging Archaeology’ via Google, crawled the results, and imported them into Gephi. I let the crawl run for about 20 minutes, recovering over 8500 nodes linked together by nearly 9000 edges. There’s a lot of noise, when you look at it. However, this network has a diameter (the maximum distance between the two furthest nodes) of 8 – that is, 8 steps from one side to the other. On average, to get from any node to any other node takes roughly 3 steps, so it is a rather tight network. But I want to know where the academic bloggers fit into this, so I run the ‘modularity’ routine in Gephi. This routine looks for areas of self-similarity in the patterning of connections. I find four communities. One translates into the archaeological blogosphere (green in the image), centered around Colleen’s Middle Savagery. Light blue in the image seems to correlate to ‘cloud’ based storage. Purple seems to be the social media sector (Facebook etc – showing incidentally what a walled garden it is becoming). Red appears to be aggregator websites. Interestingly, Twitter -microblogging- is the purple node that sits at the intersection of the green and purple (perhaps I need to do this study from a Twitter-centric point of view).
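Diameter and average path length are simple enough to compute by hand on a small graph, which is a useful sanity check on what Gephi reports. The sketch below assumes an undirected graph stored as a dictionary of neighbour sets and only counts pairs of nodes that can actually reach one another; whether Gephi treated my crawl as directed or undirected is not something this sketch settles.

```python
from collections import deque

def distances_from(graph, source):
    """Breadth-first search: number of steps from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter_and_average_path(graph):
    """Return (longest shortest path, mean shortest path) over reachable pairs."""
    longest, total, pairs = 0, 0, 0
    for source in graph:
        for target, d in distances_from(graph, source).items():
            if target != source:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, (total / pairs if pairs else 0.0)

# Hypothetical four-node chain: a - b - c - d
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(diameter_and_average_path(chain))
```

Running a breadth-first search from every node is fine for a few thousand nodes like the crawl described here, though it would get slow on anything much bigger.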

In a sense, these results are not surprising, since I ‘gamed’ the system by looking for a term that I knew was active and heavily represented in the archaeological blogosphere. Let’s look for something a bit more generic: ‘Roman Archaeology’. Crawling the results for the same amount of time, we find 6240 nodes and 13 216 edges – a denser network already. The diameter of this network is 10, and the average path length is 6, which suggests that it’s going to be a bit more parochial, despite all those connections. Once I search this network for modularity, I find 9 communities. The image is striking, almost a barbell shape with Wikipedia being one of the weights and academia being the other (curiously, Columbia and Duke especially) – and the weak link connecting the two is a handful of blogs and twitter accounts. What better argument for academic blogging, and considering digital archaeology as public archaeology, could be made? We’ve argued that academic bloggers tell the rest of the world what the academy is up to, but never so strongly as this image depicts. Academia, the font of ‘professional’ knowledge, and Wikipedia, the font of crowdsourced knowledge, connect through us, the academic bloggers.
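What does a modularity routine actually score? The sketch below computes the standard modularity value Q for a given assignment of nodes to communities: the share of edges that fall inside communities, minus the share you would expect if the same nodes were wired up at random. It does not search for the best assignment (that search is the hard part, and it is what Gephi's routine does for you); the barbell graph and its labels are invented for the example.

```python
from collections import defaultdict

def modularity(graph, community):
    """Modularity Q of a partition of an undirected graph.

    graph: dict mapping node -> set of neighbours
    community: dict mapping node -> community label
    """
    m = sum(len(nbrs) for nbrs in graph.values()) / 2   # number of edges
    if m == 0:
        return 0.0
    inside = defaultdict(float)   # edges with both ends in the community
    degree = defaultdict(float)   # total degree of the community's nodes
    for v, nbrs in graph.items():
        degree[community[v]] += len(nbrs)
        for w in nbrs:
            if community[w] == community[v]:
                inside[community[v]] += 0.5   # each edge is seen from both ends
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical barbell: two triangles joined by a single bridging edge (c - d)
barbell = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
two_weights = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_lump = dict.fromkeys(barbell, 1)
print(modularity(barbell, two_weights), modularity(barbell, one_lump))
```

Splitting the barbell into its two weights scores well above zero, while lumping everything into one community scores exactly zero, which is the sense in which the routine 'finds' the two ends and leaves the bridge standing between them.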

A consistent presence then by an academic blogger can perform magic. It begins to tell Google what’s important. Blogging is just a medium, not a genre. It’s a content management system. It’s unfortunate that so many academics are turned off by the word ‘blog’, because they are actually missing an important new venue for communicating what they do to the wider world. In this day and age, if you’re not making that argument for, someone else will make the argument against, and it becomes very easy for a decision maker in government to say, ‘what good is x? Let’s cut its funding’.  All archaeology is public archaeology, ultimately. And we ignore that at our peril. We need to create the strongest signal in the noise that we can: and blogging is a crucial part of that. ‘Omnia disce; postea videbis nihil esse superfluum’. Google learns everything, but it still needs to be taught.

That’s our job.

The Archaeological Blogosphere

The archaeological blogosphere [zoomable pdf of image] is strangely beautiful. I generated this by scraping over 8000 pages from a Google Search of ‘Blogging Archaeology’. MiddleSavagery sits right there in the middle of the Green Zone. For more on this, and what it means, see my discussion tomorrow at the Society for American Archaeology’s general meeting.

If you’d like to play with the files and data I scraped send me a note.  You’ll need Gephi. To do your own crawling, you might try this.

Diameter: 8
Average Path Length: 3
Filtering the network so that only nodes with more than 30 connections appear reduces the graph to ~1.5% of the nodes and 10% of all the connections
Leaves us with 4 communities, per modularity detection:

The archaeological blogosphere: green

The cloud: light blue (Google, Amazon, YouTube)

Social Media: Purple (Facebook, Twitter; also online newspapers)

News aggregators: Red (
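The filtering step listed above can be sketched as follows, again in plain Python on an undirected graph stored as neighbour sets. This is a single-pass filter on each node's degree in the full graph; whether that matches exactly what Gephi's filter did is an assumption on my part, and the little hub-and-spoke graph is invented.

```python
def filter_by_degree(graph, min_degree):
    """Keep nodes with more than min_degree connections, and the edges among them."""
    keep = {v for v, nbrs in graph.items() if len(nbrs) > min_degree}
    return {v: graph[v] & keep for v in keep}

def shares_kept(graph, filtered):
    """Fraction of nodes and fraction of edges that survive the filter."""
    edges = sum(len(n) for n in graph.values()) // 2
    kept_edges = sum(len(n) for n in filtered.values()) // 2
    node_share = len(filtered) / len(graph) if graph else 0.0
    edge_share = kept_edges / edges if edges else 0.0
    return node_share, edge_share

# Hypothetical graph: a hub with four spokes, two of which also know each other
toy = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub", "b"}, "b": {"hub", "a"},
    "c": {"hub"}, "d": {"hub"},
}
core = filter_by_degree(toy, 1)
print(sorted(core), shares_kept(toy, core))
```

The point of reporting both shares is the one made in the list: a tiny fraction of well-connected nodes can account for a disproportionate share of the connections.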

The unprocessed network is shown below:

Blogging Archaeology: Remembering that we’ve been here before when we ask ‘Where to next?’

For the last of Blogging Archaeology (on Twitter under #blogarch), Colleen asks:

For our last question, I would like to ask you to consider the act of publication for this blog carnival. How could we best capture the interplay, the multimedia experience of blogging as a more formalized publication? What would be the best outcome for this collection of insights from archaeological bloggers?

The relationship between blogging and other academic forms of discourse is certainly in the aether right now. One need only look at ‘Hacking the Academy‘ or Ian Bogost’s thoughts on ‘Beyond Blogs‘ to see that we archaeo-bloggers are not alone in considering these questions. For me, at this point in my career, the best outcome of this rich back-and-forth we’ve been having is some sort of refereed publication. Some time ago, Tom Scheinfeldt of the CHNM suggested that all of us digitally inclined folks should start producing digital cvs (an excellent example of which is here). In response to Tom’s argument, Adam Crymble wrote, “The vulnerable have to eat, so they have to play the game. The strong can change the rules.” I think we #blogarchers have an opportunity to try to change the rules, but I think too that we are all of us vulnerable: and so what would secure us is if we can fit this new medium of expression into the safe boxes required by our annual assessments – hence a refereed publication, knowing full well how awkward that would be to produce.

That said, were I invulnerable, what might I do? Well, I think I might recycle an olde post here. In that post, I talk about Anthologize, a WordPress plugin for turning blogged content into the safer, more comfortable multi-authored volume. And then maybe look at releasing it via Kindle Singles?

For an aborted attempt at putting individual blog posts together into a quasi-refereed quarterly publication, see here and here on ‘PDQ – The Past Discussed Quarterly’. Maybe an idea whose time has finally come? A number of us were involved in that project, but given where we all were (and the various stages of our careers and lives), we couldn’t make it work. Here was our optimistic blurb:

From the official website:

PDQ is a journal designed to provide a bridge between blogging and academia. It will provide stable citeable references for selected weblog posts focussed upon or of interest to the pre-Renaissance past. It is compiled from articles submitted by bloggers on a quarterly basis. The journal is available in three formats. There is a PDF downloadable copy for free. There is a paper copy which can be ordered via Lulu, which is set to the cost of printing and delivery only. Finally we intend that the journal will also be placed in a repository for long-term curation. Until the details are finalised it will be available in XHTML format from a server based at NYU’s Institute for the Study of the Ancient World.

PDQ is released under a Creative Commons BY-NC-ND licence, making it freely copyable.

Day of Digital Humanities

The Day of Digital Humanities is upon us. I will be chronicling my day over here. Here’s what my first post looks like…


It takes roughly an hour or so to get here in the morning. I have to navigate across one of the oldest bridges on the Ottawa River, and it’s always jam-packed. On the plus side, it takes me through some of the oldest industrial heritage in the region. Ottawa was one of the first cities in Canada (perhaps North America?) to become electrified, courtesy of the power of the Chaudiere Falls, which is where the bridge crosses.  Next year, I’m teaching a course in Digital History that will focus on this complex, using augmented reality as our expressive form. Each morning then I start by thinking about that class, and what we will do in this amazing spot.

But this morning, I have my first year Digital History seminar to prepare. We get going in about twenty minutes. Attendance on a Friday AM is always a bit sparse. But today we’re working en masse on their group projects. We’ve partnered with a community organization, the Council of Heritage Organizations of Ottawa and their crowdsourcing history portal, Ottawagraphy. My students are preparing projects that will be hosted on Ottawagraphy, on different aspects of Ottawa’s history. One group of students is working on a smart phone guided tour of the Parliament Hill precinct; others on the development of various neighborhoods. It’s a pretty eclectic mix. What’s exciting about it is that these students were not overly critical consumers or producers of digital content when they started – I think they’ve come a pretty long way.

And so begins my day of Digital Humanities.


Visualizing Archaeology Blogging, or, Is Anybody Listening?

This image represents all of the contributions in response to Colleen’s first question for the Blogging Archaeology Carnival. It was created in Gephi using the HTTP Graph plugin. With Gephi open and running, you set your browser to pass its information through Gephi, which then represents all of the resulting data in terms of its network relationships.

So, I began by pointing my browser to Colleen’s post. Data began to fill the Gephi window. Then, I clicked on each link in turn, which would pour more data into Gephi. I returned to Colleen’s post, and then clicked on the next link. And so on. The resulting image (click here for an svg/pdf higher resolution image) shows how we’re all interconnected. One can automate this process by using Chrome with a web crawler (or see the video).
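If you would rather script the clicking than do it by hand, the process described above boils down to: fetch a page, collect its links, record each (page, link) pair as an edge, and repeat for pages you have not seen yet. The sketch below is my own minimal version using only Python's standard library; it is not the HTTP Graph plugin, and `crawl`, `LinkCollector`, and the `fetch` callable are names invented for this example. Fetching is passed in as a function so the logic can be tried on canned pages before being pointed at the live web.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from the <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # resolve relative links and drop any #fragment
                absolute, _fragment = urldefrag(urljoin(self.base_url, href))
                if absolute.startswith(("http://", "https://")):
                    self.links.add(absolute)

def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl. fetch(url) returns HTML text, or None if unavailable.

    Returns a set of (source_url, target_url) edges.
    """
    edges, seen, queue = set(), {start_url}, deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = LinkCollector(url)
        parser.feed(html)
        for target in parser.links:
            edges.add((url, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return edges

# Canned pages standing in for the web (example.org URLs are placeholders)
pages = {
    "http://example.org/": '<a href="/a">A</a> <a href="http://example.org/b#top">B</a>',
    "http://example.org/a": '<a href="/">home</a>',
}
for source, target in sorted(crawl("http://example.org/", pages.get)):
    print(source, "->", target)
```

The resulting pairs could be written out as a two-column Source,Target spreadsheet, which is a shape Gephi can import as an edge list. For a real crawl you would also want politeness delays and a check of each site's robots.txt, neither of which is shown here.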

(by the way, you could use this to visualize all sorts of relations scraped from online databases – that’s a post for another day)

So, in response to the questions posed for this week’s edition of #blogarch, I would say that one way I try to understand where my blogging fits into the wider ecosystem is to actually map it out from time to time. A bit of navel gazing I suppose, but who hasn’t googled themselves at one point or another? My more serious point is to build on Bill’s observation:

Of course the model for understanding blogs that downplays the atomized post:comment relationship is not a product of the digital age and the internet.  In fact, I think that the way most people read and write to the web has close parallels with traditions of modern academic writing and reading.  Most academics do not pause to comment on specific articles or even individual conference paper (although books and reviews are an exception); instead they build references to these articles into their own work through the predecessor of hyperlinks: footnotes.  The networks that have emerged among bloggers find have nice parallels with the intellectual networks manifest in academic citations. The biggest difference between the two practices is the speed with which the discourse can develop (and evaporate) through digital publication.

I was over the moon when I got my first comment on my blog, oh-so-long-ago; I was especially chuffed when Bill had kind things to say about my blogging too (thanks Bill!). Nowadays what comments I get tend, on average, to be spam. Like Bill (and I suspect, everyone else) I sometimes get emails, phone calls, or ‘by the way’ notes that reference something I have blogged. I recently heard that a class at York in the UK uses some of my blogs in their course work (as examples of best practice or good ideas, I hope!). In which case, I think it is a useful exercise to try to map out the networks that we are creating through this prolonged short-form engagement with the profession, the public, and our subject matter. Blogging sometimes is a bit like “launch and forget”… but we need to have some idea who our community is and how far our thoughts are likely to percolate. We need to be aware of possible network effects in our blogging, and to use these to get our professional voice out there in those top five search results. Is anybody listening? Yes, probably; what I’ve tried to do in my little experiment today is to show how we can begin to approach the question of ‘who?’.

Blogging Archaeology at the SAA – What do you blog?

Colleen asks,

Beyond the general problems that come with performing as a public intellectual, what risks do archaeologists take when they make themselves available to the public via blogging? What (if any) are the unexpected consequences of blogging? How do you choose what to share?

When I started this blog, back in my dark days in the academic wilderness (ca 2007), this question was easy to answer. I blogged whatever caught my fancy, as long as it fit with my general theme (see the masthead). I tried to read widely, outside my comfort zone, with the idea that I could find interesting digital applications from other fields, reporting back to my archaeological readership.

Which, at the time, was an audience of one (thanks Mom!)

But I persevered, and continued to write, and the number of people I reach has increased quite nicely, thank you. One unintended consequence of that increase in numbers was the way that the feedback stats (the drug which I mentioned in the previous post) started to form and push subjects that I would write about. It forms a positive feedback loop… and suddenly, your blog isn’t quite what it’s supposed to be about, any more.

You become a bit of a cyborg, where your relationship with the machine starts to influence what you write, and who you read. On that note, some work by machine-learning experts S. Bethard and D. Jurafsky takes that idea to a new level. In “Who should I cite? Learning literature search models from citation behavior” (ACM Conf. on Information & Knowledge Management), they describe a system where, given the text of an abstract for a scholarly paper, the machine can predict what the paper OUGHT to cite, given the kinds of things other papers written on similar topics tend to cite. Think IBM’s Watson for academic research.

Machine learning + echo chamber = canalization function, where a certain delimited number of authors become authorities.

This would be a very bleak outcome, indeed; but my correspondent who alerted me to the paper argues instead that it would be easy to tweak such a system to introduce an element of serendipity (perhaps this is what DevonThink does; I’m not a mac person, so see Steve Johnson‘s descriptions of using it).  So another unintended consequence of blogging is that, in reaction to the feedback loop I described above, I try to pay attention to the serendipitous, the marginal, the things on the outside: and use those to inform my writing.

Finally, I’ve come to realize that a blog can be at its most useful, its most powerful, when it chronicles failure. “Happy families are all alike; every unhappy family is unhappy in its own way” said Tolstoy. If I try an experiment, and record on my blog that “Hooray! it worked just as I wanted it to!”, I don’t know whether I’ve really accomplished anything. On the other hand, My Glorious Failure is my most-read article on Play the Past. Knowing what doesn’t work helps you explore the phase-space of possibility.

The greatest unintended consequence of blogging? The discipline of keeping this blog, of reading and writing to my digital media theme, turned me from just a Roman archaeologist with technological leanings, into a digital humanist. I know this, because it says so on my door. I don’t think I would have the position I have today, without the progression of this blog to shape me.

Of course, the following is still true too…

Blogging Archaeology at the SAA – Why Blog?

Colleen asks,

The emergence of the short form, or blog entry, is becoming a popular way to transmit a wide range of archaeological knowledge. What is the place of this conversation within academic, professional, and public discourse? Simply put, what can the short form do for archaeology?

Blogging exhausts me. Blogging, as an art form, has that immediate feedback drug, the Statistics Page. Did I connect with anyone when I wrote x? Who has linked to post y? It’s a form of academic grinding; thus, I am exhausted. But I need another hit.

Almost a year ago, I wrote a post called ‘Why Academic Blogging Matters‘. My conclusions there are still appropriate:

[…]Academic blogs are content-rich, and tend to focus on very specific areas. We create an enormous signal in the chaos of the internet. This blog, Electric Archaeology, consistently shows up on Google search results for a wide variety of [knowledge] domains. […]
Google controls how we find information; but often, academic blogging tells Google what’s important.

Yes, that’s still true. But ‘important’… well, that’s the thing. Recently, the New York Times published an article showing how one shady businessman was able to game Google’s system, on the principle that any publicity is good publicity.  This suggests that the signal we create, as far as Google is concerned, doesn’t have to be ‘good’ or ‘bad’ – it just has to be constant; it just has to be strong. In which case, let’s return to the concept of ‘grind’.  Grinding is “a term used in video gaming to describe the process of engaging in repetitive and/or non-entertaining gameplay in order to gain access to other features within the game” (Wikipedia).

Academic blogging then forces us to grind away, putting out pieces in an effort to rise to the top, to be the first piece on a given topic that the surfer encounters as they look for information. We have to drown out ‘bad’ signals and replace them with ‘good’. The grind forces us to become better, more concise, at expressing an idea. It forces us to be topical – to write about what people are interested in. As we get better at the grind, we can begin to shape what people are interested in, since our results come out on top (ask your students how they know whether a search result is ‘valid’ or ‘good’ or ‘important’, and you’ll get a variant on ‘it was the first Google response’). That’s how we access the other features in this game of archaeology. We shape the public face. The grind demands that we try things out in public (so that we have something to write about), and we enter into conversation with others about the results. The grind demands that the curtain gets pulled back. Already, this kind of knowledge production is having repercussions for traditional modes of publication.

The blog as grind can be gamed though. For instance, at Play The Past, we have a group of around a dozen interested individuals. Because the grind is divided by twelve, we keep a constant stream of material bubbling away, but no one person gets exhausted.  It took me three years of blogging *extremely regularly* to get over 100 000 unique page views here – but Play the Past achieved this in one month.

Signal versus noise. As academics, we are obliged to create the strongest signal we can.