Text Analysis of 2012 Digital Humanities Job Adverts part 2
If we look at simple word frequencies in the 2012 job advertisement documents for Digital Humanities, we find these top words and raw frequency counts:
research 650
university 577
experience 499
library 393
work 334
information 303
position 299
project 269
applications 257
(I’ve deleted ‘digital’ and ‘humanities’ from this list).
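For anyone who wants to try the counting step outside Voyant, here is a minimal sketch in Python. It is not what Voyant does internally: the tokenizer is crude, the `STOPWORDS` set is a tiny made-up stand-in for a real stop list, and the two sample adverts are invented for illustration.

```python
import re
from collections import Counter

# The two words the post says were removed from the list.
EXCLUDED = {"digital", "humanities"}
# Hypothetical, very small stop list; a real one would be much longer.
STOPWORDS = {"the", "at", "a", "an", "of", "and", "in", "to"}

def top_words(texts, n=10, skip=EXCLUDED | STOPWORDS):
    """Return the n most frequent lowercase word tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer: runs of letters, lowercased.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in skip)
    return counts.most_common(n)

# Invented sample adverts, only to show the shape of the input.
adverts = [
    "Digital humanities research position at the university library.",
    "Research experience required; the university values project work.",
]

for word, count in top_words(adverts, n=5):
    print(word, count)
```

Swapping in the real advert texts and a fuller stop list should give a list in the same spirit as the one above, though the exact counts will depend on how the text is tokenized.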
If job advertisements are a way of signalling what an institution hopes the future will hold, one gets the sense that the focus of digital humanities work will be on projects, on research, in conjunction with libraries. But we can extract more nuance, using network analysis. You can feed the texts into Voyant’s ‘RezoViz’ tool, which extracts paired nouns in each document.
This can be exported as a .net file, and then imported into Gephi. The resulting graph has 1461 nodes and 20649 edges. Of course, there are some duplicates (like ‘US’ and ‘United States’), but this is only meant to be rough and ready, ‘generative’, as it were. (Note also that a network visualization is not necessary for the analysis. So no spaghetti balls. What’s important are the metrics.) What I’d like to find out is: which concepts are doing the heavy lifting in these job advertisements? What is the hidden structure of the future of digital humanities, as evidenced by job advertisements in the English-speaking world?
My suspicion is that ‘modularity’, aka ‘community detection’, and ‘betweenness centrality’ are going to be the key metrics for figuring this out. Modularity groups nodes on the basis of shared similar local patternings of ties (or, to put it another way, it decomposes the global network into maximal subnetworks). Seth Long recently did some network analysis on the Unabomber’s manifesto, and lucidly explains why betweenness centrality is a useful metric for understanding semantic meaning: “A word with high betweenness centrality is a word through which many meanings in a text circulate.” In other words, the heavy lifters.
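If you would rather poke at the .net file directly, here is a rough sketch of reading it in Python. I am assuming the file follows the simple Pajek NET layout (a `*Vertices` section of numbered, quoted labels, then `*Edges` or `*Arcs` lines of `source target weight`); the sample text and the alias table are made up. Merging ‘US’ into ‘United States’ is an optional tidy-up, not something I did in the analysis here.

```python
import shlex

def read_pajek(lines):
    """Parse a simple Pajek NET listing into (labels, edges)."""
    labels = {}   # vertex id -> label
    edges = []    # (source label, target label, weight)
    section = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blanks and comment lines
        if line.startswith("*"):
            section = line.split()[0].lower()
            continue
        parts = shlex.split(line)  # keeps "quoted labels" together
        if section == "*vertices":
            labels[parts[0]] = parts[1] if len(parts) > 1 else parts[0]
        elif section in ("*edges", "*arcs"):
            weight = float(parts[2]) if len(parts) > 2 else 1.0
            edges.append((labels[parts[0]], labels[parts[1]], weight))
    return labels, edges

def merge_aliases(edges, aliases):
    """Collapse duplicate labels and sum the weights of the merged ties."""
    merged = {}
    for a, b, w in edges:
        a, b = aliases.get(a, a), aliases.get(b, b)
        if a == b:
            continue  # drop self-loops created by the merge
        key = tuple(sorted((a, b)))
        merged[key] = merged.get(key, 0.0) + w
    return merged

# Invented sample in the assumed layout.
sample = '''*Vertices 3
1 "United States"
2 "US"
3 "New York"
*Edges
1 3 2
2 3 1
'''.splitlines()

labels, edges = read_pajek(sample)
print(merge_aliases(edges, {"US": "United States"}))
```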
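To make the ‘many meanings circulate through it’ idea concrete, here is a small sketch of betweenness centrality in plain Python, using Brandes’ algorithm on an unweighted, undirected graph. Gephi did the calculation for this post, so this is an illustration of the metric rather than a record of what Gephi does internally. The toy edge list is invented, and modularity is not covered here.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    Returns unnormalized betweenness scores.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

def build_adjacency(edge_list):
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

# Invented toy network: two tight clusters joined by a single tie.
toy_edges = [
    ("CSS", "PHP"), ("PHP", "Python"), ("CSS", "Python"),
    ("Python", "London"),
    ("London", "Dublin"), ("London", "UK"), ("Dublin", "UK"),
]
scores = betweenness(build_adjacency(toy_edges))
for word, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(word, value)
```

In the toy network the two words holding the clusters together score highest, and everything else scores zero: that is the sense in which a word can be a heavy lifter.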
So let’s peer into the future.
I ended up with about 15 groups. The first three groups by modularity account for 75% of the nodes, and 80% of the ties. These are the groups where the action lies. So let’s look at words with the highest betweenness centrality scores for those first three groups.
The first group
University
CSS
PHP
Digital
Ruby
METS (Metadata encoding and transmission standard)
United States
Python
MLS
New York
‘University’ is not surprising, and not useful. So let us discard it and bring in the next highest word:
MySQL
This one group by modularity also has all of the highest betweenness centrality scores – and it reads like a laundry list of the skills a budding DH practitioner must hold. The US, and New York would seem to be the centre of the world, too.
If we take the next ten words, we get:
MODS (Metadata Object Description Schema)
XHTML
University Libraries
CLIR (Council on Library and Information Resources)
University of Alberta
North America
Drupal
XML
MARC
Duke University
Again, skills and places figure – in Canada, U of A appears. So far, the impression is that DH is all about text, markup, and metadata. Our favorite programming languages are Python and Ruby. We use PHP, XHTML, XML, and Drupal (plain-jane vanilla HTML eventually turns up in the list, but it’s buried very, very deep).
So that’s an impression of the first group. (Remembering that groups are defined by patterns of similarity in their linkages).
The Second Group
The next group looks like this:
Digital Humanities
London
UK
CV
Dublin
Europe
Ireland
ICT
Department of Digital Humanities
Department of History
“digital humanities” is probably not helpful, so let’s eliminate that and go one more down: “US”. Indeed, let’s take a look at the next ten, too:
Human Resources
Department
Computer Science
BCE
Head of School
Faculty of Humanities
European
University of Amsterdam
MA
Italy
Here, we’re dealing very much with a UK, Ireland, and European focus. The ‘BCE’ is telling, for it suggests an archaeological focus in there, somewhere (unless this is some new DH acronym of which I’m not aware; I’m assuming ‘before the common era’).
The Third Group
In the final group we’ll consider here, we find a strong Canadian focus:
CRC (Canada Research Chair)
Canada
Waterloo
TEI (Text Encoding Initiative)
SSHRC
Victoria
Canada Research Chair
Skype
Digital Humanities Summer Institute
University of Victoria
Since we’ve got some duplication in here, let’s look at the next ten:
Canadian
Quebec
ETCL (Electronic Textual Cultures Laboratory, U Victoria)
Montreal
Concordia
University of Waterloo
DHSI (Digital Humanities Summer Institute)
Stratford
Faculty of Arts
Stratford Campus
‘Canada Research Chairs’ are well-funded government appointments, and so give an indication of where the state would like to see some research. Victoria continually punches above its weight, with look ins from Waterloo and Concordia.
So what have we learned? Well, despite the efforts of the digital history community, ‘digital humanities’ is still largely a literary endeavor – although it’s quite possible that a lot of the marking up that these job advertisements envision could be of historical documents. Invest in some Python skills (see Programming Historian). My friend in government tells me that if you can data mine, you’ll be set for life, as the government is looking for those skills. (Alright, that didn’t come out in this analysis at all, but he’s looking over my shoulder right now).
Finally – London, Dublin, New York, Edmonton, Victoria, Waterloo, Montreal – these seem to be the geographic hotspots. Speaking of temperature, Victoria has the nicest weather. Go there, young student!
Or come to Carleton and study with me. We’ve got tunnels.
Update, March 4th. [Figure: jobs-topics-dh as a network graph.] In the analysis above, I generated a network using Voyant’s RezoViz tool. Today, I topic modelled all of the texts, looking for 10 topics. So, a slightly different approach. I turned the resulting document composition (i.e. doc 1 is 44% topic 1, 22% topic 4, 10% topic 3, etc.) into a two-mode graph, tying each job advert to its top two constituent topics. I then turned this into a one-mode graph where job adverts are tied to other job adverts based on topic composition. Then I ran modularity, and found 3 groups; edges are percent composition by topics discerned through topic modeling. Nodes show ‘betweenness centrality’. Most between? George Mason University. I’m not sure yet what ‘betweenness centrality’ means in this context, though.
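The two-mode to one-mode step can be sketched in a few lines of Python. The composition figures below are invented, and the edge weight (summing, over shared topics, the smaller of the two adverts’ proportions) is my own assumption for the sake of the example; it is only one of several reasonable ways to weight the projection.

```python
from itertools import combinations

# Hypothetical composition rows: advert -> {topic number: proportion}.
composition = {
    "advert_1": {1: 0.44, 4: 0.22, 3: 0.10},
    "advert_2": {4: 0.50, 1: 0.30, 2: 0.05},
    "advert_3": {2: 0.60, 5: 0.25, 1: 0.05},
}

def top_topics(row, k=2):
    """Keep only the k topics with the largest share of the document."""
    ranked = sorted(row.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def two_mode(composition, k=2):
    """Two-mode graph: each advert tied to its top k topics."""
    return {doc: top_topics(row, k) for doc, row in composition.items()}

def one_mode(bipartite):
    """One-mode graph: adverts tied to adverts that share a top topic."""
    edges = {}
    for a, b in combinations(sorted(bipartite), 2):
        shared = set(bipartite[a]) & set(bipartite[b])
        if shared:
            # Assumed weighting: overlap in proportion on each shared topic.
            weight = sum(min(bipartite[a][t], bipartite[b][t]) for t in shared)
            edges[(a, b)] = round(weight, 4)
    return edges

bipartite = two_mode(composition)
print(one_mode(bipartite))
```

The resulting edge list can be written out as a CSV of source, target, and weight and loaded into Gephi for the modularity and betweenness runs.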
Makes for interesting clusters of job adverts. Topic model results to be discussed tomorrow.
Text analysis of 2012 Digital Humanities Job Adverts

2012 was a good year for hirings in the digital humanities. See for yourself at this archive of DH jobs: http://jobs.lofhm.org/ Now: what do these job adverts tell us, if you’re a graduate student trying to find your way?
Next week, I’m speaking to the Underhill Graduate Students’ Colloquium at Carleton University on ‘Living the life electric: becoming a digital humanist’. It’s broadly autobiographical in that I’ll talk about my own idiosyncratic path into this field.
That’s quite the point: there’s no firm/accepted/typical/you-ought-to-do X recipe for becoming a digital humanist. You have to find your own way, though the growing body of courses, books, journals, blog-o-sphere and twitterverse certainly makes a huge difference.
But in the interests of providing perhaps a more satisfying answer, I’ll try my hand at data mining those job posts (some 150 of them) using Voyant and MALLET to see what augurs for the future of the field.
Feel free to explore the corpus uploaded into Voyant. In any graphs you produce, January is on the left, December is on the right. If you spot anything interesting/curious, let me know.

And, because word counts are amazing:
| Word | Count |
| digital | 1082 |
| research | 650 |
| university | 577 |
| experience | 499 |
| library | 393 |
| humanities | 386 |
| work | 334 |
| information | 303 |
| position | 299 |
| project | 269 |
| applications | 257 |
| new | 223 |
| faculty | 222 |
| development | 216 |
| collections | 210 |
| department | 207 |
| management | 206 |
| projects | 195 |
| knowledge | 192 |
| data | 187 |
| including | 185 |
| ability | 182 |
| services | 180 |
| teaching | 180 |
| history | 177 |
| libraries | 176 |
| skills | 176 |
| qualifications | 172 |
| technology | 169 |
| required | 166 |
| media | 163 |
| jobs | 151 |
| application | 149 |
| original | 146 |
| program | 145 |
| link | 143 |
| web | 143 |
| working | 142 |
| loading | 140 |
| related | 140 |
| staff | 138 |
| academic | 137 |
| communication | 133 |
| job | 132 |
| college | 130 |
| degree | 127 |
| professor | 126 |
| education | 125 |
| students | 125 |
| studies | 123 |
Some Assembly Required: Teaching through/with/about/by/because of, the Digital Humanities
I’m to speak at the Canadian Network for Innovation in Education conference at Carleton in May; I’m one of the keynotes. I’ve never done a keynote before… I have a great fear of bringing coals to Newcastle, as it were. Pressed for a title and an abstract, this is what I’ve come up with (for good or ill):
Some Assembly Required
Every day, another university signs up to participate in Udacity, Coursera, or another of the monster MOOCs. Every day, another job posting makes ‘digital humanities’ a requirement. These two trends are not unrelated. Canadians have been at the forefront of massively open online courses, and in work that has come to be known as ‘digital humanities’, long before the current mania. In this talk, I want to tease apart the strands and histories that conflate these two trends. I want to look at how a perspective grounded in the digital humanities (whatever they are) is not just the latest trend, but rather a prism with a deep history through which we can refract our teaching and learning, and where MOOCs can be transmogrified into good pedagogy. Some assembly is required, and in neither trend can humans be replaced. Rather, the technology requires a humanities perspective in order for it to achieve its greatest potentials.
I’d be happy to hear people’s thoughts on this – inverting the normal order of things, soliciting comments before the paper…
Partly as a result of speaking at this conference (and also a wedding to attend that week) I won’t be able to hit a graduate student conference on the digital humanities happening one building over.
p3d.in for hosting your 3d scans
I’m playing with p3d.in to host some three dimensional models I’ve been making with 123D Catch. These are models that I have been using in conjunction with Junaio to create augmented reality pop-up books (and other things; more on that anon). Putting these 3d objects onto a webpage (or heaven forbid, a pdf) has been strangely much more complicated and time-consuming. P3d.in serves a very useful purpose, then!
Below are two models that I made using 123D Catch. The first is the end of a log recovered from anaerobic conditions at the bottom of the Ottawa River (which is very, very deep in places). The Ottawa was used as a conduit for floating timber from its enormous watershed to markets in the US and the UK for nearly two hundred years. Millions of logs floated down annually… so there’s a lot of money sitting down there. A local company, Log’s End, has been recovering these old growth logs and turning them into high-end wide plank flooring. They can’t use the ends of the logs as they are usually quite damaged, so my father picked some up and gave them to me, knowing my interest in all things stamped. This one carries an S within a V, which dates it to the time and timber limits of J.R. Booth, I believe.
And here we have one of the models that my students made last year from the Mesoamerican materials conserved at the Canadian Museum of Civilization (soon-to-be-repurposed as the Museum of Canadian History; what will happen to these awkward materials that no longer fit the new mandate?)
PS
Incidentally, I’ve now embedded these in a Neatline exhibition I am building:

3d manipulable objects in time and space
Why I Play Games
(originally posted at #HIST3812, my course blog for this term’s History3812: Gaming and Simulations for Historians, at Carleton University).
I play because I enjoy video games, obviously, but I also get something else out of it. Games are a ‘lively art’; they are an expressive art, and the artistry lies in encoding rules (descriptions) about how the world works at some microlevel: and then watching how this artistry is further expressed in the unintended consequences of those rules, their intersections, their cancellations, causing new phenomena to emerge.
This strikes me as the most profound use of humanities computation out there. Physicists tell us that the world is made of itty bitty things that interact in particular ways. In which case, everything else is emergent: including history. I’m not saying that there are ‘laws’ of human action; but we do live in this universe. So, if I can understand some small part of the way life was lived in the past, I can model that understanding, and explore the unintended outcomes of that understanding… and go back to the beginning and model those.
I grew up with the video game industry. Adventure? I played that. We had a VIC-20. If you wanted to play a game, you had to type it in yourself. There used to be a magazine (Compute!) that would have all of the code printed within, along with screenshots. Snake, Tank Wars – yep. My older brother would type, and I would read the individual letters (and spaces, and characters) out. After about a week, we’d have a game.
And there would be bugs. O lord, there were bugs.
When we could afford games, we’d buy text adventures from Infocom. In high school, my older brother programmed a quiz game as his history project for the year. Gosh, we were cool. But it was! Here we were, making the machine do things.
As the years went on, I stopped programming my own games. Graphics & technology had moved too fast. In college, we used to play Doom (in a darkened room, with the computer wired to the stereo. Beer often figured). We played SimCity. We played the original Civilization.
These are the games that framed my interactions with computers. Then, after I finished my PhD, I returned to programming when I realized that I could use the incredible artificial intelligences, the simulation engines, of modern games, to do research. To enhance my teaching.
I got into Agent Based Modeling, using the Netlogo platform. This turned my career around: I ceased to be a run-of-the-mill materials specialist (Roman archaeology), and became this new thing, a ‘digital humanist’. Turns out, I’m now an expert on simulation and history.
Cool, eh?
And it’s all down to the fact that I’m a crappy player of games. I get more out of opening the hood, looking at how the thing works. Civilization IV and V are incredible simulation engines. So: what kinds of history are appropriate to simulate? What kinds of questions can we ask? That’s what I’m looking forward to exploring with you (and of course, seeing what you come up with in your final projects).
But maybe a more fruitful question to start with, in the context of the final project of this course, is, ‘what is the strangest game you’ve ever played?’
What made it strange? Was it the content, the mechanics, the interface?
I played one once where you had to draw the platform with crayons, and then the physics engine would take over. The point was to try to get a ball to roll up to a star. Draw a teeter-totter under the star, and perhaps the ball would fall on it, shooting the star up to fall down on the ball, for instance. A neat way of interacting with the underlying physics of game engines.
I’d encourage everyone to think differently about what the games might be. For instance, I could imagine a game that shows real-time documents (grabbed from a database), and you have to dive into it, following the connected discourses (procedurally generated using topic models and network graphing software to find these – and if this makes no sense to you, take a quick peek at the Programming Historian) within it to free the voices trapped within…
This is why I play. Because it makes me think differently about the materials I encounter.
A history game brainstorming exercise
Tomorrow in my HIST3812 I want to get students thinking about the kinds of history that might be appropriate to embody in a game or simulation, and the experience of such games. Inspired by something we did at THATCamp Great Lakes, I’ve taken a deck of cards and divided it into ‘historiography (hearts)’, ‘genre (spades)’, and ‘aesthetic (clubs)’. Here’s the prompt for the exercise:
“I will give you cards from three different decks:
- historiography (Hearts)
- genre (Spades)
- aesthetic (Clubs)
Look at your cards. In your groups, brainstorm a quick idea for a game using those cards. If, after five minutes, you’ve hit a blank, you may exchange one card, and one card only. Note that nothing is being said about mechanics…
(what you come up with today is not necessarily what you have to go with for the term. This is just meant to get you thinking.)
| Historiography (Hearts) | Genre (Spades) | Aesthetics (Clubs) |
| 1 – Comparative | 1 – ARG | 1 or A – sensation |
| 2 – Cultural | 2 – Platformer | 2 or K – fantasy |
| 3 – Oral | 3 – Shooter | 3 or Q – narrative |
| 4 – Economic | 4 – Action-adventure | 4 or J – challenge |
| 5 – Environmental | 5 or 10 – Adventure | 5 or 10 – fellowship |
| 6 – World | 6 or J – RPG | 6 or 9 – discovery |
| 7 – Family | 7 or Q – Simulation | 7 – submission |
| 8 – Gender | 8 or K – Strategy | 8 – expression |
| 9 – Religious | 9 – Casual | |
| 10 – Intellectual | A – Serious | |
| J – Labour | | |
| Q – Marxist | | |
| K – Microhistory | | |
| A – Public | | |
Textexturing My Writing
I fed two recent posts, ‘Evaluating Digital Humanities’ and ‘Deformative Digital Archaeology’, into Textexture.com. Textexture topic models your input texts, and then visualizes them via Gephi so that you can explore the interlinkages of topics/discourses whilst revisualizing them at the same time. You can play and explore the results for yourself at:
http://textexture.com/index.php?text_id=6941 Evaluating Digital Humanities
http://textexture.com/index.php?text_id=6943 Deformative Digital Archaeology.
You’ll want to hit ‘start layout’ to make these look a bit more presentable. Note that you can also download the gexf file itself, to open in Gephi, to try other layouts/metrics.
I find it reassuring, somehow, that natural divisions in my texts show up in the results (for instance, in the second image, the code explications are clearly distinct, in red, from the broader discussion on the nature of digital archaeology, in blue). Unfortunately, Textexture only deals with relatively smallish chunks of text for now.
Evaluating Digital Work in the Humanities
Leave it to an archaeologist, but when I heard the CFP from Digital Humanities Now on ‘evaluating’ digital work, I immediately started thinking about typologies, about categorizing. If it is desirable to have criteria for evaluating DH work, then we should know roughly the different kinds of DH work, right? The criteria for determining ‘good’ or ‘relevant’, or other indications of value will probably be different, for different kinds of work.
In which case, I think there are at least two dimensions, though likely more, for creating typologies of DH work. The first – let’s call it the Owens dimension, in honour of Trevor’s post on the matter – extends along a continuum we could call ‘purpose’, from ‘discovery’ through to ‘justification’. In that vein I was mulling over the different kinds of digital archaeological work a few days ago. I decided that the closer to ‘discovery’ the work was, the more it fell within the worldview of the digital humanities.
The other dimension concerns computing skill/knowledge, and its explication. There are lots of levels of skill in the digital humanities. Me, I can barely work Git or other command-line interventions, though I’m fairly useful at agent simulation in Netlogo. It’s not the kinds of skills here I am thinking about, but rather how well we fill in the blanks for others. There is so much tacit knowledge in the digital world. Read any tutorial, and there’s always some little bit that the author has left out because, well, isn’t that obvious? Do I really need to tell you that? I’m afraid the answer is yes. “Good” work on this dimension is work that provides an abundance of detail about how the work was done so that a complete neophyte can replicate it. This doesn’t mean that it has to be right there in the main body of the work – it could be in a detailed FAQ, a blog post, a stand alone site, a post at Digital Humanities Q&A, whatever.
For instance, I’ve recently decided to start a project that uses Neatline. Having put together a couple of Omeka sites before, and having played around with adding plugins, I found that (for me) the documentation supporting Neatline is quite robust. Nevertheless, I became (am still) stumped on the problem of the geoserver to serve up my georectified historical maps. Over the course of a few days, I discovered that since Geoserver is java-based, most website hosting companies charge a premium or monthly charge to host it. Not only that, it needs Apache Tomcat installed on the server first, to act as a ‘container’. I eventually found a site – Openshift - that would host all of this for free (! cost always being an issue for the one-man-band digital humanist), but this required me to install Ruby and Git on my machine, then to clone the repository to my own computer, then to drop a WAR file (as nasty as it sounds) into the webapps folder (but what is this? There are two separate webapp folders!) , then ‘commit, push’ everything back to openshift. Then I found some tutorials that were explicitly about putting Geoserver on Openshift, so I followed them to the letter…. turns out they’re out of date and a lot can change online quite quickly.
If you saw any of my tweets on Friday, you’ll appreciate how much time all of this took…. and at the end of the day, still nothing to show for it (though I did manage to delete the default html). Incidentally, Steve from Openshift saw my tweets and is coaching me through things, but still…
So: an important axis for evaluating work in the digital humanities is explication. Since so much of what we do consists of linking together lots of disparate parts, we need to spell out how all the different bits fit together and what the neophyte needs to do to replicate what we’ve just done. (Incidentally, I’m not slagging the Neatline or Omeka folks; Wayne Graham and James Smithies have been brilliant in helping me out – thank you gentlemen!). The Programming Historian has an interesting workflow in this regard. The piece that Scott, Ian, and I put together on topic modelling was reviewed by folks who were definitely in the digital humanities world, but not necessarily well-versed in the skills that topic modeling requires. Their reviews, going over our step by step instructions, pointed out the many, many, places where we were blind to our assumptions about the target audience. If that tutorial has been useful to anyone, it’s entirely thanks to the reviewers, John Fink, Alan MacEachern, and Adam Crymble.
So, it’s late. But measure digital humanities work along these two axes, and I think you’ll have useful clustering in order to further ‘evaluate’ the work.
About a Barn
I found another presentation on that same laptop, this time related to the agricultural vernacular… that is to say, about a barn. I was very pleased with that title… Again, the venue was the Friends of Gatineau Park annual research forum.
Listening to Topic Models
I want to explore alternate ways of ‘visualizing’ patterns in data, beyond the visual. To that end, I’ve taken the major topics & their proportions from a topic model generated with MALLET and run them through the Musical Algorithms site at EWU.
1. I obtained data from the Portable Antiquities Scheme related to ceramic building materials recovered by the scheme (why this and not something else? I’m thinking about brick these days. No other reason).
2. I created a topic model of the descriptor text.
3. I take the composition file that is output (the one that can be read as ‘in document 2 the major topic is 4 at 25%, then topic 6 at 12%…’ etc.), and grab the topics and the amount by which they compose the document, so the first two columns. I turn the decimals into whole numbers by multiplying by 100.
4. I put these two columns into Musical Algorithms. I perform the modulo scaling, then I invert the numbers. I used a 1 for the duration of the note.
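Steps 3 and 4 can be roughed out in Python, for anyone who wants to see the numbers before they become notes. A loud caveat: I am guessing at what the site’s ‘modulo scaling’ and ‘invert’ options do. Here modulo scaling wraps values into an assumed pitch range of 1 to 88, and inversion mirrors each value within that range. The sample rows are invented, and the function names are mine, not the site’s.

```python
def to_whole_numbers(proportions):
    """Step 3: turn decimal proportions into whole numbers (0.25 -> 25)."""
    return [round(p * 100) for p in proportions]

def modulo_scale(values, low=1, high=88):
    """Wrap values into [low, high]; values already in range are unchanged.

    The 1-88 range (piano keys) is an assumption.
    """
    span = high - low + 1
    return [low + ((v - low) % span) for v in values]

def invert(values, low=1, high=88):
    """Mirror each value within the range, so high becomes low and vice versa."""
    return [high + low - v for v in values]

# Invented rows from a composition file: (major topic, proportion).
rows = [(4, 0.25), (6, 0.12), (1, 0.07)]

topics = [topic for topic, _ in rows]
amounts = to_whole_numbers([share for _, share in rows])

pitches = invert(modulo_scale(amounts))
durations = [1] * len(pitches)   # a 1 for the duration of every note

print(topics, amounts, pitches, durations)
```

If the site’s options turn out to behave differently, only `modulo_scale` and `invert` would need changing; the step 3 conversion stays the same.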
You can listen to the output here
So what does it sound like? Well, I haven’t got there yet. But… if you do the whole process again, this time with topic models derived from writing qua writing (rather than database entries; the link takes you to topic models I did from posts on Play the Past), you get this. Which sounds markedly different. More structure. Less repetition.
Anyway, this is obviously something that’ll require some more playing around (ha – see what I did there?)




