A Digital Archaeology of Digital Archaeology: work in progress

Ethan Watrall and I have been playing around with data mining as a way of writing a historiography of digital & computational archaeology. We’d like to invite you to play along.

We’ll probably have something to say on this at the SAA in April. Anyway, we’ve just been chugging along slowly, sharing the odd email, google doc, and so on – and a monstrously huge topic model browser I set up. Yesterday, an exchange on Twitter prompted us to share those materials.

This prompted a lot of chatter, including:

and this:

So let’s get this party started, shall we?

~o0o~

While there’s a lot of movement towards sharing data, and open access publications, there’s also this other space of materials that we don’t talk about too much – the things we build from the data that we (sometimes) share that enable us to write those publications we (sometimes) make open access. This intermediate stage never gets shared. Probably with good reason, but I thought given the nature of digital work, perhaps there’s an opportunity here to open not just our research outputs & inputs, but also our process to wider participation.

Hence this post, and all that follows.

~o0o~

Here’s what I did. I scoured JSTOR’s DFR for anglophone journals, from 1935 onwards (full bibliography right here: http://graeworks.net/digitalarchae/20000/#/bib). Then I fitted various topic models to them, using Andrew Goldstone’s dfr-topics, an R package that runs MALLET over the bag-of-words that DFR gives you, and ran the result through Andrew’s dfr-browser (tag line: “Take a MALLET to disciplinary history!”).
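For anyone curious about the plumbing rather than using dfr-topics directly: DFR hands you per-document bag-of-words CSVs, and MALLET’s import-file wants one document per line. Here’s a minimal Python sketch of that conversion – note that the ‘WORDCOUNTS,WEIGHT’ header and the doc-id/label format are my assumptions about DFR’s layout; dfr-topics handles all of this for you in R:

```python
import csv
import io

def dfr_to_mallet_line(doc_id, csv_text):
    """Expand a DFR-style wordcounts CSV (one word,count row per line)
    into the one-document-per-line format that MALLET's import-file
    expects: '<id>TAB<label>TAB<space-separated tokens>'."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row (assumed 'WORDCOUNTS,WEIGHT')
    tokens = []
    for word, count in reader:
        tokens.extend([word] * int(count))
    return "{}\tdfr\t{}".format(doc_id, " ".join(tokens))

# a made-up two-word document, for illustration only
line = dfr_to_mallet_line("10.2307/12345",
                          "WORDCOUNTS,WEIGHT\nexcavation,3\ncomputer,1")
```

From there it’s MALLET’s import-file and train-topics as usual.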

The results can be viewed here. Like I said, this is the middle part of an analysis that we’re sharing here. Want to do some historiography with a distant reading approach? We’d love to see what you spot/think/observe in these models (maybe your students would like a go?) In which case, here’s an open pad for folks to share & comment.

Why would you bother? Well, it occurred to me that I’ve never seen anyone try to crowdsource this step of the process. Maybe it’s a foolish idea. But if folks did, and there was merit to this process, maybe some kind of digital publication could result where all contributors would be authors? Maybe a series of essays, all growing from this same body of analysis? Lots of opportunities.

Stranger things have happened, right?

~o0o~

Just to get you going, here are some of the things I’ve noticed, and some of my still-churning thoughts on what all this might mean (I’ve pasted this from another document; remember, work in progress!):

remembering that in topic modeling, a word can be used in different senses in different topics/discourses (thus something of the semantic sense of a word is preserved)

tools used:

- Stanford TMT for a detailed view of the CAA (Computer Applications in Archaeology) proceedings

- Mimno’s browser-based jsLDA for a detailed view of correlations between topics (using CAA & IA; IA being Internet Archaeology, only the open-access materials from before it went fully OA in October 2014)

- Goldstone’s dfr-topics for R and dfr-browser to visualize 21,000 articles as an entire topic model

- the same again for individual journals: AJA, JFA, AmA, CA, JAMT, WA

——-

Stanford TMT of CAA, 1974–2011

[screenshot]

- no stoplist used; instead, TMT discards the most prominent and least likely words from the analysis

- its output is formatted in such a way that it becomes easy to visualize the patterns of discourse over time (MALLET, the other major tool for topic modeling, requires much more massaging to get its output into such a form. The right tool for the right job.)

-30 topics gives good breakdown; topic 26 contains garbage (‘caa proceedings’ etc as topic words)
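Mechanically, ‘the most prominent topics in 1974’ means summing the per-document topic weights by year and normalising. A toy version of that aggregation – the data structure here is invented for illustration; TMT’s actual CSV output is shaped differently:

```python
from collections import defaultdict

def topic_share_by_year(doc_topics):
    """doc_topics: iterable of (year, {topic_id: weight}) pairs, one per
    document. Returns {year: {topic_id: share}}, normalised so that each
    year's shares sum to 1 -- the quantity a topics-over-time plot shows."""
    totals = defaultdict(lambda: defaultdict(float))
    for year, weights in doc_topics:
        for topic, w in weights.items():
            totals[year][topic] += w
    return {
        year: {t: w / sum(tw.values()) for t, w in tw.items()}
        for year, tw in totals.items()
    }

# two invented 1974 documents
docs = [(1974, {1: 0.6, 6: 0.4}), (1974, {1: 0.2, 20: 0.8})]
shares = topic_share_by_year(docs)
# topic 1's share of 1974: (0.6 + 0.2) / 2.0 = 0.4
```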

In 1974, the most prominent topics were:

topic 1: computer, program, may, storage, then, excavation, recording, all, into, form, using, retrieval, any, user, output, records, package, entry, one, unit

topic 6: but, they, one, time, their, all, some, only, will, there, would, what, very, our, other, any, most, them, even

topic 20: some, will, many, there, field, problems, may, but, archaeologists, excavation, their, they, recording, however, record, new, systems, most, should, need

The beginnings of the CAA are marked by hesitation and prognostication: what *are* computers for, in archaeology? There is a sense that for archaeologists, computation is something that will be useful insofar as it can be helpful for recording information in the field. With time, topic 1 diminishes. By 2000 it is nearly non-existent.  The hesitation expressed by topics 6 and 20 continues though. Archaeologists do not seem comfortable with the future.

Other early topics that thread their way throughout the entire period are topics 5, 2, 27 and 28:

Topic 5: matrix, units, stratigraphie, relationships, harris, unit, between, method, each, attributes, may, two, diagram, point, other, seriation, one, all, stratigraphy, sequence

Topic 2: area, survey, aerial, north, features, sites, region, located, excavation, river, areas, during, field, its, large, project, south, water, over, fig

Topic 27: sites, monuments, heritage, national, record, management, cultural, records, development, systems, england, database, english, its, survey, new, will, also, planning, protection.

Topic 28: museum, museums, collections, project, national, documentation, all, database, archives, about, archive, objects, sources, documents, university, text, our, also, collection, reports.

These topics suggest the ‘what’ of topic 1: how do we deal with contexts and units? Large surveys? Sites and monuments records and museum collections? Interestingly, topics 27 and 28 can be taken as representing something of the professional archaeological world (as opposed to ‘academic’ archaeology).

Mark Lake, in a recent review of simulation and modeling in archaeology (JAMT 2014) describes various trends in modeling [discuss]. Only topic 9 seems to capture this aspect of computational/digital archaeology:

model, models, social, modeling, simulation, human, their, between, network, approach, movement, networks, past, different, theory, how, one, population, approaches, through

Interestingly, for this topic, there is a thin thread from the earliest years of the CAA to the present (2011), with brief spurts in the late 70s and late 80s, then a consistent presence throughout the 90s, and a larger burst from 2005–2008. Lake characterizes thus…. [lake]. Of course, Lake also cites various books and monographs which this analysis does not take into account.

If we regard ‘digital archaeology’ as something akin to ‘digital humanities’ (and so distinct from ‘archaeological computation’), how does it, or does it even, appear in this tangled skein? A rough distinction between the two perspectives can be framed using Trevor Owens’s meditation on what computation is for. Per Owens, we can think of a humanistic use of computing as one that helps us deform our materials, to give us a different perspective on them. Alternatively, one can think of computing as something that helps us justify a conclusion. That is, the results of the computation are used to argue that such-a-thing is most likely in the past, given this model/map/cluster/statistic. In which case, there are certain topics that seem to imply a deformation of perspective (and thus, a ‘digital archaeology’ rather than an archaeological computation):

topic 03: cultural, heritage, semantic, model, knowledge, systems, web, standards, ontology, work, domain, conceptual, different, crm, between, project, based, approach

topic 04: knowledge, expert, process, its, artefacts, set, problem, different, concepts, human, systems, but, they, what, our, scientific, about, how, all, will

topic 07: project, web, digital, university, internet, access, online, service, through, electronic, http, european, technologies, available, public, heritage, will, services, network, other

topic 14: virtual, reality, museum, public, visualization, models, reconstruction, interactive, museums, multimedia, heritage, envrionment, scientific, reconstructions, will, computer, technologies, environments, communication

topic 29: gis, spatial, time, within, space, temporal, landscape, study, into, social, approaches, geographic, applications, approach, features, environmental, based, between, their, past

Topic 3 begins to emerge in 1996 (although its general discourse is present as early as 1988).  Topic 4 emerges with strength in the mid 1980s, though its general thrust (skepticism about how knowledge is created?) runs throughout the period. Topic 7 emerges in 1994 (naturally enough, when the internet/web first hit widespread public consciousness). Should topic 7 be included in this ‘digital archaeology’ group? Perhaps, inasmuch as it also seems to wrestle with public access to information, which would seem not to be about justifying some conclusion about the past but rather opening perspectives upon it. Topic 14 emerges in the early 1990s.

Topic 29, at first blush, would seem to be very quantitative. But the concern with time and temporality suggests that this is a topic that is trying to get to grips with the experience of space. Again, like the others, it emerges in the late 1980s and early 1990s. [perhaps some function of the personal computer revolution..? instead of being something rare and precious – thus rationed and only for ‘serious’ problems requiring distinct answers – computing power can now be played with and used to address less ‘obvious’ questions?]

What of justification? These are the topics that grapple with statistics and quantification:

Topic 10: age, sites, iron, settlement, early, bronze, area, burial, century, one, period, their, prehistoric, settlements, grave, within, first, neolithic, two, different

Topic 11: pottery, shape, fragments, classification, profile, ceramics, vessels, shapes, vessel, sherds, method, two, ceramic, object, work, finds, computer, fragment, matching, one

Topic 13: dating, radiocarbon, sampling, london, dates, some, but, betwen, than, e.g. , statistical, chronological, date, there, different, only, sample, results, one, errors

Topic 15: landscape, project, study, landscapes, studies, cultural, area, gis, human, through, their, its, rock, history, historical, prehistoric, environment, our, different, approach

Topic 17: sutdy, methods, quantitative, technqiues, approach, statistical, using, method, studies, number, artifacts, results, variables, two, most, bones, based, various, analyses, applied

Topic 19: statistical, methods, techniques, variables, tiie, statistics, density, using, cluster, technique, multivariate, method, two, nottingham, example, principal, some, university

Topic 21: model, predicitve, modelling, models, cost, elevation, viewshed, surface, sites, gis, visibility, van, location, landscape, areas, one, terrain, dem, digital

topic 23: image, digital, documentation, images, techniques, laser, scanning, models, using, objects, high, photogrammetry, methods, model, recording, object, surveying, drawings, accuracy, resolution

topic 24: surface, artefact, distribtuion, artefacts, palaeolithic, materials, sites, deposits, within, middle, area, activity, during, phase, soil, processes, lithic, survey, remains, france

Macroscopic patterns

[screenshot]

This detail of the overall flow of topics in the CAA proceedings points to the period 1978–1983 as a punctuation point, an inflection point, for new topics within the computers-and-archaeology crowd. The period 1990–2011 contains minor inflections around 1997 and 2008.

1997-1998

1990-2011

In terms of broad trends, pivot points seem to be the late 70s, 1997, and 2008. Given that our ‘digital archaeology’ themes emerge in the late 90s, let’s add Internet Archaeology to the mix [why this journal, why this time: because of the 90s inflection point? quicker publication schedule? ability to incorporate novel outputs that could never be replicated in print?]. This time, instead of searching for topics, let’s see what correlates with our digital archaeology topics. For this, David Mimno’s browser-based jsLDA topic model is most useful. We run it for 1000 iterations, and find the following correlation matrix.

[insert discussion here]

http://www.graeworks.net/digitalarchae/mimno/jslda.html?docs=caa_and_intarch.txt&stoplist=en.txt&topics=30

- 1000 iterations. Your 1000 iterations will be slightly different from mine, because this is a probabilistic approach.

- the browser produces CSV files for download, as well as a CSV formatted for visualizing patterns of correlation as a network in Gephi or other network-visualization software.

- stoplist is en, fr, de from MALLET, plus archaeology, sites, data, research

- running this in a browser is not the most efficient way of doing this kind of analysis, but the advantage is that it allows the reader to explore how topics sort themselves out, and its visualization of correlated topics is very effective and useful.

- note word usage. Mimno’s browser calculates the ‘specificity’ of a word to a topic. The closer to 1.0, the closer the word is to being distributed within a single topic only. Thus, we can take such words as being true ‘keywords’ for particular kinds of discourses [which will be useful in exploring the 20000 model]. “Computer” has a specificity of 0.61, while “virtual” has a specificity of 0.87, meaning that ‘computer’ is used in a number of topics, while ‘virtual’ is almost exclusively used in a single discourse. ‘Predicitve’ has a specificity of 1, and ‘statistical’ of 0.9.
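I don’t know Mimno’s exact formula, but a reasonable reconstruction of ‘specificity’ is the largest share of a word’s total occurrences held by any single topic: 1.0 means the word lives in one topic only. A sketch under that assumption, with invented counts:

```python
def specificity(word, topic_word_counts):
    """topic_word_counts: list of {word: count} dicts, one per topic.
    Returns the largest share of the word's total count held by any one
    topic; 1.0 means the word is confined to a single topic."""
    counts = [topic.get(word, 0) for topic in topic_word_counts]
    total = sum(counts)
    return max(counts) / total if total else 0.0

# toy counts: 'computer' is spread across topics, 'virtual' is not
topics = [{"computer": 40, "virtual": 1},
          {"computer": 35},
          {"computer": 25, "virtual": 60}]
```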

In the jsLDA model, there are three topics that deal with GIS.

topic 19, gis landscape spatial social approach space study human studies approaches

topic 18, database management systems databases gis web software user model tool

topic 16, sites gis landscape model predictive area settlement modelling region land

The first, topic 19, seems to correspond well with our earlier topic that we argued was about using GIS to offer a new perspective on human use/conception of space (i.e., a ‘digital’ approach, in our formulation). Topics 18 and 16 are clearly about GIS as a computational tool. In the correlation matrix below, blue marks topics that occur together more often than expected, while red marks less than expected; the size of the dot indicates by how much. Thus, if we look for the topics that go hand in hand with topic 19, the strongest are topic 16 (the predictive power of GIS) and topic 10 (social, spain, simulation, networks, models).

[screenshot]

The ‘statistical, methods, techniques, artefact, quantitative, statistics, artefacts’ topic is positively correlated with the ‘human, material, palaeolithic’, ‘time, matrix, relationship’, and ‘methods, points, point’ topics. This constellation of topics is clearly a use of computation to answer or address very specific questions.
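That ‘more/less often than expected’ colouring can be read as pointwise mutual information over documents: count how often two topics each pass some proportion cutoff in the same document, and compare with what independent topics would give. This is an approximation of what jsLDA computes, not its exact formula, and the cutoff value is a guess:

```python
import math

def topic_pmi(doc_topics, a, b, cutoff=0.05):
    """doc_topics: list of {topic_id: proportion} dicts, one per document.
    A topic 'occurs' in a document when its proportion exceeds cutoff.
    Positive PMI = the topics co-occur more often than chance (blue in
    the matrix); negative = less often than chance (red)."""
    n = len(doc_topics)
    in_a = [d.get(a, 0) > cutoff for d in doc_topics]
    in_b = [d.get(b, 0) > cutoff for d in doc_topics]
    p_a = sum(in_a) / n
    p_b = sum(in_b) / n
    p_ab = sum(1 for x, y in zip(in_a, in_b) if x and y) / n
    if not (p_a and p_b and p_ab):
        return float("-inf")  # the topics never co-occur
    return math.log(p_ab / (p_a * p_b))

# invented proportions: topics 16 and 19 tend to appear together
docs = [{16: 0.5, 19: 0.3}, {16: 0.4, 19: 0.2},
        {10: 0.6}, {19: 0.1, 10: 0.2}]
```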

-in jslda there’s a topic ‘database project digital databases web management systems access model semantic’ – positively correlated with ‘publication project electoric’, ‘text database maps map section user images museum’, ‘excavation recording’, ‘vr model’,  ‘cultural heritage museum’, ‘italy gis’, ‘sites monuments record’ [see keys.csv for exact label]. These seem to be topics that deal with deforming our perspectives while at the same time intersecting with extremely quantitative goals.

So far, we have been distantly reading some 40 years of archaeological work that is explicitly concerned with the kind of archaeology that uses computational and digital approaches. There are punctuation points, ‘virages’, and complicated patterns – there is no easy-to-see disjuncture between what the digital humanists imagine is the object of using computers and their critics who see computation as positivism by the back door. It does show that archaeology should be regarded as an early mover in what has come to be known as ‘the digital humanities’, with quite early, sophisticated, and nuanced uses of computing. But how early? And how much has archaeological computing/digital archaeology permeated the discipline? To answer these questions, we turn to a much larger topic model.

Zoom Out Some More

Let’s put this into a broader context. 24 journals from JSTOR were selected for both general coverage of archaeology and for regional/topical specialities. The resulting dataset contains 21,000 [get exact number] articles, mostly from the past 75 years (a target start date of 1940 was selected for journals whose print run predates the creation of the electronic computer, thus computer = machine and not = woman who computes). 100 topics seemed to capture the range of thematic discourses well. We looked first for topics that seem analogous to the CAA & IA topics (CAA and IA were not included in this analysis because they are not within the JSTOR DFR database; Goldstone’s DFR Browser was used for the visualization of the topics). [better explanation, rationale, to be written, along with implications]. We also observe ‘punctuation points’ in this broader global (anglosphere) representation of archaeology that correspond with the inflection points in the small model, many trends that fit, but also other trends that do not fit, with the standard historiography of archaeology. We then dive into certain journals (AJA, JFA, AmA, JAMT) to tease these trends apart. Just what has been the impact of computational and digital archaeology in the broader field?

[screenshot]

The silhouette in the second column gives a glimpse into the topic’s prevalence over the ca 75 years of the corpus. The largest topic, topic 10, with its focus on ‘time, made, work, years, great, place, make’, suggests a kind of special pleading: in the rhetoric of archaeological argument, one always has to explain just why this particular site/problem/context is important. A similar topic was observed in the model fitted to the CAA & IA [-in the 20000 model, there’s the ‘time’ topic: time made work years great place make long case fact point important good people times; it’s the largest topic, and accounts for 5.5%. here, there is one called ‘paper time work archaeologists introduction present important problems field approach’. it’s slightly correlated with every other topic. Seems very similar.]

More interesting are the topics a bit further down the list. Topic 45 (data, analysis, number, table, size, sample) is clearly quantitative in nature, and its silhouette matches our existing stories about the rise of the New Archaeology in the late 60s and early 70s. Topics 38 and 1 seem to be topics related to describing finds – ‘found, site, stone, small, area’; ‘found, century, area, early, excavations’. Topic 84 suggests the emergence of social theories and power – perhaps an indication of the rise of Marxist archaeologies? Further down the list we see ‘professional’ archaeology and cultural resource management, with peaks in the 1960s and early 1980s.

[screenshot]

Topic 27 might indicate perspectives connected with gender archaeology – “social, women, material, gender, men, objects, female, meaning, press, symbolic” – and it accounts for 0.8% of the corpus: about 160 articles.  ‘Female’ appears in four topics, topic 27, topic 65 (‘head, figure, left, figures, back, side, hand, part’ – art history? 1.4% of the corpus) topic 58 (“skeletal, human, remains, age, bone”- osteoarchaeology, 1.1% of the corpus), and topic 82 (“age, population, human, children, fertility” – demographics? 0.8% of the corpus).

[other words that would perhaps key into major trends in archaeological thought? looking at these topics, things seem pretty conservative, whatever the theorists may think, which is surely important to draw out and discuss]

Concerned as we are to unpick the role of computers in archaeology more generally, if we look at the word ‘data’ in the corpus, we find it contributes to 9 different topics (http://graeworks.net/digitalarchae/20000/#/word/data). It is the most important word in topic 45 (data, analysis, number, table, size, sample, study) and in topic 55 (data, systems, types, information, type, method, units, technique, design). The word ‘computer’ is also part of topic 55. Topic 45 looks like a topic connected with statistical analysis (indeed, ‘statistical’ is a minor word in that topic), while topic 55 seems to be more ‘digital’ in the sense we’ve been discussing here. Topic 45 is present in 3.2% of the corpus, growing in prominence from the early 1950s, falling in the 60s, resurging in the 70s, and then decreasing to a more or less steady state in the 00s.

[screenshot]

Topic 55 holds some surprises:

[screenshot]

The papers in 1938 come from American Antiquity volume 4 and show an early awareness not just of quantitative methods, but also of the reflective way those methods affect what we see [need to read all these to be certain of this].

next steps

- punctuation points – see http://graeworks.net/digitalarchae/20000/#/model/yearly

- major: 1940 (but perhaps an artefact of the boundaries of the study)

- minor: early 1950s

- minor: mid 1960s

- major: 1976 (American Antiquity does something odd in this year)

- major: 1997–98

 

Breakage

I was at #seeingthepast these last two days (website). During one of the discussions, the idea of glitchiness of augmented reality was raised, and ways that this might intersect with materiality were explored. At one point, the idea of an app that let people break museum objects (the better to know them and how they were created) was mooted. (nb, I didn’t come up with the idea; it might have been Keri or Caitlin).

I tweeted:

and archaeologists on the twitterverse responded. (I then would periodically inform the symposium of the twitter discussion, which would then spark ruminations on the virtuality of conferences, but I digress):

On the way home, I had time to think about how this might work. If you’ve got the chops to make it happen, here’s how I think ‘Breakage’ could go; I’d love to see something like:

- photos uploaded from museum online catalogues, exhibitions, or databases (ones without good provenances)

- user can pan through these. When one catches the user’s fancy, the user selects it: and it shatters into pieces.

- each piece can then be examined; each piece highlights some aspect inherent to the object (makers’ marks, artistic effects, clay fabric, whatever).

- touch again, and the pieces are put into a *possible* context. touch again, a different *possible* context. Show how different meanings could be understood if this was the actual context, and how it…. but damn. We don’t actually know what the piece’s real context was, so we don’t know anything.

- and then the image would be deleted from the user’s version of the app, never to be seen again, as if it has been looted anew.

I’m no MacGyver

I’m no MacGyver. Tim the Tool Man? Bill Nye, Science Guy? Hell, I’m nowhere near Heinz Doofenshmirtz. Or Phineas. I’d kill to be Ferb.

Wile E. Coyote? Brain? Possibly Pinky.

I’m not handy. But I thought I could do Google Cardboard. Print out the template. Glue it to a sheet of cardboard. Cut. Fold. VR!

Tab A certainly doesn’t fit into Slot B. And how does the eyepiece, crossbrace thingy work out? A Pampers box is admittedly probably too thick for this. Sheesh. Google, go look at Ikea instructions; they are masters of the art.

As for me, I’m going back to the warm embrace of acoustic augmented reality.

Visual- meh.

On Academic Blogging – a Conversation with Matt Burton

Papyrus, Wikimedia Commons, http://bit.ly/1DkaNWG

Matt Burton, who is working on new web genres and informal scholarly communication, asked me some questions recently as part of his research. We thought it would be interesting to share our conversation.

MB: When did you start your blog (career-wise: as a grad student, undergrad, etc)?

I recently pulled my entire blog archive into github, as part of my open-notebook workflow. (http://shawngraham.github.io/open-notebook/ll_CC/#!pages/uploads/blogarchive/posts/contents.md)

I see there that I posted my first post on Dec 18, 2006. I was, at the time, working in what would now be recognized as alt-ac: doing contract research for Kevin Kee at Brock U, as well as freelance heritage consulting work, some online teaching, and substitute teaching at the local high school. This was after my post-doc, nearly four years to the day after I was awarded my PhD.

MB: Why did you decide to start blogging?

Earlier in the year I had won a spot at the first digital humanities workshop in Lincoln, Nebraska. John Bonnett of Brock, whom I’d met at CAA 2006 in Fargo, saw the advertisement and forwarded it to me. (John was an early champion of my work in Canada, and I’m eternally grateful for that!) There I met folks like Alan Liu, Katharine Walter, William Thomas, and Stephen Ramsay. I didn’t appreciate it then, but that was the seminal moment for me. At the workshop, where I presented my work on agent-based modelling of Roman social structures, I distinctly remember Alan saying, ‘you’ve got a nice static website; have you thought about blogging?’ Thereupon the room began discussing how a blog for my work might, well, work. My postdoc terminated that September, and once I was out of the warm embrace of academia, I decided ‘what the hell; what am I afraid of?’ and I started blogging. I posted three times that day, along with a statement of why I was blogging. I framed it as a record of my explorations in virtual worlds.

Even then, it was a kind of open notebook. Kevin, the other major supporter of my work in those early days, let me count the writing of blog posts towards the more general research goals of the projects he was employing me on. We expect projects these days to blog, but in those days, I think it was still fairly novel. I wasn’t even blogging about the main project, just the side roads and blind alleys I was stumbling around.

MB: How do you host your blog? I.e., do you use a generic web host like Dreamhost with WordPress, or a blogging service like Blogger.com?

I’m using plain old wordpress.com, though I did invest in buying a domain name. Initially, I’d called it ‘electric archaeology’, but on the wordpress.com domain I’d called it electricarchaeologist.wordpress, which was, well, confusing and annoying. I host my course blogs with Dreamhost, which over the years has gotten clunkier, it seems. That’s just an impression.

MB: How did you learn to set up your blog?

I spent an inordinate amount of time farting around with the settings, themes, etc. At one point I was the tech support for an online liberal arts college start-up; because I’d pressed the button on a one-click Dreamhost install of Moodle, that made me the most technically proficient person there.

Scary thought.

Anyway, they had a WordPress-merged-with-Moodle arrangement, and one day I utterly bolloxed up the Moodle upgrade, which broke everything. I printed out every PHP file I could find and, with the help of a friend, laid them out on the floor, drawing arrows to connect files by dependencies, shared tables, etc., to sort out the mess.

I learned a lot that day. Primarily, that I didn’t like web development. I’ve stuck more or less with whatever the free theme gods throw my way, since then. My online tenure & promotion portfolio is built on wordpress (graeworks.net) and involved a bit of hacking around to get the right plugins I wanted.

MB: What are the challenges with maintaining your blog (i.e. spam, approving comments, dealing with trolls, etc)?

Spam. Spam spam spam spam!

I don’t get many comments. I know people read the thing, but since I don’t often write long discursive pieces, I guess I just don’t attract that much in the way of comments. Although I do get emails directly in response to things I’m doing on the blog, so I suppose that counts.

The biggest issue is maintaining drive. It helps to keep in my mind that this is a research blog, an open notebook, the narrative bits that help me make sense of all the digital ephemera littering my computers. I often have to consult the blog to remind myself just what the hang I’ve been working on. Initially I was posting quite regularly, but over the years it goes in fits and starts.

MB: What topics do you normally write about? Do you try and keep it strictly academic, or do you mix in other topics?

I like to futz about with new (digital) toys, to make them do unexpected things, to think through how they might be of use to others, to figure out how to tell others how they might want to use them. I do bits of analyses, munge data together to share with others. I do mix in other non academic stuff from time to time. For a while, the National Geographic channel used to send me dvds to review prior to one of their big ratings weeks. Perhaps it’s a coincidence, but after I wrote, of one episode, ‘bollocks’, the dvds stopped coming.

Probably a coincidence.

MB: If you allow comments on your blog, do you often get comments? What has been your experience managing comments/commenters on your blog?

Again, not so much. Probably a function of the content, I suppose. Dealing with spam that gets by akismet is tiring though.

MB: What kinds of interactions (scholarly or otherwise) emerge out of your blogging practice?

I like to say that my transformation from a ho-hum, bog-standard Roman brick guy (and there are more of us than that sentence would lead you to think) into this thing ‘digital humanities’ was a direct consequence of the blogging. The blogging won my simulation work (not many DHers do agent modelling) a larger audience, which led to many of my how-tos, to email exchanges with grad students (for the most part) who are now getting established in various places, to invitations to contribute to edited volumes, conferences, and journals, and to speaking engagements – all this while I was formally outside of academia. Before Twitter, the blogging helped me maintain a sense of community, a sense of purpose for my intellectual curiosity that I didn’t get in my day-to-day scramble to pay the bills. I think I might be the first person in Canada to be hired to a post with ‘digital humanities’ formally in the title (though of course I’m not saying I was the first DH person!!), and it was the blogging, the exposure to and engagement with wider trends going on in computational humanities beyond archaeology, that allowed me to say with confidence, ‘yes, I’m the DH person you’re looking for’.

The blogging made me.

MB: Do you find these interactions informative, useful, enlightening, tedious, frustrating, obligatory, etc? How do they feel?

I still get excited when there’s a comment on the blog. The Louis Vuitton bag people, they complete me.

Real comments send me over the moon. They’ve led to many productive relationships and partnerships.

MB: How do you think digital humanities blogging is different from more traditional forms of academic writing and reading?

I think it’s a return, in some ways, to academic discourses of earlier, not-second-half-of-the-20th-century ways. But that’s mostly an impression; I’m pretty foggy on most things after AD 300. But I like the reflexivity of digital humanities blogging, the exploration of not just what the tool can do, or what computation has perhaps thrown into new light, but the consideration of what that does to us as researchers, as a public.

MB: How would you characterize the relationship between blogging and the digital humanities (however broadly conceived)?

Not everybody has to blog. Nor should they. It’s perfectly possible to be a productive dh person and not blog. But speaking for myself, I think blogging keeps things fresh. We’re working on a book; the blogged draft has already had a bit of an impact. I’m worried the paper version will already be dated by the time it comes out (though this is one of the fastest book projects I’ve ever been involved with), precisely because the most interesting conversations are happening across the blogs, faster than the formal apparatus can keep up. But that’s ok.

MB What DH blogs/bloggers do you read and why do you read them? What do you like about them?

A partial list: I read Scott and Ian, obviously; Ted Underwood; Elijah Meeks, Alan Liu, Bethany, Ben Schmidt, Mills Kelly, Tom Brughmans, Caleb McDaniel, Profhacker, Donna Yates, Colleen Morgan, Lorna Richardson, playthepast.org… it rather depends on what project I’m working on. I followed Stu Eve religiously for a while as he puzzled out the problems of an embodied GIS. Now that that project is done – and I’m not teaching locative computing for historians at the moment – I’ve moved away a bit. So has Stu, for that matter. It all really depends on what’s going on, and what’s caught my attention. I’m a bit of a magpie. dhnow is essential though for its global view.

I read these folks for the way they dissect ideas as much as for any how-tos or code they share. They help me see the bigger picture. Some of them are historians, some are english-flavoured dh, others are archaeologists.

MB What was your most popular blog post? Why do you think it was so popular?

The all-time most popular posts on my blog, according to wordpress stats, are:

Civilization IV World Builder Manual & other needful things (19,338 views)

Getting Started with MALLET and Topic Modeling (10,621 views)

Moodle + WordPress = Online University (9,710 views)

So, two how-tos, and one that seems to have hit some kind of SEO sweetspot, since it’s fairly anodyne. A follow up to that last one hasn’t been as popular:

WordPress + Moodle (not equal to) Online University (3,733 views)

But if you asked for my favourites, I’d say:

Signal Versus Noise: Why Academic Blogging Matters: A Structural Argument. SAA 2011 (1,206 views)

How I Lost the Crowd: A Tale of Sorrow and Hope (1,035 views)

What is the half-life of blog posts, I wonder? The blogging represents quite a sustained effort. I did the math; I’ve written enough tweets to fill two typical academic books; I have no idea how many words these 700 (or so) blog posts I’ve got add up to. But I do think the sustained effort of writing regularly has made me a better writer. (Reader, you may wish to disagree!)

What Careers Need History?

We have a new website at CU today; one of the interesting things on it is a page under the ‘admissions’ section that describes many careers and the departments whose program might fit you for such a career.

I was interested to know what careers were listed as needing a history degree. Updated Oct 17: I have since learned that these career listings were generated by members of the department some time ago; I initially believed that the list was generated solely by admissions, and I apologize for the confusion. This paragraph has been edited to reflect that correction. See also the conclusion to this piece at bottom.

I used wget to download all of the career pages:

wget -r --no-parent -w 2 -l 2 --limit-rate=20k http://admissions.carleton.ca/careers/

I then copied all of the index.html files using the Mac finder (searched within the subdirectory for all index.html; copied them into a new folder).

Then, I used grep to figure out how many instances of capital-h History (thus, the discipline, rather than the generic noun) could be found on those career pages:

grep -c '<h3>History' *.html > history-results.tsv

I did the same again for a couple of other keywords. The command counts all instances of History in the html files, and writes the results to a tab separated file. I open that file in Excel. But I don’t know what index 1.html is about, or index 45.html, and so on. So in TextWrangler, I searched across the files for the text between the title tags, using a simple regex:

Screen Shot 2014-10-15 at 2.28.53 PM

Screen Shot 2014-10-15 at 2.29.35 PM

Copy and paste those results into a new page in the Excel file, search and replace all of the extraneous bits (index, .html, and so on) with blank spaces, sort by the file numbers, then copy and paste the names (now in the correct order) into a new column beside the original counts.
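Incidentally, the whole grep-then-TextWrangler-then-Excel dance could probably be collapsed into a single shell loop. Here’s a sketch, run against a fabricated sample page so there’s something for the loop to chew on; it assumes each career page carries its name in a single title line, which is an assumption on my part:

```shell
# Hypothetical one-pass version of the count-and-label step. The sample
# page below is made up for the demo; point the loop at your real folder
# of downloaded index.html files instead.
mkdir -p /tmp/careers-demo && cd /tmp/careers-demo
printf '<html><head><title>Teaching</title></head>\n<h3>History</h3>\n<h3>History</h3>\n</html>\n' > index1.html

for f in *.html; do
  # pull the text between the title tags...
  title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' "$f" | head -1)
  # ...and count the lines matching <h3>History (grep -c exits 1 on zero hits)
  count=$(grep -c '<h3>History' "$f" || true)
  printf '%s\t%s\n' "$title" "$count"
done > history-results.tsv
cat history-results.tsv
```

That gives you the career name and the count in one tab-separated file, no manual sorting required.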

Which gives us this, insofar as History is listed as a degree option leading to particular careers (where the numbers indicate not the absolute importance of history, but more of an emphasis than anything else):

Career count of History
TeachingPage 2 of 3 4
Museums and Historical SitesPage 2 of 2 4
Heritage Conservation 4
TourismPage 2 of 2 2
ResearchPage 2 of 3 2
JournalismPage 2 of 3 2
Foreign ServicePage 2 of 3 2
EducationPage 2 of 3 2
Library and Information Science 2
Design 2
Archival Work 2
Architectural History 2
Archaeology 2

And here’s ‘Global’ (we have some new ‘Globalization’ programmes):

Career count of Global
Tourism 12
Teaching 12
Research 12
Public Service 12
Polling 12
Politics 12
Policy Analysis 12
Non-Profit Sector 12
Non-Governmental Organizations 12
Museums and Historical Sites 12
Media 12
Lobbying 12
Law 12
Journalism 12
International Relations 12
International Development 12
Government 12
Foreign Service 12
Finance 12
Education 12
Diplomacy 12
Consulting 12
Conservation 12
Civil Service 12
Business 12
Advocacy 12
Administration 12
Foreign ServicePage 2 of 3 6
FinancePage 2 of 3 6
TeachingPage 2 of 3 4
Museums and Historical SitesPage 2 of 2 4
TourismPage 2 of 2 4
ResearchPage 2 of 3 4
JournalismPage 2 of 3 4
EducationPage 2 of 3 4
Public ServicePage 2 of 3 4
PollingPage 2 of 2 4
PoliticsPage 2 of 3 4
Policy AnalysisPage 2 of 3 4
Non-Profit SectorPage 2 of 2 4
Non-Governmental OrganizationsPage 2 of 3 4
MediaPage 2 of 2 4
LobbyingPage 2 of 3 4
LawPage 2 of 2 4
International RelationsPage 2 of 2 4
International DevelopmentPage 2 of 3 4
GovernmentPage 2 of 4 4
DiplomacyPage 2 of 3 4
ConsultingPage 2 of 3 4
ConservationPage 2 of 2 4
Civil ServicePage 2 of 3 4
BusinessPage 2 of 5 4
AdvocacyPage 2 of 2 4
AdministrationPage 2 of 4 4
Management 2
International Trade 2
International Business 2
Humanitarian Aid 2
Human Resources 2
Broker 2
Banking 2

Interesting, non? 

Update October 17 - we shared these results with Admissions. There appears to have been a glitch in the system. See those ‘page 2 of 3′ or ‘page 3 of 5′ notes in the tables above? The entire lists were visible to wget, but not to the user of the site, leaving ‘history’ off the page of careers under ‘museums and historical sites’, for instance. The code was corrected, and now the invisible parts are visible. Also, in my correspondence with the folks at Admissions, they write “[we believe that] Global appears more than History because careers were listed under each of its 12 specializations. We will reconfigure the way the careers are listed for global and international studies so that it will reduce the number of times that it comes up.”

So all’s well that ends well. Thank you to Admissions for clearing up the confusion, fixing the glitch, and for pointing out my error which I am pleased to correct.

 

Historical Maps, Topography, Into Minecraft: QGIS

Building your Minecraft Topography (An earlier version of this uses Microdem, which is just a huge pain in the butt. I re-wrote this using QGIS, for my hist3812a students.)

If you are trying to recreate a world as recorded in a historical map, then modern topography isn’t what you want. Instead, you need to create a blank, flat world in Worldpainter, and then import your historical map as an overlay. In Worldpainter, File >> New World. In the dialogue box, uncheck ‘circular world’. Tick ‘flat’ under topography. Then, on the main icon ribbon, select the ‘picture frame’ icon (‘image overlay’). In the dialogue box, tick ‘image overlay’. Select your file. You might have to fiddle with the scale and the x, y offset to get it exactly positioned where you want. Watch the video mentioned below to see all this in action. Then you can paint the terrain type (including water), raise or lower the terrain accordingly, put down blocks to indicate buildings… Worldpainter is pretty powerful.

If you already have elevation data as greyscale .bmp or .tiff

  • Watch the video about using Worldpainter.
  • Skip ahead to where he imports the topographic data and then the historical map imagery and shows you how to paint this against your topography.
  • You should also google for Worldpainter tutorials.

If you have an ARCGIS shapefile

This was cooked up for me by Joel Rivard, one of our GIS & Map specialists in the Library. He writes,

  • Using QGIS: In the menu, go to Layer > Add Vector Layer. Find the point shapefile that has the elevation information.
  • Ensure that you select point in the file type.
  • In the menu, go to Raster > Interpolation.
  • Select “Field 3” (this corresponds to the z or elevation field) for Interpolation attribute and click on “Add”.
  • Feel free to keep the rest as default and save the output file as an Image (bmp, jpg or any other raster)

If you need to get topographic data

In some situations, modern topography is just what you need.

  • Grab Shuttle Radar Topography Mission data for the area you are interested in (it downloads as a .tiff). To help you orient yourself, click off ‘toggle cities’ at the bottom of that page. You then click on the tile that contains the region you are interested in. This is a large piece of geography; we’ll trim it in a moment.
  • Open QGIS
  • Go to Layer >> Add Raster Layer. Navigate to the location where your srtm download is located. You’re looking for the .tiff file. Select that file.

Add Raster Layer

  • You now have a grayscale image in your QGIS workspace, which might look like this

Straits of Hercules, Spain, Morocco

  • Now you need to crop this image to just the part that you are interested in. On the main menu ribbon, select Raster >> Extraction >> Clipper

Select Clipper Tool

  • In the dialogue box that opens, make sure that ‘Clipping Mode’ is set to ‘Extent’. With this dialogue box open, you can click and drag on the image to highlight the area you wish to crop to. The extent coordinates will fill in automatically.

  • Hit ‘Select…’ beside ‘Output File’. Give your new cropped image a useful name. Hit ‘Save’.

  • Nothing much will appear to happen – but on the main QGIS window, under ‘layers’ a new layer will be listed.

Imgur

  • UNCHECK the original layer (which will have a name like srtm_36_05). Suddenly, only your cropped image is left on the screen. Use the magnifying glass with the plus sign (in the icons at the top of the window) to zoom so that your cropped image fills as much of the screen as possible.
  • Go to Project >> Save as image. Give it a useful name, and make sure to set ‘files of type’ to .bmp. You can now import the .bmp file to your Worldpainter file.

Importing your grayscale DEM to a Minecraft World

Video tutorial again – never mind the bit where he talks about getting the topographic data at the beginning

At this point, the easiest thing to do is to use WorldPainter. It’s free, but you can donate to its developers to help them maintain and update it. Now, the video shown above shows how to load your DEM image into WorldPainter. It parses the black-to-white pixel values and turns them into elevations. You have the option of setting where ‘sea level’ is on your map (so elevations below that point are covered with water). There are many, many options here; play with it! Adam Clarke, who made the video, suggests scaling up your image to 900%, but I’ve found that that makes absolutely monstrous worlds. You’ll have to play around to see what makes most sense for you, but with real-world data of any area larger than a few kilometres on a side, I think 100 to 200% is fine.

So: in Worldpainter – File >> Import >> Height map. In the dialogue box that opens, select your bmp file. You’ll probably need to reduce the vertical scale a bit. Play around.

Now, the crucial bit for us: you can import an image into WorldPainter to use as an overlay to guide the placement of blocks, terrain, buildings, whatever. So, again, rather than me simply regurgitating what Adam narrates, go watch the video. Save as a .world file for editing; export to Minecraft when you’re ready (be warned: big maps can take a very long time to render. That’s another reason why I don’t scale up the way Adam suggests).

Save your .world file regularly. EXPORT your minecraft world to the saves folder (the link shows where this can be found).

Go play.

Wait, what about the historical maps again?

The video covers it much better than I could here. Watch it, but skip ahead to the map overlay section. See the bit at the top of this post.

Ps. Here’s Vimy Ridge, site of a rather important battle in WW1 fought by the Canadian Army, imported into Minecraft this way:
Vimy Ridge in Minecraft

Open Notebooks Part V: Notational Velocity and 1 superRobot

The thought occurred that not everyone wants to take their notes in Scrivener. You might prefer the simple elegance and speed of Notational Velocity, for instance. Yet, when it comes time to integrate those notes, to interrogate those notes, to rearrange them to see what kind of coherent structure you might have, Scrivener is hard to beat.

Screen Shot 2014-09-26 at 1.12.02 PM

With Notational Velocity installed, go to ‘preferences’. Under ‘Notes’ change ‘Read notes from folder’ to point to the Scrivener synchronization folder. Then, change ‘store and read notes on disk as:’ to ‘rich text format files’. This will save every note as a separate rtf file in the folder. Now you can go ahead and use Notational Velocity as per normal. Notational Velocity uses the search bar as a way of creating notes, so start typing in there; if it finds existing notes with those keywords, it’ll bring them up. Otherwise, you can just skip down to the text editing zone and add your note.

When next you sync scrivener, all of these notes will be brought into your project. Ta da! A later evolution of Notational Velocity, nvALT, has more features, and can be used locally as a personal wiki (as in this post). I haven’t played with it yet, but given its genesis, I imagine it would be easy to make it integrate with Scrivener this way. (A possible windows option is Notation, but I haven’t tried it out yet).

~o0o~

I’ve combined all of my automator applications into one single automator app, a superrobot if you will, that grabs, converts, creates a table of contents in markdown, and pushes the results into github, whereupon it lives within my markdown wiki page. I found I had to insert 10 second pauses between stages, or else the steps would get out of order making a godawful mess. Presumably, with more notecards, I’d have to build in more time? We shall see. No doubt there is a much more elegant way of doing this, but the screenshot gives you what you need to know:

Screen Shot 2014-09-26 at 1.36.03 PM

Update with Caveat: Ah. Turns out that the Scrivener sync feature renames the notes slightly, which seems to break things in Notational Velocity. So perhaps the workflow should go like this:

1. Use notational velocity to keep notes, and for its handy search feature.
2. Have preferences set to individual files as rtf, as above, in a dedicated folder just for notational-velocity.
3. Create an automator app that moves everything into Scrivener sync, for your writing and visualizing of the connections between the notes.
4. Sync scrivener, continue as before. OR, if you wish to dispense with scrivener altogether, just use the rtf to md script and proceed.
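Step 3’s automator app boils down to a simple move. As a sketch (the /tmp paths here are placeholders for your notational-velocity folder and your Scrivener sync folder):

```shell
#!/bin/sh
# Sketch of step 3: sweep rtf notes from the Notational Velocity folder
# into the Scrivener sync folder. Both paths are demo placeholders.
NV=/tmp/demo-nv-notes; SYNC=/tmp/demo-scriv-sync
mkdir -p "$NV" "$SYNC"
printf '%s\n' '{\rtf1 an idea}' > "$NV/idea.rtf"   # a stand-in note card

mv "$NV"/*.rtf "$SYNC"/
ls "$SYNC"
```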

Perhaps that’s just making life way too complicated.

Oh, and as Columbo used to say… “…one more thing”: Naming. Some kind of naming convention for notes needs to be developed. Here is some really good advice that I aspire to implement.

Open Notebooks Part IV – autogenerating a table of contents

I’ve got MDWiki installed as the public face of my open notebook.

Getting it installed was easy, but I made it hard, and so I’ll have to collect my thoughts and remember exactly what I did… but, as I recall, it was this bit I found in the documentation that got me going:

First off, create a new (empty) repository on GitHub, then;

git clone https://github.com/exalted/mdwiki-seed.git
cd mdwiki-seed
git remote add foobar <HTTPS/SSH Clone URL of the New Repository>
git push foobar gh-pages

 

Then, I just had to remember to edit the ‘gh-pages’ branch. Also, on github, if you click on ‘settings’, it’ll give you the .io version of your page, which is the pretty bit. So, I updated robot 3 to push to the ‘uploads/documents’ folder. Hooray! But what I needed was a self-updating ‘table of contents’. Here’s how I did that.

In the .md file that describes a particular project (which goes in the ‘pages’ folder) I have a heading ‘Current Notes’ and a link to a file, contents.md, like so:

## [Current Notes](uploads/documents/contents.md)

Now I just train a robot to always make an updated contents.md file that gets pushed by robot 3.

I initially tried building this into robot 2 (‘convert-rtf-to-md’), but I outfoxed myself too many times. So I inserted a new robot into my flow between 2 & 3. Call it 2.5, ‘Create-toc’:

Screen Shot 2014-09-24 at 9.40.16 PM

It’s just a shell script:

cd ~/Documents/conversion-folder/Draft
ls *.md > nolinkcontents.md
sed -E -n 's/(^.*[0-9].*$)/ \* [\1](\1)/gpw contents.md' nolinkcontents.md 
rm nolinkcontents.md

Or, in human: go to the conversion folder. List all the newly-created md files and write that list to a file called ‘nolinkcontents.md’. Then wrap a markdown link around each line (strictly, each line containing a digit, which the regex requires), using the line itself as the link text, and write that out as ‘contents.md’. Then remove the intermediate file.
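If sed’s `w` flag feels opaque, a plain loop does nearly the same job (a sketch, with throwaway demo files; unlike the sed version, it doesn’t filter for lines containing a digit):

```shell
# Demo equivalent of the sed one-liner: list the md files and wrap each
# filename in a markdown link, using the filename itself as link text.
mkdir -p /tmp/toc-demo && cd /tmp/toc-demo
touch note-2014-09-24.md note-2014-09-25.md   # stand-in note cards

for f in *.md; do
  printf ' * [%s](%s)\n' "$f" "$f"
done > contents.md.tmp      # write to a temp name so the glob
mv contents.md.tmp contents.md   # doesn't pick up the output file itself
cat contents.md
```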

Ladies and gentlemen, this has taken me the better part of four hours.

Anyway, this ‘contents.md’ file gets pushed to github, and since my project description page always links to it, we’re golden.

Of course, I realize now that I’ll have to modify things slightly, structurally and in my nomenclature, once I start pushing more than one project’s notes to the notebook. But that’s a task for another night.

Now to lesson plan for tomorrow.

(update: when I first posted this, I kept saying robot 4. Robot 4 is my take-out-the-trash robot, which cleans out the conversion folder, in readiness for the next time. I actually meant Robot 3. See Part III)

Open notebooks part III

Do my bidding my robots!

I’ve sussed the Scrivener syncing issue by moving the process of converting out of the syncing folder (remember, not the actual project folder, but the ‘sync to external folder’). I then have created four automator applications to push my stuff to github in lovely markdown. Another thing I’ve learned today: when writing in Scrivener, just keep your formatting simple. Don’t use markdown syntax within Scrivener or your stuff on github will end up looking like this \##second-heading. I mean, it’s still legible, but not as legible as we’d like.

So – I have four robots. I write in Scrivener, keep my notes, close the session, whereupon it syncs rtf to the ‘external folder’ (in this case, my dropbox folder for this purpose; again, not the actual scrivener project folder).

  1. I hit robot 1 on my desktop. Right now, this is called ‘abm-project-move-to-conversion-folder’. When I have a new project, I just open this application in Automator, and change the source directory to that project’s Scrivener external syncing folder. It grabs everything out of that folder, and copies it into a ‘conversion-folder’ that lives on my machine.
  2. I hit robot 2, ‘convert-rtf-to-md’, which opens ‘conversion-folder’ and turns everything it finds into markdown. The conversion scripts live in the ‘conversion-folder’; the things to be converted live in a subfolder, conversion-folder/draft
  3. I hit robot 3, ‘push-converted-files-to-github-repo’. This grabs just the markdown files, and copies them into my local github repository for the project. When I have a new project, I’d have to change this application to point to the new folder. This also overwrites anything with the same file name.
  4. I hit robot 4, ‘clean-conversion-folder’ which moves everything (rtfs, mds,) to the trash. This is necessary because if not, then I can end up with duplicates of files I haven’t actually modified getting through my pipeline onto my github page. (If you look at some of my experiments on github, you’ll see the same card a number of times with 1…2…3…4 versions).
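In shell terms, the four robots amount to something like this. A sketch only: the /tmp paths are placeholders for my real sync folder, conversion folder, and repo, and the rtf-to-md step is faked with a copy standing in for the real conversion script:

```shell
#!/bin/sh
# The four robots, strung together. Every path here is a demo placeholder.
SYNC=/tmp/demo-sync; CONV=/tmp/demo-conversion; REPO=/tmp/demo-repo
mkdir -p "$SYNC" "$CONV" "$REPO"
printf '%s\n' '{\rtf1 a note card}' > "$SYNC/card1.rtf"   # pretend Scrivener synced this

cp "$SYNC"/*.rtf "$CONV"/            # robot 1: grab everything from the sync folder
for f in "$CONV"/*.rtf; do           # robot 2: stand-in for the real
  cp "$f" "${f%.rtf}.md"             #          rtf-to-md conversion script
done
cp "$CONV"/*.md "$REPO"/             # robot 3: copy the markdown into the repo
rm -f "$CONV"/*.rtf "$CONV"/*.md     # robot 4: clean the conversion folder
ls "$REPO"
```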

Maybe it’s possible to create a meta-automator that strings those four robots into 1. I’ll try that someday.
[pause]
Ok, so of course, I tried stringing them just now. And it didn’t work. So I put that automator into the trash -
[pause]
and now my original four robots give me errors, ‘the application … can’t be opened. -1712’. I found the solution here (basically, go to Spotlight, type in ‘activity’, then locate the application on the list and quit it).

Here are my automators:

Robot 1

Robot 2

Robot 3

Robot 4

Automator….

I think I love you.

 

An Open Research Notebook Workflow with Scrivener and Github Part 2: Now With Dillinger.io!

A couple of updates:

First item

The four scripts that sparkygetsthegirl crafted allow him to

1. write in Scrivener,
2. sync to a Dropbox folder,
3. convert to md,
4. open those md files on an android tablet to write/edit/add,
5. and then reconvert to rtf for syncing back into Scrivener.

Screen Shot 2014-09-19 at 2.24.27 PM

I wondered to myself, what about some of the online markdown editors? Dillinger.io can scan Dropbox for md files. So, I went to Dillinger.io, linked it to my dropbox, scanned for md files, and lo! I found my project notes. So if the syncing folder is shared with other users, they can edit the notecards via Dillinger. Cool, eh? Not everyone has a native app for editing, so they can just point their device’s browser at the website. I’m sure there are more options out there.

Second Item

I was getting syncing errors because I wasn’t flipping the md back to rtf.

But, one caveat: when I went to run the md to rtf script, to get my changes back into Scrivener (and then sync), things seemed to go very wonky indeed. One card was now blank, the others were all Scrivener’s markup but Scrivener wasn’t recognizing it.

So I think the problem is me doing things out of order. I continue to play.

Third Item

I automated the running of the conversion scripts. You can see my automator set-up in the screenshot below. Again, I saved it as an application on my desktop. The first step is to grab the right folder; the second, to open the terminal, input the commands, then close the terminal.

Screen Shot 2014-09-19 at 2.36.03 PM

Postscript

I was asked why on earth would I want to share my research notes? Many many reasons – see Caleb McDaniel’s post, for instance – but one other feature is that, because I’m doing this on Github, a person could fork (copy) my entire research archive. They could then use it to build upon. Github keeps track of who forks what, so forking becomes a kind of mass citation and breadcrumb trail showing who had an idea first. Moreover, github code (or in this case, my research archive) can be archived on figshare too, thus giving it a unique DOI *and* proper digital archiving in multiple locations. Kinda neat, eh?