MA in History, Public History, Digital Humanities: 2 Positions

Damien Huffer and I are working on a project that builds on our ‘Insta-Dead’ work, which data-mined Instagram to explore the trade in human remains. We have an opportunity for two potential MA students to start in September 2018 to work with us at Carleton University. At the same time, the students would pursue their own research within the ambit of this project, which revolves around the use of various AI technologies – especially, but not limited to, various kinds of neural networks. Ideally, the students’ own research projects would push the research into other domains, for instance historical photographs, tourist photos, advertising using historical imagery, or digital historical consciousness.

Interested candidates are invited to contact Dr. Shawn Graham at shawn dot graham at carleton dot ca to discuss their potential research project, to gauge their potential fit with the funding envelope and other potential supplementary funding sources. Candidates are also invited to review the MA History, MA Public History, with Collaborative Digital Humanities Program requirements –

– a good first degree in a relevant subject (history, archaeology, etc)
– existing ability in digital humanities methods or issues is desirable, but not critical. Much more important is an ability to think creatively about the problems or potentials of computational viewpoints.

Activities that the students might be involved in:

+ Writing code to generate datasets
+ Developing various neural networks (NN)
+ Analyzing results
+ Ground-truthing training datasets (making sure that training images are properly classified)
+ Curating and preparing materials for data publication in appropriate venues
+ Research and writing of tutorials
+ Research and writing connected with their own research interests as they intersect with this project
+ Communicating the results of research with relevant publics at conferences and other venues


featured image by Green Chameleon on Unsplash


small acts of disruption in archaeological publishing

Last month I presented at the Computer Applications in Archaeology Conference in Tübingen. I was in #s1 (full TAGS archive here) on ‘creative disruption’ in digital archaeology. Below, my slides and the speaking notes from my presentation.


1.1. first of all – see Katherine Cook’s Twitter Conference Paper because it is a much better discussion than anything I will do here this morning.

1.2. anything I say must be taken with a grain (or more) of salt. White guy on the internet – can get away with a lot that isn’t possible/permitted to folks who don’t look like me. So folks who look like me have to assume the risk so that others can flourish

1.3. A small experiment in failing gloriously in public – i want to put my publishing projects into a framework of information ethics & sensory archaeology.
– ABM: simple rules, complex results;
– small change: unpredictable results:

– complex systems teach us that small changes can lead to startling emergent effects
– i want to think about some small things we can do that disrupt
– the first small thing is to realize that ethical considerations have to be front and centre with digital archae – see the work of Meghan Dennis!

2.  – Allison Parrish, poet & programmer at NYU: “a computer program is a way of projecting power. That’s the point of a computer program, to make a decision and then have it happen millions of times. That’s the real ethical dimension of this for me.”

3. decisions taken in a digital medium, given the nature of computation (whose fundamental action is to copy), get multiplied in their effects. Hence, the choices, when there is a choice to be made (as there always is), are a force multiplier for what we think is important.

4. start from first principles:

  • fundamental action of computer: to copy
  • fundamental result of copying: connection
  • fundamental consequence of connection: extended sensorium
  • digital archaeology is an extended kind of digital kinaesthesia, we go into the flow

5. let’s talk about the things of digital:

• Luciano Floridi: treats everything that exists as informational objects or processes (including bio and other entities, right into databases, agents, etc)
• everything that exists, exists in relation to everything else, with at least some minimum worth. Thus anything that destroys or diminishes data is entropy or morally evil
• ‘Information ethics describes a moral universe [an infosphere] in which not only is no being alone, but every being is indeed morally related to other beings, because in their well-being is connected the welfare of the whole system. [journals] are systems that affect larger systems with their actions, affecting themselves as well, since other systems are procedurally and informationally related to them… Information ethics considers moral actions an information process’. (Sicart discussing Floridi, video game ethics 130). He said ‘agents’; I said ‘journals’
• consider: a paywall is an immoral act because it promotes entropy, it diminishes informational entities
• journals, as we know and love them today fundamentally prevent connection; connection is a kind of sensation (especially digitally)

6. hamilakis 2013 – 415 – argues that aesthetics + politics share same ontological ground: the distribution of the sensible – what is allowed to be sensed + experienced + what is not, what is deemed appropriate or permissible to be sensorially appreciated + embodied and by whom. consensus: the established (imposed or tacitly accepted) sensorial order. dissensus: the challenge to that order by pushing a new + heterodox sensorial regime

7. to stop entropy, we need to restore sensation, that digital archaeological kinaesthesia of distant knowing

hamilakis talks not of senses but of a sensorial field that returns affectivity to our work – “sensoriality + affectivity also enable + invite a radically different approach to the presentation of the archaeological work, be it in scholarly publications, popular authors, or museum displays, thus creating affective instances + environments for diverse publics” h book 20

this is where my projects come into it. I think. You’ll tell me if I’m wrong, eh? 😉

8. ‘epoiesen’ – made – implies sensuous engagement with the past. ‘epoiesen’ on vases: what games were these people playing/ what did ‘epoiesen’ imply for them? about the human body, about innovations of depiction of human form on a medium that traveled – a ceramic social network

9. the first small act of disruption: a focus on the affectivity of digital or other creative work. a focus on process, and design, and valuing subjectivity

10. another small act: …which also ties to video game ethics: games, as the native art form to the digital age, are only ethical if the choices within are meaningful, consequential… thus publishing in Epoiesen also has to give the author a meaningful choice. here, in terms of licensing. author led. form led (text, photos, art, interactives). yes this is harder for me, but so what?

another small act: futureproofing. responsible to the digital entity, don’t lock in scholarship in proprietary formats. reproducibility, replicability guiding poles

11. collaborative reading – hypothesis

12. another small act: reframing peer review. not about quality assurance. not about guarding the borders. but rather about creating new webs of relationships. new conversations. publishing as a starting point, not a finish line. own names. DOIs to recognize the value of the labour, and to fit in with the other games.

13. ODATE – some more small acts. Now, Hamilakis points out H book 9 – archaeology a device of modernity that relies on sense of autonomous and disembodied vision – on other hand, that attitude undermined by intensely physical, embodied interaction with things and environments. This is the same idea that Caraher points to in his ‘slow’ archaeology. digital archaeology is slow. (at least, the way I do it)

14. ODATE is a digital archaeology textbook environment that sits in the same sensuous framework as epoiesen. It comes with its own linux-based computer. Digital archaeology – to learn it, to challenge it, to dispel the magic of Apple’s training us to expect ‘it just works’ – needs us to open the hood. no disembodied distance from the work of the machine. There is a sense of flow that comes working with data and computation that is every bit as sensuous and embodied as ‘dirt’ archaeology.

15. ODATE: digital archaeology is slow. It is organic. Built on github, language of forks and branches is a biological one. ODATE is always going to be wrong and out of date. That is a strength: with github come replicability and reproducibility and cutting and pasting of the bits that work for you. It will grow, there will be multiple copies. There will never be one canonical version of ODATE. That’s a helluva disruption, right there.

16. conclusion – small acts of disruption in archaeological publishing are actually large acts of disruption in how we think about, with, and through digital archaeology. if we think of archaeological publishing in terms of information ethics and archaeological senses, I think there’s one final small act of disruption that flows from that, and it’s this: we all can do this, already.

Tropy – OCR – Notes Workflow?

Here’s the ideal:

  • I take photos of (printed) documents I want with camera phone
  • photos save to google drive
  • Tropy project reads those photos from Google Drive
  • I use Tesseract to OCR those documents
  • The result is added as a note to each document in Tropy

Ideally, that’d all happen automatically. So far, here’s what I can do

  • take photos with the camera
  • find the photos on the camera, upload them to Google Drive
  • in Tropy, I import the photos from that folder
  • in R Studio, I run a batch OCR script that uses Tesseract
  • I manually add the resulting text into the notes field in Tropy

For reference, here’s my batch ocr script:


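# load 'em up
library(magick)     # image_read(), image_resize(), etc.
library(magrittr)   # the %>% pipe
library(tesseract)  # ocr()

dest <- "/path/to/images"
myfiles <- list.files(path = dest, pattern = "jpg", full.names = TRUE)

# improve the images
# ocr 'em
# write the output to text file

lapply(myfiles, function(i){
  text <- image_read(i) %>%
    image_resize("3000x") %>%
    image_convert(type = 'Grayscale') %>%
    image_trim(fuzz = 40) %>%
    image_write(format = 'png', density = '300x300') %>%
    tesseract::ocr()  # the cleaned-up png is handed straight to Tesseract

  # one text file per image, e.g. photo.jpg-ocr.txt
  outfile <- paste(i,"-ocr.txt",sep="")
  cat(text, file=outfile, sep="\n")
})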
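If you'd rather drive Tesseract from Python than from R Studio, here is a minimal sketch of the same batch idea. It assumes the tesseract command-line program is installed and on your path, and that the photos end in .jpg; the function names are made up for illustration. It skips the image clean-up steps and only mirrors the file naming (photo.jpg becomes photo.jpg-ocr.txt), so treat it as a starting point rather than a drop-in replacement.

```python
import subprocess
from pathlib import Path


def build_ocr_commands(image_dir):
    """Return one tesseract command per .jpg in image_dir (hypothetical helper).

    The tesseract CLI takes an input image and an output base name, and
    writes the recognised text to <output base>.txt.
    """
    commands = []
    for image in sorted(Path(image_dir).glob("*.jpg")):
        output_base = f"{image}-ocr"  # tesseract appends .txt itself
        commands.append(["tesseract", str(image), output_base])
    return commands


def run_ocr(image_dir):
    """Run the commands one after another; stops on the first failure."""
    for command in build_ocr_commands(image_dir):
        subprocess.run(command, check=True)
```

Calling run_ocr("/path/to/images") would leave the text files next to the photos, ready to paste into the notes field in Tropy by hand, as before.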




Ed Summers is always up to interesting things. Recently, he cooked up something called étudier

 […] a small Python program that uses Selenium and requests-html to drive a non-headless browser to collect a citation graph around a particular Google Scholar citation or set of search results. The resulting network is written out as a Gephi file and a D3 visualization using networkx.

I had a few issues getting it to upgrade properly (I was there for version 0.0.1!) because I’m frankly a bit messy when it comes to installing and upgrading python things. This led to a lot of frustration, as these things often do, and reminded me forcefully of the advice to be using virtualenv to wall off these different experiments! Below are my notes for getting things running tickety-boo.

pip3 install --upgrade virtualenv

virtualenv edsu-etudier

source edsu-etudier/bin/activate

pip3 install etudier

I already had chromedriver installed using brew install chromedriver so I was ahead of the game there.

Now, let’s find the citation graph for something fun like Hamilakis’ Archaeology and the senses ',5&hl=en

ta da! So below, just a quick ‘n’ nasty viz…

feature image ‘Reading Glasses’ Mari Helin-Tuominen, Unsplash

Using a Static Site Generator to Make a Nicer Omeka Front Page


I like Omeka. But I’m not much good at theme development or customization. John Stewart, on the DH Slack,  showed me some Omeka sites he and his colleagues have been building (like this one and this one) that used static site generators to create lovely front page / splash pages, linking out to the exhibitions and collections.  I was showing these to some of my current students who are working on Omeka projects as part of their MA theses; they liked them, and so we decided to give it a shot.

I initially thought it’d be a simple matter of putting the generated site in the top folder of the domain, and swapping in a new index.html. Turns out, a bit more complicated. John patiently walked me through what he did – thanks John! – and here are my notes for future reference.

We’re using Hugo to generate the static site. Because Jekyll is awful and why would you do that to yourself? We followed the Hugo Quickstart  up to step three. At that point, we cd into our Hugo project’s themes folder, git clone a theme we liked, and copy its config.toml file from its ExampleSite folder into our main project folder. We adjusted the settings there the way we wanted until we got the site looking nice. The hugo command generated the static site into the public folder. Now, this is where things got a little complicated.

When Omeka serves up a site, it generates the look-and-feel from the files in the folder containing the current theme. That’s where our static site has to go. In our case, we were using the default Berlin theme. We’re also using Reclaim Hosting, so navigating around these folders is easy using the FileManager from the cPanel. Go to the relevant theme (for the purposes of this example, the Berlin theme), and you’ll see an index.php file and a css folder. We don’t want our static site to interfere with the Berlin theme inside our omeka site – we’re only messing around with making a nice splash page, remember – so we have to rename our generated css folder in the Hugo site, and then make sure the file paths in our index.html file point correctly. So:

  • once you’ve generated the static site, in the public folder of the Hugo project on your computer:
    • rename the css folder to css2
    • rename index.html to index.php
    • in the index.php file, change the paths to your css folder to css2
    • make sure the filepaths to your css2 folder are pointing to the right place: <link rel="stylesheet" href=""/>
    • check your file for any other calls to that css2 folder and change them up accordingly
    • zip the contents of the public folder into a zipfile
  • in the theme folder in your omeka installation (here, the Berlin theme),
    • rename index.php to index2.php
  • using the filemanager, upload the zip file into the theme folder
  • extract all (right-click on the file).

And now you have a lovely splash page!
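The renaming steps on your own computer are fiddly enough that a script can help. Below is a rough sketch, not what John or I actually ran: it assumes the generated site keeps its stylesheets in a folder called css, that the references in index.html look like css/... or /css/..., and that you write the zip file somewhere outside the public folder. The function name is my own invention.

```python
import re
import zipfile
from pathlib import Path


def prepare_splash_page(public_dir, zip_path):
    """Rename css -> css2 and index.html -> index.php, then zip the folder."""
    public = Path(public_dir)

    # 1. rename the css folder so it cannot clash with the theme's own css
    (public / "css").rename(public / "css2")

    # 2. point every css/ path in the page at css2/ instead
    index_html = public / "index.html"
    html = index_html.read_text(encoding="utf-8")
    html = re.sub(r"(?<![\w-])css/", "css2/", html)

    # 3. save the page as index.php and drop the old index.html
    (public / "index.php").write_text(html, encoding="utf-8")
    index_html.unlink()

    # 4. zip the *contents* of public (not the folder itself)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(public.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(public).as_posix())
    return Path(zip_path)
```

You would still rename the theme's own index.php to index2.php and upload and extract the zip by hand in the file manager. Open the resulting index.php before uploading, too, since a theme may reference its stylesheets in ways this simple search-and-replace does not catch.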

Banner image Andrew Neel, Unsplash

Tropy to Scrivener Workflow

Scrivener is a great tool for writing, especially if you’re like me and you like to chunk out your thoughts so that you can rearrange them later. Scrivener also has a very nice ‘research’ folder, into which you can copy out your notes a la one-thought-one-card, and then drag them into the actual writing as necessary.

Tropy is a new offering from the Roy Rosenzweig Center for History and New Media at George Mason University. It lets you manage your research photographs, such that you can annotate them, transcribe the text (say you’re in an archive taking pics of historical documents), or otherwise mark them up. I recently thought to myself, facing a stack of short-term loan books I’d accumulated, maybe I can take pictures of the bits here that I’m interested in, and use Tropy to sort them out. (I took the pics with my phone, saving them to my google drive which then synced to this computer where Tropy was waiting).


Then I wondered, perhaps I can export the notes such that they update in Scrivener as unique cards? Now, I know that Tropy has a sqlite database in the back end, and presumably I could’ve written some sort of query that’d do all of what I’m about to lay out. But I don’t know how to do that sort of thing. So instead, I’m using jq (see this tutorial) to do the heavy lifting, and a few other commands at the terminal.

1. So make your notes in Tropy. Highlight your images, and select ‘export’. Tropy exports in json-ld which is a bit different than regular ol’ json.

2. I’m on a Mac. Open terminal at the folder where you’re working. We’ll want to remove the top level [ and ], like so:

sed -i.bak 's|\[||' export.jsonld 
sed -i.bak 's|\]||' export.jsonld 

3. Now we’re going to use the jq command (which you can install with homebrew) to get the notes out. This will do the trick:

.["@graph"][].photo[]  | {id: .title, note: .note.html}

So the full jq command in the terminal is:

jq -r '.["@graph"][].photo[] | {id: .title, note: .note.html}' export.jsonld > out.txt

4. Now we’re going to split that out.txt file into separate files. Don’t do it yet, but the command will look like this:

split -p ^{ out.txt new

Let’s make a new folder for the split-up notes that we will then import into Scrivener.

mkdir outputnotes
cd outputnotes

5. Let’s split that out.txt file and have the output written into our current directory:

split -p ^{ ../out.txt

6. Scrivener is probably smart enough to recognize text, even without an extension, but just in case, let’s pretend these things are in markdown. Maybe you actually wrote your notes in markdown in the first place.

for f in *; do mv "$f" "$f.md"; done

7. And so, over in Scrivener, just import this folder!

Or, if you want to use scrivener sync (say you’re collaborating): in scrivener sync settings, make a new folder. Do the jq step, then cd into the new folder (here, ‘scrivtest’). Inside that new folder,

mkdir Notes
cd Notes
split -p ^{ ../../out.txt

You don’t want to be in the Draft folder that scrivener made. Give those files the md extension as before. Now go over to scrivener and hit sync. Make sure to tick off the option to sync all other text objects to the project. Ta da! Your notes are now in your research folder!

I’m sure this can probably be streamlined, but not bad for an hour’s futzing.
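One possible streamlining, as a sketch only: do the jq, split, and rename steps in a single pass with Python. This assumes the export has the shape the jq filter implies (an "@graph" list whose items each carry a "photo" list, with the note text under note.html); the function name and the note-001.md file naming are mine. It reads the export exactly as Tropy wrote it, so the sed step is not needed.

```python
import json
from pathlib import Path


def export_notes(export_path, out_dir):
    """Write each photo's note to its own .md file; return the files written."""
    data = json.loads(Path(export_path).read_text(encoding="utf-8"))
    # the export may be wrapped in a top-level [ ]; accept either shape
    documents = data if isinstance(data, list) else [data]

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for document in documents:
        for item in document.get("@graph", []):
            for photo in item.get("photo", []):
                note = photo.get("note")
                # skip photos with no note (or a note in an unexpected shape)
                if not isinstance(note, dict) or not note.get("html"):
                    continue
                target = out / f"note-{len(written) + 1:03d}.md"
                title = photo.get("title", "")
                target.write_text(f"{title}\n\n{note['html']}\n", encoding="utf-8")
                written.append(target)
    return written
```

Something like export_notes("export.jsonld", "outputnotes") would then give you a folder to import into Scrivener, or to drop into the Notes folder for scrivener sync.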


3d models from archival film/video footage

Yesterday, I helped Andrew troubleshoot some workflow regarding vr-to-real-world photogrammetry. You should go read his post. As I was doing that, I was thinking that the same flow would work for archival video (which I’ve done with visualSFM, but not Regard3d, so challenge accepted! By the way, the VSFM workflow was Ryan’s regarding models from drones).  So I grabbed some aerial photography of Pompeii from WWII era ish, and gave it a spin. It worked, but it was an ugly ‘beta’-worked, so I left my machine running over the weekend and I’ll know by Monday whether or not the result is any better. I wrote up the workflow, thinking it’d be useful for my class, and deposited with Humanities Commons. I pasted it below, as well. Lemme know if it works for you, or if I’ve missed something.


It is possible to make 3d models from archival film/video footage, although the quality of the resulting model may require a significant amount of sculpting work afterwards to achieve a desirable effect. It depends, really, on why one wants to build a 3d model in the first place. Archaeologists for instance might want to work with a 3d rendering of a building or site now lost.

The workflow
The workflow has a number of steps:

1. obtaining the video (if it is on eg. youtube)
2. slicing the video into still images
3. adding camera metadata to the images
4. computing matched points across the images
5. triangulation from the matched points
6. surface reconstruction

Necessary software
nb these are all open-source or free-to-use programs

1. Youtube-dl
2. ffmpeg
3. exiftool
4. regard3d
5. meshlab (for post-processing)

Step One Downloading from Youtube

Archival or interesting footage of all kinds may be found on youtube and other video streaming services. Youtube-dl is a sophisticated program for downloading this footage (and other associated metadata) from youtube and some other sites. Find a video of interest. Note the url. Then:


Try to find video that does not have watermarks (the example above has a watermark and probably is not the best source video one could use). Look for videos that are composed of long cuts, that sweep smoothly around the site/object/target of interest. You may wish to note the timing of interesting shots, as you can download or clip the video to those passages (see the youtube-dl documentation)

Step Two Slicing the Video into Stills

ffmpeg is a powerful package for manipulating video and audio. We use it to cut the video into slices. Consult the full documentation to work out how to slice at say every 5 seconds or 10 seconds (whatever is appropriate to your video). Make a new directory in the folder where you’ve downloaded the video with mkdir frames. Then the command below slices at every second, numbers the slices and puts them into the frames subdirectory:

ffmpeg -i "downloaded-film.mp4" -r 1 frames\images-%04d.jpeg

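Windows users would call ffmpeg with ffmpeg.exe (if they haven’t put it into their system’s path variable). The backslash in the path above is Windows style; on a Mac or Linux machine, use frames/images-%04d.jpeg instead.

Step Three Adding Camera Metadata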

We will be using Regard3d to stitch the images together. Regard3d needs to know the camera make, model, focal length (mm), and sensor width (mm). We are going to fudge this information with our best approximation. ‘Sensor width’ is the width of the actual piece of hardware in a digital camera upon which light falls. You’ll have to do some searching to work out the best approximation for this measurement for the likely camera used to make the video you’re interested in.

Find the camera database that Regard3d uses (see the documentation for Regard3d for the location on your system). It is a csv file. Open it with a text editor (eg Sublime Text or Atom. not Excel, because Excel will introduce errors). Add the make, model, and sensor width information following this pattern:


Regard3d reads the exif image metadata to work out which camera settings to use. Focal length is read from the exif metadata as well. We assign these like so, from the command line in your frames folder:

exiftool -FocalLength="3.97" *.jpeg
exiftool -Make="CameraMake" *.jpeg
exiftool -Model="CameraModel" *.jpeg

Note that the make and model must absolutely match what you put into the camera database csv file – uppercase, lowercase, etc matters. Also, Windows users might have to rename downloaded exiftool file to exiftool.exe and put it into their path variable (alternatively, rename it and then put it in the frames folder so that when you type the command, your system can find it easily).
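For a class, it can be handy to wrap steps two and three in a small script so the same settings get used every time. This is only a sketch: it assumes ffmpeg and exiftool are installed and on your path, and the function names are invented here. It builds the same commands as above (with the three exiftool tags combined into one call) and leaves it to you to run them.

```python
import subprocess
from pathlib import Path


def slice_command(video, frames_dir, per_second=1):
    """Build the ffmpeg command that cuts a video into numbered stills."""
    pattern = Path(frames_dir) / "images-%04d.jpeg"
    return ["ffmpeg", "-i", str(video), "-r", str(per_second), str(pattern)]


def exif_command(frames_dir, make, model, focal_length):
    """Build one exiftool command that sets all three tags on every frame.

    make and model must match the camera database entry exactly,
    including upper and lower case.
    """
    frames = sorted(str(p) for p in Path(frames_dir).glob("*.jpeg"))
    return [
        "exiftool",
        f"-FocalLength={focal_length}",
        f"-Make={make}",
        f"-Model={model}",
        *frames,
    ]


def run(command):
    """Run a command built above; raises if the program reports an error."""
    subprocess.run(command, check=True)
```

For example, run(slice_command("downloaded-film.mp4", "frames")) followed by run(exif_command("frames", "CameraMake", "CameraModel", "3.97")) would reproduce the two steps. Building the path with pathlib also sidesteps the backslash versus forward slash difference between Windows and other systems.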

Step Four Computing Matches

Open Regard3d and start a new project. Add a photoset by selecting your frames directory. Note that when you used the exiftool, the original images were copied within the folder with a new name. Don’t select those original images. As the images load up, you will see whether or not your metadata is being correctly read. If you get NaN under make, model, focal length, or sensor width, revisit step three again carefully. Click ok to use the images.

Click on compute matches. Slide the keypoint density sliders (two sliders) all the way to ‘ultra’. You can try with just the default values at first, which is faster, but using ‘ultra’ means we get as many data points as possible, which can be necessary given our source images.

This might take some time. When it is finished, proceed through the next steps as Regard3d presents them to you (the options in the bottom left panel of the program are context-specific. If you want to revisit a previous step and try different settings, select the results from that step in the inspector panel top left to redo).

The final procedure in model generation is to compute the surfaces. When you click on the ‘surface’ button (having just completed the ‘densification’ step), make sure to tick off the ‘texture’ radio button. When this step is complete, you can hit the ‘export’ button. The model will be in your project folder – .obj, .stl, and .png. To share the model on something like Sketchfab, zip these three files into a single zip folder. On Sketchfab, you upload the zip folder.

Step Five Clean Up

Double click on the .obj file in your project folder. Meshlab will open and display your model. The exact tools you might wish to use to enhance or clean up your model depend very much on how your model turned out. At the very least, you’ll use the ‘vertice select’ tool (which allows you to draw a box over the offending part) and the ‘vertice delete’ tool. Search the web for help and examples for the effective use of Meshlab.



I’m trying out Regard3d, an open-source photogrammetry tool. A couple of items, memo-to-self style of thing:

    • its database does not have cellphone cameras in it. Had to google around to find the details on my particular phone
    • its database is this: 
    • just had to find where it was on my machine, and then make an entry for my phone. I’m still not sure whether I got the correct ‘width’ dimension – running with this. 
    • nb don’t do this with excel – excel does weird things to csv files, including hidden characters and so on which will cause Regard to not recognize your new database entry. Use Sublime Text or another text editor to make any changes. You can double click on an image in the imageset list inside Regard and add the relevant info one pic at a time, but this didn’t work for me.
    • I took the images with Scann3d, which made a great model out of them. But its pricing model doesn’t let me get the model out. So, found the folder on the phone with the images, uploaded to google drive, then downloaded. (Another nice thing about Scann3d is when you’re taking pictures, it has an on-screen red-dot/green-dot thingy that lets you know when you’re getting good overlap.)
    • Once I had the images on my machine, I needed to add exif metadata re focal length.  Downloaded, installed, exiftool. Command:  exiftool -FocalLength="3.97" *.jpg
    • In Regard3d, loaded the picture set in.
    • The next stages were a bit finicky (tutorial) – just clicking the obvious button would give an error, but if I had one of the image files selected in the dialogue box, all would work.
    • here’s a shot of the process in…erm… process…

  • Console would shout ‘error! error!’ from time to time, yet all continued to work…

I’m pretty sure I saw an ‘export to meshlab’ button go by at some point… but at any rate, at the end of the process I have a model in .ply and .obj!  (ah, found it: it’s one of the options when you’re ready to create the surface). All in all, a nice piece of software.


Markov Music; or; the Botnik Autogenerator Reel

You must’ve seen the Harry Potter chapter written with markov chains / predictive text (not AI, I should point out). I went to the site, and thought, I wonder what this could do with music written in the text ABC notation format. So, grabbing the same source files that gave us Mancis the Poet (where I used RNN to generate the complete files), I loaded Botnik with Cape Breton Fiddle tunes. Then I generated a text, clicking madly in the middle of the interface. The result:

A ab|ca fe|dfba f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a|
Da|bf{g}fe b a2 f2|d a ab|ca fe|dfba f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a| f2 ed|ceac b2a|

Which, when you add some made-up metadata like so to the front:

T:Botnik Autogenerator Reel
O:21st century Canadian

…becomes a file that can be turned into MIDI, thence mp3. You can generate your own fiddle line with this corpus:

And here you go: the music notated above as MIDI piano:
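If you're curious what a markov chain over ABC notation looks like in code, here is a toy sketch. It is not how Botnik works under the hood (I don't know its internals); it just records which chunk of notation follows which, then walks those records at random. The function names are made up, and splitting the tune on spaces is a crude assumption about what counts as a 'word' of music.

```python
import random
from collections import defaultdict


def build_chain(tokens, order=1):
    """Map each run of `order` tokens to the tokens seen right after it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state].append(tokens[i + order])
    return chain


def generate(chain, length, seed=None):
    """Walk the chain at random and return a list of `length` tokens."""
    rng = random.Random(seed)
    states = sorted(chain)
    state = rng.choice(states)
    order = len(state)
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # dead end: nothing ever followed this state, so jump elsewhere
            state = rng.choice(states)
            followers = chain[state]
        output.append(rng.choice(followers))
        state = tuple(output[-order:])
    return output[:length]


# a fragment of the tune above, split on spaces
tune = "A ab|ca fe|dfba f2 ed|ceac b2a| f2 ed|ceac b2a|"
chain = build_chain(tune.split())
print(" ".join(generate(chain, 16, seed=42)))
```

With a corpus this small the output loops on itself almost immediately, which is more or less what happened to me when clicking madly in the middle of the Botnik interface.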

The Human in the Digital Humanities

I’m giving a talk in a month’s time, for the launch of a new DH centre at Drew University. This necessarily involves travel to the US. I’ve decided to go because these kinds of events are more important, in my view, than (say) traveling to a conference. Starting something that will benefit students > giving a paper that merely adds a line to my cv. The way things are going state-side, I see reasons for hope, and reasons for despair, and so if I can help further DH down there, I guess I’d better go. Because I do see DH, when it turns outward, as something that can (does?) make a difference.

Or at least, it should.

Anyway, the title is ‘The Human in the Digital Humanities’. As I sketch it out, I wonder if I shouldn’t call it ‘the Humane’, which is a slightly different kettle of fish.  As part of my writing process, I also asked folks what they thought such a talk might cover; the point of this post is to gather the results together, to date. I also wanted to draw attention to this tweet by Paige Morgan, as I think it (and its thread, and resulting conversations) captures some of the things I want to talk about.

So, here are the other tweets. My initial tweet:

(featured image, Geralt, Pixabay)