How to split a CSV file

If you’re on a PC, the instructions we posted here work. It’s a macro, in Visual Basic, for Excel. But after a long back and forth yesterday with @thomasgpadilla, we worked out that it would break in various Mac versions of Excel. Why? I do not know. But there’s a pretty simple command line sequence on a Mac that’ll do the trick. We split the file into big chunks of 200 lines, then split the big chunks into one-line files. So, grab a moderately large CSV (like John Adams’ diary, 1025 entries, 1 per row) and fire up your terminal thusly:

$ split -l 200 johnadams-diary.csv 200out
$ split -l 1 200outaa aa
$ split -l 1 200outab ab
$ split -l 1 200outac ac
$ split -l 1 200outad ad
$ split -l 1 200outae ae
$ split -l 1 200outaf af

and so we get a series of files, until all 1025 rows are their own file. But they don’t have extensions. So:

find . -type f -exec mv '{}' '{}'.csv \;

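will recursively go through the current folder, finding files, and appending .csv to them. That means every file, including the original CSV and the 200-line chunks, so move those somewhere else first.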

Ta da!

[edit] sometimes you should read the manual. The error message I was getting when I split the original file one line at a time was ‘file too big’. But of course, that’s because of the default names a-z by a-z, so only 676 combinations, which means that yes, 1025 lines *is* too big… but, if you tell split to use a three-letter prefix, you can split up to 17576 lines. The -a flag lets you make this change, like so:

split -a 3 -l 1 johnadams-diary.csv

which is what I should’ve done in the first place. D’oh! Ah well: you don’t have to know everything in digital history. Work it out in public, and somebody is sure to let you know how you could’ve done it better ;) Happily, I caught this fairly quickly after I made my first post – but how much more elegant if I’d gone for the best solution right away? Well, sometimes the best solution is the one that works when you need it to.
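For anyone who wants to try the one-pass version safely, here is a minimal sketch. It assumes a throwaway folder and a five-row stand-in for the diary; the `rows` folder and the `row_` prefix are made-up names for illustration, not anything from the steps above. It prints the suffix arithmetic, splits one line per file, and appends .csv only to the pieces.

```shell
#!/bin/sh
# Suffix arithmetic: 26 letters per position.
echo "two letters: $((26 * 26)) files; three letters: $((26 * 26 * 26)) files"

# Hypothetical demo data: five rows standing in for the diary CSV.
workdir=$(mktemp -d)
printf 'row %s\n' 1 2 3 4 5 > "$workdir/diary.csv"

# Split one line per file into its own folder, using three-letter suffixes.
mkdir "$workdir/rows"
split -a 3 -l 1 "$workdir/diary.csv" "$workdir/rows/row_"

# Append .csv to the pieces only; the original file is left untouched.
for f in "$workdir"/rows/row_*; do
  mv "$f" "$f.csv"
done

ls "$workdir/rows"
```

Keeping the pieces in their own folder means the rename loop never touches the source file, which the recursive find would.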

Crafting Digital History version 0.5: Some Final Projects

So the experiment of teaching data mining & visualization to history students – which will be rebranded ‘crafting digital history’ in its next iteration in order to attract a broader spectrum of students and to more accurately reflect what we’re doing – is done.

There’ve been some great moments, like when Matt forked one of my tutorials and rewrote it for the better, or built a virtual machine. Or when Patrick finally slayed Github! Or when Allison got the Canadiana API to work. Or when Phoebe finally persuaded Inkscape to play nice. Or when Matt conquered TWARC. Or when TEI blew Ryan’s mind. Or when Christina forked an Anthropology class project at MSU to repurpose for her project. Or… or… or. We covered a lot of ground.

So, I have permission to share some of these projects. In no particular order, here are some final projects from HIST3907b.

Matt T – The Historical Consciousness of Reddit

Matt D – What do Civil Servants Edit on Wikipedia?

Ryan – Searching for Residential Schools: How Google Trends can illuminate who is talking about residential schools, where they are, how they’re searching, and why.

Patrick – Urban and Rural Voting Patterns in Three American Elections

Christina – The St. Johns Micro History Mapping Project

Luke – Late 20th Century Immigration in Bubbles

Jonlou – Video games & historians

There are a few more to come in; I’ll add them here.

Macroscopic approaches to archaeological histories: insights into archaeological practice from digital methods [SAA session 200]

Going to the SAA? Why not stop in on session 200 on Friday morning, April 17?

Room: Golden Gate 3
Time: 10:30 AM – 12:00 PM
Chair: Shawn Graham

10:30 Tom Brughmans—Off the Beaten Track: Exploring what Lies Outside Paths of Most Frequently Cited Publications in Citation Networks

10:45 Joshua Wells, David Anderson, Eric Kansa, Sarah Kansa and Stephen Yerka—Beyond Sharks and Laser Beams: Lessons on Informatics Needs, Open Behaviors, and Analytics Practices to Achieve Archaeological Big Data, as Learned from the Digital Index of North American Archaeology (DINAA)

11:00 Eric Kansa—Academic Freedom, Data, and Job Performance in the Panopticon

11:15 Lorna-Jane Richardson—Discussant

11:30 Ian Kretzler, Joss Whittaker and Ben Marwick—Grand Challenges vs Actual Challenges: Text Mining Small and Big Data for Quantitative Insights

11:45 Ethan Watrall—Discussant

Original abstract for the session:

The history of archaeology, like most disciplines, is often presented as a sequence of influential individuals and a discussion of their greatest hits in the literature.  Two problems with this traditional approach are that it sidelines the majority of participants in the archaeological literature who are excluded from these discussions, and it does not capture the conversations outside of the canonical literature.  Recently developed computationally intensive methods as well as creative uses of existing digital tools can address these problems by efficiently enabling quantitative analyses of large volumes of text and other digital objects, and enabling large scale analysis of non-traditional research products such as blogs, images and other media. This session explores these methods, their potentials, and their perils, as we employ so-called ‘big data’ approaches to our own discipline.

The Original Big Data

I’m speaking tomorrow at Carleton U’s Data Day. I’m the only historian/humanist/archaeologist/whatever on the ticket. I can’t even stay for the full event, because I teach (my #hist3907b students are showing off their term projects!). Last year, I felt the speakers at the event were dismissive towards the humanities.

So when I was asked to speak this year, I said ok. My original draft went in all guns a-blazin’. I took a day to digest it, and decided, no, not all that useful, and threw it out. Below then are my speaker’s notes for what I’m going to say, regarding

History: The Original Big Data

The slides are online on github here. My actual talk will differ from what I’m writing below as I go off on tangents (though not many; only 15 minutes). This’ll give you a feel though for what I hope becomes a constructive point of departure for engagement with my data science colleagues.

1. title. Millions of $ spent digitizing historical resources;

2. opening. Every passing day, we leave ridiculous amounts of traces – typically in 1s and 0s. How do we make any sense of it? For what purpose? What does it mean? What does it do to us, if those traces can be…. tracked?

3. troy. ‘Big data’ is not the first to wrestle with the problems of abundance. _This image_ shows several metric tonnes of archaeology recovered from a recent season of excavation at ancient Troy. Every sherd, every piece of pottery, every grain of pollen, every lithic, sits not just in 3d space, but in a 4d space of deposition and another one of use! It’s an incredibly entangled mess, from which archaeological methods allow us to reconstruct an entire civilization. How’s that for big data?

4. big data is ever with us. archaeology/history the original big data. In my talk, I want to suggest ways in which these disciplines of big data in the past have more in common with all of you than you might first have guessed.

5. (monte testaccio: I measure data in cubic metres, not mere terabytes! roar!)

6. Carp Mountain / Ottawa’s own monte testaccio. talk about big stinky data.

7. Thinking in 4d. Archaeology and history bring skills and methods for dealing with multiplex, multicausal/multivalent information. Context is king.

8. Whitehouse. We’re not just concerned with asking questions of our data in the here and now, but also with thinking how to manage our data so that questions we *can’t* imagine can be asked in the future with tools that *haven’t* been invented. We’re remarkably forward thinking doncherknow. Eric Kansa is one archaeologist in particular who has been at the forefront of such efforts in my own field, archaeology. Recently recognized by the White House for his work, he’s helping set the agenda in digital humanities more broadly.

9. Firehose. ‘Big’ isn’t really that useful a term though. It’s a relative measure; thus the goalposts are always moving. What was big five years ago: is it still big, if you’re measuring in terms of digital storage? Better to think of ‘big’ in relationship to your own ability to apply your method to it. Big is in the eye of the beholder; big is when you need to reduce complexity through computation.

10. Ian. …and so we’re in an era now when we as historians/archaeologists are having to invent new methodologies – for historians in particular, often in the smoking ruins of corporate decisions that obliterate the record of *millions of people’s lives*.

11. Teaching. The methods we’re coming up with, often borrowed from big data, sometimes made up by we ourselves, often have an element of deformation to them. We’re not using computation to prove an hypothesis; we’re using it to deform our worldview, to generate new ideas, to see data at a scale and perspective otherwise impossible to obtain. So let me tell you about my students, who’ve just encountered these ideas for the first time.

12-13-14 examples from class, still ongoing, these are early visuals used with permission

15. Imagequilt google images ‘DH projects’. It’s an exciting time to be teaching history. The sheer vitality and breadth of what’s being done is exhausting to keep up with. So y’all should keep an eye on

16. Data speaks? what unites our varied approaches is the reflective critique of what we’re doing, how the data is collected, how the code replicates certain visions of the world, of power, of control, of templates and constructed selves.

17. Data/Capta. Data are not neutral; anyone who tells you otherwise is trying to sell you something. They aren’t objective. Digital data in particular are not! There is nothing natural about interacting with 1s & 0s – it’s entirely constructed, and it’s worth thinking hard about by whom and for whom regarding whom.

18. Big Capta. Our platforms are built by people who imagine most people are like them. And if you’re not a white guy? Digital media can be a harsh place. Which is why so much of what occurs online is performative, or actively trying to screw with, hide from, or subvert, the algorithms that are capturing our data.

19. Big Data needs DH. Big data could be liberating; it could be empowering; it could be transformative. There’s much promise in big data in being able to take a macroscopic look at ourselves. The role of the humanities is sometimes to critique, to help realise the promise.

20. storytellers. Critique doesn’t mean ‘be negative about’. There’s sometimes a tendency to frame data and data science somehow in a battle to the death, as if big data was not something that the humanities has centuries of experience in dealing with. I think that misunderstands what could be a productive, respectful relationship. I think we’re both in the business of telling stories – perhaps for different goals (which we can discuss)…

21. complementary. But this makes the relationship complementary. Each needs the other.

22. And what of my own work? Well, if we’ve got time, this is the kind of stuff I do… I stand between worlds.

HIST3907O ‘Digital History Research Methods’ or, Crafting Digital History

(I really need to work on my course titles.)

Registration is open! [edit: not quite. *But*, if you leave me your email address, I can send you information about registration options as soon as it is. There’s a ‘send me info’ form at the bottom of this post.]

Join me next winter, online, to learn how to craft digital history. You can just follow along if you don’t want to pay tuition – all my materials will be openly available/copyable/remixable. If you need a university transfer credit, that (probably) can be arranged too. I especially welcome folks who do not consider themselves to be techy.

HIST 3907O Crafting Digital History Winter 2015

Professor: Shawn Graham with guest appearances by Ian Milligan and probably others too!

Introduction: “We’ve spent millions digitizing the world’s historical resources. Let’s work together to figure out what they can teach us” – Adam Crymble

How do we find, analyze, and visualize the patterns in historical data? Is the internet a historical source? How do people talk about history online? Is Google changing our historical consciousness? What happens when people off-load their historical memory to Wikipedia? How do we regain control over our digital identity as historians? What does open access research mean for me?

Crafting Digital History explores these questions and more over the term through a series of hands-on exercises and individual project work. You do not need to be ‘techy’ to succeed in this course. I know that digital skills come in all shapes and sizes. What is far more important is that you are willing to try, and willing to say ‘I don’t know – help?’ I expect you to talk to each other in this class. Share your work. Collaborate. Help each other!

Digital history is a kind of public history. What’s more, the skills you will learn in this class will make you a better historian, a more critical consumer of online media, and more employable. If you want to do more with your computer than post on Facebook, this class is for you.

Class Format: We will be meeting face-to-face, virtually, once a week via a modified Google Hangout. These meet-ups are not obligatory, but you will get more out of the course if you attend. They will help you stay on task. The class is divided into two-week modules that mirror the digital history workflow. There will be a menu of exercises to complete within each module (precisely which exercises will depend; in general terms, the exercises are pitched at different comfort levels, and so I will expect you to push yourself to do as many as possible). You can see an earlier iteration of the class materials here on github (note that the order of elements on a github page updates to put the most recent changes at the top; start with the ‘syllabus’ folder! Note also that I will be revamping these materials in light of our experience this term, so that the fully online version is more polished.)

I anticipate being able to provide server space for you to set up your own digital platforms, blog, and digital identity. You will keep an online research notebook of your work, and a digital repository for your project. You will be expected to comment/learn/draw inspiration from the work of your peers, by leaving reflections in your own notebook. Your final project will be posted online (individual format and approach will be determined).

Aims and Goals: By the end of this course, you will be able to:

  1. Identify and define the limitations of useful sources of historical data online
  2. Compare and employ appropriate tools to clean and manipulate this data with a critical eye to how the tools themselves are theory-laden
  3. Analyze data using various tools with an awareness of the tendency of tools to push towards various historiographic or epistemic perspectives (ie, the ‘procedural rhetorics’ of various tools)
  4. Visualize meaningful patterns in the data to write ‘good history’ across multiple platforms, with critical evaluation of the limitations
  5. Model best practices in open access data management as mandated by SSHRC and other research agencies
  6. Develop an online scholarly voice to contribute data and reflection to the wider digital history community

Assessment: online notebook; reflection pieces; final project. There will be no final examination in this course.

Text: An online workbook will be provided. Readings will be via online materials, provided within the workbook. You might topic model them… You may wish – but you are in no way obliged – to obtain a copy of ‘The Historian’s Macroscope’ (please note that the price listed in Amazon is not correct; do not purchase until I can confirm the correct price). A draft version of the text is available for free at

Questions? Please email me at: shawn dot graham at carleton dot ca or find me on twitter @electricarchaeo.

Somewhere in the desert… a temple

My minecraft expedition was a success. Let me share some observations.

Firstly -> I seeded the wrong world. I used

Double Village

as seed for ‘large biomes’ when I should have used it for ‘default’. Reading the map incorrectly happens all the time in landscape archaeology though. Transpose some digits, and soon you’re hundreds of metres in the wrong spot.

Framing my expedition in my mind as a kind of steam-punk exploration helped get me back ‘in the game’:

I found the village quite easily this time. It was filled with NPCs going about their mysterious business. I, a stranger, wandered into their midst and had no impact on their lives. Doesn’t that often seem the way of a ‘foreign’ expedition? When as a graduate student I was excavating at Forum Novum, our world and that of the people whose local marketground we were digging up really did not intersect, except in very particular contexts: the bar and the restaurant. On market day, we would all head back to Rome. Canadian lad flies in, digs, figures it all out, writes a paper, never explains/connects with the locals. As I remarked at the time,

And so I bumbled away, trying to record stratigraphically what I was up to. The different kinds of blocks do help differentiate context – sand fill is quite different from the sandstone blocks the temple was built with. Unfortunately, sandstone is also part of the geology of Minecraft, and typically occurs around 3 or 4 blocks down from the surface in this biome. So it became difficult to figure out where the temple ended and the local geology began. Since the temple is of a common ‘type’ in Minecraft, I could just dig to exhume that preexisting type-idea and poof: complete temple. The act of excavation creates the archaeology in more ways than one, it seems.

Channeling my inner Howard Carter there. But – in this world with no ‘rules’, no overarching ‘story’, deciding to go on an archaeological expedition forces a story on us. Interacting with the NPCs, and the crude excavation tools, pushes us towards a 19th century frame of mind. In my steam-punk narrative I was constructing on twitter, the archaeologist-as-better-class-of-looter trope seemed to emerge naturally out of my interaction with the game mechanics.

And then this happened.

We’ll come back to that. Suffice to say, this encounter with the ‘otherness’ of the inhabitants of the village was oddly discomfiting.

Clearly, Notch has watched too many Indiana Jones films. Meanwhile, the villagers continued to trouble me.

And then night fell. I decided to try to spend it with the villagers.

I broke the door, quite by accident. Clumsy foreigner. Interfering.

From above, I watched the zombies and creepers and who knows what else hunt each NPC down and kill them.

So I managed to set into action a chain of events that resulted in the death of the entire village. Now obviously *real* archaeological excavation rarely results in the deaths of the locals, but there are unintended consequences to our interventions. Here, the game holds a distorted fun-house mirror to life. But were I doing this with a class, this would be a teachable moment to consider the impact of academic archaeology in those ‘distant’ lands we study.

For my minecraft adventure, I left the expedition and struck out on my own. Soon I discovered more temples, more villages, more ruins. If you’re exploring too, you can find them here:

266.9 66.87 1036.99
-219.24 65.270 13.56
58 67 347
487.73 46 560.3
247.76 66 784
430 63 929.8
692 70 1256.7

Now, one could use those coordinates to begin mapping, and perhaps working out, something of the landscape archaeology in this world. One of those coordinates belongs to a vine-covered stone temple in the jungle. Here, our expectations of what ‘archaeology’ is (informed by the movies) come to the fore.

Now, it may be that I should mod this world more in order to enable a post-colonial kind of archaeology within it. But the act of modding is itself colonialist…

So what have I learned? I have often argued in my video games for historians class that it’s not so much the ‘skin’ of a game that should be of concern to historians, but rather the rules. The rules encode the historiographic approach of the game’s designers. You’re good at the game? You’re performing the worldview of the game’s creators. But in a game like minecraft, where the rules are a bit more low-level (for lack of a better term), what’s interesting is the way player agency in the game intersects and merges with the player’s own story, the story the player tells to make sense of the action within the world. It’s poiesis. Mimesis. Practomimetic? So while some of the game’s embedded worldview can be seen to be drawn straight from the Indiana Jones canon, other elements, like the agency of NPCs, discomfit us precisely because they intersect our own worldviews (the sociocultural practice of academic archaeology) in such a way as to draw us up short.

It will be interesting to see what Andrew’s expedition uncovers…

Somewhere in the desert…

A lost village

At the upcoming SAA in San Francisco, Andrew Reinhard and I are participating in a forum on digital public archaeology. Our piece, ‘Playing Pedagogy: Videogaming as site and vehicle for digital public archaeology’ is still in a process of becoming. Our original abstract:

While there is an extensive literature on the pedagogical uses of video games in STEM education, and a comparatively smaller literature for languages, literature, and history, there is a serious dearth of scholarship surrounding videogames in their role as vectors for public archaeology. Moreover, video games work as ‘digital public archaeology’ in the ways their imagined pasts within the games deal with monuments, monumentality, and their own ‘lore’. In this presentation, we play the past to illustrate twin poles of ‘public’ archaeology, as both worlds in which archaeology is constructed and worlds wherein archaeological knowledge may be communicated.

We had initially thought to write a game to explore these ideas, and so our entire presentation would involve the session participants playing it. But writing games is tough. In fact, it would be hard for one to top the game made by Tara Copplestone for the 2014 Heritage Jam, ‘Buried’. However, another venue presents itself. Andrew recently proposed to the makers of No Man’s Sky that he be allowed to lead an archaeological expedition therein.

“What!” I hear you exclaim. Well, think of it like this. We’re used to the idea of reception studies, of how the past is portrayed in games, movies, novels. We’re also used to the idea of games as being the locus for pedagogy, or for persuading, or making arguments. What happens then, in a game like No Man’s Sky, where the entire world is generated algorithmically from a seed? That is, no human designs it: it emerges. Rather like our own universe, eh? Such procedural games are quite common, though none perhaps are as complex in their world building as Dwarf Fortress (which evolves not just the world, but also culture & individual family/clan/culture lineages!)

What then does such xenoarchaeology look like? How does that intersect with digital public archaeology? Well, if archaeological method has any truth to it, then in these worlds we might be faced with something profoundly alter, something profoundly different (which also accounts for why the writers of Star Trek placed such stock in archaeology).

We’ve got a month to sort these thoughts out. But it was in this frame of mind that I started thinking what archaeology in Minecraft would look like, could look like, and what it might find. Not in Minecraft worlds that have been lovingly built from scratch by a human. No, I mean the ones grown from seeds. It’s quite interesting – since no computational process is actually truly random, if you know the seed from which all calculations and algorithms are run, you can recreate the exact sequence that gives rise to a particular world (in this, and indeed in all, computational simulations). There is quite a thriving subculture in Minecraft, it turns out, that shares interesting seeds. And so, as I searched for seeds that might prove fertile for our talk, I came across ‘Double Village’ for Minecraft 1.6.4. (See method 5 for spawning worlds from seeds). If you’ve got Minecraft 1.6.4 you too can join me on my expedition to a strange desert land…
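To see the seed idea in miniature, here is a toy sketch. It is emphatically not how Minecraft builds a world: the `generate_world` function, the four terrain names, and the seed value 164 are all made up for illustration. It only shows that a pseudo-random generator started from the same seed replays the same sequence.

```shell
#!/bin/sh
# Hypothetical toy world generator: the seed fixes the whole sequence of draws.
generate_world() {
  awk -v seed="$1" 'BEGIN {
    srand(seed)                                   # start the generator from the given seed
    split("desert village temple jungle", terrain, " ")
    line = ""
    for (i = 1; i <= 8; i++) {
      line = line terrain[int(rand() * 4) + 1] " "  # pick one of the four terrain names
    }
    print line
  }'
}

# Two expeditions that start from the same seed.
first=$(generate_world 164)
second=$(generate_world 164)
echo "$first"
echo "$second"
```

The two lines printed should be identical on any one machine. Change the seed and the strip of terrain will most likely change too, which is why sharing a seed is enough to share a world.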


The texts all say the same thing. Set the portal to ‘Double Village’ and soon you’ll find the exotic and lost desert villages. I put on the archaeotrancerebretron, grabbed my kit bag, and gritted my teeth. My companions all had theirs on too. We stepped into the charmed circle…

‘Teaching 1613, An Algorithmic Incoherence’, or, the results of an experiment in automatic transcription

I loaded the audio of the opening remarks I made at last year’s Champlain Colloquium at Carleton into YouTube, to see what Google’s automatic transcription would make of it.

Ladies and Gentlemen, I give you,

‘Teaching 1613, An Algorithmic Incoherence’

0:00 maize from these critical encounters I’m yours and
0:04 think back to my high school history class and mutual security
0:08 more I don’t be overly much space in homers
0:12 we brigham
0:15 hey don’t think I actually for him as a good
0:19 historical persons old I’m
0:23 my position we’re in the artist’s
0:26 those addresses her donor speaking truth to power
0:30 to whom do we use in humans
0:34 the how to change such as homeless
0:38 Jaume here in this place
0:41 this time intern so 201 church introducing my
0:48 rules is batting practice teaching in volunteering
0:51 wrestle with this question these questions it is a
0:56 various university classroom secondary schools
0:59 the water column jam we are the people who was in
1:04 people that are you know that ass time hurdles and they’re going to
1:09 problem I I’m it she does so we ask that his own loss
1:15 you know I designs
1:18 that mean for us on this issue
1:22 use minutes or so that’s just wrong
1:26 I’m Evans and you know
1:30 also moms no said also I am so there’s no
1:35 lines or yes yes it does
1:38 don’t know those did final sorry I’m gone
1:42 my saddam
1:45 we have john wong you I
1:48 comedian is historic Department is here you go
1:51 University where he teaches courses in so long as you know the issue
1:55 one so this is going to her place memory and remember
1:59 placing yeah although it has a very strong residents
2:03 and numerics today a
2:06 know I was engage I’m program so it was also observed
2:11 also sewing machine i mean for his own use or lose you
2:15 him to keep his arm because it’s government
2:19 Karen the Russian a
2:22 yeah YES on the measure them all
2:26 lost museum chaos gym class heroes:
2:30 no year also a pedagogy
2:33 anything
2:34 yeah yeah he’s OK
2:37 you people in the US you all moved into a home
2:41 1 0 June 1810
2:44 yeah yeah and you just who is the director of any
2:49 education for the can see you vision
2:52 luminous Jim so I was I wish to change
2:57 share those experiences and observations
3:01 I and we should use those observations for a jumping off point
3:05 for our discussion that only are you know
3:08 billion GG 6 p.m.

…I wonder though, if I went through the transcription and corrected it – since Google now knows what I sound like, and what I’m saying at each of these timestamps: would the next bit I upload be better transcribed? Am I teaching the machine? Are we all?

The Data Driven DJ

The ‘Data Driven DJ’ project is brilliant. I can see so much potential in it. I intend to write more about it soonish, but you should go and look at this project now. Run. Don’t walk!

Watch this:

Also, note this:

I don’t have very specific guidelines for this, but I’m generally looking for these kinds of sounds:

Music you own the rights to (and would allow me to use it in a fair way)
High-quality recordings of instruments (the weirder the better)
Sound recordings of cultural or historical significance (or really any recording that is interesting or unique in some way) [see post]

I’m going to see if I can find some audio somewhere that would meet that last requirement, and send it to him. You should too!

I’ve been interested in sound, space, history, data, and experience for a while (you might even call that a kind of augmented reality, or a visualization, or a sonification, or…) but instead of crappily coding my own stuff, I think I’m going to explore the data driven dj’s materials for a while, see what I can build out from there.

Oh, and if you’re interested, here’s some of my sonic’d stuff:

Hearing the past

Historical Friction

Listening to Topic Models

The Audio Guide 2.0 (wow, that’s an oldie!)

Rocker and Docker and Daemons …. oh my!

I’m teaching a course at the moment on data mining, visualization, and other sundry topics. Right now, the course takes place in the physical world, but this time next year it will be a completely online course (and students at Carleton U, U Waterloo and Brock U will be able to take it for credit without issue; others might have to arrange transfer credit with their institution). All of the course materials are available on Github. Feel free to fork, improve, and follow along. I’ll be rewriting a lot of this material in the light of this term’s experience.

For instance, there’s the issue of platforms. In the class, we have Windows 7 users, Windows 8, Mac (Mavericks & Yosemite), and two flavours of Linux. This presents certain challenges. Do I try to teach folks how to use the platform in front of them to do the kind of research they are interested in? Or do I try to get them all onto one platform, and teach to that?

It might seem silly, but I elected to do the first. Most of the students I come into contact with are barely aware of the power of the machines that they are facebooking on in class. I wanted to get them familiar with their own environments and what they could accomplish within them.

This was all fine and dandy, more or less, until I decided they should use a shell script to download materials from the web via an API. Here’s the exercise in question. On the plus side, we learned a lot about how our machines worked. On the down side, we shed a lot of tears before everyone was on the same page again. It was at this point that one of the students forked the exercise and re-wrote it to use a virtual machine.

How freaking cool is that – a student contributing to the design of the course! I thought.

I also thought: ok, maybe I was wrong in my approach. Maybe I should’ve had them using a virtual machine from the outset. Now, Bill Turkel has long advocated for using command line tools for digital history research. Recently, he and Ian Milligan and Mary Beth Start put together a super-machine with all of the tools a historian could possibly want. I looked at this, and thought, ‘too much power’. Too many steps. Too many opportunities for something to go wrong.

I needed something stripped down. Ben Marwick, coming at the same problem from an archaeology perspective, put together a Lubuntu-flavoured VM that, once installed, uses a single install script to go out and grab things like Rstudio and various python packages. It lives here:

I copied that, and tweaked it here and there for my class. Here’s my version: (as an aside, I don’t know why my gists always have such crazy strings while Ben’s have sensible digits. Probably a setting somewhere I suppose).


I was running this vm on my computer at home. Everything chugged sooooooo verrrrrryyy slowwwwwllly. Could there be something lighter?

Enter Docker.

A lighter, reproducible environment? Alright, I’ll bite.

You install ‘boot2docker’ on your machine (whether Mac or PC). First hurdle: tick all the boxes for what you’ll install. Otherwise, it seems to conflict with any existing VMs or virtual boxes you have. Or rather, at least it did that on my machines.

Once installed, you double click the icon, and a shell opens up. Meanwhile, Oracle VirtualBox is running in another window.

This is where it all really went pear-shaped for me. Hurdle two: After much rummaging, I found that I needed to enable virtualization in the BIOS for one of my machines (the BIOS is the firmware that starts up the motherboard; typically you hit F2 or F10 during boot-up to access it. Don’t mess with anything else in there or serious trouble can ensue).

Hurdle three: After another cryptic error message in the shell window, I determined that I had to go into the Oracle VirtualBox settings for the boot2docker machine and select 64-bit ubuntu (something to that effect; it was a few days ago and I neglected to write down all of the steps). I may have had to remove the virtual machine from VirtualBox and then hit boot2docker again too; it’s all hazy now. So much angst.

Hurdle 3.1?: meanwhile on my Mac, while it worked at first, it is as of this writing not working at all and I’m flummoxed.

Hurdle 4: So how the hell do we run anything, now that we’ve got the virtual machine up and running? (You’ll know you’ve succeeded when the shell window displays the ascii-art version of the Docker logo.) I decided to try the Rstudio described in the Boettiger article. First thing, you need to get Rstudio from the Rocker project – if you’re familiar with github, then it’s easy to get images of different ‘containers’ to run in docker, as for instance here:

So, at the prompt, I hit:

docker pull rocker/rstudio

And after awhile the smoke cleared. Ok, let’s run this thing:

docker run -dp 8787:8787 -v /c/Users/shawn graham/docker:/home/rstudio/ -e ROOT=TRUE rocker/rstudio

I direct you to Ben again, to explain what’s happening here. But basically, docker is going to serve me up Rstudio in a browser. It will connect my directory ‘docker’ on my Windows machine to Rstudio, so that I can share files between the docker container running Rstudio, and my machine. Point your browser to  (although, on my machine, it’s sometimes; type ‘boot2docker ip’ to find out what the address is on your machine), sign in to Rstudio with ‘rstudio’ as user and ‘rstudio’ as password and there you go. Another hurdle: see how there’s a space between ‘shawn’ and ‘graham’ in that command? Yeah, that completely screwed it up. And you can’t just point it to another directory – it has to be your home directory as user on your machine. So I need to rename that directory.
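Part of what goes wrong with that space can be seen without Docker at all. The sketch below assumes nothing beyond a plain shell; `count_args` is a made-up helper that just reports how many arguments it received. Unquoted, the shell cuts the volume path in two at the space; quoted, it arrives as one piece. Whether boot2docker then accepts the quoted path is not something I have confirmed, so renaming the directory may still be the safer route.

```shell
#!/bin/sh
# Hypothetical helper: report how many separate arguments the shell passed in.
count_args() {
  echo "$#"
}

# Unquoted: the space in 'shawn graham' splits the volume path into two words.
unquoted=$(count_args -v /c/Users/shawn graham/docker:/home/rstudio/)

# Quoted: the whole host:container mapping stays together as one argument.
quoted=$(count_args -v "/c/Users/shawn graham/docker:/home/rstudio/")

echo "unquoted: $unquoted arguments"
echo "quoted: $quoted arguments"
```

In the unquoted case the command receives three arguments instead of two, so the second half of the path would be read as something else entirely, quite possibly as the image name.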

So that’s where I called it a day. I think there’s just a wee bit too much futzing necessary to get Docker running, for me to launch it on my students yet. Hell, I’m not entirely sure what I’m doing yet either. Why not just have students install Rstudio on their machine as per normal? Why not have them install python, or any of the other tools we’ll use, as per normal? Maybe if all the bits-and-pieces of the History VM that Turkel (or Marwick & I) put together can be containerized, and made to launch painlessly in Docker… well maybe that’s what I need.

Oh… and then I got some crazy error about my daemon not having been fed. Or called. Petted? Treated well? I dunno. Why tell me what’s wrong when you can write something perfectly obtuse? I can always google it.