The Minimal Viable Digital History Virtual Machine

I have three classes on the go, all of which are heavily digitally inflected. In the past, I’ve always figured it was better to teach students how to use the machines they have to hand, rather than trying to get them all on a single virtual machine; after all, most of the students come to my classes with only the vaguest idea of what their machines can actually do. Being able to click around madly in Word or PowerPoint is not digital literacy. In which case, better to figure out what this (real) thing is capable of before we complicate matters with another (virtual) thing.

That means a lot of tech support. I’ve always undertaken to help the Windows people put things into their path variables, and to guide the Mac people around the terminal (terminal?! path!?) and so on and so on. Normally, not a big deal. But this term, the three courses combined are taxing me pretty heavily.

Have you tried turning it off and on?

I’m thinking that the next time I run these courses, it’d be better to just bow to the inevitable, get everyone working on the same image, and spend time sorting out a virtual machine and then getting on with life. My question then is, what is the minimum viable digital history virtual machine? What are its components? What flavour of Linux? Should it have a GUI? etc.

There are some places to start. Turkel, Start, and Milligan have the ‘HistoryCrawler’ available for use. But the instructions are, well, complex. (EDIT: As Ian points out in the comments, those instructions are for replicating/building from scratch; to use, just download, load into VirtualBox, and Go. It’s still 8 GB though). One of my undergrads* built a VM last year for me – which was awesome, A+, that man – but it was still too heavy and a touch unstable. Ben Marwick has a VM setup script for archaeological computing (and me futzing with same).

It would have to be extremely straightforward to install. I’m thinking, no more than a handful of commands (and that includes setting up all the virtualization framework too). It would have to be automatically configured to communicate with the student’s ‘My Documents’ folder or similar, for getting stuff in and out (that’s a mission critical point, by the way. Configuring that by hand would cause brain melts). It would have to be extremely light. No one has turned up with a Chromebook (yet), but people are buying some godawful cheap laptops with next to no RAM.

Anyway, those are just some initial thoughts. What should the minimum viable machine have in it? What are the crucial things? Answers on a postcard.

*He also rewrote one of the exercises in the workbook to use a vanilla Linux VM – see

(edited Jan 31, rewritten a bit)



  1. Great stuff, Shawn.. as you know I run into these same problems in class, and would agree that a minimally-viable VM would be a great thing (tied into what you need to run Programming Historian lessons is a good idea, as you floated on Twitter).

    A quick clarification.. while our HistoryCrawler instructions are complex for creating your own version, we have a pre-built version available for download. You can just load it up in VirtualBox and enter the password ‘go’.

  2. Hi Ian – thanks! Ah yes, I misread. That is a lot simpler; I’ll edit the piece. I’m finding that I get a lot of students with MacBook Airs or similar, so I wonder if it’d be possible to have something pared down to 1 or 2 GB…

  3. Shawn

    We’ve just started a (distance) computing course that requires students to run various pieces of interlinked software, with UIs accessed via a browser, so we opted for building it inside a virtual machine to simplify distribution with software provisioned using Vagrant.

    Needless to say, there are *still* a whole raft of problems associated with getting VirtualBox running properly with whatever host operating system (in whatever version and state of update or repair it comes in, etc etc) students are working from.

    Some of these issues may be related to the vagrant route we adopted for starting/stopping the machine, but that was chosen for a handful of reasons:

    1) We had some issues getting services inside the VM started and the vagrant route means we can kick start them explicitly
    2) We wanted to make sure that a shared folder was mounted in a place we knew
    3) Browser accessed services are delivered on various localhost ports, and we wanted to be able to set them to ports we wanted (hopefully in clear port space), or have them reassigned to another port automatically in the event that there was a collision or conflict
    4) If we have to, I think we can send a revised Vagrantfile to update/patch the current VM… (but we haven’t tried this yet, and hopefully we won’t have to…)
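
The port fallback described in point 3 can be sketched roughly as follows. This is only an illustration of the idea in Python, not how Vagrant itself does it (Vagrant has an `auto_correct` option on forwarded ports for this). The function names `port_is_free` and `pick_port`, the port 8888 in the usage note, and the 50-port search window are all assumptions made up for the example.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if we can bind host:port right now, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "address already in use": another service got there first.
            return False
        return True


def pick_port(preferred, host="127.0.0.1", max_tries=50):
    """Return the preferred port if free, otherwise the next free port above it.

    max_tries is an arbitrary search window chosen for this sketch.
    """
    last = min(preferred + max_tries, 65536)  # stay within the valid port range
    for port in range(preferred, last):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port found in {preferred}..{last - 1}")
```

Something like `pick_port(8888)` would hand back 8888 on a clean machine and the next free port above it if a student already has something listening there. There is an unavoidable gap between checking a port and actually using it, so this is a best-effort check rather than a guarantee.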

    My personal preference for distributing software would be to bundle separate applications in docker containers and use a docker compose script to bind together any that need binding together…

    Kitematic is ideal for launching single container apps on the desktop, although there are some issues associated with seeding data inside containers, particularly if you want to mount shared folders with the host machine. Docker Compose *still* doesn’t support – or a graphical UI – in Docker.

    The container approach also makes it relatively straightforward to deploy stuff on a cloud host (I’ve started trying to frame our course software for use in such a way: )

    Services like Tutum Cloud also offer graphical UIs for launching containers on hosts such as Digital Ocean.

    1. Hi Tony- thanks for this! This is a lot to think about. I’ll check out those links. Perhaps a couple of docker containers is the way to go, but wow. This is a bit more complicated than I thought!

  4. Thanks for this post, Shawn. I’ve thought a lot about this and have never come to a satisfactory answer. Here are three different ways of answering your question.

    For my course this semester, my minimum viable machine is RStudio Server. It’s still early days for the course, but I’m optimistic that students will be able to get through the semester without installing any software at all and just using R in their browser and using SSH to that same server if they need any command line tools.

    Another thought on what “minimally viable” might mean is whatever someone will actually use in the course. It’s relatively easy to provision a VM using Vagrant, and Vagrant will let you specify multiple provisioning scripts. So it’s fairly easy to break this up into a script for R, a script for Python, a script for some command line tools, etc. (R is especially easy to provision, since cran2deb4ubuntu lets you install many R packages with apt-get.)

    But for a VM that is intended to be used across classes, I’m not sure minimally viable is the way to go. For that use case, might it be better to aim at a maximally viable VM, which included as many different tools as someone is likely to ever use?

  5. > It would have to automatically be configured to communicate with the students
    > ‘my documents’ folder or similar, for getting stuff in and out (that’s a mission
    > critical point, by the way. Configuring that would cause brain melts).

    There are numerous ways to set up such a link, but unfortunately they mostly require *some* thought on the part of the user. The “My Documents” folder probably isn’t the best location, either, if the VM is being run on computers in an environment where folder redirection is taking place to a quota-enabled server (i.e. any sensibly-run general-use student lab environment).

    Given how much time, thought & effort has gone into creating the Raspbian distribution (and yes, I know someone is going to jump up and down about an Ubuntu-based image instead), and the amazingly low cost of a Raspberry Pi, would creating a DH-enabled “Raspbian” image be a better bet? If every student bought a Raspberry Pi, they’d all have identical hardware, OS & a pre-configured software environment that enabled them to get straight to work without having to do all that nasty configuration work that drives even IT professionals nuts.

    They could even hook it up to their laptops via an Ethernet cable, and run it headless (although then you get back into configuring SSH, VNC etc on the laptop).

    1. Raspbian! Oh, I like the idea of purchasing Raspberry Pi. Still cheaper than a textbook, right? A little wee box of dh goodness…
