I’m teaching a course at the moment on data mining, visualization, and other sundry topics. Right now, the course takes place in the physical world but this time next year, it will be a completely online course (and students at Carleton U, U Waterloo and Brock U will be able to take it for credit without issue; others might have to arrange transfer credit with their institution). All of the course materials are available on GitHub at https://github.com/hist3907b-winter2015. Feel free to fork, improve, and follow along. I’ll be rewriting a lot of this material in light of this term’s experience.
For instance, there’s the issue of platforms. In the class, we have Windows 7 users, Windows 8, Mac (Mavericks & Yosemite), and two flavours of Linux. This presents certain challenges. Do I try to teach folks how to use the platform in front of them to do the kind of research they are interested in? Or do I try to get them all onto one platform, and teach to that?
It might seem silly, but I elected to do the first. Most of the students I come into contact with are barely aware of the power of the machines that they are facebooking on in class. I wanted to get them familiar with their own environments and what they could accomplish within them.
This was all fine and dandy, more or less, until I decided they should use a shell script to download materials from the web via an API. Here’s the exercise in question. On the plus side, we learned a lot about how our machines worked. On the down side, we shed a lot of tears before everyone was on the same page again. It was at this point that one of the students forked the exercise and re-wrote it to use a virtual machine.
How freaking cool is that – a student contributing to the design of the course! I thought.
I also thought: ok, maybe I was wrong in my approach. Maybe I should’ve had them using a virtual machine from the outset. Now, Bill Turkel has long advocated for using command line tools for digital history research. Recently, he and Ian Milligan and Mary Beth Start put together a super-machine with all of the tools a historian could possibly want. I looked at this, and thought, ‘too much power’. Too many steps. Too many opportunities for something to go wrong.
I needed something stripped down. Ben Marwick, coming at the same problem from an archaeology perspective, put together a Lubuntu-flavoured VM that, once installed, uses a single install script to go out and grab things like RStudio and various Python packages. It lives here: https://gist.github.com/benmarwick/11204658.
I copied that, and tweaked it here and there for my class. Here’s my version: https://gist.github.com/shawngraham/fadc16465d6e27e0f37c (as an aside, I don’t know why my gists always have such crazy strings while Ben’s have sensible digits. Probably a setting somewhere I suppose).
But….
I was running this vm on my computer at home. Everything chugged sooooooo verrrrrryyy slowwwwwllly. Could there be something lighter?
Enter Docker.
A lighter, reproducible environment? Alright, I’ll bite.
You install ‘boot2docker’ on your machine (whether Mac or PC). First hurdle: tick every box in the installer’s list of components. Otherwise, it seems to conflict with any existing VMs or VirtualBox installs you have. Or rather, at least it did that on my machines.
Once installed, you double click the icon, and a shell opens up. Meanwhile, Oracle VirtualBox is running in another window.
This is where it all really went pear-shaped for me. Hurdle two: After much rummaging, I found that I needed to enable virtualization in the BIOS on one of my machines (the BIOS being the firmware that starts up the motherboard; you typically hit F2 or F10 during boot-up to access it. Don’t mess with anything else in there or serious trouble can ensue).
Hurdle three: After another cryptic error message in the shell window, I determined that I had to go into the Oracle VirtualBox settings for the boot2docker machine and select 64-bit Ubuntu (something to that effect; it was a few days ago and I neglected to write down all of the steps). I may have had to remove the virtual machine from VirtualBox and then launch boot2docker again too; it’s all hazy now. So much angst.
Hurdle 3.1?: Meanwhile, on my Mac, it worked at first, but as of this writing it is not working at all and I’m flummoxed.
Hurdle four: So how the hell do we run anything, now that we’ve got the virtual machine up and running? (You’ll know you’ve succeeded when the shell window displays the ASCII-art version of the Docker logo.) I decided to try the RStudio setup described in the Boettiger article. First thing, you need to get RStudio from the Rocker project. If you’re familiar with GitHub, then it’s easy to get images of different ‘containers’ to run in Docker, as for instance here: https://registry.hub.docker.com/u/rocker/rstudio/
So, at the prompt, I hit:
docker pull rocker/rstudio
And after a while the smoke cleared. Ok, let’s run this thing:
docker run -dp 8787:8787 -v /c/Users/shawn graham/docker:/home/rstudio/ -e ROOT=TRUE rocker/rstudio
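The short flags in that command can be spelled out in long form, which makes it a little easier to see what each piece is for. What follows is only a sketch, not part of the original exercise: it prints the command rather than running it, and HOST_DIR is a placeholder path I made up (with no space in it). As far as I understand the Rocker documentation, ROOT=TRUE is a setting read by the rocker/rstudio image to give the ‘rstudio’ user root privileges; treat that as my assumption.

```shell
#!/bin/sh
# Long-form spelling of the flags, built up as an argument list.
# HOST_DIR is a hypothetical placeholder, not a path from the post.
HOST_DIR="/c/Users/yourname/docker"

# --detach  (-d): run the container in the background
# --publish (-p): map host port 8787 to container port 8787
# --volume  (-v): share a host directory with /home/rstudio/ in the container
# --env     (-e): set an environment variable inside the container
set -- docker run \
  --detach \
  --publish 8787:8787 \
  --volume "$HOST_DIR:/home/rstudio/" \
  --env ROOT=TRUE \
  rocker/rstudio

# Dry run: print the command instead of executing it.
echo "$@"
```

To actually launch the container, the last line would be `"$@"` instead of `echo "$@"`.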
I direct you to Ben again, to explain what’s happening here. But basically, Docker is going to serve me up RStudio in a browser. It will connect my directory ‘docker’ on my Windows machine to RStudio, so that I can share files between the Docker container running RStudio and my machine. Point your browser to 192.168.59.103:8787 (although, on my machine, it’s sometimes 192.168.59.104:8787; type ‘boot2docker ip’ to find out what the address is on your machine), sign in to RStudio with ‘rstudio’ as user and ‘rstudio’ as password, and there you go.
Another hurdle: see how there’s a space between ‘shawn’ and ‘graham’ in that command? Yeah, that completely screwed it up. And you can’t just point it to another directory; it has to be your home directory as user on your machine. So I need to rename that directory.
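Part of the trouble with that space is plain shell behaviour: an unquoted space ends one argument and starts the next, so the -v mapping gets chopped in two before Docker ever sees it. Here is a small sketch that shows the splitting without needing Docker at all. The count_args function is a stand-in I made up for illustration; it just reports how many arguments it received.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for docker: it prints how many
# arguments the shell handed to it.
count_args() {
  echo "$#"
}

host_dir="/c/Users/shawn graham/docker"   # the path from the post, space included

# Unquoted: the shell splits the path at the space, so the mapping
# arrives as two separate pieces after -v.
unquoted=$(count_args -v $host_dir:/home/rstudio/)

# Quoted: the whole host:container mapping arrives as one argument.
quoted=$(count_args -v "$host_dir:/home/rstudio/")

echo "unquoted: $unquoted arguments; quoted: $quoted arguments"
```

Wrapping the mapping in double quotes in the real command, as in -v "/c/Users/shawn graham/docker:/home/rstudio/", would at least get it past the shell in one piece. Whether boot2docker’s shared folder then copes with the space is something I have not verified, so renaming the directory may still be the safer route.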
So that’s where I called it a day. I think there’s just a wee bit too much futzing necessary to get Docker running for me to launch it on my students yet. Hell, I’m not entirely sure what I’m doing yet either. Why not just have students install RStudio on their machines as per normal? Why not have them install Python, or any of the other tools we’ll use, as per normal? Maybe if all the bits and pieces of the History VM that Turkel (or Marwick & I) put together can be containerized, and made to launch painlessly in Docker… well, maybe that’s what I need.
Oh… and then I got some crazy error about my daemon not having been fed. Or called. Petted? Treated well? I dunno. Why tell me what’s wrong when you can write something perfectly obtuse? I can always google it.