An Elegant Open Notebook

I’ve been looking for an open notebook solution for some time. Tonight, I think I’ve hit a combination of tools that is sufficiently powerful and straightforward that I can integrate it into my undergraduate teaching. But first:

Caleb McDaniel’s eloquent argument for why one would want to keep an open notebook is here: http://wcm1.web.rice.edu/open-notebook-history.html

Carl Boettiger is another inspiration: http://www.carlboettiger.info/lab-notebook.html

Mark Madsen’s explanation of how his open notebook works – and the logic he organizes it by – is similarly an inspiration: http://notebook.madsenlab.org/labnotebook.html

and Ben Marwick’s work is pathbreaking (especially within archaeology): http://rstudio-pubs-static.s3.amazonaws.com/14247_4703e332c133404b9765f61082dd54cc.html

Right. So what is this solution? Zettelkasten + Pykwiki. In an earlier post I got the one-card-per-note ‘zettelkasten’ system that Dan Sheffler uses working on my machine (by the way, you will learn a *lot* if you spend some time reading Dan’s thoughts on the art of note taking). In essence, Dan’s system is just a simple plugin for Sublime Text that designates a particular folder a ‘wiki’. When I type [[, Sublime Text shows me a quick list of every markdown file in that folder, and I can select one to make a link. The link then shows up in my file as [[something like this]], and I can ctrl+click on that link to jump to the other file. If there is no such file when I hit [[, I can click to create a new file with the text between the [[ and ]] as the file name. It’s a bit like Notational Velocity in that regard. (And of course, you could use Notational Velocity if you wanted, though at first I didn’t think you could save your notes as separate .md files. Update: yes, yes you can. In ‘preferences, notes, storage’ there are two lists of file types. Hit the + button under each one in turn, adding ‘md’. Then, highlight the other options and hit the – button. Folks just getting started: start with nvAlt. You can also crosslink in nvAlt as described above, using the [[ and typing the first letters of the card you wish to crosslink to.)
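For illustration, a card mid-network might look like the following (the card titles here are invented, but the [[ ]] syntax is exactly what the plugin expects):

## Laurence on streets and movement

+ Laurence’s discussion of street activity connects with [[Space Syntax in Pompeii - shortcomings]], and could be contrasted with [[Access analysis - first principles]].

Ctrl+clicking either bracketed title jumps to that card, or creates it if it doesn’t exist yet.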

So, with a bit of forethought and sensible naming conventions, I can quickly build up quite a network of cards. Dan also has a script for exporting annotations made on pdfs from Skim as markdown files (my post on this), and the resulting md file can be chopped up quickly or integrated into the existing network of cards.

This is good. What is even better is being able to keep this structure, searchable, online, in a way that could be forked. This is where Pykwiki comes into the picture. Pykwiki is a static site generator, run from the command line, that will take a folder of .md files and generate a site from them, which you can then put on your own server space (see a list of features here). Now, I could just put my Zettelkasten in a github repo in the first place, and push the files online. But they would lose that network of connections – remember, Sublime is interpreting my [[ ]] as an internal link. Pykwiki *also* understands those [[ ]] as internal links! (Caveat: don’t use any spaces in file names.)

So here’s the setup. I install Pykwiki. I designate Pykwiki’s ‘source’ folder as my Zettelkasten folder for Sublime Text. I do my readings, I make my notes, I generate my web of notecards. I make sure to have a ‘data block’ for each note – [[ and ]] again, above and below the metadata – with the body of the card below. At a minimum, it looks like this:

[[
title: Space Syntax in Pompeii - shortcomings
]]

## Laurence, Space & Society in Roman Pompeii

+ On page 194....

…but it can also include tags and other kinds of metadata that’d be useful to have. Also, a notecard can be designated ‘private’ so it doesn’t show up in search, but is still findable if you know the direct url (see the sketch below). (And I’m also using another script from Dan Sheffler which associates BibDesk cite keys with custom URLs to open up the pdfs I was reading in the first place, so my online notes will open my pdfs on this machine or any other one where I have that script installed.)

With my session ended, I make sure to save all my cards, close Sublime, go over to the terminal, and run ‘pykwiki cache’ – et voila! The site is generated. It’s now in the ‘docroot’ folder. This folder is pushed to a github pages branch, and I’ve got myself a searchable open notebook wiki (with rss feed! So recipes from IFTTT.com can be used to further mash things up). (Set up on a Mac is easy; Windows, not so easy, but here are instructions in my open notebook.)
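A fuller data block might look something like this – a sketch only, since the exact key names for tags and privacy are assumptions you should check against the Pykwiki docs:

[[
title: Space Syntax in Pompeii - shortcomings
tags: [pompeii, space-syntax]
private: true
]]

And the end-of-session publishing step is just a few commands – a minimal sketch, assuming ‘docroot’ has been set up as its own git repo tracking a github pages branch:

$ pykwiki cache
$ cd docroot
$ git add -A
$ git commit -m "update notebook"
$ git push origin gh-pages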

Sweet!

(PS Thought occurs – since I sometimes also do work in R, I can probably get that whole Rmd scene working with this as well.)

(PPS At this point, you might reasonably expect to find a link to my open online notebook. Erm. Well, I only got all the bugs out of the flow this evening. Over the coming days, I’ll start trying to put some of my existing notes – or maybe, start a new notebook – up.)

Update, Oct 9: Getting pykwiki set up on a Mac is easy; getting it set up on Windows is more difficult, although this could just be because I’ve missed something fundamental. I don’t use any sort of bash on my Windows machine, which might be the source of some of my issues. Anyway, I got it working on Windows, and the instructions are in *my open notebook[!]*: http://shawngraham.github.io/installing-pykwiki-on-windows.html.

In terms of making your notebook live on the interwebs, say on a github pages site: the default pykwiki settings (see the config.yaml file) work out-of-the-box if and only if you move the static files generated (in the docroot folder) to the root of the domain (or subdomain). That is, http://shawngraham.github.io works out of the box, but shawngraham.github.io/example-folder does not. A subdomain like example-pykwiki-notebook.example.com would also work. To get folder paths to work, you have to update the config.yaml’s ‘web_prefix’ setting – see http://shawngraham.github.io/weprefix.html .
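For that folder-path case, the fix is a one-line change in config.yaml – a sketch, using the example-folder path from above (‘web_prefix’ is the setting named in the pykwiki docs; the value here is illustrative):

web_prefix: /example-folder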

PPPS: the files I have up at the moment come from an earlier experiment with mdwiki as an open notebook solution. They are currently horrible because they’ve undergone some nasty translations back and forth across Scrivener on Mac and Windows, an Android device, and a few other tortured byways. I’m in the process of cleaning them up, crosslinking them, and tagging them, which has the virtue of reminding me what I’ve written and what I was thinking about a year ago. Incidentally, I set up the ‘source’ folder to be its own repository on my github, so I have two repos: one that serves the generated site, the other that has the clean .md files. These latter I could copy into a Scrivener project, and use them to begin writing. Win!

Another update: I was reading this: http://www.chs-fellows.org/2015/08/03/contextualizing-digital-data-as-scholarship-in-eastern-mediterranean-archaeology/ by Eric Kansa, which led me to wonder this:

https://twitter.com/electricarchaeo/status/652550926717353992

which led to an awesome conversation, and also this exciting piece of news:

https://twitter.com/erochest/status/652551892690714625

…so you should trace those conversations back. Also, I made a note in my open notebook about it all:

http://shawngraham.github.io/thoughts%20on%20a%20coin.html

(open notebook image: https://www.flickr.com/photos/barnimages/18368762634/sizes/c/ public domain)

Working out the kinks in a VisualSFM via Docker workflow

Not these kinks.

VSFM, for those who’ve tried it, is a right huge pain in the arse to install. Ryan Bauman has done us all a huge favour by dockerizing it. His explanation of this is here – and once you’ve figured out some of the kinks, this is a much easier way of working with it.

Ah yes, the kinks.

First of all, before we go any further, why would you want to do this? Isn’t 123D Catch enough? It is certainly easier, I grant you that. And it does a pretty good job. But structure-from-motion applications each approach the job differently – Ryan does a comparison here on the same objects. Some of those applications are very expensive indeed. VSFM is free to use, can be called from the command line, and with care and practice one can get very good results. (What really caught everyone’s eye on twitter the other day was Ryan’s workflow for generating 3d objects from found drone aerial footage. HOW COOL IS THAT.) So I set out to replicate it.

First things first: you need to go to Docker and install it (here is a post wherein I futz with Docker to run Rstudio).

Now, Ryan’s container (that we will use in a moment) also comes with the handy youtube-dl for grabbing youtube videos, and another package for manipulating and cutting stills out of that video.  What follows are my notes to myself (in which I sometimes copy-and-pasted from others’ posts, to remind me what I was trying to do) as I work through the first part of Ryan’s workflow – from downloading the video to generating the point cloud. The meshlab texturing stuff will be a follow-up post.

Initialize and run boot2docker from the command line, creating a new Boot2Docker VM.

$ boot2docker init

This creates a new virtual machine. You only need to run this command once.

Start the boot2docker VM.

$ boot2docker start

To set the environment variables in your shell, do the following:

$ eval "$(boot2docker shellinit)"

Then this:

$ docker run -i -t ryanfb/visualsfm /bin/bash

The first time you run this, it will take a long time to download everything you need. This is Ryan’s container – the next time you go to do this, it’ll spin up very quickly indeed (one of the advantages of Docker: it’s like a virtual machine with just the bits you need!) Then:

$ youtube-dl 'https://www.youtube.com/watch?v=3v-wvbNiZGY'

downloads a file from youtube called:
The Red Church Dating to the late 5thearly 6th century-3v-wvbNiZGY.mp4

let’s rename that:

$ mv 'The Red Church  Dating to the late 5thearly 6th century-3v-wvbNiZGY.mp4' redchurch.mp4

now let’s create a new directory for it:

$ mkdir redchurch

and move the mp4 file into it:

$ mv redchurch.mp4 redchurch

Ok, so now we change into that directory and split the video into frames:

$ cd redchurch
$ avconv -i redchurch.mp4 -r 1/1 -qscale:v 1 redchurch_%08d.jpg

(note that in the original post by Ryan, he was using ffmpeg; the docker container uses this alternative)

And then we go up a level

$ cd ..

and run some vsfm on it:

$ VisualSFM sfm+pairs+pmvs ~/redchurch redchurch.nvm @8

This part took nearly three hours on my machine.

ASIDE: now, I had to increase the available memory for VSFM to make it work; otherwise I was getting a ‘segmentation error’ at this step. To do this, I first found the .boot2docker folder by control-clicking in the Finder and using ‘Go to Folder’ to get to /Users/[user]/.boot2docker. I opened a new terminal there, and made a new file called ‘profile’ (no extension) with the following info:

#disk image size in MB
DiskSize = 20000

# VM memory size in MB
Memory = 7168

I made the file by typing vi profile at the terminal, then typed in the info; then I hit Escape to stop editing and typed :wq to save the file and close it.

Now, to get stuff out (and this post was most helpful):

we need to open another terminal window, start docker there, and ask Docker to give us the id of the container that is running, so we can cp (copy) files out of it:

$ docker ps

There will be a randomly generated ‘short name’ for your container; the short id will be the same as at the prompt in the terminal where vsfm is running; e.g. in my case: root@031e72dfb1de:~#

Then we need to get the full container id:

$ docker inspect -f '{{.Id}}' SHORT_CONTAINER_ID-or-CONTAINER_NAME

example (drawn from this post):

$ docker ps

CONTAINER ID   IMAGE                  COMMAND             CREATED   STATUS   PORTS   NAMES
d8e703d7e303   solidleon/ssh:latest   /usr/sbin/sshd -D                              cranky_pare

$ docker inspect -f   '{{.Id}}' cranky_pare

You will get a ridiculously long string. Copy & paste it somewhere handy. On my machine, it’s:
031e72dfb1de9b4e61704596a7378dd35b0bd282beb9dd2fa55805472e511246

Then, in your other terminal (the one NOT running vsfm, but which has docker running in it), we do:

$ docker cp <containerid>:path-to-file useful-location-on-your-machine

In my case, the command looks like this:

shawngraham$ docker cp 031e72dfb1de9b4e61704596a7378dd35b0bd282beb9dd2fa55805472e511246:root/redchurch.1.ply ~/shawngraham/dockerific/

Update: turns out you can use the short id (in this case, 031e72dfb1de:root etc.) and it’ll work just fine.

(dockerific being the folder I made for this occasion)
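That is, the long copy command above can be written more briefly as:

$ docker cp 031e72dfb1de:root/redchurch.1.ply ~/shawngraham/dockerific/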

~oOo~

Tomorrow, I’ll write up the kinks in the meshlab part of this workflow. Thanks again to Ryan for a brilliant piece of work!

~oOo~

Update, July 29

Ok, let’s make life a bit easier for ourselves, in terms of getting stuff into and out of the docker container. Let’s create a folder that we can use as a kind of in-out tray. I’ll create a folder on my file system, at /Users/shawngraham/dockerific

Then, when I am ready to run the container, we’ll mount that folder to the tmp folder in the container, like so:

$ docker run -i -t -v /Users/shawngraham/dockerific/:/tmp/ ryanfb/visualsfm /bin/bash

Now, anything we put in the tmp folder will turn up in dockerific, and vice versa. NB, things might get overwritten, so once something turns up in dockerific that I want to keep, I move it into another folder safe from the container. Anyway, when I want to get things out of the docker container, I can just cp (CoPy) <file> <to this location>:

$ cp output.ply /tmp/

…and then grab it from the finder to move it somewhere safe.

Zettelkasten in Sublime (a note on Dan Sheffler’s script)

I’ve rapidly become a huge fan of Dan Sheffler’s workflow. One thing that I’m really looking forward to trying out is his ‘Zettelkasten’ (also here), a kind of flat, wiki-like approach to note taking. I have always struggled with effective notetaking, but combining his markdown export from pdfs (via Skim) with the zettelkasten (which I could then push to github for open-notebooking purposes, or feed to Jekyll, or use for text mining, etc) has me (geek that I am) rather excited about the prospects.

Anyway, I’ve just gotten everything working. Here’s how:

1. Install Sublime Text 3.
2. Download the MyWiki zip from Dan’s repo on Github.
3. Install Package Control from packagecontrol.io

3a (because I forgot this step when I first published this post): Install the Bracketeer plugin: go to Preferences – Package Control – Install Packages – type in ‘bracketeer’, select it, hit enter.
4. Open Sublime Text 3.
5. Under Preferences, go to ‘browse packages’. This opens the package location in your finder.
6. Meanwhile, unzip the MyWiki plugin. Copy the MyWiki folder to the package location.

I had trouble getting Sublime 3 to recognize the ‘keymap’, that is, the file telling Sublime what keys fire up what commands in the MyWiki.py file. I renamed it to ‘Default (OSX).sublime-keymap’ which should’ve solved the problem, but didn’t.

7. So instead, I went to Preferences – key bindings – user, and copied the text of Dan’s file into this default one.
8. In the file ‘MyWiki.sublime-settings’ I changed the wiki_directory like so:

"wiki_directory": "/Users/shawngraham/Documents/notes-experiment-sublime/",

I saved everything, restarted Sublime Text, and voila!

[Screenshot: the MyWiki autocomplete in Sublime Text]
Hard to see it, but I’ve just typed [[ in a markdown document, and an autocomplete with all of the file names (that is, notecards) appears, so that I can ensure consistency across my files or create a new note.

Exporting your PDF Annotations from Skim

I’ve got to write this down now, while I still remember what I did. Dan Sheffler has a great blog, with lots of really neat & useful stuff. One of the things he has is a script for exporting your notes and annotations of pdfs in nicely formatted markdown. (All OS X, I’m afraid, Windows folks.)

First, two things we need:

Skim: http://skim-app.sourceforge.net/
Bibdesk: http://bibdesk.sourceforge.net/

And a pdf of some academic article.

Download and install Bibdesk. Open Bibdesk. Drag & drop your pdf onto the Bibdesk window. Add relevant metadata. Crucial: the ‘Cite key’ field is like a shortcode for referencing all this. [Screenshot] In the screenshot, I’ve got Alice Watterson’s recent piece from Open Archaeology. My cite key is her name & the year.

Now, in Skim, open that pdf up, and start making notes all over it. Save.

The first thing we’re going to do is use Sheffler’s script for making custom URLs that will link our notes to the relevant page of the pdf; these are URLs that we can then use in our markdown documents. His instructions are at: http://www.dansheffler.com/blog/2014-07-02-custom-skim-urls/

To follow those instructions, find the AppleScript editor on your machine, and paste his code in. Save as an application. Then, find the application (he called his ‘Skimmer’) on your machine, right click (or whatever you do to bring up the contextual menu), and inspect the package contents. You then open the info.plist file in a text editor, and swap the default for what Sheffler instructs – see my screenshot, and the sketch below. [Screenshot]

Run the Skimmer application. If all goes well, nothing much should appear to happen. Ok, so, let’s test this out. I went to dillinger.io and made the following md link: [@Watterson2015 [page 120](sk://Watterson2015#120)] and then exported it as html. I opened the html in my browser, and hey presto! When I clicked on the link, it opened the pdf in Skim at my note!
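For reference, registering a custom URL scheme in an application’s info.plist generally involves a fragment like the one below. This is a generic sketch, not Sheffler’s exact text (his post has that); the ‘Skim Link’ name is my own placeholder, and only the sk scheme itself comes from the link format above.

<key>CFBundleURLTypes</key>
<array>
  <dict>
    <key>CFBundleURLName</key>
    <string>Skim Link</string>
    <key>CFBundleURLSchemes</key>
    <array>
      <string>sk</string>
    </array>
  </dict>
</array>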

So that’s part 1 achieved. Now onwards to part 2.

Next, we want to export our Skim annotations as nicely formatted markdown (you could then use them in a blog post, or on github, or wherever): http://www.dansheffler.com/blog/2014-07-07-exporting-skim-notes/

Again, open up your AppleScript editor, and paste in Sheffler’s code. This code requires that you have those custom URLs set up, BibDesk installed, and something called pdftk (I checked it out; it seemed to be a Windows program, so I ignored it. The script ran well anyway, so *shrug*) [1]. Finally, the end of the script opens up Marked for viewing and editing the md. I don’t have that, so I simply changed that final line from Marked to Atom (which I do have), and that works.

If you right-click on your finder, select ‘go to folder’ and type in Library, you can find the ‘Application Support’ folder. I made a new subfolder called ‘Skim/Scripts’ and saved the script in there as skim-export. I fired Skim back up, opened Watterson 2015, selected ‘skim-export’, and behold: [Screenshot] a nicely formatted markdown document with my annotations – and crucially, the links back to the original pages.

I can now save this markdown into a github repo, or any number of markdown-flavoured wiki type things that run locally on this machine, or use pandoc to convert it to something else, or… Markdown is *wonderful*. These scripts are too! Thank you Dan. Next thing I want to try: http://www.dansheffler.com/blog/2015-05-11-my-zettelkasten-in-sublime/

[1] Dan responds at http://dansheffler.com/blog/2015-07-01-electric-archaeology/ to note that the tool in question is here: https://www.pdflabs.com/tools/pdftk-server/ and runs on the command line.

Quickly Extracting Data from PDFs

By ‘data’, I mean the tables. There are lots of archaeological articles out there that you’d love to compile together to do some sort of meta-study. Or perhaps you’ve gotten your hands on pdfs with tables and tables of census data. Wouldn’t it be great if you could just grab that data cleanly?

Jonathan Stray has written a great synopsis of the various things you might try, and has sketched out a workflow you might use. Having read that, I wanted to try ‘Tabula’, one of the options that he mentioned. Tabula is open source and runs on all the major platforms. You simply download it and double-click on the icon; it runs within your browser. You load your pdf into it, and then draw bounding boxes around the tables that you want to grab. Tabula will then extract that table cleanly, allowing you to download it as a csv or tab-separated file, or paste it directly into something else.

For instance, say you’re interested in the data that Gill and Chippindale compiled on Cycladic Figures. You can grab the pdf from JSTOR:

Material and Intellectual Consequences of Esteem for Cycladic Figures
David W. J. Gill and Christopher Chippindale
American Journal of Archaeology , Vol. 97, No. 4 (Oct., 1993) , pp. 601-659
Article DOI: 10.2307/506716

Download it, and then feed it into Tabula. Let’s look at table 2.

[Screenshot: table 2 as it appears in the pdf]

You could just highlight this table in your pdf reader and hit ctrl+c to copy it; when you paste that into your browser, you’d get:

[Screenshot: the pasted table]

Everything in a single column. For a small table, maybe that’s not such a big deal. But let’s look at what you get with Tabula. You drag the square over that same table; when you release the mouse button you get:

[Screenshot: the same table cleanly extracted by Tabula]

Much, much cleaner & faster! I say ‘faster’, because you can quickly drag the selection box around every table and hit download just the one time. Open the resulting csv file, and you have all of your tables in a useful format:

[Screenshot: the downloaded csv, with all tables in order]
But wait, there’s more! Since you can copy directly to the clipboard, you can paste directly into a google drive spreadsheet (thus taking advantage of all the visualization options that Google offers) or into something like Raw from Density Design.
Tabula is a nifty little tool that you’ll probably want to keep handy.

Briefly Noted: Lytro, Light-Field Photography

In the latest MIT Technology Review, there’s a short piece on the ‘Lytro’, a camera that captures not just the light that falls on its sensor, but also the angle of that light. This feature allows different information, different kinds of shots, to be extracted computationally after the button is pressed.

I want one. They sell for $500.

Think of the archaeological uses! I’m no photographer, but as I understand things, a lot of archaeological photography comes down to the creative use of oblique angles, whether to see crop marks or to pick out very fine details of artefacts. If the Lytro captures the angles of the light hitting its sensors, then presumably one could take a shot, post the database of information associated with that shot, then allow other [digital] archaeologists to comb through that data extracting information/pictures of relevance? Perhaps a single photo of the soil could be combed through highlighting different textures, colours, etc…  Try out their gallery here.

The future of this camera is in the software apps developed to take advantage of the massive database of information that it will generate:

Refocusing images after they are shot is just the beginning of what Lytro’s cameras will be able to do. A downloadable software update will soon enable them to capture everything in a photo in sharp focus regardless of its distance from the lens, which is practically impossible with a conventional camera. Another update scheduled for this year will use the data in a Lytro snapshot to create a 3-D image. Ng is also exploring a video camera that could be focused after shots were taken, potentially giving home movies a much-needed boost in production values.

Getting Started with MALLET and Topic Modeling

UPDATE! September 19th 2012: Scott Weingart, Ian Milligan, and I have written an expanded ‘how to get started with Topic Modeling and MALLET’ for the Programming Historian 2. Please do consult that piece for detailed step-by-step instructions for getting the software installed, getting your data into it, and thinking through what the results might mean.

Original Post that Inspired It All:

I’m very interested in topic modeling at the moment. It has not been easy however to get started – I owe a debt of thanks to Rob Nelson for helping me to get going. In the interests of giving other folks a boost, of paying it forward, I’ll share my recipe. I’m also doing this for the benefit of some of my students. Let’s get cracking!

First, some background reading:

  1. Clay Templeton, “Topic Modeling in the Humanities: An Overview | Maryland Institute for Technology in the Humanities”, n.d., http://mith.umd.edu/topic-modeling-in-the-humanities-an-overview/.
  2. Rob Nelson, Mining the Dispatch http://dsl.richmond.edu/dispatch/
  3. Cameron Blevins, “Topic Modeling Martha Ballard’s Diary” Historying, April 1, 2010, http://historying.org/2010/04/01/topic-modeling-martha-ballards-diary/
  4. David J Newman and Sharon Block, “Probabilistic topic decomposition of an eighteenth‐century American newspaper,” Journal of the American Society for Information Science and Technology 57, no. 6 (April 1, 2006): 753-767.
  5. David Blei, Andrew Ng, and Michael Jordan, “Latent Dirichlet Allocation,” The Journal of Machine Learning Research 3 (2003), http://dl.acm.org/citation.cfm?id=944937.

Now you’ll need the software. Go to the MALLET project page, and download Mallet. (Mallet was developed by Andrew McCallum at U Massachusetts, Amherst).

Then, you’ll need the Java developer’s kit – nb, not the regular Java that’s on every computer, but the one that lets you program things. Install this.

Unzip Mallet into your C:\ directory. This is important; it can’t be anywhere else. You’ll then have a folder called C:\mallet-2.0.6 or similar.

Next, you’ll need to create an environment variable called MALLET_HOME. You do this by clicking on control panel >> system >> advanced system settings (in Windows 7; for XP, see this article), ‘environment variables’. In the pop-up, click ‘new’ and type MALLET_HOME in the variable name box; type c:\mallet-2.0.6 (i.e., the exact location where you unzipped Mallet) in variable value.

To run mallet, click on your start menu >> all programs >> accessories >> command prompt. You’ll get the command prompt window, which will have a cursor at c:\user\user> (or similar). Type cd .. (two periods; that ain’t a typo) to go up a level; keep doing this until you’re at C:\. Then type cd mallet-2.0.6 and you’re in the Mallet directory. You can now type Mallet commands directly. If you type bin\mallet at this point, you should be presented with a list of Mallet commands – congratulations!
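Put together, that session looks something like this sketch (your starting prompt will differ):

c:\users\you> cd ..
c:\users> cd ..
c:\> cd mallet-2.0.6
c:\mallet-2.0.6> bin\mallet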

At this point, you’ll want some data. Using the regular windows explorer, I create a folder within mallet where I put all of the data I want to study (let’s call it ‘data’). If I were to study someone’s diary, I’d create a unique text file for each entry, naming the text file with the entry’s date. Then, following the topic modeling instructions on the mallet page, I’d import that folder, and see what happens next. I’ve got some workflow for scraping data from websites and other repositories, but I’ll leave that for another day (or skip ahead to The Programming Historian for one way of going about it.)

Once you’ve imported your documents, Mallet creates a single ‘mallet’ file that you then manipulate to determine topics.

bin\mallet import-dir --input data\johndoediary --output johndoediary.mallet --keep-sequence --remove-stopwords

(modified from the Mallet topic modeling page)

This sequence of commands tells mallet to import a directory located in the subfolder ‘data’ called ‘johndoediary’ (which contains a sequence of txt files). It then outputs that data into a file we’re calling ‘johndoediary.mallet’. Removing stopwords strips out ‘and’, ‘of’, ‘the’, etc.

Then we’re ready to find some topics:

bin\mallet train-topics --input johndoediary.mallet --num-topics 100 --output-state topic-state.gz --output-topic-keys johndoediary_keys.txt --output-doc-topics johndoediary_composition.txt

(modified from the Mallet topic modeling page)

Now, there are more complicated things you can do with this – take a look at the documentation on the Mallet page. Is there a ‘natural’ number of topics? I do not know. What I have found is that I have to run the train-topics with varying numbers of topics to see how the composition file breaks down. If I end up with the majority of my original texts all in a very limited number of topics, then I need to increase the number of topics; my settings were too coarse.
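For instance, a re-run at a coarser grain just changes the number of topics and (so runs don’t overwrite each other) the output file names – the file names here are my own illustration:

bin\mallet train-topics --input johndoediary.mallet --num-topics 50 --output-topic-keys johndoediary_keys_50.txt --output-doc-topics johndoediary_composition_50.txt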

More on interpreting the output of Mallet to follow.

Again, I owe an enormous debt of gratitude to Rob Nelson for talking me through the intricacies of getting Mallet to work, and for the record, I think the work he is doing is tremendously important and fascinating!

Thoughts on the Shadow Scholar

In the Chronicle of Higher Education, there is a troubling piece written by a fellow who writes and sells papers for/to students. Which got me to thinking: shouldn’t text analysis be able to solve this?

Here’s my thinking: I’m willing to bet every author produces unique combinations of words and phrases – a concept that Amazon, for instance, uses to improve its search functions (“statistically improbable phrases“). As the ‘ghost writer’ points out, most of the emails he gets from students are nearly illegible or otherwise atrocious. So: what if, at the start of a school year, you sat all of your students down to handwrite a couple thousand words, on any topic? Writing by hand is important, so that you get that student’s actual, genuine writing. Scan it all in. Perform text analysis on it. Obtain a ‘signature’ for that student’s style. Then, when students submit their papers, analyze them again and compare the signatures (a rough sketch of the idea follows below). Where the signatures don’t match within a certain range, bring the student in to talk about their work. Chances are if they didn’t write it, they probably haven’t read it either…. Repeat each year to account for developing skill and ability.
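A minimal sketch of what that comparison might look like, in Python, using nothing fancier than the relative frequencies of common function words plus cosine similarity (the word list, file names, and threshold are all invented for illustration; real stylometry would be far more careful):

# Crude stylometric 'signature': the proportions of common function words.
from collections import Counter
import math

FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "was", "it", "for", "with", "as", "his", "on", "be"]

def signature(text):
    # proportion of each function word among all function words used
    words = text.lower().split()
    counts = Counter(w for w in words if w in FUNCTION_WORDS)
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def similarity(a, b):
    # cosine similarity between two signatures: 1.0 = identical proportions
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

known = signature(open("handwritten_sample.txt").read())
submitted = signature(open("submitted_paper.txt").read())
if similarity(known, submitted) < 0.9:  # the threshold is a guess
    print("Signatures don't match; time for a chat.")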

Perhaps I’m naive, and text analysis isn’t at that level yet (but I’m willing to bet it could be…). If the problem is a student submits someone else’s work as his own, then maybe if we had a clear signal of his own true work, all this latent computer power sitting around could be brought into the equation…?

Just a thought.


7Scenes: Augmented Reality Authoring for Digital Storytelling

I’m very interested in augmented reality for interpreting/experiencing landscapes (archaeological or historical). I’ve explored things like Wikitude and Layar. There’s a great deal of flexibility and possibility with those two, if you’ve got the ability and resources to do a bit of programming. Skidmore College has used Layar with success to produce a Campus Map Layar. (Follow that link for excellent pointers on how they did it.) But what if you’d like to explore the potential of AR, but don’t have the programming skills?

One platform that I’ve come across recently which can help there is called ‘7Scenes’. It explicitly bills itself as a ‘mobile storytelling platform’. The free account allows you a basic ‘tour’ kind of story to tell; presumably if you purchase another kind of account, different genres become available to you.

I signed up for the free account, and began playing around with it (I’m ‘DoctorG’ if you’re looking). Even with this level of functionality, some playful elements are available – you can set quizzes by location, for instance, and keep score. A tour of your campus for first year students as part of orientation could include quizzes at crucial points.

In the editor window, you first select the genre. Then details (backstory, introduction etc).

The real work begins in the map window. When you add a location, you can make it trigger information or photos when the player encounters it. You can also build in simple quizzes, as in the screenshot.

Once the ‘scene’ is published, anyone with 7scenes on their smartphone can access it. The app knows where you are, and pulls in the closest scene. In about 15 minutes I created a scene with 3 locations, one photo, one info panel, and one quiz, around the main quad here at Carleton. Then, I fired up the app on my iphone and went outside. Even though it was quite simple, it was really rather engaging, wandering about the quad trying to get close enough to the point to trigger the interaction. (Note to scene makers: zoom into the map interface so that your location marker sits precisely where you want it. I put my first point actually outside my intended target, Paterson Hall, so I was wandering about the parking lot.)

I will be playing with this some more; but fired up after only a short investment in time, I wanted to share. The authoring environment makes sense, it’s easy to use, and the results are immediately apparent. When you log back into the 7scenes site, you also get usage metrics and reviews of your scene. If only my digital history students had more smartphones!

More on 7scenes from their own press page

I know that I know nothing

Commuting in Ottawa is an interesting experience. It seems the entire city disappears in the summer, beguiling one into thinking that a commute that takes 30 – 40 minutes in August will continue to be 30 – 40 minutes in September.

This morning, I was pushing 1 hr and 40 minutes. On the plus side, this gives me the opportunity to listen to the podcasts from Scholars’ Lab, from the University of Virginia (available via iTunes U). As I listen to this excellent series of talks (one talk per commute…) I realize just how profoundly shallow my knowledge is of the latest happenings in Digital Humanities – and that’s a good thing! For instance, I learned about Intrasis, a system from Sweden for recording archaeological sites (or indeed, any kind of knowledge) that focuses on generating relationships from the data, rather than specifying a relationships table beforehand (and it melds very well with GIS). This is cool. I learned also about Heurist, a tool for managing research. Also ‘Heml’, the Historical Event Markup and Linking Project, led by Bruce Robertson. As I listened to this last talk, as Bruce described the problems of marking up events/places/persons using non-Gregorian calendars and so on, it struck me that this problem was rather similar to the one of defining sites in a GIS – what do you do when the boundaries are fuzzy? How do you avoid the in-built precision of dots-on-a-map, or URLs that lead to one specific location? Time is Space, as Einstein taught us….

The upshot is, I feel very humbled when I listen to these in-depth and fascinating talks – I feel rather out of my depth. At the same time, I am excited to be able to participate in such a fast moving field.  Hopefully, my small contributions to agent modeling for history generate the same kind of excitement for others!

Publish your excavation in minutes

…provided you blogged the whole thing in the first place.

How, you say?

With Anthologize, the outcome of the one-week-one-tool experiment.

Anthologize is a free, open-source, plugin that transforms WordPress 3.0 into a platform for publishing electronic texts. Grab posts from your WordPress blog, import feeds from external sites, or create new content directly within Anthologize. Then outline, order, and edit your work, crafting it into a single volume for export in several formats, including—in this release—PDF, ePUB, TEI.

How Anthologize came to be is remarkable in itself (see Dan Cohen’s blog) and is a model for what we as digitally-minded archaeology folks could be doing. Which puts me in mind of excavation reports, catalogues, and other materials produced in the day to day work of archaeology.

What if, in the course of doing your fieldwork/archive work/catalogue work/small finds work, you used WordPress as your content management system? There are plugins a-plenty for keeping things private, if that’s a concern. But once the work is complete, run Anthologize and voila: a publication fit for the 21st century.

And, since the constraints of paper publishing no longer apply, David Wilkinson’s thoughts on the fuller experience of archaeology could also now find easier expression – in 2007 I wrote the following:

But he asks, ‘what of characters in archaeological writing?’ Wilkinson’s paper is really making a plea for archaeologists to remember that they themselves are characters in the story of the site or landscape that they are studying, and that they should put themselves into it:

“We all sit in portacabins, in offices, in vans, in pubs or round fires, and we tell stories… we have a great time and drink too much and what do we do the next morning? We get up and go to our offices and we write, ‘In Phase 1 ditch 761 was recut (794) along part of its length.’ Surely, we can do better”.

A similar argument was made in the SAA Archaeological Record last May, by Cornelius Holtorf, in an article called “Learning from Las Vegas: Archaeology in the Experience Economy”. Holtorf argued:

“Learning from Las Vegas means learning to embrace and build upon the amazing fact that archaeologists can connect so well with some of the most widespread fantasies, dreams, and desires that people have today.[…] I am suggesting that the greatest value of archaeology in society lies in providing people with what they most desire from archaeology: great stories both about the past and about archaeological research.”

Archaeology – the doing of archaeology! – is a fantastic experience. You learn so much more about the past when you are at the coal-face itself: when you stand in 35 degree C heat, with the dust on your face so thick you almost choke, debating with the site supervisor the meaning of a complicated series of walls, or when you sit at the bar afterwards with a cool beer, still debating the situation, laughing, chatting. Reading ‘Three sherds of vernice nera ware found in-situ below 342 indicate…’ sucks the fun out of archaeology. It certainly has no romance, which puts the practice of archaeology – as published to the public – far down the list of priorities in this modern Experience Economy. The serious face of archaeology we present to the public is so lifeless: how can we expect government and the public to be excited about our work if we ourselves give every indication of not being excited either?

I’m not arguing that we turn every site monograph into a graphic novel (though that’s an interesting idea, and has been done for teaching archaeology). But with the internet being the way it is these days: couldn’t a project website contain blogs and twitters (‘tweets’, actually) from the people working on it? Can’t we make the stories of the excavation at least as important as the story of the site?

Congratulations to the folks who participated in the creation of Anthologize; there’ll be great things ahead for this tool!

Zotero Maps: Visualizing Archaeology?

You can now map your Zotero Library:

Potential Use Cases:
Map Your Collection By Key Places:
Many records from library catalogs and journal databases come pre-loaded with geographic keywords. Zotero Maps lets you quickly see the relationships between the terms catalogers, authors, and publishers have assigned to the items in your collection. Similarly, as you apply your own geographic tags to items you can then explore those geographic relationships. Whether you’re looking at key locations in studies of avian flu, ethnographic work in the American southwest, or the history of the transatlantic slave trade, the tags associated with your items provide valuable geographic information.

Map Places of Publication:
In many cases places of publication include crucial information about your items. If you’re working on a project involving the history of the book, how different media outlets cover an issue, or how different journals present distinct scientific points of view, the places in which those items are published can provide valuable insight.

In 2007, I was trying something along these lines using Platial (now deceased). Now – since you can add objects from things like Opencontext.org into your Zotero library, and describe these using tags, you could begin to build a map of not only ‘things’ but also the relevant reports etc, all from your browser, without doing any of the fancy coding stuff…

From my library: [screenshot of my mapped Zotero library]