Open Notebook Research in Digital Archaeology

We're currently writing the very first draft of our integrated DHBox virtual-machine-and-textbook for digital archaeology. It's aimed at the same crowd as a regular intro-to-archaeology text, that is, first or second year students with little digital grounding. It won't cover everyone's wishlist for digital archaeology, but it will, with care, be a solid foundation for going further.

In the instructions below, we are imagining the student using the command line within DHBox. If you're following along here, go over to DHBox, click on 'start hour long demo', then 'command line'. The login and password are both 'demonstration'.

As always, please use hypothes.is (annotated copy link) to annotate or leave comments. Thanks!

1.4 Open Notebook Research & Scholarly Communication

Digital archaeology necessarily generates a lot of files. Many of those files are data; many more are manipulations of that data, or the data in various stages of cleaning and analysis. Without any sort of version control or revision history (as detailed in the previous section), these files quickly replicate to the point where a project can be in serious danger of failing. Which file contains the ‘correct’ data? The correct analysis? Even worse, imagine coming back to a project after a few months’ absence. Worse still, after a major operating system update of the kind foisted on Windows users from Windows 7 to Windows 10. The bad news continues: magnetic storage can fail; online cloud services can be hacked; a key person on the project can die.

Even if the data makes it to publication, there is the problem of the data not being available to others for re-interrogation or re-analysis. Requests for data from the authors of journal articles are routinely ignored, whether by accident or design. Researchers may sit on data for years. We have all of us had the experience of working on a collection of material, and then writing to the author of the original article, requesting an explanation for some aspect of the data schema used, only to find out that the author has either died, kept no notes, left the field entirely, or simply doesn’t remember.

There is no excuse for this any longer. Open notebook science is a gathering movement across a number of fields to make the entire research process transparent by sharing materials online as they are generated. These include everything from the data files themselves, to the code used to manipulate them, to notes and observations in various archives. Variations on this 'strong' position include data-publishing of the materials after the main paper has been published (see for instance OpenContext or the Journal of Open Archaeological Data). Researchers such as Ben Marwick and Mark E. Madsen are leading the field in archaeology, while scholars such as Caleb McDaniel are pushing the boundaries in history. The combination of simple text files (whether written text or tabular data such as .csv files) with static website generators (i.e., plain HTML rather than dynamically generated database-driven websites like WordPress) enables the live publishing of in-progress work. Carl Boettiger is often cited as one of the godfathers of this movement. He makes an important distinction:

This [notebook, not blog] is the active, permanent record of my scientific research, standing in place of the traditional paper bound lab notebook. The notebook is primarily a tool for me to do science, not communicate it. I write my entries with the hope that they are intelligible to my future self; and maybe my collaborators and experts in my field. Only the occasional entry will be written for a more general audience. […] In these pages you will find not only thoughts and ideas, but references to the literature I read, the codes or manuscripts I write, derivations I scribble and graphs I create and mistakes I make. (Boettiger)

Major funding bodies are starting to require a similar transparency in the research that they support. Recently, the Social Sciences and Humanities Research Council of Canada published guidance on data management plans:

All research data collected with the use of SSHRC funds must be preserved and made available for use by others within a reasonable period of time. SSHRC considers “a reasonable period” to be within two years of the completion of the research project for which the data was collected.

Anecdotally, we have also heard of work being denied funding because the data management plan, and/or the plan for knowledge mobilization, made only the briefest of nods towards these issues: 'we shall have a blog and will save the data onto a USB stick' does not cut it any more. A recent volume of case studies in 'reproducible research' includes a contribution from Ben Marwick that details not only the benefits of such an approach, but also the 'pain points'. Key amongst them were that not everyone participating in the project was on board with using scripted code to perform the analysis (preferring instead the point-and-click of Excel), the duplication of effort that emerged as a result, and the complexities that arose from what he calls the 'dual universes' of Microsoft tools versus open source tools (MARWICK REF). On the other hand, the advantages outweighed the pain. For Marwick's team, because their results and analysis can be re-queried and re-interrogated, they have an unusually high degree of confidence in what they have produced. Their data and their results have a complete history of revisions that can be examined by reviewers. Their code can be re-used and re-purposed, making their subsequent research more efficient. Marwick goes on to create an entire 'compendium' of code, notes, data, and software dependencies that can be duplicated by other researchers. Indeed, we will be re-visiting their compendium in Section XXXXXXXXX.

Ultimately, McDaniel says it best about keeping open notebooks of research in progress when he writes,

The truth is that we often don’t realize the value of what we have until someone else sees it. By inviting others to see our work in progress, we also open new avenues of interpretation, uncover new linkages between things we would otherwise have persisted in seeing as unconnected, and create new opportunities for collaboration with fellow travelers. These things might still happen through the sharing of our notebooks after publication, but imagine how our publications might be enriched and improved if we lifted our gems to the sunlight before we decided which ones to set and which ones to discard? What new flashes in the pan might we find if we sifted through our sources in the company of others?

A parallel development is the growing practice of placing materials online as pre-prints or even as drafts, for sharing and for soliciting comments. Graham, for instance, uses a blog as a place to share longer-form discursive writing in progress; with his collaborators Ian Milligan and Scott Weingart, he even wrote a book 'live' on the web, warts and all (which you may still view at The Macroscope). Sharing the draft in progress allowed them to identify errors and omissions as they wrote, and for their individual chapters and sections to be incorporated into class syllabi right away. In their particular case, they came to an arrangement with their publisher to permit the draft to remain online even after the formal publication of the 'finished' book – which was fortunate, as they ended up writing another chapter immediately after publication! In this, they were building on the work of scholars such as Kathleen Fitzpatrick, whose Planned Obsolescence was one of the first to use the MediaCommons CommentPress website to support the writing. CommentPress is a plugin for the widely used WordPress blogging system that allows comments to be made at the level of individual paragraphs. This textbook you are currently reading uses another solution, the hypothes.is plugin, which fosters communal reading and annotation of electronic texts. This points to another happy by-product of sharing one's work this way – the ability to generate communities of interest around one's research. The Kitzes et al. volume is written with the Gitbook platform, a graphical interface for writing that uses Git at its core with markdown text files to manage the collaboration. The commit history for the book is then also a record of how the book evolved, and who did what to it when. In a way, it functions a bit like 'track changes' in Word, with the significant difference that the evolution of the book can be rewound and taken down different branches when desired.

In an ideal world, we would recommend that everyone push for such radical transparency in their research and teaching. But what is safe for a group of (mostly) white, tenured men is not safe for everyone online. In which case, what we recommend is for individuals to assess what is safest for them to do, while still making use of the affordances of Git, remote repositories, and simple text files. Bitbucket, at the time of writing, offers free private repositories (so you can push your changes to a remote repository without fear of others viewing or cloning your materials); Reclaim Hosting supports academic webhosting and allows one to set up the private, Dropbox-like file-sharing service ownCloud.

In the exercises below, we will explore how to make a simple open notebook via a combination of markdown files and a repository on Github. Ultimately, we endorse the model developed by Ben Marwick, of creating an entire 'research compendium' that can be installed on another researcher's machine, but a good place to start is with the historian Lincoln Mullen's simple notebook templates. This will introduce you to another tool in the digital archaeologist's toolkit, the open source R programming language and the RStudio IDE (integrated development environment).

Far more complicated notebooks are possible, combining more features and more ways of compiling your research. Scholars such as Mark Madsen use a combination of Github pages and the Jekyll blog generator (for more on using Jekyll to create static websites, see Amanda Visconti's Programming Historian tutorial). A simple Github repository and a WordPress blog can be used in tandem, where the blog serves as the narrative part of the notebook, the part that tries to make sense of the notes contained in the repository. This aspect of open notebook science is critically important in that it signals your bona fides as a serious scholar. Research made available online is findable; given the way web search works, if something cannot be found easily, it might as well not exist.

Ultimately, you will need to work out what combination of tools works best for you. Some of our students have had success using Scrivener as a way of keeping notes, where Scrivener writes to a repository folder or some other folder synced across the web (like Dropbox, for instance). In this workflow, you have one Scrivener file per project. Scrivener uses the visual conceit of actual 3 x 5 notecards. Within Scrivener, one makes one card per note, and keeps them in a 'research' folder. Then, when it comes time to write up the project, those notecards can be moved into the draft and rearranged as necessary so that the writing flows naturally from them.

1.4.1 How to Ask Questions

  • stuff here on how to ask a question on sites like stackoverflow etc.
  • also this https://speakerdeck.com/jennybc/reprex-help-me-help-you, although perhaps move it to 3.1 literate programming. perhaps use it to create actual examples that can be copied over to R though. In which case, talk about it in both places.
  • the idea being ways in which your open notebook becomes an invitation to others to help you, and also, a way of making sure you find the answer you’re looking for when the inevitable troubles emerge

1.4.2 Discussion

Questions for discussion:

  1. Search the archaeological literature (via JSTOR or Google Scholar) for examples of open notebook science 'in the wild'. Are you finding anything, and if so, where? Do there seem to be impediments from the journals regarding this practice?
  2. What excites you about the possibilities of open notebook archaeology? What are the advantages?
  3. What frightens you? What are the disadvantages?
  4. Search online for the ‘replicability crisis in science’. Is there any such thing in archaeology?
  5. Study Marwick’s paper REF and compare it to its supporting Github repository. What new questions could be asked of this data?
  6. In what ways are terms like ‘open access’, ‘open source’, and ‘open science’ synonyms for a similar approach, and in what ways are they different?

1.4.3 Take-aways

Keeping an open notebook (or, if necessary, a closed notebook) is a habit that must be cultivated. As a target to aim for, try to have:

  • each experiment|project in its own folder
  • each experiment|project with a regular pattern of subfolders (data, figures, text, bib, and so on)
  • the experiments|projects under version control
  • a plan for data publishing; one option is to submit the repository to Zenodo or similar to obtain a digital object identifier (DOI) for the repository
  • a plan to write as you go, in a fail log or blog or what-have-you; obtain a DOI for this, too. (A sketch of such a folder layout follows below.)
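On disk, that target might look something like this minimal sketch (in the style of the folder diagrams later in this chapter; the folder names are suggestions only):

|-project
    |-data
    |-figures
    |-text
    |-bib
    |-readme.md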

We have only just mentioned DOIs, but when your notebook and your narrative about your research have DOIs, it becomes easier for your colleagues to cite your work – even this work in progress!

1.4.4 Further Reading

Baker, James. 'Preserving Your Research Data', The Programming Historian. http://programminghistorian.org/lessons/preserving-your-research-data

1.4.5 On Privilege and Open Notebooks

While we argue for open notebooks, there may be circumstances where this is not desirable or safe to do. Readers may also want to explore an Evernote alternative, Laverna, which stores your notes in your web browser's cache, hence keeping them private, but which also allows syncing to services such as Dropbox (versioning and backup are still absolutely critical). If you work primarily on a Mac, nvAlt by Brett Terpstra is an excellent note-taking application that can sync remotely. Another possibility is Classeur, a web app that integrates with various blogging platforms, allows for syncing and collaboration and the choice of what to make public and what to keep private, and includes the ability to sort notes into various notebooks. It does not save locally, so be warned that your information is on their servers. There is an API (application programming interface) that allows you to download your materials (for more on APIs, see [Introduction to Digital Libraries, Archives & Repositories]).

A final word on the privilege involved in keeping an open notebook is warranted. To make one's research available openly on the web, to discuss openly the things that worked, the things that haven't, the experiments tried and the dead ends explored, is at the current moment something that depends on the perceived race, class, and gender of the person doing it. What passes without comment when I (Shawn Graham, a white, tenured professor) do something could attract unwarranted, unwanted, and unfair attention if an undergraduate woman of colour tried the same. This is not to say this always happens; but, disgracefully, it happens far too often. It is important and necessary to fight back against the so-called 'internet culture' in these things, but it is not worth risking one's safety. It is incumbent upon those who benefit from privilege to make things safe for others, and to recognize that open science, open humanities, represents a net boon to our field. It is up to them to normalize such practices, to make it safe to try things out. We discuss this more in the following section on what [Failing Productively] means, why it matters, and why it is integral not only to digital archaeology, but to the culture of academic research, teaching, and outreach more generally.

1.4.6 Exercises

In this series of exercises, we are going to take you through the process of setting up an open research notebook, where you control all of the code and all of the data. A good rule of thumb in terms of keeping a notebook is 'one notecard per thought', here adapted as 'one file per thought, one folder per project'.

Let us set up a template open notebook based on Lincoln Mullen's Simple Rmd Notebook. If you go to https://lmullen.github.io/rmd-notebook/ you'll see a 'live' version of this template on the web. It is being served to us from a special branch in Mullen's Github account, called gh-pages. When you have a gh-pages branch in very nearly any repo associated with your own Github account, Github.com will serve the contents of that branch not just as a repository but as an actual website. This allows us to update or experiment with changes on other branches, and show a 'polished' version to the world via the gh-pages branch. That branch will have a special address, in the form your-account-name.github.io/your-repo. Whenever you see github.io in a URL, you know that the source for that website will be found at github.com/account-name/repo. Do you see the difference? (For more on gh-pages, see the Github documentation.)

  1. Begin sketching out on paper an idea for a digital archaeological project, perhaps the one you imagined at the end of our section on Project Management Basics. Imagine its file structure. In the top-level folder will go all of your notecards. Sub-folders are going to hold diagrams that you create in the course of your research; source data that you leave untouched; data that you've cleaned or manipulated; any helper code that you might create; and so on. You will use this structure to help organize your open notebook once we've installed it.
  2. Make a fork of our copy of Mullen’s notebook (we’ve added hypothes.is to it). You can find our copy at https://github.com/o-date/rmd-notebook.
  3. Clone your copy to the ODATE environment at the command line. (Review Github & Version Control if necessary first).
  4. Type ls to make sure the rmd-notebook directory is present; then cd rmd-notebook.
  5. Check which branch you are on, and make sure it is the gh-pages branch. (Hint: check the status)
  6. Now you're ready to start adding notes! Remember, .Rmd files are just markdown files into which you can insert working R code. We're not ready to do that yet (we will encounter it in due course); for now, you can think of these files as simple notecards where one card = one idea. Note the existing .Rmd files in this folder. Their filenames all begin with a number. Yours should be numbered as well. You can create a new card by typing nano filename.Rmd where filename is whatever you want it to be. Your notecard can include images by using the markdown syntax, ![image title](path/to/image/filename.jpg); those images can be on the web or in an image folder. (A good place to practice markdown is at Dillinger.io.)

Your note must contain some descriptive metadata at the top. This is good practice no matter what kind of note-taking system you use. In our case here, we use the YAML approach. This is what a minimal example looks like:

---
title: "First page of the notebook"
author: "Lincoln Mullen"
date: "December 3, 2015"
---

Title, author, and date. These will get passed to the <meta> tags in the eventual HTML we are going to generate. This information makes your site easier to find and to archive and to associate with you as a scholar.
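Putting the metadata and the note together, a complete notecard might look something like the following (a sketch only; the title, content, and image path are invented for illustration):

---
title: "Sherd counts from unit 12"
author: "Your Name"
date: "March 1, 2017"
---

# Observations

Tallied the sherds recovered from unit 12 today; the counts are in data/unit12-sherds.csv.

![unit 12 profile](images/unit12-profile.jpg)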

  7. Now we make the public-facing website for your notebook. Mullen has bundled a series of commands into a make file, which acts as a short-cut for us and also ensures that the same sequence of operations is carried out every time. (You can read more about makefiles here.) At the command prompt, type $ make.
[SG: note that pandoc and rmarkdown have to be bundled into the DHBox beforehand. Otherwise, the following commands have to be run:
$ wget https://github.com/jgm/pandoc/releases/download/1.19.2.1/pandoc-1.19.2.1-1-amd64.deb
to get pandoc, and then $ sudo dpkg -i pandoc-1.19.2.1-1-amd64.deb to unpack and install it. Rmarkdown has to be installed from the R Server code pane. The first sample .Rmd file in Lincoln's example has a leaflet webapp in it; I modified the R to install leaflet first, but this will have to be tested. Probably easier at this point to just remove it.]
  8. The make file pushes all of your .Rmd files through a program called pandoc, adding various styling options to make an entire website.

As an aside, Pandoc is an extremely useful piece of software for converting files from one format to another. A series of examples can be found at http://www.pandoc.org/demos.html. At the command line, type $ pandoc and then your source file, then indicate the kind of output you want using the -o flag, e.g.: $ pandoc MANUAL.txt -o example1.html. One of the outputs Pandoc can generate is a Word document – which means your source text is kept in a very simple, future-proof format, and Word can be used just for typography.
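For instance, to turn one of your markdown notecards into a Word document or a standalone webpage, the pattern stays the same (the filenames here are hypothetical):

$ pandoc observation1.md -o observation1.docx
$ pandoc observation1.md -o observation1.html

The file extension you give the -o flag is enough for Pandoc to work out which output format you want.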

Commit your changes and push them to your remote repository. Visit the live version of your repository – i.e., the one at github.io, not github.com – to see your live open research notebook!

  9. Write a new note for your notebook recording your work, the problems you encountered, and the solutions you found. Save, make, commit, and push your note to the web.

Another approach involves writing markdown files, putting them online, and then using a kind of 'helper' file to manage their display as HTML. In this particular case, we are going to use something called mdwiki. Mdwiki involves a single html file which, when put in the same folder as a series of markdown files, acts as a kind of wrapper to turn the markdown files into pages of a wiki-style website. There is a lot of customization possible, but for now we're going to make a basic notebook out of an mdwiki template.

  1. Fork the minimal mdwiki template to your Github account; the mdwiki template is linked here.
  2. At this point, any markdown file you create and save into the mdwiki-seed\ll_CC\ folder will become a webpage, although the .md extension should still be used in the URL. If you study the folder structure, you'll see that there are pre-made folders for pages, for pdfs, for images, and so on (if you clone the repo, you can then add or remove these folders as you like using the file manager). Remember to frame any internal links as relative links. That is to say, if you saved a markdown file at ll_CC/pages/observation1.md but wanted to link to ll_CC/pages/observation2.md, it is enough to just add [Click here](observation2.md). Because the mdwiki-seed you forked was already on a gh-pages branch, your notebook will be visible at YOURUSERNAME.github.io/mdwiki-seed/. But note: the page will reload and you'll see #! or 'hashbang' inserted at the end of the URL. This is expected behaviour.
  3. Let’s customize this a bit. Via Github, click on the ll_CC directory. One of the files that will be listed is config.json. If you click on that file, you’ll see:
{
  "additionalFooterText": "All content and images &copy; by Your Name Goes Here&nbsp;",
  "anchorCharacter": "#",
  "lineBreaks": "gfm",
  "title": "Your wiki name",
  "useSideMenu": true
}

Change the title so that it says something like Your-name Open Research Notebook. You can do this by clicking on the pencil icon at the top right of the file viewer (if you don’t see a pencil icon, you might not be logged into github). Scroll to the bottom and click on the ‘commit changes’ button when you’re done.

  4. Let's add notes to this notebook. You can do this in two ways. In the first, you clone your mdwiki-seed via the command line, and use the text editor to create new pages in the appropriate folder (in this case, ll_CC\pages); then git add ., git commit, and git push to get your changes live online. You can create a kind of table of contents by directing the ls command into a new file, like so:

$ ls > index.md

and then editing that file to turn the filenames into markdown links like so: [display text for link](filename.md).
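After that editing, a minimal index.md might read as follows (the filenames are invented for illustration):

# My Open Notebook

[First observation](observation1.md)

[Second observation](observation2.md)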

Alternatively, a more elegant approach to use in conjunction with mdwiki is to use Prose.io and keep your notebook live on the web. Prose.io is an editor for files hosted on Github. You log into Prose.io with your Github credentials, and select the repository you wish to edit, in this case, mdwiki-seed. Then, click on the 'new file' button. This will give you a markdown text editor, and allow you to commit changes to your notebook! Warning: do not make changes to index.html when using mdwiki. If you want a particular markdown file to appear as the default page in a folder, call it index.md instead. You could then periodically update your cloned copy on your own machine for backup purposes.

Either way, add some notes to the notebook, and make them available online.

An Introduction to Github and Version Control


1.3 Github & Version Control

It's a familiar situation – you've been working on a paper. It's where you want it to be, and you're certain you're done. You save it as 'final.doc'. Then, you ask your friend to take a look at it. She spots several typos and notices that you flubbed an entire paragraph. You open it up, make the changes, and save as 'final-w-changes.doc'. Later that day it occurs to you that you don't like those changes, and you go back to the original 'final.doc', make some changes, and just overwrite the previous version. Soon, you have a folder like:

|-project
    |-'finalfinal.doc'
    |-'final-w-changes.doc'
    |-'final-w-changes2.doc'
    |-'isthisone-changes.doc'
    |-'this.doc'

Things can get messy quite quickly. Imagine that you also have several spreadsheets in there as well, images, snippets of code… we don't want this. What we want is a way of managing the evolution of your files. We do this with a program called Git. Git is not a user-friendly piece of software, and it takes some work to get your head around. Git is also very powerful; fortunately, the basic uses to which most of us put it are more or less straightforward. There are many other programs that make use of Git for version control; these programs weld a graphical user interface on top of the main Git program. It is better, however, to become familiar with the basic uses of Git from the command line first before learning the idiosyncrasies of these helper programs. The exercises in this section will take you through the basics of using Git from the command line.

1.3.1 The core functions of Git

At its heart, Git is a way of taking 'snapshots' of the current state of a folder, and saving those snapshots in sequence. (For an excellent brief presentation on Git, see Alice Bartlett's presentation here; Bartlett is a senior developer for the Financial Times.) In Git's lingo, a folder on your computer is known as a repository. This sequence of snapshots lets you see how your project unfolded over time. Each time you wish to take a snapshot, you make a commit: a Git command to take a snapshot of the entire repository. Thus, the folder we discussed above, with its proliferation of documents, becomes:

|-project
    |-'final.doc'

BUT its commit history could be visualized like this:
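(A schematic stand-in for the figure: each circle below is a commit, with the oldest at the left and the most recent at the right.)

o --- o --- o --- o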

Each one of those circles represents a point in time when you, the writer, made a commit; Git compared the state of the file to the earlier state, and saved a snapshot of the differences. What is particularly useful about making a commit is that Git requires two more pieces of information about the commit: who is making it, and when. The final useful bit about a commit is that you can save a detailed message about why the commit is being made. In our hypothetical situation, your first commit message might look like this:

Fixed conclusion

Julie pointed out that I had missed 
the critical bit in the assignment 
regarding stratigraphy. This was 
added in the concluding section.

This information is stored in the history of the commits. In this way, you can see exactly how the project evolved and why. Each one of these commits has what is called a hash: a unique fingerprint that you can use to 'time travel' (in Bartlett's felicitous phrasing). If you want to see what your project looked like a few months ago, you check out that commit. This has the effect of 'rewinding' the project. Once you've checked out a commit, don't be alarmed when you look at the folder: your repository looks how it once did all those weeks ago! Any files written after that commit seem as if they've disappeared. Don't worry: they still exist!
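A sketch of that time travel at the command line (the hash here is only an example; use one from your own git log):

$ git log
$ git checkout 253506b
$ git checkout master

The first checkout rewinds your repository to that snapshot; checking out master brings you back to the present.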

What would happen if you wanted to experiment or take your project in a new direction from that point forward? Git lets you do this. What you will do is create a new branch of your project from that point. You can think of a branch as like the branch of a tree, or perhaps better, a branch of a river that eventually merges back into the source. (Another way of thinking about branches is as labels that stick with these particular commits.) It is generally considered 'best practice' to leave your master branch alone, in the sense that it represents the best version of your project. When you want to experiment or do something new, you create a branch and work there. If the work on the branch ultimately proves fruitless, you can discard it. But if you decide that you like how it's going, you can merge that branch back into your master. A merge is a commit that folds all of the commits from the branch together with the commits from the master.
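In command form, that experiment-and-merge cycle might look like this (the branch name is hypothetical):

$ git checkout -b experiment
(make changes and commit as usual)
$ git checkout master
$ git merge experiment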

Git is also a powerful tool for backing up your work. You can work quite happily with Git on your own machine, but when you store those files and the history of commits somewhere remote, you open up the possibility of collaboration and a safe place from which your materials can be recalled if – perish the thought – something happened to your computer. In Git-speak, the remote location is, well, the remote. There are many different places on the web that can function as a remote for Git repositories. You can even set one up on your own server, if you want. One of the most popular (and the one that we use for ODATE) is Github. There are many useful repositories shared via Github of interest to archaeologists – OpenContext, for instance, shares a lot of material that way. To get material out of Github and onto your own computer, you clone it. If that hypothetical paper you were writing was part of a group project, your partners could clone it from your Github space and work on it as well!

You and Anna are working together on the project. You have made a new project repository in your Github space, and you have cloned it to your computer. Anna has cloned it to hers. Let's assume that you have a very productive weekend and you make some real headway on the project. You commit your changes, and then push them from your computer to the Github version of your repository. That repository is now one commit ahead of Anna's version. Anna pulls those changes from Github to her own version of the repository, which now looks exactly like your version. What happens if you both make changes to the exact same part of the exact same file? This is called a conflict. Git will make a version of the file that contains text clearly marking off the part of the file where the conflict occurs, with the conflicting information marked out as well. The way to resolve the conflict is to open the file (typically with a text editor), delete the added Git text, and make a decision about which information is correct.
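A conflicted file will contain markers something like the following (the sentences around the markers are invented for illustration):

<<<<<<< HEAD
The trench was excavated in 2011.
=======
The trench was excavated in 2012.
>>>>>>> anna-changes

Delete the three marker lines, keep the correct sentence, and then add and commit the file as usual.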

1.3.2 Key Terms

  • repository: a single folder that holds all of the files and subfolders of your project
  • commit: this means, 'take a snapshot of the current state of my repository'
  • publish: take my folder on my computer, and copy it and its contents to the web as a repository at github.com/myusername/repositoryname
  • sync: update the web repository with the latest commit from my local folder
  • branch: make a copy of my repository with a ‘working name’
  • merge: fold the changes I have made on a branch into another branch
  • fork: to make a copy of someone else’s repo
  • clone: to copy an online repo onto your own computer
  • pull request: to ask the original maker of a repo to ‘pull’ your changes into their master, original, repository
  • push: to move your changes from your computer to the online repo
  • conflict: when two commits describe different changes to the same part of a file

1.3.3 Take-aways

  • Git keeps track of all of the differences in your files, when you take a ‘snapshot’ of the state of your folder (repository) with the commit command
  • Git allows you to roll back changes
  • Git allows you to experiment by making changes that can be deleted or incorporated as desired
  • Git allows you to manage collaboration safely
  • Git allows you to distribute your materials

1.3.4 Further Reading

We alluded above to the presence of ‘helper’ programs that are designed to make it easier to use Git to its full potential. An excellent introduction to Github’s desktop GUI is at this Programming Historian lesson on Github. A follow-up lesson explains the way Github itself can be used to host entire websites! You may explore it here. In the section of this chapter on open notebooks, we will also use Git and Github to create a simple open notebook for your research projects.

You might also wish to dip into the archived live stream (link here) from the first day of the NEH-funded Institute on Digital Archaeology Method and Practice (2015), where Prof. Ethan Watrall discusses project management fundamentals and, towards the last part of the stream, introduces Git.

1.3.5 Exercises


  1. How do you turn a folder into a repository? With the git init command. At the command line (remember, the $ just shows you the prompt; you don’t have to type it!):
    a. Make a new directory: $ mkdir first-repo
    b. Type $ ls (list) to see that the directory exists. Then change directory into it: $ cd first-repo. (Remember: if you're ever not sure what directory you're in, type $ pwd, or print working directory.)
    c. Make a new file called readme.md. You do this by calling the text editor: nano readme.md. Type an explanation of what this exercise is about. The .md signals that you're writing a text file that uses the markdown format for signalling things like headings, lists, tables, etc. (A guide to markdown syntax is here.) Hit ctrl+x to exit, then y to save; leave the file name as it is.
    d. Type $ ls again to check that the file is there.
    e. Type $ git init to tell the Git program that this folder is to be tracked as a repository. If all goes correctly, you should see a variation on this message: Initialized empty Git repository in /home/demonstration/first-repo/.git/. But type $ ls again. What do you (not) see?

The changes in your repo will now be stored in that hidden directory, .git. Most of the time, you will never have reason to seek that folder out. But know that the config file that describes your repo lives in that folder. There might come a time in the future when you want to alter some of the default behaviour of the git program. You do that by editing the config file (which you can read with a text editor). Google 'show hidden files and folders' for your operating system when that time comes.
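You can peek at that hidden directory from the command line like so (the output will vary from machine to machine):

$ ls -a
$ cat .git/config

The -a flag tells ls to list hidden files and folders as well.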

  2. Open your readme.md file again with the nano text editor, from the command line. Add some more information to it, then save and exit the text editor.
    a. Type $ git status.
    b. Git will respond with a couple of pieces of information. It will tell you which branch you are on. It will list any untracked files present or new changes that are unstaged. We now stage those changes to be added to our commit history by typing $ git add -A. (The -A flag adds any new, modified, or deleted files to your commit when you make it. There are other flags that add only the new and modified files, or only the modified and deleted files.)
    c. Let's check our git status again: type $ git status.
    d. You should see something like this:
On branch master
Initial commit
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   readme.md
    e. Let's take a snapshot: type $ git commit -m "My first commit". What happened? Remember, Git keeps track not only of the changes, but of who is making them. If this is your first time working with Git in the Archaebox, Git will ask you for your name and email. Helpfully, the Git error message tells you exactly what to do: type $ git config --global user.email "you@example.com" and then $ git config --global user.name "Your Name". Now try making your first commit.
    f. The command above represents a bit of a shortcut for making commit messages: the -m flag associates the text in the quotation marks with the commit. Open up your readme.md file again, and add some more text to it. Save and exit the text editor. Add the new changes to the snapshot that we will take. Then type $ git commit. Git automatically opens up the text editor so you can type a longer, more substantive commit message. In this message (unlike in markdown) the # indicates a line to be ignored. You'll see that there is already some default text in there telling you what to do. Type a message indicating the nature of the changes you have made. Then save and exit the text editor. DO NOT change the filename!

Congratulations, you are now able to track your changes, and keep your materials under version control!

  3. Go ahead and make some more changes to your repository. Add some new files. Commit your changes after each new file is created. Now we're going to view the history of your commits. Type $ git log. What do you notice about this list of changes? Look at the time stamps. You'll see that the entries are listed in reverse chronological order. Each entry has its own 'hash' or unique ID; the person who made the commit and the time are listed, as well as the commit message, e.g.:
commit 253506bc23070753c123accbe7c495af0e8b5a43
Author: Shawn Graham <shawn.graham@carleton.ca>
Date:   Tue Feb 14 18:42:31 2017 +0000

Fixed the headings that were broken in the about section of readme.md
    a. We're going to go back in time and create a new branch. You can escape the git log by typing q. Here's how the command will look: $ git checkout -b branchname <commit> where branchname is the name you want the branch to be called, and <commit> is that unique ID. Make a new branch from your second-last commit (don't type the < or >).
    b. We typed git checkout -b experiment 253506bc23070753c123accbe7c495af0e8b5a43. The response: Switched to a new branch 'experiment'. Check git status and then list the contents of your repository. What do you see? You should notice that some of the files you had created before seem to have disappeared – congratulations, you've time travelled! Those files are not missing; they are on a different branch (the master branch) and you can't harm them now. Add a number of new files, making commits after each one. Check your git status, and check your git log as you go to make sure you're getting everything. Make sure there are no unstaged changes – everything's been committed.
  4. Now let's assume that your experiment branch was successful – everything you did there you were happy with, and you want to integrate all of those changes back into your master branch. We're going to merge things. To merge, we have to go back to the master branch: $ git checkout master. (Good practice is to keep separate branches for all major experiments or directions you take. In case you lose track of the names of the branches you've created, the command git branch -va will list them for you.)

Now, we merge with $ git merge experiment. Remember, a merge is a special kind of commit that rolls all previous commits from both branches into one – Git will open your text editor and prompt you to add a message (it will have a default message already there if you want it). Save and exit, and ta da! Your changes have been merged together.
  5. One of the most powerful aspects of using Git is the possibility of using it to manage collaborations. To do this, we have to make a copy of your repository available to others as a remote. There are a variety of places on the web where this can be done; one of the most popular at the moment is Github. Github allows a user to have an unlimited number of public repositories. Public repositories can be viewed and copied by anyone. Private repositories require a paid account, and access is controlled. If you are working on sensitive materials that can only be shared amongst the collaborators on a project, you should invest in an upgraded account. (Note that you can also control which files get included in a commit; see this help file. In essence, you simply list the file names you do not want committed; here's an example.) Let's assume that your materials are not sensitive.
    a. Go to Github and register for an account.
    b. On the upper right part of the screen there is a large + sign. Click on that, and select new public repository.
    c. On the following screen, give your repo a name.
    d. DO NOT 'initialize this repo with a readme.md'. Leave add .gitignore and add license set to NONE.
    e. Click the green 'Create Repository' button.
    f. You now have a space into which you will publish the repository on your machine. At the command line, we now need to tell Git the location of this space. We do that with the following command, where you will change your-username and your-new-repo appropriately:
    $ git remote add origin https://github.com/YOUR-USERNAME/YOUR-NEW-REPO.git
    g. Now we push your local copy of the repository onto the web, to the Github version of your repo:
    $ git push -u origin master

NB If you wanted to push a branch to your repository on the web instead, do you see how you would do that? If your branch was called experiment, the command would look like this:

$ git push origin experiment
    h. The changes can sometimes take a few minutes to show up on the website. Now, the next time you make changes to this repository, you can push them to your Github account – which is the 'origin' in the command above. Add a new text file. Commit the changes. Push the changes to your account.
  6. Imagine you are collaborating with one of your classmates. Your classmate is in charge of the project, and is keeping track of the 'official' folder of materials (i.e., the repo). You wish to make some changes to the files in that repository. You can manage that collaboration via Github by making a copy, what Github calls a fork.
    a. Make sure you're logged into your Github account on the Github website. We're going to fork an example repository right now by going to https://github.com/octocat/Spoon-Knife. Click the 'fork' button at top-right. Github now makes a copy of the repository in your own Github account!
    b. To make a copy of that repository on your own machine, you will now clone it with the git clone command. (Remember: a 'fork' copies someone's Github repo into a repo in your OWN Github account; a 'clone' makes a copy on your own MACHINE.) Type:
    $ cd ..
    $ pwd

    We do that to make sure you’re not inside any other repo you’ve made! Make sure you’re not inside the repository we used in exercises 1 to 5, then proceed:

$ git clone https://github.com/YOUR-USERNAME/Spoon-Knife
$ ls

You now have a folder called 'Spoon-Knife' on your machine! Any changes you make inside that folder can be tracked with commits. You can also git push -u origin master when you're inside it, and the changes will show up on your OWN copy (your fork) on Github.com.

    c. Make a fork of, and then clone, one of your classmates' repositories. Create a new branch. Add a new file to the repository on your machine, and then push it to your fork on Github. Remember, your new file will appear on the new branch you created, NOT the master branch.

  7. Now you let your collaborator know that you've made a change that you want her to merge into the original repository. You do this by issuing a pull request. But first, we have to tell Git to keep an eye on that original repository, which we will call upstream. You do this by adding that repository's location like so:
    a. Type (but change the address appropriately):
    $ git remote add upstream THE-FULL-URL-TO-THEIR-REPO-ENDING-WITH-.git
    b. You can keep your version of the remote up-to-date by fetching any new changes your classmate has made:
    $ git fetch upstream
    c. Now let's make a pull request (you might want to bookmark this help document). Go to your copy of your classmate's repository at your Github account. Make sure you've selected the branch you pushed your changes to, by selecting it from the Branches drop-down menu.
    d. Click the 'new pull request' button.
    e. The new page that appears can be confusing, but it is trying to double check with you which changes you want to make, and where. 'Base branch' is the branch where you want your changes to go, i.e., your classmate's repository. 'Head branch' is the branch where you made your changes. Make sure these are set properly. Remember: the first one is the TO, the second one is the FROM: the place where you want your changes to go TO, FROM the place where you made the changes.
    f. A pull request has to have a message attached to it, so that your classmate knows what kind of change you're proposing. Fill in the message fields appropriately, then hit the 'create pull request' button.
  8. Finally, the last bit of work to be done is to accept the pull request and merge the changes into the original repository.
    a. Go to your repository on your Github account. Check to see if there are any pull requests – these will be listed under the 'pull requests' tab. Click on that tab.
    b. You can merge from the command line, but for now, you can simply click on the green 'merge pull request' button, and then the 'confirm merge' button. The changes your classmate has made have now been folded into your repository.
    c. To get the updates on your local machine, go back to the command line and type
    $ git pull origin master

1.3.6 Warnings

It is possible to make changes to files directly via the edit button on Github. Be careful if you do this, because things can rapidly get out of sync, resulting in conflicts between differing versions of the same file. Get in the habit of making your changes on your own machine, and of making sure things are committed and up-to-date (git status, git pull origin master, and git fetch upstream are your friends) before beginning work. At this point, you might want to investigate some of the graphical interfaces for Git (such as Github Desktop). Knowing as you do how things work from the command line, the idiosyncrasies of the graphical interfaces will make more sense. For further practice on the ins-and-outs of Git and Github Desktop, we recommend trying the Git-it app by Jessica Lord.
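A start-of-session routine along those lines might look like the following (assuming your remotes are named origin and upstream, as in the exercises above):

$ git status
$ git fetch upstream
$ git pull origin master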

For help in resolving merge conflicts, see the Github help documentation. For a quick reminder of how the workflow should go, see this cheat-sheet by Chase Pettit.

Failing Productively in Digital Archaeology

In this excerpt, we explore the idea of the productive fail and its value in both teaching and research in digital archaeology. This section had its genesis in my presentation to the Digital Archaeology Institute at MSU last summer. (You can listen to me here or watch me here, but note that the video has a few minutes of dead air at the beginning. Yes, I could edit it.) Feel free to comment or annotate via this hypothes.is link.

1.5 Failing Productively

We have found that students are very nervous about doing digital work because, 'what if it breaks?' and 'what if I can't get it to work?' This is perhaps a result of high-stakes testing and the ways we as educators have inculcated all-or-nothing grading in our courses. There is no room for experimentation, no room for trying things out, when the final essay is worth 50% of the course grade, for instance. Playing it safe is a valid response in such an environment. A better approach, from a pedagogical point of view, is to encourage students to explore and try things out, with the grading focused on documenting the process rather than on the final outcome. We will point the reader to Daniel Paul O'Donnell's concept of the unessay; more details behind the link.

Our emphasis on open notebooks has an ulterior motive, and that is to surface the many ways in which digital work sometimes fails. We want to introduce to you the idea of 'failing productively' because there is such a thing as an unproductive failure. There are ways of failing that do not do us much good, and we need – especially with digital work – to consider what 'fail' actually can mean. In the technology world, there are various slogans surrounding the idea of 'fail': 'fail fast'; 'move fast and break things'; 'fail better'.

When we talk about 'failing productively' or failing better, it is easy for critics of digital archaeology (or the digital humanities; see Allington et al 2016, but contra, Greenspan 2016) to connect digital work to the worst excesses of the tech sector. But again, this is to misunderstand what 'fail' should mean. Many in the tech startup world have understood the valorization of failure as a license to burn through funding, and that has caused a lot of harm. The tech sector failed to understand the humanistic implication of the phrase, and instead took it literally, as if a lack of success were itself the goal. But where does the phrase come from?

The earliest use of the 'fail better' idea that we have found outside the world of literature and criticism seems to occur during the first tech boom, where it turns up in everything from diet books to guides to playing jazz; in the world of technology, it appears in The Quest for a Unified Theory of Information CITATION (Wolfgang Hofkirchner). That book, according to Google Scholar at the time of writing, has been cited 13 times, but the works that cite it have themselves been cited over 600 times, which, while not conclusive, is suggestive. We are here merely speculating on where this mantra of the fast fail, of failing better, comes from and how it spread, but it would be a very interesting topic to explore.

Perhaps a better understanding of what 'fail' should mean is something akin to what Nassim Taleb called 'antifragility'. The fragile thing breaks under stress and randomness; the resilient thing stays the same; and the antifragile thing actually gets stronger as it is exposed to randomness. Kids' bones, for instance, need to be exposed to shocks in order to get stronger. Academia's systems are fragile in that they do not tolerate failure; they are to a degree resilient, but they are not antifragile in Taleb's sense. The idea that failure can break that which is fragile is part of the issue here. Silicon Valley really means 'fail' in the sense of 'antifragile', but it frequently forgets that; academia sees 'fail' as the breaking of something fragile; and so the two are at loggerheads. Indeed, the rhetorical moves of academe often frame weak results – fails – as actual successes, thus making the scholarship fragile (hence the fierceness of academic disputes when results are challenged, sometimes). To make scholarship antifragile – to extract the full value of a fail and make it productive – we need remember only one thing:

A failure shared is not a failure.

Not every experiment results in success; indeed, the failures are richer experiences, because as academics we are loath to say when something did not work – but how else will anybody know that a particular method or approach is flawed? If we try something, it does not work, and we then critically analyze why that should be, we have in fact entered a circle of positive feedback. This perspective is informed by our research into game-based learning. A good game keeps the challenges just ahead of the player's (student's) ability, to create a state of 'flow'. Critical failure is part of this: too hard, and the player quits; too easy, and the player drops the controller in disgust. The 'fails' that happen in a state of flow enable the player to learn how to overcome them. Perhaps if we can design assessment to tap into this state of flow, then we can create the conditions for continual learning and growth (see for instance Kee, Graham, et al. 2009). As in our teaching, so too in our research. Presner writes,

Digital projects in the Humanities, Social Sciences, and Arts share with experimental practices in the Sciences a willingness to be open about iteration and negative results. As such, experimentation and trial-and-error are inherent parts of digital research and must be recognized to carry risk. The processes of experimentation can be documented and prove to be essential in the long-term development process of an idea or project. White papers, sets of best practices, new design environments, and publications can result from such projects and these should be considered in the review process. Experimentation and risk-taking in scholarship represent the best of what the university, in all its many disciplines, has to offer society. To treat scholarship that takes on risk and the challenge of experimentation as an activity of secondary (or no) value for promotion and advancement, can only serve to reduce innovation, reward mediocrity, and retard the development of research. (PRESNER 2012, cite)

1.5.1 A taxonomy of fails

There are fails, and then there are fails. Croxall and Warnick identify a taxonomy of four kinds of failure in digital work:

  1. Technological Failure
  2. Human Failure
  3. Failure as Artifact
  4. Failure as Epistemology

…to which we might add a fifth kind of fail:

  5. Failing to Share

The first is the simplest: something simply did not work. The code is buggy; dust and grit got into the fan and the hardware seized. The second, while labeled 'human failure', really means that the context, the framework, for encountering the technology was not erected properly, leading to a failure to appreciate what the technology could do or how it was intended to be used. This kind of failure can also emerge when we ourselves are not open to the possibilities or the work that the technology entails. The next two kinds of failure are ways of dealing with the first two: the first two refer to what happened; the second two refer to how we respond (if we respond at all). 'Failure as Artifact' means that we seek out examples of failures as things to study, working out the implications of why something did not work. 'Failure as Epistemology' purposely builds the opportunity to fail into the research design, such that each succeeding fail (of type 1 or type 2) moves us closer to the solution that we need. The key to productive failure, as we envision it, is to recognize when one's work is suffering from a type 1 or type 2 fail, and to transform it into a type 3 or type 4. To these we have added a fifth category: the failure to share. For digital archaeology to move forward, we need to know where the fails are and how to move beyond them, so that we move forward as a whole. Report not just the things that work, but also the fails. That is why we keep open research notebooks.

Let’s consider some digital projects that we have been involved in, and categorize the kinds of fails they suffered from. We turn first to the HeritageCrowd project that Graham established in 2011. This project straddled community history and cultural history in a region poorly served by the internet. It was meant to crowdsource intangible heritage via a combination of web platform and telephony (people could phone in with stories, which were automatically transcribed and added to a web map). The first write-up of the project was published just as the project started to get underway (CITATION GRAHAM ETAL). It’s what happened next that is of interest here.

The project website was hacked, knocked offline and utterly compromised. The project failed.

Why did it fail? It was a combination of at least four distinct problems (GRAHAM CITATION):

  1. Poor record keeping of the installation process of the various technologies that made the project work.
  2. A vulnerability in those technologies: computers talk to other computers to persuade them to do things, and in this case another computer injected malicious code into the platform Graham was using to map the stories.
  3. Ignored security warnings from the maintainers of the main platform.
  4. No backups, and no versioning.

Graham’s fails here are of both type 1 and type 2. In terms of type 2, his failure to keep careful notes on how the various pieces of the project fit together meant that he lacked the framework to understand how he had made the project vulnerable to attack. The actual fail point of the attack – that is a type 1 fail, but it could have been avoided if Graham had participated more in the spirit of open software development, e.g., by reading the security warnings in the developer forum! When Graham realized what had happened to his project, he was faced with two options. One option, having already published a piece on the project that hailed its successes and broad lessons for crowdsourcing cultural heritage, would have been to quietly walk away from the project (perhaps putting up a new website averring that version 2.0 was coming, pending funding). The other option was to warn folks to beef up the security and backups for their own projects. At the time, crowdsourcing was very much an academic fashion, and Graham opted for the second option. In doing this, the HeritageCrowd project became a fail of type 3, an artifact for study and reflection. The act of blogging his post-mortem also means the project avoided the fifth type of fail, the failure to share.

It is worth pointing out here that the public sharing of this failure is not without issues. As we indicated in the [Open Notebook Research & Scholarly Communication] section, the venue for sharing what hasn’t worked and the lessons learned is highly contingent on many factors. Graham, as a white male tenure-track academic on the web in 2012, could share openly that things had not worked. As the web has developed in the intervening years, and with the increasing polarization of web discourse into broad ideological camps, it may well not be safe for someone in a more precarious position to share so openly. One must keep in mind one’s own situation and adapt what we argue for here accordingly. Sharing fails can be done with close colleagues, students, specialist forums, and so on.

If we are successful with ODATE, and the ideas of productive fail begin to permeate more widely in the teaching of digital archaeology, then a pedagogy that values failure will in time normalize such ‘negative results’. We are motivated by the belief that digital archaeology is defined by the productive, pedagogical fail. It is this aspect that also makes digital archaeology a kind of public archaeology; failing in public may be the most powerful thing that digital archaeology offers the wider field.

We implore you to do your research so that others can retrace your steps; even a partial step forward is a step forward! When you find a colleague struggling, give positive and meaningful feedback. Be open about your own struggles, and seek validation of your skills when you need it. Build things into your work that make you feel good about it.

1.5.2 Exercises

What is the nature of your own fails? Reflect on a ‘fail’ that happened this past year. Where might it fit in the taxonomy? Share this fail via your open notebook, blog, or similar, with your classmates. How can your classmates convert their fails into types three or four?

What is Digital Archaeology?


What follows is an excerpt from the opening chapter, ‘Going Digital’ and the first subsection, ‘So What is Digital Archaeology Anyway?’

This is a rough draft; indeed, I call it the ‘barf’ draft, where you are just struggling to get ideas out in the hope that eventually it will come together and the errors, omissions, and non-sequiturs are ironed out. Use with caution; comments welcome. Feel free to annotate with Hypothesis.

I post this today in honour of the Computer Applications in Archaeology conference in Atlanta (updates from which can be followed on Twitter at #caaatlanta).

1 Going Digital

Digital archaeology should exist to assist us in the performance of archaeology as a whole. It should not be a secret knowledge, nor a distinct school of thought, but rather simply seen as archaeology done well, using all of the tools available to us in better recovering, understanding and presenting the past. In the end, there is no such thing as digital archaeology. What exists, or at least what should exist, are intelligent and practical ways of applying the use of computers to archaeology that better enable us to pursue both our theoretical questions and our methodological applications. (Evans and Daly 2006)

While we agree with the first part of the sentiment, the second part is rather up for debate. We believe that there is such a thing as digital archaeology. Digital tools exist in a meshwork of legal and cultural obligations, and more so than any other tool humans have yet come up with, they have the capability to exert their own agency upon the user. Digital tools and their use are not theory-free, nor without theoretical implications. There is no such thing as neutrality when digital tools are employed. This is why digital archaeology is – or should be – a distinct subfield of the wider archaeological project.

In a conversation initiated on Twitter on March 10, 2017, Graham asked the question, ‘is digital archaeology the same as using computers in archaeology?’ REF.

The resulting conversation ranged widely, over everything from the topic of study (ref tinysapiens ref) to the ways in which computational power enables the researcher to ask questions that were not previously feasible to ask (ref CD Wren). Other researchers sounded a note of caution against the kind of ‘technological fetishism’ (ref Lorna) that digital work can often fall prey to, especially given the larger issues of gender and ‘solutionitis’ that emerge from the white, 20-35 year old demographic of many tech workers (for criticisms of technological solutionism or utopianism in archaeology, see the work of Colleen Morgan (ref phd thesis), Joyce, Tringham, Morozov, and Kansa). Others warned that to think of digital archaeology as something distinct from archaeology risks ‘going the way of DH’, and instead appealed for a holistic understanding (ref Gruber).

Hanna Marie Pageau succinctly captured these issues when, over a series of tweets (REF), she wrote,

‘Digital archaeology has an obvious digital component. However, saying it’s simply using a computer is like saying being a computer scientist means you use a computer to do science. There is an implied addition [to the] topic of specific methods that brings you from an archaeologist using a computer to being an archaeologist who studies digital archaeology. I would argue that archaeogaming is the most straight forward example. Because while gaming is usually thought of as digital, it could study table top gaming and not technically be digital in nature. However if you’re studying ethics of representation in games you’re going from just using a computer as a tool to it being THE medium.’

An important aspect of digital archaeology that differentiates it from the mere use of computing power to answer archaeological questions, then, is this question of purpose. In this section, we take up that question, beginning with the teaching of digital approaches. We suggest that digital archaeology is akin to work at the intersection of art, public archaeology, and the digital humanities. We then provide you with the necessary basics for setting up your own digital archaeological practice. Entrance into the world of digital archaeology requires organizational ability and facility with versioning files. It is allied with the practice of open notebook science, and it attempts to future-proof by using the simplest file formats and avoiding proprietary software where possible. These are the basics on which the rest of digital archaeological practice is founded.

1.1 So what is Digital Archaeology?

If you are holding this book in your hands, via a device or on paper, or looking at it on your desktop, you might wonder why we feel it necessary to even ask the question. It is important at the outset to make the argument that digital archaeology is not about ‘mere’ tool use. Andrew Goldstone, in Debates in the Digital Humanities, discusses this tension (Goldstone 2018). He has found (and Lincoln Mullen concurs with regard to his own teaching (Mullen 2017)) that our current optimism about teaching technical facility is misplaced. Tools first, context second doesn’t work; neither, it seems, does theory first. And for anything to work at all, datasets have to be curated and carefully pruned for their pedagogical value. We can’t simply turn students loose on a dataset (or worse, ask them to build their own) and expect ‘learning’ to happen.

Our approach in this volume is to resolve that seeming paradox by providing not just the tools, and not just the data, but also the computer itself. Archaeologically, this puts our volume in dialog with the work of scholars such as Ben Marwick, who makes available with his research the code, the dependencies, and sometimes, an entire virtual machine, to enable other scholars to replicate, reuse, or dispute his conclusions. We want you to reuse our code, to study it, and to improve upon it. We want you to annotate our pages, and point out our errors. For us, digital archaeology is not the mere use of computational tools to answer archaeological questions. Rather, it is to enable the audience for archaeological thinking to enter into conversation with us, and to do archaeology for themselves.

Digital archaeology is necessarily a public archaeology. This is its principal difference from what has come before – for never forget, there has been at least a half-century of innovative use of computational power for archaeological knowledge building.

Ethan Watrall has traced the history of computational archaeology/digital archaeology all the way back to the pioneering work of James Deetz in the 1960s, who used computers at MIT to perform stylistic analyses of Arikara ceramics (Watrall 2017; Deetz 1965). Most early interest in computation for archaeology centred on the potential of computational databases, although ambition often outstripped capability. By the 1970s, serious effort was being put into building the infrastructural knowledge necessary to make and usefully query archaeological datasets. One can see this concern play out by considering a topic model (Graham 2014) of the early volumes of Computer Applications in Archaeology (a topic model is a way of deducing latent patterns of discourse within text, based on the patternings of words; see Graham, Weingart, and Milligan 2012):

Topic 1: computer, program, may, storage, then, excavation, recording, all, into, form, using, retrieval, any, user, output, records, package, entry, one, unit

Topic 6: but, they, one, time, their, all, some, only, will, there, would, what, very, our, other, any, most, them, even

Topic 20: some, will, many, there, field, problems, may, but, archaeologists, excavation, their, they, recording, however, record, new, systems, most, should, need

The beginnings of the CAA are marked by hesitation and prognostication: what are computers for, in archaeology? There is a sense that, for archaeologists, computation will be useful insofar as it helps record information in the field.

By the 1980s, desktop computing was sufficiently widespread that the use of geographic information systems became feasible for more and more archaeologists. The other ‘killer app’ of the time was computer-aided design, which allowed metric 3d reconstructions from the plans drawn on site by excavators. Yet computational resources were still limited enough that computing was not something one could merely ‘play’ with. Software was costly, computation took time, and training resources were put into learning the proprietary packages that existed (rather than into coding knowledge).

By the 1990s, the introduction of the CD-ROM and the shift in PC gaming from primarily text-based to graphics-based games led to teaching simulations for archaeology, most notably T. Douglas Price and Anne Birgitte Gebauer’s Adventures in Fugawiland. Watrall identifies the emergence of the web as being not so much a boon for computational archaeology as for public archaeology (although the pioneering journal Internet Archaeology was first published in 1996); nevertheless, the birth of the web (which, it must be remembered, is distinct from and overlays the internet) allowed a step-change in the effectiveness of the dissemination of open-source software and code, including practices for remote collaboration on code that are now beginning to percolate into scholarly publication.
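The original model was built with MALLET (the tool described in Graham, Weingart, and Milligan 2012). Purely as an illustration of the technique, here is a minimal sketch in Python, using scikit-learn as an assumed stand-in for MALLET and hypothetical file paths for the corpus:

```python
# A minimal topic-modelling sketch. scikit-learn stands in for MALLET,
# which the original analysis used; the corpus of CAA papers is assumed
# to sit in a folder of plain-text files (hypothetical paths).
from glob import glob
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [open(path, encoding="utf-8").read() for path in glob("caa-proceedings/*.txt")]

# Build a document-term matrix of word counts, dropping English stopwords.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# Fit a 20-topic model; the number of topics is the analyst's choice.
lda = LatentDirichletAllocation(n_components=20, random_state=42)
lda.fit(dtm)

# Print the ten most heavily weighted words per topic, as in the lists above.
vocab = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top_words = [vocab[j] for j in weights.argsort()[-10:][::-1]]
    print(f"topic {i}: {', '.join(top_words)}")
```

The word lists such a script prints are the raw material for the kind of interpretation offered above; the labelling of topics remains the analyst’s job.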

The 2000s have seen, insofar as digital archaeology is concerned, a replay of the earlier episodes of computational archaeology, concomitant with each subsequent web ‘revolution’ (i.e., so-called web 2.0, web 3.0, etc.). Works such as Evans and Daly (2006) and Kansa, Kansa, and Watrall (2011) are broadly concerned with questions of infrastructure and training, while the more recent Mobilizing the Past deals with problems of training, and with the ethical issues that emerging digital surveillance in our networked society presents to the practice of archaeology (and public archaeology). Perhaps the most promising new digital technologies of recent years include methods for linking open archaeological data via the web (i.e., freeing various ‘silos’ of disciplinary knowledge so that the semantic connections between them can be followed and queried) and various mixed-reality approaches (virtual reality, augmented reality, 3d printing, and the so-called internet of things, the practice of wiring everything that can be wired to the web). The 2000s have also seen a growing realization that our digital tools and their algorithmic biases not only permit interesting questions to be asked about the past, but also inhibit points of view or impose their own worldviews upon the past in ways that may damage communities and/or scholarship. This reflective critique of computation in the service of archaeology places digital archaeology within the ambit of the digital humanities (despite the division between anthropological and humanistic archaeologies).

1.1.1 Is digital archaeology part of the digital humanities?

In recent years – certainly the last decade – an idea called ‘the digital humanities’ has been percolating around the academy. It is a successor to ‘humanities computing’, but it captures the same distinction discussed above between mere tool use and a wider, purposeful digital practice. Digital archaeology has developed alongside the digital humanities, sometimes intersecting with it (notably, there was a major archaeological session at the annual international Alliance of Digital Humanities Organizations (ADHO) DH conference in 2013).

The various component organizations of the ADHO have been meeting in one form or another since the 1970s; so too the Computer Applications in Archaeology conference has been publishing its proceedings since 1973. Archaeologists have been running simulations, doing spatial analysis, clustering, imaging, geophysicing, 3d modeling, neutron activation analyzing, X-tent modeling, and so on, for what seems like ages. Happily, there is no one definition of ‘DH’ that everyone agrees on (see the various definitions collected at http://definingdh.org/; reload the page to get a new definition). For us, a defining characteristic of DH work is the public use we discussed above. Another characteristic we find useful to consider is the purpose to which computation is put in DH work. This means that digital work also has to be situated in contexts of power, access, and control (which sometimes means that digital work is mis-characterised as part of a ‘neo-liberal’ agenda to reduce knowledge work to base profit motives, e.g. Brouillette; more thoughtful work on the confluence of the digital with neoliberalism may be found in Caraher xxxx, Kansa xxxx, and Greenspan xxx. We discuss the ethical dimensions of digital work more fully in [The Ethics of Big Data in Archaeology].)

For us, a key difference between the computational archaeology of the last years of the twentieth century and the emerging digital archaeology of the last decade lies in the purpose behind the computing power. Trevor Owens, a digital archivist, draws attention to the purpose behind one’s use of computational power – generative discovery versus justification of a hypothesis (Owens 2012). Discovery marks out the digital humanist, whilst justification signals the humanist who uses computers. Discovery and justification are critically different concepts. For Owens, if we are using computational power to deform our texts, then we are trying to see things in a new light, to create new juxtapositions, to spark new insight. Stephen Ramsay talks about this too in Reading Machines (Ramsay 2011, 33), discussing the work of Samuels and McGann (Samuels and McGann 1999): “Reading a poem backward is like viewing the face of a watch sideways – a way of unleashing the potentialities that altered perspectives may reveal”. This kind of reading of data (especially, but not necessarily, through digital manipulation) does not happen very much at all in archaeology. If ‘deformance’ is a key sign of the digital humanities, then digital archaeologists are not digital humanists. Owens’s point isn’t to signal who’s in or who’s out, but rather to draw attention to the fact that:

When we separate out the context of discovery and exploration from the context of justification we end up clarifying the terms of our conversation. There is a huge difference between “here is an interesting way of thinking about this” and “This evidence supports this claim.”

This is important in the wider conversation concerning how we evaluate digital scholarship. We’ve used computers in archaeology for decades to try to justify or otherwise connect our leaps of logic and faith, spanning the gap between our data and the stories we’d like to tell. We believe, on balance, that ‘digital archaeology’ sits closer to the discovery end of the spectrum between justification and discovery; that it sits within the digital humanities; and that it should worry less about hypothesis testing and concentrate more on discovery and the generation of ‘interesting way[s] of thinking about this’.

Digital archaeology should be a prompt to make us ‘think different’. Let’s take a small example of how that might play out. It is worth suggesting that ‘play’ is itself a valid methodology for doing digital work (see Ramsay (2011)). (And of course, the ability to play with computing power is a function of Moore’s law governing the increase in computing power over time: computing is no longer a precious resource but something that can be ‘wasted’.)

1.1.2 Archaeological Glitch Art

Bill Caraher is a leading thinker on the implications and practice of digital archaeology. In a post on archaeological glitch art (Caraher 2012), Caraher changed file extensions to fiddle about in the insides of images of archaeological maps. He then looked at them again as images:

The idea … is to combine computer code and human codes to transform our computer mediated image of archaeological reality in unpredictable ways. The process is remarkably similar to analyzing the site via the GIS where we take the “natural” landscape and transform it into a series of symbols, lines, and text. By manipulating the code that produces these images in both random and patterned ways, we manipulate the meaning of the image and the way in which these images communicate information to the viewer. We problematize the process and manifestation of mediating between the experienced landscape and its representation as archaeological data.
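Caraher worked by hand, changing file extensions and editing raw bytes in a text editor. A scripted version of the same move might look like the following sketch in Python, where the input file name is invented and the bytes to corrupt are chosen at random:

```python
# Glitch an image by corrupting random bytes past the header.
# A sketch only; 'sitemap.jpg' is a hypothetical input file.
import random

with open("sitemap.jpg", "rb") as f:
    data = bytearray(f.read())

random.seed(2012)  # fix the seed so the glitches are reproducible
for _ in range(20):
    # Skip the first 500 bytes so the file header stays intact
    # and the result still opens as an image.
    i = random.randrange(500, len(data))
    data[i] = random.randrange(256)

with open("sitemap-glitched.jpg", "wb") as f:
    f.write(bytes(data))
```

Skipping the header bytes is what keeps the file openable at all; everything after that is fair game for the glitch, which is precisely the unpredictability Caraher is after.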

Similarly, Graham’s work in representing archaeological data in sound (a literal auditory metaphor) translates movement over space (or through time) into a soundscape of tones (Graham 2017). This frees us from the tyranny of the screen and visual modes of knowing that often occlude more than they reveal (for instance, our Western-framed understanding of the top of the page or screen as ‘north’ means we privilege visual patterns in the vertical dimension over the horizontal (Montello et al. 2003)).
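Graham’s soundscapes were built with dedicated tools, but the underlying move – mapping data values onto pitches – can be sketched with nothing more than the Python standard library (the artifact counts here are invented for illustration):

```python
# Map a data series to tones and write a WAV file. A sketch only;
# the counts per context are invented for illustration.
import math
import struct
import wave

counts = [3, 8, 15, 6, 2, 11]   # hypothetical artifact counts per context
rate = 44100                    # audio samples per second
lo, hi = 220.0, 880.0           # map the data onto two octaves (A3 to A5)

frames = bytearray()
for c in counts:
    # Scale each count linearly onto the chosen frequency range.
    freq = lo + (hi - lo) * (c - min(counts)) / (max(counts) - min(counts))
    for n in range(rate // 2):  # half a second of tone per value
        sample = int(32767 * 0.5 * math.sin(2 * math.pi * freq * n / rate))
        frames += struct.pack("<h", sample)  # 16-bit little-endian sample

with wave.open("soundscape.wav", "wb") as w:
    w.setnchannels(1)   # mono
    w.setsampwidth(2)   # 16-bit
    w.setframerate(rate)
    w.writeframes(bytes(frames))
```

Playing the resulting file renders the series as rising and falling pitch, so a pattern you might miss in a table becomes audible.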

These playful approaches force us to rethink some of our norms of communication, our norms of what archaeology can concern itself with. It should be apparent that digital archaeology transcends mere ‘digital skills’ or ‘tool use’; but it also suffers from being ‘cool’.

1.1.3 The ‘cool’ factor

Alan Liu (Liu 2004) wondered what the role of the arts and humanities might be in an age of knowledge work, of deliverables, of an historical event horizon that goes back only to the last financial quarter. He examined the idea of ‘knowledge work’ and teased out how much of its driving force lies in the pursuit of the ‘cool’. Through a deft plumbing of the history of the early internet (in particular, riffing on Netscape’s ‘what’s cool?’ page from 1996 and their inability to define it except to say that they’d know it when they saw it), Liu argues that cool is ‘the aporia of information… cool is information designed to resist information… information fed back into its own signal to create a standing interference pattern, a paradox pattern’ (Liu 2004, 179). The latest web design, the latest app, the latest R package for statistics, the latest acronym on Twitter where all the digital humanists play: all of it cool, and all of it dividing the world.

That is, Liu argued that ‘cool’ was, amongst other things, a politics of knowledge work, a practice and an ethos. He wondered how we might ‘challenge knowledge work to open a space, as yet culturally sterile (coopted, jejune, anarchistic, terroristic), for a more humane hack of contemporary knowledge?’ (Liu 2004, 9). Liu goes on to discuss how the tensions of ‘cool’ in knowledge work (for us, read: digital archaeology) also intersect with an ethos of the unknown – of knowledge workers who, though they work nowhere else, somehow manage to stand outside that system of knowledge production. (Is alt-ac ‘alt’ partially because it is the cool work?) This matters for us as archaeologists. There are many ‘cool’ things happening in digital archaeology that somehow do not penetrate into the mainstream (such as it is). The utilitarian dots-on-a-map were once cool, but are now pedestrian. The ‘cool’ things that could be linger on the fringes. If they did not, they wouldn’t be cool, one supposes. They resist.

To get at the more humane hack he seeks, Liu suggests that the historical depth the humanities provide counters the shallowness of cool:

The humanities thus have an explanation for the new arts of the information age, whose inheritance of a frantic sequence of artistic modernisms, postmodernisms, and post-postmodernisms is otherwise only a displaced encounter with the raw process of historicity. Inversely, the arts offer the humanities serious ways of engaging – both practically and theoretically – with “cool”. Together, the humanities and arts might be able to offer a persuasive argument for the humane arts in the age of knowledge work. (Liu 2004, 381)

In which case, the emergence of digital archaeologists and historians in the last decade might be the loci of the humane hacks – if we move into that space where we engage the arts. Indeed, the seminal anthropologist Tim Ingold makes this very argument with reference to his own arc as a scholar, ‘From Science to Art and Back Again’:

Revisiting science and art: which is more ecological now? Why is art leading the way in promoting radical ecological awareness? The goals of today’s science are modelling, prediction and control. Is that why we turn to art to rediscover the humility that science has lost?

We need to be making art. Digital archaeology naturally pushes in that direction.

1.1.4 Takeaways

  • Digital archaeology is a public archaeology
  • Digital archaeology is often about deformance rather than justification
  • In that deformative practice, digital archaeology is closely aligned with artistic ways of knowing
  • Digital archaeology is part of the digital humanities, and in many ways, presaged current debates and trends in that field.

All of these aspects of digital archaeology exist along a continuum. In the remainder of this chapter, we give you a ‘boot-camp’ to get you to the point where you can begin to wonder about deformation and the public entanglement with your work.

1.1.5 Exercises

The first steps in going digital are quite easy. They are fundamentally a question of maintaining some basic good habits. Everything else flows from these three habits:

1. Separate _what_ you write/create from _how_ you write it.
2. Keep what you write/create under version control.
3. Break tasks down into their smallest manageable bits.

Have you ever fought with Word or another word processor, trying to get things just right? Word processing is a mess. It conflates writing with typesetting and layout. Sometimes, you just want to get the words out. Other times, you want to make your writing as accessible as possible… but your intended recipient can’t open your file, because they don’t use the same word processor. Or perhaps you wrote up some great notes that you’d love to have in a slideshow; but you can’t, because copying and pasting preserves a whole lot of extra gunk that messes up your materials. Similarly, while many archaeologists use Microsoft Excel to manipulate tabular data (artifact measurements, geochemistry data, and so on), Excel is well known both for corrupting data and for making analyses impossible to replicate (i.e., the series of clicks to manipulate or perform an analysis differs depending on the individual’s particular installation of Excel).

The answer is to separate your content from your tool, and to keep your analytical processes separate from your data. This helps keep your thinking clear, but it also has a more nuts-and-bolts practical dimension: a computer will always be able to read a text file. That is to say, you’ve future-proofed your material. Most researchers have old physical discs or disc drives or obsolete computers lying around, and it is not uncommon to hear a colleague remark, ‘I wrote this in WordPerfect and I can’t open it any more.’ Graham’s MA thesis is trapped on a 3.5″ disk that was compressed using a now-obsolete algorithm, and it cannot be recovered. If, on the other hand, he had written the text as a .txt file, and saved the data as .csv tables, those materials would still be accessible. And if the way you have manipulated or cleaned the data is written out as a script, then a subsequent investigator (or even your future self) can re-run the exact sequence of analysis, or re-write the script into the equivalent steps in another analytical language.
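For example, a cleaning step written as a short script – a sketch only, with invented file names and an invented column – can be re-run exactly, unlike a sequence of clicks in a spreadsheet:

```python
# clean_measurements.py -- a sketch of a reproducible cleaning step.
# Reads a hypothetical artifact-measurements table, keeps only rows
# that have a 'length_mm' value, and writes the result to a new file,
# leaving the raw data untouched.
import csv

with open("measurements_raw.csv", newline="") as infile, \
     open("measurements_clean.csv", "w", newline="") as outfile:
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["length_mm"].strip():  # keep only rows with a measurement
            writer.writerow(row)
```

Because the raw file is never overwritten, anyone can inspect both the input and the output and see exactly what the cleaning step did.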

A .txt file is simply a text file; a .csv is a text file that uses commas to separate the text into columns. Similarly, a .md file is a text file that uses things like # to indicate headers, and _ to show where italicized text starts and stops. A script, in a play, tells you what to say and do. A script for a language like R or Python does the same thing for the computer, and has the advantage that it is human-readable and annotatable as well, because its format is still a simple text file. Scripts you might encounter could have the .r or .py or .sh file extensions. You can open these in a text editor and see what the computer is being instructed to do. Annotations or comments in the script can be set off in various ways, and help the researcher know what is happening or is intended to happen at various points. Let’s begin by creating some simple text files to document our research process, in the Markdown format.
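For instance, a minimal Markdown file (with invented content) might look like this; a converter such as Pandoc can later turn the same file into html, pdf, or a Word document:

```markdown
# Field notes, day 3

We opened a _new_ trench this morning.

## Finds
- ceramic sherd, **rim**, context 004
- corroded iron nail, context 004
```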

  1. A nice place to practice writing in markdown that shows you immediately how your text might be rendered when turned into html, pdf, or Word doc is Dillinger.io. Go there now to try it out. Write a short piece on why you’re interested in Digital Archaeology.
     a. Include a blockquote from the introduction to this book.
     b. Include two links to an external site.
     c. Embed an image.
  2. In ODATE at the command line, you’ll now write a markdown file using the built-in text editor nano. Everywhere in this book where you see the $ symbol, we mean for you to type whatever follows after the $ at the command line. The $ is the prompt. Enter the command $ nano my-first-md-file.md. This tells the machine to use the nano text editor to make a new file called my-first-md-file.md and to allow you to edit it right away. (If you just wanted to create an empty file, you could use $ touch my-first-md-file.md). The screen that opens is a simple text editor. Re-write what you wrote for exercise 1, but add subheadings this time. To save your work, hit ctrl+x. Nano will ask you if you want to save changes; select y. It will then prompt you for a file name, but my-first-md-file.md will already be inserted there, so just hit enter.
  3. Make a new markdown file called ‘todo list’ or similar. Use bullet points to break down what else you need to do this week. Each bullet point should have a sub-bullet with an actual ACTION listed, something that you can accomplish to get things done (a sketch of such a file follows below).
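A sketch of the kind of file the last exercise asks for (the tasks are invented):

```markdown
# To-do this week

- Chapter 1 exercises
  - ACTION: write my-first-md-file.md in nano and save it
- Reading
  - ACTION: annotate section 1.4 with three Hypothesis comments
```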

As you work through this book, we encourage you to write your thoughts, observations, or results in simple text files. This is good practice whether or not you embark on a full-blown digital project, because ultimately, if you use a computer in your research, you have gone digital.

References

Goldstone, Andrew. 2018. “Teaching Quantitative Methods: What Makes It Hard (in Literary Studies).” In Debates in the Digital Humanities.

Mullen, Lincoln. 2017. “A Confirmation of Andrew Goldstone on ‘Teaching Quantitative Methods’.” The Backward Glance. http://lincolnmullen.com/blog/a-confirmation-of-andrew-goldstone-on-teaching-quantitative-methods/.

Watrall, Ethan. 2017. “Archaeology, the Digital Humanities, and the ‘Big Tent’.” In Debates in the Digital Humanities 2016. Accessed February 23. http://dhdebates.gc.cuny.edu/debates/text/79.

Deetz, James. 1965. The Dynamics of Stylistic Change in Arikara Ceramics. Urbana: University of Illinois Press.

Graham, Shawn. 2014. “A Digital Archaeology of Digital Archaeology: Work in Progress.” https://electricarchaeology.ca/2014/11/06/a-digital-archaeology-of-digital-archaeology-work-in-progress/.

Graham, Shawn, Scott Weingart, and Ian Milligan. 2012. “Getting Started with Topic Modeling and MALLET.” Programming Historian, September. http://programminghistorian.org/lessons/topic-modeling-and-mallet.

Evans, Thomas L., and Patrick T. Daly, eds. 2006. Digital Archaeology: Bridging Method and Theory. London; New York: Routledge. http://www.myilibrary.com?id=29182.

Kansa, Eric C., Sarah Whitcher Kansa, and Ethan Watrall. 2011. Archaeology 2.0: New Approaches to Communication and Collaboration. Cotsen Digital Archaeology. http://escholarship.org/uc/item/1r6137tb.

Owens, Trevor. 2012. “Discovery and Justification Are Different: Notes on Science-Ing the Humanities.” Trevor Owens (blog). http://www.trevorowens.org/2012/11/discovery-and-justification-are-different-notes-on-sciencing-the-humanities/.

Ramsay, Stephen. 2011. Reading Machines: Toward an Algorithmic Criticism. Urbana: University of Illinois Press.

Samuels, Lisa, and Jerome J. McGann. 1999. “Deformance and Interpretation.” New Literary History 30 (1): 25–56. doi:10.1353/nlh.1999.0010.

Caraher, William. 2012. “Archaeological Glitch Art.” The Archaeology of the Mediterranean World. https://mediterraneanworld.wordpress.com/2012/11/21/archaeological-glitch-art/.

Graham, Shawn. 2017. “Cacophony: Bad Algorithmic Music to Muse To.” https://electricarchaeology.ca/2017/02/03/cacophony-bad-algorithmic-music-to-muse-to/.

Montello, Daniel R., Sara Irina Fabrikant, Marco Ruocco, and Richard S. Middleton. 2003. “Testing the First Law of Cognitive Geography on Point-Display Spatializations.” In International Conference on Spatial Information Theory, 316–31. Springer. http://link.springer.com/chapter/10.1007/978-3-540-39923-0_21.

Liu, Alan. 2004. The Laws of Cool: Knowledge Work and the Culture of Information. Chicago: University of Chicago Press.