Procedural History

I see this is my third post with this title. Ah well. Just playing with a script I found here

shawngraham$ python2
In the beginning there was “Baubrugrend Free State”, “Dominion of Clioriwen”
In that era, the people could bear it no longer, and so these ones rebelled from ‘Baubrugrend Free State’ ==> ‘Province of Vrevrela’
In that era, the people could bear it no longer, and so these ones rebelled from ‘Province of Vrevrela’ ==> ‘Free People’s Republic of Craepai’
It is a terrible thing when brothers fight. Thus ‘Free People’s Republic of Craepai’ became “Eiwerela”, “Broteuvallia”
It is a terrible thing when brothers fight. Thus ‘Dominion of Clioriwen’ became “Duchy of Corica”, “Orican Republic”
The thirst for new lands, new glory, and the desire to distract the people, led to new conquests ‘Duchy of Corica’ conquered ‘Eiwerela’
The thirst for new lands, new glory, and the desire to distract the people, led to new conquests ‘Duchy of Corica’ conquered ‘Broteuvallia’
The thirst for new lands, new glory, and the desire to distract the people, led to new conquests ‘Duchy of Corica’ conquered ‘Orican Republic’
In that era, the people could bear it no longer, and so these ones rebelled from ‘Duchy of Corica’ ==> ‘United States of Heukan’
In that era, the people could bear it no longer, and so these ones rebelled from ‘United States of Heukan’ ==> ‘Kingdom of Amoth’
END “Kingdom of Amoth”

The script can also make a nice diagram; now to get it to write the history AND the diagram at the same time.

The directionality of the arrows is a bit confusing. You almost have to read it backwards. However, since it is just a .dot file, I think I can probably load it into something like yEd and make a prettier timeline.
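In the meantime, the .dot itself is easy to generate from the succession events; here's a minimal sketch of the idea (the function and the edge list are my own illustration, using names from the run above, not the actual script's code):

```python
# Sketch: emit a Graphviz .dot timeline from (predecessor, successor)
# pairs like the ones the script prints. Illustrative only; the real
# script's internals may differ.

def to_dot(events):
    lines = ["digraph history {", "  rankdir=LR;"]  # left-to-right reads as a timeline
    for old, new in events:
        lines.append('  "{}" -> "{}";'.format(old, new))
    lines.append("}")
    return "\n".join(lines)

events = [
    ("Baubrugrend Free State", "Province of Vrevrela"),
    ("Province of Vrevrela", "Free People's Republic of Craepai"),
]

dot = to_dot(events)
print(dot)
```

Paste the output into yEd (or any dot viewer) and you get the timeline for free.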

update I’ve added Tracery to the script, made the output a bit more lyrical:

> shawngraham$ python2

Gather by, young ones, and let me tell you of our nations and peoples.

In the beginning there was “Duchy of Corica”

These people shared a single peninsula, shielded from the rest of the world by tall mountains.

Flooding ruined the crops; the famine weakened them all and so, ‘Duchy of Corica’ dissolved in fragments, eventually becoming “Province of Eabloris” and “Voches” and “Uamafai “

A few years later, the strength of the people could bear it no longer, and they rose up in violent revolution. The old ‘Province of Eabloris’ was no more; a new dawn broke on ‘Heawoth’.

As it came to pass, the Queen gave up power and fled into exile. The old ‘Heawoth’ was no more; a new dawn broke on ‘Iroa’

Flooding ruined the crops; the famine weakened them all and so,“Uamafai ” and “Voches” became ‘Eiwerela’.

As it came to pass, the Satrap gave up power and fled into exile. The old ‘Eiwerela’ was no more; a new dawn broke on ‘Oyune’

Low cunning and high treachery divided them and so, ‘Oyune’ dissolved in fragments, eventually becoming “Broteuvallia” and “Islands of Hekla” and “Kingdom of Abroth”.

Low cunning and high treachery divided them and so, ‘Islands of Hekla’ dissolved in fragments, eventually becoming “Satrapy of Yaislaxuin” and “Dominion of Clioriwen”.

The clouds grew dark, and hunger stalked the land, so sickness weakened them all and so, “Dominion of Clioriwen” and “Satrapy of Yaislaxuin” became ‘Kingdom of Amoth’.

The thirst for new lands, new glory, and the desire to distract the people, led to new conquests. ‘Broteuvallia’ conquered ‘Kingdom of Amoth’

A few years later, the Queen gave up power and fled into exile. The old ‘Iroa’ was no more; a new dawn broke on ‘Province of Vrevrela’

Standing proud upon the ruins there are only now “Broteuvallia”and “Kingdom of Abroth”and “Province of Vrevrela”.
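For the curious: a Tracery grammar is just a dictionary of rules, and expansion is a recursive find-and-replace on #symbol# tags. Here's a toy sketch of the mechanism in Python; the rules below are invented for illustration and are not the script's actual grammar:

```python
import random

# Toy Tracery-style expansion: replace each #symbol# with a randomly
# chosen option from the grammar until no tags remain. These rules are
# invented examples, not the grammar the script actually uses.
grammar = {
    "origin": ["#disaster# and so, '#nation#' was no more.",
               "#disaster# and so, '#nation#' dissolved in fragments."],
    "disaster": ["Flooding ruined the crops",
                 "Low cunning and high treachery divided them"],
    "nation": ["Duchy of Corica", "Province of Vrevrela"],
}

def expand(symbol, rng=random):
    text = rng.choice(grammar[symbol])
    while "#" in text:
        pre, sym, post = text.split("#", 2)  # peel off the first #symbol# tag
        text = pre + expand(sym, rng) + post
    return text

print(expand("origin"))
```

The real Tracery library adds modifiers, saved actions, and so on, but this is the heart of it.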

(Feature image: Chester Alvarez, Unsplash)

Building Epoiesen

The curtain goes up, the first pawn moves, the first shot is fired*—but that’s not the start. The play, the game, the war is just a little window on a ribbon of events that may extend back thousands of years. The point is, there’s always something before. It’s always a case of Now Read On.

Much human ingenuity has gone into finding the ultimate Before.

The current state of knowledge can be summarized thus:

In the beginning, there was nothing, which exploded.

* Probably at the first pawn.

  • Terry Pratchett, Lords and Ladies 

Epoiesen is now alive, a journal for creative engagement in history and archaeology. But when did Epoiesen begin? What was its genesis?

As I look through my notebooks and emails and miscellaneous files, I can’t find the *exact* beginning (I know that I’ve been interested in new publishing models for a while though). I find in my inbox an email setting up a meeting with George Duimovich and Pat Moore from our Library to talk about Open Journal Systems in March of 2015. I find scribbles of ideas in notebooks going back to about 2014 (not coincidentally, shortly after my tenure and promotion portfolio was shoehorned from its born-digital format into dead pdfs). In October 2015, I find a google doc that I shared with some folks for an idea of something called ‘Paradata: A Journal of Digital Scholarship in Archaeology and Ancient History’. The influence of HeritageJam I think is clear too ;) I wrote,

“I see this idea as being parallel to things like which publishes ‘data papers’. Paradata would publish the scholarship that goes into making something of that kind of info, while the dead tree versions can be where people duke it out over the interpretations. Moreover, since link rot and website death is a real issue, Paradata would hit a real niche where projects could continue ever after”

But that idea seems to have run out of steam.  I’m not entirely sure why. I find one note that suggests we felt that our idea perhaps was too close to what DHCommons Journal had in mind.

My notes go silent for a while. Then I find scribbles in my notebooks again from around the time of my participation in MSUDAI, the Digital Archaeology Institute at MSU, concomitant with the creation of @tinyarchae, my Tracery-powered dysfunctional excavation bot. That was August 2016. Then, sometime in September of last year, I find a website I built:

O what could’ve been, eh? Here, I’m clearly going for a bit of whimsy; not so much just paradata for conventional digital projects, but maybe something more. The core idea still seems to be there – a place to validate digital things.  I rather like that template, and I need to remind myself what I was using to build it. Structurally, there’s a debt here to open research notebooks done in Jekyll, so that’s probably it.  I did show this ‘Miscellaney’ to people, but there were some very strong reactions to the name, to the whimsy, as it were – see below. (I still like ‘Haptic Visions’ as a category though).

The actual email that led to Epoiesen seeing the light of day comes from October 16 2016:

Hi Pat,

As I was saying to George – and I think you and I have talked about this too on occasion – I’ve been interested to explore creating an open access journal for digital archaeology. I’ve seen the open journals platform, and while it is very cool, it’s not quite what I’m thinking of. I’m interested in something a bit more idiosyncratic that would be based on simple text files in the markdown format, and building the site/journal from that with a static site generator.

The idea is to create what amounts to a kind of literary journal, but for creative engagement with the past. I would solicit everything from twitter bots (I’ve created one that tweets out what amounts to a procedurally-generated soap opera, scenes from an excavation) to music, to art, to creative writing, to data viz… I would solicit reviews, but these would also be published alongside the work under the reviewers’ name. The Hypothesis web annotation architecture would also be built in  […] In a way, it would be a place to publish the ‘paradata’ of digital making in archaeology … Does this sound feasible? Is it something we could do? Maybe I could drop by sometime to chat.

Pat said ‘Yes’. Simple word, ‘yes’. Strong word, ‘yes’. Librarians are powerful.

From that initial meeting, many more meetings took place. Research. Emails. Phone calls. I’ll try to summarize… My first port of call was of course those folks who’ve done this kind of thing before. Martin Paul Eve published a series of posts on his blog that offered his advice on starting an open access journal, and I can’t recommend them enough. Indeed, if you’re one of the people who received an email from me about joining the editorial board, you’ll recognize that I adhered rather closely to Eve’s model.

I was still going with the ‘Smith’s’ name until about November of last year, when I find an email I wrote,

I have, on the advice of several people whose situation is far more precarious than my own, gone for a bit of a name change to signal a bit less whimsy….They rightly pointed out to me that as junior folks the perception of their work is everything, and my whimsical ’Smiths’ name would undermine rather than help them…

One of the earliest folks on board was the wonderful Sara Perry.  I find we exchanged several yonks-worth of emails, throwing ideas around about who to contact, who might be persuaded to submit, and so on. The wonderful folks of the editorial board as a group kept me grounded, found potential contributors, suggested Trello as a way of keeping track of who was doing what, and basically helped keep things on track when my enthusiasm threatened to derail things.

While all of this was going on, I continued to play with the design and platform. I eventually settled on Hexo as a static site generator. I’d been using Jekyll with my open research notebook, but Jekyll frankly is just not something I can work with.

Now Hexo is not without its idiosyncrasies. I learned how it builds the site out of little snippets of ‘ejs’ code. I learned how to embed things (that is, into which ‘partial’ to paste). I figured out where to place the bits and bobs of Tipuesearch (an open source jquery search engine plugin) into the site. (It generates a full json representation of all the site content, not only making it searchable, but also letting other folks use it for remixing, data viz, whatever.) You wouldn’t believe how hard it was to work out how to make the list-of-articles page list alphabetically rather than by date. There was also a battle where the hexo deploy command for pushing everything to my test site accidentally ingested a bunch of stuff I didn’t want – super huge image files – and so I had to wade deep into the waters of git to fix it (and I thank the DHAnswers channel in DH Slack for the help!). Turns out, if you’re using Hexo and Github, don’t fiddle with anything via the website. Getting DOIs embedded into the page metadata, that was also difficult.
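Incidentally, because the Tipuesearch content file is plain JSON – a ‘pages’ list of title/text/url objects, at least in the versions I've seen – that remixing really is trivial. A hedged sketch, with an invented snippet standing in for the generated file:

```python
import json

# Tipuesearch stores site content as JSON with a "pages" list
# (title/text/url keys, in the versions I've seen). The snippet below
# is invented; a real site would load its generated content file.
raw = """
{"pages": [
  {"title": "Destory History", "text": "...", "url": "/destory-history/"},
  {"title": "A Song of Ice", "text": "...", "url": "/song/"}
]}
"""

data = json.loads(raw)
# The same sort-by-title trick I needed for the list-of-articles page:
titles = sorted(page["title"] for page in data["pages"])
print(titles)
```

Point those few lines at a real tipuesearch content file and you have the raw material for search, remixing, or data viz.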

Here’s what the YAML metadata for an article looks like:

title: Destory History
date: 2017-09-01 20:01:04
tags: interactive fiction
cover_index: /imgs/coyne/quinten-de-graaf-258711.jpg
cover_detail: /imgs/coyne/11146352055_64c730a741_o.jpg
author: "Coyne, Lucas"
doi: "10.22215/epoiesen/2017.4"

The partials grab the ‘author’ for alphabetizing posts, and the doi for embedding into the metadata:

<% if (page.doi){ %>
    <meta name="dc:identifier" content="<%= page.doi %>" />
<% } %>

That might not mean very much to you, or look very impressive, but darnit, it took several hours of reading stackoverflow and fiddling with things, generating the site over and over again, so I’m pasting it here for posterity. Anyway, I now have a default template for creating articles, with reminders within it of the kinds of information that I need to include and where they go.
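The alphabetizing itself lives in an ejs partial, but the logic is nothing more than a key sort over the front-matter ‘author’ field; in Python terms (illustrative only – the post data here is invented, apart from the one real article):

```python
# The alphabetical article listing boils down to sorting posts by their
# 'author' front-matter field rather than by date. Sketch only; the
# second post and its author are invented for illustration.
posts = [
    {"title": "Destory History", "author": "Coyne, Lucas", "date": "2017-09-01"},
    {"title": "Another Piece", "author": "Author, Ann", "date": "2017-03-01"},
]

by_author = sorted(posts, key=lambda p: p["author"].lower())
print([p["author"] for p in by_author])
```

The ejs version does the same thing with a comparator inside the partial that renders the list-of-articles page.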

One year later, Epoiesen exists in the world! I announced it with a tweet…

… we will be doing a formal ‘Ta Da!’ during open access week in October. So many people come together at just the right time to make something in this world. Serendipity, and someone says ‘Yes’, and suddenly, there is something that wasn’t before. What else is tenure for, if it isn’t to make space for someone else to do something new? I hope you’ll consider Epoiesen for your own experiments and creative engagements.


I’m grateful to everyone who has sent me a note or tweeted regarding the start of Epoiesen. I look forward to seeing where this adventure will lead! Thank you, all. I’m also grateful to Neville Morley, who writes about Epoiesen’s situation in the broader publishing landscape in ‘Changing the Rules’.

Moving forward, I’m very excited to work with Bill Caraher and The Digital Press at the University of North Dakota to publish the Epoiesen Annual, where all of the articles from a given year are gathered together and given (printed) flesh.

Stay tuned! Make wonderful things!

Rolling out DHBox for HIST3814

First of all, the support of Steve Zweibel of CUNY and Andrew Pullin from Carleton’s Computer Science department has been above and beyond the call of duty, and I owe them a case (if not several) of beer.

So below I offer some observations on rolling out our own DHBox at Carleton for my HIST3814o Crafting Digital History students’ use. If you’ll recall, our Open Digital Archaeology Textbook Environment (ODATE) proposes to roll a digital archaeology textbook into a customized DHBox. I took advantage of HIST3814o to dry-run DHBox for ODATE and also to solve some of the problems of a distance, online course in Digital History so that I only had to provide tech support for one kind of computer (an Ubuntu box) rather than the myriad setups that my students might use.

So here we go:

  • I don’t think there’s a DHBox deployed elsewhere in a production setting, so there are a lot of unknowns on our end involved in doing what we’ve proposed to do:
    • How to make it work on a shared server
    • What kind of load it would experience, and hence, what kind of resources it would need
    • How to add more abilities to it
    • How to on-board readers/users
  • Thus, I rewrote my HIST3814o, Crafting Digital History, so that we could try to answer these questions in a limited sandbox environment (and one from which, if things went wrong, I could recover). Students wouldn’t have to use their own computers to compute but rather would log into our DHBox. This gave us a maximum number of students (60) from whose experience we can extrapolate the kinds of things ODATE might encounter in terms of computational load, ongoing maintenance, and issues in terms of Carleton’s wider internet security. NB: The things that these digital history students do with the DHBox are computationally less intense than the things that the archaeology textbook users will do (text analysis versus 3D reconstruction from images, for example), and thus if anything we’re simply defining a minimal baseline.
  • We’ve discovered a number of things:
    • The amount of RAM and the number of CPU cores we need are very much higher than we anticipated. Currently, we have had to upsize to 224 GB of space for the students, and increase RAM 4x and CPU processors 5x. Of course, I made a very poor guess at the outset.
    • While 60 students are enrolled, it seems that only 20 – 25 are fully active in the course as of yet (but it’s only been 1 week), and so we expect we’ll probably have to increase again.
    • Adding new capabilities to the underlying image of DHBox can break things in unexpected ways. It’s a Docker container being run in an OpenStack environment, so we’ve had to get au fait with Docker, too. Fortunately, we’ve been in close communication with Steve at CUNY and have been able to find solutions. I say ‘we’, but this is all Andrew and Steve. I am the cause of work in other people.
    • Extending the default existence of a user’s DHBox instance from 1 month to 2 months caused odd errors we haven’t fully sussed yet (and in any event, will have an effect on the size of the memory we need), so we rolled that back
    • We discovered that Carleton University’s antivirus provider was/is using deep-packet sniffing, which was preventing elements of the DHBox from working properly in the on-campus computer labs. ITS has now white-listed our server for the DHBox.
    • Because of Carleton security concerns, all users of the DHBox have to be logged into the Carleton VPN in order to use the DHBox. For ODATE, this will not be acceptable, and we’ll have to find a solution (which may mean a commercial provider, which will increase costs, although we did budget for these).
    • I think that’s everything for now, but I don’t have my notes handy

I intend to reveal the first section of the ODATE text this summer, so that people can try it out with their own classes this September. We (Neha, Michael, Beth, and I) are currently writing the section on the ethics of digital work in archaeology, which I believe is crucial to get right before letting people see our drafts. While I don’t think our computational environment – the official ODATE DHBox instance – will be up and running by then (that is, NOT the one I’m using for my HIST3814o class), individuals will be able to go to DHBox itself and try things out there OR run things on their own machines (if they have access to Ubuntu Linux computers).

Incidentally, HIST3814o is available online at ; I run a fully open-access version of it for non-Carleton folks concurrently. For OA participants, I use a separate instance of Slack as a communication space; see . One of the OA participants has blogged about starting the course with me at . You’re welcome to have a look!

Featured image by Patrick Lindenberg

Bots of Archaeology: Machines Writing Public Archaeology?

Today was the day for the keynotes for the Public Archaeology Twitter Conference; the main papers unroll tomorrow, so set your Twitters to #PATC and enjoy! This conference I think might well be one of those landmark conferences we discuss in years to come.

For convenience, I’ve copied my keynote tweets below.

…and if you’re wondering why the featured image (and final image) are shots from the tv adaptation of ‘Going Postal’, by Terry Pratchett, see this.

Tracery continues to be awesome

I had a long conversation this morning with Ed Summers about making bots, why we make bots, and the ways bots (and bot making) might provide insight into other aspects of being digitally literate. I also have thoughts on how this intersects with archaeological knowledge making, more on which another day.

For now: I came across this short piece by Joe Zach where he uses Tracery to generate a sequence of chords (the names of chords), which he then feeds into another library, scribbletune, to write those chords to a MIDI file.


And so, as part of my continuing exploration of generative sound art, I forked his code and added a bit to generate a melody line to go along with the chords. My version writes three different MIDI files, which I then combine in GarageBand (and what the hell, I added a bongo sample because it’s spring and it’s snowing, and I’d like to imagine I’m somewhere sunny).

Give it a listen:

I’m thinking that this could be plumbed into a twitter bot somehow, write the midi overtop of an autoplaying video…? I dunno.

(oh, and I also indulged in my imposter syndrome re music, generative art, etc, by sonifying an article about imposter syndrome through substitution of letters for number values, A = 1, etc, and then doing a modulo operation to map the resulting numbers to the 88-key keyboard. It’s a bit on the creepy side.)
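If you want to try the same trick, the mapping is trivial; here is a sketch of one plausible reading of what I did (sum each word’s letter values, then take a modulo onto the 88 keys), not the exact script:

```python
# One plausible reading of the sonification: A = 1 ... Z = 26, sum the
# values of each word, then map onto the 88-key keyboard with a modulo.
# An illustrative sketch, not the exact script used for the piece.

def word_to_key(word):
    total = sum(ord(c) - ord('a') + 1 for c in word.lower() if c.isalpha())
    return (total % 88) + 1  # keys numbered 1..88

keys = [word_to_key(w) for w in "imposter syndrome".split()]
print(keys)  # -> [28, 26]
```

From there the key numbers can be written out as MIDI notes with whatever library is handy.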

Failing Productively in Digital Archaeology

We’re currently writing the very first draft of our integrated DHBox virtual-machine-and-textbook for digital archaeology. It’s aimed at the same crowd as a regular intro-to-archaeology text, that is, first or second year students with little digital grounding. It won’t cover everyone’s wishlist for digital archaeology, but it will, with care, be a solid foundation for going further.

In this excerpt, we explore the idea of the productive fail and its value in both teaching and research in digital archaeology. This section had its genesis in my presentation to the Digital Archaeology Institute at MSU last summer. (You can listen to me here or watch me here, but note the video has a few minutes of dead air at the beginning. Yes, I could edit it.) Feel free to comment or annotate via this Hypothesis link.

1.5 Failing Productively

We have found that students are very nervous about doing digital work because, ‘what if it breaks?’ and ‘what if I can’t get it to work?’ This is perhaps a result of high-stakes testing and the ways we as educators have inculcated all-or-nothing grading in our courses. There is no room for experimentation, no room for trying things out when the final essay is worth 50% of the course grade, for instance. Playing it safe is a valid response in such an environment. A better approach, from a pedagogical point of view, is to encourage students to explore and try things out, with the grading being focused on documenting the process rather than on the final outcome. We will point the reader to Daniel Paul O’Donnell’s concept of the unessay; more details behind the link.

Our emphasis on open notebooks has an ulterior motive, and that is to surface the many ways in which digital work sometimes fails. We want to introduce to you the idea of ‘failing productively’ because there is such a thing as an unproductive failure. There are ways of failing that do not do us much good, and we need – especially with digital work – to consider what ‘fail’ actually can mean. In the technology world, there are various slogans surrounding the idea of ‘fail’ – fail fast; move fast and break things; fail better; for instance.

When we talk about ‘failing productively’ or failing better, it is easy for critics of digital archaeology (or the digital humanities; see Allington et al 2016, but contra: Greenspan 2016) to connect digital work to the worst excesses of the tech sector. But again, this is to misunderstand what ‘fail’ should mean. The understanding among many tech startup folks of valorized failure as a license to burn through funding has caused a lot of harm. The tech sector failed to understand the humanistic implication of the phrase, and instead took it literally to mean ‘a lack of success is itself the goal’. But where does it come from?

The earliest use of the ‘fail better’ idea that we have found outside the world of literature and criticism seems to occur during the first tech boom, where it turns up in everything from diet books to learning to play jazz to technology; in technology, it appears in The Quest for a Unified Theory of Information. CITATION (Wolfgang Hofkirchner). That book, according to Google Scholar at the time of writing, has been cited 13 times, but the things that cite it have themselves been cited over 600 times, which, while not conclusive, is suggestive. We are here merely speculating on where this mantra of the fast fail, fail better, comes from and how it spreads, but it would be a very interesting topic to explore.

Perhaps a better understanding of what ‘fail’ should mean is something akin to what Nassim Taleb called ‘antifragility’. The fragile thing breaks under stress and randomness; the resilient thing stays the same; and the antifragile thing actually gets stronger as it is exposed to randomness. Kids’ bones, for instance, need to be exposed to shocks in order to get stronger. Academia’s systems are ‘fragile’ in that they do not tolerate fail; they are to a degree resilient, but they are not ‘antifragile’ in Taleb’s sense. The idea that ‘fail’ can break that which is ‘fragile’ is part of the issue here. Silicon Valley really means ‘fail’ in the sense of ‘antifragile’, but it frequently forgets that; academia sees ‘fail’ as the breaking of something fragile; and so the two are at loggerheads. Indeed, the rhetorical moves of academe often frame weak results – fails – as actual successes, thus making the scholarship fragile (hence the fierceness of academic disputes when results are challenged, sometimes). To make scholarship antifragile – to extract the full value of a fail and make it productive – we need remember only one thing:

A failure shared is not a failure.

Not every experiment results in success; indeed, the failures are richer experiences, because as academics we are loath to say when something did not work – but how else will anybody know that a particular method, or approach, is flawed? If we try something, it does not work, and we then critically analyze why that should be, we have in fact entered a circle of positive feedback. This perspective is informed by our research into game-based learning. A good game keeps the challenges just ahead of the player’s (student’s) ability, to create a state of ‘flow’. Critical failure is part of this: too hard, and the player quits; too easy, and the player drops the controller in disgust. The ‘fails’ that happen in a state of flow enable the player to learn how to overcome them. Perhaps if we can design assessment to tap into this state of flow, then we can create the conditions for continual learning and growth (see for instance Kee, Graham, et al. 2009). As in our teaching, so too in our research. Presner writes,

Digital projects in the Humanities, Social Sciences, and Arts share with experimental practices in the Sciences a willingness to be open about iteration and negative results. As such, experimentation and trial-and-error are inherent parts of digital research and must be recognized to carry risk. The processes of experimentation can be documented and prove to be essential in the long-term development process of an idea or project. White papers, sets of best practices, new design environments, and publications can result from such projects and these should be considered in the review process. Experimentation and risk-taking in scholarship represent the best of what the university, in all its many disciplines, has to offer society. To treat scholarship that takes on risk and the challenge of experimentation as an activity of secondary (or no) value for promotion and advancement, can only serve to reduce innovation, reward mediocrity, and retard the development of research. PRESNER 2012 cite

1.5.1 A taxonomy of fails

There are fails, and then there are fails. Croxall and Warnick identify a taxonomy of four kinds of failure in digital work:

  1. Technological Failure
  2. Human Failure
  3. Failure as Artifact
  4. Failure as Epistemology

…to which we might add a fifth kind of fail:

  5. Failing to Share

The first is the simplest: something simply did not work. The code is buggy, dust and grit got into the fan and the hardware seized. The second, while labeled ‘human failure’, really means that the context, the framework for encountering the technology, was not erected properly, leading to a failure to appreciate what the technology could do or how it was intended to be used. This kind of failure can also emerge when we ourselves are not open to the possibilities or work that the technology entails. The next two kinds of failure emerge from the first two, in that they are ways of dealing with them. ‘Failure as Artifact’ means that we seek out examples of failures as things to study, working out the implications of why something did not work. Finally, ‘Failure as Epistemology’ purposely builds the opportunity to fail into the research design, such that each succeeding fail (of type 1 or type 2) moves us closer to the solution that we need. The first two refer to what happened; the second two refer to our response and how we react to the first two (if we react at all). The key to productive failure as we envision it is to recognize when one’s work is suffering from a type 1 or type 2 fail, and to transform it into a type 3 or type 4. Hence our fifth category, the failure to share. For digital archaeology to move forward, we need to know where the fails are and how to move beyond them, such that we move forward as a whole. Report not just the things that work, but also the fails. That is why we keep open research notebooks.

Let’s consider some digital projects that we have been involved in, and categorize the kinds of fails they suffered from. We turn first to the HeritageCrowd project that Graham established in 2011. This project straddled community history and cultural history in a region poorly served by the internet. It was meant to crowd-source intangible heritage via a combination of web-platform and telephony (people could phone in with stories, which were automatically transcribed and added to a webmap). The first write-up of the project was published just as the project started to get underway (CITATION GRAHAM ETAL). It’s what happened next that is of interest here.

The project website was hacked, knocked offline and utterly compromised. The project failed.

Why did it fail? It was a combination of at least four distinct problems (GRAHAM CITATION):

  1. poor record keeping of the installation process of the various technologies that made it work
  2. computers talk to other computers to persuade them to do things. In this case, one computer injected malicious code into the technologies Graham was using to map the stories
  3. Graham ignored security warnings from the main platform’s maintainers
  4. Backups and versioning: there were none.

Graham’s fails here are of both type 1 and type 2. In terms of type 2, his failure to keep careful notes on how the various pieces of the project were made to fit together meant that he lacked the framework to understand how he had made the project vulnerable to attack. The actual fail point of the attack – that’s a type 1 fail, but one that could have been avoided if Graham had participated more in the spirit of open software development, e.g., read the security warnings in the developer forum! When Graham realized what had happened to his project, he was faced with two options. One option, having already published a piece on the project that hailed its successes and broad lessons for crowdsourcing cultural heritage, would have been to quietly walk away from the project (perhaps putting up a new website averring that version 2.0 was coming, pending funding). The other option was to warn folks to beef up the security and backups for their own projects. At the time, crowdsourcing was very much an academic fashion, and Graham opted for the second option in that spirit. In doing this, the HeritageCrowd project became a fail of type 3, an artifact for study and reflection. The act of blogging his post-mortem makes this project also an instance of type 5, the communication of the fail. It is worth pointing out here that the public sharing of this failure is not without issues. As we indicated in the [Open Notebook Research & Scholarly Communication] section, the venue for sharing what hasn’t worked and the lessons learned is highly contingent on many factors. Graham, as a white male tenure-track academic on the web in 2012, could share openly that things had not worked. As the web has developed in the intervening years, and with the increasing polarization of web discourse into broad ideological camps, it may well not be safe for someone in a more precarious position to share so openly. One must keep in mind one’s own situation and adapt what we argue for here accordingly. Sharing fails can be done with close colleagues, students, specialist forums, and so on.

If we are successful with ODATE, and the ideas of productive fail begin to permeate more widely in the teaching of digital archaeology, then a pedagogy that values fail will with time normalize such ‘negative results’. We are motivated by this belief that digital archaeology is defined by the productive, pedagogical fail. It is this aspect of digital archaeology that also makes it a kind of public archaeology, and that failing in public can be the most powerful thing that digital archaeology offers the wider field.

We implore you to do your research so that others can retrace your steps; even a partial step forward is a step forward! When you find a colleague struggling, give positive and meaningful feedback. Be open about your own struggles, but get validation of your skills if necessary. Build things that make you feel good about your work into your work.

1.5.2 Exercises

What is the nature of your own fails? Reflect on a ‘fail’ that happened this past year. Where might it fit on the taxonomy? Share this fail via your open notebook, blog, or similar, with your classmates. How can your classmates convert their fails into types three or four?

What is Digital Archaeology?

We’re currently writing the very first draft of our integrated DHBox virtual-machine-and-textbook for digital archaeology. It’s aimed at the same crowd as a regular intro-to-archaeology text, that is, first- or second-year students with little digital grounding. It won’t cover everyone’s wishlist for digital archaeology, but with care it will be a solid foundation for going further.

What follows is an excerpt from the opening chapter, ‘Going Digital’ and the first subsection, ‘So What is Digital Archaeology Anyway?’

This is a rough draft; indeed, I call it the ‘barf’ draft, where you are just struggling to get ideas out in the hopes that eventually they will come together and the errors, omissions, and non-sequiturs can be ironed out. Use with caution; comments welcome. Feel free to annotate with Hypothesis.

I post this today in honour of the Computer Applications in Archaeology conference in Atlanta (updates from which you can follow on Twitter at #caaatlanta).

1 Going Digital

Digital archaeology should exist to assist us in the performance of archaeology as a whole. It should not be a secret knowledge, nor a distinct school of thought, but rather simply seen as archaeology done well, using all of the tools available to us in better recovering, understanding and presenting the past. In the end, there is no such thing as digital archaeology. What exists, or at least what should exist, are intelligent and practical ways of applying the use of computers to archaeology that better enable us to pursue both our theoretical questions and our methodological applications. (Evans and Daly 2006)

While we agree with the first part of the sentiment, the second part is rather up for debate. We believe that there is such a thing as digital archaeology. Digital tools exist in a meshwork of legal and cultural obligations, and more so than any other tool humans have yet come up with, have the capability to exert their own agency upon the user. Digital tools and their use are not theory-free nor without theoretical implications. There is no such thing as neutral when digital tools are employed. This is why digital archaeology is, or should be, a distinct subfield of the wider archaeological project.

In a conversation initiated on Twitter on March 10, 2017, Graham asked the question, ‘is digital archaeology the same as using computers in archaeology?’ REF.

The resulting conversation ranged widely over everything from the topic of study (ref tinysapiens ref) to the ways in which computational power enables the researcher to ask questions that were not previously feasible to ask (ref CD Wren). Other researchers sounded a note of caution against the kind of ‘technological fetishism’ (ref Lorna) that digital work can often fall prey to, especially given the larger issues of gender and ‘solutionitis’ that emerge given the white, 20-35 year old demographic of many tech workers (for criticisms of technological solutionism or utopianism in archaeology, see the work of Colleen Morgan (ref phd thesis), Joyce, Tringham, Morozov, Kansa). Others warned that to think of digital archaeology as something distinct from archaeology risks ‘going the way of DH’, and instead appealed for a holistic understanding (ref Gruber).

Hanna Marie Pageau succinctly captured these issues when, over a series of tweets (REF), she wrote:

‘Digital archaeology has an obvious digital component. However, saying it’s simply using a computer is like saying being a computer scientist means you use a computer to do science. There is an implied addition [to the] topic of specific methods that brings you from an archaeologist using a computer to being an archaeologist who studies digital archaeology. I would argue that archaeogaming is the most straight forward example. Because while gaming is usually thought of as digital, it could study table top gaming and not technically be digital in nature. However if you’re studying ethics of representation in games you’re going from just using a computer as a tool to it being THE medium.’

In which case, an important aspect of digital archaeology that differentiates it from the use of computing power to answer archaeological questions is this question of purpose. In this section, we take up this question beginning with the question of teaching digital approaches. We progress by suggesting that digital archaeology is akin to work at the intersection of art and public archaeology and digital humanities. We provide you the necessary basics for setting up your own digital archaeological practice. Entrance into the world of digital archaeology requires organizational ability and facility with versioning files. It is allied with the practice of open notebook science, and it attempts to future-proof by using the simplest file formats and avoiding proprietary software where possible. These are the basics on which the rest of digital archaeological practice is founded.

1.1 So what is Digital Archaeology?

If you are holding this book in your hands, via a device or on paper, or looking at it on your desktop, you might wonder why we feel it necessary to even ask the question. It is important at the outset to make the argument that digital archaeology is not about ‘mere’ tool use. Andrew Goldstone discusses this tension in Debates in the Digital Humanities (Goldstone 2018). He has found (and Lincoln Mullen concurs with regard to his own teaching (Mullen 2017)) that our current optimism about teaching technical facility is misplaced. Tools first, context second doesn’t work. But theory first doesn’t seem to work either. And finally, for anything to work at all, datasets have to be curated and carefully pruned for their pedagogical value. We can’t simply turn students loose on a dataset (or worse, ask them to build their own) and expect ‘learning’ to happen.

Our approach in this volume is to resolve that seeming paradox by providing not just the tools, and not just the data, but also the computer itself. Archaeologically, this puts our volume in dialog with the work of scholars such as Ben Marwick, who makes available with his research the code, the dependencies, and sometimes, an entire virtual machine, to enable other scholars to replicate, reuse, or dispute his conclusions. We want you to reuse our code, to study it, and to improve upon it. We want you to annotate our pages, and point out our errors. For us, digital archaeology is not the mere use of computational tools to answer archaeological questions. Rather, it is to enable the audience for archaeological thinking to enter into conversation with us, and to do archaeology for themselves.

Digital archaeology is necessarily a public archaeology. This is its principal difference with what has come before, for never forget, there has been at least a half-century of innovative use of computational power for archaeological knowledge building.

Ethan Watrall has drawn the history of computational archaeology/digital archaeology all the way back to the pioneering work of James Deetz in the 1960s, who used computers at MIT to perform stylistic analyses of Arikara ceramics (Ethan Watrall 2017, Deetz (1965)). Most early interest in computation for archaeology was centred on the potential for computational databases, although ambition often outstripped capability. By the 1970s, serious efforts were being put into work to build the infrastructural knowledge necessary to make and usefully query archaeological datasets. One can see this concern play out by considering a topic model (Shawn Graham 2014) of the early volumes of the Computer Applications in Archaeology (a topic model is a way of deducing latent patterns of discourse within text, based on patternings of words (see Graham, Weingart, and Milligan 2012)):

topic 1: computer, program, may, storage, then, excavation, recording, all, into, form, using, retrieval, any, user, output, records, package, entry, one, unit

topic 6: but, they, one, time, their, all, some, only, will, there, would, what, very, our, other, any, most, them, even

topic 20: some, will, many, there, field, problems, may, but, archaeologists, excavation, their, they, recording, however, record, new, systems, most, should, need

The beginnings of the CAA are marked by hesitation and prognostication: what are computers for, in archaeology? There is a sense that for archaeologists, computation is something that will be useful insofar as it can be helpful for recording information in the field. By the 1980s desktop computing was becoming sufficiently widespread that the use of geographic information systems was feasible for more and more archaeologists. The other ‘killer app’ of the time was computer-aided design, which allowed metric 3d reconstructions from the plans drawn on site by excavators. Yet computational resources were still limited enough that computing was not something that one could merely ‘play’ with. Software was costly, computation took time, and training resources were put into learning the proprietary packages that existed (rather than coding knowledge). By the 1990s, the introduction of the CD-ROM and the shift in PC gaming from primarily text-based to graphics-based games led to teaching simulations for archaeology, most notably T. Douglas Price and Anne Birgitte Gebauer’s Adventures in Fugawiland. Watrall identifies the emergence of the web as being not so much a boon for computational archaeology as it was for public archaeology (although the pioneering journal Internet Archaeology was first published in 1996); nevertheless, the birth of the web (which, it must be remembered, is distinct from and overlays the internet) allowed for a step-change in the effectiveness of the dissemination of open-source software and code, including practices for remote collaboration on code that are now beginning to percolate into scholarly publication.

The 2000s have seen, insofar as digital archaeology is concerned, a replay of the earlier episodes of computational archaeology, concomitant with each subsequent web ‘revolution’ (ie, so-called web 2.0, web 3.0, etc). Works such as (Evans, Daly, and MyiLibrary 2006) and (E. C. Kansa, Kansa, and Watrall 2011) are broadly concerned with questions of infrastructure and training, while the more recent Mobilizing the Past deals with problems of training, and with the ethical issues that the emerging digital surveillance permitted by our networked society presents to the practice of archaeology (and public archaeology). Perhaps the most promising new digital technologies to emerge in recent years include methods for linking open archaeological data via the web (ie, freeing various ‘silos’ of disciplinary knowledge so that the semantic connections between them can be followed and queried) and various mixed-reality approaches (virtual reality, augmented reality, 3d printing, and the so-called internet of things, the practice of wiring everything that can be wired to the web). The 2000s have also seen a growing realization that our digital tools and their algorithmic biases not only permit interesting questions to be asked about the past, but also inhibit points of view or impose their own worldviews upon the past in ways that may damage communities and/or scholarship. This reflective critique of computation in the service of archaeology marks digital archaeology as within the ambit of the digital humanities (despite the division between anthropological and humanistic archaeologies).

1.1.1 Is digital archaeology part of the digital humanities?

In recent years – certainly the last decade – an idea called ‘the digital humanities’ has been percolating around the academy. It is a successor idea to ‘humanities computing’, but it captures that same distinction, discussed above, between mere tool use and a distinct field of study. Digital archaeology has developed alongside the digital humanities, sometimes intersecting with it (notably, there was a major archaeological session at the annual international Alliance of Digital Humanities Organizations (ADHO) DH conference in 2013).

The various component organizations of the ADHO have been meeting in one form or another since the 1970s; so too the Computer Applications in Archaeology conference has been publishing its proceedings since 1973. Archaeologists have been running simulations, doing spatial analysis, clustering, imaging, geophysicing, 3d modeling, neutron activation analyzing, x-tent modeling, etc, for what seems like ages. Happily, there is no one definition of ‘dh’ that everyone agrees on (see the many definitions collected at the What Is Digital Humanities? website; reload the page to get a new definition). For us, a defining characteristic of DH work is that public use we discussed above. But another characteristic that we find useful to consider is the purpose to which computation is put in DH work. This means that digital work also has to be situated in the contexts of power and access and control (which sometimes means that digital work is mis-characterised as being part of a ‘neo-liberal’ agenda to reduce knowledge work to base profit motives, e.g. Brouillette; more thoughtful work about the confluence of the digital with neoliberalism may be found in Caraher xxxx and Kansa xxxx and Greenspan xxx. We discuss the ethical dimensions to digital work more fully in [The Ethics of Big Data in Archaeology].)

For us, a key difference between the kind of computational archaeology of the last years of the twentieth century and the emerging digital archaeology of the last decade lies in the purpose behind the computing power. Trevor Owens, a digital archivist, draws attention to the purpose behind one’s use of computational power: generative discovery versus justification of an hypothesis (tjowens 2012). Discovery marks out the digital humanist, whilst justification signals the humanist who uses computers. Discovery and justification are critically different concepts. For Owens, if we are using computational power to deform our texts, then we are trying to see things in a new light, to create new juxtapositions, to spark new insight. Stephen Ramsay talks about this too in Reading Machines (Ramsay 2011, 33), discussing the work of Samuels and McGann (Samuels and McGann 1999): “Reading a poem backward is like viewing the face of a watch sideways – a way of unleashing the potentialities that altered perspectives may reveal”. This kind of reading of data (especially, but not necessarily, through digital manipulation) does not happen very much at all in archaeology. If ‘deformance’ is a key sign of the digital humanities, then digital archaeologists are not digital humanists. Owens’s point isn’t to signal who’s in or who’s out, but rather to draw attention to the fact that:

When we separate out the context of discovery and exploration from the context of justification we end up clarifying the terms of our conversation. There is a huge difference between “here is an interesting way of thinking about this” and “This evidence supports this claim.”

This is important in the wider conversation concerning how we evaluate digital scholarship. We’ve used computers in archaeology for decades to try to justify or otherwise connect our leaps of logic and faith, spanning the gap between our data and the stories we’d like to tell. We believe, on balance, that ‘digital archaeology’ sits along this spectrum between justification and discovery closer to the discovery end, that it sits within the digital humanities and should worry less about hypothesis testing, and concentrate more on discovery and generation, of ‘interesting way[s] of thinking about this’.

Digital archaeology should be a prompt to make us ‘think different’. Let’s take a small example of how that might play out. It’s also worth suggesting that ‘play’ as a strategy for doing digital work is a valid methodology (see Ramsay (2011)). (And of course, the ability to play with computing power is a function of Moore’s law governing the increase in computing power over time: computing is no longer a precious resource but something that can be ‘wasted’.)

1.1.2 Archaeological Glitch Art

Bill Caraher is a leading thinker on the implications and practice of digital archaeology. In a post on archaeological glitch art (Caraher 2012) Caraher changed file extensions to fiddle about in the insides of images of archaeological maps. He then looked at them again as images:

The idea … is to combine computer code and human codes to transform our computer mediated image of archaeological reality in unpredictable ways. The process is remarkably similar to analyzing the site via the GIS where we take the “natural” landscape and transform it into a series of symbols, lines, and text. By manipulating the code that produces these images in both random and patterned ways, we manipulate the meaning of the image and the way in which these images communicate information to the viewer. We problematize the process and manifestation of mediating between the experienced landscape and its representation as archaeological data.
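Caraher worked by hand, opening image files in a text editor and fiddling with their contents; the same move can be sketched programmatically. Everything here is a hypothetical illustration: the file names, the 512-byte header margin, and the number of flips are arbitrary choices, not Caraher’s method.

```python
# Glitch an image by overwriting random bytes in its body, leaving the
# first 512 bytes alone so the file header (and thus openability) survives.
import random

def glitch(in_path, out_path, n_flips=20, seed=0):
    data = bytearray(open(in_path, "rb").read())
    rng = random.Random(seed)  # seeded, so the 'random' glitch is repeatable
    for _ in range(n_flips):
        i = rng.randrange(512, len(data))
        data[i] = rng.randrange(256)
    open(out_path, "wb").write(bytes(data))

# e.g. glitch("site_map.jpg", "site_map_glitched.jpg")
```

Run on a JPEG of an archaeological plan, the output will often still open, but the compressed image data decodes into smears and dislocations: the ‘natural’ representation is made strange again.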

Similarly, Graham’s work in representing archaeological data in sound (a literal auditory metaphor) translates movement over space (or through time) into a soundscape of tones (Graham 2017). This frees us from the tyranny of the screen and visual modes of knowing that often occlude more than they reveal (for instance, our Western-framed understanding of the top of the page or screen as ‘north’ means we privilege visual patterns in the vertical dimension over the horizontal (Montello et al. 2003)).
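A minimal version of such a sonification might map a sequence of values (say, artifact counts per decade) onto a musical scale. The pentatonic scale and the linear mapping below are illustrative assumptions, not Graham’s actual method.

```python
# Map each value in a series onto a five-note pentatonic scale:
# low counts become low pitches, high counts high pitches.
PENTATONIC = [261.63, 293.66, 329.63, 392.00, 440.00]  # C4 D4 E4 G4 A4, in Hz

def sonify(counts):
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    return [PENTATONIC[int((c - lo) / span * (len(PENTATONIC) - 1))]
            for c in counts]
```

A rising series of counts becomes a rising melody of frequencies, which a synthesis library could then render as audible tones; the listener ‘reads’ the data through the ear rather than the eye.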

These playful approaches force us to rethink some of our norms of communication, our norms of what archaeology can concern itself with. It should be apparent that digital archaeology transcends mere ‘digital skills’ or ‘tool use’; but it also suffers from being ‘cool’.

1.1.3 The ‘cool’ factor

Alan Liu (Liu 2004) wondered what the role of the arts and humanities was in an age of knowledge work, of deliverables, of an historical event horizon that only goes back the last financial quarter. He examined the idea of ‘knowledge work’ and teased out how much of the driving force behind it is in pursuit of the ‘cool’. Through a deft plumbing of the history of the early internet (and in particular, riffing on Netscape’s ‘what’s cool?’ page from 1996 and their inability to define it except to say that they’d know it when they saw it), Liu argues that cool is ‘the aporia of information… cool is information designed to resist information… information fed back into its own signal to create a standing interference pattern, a paradox pattern’ (Liu 2004, 179). The latest web design, the latest app, the latest R package for statistics, the latest acronym on Twitter where all the digital humanists play: cool, and dividing the world.

That is, Liu argued that ‘cool’ was amongst other things a politics of knowledge work, a practice and ethos. He wondered how we might ‘challenge knowledge work to open a space, as yet culturally sterile (coopted, jejune, anarchistic, terroristic), for a more humane hack of contemporary knowledge?’ (Liu 2004, 9). Liu goes on to discuss how the tensions of ‘cool’ in knowledge work (for us, read: digital archaeology) also intersect with an ethos of the unknown, that is, of knowledge workers who, though they work nowhere else, somehow manage to stand outside that system of knowledge production. (Is alt-ac ‘alt’ partially because it is the cool work?) This matters for us as archaeologists. There are many ‘cool’ things happening in digital archaeology that somehow do not penetrate into the mainstream (such as it is). The utilitarian dots-on-a-map were once cool, but are now pedestrian. The ‘cool’ things that could be, linger on the fringes. If they did not, they wouldn’t be cool, one supposes. They resist.

To get the more humane hack he seeks, Liu suggests that the historical depth the humanities provide counters the shallowness of cool:

The humanities thus have an explanation for the new arts of the information age, whose inheritance of a frantic sequence of artistic modernisms, postmodernisms, and post-postmodernisms is otherwise only a displaced encounter with the raw process of historicity. Inversely, the arts offer the humanities serious ways of engaging – both practically and theoretically – with “cool”. Together, the humanities and arts might be able to offer a persuasive argument for the humane arts in the age of knowledge work. (Liu 2004, 381).

In which case, the emergence of digital archaeologists and historians in the last decade might be the loci of the humane hacks – if we move into that space where we engage the arts. Indeed, the seminal anthropologist Tim Ingold makes this very argument with reference to his own arc as a scholar, ‘From Science to Art and Back Again’:

Revisiting science and art: which is more ecological now? Why is art leading the way in promoting radical ecological awareness? The goals of today’s science are modelling, prediction and control. Is that why we turn to art to rediscover the humility that science has lost?

We need to be making art. Digital archaeology naturally pushes in that direction.

1.1.4 Takeaways

  • Digital archaeology is a public archaeology
  • Digital archaeology is often about deformance rather than justification
  • In that deformative practice, it is in some ways closely aligned with artistic ways of knowing
  • Digital archaeology is part of the digital humanities, and in many ways, presaged current debates and trends in that field.

All of these aspects of digital archaeology exist along a continuum. In the remainder of this chapter, we give you a ‘boot-camp’ to get you to the point where you can begin to wonder about deformation and the public entanglement with your work.

1.1.5 Exercises

The first steps in going digital are quite easy. They are fundamentally a question of maintaining some basic good habits. Everything else flows from these three habits:

1. separate _what_ you write/create from _how_ you write it.
2. keep what you write/create under version control.
3. break tasks down into their smallest manageable bits.

Have you ever fought with Word or another word processor, trying to get things just right? Word processing is a mess. It conflates writing with typesetting and layout. Sometimes, you just want to get the words out. Other times, you want to make your writing as accessible as possible… but your intended recipient can’t open your file, because they don’t use the same word processor. Or perhaps you wrote up some great notes that you’d love to have in a slideshow; but you can’t, because copying and pasting preserves a whole lot of extra gunk that messes up your materials. Similarly, while many archaeologists will use Microsoft Excel to manipulate tabular data (artifact measurements, geochemistry data, and so on), Excel is well known for both corrupting data and for being impossible to replicate (i.e., the series of clicks to manipulate or perform an analysis differ depending on the individual’s particular installation of Excel).

The answer is to separate your content from your tool, and to keep your analytical processes separate from your data. This can help keep your thinking clear, but it also has a more nuts-and-bolts practical dimension: a computer will always be able to read a text file. That is to say, you’ve futureproofed your material. Any researcher will have old physical discs or disc drives or obsolete computers lying around. It is not uncommon for a colleague to remark, ‘I wrote this in WordPerfect and I can’t open this any more’. Graham’s MA thesis is trapped on a 3.5″ disc that was compressed using a now-obsolete algorithm, and it cannot be recovered. If, on the other hand, he had written the text as a .txt file and saved the data as .csv tables, those materials would still be accessible. If the way you have manipulated or cleaned the data is written out as a script, then a subsequent investigator (or even your future self) can re-run the exact sequence of analysis, or re-write the script into the equivalent steps in another analytical language.
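To make that point concrete, here is what such a cleaning script might look like. It is a minimal sketch with hypothetical file names and column headings (id, material, length_mm); the point is simply that every transformation is written down and re-runnable.

```python
# clean_artifacts.py -- a small, repeatable data-cleaning script.
# Unlike a sequence of clicks in Excel, every step is recorded here,
# so a future investigator can re-run (or audit) the exact process.
import csv

def clean(in_path, out_path):
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    cleaned = []
    for row in rows:
        # normalise the material descriptions: lowercase, no stray spaces
        row["material"] = row["material"].strip().lower()
        # drop any record that is missing its measurement
        if row["length_mm"]:
            row["length_mm"] = float(row["length_mm"])
            cleaned.append(row)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "material", "length_mm"])
        writer.writeheader()
        writer.writerows(cleaned)

# e.g. clean("artifacts_raw.csv", "artifacts_clean.csv")
```

Both the input and output remain plain .csv text files, so the whole pipeline stays readable decades from now, with or without this particular version of Python.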

A .txt file is simply a text file; a .csv is a text file that uses commas to separate the text into columns. Similarly, a .md file is a text file that uses things like # to indicate headers, and _ to show where italicized text starts and stops. A script, in a play, tells you what to say and do. A script for a language like R or Python does the same thing for the computer, and has the advantage that it is human-readable and annotatable as well, because its format is still a simple text file. Scripts you might encounter could have the .r or .py or .sh file extensions. You can open these in a text editor and see what the computer is being instructed to do. Annotations or comments in the script can be set off in various ways, and help the researcher know what is happening or is intended to happen at various points. Let’s begin by creating some simple text files to document our research process, in the Markdown format.

  1. A nice place to practice writing in markdown is an online editor with live preview, which shows you immediately how your text might be rendered when turned into html, pdf, or a Word doc. Try one now: write a short piece on why you’re interested in digital archaeology.
      1. Include a blockquote from the introduction to this book.
      2. Include two links to an external site.
      3. Embed an image.
  2. In ODATE at the command line, you’ll now write a markdown file using the built-in text editor nano. Everywhere in this book where you see the $ symbol, we mean for you to type whatever follows after the $ at the command line. The $ is the prompt. Enter the command $ nano (or any other file name ending in .md). This tells the machine to use the nano text editor to make a new file with that name and to allow you to edit it right away. (If you just wanted to create an empty file, you could use $ touch The screen that opens is a simple text editor. Re-write what you wrote for exercise 1, but add subheadings this time. To save your work, hit ctrl+x. Nano will ask you if you want to save changes; select y. It will then prompt you for a file name, but the name you chose will already be inserted there, so just hit enter.
  3. Make a new markdown file called ‘todo list’ or similar. Use bullet points to break down what else you need to do this week. Each bullet point should have a sub-bullet with an actual ACTION listed, something that you can accomplish to get things done.

As you work through this book, we encourage you to write your thoughts, observations, or results in simple text files. This is good practice whether or not you embark on a full-blown digital project, because ultimately, if you use a computer in your research, you have gone digital.
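Even this habit can be supported with a couple of lines of code. The sketch below (the file name and heading format are arbitrary choices, not a prescribed convention) appends a dated entry to a running Markdown notebook:

```python
# Append a dated, Markdown-formatted note to a plain-text research log.
from datetime import date

def log_note(text, path="notebook.md"):
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"## {date.today().isoformat()}\n\n{text}\n\n")

# e.g. log_note("Re-ran the cleaning script; two records dropped for missing measurements.")
```

Because the log is just a text file, it stays readable anywhere, and because each entry is dated and headed, it converts cleanly into a blog post or an open notebook page later.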


Goldstone, Andrew. 2018. “Teaching Quantitative Methods: What Makes It Hard (in Literary Studies).” In Debates in the Digital Humanities.

Mullen, Lincoln. 2017. “A Confirmation of Andrew Goldstone on ‘Teaching Quantitative Methods’.” The Backward Glance.

Ethan Watrall. 2017. “Archaeology, the Digital Humanities, and the ‘Big Tent’.” In Debates in the Digital Humanities, 2016 ed. Accessed February 23.

Deetz, James. 1965. The Dynamics of Stylistic Change in Arikara Ceramics. Urbana: University of Illinois Press.

Shawn Graham. 2014. “A Digital Archaeology of Digital Archaeology: Work in Progress.”

Graham, Shawn, Scott Weingart, and Ian Milligan. 2012. “Getting Started with Topic Modeling and MALLET.” Programming Historian, September.

Evans, Thomas L., and Patrick T. Daly, eds. 2006. Digital Archaeology: Bridging Method and Theory. London; New York: Routledge.

Kansa, Eric C., Sarah Whitcher Kansa, and Ethan Watrall. 2011. Archaeology 2.0: New Approaches to Communication and Collaboration. Cotsen Digital Archaeology.

tjowens. 2012. “Discovery and Justification Are Different: Notes on Science-Ing the Humanities.” Trevor Owens.

Ramsay, Stephen. 2011. Reading Machines: Toward an Algorithmic Criticism. 1st ed. Urbana: University of Illinois Press.

Samuels, Lisa, and Jerome J. McGann. 1999. “Deformance and Interpretation.” New Literary History 30 (1): 25–56. doi:10.1353/nlh.1999.0010.

Caraher, William. 2012. “Archaeological Glitch Art.” The Archaeology of the Mediterranean World.

Graham, Shawn. 2017. “Cacophony: Bad Algorithmic Music to Muse To.”

Montello, Daniel R., Sara Irina Fabrikant, Marco Ruocco, and Richard S. Middleton. 2003. “Testing the First Law of Cognitive Geography on Point-Display Spatializations.” In International Conference on Spatial Information Theory, 316–31. Springer.

Liu, Alan. 2004. The Laws of Cool: Knowledge Work and the Culture of Information. 1st ed. Chicago: University of Chicago Press.

On Writing with Bookdown

How do you collaborate remotely with co-authors? How do you make sure that what you write is as accessible as possible to as many people as possible, and do so such that your readers become collaborators as well?

When Ian, Scott and I wrote The Macroscope, we tried a combination of a CommentPress-enabled WordPress site, coupled with a private chat instance and a Dropbox folder filled with Word doc files. We had initially tried Scrivener with a GitHub repo (since a Scrivener project is ultimately a stack of rtf files), but we ran into sync problems, which were partly caused by using Scrivener on Windows and Mac at the same time, and… well, it didn’t work off the bat. You can read more about that experience over on the AHA.

For my current project, the Open Digital Archaeology Textbook Environment (ODATE for short; I’m rubbish at a) names and b) acronyms), Neha, Michael, Beth and I are trying to use Slack to manage our conversations, writing instead in a series of markdown files, and using a repo to manage the collaboration, resolve conflicts, etc. Right now this is going swimmingly because for various reasons the rest of the gang can’t devote any time to this project. For my part, I’m trying to get the infrastructure and preliminary writing set up so we can hit the ground running in around mid-March.

Given our requirements outlined above, I looked around at a couple of different platforms, settling eventually on Bookdown. I like Bookdown because I can write in whatever editor strikes my fancy (switching into R to do the build) and it will give me a complete flat website (*with* search), an epub, and a PDF, grabbing my citations from a BibTeX file and formatting them appropriately (we have a collaborative Zotero library for adding to while we write/research; export to BibTeX, boom!). Right now, I’m writing within RStudio. With some minimal tweaking, it also allows me to build in collaborative annotation (and via the Hypothesis API, I plan on collating annotations periodically to guide revision and also perhaps to build user appendices into the book, but that idea is still nebulous at the moment). I’ve also run the resulting website from Bookdown through various web accessibility tools, and the report comes back fairly positive. With some more tweaking, I think I can make the final product super-accessible, or at the very least produce PDFs that screenreaders and so on can work with.

Getting Bookdown set up was not without its idiosyncrasies. The RStudio version has to be the preview release. Then:

  1. create a new project in RStudio (do this in a brand new folder)
  2. install Bookdown by running install.packages("bookdown") in the R console
  3. create a new text file with metadata that describe how the book will be built. The metadata is in a format called YAML (originally ‘yet another markup language’) that uses keys and values that get passed into other parts of Bookdown:
title: "The archaeology of spoons"
author: "Graham, Gupta, Carter, & Compton"
date: "July 1 2017"
description: "A book about cocleararchaeology."
github-repo: "my-github-account/my-project"
cover-image: "images/cover.png"
url: 'https\://my-domain-ex/my-project/'
bibliography: myproject.bib
biblio-style: apa-like
link-citations: yes

This is the only thing you need to have in this file, which is saved in the project folder as index.Rmd.

  1. Write! We write the content of this book as text files, saving the parts in order. Each file should be numbered 01-introduction.Rmd, 02-a-history-of-spoons.Rmd, 03-the-spoons-of-site-x.Rmd and so on.
  2. Build the book. With Bookdown installed, there will be a ‘Build Book’ button in the R Studio build pane. This will generate the static html files for the book, the pdf, and the epub. All of these will be found in a new folder in your project, _book. There are many more customizations that can be done, but that is sufficient to get one started.
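Under the hood, Bookdown stitches the chapter files together in filename order, which is why those numeric prefixes matter. A minimal sketch of that ordering logic (the filenames are the hypothetical spoon-book ones from above; this is my gloss on the convention, not Bookdown’s actual code):

```python
# Bookdown-style chapter ordering: index.Rmd comes first, then the
# remaining .Rmd files sorted by filename, so numeric prefixes
# control the structure of the book.
files = ["03-the-spoons-of-site-x.Rmd", "01-introduction.Rmd",
         "index.Rmd", "02-a-history-of-spoons.Rmd"]
chapters = ["index.Rmd"] + sorted(f for f in files if f != "index.Rmd")
```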

To get Hypothesis working, we have to modify the _output.yml file so that the extra header file gets pulled into the gitbook output:

bookdown::gitbook:
  includes:
    in_header: hypothesis.html

That html file contains just a single line: the script src tag for embedding Hypothesis (see this guidance).

You end up with a Gitbook-looking site, but without any of the gitbook editor flakiness. Have fun! Right now, we’re using a private Hypothesis group too to leave feedback on the various bits that have been written, so hopefully this will make for a smoother collaborative experience. Stay tuned.

ODATE: Open Digital Archaeology Textbook Environment (original proposal)

“Never promise to do the possible. Anyone could do the possible. You should promise to do the impossible, because sometimes the impossible was possible, if you could find the right way, and at least you could often extend the limits of the possible. And if you failed, well, it had been impossible.”
Terry Pratchett, Going Postal

And so we did. And the proposal Neha, Michael, Beth, and I put together was successful. The idea we pitched to eCampusOntario is an open textbook with an integral computational laboratory (DHBox!) for teaching digital archaeology. The work of the DHBox team, and their generous licensing of their code, makes this entire project possible: thank you!

We put together a pretty ambitious proposal. Right now, we’re working towards designing the minimum viable version of this. The original funding guidelines didn’t envision any sort of crowd-collaboration, but we think it’d be good to figure out how to make this less us and more all of you. That is, maybe we can provide a kernel that becomes the seed for development along the lines of the Programming Historian.

So, in the interests of transparency, here’s the meat-and-potatoes of the proposal. Comments & queries welcome at the bottom, or, if I forget to leave that open, on Twitter @electricarchaeo.


Project Description

We are excited to propose this project to create an integrated digital laboratory and e-textbook environment, which will be a first for the broader field of archaeology.

Digital archaeology as a subfield rests upon the creative use of primarily open-source and/or open-access materials to archive, reuse, visualize, analyze and communicate archaeological data. Digital archaeology encourages innovative and critical use of open access data and the development of digital tools that facilitate linkages and analysis across varied digital sources. 

To that end, the proposed ‘e-textbook’ is an integrated cloud-based digital exploratory laboratory of multiple cloud-computing tools with teaching materials that instructors will be able to use ‘out-of-the-box’ with a single click, or to remix as circumstances dictate.

We are proposing to create in one package both the integrated digital exploratory laboratory and the written texts that engage the student with the laboratory. Institutions may install it on their own servers, or they may use our hosted version. By taking care of the digital infrastructure that supports learning, the e-textbook enables instructors and students to focus on core learning straight away. We employ a student-centred, experiential, and outcome-based pedagogy, where students develop their own personal learning environment (via remixing our tools and materials provided through the laboratory) networked with their peers, their course professors, and the wider digital community.

Project Overview

Digital archaeology as a field rests upon the creative use of primarily open-source and/or open-access materials to archive, reuse, visualize, analyze and communicate archaeological data. This reliance on open-source and open-access is a political stance that emerges in opposition to archaeology’s past complicity in colonial enterprises and scholarship; digital archaeology resists the digital neo-colonialism of Google, Facebook, and similar tech giants that typically promote disciplinary silos and closed data repositories. Specifically, digital archaeology encourages innovative, reflective, and critical use of open access data and the development of digital tools that facilitate linkages and analysis across varied digital sources. 

To that end, the proposed ‘e-textbook’ is an integrated cloud-based digital exploratory laboratory of multiple cloud-computing tools with teaching materials that instructors will be able to use ‘out-of-the-box’ with a single click, or to remix as circumstances dictate. The Open Digital Archaeology Textbook Environment will be the first of its kind to address methods and practice in digital archaeology.

Part of our inspiration comes from the ‘DHBox’ project from CUNY (City University of New York), a project that is creating a ‘digital humanities laboratory’ in the cloud. While the tools of the digital humanities are congruent with those of digital archaeology, they are typically configured to work with texts rather than the material culture in which archaeologists specialise. The second inspiration is the open-access guide ‘The Programming Historian’, a series of how-tos and tutorials pitched at historians confronting digital sources for the first time. A key challenge scholars face in carrying out novel digital analysis is how to install or configure software; each ‘Programming Historian’ tutorial therefore explains at length and in detail how to configure software. The present e-textbook merges the best of both approaches to create a singular experience for instructors and students: a one-click digital laboratory approach, where installation of materials is not an issue, and with carefully designed tutorials and lessons on theory and practice in digital archaeology.

The word ‘e-textbook’ will be used throughout this proposal to include both the integrated digital exploratory laboratory and the written texts that engage the student with it, along with the supporting materials. This digital infrastructure includes the source code for the exploratory laboratory, so that faculty or institutions may install it on their own servers, or they may use our hosted version. This accessibility is a key component, because one instructor alone cannot be expected to provide technical support across multiple operating systems on student machines whilst still bringing the data, tools and methodologies together in a productive manner. Moreover, at present, students in archaeology do not necessarily have the appropriate computing resources or skill sets to install and manage the various kinds of server-side software that digital archaeology typically uses. Thus, all materials will be appropriately licensed for maximum re-use. Written material will be provided as source markdown-formatted text files (this allows for the widest interoperability across platforms and operating systems; see sections 9 and 10). By taking care of the digital infrastructure that supports learning, the e-textbook enables instructors and students to focus on core learning straight away.

At our e-textbook’s website, an instructor will click once to ‘spin up’ a digital laboratory accessible within any current web browser, a unique version of the laboratory for that class, at a unique URL. At that address, students will select the appropriate tools for the tasks explored in the written materials. Thus, valuable class time is directed towards learning and experimenting with the material rather than installing or configuring software.

The e-textbook materials will be pitched at an intermediate level; appropriate remixing of the materials with other open-access materials on the web will allow the instructor to increase or decrease the learning level as appropriate. Its exercises and materials will be mapped to a typical one-semester time frame.


Digital archaeology sits at the intersection of the computational analysis of human heritage and material culture, and rapidly developing ecosystems of new media technologies. Very few universities in Ontario have digital archaeologists as faculty, and thus digital archaeology courses are rarely offered as part of their roster. Of the ten universities in Ontario that offer substantial undergraduate and graduate programs in archaeology, only three (Western, Ryerson and Carleton) currently offer training in digital methods. Training in digital archaeology is offered on a per-project level, most often in the context of Museum Studies, History, or Digital Media programs. Yet growing numbers of students demand these skills, often seeking out international graduate programs in digital archaeology. This e-textbook therefore would be a valuable resource for this growing field, while simultaneously building on Ontario’s leadership in online learning and Open Educational Resources. Moreover, the data and informatics skills that students could learn via this e-textbook, as well as the theoretical and historiographical grounding for those skills, see high and growing demand, which means that this e-textbook could find utility beyond the anthropology, archaeology, and cultural heritage sectors.

Our e-textbook would arrive at an opportune moment to make Ontario a leading centre for digital archaeological education. Recently, the provincial government has made a vast public investment in archaeology by creating ‘Sustainable Archaeology’, a physical repository of Ontario’s archaeological materials and centre for research. While growing amounts of digitized archaeological materials are being made available online via data publishers such as Open Context, and repositories such as tDAR, DINAA and ADS, materials for teaching digital archaeology have not kept pace with the sources now available for study (and print-only materials go out of date extremely quickly). Put simply, once archaeological material is online, we face the questions of “so what?” and “now what?” This e-textbook is about data mining the archaeological database, reading distantly thousands of ‘documents’ at once, graphing, mapping, and visualizing what we find, and working out how best to communicate those findings. It is about writing archaeology in digital media that are primarily visual. Thus, through the e-textbook, students will learn how to collect and curate open data, how to visualize meaningful patterns within digital archaeological data, and how to analyze them.

Furthermore, this e-textbook has two social goals:

  1. It agitates for students to take control of their own digital identity, and to think critically about digital data, tools and methods. This in turn, can enable them to embody open access principles of research and communication.
  2. It promotes the creation, use and re-use of digital archaeological data in meaningful ways that deepen our understanding of past human societies.

Research materials that are online do not speak for themselves, nor are they necessarily findable or ‘democratized’. To truly make access democratic, we must equip scholars with “digital literacy” — the relevant skills and theoretical perspectives that enable critical thinking. These aims are at the heart of the liberal arts curriculum. We know that digital tools are often repurposed from commercial services and set to work for research ends in the social sciences and liberal arts. We are well aware that digital tools inherently emphasize particular aspects of data, making some more important than others. Therefore, it is essential that students think critically about the digital tools they employ. What are the unintended consequences of working with these tools? There is a relative dearth of expertise in critically assessing digital tools, and in seeing how their biases (often literally encoded in how they work) can impact the practice of archaeology.

To that end, we employ a student-centred, experiential, and outcome-based pedagogy, where students develop their own personal learning environment (via remixing our tools and materials provided through the laboratory) networked with their peers, their course professors, and the wider digital community.

Content Map

E-textbook Structure (instructional materials to support the digital exploratory laboratory)

The individual pieces (files and documents) of this e-textbook will all be made available using the distributed Git versioning control software (via Github). This granularity of control will enable interested individuals to take the project to pieces to reuse or remix those elements that make the most sense for their own practice. Since the writing is in the markdown text format, learners can create EPubs, PDFs, and webpages on-demand as necessary, which facilitates easy reuse, remixing and adaptation of the content. The granularity of control also has the added bonus that our readers/users can make their own suggestions for improvement of our code and writing, which we can then fold into our project easily. In this fashion our e-textbook becomes a living document that grows with its use and readership.

Introduction. Why Digital Archaeology?

Part One: Going Digital

  1. Project management basics
    1. Github & Version control
    2. Failing Productively
    3. Open Notebook Research & Scholarly Communication
  2. Introduction to Digital Libraries, Archives & Repositories
    1. Command Line Methods for Working with APIs
    2. Working with Open Context
    3. Working with Omeka
    4. Working with tDAR
    5. Working with ADS
  3. The Ethics of Big Data in Archaeology

The digital laboratory elements in this part enable the student to explore versioning control, a bash shell for command line interactions, and an Omeka installation.

Part Two: Making Data Useful

  1. Designing Data Collection
  2. Cleaning Data with OpenRefine
  3. Linked Open Data and Data publishing

The digital laboratory elements in this part continue to use the bash shell, as well as OpenRefine.

Part Three: Finding and Communicating the Compelling Story

  1. Statistical Computing with R and Python Notebooks; Reproducible code
  2. D3, Processing, and Data Driven Documents
  3. Storytelling and the Archaeological CMS: Omeka, Kora
  4. Web Mapping with Leaflet
  5. Place-based Interpretation with Locative Augmented Reality
  6. Archaeogaming and Virtual Archaeology
  7. Social media as Public Engagement & Scholarly Communication in Archaeology

The digital laboratory elements in this part include the bash shell, Omeka (with the Neatline mapping installation) and Kora installations, mapwarper, RStudio Server, Jupyter notebooks (python), Meshlab, and Blender.

Part Four: Eliding the Digital and the Physical

  1. 3D Photogrammetry & Structure from Motion
  2. 3D Printing, the Internet of Things and “Maker” Archaeology
  3. Artificial Intelligence in Digital Archaeology (agent models; machine learning for image captioning and other classificatory tasks)

The digital laboratory elements in this part include Wu’s Visual Structure from Motion package, and the TORCH-RNN machine learning package.

Part Five: Digital Archaeology’s Place in the World

  1. Marketing Digital Archaeology
  2. Sustainability & Power in Digital Archaeology

To reiterate, the digital laboratory portion of the e-textbook will contain within it a file manager; a bash shell for command line utilities (useful tools for working with CSV and JSON formatted data); a Jupyter Notebook installation; an RStudio installation; VSFM structure-from-motion; Meshlab; Omeka with Neatline; Jekyll; Mapwarper; Torch for machine learning and image classification. Other packages may be added as the work progresses. The digital laboratory will itself run on a Linux Ubuntu virtual machine. All necessary dependencies and packages will be installed and properly configured. The digital laboratory may be used from our website, or an instructor may choose to install locally. Detailed instructions will be provided for both options.

Cacophony: Bad Algorithmic Music to Muse To

I was going to release this as an actual album, but I looked into the costs and it was a wee bit too pricey. So instead, let’s pretend this post is shiny vinyl, and you’re about to read the liner notes and have a listen.


Track 1. Listening to Watling Street

To hear the discordant notes as well as the pleasing ones, and to use these to understand something of the unseen experience of the Roman world: that is my goal. Space in the Roman world was most often represented as a series of places-that-come-next; traveling along these two-dimensional paths replete with meanings was a sequence of views, sounds, and associations. In Listening to Watling Street, I take the simple counts of epigraphs in the Roman Inscriptions of Britain website discovered in the modern districts that correspond with the Antonine Itinerary along Watling Street. I compare these to the total number of inscriptions for each county. The algorithm then selects instruments, tones, and durations according to a kind of auction based on my counts, and stitches them into a song. As we listen, we hear crescendos and diminuendos that reflect a kind of place-based shouting: here are the places that are advertising their Romanness, that have an expectation to be heard (Roman inscriptions quite literally speak to the reader); as Western listeners, we have also learned to interpret such musical dynamics as implying movement (emotional, physical) or importance. The same itinerary can then be repeated using different base data – coins from the Portable Antiquities Scheme database, for instance – to generate a new tonal poem that speaks to the economic world and, perhaps, the insecurity of that world (for why else would one bury coins?).

Code: This song re-uses Brian Foo’s 2 Trains code.
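The core move in that code – turning raw counts into musical dynamics – can be sketched in a few lines (my gloss on the idea, not Foo’s implementation; the velocity range is an arbitrary choice):

```python
def counts_to_velocities(counts, v_min=30, v_max=120):
    """Scale raw inscription counts to MIDI velocities, so that
    places with more epigraphy literally sound louder."""
    peak = max(counts)  # assumes at least one non-zero count
    return [round(v_min + (c / peak) * (v_max - v_min)) for c in counts]

# e.g. hypothetical inscription counts along an itinerary
velocities = counts_to_velocities([2, 15, 7, 40])
```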


Track 2: Mancis the Poet. (Original blog post.) A neural network trained on Cape Breton fiddle tunes in ABC notation; the output was then sonified. One site converts ABC notation to a MIDI file and makes a pdf of the score; another converts that to mp3, which I then uploaded to SoundCloud.


Track 3: John Adams’ 20 (original post here). Topic modeling does some rather magical things. It imposes sense (it fits a model) onto a body of text. The topics that the model duly provides give us insight into the semantic patterns latent within the text. What I like about sonification is that the time dimension becomes a significant element in how the data is represented, and how the data is experienced. So – let’s take a body of text, in this case the diaries of John Adams. I scraped these, one line per diary entry (see the csv we prepped for our book, the Macroscope). I imported it into R and topic modeled for 20 topics. The output is a monstrous csv showing the proportion each topic contributes to each diary entry (so each row adds up to 1). If you use conditional formatting in Excel, and dial the decimal places to 2, you get a pretty good visual of which topics are the major ones in any given entry (the really minor ones just round to 0.00, so you can ignore them). I then used ‘Musical Algorithms’, one column at a time, to generate a midi file. I’ve got the various settings in a notebook at home; I’ll update this post with them later. I then uploaded each midi file (all twenty) into GarageBand in the order of their complexity – that is, as indicated by file size.
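The mapping from a topic’s column of proportions to pitches can be sketched as a simple linear scaling – a paraphrase of the idea, not the actual Musical Algorithms settings I used (those are in the notebook at home):

```python
def proportions_to_pitches(props, low=48, high=84):
    """Map topic proportions (0..1) to MIDI note numbers: diary
    entries where the topic dominates sit higher in the register."""
    return [round(low + p * (high - low)) for p in props]

# one topic's share of three hypothetical diary entries
pitches = proportions_to_pitches([0.02, 0.45, 0.90])
```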


Track 4: Jesuit Funk

The topic modeling approach seemed promising. I took the English translation of the complete Jesuit Relations, fitted a topic model, and then set about sonifying it. This time, I explored the live-coding music environment Sonic Pi, but focussed on one topic only.



Track 5: PunctMoodie (original blog post here). There was a fashion, for a time, to create posters of famous literary works as represented only by their patterns of punctuation. I used this code to reduce Susanna Moodie’s ‘Roughing it in the Bush‘ to its punctuation. Then I mapped the punctuation to its numeric ASCII values, and fed the result into Sonic Pi.
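The punctuation-to-ASCII step is trivial to reproduce; a sketch of the gist (the code I actually used is linked above):

```python
import string

def punctuation_signal(text):
    """Strip a text down to its punctuation marks, then map each
    mark to its ASCII value, ready to feed to Sonic Pi as notes."""
    return [ord(c) for c in text if c in string.punctuation]

signal = punctuation_signal("Roughing it, in the bush; or, life in Canada!")
```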

Bonus track! Disco version:


Track 6: Human Bone Song. I have scraped several thousand photos from Instagram for a study on the trade in human remains facilitated by social media. I ran a selection (the first 1000) through Manovich’s ImagePlot; first voice is brightness, second is saturation, third (drums) is hue. Sonified with


Track 7: Song of Dust and Ashes. Same rough procedure as before, but with site photos from the Kenan Tepe excavations archived in Open Context. Sonified, then mixed in GarageBand. First pass.


Track 8: Kenentepe Colours – same data as track 7, but I used a Fibonacci series to perform the duration mapping. Everything else was via the presets. Instrumentation via GarageBand.
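The Fibonacci duration mapping is simple enough to sketch – successive notes get lengths drawn from the Fibonacci series (my reconstruction of the idea; the rest of the mapping ran through the presets):

```python
def fibonacci_durations(n):
    """First n Fibonacci numbers, used as note durations in beats."""
    a, b = 1, 1
    out = []
    for _ in range(n):
        out.append(a)
        a, b = b, a + b
    return out

durations = fibonacci_durations(6)
```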


Track 9: Bad Equity. I wondered if sonification could be useful in detecting patterns in bad OCR of historical newspapers. I grabbed 1500 editions of The Shawville Equity, 1883–1914, using wget, from the provincial archives of Quebec. Then I measured the number of special characters in each OCR’d txt file, taking the presence of these as a proxy for bad OCR, and added them up for each file. Then to musicalgorithms, with the defaults. Then, because I’m a frustrated musician (and a poor one, at that), I threw a beat on it for the sake of interest. Read the full discussion and code over here.
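The ‘special characters as a proxy for bad OCR’ measure is easy to reproduce; a sketch (the character whitelist here is my assumption – tune it to taste, and see the linked post for the code I actually ran):

```python
import re

def ocr_noise(text):
    """Count characters outside letters, digits, whitespace and
    ordinary punctuation: a rough proxy for OCR garbling."""
    return len(re.findall(r"[^A-Za-z0-9\s.,;:'\"!?()\-]", text))

# a clean line scores 0; a garbled one scores per stray glyph
scores = [ocr_noise(line) for line in
          ["The Shawville Equity", "Th@ Sh#wv*lle Equity"]]
```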

Track 10: Lullaby

I wrote this one for my kids. It’s not algorithmic in any way. That’s me playing.

(featured image, British Library)

The OpenContext & Carleton Prize for Archaeological Data Visualization

We are pleased to announce that the 1st OpenContext & Carleton University Data Visualization Prize has been awarded to the ‘Poggio Civitate VR Data Viewer’, created by the team led by Russell Alleen-Willems.

The team hacked this data viewer together over a weekend as a proof-of-concept. In the typical spirit of the digital humanities and digital archaeology, they developed a playful approach to exploring the materials, using the potential of the HTC Vive SDK to ingest Open Context data as json and then place it into a relative 3d space. We particularly appreciated their candour and self-assessment of what worked, and what didn’t, about their project, and their plans for the future. We look forward to seeing their work progress, and hope that this prize will help them move forward. Please explore their project at .

Congratulations to the team, and thank you to all who participated. Please keep your eyes peeled for next year’s edition of the prize!

The team members are:

  • Russell Alleen-Willems (Archaeology domain knowledge, Unity/C# Scripting)
  • Mader Bradley (JSON Data Parsing/Unity Scripting)
  • Jeanpierre Chery (UX Design, Unity/C# Scripting)
  • Blair Lyons (Unity/C# Scripting)
  • Aileen McGraw (Instructional Design and Program Storytelling)
  • Tania Pavlisak (3D modeler)
  • Jami Schwarzwalder (Git Management, Team Organization, and Social Media)
  • Paul Schwarzwalder (Unity/C# Scripting)
  • Stephen Silver (Background Music)

Scraping Instagram with R, with PHP

I’ve had reason lately to be collecting information regarding the sale of human remains online, in various places. One such is Instagram.

Working with Instagram is not straightforward. One approach I had been using was a package for R called ‘instaR‘ by Pablo Barbera. It worked great, after some initial confusion on my part about how to get the damned thing to authenticate (which involves setting up an Instagram developer’s account, etc etc.). Then, in the middle of last year, Instagram changed its developer API rules *such that* the only data I could access with the API *was my own*. So, all that publicly exposed data, but no tool to grab it. (If you read the new ToS, you can only get wider access to the data approved if you’re commercializing your app – that is, commercialization is the approved reason for wanting the data – and ‘research’ is not an approved choice. I’m drawing on my memory here, not having the ToS in front of me at the moment.)

Long story short: no more data for me.

But then I came across a PHP library that did the trick, paging through the publicly displayed results. You can get it here. In what follows, I’m talking Mac; Windows folks, you’re on your own.

Getting it to work on this machine required installing Composer, a package manager for PHP, which I could do from the terminal. I didn’t initially realize that one could run a PHP file from the terminal prompt the same as one would run Python etc.:

$ php whatever.php

Who knew, eh? Anyway, with composer installed, the next hurdle is getting composer to do its damned job. Turns out this:

$ composer require raiym/instagram-php-scraper

actually had to be this:

$ composer.phar require raiym/instagram-php-scraper

The extra .phar probably means that I haven’t set something properly somewhere, but screw it. It works.

Then, it becomes a matter of writing the php to do what you want it to do, and piping the output to where it needs to go. In this, I found this post by Tom Woodward super helpful. End result:


<?php
require_once 'vendor/autoload.php';

use InstagramScraper\Instagram;

ini_set("display_errors", 1);

$tag = '_whatever_it_is_youre_looking_for_';

$medias = Instagram::getMediasByTag($tag, 3000); //sets the number of results returned
echo json_encode($medias, JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT); 

so, from the terminal line:

$ php myweescript.php > output.json

Then, in order for me to do the next stage of the affairs, I need to convert the json to csv. One can do it with jq but json2csv made life so much easier. Make sure to install it with command line options, like so:

$ npm install json2csv --save -g

And of course, you have to have npm and Node.js to make *that* work…
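If the Node toolchain is more dependency than you want, the json-to-csv flattening can also be sketched with Python’s standard library (assuming the JSON is a flat array of objects, which the scraper’s output roughly is after some munging; json2csv handles nesting more gracefully):

```python
import csv
import io
import json

def json_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text."""
    rows = json.loads(json_text)
    # union of keys across all rows, sorted for a stable header
    fieldnames = sorted({k for row in rows for k in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_text = json_to_csv('[{"id": 1, "tag": "skull"}, {"id": 2, "tag": "femur"}]')
```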

Anyway, good luck.