Archstopmo: the 2020 edition

How lovely that, with everything going on, some folks found the time to try their hand at archaeological stop motion! Let’s watch some films:

Buried Ship

Abby and Maggie Mullen write,

“We created this film because we like Vikings and we wanted to make something about the ocean. (Our team likes a lot of different archaeological sites, so we went through a lot of ideas before landing on this one!) We found that the two-minute limitation made it both easier and more challenging, because it’s difficult to communicate a complicated story in two minutes, with Legos, but that helped us narrow down our topic.

Our process started with research about different archaeological sites, and when we found two stories about different Viking ships found with GPR, we decided it could be fun to try to view the site from both above the ground and below it.

Our set designer painted our backdrops in watercolor and built the sets in Lego. We had to adjust the scale of our Lego models multiple times to make our photography work. We weren’t 100% successful, but an 8yo’s attention span is limited and we can’t exactly run out to the store right now to get more supplies.

We used an iPhone to take the photographs. We set it up on a tripod with a remote shutter to make it easier to keep it mostly in the same place. We then transferred our photos to a MacBook Pro and put the photos into iMovie to create the stop-motion. Our “silent film” text slides were created in PowerPoint, and we used a song from the YouTube Studio free music collection for our soundtrack.”

Comments on YouTube include, “I really liked this! It was so interesting AND beautiful. Really well done. It made me want to learn more!” and “Great information! I did not know that Viking ships had been found so recently from so long ago. I greatly enjoyed the scene settings and photography. The accompanying music was excellent.”

The Venus of Willendorf: an archaeological yarn

Karen Miller writes,

“As a traditional women’s craft, crochet is an apt sculptural method to recreate an iconic archaeological artefact that evokes the beauty of the female body. I was excited to find the pattern at Lady Crafthole’s ‘Cabinet of Crochet Curiosities’, https://www.crochetcuriosities.com/. I filmed it on an iPad with the Stop Motion Studio app https://apps.apple.com/au/app/stop-motion-studio/id441651297 and added the title and credits in iMovie.”

Archaeological Tea-construction

Beth Pruitt writes,

“This video is about methodological theory in archaeology, created for SAA’s Online Archaeology Week after the cancellation of the planned Austin Public Archaeology Day at the 2020 SAA Annual Meeting. Through observing the attributes of the rim sherd (its curvature, decoration, etc.), archaeologists can make inferences about the rest of the whole, even when pieces remain missing. This is based on an in-person activity that I do at public archaeology events to help visitors understand laboratory methods and induction. I used the app Stop Motion Studio for taking the frame photos and strung them together in the Windows 10 Photos app. I drew the animated overlays frame-by-frame in Inkscape.”

Jury Prizes

  • To Maggie and Abby Mullen, in the ‘Story of a Site’ category
  • To Karen Miller, in the ‘Biography of an Object’ category
  • To Beth Pruitt, in the ‘Archaeological Theory’ category

Best Overall and Choix du Peuple

To be announced May 4th! Cast your votes for the Choix du Peuple:

Update, Tuesday May 5th: with the polls closed, it looks like ‘Tea-Construction’ is the Choix du Peuple!

Searching Inside PDFs from the Terminal Prompt

I have reason, today, to want to search the Military Law Review. If you know which issue contains the info you’re looking for, you can just jump right in.

When do we ever know that? There’s no search-inside feature. So we’ll build one ourselves. After a bit of futzing, you can see that all of the pdfs are available in this one directory:

https://www.loc.gov/rr/frd/Military_Law/Military_Law_Review/pdf-files/

so

$ wget https://www.loc.gov/rr/frd/Military_Law/Military_Law_Review/pdf-files/ -A .pdf

should just download them all directly. But it doesn’t. However, you can copy the source html to a text editor, and with a bit of regex you end up with a file containing just the paths to each pdf. Pass that file to wget with -i urls.txt, and you end up with a corpus of materials.
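If you’d rather not do the regex by hand in a text editor, a rough one-liner can get you the same urls.txt. This is just a sketch, and it assumes the hrefs on that index page end in .pdf and are relative to the directory – check the actual source first, since every site is different:

$ curl -s https://www.loc.gov/rr/frd/Military_Law/Military_Law_Review/pdf-files/ \
  | grep -o 'href="[^"]*\.pdf"' \
  | sed -e 's/^href="//' -e 's/"$//' \
        -e 's|^|https://www.loc.gov/rr/frd/Military_Law/Military_Law_Review/pdf-files/|' \
  > urls.txt

$ wget -i urls.txt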

How do we search inside? This question on Stack Overflow will help us out. But it requires pdftotext to be installed. Sigh. Always dependencies! So, following this, here we go.

On the command line (with Anaconda installed):

conda create -n envname python=3.7
conda activate envname
conda config --add channels conda-forge
conda install poppler
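A quick check that the dependency actually landed somewhere your shell can see it:

$ pdftotext -v

If that prints version information rather than ‘command not found’, you’re in business.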

The pdfs are in a folder called ‘MLR’ on my machine. From one level up:

$ find ./MLR -name '*.pdf' -exec sh -c 'pdftotext "{}" - | grep --with-filename --label="{}" --color "trophies"' \;

et voilà!
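The search term is hard-coded in that command. If you expect to do this more than once, a small wrapper script makes it reusable for any term and keeps a copy of the hits – a quick sketch, with the script and output names being my own invention:

#!/bin/sh
# search.sh – search every pdf under MLR for a term, keeping the hits
# usage: ./search.sh "courts-martial"
TERM="$1"
find ./MLR -name '*.pdf' -exec sh -c \
  'pdftotext "$1" - | grep --with-filename --label="$1" --color "$2"' _ {} "$TERM" \; \
  | tee hits.txt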

a quick thought on library JSON

Read this post today: https://tomcritchlow.com/2020/04/15/library-json/

which seems very cool. In Tom’s ‘update 1’, he points to a parser that one of his readers wrote for this imagined spec; if you format your books according to Tom’s spec and point the parser at the file, you get a really cool interface on your materials: see https://bookshelves.ravern.co/shelf?url=https://tomcritchlow.com/library.json.

Anyway, the thought occurred that the Ruby script that inukshuk wrote in response to my query about adding materials to Tropy notes in JSON (full thread here, .rb file here) could easily be modified to produce the JSON from simple lists in txt.

So I might fart around with that.


Archstopmo: An Archaeology Stop Motion Movie Festival!

April 5 – 30th, with winners revealed May 4th.

Let’s have a movie festival. Also, I like stop-motion films – y’know, like the Wallace and Gromit films (and here’s an archaeological one I just found on YouTube using Playmobil!). So here’s what you do –

How?

  1. Make a stop motion film on one of the following themes:
    • a. archaeological theory
    • b. history of archaeology
    • c. the story of a site
    • d. the story of a dig
    • e. the biography of an object
  2. Make it two minutes in length
  3. It can be Lego, clay, paper cut-outs, whatever you like
  4. Upload it to YouTube
  5. Tweet it out with #archstopmo
  6. Check out Twitter under the #archstopmo hashtag
  7. Prepare an artist’s statement that explains and contextualizes your work, describing the software you used and your process
  8. Submit your film via the form below
  9. Have fun!

There will be a film gallery, updated frequently with entries and links to the artists’ statements.

Judging

Prizes (there are no prizes, only glory!) will be selected by a panel of judges, plus one audience choice.

  • Best in each category – five prizes
  • Choix des Judges – best overall
  • Choix du Peuple – best by voting

Submit Your Work


Featured image by Adi Suryanata on Unsplash: https://unsplash.com/photos/5T0bY-x9A8U

Ah, I See You Have A Policy: A Screenshot Essay on the Trade in Human Remains

Warning: There are many photographs of human remains in this post.

There is a literature on the online trade in human remains going back to at least Huxley and Finnegan’s 2004 piece on eBay in the Journal of Forensic Sciences, and since then several academics have been active in discussing the ethical, moral, and legal dimensions of this trade, producing a steady stream of articles. At the same time, the trade was transformed by the merging of social media with marketplace and ad-driven revenue models, expanding in scope and reach. Several platforms, over the last decade, have added wording to their prohibited categories of goods that deals with human remains. Let’s walk through some of that.

I found a copy of the World Archaeological Congress 2010 Newsletter in the Internet Archive, with this one line describing a human skull seen on Etsy, and WAC’s successful request to Etsy to remove the post.

The post was not in fact removed. And can still be found online.

It sold in 2011. What’s Etsy’s stance on human remains, anyway?

Etsy’s current policy on human remains. Such as it is. Human remains were added to the prohibited list in 2012.

The seller from 2010, still active, using a different skull as a prop. She’s still selling human remains, but now points people towards her Facebook page; since Etsy banned human remains, she wants you to send private messages if you’re interested. Facebook’s good for that sort of thing, eh? Private messaging, I mean.


Facebook says no human body parts or fluids.

But here’s a Facebook store selling… human remains.

We are not surprised to find human remains on Facebook. After all, Facebook owns Instagram, and there are any number of posts there selling human remains. Including this one. But wait, is that an Amazon box? Does Amazon have a human remains policy?


Yes, yes they do. And it seems a bit contradictory. And unenforced.

And it is trivial to find human remains being sold on Amazon. Like this skull. Displayed sideways, since the photo was taken with the seller’s cellphone.

Since I’m on wordpress.com, you might see advertisements interspersed in this essay. It will be interesting to see which advertisements WordPress matches to this post; it might even be hard to see the difference between those ads and these screencaptures.


eBay, 2012: the policy prohibits “humans, the human body, or any human body parts” but expressly permits “clean, articulated (jointed), non-Native American skulls and skeletons used for medical research” (Marsh, 2012, HuffPost). Today?

It was on eBay that we all (the archaeological ‘we’) first twigged that selling human remains online was lucrative and booming. While its policy has changed over the years, that policy is now admirably lucid and succinct. Did this tighter, stronger policy have any impact?

It is possible to find the ruins and remains of specialist eBay aggregator sites like this one in the Internet Archive. I spent quite a lot of time tracking down as many of these as I could, teasing out which posts were actually for human remains and which were for replicas or adjacent materials, scraping the data, and plotting it over time.

And I see three phases here. An early phase where a lot of money was changing hands (remember, these values are approximate indications rather than absolute totals; they give us a sense of the trend rather than the exact dollar number); a phase where the language is suddenly cagey about what precisely is being sold (the stand? or the skull? Remember the earlier wishy-washy policy of 2012?) and the volume drops; and then, from July 2016, eBay bans human remains outright. And human remains drop out of the aggregators completely. The ban – to judge from these numbers – worked. Graphs and underlying research: Graham, forthcoming.

Have we accomplished anything? eBay certainly has, I think, and that’s worth thinking about. Perhaps an auction site where sales also depend on reputation responds better to moral suasion than the other platforms do. When is it in a platform’s best interest to actually police its own policies?

Human remains are in a nebulous zone, legally. In Canada, the law to my mind seems pretty clear:

Section 182.B seems to cover it. These materials are human beings. Buying and selling humans interferes – at the very least! – with human dignity. I’m no lawyer, and I don’t think this has ever been tested in court. But: if a platform profits from a user’s breaking of the platform’s very own policies on human remains, if a platform turns a blind eye, is the platform not condoning the trade? Is this not a nudge-nudge wink-wink tacit approval of the trade? Who would want to invest in a platform that makes money from selling human beings? Should we not hold such a platform accountable?

See ACCO for more on various illicit and illegal trades happening across social media. For more on our project studying the trade in human remains, see bonetrade.github.io.

Posts referred to have also been saved to the Internet Archive.

a note on git-lfs

Sometimes I have files that are larger than GitHub’s 100 MB limit. So here’s what you need to do.

brew install git-lfs
brew upgrade git-lfs

Start a new git repository, and then make sure Git Large File Storage (git-lfs) is tracking the large file. For instance, I just moved a topic model visualization to a repo on GitHub (20,000 archaeological journal articles). It has a data CSV that is 135 MB. So I made a new repo on GitHub, but didn’t initialize it on the website. Instead, after getting git-lfs installed on my machine:

git init
git lfs track "20000/data/topic_words.csv"
git add .gitattributes 20000/data/topic_words.csv
git commit -m "initial"
git add .
git commit -m "the rest"
git remote add origin https://github.com/shawngraham/archae-topic-models.git
git push -u origin master
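If you want to confirm that the big file actually went through LFS rather than into the regular object store, git-lfs ships a couple of handy commands:

git lfs ls-files   # lists the files LFS is tracking in this repo
git lfs env        # shows the LFS endpoint and configuration

If your CSV shows up in ls-files, the push went the way you wanted.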

Making Nerdstep Music as Archaeological Enchantment, or, How do you Connect with People Who Lived 3000 Years Ago?

by Shawn Graham, Eric Kansa, Andrew Reinhard

What does data sound like?

Over the last few days, what began as a bit of a lark has transformed into something more profound and meaningful. We’d like to share it with you—not just the result, but also our process. And in what we’ve made, perhaps, we find a way of answering the title’s question: how do you connect with people who lived 3,000 years ago?

In the recent past, Shawn has become more and more interested in representing the patterns we might detect, at a distance, in the large collections of digital data that are becoming more and more available . . . using sound. Called ‘sonification’, this technique maps aspects of the information against things like timbre, scale, instrumentation, rhythm, and beats-per-minute to highlight aspects of the data that a visual representation might not pick up. It’s also partly about making something strange—we’ve become so used to visual representations of information that we don’t necessarily recognize the ways assumptions about it are encoded in the visual grammars of barcharts and graphs. By trying to represent historical information in sound, we have to think through all of those basic decisions and elaborate on their implications.

Last week, he was toying with mapping patterns of topics in publications from Scotland from the 18th and 19th centuries as sound, using an online app called ‘TwoTone’. He shared it on Twitter, and well, one thing led to another, and a conversation began between Shawn, Eric, and Andrew: What might archaeological data sound like?

Sing in me Muse, through thine API, of sherds and Munsell colors, of stratigraphic relations, and of linked thesauri URIs!

—Eric Kansa

Get Some Data

First things first: get some data. Open Context (Eric’s pet project) carefully curates and publishes archaeological data from all over the world. He downloaded 38,000 rows of data from the excavations at the Etruscan site of Poggio Civitate (where, in a cosmic coincidence, Andrew attended field school in 1991) and began examining it for fields that could be usefully mapped to various sonic dimensions. Ultimately, it was too much data! While there are a variety of ways of performing a sonification (see Cristina Wood’s Songs of the Ottawa, for instance), TwoTone only accepts 2,000 rows. The data used therefore for this audio experiment was very simple—counts of objects from Poggio Civitate were rendered as arpeggiated piano lines over three octaves; average latitude and average longitude were calculated for each class of thing thereby making a chord, and then each class of thing had its own unique value. Shawn’s initial result of data-driven piano sonification can be listened to here.

The four original dimensions of the sonification appear above, mapped in TwoTone. The rising notes in the bottom track are the item type ids. All of the materials come from the same chronological period, thus to listen (or view left-to-right) needed some sort of organizing principle. Whether or not it is the right principle is a matter of interpretation and debate.

Archaeology is a Remix

But what if an actual musician got hold of these tracks? Andrew recently published a work called ‘Assemblage Theory’ where he remixed found digital music in order to explore ideas of archaeological assemblages.[1] Taking his experimentation in electronic dance music (EDM) a step beyond Assemblage Theory, he took Shawn’s four original tracks based on Eric’s 3,000-year-old data and began to play, iterating through a couple of versions, in a genre he calls ‘nerdstep’. He crafted a 5-minute piece with movements that each isolate one of the four data threads, which sometimes crash together like waves of building data, yet remain linked together. He opted for 120 bpm, a dance music standard, and then, noting where the waves of data subside into quiet pools, was inspired to write some lyrics. “The quiet segues are basically data reflexivity in audio form,” he says.

Data propagation
All this information
Gives me a reaction
Need time for reflection

A one-way conversation
This endless computation
Numbs me from sensation
Need time for reflection

Reflexivity
Give me time to breathe
Give me time to think

Reflexivity
Data raining down on me

Emotionally exhausting
How much will this cost me
I’m alone but you are watching
Look up from your screen

Reflexivity
Give me time to breathe
Give me time to think
Look up from your screen.

Reinhard used the open source Audacity audio software application to create the song based on archaeological data sonification. The first four tracks are Shawn’s piano parts, staggered in such a way as to introduce the data bit-by-bit, and then merged with 16 other tracks—overburden or matrix. In the beginning, they are harmonious and in time, but because of subtle variations in bpm, by the time the song ends the data have become messy and frenetic, a reflection of the scattered pieces within the archaeological record, something that happens over time. Each movement in the song corresponds to an isolated data thread from one of Shawn’s piano parts, which then loops back in with the others to see how they relate.

Life is A Strange Loop

Speaking of loops, let’s think about the full loop we’ve encountered here. 3,000 years ago, at a plateau in the tufa landscape of southern Etruria, people lived their lives, only to have their debris carefully collected, studied, systematized, counted, digitized, and exposed online. No longer things but data, these counts and spaces were mapped to simple sonic dimensions using a web-toy, making a moderately pleasing experience. Remixed, the music moves us, enchants us, towards pausing and thinking through the material, the labour, the meanings, of a digital archaeology.[2] If/when this song is performed in a club (attn: John Schofield and the Theoretical Archaeology Groups [TAG] in both the UK and North America), the dancers would then be embodying our archaeological knowledge of Poggio in their movements, in the flows and subtle actions/reactions their bodies make across the floor. In dancing, we achieve a different kind of knowledge of the world, one that reconnects us with the physicality of the world.[3] The eruptions of deep time into the present [4] – such as those encountered at an archaeological site – are weird and taxing and require a certain kind of trained imagination to engage with. But by turning the data into music, we let go of our authority over imagination, and let the dancers perform what they know.

For the three of us as creators, this playful sonification of data allows us to see archaeological material with fresh eyes . . . errrrrr ears . . . and by doing so restores the enchantment we once felt at the start of a new project, or of being interested in archaeology in the first place. Restoring a sense of wonder to three middle-aged archaeologists is no small feat, but the act of play enabled us to approach a wealth of artifacts from one site we know quite well, and realize that we didn’t know it quite like this. The new music bridges the gap between humans past and present, and in dancing we (and hopefully you) embody the data we present. It’s a new connection to something old, and it is experienced by bodies. This is perhaps almost as intoxicating as the work done by Patrick McGovern (U. Penn) and Sam Calagione (Dogfish Head) in their experimentation and creation of ancient ales, the first of which was “Midas Touch”, a surprisingly drinkable brew concocted from an ancient recipe recovered on excavation in Asia Minor. Archaeology is often a cerebral enterprise, which deserves—at times—a good ass-shaking derived from a driving beat.

I’m listening now and am amazed. It is really beautiful, not only as a finished product, but as a process that started with people who lived their lives almost 3000 years ago.

—Eric Kansa

Reflexivity, by KGR [5]

Endnotes

[1] Reinhard’s article, “Assemblage Theory: Recording the Archaeological Record,” and two responses by archaeologists Jolene Smith and Bill Caraher.

[2] An argument made by Perry, Sara. (2019). The Enchantment of the Archaeological Record. European Journal of Archaeology, 22(3), 354-371. doi:10.1017/eaa.2019.24

[3] See for instance Block, Betty, and Judith Kissel (2001). Dance: The Essence of Embodiment. Theoretical Medicine and Bioethics 22(1), 5-15. DOI: 10.1023/A:1009928504969

[4] Fredengren, Christina (2016). Unexpected Encounters with Deep Time Enchantment. Bog Bodies, Crannogs and ‘Otherworldly’ sites. The materializing powers of disjunctures in time. World Archaeology 48(4), 482-499, DOI: 10.1080/00438243.2016.1220327

[5]  Kansa-Graham-Reinhard (pronounced as either “Cager” or “Kegger”—the GIF-debate of archaeological nerdstep/nerdcore).

References

Block, Betty, and Judith Kissel (2001). Dance: The Essence of Embodiment. Theoretical Medicine and Bioethics 22(1), 5-15. DOI: 10.1023/A:1009928504969

Caraher, William. (2019). “Assemblage Theory: Recording the Archaeological Record: Second Response” Epoiesen http://dx.doi.org/10.22215/epoiesen/2019.10

Fredengren, Christina (2016). Unexpected Encounters with Deep Time Enchantment. Bog Bodies, Crannogs and ‘Otherworldly’ sites. The materializing powers of disjunctures in time. World Archaeology 48(4), 482-499, DOI: 10.1080/00438243.2016.1220327

Perry, Sara. (2019). The Enchantment of the Archaeological Record. European Journal of Archaeology, 22(3), 354-371. doi:10.1017/eaa.2019.24

Reinhard, Andrew. (2019). “Assemblage Theory: Recording the Archaeological Record” Epoiesen http://dx.doi.org/10.22215/epoiesen/2019.1

Smith, Jolene. (2019). “Assemblage Theory: Recording the Archaeological Record: First Response” Epoiesen http://dx.doi.org/10.22215/epoiesen/2019.5

Tuck, Anthony (Ed.). (2012). “Murlo”. Released: 2012-07-06. Open Context. <http://opencontext.org/projects/DF043419-F23B-41DA-7E4D-EE52AF22F92F> DOI: https://doi.org/10.6078/M77P8W98 ARK (Archive): https://n2t.net/ark:/28722/k2222wm10

Featured Image by Sarthak Navjivan https://unsplash.com/photos/iTZOPe7BpTM

A Song of Scottish Publishing, 1671-1893

The National Library of Scotland has made available a collection of chapbooks printed in Scotland from 1671 to 1893, on their website here. That’s nearly 11 million words’ worth of material. The booklets cover an enormous variety of subjects. So, what do you do with it? Today, I decided to turn it into music.

As part of writing the second edition of the Historian’s Macroscope, I’ve been re-writing the topic modeling section, and I’ve included working with this collection, building a topic model for it using R. As part of that exercise, I preprocessed all the data so that it would be a bit easier for a newcomer to work with (it will be held in a GitHub repo for the purpose). Part of the preprocessing was adding a ‘publication date’ to the NLS-provided inventory file (which involved a whole bunch of command line regex etc. to grab that info from the METS metadata files).
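For the curious, the date-grabbing was less exotic than it sounds. Here’s a minimal sketch of the sort of thing involved, assuming the date lives in a mods:dateIssued element and the metadata files sit in a mets/ folder (both assumptions – the element names and layout in your own files may well differ, so look first):

# loop over the METS xml files, pull the first dateIssued value from each,
# and write out filename,date as a csv
for f in mets/*.xml; do
  date=$(grep -o '<mods:dateIssued[^>]*>[^<]*' "$f" | head -1 | sed 's/.*>//')
  echo "$f,$date"
done > dates.csv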

To turn this into sound, I used the Topic Modeling Tool to build a quick topic model on the 3,000+ text files containing the OCR’d text. The TMT can also match your metadata up against the topic results, which is very nice and handy, especially for turning the results into music, which I did with the TwoTone app. Drop the resulting CSV onto TwoTone, and your columns are ready to map to the music; the visualization is also handy for getting a sense of when your topics are most prominent (where the left hand side is my earliest date, and the right hand side is my latest date):

Then I played with the settings, filtering things so that notes are only played if they make a meaningful contribution to the entire year’s text.

You can listen to it on Soundcloud.

The piano arpeggios are mapped to a topic that seems largely to be bad OCR. The pipe organ corresponds to a topic about religion. The trumpet seems to be stories of people off to make their fortune (as I read the topic words for that topic). There’s a double bass in there, which I assigned to the ‘histories’ topic (because why not). The glockenspiel is assigned to the topic that I understand as ‘folk wisdom’, while the harp is mapped to stories of love and romance (and doomed love too, for that matter).

What do we learn doing this? Well, for one thing, it forces us to think about the constructedness of our ‘visualizations’, which is never a bad thing. It foregrounds how much dirty data is in this thing. It shows change over time in Scottish publishing habits (“we could have done that with a graph, Shawn!” to which I say: So what? Now you can engage a different part of your brain and feel that change over time.)

Enjoy.

Revisiting AR, some notes

I haven’t futzed around with AR in a while. Here are some notes from a recent foray back into Unity3D + Vuforia. My students Ayda & Marissa were trying to use ARIS to do some AR in the National Gallery. It worked in their tests with one image, and so they went ahead and developed several more triggers and overlays (which took a lot of term time), but when they went to play the AR, it would crash. I thought maybe it was a memory error, but after reformatting their images, triggers, and database, the crash continued.

We quickly ported it to Unity3D and Vuforia.

– when installing Unity, you now have to install Unity Hub. It’s from the Hub that you add the Android SDK and the Vuforia modules, if you forgot to add those initially.

– the Programming Historian tutorial on Unity and Vuforia is a bit out of date as a consequence

– the Unity quickstart guide is pretty good for getting going: https://docs.unity3d.com/Manual/vuforia_get_started.html

– Vuforia adds 3D objects to the tracking images. For an image overlay, right-click on the ImageTarget and select ‘Plane’ from 3D Object. Drag your image overlay from your Assets folder ONTO the plane.

– make sure your plane occupies the same spatial coordinates as your target. Otherwise, in AR, the image will float in ‘real’ space at those other coordinates.

– name your scene containing all of your ImageTargets ‘scene 1’

– make a new scene for your splash/menu. Call it ‘scene 0’

– make sure to add your scenes to the build settings, and that scene 0 is loaded BEFORE scene 1.

– this tutorial can be followed, more or less, to create the menu http://theflyingkeyboard.net/unity/unity-ui-c-simple-main-menu/

– you really just need the exit button, and the button that loads scene 1 when pressed.

An Enchantment of Digital Archaeology – a peek at the contents

What is more rational than a computer, reducing all phenomena down to tractable ones and zeros? What is more magical than a computer, that we tie our identities to the particular hardware or software machines we use?…Archaeology, as conventionally practiced, uses computation to effect a distancing from the world; perhaps not intentionally but practically. Its rituals (the plotting of points on a map; the carefully controlled vocabularies to encode the messiness of the world into a database and thence a report, and so on) relieve us of the task of feeling the past, of telling the tales that enable us to envision actual lives lived. The power of the computer relieves us of the burden of having to be human.

An enchanted digital archaeology remembers that when we are using computers, the computer is not a passive tool. It is an active agent in its own right (in the same way that an environment can be seen to be active)…In that emergent dynamic, in that co-creation with a non-human but active agent, we might find the enchantment, the magic of archaeology that is currently lacking in the field.

I might be a posthumanist.

An Enchantment of Digital Archaeology: Raising the Dead with Agent Based Models, Archaeogaming, and Artificial Intelligence with Berghahn Books in New York, in the series edited by Andrew Reinhard, ‘Digital Archaeology: Documenting the Anthropocene’, has now moved to the next stage of the publishing process. I signed the contract two years ago, got the first draft to the editor in June of this year, got the peer reviews back in September, rejigged the damned thing, rewrote parts, rearranged the structure, added new parts, and resubmitted it earlier this month. The peer reviews were incredibly generous, even when some parts or decisions on my end left them cold, and so of course the end result isn’t their fault, as the acknowledgements will duly note.

In one way or another, this is the book that I’ve been trying to write since I came to Carleton. I might’ve even outlined the idea for this book in my original application for the job. It’s always had a pedagogical aspect to it, even when it was called Toying with the Past (too negative) and then later Practical Necromancy (too scary). But what finally made things start to click were conversations with the folks at the University of York, who as a department seem to really gel around ideas of reflexivity, affective engagement, and plain ol’ out-there digital archaeology. I love them all!

The book is meant to take the reader through my own experience of disenchantment with archaeology, and then the ways I found myself re-enchanted through digital work. The intended audience is undergraduate students; I am not writing a how-to, but rather, I want to enthuse with the possibilities, to spark curiosity, and to fire the imagination.

Anyway, the publisher asked me to write abstracts for each chapter, and said I could share them here, so awwaaaaay we gooooo!

Introduction: An Enchantment of Digital Archaeology?

‘Enchantment’ is discussed, drawing on the political philosophy of Bennett, and contrasted with the ways archaeology comes to know the past. The rupture of the past into the present is one locus of enchantment. The chapter argues that simulation and related digital technologies capture something similar. A rationale for why simulation should be a necessary part of the archaeologist’s toolkit is offered. Considering enchantment means confronting disenchantment, and so prompts a reflective examination of the purpose of archaeology and archaeological computing. This kind of reflexive writing necessarily requires a very personal engagement with the materials. The chapter concludes with a discussion on some of the potential dangers of misunderstanding ‘enchantment’ as ‘seduction’.

Keywords: enchantment; digital archaeology; new aesthetic; simulation; affective engagement

Chapter One: Imagine a Network

Networks are a foundational metaphor for digital archaeology. If we can imagine the archaeological past within a system of relationships, we are dealing with networks. Networks can then be operationalized as the substrate for simulation, and the substrate for computation. The chapter sets up a longer discussion where we begin with a network as metaphor before moving towards more grounded and less metaphorical uses. It imagines the city of Rome as a process of flows through intertwining networks, a process of concretization of flows of energy and power and materials.

Keywords: city of Rome; bricks; building trade; complexity; networks

Chapter Two: Reanimating Networks

Agent-based simulations are introduced. Their potentials and limitations are discussed, as well as the ways the code of a simulation captures the historiography of the phenomenon under discussion. Part of the attraction of agent-based models rests in their formalization of the ‘just-so’ stories we might normally tell. This allows us to test the unintended or unforeseen consequences of those stories. We create self-contained software agents and give them rules drawn from our understanding of the past to guide their behaviour – and then we turn them loose to interact within the channels of the archaeological networks we have uncovered. In this case, the network of inter-urban communications.

Keywords: agent-based models; Antonine Itineraries; information diffusion; replicability; formal models

Chapter Three: Add Agents, and Stir

The network can exist in social space, in addition to physical space. The social information recovered from stamped Roman bricks can be stitched into a network of human interactions over time; these networks can then be used as the starting point for simulating ancient social dynamics, and for asking what-if questions. The chapter concludes with a reflection on how such computational agents might escape the confines of the machine, and what that implies for how we might know or have an affective response to the past. One way is that the labour these resurrected Romans, these ‘digital zombies’, do depends on compelled labour in the ‘real’ world. How we talk about the creatures we create (in silicon) has ramifications for the world outside the machine.

Keywords: agent based models; agency; salutatio; violence; assemblages; vibrant matter

Chapter Four: Archaeogaming

One way for the archaeologist to sink into the digital assemblages reanimated with simulation is to transform the simulation into a game. Archaeogaming is considered in the sense of playing games with archaeological themes. A theory of play is also a theory of learning. The simulation considered in the previous chapter is recast as an archaeogame and the consequences of ‘playing’ this game are considered. The points of intersection between archaeogames and agent-based models are considered, along with the ways in which the two forms differ. The chapter concludes with a discussion of a case study where students were asked to design games to communicate ‘good history’. The play of building leads to greater engagement and enchantment.

Keywords: archaeogames; design; play; time; failure; pedagogy

Chapter Five: The Fun is in the Building

A case study building an actual video game informed by an agent-based model is discussed, including design elements and a post-mortem on the successes and failures of the project. The ethics of game play and meaningful individual choices as they intersect with a larger society-level simulation should make for an engaging experience, but our lack of expertise in actual game design hampers the project. There is a mismatch between the mechanics of the genre and the dynamics of the cultural experience we wish to explore. Returning to the idea of the city of Rome as a kind of emergent outcome of dynamic flows, we consider the city management genre and its connections to archaeogaming. The chapter concludes with a consideration of how an analogue format, the board game, promotes the kind of digital thinking and enchantment we are seeking.

Keywords: first person shooter; Artificial Anasazi; SimCity; Will Wright; board games

Chapter Six: Artificial Intelligence

Networks are capable of computation. Neural networks enable us to represent our archaeological information and historical imagination in ways that a computer can engage with creatively. A simple recurrent neural network is trained on the writings of various historical personae so that it can mimic their voice. A very complex language model released by OpenAI is used as a kind of parameter space out of which we can collapse its understanding of ‘archaeology’, as filtered through its understanding of the writings of Flinders Petrie. The enchantment of digital archaeology might therefore sit at the point of combination of powerful neural network models of knowledge with agent-based models of behaviour and archaeogaming methods for interaction.

Keywords: artificial intelligence; GPT-2; neural networks; ethics; augmented reality

Conclusion: Enchantment is a Remembering

Digital artefacts are subject to decay and ruin. They sometimes erupt into the digital world’s ever-present ‘now’ in the same ways archaeological materials interrupt the physical world of today. To program something necessarily means cutting away information, and to understand how something is programmed involves actively trying to break it, to see in its ruptures what has been cut away. There is enchantment in this process. The simulations and toys that the book considers also point to the playfulness that is necessary to find the enchantment in digital archaeology. Ultimately, the growing power of digital technologies to pluck representations of the past out of the possibility space of computation increases our responsibility to the dead to be truthful, to be engaged, and to be enchanted.

Keywords: ruin; forgetting; world-views; representation; complexity

Afterword: Guidelines for developing your own digital archaeology

Some thoughts on how one might get started in all of this.

Appendices

Code walkthroughs for developing some ABMs, and for re-implementing one of my earlier models.

Failing Gloriously and Other Essays

‘Failing Gloriously and Other Essays’, my book reflecting on what ‘failure’ means, can mean, and should mean in the digital humanities and digital archaeology, will be released on Dec 1. From the publisher’s website (where you’ll be able to get your copy in due course):

Failing Gloriously and Other Essays documents Shawn Graham’s odyssey through the digital humanities and digital archaeology against the backdrop of the 21st-century university. At turns hilarious, depressing, and inspiring, Graham’s book presents a contemporary take on the academic memoir, but rather than celebrating the victories, he reflects on the failures and considers their impact on his intellectual and professional development. These aren’t heroic tales of overcoming odds or paeans to failure as evidence for a macho willingness to take risks. They’re honest lessons laced with a genuine humility that encourages us to think about making it safer for ourselves and others to fail.

A foreword from Eric Kansa and an afterword by Neha Gupta engage the lessons of Failing Gloriously and consider the role of failure in digital archaeology, the humanities, and the social sciences.

The book will be available in print for $, and for free via PDF download.

Quinn Dombrowski has posted a wonderfully generous review over on Stanford Digital Humanities. I hope you’ll find value in it too!

scraping with rvest

We’re working on a second edition for the Historian’s Macroscope. We’re pruning dead links, updating bits and bobs, and making sure things still work the way we imagined they’d work.

But we really relied on a couple of commercial pieces of software, and while there’s nothing wrong with doing that, I really don’t want to be shilling for various companies, or trying to explain in print how to click this, then that, then look for this menu…

So, I figured, what the hell, let’s take the new-to-digital-history person by the hand and push them into the R and RStudio pool.

What shall we scrape? Perhaps we’re interested in the diaries of the second American President, John Adams. The diaries have been transcribed and put online by the Massachusetts Historical Society. The diaries are sorted by date on this page. Each diary has its own webpage, and is linked to on that index page. We would like to collect all of these links into a list, and then iterate through the list, grabbing all of the text of the diaries (without all of the surrounding html!) and copying them into both a series of text files on our machine, and into a variable so that we can do further analysis (eventually).

If you look at any of the webpages containing the diary entries, and study the source (right-click, ‘view source’), you’ll see that the text of the diary is wrapped by an opening

<div class="entry">

and a closing

</div>

That’s what we’re after. If you look at the source code for the main index page listing all of the diaries, you’ll see that the links are all relative links rather than absolute ones – they just have the next bit of the url relative to a base url. Every webpage will be different; you will get used to right-clicking and ‘viewing source’ or using the ‘inspector’.

For the purposes of this exercise, it isn’t necessary to install R and RStudio on your own machine, although you are welcome to do so and you will want to do so eventually. For now we can run a version of RStudio in your browser courtesy of the Binder service – if you click the link here, a version of RStudio already preconfigured with many useful packages will (eventually) fire up in your browser, including rvest and dplyr, which we will be using shortly.

With RStudio loaded up, select file > new file > r script (or, click on the green plus sign beside the R icon).

The panel that opens is where we’re going to write our code. We’re not going to write our code from first principles, though. We’re going to take advantage of an existing package called ‘rvest’ (pronounce it as if you’re a pirate…) and we are going to reuse, but gently modify, code that Jerid Francom first wrote to scrape State of the Union Addresses. By writing scripts or code to do our work (from data gathering all the way through to visualization), we enable other scholars to build on our work, to replicate our work, and to critique our work.

In the code snippets below, any line that starts with a # is a comment. Anything else is a line we run.


library(rvest)
library(dplyr)

These first two lines tell R that we want to use the rvest and dplyr packages to make things a bit easier. Put your cursor at the end of each line, and hit the ‘run’ button. R will pass the code into the console window below; if all goes well, it will just show a new prompt down there. Error messages will appear if things go wrong, of course. The cursor will move down to the next line; hit ‘run’ again. Now let’s tell R the baseurl and the main page that we want to scrape. Type:


base_url <- "https://www.masshist.org"
# Load the page
main.page <- read_html(x = "https://www.masshist.org/digitaladams/archive/browse/diaries_by_date.php")


We give a variable a name, and then use the <- arrow to tell R what goes into that variable. In the code above, we are also using rvest’s function for reading html to tell R that, well, we want it to fill the variable ‘main.page’ with the html from that location. Now let’s get some data:

# Get link URLs
urls <- main.page %>% # feed `main.page` to the next step
    html_nodes("a") %>% # get the CSS nodes
    html_attr("href") # extract the URLs
# Get link text
links <- main.page %>% # feed `main.page` to the next step
    html_nodes("a") %>% # get the CSS nodes
    html_text() # extract the link text

In the code above, we first create a variable called ‘urls’. We feed it the html from main.page; the %>% then passes the data on the left to the next function on the right, in this case ‘html_nodes’, which travels through the html looking for every ‘a’ element (using a CSS selector), and then passes those along to the next step, which extracts the ‘href’ of each hyperlink. The url is thus extracted. Then we do it again, but this time pass the text of the link to our ‘links’ variable. You’ve scraped some data!

But it’s not very usable yet. We’re going to make a ‘data frame’, or a table, of these results, creating a column for ‘links’ and a column for ‘urls’. Remember how we said earlier that the links were all relative? We’re also going to paste the base url into those links so that we get the complete path, the complete url, to each diary’s webpage.


# Combine `links` and `urls` into a data.frame
# because the links are all relative, let's add the base url with paste
diaries <- data.frame(links = links, urls = paste(base_url,urls, sep=""), stringsAsFactors = FALSE)

Here, we have created a ‘diaries’ variable, and we’ve told R that it’s actually a dataframe. Into that dataframe we are saying: ‘make a links column, and put links into it; and make an urls column, but paste the base_url and the link url together and do not put a space between them’. The ‘stringsAsFactors’ bit isn’t germane to us right now (but you can read about it here). Want to see what you’ve got so far?


View(diaries)

The uppercase ‘V’ is important; a lowercase view doesn’t exist, in R. Your dataframe will open in a new tab beside your script, and you can see what you have. But there are a couple of rows there where we’ve grabbed links like ‘home’, ‘search’, ‘browse’ which we do not want. Every row that we want begins with ‘John Adams’ (and in fact, if we don’t get rid of those rows we don’t want, the next bit of code won’t work!).


# but we have a few links to 'home' etc that we don't want
# so we'll filter those out with grepl and a regular
# expression that looks for 'John' at the start of
# the links field.
diaries <- diaries %>% filter(grepl("^John", links))

We are telling R to overwrite ‘diaries’ with ‘diaries’ that we have passed through a filter. The filter command has also been told how to filter: use ‘grepl’ and the regular expression (or search pattern) ^John. In English: keep only the rows that begin with the word John in the links column. Try View(diaries) again. All the extra stuff should be gone now!

We still haven’t grabbed the diary entries themselves yet. We’ll do that in a moment, while at the same time writing those entries into their own folder in individual text files. Let’s create a directory to put them in:


#create a directory to keep our materials in

dir.create("diaries")

and now, we’re going to systematically move through our list of diaries, one row at a time, extracting the diary entry which, when we examined the webpage source code earlier, we saw was marked by an ‘entry’ div. Here we go!


# Loop over each row in `diaries`
for(i in seq(nrow(diaries))) { # we're going to loop over each row in 'diaries', extracting the entries from the pages and then writing them to file.
text <- read_html(diaries$urls[i]) %>% # load the page
html_nodes(".entry") %>% # isolate the text
html_text() # get the text

# Create the file name
filename <- paste0("diaries/", diaries$links[i], ".txt") # this uses the relevant link text as the file name
sink(file = filename) # open file to write
cat(text) # write the file
sink() # close the file
}

The first line sets up a loop – ‘i’ is used to keep track of which row in ‘diaries’ we are currently in. The code between the { and } is the code that we loop through, for each row. So, we start with the first row. We create a variable called ‘text’, into which we get the read_html function from rvest to read the html for the webpage address that exists in the urls column of ‘diaries’ at row i. We pass that html to the html_nodes function, which looks for the div that embraces the diary entry. We pass what we found there to the html_text function, which extracts the actual text.

That was part one of the loop. In part two of the loop we create a filename variable and create a name from the link text for the webpage by pasting the folder name diaries + link-name-from-this-row + .txt. We use the ‘sink’ command to tell R we want to drain the data into a file. ‘cat’, which is short for ‘concatenate’, does the writing, putting the contents of the text variable into the file. Then we close the sink. We get to the closing bracket } and we start the loop over again, moving to the next row.

Cool, eh?

You now have a folder filled with text files that we can analyze with a variety of tools or approaches, and a text variable all ready for more analysis right now in R.

The full code is in this GitHub gist:

#after https://francojc.github.io/2015/03/01/web-scraping-with-rvest-in-r/
library(rvest)
library(dplyr)
base_url <- "https://www.masshist.org"
# Load the page
main.page <- read_html(x = "https://www.masshist.org/digitaladams/archive/browse/diaries_by_date.php")
# Get link URLs
urls <- main.page %>% # feed `main.page` to the next step
html_nodes("a") %>% # get the CSS nodes
html_attr("href") # extract the URLs
# Get link text
links <- main.page %>% # feed `main.page` to the next step
html_nodes("a") %>% # get the CSS nodes
html_text() # extract the link text
# Combine `links` and `urls` into a data.frame
# because the links are all relative, let's add the base url with paste
diaries <- data.frame(links = links, urls = paste(base_url,urls, sep=""), stringsAsFactors = FALSE)
# but we have a few links to 'home' etc that we don't want
# so we'll filter those out with grepl and a regular
# expression that looks for 'John' at the start of
# the links field.
diaries <- diaries %>% filter(grepl("^John", links))
#update nov 9 – I find that line 26 doesn't work in some versions of r via binder that
#i have running. I think it's a versioning thing. Anyway, another way of achieving the same
#effect if you get an error there is to slice away the bits you don't want (thus keeping
#the range of stuff you *do* want:
#diaries <- diaries %>% slice(9:59)
#create a directory to keep our materials in
dir.create("diaries")
# Loop over each row in `diaries`
for(i in seq(nrow(diaries))) { # we're going to loop over each row in 'diaries', extracting the entries from the pages and then writing them to file.
text <- read_html(diaries$urls[i]) %>% # load the page
html_nodes(".entry") %>% # isloate the text
html_text() # get the text
# Create the file name
filename <- paste0("diaries/", diaries$links[i], ".txt") #this uses the relevant link text as the file name
sink(file = filename) # open file to write
cat(text) # write the file
sink() # close the file
}
