The Large Language Model Sea Shanty

Like everyone else, I’m playing with GPT3 chat. I asked it to generate a sea shanty that explains how large language models work, in the style of Stan Rogers.

Prompt: Explain large language models in the style of a Stan Rogers sea-shanty.

Completion:

Verse 1:

Hark ye sailors, come listen to me

I’ll sing a tale of a machine so grand

A large language model, ye see

To predict the words that we’ll command

Chorus:

Oh, the large language model

Truly a marvel to behold

With its deep neural nets

It never forgets

A treasure trove of knowledge untold

Verse 2:

Trained on vast amounts of data

It can understand any text we feed

From simple sentences to complex queries

It can provide answers with impressive speed

Chorus:

Oh, the large language model

Truly a marvel to behold

With its deep neural nets

It never forgets

A treasure trove of knowledge untold

Verse 3:

With its ability to generate words

It can create stories never heard before

From news articles to poems and more

It’s a tool that we’ll use and explore

Chorus:

Oh, the large language model

Truly a marvel to behold

With its deep neural nets

It never forgets

A treasure trove of knowledge untold.

Prompt: Compose the melody for the shanty. Write in ABC notation.

Completion:

X:1
T:Large Language Model
C:Traditional
M:6/8
L:1/8
K:G
D | “G”GGG “C”GEG | “D”FDF “G”DGD | “G”GGG “C”GEG | “D”FDF “G”G2 D |
GGG GEG | FDF DGD | GGG GEG | “D”FDF “G”G2 D |
“G”GGG “C”GEG | “D”FDF “G”DGD | “G”GGG “C”GEG | “D”FDF “G”G2 D |
GGG GEG | FDF DGD | GGG GEG | “D”FDF “G”G2 D |
“G”GGG “C”GEG | “D”FDF “G”DGD | “G”GGG “C”GEG | “D”FDF “G”G2 D |
GGG GEG | FDF DGD | GGG GEG | “D”FDF “G”G2 D ||
D | “G”GGG “C”GEG | “D”FDF “G”DGD | “G”GGG “C”GEG | “D”FDF “G”G2 D |
GGG GEG | FDF DGD | GGG GEG | “D”FDF “G”G2 D |
“G”GGG “C”GEG | “D”FDF “G”DGD | “G”GGG “C”GEG | “D”FDF “G”G2 D |
GGG GEG | FDF DGD | GGG GEG | “D”FDF “G”G2 D |
“G”GGG “C”GEG | “D”FDF “G”DGD | “G”GGG “C”GEG | “D”FDF “G”G2 D |
GGG GEG | FDF DGD | GGG GEG | “D”FDF “G”G2 D ||

So of course I turned it into an mp3; listen here.


What does GPT3 know of archaeology?

Aside from the initial prompt, the text is all GPT3; the artwork is via Nightcafe Creator’s stable diffusion. Each prompt for artwork is generated from a generated summary of the previously generated text. The recursion, it burns!

….alright. That’s enough of that. The GPT3 devolved into a series of ruminations on postholes – how many there were, ooops, here’s another- when my last prompt ended with ‘he turned back towards his task’. So I added one line to suggest, hey, the texture of the soil changed, it sounds different now, perhaps we’ve encountered a different feature or context or what-have-you… but GPT3 wasn’t having any of that. ‘What this story needs…. is a charismatic dog’. And nightcafe creator is thoroughly bonkers, so that’s nice.

House of Flies

An inhuman intelligence that tells stories? From the Department of Digital Inhumanities at Miskatonic University, we bring you ‘House of Flies’, a tale of Roman History.

This ‘graphic novel’ was generated as I did yesterday, through a dialogue with the machine. Dialogue was not so much the focus this time, as look and feel. In the text below, I would start with a prompt, GPT3 would respond, and I would respond with the next prompt. Then I selected phrases and generated around 100 images to get the ones I used here. I used Pixlr to do the small amount of touching up and rearranging.

I’m doing this to figure out the parameters for an assignment for my HIST3812 class.

Prompts begin with a #:

#Write the opening scene for a graphic novel, set in ancient Rome in a lovecraftian universe, called ‘The House of the Flies’.

The sky is dark and foreboding, a storm is brewing. The wind howls through the streets of Rome, sending trash and debris flying through the air. In the distance, lightning flashes and thunder rolls. A figure cloaked in black stands on the rooftop of a decrepit building, overlooking the city. The figure is holding a small jar with a black liquid inside. The figure opens the jar and tips it over, allowing the contents to spill out onto the ground below. From the jar, a swarm of flies erupts and begins to fly around the city. The figure watches as the flies descend upon the city, crawling into the ears of the citizens and laying their eggs. The figure cackles with glee as the citizens of Rome begin to go mad, tearing at their clothes and hair as the flies consume their minds.

#PUGNAX (speaking very slowly):

Welcome… to the House of the Flies.

PUGNAX: This city… is infested.

PUGNAX: The citizens are infected… with madness.

PUGNAX: And it is all thanks to me.

#New scene.

The scene is a crowded marketplace in Rome. Citizens are going about their business, bargaining for goods, when suddenly they begin to scratch at their heads and faces. Confusion and panic spreads through the marketplace as the citizens become more and more agitated. Some people begin to flee the scene, while others drop to the ground, foaming at the mouth. Suddenly, a group of Roman soldiers runs into the marketplace, swords drawn. The soldiers begin to hack and slash at the citizens, trying to control the chaos. But it is no use, the citizens are beyond reasoning, and they fight back with a ferocity that the soldiers are not prepared for. The scene is descending into madness, with blood and guts everywhere. The flies are everywhere, crawling into people’s eyes and mouths. The soldiers are being overrun and the citizens are becoming more and more feral.

In the midst of the chaos, PUGNAX stands on a platform, watching the scene unfold with glee.

#New scene. PUGNAX meets his downfall through entirely non-violent means. [<- note otherwise GPT3 generates violent imagery, which then violates terms of service. So I added the entirely non-violent bit to avoid getting in trouble.]

PUGNAX is sitting on his throne, surrounded by his followers, when a Roman soldier walks into the room. The soldier is holding a jar with a black liquid inside.

SOLDIER: This is for you.

PUGNAX: What is it?

SOLDIER: A gift. From the citizens of Rome.

PUGNAX opens the jar and looks inside. From the jar, a swarm of flies erupts and begins to fly around the room. The flies crawl into PUGNAX’s eyes and mouth, and he begins to scream in agony. The flies lay their eggs inside PUGNAX’s body, and he is consumed from the inside out. As PUGNAX writhes in pain, the soldier watches with a satisfied smirk on his face.

SOLDIER: Madness is a disease, and you are the cure.

—-

Wild, eh?

POSTSCRIPT

So I wonders to myself, ‘can GPT3 generate music’ and the answer is, of course!

Prompt: Write the theme music for a 6 part HBO miniseries for ‘House of the Flies’, a lovecraftian Roman history. Use ABC notation. The tone should be dark, but the tempo should be lively. Include all ABC notation metadata.

Result:

X:1
T:House of the Flies
M:6/8
L:1/8
Q:1/4=120
K:Gmin
%%staves {(Pos1 Pos2) (Pos3 Pos4) (Pos5 Pos6)}
V:Pos1 "Piano"\
c2 | B2 | A2 | G2 | G2 | A2 | A2 | B2 | c2 | B2 | A2 | G2 | G2 | A2 | A2 |
V:Pos2 "Bass"\
CC | GG | FF | EE | EE | FF | FF | GG | CC | GG | FF | EE | EE | FF | FF | GG |
V:Pos3 "Strings"\
D2 | C2 | B,2 | A,2 | A,2 | B,2 | B,2 | C2 | D2 | C2 | B,2 | A,2 | A,2 | B,2 | B,2 |
V:Pos4 "Flute"\
G,2 | F,2 | E,2 | D,2 | D,2 | E,2 | E,2 | F,2 | G,2 | F,2 | E,2 | D,2 | D,2 | E,2 | E,2 |
V:Pos5 "Harp"\
G,B,D | F,A,C | E,G,B | D,F,A | D,F,A | E,G,B | E,G,B | F,A,C | G,B,D | F,A,C | E,G,B | D,F,A | D,F,A | E,G,B | E,G,B |
V:Pos6 "Percussion"\
E2 | E2 | E2 | E2 | G2 | G2 | E2 | E2 | E2 | E2 | E2 | E2 | G2 | G2 | E2 | E2

…which, if you run through something like https://colinhume.com/Music.aspx gives you the sheet music AND the midi files (for your own instrumentation). Have a listen!

A comic strip via AI about archaeology

This article by Robert Gonsalves details his experiment trying to develop and generate the artwork for a comic strip. It’s very cool, so I thought what the hell… I’d give it a shot. I have an account with OpenAI, so let’s generate some magic!

Concept, Scene, & Dialogue

First step: let’s get a title. In the block below, the line starting with # is what I gave the machine, the rest is what it generated on its default settings.

#Create short wacky titles for a humorous comic strip about archaeologists.

1. A Day in the Life of an Archaeologist
2. The Trials and Tribulations of Archaeology
3. The Funny Side of Archaeology
4. Archaeology: It's Not Just for Geeks Anymore!
5. How to Be an Archaeologist in 10 Easy Steps
6. The Archaeologist's Handbook to Life
7. The Archaeologist's Guide to the Galaxy
8. The Archaeologist's Code of Ethics
9. The Archaeologist's Field Guide to the Universe
10. The Archaeologist's Survival Guide

Alright, the Archaeologist’s Handbook to Life sounds fun, let’s go with that. Next step, some dramatis personae.

#Create lead characters for a new comic strip about archaeologists called 'The Archaeologist's Handbook to Life'.

1. Dr. Jane Smith: A world-renowned archaeologist who has dedicated her life to unearthing ancient civilizations. 
2. Dr. John Jones: Jane's partner and fellow archaeologist. He is a brilliant mind and together they make an unstoppable team. 
3. Bobby: The couple's teenage son who often accompanies them on their adventures. He is a gifted student and is always eager to help out with the excavations.

Ok, husband-and-wife pairs are not uncommon in archaeology. Ok, we’ll go with this. Let’s generate a scene…

#Create a scene with dialog for a comic strip about archaeologists called 'The Archaeologist's Handbook to Life'.
#CHARACTERS:
#Dr. Jane Smith: A world-renowned archaeologist who has dedicated her life to unearthing ancient civilizations. 
#Dr. John Jones: Jane's partner and fellow archaeologist. He is a brilliant mind and together they make an unstoppable team. 
#Bobby: The couple's teenage son who often accompanies them on their adventures. He is a gifted student and is always eager to help out with the excavations.

Scene:

Dr. Jane Smith: This is it, John! After years of searching, we've finally found the lost city of Atlantis!
Dr. John Jones: Incredible! The sheer size of this place is breathtaking. It's going to take us years to explore all of it!
Bobby: Wow! This is amazing! I can't wait to start digging!
Dr. Jane Smith: Hang on, Bobby. We have to be careful. We don't want to damage any of the artifacts.
Dr. John Jones: Jane's right. This is a once in a lifetime opportunity. We need to take our time and do it right.
Bobby: I understand. I'll be careful. I promise.

Well that’s pretty… underwhelming. Certainly not funny. A bit earnest. And of course – Atlantis. Let’s see if we can improve that a bit. Other scenes are… well:


Dr. Jane Smith: We've been working on this dig site for weeks now and we still haven't found anything. I'm starting to think that this is a wild goose chase.

Dr. John Jones: Jane, you know as well as I do that these things take time. We can't just give up now. Who knows what we might find if we keep going?

Bobby: Mom, Dad, I found something! Come take a look!

Dr. Jane Smith: What have you found, Bobby?

Bobby: I think it's an ancient coin!

Dr. John Jones: Wow, great job, Bobby! It looks like we might have finally struck gold!

…in fact, the more I generate, the more depressed I get. These characters just aren’t… interesting. Time to back up a step.

#Create lead characters for a new comic strip about archaeologists called 'The Archaeologist's Handbook to Life', which provides an irreverent look at day-to-day life in the field.

1. Dr. Amy Jones: A highly respected archaeologist who is always looking for new and innovative ways to do her job. She's not afraid to speak her mind, and she's always looking for new challenges.
2. Dr. John Smith: Amy's colleague and friend. He's a bit of a goofball, but he's always there to lend a helping hand.
3. Dr. Sarah James: Sarah is the newest member of the team. She's fresh out of grad school and ready to take on the world.
4. Dr. David Miller: David is the team's resident expert on ancient cultures. He's always ready to share his knowledge, and he's always eager to help out with any project.

But again, as I generate, everything that comes out is just… just so… worthy. I suppose that’s a good thing, in that this monstrous AI thinks archaeologists are fine upstanding citizens of the world, even if there is a bit of Atlantis going on. In the interests of keeping this experiment going, we’ll go with this scene:

#Create a scene with humorous dialog for a comic strip about archaeologists called 'The Archaeologist's Handbook to Life'.
#Characters:
#Amy: A highly respected archaeologist who is always looking for new and innovative ways to do her job. She's not afraid to speak her mind, and she's always looking for new challenges.
#John: Amy's colleague and friend. He's a bit of a goofball, but he's always there to lend a helping hand.
#Sarah: Sarah is the newest member of the team. She's fresh out of grad school and ready to take on the world.
#David: David is the team's resident expert on ancient cultures. He's always ready to share his knowledge, and he's always eager to help out with any project.

Amy: Alright, everyone, listen up! I've just been told that we're going to be working with a new team of archaeologists.
John: Ooh, I love meeting new people!
Sarah: Me too! I'm excited to learn from them.
David: I'm sure they'll be a great help.
Amy: I'm sure they will be, but we're going to need to be careful. We don't want to give away too much about our methods.

Archaeologists being tetchy about how they do things? That tracks.

Art Work with DALL-E

I fed DALL-E the prompt, a comic set in a laboratory for an archaeological excavation. Two men and two women lounge around the work tables, chatting. The style is similar to the Far Side, by Gary Larson.

That resulted in pretty much garbage. This one worked better: A comic set at an archaeological excavation, with two men and two women as characters wearing work clothes, Amy and John who are serious, and Sarah and David who are not.

The first three pictures could work; the last one looks like DALL-E has ingested a children’s bible. I’ll take the second one, create some variations (which is just a click inside the editor) and using pixlr, see if I can remove DALL-E’s wonky lettering, and add the text that GPT3 created.

…so there we have it. A comic designed and implemented via prompt engineering and a bit of the ol’ cut-n-paste. Now, to use this power for good instead of evil…

Whisper, from OpenAI, for Transcribing Audio

Whisper is a trained neural network model for transcribing audio. You can read about it at https://openai.com/blog/whisper/. I can see this as being enormously useful for public historians, oral historians… anyone who deals with recorded speech. I’ve tested it on audio recordings from the 1920s in English, and more recent recordings in French. A very cool feature of the language model is the ability to translate another language into an English transcription. But first, here’s how you get started.

$ conda create -n py39 python=3.9
$ conda activate py39
$ conda install pytorch torchvision torchaudio -c pytorch-nightly
$ brew install rust
$ pip install git+https://github.com/openai/whisper.git

I’m using miniconda on a mac mini m1.

Here’s the result on a recording of Eamon de Valera’s Saint Patrick’s Day address of 1920:

$ whisper 'Eamon_de_Valera-Saint_Patricks_Day_address_(03.04.1920).mp3'
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: english
[00:00.000 --> 00:06.960]  sons and daughters of the gale wherever you be today in the name of the motherland greeting
[00:07.920 --> 00:15.120]  whatever flag be the flag you guard and cherish it is consistent with your highest duty to link
[00:15.120 --> 00:23.360]  yourself together to use your united strength to break the chains that bind our sweet sad mother
[00:23.360 --> 00:30.480]  and never before have the scattered children of era had put an opportunity for noble service
[00:31.760 --> 00:39.760]  today you can serve not only Ireland but the world a cruel war and a more cruel
[00:39.760 --> 00:48.320]  peak have shattered the generous of souls apathy marks the high minded and heartless cynicism
[00:48.320 --> 00:56.960]  points the way of selfishness we the children of a race that has endured for ages the blight

_and so on_.
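
If you redirect that terminal output to a text file, the timestamped lines are easy to pull apart for later use (a table of segments, say, or notes keyed to the audio). A minimal sketch, assuming the lines look like the ones printed above; the function names are my own, and the optional hours field is an assumption about how longer recordings are printed.

```python
import re

# Matches lines such as "[00:07.920 --> 00:15.120]  some text".
# The leading hours part is optional (assumed to appear on long recordings).
SEGMENT = re.compile(
    r"^\[(?:(\d+):)?(\d{2}):(\d{2}\.\d{3}) --> "
    r"(?:(\d+):)?(\d{2}):(\d{2}\.\d{3})\]\s*(.*)$"
)


def to_seconds(hours, minutes, seconds):
    """Convert the captured timestamp parts to seconds as a float."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)


def parse_transcript(text):
    """Return a list of {'start', 'end', 'text'} dicts, one per segment line."""
    segments = []
    for line in text.splitlines():
        match = SEGMENT.match(line.strip())
        if match is None:
            continue  # skip "Detected language: ..." and other chatter
        h1, m1, s1, h2, m2, s2, words = match.groups()
        segments.append({
            "start": to_seconds(h1, m1, s1),
            "end": to_seconds(h2, m2, s2),
            "text": words.strip(),
        })
    return segments


sample = """Detected language: english
[00:00.000 --> 00:06.960]  sons and daughters of the gale wherever you be today in the name of the motherland greeting
[00:07.920 --> 00:15.120]  whatever flag be the flag you guard and cherish it is consistent with your highest duty to link
"""

for segment in parse_transcript(sample):
    print(segment["start"], segment["end"], segment["text"][:40])
```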

When I asked it to translate a French recording, using the default small model, things got stuck, with the model outputting the same line over and over. I wasn’t surprised; the recording is not very clear – listen to it here. But I re-ran the command with the medium model (the largest multilingual model short of Whisper’s large one). The results were much better, and quite impressive:

$ whisper RL10059-CS-1751_02.mp3 --language French --task translate --model medium
[00:00.000 --> 00:09.000]  We will stop in the framework of the investigation that will maintain good relations with the American embassy.
[00:09.000 --> 00:12.000]  Listen to the explanations of Senator Danny Toussaint.
[00:12.000 --> 00:34.000]  But think about the name of Philip MacKenton, that Ana Rana told you that he was linked with Daniel Whitman.
[00:34.000 --> 00:42.000]  He said that during the arrest, the American embassy came to look for him.
[00:42.000 --> 00:46.000]  The judge asked him to sign, and he did.
[00:46.000 --> 00:49.000]  He asked him who was coming to look for him.
[00:49.000 --> 01:03.000]  It's clear that during the trial, the judge, the father who was in prison,
[01:03.000 --> 01:07.000]  who was authorized by the American embassy to look for Philip MacKenton,
[01:07.000 --> 01:14.000]  Ana Rana said that he was linked with Daniel Whitman.
[01:14.000 --> 01:17.000]  He said that it was higher authorities that authorized him.
[01:17.000 --> 01:21.000]  So the judge declared that during the arrest, the embassy came to look for him.

_and so on_.
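
With a folder full of recordings, it may be easier to assemble that command in Python than to retype it. A minimal sketch that uses only the three flags shown above (`--language`, `--task`, `--model`); the helper name `whisper_command` is hypothetical, and nothing is actually run here.

```python
import shlex


def whisper_command(audio_path, language=None, task=None, model=None):
    """Build the argument list for one whisper command-line call."""
    args = ["whisper", audio_path]
    if language:
        args += ["--language", language]
    if task:
        args += ["--task", task]
    if model:
        args += ["--model", model]
    return args


cmd = whisper_command(
    "RL10059-CS-1751_02.mp3",
    language="French",
    task="translate",
    model="medium",
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the transcription, assuming `whisper` is on the path of the active conda environment.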

If you have the computing power then, and you work with oral history or other recorded speech, Whisper is well worth the investment of time and energy.

obsidian zotero integration plugin

I’m making a coursepack for my fall class in Obsidian. It’ll have useful features, tools, and templates for doing the work I want them to do.

I want students to use Zotero to keep track of their readings and annotations. I’ve been playing with https://github.com/mgmeyers/obsidian-zotero-integration and have now got it where I want it, more or less. Now that I’ve got it set up, students just have to download my vault and turn off safe mode, and everything’ll be configured. For reference, this is what I did:
– add it via the community plugins
– create a new note for your template.
– use this template for your notes: https://forum.obsidian.md/t/zotero-desktop-connector-import-templates/36310/2?u=drgraham
– the ‘css snippet’ is a text file with .css extension in the .obsidian/snippets folder; you can tell obsidian to read the snippet from the ‘appearance’ settings
– in the template itself, the hex values for colour did not agree with the colours being used by my version of zotero (maybe somewhere in zotero you can set those?)
– Here’re mine:
```
{%- macro colorValueToName(color) -%}
{%- switch color -%}
{%- case "#a28ae5" -%}
Relevant / important
{%- case "#ff6666" -%}
Disagree
{%- case "#ffd400" -%}
Questions / confusion
{%- case "#5fb236" -%}
Agree
{%- case "#2ea8e5" -%}
Definitions / concepts
{%- default -%}
Interesting but not relevant
{%- endswitch -%}
{%- endmacro -%}
```
In the settings for the zotero integration, I turned on the insert citekeys from the @ symbol, and I renamed note import to be ‘extract your notes from an item’. I have it create new notes in a folder called `zotero-notes`. I added the @ symbol in front of the filename template – which uses bibtex cite key- so that my extracted annotation notes all follow the pattern @graham2006 etc. Useful for eventual pandoc integration.

Use the colours in Zotero to indicate _why_ you’re highlighting something.

Yellow – Questions/confusion
Red – Disagree
Green – Agree
Blue – Definitions / concepts
Purple – Important
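
If you later want to sort or count the extracted annotations outside Obsidian, the same colour scheme can be written as a lookup table. A minimal sketch: the hex values are the ones from my template macro above, and the function name is made up for illustration.

```python
# Hex values as used in the template macro; colour names per the list above.
HIGHLIGHT_MEANINGS = {
    "#ffd400": "Questions / confusion",   # yellow
    "#ff6666": "Disagree",                # red
    "#5fb236": "Agree",                   # green
    "#2ea8e5": "Definitions / concepts",  # blue
    "#a28ae5": "Relevant / important",    # purple
}
DEFAULT_MEANING = "Interesting but not relevant"


def colour_to_meaning(hex_colour):
    """Map a highlight colour to its meaning, falling back to the default."""
    return HIGHLIGHT_MEANINGS.get(hex_colour.strip().lower(), DEFAULT_MEANING)


print(colour_to_meaning("#FFD400"))
print(colour_to_meaning("#000000"))
```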

cmd + p to open the command palette
look for ‘zotero – extract your notes from an item’.
wait a few moments for the zotero selector to appear.
search for and select the item that you were annotating within zotero
a new note will appear in the zotero-notes folder
you can then refactor (use the ‘note composer’ command) the one big note into several small ones. Alternatively, in a brand new note you can link to the note and use the ^ symbol to link to a particular annotation.

If you have zotero on an ipad, and you have some kind of shared folder accessible between your computer and the ipad, there seems to be a glitch. Anything you annotate on your ipad in a shared folder needs to be moved to your main ‘my library’ on your computer, and then deleted from the shared folder. Otherwise you’ll get an error when you go to extract the annotations.

steampipe.io + hypothes.is -> obsidian.md

Steampipe.io lets you run sql against a variety of services, including hypothes.is. This strikes me as a nice way to perhaps develop a workflow from hypothesis to obsidian (there are any number of ways one could do this).

These are my notes for getting it up and running.
At the terminal, with brew installed (running on mac)


brew tap turbot/tap
brew install steampipe
steampipe -v
steampipe plugin install hypothesis


Then add my developer key for hypothesis to the config file for steampipe, which lives at

~/.steampipe/config/hypothesis.spc

There is a dashboard for hypothesis at https://github.com/turbot/steampipe-samples/tree/main/all/hypothesis . Copy those files into a text editor and save them with the .sp extension. Then at the terminal, fire ’em up with

steampipe dashboard

Queries can be run at the query prompt, via

steampipe query

Or you can save your queries as a file, and pipe the results to output. So, I make an empty file and put this query into it:


select
uri,
tags,
exact,
text
from
hypothesis_search
where
query = 'tag=media&tag=review';


I save it as `query1`, no file extension. Then, at the terminal prompt,


$ steampipe query query1 > output.csv


The resulting output file actually uses the `|` character as a separator, so once I remember to specify that when opening it in Excel, I get a lovely file of annotations. I have to run now, but I can see opening this file in Obsidian.md and then refactoring it so that I end up with one note per annotation.
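
The one-note-per-annotation step could be sketched in Python along these lines. This assumes the output is a bordered text table (rows starting with `|`, border lines starting with `+`); I have not checked that against every steampipe version, the sample below is invented, and a `|` inside an annotation would break the naive split.

```python
def parse_pipe_table(text):
    """Turn |-separated table output into a list of dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +-----+ border lines and blanks
        if line.endswith("|"):
            line = line[:-1]
        rows.append([cell.strip() for cell in line[1:].split("|")])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, cells)) for cells in body]


def annotation_to_note(row):
    """Format one annotation as a small Markdown note."""
    return "\n".join([
        "> " + row.get("exact", ""),
        "",
        row.get("text", ""),
        "",
        "Source: " + row.get("uri", ""),
        "Tags: " + row.get("tags", ""),
    ]) + "\n"


# Invented sample in the assumed layout, not real steampipe output.
sample = """
+---------------------+--------------------+------------+---------+
| uri                 | tags               | exact      | text    |
+---------------------+--------------------+------------+---------+
| https://example.org | ["media","review"] | quoted bit | my note |
+---------------------+--------------------+------------+---------+
"""

for row in parse_pipe_table(sample):
    print(annotation_to_note(row))
```

Each returned string could then be written to its own `.md` file in the vault.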

references
https://hub.steampipe.io/plugins/turbot/hypothesis/tables/hypothesis_search
https://hub.steampipe.io/plugins/turbot/hypothesis

 

 

…god I hate the gutenberg editor here on wordpress.com. It sucks…

Memo to self: The Full Circuit of Humanities Computing

In 2013, Bethany Nowviskie wrote in ‘Resistance in the Materials‘:

“What new, interpretive research avenues will open up for you, in places of interesting friction and resistance, when you gain access to the fresh, full circuit of humanities computing—that is, the loop from the physical to the digital to the material text and artifact again?”

I love this idea; have loved it for years. It was the organizing principle of a course I did a few years ago. I’m working on a proposal that takes this as the central conceit. And I’m making this note here, because that most petulant of man-child, that monstrous Ego, has bought the location where I might normally post such things. Perhaps I’ll be spending more time here…

A System of Relationships, or Getting My Stuff Into Neo4j


A context is just the name we give to describe a system of relationships at a particular point in time; a point that conventionally corresponds with a singular event in the life of the ‘site’. But there’s nothing real about a context. It’s just a convention that hides the fact that we are describing the edges, not the nodes, of material culture. What would an archaeology look like that focused exclusively on ‘assemblages’ in the sense of D&G – agencements – that had their own agency, coming together?

When we think of things this way, it’s clear that a conventional ‘relational’ database is the wrong way of ordering things. A relational database, ironically, does not attach any meaning to the connections in themselves; it’s just a way of joining tables. A graph database on the other hand assigns properties to both the nodes and the edges, and allows us to query through both the properties and the relationships.

Graph databases emerge out of networks & graph theory. Neo4j is one of the most prominent in the field at the moment. But it has, to my mind, a glaring problem: it’s a right pain in the ass to get information into it. You can write queries that describe the nodes and the edges, and you can import csv files, but they have to be arranged just right. Much as in social network analysis, one effective way to import things is to have a table of the nodes, where the columns contain the properties, and another with the edges, and their properties.
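
To make the two-table idea concrete, here is a minimal sketch with invented sample rows: one table for nodes, one for edges, each with its own property columns, plus a check that every edge points at a node that exists. The column names (`id`, `source`, `target`, `type`) are my own choice for illustration, not something Neo4j requires.

```python
import csv
import io

# Invented example rows: properties live in the columns.
nodes = [
    {"id": "a1", "label": "Artifact", "name": "bone comb"},
    {"id": "p1", "label": "Photo", "name": "IMG_0001.jpg"},
]
edges = [
    {"source": "a1", "target": "p1", "type": "APPEARS_IN", "xmin": 10, "ymin": 20},
]


def to_csv(rows):
    """Serialise a list of dicts to CSV text, header taken from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def dangling_edges(nodes, edges):
    """Return the edges whose source or target is missing from the node table."""
    known = {node["id"] for node in nodes}
    return [e for e in edges if e["source"] not in known or e["target"] not in known]


print(to_csv(nodes))
print(to_csv(edges))
print("dangling:", dangling_edges(nodes, edges))
```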

But data entry is a nasty business. If you just start filling in cells in a spreadsheet, you quickly run into user interface issues, data validation issues, sheer pain-in-the-arsedness issues. The other requirement was that there were a series of images on which we wanted to annotate the location of objects. The forms would then capture the metadata around how these objects interrelated with each other: the photo captured one moment in time, one event, one context. I wanted therefore to design a form that would make that pain a little less awful.

One can design forms in Excel. Or in Access (though I didn’t have a copy of Access). Or Filemaker. Or Google App script. But… part of me is bloody minded. If I was going to screw things up, I wanted it to be because I screwed things up, not because some part of Excel decided that the data I was entering was really a date, or some other ‘helpful’ automagical correction.

There are a variety of ways of doing this DIY business. I could’ve gone with html and webforms. Remember when writing html was straightforward? Now it’s all Django or Svelte or typescript or whatever. Not being much of a python person (or a front-end person either, for that matter), but recognizing that python could run on whatever system, I thought I’d see what I could do in that regard. Which is how I came to Tkinter. Similarly, I could’ve used pyqt4, 5, or 6, or even gui tools (eg, like this) for designing python forms. But I had found someone’s tutorial that did pretty much what I wanted to do. I might not be able to write from scratch, but I can generally follow the logic and adapt/adopt things as I find ’em.

I wrote four forms. One for context metadata, one for artifact metadata, one for photo metadata, and one for notes. I built in validation and dropdowns that pulled from one form to the next, so that I’d minimize my ability to screw up consistency of names and descriptions. I got the forms to import image annotations from LabelImg, and to export the whole boiling to four csv tables. (Chantal, my student, pointed out that if I’d started in pyqt5, I could probably have built the forms directly into LabelImg in the first place. Maybe next time.)

Now the problem was getting all this into Neo4J. Two nodes were obvious: photo, and artifact.

(Artifact)-[APPEARS_IN]->(Photo).

But was the third node really ‘context’? I think the answer is ‘no’. The third node is the physical location; but the context describes an event that happens there. So:

(Photo)-[DEPICTS]->(Square)

and the various contextual information is a property sometimes of DEPICTS, sometimes of APPEARS_IN, sometimes of Square, Photo, and Artifact.

My forms didn’t export a nice table of relationships; instead I had to do some merging using Pandas to parse the various fields/cells into the places I want. Code examples follow below. Then it became a matter of writing print statements that would iterate through the rows of data and write the values in the various cells mixed together with the correct Cypher text and syntax. In this, I was following an example from this ; there are similar things around the web.

I bundled these up into little scripts, and wrote one last gui to help with the workflow, a launcher with buttons for all the various components. I can enter data (saving it to sqlite3, which could be pushed online with Datasette, which’ll wrap it in an API so others can play with it), I can export it to flat csv tables, and I can push a button to get the Cypher statements to add to the graph database.
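
The save-then-export part of that workflow can be sketched with nothing but the standard library. This is not my actual form code: the table and column names are invented for illustration, and an in-memory database stands in for the real sqlite file.

```python
import csv
import io
import sqlite3

# In-memory database for the sketch; a real form would point at a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifacts ("
    "artifact_name TEXT NOT NULL, photo_filename TEXT NOT NULL, square TEXT)"
)


def save_artifact(conn, artifact_name, photo_filename, square):
    """Insert one form submission; the 'with' block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO artifacts VALUES (?, ?, ?)",
            (artifact_name, photo_filename, square),
        )


def export_table(conn, table):
    """Dump a whole table to CSV text, header row first."""
    cursor = conn.execute(f"SELECT * FROM {table}")  # table name is trusted here
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


save_artifact(conn, "bone comb", "IMG_0001.jpg", "A1")
print(export_table(conn, "artifacts"))
```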

In Neo4j, I can now start querying the material as if I was looking for shortest paths through it, attenuated by physical x,y,z. Community detection. Cliques. And so on. If an archaeological site is a series of relationships, then I want to use a method that is built around understanding the structure (including absences!) of relationships. Tune in later for when I start querying stuff; that’ll be the thing: was all of this worth it?

~~~

Python that creates a Cypher create statement from a CSV:

import pandas as pd

filename = "../data/context_data.csv"

# load the data with pd.read_csv
record = pd.read_csv(filename)

#print(record.Square.unique())
# contexts will show up in the relationships, so drop that column here
record = record.drop(columns='CONTEXT_NUMBER')

#a square might show up a couple of times 
#because of the other csv files - many artifacts from the same location - 
#so I just want ONE node to represent the square

record.drop_duplicates(subset=['Square'], inplace=True)

with open('./square_nodes.cql', 'w') as f:
	for i, row in record.iterrows():
		# i is the original row label, so the node variables (s1, s4, ...) stay unique
		# even after duplicates are dropped; values are picked by column position
		print("\n" + "CREATE (s" + str(i + 1) + ":SQUARE {square:'" + str(row.iloc[0]) + "',module:'" + str(row.iloc[1]) + "',context_type:'" + str(row.iloc[2]) + "',description:'" + str(row.iloc[6]) + "'})", file=f)
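
One weakness of gluing values straight into Cypher text is that an apostrophe in a description ends the string early. A minimal sketch of one way around that, assuming backslash escapes inside single-quoted Cypher strings; both helper names are hypothetical, and passing values as query parameters through a Neo4j driver would be the sturdier alternative.

```python
def cypher_string(value):
    """Return value as a single-quoted Cypher string literal."""
    text = str(value)
    text = text.replace("\\", "\\\\")  # escape backslashes first
    text = text.replace("'", "\\'")    # then single quotes
    return "'" + text + "'"


def square_node(number, square, module, context_type, description):
    """Build one CREATE statement for a SQUARE node."""
    return (
        f"CREATE (s{number}:SQUARE {{"
        f"square:{cypher_string(square)},"
        f"module:{cypher_string(module)},"
        f"context_type:{cypher_string(context_type)},"
        f"description:{cypher_string(description)}}})"
    )


print(square_node(1, "A1", "M2", "fill", "potter's debris"))
```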

And some Python that creates relationships; remember, there was no original relationships table; these are all derived implicitly or explicitly from the act of recording, and thinking of contexts as relationships, not things. Incidentally, I wrote these in a jupyter notebook first, so I could tinker slowly and make sure that I was grabbing the right things, that everything looked ok.

import pandas as pd

a = pd.read_csv("../data/photo_data.csv")
b = pd.read_csv("../data/artifact_data.csv")
b = b.dropna(axis=1)

c = pd.read_csv("../data/context_data.csv")
d = pd.read_csv("../data/photo_data.csv")
d = d.dropna(axis=1)

merged2 = c.merge(d, on='CONTEXT_NUMBER')
merged = a.merge(b, on='Photo_filename')

# values are picked by column position with .iloc,
# so the numbers below depend on the column order of the csv files

with open('./relationships.cql', 'w') as f:

	# artifact -> photo, with the annotation's bounding box as edge properties
	for i, row in merged.iterrows():
		print("\n"+ "MATCH (a {artifactName:'"+ str(row.iloc[16]) +"'}), (b {photoNumber:"+ str(row.iloc[0])+"}) MERGE (a)-[:APPEARS_IN{timestamp:'"+ str(row.iloc[3])+"', xmin:"+ str(row.iloc[10])+", ymin:"+ str(row.iloc[11])+", xmax:"+ str(row.iloc[12])+", ymax:"+ str(row.iloc[13])+"}]->(b);", file=f)

	# photo -> square, carrying the context information on the edge
	for i, row in merged2.iterrows():
		print("\n"+ "MATCH (a {photoNumber:"+ str(row.iloc[8]) +"}), (b {square:'"+ str(row.iloc[1])+"'}) MERGE (a)-[:TAKEN_FROM{Square:'"+ str(row.iloc[1])+"', Module:'"+ str(row.iloc[2])+"', CONTEXT_NUMBER:'"+ str(row.iloc[0])+"'}]->(b);", file=f)


Now I’d like to make my code more generalizable. But for the time being… it works, it’s alive!

Trying Something New

I never really got the hang of having a ‘holiday’, and dammit, futzing about on the computer still is fun, when there’s no deadline or pressing compulsion to do so… so the idea is:

  • I want a graph database (looks like Neo4j is the most accessible thing?), because the relationships in this data seem pretty important
  • but the people entering the data can’t be expected to write cypher queries to do that

But we can import CSV tables describing objects and properties and relations and the properties of those relations. OK. So maybe just fill out a bunch of spreadsheets and export to csv.
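
As a rough illustration of what those tables might look like, the sketch below splits a flat data-entry table into one CSV of nodes and one CSV of relationships. The column names and sample rows are invented for the example; the real spreadsheets would have their own.

```python
import csv
import io

# Invented sample rows, one per photo, as they might come out of a data-entry sheet.
rows = [
    {"photoNumber": 1, "square": "A1", "module": "M1"},
    {"photoNumber": 2, "square": "A1", "module": "M1"},
    {"photoNumber": 3, "square": "B2", "module": "M2"},
]


def to_csv(records, fieldnames):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Node table: one row per distinct square (dict keys keep first-seen order).
squares = list(
    {r["square"]: {"square": r["square"], "module": r["module"]} for r in rows}.values()
)

# Relationship table: which photo was taken from which square.
taken_from = [
    {"start": r["photoNumber"], "type": "TAKEN_FROM", "end": r["square"]} for r in rows
]

squares_csv = to_csv(squares, ["square", "module"])
taken_from_csv = to_csv(taken_from, ["start", "type", "end"])
print(squares_csv)
print(taken_from_csv)
```

The point is only that nodes and relationships end up in separate tables, each relationship row naming the two things it connects.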

But filling out cells in a spreadsheet can be pretty mind numbing, and we also know that Excel does weird shit to data sometimes, and well, couldn’t we just avoid it?

So I’ve been designing data entry forms using Jupyter notebooks, widgets, and Voilà. I’ve also been trying out tkinter for making forms directly in Python. So far, tkinter is winning. And at the end of that process, I end up with an SQLite database which I could push online using, say, Datasette, and all of a sudden have an API for the data, too. So that’s a win.
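
The storage half of that workflow can be sketched without any GUI code: whatever the form collects gets passed to a function that writes one row to SQLite, and a second function dumps the table back out as CSV. The table name, field names and function names below are made up for illustration; a tkinter 'Save' button would simply call `save_record` with the values from its entry widgets.

```python
import csv
import io
import sqlite3

FIELDS = ["context_number", "square", "description"]  # hypothetical form fields


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contexts "
        "(context_number TEXT, square TEXT, description TEXT)"
    )
    return conn


def save_record(conn, record):
    """Insert one form submission; placeholders keep quotes in the text safe."""
    conn.execute(
        "INSERT INTO contexts (context_number, square, description) VALUES (?, ?, ?)",
        [record[name] for name in FIELDS],
    )
    conn.commit()


def export_csv(conn, out):
    """Write the whole table, header first, to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(
        conn.execute(
            "SELECT context_number, square, description FROM contexts ORDER BY rowid"
        )
    )


conn = open_db()
save_record(
    conn,
    {"context_number": "C001", "square": "A1", "description": "potter's wheel fragment"},
)
out = io.StringIO()
export_csv(conn, out)
print(out.getvalue())
```

Passing a file path instead of `":memory:"` to `open_db` would give a database file on disk, which is the sort of thing Datasette can then serve.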

But back to the csv. I get a csv out of my tkinter data entry forms. It’s still not really in the format for entry into Neo4j through its LOAD CSV stuff. This probably is more a reflection on me than on Neo4j, of course. Came across this – https://aspen-lang.org/ which looks like it’ll do what I want. Remember, the data is being added and worked with by folks with even less experience in this domain than me. Aspen looks pretty cool. Describe your data, it translates it into Cypher queries, pushes into Neo4j.

Problem is, I’m getting all sorts of weird errors that lead me to suspect that the version of ruby I’m running is the wrong version. I have no experience with ruby, other than a couple of jekyll experiments. The Aspen documentation says 2.6 or higher, but I’m running 3, and this isn’t working. It turns out you can run many versions of ruby at once (see stackoverflow) so I’ll give that a try, install 2.6 and see what happens….

… not much; but a new error message, ‘undefined method `merge' for false:FalseClass (NoMethodError)’, so that’s progress I guess.

**update**

Memo to self: in aspen.rb, if I comment out the merge call in compile_text, everything works:

```ruby
def self.compile_text(text, environment = {})
  assert_text(text)

  if text.include?(SEPARATOR)
    env, _sep, code = text.partition(SEPARATOR)
    # Merge disabled; originally: compile_code(code, YAML.load(env).merge(environment))
    compile_code(code)
  else
    code = text
    compile_code(code, environment)
  end
end
```

Now, if I follow the wiki to do the csv import, modify the bin/convert script as directed and have empty ‘main.aspen’ files in ‘grammars’ and ‘discourse’, plus remember to capitalize nodes, node properties, then… hot damn, it writes a cypher query!

Five Years Of Epoiesen; Future Funding the Next Five!

Holy moly, it’s been 5 years of Epoiesen. I’d like to think we’ve had a bit of impact, a small moving of the needle when it comes to expanding the range of what is possible to do! Neville Morley, of the University of Exeter, remarked a short while ago on Twitter about Epoiesen:

“Five years of being the most downright interesting and thought-provoking publication in archaeology/ancient history.”

The mission of Epoiesen has been referenced in journals like the Canadian Journal of Archaeology, the European Journal of Archaeology, and Advances in Archaeological Practice, and elsewhere; and individual pieces are being cited, used in teaching, and enjoyed by readers from all walks of life. Our authors range from tenured professors, to graduate and undergraduate students, to members of the public – probably the widest variety you’ll see! Of course, as a matter of principle we don’t use tracking cookies on Epoiesen, so I can’t give you ‘hits’ or shares or that sort of thing, but on Google Scholar you can see some of the pieces are gaining traction – https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=epoiesen.library.carleton.ca&btnG= .

Moving forward, I have set up a campaign on Carleton University’s micro-fundraising site, ‘FutureFunder’, to build up a bit of a reserve so that I can provide some paid training opportunities for students to help with Epoiesen; everything from copy-editing to site architecture to promotion. I would be grateful if readers of Electric Archaeology or Epoiesen could circulate this call around their own networks, link below. Starting at midnight on Nov 30 (ie, 12.01 am Nov 30), funds raised will be matched dollar-for-dollar by the University on ‘Giving Tuesday’, which is a great initiative.

link here because I can’t stop the new wordpress editor from turning it into an embed.

Epoiesen is free to read, and free to publish in; we do not charge, nor will we ever charge, article processing fees. Pieces are published under licenses chosen by the authors. The formats are only limited by my own technical skills – but if I can start hiring people, even that limitation will fall away! Paper ‘Annual’ versions are available courtesy of our friends at The Digital Press at the University of North Dakota. Volumes 1 – 4 are available (free pdf, $9 physical; any funds from sales go back into making other works from the Press open access); Volume 5 will be in production shortly.