A museum bot

I wanted to build a bot, inspired by some of my students who made a jupyter notebook that pulls in a random object from the Canadian Science and Technology Museum’s open data, displaying all associated information.

The museum’s data is online as a csv file to download (go here to find it: http://data.techno-science.ca/en/dataset/cstmc-smstc-artifacts-artefact ). Which is great; but not easy to integrate – no API.

Build an API for the data

So, I used Simon Willison’s Datasette package to take the csv table, turn it into a sqlite database, and then push it online – https://datasette.io/.

First I installed sqlite-utils and datasette using homebrew:

brew install sqlite-utils datasette

then I turned the csv into sql:

sqlite-utils insert cstm.db artefacts cstmc-CSV-en.csv --csv
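
If you would rather not install sqlite-utils, roughly the same conversion can be sketched with nothing but Python's standard library. This is a minimal sketch and not what I actually ran: the function name csv_to_sqlite is made up for illustration, every column is stored as TEXT, and it assumes every row has the same number of fields as the header.

```python
import csv
import sqlite3


def csv_to_sqlite(csv_path, db_path, table):
    """Load every row of a CSV file into a SQLite table, all columns as TEXT.

    Returns the number of columns found in the header row.
    """
    # utf-8-sig also copes with files that start with a byte-order mark
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(
            '"{}" TEXT'.format(name.replace('"', '""')) for name in header
        )
        placeholders = ", ".join("?" for _ in header)
        connection = sqlite3.connect(db_path)
        try:
            connection.execute(
                'CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, columns)
            )
            # the reader yields one list per remaining CSV row
            connection.executemany(
                'INSERT INTO "{}" VALUES ({})'.format(table, placeholders),
                reader,
            )
            connection.commit()
        finally:
            connection.close()
    return len(header)
```

Calling csv_to_sqlite("cstmc-CSV-en.csv", "cstm.db", "artefacts") should leave you with a database that datasette can serve, though sqlite-utils is smarter about column types.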

I installed the commandline tools for vercel, where my museum data api will live, with

npm i -g vercel

vercel login

then I pushed the data online with datasette; datasette wraps the database in all its datasette goodness:

datasette publish vercel cstm.db --project=cstm-artefacts

You can see the results for yourself at https://cstm-artefacts.vercel.app/ (click on ‘artefacts’).

Now, a few days ago, Dan Pett posted the code for a bot he made that tweets out pics & data from the Portable Antiquities Scheme database – see his repo at https://github.com/portableant/findsbot. I figured it should be easy enough to adapt his code, especially since my new api will return data as json.

Build a Bot with R

So I fired up RStudio on my machine, and began experimenting. The core of my code runs an sql query on the API looking for a random object where ideally the general description and thumbnail fields are not null. Then it parses out the information I want, and builds a tweet:

library(httr)
library(rtweet)
library(jsonlite)
library(digest)

search <- paste0('https://cstm-artefacts.vercel.app/cstm.json?sql=SELECT+*+FROM+artefacts+WHERE+NOT+GeneralDescription+IS+NULL+AND+NOT+thumbnail+IS+NULL+ORDER+BY+RANDOM%28%29+LIMIT+1%3B')
randomFinds <- fromJSON(search)
## grab the info, put it into a dataframe
df <- as.data.frame(randomFinds$rows)
artifactNumber <- df$V1
generalDescription <- df$V3
contextFunction <- df$V17
thumbnail <- df$V36

## write a tweet
tweet <- paste(artifactNumber,generalDescription,contextFunction, sep=' ')

## thank god the images have a sensible naming convention;
## grab the image data
imagedir <- randomFinds$results$imagedir
image <- paste0(artifactNumber,'.aa.cs.thumb.png')
imageUrl <- paste0('http://source.techno-science.ca/artifacts-artefacts/images/', URLencode(image))

## but sometimes, despite my sql, I get results where there's an issue with the thumbnail
## so we'll test to see if there is an error, and if there is, we'll swap in
## an image of the Museum's lighthouse, to signal that well, we're a bit lost here
if (http_error(imageUrl)){
  imageUrl <- paste0('https://ingeniumcanada.org/sites/default/files/styles/inline_image/public/2018-04/lighthouse_.jpg')
  tweet <- paste(artifactNumber,generalDescription,contextFunction, "no image available", sep=' ')
}

## then we download the image so that we can upload it within the tweet
temp_file <- tempfile()
download.file(imageUrl, temp_file)

So all that will construct our tweet.
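
For anyone who would rather not use R, the same query-and-build step might look something like this in Python. It is a sketch under assumptions: it assumes the API returns a "rows" list of lists (which is what the V1, V3, V17 columns in the R code suggest; newer Datasette versions may shape the JSON differently), the function names are invented for illustration, and unlike R's paste() it silently drops empty fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://cstm-artefacts.vercel.app/cstm.json"
SQL = (
    "SELECT * FROM artefacts "
    "WHERE GeneralDescription IS NOT NULL AND thumbnail IS NOT NULL "
    "ORDER BY RANDOM() LIMIT 1"
)


def build_query_url(base_url=BASE_URL, sql=SQL):
    """URL-encode the SQL and attach it to the Datasette JSON endpoint."""
    return base_url + "?" + urlencode({"sql": sql})


def fetch_random_row(url):
    """Fetch the JSON and return the first row (assumed to be a list of values)."""
    with urlopen(url) as response:
        payload = json.load(response)
    return payload["rows"][0]


def build_tweet(row):
    """Join artifact number, general description and context into one string.

    Positions 0, 2 and 16 mirror V1, V3 and V17 in the R script.
    """
    parts = [row[0], row[2], row[16]]
    return " ".join(str(part) for part in parts if part)
```

Something like build_tweet(fetch_random_row(build_query_url())) would then give you the text of the tweet.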

Authenticate….Authenticate…

The next issue is setting up a bot on twitter, and getting it to… tweet. You have to make a new account, verify it, and then go to developer.twitter.com and create a new app. Once you’ve done that, find the consumer key, the consumer secret, the access token, and the access secret. Then, make a few posts from the new account as well just to make it appear like your account is a going concern. Now, back in our script, I add the following to authenticate with twitter:

findsbot_token <- rtweet::create_token(
  app = "THE-EXACT-NAME-YOU-GAVE-YOUR-APP",
  consumer_key = "THE-KEY-GOES-HERE",
  consumer_secret = "THE-SECRET-GOES-HERE",
  access_token = "THE-TOKEN-GOES-HERE",
  access_secret = "THE-ACCESS-SECRET-GOES-HERE"
)

# post the tweet
rtweet::post_tweet(
  status = tweet,
  media = temp_file,
  token = findsbot_token
)

And, if all goes according to plan, you’ll get a “your tweet has been posted!” message.

Getting the authentication to work for me took a lot longer than I care to admit; the hassle was all on the developer.twitter.com site because I couldn’t find the right damned place to click.

Secrets

Anyway, a bot that tweets when I run code on my machine is cool, but I’d rather the thing just ran on its own. Good thing I have Dan on speed-dial.

It turns out you can use Github Actions to run the script periodically. I created a new public repo (Github actions for private repos cost $) with the intention of putting my bot.R script in it. It is a very bad idea to put secret tokens in plain text on a public repo. So we’ll use the ‘secrets’ settings for the repo to store this info, and then modify the code to pull that info from there. Actually, let’s modify the code first. Change the create_token to look like this:

findsbot_token <- rtweet::create_token(
  app = "objectbot",
  consumer_key =    Sys.getenv("TWITTER_CONSUMER_API_KEY"),
  consumer_secret = Sys.getenv("TWITTER_CONSUMER_API_SECRET"),
  access_token =    Sys.getenv("TWITTER_ACCESS_TOKEN"),
  access_secret =   Sys.getenv("TWITTER_ACCESS_TOKEN_SECRET")
)

Save, and then commit to your repo. Then, click on the cogwheel for your repo, and select ‘Secrets’ from the menu on the left. Create a new secret, call it TWITTER_CONSUMER_API_KEY and then paste in the relevant info, and save. Do this for the other three items.
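
One gotcha: a secret that was never created simply arrives as an empty string, and the failure only shows up later as an authentication error. A quick pre-flight check can save some head-scratching. Here is a sketch of the idea in Python (the bot itself is in R, where Sys.getenv() behaves the same way); the helper name missing_secrets is hypothetical.

```python
import os

REQUIRED = [
    "TWITTER_CONSUMER_API_KEY",
    "TWITTER_CONSUMER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
]


def missing_secrets(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED if not environ.get(name)]
```

If missing_secrets() returns anything other than an empty list, stop there and go fix the repo settings before trying to post.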

One thing left to do. Create a new file, and give it the file name .github/workflows/bot.yml ; here’s what should go inside it:

name: findsbot

on:
  schedule:
    - cron: '0 */6 * * *'
  workflow_dispatch:
    inputs:
      logLevel:
        description: 'Log level'
        required: true
        default: 'warning'
      tags:
        description: 'Run findsbot manually'
jobs:
  findsbot-post:
    runs-on: macOS-latest
    env:
      TWITTER_CONSUMER_API_KEY: ${{ secrets.TWITTER_CONSUMER_API_KEY }}
      TWITTER_CONSUMER_API_SECRET: ${{ secrets.TWITTER_CONSUMER_API_SECRET }}
      TWITTER_ACCESS_TOKEN: ${{ secrets.TWITTER_ACCESS_TOKEN }}
      TWITTER_ACCESS_TOKEN_SECRET: ${{ secrets.TWITTER_ACCESS_TOKEN_SECRET }}
    steps:
      - uses: actions/checkout@v2
      - uses: r-lib/actions/setup-r@master
      - name: Install rtweet package
        run: Rscript -e 'install.packages("rtweet", dependencies = TRUE)'
      - name: Install httr package
        run: Rscript -e 'install.packages("httr", dependencies = TRUE)'
      - name: Install jsonlite package
        run: Rscript -e 'install.packages("jsonlite", dependencies = TRUE)'
      - name: Install digest package
        run: Rscript -e 'install.packages("digest", dependencies = TRUE)'
      - name: Create and post tweet
        run: Rscript bot.R

If you didn’t call your script bot.R then you’d change that last line accordingly. Commit your changes. Ta da!

The line that says cron: '0 */6 * * *' is the actual schedule. The five fields are minute, hour, day of month, month, and day of week, so this one means ‘at minute 0 of every sixth hour’: four runs a day, at 00:00, 06:00, 12:00 and 18:00 (Github Actions schedules run on UTC). There’s a handy crontab quick reference here: https://www.adminschoice.com/crontab-quick-reference .

If you want to test your workflow, click on the ‘actions’ link at the top of your repo, then on ‘findsbot’. If all goes according to plan, you’ll soon see a new tweet. If not, you can click on the log file to see where things broke. Here’s my repo, fyi https://github.com/shawngraham/cstmbot.
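
If you want to check a schedule without squinting at a reference chart, the minute and hour fields are easy enough to expand yourself. A rough sketch, assuming only the simple field forms (*, */n, a-b, and comma lists) and ignoring the day and month fields entirely; the function names are my own invention:

```python
def expand_field(field, low, high):
    """Expand one cron field such as '*', '*/6', '1-5' or '0,30' into sorted ints."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return sorted(values)


def daily_run_times(expression):
    """Return the HH:MM times a five-field cron expression fires on an active day."""
    minute, hour, _day, _month, _weekday = expression.split()
    return [
        "{:02d}:{:02d}".format(h, m)
        for h in expand_field(hour, 0, 23)
        for m in expand_field(minute, 0, 59)
    ]
```

For the schedule used here, daily_run_times("0 */6 * * *") gives the four daily run times.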

So to reiterate – we found a whole bunch of open data; we got it online in a format that we can query; we wrote a script to query it, and build and post a tweet from the results; we’ve used github actions to automate the whole thing.

Oh, here’s my bot, by the way: https://twitter.com/BotCstm

Time for a drink of your choice.

Postscript

Individual objects are online, and the path to them can be built from the artefact number, as Steve Leahy pointed out to me: https://ingeniumcanada.org/ingenium/collection-research/collection-item.php?id=1979.0363.041 Just slap that number after the php?id=. So, I added that to the text of the tweet. But this also sometimes causes the thing to fail because of the character length. I’m sure I could probably test for tweet length and then swap in alternative text as appropriate, but one thing at least is easy to implement in R – the use of a URL shortener. Thus:

library(urlshorteneR)

liveLink <- paste0('https://ingeniumcanada.org/ingenium/collection-research/collection-item.php?id=', artifactNumber)
shortlink <- isgd_LinksShorten(longUrl = liveLink)

tweet <- paste(artifactNumber,generalDescription,contextFunction,shortlink, sep=' ')

Which works well. Then, to make sure this works with Github actions, you have to install urlshorteneR with this line in your yaml:

      - name: Install urlshorteneR package
        run: Rscript -e 'install.packages("urlshorteneR", dependencies = TRUE)'

ta da!
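
As for the tweet-length test I waved my hands at: the logic is simple enough to sketch. This Python version is an illustration only, not what the bot runs. It assumes a flat 280-character limit counted with len(), whereas Twitter's own counting treats links and some characters differently, and fit_tweet is a made-up name.

```python
TWEET_LIMIT = 280  # assumed limit; Twitter's real counting rules are more involved


def fit_tweet(parts, link, limit=TWEET_LIMIT):
    """Join the text parts, trim them if needed, and always keep the link intact."""
    text = " ".join(part for part in parts if part)
    room = limit - len(link) - 1  # leave space for the separator and the link
    if len(text) > room:
        # cut one character short of the room so the ellipsis still fits
        text = text[: room - 1].rstrip() + "…"
    return text + " " + link
```

The design choice here is to sacrifice description text rather than the link, since the link is what lets a reader get to the full record.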

Why I Will Never Use My University’s LMS Again

There is a new LMS coming to Carleton. The switch has been flipped. We’re moving from Moodle to Brightspace. All of the things that we used to do for ourselves now depend on an office somewhere in Kitchener-Waterloo (24 hour support!).

This post is not a long reasoned argument about the use or not of LMS in higher ed. It’s just what I’m feeling right now about our particular circumstances. I’m imagining where things might lead. Let me share my worries.

I would be delighted to be wrong about all of this.

Every higher ed institution in Ottawa now uses Brightspace. And because of historical agreements, many students and programs end up taking classes across the different institutions. Can you imagine the pressure towards a ‘standardized’ experience that this would create? “It’s too confusing to students to navigate all these different course designs!” it will be said. It is said. I’ve heard it said.

So we’ll be encouraged to use the resources of our education support people to design our courses. At which point, it’d be good to check the fine print. Your course is your course, of course, unless you are contracted to design and teach a course, or perhaps you’ve used too many institutional resources to build it… at a certain point, your course might not be yours after all. Teaching from beyond the grave, anyone?

But we’ve got standardization. Multiple choice, short answer, essays, that’s standard. Easy to roll out. Everyone understands that game. You can’t really experiment or try to ungrade or empower your students, when everything’s standard. Not particularly good pedagogy; not really higher education, but by god it’s easy to churn out course shells fast. But oof, now there’s all this cheating. Better get eproctoring. Better get plagiarism detection. Examine how we got to this point where cheating is a rational response to a series of hoops that carry little pedagogical value? Perish the thought. And don’t point out the problems with these ‘solutions’, lest someone hit you with a SLAPP. Imagine if the money spent on contracts for all this was put into providing stability for contingent lecturers – you want to make a difference for student experience? That’s where I’d spend the money.

This fall, we move back to the classroom for the smaller classes; or at least, that’s the plan. But all the big classes – or the classes that could be made bigger – well, keep an eye out…

So – I won’t use the LMS because it reifies a model of teaching I don’t believe is good pedagogy. While I still have the position and privilege to resist, I will. The more one uses an institutional LMS (or is compelled to use it), the more all of our freedom to teach using a different model – exposing students to other ways of learning, to ungrade, to turn things inside out – is eroded. I keep control of my teaching materials by making them open on the web; I keep control by giving it all away. It’s out there, on my own terms, and for good or for ill, people know what I’ve done. It pushes in ways that are uncomfortable, it makes space for things that don’t work out and that makes students extremely uncomfortable: I want them to try things that just might not come together within a seminar. But I don’t grade the thing, I work with the student to understand their process. This ain’t standard. It doesn’t scale. By design. (I once argued in a meeting that a class of 600 students was unethical. The instructor for that class was present. You can imagine how that went over).

But the pressure is mounting.

Consider the scenario- All those rich juicy data points that come from using the LMS. ‘We’ use those for their own good! We can see how many times they log into the LMS, correlate that with their GPA, cross reference with their demographic profile! But woops, Dr. Graham’s class doesn’t use the lms… that sure messes up those students’ analytics profiles, right? They’d be unfairly marked as ‘at risk’ (or some other consequence) because they don’t have as many ‘touch’ points as the others. That’s just one scenario. Others can be imagined.

Look, I worked in for-profit online education. These things exist. Where I worked, they also used the same tools to turn the gaze onto the faculty. Not enough points of contact with the system in a defined time frame? You got the ax.

But also: monoculture. No ecosystem survives monoculture. If everything’s standardized, nothing’s special, so why do we have four higher ed institutions in Ottawa anyway…

My ability to predict the future has always been poor, so I look forward to being proven wrong about all of this. But right now…

 

My Opening Remarks for HeritageJam 2021

After I remembered to unmute my mic (d’oh) this is what I said…

Welcome to HeritageJam 2021! This is the fourth iteration of the jam, and the first to be located outside of the UK, if you will accept that my basement office where I am now sitting constitutes the location of the jam. I’m Shawn Graham, and I’m at Carleton University in Ottawa, Canada. My colleagues on the Jam are Sierra McKinney and Katherine Cook, from the University of Montreal, and Stuart Eve from Bournemouth University and L.P. Archaeology, and I am so grateful that they are on board for this mad enterprise! This jam would not be possible without funding from Carleton University and the Social Sciences and Humanities Research Council of Canada.

I want to begin by acknowledging that where I live and work is on the unceded traditional lands of Algonquin Anishnaabeg. It is customary now at Carleton to begin events by making that land acknowledgement; but it seems to me that too often we just then continue on to do what we were going to do anyway. So one of my goals for this year’s Heritagejam is that we keep that land acknowledgement uppermost in our minds as we craft, create, and explore. The theme for this year’s HeritageJam is ‘sensation’, so one way to think about that theme is in the context of the land or territory you are in or on whose territory your work depends. What sensations in us, or in the public, should a land acknowledgement generate? How can we make that acknowledgement meaningful wherever we are, and however we might interpret ‘sensation?’ Sensations can be troubling; they can be enchanting. Perhaps we encounter sensations when we are confronted by eruptions of deep time in the present: how can we convey that sensation?

Today you will have an opportunity to meet the entire HeritageJam team, to hear more about how the Jam will unfold, gain inspiration and encouragement from past examples of work, and to meet other jammers with whom you are welcome to collaborate. I have participated in each iteration of the Jam, and what excited me then – and continues to excite me now – is that this opportunity to be wholly creative for the sake of thinking differently about heritage refreshes me; it re-invigorates me and reminds me that there are so many ways other than essays, articles, and monographs to engage the past. Each time I’ve participated in the jam, it has redirected me into new avenues that simply enrich my daily life.

This is the first edition of the Heritagejam that is completely virtual. In years past, there has been an in-person two-day event where folks would come together, break into teams, and over the course of the two days make something – sometimes, it was completely fully-formed; other times, it was more like a design or prototype. There was always a virtual component where people working remotely could produce something in the month leading up to the in-person jam.

I’m going to share my screen now. Here’s the 2014 jam, the first of its name. One of my favourite entries is this comic, by Nela Scholma-Mason; as you explore the entries, take a look at each entry’s paradata document. If data are things we study/use, and metadata describe the data, then paradata describe our process. Scholma-Mason’s paradata is a wonderful piece of zine making in itself! Another favourite of mine from the first jam is ‘Buried’, by Tara Copplestone and Luke Botham, a piece of interactive literature made with the Twine text game engine.  (‘Buried’ is available on the Internet Archive at https://web.archive.org/web/20161228094857/http://taracopplestone.co.uk/buried.html).

In the 2015 jam, take a look at Jens Nortoff’s sketch. It’s a quick thing he put together while out in the field; Heritagejam entries don’t have to be ‘digital’. In the 2017 jam Andrew Reinhard invented a deck builder game around the archaeological idea of ‘assemblage’.

So you can do just about anything you set your mind to; it is entirely ok and appropriate to submit a _design_ idea, a mock up, a wireframe, a powerpoint that uses found imagery to give us an idea of what you have in mind. Just take a look at the rules page, and get in touch with us. EVERY entry must include a paradata document. We follow the London Charter, which calls paradata the “documentation of the evaluative, analytical, deductive, interpretative and creative decisions made in the course of … visualisation” to allow a clear understanding of how the visualisation came into being. Your paradata can be in whatever format you’d like, although you’ll probably find that a page or two of text is most straightforward.

Now, to help you get started, I’m going to ask Sierra, Katherine, and Stuart to say a few words about their own creative processes and their experiences.

(Stu was working in the field and was not able to join, as the platform we were using, it turned out, did not support mobile, which was my fault for not checking first.)

Now, you’ll notice that there are different areas marked out in this room according to the canonical western senses. Feel free to move around into an area that captures something about what you might be interested in exploring with regard to our theme, ‘sensation’. When your avatars are in close proximity, you will be able to see and speak to each other. Introduce yourself and perhaps begin by wondering what ‘sensation’ might mean in terms of land acknowledgements. Sierra, Katherine, Stuart and I will circulate; after about 20 – 30 minutes we’ll wind up the session.

At this point, a kind of unconference took place, with conversations taking place mostly in the ‘hearing’ and ‘taste’ circles. It’ll be interesting to see what emerges at the end of the month!

Thank you everyone! Our time for today is now up, but I am grateful that you were willing to spend it with us; I hope you’ve found new friends and collaborators here, and I encourage you to use our HeritageJam discord server for companionship while you heritagejam! Your creations and paradata can be submitted to our heritagejam email, and you can always contact me or ask for help in the discord as you need it.

HeritageJam 2021 is Go: The Sensation of the Past

HeritageJam 2021 is go! http://www.heritagejam.org/

How the past is conceptualised – how it is presented graphically, acoustically, haptically, olfactorily, vocally, and in other performative capacities – has a significant impact upon people’s understanding of themselves and the world around them. It is fundamental to influencing the degree of importance that individuals and communities assign to their environment, and how they care for that environment in the present and build upon it in the future. The artistry and enquiry that are invested into this creative work have known effects not only on public perception but on the whole trajectory of heritage study and practice – from research to policy-making to protection and conservation. The Heritage Jam is about showcasing the presentation of the past, and drawing together the many people invested in such presentation.

The Heritage Jam began in 2014 at the University of York in the UK as a way to bring people together to design and create forward-thinking pieces of heritage visualisation in a short space of time. This year, it’s hosted by Shawn Graham at Carleton U, Katherine Cook at the Universite de Montreal, and Stuart Eve of L-P Archaeology plc.

The Heritage Jam is open to anyone interested in the way heritage is visualised: we call to artists, animators, game designers, programmers, archaeologists, historians, conservators, museum professionals, heritage practitioners, and any interested members of the public to join forces and collaborate. The outcomes of the Jam are hugely varied – ranging from fine art pieces, 3D models and games through to stories, sketches and videos. The only limits on creation are the theme, time and your imagination!

The Jam will take place entirely online over the month of April, 2021. Submissions will be due at Midnight (eastern), on April 30th.

We will host a kick-off event on March 31st at 1 pm eastern. Make sure to register your intent to participate in the jam via the sign-up form. We will also host a Discord server where you can casually drop in to have some company while you jam. Registered participants will be sent the invitation links for the kick-off event and the discord.

The theme for this year is ‘sensation’. We’ve all had to endure multiple lockdowns and isolation as a result of the COVID-19 pandemic. As the snow melts here in Ottawa, and we begin to feel the sun again on our faces, our senses perhaps are overwhelmed… what does ‘sensation’ mean to you in terms of heritage, history, and archaeology, as we approach another summer? (Some resources are available here https://dhmuse.netlify.app/building/technotes-toc/ to help you get started; for inspiration, see past HeritageJam entries!)

Entries are welcome in either English or French; there will be separate awards in both languages. Winning entries and Runners-Up will be invited to publish their work in Epoiesen: A Journal for Creative Engagement in History and Archaeology. A submission page for your entries will be made available on this site towards the end of April.

Even if you intend to create something as part of a team, please complete the sign-up sheet as an individual (so that we can send invitation links and so on to you); when you submit your entry, you will be able to indicate whether or not it’s part of a team entry (and who the team members are!) then. Even if you don’t sign up for the kick-off you can still participate in the jam – just submit your entry on April 30th! Make sure to also tweet about it using the #thj2021 tag.

The Dig: We Know Where the Bodies are Buried

Andrew Reinhard and I have been at it again. We wondered what the archaeology of Sutton Hoo might sound like. There are a lot of ways one could’ve approached this. We could’ve tried to recreate a soundscape – of the moment of the ship burial, or the moment of its excavation, for instance. We might have found tabular data from the various excavations and projects and maybe mapped the differing amounts of different kinds of artefacts by period they date to – or days they were found – or locations found in the earth (x,y,z making a chord, maybe). Maybe there is geophysics data (magnetometry, georadar, etc) and we could’ve approached things a la Soundmarks.

We instead looked at one piece of the public archaeology literature around Sutton Hoo – ‘The Sutton Hoo ship-burial: a handbook’ by R. L. S. Bruce-Mitford, as reproduced on the Internet Archive. I copied the text, and then divided it up into ‘documents’ of one page each. These I fed into a topic modeling routine I use in my teaching (written in R; see the course website). A topic model is a way of asking the machine, ‘if there are 15 topics in this corpus of material, what are they about?’. The machine will duly decompose the material, looking at statistical patterns of word use in the ‘documents’ (here, individual pages) to try to sort those patterns into 15 buckets of words which we as the humans involved can then look at and say, ‘oh yes, that’s clearly a topic about English myth-history’. The result was this:

 

Notice how each chunk adds up to 1. I then took the underlying proportions for each chunk for four separate topics that seemed interesting: ‘coins date time hoard merovingian’, ‘sutton hoo swedish jewellery’, ‘gold plate figure purse buckle’ and ‘burial pagan grave christian east’. Those raw numbers, ranging between 0 and 1 (ie, the proportion each topic goes towards forming those chunks of writing) I multiply by 100 and then scale against 1 – 88 for the 88-key piano keyboard. Think of each topic as now a voice in a choir, each one singing their note on the beat. Musically, a bit boring, but to the intellect, interesting; Andrew and I are still working with that data (mapping to instruments, remixing to bring out particular themes and so on). I am also interested in coding music, though I am very bad at it; I turned to Sam Aaron’s Sonic Pi live-music-coding synth. Building on some sample code I wrote a little piece that kinda looks like this:

with_fx :reverb do
   in_thread do
    loop do
     # each ring holds the scaled proportions for one topic (values elided here)
     notes = (ring 20, 50, 21, 50) # etc etc: these are the proportions for the first topic
     notes2 = (ring 6, 14, 59) # etc etc
     # notes3 etc
     # notes4 etc
     use_synth :piano
     play notes, release: 0.1, amp: rand, pan: rrand(-1, 1)
     play notes2, release: 0.1, amp: rand, pan: rrand(-1, 1)
     # play notes3 etc
     sleep 0.25
    end
   end
 end

with_fx :wobble, phase: 2 do |w|
  with_fx :echo, mix: 0.6 do
    loop do
      sample :drum_heavy_kick
      sample :bass_hit_c, rate: 0.8, amp: 0.4
      sleep 1
    end
  end
end

and then I let that play; because it’s a live coding synth, you can make changes on the fly and layer those changes as you play. So not just sonification, a kind of digital instrument and performance. It’s not just the data you’re hearing, it’s my coding choices and my performance ability. I sent the result to Andrew and he immediately saw how the emotional impact of that music matched the latent horror of the film, and recut the trailer appropriately. Below, you can hear the result (and if dmca takes down the video, you can also see it on Twitter). This reconstitution of Bruce-Mitford’s writing is a kind of digital body horror on a corpus of thought, perhaps. The archaeological uncanny always eventually emerges.
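
The scaling step mentioned earlier (topic proportions between 0 and 1, multiplied by 100 and mapped onto the 88 keys) can be read a few ways. One plausible reading, sketched here in Python as an assumption and not as a record of what I actually did, is a straight linear map from 0–100 onto keys 1–88; the function name is hypothetical.

```python
def proportion_to_key(proportion, keys=88):
    """Map a topic proportion (0.0 to 1.0) onto a piano key number (1 to keys)."""
    percent = proportion * 100
    # linear map: 0 percent lands on key 1, 100 percent on the top key
    key = 1 + round(percent / 100 * (keys - 1))
    # clamp in case a value strays slightly outside 0..1
    return max(1, min(keys, key))
```

Run over a page-by-page list of proportions, this yields the kind of number sequence that goes into each ring in the Sonic Pi code.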

 

PS: Youtube hit me with a copyright infringement the instant I uploaded that video. If it doesn’t play, you might be able to see it here:

From Hypothesis Annotation to Obsidian Note

Obsidian is a really nice interface for keeping zettelkasten-style notes (in individual markdown files, in a folder or ‘vault’). Hypothesis is a really nice interface for annotation on the web. Wouldn’t it be nice to be able to drop your annotations as unique files into your vault?

Well, this might work.

First, get ‘Hypexport’ from https://github.com/karlicoss/hypexport . Install it with

pip3 install --user git+https://github.com/karlicoss/hypexport

Then, create a new text file; call it secrets.py and put into it your Hypothesis username and your developer token (which is underneath your username when you have hypothesis open) like so:

username = "USERNAME"
token = "TOKEN"

Now, you can grab all of your annotations with:

python3 -m hypexport.export --secrets secrets.py > annotations.json

Now we need to turn that json into markdown. Incidentally, if you want to turn it into a csv, get jq and run something like this

jq -r '.annotations[] | [.text, (.tags | join(";")), .updated, .uri] | @csv' annotations.json > annotations.csv

So, here’s a json to markdown script: https://github.com/PolBaladas/torsimany . Pip install that, but then find where it’s located on your machine (search for torsimany.py) and change this line


data = f.read().decode('ascii', 'ignore')

to just

data = f.read()

and then run 

torsimany annotations.json

at the command prompt, and after a bit you’ll have a file called annotations.markdown.

Last thing – we want to split that up into separate markdown files, to drop into the obsidian vault. csplit, split, awk, etc, all of those things will probably work; here’s some perl. Copy it into a text file, save with .pl, and if you’re on a mac, run

chmod +x split.pl

so you can run it. (Sourced from stackoverflow):

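#!/usr/bin/perl

# slurp the whole input file at once instead of reading line by line
undef $/;
$_ = <>;
$n = 0;

# split just before every '### Title' heading; write each chunk to temp1, temp2, ...
for $match (split(/(?=### Title)/)) {
      open(O, '>temp' . ++$n);
      print O $match;
      close(O);
}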

then run

./split.pl annotations.markdown

and you’ll have a whoooole lotta files you can drop into your obsidian vault. Ta da!

Now, you’ll have to add the .md file extension, which can be done as a batch with this one liner on a mac:

for file in temp*; do mv "$file" "$file.md"; done

It’d be nice to have the correct file extension done in my split script, but whatever. Above, a portion of one of my recent annotations, exported and then turned into markdown through the above process.
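
If you’d rather do the split and the renaming in one go, here is a rough Python equivalent of the perl script that writes the .md extension directly. It is a sketch, assuming (as the perl does) that every annotation in annotations.markdown starts with a ‘### Title’ heading; the function names and the temp prefix are just illustrative.

```python
import re
from pathlib import Path


def split_markdown(text, marker="### Title"):
    """Split text just before each occurrence of marker, dropping blank chunks."""
    chunks = re.split(r"(?={})".format(re.escape(marker)), text)
    return [chunk for chunk in chunks if chunk.strip()]


def write_notes(chunks, out_dir, prefix="temp"):
    """Write each chunk to out_dir as prefix1.md, prefix2.md, ... and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chunk in enumerate(chunks, start=1):
        path = out / "{}{}.md".format(prefix, number)
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

Read annotations.markdown into a string, pass it through split_markdown, and point write_notes at your vault folder (or at a scratch folder first, to be safe).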

Now, hypothesis allows any user to annotate in the public stream; here’s a Zotero-Hypothesis importer that works through your zotero library, checks whether there are any public annotations for a piece, and saves them to a zotero note: https://github.com/JD-P/hypothesis-zotero.

I haven’t tried it out, but if it works, and once your notes are in Zotero, you can use zotero-mdnotes to push ’em all into your Obsidian vault. Talk about distributed knowledge generation!

My own travel ban

So, back when it all began, I wrote ‘Why I’m Not Travelling to the US’. Over the past four years, I went to 0 conferences in the US (of course, the pandemic these last 10 months helped with that too). I went to the states 3 times during that period to work with students at Muhlenberg, Drew, and U Mass Amherst.

So, no conference/self promotion travel, limited travel to support students.

Knowledge Management

This is my first post on wordpress.com where I have to use the Gutenberg editor and man I hate it. I should probably migrate this damned thing – 14 years old! – out of wordpress. I’ve wasted a huge amount of time just trying to get things organized so I know what the hell is going on. Yeah, I can’t stand change, once I figure out how to do something the way I like it. But, on a similar theme… I’m still trying to figure out how to *take good notes*.

When I was a PhD student, I kept incredibly dense and largely useless notes. By ‘useless’ I mean – once I’d made an entry in my great big notebook, it was incredibly hard to find it again. To see which other materials it might speak to. I had several notebooks, where I’d duly copy out interesting passages and make my observations, and then shove slips of paper or post-it notes in on those pages where I thought ‘hey, this might be important and I really need to find this again’. Because I had no structure for getting into and through those notes, I’ve never used those notebooks since. Which is a waste.

Nobody ever taught me how to take notes.

When it came time to write, I’d get a big piece of paper and try to sketch out how the Big Idea I was writing about worked. I’d write down page numbers, cryptic directions to various pages in various notebooks, half-baked references, remembrances of important things I’d read, and draw circles and lines and swoops and eventually something would emerge out of that, but it was a messy, wasteful process.

What I’ve been searching for ever since was a way where I could capture the exciting ideas I was reading, the interesting thoughts I was having, in such a way that knowledge would crystallize out of the mess. With time, I’ve started to come up with a way that works on paper – a notebook with a line down the middle of the page; observations or important phrases copied on one side, with my reflection or thoughts on the other side. A citekey scribbled at the top of the page to connect to my reference manager (I’ve used all of the reference managers, it seems). An index page at the front of the notebook. Then, when it comes time to write, my Big Page is at least a little bit tidier with references to ‘orange nb p24 re bennett 32’.

There are a number of posts on this ol’ blog about taking notes, and different systems I’ve tried to cobble together to make ’em. In recent years, I’ve really become interested in the whole Zettelkasten scene; the basic idea is one idea, one note. Sometimes I copy the relevant passage down, but most of the time, it’s just me riffing on something I’ve read; usually no more than three or four sentences. Then a system for indexing these so that notes can be compiled into larger overview notes or broken back down again. It doesn’t have to be digital, but of course, digital search and storage makes life easier. I’ve used everything from Notational Velocity through to plugins and mods for Sublime Text or Atom. And these all work in the sense that I’m able to pull together all of my relevant atomic notes and sometimes – if I’ve been really switched on – the compiled overview note goes into whatever I’m writing in its entirety. The note taking process is the writing process.

I really like when that happens.

Unfortunately, I find it hard to maintain the use of these different packages for zettels consistently. I think the reason is that, despite the ability to recombine, search, and find my atomic notes, I still can’t see the connections between things very well. But, with my most recent book project finally out of the way, and a bunch of other things finally having made their way through the publishing process, I’m ready to start again.

And boy, how the landscape has changed!

Roam, Foam, Org-roam, and Obsidian

It was a chance tweet I saw by Jonathan Reeve that sent me out on this latest note-taking odyssey, by the way:

I had to investigate. The major thing that has changed, I think, is that the idea of ‘networked thought’ has really entered into the note-taking space. And I think that’s what I’ve always missed in my note-taking process. The idea is that if you make connections as appropriate between ideas, eventually larger structures emerge; these larger structures (network structures like shortest paths, clusters of various kinds, most-central nodes of various kinds) can give insight into the nature of your thoughts/nodes and perhaps suggest insights that you might not otherwise have spotted.

There is a wide array of editors to help you with this, all of which include network visualizations of links, backlinks, and tag structures. Some, like Roam, are subscription based and keep your notes somewhere on the cloud; others, like Foam or Org-Roam, are open source and keep your notes locally as markdown files (though Org-Roam is built on emacs and life’s too short). Then there’s Obsidian, which is not open source, but does keep your notes as separate markdown files. It has a pretty slick interface, and it will publish and host your ‘vault’ (folder of notes) as a website if you so wish (for a hosting fee, which seems pretty reasonable). If you ever read Caleb McDaniel’s ‘Open Notebook History‘ that feature will be quite interesting.*

I’ve been kicking the tires on Obsidian for the last week, and I have to say, I quite like it. I have a few community plugins installed that let me ‘refactor’ (break apart or merge together) notes as appropriate, that let me insert citations from my Zotero library (or create new notes from scratch on a given resource in my Zotero library) with links back to the original pdfs/resource, and a few cosmetic tweaks. New panes can be opened at will from a variety of places, and if you have the screen real-estate, organized however you like. I grabbed my existing folder of notes and opened it within Obsidian; I created a new index note to provide some consistent points of entry:

*I keep my notes, my ‘vault’, in a git-tracked folder, pushing online to a private git repo. I was also pushing to a public wikijs instance I host on Reclaim Cloud, but the importer broke and I can’t make it work any more. Anyway, that was probably too much – if I want to make my notes available online, I can probably just gh-pages them and that’ll serve. You can automate the process of pushing new notes to github; see this post by Bryan Jenks.

When you search for keywords or phrases or tags, the results of those searches can be turned into instant notes with wiki-style links. See that graph at top right? The green nodes are tags, the blue are notes, the red are notes that I’ve created while writing other notes that remain to be filled in.

Workflow

So here’s my workflow. I have Zotero and Zotfile installed, so I can send pdfs to a folder on my ipad. On the ipad, I use pdf reader to annotate. Zotfile retrieves these and pulls them back into Zotero. I use zotero-mdnotes to push the notes to my folder (‘vault’) of notes. (If I’m reading something physical, I can just mark it up or use my paper notebooks as before, and then transfer/consolidate notes into a new note in Obsidian.)

These I can then refactor into individual atomic notes as necessary. Using Better BibTeX for Zotero, I have also exported (with constant updating) my library’s bib file (as csl json) to the vault; I can then add the cite-key to any atomic note as appropriate. I add tags as appropriate. I link to other notes as appropriate. Obsidian shows me when a given note is referenced by another or mentions another and so I can use that to guide back-linking too.
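
As a rough illustration of what that exported file makes possible: CSL JSON is a list of records, and in a Better BibTeX export each record’s id should be the cite-key (worth checking against your own export before relying on it). Here is a minimal Python sketch that looks up a cite-key and builds a short reference line to paste into a note. The function names are made up for the example, and so are the file name and cite-key in the comments.

```python
import json
from pathlib import Path


def load_library(path):
    """Read a CSL JSON export and index its records by id (assumed to be the cite-key)."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return {record["id"]: record for record in records}


def short_reference(record):
    """Build an 'Author Year. Title' line from the usual CSL fields, where present."""
    authors = record.get("author", [])
    # CSL names usually carry 'family'; institutional authors use 'literal' instead.
    names = [a.get("family") or a.get("literal", "") for a in authors]
    who = ", ".join(n for n in names if n) or "Anon."
    date_parts = record.get("issued", {}).get("date-parts", [[]])
    year = date_parts[0][0] if date_parts and date_parts[0] else "n.d."
    return f"{who} {year}. {record.get('title', 'Untitled')}"


# Hypothetical usage (file name and cite-key are placeholders):
# library = load_library("library.json")
# print(short_reference(library["doe2020"]))
```

Keeping the lookup keyed on the cite-key means the same string works in the note, in Zotero, and in whatever I end up writing.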

Then, I can garden. By ‘gardening’, I mean, exploring my notes and their connections and thinking about what I’m seeing. Perhaps I add new notes. Perhaps I prune or delete notes. Perhaps I add more links or tags.

I love the graph feature. But I wish I could analyze it. There is a plugin that exports your graph to Neo4j for analysis, but that’s almost too much power for what I have in mind, and besides, you need to learn the Cypher query language to make sense of that kind of thing. The ‘Infranodus‘ platform might be worth exploring here, as it does network metrics and text analysis too and can ingest your notes (see for instance this post) but I didn’t feel like signing up with a credit card to something I just wanted to explore a bit (Infranodus can be installed locally, but it’s a beast of a thing to configure – it depends on Neo4j! – and after wasting the better part of a day on it, I threw in the towel).

No, good ol’ Gephi or Cytoscape or similar is all I need. So I did a bit of digging – where does Obsidian hold all of that info? It turns out, there is a json graph in a folder called ‘ObsidianCache’ that contains the current representation of your vault and its interlinkages:

Now, I’m certain that one could write a bit of python to grab each note and its links and tags, represent them as a graph, and then do a few network metrics. But I don’t know how to do that in python – yet. But I can do it with jq, and reshape the json so that I end up with note – link and note – tag pairs. Gephi doesn’t ingest json, so I use a bit of R to turn it into graphml. Hey presto, a network I can explore in Gephi! What are the most central ideas? What kinds of ‘communities’ exist? I am imagining that knowing this information would help kick start my writing, or help me detect emergent ideas I hadn’t considered yet. (Other people feed their notes into DEVONthink, which does some natural language magic to find connections in your notes. That’s another of the beautiful things about keeping your notes in plain text on your own machine).

The relevant jq query:

jq --raw-output '.metadata[] | {title: .frontmatter.title, tag: .tags[]?.tag}'

Then a bit of regex in sublime to put commas at the end of each line, wrap in square brackets, then a bit of R:

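library(igraph)    # graph construction and export
library(jsonlite)  # JSON parsing
setwd("~/Desktop")
# read the array of {title, tag} objects produced by jq
thing <- fromJSON("tag-test.json")
# the first two columns (title, tag) become the edge list
g <- graph_from_data_frame(thing, directed = FALSE)
# write graphml for Gephi
write_graph(g, "tag-2mode.graphml", "graphml")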
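
The jq, regex, and R steps could presumably be collapsed into one Python script. What follows is a sketch only. It assumes the cache file has the shape the jq query implies (a top-level `metadata` list whose entries carry `frontmatter.title` and a `tags` list of objects with a `tag` field), and the function names are placeholders of my own. Rather than graphml it writes a plain Source,Target edge list as csv, which Gephi can also import.

```python
import csv
import json


def note_tag_pairs(cache):
    """Yield (note title, tag) pairs from a parsed cache dictionary."""
    for entry in cache.get("metadata", []):
        title = (entry.get("frontmatter") or {}).get("title")
        if not title:
            continue  # skip notes with no title in their front matter
        for tag in entry.get("tags") or []:
            if tag.get("tag"):
                yield title, tag["tag"]


def write_edge_list(pairs, out_path):
    """Write the pairs as a Source,Target csv edge list."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Source", "Target"])
        writer.writerows(pairs)


def convert(cache_path, out_path):
    """Read the cache json file and write its note-tag edge list."""
    with open(cache_path, encoding="utf-8") as handle:
        cache = json.load(handle)
    write_edge_list(note_tag_pairs(cache), out_path)
```

One difference from the jq route: notes without a front matter title are dropped here rather than passed along with a null title.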

Open ‘er up in Gephi, using the multimode plugin to turn it from a network of notes to tags, to tags – tags by virtue of notes in common…

So obviously, there is some mucky data in my test vault, but interesting, eh? Incidentally, the ‘sg’ tag is for when I’ve had some inspiration that I want to come back to. And of course, maybe note to note by virtue of common tags would be a more interesting/useful view. Or perhaps, since a ‘tag’ could be considered as a kind of semantic note on its own, I just leave it notes – tags and treat it all as unimodal. Things to explore!
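
For what it’s worth, the projection that the multimode plugin performs is simple enough to sketch by hand. Assuming a list of (note, tag) pairs like the note – tag pairs described above, this bit of Python (function names are my own invention) collapses them into either a tag – tag or a note – note network, weighted by how many neighbours each pair shares, and then totals the weights per node as a crude first answer to ‘what is most central?’.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project(pairs, onto="tag"):
    """Collapse (note, tag) pairs into a weighted one-mode network.

    onto="tag": tags are linked when they share a note.
    onto="note": notes are linked when they share a tag.
    Returns a Counter mapping (a, b), with a < b, to the number of shared neighbours.
    """
    groups = defaultdict(set)
    for note, tag in pairs:
        if onto == "tag":
            groups[note].add(tag)
        else:
            groups[tag].add(note)
    weights = Counter()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return weights


def weighted_degree(weights):
    """Sum the edge weights touching each node, as a rough measure of centrality."""
    degree = Counter()
    for (a, b), weight in weights.items():
        degree[a] += weight
        degree[b] += weight
    return degree
```

Weighted degree is only the bluntest of the available metrics; anything subtler (betweenness, community detection) is still a job for Gephi.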

So we’ll see how things go. This morning, I spent a happy hour refactoring and building notes from a great article about archaeological photography at Dura Europos. Baird writes,

“Taking photographs, like drawing reconstructions, was a means by which the archaeologists could attempt to understand the object and the past and to rebuild the ruin. At Dura, photography was not a passive recording device as it is thought of in most histories of archaeology; rather, it was something that seems to have been an active means of constructing a particular past (fig. 5). Time in these photographs refers both to the practice of taking the photographs— the posing and framing—and the excavator’s construction of a time in the image; thus, they reflect a temporal breach that constructed an East in which modern peoples are equated with ancient.”

Active note taking, gardening our thoughts using these digital tools, seems to me a bit like how Baird writes about photography, perhaps. But I haven’t fully fleshed out that thought yet; perhaps it’s because it lets me build something new from others’ mental excavations of their own thought. Or I’m pushing the metaphor too far. Back to the garden I go!

Some useful videos

Below is a video in which PhD student Courtney Applewhite describes how she uses Obsidian to study for her comps; something similar to this approach might be worth adapting.

New paper out: Towards a Method for Discerning Sources of Supply within the Human Remains Trade via Patterns of Visual Dissimilarity and Computer Vision

We have a new paper out:

Graham, S., Lane, A., Huffer, D. and Angourakis, A., 2020. Towards a Method for Discerning Sources of Supply within the Human Remains Trade via Patterns of Visual Dissimilarity and Computer Vision. Journal of Computer Applications in Archaeology, 3(1), pp.253–268. DOI: http://doi.org/10.5334/jcaa.59

“While traders of human remains on Instagram will give some indication, their best estimate, or repeat hearsay, regarding the geographic origin or provenance of the remains, how can we assess the veracity of these claims when we cannot physically examine the remains? A novel image analysis using convolutional neural networks in a one-shot learning architecture with a triplet loss function is used to develop a range of ‘distances’ to known ‘reference’ images for a group of skulls with known provenances and a group of images of skulls from social media posts. Comparing the two groups enables us to predict a broad geographic ‘ancestry’ for any given skull depicted, using a mixture discriminant analysis, as well as a machine-learning model, on the image dissimilarity scores. It thus seems possible to assign, in broad strokes, that a particular skull has a particular geographic ancestry.”

Our code is at https://github.com/bonetrade/visual-dissimilarity

The key idea: a one-shot neural network can be used to measure the web of differences in carefully selected social media images (backgrounds removed) of human skulls. Patterns of similar *dissimilarities* can then be compared with osteological or forensic materials and then we can look at what vendors say about the remains. We find that the stories told are often dubious. The web of differences also seems to imply that Indigenous North American human remains are being traded, but not labelled as such. While bonetraders will be quick to point out that ‘buying human skulls is legal’ (and we’ll write more about that in due course), trading in Indigenous human remains gets into NAGPRA territory & it’s most definitely illegal (US): law.cornell.edu/uscode/text/18.
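
For readers unfamiliar with the terms: a triplet loss trains a network so that an ‘anchor’ image sits closer to a ‘positive’ (same class) than to a ‘negative’ (different class) by some margin, and the trained network’s embeddings can then be compared by distance. The toy Python sketch below illustrates only that general idea, in one common form, using hand-made vectors. It is not the code from the paper (that lives in the repository linked above), and the function names, margin, and any numbers fed to it are invented for illustration.

```python
import math


def distance(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)


def dissimilarity_profile(embedding, references):
    """Distances from one image's embedding to each named reference embedding."""
    return {name: distance(embedding, ref) for name, ref in references.items()}
```

A profile of distances to a shared set of reference images is the kind of thing that can then be handed to a classifier; how the paper actually does this is described in the article itself.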

 

Zettelkasten to Online Wiki

I was never taught how to take notes. Periodically, I try to develop better habits. I’ll go read various blogs, forums, product pages, looking for the thing that’ll make everything come together, make my reading more effective, make my thinking so much sharper…

sigh.

Some time ago I bought an iPad (‘it’ll be for research! honestly! for pdfs!’) and still my reading/note taking didn’t come together. I have LiquidText on it and PDF Viewer. LiquidText is pretty neat… but I find that its ability to pull multiple pdfs together, and all of its note taking and connecting, just doesn’t work for me – on an iPad. Part of the problem is that I wasn’t using it the way its designers imagined a person might use it. (Apparently, it’s now available on Microsoft devices and in that context I think it would really work for me). The other part was, well, probably a discipline thing. Or lack thereof.

PDF Viewer is a nice little app for reading pdfs, and when I tied it to Zotero with zotfile… now we’re talking! Got my notes back on my writing machine, so headway.

~

In the past, for various projects, I’ve tried the whole one-idea-per-card note taking system called ‘Zettelkasten‘. Combine that with an editor that does search and creation at the same time (like nvAlt), and I actually got kinda good at pulling stuff out and framing searches, finding connections between my notes. It’s a bit like ‘commonplace books‘, at least the way I’ve been using ’em. I’ve also been thinking of these in the context of open notebook science, reproducibility and that sort of thing – Caleb McDaniel put it best:

“The truth is that we often don’t realize the value of what we have until someone else sees it. By inviting others to see our work in progress, we also open new avenues of interpretation, uncover new linkages between things we would otherwise have persisted in seeing as unconnected, and create new opportunities for collaboration with fellow travelers. These things might still happen through the sharing of our notebooks after publication, but imagine how our publications might be enriched and improved if we lifted our gems to the sunlight before we decided which ones to set and which ones to discard? What new flashes in the pan might we find if we sifted through our sources in the company of others?”

He used an open notebook powered by Gitit to write his book Sweet Taste of Liberty and it won a Pulitzer Prize! (Caleb’s original open notebook).

So how do I put these ‘zettels’ online? ‘The Archive‘ is a nice little bit of software, developed on top of nvAlt, and I like how it works. I have it saving each note as an md file into a git repository on my machine. I push these things to a github repo. Now, there are plenty of static site generators that will turn a collection of markdown into a static website, but collaboration on the underlying files is still an iffy process. I spun up a wiki.js instance on Reclaim Cloud and then figured out how to connect it to the github repo (thread here).

I am now the proud owner of a wiki that my students can edit and collaborate with me on some of my larger projects (they can just use the web interface, which is nice, no faffing about); whenever I git pull I have their research to hand in my preferred note taking app; whenever I push they get my stuff. And our research is out there in the open.

Gotchas:

– configure storage to grab from github using https, not ssh

– spaces in file names will break the import/export

– set up a metadata template in ‘The Archive’ so that notes will render nicely there.
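
The spaces gotcha is the one most easily checked by machine. Here is a small Python sketch (the function names are mine, and it is a starting point, not a tested migration tool) that lists every markdown file in a folder of notes whose name contains a space, proposes a hyphenated name, and renames only when asked.

```python
from pathlib import Path


def safe_name(name):
    """Replace runs of whitespace in a file name with single hyphens."""
    return "-".join(name.split())


def plan_renames(folder):
    """List (old path, new path) for every markdown file whose name contains a space."""
    plan = []
    for path in sorted(Path(folder).rglob("*.md")):
        if " " in path.name:
            plan.append((path, path.with_name(safe_name(path.name))))
    return plan


def apply_renames(plan):
    """Carry out the renames, refusing to overwrite an existing file."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"{new} already exists; not renaming {old}")
        old.rename(new)
```

Run plan_renames first and read the list before calling apply_renames; any links between notes that depend on the old file names would need updating by hand.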

HIST3000|CLCV3000 Introduction to Digital Archaeology – Trailers!

I started scratching out ideas for what this ‘intro to digital archaeology’ class might look like as I taught my early summer course, ‘Crafting Digital History.’ Scratches became mindmaps and random scraps of paper and orphaned text files. One thing that I found really worked well with the DH course was that it had a regular beat to it. Each week, the rhythm and routine was the same, although within that there was a lot of choice about what to do and how to approach it. I want to preserve that for the digiarch class; I also want to provide more signposts along the way, so I’m planning to seed the readings with my own annotations using hypothes.is; I also saw someone on Twitter mention that they might embed short wee videos of themselves speaking about each reading, in the reading via annotation and I thought, ‘my god, that’s brilliant’ and so I’ll give that a try too. I have the link to the tweet somewhere, just not here as I write.

Anyway, in the interests of providing more structure and more presence, I’ve also been building trailers for the course and the modules within it. Making these has helped narrow down what it is I want to do; you can’t touch on everything, so you’d better go deep rather than wide. Without further ado…

and a bit about me…

Elegy for George Floyd

Today is the funeral of George Floyd, the man murdered by police in Minneapolis. Since his death, other instances of police brutality as the police riot have been collated in various places; one reckoning has over 400 instances (link here, kept by Greg Doucette, and just the ones that have been shared on Twitter!).

We – Andrew Reinhard and myself – wanted to honour George Floyd, and so we composed ‘Elegy for George Floyd’, a data composition built from sonifying the data in that spreadsheet and then remixing the results.

As you listen, you will hear a trumpet (police siren / police action) that waxes and wanes with the brutality of the action recorded. The reports for each incident were loaded into Voyant-Tools, where they were reorganized by the most common terms. Each word was then replaced in the report by its count; then all of the scores for each report were added up. This index value was then mapped against four octaves in D# minor, a key that invokes “…Feelings of the anxiety of the soul’s deepest distress, of brooding despair, of blackest depression, of the most gloomy condition of the soul. Every fear, every hesitation of the shuddering heart, breathes out of horrible D# minor. If ghosts could speak, their speech would approximate this key.” (source). These reports are scored into the music twice – one voice in whole notes, a second voice in arpeggiated chords to reflect the sirens and chaos of the police brutality.

Each city’s latitude and longitude and the cumulative report number were converted into chords and bass line.
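
For anyone who wants to try a similar mapping with their own data, the word-count scoring described above can be sketched in a few lines of Python. This is an approximation under stated assumptions, not the process we ran through Voyant-Tools: the tokenizer is a simplification, the scale is taken to be D# natural minor, the starting pitch (D#2) and the linear mapping from score to note are guesses, and the function names are made up for the example.

```python
import re
from collections import Counter

# Semitone offsets of the natural minor scale above its tonic.
NATURAL_MINOR = [0, 2, 3, 5, 7, 8, 10]
D_SHARP_2 = 39  # MIDI note number for D#2, an assumed starting pitch


def tokenize(text):
    """Lower-case a report and split it into words (a simplified tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def report_scores(reports):
    """Score each report as the sum of the corpus-wide counts of its words."""
    counts = Counter(word for report in reports for word in tokenize(report))
    return [sum(counts[word] for word in tokenize(report)) for report in reports]


def scale_notes(tonic=D_SHARP_2, octaves=4):
    """MIDI note numbers for `octaves` octaves of the natural minor scale."""
    return [tonic + 12 * octave + step for octave in range(octaves) for step in NATURAL_MINOR]


def scores_to_notes(scores, notes):
    """Map scores linearly onto the available notes, lowest score to lowest note."""
    low, high = min(scores), max(scores)
    span = (high - low) or 1  # avoid dividing by zero when all scores match
    return [notes[round((score - low) / span * (len(notes) - 1))] for score in scores]
```

The resulting list of MIDI note numbers can be written out with whatever MIDI library you prefer and then shaped by ear.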

The resulting sonification was then remixed, with an 808 bass line added. The 808 runs throughout the entire song, the heartbeat of George Floyd that abruptly stops at 8:46. It contrasts with the intrusive double-bass of the police line generated in the original sonification. The crescendos of all of the data tracks reflect clashes with the police. Towards the end of the song, there are instances (and then a full minute) of tracks playing backwards, which reflects how upside-down things have become.

The remixed piece is at 90 bpm which we feel adds to the gravitas of the work; it is unsettling and sad, but yet, even now, contains beauty and hope.

With respect, we offer this piece in that spirit.

Our original tracks are available at https://github.com/shawngraham/elegy-for-George-Floyd. We invite you to remix and recompose your own version.

We are uploading the piece to iTunes, and any monies it might earn will be donated to #blm.