A museum bot

I wanted to build a bot, inspired by some of my students who made a Jupyter notebook that pulls a random object from the Canadian Science and Technology Museum’s open data and displays all of its associated information.

The museum’s data is online as a CSV file to download (find it here: http://data.techno-science.ca/en/dataset/cstmc-smstc-artifacts-artefact). Which is great, but not easy to integrate: there’s no API.

Build an API for the data

So, I used Simon Willison’s Datasette package (https://datasette.io/) to take the CSV table, turn it into a SQLite database, and push it online.

First I installed sqlite-utils and datasette using homebrew:

brew install sqlite-utils datasette

Then I turned the CSV into a SQLite database:

sqlite-utils insert cstm.db artefacts cstmc-CSV-en.csv --csv

I installed the command-line tools for Vercel, where my museum data API will live, with

npm i -g vercel

vercel login

Then I pushed the data online with datasette, which wraps the database in all its datasette goodness:

datasette publish vercel cstm.db --project=cstm-artefacts

You can see the results for yourself at https://cstm-artefacts.vercel.app/ (click on ‘artefacts’).
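Datasette turns any SQL query into a URL: the query goes in a sql parameter on the .json endpoint. As a small sketch against my deployment above, you can let R’s URLencode() do the percent-escaping rather than typing the %28s and %3Bs by hand:

```r
# sketch: build a datasette JSON endpoint from a plain SQL string,
# letting URLencode() handle the percent-escaping
sql <- "SELECT * FROM artefacts ORDER BY RANDOM() LIMIT 1;"
search <- paste0("https://cstm-artefacts.vercel.app/cstm.json?sql=",
                 URLencode(sql, reserved = TRUE))
cat(search, "\n")
```

Fetching that URL returns the matching rows as JSON, which is what the bot script consumes.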

Now, a few days ago, Dan Pett posted the code for a bot he made that tweets out pics & data from the Portable Antiquities Scheme database – see his repo at https://github.com/portableant/findsbot. I figured it should be easy enough to adapt his code, especially since my new API returns data as JSON.

Build a Bot with R

So I fired up RStudio on my machine and began experimenting. The core of my code runs a SQL query against the API, looking for a random object where (ideally) the general description and thumbnail fields are not null. Then it parses out the information I want and builds a tweet:

library(httr)
library(rtweet)
library(jsonlite)
library(digest)

search <- paste0('https://cstm-artefacts.vercel.app/cstm.json?sql=SELECT+*+FROM+artefacts+WHERE+NOT+GeneralDescription+IS+NULL+AND+NOT+thumbnail+IS+NULL+ORDER+BY+RANDOM%28%29+LIMIT+1%3B')
randomFinds <- fromJSON(search)
## grab the info, put it into a dataframe
df <- as.data.frame(randomFinds$rows)
artifactNumber <- df$V1
generalDescription <- df$V3
contextFunction <- df$V17
thumbnail <- df$V36

## write a tweet
tweet <- paste(artifactNumber,generalDescription,contextFunction, sep=' ')

## thank god the images have a sensible naming convention;
## grab the image data
image <- paste0(artifactNumber,'.aa.cs.thumb.png')
imageUrl <- paste0('http://source.techno-science.ca/artifacts-artefacts/images/', URLencode(image))

## but sometimes, despite my sql, I get results where there's an issue with the thumbnail,
## so we'll test for an error, and if there is one, we'll swap in
## an image of the Museum's lighthouse, to signal that, well, we're a bit lost here
if (http_error(imageUrl)){
  imageUrl <- paste0('https://ingeniumcanada.org/sites/default/files/styles/inline_image/public/2018-04/lighthouse_.jpg')
  tweet <- paste(artifactNumber,generalDescription,contextFunction, "no image available", sep=' ')
}

## then we download the image so that we can upload it within the tweet
temp_file <- tempfile()
download.file(imageUrl, temp_file, mode = "wb")

So all that will construct our tweet.
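One fragile spot in that script: the V1/V3/V17/V36 positions depend on the column order of the CSV. Datasette’s JSON also ships a columns array alongside rows, so a sturdier sketch would look the values up by name (the response here is faked for illustration; the real column names come from the museum’s CSV):

```r
# sketch: datasette returns a "columns" array next to "rows"; naming the
# row by it avoids hard-coded positions like V1 or V36
# (this response is faked for illustration)
resp <- list(
  columns = c("artifactNumber", "GeneralDescription", "thumbnail"),
  rows = list(list("1979.0363.041", "Astrolabe", "1979.0363.041.aa.cs.thumb.png"))
)
row <- setNames(resp$rows[[1]], resp$columns)
artifactNumber <- row[["artifactNumber"]]
generalDescription <- row[["GeneralDescription"]]
cat(artifactNumber, generalDescription, "\n")
```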

Authenticate….Authenticate…

The next issue is setting up a bot account on Twitter, and getting it to… tweet. You have to make a new account, verify it, and then go to developer.twitter.com and create a new app. Once you’ve done that, find the consumer key, the consumer secret, the access token, and the access secret. Make a few posts from the new account as well, just so it looks like a going concern. Now, back in the script, I add the following to authenticate with Twitter:

findsbot_token <- rtweet::create_token(
  app = "THE-EXACT-NAME-YOU-GAVE-YOUR-APP",
  consumer_key = "THE-KEY-GOES-HERE",
  consumer_secret = "THE-SECRET-GOES-HERE",
  access_token = "THE-TOKEN-GOES-HERE",
  access_secret = "THE-ACCESS-SECRET-GOES-HERE"
)

# post the tweet
rtweet::post_tweet(
  status = tweet,
  media = temp_file,
  token = findsbot_token
)

And, if all goes according to plan, you’ll get a “your tweet has been posted!” message.

Getting the authentication to work took me a lot longer than I care to admit; the hassle was all on the developer.twitter.com site, because I couldn’t find the right damned place to click.

Secrets

Anyway, a bot that tweets when I run code on my machine is cool, but I’d rather the thing just ran on its own. Good thing I have Dan on speed-dial.

It turns out you can use GitHub Actions to run the script periodically. I created a new public repo (GitHub Actions for private repos costs $) with the intention of putting my bot.R script in it. It is a very bad idea to put secret tokens in plain text in a public repo, so we’ll use the ‘Secrets’ settings for the repo to store this info, and modify the code to pull it from there. Let’s modify the code first. Change the create_token call to look like this:

findsbot_token <- rtweet::create_token(
  app = "objectbot",
  consumer_key =    Sys.getenv("TWITTER_CONSUMER_API_KEY"),
  consumer_secret = Sys.getenv("TWITTER_CONSUMER_API_SECRET"),
  access_token =    Sys.getenv("TWITTER_ACCESS_TOKEN"),
  access_secret =   Sys.getenv("TWITTER_ACCESS_TOKEN_SECRET")
)

Save, and then commit to your repo. Then, click on the cogwheel for your repo, and select ‘Secrets’ from the menu on the left. Create a new secret, call it TWITTER_CONSUMER_API_KEY and then paste in the relevant info, and save. Do this for the other three items.
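One gotcha: a misnamed secret doesn’t error, it just arrives in the script as an empty string. A little sanity check at the top of bot.R makes the Actions log say exactly which one is missing (a sketch; in the real script you’d stop() rather than message()):

```r
# sketch: report any Twitter secret that arrived as an empty string
needed <- c("TWITTER_CONSUMER_API_KEY", "TWITTER_CONSUMER_API_SECRET",
            "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_TOKEN_SECRET")
missing <- needed[Sys.getenv(needed) == ""]
if (length(missing) > 0) {
  message("Missing secrets: ", paste(missing, collapse = ", "))
  # in bot.R proper, stop() here so the workflow fails loudly
}
```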

One thing left to do. Create a new file and give it the name .github/workflows/bot.yml; here’s what should go inside it:

name: findsbot

on:
  schedule:
    - cron: '0 */6 * * *'
  workflow_dispatch:
    inputs:
      logLevel:
        description: 'Log level'
        required: true
        default: 'warning'
      tags:
        description: 'Run findsbot manually'
jobs:
  findsbot-post:
    runs-on: macOS-latest
    env:
      TWITTER_CONSUMER_API_KEY: ${{ secrets.TWITTER_CONSUMER_API_KEY }}
      TWITTER_CONSUMER_API_SECRET: ${{ secrets.TWITTER_CONSUMER_API_SECRET }}
      TWITTER_ACCESS_TOKEN: ${{ secrets.TWITTER_ACCESS_TOKEN }}
      TWITTER_ACCESS_TOKEN_SECRET: ${{ secrets.TWITTER_ACCESS_TOKEN_SECRET }}
    steps:
      - uses: actions/checkout@v2
      - uses: r-lib/actions/setup-r@master
      - name: Install rtweet package
        run: Rscript -e 'install.packages("rtweet", dependencies = TRUE)'
      - name: Install httr package
        run: Rscript -e 'install.packages("httr", dependencies = TRUE)'
      - name: Install jsonlite package
        run: Rscript -e 'install.packages("jsonlite", dependencies = TRUE)'
      - name: Install digest package
        run: Rscript -e 'install.packages("digest", dependencies = TRUE)'
      - name: Create and post tweet
        run: Rscript bot.R

If you didn’t call your script bot.R then you’d change that last line accordingly. Commit your changes. Ta da!

The line that says cron: '0 */6 * * *' is the actual schedule. The five fields are minute, hour, day of the month, month, and day of the week, so '0 */6 * * *' means ‘at minute zero of every sixth hour, every day’; there’s a handy crontab reference at https://www.adminschoice.com/crontab-quick-reference.

If you want to test your workflow, click on the ‘actions’ link at the top of your repo, then on ‘findsbot’. If all goes according to plan, you’ll soon see a new tweet. If not, you can click on the log file to see where things broke. Here’s my repo, fyi: https://github.com/shawngraham/cstmbot.

So to reiterate – we found a whole bunch of open data; we got it online in a format that we can query; we wrote a script to query it, and build and post a tweet from the results; we’ve used github actions to automate the whole thing.

Oh, here’s my bot, by the way: https://twitter.com/BotCstm

Time for a drink of your choice.

Postscript

Individual objects are online, and the path to them can be built from the artefact number, as Steve Leahy pointed out to me: https://ingeniumcanada.org/ingenium/collection-research/collection-item.php?id=1979.0363.041 – just slap the artifact number after the php?id=. So I added that link to the text of the tweet. But this sometimes causes the tweet to fail because of the character limit. I could probably test for tweet length and swap in alternative text as appropriate, but one thing at least is easy to implement in R: a URL shortener. Thus:

library(urlshorteneR)

liveLink <- paste0('https://ingeniumcanada.org/ingenium/collection-research/collection-item.php?id=', artifactNumber)
shortlink <- isgd_LinksShorten(longUrl = liveLink)

tweet <- paste(artifactNumber,generalDescription,contextFunction,shortlink, sep=' ')

Which works well. Then, to make sure this works with GitHub Actions, you have to install urlshorteneR by adding these lines to your yaml:

      - name: Install urlshorteneR package
        run: Rscript -e 'install.packages("urlshorteneR", dependencies = TRUE)'

ta da!
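And for that character-length problem: the length test I hand-waved at is only a few lines. Here’s a sketch with made-up values standing in for the real fields (note that Twitter actually counts any link as a fixed 23 characters via t.co, so nchar() is only an approximation):

```r
# sketch: keep the tweet under the 280-character limit by trimming the
# description while keeping the artifact number and link intact
# (values made up for illustration)
artifactNumber <- "1979.0363.041"
generalDescription <- paste(rep("very", 100), collapse = " ")  # deliberately too long
shortlink <- "https://is.gd/example"
tweet <- paste(artifactNumber, generalDescription, shortlink, sep = " ")
if (nchar(tweet) > 280) {
  keep <- 280 - nchar(artifactNumber) - nchar(shortlink) - 2  # two joining spaces
  tweet <- paste(artifactNumber, strtrim(generalDescription, keep), shortlink, sep = " ")
}
cat(nchar(tweet), "\n")
```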