The Scottish National Library has made available a collection of chapbooks printed in Scotland, from 1671 – 1893, on their website here. That’s nearly 11 million words’ worth of material. The booklets cover an enormous variety of subjects. So, what do you do with it? Today, I decided to turn it into music.
As part of writing the second edition to the Historian’s Macroscope, I’ve been re-writing the topic modeling section, and I’ve included working with this information, and building a topic model for it using R. As part of that exercise, I preprocessed all the data so that it would be a bit easier for the newcomer to work with it (which will be held in a Github repo for the purpose). Part of the preprocessing was adding a ‘publication date’ to the NLS-provided inventory file (which involved a whole bunch of command line regex etc to grab that info from the METS metadata files).
To turn this into sound – I used the Topic Modeling Tool to build a quick topic model on the 3000 + text files containing the ocr’d text. The TMT can also match your metadata up against the topic results, which is very nice and handy, especially for turning the results into music, which I did with the TwoTone app. Drop the resulting csv onto TwoTone, and your columns are ready to map to the music; the visualization is also handy to get a sense of when your topics are most prominent (where the left hand side is my earliest date, and the right hand side is my latest date):
Then I played with the settings, filtering things so that notes only are played if they are making a meaningful contribution to the entire year’s text.
You can listen to it on Soundcloud.
The piano arpeggios are mapped to a topic that seems largely to be bad ocr. The pipe organ corresponds to a topic about religion. The trumpet seems to be stories of people off to make their fortune (as I read the topic words for that topic). There’s a double base in there, which I assigned to the ‘histories’ topic (because why not). The glockenspiel is assigned to the topic that I understand as ‘folk wisdom’, while the harp is mapped to stories of love and romanced (and doomed love too, for that matter).
What do we learn doing this? Well, for one thing, it forces us to think about the constructedness of our ‘visualizations’, which is never a bad thing. It foregrounds how much dirty data is in this thing. It shows change over time in Scottish publishing habits (“we could have done that with a graph, Shawn!” to which I say: So what? Now you can engage a different part of your brain and feel that change over time.)