gnōthi seauton, or, mine your own tweets

Sometimes, one of the best ways to understand a method is to run it on data that you know very well indeed. In which case, the ability to request one’s Twitter archive and to feed it into R is quite handy. You make the request, download the CSV, then paste the ‘text’ column into its own CSV. Clean it up with regex to remove URLs (anything beginning with http), special characters, and so on, then feed it into this script:
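As a rough illustration of the preparation step only (this is not the topic-modelling script referred to above), here is a minimal Python sketch of pulling out the ‘text’ column and cleaning it with regex. The file names, the function names, and the choice to keep `#` and `@` so that hashtags and usernames survive are all assumptions of mine; adjust the patterns to whatever counts as noise in your own archive.

```python
import csv
import re

# Assumed patterns: tune these to your own archive.
URL_RE = re.compile(r"https?://\S+")            # links
NON_WORD_RE = re.compile(r"[^A-Za-z0-9#@\s]")   # special characters, keeping # and @
SPACE_RE = re.compile(r"\s+")                   # runs of whitespace


def clean_tweet(text):
    """Strip links and special characters from one tweet, then tidy whitespace."""
    text = URL_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def extract_text_column(in_path, out_path, column="text"):
    """Copy the cleaned `column` of the archive CSV into a one-column CSV.

    Returns the number of tweets written; tweets that are empty after
    cleaning (for example, a bare link) are skipped.
    """
    kept = 0
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow([column])
        for row in reader:
            cleaned = clean_tweet(row.get(column) or "")
            if cleaned:
                writer.writerow([cleaned])
                kept += 1
    return kept
```

Something like `extract_text_column("tweets.csv", "tweets_text.csv")` (both file names hypothetical) would then give a single-column file to hand over to R.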

This can take a while. When it’s done, go to the output folder and copy each file into a single GitHub gist (as I’ve done here: ). Then, swap out the for and you can explore the result or share it: If you hit ‘view in another window’, you get the visualization full screen, eg

If you find something interesting in a topic or term, you can put that in the URL as appropriate and share/cite the relevant visualization directly. LDAvis is a really nice package.
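To make the URL trick concrete: as far as I can tell, LDAvis keeps the selected topic, the relevance slider value (lambda) and the highlighted term in the URL fragment, in the form `#topic=…&lambda=…&term=…`. Here is a small Python sketch of building such a link. The function name and the example address are hypothetical, and the fragment format is my reading of how the visualization behaves, so check it against the address bar of your own copy.

```python
from urllib.parse import quote


def ldavis_state_url(base_url, topic, lam=1.0, term=""):
    """Build a shareable link that opens the visualization on one topic/term.

    base_url: address of the hosted visualization (hypothetical here).
    topic:    topic number as displayed in the visualization.
    lam:      relevance slider value between 0 and 1.
    term:     term to highlight; leave empty for none.
    """
    base = base_url.split("#", 1)[0]  # drop any state already in the URL
    return f"{base}#topic={topic}&lambda={lam:g}&term={quote(term)}"


# Example with a made-up address: point readers at 'fiction' in Topic 15.
link = ldavis_state_url("https://example.org/my-tweets-vis/", 15, 0.6, "fiction")
```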

So, what does all this mean? Well, at first blush, it shows that my tweeting activity is largely pretty consistent, for all of its mass. Topics 1 and 2 are on point for archaeology, history, and digital applications thereof; Topic 2 is filled with #msudai from the summer, when I went on a massive Twitter spree, tweeting materials at participants and reporting on the institute to the wider world (indeed, at one point, we were trending in Detroit!). Other topics (6, for instance) show an interest in gaming and so on. In a way, it’s not the discrete topics, the clearly delimited ones, that are of interest. It’s the fuzzy stuff. Topics 19, 9, 15, and 7 all overlap. In Topic 15, two of the top three words are ‘fiction’ and ‘moocs’ (the top word is the username of a robot of mine that tweets the latest archaeological papers). A robot, a roboticized learning environment, fiction... that perhaps says something.

Anyway, feel free to explore. Or give this a shot on your own materials (whether authored by you or from somewhere else).