I’ve been topic modeling all of the posts from the Day of Archaeology.
Topic modeling looks at the patterning of words to determine ‘topics’:
Topic models provide a simple way to analyze large volumes of unlabeled text. A “topic” consists of a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings. For a general introduction to topic modeling, see for example Probabilistic Topic Models by Steyvers and Griffiths (2007).
-from the MALLET website
I’m in the process of analyzing what I’ve found. To see worked-out examples of what topic modeling can achieve, why not visit Mining the Dispatch? (I owe a debt of gratitude, by the way, to Rob Nelson, who is patiently talking me through how to do all of this.)
When I’ve got more to report, I’ll work out the detailed implications and put them up. I’ll just say at the moment that the most commonly occurring ‘topic’ in the Day of Archaeology posts is ‘topic 4’, which includes the following words:
bronze material museum objects hoard age finds british report collections work past antiquities people time things treasure communities late scheme bones swords institute portable archaeological nuraghi phase
What does that mean? Well, it seems, tentatively, to point to a topic around the Portable Antiquities Scheme (but I may change my mind on that). Sometimes the meaning of the ‘topic’ is much clearer:
digital built finds museum project world soil class internet student detmold blog including publishing coda learn documentation geomatics information
…which probably means, ‘digital archaeology’.
Any particular post will be a combination of the various topics, in varying proportions. But, as Elijah Meeks demonstrates, you can graph these combinations with Gephi to get a visual representation of what’s going on – see inset below.
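For anyone wanting to try the same thing, here is a rough sketch in Python of how post-to-topic proportions might be turned into an edge list that Gephi can import. It is not the exact workflow used here: the post names, the proportions, the 0.1 cut-off and the function names are all made up for illustration. The only thing it leans on is that Gephi's spreadsheet import reads a CSV with Source, Target and Weight columns as a weighted edge table.

```python
import csv
import io


def topic_edges(doc_topics, threshold=0.1):
    """Return (post, topic, weight) edges for proportions at or above threshold.

    doc_topics maps a post name to a list of topic proportions, where the
    position in the list is the topic number. The threshold is an arbitrary
    illustrative cut-off that drops very weak post-topic links.
    """
    edges = []
    for post, proportions in doc_topics.items():
        for topic_id, weight in enumerate(proportions):
            if weight >= threshold:
                edges.append((post, f"topic {topic_id}", weight))
    return edges


def write_edge_csv(edges, handle):
    """Write edges as a CSV with Source, Target, Weight columns."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)


# Hypothetical proportions for two posts across three topics (not real data).
doc_topics = {
    "post-a": [0.05, 0.70, 0.25],
    "post-b": [0.60, 0.05, 0.35],
}

edges = topic_edges(doc_topics)

# Written to an in-memory buffer here; swap in open("edges.csv", "w", newline="")
# to produce a file for Gephi.
buffer = io.StringIO()
write_edge_csv(edges, buffer)
print(buffer.getvalue())
```

Each post and each topic becomes a node, and the proportion becomes the edge weight, so posts that share a strong topic end up pulled towards the same topic node in the layout.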