In which I topic model the entire PAS database by individual rows

Previously, I treated the geography of Roman Britain as a corpus of documents: each modern geographic area was a document, and the records in the Portable Antiquities Scheme (PAS) database formed the words of that document.

Today, I inverted that process. I treated each individual row in the entire PAS database as a document, with the data within that record as its words. Fitting 100 topics took about two hours of processing time. I now have a series of output files that are too big for either Excel or Notepad++ to open, so I’ll have to break them up before I can dig much deeper. However, what I can examine seems promising: there are topics that seem to indicate various regions, particular finds officers, particular kinds of artefacts, and the status of the object (whether it was returned to the finder). Here’s a sampling:

Topic  Weight   Words
94     0.01654  mm thick wide weighs long diameter measures grams length width weight thickness high weighing fragment edge section maximum measuring
3      0.01442  suffolk east metal detector minter faye finder returned alloy plouviez judith copper mid st jane carr coastal edmundsbury geake
22     0.01409  green patina surface dark colour mid brown alloy copper corrosion light worn grey slightly condition corroded object pitted original
45     0.01374  mm weight width thickness length diameter atherton rachel maximum thick derbyshire dimensions wt height fragment midlands max including complete
1      0.01165  yorkshire humber riding metal detector east finder north returned alloy copper holmes simon paynton ceinwen hambleton selby david illegible
56     0.01076  east lincolnshire adam daubney midlands detector metal alloy lindsey copper finder returned kesteven north west elwes marina nottinghamshire rushcliffe
49     0.01028  lines decorated incised side decoration line central edge raised ring centre border grooves dot cross rectangular end upper punched
69     0.01003  ae nummus constantine house gloria exercitvs bust soldiers standards copper standard victory prow ii left illegible constantinopolis helmeted ad
27     0.00985  frame buckle pin bar alloy copper medieval loop edge oval outer missing cast strap double narrowed shaped looped section
92     0.00976  sherd pottery rim fabric sherds ware vessel grey chance find body medieval ceramic roman detecting inclusions colour surface orange
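
Rows like the ones in this sampling are easy to pull apart programmatically, which helps when a spreadsheet is not an option. The following is only a minimal sketch, assuming each row is a topic number, a weight, and then whitespace-separated key words, as in the table above. The names `Topic` and `parse_topic_line` are hypothetical, and the sample rows are abbreviated copies of three rows from the sampling.

```python
from typing import List, NamedTuple


class Topic(NamedTuple):
    number: int
    weight: float
    words: List[str]


def parse_topic_line(line: str) -> Topic:
    """Split one 'topic weight words...' row into its three parts."""
    # Split on whitespace at most twice: number, weight, then the rest.
    number, weight, words = line.split(None, 2)
    return Topic(int(number), float(weight), words.split())


# Abbreviated rows from the sampling (assumed layout: topic, weight, words).
sample = [
    "94 0.01654 mm thick wide weighs long diameter measures grams length",
    "3 0.01442 suffolk east metal detector minter faye finder returned",
    "69 0.01003 ae nummus constantine house gloria exercitvs bust soldiers",
]

topics = [parse_topic_line(row) for row in sample]

# Find the topics whose key words mention a term of interest.
coin_topics = [t.number for t in topics if "nummus" in t.words]
```

Once the rows are parsed, the same filter can be pointed at a place name or an artefact term to see which topics pick it up.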

[Images: topic 3, topic 1, topic 94, topic 3, topic 22, topic 45]
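
As for the output files that are too big to open, one way to break them up is to write every so many lines into a new, smaller file. This is a rough sketch rather than what I actually ran: the function name `split_file`, the chunk size, and the `chunk_0000.txt` naming are all assumptions, and the demonstration at the end uses a tiny made-up file in a temporary folder instead of the real output.

```python
import os
import tempfile
from typing import List


def split_file(path: str, out_dir: str, lines_per_chunk: int = 100_000) -> List[str]:
    """Write consecutive runs of lines from `path` into numbered chunk files."""
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths: List[str] = []
    out = None
    try:
        with open(path, encoding="utf-8") as src:
            # Read line by line so the whole file never sits in memory.
            for i, line in enumerate(src):
                if i % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    name = f"chunk_{len(chunk_paths):04d}.txt"
                    chunk_paths.append(os.path.join(out_dir, name))
                    out = open(chunk_paths[-1], "w", encoding="utf-8")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths


# Demonstration on a ten-line stand-in file (not the real output).
with tempfile.TemporaryDirectory() as tmp:
    big = os.path.join(tmp, "big.txt")
    with open(big, "w", encoding="utf-8") as f:
        f.writelines(f"row {n}\n" for n in range(10))

    parts = split_file(big, os.path.join(tmp, "parts"), lines_per_chunk=4)

    part_sizes = []
    for p in parts:
        with open(p, encoding="utf-8") as f:
            part_sizes.append(len(f.readlines()))
```

Each chunk should then be small enough for an ordinary editor or spreadsheet to open; the chunk size would need tuning to whatever the program in question can cope with.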