I tried a new tack in my quest to data-mine archaeological records. Stuart Eve sent me the CSV from the Prescot Street excavations, in which each record was a unique context. I fed this into the vanilla Java GUI for MALLET (so no tuning, just the basic settings, looking for 25 topics) to see what – if anything – might result. The output seems very promising. I deliberately did not look up any information on the excavation until after I’d run this analysis. Can reading site records algorithmically tell us anything useful that we did not otherwise know?
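As a rough sketch of the preparation step (not the exact procedure used here), the following Python converts a CSV with one row per context into the one-instance-per-line `name label text` layout that MALLET’s `import-file` command reads by default. The column names `context`, `excavator` and `description` are hypothetical and would need to match the real export. Python’s `csv` module respects quoted fields, so commas inside a description do not shift the columns. The label is just a placeholder `X`, because topic modeling ignores it.

```python
import csv
import io
import re


def csv_to_mallet_lines(csv_text, id_col="context", text_col="description", name_col=None):
    """Turn a CSV with one row per context into MALLET's default
    '<name> <label> <text>' line format (one context per line).

    The column names are hypothetical placeholders for the real export.
    """
    lines = []
    # csv.DictReader honours quoting, so commas (and even line breaks)
    # inside a quoted description stay within that field.
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = row[text_col]
        if name_col:
            # Optionally keep the excavator's name as ordinary tokens.
            text = row[name_col] + " " + text
        # Collapse all whitespace so each context stays on a single line.
        text = re.sub(r"\s+", " ", text).strip()
        # 'X' is a placeholder label; topic modeling does not use labels.
        lines.append(f"{row[id_col]} X {text}")
    return lines
```

Written to a file, the lines could then go through something like `bin/mallet import-file --input contexts.txt --output contexts.mallet --keep-sequence --remove-stopwords` and `bin/mallet train-topics --input contexts.mallet --num-topics 25 --output-topic-keys keys.txt --output-doc-topics doc-topics.txt`. That should approximate a basic, untuned 25-topic run.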

How to visualize this? I’m growing cold towards network visualizations of this kind of data, but in this case a two-mode representation might be appropriate, since the topic modeling algorithm is functioning as a kind of unsupervised clustering routine, pulling words out of the records that seem to go together. Here’s a two-mode network of the results, contexts tied to their constituent topics:

Prescot Street as Topic Model.
It seems promising. In that image, I took the excavators’ names out, but on reflection I shouldn’t have done that.
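The two-mode network can be built from MALLET’s per-context topic proportions. Here is a minimal sketch. It assumes those proportions have already been read into a Python dictionary that maps each context id to a list of proportions, one per topic. It also uses an arbitrary cut-off of 0.1 to drop weak ties. Neither the data structure nor the threshold comes from the original analysis. The edge list is written with the `Source`, `Target` and `Weight` headers that Gephi’s spreadsheet (CSV) importer recognises.

```python
import csv
import io


def two_mode_edges(doc_topics, threshold=0.1):
    """Turn per-context topic proportions into a weighted two-mode edge list.

    doc_topics: dict mapping context id -> list of topic proportions
    (position i holds topic i). threshold is an arbitrary, assumed cut-off
    that drops weak context-topic ties so the network stays readable.
    """
    edges = []
    for context, props in doc_topics.items():
        for topic, weight in enumerate(props):
            if weight >= threshold:
                # Prefixes keep context and topic nodes distinct in Gephi.
                edges.append((f"context_{context}", f"topic_{topic}", weight))
    return edges


def to_gephi_csv(edges):
    """Serialise edges as CSV with Source/Target/Weight headers."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return out.getvalue()
```

Raising the threshold thins the network down to each context’s dominant topics. Lowering it makes the graph denser and the modules harder to read.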

I asked Gephi to look for modules within this two-mode network: communities, that is, groups of nodes more densely tied to each other than to the rest of the network. Below is a series of images that focus on the individual modules. Two things jump out immediately. First, particular excavators are associated with particular word choices and patterns of word usage. Second, particular kinds of materials clump together quite nicely.
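Gephi’s Modularity statistic uses the Louvain method to search for a partition of the nodes that scores highly on Newman’s modularity, Q. This measure compares the weight of ties inside each community with what would be expected if ties were distributed at random. The sketch below only scores a given partition rather than searching for one. It treats the two-mode graph as an ordinary one-mode graph, which, as far as I know, is also how Gephi handles such a network. The node names in any example are made up for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman's modularity Q for an undirected, weighted graph without self-loops.

    edges: iterable of (u, v, weight); community: dict node -> community label.
    Q = sum over communities c of [ w_in(c) / m - (deg(c) / (2m)) ** 2 ],
    where m is the total edge weight, w_in(c) the weight of edges with both
    ends in c, and deg(c) the summed weighted degree of the nodes in c.
    """
    m = 0.0
    w_in = defaultdict(float)
    deg = defaultdict(float)
    for u, v, w in edges:
        m += w
        deg[community[u]] += w
        deg[community[v]] += w
        if community[u] == community[v]:
            w_in[community[u]] += w
    return sum(w_in[c] / m - (deg[c] / (2 * m)) ** 2 for c in deg)
```

Two triangles joined by a single edge score Q = 5/14 when split into their natural groups. Lumping every node into one community always gives Q = 0, which is why a high score signals genuine clustering.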

Do particular excavators ‘see’ particular kinds of information that others don’t? Do they ‘specialize’ in certain kinds of information? As a newbie on the BSR’s Forum Novum project many years ago, I was never allowed near any of the ‘interesting’ stuff. I was consigned to digging through layers of fill to find the depth of the natural soil level. There are only so many ways to describe dirt. This kind of thing happens often. You want your most experienced excavators to handle the most delicate, intricate or complicated situations, but… I wonder.

Topic modeling this material, whilst including the names of the excavators attached to each context, seems to shed interesting light on the ways we see things archaeologically. In my other experiments with the PAS database, extraneous commas crept in and shifted the fields, so the finds officers’ names were included inconsistently. As a result, I tended to just exclude them completely. That might be an error. I think we need to know whose voice is most tied to the ‘topics’ or ‘discourses’ that make up our record (after all, once a site is excavated, the record is all we have left, right?). This experiment suggests that perhaps one of the more valuable outcomes of topic modeling archaeological material is the re-introduction of subjectivity into our records. It reminds us that many voices (modern and ancient) make up the ‘record’, and that we should listen to them.
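One crude way to ask whose voice is tied to which topic is to average the topic proportions of each excavator’s contexts. You can then see which topic most exceeds the site-wide average. The sketch below assumes a context-to-excavator lookup is available (a hypothetical `excavator_of` dictionary) alongside the per-context proportions. It is an illustration, not the method used to produce the modules discussed here. Experienced excavators tend to be given the more complicated features, so a ‘signature’ topic may reflect who was assigned what as much as how each person writes.

```python
from collections import defaultdict


def excavator_profiles(doc_topics, excavator_of):
    """Mean topic proportions over each excavator's contexts.

    doc_topics: context id -> list of topic proportions (one per topic).
    excavator_of: context id -> excavator name (hypothetical lookup).
    """
    sums = {}
    counts = defaultdict(int)
    for context, props in doc_topics.items():
        who = excavator_of[context]
        if who not in sums:
            sums[who] = [0.0] * len(props)
        for i, p in enumerate(props):
            sums[who][i] += p
        counts[who] += 1
    return {who: [s / counts[who] for s in total] for who, total in sums.items()}


def signature_topics(doc_topics, excavator_of):
    """For each excavator, the topic whose mean share most exceeds
    the mean share across all contexts on site."""
    k = len(next(iter(doc_topics.values())))
    n = len(doc_topics)
    site = [sum(props[i] for props in doc_topics.values()) / n for i in range(k)]
    profiles = excavator_profiles(doc_topics, excavator_of)
    return {
        who: max(range(k), key=lambda i: profile[i] - site[i])
        for who, profile in profiles.items()
    }
```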

In due course I’ll put the HTML up somewhere so that the interested reader can jump through the contexts along the topic – context – topic pathways suggested by the topic modeling. We use Harris matrices (a kind of network) to understand the three-dimensional relationships amongst contexts, which imply their chronological ordering. What kinds of insights might we gain by deforming our reading of an excavation along the network paths suggested by the topic model?
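A Harris matrix links contexts through stratigraphic relationships; the topic network links them through shared vocabulary instead. Here is a minimal sketch of following such topic – context – topic pathways. A breadth-first search over the two-mode edge list finds the shortest chain of shared topics connecting two contexts. The node names are hypothetical. Contexts and topics are assumed to carry distinct prefixes so the two kinds of node cannot collide.

```python
from collections import defaultdict, deque


def topic_path(edges, start, goal):
    """Shortest context -> topic -> context ... path between two contexts
    in a two-mode network, found by breadth-first search.

    edges: iterable of (context, topic) pairs (weights are ignored here).
    Returns the list of nodes on the path, or None if they are not connected.
    """
    neighbours = defaultdict(set)
    for c, t in edges:
        neighbours[c].add(t)
        neighbours[t].add(c)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours[node]):  # sorted for a deterministic result
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None
```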

Below are the visualizations of the modules.

pits and burials
roman pits, fills, structures
cellars and latrines
graves and cemeteries
roman fill
modern ditches
And the topics with their top words:
topicId words

1 schager elisabet pottery area part remains found bone similar poss fills bones appears burnt located human pieces waste grey activity main animal clear cremations broken cbm fragments truncates domestic skull high underneath mid shells bit edge sort chalk vessels deposits charcoal nw sherds disarticulated lost oyster sterile specific includes thrown

2 pit roman ii po ossuary irregular large latest including probable mixed pictured truncating inside planned sealed appears cut continuation surviving soakaways remained intercutting step pitting results topped width relates infilling partial include moved northwards steven ashley contexts adult perpendicular offset remain aesthetically loaced disturb sprial mentioned compass fed skeletons connections

3 fill floor basement rubble concrete slab fl evidence bedding ce larger glass abutting represent demolition room darker suggesting repair boundaries situe remaining unclear feature continues samian cessy eval packed facade john photo subrectangular reused actual ws lay inclusion noted lie teh constrcution looked crees brick lots archaeology flexed state

4 soakaway late water sump collection su pm brick soak masonry structure horn core back lined bricks lining drainage masonary materials face smell fit red held system courses time functioned sloping putrid cores aid headers lain knocked pipes mottled lies bands buried rotten real lying tirtiary simple earthernware exterior acrivity respective

5 pm pooley ashley late backfill century brick lucas tom cellar made garden line deliberate material walls cistern places sitting leveling thc proximity shallow backfilling based lerza rivets lifting limestone rebuild characteristic general redep suggested potential campion signs putrid map shown phase bits occurance structure element disintegrated ash southwards act crumble

6 truncated linear modern clark william heavily south west east truncation due north shape rectangular foundation cist machine stone cutting running aligned relationship pre composition tiles ne note observed worked sides deeper manhole intrusion define identical machining unknown depression tile mod axis bagged tegula limit channel erosional forming sample loe uneven

7 cut construction structural back slightly ring recut ephemeral completely realised doesn left partly heading heavy fragment contents analogous suggests comprises properties limestone short wells thc intervening association reflecting pictures clarify count sotnes terminus browny vertically bar unarticulated highest repdeposited things redeposit crmated tank approx ifthe lessnes forming explaining inclination plan

8 fill top base finds contained clancy sara clay organic context section date level horncore excavated original sheet shallow sketch nature suggest silting pipe suggests depth sampled trap lined dumped fully put reverse cemetery hearth beam deliberately frequent removal orientation orange paper backfilled horncores lain discussion sealed cultural appeared thick tenon

9 pit roman cut howell paula oval tip big ground difficult exact probable vertical pocket reflect shows pretty phase work means times duffy region alignment man matches nail wasn sequence build silting clinker fl brittle abuts tentar db sewer quadrant disarticulation implying characteristics revised constuction bottomed pressed unimpressive smc extending

10 gary webster filled fill possibly surface metalling gravelly related underlying unclear laid em difficult compact cobbling overlying represent dark modern mix undetermined series metalled place yellow gaps se stratigraphically extend dumps intentionally missing size charnal foundations spilled lack unsure things areas barrell blue metal yard variety respects ploughsoil anphora

11 pm post early fill med large medieval cess lot contemporary light pc inclusions latrine observed single mortar collapse character leather recovered ceramic suggests extent glacial lense hand event green interpretation resting case demo roughly curved household apparently assist inflow setting render cores varying determining belongs tenuously derivation mixture unlike consistant

12 refuse pr greg crees pit rubbish kind determine paula representing previously rounded bs discovered full gradual probing based enclosing struck housing similarities fronted coursings characterisation excavate sharp valley abse meeting people compare chronologically indication hypocaust blurring subjected distinctive amost grain remaining forms patchy interred colours including similar time midden

13 fill lower natural secondary sand greyish shaped statter claire surveyed black yorkstone proper ashley loose return horizontally mmx rest slots tbm patches largest acidic order distinct interface terrace drainage seperated ark rubberley hit spiralled rebuild destruction coming eastwards sharply hold candidate smells distorted field air powdered stains overview vacant dated

14 roman carreton adrian upper pit colour excavation makes funerary alignment southern preservation fireplace cover collapsed extending scattered adhering pinkish comprised ns nw bag smelly find soakaway whoel gs meant belonged disused regard ditches meters quarries huge making corresponds ritual existing cemented dimensions starts dimension marked paired excavtion staining shipton

15 wall fergal donoghue late pm wa sill building georgian st internal tenter butting wooden victorian present buttress support extension long prescot barrel house rear street dividing facing immediately platform front rising moisture prevent slate thinks medium slabs beneath seemingly fl counstruction plot wider lienar knees erosional lies cu trample photographed

16 roman fill matt ceri shipton law nails williams earlier black form find situ obvious fe uncertain complete amounts objects culvert smae skulls notably addition wood stoney domed truncations rectilinear pyres quality moderate working bonding earliest dark gis ark failry compost peeled functional rows ended properly remnants buildings accounted variation

17 cremation burial cr urn disturbed pot plan tile dobosz ukasz votive diffuse recorded dug built sample cm represents bone chest cremated position surrounding box analysis nb lifted coin regular offering vessel concentration occasional deposits suggesting block intact urned sw notable lid samples deep stones western broad higher plate cms relate

18 make levelling mu layer gravel dump material brickearth redeposited ed sandy deposited earth dumped spread dirty ground silty slumped capping clayey charcoal derived quarrying stoney extraction layers thin square sorted period exposed occupation sands soft parts provide lines stuff didn true partly significantly basal white tom mixey cluster test central

19 void external soil deposit hole cultivation posthole ec soils sp fra features lerza site fairly agricultural number brown debris dep evaluation reworked dumping result dates horticultural environmental run unurned plough residue deposition manuring representing upright storage exit family connected cleaning difference squared linked geophoto amorphous gravely concentrations poo defined

20 pit david unspecified ross edge roman brenna lowest shallow expect final basal presume dimensions marcus pebbles angular appeared covering diffuse processing stage stuart lens stored missed thickness const irregularities souther button funcation limits uncear oblong wider poshole suggested works fil metaling patella jaws grounds greater major purposes elisabet derive pegtile

21 cut small ruth rolfe side cuts end pits grid sq eastern circular originally western piece hard mm fact partially edges removed orangey wood half northern thought directly separate nearby degraded marcus initial urns period solid straight slope inwash graves limit rough wide occured occasionally centre good concave leading survives undertermined

22 drain ditch gully feature possibly shallow trench bottom boundary hassett visible southern runs burials sides postmed aspects slot cemetary presence point robber quarrying footing essentially direction formed doesn land homogenous number indicating section constructed thc terraced gulley parallel holes assoc overflow longbone debitage arising pressure fragment mark glazed wash sealing

23 pit quarry pq hassan anies primary silt dark prob middle zone skull filling mausoleum planned machined edges tiled tanked evident reused stain northeast ts corner sit redepoisted doubt terminate pillow overleaf shale fits standard means dateable existant redundant easts dropping quarried usage gc report truncate trampling compositions marcus bag

24 morse chaz deposit mixed roman dumped brown gravels rich function silty forms lenses subsided narrow assume robbed rest past discernable pitcut con sitly barren bucket cesspit shot beneath late unfrogged sister occupancy flure terminates consister retrieved resolved parallel joining ideas give millefiore burrial cd assumption regularity uppermost imbrex deposite

25 grave roman skeleton cut sk moskal tomasz coffin dug inhumation preserved poorly erroded goods body left legs head articulated skeletal nos poor events juvenile severly feet condition fragmentary holding bed ends stain strongly info spaced cu deposited shaped assigned disturbance cleaned chalk disatriculated femur hands soakawy showing overhangs hom cen



