Stanford NER, extracting & visualizing patterns

This is just a quick note while I’m thinking about this. I say ‘visualizing’ patterns, but there are of course many ways of doing that. Here, I’m just going quick’n’dirty into a network.

Say you have the diplomatic correspondence of the Republic of Texas, and you suspect that there might be interesting patterns in the places named over time. You can use the Stanford Named Entity Recognition package to extract locations. Then, using some regular expressions, you can transform that output into a network file. BUT – and this is important – it’s a format that carries some baggage of its own. Anyway, first you’ll want the Correspondence. Over at The Macroscope, we’ve already written about how you can extract the patterns of correspondence between individuals using regex patterns. This doesn’t need the Stanford NER because there is an index to that correspondence, and the regex grabs & parses that information for you.

But there is no such index for locations named. So grab that document, and feed it into the NER as Michelle Moravec instructs on her blog here. In the  terminal window, as the classifier classifies Persons, Organizations, and Locations, you’ll spot blank lines between batches of categorized items (edit: there’s a classifier that’ll grab time too; that’d be quite handy to incorporate here – SG). These blanks correspond to the blanks between the letters in the original document. Copy all of the terminal output into a new Notepad++ or Textwrangler document. We’re going to trim away every line that isn’t led by LOCATION:

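\n(?!LOCATION).+

and replace with nothing. Note that (?!LOCATION) is a negative lookahead, which tests for the whole word; a character class such as [^LOCATION] would only exclude lines by their first letter, so a line beginning ORGANIZATION would survive. This will delete everything that doesn’t have the location tag in front. Now, let’s mark those blank lines as the start of a new letter. A thread on Stack Overflow suggests this regex to find those blank lines: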

^\s*$

where:

^ is the beginning of line anchor (in a text editor it matches at the start of every line)
$ is the end of line anchor
\s is the whitespace character class
* is zero-or-more repetition

and we replace with the string new-letter.

Now we want to get all of the locations for a single letter into a single line. Replace ‘LOCATION’ with a comma. This budges everything into a single line, so we need to reintroduce line breaks, by replacing ‘new-letter’ with the new line character:

find: (new-letter)
replace: \n(\1)

I could’ve just replaced new-letter with a new-line, but I wanted to make sure that every new line did in fact start with new-letter. Now find and replace new-letter so that it’s removed. You now have a document with the same number of lines as original letters in the original correspondence file. Now to turn it into a network file! Add the following information at the start of the file:

DL
n=721
format = nodelist1
labels embedded:
data:

DL will tell a network analysis program that we are dealing with UCINET’s DL format. N equals the number of nodes. Format=nodelist1 says, ‘this is a format where the first item on the line is connected to all the subsequent items on that line’. As a historian or archaeologist, you can see that there’s a big assumption in that format. Is it justified? That’s something to mull over. Gephi only accepts DL in format=edgelist1, that is, binary pairs. If that describes the relationship in your data, there’s a lot of legwork involved in moving from nodelist1 to edgelist1, and I’m not covering that here. Let’s imagine that, on historical grounds, nodelist1 accurately describes the relationship between locations mentioned in letters, that the first location mentioned is probably the place where the letter is being written from, or the most important place, or….

“labels embedded:” tells a network program that the labels themselves are being used as data points, and “data:” indicates that everything afterwards is the data. But how did we know how many nodes there were? You could tally up by hand; you could copy and paste your data (back when each LOCATION was listed) into a spreadsheet and use its COUNT function to find uniques; I’m lazy and just bang any old number in there, and then save it with a .dl extension. Then I open it using a small program called Keyplayer. This isn’t what the program is for, but it will give you an error message that tells you the correct number of nodes! Put that number into your DL file, and try again. If you’ve got it right, Keyplayer won’t do anything – its silence speaks volumes (you can then run an analysis in Keyplayer. If your DL file is not formatted correctly, no results!).
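If you would rather script these steps than click through them, here is a minimal Python sketch. It assumes the NER output has lines that begin with LOCATION: followed by the place name, with blank lines between letters, as described above. The function names and the sample text are made up for illustration. It counts the unique places itself, so there is no guessing at n, and it skips any letter that names no locations at all.

```python
def letters_from_ner_output(text):
    """Group LOCATION lines into one list per letter (a blank line ends a letter)."""
    letters, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Blank line: close off the current letter, if it named any places.
            if current:
                letters.append(current)
            current = []
        elif line.startswith("LOCATION:"):
            current.append(line[len("LOCATION:"):].strip())
        # Any other line (PERSON, ORGANIZATION, ...) is ignored.
    if current:
        letters.append(current)
    return letters


def to_nodelist1(letters):
    """Build the text of a DL file in nodelist1 format, one letter per row."""
    nodes = {place for letter in letters for place in letter}
    header = ["DL", f"n={len(nodes)}", "format = nodelist1",
              "labels embedded:", "data:"]
    rows = [", ".join(letter) for letter in letters]
    return "\n".join(header + rows) + "\n"
```

Calling to_nodelist1(letters_from_ner_output(text)) and saving the result with a .dl extension gives the same kind of file as the find-and-replace route. Place names containing spaces are written as-is; whether they need quoting depends on the program that reads the file, so check that before trusting the output.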

You now have a DL file that you can analyze in Pajek or UCINET. If you want to visualize in Gephi, you have to get it into a DL format that Gephi can use (edgelist) or else into .net format. Open your DL file in Pajek, and then save as Pajek format (which is .net). Then open in Gephi. (Alternatively, going back a step, you can open in Keyplayer, and then within Keyplayer, hit the ‘visualize in Pajek’ button, and you’ll automatically get that transformation). (edit: if you’re on a Mac, you have to run Pajek or Ucinet with something like Winebottler. Forgot to mention that).
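If you do want plain pairs rather than a trip through Pajek, the expansion can be sketched in Python. This assumes the nodelist1 reading described above (the first place on a row is tied to every later place on that row) and writes a two-column table with the headers Source and Target, which Gephi’s spreadsheet importer recognizes for edge tables. The function names are hypothetical.

```python
import csv
import io


def nodelist_to_edges(rows):
    """Expand nodelist1-style rows into (source, target) pairs."""
    edges = []
    for row in rows:
        if len(row) < 2:
            continue  # a row with a single place has nothing to connect to
        source = row[0]
        for target in row[1:]:
            if target != source:  # skip self-loops from repeated mentions
                edges.append((source, target))
    return edges


def edges_to_csv(edges):
    """Return the pairs as CSV text with a Source,Target header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()
```

Repeated pairs are kept on purpose, so that a place pairing mentioned in many letters shows up many times; how those repeats are merged into edge weights is a choice made at import time.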

Ta da!

Locations mentioned in letters of the Republic of Texas

The Web of Authors for Wikipedia’s Archaeology Page

I’m playing with a new toy, WikiImporter, which allows me to download the network of authorship on MediaWiki-powered sites. I fired it up and set it to grab the user-article network. In the tool’s own words, “The Hyperlink Coauthorship network will analyze all the links found in the seed article and create an edge between each user that edited the article found in that link and the article”.

Naturally, I pointed it at ‘archaeology’ on Wikipedia.  I’ve posted the resulting two mode network on figshare for all and sundry to analyze.

I also asked it to download the article to article links (which is slightly different than my spidering results, as my spiders also included the wiki pages themselves, like the ‘this page is a stub’ or ‘this page needs citations’, which gives me an interesting perspective on the quality of the articles. More on that another day). This file is also on figshare here.

Just remember to cite the files. Enjoy!

 

Beyond the Spaghetti Monster

No, I don’t mean that spaghetti monster. I mean the one that people invoke when they wish to disparage network analysis. That particular spaghetti monster is some variant of a force-directed layout algorithm. Now, these have their place, but they sometimes obscure more than they illuminate. There are alternatives, and Elijah Meeks has been sharing some d3.js code for making interactive ‘arc diagrams’ and ‘adjacency matrices’ that highlight important patterns in network data without the monstrousness.

Elijah writes:

An arc diagram is another way of visualizing networks that doesn’t use force-directed principles. Instead, it draws the edge from one node to another as arcs above or below the nodes. Weight is indicated by edge thickness and directionality is indicated by the arc being above or below the nodes as well as with the edge getting wider at the source.

Over at http://bl.ocks.org/emeeks/9458332  Elijah shows us the d3 code for making such a creature. In essence, the code says to your browser, ‘there’s an edgelist, and a nodelist, and they go together like this.’ Since it’s using d3.js (data-driven documents), it loads that library up to make this process easier.  If you wanted to draw one of these things for yourself, you need to copy Elijah’s index.html code from his bl.ocks.org page, and then create two files, edgelist.csv and nodelist.csv.  If you have a network in Gephi, you can export both of these from the data laboratory tab by clicking ‘export spreadsheet’.

Similarly, Elijah provides an interactive adjacency matrix at http://bl.ocks.org/emeeks/9441864

An adjacency matrix is a useful way of visualizing networks using an n-by-n grid that shows connection between nodes as a filled grid square. This adjacency matrix is directed, with the source on the y-axis and target on the x-axis. It loads the data from a node list and edge list and represents edge weight using opacity. It also highlights rows and columns on mouseover.

If you copy that bit of html into a new file, it points to the same nodelist.csv and edgelist.csv. Voila! Two very crisp and clear visualizations of the structure of your network, with very little spaghetti mess. Here is Peter Holdsworth’s network of women from 1898 Ontario as both an arc diagram and an adjacency matrix (and I thank Peter for making his data public for these kinds of explorations. To prepare it, I opened his .gexf network file in Gephi. On the data laboratory tab I hit ‘export spreadsheet’ for the nodes table, and then the edges table. I opened the csv files in Excel, stripped out extraneous fields, and saved as csv):

1898 - Women networked by virtue of shared membership in various organizations

Same again

Contrast these with the spaghetti version that was generated with Gephi (Figshare provides a preview here). The patterning is much clearer and more intuitive, I think. It’s beyond my programming prowess, but I would think it should not be overly difficult for someone to package this code as a layout plugin for Gephi.
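The spreadsheet clean-up (stripping the extraneous fields out of Gephi’s exported csv files) can also be scripted. Here is a minimal Python sketch. The column names are assumptions: I am supposing that Gephi’s edge export has headers such as Source, Target and Weight, and that the d3 page wants lower-case source, target and weight, so check both against your own export and against the index.html you copied before relying on it.

```python
import csv
import io


def keep_columns(csv_text, mapping):
    """Return CSV text holding only the columns in `mapping`, renamed old -> new."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(mapping.values()),
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Copy across just the wanted fields, under their new names.
        writer.writerow({new: row[old] for old, new in mapping.items()})
    return out.getvalue()
```

For an edge table you might call keep_columns(text, {"Source": "source", "Target": "target", "Weight": "weight"}), and for a node table keep_columns(text, {"Id": "id"}), then save the results as edgelist.csv and nodelist.csv.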

Now, here’s the thing – you’ll need to put the html and the csv into the same folder on a server somewhere for this to work. I use WAMP for this kind of thing before moving everything onto the live interwebs. Installing WAMP is quite straightforward; it’s a one-click installer. Once you’ve got it installed and running, you simply create a subfolder inside the c:\wamp\www\ folder, e.g. \myproject\. Save your html and csv files in that folder. Then, in your browser, go to localhost/myproject, click on your html file, and you’re good to go.
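WAMP is not the only way to get a local server. If Python is already installed, its standard library can serve a folder too, which is enough for the browser to load the csv files. What follows is a sketch of that alternative, not part of the WAMP route; the function name, the folder name and the port number are arbitrary choices of mine.

```python
import functools
import http.server
import threading


def serve_folder(folder, port=8000):
    """Serve `folder` over http on 127.0.0.1 in a background thread."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=folder)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() when you are finished
```

From an interactive Python session, server = serve_folder("myproject") makes the folder available at http://localhost:8000/. Running python -m http.server from inside the folder at the command line does much the same job without any code.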

Elijah does point out:

This may be true, but if one plays with some of the html, making the canvas bigger, some of this can be mitigated… As with most things, no one approach is going to do everything you need it to, but these two visualizations should be in your toolkit.

Mapping the Web in Real Time

I don’t think I’ve shared my workflow before for mapping the structure of a webcrawl. After listening to Sebastian Heath speak at #dapw it occurred to me that it might be useful for, inter alia, linked open data type resources. So, here’s what you do (and my example draws from this year’s SAA 2014 blogging archaeology session blog-o-sphere):

1. install the http graph generator from the gephi plugin marketplace.

2. download the navicrawler + firefox portable zip file at the top of this page.

3. make sure no other instance of firefox is open. Open firefox portable. DO NOT click the ‘update firefox’ button, as this will make navicrawler unusable.

4. Navicrawler can be used to download or scrape the web. In the navicrawler window, click on the (+) to select the ‘crawl’ pane. This will let you set how deep and how far to crawl. Under the ‘file’ tab, you can save all of what you crawl in various file formats. With the httpgraph plugin for Gephi however, we will simply ‘listen’ to the browser and render the graph in real time.

5. The first time you run firefox portable, you will need to configure a manual proxy. Do this by going to tools >> options >> network >> settings. Set the manual proxy configuration for http to 127.0.0.1 and the port to 8088. Click ‘ok’.

If you tried loading a webpage at this point, you’d get an error. To resolve this, you need to tell Gephi to connect to that port as well, and then web traffic will be routed correctly.

6. Open Gephi. Select new project. Under ‘generate’, select ‘http graph’. This will open a dialogue box asking for the port number. Enter 8088.

7. Over in Firefox portable, you can now start a websearch or go to the page from which you wish to crawl. For instance, you could put in the address bar, http://dougsarchaeology.wordpress.com/2013/11/05/blogging-archaeology/. Over in Gephi, you will start to see a number of nodes and edges appearing. In the ‘crawl’ window in Navicrawler, set ‘max depth’ to 1, ‘crawl distance’ to 2, and ‘tabs count’ to 25. Then hit the ‘start’ button. Your Gephi window will now begin to fill with the structure of the internet. There are 4 types of nodes: client, uri, host, and domain. For our purposes here, we will want to filter the resulting graph to hide most of the architecture of the web and just show the URIs. (This by the way could be very useful for visualizing archaeological resources organized via Linked Open Data principles).

Your crawl can run for quite some time. I was running the crawl described above for around 10 minutes when it crashed on me. The resulting Gephi file (which has 5374 nodes and 14993 edges) can be downloaded from my space on figshare. For the illustration below, I filtered the ‘content-type’ for ‘text/html’, to present the structure of the human readable archaeo-blog-o-sphere as represented by Doug’s Blogging Archaeology Carnival.

The view from Doug's place

Gaze & Eonydis for Archaeological Data

I’m experimenting with Clement Levallois’ data mining tools ‘Gaze’ and ‘Eonydis’. I created a table with some mock archaeological data in it: artefact, findspot, and date range for the artefact. More on dates in a moment. Here’s the fake dataset.

Firstly, Gaze will take a list of nodes (source, target), and create a network where the source nodes are connected to each other by virtue of sharing a common target. Clement explains:

Paul,dog
Paul, hamster
Paul,cat
Gerald,cat
Gerald,dog
Marie,horse
Donald,squirrel
Donald,cat
… In this case, it is interesting to get a network made of Paul, Gerald, Marie and Donald (sources nodes), showing how similar they are in terms of pets they own. Make sure you do this by choosing “directed networks” in the parameters of Gaze. A related option for directed networks: you can choose a minimum number of times Paul should appear as a source to be included in the computations (useful to filter out unfrequent, irrelevant nodes: because you want only owners with many pets to appear for instance).
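To see what that kind of transformation does, here is a small Python sketch that links sources by the targets they share, weighting each link by the number of shared targets. It is my own simplification for illustration, not Gaze’s actual computation (which may well weight similarity differently), and the function name is made up.

```python
from collections import defaultdict
from itertools import combinations


def shared_target_network(pairs):
    """Link sources that share a target; weight = number of shared targets."""
    sources_by_target = defaultdict(set)
    for source, target in pairs:
        sources_by_target[target.strip()].add(source.strip())

    weights = defaultdict(int)
    for sources in sources_by_target.values():
        # Every pair of sources pointing at this target gains one shared item.
        for a, b in combinations(sorted(sources), 2):
            weights[(a, b)] += 1
    return dict(weights)
```

Fed the pet list above, this ties Paul to Gerald with a weight of 2 (dog and cat), ties Donald to each of them with a weight of 1 (cat), and leaves Marie unconnected, since nobody else owns a horse.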

The output is in a nodes.dl file and an edges.dl file. In Gephi, go to the import spreadsheet button on the data table, import the nodes file first, then the edges file. Here’s the graph file.

Screenshot, Gaze output into Gephi, from mock archaeo-data

Eonydis on the other hand takes that same list and if it has time-stamps within it (a column with dates), will create a dynamic network over time. My mock dataset above seems to cause Eonydis to crash – is it my negative numbers? How do you encode dates from the Bronze Age in the day/month/year system? Checking the documentation, I see that I didn’t have proper field labels, so I needed to fix that. Trying again, it still crashed. I fiddled with the dates to remove the range (leaving a column to imply ‘earliest known date for this sort of thing’), which gave me this file.

Which still crashed. Now I have to go do some other stuff, so I’ll leave this here and perhaps one of you can pick up where I’ve left off. The example file that comes with Eonydis works fine, so I guess when I return to this I’ll carefully compare the two. Then the task will be to work out how to visualize dynamic networks in Gephi. Clement has a very good tutorial on this.

Postscript:

Ok, so I kept plugging away at it. I found if I put the dates yyyy-mm-dd, as in 1066-01-23 then Eonydis worked a treat. Here’s the mock data and here’s the gexf.

And here’s the dynamic animation! http://screencast.com/t/Nlf06OSEkuA

Post post script:

I took the mock data (archaeo-test4.csv) and concatenated a – in front of the dates, thus -1023-01-01 to represent dates BC. In Eonydis, where it asks for the date format, I tried this:

#yyyy#mm#dd  which accepted the dates, but dropped the negative;

-yyyy#mm#dd, which accepted the dates and also dropped the negative.

Thus, it seems to me that I can still use Eonydis for archaeological data, but I should frame my date column in relative terms rather than absolute, as absolute isn’t really necessary for the network analysis/visualization anyway.
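Reframing the dates in relative terms is easy to script. The Python sketch below shifts a column of years (negative for BC) so that the earliest one lands on an arbitrary positive base year, and writes each as yyyy-mm-dd, the layout that worked above. The function name and the base year of 1000 are my own choices; I have not checked which range of years Eonydis will accept, so treat the base year as something to experiment with.

```python
def to_relative_dates(years, base_year=1000, month=1, day=1):
    """Shift years so the earliest equals base_year; format as yyyy-mm-dd."""
    offset = base_year - min(years)
    return [f"{year + offset:04d}-{month:02d}-{day:02d}" for year in years]
```

The gaps between the dates are preserved, which is all the dynamic network needs: to_relative_dates([-1023, -800, -750]) gives 1000-01-01, 1223-01-01 and 1273-01-01.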

Getting Historical Network Data into Gephi

I’m running a workshop next week on getting started with networks & gephi. Below, please find my first pass at a largely self-directed tutorial. This may eventually get incorporated into the Macroscope.

Data files for this tutorial may be found here. There’s a pdf/pptx with the images below, too.

The data for this exercise comes from Peter Holdsworth’s MA dissertation research, which Peter shared on Figshare here. Peter was interested in the social networks surrounding ideas of commemoration of the centenary of the War of 1812, in 1912. He studied the membership rolls for women’s service organizations in Ontario both before and after that centenary. By making his data public, Peter enables others to build upon his own research in a way not commonly done in history. (Peter can be followed on Twitter at https://twitter.com/P_W_Holdsworth).

On with the show!

Download and install Gephi. (What follows assumes Gephi 0.8.2). You will need the MultiMode Projection plugin installed.

To install the plugin – select Tools >> Plugins  (across the top of Gephi you’ll see ‘File Workspace View Tools Window Plugins Help’. Don’t click on this ‘plugins’. You need to hit ‘tools’ first. Some images would be helpful, eh?).

In the popup, under ‘available plugins’ look for ‘MultimodeNetworksTransformation’. Tick this box, then click on Install. Follow the instructions, ignore any warnings, click on ‘finish’. You may or may not need to restart Gephi to get the plugin running. If you suddenly see on the far right of the Gephi window a new tab beside ‘statistics’ and ‘filters’, called ‘Multimode Network’, then you’re ok.

Getting the Plugin

Assuming you’ve now got that sorted out,

1. Under ‘file’, select -> New project.
2. On the data laboratory tab, select ‘Import spreadsheet’, and in the pop-up, under ‘as table’, make sure to select ‘Edges table’. Select women-orgs.csv. Click ‘next’, then click ‘finish’.

(On the data table, have ‘edges’ selected. This is showing you the source and the target for each link (aka ‘edge’). This implies a directionality to the relationship that we just don’t know – so down below, when we get to statistics, we will always have to make sure to tell Gephi that we want the network treated as ‘undirected’. More on that below.)

Loading your csv file, step 1.

Loading your CSV file, step 2

3. Click on ‘copy data to other column’. Select ‘Id’. In the pop-up, select ‘Label’
4. Just as you did in step 2, now import NODES (Women-names.csv)

(nb. You can always add more attribute data to your network this way, as long as you always use a column called Id so that Gephi knows where to slot the new information. Make sure to never tick off the box labeled ‘force nodes to be created as new ones’.)

Adding new columns

5. Copy ID to Label
6. Add new column, make it boolean. Call it ‘organization’

Filtering & ticking off the boxes

7. In the Filter box, type [a-z], and select Id – this filters out all the women.
8. Tick off the check boxes in the ‘organization’ columns.

Save this as ‘women-organizations-2-mode.gephi’.

Now, we want to explore how women are connected to other women via shared membership.

Setting up the transformation.

Make sure you have the Multimode networks projection plugin installed.

On the multimode networks projection tab,
1. click load attributes.
2. in ‘attribute type’, select organization
4. in left matrix, select ‘false – true’ (or ‘null – true’)
5. in right matrix, select ‘true – false’. (or ‘true – null’)
(do you see why this is the case? what would selecting the inverse accomplish?)

6. select ‘remove edges’ and ‘remove nodes’.

7. Hit ‘run’. The organizations will be removed from your bipartite network, leaving you with a single-mode network.

8. save as ‘women to women network.csv’

…you can reload your ‘women-organizations-2-mode.gephi’ file and re-run the multimode networks projection so that you are left with an organization to organization network.

! if your data table is blank, your filter might still be active. make sure the filter box is clear. You should be left with a list of women.

9. You can add the ‘women-years.csv’ table to your gephi file, to add the number of organizations the woman was active in, by year, as an attribute. You can then begin to filter your graph’s attributes…

10. Let’s filter by the year 1902. Under filters, select ‘attributes – equal’ and then drag ‘1902’ to the queries box.
11. in ‘pattern’ enter [0-9] and tick the ‘use regex’ box.
12. click ok, click ‘filter’.

You should now have a network with 188 nodes and 8728 edges, showing the women who were active in 1902.

Let’s learn something about this network. On statistics,
13. Run ‘avg. path length’ by clicking on ‘run’
14. In the pop up that opens, select ‘undirected’ (as we know nothing about directionality in this network).
15. click ok.

16. run ‘modularity’ to look for subgroups. make sure ‘randomize’ and ‘use weights’ are selected. Leave ‘resolution’ at 1.0

Let’s visualize what we’ve just learned.

17. On the ‘partition’ tab, over on the left hand side of the ‘overview’ screen, click on nodes, then click the green arrows beside ‘choose a partition parameter’.
18. Click on ‘choose a partition parameter’. Scroll down to modularity class. The different groups will be listed, with their colours and their % composition of the network.
19. Hit ‘apply’ to recolour your network graph.

20. Let’s resize the nodes to show off betweenness centrality (to figure out which woman was in the greatest position to influence flows of information in this network.) Click ‘ranking’.
21. Click ‘nodes’.
22. Click the down arrow on ‘choose a rank parameter’. Select ‘betweenness centrality’.
23. Click the red diamond. This will resize the nodes according to their ‘betweenness centrality’.
24. Click ‘apply’.

Now, down at the bottom of the middle panel, you can click the large black ‘T’ to display labels. Do so. Click the black letter ‘A’ and select ‘node size’.

Mrs. Mary Elliot-Murray-Kynynmound and Mrs. John Henry Wilson should now dominate your network. Who were they? What organizations were they members of? Who were they connected to? To the archives!

Congratulations! You’ve imported historical network data into Gephi, manipulated it, and run some analyses. Play with the settings on ‘preview’ in order to share your visualization as svg, pdf, or png.

Now go back to your original gephi file, and recast it as organizations to organizations via shared members, to figure out which organizations were key in early 20th century Ontario…
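If the two projections feel like magic, it may help to see the arithmetic. As I read the plugin’s ‘left matrix’ and ‘right matrix’ settings, it multiplies a women-by-organizations table by its transpose (or the reverse). The Python sketch below does that by hand on a made-up membership list; the names Ann, Bea, Cora, Guild and League are invented for illustration, and this is my understanding of the idea rather than the plugin’s own code.

```python
def incidence_matrix(memberships):
    """Rows are women, columns are organizations; 1 marks a membership."""
    women = sorted({woman for woman, _ in memberships})
    orgs = sorted({org for _, org in memberships})
    member = set(memberships)
    matrix = [[1 if (w, o) in member else 0 for o in orgs] for w in women]
    return women, orgs, matrix


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def multiply(left, right):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, column)) for column in zip(*right)]
            for row in left]


memberships = [("Ann", "Guild"), ("Ann", "League"),
               ("Bea", "Guild"), ("Cora", "League")]
women, orgs, b = incidence_matrix(memberships)
women_to_women = multiply(b, transpose(b))   # shared organizations per pair of women
orgs_to_orgs = multiply(transpose(b), b)     # shared members per pair of organizations
```

Off the diagonal, each cell of women_to_women counts the organizations two women have in common, and each cell of orgs_to_orgs counts the members two organizations share. The diagonal simply counts each woman’s memberships or each organization’s members.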

Reanimating Networks with Agent Modeling

I’m presenting next week at the Society for American Archaeology Annual Meeting. I’m giving two papers. One argues for parsimonious models when we do agent based modeling.  The other reverses the flow of archaeological network analysis and instead of finding nets in the archaeology, I use agent based models to generate networks that help me understand the archaeology. (The session is ‘Connected Past’.) Here is the draft of my talk, with all the usual caveats that that entails. Parts of it have been drawn from an unpublished piece that discusses this methodology and the results in much greater detail. It will appear…. eventually.

Scott Weingart has been an enormous help in all of this. You should follow his work. 

My interests lie in the social networks surrounding primary resource extraction in the Roman world. The Roman epigraphy of stamped brick easily lends itself to network analysis. One can string together, like pearls, individual landowners, estate names, individual brick makers, signa, brick fabrics, and locations. This leads to very complicated, multi-dimensional networks.

When I first started working with this material, I reduced this complexity by looking only at the humans, whom I tied together based on appearing in the same stamp type together. I called these ‘producer’ networks. I then looked at the ties implied by the shared use of fabrics, or the co-location of brick stamp types at various findspots, and called these ‘manufacturing’ networks.

I then sliced these networks up by reigning dynasty, and developed a story to account for their changing shapes over time.

This was in the late 1990s, and in terms of network theorists I had largely only Granovetter, Hanneman & Riddle, and Strogatz & Watts to go on. The story I told was little more than a just-so story, like how the Camel got its Hump.

I had the shape, I had points where I could hang the story, but I couldn’t account for how I got from the shape of the network in the Julio-Claudian period, to that of the Flavian, to that of the Antonines. I’ve done a lot of work on networks since then; now I want to know what generates these networks that we see archaeologically, in the first place.

In this talk today, I want to reverse the direction of my inquiry. We are all agreed that we can find networks in our archaeological materials. The problem, I think, for us, is to explain the network processes that produce these patterns, and then to use our understanding of those processes to narrow down the possible entangled human & thing interactions that could give rise to these possible processes.

We need to be able to understand the possible behaviour-spaces that could produce the networks we see, to tease out the inevitable from the contingent. We need to be able to rigorously explore the emergent or unintended consequences of the stories we tell. The only way I know how to do that systematically, is to encode those stories as computer code, to turn them from normal, archaeological storytelling rhetoric, to computational procedural rhetoric.

So this is what we did.

One story we tell about the Roman world, that might be useful for understanding things like the exploitation of land for building materials, is that its social economy functioned like a ‘bazaar’.

According to Peter Bang, the Roman economic system is best understood as a complex, agrarian tributary empire, of a kind similar to the Ottoman or Mughal (Bang 2006; 2008). Bang (2006: 72-9) draws attention to the concept of the bazaar. The bazaar was a complete social system that incorporated the small peddler alongside larger merchants and long distance trade, with a smearing of categories of role and scale. The bazaar emerged from the interplay of instability and fragmentation. The mechanisms developed to cope with these reproduced that same instability and fragmentation. Bang identifies four key mechanisms that did this: small parcels of capital (to combat risk, cf Skydsgaard 1976); little homogenization of products (agricultural output and quality varied year by year, and region by region as Pliny discusses in Naturalis Historia 12 and 18); opportunism; and social networks (80-4). As Bang demonstrates, these characteristics correspond well with the archaeology of the Roman economy and the picture we know from legal and other texts.

Bang’s model of the bazaar (2008; 2006), and the role of social networks within that model, can be simulated computationally. What follows is a speculative attempt to do so, and should be couched in all appropriate caveats and warnings. The model simulates the extraction of various natural resources, where social connections may emerge between individuals as a consequence of the interplay of the environment, transaction costs, and the agent’s knowledge of the world. If the networks generated from the computational simulation of our models for the ancient economy correspond to those we see in the ancient evidence, we have a powerful tool for exploring antiquity, for playing with different ideas about how the ancient world worked (cf. Dibble 2006). Computation might be able to bridge our models and our evidence. By computation I mean, in particular, ‘agent based modeling’.

Agent based modelling is an approach to simulation that focuses on the individual. In an agent based model, the agents or individuals are autonomous computing objects. They are their own programmes. They are allowed to interact within an environment (which frequently represents some real-world physical environment). Every agent has the same suite of variables but each agent’s individual combination of variables is unique (if it was a simulation of an ice-hockey game, every agent would have a ‘speed’ variable, and an ‘ability’ variable, and so the nature of every game would be unique). Agents can be aware of each other and the state of the world (or their location within it), depending on the needs of the simulation. It is a tool to simulate how we believe a particular phenomenon worked in the past. When we simulate, we are interrogating our own understandings and beliefs.

The model imagines a ‘world’ (‘gameboard’ would not be an inappropriate term) in which help is necessary to find and consume resources. The agents do not know when or where resources will appear or become exhausted. By accumulating resources, and ‘investing’ in improvements to make extraction easier, agents can accrue prestige. When agents get into ‘trouble’ (they run out of resources) they can examine their local area and become a ‘client’ of someone with more prestige than themselves.  It is an exceedingly simple simulation, and a necessary simplification of Bang’s ‘Bazaar’ model, but one that captures the essence and exhibits subtle complexity in its results. The resulting networks can be imported into social network analysis software like Gephi.

It is always better to start with a simple simulation, even at the expense of fidelity to the phenomenon under consideration, on the grounds that it is easier to understand and interpret outputs. A simple model can always be made more complex when we understand what it is doing and why; a complex model is rather the inverse, its outcomes difficult to isolate and understand.

A criticism of computational simulation is that one only gets out of it what one puts in; that its results are tautological. This is to misunderstand what an agent based simulation does.  In the model developed here, I put no information into the model about the ‘real world’, the archaeological information against which I measure the results. The model is meant to simulate my understanding of key elements of Bang’s formulation of the ‘Imperial Bazaar’. We measure whether or not this formulation is useful by matching its results against archaeological information which was never incorporated into the agents’ rules, procedures, or starting points. I never pre-specify the shape of the social networks that the agents will employ; rather, I allow them to generate their own social networks which I then measure against those known from archaeology. In this way, I start with the dynamic to produce static snapshots.

We sweep the ‘parameter space’ to understand how the simulation behaves; ie, the simulation is set to run multiple times with different variable settings. In this case, there are only two agent variables that we are interested in (having already pre-set the environment to reflect different kinds of resources), ‘transaction costs’ and ‘knowledge of the world’. Because we are ultimately interested in comparing the social networks produced by the model against a known network, the number of agents is set at 235, a number that reflects the networks known from archaeometric and epigraphic analysis of the South Etruria Collection of stamped Roman bricks (Graham 2006a).
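For readers who have not met a parameter sweep before, here is the bare idea as a Python sketch. The model inside it is a deliberately crude stand-in that I have invented for illustration, and it is not the simulation discussed in this talk: agents pay a transaction cost each tick, and an agent that runs out of resources becomes the client of the richest agent within its knowledge radius. Every name, number and rule in it is an assumption. The point is only the outer loop, which runs the same model several times at each combination of the two settings and records a summary.

```python
import itertools
import random


def toy_run(transaction_cost, knowledge, n_agents=235, ticks=50, seed=0):
    """Stand-in model (NOT the talk's simulation): returns patron-client links."""
    rng = random.Random(seed)
    position = [(rng.random(), rng.random()) for _ in range(n_agents)]
    resources = [rng.uniform(5, 15) for _ in range(n_agents)]
    links = set()
    for _ in range(ticks):
        for i in range(n_agents):
            # Harvest a random amount, then pay the cost of doing business.
            resources[i] += rng.uniform(0, 1) - transaction_cost
            if resources[i] > 0:
                continue
            # In trouble: look for solvent agents within the knowledge radius.
            x, y = position[i]
            nearby = [j for j in range(n_agents)
                      if j != i and resources[j] > 0
                      and (position[j][0] - x) ** 2 + (position[j][1] - y) ** 2
                      <= knowledge ** 2]
            if nearby:
                patron = max(nearby, key=lambda j: resources[j])
                links.add((i, patron))
                # The patron splits its resources with its new client.
                resources[i] = resources[patron] = resources[patron] / 2
    return links


def sweep(costs, knowledges, runs=3, **model_settings):
    """Average number of links for every (cost, knowledge) combination."""
    results = {}
    for cost, know in itertools.product(costs, knowledges):
        counts = [len(toy_run(cost, know, seed=s, **model_settings))
                  for s in range(runs)]
        results[(cost, know)] = sum(counts) / runs
    return results
```

Something like sweep([0.4, 0.6, 0.8], [0.05, 0.2, 0.5]) returns one averaged figure per combination of settings. In a real study the summary would be the network metrics of interest rather than a simple count of links, and the links themselves would be exported for analysis.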

What is particularly exciting about this kind of approach, to my mind, is that if you disagree with it, with my assumptions, with my encoded representation of how we as archaeologists believe the ancient world to have worked, you can simply download the code, make your own changes, and see for yourself. If you are presented with the results of a simulation whose hood you cannot open to examine its inner workings for yourself, you have no reason to believe those findings. Thus agent based modeling plays into open access issues as well.

So let us consider then some of the results of this model, this computational petri dish for generating social networks. For my archaeological networks, I looked at clustering coefficient and average path length as indicator metrics (key elements of Watts’ small world formulation). We can tentatively identify a small-world then as one with a short average path length and a strong clustering coefficient, compared to a randomly connected network with the same number of actors and connections. Watts suggests that a small-world exists when the path lengths are similar but the clustering coefficient is an order of magnitude greater than in the equivalent random network (Watts 1999: 114).
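The comparison can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not the procedure used for the brick stamp networks: the graph is an undirected adjacency dictionary, the random comparison graph is drawn with the same number of nodes and edges, and the thresholds in `looks_small_world` (a factor of 10 for clustering, a tolerance of 1.5 for 'similar' path lengths) are my own reading of Watts' rule of thumb.

```python
import random
from collections import deque


def clustering_coefficient(adj):
    """Mean local clustering coefficient; nodes with < 2 neighbours count as 0."""
    total = 0.0
    for nbrs in adj.values():
        nbrs = list(nbrs)
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def random_graph(n, m, seed=0):
    """Random undirected graph with n nodes and m distinct edges."""
    if m > n * (n - 1) // 2:
        raise ValueError("too many edges for this many nodes")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    edges = 0
    while edges < m:
        a, b = rng.sample(range(n), 2)
        if b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            edges += 1
    return adj


def looks_small_world(adj, c_factor=10.0, l_tolerance=1.5, seed=0):
    """Compare a graph with a random graph of the same size.

    c_factor and l_tolerance are assumed thresholds, not values from Watts.
    """
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    rand = random_graph(n, m, seed)
    c, l = clustering_coefficient(adj), average_path_length(adj)
    c_r, l_r = clustering_coefficient(rand), average_path_length(rand)
    return c > 0 and c >= c_factor * c_r and l <= l_tolerance * l_r
```

In practice one would average the random-graph metrics over many draws rather than trust a single one, since a small random graph can be unusually clustered (or disconnected) by chance.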

In Roman economic history, discussions of the degree of market integration within and across the regions of the Empire could usefully be recast as a discussion of small-worlds. If small-worlds could be identified in the archaeology (or emerge as a consequence of a simulation of the economy), then we would have a powerful tool for exploring flows of power, information, and materials. Perhaps Rome’s structural growth – or lack thereof – could be understood in terms of the degree to which the imperial economy resembles a small-world (cf the papers in Manning and Morris 2005)?

The networks generated from the study of brick stamps are of course a proxy indicator at best. Not everyone (presumably) who made brick stamped it. That said, there are some combinations of settings that produce results broadly similar to those observed in stamp networks, in terms of their internal structure and the average path length between any two agents.

One such mimics a world where transaction costs are significant (but not prohibitive), and knowledge of the world is limited. The clustering coefficient and average path length observed for stamped bricks during the second century fall within the range of results for multiple runs with these settings. In the simulation, the rate at which individuals linked together into a network suggests that there was a constant demand for help and support. The world described by the model doesn’t sound quite like the world of the second century, the height of Rome’s power, that we think we know, suggesting something isn’t quite right, in either the model or our understandings. But how much of the world did brickmakers actually know, remembering that ‘knowledge of the world’ in the model is here limited to the location of new resources to exploit?

Agent based modeling also allows us to explore the consequences of things that didn’t happen. There were a number of simulated worlds that did not produce any clustering at all (and very little social network growth). Most of those runs occurred when the resource being simulated was coppiced woodland. This would suggest that the nature of the resource is such that social networks do not need to emerge to any great degree (for the most part, they are all dyadic pairs, as small groups of agents exploit the same patch of land over and over again). The implication is that some kinds of resources do not need to be tied into social networks to any great degree in order for them to be exploited successfully (these were also some of the longest model runs, another indicator of stability).

What are some of the implications of computationally searching for the networks characteristic of the Roman economy-as-bazaar? If, despite its flaws, this model correctly encapsulates something of the way the Roman economy worked, we have an idea of, and the ability to explore, some of the circumstances that promoted economic stability. It depends on the nature of the resource and the interplay with the degree of transaction costs and the agents’ knowledge of the world. In some situations, ‘patronage’ (as instantiated in the model) serves as a system for enabling continual extraction; in other situations, patronage does not seem to be a factor.

However, with that said, none of the model runs produced networks that had the classical signals of a small-world. This is rather interesting. If we have correctly modeled the way patronage works in the Roman world, and patronage is the key to understanding Rome (cf Verboven 2002), we should have expected that small-worlds would naturally emerge. This suggests that something is missing from the model – or our thinking about patronage is incorrect. We can begin to explore the conundrum by examining the argument made in the code of the simulation, especially in the way agents search for patrons. In the model, it is a local search. There is no way of creating those occasional long-distance ties. We had initially imagined that the differences in the individual agents’ ‘vision’ would allow some agents to have a greater ability to know more about the world and thus choose from a wider range. In practice, those with greater ‘vision’ were able to find the best patches of resources; indeed, the variability in the distribution of resources allowed these individuals to squat on what was locally best. My ‘competition’ and prestige mechanisms seem to have promoted a kind of path dependence. Perhaps we should have instead included something like a ‘salutatio’, a way for the agents to assess patrons’ fitness or change patrons (cf Graham 2009; Garnsey and Woolf 1989: 154; Drummond 1989: 101; Wallace-Hadrill 1989b: 72-3). Even when models fail, their failures still throw useful light. This failure of my model suggests that we should focus on markets and fairs not just as economic mechanisms, but as social mechanisms that allow individuals to make the long distance links. A subsequent iteration of the model will include just this.

This model will come into its own once there is more and better network data drawn from archaeological, epigraphic, historical sources. This will allow the refining of both the set-up of the model and comparanda for the results. The model presented here is a very simple model, with obvious faults and limitations. Nevertheless, it does have the virtue of forcing us to think about how patronage, resource extraction, and social networks intersected in the Roman economy. It produces output that can be directly measured against archaeological data, unlike most models of the Roman economy. When one finds fault with the model (since every model is a simplification), and with the assumptions coded therein, he or she is invited to download the model and to modify it to better reflect his or her understandings. In this way, we develop a laboratory, a petri-dish, to test our beliefs about the Roman economy. We offer this model in that spirit.

[edited April 4th to make it less clumsy, and to fit in the 15 minute time frame]

 

Hodder’s ‘Tanglegram’ as Network

Hodder’s fig 9.2 as network

I am reading Ian Hodder’s book, ‘Entangled: An Archaeology of the Relationships between Humans and Things’. Hodder writes that the tanglegram cannot be represented as a network, since a network doesn’t consider the nature of the relationships or nodes. This is not in fact the case. Representing these complex relationships as a network is quite possible, and allows the ‘tanglegram’ to actually become an object to query in its own right, rather than a suggestive illustration. I’ve uploaded the network data to Figshare:
http://dx.doi.org/10.6084/m9.figshare.654626

I used NodeXL to enter the data. If there was a bidirectional tie, I made two entries: A -> B, B -> A. If it was only one way, I entered it with the directionality of the original tanglegram. I saved it as a .net file, opened it in Gephi, and ran Gephi’s statistics.
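If you would rather script that data entry than type it into NodeXL, the same convention can be sketched in Python. This is an illustration only: the ties A, B, C are placeholders rather than anything from Hodder's figure, `expand_ties` and `to_pajek` are names I have made up, and the output is a bare-bones Pajek .net layout (a `*Vertices` section followed by `*Arcs`) rather than everything NodeXL would write.

```python
def expand_ties(ties):
    """Turn (source, target, bidirectional) ties into directed edges.

    A bidirectional tie becomes two entries, A -> B and B -> A;
    a one-way tie keeps the direction it was given.
    """
    edges = []
    for source, target, bidirectional in ties:
        edges.append((source, target))
        if bidirectional:
            edges.append((target, source))
    return edges


def to_pajek(edges):
    """Render directed edges as minimal Pajek .net text."""
    labels = []
    for source, target in edges:
        for node in (source, target):
            if node not in labels:
                labels.append(node)
    index = {label: i + 1 for i, label in enumerate(labels)}  # Pajek ids start at 1
    lines = ["*Vertices {}".format(len(labels))]
    lines += ['{} "{}"'.format(index[label], label) for label in labels]
    lines.append("*Arcs")
    lines += ["{} {}".format(index[s], index[t]) for s, t in edges]
    return "\n".join(lines) + "\n"


# Placeholder ties, not data from the tanglegram.
ties = [("A", "B", True), ("B", "C", False)]
edges = expand_ties(ties)
net_text = to_pajek(edges)
print(net_text)
```

Writing `net_text` to a file with a .net extension should give something Gephi can open, though I have only sketched the format here.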

This was all rather rough and ready; because I was working from a blown-up photocopy of the original figure, and I’m trying to get ready for a trip, there could be errors. One would need Hodder’s original data to do this properly, but I offer it up here to show that it’s possible, and indeed worthwhile: why else would you bother drawing a tanglegram, if not to use it to help your analysis?

In the image below, I resize the nodes to represent betweenness centrality (which elements of the tanglegram are doing the heavy lifting?) and recolour it according to modularity. Modularity finds five groups (nodes listed in descending order of betweenness centrality):

Group 0: house, groundstone, burial, plaster, figurines, pigment, skins, painting, personal artefacts, animal heads, food storage, human heads, special food, human body parts, burials, storage rooms, bins

Group 1: hoard, chipped stone, sheep, mats, dung, wild animals, fields, bone, cereals, wooden object, weeds.

Group 2: food, hearth, fuel, ash, clay balls, oven, traps, wood

Group 3: clay, baskets, extraction pits, wetland, reeds, birds, dryland, marl, ditches, fish, clean water, landscape, field, eggs

Group 4: midden, dogs, colluvium, mortar, pen, mudbrick
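For the curious, betweenness centrality itself is not hard to compute by hand. What follows is a minimal sketch of Brandes' algorithm for an unweighted, directed graph, in plain Python; it is not Gephi's implementation, and Gephi may normalize the scores differently. The example graph is a placeholder, not the tanglegram data. (Modularity is a longer story, so I leave that to Gephi.)

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes) for a directed graph.

    adj maps each node to a set of its successors.
    """
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {n: [] for n in nodes}
        sigma = {n: 0 for n in nodes}
        dist = {n: -1 for n in nodes}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {n: 0.0 for n in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Placeholder graph: A <-> B <-> C, so every path between A and C runs through B.
example = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
scores = betweenness(example)
```

In the placeholder graph B sits on both shortest paths (A to C and C to A), which is exactly the 'heavy lifting' that the node sizes in the image are meant to show.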

Seems quite suggestive! To get the files for yourself, please see:

Hodder’s Figure 9.2, Entangled, as network. Shawn Graham. figshare.
http://dx.doi.org/10.6084/m9.figshare.654626

Retrieved 17:47, Mar 19, 2013 (GMT)

Teaching Network Analysis

I had a conversation with Scott Weingart the other day, prompted by this plaintive cry:

Backstory: I’m teaching a class where we are looking at maps and networks and archaeological data, as ways of understanding how cities and countryside blur into one another in the ancient world. Last week, we played iterated Prisoner’s Dilemmas with playing cards (thanks to this site by Alannah Morrison) as part of a discussion about Agent Based Modeling.

Which brings me to the conversation with Scott. Today, we’re playing with Gephi and making network models of the character relationships in our favourite TV shows. The next step is to combine the two lessons to address the question: what flows over networks? What do different network shapes imply, and what kinds of metrics answer what kinds of questions? So I think I’ll set up two different networks with the students – literally, I’ll arrange students in a line, a star, etc – and have them play iterated Prisoner’s Dilemmas with the people to whom they’re connected. We’ll use playing cards to represent payoffs… and hopefully we’ll see the cards flow over the network.
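The classroom exercise can also be dry-run in code before trying it on live students. The sketch below makes a pile of assumptions: the payoff values are the conventional textbook ones (5, 3, 1, 0) rather than anything to do with the playing cards, every player is either 'tit-for-tat' or 'always defect', and the `line`, `star`, and `play` helpers are my own inventions for illustration.

```python
# Conventional Prisoner's Dilemma payoffs (assumed): (my move, their move) -> (mine, theirs)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def line(n):
    """Students in a row: each is tied to the next."""
    return [(i, i + 1) for i in range(n - 1)]


def star(n):
    """One student (node 0) in the middle, tied to everyone else."""
    return [(0, i) for i in range(1, n)]


def play(edges, strategy, rounds):
    """Play iterated Prisoner's Dilemma along every tie; return total scores.

    strategy maps node -> "tft" (tit-for-tat) or "defect" (always defect).
    """
    score = {node: 0 for edge in edges for node in edge}
    last = {}  # (player, opponent) -> player's previous move against opponent
    for _ in range(rounds):
        moves = {}
        for a, b in edges:
            for me, other in ((a, b), (b, a)):
                if strategy[me] == "defect":
                    moves[(me, other)] = "D"
                else:
                    # Tit-for-tat: cooperate first, then copy the opponent.
                    moves[(me, other)] = last.get((other, me), "C")
        for a, b in edges:
            pay_a, pay_b = PAYOFF[(moves[(a, b)], moves[(b, a)])]
            score[a] += pay_a
            score[b] += pay_b
        last = moves
    return score


everyone_nice = {i: "tft" for i in range(4)}
star_scores = play(star(4), everyone_nice, rounds=2)
line_scores = play(line(4), everyone_nice, rounds=2)
```

Even with everyone cooperating, the scores pile up on the best-connected positions (the hub of the star, the middle of the line), which is one way into the question of what network shape implies.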

I thank Scott for his suggestions!

Then we’ll turn to NetLogo’s community models of network dynamics. That is, they will. The classroom computer is so locked down that I can’t run a freaking Java applet in the classroom.

Anyway, that’s the plan for today.

Coins from Sirmium in the PAS database

Interested in networks, and looking for an exemplar that I could do in my class, I turned to the Portable Antiquities Scheme database, and extracted coins known to have originated from the mint at Sirmium (modern Sremska Mitrovica, Serbia). You can find the list here. I downloaded the data as a CSV. Looking at it, it seemed to me that a multimodal graph of coin to findspot, to material, to date, and to ruling house might be useful (and of course could be transformed into single mode graphs as necessary).  So I made a list, where the 21 coins were marked ‘source’ and the other data were marked ‘target’ (which means that I repeated the coins four times in my list).
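Making that list by hand is fine for 21 coins, but it can be scripted. Here is a minimal sketch in Python, assuming a CSV with one row per coin; the column names (`coin_id`, `findspot`, `material`, `date`, `ruler`) and the sample rows are invented for illustration and are not the headers or records of the PAS export.

```python
import csv
import io


def coin_edges(rows, id_column, attribute_columns):
    """Build (source, target) pairs: each coin points at each of its attributes.

    With four attribute columns, every coin appears four times as a source.
    """
    edges = []
    for row in rows:
        for column in attribute_columns:
            value = (row.get(column) or "").strip()
            if value:  # skip blank cells rather than link coins to nothing
                edges.append((row[id_column], value))
    return edges


def write_edges(edges, handle):
    """Write an edge list with the Source,Target header that Gephi imports."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)


# Invented sample data standing in for the downloaded CSV.
sample = io.StringIO(
    "coin_id,findspot,material,date,ruler\n"
    "coin-1,Place A,copper,AD 300,Ruler X\n"
    "coin-2,Place B,silver,AD 350,Ruler Y\n"
)
rows = list(csv.DictReader(sample))
edges = coin_edges(rows, "coin_id", ["findspot", "material", "date", "ruler"])

out = io.StringIO()
write_edges(edges, out)
```

To use it on a real download, replace the `io.StringIO` sample with `open("yourfile.csv", newline="")` and change the column names to match whatever the export actually calls them.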

Here is the resulting network in a zoomable pdf: coins from Sirmium in the UK – PAS. I ran modularity and betweenness centrality. The most central nodes were ‘copper’ as a material, Constantine I and II, and then coins 283287 and 433211 and the date, AD 324. If you revisualize the graph so that the communities are grouped into single nodes, the most ‘between’ of these communities is group 3, which has the following data: coins 410195, 272397, 451670, 474164, 283287, silver as a material, Constantius II and Valentinian I, and Bedford, Isle of Wight, North Kesteven, and the Vale of White Horse.

I’m no numismatist, but perhaps the coin folks out there can take a look at this small experiment, and tell me if these patterns are meaningful to them… my CSV files & Gephi files are here:

Coins from Sirmium, a networking experiment. Shawn Graham. figshare.
Retrieved 17:15, Oct 02, 2012 (GMT)
http://dx.doi.org/10.6084/m9.figshare.96219