Beyond the Spaghetti Monster

No, I don’t mean that spaghetti monster. I mean the one that people invoke when they wish to disparage network analysis. That particular spaghetti monster is some variant of a force-directed layout algorithm. Now, these have their place, but they sometimes obscure more than they illuminate. There are alternatives, and Elijah Meeks has been sharing some d3.js code for making interactive ‘arc diagrams’ and ‘adjacency matrices’ that highlight important patterns in network data without the monstrousness.

Elijah writes:

An arc diagram is another way of visualizing networks that doesn’t use force-directed principles. Instead, it draws the edge from one node to another as arcs above or below the nodes. Weight is indicated by edge thickness and directionality is indicated by the arc being above or below the nodes as well as with the edge getting wider at the source.

Over at http://bl.ocks.org/emeeks/9458332  Elijah shows us the d3 code for making such a creature. In essence, the code says to your browser, ‘there’s an edgelist, and a nodelist, and they go together like this.’ Since it’s using d3.js (data-driven documents), it loads that library up to make this process easier.  If you wanted to draw one of these things for yourself, you need to copy Elijah’s index.html code from his bl.ocks.org page, and then create two files, edgelist.csv and nodelist.csv.  If you have a network in Gephi, you can export both of these from the data laboratory tab by clicking ‘export spreadsheet’.

Similarly, Elijah provides an interactive adjacency matrix at http://bl.ocks.org/emeeks/9441864

An adjacency matrix is a useful way of visualizing networks using an n-by-n grid that shows connection between nodes as a filled grid square. This adjacency matrix is directed, with the source on the y-axis and target on the x-axis. It loads the data from a node list and edge list and represents edge weight using opacity. It also highlights rows and columns on mouseover.

If you copy that bit of html into a new file, it points to the same nodelist.csv and edgelist.csv. Voila! Two very crisp and clear visualizations of the structure of your network, with very little spaghetti mess. Here is Peter Holdsworth’s network of women from 1898 Ontario as both an arc diagram and an adjaceny matrix (and I thank Peter for making is data public for these kinds of explorations – so, I opened his .gexf network file in Gephi. On the data laboratory tab I hit ‘export spreadsheet’ for the nodes table, and then the edges table. I opened the csv files in excel, stripped out extraneous fields, and saved as csv):

1898 - Women networked by virtue of shared membership in various organizations

1898 – Women networked by virtue of shared membership in various organizations

Same again

Same again

Contrast these with the spaghetti version that was generated with gephi (Figshare provides a preview here). The patterning is much clearer and intuitive, I think. It’s beyond my programming prowess, but it should not be overly difficult for someone to package this code as a layout plugin for Gephi I would think.

Now, here’s the thing – you’ll need to put the html and the csv into the same folder on a server somewhere for this to work. I use WAMP for this kind of thing before moving everything onto the live interwebs. Installing WAMP is quite straightforward; it’s a one-click installer. Once you’ve got it installed, and running, you simply create a subfolder inside the c:\wamp\www\ folder, ie \myproject\. Then in your browser, got to localhost\myproject. Save your html and csv files in that folder. In your browser, click on your html file, and you’re good to go.

Elijah does point out:

This may be true, but if one plays with some of the html, making the canvas bigger, some of this can be mitigated… As with most things, no one approach is going to do everything you need it to, but these two visualizations should be in your toolkit.

Mapping the Web in Real Time

I don’t think I’ve shared my workflow before for mapping the structure of a webcrawl. After listening to Sebastian Heath speak at #dapw it occurred to me that it might be useful for, interalia linked open data type resources. So, here’s what you do (and my example draw’s from this year’s SAA 2014 blogging archaeology session blog-o-sphere):

1. install the http graph generator from the gephi plugin marketplace.

2. download the navicrawler + firefox portable zip file at the top of this page.

3. make sure no other instance of firefox is open. Open firefox portable. DO NOT click the ‘update firefox’ button, as this will make navicrawler unusable.

4. Navicrawler can be used to download or scrape the web. In the navicrawler window, click on the (+) to select the ‘crawl’ pane. This will let you set how deep and how far to crawl. Under the ‘file’ tab, you can save all of what you crawl in various file formats. With the httpgraph plugin for Gephi however, we will simply ‘listen’ to the browser and render the graph in real time.

5. The first time you run firefox portable, you will need to configure a manual proxy. Do this by going to tools >> options >> network >> settings. Set the manual proxy configuration for http to 127.0.0.1 and the port to 8088. Click ‘ok’.

If you tried loading a webpage at this point, you’d get an error. To resolve this, you need to tell Gephi to connect to that port as well, and then web traffic will be routed correctly.

6. Open Gephi. Select new project. Under ‘generate’, select ‘http graph’. This will open a dialogue box asking for the port number. Enter 8088.

7. Over in Firefox portable, you can now start a websearch or go to the page from which you wish to crawl. For instance, you could put in the address bar, http://dougsarchaeology.wordpress.com/2013/11/05/blogging-archaeology/. Over in gephi, you will start to see a number of nodes and edges appearing. In the ‘crawl’ window in Navicrawler, set ‘max depth’ to 1, ‘crawl distance’ to 2′ and ‘tabs count’ to 25. Then hit the ‘start’ button. Your Gephi window will now begin to fill with the structure of the internet. There are 4 types of nodes: client, uri, host, and domain. For our purposes here, we will want to filter the resulting graph to hide most of the architecture of the web and just show the URIs. (This by the way could be very useful for visualizing archaeological resources organized via Linked Open Data principles).

Your crawl can run for quite some time.  I was running the crawl describe above for around 10 minutes when it crashed on me. The resulting gephi file (which has 5374 nodes and 14993 edges) can be downloaded from my space on figshare. For the illustration below, I filtered the ‘content-type’ for ‘text/html’, to present the structure of the human readable archaeo-blog-o-sphere as represented by Doug’s Blogging Archaeology Carnival.

The view from Doug's place
The view from Doug’s place

Gaze & Eonydis for Archaeological Data

I’m experimenting with Clement Levallois‘ data mining tools ‘Gaze‘ and ‘Eonydis‘. I created a table with some mock archaeological data in it: artefact, findspot, and date range for the artefact. More on dates in a moment. Here’s the fake dataset.

Firstly, Gaze will take a list of nodes (source, target), and create a network where the source nodes are connected to each other by virtue of sharing a common target. Clement explains:

Paul,dog
Paul, hamster
Paul,cat
Gerald,cat
Gerald,dog
Marie,horse
Donald,squirrel
Donald,cat
… In this case, it is interesting to get a network made of Paul, Gerald, Marie and Donald (sources nodes), showing how similar they are in terms of pets they own. Make sure you do this by choosing “directed networks” in the parameters of Gaze. A related option for directed networks: you can choose a minimum number of times Paul should appear as a source to be included in the computations (useful to filter out unfrequent, irrelevant nodes: because you want only owners with many pets to appear for instance).

The output is in a nodes.dl file and an edges.dl file. In Gephi, go to the import spreadsheet button on the data table, import the nodes file first, then the edges file. Here’s the graph file.

Screenshot, Gaze output into Gephi, from mock archaeo-data

Screenshot, Gaze output into Gephi, from mock archaeo-data

Eonydis on the other hand takes that same list and if it has time-stamps within it (a column with dates), will create a dynamic network over time. My mock dataset above seems to cause Eonydis to crash – is it my negative numbers? How do you encode dates from the Bronze Age in the day/month/year system? Checking the documentation, I see that I didn’t have proper field labels, so I needed to fix that. Trying again, it still crashed. I fiddled with the dates to remove the range (leaving a column to imply ‘earliest known date for this sort of thing’), which gave me this file.

Which still crashed. Now I have to go do some other stuff, so I’ll leave this here and perhaps one of you can pick up where I’ve left off. The example file that comes with Eonydis works fine, so I guess when I return to this I’ll carefully compare the two. Then the task will be to work out how to visualize dynamic networks in Gephi. Clement has a very good tutorial on this.

Postscript:

Ok, so I kept plugging away at it. I found if I put the dates yyyy-mm-dd, as in 1066-01-23 then Eonydis worked a treat. Here’s the mock data and here’s the gexf.

And here’s the dynamic animation! http://screencast.com/t/Nlf06OSEkuA

Post post script:

I took the mock data (archaeo-test4.csv) and concatenated a – in front of the dates, thus -1023-01-01 to represent dates BC. In Eonydis, where it asks for the date format, I tried this:

#yyyy#mm#dd  which accepted the dates, but dropped the negative;

-yyyy#mm#dd, which accepted the dates and also dropped the negative.

Thus, it seems to me that I can still use Eonydis for archaeological data, but I should frame my date column in relative terms rather than absolute, as absolute isn’t really necessary for the network analysis/visualization anyway.

Getting Historical Network Data into Gephi

I’m running a workshop next week on getting started with networks & gephi. Below, please find my first pass at a largely self-directed tutorial. This may eventually get incorporated into the Macroscope.

Data files for this tutorial may be found here. There’s a pdf/pptx with the images below, too.

The data for this exercise comes from Peter Holdsworth’s MA dissertation research, which Peter shared on Figshare here. Peter was interested in the social networks surrounding ideas of commemoration of the centenerary of the War of 1812, in 1912. He studied the membership rolls for women’s service organization in Ontario both before and after that centenerary. By making his data public, Peter enables others to build upon his own research in a way not commonly done in history. (Peter can be followed on Twitter at https://twitter.com/P_W_Holdsworth).

On with the show!

Download and install Gephi. (What follows assumes Gephi 0.8.2). You will need the MultiMode Projection pluging installed.

To install the plugin – select Tools >> Plugins  (across the top of Gephi you’ll see ‘File Workspace View Tools Window Plugins Help’. Don’t click on this ‘plugins’. You need to hit ‘tools’ first. Some images would be helpful, eh?).

In the popup, under ‘available plugins’ look for ‘MultimodeNetworksTransformation’. Tick this box, then click on Install. Follow the instructions, ignore any warnings, click on ‘finish’. You may or may not need to restart Gephi to get the plugin running. If you suddenly see on the far right of ht Gephi window a new tab besid ‘statistics’, ‘filters’, called ‘Multimode Network’, then you’re ok.

Slide1

Getting the Plugin

Assuming you’ve now got that sorted out,

1. Under ‘file’, select -> New project.
2. On the data  laboratory tab, select Import-spreadsheet, and in the pop-up, make sure to select under ‘as table: EDGES table. Select women-orgs.csv.  Click ‘next’, click finish.

(On the data table, have ‘edges’ selected. This is showing you the source and the target for each link (aka ‘edge’). This implies a directionality to the relationship that we just don’t know – so down below, when we get to statistics, we will always have to make sure to tell Gephi that we want the network treated as ‘undirected’. More on that below.)

Slide2

Loading your csv file, step 1.

Slide3

Loading your CSV file, step 2

3. Click on ‘copy data to other column’. Select ‘Id’. In the pop-up, select ‘Label’
4. Just as you did in step 2, now import NODES (Women-names.csv)

(nb. You can always add more attribute data to your network this way, as long as you always use a column called Id so that Gephi knows where to slot the new information. Make sure to never tick off the box labeled ‘force nodes to be created as new ones’.)

Adding new columns

Adding new columns

5. Copy ID to Label
6. Add new column, make it boolean. Call it ‘organization’

Filtering & ticking off the boxes

Filtering & ticking off the boxes

7. In the Filter box, type [a-z], and select Id – this filters out all the women.
8. Tick off the check boxes in the ‘organization’ columns.

Save this as ‘women-organizations-2-mode.gephi’.

Now, we want to explore how women are connected to other women via shared membership.

Setting up the transformation.

Setting up the transformation.

Make sure you have the Multimode networks projection plugin installed.

On the multimode networks projection tab,
1. click load attributes.
2. in ‘attribute type’, select organization
4. in left matrix, select ‘false – true’ (or ‘null – true’)
5. in right matrix, select ‘true – false’. (or ‘true – null’)
(do you see why this is the case? what would selecting the inverse accomplish?)

6. select ‘remove edges’ and ‘remove nodes’.

7. Once you hit ‘run’, organizations will be removed from your bipartite network, leaving you with a single-mode network. hit ‘run’.

8. save as ‘women to women network.csv’

…you can reload your ‘women-organizations-2-mode.gephi’ file and re-run the multimode networks projection so that you are left with an organization to organization network.

! if your data table is blank, your filter might still be active. make sure the filter box is clear. You should be left with a list of women.

9. You can add the ‘women-years.csv’ table to your gephi file, to add the number of organizations the woman was active in, by year, as an attribute. You can then begin to filter your graph’s attributes…

10. Let’s filter by the year 1902. Under filters, select ‘attributes – equal’ and then drag ’1902′ to the queries box.
11. in ‘pattern’ enter [0-9] and tick the ‘use regex’ box.
12. click ok, click ‘filter’.

You should now have a network with 188 nodes and 8728 edges, showing the women who were active in 1902.

Let’s learn something about this network. On statistics,
13. Run ‘avg. path length’ by clicking on ‘run’
14. In the pop up that opens, select ‘undirected’ (as we know nothing about directionality in this network).
15. click ok.

16. run ‘modularity’ to look for subgroups. make sure ‘randomize’ and ‘use weights’ are selected. Leave ‘resolution’ at 1.0

Let’s visualize what we’ve just learned.

17. On the ‘partition’ tab, over on the left hand side of the ‘overview’ screen, click on nodes, then click the green arrows beside ‘choose a partition parameter’.
18. Click on ‘choose a partition parameter’. Scroll down to modularity class. The different groups will be listed, with their colours and their % composition of the network.
19. Hit ‘apply’ to recolour your network graph.

20. Let’s resize the nodes to show off betweeness-centrality (to figure out which woman was in the greatest position to influence flows of information in this network.) Click ‘ranking’.
21. Click ‘nodes’.
22. Click the down arrow on ‘choose a rank parameter’. Select ‘betweeness centrality’.
23. Click the red diamond. This will resize the nodes according to their ‘betweeness centrality’.
24. Click ‘apply’.

Now, down at the bottom of the middle panel, you can click the large black ‘T’ to display labels. Do so. Click the black letter ‘A’ and select ‘node size’.

Mrs. Mary Elliot-Murray-Kynynmound and Mrs. John Henry Wilson should now dominate your network. Who were they? What organizations were they members of? Who were they connected to? To the archives!

Congratulations! You’ve imported historical network data into Gephi, manipulated it, and run some analyzes. Play with the settings on ‘preview’ in order to share your visualization as svg, pdf, or png.

Now go back to your original gephi file, and recast it as organizations to organizations via shared members, to figure out which organizations were key in early 20th century Ontario…

Reanimating Networks with Agent Modeling

I’m presenting next week at the Society for American Archaeology Annual Meeting. I’m giving two papers. One argues for parsimonious models when we do agent based modeling.  The other reverses the flow of archaeological network analysis and instead of finding nets in the archaeology, I use agent based models to generate networks that help me understand the archaeology. (The session is ‘Connected Past’.) Here is the draft of my talk, with all the usual caveats that that entails. Parts of it have been drawn from an unpublished piece that discusses this methodology and the results in much greater detail. It will appear…. eventually.

Scott Weingart has been an enormous help in all of this. You should follow his work. 

My interests lie in the social networks surrounding primary resource extraction in the Roman world. The Roman epigraphy of stamped brick easily lends itself to network analysis. One string together, like pearls, individual landowners, estate names, individual brick makers, signa, brick fabrics, and locations. This leads to very complicated, multi-dimensional networks.

When I first started working with this material, I reduced this complexity by looking only at the humans, whom I tied together based on appearing in the same stamp type together. I called these ‘producer’ networks. I then looked at the ties implied by the shared use of fabrics, or the co-location of brick stamp types at various findspots, and called these ‘manufacturing’ networks.

I then sliced these networks up by reigning dynasty, and developed a story to account for their changing shapes over time.

This was in the late 1990s, and in terms of network theorists I had largely only Granovetter, Hanneman & Riddle, and Strogatz & Watts to go on. The story I told was little more than a just-so story, like how the Camel got its Hump.

I had the shape, I had points where I could hang the story, but I couldn’t account for how I got from the shape of the network in the Julio-Claudian period, to that of the Flavian, to that of the Antonines. I’ve done a lot of work on networks since then; now I want to know what generates these networks that we see archaeologically, in the first place.

In this talk today, I want to reverse the direction of my inquiry. We are all agreed that we can find networks in our archaeological materials. The problem, I think, for us, is to explain the network processes that produce these patterns, and then to use our understanding of those processes to narrow down the possible entangled human & thing interactions that could give rise to these possible processes.

We need to be able to understand the possible behaviour-spaces that could produce the networks we see, to tease out the inevitable from the contingent. We need to be able to rigorously explore the emergent or unintended consequences of the stories we tell. The only way I know how to do that systematically, is to encode those stories as computer code, to turn them from normal, archaeological storytelling rhetoric, to computational procedural rhetoric.

So this is what we did.

One story we tell about the Roman world, that might be useful for understanding things like the exploitation of land for building materials, is that its social economy functioned like a ‘bazaar’.

According to Peter Bang, the Roman economic system is best understood as a complex, agrarian tributary empire, of a kind similar to the Ottoman or Mughal (Bang 2006; 2008).  Bang (2006: 72-9) draws attention to the concept of the bazaar. The bazaar was a complete social system that incorporated the small peddler with larger merchants, long distance trade, with a smearing of categories of role and scale. The bazaar emerged from the interplay of instability and fragmentation. The mechanisms developed to cope with these reproduced that same instability and fragmentation. Bang identifies four key mechanisms that did this: small parcels of capital (to combat risk, cf Skydsgaard 1976); little homogenization of products (agricultural output and quality varied year by year, and region by region as Pliny discusses in Naturalis Historia 12 and 18); opportunism; and social networks (80-4). As Bang demonstrates, these characteristics correspond well with the archaeology of the Roman economy and the picture we know from legal and other text.

Bang’s model of the bazaar (2008; 2006), and the role of social networks within that model, can be simulated computationally. What follows is a speculative attempt to do so, and should be couched in all appropriate caveats and warnings. The model simulates the extraction of various natural resources, where social connections may emerge between individuals as a consequence of the interplay of the environment, transaction costs, and the agent’s knowledge of the world. If the networks generated from the computational simulation of our models for the ancient economy correspond to those we see in the ancient evidence , we have a powerful tool for exploring antiquity, for playing with different ideas about how the ancient world worked (cf. Dibble 2006). Computation might be able to bridge our models and our evidence. In particular, I mean, ‘agent based modeling’.

Agent based modelling is an approach to simulation that focuses on the individual. In an agent based model, the agents or individuals are autonomous computing objects. They are their own programmes. They are allowed to interact within an environment (which frequently represents some real-world physical environment). Every agent has the same suite of variables but each agent’s individual combination of variables is unique (if it was a simulation of an ice-hockey game, every agent would have a ‘speed’ variable, and an ‘ability’ variable, and so the nature of every game would be unique). Agents can be aware of each other and the state of the world (or their location within it), depending on the needs of the simulation. It is a tool to simulate how we believe a particular phenomenon worked in the past. When we simulate, we are interrogating our own understandings and beliefs.

The model imagines a ‘world’ (‘gameboard’ would not be an inappropriate term) in which help is necessary to find and consume resources. The agents do not know when or where resources will appear or become exhausted. By accumulating resources, and ‘investing’ in improvements to make extraction easier, agents can accrue prestige. When agents get into ‘trouble’ (they run out of resources) they can examine their local area and become a ‘client’ of someone with more prestige than themselves.  It is an exceedingly simple simulation, and a necessary simplification of Bang’s ‘Bazaar’ model, but one that captures the essence and exhibits subtle complexity in its results. The resulting networks can be imported into social network analysis software like Gephi.

It is always better to start with a simple simulation, even at the expense of fidelity to the phenomenon under consideration, on the grounds that it is easier to understand and interpret outputs. A simple model can always be made more complex when we understand what it is doing and why; a complex model is rather the inverse, its outcomes difficult to isolate and understand.

A criticism of computational simulation is that one only gets out of it what one puts in; that its results are tautological. This is to misunderstand what an agent based simulation does.  In the model developed here, I put no information into the model about the ‘real world’, the archaeological information against which I measure the results. The model is meant to simulate my understanding of key elements of Bang’s formulation of the ‘Imperial Bazaar’. We measure whether or not this formulation is useful by matching its results against archaeological information which was never incorporated into the agents’ rules, procedures, or starting points. I never pre-specify the shape of the social networks that the agents will employ; rather, I allow them to generate their own social networks which I then measure against those known from archaeology. In this way, I start with the dynamic to produce static snapshots.

We sweep the ‘parameter space’ to understand how the simulation behaves; ie, the simulation is set to run multiple times with different variable settings. In this case, there are only two agent variables that we are interested in (having already pre-set the environment to reflect different kinds of resources), ‘transaction costs’ and ‘knowledge of the world’. Because we are ultimately interested in comparing the social networks produced by the model against a known network, the number of agents is set at 235, a number that reflects the networks known from archaeometric and epigraphic analysis of the South Etruria Collection of stamped Roman bricks (Graham 2006a).

What is particularly exciting about this kind of approach, to my mind, is that if you disagree with it, with my assumptions, with my encoded representation of how we as archaeologists believed the ancient world to have worked, you can simply download the code, make your own changes, and see for yourself. If you are presented with the results of a simulation that you cannot open the hood and examine its inner workings for yourself, you have no reason to believe those findings. Thus agent based modeling plays into open access issues as well.

So let us consider then some of the results of this model, this computational petri dish for generating social networks.For my archaeological networks, I looked at clustering coefficient and average path length as indicator metrics, (key elements of Watts’ small world formulation).  We can tentatively identify a small-world then as one with a short average path length and a strong clustering coefficient, compared to a randomly connected network with the same number of actors and connections. Watts suggests that a small-world exists when the path lengths are similar but the clustering coefficient is an order of magnitude greater than in the equivalent random network (Watts 1999: 114).

In Roman economic history, discussions of the degree of market integration within and across the regions of the Empire could usefully be recast as a discussion of small-worlds. If small-worlds could be identified in the archaeology (or emerge as a consequence of a simulation of the economy), then we would have a powerful tool for exploring flows of power, information, and materials. Perhaps Rome’s structural growth – or lack thereof – could be understood in terms of the degree to which the imperial economy resembles a small-world (cf the papers in Manning and Morris 2005)?

The networks generated from the study of brick stamps are of course a proxy indicator at best. Not everyone (presumably) who made brick stamped it. That said, there are some combinations of settings that produce results broadly similar to those observed in stamp networks, in terms of their internal structure and the average path length between any two agents.

One such mimics a world where transaction costs are significant (but not prohibitive), and knowledge of the world is limited . The clustering coefficient and average path length observed for stamped bricks during the second century fall within the range of results for multiple runs with these settings. In the simulation, the rate at which individuals linked together into a network suggests that there was a constant demand for help and support. The world described by the model doesn’t sound quite like the world of the second century, the height of Rome’s power, that we think we know, suggesting something isn’t quite right, in either the model or our understandings. But how much of the world did brickmakers actually know, remembering that ‘knowledge of the world’ in the model is here limited to the location of new resources to exploit?

Agent based modeling also allow us to explore the consequences of things that didn’t happen. There were a number of simulated worlds that did not produce any clustering at all (and very little social network growth). Most of those runs occurred when the resource being simulated was coppiced woodland. This would suggest that the nature of the resource is such that social networks do not need to emerge to any great degree (for the most part, they are all dyadic pairs, as small groups of agents exploit the same patch of land over and over again). The implication is that some kinds of resources do not need to be tied into social networks to any great degree in order for them to be exploited successfully (these were also some of the longest model runs, another indicator of stability).

What are some of the implications of computationally searching for the networks characteristic of the Roman economy-as-bazaar? If, despite its flaws, this model correctly encapsulates something of the way the Roman economy worked, we have an idea of, and the ability to explore, some of the circumstances that promoted economic stability. It depends on the nature of the resource and the interplay with the degree of transaction costs and the agents’ knowledge of the world. In some situations, ‘patronage’ (as instantiated in the model) serves as a system for enabling continual extraction; in other situations, patronage does not seem to be a factor.

However, with that said, none of the model runs produced networks that had the classical signals of a small-world. This is rather interesting. If we have correctly modeled the way patronage works in the Roman world, and patronage is the key to understanding Rome (cf Verboven 2002), we should have expected that small-worlds would naturally emerge. This suggests that something is missing from the model – or our thinking about patronage is incorrect. We can begin to explore the conundrum by examining the argument made in the code of the simulation, especially in the way agents search for patrons. In the model, it is a local search. There is no way of creating those occasionally long-distance ties. We had initially imagined that the differences in the individual agents’ ‘vision’ would allow some agents to have a greater ability to know more about the world and thus choose from a wider range. In practice, those with greater ‘vision’ were able to find the best patches of resources, indeed, the variability in the distribution of resources allowed these individuals to squat on what was locally best. My ‘competition’ and prestige mechanisms seem to have promoted a kind of path dependence. Perhaps we should have instead included something like a ‘salutatio’, a way for the agents to assess patrons’ fitness or change patrons (cf Graham 2009; Garnsey and Woolf 1989: 154; Drummond 1989: 101; Wallace-Hadrill 1989b: 72-3). Even when models fail, their failures still throw useful light. This failure of my model suggests that we should focus on markets and fairs as not just economic mechanisms, but as social mechanisms that allow individuals to make the long distance links. A subsequent iteration of the model will include just this.

This model will come into its own once there is more and better network data drawn from archaeological, epigraphic, historical sources. This will allow the refining of both the set-up of the model and comparanda for the results. The model presented here is a very simple model, with obvious faults and limitations. Nevertheless, it does have the virtue of forcing us to think about how patronage, resource extraction, and social networks intersected in the Roman economy. It produces output that can be directly measured against archaeological data, unlike most models of the Roman economy. When one finds fault with the model (since every model is a simplification), and with the assumptions coded therein, he or she is invited to download the model and to modify it to better reflect his or her understandings. In this way, we develop a laboratory, a petri-dish, to test our beliefs about the Roman economy. We offer this model in that spirit.

[edited April 4th to make it less clumsy, and to fit in the 15 minute time frame]

 

Hodder’s ‘Tanglegram’ as Network

Hodder's fig 9.2 as network

Hodder’s fig 9.2 as network

I am reading Ian Hodder’s book, ‘Entangled: An Archaeology of the Relationship between Humans and Things’ Hodder writes that the tanglegram cannot be represented as a network, since a network doesn’t consider the nature of the relationships or nodes. This is not in fact the case. Representing these complex relationships as a network is quite possible, and allows the ‘tanglegram’ to actually become a object to query in its own right, rather than a suggestive illustration. I’ve uploaded the network data to Figshare:
http://dx.doi.org/10.6084/m9.figshare.654626

I used NodeXL to enter the data. If there was a bidirectional tie, I made two entries: A -> B, B -> A. If it was only one way, I entered it with the directionality of the original tanglegram. I saved it as a .net file, opened it in gephi, and ran gephi’s statistics.

This was all rather rough and ready; because I was working from a blown-up photocopy of the original figure, and I’m trying to get ready for a trip, there could be errors. One would need Hodder’s original data to do this properly, but I offer it up here to show that it’s possible, and indeed worthwhile: why else would you bother drawing a tanglegram, if not to use it to help your analysis?

In the image below, I resize the nodes to represent betweenness centrality (which elements of the tanglegram are doing the heavy lifting?) and recolour it according to modularity. Modularity finds five groups (nodes listed in descending order of betweenness centrality):

Group 0: house, groundstone, burial, plaster, figurines, pigment, skins, painting, personal artefacts, animal heads, food storage, human heads, special food, human body parts, burials, storage rooms, bins

Group 1: hoard, chipped stone, sheep, mats, dung, wild animals, fields, bone, cereals, wooden object, weeds.

Group 2: food, hearth, fuel, ash, clay balls, oven, traps, wood

Group 3: clay, baskets, extraction pits, wetland, reeds, birds, dryland, marl, ditches, fish, clean water, landscape, field, eggs

Group 4: midden, dogs, colluvium, mortar, pen, mudbrick

Seems quite suggestive! For the files for yourself, please see:

Hodder’s Figure 9.2, Entangled, as network. Shawn Graham. figshare.
http://dx.doi.org/10.6084/m9.figshare.654626

Retrieved 17:47, Mar 19, 2013 (GMT)

Teaching Network Analysis

I had a conversation with Scott Weingart the other day, prompted by this plaintive cry:

Backstory: I’m teaching a class where we are looking at maps and networks and archaeological data, as ways of understanding how cities and countryside blur into one another in the ancient world. Last week, we played iterated Prisoner’s Dilemma’s with playing cards (thanks to this site by Alannah Morrison) as part of a discussion about Agent Based Modeling.

Which brings me to the conversation with Scott. Today, we’re playing with Gephi and making network models of the character relationships in our favourite TV shows. The next step is to combine the two lessons to address the question: what flows over networks? What do different network shapes imply, and what kinds of metrics answer what kinds of questions? So I think I’ll set up two different networks with the students – literally, I’ll arrange students in a line, a star, etc – and have them play iterated Prisoner’s Dilemmas with the people to whom they’re connected. We’ll use playing cards to represent payoffs… and hopefully we’ll see the cards flow over the network.

I thank Scott for his suggestions!

Then we’ll turn to Netlogo’s community models of network dynamics. That is, they will. The classroom computer is so locked down that I can’t run a freaking java applet in the classroom.

Anyway, that’s the plan for today.

Coins from Sirmium in the PAS database

Interested in networks, and looking for an exemplar that I could do in my class, I turned to the Portable Antiquities Scheme database, and extracted coins known to have originated from the mint at Sirmium (modern Sremska Mitrovica, Serbia). You can find the list here. I downloaded the data as a CSV. Looking at it, it seemed to me that a multimodal graph of coin to findspot, to material, to date, and to ruling house might be useful (and of course could be transformed into single mode graphs as necessary).  So I made a list, where the 21 coins were marked ‘source’ and the other data were marked ‘target’ (which means that I repeated the coins four times in my list).

Here is the resulting network in a zoomable pdf: coins from Sirmium in the UK – PAS. I ran modularity and betweeness centrality. Most central nodes were ‘copper’ as a material, Constantine I and II, and then coins 283287 and 433211 and the date, AD324. If you revisualize the graph so that the communities are grouped into single nodes, the most ‘between’ of these communities is group 3, which has the following data: coins 410195, 272397, 451670, 474164, 283287, silver as a material, Constantius II and Valentinian I, and Bedford, Isle of Wight, North Kesteven, and the Vale of White Horse.

I’m no numismatist, but perhaps the coin folks out there can take a look at this small experiment, and tell me if these patterns are meaningful to them… my csv files & gephi files are here:

Coins from Sirmium, a networking experiment. Shawn Graham. figshare.
Retrieved 17:15, Oct 02, 2012 (GMT)
http://dx.doi.org/10.6084/m9.figshare.96219

Visualizing THATCamp

THATCamps are quite popular. I’m throwing one myself. But who are the people talking about them on Twitter? What does the THATCamp look like on the Twitterverse?

I used NodeXL to retrieve the data – a search for tweets, people, and the links between them. I then visualized the data in Gephi, where colour = community (per Gephi’s modularity routine) and sized the nodes (individual Twitterers) using Pagerank, on the premise that this was a directed graph and one should follow the links (although there was little difference with Betweeness Centrality. Major players are still major, either way).

I found 233 individuals, linked together by 4435 edges. Some general stats on this directed network:

Top 10 Vertices, Ranked by Betweenness Centrality Betweenness Centrality
thatcamp 10493.93299
marindacos 3381.65598
amandafrench 2530.589717
openeditionsays 2491.27153
inactinique 2183.450362
briancroxall 2093.876857
piotrr70 2014.064889
brettbobley 1798.013658
miriamkp 1693.203103
melissaterras 1596.42585

 

Top Replied-To in Entire Graph Entire Graph Count
colleengreene 4
thatcamp 3
rosemarysewart 2
normasalim 2
janaremy 2
spagnoloacht 1
chuckrybak 1
lawnsports 1
ncecire 1
academicdave 1
Top Mentioned in Entire Graph Entire Graph Count
thatcamp 25
piotrr70 25
briancroxall 17
ncecire 16
spouyllau 14
thtcmpfeminisms 10
marindacos 8
dhlib2012 8
thatcamprtp 7
goldstoneandrew 6
Top URLs in Tweet in Entire Graph Entire Graph Count
http://leo.hypotheses.org/9506 26
http://bit.ly/RyrPvA 19
http://tcp.hypotheses.org/609 19
http://tcp.hypotheses.org/programme 15
http://rtp2012.thatcamp.org/apply/ 12
http://bit.ly/w1IFmR 11
http://dhlib2012.thatcamp.org/register/ 10
http://goo.gl/qJ185 10
http://dhlib2012.thatcamp.org/ 8
http://bit.ly/RNHLKO 8
Top Hashtags in Tweet in Entire Graph Entire Graph Count
thatcamp 137
mla13 27
dh 22
tcp2012 17
thatcampsocal 12
dhlib2012 9
unconferences 7
thatcamptheory 6
digitalhumanities 6
tcny2012 6

And now the visualization. You can download the zoomable pdf here.

As I look at the modularity in this graph, at first blush, you can see quite a North America / European divide, with various satellite outposts. This could be of course because there’s a THATCamp Paris coming down the pipe (lots of French in the tweets).

Artefact Networks Analysis – Thinking Out Loud

What follows is a process of me thinking out loud whilst working with data related to the Tiber Valley Brick Industry.

If we look at the distribution of stamped bricks across the Tiber Valley (as collected by the BSR), we can work out some of the implications for production and consumption by considering the way location, stamp type, and fabric of the brick intersect. I looked at every combination for every site in the South Etruria Survey, having done archaeometric analysis on the same collection to the point where I could say that ‘these two bricks share the same fabric’.

Here is the table (where 1 implies same, 0 implies different):

Combination Findspot Stamp Fabric
A 1 1 1
B 1 1 0
C 1 0 0
D 1 0 1
E 0 0 0
F 0 0 1
G 0 1 1
H 0 1 0

Which we may collapse, as some of these become functionally the same.
1. A&G: imply a common origin and distribution
2. B&H: imply a geographically dispersed production
3. D&F: imply a single clay source exploited by different figlinae.

Finally, there is situation ‘C’, which suggests that builders at a site had access to a variety of sources.

Since we can closely date many of the bricks, we can look at how these modes play out over time (remembering that this is a tally of relationships):

Julio-Claudian Flavian Nerva-Hadrian Antoninus Pius-Commodus Severan Late Total
Mode 1 (A/G) 10 2 2 6 2 22
Mode 2 (B/H) 54 14 4 4 6 82
Mode 3 (D/F) 13 13 7 7 2 42
Combination C (“Consumer Choice”) 17 7 6 2 1 2 35
Total 94 36 19 19 11 2 181

We can graph that:

After Fig. 4.2, Graham 2006 Ex Figlinis

Thus, by simply categorizing the relationships and graphing their evolution over time, we develop a complex view of the changing dynamics of production and consumption in the Tiber Valley.

However, we can also represent these relationships as a multi-modal network graph. The nodes in this graph are bricks and locations. Some of those locations, like the findspots, are known in space. Other locations, like the precise x,y, of the clay bodies is unknown or can be suggested to be roughly in a particular area. I create a directed graph where clay sources are ‘source’ and bricks are ‘target’, and where bricks are ‘source’ to findspots. I also categorize the linkages according to the relationships determined above. The result is below (Fig 1), where colours = communities and size = betweeness:

Fig 1. All relationships in evidence in Tiber Valley Roman Bricks

all relationships in tv-bricks  (zoomable pdf of the png image)

One can then filter this graph, looking for instance for relationships of type B (Fig 2). What I would really like is to be able to lay these graphs out so that my findspots and claysources can be pinned to geographic space, while allowing the bricks themselves and their relationships to float freely. This would require a layout plugin that understood how to apply a different layout algorithm to nodes with null latitude/longitude values. That at the moment is rather beyond me.

Fig 2. Type B relationships

Finally, one can collapse this graph into its component communities, and look at those relationships. In this PDF you can zoom in and see not just the bricks, but also the kinds of relationships, and the clay body that seems to go with this group (B6, somewhere in the neiborhood of Fiano Romano-ish).

Fig 3. Community 1

Here we collapse the communities into single nodes, and run betweeness on this graph (fig 4). We end up with community 6 being ‘most between’; this community seems to source its clay from somewhere in the upper Sabina in the Forum Novum area (see chapter 3 of Ex Figlinis). Next most between is community 1, which sources in South Etruria in the Fiano Romano area and in the lower Sabina. Then we have community 8, which is pulling from either side of the Tiber in the lower reaches of South Etruria and the Sabina.

Community 6 has within it two sites and six bricks -TVP3592 and TVP2304; bricks se18, se13, se116, se104, se156, and se152. These bricks are first century bricks from the Tonneiana/Viccianae, PL (which might mean ‘Portus Licini’), someone named CASPR and a Q. Sulpicius Sabinus. What does ‘betweeness’ mean in this context here? Obviously, a question with a lot riding on it, but off the top of my head, if the association of PL with Portus Licini, which was a known brick warehouse is suggestive.

Community 1 is a much more complicated group with six sites and 15 brickstamps. Figlinae mentioned are Tonneina/Viccianae again, some Domitianae Veteres, an explicit Portus Licini stamp, a few tied to the Domitiana family, some late ‘officinae’ stamps (including ‘Boconiana’, which implicitly suggests the town of Bocignano).

There is directionality implicit in this diagram, but I’m not entirely sure yet what to make of it.

Fig 4. Communities in Tiber Valley Brick