Interview by Ben Meredith, for his article on procedurally generated archaeology sims

I was interviewed by Ben Meredith on procedurally generated game worlds and their affinities with archaeology, for Kill Screen Magazine. The piece was published this morning. It’s a good read, and a thoughtful take on one of the more interesting recent developments in gaming. I asked Ben if I could post the unedited exchange we had, from which he drew for his article. He said ‘yes!’, so here it is.

Hi Ben,

It seems to me that archaeology and video games share a number of affinities, not least because both are procedurally generated. There is a method for field archaeology; follow the method, and you will have correctly excavated the site/surveyed the landscape/recorded the standing remains/etc. These procedures contain within them various ways of looking at the world, and emphasize certain kinds of values over others, which is why it is possible to have a Marxist archaeology, or a gendered archaeology, and so on. Thus, it also seems obvious to me that you can have an archaeology within video games (not to be confused with media archaeology, or an archaeology of video games). A great example of this kind of work is Andrew Reinhard’s exploration of the beta of Elder Scrolls Online; you should touch base with him, too. http://archaeogaming.wordpress.com/2014/01/22/beta-testing-archaeology-in-elder-scrolls-online-taken-down/

On to your questions!

What motivated you to become an archaeologist?

Romance, mystery, allure, the ‘other’, the desire to travel… my initial impetus for getting into archaeology comes from the fact that I’m ‘from the bush’ in rural Canada and as a teenager I wanted so much more from the world. I now recognize that there’s some amazing archaeology in my own backyard (as it were) but I was too young and immature to recognize it then. The Greek Bronze Age, the Mycenaean heroes, the Minoans, Thera… all these captured my imagination. And there was no snow!

Personally, what single facet of archaeology captures the spirit of the field most effectively?

Check out the work of Colleen Morgan (http://middlesavagery.wordpress.com/2014/03/05/stop-saying-archaeology-is-actually-boring/), Sophie Hay (http://pompei79.wordpress.com/2014/03/05/scratching-the-surface/) and Lorna Richardson (http://digipubarch.org/2014/03/14/all-the-swears-for-this/). If there is a ‘spirit of the field’, I think these three scholars capture it admirably. They are curious, reflective, and aware of the impact that the doing of archaeology has in the wider world. Archaeology produces powerful narratives, powerful ways of framing our current situation regarding the past and the present. I aspire to be more like these three remarkable women.

Which game do you think, so far, best achieves this?

A hard question to answer. But I think I’d go with Minecraft, for its community, and especially for its adoption in educational circles and the way it requires the player to build and engage with the environments created. The world is what you make it, in Minecraft. So too in archaeology.

If a game attempted to procedurally generate ancient civilizations, what do you think would be the three most important elements that had to be generated?

I’ve done a lot of agent-based simulation (http://www.graeworks.net/category/simulations/). Such a game would have to be built on an agent-based framework for the NPCs. Each NPC would have to be unique. The rules of behaviour that describe how the NPCs interact with each other, the environment, and the player would have to accurately capture the target ancient civilization. You can’t just have an ‘ancient civilization’; you’ll have to consider one very particular culture in one very particular time and place. That’s what a procedural rhetoric is all about: an argument in code about how this aspect of the world worked/is/existed.
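To make ‘agent-based framework’ a little more concrete, here is a toy sketch in Python. It is not a model of any real culture: the NPC class, its generosity/wealth/trust traits and the gift-giving rule are all hypothetical placeholders for the culturally specific rules a real game would need. What it shows is only the shape of the thing: every agent draws its own traits, and a behaviour rule governs each encounter.

```python
import random
from dataclasses import dataclass, field


@dataclass
class NPC:
    """A hypothetical non-player character; the traits are placeholders."""
    name: str
    generosity: float  # 0..1, drawn separately for each agent so no two are alike
    wealth: int
    trust: dict = field(default_factory=dict)  # other NPC's name -> trust score

    def interact(self, other: "NPC", rng: random.Random) -> None:
        # Placeholder behaviour rule: a generous agent may give one unit of
        # wealth away, and the recipient comes to trust the giver a little more.
        if self.wealth > 0 and rng.random() < self.generosity:
            self.wealth -= 1
            other.wealth += 1
            other.trust[self.name] = other.trust.get(self.name, 0) + 1


def make_population(n: int, seed: int = 0) -> list:
    """Create n NPCs, each with individually drawn traits."""
    rng = random.Random(seed)
    return [NPC(f"npc{i}", rng.random(), rng.randint(1, 10)) for i in range(n)]


def step(population: list, rng: random.Random) -> None:
    """One tick of the simulation: every agent meets one other agent at random."""
    for agent in population:
        partner = rng.choice([p for p in population if p is not agent])
        agent.interact(partner, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    people = make_population(5)
    for _ in range(10):
        step(people, rng)
    for p in people:
        print(p.name, round(p.generosity, 2), p.wealth, p.trust)
```

Swapping the gift rule for documented patterns of exchange, kinship or labour in one specific time and place is where the real (and hard) work would lie.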

Would investigation play an integral part in a video game interpretation?

I’m not sure I follow. Procedural generation on its own is still meaningless; it would have to be interpreted. The act of playing the game (see Roger Travis’s work on practicomimetics at http://playthepast.org) sings it into existence.

Conversely, would stumbling blindly upon a ruin diminish the effect for you?

If the world is procedurally generated, then there would be clues in the landscape that would attune the attentive player to the presence of the past in that location. If there is no rhyme or reason (if we stumble blindly), then the procedures do not describe an ancient (or any) civilization.

Do you think an archaeology simulator would be best implemented in first person (e.g. Minecraft) or third person (e.g. Terraria)? Would it be more important to convey an intimate atmosphere or impressive scale?

I like first person, but on a screen, first person can induce nausea in the player. Maybe with an Oculus Rift that’s not a concern, in which case I’d say go first person! On a screen, I think third person is better. Why not go AR and put your procedurally generated civilization into the local landscape?

Archeology versus Archaeology versus #Blogarch

I’m working on a paper that maps the archaeological blogosphere. I thought this morning it might be good to take a quick detour into the Twitterverse.

Behold!


‘archaeology’ on twitter, april 7 2014

‘archaeology’ on twitter

Here we have every Twitter username, connected by referring to each other in a tweet. There’s a seriously strong spine of tweeting, but it doesn’t make for a unified graph. The folks holding this clump together, as measured by betweenness centrality:

pompeiiapp
arqueologiabcn
herculaneumapp
romanheritage
openaccessarch
cmount1
groovyhistorian
lornarichardson
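Betweenness centrality counts how often an account sits on the shortest paths between other accounts, which is why the high scorers are the ones keeping the clump together. The post only says accounts are ‘connected by referring to each other in a tweet’, so this is a minimal pure-Python sketch under my own assumptions: tweets arrive as hypothetical (author, text) pairs, an author is joined to every @-mentioned account by an undirected edge, and nodes are scored with Brandes’ algorithm (unnormalised).

```python
import re
from collections import defaultdict, deque

MENTION = re.compile(r"@(\w+)")


def mention_graph(tweets):
    """Undirected graph (dict of sets) linking each author to the accounts they @-mention."""
    graph = defaultdict(set)
    for author, text in tweets:
        author = author.lower()
        graph.setdefault(author, set())  # keep authors who mention no one
        for name in MENTION.findall(text):
            name = name.lower()
            if name != author:
                graph[author].add(name)
                graph[name].add(author)
    return dict(graph)


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised scores)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's dependency.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each end.
    return {v: c / 2 for v, c in score.items()}


if __name__ == "__main__":
    tweets = [("alice", "Great dig report @bob"), ("bob", "@carol @dave have you seen this?")]
    scores = betweenness(mention_graph(tweets))
    for name in sorted(scores, key=scores.get, reverse=True):
        print(name, scores[name])
```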

Top replied-to:
hotrodngold
raymondsnoddy
colesprouse
1014retold
janell_elise
yorksarch
holleyalex
bonesbehaviours
uclu
illustreets

Top URLs:

http://bit.ly/1husSFB

http://phy.so/316076983

http://bit.ly/1sqHFu0

http://beasiswaindo.com/1796

https://www.dur.ac.uk/archaeology/conferences/current/babao2014/

http://wanderinggypsyvoyager.blogspot.com/2014/04/archaeology-two-day-search.html?spref=tw

http://www.thisiscolossal.com/2014/04/aerial-archaeology/

http://news.sciencemag.org/archaeology/2014/04/did-europeans-get-fat-neandertals

http://www.smartsurvey.co.uk/s/HadriansWall

http://ift.tt/PWRYrf

Top hashtags:
archaeology 325
Pompeii 90
fresco 90
Archaeology 77
Herculaneum 40
Israel 24
nowplaying 20
roman 18
newslocker 16
Roman 14
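Note that ‘archaeology’ and ‘Archaeology’ (and ‘roman’ and ‘Roman’) are tallied separately above, which suggests the count was case-sensitive. A small sketch of the difference, assuming a hashtag is simply ‘#’ followed by word characters (a simplification of Twitter’s actual rules):

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(texts, fold_case=False):
    """Count hashtags across tweets; fold_case=True merges #Roman and #roman."""
    counts = Counter()
    for text in texts:
        for tag in HASHTAG.findall(text):
            counts[tag.lower() if fold_case else tag] += 1
    return counts


if __name__ == "__main__":
    tweets = ["New #fresco found at #Pompeii #archaeology",
              "#Archaeology in the news", "#roman coins, #Roman roads"]
    print(hashtag_counts(tweets).most_common(3))
    print(hashtag_counts(tweets, fold_case=True).most_common(3))
```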

Archeology

Let’s look at American archeology, as signified by the dropped ‘a’.


‘archeology’ on twitter, april 7

An awful lot more fragmented – less popular consciousness of archaeology-as-a-community?
Top by betweenness centrality (the ones who hold this together):
illumynous
archeologynow
youtube
heritagedaily
algenpfleger
riosallier
david328124
ogurek3
gold248131
leafenthusiast

Top URLs:

http://ift.tt/1hN75Lp

http://wp.me/p4jAM9-1cZ

http://fav.me/d7d95kp

http://bit.ly/1qdaHLD

http://newszap.com

http://www.valencia953fm.com.ve

http://bit.ly/PS6hg4

http://goo.gl/fb/MfmNZ

http://goo.gl/fb/IfRnh

Top hashtags:
archeology
history
rome
ancient
easterisland
mystery
easter
slave
esoteric
egypt

Top replied-to:
atheistlauren
nofaith313
faraishah
sebpatrick
swbts
thebiblestrue
animal
christofpierson
simba_83
andystacey

#Blogarch on twitter


twitter search ‘#blogarch’ april 7 2014

And now, the archaeologists themselves, as indicated by #blogarch.

We talk to ourselves, but given the nature of the hashtag, I suppose that’s to be expected.

Top by betweenness centrality:
openaccessarch
drspacejunk
bonesdonotlie
archeowebby
drkillgrove
fieldofwork
archaeo_girl
brennawalks
ejarchaeology
yagumboya

Top URLs:

http://zoharesque.blogspot.com/2014/03/space-age-archaeology-and-future-do-i.html?spref=tw

http://bit.ly/1gBkNin

http://campusarch.msu.edu/?p=2782

http://wp.me/p36umf-cW

http://www.poweredbyosteons.org/2014/03/blogging-bioarchaeology-where-do-we-go.html#.Uzm7zM8kJUw.twitter

http://ow.ly/3iVK4f

http://wp.me/p3Kfwu-cb

http://bit.ly/PCdEIE

http://wp.me/p1rKjz-V2

http://diggin-it-archaeology.blogspot.com/2014/04/my-future-in-blogging-archaeology.html

Top hashtags:
blogarch
BlogArch
archaeology
saa2014
SAA2014
blogging
CRMArch
newslocker
crmarch

Top replied-to:
electricarchaeo (yay me!)

Top mentioned:
drspacejunk
bonesdonotlie
fieldofwork
openaccessarch
archeowebby
jsatgra
cmount1
archaeo_girl
capmsu
drkillgrove

Put them all together now…

And now, we put them all together to get ‘archaeology’ on the twitterverse today:


‘archaeology, archeology, and #blogarch’ on twitter, april 7

Visually, it’s apparent that the #blogarch crew are the ones tying together the wider Twitter worlds of archaeology and archeology, though it’s still pretty fragmented. There are 460 folks in this graph.

Top by betweenness centrality:

openaccessarch
drspacejunk
bonesdonotlie
archeowebby
drkillgrove
fieldofwork
jamvallmitjana
archaeo_girl
brennawalks
ejarchaeology

Top URLs:

http://zoharesque.blogspot.com/2014/03/space-age-archaeology-and-future-do-i.html?spref=tw
http://bit.ly/1gBkNin
http://www.poweredbyosteons.org/2014/03/blogging-bioarchaeology-where-do-we-go.html#.Uzm7zM8kJUw.twitter
http://campusarch.msu.edu/?p=2782
http://wp.me/p4jAM9-1cZ
http://fav.me/d7d95kp
http://wp.me/p1rKjz-V2
http://diggin-it-archaeology.blogspot.com/2014/04/my-future-in-blogging-archaeology.html
http://bonesdontlie.wordpress.com/2014/04/01/the-future-of-blogging-for-bones-dont-lie/
http://soundcloud.com/vrecordings/l-side-andrezz-archeology-v

Top hashtags (not useful, given the nature of the search, right? But anyway):

blogarch
archeology
archaeology
BlogArch
history
ancient
easterisland
mystery
easter
slave

Top word pairs in the largest groups, starting with the first group:

archeology,professor 30
started,yesterday 21
yesterday,battle 21
battle,towton 21
towton,weapon 21
weapon,tests 21
tests,forensic 21
forensic,archeology 21
museum,archeology 19
blogging,archaeology 17

second group:
blogging,archaeology 13
future,blogging 12
archaeology,go 7
archaeology,future 7
archaeology,final 6
final,review 6
review,blogarch 6
hopes,dreams 6
dreams,fears 6
fears,blogging 6

third group:
space,age 6
age,archaeology 6
archaeology,future 6
future,know 6
know,going 6
saa2014,blogarch 6
going,blogarch 5
blogarch,post 3
post,future 3
future,blogging 3

fourth group:
easterisland,ancient 10
ancient,mystery 10
mystery,easter 10
easter,slave 10
slave,history 10
history,esoteric 10
esoteric,archeology 10
archeology,egypt 10
rt,illumynous 9
illumynous,easterisland 9

fifth group:
costa,rica 8
rt,archeologynow 7
archeologynow,modern 4
modern,archeology 4
archeology,researching 4
researching,dive 4
dive,bars 4
bars,costa 4
rica,costa 4
rica,star 4

(once I saw ‘bars’, I stopped. Archaeological stereotypes, maybe).
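The word pairs above look like counts of adjacent words after common stop-words are dropped, with order mattering (both ‘costa,rica’ and ‘rica,costa’ appear). I don’t know the exact tokenizer or stop-word list behind these numbers, so this is a minimal sketch under my own assumptions: lower-case everything, strip the @ and # symbols, and drop words from a tiny, illustrative stop-word list.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9_']+")  # '@' and '#' are not matched, so they are stripped
# A tiny, illustrative stop-word list; real text-analysis tools ship much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is",
             "do", "i", "i'm", "where", "my", "with", "at"}


def word_pairs(texts):
    """Count ordered pairs of adjacent words in each text, ignoring stop-words."""
    counts = Counter()
    for text in texts:
        words = [w for w in TOKEN.findall(text.lower()) if w not in STOPWORDS]
        counts.update(zip(words, words[1:]))
    return counts


if __name__ == "__main__":
    tweets = ["RT @archeologynow: Modern archeology: researching dive bars in Costa Rica"]
    for pair, n in word_pairs(tweets).most_common(5):
        print(",".join(pair), n)
```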

Top mentioned in the entire graph

illumynous 9 bonesdonotlie 8
drspacejunk 8 drkillgrove 4
bonesdonotlie 8 capmsu 4
archeologynow 7 yagumboya 3
openaccessarch 7 drspacejunk 3
macbrunson 6 archeowebby 3
swbts 6 allarchaeology 3
archeowebby 6 openaccessarch 3
algenpfleger 5 cmount1 3
youtube 5 brennawalks 2

So what does this all mean? Answers on a postcard, please…

(My network files will be on figshare.com eventually).

Quickly Extracting Data from PDFs

By ‘data’, I mean the tables. There are lots of archaeological articles out there that you’d love to compile together to do some sort of meta-study. Or perhaps you’ve gotten your hands on PDFs with tables and tables of census data. Wouldn’t it be great if you could just grab that data cleanly? Jonathan Stray has written a great synopsis of the various things you might try and has sketched out a workflow you might use. Having read that, I wanted to try ‘Tabula’, one of the options he mentioned. Tabula is open source and runs on all the major platforms. You simply download it and double-click on the icon; it runs within your browser. You load your PDF into it, and then draw bounding boxes around the tables that you want to grab. Tabula will then extract each table cleanly, allowing you to download it as a CSV or tab-separated file, or paste it directly into something else.

For instance, say you’re interested in the data that Gill and Chippindale compiled on Cycladic Figures. You can grab the pdf from JSTOR:

Material and Intellectual Consequences of Esteem for Cycladic Figures
David W. J. Gill and Christopher Chippindale
American Journal of Archaeology, Vol. 97, No. 4 (Oct., 1993), pp. 601-659
Article DOI: 10.2307/506716

Download it, and then feed it into Tabula. Let’s look at table 2.

You could just highlight this table in your PDF reader and hit Ctrl+C to copy it; when you paste that into your browser, you’d get:
Everything in a single column. For a small table, maybe that’s not such a big deal. But let’s look at what you get with Tabula. You drag the square over that same table; when you release the mouse button you get:
Much, much cleaner & faster! I say ‘faster’, because you can quickly drag the selection box around every table and hit download just the one time. Open the resulting csv file, and you have all of your tables in a useful format:
But wait, there’s more! Since you can copy directly to the clipboard, you can paste directly into a Google Drive spreadsheet (thus taking advantage of all the visualization options that Google offers) or into something like Raw from Density Design.
Tabula is a nifty little tool that you’ll probably want to keep handy.
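If you ever do end up with a single column from a copy-and-paste and Tabula isn’t to hand, a small table can be rebuilt by hand. A minimal sketch, assuming you know the number of columns and that the paste lists cells row by row with no blank cells (often not true of real PDFs, which is exactly why Tabula is worth having):

```python
import csv
import sys


def regroup(cells, ncols):
    """Rebuild rows from a flat, single-column paste of a table.

    Assumes the cells were pasted row by row and that no cell was blank;
    a single missing cell shifts every following row.
    """
    if ncols < 1 or len(cells) % ncols:
        raise ValueError(f"{len(cells)} cells do not split evenly into {ncols} columns")
    return [cells[i:i + ncols] for i in range(0, len(cells), ncols)]


if __name__ == "__main__":
    # Placeholder values, not taken from any real table.
    pasted = "Name\nValue\nalpha\n1\nbeta\n2\n"
    cells = [line.strip() for line in pasted.splitlines() if line.strip()]
    csv.writer(sys.stdout).writerows(regroup(cells, 2))
```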

Beyond the Spaghetti Monster

No, I don’t mean that spaghetti monster. I mean the one that people invoke when they wish to disparage network analysis. That particular spaghetti monster is some variant of a force-directed layout algorithm. Now, these have their place, but they sometimes obscure more than they illuminate. There are alternatives, and Elijah Meeks has been sharing some d3.js code for making interactive ‘arc diagrams’ and ‘adjacency matrices’ that highlight important patterns in network data without the monstrousness.

Elijah writes:

An arc diagram is another way of visualizing networks that doesn’t use force-directed principles. Instead, it draws the edge from one node to another as arcs above or below the nodes. Weight is indicated by edge thickness and directionality is indicated by the arc being above or below the nodes as well as with the edge getting wider at the source.
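To see the geometry Elijah describes without d3, here is a minimal Python sketch that writes a static SVG arc diagram. It is not his code: the node spacing, colour and semicircular arcs are my own assumptions, and it only encodes weight (as stroke width) and direction (as above or below the line of nodes); it does not taper edges at the source.

```python
def arc_path(src_x, tgt_x, baseline_y):
    """SVG path for a semicircular arc from the source to the target position."""
    r = abs(tgt_x - src_x) / 2
    # sweep-flag 1 draws in SVG's positive-angle direction; because SVG's y axis
    # points down, left-to-right arcs bulge above the baseline and right-to-left
    # arcs bulge below it, so direction shows up as above/below.
    return f"M {src_x} {baseline_y} A {r} {r} 0 0 1 {tgt_x} {baseline_y}"


def arc_diagram_svg(nodes, edges, spacing=40):
    """nodes: ordered list of ids; edges: (source, target, weight) triples."""
    x = {n: (i + 1) * spacing for i, n in enumerate(nodes)}
    # Leave room for the widest possible arc above and below the baseline.
    baseline = (len(nodes) - 1) * spacing / 2 + spacing
    parts = [
        f'<path d="{arc_path(x[s], x[t], baseline)}" fill="none" '
        f'stroke="steelblue" stroke-width="{w}"/>'
        for s, t, w in edges
    ]
    parts += [f'<circle cx="{x[n]}" cy="{baseline}" r="4"/>' for n in nodes]
    width = (len(nodes) + 1) * spacing
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
            f'height="{2 * baseline}">' + "".join(parts) + "</svg>")


if __name__ == "__main__":
    print(arc_diagram_svg(["a", "b", "c"], [("a", "c", 3), ("c", "b", 1)]))
```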

Over at http://bl.ocks.org/emeeks/9458332, Elijah shows us the d3 code for making such a creature. In essence, the code says to your browser, ‘there’s an edgelist, and a nodelist, and they go together like this.’ Since it’s using d3.js (data-driven documents), it loads that library up to make this process easier. If you wanted to draw one of these things for yourself, you would need to copy Elijah’s index.html code from his bl.ocks.org page, and then create two files, edgelist.csv and nodelist.csv. If you have a network in Gephi, you can export both of these from the data laboratory tab by clicking ‘export spreadsheet’.

Similarly, Elijah provides an interactive adjacency matrix at http://bl.ocks.org/emeeks/9441864

An adjacency matrix is a useful way of visualizing networks using an n-by-n grid that shows connection between nodes as a filled grid square. This adjacency matrix is directed, with the source on the y-axis and target on the x-axis. It loads the data from a node list and edge list and represents edge weight using opacity. It also highlights rows and columns on mouseover.
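The matrix itself is simple to build. A minimal sketch, assuming the edge list is a list of hypothetical (source, target, weight) triples; it follows the quoted description (sources as rows, targets as columns) and scales weights to the range 0 to 1, the sort of value you might map to cell opacity.

```python
def adjacency_matrix(nodes, edges):
    """Directed n-by-n matrix: row = source, column = target, cell = summed weight."""
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for src, tgt, weight in edges:
        matrix[index[src]][index[tgt]] += weight
    return matrix


def opacities(matrix):
    """Scale (non-negative) weights into 0..1 so the heaviest edge is fully opaque."""
    top = max((w for row in matrix for w in row), default=0.0)
    return [[w / top if top else 0.0 for w in row] for row in matrix]


if __name__ == "__main__":
    nodes = ["a", "b", "c"]
    m = adjacency_matrix(nodes, [("a", "b", 2), ("b", "c", 1), ("c", "a", 4)])
    for name, row in zip(nodes, opacities(m)):
        print(name, row)
```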

If you copy that bit of HTML into a new file, it points to the same nodelist.csv and edgelist.csv. Voila! Two very crisp and clear visualizations of the structure of your network, with very little spaghetti mess. Here is Peter Holdsworth’s network of women from 1898 Ontario as both an arc diagram and an adjacency matrix (I thank Peter for making his data public for these kinds of explorations; to get it into shape, I opened his .gexf network file in Gephi, hit ‘export spreadsheet’ on the data laboratory tab for the nodes table and then the edges table, opened the CSV files in Excel, stripped out extraneous fields, and saved them as CSV):


1898 – Women networked by virtue of shared membership in various organizations


Same again

Contrast these with the spaghetti version that was generated with Gephi (Figshare provides a preview here). The patterning is much clearer and more intuitive, I think. It’s beyond my programming prowess, but I would think it should not be overly difficult for someone to package this code as a layout plugin for Gephi.
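The Excel step in the Holdsworth example (stripping extraneous fields from Gephi’s exports before saving as CSV) can also be scripted. A minimal sketch with Python’s csv module; the column names here are assumptions, so check the headers in your own export and the names Elijah’s page actually reads before relying on it.

```python
import csv
import io


def keep_columns(src, dst, wanted):
    """Copy CSV from file-like src to dst, keeping only the columns in `wanted`."""
    reader = csv.DictReader(src)
    missing = [c for c in wanted if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in export: {missing}")
    # extrasaction="ignore" silently drops every column not listed in `wanted`.
    writer = csv.DictWriter(dst, fieldnames=wanted, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(reader)


if __name__ == "__main__":
    # A made-up export for illustration; a real one would come from open(path, newline="").
    export = io.StringIO("Source,Target,Type,Id,Label,Weight\nn1,n2,Undirected,0,,1.0\n")
    out = io.StringIO()
    keep_columns(export, out, ["Source", "Target", "Weight"])
    print(out.getvalue())
```

With real files, open both with `newline=""` (as the csv module documentation recommends) and pass the file objects in place of the StringIO buffers.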

Now, here’s the thing: you’ll need to put the HTML and the CSV files into the same folder on a server somewhere for this to work. I use WAMP for this kind of thing before moving everything onto the live interwebs. Installing WAMP is quite straightforward; it’s a one-click installer. Once you’ve got it installed and running, you simply create a subfolder inside the c:\wamp\www\ folder, i.e. \myproject\. Save your HTML and CSV files in that folder. Then, in your browser, go to localhost/myproject, click on your HTML file, and you’re good to go.
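My understanding is that the server matters because many browsers won’t let a page opened straight from disk (a file:// address) load CSV files through d3. If you’d rather not install WAMP, Python’s standard library can serve a folder too; this is a swapped-in alternative, not the WAMP route described above. From a command prompt, `python -m http.server 8000 --directory myproject` does it in one line (Python 3.7 or later); the sketch below does the same from a script, with `myproject` as a placeholder folder name.

```python
import functools
import http.server


def make_server(directory, port=8000):
    """Return an HTTP server for `directory` on localhost; call serve_forever() to run it."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    # Bind to 127.0.0.1 only, so the folder isn't exposed to the wider network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve(directory, port=8000):
    """Serve until Ctrl+C, then release the port."""
    with make_server(directory, port) as server:
        print(f"Serving {directory} at http://localhost:{port}/ (Ctrl+C to stop)")
        try:
            server.serve_forever()
        except KeyboardInterrupt:
            pass
```

Calling `serve("myproject")` and then visiting http://localhost:8000/ should list your HTML and CSV files; click the HTML file as before.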

Elijah does point out:

This may be true, but if one plays with some of the html, making the canvas bigger, some of this can be mitigated… As with most things, no one approach is going to do everything you need it to, but these two visualizations should be in your toolkit.

HIST4805b Looted Heritage: The Illicit Antiquities Trade

I’m teaching a fourth year seminar next year dealing with issues surrounding the illicit antiquities trade. This seminar will be in conjunction with a larger project spearheaded by the investigative reporter and author Jason Felch, of Chasing Aphrodite. I’m quite excited about this; as an undergraduate, I once had the opportunity to work on a term project that looked at the antiquities market. That was twenty years ago; I’ve never really had the opportunity to scratch that itch since. So, when I was asked to suggest a seminar topic, I jumped at the chance to plumb the depths of my own ignorance together with my students. What better way to teach than to be learning right along with your students?

As ever, I turned to twitter, to see what folks there had to say.

Many folks chimed in with suggestions, including:

I’m keeping all of these in a Zotero library for eventual sharing with my students (and the wider world too), but for now, this is the kind of stuff that’s come in:

Legal & Academic Frameworks

Renfrew, Colin. Loot, Legitimacy and Ownership: The Ethical Crisis in Archaeology. Duckworth, 2000.

Lazrus, Paula K., and A. Barker, eds. All the King’s Horses: Essays on the Impact of Looting and the Illicit Antiquities Trade on Our Knowledge of the Past. SAA, 2012.

Marlowe, Elizabeth. Shaky Ground: Context, Connoisseurship and the History of Roman Art. Debates in Archaeology. London: Bloomsbury Academic, 2013. http://catalogue.library.carleton.ca/record=b3486847~S9

Hoffman, Barbara T., ed. Art and Cultural Heritage: Law, Policy, and Practice. Cambridge ; New York: Cambridge University Press, 2006. http://catalogue.library.carleton.ca:80/record=b2293643~S9

Green, Penny, and S. R. M. Mackenzie, eds. Criminology and Archaeology: Studies in Looted Antiquities. Oñati International Series in Law and Society. Oxford ; Portland, Or: Hart Publishing, 2009. http://catalogue.library.carleton.ca:80/record=b2609135~S9

RealTime Delphi Study on the Future of Cultural Heritage Research http://www.jpi-culturalheritage.eu/wp-content/uploads/JPI-Cultural-Heritage-RealTime-Delphi-Report-final-version-to-be-published.pdf

Campbell, Peter B. ‘The Illicit Antiquities Trade as a Transnational Criminal Network: Characterizing and Anticipating Trafficking of Cultural Heritage’. International Journal of Cultural Property 20, no. 02 (2013): 113–153. doi:10.1017/S0940739113000015.

World War II

Nicholas, Lynn H. The Rape of Europa: The Fate of Europe’s Treasures in the Third Reich and the Second World War. 1st ed. New York: Knopf, 1994. http://catalogue.library.carleton.ca/record=b1456118~S9

Edsel, Robert M, and Bret Witter. The Monuments Men: Allied Heroes, Nazi Thieves and the Greatest Treasure Hunt in History. New York: Center Street / Hachette Book Group, 2010.

Edsel, Robert M. Saving Italy: The Race to Rescue a Nation’s Treasures from the Nazis. 1st ed. New York: W. W. Norton & Company, 2013. http://catalogue.library.carleton.ca/record=b3445170~S9

Current State

Felch, Jason, and Ralph Frammolino. Chasing Aphrodite: The Hunt for Looted Antiquities at the World’s Richest Museum. Houghton Mifflin Harcourt, 2011.

Watson, Peter, and Cecilia Todeschini. The Medici Conspiracy: The Illicit Journey of Looted Antiquities from Italy’s Tomb Raiders to the World’s Greatest Museums. PublicAffairs, 2007.

Waxman, Sharon. Loot: The Battle over the Stolen Treasures of the Ancient World. Macmillan, 2010. http://catalogue.library.carleton.ca/record=b2928026~S9

‘Trafficking Culture’. Accessed 12 March 2014. http://traffickingculture.org/.

and an entire special issue of Internet Archaeology: Issue 33 – Portable Antiquities: archaeology, collecting, metal detecting, Edited by Stuart Campbell and Suzie Thomas http://intarch.ac.uk/journal/issue33/index.html

And from Donna Yates, the exciting news that she and her collaborators at Trafficking Culture are going to write a textbook on the subject:

Assessment

In terms of assessment, I want to avoid long research essays based on secondary sources. Instead, I’d rather have the students build something, analyze something, visualize something… so this will be a heavily digital-humanities-inflected course. I want my students at the coalface. My little looted heritage social media observatory, https://heritage.crowdmap.com/, will be pulled out of mothballs and will become an active part of the course. We’ll be mining eBay, looking at the auction sites, exploring museum archives… probably. Stay tuned!

If you have suggestions for things the students should be reading/looking at/exploring, please do drop me a line or leave a comment.

Mapping the Web in Real Time

I don’t think I’ve shared my workflow for mapping the structure of a webcrawl before. After listening to Sebastian Heath speak at #dapw, it occurred to me that it might be useful for, inter alia, Linked Open Data type resources. So, here’s what you do (my example draws from this year’s SAA 2014 blogging archaeology session blog-o-sphere):

1. Install the HTTP Graph generator from the Gephi plugin marketplace.

2. Download the Navicrawler + Firefox Portable zip file at the top of this page.

3. Make sure no other instance of Firefox is open. Open Firefox Portable. DO NOT click the ‘update firefox’ button, as this will make Navicrawler unusable.

4. Navicrawler can be used to download or scrape the web. In the Navicrawler window, click on the (+) to select the ‘crawl’ pane. This will let you set how deep and how far to crawl. Under the ‘file’ tab, you can save everything you crawl in various file formats. With the HTTP Graph plugin for Gephi, however, we will simply ‘listen’ to the browser and render the graph in real time.

5. The first time you run Firefox Portable, you will need to configure a manual proxy. Do this by going to tools >> options >> network >> settings. Set the manual proxy configuration for http to 127.0.0.1 and the port to 8088. Click ‘ok’.

If you tried loading a webpage at this point, you’d get an error. To resolve this, you need to tell Gephi to connect to that port as well, and then web traffic will be routed correctly.

6. Open Gephi. Select new project. Under ‘generate’, select ‘http graph’. This will open a dialogue box asking for the port number. Enter 8088.

7. Over in Firefox Portable, you can now start a web search or go to the page from which you wish to crawl. For instance, you could put http://dougsarchaeology.wordpress.com/2013/11/05/blogging-archaeology/ in the address bar. Over in Gephi, you will start to see a number of nodes and edges appearing. In the ‘crawl’ window in Navicrawler, set ‘max depth’ to 1, ‘crawl distance’ to 2 and ‘tabs count’ to 25. Then hit the ‘start’ button. Your Gephi window will now begin to fill with the structure of the internet. There are 4 types of nodes: client, uri, host, and domain. For our purposes here, we will want to filter the resulting graph to hide most of the architecture of the web and just show the URIs. (This, by the way, could be very useful for visualizing archaeological resources organized via Linked Open Data principles.)

Your crawl can run for quite some time. I was running the crawl described above for around 10 minutes when it crashed on me. The resulting Gephi file (which has 5374 nodes and 14993 edges) can be downloaded from my space on figshare. For the illustration below, I filtered the ‘content-type’ for ‘text/html’, to present the structure of the human-readable archaeo-blog-o-sphere as represented by Doug’s Blogging Archaeology Carnival.

The view from Doug’s place
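In Gephi itself the filtering is done with its filters. To make clear what that filter does, here is the same idea as a small Python function over an exported node and edge list. The attribute names (‘type’, ‘content-type’) are assumptions modelled on the description above, not the HTTP Graph plugin’s documented output; I use startswith because a content type often carries a suffix such as ‘; charset=UTF-8’.

```python
def html_pages_only(nodes, edges):
    """Keep URI nodes served as text/html, and only the edges between them.

    nodes: dict of node id -> attribute dict (hypothetical keys 'type' and
           'content-type'); edges: iterable of (source, target) pairs.
    """
    keep = {
        node for node, attrs in nodes.items()
        if attrs.get("type") == "uri"
        and attrs.get("content-type", "").startswith("text/html")
    }
    return keep, [(s, t) for s, t in edges if s in keep and t in keep]


if __name__ == "__main__":
    nodes = {
        "http://example.org/a": {"type": "uri", "content-type": "text/html; charset=UTF-8"},
        "http://example.org/b": {"type": "uri", "content-type": "text/html"},
        "http://example.org/logo.png": {"type": "uri", "content-type": "image/png"},
        "example.org": {"type": "host"},
    }
    edges = [("http://example.org/a", "http://example.org/b"),
             ("http://example.org/a", "http://example.org/logo.png"),
             ("example.org", "http://example.org/a")]
    kept, kept_edges = html_pages_only(nodes, edges)
    print(sorted(kept), kept_edges)
```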

Shared Authority & the Return of the Human Curated Web

A few years ago, I wrote a piece on Why Academic Blogging Matters: A structural argument. This was the text for a presentation as part of the SAA in Sacramento that year. In the years since, the web has changed (again). It is no longer enough for us to create strong signals in the noise, trusting in the algorithms to connect us with our desired publics. (That’s the short version. The long version is rather more nuanced and sophisticated, trust me.)

The war between the botnets and the SEO specialists has outstripped us.

In recent months, I have noticed an upsurge of new ‘followers’ on this blog with emails and handles that really do not seem to be those of actual humans. Similarly, on Twitter, I find odd tweets directed at me filled with gibberish web addresses (which I dare not touch). Digital Humanities Now recently highlighted an interesting post that explains what’s going on, discusses this ‘war’, and, in the way it came to my attention, points the way forward for the humanistic use of the web.

In ‘Crowd-Frauding: Why the Internet is Fake’, Eric Hellman discusses a new avenue for power (assuming that power ‘derives from the ability to get people to act together’): in this case, ‘cooperative traffic generation’, or software-organized crime. Hellman was finding a surge of fake users on his site, and he began to investigate why. It turns out that if you want to promote your website and jack up its traffic, you can install a program that manufactures fake visitors to your site, who click around, click on adverts, register… and in turn your computer does the same for other users of the software. Money is involved.

“In short, your computer has become part of a botnet. You get paid for your participation with web traffic. What you thought was something innocuous to increase your Alexa ranking has turned you into a foot-soldier in a software-organized crime syndicate. If you forgot to run it in a sandbox, you might be running other programs as well. And who knows what else.

The thing that makes cooperative traffic generation so difficult to detect is that the advertising is really being advertised. The only problem for advertisers is that they’re paying to be advertised to robots, and robots do everything except buy stuff. The internet ad networks work hard to battle this sort of click fraud, but they have incentives to do a middling job of it. Ad networks get a cut of those ad dollars, after all.

The crowd wants to make money and organizes via the internet to shake down the merchants who think they’re sponsoring content. Turns out, content isn’t king, content is cattle.”

Hellman goes on to describe how the arms race, the red queen effect, between these botnets and advertising models that depend on clickrates etc will push those of us without the computing resources to fight in these battles into the arms of the Googles, the Amazons, the Facebooks: and their power will increase correspondingly.

“So with the crowd-frauders attacking advertising, the small advertiser will shy away from most publishers except for the least evil ones- Google or maybe Facebook. Ad networks will become less and less efficient because of the expense of dealing with click-fraud. The rest of the the internet will become fake as collateral damage. Do you think you know how many users you have? Think again, because half of them are already robots, soon it will be 90%. Do you think you know how much visitors you have? Sorry, 60% of it is already robots.”

I sometimes try explaining around the department here that when we use the internet, we’re not using a tool; we’re sharing authority with countless engineers, companies, criminals, folks-in-their-parents’-basement, ordinary folks, students, and algorithms whose interactions with other algorithms can lead to rather unintended outcomes. We can’t naively rely on the goodwill of the search engine to help us get our stuff out there. This, I think, is an opportunity for a return of the human curated web. No, I don’t mean building directories and indices. I mean a kind of supervised learning algorithm (as it were).

Digital Humanities Now provides one such model (and there are of course others, such as Reddit). A combination of algorithm and human editorial oversight, DHNow is a cybernetic attempt to bring to the surface the best of the week’s digital humanities work, wherever on the net it may reside. We should have the same in archaeology. An Archaeology Now! The infrastructure is already there. PressForward, the outfit from the RRCHNM, has developed a workflow for folding volunteer editors into the weekly task of separating the wheat from the chaff, using a custom-built plugin for WordPress. Ages ago we talked about a quarterly journal where people would nominate their own posts and we would spider the web looking for these nominations, but the technology wasn’t really there at that time (and perhaps the idea came too soon). With the example of DHNow, and the emergence of this new front in botnets/SEO/clickfraud and the dangers it poses, perhaps it’s time to revisit the idea of the human-computer curated archaeoweb?

On Research Witchcraft

I’m a fan of Terry Pratchett. I re-read his novels frequently because each time, I find something new to consider. I was recently reading Lords and Ladies, which is part of the witches’ cycle of stories set in Discworld. This passage resonated:

Cottages tend to attract similar kinds of witches. It’s natural. Every witch trains up one or two young witches in their life, and when in the course of mortal time the cottage becomes vacant it’s only sense for one of them to move in.

Magrat’s cottage traditionally housed thoughtful witches who noticed things and wrote things down. Which herbs were better than others for headaches, fragments of old stories, odds and ends like that.

[...]It was a cottage of questioning witches, research witches. Eye of what newt? What species of ravined salt-sea shark? It’s all very well a potion calling for Love-in-idleness, but which of the thirty-seven common plants called by that name in various parts of the continent was actually meant?

The reason that Granny Weatherwax was a better witch than Magrat was that she knew that in witchcraft it didn’t matter a damn which one it was, or even if it was a piece of grass.

The reason that Magrat was a better doctor than Granny was that she thought it did.

Take a look at any GitHub repository and examine its readme. Strikes me, there’s a lot of the witches about these code repositories. The parallel isn’t perfect, but I feel rather like poor Magrat. For instance (and taken at random*):

Install PostgreSQL.

Install a Java Development Kit.

Install Git.

git clone https://github.com/overview/overview-server.git

Which development kit? What version? How many flavours of PostgreSQL are there? What do I do with that? As I fumble towards dim understanding, I figure the folks who are building these things are more like Granny, and understand that any will do the trick, because they know what to expect and how to fix it if it goes wrong. Me, I need the right version the first time, because otherwise I’ll just make a hash of it, and I’ll have to teach it to someone! (Although I can git clone from git bash with the best of ’em, now!)
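One small way to write down the "which version?" knowledge, Magrat-style, is to record what is actually installed before following a README. The sketch below is my own illustration, not part of the Overview instructions: it assumes `psql`, `javac` and `git` are the relevant command-line tools and that each prints a dotted version number when asked (older `javac` releases print to stderr, so both streams are read).

```python
import re
import shutil
import subprocess

VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")

def parse_version(text):
    """Pull the first dotted version number out of a tool's output."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))

# Assumed commands for the three prerequisites in the README above.
TOOLS = {
    "PostgreSQL": ["psql", "--version"],
    "Java Development Kit": ["javac", "-version"],
    "Git": ["git", "--version"],
}

def report():
    for name, cmd in TOOLS.items():
        if shutil.which(cmd[0]) is None:
            print(f"{name}: not found on PATH")
            continue
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Read both streams: some javac versions report on stderr.
        print(f"{name}: {parse_version(result.stdout + result.stderr)}")

if __name__ == "__main__":
    report()
```

Pasting this report into a question to the developers answers their first question ("what versions do you have?") before they ask it.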

I don’t have the tacit knowledge of experience built up yet. There’s just so much to learn! Like Magrat, I can write it all down, spell it all out, and in doing so, I’ll eventually become like Granny, where it just flows.

I look forward to that day. But for now, I’ll keep engaging in my research witchcraft, figuring out the bits and bobs that those far more clever than me have devised, and reporting back what I’ve found.

*Well, not totally at random. It comes from the Overview Project, who have taken pity on me (and others!) and have worked very hard indeed to simplify setting up a development environment for their text analysis server, ‘Overview’. Thank you Jonathan and Adam! I’m learning a lot from chatting with these guys as they shepherd me through the process. I’ll be posting on that process soon, pointing out some of the tacit bits I found I had to uncover in order to make it work. Their platform, conceived for journalists, should also migrate its way into history & archaeology, as I think we’ll find it very useful!

Getting started with some open source alternatives to 123D Catch

I like 123D Catch, but there is the whiff of ‘black-box’ about it all. Sometimes, you’d just like to know what’s going on. There may also be times when, for various reasons, uploading data to a cloud service hosted in another country is just not the right solution for you.

There are many open source alternatives, though; right now I’m playing with VisualSFM. Download and install it, then download CMVS and extract the zip. Within it you will find folders for various operating systems. Find yours, and copy the files within it to the VisualSFM folder.
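If you find yourself repeating that copy step (new machine, new VisualSFM version), it can be scripted. This is a minimal sketch assuming you have already extracted CMVS and know which operating-system folder is yours; the paths in the usage comment are placeholders, not the real folder names in the CMVS zip.

```python
import shutil
from pathlib import Path

def copy_binaries(os_folder, visualsfm_folder):
    """Copy every file in your OS's extracted CMVS folder into the VisualSFM folder.

    Only files at the top level are copied (subfolders are skipped).
    Returns the names of the files copied, in sorted order.
    """
    src = Path(os_folder)
    dest = Path(visualsfm_folder)
    copied = []
    for f in sorted(src.iterdir()):
        if f.is_file():
            shutil.copy2(f, dest / f.name)  # copy2 also keeps timestamps
            copied.append(f.name)
    return copied

# Placeholder paths; substitute wherever you extracted things:
# copy_binaries(r"C:\Downloads\CMVS\<your-os-folder>", r"C:\VisualSFM")
```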

Now you’re ready to go, as per the image below. Here’s a longer tutorial too.

You might, however, find it easier to use this bundle of all the bits and pieces you need, if you are familiar with Python. Extract the zip, grab the folder that corresponds to your operating system, and move it to C:\. Install Python (I’m using Python 2.7). Then open the command prompt (type ‘cmd’ in the ‘run’ box, on Windows) and navigate to the folder. On my machine it’s now in C:\bundler, so I had to type:

C:\users\Shawn Graham> cd ..

C:\users> cd ..

C:\> cd bundler

C:\Bundler>RunBundler.py --photos=C:\MyPhotoFolder\
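The `cd` steps and the final command can also be driven from a script, which is handy if you run the bundle over many photo folders. A minimal sketch, assuming the bundle lives in `C:\bundler` as above: passing `cwd=` to `subprocess.run` does the same job as the `cd` commands. Note that this wrapper is written for Python 3, while the bundle itself was run under Python 2.7 here, so you may need to pass the path to your 2.7 interpreter as `python`.

```python
import subprocess
import sys
from pathlib import Path

def bundler_command(photo_folder, python=sys.executable):
    """Build the same command typed at the prompt above."""
    return [python, "RunBundler.py", f"--photos={photo_folder}"]

def run_bundler(bundler_folder, photo_folder, python=sys.executable):
    # cwd= replaces the 'cd' steps: the script runs from inside the bundler folder.
    # check=True raises CalledProcessError if RunBundler.py exits with an error.
    return subprocess.run(
        bundler_command(photo_folder, python),
        cwd=Path(bundler_folder),
        check=True,
    )

# Example (paths from my machine; the interpreter path is a placeholder):
# run_bundler(r"C:\bundler", r"C:\MyPhotoFolder", python=r"C:\Python27\python.exe")
```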

…and the magic begins to happen. I got an error at first: ‘blah blah blah PIL missing blah blah’. PIL stands for Python Imaging Library. Go here, grab the correct version, download it, and double-click to install. Then try again with the RunBundler.py command above.
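To avoid meeting that PIL error halfway through a run, you can check for the library first. This sketch just asks Python whether a package named `PIL` can be found; the maintained fork, Pillow, installs under that same name, so either one satisfies the check. It is written for Python 3 (`importlib.util.find_spec` is not available in 2.7).

```python
import importlib.util

def pil_available():
    """True if a package named 'PIL' (PIL itself, or Pillow) can be imported."""
    return importlib.util.find_spec("PIL") is not None

if not pil_available():
    print("PIL not found: install it (or Pillow) before running RunBundler.py")
```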

So that’s running now on my machine; I’ll update here if it all goes wrong – or if indeed it all goes right!

Putting Pompeii on Your Coffee Table

(cross-posted from my course blog, #hist5702x digital/public history. If you’re interested in public history and augmented reality, check out my students’ posts!)

Creating three dimensional models from photographs has its ups and downs. But what if we could do it from video? I decided to find out.

First, I found this tourist’s film of a house at Pompeii (the House of the Tragic Poet, he says):

I saved a copy of the film locally; there are a variety of ways of doing this, and two seconds with Google will show you how. I then watched it carefully, noted a sequence of clearly lit pans at various points, and marked down when each started and stopped, in seconds.
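Those start and stop times translate directly into frame numbers once you know the video’s frame rate. A minimal sketch, assuming a constant frame rate; the 30 fps and the pan timings in the example are hypothetical, so check your own clip’s frame rate in your player.

```python
def frames_for_segments(segments, fps, every_n=1):
    """Turn (start, stop) times in seconds into frame indices to extract.

    segments: list of (start_s, stop_s) pairs noted while watching the clip.
    fps: the video's frame rate (assumed constant).
    every_n: keep one frame in every n within each segment.
    """
    frames = []
    for start, stop in segments:
        first = int(round(start * fps))
        last = int(round(stop * fps))
        frames.extend(range(first, last + 1, every_n))  # stop frame included
    return frames

pans = [(12.0, 14.0), (40.5, 41.0)]  # hypothetical pan timings
print(len(frames_for_segments(pans, fps=30, every_n=5)))  # → 17
```

Keeping every fifth frame or so thins out near-duplicate stills, which photogrammetry software gains little from.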


Then I searched for a way to extract still images from that clip. This blog post describes a command-line option using VLC (option 3). I went with that, which created around 600 images. I then batch converted them from png to jpg (Google around again; the solution I found from download.com was filled with extraneous crapware that cost me 30 minutes to delete).
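One crapware-free route for the png-to-jpg step is a few lines of Python with the Pillow library (`pip install Pillow`). This is a sketch of an alternative, not what I actually used; the folder name in the usage comment is hypothetical. JPEG has no transparency, so each image is flattened to RGB before saving.

```python
from pathlib import Path

def jpg_path_for(png_path):
    """Same folder and file name, with a .jpg extension."""
    return Path(png_path).with_suffix(".jpg")

def convert_folder(folder, quality=95):
    """Convert every .png in a folder to .jpg alongside it, using Pillow."""
    from PIL import Image  # imported here so jpg_path_for works without Pillow
    converted = []
    for png in sorted(Path(folder).glob("*.png")):
        with Image.open(png) as im:
            # JPEG can't store an alpha channel, so convert to plain RGB first.
            im.convert("RGB").save(jpg_path_for(png), "JPEG", quality=quality)
        converted.append(png.name)
    return converted

# convert_folder(r"C:\frames")  # hypothetical folder of VLC snapshots
```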

I then selected around 40 images that seemed to cover things well. It would’ve been better if the cameraman had moved around rather than panned, as that would’ve provided better viewpoints (I’ll search for a better video clip). I stitched these together using 123D Catch. 123D Catch is all well and good, but it is quite black-box. I have the Python Photogrammetry Toolbox on my other computer, so I’ll try doing it again on that machine; with PPT I can perhaps achieve better results.
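I picked my 40 frames by eye. If you want a mechanical starting point to prune from instead, spacing the picks evenly through the sequence is a simple one; this sketch does that and is my own illustration, not a feature of 123D Catch. Even spacing ignores blur and lighting, so you would still want to eyeball the result.

```python
def pick_evenly(items, k):
    """Return k items spread evenly across a list, keeping the first and last."""
    n = len(items)
    if k >= n:
        return list(items)
    if k == 1:
        return [items[0]]
    step = (n - 1) / (k - 1)
    return [items[round(i * step)] for i in range(k)]

# e.g. 40 of ~600 extracted frames:
# chosen = pick_evenly(sorted(frame_paths), 40)
```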

The resulting model from 123D Catch shows the inside of the atrium far better than I expected (and again, a better starting film would probably give better results). I exported the model’s .obj, .mtl, and jpg texture files to my computer, and then uploaded them to augmentedev.com.
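Those three kinds of file depend on each other: the .obj names its material file on an `mtllib` line, and the .mtl names its texture images on lines such as `map_Kd`. A model that loads without its texture usually means one of those files didn’t come along. This sketch checks for that before uploading; it is simplified, assuming one file name per line and no texture options (like `-s`) before the name.

```python
from pathlib import Path

def referenced_files(text, keywords):
    """Return the file names that follow the given keywords in OBJ/MTL text."""
    names = []
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0] in keywords:
            names.append(parts[1].strip())
    return names

def missing_files(obj_path):
    """List material and texture files the model refers to that aren't beside it."""
    obj_path = Path(obj_path)
    folder = obj_path.parent
    missing = []
    for mtl in referenced_files(obj_path.read_text(), {"mtllib"}):
        mtl_path = folder / mtl
        if not mtl_path.exists():
            missing.append(mtl)
            continue
        textures = referenced_files(mtl_path.read_text(), {"map_Kd", "map_Ka"})
        missing.extend(t for t in textures if not (folder / t).exists())
    return missing

# print(missing_files("atrium.obj"))  # hypothetical file name; [] means ready to upload
```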

The result? A Pompeian house, on my desktop!

The Atrium of the House of the Tragic Poet, Pompeii-on-the-Rideau

Now imagine *all* of the video that exists out there of Pompeii. It should be possible to create a 3D model of nearly the whole city (or at least, the parts they let tourists into) by harvesting videos from YouTube. One could then 3D print the city, export it to AR, or import it into a game engine…

As far as the #hist5702x project is concerned, we could do this in the workspace they’ve set up for us in the warehouse building, or at the airport, or from historical footage from inside a plane, or….