Quickly Extracting Data from PDFs

By ‘data’, I mean the tables. There are lots of archaeological articles out there that you’d love to compile together to do some sort of meta-study. Or perhaps you’ve gotten your hands on pdfs with tables and tables of census data. Wouldn’t it be great if you could just grab that data cleanly? Jonathan Stray has written a great synopsis of the various things you might try and has sketched out a workflow you might use. Having read that, I wanted to try ‘Tabula’, one of the options that he mentioned. Tabula is open source and runs on all the major platforms. You simply download it and double-click on the icon; it runs within your browser. You load your pdf into it, and then draw bounding boxes around the tables that you want to grab. Tabula will then extract that table cleanly, allowing you to download it as a csv or tab separated file, or paste it directly into something else.

For instance, say you’re interested in the data that Gill and Chippindale compiled on Cycladic Figures. You can grab the pdf from JSTOR:

Material and Intellectual Consequences of Esteem for Cycladic Figures
David W. J. Gill and Christopher Chippindale
American Journal of Archaeology , Vol. 97, No. 4 (Oct., 1993) , pp. 601-659
Article DOI: 10.2307/506716

Download it, and then feed it into Tabula. Let’s look at table 2.

[screenshot: table 2 from Gill and Chippindale]
You could just highlight this table in your pdf reader and hit ctrl+c to copy it; when you paste that into your browser, you’d get:
[screenshot: the same table, copied and pasted as plain text]
Everything in a single column. For a small table, maybe that’s not such a big deal. But let’s look at what you get with Tabula. You drag the square over that same table; when you release the mouse button you get:
[screenshot: Tabula’s extraction of the table]
Much, much cleaner & faster! I say ‘faster’, because you can quickly drag the selection box around every table and hit download just the one time. Open the resulting csv file, and you have all of your tables in a useful format:
[screenshot: the exported csv opened in a spreadsheet]
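Once you’ve got that csv, it drops straight into whatever tool you’d normally use for the meta-study. A minimal sketch in Python, for instance (the filename and columns here are hypothetical – use whatever Tabula gave you):

import pandas as pd

# read the table Tabula exported; the filename is made up for this example
df = pd.read_csv('gill-chippindale-table2.csv')

# quick look at the first few rows and some summary statistics
print(df.head())
print(df.describe())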
But wait, there’s more! Since you can copy directly to the clipboard, you can paste directly into a google drive spreadsheet (thus taking advantage of all the visualization options that Google offers) or into something like Raw from Density Design.
Tabula is a nifty little tool that you’ll probably want to keep handy.

Beyond the Spaghetti Monster

No, I don’t mean that spaghetti monster. I mean the one that people invoke when they wish to disparage network analysis. That particular spaghetti monster is some variant of a force-directed layout algorithm. Now, these have their place, but they sometimes obscure more than they illuminate. There are alternatives, and Elijah Meeks has been sharing some d3.js code for making interactive ‘arc diagrams’ and ‘adjacency matrices’ that highlight important patterns in network data without the monstrousness.

Elijah writes:

An arc diagram is another way of visualizing networks that doesn’t use force-directed principles. Instead, it draws the edge from one node to another as arcs above or below the nodes. Weight is indicated by edge thickness and directionality is indicated by the arc being above or below the nodes as well as with the edge getting wider at the source.

Over at http://bl.ocks.org/emeeks/9458332  Elijah shows us the d3 code for making such a creature. In essence, the code says to your browser, ‘there’s an edgelist, and a nodelist, and they go together like this.’ Since it’s using d3.js (data-driven documents), it loads that library up to make this process easier.  If you wanted to draw one of these things for yourself, you need to copy Elijah’s index.html code from his bl.ocks.org page, and then create two files, edgelist.csv and nodelist.csv.  If you have a network in Gephi, you can export both of these from the data laboratory tab by clicking ‘export spreadsheet’.
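Incidentally, if your network lives somewhere other than Gephi, it’s easy enough to generate those two files yourself. Here’s a rough sketch with Python and networkx – note that the column headers below are my assumption; check Elijah’s index.html for the exact names his code expects:

import csv
import networkx as nx

# a toy network standing in for your own data
G = nx.Graph()
G.add_weighted_edges_from([('siteA', 'siteB', 2), ('siteB', 'siteC', 1), ('siteC', 'siteD', 3)])

# nodelist.csv - the 'id' header is an assumption
with open('nodelist.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['id'])
    for node in G.nodes():
        writer.writerow([node])

# edgelist.csv - 'source', 'target', 'weight' headers likewise assumed
with open('edgelist.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['source', 'target', 'weight'])
    for u, v, data in G.edges(data=True):
        writer.writerow([u, v, data.get('weight', 1)])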

Similarly, Elijah provides an interactive adjacency matrix at http://bl.ocks.org/emeeks/9441864

An adjacency matrix is a useful way of visualizing networks using an n-by-n grid that shows connection between nodes as a filled grid square. This adjacency matrix is directed, with the source on the y-axis and target on the x-axis. It loads the data from a node list and edge list and represents edge weight using opacity. It also highlights rows and columns on mouseover.

If you copy that bit of html into a new file, it points to the same nodelist.csv and edgelist.csv. Voila! Two very crisp and clear visualizations of the structure of your network, with very little spaghetti mess. Here is Peter Holdsworth’s network of women from 1898 Ontario as both an arc diagram and an adjacency matrix (and I thank Peter for making his data public for these kinds of explorations). To get there, I opened his .gexf network file in Gephi; on the data laboratory tab I hit ‘export spreadsheet’ for the nodes table and then the edges table, opened the csv files in Excel, stripped out extraneous fields, and saved as csv:

1898 – Women networked by virtue of shared membership in various organizations (arc diagram)

Same again, as an adjacency matrix

Contrast these with the spaghetti version that was generated with Gephi (Figshare provides a preview here). The patterning is much clearer and more intuitive, I think. It’s beyond my programming prowess, but I would think it should not be overly difficult for someone to package this code as a layout plugin for Gephi.
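If you want to prototype the adjacency-matrix view before fiddling with the html, here’s a rough Python sketch of a static version of the same idea, using networkx and matplotlib rather than Elijah’s d3 code (it assumes an edgelist.csv with source, target, and weight columns, as above):

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

edges = pd.read_csv('edgelist.csv')
G = nx.from_pandas_edgelist(edges, 'source', 'target', edge_attr='weight', create_using=nx.DiGraph())

# n-by-n grid, source on one axis, target on the other
nodes = list(G.nodes())
matrix = nx.to_numpy_array(G, nodelist=nodes, weight='weight')

plt.imshow(matrix, cmap='Greys')  # darker square = heavier edge, echoing the opacity trick
plt.xticks(range(len(nodes)), nodes, rotation=90)
plt.yticks(range(len(nodes)), nodes)
plt.tight_layout()
plt.savefig('adjacency-matrix.png')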

Now, here’s the thing – you’ll need to put the html and the csv into the same folder on a server somewhere for this to work. I use WAMP for this kind of thing before moving everything onto the live interwebs. Installing WAMP is quite straightforward; it’s a one-click installer. Once you’ve got it installed and running, simply create a subfolder inside the c:\wamp\www\ folder, e.g. \myproject\, and save your html and csv files in that folder. Then, in your browser, go to localhost/myproject, click on your html file, and you’re good to go.
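Alternatively, if installing WAMP is overkill for a quick test, Python’s built-in web server will also serve a folder of html and csv files just fine – that’s a substitution for my usual setup, not anything Elijah’s code requires. From inside the project folder:

cd \myproject
python -m SimpleHTTPServer 8000

(or, with Python 3, python -m http.server 8000), and then point your browser at http://localhost:8000 and click on your html file.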

Elijah does point out:

This may be true, but if one plays with some of the html, making the canvas bigger, some of this can be mitigated… As with most things, no one approach is going to do everything you need it to, but these two visualizations should be in your toolkit.

HIST4805b Looted Heritage: The Illicit Antiquities Trade

I’m teaching a fourth year seminar next year dealing with issues surrounding the illicit antiquities trade. This seminar will be in conjunction with a larger project spearheaded by the investigative reporter and author Jason Felch, of Chasing Aphrodite. I’m quite excited about this; as an undergraduate, I once had the opportunity to work on a term project that looked at the antiquities market. That was twenty years ago; I’ve never really had the opportunity to scratch that itch since. So, when I was asked to suggest a seminar topic, I jumped at the chance to plumb the depths of my own ignorance together with my students. What better way to teach than to be learning right along with your students?

As ever, I turned to twitter, to see what folks there had to say.

Many folks chimed in with suggestions.

I’m keeping all of these in a zotero library for eventual sharing with my students (wider world too), but for now, this is the kind of stuff that’s come in:

Legal & Academic Frameworks

Renfrew, Colin. Loot, Legitimacy and Ownership: The Ethical Crisis in Archaeology. Duckworth, 2000.

Lazrus, Paula K., and A. Barker (eds.). All the King’s Horses: Essays on the Impact of Looting and the Illicit Antiquities Trade on Our Knowledge of the Past. Society for American Archaeology, 2012.

Marlowe, Elizabeth. Shaky Ground: Context, Connoisseurship and the History of Roman Art. Debates in Archaeology. London: Bloomsbury Academic, 2013. http://catalogue.library.carleton.ca/record=b3486847~S9

Hoffman, Barbara T., ed. Art and Cultural Heritage: Law, Policy, and Practice. Cambridge ; New York: Cambridge University Press, 2006. http://catalogue.library.carleton.ca:80/record=b2293643~S9

Green, Penny, and S. R. M. Mackenzie, eds. Criminology and Archaeology: Studies in Looted Antiquities. Oñati International Series in Law and Society. Oxford ; Portland, Or: Hart Publishing, 2009. http://catalogue.library.carleton.ca:80/record=b2609135~S9

RealTime Delphi Study on the Future of Cultural Heritage Research http://www.jpi-culturalheritage.eu/wp-content/uploads/JPI-Cultural-Heritage-RealTime-Delphi-Report-final-version-to-be-published.pdf

Campbell, Peter B. ‘The Illicit Antiquities Trade as a Transnational Criminal Network: Characterizing and Anticipating Trafficking of Cultural Heritage’. International Journal of Cultural Property 20, no. 02 (2013): 113–153. doi:10.1017/S0940739113000015.

World War II

Nicholas, Lynn H. The Rape of Europa: The Fate of Europe’s Treasures in the Third Reich and the Second World War. 1st ed. New York: Knopf, 1994. http://catalogue.library.carleton.ca/record=b1456118~S9

Edsel, Robert M, and Bret Witter. The Monuments Men: Allied Heroes, Nazi Thieves and the Greatest Treasure Hunt in History. New York: Center Street / Hachette Book Group, 2010.

Edsel, Robert M. Saving Italy: The Race to Rescue a Nation’s Treasures from the Nazis. 1st ed. New York: W. W. Norton & Company, 2013. http://catalogue.library.carleton.ca/record=b3445170~S9

Current State

Felch, Jason, and Ralph Frammolino. Chasing Aphrodite: The Hunt for Looted Antiquities at the World’s Richest Museum. Houghton Mifflin Harcourt, 2011.

Watson, Peter, and Cecilia Todeschini. The Medici Conspiracy: The Illicit Journey of Looted Antiquities from Italy’s Tomb Raiders to the World’s Greatest Museums. PublicAffairs, 2007.

Waxman, Sharon. Loot: The Battle over the Stolen Treasures of the Ancient World. Macmillan, 2010. http://catalogue.library.carleton.ca/record=b2928026~S9

‘Trafficking Culture’. Accessed 12 March 2014. http://traffickingculture.org/.

and an entire special issue of Internet Archaeology: Issue 33 – Portable Antiquities: archaeology, collecting, metal detecting, Edited by Stuart Campbell and Suzie Thomas http://intarch.ac.uk/journal/issue33/index.html

And from Donna Yates, the exciting news that she and her collaborators at Trafficking Culture are going to write a textbook on the subject.

Assessment

In terms of assessment, I want to avoid long research essays based on secondary sources. Instead, I’d rather have the students build something, analyze something, visualize something… so this will be a heavily digital humanities-inflected course. I want my students at the coalface. My little looted heritage social media observatory, https://heritage.crowdmap.com/, will be pulled out of the mothballs and will become an active part of the course. We’ll be mining eBay, looking at the auction sites, exploring museum archives… probably. Stay tuned!

If you have suggestions for things the students should be reading/looking at/exploring, please do drop me a line or leave a comment.

Mapping the Web in Real Time

I don’t think I’ve shared my workflow before for mapping the structure of a webcrawl. After listening to Sebastian Heath speak at #dapw, it occurred to me that it might be useful for, inter alia, linked open data type resources. So, here’s what you do (my example draws from this year’s SAA 2014 blogging archaeology session blog-o-sphere):

1. install the http graph generator from the gephi plugin marketplace.

2. download the navicrawler + firefox portable zip file at the top of this page.

3. make sure no other instance of firefox is open. Open firefox portable. DO NOT click the ‘update firefox’ button, as this will make navicrawler unusable.

4. Navicrawler can be used to download or scrape the web. In the navicrawler window, click on the (+) to select the ‘crawl’ pane. This will let you set how deep and how far to crawl. Under the ‘file’ tab, you can save all of what you crawl in various file formats. With the httpgraph plugin for Gephi however, we will simply ‘listen’ to the browser and render the graph in real time.

5. The first time you run firefox portable, you will need to configure a manual proxy. Do this by going to tools >> options >> network >> settings. Set the manual proxy configuration for http to 127.0.0.1 and the port to 8088. Click ‘ok’.

If you tried loading a webpage at this point, you’d get an error. To resolve this, you need to tell Gephi to connect to that port as well, and then web traffic will be routed correctly.

6. Open Gephi. Select new project. Under ‘generate’, select ‘http graph’. This will open a dialogue box asking for the port number. Enter 8088.

7. Over in Firefox portable, you can now start a websearch or go to the page from which you wish to crawl. For instance, you could put http://dougsarchaeology.wordpress.com/2013/11/05/blogging-archaeology/ in the address bar. Over in Gephi, you will start to see a number of nodes and edges appearing. In the ‘crawl’ window in Navicrawler, set ‘max depth’ to 1, ‘crawl distance’ to 2, and ‘tabs count’ to 25. Then hit the ‘start’ button. Your Gephi window will now begin to fill with the structure of the internet. There are 4 types of nodes: client, uri, host, and domain. For our purposes here, we will want to filter the resulting graph to hide most of the architecture of the web and just show the URIs. (This, by the way, could be very useful for visualizing archaeological resources organized via Linked Open Data principles.)

Your crawl can run for quite some time. I was running the crawl described above for around 10 minutes when it crashed on me. The resulting Gephi file (which has 5374 nodes and 14993 edges) can be downloaded from my space on figshare. For the illustration below, I filtered ‘content-type’ for ‘text/html’, to present the structure of the human-readable archaeo-blog-o-sphere as represented by Doug’s Blogging Archaeology Carnival.

The view from Doug’s place
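If you’d rather do that kind of filtering outside of Gephi, a quick networkx sketch along these lines should work on the downloaded gexf – the ‘content-type’ attribute name comes from the filter I describe above, but check the attribute names in your own file before trusting it:

import networkx as nx

G = nx.read_gexf('httpgraph-crawl.gexf')  # filename is hypothetical

# keep only the human-readable pages
html_nodes = [n for n, data in G.nodes(data=True)
              if data.get('content-type', '').startswith('text/html')]
H = G.subgraph(html_nodes)

print(len(G), 'nodes before filtering;', len(H), 'nodes after')
nx.write_gexf(H, 'crawl-html-only.gexf')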

Shared Authority & the Return of the Human Curated Web

A few years ago, I wrote a piece on Why Academic Blogging Matters: A structural argument. This was the text for a presentation as part of the SAA in Sacramento that year. In the years since, the web has changed (again). It is no longer enough for us to create strong signals in the noise, trusting in the algorithms to connect us with our desired publics. (That’s the short version. The long version is rather more nuanced and sophisticated, trust me.)

The war between the botnets and the SEO specialists has outstripped us.

In recent months, I have noticed an upsurge of new ‘followers’ on this blog with emails and handles that really do not seem to be those of actual humans. Similarly, on Twitter, I find odd tweets directed at me filled with gibberish web addresses (which I dare not touch). Digital Humanities Now highlighted an interesting post in recent days that explains what’s going on, discusses this ‘war’, and, in the very way the post came to my attention, points the way forward for the humanistic use of the web.

In ‘Crowd-Frauding: Why the Internet is Fake‘, Eric Hellman discusses a new avenue for power (assuming that power ‘derives from the ability to get people to act together’): in this case, ‘cooperative traffic generation’, or software-organized crime. Hellman was finding a surge of fake users on his site, and he began to investigate why. It turns out that if you want to promote your website and jack up its traffic, you can install a program that manufactures fake visitors to your sites, who click around, click on adverts, register… and which in turn does the same for other users of the software. Money is involved.

“In short, your computer has become part of a botnet. You get paid for your participation with web traffic. What you thought was something innocuous to increase your Alexa ranking has turned you into a foot-soldier in a software-organized crime syndicate. If you forgot to run it in a sandbox, you might be running other programs as well. And who knows what else.

The thing that makes cooperative traffic generation so difficult to detect is that the advertising is really being advertised. The only problem for advertisers is that they’re paying to be advertised to robots, and robots do everything except buy stuff. The internet ad networks work hard to battle this sort of click fraud, but they have incentives to do a middling job of it. Ad networks get a cut of those ad dollars, after all.

The crowd wants to make money and organizes via the internet to shake down the merchants who think they’re sponsoring content. Turns out, content isn’t king, content is cattle.”

Hellman goes on to describe how the arms race (the red queen effect) between these botnets and advertising models that depend on clickrates will push those of us without the computing resources to fight these battles into the arms of the Googles, the Amazons, the Facebooks: and their power will increase correspondingly.

“So with the crowd-frauders attacking advertising, the small advertiser will shy away from most publishers except for the least evil ones – Google or maybe Facebook. Ad networks will become less and less efficient because of the expense of dealing with click-fraud. The rest of the internet will become fake as collateral damage. Do you think you know how many users you have? Think again, because half of them are already robots, soon it will be 90%. Do you think you know how much visitors you have? Sorry, 60% of it is already robots.”

I sometimes try explaining around the department here that when we use the internet, we’re not using a tool; we’re sharing authority with countless engineers, companies, criminals, folks-in-their-parents-basement, ordinary folks, students, and algorithms whose interactions with other algorithms can lead to rather unintended outcomes. We can’t naively rely on the goodwill of the search engine to help us get our stuff out there. This, I think, is an opportunity for a return of the human curated web. No, I don’t mean building directories and indices. I mean a kind of supervised learning algorithm (as it were).

Digital Humanities Now provides one such model (and there are of course others, such as Reddit). A combination of algorithm and human editorial oversight, DHNow is a cybernetic attempt to bring to the surface the best of the week’s digital humanities work, wherever on the net it may reside. We should have the same in archaeology. An Archaeology Now! The infrastructure is already there. PressForward, the outfit from the RRCHNM, has developed a workflow for folding volunteer editors into the weekly task of separating the wheat from the chaff, using a custom-built plugin for WordPress. Ages ago we talked about a quarterly journal where people would nominate their own posts and we would spider the web looking for those nominations, but the technology wasn’t really there at that time (and perhaps the idea was too soon). With the example of DHNow, and the emergence of this new front in botnets/SEO/clickfraud and the dangers that it poses, perhaps it’s time to revisit the idea of the human-computer curated archaeoweb?

On Research Witchcraft

I’m a fan of Terry Pratchett. I re-read his novels frequently because each time, I find something new to consider. I was recently reading Lords and Ladies, which is part of the witches’ cycle of stories set in Discworld. This passage resonated:

Cottages tend to attract similar kinds of witches. It’s natural. Every witch trains up one or two young witches in their life, and when in the course of mortal time the cottage becomes vacant it’s only sense for one of them to move in.

Magrat’s cottage traditionally housed thoughtful witches who noticed things and wrote things down. Which herbs were better than others for headaches, fragments of old stories, odds and ends like that.

[...]It was a cottage of questioning witches, research witches. Eye of what newt? What species of ravined salt-sea shark? It’s all very well a potion calling for Love-in-idleness, but which of the thirty-seven common plants called by that name in various parts of the continent was actually meant?

The reason that Granny Weatherwax was a better witch than Magrat was that she knew that in witchcraft it didn’t matter a damn which one it was, or even if it was a piece of grass.

The reason that Magrat was a better doctor than Granny was that she thought it did.

Take a look at any github repository and examine the readme. Strikes me, there’s a lot of the witches about these code repositories. The parallel isn’t perfect, but I feel rather like poor Magrat. For instance (and taken at random*):

Install PostgreSQL.

Install a Java Development Kit.

Install Git.

git clone https://github.com/overview/overview-server.git

Which development kit? What version? How many flavours of PostgreSQL are there? What do I do with that?  As I fumble towards dim understanding, I figure the folks who are building these things are more like Granny, and understand that any will do the trick, because they know what to expect and how to fix it if it goes wrong. Me, I need the right version the first time, because otherwise I’ll just make a hash of it – and I’ll have to teach it to someone! (Although I can git clone from git bash with the best of ‘em – now!)

I don’t have the tacit knowledge of experience built up yet. There’s just so much to learn! Like Magrat, I can write it all down, spell it all out, and in doing so, I’ll eventually become like Granny, where it just flows.

I look forward to that day. But for now, I’ll keep engaging in my research witchcraft, figuring out the bits and bobs that those far more clever than me have devised, and reporting back what I’ve found.

*Well, not totally at random. It comes from the Overview Project, who have taken pity on me (and others!) and have worked very hard indeed to simplify setting up a development environment for their text analysis server, ‘Overview‘. Thank you Jonathan and Adam! I’m learning a lot from chatting with these guys as they shepherd me through the process. I’ll be posting on that process soon, pointing out some of the tacit bits I found I had to uncover in order to make it work. Their platform, conceived for journalists, should also migrate its way into history & archaeology, as I think we’ll find it very useful!

Getting started with some open source alternatives to 123D Catch

I like 123D Catch, but there is the whiff of ‘black-box’ about it all. Sometimes, you’d just like to know what’s going on. There may also be times when, for various reasons, uploading data to a cloud service hosted in another country is just not the right solution for you.

There are many open source products though; right now I’m playing with VisualSFM. Download and install it; then download CMVS. Extract the zip. Within it you will find folders for various operating systems. Find yours, and copy the files within it to the VisualSFM folder.

Now you’re ready to go, as per the image below. Here’s a longer tutorial too.

You might however find it easier to use this bundle of all the bits and pieces you need, if you are familiar with Python. Extract the zip, grab the folder that corresponds to your operating system, and move it to C:\. Install Python (I’m using Python 2.7). Then open the command prompt (type ‘cmd’ in the ‘run’ box in Windows) and navigate to the folder. On my machine, it’s now in c:\bundler, so I had to type:

C:\users\Shawn Graham> cd ..

C:\users> cd ..

C:\> cd bundler

C:\Bundler>RunBundler.py --photos=C:\MyPhotoFolder\

…and the magic begins to happen. I got an error at first: ‘blah blah blah PIL missing blah blah’. PIL stands for the Python Imaging Library. Go here, grab the correct version, download it, and double-click to install. Then try again with the RunBundler.py command above.

So that’s running now on my machine; I’ll update here if it all goes wrong – or if indeed it all goes right!

Putting Pompeii on Your Coffee Table

(cross-posted from my course blog, #hist5702x digital/public history. If you’re interested in public history and augmented reality, check out my students’ posts!)

Creating three dimensional models from photographs has its ups and downs. But what if we could do it from video? I decided to find out.

First, I found this tourist’s film of a house at Pompeii (house of the tragic poet, he says):

I saved a copy of the film locally; there are a variety of ways of doing this and two seconds with google will show you how. I then watched it carefully, and took note of a sequence of clearly lit pans at various points, marking down when they started and stopped, in seconds.


Then, I searched for a way to extract still images from that clip. This blog post describes a command-line option using VLC (option 3). I went with that, which created around 600 images. I then batch converted them from png to jpg (Google around again; the solution I found from download.com was filled with extraneous crapware that cost me 30 minutes to delete).
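As an aside, the extract-the-frames and convert-to-jpg steps could be collapsed into one with a few lines of Python and OpenCV, pulling jpgs straight from the timestamps noted earlier. This is a sketch rather than what I actually ran; it assumes the clip is saved locally as pompeii.mp4 and uses made-up pan times:

import cv2

# (start, stop) times in seconds for the clearly lit pans - these values are invented
pans = [(12, 25), (40, 55)]

cap = cv2.VideoCapture('pompeii.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)

frame_number = 0
saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    t = frame_number / fps
    # keep roughly two frames per second inside each pan
    if any(start <= t <= stop for start, stop in pans) and frame_number % int(fps / 2) == 0:
        cv2.imwrite('frame_%04d.jpg' % saved, frame)
        saved += 1
    frame_number += 1
cap.release()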

I then selected around 40 images that seemed to cover things well. It would’ve been better if the cameraman had moved around rather than panned, as that would’ve provided better viewpoints (I’ll search for a better video clip). These I stitched together using 123D Catch. I have the Python Photogrammetry Toolbox on my other computer, so I’ll try doing it again on that machine; 123D Catch is all well and good but it is quite black-box; with PPT I can perhaps achieve better results.

The resulting model from 123D Catch shows the inside of the atrium far better than I expected (and again, a better starting film would probably give better results). I exported the .obj, .mtl, and the jpg textures for the resulting model, to my computer, which I then uploaded to augmentedev.com.

The result? A Pompeian house, on my desktop!

The Atrium of the House of the Tragic Poet, Pompeii-on-the-Rideau

Now imagine *all* of the video that exists out there of Pompeii. It should be possible to create a 3d model of nearly the whole city (or at least, the parts they let tourists into), harvesting videos from youtube. One could then 3d print the city, export to AR, or import into a game engine….

As far as the #hist5702x project is concerned, we could do this in the workspace they’ve set up for us in the warehouse building, or at the airport, or from historical footage from inside a plane, or….

Gaze & Eonydis for Archaeological Data

I’m experimenting with Clement Levallois‘ data mining tools ‘Gaze‘ and ‘Eonydis‘. I created a table with some mock archaeological data in it: artefact, findspot, and date range for the artefact. More on dates in a moment. Here’s the fake dataset.

Firstly, Gaze will take a list of nodes (source, target), and create a network where the source nodes are connected to each other by virtue of sharing a common target. Clement explains:

Paul,dog
Paul, hamster
Paul,cat
Gerald,cat
Gerald,dog
Marie,horse
Donald,squirrel
Donald,cat
… In this case, it is interesting to get a network made of Paul, Gerald, Marie and Donald (sources nodes), showing how similar they are in terms of pets they own. Make sure you do this by choosing “directed networks” in the parameters of Gaze. A related option for directed networks: you can choose a minimum number of times Paul should appear as a source to be included in the computations (useful to filter out unfrequent, irrelevant nodes: because you want only owners with many pets to appear for instance).
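What Gaze is doing here is essentially a bipartite projection. For the curious, here’s a minimal sketch of the same idea in Python with networkx, using the pet data above (this is just the concept, not Gaze’s own code):

import networkx as nx
from networkx.algorithms import bipartite

edges = [('Paul', 'dog'), ('Paul', 'hamster'), ('Paul', 'cat'),
         ('Gerald', 'cat'), ('Gerald', 'dog'),
         ('Marie', 'horse'),
         ('Donald', 'squirrel'), ('Donald', 'cat')]
owners = ['Paul', 'Gerald', 'Marie', 'Donald']

B = nx.Graph()
B.add_nodes_from(owners, bipartite=0)
B.add_nodes_from(set(pet for _, pet in edges), bipartite=1)
B.add_edges_from(edges)

# project onto the owners: an edge appears when two owners share a pet,
# weighted by how many pets they have in common
projection = bipartite.weighted_projected_graph(B, owners)
print(projection.edges(data=True))  # e.g. Paul-Gerald carries weight 2 (dog and cat)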

The output is in a nodes.dl file and an edges.dl file. In Gephi, go to the import spreadsheet button on the data table, import the nodes file first, then the edges file. Here’s the graph file.

Screenshot, Gaze output into Gephi, from mock archaeo-data

Eonydis on the other hand takes that same list and if it has time-stamps within it (a column with dates), will create a dynamic network over time. My mock dataset above seems to cause Eonydis to crash – is it my negative numbers? How do you encode dates from the Bronze Age in the day/month/year system? Checking the documentation, I see that I didn’t have proper field labels, so I needed to fix that. Trying again, it still crashed. I fiddled with the dates to remove the range (leaving a column to imply ‘earliest known date for this sort of thing’), which gave me this file.

Which still crashed. Now I have to go do some other stuff, so I’ll leave this here and perhaps one of you can pick up where I’ve left off. The example file that comes with Eonydis works fine, so I guess when I return to this I’ll carefully compare the two. Then the task will be to work out how to visualize dynamic networks in Gephi. Clement has a very good tutorial on this.

Postscript:

Ok, so I kept plugging away at it. I found that if I put the dates as yyyy-mm-dd, as in 1066-01-23, then Eonydis worked a treat. Here’s the mock data and here’s the gexf.

And here’s the dynamic animation! http://screencast.com/t/Nlf06OSEkuA

Post post script:

I took the mock data (archaeo-test4.csv) and prepended a ‘-’ to the dates, thus -1023-01-01, to represent dates BC. In Eonydis, where it asks for the date format, I tried this:

#yyyy#mm#dd, which accepted the dates but dropped the negative;

-yyyy#mm#dd, which accepted the dates and also dropped the negative.

Thus, it seems to me that I can still use Eonydis for archaeological data, but I should frame my date column in relative terms rather than absolute, as absolute isn’t really necessary for the network analysis/visualization anyway.
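Here’s roughly what I mean by ‘relative terms’, sketched in Python with pandas. The column name ‘year’ is an assumption about the mock file; adjust it to match your own data:

import pandas as pd

df = pd.read_csv('archaeo-test4.csv')

# dates held as plain year numbers, negative for BC, e.g. -1023
earliest = df['year'].min()
df['relative_year'] = df['year'] - earliest  # 0 = the oldest artefact in the dataset

# fake a month and day so Eonydis gets a full yyyy-mm-dd it can parse
df['date_for_eonydis'] = df['relative_year'].apply(lambda y: '%04d-01-01' % y)

df.to_csv('archaeo-test4-relative.csv', index=False)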

Hollis Peirce, George Garth Graham Research Fellow

Hollis Peirce on Twitter: https://twitter.com/HollPeirce

I am pleased to announce that the first George Garth Graham Undergraduate Digital History Research Fellow will be Mr. Hollis Peirce.

Hollis is a remarkable fellow. He attended the Digital Humanities Summer Institute at the University of Victoria in the summer of 2012. At DHSI he successfully completed a course called “Digitization Fundamentals and Their Application”. In the fall semester of 2012 he was the impetus behind, and helped to organize,  THATCamp Accessibility on the subject of the impact of digital history on accessibility in every sense of the word.

Hollis writes,

Life for me has been riddled with challenges.  The majority of them coming on account of the fact that I, Hollis Peirce, am living life as a disabled individual with Congenital Muscular Dystrophy as many things are not accessible to me.  However, I have never let this fact hold me back from accomplishing my goals.  Because of this, when I first started studying history I knew I was not choosing an easy subject for a disabled individual such as myself.  All those old, heavy, books on high library shelves that history is known for made it one of the most inaccessible subjects possible to study.  All that changed however, when I discovered digital history.

It was thanks to a new mandatory class for history majors at Carleton University called The Historian’s Craft taught by a professor named Dr Shawn Graham. This course was aimed at teaching students all about how to become a historian, and how a historian is evolving through technology. At that moment the idea for ‘Accessibility & Digital History’ came to mind. From that point on many steps have been taken to advance my studies in this field, which has led to being selected as the first George Garth Graham Undergraduate Digital History Research Fellow.

Hollis and I have had our first meeting, about what his project might entail. When I initially cooked this idea up, I thought it would allow students the opportunity to work on my projects, or those of my colleagues around the university. As we chatted about Hollis’ ideas (and I batted around some of my own), I realized that I had the directionality of this relationship completely backwards.

It’s not that Hollis gets to work on my projects. It’s that I get to work on his.

Here’s what we came up with.

At THATCamp Accessibility, we recorded every session. We bounced around the idea of transcribing those sessions, but realized that that was not really feasible for us. We started talking about zeroing in on certain segments, to tell a history of the future of an accessible digital humanities… and ideas started to fizz. I showed Hollis some of Jentery Sayers’s stuff, especially his work with Scalar.

Jentery writes,

the platform particularly facilitates work with visual materials and dynamic media (such as video and audio)….it enables writers to assemble content from multiple sources and juxtapose them with their own compositions.

Can we use Scalar to tell the story of THATCamp Accessibility in a way that captures the spontaneity, creativity, and excitement of that day, and that highlights the issues of accessibility Hollis wants to explore? And if we can, how can we make it accessible for others (screen readers, text-to-speech, etc.)? And if we focus on telling history with an eye to accessibility (oh, how our metaphors privilege certain senses, certain ways of knowing!), maybe there will be lessons for telling history, full stop?

Stay tuned! Hollis is setting up his blog this week, but he’ll be posting over at http://hollispeirce.grahamresearchfellow.org/