The Original Big Data

I’m speaking tomorrow at Carleton U’s Data Day. I’m the only historian/humanist/archaeologist/whatever on the ticket. I can’t even stay for the full event, because I teach (my #hist3907b students are showing off their term projects!). Last year, I felt the speakers at the event were dismissive towards the humanities.

So when I was asked to speak this year, I said ok. My original draft went in all guns a-blazin’. I took a day to digest it, and decided, no, not all that useful, and threw it out. Below then are my speaker’s notes for what I’m going to say, regarding

History: The Original Big Data

The slides are online on GitHub here. My actual talk will differ from what I’m writing below as I go off on tangents (though not many; I only have 15 minutes). This’ll give you a feel, though, for what I hope becomes a constructive point of departure for engagement with my data science colleagues.

1. Title. Millions of dollars spent digitizing historical resources.

2. Opening. Every passing day, we leave ridiculous amounts of traces, typically in 1s and 0s. How do we make any sense of it? For what purpose? What does it mean? What does it do to us, if those traces can be… tracked?

3. Troy. ‘Big data’ is not the first to wrestle with the problems of abundance. _This image_ shows several metric tonnes of archaeological material recovered from a recent season of excavation at ancient Troy. Every sherd, every piece of pottery, every grain of pollen, every lithic sits not just in 3D space, but in a 4D space of deposition and another of use! It’s an incredibly entangled mess, from which archaeological methods allow us to reconstruct an entire civilization. How’s that for big data?

4. Big data is ever with us. Archaeology and history: the original big data. In my talk, I want to suggest ways in which these disciplines of big data in the past have more in common with all of you than you might first have guessed.

5. (Monte Testaccio, the hill in Rome built from discarded amphora sherds: I measure data in cubic metres, not mere terabytes! Roar!)

6. Carp Mountain: Ottawa’s own Monte Testaccio. Talk about big, stinky data.

7. Thinking in 4D. Archaeology and history bring skills and methods for dealing with multiplex, multicausal, multivalent information. Context is king.

8. White House. We’re not just concerned with asking questions of our data in the here and now, but also with thinking about how to manage our data so that questions we *can’t* imagine can be asked in the future with tools that *haven’t* been invented. We’re remarkably forward-thinking, doncherknow. Eric Kansa in particular has been at the forefront of such efforts in my own field, archaeology. Recently recognized by the White House for his work, he’s helping set the agenda in digital humanities more broadly.

9. Firehose. ‘Big’ isn’t really that useful a term, though. It’s a relative measure, so the goalposts are always moving. Is what was big five years ago still big, if you’re measuring in terms of digital storage? Better to think of ‘big’ in relation to your own ability to apply your method to it. Big is in the eye of the beholder; big is when you need to reduce complexity through computation.

10. Ian. …and so we’re now in an era when we as historians and archaeologists have to invent new methodologies – for historians in particular, often in the smoking ruins of corporate decisions that obliterate the record of *millions of people’s lives*.

11. Teaching. The methods we’re coming up with, often borrowed from big data, sometimes invented by us, often have an element of deformation to them. We’re not using computation to prove a hypothesis; we’re using it to deform our worldview, to generate new ideas, to see data at a scale and from a perspective otherwise impossible to obtain. So let me tell you about my students, who’ve just encountered these ideas for the first time.

12–14. Examples from class. The projects are still ongoing; these are early visuals, used with permission.

15. Image quilt of Google Images results for ‘DH projects’. It’s an exciting time to be teaching history. The sheer vitality and breadth of what’s being done is exhausting to keep up with. So y’all should keep an eye on

16. Data speaks? What unites our varied approaches is reflective critique: of what we’re doing, of how the data are collected, of how the code replicates certain visions of the world, of power, of control, of templates and constructed selves.

17. Data/Capta. Data are not neutral; anyone who tells you otherwise is trying to sell you something. They aren’t objective, and digital data in particular are not! There is nothing natural about interacting with 1s and 0s; it’s entirely constructed, and it’s worth thinking hard about by whom, for whom, and about whom.

18. Big Capta. Our platforms are built by people who imagine most people are like them. And if you’re not a white guy? Digital media can be a harsh place. That is why so much of what occurs online is performative, or actively trying to screw with, hide from, or subvert the algorithms that are capturing our data.

19. Big Data needs DH. Big data could be liberating; it could be empowering; it could be transformative. Big data holds much promise for taking a macroscopic look at ourselves. Sometimes the role of the humanities is to critique, to help realise that promise.

20. Storytellers. Critique doesn’t mean ‘be negative about’. There’s sometimes a tendency to frame the humanities and data science as locked in a battle to the death, as if big data were not something the humanities have centuries of experience dealing with. I think that misunderstands what could be a productive, respectful relationship. I think we’re both in the business of telling stories, perhaps for different goals (which we can discuss)…

21. Complementary. But this makes the relationship complementary: each needs the other.

22. And what of my own work? Well, if we’ve got time, this is the kind of stuff I do… I stand between worlds.