update, 8pm march 29. Well, this sure is embarrassing. Turns out, when they said ‘panel’, by gosh, they really did mean panel. I didn’t need to produce all this; I wasn’t presenting anything; I wasn’t a formal speaker… man, make sure to read the fine print, eh? Or in my case, the large print. So, in the end, there were four of us being quizzed by the moderator and the audience. It was a really great conversation. But that’s all it was. This was totally not necessary. So now I’ve got this talk, below, and here it will remain, ne’er to be delivered. But y’all might find it interesting anyway, and perhaps I’ll fix its problems, expand it, turn it into something meaningful, someday. But in the meantime…
I’ve got between 12 and 15 minutes tomorrow, at Carleton’s 3rd data day. That’s not a lot of time. I’ve written out roughly what it is I want to talk about – but I go off-script a lot when I speak, so what’s below is only the most nebulous of guides to what’ll actually come out of my mouth tomorrow. In any event, the stuff below would take more like 25 – 30 minutes if I stuck to the script.
Apparently the day’ll be live-streamed too, at http://carleton.ca/cuids/events/data-day-2/. I’m on at the 2pm-ish slot.
Slides are at: j.mp/sg-dd3.
*title* One of the things I’m interested in, inasmuch as ‘big data’ is a thing in history, is the way our tool use changes us. I’m an archaeologist by training; it’s almost an article of faith in archaeology that our tools change us as much as we change our tools. But I guess I worry that our tools are getting out of control, that we exult in the ways our tools exceed us*. This panel is called ‘Needs and Opportunities for Big Data in Social Sciences and the Humanities’; I think I’d like to rejig that to read ‘ways in which big data needs social sciences & the humanities’.
1. In recent years, everyone has suddenly had to cope with the sheer amount of data that our always-on, always-connected, always-under-surveillance society has generated. It’s forced us to take stock, and rethink our place in the world. This kind of thing has happened before; in an earlier moment it gave rise to ‘the Gothic’ – and it’s worthwhile thinking through *that* reaction to the changing place of humans in the world, and what it implies for *this* moment. To simplify horribly, and at the risk of undermining my humanities street-cred, I will for the sake of convenience conflate romanticism with the gothic and boil the gothic down to the text on this slide; this is all tied up with the broader changes in western society precipitating out of the enlightenment and the beginnings of our industrial age. If I said ‘Frankenstein, or the modern prometheus’, you’d get what I’m talking about. Key here is the idea of shock and thrill …
2. …and the annihilation of ‘the self’. That is, the real ambition here with the gothic is to first frame everything from the point of view of the individual, to overwhelm the senses of the viewer or the reader in the terror or majesty of, say, a landscape or a sensation (the gothic horror), such that only the sublime feeling remains. The quantified self, and those people who keep fit-bits and personal trackers, would be at home in the gothic – and indeed, it is from an event explicitly tying that aspect of ‘personal’ big data to the gothic sensibility (see this; also, read this) that I began thinking along these lines. But let’s return to that idea of the annihilation of ‘self’.
3. It’s in this sense that big data is gothic. The traces that we leave are aggregated and interrogated and correlated, a vast digital landscape of signs and signals from which predictions are generated; and yet, without foundation. Microsoft can build a chat-bot that learns from humans, but they don’t understand or can’t foresee that releasing it into a particular environment already toxic for women is going to be a bad idea. The terror and majesty of the algorithm, of the code, of the data – never mind the humans in the foreground – is what matters.
4. This is what I mean by big data gothic: as Zoe Quinn said, ‘if you’re not asking yourself “how could this be used to hurt someone” in your design/engineering process, you’ve failed’. That is to say: you’ve been seduced by the data vista stretching out before you. This is the same impulse that Steve Jobs channeled when he (more or less) said ‘I don’t do marketing, I do what I want because people don’t know what they want’. It’s the same impulse that reaches for plagiarism detection software rather than asking, ‘what is it about my course that makes plagiarism a rational response?’ The seductive lure of ‘moar data’ suggests that eventually, all solutions will percolate out of the data. But who decides what *counts* as data?
5. It’s the rhetorical and tactical usage of the phrase ‘big data’ that I’m concerned with here; I’m not against data science per se or the interesting things you can learn when you have aggregated information – I am an archaeologist after all, and I *did* publish a book with ‘big data’ in the title. The thing is, metaphors matter. Metaphors structure thought and action – ‘the university is a business’, for instance – and so if we imagine ‘big data’ as somehow objectively out there, and not produced by conscious decisions about *what counts*, and *who does the counting*, we end up in situations where people lose their jobs (Uber!) or miss out on credit, or constantly get reconnected with abusive ex-partners on Facebook. Big data, as an aggregated thing, means that the means of production, the power in the system, has shifted from those of us who create the data, to those of us with the money, the privilege, the computing power, to mine it.
6. So let’s call this blinkered version of working with human data ‘Big Data Gothic’. Like its namesake, it’s not too concerned about fallout on individual, named, humans; it revels in the data landscape and draws much of its power from the thrill of off-loading decision making to the machines…
7. That is to say, it begins by thinking of people as things.
8. Oddly enough, big data is not concerned with context; but – again, as an archaeologist – context forces us to think of humans as, well, human.
9. This is what the humanities excel at. At this event two years ago, a businessman spoke about the need to learn how to tell stories from data. We’ve got you covered, over in the humanities.
10. When you think of humans as things, a poorly trained machine learning routine can be used to target other humans for killing. That the routine has a probable error rate that translates into 15,000 people slated for death? People shrug. It’s much easier to say that there’s a 0.008% false positive rate. (see http://arstechnica.co.uk/security/2016/02/the-nsas-skynet-program-may-be-killing-thousands-of-innocent-people/)
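To make that gap between the percentage and the people concrete, here is a back-of-envelope sketch of the arithmetic. The population figure is my own round-number assumption for illustration (it isn’t stated in this post), and the function name is made up; the only point is that a rate that sounds negligible stops sounding negligible once you multiply it by a country’s worth of humans.

```python
# Back-of-envelope: how a tiny false positive rate scales with population.
# ASSUMPTION: the population figure below is a round number chosen for
# illustration only; it does not come from this post.

def expected_false_positives(population: int, false_positive_rate: float) -> int:
    """Expected number of people wrongly flagged, rounded to the nearest person."""
    return round(population * false_positive_rate)

FALSE_POSITIVE_RATE = 0.008 / 100   # 0.008% expressed as a fraction
ASSUMED_POPULATION = 190_000_000    # hypothetical, for illustration only

wrongly_flagged = expected_false_positives(ASSUMED_POPULATION, FALSE_POSITIVE_RATE)
print(f"0.008% of {ASSUMED_POPULATION:,} people is roughly {wrongly_flagged:,} people")
```

Strictly speaking, a false positive rate applies only to the people who *aren’t* what the routine is looking for, but when that is nearly everybody, multiplying by the whole population is a fair approximation.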
11. Treating humans as things. Is there anything more thing-like than buying and selling human remains? The literal commodification of humans. This is a project I’m working on with Damien Huffer; he does the anthropology, I do the numbers. Human remains are bought and sold on Instagram in ways that circumvent Instagram’s rules (such as they are). We want to know both the language used to facilitate these sales, and also the visual language that isn’t caught by my algorithmic trawling. I have around, I donno, perhaps 15,000 images and posts now on my machine. ‘Big’ in this context means: overwhelming for one person using the tools he was taught in grad school. Big data from culling the posts gives me some insight, especially when I represent the posts as vector models, into some of the explicit language behind this trade, and the ways that people signal that something is for sale. But it also misses the visual signals in the composition of the images themselves. For that, I have to go in and read these hidden cues – rather like a kind of steganography that is explicitly meant to conceal the trade from algorithmic monitoring. By the way, this kind of reaction is also present on Facebook or Twitter as people ‘template’ themselves for particular audiences. The danger is that these templated selves could become algorithmic prisons: our performances in reaction to algorithms that make assumptions about how the world works cease to be performances and instead become real. This is big data gothic.
12. Tricia Wang prefers the term ‘thick data’, that is, the kind of thick storytelling that ethnography, anthropology, history, English, and so on, excel at. She argues with reference to what happened to Nokia – and I agree with her – that insight is dependent on both modes; that data science can usefully learn a thing or two from the humanities, and likewise, the humanities can benefit from the scale and distance that an aggregated view can provide.
13. The thing is not to be seduced by the view that big data gothic provides. It is exhilarating, I agree. But it’s ultimately getting in the way.
14. I’ll just leave this here: a good way to know if you’re dealing with a big data gothic situation is if you’ve blamed the algorithm. If you’ve offloaded responsibility for the consequences of decisions made to the computer. In the end, it all comes down to humans.
a note on the phrase ‘big data gothic’. I’m pretty sure that this is not my own phrase, that I must have encountered it somewhere and it’s been pinging around inside my head, quietly waiting for its moment to emerge. I really like the phrase; but I’d really like to attribute it properly. So now to excavate back through everything I’ve read/browsed these last few months…
Featured image is courtesy Sophie Hay
* not that I have access to any of the really souped up big data tools that the people in that room tomorrow will use as a simple matter of course. It’s all relative… right?