Text Analysis of the Grand Jury Documents

a topic in the grand jury documents, #ferguson

I watched Twitter and the CBC while the prosecutor was reading his statement. I watched the live feeds from Ferguson, and other cities around the US. Back in August, when this all first began, I was glued to my computer, several feeds going at once.

A spectator.

Yesterday, Mitch Fraas put the grand jury documents (transcripts of the statements, the proceedings) into Voyant Tools.

These ultimately came from here: http://apps.stlpublicradio.org/ferguson-project/evidence.html

So today, I began, in a small way, to try to make sense of it all, the only way that I can. Text analysis.

Here’s the Voyant Tools corpus

Not having read the full corpus closely (this is, of course, a *distant* tool), it certainly looks as if the focus was on working out what Brown was doing, rather than Wilson…

I started topic modeling, using R & MALLET.

and I put everything up on github

but then I felt that I could improve the analysis; I created one concatenated file, then broke it into 1000-line chunks. The latest inputs, outputs, and scripts are all on my github page.
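For anyone who wants to try this themselves, the core of that workflow fits in a few lines of R with David Mimno’s ‘mallet’ package. MALLET wants a collection of documents, so the chunking step turns one long transcript stream into pseudo-documents it can model. What follows is a minimal sketch, not my actual script (grab that from github); the file names are placeholders, though the 1000-line chunks and 100 topics match what I describe above.

```r
# A sketch of the chunk-and-model workflow with the 'mallet' package.
# "alltext.txt" (the concatenated transcripts) and "en.txt" (a stoplist)
# are placeholder file names -- swap in your own.
library(mallet)

# Read the concatenated file and break it into 1000-line chunks.
lines  <- readLines("alltext.txt")
chunks <- split(lines, ceiling(seq_along(lines) / 1000))
documents <- data.frame(id   = names(chunks),
                        text = sapply(chunks, paste, collapse = " "),
                        stringsAsFactors = FALSE)

# Import into MALLET and train 100 topics.
mallet.instances <- mallet.import(documents$id, documents$text, "en.txt",
                                  token.regexp = "[\\p{L}']+")
topic.model <- MalletLDA(num.topics = 100)
topic.model$loadDocuments(mallet.instances)
topic.model$train(400)

# Peek at the top ten words of the first topic.
topic.words <- mallet.topic.words(topic.model, smoothed = TRUE, normalized = TRUE)
mallet.top.words(topic.model, topic.words[1, ], 10)
```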

The most haunting…

And all 100 topics…

None of this counts as analysis. But by putting it all together, my hope is that more people will grab the text files, grab the R script, explore the Voyant corpus, and really put this all under the microscope. I was tremendously affected by Bethany’s latest blog post, ‘All at once’, which discusses her own reaction to recent news in both Ferguson and UVa, and elsewhere. It was this bit at the end that really resonated:

[…] we need analytical and interpretive platforms, too, that help us embrace our own subjective positioning in the systems in which we labor–which means, inevitably, to embrace our own complicity and culpability in them. And we need these, at the same time, to help us see beyond: to see patterns and trends, to read close and distantly all at once, to know how to act and what to do next. We need platforms that help us understand the workings of the cogs, of which we are one.

So here’s my small contribution. Maybe this can be a platform for someone to do a deeper analysis, to get started with text analysis, to read distantly and closely, to see beyond, and to understand what happened during the Grand Jury.

Introducing Voyant in a History Tutorial

This week my HIST2809 students are encountering digital history, as part of their ‘Historian’s Craft’ class (an introduction to various tools & methods). As part of the upcoming assignment, I’m having them run some history websites through Voyant, as a way of sussing out how these websites craft a particular historical consciousness. Each week, there’s a two-hour lecture and one hour of tutorial where the students lead discussions given the lecture & assigned readings. For this week, I want the students to explore different flavours of Digital History – here are the readings:

“Possible discussion questions: How is digital history different? In ten years, will there still be something called ‘digital history’, or will all history be digital? Is there space for writing history through games or simulations? How should historians cope with that? What kinds of logical fallacies would such approaches be open to?”

To help the TAs bring the students up to speed with using Voyant, I’ve suggested to them that they might find it fun/interesting/useful/annoying to run one of those papers through Voyant. Here’s a link to the ‘Interchange’ article, loaded into Voyant:

http://voyant-tools.org/?corpus=1363622350848.367&stopList=stop.en.taporware.txt

The TAs could put that up on the screen and click on various words in the word cloud to see how a word is used over the course of a single article (though in this case, there are several academics speaking, so the patterns are in part author-related). Click on ‘scholarship’ in the word cloud, and you get a graph of its usage on the right; the highest point is clickable (‘segment six’). Click on that, and the relevant bit of text appears in the middle, as Bill Turkel talks about the extent to which historical scholarship should be free. On the bottom left, if you click on ‘words in the entire corpus’, you can select ‘access’ and ‘scholarship’, which will put both of them on the graph

( http://voyant-tools.org/tool/TypeFrequenciesChart/?corpus=1363622350848.367&docIdType=d1363579550728.b646f3e3-65d1-2347-c580-5e5c0985e6d0%3Ascholarship&docIdType=d1363579550728.b646f3e3-65d1-2347-c580-5e5c0985e6d0%3Aaccess&stopList=stop.en.taporware.txt&mode=document&limit=2 )

and you’ll see that the two words move in perfect tandem, so the discussion here is all about digital tools opening access to scholarship, except in segment 8. The question then becomes: why?
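(If you’re curious what Voyant is computing behind that graph, it’s essentially per-segment term counts: split the text into ten segments, count a term in each. Here’s a rough analogue in R to make the mechanics concrete; the file name is a placeholder, and Voyant’s own tokenizing and relative frequencies will differ in detail.)

```r
# A rough analogue of Voyant's trend chart: per-segment counts of two terms.
# "article.txt" is a placeholder file name.
txt   <- tolower(paste(readLines("article.txt"), collapse = " "))
words <- unlist(strsplit(txt, "[^a-z']+"))
words <- words[words != ""]

# Assign each word to one of ten equal segments.
segment <- ceiling(seq_along(words) / (length(words) / 10))

# Count occurrences of each term in each segment.
for (term in c("scholarship", "access")) {
  cat(term, ":", tapply(words == term, segment, sum), "\n")
}
```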

…so by doing this exercise, the students should get a sense of how looking at macroscopic patterns involves jumping back to the close reading we’re normally familiar with, then back out again, in an iterative process, generating new questions all along the way. An hour is a short period of time, really, but I think this would be a valuable exercise.

(I have of course made screen capture videos walking the students through the various knobs and dials of Voyant. This is a required course here at Carleton. 95 students are enrolled. 35 come to every lecture. Approximately 50 come to the tutorials. Roughly half the class never comes… in protest that it’s a requirement? Apathy? Thinking they know how to write an essay, so what could I possibly teach them? That’s a question for another day, but I’m fairly certain that the next assignment, as it requires careful use of Voyant, is going to be a helluva surprise for that fraction.)