Open Notebooks Part V: Notational Velocity and 1 superRobot

The thought occurred that not everyone wants to take their notes in Scrivener. You might prefer the simple elegance and speed of Notational Velocity, for instance. Yet, when it comes time to integrate those notes, to interrogate those notes, to rearrange them to see what kind of coherent structure you might have, Scrivener is hard to beat.

Screen Shot 2014-09-26 at 1.12.02 PM

With Notational Velocity installed, go to ‘preferences’. Under ‘Notes’ change ‘Read notes from folder’ to point to the Scrivener synchronization folder. Then, change ‘store and read notes on disk as:’ to ‘rich text format files’. This will save every note as a separate rtf file in the folder. Now you can go ahead and use Notational Velocity as per normal. Notational Velocity uses the search bar as a way of creating notes, so start typing in there; if it finds existing notes with those keywords, it’ll bring them up. Otherwise, you can just skip down to the text editing zone and add your note.

When next you sync Scrivener, all of these notes will be brought into your project. Ta da! A later evolution of Notational Velocity, nvALT, has more features, and can be used locally as a personal wiki (as in this post). I haven’t played with it yet, but given its genesis, I imagine it would be easy to make it integrate with Scrivener this way. (A possible Windows option is Notation, but I haven’t tried it out yet.)

~o0o~

I’ve combined all of my Automator applications into a single Automator app, a superrobot if you will, that grabs, converts, creates a table of contents in markdown, and pushes the results into github, whereupon they live within my markdown wiki page. I found I had to insert 10-second pauses between stages, or else the steps would get out of order, making a godawful mess. Presumably, with more notecards, I’d have to build in more time? We shall see. No doubt there is a much more elegant way of doing this, but the screenshot gives you what you need to know:

Screen Shot 2014-09-26 at 1.36.03 PM
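
For the curious, here’s roughly what the chain does, expressed as a single shell script rather than an Automator workflow; the app names and Desktop paths are placeholders for wherever you’ve saved your own robots, and the sleeps stand in for those 10-second pauses:

# a rough shell stand-in for the chained Automator apps (app names and paths are placeholders)
open -W ~/Desktop/abm-project-move-to-conversion-folder.app
sleep 10
open -W ~/Desktop/convert-rtf-to-md.app
sleep 10
open -W ~/Desktop/create-toc.app
sleep 10
open -W ~/Desktop/push-converted-files-to-github-repo.app
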

Update with Caveat: Ah. Turns out that the Scrivener sync feature renames the notes slightly, which seems to break things in Notational Velocity. So perhaps the workflow should go like this:

1. Use Notational Velocity to keep notes, and for its handy search feature.
2. Have preferences set to individual files as rtf, as above, in a dedicated folder just for Notational Velocity.
3. Create an Automator app that moves everything into the Scrivener sync folder, for your writing and visualizing of the connections between the notes (a minimal shell sketch of this step follows the list).
4. Sync Scrivener, continue as before. Or, if you wish to dispense with Scrivener altogether, just use the rtf-to-md script and proceed.
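
That Automator app in step 3 is really just a copy; as a shell one-liner it might look like the following, with both folder paths standing in for wherever Notational Velocity and the Scrivener sync folder actually live on your machine:

# sketch of step 3: copy the Notational Velocity rtfs into the Scrivener sync folder
# (both paths are placeholders for your own setup)
cp ~/Dropbox/notational-velocity/*.rtf ~/Dropbox/scrivener-sync/Notes/
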

Perhaps that’s just making life way too complicated.

Oh, and as Columbo used to say… “…one more thing”: Naming. Some kind of naming convention for notes needs to be developed. Here is some really good advice that I aspire to implement.

Open Notebooks Part IV – autogenerating a table of contents

I’ve got MDWiki installed as the public face of my open notebook.

Getting it installed was easy, but I made it hard, and so I’ll have to collect my thoughts and remember exactly what I did… but, as I recall, it was this bit I found in the documentation that got me going:

First off, create a new (empty) repository on GitHub, then;

git clone https://github.com/exalted/mdwiki-seed.git
cd mdwiki-seed
git remote add foobar <HTTPS/SSH Clone URL of the New Repository>
git push foobar gh-pages

 

Then, I just had to remember to edit the ‘gh-pages’ branch. Also, on github, if you click on ‘settings’, it’ll give you the .io version of your page, which is the pretty bit. So, I updated robot 3 to push to the ‘uploads/documents’ folder. Hooray! But what I needed was a self-updating ‘table of contents’. Here’s how I did that.

In the .md file that describes a particular project (which goes in the ‘pages’ folder) I have a heading ‘Current Notes’ and a link to a file, contents.md, like so:

## [Current Notes](uploads/documents/contents.md)

Now I just train a robot to always make an updated contents.md file that gets pushed by robot 3.

I initially tried building this into robot 2 (‘convert-rtf-to-md’), but I outfoxed myself too many times. So I inserted a new robot into my flow between 2 & 3. Call it 2.5, ‘Create-toc’:

Screen Shot 2014-09-24 at 9.40.16 PM

It’s just a shell script:

cd ~/Documents/conversion-folder/Draft
ls *.md > nolinkcontents.md
sed -E -n 's/(^.*[0-9].*$)/ \* [\1](\1)/gpw contents.md' nolinkcontents.md 
rm nolinkcontents.md

Or, in human: go to the conversion folder. List out all the newly-created md files and write that list to a file called ‘nolinkcontents.md’. Then wrap a markdown link around each line, using the line itself as both the link text and the target, and write the result to ‘contents.md’. Then remove the temporary ‘nolinkcontents.md’.

Ladies and gentlemen, this has taken me the better part of four hours.

Anyway, this ‘contents.md’ file gets pushed to github, and since my project description page always links to it, we’re golden.
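
For reference, the whole push stage, done from the command line rather than the GitHub app, boils down to something like this; the folder names are guesses at my own layout, and ‘foobar’ is simply the remote name from the seed instructions quoted above:

# a rough shell version of the tweaked robot 3 (paths and remote name are placeholders)
cd ~/mdwiki-seed
cp ~/Documents/conversion-folder/Draft/*.md uploads/documents/
git add uploads/documents
git commit -m "update notes and table of contents"
git push foobar gh-pages
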

Of course, I realize now that I’ll have to modify things slightly, structurally and in my nomenclature, once I start pushing more than one project’s notes to the notebook. But that’s a task for another night.

Now to lesson plan for tomorrow.

(update: when I first posted this, I kept saying robot 4. Robot 4 is my take-out-the-trash robot, which cleans out the conversion folder, in readiness for the next time. I actually meant Robot 3. See Part III)

Open Notebooks Part III

Do my bidding my robots!

I’ve sussed the Scrivener syncing issue by moving the process of converting out of the syncing folder (remember, not the actual project folder, but the ‘sync to external folder’). I’ve then created four Automator applications to push my stuff to github in lovely markdown. Another thing I’ve learned today: when writing in Scrivener, just keep your formatting simple. Don’t use markdown syntax within Scrivener, or your stuff on github will end up looking like this: \##second-heading. I mean, it’s still legible, but not as legible as we’d like.

So – I have four robots. I write in Scrivener, keep my notes, close the session, whereupon it syncs rtf to the ‘external folder’ (in this case, a Dropbox folder set aside for this purpose; again, not the actual Scrivener project folder).

  1. I hit robot 1 on my desktop. Right now, this is called ‘abm-project-move-to-conversion-folder’. When I have a new project, I just open this application in Automator, and change the source directory to that project’s Scrivener external syncing folder. It grabs everything out of that folder, and copies it into a ‘conversion-folder’ that lives on my machine.
  2. I hit robot 2, ‘convert-rtf-to-md’, which opens ‘conversion-folder’ and turns everything it finds into markdown. The conversion scripts live in the ‘conversion-folder’; the things to be converted live in a subfolder, conversion-folder/Draft.
  3. I hit robot 3, ‘push-converted-files-to-github-repo’. This grabs just the markdown files, and copies them into my local github repository for the project. When I have a new project, I’d have to change this application to point to the new folder. This also overwrites anything with the same file name.
  4. I hit robot 4, ‘clean-conversion-folder’, which moves everything (rtfs, mds) to the trash. This is necessary because otherwise I can end up with duplicates of files I haven’t actually modified getting through my pipeline onto my github page. (If you look at some of my experiments on github, you’ll see the same card a number of times, with 1…2…3…4 versions.) A rough shell sketch of what these four robots amount to follows.
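
Every path below is a placeholder for my own layout, and robot 4 really sends things to the Trash rather than deleting them outright, but in shell terms the four robots boil down to something like this:

# what the four robots amount to, roughly (all paths are placeholders)
cp ~/Dropbox/abm-project-sync/Draft/*.rtf ~/Documents/conversion-folder/Draft/  # robot 1: grab from the sync folder
cd ~/Documents/conversion-folder && ./rtf2md Draft/*.rtf                        # robot 2: convert rtf to markdown
cp ~/Documents/conversion-folder/Draft/*.md ~/github/abm-project/               # robot 3: copy into the local github repo, overwriting
rm ~/Documents/conversion-folder/Draft/*                                        # robot 4: clean out the conversion folder
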

Maybe it’s possible to create a meta-automator that strings those four robots into one. I’ll try that someday.
[pause]
Ok, so of course, I tried stringing them just now. And it didn’t work. So I put that automator into the trash -
[pause]
and now my original four robots give me errors: ‘the application …. can’t be opened. -1712’. I found the solution here (basically, go to Spotlight, type in ‘activity’, then locate the application on the list and quit it).

Here are my automators:

Robot 1

Robot 2

Robot 3

Robot 4

Automator….

I think I love you.

 

An Open Research Notebook Workflow with Scrivener and Github Part 2: Now With Dillinger.io!

A couple of updates:

First item

The four scripts that sparkygetsthegirl crafted allow him to

1. write in Scrivener,

2. sync to a Dropbox folder,

3. Convert to md,

4. then open those md files on an Android tablet to write/edit/add,

5. and then reconvert to rtf for syncing back into Scrivener.

Screen Shot 2014-09-19 at 2.24.27 PM

I wondered to myself, what about some of the online markdown editors? Dillinger.io can scan Dropbox for md files. So, I went to Dillinger.io, linked it to my Dropbox, scanned for md files, and lo! I found my project notes. So if the syncing folder is shared with other users, they can edit the notecards via Dillinger. Cool, eh? Not everyone has a native app for editing, so they can just point their device’s browser at the website. I’m sure there are more options out there.

Second Item

I was getting syncing errors because I wasn’t flipping the md back to rtf.

But, one caveat: when I went to run the md-to-rtf script, to get my changes back into Scrivener (and then sync), things seemed to go very wonky indeed. One card was now blank, and the others were full of Scrivener’s markup that Scrivener itself wasn’t recognizing.

So I think the problem is me doing things out of order. I continue to play.

Third Item

I automated the running of the conversion scripts. You can see my Automator setup in the screenshot below. Again, I saved it as an application on my desktop. The first step is to grab the right folder; the second is to open the terminal, input the commands, then close the terminal.

Screen Shot 2014-09-19 at 2.36.03 PM
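
If you’d rather not script the Terminal at all, the same two commands can go into a single ‘Run Shell Script’ action instead; the sync folder path here is a placeholder for wherever yours lives:

# alternative: one Run Shell Script action instead of opening and closing Terminal
# (the sync folder path is a placeholder)
cd ~/Dropbox/scrivener-sync
./rtf2md Draft/*.rtf
./rtf2md Notes/*.rtf
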

Postscript

I was asked why on earth I would want to share my research notes. Many many reasons – see Caleb McDaniel’s post, for instance – but one other feature is that, because I’m doing this on Github, a person could fork (copy) my entire research archive. They could then build upon it. Github keeps track of who forks what, so forking becomes a kind of mass citation and breadcrumb trail showing who had an idea first. Moreover, github code (or in this case, my research archive) can be archived on figshare too, thus giving it a unique DOI *and* proper digital archiving in multiple locations. Kinda neat, eh?

An Open Research Notebook Workflow with Scrivener and Github

I like Scrivener. I *really* like being able to have my research and my writing in the same place, and most of all, I like being able to re-arrange the cards until I start to see the ideas fall into place.

I’m a bit of a visual learner, I suppose. (Which makes it ironic that I so rarely provide screenshots here. But I digress). What I’ve been looking for is a way to share my research, my lab notes, my digital ephemera in a single notebook. Lots of examples are out there, but another criterion is that I need to be able to set something up that my students might possibly be able to replicate.

So my requirements:

1. Visually see my notes, their layout, their possible logical connections. The ability to rearrange my notes provides the framework for my later written outputs.

2. Get my notes (but not all of the other bits and pieces) onto the web in such a way that each note becomes a citable object, with revision history freely available.

3. Ideally, that could then feed into some sort of shiny interface for others’ browsing – something like Jekyll, I guess – but not really a big deal at the moment.

So #1 is taken care of with Scrivener. Number 2? I’m thinking Github. Number 3? We’ll worry about that some other day. There are Scrivener project templates that can be dropped into a Github repository (see previous post). You would create a folder/repo on your computer, drop the template into that, and write away to your heart’s content, committing and syncing at the end of the day. This is what you’d get. All those slashes and curly brackets tell Scrivener what’s going on, but it’s not all that nice to read. (After all, that solution is about revision history, not open notebooks.)
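
For the record, that end-of-day ‘commit and sync’ is only a few commands if you do it from the terminal rather than the GitHub app (the repo path and branch name below are placeholders):

# end-of-day commit and sync, roughly (repo path and branch are placeholders)
cd ~/projects/my-scrivener-notebook
git add -A
git commit -m "today's notes"
git push origin master
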

Now, it is possible to manually compile your whole document, or bits at a time, into markdown files and to commit/sync those. That’s nice, but time consuming. What I think I need is some way to turn Scrivener’s rtfs into nice markdown. I found this, a collection of scripts by Sparkygetsthegirl as part of a Scrivener-to-Android-tablet-and-back writing flow. Check it out! Here’s how it works. NB, this is all Mac based, today.

1. Make a new Scrivener project.

2. Sync it to Dropbox (which is nice: backups, portability via Dropbox, sharing via Github! see below).

3. Drop the 4 scripts into the synced folder. Open a terminal window there. We’ll come back to that.

4. Open Automator. What we’re going to do is create an application that will open the ‘drafts’ folder in the synced project, grab everything, then filter for just the markdown files we made, then move them over to our github repo, overwriting any pre-existing files there. Here’s a screenshot of what that application looks like in the Automator editing screen:

Remember, you’re creating an ‘application’, not a ‘workflow’

You drag the drafts folder into the ‘Get specified finder items’ box, get the folder contents, filter for files with file extension .md, and then copy to your github repo. Tick off the overwrite checkbox.

Back in Scrivener, you start to write.

Write write write.

Here’s a screenshot of how I’m setting up a new project.

Screen Shot 2014-09-17 at 1.50.14 PM

In this screenshot, I’ve already moved my notecards from ‘research’ into ‘draft’. In a final compile, I’d edit things heavily, add bits and pieces to connect the thoughts, shuffle them around, etc. But right now, you can see one main card that identifies the project and the pertinent information surrounding it (like for instance, when I’m supposed to have this thing done). I can compile just that card into multimarkdown, and save it directly to the github repository as readme.md.

Now that the day is done, and I’m finished writing/researching/playing, I sync the project one last time. Then, in the terminal window, I can type

./rtf2md Draft/*.rtf

for everything in the draft folder, and

./rtf2md Notes/*.rtf

for everything in the notes folder. Mirabile dictu, the resulting md files will have the title of the notecard as their file name!

Screen Shot 2014-09-17 at 1.56.06 PM

Here, I’ve used some basic citation info as the name for each card; a better idea might be to include tags in there too. Hey, this is all still improv theatre.

Now, when I created that application using Automator, I saved it to my desktop. I double-click on it, and it strains out the md files and moves them over to my github repository. I then commit & sync, and I now have an open lab notebook on the web. There are still some glitches, though: the markdown syntax I wrote in Scrivener isn’t being recognized on github, because Scrivener, I think, is adding backslashes here and there, which act like escape characters.

Anyway, this seems a promising start. When I do further analysis in R, or build a model in Netlogo, I can record my observations this way, create an R notebook with knitr or a netlogo applet, and push these into subfolders in this repo. Thus the whole thing will stick together.

I think this works.

~o~
Update Sept 18. I’ve discovered that I might have messed something up with my syncing. It could be I’ve just done something foolish locally, or it might be something with my workflow. I’m investigating, but the upshot is, I got an error when I synced, along with a new folder called ‘Trashed Files’. I think I’m close to my ideal setup, but there’s still something wonky. Stay tuned.

Update Sept 19. Don’t write in Scrivener using markdown syntax! I had a ‘doh’ moment. Write in Scrivener using bold, italics, bullets, etc. to mark up your text. Then, when the script converts to markdown, it’ll format it correctly – which means that github will render it more or less correctly, making your notes a whole lot easier to read. Click on ‘raw’ on this page to see what I mean!

Open Notebooks

This post is more a reminder to me than anything you’d like to read, but anyway –

I want to make my research more open, more reproducible, and more accessible. I work from several locations, so I want to have all my stuff easily to hand. I work on a Mac (sometimes), a PC (sometimes), and on Linux (rarely, but it happens; with new goodies from Bill Turkel et al I might work more there!).

I build models in Netlogo. I do text analysis in R. I visualize and analyze with things like Voyant and Overview. I scrape websites. I use Excel quite a lot. I’m starting to write in markdown more often. I want to teach students (my students typically have fairly low levels of digital literacy) how to do all this too. What I don’t do is much web development type stuff, which means that I’m still struggling with concepts and workflow around things like version control. And indeed, getting access to a server where I can just screw around to try things out is difficult (for a variety of reasons). So my server-side skills are weak.

What I think I need is an open notebook. Caleb McDaniel has an excellent post on what this could look like. He uses Gitit. I looked at the documentation, and was defeated out of the gate. Carl Boettiger uses a combination of github and jekyll and who knows what else. What I really like is Mark Madsen’s example, but I’m not au fait enough yet with all the bits and pieces (damn you version control, commits, make, rake, et cetera et cetera!).

I’ve got IPython notebooks working on my PC, which are quite cool (I installed the Anaconda version). I don’t know much Python though, so yeah. Stefan Sinclair is working on ‘Voyant notebooks’, which uses the same general idea to wrap analysis around Voyant, so I’m looking forward to that. IPython can be used to call R, which is cool, but it’s still early days for me (here’s a neat example passing data to R’s ggplot2).

So maybe that’s just the wrong tool. Much of what I want to do, at least as far as R is concerned, is covered in this post by Robert Flight on ‘creating an analysis as a package and vignette’ in RStudio. And there’s also this, for making sure things are reproducible: ‘packrat’.

Some combination of all of this, I expect, will be the solution that works for me. Soon I want to start doing some more agent-based modeling & simulation work, and it’s mission critical that I sort out my data management, notebooks, versioning, etc. first this time.

God, you should see the mess around here from the last time!

SAA 2015: Macroscopic approaches to archaeological histories: Insights into archaeological practice from digital methods

Ben Marwick and I are organizing a session for the SAA2015 (the 80th edition, this year in San Francisco) on “Macroscopic approaches to archaeological histories: Insights into archaeological practice from digital methods”. It’s a pretty big tent. Below is the session ID and the abstract. If this sounds like something you’d be interested in, why don’t you get in touch?

Session ID 743.

The history of archaeology, like most disciplines, is often presented as a sequence of influential individuals and a discussion of their greatest hits in the literature.  Two problems with this traditional approach are that it sidelines the majority of participants in the archaeological literature who are excluded from these discussions, and it does not capture the conversations outside of the canonical literature.  Recently developed computationally intensive methods as well as creative uses of existing digital tools can address these problems by efficiently enabling quantitative analyses of large volumes of text and other digital objects, and enabling large scale analysis of non-traditional research products such as blogs, images and other media. This session explores these methods, their potentials, and their perils, as we employ so-called ‘big data’ approaches to our own discipline.

—-

Like I said, if that sounds like something you’d be curious to know more about, ping me.

Quickly Extracting Data from PDFs

By ‘data’, I mean the tables. There are lots of archaeological articles out there that you’d love to compile together to do some sort of meta-study. Or perhaps you’ve gotten your hands on pdfs with tables and tables of census data. Wouldn’t it be great if you could just grab that data cleanly? Jonathan Stray has written a great synopsis of the various things you might try and has sketched out a workflow you might use. Having read that, I wanted to try ‘Tabula’, one of the options that he mentioned. Tabula is open source and runs on all the major platforms. You simply download it and double-click on the icon; it runs within your browser. You load your pdf into it, and then draw bounding boxes around the tables that you want to grab. Tabula will then extract that table cleanly, allowing you to download it as a csv or tab-separated file, or paste it directly into something else.

For instance, say you’re interested in the data that Gill and Chippindale compiled on Cycladic Figures. You can grab the pdf from JSTOR:

Material and Intellectual Consequences of Esteem for Cycladic Figures
David W. J. Gill and Christopher Chippindale
American Journal of Archaeology , Vol. 97, No. 4 (Oct., 1993) , pp. 601-659
Article DOI: 10.2307/506716

Download it, and then feed it into Tabula. Let’s look at table 2.

gillchippendaletable2
You could just highlight this table in your pdf reader and hit ctrl+c to copy it; when you paste that into your browser, you’d get:
gillchippendaletable2cutnpaste
Everything in a single column. For a small table, maybe that’s not such a big deal. But let’s look at what you get with Tabula. You drag the square over that same table; when you release the mouse button you get:
tabula1
Much, much cleaner & faster! I say ‘faster’, because you can quickly drag the selection box around every table and hit download just the one time. Open the resulting csv file, and you have all of your tables in a useful format:
tabula2
But wait, there’s more! Since you can copy directly to the clipboard, you can paste directly into a Google Drive spreadsheet (thus taking advantage of all the visualization options that Google offers) or into something like Raw from Density Design.
Tabula is a nifty little tool that you’ll probably want to keep handy.

Gaze & Eonydis for Archaeological Data

I’m experimenting with Clement Levallois’ data mining tools ‘Gaze’ and ‘Eonydis’. I created a table with some mock archaeological data in it: artefact, findspot, and date range for the artefact. More on dates in a moment. Here’s the fake dataset.

Firstly, Gaze will take a list of nodes (source, target), and create a network where the source nodes are connected to each other by virtue of sharing a common target. Clement explains:

Paul,dog
Paul, hamster
Paul,cat
Gerald,cat
Gerald,dog
Marie,horse
Donald,squirrel
Donald,cat
… In this case, it is interesting to get a network made of Paul, Gerald, Marie and Donald (sources nodes), showing how similar they are in terms of pets they own. Make sure you do this by choosing “directed networks” in the parameters of Gaze. A related option for directed networks: you can choose a minimum number of times Paul should appear as a source to be included in the computations (useful to filter out unfrequent, irrelevant nodes: because you want only owners with many pets to appear for instance).

The output is in a nodes.dl file and an edges.dl file. In Gephi, go to the import spreadsheet button on the data table, import the nodes file first, then the edges file. Here’s the graph file.

Screenshot, Gaze output into Gephi, from mock archaeo-data

Eonydis, on the other hand, takes that same list and, if it has time-stamps within it (a column with dates), will create a dynamic network over time. My mock dataset above seems to cause Eonydis to crash – is it my negative numbers? How do you encode dates from the Bronze Age in the day/month/year system? Checking the documentation, I see that I didn’t have proper field labels, so I needed to fix that. Trying again, it still crashed. I fiddled with the dates to remove the range (leaving a column to imply ‘earliest known date for this sort of thing’), which gave me this file.

Which still crashed. Now I have to go do some other stuff, so I’ll leave this here and perhaps one of you can pick up where I’ve left off. The example file that comes with Eonydis works fine, so I guess when I return to this I’ll carefully compare the two. Then the task will be to work out how to visualize dynamic networks in Gephi. Clement has a very good tutorial on this.

Postscript:

Ok, so I kept plugging away at it. I found if I put the dates yyyy-mm-dd, as in 1066-01-23 then Eonydis worked a treat. Here’s the mock data and here’s the gexf.

And here’s the dynamic animation! http://screencast.com/t/Nlf06OSEkuA

Post post script:

I took the mock data (archaeo-test4.csv) and prepended a - to the dates, thus -1023-01-01 to represent dates BC. In Eonydis, where it asks for the date format, I tried this:

#yyyy#mm#dd  which accepted the dates, but dropped the negative;

-yyyy#mm#dd, which accepted the dates and also dropped the negative.

Thus, it seems to me that I can still use Eonydis for archaeological data, but I should frame my date column in relative terms rather than absolute, as absolute isn’t really necessary for the network analysis/visualization anyway.
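
One way to do that recasting, for what it’s worth: shift the BC dates into an arbitrary positive range before feeding the file to Eonydis. The one-liner below is just a sketch – it assumes every date in the file is BC and written like -1023-01-01, and that the date sits in the third column of the csv:

# recast BC dates (e.g. -1023-01-01) as relative dates Eonydis will accept
# assumes all dates are BC and sit in column 3; the 3000 offset is arbitrary but keeps the ordering
awk -F, -v OFS=, 'NR==1 {print; next} {split(substr($3,2), d, "-"); $3 = sprintf("%04d-%s-%s", 3000 - d[1], d[2], d[3]); print}' archaeo-test4.csv > archaeo-relative.csv
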

How I Lost the Crowd: A Tale of Sorrow and Hope

Yesterday, my HeritageCrowd project website was annihilated. Gone. Kaput. Destroyed. Joined the choir.

It is a dead parrot.

This is what I think happened, what I now know and need to learn, and what I think the wider digital humanities community needs to think about/teach each other.

HeritageCrowd was (may be again, if I can salvage from the wreckage) a project that tried to encourage the crowdsourcing of local cultural heritage knowledge for a community that does not have particularly good internet access or penetration. It was built on the Ushahidi platform, which allows folks to participate via cell phone text messages. We even had it set up so that a person could leave a voice message and software would automatically transcribe the message and submit it via email. It worked fairly well, and we wrote it up for Writing History in the Digital Age. I was looking forward to working more on it this summer.

Problem #1: Poor record keeping of the process of getting things installed, and the decisions taken.

Now, originally, we were using the Crowdmap hosted version of Ushahidi, so we wouldn’t have to worry about things like security, updates, servers, that sort of thing. But… I wanted to customize the look, move the blocks around, and make some other cosmetic changes so that Ushahidi’s genesis in crisis-mapping wouldn’t be quite as evident. When you repurpose software meant for one domain to another, it’s the sort of thing you do. So, I set up a new domain, got some server space, downloaded Ushahidi and installed it. The installation tested my server skills. Unlike setting up WordPress or Omeka (which I’ve done several times), Ushahidi requires the concomitant setup of ‘Kohana’. This was not easy. There are many levels of tacit knowledge in computing, and especially in web-based applications, that I, as an outsider, have not yet learned. It takes a lot of trial and error, and sometimes, just dumb luck. I kept poor records of this period – I was working to a tight deadline, and I wanted to just get the damned thing working. Today, I have no idea what I actually did to get Kohana and Ushahidi playing nice with one another. I think it actually boiled down to file structure.

(It’s funny to think of myself as an outsider, when it comes to all this digital work. I am after all an official, card-carrying ‘digital humanist’. It’s worth remembering what that label actually means. At least one part of it is ‘humanist’. I spent well over a decade learning how to do that part. I’ve only been at the ‘digital’ part since about 2005… and my experience of ‘digital’, at least initially, is in social networks and simulation – things that don’t actually require me to mount materials on the internet. We forget sometimes that there’s more to the digital humanities than building flashy internet-based digital tools. Archaeologists have been using digital methods in their research since the 1960s; Classicists at least that long – and of course Father Busa).

Problem #2: Computers talk to other computers, and persuade them to do things.

I forget where I read it now (it was probably Stephen Ramsay or Geoffrey Rockwell), but digital humanists need to consider artificial intelligence. We do a humanities not just of other humans, but of humans’ creations that engage in their own goal-directed behaviours. As someone who has built a number of agent based models and simulations, I suppose I shouldn’t have forgotten this. But on the internet, there is a whole netherworld of computers corrupting and enslaving each other, for all sorts of purposes.

HeritageCrowd was destroyed so that one computer could persuade another computer to send spam to gullible humans with erectile dysfunction.

It seems that Ushahidi was vulnerable to ‘Cross-site Request Forgery’ and ‘Cross-site Scripting’ attacks. I think what happened to HeritageCrowd was an instance of persistent XSS:

The persistent (or stored) XSS vulnerability is a more devastating variant of a cross-site scripting flaw: it occurs when the data provided by the attacker is saved by the server, and then permanently displayed on “normal” pages returned to other users in the course of regular browsing, without proper HTML escaping.

When I examine every php file on the site, I find all sorts of injected base64 code. So this is what killed my site. Once my site started flooding spam all over the place, the internet’s immune systems (my host’s own, and others) shut it all down. Now, I could just clean everything out and reinstall, but the more devastating issue: it appears my SQL database is gone. Destroyed. Erased. No longer present. I’ve asked my host to help confirm that, because at this point, I’m way out of my league. Hey all you lone digital humanists: how often does your computing services department help you out in this regard? Find someone at your institution who can handle this kind of thing. We can’t wear every hat. I’ve been a one-man band for so long, I’m a bit like the guy in Shawshank Redemption who asks his boss at the supermarket for permission to go to the bathroom. Old habits are hard to break.
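
(For anyone facing the same mess, a quick way to see how far the injection has spread is to grep the whole install for the telltale functions; expect some false positives, and treat the path below as a placeholder for wherever your site lives:)

# list php files that contain the usual markers of injected code (path is a placeholder)
grep -rIl --include='*.php' -e 'base64_decode' -e 'eval(' ~/public_html
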

Problem #3: Security Warnings

There are many Ushahidi installations all over the world, and they deal with some pretty sensitive stuff. Security is therefore something Ushahidi takes seriously. I should’ve too. I was not subscribed to the Ushahidi Security Advisories. The hardest pill to swallow is when you know it’s your own damned fault. The warning was there; heed the warnings! Schedule time into every week to keep on top of security. If you’ve got a team, task someone to look after this. I have lots of excuses – it was end of term, things were due, meetings to be held, grades to get in – but it was my responsibility. And I dropped the ball.

Problem #4: Backups

This is the most embarrassing to admit. I did not back things up regularly. I am not ever making that mistake again. Over on Looted Heritage, I have an IFTTT recipe set up that sends every new report to BufferApp, which then tweets it. I’ve also got one that sends every report to Evernote. There are probably more elegant ways to do this. But the worst would be to remind myself to manually download things. That didn’t work the first time. It ain’t gonna work the next.
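
(The boring but effective route for a self-hosted install like this one is a scheduled database dump; every name, password, and path below is a placeholder:)

# nightly dump of the site database to a dated, compressed file (all names/credentials/paths are placeholders)
mysqldump -u dbuser -p'dbpassword' heritagecrowd_db | gzip > ~/backups/heritagecrowd-$(date +%F).sql.gz
# put that line in a small script and schedule it, e.g. via 'crontab -e':
# 0 3 * * * /home/me/backup-db.sh
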

So what do I do now?

If I can get my database back, I’ll clean everything out and reinstall, and then progress onwards wiser for the experience. If I can’t… well, perhaps that’s the end of HeritageCrowd. It was always an experiment, and as Scott Weingart reminds us,

The best we can do is not as much as we can, but as much as we need. There is a point of diminishing return for data collection; that point at which you can’t measure the coastline fast enough before the tides change it. We as humanists have to become comfortable with incompleteness and imperfection, and trust that in aggregate those data can still tell us something, even if they can’t reveal everything.

The HeritageCrowd project taught me quite a lot about crowdsourcing cultural heritage, about building communities, about the problems, potentials, and perils of data management. Even in its (quite probable) death, I’ve learned some hard lessons. I share them here so that you don’t have to make the same mistakes. Make new ones! Share them! The next time I go to THATCamp, I know what I’ll be proposing. I want a session on the Black Hats, and the dark side of the force. I want to know what the resources are for learning how they work, what I can do to protect myself, and frankly, more about the social and cultural anthropology of their world. Perhaps there is space in the Digital Humanities for that.

PS.

When I discovered what had happened, I tweeted about it. Thank you everyone who responded with help and advice. That’s the final lesson I think, about this episode. Don’t be afraid to share your failures, and ask for help. As Bethany wrote some time ago, we’re at that point where we’re building the new ways of knowing for the future, just like the Lunaticks in the 18th century. Embrace your inner Lunatick:

Those 18th-century Lunaticks weren’t about the really big theories and breakthroughs – instead, their heroic work was to codify knowledge, found professional societies and journals, and build all the enabling infrastructure that benefited a succeeding generation of scholars and scientists.

[...]

if you agree with me that there’s something remarkable about a generation of trained scholars ready to subsume themselves in collaborative endeavors, to do the grunt work, and to step back from the podium into roles only they can play – that is, to become systems-builders for the humanities — then we might also just pause to appreciate and celebrate, and to use “#alt-ac” as a safe place for people to say, “I’m a Lunatick, too.”

Perhaps my role is to fail gloriously & often, so you don’t have to. I’m ok with that.