Category Archives: tools
In the latest MIT Technology Review, there’s a short piece on the ‘Lytro’, a camera that captures not just the light that falls on its sensor, but also the angle of that light. This feature allows different information, different kinds of shots, to be extracted computationally after the button is pressed.
I want one. They sell for $500.
Think of the archaeological uses! I’m no photographer, but as I understand things, a lot of archaeological photography comes down to the creative use of oblique angles, whether to see crop marks or to pick out very fine details of artefacts. If the Lytro captures the angles of the light hitting its sensors, then presumably one could take a shot, post the database of information associated with that shot, and then allow other [digital] archaeologists to comb through that data, extracting information and pictures of relevance. Perhaps a single photo of the soil could be combed through to highlight different textures, colours, etc. Try out their gallery here.
The future of this camera is in the software apps developed to take advantage of the massive database of information that it will generate:
Refocusing images after they are shot is just the beginning of what Lytro’s cameras will be able to do. A downloadable software update will soon enable them to capture everything in a photo in sharp focus regardless of its distance from the lens, which is practically impossible with a conventional camera. Another update scheduled for this year will use the data in a Lytro snapshot to create a 3-D image. Ng is also exploring a video camera that could be focused after shots were taken, potentially giving home movies a much-needed boost in production values.
UPDATE! September 19th 2012: Scott Weingart, Ian Milligan, and I have written an expanded ‘how to get started with Topic Modeling and MALLET’ for the Programming Historian 2. Please do consult that piece for detailed step-by-step instructions for getting the software installed, getting your data into it, and thinking through what the results might mean.
Original Post that Inspired It All:
I’m very interested in topic modeling at the moment. It has not been easy however to get started – I owe a debt of thanks to Rob Nelson for helping me to get going. In the interests of giving other folks a boost, of paying it forward, I’ll share my recipe. I’m also doing this for the benefit of some of my students. Let’s get cracking!
First, some background reading:
- Clay Templeton, “Topic Modeling in the Humanities: An Overview | Maryland Institute for Technology in the Humanities”, n.d., http://mith.umd.edu/topic-modeling-in-the-humanities-an-overview/.
- Rob Nelson, Mining the Dispatch http://dsl.richmond.edu/dispatch/
- Cameron Blevins, “Topic Modeling Martha Ballard’s Diary” Historying, April 1, 2010, http://historying.org/2010/04/01/topic-modeling-martha-ballards-diary/
- David J Newman and Sharon Block, “Probabilistic topic decomposition of an eighteenth‐century American newspaper,” Journal of the American Society for Information Science and Technology 57, no. 6 (April 1, 2006): 753-767.
- David Blei, Andrew Ng, and Michael Jordan, “Latent Dirichlet Allocation,” The Journal of Machine Learning Research 3 (2003), http://dl.acm.org/citation.cfm?id=944937.
Then, you’ll need the Java Developer’s Kit (JDK) – nb, not the regular Java runtime that’s on every computer, but the one that lets you program things. Install this.
Download Mallet and unzip it into your C:/ directory. This is important; it can’t be anywhere else. You’ll then have a folder called C:/mallet-2.0.6 or similar.
Next, you’ll need to create an environment variable called MALLET_HOME. You do this by clicking on control panel >> system >> advanced system settings (in Windows 7; for XP, see this article) >> ‘environment variables’. In the pop-up, click ‘new’ and type MALLET_HOME in the variable name box; in the variable value box, type c:/mallet-2.0.6 (i.e., the exact location where you unzipped Mallet).
To run Mallet, click on your start menu >> all programs >> accessories >> command prompt. You’ll get the command prompt window, with a cursor at c:\user\user> (or similar). Type cd .. (two periods; that ain’t a typo) to go up a level; keep doing this until you’re at C:\. Then type cd mallet-2.0.6 and you’re in the Mallet directory. You can now type Mallet commands directly. If you type bin\mallet at this point, you should be presented with a list of Mallet commands – congratulations!
At this point, you’ll want some data. Using the regular windows explorer, I create a folder within mallet where I put all of the data I want to study (let’s call it ‘data’). If I were to study someone’s diary, I’d create a unique text file for each entry, naming the text file with the entry’s date. Then, following the topic modeling instructions on the mallet page, I’d import that folder, and see what happens next. I’ve got some work flow for scraping data from websites and other repositories, but I’ll leave that for another day (or skip ahead to The Programming Historian for one way of going about it.)
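Chopping a source like a diary into one text file per entry is easily scripted. Here is a minimal sketch, assuming a hypothetical transcript format where each entry begins with a date line like 1785-01-03; the input layout, folder names, and pattern are illustrative assumptions you would adapt to your own source:

```python
# Sketch: split a plain-text diary into one file per entry for Mallet.
# Assumes a hypothetical input format in which each entry starts with a
# line holding an ISO date (YYYY-MM-DD); adjust the pattern to your source.
import os
import re

def split_diary(diary_path, out_dir="data/johndoediary"):
    os.makedirs(out_dir, exist_ok=True)
    with open(diary_path, encoding="utf-8") as f:
        text = f.read()
    # Split on lines that consist of a date; re.split keeps the captured
    # dates, giving [preamble, date1, body1, date2, body2, ...]
    parts = re.split(r"(?m)^(\d{4}-\d{2}-\d{2})\s*$", text)
    for date, body in zip(parts[1::2], parts[2::2]):
        # Name each entry's file after its date, as described above
        with open(os.path.join(out_dir, date + ".txt"), "w", encoding="utf-8") as out:
            out.write(body.strip())
```

The resulting folder is then what you point import-dir at.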
Once you’ve imported your documents, Mallet creates a single ‘mallet’ file that you then manipulate to determine topics.
bin\mallet import-dir --input data\johndoediary --output johndoediary.mallet --keep-sequence --remove-stopwords
(modified from the Mallet topic modeling page)
This command tells Mallet to import a directory called ‘johndoediary’ (which contains a sequence of txt files), located in the subfolder ‘data’. It then outputs that data into a file we’re calling ‘johndoediary.mallet’. The –remove-stopwords flag strips out ‘and’, ‘of’, ‘the’, etc.
Then we’re ready to find some topics:
bin\mallet train-topics --input johndoediary.mallet --num-topics 100 --output-state topic-state.gz --output-topic-keys johndoediary_keys.txt --output-doc-topics johndoediary_composition.txt
(modified from the Mallet topic modeling page)
Now, there are more complicated things you can do with this – take a look at the documentation on the Mallet page. Is there a ‘natural’ number of topics? I do not know. What I have found is that I have to run the train-topics with varying numbers of topics to see how the composition file breaks down. If I end up with the majority of my original texts all in a very limited number of topics, then I need to increase the number of topics; my settings were too coarse.
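That eyeballing of the composition file can be partly automated. The sketch below tallies each document’s single strongest topic, assuming the older tab-separated doc-topics format (doc number, filename, then alternating topic/proportion pairs); newer Mallet releases write one column per topic instead, so check what your version emits before relying on it:

```python
# Sketch: tally each document's top topic from a Mallet doc-topics file.
# Assumes the older output format:
#   doc-number <tab> filename <tab> topic proportion topic proportion ...
from collections import Counter

def top_topic_spread(composition_path):
    top_topics = Counter()
    with open(composition_path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):  # skip the header row
                continue
            fields = line.split()
            # topic/proportion pairs begin at the third field
            pairs = list(zip(fields[2::2], map(float, fields[3::2])))
            best_topic = max(pairs, key=lambda p: p[1])[0]
            top_topics[best_topic] += 1
    return top_topics
```

If the tally shows most documents piling into only a handful of top topics, that is the signal to re-run train-topics with a larger –num-topics.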
More on interpreting the output of Mallet to follow.
Again, I owe an enormous debt of gratitude to Rob Nelson for talking me through the intricacies of getting Mallet to work, and for the record, I think the work he is doing is tremendously important and fascinating!
In the Chronicle of Higher Education, there is a troubling piece written by a fellow who writes and sells papers for/to students. Which got me to thinking: shouldn’t text analysis be able to solve this?
Here’s my thinking: I’m willing to bet every author produces unique combinations of words and phrases – a concept that Amazon, for instance, uses to improve its search functions (“statistically improbable phrases”). As the ‘ghost writer’ points out, most of the emails he gets from students are nearly illegible or otherwise atrocious. So – what if, at the start of a school year, you sat all of your students down to handwrite a couple thousand words on any topic? Writing by hand is important, so that you get that student’s actual, genuine writing. Scan it all in. Perform text analysis on it. Obtain a ‘signature’ for that student’s style. Then, when students submit their papers, analyze them again and compare the signatures. Where the signatures don’t match within a certain range, bring the student in to talk about their work. Chances are, if they didn’t write it, they probably haven’t read it either…. Repeat each year to account for developing skill and ability.
Perhaps I’m naive, and text analysis isn’t at that level yet (but I’m willing to bet it could be…). If the problem is a student submits someone else’s work as his own, then maybe if we had a clear signal of his own true work, all this latent computer power sitting around could be brought into the equation…?
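For what it’s worth, the core of such a comparison isn’t exotic. A toy sketch: profile each writer by the relative frequency of common function words (a standard stylometric move), then compare signatures with cosine similarity. The word list and any threshold here are illustrative assumptions, not a calibrated method:

```python
# Toy stylometric 'signature': relative frequencies of common function
# words, compared by cosine similarity. Illustrative only - a real
# system would need a larger feature set and calibrated thresholds.
import math
import re

FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "with", "as", "but", "on", "not"]

def signature(text):
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    return [words.count(w) / total for w in FUNCTION_WORDS]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

A paper whose similarity to the student’s handwritten baseline falls well below that student’s usual range would simply flag a conversation, not prove anything.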
Just a thought.
I’m very interested in augmented reality for interpreting/experiencing landscapes (archaeological or historical). I’ve explored things like Wikitude and Layar. There’s a great deal of flexibility and possibility with those two, if you’ve got the ability and resources to do a bit of programming. Skidmore College has used Layar with success to produce a Campus Map Layar (follow that link for excellent pointers on how they did it). But what if you’d like to explore the potential of AR, but don’t have the programming skills?
One platform that I’ve come across recently which can help there is called ‘7Scenes’. It explicitly bills itself as a ‘mobile storytelling platform’. The free account allows you a basic ‘tour’ kind of story to tell; presumably if you purchase another kind of account, different genres become available to you.
I signed up for the free account, and began playing around with it (I’m ‘DoctorG’ if you’re looking). Even with this level of functionality, some playful elements are available – you can set quizzes by location, for instance, and keep score. A tour of your campus for first year students as part of orientation could include quizzes at crucial points.
In the editor window, you first select the genre. Then details (backstory, introduction etc).
The real work begins in the map window. When you add a location, you can make it trigger information or photos when the player encounters it. You can also build in simple quizzes, as in the screenshot.
Once the ‘scene’ is published, anyone with 7scenes on their smartphone can access it. The app knows where you are, and pulls in the closest scene. In about 15 minutes I created a scene with 3 locations, one photo, one info panel, and one quiz, around the main quad here at Carleton. Then, I fired up the app on my iPhone and went outside. Even though it was quite simple, it was really rather engaging, wandering about the quad trying to get close enough to the point to trigger the interaction (note to scene makers: zoom into the map interface so that your location marker sits precisely where you want it. I put my first point outside my intended target, Paterson Hall, so I was wandering about the parking lot.)
I will be playing with this some more; but fired up after only a short investment in time, I wanted to share. The authoring environment makes sense, it’s easy to use, and the results are immediately apparent. When you log back into the 7scenes site, you also get usage metrics and reviews of your scene. If only my digital history students had more smartphones!
More on 7scenes from their own press page
Commuting in Ottawa is an interesting experience. It seems the entire city disappears in the summer, beguiling one into thinking that a commute that takes 30 – 40 minutes in August will continue to be 30 – 40 minutes in September.
This morning, I was pushing 1 hr and 40 minutes. On the plus side, this gives me the opportunity to listen to the podcasts from Scholars’ Lab, from the University of Virginia (available via iTunes U). As I listen to this excellent series of talks (one talk per commute…) I realize just how profoundly shallow my knowledge is of the latest happenings in Digital Humanities – and that’s a good thing! For instance, I learned about Intrasis, a system from Sweden for recording archaeological sites (or indeed, any kind of knowledge) that focuses on generating relationships from the data, rather than specifying beforehand a relationships table (and it melds very well with GIS). This is cool. I learned also about Heurist, a tool for managing research. Also ‘Heml’ – the Historical Event Markup and Linking Project, led by Bruce Robertson. As I listened to this last talk, as Bruce described the problems of marking up events/places/persons using non-Gregorian calendars and so on, it struck me that this problem was rather similar to the one of defining sites in a GIS – what do you do when the boundaries are fuzzy? How do you avoid the in-built precision of dots-on-a-map, or URLs that lead to one specific location? Time is Space, as Einstein taught us….
The upshot is, I feel very humbled when I listen to these in-depth and fascinating talks – I feel rather out of my depth. At the same time, I am excited to be able to participate in such a fast moving field. Hopefully, my small contributions to agent modeling for history generate the same kind of excitement for others!
…provided you blogged the whole thing in the first place.
How, you say?
Anthologize is a free, open-source, plugin that transforms WordPress 3.0 into a platform for publishing electronic texts. Grab posts from your WordPress blog, import feeds from external sites, or create new content directly within Anthologize. Then outline, order, and edit your work, crafting it into a single volume for export in several formats, including—in this release—PDF, ePUB, TEI.
How Anthologize came to be is remarkable in itself (see Dan Cohen’s blog) and is a model for what we as digitally-minded archaeology folks could be doing. Which puts me in mind of excavation reports, catalogues, and other materials produced in the day to day work of archaeology.
What if, in the course of doing your fieldwork/archive work/catalogue work/small finds work, you used WordPress as your content management system? There are plugins a-plenty for keeping things private, if that’s a concern. But once the work is complete, run Anthologize and voila: a publication fit for the 21st century.
And, since the constraints of paper publishing no longer apply, David Wilkinson’s thoughts on the fuller experience of archaeology could also now find easier expression – in 2007 I wrote the following:
But he asks, ‘what of characters in archaeological writing?’ Wilkinson’s paper is really making a plea for archaeologists to remember that they themselves are characters in the story of the site or landscape that they are studying, and that they should put themselves into it:
“We all sit in portacabins, in offices, in vans, in pubs or round fires, and we tell stories… we have a great time and drink too much and what do we do the next morning? We get up and go to our offices and we write, ‘In Phase 1 ditch 761 was recut (794) along part of its length.’ Surely, we can do better”.
A similar argument was made in the SAA Archaeological Record last May, by Cornelius Holtorf , in an article called ‘Learning from Las Vegas: Archaeology in the Experience Economy”. Holtorf argued:
“Learning from Las Vegas means learning to embrace and build upon the amazing fact that archaeologists can connect so well with some of the most widespread fantasies, dreams, and desires that people have today.[…] I am suggesting that the greatest value of archaeology in society lies in providing people with what they most desire from archaeology: great stories both about the past and about archaeological research.”
Archaeology – the doing of archaeology! – is a fantastic experience. You learn so much more about the past when you are at the coal-face itself, when you stand in 35 degree C heat, with the dust on your face so thick you almost choke, debating with the site supervisor the meaning of a complicated series of walls, or sitting at the bar afterwards with a cool beer, still debating the situation, laughing, chatting. Reading ‘Three shards of Vernice-Nera ware found in-situ below 342 indicate…’ sucks the fun out of archaeology. It certainly has no romance, which puts the practice of archaeology – as published to the public – far down the list of priorities in this modern Experience Economy. The serious face of archaeology we present to the public is so lifeless: how can we expect government and the public to be excited about our work if we ourselves give every indication of not being excited either?
I’m not arguing that we turn every site monograph into a graphic novel (though that’s an interesting idea, and has been done for teaching archaeology). But with the internet being the way it is these days: couldn’t a project website contain blogs and twitters (‘tweets’, actually) from the people working on it? Can’t we make the stories of the excavation at least as important as the story of the site?
Congratulations to the folks who participated in the creation of Anthologize; there’ll be great things ahead for this tool!
You can now map your Zotero Library:
Potential Use Cases:
Map Your Collection By Key Places:
Many records from library catalogs and journal databases come pre-loaded with geographic keywords. Zotero Maps lets you quickly see the relationships between the terms catalogers, authors, and publishers have assigned to the items in your collection. Similarly, as you apply your own geographic tags to items you can then explore those geographic relationships. Whether you’re looking at key locations in studies of avian flu, ethnographic work in the American southwest, or the history of the transatlantic slave trade, the tags associated with your items provide valuable geographic information.
Map Places of Publication:
In many cases places of publication include crucial information about your items. If you’re working on a project involving the history of the book, how different media outlets cover an issue, or how different journals present distinct scientific points of view, the places in which those items are published can provide valuable insight.
In 2007, I was trying something along these lines using Platial (now deceased). Now – since you can add objects from things like Opencontext.org into your Zotero library, and describe these using tags, you could begin to build a map of not only ‘things’ but also the relevant reports etc, all from your browser, without doing any of the fancy coding stuff…
From my library:
Heard of Twitter Times?
More recently, social media such as Twitter has provided a surprisingly good set of pointers toward worthy materials I should be reading or exploring. (And as happened with blogs five years ago, the critics are now dismissing Twitter as unscholarly, missing the filtering function it somehow generates among so many unfiltered tweets.) I follow as many digital humanists as I can on Twitter, and created a comprehensive list of people in digital humanities. (You can follow me @dancohen.)
Digital Humanities Now is a new web publication that is the experimental result of this thought. It aggregates thousands of tweets and the hundreds of articles and projects those tweets point to, and boils everything down to the most-discussed items, with commentary from Twitter. A slightly longer discussion of how the publication was created can be found on the DHN “About” page.
I’m following mostly folks in elearning, archaeology, and digital humanities; you can see my edition here.
I just realized. I’ve been intermittently blogging now for three years, as of this December past. In that time, I think I’ve remained more or less true to the ‘mission’ of Electric Archaeology – to try out new techs, recount experiments, disseminate my research, in new media for archaeology and history. There have been times when I could post thoughtful, in-depth pieces; and times when I’ve merely passed on the interesting things that have turned up in my inbox. As of this morning according to WordPress, Electric Archaeology has had over 85,000 views, spread across 394 posts. There have been 329 comments made. I have 62 categories – clearly I need some rationalization there.
I sometimes toy with the idea of moving Electric Archaeology to my own space, so I can put some better analytics on it, but for whatever reason, that just doesn’t happen…
The all time most viewed posts on Electric Archaeology (the most recent posts of course are at the bottom, having had less chance to be viewed):
Archaeology is slowly getting into 3d representations of artefacts, sites, and so on. I don’t know whether we’ve spent enough time thinking ‘why bother?’. What does having a 3d representation of an object or site help us to achieve? A quick answer might have something to do with public archaeology, or education… but that’s a post for another day (or search the archives of this blog).
Anyway, for less than $1000, one can now own a 3d printer, and “print” those objects out, from plastic. If you want to build your own printer, plans exist on the internet. This was a need I didn’t even know I had, but now I very much want… I’ve posted the video in a separate post (thanks, Wired!) From what I can find out so far, if you’ve got the svg file (I think), and it’s not overly large, you can print it.
One of the key problems that social scientists and humanists face is knowledge mobilization: getting information out of the ‘silos’ surrounding particular research groups, integrating it on a broad scale, and making it available to all Canadians. The transformation of image, text and sound into a common digital currency has profoundly lowered the transaction costs for researchers to find and utilize new information. A range of new technologies—powerful search engines, wikis, weblogs, text and data mining tools, and so on—make it easier and faster than ever to conduct research and disseminate results. In many disciplines, however, the focus has remained on individuals reading and writing with traditional desktop or laptop computers.
I propose to develop a methodology and a number of prototype devices to make the digital data sets and interpretations of a strategic knowledge cluster available in interactive, ambient and tangible forms that can be recreated in many different settings. To give some idea of the potential of these kinds of devices, consider the difference between writing with a word processor and stepping on the brake of a moving car. While using a word processor you are typically focused on the task and aware that you are interacting with a computer. The interface is intricate, sensorimotor involvement is mostly limited to looking and typing, and your surrounding environment recedes into the background of awareness. On the other hand, when braking you are focused on your involvement with the environment. Sensorimotor experiences are immersive, the interface to the car is as simple as possible, and you are not aware that you are interacting with computers (although recent-model cars in fact have dozens of continuously operating microcontrollers). As academic researchers we have tended to emphasize opportunities for dissemination that require our audience to be passive, focused and isolated from one another and from their surroundings. We need to supplement that model by building some of our research findings into communicative devices that are transparently easy to use, provide ambient feedback, and are closely coupled with the surrounding environment.
(and also read this for a few more arguments).
Well archaeologists? Are you going to let a historian lead the way with material culture?