Crowdsourced transcription ahoy!

2010 October 29
by Shane Landrum

Regular readers know that I’m interested in the idea of crowdsourced transcription for archival materials (perhaps more accurately called nerd-sourcing). But while talking to colleagues at Radcliffe’s wonderful Why Books? symposium, I realized that I haven’t yet blogged about the recent good news on this front from the Center for History and New Media.

CHNM has received an NEH Office of Digital Humanities grant to write an open-source tool for crowdsourcing transcriptions. The resulting project, just in its infancy, is called Scripto, and it’s under active development now. Source code is available at GitHub, and lead programmer Jim Safley has written about the project’s basic technical choices for those who are interested.

Scripto sits on top of MediaWiki, the software that runs Wikipedia, but it’s worth noticing that another of CHNM’s big projects, the open-source web exhibit software Omeka, is also growing. In response to interest from libraries, archives, museums, and individual researchers, CHNM has launched a hosted version of Omeka, which is now in beta testing. (Once Scripto gets a bit more mature, I wouldn’t be at all surprised if someone develops a way to wire the two together.)

Jeremy Bentham's auto-icon, photo by Michael Reeve

University College London has also launched a major project worth watching: Transcribe Bentham, which is slowly digitizing all of UCL’s collection of manuscripts by English philosopher-reformer Jeremy Bentham. Their Transcription Desk software (built on MediaWiki too) uses a points system to keep an honor-roll of top contributors. They aren’t solely looking to get cheap labor out of people; it’s also proving to be a useful teaching tool which lets students work with original texts much more closely than was previously possible.

CHNM and UCL fund development of their respective projects. At the entirely-volunteer, labor-of-love project level, FromThePage, built by Ben Brumfield, is also worth knowing about. Ben keeps a blog on collaborative manuscript transcription which explores the technical details of other systems, including features for motivating volunteers.

It’s great to see all this motion happening. It convinces me that my recent dream of crowdsourcing federally-held women’s history collections is fast becoming technically feasible, even if the legal and ethical issues are still complicated and murky.

5 Responses
  1. Sharon Leon
    October 29, 2010

    Just a note that Scripto is also funded by NHPRC.

  2. October 29, 2010

    Word on the street is, approximately, “more mature? whatever. link it up NOW.” :)

  3. November 1, 2010

    Thanks for the kind words, Shane. Stay tuned for some good news about FromThePage before the end of the year!

  4. November 7, 2010

    Cool! I knew about Transcribe Bentham, but didn’t know about Scripto. I think the question of how to incentivize/motivate volunteers is a really interesting one, and one which we (academics/digital humanists) are really still working out. I’m really curious to see how these projects develop.


  1. Tweets that mention Crowdsourced transcription ahoy! | cliotropic --