Crowdsourced transcription ahoy!
Regular readers know that I’m interested in the idea of crowdsourced transcription for archival materials (perhaps more accurately called nerd-sourcing). But while talking to colleagues at Radcliffe’s wonderful Why Books? symposium, I realized that I haven’t yet blogged about the recent good news on this front from the Center for History and New Media.
CHNM has received an NEH Office of Digital Humanities grant to write an open-source tool for crowdsourcing transcriptions. The resulting project, still in its infancy, is called Scripto, and it’s under active development now. Source code is available on GitHub, and lead programmer Jim Safley has written about the project’s basic technical choices for those who are interested.
Scripto sits on top of MediaWiki, the software that runs Wikipedia. It’s also worth noting that another of CHNM’s big projects, the open-source web exhibit software Omeka, is growing. In response to interest from libraries, archives, museums, and individual researchers, CHNM has launched a hosted version of Omeka, which is now in beta testing. (Once Scripto gets a bit more mature, I wouldn’t be at all surprised if someone develops a way to wire the two together.)
University College London has also launched a major project worth watching: Transcribe Bentham, which is slowly digitizing UCL’s entire collection of manuscripts by the English philosopher and reformer Jeremy Bentham. Its Transcription Desk software (also built on MediaWiki) uses a points system to keep an honor roll of top contributors. The project isn’t solely about getting cheap labor out of people; it is also proving to be a useful teaching tool that lets students work with original texts much more closely than was previously possible.
CHNM and UCL fund the development of their respective projects. At the other end of the spectrum, among entirely volunteer, labor-of-love projects, FromThePage, built by Ben Brumfield, is also worth knowing about. Ben keeps a blog on collaborative manuscript transcription that explores the technical details of other systems, including their features for motivating volunteers.
It’s great to see all this motion happening. It convinces me that my recent dream of crowdsourcing federally held women’s history collections is fast becoming technically feasible, even if the legal and ethical issues are still complicated and murky.