AHA 2012 proposal: Crowdsourcing History

2011 February 17

(Welcome, AHA 2012 attendees! If you got to this page from the AHA Annual Meeting program, please know that we’ve established a blog for our session which contains more up-to-date information as well as the ability to submit questions and comments to us via the web. Please join us there and in Sheraton Ballroom IX at 11:30 on Saturday, January 7.)


I’ve recently submitted the following experimental-session proposal for the American Historical Association’s 2012 annual meeting in Chicago. It’ll be interesting to see what the AHA Program Committee makes of our session format, which is an adaptation of THATCamp’s “dork shorts” for a 2-hour session populated mostly by historians. [Edit, May 2011: The AHA Program Committee has accepted this proposal. Hooray!]

We have really cool presenters, but the session-proposal software literally wouldn’t let me give abstracts for more than 5 presentations, so I’m posting them here instead because I’m so excited about what people are doing. Read to the bottom for the details and links.

(For grad students interested in seeing what an AHA session proposal looks like, this may provide some guidance too. I remember how much stress I put into the first big-conference proposals I wrote, and anything I can do to help someone else avoid that stress is time well spent.)

Crowdsourcing History:
Collaborative Online Transcription and Archives

Session Abstract

Large-scale digitization of manuscript materials has recently made new volumes of primary sources available to a global audience via the Internet. However, transcribing these materials to enable searches or other kinds of algorithmic processing poses a significant challenge in terms of labor required. Scholars and cultural heritage institutions are increasingly exploring the use of collaborative online approaches (also called “crowdsourcing”) as a way to address these challenges.

This session seeks to explore the potential and pitfalls of crowdsourcing as a method for collecting transcriptions and for teaching wider audiences about reading and using historical manuscripts. We seek to bring together public historians, archivists, academic historians, technologists, and other scholars to learn about and discuss these projects and the future of crowdsourcing in the historical profession.

Our experimental format will feature 5-to-8-minute presentations by a number of scholar-technologists who are studying and working on crowdsourcing projects. As time allows, we will also invite attendees working on similar projects to give short “lightning talks” of no more than 2 minutes each. This wide array of presenters will enable attendees to get a clear sense of how collaborative transcription projects work and to talk about how we might use such strategies in the future. After the presentations, the moderator will facilitate a 20-to-30-minute discussion between presenters and attendees.

Participants and projects will include:

Moderator: Shane Landrum, Department of History, Brandeis University
Crowdsourcing Transcription of the Papers of the War Department using Scripto
Sharon M. Leon, Center for History and New Media, George Mason University
Transcribing Jeremy Bentham
Valerie Wallace, The Bentham Project, University College London
T-PEN — Transcription for Paleographical and Editorial Notation
James Ginther, Center for Digital Theology, Saint Louis University
Invisible Australians: Living under the White Australia Policy
Kate Bagnall, Independent scholar, and Tim Sherratt, National Museum of Australia
Crowdsourcing access to women’s history in Western Australia
Jennifer Griffiths, Historian and Heritage Consultant
Linked Data, Transcription, and Markup for Archives and Communities
Abigail Belfrage, Digital Projects, Public Record Office Victoria, Australia
Crowdsourcing Historical Climate Data and Papyrus Transcriptions
Chris Lintott, Citizen Science Alliance
FromThePage: a web-based tool for transcribing, indexing, and annotating handwritten material
Ben Brumfield, software engineer, Beta.FromThePage.com
User Participation and Collaborative Creativity
Alexandra Eveleigh, Department of Information Studies, University College London

My note to the Program Committee reviewers:

I have entered bios and abstracts for the maximum 5 presenters to clarify that our session will feature scholarly presenters speaking on professionally important topics. If this session is accepted, please contact the organizer for clarification on how to list this session in the program.

Although the AHA usually does not provide internet connectivity to sessions, it is important that this session be held in one of the rooms with wireless and/or wired internet access, since participants will be demonstrating their projects.

Some of our non-US-based presenters work at institutions which may not be able to fund their travel. If they are unable to attend in person, we will present their work via YouTube or Skype.

Details

Here are the full details for our presentations, only some of which we were able to submit to the Program Committee. (I’ve omitted the 250-word biographical statements for presenters, but you can find most of that information at the websites linked below.)

Crowdsourcing Transcription of the Papers of the War Department using Scripto

Sharon Leon, Center for History and New Media

In 1800, the United States War Department burned to the ground. The important materials of that archive were lost to historians until an intrepid group of researchers began the mission to reconstitute the papers by collecting and scanning received copies and materials from other archives. Today, those images, representing nearly 55,000 documents, are available to researchers from the Papers of the War Department, 1784-1800 project website. In keeping with the origins of this non-traditional digital archive, PWD is embarking on a new venture to revolutionize the work of documentary editors by opening the archive up for crowdsourcing of transcription.

This innovation in editing practice is facilitated by the use of the Center for History and New Media’s newest open source tool: Scripto. Scripto allows users to contribute transcriptions to online documentary projects. The tool includes a versioning history and a full set of editorial controls, so that project staff can manage public contributions. The crowdsourcing work with PWD serves as a case study for other documentary projects that might want to pursue similar methods for beginning transcription, measurably improving their search corpus, and creating a vibrant community of users among scholarly researchers, students and teachers, and members of the general public. CHNM will capture the lessons learned with PWD in the form of a guide for editors, and will share those lessons with the audience of the AHA’s 2012 annual meeting.

Transcribing Jeremy Bentham

Valerie Wallace, The Bentham Project, University College London

In his will Jeremy Bentham, the great philosopher and reformer who lived from 1748 until 1832, requested that after his death his body be preserved in a box and put on display. He suggested that an accompaniment to this ‘auto-icon’ might be his ‘unedited and unfinished manuscripts, lodged in an appropriate case of shelves’. Bentham would therefore have approved of the Transcribe Bentham initiative, a project whose aim is to digitise these unedited and unfinished papers, of which there are 60,000, and put them on what is arguably an appropriate case of shelves for the twenty-first century: the internet.

Transcribe Bentham is run by the Bentham Project in the Faculty of Laws at University College London in collaboration with the UCL Centre for Digital Humanities. The Bentham Project is responsible for the publication of the Collected Works of Jeremy Bentham, an authoritative edition of the philosopher’s writings based on the original manuscript papers. The aim of the Transcribe Bentham initiative is to digitise and crowdsource the transcription of these manuscripts. The Transcribe Bentham team has designed a Transcription Desk using MediaWiki where users can log in, view, and transcribe Bentham’s papers, encoding their transcripts in TEI-compliant XML. The project aims to digitise at least 12,500 manuscripts in a year. This presentation will discuss the project’s experience of crowdsourcing and the quantitative and qualitative data generated by the initiative, offering thoughts on the future of collaborative manuscript transcription and the impact of crowdsourcing on an academic editorial project.

T-PEN — Transcription for Paleographical and Editorial Notation

James Ginther, Center for Digital Theology, Saint Louis University

T-PEN (Transcription for Paleographical and Editorial Notation) is a digital tool for scholars who use digital images of unpublished manuscripts that are housed in digital repositories throughout the world. T-PEN will provide a fully-equipped digital workspace in which the scholar — while constantly viewing the manuscript images — transcribes line by line, makes notes about problematic paleographic features, documents glosses and corrections or revisions to the manuscript, and may—either during transcription or after further research—add interpretative or bibliographic information pertaining to particular lines or larger sections of the text. With this tool, the transcribed text can also be immediately encoded with XML markup to indicate any given feature of the text (e.g., a rubric, colophon, gloss, lemma, correction, quire signature, citation, etc.). T-PEN will be ready for release as an open-source web-based application by April 2012 and will be in beta testing at the time of the AHA conference in January 2012.

Invisible Australians: Living under the White Australia Policy

Kate Bagnall and Tim Sherratt

Invisible Australians aims to reveal something of the lives of the thousands of men, women and children who were affected by the racially-based immigration policy of early 20th-century Australia. To administer the Immigration Restriction Act, government officials implemented an increasingly complex and structured system of tracking and documenting the movements of non-white people as they travelled in and out of the country. This surveillance left an extraordinary body of records containing information about people who, according to the national myth of a ‘White Australia’, were not Australian at all. Using crowdsourced transcription, our project intends to extract biographical data from these records, piece together these fragments of identity and work towards revealing the real face of White Australia.

Crowdsourcing access to women’s history in Western Australia

Jennifer Griffiths

The project I hope to run aims to use crowdsourcing techniques on the resources of the Western Australian State Library, State Records Office and Museum to access women’s stories in the records in order to support the heritage industry in improving the representation of women in the State Heritage Register. Currently, women are seriously under-represented in both the Register and the historical research of heritage in WA. This impacts significantly on the community’s understanding of women’s lives in WA in the past. This in turn has ramifications for the way contemporary women are represented and valued. While the project on its own is unlikely to change community perceptions of women’s pasts, it will be part of a network of actions that will achieve this. The project also aims to introduce history and heritage professionals in WA to feminist history practices (how records are read with a feminist lens) and produce a database of women’s stories and histories that will be important to the future study of women in WA. In addition, the project will allow professionals to participate in a crowdsourcing project, thus introducing them to using technology in new ways for research.

User participation and collaborative creativity

Alexandra Eveleigh, Department of Information Studies, University College London

My research looks at the impact of user participation and ‘collaborative creativity’ upon archival theory and practice, with a particular focus on users’ involvement in archival description and metadata creation/reuse. It is funded by a UK Arts and Humanities Research Council collaborative doctoral award, the partners being University College London and The National Archives.

My working research questions are:

  • Is user participation an evolution or revolution in archival practice & professionalism?
  • What contexts and circumstances encourage and motivate users to participate in archival description?
  • What impact do participatory methodologies have upon (a) the archive service, (b) existing users, and (c) new users and broader society?

The objectives are essentially to distinguish between what works and what doesn’t, and why: to explore some of the realities behind the claims made regarding experts, crowds and volunteer communities, and seek to understand what moves to allow a multiplicity of voices to supplement or even supplant the authoritative professional voice might mean for notions of archival value and traditional communities of archive users.

Linked Data, Transcription, and Markup for Archives and Communities

Abigail Belfrage, Public Record Office Victoria, http://www.prov.vic.gov.au

Public Record Office Victoria (PROV) is the archival authority for the State of Victoria, Australia. In partnership with a number of research and community-based organisations, PROV is developing an open-source, web-based crowdsourcing transcription and semantic (& geo-location) mark-up app. The aim is not just to create a valuable body of linked data and images from the state’s archives, but to enable access to the transcription & markup functionality for communities to use on their own projects.

Crowdsourcing Historical Climate Data and Papyrus Transcriptions

Chris Lintott, Citizen Science Alliance

Chris Lintott is the chair of the Citizen Science Alliance, which builds and operates the Zooniverse network of online citizen science projects. Zooniverse grew from Galaxy Zoo, which invited participants to classify a million galaxies. More than 350,000 people have taken part in Zooniverse projects, which include Old Weather (which transcribes historical and climate data from ships’ logs) and a project to transcribe the Oxyrhynchus papyri.

FromThePage: a web-based tool for transcribing, indexing, and annotating handwritten material

Ben Brumfield, software engineer, Beta.FromThePage.com

Ben Brumfield is a software engineer in Austin, Texas with more than a dozen years of experience developing web-based, database-driven software. Since 2005 he has been building FromThePage, a web-based tool for transcribing, indexing, and annotating handwritten material. This tool has been used to transcribe over 1500 pages of family diaries and is now being used by the San Diego Natural History Museum to transcribe and analyze naturalists’ field notes from the early 20th century. In 2010, FromThePage was released under a Free/Open Source license. Brumfield blogs about manuscript transcription technology at http://manuscripttranscription.blogspot.com/.

Regardless of the outcome of our session proposal, I’m sure that all the presenters listed above would welcome other opportunities to speak about their work. If you’re putting together a conference panel or arranging a talk series on your campus, keep them in mind.