Jeremy Bentham, Bloomsbury WC1 by Ewan-M on Flickr (CC:BY)

Did I mention that we are very excited to be contributing to UCL’s Bentham Transcription Initiative? This is an AHRC-funded project to complete the digitisation of the manuscripts of the 18th-century philosopher Jeremy Bentham, and to transcribe them using a wiki-based collaborative approach. It is being run by the Bentham Project at UCL, with support from ourselves and UCL’s newly-launched Centre for Digital Humanities. You can read an overview of the project on Melissa Terras’s blog.

Obviously, transcription of manuscript materials is an important digitisation activity, and one that can rarely, if ever, be left to computers in the way that printed texts can be left to OCR. But it’s painstaking and laborious work, and anything that eases the burden is welcome.

The project is already throwing up some very interesting conversations about transcription. At ULCC we have thought about transcription before, particularly with regard to our ongoing work for the Linnean Society archives, and we hope that there will yet be synergies to exploit. It is a great feeling to be so closely involved with disseminating the work of two such seminal figures as Linnaeus and Bentham.

We’re not naïve enough to think that collaborative web-based transcription is new, but we’ve yet to find any substantial comparable examples. A comment on UCL’s Digital Humanities blog teases us with the prospect of information about other similar projects, but fails to provide even a single link or hint, so is effectively useless: hardly in the collaborative spirit! A more useful lead was Joanne Evans’ link to the National Library of Australia’s Australian Newspapers project, which is crowdsourcing the proof-reading and correcting of OCR outputs, and has an impressive-looking site – I’m sure we’ll be borrowing some ideas from there.

Another useful lead came from Ben Brumfield of Austin, Texas, who directed us to his blog about collaborative manuscript transcription, which has been going even longer than DA Blog and looks like it’s going to make interesting reading. Ben’s recent blog post about a distributed transcription exercise for the US Geological Survey’s Bird Phenology Program includes a link to a training video for volunteers (it even sounds like it’s been recorded in a birdhouse). In the video we can see a database-form approach to transcription, which is particularly appropriate for transcribing data already entered on structured forms.

For more heterogeneous and free-form texts, such as the Bentham manuscripts, wikis seem to me much more appropriate, being in essence discrete hypertext engines. As for collaborative features, MediaWiki in particular has strong and proven features: there can be few better advertisements for effective virtual, global collaboration and crowdsourcing than Wikipedia.

One thing that is particularly compelling about the BPP video is that it is an excellent example of a thorough approach to online collaboration, giving clear and unequivocal guidance to contributors. Now that screencast tools are so readily available, it’s clear that for many activities like this, video-based instruction is the ideal tool, and often preferable to any number of written instructions. No less than for online teaching and learning environments, the need for effective induction and inclusive management of the online community must never be overlooked.

7 Comments

  1. I’m pleased you found my links useful, and interested to see that your conclusions about the NABPP website match my own so far. It seems to me that the BPP newsletters and website are doing a great job of fostering community through things like the featured photos of transcriptionists at work, and that that effort compensates for the lack of volunteer-to-volunteer communication which creates the community at places like Wikipedia. I look forward to seeing what kind of balance the Bentham project strikes.

  2. Interesting stuff, thanks Richard. Are you aware of the Dutch SCRATCH project? It’s exploring methods for automated information retrieval and analysis in large collections of scanned handwritten-document images. More information at http://www.onderzoekinformatie.nl/en/oi/nod/onderzoek/OND1308170/ and http://www.ai.rug.nl/alice/nwo-catch-scratch/index_english.html.

  3. The SCRATCH project sounds a lot like Doug Kennard’s work on George Washington’s letters. It’s possible to programmatically digitize handwriting for the purpose of searching, but not terribly useful when the goal is a transcription.

  4. i’m sorry you found my comment to be unhelpful,
    and sorry that i didn’t get around to a follow-up
    until just now…
    but i’m surprised your work didn’t turn up info on
    _distributed_proofreaders_, since they have now
    digitized 17,000 e-books for project gutenberg.
    > http://pgdp.net
    the d.p. workflow is _not_ something to be emulated,
    since it’s far too wasteful of human time and energy,
    but you should take a good look at a new experiment:
    > http://fadedpage.com
    i’d also be happy to tell you exactly how to set up
    a collaborative transcription environment, and even
    provide you with some code to implement the system.

    -bowerbird
    bowerbird@aol.com

  5. Thanks for the links – links /are/ helpful. We’ll add the Gutenberg and Fadedpage references to our reading list for defining the system. And I’ll make sure the project leaders see your kind offer!

  6. i can be “helpful” to an annoying degree, if you want.
    for instance, one of the first things you need to do
    – in setting up a system – is to make sure that
    your filenaming conventions are wisely grounded.
    if you want me to look at your filename structure,
    or that for the bentham project, i’d be happy to
    review it and let you know what i think about it…
    i’d even set up a demo of a system for you to see.
    or if you have a wiki where things are discussed,
    i would be happy to contribute to the dialog…

    -bowerbird

  7. Mr. Frank, you might be interested in the other existing projects that Melissa Terras posted about. There’s a lot there that goes beyond Distributed Proofreaders.

    I’m curious how “filename structure” fits into your design, however — are you referring to image filenames reflecting their contents, thereby enhancing SEO/findability? If not, a good introduction to that is Bruce Tate’s presentation at LSRC last year.
