Transcribing Bentham

March 1st, 2010 Richard M. Davis Posted in Linnean Online, Museums

Jeremy Bentham, Bloomsbury WC1 by Ewan-M on Flickr (CC:BY)

Did I mention that we are very excited to be contributing to UCL’s Bentham Transcription Initiative? This is an AHRC-funded project to complete the digitisation of the manuscripts of 18th-century philosopher Jeremy Bentham, and transcribe them using a wiki-based collaborative approach. It is being run by the Bentham Project at UCL, with support from ourselves and UCL’s newly-launched Centre for Digital Humanities. You can read an overview of the project on Melissa Terras’s blog.

Obviously, transcription of manuscript materials is an important digitisation activity that can rarely, if ever, be left to computers, in the way that printed texts can be, using OCR. But it’s painstaking and laborious work, and anything that eases the burden is welcome.

The project is already throwing up some very interesting conversations about transcription. At ULCC we have thought about transcription before, particularly with regard to our ongoing work for the Linnean Society archives, and we hope that there will yet be synergies to exploit. It is a great feeling to be so closely involved with disseminating the work of two such seminal figures as Linnaeus and Bentham.

We’re not naïve enough to think that collaborative web-based transcription is new, but we’ve yet to find any substantial comparable examples. A comment on UCL’s Digital Humanities blog teases us with the prospect of information about other similar projects, but fails to provide even a single link or hint, so is effectively useless: hardly in the collaborative spirit! A more useful lead was Joanne Evans’ link to the National Library of Australia’s Australian Newspapers project, which is crowdsourcing the proof-reading and correcting of OCR outputs, and has an impressive-looking site – I’m sure we’ll be borrowing some ideas from there.

Another useful lead has been from Ben Brumfield of Austin, Texas, directing us to his blog about collaborative manuscript transcription, which has been going even longer than DA Blog and looks like it’s going to make interesting reading. Ben’s recent blog post about a distributed transcription exercise for the US Geological Survey’s Bird Phenology Program (BPP) includes a link to a training video for volunteers (it even sounds like it’s been recorded in a birdhouse). In the video we can see a database-form approach to transcription, which is particularly appropriate for transcribing data already entered on structured forms.

For more heterogeneous and free-form texts, such as the Bentham manuscripts, wikis seem to me much more appropriate, being in essence discrete hypertext engines. As for collaborative features, MediaWiki in particular has strong and proven features: there can be few better advertisements for effective virtual, global collaboration and crowdsourcing than Wikipedia.

One thing that is particularly compelling about the BPP video is that it is an excellent example of a thorough approach to online collaboration, giving clear and unequivocal guidance to contributors. Now that screencast tools are so readily available, it’s clear that for many activities like this, video-based instruction is the ideal tool, and often preferable to any number of written instructions. No less than for online teaching and learning environments, the need for effective induction and inclusive management of the online community must never be overlooked.


Our new EPrints repository (is not just for Christmas)

December 21st, 2009 Richard M. Davis Posted in Repositories Service


As regular readers will know, we have been working with repositories for quite a few years now. In 2005 we began working with the School of Advanced Study on their requirements for an Institutional Repository, and since then we have installed, configured and maintained several repositories, including some highly customised, specialist systems.

In most cases we have used EPrints. This is partly because we are familiar with the stuff it is built with (Perl, MySQL and XML have been at the heart of the NDAD dataset repository we have operated for The National Archives since 1997). But also because we like the ever-expanding set of features and options EPrints provides. I’ve watched its capabilities grow, thanks to the seemingly limitless energy and initiative of the EPrints team at Southampton. (For an interesting, user’s-eye perspective on the relative merits of DSpace and EPrints, I recommend reading some of the posts tagged DSpace in Dorothea Salo’s Caveat Lector blog).

It’s three years almost to the day since Rory and I attended the pre-launch briefing on EPrints 3 and came away convinced that, with its AJAX UI and evolving plugin architecture, it was likely to play a big part in our future plans.

And hardly a day’s gone by since, when we haven’t had some EPrints-related work on our plate. In 2007 we began developing Linnean Online for the Linnean Society, and PRIMO for the Institute of Musical Research. Out of this, and the snowballing Web 2.0 zeitgeist, we also honed the idea that became SNEEP (Social Networking Extensions for EPrints), one of the first JISC Rapid Innovation projects. Most recently, we’ve scaled new heights of EPrints customisation with the SOAS Fürer-Haimendorf collection, with its user-defined albums and searching enhancements, all wrapped up in 9Web’s impressive graphic design.

We’ve tweaked config files and hacked templates and for the most part enjoyed doing stuff with EPrints. (All credit is due to Rory and Ben, by the way. My role is chiefly to say “We could make it do that couldn’t we?” And, lo and behold, usually “we” can.)

Over the years I’ve also talked to many repository managers, and potential repository managers, about their requirements and expectations. I’ve spoken and networked at DSpace User Groups, Open Repositories conferences and many excellent events organised by the JISC, particularly the Repositories Support Project – and I’ve met a lot of smart and insightful people in the repo biz. Some of it must have rubbed off – I think my own understanding of what’s needed, and what’s feasible, has grown considerably.

But what we’ve never done is run our own repository, and experienced these things day-to-day for ourselves. As Atticus Finch said in To Kill A Mockingbird,

You never really understand a person until you consider things from his point of view . . . until you climb into his skin and walk around in it.

That’s why, in the gaps between everything else going on round here, Annemarie has been putting together the ULCC Publications Archive, which I hope will become a canonical home for our published outputs. It’s not big and it’s not clever, it’s certainly not perfect, but it is something we can use to improve our understanding of what it means to run a repository. We will also no doubt use it to explore some of the tools and techniques emanating from the EPrints developer community.

And now I can really start to empathise with the repository managers I know: their agony – clarifying copyright and licenses, ambiguous form fields, disappearing diacritics – and their ecstasy – a well-formed subject tree or citation, a successful search. I’ve also an insight into the needs of authors/submitters, since several articles are mine – and I naturally want to get the citations looking just right, so that I can embed some of the nice feeds EPrints provides into my blogs, e-portfolios and who knows what other mashups. Self-interest is a great motivator, as many Open Access advocates have observed: before long I’m sure I’ll be wanting download statistics, author profiles, and most of the other things I described in 1001 Things To Do With A Live Repository.

For me it’s an invaluable experience – no less so than when, a couple of years ago, I became an actual user of a VLE, through my MSc course at Edinburgh. There’s a world of difference between being a developer or implementer of this kind of online system – thinking your job’s done when it seems to be up-and-running – and being the poor end-user who doesn’t care about PHP, JSP, Maven, Apache, etc, but just wants to get something done.

Among the things you’ll find in pubs.ulcc.ac.uk are: papers and articles from events we have contributed to over the years, such as iPRES, Open Repositories, and DLM-Forum; published reports, like last year’s JISC-PoWR web preservation report; presentations and posters from other events, mostly in the field of e-learning or digital archives; and even the swish product sheets produced by our ace marketing department, Tim and Frank!

As well as our most recent UK activities, we’ve also unearthed some other curios, such as Patricia’s article for the Catalan Archivists’ Forum, in Catalan, and a piece by Kevin in La Vanguardia, in Spanish. Also of interest is a brief account of ULCC’s first 30 years, in the form of a brochure for a small exhibition that was held at Senate House Library in 1999.

No doubt as we delve through our own digital archives we’ll find more goodies. Having a repository is an excellent opportunity to locate and appraise these things, and share those that seem interesting and informative enough. No less than this blog, and our E-learning colleagues’ El Blog, it should be an attractive and effective shop-window – just like any good Institutional Repository.


JISC “Deposit Show & Tell” Event

October 16th, 2009 Richard M. Davis Posted in CLASM, E-learning, Events, Reports

Since it was at ULU, two doors down from our new home in Senate House, we had no excuses not to attend the JISC “Deposit Show & Tell” Event for applications and projects dedicated to making life easier for repository depositors. Like many of the JISC Developer Happiness/Rapid Innovation events, the format was a combination of short, informal presentations and discussions, and a chance to meet old and new faces on the Repo Scene.

As you’ll know from my previous post, the CLASM project is developing a plugin to enable direct deposit from Moodle to any SWORD-compliant repository. Our specific use case is to support management of CLA materials for teaching, but there is no reason why the Moodle plugins that James has developed couldn’t be adapted to deposit pretty much anything available in a Moodle VLE into a repository.

Luckily I don’t even have to write it up, because James has already written an excellent account on EL Blog, so please check his report out if you want to know more.
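For readers who haven’t met SWORD, a deposit of the kind mentioned above is essentially an HTTP POST of a packaged file to a repository collection URL, with a few extra headers describing the package. The sketch below is purely illustrative and is not the CLASM plugin code: the collection URL, filename and user name are made-up placeholders, the packaging identifier is just one commonly cited example, and the header names are those used by the SWORD 1.x profile as far as we understand it. It only builds the request and never sends anything.

```python
import urllib.request

# Example packaging identifier; a given repository may expect a different one (assumption).
SWORD_PACKAGING = "http://purl.org/net/sword-types/METSDSpaceSIP"


def build_deposit_headers(filename, packaging=SWORD_PACKAGING,
                          on_behalf_of=None, dry_run=False):
    """Return the HTTP headers for a SWORD 1.x style deposit of a zip package."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": packaging,
    }
    if on_behalf_of:
        # Mediated deposit: the client deposits on behalf of another user.
        headers["X-On-Behalf-Of"] = on_behalf_of
    if dry_run:
        # Ask the server to validate the deposit without storing it.
        headers["X-No-Op"] = "true"
    return headers


def build_deposit_request(collection_url, filename, payload, **options):
    """Build (but do not send) a POST request carrying the package bytes."""
    return urllib.request.Request(
        collection_url,
        data=payload,
        headers=build_deposit_headers(filename, **options),
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical collection URL and depositor, for illustration only.
    request = build_deposit_request(
        "http://repository.example.org/sword/deposit/teaching",
        "reading.zip",
        b"placeholder package bytes",
        on_behalf_of="a.lecturer",
        dry_run=True,
    )
    print(request.get_method(), request.full_url)
    for name, value in request.header_items():
        print(f"{name}: {value}")
```

A real client would also need authentication and would read the Atom entry the repository returns; both are left out here to keep the shape of the request clear.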


Future of Technology in Education (FOTE) 2009

October 14th, 2009 Richard M. Davis Posted in CLASM, E-learning, Events, JISC

FOTE 2009 in Second Life


For the second year running, ULCC organised a successful and interesting Future of Technology in Education (FOTE) conference, held on October 2nd at the Royal Geographical Society in Kensington. The programme had a particular focus on two hot topics, Cloud Computing and Social Media. There is a wealth of information on the FOTE website, including slides and videos of the presentations. The event was widely Tweeted, live-blogged by Andy Powell, and ran in parallel in Second Life.

We used the opportunity to include a short presentation about our CLASM project, and I shared the platform for one session with James Ballard, our resident Learning Technologist and ace Moodle hacker. The full video and slides (with audio) of our talk are available from the FOTE website; the slides are also on Slideshare.

I was particularly pleased to make contact with Jane Secker of LSE, who knows more than most about CLA, and I am looking forward to discussing some issues with her, as we try to refine the work done on the CLASM plugins and produce a finished package. Jane also published an excellent account of the day’s events on her blog.

The audience was a bit different from the JISC Information Environment crowd I’ve made presentations to before, so my talk was a very high-level overview of repository work in the sector, with a few ideas about where trends and technology seem to be leading us. One particular advantage I see is that interoperability between web applications should enable us to focus on using the “right” tools – portfolio, VLE, blog, repository, etc, maybe even VW – at each stage of the institutional/educational workflow, rather than using over-ambitious and over-complicated systems that try to do everything. “Small pieces loosely joined,” and all that.

Unfortunately, while the slides on the FOTE website include audio, the video there doesn’t include the slides, which robs the talk of some context. I have, by some dark means, managed to create a new version which combines the video and slides and upload it to YouTube. (To keep it short and relevant to DA Blog readers, I’ve only included my part of the presentation.)


Open Repositories 2009

June 10th, 2009 Richard M. Davis Posted in Events, Linnean Online, SNEEP

Georgia Aquarium by Driek Heesakkers on Flickr (CC:by-nc-sa)

Less than three weeks have passed since I found myself at Open Repositories 2009 (#OR09) in Atlanta, and it already seems a long time ago. For the record, Georgia Tech put on an excellent show, overflowing with fascinating presentations, people and ideas – far too many to take in – and (most importantly) an excellent and entertaining dinner at the Georgia Aquarium.

I took a smashing poster describing our work on Linnean Online and the SNEEP extensions for EPrints, spoke about these projects at the EPrints User Group sessions, and had to endure the now inevitable Minute Madness. I was pleased to spot the SNEEP Comments plugin in use when Jessie Hey demonstrated EdShare, another of Southampton’s learning resource repository projects. It was also great to meet up again with Patrick McSweeney, who has been tweaking SNEEP at Southampton, and discuss ways of keeping ongoing work on the plugins in sync. Regular readers may remember Patrick from OR08, and he cut an even more unforgettable figure this time.

The talk of the event seemed to be the relentless buzz around the unification of DSpace/Fedora Commons, engendering the new creation that is DuraSpace (and DuraCloud). This offers a lot of exciting possibilities that we’ll need to keep track of, though it won’t be the first repositories event that has offered us a surfeit of jam tomorrow… For now, for the curious, here’s the DuraSpace FAQ.

By contrast, it’s slightly disappointing that, over the water, the EPrints user group seemed a tad under-subscribed. Features available in EPrints 3.1.x, and those imminent for 3.2, from cloud storage controllers and desktop folder visualisations to preservation support, promise quick wins for anyone wanting to push the repository model further: Les and the EPrints team waste no time in responding to the latest demands of the zeitgeist. All the same, informal discussions with users and non-users of EPrints suggested substantial resistance to its Perl-based core. Yet EPrints continues to push more configurability away from its Perl source: in the kind of repository-driven future oft foretold – from WordPress-type extensibility to modular service-oriented solutions – the underlying code base ought to become increasingly irrelevant as long as the package does what it says on the tin.

As usual it was great to meet some old friends, and lots of people for the first time. Memorably serendipitous (re-)discoveries included:

  • Bibapp – “a Campus Research Gateway and Expert Finder”. There have been many attempts to integrate personalised, portfolio pages with repositories, and this looks like an effort worth investigating further, particularly as it claims to be repository neutral (and a good excuse to try out Ruby for real?).
  • ParallelArchive – another variant on the repository model: “a personal scholarly workspace, a collaborative research environment, and a digital repository”. Run by Open Society Archives (OSA) at Central European University in Budapest – of particular interest to students of the Cold War and related issues.
  • E-Lis – still a superb multilingual collection of LIS resources, and undoubtedly the acid test of all EPrints internationalisation efforts
  • MIT Open CourseWare – the mother of all OERs?
  • The great Peter Sefton – great to meet him at last, at 6′ 7″, someone I can truly look up to. For a much more thorough account of the conference, see Pete’s Blog

I didn’t manage anything in the way of sightseeing, though the Aquarium seemed to be top of most locals’ list of recommendations, and we went there. Perhaps I should have made more of an effort to see the Civil War museum. For the visual record of OR09, content and context, you might like to see Jim Downing’s photos from the event, and the official OR09 photo set on Flickr.
