Just-published case study on ULCC web site

In October 2012 the DPTP delivered a two-day version of our course for the National Library of Ireland. The course was so popular that it was over-subscribed by other local archivists and librarians keen to learn about digital preservation. Now ULCC have published a case study about our success.

One of our personal triumphs in 2012 is the way we are making the OAIS Model more understandable. If the feedback is reliable, we’re getting it right: “I first read about OAIS seven years ago and never quite got it all to fit together in my brain. Ed managed!”


The Digital Preservation Training Programme (DPTP) bagged itself a prestigious award at the Digital Preservation Awards 2012. We were up against fierce national and international competition, so it was a tremendous honour to win. However, pleased as we are by this validation of our work, it would never have happened without the work and support of so many people. This post is a way of expressing our gratitude to everyone who has contributed to and supported the project.

None of this would have been possible if the JISC hadn’t funded the initial year-long project in 2006. Thank you JISC. This saw us working with Cornell University Library (Nancy McGovern and Anne Kenney), ULCC, King’s College (Simon Tanner), the DPC (William Kilbride and Maggie Jones) and the Archaeology Data Service (ADS) (Jen Mitcham). In particular, Nancy McGovern and Anne Kenney’s models are invaluable for introducing digital preservation. Thanks all.

Thanks also to those at ULCC, past and present: Kevin Ashley, Richard Davis, and the whole ULCC team for being great and amazing through the highs and lows (hardly any lows, of course) of digital preservation, especially Silvia Arango Docio, Rory McNicholl, Jose Martin and Kate Bradford, without whose Trojan work we would have been lost. Big thanks also to Graciano Soares, who showed us that by working well with students we can get so much more from the course and make it lots of fun and truly engaging. The joy of this course lies so much in working with knowledgeable, diverse and interesting people, and in seeing the lightbulb moments when something ‘clicks’. So thanks must also go to everyone who has attended since 2006, and to the institutions that have sent their staff to us (and paid good money). Thanks also to Frank Steiner and Kienuwa Enobakhare for trying to get us to sell ourselves more, and to our Director, Richard Maccabee, for his support.

Big thanks to the DPC for their support, of course.

Lastly, I would like to dedicate my part of the prize to the librarians and archivists of Iraq, in particular Dr Saad Eskander, who, despite living and working in circumstances we cannot begin to imagine, has recognised the vital importance of digital preservation. He has done this in the context of many projects, both big and small, but most crucially for their new digital library for all Iraqi people, in the midst of a society in turmoil.

We hope this makes up for the fact that we didn’t get to say this on the night. The list really could go on, and I hope I haven’t forgotten anyone. We also hope that, for the uninitiated among you reading this, the message about the importance of safeguarding our digital memory, and what is at stake if we don’t act, becomes clearer. It is a paradigm shift from being reactive to being proactive about preserving our past, both personal and collective; call it a revolution if you will, but a revolution it is.

ULCC’s Patricia Sleeman and Ed Pinsent receive the 2012 Digital Preservation Award for Teaching and Communication from Oliver Morley (Chief Executive of the National Archives) and Matthew Woollard (Director, UK Data Archive). [Picture courtesy of the Digital Preservation Coalition]

University of London can be justly proud of the team from ULCC who collected the 2012 Digital Preservation Coalition Award for Teaching and Communication.

It was a great moment at last night’s ceremony at the Wellcome Trust, to see Patricia and Ed receive their award from Oliver Morley, Chief Executive of the National Archives, and Matthew Woollard, Director of the UK Data Archive. Patricia and Ed have worked extremely hard to develop and sustain the Digital Preservation Training Programme (DPTP) since its beginnings as a JISC project in 2005, and it was truly gratifying to see their achievement recognised by a panel of judges that included the British Library, the BBC and the Bodleian Library.

As you should know by now, DPTP is an entry-level, introductory course that develops critical thinking about digital preservation, designed to help those working in information management to understand effective approaches to the challenges of digital preservation, and enable them to assess the models and examples in the context of their own organisations.

As well as its scheduled and bespoke courses (most recently at the National Library of Ireland), the DPTP team is actively involved in many of ULCC’s Digital Archives & Research Technologies activities, including work with the School of Advanced Study, and most of the University’s colleges and institutes, and on projects for JISC, AHRC, Mellon Foundation and the European Commission. They recently completed the JISC-funded SHARD “Preservation of Historical Research Data” project with IHR, and will contribute to IHR’s forthcoming History DMT (Data Management Training and Guidance) project, funded by AHRC.

Over the years DPTP contributors and supporters have included Kate Bradford, William Kilbride, Kevin Ashley, Maggie Jones, Jen Mitcham, Simon Tanner, Nancy McGovern, Adrian Stevenson and Rory McNicholl. But it is above all the skill and dedication of Patricia Sleeman and Ed Pinsent that have ensured the sustained success of the programme in sharing essential digital preservation skills with a wider audience.

We’d like to thank everyone involved in the awards – the judges, the other nominees, our hosts at the Wellcome Trust (with their smashing new library web site) but most of all the indefatigable staff of the Digital Preservation Coalition who organised everything with their usual aplomb. A splendid time…!

I recently attempted to web-archive an interesting website called Letters of Charlotte Mary Yonge. The creators had approached us for some preservation advice, as there was some danger of losing institutional support.

The site was built on a WordPress platform, with some functional enhancements undertaken by computer science students, to create a very useful and well-presented collection of correspondence transcripts of this influential Victorian woman writer; within the texts, important names, dates and places have been identified and are hyperlinked.

Since I’ve harvested many WordPress sites before with great success, I added the URL to Web Curator Tool, confident of success. However, problems appeared right from the start. One concern was that the harvest was taking many hours to complete, which seemed unusual for a small text-based site with no large assets such as images or media attachments. One of my test harvests even reached the 3 GB limit. As I often do in such cases, I terminated the harvests to examine the log files and folder structures of what had been collected up to that point.

This revealed that a number of page requests were showing a disproportionately large size, some of them collecting over 40 MB for one page – odd, considering that the average size of a gathered page in the rest of the site was less than 50 KB. When I tried to open these 40 MB pages in the Web Curator Tool viewer, they failed badly, often yielding an Apache Tomcat error report and not rendering any viewable text at all.
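A rough way to spot outliers like these is to scan the crawl log for responses far larger than the typical page. The sketch below is illustrative only: it assumes a Heritrix-style crawl log (Heritrix is the crawler behind Web Curator Tool) in which the third whitespace-separated field is the document size in bytes and the fourth is the URI. The 100× median threshold and the example.org log lines are hypothetical, not data from this harvest.

```python
import statistics
from typing import Iterable, List, Tuple


def parse_crawl_log(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (uri, size_in_bytes) pairs from crawl-log lines.

    Assumes field 3 is the size and field 4 the URI (Heritrix-style layout).
    """
    records = []
    for line in lines:
        fields = line.split()
        # Skip blank or short lines, and entries whose size is not a number (e.g. "-").
        if len(fields) < 4 or not fields[2].isdigit():
            continue
        records.append((fields[3], int(fields[2])))
    return records


def flag_oversized(records: List[Tuple[str, int]], factor: float = 100) -> List[Tuple[str, int]]:
    """Return records whose size exceeds `factor` times the median size."""
    if not records:
        return []
    median = statistics.median(size for _, size in records)
    return [(uri, size) for uri, size in records if size > factor * median]


# Hypothetical log lines: two ordinary pages and one runaway response.
sample = [
    "2012-11-01T10:00:00.000Z 200 48213 http://example.org/letters/1/ L http://example.org/ text/html",
    "2012-11-01T10:00:01.000Z 200 51002 http://example.org/letters/2/ L http://example.org/ text/html",
    "2012-11-01T10:00:02.000Z 200 41943040 http://example.org/letters/3/ L http://example.org/ text/html",
]

for uri, size in flag_oversized(parse_crawl_log(sample)):
    print(f"{size / 1_000_000:.1f} MB  {uri}")
```

Comparing against the median rather than the mean keeps a single 40 MB page from inflating the baseline it is being judged against.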


Cyrillic German by kecko on Flickr

A number of discussions are going on at the moment about the importance of preserving representation information for electronic files. Given the many possible ways even ordinary text can be encoded, it can never be taken for granted that digital files will be readable in the future.
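To see why encoding cannot be taken for granted, here is a short illustrative sketch (not a reconstruction of the pictured example): the same UTF-8 bytes for a German word give the right text when decoded as UTF-8, but turn into Cyrillic letters if a reader wrongly assumes the Windows-1251 encoding. The word and the choice of wrong encoding are our own assumptions.

```python
# The same German word, stored on disk as UTF-8 bytes.
original = "Grüße"
raw_bytes = original.encode("utf-8")

# Reading the bytes back with the right assumption recovers the text...
correct = raw_bytes.decode("utf-8")
# ...but assuming Windows-1251 turns each two-byte character into two Cyrillic letters.
garbled = raw_bytes.decode("cp1251")

print(correct)
print(garbled)
```

Nothing in the bytes themselves says which reading is right; that knowledge is exactly the representation information that has to be preserved alongside the file.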

Maureen Pennock has recently set in motion the Crowd-sourced Representation Information for Supporting Preservation, to collect this information from anyone with something to contribute. Over at the Signal, Trevor Owens wrote a fascinating post about Glitching Files for Understanding – deliberately manipulating files at byte-level and changing them in inappropriate ways to see the effects. In a similar vein, Paul Wheatley and the SPRUCE project have created the intriguing Atlas of Digital Damages, on Flickr, illustrating what happens when files go wrong or can no longer be accurately rendered.
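The glitching idea can be tried in a few lines. This is a minimal sketch under our own assumptions, not the method used by Trevor Owens or SPRUCE: it works on a copy of the bytes, XORs randomly chosen bytes with a non-zero mask so each hit really changes the byte, and can leave a header of `protect` bytes intact so the damaged file may still be recognised as its original format. The function name and parameters are hypothetical.

```python
import random


def glitch(data: bytes, n_changes: int = 10, protect: int = 0, seed: int = 0) -> bytes:
    """Return a damaged copy of `data`; the original is left untouched.

    Each change XORs one randomly chosen byte with a non-zero mask, so that
    byte changes (a byte hit more than once could, rarely, end up restored).
    The first `protect` bytes, such as a file header, are never touched.
    """
    out = bytearray(data)
    if protect >= len(out):
        return bytes(out)  # nothing outside the protected header to damage
    rng = random.Random(seed)  # seeded so the damage is reproducible
    for _ in range(n_changes):
        pos = rng.randrange(protect, len(out))
        out[pos] ^= rng.randrange(1, 256)
    return bytes(out)


if __name__ == "__main__":
    text = b"HEADER: plain ASCII text standing in for a real file."
    # latin-1 maps every byte value to a character, so the result always prints.
    print(glitch(text, n_changes=3, protect=7).decode("latin-1"))
```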

This reminded me of one of my own personal adventures in digital archaeology, and I finally tracked down the materials necessary to tell the story.

Some years ago I discovered an old floppy disk containing several undergraduate essays. While I was a student at Warwick University, from 1989 to 1992, I wrote most of my essays on a Research Machines “Nimbus”, inherited somehow from the school where my mother taught. It ran a version of DOS: I did little of interest on it beyond using the word processing application that was part of an office suite called First Choice.

When I copied the files to a PC, I found that the file format used by First Choice was not obviously among those recognised by the conversion tools offered by MS Word,[1] so I simply opened the documents in a text editor. I was pleased to discover that they were broadly intelligible (the character encoding, at least; I can’t vouch for the intellectual content!): the text encoding was clearly ASCII-based.
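Opening the files in a text editor amounts to looking for readable ASCII amid binary data. A minimal sketch of the same check, roughly what the Unix `strings` tool does, is below. The 128-byte search window, the 4-character minimum and the synthetic sample bytes are assumptions for illustration, not the actual layout of a First Choice file.

```python
import re
from typing import Iterator


def has_marker(data: bytes, marker: bytes = b"GERBILDOC", window: int = 128) -> bool:
    """True if `marker` occurs within the first `window` bytes of `data`."""
    return marker in data[:window]


def printable_runs(data: bytes, min_len: int = 4) -> Iterator[str]:
    """Yield runs of printable ASCII (space to tilde, plus tab) at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


# Synthetic bytes: a header marker, some binary padding, then ordinary text.
sample = b"GERBILDOC\x00\x01\x02" + b"\x00" * 20 + b"Essay text goes here.\x00\x1a"

print(has_marker(sample))
print(list(printable_runs(sample)))
```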


The mysterious “GERBILDOC” file header


(for Otto) handle and keyhole by bootpainter on Flickr

Version 7 of the Handle System brings template handles, which make it much easier than before to provide an EPrints repository with persistent URLs.

While previous versions required a new record to be created in the local Handle server database for every persistent URL like http://hdl.handle.net/<prefix>/<item_id> to be resolved, we are now able to simply define a template that will map any

http://hdl.handle.net/<your_prefix>/xyz

to

http://your.repo.url/xyz

Assuming the following scenario:

  • 7.x Handle server set up and running
  • A prefix (institutional id registered in the Handle System) homed on that server. We’ll use 123456 for this example
  • Your EPrints repository is located at http://your.repo.url

Here is how:

  1. For handle 123456: create a Simple URL with the value http://your.repo.url
  2. For handle 0.NA/123456: add an HS_NAMESPACE entry with the following UTF8 Text value:
    <namespace>
      <template delimiter="/">
        <foreach>
          <if value="type" test="equals" expression="URL">
            <value data="${data}/${extension}" />
          </if>
          <else>
            <value />
          </else>
        </foreach>
      </template>
    </namespace>

And we’re done! Any URL with the format http://hdl.handle.net/123456/* will be resolved as http://your.repo.url/*
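To make the template’s logic concrete, here is a simplified Python model of what step 2 asks the Handle server to do. It is not Handle server code: it assumes the handle is split at the first delimiter into a base handle and an extension, and it represents the base record from step 1 as a plain dictionary. The function and variable names are hypothetical.

```python
from typing import Dict, List

Value = Dict[str, str]


def resolve(handle: str, base_records: Dict[str, List[Value]], delimiter: str = "/") -> List[Value]:
    """Simplified model of the step 2 template; not the Handle server's own code.

    Splits `handle` at the first `delimiter` into a base handle and an
    extension, then rewrites each URL value of the base handle as
    "${data}/${extension}" and passes every other value through unchanged.
    """
    base, sep, extension = handle.partition(delimiter)
    if not sep or base not in base_records:
        raise KeyError(f"no base handle for {handle!r}")
    resolved = []
    for value in base_records[base]:  # <foreach>
        if value["type"] == "URL":  # <if value="type" test="equals" expression="URL">
            resolved.append({**value, "data": f"{value['data']}/{extension}"})
        else:  # <else><value /></else>
            resolved.append(dict(value))
    return resolved


# Step 1: handle 123456 holds a single URL value pointing at the repository.
BASE_RECORDS = {"123456": [{"type": "URL", "data": "http://your.repo.url"}]}

print(resolve("123456/xyz", BASE_RECORDS)[0]["data"])
```

With a base record for 123456 pointing at http://your.repo.url, the handle 123456/xyz yields http://your.repo.url/xyz, matching the mapping described above.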

As well as being a simple, effective way to give your repository persistent URLs, templates are especially convenient when moving an existing repository that was already integrated with the Handle System from DSpace to EPrints. There is no need to manage the existing Handle database records: as long as the record ids stay the same, a template will resolve every handle to the summary page of the corresponding record.

Had Handle 7 templates been available when we began the SAS-Space migration project (from DSpace to EPrints), we might have reached a different decision about discontinuing Handle for that service. (Handles created by DSpace persist, but no new handles have been minted since the system went live in EPrints.) Handle 7 templates (and the templates we now have available to integrate them into EPrints) mean that setting up Handle to work with EPrints is scarcely more difficult than it is for DSpace.

Richard talking about e-books at the FOTE12 conference, Senate House, University of London


From the Anthologizr project blog

I enjoyed presenting some of the early Anthologizr work, along with the general e-book message of earlier posts here, at ULCC’s Future of Technology in Education (FOTE12) conference. Slides are embedded below (and are also available in our repository).

One backchannel tweet I saw described me (I presume) as “some guy who thought e-books were great”, which I don’t think entirely captures the complexity of what I was trying to convey. It’s all relative to where we’ve arrived, and the devices that now frame e-books are so well established that there’s no way back: what iPods and their successors did to physical audio media, iPads, Kindles and their ilk will surely do to printed media, however much it may have been “the most stable and mature market for creative works that exists”.

The FOTE event also yielded some smashing photos, thanks to ULCC’s excellent marketing and photography team.

Richard talking about e-books at FOTE12, Senate House, University of London

As the next step for the project, I have been looking at a couple of e-book creation environments and hope to write them up. And our next hackathon with our excellent development team will be upon us soon.

Some nice pictures of recent activities came our way, from the DPC reception at the House of Lords on Monday and the Future of Technology in Education (FOTE12) conference at Senate House last Friday. Full sets of photos from both the DPC reception and FOTE12 are available on Flickr, as well as smashing pictures from June’s Institutional Repository Managers’ Workshop (IRMW12).

Patricia at the DPC Decennial Reception, House of Lords, 8th October 2012

Ed at the DPC Decennial Reception, House of Lords, 8th October 2012


Richard (left) at the DPC Decennial Reception, House of Lords, with Dee Burn (Head of Communications, School of Advanced Study) and Richard Maccabee (Director of ICT, University of London)

Richard discussing e-books at ULCC’s annual Future of Technology in Education conference

Richard at the Future of Technology in Education conference, Senate House, 5th October 2012