Just-published case study on ULCC web site
In October 2012 the DPTP delivered a two-day version of our course for the National Library of Ireland. The course was so popular that it was over-subscribed by other local archivists and librarians keen to learn about digital preservation. Now ULCC have published a case study about our success.
One of our personal triumphs in 2012 is the way we are making the OAIS Model more understandable. If the feedback is reliable, we’re getting it right: “I first read about OAIS seven years ago and never quite got it all to fit together in my brain. Ed managed!”
I very much enjoyed the DPC’s latest event on metadata, particularly the first half of the day, which concentrated on the PREMIS preservation metadata standard. One of my interests is how I can improve my teaching of this subject to students on the Digital Preservation Training Programme. Angela Dappert’s excellent presentation and exercise, now available here, has been enormously helpful for this.
My tendency has been to introduce standards like PREMIS and METS to my eager students in a linear, top-down manner, explaining data models and structure…only to find them somewhat overwhelmed by the detail and the degree of effort that seems to be required to implement them. I sense some students get the impression that they (a) are compelled to use these standards in order to succeed at digital preservation at all, or (b) have to implement them in a particular way. The worst case would be if they assumed they had to use every field they possibly could, to arrive at a “complete” profile of a digital object.
The Digital Preservation Training Programme (DPTP) bagged itself a prestigious award at the Digital Preservation Awards 2012. We were up against fierce national and international competition, so it was a tremendous honour to win. However, as pleased as we are about this validation of our work, it would never have happened without the work and support of so many people. This post is a way of expressing our gratitude to those who have contributed to and supported the project.
None of this would have been possible if the JISC hadn’t funded the initial year-long project in 2006. Thank you, JISC. This saw us working with Cornell University Library (Nancy McGovern and Anne Kenney), ULCC, King’s College (Simon Tanner), the DPC (William Kilbride and Maggie Jones) and the Archaeology Data Service (ADS) (Jen Mitcham). In particular, Nancy McGovern and Anne Kenney’s models are invaluable for introducing digital preservation. Thanks, all.
ULCC’s Patricia Sleeman and Ed Pinsent receive the 2012 Digital Preservation Award for Teaching and Communication from Oliver Morley (Chief Executive of the National Archives) and Matthew Woollard (Director, UK Data Archive). [Picture courtesy of the Digital Preservation Coalition]
University of London can be justly proud of the team from ULCC who collected the 2012 Digital Preservation Coalition Award for Teaching and Communication.
It was a great moment at last night’s ceremony at the Wellcome Trust, to see Patricia and Ed receive their award from Oliver Morley, Chief Executive of the National Archives, and Matthew Woollard, Director of the UK Data Archive. Patricia and Ed have worked extremely hard to develop and sustain the Digital Preservation Training Programme (DPTP) since its beginnings as a JISC project in 2005, and it was truly gratifying to see their achievement recognised by a panel of judges that included the British Library, the BBC and the Bodleian Library.
As you should know by now, DPTP is an entry-level, introductory course that develops critical thinking about digital preservation, designed to help those working in information management to understand effective approaches to the challenges of digital preservation, and enable them to assess the models and examples in the context of their own organisations.
I recently attempted to web-archive an interesting website called Letters of Charlotte Mary Yonge. The creators had approached us for some preservation advice, as there was some danger of losing institutional support.
The site was built on a WordPress platform, with some functional enhancements undertaken by computer science students, to create a very useful and well-presented collection of correspondence transcripts of this influential Victorian woman writer; within the texts, important names, dates and places have been identified and are hyperlinked.
Since I’ve harvested many WordPress sites before without difficulty, I added the URL to Web Curator Tool, confident of success. However, problems appeared right from the start. One concern was that the harvest was taking many hours to complete, which seemed unusual for a small text-based site with no large assets such as images or media attachments. One of my test harvests even ran up to the 3 GB limit. As I often do in such cases, I terminated the harvests to examine the log files and folder structures of what had been collected up to that point.
This revealed that a number of page requests were showing a disproportionately large size, some of them collecting over 40 MB for one page – odd, considering that the average size of a gathered page in the rest of the site was less than 50 KB. When I tried to open these 40 MB pages in the Web Curator Tool viewer, they failed badly, often yielding an Apache Tomcat error report and not rendering any viewable text at all.
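Spotting those oversized page requests is largely a matter of scanning the crawl log for fetches whose size is out of proportion to the rest of the site. Below is a minimal sketch of that check, assuming a Heritrix-style crawl.log in which the third whitespace-separated field is the downloaded size in bytes and the fourth is the URI; the threshold and field positions are assumptions to adjust for your own log layout.

```python
# Sketch: flag unusually large fetches in a Heritrix-style crawl.log.
# Assumes field 3 is the downloaded size in bytes and field 4 is the URI;
# adjust the indices if your log layout differs.

import sys

THRESHOLD = 1 * 1024 * 1024  # flag anything over 1 MB on a small text-based site


def oversized(log_path, threshold=THRESHOLD):
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            fields = line.split()
            if len(fields) < 4:
                continue
            try:
                size = int(fields[2])
            except ValueError:
                continue  # e.g. "-" recorded for failed fetches
            if size > threshold:
                yield size, fields[3]


if __name__ == "__main__":
    # Usage: python oversized.py crawl.log
    for size, uri in sorted(oversized(sys.argv[1]), reverse=True):
        print(f"{size / (1024 * 1024):8.1f} MB  {uri}")
```

Sorting the results largest-first makes the handful of runaway pages stand out immediately against the sub-50 KB pages that make up the rest of the site.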
A number of discussions are going on at the moment about the importance of preserving representation information for electronic files. Given the many possible ways even ordinary text can be encoded, it can never be taken for granted that digital files will be readable in the future.
Maureen Pennock has recently set in motion the Crowd-sourced Representation Information for Supporting Preservation initiative, to collect this information from anyone with something to contribute. Over at the Signal, Trevor Owens wrote a fascinating post about Glitching Files for Understanding – deliberately manipulating files at byte level, changing them in inappropriate ways to see the effects. In a similar vein, Paul Wheatley and the SPRUCE project have created the intriguing Atlas of Digital Damages on Flickr, illustrating what happens when files go wrong or can no longer be accurately rendered.
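The glitching idea is easy to try for yourself. The following is a toy sketch, not Trevor Owens’s actual method: copy a file, corrupt a single byte in the copy, and then open the copy in its usual application to see how the rendering degrades. File names and offsets here are purely illustrative.

```python
# Toy sketch of "glitching" a file: copy it, then corrupt one byte in the
# copy so the original is never touched.

import shutil


def glitch_copy(src, dst, offset, new_byte=0x00):
    """Copy src to dst, then overwrite the byte at `offset` with `new_byte`."""
    shutil.copyfile(src, dst)
    with open(dst, "r+b") as f:
        f.seek(offset)
        f.write(bytes([new_byte]))


# Hypothetical example: change byte 512 of a copied document, then open
# essay_glitched.doc in its native application and compare the result.
# glitch_copy("essay.doc", "essay_glitched.doc", offset=512, new_byte=0xFF)
```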
This reminded me of one of my own personal adventures in digital archaeology, and I finally tracked down the materials necessary to tell the story.
Some years ago I discovered an old floppy disk containing several undergraduate essays. While I was a student at Warwick University, from 1989 to 1992, I wrote most of my essays on a Research Machines “Nimbus”, inherited somehow from the school where my mother taught. It ran a version of DOS: I did little of interest on it beyond using the word-processing application that was part of an office suite called First Choice.
When I copied the files to a PC, the file format used by First Choice was not obviously among those recognised by the conversion tools offered by MS Word, so I simply opened the document files in a text editor, and was pleased to discover they were broadly intelligible (the character encoding, at least – I can’t vouch for the intellectual content!): the text encoding was clearly ASCII-based.
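A quick way to take that inspection a step further is to look at the raw bytes at the start of the file, where word processors of this era often left a recognisable header string. The sketch below is a minimal, assumed example of a hex-and-ASCII dump of a file’s first bytes; the file name is hypothetical, not the actual essay file.

```python
# Sketch: dump the first bytes of a file as hex and printable ASCII, a quick
# way to spot a header string left by the originating application.


def hexdump_head(path, length=64, width=16):
    with open(path, "rb") as f:
        data = f.read(length)
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        asciipart = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        print(f"{i:08x}  {hexpart:<{width * 3}}  {asciipart}")


# hexdump_head("ESSAY1.DOC")  # hypothetical First Choice document file
```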
The mysterious “GERBILDOC” file header