Photo by Ard Hesselink https://www.flickr.com/photos/docman/

This one-day event on 31 October 2014 was organised by the DPC and hosted at the futuristic, spacious offices of HSBC, where the presentation facilities and the catering were excellent. All those attending were given plenty of mental exercises by William Kilbride. He said he wanted to build on his “Getting Started in Digital Preservation” events and help everyone move further along the path towards a steady state, where digital preservation starts to become “business as usual”. The first of these exercises was a brief sharing-and-discussion session in which people described things they had tried, and what had and hadn’t worked.

Kurt Helfrich from The RIBA Library said his organisation had a large number of staff administering a historic archive; its various databases, created at different times for different needs, would be better if connected. He was keen to collaborate with other RIBA teams and link the “silos” in his organisation.

Lindsay Ould from King’s College London said “starting small worked for us”. They’ve built a standalone virtual machine, using locally-owned kit, and are using it for “manual” preservation; once they’ve got the process right, they can automate it and bring in network support from IT.

When asked about “barriers to success”, over a dozen hands in the room went up. Common themes: getting the momentum to get preservation going in the first place, and extracting a long-term commitment from executives, who lose interest when they see it’s not going to be finished in 12 months. There’s a need to do advocacy regularly, not just once, and a need to convince depositors to co-operate. IT departments, especially in the commercial sector, are slow to see the point of digital preservation if its “business purpose” – a euphemism for “income stream”, I would say – is not immediately apparent. Steph Taylor of ULCC pointed out that the case studies and tools in our profession are mostly geared to the needs of large memory institutions, not the dozens of county archives and small organisations represented in the room.

Ed Pinsent (i.e. me) delivered a talk on conducting a preservation assessment survey, paying particular attention to the Digital Preservation Capability Maturity Model and other tools and standards. If done properly, this could tell you useful things about your capability to support digital preservation; you could even use the evidence from the survey to build a business case for investment or funding. The tricky thing is choosing the model that’s right for you; there are about a dozen available, with varying degrees of credibility as to their fundamental basis.

Catherine Hardman from the Archaeology Data Service (ADS) is very much aware of “income streams”, since the profession of archaeology has become commercialised and somewhat profit-driven. She now has to engage with many depositors as paying customers. To that end, she’s devised a superb interface called ADS Easy that allows them to upload their own deposits and add suitable metadata through a series of web forms. This process also incorporates a costing calculator, so that the real costs of archiving (based on file size) can be estimated; it even acts as a billing system, creating and sending out invoices. Putting this much onus on depositors is, in fact, a proven and effective way of engaging with your users. In the same vein, ADS have published good-practice guidance on things to consider when using CAD files, and advice on metadata to add to a Submission Package. Does she ever receive non-preferred formats in a transfer? Yes, and the ADS’s response is to send them back – the ADS has had interesting experiences with “experimental” archaeologists in the field. Kurt Helfrich opened up the discussion here, speaking of the lengthy process before deposit that is sometimes needed; he memorably described it as a “pre-custodial intervention”. Later in the day, William Kilbride picked up this theme: maybe “starting early”, while good practice, is not ambitious enough. Maybe we have to begin our curation activities before the digital object is even created!
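As an aside, the mechanics of a costing calculator like the one in ADS Easy are easy to picture. Here is a minimal sketch of a file-size-based estimate; the per-gigabyte rate, minimum charge and function name are invented for illustration, and are not the ADS’s actual pricing.

```python
# Hypothetical file-size-based costing calculator, in the spirit of ADS Easy.
# The rates below are invented for illustration, NOT the ADS's actual pricing.

def estimate_archiving_cost(file_sizes_mb, rate_per_gb=20.0, minimum_charge=50.0):
    """Estimate an archiving cost from a list of file sizes in megabytes."""
    total_gb = sum(file_sizes_mb) / 1024.0   # convert MB to GB
    cost = total_gb * rate_per_gb            # simple per-GB charge
    return max(cost, minimum_charge)         # never bill below the minimum

# Example: a deposit of three files totalling roughly 3.5 GB
print(estimate_archiving_cost([1200, 800, 1600]))
```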

Catherine also perceived an interesting shift in user expectations: they want more from digital content, and leaps in technology make them impatient for speedy delivery. As part of meeting this need, ADS have embraced the OAI-PMH protocol, which enables them to reuse their collections metadata and enhance their services to multiple external stakeholders.
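For readers who haven’t met it, OAI-PMH is a simple HTTP-based protocol for harvesting metadata records from a repository. Here is a minimal harvesting sketch; the endpoint URL is a placeholder, since every repository publishes its own base URL.

```python
# A minimal sketch of harvesting Dublin Core records over OAI-PMH.
# The endpoint below is a placeholder; real repositories publish their own.
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://example.org/oai"  # placeholder OAI-PMH endpoint

def list_record_titles(base_url):
    """Fetch one page of records and print their dc:title values."""
    url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    # dc:title lives in the Dublin Core namespace
    dc_title = "{http://purl.org/dc/elements/1.1/}title"
    for title in tree.getroot().iter(dc_title):
        print(title.text)

list_record_titles(BASE_URL)
```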

There is no doubt that having a proper preservation policy in place would go some way towards addressing issues like this. When Kirsty Lee from the University of Edinburgh asked how many of us already had a signed-off policy document, the response level was not high. She then shared the methodology she’s using to build a policy at Edinburgh, and it is a meticulous, well-thought-through process indeed. Her flowcharts show her constructing a complex “matrix” of separate policy elements, all drawn from a number of reports and sources which tend to say similar things in different ways; her triumph has been to distil this array of information and, equally importantly, arrange the elements in a meaningful order.

Kirsty is upbeat and optimistic about the value of a preservation policy. It can be a statement of intent; a mandate for the archive to support digital records and archives. It provides authority and can be leverage for a business case; it helps get senior management buy-in. To help us understand, she gave us an excellent handout which listed some two dozen elements; the exercise was to pick only the ones that suit our organisation, and to put them in order of priority. The tough part was coming up with a “single sentence that defines the purpose of your policy” – I think we all got stumped by this!

Download the slides from this event

Report continues in part two

In September this year Dave Thompson of the Wellcome Library asked a question on Twitter, one which is highly relevant to digital preservation practice and skills. Addressing digital archivists and librarians, he asked: “Do we need to be able to do all ourselves, or know how to ask for what is required?”

My answer is “we need to do both”…and I would add a third thing to Dave’s list. We also need to understand enough of what is happening when we get what we ask for, whether it’s a system, tool, application, storage interface, or whatever.

Personally, I’ve got several interests here. I’m a traditional archivist (got my diploma in 1992 or thereabouts) with a strong interest in digital preservation, since about 2004. I’m also a tutor on the Digital Preservation Training Programme.

As an archivist wedded to paper and analogue methods, for some years I was fiercely proud of my lack of IT knowledge. Whenever forced to use IT, I found I was always happier when I could open an application, see it working on the screen, and experiment with it until it did what I wanted it to do. On this basis, for example, I loved playing around with the File Information Tool Set (FITS).

When I first managed to get some output from FITS, it was like seeing the inside of a file format for the first time. I could see the tags and values of a TIFF file, some of which I was able to recognise as those elusive “significant properties” you hear so much about. So this is what they look like! From my limited understanding of XML – which is what FITS outputs – I knew that XML is structured and could be stored in a database. That meant I’d be able to store those significant properties as fields in a database, and interrogate them. This would give me the intellectual control I used to relish with my old card catalogues in the late 1980s. I could see from this how it would be possible to have “domain” over a digital object.
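For the curious, here is a minimal sketch of that desktop experiment, assuming Python and SQLite. The file names are placeholders, and exactly which properties appear in the XML will depend on the format FITS examined.

```python
# A sketch of the desktop experiment described above: parse the XML that
# FITS produces (e.g. via "fits.sh -i image.tiff -o image-fits.xml") and
# load its properties into a small SQLite database for interrogation.
import sqlite3
import xml.etree.ElementTree as ET

def fits_properties(xml_path):
    """Yield (property, value) pairs from a FITS output file."""
    for elem in ET.parse(xml_path).getroot().iter():
        if elem.text and elem.text.strip():
            # strip the XML namespace to leave a readable property name
            tag = elem.tag.split("}")[-1]
            yield tag, elem.text.strip()

def store(xml_path, db_path="properties.db"):
    """Store every property/value pair against the source file name."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS props (file TEXT, property TEXT, value TEXT)")
    con.executemany("INSERT INTO props VALUES (?, ?, ?)",
                    [(xml_path, p, v) for p, v in fits_properties(xml_path)])
    con.commit()
    con.close()

store("image-fits.xml")  # placeholder FITS output file
# Interrogate later with SQL, e.g.:
#   SELECT file, value FROM props WHERE property = 'imageWidth';
```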

There’s a huge gap, I know, between me messing around on my desktop and the full functionality of a preservation system like Preservica. But with exercises like the above, I feel closer to the goal of being able to “ask for what is required”, and more to the point, I could interpret the outputs of this functionality to some degree. I certainly couldn’t do everything myself, but I want to feel that I know enough about what’s happening in those multiple “black boxes” to give me the confidence I need as an archivist that my resources are being preserved correctly.

With my DPTP tutor hat on, I would like to think it’s possible to equip archivists, librarians and data managers with the same degree of confidence; teaching them “just enough” of what is happening in these complex processes, at the same time translating machine-level processes into concrete metaphors that an information professional can grasp and understand. In short, I believe these things are knowable, and archivists should know them. Of course it’s important that the next step is to open a meaningful discussion with the developer, data centre manager, or database engineer (i.e. “ask for what is required”), but it’s also important to keep that dialogue open, to go on asking, to continue understanding what these tools and systems are doing. There is a school of thought that progress in digital preservation can only be made when information professionals and IT experts collaborate more closely, and I would align myself with that.

It’s Open Access Week (hashtag #OAWeek2014) and around the world everyone is talking about the importance of sharing, of re-use and of people having free access to content. Although it started as a movement focused on scholarly publications, Open Access as a concept has made big waves. The move from paper to online has made the possibility of much greater openness attainable. Since the first Open Access Week took place in 2009, the movement has developed to promote the benefits of sharing in academia far beyond scholarly publications, to include research data and teaching and learning resources.

So what role, in all this excitement of sharing and re-use and collaboration, does digital preservation play? A very central one, we would say. Peter Suber’s definition is a good place to start -

“Open-access (OA) literature is digital, online, free of charge, and free of most copyright and licensing restrictions

OA removes price barriers (subscriptions, licensing fees, pay-per-view fees) and permission barriers (most copyright and licensing restrictions)”

- but to keep something digital and online, that something needs to be part of a well-managed digital preservation programme. Putting it out there is only half of the job. Deciding what content stays available, for how long, and how digital content will continue to be accessible over time is fundamental to the ongoing success of the OA movement. Without digital preservation taking place, content can become inaccessible as file formats change, as the hardware needed to view it becomes obsolete, or for any number of other reasons that can damage content or make it unreadable. So, digital preservation has a role in keeping OA content in an open and accessible state after its initial publication.

Digital preservation also has an important role to play before content is published in an OA way. Content is created, and that content needs to be preserved so that it can become open and accessible. If a researcher, for example, has created research data as part of a research project, then written a research paper based upon that data, intending to share their entire research output under Open Access, there is usually a period of time before both are ‘live’ and publicly published. Making sure that all research outputs are managed well from a digital preservation perspective is crucial. Without digital preservation taking place, digital objects can and do become inaccessible. To be able to open up content as Open Access, that content needs, by definition, to be accessible. A desire to share will not overcome such issues as bit rot, file corruption, content that can now only be viewed using software that is no longer available, or any of the other many ways that digital objects can become inaccessible or degrade over time.

The theme of OA Week for 2014 is Generation Open, so this seems like the perfect year to raise awareness of digital preservation and how it supports and underpins the aspirations of the Open Access movement. If you’d like to know more about digital preservation, there are some useful resources available online. We’ve compiled a short list of some key resources, below, which you might find useful.

This blog is a good place to start, and we also run training courses in digital preservation, catering for the beginner with our ‘Introduction to Digital Preservation’ course and for the more experienced practitioner with our ‘Practice of Digital Preservation’ course, running in November and December 2014 respectively.

The Digital Preservation Coalition (DPC) is a membership organisation that supports digital preservation. Their site is a wealth of information on all things digital preservation, including Tech Watch Reports, news, training and even jobs (if you get carried away!), and it is a great starting point. UK-based, they have members from all over the world.

The Open Preservation Foundation (OPF) is another international organisation. They support an open community around digital preservation and have useful information on tools, training, software and community events. Most useful when you have some basic knowledge of the subject.

The SPRUCE Project was a collaboration between the University of Leeds, the British Library, the Digital Preservation Coalition, the London School of Economics, and the Open Preservation Foundation, co-funded by Jisc. The aim was to bring together a community to support digital preservation in the UK. Although the project ended in November 2013, a live wiki brings together the top project outputs (all open, of course), including a Digital Preservation Business Case Toolkit and a community-owned Digital Preservation Tool Registry.

The Digital Curation Centre (DCC) is a centre of expertise in the curation of digital information. This is the go-to place for all your research data preservation needs, with useful case studies, how-to guides and training courses in this area.

For some tips and information on how the ‘big guys’ manage digital preservation, check out the British Library’s digital preservation strategy, which includes some useful links as well as the strategy itself, and ‘Preserving Digital Collections’ from The National Archives has lots of good information on digital preservation, including FAQs.

Enjoy Open Access Week 2014, and remember that sharing starts and ends with good digital preservation!

Today saw the inaugural meeting at ULCC of what we hope will become an ongoing series, intended to complement the successful EPrints UK User Group meeting.

The pow-wow will bring together developers from universities around the UK to learn about the next generation of features and functionality offered by the EPrints repository platform.

The event gave developers a chance to look “under the hood” of EPrints and to better understand how to implement and deploy new features effectively at their own institutions. Developers also discussed how they can actively contribute to the platform by feeding changes and enhancements back to the EPrints GitHub repository.

Team Linter receiving their prizes. Photo by RepoFringe, used under CC

Rory McNicholl, Tim Miles-Board and Steph Taylor attended the Repository Fringe conference in Edinburgh, 30-31 July. Steph gave a presentation on how to turn a repository into a digital archive. The talk used the ART team’s knowledge of both repositories and digital preservation to give UK repository managers some useful guidelines and tips on how they could start to engage with digital preservation. Tim, already an established member of the UK repository community, made the most of the excellent networking opportunities to bring back many interesting leads and contacts for the team.

Rory, meanwhile, was busy not only networking but also creating the winning entry in the Developer Challenge that ran for the two days of the conference. With a brief ‘to do something cool with repositories within the Open Access realm or, even better, something aiding Open Access compliance’, the race was on among participating teams.

Rory worked with two other delegates, Paul Mucur of Altmetric and Richard Wincewicz of EDINA, as part of ‘Team Linter’ to build a tool that not only checks the completeness of repository records but also fills in the gaps. The tool first identified any missing metadata within a record and then used existing services such as SHERPA and CrossRef to suggest information to fill those gaps. It was a great time-saving tool, and allowed a knowing human eye to check the suggestions as they were made. The information was added in cascade style, with each new piece of information then being used to search for more in-depth information, so the record became more detailed and accurate as it progressed through the search. The demo showed how a very detailed record could be created from a journal article title alone. The code can be found on GitHub.
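To give a flavour of that cascade, here is a rough sketch assuming the public CrossRef REST API. The starting record, the field choices and the suggested SHERPA follow-up are simplified illustrations, not Team Linter’s actual code.

```python
# A rough sketch of the cascade idea behind Team Linter, using the public
# CrossRef REST API. Start from a title alone; each recovered field can
# then seed a further lookup in the same cascading way.
import json
import urllib.parse
import urllib.request

def lookup_by_title(title):
    """Ask CrossRef for the best bibliographic match on a title."""
    url = ("https://api.crossref.org/works?rows=1&query.bibliographic="
           + urllib.parse.quote(title))
    with urllib.request.urlopen(url) as response:
        items = json.load(response)["message"]["items"]
    return items[0] if items else None

record = {"title": "Some deposited journal article title"}  # placeholder record
match = lookup_by_title(record["title"])
if match:
    # each new piece of information could drive the next query, e.g. the
    # ISSN against SHERPA for the publisher's open-access policy
    record["doi"] = match.get("DOI")
    record["journal"] = (match.get("container-title") or [None])[0]
    record["issn"] = (match.get("ISSN") or [None])[0]
print(record)
```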

The team faced tough competition from team ‘Are We There Yettt?’ and their tool to alert repository managers when a paper that was deposited before publication is actually published. But the repository managers loved Team Linter just a little bit more, and showed their appreciation by giving them the loudest applause and cheers. It was great to see one of our developers get such public recognition for their skills and knowledge, and a great way to end a very productive conference.

Photo by Todd.vision, used under CC

In June, Richard Davis and Steph Taylor attended the Open Repositories conference in Helsinki. Between them, they gave three presentations on various aspects of the work of the ART team. Richard and Steph wrote a joint presentation on the evolution of the repository landscape, drawing on the many bespoke developments carried out by the ART team’s EPrints developers. Repositories showcased in the presentation included the Linnean Society Herbarium, the Atlantic Archive repository and the SAS Open Journals.

Richard also spoke about the ULCC partnership with Arkivum, and how the ART team developers are linking up Arkivum and EPrints to create a repository that is also a digital archive.

And finally, carrying on the digital preservation theme, Steph gave a short ‘repository rant’ presentation in which she was able to point out (in a rather firm way!) why a repository is not a digital archive. The conference provided a great opportunity to network with repository people from around the world, to learn about their work and to share what we are doing at ULCC.