Engineering and Physical Sciences Research Council (EPSRC) guidelines: What are they and how can you meet them?

After receiving an increasing number of enquiries about the new EPSRC guidelines, which come into effect in May 2015, I decided to catch up with Kevin Ashley, Director, Digital Curation Centre, my colleagues Rory McNicholl and Timothy Miles-Board, and Matthew Addis, CTO at Arkivum, to get a better understanding of what the requirements are and how institutions can meet them.

Q: When looking into the EPSRC guidelines on research funding, I couldn’t help noticing that they are not as tightly defined as I would have expected. Is that deliberate?

KA: I think it is, and there are perfectly valid reasons for it, but some are uncomfortable with that. The flexibility allows for different responses from larger and smaller institutions – you just need to be able to defend what you do.

MA: This is largely due to the variety of research projects and their differing objectives, both in terms of brief and data output. Some might have a mandate to be publicly accessible first, others will focus on the safety and security of the research data before being concerned about dissemination of research findings.

Q: There is a requirement to include an ‘access statement’ in published research papers and for the underlying data to be available for scrutiny by others. How can that be achieved?

KA: Putting the data somewhere that has permanence is key, inside or outside the institution. Fundamentally it isn’t difficult. Many researchers aren’t used to doing this yet, but there is a lot of evidence that whenever papers include links to the data behind them the research gets more widely cited. This is something that benefits the researchers, the research funders and the universities so there are incentives all round.

Kevin Ashley

 

MA: DOIs are a great way to reference data so it’s more easily accessible; for example, a DOI might resolve to a link to an EPrints page which in turn links to the underlying data in storage. But it’s important that these links are persistent and don’t break, causing ‘404 errors’.

RM: The DataCite plugin we developed ensures that Digital Object Identifiers (DOIs) are created. It offers a certain degree of flexibility, either creating DOIs automatically or allowing depositors to choose when a data set should receive a DOI, and it also has ‘sense check’ capabilities.
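As a rough illustration of the DOI-to-landing-page chain described above, the sketch below turns a DOI string into the `https://doi.org/` resolver URL that would be cited in an access statement. It is not the DataCite plugin's code: the function name `doi_to_url`, the loose validity pattern and the DOI `10.1234/example-dataset-001` are all assumptions made for this example.

```python
import re
from urllib.parse import quote

# Loose plausibility check (an assumption for this sketch, not an official rule):
# "10.", a registrant code of 4-9 digits, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI string."""
    doi = doi.strip()
    # Accept the common "doi:" prefix that often appears in citations.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/" intact.
    return "https://doi.org/" + quote(doi, safe="/")


# Hypothetical DOI, used only to show the shape of the result.
print(doi_to_url("doi:10.1234/example-dataset-001"))
```

The point of citing the resolver URL rather than the repository page itself is that the repository can move or be reorganised without the published link changing, provided the DOI record is kept up to date.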

Q: There is an expectation aimed at institutional level to ensure policies and processes are in place and maintained to meet the EPSRC guidelines. How can institutions ensure they are doing the right thing?

KA: This is where EPSRC differs from other funders: by placing a duty on the university and not the researcher, it makes clear that it is the university’s job to support researchers dealing with research data. It’s only been in the last four years that universities have begun producing policies, with Edinburgh, Oxford and Hertfordshire being amongst the first to publish theirs. The main difference we are seeing is between an ‘aspirational policy’, such as Edinburgh’s, which then requires further processes and services to be added later on, and a more mandatory policy which prescribes everything at the outset. Both approaches can work, and which to adopt depends on the individual institution, its structure and culture.

Matthew Addis

TMB: Monitoring access helps institutions understand the impact of data sets and also informs the data retention policy. The latest version of IRstats, for example, has a pretty robust API which can be used to analyse the EPrints logs and produce useful statistics. One of our customers, the London School of Hygiene and Tropical Medicine (LSHTM), uses it internally to show the value of making information available, although in their case this is for their publications repository and not a research data repository.

MA: University of Bristol has produced a case study highlighting key areas which benefited from establishing policies, procedures and internal awareness – increased data sharing, improved RDM skills and awareness, better funding applications, and improved efficiency and cost-savings – with an increase in grant funding being just one of the compelling figures.

Q: There is a big focus on ‘metadata for discovery’. Why is that important?

KA: Because if you can’t discover that data exists you are unlikely to reuse it, and reuse is the ultimate goal. This is another area where some people are requesting more prescriptive guidelines than exist at present. It’s not about following ONE prescribed standard but being able to ‘defend’ the standard you chose if challenged. Many scientific disciplines already have well established standards and it wouldn’t be feasible to impose a one-size-fits-all.

RM: The EPrints Re-collect plugin, developed by the University of Essex and Jisc, defined a so-called metadata profile. Together with the University of East London we extended the metadata that is collected to include pointers to the research publications that use the research data, which again helps improve and evidence the benefits of sharing data and making it discoverable.

MA: Access supports repeatable and verifiable science, meaning research results derived from data can be scrutinised. But this isn’t just about descriptive metadata – it includes knowing what formats data is in and that these formats are usable. Arkivum now has an integration with Archivematica, a tool that will automatically do file format analysis, metadata extraction and format conversions.

Q: Data needs to be preserved for a minimum of 10 years from last access or creation, which seems a very long time.

KA: This is certainly the one that seems to worry most of the IT managers. It strikes me as a fear of the unknown – how much storage will this require? – but the DCC have developed tools to make it easier to estimate the scale of the problem. It’s important to understand that the EPSRC doesn’t expect all data, and all working versions, of research projects to be preserved. The DCC has produced guidance on choosing what to keep. Comparing your research data storage needs with all other ‘business as usual’ data is a valuable exercise; the former can be a fraction of the latter.

MA: It’s being able to ensure that if the data is being used by people, i.e. it’s useful, then it remains available to the community. With the Arkivum service we support long-term access to data, including ongoing data integrity and authenticity with predictable costs.
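To make the ten-year figure concrete, here is a minimal sketch of how a repository might work out the earliest date at which a data set could come up for a retention review. It assumes, following the wording of the question above, that the clock starts at creation and restarts whenever the data is accessed; the function names and the handling of 29 February are choices made for this example, not part of the EPSRC guidelines.

```python
from datetime import date
from typing import Optional

RETENTION_YEARS = 10  # minimum period quoted in the question above


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return d.replace(year=d.year + years, day=28)


def earliest_review_date(created: date, last_access: Optional[date] = None) -> date:
    """Earliest date the data set could be reviewed for disposal.

    Assumption: the period runs from creation, and restarts at each access.
    """
    start = max(created, last_access) if last_access else created
    return add_years(start, RETENTION_YEARS)


# A data set created in May 2015 and last accessed in March 2018.
print(earliest_review_date(date(2015, 5, 1), date(2018, 3, 10)))  # 2028-03-10
```

Because each access pushes the date back, data that people keep using is never due for review, which matches the point that useful data should remain available to the community.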

Q: Data curation and life cycle

KA: Although this is likely to be the most difficult area for institutions to be 100% compliant from the outset, most are already doing something towards meeting this requirement. The key consideration is to have processes and support in place to ensure that data curation issues are being considered and addressed at the outset of a research project, rather than once the research has concluded.

MA: Curation is about the usability of data, especially by those who didn’t create it in the first place. Much of this is simply Good Research Practice and should be a normal part of doing research. But the job never stops, especially when dealing with the challenges of on-going digital preservation. This is where services from ULCC, Arkivum and others can really help. Researchers can get on and do what they are good at – research – and they don’t need to be experts at digital preservation. Institutions still take responsibility, but delegate execution to service providers, and can use a range of ways to establish trust, for example reputation, due diligence, ISO 27001, Janet Frameworks, ISO 9000 and so on.

Interview conducted by Frank Steiner, Marketing Manager at ULCC.

To find out more about the EPSRC guidelines and how we can help your institution meet them, please register for our upcoming webinar ‘Helping you meet the EPSRC guidelines’ which takes place on Thursday, 5th February 2015.

Last week we launched our digital preservation advent tweets, a series of 24 tweets throughout the run up to Christmas. If you missed any, you can catch up via our Storify of the first 7 days.  We hope you’re enjoying this small celebration of all things digital preservation, and that you will share the links on Twitter if you find something useful. Thanks to all the people and organisations who have inspired our tweets so far, and here’s to the next 17 days!

Image from the British Library on Flickr

This year, we are getting into the festive spirit. Starting on the 1st of December, we will be sending out an Advent tweet per day, up to Christmas Eve, using the #24DaysofDP hashtag. We’ll tweet links to some things we’ve been doing this year, some past projects and some new ideas. We hope our tweets will inspire you to engage with digital preservation. We also hope to give you some help with maintaining your interest via courses, presentations, books, tools and some useful organisations. Look out for our daily tweets, and join us in a 24-day long celebration of all things digital preservation for Christmas 2014…

Photograph by Krista Hennebury https://www.flickr.com/photos/bluecottage/

Our award-winning Digital Preservation Training Programme now offers a Beginner Course and a Practitioner Course. If you’re not sure which Course is right for you, see the tables below. These new courses have been designed to meet the needs of the community, so anyone working in archives, libraries, museums, or information management will benefit. Whether a records manager or a digital librarian, working in a commercial business or a County Archives, you will be welcome on the DPTP.

We’ve made this change to bring our work more closely in line with the emerging Curriculum Framework for Digital Curation which is being designed by the good people at DigCurV.

Are you a beginner or a practitioner?

Beginners…

  • Want to learn about the OAIS Model
  • Are just starting out with digital preservation where they work
  • Want to hear about tools and systems they can use to do it
  • Would like to know what we mean by file formats, metadata, managed storage, and migration

Practitioners…

  • Are familiar with the OAIS Model
  • Are already doing some form of digital preservation where they work
  • Want to get started working with more challenging objects (e.g. email or databases)
  • Would like to understand how file formats work, how to implement METS and PREMIS, and develop their strategy

What’s the difference between the courses?

The Beginner course…

  • Entry-level, two days long
  • Aimed at complete beginners who wish to learn more about the field
  • Ideal for starters in all disciplines who want to know more about digital preservation
  • Focussed on raising awareness on the main subject areas and trends around digital preservation

The Practitioner Course…

  • Intermediate, three days long
  • Aimed at practitioners who wish to apply more practical solutions to their content
  • Aims to be extremely practical
  • Imparts knowledge of relevant terminology, business processes and tools
  • Teaches detailed understanding of technical issues
  • Suggests approaches to preserving difficult objects
  • Teaches aspects of strategic planning

There is, however, a little overlap between the content of the two courses. Please see the course descriptions for more information.

What else are we working on?

One-day or half-day workshops, which we call “masterclasses”, are planned for 2015. The first of these will probably cover the topic of web-archiving. We have sometimes featured this subject on the course, and invariably found there was not enough time to teach it properly, nor to answer all the questions that it raised in the minds of students. Look out for the DPTP masterclass in web-archiving sometime around May or June 2015.

Dates for 2015

  • 19-20 January 2015 (2 day Beginner)
  • 16-18 March 2015 (3 day Practitioner)
  • 21-23 September 2015 (3 day Practitioner)
  • 26-27 October 2015 (2 day Beginner)
  • 7-9 December 2015 (3 day Practitioner)

Learn more

Beginner Course

Practitioner Course

Book a place

ULCC Store

Enquiries

Email Us

Photo by James Jordan https://www.flickr.com/photos/jamesjordan/

This one-day event on 31 October 2014 was organised by the DPC. The day concluded with a roundtable discussion, featuring a panel of the speakers and taking questions from the floor. The level of engagement from delegates throughout the event was clearly shown in the interesting questions posed to the panel, the thoughtful responses and the buzz of general discussion in this session. Among many interesting topics covered, three stand out as typical of the breadth of knowledge and interest shown at the event.

First, a fundamental question about the explosion of digital content and how it will impact on our work. How can we keep all of this stuff, where will we put it, and how much will it really cost? Sarah Middleton urged us to attend the upcoming 4C Conference in London to hear discussion of cutting-edge ideas about large-scale storage approaches. Catherine Hardman reminded us of one of the most obvious archival skills, which we sometimes tend to forget: selection. We do not have to keep “everything”, and a well-formulated selection policy continues to be an effective way to target the preservation of the most meaningful digital resources.

Next, a question on copyright and IPR as it applies to archives and archivists, and hence to digital preservation, quickly spun out into the audience and back to different panel members in a lively discussion. The general inability of the current legislation, formed in a world of print, to deal with the digital reality of today was quickly identified as an obstacle to both those engaged in digital preservation and to users seeking access to digital resources.

The Hargreaves report was mentioned (by Ed Pinsent of ULCC) and given an approving nod for the sensible approach it took to bringing legislation into the 21st century. However, the slow speed with which any change has actually been implemented was of concern for all, and was felt to be damaging to the need to preserve material. The issues around copyright and IPR were knowledgeably discussed from a wide variety of perspectives, including the cultural heritage sector, specialist collections, archaeological data and resources and, equally important among delegates, the inability to fully open up collections to users in order to comply with the law as it stands.

Some hope was found, though, in the recent (and ongoing) Free Our History campaign. Using the national and international awareness of various exhibitions, broadcasts and events to mark the anniversary of the First World War, the campaign has focussed on the WW1 content that museums, libraries and archives are unable to display because of current copyright law. Led by the National Library of Scotland, other memory institutions and many cultural heritage institutions have joined in the CILIP campaign to prominently exhibit a blank piece of paper. The blank page represents the many items which cannot be publicly displayed. The visual impact of such displays has caught attention, and the accompanying petition is currently being addressed by the UK government.

The third issue raised during this session was the suggestion for more community activity, for example more networking and exchange of experience opportunities. Given the high rate of networking during lunchtime and breaks, not to mention the lively discussions and questions, this was greeted with enthusiasm. Kurt Helfrich from RIBA explained his idea for an informal group to organise site visits and exchange of experience sessions among themselves, perhaps based in London to start off with. Judging by the level of interest among delegates to share their own work and learn from others during this day, this would be really useful to many. Leaving the event with positive plans for practical action felt a very fitting way to end an event around making progress in digital preservation.

The above was authored mostly by Steph Taylor, ULCC

Download the slides from this event

Photo by Max Ross https://www.flickr.com/photos/max-design/

This one-day event on 31 October 2014 was organised by the DPC. After lunch Sarah Middleton of the DPC reported on progress from the 4C Project on the costs of curation. The big problem facing the digital preservation community is that the huge volumes of data we are expected to manage are increasing dramatically, yet our budgets are shrinking. Any investment we make must be strategic and highly targeted, and collaboration with others will be pretty much an essential feature of the future. To assist with this, the 4C project has built the Curation Exchange platform, which will allow participating institutions to share – anonymised, of course – financial data in a way that will enable the comparison of costs. The 4C project has worked very hard to advance us beyond the simple “costs model” paradigm, and this dynamic interactive tool will be a big step in the right direction.

William Kilbride then described the certification landscape, mentioning Trusted Digital Repositories, compliance with the OAIS Model, the Trusted Repositories Audit & Certification checklist, and the evolution of the European standards DIN 31644 and the Data Seal of Approval. William gave his personal endorsement to the Data Seal of Approval approach (it has been completed by 36 organisations, and another 30 are in the process of completing it), and suggested that we all try an exercise to see how many of the 16 elements we felt we could comply with. After ten minutes, a common lament was “there are things here beyond my control…I can’t influence my depositors!”

William went on to discuss tools for digital preservation. Very coincidentally, he had just participated in the DPC collaborative “book sprint” event for the upcoming new DPC Handbook, and helped to write a chapter on this very topic. Guess what? There are now more tools for digital preservation than we know what to do with. The huge proliferation of devices we can use, for everything from ingest to migration to access, has developed into a situation where we can hardly find them any more, let alone use them. William pins his hopes on the Tools Registry COPTR, the user-driven wiki with brief descriptions of the functionality and purpose of hundreds of tools – but COPTR is just one of many such registries. The field is crowded out with competitors such as the APARSEN Tool Repository, DCH-RP, the Library of Congress, DCEX…ironically, we may soon need a “registry of tool registries”.

Our host James Mortlock described the commercial route his firm had taken in building a bespoke digital repository and cataloguing tool. His project management process showed him just how requirements can evolve in the lifetime of a project – what they built was not what they first envisaged, but through the process they came up with stronger ideas about how to access content.

Kurt Helfrich’s challenge was not only to unify a number of diverse web services and systems at RIBA, but also to create a seamless entity in the Cloud that could meet multiple requirements. RIBA is in a unique position to work on system platforms and their development, because of its strategic partnership with the V&A, a partner organisation with whom it even shares some office space. The problem he faces is not just scattered teams, but one of mixed content – library and archive materials in various states of completion regarding their digitisation or cataloguing. Among his solutions, he trialled the Archivists’ Toolkit, which served him so well in California, and the open-source application Archivematica, with an attached AtoM catalogue and DuraCloud storage service. A keen adopter of tools, Kurt proposed that we look at the POWRR tool grid, which is especially suitable for small organisations, and BitCurator, the digital forensics system from Chapel Hill.

Download the slides from this event

Report concludes in part three