Digital Archives

Your friendly neighborhood Digital Archives team

In May, the University of London Communications Office invited us to answer a few questions about ourselves for an Intranet article. We thought we’d reproduce some of our answers here, for the benefit of anyone else who wants a quick introduction to what we do (and ourselves, in case we forget).


ULCC Digital Archives & Repositories Team, May 2011

Please introduce yourself
We are ULCC’s Digital Archives & Repositories Team, and you can generally find us in the basement of Senate House along with the rest of ULCC. We are five:

  • Ed Pinsent and Patricia Sleeman, our digital archivists, provide digital preservation training and consultancy.
  • Rory McNicholl is our lead developer and repository systems manager.
  • Silvia Arango-Docio is project officer for Web archiving projects and repository support.
  • Richard Davis is team leader, and actively contributes to all of the team’s projects, as well as pursuing new and interesting opportunities.

What does your team do?
Our department was created in 1997 to develop the National Digital Archive of Datasets (NDAD), and operate it for The National Archives. NDAD was originally a joint project of ULCC and Senate House Libraries, providing a specialised cataloguing and preservation service for government databases.

Since then we’ve worked on many digital archive and library projects. The NDAD service ceased in 2010, but we continue working on innovative projects for born-digital and digitised records, developing repositories and related information systems for education and research. We provide specialist training and consultancy for the HE and cultural heritage sectors through the highly-acclaimed Digital Preservation Training Programme (DPTP).

Our current or recent partners and customers include most of the University’s colleges and institutes, the British Library, the Parliamentary Archives, the JISC, the UK Web Archiving Consortium, and the European Commission. We work closely with the Digital Preservation Coalition (DPC), and local groups such as the AIM25 archives network and the SHERPA-LEAP consortium of University of London repository managers.

Tell us one thing about your work that colleagues might not know.
Digital preservation issues might seem esoteric, but they are fundamental to every electronic and computer-based activity – in other words, pretty much everything in business, education and research. Understanding the best ways to manage, describe and preserve all our electronic information – from a single email, to complex Web sites, or collections of digital photos or videos – is an essential 21st century skill.

What aspect of your work gives you the most satisfaction?
We are very lucky to work closely with information professionals from around the UK and internationally. Patricia has recently delivered preservation workshops at the National Library of Jordan; Ed and Silvia are working on Web archiving projects for the JISC and the EU; Richard and Rory won the annual JISC Developer Challenge at last year’s Open Repositories conference in Madrid.

What is the most challenging part of your job?
Keeping up to date with the constantly changing ICT landscape in education, libraries and archives. Electronic information systems evolve continually, as do the tools and methods for managing them. Luckily there is a wealth of current information on the Web, if you know where to look! We actively share our thoughts and experiences through our long-running Digital Archives Blog and on Twitter.

Name one thing that would make your working environment better.
Sometimes it seems we work with everyone except colleagues in Senate House! It would be great to work more with colleagues at the Central University, and bring more of our accumulated experience to bear on some of the University’s information and records management challenges.

If you could meet anyone (dead or alive) who would it be and why?
We’re not going to agree on this any time soon. Suggestions so far include Sophia Loren, Hypatia of Alexandria, Richard Brautigan and Pete Townshend.

Name three hobbies team members pursue – the more unusual the better.
Most of us have small children, so hobbies take a back seat. Ed is an accomplished artist, writer, broadcaster, musician and samizdat publisher. Patricia is about to learn how to stain glass before starting Arabic lessons for the third time.

If, as a group, you stormed the musical charts, what genre of music would that be in, what would be your stage name and who would be your lead vocalist?
We could be a psychedelic beat group called the Dates Of Creation. Our lead singer would have to be Patricia.

To have and to hold

Susan Corrigall (NRS), Patricia Sleeman & Ed Pinsent (ULCC).

We gave a two-day in-house workshop to the National Archives of Scotland last week. As Ed Pinsent has noted in his post about legal admissibility, our timing was quite interesting: we arrived on the Monday of the week after NAS had merged with the General Register Office to become the National Records of Scotland. Interesting times for all, and while we couldn’t have known in 2010 what a pivotal day it would be for NAS, we hoped our training session on digital preservation would be in some way timely.

The 20 staff attending varied widely in their responsibilities and roles in the NAS/NRS; wisely, however, it was felt that all needed an entry-level awareness of the issues relating to digital preservation. NAS has been engaged for some time in the management of its digital resources and is looking at cost-effective ways of doing so. Staff are being skilled up in digital preservation; in this way NAS are ‘maximising their current resource potential’, to use a buzz word.

Working with a group from one organisation in situ is quite different for us, as normally the DPTP is an open course which sees a variety of delegates hailing from many very different organisations. As a result we cover many topics to give a good grounding in what we feel are the most important aspects of digital preservation, while trying to accommodate varying levels of expertise and knowledge. This type of course is successful in many ways – it can often lift spirits, particularly for people working in a solitary environment, to see how much they have in common in terms of digital preservation and its challenges. The social aspect of the course is important, as people recognise that digital preservation affects all communities involved in any kind of long-term management of digital content. A lot of what we aspire to do is build people's confidence to tackle the issue.

Working in an organisation that has commissioned the DPTP is quite a different experience. We see the coming together of a group already known to each other to a greater or lesser degree. However, very often they come from different sections, and this ‘time out’ together should be of value in many ways: as a collective learning opportunity, as socialisation and, in a time of change, as an opportunity for supporting each other.

We also get the chance during a class project to investigate a particular issue of note for the organisation. This hopefully enables the group to leave with a finished product. In the ‘open to all’ DPTP we want our group to leave with a better understanding of everything, but more precisely to have gone some way towards applying theory to real-life work situations through our classwork. To leave the course with something immediately implementable within the workplace is, we feel, important. Our class session at NAS/NRS remains confidential, but we thoroughly enjoyed it and felt that it translated a lot of what was hitherto theory into an enhancement of an existing practice to enable improved digital object management.

Interesting times for all, and I feel it was perhaps timely that NAS are skilling up their staff to embark on digital preservation now. Digital preservation is a concern for all government departments, but who can manage it? Archivists have a good set of tools and concepts from their profession which translate well to digital preservation. In fact OAIS tells many archivists nothing new – they are doing most of it already in the analogue world.

Thus archivists hold one major key on the large key ring we need to unlock digital preservation.

Semantic Analysis of AIM25 EAD

From the AIM25 OMP project blog.

Rory and I met with Richard Gartner and Gareth Knight at CeRch today, to catch up with their investigations into using GATE and OpenCalais to process the EAD outputs from AIM25.

Results look very encouraging. OpenCalais, in particular, generates a post-processing set of identified entities (personal names, place names, corporate names); Richard G has then written regular expressions to locate these in the body of the EAD and wrap them in appropriate EAD tags (<persname> etc).
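As a rough illustration of the wrapping step, here is a minimal Python sketch. It is not Richard G's actual code: the function name, the entity type labels and the input format (a list of name/type pairs, as an entity extractor might return) are all assumptions made for the example. Only the EAD element names (<persname>, <geogname>, <corpname>) are real.

```python
import re

# Hypothetical mapping from entity type labels to EAD element names.
EAD_TAGS = {"person": "persname", "place": "geogname", "organisation": "corpname"}


def wrap_entities(text, entities):
    """Wrap each identified entity found in `text` in its EAD tag.

    `entities` is a list of (surface_form, entity_type) pairs; this input
    format is an assumption, not the real OpenCalais output.
    """
    if not entities:
        return text
    # Longest names first, so "Senate House Library" wins over "Senate House".
    ordered = sorted(entities, key=lambda e: len(e[0]), reverse=True)
    lookup = {name: EAD_TAGS[etype] for name, etype in ordered}
    # A single alternation pattern means each stretch of text matches once only.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name, _ in ordered) + r")\b"
    )

    def tag_match(match):
        tag = lookup[match.group(1)]
        return f"<{tag}>{match.group(1)}</{tag}>"

    return pattern.sub(tag_match, text)


print(wrap_entities(
    "Letters from Ada Lovelace, written in London.",
    [("Ada Lovelace", "person"), ("London", "place")],
))
```

The sketch does not check whether a name is already inside an EAD tag, so running it twice over the same text would nest the markup; a real implementation would need to guard against that.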

This suggests that the way forward for enhancing the existing data entry processes for AIM25 will involve dispatching the EAD-compliant data entered by a collections manager to OpenCalais, and returning the data, with enhanced markup, for checking by the submitter. This hook should be easy enough to insert for manual, form-based entry; for batch entry processes we will need to assess whether any significant delays are introduced.

We've also started to consider ideas for a URI scheme for the entities identified. Our current working hypothesis is that this will involve defining a "data" namespace for AIM25, binding to http://data.aim25.ac.uk/. Within that we can develop a structure along the lines /person, /place, /corporate_body, and append our unique IDs for each entity. Further research is necessary, particularly into the Cabinet Office recommendations for Designing URI Sets for the UK Public Sector.
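To make the working hypothesis concrete, a URI for an entity might be minted along these lines. This is only a sketch: the function name and the example ID are invented for illustration, and the final scheme may well differ once the Cabinet Office guidance has been reviewed.

```python
from urllib.parse import quote

# Namespace and path segments taken from the working hypothesis above.
BASE = "http://data.aim25.ac.uk/"
ENTITY_PATHS = {"person", "place", "corporate_body"}


def mint_uri(entity_type, entity_id):
    """Build a URI of the form http://data.aim25.ac.uk/<type>/<id>."""
    if entity_type not in ENTITY_PATHS:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    # Percent-encode the ID so that spaces or slashes cannot break the path.
    return f"{BASE}{entity_type}/{quote(str(entity_id), safe='')}"


print(mint_uri("person", 1234))  # 1234 is a made-up ID
```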

These URIs can then be used in identifier attributes for our EAD elements (<persname>, etc.), and thence easily transformed into an RDFa format for the Web-based HTML rendering of the AIM25 catalogues.
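The transformation to RDFa could look something like the sketch below. Several things here are assumptions rather than decisions: we have not settled which EAD attribute will carry the URI (authfilenumber is used as a stand-in), the vocabulary terms (foaf:Person, rdfs:label and so on) are placeholders, and the enclosing HTML page would need to declare the matching prefixes.

```python
import xml.etree.ElementTree as ET
from html import escape

# Placeholder mapping from EAD element names to RDF types (an assumption).
RDF_TYPES = {
    "persname": "foaf:Person",
    "corpname": "foaf:Organization",
    "geogname": "dcterms:Location",
}


def ead_to_rdfa(ead_fragment):
    """Render one EAD <p> element as an HTML paragraph with RDFa spans."""
    root = ET.fromstring(ead_fragment)
    parts = [escape(root.text or "")]
    for child in root:
        uri = child.get("authfilenumber")  # stand-in attribute for the URI
        name = escape(child.text or "")
        rdf_type = RDF_TYPES.get(child.tag)
        if uri and rdf_type:
            parts.append(
                f'<span about="{escape(uri)}" typeof="{rdf_type}">'
                f'<span property="rdfs:label">{name}</span></span>'
            )
        else:
            # No URI or unknown element: keep the plain text only.
            parts.append(name)
        # Text that follows the child element within the paragraph.
        parts.append(escape(child.tail or ""))
    return "<p>" + "".join(parts) + "</p>"


print(ead_to_rdfa(
    '<p>Papers of <persname authfilenumber='
    '"http://data.aim25.ac.uk/person/1234">Ada Lovelace</persname>.</p>'
))
```

In practice this step would more likely be done in XSLT alongside the existing EAD-to-HTML rendering; the Python version is just to show the shape of the output.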

Next steps include further investigating how to implement and assert relationships between our entities and other open datasets (e.g. our_entity is_the_same_as your_entity), and how to make the authority data, duly marked up, available as open metadata.
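One common way to express "is the same as" in linked data is the owl:sameAs property. Whether that is the right predicate for AIM25 is still an open question, so treat the following as a sketch only; the function name and both example URIs are made up.

```python
# The standard OWL property for asserting that two URIs identify the same thing.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(our_uri, their_uri):
    """Return one N-Triples statement linking our entity to an external one."""
    return f"<{our_uri}> <{OWL_SAME_AS}> <{their_uri}> ."


print(same_as_triple(
    "http://data.aim25.ac.uk/person/1234",  # hypothetical AIM25 entity
    "http://example.org/id/5678",           # hypothetical external entity
))
```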

Rory and I can now start to consider suitable approaches to embedding this in our development copy of the existing AIM25 system, and we'll continue to liaise closely with CeRch for advice on the relative merits of GATE and OpenCalais processing, and guidance on URI implementation.