Repositories

Open Repositories 2011 (Part 2): The Developer Challenge

Excitement at the OR11 Developer Challenge Show-and-Tell (Photo by @sparrowbarley)

An event that asked developers to demonstrate the Future of Repositories can only be considered a great success when it receives entries that include:

  • Multiple real-time examples of using “Repositories As A Service (RaaS)”, not only exchanging data but also sharing sophisticated functionality between EPrints and DSpace – and even including an Android application
  • A tool for bundling and depositing a whole raft of research related outputs from the Web via RDF
  • A tactile repository search interface with dynamic search suggestions, specifically designed for tablets and smartphones
  • A complete gesture and voice-driven system for depositing and searching in repositories

All these – and other great entries too – were achieved in a couple of days’ work during the course of the conference, for the annual OR Developer Challenge, and presented at a packed Show-and-Tell session on Thursday afternoon (true, there was free beer).

Stuart Lewis’s team were worthy winners with their RaaS project, particularly as they showed a genuine commitment to a cross-platform approach – something which, sensibly, pushes the individual software platforms (which often receive too much attention) into the background, and focuses on the Repository as an application and entity in its own right.

We were also really pleased to see a prize go to Patrick McSweeney and Matt Taylor, and enjoyed seeing Dave Tarrant steal the show (again) with his live demonstration of using a Microsoft Xbox Kinect to submit items to a repository.

Our own entry may not have won, but several people liked it, and you may see more of it in future. For the second year running, the Developer Challenge was a great opportunity for Rory and me to concentrate on an idea that we’ve been kicking around, without having found a home for it in existing work (yet). This was also true of the Semantic Metadata popup tools with which we won the challenge last year.


Statistically relevant

From the SHERPA-LEAP blog.

Over the last year or so we’ve installed and configured (in some cases reconfigured) the IRStats package for several of the LEAP repositories, including those hosted by ULCC. It seemed a good moment to share a few thoughts about the process of getting “all statted up” with EPrints.

By default, and without any further action, IRStats provides a kind of smorgasbord control panel, demonstrating the many optional graphs, charts and lists available. You can see an example on our own ULCC Publications repository.

More recently we’ve seen growing demand among repository managers to share data on downloads with both their depositors and users at large. It’s really important for repository managers to select carefully which statistics views they actually want or need to display – we can only suggest things we think might work. Once you’ve decided on the views you want, we can look at the most effective ways to display them, and this is why I’ve been having fun souping up some of the displays already offered by IRStats.

The first display we’ve been working on is the Statistics digest. Digests like this are common enough, and we’ve used the example of the UCL Discovery repository as the basis of our work for both SAS-Space and the SOAS institutional repository.

The second approach has been to re-style the IRStats “dashboard” view to lay the graphs on top of each other and then use some JavaScript to handle the tabbed navigation. This seemed a more elegant approach than inserting lots of charts in the abstract page itself (as, for example, at ECS EPrints). I’ve used this technique to display statistics for individual eprints for the School of Pharmacy, as well as SAS and SOAS.

IRStats on School of Pharmacy EPrints

The tabbed display of graphs and tables was also combined with a ‘modal box’ display that keeps the height of the page the same (for example, on this Abstract page at SOAS). At the bottom of the Abstract page I’ve added a statistics section showing the number of full-text downloads, and a link that displays detailed stats in an overlaid box.

This method doesn’t just work for individual items, but can be used on other datasets too. For example, on SAS-Space we have added it to their Collection browse pages, so that at the bottom of each Collection view there is an opportunity to view download statistics for that collection as a whole.

Additionally in SAS-Space, since it is a repository for a number of discrete institutes, there was a requirement for institutional editors to have access to their own institute’s statistics. To achieve this, I allowed access to a constrained version of the IRStats control panel for editor-users who had the appropriate editorial permissions for the institute in question. (Unless you are a SAS-Space editor, you won’t be able to access this.)

Which statistics views to insert as tabs is the decision of the repository manager. Views we’ve used include:

  • Monthly downloads
  • Daily downloads
  • Unique visitors
  • Referrers
  • Search Engines
  • Top 10 items downloaded (only for a Collection, Repository or Division)
  • Top 10 search terms

From a technical point of view, we will have to review these configurations when we upgrade to EPrints version 3.3, possibly later in the year (if it’s released!!), in conjunction with our VM infrastructure migration, and start doing things with EPStats rather than IRStats. But we now have an effective framework for adding statistics quickly to any EPrints installation.

Handy Hints: MIME-Types

From the SHERPA-LEAP blog.

Some repositories have reported issues with Microsoft “DOCX” files, which IE8 in particular may treat as a ZIP file. This is a potential problem with all the current slew of MS file types. The solution is to add the following entries to your web server configuration.

Extension MIME Type
.xlsx application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
.xltx application/vnd.openxmlformats-officedocument.spreadsheetml.template
.potx application/vnd.openxmlformats-officedocument.presentationml.template
.ppsx application/vnd.openxmlformats-officedocument.presentationml.slideshow
.pptx application/vnd.openxmlformats-officedocument.presentationml.presentation
.sldx application/vnd.openxmlformats-officedocument.presentationml.slide
.docx application/vnd.openxmlformats-officedocument.wordprocessingml.document
.dotx application/vnd.openxmlformats-officedocument.wordprocessingml.template
.xlam application/vnd.ms-excel.addin.macroEnabled.12
.xlsb application/vnd.ms-excel.sheet.binary.macroEnabled.12

Exactly how you (or more likely your system manager) achieve this depends on your Web platform (e.g. Apache, Tomcat, IIS) but whoever runs it should be able to make the necessary changes, and once the Web server is restarted, the new types should be picked up. (We’ve just done this for the ULCC-hosted repositories.)
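For Apache specifically, each row of the table maps onto one `AddType` directive. As a rough sketch, the short Python script below prints those directives from the same extension/type pairs; the names `OOXML_TYPES` and `apache_addtype_lines` are made up for this illustration, and where the output should be pasted (main server config, a virtual host, or similar) is an assumption that depends on your own setup.

```python
# Extension -> MIME type pairs, copied from the table above.
OOXML_TYPES = {
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".xltx": "application/vnd.openxmlformats-officedocument.spreadsheetml.template",
    ".potx": "application/vnd.openxmlformats-officedocument.presentationml.template",
    ".ppsx": "application/vnd.openxmlformats-officedocument.presentationml.slideshow",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".sldx": "application/vnd.openxmlformats-officedocument.presentationml.slide",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".dotx": "application/vnd.openxmlformats-officedocument.wordprocessingml.template",
    ".xlam": "application/vnd.ms-excel.addin.macroEnabled.12",
    ".xlsb": "application/vnd.ms-excel.sheet.binary.macroEnabled.12",
}


def apache_addtype_lines(types):
    """Return one Apache 'AddType <mime-type> <extension>' line per entry."""
    return [f"AddType {mime} {ext}" for ext, mime in types.items()]


# Print the directives so they can be reviewed before going into the config.
print("\n".join(apache_addtype_lines(OOXML_TYPES)))
```

Other platforms have their own syntax for the same mapping, so treat this as an Apache-only example.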

“MIME-Types” have a long and chequered history as a way of identifying file types to internet applications. To some extent IE8 is correct to infer (in the absence of better information from the Web server) that .docx files are ZIP files, because MS Office Open XML formats are packaged as ZIP archives. But in general what one really wants the browser to do is pass the file to an Office application, not WinZip.
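To see why a browser guessing from content alone lands on “ZIP”, here is a small Python sketch. It builds a tiny stand-in archive (not a real Word document; the file names inside are only illustrative) and checks it the way a content sniffer might, by looking at the leading bytes. The helper name `looks_like_zip` is invented for this example, and we are not claiming this is how IE8 itself does it.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """True if the bytes start with the ZIP local-file-header signature."""
    return data[:4] == b"PK\x03\x04"


# Build a minimal stand-in for an OOXML package in memory.
# A real .docx holds many more parts; this only mimics the ZIP container.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document/>")

payload = buffer.getvalue()

# Both checks see nothing but a ZIP archive: nothing in the first bytes
# says "Word document", which is why the server-supplied type matters.
print(looks_like_zip(payload))
print(zipfile.is_zipfile(io.BytesIO(payload)))
```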

Ironically, it seems other browsers do correctly infer MS OOXML file types.

Synergies abound

Some days it all seems worthwhile and last Friday was such a day. I spent most of it at SOAS listening to accounts of the many digitisation projects of the Centre for Digital Africa, Asia and the Middle East (CeDAAME), including the Fürer-Haimendorf photographic collection, Islamic manuscripts (in partnership with Yale) and other justly named “Treasures of SOAS”. What Malcolm, Stuart, Julie and the rest of the SOAS team have achieved is extremely impressive. And of course I was also there to admire the fantastic work Rory has done making an attractive and accessible online showcase for them out of EPrints. (There are some rough edges still to polish, but by-Friday was a tough deadline! ;) )

Friday’s CeDAAME dissemination event was also an opportunity to be reminded that ULCC’s Digital Archives team has contributed in other ways to the success of SOAS’s team, directly and indirectly. Julie Makinson described how SOAS used the AIDA digital asset assessment toolkit in developing their strategic approach; and many of the SOAS team are alumni of the DPTP: so Ed and Patricia have also had their part to play in supporting SOAS’s digitisation efforts.

The presentations at SOAS were extremely interesting, describing the full range of activities of a multi-faceted digitisation programme, from the development of the strategy (using the aforementioned AIDA) to the many challenges of digitising Islamic manuscripts and related materials.

How, for example, do you reliably OCR pages of centuries-old text with mixtures of Arabic and Latin/English/French? The answer is that sometimes rekeying is unavoidable. We learned, too, that Yale used UKOLN’s DC Dot Dublin Core editor to create their metadata for Islamic collections (and then convert to TEI). Thanks to the native DC and Unicode support in EPrints, SOAS metadata (in English and Arabic) was created and managed directly in the repository. Metadata exchange between Yale’s Fedora-based system and SOAS’s EPrints system seems to have been achieved effectively – I know Rory worked closely with SOAS and Yale on this.

And I sensed genuine excitement in the room when the page-turning interfaces for viewing the books online were unveiled: both very impressive. (For SOAS Rory has been working long and hard on adapting the open source book viewer used by the Internet Archive, and ensuring that the right-to-left reading and page-turning functionality works smoothly.) We also learned about a variety of different approaches to the issues of managing and funding digitisation and cataloguing activities: with my work on the Mediawiki-based Transcribe Bentham project in mind, it was particularly interesting to hear about University of Michigan’s Collaborative Cataloguing initiative.

All in all an exciting day, and particularly satisfying to see close-up the kind of synergies that exist across all of the activities of ULCC’s Digital Archives and Repositories Team. In addition to further enhancing the SOAS Digital Archives system, we are also looking forward to working with them on their JISC-funded Engaging Overseas Communities project, which is going to involve hooking EPrints up to mobile phones in Africa and Asia.

As if that wasn’t enough, at lunchtime I also dashed over to the School of Pharmacy, where Jean, Neroli and Michelle had kindly organised a lunchtime meeting for the University of London repository managers in the LEAP consortium. It was an opportunity for me to unveil a preview of the new SHERPA-LEAP website (with added social networking goodness, courtesy of WordPress/BuddyPress) that we expect to launch very shortly.

It was a nice way to round off a week in which the Team also achieved significant milestones in our work on preservation metadata for the Parliamentary Archives and strategic development for The Women’s Library, began planning for the next DPTP course, and received news that the FP7 BlogForever project, which will see us collaborating with Warwick, HATII, CERN and others until 2013, has received its final sign-off from the European Commission.

Doing It Differently In Sheffield Cathedral!


It was great to take part in last week’s Repositories Support Project event at Sheffield Cathedral. The theme of the day, organised by Jackie Wickham and the RSP team, was “Doing It Differently” and it covered a wide range of repository-related themes. I took along an updated and expanded version of the presentation I made to SHERPA-LEAP repository managers. I covered the same topics, but in preparing the presentation, I was amazed how many more things there were to talk about a year on.

Stephanie Taylor gave an excellent overview of the repository scene, and I hope I followed it up with useful ideas about making repositories more user-friendly or just generally useful to users. Other talks went off into less well trodden areas, though no less interesting: Pat Lockley impressed again with his enthusiastic description of Xpert; Joss Winn described his further adventures in WordPress land; and Stephanie Meece described the challenges of non-textual repositories at UAL. My ears pricked up when Jason Hoyt of Mendeley mentioned that an imminent upgrade to Mendeley will be able to identify OA sources for papers, which might signal it’s time for me to finally catch up with Mendeley (dissertation starts next year!). I didn’t catch the final speakers as I had to catch my train, but I commend to you Vicki McGarvey’s post on the SHARE project blog at Nottingham Trent University.

I tried to keep things simple by steering clear of all the complicated issues in repository management – OA, OAI-PMH, copyright, advocacy, REF, RIM, etc – and just focusing on simple UI enhancements that might improve a user’s experience of the repository, and effective use of features like RSS feeds and statistics, with examples from all over the world of institutional and specialist repositories. Which features a repository manager might choose, if any, is up to them and their own circumstances, but my aim was to ensure they are at least aware of what’s possible – as evidenced by what’s been done in many repositories around the country.

Although I focused on EPrints installations, I think nearly everything I demonstrated ought to be feasible in other platforms. Overloading an abstract page with features like “Share this on Facebook/Twitter”, QR Codes, or metadata export in RSS/JSON/CSV and more, should be a very easy way to enhance the user experience of repositories. As I suggested, adding buttons to support “the latest thing” users may be finding useful is generally not difficult. A “Send This Paper To My Kindle” button, for example, seems so trivial I might even try it myself.
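To give a flavour of how little is involved, here is a minimal Python sketch that builds “share this” links for an abstract page. The function name `share_links` and the example URL are hypothetical. The Twitter and Facebook endpoints are the share URLs those services offer at the time of writing, so check them before relying on this.

```python
from urllib.parse import urlencode


def share_links(abstract_url, title):
    """Build share URLs for one abstract page (endpoints are assumptions)."""
    return {
        "twitter": "https://twitter.com/intent/tweet?"
        + urlencode({"url": abstract_url, "text": title}),
        "facebook": "https://www.facebook.com/sharer/sharer.php?"
        + urlencode({"u": abstract_url}),
    }


# Hypothetical abstract page, for illustration only.
links = share_links("https://repository.example.org/123/", "A paper")
for service, link in links.items():
    print(service, link)
```

In a repository the two arguments would come from the item’s own metadata, and the resulting links would simply be rendered as buttons on the abstract page.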

I had a long list of ideas/examples to show: for anyone who didn’t have time to copy down the small print, they were: