4th International Digital Curation Conference part 2 (return of idcc4)

My last post from IDCC4 ended with me being unable to report on John Wilbanks’s closing keynote on day 1, so I’ll rectify that now with the benefit of handwritten notes and a little more time for reflection.

John is working on the Science Commons initiative which, through projects and advocacy, is taking the Creative Commons concepts and applying them to the doing of science, as well as the publishing of its outputs. (The DCC were drawing our attention to the initiative over 3 years ago.) He began with a view[1] that science is not unlike Wikipedia: both are about publishing, in the sense of disclosure; advances are made by individual action and proceed by small, discrete steps; and trust ratings accrue from peer review. He also commented on the “tyranny of the crowd” effect that general search tools like Google suffer from: someone searching for information about spears (their manufacture, use, or “spears, the carrying and chucking of”) will be somewhat overwhelmed by the number of results relating to Britney. And from this he moved to a view that science, to advance further, requires a disruptive change to its practices, one to which it is inherently resistant. One thing that needs to change is the notion that science is communicated through periodic papers (itself an outdated metaphor), “units of knowledge which are adverts for years of work.” He observed that, even if we still want papers, we really want them with embedded (ideally semantic) linking and tagging. Yet, although we have the technology to do this in a semi-automated way, the licences that apply to many e-journals explicitly prevent us from doing so.

He then moved to considering ways to improve openness of journals and their content: by giving incentives such as better statistics to those who publish in open journals, and through simple but effective tools such as the scholar’s copyright and addendum tool. This is a really simple idea which impressed me: it automatically adds text to a closed journal publisher’s licence, giving the author the necessary rights to self-publish (and hence link and annotate) their work. He demonstrated the effect of linking and the power of the semantic web through a Google search on particular receptors in brain chemistry and the genes which relate to them. A Google search, despite the highly specific language of the query, returned 88,400 results, most of which were papers about the receptors. But what the researcher probably wants is a list of genes and evidence for the nature of their relationship (as encoders, regulators, etc.). Their semantic web tools give exactly this, and allow the resultant RDF query to be turned into a simple (if unwieldy) hyperlink. What’s more, they were able to use Google Maps for brain data in a way that allows it to be annotated; one of those clever, simple ideas that makes you wonder why no one else did it before. As John made clear, one of the reasons is that the data isn’t sufficiently open.

He argued strongly that, for open science data to realise its potential, we must abandon the notion of requiring attribution and/or citation because it places too great a burden on those combining data from multiple sources. He’s doing pilot work on open data in Science Commons with charities funding work on rare brain diseases: the sort of thing in which the ability to link scattered data, often from other areas of research, hugely amplifies the value of the original research funding. Working with Google, they’ve come to the realisation that typical data mashups or search results might involve 40,000 individual citations if all the data sources are taken into account. For Google that’s a burden they’re not willing to deal with. So John is arguing strongly that we should abandon the desire to have databases, or cells in databases, attributable. It was a powerful argument, although I’m concerned that it shouldn’t be forgotten that we often still need the ability to determine data provenance, sometimes at the level of an individual value in a database cell, to ensure that we know we’re comparing like with like, or applying an appropriate statistical technique. And the provenance information is often tied up with the citation information. Still, John’s argument, and that of the Science Commons, seems very persuasive: huge benefits can be gained from making data completely open (as in public domain) and we will not realise those benefits if we cling to attribution or citation. He also made the point that although data growth is exponential, our brain capacity remains constant, and that the only human factor increasing along with data is population. People-driven annotation and sharing therefore helps us process increasing volumes of information (I think).

Cameron Neylon has also written about John’s talk and that crucial distinction between attribution and citation. (His blog post also contains a link to his presentation from Tuesday morning, which makes a great case for the need to curate networks of science, not digital objects per se. I’m sorry I missed the talk.) John points out that many people confuse the need to attribute with the need to cite; attribution is a legal requirement, bound up with copyright and licences (even in the open world of creative commons) and failure to attribute material puts you legally in the wrong. Citation is merely a social convention or an ethical obligation, something we ought to do; failure to cite leads to the disapproval of your peers and is called plagiarism, at least when you attempt to pass off the ideas of others as your own. And that leads to the conclusion that, for an academic, the consequences of not meeting that social obligation are potentially much worse than the consequences of falling foul of copyright law. The latter may, at worst, lead to a financial penalty which is unpleasant but survivable, whereas plagiarism can lead to the loss of reputation, job and career – despite no laws having been broken.

[Image: Edinburgh Castle in the mist: asifch@flickr.com, CC-BY-ND-NC licence]

But that was a minor corollary of the Science Commons thesis, which appears to be that we have to be willing to really let go of data, in a way that we haven’t done before, to allow science to proceed in a way that permits disruptive, rather than incremental, change.

Tuesday morning saw Martin Lewis, university librarian at Sheffield, look at the role libraries need to play in supporting research. His talk had two threads: one a reflection on how libraries are still agile, capable of change and sensitive to the needs of scholars and learners, and one a summary of the findings and consequences of the UKRDS report, which at present is still in draft. He used new library buildings at Sheffield and Glasgow Caledonian to illustrate that libraries now create very different sorts of spaces for learners and as a result still attract students to congregate there. Capping of collection size is now commonplace: libraries recognise that most collections cannot continue to consume more space indefinitely. He reflected that UK university libraries had been devoting too much attention to learning and teaching and had, as a result, neglected the needs of researchers. And thus he moved to his assertion that university library services were best placed to meet the needs that the UKRDS feasibility study would identify. Martin says that they can raise awareness of data issues, lead policy on data management, provide advice to researchers early in the data lifecycle, and work with IT services to develop appropriate local capacity.

Martin’s talk was entertaining, informative and well-argued in equal measure, but I am not entirely persuaded by it. University libraries certainly have a part to play here, and in some institutions they may well need to adopt a central role. But just because they can do all of the things that Martin describes does not mean that they are the only people who can, nor that they are the best-placed to do so in every case. His argument put up a number of straw men, one of which seemed to see future RDS provision as a choice between a library-led service and a computing services-led one. That was a cheap dig (“How many university computing services have a ‘friends’ organisation?”) and it sets up a false dichotomy. This won’t be a simple either/or choice and there are many other routes to the intended end of a federated shared service of some sort. He emphasised that the UKRDS study was primarily a political act to keep the issue of research data management high on the agenda and that it should be seen in that light. I’m in complete agreement with that, and that’s the very reason I don’t see that it’s useful to engage in a turf war about which part of the university takes this forward.

That takes us to just over halfway at IDCC4, and it’s enough for one blog post. The rest will follow in a day or two, and there’s lots more of interest to write about.
—————————————–
[1] – which he credited to someone else that sounded like Jean-Pierre Galdon, but wasn’t



3 Responses to “4th International Digital Curation Conference part 2 (return of idcc4)”

  1. Chris Rusbridge Says:

    Thanks Kevin. Comes up great on my NetNewsWire now. Love the (un-credited, un-explained) picture (though I know what it is; I really regretted not having my camera with me for the dinner).

    I thought maybe the person credited (which I missed as well) might have been Jean-Claude Bradley, but that would have been more likely to be in Neylon’s talk. I’ll ask Wilbanks!

  2. Chris: glad you are able to read it all now. I explained and credited the picture in its ALT text, but I should have added a title attribute as well; now done, and thanks for pointing it out. (It also links to the author’s page on flickr.)

    On that theme of open sharing, it was notable that many of the presentations this year consisted wholly or mainly of pictures from flickr and other online sources, all with licences that explicitly permit such use.

  3. [...] and almost immediately succombed with flu – so this is a late post. Fortunately others including Kevin Ashley in the ulcc da blog and Chris Rusbridge in the digital curation blog have given quite detailed [...]
