<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ulcc da blog &#187; British Library</title>
	<atom:link href="http://dablog.ulcc.ac.uk/tag/british-library/feed/" rel="self" type="application/rss+xml" />
	<link>http://dablog.ulcc.ac.uk</link>
	<description>blogging about digital archives &#38; repositories since 2007</description>
	<lastBuildDate>Tue, 14 May 2013 10:42:32 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>Working with Web Curator Tool (part 2): wikis</title>
		<link>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/</link>
		<comments>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/#comments</comments>
		<pubDate>Tue, 10 Mar 2009 12:16:22 +0000</pubDate>
		<dc:creator>Edward Pinsent</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Web Archiving]]></category>
		<category><![CDATA[British Library]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[UKWAC]]></category>
		<category><![CDATA[web archiving]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=391</guid>
		<description><![CDATA[How to archive a website built with a wiki? It&#8217;s worth looking into this as increasingly JISC projects are using wikis to manage and report on their projects; of the available brands, MediaWiki is a popular one. The challenge for me is how to bring in a good copy of a wiki site without causing [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-397" style="border: 1px solid black; margin: 15px;" title="mediawiki" src="http://dablog.ulcc.ac.uk/wp-content/uploads/2009/03/mediawiki-300x246.jpg" alt="mediawiki" width="300" height="246" /></p>
<p>How to archive a website built with a wiki? It&#8217;s worth looking into this as increasingly JISC projects are using wikis to manage and report on their projects; of the available brands, <a href="http://www.mediawiki.org" target="_blank">MediaWiki</a> is a popular one.</p>
<p>The challenge for me is how to bring in a good copy of a wiki site without causing Web Curator Tool to gather too many pages from it. We don&#8217;t want that, because (a) the finished result occupies unnecessary space in the archive and (b) it takes so long to complete that it can hold up the gather queue in the shared web-archiving service, delaying the work of other UKWAC partners.</p>
<p>I am not technical enough to tell you in great detail what&#8217;s causing this, although I sense that it&#8217;s something to do with the Heritrix crawler requesting too many pages from the wiki. When you consider that a wiki is database-driven it should not surprise us that it&#8217;s creating a lot of its pages on the fly. Secondly, since a wiki is editable by lots of contributors (that&#8217;s its core function after all), it presumably means we have numerous past versions of pages also stored somewhere in the wiki labyrinth, and it&#8217;s possible that the implacable Heritrix will not cease until it&#8217;s faithfully requested and copied every single one of them.</p>
<p>Let&#8217;s look at the <strong><a href="http://www.ukoln.ac.uk/repositories/digirep/index/JISC_Digital_Repository_Wiki" target="_blank">Repositories Research Team wiki (DigiRep)</a></strong> owned by UKOLN, which I tried to gather five times in 2008. WCT conveniently keeps a history of these attempts, information about which I can still access even if the actual gathered pages have been discarded or archived. The size problems were chronic. Of five 2008 gathers, one was aborted after it had reached a massive 16.87 GB; a second one was rejected at 14.69 GB. I have archived one impression at 5.31 GB, another at 736.26 MB and another at 157.36 MB. Quite large variations there, which was worrying enough in itself.</p>
<p>At first, my workaround was to adjust the Profile Setting in the title to override the maximum number of documents Heritrix can gather. Setting &#8216;Maximum Documents&#8217; at 10000 worked, but it was not ideal; I suppose all this means is that Heritrix stops once it has collected 10,000 pages, whether we have everything we want or not. (I found, however, that the copies in the archive seemed to render OK.)</p>
<p>To get a closer look at what&#8217;s going on, I started to browse the Log Files created by WCT (complete records of every single client-server request), which show patterns that I can vaguely understand; when these Log Files are packed with near-identical strings of code I sense that something&#8217;s up. For example, a string containing <code>index.php?title=Repositories_Research&amp;action=edit</code> tells us that Heritrix is requesting a specific named page from the wiki, <strong>and</strong> requesting the edit action on that page. If you multiply that by the number of pages in the wiki, you can see how the problem builds up. (PHP is the scripting language in which MediaWiki is written.)</p>
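<p>The kind of pattern-spotting described above can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the URLs are hypothetical examples (not taken from the real DigiRep logs), and the function name <code>count_query_params</code> is made up for illustration. It simply tallies which query parameters and which <code>action=</code> values turn up most often in a list of requested URLs.</p>

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_query_params(urls):
    """Tally query-parameter names, and each distinct action= value, across URLs."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        for name, value in parse_qsl(query, keep_blank_values=True):
            if name == "action":
                # Keep the action value, since edit/history are the usual suspects.
                counts["action=" + value] += 1
            else:
                counts[name] += 1
    return counts


# Hypothetical URLs of the sort a crawler might request from a MediaWiki site.
sample_urls = [
    "http://wiki.example.org/index.php?title=Repositories_Research&action=edit",
    "http://wiki.example.org/index.php?title=Repositories_Research&action=history",
    "http://wiki.example.org/index.php?title=Main_Page&action=edit",
    "http://wiki.example.org/index.php?title=Main_Page&oldid=1234",
    "http://wiki.example.org/index.php/Main_Page",
]

for name, total in count_query_params(sample_urls).most_common():
    print(name, total)
```

<p>Parameters that recur once per wiki page (such as <code>action=edit</code>) are the likely candidates for an exclusion filter.</p>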
<p>I follow this up by browsing the actual gathered pages in Web Curator Tool using the Tree View. From here I can click on the &#8216;View&#8217; button to examine a page which I think to be suspect, and compare it with other suspect pages. Lastly, I go back to the live DigiRep site to confirm in my mind what&#8217;s happening when certain links are followed.</p>
<p>All the above gave me just about enough information to experiment with exclusion filters. After a certain amount of trial and error, and working with other MediaWiki sites, I arrived at the following exclusion codes which I can add to the Profile Setting:</p>
<pre><code>.*&amp;oldid.*
.*&amp;diff.*
.*&amp;limit.*
.*&amp;direction.*
.*Recentchanges.*
.*/Special.*
.*?title=Special.*
.*&amp;action=edit.*
.*&amp;action=history.*
.*&amp;section.*
.*&amp;redlink.*
.*&amp;printable=yes.*
.*&amp;redirect=no.*</code></pre>
<p>These have the effect of telling WCT to exclude certain pages and actions from Heritrix&#8217;s harvesting action. The expectation was that I would lose the discussion / edit / history functions of the wiki in the archive copy.</p>
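<p>Before committing a gather, filters like these can be tried out offline. The following is a minimal sketch, not the way WCT or Heritrix applies them internally: it assumes each filter is a regular expression that must match the whole URL (which would explain the leading and trailing <code>.*</code>), and the sample URLs and the function names <code>is_excluded</code> and <code>filter_urls</code> are hypothetical.</p>

```python
import re

# The exclusion codes listed in this post, written as Python regex strings.
EXCLUSION_PATTERNS = [
    r".*&oldid.*",
    r".*&diff.*",
    r".*&limit.*",
    r".*&direction.*",
    r".*Recentchanges.*",
    r".*/Special.*",
    r".*?title=Special.*",
    r".*&action=edit.*",
    r".*&action=history.*",
    r".*&section.*",
    r".*&redlink.*",
    r".*&printable=yes.*",
    r".*&redirect=no.*",
]
COMPILED = [re.compile(pattern) for pattern in EXCLUSION_PATTERNS]


def is_excluded(url):
    """Return True if any exclusion pattern matches the entire URL (assumed behaviour)."""
    return any(pattern.fullmatch(url) for pattern in COMPILED)


def filter_urls(urls):
    """Keep only the URLs that no exclusion pattern matches."""
    return [url for url in urls if not is_excluded(url)]


# Hypothetical URLs, not taken from the real DigiRep wiki.
sample_urls = [
    "http://wiki.example.org/index.php/Main_Page",
    "http://wiki.example.org/index.php?title=Main_Page",
    "http://wiki.example.org/index.php?title=Main_Page&action=edit",
    "http://wiki.example.org/index.php?title=Main_Page&oldid=1234",
    "http://wiki.example.org/index.php?title=Main_Page&printable=yes",
    "http://wiki.example.org/index.php?title=Special:Recentchanges",
]

for url in filter_urls(sample_urls):
    print(url)
```

<p>One thing the sketch brings out: in regular-expression syntax the <code>?</code> in <code>.*?title=Special.*</code> is not a literal question mark but makes the preceding <code>.*</code> non-greedy, so that filter matches any URL containing <code>title=Special</code>. For these sites the practical effect is the same.</p>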
<p>The title with the above exclusion profile gathered just 63.41 MB and it completed in under ten minutes. I would say that&#8217;s an improvement on 16.87 GB. Log Files and the Tree View confirmed the success of this new &#8220;slimline&#8221; gather. As well as losing the discussion / edit / history functions, we have also eliminated the Toolbox functions, the &#8216;printable&#8217; views, and the login pages.</p>
<p>This is no great loss at all for our purposes, as scholars who browse the archived copy of DigiRep are not expecting to be able to edit pages, nor join in the discussions, nor browse the history of stored versions of pages. Indeed in a lot of cases, they would require a login to do so. The users simply want to see the results of the DigiRep team&#8217;s work.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Dr Saad Eskander</title>
		<link>http://dablog.ulcc.ac.uk/2009/01/21/dr-saad-eskander/</link>
		<comments>http://dablog.ulcc.ac.uk/2009/01/21/dr-saad-eskander/#comments</comments>
		<pubDate>Wed, 21 Jan 2009 13:36:10 +0000</pubDate>
		<dc:creator>Patricia Sleeman</dc:creator>
				<category><![CDATA[Digital Archives]]></category>
		<category><![CDATA[archives]]></category>
		<category><![CDATA[British Library]]></category>
		<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=311</guid>
		<description><![CDATA[On December 8th I was invited by Dr Saad Eskander, the Director of the National Library and Archives of Iraq, as his guest to an awards ceremony he was having at the British Library. The story behind this is that in 2006 I met him in Abu Dhabi speaking about his work and I subsequently [...]]]></description>
			<content:encoded><![CDATA[<p><img class="float-left" title="Saad" src="http://psleeman.files.wordpress.com/2008/12/saad3.jpg" alt="Penny Brook, Saad Eskander, Patricia Sleeman" width="270" /></p>
<p>On December 8th I was invited by Dr Saad Eskander, the Director of the National Library and Archives of Iraq, as his guest to an awards ceremony he was having at the British Library.</p>
<p>The story behind this is that in 2006 I met him in Abu Dhabi speaking about his work and I subsequently asked him to write a blog for our community of archivists and librarians about his day-to-day life. This blog was picked up by almost all of the mainstream press around the world. Here is an article about him in the <a href="http://www.iht.com/articles/ap/2007/04/06/africa/ME-FEA-GEN-Iraq-Guerrilla-Librarian.php">International Herald Tribune</a>. In an interview with Fran Monks (on her website, <a href="http://howtomakeadifference.net/page/3/">How to make a difference</a>) Saad spoke about how, of his own free will, he decided to return to his home city of Baghdad in 2003 after 23 years of absence, of which 13 had been spent as an academic in the UK. After Saddam had been removed, Saad and a group of Iraqi artists, writers and academics from the UK returned to their country to see what they could do. All of the group except for Saad returned to London almost immediately because they were so shocked by the security situation. Saad alone was prepared to risk his life in order to assist with the rebuilding of the nation that he loves. Saad believes that the future of Iraq and Baghdad must be non-sectarian and democratic and have equal rights for all citizens, including women. This he practises in the Library which he has restored. Saad described his shock on first seeing the library: “95% of the contents had been either destroyed or looted. Everything had been burnt and even the marble had melted. Everything was covered in soot and the stench was almost unbearable.”</p>
<p>At the dinner after the ceremony were many of the great and the good from the UK with specialities in the Middle East. I was an incongruous figure among them, as this obviously does not include me. I also sat beside Ann Clwyd, who is a Welsh MP and Gordon Brown’s special envoy on Human Rights in Iraq. There was a lot of interest in the digital preservation training we do here, and also in the potential of VLEs being developed to facilitate learning in this area. It is seen as a priority for the INLA to learn how to manage their digital surrogates, as their collection of archives is largely made up of these surrogates from the British Library as well as other places.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2009/01/21/dr-saad-eskander/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>iPres2008 &#8211; first impressions</title>
		<link>http://dablog.ulcc.ac.uk/2008/10/01/ipres2008-first-impressions/</link>
		<comments>http://dablog.ulcc.ac.uk/2008/10/01/ipres2008-first-impressions/#comments</comments>
		<pubDate>Wed, 01 Oct 2008 22:15:10 +0000</pubDate>
		<dc:creator>Kevin Ashley</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Digital Archives]]></category>
		<category><![CDATA[British Library]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[DPC]]></category>
		<category><![CDATA[DPTP]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[iPres2008]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[JiSC-PoWR]]></category>
		<category><![CDATA[JISCPoWR]]></category>
		<category><![CDATA[preservation]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/2008/10/01/ipres2008-first-impressions/</guid>
		<description><![CDATA[iPres2008 finished yesterday, and overall it was a useful and informative event. It took place a mere 15 minutes walk from our current home, so we took advantage of its proximity and attended en masse. Chris Rusbridge has already done an excellent job of some near-real-time reporting on the sessions, and I&#8217;m not going to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.bl.uk/ipres2008/">iPres2008</a> finished yesterday, and overall it was a useful and informative event. It took place a mere 15 minutes&#8217; walk from our current home, so we took advantage of its proximity and attended <em>en masse.</em></p>
<p>Chris Rusbridge has already done an excellent job of some <a href="http://digitalcuration.blogspot.com/search/label/iPres-2008">near-real-time reporting</a> on the sessions, and I&#8217;m not going to try to replicate that level of detail in this post. As a first-time attendee at iPres, I was impressed by the professional mix attending, which took in hard-core computer science, digital preservation and curation folk, repository managers and those from the traditional custodial professions. In that respect it was very reminiscent of the <a href="http://ec.europa.eu/transparency/archival_policy/dlm_forum/index_en.htm">early DLM-Forums</a>, which were eye-opening for me when I attended the first one in 1996. But it was also interesting to observe that, just as DLM was dominated by archivists and records managers, iPres is a very library-oriented event. For example, those who expressed a desire for a Europe-wide event bringing together all those with an interest in digital preservation didn&#8217;t seem to be aware that the DLM-Forums existed.</p>
<p>One positive observation (of many) is that there is more reassuring news on the oft-vexed issue of IPR barriers to digital preservation. At the close of day 1, we heard a summary of the findings of the international survey on the impact of copyright law on digital preservation. <span id="more-196"></span>That indicated that the UK had one of the strictest sets of constraints of all the countries looked at &#8211; in terms of who is permitted to carry out certain acts in the name of preservation and what those acts are. Other countries have more relaxed exemptions and that doesn&#8217;t appear to be causing the major rightsholders any significant financial loss. That should give us hope for some change in the law in the UK at least. And Horst Foster, making the keynote speech opening day 2, appeared to echo this at the European level, implying that the case for change had been made and accepted, although he was notably cautious about making any promises as to when this change might come about.</p>
<p>That&#8217;s an improvement on the situation in Europe a few years ago, though, when I was at one of a number of expert panels helping the European Commission to frame the forthcoming research questions and challenges in the digital preservation arena. At that time we were all warned off mentioning the (C) word at all &#8211; it seemed to have a somewhat toxic flavour. It&#8217;s really heartening to see that things have changed.</p>
<p>One should add a note of caution, however. After Adrienne Muir had commented favourably on how Australian law allows institutions such as the national library and archives to bypass DRM systems in order to preserve material, Colin Webb pointed out a catch. It is apparently still illegal to manufacture, or to import into Australia, a device which allows a DRM system to be bypassed. But if the national library happens to find such a device on its premises, it can use it without fear of breaking the law. Still some way to go, then, before the law is &#8216;joined up and working&#8217; &#8211; the strapline of iPres2008.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2008/10/01/ipres2008-first-impressions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LIFE, a users manual</title>
		<link>http://dablog.ulcc.ac.uk/2008/06/24/life-a-users-manual/</link>
		<comments>http://dablog.ulcc.ac.uk/2008/06/24/life-a-users-manual/#comments</comments>
		<pubDate>Tue, 24 Jun 2008 13:24:19 +0000</pubDate>
		<dc:creator>Patricia Sleeman</dc:creator>
				<category><![CDATA[Digital Archives]]></category>
		<category><![CDATA[British Library]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[LIFE]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/2008/06/24/life-a-users-manual/</guid>
		<description><![CDATA[Not the book by Georges Perec but actually a costing model for digital preservation developed by the British Library. LIFE, (Life Cycle Information for E-Literature) is a collaboration between University College London (UCL) and the British Library. The Project has developed a methodology to model the digital lifecycle and calculate the costs of preserving digital [...]]]></description>
			<content:encoded><![CDATA[<p> <img src="http://ecx.images-amazon.com/images/I/41T21BiRoDL._SL500_AA240_.jpg" alt="LIFE" class="float-left" width="192" /></p>
<p>Not the book by <a href="http://web.ncf.ca/ek867/perec.cat.jpg">Georges Perec</a> but actually a costing model for digital preservation developed by the British Library. <a href="http://www.life.ac.uk/">LIFE</a> (Life Cycle Information for E-Literature) is a collaboration between University College London (UCL) and the British Library. The Project has developed a methodology to model the digital lifecycle and calculate the costs of preserving digital information for the next 5, 10 or 100 years. For the first time, organisations can apply this process and plan effectively for the preservation of their digital collections.</p>
<p>Currently the LIFE Project is in its second phase (“LIFE<sup>2</sup>”), an 18-month project running from March 2007 to August 2008. LIFE aims to develop a costing model which can identify costs at each stage over 1 year, 10 years and 100 years. It is intended to show the long-term consequences of accessioning collections, and can be used to inform curatorial decisions.</p>
<p><span id="more-125"></span>The event was kicked off by <strong>Helen Shenton,</strong> who asked us not to crack any more Second Life jokes, and introduced us all to LIFE and its evolution. To read more, have a look <a href="http://eprints.ucl.ac.uk/view/subjects/516.html." target="_blank">here</a>.</p>
<p><strong>Paul Courant</strong> of the University of Michigan gave a keynote speech and noted how history &#8216;gets longer by the minute&#8217;. He noted that market forces may drive preservation for a while but that material which is seen as unmarketable will not be taken care of, in crude economic terms. It is vitally important that people who control resources understand how important digital preservation is.</p>
<p>Throughout the day we heard case studies from national institutions in Denmark and the British Library, as well as the newspaper project within the British Library. All of them had used LIFE and gave feedback regarding their experience with it. Some thought it would be useful to use OAIS terms to ease understanding and communication, and would like to see a breakdown of some of the generic functional entities. One case study involving film noted that it would be more economical to preserve film in digital format as opposed to 105 mm film. Professor <strong>Bo-Christer Björk</strong>, who was asked to check the model, reported that he had not identified any significant flaws in the methodology of LIFE. He hoped that convergence on a few file formats would, in the future, reduce costs. He could see how very rare formats could create high costs in the long run.</p>
<p><strong>Paul Ayris</strong> of UCL questioned whether libraries should, instead of pulling readers into their institutions, be pushing material out to the public. He referred to the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF-SDPA), in which he is involved. Launched by the National Science Foundation (NSF) and the Andrew W. Mellon Foundation, the Task Force’s two-year mission is to develop a viable economic sustainability strategy to ensure that today&#8217;s data will be available for further use, analysis and study. Members will convene a broad set of international experts from the academic, public and private sectors who will participate in quarterly discussion panels. The group will then publish two substantial reports with their findings, including a final report in late 2009 that will include a set of actionable recommendations for digital preservation.</p>
<p>LIFE seems to be workable. However, costing takes time, as does gathering the evidence for it; the model appears to work best within national institutions and Institutional Repositories where preservation priorities have been clearly stated and where there is a stable budget.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2008/06/24/life-a-users-manual/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web-archiving: the WCT workflow tool</title>
		<link>http://dablog.ulcc.ac.uk/2008/04/30/web-wct/</link>
		<comments>http://dablog.ulcc.ac.uk/2008/04/30/web-wct/#comments</comments>
		<pubDate>Wed, 30 Apr 2008 15:11:57 +0000</pubDate>
		<dc:creator>Edward Pinsent</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Web Archiving]]></category>
		<category><![CDATA[British Library]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[UKWAC]]></category>
		<category><![CDATA[web archiving]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/2008/04/30/web-wct/</guid>
		<description><![CDATA[This month I have been happily harvesting JISC project website content using my new toy, the Web Curator Tool. It has been rewarding to resume work on this project after a hiatus of some months; the former setup, which used PANDAS software, has been winding down since December. Who knows what valuable information and website [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://dablog.ulcc.ac.uk/wp-content/uploads/2008/04/web-curator-tool-logo.gif" alt="web-curator-tool-logo.gif" class="float-left" style="border: 0pt none " />This month I have been happily harvesting JISC project website content using my new toy, the <a href="http://webcurator.sourceforge.net/" target="_blank">Web Curator Tool</a>. It has been rewarding to resume work on this project after a hiatus of some months; the former setup, which used PANDAS software, has been winding down since December. Who knows what valuable information and website content changes may have escaped the archiving process during these barren months?</p>
<p>Web Curator Tool is a web-based workflow database, one which manages the assignment of permission records, builds profiles for each &#8216;target&#8217; website, and allows a certain amount of interfacing with Heritrix, the actual engine that gathers the materials. The <a href="http://crawler.archive.org/" target="_blank">open-source Heritrix project</a> is being developed by the <a href="http://www.archive.org" target="_blank">Internet Archive</a>, whose access software (effectively the &#8216;Wayback Machine&#8217;) may also be deployed in the new public-facing website when it is launched in May 2008.</p>
<p><span id="more-94"></span><img src="http://dablog.ulcc.ac.uk/wp-content/uploads/2008/04/title-icon-targets.gif" alt="title-icon-targets.gif" class="float-right" style="border: 0pt none " />Although the idiosyncrasies of WCT caused me some anguish at first, largely through being removed from my &#8216;comfort zone&#8217; of managing regular harvests, I suddenly turned the corner about two weeks ago. The diagnostics are starting to make sense. Through judicious ticking of boxes and refreshing of pages, I can now interrogate the database to the finest detail. I learned how to edit and save a target so as to &#8216;force&#8217; a gather, thus helping to clear the backlog of scheduled gathers which had been accumulating, unbeknownst to us, since December. Most importantly, with the help of <a href="http://www.webarchive.org.uk" target="_blank">UKWAC</a> colleagues, we&#8217;re slowly finding ways of modifying the profile so as to gather less external material (or reduce collateral harvesting, to put it another way); or extend its reach to capture stylesheets and other content which is outside the root URL.</p>
<p>True, a lot of this has been trial and error, involving experimental gathers before a setting was found that would &#8216;take&#8217;. But WCT, unlike our previous set-up, allows the possibility of gathering a site more than once in a day. And it’s much faster. It can bring in results on some of the smaller sites in less than two minutes.</p>
<p>Now, 200 new instances of JISC project sites have been successfully gathered during March and April alone. A further 50 instances have been brought in from the Jan-Feb backlog. The daunting backlog of queued instances has been reduced to zero. Best of all, over 30 new JISC project websites (i.e. those which started around or after December 07) have been brought into the new system. I&#8217;ll be back in my comfort zone in no time…</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2008/04/30/web-wct/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Significant Properties Workshop @ BL</title>
		<link>http://dablog.ulcc.ac.uk/2008/04/08/significant-properties/</link>
		<comments>http://dablog.ulcc.ac.uk/2008/04/08/significant-properties/#comments</comments>
		<pubDate>Tue, 08 Apr 2008 17:57:43 +0000</pubDate>
		<dc:creator>Richard M. Davis</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Digital Archives]]></category>
		<category><![CDATA[British Library]]></category>
		<category><![CDATA[DPC]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[preservation]]></category>
		<category><![CDATA[significant properties]]></category>
		<category><![CDATA[sigprops]]></category>
		<category><![CDATA[SPeLOs]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/2008/04/08/significant-properties/</guid>
		<description><![CDATA[I don&#8217;t think I could begin to do justice in a few words to the wide-ranging debate at the JISC/BL/DPC Workshop on Significant Properties at the British Library on Monday: I&#8217;d rather leave it to others to analyse the significant outcomes in more detail, or to further discussions like the one started by Chris Rusbridge [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center"><a href="http://dablog.ulcc.ac.uk/wp-content/uploads/2008/04/image042.jpg" title="Neil Grindley and Chris Rusbridge kick off the Significant Properties Workshop"><img src="http://dablog.ulcc.ac.uk/wp-content/uploads/2008/04/image042.jpg" alt="Neil Grindley and Chris Rusbridge kick off the Significant Properties Workshop" style="width: 95%" /></a></p>
<p>I don&#8217;t think I could begin to do justice in a few words to the wide-ranging debate at the <a href="http://www.dpconline.org/graphics/events/080407workshop.html" target="_blank">JISC/BL/DPC Workshop on Significant Properties</a> at the British Library on Monday: I&#8217;d rather leave it to <a href="http://digitalcuration.blogspot.com/2008/04/seriiously-seeking-significance.html" title="Maureen Pennock on Sig Props @ DCC Blog" target="_blank">others</a> to analyse the significant outcomes in more detail, or to further discussions like the one started by Chris Rusbridge (our cucumber-cool chairman on the day) on the <a href="http://digitalcuration.blogspot.com/2008/03/significant-properties-workshop.html" title="Chris Rusbridge on Sig Props @ DCC Blog" target="_blank">DCC Blog</a>. Suffice it to say there was a sack of food for thought in all the presentations, and lots of opportunities to wonder &#8220;now why didn&#8217;t <em>I</em> think of that?&#8221;</p>
<p><span id="more-79"></span>I think the most enjoyable presentation was that of Cal Lee from UNC Chapel Hill, who managed to lift us with a few laughs about his work &#8211; with Microsoft Word file format specifications &#8211; just as we were beginning to flag at the end of a long day of considering many things from many angles. The <a href="http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/project_movingimagesound.aspx" target="_blank">Moving Images</a> study also looks like it may have useful information for our deliberations on standards and formats for <a href="http://primo.sas.ac.uk" target="_blank">PRIMO</a>. If I wasn&#8217;t so taken with the presentation on E-learning objects, that was chiefly because I gave it &#8211; a mind-bending task to give our account of not only Learning Objects but also Significant Properties in 15 mins. But the <a href="http://www.dpconline.org/docs/events/080407sigpropsDavis.pdf" target="_blank">slides</a> look nice on the DPC website. (All credit to Ed for the excellent diagrams.)</p>
<p><a href="http://dablog.ulcc.ac.uk/wp-content/uploads/2008/04/image041.jpg" title="Avant le deluge: an empty BL Conference Centre"><img src="http://dablog.ulcc.ac.uk/wp-content/uploads/2008/04/image041.thumbnail.jpg" alt="Avant le deluge: an empty BL Conference Centre" class="float-right" /></a>Neil Grindley and the rest of the JISC/DPC/BL organisers did well to run it so smoothly and facilitate the discussion, and it was a rare opportunity to catch Andrew Wilson on a flying visit to this hemisphere and hear his first hand account of <a href="http://www.naa.gov.au/records-management/secure-and-store/e-preservation/at-NAA/index.aspx" target="_blank">NAA&#8217;s</a> approaches and models. I can&#8217;t remember if I&#8217;ve been to the British Library Conference Centre before or not, but it is certainly an impressive space, and with super-slick AV facilities.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2008/04/08/significant-properties/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UKWAC: what about HLF websites?</title>
		<link>http://dablog.ulcc.ac.uk/2007/12/13/ukwac-what-about-hlf-websites/</link>
		<comments>http://dablog.ulcc.ac.uk/2007/12/13/ukwac-what-about-hlf-websites/#comments</comments>
		<pubDate>Thu, 13 Dec 2007 15:11:34 +0000</pubDate>
		<dc:creator>Edward Pinsent</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Web Archiving]]></category>
		<category><![CDATA[British Library]]></category>
		<category><![CDATA[Community Archives]]></category>
		<category><![CDATA[Digital sustainability]]></category>
		<category><![CDATA[HLF]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[minority ethnic groups]]></category>
		<category><![CDATA[National Archives]]></category>
		<category><![CDATA[UKWAC]]></category>
		<category><![CDATA[web archiving]]></category>

		<guid isPermaLink="false">http://dash.ulcc.ac.uk/blog/?p=25</guid>
		<description><![CDATA[We were recently relieved to learn that the Bernie Grant Trust archives website is still alive and well at http://www.berniegrantarchive.org.uk/. For a few weeks in November 2007, the site appeared to have vanished, ostensibly another web-based resource to have fallen to the vicissitudes of short-term funding. True, the Internet Archive had captured a few impressions [...]]]></description>
			<content:encoded><![CDATA[<p>We were recently relieved to learn that the <strong>Bernie Grant Trust archives</strong> website is still alive and well at <a href="http://www.berniegrantarchive.org.uk/" target="_blank">http://www.berniegrantarchive.org.uk/</a>. For a few weeks in November 2007, the site appeared to have vanished, ostensibly another web-based resource to have fallen to the vicissitudes of short-term funding. True, the <a href="http://www.archive.org">Internet Archive</a> had captured a few impressions of it, but the site is a complex one &#8211; full of interactive elements and database-driven deliverables, to say nothing of the online exhibition and other materials which can only be experienced through the website.</p>
<p>Why haven&#8217;t <a href="http://www.webarchive.org.uk">UKWAC</a> got a copy of this site? True, complex sites like this one tend to remain out of the reach of harvesting tools like PANDAS, which is based on HTTrack, and can&#8217;t get good results for sites which rely on complex server-side architecture. The site however is still unarchived as far as we know. <span id="more-25"></span>ULCC&#8217;s Joanne Anthony (who had worked as the archivist for the Bernie Grant Trust) was keen to learn if there was any way of submitting the site for consideration to one of the UKWAC partners. There is indeed an <a href="http://info.webarchive.org.uk/cgi-bin/submission.cgi" target="_blank">online submissions form</a> available, but this merely delivers a message to the UKWAC webmaster, who then forwards the request to the most appropriate partner. It would help considerably if the individual collection policies of each partner were made more manifest and published on the public site. But the visitor to <a href="http://www.webarchive.org.uk/">www.webarchive.org.uk</a> will find only a sketchy description of these policies, for example &#8220;The British Library will focus on sites of cultural, historical and political importance.&#8221;</p>
<p>Among UKWAC partners, the BL and the TNA are known to be directing their energies towards certain specialised collection strands. These are given more descriptive paragraphs at <a href="http://info.webarchive.org.uk/col.html">http://info.webarchive.org.uk/col.html</a>, yet the underlying pattern or theme of these collections is not immediately obvious. At least three of them &#8211; the Tsunami, General Election and London Terrorist attack strands &#8211; appear to be based primarily on the fact that the sites are ephemeral and most in danger of loss (regardless of their informational or evidentiary value as records).</p>
<p>It is not clear how a concerned individual, or a member of the DP Community, might be empowered to somehow influence UKWAC&#8217;s collection policies for the better. In the case of the Bernie Grant website, Joanne&#8217;s interest was to see minority ethnic groups better represented in UK archival collections; but another approach would be to see it within the larger group of &#8216;websites funded by Heritage Lottery Funding&#8217;. It seems likely there are many such project sites, all with short-term funding and therefore potentially at risk of being removed from cyberspace at any time, yet containing unique digital materials of huge potential cultural value. As a discrete collection of websites, it has parallels with JISC&#8217;s collection focus, i.e. JISC-funded projects which are occupying web space on a similar short-term lease. How can we persuade the relevant funding bodies to ensure their web outputs are archived, as JISC already does?</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2007/12/13/ukwac-what-about-hlf-websites/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>UKWAC&#8217;s migration of websites</title>
		<link>http://dablog.ulcc.ac.uk/2007/12/11/migration-of-websites/</link>
		<comments>http://dablog.ulcc.ac.uk/2007/12/11/migration-of-websites/#comments</comments>
		<pubDate>Tue, 11 Dec 2007 14:48:08 +0000</pubDate>
		<dc:creator>Edward Pinsent</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Web Archiving]]></category>
		<category><![CDATA[British Library]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[UKWAC]]></category>
		<category><![CDATA[web archiving]]></category>

		<guid isPermaLink="false">http://dash.ulcc.ac.uk/blog/?p=23</guid>
		<description><![CDATA[As I write, the UK&#8217;s archive of websites is undergoing the process of migration, in the hands of the British Library who continue to act as the lead partners for the UKWAC Consortium. There are at least two sides to this mammoth task. The first (which I assume is probably relatively easy) involves moving the [...]]]></description>
			<content:encoded><![CDATA[<p>As I write, the <a href="http://www.webarchive.org.uk" target="_blank">UK&#8217;s archive of websites</a> is undergoing the process of migration, in the hands of the <a href="http://www.bl.uk" target="_blank">British Library</a> who continue to act as the lead partners for the <a href="http://info.webarchive.org.uk/index.html" target="_blank">UKWAC Consortium</a>.</p>
<p>There are at least two sides to this mammoth task. The first (which I assume is probably relatively easy) involves moving the archive of gathered websites from its current server infrastructure to its new one. The previous hosts, <a href="http://www.magus.co.uk/index.html">Magus</a>, have decided they can&#8217;t see a future in archiving websites. The new host, very coincidentally, is ULCC; our infrastructure services recently won the contract to provide a home for the large quantities of stored websites.</p>
<p>The second migration aspect, which involves complexities I&#8217;m glad I don&#8217;t have to deal with, involves moving the publisher and website profiles across from the <a href="http://pandora.nla.gov.au/pandas.html" target="_blank">PANDAS</a> database to the <a href="http://webcurator.sourceforge.net/" target="_blank">Web Curator Tool</a> (WCT) database. <span id="more-23"></span> WCT, as fate would have it, is being jointly developed by the BL and the National Library of New Zealand, and it will become our weapon of choice for all future web-harvesting activities. It&#8217;s certainly a more sophisticated piece of software than the clunky, web-object driven PANDAS, and appears able to handle the concept of one publisher owning more than one title (something which always baffled PANDAS).</p>
<p>The planned migration moves have been causing consternation to many of the UKWAC partners, particularly those who have been storing unprocessed gathers in the Magus &#8216;Temporary Drive&#8217; for a long time. We at ULCC have been assisting with managing that process for months. Kevin Ashley devised a simple script that could query this drive, and report back on the occupancy broken down by website number, with figures on file sizes and dates. Ed Pinsent, by querying PANDAS, was able to match website numbers to their owners, thus providing a handy set of reports on information that was otherwise unavailable. (PANDAS wasn&#8217;t able to see these unprocessed gathers, for some reason; Magus wouldn&#8217;t run a script to report on them because they&#8217;d never been asked to, and it would probably have incurred additional charges anyway.)</p>
<p>The JISC occupancy of the temp drive has been negligible, however. This is mainly because we have been so efficient in processing our completed gathers, and using the FTP connection has allowed us to look more closely at failed gathers. Additionally, JISC&#8217;s requirements are such that (unlike other partner members) we have rarely had to gather entire websites, instead concentrating on a few pages that constitute a JISC Project.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2007/12/11/migration-of-websites/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
