<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ulcc da blog &#187; Technical</title>
	<atom:link href="http://dablog.ulcc.ac.uk/tag/technical/feed/" rel="self" type="application/rss+xml" />
	<link>http://dablog.ulcc.ac.uk</link>
	<description>blogging about digital archives &#38; repositories since 2007</description>
	<lastBuildDate>Tue, 14 May 2013 10:42:32 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>PICT Memento plugin allows us to step into a wiki’s past</title>
		<link>http://dablog.ulcc.ac.uk/2010/03/16/948/</link>
		<comments>http://dablog.ulcc.ac.uk/2010/03/16/948/#comments</comments>
		<pubDate>Tue, 16 Mar 2010 12:33:58 +0000</pubDate>
		<dc:creator>Rory McNicholl</dc:creator>
				<category><![CDATA[E-Learning]]></category>
		<category><![CDATA[Web Archiving]]></category>
		<category><![CDATA[CLASM]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[PICT]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=948</guid>
		<description><![CDATA[The PICT project is pretty much over, but I can steal a few moments out of my day every now and then to do a bit of housekeeping, try out a new plugin and maybe even blog about it. Inspired by Rob Sanderson&#8217;s lightning talk at dev8D on Memento I decided to go for [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://pict.ulcc.ac.uk">PICT project</a> is pretty much over, but I can steal a few moments out of my day every now and then to do a bit of house keeping, try out a new plugin and maybe even blog about it.</p>
<p>Inspired by Rob Sanderson&#8217;s lightning talk at <a href="http://wiki.2010.dev8d.org">dev8D</a> on <a href="http://arxiv.org/abs/0911.1112">Memento</a>, I decided to go for the bounty offered for writing a Memento client. My tack was to enable a MediaWiki instance to handle the Accept-Date protocol using an existing plugin, and then to write a little PICT tool that supplied a user interface through which users could specify a date and <strong>browse their PICT-enabled MediaWiki &#8220;from the past&#8221;&#8230; </strong>spooky!</p>
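<p>For readers unfamiliar with Memento, the client side of the exchange is small: the client asks a TimeGate for a resource and names the moment it wants in a request header carrying an HTTP date. The Python sketch below only builds such a request and does not send it. It assumes the header name <code>Accept-Datetime</code>, which is the name used in the later RFC 7089 form of Memento; the TimeGate URL is a hypothetical placeholder, not the address of the PICT or CLASM service.</p>

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def memento_request(timegate_url, when):
    """Build (but do not send) a request asking a Memento TimeGate for the
    version of a resource that was current at `when`."""
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive times as UTC
    # HTTP dates are always expressed in GMT, e.g. "Tue, 16 Mar 2010 12:00:00 GMT"
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_url, headers={"Accept-Datetime": http_date})


if __name__ == "__main__":
    # Hypothetical TimeGate URL for a wiki page; substitute a real one to use it.
    req = memento_request("http://example.org/timegate/Main_Page",
                          datetime(2010, 3, 16, 12, 0))
    # urllib stores header names via str.capitalize(), hence "Accept-datetime"
    print(req.get_header("Accept-datetime"))  # Tue, 16 Mar 2010 12:00:00 GMT
```

<p>Passing the request to <code>urllib.request.urlopen</code> would send it; a Memento-aware TimeGate would then typically redirect the client to the archived version closest to the requested date.</p>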
<p>Thanks to getting nowhere near the deadline (those involved with this project will be grinning at that), I got nowhere near the prize, but I did finish a prototype plugin and threw it up on a MediaWiki instance for another project: <a href="http://clasm.ulcc.ac.uk">CLASM-demo</a>. CLASM is the name of a project, not a piece of conjurers&#8217; onomatopoeia, so click on that link to see the PICT-memento client in action. A wiki with more pages would have been a better example, but at least it does have a lot of revisions.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2010/03/16/948/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A repository for pi(es)</title>
		<link>http://dablog.ulcc.ac.uk/2010/01/07/a-repository-for-pies/</link>
		<comments>http://dablog.ulcc.ac.uk/2010/01/07/a-repository-for-pies/#comments</comments>
		<pubDate>Thu, 07 Jan 2010 16:40:47 +0000</pubDate>
		<dc:creator>Kevin Ashley</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Repositories]]></category>
		<category><![CDATA[costs]]></category>
		<category><![CDATA[curation]]></category>
		<category><![CDATA[data curation]]></category>
		<category><![CDATA[EAD]]></category>
		<category><![CDATA[repositories]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=849</guid>
		<description><![CDATA[As you may have read recently, Fabrice Bellard has announced the computation of &#960; to almost 2.7 trillion decimal places using a faster algorithm that allows desktop technology to be used, rather than the supercomputers usually needed to break this particular record. Bellard is an extremely talented programmer who has made a useful [...]]]></description>
			<content:encoded><![CDATA[<p>As you may have <a href="http://news.bbc.co.uk/1/hi/technology/8442255.stm">read</a> recently, <a href="http://bellard.org/">Fabrice Bellard</a> has announced the computation of &pi; to almost <a href="http://bellard.org/pi/pi2700e9/announce.html">2.7 trillion decimal places</a> using a faster algorithm that allows desktop technology to be used, rather than the supercomputers that are usually used to break this particular record. Bellard is an extremely talented programmer who has made a useful contribution to one area of digital preservation with his emulation and virtualisation system <a href="http://www.nongnu.org/qemu/">QEMU</a>. But it&#8217;s a <a href="http://twitter.com/lescarr/status/7472981654">comment</a> by <a href="http://users.ecs.soton.ac.uk/lac">Les Carr</a> that set me thinking about costs, research data and repositories. </p>
<p>&#8220;Would you want to put that in your repository?&#8221; asked Les. And this is a particularly extreme example where we can do some calculations to give us a fairly good answer. Scientific data centres and the researchers that <a href="http://www.flickr.com/photos/maitri/2333509032/"><img src="http://dablog.ulcc.ac.uk/wp-content/uploads/2010/01/PiPie.jpg" alt="Pi Pie - CC-BY-NC-SA by Maitri@flickr" title="PiPie" width="240" height="160" class="alignright" style="margin: 4px;" /></a> use them have been considering this question for many years, and one way of looking at it is to see if the cost of recomputation exceeds the cost of storage over a particular time period. We&#8217;re assuming here that the initial question &#8211; <a href="http://scienceblogs.com/bookoftrogool/2010/01/chris_rusbridge_settles_the_qu.php">is this worth keeping at all</a> &#8211; has been answered at least vaguely positively.</p>
<p>Let&#8217;s look first at the cost of recomputation. Fabrice says the equipment used for this task cost no more than €2000. If we assume that it has a life of 3 years, that gives us a cost per day of €1.83. I&#8217;m avoiding the usual accounting practice of allowing for inflation, or lost interest on capital, in calculating the true depreciation value of the asset &#8211; there are a number of different schemes and they all give similar results. I&#8217;ve simply divided the capital cost by the number of days of use we&#8217;ll get. But computers use electricity, and that costs money as well. Let&#8217;s assume this is a power-hungry beast that draws 400W and that power costs us 13.5&cent; per kWh (my domestic tariff, assuming a euro/sterling rate of €1.10 = £1 and 5% VAT). Running at 400W around the clock uses 9.6kWh a day, which adds €1.30/day to the cost of running the system, for a total cost of €3.13/day.</p>
<p>Fabrice&#8217;s announcement says that it took 131 days of system time to calculate and verify his results, which gives a computational cost of €410.03 &#8211; which I&#8217;ll round to €410, since I&#8217;ve only been using 3 significant figures so far and there&#8217;s a lot of hand-waving in many of these figures. So we know how much it would cost to recompute this result, given the software, machine and instructions. (And the computational cost is likely to decline over time, at least in the short term.)</p>
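<p>The arithmetic above is easy to re-run with different assumptions. The Python sketch below reproduces it using only the figures quoted in this post (€2000 of hardware, a three-year straight-line life, a 400W draw at 13.5&cent;/kWh, 131 days of run time); the constant and function names are illustrative, and, like the text, it rounds the daily figures to cents before multiplying.</p>

```python
# Back-of-the-envelope recomputation cost for the pi result, using the
# figures quoted in the post. Constant names are illustrative only.
CAPITAL_COST_EUR = 2000.0    # hardware cost quoted by Bellard
LIFETIME_DAYS = 3 * 365      # assumed three-year life, straight-line depreciation
POWER_KW = 0.4               # assumed 400 W continuous draw
TARIFF_EUR_PER_KWH = 0.135   # 13.5 cents per kWh
RUN_DAYS = 131               # days of system time to compute and verify


def daily_costs():
    """Return (depreciation, electricity, total) per day, rounded to cents."""
    depreciation = round(CAPITAL_COST_EUR / LIFETIME_DAYS, 2)
    electricity = round(POWER_KW * 24 * TARIFF_EUR_PER_KWH, 2)  # 9.6 kWh per day
    return depreciation, electricity, round(depreciation + electricity, 2)


def recomputation_cost(days=RUN_DAYS):
    """Cost of one recomputation, charging the rounded daily rate for each day."""
    return round(daily_costs()[2] * days, 2)


if __name__ == "__main__":
    print(daily_costs())         # (1.83, 1.3, 3.13)
    print(recomputation_cost())  # 410.03
```

<p>Changing a single constant, such as the tariff or the assumed machine life, shows how sensitive the €410 figure is to each of the hand-waved inputs.</p>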
<p>The answer needs a terabyte of storage. What will it cost to keep that in a repository? That&#8217;s a slightly more difficult question to answer, but we can give a number of figures that provide upper and lower bounds. <a href="http://www.sdsc.edu/services/StorageBackup.html">SDSC quote</a> $390/Tbyte/year for archival tape storage (dual copies), excluding setup costs and assuming no retrieval. <a href="http://chronopolis.sdsc.edu/assets/docs/dt_cost.pdf">Moore et al</a> quote $500/year as a raw figure, obtained by dividing total system costs by the usable storage within it. At current rates of $1 = €0.67, those give us costs of €261/year and €335/year respectively. SDSC are likely to be at the cheap end of the scale. ULCC&#8217;s costs, given our lower total volumes, would be closer to €1500/year for a similar service (dual archival tape copies on separate sites), although that does include retrieval costs. <a href="http://aws.amazon.com/s3/#pricing">Amazon&#8217;s AWS</a> would be about €100/year for a single copy. You would want two copies, so it&#8217;s twice that, and the cost of transferring the data in would be about 25% more than the storage cost. Since I haven&#8217;t factored in ingest costs for any of the other models, I&#8217;ll ignore them for AWS as well. (And yes, AWS isn&#8217;t a repository, and there&#8217;s no metadata, and&#8230; This is a back-of-the-envelope calculation. It&#8217;s a small envelope.)</p>
<p>Which means, at a very rough level and ignoring many pertinent factors, that after about two years of storage in the repository, we would have been better off recalculating the data rather than storing it. There are a lot of assumptions hidden there, however. For one, we&#8217;re assuming that this data will rarely, if ever, be required. If many people want it, the recalculation cost rapidly becomes prohibitive (and so does the 131-day wait for each request to be satisfied!)</p>
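<p>The storage side, and the break-even point, can be sketched the same way. This Python sketch converts the quoted dollar figures at $1 = €0.67, treats the AWS figure as two copies at €100/year each, ignores ingest costs as the text does, and divides the €410 recomputation cost by each annual storage cost. The <code>retrievals</code> parameter is a hypothetical addition, included to show how quickly repeated requests tip the balance towards storage.</p>

```python
# Break-even between storing the 1 TB result and recomputing it on demand.
# Figures are the rough ones quoted in the post, per terabyte per year.
USD_TO_EUR = 0.67
RECOMPUTE_EUR = 410.0

STORAGE_EUR_PER_YEAR = {
    "SDSC archival tape, dual copies": 390 * USD_TO_EUR,
    "Moore et al. raw figure": 500 * USD_TO_EUR,
    "ULCC dual tape copies (estimate)": 1500.0,
    "Amazon S3, two copies": 2 * 100.0,
}


def break_even_years(annual_storage_eur, recompute_eur=RECOMPUTE_EUR, retrievals=1):
    """Years of storage that cost as much as recomputing the data `retrievals` times."""
    return retrievals * recompute_eur / annual_storage_eur


if __name__ == "__main__":
    for service, cost in STORAGE_EUR_PER_YEAR.items():
        years = break_even_years(cost)
        print(f"{service}: about EUR {cost:.0f}/year, break-even after {years:.1f} years")
```

<p>On these figures a single recomputation breaks even with between about three months (ULCC) and about two years (AWS) of storage, so the conclusion depends heavily on which storage figure you believe; setting <code>retrievals</code> above 1 stretches every break-even point proportionally.</p>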
<p>One of the other problems is more subtle. I said that, in the short term, recalculation costs would be likely to fall as computational power becomes cheaper. The energy costs involved will rise, of course, but there&#8217;s still a significant downward trend. But after a sufficient period of time, it becomes non-trivial to reconstruct the software and the environment it needs in order to allow the computation to happen. Imagine trying to recalculate something now where the original software is a <a href="http://en.wikipedia.org/wiki/PL/I">PL/I</a> program designed to run under OS/360. It&#8217;s not impossible by any means, but the cost involved and the expertise required are non-trivial. At least with our example we won&#8217;t have any doubts about whether the right answer has been produced &#8211; the computation of &pi; produces an exact, if never-ending, answer. Most scientific software doesn&#8217;t do this, and the exact answers produced can depend on the compiler, the floating-point hardware, mathematical libraries and the operating system. Over time, it becomes harder and harder to recreate these faithfully, and we often don&#8217;t have any means of checking whether or not we have succeeded. (Keeping the original outputs would help in this, of course, but that&#8217;s exactly what we&#8217;re trying to avoid.) That&#8217;s part of the problem that Brian Matthews and his colleagues examine in the <a href="http://sigsoft.dcc.rl.ac.uk/twiki/bin/view/Main/AboutSigSoft">SigSoft</a> project, and there&#8217;s still a great deal of work to be done there.</p>
<p>So have we answered Les&#8217;s question? My feeling is that in this case we have &#8211; there&#8217;s a fair amount of evidence to suggest that keeping this particular data set isn&#8217;t cost-effective. But in general, the question is far harder to answer. Yet we must strive for more general answers, because the cost of not doing so is far from trivial. Even if money did grow on trees, it still wouldn&#8217;t be free, and at present we need to be very careful how we use it.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2010/01/07/a-repository-for-pies/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>File formats&#8230;or data streams?</title>
		<link>http://dablog.ulcc.ac.uk/2009/12/03/ffods/</link>
		<comments>http://dablog.ulcc.ac.uk/2009/12/03/ffods/#comments</comments>
		<pubDate>Thu, 03 Dec 2009 16:42:53 +0000</pubDate>
		<dc:creator>Edward Pinsent</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Digital Archives]]></category>
		<category><![CDATA[DPC]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[file formats]]></category>
		<category><![CDATA[preservation]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=811</guid>
		<description><![CDATA[On 1st December Malcolm Todd of The National Archives gave a good account of the work he&#8217;s been doing on File Formats for Preservation, resulting in a substantial new Technology Watch report for the DPC. It was a seminar hosted by William Kilbride, with participants from the BBC, the BL, NLW and others. The afternoon [...]]]></description>
			<content:encoded><![CDATA[<p>On 1st December Malcolm Todd of The National Archives gave a good account of the work he&#8217;s been doing on <strong>File Formats for Preservation</strong>, resulting in a substantial new <a href="http://www.dpconline.org/docs/reports/dpctw09-02.pdf">Technology Watch report for the DPC</a>. It was a seminar hosted by William Kilbride, with participants from the BBC, the BL, NLW and others. The afternoon was useful and interesting for me since I teach an elementary module on file formats in a preservation context for our DPTP courses.</p>
<p>My naïve thinking in this area has been characterised by the assumption that the process is rather static or linear, and that the problem we&#8217;re facing is broadly the same every time: migrate data from a format that&#8217;s about to become obsolete or unsupported onto another format that&#8217;s stable, supported, and open. MS Word document to PDF or PDF/A…now <em>that</em>, I can understand!</p>
<p>In fact, I learned at least two ways of thinking about formats that hadn&#8217;t occurred to me before. The first is simple: cost. Some formats cost more to preserve than others. This can be calculated in terms of storage costs, multiplied over time, plus the costs associated with migrations to new versions of that format. <span id="more-811"></span>For example, we&#8217;ve tended to pin our faith on the TIFF format for images for many reasons, but there&#8217;s a high storage price to be paid for all that wonderful losslessness. This may be one reason why the DP world is looking with more favour on the JPEG2000 format, which is &#8216;virtually&#8217; lossless and smaller in size.</p>
<p>The second is the problem of preserving digital data that doesn&#8217;t actually have a specified, stable preservation format. Chris Puttick of <a href="http://thehumanjourney.net/">Oxford Archaeology</a> gave a vivid description of the problems he&#8217;s facing with CAD and GIS files, where the data can&#8217;t easily be tied to a single format in the first place (nor can a stable format for migration be identified). As the NLA put it on their <a href="http://www.nla.gov.au/padi/topics/432.html">PADI page</a>, &#8220;At present there is little dealing specifically or comprehensively with the preservation of this particular type of data, although some aspects of database preservation are applicable to GIS. Some long term preservation issues include a lack of open source formats and metadata standards, large data volume and complex data objects.&#8221; Puttick suggests that his data doesn&#8217;t really perform at all unless it&#8217;s operated within a very specific environment of hardware and software. How do we preserve an environment? This appears to be quite a distinct preservation problem and much harder to solve than Word to PDF, to put it mildly.</p>
<p>William Kilbride suggested that such cases (and websites too, arguably, because they are time-based) are more like a <em>stream</em> of data &#8211; a handy image that conveys something about the dynamic of such information packages, and shows us that it&#8217;s much harder to nail them down into a single format. You can never step into the same river twice.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2009/12/03/ffods/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>SNEEP 0.3.2 (now with automagic installer) + PICT (SNEEP evolves!)</title>
		<link>http://dablog.ulcc.ac.uk/2009/06/11/sneep-032-plus-pict/</link>
		<comments>http://dablog.ulcc.ac.uk/2009/06/11/sneep-032-plus-pict/#comments</comments>
		<pubDate>Thu, 11 Jun 2009 13:09:31 +0000</pubDate>
		<dc:creator>Rory McNicholl</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Repositories]]></category>
		<category><![CDATA[EAD]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[jiscri]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[PICT]]></category>
		<category><![CDATA[repositories]]></category>
		<category><![CDATA[SNEEP]]></category>
		<category><![CDATA[social networking]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[ULCC]]></category>
		<category><![CDATA[web 2.0]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=693</guid>
		<description><![CDATA[SNEEP 0.3.2 The JISC funded SNEEP project (Social Networking Extensions for EPrints) &#8211; part of the original JISC rapid innovation programme &#8211; aimed to provide a set of social networking tools for EPrints repositories. It ran for 6 months and ended in May 2008. Since the rather low key publication of the resultant EPrints plugin [...]]]></description>
			<content:encoded><![CDATA[<p><strong>SNEEP 0.3.2</strong></p>
<p>The JISC-funded <a title="SNEEP wiki" href="http://sneep.ulcc.ac.uk/wiki/index.php/Main_Page">SNEEP</a> project (Social Networking Extensions for EPrints) &#8211; part of the original JISC rapid innovation programme &#8211; aimed to provide a set of social networking tools for EPrints repositories. It ran for 6 months and ended in May 2008. Since the rather low-key publication of the resulting EPrints plugin, interest and uptake have been <a title="sneep posts on daBlog" href="http://dablog.ulcc.ac.uk/tag/sneep">slowly but surely gathering momentum</a>.</p>
<p>Today I am pleased to announce a couple of significant SNEEP-related developments. Firstly, thanks to my colleague Ben Wheeler here at ULCC, SNEEP 0.3.2, released this week, offers an automagic installer. This does away with the (slightly tortuous) manual install procedure that we suspect discouraged all but the hardier EPrints hac&#8230; I mean administrators.</p>
<p>You can download <a title="SNEEP 0.3.2 download" href="http://sneep.ulcc.ac.uk/eprints/21/">SNEEP 0.3.2</a> and/or read <a title="SNEEP 0.3.2 announcement" href="http://www.eprints.org/tech.php/11149.html">Ben&#8217;s post</a> to the EP-tech mailing list. The download page is also a good place to see SNEEP in action.</p>
<p><strong>PICT</strong></p>
<p>I am also pleased to announce a new project (funded as part of the 2009 JISC rapid innovation programme) that aims to build on the SNEEP work to provide SNEEP-ish services to a broader range of web resources. The goal of the PICT project (Platform Independent Community Toolbox) is a lightweight JavaScript tool that can be deployed across a number of web resources (not just a repository) to encompass the web-based real estate of a given research community and provide that community with collaborative tools <em>available at the on-line research coalface</em>.</p>
<p>Effectively PICT will allow resource owners to offer</p>
<ul>
<li>tags</li>
<li>comments</li>
<li>notes</li>
<li>other goodies</li>
</ul>
<p>from <em>their</em> web page. The data gathered by these tools will be managed by a PICT server (probably run by a community-minded resource owner) and be available for cross-referencing with other resources in a PICT community.</p>
<p>If all that is a bit difficult to picture, rest assured that demos will appear throughout the course of the project that should help to clear the murk.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2009/06/11/sneep-032-plus-pict/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>If you can keep your blog when all around&#8230;</title>
		<link>http://dablog.ulcc.ac.uk/2009/03/20/if-you-can-keep-your-blog/</link>
		<comments>http://dablog.ulcc.ac.uk/2009/03/20/if-you-can-keep-your-blog/#comments</comments>
		<pubDate>Fri, 20 Mar 2009 15:07:18 +0000</pubDate>
		<dc:creator>Richard M. Davis</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Web Archiving]]></category>
		<category><![CDATA[blogs]]></category>
		<category><![CDATA[digital archives]]></category>
		<category><![CDATA[DPE]]></category>
		<category><![CDATA[JiSC-PoWR]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[web 2.0]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=412</guid>
		<description><![CDATA[I was a keen participant in the activities of ERPANET, but I must confess I haven’t kept abreast of its successor, Digital Preservation Europe (DPE). However I was interested to see the recent DPE briefing paper about blog preservation, since it covers an area that we also tackled in the course of the JISC-PoWR [...]]]></description>
			<content:encoded><![CDATA[<p>I was a keen participant in the activities of <a id="lr72" title="ERPANET" href="http://www.erpanet.org/">ERPANET</a> , but I must confess I haven’t kept abreast of its successor, <a id="uxzv" title="Digital Preservation Europe" href="http://www.digitalpreservationeurope.eu/">Digital Preservation Europe</a> (DPE). However I was interested to see the recent DPE briefing paper about blog preservation, since it covers an area that we also tackled in the course of the JISC-PoWR project &#8211; on the <a id="aozl" title="blog" href="http://jiscpowr.jiscinvolve.org/">blog</a> , in the <a id="te3e" title="workshops" href="http://jiscpowr.jiscinvolve.org/workshops/">workshops</a> and the <a id="emon" title="handbook" href="http://jiscpowr.jiscinvolve.org/handbook/">handbook</a>. The Briefing Paper highlights key issues for those who would preserve blogs. It is a necessarily general overview, and manages to cram a lot of preservation issues into its two sides of A4. But, for the blogger approaching preservation, or the preservationist approaching blogs, I wonder if such avalanches of considerations aren&#8217;t sometimes unnecessarily overwhelming. It seemed worth looking at a few of the points made in the DPE briefing paper, and considering whether we can demystify them or make the task seem less daunting.</p>
<p><span id="more-412"></span><em><strong>Calls in the literature have advocated that blogs, as potentially valuable additions to the human record, are worthy of stewardship and long-term preservation. </strong></em>It&#8217;s hard to argue with this, in the light of recent widely publicised use of blogging by <a id="lzcq" title="President Obama" href="http://www.whitehouse.gov/blog/">President Obama</a>, the <a id="zh71" title="Number Ten Downing Street blog" href="http://number10.gov.uk/">Number Ten Downing Street blog</a>, the blogging and tweeting activities of certain celebrities, and so on. But of course anyone can have a blog (and it sometimes seems that almost everyone does). Blogs are phenomenally useful, and have a wide variety of applications: individual or corporate announcements and publications; time-stamped records of activities; public or private figures writing as themselves or pseudonymously; public and private journal and discussion spaces for students and a wide range of communities. A blog may well be just for Christmas (or any kind of project) no less than a traditional diary or journal. It is in essence, after all, just a sequential log (“blog” is merely a geeky contraction of “web log”). Blogging is therefore not so fundamentally alien: <a id="bs2y" title="Captain Cook kept a log" href="http://southseas.nla.gov.au/index_voyaging.html">Captain Cook kept a log</a> (and so will Captain Kirk). Rather than make it sound very technical and complicated, let&#8217;s start by taking some comfort in what&#8217;s familiar.</p>
<p><strong><em>Should all blogs be preserved? Which blogs should be preserved?</em></strong> The DPE Briefing Paper highlights (as all good preservation advice must) the issue of selection. How do we decide which blogs to preserve? To start with, the remit of any archival body (national, municipal, corporate, specialist, etc.) is itself self-selecting. It’s unlikely any archivist at NARA, TNA or any other archive would ever have to begin their deliberations with “all blogs”. As for authenticity and attribution, the nature of the archive undertaking the preservation will, equally, dictate whether knowing the author&#8217;s true identity is essential or not. Common sense should prevail. Clearly for records of official activities, such as the Downing Street and Obama blogs, reliable provenance and authenticity are critical. On the other hand, satirical blogs by &#8220;<a id="q:ax" title="Geoffrey Chaucer" href="http://www.gordonbrown.com/">Gordon Brown</a>&#8221; or &#8220;<a id="d5a2" title="Vladimir Putin" href="http://www.newsgroper.com/vladimir-putin">Vladimir Putin</a>&#8221; are clearly not records of official activities, but they may still merit preservation as publications and/or as interesting artefacts: as with many fictional or satirical diaries or letters, the true identity of the author may or may not be material (Beachcomber, Henry Root, Primary Colors).</p>
<p><em><strong>Is preservation of content, regardless of context, sufficient?</strong></em> This is one of the most significant questions raised by the briefing paper. Has the Web changed our expectations of context? Once a descriptive catalogue or historical account seemed sufficient to establish context. Digital media now tempt us into thinking that, for any object, we should automatically gather (a copy of) everything linked-to, and perhaps everything that links to it, and so on. We can leave a computer to follow the endlessly forking paths, and create a virtual machine environment that will always render the gathered materials faithfully &#8211; all without any manual intervention. Will that provide context, or make the preserved content understandable? Can we do without expert researchers and information professionals to scope, select and provide commentary where necessary?</p>
<p><em><strong>The inability to capture and preserve the design and features of the blog contradicts defining attributes of a blog</strong></em>. In fact it&#8217;s not impossible to capture and preserve many if not all of the design features of a blog. The JISC-PoWR blog has been <a id="r_bb" title="harvested by UKWAC" href="http://www.webarchive.org.uk/wayback/archive/20090101223818/http://jiscpowr.jiscinvolve.org/">harvested by UKWAC</a> with most of its features intact (experiments at ULCC have suggested that, with a little care, WordPress blogs are particularly well-suited to harvesting). But in general, let’s accept (as the <a id="xsdm" title="Blogbackuponline" href="http://www.blogbackuponline.com/">Blogbackuponline</a> people do) the overriding significance of the textual content over and above the bells and whistles of the web presentation. If there is one particularly interesting feature of blogs that does set them apart from conventional web content, it is that they are generally intended for delivery and consumption in two distinct formats: as an HTML web page, and as an RSS/Atom XML newsfeed. The newsfeed version is bare-bones hypertext stripped of all the styling, bells and whistles of the web version, and will be read in many contexts, from online newsreaders and aggregators like Netvibes or Bloglines, to desktop clients such as Mozilla Thunderbird, as well as on mobile phones and through automatic syndication to other sites.</p>
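<p>That dual delivery is useful to a preservationist, because the feed already carries the textual content without the presentation. As an illustration only, the Python sketch below uses the standard library&#8217;s ElementTree to pull the title, link, author and full post body out of each item of an RSS 2.0 feed such as this one. The function name and the choice of fields are assumptions, and a real harvester would also need to cope with Atom feeds, feeds that carry only recent posts, and comments.</p>

```python
import xml.etree.ElementTree as ET

# Namespaces used by WordPress-style RSS 2.0 feeds for the full post body
# and the author name.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def feed_items(rss_xml):
    """Return one dict per item element in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "creator": item.findtext("dc:creator", namespaces=NS),
            # content:encoded holds the post as HTML; the parser unwraps CDATA
            "body": item.findtext("content:encoded", default="", namespaces=NS),
        })
    return items
```

<p>Called on the raw bytes of a saved feed, for example <code>feed_items(open("feed.xml", "rb").read())</code> with a hypothetical file name, it returns plain dictionaries that could be stored alongside a crawl of the HTML pages.</p>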
<p><em><strong>Blogs, including posts and comments, may be intentionally edited or deleted.</strong></em> Looked at in isolation, editability is a significant property of blogs in everyday use. But, as <a id="pyzf" title="Chris Rusbridge observed" href="http://digitalcuration.blogspot.com/2007/07/authenticity-across-migrations.html">Chris Rusbridge observed</a>, &#8220;editability, while vital for some kinds of re-use, is not essential for conveying the information essence of the objects&#8221;. We hold any preserved object only in the state it was in when taken into archival custody (though we may also capture and preserve records of how it changed). Preservation schedules for blogs will need to decide how important this really is: when is it, and when is it not, important to preserve an edit that no one ever saw? (A transactional approach &#8211; only capturing page impressions users request &#8211; is one way to address this without excessive redundancy.) However, if anything, this volatility of blog content is a more serious issue for those citing an active blog (who may find their quotations no longer match): a copy in an archive ought to be fixed, and reliably citable. This would suggest that creating stable archives of valuable blogs <em>for the record</em> is essential.</p>
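<p>Keeping a record of how a post changed need not be elaborate. A minimal sketch, assuming the archive keeps the text of successive captures of the same post: compare a SHA-256 fixity value for each capture and, only when they differ, store a unified diff alongside the newer copy. The function names are hypothetical, and this is one simple option among many rather than a description of any particular archive&#8217;s practice.</p>

```python
import difflib
import hashlib


def fixity(text):
    """SHA-256 digest of a captured post, so later captures can be compared."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def change_record(old_capture, new_capture, label_old="capture-1", label_new="capture-2"):
    """Return None if two captures are identical, otherwise a unified diff."""
    if fixity(old_capture) == fixity(new_capture):
        return None
    diff = difflib.unified_diff(
        old_capture.splitlines(keepends=True),
        new_capture.splitlines(keepends=True),
        fromfile=label_old,
        tofile=label_new,
    )
    return "".join(diff)
```

<p>Because identical captures produce no record, repeated harvests of an unchanged post add nothing beyond the fixity value, while an edited post leaves a small, human-readable trace of what changed between captures.</p>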
<p>James Currall rightly warned us against making an undue <a href="http://jiscpowr.jiscinvolve.org/2009/01/07/the-fetish-of-the-digital/">fetish of digital media</a>, at the expense of the information (and entertainment) which they convey; arguably as dangerous is the fetishisation of specific manifestations of digital media. How different, really, are blogs from other publications or applications built on the Web platform? Blogs are essentially web pages and don’t exist in a void. The &#8220;problem&#8221; of hypertext links to possibly unstable objects in remote locations is the bête noire of all efforts to tame the Web. “Not in archive” is, and will ever be, a familiar message to regular users of archive.org. But we can perhaps take heart that many <a id="gdjs" title="archives of correspondence" href="http://correspondence.linnean-online.org/view/correspondence/correspondence.html">archives of correspondence</a> tend to be one-sided too.</p>
<p>Sometimes thinking about these things in the abstract is no substitute for actually doing them for real. Perhaps the best advice for anyone tasked with preserving blogs is, first of all, to try keeping one yourself for a few days or weeks. Then it only takes a few minutes to try out <a id="mchf" title="BlogBackupOnline" href="http://www.blogbackuponline.com/">BlogBackupOnline</a>; or check how your favourite blog is faring in the <a id="b0.:" title="your favourite blog in the Internet Archive" href="http://web.archive.org/web/*/http://dablog.ulcc.ac.uk">Internet Archive</a>, the <a id="z-gg" title="European Web Archive" href="http://www.europarchive.org/web.php">European Web Archive</a>, or the <a id="wnm." title="UK Web Archiv" href="http://www.ukwebarchive.org.uk/">UK Web Archive</a>; or even download <a id="jj28" title="HTTrack" href="http://www.httrack.com/">HTTrack</a> or <a id="lzax" title="Wget" href="http://wget.addictivecode.org/FrequentlyAskedQuestions?action=show&amp;redirect=Faq#download">Wget</a> and try harvesting something yourself.</p>
<p>Then it&#8217;s time to <a href="http://jiscpowr.jiscinvolve.org/2009/02/16/archivists-and-records-managers-twitter-group/">get Twittering</a>&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2009/03/20/if-you-can-keep-your-blog/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Draft standard for long-term archiving of CAD data</title>
		<link>http://dablog.ulcc.ac.uk/2008/09/08/draft-standard-for-long-term-archiving-of-cad-data/</link>
		<comments>http://dablog.ulcc.ac.uk/2008/09/08/draft-standard-for-long-term-archiving-of-cad-data/#comments</comments>
		<pubDate>Mon, 08 Sep 2008 16:17:05 +0000</pubDate>
		<dc:creator>Kevin Ashley</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Digital Archives]]></category>
		<category><![CDATA[CAD]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[preservation]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/2008/09/08/draft-standard-for-long-term-archiving-of-cad-data/</guid>
		<description><![CDATA[September&#8217;s edition of BSI&#8217;s Update Standards magazine alerted me to another batch of standards, currently at the public comment stage, which are of particular relevance to digital preservation. The BS EN 9300 family is entitled &#8216;Long term archiving and retrieval of digital technical product documentation such as 3D, CAD and PDM data&#8216; and 5 parts [...]]]></description>
			<content:encoded><![CDATA[<p>September&#8217;s edition of BSI&#8217;s Update Standards magazine alerted me to another batch of standards, currently at the public comment stage, which are of particular relevance to digital preservation. The BS EN 9300 family is entitled &#8216;<strong>Long term archiving and retrieval of digital technical product documentation such as 3D, CAD and PDM data</strong>&#8217;, and parts 100, 110, 007, 005, 002 and 115 are open for comment until September 30th. I was initially surprised that I had heard nothing of this series of standards before, and wasn&#8217;t sure whether this was simply lack of observation on my part or because the standards came from an entirely different domain. They clearly aren&#8217;t new &#8211; unlike BS 10008, which I <a href="http://dablog.ulcc.ac.uk/2008/06/26/bs-10008-time-is-running-out/">wrote about in June</a>, this is not a home-grown British Standard but one which is being proposed &#8216;for adoption&#8217;, which means that it&#8217;s already been adopted by another body somewhere. That status also means that you can&#8217;t use BSI&#8217;s excellent online commenting system; instead, so far as I can see, you have to buy the drafts from BSI on paper.</p>
<p>In fact, as I was relieved to note, this group of standards isn&#8217;t entirely new to the digital preservation community, and the authors are also aware of general DP standards such as OAIS. They derive from a group of standards known as <a href="http://www.steptools.com/library/standard/">STEP</a> (Standard for Exchange of Product Model Data), codified in ISO 10303. <span id="more-190"></span>STEP came up in a presentation and discussions at <a href="http://homepages.inf.ed.ac.uk/hmueller/presdb07/">PresDB&#8217;07</a>, although it has been in development since the 1980s. But STEP is a huge family of standards, and this particular strand appears to have emerged from the aerospace industry, which has been concerned about the long-term survival of CAD data for some time.</p>
<p><a href="http://www.asd-stan.org/Lotar.html">LOTAR</a> is one of the results of this concern, and <a href="http://www.aristote.asso.fr/sem/sem0804.d/005-Duchier-airbus.pdf">a presentation by Pierre Duchier</a> of AIRBUS at an <a href="http://www.aristote.asso.fr/sem/semnext.html">Aristote conference</a> in April this year gives a clear picture of the concerns of industry and the approach they are taking. LOTAR and related work were also covered at an invitation-only event at Bath in 2007, the <a href="http://www.ukoln.ac.uk/events/ltkr-2007/programme.html">Atlantic Workshop on Long-Term Knowledge Retrieval</a>, whose attendees included many names familiar from the DCC and the repository community.</p>
<p>Overall, this is reassuring. Here is a set of digital preservation standards being developed and driven by concerns in industry, but where the work is taking place in dialogue with the academic research and development community. CAD data in particular has long been a concern at the Archaeology Data Service, which has significant holdings of AutoCAD files, and it will be interesting to see how far LOTAR is relevant to such activities. In the meantime, I can see I&#8217;ve got a lot of reading to do to catch up, but I would welcome hearing from others who have more insight into the possible use of these standards outside the aerospace industry.</p>
<p>Incidentally, BS EN 9300 is not to be confused with BS EN ISO 9300, an entirely different standard concerned with &#8220;<em>Measurement of gas flow by means of critical flow Venturi nozzles</em>&#8220;. <img src='http://dablog.ulcc.ac.uk/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2008/09/08/draft-standard-for-long-term-archiving-of-cad-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web 2.0 and Archives: Something like a Phenomenon?</title>
		<link>http://dablog.ulcc.ac.uk/2008/08/20/web-20-and-archives-something-like-a-phenomenon/</link>
		<comments>http://dablog.ulcc.ac.uk/2008/08/20/web-20-and-archives-something-like-a-phenomenon/#comments</comments>
		<pubDate>Wed, 20 Aug 2008 22:20:23 +0000</pubDate>
		<dc:creator>Joanne Anthony</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Web Archiving]]></category>
		<category><![CDATA[archives]]></category>
		<category><![CDATA[audience development]]></category>
		<category><![CDATA[blogs]]></category>
		<category><![CDATA[community engagement]]></category>
		<category><![CDATA[digital divide]]></category>
		<category><![CDATA[exclusion]]></category>
		<category><![CDATA[inclusion]]></category>
		<category><![CDATA[minority ethnic groups]]></category>
		<category><![CDATA[social networking software]]></category>
		<category><![CDATA[tagging]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[technologies]]></category>
		<category><![CDATA[web 2.0]]></category>
		<category><![CDATA[wikis]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/2008/08/20/web-20-and-archives-something-like-a-phenomenon/</guid>
		<description><![CDATA[I just spotted a posting from a fellow Antipodean, made to the Australian Archivists (aus-archivists) listserv, which has certainly raised some interesting questions surrounding web 2.0 technologies and their impact on the Archive sector&#8230;. Perhaps a debate well worth monitoring, and further exploring here, within the realm of web 2.0 itself? See Australian Archivists listserv [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://wordle.net/gallery/wrdl/132752/Archives_2.0_again" title="Wordle: Archives 2.0 again"><img src="http://wordle.net/thumb/wrdl/132752/Archives_2.0_again" class="float-right" style="border: 1px solid #dddddd; padding: 4px" /></a></p>
<p>I just spotted a posting from a fellow Antipodean, made to the Australian Archivists (aus-archivists) listserv, which has certainly raised some interesting questions about Web 2.0 technologies and their impact on the Archive sector&#8230; <strong>Perhaps this is a debate well worth monitoring, and exploring further here, within the realm of Web 2.0 itself?</strong></p>
<p>See <strong>Australian Archivists listserv posting</strong> below:</p>
<blockquote><p><em>Archival institutions are increasingly using social networking sites, tagging (folksonomies), blogs, wikis, and Flickr to promote their collections. Does anyone know any studies evaluating this phenomenon in the archival setting? I don&#8217;t mean the this-is-how-we-did-it, isn&#8217;t-it-exciting or look-how-many-hits-we&#8217;re-getting articles. I mean thoughtful consideration of the value of these tools and the effects they are having on our work. I&#8217;d also love some discussion on the list.</em></p>
<p><em>Why we have decided to use these tools? What benefits have they brought? What kind of new audiences are they attracting? How long do these audiences stick around? Is the resource taken to sustain these &#8216;relationships&#8217; worth it? Do these audiences engage with us beyond the social network stuff? Do they use our databases? come in and use our collections? order quality copies? Does it matter if they don&#8217;t? How is our adoption of these tools affecting what material we choose to process and promote? With user-generated content and tagging are our formal documentation skills, cataloguing standards, thesauri, etc passé? Does mashing trivialise our research collections? Any observations or leads to articles welcome. [ Helen Yoxall, Archives Manager, Registration and Collection Management, Powerhouse Museum, PO Box K346, Haymarket NSW 1238, Australia, URL: <a href="www.powerhousemuseum.com/archives/index.asp">www.powerhousemuseum.com/archives/index.asp</a>]</em></p></blockquote>
<p><span id="more-187"></span>After a quick (and rudimentary) search, I stumbled across a few sources that may be of interest:</p>
<ul>
<li><a href="http://www.ukoln.ac.uk/web-focus/events/seminars/mla-ne-2006-10/" target="_blank">UKOLN</a> have a presentation, <strong>&#8220;Web 2.0: Implications For The Cultural Heritage Sector&#8221;</strong>.</li>
<li>The Archives &amp; Museum Informatics <a href="http://www.archimuse.com/mw2007/papers/alain/alain.html" target="_blank">Museums and the Web</a> website includes a paper, <strong>&#8220;Towards Community Contribution: Empowering Community Voices On-Line&#8221;</strong> (Angèle Alain, Library and Archives Canada, Canada; Michelle Foggett, The National Archives of England and Wales, UK). It discusses Web 2.0 and community involvement in museums, libraries and archives, e.g. the <a href="http://www.movinghere.org.uk/" target="_blank">Moving Here</a> project, which has sought to &#8220;break down barriers to the direct involvement of minority ethnic groups in sharing their history on-line&#8221; and is, among other projects, keen to &#8220;embrace social networking in future to give users a higher profile voice to enable their knowledge to be passed down to the next generation&#8221;. (However, &#8220;specialised and appropriate training&#8221; was identified as crucial to tackling the barriers, such as the &#8216;digital divide&#8217; itself.) On the related <a href="http://www.museumscomputergroup.org.uk/meetings/2-2006-abs.shtml" target="_blank">Museums Computer Group</a> website is the abstract of the article <strong>&#8220;Museums and Web 2.0: Connections + Community&#8221;</strong> (by Jennifer Trant, Archives and Museum Informatics), which notes both the possibilities and the challenges surrounding the adoption of Web 2.0.</li>
<li>Also of interest are the <strong>Web 2.0 blogs</strong> <a href="http://library20.ning.com/" target="_blank">Library 2.0 network</a>, <a href="http://librarygang.talis.com/" target="_blank">Library 2.0 gang</a> and <a href="http://museumtwo.blogspot.com/" target="_blank">Museum 2.0</a>; and while I couldn&#8217;t spot an Archives 2.0-specific blog anywhere, there were interesting postings on the <a href="http://www.archiveshub.ac.uk/blog/2007/12/archives-20-fact-or-fiction.html" target="_blank">Archives Hub blog</a> and on <a href="http://www.archivesnext.com/?cat=25" target="_blank">ArchivesNext</a> (which, as one of its aims, invites bloggers to explore &#8220;Web 2.0 applications and discussing their applicability to archival institutions&#8221;).</li>
<li>Our own <a href="http://dablog.ulcc.ac.uk/2007/12/13/ukwac-what-about-hlf-websites/" target="_blank">dablog</a> has highlighted one dimension of the Web 2.0 impact (inspired by UCL&#8217;s <a href="http://www.ucl.ac.uk/slais/andrew-flinn/" target="_blank">Dr Andrew Flinn</a>): the urgent calls to preserve the heritage outputs of Web 2.0, given &#8220;the transient history of the increasing number of minority/dissenting voices, whose heritage is only documented via websites, blogs, wikis and social software&#8221;.</li>
<li>In true folksonomic Web 2.0 style, we can also see what&#8217;s been tagged as <a href="http://delicious.com/tag/archives2.0">Archives2.0 in del.icio.us</a>.</li>
</ul>
<p>Know of any other sources that discuss the impact of Web 2.0 on the Archives Sector? Or would you like to share your opinions in response to the questions posed by Helen Yoxall (above)?</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2008/08/20/web-20-and-archives-something-like-a-phenomenon/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>DCC discussions on image formats</title>
		<link>http://dablog.ulcc.ac.uk/2008/07/04/dcc-discussions-on-image-formats/</link>
		<comments>http://dablog.ulcc.ac.uk/2008/07/04/dcc-discussions-on-image-formats/#comments</comments>
		<pubDate>Thu, 03 Jul 2008 23:14:12 +0000</pubDate>
		<dc:creator>Richard M. Davis</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Digitisation]]></category>
		<category><![CDATA[DART]]></category>
		<category><![CDATA[DCC]]></category>
		<category><![CDATA[file formats]]></category>
		<category><![CDATA[images]]></category>
		<category><![CDATA[JPEG]]></category>
		<category><![CDATA[preservation]]></category>
		<category><![CDATA[RAW]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[TIFF]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/2008/07/04/dcc-discussions-on-image-formats/</guid>
		<description><![CDATA[Rich pickings in a couple of fascinating posts on the DCC Digital Curation Blog, in which Chris Rusbridge summarises recent discussions on the DCC-Associates email list about appropriate photo image file formats for preservation, specifically TIFF, RAW and JPEG 2000. A sibling post also discusses the merits of RAW versus TIFF from the perspective of [...]]]></description>
			<content:encoded><![CDATA[<p>Rich pickings in a couple of fascinating posts on the <a href="http://digitalcuration.blogspot.com/2008/07/responses-to-raw-versus-tiff.html" title="Responses to RAW versus TIFF on DCC Blog" target="_blank">DCC Digital Curation Blog</a>, in which Chris Rusbridge summarises recent discussions <span class="postbody">on the DCC-Associates email list </span>about appropriate photo image file formats for preservation, specifically TIFF, RAW and JPEG 2000. A <a href="http://digitalcuration.blogspot.com/2008/07/responses-to-raw-versus-tiff-image.html" title="Responses to RAW versus TIFF image on DCC Blog" target="_blank">sibling post</a> also discusses the merits of RAW versus TIFF from the perspective of different users and uses.</p>
<p>The proprietary nature of RAW formats (an emerging OpenRAW standard notwithstanding) and the relative youth of JPEG 2000 would both tend to bolster the longstanding preference for TIFF, but as Chris&#8217;s posts make clear, each preservation project should nevertheless weigh the options against its own requirements and resources.</p>
<p>If in doubt, the &#8220;keep everything&#8221; approach is as attractive as ever, but &#8211; in spite of the old mantras about ever-cheaper filestore &#8211; the costs of storage space and its management can become very large once one enters the world of terabytes and petabytes. <span id="more-134"></span>In one example, Sean Martin of the BL concludes that</p>
<blockquote><p>probably only a small amount of additional value is created for the additional expense approaching £200K. This leads to the question &#8220;if we had £200K on what would we spend it?&#8221; and probably the answer is &#8220;not in this way&#8221;.</p></blockquote>
<p>The posts contain many interesting examples of costings, based on the experience of BL, SDSC and others. I won&#8217;t even attempt to summarise (Chris&#8217;s summary of) them here, but I hope that this post will still be available for me to consult next time I have to dabble in the murky science of DP costings.</p>
<p>In some instances, the case for preserving the RAW image may nevertheless be compelling. One can imagine that the risk of missing a new planet or virus (or identifying a non-existent one), or other potential infelicities in scientific and medical imaging, is not worth contemplating. By contrast, it&#8217;s hard to see what value might be added to a collection like the <a href="http://dablog.ulcc.ac.uk/2007/11/02/7/" title="Launch of Linnean Online">Linnean Society&#8217;s</a> by keeping raw camera data alongside the TIFFs, thereby more than doubling the storage capacity required.</p>
<p>It&#8217;s for making decisions like this that the intellectual exercise of identifying <a href="http://dablog.ulcc.ac.uk/2008/04/08/significant-properties/">significant properties</a> <em>and</em> the needs of all stakeholders &#8211; creators, curators and users &#8211; seems particularly essential.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2008/07/04/dcc-discussions-on-image-formats/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>BS 10008 &#8211; time is running out</title>
		<link>http://dablog.ulcc.ac.uk/2008/06/26/bs-10008-time-is-running-out/</link>
		<comments>http://dablog.ulcc.ac.uk/2008/06/26/bs-10008-time-is-running-out/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 22:40:52 +0000</pubDate>
		<dc:creator>Kevin Ashley</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Digital Archives]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/2008/06/26/bs-10008-time-is-running-out/</guid>
		<description><![CDATA[I wrote in early May about the draft of BS 10008 on the evidential value of digital information. It&#8217;s only in the last few days that I&#8217;ve got round to reading and commenting on it, and I&#8217;m afraid to say it&#8217;s not as good as I had hoped. The standard claims applicability to all classes [...]]]></description>
			<content:encoded><![CDATA[<p>I <a href="http://dablog.ulcc.ac.uk/2008/05/12/draft-standard-on-evidential-value-of-digital-information/">wrote in early May</a> about the draft of BS 10008 on the evidential value of digital information. It&#8217;s only in the last few days that I&#8217;ve got round to reading and commenting on it, and I&#8217;m afraid to say it&#8217;s not as good as I had hoped. The standard claims applicability to all classes of digital object, but it&#8217;s clear that it&#8217;s written by people who deal primarily with documents &#8211; and some parts of it read as if it&#8217;s primarily about digitised versions of paper documents. If there&#8217;s a need for a standard about document management systems or about scanned document systems, then that&#8217;s fine. But I find it worrying when it&#8217;s then claimed that such a standard has much broader scope.</p>
<p>Elements of the standard would benefit from knowledge of standards such as OAIS, which try to address long-term accessibility in a much more rigorous fashion. Comments close next Monday, June 30th, so if you have not yet commented but are concerned by any of this, do so now.</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2008/06/26/bs-10008-time-is-running-out/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Draft standard on evidential value of digital information</title>
		<link>http://dablog.ulcc.ac.uk/2008/05/12/draft-standard-on-evidential-value-of-digital-information/</link>
		<comments>http://dablog.ulcc.ac.uk/2008/05/12/draft-standard-on-evidential-value-of-digital-information/#comments</comments>
		<pubDate>Mon, 12 May 2008 16:08:33 +0000</pubDate>
		<dc:creator>Kevin Ashley</dc:creator>
				<category><![CDATA[DA Blog]]></category>
		<category><![CDATA[Digital Archives]]></category>
		<category><![CDATA[authenticity]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/2008/05/12/draft-standard-on-evidential-value-of-digital-information/</guid>
		<description><![CDATA[My thanks are due to Susan Healy at The National Archives for drawing my attention to a new draft standard of relevance to digital preservation, currently open for public comment. British Standard BS 10008 &#8211; Evidential weight and legal admissibility of electronic information attempts to define clear guidelines for the creation and custody of digital [...]]]></description>
			<content:encoded><![CDATA[<p>My thanks are due to Susan Healy at The National Archives for drawing my attention to a new draft standard of relevance to digital preservation, currently open for public comment. <strong>British Standard BS 10008 &#8211; Evidential weight and legal admissibility of electronic information</strong> attempts to define clear guidelines for the creation and custody of digital documents that allow them to be used as evidence in legal proceedings, although its applicability is much wider. <span id="more-109"></span>The content will be familiar to some in the field, as this is a revision of the guidance document previously known as PD0008, which has been around since 1996. (It was one of the documents we drew on when establishing <a href="http://ndad.ulcc.ac.uk/">NDAD</a>.)</p>
<p>Authenticity is a key requirement in any digital preservation application, and it&#8217;s not a particularly novel one either, as <a href="http://digitalcuration.blogspot.com/2008/04/question-of-authenticity.html">this post by Maureen Pennock</a> makes clear. One of the six mandatory responsibilities which <a href="http://en.wikipedia.org/wiki/Open_Archival_Information_System">OAIS</a> places on an archive deals with authenticity and traceability, and this was reflected virtually unchanged in the <a href="http://www.crl.edu/content.asp?l1=13&#038;l2=58&#038;l3=162&#038;l4=91">TRAC</a> methodology, in which one of the audit checkpoints was that the repository</p>
<blockquote><p>&#8230; enables the dissemination of authentic copies of the original or objects traceable to originals.</p></blockquote>
<p>That choice of words is important: most repositories aren&#8217;t dealing with use cases that involve legal evidence on a daily basis, and for most users it isn&#8217;t necessary to provide dissemination copies of material that can be taken straight into a courtroom. Sometimes, doing so places extra costs on the dissemination process that aren&#8217;t needed. Other repositories may find it easier to make <strong>all</strong> copies &#8216;authentic&#8217;, or &#8216;capable of being traceable to originals.&#8217; But we simply wanted to ensure that repositories were capable of doing this when it was required. If they could not, it called into question their own confidence in the integrity of their holdings.</p>
<p>BS 10008 deals with a subset of the problem, one suited to document-oriented material, and it isn&#8217;t at all specific about the mechanics of how this is achieved. The standard is going to be useful, and it&#8217;s likely to carry more weight as a standard (compliance with which is easier to enforce) than as a code of practice. But there are areas that are worthy of comment from the DP community. Take, for instance, this definition of <strong>migration</strong>:</p>
<blockquote><p>transfer of electronic information from one storage media to another</p></blockquote>
<p>That&#8217;s certainly not the usual use of the term &#8211; I would use either &#8216;refreshing&#8217; to refer to this action, or &#8216;media migration&#8217; if I wanted to make it clear that no change in format or encoding was involved but still wanted to talk of &#8216;migration.&#8217;</p>
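<p>To make the distinction concrete, here is a minimal sketch of what &#8216;refreshing&#8217; amounts to in practice, written in Python purely for illustration: the bytes are copied to new storage and a checksum confirms they arrived unchanged, with no change of format or encoding. The function names, the choice of SHA-256 and the file paths are all assumptions of this sketch, not anything BS 10008 prescribes.</p>

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target_dir: Path) -> Path:
    """Copy a file to new storage unchanged, then verify the copy.

    Only the storage location changes; format and encoding are untouched,
    so the copy must be bit-for-bit identical to the source.
    """
    expected = sha256_of(source)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copies the bytes plus timestamps/permissions
    if sha256_of(target) != expected:
        raise OSError(f"fixity check failed after copying {source.name}")
    return target


if __name__ == "__main__":
    # Hypothetical example: "old_media" and "new_media" stand in for two storage media.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "old_media" / "record.pdf"
        original.parent.mkdir()
        original.write_bytes(b"%PDF-1.4 example content")
        copy = refresh(original, Path(tmp) / "new_media")
        print(copy.name, sha256_of(copy) == sha256_of(original))  # record.pdf True
```

<p>Converting the file to a different format along the way would, by design, fail this check; that is migration in the fuller sense rather than refreshing.</p>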
<p>The document is open for public comment until June 30th, 2008. You can view the draft and make your comments using <a href="http://drafts.bsigroup.com/">BSI&#8217;s online review website</a>. (Incidentally, this is the first time I&#8217;ve used the online review process, and it works a treat. I last reviewed a standard about 10 years ago, when it was still a very paper-based process, although one could at least submit comments by email then. This is a great improvement. And while you&#8217;re there, you can take the opportunity to review such upcoming standards as BS 8507-1 for &#8220;Close Protection Services&#8221; &#8211; otherwise known as bodyguards, methinks.)</p>
]]></content:encoded>
			<wfw:commentRss>http://dablog.ulcc.ac.uk/2008/05/12/draft-standard-on-evidential-value-of-digital-information/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
