<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: File formats&#8230;or data streams?</title>
	<atom:link href="http://dablog.ulcc.ac.uk/2009/12/03/ffods/feed/" rel="self" type="application/rss+xml" />
	<link>http://dablog.ulcc.ac.uk/2009/12/03/ffods/</link>
	<description>ulcc digital archives blog</description>
	<lastBuildDate>Tue, 31 Jan 2012 13:40:46 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
	<item>
		<title>By: Malcolm Todd</title>
		<link>http://dablog.ulcc.ac.uk/2009/12/03/ffods/comment-page-1/#comment-2349</link>
		<dc:creator>Malcolm Todd</dc:creator>
		<pubDate>Fri, 08 Jan 2010 15:08:27 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=811#comment-2349</guid>
		<description>I am glad the report has stirred this thread.  Two things I feel I ought to say, rather belatedly:

Chris - the second half of the report is concerned with the adequacy of information representation by candidate formats.  If by saying we need to be very careful about that, you mean:

[1] This is so important we need to get it right; and
[2] We as a community have only really just begun to grapple with this head on ......

.....then I couldn&#039;t agree more.  What I tried to do was look at this from an archival science perspective and provide the hooks for other parts of the DP community to find their own equivalents of similar issues.

Ed - at the event itself you asked a very valid question about web archiving formats that I could have answered a bit more intelligently and helpfully.  I admitted that the report doesn&#039;t discuss these in their own right, though it does discuss the problems with XML encoding as a &quot;preservation strategy&quot;.  What I should have added is that the criteria in the first half of the report ought to be equally applicable to web archiving formats as to anything else, so no doubt the developers of WARC traded the inherent risks of a wrapper format advisedly for the ability to relate and render web pages into the future.....

I&#039;ve only got my tongue slightly in my cheek here - I honestly wouldn&#039;t know, but what I do know is that one of the most interesting parts of researching the report was looking back at previous announcements of preservation approaches and reflecting on how these might be articulated and explained today.</description>
		<content:encoded><![CDATA[<p>I am glad the report has stirred this thread.  Two things I feel I ought to say, rather belatedly:</p>
<p>Chris &#8211; the second half of the report is concerned with the adequacy of information representation by candidate formats.  If by saying we need to be very careful about that, you mean:</p>
<p>[1] This is so important we need to get it right; and<br />
[2] We as a community have only really just begun to grapple with this head on &#8230;&#8230;</p>
<p>&#8230;..then I couldn&#8217;t agree more.  What I tried to do was look at this from an archival science perspective and provide the hooks for other parts of the DP community to find their own equivalents of similar issues.</p>
<p>Ed &#8211; at the event itself you asked a very valid question about web archiving formats that I could have answered a bit more intelligently and helpfully.  I admitted that the report doesn&#8217;t discuss these in their own right, though it does discuss the problems with XML encoding as a &#8220;preservation strategy&#8221;.  What I should have added is that the criteria in the first half of the report ought to be equally applicable to web archiving formats as to anything else, so no doubt the developers of WARC traded the inherent risks of a wrapper format advisedly for the ability to relate and render web pages into the future&#8230;..</p>
<p>I&#8217;ve only got my tongue slightly in my cheek here &#8211; I honestly wouldn&#8217;t know, but what I do know is that one of the most interesting parts of researching the report was looking back at previous announcements of preservation approaches and reflecting on how these might be articulated and explained today.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: William Kilbride</title>
		<link>http://dablog.ulcc.ac.uk/2009/12/03/ffods/comment-page-1/#comment-2318</link>
		<dc:creator>William Kilbride</dc:creator>
		<pubDate>Mon, 04 Jan 2010 17:08:41 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=811#comment-2318</guid>
		<description>As also discussed during the day and as yet imperfectly completed, the DPC has a new website - but the redirects to files and pages are not yet as reliable as we had hoped.  So if you&#039;re looking for the report that Ed mentions then let me recommend: http://www.dpconline.org/technology-watch-reports/download-document/375-file-formats-for-preservation.html

As to whether a website is a collection of files or a data stream, my instinct is that we (DPC) are moving toward the latter and for good reasons. Just as well Ed made a copy of the old site for us in the UK Web Archive!</description>
		<content:encoded><![CDATA[<p>As also discussed during the day and as yet imperfectly completed, the DPC has a new website &#8211; but the redirects to files and pages are not yet as reliable as we had hoped.  So if you&#8217;re looking for the report that Ed mentions then let me recommend: <a href="http://www.dpconline.org/technology-watch-reports/download-document/375-file-formats-for-preservation.html" rel="nofollow">http://www.dpconline.org/technology-watch-reports/download-document/375-file-formats-for-preservation.html</a></p>
<p>As to whether a website is a collection of files or a data stream, my instinct is that we (DPC) are moving toward the latter and for good reasons. Just as well Ed made a copy of the old site for us in the UK Web Archive!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Ashley</title>
		<link>http://dablog.ulcc.ac.uk/2009/12/03/ffods/comment-page-1/#comment-2119</link>
		<dc:creator>Kevin Ashley</dc:creator>
		<pubDate>Tue, 08 Dec 2009 17:40:06 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=811#comment-2119</guid>
		<description>Chris, you are right to draw attention to these issues. But to be fair both to the original report, and Ed&#039;s summary of the day, I think the tone of the event was more nuanced, and the difficult tradeoffs to be considered - such as that between storage costs and vulnerability of content to bitstream corruption - were part of the discussion.

As for Word-&gt;PDF - I read Ed&#039;s description not as approving it or recommending it, but merely as a relatively simple example where the issues can be considered. It&#039;s one we include in the &lt;a href=&quot;www.dptp.org&quot; rel=&quot;nofollow&quot;&gt;DPTP&lt;/a&gt; for that very reason. The property of editability, for instance, is critical if one is storing resources for reuse and refactorisation. But if the purpose of preservation is to preserve an inalienable record, then editability isn&#039;t a significant property.

Malcolm Todd&#039;s report draws attention to these and other issues. I think the release of an earlier draft may be one of the reasons that the current redrafting of OAIS is going to explicitly address these concepts, albeit using new terminology.

MS Word isn&#039;t going to go away in the near future. Obsolescence is slow. But it does happen. 9-track magtapes continued to be a good interchange format for many years after the technology had been superseded by others precisely because 9-track was so widely adopted. But they are definitely obsolete now - the drives aren&#039;t being manufactured any more. There&#039;s still lots of content living on 9-track tape and it&#039;s more endangered with each year that passes.

&lt; advertising interlude &gt;
ULCC happens to have a 9-track drive, by the way. How much longer that will be true, I can&#039;t say. But while we have it, we&#039;re happy to help others who need to recover information from such tapes.
&lt; /advertising interlude &gt;</description>
		<content:encoded><![CDATA[<p>Chris, you are right to draw attention to these issues. But to be fair both to the original report, and Ed&#8217;s summary of the day, I think the tone of the event was more nuanced, and the difficult tradeoffs to be considered &#8211; such as that between storage costs and vulnerability of content to bitstream corruption &#8211; were part of the discussion.</p>
<p>As for Word->PDF &#8211; I read Ed&#8217;s description not as approving it or recommending it, but merely as a relatively simple example where the issues can be considered. It&#8217;s one we include in the <a href="www.dptp.org" rel="nofollow">DPTP</a> for that very reason. The property of editability, for instance, is critical if one is storing resources for reuse and refactorisation. But if the purpose of preservation is to preserve an inalienable record, then editability isn&#8217;t a significant property.</p>
<p>Malcolm Todd&#8217;s report draws attention to these and other issues. I think the release of an earlier draft may be one of the reasons that the current redrafting of OAIS is going to explicitly address these concepts, albeit using new terminology.</p>
<p>MS Word isn&#8217;t going to go away in the near future. Obsolescence is slow. But it does happen. 9-track magtapes continued to be a good interchange format for many years after the technology had been superseded by others precisely because 9-track was so widely adopted. But they are definitely obsolete now &#8211; the drives aren&#8217;t being manufactured any more. There&#8217;s still lots of content living on 9-track tape and it&#8217;s more endangered with each year that passes.</p>
<p>&lt; advertising interlude &gt;<br />
ULCC happens to have a 9-track drive, by the way. How much longer that will be true, I can&#8217;t say. But while we have it, we&#8217;re happy to help others who need to recover information from such tapes.<br />
&lt; /advertising interlude &gt;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris Rusbridge</title>
		<link>http://dablog.ulcc.ac.uk/2009/12/03/ffods/comment-page-1/#comment-2113</link>
		<dc:creator>Chris Rusbridge</dc:creator>
		<pubDate>Mon, 07 Dec 2009 22:00:42 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=811#comment-2113</guid>
		<description>I think we have to think very carefully about this stuff. This is dangerous to say, as I wasn&#039;t at the seminar, and maybe this was gone into. For example, you say &quot;...migrate data from a format that’s about to become obsolete or unsupported, onto another format that’s stable, supported, and open. MS Word document to PDF or PDF/A…now that, I can understand!&quot; So first, is Word about to become obsolete? Really? Is any format since Word 6 about to fall off the map of readable document formats? Somehow, given the large number of alternative platforms (including OpenOffice), this does not seem likely. Obsolescence is a slooooooow phenomenon!

Second, Word to PDF is an extremely lossy migration, unless your world view (or significant property view) is STRICTLY 2-d page oriented. You lose a lot of what might well be in the Word document. So it&#039;s not a migration that should be done unless you have to (or at least, not without retaining the original, which begs the question of why migrate at all).

You also quote the TIFF vs JPEG2000 matter. We&#039;ve just published in IJDC an article by Wright et al from iPres 2008 (Wright, R., Miller, A., &amp; Addis, M. (2009). The Significance of Storage in the &quot;Cost of Risk&quot; of Digital Preservation. The International Journal of Digital Curation, 4(3), 104-122. Retrieved from http://www.ijdc.net/index.php/ijdc/article/viewFile/138/160); here&#039;s a quote from a late section, after a long discussion of some of the many ways that storage can go bad on you &quot;Tests by Heydegger showed that corrupting only 0.01% of the bytes in a compressed JPEG2000 file, including lossless compression, could result in at least 50% of the original information encoded in the file being affected. In some cases, corrupting just a single byte in a JPEG2000 image would cause highly visible artefacts throughout the whole of that image.&quot;

Anyway, what I&#039;m trying to get to is the need for some really careful thought and subtlety on the matter of migration!</description>
		<content:encoded><![CDATA[<p>I think we have to think very carefully about this stuff. This is dangerous to say, as I wasn&#8217;t at the seminar, and maybe this was gone into. For example, you say &#8220;&#8230;migrate data from a format that’s about to become obsolete or unsupported, onto another format that’s stable, supported, and open. MS Word document to PDF or PDF/A…now that, I can understand!&#8221; So first, is Word about to become obsolete? Really? Is any format since Word 6 about to fall off the map of readable document formats? Somehow, given the large number of alternative platforms (including OpenOffice), this does not seem likely. Obsolescence is a slooooooow phenomenon!</p>
<p>Second, Word to PDF is an extremely lossy migration, unless your world view (or significant property view) is STRICTLY 2-d page oriented. You lose a lot of what might well be in the Word document. So it&#8217;s not a migration that should be done unless you have to (or at least, not without retaining the original, which begs the question of why migrate at all).</p>
<p>You also quote the TIFF vs JPEG2000 matter. We&#8217;ve just published in IJDC an article by Wright et al from iPres 2008 (Wright, R., Miller, A., &#038; Addis, M. (2009). The Significance of Storage in the &#8220;Cost of Risk&#8221; of Digital Preservation. The International Journal of Digital Curation, 4(3), 104-122. Retrieved from <a href="http://www.ijdc.net/index.php/ijdc/article/viewFile/138/160" rel="nofollow">http://www.ijdc.net/index.php/ijdc/article/viewFile/138/160</a>); here&#8217;s a quote from a late section, after a long discussion of some of the many ways that storage can go bad on you &#8220;Tests by Heydegger showed that corrupting only 0.01% of the bytes in a compressed JPEG2000 file, including lossless compression, could result in at least 50% of the original information encoded in the file being affected. In some cases, corrupting just a single byte in a JPEG2000 image would cause highly visible artefacts throughout the whole of that image.&#8221;</p>
<p>Anyway, what I&#8217;m trying to get to is the need for some really careful thought and subtlety on the matter of migration!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

