<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Working with Web Curator Tool (part 2): wikis</title>
	<atom:link href="http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/feed/" rel="self" type="application/rss+xml" />
	<link>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/</link>
	<description>ulcc digital archives blog</description>
	<lastBuildDate>Tue, 31 Jan 2012 13:40:46 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
	<item>
		<title>By: Richard M. Davis</title>
		<link>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/comment-page-1/#comment-4229</link>
		<dc:creator>Richard M. Davis</dc:creator>
		<pubDate>Fri, 03 Jun 2011 17:30:00 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=391#comment-4229</guid>
		<description>Thanks for the update about your tools, emijrp; we will look into this for future projects.</description>
		<content:encoded><![CDATA[<p>Thanks for the update about your tools, emijrp; we will look into this for future projects.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: emijrp</title>
		<link>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/comment-page-1/#comment-4209</link>
		<dc:creator>emijrp</dc:creator>
		<pubDate>Mon, 30 May 2011 19:56:30 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=391#comment-4209</guid>
		<description>Hi,

There is an excellent tool for extracting all the information (text, histories and images) from a MediaWiki wiki, called the WikiTeam tool.[1]

It has been tested with several MediaWiki versions. You can see a list of preserved wikis in the downloads section.

The output is a large XML file containing all the text and metadata (easy to import using the MediaWiki import tools) and a directory with the wiki images.

Regards,
emijrp

[1] http://code.google.com/p/wikiteam/</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>There is an excellent tool for extracting all the information (text, histories and images) from a MediaWiki wiki, called the WikiTeam tool.[1]</p>
<p>It has been tested with several MediaWiki versions. You can see a list of preserved wikis in the downloads section.</p>
<p>The output is a large XML file containing all the text and metadata (easy to import using the MediaWiki import tools) and a directory with the wiki images.</p>
<p>Regards,<br />
emijrp</p>
<p>[1] <a href="http://code.google.com/p/wikiteam/" rel="nofollow">http://code.google.com/p/wikiteam/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: JISC-PoWR &#187; Blog Archive &#187; Archiving a wiki</title>
		<link>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/comment-page-1/#comment-1222</link>
		<dc:creator>JISC-PoWR &#187; Blog Archive &#187; Archiving a wiki</dc:creator>
		<pubDate>Wed, 25 Mar 2009 14:05:24 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=391#comment-1222</guid>
		<description>[...] dablog recently I have put up a post with a few observations about archiving a MediaWiki site. The example is the UKOLN Repositories [...]</description>
		<content:encoded><![CDATA[<p>[...] dablog recently I have put up a post with a few observations about archiving a MediaWiki site. The example is the UKOLN Repositories [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gordon Paynter</title>
		<link>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/comment-page-1/#comment-1193</link>
		<dc:creator>Gordon Paynter</dc:creator>
		<pubDate>Sat, 14 Mar 2009 09:08:38 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=391#comment-1193</guid>
		<description>Hi Ed, Maureen:

We&#039;ve encountered pretty much the same wiki issue as Ed describes at the National Library of New Zealand. Our curators used exclusion filters, much as Ed does, to limit the harvester (I will also be sending them this article in case they need to harvest MediaWiki).

The biggest problem is not the edit history but the differences between versions of a page. A page with 10 versions will have dozens of &quot;diff&quot; pages (Ed&#039;s .*&amp;diff.* pattern presumably takes care of this). So even if we want to capture the edit history, we don&#039;t want the diffs.

We have similar but more frequent issues with blogs. On some platforms each post appears on the homepage and on its own page, but also separately on the archive pages for the relevant year, month and day, and also on the page for any tags. So you can wind up harvesting the same content many times over. Again, crafting exclusion filters is the solution.

Gordon</description>
		<content:encoded><![CDATA[<p>Hi Ed, Maureen:</p>
<p>We&#8217;ve encountered pretty much the same wiki issue as Ed describes at the National Library of New Zealand. Our curators used exclusion filters, much as Ed does, to limit the harvester (I will also be sending them this article in case they need to harvest MediaWiki).</p>
<p>The biggest problem is not the edit history but the differences between versions of a page. A page with 10 versions will have dozens of &#8220;diff&#8221; pages (Ed&#8217;s .*&amp;diff.* pattern presumably takes care of this). So even if we want to capture the edit history, we don&#8217;t want the diffs.</p>
<p>We have similar but more frequent issues with blogs. On some platforms each post appears on the homepage and on its own page, but also separately on the archive pages for the relevant year, month and day, and also on the page for any tags. So you can wind up harvesting the same content many times over. Again, crafting exclusion filters is the solution.</p>
<p>Gordon</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard M. Davis</title>
		<link>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/comment-page-1/#comment-1190</link>
		<dc:creator>Richard M. Davis</dc:creator>
		<pubDate>Wed, 11 Mar 2009 17:11:14 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=391#comment-1190</guid>
		<description>Maureen - sort of... though I was thinking more of a view of the document, so that the timeline slider plays through the amendments as if they are being entered live - you know the sort of thing? Bit like &lt;a href=&quot;http://tinyurl.com/abrmjp&quot; rel=&quot;nofollow&quot;&gt;this&lt;/a&gt; (but not too much ;)</description>
		<content:encoded><![CDATA[<p>Maureen &#8211; sort of&#8230; though I was thinking more of a view of the document, so that the timeline slider plays through the amendments as if they are being entered live &#8211; you know the sort of thing? Bit like <a href="http://tinyurl.com/abrmjp" rel="nofollow">this</a> (but not too much <img src='http://dablog.ulcc.ac.uk/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Maureen Pennock</title>
		<link>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/comment-page-1/#comment-1189</link>
		<dc:creator>Maureen Pennock</dc:creator>
		<pubDate>Wed, 11 Mar 2009 16:00:08 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=391#comment-1189</guid>
		<description>Ed - I agree, it comes back to the user community. I guess I&#039;m just not clear on who&#039;s &#039;in&#039; the UKWAC user community, let alone how it may change in the future. Oh, for a crystal ball... ! 

Richard - hmm, a graphic, progressive change animator tool. Like a Dipity interface for the web archive? Neat idea :)</description>
		<content:encoded><![CDATA[<p>Ed &#8211; I agree, it comes back to the user community. I guess I&#8217;m just not clear on who&#8217;s &#8216;in&#8217; the UKWAC user community, let alone how it may change in the future. Oh, for a crystal ball&#8230; ! </p>
<p>Richard &#8211; hmm, a graphic, progressive change animator tool. Like a Dipity interface for the web archive? Neat idea <img src='http://dablog.ulcc.ac.uk/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard M. Davis</title>
		<link>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/comment-page-1/#comment-1186</link>
		<dc:creator>Richard M. Davis</dc:creator>
		<pubDate>Wed, 11 Mar 2009 11:12:30 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=391#comment-1186</guid>
		<description>I don&#039;t know if many people ever actually try to follow the authorship trail on (say) a Mediawiki page from start to finish. For example, the notorious &lt;a href=&quot;http://en.wikipedia.org/w/index.php?title=Ronnie_Hazlehurst&amp;offset=20071223142507&amp;action=history&quot; rel=&quot;nofollow&quot;&gt;Ronnie Hazlehurst/S Club 7&lt;/a&gt; controversy on Wikipedia: there&#039;s solid audit info there, no doubt, but there&#039;s probably work for a whole new breed of auditor in actually working it out :)

Maybe a graphic, progressive change animator would be a useful tool in a wiki archive?</description>
		<content:encoded><![CDATA[<p>I don&#8217;t know if many people ever actually try to follow the authorship trail on (say) a Mediawiki page from start to finish. For example, the notorious <a href="http://en.wikipedia.org/w/index.php?title=Ronnie_Hazlehurst&#038;offset=20071223142507&#038;action=history" rel="nofollow">Ronnie Hazlehurst/S Club 7</a> controversy on Wikipedia: there&#8217;s solid audit info there, no doubt, but there&#8217;s probably work for a whole new breed of auditor in actually working it out <img src='http://dablog.ulcc.ac.uk/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Maybe a graphic, progressive change animator would be a useful tool in a wiki archive?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ed Pinsent</title>
		<link>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/comment-page-1/#comment-1184</link>
		<dc:creator>Ed Pinsent</dc:creator>
		<pubDate>Wed, 11 Mar 2009 09:43:55 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=391#comment-1184</guid>
		<description>Hi Maureen
Many thanks for adding a comment. You make a good point. I agree there is an argument for capturing history pages from a wiki, and it&#039;s likely &quot;there will be people who want to know what was done by whom and when.&quot; But I wonder whether the UKWAC user community are those people? Perhaps a full record of this wiki&#039;s change history is more likely to be of value to UKOLN (primarily), and to the wider community of information specialists who are interested in digital preservation. So maybe this raises another possibility for web archiving: different levels of capture, and capture of different types of content, depending on the requirements of your user communities.</description>
		<content:encoded><![CDATA[<p>Hi Maureen<br />
Many thanks for adding a comment. You make a good point. I agree there is an argument for capturing history pages from a wiki, and it&#8217;s likely &#8220;there will be people who want to know what was done by whom and when.&#8221; But I wonder whether the UKWAC user community are those people? Perhaps a full record of this wiki&#8217;s change history is more likely to be of value to UKOLN (primarily), and to the wider community of information specialists who are interested in digital preservation. So maybe this raises another possibility for web archiving: different levels of capture, and capture of different types of content, depending on the requirements of your user communities.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Maureen Pennock</title>
		<link>http://dablog.ulcc.ac.uk/2009/03/10/working-with-web-curator-tool-part-2-wikis/comment-page-1/#comment-1183</link>
		<dc:creator>Maureen Pennock</dc:creator>
		<pubDate>Wed, 11 Mar 2009 08:52:07 +0000</pubDate>
		<guid isPermaLink="false">http://dablog.ulcc.ac.uk/?p=391#comment-1183</guid>
		<description>Hi Ed,
this is really interesting, thanks for posting it. I&#039;m particularly interested in your selection decision to capture the main pages and not the history/edit/discussion pages. I see where you&#039;re coming from when you say that &#039;scholars who browse the archived copy of DigiRep are not expecting to be able to edit pages, nor join in the discussions, nor browse the history of stored versions of pages.&#039; I agree insofar as the edit pages are concerned, but there&#039;s an argument for capturing the history pages as an audit trail of who contributed what and when. This is interesting and valuable stuff - a particular (or should that be significant?) characteristic of wikis is that they are collaborative tools. The history pages are evidence of this, and I imagine there will be people who want to know what was done by whom and when. You could make the same argument for the discussion pages, but in my experience the discussion pages are not widely used. I haven&#039;t looked at the ones in the DigiRep wiki, so I can&#039;t say whether that&#039;s the case there. But as I said, all interesting stuff, thank you!</description>
		<content:encoded><![CDATA[<p>Hi Ed,<br />
this is really interesting, thanks for posting it. I&#8217;m particularly interested in your selection decision to capture the main pages and not the history/edit/discussion pages. I see where you&#8217;re coming from when you say that &#8216;scholars who browse the archived copy of DigiRep are not expecting to be able to edit pages, nor join in the discussions, nor browse the history of stored versions of pages.&#8217; I agree insofar as the edit pages are concerned, but there&#8217;s an argument for capturing the history pages as an audit trail of who contributed what and when. This is interesting and valuable stuff &#8211; a particular (or should that be significant?) characteristic of wikis is that they are collaborative tools. The history pages are evidence of this, and I imagine there will be people who want to know what was done by whom and when. You could make the same argument for the discussion pages, but in my experience the discussion pages are not widely used. I haven&#8217;t looked at the ones in the DigiRep wiki, so I can&#8217;t say whether that&#8217;s the case there. But as I said, all interesting stuff, thank you!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

