If you can keep your blog when all around…

I was a keen participant in the activities of ERPANET, but I must confess I haven’t kept abreast of its successor, Digital Preservation Europe (DPE). However, I was interested to see the recent DPE briefing paper on blog preservation, since it covers an area that we also tackled in the course of the JISC-PoWR project: on the blog, in the workshops and in the handbook. The briefing paper highlights key issues for those who would preserve blogs. It is necessarily a general overview, and manages to cram a lot of preservation issues into its two sides of A4. But for the blogger approaching preservation, or the preservationist approaching blogs, I wonder whether such an avalanche of considerations isn’t sometimes unnecessarily overwhelming. It seemed worth looking at a few of the points made in the DPE briefing paper, and considering whether we can demystify them or make the task seem less daunting.

“Calls in the literature have advocated that blogs, as potentially valuable additions to the human record, are worthy of stewardship and long-term preservation.” It’s hard to argue with this, in the light of the recent, widely publicised use of blogging by President Obama, the Number Ten Downing Street blog, the blogging and tweeting of certain celebrities, and so on. But of course anyone can have a blog (and it sometimes seems that almost everyone does). Blogs are phenomenally useful, and have a wide variety of applications: individual or corporate announcements and publications; time-stamped records of activities; public or private figures writing as themselves or pseudonymously; public and private journals and discussion spaces for students and a wide range of communities. A blog may well be just for Christmas (or for the life of any kind of project), no less than a traditional diary or journal. It is in essence, after all, just a sequential log (“blog” is merely a geeky contraction of “web log”). Blogging is therefore not so fundamentally alien: Captain Cook kept a log (and so will Captain Kirk). Rather than making it sound very technical and complicated, let’s start by taking some comfort in what’s familiar.

Should all blogs be preserved? Which blogs should be preserved? The DPE briefing paper highlights (as all good preservation advice must) the issue of selection. How do we decide which blogs to preserve? To start with, the remit of any archival body (national, municipal, corporate, specialist, etc.) already does much of the selecting. It’s unlikely that any archivist at NARA, TNA or any other archive would ever have to begin their deliberations with “all blogs”. As for authenticity and attribution, the nature of the archive undertaking the preservation will equally dictate whether knowing the author’s true identity is essential. Common sense should prevail. Clearly, for records of official activities, such as the Downing Street and Obama blogs, reliable provenance and authenticity are critical. On the other hand, satirical blogs by “Gordon Brown” or “Vladimir Putin” are clearly not records of official activities, but they may still merit preservation as publications and/or as interesting artefacts: as with many fictional or satirical diaries or letters (Beachcomber, Henry Root, Primary Colors), the true identity of the author may or may not be material.

Is preservation of content, regardless of context, sufficient? This is one of the most significant questions raised by the briefing paper. Has the Web changed our expectations of context? Once, a descriptive catalogue or historical account seemed sufficient to establish context. Digital media now tempt us into thinking that, for any object, we should automatically gather (a copy of) everything it links to, and perhaps everything that links to it, and so on. We can leave a computer to follow the endlessly forking paths, and create a virtual machine environment that will always render the gathered materials faithfully, all without any manual intervention. Will that provide context, or make the preserved content understandable? Can we do without expert researchers and information professionals to scope, select and provide commentary where necessary?
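To make the point about scope concrete, here is a toy sketch of what an automated harvester does: start from one page, follow its links breadth-first, and stop at some boundary. It is written in Python purely for illustration and is not how HTTrack, Wget or any web archive actually works; the `harvest` function, its same-host rule and its `max_depth` limit are hypothetical choices, and the `fetch` callable stands in for real HTTP retrieval.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def harvest(
    start_url: str,
    fetch: Callable[[str], Optional[str]],
    max_depth: int = 2,
) -> dict[str, str]:
    """Hypothetical breadth-first harvest limited to one host and max_depth hops.

    fetch(url) returns the page's HTML, or None if it cannot be retrieved.
    """
    host = urlparse(start_url).netloc
    pages: dict[str, str] = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: nothing to keep, move on
        pages[url] = html
        if depth >= max_depth:
            continue  # scope boundary: keep this page but follow none of its links
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop #fragments so one page is one URL.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Passing a dictionary's `get` method as `fetch` lets you try it against a handful of fake pages without touching the network. Note that every boundary here (stay on one host, stop after two hops) is a decision someone has to make: the machine follows paths mechanically, but it cannot tell which of them supply context.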

“The inability to capture and preserve the design and features of the blog contradicts defining attributes of a blog.” In fact, it’s not impossible to capture and preserve many, if not all, of the design features of a blog. The JISC-PoWR blog has been harvested by UKWAC with most of its features intact (experiments at ULCC have suggested that, with a little care, WordPress blogs are particularly well suited to harvesting). But in general, let’s accept (as the BlogBackupOnline people do) the overriding significance of the textual content over the bells and whistles of the web presentation. If there is one feature of blogs that does set them apart from conventional web content, it is that they are generally intended for delivery and consumption in two distinct formats: as an HTML web page and as an RSS/Atom XML newsfeed. The newsfeed version is bare-bones hypertext, stripped of all the styling, bells and whistles of the web version, and will be read in many contexts: in online newsreaders and aggregators such as Netvibes or Bloglines, in desktop clients such as Mozilla Thunderbird, on mobile phones, and through automatic syndication to other sites.
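As a small illustration of how bare the newsfeed version is, the sketch below pulls the textual core (title, link, date and body) out of each item in an RSS 2.0 feed using Python's standard `xml.etree.ElementTree`. The sample feed and the `extract_items` helper are made up for this example; Atom feeds use a different, namespaced structure and would need their own handling.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed standing in for a real blog's newsfeed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>http://blog.example.org/</link>
    <item>
      <title>First post</title>
      <link>http://blog.example.org/2009/01/first-post/</link>
      <pubDate>Mon, 05 Jan 2009 10:00:00 +0000</pubDate>
      <description>Hello, world.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.org/2009/01/second-post/</link>
      <pubDate>Tue, 06 Jan 2009 09:30:00 +0000</pubDate>
      <description>More text, no styling.</description>
    </item>
  </channel>
</rss>"""


def extract_items(feed_xml: str) -> list[dict[str, str]]:
    """Return the textual core of each RSS 2.0 <item>: title, link, date, body."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    # Print one line per post: date and title, with no presentation at all.
    for entry in extract_items(SAMPLE_FEED):
        print(entry["pubDate"], "|", entry["title"])
```

Everything a reader of the feed receives is in those few fields, which is one reason the feed can be a useful, lightweight complement to a harvested copy of the web pages.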

“Blogs, including posts and comments, may be intentionally edited or deleted.” Looked at in isolation, editability is a significant property of blogs in everyday use. But, as Chris Rusbridge observed, “editability, while vital for some kinds of re-use, is not essential for conveying the information essence of the objects”. Any preserved object exists for us only in the state it was in when taken into archival custody (though we may also capture and preserve records of how it changed). Preservation schedules for blogs will need to decide how much this really matters: when is it important, and when not, to preserve an edit that no one ever saw? (A transactional approach, capturing only the pages users actually request, is one way to address this without excessive redundancy.) If anything, this volatility of blog content is a more serious issue for those citing an active blog, who may find that their quotations no longer match: a copy in an archive ought to be fixed, and reliably citable. This suggests that creating stable archives of valuable blogs for the record is essential.
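One way to keep archived copies fixed and citable while still recording how a post changed is to store a new snapshot only when the captured content differs from the last one, identifying each version by a content hash. The `SnapshotArchive` class below is a hypothetical Python sketch of that idea, not a description of any existing archive's practice.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


class SnapshotArchive:
    """Hypothetical store of fixed, citable copies of blog pages.

    A snapshot is kept only when the captured content differs from the most
    recent one for that URL, so edits are recorded without piling up
    identical copies.
    """

    def __init__(self) -> None:
        # url -> list of (ISO timestamp, SHA-256 hex digest, content), oldest first
        self._snapshots: dict[str, list[tuple[str, str, str]]] = {}

    def capture(self, url: str, content: str, when: Optional[datetime] = None) -> str:
        """Record content for url; return the digest identifying this version."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        history = self._snapshots.setdefault(url, [])
        if history and history[-1][1] == digest:
            return digest  # unchanged since the last capture: nothing new to store
        stamp = (when or datetime.now(timezone.utc)).isoformat()
        history.append((stamp, digest, content))
        return digest

    def versions(self, url: str) -> list[tuple[str, str]]:
        """List (timestamp, digest) for every distinct version captured."""
        return [(stamp, digest) for stamp, digest, _ in self._snapshots.get(url, [])]

    def cite(self, url: str, digest: str) -> str:
        """Return the exact archived text of one version; KeyError if absent."""
        for _, d, content in self._snapshots.get(url, []):
            if d == digest:
                return content
        raise KeyError(f"no version {digest} of {url}")
```

A citation can then point at a URL plus a digest and always retrieve the same text. A design consequence worth noticing: an edit made and then reverted between two captures leaves no trace at all, which is exactly the question of the edit that no one ever saw.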

James Currall rightly warned us against making an undue fetish of digital media at the expense of the information (and entertainment) they convey; arguably as dangerous is the fetishisation of specific manifestations of digital media. How different, really, are blogs from other publications or applications built on the Web platform? Blogs are essentially web pages, and they don’t exist in a void. The “problem” of hypertext links to possibly unstable objects in remote locations is the bête noire of all efforts to tame the Web. “Not in archive” is, and always will be, a familiar message to regular users of archive.org. But perhaps we can take heart from the fact that many archives of correspondence tend to be one-sided too.

Thinking about these things in the abstract is sometimes no substitute for actually doing them. Perhaps the best advice for anyone tasked with preserving blogs is, first of all, to try keeping one yourself for a few days or weeks. Then it only takes a few minutes to try out BlogBackupOnline; or to check how your favourite blog is faring in the Internet Archive, the European Web Archive or the UK Web Archive; or even to download HTTrack or Wget and try harvesting something yourself.

Then it’s time to get Twittering.


