· 11th April 2011 · Comments Off · Categories: Web Archiving

From the BlogForever blog.

At an atomic level, a “blog” comprises “blog posts”, which are continually added to the blog corpus: that is the dynamic essence of a blog, and distinguishes it from old-fashioned, largely static Websites and hypertexts in which little content changed between major update iterations, which process was probably more akin to “publishing a new edition” in the world of non-digital publications.

The blog also displays, as part of its frame, other graphical and functional elements (sidebars, widgets, “blogrolls”, etc) which may themselves contain dynamically updated, constantly changing information. These can be added, removed, amended and rearranged at will by the blog author/editor. Blog posts that were “published” in the context of one set of framing elements, will persist through subsequent versions of that framework.

Similarly with design (layout, colours, mastheads, etc), though the persistence tends to be longer, the informal nature of blogs means that these may be easily changed by the blog editor/author, and are thus more volatile than a typical “corporate” website. Again, blog posts may persist, unchanged in themselves, through many iterations of the blog site design and layout.

blogatoms 300x223 Asynchronicities in blog structure

A simple view of blog elements and their temporal relationship

 

This very simplified visualisations suggests where we might start conceptualising key elements of a blog. It indicates that they iterate over time, but in the cases of Design, Posts and Widgets (as we’ll call them for brevity), according to independent schedules. While Posts and Comments persist in the online view of a blog, designs and widget arrangements are overwritten.

With my earlier ArchivePress project we deliberately overlooked preservation of the blog’s framing elements, and (given the much smaller scope of that project) established an acceptable rationale for doing so. The challenge for BlogForever is to find a solution to  precisely these issues. Unless we were simply to adopt the snapshot approach of Heritrix-based web archiving initiatives (e.g. Wayback/archive.org, UK Web Archive), we need to ensure the BlogForever repository supports a degree of granularity that can capture, describe and preserve atomic blog objects in a way that reflects the particular interdependencies, in order to understand and preserve them authentically, and permit the many possible authentic and valid “time slice” views and analyses that users of the archive will need.

(I appreciate, by the way that these objects themselves are compound objects, so not strictly “atomic”: but the same is also true of atoms, as our CERN colleagues can attest!)