Web Archiving

The BlogForever survey is live!

After weeks of design work, the BlogForever survey is now live. It is available in six languages and will run for 28 days.

This survey is part of BlogForever, an EU-funded collaborative project to which ULCC contributes through its Digital Archives department.

The results of the survey, which will be available at the end of the summer, will help to develop digital preservation, management and dissemination facilities for weblogs. We are therefore keen to gather information about the content, context and usage patterns of current weblogs, so that we can identify blog users’ views on their long-term preservation, management, analysis, access and use. If you would like to take part in the survey, please use the following link:

DA visits CERN

CERN Computer Centre

CERN Computer Racks

Last week I was in a very welcoming Geneva, at the European Organization for Nuclear Research (CERN), to meet other partners working on BlogForever and to attend several Invenio workshops. I felt very lucky to be at the hub of such an organization and to see how many young international students are getting the opportunity to be at the forefront of high-energy physics research.

The Globe at CERN

CERN is home to the world’s biggest and most powerful particle accelerator, the Large Hadron Collider (LHC), which is installed in a tunnel 27 km in circumference. The LHC records around 15 petabytes of data per year, all of which is stored in CERN’s vast computer centre. Open access and sharing have been the organization’s driving principles since its foundation in 1954, and they made CERN an inspiring environment for the Web to be born in.

The Invenio workshops showed us that this electronic document management system is robust and versatile: it is designed to manage more than 1.2 million documents and can be used in 19 different languages. Its content is clean and complete. In the High Energy Physics domain alone, there are around 700 collections and approximately 20,000 queries a day. Invenio is also used for special programmes, such as the UNESCO-funded digital repositories in Africa, and for EU-funded projects such as D4Science and OpenAIRE.

Silvia at CERN

Me at CERN

As for BlogForever and Invenio, there is plenty of work to be done by the Invenio Team at the User and Document Service Group. At the moment, they have more than 30 readily available Python modules that could be adapted to preserve large numbers of blogs. From the point of view of my work with repositories as part of the Digital Archives and Repositories Team at ULCC, I was inspired by Invenio’s advanced search engine and its indexing and ranking methods.

On a more personal note, if you are ever crossing the border between France and Switzerland near Geneva, take Tram 18 and hop off at CERN to see the Microcosm and Globe exhibitions.

Nominate blogs for our survey

From the BlogForever blog.

Is there a particular blog or blogger you would like to see included in the BlogForever survey? We invite you to use this form to nominate them, and we will try to ensure that the blog is reviewed or the blogger is invited to participate in our survey.

Asynchronicities in blog structure

From the BlogForever blog.

At an atomic level, a “blog” comprises “blog posts”, which are continually added to the blog corpus. That is the dynamic essence of a blog, and it distinguishes blogs from old-fashioned, largely static Websites and hypertexts, in which little content changed between major update iterations; such an update was probably more akin to “publishing a new edition” in the world of non-digital publications.

The blog also displays, as part of its frame, other graphical and functional elements (sidebars, widgets, “blogrolls”, etc.), which may themselves contain dynamically updated, constantly changing information. These can be added, removed, amended and rearranged at will by the blog author/editor. Blog posts that were “published” in the context of one set of framing elements will persist through subsequent versions of that framework.

The same applies to design (layout, colours, mastheads, etc.): although design elements tend to persist for longer, the informal nature of blogs means that they can easily be changed by the blog author/editor, and are thus more volatile than those of a typical “corporate” website. Again, blog posts may persist, unchanged in themselves, through many iterations of the blog’s design and layout.

A simple view of blog elements and their temporal relationship

This very simplified visualisation suggests where we might start conceptualising the key elements of a blog. It indicates that they all iterate over time but, in the cases of Design, Posts and Widgets (as we’ll call them for brevity), according to independent schedules. While Posts and Comments persist in the online view of a blog, designs and widget arrangements are overwritten.

With my earlier ArchivePress project we deliberately overlooked the preservation of a blog’s framing elements and (given the much smaller scope of that project) established an acceptable rationale for doing so. The challenge for BlogForever is to find a solution to precisely these issues. Unless we simply adopt the snapshot approach of Heritrix-based web archiving initiatives (e.g. Wayback/archive.org, UK Web Archive), we need to ensure that the BlogForever repository supports a degree of granularity that can capture, describe and preserve atomic blog objects in a way that reflects their particular interdependencies. Only then can we understand and preserve them authentically, and permit the many possible authentic and valid “time slice” views and analyses that users of the archive will need.

(I appreciate, by the way, that these objects are themselves compound objects, and so not strictly “atomic”; but the same is also true of atoms, as our CERN colleagues can attest!)
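To make the idea of independent schedules and “time slice” views a little more concrete, here is a minimal Python sketch. It is a hypothetical model for illustration only, not BlogForever’s actual repository design: the names `Version`, `version_at`, `BlogArchive` and `time_slice` are assumptions. It assumes each element type keeps its own time-sorted history of captured versions. Design and Widgets resolve to the latest version captured at or before the requested moment (because they are overwritten), while Posts accumulate (because they persist).

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Version:
    """Hypothetical: one captured state of a blog element, valid from captured_at."""
    captured_at: datetime
    content: str


def version_at(history: list[Version], when: datetime) -> Version | None:
    """Return the latest version captured at or before `when`.

    Assumes `history` is sorted by captured_at; returns None if nothing
    had been captured yet at that moment.
    """
    times = [v.captured_at for v in history]
    i = bisect_right(times, when)
    return history[i - 1] if i else None


@dataclass
class BlogArchive:
    """Hypothetical model: each element type has its own, independent history."""
    designs: list[Version]   # overwritten over time
    widgets: list[Version]   # overwritten over time, on a different schedule
    posts: list[Version]     # accumulate and persist

    def time_slice(self, when: datetime) -> dict:
        """Reconstruct how the blog would have looked at `when`."""
        design = version_at(self.designs, when)
        widgets = version_at(self.widgets, when)
        return {
            "design": design.content if design else None,
            "widgets": widgets.content if widgets else None,
            # Posts persist, so every post published up to `when` is visible.
            "posts": [p.content for p in self.posts if p.captured_at <= when],
        }


# Illustrative data only: three element types changing on independent schedules.
archive = BlogArchive(
    designs=[Version(datetime(2011, 1, 1), "theme-A"),
             Version(datetime(2011, 6, 1), "theme-B")],
    widgets=[Version(datetime(2011, 3, 1), "blogroll v1"),
             Version(datetime(2011, 4, 15), "blogroll v2")],
    posts=[Version(datetime(2011, 2, 10), "Post 1"),
           Version(datetime(2011, 5, 20), "Post 2")],
)
snapshot = archive.time_slice(datetime(2011, 5, 31))
print(snapshot)
# {'design': 'theme-A', 'widgets': 'blogroll v2', 'posts': ['Post 1', 'Post 2']}
```

Keeping a separate history per element, rather than whole-page snapshots, is what allows any moment to be recombined from its parts; a snapshot crawl can only reproduce the combinations that happened to be captured on crawl dates.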