Archiving

by John Holbo on February 6, 2007

I have an archiving question. Suppose I wanted to save a link to a page and do my best to ensure that it stays good – for years and years, in principle, even if the page goes away. As long as google shall reign. The obvious answer: just link to a googlecache URL. One thing I’m not clear about is google’s policy about these page images. Suppose I record a page as it appeared, say, today. That is: the google page has a little ‘as retrieved on 06 Feb 2007 04:14:38 GMT’ or whatever. Does google only take a new snapshot if something changes on the page (does it have some way of knowing that?), so that, so long as the page isn’t changed, the snapshot stays good forever? Suppose I link to a page and, in two years, the page is modified and google has in the meantime taken any number of fresh cache images. There’s no expiry date on old caches, right? (I’m sure about this. But I want to be really sure. Has google made any explicit commitments, archiving-wise? I’m thinking about the long haul here.)

Also, googlecache URLs are a bit unwieldy. Tinyurl promises to offer more convenient handles that ‘never expire’. But what if tinyurl dies? Would it be bad archiving policy to tiny-fy a googlecache link, for convenience (so someone could type it in more easily than those long monsters)? What would you say are best practices, short of doing the old-fashioned thing and hitting ‘print’ and locking the results in a fireproof safe?