<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Archiving</title>
	<atom:link href="http://crookedtimber.org/2007/02/06/archiving/feed/" rel="self" type="application/rss+xml" />
	<link>http://crookedtimber.org/2007/02/06/archiving/</link>
	<description>Out of the crooked timber of humanity, no straight thing was ever made</description>
	<lastBuildDate>Sun, 27 May 2012 00:43:58 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Crooked Timber &#187; &#187; Is it &#8230; atomic? Very atomic, sir.</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-2/#comment-186469</link>
		<dc:creator>Crooked Timber &#187; &#187; Is it &#8230; atomic? Very atomic, sir.</dc:creator>
		<pubDate>Thu, 08 Feb 2007 13:46:49 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186469</guid>
		<description>[...] archiving post got good results. If you want to cite a webpage (in an academic paper, say) and you want to do [...]</description>
		<content:encoded><![CDATA[	<p>[...] archiving post got good results. If you want to cite a webpage (in an academic paper, say) and you want to do [...]</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: stuart</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-2/#comment-186322</link>
		<dc:creator>stuart</dc:creator>
		<pubDate>Wed, 07 Feb 2007 12:32:56 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186322</guid>
		<description>&lt;i&gt;But archive.org retroactively obeys the robots.txt directives it finds on current websites, and thus makes content inaccessible if the owner of a domain changes his/her mind later about whether the content should be visible.&lt;/i&gt;

Well especially as expired domains are now often picked up by porn redirect sites to get the extra hits from old links and favourites, so you can&#039;t even rely on dead domains staying dead (and hence no new robots.txt to change the instructions) like in the past.</description>
		<content:encoded><![CDATA[	<p><i>But archive.org retroactively obeys the robots.txt directives it finds on current websites, and thus makes content inaccessible if the owner of a domain changes his/her mind later about whether the content should be visible.</i></p>

	<p>Well especially as expired domains are now often picked up by porn redirect sites to get the extra hits from old links and favourites, so you can&#8217;t even rely on dead domains staying dead (and hence no new robots.txt to change the instructions) like in the past.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: notme</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-2/#comment-186292</link>
		<dc:creator>notme</dc:creator>
		<pubDate>Wed, 07 Feb 2007 03:27:00 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186292</guid>
		<description>&lt;a href=&quot;http://www.ala.org/ala/lita/litaevents/litanationalforum2006nashvilletn/NewTools.pdf&quot; rel=&quot;nofollow&quot;&gt;Here&#039;s&lt;/a&gt; a recent bibliography from a presentation titled &quot;New Tools for Preserving Digital Collections&quot;, given by Tracy Seneca of the California Digital Library at the Library &amp; Information Technology Association&#039;s 2006 Forum. Go &lt;a href=&quot;http://www.ala.org/ala/lita/litaevents/litanationalforum2006nashvilletn/2006forum.htm&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt; for the Forum&#039;s main page.

Then there&#039;s the January 2007 issue of &lt;a href=&quot;http://www.ariadne.ac.uk/issue50/&quot; rel=&quot;nofollow&quot;&gt;Ariadne&lt;/a&gt;, with relevant articles and conference reports.

That information is oriented to organizations (such as libraries) seeking to archive web pages. But you are going to have many of the problems they&#039;ve already spent a lot of time analyzing.

If you&#039;re only interested in &quot;pretty good&quot; persistence for someone else&#039;s document, and you want to be able to point people to it with a URL, then webcite seems to be the way to go. Assuming the pages aren&#039;t already being archived by a national library.</description>
		<content:encoded><![CDATA[	<p><a href="http://www.ala.org/ala/lita/litaevents/litanationalforum2006nashvilletn/NewTools.pdf" rel="nofollow">Here&#8217;s</a> a recent bibliography from a presentation titled &#8220;New Tools for Preserving Digital Collections&#8221;, given by Tracy Seneca of the California Digital Library at the Library &#038; Information Technology Association&#8217;s 2006 Forum. Go <a href="http://www.ala.org/ala/lita/litaevents/litanationalforum2006nashvilletn/2006forum.htm" rel="nofollow">here</a> for the Forum&#8217;s main page.</p>

	<p>Then there&#8217;s the January 2007 issue of <a href="http://www.ariadne.ac.uk/issue50/" rel="nofollow">Ariadne</a>, with relevant articles and conference reports.</p>

	<p>That information is oriented to organizations (such as libraries) seeking to archive web pages. But you are going to have many of the problems they&#8217;ve already spent a lot of time analyzing.</p>

	<p>If you&#8217;re only interested in &#8220;pretty good&#8221; persistence for someone else&#8217;s document, and you want to be able to point people to it with a <span class="caps">URL</span>, then webcite seems to be the way to go. Assuming the pages aren&#8217;t already being archived by a national library.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: bemused</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186289</link>
		<dc:creator>bemused</dc:creator>
		<pubDate>Wed, 07 Feb 2007 02:36:16 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186289</guid>
		<description>To everyone suggesting references to the wayback machine -- as long as you are referring to content on a domain you control (and intend to control in future) you&#039;re golden.  But archive.org retroactively obeys the robots.txt directives it finds on current websites, and thus makes content inaccessible if the owner of a domain changes his/her mind later about whether the content should be visible.

Beware.</description>
		<content:encoded><![CDATA[	<p>To everyone suggesting references to the wayback machine&#8212;as long as you are referring to content on a domain you control (and intend to control in future) you&#8217;re golden.  But archive.org retroactively obeys the robots.txt directives it finds on current websites, and thus makes content inaccessible if the owner of a domain changes his/her mind later about whether the content should be visible.</p>

	<p>Beware.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: joeo</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186284</link>
		<dc:creator>joeo</dc:creator>
		<pubDate>Wed, 07 Feb 2007 01:52:06 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186284</guid>
		<description>WebCite looks pretty sweet.

Here is an example I just made:

http://www.webcitation.org/5MSlJEAAW</description>
		<content:encoded><![CDATA[	<p>WebCite looks pretty sweet.</p>

	<p>Here is an example I just made:</p>

	<p><a href="http://www.webcitation.org/5MSlJEAAW" rel="nofollow">http://www.webcitation.org/5MSlJEAAW</a></p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Calder</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186282</link>
		<dc:creator>Bob Calder</dc:creator>
		<pubDate>Wed, 07 Feb 2007 00:49:21 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186282</guid>
		<description>I noticed the Internet Archive mentioned only once. There is a REASON they call it the Internet Archive. Really.

Of course there are issues. There are always issues.</description>
		<content:encoded><![CDATA[	<p>I noticed the Internet Archive mentioned only once. There is a <span class="caps">REASON</span> they call it the Internet Archive. Really.</p>

	<p>Of course there are issues. There are always issues.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Fr.</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186276</link>
		<dc:creator>Fr.</dc:creator>
		<pubDate>Tue, 06 Feb 2007 22:32:38 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186276</guid>
		<description>1. Archived copy (with Refresh) - With Furl, you can save a copy of a web page and it is archived for you. This means that you can access that page and read it any time you need to, even if the web site is down, or the page has changed on the original web site, or even if the page is no longer accessible for free. In those cases where the web page has changed and you would rather have the new content, you can refresh the archived copy at will. [&lt;a href=&quot;http://www.furl.net/furlFeatures.jsp#Saving&quot; rel=&quot;nofollow&quot;&gt;furl&lt;/a&gt;]

2. Full-page capture programmes, like Zotero. Some other browser plugins can also do it.</description>
		<content:encoded><![CDATA[	<p>1. Archived copy (with Refresh) &#8211; With Furl, you can save a copy of a web page and it is archived for you. This means that you can access that page and read it any time you need to, even if the web site is down, or the page has changed on the original web site, or even if the page is no longer accessible for free. In those cases where the web page has changed and you would rather have the new content, you can refresh the archived copy at will. [<a href="http://www.furl.net/furlFeatures.jsp#Saving" rel="nofollow">furl</a>]</p>

	<p>2. Full-page capture programmes, like Zotero. Some other browser plugins can also do it.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Eszter</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186266</link>
		<dc:creator>Eszter</dc:creator>
		<pubDate>Tue, 06 Feb 2007 21:36:46 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186266</guid>
		<description>LW - Thanks for clarifying what you meant. However, at that rate any page can change on the Web, of course.  (I guess it would be hard to get a Web Archive page to change, but even archives of media sites change on their own domains as the sites go through entire revamps and post different ads, just to name an example.)

Sure, in that sense it&#039;s all dynamic. But since you mentioned the Flickr example with the Amazon example, I thought you were focusing more on the pages that are so dynamic that they almost certainly would not come up the same way as a result to a search by someone else using another machine logged in to another profile (or no login at all). I was surprised you&#039;d equate Flickr to that sort of dynamic rendering, that&#039;s all.

As for the Shakespeare garden, I&#039;m living in Palo Alto this year so I can&#039;t say.  If it wasn&#039;t so cold I&#039;d say I&#039;ll check it out next week when I&#039;m back briefly, but I don&#039;t think that&#039;s going to happen this time around.</description>
		<content:encoded><![CDATA[	<p><span class="caps">LW </span>- Thanks for clarifying what you meant. However, at that rate any page can change on the Web, of course.  (I guess it would be hard to get a Web Archive page to change, but even archives of media sites change on their own domains as the sites go through entire revamps and post different ads, just to name an example.)</p>

	<p>Sure, in that sense it&#8217;s all dynamic. But since you mentioned the Flickr example with the Amazon example, I thought you were focusing more on the pages that are so dynamic that they almost certainly would not come up the same way as a result to a search by someone else using another machine logged in to another profile (or no login at all). I was surprised you&#8217;d equate Flickr to that sort of dynamic rendering, that&#8217;s all.</p>

	<p>As for the Shakespeare garden, I&#8217;m living in Palo Alto this year so I can&#8217;t say.  If it wasn&#8217;t so cold I&#8217;d say I&#8217;ll check it out next week when I&#8217;m back briefly, but I don&#8217;t think that&#8217;s going to happen this time around.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: abb1</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186260</link>
		<dc:creator>abb1</dc:creator>
		<pubDate>Tue, 06 Feb 2007 19:58:41 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186260</guid>
		<description>Sfisher, sure, clearly if what you have is a bunch of documents - you need to archive the documents. 

And it&#039;s certainly true that archiving databases often isn&#039;t simple. 

I&#039;m just saying that conceptually you&#039;re probably better off with a copy of the database than with a bunch of reports based on this database.</description>
		<content:encoded><![CDATA[	<p>Sfisher, sure, clearly if what you have is a bunch of documents &#8211; you need to archive the documents.</p>

	<p>And it&#8217;s certainly true that archiving databases often isn&#8217;t simple.</p>

	<p>I&#8217;m just saying that conceptually you&#8217;re probably better off with a copy of the database than with a bunch of reports based on this database.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: Adam Kotsko</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186254</link>
		<dc:creator>Adam Kotsko</dc:creator>
		<pubDate>Tue, 06 Feb 2007 19:33:45 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186254</guid>
		<description>You should probably put the safe in a fallout shelter as well.</description>
		<content:encoded><![CDATA[	<p>You should probably put the safe in a fallout shelter as well.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: e-tat</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186253</link>
		<dc:creator>e-tat</dc:creator>
		<pubDate>Tue, 06 Feb 2007 19:25:03 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186253</guid>
		<description>Yeah: Furl, Scrapbook, and &lt;a href=&quot;http://www.spurl.net&quot; rel=&quot;nofollow&quot;&gt;Spurl&lt;/a&gt; are all good for making archive copies. With Furl and Spurl you also get to see who else has made a copy... so you can, like, &lt;i&gt;ask to borrow&lt;/i&gt; theirs.</description>
		<content:encoded><![CDATA[	<p>Yeah: Furl, Scrapbook, and <a href="http://www.spurl.net" rel="nofollow">Spurl</a> are all good for making archive copies. With Furl and Spurl you also get to see who else has made a copy&#8230; so you can, like, <i>ask to borrow</i> theirs.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: sfisher</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186252</link>
		<dc:creator>sfisher</dc:creator>
		<pubDate>Tue, 06 Feb 2007 18:59:28 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186252</guid>
		<description>In reply to abb1 (#39): I think it depends on the situation as to whether you want to preserve a presentation of the data, or some kind of database/XML dump of all the data itself.  An exact presentation may be the best thing for something like an academic paper reference in which you want to say &quot;see page 39 of pdf xxx&quot; and point to an exact spot that&#039;s fairly stable for humans to look at and will be easily available for a while.

On the other hand, preserving the data for machines to look at and do gymnastics on is flexible since you can &lt;i&gt;theoretically&lt;/i&gt; present it or crunch the data again in various ways if you have all the original information somewhere. But creating new views of data isn&#039;t always cheap or trivial. It becomes even more expensive and more complicated if the original rendering software isn&#039;t preserved along with the data (or if it becomes obsolete).  At some point the cost of re-rendering from some specific data structure likely becomes higher than the known or suspected value of the data.  Whereas a fixed-render in a standard format is likely to have pre-packaged software available to render it in a fixed and inflexible way, but at much lower cost longer into the future.

Likely you could both archive original data along with a fixed render and then you have some flexibility as well as lower cost for previewing/seeing the contents.

In reply to Peter (#38): Yikes.  That&#039;s a horrible story.  That makes me consider that I should have better back-up practices for off site storage. (Or perhaps do twice-yearly DVD back ups of my USB hard drive back-ups to store off site).  Not that it would ruin my life if I lost all my data on my home computer, but it would probably be more unpleasant than doing occasional off-site archiving.</description>
		<content:encoded><![CDATA[	<p>In reply to abb1 (#39): I think it depends on the situation as to whether you want to preserve a presentation of the data, or some kind of database/XML dump of all the data itself.  An exact presentation may be the best thing for something like an academic paper reference in which you want to say &#8220;see page 39 of pdf xxx&#8221; and point to an exact spot that&#8217;s fairly stable for humans to look at and will be easily available for a while.</p>

	<p>On the other hand, preserving the data for machines to look at and do gymnastics on is flexible since you can <i>theoretically</i> present it or crunch the data again in various ways if you have all the original information somewhere. But creating new views of data isn&#8217;t always cheap or trivial. It becomes even more expensive and more complicated if the original rendering software isn&#8217;t preserved along with the data (or if it becomes obsolete).  At some point the cost of re-rendering from some specific data structure likely becomes higher than the known or suspected value of the data.  Whereas a fixed-render in a standard format is likely to have pre-packaged software available to render it in a fixed and inflexible way, but at much lower cost longer into the future.</p>

	<p>Likely you could both archive original data along with a fixed render and then you have some flexibility as well as lower cost for previewing/seeing the contents.</p>

	<p>In reply to Peter (#38): Yikes.  That&#8217;s a horrible story.  That makes me consider that I should have better back-up practices for off site storage. (Or perhaps do twice-yearly <span class="caps">DVD</span> back ups of my <span class="caps">USB</span> hard drive back-ups to store off site).  Not that it would ruin my life if I lost all my data on my home computer, but it would probably be more unpleasant than doing occasional off-site archiving.</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: lw</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186250</link>
		<dc:creator>lw</dc:creator>
		<pubDate>Tue, 06 Feb 2007 18:50:35 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186250</guid>
		<description>Flickr, like many services, generates its base URL in a way closely connected to the way underlying objects are stored in the db; in flickr&#039;s case (as with most services), there&#039;s a core object the page is associated with.  I was actually thinking of added comments or edits, both of which change the content of the page displayed; capturing the human reasoning &quot;the photo is key; same photo, same URL, everything OK&quot; in a program is not easy.  That intuitively stable mapping may not exist; consider the document http://del.icio.us/recent, 
or a vendor&#039;s &quot;you may also like&quot; recommendations.
The core attribute linking the entries on that page is the result of a query against an ever-changing db.  Or consider this very exchange-- are comments not written by the entry&#039;s author a part of this page or not?  What if the interface was richer and links to threads with similar keywords were displayed-- would those links be part of the page?  What about stable URLs associated with an astronomical object that just blew up, or a gene for which there&#039;s a newly reported functional assay?  Same query, same URL, same &quot;core&quot; entity, but a qualitatively different page.  These are not marginal cases-- pages with a single definite human author and denumerable edits by that author are already rare.

Static documents are at least superficially indistinguishable from dynamic objects that are pretty different from them, making scope and synchrony questions essential for any internet archiving beyond saving a small amount of html locally.  This leaves aside the unformalizable question of format/content, which is not one of HTML&#039;s strengths.  

On a local note-- are the Shakespeare gardens behind Vogelback being kept up?</description>
		<content:encoded><![CDATA[	<p>Flickr, like many services, generates its base <span class="caps">URL</span> in a way closely connected to the way underlying objects are stored in the db; in flickr&#8217;s case (as with most services), there&#8217;s a core object the page is associated with.  I was actually thinking of added comments or edits, both of which change the content of the page displayed; capturing the human reasoning &#8220;the photo is key; same photo, same <span class="caps">URL</span>, everything OK&#8221; in a program is not easy.  That intuitively stable mapping may not exist; consider the document <a href="http://del.icio.us/recent" rel="nofollow">http://del.icio.us/recent</a>,<br />
or a vendor&#8217;s &#8220;you may also like&#8221; recommendations.<br />
The core attribute linking the entries on that page is the result of a query against an ever-changing db.  Or consider this very exchange&#8212;are comments not written by the entry&#8217;s author a part of this page or not?  What if the interface was richer and links to threads with similar keywords were displayed&#8212;would those links be part of the page?  What about stable URLs associated with an astronomical object that just blew up, or a gene for which there&#8217;s a newly reported functional assay?  Same query, same <span class="caps">URL</span>, same &#8220;core&#8221; entity, but a qualitatively different page.  These are not marginal cases&#8212;pages with a single definite human author and denumerable edits by that author are already rare.</p>

	<p>Static documents are at least superficially indistinguishable from dynamic objects that are pretty different from them, making scope and synchrony questions essential for any internet archiving beyond saving a small amount of html locally.  This leaves aside the unformalizable question of format/content, which is not one of <span class="caps">HTML</span>&#8217;s strengths.</p>

	<p>On a local note&#8212;are the Shakespeare gardens behind Vogelback being kept up?</p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: atd</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186248</link>
		<dc:creator>atd</dc:creator>
		<pubDate>Tue, 06 Feb 2007 18:43:07 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186248</guid>
		<description>Zotero does full snapshots and plays super nice with other bibliographic programs.

http://www.zotero.org/</description>
		<content:encoded><![CDATA[	<p>Zotero does full snapshots and plays super nice with other bibliographic programs.</p>

	<p><a href="http://www.zotero.org/" rel="nofollow">http://www.zotero.org/</a></p>
 ]]></content:encoded>
	</item>
	<item>
		<title>By: abb1</title>
		<link>http://crookedtimber.org/2007/02/06/archiving/comment-page-1/#comment-186233</link>
		<dc:creator>abb1</dc:creator>
		<pubDate>Tue, 06 Feb 2007 17:20:53 +0000</pubDate>
		<guid isPermaLink="false">http://crookedtimber.org/2007/02/06/archiving/#comment-186233</guid>
		<description>Nah, what you really care about is the data, not all the zillion snapshots of various presentations of this data.</description>
		<content:encoded><![CDATA[	<p>Nah, what you really care about is the data, not all the zillion snapshots of various presentations of this data.</p>
 ]]></content:encoded>
	</item>
</channel>
</rss>

