During today's massive update of the Period Games Homepage, I'm discovering a new horror. Many of the sites I point to are now dead, which isn't a surprise. Many of them have been taken over by domain thieves, which also isn't a surprise.
What *is* a surprise is that many of those thieves have turned on robots.txt files that wind up blocking the Wayback Machine from producing results: it appears that archive.org respects robots.txt a little *too* much. The result is that a large number of useful pages are just plain inaccessible -- I can't even get at their archived versions. Grr...
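For anyone wondering how one small file can hide a whole site's history, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `is_blocked` helper, and the example.com URL are all made up for illustration. I'm also assuming the archive's crawler identifies itself as `ia_archiver`. This only shows how a compliant crawler reads the rule, not how archive.org actually implements it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the sort a squatted domain might serve.
# "ia_archiver" is assumed here to be the user agent the archive honors.
PARKED_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Placeholder URL standing in for one of the dead links.
page = "http://example.com/games/rules.html"

print(is_blocked(PARKED_ROBOTS_TXT, "ia_archiver", page))
print(is_blocked(PARKED_ROBOTS_TXT, "SomeOtherBot", page))
```

The rule names only the one crawler, so other robots are unaffected. A new domain owner can shut out the archive while staying visible everywhere else.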
(BTW, time for another reminder that archive.org is one of the most important and unsung sites on the Web -- the Wayback Machine is the only really good archive of the Web's history, and is often invaluable. I've given them another donation today...)
(no subject)
Date: 2010-02-22 01:20 am (UTC)
(no subject)
Date: 2010-02-22 02:11 pm (UTC)
There's a little about that in Wikipedia.
(no subject)
Date: 2010-02-22 02:33 pm (UTC)
On a moral and ethical level, it clearly is *not* appropriate to respect robots.txt in this case, and even on a legal level it's probably clear. I do wonder if there's a practical way to recognize this case appropriately without falling prey to the legal danger...
(no subject)
Date: 2010-02-22 02:39 pm (UTC)