I have finally started clearing out the junk. For example, since the beginning there have been about 20-30 Japanese entries in the list that were garbage. They’re finally gone.

I also learned a lesson about wget that didn’t directly affect the list. By default, if the server answers with an error such as “403 Access Denied”, wget will not save the page you would normally see in your browser. This only affected the “Timeout” servers, but there is more junk to be found among servers that answer with a 302 redirect or a 304 (Not Modified) response.
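If you want to see those pages anyway, newer wget releases have a `--content-on-error` option. The same idea is easy to sketch in Python's standard library. The version below assumes plain HTTP proxies given as `http://host:port`. It refuses to follow redirects and keeps the body of any 3xx/4xx/5xx response, so a 403 page or a 302 stub can be inspected for junk. `fetch_keep_body` and `NoRedirect` are hypothetical names for illustration, not part of any existing tool.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Decline every redirect so 3xx responses surface as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None tells urllib not to follow the Location


def fetch_keep_body(url, proxy=None, timeout=10):
    """Return (status, body) for url, even when the status is an error.

    proxy is an optional HTTP proxy such as "http://1.2.3.4:8080" (assumed
    format). An empty ProxyHandler mapping disables environment proxies.
    """
    opener = urllib.request.build_opener(
        NoRedirect(),
        urllib.request.ProxyHandler({"http": proxy} if proxy else {}),
    )
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the page the server
        # sent with the error (the one a browser would show) is still readable.
        return err.code, err.read()
```

Calling `fetch_keep_body("http://example.com/", proxy="http://1.2.3.4:8080")` returns the status code alongside whatever page came back, so junk hiding behind an error or a redirect is not silently dropped the way wget drops it by default.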

I exported all the non-CoDeeN proxies and used SwitchProxy, a Firefox plug-in, to check how much junk was left. There’s still a fair amount in there, but the next purge should take care of most of it.

It seems that Interesting Sites 1 and 2 are gone for good. No more 75,000+ proxy imports. I’m glad I got those when I could. Curious Site is still supplying proxies, and of course I still hit the other lists every night (though they turn up nothing). I’m running the Google Hack on and off but not getting much live data. I’m going to keep at it, because that’s where the Interesting Sites came from in the first place. Somewhere out there, there’s an IS-3.



