I have finally started clearing the junk out.  For example, since the beginning there have been about 20-30 Japanese entries in the list that were garbage.  They’re finally gone.

I also learned a lesson about wget that didn’t directly affect the list.  Under certain circumstances, if you get, say, a “403 Access Denied” response, wget will not store the page you would normally see in your browser.  This only affected the “Timeout” servers, but there is more junk to be found where the response is a 302 redirect or a 304 Not Modified.
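For anyone who wants to keep the error page, here is a rough sketch of one way to do it in Python instead of wget. It is not what my scripts do. The function name `fetch_with_error_body` and the timeout value are made up for illustration. As far as I know, newer wget builds also have a `--content-on-error` option that keeps the body on an error response, but check your version before relying on it.

```python
import urllib.error
import urllib.request


def fetch_with_error_body(url, timeout=10.0):
    """Return (status_code, body_bytes), keeping the body of 4xx/5xx pages.

    Hypothetical helper for illustration only.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Success (redirects such as 302 are followed automatically).
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the error page
        # (the thing a browser would show) can still be read from it.
        try:
            return err.code, err.read()
        finally:
            err.close()
```

Calling `fetch_with_error_body("http://example.com/")` returns the status code together with whatever page came back, so a 403 page can be inspected for junk just like a normal one. Connection failures and timeouts still raise, since there is no page to save in that case.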

I exported all the non-CoDeeN proxies and used SwitchProxy, a Firefox plug-in, to check the junk factor.  There’s still a fair amount in there, but the next purge should take care of most of it.

It seems that Interesting Sites 1 and 2 are gone for good.  No more 75,000+ proxy imports.  I’m glad I got those when I could.  Curious Site is still supplying proxies, and of course I still hit the other lists every night (but they have nothing new).  I’m running the Google Hack on and off but not getting much live data.  I’m going to keep at it because that’s where the Interesting Sites came from in the first place.  Somewhere, there’s an IS-3 out there.


