Quarter Mill Mark

Very soon we will hit the quarter-million mark in the database.

The Google Hack from yesterday continues to churn, although at this point no new working proxies are being added. However, thousands of dead proxies are being added every hour, and the total count in the database is slowly approaching a quarter of a million unique address:port combinations. There are fewer than five thousand to go. I consider this a significant milestone in the project.

It occurs to me that with the Google Hack there is no way to be 100% positive that what is being added are actual dead or missing proxies, because it is a Blind Hack. There is no way to know where the data actually came from. Anything in address:port format is simply assumed to be a proxy, or to have been one at some time in the past.
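
To make the "anything in address:port format" assumption concrete, here is a minimal sketch of that kind of blind extraction. It is not the project's actual code: the pattern, the function name extract_candidates, and the range checks are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: a dotted-quad IPv4 address, a colon, then a 1-5 digit port.
CANDIDATE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")

def extract_candidates(text):
    """Return the set of unique (address, port) pairs found in text.

    Nothing here proves a pair was ever a proxy; it only has the right shape.
    """
    found = set()
    for address, port in CANDIDATE.findall(text):
        # Reject shapes that cannot be valid IPv4 octets or TCP ports.
        octets_ok = all(0 <= int(octet) <= 255 for octet in address.split("."))
        port_ok = 1 <= int(port) <= 65535
        if octets_ok and port_ok:
            found.add((address, int(port)))
    return found
```

A line from a Web server log and a line from a real proxy list look identical to this function, which is exactly the blind spot described above. Using a set keeps each address:port combination unique.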

That is something of a conundrum. Some of these address:port combos could come from some long-lost TCP tutorial, Web server log, or technical publication, but as long as the Google Hack keeps identifying real proxies, I believe it’s safe to assume the data is from proxy lists. Whatever their origin, the Google Hack works better than anything else I have designed to date for this project.


Luckily, because of the way the geolocation system works, RFC 1918, APIPA, and other bogus addresses are filtered automatically: they generate a harmless SQL error that keeps them out of the database. So even if the entries are not real proxies, at least they are not bogus addresses.
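
For comparison, the same filtering could be done explicitly before anything reaches the database. The sketch below is an alternative to the SQL-error side effect, not a description of how the geolocation system does it. The function name is_bogus is hypothetical, and the loopback and 0.0.0.0/8 ranges are my assumption about what "other bogus addresses" might include.

```python
import ipaddress

# RFC 1918 private ranges and the APIPA (169.254.0.0/16) range.
# Loopback and 0.0.0.0/8 are assumed examples of "other bogus" addresses.
NON_ROUTABLE = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "169.254.0.0/16",
        "127.0.0.0/8",
        "0.0.0.0/8",
    )
]

def is_bogus(address):
    """Return True if address is unparseable or falls in a non-routable range."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Not a valid IP address at all, so it cannot be a usable proxy.
        return True
    return any(ip in network for network in NON_ROUTABLE)
```

A check like this makes the rejection deliberate rather than accidental, at the cost of maintaining the range list by hand.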


