A Clever Plan

The ADPE ran and culled 160+ proxies.  The List went from 11 pages to 8, or about 360 proxies.  This number seems to be something of a fuzzy constant.

In anticipation of this I ran a number of ad hoc queries, and in the process I found that this can be more fruitful than harvesting lists.  This is something of a revelation.  I had stopped doing ad hocs regularly before the List went online.  I did it primarily to pump the database up, whether the proxies were dead or alive.  As usual, most were dead, but that’s just a reflection of the proxy problem at large.

Besides the extra proxies that the lists don’t have, there are a few other advantages to doing ad hoc entries:

  • More often than not, proxies are listed in addr:port format, making cut & paste simple, and page scraping even more so.
  • Limiting the format to addr:port automatically excludes most of the sites I already harvest from, since they like to give you (worthless) sorting options and pretty tables.
  • Adding a date criterion such as Jun 2008 to the search keeps the results relatively fresh.
  • The results are almost always cached by Google, so accessing a foreign site is faster.
  • Cached results can supply forum postings where you would otherwise need a logon.
  • The Google cache format seldom changes.
  • My Ad Hoc method is already fairly sophisticated, automatically eliminating duplicates and going through any one-time list in random order to avoid scanning consecutive IP addresses/ranges.
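
The duplicate removal and random ordering mentioned in the list above could look roughly like the following Python sketch.  This is not the actual Ad Hoc code; the names ADDR_PORT, extract_proxies and scan_order are made up for illustration, and it assumes the input is plain text (a pasted page, say) with IPv4 proxies in addr:port form.

```python
import random
import re

# Hypothetical pattern: a dotted-quad IPv4 address, a colon, then a port.
ADDR_PORT = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})\b")


def extract_proxies(text):
    """Return unique (addr, port) pairs found in text, in first-seen order."""
    seen = set()
    proxies = []
    for addr, port in ADDR_PORT.findall(text):
        # Discard matches that only look like an address or port.
        if any(int(octet) > 255 for octet in addr.split(".")):
            continue
        if not 1 <= int(port) <= 65535:
            continue
        key = (addr, int(port))
        if key not in seen:  # eliminate duplicates
            seen.add(key)
            proxies.append(key)
    return proxies


def scan_order(proxies, seed=None):
    """Return a shuffled copy so neighbouring addresses are not scanned in sequence."""
    shuffled = list(proxies)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

Feeding the text of a page to extract_proxies and passing the result to scan_order gives a one-time list with no repeats and no fixed address order.  The seed argument is only there so a run can be repeated when testing.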

The only disadvantage was that I was doing it all manually.

That is about to change.

The real challenge will be doing it under Google’s radar.  Whenever I start doing something like this they eventually accuse me of having/being a virus.

Which I’m not.

I’m just a plucky little security researcher.


