Can’t Keep Up With IS-1

I woke up this morning to find a new file on IS-1.  I downloaded it and started banging on it.

An hour later I refreshed the page and the same file’s timestamp had changed.  I never noticed this before, so I’m starting to wonder whether it has been doing this all along.  If so, this site has the richest supply of proxies on the Internet.

I’m at the limit of my processing power importing three files simultaneously on the AMD64x2 box, so I may have to enlist another VM if the file updates again today.  Or I can just start stockpiling data and process it catch as catch can.

-= UPDATE 12:00PM =-

I have implemented a check once every 15 minutes on this file, and it appears to be refreshed every 30 minutes, like clockwork.  It’s not a new file, but an update.  The file always has about 250,000 proxies, so I’ll need to hack out a diff to make this manageable.
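
For anyone who wants to try the same kind of check, here is a minimal sketch in Python.  It is not the code running here.  It assumes the file can be fetched as raw bytes and compares a SHA-256 digest of each fetch against the previous one, rather than trusting the page's timestamp.  The names `poll_for_changes`, `fetch`, and `on_change` are made up for illustration, and the fetch itself is left to the caller.

```python
import hashlib
import time
from typing import Callable, Optional


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying this version of the file."""
    return hashlib.sha256(data).hexdigest()


def poll_for_changes(
    fetch: Callable[[], bytes],
    on_change: Callable[[bytes], None],
    interval_seconds: float = 15 * 60,  # the 15-minute check from the post
    max_checks: Optional[int] = None,   # None means poll forever
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Fetch repeatedly and call on_change whenever the bytes differ.

    Returns the number of changes seen (only reachable when max_checks is set).
    """
    last_digest = None
    changes = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        data = fetch()
        digest = content_digest(data)
        if digest != last_digest:
            # The very first fetch counts as a change: nothing to compare against.
            on_change(data)
            changes += 1
            last_digest = digest
        checks += 1
        # Skip the final sleep when a check limit is set and has been reached.
        if max_checks is None or checks < max_checks:
            sleep(interval_seconds)
    return changes
```

Checking at half the refresh period, as above, means each 30-minute update gets seen within about 15 minutes of landing.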

-= UPDATE 1:15PM =-

I hacked out the diff.  Using – surprise – diff!
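
The same idea can be sketched in Python with a set difference instead of diff.  This is a swapped-in technique, not what runs here, and it assumes one proxy per line (the `host:port` format in the usage note is a guess at the file layout).  Unlike diff, it ignores line order, so a reshuffled file does not show up as all-new.

```python
def new_proxies(previous_lines, current_lines):
    """Return entries in current_lines that were not in previous_lines.

    Blank lines are skipped, duplicates within the new file are collapsed,
    and the order of first appearance is preserved.
    """
    seen = {line.strip() for line in previous_lines if line.strip()}
    fresh = []
    for line in current_lines:
        entry = line.strip()
        if entry and entry not in seen:
            seen.add(entry)  # also guards against duplicates in the new file
            fresh.append(entry)
    return fresh
```

Fed the previous and current snapshots as lists of lines, it returns only the entries that still need importing, so a 250,000-line file shrinks to whatever actually changed in the last half hour.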

This site just may max out my processing capabilities.  Right now the page says we have 995,000 proxies, but we’ve probably already gone over a million.

The page updates are taking almost an hour with the extra data.  The twelve o’clock run didn’t make it to the server until 12:46.  I may have to look at that code.  It checks the new proxies sequentially with a 45-second timeout, which can slow things down considerably.  There must be some multitasking opportunities in there somewhere.
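
To put a number on it: checked one at a time, 100 dead proxies at 45 seconds each is 75 minutes of waiting, while 100 checks running at once finish in about 45 seconds.  Below is a rough Python sketch of the thread-pool version.  It is not the existing checker.  `tcp_check` is a hypothetical stand-in that only tests whether a TCP connection opens, `check_all` and the worker count of 100 are assumptions, and a real proxy test would do more than connect.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

TIMEOUT_SECONDS = 45  # the per-proxy timeout mentioned in the post


def tcp_check(proxy: str, timeout: float = TIMEOUT_SECONDS) -> bool:
    """Hypothetical check: does host:port accept a TCP connection in time?"""
    host, _, port = proxy.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers refusals and timeouts; ValueError covers a bad port.
        return False


def check_all(proxies, check=tcp_check, workers=100):
    """Run check() over every proxy in a thread pool; return {proxy: bool}."""
    proxies = list(proxies)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip lines them back up.
        results = pool.map(check, proxies)
        return dict(zip(proxies, results))
```

Threads suit this job because each check spends nearly all its time waiting on the network rather than using the CPU, so even the two cores on the AMD64x2 box can keep many checks in flight.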


