How do you stop scripters from slamming your website hundreds of times a second?
- by davebug
[update] I've accepted an answer, as lc deserves the bounty due to the well thought-out answer, but sadly, I believe we're stuck with our original worst case scenario: CAPTCHA everyone on purchase attempts of the crap. Short explanation: caching / web farms make it impossible for us to actually track hits, and any workaround (sending a non-cached web-beacon, writing to a unified table, etc.) slows the site down worse than the bots would. There is likely some pricey bit of hardware from Cisco or the like that can help at a high level, but it's hard to justify the cost if CAPTCHAing everyone is an alternative. I'll attempt to do a more full explanation in here later, as well as cleaning this up for future searchers (though others are welcome to try, as it's community wiki).
I've added bounty to this question and attempted to explain why the current answers don't fit our needs. First, though, thanks to all of you who have thought about this, it's amazing to have this collective intelligence to help work through seemingly impossible problems.
I'll be a little more clear than I was before: This is about the bag o' crap sales on woot.com. I'm the president of Woot Workshop, the subsidiary of Woot that does the design, writes the product descriptions, podcasts, blog posts, and moderates the forums. I work in the css/html world and am only barely familiar with the rest of the developer world. I work closely with the developers and have talked through all of the answers here (and many other ideas we've had).
Usability of the site is a massive part of my job, and making the site exciting and fun is most of the rest of it. That's where the three goals below derive. CAPTCHA harms usability, and bots steal the fun and excitement out of our crap sales.
To set up the scenario a little more, bots are slamming our front page tens of times a second screenscraping (and/or scanning our rss) for the Random Crap sale. The moment they see that, it triggers a second stage of the program that logs in, clicks I want One, fills out the form, and buys the crap.
In current (2/6/2009) order of votes:
lc:
On stackoverflow and other sites that use this method, they're almost always dealing with authenticated (logged in) users, because the task being attempted requires that.
On Woot, anonymous (non-logged) users can view our home page. In other words, the slamming bots can be non-authenticated (and essentially non-trackable except by IP address).
So we're back to scanning for IPs, which a) is fairly useless in this age of cloud networking and spambot zombies and b) catches too many innocents given the number of businesses that come from one IP address (not to mention the issues with non-static IP ISPs and potential performance hits to trying to track this).
Oh, and having people call us would be the worst possible scenario. Can we have them call you?
BradC
Ned Batchelder's methods look pretty cool, but they're pretty firmly designed to defeat bots built for a network of sites. Our problem is bots are built specifically to defeat our site. Some of these methods could likely work for a short time until the scripters evolved their bots to ignore the honeypot, screenscrape for nearby label names instead of form ids, and use a javascript-capable browser control.
lc again
"Unless, of course, the hype is part of you