Recently I, along with the rest of the world, have seen a significant increase in what appears to be scraping from Amazon AWS-related sources.
So simply put, I blocked all incoming requests from the Amazon cloud for our hosted application.
I know that some legitimate services and bots are now hosted on the cloud, so I'm wondering whether certain IP addresses should be allowed through, since they may gather data that ultimately benefits our site's SEO rankings.
-- UPDATE --
I added a feature to block requests from the following hosts:
Amazon
Softlayer
ServerDeals
GigAvenue
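For anyone wanting to do something similar at the application layer, here is a minimal sketch of an IP-range check. Amazon publishes its current IP ranges at https://ip-ranges.amazonaws.com/ip-ranges.json; the two CIDR blocks below are just illustrative samples, and `is_blocked` is a hypothetical helper name, not part of any framework.

```python
import ipaddress

# Illustrative sample ranges only. In practice, load the full list from
# https://ip-ranges.amazonaws.com/ip-ranges.json (the "prefixes" key)
# and refresh it periodically, since AWS ranges change over time.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in [
        "54.239.0.0/16",  # example Amazon-attributed range
        "52.94.0.0/16",   # example Amazon-attributed range
    ]
]

def is_blocked(client_ip: str) -> bool:
    """Return True if the client IP falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

The same check could instead be done at the firewall or reverse-proxy level, which avoids the cost of handling the request in the application at all.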
Since then, I have seen my network traffic decrease (monitored via the network-out bytes metric). Average traffic during normal operation is around 10,000,000 bytes.
You can see where last week I was not blocking and then started blocking. I've since removed the blocks and will see what the outcome is.