Is there a list of known web crawlers?

Posted by J. Pablo Fernández on Stack Overflow
Published on 2009-11-14T07:33:57Z Indexed on 2010/05/13 10:54 UTC

Filed under: web-crawler | bots | list | documentation

I'm trying to get accurate download numbers for some files on a web server. I look at the user agents, and some are clearly bots or web crawlers, but for many I'm not sure. They may or may not be web crawlers, and they account for many downloads, so it's important for me to know.

Is there a list somewhere of known web crawlers, with some documentation such as user agent, IPs, behavior, etc.?

I'm not interested in the official ones, like Google's, Yahoo's, or Microsoft's. Those are generally well behaved and self-identified.
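
For context, the kind of first-pass filtering described above might look roughly like the sketch below. It assumes the server writes Apache/Nginx "combined" log lines, and the substring list in `BOT_HINTS` is a hypothetical heuristic, not an authoritative list of crawlers. Anything it does not flag is exactly the "not sure" group the question is about.

```python
import re
from collections import Counter

# Hypothetical heuristic (an assumption, not a real crawler list):
# substrings that many self-identifying bots put in their User-Agent.
BOT_HINTS = re.compile(
    r"bot|crawl|spider|slurp|fetch|curl|wget|python-requests",
    re.IGNORECASE,
)

# Assumed input: "combined" log format, where the User-Agent is the
# last quoted field on the line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def looks_like_bot(user_agent):
    """Return True for empty agents or agents matching a bot hint."""
    if not user_agent or user_agent == "-":
        return True
    return bool(BOT_HINTS.search(user_agent))


def count_downloads(lines):
    """Split successful requests into per-path counts for likely humans
    and per-agent counts for suspected bots."""
    humans = Counter()    # path -> downloads from unflagged agents
    suspects = Counter()  # user agent -> requests flagged as bot-like
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None or match.group("status") != "200":
            continue
        agent = match.group("agent")
        if looks_like_bot(agent):
            suspects[agent] += 1
        else:
            humans[match.group("path")] += 1
    return humans, suspects
```

Sorting `suspects` by count shows which flagged agents cause the most downloads. The agents left in `humans` that still behave oddly are the ones a documented list of crawlers would help classify.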

© Stack Overflow or respective owner
