-
as seen on Server Fault
It's been suggested that we use MySQL for our site's search, as it would be running on the same server that hosts our web server (nginx) and our database (MySQL).
Since not all of our pages are generated from the database, it's also been suggested that we build a crawler that can crawl the site and toss the page url…
-
as seen on Server Fault
Hi all,
I recently removed a sub-domain from my domain so that I have just one website to manage. However, if I do a Google search, my old domain is still there. I removed the sub-domain well over a week ago, and if you try to access the domain directly, you will get an error saying the website can not…
-
as seen on Pro Webmasters
Is there a way to take site inventory using a crawler program that either checks the sources of images for specific servers that serve ads, or looks at a page for specific (HTML5?) tags like <aside> or some other tag, in order to count the inventory of ad spaces available on a site?…
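For what it's worth, the two checks described in that question (counting <aside> tags, and counting images whose source points at an ad server) can be sketched for a single page with Python's standard library. This is only a rough sketch under assumptions: the host names in AD_HOSTS are made-up placeholders, the names AdSlotCounter and count_ad_slots are hypothetical, and fetching pages and following links is left out entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical ad-serving hosts (assumption); replace with the real servers.
AD_HOSTS = {"ads.example.com", "adserver.example.net"}


class AdSlotCounter(HTMLParser):
    """Counts <aside> tags and <img> tags whose src host is in AD_HOSTS."""

    def __init__(self):
        super().__init__()
        self.aside_count = 0
        self.ad_image_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "aside":
            self.aside_count += 1
        elif tag == "img":
            # A missing or valueless src attribute is treated as an empty string.
            src = dict(attrs).get("src") or ""
            # hostname is None for relative URLs, so those never match.
            if urlparse(src).hostname in AD_HOSTS:
                self.ad_image_count += 1


def count_ad_slots(html):
    """Return (number of <aside> tags, number of ad-hosted images) for one page."""
    parser = AdSlotCounter()
    parser.feed(html)
    parser.close()
    return parser.aside_count, parser.ad_image_count
```

Calling count_ad_slots on the HTML of each crawled page and summing the results would give a rough inventory. Whether <aside> really marks an ad slot depends on how the site's templates are written, so treat that part as a guess.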
-
as seen on Stack Overflow
I am building a web application crawler that's meant not only to find all the links or pages in a web application, but also to perform all the allowed actions in the app (such as pushing buttons, filling in forms, noticing changes in the DOM even if they did not trigger a request, etc.).
Basically, this is…
-
as seen on Stack Overflow
Hi,
I am about to develop a crawler in Java but don't feel like reinventing the wheel. A quick Google search turns up a whole bunch of Java libraries for building a web crawler. Besides that, Nutch is of course a very robust package, but it seems a bit too advanced for my needs. I only need to crawl a handful…