Search Results

Search found 24 results on 1 page for 'webcrawling'.

  • Asynchronous Webcrawling F#, something wrong ?

    - by jlezard
    Not quite sure if it is OK to do this, but my question is: is there something wrong with my code? It doesn't go as fast as I would like, and since I am using lots of async workflows, maybe I am doing something wrong. The goal here is to build something that can crawl 20,000 pages in less than an hour. open System open System.Text open

    Read the article
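
    A minimal sketch of the throughput goal, in Python rather than the poster's F#, purely for illustration: fetch pages concurrently with a worker pool so that overall throughput, not per-request latency, sets the pace. The worker count and URLs are assumptions, not taken from the question.

      from concurrent.futures import ThreadPoolExecutor, as_completed
      from urllib.request import urlopen

      def fetch(url, timeout=10):
          # Download one page; return (url, body) or (url, None) on failure.
          try:
              with urlopen(url, timeout=timeout) as resp:
                  return url, resp.read()
          except OSError:
              return url, None

      def crawl(urls, workers=50):
          # 20,000 pages in an hour is roughly 6 pages per second overall,
          # which a modest pool of concurrent workers can sustain.
          with ThreadPoolExecutor(max_workers=workers) as pool:
              futures = [pool.submit(fetch, u) for u in urls]
              for done in as_completed(futures):
                  url, body = done.result()
                  if body is not None:
                      print(url, len(body))

      # crawl(["http://example.com/page1", "http://example.com/page2"])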

  • WebCrawling Dynamic Links

    - by Jojo
    Hi everyone, does anybody have any idea on crawling websites that have dynamic pages/queries? I mean, if I click a certain link, it has different values every time I try to reload it in a web browser. Right now my web crawler cannot download the contents of these pages. Please advise.

    Read the article

  • Crawling engine architecture - Java/ Perl integration

    - by Bigtwinz
    Hi all, I am looking to develop a management and administration solution around our web-crawling Perl scripts. Basically, right now our scripts are saved in SVN and are manually kicked off by sysadmins/devs etc. Every time we need to retrieve data from new sources, we have to create a ticket with business instructions and goals. As you can imagine,

    Read the article

  • Building an automatic web crawler

    - by Sakin
    I am building a web application crawler that's meant not only to find all the links or pages in a web application, but also to perform all the allowed actions in the app (such as pushing buttons, filling in forms, noticing changes in the DOM even if they did not trigger a request, etc.). Basically, this is a kind of "browser simulator". I find WebKit a

    Read the article

  • Getting web page after calling DownloadStringAsync()?

    - by OverTheRainbow
    Hello, I don't know enough about VB.NET yet to use the richer HttpWebRequest class, so I figured I'd use the simpler WebClient class to download web pages asynchronously (to avoid freezing the UI). However, how can the asynchronous event handler actually return the web page to the calling routine? Imports System.Net Public Class Form1

    Read the article
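
    The question is VB.NET/WebClient-specific; as a rough Python analogue only, the usual pattern is that the completion handler cannot return a value to the caller directly, so the download call hands back a future the caller can either block on or attach a callback to. The names below are illustrative.

      from concurrent.futures import ThreadPoolExecutor
      from urllib.request import urlopen

      pool = ThreadPoolExecutor(max_workers=4)

      def download_string_async(url):
          # Start the download and return immediately with a Future (non-blocking).
          return pool.submit(lambda: urlopen(url, timeout=10).read().decode("utf-8", "replace"))

      def on_completed(future):
          # Completion callback: runs once the page has been downloaded.
          page = future.result()
          print("downloaded", len(page), "characters")

      fut = download_string_async("http://example.com/")
      fut.add_done_callback(on_completed)   # or block with: page = fut.result()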

  • What is a good Java crawler library?

    - by DrDee
    Hi, I am about to develop a crawler in Java but don't feel like reinventing the wheel. A quick Google search gives a whole bunch of Java libraries for building a web crawler. Besides that, Nutch is of course a very robust package, but it seems a bit too advanced for my needs. I only need to crawl a handful of websites a week containing a couple of 1000

    Read the article

  • Mining Groups of people from Wikipedia

    - by AlgoMan
    I am trying to get the list of people from http://en.wikipedia.org/wiki/Category:People_by_occupation . I have to go through all the sections and get people from each section. How should I go about it? Should I use a crawler to get the pages and search through those using BeautifulSoup? Or is there any other alternative to get the

    Read the article
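
    One possible alternative to crawling and scraping the rendered pages (an assumption on my part, not something the question settles) is asking the MediaWiki API for the category members directly; a minimal Python sketch:

      import json
      import urllib.parse
      import urllib.request

      API = "https://en.wikipedia.org/w/api.php"

      def category_members(category):
          # Yield page titles in a category, following the API's continuation cursor.
          cont = {}
          while True:
              params = {"action": "query", "list": "categorymembers",
                        "cmtitle": category, "cmlimit": "500", "format": "json"}
              params.update(cont)
              with urllib.request.urlopen(API + "?" + urllib.parse.urlencode(params)) as resp:
                  data = json.load(resp)
              for member in data["query"]["categorymembers"]:
                  yield member["title"]
              if "continue" not in data:
                  break
              cont = data["continue"]

      # Members of this category are mostly subcategories, which would need walking in turn.
      for title in category_members("Category:People by occupation"):
          print(title)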

  • Prevent bot from crawling certain areas of site.

    - by Skoder
    Hey, I don't know much about SEO and how web spiders work, so forgive my ignorance here. I'm creating a site (using ASP.NET MVC) which has areas that display information retrieved from the database. The data is unique to the user, so there's no real server-side output caching going on. However, since the data can contain things the user

    Read the article

  • Asynchronous crawling F#

    - by jlezard
    When crawling web pages I need to be careful not to make too many requests to the same domain; for example, I want to put 1 s between requests. From what I understand, it is the time between requests that is important. So to speed things up I want to use async workflows in F#, the idea being to make requests at a 1-second interval but

    Read the article
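
    A minimal sketch of the per-domain spacing idea, in Python rather than the poster's F# async workflows: requests to different domains can overlap, but each individual domain gets at least `delay` seconds between hits. The locking scheme is an illustrative assumption.

      import threading
      import time
      from urllib.parse import urlparse
      from urllib.request import urlopen

      _last_slot = {}            # domain -> earliest time the next request may start
      _lock = threading.Lock()

      def polite_fetch(url, delay=1.0):
          # Reserve the next time slot for this domain, then wait until it arrives.
          domain = urlparse(url).netloc
          with _lock:
              now = time.monotonic()
              slot = max(now, _last_slot.get(domain, 0.0))
              _last_slot[domain] = slot + delay
              wait = slot - now
          if wait > 0:
              time.sleep(wait)
          with urlopen(url, timeout=10) as resp:
              return resp.read()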

  • Problem extracting text from RSS feeds

    - by Gautam
    Hi, I am new to the world of Ruby and Rails. I have seen RailsCast 190 and I just started playing with it. I used SelectorGadget to find the CSS and XPath selectors. I have the following code: require 'rubygems' require 'nokogiri' require 'open-uri' url = "http://www.telegraph.co.uk/sport/football/rss" doc =

    Read the article
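
    The question itself is about Ruby/Nokogiri; purely as an illustrative sketch in Python, the same feed can be parsed with the standard XML module and the text pulled out of each <item>:

      import xml.etree.ElementTree as ET
      from urllib.request import urlopen

      url = "http://www.telegraph.co.uk/sport/football/rss"
      with urlopen(url, timeout=10) as resp:
          tree = ET.parse(resp)                      # parse the RSS XML

      for item in tree.iter("item"):                 # one <item> per story
          title = item.findtext("title", default="")
          link = item.findtext("link", default="")
          summary = item.findtext("description", default="")
          print(title, "-", link)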

  • What's a good Web Crawler tool

    - by Glenn Slaven
    I need to index a whole lot of web pages; what good web crawler utilities are there? I'm preferably after something that .NET can talk to, but that's not a showstopper. What I really need is something that I can give a site URL to, and it will follow every link and store the content for indexing.

    Read the article

  • Web request returns "DOS"

    - by jlezard
    I am getting a "DOS" instead of the html string .... let getHtmlBasic (uri :System.Uri ) = use client = new WebClient() client.DownloadString( uri) let uri = System.Uri( "http://www.b-a-r-f.com/" ) getHtmlBasic uri This gives a string, "DOS" Lol what the ? All other websites seems to work ...

    Read the article

  • Guidelines for good webcrawler 'Etiquette'

    - by Harry
    I'm building a search engine (for fun) and it has just struck me that my little project might potentially wreak havoc by clicking on ads and causing all sorts of problems. So what are the guidelines for good web crawler 'etiquette'? Things that spring to mind: observe robots.txt instructions; limit the number of

    Read the article
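
    A minimal Python sketch covering two of those points (respecting robots.txt and pacing requests); the user-agent string and delay are assumptions for illustration:

      import time
      import urllib.robotparser
      from urllib.parse import urlparse
      from urllib.request import Request, urlopen

      USER_AGENT = "MyLittleCrawler/0.1 (+http://example.com/bot)"  # identify the bot

      def allowed(url):
          # Consult the site's robots.txt before fetching anything.
          parts = urlparse(url)
          rp = urllib.robotparser.RobotFileParser(parts.scheme + "://" + parts.netloc + "/robots.txt")
          rp.read()
          return rp.can_fetch(USER_AGENT, url)

      def polite_get(url, delay=2.0):
          # Skip disallowed URLs and pause between requests so the site is not hammered.
          if not allowed(url):
              return None
          time.sleep(delay)
          req = Request(url, headers={"User-Agent": USER_AGENT})
          with urlopen(req, timeout=10) as resp:
              return resp.read()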

  • Need help in site classification

    - by goh
    Hi guys, I have to crawl the contents of several blogs. The problem is that I need to classify whether the blog authors are from a specific school and are talking about the school's affairs. May I know what's the best approach to doing the crawling, and how should I go about the classification?

    Read the article

  • Guide on crawling the entire web?

    - by bohohasdhfasdf
    I just had this thought, and was wondering if it's possible to crawl the entire web (just like the big boys!) on a single dedicated server (like a Core2Duo, 8 GB RAM, 750 GB disk, 100 Mbps). I've come across a paper where this was done, but I cannot recall the paper's title. It was about

    Read the article

  • How to store data crawled from a website

    - by Richard
    I want to crawl a website and store the content on my computer for later analysis. However, my OS's file system has a limit on the number of subdirectories, meaning storing the original folder structure is not going to work. Suggestions? Map the URL to some filename so it can be stored flat? Or

    Read the article
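
    A minimal Python sketch of the "map the URL to some filename" option mentioned in the question: hash each URL so every page lands in one flat directory, and keep an index file so the original URL can still be looked up. The paths are illustrative assumptions.

      import hashlib
      import json
      import os

      STORE = "crawl_store"                 # one flat directory, no nested folders
      os.makedirs(STORE, exist_ok=True)

      def save_page(url, body):
          # Write the body to <sha1-of-url>.html and append the mapping to an index.
          name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
          with open(os.path.join(STORE, name), "wb") as f:
              f.write(body)
          with open(os.path.join(STORE, "index.jsonl"), "a", encoding="utf-8") as f:
              f.write(json.dumps({"url": url, "file": name}) + "\n")
          return name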

  • Not crawling the same content twice

    - by sirrocco
    I'm building a small application that will crawl sites where the content is growing (like on Stack Overflow); the difference is that the content, once created, is rarely modified. Now, in the first pass I crawl all the pages in the site. But next, the paged content of that site - I don't

    Read the article
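
    A minimal sketch in Python (not the poster's code) of one way to skip content that has already been seen: remember a hash of each page body and only reprocess a URL when that hash changes. The in-memory dict is an assumption; a real crawler would persist it between runs.

      import hashlib

      seen = {}   # url -> hash of the body from the last crawl

      def is_new_or_changed(url, body):
          # True the first time a URL is seen, or when its content has changed since.
          digest = hashlib.sha1(body).hexdigest()
          if seen.get(url) == digest:
              return False            # identical to last time: skip re-processing
          seen[url] = digest
          return True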

  • Approaches for cross server content sharing?

    - by Anonymity
    I've currently been tasked with finding the best solution for serving up content on our new site from one of our other sites. Several approaches suggested to me that I've looked into include using SharePoint's Lists web service to grab the list through JavaScript, which results in

    Read the article
