How to use a loop to download HTML with paging?

Posted by Nai on Stack Overflow
Published on 2010-12-25T17:16:15Z

I want to call this URL in a loop and download the JSON it returns for each page.

    "https://www.googleapis.com/customsearch/v1?key=AIzaSyAAoPQprb6aAV-AfuVjoCdErKTiJHn-4uI&cx=017576662512468239146:omuauf_lfve&q=" + searchTermFormat + "&num=10" + "&start=" + i

start and num control the paging of the URL. So if &start=2 and &num=10, it will scrape 10 results from page 2. Given that Google limits num to a maximum of 10, how can I write a loop that downloads and scrapes the results for the first 10 pages?
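
The Custom Search JSON API documents start as the 1-based index of the first result rather than a page number, so with num=10 page 1 begins at start=1, page 2 at start=11, and page 10 at start=91. The documentation also states that the API never returns more than 100 results for a query, so 10 pages of 10 is the ceiling. A minimal sketch of that arithmetic, using C# 9 top-level statements and a hypothetical helper named StartIndexForPage:

```csharp
using System;

// Hypothetical helper: start is the 1-based index of the first result,
// so 1-based page number `page` with `num` results per page begins here.
static int StartIndexForPage(int page, int num) => (page - 1) * num + 1;

const int ResultsPerPage = 10; // the API caps num at 10

// Print the query-string paging parameters for the first 10 pages.
for (int page = 1; page <= 10; page++)
{
    int start = StartIndexForPage(page, ResultsPerPage);
    Console.WriteLine($"page {page}: &num={ResultsPerPage}&start={start}");
}
```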

This is what I have so far, which only scrapes the first page.

    using System;
    using System.Net;
    using System.Web.Script.Serialization; // JavaScriptSerializer (System.Web.Extensions assembly)

    // Read the search term from the console
    Console.WriteLine("What is your search query?:");
    string searchTerm = Console.ReadLine();

    // URL-encode the search term so spaces, '&', '#' and similar characters are safe in the query string
    string searchTermFormat = Uri.EscapeDataString(searchTerm);

    // Create a WebClient and use its DownloadString method to download the response (JSON)
    using (WebClient client = new WebClient())
    {
        int i = 1; // start: 1-based index of the first result to return
        string Json = client.DownloadString("https://www.googleapis.com/customsearch/v1?key=AIzaSyAAoPQprb6aAV-AfuVjoCdErKTiJHn-4uI&cx=017576662512468239146:omuauf_lfve&q=" + searchTermFormat + "&num=10" + "&start=" + i);

        // Create a JavaScriptSerializer and deserialize the response into GoogleSearchResults (class definition not shown)
        JavaScriptSerializer js = new JavaScriptSerializer();
        GoogleSearchResults results = js.Deserialize<GoogleSearchResults>(Json);

        // Print the deserialized results to the console
        Console.WriteLine(js.Serialize(results));
    }
    Console.ReadLine();
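
A minimal sketch of the paging loop, assuming the same URL format as the question: the hypothetical DownloadPages helper requests pages 1 through `pages`, computes start as (page - 1) * num + 1 for each page, and returns the raw JSON body of every page. The downloader is passed in as a Func<string, string> (an assumption made so the demo runs offline), and YOUR_KEY and YOUR_CX are placeholders. It uses C# 9 top-level statements for brevity; on the .NET Framework the question targets, the same statements go inside Main.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the question): downloads `pages` pages of
// Custom Search results and returns the raw JSON body of each page.
// `download` fetches a URL and returns the response body as a string.
static List<string> DownloadPages(string baseUrl, string query, int pages, int num, Func<string, string> download)
{
    var bodies = new List<string>();
    string q = Uri.EscapeDataString(query); // escape spaces, '&', '#', etc.

    for (int page = 1; page <= pages; page++)
    {
        // start is a 1-based result index: 1, 11, 21, ... when num = 10
        int start = (page - 1) * num + 1;
        string url = baseUrl + "&q=" + q + "&num=" + num + "&start=" + start;
        bodies.Add(download(url));
    }
    return bodies;
}

// Offline demo: YOUR_KEY and YOUR_CX are placeholders, and the fake
// downloader echoes the URL instead of making a network request.
string demoBaseUrl = "https://www.googleapis.com/customsearch/v1?key=YOUR_KEY&cx=YOUR_CX";
List<string> fetched = DownloadPages(demoBaseUrl, "c# loops", 10, 10, url => url);

foreach (string body in fetched)
    Console.WriteLine(body);
```

For real requests, create a WebClient as in the question, pass client.DownloadString as the download argument along with a base URL containing your own key and cx, and deserialize each returned string with JavaScriptSerializer as before. Injecting the downloader keeps the paging arithmetic testable without network access.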

© Stack Overflow or respective owner
