How to use a loop to download HTML with paging?
Posted by Nai on Stack Overflow
Published on 2010-12-25T17:16:15Z
Indexed on 2010/12/25 19:54 UTC
I want to loop through this URL and download the HTML.
https://www.googleapis.com/customsearch/v1?key=AIzaSyAAoPQprb6aAV-AfuVjoCdErKTiJHn-4uI&cx=017576662512468239146:omuauf_lfve&q=" + searchTermFormat + "&num=10" +"&start=" + i
start and num control the paging of the URL. So if &start=2 and &num=10, it will scrape 10 results from page 2. Given that Google has a max limit of num = 10, how can I write a loop that goes through the pages and scrapes the results for the first 10 pages?
This is what I have so far, which just scrapes the first page.
// Read the search term from the console
Console.WriteLine("What is your search query?:");
string searchTerm = Console.ReadLine();
// Replace spaces with + so the search term is URL-friendly for Google
string searchTermFormat = searchTerm.Replace(" ", "+");
// Create a WebClient and use its DownloadString method to download the response
WebClient client = new WebClient();
int i = 1;
string Json = client.DownloadString("https://www.googleapis.com/customsearch/v1?key=AIzaSyAAoPQprb6aAV-AfuVjoCdErKTiJHn-4uI&cx=017576662512468239146:omuauf_lfve&q=" + searchTermFormat + "&num=10" + "&start=" + i);
// Create a JavaScriptSerializer and deserialise the downloaded content
JavaScriptSerializer js = new JavaScriptSerializer();
GoogleSearchResults results = js.Deserialize<GoogleSearchResults>(Json);
// Output the results to the console
Console.WriteLine(js.Serialize(results));
Console.ReadLine();
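For illustration, here is a minimal sketch of what such a loop might look like. It rests on an assumption that differs from the description above: that start is the 1-based index of the first result to return (1, 11, 21, ...) rather than a page number, which is how the Custom Search API appears to document it. The class PagedSearchSketch and the helpers StartIndexes and BuildUrl are hypothetical names, and the key and cx values are placeholders, not real credentials.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class PagedSearchSketch
{
    // Hypothetical helper: the `start` value for each page, assuming
    // `start` is the 1-based index of the first result on that page.
    public static List<int> StartIndexes(int pageCount, int pageSize)
    {
        var starts = new List<int>();
        for (int page = 0; page < pageCount; page++)
        {
            starts.Add(page * pageSize + 1); // 1, 11, 21, ...
        }
        return starts;
    }

    // Hypothetical helper: builds the request URL for one page.
    // Uri.EscapeDataString percent-encodes spaces and other reserved characters.
    public static string BuildUrl(string apiKey, string cx, string searchTerm,
                                  int pageSize, int start)
    {
        return "https://www.googleapis.com/customsearch/v1"
            + "?key=" + Uri.EscapeDataString(apiKey)
            + "&cx=" + Uri.EscapeDataString(cx)
            + "&q=" + Uri.EscapeDataString(searchTerm)
            + "&num=" + pageSize
            + "&start=" + start;
    }

    public static void Main()
    {
        Console.WriteLine("What is your search query?:");
        string searchTerm = Console.ReadLine() ?? "";

        const string apiKey = "YOUR_API_KEY";      // placeholder
        const string cx = "YOUR_SEARCH_ENGINE_ID"; // placeholder

        // One downloaded JSON string per page
        var pages = new List<string>();
        using (var client = new WebClient())
        {
            // 10 pages of 10 results each
            foreach (int start in StartIndexes(10, 10))
            {
                string url = BuildUrl(apiKey, cx, searchTerm, 10, start);
                pages.Add(client.DownloadString(url));
            }
        }
        Console.WriteLine("Downloaded " + pages.Count + " pages.");
    }
}
```

Each string in pages could then be passed to js.Deserialize<GoogleSearchResults>(...) exactly as in the single-page version. Keeping the URL construction in its own method makes the paging arithmetic easy to check without making any requests.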
© Stack Overflow or respective owner