How to use a loop to download HTML with paging?
- by Nai
I want to loop through this URL and download the HTML.
https://www.googleapis.com/customsearch/v1?key=AIzaSyAAoPQprb6aAV-AfuVjoCdErKTiJHn-4uI&cx=017576662512468239146:omuauf_lfve&q=" + searchTermFormat + "&num=10" +"&start=" + i
start and num controls the paging of the URL. So if &start=2, and &num=10, it will scrape 10 results from page 2. Given that Google has a max limit of num = 10, how can I write a loop that loops through the HTML and scrape the results for the first 10 pages?
This is what I have so far which just scrapes the first page.
//input search term
Console.WriteLine("What is your search query?:");
string searchTerm = Console.ReadLine();
//concantenate the strings using + symbol to make it URL friendly for google
string searchTermFormat = searchTerm.Replace(" ", "+");
//create a new instance of Webclient and use DownloadString method from the Webclient class to extract download html
WebClient client = new WebClient();
int i = 1;
string Json = client.DownloadString("https://www.googleapis.com/customsearch/v1?key=AIzaSyAAoPQprb6aAV-AfuVjoCdErKTiJHn-4uI&cx=017576662512468239146:omuauf_lfve&q=" + searchTermFormat + "&num=10" + "&start=" + i);
//create a new instance of JavaScriptSerializer and deserialise the desired content
JavaScriptSerializer js = new JavaScriptSerializer();
GoogleSearchResults results = js.Deserialize<GoogleSearchResults>(Json);
//output results to console
Console.WriteLine(js.Serialize(results));
Console.ReadLine();