Groovy htmlunit getFirstByXPath returning null

Posted by StartingGroovy on Stack Overflow
Published on 2011-01-08T18:15:50Z Indexed on 2011/01/09 2:53 UTC

Filed under: htmlunit

I have had a few issues with HtmlUnit returning nulls lately and am looking for guidance. Every attempt to grab a value from the first row of a table on a website has returned null. I am wondering if someone can:

A) explain why they might be returning null

B) explain better ways (if there are some) to go about getting the information

Here is my current code (URL is in the source):

import com.gargoylesoftware.htmlunit.BrowserVersion
import com.gargoylesoftware.htmlunit.WebClient

client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false

def url = "http://www.hidemyass.com/proxy-list/"

page = client.getPage(url)

IpAddress = page.getFirstByXPath("//html/body/div/div/form/table/tbody/tr/td[2]").getValue()
println "IP Address is: $IpAddress"     // never reached: getFirstByXPath returns null above

//Port_Number is an Image

Country = page.getFirstByXPath("//html/body/div/div/form/table/tbody/tr/td[4][@class='country']/@rel").getValue()
println "Country abbreviation is: $Country"

//differentiate speed and connection by name of gif?

Type = page.getFirstByXPath("//html/body/div/div/form/table/tbody/tr/td[7]").getValue()
println "Proxy type is: $Type"

Anonymity = page.getFirstByXPath("//html/body/div/div/form/table/tbody/tr/td[8]").getValue()
println "Anonymity Level is: $Anonymity"

client.closeAllWindows()

Right now all of my XPaths return null and .getValue() obviously doesn't work on null.
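
One way to narrow this kind of problem down is to compare a rigid absolute path with a looser one and to guard each lookup with Groovy's safe-navigation operator (`?.`). The sketch below is only an illustration: it uses the JDK's built-in XPath API on a small made-up document instead of HtmlUnit so that it runs without a network connection, and the markup, the sample values, and the `firstNode` helper are all assumptions rather than anything taken from the real page. It does not establish why the real page returns null. With HtmlUnit the same check could be made with `getByXPath`, which returns a list whose size can be inspected.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Hypothetical stand-in markup: one data row, and no <tbody> around it.
def html = '''<html><body><div><form><table>
  <tr><td>1</td><td>10.0.0.1</td><td>img</td><td class="country" rel="us">United States</td></tr>
</table></form></div></body></html>'''

def doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(html.getBytes('UTF-8')))
def xpath = XPathFactory.newInstance().newXPath()

// Hypothetical helper: first matching node, or null when nothing matches.
def firstNode = { String expr -> xpath.evaluate(expr, doc, XPathConstants.NODE) }

// Rigid absolute path that assumes a <tbody> this document does not contain.
def rigid = firstNode('/html/body/div/form/table/tbody/tr/td[2]')

// Looser path that only depends on the table row and cell position.
def loose = firstNode('//table//tr[1]/td[2]')

// ?. yields null instead of throwing when the lookup found nothing.
def rigidText = rigid?.textContent
def looseText = loose?.textContent
def country = firstNode("//td[@class='country']/@rel")?.nodeValue

println "rigid path: $rigidText"
println "loose path: $looseText"
println "country:    $country"
```

If the loose path matches and the rigid one does not, the difference between them points at the step of the absolute path that does not exist in the parsed document. Dumping the parsed page (for example with HtmlUnit's `asXml()`) and comparing it with the path is another way to find that step.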

I am also unsure what to do about the port, since it is rendered as an image. Is there a better alternative than downloading the image and trying to read it with OCR?

Side Note

There is no significance to this site; I was just looking for a site that I could practice scraping on. On the last one I ran into issues with fragment identities and couldn't get an answer (see "HtmlUnit getByXpath returns null" and "HtmlUnit and Fragment Identities").
