hpricot throws an exception when parsing a URL whose page has a noscript tag
- by anusuya
I use the hpricot gem in Ruby on Rails to parse a webpage and extract the meta tag contents. But if the website has a <noscript> tag just after the <head> tag, it throws an exception:
Exception: undefined method `[]' for nil:NilClass
I even tried updating the gem to the latest version, but the result is still the same.
This is the sample code I use:
require 'rubygems'
require 'hpricot'
require 'open-uri'
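begin
  # Fetch the page and parse it
  index_page = Hpricot(open("http://sample.com"))
  # Read the content attribute of the verification meta tag, stripping whitespace
  puts index_page.at("/html/head/meta[@name='verification']")['content'].gsub(/\s/, "")
rescue Exception => e
  puts "Exception: #{e}"
end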
I was thinking of removing the noscript tag before giving the webpage to hpricot.
Or is there any other way to do it?
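
The removal idea could look roughly like the sketch below. It assumes the noscript blocks are not nested and that a regular expression is good enough for this page; strip_noscript and NOSCRIPT_BLOCK are hypothetical names made up for illustration, not part of hpricot, and the sample HTML is invented.

```ruby
# Hypothetical helper (not part of hpricot): drop every
# <noscript>...</noscript> block from raw HTML before parsing.
# Assumption: noscript blocks are not nested.
NOSCRIPT_BLOCK = %r{<noscript\b[^>]*>.*?</noscript\s*>}mi

def strip_noscript(html)
  # m lets "." span newlines, i matches <NOSCRIPT> as well
  html.gsub(NOSCRIPT_BLOCK, "")
end

# Invented sample page with a noscript block right after <head>
sample = <<~HTML
  <html><head>
  <noscript><img src="x.gif"></noscript>
  <meta name="verification" content="abc 123">
  </head><body></body></html>
HTML

cleaned = strip_noscript(sample)
puts cleaned.include?("noscript")  # prints false
```

The cleaned string could then be passed to Hpricot(...) instead of the open(...) result, for example by reading the page first with open(url).read. Since at returns nil when nothing matches, checking its result before calling ['content'] would also avoid the undefined method `[]' error.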