hpricot throws exception when trying to parse url which has noscript tag
Posted by anusuya on Stack Overflow
Published on 2010-04-08T11:40:21Z
ruby | ruby-on-rails
I use the hpricot gem in Ruby on Rails to parse a web page and extract the contents of a meta tag. But if the page has a <noscript> tag just after the <head> tag, it throws an exception:
Exception: undefined method `[]' for nil:NilClass
I even updated the gem to the latest version, but the result is still the same.
This is the sample code I use:
require 'rubygems'
require 'hpricot'
require 'open-uri'
begin
  index_page = Hpricot(open("http://sample.com"))
  puts index_page.at("/html/head/meta[@name='verification']")['content'].gsub(/\s/, "")
rescue Exception => e
  puts "Exception: #{e}"
end
I was thinking of removing the noscript tag before passing the page to hpricot. Or is there any other way to do it?
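For reference, the idea of removing the noscript tag first could look roughly like the sketch below. It is only a sketch under assumptions: `strip_noscript` and `compact_attr` are hypothetical helper names (they are not part of hpricot), the sample HTML is made up, and a regular expression is a crude way to edit HTML that does not handle nested or malformed noscript blocks. The second helper guards against the nil that produces the "undefined method `[]' for nil:NilClass" error when the element is not found.

```ruby
# Hypothetical helper (not part of hpricot): removes every
# <noscript>...</noscript> block from raw HTML before parsing.
# The pattern is case-insensitive (i) and lets "." match newlines (m).
NOSCRIPT_BLOCK = %r{<noscript\b[^>]*>.*?</noscript\s*>}mi

def strip_noscript(html)
  html.gsub(NOSCRIPT_BLOCK, "")
end

# Hypothetical helper: returns the named attribute with all whitespace
# removed, or nil when the element was not found, instead of calling
# [] on nil.
def compact_attr(element, name)
  return nil if element.nil?
  value = element[name]
  value && value.gsub(/\s/, "")
end

# Made-up sample page with a noscript block right after <head>.
sample = <<~HTML
  <html><head>
  <noscript><img src="tracker.gif" /></noscript>
  <meta name="verification" content="abc 123" />
  </head><body></body></html>
HTML

cleaned = strip_noscript(sample)
puts cleaned
```

With hpricot loaded, the cleaned string could presumably be parsed as `Hpricot(strip_noscript(open("http://sample.com").read))`, and the result of `at(...)` passed to `compact_attr(..., 'content')`. I have not verified that this avoids the parsing problem on the real page.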