How to make Nokogiri transparently return encoded and unencoded HTML entities untouched?

Posted by svenfuchs on Stack Overflow
Published on 2010-04-02T14:07:30Z Indexed on 2010/04/02 16:53 UTC

How can I use Nokogiri so that HTML entities and non-ASCII characters (like German umlauts) come back untouched, exactly as they appeared in the input?

For example:

require 'nokogiri'

# this is fine: an entity in the input stays an entity
node = Nokogiri::HTML.fragment('<p>&ouml;</p>')
node.to_s # => '<p>&ouml;</p>'

# this is not: a literal ö is turned into an entity
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # => '<p>&ouml;</p>'

# this is what I need: a literal ö stays a literal ö
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # => '<p>ö</p>'

I've tried both PARSE_OPTIONS and the :save_with options, but could not find a way to make Nokogiri behave transparently as shown above.

Any pointers?

© Stack Overflow or respective owner