Can Nokogiri use a SAX parser to parse an HTML fragment?

Posted by .yahoo.co.jpaqwsykcj3aulh3h1k0cy6nzs3isj on Stack Overflow
Published on 2010-03-16T05:18:54Z Indexed on 2010/03/16 5:26 UTC

I have this code, which subclasses Nokogiri's SAX document class and logs each parse event:

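require 'logger'
require 'nokogiri'

# LOG was not defined in the snippet as posted; a plain Logger that
# prints only the message is assumed here.
LOG = Logger.new($stdout)
LOG.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }

# SAX handler that logs each parse event it receives.
class MyParser < Nokogiri::XML::SAX::Document
  def characters(string)
    LOG.debug("characters #{string}")
  end

  def start_element(name, attrs = [])
    LOG.debug("start_element #{name}")
  end

  def end_element(name)
    LOG.debug("end_element #{name}")
  end
end

parser = Nokogiri::HTML::SAX::Parser.new(MyParser.new)
# Parse the file named by the first command-line argument.
File.open(ARGV[0], 'rb') { |file| parser.parse(file) }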

When I run it on an HTML fragment like this,

<h1>Hello</h1> 
<p>Hi.</p>

the output shows that only the first element is processed:

start_element h1
characters Hello
end_element h1

If I wrap the fragment in html and body tags, the whole input is parsed.

Is there a way to use a SAX-style parser on HTML fragments without wrapping them first?

© Stack Overflow or respective owner
