Nokogiri changing custom elements

Posted by dagda1 on Stack Overflow
Published on 2011-02-06T07:22:44Z Indexed on 2011/02/06 7:25 UTC

Hi,

I have sample HTML that I have marked up with some special tags that will be used by a different program. An example of the HTML is below; note the <START:organization>..<END> elements.

<html>
<head/>
<body>
  <ul>
    <li> <START:organization> Advanced Integrated Pest Management <END> </li>
    <li> <START:organization> American Bakers Association <END> </li>
  </ul>
</body>
</html>

I wanted to use Nokogiri to preprocess the HTML so that I could easily remove irrelevant tags like <script>. I created the following extension to the Nokogiri document class:

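require 'nokogiri'

# Reopen Nokogiri's HTML document class to add a cleanup helper.
module Nokogiri
  module HTML
    class Document
      def prepare_html
        # Drop every <script> element from the document.
        xpath("//script").remove
        # remove_new_lines is not a core String method; it is assumed to be
        # a custom String extension defined elsewhere.
        to_html.remove_new_lines
      end
    end
  end
end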

The problem is that Nokogiri is changing the <START:organization> element to <organization>.

Is there any way that I can preserve the HTML so that my custom markup tags are kept intact?

Thanks

Paul
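
One possible workaround, sketched here under assumptions rather than taken from Nokogiri's API, is to hide the custom markers from the parser before parsing and put them back afterwards. The helper name with_protected_markers, the MARKER pattern, and the @@MARKERn@@ token format are all hypothetical; the sketch also assumes the token text does not otherwise occur in the document and that the parser passes plain text through unchanged.

```ruby
# Hypothetical pattern for the custom markers used in the sample HTML.
MARKER = /<START:[^>]+>|<END>/

# Hypothetical helper: swaps each marker for a plain-text token, yields the
# masked HTML to the caller's block, then restores the original markers in
# whatever string the block returns.
def with_protected_markers(html)
  saved = []
  masked = html.gsub(MARKER) do |marker|
    saved << marker
    "@@MARKER#{saved.size - 1}@@"
  end
  processed = yield(masked)
  processed.gsub(/@@MARKER(\d+)@@/) { saved[Regexp.last_match(1).to_i] }
end
```

With the extension above, the call might look like with_protected_markers(html) { |safe| Nokogiri::HTML(safe).prepare_html }, so that the parser only ever sees ordinary text where the markers were.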

© Stack Overflow or respective owner
