Nokogiri changing custom elements
Posted by dagda1 on Stack Overflow
Published on 2011-02-06T07:22:44Z
Hi,
I have sample HTML that I have marked up with some special tags that will be used by a different program. An example of the HTML is below; note the <START:organization>..<END> elements.
<html>
  <head/>
  <body>
    <ul>
      <li> <START:organization> Advanced Integrated Pest Management <END> </li>
      <li> <START:organization> American Bakers Association <END> </li>
    </ul>
  </body>
</html>
I wanted to use Nokogiri to preprocess the HTML so I could easily remove irrelevant tags like <script>. I created the following extension to the Nokogiri document class:
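require 'nokogiri'

module Nokogiri
  module HTML
    class Document
      # Strip <script> elements, then serialize the document back to HTML.
      def prepare_html
        xpath("//script").remove
        # remove_new_lines is not a core String method; presumably a custom extension defined elsewhere
        to_html.remove_new_lines
      end
    end
  end
end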
The problem is that Nokogiri is changing the <START:organization> element to <organization>.
Is there any way that I can preserve the HTML so that my custom markup tags are maintained?
Thanks
Paul
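
One possible workaround, sketched here under assumptions rather than confirmed against Nokogiri itself: hide the custom markers behind HTML comment placeholders before parsing, then swap them back after serializing. The helper names `protect_markers` and `restore_markers`, the `MARKER` pattern, and the placeholder format are all hypothetical and not part of Nokogiri. The sketch assumes the parser leaves HTML comments untouched and that the markers always look like <START:...> and <END>.

```ruby
# Hypothetical helpers (not part of Nokogiri): hide the custom markers
# behind HTML comments before parsing, then restore them afterwards.
# Assumption: markers always have the form <START:name> or <END>.
MARKER = /<START:[^>]+>|<END>/

# Replace each custom marker with a numbered HTML comment placeholder.
# Returns the protected HTML and the list of original markers.
def protect_markers(html)
  markers = []
  protected_html = html.gsub(MARKER) do |marker|
    markers << marker
    "<!--marker-#{markers.size - 1}-->"
  end
  [protected_html, markers]
end

# Put the original markers back in place of the numbered placeholders.
def restore_markers(html, markers)
  html.gsub(/<!--marker-(\d+)-->/) { markers[Regexp.last_match(1).to_i] }
end

html = '<li> <START:organization> American Bakers Association <END> </li>'
protected_html, markers = protect_markers(html)
# At this point protected_html would be parsed, cleaned with prepare_html,
# and serialized; that step is omitted so the sketch runs without Nokogiri.
restored = restore_markers(protected_html, markers)
```

In use, `restore_markers` would be called on the string returned by `prepare_html` instead of on `protected_html` directly. Whether the placeholders survive intact depends on the parser preserving comments, so it is worth checking on a small sample first.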