How to remove all text nodes and preserve only the structure of an HTML page with Nokogiri
Posted by user58948 on Stack Overflow
Published on 2010-12-25T10:34:44Z
Indexed on 2010/12/25 10:54 UTC
nokogiri
I want to remove all text from an HTML page that I load with Nokogiri. For example, if a page has the following:
<body><script>var x = 10;</script><div>Hello</div><div><h1>Hi</h1></div></body>
I want to process it with Nokogiri and get back HTML like the following, with the text stripped:
<body><script>var x = 10;</script><div></div><div><h1></h1></div></body>
(That is, remove the actual h1 text, the text between divs, the text in p elements, etc., but keep the tags. Also, don't remove the text inside script tags.)
How can I do that?
© Stack Overflow or respective owner