Killing HTML nodes from shell

Posted by hendry on Stack Overflow
Published on 2010-05-03T11:12:13Z Indexed on 2010/05/03 12:08 UTC

I need a way to kill nodes like <footer>foobar</footer> and <div class="nav"></div> in many HTML files.

I want to dump a site to disk without the menus, footers and whatnot. Ideally I would accomplish this using basic Unix tools like sed. Since the pages aren't well-formed XML, I can't use xmlstarlet.

Could anyone please suggest recipes? Ideally I would have a script that I can run as kill-node.sh 'div class="toplinks"' *.html to prune the bits I don't want. Thank you.
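To make the requested behaviour concrete, here is a minimal sketch of one possible approach. It swaps sed for Python's standard-library html.parser, because a line-oriented regex cannot track nested elements such as a div inside a div. Everything named here (NodePruner, prune, prune_files, the kill_node.py filename) is hypothetical and not from the question. The sketch assumes the spec is a tag name followed by optional attr="value" pairs, that attribute values must match exactly (class="nav" does not match class="nav top"), and that the files are UTF-8.

```python
import shlex
import sys
from html.parser import HTMLParser


class NodePruner(HTMLParser):
    """Re-emit HTML as parsed, dropping elements that match one tag/attribute spec."""

    def __init__(self, spec):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        # 'div class="nav"' -> ['div', 'class=nav'] (shlex strips the quotes)
        name, *pairs = shlex.split(spec)
        self.tag = name.lower()
        self.attrs = {k.lower(): v for k, _, v in (p.partition("=") for p in pairs)}
        self.depth = 0  # > 0 while inside an element that is being dropped
        self.out = []

    def _matches(self, tag, attrs):
        found = dict(attrs)
        return tag == self.tag and all(found.get(k) == v for k, v in self.attrs.items())

    def _emit(self, text):
        if self.depth == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Only count tags with the target name, so void tags such as
            # <br> inside the dropped element cannot unbalance the counter.
            if tag == self.tag:
                self.depth += 1
        elif self._matches(tag, attrs):
            self.depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == self.tag:
                self.depth -= 1
        else:
            self.out.append(f"</{tag}>")  # note: tag name comes back lowercased

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <br/>; a matching one is simply dropped.
        if self.depth == 0 and not self._matches(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def prune(html, spec):
    """Return html with every element matching spec removed."""
    parser = NodePruner(spec)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


def prune_files(spec, paths):
    """Rewrite each file in place (assumes UTF-8; back up the files first)."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            html = fh.read()
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(prune(html, spec))


if __name__ == "__main__" and len(sys.argv) > 2:
    prune_files(sys.argv[1], sys.argv[2:])
```

Saved under a hypothetical name such as kill_node.py, this could be run much like the script described above: python kill_node.py 'div class="toplinks"' *.html. It overwrites the files it is given, so it is safer to try it on a copy of the dump first.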

© Stack Overflow or respective owner
