Python 3-compatible HTML to text converter preserving basic structure under permissive licence?

Posted by hawk64 on Stack Overflow
Published on 2009-09-14T12:23:52Z Indexed on 2010/04/02 11:03 UTC

I am looking for a relatively simple HTML to text converter which displays links and works on strings.

So far I have tried:

  • lynx, but it is too slow,
  • html2text, which produces odd, verbose Markdown output and is licensed under GPLv3, which is too restrictive for my (BSD-licensed) project,
  • http://effbot.org/librarybook/formatter-example-3.py, which uses htmllib.HTMLParser with formatter.AbstractFormatter and a custom writer; however, htmllib.HTMLParser is deprecated and has been removed from Python 3.

So is there any simple, performant, Python 3-compatible HTML to text converter under a permissive license such as MIT/BSD/Apache and the like?

Edit: I don't just need something that strips HTML tags; it should also preserve the basic structure of the HTML, that is, produce output that somewhat resembles that of Lynx.
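
To make the requirement concrete, here is a rough sketch of the kind of behaviour meant, built on html.parser.HTMLParser from the Python 3 standard library. The names TextExtractor, html_to_text, BLOCK_TAGS and SKIP_TAGS, the choice of block tags and the "[n]" link numbering are illustrative assumptions, not an existing library. It works on strings, puts block elements on their own lines and lists link targets at the end, but it is far from a full Lynx-like renderer.

```python
import re
from html.parser import HTMLParser

# Illustrative assumption: which tags force a line break and which are dropped.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "tr", "table",
              "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "pre"}
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects text, breaking lines at block tags and numbering links."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []    # href targets, in order of appearance
        self._out = []     # output fragments
        self._hrefs = []   # stack of hrefs for currently open <a> tags
        self._skip = 0     # nesting depth inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._hrefs.append(dict(attrs).get("href"))
        elif tag in BLOCK_TAGS:
            self._out.append("\n")
            if tag == "li":
                self._out.append("* ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if href:
                # Mark the link text with a reference number.
                self.links.append(href)
                self._out.append("[%d]" % len(self.links))
        elif tag in BLOCK_TAGS:
            self._out.append("\n")

    def handle_data(self, data):
        if not self._skip:
            # Collapse runs of whitespace, as a browser would.
            self._out.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    """Convert an HTML string to plain text with a trailing link list."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser._out).split("\n")]
    body = "\n".join(line for line in lines if line)
    if parser.links:
        refs = "\n".join("[%d] %s" % (i, href)
                         for i, href in enumerate(parser.links, 1))
        body += "\n\n" + refs
    return body


sample = ('<h1>Title</h1><p>See <a href="http://example.com">this  link</a>'
          ' now.</p><ul><li>one</li><li>two</li></ul><script>x=1</script>')
print(html_to_text(sample))
```

This sketch drops blank lines entirely and ignores tables, nested list indentation and line wrapping, which a real converter would need to handle.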

© Stack Overflow or respective owner
