Python 3-compatible HTML-to-text converter preserving basic structure, under a permissive licence?
Posted by hawk64 on Stack Overflow
Published on 2009-09-14T12:23:52Z
Tags: python
I am looking for a relatively simple HTML-to-text converter that displays links and works on strings.
So far I have tried:
- lynx, but its performance is too poor;
- html2text, which produces odd, verbose Markdown output and is licensed under GPLv3, which is too restrictive for my (BSD-licensed) project;
- http://effbot.org/librarybook/formatter-example-3.py, which uses htmllib.HTMLParser with formatter.AbstractFormatter and a custom writer; however, htmllib.HTMLParser is deprecated and has been removed from Python 3.
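For comparison, the idea behind the effbot example can be approximated on Python 3 with the standard library's `html.parser.HTMLParser`, which ships with Python under the permissive PSF license. This is only a minimal sketch under stated assumptions: the set of block-level tags is hand-picked, links are rendered inline as `text <url>`, and `HTMLToText`, `BLOCK_TAGS` and `html_to_text` are hypothetical names. Tables, `<pre>` whitespace and nested-list indentation are not handled.

```python
from html.parser import HTMLParser

# Hand-picked block-level tags (an assumption, not a complete list):
# each one forces a line break before and after its content.
BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "h4", "h5", "h6",
              "ul", "ol", "li", "table", "tr", "pre", "blockquote"}


class HTMLToText(HTMLParser):
    """Hypothetical minimal converter: one line per block, links inline."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive decoded
        self.lines = [""]     # output lines; the last one is still being built
        self.href_stack = []  # href of each currently open <a> (may be None)
        self.skip = 0         # nesting depth inside <script>/<style>

    def _newline(self):
        # Start a new line only if the current one already has text.
        if self.lines[-1].strip():
            self.lines.append("")

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in BLOCK_TAGS:
            self._newline()
        elif tag == "a":
            self.href_stack.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag in BLOCK_TAGS:
            self._newline()
        elif tag == "a" and self.href_stack:
            href = self.href_stack.pop()
            if href:
                self.lines[-1] += f" <{href}>"  # show the link target inline

    def handle_data(self, data):
        if not self.skip:
            self.lines[-1] += data

    def get_text(self):
        # Collapse whitespace runs (also inside <pre>, a known limitation)
        # and drop empty lines.
        lines = (" ".join(line.split()) for line in self.lines)
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    parser = HTMLToText()
    parser.feed(html)
    parser.close()  # flush any buffered data
    return parser.get_text()


print(html_to_text('<p>Hello <b>world</b></p>'
                   '<p>See <a href="http://example.com">this page</a>.</p>'))
# Hello world
# See this page <http://example.com>.
```

Because it only relies on the stdlib parser callbacks, it works directly on strings and avoids spawning an external process the way a lynx-based solution would.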
So is there a simple, fast, Python 3-compatible HTML-to-text converter under a permissive license such as MIT, BSD or Apache?
Edit: I don't just need something that strips HTML tags; it should also preserve the basic structure of the HTML, i.e. produce output that somewhat resembles that of Lynx.
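To go beyond tag stripping toward Lynx-like output, the same stdlib parser can number links and append a reference list, roughly imitating the layout of `lynx -dump` (link text prefixed with `[n]`, URLs listed under a "References" heading), and render list items with bullets or numbers. This is a sketch under assumptions, not a reproduction of Lynx's exact formatting; `LynxStyleText`, `BLOCK_TAGS` and `html_to_lynx_text` are hypothetical names, and the block-tag set is hand-picked.

```python
from html.parser import HTMLParser

# Hand-picked block-level tags (an assumption, not a complete list).
BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "h4", "h5", "h6",
              "ul", "ol", "li", "table", "tr", "blockquote"}


class LynxStyleText(HTMLParser):
    """Hypothetical converter: numbered links plus a trailing reference list."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = [""]     # output lines; the last one is still being built
        self.links = []       # collected hrefs; position + 1 is the link number
        self.list_stack = []  # per open list: None for <ul>, item counter for <ol>
        self.skip = 0         # nesting depth inside <script>/<style>

    def _newline(self):
        if self.lines[-1].strip():
            self.lines.append("")

    def _add_marker(self):
        # "1. ", "2. ", ... inside <ol>; "* " inside <ul> or a stray <li>.
        if self.list_stack and self.list_stack[-1] is not None:
            self.list_stack[-1] += 1
            self.lines[-1] += f"{self.list_stack[-1]}. "
        else:
            self.lines[-1] += "* "

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
            return
        if tag in BLOCK_TAGS:
            self._newline()
        if tag == "ul":
            self.list_stack.append(None)
        elif tag == "ol":
            self.list_stack.append(0)
        elif tag == "li":
            self._add_marker()
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
                self.lines[-1] += f"[{len(self.links)}]"  # Lynx-style marker

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
            return
        if tag in BLOCK_TAGS:
            self._newline()
        if tag in ("ul", "ol") and self.list_stack:
            self.list_stack.pop()

    def handle_data(self, data):
        if not self.skip:
            self.lines[-1] += data

    def get_text(self):
        body = (" ".join(line.split()) for line in self.lines)
        out = [line for line in body if line]
        if self.links:
            out += ["", "References", ""]
            out += [f"{i:>4}. {href}" for i, href in enumerate(self.links, 1)]
        return "\n".join(out)


def html_to_lynx_text(html):
    parser = LynxStyleText()
    parser.feed(html)
    parser.close()
    return parser.get_text()
```

For example, `html_to_lynx_text('<p>See <a href="http://example.com/">Example</a>.</p><ul><li>One</li><li>Two</li></ul>')` yields the lines `See [1]Example.`, `* One` and `* Two`, followed by a `References` section listing `1. http://example.com/`. Keeping URLs out of the running text is what makes this closer to Lynx's dump than inline `text <url>` rendering.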
© Stack Overflow or respective owner