Python 3-compatible HTML to text converter preserving basic structure under a permissive licence?
- by hawk64
I am looking for a relatively simple HTML to text converter which displays links and works on strings.
So far I have tried:
lynx, but its performance is too poor;
html2text, which gives weird and verbose Markdown output and is under GPLv3, which is too restrictive for my (BSD-licensed) project;
http://effbot.org/librarybook/formatter-example-3.py, which uses htmllib.HTMLParser with formatter.AbstractFormatter and a custom writer. However, htmllib.HTMLParser is deprecated and has been removed from Python 3.
So is there any simple, performant, Python 3-compatible HTML to text converter under a permissive license such as MIT/BSD/Apache and the like?
Edit:
I don't just need something to strip HTML tags; it should also preserve the basic structure of the HTML, that is, produce output that somewhat resembles that of Lynx.
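
To illustrate the kind of output meant here, below is a rough sketch built only on html.parser.HTMLParser from the Python 3 standard library. It is not an existing library: the names TextRenderer and html_to_text and the set of tags treated as block-level are made up for this example, and it only covers headings, paragraphs, list items, and numbered link references in the style of Lynx.

```python
import re
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Hypothetical minimal HTML-to-text renderer (illustrative only)."""

    # Assumption: these tags start and end a line; real HTML has many more.
    BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "h4", "h5", "h6",
                  "ul", "ol", "li", "tr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []   # output fragments, joined at the end
        self.links = []   # collected hrefs, numbered from 1
        self.skip = 0     # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag == "li":
            self.parts.append("\n* ")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
                # Put a numbered marker in front of the link text
                self.parts.append("[%d]" % len(self.links))

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag in self.BLOCK_TAGS and tag != "br":
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip:
            # Collapse runs of whitespace into single spaces
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    """Convert an HTML string to plain text with a list of link targets."""
    renderer = TextRenderer()
    renderer.feed(html)
    renderer.close()
    lines = [line.strip() for line in "".join(renderer.parts).split("\n")]
    body = "\n".join(line for line in lines if line)
    if renderer.links:
        refs = "\n".join("[%d] %s" % (i, url)
                         for i, url in enumerate(renderer.links, 1))
        body += "\n\nReferences\n" + refs
    return body


if __name__ == "__main__":
    sample = ('<h1>Title</h1><p>See <a href="http://example.com">this page</a>'
              ' now.</p><ul><li>one</li><li>two</li></ul>')
    print(html_to_text(sample))
```

It works on strings and keeps links, but it ignores tables, nested list indentation, and preformatted text, so it only shows the shape of the output I am after.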