Extract all text from an HTML page without losing context
Posted
by grmbl
on Stack Overflow
Published on 2010-05-07T03:03:33Z
Indexed on
2010/05/07
3:08 UTC
For a translation program, I am trying to extract the text from an HTML file with about 95% accuracy, so that the sentences and link texts can be translated.
For example:
<div><a href="stack">Overflow</a> <span>Texts <b>go</b> here</span></div>
This should give me two results to translate:
Overflow
Texts <b>go</b> here
Are there any suggestions or commercial packages available for this problem?
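One possible way to get the two results from the example is sketched below with Python's standard `html.parser` module. It is only a sketch under assumptions that are not part of the question: the names `SegmentExtractor` and `extract_segments` are hypothetical, and the list of tags treated as inline formatting (`b`, `i`, `em`, and so on) is a guess that would need tuning for real pages. Any other tag is treated as a boundary between segments to translate.

```python
from html.parser import HTMLParser

# Assumption: these tags are formatting inside a sentence, so they stay in the segment.
INLINE_TAGS = {"b", "i", "em", "strong", "u", "sub", "sup"}
# Contents of these tags are not translatable text and are ignored.
SKIPPED_TAGS = {"script", "style"}


class SegmentExtractor(HTMLParser):
    """Collects translatable segments; every non-inline tag ends the current segment."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._buffer = []        # pieces of the segment being built
        self._has_text = False   # True once the buffer holds non-whitespace text
        self._skip_depth = 0     # > 0 while inside <script> or <style>

    def flush(self):
        # Store the buffered segment if it contains real text, then reset.
        segment = "".join(self._buffer).strip()
        if segment and self._has_text:
            self.segments.append(segment)
        self._buffer = []
        self._has_text = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.flush()
            self._skip_depth += 1
        elif tag in INLINE_TAGS:
            # Keep the inline tag exactly as written in the source.
            self._buffer.append(self.get_starttag_text())
        else:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in INLINE_TAGS:
            self._buffer.append(f"</{tag}>")
        else:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth:
            return
        # Note: character references arrive already decoded (e.g. &amp; becomes &).
        self._buffer.append(data)
        if data.strip():
            self._has_text = True


def extract_segments(html):
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up trailing text that no closing tag terminated
    return parser.segments


example = '<div><a href="stack">Overflow</a> <span>Texts <b>go</b> here</span></div>'
print(extract_segments(example))
```

For the example above this yields `Overflow` and `Texts <b>go</b> here`. The sketch does not record where each segment came from, so writing translations back into the page would need extra bookkeeping, and attributes such as `alt` or `title` are not extracted.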
© Stack Overflow or respective owner