Transform PDF to HTML, keep layout
Posted
by Tgr
on Stack Overflow
Published on 2010-05-08T13:36:29Z
What methods are there to transform a PDF to HTML? It could be anything: an online service, software, or a library. (Open source is preferred; for a library, PHP or Python would be preferred.) It has to keep the original layout (including page numbers, footnotes and such), keep the images (combining them into a single background image per page is acceptable) and keep the links. It should preferably output valid XHTML and clean up PDF artifacts such as ligatures, but if some post-processing is required, I can live with that. Something with clean, relatively semantic HTML output would be great.
The closest I have found is zamzar.org, but it choked on links. (Also, its HTML output is an ugly heap of absolutely positioned divs and needs post-processing because of encoding problems.)
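To illustrate the kind of post-processing I could live with, here is a minimal Python sketch of the ligature clean-up step. It assumes the converter leaves Unicode ligature characters (U+FB00 to U+FB06, such as "ﬁ") in the extracted text; the function name `expand_ligatures` is made up for this example and does not come from any particular tool.

```python
import unicodedata

# Map each Latin ligature code point (U+FB00..U+FB06: ff, fi, fl, ffi, ffl,
# long s-t, st) to its plain-letter compatibility decomposition.
LIGATURES = {
    cp: unicodedata.normalize("NFKD", chr(cp))
    for cp in range(0xFB00, 0xFB07)
}


def expand_ligatures(text: str) -> str:
    """Replace Latin ligature characters with their component letters."""
    return text.translate(LIGATURES)


if __name__ == "__main__":
    # "\ufb01" is the "fi" ligature, "\ufb03" is the "ffi" ligature.
    print(expand_ligatures("\ufb01nal o\ufb03ce"))
```

Only the ligature block is touched, so other characters (non-breaking spaces, superscripts and the like) pass through unchanged, which would not be the case if the whole text were run through NFKC normalization.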
© Stack Overflow or respective owner