How do I convert PDF to HTML programmatically?
Posted by SoaperGEM on Stack Overflow
Published on 2010-03-25T22:17:33Z
Indexed on 2010/03/25 22:23 UTC
Are there any classes, COM objects, command-line utilities, or anything else I could wrap in an API that can convert a PDF to an HTML document? Obviously the conversion might be a little rough, since PDFs can contain a lot more than HTML can describe. I found a utility called pdftohtml on SourceForge, but quite honestly it does a horrible job with the conversion.
I don't care whether the software is free or commercial. Is there anything out there that I can incorporate into my own software to do this sort of conversion at least decently? I know Google has developed its own method of doing this, since you can click "View as HTML" on a PDF attached to an email in Gmail, but I was hoping there was something available to the public.
Remember, PDF to HTML. I'm NOT worried about HTML to PDF.
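To make "something I can make an API for" concrete, here is a minimal sketch of the kind of wrapper I have in mind around a command-line converter. It assumes a tool named `pdftohtml` is on the PATH and that it accepts a `-noframes` option followed by the input PDF and an output name; both are assumptions about the installed tool, and the function names `build_command` and `convert` are made up for illustration. Any other converter could be swapped in through the `converter` and `extra_args` parameters.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, out_base, converter="pdftohtml", extra_args=("-noframes",)):
    """Build the argument list for a command-line PDF-to-HTML converter.

    The default tool name and flag are assumptions; adjust them to match
    whatever converter is actually installed.
    """
    return [converter, *extra_args, str(pdf_path), str(out_base)]


def convert(pdf_path, out_base, converter="pdftohtml", extra_args=("-noframes",)):
    """Run the converter and return the command that was executed.

    Raises FileNotFoundError if the converter or the input PDF is missing,
    and subprocess.CalledProcessError if the converter exits with an error.
    """
    if shutil.which(converter) is None:
        raise FileNotFoundError(f"{converter} not found on PATH")
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(f"no such PDF: {pdf_path}")

    cmd = build_command(pdf_path, out_base, converter, extra_args)
    # check=True turns a non-zero exit status into an exception.
    subprocess.run(cmd, check=True, capture_output=True, text=True)
    return cmd
```

Calling `convert("report.pdf", "report")` would then run the tool once per document; the exact name of the HTML file it writes depends on the converter, so the sketch does not try to guess it.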
© Stack Overflow or respective owner