How do I convert PDF to HTML programmatically?

Posted by SoaperGEM on Stack Overflow
Published on 2010-03-25T22:17:33Z Indexed on 2010/03/25 22:23 UTC

Are there any classes, COM objects, command-line utilities, or anything else I could wrap in an API to convert a PDF to an HTML document? Obviously the conversion might be a little rough, since PDFs can contain a lot more than HTML can describe. I found a utility called pdftohtml on SourceForge, but quite honestly it does a horrible job with the conversion. I don't care whether the software is free or commercial, but is there anything out there that I can incorporate into my own software to do this sort of conversion at least decently? I know Google has developed its own method of doing this, since you can click "View as HTML" on a PDF attached to an email in Gmail, but I was hoping there was something available to the public.

Remember, PDF to HTML. I'm NOT worried about HTML to PDF.
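
To make the "something I can make an API for" part concrete, here is a rough sketch of how a command-line converter could be wrapped from code. It uses pdftohtml only because that is the tool named above, and it says nothing about output quality. Assumptions: the `pdftohtml` binary is on the PATH and accepts the `-noframes` and `-c` flags (as the Poppler build does), and the function names `build_command` and `pdf_to_html` are made up for illustration. Because output file naming differs between modes and versions, the sketch just lists whatever `.html` files end up in the output directory.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, out_base, complex_layout=True):
    """Return the argument list for a pdftohtml call (nothing is executed here)."""
    cmd = ["pdftohtml", "-noframes"]  # -noframes: no frameset wrapper page
    if complex_layout:
        cmd.append("-c")  # -c: "complex" mode, tries to keep the page layout
    cmd += [str(pdf_path), str(out_base)]
    return cmd


def pdf_to_html(pdf_path, out_dir, complex_layout=True):
    """Run the converter and return the HTML files found in out_dir afterwards."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_base = out_dir / Path(pdf_path).stem
    # check=True raises CalledProcessError if the tool exits with an error
    subprocess.run(build_command(pdf_path, out_base, complex_layout), check=True)
    return sorted(out_dir.glob("*.html"))
```

Swapping in a different converter, free or commercial, should only mean changing `build_command`; the rest of the wrapper stays the same.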

© Stack Overflow or respective owner
