Transform PDF to HTML, keep layout

Posted by Tgr on Stack Overflow See other posts from Stack Overflow or by Tgr
Published on 2010-05-08T13:36:29Z Indexed on 2010/05/08 13:58 UTC
Read the original article Hit count: 324

Filed under:

pdf

|

html

|

conversion

What methods are there to transform a PDF to HTML? It could be anything - online service, software, library. (Opensource preferred. In the last case, php or python would be preferred.) It has to keep the original layout (including page numbers, footnotes and such), keep the images (combining them to one single background image per page is acceptable) and keep the links. It should preferably output valid XHTML and clean up PDF features such as ligatures, but if there is some post-processing required, I can live with that. Something with a clean, relatively semantic HTML output would be great.

The closest one I found was zamzar.org, but it choked on links. (Also, the HTML output is an ugly heap of absolutely positioned divs and needs post-processing because of encoding problems.)

© Stack Overflow or respective owner

Related posts about pdf

PDF Converter wanted: Convert 8.5*11 PDF images into 600*800px PDF images for the Nook

as seen on Super User - Search for 'Super User'
I have PDF files that are maritime charts, For example this one from the Delaware Bay http://ocsdata.ncd.noaa.gov/BookletChart/12304_BookletChart_HomeEd.pdf There is a lot of detailed information in the image. When I show them on a monitor the details are shown. When I put them on the Nook they… >>> More
Loop through values and display in a pdf file

as seen on Stack Overflow - Search for 'Stack Overflow'
Hello all! i have written the following code: As you can see there is a for loop to go through some values and display them in the generated pdf. The problem is that all the values are being written at the same place. I have tried to insert a new line but it does not seem to work. Can anyone suggest… >>> More
convert scanned images pdf file to searchable pdf file

as seen on Super User - Search for 'Super User'
I have a pdf of a scanned book. I'm looking for a free software that will perform ocr and then provide an option to save it as pdf/doc. Is there one? Thanks. >>> More
Integrate Nitro PDF Reader with Windows 7

as seen on How to geek - Search for 'How to geek'
Would you like a lightweight PDF reader that integrates nicely with Office and Windows 7? Here we look at the new Nitro PDF Reader, a nice PDF viewer that also lets you create and markup PDF files. Adobe Reader is the de-facto PDF viewer, but it only lets you view PDFs and not much else. … >>> More
foxpro to pdf and pdf to foxpro

as seen on Stack Overflow - Search for 'Stack Overflow'
how will i convert pdf database converted from foxpro back to foxpro >>> More

Related posts about html

Install usblib package - Ubuntu

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I need the package libusb for another package I am installing. I tried the following which seemed to install the package, sudo apt-get install libusb-dev but when I try to install the other package I get, configure: error: Package requirements (libusb-1.0 >= 0.9.1) were not met: No package… >>> More
Prevent malicious vulnerability scan increasing load on a server

as seen on Server Fault - Search for 'Server Fault'
Hi all, this week we have been suffering some malicious vulnerability scans to our servers, increasing the load on them, making them nearly unusable. The attack is easy to defend, just blocking the offending ip, but only after discovering it. Is there any form of prevent it? Is it normal that… >>> More
can't install psycopg2 in my env on mac os x lion

as seen on Server Fault - Search for 'Server Fault'
I tried install psycopg2 via pip in my virtual env, but got this error: ld: library not found for -lpq (full log here: http://pastebin.com/XdmGyJ4u ) I tried install postgres 9.1 from .dmg and via port, (gksks)iMac-Alexander:~ lorddaedra$ locate libpq /Developer/SDKs/MacOSX10.7.sdk/usr/include/libpq /Developer/SDKs/MacOSX10… >>> More
Bitnami redmine error SVN

as seen on Server Fault - Search for 'Server Fault'
I'm installing the Bitnami Redmine stack (redmine + subversion). Firstly I install configure and test it locally (Ubuntu 14.04 LTS). And everything is OK. I install Bitnami stack on server (Red Hat 4.4.7-4) and configure SVN. I commit files into SVN and connect project into Redmine with SVN repository… >>> More
Can the .htaccess file slow down a website to a crawl? If so, are there better ways to solve these problems with different rewrite rules and such?

as seen on Pro Webmasters - Search for 'Pro Webmasters'
here is my htaccess file...... RewriteCond %{REQUEST_URI} ^/patients/billing/FAQ_billing\.html$ [OR] RewriteCond %{REQUEST_URI} ^/patients/billing/getintouch\.html$ RewriteRule ^patients/billing/(.*)\.html$ $1.php [L,NC] RewriteCond %{REQUEST_URI} ^/patients/findadoctor/a\.html$ [OR] RewriteCond… >>> More