Convert doc/docx to semantic HTML
Posted by sandstrom on Stack Overflow, published 2009-08-26T15:06:56Z
I would like to convert doc/docx documents to semantic HTML.
Some wishes/requirements:
• Semantic HTML, such that headings in the document become <h1>, <h2>, etc., tables become <table>, and so forth.
• It should preferably handle headings, lists, tables and images. Graphs and math formulas are a nice extra.
• It doesn't have to convert straight from doc/docx to HTML; it could go through an intermediary format, such as XML or DocBook.
• It should work programmatically and with a large number of documents.
The closest thing to a solution I've found so far is http://holloway.co.nz/docvert/index.html, but unfortunately it has quite a few bugs and a small user base, and it can't handle many documents. It's more of a proof of concept.