Converting Creole to HTML, PDF, DOCX, ..
Posted by Marko Apfel on Geeks with Blogs
Published on Fri, 24 Aug 2012 11:14:00 GMT
Challenge
We documented a project on Github using the Wiki there. For most articles we used Creole as the markup language. Now we have to deliver a lot of the content to our client in a common format like PDF or DOCX.
So we need an automated way to extract all relevant content, merge it together and convert it to a new format.
Problem
One of the most popular toolsets for converting between formats is Pandoc. Unfortunately, at the time of writing, Pandoc does not support Creole (see the conversion matrix).
Approach
So we need an intermediate step: converting from Creole to a format that Pandoc supports.
Creole/c is a Creole-to-Html converter and does exactly what we need. After converting our Creole content to Html, we can use Pandoc for all subsequent tasks.
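For a single page, the two-step idea could be sketched roughly as follows. The file name page.creole is hypothetical, the tool paths follow the layout used in the batch file later in this post, and it is an assumption that creole.exe writes its result to standard output when called without -b (the script below only shows it doing so with -b).

```batch
REM Step 1: Creole to Html with creole/c.
REM Assumption: the converted Html is written to standard output.
.\Creole-C\bin\creole.exe page.creole > page.html

REM Step 2: Html to the target format with Pandoc.
REM Pandoc derives the formats from the file suffixes.
.\Pandoc\bin\pandoc.exe -o page.docx page.html
```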
Solution
Getting the Creole stuff
First of all, we need the Creole content on our local machines.
This is easy: because the Github Wiki is itself a Git repository, we can clone it to our machine. In the working copy we now see all the files, and the suffix of each file indicates its markup language.
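As a rough sketch, cloning the wiki could look like this. Github serves the wiki of a repository as a separate Git repository whose URL ends in .wiki.git. USER and REPOSITORY are placeholders, not the names of our project, and the target folder name wiki is an arbitrary choice.

```batch
REM Placeholders: replace USER and REPOSITORY with the real names.
SET "WIKI_URL=https://github.com/USER/REPOSITORY.wiki.git"

REM Clone the wiki into a local folder named "wiki".
git clone %WIKI_URL% wiki

REM List the Creole pages; the .creole suffix identifies the markup language.
DIR /B wiki\*.creole
```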
Converting and Merging Creole content to Html
Because we want the content of several Creole files in one Html file, we have to convert and merge all the input files into one output file.
Creole/c has an option (-b) to generate only the Html that goes inside the <body> tag. This is our hook.
We manually create the leading Html tags (<html>, <head>, ..), then append all the needed Creole content to our output file, and finally add the closing tags.
This can be done in a straightforward way with a little bit of old DOS magic:
REM === Generate the intro tags ===
ECHO ^<html^> > %TMP%\output.html
ECHO ^<head^> >> %TMP%\output.html
ECHO ^<meta name="generator" content="creole/c"^> >> %TMP%\output.html
ECHO ^</head^> >> %TMP%\output.html
ECHO ^<body^> >> %TMP%\output.html

REM === Mix in all interesting Creole stuff with creole/c ===
.\Creole-C\bin\creole.exe -b .\..\datamodel+overview.creole >> %TMP%\output.html
.\Creole-C\bin\creole.exe -b .\..\datamodel+domain+CvdCaptureMode.creole >> %TMP%\output.html
.\Creole-C\bin\creole.exe -b .\..\datamodel+domain+CvdDamageReducingActivity.creole >> %TMP%\output.html
.\Creole-C\bin\creole.exe -b .\..\datamodel+lookup+IncidentDamageCodes.creole >> %TMP%\output.html
.\Creole-C\bin\creole.exe -b .\..\datamodel+table+Attachments.creole >> %TMP%\output.html
.\Creole-C\bin\creole.exe -b .\..\datamodel+table+TrafficLights.creole >> %TMP%\output.html

REM === Generate the outro tags ===
ECHO ^</body^> >> %TMP%\output.html
ECHO ^</html^> >> %TMP%\output.html

REM === Convert the Html file to Docx with Pandoc ===
.\Pandoc\bin\pandoc.exe -o .\Database-Schema.docx %TMP%\output.html
Some explanation for this
- The first ECHO call creates the file. To do so, the opening <html> tag is sent via > to a temporary working file. All following calls append content to the existing file via >>.
- The tag characters < and > must be escaped. This is done with the caret sign (^).
- We use a file in the default temporary folder (%TMP%) to avoid writing in our current folders (better for continuous integration).
- Both toolsets (Creole/c and Pandoc) are copied to a versioned tools folder in the Wiki. This is committable and causes no problem after pushing: Github does not do anything with it. The batch file (Export-Docx.bat) for all the steps is also in this folder.
- Pandoc recognizes the conversion by the suffixes of the file names. So it is enough to specify only the input and output files.
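If the list of pages grows, a wildcard loop might be easier to maintain than one line per file. The following is only a sketch of that variant, not the batch file we actually use. It shortens the header to the bare minimum, introduces a hypothetical OUT variable, and names the Pandoc formats explicitly with -f and -t instead of relying on the suffixes.

```batch
@ECHO OFF
REM OUT is a hypothetical helper variable; quoting guards against spaces in %TMP%.
SET "OUT=%TMP%\output.html"

REM Intro tags: the first ECHO creates the file, the second appends to it.
ECHO ^<html^> > "%OUT%"
ECHO ^<body^> >> "%OUT%"

REM Convert every page whose name starts with "datamodel+" and append the result.
FOR %%F IN (.\..\datamodel+*.creole) DO .\Creole-C\bin\creole.exe -b "%%F" >> "%OUT%"

REM Outro tags.
ECHO ^</body^> >> "%OUT%"
ECHO ^</html^> >> "%OUT%"

REM Name the input and output formats explicitly.
.\Pandoc\bin\pandoc.exe -f html -t docx -o .\Database-Schema.docx "%OUT%"
```

Note that the loop picks up every matching file, in the order the file system returns them. If only selected pages should be exported, or if they must appear in a fixed order, the explicit list stays the safer choice.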
© Geeks with Blogs or respective owner