Converting Creole to HTML, PDF, DOCX, ..

Posted by Marko Apfel on Geeks with Blogs
Published on Fri, 24 Aug 2012 11:14:00 GMT

Challenge

We documented a project in its GitHub wiki. For most articles we used Creole as the markup language. Now we have to deliver much of this content to our client in a common format such as PDF or DOCX.

So we need an automated way to extract all relevant content, merge it, and convert the result into the new format.

Problem

One of the most popular tools for converting between document formats is Pandoc. Unfortunately, Pandoc does not support Creole (see its conversion matrix).

Approach

So we need an intermediate step: converting from Creole to a format that Pandoc supports.

Creole/c is a Creole-to-HTML converter and does exactly what we need. Once our Creole content is converted to HTML, we can use Pandoc for all the subsequent steps.

Solution

Getting the Creole stuff

First of all, we need the Creole content on our local machines.

This is easy: because a GitHub wiki is itself a Git repository, we can clone it to our machine. The working copy then contains all the wiki pages as files, and each file's suffix (for example .creole) tells us its markup language.
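Picking the Creole pages out of the working copy can be sketched in a few lines of Python. This is a minimal sketch, assuming the wiki has already been cloned into a local folder (GitHub wikis can be cloned from a URL of the form https://github.com/USER/REPO.wiki.git) and that all pages sit at the top level of that folder; the function name find_creole_pages is hypothetical.

```python
from pathlib import Path


def find_creole_pages(wiki_dir: str) -> list[Path]:
    """Return the Creole pages in a cloned wiki working copy, sorted by name.

    The markup language is taken from the file suffix, so only files ending
    in .creole are selected. Only the top level is scanned (an assumption);
    the .git folder and pages in other markup languages are skipped.
    """
    root = Path(wiki_dir)
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix == ".creole")
```

Called on the root of the working copy, it returns paths such as datamodel+overview.creole, which can then be reviewed and put into the order the final document needs.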

Converting and Merging Creole Content to HTML

Because we want the content of several Creole files in a single HTML file, we have to convert all the input files and merge them into one output file.

Creole/c has an option (-b) that generates only the HTML belonging inside the <body> element. This is our hook.

We have to write the leading HTML tags (<html>, <head>, ...) ourselves, then append all the needed Creole content to our output file, and finally add the closing tags.

This can be done straightforwardly with a bit of old DOS magic:

@ECHO OFF

REM === Generate the intro tags ===
REM The output path is quoted because the temp folder path may contain spaces.

ECHO ^<html^> > "%TMP%\output.html"
ECHO ^<head^> >> "%TMP%\output.html"
ECHO ^<meta name="generator" content="creole/c"^> >> "%TMP%\output.html"
ECHO ^</head^> >> "%TMP%\output.html"
ECHO ^<body^> >> "%TMP%\output.html"

REM === Mix in all interesting Creole stuff with creole/c ===

.\Creole-C\bin\creole.exe -b .\..\datamodel+overview.creole >> "%TMP%\output.html"

.\Creole-C\bin\creole.exe -b .\..\datamodel+domain+CvdCaptureMode.creole >> "%TMP%\output.html"
.\Creole-C\bin\creole.exe -b .\..\datamodel+domain+CvdDamageReducingActivity.creole >> "%TMP%\output.html"

.\Creole-C\bin\creole.exe -b .\..\datamodel+lookup+IncidentDamageCodes.creole >> "%TMP%\output.html"

.\Creole-C\bin\creole.exe -b .\..\datamodel+table+Attachments.creole >> "%TMP%\output.html"
.\Creole-C\bin\creole.exe -b .\..\datamodel+table+TrafficLights.creole >> "%TMP%\output.html"

REM === Generate the outro tags ===
ECHO ^</body^> >> "%TMP%\output.html"
ECHO ^</html^> >> "%TMP%\output.html"

REM === Convert the HTML file to DOCX with Pandoc ===

.\Pandoc\bin\pandoc.exe -o .\Database-Schema.docx "%TMP%\output.html"
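The same pipeline can also be sketched in Python, where a loop over a page list replaces the repeated creole.exe lines. This is not the author's script, only a minimal sketch under stated assumptions: creole.exe and pandoc.exe sit at the same relative paths as in the batch file, creole.exe writes its output in the system's default text encoding, and only the -b and -o options used above are passed. The names creole_command, merge_html and export_docx are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path

# Tool locations as in the batch file (relative to the tools folder).
CREOLE = r".\Creole-C\bin\creole.exe"
PANDOC = r".\Pandoc\bin\pandoc.exe"

# Same pages in the same order as the batch file; the wiki root is one level up.
PAGES = [
    "datamodel+overview",
    "datamodel+domain+CvdCaptureMode",
    "datamodel+domain+CvdDamageReducingActivity",
    "datamodel+lookup+IncidentDamageCodes",
    "datamodel+table+Attachments",
    "datamodel+table+TrafficLights",
]

PRELUDE = '<html>\n<head>\n<meta name="generator" content="creole/c">\n</head>\n<body>\n'
CLOSING = "</body>\n</html>\n"


def creole_command(page: str, wiki_dir: str = "..") -> list[str]:
    """Command line that converts one page into a body-only HTML fragment (-b)."""
    return [CREOLE, "-b", str(Path(wiki_dir) / f"{page}.creole")]


def merge_html(fragments: list[str]) -> str:
    """Wrap the body fragments, in order, with the prelude and the closing tags."""
    return PRELUDE + "".join(fragments) + CLOSING


def export_docx(docx_file: str = "Database-Schema.docx") -> None:
    """Convert all pages, merge them, and let Pandoc produce the DOCX file."""
    fragments = []
    for page in PAGES:
        # check=True raises CalledProcessError if creole.exe reports a failure;
        # text=True decodes stdout with the default encoding (an assumption).
        result = subprocess.run(
            creole_command(page), capture_output=True, text=True, check=True
        )
        fragments.append(result.stdout)

    # Work file in the default temp folder, like %TMP%\output.html.
    html_file = Path(tempfile.gettempdir()) / "output.html"
    html_file.write_text(merge_html(fragments), encoding="utf-8")

    # Pandoc infers HTML -> DOCX from the file suffixes.
    subprocess.run([PANDOC, "-o", docx_file, str(html_file)], check=True)
```

Keeping the page list as data makes the order of the chapters explicit, and check=True stops the export at the first failing conversion instead of silently producing an incomplete document.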

Some explanations:

  • The first ECHO call creates the file.
    To do so, the opening <html> tag is sent via > to a temporary working file; > creates the file or overwrites an existing one.
    All following calls append their content to this file via >>.
  • The tag characters < and > must be escaped, because cmd would otherwise treat them as redirection operators. This is done with the caret sign (^).
  • We use a file in the default temporary folder (%TMP%) to avoid writing into our working folders.
    (This is better for continuous integration.)
  • Both tools (Creole/c and Pandoc) are copied into a versioned tools folder in the wiki repository. This folder can be committed and pushed without any problem, because GitHub does nothing with it.
    The batch file (Export-Docx.bat) with all the steps lives in this folder as well.
  • Pandoc infers the input and output formats from the file-name suffixes, so it is enough to specify only the input and output files.
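Relying on the suffixes is convenient; when a file name has an unusual suffix, the formats can instead be passed explicitly with Pandoc's -f (--from) and -t (--to) options. The following Python sketch builds such an argument list. The suffix table is a small hand-picked subset for illustration, not Pandoc's complete list, and the function name explicit_pandoc_args is hypothetical.

```python
from pathlib import Path

# Small, hand-picked subset of suffix -> Pandoc format names (not exhaustive).
FORMATS = {
    ".html": "html",
    ".htm": "html",
    ".docx": "docx",
    ".odt": "odt",
    ".md": "markdown",
}


def explicit_pandoc_args(input_file: str, output_file: str) -> list[str]:
    """Build Pandoc arguments with explicit -f/-t instead of relying on suffixes."""
    try:
        source = FORMATS[Path(input_file).suffix.lower()]
        target = FORMATS[Path(output_file).suffix.lower()]
    except KeyError as err:
        # Unknown suffix: fail loudly rather than guess a format.
        raise ValueError(f"no format known for suffix {err.args[0]!r}") from None
    return ["-f", source, "-t", target, "-o", output_file, input_file]
```

Prepending the executable, for example [r".\Pandoc\bin\pandoc.exe", *explicit_pandoc_args("output.html", "Database-Schema.docx")], gives a command list that can be passed to subprocess.run.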

© Geeks with Blogs or respective owner