Converting Creole to HTML, PDF, DOCX, ..
- by Marko Apfel
Challenge

We documented a project on GitHub using the wiki there. For most articles we used Creole as the markup language. Now we have to deliver much of the content to our client in a common format such as PDF or DOCX. So we need an automated way to extract all relevant content, merge it, and convert it to the new format.

Problem

One of the most popular toolsets for converting between formats is Pandoc. Unfortunately, Pandoc does not support Creole as an input format (see the conversion matrix).

Approach

So we need an intermediate step: converting from Creole to a format that Pandoc supports. Creole/c is a Creole-to-HTML converter and does exactly what we need. After converting our Creole content to HTML, we can use Pandoc for all subsequent tasks.

Solution

Getting the Creole stuff

First of all, we need the Creole content on our local machines. This is easy: because the GitHub wiki is itself a Git repository, we can clone it to our machine. In the working copy we now see all the files, and the suffix of each file tells us which markup language it uses.

Converting and merging Creole content to HTML

Because we want the content of several Creole files in one HTML file, we have to convert all input files and merge them into one output file. Creole/c has an option (-b) to generate only the HTML that belongs inside the <body> tag. This is the hook for us to start. We manually create the leading HTML tags (<html>, <head>, ..), then append all the needed Creole content to our output file, and finally add the closing tags. This can be done in a straightforward way with a little bit of old DOS magic:

REM === Generate the intro tags ===
ECHO ^<html^> > %TMP%\output.html
ECHO ^<head^> >> %TMP%\output.html
ECHO ^<meta name="generator" content="creole/c"^> >> %TMP%\output.html
ECHO ^</head^> >> %TMP%\output.html
ECHO ^<body^> >> %TMP%\output.html
REM === Mix in all interesting Creole stuff with creole/c ===
.\Creole-C\bin\creole.exe -b .\..\datamodel+overview.creole >> %TMP%\output.html
.\Creole-C\bin\creole.exe -b .\..\datamodel+domain+CvdCaptureMode.creole >> %TMP%\output.html
.\Creole-C\bin\creole.exe -b .\..\datamodel+domain+CvdDamageReducingActivity.creole >> %TMP%\output.html
.\Creole-C\bin\creole.exe -b .\..\datamodel+lookup+IncidentDamageCodes.creole >> %TMP%\output.html
.\Creole-C\bin\creole.exe -b .\..\datamodel+table+Attachments.creole >> %TMP%\output.html
.\Creole-C\bin\creole.exe -b .\..\datamodel+table+TrafficLights.creole >> %TMP%\output.html
REM === Generate the outro tags ===
ECHO ^</body^> >> %TMP%\output.html
ECHO ^</html^> >> %TMP%\output.html
REM === Convert the Html file to Docx with Pandoc ===
.\Pandoc\bin\pandoc.exe -o .\Database-Schema.docx %TMP%\output.html
Some explanations for this script
The first ECHO call creates the file: the opening <html> tag is sent via > to a temporary working file.
All subsequent calls append content to the existing file via >>.
The tag characters < and > must be escaped, because the command interpreter would otherwise treat them as redirection operators. This is done with the caret (^).
We use a file in the default temporary folder (%TMP%) to avoid writing into our working folders (better for continuous integration).
Both toolsets (Creole/c and Pandoc) are copied into a versioned tools folder in the wiki repository. This folder can be committed and pushed without problems; GitHub does not do anything with it.
This folder also contains the batch file (Export-Docx.bat) that runs all the steps.
Pandoc infers the conversion from the suffixes of the file names, so it is enough to specify only the input and output files.
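
As a variation, the individual creole.exe calls could be replaced by a FOR loop. The following is only a sketch under a few assumptions: the wildcard datamodel+*.creole is a hypothetical pattern that is assumed to match exactly the pages we want, the OUT variable is introduced here only for readability, and the pages end up in the order in which the file system enumerates them, not in the hand-picked order of the script above. The Pandoc call names the formats explicitly with -f and -t, which is optional because Pandoc would also detect them from the suffixes.

```batch
@ECHO OFF
REM Sketch only: same pipeline as Export-Docx.bat, but with a loop over the pages.
SETLOCAL
REM OUT is a helper variable introduced for this sketch.
SET "OUT=%TMP%\output.html"

REM === Generate the intro tags (the first redirect with > creates the file) ===
ECHO ^<html^> > "%OUT%"
ECHO ^<head^> >> "%OUT%"
ECHO ^<meta name="generator" content="creole/c"^> >> "%OUT%"
ECHO ^</head^> >> "%OUT%"
ECHO ^<body^> >> "%OUT%"

REM === Convert every page that matches the (assumed) wildcard pattern ===
REM %%~F is the matched file name with surrounding quotes removed.
FOR %%F IN (".\..\datamodel+*.creole") DO (
    .\Creole-C\bin\creole.exe -b "%%~F" >> "%OUT%"
)

REM === Generate the outro tags ===
ECHO ^</body^> >> "%OUT%"
ECHO ^</html^> >> "%OUT%"

REM === Convert to Docx, naming input and output format explicitly ===
.\Pandoc\bin\pandoc.exe -f html -t docx -o .\Database-Schema.docx "%OUT%"
ENDLOCAL
```

If the order of the chapters matters, listing the files explicitly as in Export-Docx.bat remains the simpler choice.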