Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php
- by robertphyatt
Ok, so I needed to convert .docx files to .pdf files on the fly, but none of the free php libraries that were available let me do it on my server (a webservice was not good enough).
Basically either I needed to pay for a library (and have it maybe suck) or just deal with the free ones that didn't convert the formatting well enough.
Not good enough!
I found that LibreOffice (OpenOffice's successor) allows command line conversion using the LibreOffice conversion engine (which DID preserve the formatting like I wanted and generally worked great).
I loaded the latest version of Ubuntu (http://www.ubuntu.com/download/ubuntu/download) onto my Virtual Box (https://www.virtualbox.org/wiki/Downloads) on my computer and found that I was able to easily convert files using the commandline like this:
libreoffice --headless -convert-to pdf fileToConvert.docx -outdir output/path/for/pdf
I thought: sweet...but I don't have admin rights on my host's web server. I tried to use a "portable" version of LibreOffice that I obtained from http://portablelinuxapps.org/ but I was unable to get it to work on my host's webserver, because my host's webserver didn't have all the dependencies (Dependency Hell! http://en.wikipedia.org/wiki/Dependency_hell)
I was at a loss of how to make it work, until I ran across a cool project made by a Ph.D. student (Philip J. Guo) at Stanford called CDE: http://www.stanford.edu/~pgbovine/cde.html
I will let you look at his explanations of how it works (I followed what he did in http://www.youtube.com/watch?feature=player_embedded&v=6XdwHo1BWwY, starting at about 32:00 as well as the directions on his site), but in short, it allows one to avoid dependency hell by copying all the files used when you run certain commands, recreating the linux environment where the command worked. I was able to use this to run LibreOffice without having to resort to someone's portable version of it, and it worked just like it did when I did it on Ubuntu with the command above, with a tweak: I needed to run the wrapper of LibreOffice the CDE generated.
So, below is my PHP code that calls it. In this code snippet, the filename to be copied is passed in as $_POST["filename"]. I copy the file to the same spot where I originally converted the file, convert it, copy it back and then delete all the files (so that it doesn't start growing exponentially).
I did it this way because I wasn't able to make it work otherwise on the webserver. If there is a linux + webserver ninja out there that can figure out how to make it work without doing this, I would be interested to know what you did. Please post a comment or something if you did that.
<?php
//first copy the file to the magic place where we can convert it to a pdf on the fly
copy($time.$_POST["filename"], "../LibreOffice/cde-package/cde-root/home/robert/Desktop/".$_POST["filename"]);
//change to that directory
chdir('../LibreOffice/cde-package/cde-root/home/robert');
//the magic command that does the conversion
$myCommand = "./libreoffice.cde --headless -convert-to pdf Desktop/".$_POST["filename"]." -outdir Desktop/";
exec ($myCommand);
//copy the file back
copy("Desktop/".str_replace(".docx", ".pdf", $_POST["filename"]), "../../../../../documents/".str_replace(".docx", ".pdf", $_POST["filename"]));
//delete all the files out of the magic place where we can convert it to a pdf on the fly
$files1 = scandir('Desktop');
//my files that I generated all happened to start with a number.
$pattern = '/^[0-9]/';
foreach ($files1 as $value)
{
preg_match($pattern, $value, $matches);
if(count($matches) ?> 0)
{
unlink("Desktop/".$value);
}
}
//changing the header to the location of the file makes it work well on androids
header( 'Location: '.str_replace(".docx", ".pdf", $_POST["filename"]) );
?>
And here is the tar.gz file I generated I generated with CDE. To duplicate what I did exactly, put the tar.gz file in a folder somewhere. I will call that folder the "root". Make a new folder called "documents" in the "root" folder. Unpack the tar.gz and run the php script above from the "documents" folder.
Success! I made a truly portable version of LibreOffice that can convert files on the fly on a webserver using 100% free, open source software!