Uncompress OpenOffice files for better storage in version control
Posted by Craig McQueen on Stack Overflow, 2009-06-10T12:01:32Z
Tags: openoffice.org | version-control
I've heard that OpenOffice (ODF) files are ZIP archives of compressed XML and other data. Because of that compression, a tiny edit to a document can change most of the bytes in the saved file, so the delta compression used by version control systems doesn't work well on them.
I've done some basic testing: I unzipped an OpenOffice file and rezipped it with zero compression, using the Linux zip utility. OpenOffice still opens the result without complaint.
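A minimal sketch of that rezip step in Python, assuming the standard-library `zipfile` module instead of the `zip` command-line tool; `rezip_stored` is a hypothetical name. It copies every entry into a new archive with `ZIP_STORED` (no compression) and moves `mimetype` to the front, since the ODF packaging format expects that entry to be the first one in the archive and stored uncompressed.

```python
import zipfile


def rezip_stored(src_path: str, dst_path: str) -> None:
    """Copy the ODF/ZIP archive at src_path to dst_path with every entry
    stored uncompressed (ZIP_STORED). The original is not modified."""
    with zipfile.ZipFile(src_path, "r") as src, \
         zipfile.ZipFile(dst_path, "w", compression=zipfile.ZIP_STORED) as dst:
        # sorted() is stable: "mimetype" moves to the front (key False),
        # every other entry keeps its original relative order.
        infos = sorted(src.infolist(), key=lambda info: info.filename != "mimetype")
        for info in infos:
            # Reuse the original name, timestamp and attributes so identical
            # input always produces identical output.
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = zipfile.ZIP_STORED
            new_info.external_attr = info.external_attr
            dst.writestr(new_info, src.read(info.filename))
```

For example, `rezip_stored("report.odt", "report-stored.odt")` writes an uncompressed copy next to the original. Keeping names, order and timestamps from the input means an unchanged document produces a byte-identical file, which is what the version control system's delta compression benefits from.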
So I'm wondering whether it's worth writing a small utility to run on ODF files just before each commit to version control. Any thoughts on this idea? Are there better alternatives?
Second, what would be a good, robust way to implement this utility? A Bash script that calls zip (probably Linux-only)? Python? Any gotchas you can think of? Obviously I don't want to accidentally mangle a file, and there are several ways that could happen.
Possible gotchas I can think of:
- Insufficient disk space
- A permissions issue that prevents writing the file or its temporary files
- The ODF document is encrypted (these should probably just be left alone; encryption probably also causes large changes to the file anyway, which prevents efficient delta compression)
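One way to guard against these gotchas, sketched in Python under the same assumptions (standard library only; `is_encrypted_odf` and `uncompress_odf_in_place` are hypothetical names): skip encrypted documents, write the uncompressed copy to a temporary file in the same directory, verify it against the original, and only then swap it in with `os.replace`. A full disk or a permissions problem then raises `OSError` before the original file is touched. The encryption check is only a heuristic: it looks for `encryption-data` in `META-INF/manifest.xml`, which is where ODF records that a stream is encrypted.

```python
import os
import shutil
import tempfile
import zipfile


def is_encrypted_odf(path: str) -> bool:
    # Heuristic: encrypted ODF streams carry <manifest:encryption-data>
    # entries in the manifest; a substring check is enough to skip them.
    with zipfile.ZipFile(path) as z:
        try:
            manifest = z.read("META-INF/manifest.xml")
        except KeyError:
            return False
    return b"encryption-data" in manifest


def uncompress_odf_in_place(path: str) -> bool:
    """Rewrite `path` with all entries stored uncompressed.
    Returns False, leaving the file untouched, for encrypted documents."""
    if is_encrypted_odf(path):
        return False
    # Temp file in the same directory, so os.replace() is a rename on the
    # same filesystem. Disk-full/permission errors happen before the swap.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=os.path.dirname(os.path.abspath(path)))
    os.close(fd)
    try:
        with zipfile.ZipFile(path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
            # "mimetype" first, everything else in original order.
            for info in sorted(src.infolist(), key=lambda i: i.filename != "mimetype"):
                new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
                new_info.compress_type = zipfile.ZIP_STORED
                new_info.external_attr = info.external_attr
                dst.writestr(new_info, src.read(info.filename))
        # Verify CRCs, the entry list and every entry's content before swapping.
        with zipfile.ZipFile(path) as src, zipfile.ZipFile(tmp_path) as dst:
            if dst.testzip() is not None:
                raise zipfile.BadZipFile("CRC check failed on rewritten file")
            if sorted(src.namelist()) != sorted(dst.namelist()):
                raise zipfile.BadZipFile("entry list changed")
            for name in src.namelist():
                if src.read(name) != dst.read(name):
                    raise zipfile.BadZipFile(f"content mismatch in {name}")
        shutil.copymode(path, tmp_path)  # mkstemp creates 0600; keep original mode
        os.replace(tmp_path, path)
    except BaseException:
        # Any failure: drop the temp file and leave the original alone.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

Comparing each entry's decompressed content, not just running `testzip()`, is the step that actually guards against mangling: the original is only replaced once the new archive is known to hold exactly the same data.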
© Stack Overflow or respective owner