Making libmagic/file detect .docx files
Posted by Jonatan Littke on Server Fault
Published on 2011-12-06T11:11:39Z
As noted elsewhere, .docx, .xlsx and .pptx files are ZIP archives. When they are uploaded to my web application, file (via libmagic and python-magic) detects them as ZIP.
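To see why libmagic reports these files as ZIP, here is a minimal Python sketch (standard library only, not python-magic) that builds a tiny archive laid out like a .docx and checks its first bytes. The archive contents are placeholders, not a valid Word document; the point is that the file starts with the generic ZIP signature PK\x03\x04, which is what a test at offset 0 matches.

```python
import io
import zipfile

# Every ZIP local file header starts with these four bytes.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def make_fake_docx() -> bytes:
    """Build a tiny in-memory ZIP that only mimics a .docx layout (not a real Word file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


data = make_fake_docx()
# The leading bytes are the generic ZIP signature, nothing Word-specific.
print(data[:4] == ZIP_LOCAL_HEADER)  # True
```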
I store the contents of each file as a blob in the database, but naturally I don't want to trust the user about what type of file it is. So I would like to rely on file to identify the type and then automatically generate a filename when the file is downloaded.
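One way to get past the generic ZIP answer is to open the archive and look at its member names. This sketch assumes the conventional Office Open XML layout: a [Content_Types].xml part at the root plus a word/, xl/ or ppt/ directory. Strictly, the main part's location is declared in _rels/.rels, so this is a heuristic rather than a full parser. The download_filename helper and its file-<id>.<ext> naming scheme are hypothetical.

```python
import io
import zipfile

# Conventional main-part directories in OOXML packages (heuristic, see above),
# mapped to (extension, MIME type).
OOXML_TYPES = {
    "word/": ("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
    "xl/": ("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
    "ppt/": ("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation"),
}


def sniff_ooxml(blob: bytes):
    """Return (extension, mime) if the blob looks like an OOXML package, else None."""
    try:
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            names = zf.namelist()
    except zipfile.BadZipFile:
        return None  # not a ZIP archive at all
    if "[Content_Types].xml" not in names:
        return None  # a plain ZIP, not an Office Open XML package
    for prefix, result in OOXML_TYPES.items():
        if any(name.startswith(prefix) for name in names):
            return result
    return None


def download_filename(file_id: int, blob: bytes) -> str:
    """Hypothetical helper: name the download after the sniffed type."""
    sniffed = sniff_ooxml(blob)
    ext = sniffed[0] if sniffed else "bin"
    return f"file-{file_id}.{ext}"
```

sniff_ooxml also returns the MIME type, which could be sent as the Content-Type header on download; blobs it does not recognize fall back to a .bin name.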
I know one can modify /etc/magic, but the format (magic(5)) is far too complicated for me. I found a bug report on the issue in the Debian bug tracker, but since it dates from 2008, it doesn't look like it will be fixed any time soon.
I guess my only other alternative is to trust the user after all (while still storing the contents as a blob) and check only the file extension from the file name. This way I can disallow some extensions and allow others, and when the user re-downloads the file, he gets it back exactly as he uploaded it. But this solution is insecure if the file is shared with others, since anyone can simply rename a file to get it past the extension check.
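A minimal sketch of the extension-only fallback, assuming a hypothetical ALLOWED_EXTENSIONS set; which extensions to allow is up to the application. Because it looks only at the name, a renamed file passes, which is exactly the weakness described above.

```python
import os

# Hypothetical allowlist; the real set depends on the application.
ALLOWED_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".pdf"}


def extension_allowed(filename: str) -> bool:
    """Accept an upload based only on its (case-insensitive) extension."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in ALLOWED_EXTENSIONS
```

With this, extension_allowed("report.DOCX") is True and extension_allowed("setup.exe") is False, but extension_allowed("setup.exe.docx") is also True, whatever the file actually contains.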
Any ideas?
Lastly, I found a list of magic numbers for .docx etc., but I'm unable to convert them into the magic(5) format.
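A magic(5) entry is essentially a series of byte comparisons at fixed offsets (an offset, a type such as string, and the expected value), with lines prefixed by > continuing a match. The Python sketch below performs the same kind of test by hand, using the ZIP local file header layout: the signature at offset 0, the little-endian filename length at offset 26 and the filename itself at offset 30. It assumes [Content_Types].xml is the first member of the archive, which is common for files written by Office but not required by the format, so treat it as a heuristic, not as a translation of any particular magic rule.

```python
import struct


def first_zip_member_name(data: bytes):
    """Return the raw filename bytes of the first ZIP local file header, or None."""
    # Offset 0: 4-byte local file header signature.
    if len(data) < 30 or data[:4] != b"PK\x03\x04":
        return None
    # Offset 26: 2-byte little-endian filename length (offset 28 is the extra-field length).
    (name_len,) = struct.unpack_from("<H", data, 26)
    # Offset 30: the filename itself.
    name = data[30:30 + name_len]
    if len(name) != name_len:
        return None  # truncated header
    return name


def looks_like_ooxml(data: bytes) -> bool:
    """ZIP signature at offset 0 and '[Content_Types].xml' as the first member (heuristic)."""
    return first_zip_member_name(data) == b"[Content_Types].xml"
```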
© Server Fault or respective owner