How to distinguish doc, ppt, xls files, without looking at file extension
Posted by Shelby. S on Ask Ubuntu
Published on 2012-07-03T14:50:03Z
Indexed on 2012/07/04 3:23 UTC
microsoft-office
|file-type
How would you tell ppt, xls and doc files apart on Linux, regardless of their extensions? I tried 'file', but it appears to report all MS Office files as the same file type. I have the same problem with docx, xlsx and pptx files, since they are all essentially zip archives containing a bunch of XML.
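To illustrate the zip point: the three newer formats usually differ in the top-level folder inside the archive (word/, xl/ or ppt/), next to a [Content_Types].xml entry. The snippet below is only a rough sketch built on that assumption, using Python's standard zipfile module. The function name guess_ooxml_type and the folder-to-extension table are made up for illustration, and a stricter check would read the main part's content type from [Content_Types].xml instead of trusting folder names.

```python
import io
import zipfile

# Assumed mapping: top-level folder inside the package -> extension.
OOXML_DIRS = {"word/": "docx", "xl/": "xlsx", "ppt/": "pptx"}


def guess_ooxml_type(path_or_file):
    """Return 'docx', 'xlsx', 'pptx', or None if this does not look like an OOXML package."""
    if not zipfile.is_zipfile(path_or_file):
        return None
    with zipfile.ZipFile(path_or_file) as zf:
        names = zf.namelist()
    # Every OOXML package carries this entry; a plain zip normally does not.
    if "[Content_Types].xml" not in names:
        return None
    for prefix, ext in OOXML_DIRS.items():
        if any(name.startswith(prefix) for name in names):
            return ext
    return None


if __name__ == "__main__":
    # Build a tiny fake package in memory just to show the call;
    # a real sample would be passed by path instead.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("ppt/presentation.xml", "<p/>")
    print(guess_ooxml_type(buf))
```

Since the check never looks at the file name, a sample labeled try.exe would be classified the same way as one labeled try.pptx.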
I also tried a Python script that imports the magic module, but no luck.
I'm trying to identify the actual file type for a sandbox analysis: I need the real type in order to run the sample in the sandbox VM (the Windows VM runs everything by extension).
Let's say my sample file is named try.exe, but in reality it's just a doc file. My script will rename it to try.exe.doc, which works fine for doc files. But since Linux identifies all MS Office files as plain DOC files, there's no way to pick out ppt or xls files. As a result, the sandbox won't analyze the sample correctly.
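For the renaming step, one possible heuristic is sketched below. The legacy doc, xls and ppt formats all share the OLE2 compound-file container, which is why they look identical from the header alone, but each application writes a differently named main stream into the container's directory: "WordDocument", "Workbook" or "PowerPoint Document", stored as UTF-16LE. The sketch simply scans the raw bytes for those names. The helper names guess_ole_type and sandbox_name are hypothetical, and the scan is an assumption-laden shortcut: embedded objects (say, a spreadsheet inside a Word file) can fool it, and very old Excel files that use a "Book" stream are not covered. Parsing the directory properly, for example with a dedicated OLE parsing library, would be more reliable.

```python
# Signature shared by all OLE2 compound files (doc, xls, ppt and others).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Main stream name written by each application -> extension.
# The order is an arbitrary choice for this sketch.
STREAM_HINTS = [
    ("WordDocument", "doc"),
    ("PowerPoint Document", "ppt"),
    ("Workbook", "xls"),
]


def guess_ole_type(data):
    """Return 'doc', 'ppt', 'xls', or None, based on a raw byte scan."""
    if not data.startswith(OLE2_MAGIC):
        return None
    for stream_name, ext in STREAM_HINTS:
        # Directory entry names are stored as UTF-16LE inside the container.
        if stream_name.encode("utf-16-le") in data:
            return ext
    return None


def sandbox_name(filename, data):
    """Append the guessed extension so an extension-driven VM opens the sample correctly."""
    ext = guess_ole_type(data)
    if ext is None or filename.lower().endswith("." + ext):
        return filename
    return f"{filename}.{ext}"
```

With this, a sample called try.exe whose bytes contain a "Workbook" stream name would be renamed to try.exe.xls rather than try.exe.doc.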
© Ask Ubuntu or respective owner