How to distinguish doc, ppt, xls files, without looking at file extension

Posted by Shelby. S on Ask Ubuntu
Published on 2012-07-03T14:50:03Z Indexed on 2012/07/04 3:23 UTC

How can I differentiate ppt, xls and doc files from each other on Linux, regardless of their extensions? I tried 'file', but it appears to categorize all MS Office files under the same file type. I'm having similar trouble with docx, xlsx and pptx files, since they are essentially all zip archives containing a bunch of XML.
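To make the problem concrete, here is a rough sketch of one possible way to tell the formats apart by looking inside the container rather than at its header. It assumes that each OOXML package holds its main part at a well-known path (word/document.xml, xl/workbook.xml, ppt/presentation.xml) and that each legacy file names its main stream "WordDocument", "Workbook" or "PowerPoint Document", stored as UTF-16LE in the container's directory. The function name guess_office_ext is made up for this illustration, and the byte search on legacy files is only a heuristic: a document with an embedded object of another Office type could be misclassified, so a real compound-file parser would be more reliable.

```python
import zipfile

# Signature at the start of every legacy (OLE2) Office file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Main part inside each OOXML package -> extension.
OOXML_PARTS = {
    "word/document.xml": "docx",
    "xl/workbook.xml": "xlsx",
    "ppt/presentation.xml": "pptx",
}

# Main stream name inside each legacy container -> extension.
# Checked in this order; the order is an arbitrary choice of this sketch.
OLE_STREAMS = [
    ("WordDocument", "doc"),
    ("PowerPoint Document", "ppt"),
    ("Workbook", "xls"),
]


def guess_office_ext(path):
    """Return 'doc', 'xls', 'ppt', 'docx', 'xlsx', 'pptx', or None."""
    with open(path, "rb") as f:
        data = f.read()

    if data.startswith(OLE_MAGIC):
        # Heuristic: look for the UTF-16LE stream name anywhere in the file.
        for name, ext in OLE_STREAMS:
            if name.encode("utf-16-le") in data:
                return ext
        return None

    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            names = set(zf.namelist())
        for part, ext in OOXML_PARTS.items():
            if part in names:
                return ext

    return None
```

Calling guess_office_ext("try.exe") on a mislabeled sample would then return the extension to use, or None when the file does not look like an Office document at all.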

I also tried a Python script using the magic module, but had no luck.

I need to identify the actual file type for a sandbox analysis. The Windows VM decides how to run a sample based on its extension, so each sample has to carry the correct one before it goes into the sandbox VM.

Let's say my sample file is named try.exe, but in reality it's just a doc file. My script renames it to try.exe.doc, which works fine for doc files. But since Linux identifies all MS Office files as plain DOC files, there's no way to tell ppt or xls files apart. As a result, the sandbox won't analyze the sample correctly.
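For reference, the renaming step described above could look roughly like this. It is only a sketch: the names sandbox_name and rename_for_sandbox are hypothetical, and the detected extension is assumed to come from whatever detection step ends up working.

```python
import os


def sandbox_name(sample_name, detected_ext):
    """Append the detected extension, e.g. 'try.exe' + 'doc' -> 'try.exe.doc'."""
    suffix = "." + detected_ext.lower()
    if sample_name.lower().endswith(suffix):
        return sample_name  # already carries the right extension
    return sample_name + suffix


def rename_for_sandbox(path, detected_ext):
    """Rename the sample on disk and return its new path."""
    new_path = sandbox_name(path, detected_ext)
    if new_path != path:
        os.rename(path, new_path)
    return new_path
```

The original name is kept and the real extension is appended, so the Windows VM opens the sample with the matching application while the misleading label stays visible in the file name.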

© Ask Ubuntu or respective owner
