Checking for valid document files

Posted by sweb on Super User
Published on 2012-06-25T12:17:45Z

I need a simple way to check whether my files are valid documents (pdf, doc, docx, ppt, pptx, xls, xlsx, odt, ods, odp, and so on).

I can't rely on file, because its magic-based detection is unreliable. For PDF files, for example, this is the output I get:

sweb@sweb-laptop:/media/files/ebooks/PDF and CHM$ file --mime *.pdf
PHP 5 for Dummies.pdf: application/pdf; charset=binary
PHP 6 and MySQL 5 for Dynamic Web Sites.pdf: application/octet-stream; charset=binary
PHP6 and MySQL Bible.pdf: application/pdf; charset=binary
PHP6.pdf: application/octet-stream; charset=binary
PHP and MySQL for Dummies SE.pdf: application/pdf; charset=binary

I also tried abiword, which is a good tool, but it will try to convert any input regardless of its format. It does not check whether the input is a valid document:

abiword --to=txt --to-name=output.txt audio.mp3

Is there a command that checks whether a file is actually a valid document?
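
To make the kind of check I am after concrete, here is a minimal Python sketch that reads file signatures directly instead of going through file. It is only an illustration under several assumptions: the function name sniff_document and its return labels are hypothetical, it looks for the PDF header anywhere in the first 1024 bytes (some readers tolerate leading junk, which might explain the application/octet-stream results above, though I have not verified that for these files), and it treats a ZIP archive as OOXML or ODF based only on a marker entry. It confirms the container type, not that the document opens cleanly.

```python
import zipfile

# Signature of the OLE2 compound file container used by legacy doc/xls/ppt.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_MAGIC = b"PK\x03\x04"
ODF_PREFIX = b"application/vnd.oasis.opendocument."


def sniff_document(path):
    """Return a coarse guess ('ole2', 'ooxml', 'odf', 'pdf') or None.

    Hypothetical helper: it inspects signatures only and does not parse
    or fully validate the document.
    """
    with open(path, "rb") as fh:
        head = fh.read(1024)

    if head.startswith(OLE2_MAGIC):
        return "ole2"

    if head.startswith(ZIP_MAGIC):
        if not zipfile.is_zipfile(path):
            return None
        try:
            with zipfile.ZipFile(path) as zf:
                names = set(zf.namelist())
                # OOXML packages (docx, xlsx, pptx) carry this part.
                if "[Content_Types].xml" in names:
                    return "ooxml"
                # ODF packages (odt, ods, odp) carry a 'mimetype' entry.
                if "mimetype" in names and zf.read("mimetype").startswith(ODF_PREFIX):
                    return "odf"
        except zipfile.BadZipFile:
            pass
        return None

    # Assumption: accept the PDF header anywhere in the first 1024 bytes,
    # not only at offset 0.
    if b"%PDF-" in head:
        return "pdf"

    return None
```

With something like this, a file such as the audio.mp3 from the abiword example would be expected to return None, as long as it carries none of these signatures, so the conversion could be skipped for it.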

