I need a simple way to check if my files are valid documents (pdf, doc, docx, ppt, pptx, xls, xlsx, odt, ods, odp and etc).
I can't use file because magic does not work well at all. For example, for PDF files, this is my output.
sweb@sweb-laptop: /media/files/ebooks/PDF and CHM$ file --mime *. Pdf
PHP 5 for Dummies. Pdf: application/pdf; charset=binary
PHP 6 and MySQL 5 for Dynamic Web Sites. Pdf: application/octet-stream; charset=binary
PHP6 and MySQL Bible. Pdf: application/pdf; charset=binary
PHP6.pdf: application/octet-stream; charset=binary
PHP and MySQL for Dummies SE. Pdf: application/pdf; charset=binary
For example, I use abiword – which is a good tool – but it converts any format. It doesn't check for valid documents:
abiword --to=txt --to-name=output.txt audio.mp3
Is there any command available to check for valid documents then?