Batch OCR for many PDF files (not already OCRed)?
- by David
Hello,
I use Google Desktop Search (I am on Vista), and not all the PDF files in my archive folder are recognized. This is expected, since "PDF files that contain scanned images" are not indexed (http://desktop.google.com/support/bin/answer.py?hl=en&answer=90651).
So I would like to OCR many of my PDF files that are not already OCRed.
My goal: I give the program a folder, and it searches the subfolders on its own for the PDF files that need to be converted into OCRed PDF files.
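To make the goal concrete, here is a rough Python sketch of the folder-scanning part only. The function names are hypothetical, and the "needs OCR" check is a crude assumption of mine: it treats a PDF with no /Font marker in its raw bytes as image-only. That can misfire, for example when the font resources sit inside compressed object streams, so a real tool would need a better test. The OCR step itself is not shown.

```python
import os


def looks_scanned(path):
    """Crude heuristic (an assumption, not a reliable test):
    a PDF whose raw bytes contain no /Font marker probably has no text layer."""
    with open(path, "rb") as f:
        data = f.read()
    return b"/Font" not in data


def find_pdfs_needing_ocr(root, needs_ocr=looks_scanned):
    """Walk root and all its subfolders, yielding the PDF paths
    for which needs_ocr(path) returns True."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Match .pdf, .PDF, .Pdf, ...
            if name.lower().endswith(".pdf"):
                path = os.path.join(dirpath, name)
                if needs_ocr(path):
                    yield path
```

Something like `for path in find_pdfs_needing_ocr("archive"): print(path)` would then list the candidates (the folder name is only an example), and each path could be handed to whatever OCR tool does the PDF-to-PDF conversion. The check is passed in as a parameter so it can be swapped for a more reliable one.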
Note: In the past, if a PDF file was password protected, I removed the password with another (paid) batch tool: verypdf.com "pwdremover".
Any (not too expensive) ideas?
I have already tried:
FineReader 6 Pro, on XP at the time, but it had no batch processor included...
Paperfile (paperfile.net), which uses Tesseract (code.google.com/p/tesseract-ocr/). But its OCR only goes from PDF to text, not PDF to PDF!
There is also another project, OCRopus (code.google.com/p/ocropus), which I have not tried.
Thanks in advance ;)