Batch OCR for many PDF files (not already OCRed)?
Posted by David on Super User
Published on 2010-02-11T19:30:53Z
Indexed on 2010/03/22 20:01 UTC
Hello,
I use Google Desktop Search (I am on Vista), and not all the PDF files in my archive folder are indexed. This is expected, since "PDF files that contain scanned images" are not indexed (http://desktop.google.com/support/bin/answer.py?hl=en&answer=90651).
So I would like to OCR the many PDF files of mine that have not been OCRed yet. My goal: I give the program a folder, and it searches the subfolders on its own for the PDF files that need to be converted into OCRed PDF files.
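To make the goal concrete, here is a minimal Python sketch of the "search the subfolders" part only. It is not a known solution, just an illustration under stated assumptions: the function names are made up for this example, Poppler's `pdftotext` command-line tool is assumed to be installed and on the PATH, and a PDF is guessed to need OCR when almost no text can be extracted from it (the 20-character threshold is arbitrary).

```python
import os
import subprocess
from typing import Callable, Iterator, List


def find_pdfs(root: str) -> Iterator[str]:
    """Yield the path of every .pdf file under root, subfolders included."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)


def extract_text_with_pdftotext(pdf_path: str) -> str:
    """Return the text layer of a PDF.

    Assumption: Poppler's `pdftotext` is installed; the "-" argument
    makes it write the extracted text to standard output.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=False,
    )
    return result.stdout


def needs_ocr(
    pdf_path: str,
    extract_text: Callable[[str], str] = extract_text_with_pdftotext,
    min_chars: int = 20,  # arbitrary threshold chosen for this sketch
) -> bool:
    """Guess that a PDF is scanned when it has (almost) no extractable text."""
    return len(extract_text(pdf_path).strip()) < min_chars


def pdfs_needing_ocr(
    root: str,
    extract_text: Callable[[str], str] = extract_text_with_pdftotext,
) -> List[str]:
    """List the PDFs under root that appear to have no text layer."""
    return [p for p in find_pdfs(root) if needs_ocr(p, extract_text)]
```

Calling `pdfs_needing_ocr` on the archive folder would give the list of files to hand to whatever PDF-to-PDF OCR tool is eventually chosen. Note that a password-protected PDF will probably also yield no text and so be flagged, which is why the password removal step would still have to run first.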
Note: in the past, if a PDF file was password-protected, I removed the password with another (paid) batch tool: "pwdremover" from verypdf.com.
Any (not too expensive) ideas?
I have already tried:
FineReader 6 Pro, on XP at the time, but there was no batch processor included.
Paperfile (paperfile.net), which uses Tesseract (code.google.com/p/tesseract-ocr/). But the OCR is only PDF to text, not PDF to PDF!
There is also another project: code.google.com/p/ocropus
Thanks in advance ;)
© Super User or respective owner