Batch OCR for many PDF files (not already OCRed)?

Posted by David on Super User
Published on 2010-02-11T19:30:53Z Indexed on 2010/03/22 20:01 UTC

Hello,

I use Google Desktop Search on Vista, and not all of the PDF files in my archive folder are indexed. This is expected, since "PDF files that contain scanned images" are not indexed (http://desktop.google.com/support/bin/answer.py?hl=en&answer=90651).

So I would like to OCR the many PDF files of mine that have not already been OCRed. My goal: I give the program a folder, and it searches the subfolders on its own for the PDF files that need to be converted into OCRed PDF files.
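To make the folder scan described above concrete, here is a minimal sketch in Python using only the standard library. It is not a tested solution: the function names `looks_scanned` and `find_pdfs_needing_ocr` are made up for illustration, and the check for "not already OCRed" is a crude assumption (a PDF with a text layer normally references at least one font, so a file that never mentions `/Font` is probably image-only). The OCR step itself is left out.

```python
import os


def looks_scanned(pdf_path):
    """Guess whether a PDF has no text layer.

    Assumption, not a reliable test: a PDF with selectable text normally
    references a font, so a file whose raw bytes never contain b"/Font"
    is probably made of scanned images only. PDFs that store their
    objects in compressed object streams can hide /Font and would be
    misreported as scanned.
    """
    with open(pdf_path, "rb") as handle:
        return b"/Font" not in handle.read()


def find_pdfs_needing_ocr(root_folder, needs_ocr=looks_scanned):
    """Walk root_folder and all of its subfolders.

    Returns a sorted list of paths to .pdf files for which
    needs_ocr(path) is true. Pass a different needs_ocr callable
    to swap in a better detection method.
    """
    matches = []
    for folder, _subfolders, filenames in os.walk(root_folder):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                path = os.path.join(folder, name)
                if needs_ocr(path):
                    matches.append(path)
    return sorted(matches)
```

Calling `find_pdfs_needing_ocr` on the archive folder would return the list of candidate files, which could then be handed one by one to whatever OCR tool ends up being used. Keeping the detection in a separate callable means the rough `/Font` check can be replaced later without touching the folder walk.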

Note: in the past, when a PDF file was password protected, I removed the password with another (paid) batch tool: "pwdremover" from verypdf.com.

Any ideas (not too expensive)?

I have already tried FineReader 6 Pro (on XP at the time), but it included no batch processor. I also tried Paperfile (paperfile.net), which uses Tesseract (code.google.com/p/tesseract-ocr/), but its OCR only goes from PDF to text, not PDF to PDF! There is also another project, OCRopus (code.google.com/p/ocropus).

Thanks in advance ;)

© Super User or respective owner
