Is there an optimal config/format for a TIFF when using Tesseract or other OCR?

Posted by Zando on Stack Overflow
Published on 2010-04-19T22:29:02Z

I'm having a bizarre problem with Tesseract. I have a name, "Janice", in a 200x40 pixel TIFF, and Tesseract interprets it as blank. I'm running hundreds of other names through Tesseract, and they are processed fine.

What I'm actually doing, though, is breaking up a larger TIFF into smaller TIFFs of one word each. In the larger TIFF, Tesseract recognizes "Janice".

What could cause it to hiccup on a TIFF that contains only that word (with enough space around the word that none of its pixels are truncated)? I'm using ImageMagick to split the big TIFF. Are there options I should set when writing the new TIFF files?
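For concreteness, here is a minimal sketch of the kind of per-word split described above, written in Python and shelling out to ImageMagick's `convert`. It is not the exact command used here: the file names, crop coordinates, and the helper names `build_crop_command` and `crop_word` are all hypothetical. The flags themselves (`-crop`, `+repage`, `-border`, `-alpha off`, `-compress none`) are standard ImageMagick options that are often suggested when a cropped TIFF misbehaves in downstream tools, but whether any of them affects this particular image is an assumption.

```python
import subprocess


def build_crop_command(src, dst, width, height, x, y, border=10):
    """Build an ImageMagick `convert` argument list that crops one word.

    All names and default values here are illustrative assumptions.
    """
    return [
        "convert", src,
        "-crop", f"{width}x{height}+{x}+{y}",  # cut out the word's bounding box
        "+repage",                  # discard the canvas offset that -crop leaves behind
        "-bordercolor", "white",
        "-border", f"{border}x{border}",       # add a white margin around the word
        "-alpha", "off",            # drop any alpha channel
        "-compress", "none",        # write an uncompressed TIFF
        dst,
    ]


def crop_word(src, dst, width, height, x, y, border=10):
    """Run the crop; raises CalledProcessError if `convert` fails."""
    cmd = build_crop_command(src, dst, width, height, x, y, border)
    subprocess.run(cmd, check=True)


# Hypothetical example: a 200x40 word located at (120, 300) on the page.
cmd = build_crop_command("page.tif", "janice.tif", 200, 40, 120, 300)
print(" ".join(cmd))
```

Building the argument list separately from running it makes it easy to print and compare the exact command for a word that fails against one that succeeds.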

