Is their an optimal config/format for a TIFF when using Tesseract or other OCR?
- by Zando
I'm having a bizarre problem with Tesseract. I have a name, "Janice" that is in a 200x40 pixel tiff, that Tesseract interprets as a blank. I'm running hundreds of names through Tesseract and they are processed fine.
What I'm actually doing, though, is breaking up a larger TIFF into smaller tiffs of one word each. In the larger TIFF, tesseract recognizes "Janice".
What could cause it to hiccup in a TIFF that solely contains that word (and there's enough space around the word to not truncate any of the pixels)? I'm using ImageMagick to split the big TIFF, are there options I should set when reconstituting the new TIFF files?