Search Results

Search found 1 results on 1 pages for 'fastrack'.

Page 1/1 | 1 

  • layout analysis of text based pdf without ocr

    - by fastrack
    Before recognizing a pdf, OCR software do document layout analysis to determine which parts are texts, tables or images, as shown in the picture below. ![papercrop]http://cache.gawkerassets.com/assets/images/17/2011/07/papercrop.jpg I want to use some parts of the text while leaving out the others. So having a software marking those zones comes in handy. Papercrop does a decent job, but it has a bug of now showing some of the text in the pdf file. And OCR software can also do layout analysis, marking out "zones" which I can add or delete. But you have to OCR to do that. Since my pdfs are already text based, I don't want to waste so much time OCRing. So my question is, is there any software that automatically mark out those zones and let me manually manipulate them, without having to OCR? Thanks! Waiting for your help.

    Read the article

1