I have a bold idea: a user could take an image like the following
and, after a few seconds of processing, edit a document that looks roughly the same.
The software would use WhatTheFont (or something similar) to recognize the fonts used, and OCR and other tools to determine the font size, color, line spacing, and of course the text content itself. For the example image, it would produce three separate "textboxes", each starting at the upper left corner of its text and extending as far toward the bottom right as it can before running into another textbox. The user would then see something like this:
(The rectangles are just used to show the boundaries of each textbox.)
From here, the user would be able to edit the text in each of these boxes to create a new document.
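To make the textbox rule concrete, here is a minimal sketch of one way I imagine it working. It assumes the OCR step has already returned a tight bounding box for each block of text. The names `Box` and `grow_boxes` and the coordinates are made up for illustration, and growing right first and then down is just one interpretation of "as far to the bottom right as it could".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def grow_boxes(tight: List[Box], page_w: int, page_h: int) -> List[Box]:
    """Grow each tight text box right, then down, until it meets another
    tight box or the page edge. The upper left corner never moves."""
    grown = []
    for i, box in enumerate(tight):
        others = [b for j, b in enumerate(tight) if j != i]

        # Grow right: stop at the nearest box that shares rows with this one.
        right = page_w
        for o in others:
            rows_overlap = o.top < box.bottom and o.bottom > box.top
            if rows_overlap and o.left >= box.right:
                right = min(right, o.left)

        # Grow down: stop at the nearest box below that shares columns
        # with the widened box.
        bottom = page_h
        for o in others:
            cols_overlap = o.left < right and o.right > box.left
            if cols_overlap and o.top >= box.bottom:
                bottom = min(bottom, o.top)

        grown.append(Box(box.left, box.top, right, bottom))
    return grown


# Made-up layout: a title above two columns on a 600x800 page.
tight_boxes = [
    Box(50, 40, 400, 80),     # title
    Box(50, 120, 280, 300),   # left column
    Box(320, 120, 550, 260),  # right column
]
for b in grow_boxes(tight_boxes, page_w=600, page_h=800):
    print(b)
```

Each box is only checked against the tight boxes of the others, so in some layouts two grown boxes could still overlap each other. A real implementation would need a tie-breaking rule for that case.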
Of course there are tons of obvious uses for such an application, especially on a mobile phone with a built-in camera.
So my questions are the following:
I doubt the answer is yes, but does anything do this already?
If I'm going to try to build this, what should I write it in? Can I use Python?
What would be the best OCR libraries to start with?
Is there a service other than WhatTheFont for font recognition that has better API support?
Anybody want to help me build it? :)
etc. etc.
Update:
One thing I wanted to mention (but forgot) is that I would also like the background to be preserved. In other words, if the example above had an image behind the text, I'd like the document to use that image with the text removed. I know this complicates things a lot, because it would require some image editing techniques too (something akin to Photoshop CS5's "content-aware fill"). But if we can solve diminished reality on iPhones, I think we can figure this out!
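To show what I mean by removing the text from the background, here is a rough sketch of the simplest fill I can think of: every pixel marked as text is replaced by repeatedly averaging its four neighbours. This is a crude stand-in and not what Photoshop's content-aware fill does. The function name `fill_masked`, the grayscale list-of-lists image format, and the number of passes are all assumptions for illustration.

```python
from typing import List

Gray = List[List[float]]  # grayscale image as rows of pixel values


def fill_masked(image: Gray, mask: List[List[bool]], passes: int = 200) -> Gray:
    """Return a copy of `image` where pixels with mask == True (the text)
    are filled in by repeatedly averaging their 4-neighbours."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]

    # Start every masked pixel at the mean of the known (unmasked) pixels.
    known = [image[y][x] for y in range(h) for x in range(w) if not mask[y][x]]
    start = sum(known) / len(known) if known else 0.0
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                out[y][x] = start

    # Let the surrounding background diffuse into the masked region.
    for _ in range(passes):
        nxt = [row[:] for row in out]
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                neighbours = [
                    out[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w
                ]
                nxt[y][x] = sum(neighbours) / len(neighbours)
        out = nxt
    return out


# Made-up 5x5 background with a left-to-right gradient and one dark "text" pixel.
background = [[10.0 * x for x in range(5)] for _ in range(5)]
background[2][2] = 0.0
text_mask = [[(x, y) == (2, 2) for x in range(5)] for y in range(5)]
restored = fill_masked(background, text_mask)
print(restored[2])
```

This only works for smooth backgrounds. Textured or photographic backgrounds would need a proper inpainting method, which is the part I expect to be hard.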