by Matt Asay

Google’s open source OCR work

Apr 11, 2007

This is the sort of thing that makes me like Google again. Google just announced its work on the open source OCRopus project, a document analysis and OCR (Optical Character Recognition) system:

The goal of the project is to advance the state of the art in optical character recognition and related technologies, and to deliver a high quality OCR system suitable for document conversions, electronic libraries, vision impaired users, historical document analysis, and general desktop use. In addition, we are structuring the system in such a way that it will be easy to reuse by other researchers in the field.

The project is licensed under the Apache 2.0 license, which I think is ideal for how this code is likely to be used (embedding it into desktop applications, for example), since its permissive terms let developers do that without releasing their own source code. And given Google's interest in mining the world's information, I suspect this project will get a lot of Google's time and, hopefully, a lot of other people's time as well.

Great move, Google. This looks like a fantastic project on which to work.