Tesseract: an Open-Source Optical Character Recognition Engine

This OCR engine originally belonged to HP. Google picked it up about nine months ago, and it is available as open source. It is one of the best OCR engines around.

Google Code link: here
Project link: here

How to Install

Version 1.03 was the latest version at the time of this writing, and the build and install process still needed a little work. Also, integration with libtiff (which would allow you to use compressed TIFF as input) was configured by default, but it was not working properly.

You might try configuring it with libtiff, as that would allow compressed TIFF image input:

    # ./configure

If you later find that it doesn't recognize text, reconfigure it without libtiff:

    # ./configure --without-libtiff

The build is done as expected:

    # make

Configure for version 1.03 also indicated that make install was broken. I managed to figure out the basics of installation by trial and error. First, copy the executable from ccmain/tesseract to a directory on your path (for example, /usr...
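
The configure-and-build steps described in this post could be wrapped in a small shell sketch like the one below. It is only an illustration under a few assumptions: the function names `configure_cmd` and `build_tesseract` are made up for this example, the script is meant to be run from the top of the unpacked 1.03 source tree, and it stops at `make` because `make install` was reported broken.

```shell
#!/bin/sh
# Sketch of the build steps for Tesseract 1.03 described in the post.
# configure_cmd and build_tesseract are hypothetical helper names,
# not part of Tesseract itself.

# Print the configure command for the requested libtiff setting.
# Argument: "yes" (default) keeps libtiff, "no" disables it.
configure_cmd() {
    if [ "${1:-yes}" = "no" ]; then
        echo "./configure --without-libtiff"
    else
        echo "./configure"
    fi
}

# Configure and build in the current directory.
# The unquoted substitution is intentional so the flag is split
# into its own argument.
build_tesseract() {
    $(configure_cmd "$1") && make
}
```

A first attempt would be `build_tesseract yes`. If the resulting binary does not recognize text, `build_tesseract no` repeats the build without libtiff.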