Scanned PDF files are becoming increasingly common as a replacement for faxes. But how many times have you tried to select the text in a PDF, only to find you can’t because it’s in image form?

Well, I’ve just searched Google for a solution, knowing that they also use OCR (Optical Character Recognition) for indexing PDF files.

It turns out that Google are developing an open source OCR platform based on an engine called Tesseract, which was developed by HP Labs between 1985 and 1995.

FreeOCR is a free Windows GUI which uses the Tesseract engine, and I have to say it works pretty well!
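
If you'd rather skip the GUI, Tesseract can also be run from the command line. Here is a rough Python sketch of how that might look. It assumes the `tesseract` executable is installed and on your PATH, and that the scanned page has already been saved as an image such as a PNG. The function names are my own, made up for illustration, and this is not how FreeOCR itself does it.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognised text.

    Returns None if the tesseract executable cannot be found on the PATH.
    (Both helper names are hypothetical, chosen for this sketch.)
    """
    if shutil.which("tesseract") is None:
        return None
    # check=True raises CalledProcessError if Tesseract reports a failure.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract writes its plain-text result to "<outputbase>.txt".
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling `ocr_image("scan_page1.png", "scan_page1")` (a made-up file name) should leave the recognised text in `scan_page1.txt` and return it as a string, ready to copy and paste.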
