In an earlier blog post I mentioned that we have implemented the Tesseract OCR engine and that this has given us a couple of significant new features. One of them is an enhancement to our Screen Scrape technology that lets us read the screens of other applications which we previously could not.
The other nice feature we’ve been able to add is ‘on the fly’ OCR of documents for indexing. When the user is profiling a document (defining the indexes that will later be used to locate it) they can now select part of the image by dragging a rectangle over the desired text. The selection is then OCR’d and the resulting text is used to populate the index. The next index is then presented and the process is repeated. This semi-automates the indexing of documents, making the process more efficient and document management even easier for end users!
We are including this in our Business Edition, Enterprise and new .NET Web versions of the product.
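For readers curious how a drag-to-OCR indexing loop like the one described above might hang together, here is a minimal Python sketch. It is not our product code: the names `drag_to_box`, `profile_document` and `ocr_region` are hypothetical, and the OCR step is passed in as a function so that any engine can be plugged in.

```python
from typing import Callable, Dict, Iterable, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def _clamp(value: int, low: int, high: int) -> int:
    """Keep value inside the closed range [low, high]."""
    return max(low, min(value, high))


def drag_to_box(start: Point, end: Point, image_size: Tuple[int, int]) -> Box:
    """Turn a mouse drag into a crop box that lies inside the image.

    The user may drag in any direction, so the corners are sorted first
    and then clamped to the image bounds.
    """
    width, height = image_size
    left, right = sorted((start[0], end[0]))
    upper, lower = sorted((start[1], end[1]))
    left, right = _clamp(left, 0, width), _clamp(right, 0, width)
    upper, lower = _clamp(upper, 0, height), _clamp(lower, 0, height)
    if left == right or upper == lower:
        raise ValueError("selection is empty")
    return (left, upper, right, lower)


def profile_document(
    index_names: Iterable[str],
    selections: Iterable[Tuple[Point, Point]],
    image_size: Tuple[int, int],
    ocr_region: Callable[[Box], str],
) -> Dict[str, str]:
    """Fill each index from the text found in the matching selection.

    ocr_region is an assumed callback that returns the recognised text
    for one crop box; surrounding whitespace is trimmed before storing.
    """
    values: Dict[str, str] = {}
    for name, (start, end) in zip(index_names, selections):
        box = drag_to_box(start, end, image_size)
        values[name] = ocr_region(box).strip()
    return values
```

With Pillow and the pytesseract wrapper installed, `ocr_region` could be something like `lambda box: pytesseract.image_to_string(page.crop(box))`, where `page` is the opened document image. Keeping the engine behind a callback also makes the selection logic easy to test without Tesseract present.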