How does scanning a QR code with your phone make the device open the right web page? It works because of a form of data recognition, which is an important part of intelligent document understanding.
Intelligent document understanding begins once a paper form has been scanned and converted into an image. After that conversion, the key data elements of the document can be recognized. There are different ways to do this, and they range from very fast and accurate to time-consuming and less accurate.
To expand further, there are five main techniques for reading the needed data off documents.
The first two techniques are fast and accurate, but they extract only key pieces of information from a form; they do not read the entire document.
- Barcode- For instance, every item you pick out at the grocery store has its own barcode, which lets the cash register recognize the item and charge the appropriate amount. This method is very fast and accurate, but it works for a limited range of document types because the barcode must be printed on the document at the source.
- Optical Mark Recognition (OMR)- This type of data recognition is found on items such as surveys, school tests and applications. OMR can only understand yes/no type questions: it detects the marks made on the page and converts them to data on a device. It is in the process of being replaced by online capture of information.
- Optical Character Recognition (OCR)- Unlike OMR, which can only recognize marks, OCR can recognize printed text, such as the text on shipping labels. This technique is very accurate with Latin-based characters and is still under development for Asian languages. Although it is accurate, it is computationally intensive, so it can be a slow form of data recognition on PCs.
- Intelligent Character Recognition (ICR)/ Handprint recognition- The final two techniques are grouped together because they are the least developed and least accurate forms of data recognition. These techniques understand handwritten data, but they are only accurate when working with a limited vocabulary or with information that can be matched to a database.
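Two of the ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular capture product works: the function names, the 0/1 pixel grid, the fill threshold and the similarity cutoff are all hypothetical choices made for the example. The first function shows the yes/no nature of OMR by checking how much of a checkbox is filled in. The second shows why a limited vocabulary helps handprint recognition: a noisy reading is snapped to the closest value in a known list.

```python
from difflib import get_close_matches


def read_omr_mark(cell, fill_threshold=0.4):
    """Return True when a checkbox cell is dark enough to count as marked.

    `cell` is a 2D list of 0/1 pixels (1 = dark) cropped from a scanned form.
    The 0.4 threshold is an illustrative assumption, not an industry value.
    """
    total = sum(len(row) for row in cell)
    dark = sum(sum(row) for row in cell)
    return total > 0 and dark / total >= fill_threshold


def snap_to_vocabulary(raw_text, vocabulary, cutoff=0.6):
    """Map a noisy handprint reading onto the closest known value, or None.

    `vocabulary` stands in for the database lookup; `cutoff` is the minimum
    similarity (0 to 1) required before a match is accepted.
    """
    lowered = {word.lower(): word for word in vocabulary}
    matches = get_close_matches(raw_text.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


if __name__ == "__main__":
    filled_box = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
    print(read_omr_mark(filled_box))

    states = ["California", "Colorado", "Connecticut"]
    print(snap_to_vocabulary("Calfornia", states))
```

With a short, known list of valid answers, a misread such as "Calfornia" can still be resolved to a valid entry, while a reading that resembles nothing in the list is rejected instead of guessed.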
Any questions on how data recognition works for intelligent document understanding? Read more about this topic in our whitepaper.