About nikoledebell

Social Media/Marketing Communications intern for Kodak Alaris

Using data extraction to reduce human error

When you call the doctor’s office to schedule an appointment, one of the first things they ask you for is your name and birthday. This is a form of data extraction, which makes it simple for the receptionist to find your files in the system.

Data extraction is a way to replace manual data entry by automatically extracting metadata from documents. There are three different types of data extraction:

  1. Graphical: This method can be used only with forms; intelligent document understanding extracts the data based on the visual layout of the page.
  2. Rules-based: This technique uses keyword anchors such as “birthday” or “social security number,” together with expressions (patterns) that describe the expected data, to locate the information.
  3. Semantic understanding: In this method, areas of interest are linked to a metadata field, and the system is run on many documents so that it learns to retrieve the information in the future without human interaction.
Image credit: Rimell.com
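
As a rough illustration of the rules-based approach, the Python sketch below locates two fields by a keyword anchor followed by a regular expression describing the value's format. The field names, the MM/DD/YYYY date format, and the sample text are assumptions made for illustration; commercial products define their own rules.

```python
import re

# Hypothetical rules: a keyword anchor, optional separator, then a pattern
# describing what the value should look like.
RULES = {
    "birthday": re.compile(
        r"(?:birthday|date of birth)\s*[:\-]?\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE
    ),
    "ssn": re.compile(
        r"social security number\s*[:\-]?\s*(\d{3}-\d{2}-\d{4})", re.IGNORECASE
    ),
}


def extract_fields(text: str) -> dict[str, str]:
    """Return the first value found after each keyword anchor."""
    fields = {}
    for name, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields


sample = "Patient: Jane Doe\nBirthday: 04/12/1985\nSocial Security Number: 123-45-6789"
print(extract_fields(sample))
# {'birthday': '04/12/1985', 'ssn': '123-45-6789'}
```

Fields whose anchor is missing are simply left out of the result, so a downstream step can route those documents to a person for review.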

Companies and vendors that use data extraction have established three ways of understanding documents:

  1. Self-learning: The use of semantic understanding and machine learning.
  2. Fuzzy understanding: Because e-mails and documents can include misspellings or other mistakes, this method finds data based on degrees of truth rather than absolutes.
  3. Validation systems: This is similar to a database lookup. However, since misspellings can occur in databases too, fuzzy understanding still applies.
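
Fuzzy understanding and validation can be sketched together with Python's standard `difflib` module. Each extracted value gets a similarity score between 0 and 1 (a degree of truth) and is then looked up against a reference list, accepting the closest entry above a cutoff. The customer list and the 0.8 cutoff are hypothetical; real systems use their own matching algorithms and databases.

```python
from difflib import SequenceMatcher, get_close_matches
from typing import Optional

# Hypothetical reference database used for validation.
KNOWN_CUSTOMERS = ["Katherine Johnson", "Robert Smith", "Maria Garcia"]


def similarity(a: str, b: str) -> float:
    """Degree of truth between 0.0 and 1.0 instead of an exact yes/no match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def validate(extracted: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known entry scoring at least `cutoff`, or None."""
    matches = get_close_matches(extracted, KNOWN_CUSTOMERS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(similarity("Katherin Jonson", "Katherine Johnson"))  # 0.9375
print(validate("Katherin Jonson"))  # Katherine Johnson
print(validate("Unknown Person"))   # None
```

Because matching is tolerant in both directions, a misspelled entry in the database can still be found, which is the point made in the third item above.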

Want to learn more about data extraction? Read our whitepaper about this topic.

Nikole

Stay organized via document classification

Want to be better organized and more effective? Classifying your documents is an important step to take. At work, you may have papers pertaining to a certain topic neatly filed away in a labeled folder. Even on your computer, you classify different documents and place them into their correct “folders.”

Intelligent Document Understanding uses these differences between documents to sort them, relying on four different classification methods:

  • Symbolic: The easiest and most accurate method, which usually relies on barcodes. It is typically used with paper or fax documents, but not with e-mail.
  • Graphics-based document analytics: This method classifies documents based on their appearance rather than on what the text actually says. It is typically used for invoices or other semi-structured documents.
  • Graphics- and text-based keyword combined: This process is not very flexible, but it can classify a bit more than the previous method. It looks at the graphics (appearance) to recognize that a document is an invoice, and it also picks up on keywords in the document, such as “car repair,” to place it into the correct folder.
  • Full text-based document analytics: This is the newest method, and also the most complex yet most versatile of the four. It can classify all text-based documents, ranging from paper to e-mail and social media.
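
To make the ordering of these methods concrete, here is a toy Python classifier that tries the symbolic method first and then combines a layout label with keywords. The barcode values, folder names, and the `layout` argument (standing in for the output of a separate graphics-based analyzer) are all hypothetical, and full text-based analytics is not shown.

```python
from typing import Optional

# Hypothetical barcode values (e.g. printed on cover sheets) mapped to classes.
BARCODE_CLASSES = {"CLS-INV": "invoice", "CLS-APP": "application"}

# Hypothetical keyword rules that refine a layout-based class into a folder.
KEYWORD_FOLDERS = {
    "invoice": {
        "car repair": "invoice/car-repair",
        "office supplies": "invoice/office-supplies",
    },
}


def classify(text: str, layout: Optional[str] = None, barcode: Optional[str] = None) -> str:
    """Return a folder name, trying the most accurate method first.

    `layout` stands in for the result of a graphics-based analyzer
    (e.g. "invoice"); deriving it from a page image is not shown here.
    """
    # Symbolic: a known barcode decides the class on its own.
    if barcode is not None and barcode in BARCODE_CLASSES:
        return BARCODE_CLASSES[barcode]
    if layout is not None:
        # Graphics + keywords: appearance gives the type, keywords pick the folder.
        lowered = text.lower()
        for keyword, folder in KEYWORD_FOLDERS.get(layout, {}).items():
            if keyword in lowered:
                return folder
        # Graphics only: no keyword matched, so keep the layout-based class.
        return layout
    return "unclassified"


print(classify("", barcode="CLS-INV"))                                # invoice
print(classify("Invoice: Car Repair, brake pads", layout="invoice"))  # invoice/car-repair
print(classify("Invoice #42", layout="invoice"))                      # invoice
print(classify("Dear customer, thank you for your letter."))          # unclassified
```

The cascade mirrors the trade-off described above: the cheapest, most accurate signal is checked first, and the more flexible methods handle whatever is left.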

The classification methods also arrange documents into three “classes,” based on how structured the documents are:

  • Unstructured: These documents are complex and vary a lot. They typically have an unknown page layout with variable content; examples include correspondence and the mail that arrives in a mailroom.
  • Semi-Structured: These are documents such as invoices, which are the same type of document but vary in the information they contain. They typically have an unknown page layout with tabular data.
  • Structured: These documents are simple, with little or no variability. They have a fixed page layout and consistent, defined content. Examples of structured documents are applications, benefit forms or checks.

Do you have any questions about document classification? Read more about this topic in our whitepaper.

Nikole

P.S.: Visit our Web site to learn about the KODAK Info Activate Solution and see how it can activate your information and ignite your workflow.

Integration of new information sources  

With modern technology continuously expanding, there are numerous ways to receive information – both structured and unstructured. Today, paper documents and mail are becoming outdated as sources such as social media, mobile apps, and Web sites take over in volume and significance.

These new information sources are often referred to as “big data,” and they are growing explosively. Big data gives businesses more data to analyze, which in turn makes for better decision making.

However, many businesses find these sources to be “chaotic or difficult to manage,” according to AIIM’s report titled “State of the ECM Industry 2011.” Paper documents are easy for businesses and organizations to manage because they are easily organized through a capture system. But how does one organize a Tweet, an e-mail, a Web document, and a phone call that are all about the same thing?

For example, say you get into a car accident. You might take a picture of the damage with your smartphone, call your insurance agency to explain what happened, e-mail the photos from your smartphone, and, depending on how you feel about your insurance company’s service, Tweet about your experience.

The insurance agency must then manually enter all of the data you provided into a unique customer claim case folder, which can be very tedious and time consuming. Ultimately, this work provides the insurance agency with instant photos of the case, instant communication with the customer, and overall details about the claim straight from the scene.
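
A minimal Python sketch of that integration step follows. It assumes each inbound item (phone transcript, e-mail, Tweet) mentions a claim number in a hypothetical `CLM-####` format. Items are grouped into one case folder per claim regardless of channel, and anything without a recognizable claim number is set aside in an "unmatched" folder for manual review.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical claim-number format used to link items to a case.
CLAIM_ID = re.compile(r"\bCLM-\d{4}\b")


@dataclass
class InboundItem:
    channel: str  # e.g. "phone transcript", "email", "tweet"
    content: str


def build_case_folders(items: list[InboundItem]) -> dict[str, list[InboundItem]]:
    """Group items from every channel into one folder per claim number."""
    folders: dict[str, list[InboundItem]] = defaultdict(list)
    for item in items:
        match = CLAIM_ID.search(item.content)
        key = match.group(0) if match else "unmatched"
        folders[key].append(item)
    return dict(folders)


items = [
    InboundItem("phone transcript", "Caller reports rear-end collision, claim CLM-1001."),
    InboundItem("email", "Re: CLM-1001 - photos of the damage attached."),
    InboundItem("tweet", "Quick response from my insurer on CLM-1001!"),
    InboundItem("email", "Question about my policy renewal."),
]
for claim, folder in build_case_folders(items).items():
    print(claim, [item.channel for item in folder])
# CLM-1001 ['phone transcript', 'email', 'tweet']
# unmatched ['email']
```

In practice, linking a Tweet or phone call to the right claim is the hard part; here it is reduced to a regular-expression match purely for illustration.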

Businesses that are able to expand (and manage) their information sources with the proper solution will find that big data does not have to be time consuming and frustrating; overall, it can be very rewarding.

Do you have any questions about multi-source integration? Read more about this topic in our whitepaper.

Nikole

How data recognition drives intelligent document understanding

How does scanning a QR code with your phone make the device display the right Web page? This is possible thanks to a form of data recognition, which is an important part of intelligent document understanding.

Intelligent document understanding begins when information from a paper form is scanned and converted into an image. Once a document has been converted, its key data elements can be recognized. There are different ways to do this, ranging from very fast and accurate to time-consuming and less accurate.

To expand further, there are five main techniques for reading the needed data from documents.

The first two techniques are fast and accurate, but they extract only key pieces of information; they do not read the entire document.

  • Barcode: For instance, when you pick out items at the grocery store, each item has its own barcode that allows the cash register to recognize what it is and charge the appropriate amount. This method is very fast and accurate, but it works with a limited range of document types because the barcode must be printed at the source.

  • Optical Mark Recognition (OMR): This type of data recognition is found on items such as surveys, school tests and applications. OMR can only understand yes/no type answers: it detects whether a box or bubble has been marked and converts those marks into data on a device. It is in the process of being replaced by online capture of information.

  • Optical Character Recognition (OCR): Unlike OMR, which can only recognize marks, OCR can recognize printed text, such as the text on shipping labels. This technique is very accurate with Latin-based characters and is still under development for Asian languages. Although it is accurate, it is computationally intensive, so it can be a slow form of data recognition on PCs.

  • Intelligent Character Recognition (ICR)/Handprint recognition: The final two techniques are grouped together because they are the least developed and least accurate forms of data recognition. They can read handwritten data, but they are accurate only when using a limited vocabulary or information that can be matched to a database.
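
Two of these techniques are simple enough to sketch in a few lines of Python. The post doesn't name a barcode symbology, so the first function assumes EAN-13, a common retail barcode whose last digit is a checksum over the other twelve; that built-in check is one reason barcode reading is so reliable. The second function mimics OMR by deciding whether each bubble is marked from the share of dark pixels in its grayscale patch. The bubble labels, the darkness threshold of 128, and the 50% fill cutoff are assumptions, and locating the bubbles on the scanned page is out of scope.

```python
def ean13_is_valid(code: str) -> bool:
    """Check the final (check) digit of an EAN-13 barcode number."""
    if len(code) != 13 or not all(c in "0123456789" for c in code):
        return False
    digits = [int(c) for c in code]
    # The first 12 digits are weighted 1, 3, 1, 3, ... from the left.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def read_bubbles(
    regions: dict[str, list[list[int]]],
    dark: int = 128,
    min_fill: float = 0.5,
) -> dict[str, bool]:
    """Decide whether each bubble is marked from its grayscale patch.

    Pixel values run from 0 (black) to 255 (white); a bubble counts as
    marked when at least `min_fill` of its pixels are darker than `dark`.
    """
    marks = {}
    for label, patch in regions.items():
        pixels = [p for row in patch for p in row]
        if not pixels:
            marks[label] = False  # empty patch: nothing to read
            continue
        dark_share = sum(1 for p in pixels if p < dark) / len(pixels)
        marks[label] = dark_share >= min_fill
    return marks


print(ean13_is_valid("4006381333931"))  # True
print(ean13_is_valid("4006381333932"))  # False (check digit mismatch)
print(read_bubbles({"Q1-yes": [[20, 30], [25, 200]], "Q1-no": [[250, 240], [230, 255]]}))
# {'Q1-yes': True, 'Q1-no': False}
```

Both functions only answer narrow questions (is this code consistent, is this box filled), which is why these techniques are fast and accurate but cannot read a whole document.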

Any questions on how data recognition works for intelligent document understanding? Read more about this topic in our whitepaper.

Nikole