From the recipient's standpoint, the incoming documents are all unstructured or semi-structured. Whether image-based or data (for example, PDF normal), the documents might look similar (all invoices), but the data elements do not contain understandable metatags. Therefore, each document must be looked at and interpreted into a common format that the IT backend procedures can understand. And that is an expensive process.
Capturing data from images using traditional forms processing depends on knowing the specific form layout, so that you can build a template that defines which fields to capture, the rules to use for each field and any cross-field validations. The template also defines the associated output metadata for the fields. Output is usually comma-delimited or XML-tagged, although in some cases it can be electronic data interchange (EDI). The approach works well only when the layout of the forms is the same or where clear identifiers define the format. This has confined forms processing to turnaround documents or regulated forms, such as tax returns, credit card applications and medical claims.
Capturing images for indexing using batch capture software requires the manual insertion of coded batch and document separators between documents; the separators provide the index metadata used for automated retrieval. Release scripts format the images and data for the backend document management (DM) or enterprise content management (ECM) solution.
Capturing data from unstructured, unknown data layouts can use search engines. Those hunt through unstructured text to identify and extract contextually relevant documents and phrases. However, creating understandable metadata for output into business processes requires business-specific rules, which means that the software must understand what the document is.
New intelligent document recognition (IDR) technologies, originally developed for invoice processing and the electronic mailroom, use techniques from each of the above areas and eliminate the limitations.
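
To make the template-based forms processing described above more concrete, here is a minimal sketch in Python. It assumes the page has already been converted to lines of text by OCR. Every name in it (Field, TEMPLATE, PAGE, capture, cross_check, to_csv), the zone coordinates and the sample invoice values are hypothetical illustrations, not taken from the article or from any particular capture product.

```python
import csv
import io
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable


@dataclass
class Field:
    name: str                    # output metadata tag for the field
    line: int                    # row of the page where the field sits
    start: int                   # first column of the field zone
    end: int                     # one past the last column of the zone
    rule: Callable[[str], bool]  # per-field validation rule


def is_amount(value: str) -> bool:
    """Per-field rule: digits, a decimal point, two decimals."""
    return re.fullmatch(r"\d+\.\d{2}", value) is not None


def is_invoice_no(value: str) -> bool:
    """Per-field rule: exactly six digits (an assumed numbering scheme)."""
    return re.fullmatch(r"\d{6}", value) is not None


# Hypothetical template for one known invoice layout.
TEMPLATE = [
    Field("invoice_no", 0, 12, 20, is_invoice_no),
    Field("net", 1, 12, 20, is_amount),
    Field("tax", 2, 12, 20, is_amount),
    Field("total", 3, 12, 20, is_amount),
]

# Hypothetical OCR result: labels padded so values start at column 12.
PAGE = [
    "Invoice no:".ljust(12) + "004711",
    "Net:".ljust(12) + "100.00",
    "Tax:".ljust(12) + "19.00",
    "Total:".ljust(12) + "119.00",
]


def capture(page_lines, template):
    """Read each field from its fixed zone and apply its rule."""
    record = {}
    for field in template:
        value = page_lines[field.line][field.start:field.end].strip()
        if not field.rule(value):
            raise ValueError(f"field {field.name!r} failed its rule: {value!r}")
        record[field.name] = value
    return record


def cross_check(record):
    """Cross-field validation: net plus tax must equal total."""
    return Decimal(record["net"]) + Decimal(record["tax"]) == Decimal(record["total"])


def to_csv(record):
    """Emit the captured fields as comma-delimited output with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    captured = capture(PAGE, TEMPLATE)
    if cross_check(captured):
        print(to_csv(captured), end="")
```

Because every zone is tied to fixed coordinates, shifting each line of PAGE three columns to the right makes capture raise a ValueError. That is the layout dependence that confines this approach to forms whose format is known in advance.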
It is no longer necessary to know what the form layout looks like. It is no longer necessary to insert separators. It is no longer necessary to presort. Specific rules can make the data understandable. IDR has the ability to figure out what the document category is and apply the appropriate business rules. IDR, which is also called intelligent data capture, works a lot more like humans do: it relies on training and an internal knowledge of the layout and content of generic form types, which it uses to understand and extract the required information and to initiate workflows. That widens the types of forms that can be captured and reduces costs. IDR also turns capture into a series of tools that can interpret and extract data from all sorts of unstructured information. IDR capture provides the ability to make sense of, and help manage, the unstructured, untagged information that is coming into the corporation or organization. It can provide the front-end understanding needed to feed business process management (BPM) and business intelligence (BI) applications, as well as traditional accounting and document or records management systems.
Source and full article: www.kmworld.com
