About OCR Form Processing

Form Processing 1.9.0.32 or later (October 2016)

Prerequisites:

  • Synergize Process Server

  • Synergize Server

The Synergize Form Processing module lets you process documents and forms to extract data and, optionally, save the original form into a Synergize repository. The module provides an indexing solution for automated identification of, and data capture from, forms and documents.

The goal of Form Processing is to reduce manual indexing, not to eliminate it. Form Processing is too often perceived as a "magic bullet" that removes the need for employees to do any indexing work. This leads to unrealistic expectations: optical character recognition (OCR) isn't flawless, so reviewing and correcting misread documents is generally part of the solution.

Form Processing includes an optional tool called Form Indexing that streamlines this review-and-correction step.

 

Common Reasons for Using Form Processing

Form Processing is used in the following scenarios:

Identification of Documents (No Data Capture)

This is a very common case: indexing data is supplied for a set of related documents (usually in the filename, in barcodes, or in a metadata file), but the identity of each document within the set is unknown.

Form Processing identifies the individual pages (often "backup" or "supporting" documents). When they are indexed into Synergize as separate documents, they are all saved with the same metadata.
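As a rough illustration of this scenario, the Python sketch below takes batch-level metadata from a filename and attaches it to every identified page, so each resulting document differs only in its document type. The `CUSTOMERID_BATCHDATE.pdf` naming convention, the field names, and both functions are hypothetical; they are not part of Synergize or Form Processing.

```python
from pathlib import Path
from typing import Dict, List


def metadata_from_filename(filename: str) -> Dict[str, str]:
    """Parse batch metadata from a hypothetical 'CUSTOMERID_BATCHDATE.pdf' filename."""
    # Split on the first underscore only; everything after it is the batch date.
    customer_id, batch_date = Path(filename).stem.split("_", 1)
    return {"customer_id": customer_id, "batch_date": batch_date}


def build_documents(filename: str, page_types: List[str]) -> List[Dict[str, str]]:
    """Create one document per identified page, each carrying the same batch metadata."""
    shared = metadata_from_filename(filename)
    # Only the document type differs between the resulting documents.
    return [{**shared, "document_type": doc_type} for doc_type in page_types]
```

For example, `build_documents("CUST123_20161015.pdf", ["Invoice", "Receipt"])` yields two documents that both carry `customer_id` `CUST123` and `batch_date` `20161015`, one typed as `Invoice` and one as `Receipt`.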

 

Identification and Data Extraction of Single Pages

The documents are mostly single-page forms, so there is usually no need to split them.

 

Identification and Data Extraction of Multipage Documents That Are in Their Own Files

This is a less common use case, but whenever possible, it's advantageous to have the documents pre-split so that we don't need Form Processing to split them. Splitting in Form Processing can fail if the splitting values on a document's pages aren't read correctly.

Sometimes documents arrive pre-split; other times, we can split a batch by barcode before sending it to OCR.
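To show why a misread splitting value is a problem, here is a minimal Python sketch of splitting a batch by barcode, assuming that any page carrying a barcode starts a new document. The `Page` record and `split_by_barcode` function are hypothetical stand-ins for whatever a barcode reader produces; they are not a Synergize or Form Processing API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical page record: a page number plus any barcode read from it."""
    number: int
    barcode: Optional[str] = None  # None when no barcode was read on the page


def split_by_barcode(pages: List[Page]) -> List[List[Page]]:
    """Group a batch into documents; each page with a barcode starts a new document."""
    documents: List[List[Page]] = []
    for page in pages:
        if page.barcode is not None or not documents:
            # A barcode marks the first page of a new document. Leading pages
            # without a barcode still get a document of their own.
            documents.append([page])
        else:
            # No barcode: the page belongs to the current document.
            documents[-1].append(page)
    return documents
```

With pages 1 and 3 carrying barcodes, a four-page batch splits into pages 1-2 and pages 3-4. If the barcode on page 3 is not read, all four pages silently end up in a single document, which is the kind of splitting failure described above.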

 

Identification, Data Extraction, and Splitting of Documents That Are in Batches

This is the most complex situation, but it is very common.

Form Processing has many options and strategies for splitting documents and for handling unknown pages within a Batch.

Form Indexing is usually recommended for this scenario because it makes it easy to fix documents that failed to split correctly.

 

Common Review and Correction Scenarios

To review and correct scanned documents, Form Processing is used with one of the following strategies:

 

Use Synergize Workflow

This is the most common strategy for situations where Form Processing does not have to split documents.

Documents that can't be identified, or whose data can't be extracted, are sent to a dedicated workflow queue, where users index them manually.

Optionally, identified documents can be sent there (or to a second workflow queue) to be reviewed as well.
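The routing decision described above can be sketched in Python as follows. This is an illustration only: the `Route` destinations, the `Result` fields, and the `review_identified` switch are hypothetical names, and the sketch assumes the optional review uses a second queue rather than the manual-indexing queue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    """Hypothetical destinations for a processed document."""
    MANUAL_INDEXING = "manual indexing queue"
    REVIEW = "review queue"
    REPOSITORY = "repository"


@dataclass
class Result:
    """Hypothetical summary of what Form Processing produced for one document."""
    doc_type: Optional[str]  # None when the document could not be identified
    extraction_ok: bool      # False when required fields could not be extracted


def route(result: Result, review_identified: bool = False) -> Route:
    """Decide where a document goes after Form Processing."""
    if result.doc_type is None or not result.extraction_ok:
        # Unidentified or unextracted documents are indexed manually.
        return Route.MANUAL_INDEXING
    if review_identified:
        # Optional: identified documents are reviewed as well.
        return Route.REVIEW
    # Otherwise the identified document is saved without review.
    return Route.REPOSITORY
```

Turning `review_identified` on trades extra reviewing effort for confidence that identified documents were also read correctly.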

 

Use a Lookup to Verify Data

The data extracted from a document is checked against a database to confirm that it has been read correctly.

For example, suppose the order number (key field) and shipping weight (secondary field) of each order are known ahead of time, and the order number is the only essential piece of information the client needs to extract.

Once both values are read, they are looked up in the database. If BOTH match a single record (order), the read is considered successful and doesn't need to be reviewed.

The key point is that a secondary piece of information that is unique (or mostly unique) to the key field (order number) should be used to verify that the key field has been read correctly.

Verifying only the key field, such as the order number, has been implemented several times, but it isn't enough to verify the data.

An order number existing in the database doesn't mean it's the one on the page: a misread can turn one valid order number into another valid order number.

This strategy is rarely used, but it is a great way to avoid manual review when suitable data is available.
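The difference between checking only the key field and checking both fields can be sketched in Python. The in-memory `ORDERS` list stands in for the client's database, and the record layout and function names are hypothetical; values are compared as text because that is what OCR returns.

```python
from typing import List, NamedTuple


class Order(NamedTuple):
    """Hypothetical order record: a key field plus a secondary field."""
    number: str           # key field (order number)
    shipping_weight: str  # secondary field, compared as OCR text


# Hypothetical stand-in for the client's order database.
ORDERS: List[Order] = [
    Order("100234", "12.5"),
    Order("100284", "3.0"),
    Order("100871", "12.5"),
]


def key_only_check(number: str) -> bool:
    """Weak check: only confirms that the order number exists somewhere."""
    return any(order.number == number for order in ORDERS)


def verify_read(number: str, weight: str) -> bool:
    """Strong check: accept only if BOTH values match exactly one record."""
    matches = [
        order for order in ORDERS
        if order.number == number and order.shipping_weight == weight
    ]
    return len(matches) == 1
```

Suppose a page shows order `100284` with a weight of `3.0`, but OCR misreads the order number as `100234`. `key_only_check("100234")` returns `True`, so the wrong order would be accepted, while `verify_read("100234", "3.0")` returns `False` and the document goes to review. The correct read, `verify_read("100284", "3.0")`, returns `True`.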

 

Use Form Indexing

This is the preferred method when dealing with Batches and splitting.

 

Don't Review (Unconditionally Accept)

This is not recommended, but sometimes the information is not essential and the client doesn't mind if it's not always correct.

This is most frequently used to assign document types to supporting documents.

Because most of their metadata is the same, the documents stay together, and a mistyped supporting document can be corrected on the fly.

 

Form Processing Components

The following pieces make up the Form Processing solution:

Form Processing Designer

The Designer is a tool for visually designing how Form Processing identifies documents, what data it extracts from each page, and how that information maps to Synergize documents.

 

Synergize Process Server (SPS) Actions

A collection of custom SPS actions for Form Processing, along with the stock SPS actions, lets you process individual documents and batches, interact with the Indexing Repository, interrogate OCR results, and save data to a Synergize repository.

 

Form Indexing

The Form Indexing application helps with reviewing and correcting documents that have been processed by Form Processing.

 

Form Processing Engines

Form Processing can use several OCR (Optical Character Recognition) and OMR (Optical Mark Recognition [check boxes, radio buttons, etc.]) engines when processing documents.

Many have been used over the years, but only the following are still supported:

1. Nicomsoft: a royalty-free OCR engine that is always present alongside Form Processing. Because other SPS actions also use Nicomsoft, it has its own installer. However, Form Processing requires it to run, and the Form Processing install package will not proceed if it detects that Nicomsoft is not installed. Instead, the installation instructs you to install the Nicomsoft engine first.

2. Nuance: the current OCR engine of choice. Its results have improved drastically in recent releases, thanks to its implementation of auto-zoning. If a client is getting poor results with an older install of Form Processing that uses Nuance, an upgrade is recommended.

3. Microdea: an in-house engine that performs OMR only. It was developed when Nuance's OMR results were deemed unacceptable.