The Business Process for Document Understanding - UiPath Studio Template Explained

Lahiru Fernando
Nov 20, 2022

Updated: Nov 27, 2022

UiPath offers a template that we can use for building Document Understanding workflows. The article explains the workflow concept and how to develop our project on top of it.

Automated document processing usually follows one of two main approaches: attended document automation and unattended document automation. The latest Document Understanding template provides a reliable foundation for attended automation, unattended automation, and testing.

The new Document Understanding Template can be found under the Templates Menu of UiPath Studio.

The UiPath Document Understanding template uses the concept of "One Job per Document" to process documents.

Let's understand the concept and have a look at the detailed process.

Topics Covered in the Article

What is the One Job per Document Concept?
Understanding the Configuration File
Document Understanding Template Explained
Testing in Document Understanding Template

1. The One Job per Document Concept

We can think of many ways to iterate through a list of documents. The most common iterative logic that comes to mind is the introduction of a loop. UiPath has multiple mechanisms to loop through records, such as a For Each, Do While, Parallel For Each, etc. However, these approaches have their pros and cons when it comes to processing documents. Let's have a look at some of those limitations.

For Each - Pros

None that I can think of...

For Each - Cons

Not the best approach when processing a large number of documents
Not suitable for unattended document automation, as every document is processed in a sequential order

Parallel For Each - Pros

Allows parallel processing of documents
Works for both attended and unattended document automation

Parallel For Each - Cons

Exception handling can be complex due to the complex nature of a Parallel For Each.
Not suitable for processing a large number of documents
Tracking the status of a document through logs can be tricky
The entire process fails if the process runs into an unhandled exception with the Parallel For Each
Not possible to scale and use multiple robots to process documents

As we can see, the usual looping approaches have many limitations when processing documents. This is where the concept of One Job per Document comes into the picture.

What is the One Job per Document concept?

The concept focuses on building the automation workflow to process only one document. In other words, the workflow does not contain any activity or logic to loop through records. Instead, it includes steps to capture the available document from either Orchestrator Queues or manual input and process only that given document.

The following options are available in the template to obtain new files:

File Input Option	Execution Method
Introduce a Document collector process (dispatcher) to find all the pending documents and load them into an Orchestrator Queue	Attended and Unattended
Provide a Select File window	Attended
Pass the file path as an input argument on the Job creation page in Orchestrator/Assistant	Attended

But still, even if we load the Queue, how do we trigger the process for each document when the workflow is designed to process only one file?

The Queue Trigger feature of the UiPath Orchestrator plays a significant role in this concept. The Queue-Based Trigger allows us to trigger a Job (Document Understanding Job) for each new item created in the Queue. Each started job takes care of one queue item and runs independently from others.
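The interplay between the dispatcher, the Queue, and the Queue Trigger can be pictured with a small Python simulation. This is only a conceptual sketch and not UiPath code: `QueueItem`, `dispatch`, `process_one_document`, and `run_queue_trigger` are hypothetical names, the `TargetFile` key is an assumed key name, and an in-memory list stands in for an Orchestrator Queue.

```python
from dataclasses import dataclass, field


@dataclass
class QueueItem:
    """Hypothetical stand-in for an Orchestrator queue item."""
    reference: str                      # unique reference, e.g. the file name
    specific_content: dict = field(default_factory=dict)
    status: str = "New"


def dispatch(file_paths):
    """Dispatcher: create one queue item per pending document."""
    return [
        QueueItem(reference=path.split("/")[-1],
                  specific_content={"TargetFile": path})  # assumed key name
        for path in file_paths
    ]


def process_one_document(item):
    """The 'job': handles exactly one document and contains no loop."""
    path = item.specific_content["TargetFile"]
    if not path.lower().endswith(".pdf"):
        raise ValueError(f"Unsupported file type: {path}")
    # Digitize, classify, extract, and export would happen here.


def run_queue_trigger(queue):
    """Simulates the trigger: one independent job per new queue item."""
    for item in queue:
        try:
            process_one_document(item)
            item.status = "Successful"
        except Exception:
            # A failure affects only this item; the other jobs still run.
            item.status = "Failed"


queue = dispatch(["in/invoice_001.pdf", "in/notes.txt", "in/receipt_007.pdf"])
run_queue_trigger(queue)
statuses = {item.reference: item.status for item in queue}
```

The only loop lives in the simulated trigger, which plays the role of Orchestrator. The job itself never iterates, so a bad document fails alone and leaves the other items untouched.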

What's the benefit of using this concept?

Each document is processed independently.
Failed items can be retried automatically through the built-in Retry capabilities of the Queue.
Easy tracking of failures and the cause
Flexibility to process items in Queues and process specific files provided by the user through the same process
Easy to scale up and introduce new robots to process documents in parallel
An ideal approach for processing a large number of records efficiently

One common question about this concept is the number of jobs it creates to process documents. Let's take an example. Say the Queue is loaded with 100 files. The Queue-Based Trigger creates 100 jobs, one for each file. One may think it is hard to track the jobs and know which job processed which document. But we don't need to worry about the number of jobs or about tracking the job status, because each queue item records the status of its own execution. In addition, if you add a unique reference to each queue item (e.g., the file name), you can search for the item using the Reference field and monitor its progress. Similarly, failed and retried items can also be tracked through the Queue. However, if you are troubleshooting an error, you can always download the Global Logs for that specific process from the Orchestrator and look through the detailed logs to trace the exceptions.

2. Understanding the Configuration File

The configuration Excel file used for the Document Understanding template consists of several sheets. Each sheet in the file addresses a specific requirement of the template. In general, the config file contains the "Settings," "Assets," and "Constants" sheets that are common to any process.

Settings sheet in Config File

The Settings sheet holds all the process-specific settings and static configuration information. In general, the Settings sheet includes information such as:

Credential Asset information: Used for holding API Keys and other secure information
Endpoint URLs: Used to hold OCR, Extractor, and Classifier endpoints for each document type
Configuration information about using manual verification (classification/extraction): Specify whether the verification is done 100% manually or based on confidence and validation/business rules
Enable or disable model retraining: Configuration information on whether to skip or perform continuous model training for Classifiers and Extractors

Following is an illustration of the Settings page in the Config file.

The configuration in the Settings sheet also provides information about the Action Catalogs and Storage Buckets. The default configuration is to use one Action Catalog and one Storage Bucket. However, you might encounter scenarios where you need multiple Action Catalogs and storage locations. In that case, it is good practice to add an entry for each of them in the Settings sheet. You can then reference them easily in the workflow through the Config variable. The configuration available in the sheet is the basic template given to us. We can change it easily according to our business requirements. However, the general structure and the items available here should remain the same.

For example, assume you are working on an invoice processing task. The invoices come from multiple countries, and the business has a separate team to process invoices for each country. In such situations, having multiple Action Catalogs to group the tasks in Action Center based on the country helps business users to filter efficiently.

Constants and Assets Sheets in the Config File

These two sheets are the same as in any other config file. The Constants sheet is used mainly for holding specific log messages, whereas the Assets sheet is used to obtain configuration values from the Orchestrator Assets. The same Assets sheet can also be used to overwrite config values mentioned in the Settings sheet. For example, you can have an Asset of type Boolean to hold the configuration for "SkipExtractorTraining." The value in the Asset will overwrite the value provided in the Settings sheet. This approach helps when you want control over the configuration without deploying the process repeatedly.
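The override behaviour described above is essentially a dictionary merge in which Asset values win. The following is a minimal Python sketch of that idea, not the template's actual implementation. `build_config` and the `ExampleStorageBucket` key are hypothetical, while `SkipExtractorTraining` is the setting named above.

```python
def build_config(settings, assets):
    """Merge Settings values with Asset values; Assets win on key clashes."""
    config = dict(settings)   # start from the Settings sheet values
    config.update(assets)     # values fetched from Assets overwrite them
    return config


# Values as they might be read from the Settings sheet (illustrative only).
settings_sheet = {"SkipExtractorTraining": False, "ExampleStorageBucket": "DU_Bucket"}
# Value as it might be fetched from an Orchestrator Asset of type Boolean.
orchestrator_assets = {"SkipExtractorTraining": True}

config = build_config(settings_sheet, orchestrator_assets)
```

Because the merge copies the Settings values first, changing the Asset in Orchestrator changes the effective configuration on the next run without touching the published package.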

New Configuration Sheets for Each Document Type

Introducing new sheets in the config file (apart from previously mentioned items) to address the requirements of different document types has a lot of benefits in controlling the overall flow. Each sheet introduced for a document type contains the following configurations in general:

Mandatory fields (both regular and column-level fields)
Confidence thresholds for each mandatory field (both regular and column-level fields)
Confidence thresholds for non-mandatory fields
Specific log messages to display the results of cross-validation steps
Extra configuration settings specific to the document type

The Document Understanding template provides two sheets with essential information for processing invoices and receipts. It is a good practice to use a similar template for each document type processed in the workflow. The following image is a sample extract from the template.

The names provided for mandatory and non-mandatory fields must match the field names specified in the Taxonomy Manager.

Note: These sheets are primarily used in the validation stage of the workflow.

The information provided in the mandatory, non-mandatory, and confidence settings is used to validate the extraction results through defined business logic.
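To make the role of these settings concrete, here is a minimal Python sketch of how mandatory fields and confidence thresholds could drive a validation decision. It is an illustration under assumptions and not the template's logic: the function name, the field names, and the data shapes are all hypothetical.

```python
def needs_manual_validation(extracted, mandatory, thresholds, default_threshold):
    """Return the reasons why a document should go to a human reviewer.

    extracted:  {field name: (value, confidence)}
    mandatory:  set of field names that must be present and non-empty
    thresholds: {field name: minimum confidence}; other fields use the default
    """
    reasons = []
    for name in sorted(mandatory):
        value, _ = extracted.get(name, (None, 0.0))
        if value in (None, ""):
            reasons.append(f"{name}: mandatory field is missing")
    for name, (value, confidence) in extracted.items():
        if value in (None, ""):
            continue  # nothing was extracted, so there is no confidence to check
        minimum = thresholds.get(name, default_threshold)
        if confidence < minimum:
            reasons.append(f"{name}: confidence {confidence:.2f} below {minimum:.2f}")
    return reasons


extracted = {
    "invoice-no": ("INV-1", 0.95),
    "total": ("100.00", 0.60),
    "po-no": ("", 0.0),          # optional field that was not found
}
reasons = needs_manual_validation(
    extracted,
    mandatory={"invoice-no", "total", "date"},
    thresholds={"invoice-no": 0.80, "total": 0.90},
    default_threshold=0.50,
)
```

An empty list means the document can continue without human review. Any entry in the list is a reason to create a validation task.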

3. The Document Understanding Template Framework Explained

The template provided by UiPath supports both attended and unattended automation. However, there is no significant difference between the two apart from how we validate the information. The attended document automation uses the Present Classification/ Validation Station for manual verification, whereas the unattended version uses Action Center for Classification/ Validation tasks.

One of the most significant advantages of this architecture is that the process can execute in both attended and unattended modes because the same workflows are used in both methods.

The diagram below illustrates the general steps in the Document Understanding unattended template.

Document Understanding Unattended Framework Template

The framework consists of the following stages for both attended and unattended versions.

1 - Initialize

The initialization of the process includes reading configuration information from the Config file and obtaining the data from Assets. The initialization is done through two workflows ("ReadConfigFile.XAML" and "10_InitializeProcess.XAML").

You can use the "10_InitializeProcess.XAML" file to add any extra initialization steps required according to your business scenario.

2 - Get Transaction Data

3 - Digitize

4 - Classification

5 - Classification Rules Check

6 - Validate and Train Classifiers

7 - Extraction

8 - Extraction Rules Check

9 - Validate and Train Extractors

10 - Export

The template provides a lot of reusable workflows across attended and unattended document automation. In addition, the architecture also supports building multiple test cases to test each step described above.
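The order of the stages can be summarized in a few lines of Python. This is a conceptual sketch only. It assumes that human validation runs only when the rules check fails, which is one of the configurable behaviours mentioned earlier. The function and parameter names are hypothetical, and the training part of the validation stages is left out.

```python
def run_document(document, rules_pass, ask_human):
    """Walk one document through the stages and return the stages executed."""
    executed = ["Initialize", "Get Transaction Data", "Digitize"]
    for kind in ("Classification", "Extraction"):
        executed.append(kind)
        executed.append(f"{kind} Rules Check")
        if not rules_pass(kind, document):
            # Attended: validation window; unattended: Action Center task.
            ask_human(kind, document)
            executed.append(f"Validate {kind}")
    executed.append("Export")
    return executed


reviewed = []
stages = run_document(
    "invoice_001.pdf",
    rules_pass=lambda kind, doc: kind == "Classification",  # extraction fails its rules
    ask_human=lambda kind, doc: reviewed.append(kind),
)
```

Passing `ask_human` as a parameter mirrors the point made above: the same flow serves attended and unattended runs, and only the validation step differs.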

4. Testing in Document Understanding Template

The new Document Understanding template offers a list of test workflows to test our solution. The test cases are built around specific scenarios applicable to each functionality, such as obtaining transaction items, digitizing, classifying, extracting, and exporting. The test functionality also allows using previously cached data to compare against the newly generated data and test the output of the intended function for any anomalies.

The cached data refers to the output data generated by the digitizing, classification, or extraction activities and stored in text, JSON, or Excel format for testing purposes. The cached data files always contain the expected outcome.
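The idea of testing against cached data can be shown with a short Python sketch: store a known-good result as JSON once, then compare every later run against it. The helper names, file name, and data shape are hypothetical and are not taken from the template.

```python
import json
import tempfile
from pathlib import Path


def write_cache(path, data):
    """Store the expected (verified) output as a JSON file."""
    Path(path).write_text(json.dumps(data, sort_keys=True, indent=2), encoding="utf-8")


def matches_cache(path, actual):
    """Return True when the new output equals the cached expectation."""
    expected = json.loads(Path(path).read_text(encoding="utf-8"))
    return expected == actual


with tempfile.TemporaryDirectory() as folder:
    cache_file = Path(folder) / "invoice_001_classification.json"
    write_cache(cache_file, {"DocumentType": "Invoice", "Pages": 2})
    # Key order does not matter; only the content is compared.
    same = matches_cache(cache_file, {"Pages": 2, "DocumentType": "Invoice"})
    changed = matches_cache(cache_file, {"DocumentType": "Receipt", "Pages": 2})
```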

UiPath uses the "UiPath.Testing.Activities" activity package to perform testing.

To better understand, let's look at a few examples.


Example 1: Digitization Test

Example 2: Classification Business Rule Confidence Test

Having an understanding of the test scenario examples, let's look at how the cached data is generated and the different test cases available in the DU template.

Generating Cache Data for Testing

Generating the required cached data for the test is easy. The template offers two workflow files where you can submit the documents to generate the cached data.

The following figure illustrates the two workflows available for generating the data.

All the test cases are located inside the Test Folder in the template. The "CacheDUData.XAML" is the main file used for cached data generation. The workflow expects the following arguments.

Argument	Description
Target File	The file path of the target document used for generating cached data
Document Text	The Document Text output generated from the Digitize Document activity
Document Object Model	The Document Object Model output generated from the Digitize Document activity
Classification Results Array	The output of the Classify Document Scope
Extraction Results Index	Refers to the target index of the classification results array. Indicate which Classification Result Object in the array is used for data extraction.
Extraction Results	The output of the Data Extraction Scope or the Validation Station/ Action containing confirmed and accurate data

You can use a separate workflow to generate the above data and invoke the workflow by passing the input arguments. The data passed to "CacheDUData.XAML" is sent to "WriteDatatoCache.XAML" to generate the text/ JSON files containing the cache data.

Let's understand the different test cases available in the Document Understanding template.

Retrieving Transaction Data

The template offers three test workflows for retrieving and testing the transaction data.

Test for no available queue item
Test for queue items without Specific Content (test if the queue item is empty)
Test the data mentioned under Specific Content and validate if it contains the Target File Key used to hold the file path
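The three scenarios above can be expressed as three guard clauses. The Python sketch below is an assumption-based illustration: the dictionary layout and the `TargetFile` key name are hypothetical stand-ins for a queue item and its Target File key.

```python
def get_target_file(queue_item, key="TargetFile"):
    """Return the file path from a queue item, or raise a descriptive error."""
    if queue_item is None:
        raise LookupError("No queue item available")
    content = queue_item.get("SpecificContent") or {}
    if not content:
        raise ValueError("Queue item has no Specific Content")
    if not content.get(key):
        raise KeyError(f"Specific Content is missing '{key}'")
    return content[key]


def outcome(queue_item):
    """Small helper for the demonstration: report the result or the error type."""
    try:
        return get_target_file(queue_item)
    except (LookupError, ValueError) as error:
        return type(error).__name__


results = [
    outcome(None),                                       # no queue item
    outcome({"SpecificContent": {}}),                    # empty Specific Content
    outcome({"SpecificContent": {"Other": "x"}}),        # key is missing
    outcome({"SpecificContent": {"TargetFile": "in/invoice_001.pdf"}}),
]
```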

Digitize

The template offers a test workflow to test how the digitization functionality works for given documents. This test case comes in handy when you have files you would like to test and compare the digitized output with previously cached output results.

Classification

The template offers a test case to test the classification of a document against cached classification data and compare for differences. The test case performs the following tests against the cached output.

Check that the actual classification contains the same number of classification results as the cached result (compare the length of the ClassificationResult array)
Check that each classification result in the array equals the result at the same position in the cached result
Compare the document bounds of each classification result against the cached result.

The tests help identify anomalies in the classification and ensure accurate results in future updates.
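The three comparisons can be sketched in Python as follows. The `Classification` class is a simplified, hypothetical stand-in for a classification result, with the document bounds reduced to a start page and a page count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """Simplified stand-in for one classification result."""
    document_type: str
    start_page: int
    page_count: int


def compare_classifications(actual, cached):
    """Return a list of differences; an empty list means the test passes."""
    differences = []
    if len(actual) != len(cached):
        differences.append(f"result count: expected {len(cached)}, got {len(actual)}")
    for index, (got, expected) in enumerate(zip(actual, cached)):
        if got.document_type != expected.document_type:
            differences.append(
                f"[{index}] type: expected {expected.document_type}, got {got.document_type}"
            )
        if (got.start_page, got.page_count) != (expected.start_page, expected.page_count):
            differences.append(f"[{index}] document bounds differ")
    return differences


cached = [Classification("Invoice", 0, 2), Classification("Receipt", 2, 1)]
actual = [Classification("Invoice", 0, 2), Classification("Invoice", 2, 1)]
differences = compare_classifications(actual, cached)
```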

Classification Validation

Compares the classification result against the expected result provided by the user as an input argument. The use of cached data in this test case is slightly different than the others. The cached data could be the previously generated data of the test file. However, if the cached data is not provided, the process will use the actual Digitization and Classification steps to generate the data required for the test. The test case compares the result generated by the classification validation steps mentioned in "35_ClassificationBusinessRuleValidation.XAML" against the expected output and returns it as the test result.

This test case helps test the classification validation logic against different scenarios.

Extraction

Extracting accurate data from files is very important in Document Understanding projects. However, sometimes we face scenarios where the extractors give us low confidence for specific fields. We can use many methods to improve the accuracy of the model. However, testing the accuracy can be a painful task. The data extraction test case provided in the template helps address this. It compares the output of the extractors against previously cached data.

The cached data always contain accurate information for the test case to compare with. The comparison includes the following tests:

Compares the actual document type against the cached
Compares the values of simple fields to ensure they match the cached data
Compares the values of column fields (all rows and columns)

The test result will indicate whether each field matches the cached data.

This test case does not perform the extraction data validation rules.

This test is helpful when you want to test the extractors to ensure they generate the expected results.
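A field-by-field comparison of this kind could look like the Python sketch below. The data layout (a document type, a dictionary of simple fields, and tables as lists of rows) and all field names are assumptions made for illustration.

```python
def compare_extraction(actual, cached):
    """Return {check name: True/False}, one entry per comparison."""
    results = {"DocumentType": actual["document_type"] == cached["document_type"]}
    for name, expected in cached["fields"].items():
        results[name] = actual["fields"].get(name) == expected
    for name, expected_rows in cached["tables"].items():
        # Every row and every column must match.
        results[name] = actual["tables"].get(name) == expected_rows
    return results


cached = {
    "document_type": "Invoice",
    "fields": {"invoice-no": "INV-1", "total": "108.25"},
    "tables": {"items": [{"description": "Paper", "amount": "100.00"}]},
}
actual = {
    "document_type": "Invoice",
    "fields": {"invoice-no": "INV-1", "total": "103.25"},   # misread digit
    "tables": {"items": [{"description": "Paper", "amount": "100.00"}]},
}
results = compare_extraction(actual, cached)
```

Only values are compared here. Confidence levels and business rules are deliberately left out, in line with the note that this test does not run the extraction validation rules.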

Extraction Validation

Extracted data validation determines the need for manual verification based on the defined rules. The manual verification is mainly triggered based on two aspects.

Low confidence levels
Inaccurate values

The confidence levels are usually checked against the thresholds provided for each field. The inaccurate values are checked by either cross-checking with other extracted values or trying to parse the values.
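The two kinds of accuracy checks, parsing a value and cross-checking it against other extracted values, can be sketched in Python as shown below. The date format and the "net plus tax equals total" rule are illustrative assumptions and are not rules taken from the template.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def parses_as_date(text, fmt="%Y-%m-%d"):
    """Parsing check: does the extracted text form a valid date?"""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def totals_agree(net, tax, total):
    """Cross-check: do the extracted amounts add up?"""
    try:
        return Decimal(net) + Decimal(tax) == Decimal(total)
    except InvalidOperation:
        return False  # at least one amount is not a valid number


checks = {
    "date parses": parses_as_date("2022-11-20"),
    "bad date parses": parses_as_date("20/11/2022"),
    "amounts add up": totals_agree("100.00", "8.25", "108.25"),
    "wrong total adds up": totals_agree("100.00", "8.25", "110.00"),
    "malformed amount adds up": totals_agree("100,00", "8.25", "108.25"),
}
```

Any check that returns False is a reason to send the document for manual verification, even when the confidence of every field is high.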

The template has provided many mock test files to test each out-of-the-box invoice and receipt taxonomy field. The mock test covers the confidence as well as data accuracy tests. The templates available can also be easily modified for extra fields you may add.

Read the UiPath Documentation on Mock Tests to get to know more...

The following screenshot illustrates the different field validation test cases available in the template.

These test cases are very helpful when you have to test your validation logic built for each field to ensure you get the expected result stating the need for manual verification. Further, it also ensures the validation logic passes the accurate data to downstream applications.

Export

Testing the export results ensures the Excel file contains all the expected data. The test case compares the cached Excel file (the Excel file containing the expected export result and structure) against the Excel file generated through the actual Export function of the template. The following tasks are performed in this test case.

Compare the number of sheets generated in the file against the cached Excel file.
Compare the data in each sheet against the data in the cached Excel file (comparison is made by converting the data of the sheet to a string and comparing it)
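The two steps can be illustrated in Python without an Excel library by modelling a workbook as a dictionary that maps sheet names to rows. This representation and the function names are assumptions for the sketch; the template itself works on real Excel files.

```python
def sheet_as_string(rows):
    """Flatten a sheet into one string so two sheets can be compared directly."""
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)


def compare_workbooks(actual, cached):
    """Return a list of problems; an empty list means the export matches."""
    if len(actual) != len(cached):
        return [f"sheet count: expected {len(cached)}, got {len(actual)}"]
    problems = []
    for name, expected_rows in cached.items():
        if name not in actual:
            problems.append(f"sheet '{name}' is missing")
        elif sheet_as_string(actual[name]) != sheet_as_string(expected_rows):
            problems.append(f"sheet '{name}': data differs")
    return problems


cached = {
    "Simple Fields": [["Field", "Value"], ["total", "108.25"]],
    "Items": [["description", "amount"], ["Paper", "100.00"]],
}
actual = {
    "Simple Fields": [["Field", "Value"], ["total", "108.25"]],
    "Items": [["description", "amount"], ["Paper", "10.00"]],
}
problems = compare_workbooks(actual, cached)
```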

All the test cases mentioned above enable developers and testers to test the Document Understanding workflow efficiently. The test cases provided cover the default out-of-the-box functions for invoices and receipts. However, they can be configured easily to incorporate more document types and fields.

Conclusion

UiPath Document Understanding is one of the best tools to automate manual and complex document processing activities. There are many challenges in processing documents, including business rules, document volume, layouts, and validations. Using a proper base for the solution workflow plays a significant role in implementing reliable, efficient, and easy-to-manage solutions. The article explains the concept behind the latest Document Understanding template offered by UiPath and how we can use it to build our document processing solutions.