The Business Process for Document Understanding - UiPath Studio Template Explained
Updated: Nov 27, 2022
UiPath offers a template that we can use for building Document Understanding workflows. The article explains the workflow concept and how to develop our project on top of it.
Automated document processing usually takes one of two main approaches: attended document automation and unattended document automation. The latest Document Understanding template provides a reliable foundation for attended automation, unattended automation, and testing.
The new Document Understanding Template can be found under the Templates Menu of UiPath Studio.
The UiPath Document Understanding template uses the concept of "One Job per Document" to process documents.
Let's understand the concept and have a look at the detailed process.
1. The One Job per Document Concept
We can think of many ways to iterate through a list of documents. The most common iterative logic that comes to mind is the introduction of a loop. UiPath has multiple mechanisms to loop through records, such as a For Each, Do While, Parallel For Each, etc. However, these approaches have their pros and cons when it comes to processing documents. Let's have a look at some of those limitations.
For Each - Pros
None that I can think of...
For Each - Cons
Not the best approach when processing a large number of documents
Not suitable for unattended document automation, as every document is processed in a sequential order
Parallel For Each - Pros
Allows parallel processing of documents
Works for both attended and unattended document automation
Parallel For Each - Cons
Exception handling can be complex because several iterations of a Parallel For Each run at the same time.
Not suitable for processing a large number of documents
Tracking the status of a document through logs can be tricky
The entire process fails if the process runs into an unhandled exception with the Parallel For Each
Not possible to scale and use multiple robots to process documents
As we can see, the usual looping approaches have many limitations when processing documents. This is where the concept of One Job per Document comes into the picture.
What is the One Job per Document concept?
The concept focuses on building the automation workflow to process only one document. In other words, the workflow does not contain any activity or logic to loop through records. Instead, it includes steps to capture the available document from either Orchestrator Queues or manual input and process only that given document.
The following options are available in the template to obtain new files:
File Input Option
Introduce a Document collector process (dispatcher) to find all the pending documents and load them into an Orchestrator Queue
Attended and Unattended
Provide a Select File window
Provide an input argument that carries the file path, set on the Job creation page in Orchestrator or Assistant
But still, even if we load the Queue, how do we trigger the process for each document when the workflow is designed to process only one file?
The Queue Trigger feature of the UiPath Orchestrator plays a significant role in this concept. The Queue-Based Trigger allows us to trigger a Job (Document Understanding Job) for each new item created in the Queue. Each started job takes care of one queue item and runs independently from others.
What's the benefit of using this concept?
Each document is processed independently.
Failed items can be retried automatically through the built-in Retry capabilities of the Queue.
Easy tracking of failures and the cause
Flexibility to process items in Queues and process specific files provided by the user through the same process
Easy to scale up and introduce new robots to process documents in parallel
An ideal approach for processing a large number of records efficiently
One common question about this concept is the number of jobs it creates to process documents. Let's take an example. Say the Queue is loaded with 100 files. The Queue-Based Trigger creates 100 jobs, one for each file. One may think it is hard to track the jobs and know which job processed which document. But we don't need to worry about the number of jobs or about tracking the job status, because each queue item records the status of its own execution. In addition, if you add a unique reference to each queue item (e.g., the file name), you can search for the item using the Reference field and monitor its progress. Failed and retried items can be tracked through the Queue in the same way. If you are troubleshooting an error, you can always download the logs for that specific process from the Orchestrator and look through them to trace the exceptions.
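To make the idea concrete outside of UiPath, here is a minimal Python sketch. It is not Orchestrator code: the queue, the `process_document` function, the status names, and the retry limit are all hypothetical stand-ins. It only illustrates that each item is handled by its own job, that a failure stays isolated to that item, and that an item can be looked up again by its reference.

```python
from dataclasses import dataclass

MAX_RETRIES = 1  # assumed retry limit, comparable to a retry setting on the queue


@dataclass
class QueueItem:
    reference: str   # unique reference, e.g. the file name
    file_path: str
    status: str = "New"
    attempts: int = 0
    error: str = ""


def process_document(file_path: str) -> None:
    """Hypothetical single-document workflow: it never loops over documents."""
    if file_path.endswith(".corrupt"):
        raise ValueError("cannot digitize " + file_path)


def run_job(item: QueueItem) -> None:
    """One job handles exactly one queue item and records the outcome on it."""
    item.attempts += 1
    try:
        process_document(item.file_path)
        item.status = "Successful"
    except Exception as exc:  # the failure affects only this item
        item.error = str(exc)
        item.status = "Retry" if item.attempts <= MAX_RETRIES else "Failed"


def trigger_jobs(queue: list) -> None:
    """Stand-in for a queue trigger: start one job per pending item."""
    while any(item.status in ("New", "Retry") for item in queue):
        for item in queue:
            if item.status in ("New", "Retry"):
                run_job(item)


queue = [
    QueueItem("invoice_001.pdf", "in/invoice_001.pdf"),
    QueueItem("invoice_002.corrupt", "in/invoice_002.corrupt"),
    QueueItem("invoice_003.pdf", "in/invoice_003.pdf"),
]
trigger_jobs(queue)

# Tracking by reference instead of by job
by_reference = {item.reference: item for item in queue}
for item in queue:
    print(item.reference, item.status, item.attempts, item.error)
```

The failing file is retried once and then marked as failed, while the other two documents finish normally.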
2. Understanding the Configuration File
The configuration Excel file used for the Document Understanding template consists of several sheets. Each sheet in the file addresses a specific requirement of the template. In general, the config file contains the "Settings," "Assets," and "Constants" sheets that are common to any process.
Settings sheet in Config File
The Settings sheet holds all the process-specific settings and static configuration information. In general, the Settings sheet includes information such as:
Credential Asset information: Used for holding API Keys and other secure information
Endpoint URLs: Used to hold OCR, Extractor, and Classifier endpoints for each document type
Configuration information about using manual verification (classification/extraction): Specifies whether verification is always done manually or is based on confidence and validation/business rules
Enable or disable model retraining: Configuration information on whether to skip or perform continuous model training for Classifiers and Extractors
Following is an illustration of the Settings page in the Config file.
The configuration in the Settings sheet also provides information about the Action Catalogs and Storage Buckets. The default configuration uses one Action Catalog and one Storage Bucket. However, you might encounter scenarios where you need multiple Action Catalogs and storage locations. In that case, it is good practice to add an entry for each of them to the Settings sheet, so the workflow can read them through the Config variable. The configuration available in the sheet is the basic template given to us, and we can change it according to our business requirements. However, the general structure and the items available here should remain the same.
For example, assume you are working on an invoice processing task. The invoices come from multiple countries, and the business has a separate team to process invoices for each country. In such situations, having multiple Action Catalogs to group the tasks in Action Center based on the country helps business users to filter efficiently.
Constants and Assets Sheets in the Config File
These two sheets are the same as in any other config file. The Constants sheet is used mainly for specific log messages, whereas the Assets sheet is used to obtain configuration values from the Orchestrator Assets. The same Assets sheet can also be used to overwrite config values mentioned in the Settings sheet. For example, you can have an Asset of type Boolean to hold the configuration for "SkipExtractorTraining." The value in the Asset will overwrite the value provided in the Settings sheet. This approach helps when you want control over the configuration without redeploying the process each time.
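The override order can be pictured with a small Python sketch. It assumes the three sheets have already been read into dictionaries and that the Asset values were already fetched from Orchestrator. The function name `build_config`, the log message key, and the rule that an unreadable Asset keeps the sheet value are assumptions of this sketch, not the template's actual implementation.

```python
def build_config(settings: dict, constants: dict, asset_values: dict) -> dict:
    """Merge the sheets into one dictionary; Asset values are applied last."""
    config = {**settings, **constants}
    for name, value in asset_values.items():
        if value is not None:  # assumption: an Asset that could not be read keeps the sheet value
            config[name] = value
    return config


settings = {"SkipExtractorTraining": True, "AlwaysValidateExtraction": False}
constants = {"LogMessage_Start": "Processing started"}  # hypothetical log message key

# Hypothetical values already fetched from Orchestrator Assets
asset_values = {"SkipExtractorTraining": False, "AlwaysValidateExtraction": None}

config = build_config(settings, constants, asset_values)
print(config)
```

Changing the Asset in Orchestrator then changes the behavior of the next run without touching the Excel file or the package.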
New Configuration Sheets for Each Document Type
Introducing new sheets in the config file (apart from previously mentioned items) to address the requirements of different document types has a lot of benefits in controlling the overall flow. Each sheet introduced for a document type contains the following configurations in general:
Mandatory fields (both regular and column-level fields)
Confidence thresholds for each mandatory field (both regular and column-level fields)
Confidence thresholds for non-mandatory fields
Specific log messages to display the results of cross-validation steps
Extra configuration settings specific to the document type
The Document Understanding template provides two sheets with essential information for processing invoices and receipts. It is a good practice to use a similar template for each document type processed in the workflow. The following image is a sample extract from the template.
The names provided for mandatory and non-mandatory fields should match the names specified in the Taxonomy Manager.
Note: These sheets are primarily used in the validation stage of the workflow.
The information provided in the mandatory, non-mandatory, and confidence settings is used to validate the extraction results through defined business logic.
3. The Document Understanding Template Framework Explained
The template provided by UiPath supports both attended and unattended automation. However, there is no significant difference between the two apart from how we validate the information. The attended document automation uses the Present Classification Station and Present Validation Station for manual verification, whereas the unattended version uses Action Center for classification and validation tasks.
One of the most significant advantages of this architecture is that the process can execute in both attended and unattended modes because the same workflows are used in both methods.
The diagram below illustrates the general steps in the Document Understanding unattended template.
The framework consists of the following stages for both attended and unattended versions.
1 - Initialize
The initialization of the process includes reading configuration information from the Config file and obtaining the data from Assets. The initialization is done through two workflows ("ReadConfigFile.XAML" and "10_InitializeProcess.XAML").
You can use the "10_InitializeProcess.XAML" file to add any extra initialization steps required according to your business scenario.
2 - Get Transaction Data
Similar to the Robotic Enterprise Framework, the Transaction Data is obtained from an Orchestrator Queue. However, the Document Understanding framework requires a specific name for the Specific Content entry in the Queue that holds the file path. The "TargetFileKey" setting in the Settings sheet of the configuration file provides that name. This workflow is executed only when the user has not provided a file through the other input options and the "UseQueue" option is enabled in the input arguments of the Main workflow.
This workflow remains mostly the same because the original design is to use the Queue to support the Job per Document concept.
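The decision described above can be sketched in Python as follows. The queue item is modeled as a plain dictionary, and the function name, the argument names, and the value "TargetFile" for the "TargetFileKey" setting are assumptions made for illustration. Only the "TargetFileKey" and "UseQueue" names come from the template.

```python
from typing import Optional


def resolve_target_file(in_target_file: str, use_queue: bool,
                        queue_item: Optional[dict], config: dict) -> Optional[str]:
    """Return the path of the one document this job should process."""
    # A file supplied by the user takes priority over the Queue
    if in_target_file:
        return in_target_file
    # Without the UseQueue option, or without a pending item, there is nothing to do
    if not use_queue or queue_item is None:
        return None
    specific_content = queue_item.get("SpecificContent") or {}
    key = config["TargetFileKey"]
    if key not in specific_content:
        raise KeyError(f"queue item has no '{key}' entry in Specific Content")
    return specific_content[key]


config = {"TargetFileKey": "TargetFile"}  # assumed value of the setting
item = {"Reference": "invoice_001.pdf",
        "SpecificContent": {"TargetFile": "in/invoice_001.pdf"}}

print(resolve_target_file("", True, item, config))
```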
3 - Digitize
The document is converted to a machine-readable format in this stage. The template provides a structure for the digitization part. However, it also has a provision to include custom pre-digitization steps. The pre-digitization steps depend on the business scenarios and the quality of the documents. Usually, pre-digitization includes tasks such as grayscale actions.
Example: The "UiPathTeam.ImageActivities.Activities" package provides grayscaling and other functions for image files.
Further, you can configure this workflow to use multiple OCR engines to improve the quality and reliability of data extraction. Consider adding logic that decides which OCR engine to use for digitization, and wrap that logic in a Try-Catch activity inside the Retry Scope.
The video in the link explains how to select the best OCR engines for the business case.
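One possible shape for the OCR engine selection mentioned in this stage is a simple fallback chain with retries. The Python sketch below assumes each engine is a function that takes a file path and returns text. The engine names, the retry count, and the "empty text counts as a failure" rule are hypothetical choices and not part of the template.

```python
def digitize_with_fallback(file_path: str, engines: list, attempts_per_engine: int = 2):
    """Try each OCR engine in order; retry an engine before moving to the next."""
    errors = []
    for name, engine in engines:
        for attempt in range(1, attempts_per_engine + 1):
            try:
                text = engine(file_path)
                if text.strip():
                    return name, text
                errors.append(f"{name} attempt {attempt}: empty text")
            except Exception as exc:  # comparable to a Try-Catch inside a Retry Scope
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise RuntimeError("all OCR engines failed: " + "; ".join(errors))


def cloud_engine(file_path: str) -> str:
    """Hypothetical engine that is currently unreachable."""
    raise ConnectionError("endpoint unavailable")


def local_engine(file_path: str) -> str:
    """Hypothetical engine that succeeds."""
    return "INVOICE 1001"


engine_name, text = digitize_with_fallback(
    "invoice.pdf", [("cloud", cloud_engine), ("local", local_engine)]
)
print(engine_name, text)
```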
4 - Classification
The workflow "30_Classify.XAML" consists of all the steps required to perform the classification using any of the classification methods available in UiPath. The default workflow only includes the Intelligent Keyword Classifier. You can change the Classify Document Scope according to your requirements. Using multiple classifiers (Intelligent Keyword Classifier or Keyword Classifier) requires changes in the Settings sheet to include the file path for each classifier JSON file. Further, using the Machine Learning Classifier requires modifying the "ClassificationEndpoint" property in the Settings sheet of the Config file.
5 - Classification Rules Check
This step is performed in the "35_ClassificationBusinessRuleValidation.XAML." The Document Understanding template enables the user to configure the workflow to check for business rules that trigger a manual classification. The template provides a flowchart with one condition that checks the "AlwaysValidateClassification" setting in the Config file. Setting this property to True triggers manual classification for all documents. Further, setting this property to False applies the business rules defined for classification to determine whether the manual classification is required. You can introduce several workflow files depending on the need to define the business rules to decide how to get the document classified.
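As an illustration, the decision could look like the following Python sketch. The "AlwaysValidateClassification" setting comes from the template, but the business rule itself (a minimum confidence per classification result, with a made-up threshold of 0.8) is only an example of a rule you might define.

```python
def needs_manual_classification(config: dict, classification_results: list,
                                min_confidence: float = 0.8) -> bool:
    """Decide whether a person has to confirm the classification."""
    # The setting short-circuits every other rule
    if config.get("AlwaysValidateClassification", True):
        return True
    # Example business rules (assumptions of this sketch)
    if not classification_results:
        return True  # nothing was classified, so ask a person
    return any(result["confidence"] < min_confidence
               for result in classification_results)


results = [{"document_type": "Invoices", "confidence": 0.93},
           {"document_type": "Receipts", "confidence": 0.61}]

print(needs_manual_classification({"AlwaysValidateClassification": False}, results))
```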
6 - Validate and Train Classifiers
The Classification Rules Check phase (5) provides an output indicating whether manual verification is needed. The Validate and Train Classifiers stage (6) is executed only when manual verification is performed. This stage contains the activities to prompt the user for manual confirmation. The attended version uses the Present Classification Station, whereas the unattended version uses the Create Classification Action in the Action Center.
The user-verified results are also used here to improve the classifiers through the Train Classifiers Scope activity. However, the training is done only when it is enabled through the configuration settings (Config tag: "SkipClassifierTraining"). In addition, the template offers a set of reusable workflows that lock the JSON files used to hold classification information while they are updated. File locking prevents bots from accessing and editing the same file simultaneously, which could otherwise lead to process failures or corrupt JSON files.
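The general idea behind file locking can be shown with a short Python sketch. This is not how the template's reusable workflows are built. It is one common technique, assumed here for illustration: a separate lock file created exclusively, combined with an atomic replace of the JSON file. The function names, file names, and timeout are hypothetical.

```python
import json
import os
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path: str, timeout: float = 10.0, poll: float = 0.05):
    """Hold an exclusive lock by creating a lock file that must not exist yet."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire " + lock_path)
            time.sleep(poll)  # another bot holds the lock, wait and try again
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def update_json(path: str, key: str, value) -> None:
    """Read, modify, and rewrite the JSON file while holding the lock."""
    with file_lock(path + ".lock"):
        data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                data = json.load(handle)
        data[key] = value
        temp_path = path + ".tmp"
        with open(temp_path, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        os.replace(temp_path, path)  # readers never see a half-written file
```

A bot that finds the lock file already present waits and retries instead of writing, so two updates are never interleaved.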
7 - Extraction
The Document Understanding workflow is designed to process only one document in a single job. However, we may encounter files that contain multiple document types. We need to perform the data extraction for each identified document type in the file by looping through the Classification Results.
The template uses a Parallel For Each to loop through each classification type to perform the extraction. The Parallel For Each activity is used for looping here because the bot can perform the extraction simultaneously for each type. The extraction is done using multiple data extraction methods based on the requirements. The actions required for data extraction are mentioned in the workflow "50_Extract.XAML".
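The same pattern, one file with several classified parts extracted side by side, can be sketched in Python with a thread pool. The thread pool is only the closest Python analogue and says nothing about how the Parallel For Each activity works internally. The `extract_one` function and the shape of the classification results are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_one(classification_result: dict) -> dict:
    """Hypothetical extractor call for one classified part of the file."""
    doc_type = classification_result["document_type"]
    first, last = classification_result["pages"]
    return {
        "document_type": doc_type,
        "pages": (first, last),
        "fields": {"source": f"{doc_type} pages {first}-{last}"},
    }


def extract_all(classification_results: list) -> list:
    """Run the extraction for every classified part; results keep the input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(extract_one, classification_results))


classification_results = [
    {"document_type": "Invoices", "pages": (1, 2)},
    {"document_type": "Receipts", "pages": (3, 3)},
]
extraction_results = extract_all(classification_results)
for result in extraction_results:
    print(result["document_type"], result["pages"])
```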
8 - Extraction Rules Check
Similar to the Classification Rules Check (5), the data extraction stage also includes a rules check to validate the extracted data.
The workflow "55_ExtractionBusinessRuleValidation.XAML" handles the data validation logic.
The validation logic is also controlled by the configuration setting "AlwaysValidateExtraction" in the Config file. Setting this property to True pushes all the validation actions to the user. Similarly, setting the property to False triggers the use of defined business rules to determine whether the data can be automatically verified without manual intervention. Each file type has its fields and rules to perform the validation. The validation is performed around the configuration defined in the config sheet specific to the document type. For example, we can use the config sheet specific to Invoices to obtain the mandatory fields, mandatory field confidence, general field confidence, etc., to conduct the validation. The template, by default, uses the two sheets (InvoicePostProcessing and ReceiptPostProcessing) in the Config file to build the validation logic.
The validation logic in the template is built around the default out-of-the-box taxonomy used for invoices and receipts.
Following are the steps in the extracted data validation task. A sequence or flowchart separates each task for easy organization in Studio.
Sequence: Setting Up Variables
The following steps are performed during the initialization of the validation step, using information obtained from the Taxonomy Manager and the Config file sheet specific to the document type.
Obtain the list of fields for the given document type
Obtain the list of fields that contribute to the totals
Generate the mandatory field list
Generate the non-mandatory (common) field list
Generate the confidence thresholds for mandatory and non-mandatory fields
Sequence: Check Mandatory Extracted Values
The mandatory field check step focuses on checking whether the mandatory fields have values. Further, it also checks whether the extracted values are parsable to ensure valid data is provided.
For example, check if the Invoice date is captured, and confirm by trying to convert it into a Date format to ensure data accuracy.
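A mandatory field check of this kind could be sketched in Python as below. The field names and the accepted date formats are assumptions chosen for the example. In a real project they would come from your taxonomy and from the documents you receive.

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats for this sketch


def parse_date(text: str):
    """Return a date if the text matches one of the accepted formats, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def check_mandatory_fields(extracted: dict, mandatory: list, date_fields: set) -> list:
    """List every mandatory field that is empty or cannot be parsed."""
    problems = []
    for name in mandatory:
        value = (extracted.get(name) or "").strip()
        if not value:
            problems.append(name + ": missing")
        elif name in date_fields and parse_date(value) is None:
            problems.append(name + ": not a valid date")
    return problems


extracted = {"invoice-no": "INV-1001", "invoice-date": "31/02/2022", "total": ""}
problems = check_mandatory_fields(
    extracted, ["invoice-no", "invoice-date", "total"], {"invoice-date"}
)
print(problems)
```

Here the date has the right shape but is not a real calendar date, so it is reported together with the empty total.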
Sequence: Value Cleanup
The value cleanup section focuses mainly on amount fields to ensure the data does not contain any OCR errors or invalid special characters. This section is essential as the next step uses the cleansed values to perform mathematical calculations. Some cleanup functions include checking the decimal points and removing unwanted characters. In addition to the default cleanup functions, you can add any additional cleanup steps needed depending on the scenarios faced in your project.
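To give a feel for what such cleanup involves, here is a small Python sketch for amount fields. The rules are assumptions made for the example and are not the template's own logic: every character except digits, separators, and the minus sign is dropped, the last separator is treated as the decimal point when both kinds appear, and a lone comma counts as a decimal point only when exactly two digits follow it.

```python
import re
from decimal import Decimal, InvalidOperation


def clean_amount(raw: str):
    """Turn an extracted amount into a Decimal, or None if it cannot be repaired."""
    text = re.sub(r"[^0-9.,\-]", "", raw.strip())  # drop currency symbols, spaces, letters
    if "," in text and "." in text:
        if text.rfind(",") > text.rfind("."):
            # 1.234,56 style: dots group thousands, comma is the decimal point
            text = text.replace(".", "").replace(",", ".")
        else:
            # 1,234.56 style: commas group thousands
            text = text.replace(",", "")
    elif "," in text:
        head, _, tail = text.rpartition(",")
        if len(tail) == 2:
            text = head.replace(",", "") + "." + tail  # 12,50 becomes 12.50
        else:
            text = text.replace(",", "")               # 1,234 becomes 1234
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


for sample in ["$1,234.56", "1.234,56 EUR", "12,50", "1,234", "n/a"]:
    print(repr(sample), "->", clean_amount(sample))
```

Returning None instead of guessing lets the later rules send the document to manual validation when a value cannot be repaired.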
For Each: Line Verification
Performs mathematical calculations on each line item to calculate the subtotals to cross-verify the data. The cleansed data is used to perform the calculations in this step. You can add more custom logic if required for your scenarios.
Flowchart: Extraction Results Check
The flowchart consists of steps to cross-verify the information available in the invoice against the calculated fields for data validity.
Use a calculated field to sum up the line totals and check whether the sum matches the extracted subtotal field.
Use a calculated field to compute each line total from the unit price and quantity, and cross-check it against the extracted line total of that line.
In addition, the flow also checks whether the extractor confidence meets the minimum confidence thresholds defined in the configuration. It considers all these steps to decide whether to validate automatically or to request human involvement.
You can add custom validation logic here depending on the requirements to validate all the information extracted.
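The cross-checks and the confidence check described in this step can be combined as in the following Python sketch. The field names, the rounding tolerance, and the data shapes are assumptions for illustration. The sketch returns the reasons for manual validation, so an empty list means the document can be validated automatically.

```python
from decimal import Decimal


def reasons_for_manual_validation(lines: list, subtotal: Decimal,
                                  confidences: dict, thresholds: dict,
                                  tolerance: Decimal = Decimal("0.01")) -> list:
    """Collect every rule that fails; an empty list allows automatic validation."""
    reasons = []
    # Line check: quantity x unit price must match the extracted line total
    for number, line in enumerate(lines, start=1):
        calculated = line["quantity"] * line["unit_price"]
        if abs(calculated - line["line_total"]) > tolerance:
            reasons.append(f"line {number}: total does not match quantity x unit price")
    # Subtotal check: the line totals must add up to the extracted subtotal
    line_sum = sum((line["line_total"] for line in lines), Decimal("0"))
    if abs(line_sum - subtotal) > tolerance:
        reasons.append("subtotal does not match the sum of the line totals")
    # Confidence check against the configured thresholds
    for field, minimum in thresholds.items():
        if confidences.get(field, 0.0) < minimum:
            reasons.append(f"{field}: confidence below {minimum}")
    return reasons


lines = [
    {"quantity": Decimal("2"), "unit_price": Decimal("10.00"), "line_total": Decimal("20.00")},
    {"quantity": Decimal("1"), "unit_price": Decimal("5.50"), "line_total": Decimal("5.50")},
]
thresholds = {"subtotal": 0.9, "invoice-no": 0.8}

print(reasons_for_manual_validation(
    lines, Decimal("25.50"), {"subtotal": 0.93, "invoice-no": 0.88}, thresholds))
```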
9 - Validate and Train Extractors
This step is executed based on the result provided by the Extraction Rules Check (8) step. The validation step focuses on getting user involvement through the Present Validation Station or a Validation Action to manually verify the data. Further, the manually verified data is passed on to the training step to train the extractors for better accuracy and reliability. However, the training step is controlled by a setting in the Config file ("SkipExtractorTraining" setting in the Settings sheet). The extractor training is handled in the "60_TrainExtractors.XAML" workflow.
10 - Export
This step exports the validated data and saves it in an Excel file. Other RPA processes later consume the Excel file to update the downstream applications. The template uses the "70_Export.XAML" workflow to perform the final task of exporting the data.
The template provides a lot of reusable workflows across attended and unattended document automation. In addition, the architecture also supports building multiple test cases to test each step described above.
4. Testing in Document Understanding Template
The new Document Understanding template offers a list of test workflows to test our solution. The test cases are built around specific scenarios applicable to each functionality, such as obtaining transaction items, digitizing, classifying, extracting, and exporting. The test functionality also allows using previously cached data to compare against the newly generated data and test the output of the intended function for any anomalies.
The cached data refers to the output data generated by the digitizing, classification, or extraction activities and stored in text, JSON, or Excel format for testing purposes. The cached data files always contain the expected outcome.
UiPath uses the "UiPath.Testing.Activities" activity package to perform testing.
To better understand, let's look at a few examples.
Example 1: Digitization Test
Assume you have a file you believe does not digitize as expected using the current OCR engine. You, however, wish to test it against another OCR and compare the output. You can run the digitization step and save the outcome of the Digitize Document activity (Document Text and the Document Object Model) in two separate files. These two files act as the cached data for the Digitizing test case. Once you have placed the cached data in the cached data folder, you can run the Digitize test to compare the result of the actual Digitize process against the previously cached data to validate your test case. The test case provided in the template performs two test activities on the Document Object Model and the Document Text and returns the result.
Example 2: Classification Business Rule Confidence Test
Classification confidence indicates whether manual classification is needed. This test case checks whether the classification of the cached document meets the expected result. Similar to the previous example, the target document is cached and stored in the cached data folder before running this test case. In addition to the digitization output, the classification result is also cached. The test runs the cached data against the actual classification rules defined in the workflow and checks whether the rules generate the expected output.
Having an understanding of the test scenario examples, let's look at how the cached data is generated and the different test cases available in the DU template.
Generating Cache Data for Testing
Generating the required cached data for the test is easy. The template offers two workflow files where you can submit the documents to generate the cached data.
The following figure illustrates the two workflows available for generating the data.
All the test cases are located inside the Test Folder in the template. The "CacheDUData.XAML" is the main file used for cached data generation. The workflow expects the following arguments.
Target File Path: The file path of the target document used for generating cached data
Document Text: The Document Text output generated from the Digitize Document activity
Document Object Model: The Document Object Model output generated from the Digitize Document activity
Classification Results Array: The output of the Classify Document Scope
Extraction Results Index: Refers to the target index of the classification results array. It indicates which Classification Result object in the array is used for data extraction.
Extraction Results: The output of the Data Extraction Scope or the Validation Station/Action containing confirmed and accurate data
You can use a separate workflow to generate the above data and invoke the workflow by passing the input arguments. The data passed to "CacheDUData.XAML" is sent to "WriteDatatoCache.XAML" to generate the text/ JSON files containing the cache data.
Let's understand the different test cases available in the Document Understanding template.
Retrieving Transaction Data
The template offers three test workflows for retrieving and testing the transaction data.
Test for no available queue item
Test for queue items without Specific Content (test if the queue item is empty)
Test the data mentioned under Specific Content and validate if it contains the Target File Key used to hold the file path
The template offers a test workflow to test how the digitization functionality works for given documents. This test case comes in handy when you have files you would like to test and compare the digitized output with previously cached output results.
The template offers a test case to test the classification of a document against cached classification data and compare for differences. The test case performs the following tests against the cached output.
Check that the actual classification contains the same number of classification results as the cached result (compare the length of the ClassificationResult array).
Check that each classification result in the array equals the result at the same position in the cached result.
Compare the document bounds of each classification result against the cached result.
The tests help identify anomalies in the classification and ensure accurate results in future updates.
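The three comparisons can be expressed in a few lines of Python. This is a sketch over plain dictionaries, not the template's test workflow. The keys `document_type` and `bounds` are hypothetical stand-ins for the corresponding properties of a classification result.

```python
def compare_classification(actual: list, cached: list) -> list:
    """Return the differences between the actual and the cached classification."""
    # 1. Same number of classification results
    if len(actual) != len(cached):
        return [f"expected {len(cached)} results, got {len(actual)}"]
    differences = []
    for index, (got, expected) in enumerate(zip(actual, cached)):
        # 2. Same document type at the same position
        if got["document_type"] != expected["document_type"]:
            differences.append(
                f"result {index}: expected {expected['document_type']}, "
                f"got {got['document_type']}")
        # 3. Same document bounds
        if got["bounds"] != expected["bounds"]:
            differences.append(f"result {index}: document bounds differ")
    return differences


cached = [{"document_type": "Invoices", "bounds": {"start_page": 0, "page_count": 2}},
          {"document_type": "Receipts", "bounds": {"start_page": 2, "page_count": 1}}]
actual = [{"document_type": "Invoices", "bounds": {"start_page": 0, "page_count": 3}},
          {"document_type": "Receipts", "bounds": {"start_page": 2, "page_count": 1}}]

print(compare_classification(actual, cached))
```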
The classification business rule test case compares the classification result against the expected result provided by the user as an input argument. The use of cached data in this test case is slightly different from the others. The cached data can be the previously generated data of the test file. However, if the cached data is not provided, the process uses the actual Digitization and Classification steps to generate the data required for the test. The test case compares the result generated by the classification validation steps in "35_ClassificationBusinessRuleValidation.XAML" against the expected output and returns it as the test result.
This test case helps test the classification validation logic against different scenarios.
Extracting accurate data from files is very important in Document Understanding projects. However, sometimes we face scenarios where the extractors give us low confidence for specific fields. We can use many methods to improve the accuracy of the model, but testing the accuracy can be a painful task. The data extraction test case provided in the template helps address this. It compares the output of the extractors against previously cached data.
The cached data always contains accurate information for the test case to compare with. The comparison includes the following tests:
Compares the actual document type against the cached
Compares the values of simple fields to ensure they match the cached data
Compares the values of column fields (all rows and columns)
The test result will indicate whether each field matches the cached data.
This test case does not perform the extraction data validation rules.
This test is helpful when you want to test the extractors to ensure they generate the expected results.
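A comparison of this kind could look like the Python sketch below. The result structure (a document type, a dictionary of simple fields, and tables as lists of rows) is an assumed simplification of an extraction result, and the field names are examples.

```python
def compare_extraction(actual: dict, cached: dict) -> list:
    """Return the names of everything that does not match the cached data."""
    mismatches = []
    if actual["document_type"] != cached["document_type"]:
        mismatches.append("document_type")
    # Simple fields
    for name, expected in cached["fields"].items():
        if actual["fields"].get(name) != expected:
            mismatches.append(name)
    # Column fields: every row and every column
    for table, expected_rows in cached["tables"].items():
        actual_rows = actual["tables"].get(table, [])
        if len(actual_rows) != len(expected_rows):
            mismatches.append(f"{table}: row count")
            continue
        for row, (got, expected) in enumerate(zip(actual_rows, expected_rows)):
            for column, value in expected.items():
                if got.get(column) != value:
                    mismatches.append(f"{table}[{row}].{column}")
    return mismatches


cached = {
    "document_type": "Invoices",
    "fields": {"invoice-no": "INV-1001", "total": "25.50"},
    "tables": {"items": [{"description": "Paper", "quantity": "2"},
                         {"description": "Pens", "quantity": "1"}]},
}
actual = {
    "document_type": "Invoices",
    "fields": {"invoice-no": "INV-1001", "total": "25.80"},
    "tables": {"items": [{"description": "Paper", "quantity": "2"},
                         {"description": "Pens", "quantity": "7"}]},
}

print(compare_extraction(actual, cached))
```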
Extracted data validation determines the need for manual verification based on the defined rules. The manual verification is mainly triggered based on two aspects.
Low confidence levels
Inaccurate values
The confidence levels are usually checked against the thresholds provided for each field. Inaccurate values are detected either by cross-checking them against other extracted values or by trying to parse them.
The template provides many mock test files to test each out-of-the-box invoice and receipt taxonomy field. The mock tests cover both confidence and data accuracy. The test files can also be easily modified for any extra fields you add.
Read the UiPath Documentation on Mock Tests to get to know more...
The following screenshot illustrates the different field validation test cases available in the template.
These test cases are very helpful when you have to test your validation logic built for each field to ensure you get the expected result stating the need for manual verification. Further, it also ensures the validation logic passes the accurate data to downstream applications.
Testing the export results ensures the Excel file contains all the expected data. The test case compares the cached Excel file (the file containing the expected export result and structure) against the Excel file generated through the actual Export function of the template. The following tasks are performed in this test case.
Compare the number of sheets generated in the file against the cached Excel file.
Compare the data in each sheet against the data in the cached Excel file (comparison is made by converting the data of the sheet to a string and comparing it)
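The two checks can be sketched in Python as follows. To keep the example self-contained, a workbook is modeled as a dictionary that maps sheet names to rows, so no Excel library is involved. The sheet names and the text format used for the comparison are assumptions of this sketch.

```python
def sheet_as_text(rows: list) -> str:
    """Convert a sheet to one string so two sheets can be compared as text."""
    return "\n".join("|".join(str(cell) for cell in row) for row in rows)


def compare_workbooks(actual: dict, cached: dict) -> list:
    """Compare the sheet count and the stringified data of each sheet."""
    differences = []
    if len(actual) != len(cached):
        differences.append(f"expected {len(cached)} sheets, got {len(actual)}")
    for name, expected_rows in cached.items():
        if name not in actual:
            differences.append(f"missing sheet: {name}")
        elif sheet_as_text(actual[name]) != sheet_as_text(expected_rows):
            differences.append(f"data differs in sheet: {name}")
    return differences


cached_workbook = {
    "Simple Fields": [["invoice-no", "INV-1001"], ["total", "25.5"]],
    "Items": [["description", "quantity"], ["Paper", "2"]],
}
actual_workbook = {
    "Simple Fields": [["invoice-no", "INV-1001"], ["total", 25.5]],
    "Items": [["description", "quantity"], ["Paper", "3"]],
}

print(compare_workbooks(actual_workbook, cached_workbook))
```

Because the comparison is made on strings, the number 25.5 and the text "25.5" count as equal, while the changed quantity is reported.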
All the test cases mentioned above enable developers and testers to test the Document Understanding workflow efficiently. The test cases provided cover the default out-of-the-box functions for invoices and receipts. However, they can be configured easily to incorporate more document types and fields.
UiPath Document Understanding is one of the best tools to automate manual and complex document processing activities. There are many challenges in processing documents, including business rules, document volume, layouts, and validations. Using a proper base for the solution workflow plays a significant role in implementing reliable, efficient, and easy-to-manage solutions. This article explained the concept behind the latest Document Understanding template offered by UiPath and how we can use it to build our document processing solutions.