Using Language Analysis on Documents

Documents are not always structured. Extracting data from unstructured documents can be tricky...

Documents are used across many industries and domains, and each domain has its own document types and ways of processing their data. Many of the documents we encounter are complex and unstructured. They can contain paragraphs of free text, emails, images, and handwritten content, and their layouts can vary from one document to the next. All of this makes it difficult to locate the information we need.

With unstructured data, we sometimes need to understand the context of the text to find what we are looking for. For example, if we want to identify the names of people in a paragraph, a model built for semi-structured documents (e.g., invoices) may not work, because the names appear inside free-flowing sentences rather than in labeled fields. In these cases, the model has to understand the language itself. UiPath provides several out-of-the-box packages that make this task easier.

The UiPath Named Entity Recognition and Custom Named Entity Recognition models let us extract entities from unstructured documents. These models work on text input, but they can easily be integrated with Document Understanding.

This tutorial focuses on building and using a Named Entity Recognition (NER) model to extract data, cleanse it, and convert the extracted values into a structured format for easier processing. We also look at how to build custom NER models to suit different business requirements.
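To illustrate the "extract, cleanse, and structure" step, here is a minimal sketch in plain Python. It assumes the NER model returns a list of predictions, each with the matched text, an entity label, and a confidence score. The `Entity` class, the `cleanse` and `to_structured` helpers, and the `min_confidence` threshold are all hypothetical, chosen for illustration; the actual output format of the UiPath NER packages may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical shape of a single NER prediction (assumption, not the UiPath output format).
@dataclass
class Entity:
    text: str          # raw text span the model matched
    label: str         # entity type, e.g. "PERSON", "ORG", "DATE"
    confidence: float  # model score between 0 and 1


def cleanse(value: str) -> str:
    """Collapse repeated whitespace and strip surrounding spaces, commas, semicolons, and colons."""
    return " ".join(value.split()).strip(" ,;:")


def to_structured(entities, min_confidence=0.5):
    """Group cleansed entity values by label, skipping low-confidence and duplicate hits."""
    result = defaultdict(list)
    for ent in entities:
        if ent.confidence < min_confidence:
            continue  # discard predictions the model is unsure about
        value = cleanse(ent.text)
        if value and value not in result[ent.label]:
            result[ent.label].append(value)
    return dict(result)


if __name__ == "__main__":
    predictions = [
        Entity("John  Smith,", "PERSON", 0.97),
        Entity("UiPath", "ORG", 0.91),
        Entity("John Smith", "PERSON", 0.88),  # duplicate after cleansing
        Entity("Monday", "DATE", 0.32),        # below the confidence threshold
    ]
    print(to_structured(predictions))
    # prints {'PERSON': ['John Smith'], 'ORG': ['UiPath']}
```

Grouping by label produces a simple key/value structure that downstream steps (for example, writing to a spreadsheet or a data table) can consume directly. The confidence threshold is an arbitrary example value; in practice it would be tuned against your own documents.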
