The Document Understanding Process template is a fully functional UiPath Studio project template based on a document processing flowchart. It provides logging, exception handling, retry mechanisms, and all the methods that should be used in a Document Understanding workflow, out of the box. The template has an architecture decoupled from other connected automations and supports both attended and unattended processes with human-in-the-loop validation via Action Center. The template consists of the following components [1]:
Load Taxonomy: This component loads the taxonomy file that defines the document types and fields to be extracted. The taxonomy file can be created using the Taxonomy Manager in Studio or the Data Manager web application.
Digitization: This component converts the input document into a digital format that can be processed by the subsequent components. It uses the Digitize Document activity to perform OCR (optical character recognition) on the document and obtain a Document Object Model (DOM).
Classification: This component determines the document type of the input document using the Classify Document Scope activity. It can use either a Keyword Based Classifier or a Machine Learning Classifier, depending on the configuration. The classification result is stored in a ClassificationResult variable.
Data Extraction: This component extracts the relevant data from the input document using the Data Extraction Scope activity. It can use different extractors for different document types, such as the Form Extractor, the Machine Learning Extractor, the Regex Based Extractor, or the Intelligent Form Extractor. The extraction result is stored in an ExtractionResult variable.
Data Validation: This component allows human validation and correction of the extracted data using the Present Validation Station activity. It opens the Validation Station window, where the user can review and edit the extracted data, as well as provide feedback for retraining the classifiers and extractors. The validated data is returned as an ExtractionResult variable holding the validated extraction results.
Export: This component exports the validated data to a desired output, such as an Excel file, a database, or a downstream process. It uses the Export Extraction Results activity to convert the validated extraction results into a DataSet, whose DataTables can then be manipulated or written using other activities.
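The order in which the components above hand data to one another can be sketched outside Studio. The Python below is only an illustrative model, not UiPath code: every name in it (Taxonomy, Document, KEYWORDS, PATTERNS, process, and so on) is hypothetical, the OCR step is replaced by plain text decoding, and a keyword lookup and regular expressions stand in for the classifiers and extractors the template actually configures.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-ins for the template's stages; none of these are UiPath APIs.


@dataclass
class Taxonomy:
    doc_types: dict  # document type -> list of field names to extract


@dataclass
class Document:
    text: str
    doc_type: str = ""
    fields: dict = field(default_factory=dict)


# Assumed sample configuration for a single document type.
KEYWORDS = {"Invoice": ["invoice", "amount due"]}
PATTERNS = {
    "invoice_number": r"Invoice\s*#\s*(\w+)",
    "total": r"Total:\s*([\d.]+)",
}


def load_taxonomy() -> Taxonomy:
    # Load Taxonomy: defines document types and their fields.
    return Taxonomy(doc_types={"Invoice": ["invoice_number", "total"]})


def digitize(raw: bytes) -> Document:
    # Digitization: a real pipeline would run OCR; this sketch only decodes text.
    return Document(text=raw.decode("utf-8"))


def classify(doc: Document, taxonomy: Taxonomy) -> Document:
    # Classification: simple keyword match against each known document type.
    lowered = doc.text.lower()
    for doc_type in taxonomy.doc_types:
        if any(keyword in lowered for keyword in KEYWORDS.get(doc_type, [])):
            doc.doc_type = doc_type
            return doc
    raise ValueError("document could not be classified")


def extract(doc: Document, taxonomy: Taxonomy) -> Document:
    # Data Extraction: one regex per field listed in the taxonomy.
    for name in taxonomy.doc_types[doc.doc_type]:
        match = re.search(PATTERNS[name], doc.text)
        doc.fields[name] = match.group(1) if match else None
    return doc


def validate(doc: Document, corrections=None) -> Document:
    # Data Validation: apply a human reviewer's corrections, if any.
    doc.fields.update(corrections or {})
    return doc


def export(doc: Document) -> list:
    # Export: flatten the validated result into table-like rows.
    return [{"doc_type": doc.doc_type, **doc.fields}]


def process(raw: bytes, corrections=None) -> list:
    taxonomy = load_taxonomy()
    doc = digitize(raw)
    doc = classify(doc, taxonomy)
    doc = extract(doc, taxonomy)
    doc = validate(doc, corrections)
    return export(doc)
```

Calling process(b"Invoice # A123\nTotal: 42.50") runs all six stages in sequence; passing corrections={"total": "42.55"} models a reviewer overriding an extracted value before export.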
References: "Document Understanding Process: Studio Template"; "Document Understanding Process - New Studio Template"; "Document Understanding Process Template in UiPath Studio"