Despite increasing digitization, documents such as invoices, annual reports, complaints, forms and contracts remain in widespread use and accumulate as mass data in banks, insurance companies and public authorities. Processing such documents (known in computer science as unstructured documents) is crucial to the efficiency of many business processes, such as forwarding inquiries, extracting and storing data, and supporting search queries. In the SLIMDOC (Synergetic LIghtweight Multimodal DOCument Analysis) project, researchers at Hochschule RheinMain – University of Applied Sciences and Arts (HSRM) are investigating how artificial intelligence (AI) can be used for automated document analysis.
Automated analysis of multimodal documents
AI-based document analysis is a key technology for interpreting documents. It covers information extraction (e.g. of product prices), entity recognition (e.g. of locations or invoice items), document classification and the automatic answering of questions about document content. Multimodal documents are particularly challenging to understand: in addition to text, they contain images such as graphics or photos. The AI must therefore take into account not only textual information, but also visual signals and the spatial arrangement of layout elements. When settling insurance claims, for example, AI models must check whether the claims documents are consistent and plausible.
Reducing the size of AI models
"With the SLIMDOC project, we want to develop AI models that reliably analyze such documents in a lightweight way," explains project leader Prof. Dr. Adrian Ulges. Existing AI models fall into two types. On the one hand, there are large language models (LLMs) such as the models of the GPT series, which are convincing general problem solvers but consume enormous resources and can be run locally only to a limited extent. On the other hand, there are models specialized in document processing, which interpret image content, text and layout in combination but require manually annotated data, which means additional work for the customers who have to provide these annotations.
The goal of SLIMDOC is to combine both types of model synergistically. Using a process known as distillation, the capabilities of LLMs are to be transferred to very small, task-specific models for document analysis. The result should be a more effective model that solves the same task resource-efficiently, supporting both sustainability and digital sovereignty. The AI models should also be able to create the graded training data themselves through document generation, eliminating the need for expensive data collection and manual annotation.
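To give a rough idea of what distillation means in practice, the following minimal Python sketch shows one common textbook formulation: a small "student" model is trained to match the softened output probabilities of a large "teacher" model. This is an illustration only, not the SLIMDOC method; the project description does not say which distillation variant is used. The function names, the temperature value and the example scores for three document classes are all assumptions made for this sketch.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the softened teacher and student distributions.

    Scaling by temperature**2 is a common convention that keeps gradient
    magnitudes comparable across temperatures (an assumption of this sketch).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical scores for three document classes: invoice, contract, complaint.
teacher = [4.0, 1.0, 0.5]       # large model: confident that this is an invoice
good_student = [3.8, 1.1, 0.4]  # small model that closely imitates the teacher
poor_student = [0.5, 1.0, 4.0]  # small model that disagrees with the teacher

print("loss (good student):", distillation_loss(teacher, good_student))
print("loss (poor student):", distillation_loss(teacher, poor_student))
```

The loss is close to zero when the student reproduces the teacher's judgment and grows as the two diverge, so minimizing it during training pushes the small model toward the behavior of the large one.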
Collaboration with practice partners
The project addresses two use cases with three practice partners. Together with Insiders Technologies GmbH, a medium-sized provider of software solutions for automating document-centric business processes, new, highly efficient AI models are being developed for specialized document analysis tasks. Document analysis as a multimodal problem is the focus of the collaboration with R+V Versicherung, a processor of mass data, and Doxis GmbH, a provider in the field of enterprise content management. The plan is to use the newly developed AI models to extract information from business reports containing graphics and to check the plausibility of insurance claims.