Main stages of Tovie Data Agent operation
Tovie Data Agent is built on Retrieval-Augmented Generation (RAG), an approach that combines search with response generation for user queries. RAG integrates the internal knowledge of an LLM with extensive and continually updated external data sources, such as databases, websites, and other information resources.
RAG improves the accuracy and reliability of response generation, especially for tasks requiring deep knowledge. The approach also enables continual updating of knowledge and integration of domain-specific information.
Depending on the project settings, one of the following pipelines is executed:
- Semantic pipeline
- Agentic pipeline
Indexing the knowledge base
The knowledge base is indexed from various data sources: your files in DOCX, PDF, TXT, and other formats, as well as external knowledge storage services such as Confluence. The source data is pre-processed into Markdown text files, which are then submitted for chunking and vectorisation.
Chunking
Chunking is the division of text into small fragments (chunks) to improve search quality. When searching and preparing an answer, the system uses these chunks instead of, or alongside, entire text documents, depending on the settings.
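A minimal sketch of one common chunking strategy, fixed-size chunks with overlap, is shown below. The word-based splitting, chunk size, and overlap values are illustrative assumptions, not Tovie Data Agent's actual settings.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words.

    The sizes are illustrative defaults. Overlapping chunks help preserve
    context that would otherwise be lost at chunk boundaries.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(words), 1), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
```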
Vectorisation
Vectorisation is the transformation of text into a vector representation. Each chunk is converted into a vector, which is then used to find the most relevant answer to the user query. Vectorisation enables the use of geometric and algebraic operations for data comparison and analysis.
Vectorised chunks (embeddings) are stored in a vector storage optimised for fast search and data access.
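The sketch below illustrates this step: each chunk is converted into a vector and stored. The embed function is a toy stand-in (a hashed bag-of-words), since a real deployment would call an embedding model; the 8-dimensional vectors and the list-based storage are assumptions made only to keep the example self-contained.

```python
import math

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy embedding: hashed bag-of-words, L2-normalised.

    A real system would call an embedding model here; this stand-in
    exists only so the example runs on its own.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# A minimal in-memory "vector storage": (chunk, embedding) pairs.
chunks = [
    "Tovie Data Agent indexes files and external sources.",
    "Chunks are vectorised and stored for fast retrieval.",
]
vector_store = [(chunk, embed(chunk)) for chunk in chunks]
```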
Retrieval-Augmented Generation (RAG)
The RAG stage differs depending on the pipeline selected for the project.
Semantic pipeline
Retrieving
The user query is rephrased to take the chat history into account. The rephrased query, like the data chunks, is converted into a vector representation. Retrieval is then performed by comparing the vector representations of the query and the data: the vector storage automatically selects the most relevant chunks based on vector similarity.
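Reusing embed and vector_store from the sketch above, retrieval can be illustrated as a cosine-similarity search. The linear scan below is only for illustration; a real vector storage uses optimised index structures.

```python
def retrieve(query: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k chunks most similar to the (rephrased) query."""
    query_vec = embed(query)
    scored = [
        # Vectors are L2-normalised, so the dot product equals cosine similarity.
        (chunk, sum(q * v for q, v in zip(query_vec, vec)))
        for chunk, vec in vector_store
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```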
Re-ranking
Once the vector storage has found the most relevant chunks, they are re-ranked by a special model called a re-ranker. As a result, the relevance score of each chunk may change compared to the score provided by the vector storage.
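The sketch below re-scores the retrieved candidates. The word-overlap scorer is a toy stand-in for a real re-ranker, which is a model that reads the query and each chunk jointly.

```python
def rerank_score(query: str, chunk: str) -> float:
    """Toy stand-in for a re-ranker: fraction of query words found in the chunk.

    A real re-ranker model scores the query and chunk together.
    """
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / max(len(query_words), 1)

def rerank(query: str, candidates: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Re-score candidates; their order and scores may change."""
    rescored = [(chunk, rerank_score(query, chunk)) for chunk, _ in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```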
Response generation
At the final stage, the system generates a response to the user’s query. The system submits the query and the chunks selected in the previous stages to the LLM. The LLM then generates a response using the information provided as well as its own internal knowledge.
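As an illustration, the final prompt might be assembled as below. The template wording is an assumption, not the product's actual prompt, and the call to the LLM itself is left out.

```python
def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble a generation prompt from the query and the selected chunks."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using the context below and your own knowledge.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

# The resulting prompt would then be sent to the LLM for generation.
```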
Agentic pipeline
The AI agent sends the user’s query to the LLM, along with descriptions of the knowledge base functions that the agent can use.
The LLM generates a specialised response that identifies the function to call and provides the arguments. For instance, it might specify a function for chunk retrieval.
The AI agent then executes the function and feeds the result back to the LLM. The LLM analyses the result and may request another function call if needed. This iterative process, known as the agent loop, can repeat multiple times.
If the LLM’s response does not specify a function to call, it is considered the final response, which the AI agent delivers to the user.
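A minimal sketch of such an agent loop is shown below. The llm callable, the shape of its reply, and the single retrieve_chunks function are all illustrative assumptions; Tovie Data Agent's actual interfaces may differ.

```python
def retrieve_chunks(query: str) -> str:
    """Hypothetical knowledge base function the LLM can ask the agent to call."""
    return "chunks relevant to: " + query

TOOLS = {"retrieve_chunks": retrieve_chunks}

def agent_loop(user_query: str, llm, max_steps: int = 5) -> str:
    """Run the agent loop until the LLM returns a plain (final) response."""
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):
        # `llm` is an assumed callable that returns either a final answer or
        # a {"tool_call": {"name": ..., "arguments": ...}} request.
        reply = llm(messages, tools=TOOLS)
        call = reply.get("tool_call")
        if call is None:
            # No function requested: deliver the final response to the user.
            return reply["content"]
        # Execute the requested function and feed the result back to the LLM.
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"], "content": result})
    return "Agent loop stopped after reaching the step limit."
```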