Chunk retrieval settings
You can find settings for retrieving chunks in the Settings → Retrieval section.
The Chunk retrieval method setting determines how chunks are extracted from sources for further processing, such as re-ranking or response generation:
- By embedding similarity: the system compares vector representations (embeddings) of the user’s query and chunks.
- Using LLM: the language model initiates chunk retrieval, analyses the retrieved chunks, and requests more if needed. The retrieval process finishes when the model identifies the most relevant chunks.
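To make the first method concrete, here is a minimal sketch of embedding-similarity retrieval in Python. The embedding step and the in-memory chunk store are hypothetical stand-ins; the platform manages both internally:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve_by_similarity(query_embedding: list[float],
                           chunk_embeddings: dict[str, list[float]],
                           top_k: int = 10) -> list[str]:
    """Rank chunks by similarity to the query embedding and keep the top_k."""
    scored = sorted(
        ((cosine_similarity(query_embedding, emb), chunk_id)
         for chunk_id, emb in chunk_embeddings.items()),
        reverse=True,
    )
    return [chunk_id for _, chunk_id in scored[:top_k]]
```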
The available settings depend on the selected chunk retrieval method.
By embedding similarity
Parameters of chunk retrieval from sources
- Number of retrieval results: specifies the number of found chunks to process.
  Tip: If you are going to use re-ranking, increase the number of retrieval results. This way, you can choose relevant chunks from a larger pool of candidates.
- Number of adjacent chunks: specifies the number of adjacent chunks included with each found chunk. Adjacent chunks help the model understand the query’s context better and generate more accurate responses.
  For example, if you set the Number of retrieval results to 10 and the Number of adjacent chunks to 3, then 3 preceding and 3 following chunks are included with each found chunk. As a result, the size of each found chunk increases, and 10 larger chunks are sent for further processing (see the first sketch after this list).
- Search by chunk metadata: if enabled, the system searches for relevant chunks based on their content and additional metadata fields populated during indexing.
- Rephrase user’s query: if enabled, the system attempts to reformulate the query to make it clearer. This can improve the quality of search results.
  You can edit the prompt for rephrasing the query if needed (see the rephrasing sketch after this list).
- Full-text search: if enabled, semantic search is complemented by full-text search.
  Semantic search selects chunks relevant to the query’s meaning, even if they do not contain exact words from the query. Full-text search selects chunks that match the exact words in the query.
  Select your full-text search strategy:
  - The Hybrid strategy combines the results of semantic and full-text search, returning chunks that both match the query’s meaning and contain words from the query. The total number of results is capped by the Number of retrieval results specified above.
  - The Weighted strategy enables you to combine the results of semantic and full-text search in the desired ratio. Specify the maximum number of results from each search type to include in the final list for further processing.
    For example, if the Number of retrieval results is 10, the Semantic portion is 8, and the Full-text search portion is 4, the top 10 results from each search type are selected first. The two lists are then merged, discarding duplicates (chunks found by both search types) and any results exceeding the portion sizes. The final number of results does not exceed 12 (8 + 4); see the merge sketch after this list.
  - The With threshold strategy runs semantic search first. If it does not return enough relevant results, full-text search is applied. Specify the relevance score threshold for semantic search results. If all the results fall below the threshold, the search switches to full-text mode.
- Consider chat history: if enabled, the system takes into account the user’s previous queries during chunk retrieval. This helps the model better understand the context and generate more accurate responses.
  To consider the history, adjust the following settings:
  - Prompt.
  - The maximum history size in tokens that can be sent to the LLM.
  - The minimum number of user queries in the chat history.
  Caution: If the history size exceeds the token limit, messages are deleted one at a time until the required size or the minimum number of user queries is reached. If the minimum number of queries is reached first, the history is sent to the LLM even if it exceeds the size limit, which might cause an error. If this error occurs, try reducing the minimum number of queries in the history.
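A sketch of the Number of adjacent chunks behaviour from the list above, assuming chunks are stored in document order (the index arithmetic is illustrative, not the platform’s actual implementation):

```python
def expand_with_neighbours(chunks: list[str], hit_index: int, n_adjacent: int) -> str:
    """Merge a found chunk with its n_adjacent preceding and following chunks."""
    start = max(0, hit_index - n_adjacent)
    end = min(len(chunks), hit_index + n_adjacent + 1)
    return " ".join(chunks[start:end])

# With n_adjacent=3, a hit at index 5 expands to chunks 2..8: one found chunk
# plus 3 preceding and 3 following chunks, sent onward as a single larger chunk.
```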
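The Rephrase user’s query step is conceptually a single LLM call made before retrieval. In this sketch, complete() is a hypothetical stand-in for whichever LLM the platform invokes, and the prompt text is only an example of the editable prompt:

```python
REPHRASE_PROMPT = (
    "Rewrite the user's query so that it is clear and self-contained. "
    "Return only the rewritten query.\n\nQuery: {query}"
)

def rephrase_query(query: str, complete) -> str:
    """Ask the LLM to reformulate the query; fall back to the original if empty."""
    rewritten = complete(REPHRASE_PROMPT.format(query=query))
    return rewritten.strip() or query
```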
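The Weighted strategy example (portions of 8 and 4) can be sketched as a merge with de-duplication and per-portion caps; the exact ordering of the merged list is an assumption:

```python
def weighted_merge(semantic_hits: list[str], fulltext_hits: list[str],
                   semantic_portion: int, fulltext_portion: int) -> list[str]:
    """Merge two ranked hit lists, dropping duplicates and capping each portion."""
    merged: list[str] = []
    seen: set[str] = set()
    for hit in semantic_hits[:semantic_portion] + fulltext_hits[:fulltext_portion]:
        if hit not in seen:
            seen.add(hit)
            merged.append(hit)
    return merged

# With portions of 8 and 4, the final list holds at most 12 chunks,
# fewer when the two search types find the same chunks.
```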
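The trimming behaviour described in the caution above can be sketched as follows; the message structure and count_tokens() are simplified stand-ins:

```python
def trim_history(messages: list[dict], max_tokens: int, min_user_queries: int,
                 count_tokens) -> list[dict]:
    """Drop the oldest messages one at a time until the history fits the token
    limit, but never go below the minimum number of user queries."""
    history = list(messages)
    while count_tokens(history) > max_tokens:
        user_queries = sum(1 for m in history if m["role"] == "user")
        if history[0]["role"] == "user" and user_queries <= min_user_queries:
            # Minimum reached: the oversized history is sent anyway,
            # which may cause an error on the LLM side.
            break
        history.pop(0)  # delete the oldest message
    return history
```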
Re-ranking
Re-ranking is performed after chunk retrieval from source documents to re-evaluate the relevance of chunks to the user’s query and to select the most relevant ones. If the retrieved chunks are sufficiently relevant to the query, you may disable re-ranking.
Select the re-ranking method:
- Empirical: an algorithm based on Tovie AI’s experience.
- Using a model: performed by a specialised re-ranking model.
Empirical ranking is faster than model-based ranking, but may be less accurate.
Re-ranking settings vary depending on the selected method.
Empirical
- Number of retrieval results: specifies the maximum number of results sent to the LLM along with the user’s query for response generation after re-ranking.
- Number of results from a single document: ensures that chunks from a single source document do not take up the entire response generation context.
- Threshold score: results with a relevance score below the threshold are excluded from response generation.
- Maximum deviation from the best score, %: specifies the allowed deviation from the score of the most relevant chunk. Results with a deviation above the allowed value are excluded from response generation.
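A sketch of how the four empirical settings might combine; the order in which the checks are applied is an assumption:

```python
def empirical_filter(hits: list[tuple[float, str, str]], top_n: int,
                     per_doc_limit: int, min_score: float,
                     max_deviation_pct: float) -> list[str]:
    """Apply the empirical re-ranking limits to hits sorted best-first.

    Each hit is a (score, doc_id, chunk) tuple.
    """
    best_score = hits[0][0] if hits else 0.0
    kept: list[str] = []
    per_doc: dict[str, int] = {}
    for score, doc_id, chunk in hits:
        if score < min_score:
            continue  # below the threshold score
        if best_score > 0 and (best_score - score) / best_score * 100 > max_deviation_pct:
            continue  # deviates too far from the best result
        if per_doc.get(doc_id, 0) >= per_doc_limit:
            continue  # this document already filled its per-document quota
        per_doc[doc_id] = per_doc.get(doc_id, 0) + 1
        kept.append(chunk)
        if len(kept) == top_n:
            break
    return kept
```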
Using a model
- Number of retrieval results: specifies the maximum number of results sent to the LLM along with the user’s query for response generation after re-ranking.
- Model: specifies the model used for re-ranking.
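Model-based re-ranking typically scores each (query, chunk) pair with a cross-encoder. The sketch below uses the sentence-transformers library and a public model as one possible example; the platform’s actual re-ranking model and API are not specified here:

```python
from sentence_transformers import CrossEncoder

def rerank(query: str, chunks: list[str], top_n: int) -> list[str]:
    """Score each (query, chunk) pair with a re-ranking model and keep the best."""
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example model
    scores = model.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_n]]
```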
Abbreviations
If your source documents contain abbreviations or contractions that differ from common usage, you can clarify them for the system. To do this, upload a JSON file containing these abbreviations with explanations. You can download a template file from the interface.
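The exact schema is defined by the template you download from the interface; the snippet below only illustrates the general idea of an abbreviation-to-explanation mapping with a hypothetical structure:

```python
import json

# Hypothetical structure; use the template from the interface for the real schema.
abbreviations = {
    "SLA": "service level agreement",
    "PoC": "proof of concept",
}

with open("abbreviations.json", "w", encoding="utf-8") as f:
    json.dump(abbreviations, f, ensure_ascii=False, indent=2)
```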
Using LLM
- Number of retrieval results: specifies the number of found chunks to process.
- Number of adjacent chunks: specifies the number of adjacent chunks included with each found chunk. Adjacent chunks help the model understand the query’s context better and generate more accurate responses.
  For example, if you set the Number of retrieval results to 10 and the Number of adjacent chunks to 3, then 3 preceding and 3 following chunks are included with each found chunk. As a result, the size of each found chunk increases, and 10 larger chunks are sent for further processing (see the adjacent-chunks sketch above).
LLM-based chunk retrieval also uses the LLM settings configured in the Settings → Generation section.
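Conceptually, LLM-based retrieval is an iterative loop: the model inspects what has been retrieved so far and either requests more chunks or stops. In this rough sketch, search() and decide() are hypothetical stand-ins for the platform’s retrieval call and the LLM decision step:

```python
def llm_retrieval(query: str, search, decide, max_rounds: int = 5) -> list[str]:
    """Iteratively retrieve chunks until the LLM deems them relevant enough.

    search(q) returns a list of chunks; decide(query, chunks) returns a
    follow-up query for more chunks, or None when retrieval should stop.
    """
    collected: list[str] = []
    next_query = query
    for _ in range(max_rounds):
        collected.extend(search(next_query))
        next_query = decide(query, collected)
        if next_query is None:  # the model identified the most relevant chunks
            break
    return collected
```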