Chunk retrieval settings

You can find settings for retrieving chunks in the Project settings → Retrieval section.

The pipeline defines the steps involved in chunk retrieval and response generation:

  • Semantic pipeline: the system compares the vector representation (embedding) of the user’s query to the vector representations of content chunks. The most relevant chunks, along with the query, are then passed to an LLM for response generation.
  • Agentic pipeline: the AI agent manages the response generation process using an LLM and function calls. Through the agent, the LLM retrieves and analyzes chunks, requesting additional ones if needed. Once the LLM determines that the retrieved chunks are the most relevant, the selection is complete and the LLM generates the response.

For detailed information about the pipelines, see the Main stages of Tovie Data Agent operation section.

The available settings depend on the selected pipeline.


Parameters of chunk retrieval from sources

  • Number of retrieval results: specifies the number of found chunks to process.

    tip

    If you are going to use re-ranking, increase the number of retrieval results. This way, you can choose relevant chunks from a larger pool of candidates.

  • Number of adjacent chunks: specifies the number of adjacent chunks included with each found chunk. Adjacent chunks help the model understand the query’s context better and generate more accurate responses.

    For example, if you set the Number of retrieval results to 10 and the Number of adjacent chunks to 3, then 3 preceding and 3 following chunks are included with each found chunk. As a result, the size of each found chunk increases, and 10 larger chunks are sent for further processing.
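The window arithmetic above can be sketched as follows. This is an illustrative model only: the function and variable names are hypothetical, and the real expansion logic is internal to Tovie Data Agent.

```python
def expand_with_adjacent(found_indices, all_chunks, adjacent=3):
    """Attach `adjacent` preceding and following chunks to each found chunk.

    `found_indices` are positions of retrieved chunks within `all_chunks`
    (hypothetical names, for illustration only).
    """
    expanded = []
    for i in found_indices:
        start = max(0, i - adjacent)          # clip at document start
        end = min(len(all_chunks), i + adjacent + 1)  # clip at document end
        expanded.append(" ".join(all_chunks[start:end]))
    return expanded

chunks = [f"chunk{i}" for i in range(10)]
print(expand_with_adjacent([5], chunks, adjacent=2))
# the single found chunk grows into a window of 5 chunks
```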

  • Search by chunk metadata: if enabled, the system searches for relevant chunks based on their content and additional metadata fields populated during indexing.

  • Search within segments: controls the search behavior when no matching chunks are found within the specified scope:

    • Expandable: expands the search across the entire knowledge base.

• Strict: restricts the search to the specified scope. If no matching chunks are found within it, the knowledge base replies that no data is available.

      You can specify the search scope when querying the knowledge base via the API or in the test chat. The scope can include specific segments and unsegmented sources. For more details, see the Segments section.

    tip

    We recommend enabling re-ranking for the expandable mode. If the chunks found within the specified search scope aren’t relevant enough to the query, re-ranking will filter them out and trigger a search across the entire knowledge base.

  • Rephrase user’s query: if enabled, the system attempts to reformulate the query to make it clearer. This can improve the quality of search results.

• Prompt (on request): you can request edits to this prompt, for example, to add requirements for the rephrased query or to include examples relevant to your company’s area of business.

      If you require this feature, please contact support at support@tovie.ai.

  • Full-text search: if enabled, semantic search is complemented by full-text search.

    Semantic search selects chunks relevant to the query’s meaning, even if they do not contain exact words from the query. Full-text search selects chunks that match the exact words in the query.

    Select your full-text search strategy:

    • The Hybrid strategy combines the results of semantic and full-text search, returning the chunks both matching the query’s meaning and containing words from the query. The total number of results is capped by the Number of retrieval results specified above.

    • The Weighted strategy enables you to combine the results of semantic and full-text search in the desired ratio.

      Specify the maximum number of results for each search type to include in the final list for further processing.

For example, if the Number of retrieval results is 10, the Semantic portion is 8, and the Full-text search portion is 4, the top 10 results from each search type are selected. The two lists are merged, discarding duplicates (chunks found by both search types) and any results exceeding the portion sizes. The final number of results does not exceed 12 (8 + 4).

    • The With threshold strategy runs semantic search first. If it does not return enough relevant results, full-text search is applied.

      Specify the relevance score threshold for semantic search results. If all the results fall below the threshold, the search switches to full-text mode.
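The merge step of the Weighted strategy can be sketched like this. It is a minimal model of the behavior described above, assuming each search type returns a ranked list of chunk IDs; the actual merge logic is internal.

```python
def weighted_merge(semantic, fulltext, sem_portion=8, ft_portion=4):
    """Merge ranked result lists (best first), keeping at most
    `sem_portion` semantic and `ft_portion` full-text results and
    discarding chunks found by both search types."""
    merged, seen = [], set()
    for chunk_id in semantic[:sem_portion]:
        if chunk_id not in seen:
            seen.add(chunk_id)
            merged.append(chunk_id)
    for chunk_id in fulltext[:ft_portion]:
        if chunk_id not in seen:  # duplicate chunks count only once
            seen.add(chunk_id)
            merged.append(chunk_id)
    return merged  # at most sem_portion + ft_portion results

print(weighted_merge(["a", "b", "c", "d"], ["c", "e", "f"],
                     sem_portion=3, ft_portion=2))
# → ['a', 'b', 'c', 'e'] — 'c' was found by both types and counts once
```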

  • Consider chat history: if enabled, the system takes into account the user’s previous queries during chunk retrieval. This helps the model better understand the context and generate more accurate responses.

    To consider the history, adjust the following settings:

• Prompt (on request): you can request edits to this prompt, for example, to add requirements for queries modified by message history or to include examples relevant to your company’s area of business.

    • The maximum history size in tokens that can be sent to the LLM.

    • The minimum number of user’s queries in the chat history.

    caution

    If the history size exceeds the token limit, messages are deleted one at a time until the required size or the minimum number of user queries is reached. If the minimum number of queries is reached first, the history is sent to the LLM even if it exceeds the size limit, which might cause an error. If the error persists, try reducing the minimum number of queries in the history.
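The trimming rule in the caution above can be sketched as follows. This is an illustrative model, not the actual implementation: the token counter is stubbed with `len`, and message roles are simplified to `"user"`/`"assistant"` tuples.

```python
def trim_history(messages, max_tokens, min_user_queries, count_tokens=len):
    """Drop oldest messages until the history fits `max_tokens`, but never
    below `min_user_queries` user messages. If the minimum is reached
    first, the oversized history is kept (and may cause an LLM error).

    `messages` is a list of (role, text) tuples, oldest first.
    """
    history = list(messages)
    total = lambda: sum(count_tokens(text) for _, text in history)
    user_queries = lambda: sum(1 for role, _ in history if role == "user")
    while total() > max_tokens and user_queries() > min_user_queries:
        history.pop(0)  # delete one message at a time, oldest first
    return history

messages = [("user", "aaaa"), ("assistant", "bbbb"),
            ("user", "cc"), ("assistant", "dd")]
print(trim_history(messages, max_tokens=6, min_user_queries=1))
# the oldest query is dropped; the remaining 8 tokens still exceed the
# limit of 6, but the minimum of 1 user query stops further trimming
```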

Re-ranking

Re-ranking is performed after chunk retrieval from source documents to re-evaluate the relevance of chunks to the user’s query and to select the most relevant ones. If the retrieved chunks are sufficiently relevant to the query, you may disable re-ranking.

Select the re-ranking method:

  • Empirical: an algorithm based on Tovie AI’s experience.
  • Using a model: performed by a specialised re-ranking model.

Empirical ranking is faster than model-based ranking, but may be less accurate.

Re-ranking settings vary depending on the selected method.

  • Number of retrieval results: specifies the maximum number of results sent to the LLM along with the user’s query for response generation after re-ranking.
  • Number of results from a single document: ensures that chunks from a single source document do not take up the entire response generation context.
  • Threshold score: results with a relevance score below the threshold are excluded from response generation.
  • Maximum deviation from the best score, %: specifies the allowed deviation from the score of the most relevant chunk. Results with a deviation above the allowed value are excluded from response generation.
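The four filters above compose roughly as in the sketch below. All names are illustrative and scores are assumed positive; the real re-ranking pipeline is internal to Tovie Data Agent.

```python
def filter_reranked(scored, threshold, max_deviation_pct, top_n, per_doc_limit):
    """Apply the re-ranking filters to (doc_id, score) pairs, best first."""
    if not scored:
        return []
    best = scored[0][1]
    kept, per_doc = [], {}
    for doc_id, score in scored:
        if score < threshold:
            continue  # below the threshold score
        if (best - score) / best * 100 > max_deviation_pct:
            continue  # deviates too far from the best score
        if per_doc.get(doc_id, 0) >= per_doc_limit:
            continue  # cap results from a single document
        per_doc[doc_id] = per_doc.get(doc_id, 0) + 1
        kept.append((doc_id, score))
        if len(kept) == top_n:
            break  # number of retrieval results reached
    return kept

scored = [("doc1", 0.9), ("doc1", 0.85), ("doc2", 0.8),
          ("doc1", 0.5), ("doc3", 0.3)]
print(filter_reranked(scored, threshold=0.4, max_deviation_pct=50,
                      top_n=3, per_doc_limit=2))
# → [('doc1', 0.9), ('doc1', 0.85), ('doc2', 0.8)]
```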

Abbreviations

If your source documents contain abbreviations or contractions that differ from common usage, you can clarify them for the system. To do this, upload a JSON file containing these abbreviations with explanations. You can download a template file from the interface.

File format

Each entry in the JSON object describes one abbreviation: the key is the abbreviation itself.

The value contains the following fields:

  • name: full form of the abbreviation.
  • variations: spelling variations.
  • description: explanation.

If an abbreviation or one of its variations is found in the user’s query, its name and description are added to the prompt for rephrasing.

Example file:

{
  "GDPR": {
    "name": "General Data Protection Regulation",
    "variations": [],
    "description": "A regulation in EU law on data protection and privacy in the European Economic Area (EEA) and the United Kingdom."
  },
  "PIN": {
    "name": "Personal Identification Number",
    "variations": ["p.i.n."],
    "description": "A code used for secure access, often associated with bank cards."
  }
}
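The matching behavior can be sketched as follows: scan the query for each abbreviation and its variations, then collect the matching names and descriptions for the rephrasing prompt. The function and its word-matching rule are illustrative assumptions; the real matching is internal.

```python
import json
import re

def expand_abbreviations(query, abbreviations):
    """Return name/description notes for abbreviations found in the query."""
    notes = []
    # tokenize on word characters and dots so variations like "p.i.n." match
    words = set(re.findall(r"[\w.]+", query.lower()))
    for abbr, info in abbreviations.items():
        forms = {abbr.lower(), *(v.lower() for v in info.get("variations", []))}
        if forms & words:  # abbreviation or a variation occurs in the query
            notes.append(f"{abbr} ({info['name']}): {info['description']}")
    return notes

abbreviations = json.loads("""{
  "PIN": {
    "name": "Personal Identification Number",
    "variations": ["p.i.n."],
    "description": "A code used for secure access."
  }
}""")
print(expand_abbreviations("How do I reset my p.i.n.?", abbreviations))
# → ['PIN (Personal Identification Number): A code used for secure access.']
```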

Process CSV files

On request

If the knowledge base contains CSV files, a dedicated tabular pipeline is used to extract data from them. Based on the user’s query, the AI agent determines whether the tables contain relevant data, selects the relevant table, generates an SQL query, executes it, and presents the result in the response.

The option to disable the tabular pipeline is available on request. If you require this feature, please contact support at support@tovie.ai.
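The tabular idea can be illustrated with a minimal sketch: load a CSV table into a database and answer the query with SQL. Here the table, the query, and the hard-coded SQL are all hypothetical; in Tovie Data Agent the SQL is generated by the LLM and the pipeline is internal.

```python
import csv
import io
import sqlite3

csv_data = "product,price\nNotebook,3\nPen,1\nBackpack,25\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product TEXT, price REAL)")
rows = list(csv.reader(io.StringIO(csv_data)))[1:]  # skip the header row
conn.executemany("INSERT INTO products VALUES (?, ?)", rows)

# e.g. the user asks: "What is the most expensive product?" — the agent
# would generate SQL along these lines and present the result
result = conn.execute(
    "SELECT product, price FROM products ORDER BY price DESC LIMIT 1"
).fetchone()
print(result)  # → ('Backpack', 25.0)
```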