Evaluate quality of knowledge base responses
Response quality evaluation addresses several key tasks:
- Determine how effectively the knowledge base answers typical user queries.
- Optimize response quality by adjusting settings and analyzing results.
- Monitor changes in response quality over time as data sources are updated.
How evaluation works
Quality evaluation uses a test set consisting of queries to the knowledge base and their expected responses. Each query is sent to the knowledge base, and the received response is passed to the LLM together with the query and the expected response. The LLM rates the quality of the actual response on a scale from 1 to 10. The final score is the average of all scores across the test set.
The more questions included in the test set, the more reliable the resulting score will be.
Response quality can be assessed using multiple test sets, with each evaluation conducted independently.
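For orientation, here is a minimal sketch of that loop in Python. The query_knowledge_base and rate_with_llm helpers are hypothetical placeholders for calls the platform makes internally; the actual evaluation runs inside the product.

```python
# Sketch of the evaluation loop described above. query_knowledge_base() and
# rate_with_llm() are hypothetical placeholders for the platform's internal calls.

def evaluate_test_set(test_set, query_knowledge_base, rate_with_llm):
    """Return the average LLM score (1-10) over a test set of
    (query, expected_response) pairs."""
    scores = []
    for query, expected_response in test_set:
        # 1. Send the query to the knowledge base.
        actual_response = query_knowledge_base(query)
        # 2. Ask the LLM to rate the actual response against the expected one.
        score = rate_with_llm(
            query=query,
            expected_response=expected_response,
            actual_response=actual_response,
        )  # integer from 1 to 10
        scores.append(score)
    # 3. The final score is the average across the whole test set.
    return sum(scores) / len(scores)
```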
Prepare test set
You can create a test set yourself or generate it using an LLM.

Create manually

1. Navigate to the Quality evaluation section.
2. Download the test set template: click Upload under Test sets, and then click Download XLSX template in the upload window.
3. Add your queries and expected responses to the file.
   If you have assigned segments to sources during file upload or integration setup, you can also specify segments for questions in the Segments column. In this case, the search will be limited to sources from the specified segments.
   To specify multiple segments, separate them with commas. To include sources without any assigned segments, enter `include_without_segments` (see the example below these steps).
4. In the upload window, enter a name for the test set to display in the list of test sets, and attach the completed file.
5. Click Create.
Under Test sets, you can download the test set.
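If you prefer to prepare the file programmatically, the sketch below builds an equivalent spreadsheet with pandas. The Query and Expected response column names are assumptions based on the description above; always start from the downloaded XLSX template to get the exact headers.

```python
# Sketch: build a test set file with pandas (requires openpyxl for .xlsx output).
# Column names are assumptions - use the downloaded XLSX template for the exact headers.
import pandas as pd

rows = [
    {
        "Query": "How do I reset my password?",
        "Expected response": "Open Settings > Security and click Reset password.",
        # Multiple segments are comma-separated; include_without_segments
        # also searches sources that have no segments assigned.
        "Segments": "user-guide,faq,include_without_segments",
    },
    {
        "Query": "What file formats can I upload?",
        "Expected response": "PDF, DOCX, TXT, and HTML files are supported.",
        "Segments": "",  # empty: the search is not limited to specific segments
    },
]

pd.DataFrame(rows).to_excel("test_set.xlsx", index=False)
```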

Generate by LLM

1. Navigate to the Quality evaluation section.
2. Under Test sets, click Generate.
3. Specify the test set parameters:
   - Name to display in the list of test sets.
   - Language model for generating queries and responses.
   - Number of queries per source document and the maximum number of queries in the test set.
   - Prompt for generation. You can define a specific style or types of questions (e.g., comparative or step-by-step), add examples, or change the language. A conceptual sketch of this generation step is shown after this procedure.
     Prompt editing is available on request. If you require this feature, please contact support at support@tovie.ai.
     Caution: The prompt defines the response structure that the system expects from the model. An accidental change to this structure (such as deleting a field) will prevent the system from processing the response, causing the test set generation to fail. Make changes with caution.
4. Click Create.
Generating a test set using an LLM can take a significant amount of time, potentially up to several hours, depending on the number of queries required.
Under Test sets, you can:
- Track the generation status.
- Cancel the generation if necessary.
- Download the generated test set.
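Conceptually, the generation step asks a language model to produce query and expected-response pairs from each source document. The sketch below illustrates that idea; the complete function and the JSON response format are assumptions, not the platform's actual prompt contract.

```python
# Rough sketch of LLM-based test set generation. complete() stands for a call to
# whatever chat model you use; the JSON response format shown here is an assumption.
import json

GENERATION_PROMPT = """You are preparing a test set for a knowledge base.
Read the document below and produce {n} question/answer pairs that a real
user might ask. Reply with a JSON list of objects with the keys
"query" and "expected_response".

Document:
{document}
"""

def generate_pairs(document: str, n: int, complete) -> list[dict]:
    """Generate up to n query/expected-response pairs from one source document."""
    raw = complete(GENERATION_PROMPT.format(n=n, document=document))
    return json.loads(raw)  # [{"query": ..., "expected_response": ...}, ...]
```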
Start evaluation
To modify settings before starting an evaluation:
1. Click next to the test set.
2. Select a model from the list.
3. In the evaluation prompt, you can adjust the scoring scale, clarify criteria for lowering a score, or add examples. An illustrative example of such a prompt is shown after this procedure.
   Prompt editing is available on request. If you require this feature, please contact support at support@tovie.ai.
   Caution: The prompt defines the response structure that the system expects from the model. An accidental change to this structure (such as deleting a field) will prevent the system from processing the response, causing the evaluation to fail. Make changes with caution.
4. Click Save and start.
To start the evaluation with the current settings, click Start evaluation next to the test set.
The evaluation process can take a significant amount of time, potentially up to several hours, depending on the number of queries in the test set.
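For reference, the sketch below shows what an evaluation (LLM-as-judge) prompt of this kind typically looks like and why its response structure matters. The wording and the JSON fields are illustrative assumptions, not the platform's built-in prompt.

```python
# Illustrative LLM-as-judge prompt. The exact built-in prompt and its expected
# response fields differ; this only shows why removing a field (e.g. "score")
# would make the model's reply unparseable.
EVALUATION_PROMPT = """Rate how well the actual response answers the user's
query compared with the expected response, on a scale from 1 (useless) to
10 (fully correct and complete). Lower the score for missing facts,
contradictions, or irrelevant content.

Query: {query}
Expected response: {expected_response}
Actual response: {actual_response}

Reply with JSON: {{"score": <1-10>, "reason": "<one sentence>"}}
"""
```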
View evaluation results
To download a detailed report with scores for each query in the test set, click next to the test set, and then click Results for the evaluation.
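You can post-process the downloaded report, for example to find the lowest-scoring queries. The sketch below assumes an XLSX report with Query and Score columns; adjust the file name and column names to match the actual report.

```python
# Sketch: inspect a downloaded evaluation report with pandas.
# The "Query" and "Score" column names are assumptions - match them to the real report.
import pandas as pd

report = pd.read_excel("evaluation_results.xlsx")
print("Average score:", report["Score"].mean())

# Show the five worst-performing queries to prioritize fixes in the sources.
print(report.sort_values("Score").head(5)[["Query", "Score"]])
```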

Set up schedule
To schedule an evaluation:
1. Click Schedule next to the test set.
2. Specify the frequency and start time.
3. To skip the scheduled evaluation if there are no updates to the knowledge base since the last evaluation, use the options under Evaluate only on updates:
   - On data source updates: start evaluation only if the knowledge base data has been updated.
   - On project settings updates: start evaluation only if project settings have been updated.
   - Enable both options to start evaluation whenever there are any updates.
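Taken together, these options mean: run the scheduled evaluation only if at least one of the selected update types has occurred since the last run. The sketch below expresses that decision logic; the timestamp values are hypothetical and tracked by the platform internally.

```python
# Sketch of the "Evaluate only on updates" decision. The timestamp arguments are
# hypothetical; the platform tracks update and evaluation times internally.
from datetime import datetime

def should_run_evaluation(
    last_evaluation: datetime,
    last_data_update: datetime,
    last_settings_update: datetime,
    on_data_updates: bool,
    on_settings_updates: bool,
) -> bool:
    if not (on_data_updates or on_settings_updates):
        return True  # no conditions selected: always run on schedule
    return (
        (on_data_updates and last_data_update > last_evaluation)
        or (on_settings_updates and last_settings_update > last_evaluation)
    )
```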
View evaluation history
The evaluation history is available separately for each test set.
To view the chart, select the period and test set under Evaluation history.

To view the list of evaluations, click next to the test set.