Evaluate quality of knowledge base responses

Response quality evaluation addresses several key tasks:

  • Determine how effectively the knowledge base answers typical user queries.
  • Optimize response quality by adjusting settings and analyzing the results.
  • Monitor changes in response quality over time as data sources are updated.
info

To evaluate response quality, you need at least the KHUB_EDITOR role.

How evaluation works

Quality evaluation uses a test set consisting of queries to the knowledge base and their expected responses. Each query is sent to the knowledge base, and the received response is passed to the LLM together with the query and the expected response. The LLM rates the quality of the actual response on a scale from 1 to 10. The final score is the average of the scores across the entire test set.

The more queries the test set includes, the more reliable the resulting score.

Response quality can be assessed using multiple test sets, with each evaluation conducted independently.
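
The flow can be summarized with a short sketch. This is an illustration only: the function names, the sample test set, and the scores below are assumptions and do not reflect the product's actual API.

```python
# A minimal sketch of the evaluation loop described above.
# query_knowledge_base and rate_with_llm are placeholders for illustration.

def query_knowledge_base(query: str) -> str:
    """Placeholder: send the query to the knowledge base and return its response."""
    return "actual response from the knowledge base"

def rate_with_llm(query: str, expected: str, actual: str) -> int:
    """Placeholder: ask the judge LLM to rate the actual response from 1 to 10,
    given the query and the expected response."""
    return 8

test_set = [
    {"query": "How do I reset my password?",
     "expected": "Use the Reset password link on the sign-in page."},
    {"query": "Which file formats can I upload?",
     "expected": "PDF, DOCX, and TXT files are supported."},
]

scores = []
for item in test_set:
    actual = query_knowledge_base(item["query"])
    scores.append(rate_with_llm(item["query"], item["expected"], actual))

# The final score is the average of the per-query scores.
final_score = sum(scores) / len(scores)
print(f"Final score: {final_score:.1f} / 10")
```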

Prepare test set

You can create a test set yourself or generate it using an LLM.

Create test set
  1. Navigate to the Quality evaluation section.

  2. Download the test set template: click Upload under Test sets, and then click Download XLSX template in the upload window.

  3. Add your queries and expected responses to the file (an example of a completed file is sketched after these steps).

  4. To evaluate response quality by individual knowledge base segments, add the Segments column to the test set:

    • If answering a question requires searching within specific segments, list them separated by commas.
    • To search unsegmented sources, specify include_without_segments.
    • To search the entire knowledge base, leave the field blank.

    Example: Quality control,include_without_segments.

  5. In the upload window, enter a display name for the test set and attach the completed file.

  6. Click Create.
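
If you prefer to fill in the file with a script, a minimal sketch is shown below. The column names and example rows are assumptions for illustration; treat the downloaded XLSX template as the authoritative layout.

```python
# Illustrative only: the column names below are assumptions based on the steps
# above; use the downloaded XLSX template as the source of truth.
import pandas as pd  # requires openpyxl for .to_excel()

rows = [
    {
        "Query": "How do I request vacation?",
        "Expected response": "Submit a vacation request in the HR portal.",
        # Search the listed segments plus unsegmented sources.
        "Segments": "Quality control,include_without_segments",
    },
    {
        "Query": "Where is the incident response policy?",
        "Expected response": "It is published in the Security section.",
        # Leave blank to search the entire knowledge base.
        "Segments": "",
    },
]

pd.DataFrame(rows).to_excel("test_set.xlsx", index=False)
```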

Under Test sets, you can download the test set.

Start evaluation

To modify settings before starting an evaluation:

  1. Click next to the test set.

  2. Select a model from the list.

  3. In the evaluation prompt, you can adjust the scoring scale, clarify criteria for lowering a score, or add examples.

    caution
    • To view and edit the prompt, you need the KHUB_ADMIN role.
    • The prompt defines the response structure that the system expects from the model. An accidental change to this structure (such as deleting a field) will prevent the system from processing the response, causing the evaluation to fail. Make changes with caution.
  4. Click Save and start.

To start the evaluation with the current settings, click Start evaluation next to the test set.

Depending on the number of queries in the test set, the evaluation can take a significant amount of time, up to several hours.

View evaluation results

To download a detailed report with scores for each query in the test set, click next to the test set, and then click Results for the evaluation.
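
As a rough sketch for working with such a report, assuming it is an XLSX file with one row per query and a numeric score column (the file and column names here are assumptions, not the actual report format), you could aggregate the per-query scores like this:

```python
# Illustrative only: "evaluation_results.xlsx" and the "Score" column are assumptions.
import pandas as pd

report = pd.read_excel("evaluation_results.xlsx")
print(report["Score"].mean())        # overall average score
print(report.nsmallest(5, "Score"))  # lowest-scoring queries to review first
```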

Set up schedule

To schedule an evaluation:

  1. Click Schedule next to the test set.

  2. Specify the frequency and start time.

  3. To skip a scheduled evaluation when nothing in the knowledge base has changed since the previous evaluation, use the options under Evaluate only on updates:

    • On data source updates: start evaluation only if the knowledge base data has been updated.
    • On project settings updates: start evaluation only if project settings have been updated.
    • Enable both options to start evaluation whenever there are any updates.

View evaluation history

The evaluation history is available separately for each test set.

To view the chart, select the period and test set under Evaluation history.

To view the list of evaluations, click next to the test set.