Response generation settings

To access the parameters for generating responses to user queries, go to Project settings → Generation.

  • To view project settings, you need at least the KHUB_EDITOR role.
  • To edit project settings, you need the KHUB_OWNER or KHUB_ADMIN role.

System prompt

The system prompt is used when generating responses to user queries. The system prompt can provide the model with additional information — for example, describe your company’s area of business; specify requirements for the response’s style, tone, and formatting; and define options for handling special cases, such as when the knowledge base has no relevant information.
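
For example, a system prompt might look like this (the wording below is purely illustrative and should be adapted to your project):

```
You are a support assistant for Acme Insurance.
Answer only questions about our insurance products and the claims process.
Respond in a polite, concise tone and format answers as short paragraphs or bullet lists.
If the knowledge base contains no relevant information, say that you cannot answer
and suggest contacting support@acme.example.
```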

caution
  • To view and edit the system prompt, you need the KHUB_ADMIN role.
  • A poorly written prompt can significantly degrade response quality. Please make changes with caution and test the results thoroughly. Always keep a backup of a working version of your prompt.

LLM settings

The LLM settings apply to:

  • generating responses to user queries
  • retrieving chunks within the agentic pipeline
  • rephrasing queries and taking conversation history into account within the semantic pipeline

Main settings:

  • Model: Select one of the available language models. For the agentic pipeline, only models that support function calling are available, as the model calls functions to request chunks.
  • Max tokens in request: Limits the number of tokens that can be sent to the LLM.
  • Max tokens in response: Limits the number of tokens that the LLM can generate in one iteration.
  • Temperature: Adjusts the creativity level of responses. Higher temperature values produce more creative and less predictable results. We recommend adjusting either Temperature or Top P, but not both at once.
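
To illustrate what Temperature and Top P control, here is a minimal sketch of how these parameters are commonly applied during sampling. This is a generic illustration, not the implementation used by any particular model:

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0, top_p: float = 1.0) -> int:
    """Pick the next token id from raw logits using temperature and nucleus (top-p) sampling."""
    # Temperature scales the logits: values < 1 sharpen the distribution,
    # values > 1 flatten it, making the output less predictable.
    scaled = logits / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top P keeps only the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalises and samples from it.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()
    return int(np.random.choice(kept, p=kept_probs))
```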

Advanced settings:

  • Top P: Adjusts the diversity of responses. At lower values, the LLM selects words from a smaller, more likely set. At higher values, the response becomes more diverse. We recommend adjusting either Top P or Temperature, but not both at once.

  • Presence penalty: Reduces the likelihood of repeated tokens in a response. The higher the value, the less likely words or phrases are to be repeated in the response.

    All repetitions are penalised equally, no matter how frequently they occur. For example, the second appearance of a token is penalised the same as the tenth.

  • Frequency penalty: Reduces the likelihood of frequently occurring tokens in a response. The higher the value, the less likely words or phrases are to appear multiple times in the response.

    The impact of Frequency penalty grows with the number of times a token appears in the text.
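
As a rough illustration of the difference between the two penalties, many LLM providers apply them to token logits along these lines. This is a sketch of the commonly documented formula, not necessarily the exact one used by every model:

```python
def apply_penalties(logit: float, count: int, presence_penalty: float, frequency_penalty: float) -> float:
    """Adjust a token's logit based on how many times it already appears in the generated text."""
    if count > 0:
        # Presence penalty: a flat, one-off penalty as soon as the token has appeared at all.
        logit -= presence_penalty
    # Frequency penalty: grows linearly with the number of occurrences.
    logit -= frequency_penalty * count
    return logit
```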

Show source documents in bot response

If enabled, each knowledge base response includes a list of sources, specifically the files or pages the response is based on.

How sources appear in the response

The source list includes source names and links. In the Tovie Data Agent API, the source list is returned as a relevantSources array.

  • If the source document came from an integration, the link to its original location is provided, such as a link to a page or attachment in Confluence.
  • If the source document was uploaded manually as a file, a temporary download link is provided in channels and via the API. Such links are only valid for a limited time. The test chat displays a link to the Sources section and a download button.
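
For example, an API response might contain a relevantSources array along these lines. Only the relevantSources field name comes from the API; the other field names and values below are illustrative:

```json
{
  "relevantSources": [
    {
      "name": "Onboarding guide.pdf",
      "url": "https://example.com/downloads/temporary-link"
    },
    {
      "name": "HR policies (Confluence page)",
      "url": "https://example.atlassian.net/wiki/spaces/HR/pages/12345"
    }
  ]
}
```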

If the source is audio or video, a player is available in the test chat, and playback starts from the timestamp of the most relevant chunk.

Additionally, the API provides an endpoint to download a source from the knowledge base: GET /sources/{sourceId}/download.
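
A minimal sketch of calling this endpoint from Python, assuming a base URL and a bearer-token header; both are placeholders that depend on your deployment:

```python
import requests

BASE_URL = "https://<your-data-agent-host>/api"  # placeholder; use your deployment's base URL
API_KEY = "<your-api-key>"                       # placeholder; use your project's credentials

def download_source(source_id: str, target_path: str) -> None:
    """Download a knowledge base source file via GET /sources/{sourceId}/download."""
    response = requests.get(
        f"{BASE_URL}/sources/{source_id}/download",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=60,
    )
    response.raise_for_status()
    with open(target_path, "wb") as f:
        f.write(response.content)
```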

Caching

If enabled, LLM responses (excluding function calls) will be cached. Caching responses helps reduce costs and response times for similar queries.

Select the cache time-to-live.

If the cache contains outdated data, clear it manually. Clearing the cache will temporarily increase costs and result in slower responses to similar questions.
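
Conceptually, response caching works along these lines. This is a simplified sketch for illustration only; the product's actual cache keys and matching logic are internal:

```python
import hashlib
import time

class ResponseCache:
    """A simplified TTL cache keyed by the query text and generation settings."""

    def __init__(self, ttl_seconds: int):
        self.ttl = ttl_seconds
        self.entries: dict[str, tuple[float, str]] = {}

    def _key(self, query: str, settings: str) -> str:
        return hashlib.sha256(f"{query}|{settings}".encode()).hexdigest()

    def get(self, query: str, settings: str) -> str | None:
        key = self._key(query, settings)
        entry = self.entries.get(key)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]  # cache hit: no LLM call needed
        return None          # miss or expired: generate a fresh response

    def put(self, query: str, settings: str, response: str) -> None:
        self.entries[self._key(query, settings)] = (time.time(), response)

    def clear(self) -> None:
        # Manual clearing: subsequent queries go to the LLM again until the cache refills.
        self.entries.clear()
```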

Additions to tabular pipeline prompt

If the knowledge base contains CSV or XLSX files, a dedicated tabular pipeline is used to extract data from them. Based on the user’s query, the AI agent determines whether the tables contain relevant data, selects the relevant table, generates an SQL query, executes it, and presents the result in the response.
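
For example, for a question such as "What was the total revenue in March?", the pipeline might generate a query along these lines. The table and column names below belong to a hypothetical spreadsheet, not to any real knowledge base:

```sql
SELECT SUM(revenue)
FROM monthly_sales
WHERE month = 'March';
```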

The tabular pipeline uses a separate system prompt, which is not available for viewing or editing. However, you can append additional instructions to it — for example, to specify formatting requirements for the response.
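
For example, a prompt addition might look like this (illustrative wording):

```
Present numeric results as a table.
Round monetary values to two decimal places and include the currency.
If the query returns no rows, say that no matching data was found.
```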

info

To edit the prompt additions, you need the KHUB_ADMIN role.