Integrate website
This integration enables you to import data from a website into your knowledge base, automatically synchronising and re-indexing it.
Processing of pages via website integration is billed separately.
Data to import
There are three download options:
- Entire website: Pages discovered by the crawler starting from the homepage.
- Section: Only nested (child) pages within the specified section.
- Specific pages: Only the pages at the paths you specify.
The following limits apply:
- A maximum of 100 pages can be downloaded in total.
- Only the text content of HTML pages is downloaded. Images are not downloaded or processed (although they may appear in responses if linked in the text). Files in other formats are ignored.
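The three options can be pictured as filters over the paths the crawler discovers. The sketch below is only an illustration, not the service's actual implementation. The function names, the mode strings, and the choice to truncate at 100 pages are assumptions. It also assumes that a section matches on whole path segments, so `/features/10` is not treated as a child of `/features/1`, and that the section's starting page itself is not counted as a child.

```python
from urllib.parse import urlsplit

MAX_PAGES = 100  # limit stated in this page; truncating at the limit is an assumption


def normalise(path: str) -> str:
    """Drop any query string or fragment and any trailing slash."""
    clean = urlsplit(path).path.rstrip("/")
    return clean or "/"


def in_section(path: str, section: str) -> bool:
    """Return True if path is a nested (child) page of section (hypothetical rule)."""
    page, root = normalise(path), normalise(section)
    if root == "/":
        return page != "/"
    # Require a "/" after the section path so that /features/10
    # is not mistaken for a child of /features/1.
    return page.startswith(root + "/")


def select_pages(discovered, mode, paths=()):
    """Pick pages for one of the three options: 'entire', 'section' or 'pages'."""
    if mode == "entire":
        chosen = [normalise(p) for p in discovered]
    elif mode == "section":
        chosen = [normalise(p) for p in discovered if in_section(p, paths[0])]
    elif mode == "pages":
        wanted = {normalise(p) for p in paths}
        chosen = [normalise(p) for p in discovered if normalise(p) in wanted]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return chosen[:MAX_PAGES]


crawled = ["/", "/features/1/docs", "/features/1/pricing/discounts", "/features/2"]
print(select_pages(crawled, "section", ["/features/1"]))
```

With the Section option and a starting path of /features/1, only /features/1/docs and /features/1/pricing/discounts are selected, which matches the example given in the setup steps.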
Set up integration
To download page content from a site:
- Go to Integrations and click Connect → Website.
  If you selected Connect integration when creating the project, you will be automatically redirected to the new integration page.
- Specify the integration settings:
  - Integration name: Prefix for the names of sources imported from this integration.
  - Segments: Segments to assign to sources imported from this integration. Segments can be used to group sources with a common theme. You can specify segments when querying the knowledge base to search within those segments only. For more details, see the Segments section.
  - Domain: The website address. Subdomains of any level are supported.
  - Data to download:
    - For the Section option, specify the starting path. For example, if you enter /features/1, pages like /features/1/docs and /features/1/pricing/discounts will be downloaded, but /features/2 will not.
    - For the Specific pages option, enter all the paths to the pages to be downloaded.
  - Automatically synchronise data: Enable this option to keep the knowledge base up to date, or disable it for a one-time import.
    Caution: Each page is reprocessed on every sync, even if the content hasn’t changed, and this is billed.
  - Automatically restart indexing: Enable this option to re-index the knowledge base after each synchronisation.
  Note: All the settings can be modified later.
- Immediately after setup, the integration status is “Updating.” When it changes to “Connected,” you can view the downloaded pages in the Sources section.
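Because every page is reprocessed on each synchronisation whether or not it changed, the number of billed processing runs grows with the number of syncs. The arithmetic can be sketched as follows. The function name is hypothetical, the sync count is an example value, and counting the initial import as one billed run per page is an assumption. No price is shown because this page does not state one.

```python
def billable_page_runs(pages: int, syncs: int) -> int:
    """Count page-processing runs: the initial import plus one run per page per sync."""
    if pages < 0 or syncs < 0:
        raise ValueError("pages and syncs must be non-negative")
    return pages * (1 + syncs)


# Example: 100 pages (the documented maximum) and a hypothetical 30 syncs.
print(billable_page_runs(100, 30))  # 3100
```

For a one-time import with automatic synchronisation disabled, the same function with `syncs=0` gives just the page count.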
In addition to data from integrations, you can upload files manually.
After loading the data, you need to index the knowledge base.
To learn about forced synchronisation and changing settings, see the Manage integrations section.