Integrate website
This integration enables you to import data from a website into your knowledge base. The data can be automatically synchronised and re-indexed.
Processing of pages via website integration is billed separately.
Data to import
There are three download options available:
- Entire website: Pages discovered by the crawler starting from the homepage.
- Section: Nested (child) pages within the specified section.
- Specific pages.
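The Section option's path rule can be pictured with a small sketch. The function name `in_section` is hypothetical and this is not the product's actual implementation. It assumes that matching happens on whole path segments and that the starting page itself is included, neither of which the documentation states.

```python
from urllib.parse import urlsplit


def in_section(url_or_path: str, start_path: str) -> bool:
    """Return True if the page sits under start_path (illustrative rule only)."""
    # Keep only the path component, so full URLs and bare paths both work.
    path = urlsplit(url_or_path).path.rstrip("/")
    start = start_path.rstrip("/")
    # Assumption: the starting page counts, and matching respects segment
    # boundaries, so "/features/1" does not match "/features/10".
    return path == start or path.startswith(start + "/")


print(in_section("/features/1/docs", "/features/1"))               # True
print(in_section("/features/1/pricing/discounts", "/features/1"))  # True
print(in_section("/features/2", "/features/1"))                    # False
```

The paths in the usage lines are the ones used as the example for the Starting path setting.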
Only the text content of HTML pages is downloaded. Images are not downloaded or processed (although they may appear in responses if linked in the text). Files in other formats are ignored.
Processing of each downloaded page is billed.
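Because every page is reprocessed on each synchronisation, the number of billed page-processing events grows with the number of syncs. The arithmetic below is a rough sketch: it assumes one event per page for the initial import and one per page for every later sync. The function name, the page count, and the number of syncs are made up for illustration, and actual prices are not covered here.

```python
def billed_page_events(pages: int, syncs: int) -> int:
    """Count page-processing events: the initial import plus every sync."""
    if pages < 0 or syncs < 0:
        raise ValueError("pages and syncs must be non-negative")
    # Assumption: unchanged pages are still counted on every sync.
    return pages * (1 + syncs)


# Hypothetical numbers: 200 pages imported once, then synchronised 30 times.
print(billed_page_events(200, 30))  # 6200
# A one-time import (automatic synchronisation disabled) counts each page once.
print(billed_page_events(200, 0))   # 200
```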
Set up integration
To download page content from a site:
- Go to Integrations and click Connect → Website.
  If you selected Connect integration when creating the project, you will be automatically redirected to the new integration page.
- Specify the integration settings:
  - Integration name: Prefix for the names of sources imported from this integration.
  - Segments: Segments to assign to imported sources. Use them to group sources by theme and limit knowledge base queries to specific segments. For more details, see the Segments section.
  - Domain: The website’s domain name. Subdomains of any level are supported.
  - Data to download:
    - For the Entire website option, set the maximum number of pages to download. The crawl stops once this limit is reached, even if not all discovered links have been processed.
    - For the Section option, specify:
      - Starting path. For example, if you enter /features/1, pages like /features/1/docs and /features/1/pricing/discounts may be downloaded, but /features/2 will not.
      - Maximum number of pages to download.
    - For the Specific pages option, enter all the paths to the pages to be downloaded.
  - Automatically synchronise data: Enable this option to keep the knowledge base up to date, or disable it for a one-time import.
    Caution: Each page is reprocessed on every sync, even if the content hasn’t changed, and this is billed.
  - Automatically restart indexing: Enable this option to re-index the knowledge base after each synchronisation.
  Note: You can change these settings later.
Immediately after setup, the integration status is “Updating”. When the status changes to “Connected”, you can view the imported files and pages in the Sources section.
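To picture how the page limit of the Entire website option cuts a crawl short, here is a minimal sketch over an in-memory link graph. The breadth-first order, the function name `crawl`, and the sample site are assumptions for illustration. The real crawler's traversal order is not documented here.

```python
from collections import deque


def crawl(links: dict[str, list[str]], start: str, max_pages: int) -> list[str]:
    """Walk a link graph breadth-first and stop once max_pages are downloaded."""
    downloaded: list[str] = []
    seen = {start}
    queue = deque([start])
    # Stop as soon as the limit is hit, even if the queue still holds links.
    while queue and len(downloaded) < max_pages:
        page = queue.popleft()
        downloaded.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return downloaded


# Hypothetical site: each key is a page, each value lists the pages it links to.
site = {
    "/": ["/features", "/pricing"],
    "/features": ["/features/1", "/features/2"],
    "/pricing": ["/pricing/discounts"],
}
print(crawl(site, "/", 3))  # ['/', '/features', '/pricing']
```

With a limit of 3, the deeper pages are discovered but never downloaded. This is why a limit that is too low can leave parts of a site out of the knowledge base.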
In addition to data from integrations, you can upload files manually.
After loading the data, you need to index the knowledge base.
To learn about forced synchronisation and changing settings, see the Manage integrations section.