Website sources let BoundBot discover, fetch, and index pages from your site. This is the best option when your public content changes often and you want the bot to stay aligned with it. When you add a source, BoundBot starts discovering and scraping pages in the background automatically.

Add a source

Open Knowledge -> Websites and click Add Website. You can add a source in three modes:
  • Crawl links: discover pages by following internal links
  • Sitemap: import URLs from your sitemap
  • Individual link: index one specific page
If you paste a direct sitemap.xml URL, BoundBot detects it and treats it as a sitemap source automatically.
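
For reference, sitemap sources use the standard sitemaps.org XML format; each <loc> entry is a URL BoundBot imports. A minimal example with placeholder URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each <loc> is one URL to import; the URLs here are placeholders. -->
  <url>
    <loc>https://example.com/docs/getting-started</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/docs/billing</loc>
  </url>
</urlset>
```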

Advanced options

For crawl and sitemap sources, you can also set:
  • Include path prefix to limit the crawl to part of a site
  • Exclude path prefix to skip sections such as /admin
  • Max links to control crawl size
  • Auto Recrawl if you want the source refreshed on a schedule
After you save the source, BoundBot opens the source detail page and starts the first crawl automatically. While the crawl is running, the detail page updates live with discovered links, crawl progress, failures, and stored size.
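
The include and exclude prefixes act as a simple filter over discovered URLs. Here is a minimal sketch of the assumed semantics in Python; simple prefix matching on the URL path is an assumption, and BoundBot's exact matching rules may differ:

```python
from urllib.parse import urlparse

def should_crawl(url: str, include_prefix: str | None,
                 exclude_prefix: str | None) -> bool:
    """Assumed semantics: keep a URL if its path starts with the
    include prefix (when set) and not with the exclude prefix."""
    path = urlparse(url).path
    if include_prefix and not path.startswith(include_prefix):
        return False
    if exclude_prefix and path.startswith(exclude_prefix):
        return False
    return True

urls = [
    "https://example.com/docs/setup",
    "https://example.com/docs/admin/users",
    "https://example.com/blog/launch",
]
kept = [u for u in urls if should_crawl(u, "/docs", "/docs/admin")]
# kept == ["https://example.com/docs/setup"]; Max links would then cap
# how many of the kept URLs are crawled (also an assumption).
```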

Manage a source

Each source card shows:
  • current status
  • number of discovered links
  • number of scraped links
  • total stored size
  • last crawl time
  • last error, if one occurred
Open a source to inspect every URL, filter by status, search links, and run background actions:
  • Fetch & Crawl to rediscover links and crawl pending pages in one run
  • Fetch Links to trigger the same discovery-and-crawl workflow; it is available from the source card as well as the detail page
  • Crawl Pending to scrape only links that are still pending
  • Retry Failed to retry links that failed on the last pass
  • Retrain Agent to mark every discovered link for recrawl and scrape the whole source again
  • Crawl Selected to crawl only the links you choose from the table
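
These actions differ mainly in which link statuses they target. A rough sketch with hypothetical status names (pending, scraped, failed) that may not match BoundBot's internals; the discovery step of Fetch & Crawl and Fetch Links is omitted:

```python
# Hypothetical status names, for illustration only.
PENDING, SCRAPED, FAILED = "pending", "scraped", "failed"

def action_targets(action: str, links: list[dict]) -> list[dict]:
    """Links each crawl action would operate on, assuming actions
    filter purely by the status shown in the link table."""
    wanted = {
        "Crawl Pending": {PENDING},                    # unscraped only
        "Retry Failed": {FAILED},                      # last-pass failures
        "Retrain Agent": {PENDING, SCRAPED, FAILED},   # everything
    }[action]
    return [link for link in links if link["status"] in wanted]

links = [
    {"url": "https://example.com/a", "status": SCRAPED},
    {"url": "https://example.com/b", "status": PENDING},
    {"url": "https://example.com/c", "status": FAILED},
]
assert [l["url"] for l in action_targets("Retry Failed", links)] == \
    ["https://example.com/c"]
```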

Watch your limits

The top summary shows:
  • website source count
  • total links
  • stored data size
  • crawl credit cost per page
Limits and credit costs vary by plan. If you hit a limit, BoundBot prompts you to upgrade before you add more sources. If a crawl runs out of credits partway through, the remaining links stay pending so you can resume later.
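
The resume behavior follows from simple credit arithmetic. A sketch with an illustrative rate of 1 credit per page (your plan's actual rate may differ):

```python
def crawl_budget(pending_links: int, credits: int,
                 cost_per_page: int = 1) -> tuple[int, int]:
    """Return (pages crawlable now, links left pending).
    cost_per_page of 1 is illustrative, not a documented rate."""
    crawlable = min(pending_links, credits // cost_per_page)
    return crawlable, pending_links - crawlable

crawled, still_pending = crawl_budget(pending_links=300, credits=120)
# crawled == 120, still_pending == 180: the 180 stay pending, so a
# later Crawl Pending run can resume after you add credits or upgrade.
```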

Best practices

  • Start with the smallest useful section of your site.
  • Exclude duplicate or low-value pages such as admin routes, login pages, and legal archives if they do not help customers.
  • Use Fetch & Crawl after major navigation or sitemap changes.
  • Use Retrain Agent when existing pages changed heavily and you want the full source scraped again.
  • Review errors early so broken pages do not quietly degrade your answers.
Crawling consumes credits. If your site is large, use path filters and max-link limits instead of crawling the whole domain on day one.

Related

  • Knowledge base: see how website sources fit with FAQs, files, products, and MCP tools.
  • Files: use uploaded documents when the source of truth does not live on a public site.
  • Products: keep structured catalog data separate from crawled website content.
  • Plans and limits: review website source, crawl, and storage limits by tier.