Website sources let BoundBot discover, fetch, and index pages from your site. This is the best option when your public content changes often and you want the bot to stay aligned with it. When you add a source, BoundBot starts discovering and scraping pages in the background automatically.

Add a source

Open Knowledge -> Websites and click Add Website. You can add a source in three modes:
  • Crawl links: discover pages by following internal links
  • Sitemap: import URLs from your sitemap
  • Individual link: index one specific page
If you paste a direct sitemap.xml URL, BoundBot detects it and treats it as a sitemap source automatically.
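
For reference, sitemap sources use the standard sitemaps.org XML format; each <loc> entry is a URL BoundBot imports. A minimal example with placeholder URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each <loc> is one URL to import; the URLs here are placeholders. -->
  <url>
    <loc>https://example.com/docs/getting-started</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/docs/billing</loc>
  </url>
</urlset>
```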

Advanced options

For crawl and sitemap sources, you can also set:
  • Include path prefix to limit the crawl to part of a site
  • Exclude path prefix to skip sections such as /admin
  • Max links to control crawl size
  • Auto Recrawl if you want the source refreshed on a schedule
After you save the source, BoundBot opens the source detail page and starts the first crawl automatically. While the crawl is running, the detail page updates live with discovered links, crawl progress, failures, and stored size.
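
The include and exclude prefixes act as a simple filter over discovered URLs. Here is a minimal sketch of the assumed semantics in Python; simple prefix matching on the URL path is an assumption, and BoundBot's exact matching rules may differ:

```python
from urllib.parse import urlparse

def should_crawl(url: str, include_prefix: str | None,
                 exclude_prefix: str | None) -> bool:
    """Assumed semantics: keep a URL if its path starts with the
    include prefix (when set) and not with the exclude prefix."""
    path = urlparse(url).path
    if include_prefix and not path.startswith(include_prefix):
        return False
    if exclude_prefix and path.startswith(exclude_prefix):
        return False
    return True

urls = [
    "https://example.com/docs/setup",
    "https://example.com/docs/admin/users",
    "https://example.com/blog/launch",
]
kept = [u for u in urls if should_crawl(u, "/docs", "/docs/admin")]
# kept == ["https://example.com/docs/setup"]; Max links would then cap
# how many of the kept URLs are crawled (also an assumption).
```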

Manage a source

Each source card shows:
  • current status
  • number of discovered links
  • number of scraped links
  • total stored size
  • last crawl time
  • last error, if one occurred
Open a source to inspect every URL, filter by status, search links, and run background actions:
  • Fetch & Crawl to rediscover links and crawl pending pages in one run
  • Fetch Links to trigger the same discovery-and-crawl workflow; it is available from the source card as well as the detail page
  • Crawl Pending to scrape only links that are still pending
  • Retry Failed to retry links that failed on the last pass
  • Retrain Agent to mark every discovered link for recrawl and scrape the whole source again
  • Crawl Selected to crawl only the links you choose from the table
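
These actions differ mainly in which link statuses they target. A rough sketch with hypothetical status names (pending, scraped, failed) that may not match BoundBot's internals; the discovery step of Fetch & Crawl and Fetch Links is omitted:

```python
# Hypothetical status names, for illustration only.
PENDING, SCRAPED, FAILED = "pending", "scraped", "failed"

def action_targets(action: str, links: list[dict]) -> list[dict]:
    """Links each crawl action would operate on, assuming actions
    filter purely by the status shown in the link table."""
    wanted = {
        "Crawl Pending": {PENDING},                    # unscraped only
        "Retry Failed": {FAILED},                      # last-pass failures
        "Retrain Agent": {PENDING, SCRAPED, FAILED},   # everything
    }[action]
    return [link for link in links if link["status"] in wanted]

links = [
    {"url": "https://example.com/a", "status": SCRAPED},
    {"url": "https://example.com/b", "status": PENDING},
    {"url": "https://example.com/c", "status": FAILED},
]
assert [l["url"] for l in action_targets("Retry Failed", links)] == \
    ["https://example.com/c"]
```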

Watch your limits

The top summary shows:
  • website source count
  • total links
  • stored data size
  • crawl credit cost per page
Limits and credit costs vary by plan. If you hit a limit, BoundBot prompts you to upgrade before you add more sources. If a crawl runs out of credits partway through, the remaining links stay pending so you can resume later.
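
The resume behavior follows from simple credit arithmetic. A sketch with an illustrative rate of 1 credit per page (your plan's actual rate may differ):

```python
def crawl_budget(pending_links: int, credits: int,
                 cost_per_page: int = 1) -> tuple[int, int]:
    """Return (pages crawlable now, links left pending).
    cost_per_page of 1 is illustrative, not a documented rate."""
    crawlable = min(pending_links, credits // cost_per_page)
    return crawlable, pending_links - crawlable

crawled, still_pending = crawl_budget(pending_links=300, credits=120)
# crawled == 120, still_pending == 180: the 180 stay pending, so a
# later Crawl Pending run can resume after you add credits or upgrade.
```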

Best practices

  • Start with the smallest useful section of your site.
  • Exclude duplicate or low-value pages such as admin routes, login pages, and legal archives if they do not help customers.
  • Use Fetch & Crawl after major navigation or sitemap changes.
  • Use Retrain Agent when existing pages changed heavily and you want the full source scraped again.
  • Review errors early so broken pages do not quietly degrade your answers.
Crawling consumes credits. If your site is large, use path filters and max-link limits instead of crawling the whole domain on day one.

Related

  • Knowledge base: see how website sources fit with FAQs, files, products, and MCP tools.
  • Files: use uploaded documents when the source of truth does not live on a public site.
  • Products: keep structured catalog data separate from crawled website content.
  • Plans and limits: review website source, crawl, and storage limits by tier.