Website sources let BoundBot discover, fetch, and index pages from your site. This is the best option when your public content changes often and you want the bot to stay aligned with it.

When you add a source, BoundBot starts discovering and scraping pages in the background automatically.

## Add a source

Open Knowledge -> Websites and click **Add Website**.

You can add a source in three modes:

* **Crawl links**: discover pages by following internal links
* **Sitemap**: import URLs from your sitemap
* **Individual link**: index one specific page

If you paste a direct `sitemap.xml` URL, BoundBot detects it and treats it as a sitemap source automatically.

## Advanced options

For crawl and sitemap sources, you can also set:

* **Include path prefix** to limit the crawl to part of a site
* **Exclude path prefix** to skip sections such as `/admin`
* **Max links** to control crawl size
* **Auto Recrawl** if you want the source refreshed on a schedule

After you save the source, BoundBot opens the source detail page and starts the first crawl automatically. While the source is running, the page updates live with discovered links, crawl progress, failures, and stored size.

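To estimate which pages a source would pick up before you spend credits, it can help to model the include prefix, exclude prefix, and max-links options locally. The sketch below assumes plain path-prefix matching and a simple cap on the number of accepted URLs. BoundBot's actual matching rules are not documented here, so `select_urls` and the `example.com` URLs are hypothetical and only illustrate the idea.

```python
from urllib.parse import urlparse


def select_urls(urls, include_prefix=None, exclude_prefix=None, max_links=None):
    """Return the URLs that pass prefix filtering, capped at max_links.

    Hypothetical helper: it assumes simple path-prefix matching and is
    not BoundBot's actual implementation.
    """
    selected = []
    for url in urls:
        if max_links is not None and len(selected) >= max_links:
            break  # crawl-size cap reached
        path = urlparse(url).path or "/"
        if include_prefix and not path.startswith(include_prefix):
            continue  # outside the section we want to crawl
        if exclude_prefix and path.startswith(exclude_prefix):
            continue  # explicitly skipped section, such as /admin
        selected.append(url)
    return selected


# Made-up candidate URLs for illustration only.
candidates = [
    "https://example.com/docs/getting-started",
    "https://example.com/docs/admin/settings",
    "https://example.com/admin/login",
    "https://example.com/docs/billing",
    "https://example.com/blog/launch",
]

for url in select_urls(candidates, include_prefix="/docs",
                       exclude_prefix="/docs/admin", max_links=10):
    print(url)
```

With these assumed inputs, only the two `/docs` pages outside `/docs/admin` are kept. Running a check like this against your own sitemap URLs gives a rough page count to compare with your plan's link and credit limits.
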
## Manage a source

Each source card shows:

* current status
* number of discovered links
* number of scraped links
* total stored size
* last crawl time
* last error, if one occurred

Open a source to inspect every URL, filter by status, search links, and run background actions:

* **Fetch & Crawl** to rediscover links and crawl pending pages in one run
* **Fetch Links** to start the same background discovery-and-crawl workflow from the source card or detail page
* **Crawl Pending** to scrape only links that are still pending
* **Retry Failed** to retry links that failed on the last pass
* **Retrain Agent** to mark every discovered link for recrawl and scrape the whole source again
* **Crawl Selected** to crawl only the links you choose from the table

## Watch your limits

The top summary shows:

* website source count
* total links
* stored data size
* crawl credit cost per page

These limits vary by plan. If you hit a limit, BoundBot prompts you to upgrade before you add more sources. If a crawl runs out of credits partway through, the remaining links stay pending so you can resume later.

## Best practices

* Start with the smallest useful section of your site.
* Exclude duplicate or low-value pages such as admin routes, login pages, and legal archives if they do not help customers.
* Use **Fetch & Crawl** after major navigation or sitemap changes.
* Use **Retrain Agent** when existing pages changed heavily and you want the full source scraped again.
* Review errors early so broken pages do not quietly degrade your answers.

Crawling consumes credits. If your site is large, use path filters and max-link limits instead of crawling the whole domain on day one.

## Related pages

* See how website sources fit with FAQs, files, products, and MCP tools.
* Use uploaded documents when the source of truth does not live on a public site.
* Keep structured catalog data separate from crawled website content.
* Review website source, crawl, and storage limits by tier.