# Website sources

> Crawl website pages and import live site content into your BoundBot knowledge base.

export const BrowserWrapper = ({src, alt, title, description, height = '280px', caption, imgStyle = {}, children, imagePadding = '0px'}) => {
  const legacyDocsImagePrefix = '/images/';
  const productionDocsImagePrefix = '/docs/images';
  const rawSrc = typeof src === 'string' ? src : src && (src.src || src.default) || '';
  const imageSrc = rawSrc.startsWith(legacyDocsImagePrefix) ? `${productionDocsImagePrefix}/${rawSrc.slice(legacyDocsImagePrefix.length)}` : rawSrc;
  return <Frame caption={caption || alt || title} width="100%">
      <div style={{
    border: '1px solid #e2e8f0',
    borderRadius: '12px',
    overflow: 'hidden',
    boxShadow: '0 10px 15px -3px rgba(0, 0, 0, 0.1), 0 4px 6px -2px rgba(0, 0, 0, 0.05)',
    width: '100%'
  }}>
        <div style={{
    backgroundColor: '#f8fafc',
    padding: '10px 15px',
    borderBottom: '1px solid #e2e8f0',
    display: 'flex',
    gap: '6px'
  }}>
          <div style={{
    width: '10px',
    height: '10px',
    borderRadius: '50%',
    backgroundColor: '#ff5f56'
  }} />
          <div style={{
    width: '10px',
    height: '10px',
    borderRadius: '50%',
    backgroundColor: '#ffbd2e'
  }} />
          <div style={{
    width: '10px',
    height: '10px',
    borderRadius: '50%',
    backgroundColor: '#27c93f'
  }} />
        </div>
        <div style={{
    overflow: 'hidden',
    background: '#ffffff',
    width: '100%',
    padding: imagePadding
  }}>
          {src ? <img src={imageSrc} alt={alt || ''} data-docs-screenshot style={{
    ...imgStyle,
    width: '100%',
    minWidth: '100%',
    height: 'auto',
    display: 'block',
    borderRadius: '0px'
  }} /> : children ? children : null}
        </div>
      </div>
    </Frame>;
};

Website sources let BoundBot discover, fetch, and index pages from your site. This is the best option when your public content changes often and you want the bot to stay aligned with it. When you add a source, BoundBot starts discovering and scraping pages in the background automatically.

## Add a source

Open <a href="https://www.boundbot.com/dashboard/knowledge/knowledge-base" target="_blank" rel="noopener noreferrer"><b>Knowledge</b></a> -> <a href="https://www.boundbot.com/dashboard/knowledge/websites" target="_blank" rel="noopener noreferrer"><b>Websites</b></a> and click **Add Website**.

<BrowserWrapper src="../images/website-image.png" alt="Add Website Source Modal showing URL input and crawl modes." maxWidth="700px" />

You can add a source in three modes:

* **Crawl links**: discover pages by following internal links
* **Sitemap**: import URLs from your sitemap
* **Individual link**: index one specific page

If you paste a direct `sitemap.xml` URL, BoundBot detects it and treats it as a sitemap source automatically.
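This page does not describe how BoundBot recognizes a sitemap URL. As a rough illustration only, a check of this kind could inspect the URL path. The function name `looksLikeSitemapUrl` and the "path ends in `sitemap.xml`" rule are assumptions, not BoundBot's actual logic.

```javascript
// Hypothetical helper: guesses whether a pasted URL points at a sitemap.
// The rule used here (path ends in "sitemap.xml") is an assumption for illustration.
function looksLikeSitemapUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return false; // not a valid absolute URL
  }
  return url.pathname.toLowerCase().endsWith('sitemap.xml');
}

console.log(looksLikeSitemapUrl('https://example.com/sitemap.xml')); // true
console.log(looksLikeSitemapUrl('https://example.com/pricing'));     // false
```

If your sitemap lives at a non-standard path, choosing the **Sitemap** mode explicitly avoids relying on detection.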

## Advanced options

For crawl and sitemap sources, you can also set:

* **Include path prefix** to limit the crawl to part of a site
* **Exclude path prefix** to skip sections such as `/admin`
* **Max links** to control crawl size
* **Auto Recrawl** if you want the source refreshed on a schedule
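To reason about how the path filters and the link cap interact, it can help to model them as a simple filter over discovered URLs. The sketch below is a mental model, not BoundBot's implementation; `selectUrls` and its option names (`includePrefix`, `excludePrefix`, `maxLinks`) are hypothetical, and the order in which BoundBot applies the options is an assumption.

```javascript
// Hypothetical model of the advanced options. All names are illustrative.
function selectUrls(urls, { includePrefix = '/', excludePrefix = null, maxLinks = Infinity } = {}) {
  const kept = [];
  for (const raw of urls) {
    if (kept.length >= maxLinks) break; // stop once the cap is reached
    const path = new URL(raw).pathname;
    if (!path.startsWith(includePrefix)) continue; // outside the included section
    if (excludePrefix && path.startsWith(excludePrefix)) continue; // explicitly skipped
    kept.push(raw);
  }
  return kept;
}

const discovered = [
  'https://example.com/docs/intro',
  'https://example.com/admin/users',
  'https://example.com/docs/admin/roles',
  'https://example.com/docs/setup',
  'https://example.com/docs/faq',
];

console.log(selectUrls(discovered, {
  includePrefix: '/docs',
  excludePrefix: '/docs/admin',
  maxLinks: 2,
}));
// [ 'https://example.com/docs/intro', 'https://example.com/docs/setup' ]
```

Note that a plain prefix match on `/docs` would also match a path such as `/docs-old`. Adding a trailing slash (`/docs/`) is one way to keep a prefix narrow in this model.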

After you save the source, BoundBot opens the source detail page and starts the first crawl automatically. While the source is running, the page updates live with discovered links, crawl progress, failures, and stored size.

## Manage a source

Each source card shows:

* current status
* number of discovered links
* number of scraped links
* total stored size
* last crawl time
* last error, if one occurred

Open a source to inspect every URL, filter by status, search links, and run background actions:

* **Fetch & Crawl** to rediscover links and crawl pending pages in one run
* **Fetch Links** to start the same background discovery-and-crawl workflow as **Fetch & Crawl**, from either the source card or the detail page
* **Crawl Pending** to scrape only links that are still pending
* **Retry Failed** to retry links that failed on the last pass
* **Retrain Agent** to mark every discovered link for recrawl and scrape the whole source again
* **Crawl Selected** to crawl only the links you choose from the table
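One way to picture which links each crawl action touches is as a selection over the link table. The sketch below is only a mental model: the status names (`pending`, `failed`, `scraped`), the action keys, and `linksForAction` are all hypothetical. **Fetch & Crawl** and **Fetch Links** are left out because they also rediscover links, which this model does not cover.

```javascript
// Hypothetical model: which links each action would target.
// Status names and action keys are assumptions, not BoundBot identifiers.
function linksForAction(action, links, selectedUrls = []) {
  switch (action) {
    case 'crawlPending':
      return links.filter((link) => link.status === 'pending');
    case 'retryFailed':
      return links.filter((link) => link.status === 'failed');
    case 'retrainAgent':
      return links.slice(); // every discovered link is marked for recrawl
    case 'crawlSelected':
      return links.filter((link) => selectedUrls.includes(link.url));
    default:
      throw new Error(`Unknown action: ${action}`);
  }
}

const links = [
  { url: '/pricing', status: 'scraped' },
  { url: '/blog/launch', status: 'pending' },
  { url: '/contact', status: 'failed' },
];

console.log(linksForAction('crawlPending', links).map((link) => link.url)); // [ '/blog/launch' ]
console.log(linksForAction('retryFailed', links).map((link) => link.url));  // [ '/contact' ]
console.log(linksForAction('retrainAgent', links).length);                  // 3
```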

## Watch your limits

The top summary shows:

* website source count
* total links
* stored data size
* crawl credit cost per page

The limits behind these numbers vary by plan. If you hit a limit, BoundBot prompts you to upgrade before you add more sources. If a crawl runs out of credits partway through, the remaining links stay pending so you can resume later.
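The out-of-credits behavior can be pictured as a crawl loop with a budget: pages are scraped while credits last, and everything after that keeps its pending status. This is a minimal sketch under assumptions; `crawlWithBudget`, the status names, and the flat per-page cost are hypothetical and not taken from BoundBot.

```javascript
// Hypothetical model: each page costs `creditsPerPage` credits.
// Links that cannot be paid for are left untouched, so they remain 'pending'.
function crawlWithBudget(links, credits, creditsPerPage) {
  let remaining = credits;
  const updated = links.map((link) => {
    if (link.status !== 'pending') return link; // only pending links are crawled
    if (remaining < creditsPerPage) return link; // out of credits: stays pending
    remaining -= creditsPerPage;
    return { ...link, status: 'scraped' };
  });
  return { links: updated, remainingCredits: remaining };
}

const firstRun = crawlWithBudget(
  [
    { url: '/a', status: 'pending' },
    { url: '/b', status: 'pending' },
    { url: '/c', status: 'pending' },
  ],
  5, // credits available
  2, // credits per page
);

console.log(firstRun.links.map((link) => link.status)); // [ 'scraped', 'scraped', 'pending' ]
console.log(firstRun.remainingCredits);                 // 1
```

In this model, running the same function again with a refreshed budget picks up only the links that were left pending, which mirrors resuming a crawl later.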

## Best practices

* Start with the smallest useful section of your site.
* Exclude duplicate or low-value pages such as admin routes, login pages, and legal archives if they do not help customers.
* Use **Fetch & Crawl** after major navigation or sitemap changes.
* Use **Retrain Agent** when existing pages have changed heavily and you want the full source scraped again.
* Review errors early so broken pages do not quietly degrade your answers.

<Warning>
  Crawling consumes credits. If your site is large, use path filters and max-link limits instead of crawling the whole domain on day one.
</Warning>

## Related pages

<CardGroup cols={2}>
  <Card title="Knowledge base" icon="book-open-text" href="/guides/knowledge-base">
    See how website sources fit with FAQs, files, products, and MCP tools.
  </Card>

  <Card title="Files" icon="files" href="/guides/files">
    Use uploaded documents when the source of truth does not live on a public site.
  </Card>

  <Card title="Products" icon="shopping-bag" href="/guides/products">
    Keep structured catalog data separate from crawled website content.
  </Card>

  <Card title="Plans and limits" icon="badge-dollar-sign" href="/reference/plans-and-limits">
    Review website source, crawl, and storage limits by tier.
  </Card>
</CardGroup>
