# Domain-Expert Chatbot — Build Kit

A starter kit for building your own domain-specific AI chatbot **with [Claude Code](https://www.anthropic.com/claude-code)**: sharp in exactly one niche, politely useless outside it, answers that cite real sources, and cheap enough to leave running. The architecture is host-neutral — it runs on any serverless stack (Cloudflare Workers, AWS, Google Cloud).

Companion to the write-up: <https://gregwilson.tech/building-domain-specific-ai-chatbots>

Use it however you like. Fill in your domain and go.

---

## How to use this kit

1. Make an empty git repo and open Claude Code in it.
2. Save the **CLAUDE.md template** (below) into the repo as `CLAUDE.md`.
3. Paste the **Starter prompt** (below) into Claude Code.
4. Answer its three questions: your **domain**, your **trusted sources**, and your **local knowledge source**.
5. Work through the **five phases** in order, verifying each before moving on.

(Or just point Claude Code at the URL of this file and say: "Build me this. My domain is ___.")

---

## The architecture

A domain-expert chatbot is a handful of generic capabilities:

```
Browser ──WebSocket──▶ Server ──▶ Session object (one per visitor)
                                     │
                                     ├─ 1. cached answer?    (zero-cost fast path)
                                     ├─ 2. rate-limited?     (per-session budget)
                                     ├─ 3. cheap-model gate  (in scope? needs the web?)
                                     └─ 4. strong model + tools (local search / web)
                                     │
Browser ◀──streamed text + citations┘
```

The patterns that make it work:

- **Two-model routing.** A cheap, fast model classifies *every* message before the expensive model runs: is this in scope, and does it need live web data? Off-topic questions get rejected for ~100× less than it costs to answer them; questions the strong model already knows are answered with no web search at all. Attach the web-search tool *only* when the gate says the answer needs fresh data.
- **Curated web search, not the open web.** Give the search tool a hard allowlist of trusted domains for your field. The open web is a hallucination delivery mechanism; a curated list is your editorial judgment, encoded.
- **A local knowledge base in full-text search.** For the stable facts of your domain, copy the authoritative material into a local SQLite FTS5 (BM25) index and give the model a search tool over it. A local lookup is free; a web search is metered — so most answers should never touch the web. (Keyword full-text search is plenty here; you usually don't need embeddings or a vector database.)
- **Guardrails + cost controls.** A pre-computed answer cache for your suggested questions (zero model cost), per-session and per-IP rate limits, a bot challenge, prompt caching on the system prompt, history pruning (drop old tool results before re-sending), and a bounded tool loop.

### Where to host it (host-neutral)

| Capability | AWS | Google Cloud | Cloudflare |
| --- | --- | --- | --- |
| Compute + streaming/WebSocket | API Gateway WebSocket + Lambda, or Fargate / App Runner | Cloud Run | Workers |
| Per-session state | DynamoDB, keyed by session | Firestore (or Memorystore) | Durable Objects |
| SQL + full-text search | Aurora Serverless v2 (Postgres `tsvector`), or OpenSearch for BM25 | Cloud SQL / AlloyDB (Postgres) | D1 (SQLite + FTS5) |
| Static assets + CDN | S3 + CloudFront | Cloud Storage + Cloud CDN | Static Assets |
| Bot challenge | AWS WAF CAPTCHA | reCAPTCHA Enterprise | Turnstile |
| Scheduled cache refresh | EventBridge Scheduler → Lambda | Cloud Scheduler → Cloud Run | Cron Triggers |

Claude is reachable from all three — the Anthropic API directly, or via Amazon Bedrock or Google Vertex AI — so "which cloud" and "which model" stay independent choices.

---

## Starter prompt (paste into Claude Code)

```
You're helping me build a domain-expert AI chatbot. Goal: it is a sharp expert in
ONE domain, politely declines everything else, answers with cited sources, and is
cheap to run. Use Claude — a small/fast model for routing, a strong model for answers.

Build this architecture on [STACK — e.g. Cloudflare Workers / AWS Lambda / Cloud Run]:

- A streaming chat UI with per-session state (history + a rate-limit counter).
- TWO-MODEL ROUTING: a cheap model classifies every message first — (a) in scope?
  (b) does it need live web data? — and only then does the strong model answer.
  Attach the web-search tool ONLY when the classifier says the answer needs fresh data.
- CURATED WEB SEARCH: restrict the search tool to a hard allowlist of trusted domains.
- A LOCAL KNOWLEDGE BASE: ingest my source material into a SQLite FTS5 (BM25) full-text
  index and give the strong model a search tool over it. Prefer it over web search.
- GUARDRAILS: a pre-computed answer cache for suggested questions, per-session and
  per-IP rate limits, a bot challenge, prompt caching on the system prompt, history
  pruning, and a bounded tool loop (max ~6 steps).

System-prompt rules for the strong model:
- Stay strictly in domain. Refuse to write code, draft documents, or act as a general
  assistant — even when the request is dressed up in domain terms.
- NEVER narrate a tool call ("let me look that up…"). Tool calls are invisible; a turn
  that ends on a promise instead of an answer stops generation and strands the user.
  Always end on a real answer.
- Have a voice and opinions, but flag opinion as opinion and keep facts grounded in
  tool results, not memory.

Before you write any code, ask me three things:
  1. My DOMAIN (what is this bot an expert in?).
  2. 10–20 TRUSTED SOURCE DOMAINS for the web allowlist.
  3. My LOCAL KNOWLEDGE SOURCE (a dataset, a document corpus, a book, an API export).
Then scaffold the project and we'll build it up in phases.
```

---

## CLAUDE.md template

Save this as `CLAUDE.md` in your repo so the architecture and guardrails stay in Claude Code's context across the whole build:

```
# CLAUDE.md — [Project name]

A domain-expert AI chatbot. Expert in [DOMAIN]; declines everything else.

## Architecture
- Streaming chat UI + serverless backend + one stateful object per session.
- Two-model routing: a cheap model gates every message (in scope? needs web?) before
  the strong model answers. The web-search tool is attached only when fresh data is needed.
- Curated web allowlist. Local SQLite FTS5 knowledge base (built by an out-of-band ingest script).
- Guardrails: answer cache, per-session + per-IP rate limits, bot challenge, prompt
  caching, history pruning, bounded tool loop.

## Non-negotiables
- The bot answers [DOMAIN] questions only. It does NOT write code, draft documents, or
  act as a general assistant — refuse those even when domain-framed.
- The model must never narrate a tool call or end a turn on a promise to continue.
- Every verifiable fact comes from a tool result (local search or web), never from memory.

## Definition of done (each change)
- Streams cleanly; citations render; off-topic questions are refused cheaply.
- No new uncited factual claims. Per-question cost is logged. Rate limits intact.

## Cost discipline
- Prefer the local knowledge base over web search. Keep web search behind the gate.
- Cache suggested-question answers. Don't re-send stale tool output on follow-ups.
```

---

## The five phases

Work these in order. Verify each before starting the next.

### 1. Scaffold the chat shell
**Goal:** a streaming chat UI, a serverless backend, and per-session state — nothing domain-specific yet.
**Prompt:** "Scaffold the chat shell from CLAUDE.md: a streaming UI, a serverless backend, and one stateful object per session holding history. No domain logic yet."
**Verify:** you can send a message and watch tokens stream back; reloading keeps the session.

### 2. Add the two-model gate
**Goal:** the cheap model classifies every message before the strong model runs.
**Prompt:** "Add a routing step: a cheap model classifies each message as OFF / KNOWN / FRESH. Only KNOWN or FRESH reach the strong model; attach web search only for FRESH."
**Verify:** off-topic questions are refused cheaply; in-domain questions get answered; the logs show the gate ran first.
**Watch for:** the model ending a turn with "let me look that up…" and then stopping. Add the never-narrate-a-lookup rule to the system prompt.

### 3. Build the knowledge base
**Goal:** a local full-text index of your domain's stable facts, plus the curated allowlist.
**Prompt:** "Write an out-of-band ingest script that loads [my source] into a SQLite FTS5 table, and give the strong model a search tool over it. Restrict web search to this allowlist: [...]."
**Verify:** ask something only your source would know; confirm the answer cites the local source and didn't pay for a web search.
**Watch for:** sources that block the search crawler — add a retry that drops blocked domains and remembers them. Big datasets — build indexes *before* bulk-loading, not after.

### 4. Guardrails and cost controls
**Goal:** make it safe to leave running.
**Prompt:** "Add a pre-computed answer cache for the suggested questions, per-session and per-IP rate limits, a bot challenge on the first request, prompt caching on the system prompt, and history pruning before each model call."
**Verify:** suggested questions return instantly at zero model cost; rapid requests get throttled; the system-prompt prefix is being cached.

### 5. Verify and harden
**Goal:** the part that's still on you.
**Do:** read the diffs, throw ugly and abusive inputs at the gate, watch the per-question cost, and confirm every factual claim is backed by a tool result. Then run a top-to-bottom audit — accessibility, SEO, performance, Open Graph, favicons — and have the agent fix what it finds; it can diagnose Lighthouse/PageSpeed problems straight from a screenshot. Claude Code writes faster than you can read — knowing what "correct" and "cheap" look like is your job.

---

## Lessons that save money and pain

- **Gate before you generate.** Rejecting junk with a cheap model costs ~100× less than answering it with the expensive one. This single split is most of your cost control.
- **Never narrate a lookup.** A turn that ends on "let me check…" with no tool call stops generation and strands the user. Make the model end every turn on a real answer.
- **Curate the web; don't trust it.** A hard allowlist of trusted domains beats the open web on both accuracy and cost.
- **Local beats metered.** Copy stable reference material into a local full-text index; reserve paid web search for what genuinely changes.
- **Keyword search is usually enough.** For one book or a few hundred pages, SQLite FTS5 + BM25 beats standing up a vector database. Reach for embeddings only when you must.
- **Prune the history.** Drop old tool results and citations before re-sending the conversation; you don't need to pay to re-read them every turn.
- **You're the architect.** The agent writes the code; you decide the architecture, curate the sources, and judge whether the output is right. That's the job that isn't automated yet.

---

*Built this into something? I'd love to hear about it. — Greg Wilson, <https://gregwilson.tech>*