# Flowchart: knowledge-corpus-builder

## Sources Consulted

- `src/services/worker/knowledge/CorpusBuilder.ts:1-174`
- `src/services/worker/knowledge/KnowledgeAgent.ts:1-284`
- `src/services/worker/knowledge/CorpusRenderer.ts:1-133`
- `src/services/worker/knowledge/CorpusStore.ts:1-127`
- `src/services/worker/http/routes/CorpusRoutes.ts:1-284`
- `src/services/worker/search/SearchOrchestrator.ts:1-80`
- `src/services/worker/search/ResultFormatter.ts:1-100`
- `src/services/context/formatters/AgentFormatter.ts:1-100`

## Happy Path Description

**Build:** `POST /api/corpus` → `handleBuildCorpus` → `CorpusBuilder.build()` maps filters to `SearchOrchestrator.search()` → extract IDs → `SessionStore.getObservationsByIds()` hydrates full records → map to `CorpusObservation` → compute stats (type breakdown, date range) → `CorpusRenderer.generateSystemPrompt()` → `CorpusRenderer.renderCorpus()` produces full-detail markdown (used for the token estimate) → persist to `~/.claude-mem/corpora/{name}.corpus.json` via `CorpusStore.write`.

**Prime:** `POST /api/corpus/:name/prime` → `KnowledgeAgent.prime()` → render full corpus text + system prompt → pass to Claude Agent SDK `query()` → capture `session_id` → persist it in the corpus JSON file.

**Query:** `POST /api/corpus/:name/query` → `KnowledgeAgent.query()` resumes the SDK session by id; the agent answers from the corpus context and auto-reprimes if the session has expired.

## Mermaid Flowchart

```mermaid
flowchart TD
    A["POST /api/corpus<br/>CorpusRoutes.ts:43"] --> B["handleBuildCorpus"]
    B --> C["CorpusBuilder.build<br/>CorpusBuilder.ts:50"]
    C --> D["SearchOrchestrator.search<br/>CorpusBuilder.ts:64"]
    D --> E["SessionStore.getObservationsByIds<br/>CorpusBuilder.ts:82"]
    E --> F["mapObservationToCorpus<br/>CorpusBuilder.ts:126"]
    F --> G["calculateStats<br/>CorpusBuilder.ts:146"]
    G --> H["CorpusRenderer.generateSystemPrompt<br/>CorpusBuilder.ts:109"]
    H --> I["CorpusRenderer.renderCorpus (estimate tokens)<br/>CorpusBuilder.ts:112"]
    I --> J["CorpusStore.write<br/>CorpusBuilder.ts:116"]
    J --> K[("~/.claude-mem/corpora/{name}.corpus.json<br/>CorpusStore.ts:14")]

    L1["GET /api/corpus/:name"] --> L3["CorpusStore.read<br/>CorpusStore.ts:39"]
    L3 --> K

    M["POST /api/corpus/:name/prime<br/>CorpusRoutes.ts:213"] --> N["KnowledgeAgent.prime<br/>KnowledgeAgent.ts:58"]
    N --> P["CorpusRenderer.renderCorpus<br/>CorpusRenderer.ts:14"]
    P --> Q["Claude Agent SDK query<br/>KnowledgeAgent.ts:75"]
    Q --> R["session_id captured<br/>KnowledgeAgent.ts:89"]
    R --> S["CorpusStore.write update session_id<br/>KnowledgeAgent.ts:114"]

    T["POST /api/corpus/:name/query<br/>CorpusRoutes.ts:235"] --> V["KnowledgeAgent.query<br/>KnowledgeAgent.ts:125"]
    V --> W["Agent SDK resume session_id<br/>KnowledgeAgent.ts:190-200"]
    W --> X{Session expired?}
    X -->|Yes| Y["auto-reprime<br/>KnowledgeAgent.ts:148"]
    X -->|No| Z["Return answer"]

    AA["POST /api/corpus/:name/rebuild"] --> C
    AB["POST /api/corpus/:name/reprime"] --> N
    AC["DELETE /api/corpus/:name"] --> AD["CorpusStore.delete<br/>CorpusStore.ts:94"]
```

## Side Effects

- Writes `{name}.corpus.json` in `~/.claude-mem/corpora/`.
- Spawns a Claude Agent SDK subprocess for prime/query.
- Creates `OBSERVER_SESSIONS_DIR` if absent.
- Applies environment isolation via `buildIsolatedEnv`.

## External Feature Dependencies

**Calls into:** SearchOrchestrator (strategy routing), SessionStore (hydration), Anthropic Claude Agent SDK, SettingsDefaultsManager, ChromaSync (indirectly, through hybrid search).

**Called by:** CorpusRoutes HTTP endpoints; knowledge-agent skill (external).

## Potential Duplication Noted

**CorpusRenderer vs ResultFormatter vs AgentFormatter:** all three produce markdown from observations.

| Renderer | Audience | Density | Grouping |
|---|---|---|---|
| ResultFormatter | CLI search results | Compact table rows | Date/file |
| AgentFormatter | Session context injection | Compact per-line | Day timeline |
| CorpusRenderer | Agent priming corpus | Full detail, narrative-first | List or chronological |

**No direct code reuse**, but all three independently iterate over observations and format markdown. Consolidating on a shared rendering interface (base class or strategy) could reduce surface area if the output configurations overlap.

**Search logic NOT duplicated:** CorpusBuilder correctly delegates to SearchOrchestrator.

## Confidence + Gaps

**High:** Build → prime → query flow; 8 HTTP endpoints; session reprime on expiration.

**Gaps:** Exact "session expired" detection (regex match at `KnowledgeAgent.ts:179`); token heuristic (chars/4 at `CorpusRenderer.ts:91`); no quota enforcement for corpus count/size.
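
The two gaps above (the chars/4 token heuristic and regex-based expiry detection with auto-reprime) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the `AgentLike` interface, the rounding choice, and especially the regular expression, since the actual pattern at `KnowledgeAgent.ts:179` is not reproduced in this document.

```typescript
// Hypothetical sketch; none of these names are taken from the codebase.

/** Rough token estimate: about four characters per token (rounding up is an assumption). */
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

/** Assumed error pattern; the real expression may differ. */
const SESSION_EXPIRED_PATTERN = /session.*(expired|not found)/i;

/** Classifies an error as "session expired" by matching its message. */
function isSessionExpired(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return SESSION_EXPIRED_PATTERN.test(message);
}

/** Stand-in for the agent: `ask` resumes a session, `reprime` returns a fresh session id. */
interface AgentLike {
  ask(sessionId: string, question: string): Promise<string>;
  reprime(): Promise<string>;
}

/** Try the stored session first; on an expiry error, reprime once and retry. */
async function queryWithReprime(
  agent: AgentLike,
  sessionId: string,
  question: string,
): Promise<string> {
  try {
    return await agent.ask(sessionId, question);
  } catch (error) {
    if (!isSessionExpired(error)) throw error; // unrelated failures propagate unchanged
    const freshSessionId = await agent.reprime();
    return agent.ask(freshSessionId, question);
  }
}
```

Matching on error text is brittle: if the SDK rewords its message, the expiry branch silently stops firing, which is one reason this detection is listed as a gap.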
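
The shared rendering interface suggested under Potential Duplication Noted could take the form of a small strategy interface. The sketch below is only one possible shape: `Observation`, `ObservationRenderer`, and both renderers are hypothetical and do not mirror the existing `ResultFormatter`, `AgentFormatter`, or `CorpusRenderer` signatures.

```typescript
// Hypothetical shapes for illustration only.
interface Observation {
  id: number;
  type: string;
  title: string;
  narrative: string;
  createdAt: string; // ISO 8601 timestamp
}

/** Strategy interface: each output style implements the same method. */
interface ObservationRenderer {
  render(observations: Observation[]): string;
}

/** Compact style: one markdown bullet per observation. */
const compactRenderer: ObservationRenderer = {
  render: (observations) =>
    observations
      .map((o) => `- ${o.createdAt.slice(0, 10)} [${o.type}] ${o.title}`)
      .join("\n"),
};

/** Full-detail style: a heading, a metadata line, then the narrative. */
const fullDetailRenderer: ObservationRenderer = {
  render: (observations) =>
    observations
      .map((o) => `## ${o.title}\n\n*${o.type}, ${o.createdAt.slice(0, 10)}*\n\n${o.narrative}`)
      .join("\n\n"),
};

/** Callers choose a strategy; the iteration lives inside each renderer. */
function renderWith(renderer: ObservationRenderer, observations: Observation[]): string {
  return renderer.render(observations);
}
```

Whether this pays off depends on how much the three formatters' grouping and density options actually overlap, which this document does not establish.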