feat: Knowledge Agents — queryable corpora from claude-mem (#1653)

* feat: add knowledge agent types, store, builder, and renderer

Phase 1 of Knowledge Agents feature. Introduces corpus compilation
pipeline that filters observations from the database into portable
corpus files stored at ~/.claude-mem/corpora/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add corpus CRUD HTTP endpoints and wire into worker service

Phase 2 of Knowledge Agents. Adds CorpusRoutes with 5 endpoints
(build, list, get, delete, rebuild) and registers them during
worker background initialization alongside SearchRoutes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add KnowledgeAgent with V1 SDK prime/query/reprime

Phase 3 of Knowledge Agents. Uses Agent SDK V1 query() with
resume and disallowedTools for Q&A-only knowledge sessions.
Auto-reprimes on session expiry. Adds prime, query, and reprime
HTTP endpoints to CorpusRoutes.
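
The auto-reprime behavior described above can be sketched roughly as follows. This is only an illustration under assumptions, not the KnowledgeAgent implementation: `queryWithAutoReprime`, `RunQuery`, `Prime`, and the `isResumeError` predicate are hypothetical names, and how a real session-expiry error is detected is left to the caller.

```typescript
// Hypothetical shapes; the real KnowledgeAgent and SDK types may differ.
interface QueryResult {
  answer: string;
  session_id: string;
}
type RunQuery = (sessionId: string, question: string) => Promise<QueryResult>;
type Prime = () => Promise<string>;

// Try the stored session first. If (and only if) the failure looks like a
// session resume error, prime a fresh session and retry the question once.
async function queryWithAutoReprime(
  sessionId: string,
  question: string,
  runQuery: RunQuery,
  prime: Prime,
  isResumeError: (error: unknown) => boolean
): Promise<QueryResult> {
  try {
    return await runQuery(sessionId, question);
  } catch (error) {
    if (!isResumeError(error)) throw error; // unrelated failures propagate
    const freshSessionId = await prime();
    return runQuery(freshSessionId, question);
  }
}
```

Passing the predicate in keeps the retry policy explicit, so a failure unrelated to session resume is never masked by a silent reprime.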

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add MCP tools and skill for knowledge agents

Phase 4 of Knowledge Agents. Adds build_corpus, list_corpora,
prime_corpus, and query_corpus MCP tools delegating to worker
HTTP endpoints. Includes /knowledge-agent skill with workflow docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle SDK process exit in KnowledgeAgent, add e2e test

The Agent SDK may throw after yielding all messages when the
Claude process exits with a non-zero code. Now tolerates this
if session_id/answer were already captured. Adds comprehensive
e2e test script (31 assertions) orchestrated via tmux-cli.
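
A minimal sketch of the tolerance described above, assuming a simplified message shape: `SdkMessage` and `collectAnswer` are hypothetical stand-ins, not the Agent SDK's actual types. The idea is to swallow a late throw only when both the session ID and the answer were already captured.

```typescript
// Hypothetical, simplified message shape for illustration only.
interface SdkMessage {
  session_id?: string;
  text?: string;
}

async function collectAnswer(
  stream: AsyncIterable<SdkMessage>
): Promise<{ session_id: string; answer: string }> {
  let sessionId: string | undefined;
  let answer = "";
  try {
    for await (const message of stream) {
      if (message.session_id) sessionId = message.session_id;
      if (message.text) answer += message.text;
    }
  } catch (error) {
    // The stream may throw after yielding everything (e.g. non-zero exit).
    // Tolerate that only if the data we need was already captured.
    if (!sessionId || !answer) throw error;
  }
  if (!sessionId || !answer) {
    throw new Error("Stream ended without a session_id and an answer");
  }
  return { session_id: sessionId, answer };
}
```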

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use settings model ID instead of hardcoded model in KnowledgeAgent

Reads CLAUDE_MEM_MODEL from user settings via getModelId(), matching
the existing SDKAgent pattern. No more hardcoded model assumptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: improve knowledge agents developer experience

Add public documentation page, rebuild/reprime MCP tools, and actionable
error messages. DX review scored knowledge agents 4/10 — core engineering
works (31/31 e2e) but the feature was invisible. This addresses
discoverability (docs, cross-links), API completeness (missing MCP tools),
and error quality (fix/example fields in error responses).
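
As an illustration of the fix/example error shape, the sketch below models the 404 body that the corpus routes return. The `ActionableError` interface and `corpusNotFound` helper are hypothetical names introduced here; the field names and message strings mirror the route handlers in this commit.

```typescript
// Hypothetical type describing the actionable error bodies used by the routes.
interface ActionableError {
  error: string;                       // what went wrong
  fix: string;                         // what the caller can do about it
  example?: Record<string, unknown>;   // a valid request body, when useful
  available?: string[];                // known corpus names, for 404s
}

// Builds the "corpus not found" body in one place.
function corpusNotFound(name: string, available: string[]): ActionableError {
  return {
    error: `Corpus "${name}" not found`,
    fix: "Check the corpus name or build a new one",
    available,
  };
}
```

A shared helper along these lines could also replace the 404 block that each handler currently repeats.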

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add quick start guide to knowledge agents page

Covers the three main use cases upfront: creating an agent, asking a
single question, and starting a fresh conversation with reprime. Includes
a keeping-it-current section for the rebuild + reprime workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address code review issues — path traversal, session safety, prompt injection

- Block path traversal in CorpusStore with alphanumeric name validation and resolved path check
- Harden system prompt against instruction injection from untrusted corpus content
- Validate question field as non-empty string in query endpoint
- Only persist session_id after successful prime (not null on failure)
- Persist refreshed session_id after query execution
- Only auto-reprime on session resume errors, not all query failures
- Add fenced code block language tags to SKILL.md
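
The path traversal fix in the first bullet can be sketched as below. This is an assumption-laden illustration rather than the CorpusStore code: `resolveCorpusPath`, the exact name pattern (hyphens and underscores are allowed here because the example name `my-corpus` uses one), and the `.corpus.json` extension are all hypothetical.

```typescript
import * as path from "node:path";

// Hypothetical allow-list: letters, digits, hyphens, and underscores only.
const CORPUS_NAME_PATTERN = /^[A-Za-z0-9_-]+$/;

function resolveCorpusPath(corporaDir: string, name: string): string {
  // First line of defense: reject anything outside the allow-list,
  // which rules out "..", "/", and "\" in the name.
  if (!CORPUS_NAME_PATTERN.test(name)) {
    throw new Error(`Invalid corpus name: ${JSON.stringify(name)}`);
  }
  const baseDir = path.resolve(corporaDir);
  // The file extension is an assumption made for this sketch.
  const resolved = path.resolve(baseDir, `${name}.corpus.json`);
  // Second line of defense: the resolved path must stay inside baseDir.
  if (!resolved.startsWith(baseDir + path.sep)) {
    throw new Error(`Corpus path escapes the corpora directory: ${resolved}`);
  }
  return resolved;
}
```

The two checks overlap on purpose: the resolved path check still holds if the name pattern is loosened later.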

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining code review issues — e2e robustness, MCP validation, docs

- Harden e2e curl wrappers with connect-timeout, fallback to HTTP 000 on transport failure
- Use curl_post wrapper consistently for all long-running POST calls
- Add runtime name validation to all corpus MCP tool handlers
- Fix docs: soften hallucination guarantee to probabilistic claim
- Fix architecture diagram: add missing rebuild_corpus and reprime_corpus tools

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: enforce string[] type in safeParseJsonArray for corpus data integrity
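
One way to enforce `string[]` when parsing stored JSON is sketched below. The function body is an assumption: this version drops non-string elements and returns an empty array on malformed input, whereas the actual `safeParseJsonArray` may handle those cases differently.

```typescript
// Sketch only: parse a JSON string and guarantee a string[] result.
function safeParseJsonArray(raw: string | null | undefined): string[] {
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return [];
    // Keep only string elements so callers can rely on the declared type.
    return parsed.filter((item): item is string => typeof item === "string");
  } catch {
    // Malformed JSON is treated as "no values" rather than an error.
    return [];
  }
}
```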

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add blank line before fenced code blocks in SKILL.md maintenance section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Alex Newman
2026-04-08 17:30:20 -07:00
committed by GitHub
parent 07be61cf91
commit c648d5d8d2
17 changed files with 2011 additions and 268 deletions
@@ -0,0 +1,218 @@
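/**
 * Corpus Routes
 *
 * Handles knowledge agent corpus operations: CRUD (build, list, get, delete, rebuild)
 * plus session operations (prime, query, reprime).
 * CRUD endpoints delegate to CorpusStore (file I/O) and CorpusBuilder (search + hydrate);
 * session endpoints delegate to KnowledgeAgent.
 */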
import express, { Request, Response } from 'express';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { CorpusStore } from '../../knowledge/CorpusStore.js';
import { CorpusBuilder } from '../../knowledge/CorpusBuilder.js';
import { KnowledgeAgent } from '../../knowledge/KnowledgeAgent.js';
import type { CorpusFilter } from '../../knowledge/types.js';

export class CorpusRoutes extends BaseRouteHandler {
  constructor(
    private corpusStore: CorpusStore,
    private corpusBuilder: CorpusBuilder,
    private knowledgeAgent: KnowledgeAgent
  ) {
    super();
  }

  setupRoutes(app: express.Application): void {
    app.post('/api/corpus', this.handleBuildCorpus.bind(this));
    app.get('/api/corpus', this.handleListCorpora.bind(this));
    app.get('/api/corpus/:name', this.handleGetCorpus.bind(this));
    app.delete('/api/corpus/:name', this.handleDeleteCorpus.bind(this));
    app.post('/api/corpus/:name/rebuild', this.handleRebuildCorpus.bind(this));
    app.post('/api/corpus/:name/prime', this.handlePrimeCorpus.bind(this));
    app.post('/api/corpus/:name/query', this.handleQueryCorpus.bind(this));
    app.post('/api/corpus/:name/reprime', this.handleReprimeCorpus.bind(this));
  }

  /**
   * Build a new corpus from matching observations
   * POST /api/corpus
   * Body: { name, description?, project?, types?, concepts?, files?, query?, date_start?, date_end?, limit? }
   */
  private handleBuildCorpus = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    if (!req.body.name) {
      res.status(400).json({
        error: 'Missing required field: name',
        fix: 'Add a "name" field to your request body',
        example: { name: 'my-corpus', query: 'hooks', limit: 100 }
      });
      return;
    }

    const { name, description, project, types, concepts, files, query, date_start, date_end, limit } = req.body;

    // Copy only the filter fields the caller actually supplied
    const filter: CorpusFilter = {};
    if (project) filter.project = project;
    if (types) filter.types = types;
    if (concepts) filter.concepts = concepts;
    if (files) filter.files = files;
    if (query) filter.query = query;
    if (date_start) filter.date_start = date_start;
    if (date_end) filter.date_end = date_end;
    if (limit) filter.limit = limit;

    const corpus = await this.corpusBuilder.build(name, description || '', filter);

    // Return stats without the full observations array
    const { observations, ...metadata } = corpus;
    res.json(metadata);
  });

  /**
   * List all corpora with stats
   * GET /api/corpus
   */
  private handleListCorpora = this.wrapHandler((_req: Request, res: Response): void => {
    const corpora = this.corpusStore.list();
    res.json(corpora);
  });

  /**
   * Get corpus metadata (without observations)
   * GET /api/corpus/:name
   */
  private handleGetCorpus = this.wrapHandler((req: Request, res: Response): void => {
    const { name } = req.params;
    const corpus = this.corpusStore.read(name);

    if (!corpus) {
      res.status(404).json({
        error: `Corpus "${name}" not found`,
        fix: 'Check the corpus name or build a new one',
        available: this.corpusStore.list().map(c => c.name)
      });
      return;
    }

    // Return metadata without the full observations array
    const { observations, ...metadata } = corpus;
    res.json(metadata);
  });

  /**
   * Delete a corpus
   * DELETE /api/corpus/:name
   */
  private handleDeleteCorpus = this.wrapHandler((req: Request, res: Response): void => {
    const { name } = req.params;
    const existed = this.corpusStore.delete(name);

    if (!existed) {
      res.status(404).json({
        error: `Corpus "${name}" not found`,
        fix: 'Check the corpus name or build a new one',
        available: this.corpusStore.list().map(c => c.name)
      });
      return;
    }

    res.json({ success: true });
  });

  /**
   * Rebuild a corpus from its stored filter
   * POST /api/corpus/:name/rebuild
   */
  private handleRebuildCorpus = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const { name } = req.params;
    const existingCorpus = this.corpusStore.read(name);

    if (!existingCorpus) {
      res.status(404).json({
        error: `Corpus "${name}" not found`,
        fix: 'Check the corpus name or build a new one',
        available: this.corpusStore.list().map(c => c.name)
      });
      return;
    }

    const corpus = await this.corpusBuilder.build(name, existingCorpus.description, existingCorpus.filter);

    // Return stats without the full observations array
    const { observations, ...metadata } = corpus;
    res.json(metadata);
  });

  /**
   * Prime a corpus: load all observations into a new Agent SDK session
   * POST /api/corpus/:name/prime
   */
  private handlePrimeCorpus = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const { name } = req.params;
    const corpus = this.corpusStore.read(name);

    if (!corpus) {
      res.status(404).json({
        error: `Corpus "${name}" not found`,
        fix: 'Check the corpus name or build a new one',
        available: this.corpusStore.list().map(c => c.name)
      });
      return;
    }

    const sessionId = await this.knowledgeAgent.prime(corpus);
    res.json({ session_id: sessionId, name: corpus.name });
  });

  /**
   * Query a primed corpus: resume the SDK session with a question
   * POST /api/corpus/:name/query
   * Body: { question: string }
   */
  private handleQueryCorpus = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const { name } = req.params;

    // Reject missing, non-string, or whitespace-only questions
    if (!req.body.question || typeof req.body.question !== 'string' || req.body.question.trim().length === 0) {
      res.status(400).json({
        error: 'Missing required field: question',
        fix: 'Add a non-empty "question" string to your request body',
        example: { question: 'What architectural decisions were made about hooks?' }
      });
      return;
    }

    const corpus = this.corpusStore.read(name);

    if (!corpus) {
      res.status(404).json({
        error: `Corpus "${name}" not found`,
        fix: 'Check the corpus name or build a new one',
        available: this.corpusStore.list().map(c => c.name)
      });
      return;
    }

    const { question } = req.body;
    const result = await this.knowledgeAgent.query(corpus, question);
    res.json({ answer: result.answer, session_id: result.session_id });
  });

  /**
   * Reprime a corpus: create a fresh session, clearing prior Q&A context
   * POST /api/corpus/:name/reprime
   */
  private handleReprimeCorpus = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const { name } = req.params;
    const corpus = this.corpusStore.read(name);

    if (!corpus) {
      res.status(404).json({
        error: `Corpus "${name}" not found`,
        fix: 'Check the corpus name or build a new one',
        available: this.corpusStore.list().map(c => c.name)
      });
      return;
    }

    const sessionId = await this.knowledgeAgent.reprime(corpus);
    res.json({ session_id: sessionId, name: corpus.name });
  });
}