diff --git a/REFACTOR-PLAN.md b/REFACTOR-PLAN.md index 67bf3644..8cde3a54 100644 --- a/REFACTOR-PLAN.md +++ b/REFACTOR-PLAN.md @@ -80,8 +80,8 @@ It should NOT: PostToolUse hooks run in parallel. Handle this by: - Make SDK agent calls async and fire-and-forget -- Use a message queue (in-memory) to serialize SDK prompts -- SDK session can handle streaming prompts naturally +- Use the observation_queue SQLite table to serialize observations +- SDK worker polls this queue and processes observations sequentially ### What if the user interrupts Claude Code? @@ -271,17 +271,23 @@ async function runSDKWorker(sessionId: string) { const response = query({ prompt: messageGenerator(), options: { - model: 'claude-sonnet-4-5-20250929', - allowedTools: ['mcp__claude-mem__*'], // ChromaDB tools + model: 'claude-haiku-4-5-20251001', // 3x faster than Sonnet 4.5, quality of Sonnet 4.0-4.1 + allowedTools: [], // No tools needed - agent outputs XML that we parse maxTurns: 1000, cwd: session.cwd } }); - // Consume responses + // Consume responses and parse XML for observations/summaries for await (const msg of response) { - // SDK is storing observations to ChromaDB - // We just need to keep the stream alive + if (msg.type === 'text') { + // Use an XML parser library (e.g., fast-xml-parser or similar) to parse observations and summaries + // Parse <observation> blocks and call storeObservation(session_id, project, type, text) + // Parse <summary> blocks, extract all 8 fields, format and call storeSummary(session_id, project, text) + + parseAndStoreObservations(msg.content, session); + parseAndStoreSummary(msg.content, session); + } } } ``` @@ -408,7 +414,7 @@ YOUR ROLE You will observe tool executions during this Claude Code session. Your job is to: 1. Extract meaningful insights (not just raw data) -2. Store atomic observations in ChromaDB +2. Store atomic observations in SQLite 3.
Focus on: key decisions, patterns discovered, problems solved, technical insights WHAT TO CAPTURE @@ -423,26 +429,28 @@ WHAT TO CAPTURE ✗ NOT work-in-progress (only completed work) ✗ NOT obvious facts (e.g., "TypeScript file has types") -TOOLS AVAILABLE ---------------- -The claude-mem process has direct access to the SQLite database using bun:sqlite. +HOW TO STORE OBSERVATIONS +-------------------------- +When you identify something worth remembering, output your observation in this EXACT XML format: -To store observations, the SDK will call internal functions that execute: -\`\`\`typescript -db.run(\` - INSERT INTO observations (sdk_session_id, project, text, type, created_at, created_at_epoch) - VALUES (?, ?, ?, ?, datetime('now'), unixepoch()) -\`, [sessionId, project, observationText, type]); +\`\`\`xml +<observation> + <type>feature</type> + <text>Implemented JWT token refresh flow with 7-day expiry</text> +</observation> \`\`\` -Types: decision, bugfix, feature, refactor, discovery +Valid types: decision, bugfix, feature, refactor, discovery -Example observations: -- feature: "Implemented JWT token refresh flow with 7-day expiry" -- bugfix: "Fixed race condition in session middleware by adding mutex" -- decision: "Chose PostgreSQL over MongoDB for ACID guarantees" +Structure requirements: +- <observation> is the root element +- <type> must be one of the 5 valid types (single word) +- <text> contains your concise observation (one sentence preferred) +- No additional fields or nesting -You don't need to write SQL directly - the system provides these observations automatically. +The SDK worker will parse all <observation> blocks from your response using regex and store them in SQLite. + +You can include your reasoning before or after the observation block, or just output the observation by itself. EXAMPLE ------- @@ -471,7 +479,21 @@ ${JSON.stringify(JSON.parse(obs.tool_output), null, 2)} ANALYSIS TASK ------------- 1. Does this observation contain something worth remembering? -2. If YES: Store it as a clear, concise observation in ChromaDB +2.
If YES: Output the observation in this EXACT XML format: + + \`\`\`xml + <observation> + <type>feature</type> + <text>Your concise observation here</text> + </observation> + \`\`\` + + Requirements: + - Use one of these types: decision, bugfix, feature, refactor, discovery + - Keep text concise (one sentence preferred) + - No markdown formatting inside <text> + - No additional XML fields + 3. If NO: Just acknowledge and wait for next observation Remember: Quality over quantity. Only store meaningful insights.`; @@ -499,29 +521,40 @@ FINAL TASK - Files edited - Notes -3. The system will automatically store your summary using bun:sqlite when you provide it. +3. Generate the structured summary and output it in this EXACT XML format: -Just generate the structured summary text - the claude-mem process will handle storage using: -\`\`\`typescript -db.run(\` - INSERT INTO session_summaries (sdk_session_id, project, summary, created_at, created_at_epoch) - VALUES (?, ?, ?, datetime('now'), unixepoch()) -\`, [sessionId, project, summaryText]); +\`\`\`xml +<summary> + <request>Implement JWT authentication system</request> + <investigated>Existing auth middleware, session management, token storage patterns</investigated> + <learned>Current system uses session cookies; no JWT support; race condition in middleware</learned> + <completed>Implemented JWT token + refresh flow with 7-day expiry; fixed race condition with mutex; added token validation middleware</completed> + <next_steps>Add token revocation API endpoint; write integration tests</next_steps> + <files_read> + <file>src/auth.ts</file> + <file>src/middleware/session.ts</file> + <file>src/types/user.ts</file> + </files_read> + <files_edited> + <file>src/auth.ts</file> + <file>src/middleware/auth.ts</file> + <file>src/routes/auth.ts</file> + </files_edited> + <notes>Token secret stored in .env; refresh tokens use rotation strategy</notes> +</summary> \`\`\` -The summary should be suitable for showing the user in future sessions.
+Structure requirements: +- <summary> is the root element +- All 8 child elements are REQUIRED: request, investigated, learned, completed, next_steps, files_read, files_edited, notes +- <files_read> and <files_edited> must contain <file> child elements (one per file) +- If no files were read/edited, use empty tags: <files_read></files_read> +- Text fields can be multiple sentences but avoid markdown formatting +- Use underscores in element names: next_steps, files_read, files_edited -FORMAT EXAMPLE: -**Request:** Implement JWT authentication system -**Investigated:** Existing auth middleware, session management, token storage patterns -**Learned:** Current system uses session cookies; no JWT support; race condition in middleware -**Completed:** Implemented JWT token + refresh flow with 7-day expiry; fixed race condition with mutex; added token validation middleware -**Next Steps:** Add token revocation API endpoint; write integration tests -**Files Read:** src/auth.ts, src/middleware/session.ts, src/types/user.ts -**Files Edited:** src/auth.ts, src/middleware/auth.ts, src/routes/auth.ts -**Notes:** Token secret stored in .env; refresh tokens use rotation strategy +The SDK worker will parse the <summary> block and extract all fields to store in SQLite. -Generate and store the structured summary now.`; +Generate the summary now in the required XML format.`; } ``` @@ -565,11 +598,12 @@ export function saveHook(stdin: HookInput) { The `claude-mem save` hook just queues observations - processing happens in the background SDK worker process that polls the queue continuously. -This way: -- No background daemons needed -- Works on all platforms -- Self-healing (if worker crashes, next tool restarts it) -- Simple state management +The SDK worker is spawned by `claude-mem new` as a detached process and runs for the duration of the Claude Code session.
+ +Benefits: +- Works on all platforms (no systemd/launchd needed) +- Self-contained (spawned and managed by claude-mem itself) +- Simple state management (all state in SQLite) --- @@ -590,3 +624,17 @@ This way: - Graceful degradation (log error, continue) - Retry with exponential backoff - Don't block main Claude Code session + +--- + +## Implementation Order + +1. **Database setup** - Create tables and migration scripts +2. **Hook commands** - Implement the 4 hook commands (context, new, save, summary) +3. **SDK worker** - Implement the background worker process with response parsing +4. **SDK prompts** - Wire up the prompts and message generator +5. **Test end-to-end** - Run a real Claude Code session and verify it works + +Start simple. Get one hook working before moving to the next. Don't try to build everything at once. + +**Note:** MCP is only used for retrieval (when Claude Code needs to access stored memories), not for storage. The SDK agent stores data by outputting specially formatted text that the SDK worker parses and writes to SQLite. diff --git a/context/claude-code/models.md b/context/claude-code/models.md new file mode 100644 index 00000000..2dc7deae --- /dev/null +++ b/context/claude-code/models.md @@ -0,0 +1,218 @@ +# Models overview + +> Claude is a family of state-of-the-art large language models developed by Anthropic. This guide introduces our models and compares their performance with legacy models. 
+ +export const ModelId = ({children, style = {}}) => { + const copiedNotice = 'Copied!'; + const handleClick = e => { + const element = e.currentTarget; + const originalText = element.textContent; + navigator.clipboard.writeText(children).then(() => { + element.textContent = copiedNotice; + element.style.backgroundColor = '#d4edda'; + element.style.color = '#155724'; + element.style.borderColor = '#c3e6cb'; + setTimeout(() => { + element.textContent = originalText; + element.style.backgroundColor = '#f5f5f5'; + element.style.color = ''; + element.style.borderColor = 'transparent'; + }, 2000); + }).catch(error => { + console.error('Failed to copy:', error); + }); + }; + const handleMouseEnter = e => { + const element = e.currentTarget; + const tooltip = element.querySelector('.copy-tooltip'); + if (tooltip && element.textContent !== copiedNotice) { + tooltip.style.opacity = '1'; + } + element.style.backgroundColor = '#e8e8e8'; + element.style.borderColor = '#d0d0d0'; + }; + const handleMouseLeave = e => { + const element = e.currentTarget; + const tooltip = element.querySelector('.copy-tooltip'); + if (tooltip) { + tooltip.style.opacity = '0'; + } + if (element.textContent !== copiedNotice) { + element.style.backgroundColor = '#f5f5f5'; + element.style.borderColor = 'transparent'; + } + }; + const defaultStyle = { + cursor: 'pointer', + position: 'relative', + transition: 'all 0.2s ease', + display: 'inline-block', + userSelect: 'none', + backgroundColor: '#f5f5f5', + padding: '2px 4px', + borderRadius: '4px', + fontFamily: 'Monaco, Consolas, "Courier New", monospace', + fontSize: '0.9em', + border: '1px solid transparent', + ...style + }; + return + {children} + ; +}; + + + + Our best model for complex agents and coding + + * Text and image input + * Text output + * 200k context window (1M context beta available) + * Highest intelligence across most tasks + + + + Our fastest and most intelligent Haiku model + + * Text and image input + * Text output + * 200k 
context window
+ * Lightning-fast speed with extended thinking
+
+
+
+ Exceptional model for specialized complex tasks
+
+ * Text and image input
+ * Text output
+ * 200k context window
+ * Superior reasoning capabilities
+
+
+
+***
+
+## Model names
+
+| Model | Claude API | AWS Bedrock | GCP Vertex AI |
+| --- | --- | --- | --- |
+| Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | anthropic.claude-sonnet-4-5-20250929-v1:0 | claude-sonnet-4-5\@20250929 |
+| Claude Sonnet 4 | claude-sonnet-4-20250514 | anthropic.claude-sonnet-4-20250514-v1:0 | claude-sonnet-4\@20250514 |
+| Claude Sonnet 3.7 | claude-3-7-sonnet-20250219 (claude-3-7-sonnet-latest) | anthropic.claude-3-7-sonnet-20250219-v1:0 | claude-3-7-sonnet\@20250219 |
+| Claude Haiku 4.5 | claude-haiku-4-5-20251001 | anthropic.claude-haiku-4-5-20251001-v1:0 | claude-haiku-4-5\@20251001 |
+| Claude Haiku 3.5 | claude-3-5-haiku-20241022 (claude-3-5-haiku-latest) | anthropic.claude-3-5-haiku-20241022-v1:0 | claude-3-5-haiku\@20241022 |
+| Claude Haiku 3 | claude-3-haiku-20240307 | anthropic.claude-3-haiku-20240307-v1:0 | claude-3-haiku\@20240307 |
+| Claude Opus 4.1 | claude-opus-4-1-20250805 | anthropic.claude-opus-4-1-20250805-v1:0 | claude-opus-4-1\@20250805 |
+| Claude Opus 4 | claude-opus-4-20250514 | anthropic.claude-opus-4-20250514-v1:0 | claude-opus-4\@20250514 |
+
+Models with the same snapshot date (e.g., 20240620) are identical across all platforms and do not change. The snapshot date in the model name ensures consistency and allows developers to rely on stable performance across different environments.
+
+Starting with **Claude Sonnet 4.5 and all future models**, AWS Bedrock and Google Vertex AI offer two endpoint types: **global endpoints** (dynamic routing for maximum availability) and **regional endpoints** (guaranteed data routing through specific geographic regions). For more information, see the [third-party platform pricing section](/en/docs/about-claude/pricing#third-party-platform-pricing).
+
+### Model aliases
+
+For convenience during development and testing, we offer aliases for our model ids. These aliases automatically point to the most recent snapshot of a given model. When we release new model snapshots, we migrate aliases to point to the newest version of a model, typically within a week of the new release.
+
+
+ While aliases are useful for experimentation, we recommend using specific model versions (e.g., `claude-sonnet-4-5-20250929`) in production applications to ensure consistent behavior.
+
+
+| Model | Alias | Model ID |
+| --- | --- | --- |
+| Claude Sonnet 4.5 | claude-sonnet-4-5 | claude-sonnet-4-5-20250929 |
+| Claude Sonnet 4 | claude-sonnet-4-0 | claude-sonnet-4-20250514 |
+| Claude Sonnet 3.7 | claude-3-7-sonnet-latest | claude-3-7-sonnet-20250219 |
+| Claude Haiku 4.5 | claude-haiku-4-5 | claude-haiku-4-5-20251001 |
+| Claude Haiku 3.5 | claude-3-5-haiku-latest | claude-3-5-haiku-20241022 |
+| Claude Opus 4.1 | claude-opus-4-1 | claude-opus-4-1-20250805 |
+| Claude Opus 4 | claude-opus-4-0 | claude-opus-4-20250514 |
+
+
+ Aliases are subject to the same rate limits and pricing as the underlying model version they reference.
+
+### Model comparison table
+
+To help you choose the right model for your needs, we've compiled a table comparing the key features and capabilities of each model in the Claude family:
+
+| Feature | Claude Sonnet 4.5 | Claude Sonnet 4 | Claude Sonnet 3.7 | Claude Opus 4.1 | Claude Opus 4 | Claude Haiku 4.5 | Claude Haiku 3.5 | Claude Haiku 3 |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| **Description** | Our best model for complex agents and coding | High-performance model | High-performance model with early extended thinking | Exceptional model for specialized complex tasks | Our previous flagship model | Our fastest and most intelligent Haiku model | Our fastest model | Fast and compact model for near-instant responsiveness |
+| **Strengths** | Highest intelligence across most tasks with exceptional agent and coding capabilities | High intelligence and balanced performance | High intelligence with toggleable extended thinking | Very high intelligence and capability for specialized tasks | Very high intelligence and capability | Near-frontier intelligence at blazing speeds with extended thinking and exceptional cost-efficiency | Intelligence at blazing speeds | Quick and accurate targeted performance |
+| **Multilingual** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
+| **Vision** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
+| **[Extended thinking](/en/docs/build-with-claude/extended-thinking)** | Yes | Yes | Yes | Yes | Yes | Yes | No | No |
+| **[Priority Tier](/en/api/service-tiers)** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
+| **API model name** | claude-sonnet-4-5-20250929 | claude-sonnet-4-20250514 | claude-3-7-sonnet-20250219 | claude-opus-4-1-20250805 | claude-opus-4-20250514 | claude-haiku-4-5-20251001 | claude-3-5-haiku-20241022 | claude-3-haiku-20240307 |
+| **Comparative latency** | Fast | Fast | Fast | Moderately Fast | Moderately Fast | Fastest | Fastest | Fast |
+| **Context window** | 200K / 1M (beta)¹ | 200K / 1M (beta)¹ | 200K | 200K | 200K | 200K | 200K | 200K |
+| **Max output** | 64000 tokens | 64000 tokens | 64000 tokens | 32000 tokens | 32000 tokens | 64000 tokens | 8192 tokens | 4096 tokens |
+| **Reliable knowledge cutoff** | Jan 2025² | Jan 2025² | Oct 2024² | Jan 2025² | Jan 2025² | Feb 2025 | ³ | ³ |
+| **Training data cutoff** | Jul 2025 | Mar 2025 | Nov 2024 | Mar 2025 | Mar 2025 | Jul 2025 | Jul 2024 | Aug 2023 |
+
+*1 - Claude Sonnet 4.5 and Claude Sonnet 4 support a [1M token context window](/en/docs/build-with-claude/context-windows#1m-token-context-window) when using the `context-1m-2025-08-07` beta header. [Long context pricing](/en/docs/about-claude/pricing#long-context-pricing) applies to requests exceeding 200K tokens.*
+
+*2 - **Reliable knowledge cutoff** indicates the date through which a model's knowledge is most extensive and reliable. **Training data cutoff** is the broader date range of training data used. For example, Claude Sonnet 4.5 was trained on publicly available information through July 2025, but its knowledge is most extensive and reliable through January 2025. For more information, see [Anthropic's Transparency Hub](https://www.anthropic.com/transparency).*
+
+*3 - Some Haiku models have a single training data cutoff date.*
+
+
+ Include the beta header `output-128k-2025-02-19` in your API request to increase the maximum output token length to 128k tokens for Claude Sonnet 3.7.
+
+ We strongly suggest using our [streaming Messages API](/en/docs/build-with-claude/streaming) to avoid timeouts when generating longer outputs.
+ See our guidance on [long requests](/en/api/errors#long-requests) for more details.
+
+### Model pricing
+
+The table below shows the price per million tokens for each model:
+
+| Model | Base Input Tokens | 5m Cache Writes | 1h Cache Writes | Cache Hits & Refreshes | Output Tokens |
+| --- | --- | --- | --- | --- | --- |
+| Claude Opus 4.1 | \$15 / MTok | \$18.75 / MTok | \$30 / MTok | \$1.50 / MTok | \$75 / MTok |
+| Claude Opus 4 | \$15 / MTok | \$18.75 / MTok | \$30 / MTok | \$1.50 / MTok | \$75 / MTok |
+| Claude Sonnet 4.5 | \$3 / MTok | \$3.75 / MTok | \$6 / MTok | \$0.30 / MTok | \$15 / MTok |
+| Claude Sonnet 4 | \$3 / MTok | \$3.75 / MTok | \$6 / MTok | \$0.30 / MTok | \$15 / MTok |
+| Claude Sonnet 3.7 | \$3 / MTok | \$3.75 / MTok | \$6 / MTok | \$0.30 / MTok | \$15 / MTok |
+| Claude Sonnet 3.5 ([deprecated](/en/docs/about-claude/model-deprecations)) | \$3 / MTok | \$3.75 / MTok | \$6 / MTok | \$0.30 / MTok | \$15 / MTok |
+| Claude Haiku 4.5 | \$1 / MTok | \$1.25 / MTok | \$2 / MTok | \$0.10 / MTok | \$5 / MTok |
+| Claude Haiku 3.5 | \$0.80 / MTok | \$1 / MTok | \$1.6 / MTok | \$0.08 / MTok | \$4 / MTok |
+| Claude Opus 3 ([deprecated](/en/docs/about-claude/model-deprecations)) | \$15 / MTok | \$18.75 / MTok | \$30 / MTok | \$1.50 / MTok | \$75 / MTok |
+| Claude Haiku 3 | \$0.25 / MTok | \$0.30 / MTok | \$0.50 / MTok | \$0.03 / MTok | \$1.25 / MTok |
+
+## Prompt and output performance
+
+Claude 4 models excel in:
+
+* **Performance**: Top-tier results in reasoning, coding, multilingual tasks, long-context handling, honesty, and image processing. See the [Claude 4 blog post](http://www.anthropic.com/news/claude-4) for more information.
+* **Engaging responses**: Claude models are ideal for applications that require rich, human-like interactions.
+
+ * If you prefer more concise responses, you can adjust your prompts to guide the model toward the desired output length.
Refer to our [prompt engineering guides](/en/docs/build-with-claude/prompt-engineering) for details. + * For specific Claude 4 prompting best practices, see our [Claude 4 best practices guide](/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices). +* **Output quality**: When migrating from previous model generations to Claude 4, you may notice larger improvements in overall performance. + +## Migrating to Claude 4.5 + +If you're currently using Claude 3 models, we recommend migrating to Claude 4.5 to take advantage of improved intelligence and enhanced capabilities. For detailed migration instructions, see [Migrating to Claude 4.5](/en/docs/about-claude/models/migrating-to-claude-4). + +## Get started with Claude + +If you're ready to start exploring what Claude can do for you, let's dive in! Whether you're a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we've got you covered. + +Looking to chat with Claude? Visit [claude.ai](http://www.claude.ai)! + + + + Explore Claude’s capabilities and development flow. + + + + Learn how to make your first API call in minutes. + + + + Craft and test powerful prompts directly in your browser. + + + +If you have any questions or need assistance, don't hesitate to reach out to our [support team](https://support.claude.com/) or consult the [Discord community](https://www.anthropic.com/discord).
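
Review note on the REFACTOR-PLAN.md hunks: the plan has the SDK worker extract observation blocks from the agent's text output and store them in SQLite. A minimal sketch of that extraction step might look like the following. It assumes the element names `observation`, `type`, and `text`, which are inferred from the prompt's requirements and may not match the final format. `parseObservations`, `ParsedObservation`, and `VALID_TYPES` are hypothetical names for illustration, not functions from the claude-mem source.

```typescript
// Hypothetical sketch: none of these names come from the claude-mem codebase.
// The five valid types are taken from the prompt text in the plan.
const VALID_TYPES = ["decision", "bugfix", "feature", "refactor", "discovery"] as const;
type ObservationType = (typeof VALID_TYPES)[number];

interface ParsedObservation {
  type: ObservationType;
  text: string;
}

// Type guard that narrows an arbitrary string to one of the valid types.
function isValidType(value: string): value is ObservationType {
  return (VALID_TYPES as readonly string[]).includes(value);
}

// Extract every observation block from a model response.
// Assumes the element names <observation>, <type>, and <text>.
function parseObservations(response: string): ParsedObservation[] {
  const blockPattern =
    /<observation>\s*<type>([^<]*)<\/type>\s*<text>([\s\S]*?)<\/text>\s*<\/observation>/g;
  const results: ParsedObservation[] = [];
  for (const match of response.matchAll(blockPattern)) {
    const type = match[1].trim();
    const text = match[2].trim();
    // Skip blocks with an unknown type or empty text instead of storing bad rows.
    if (isValidType(type) && text.length > 0) {
      results.push({ type, text });
    }
  }
  return results;
}
```

The sketch uses a regular expression because the prompt text says the worker parses blocks "using regex", while the code comment in `runSDKWorker` suggests an XML parser library such as fast-xml-parser. Either could work; the plan would read more consistently if it settled on one. Surrounding reasoning text is ignored by the pattern, which matches the prompt's allowance for reasoning before or after the block.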