Replace search skill with mem-search (#91)
* feat: add mem-search skill with progressive disclosure architecture

  Add comprehensive mem-search skill for accessing claude-mem's persistent cross-session memory database. Implements progressive disclosure workflow and token-efficient search patterns.

  Features:
  - 12 search operations (observations, sessions, prompts, by-type, by-concept, by-file, timelines, etc.)
  - Progressive disclosure principles to minimize token usage
  - Anti-patterns documentation to guide LLM behavior
  - HTTP API integration for all search functionality
  - Common workflows with composition examples

  Structure:
  - SKILL.md: Entry point with temporal trigger patterns
  - principles/: Progressive disclosure + anti-patterns
  - operations/: 12 search operation files

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* docs: add CHANGELOG entry for mem-search skill

  Document mem-search skill addition in Unreleased section with:
  - 100% effectiveness compliance metrics
  - Comparison to previous search skill implementation
  - Progressive disclosure architecture details
  - Reference to audit report documentation

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* docs: add mem-search skill audit report

  Add comprehensive audit report validating mem-search skill against Anthropic's official skill-creator documentation.

  Report includes:
  - Effectiveness metrics comparison (search vs mem-search)
  - Critical issues analysis for production readiness
  - Compliance validation across 6 key dimensions
  - Reference implementation guidance

  Result: mem-search achieves 100% compliance vs search's 67%

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* feat: Add comprehensive search architecture analysis document

  - Document current state of dual search architectures (HTTP API and MCP)
  - Analyze HTTP endpoints and MCP search server architectures
  - Identify DRY violations across search implementations
  - Evaluate the use of curl as the optimal approach for search
  - Provide architectural recommendations for immediate and long-term improvements
  - Outline action plan for cleanup, feature parity, DRY refactoring

* refactor: Remove deprecated search skill documentation and operations

* refactor: Reorganize documentation into public and context directories

  Changes:
  - Created docs/public/ for Mintlify documentation (.mdx files)
  - Created docs/context/ for internal planning and implementation docs
  - Moved all .mdx files and assets to docs/public/
  - Moved all internal .md files to docs/context/
  - Added CLAUDE.md to both directories explaining their purpose
  - Updated docs.json paths to work with new structure

  Benefits:
  - Clear separation between user-facing and internal documentation
  - Easier to maintain Mintlify docs in dedicated directory
  - Internal context files organized separately

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* Enhance session management and continuity in hooks

  - Updated new-hook.ts to clarify session_id threading and idempotent session creation.
  - Modified prompts.ts to require claudeSessionId for continuation prompts, ensuring session context is maintained.
  - Improved SessionStore.ts documentation on createSDKSession to emphasize idempotent behavior and session connection.
  - Refined SDKAgent.ts to detail continuation prompt logic and its reliance on session.claudeSessionId for unified session handling.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Alex Newman <thedotmack@gmail.com>
@@ -1,73 +0,0 @@
# Claude-Mem Documentation Folder

## What This Folder Is

This `docs/` folder is a **Mintlify documentation site** - the official user-facing documentation for claude-mem. It's a structured documentation platform with a specific file format and organization.

## File Structure Requirements

### Mintlify Documentation Files (.mdx)

All official documentation files must:
- Be written in `.mdx` format (Markdown with JSX support)
- Be listed in the `docs.json` navigation structure
- Follow Mintlify's schema and conventions

The documentation is organized into these sections:
- **Get Started**: Introduction, installation, usage guides
- **Best Practices**: Context engineering, progressive disclosure
- **Configuration & Development**: Settings, dev workflow, troubleshooting
- **Architecture**: System design, components, technical details

### Configuration File
`docs.json` defines:
- Site metadata (name, description, theme)
- Navigation structure
- Branding (logos, colors)
- Footer links and social media

## What Does NOT Belong Here

**Planning documents, design docs, and reference materials should go in `/context/` instead.**

Files that should be in `/context/` (not `/docs/`):
- Planning documents (`*-plan.md`, `*-outline.md`)
- Implementation analysis (`*-audit.md`, `*-code-reference.md`)
- Error tracking (`typescript-errors.md`)
- Design documents not part of official docs
- PR review responses
- Reference materials (like `agent-sdk-ref.md`)

**Example**: The deleted `VIEWER.md` was moved because it was implementation documentation, not user-facing docs.

## Current Files That Should Be Moved

These `.md` files currently in `docs/` should probably be moved to `context/`:
- `typescript-errors.md` - Error tracking
- `worker-service-architecture.md` - Implementation details (not user-facing architecture)
- `processing-indicator-audit.md` - Implementation audit
- `processing-indicator-code-reference.md` - Code reference
- `worker-service-rewrite-outline.md` - Planning document
- `worker-service-overhead.md` - Analysis document
- `CHROMA.md` - Implementation reference (if not user-facing)
- `chroma-search-completion-plan.md` - Planning document

## How to Add Official Documentation

1. Create a new `.mdx` file in the appropriate subdirectory
2. Add the file path to `docs.json` navigation
3. Use Mintlify's frontmatter and components
4. Follow the existing documentation style

## Development Workflow

**For contributors working on claude-mem:**
- Read `/CLAUDE.md` in the project root for development instructions
- Place planning/design docs in `/context/`
- Only add user-facing documentation to `/docs/`
- Test documentation locally with Mintlify CLI if available

## Summary

**Simple Rule**:
- `/docs/` = Official user documentation (Mintlify .mdx files)
- `/context/` = Development context, plans, references, internal docs
@@ -0,0 +1,104 @@
# Claude-Mem Context Documentation

## What This Folder Is

This `docs/context/` folder contains **internal documentation** - planning documents, design references, audits, and work-in-progress materials that support development but are NOT user-facing.

## Folder Structure

```
docs/
├── public/              ← User-facing Mintlify docs (DO NOT put internal docs there)
│   └── *.mdx            - Official documentation
└── context/             ← You are here (Internal documentation)
    ├── *.md             - Planning docs, audits, references
    ├── *-plan.md        - Implementation plans
    ├── *-audit.md       - Code audits and reviews
    ├── agent-sdk-*.md   - SDK reference materials
    └── subdirs/         - Organized by topic
```

## What Belongs Here

**Internal Documentation** (`.md` format):
- Planning documents (`*-plan.md`, `*-outline.md`)
- Implementation analysis (`*-audit.md`, `*-code-reference.md`)
- Error tracking (`typescript-errors.md`)
- Design documents not ready for public docs
- PR review responses
- Reference materials (like `agent-sdk-ref.md`)
- Work-in-progress documentation
- Technical investigations and postmortems
- Architecture analysis documents

**Examples from this folder:**
- `mem-search-technical-architecture.md` - Deep technical reference
- `search-architecture-analysis.md` - Implementation analysis
- `agent-sdk-ref.md` - SDK reference for developers
- `typescript-errors.md` - Error tracking during development
- `worker-service-architecture.md` - Internal architecture notes
- `processing-indicator-audit.md` - Code audit document

## What Does NOT Belong Here

**User-Facing Documentation** goes in `/docs/public/`:
- User guides and tutorials
- Official architecture documentation
- Installation instructions
- Configuration guides
- Best practices for users
- Troubleshooting guides

**Rule of Thumb:**
- If a user would read it → `/docs/public/` (as `.mdx`)
- If only developers/contributors need it → `/docs/context/` (as `.md`)

## File Organization

### By Type
- `*-plan.md` - Implementation plans for features
- `*-audit.md` - Code audits and reviews
- `*-postmortem.md` - Analysis of issues or incidents
- `*-reference.md` - Technical reference materials
- `*-analysis.md` - Architecture or design analysis
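
The suffix conventions above lend themselves to a mechanical check, for example in a lint script. The following is a minimal sketch only: `classifyContextDoc`, `ContextDocKind`, and `SUFFIX_KINDS` are hypothetical names that do not exist in the repository, and the mapping simply mirrors the "By Type" list.

```typescript
// Hypothetical helper (not part of claude-mem): maps a context-doc filename
// to a category using the suffix conventions from the "By Type" list.
type ContextDocKind = "plan" | "audit" | "postmortem" | "reference" | "analysis" | "other";

const SUFFIX_KINDS: Array<[string, ContextDocKind]> = [
  ["-plan.md", "plan"],
  ["-audit.md", "audit"],
  ["-postmortem.md", "postmortem"],
  ["-reference.md", "reference"],
  ["-analysis.md", "analysis"],
];

function classifyContextDoc(filename: string): ContextDocKind {
  for (const [suffix, kind] of SUFFIX_KINDS) {
    // First matching suffix wins; the suffixes do not overlap.
    if (filename.endsWith(suffix)) return kind;
  }
  // Files such as typescript-errors.md follow no suffix convention.
  return "other";
}
```

Files that match no suffix fall through to `"other"` rather than failing, since the folder also holds free-form notes.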

### By Topic
- Create subdirectories for related documents
- Example: `claude-code/` for Claude Code specific docs
- Example: `architecture/` for internal architecture notes

## Development Workflow

### When to Create Context Docs

1. **Planning Phase** - Before implementing a feature
   - Create `feature-name-plan.md`
   - Outline implementation steps
   - Document decisions and tradeoffs

2. **During Development** - Track issues and decisions
   - Create `feature-name-audit.md` for code reviews
   - Update `typescript-errors.md` for build issues
   - Document gotchas in topic-specific files

3. **After Implementation** - Preserve knowledge
   - Create `feature-name-postmortem.md` if issues occurred
   - Update architecture analysis documents
   - Archive plan docs (don't delete - useful for history)

### Graduating to Public Docs

When internal docs are polished enough for users:
1. Convert `.md` to `.mdx` format
2. Add Mintlify frontmatter
3. Move to appropriate `/docs/public/` subdirectory
4. Add to `docs.json` navigation
5. Keep original in `/docs/context/` for reference
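
Steps 1 and 2 of the graduation list can be sketched as two small string transformations. This is an illustration under assumptions: `toMdxPath`, `withFrontmatter`, and `PageMeta` are hypothetical names, and only the `title` and `description` frontmatter keys are used. Moving the file and editing `docs.json` (steps 3 and 4) are left out.

```typescript
// Hypothetical sketch (not part of claude-mem) of steps 1 and 2 above.
interface PageMeta {
  title: string;
  description: string;
}

// Step 1: swap the .md extension for .mdx.
function toMdxPath(mdPath: string): string {
  return mdPath.replace(/\.md$/, ".mdx");
}

// Step 2: prepend a YAML frontmatter block. JSON.stringify yields
// double-quoted strings, which are also valid YAML scalars.
function withFrontmatter(markdown: string, meta: PageMeta): string {
  const frontmatter = [
    "---",
    `title: ${JSON.stringify(meta.title)}`,
    `description: ${JSON.stringify(meta.description)}`,
    "---",
    "",
  ].join("\n");
  return frontmatter + markdown;
}
```

The original `.md` file is left untouched, which matches step 5: the internal copy stays in `/docs/context/` for reference.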

## Summary

**Simple Rule**:
- `/docs/context/` = Internal docs, plans, references, audits ← YOU ARE HERE
- `/docs/public/` = Official user documentation (Mintlify .mdx files)

**Purpose**: This folder preserves development context, design decisions, and technical knowledge that helps contributors understand WHY things work the way they do, even if users don't need those details.
@@ -0,0 +1,164 @@
# CWD Context Fix - Technical Documentation

## Overview

This fix adds working directory (CWD) context propagation through the entire claude-mem pipeline, enabling the SDK agent to have spatial awareness of which directory/repository it's observing.

## Problem Statement

Previously, the SDK agent would:
- Search wrong repositories when analyzing file operations
- Report "file not found" for files that actually exist
- Lack context about which project was being worked on
- Generate inaccurate observations due to spatial confusion

## Solution

The CWD information now flows through the entire system:

```
Hook Input (cwd) → Worker API (cwd) → SessionManager (cwd) → SDK Agent (tool_cwd)
```

## Data Flow

### 1. Hook Layer (`save-hook.ts`)
```typescript
export interface PostToolUseInput {
  session_id: string;
  cwd: string;  // ← Captured from Claude Code
  tool_name: string;
  tool_input: any;
  tool_response: any;
}
```

The hook extracts `cwd` and includes it in the worker API request:
```typescript
body: JSON.stringify({
  tool_name,
  tool_input,
  tool_response,
  prompt_number,
  cwd: cwd || ''  // ← Passed to worker
})
```

### 2. Worker Service (`worker-service.ts`)
```typescript
const { tool_name, tool_input, tool_response, prompt_number, cwd } = req.body;

this.sessionManager.queueObservation(sessionDbId, {
  tool_name,
  tool_input,
  tool_response,
  prompt_number,
  cwd  // ← Forwarded to queue
});
```

### 3. Session Manager (`SessionManager.ts`)
```typescript
session.pendingMessages.push({
  type: 'observation',
  tool_name: data.tool_name,
  tool_input: data.tool_input,
  tool_response: data.tool_response,
  prompt_number: data.prompt_number,
  cwd: data.cwd  // ← Included in message queue
});
```

### 4. SDK Agent (`SDKAgent.ts`)
```typescript
content: buildObservationPrompt({
  id: 0,
  tool_name: message.tool_name!,
  tool_input: JSON.stringify(message.tool_input),
  tool_output: JSON.stringify(message.tool_response),
  created_at_epoch: Date.now(),
  cwd: message.cwd  // ← Passed to prompt builder
})
```

### 5. Prompt Generation (`prompts.ts`)
```typescript
// ← <tool_cwd> is included in the XML only when obs.cwd is set
return `<tool_used>
  <tool_name>${obs.tool_name}</tool_name>
  <tool_time>${new Date(obs.created_at_epoch).toISOString()}</tool_time>${obs.cwd ? `
  <tool_cwd>${obs.cwd}</tool_cwd>` : ''}
  <tool_input>${JSON.stringify(toolInput, null, 2)}</tool_input>
  <tool_output>${JSON.stringify(toolOutput, null, 2)}</tool_output>
</tool_used>`;
```

## SDK Agent Prompt Changes

The init prompt now includes a "SPATIAL AWARENESS" section:

```
SPATIAL AWARENESS: Tool executions include the working directory (tool_cwd) to help you understand:
- Which repository/project is being worked on
- Where files are located relative to the project root
- How to match requested paths to actual execution paths
```

## Example Usage

When a user executes a read operation in `/home/user/my-project`:

```xml
<tool_used>
  <tool_name>ReadTool</tool_name>
  <tool_time>2025-11-10T19:18:03.065Z</tool_time>
  <tool_cwd>/home/user/my-project</tool_cwd>
  <tool_input>
    {
      "path": "src/index.ts"
    }
  </tool_input>
  <tool_output>
    {
      "content": "export default..."
    }
  </tool_output>
</tool_used>
```

The SDK agent now knows:
1. The operation happened in `/home/user/my-project`
2. The file `src/index.ts` is relative to that directory
3. Which repository context to search when generating observations
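
Point 2 amounts to resolving the tool's path against `tool_cwd`. The sketch below shows that resolution in isolation; `resolveToolPath` is a hypothetical helper, not something the fix adds, and it uses POSIX path rules to match the example above.

```typescript
import * as path from "node:path";

// Hypothetical helper (not part of claude-mem): resolves a path reported in
// tool_input against the tool_cwd value from the same observation.
function resolveToolPath(toolCwd: string | undefined, toolPath: string): string {
  // Without a cwd, or with an already absolute path, there is nothing to resolve.
  if (!toolCwd || path.posix.isAbsolute(toolPath)) return toolPath;
  return path.posix.join(toolCwd, toolPath);
}

// With the values from the example observation:
const resolved = resolveToolPath("/home/user/my-project", "src/index.ts");
console.log(resolved); // /home/user/my-project/src/index.ts
```

When `tool_cwd` is missing, the path is returned unchanged, which is the situation the agent was in before this fix.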

## Testing

8 comprehensive tests validate the CWD propagation:

```bash
npx tsx --test tests/cwd-propagation.test.ts
```

Together, the tests verify that:
- Type interfaces include `cwd` fields
- Hook extracts and passes `cwd`
- Worker accepts and forwards `cwd`
- SDK agent includes `cwd` in prompts
- End-to-end flow is correct

## Benefits

1. **Spatial Awareness**: SDK agent knows which directory/repository it's observing
2. **Accurate Path Matching**: Can verify if requested paths match executed paths
3. **Better Summaries**: Won't search wrong repositories or report false negatives
4. **Works with All Models**: Even Haiku benefits from correct context (no need for Opus workaround)

## Backward Compatibility

- `cwd` is optional in all interfaces (`cwd?: string`)
- Missing `cwd` values are handled gracefully (defaults to empty string)
- Existing observations without `cwd` continue to work
- No database migration required (CWD is transient, not persisted)
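
The first three points follow from how the optional field is rendered: both a missing `cwd` and the empty-string default are falsy, so no `<tool_cwd>` tag is emitted for them. The sketch below isolates that behavior. `ObservationInput`, `renderCwdTag`, and `renderHeader` are simplified stand-ins assumed for illustration, not the project's actual types or functions.

```typescript
// Simplified stand-in for the observation shape; field names follow the
// snippets in the Data Flow section, everything else is assumed.
interface ObservationInput {
  tool_name: string;
  created_at_epoch: number;
  cwd?: string; // optional: older callers may omit it
}

// undefined and '' are both falsy, so neither produces a tag.
function renderCwdTag(obs: ObservationInput): string {
  return obs.cwd ? `\n  <tool_cwd>${obs.cwd}</tool_cwd>` : "";
}

function renderHeader(obs: ObservationInput): string {
  return `<tool_used>
  <tool_name>${obs.tool_name}</tool_name>
  <tool_time>${new Date(obs.created_at_epoch).toISOString()}</tool_time>${renderCwdTag(obs)}
</tool_used>`;
}
```

An observation queued before the fix renders exactly as it did before, because the conditional contributes an empty string.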

## Related Issues

Fixes issue #73 (CWD context missing from SDK agent)
@@ -0,0 +1,219 @@
# Response to PR Review #47

## Executive Summary

Thank you for the thorough review. Most of the "issues" identified are actually **intentional architectural decisions** made to solve production failures. The comprehensive analysis docs (JUST-FUCKING-RUN-IT.md, LINE-BY-LINE-CASCADING-BULLSHIT.md) document why these changes were necessary.

However, you've identified **3 legitimate issues** that need attention:
1. ✅ **Race condition in worker startup** - Valid concern, needs fixing
2. ✅ **Missing null checks in auto-session creation** - Legitimate bug, must be fixed before merge
3. ✅ **Watch mode in production** - Appears to be unintentional leftover from development

The other concerns are **working as intended** based on documented architectural decisions.

---

## Detailed Response to Each Concern

### ⚠️ Issue #1: Race Condition in Worker Health Check - **VALID CONCERN**

**Review Comment**: "The spawn() call inside the close event handler is non-blocking, but the function returns immediately. Hooks may attempt HTTP requests before worker has started."

**Our Response**: **You're absolutely right**. This is a legitimate race condition we need to fix.

**However**, the suggested fixes (async/await health check, retry loops) are exactly what we intentionally removed because they were causing production failures (see Observation #3602, #3600).

**Proposed Solution**:
The hooks already have proper error handling for `ECONNREFUSED` with actionable user messages:
```typescript
if (error.cause?.code === 'ECONNREFUSED' || error.name === 'TimeoutError' || error.message.includes('fetch failed')) {
  throw new Error("There's a problem with the worker. If you just updated, type `pm2 restart claude-mem-worker` in your terminal to continue");
}
```

We should either:
1. Document this as expected behavior (fire-and-forget spawn)
2. Add a single synchronous `pm2 list` check after spawn to verify startup
3. Keep the current approach and rely on hook error messages

**We will NOT re-add**: Retry loops, health check polling, or arbitrary delays. Those caused the 100% failure rate we just fixed.
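
For discussion, option 2 could look roughly like the sketch below: one synchronous call, no retries, no polling, no delays. It is an assumption-laden illustration rather than a proposed patch. It uses `pm2 jlist` (the JSON form of `pm2 list`) instead of parsing the table output, the `Pm2Process` interface covers only the two fields read here, and `isWorkerOnline` and `checkWorkerOnce` are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";

// Only the fields this sketch reads; pm2 reports many more per process.
interface Pm2Process {
  name?: string;
  pm2_env?: { status?: string };
}

// Pure parser, kept separate so it can be tested without pm2 installed.
function isWorkerOnline(jlistOutput: string, workerName = "claude-mem-worker"): boolean {
  let processes: Pm2Process[];
  try {
    processes = JSON.parse(jlistOutput);
  } catch {
    return false; // unparseable output is treated as "not running"
  }
  if (!Array.isArray(processes)) return false;
  return processes.some((p) => p.name === workerName && p.pm2_env?.status === "online");
}

// A single synchronous check after spawn: it either sees the worker or it
// does not. The caller decides what to tell the user; nothing is retried.
function checkWorkerOnce(): boolean {
  const result = spawnSync("pm2", ["jlist"], { encoding: "utf8" });
  if (result.error || result.status !== 0) return false;
  return isWorkerOnline(result.stdout);
}
```

Note that "online" in PM2 only means the process is running, not that the HTTP server is already accepting connections, so this check narrows the race window without closing it. The `ECONNREFUSED` handling in the hooks would still be the backstop.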

---

### ⚠️ Issue #2: Removed Health Endpoint Information - **INTENTIONAL**

**Review Comment**: "This removes useful debugging information. When troubleshooting production issues, knowing the PID, active sessions count, and port would be valuable."

**Our Documentation**:
- **Observation #3616**: "Simplified Health Check Endpoint to Minimal Response"
- **Observation #3601**: "Minimum Parameters = Minimum Bugs"
- **Observation #3600**: "Comprehensive Analysis of Cascading Architectural Problems"

**Why We Did This**:
1. **HTTP 200 = Alive**: If the endpoint responds, the worker is healthy. Period.
2. **Diagnostic fields provided no actionable value**: PID, activeSessions, chromaSynced didn't help debug the actual production failures
3. **Part of 87% code reduction**: worker-utils.ts went from 113 lines → 15 lines
4. **Health checks were hiding real problems**: Retry logic masked that startup sequence was broken

**Original Problem**:
- Worker startup: 4-5 seconds (actual)
- Health check timeout: 3 seconds (configured)
- Result: **100% user failure rate**

The detailed health response didn't help diagnose this - fixing the startup sequence (HTTP server first) did.

**Response**: **Will not change**. The health endpoint serves one purpose: availability signal. Use PM2 commands for diagnostics:
- `pm2 list` - See PID, status, memory
- `pm2 logs claude-mem-worker` - See application logs
- `npm run worker:logs` - Convenience wrapper
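
To make the "HTTP 200 = Alive" position concrete, a health handler in this style carries no diagnostic state at all. The sketch below is illustrative only: `handleHealth` and `HealthResponse` are hypothetical names, the `{ status: "ok" }` body is an assumed payload, and the actual worker's handler may differ.

```typescript
// Stand-in for the parts of an HTTP response object that the handler touches.
interface HealthResponse {
  statusCode: number;
  setHeader(name: string, value: string): unknown;
  end(body?: string): unknown;
}

// The only signal is that a response arrives with status 200.
// No PID, no session counts, no sync flags.
function handleHealth(res: HealthResponse): void {
  res.statusCode = 200;
  res.setHeader("Content-Type", "application/json");
  res.end(JSON.stringify({ status: "ok" })); // body content is an assumption
}
```

With nothing to compute, the handler cannot fail for reasons unrelated to the process being alive, which is the property the simplification was after.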

---

### ⚠️ Issue #3: Auto-Session Creation Without Validation - **NEEDS FIXING**

**Review Comment**: "Uses non-null assertion (dbSession!) without checking if dbSession is actually null. If getSessionById() returns null, this will throw at runtime."

**Our Response**: **You're absolutely right**. This is a legitimate bug.

**Action Required**: Add null checks to `handleObservation` and `handleSummarize`, like the one that already exists in `handleInit`:
```typescript
const dbSession = db.getSessionById(sessionDbId);
if (!dbSession) {
  db.close();
  res.status(404).json({ error: 'Session not found in database' });
  return;
}
```

**This needs to be fixed before merge.**

---

### ⚠️ Issue #4: Removed Observation Counter - **INTENTIONAL**

**Review Comment**: "Was this used for generating correlation IDs for logging? If so, is there now no way to correlate observations within a session for debugging?"

**Our Documentation**:
- **Observation #3621-3627**: Complete removal of observation counter and correlation IDs
- **Observation #3602**: "Architectural Decision: Remove Health Checks and Arbitrary Delays"
- **Observation #3612**: "Worker Service Simplification Strategy"

**Why We Removed It**:
1. **Over-engineering**: Provided per-observation tracking when session-level identification was sufficient
2. **Part of cascading complexity**: Correlation IDs were monitoring infrastructure for complexity that shouldn't exist
3. **Session-level debugging is sufficient**: Most issues are diagnosed by knowing which session is affected, not which observation (say, the fifth) within that session
4. **Database IDs provide uniqueness**: Once stored, observations have DB IDs for precise identification

**The Problem It Was Solving (That No Longer Needs Solving)**:
- Tracking individual observations through worker pipeline
- Monitoring Chroma sync success/failure per observation
- Detailed per-observation timing metrics

**Why That's Unnecessary**:
- Session-level logging is sufficient for debugging
- Database IDs provide uniqueness after storage
- The monitoring was masking real problems (startup sequence)

**Response**: **Will not change**. This was part of the simplification strategy that fixed production failures.

---

### ⚠️ Issue #5: PM2 Watch Mode in Production - **VALID CONCERN**

**Review Comment**: "Watch mode causes PM2 to restart the process whenever files change. This is useful during development but potentially problematic in production."

**Our Investigation**:
- **Observation #3631**: Documents what watch mode does, but **no observation documents WHY we enabled it**
- **Observation #3611**: PM2 config was "drastically simplified" by removing 21 unnecessary parameters
- **Watch mode was kept** during this aggressive simplification

**Conclusion**: **This appears to be unintentional** - likely enabled for development and inadvertently left enabled.

**Action Required**: Either:
1. **Disable watch mode** (recommended) - Users aren't developing, they're using the plugin
2. **Document it as intentional** if there's a reason we want auto-restart on file changes

**This should be addressed before merge** - likely by disabling watch mode.

---

### ⚠️ Issue #6: Duplicate Port Constant - **ACKNOWLEDGED**

**Review Comment**: "FIXED_PORT constant is defined in 5 places. Creates maintenance burden."

**Our Response**: **Fair point**. This is technical debt we can clean up.

**However**, it's low priority because:
- Port is unlikely to change
- All values are currently consistent
- Not causing production issues

**Action**: Add to backlog for post-merge cleanup. Export from worker-utils.ts and import elsewhere.

---

## Summary of Actions

### Must Fix Before Merge:
1. ✅ **Add null checks to auto-session creation** in handleObservation and handleSummarize
2. ✅ **Decide on watch mode** - Disable unless there's documented reason to keep it

### Will Not Change (Intentional Decisions):
1. ❌ **Health endpoint simplification** - Part of solving 100% failure rate
2. ❌ **Removed observation counter** - Part of simplification strategy
3. ❌ **Removed health check system** - Was causing production failures
4. ❌ **Fire-and-forget worker spawn** - Hooks have proper error handling

### Race Condition Discussion Needed:
1. 🤔 **Worker startup race condition** - Valid concern, but retry loops caused the original failures. Options:
   - Keep current approach (hooks handle ECONNREFUSED gracefully)
   - Add single synchronous `pm2 list` check after spawn
   - Document as expected behavior

### Nice to Have (Post-Merge):
1. 📋 **Consolidate FIXED_PORT constant** - Technical debt cleanup

---

## Key Documentation References

The architectural decisions are comprehensively documented in:

1. **JUST-FUCKING-RUN-IT.md** (Observation #3602)
   - Architectural decision to remove health checks
   - Philosophy: Trust PM2, let HTTP timeouts be the health check

2. **LINE-BY-LINE-CASCADING-BULLSHIT.md** (Observation #3600)
   - Root cause analysis of how health checks caused 100% failure rate
   - Documents cascade from arbitrary 3000ms timeout → retry loops → race conditions

3. **MINIMUM-PARAMETERS.md** (Observation #3601)
   - Quantified impact: 21 unnecessary PM2 parameters, ~160 lines deleted
   - Philosophy: "Minimum parameters = minimum bugs"

4. **STUPID-SHIT-THAT-BROKE-PRODUCTION.md** (Observation #3597)
   - 8 critical issues causing 100% user failure rate
   - Includes worker crashing on Chroma failures despite data already in SQLite

These documents explain **why** the simplifications were necessary - they weren't arbitrary removal of useful features, they were targeted fixes for production failures.

---

## Production Context

**Before This PR**:
- 100% user failure rate after v4.x release
- Worker startup took 4-5 seconds but health checks timed out at 3 seconds
- `stdio: 'ignore'` eliminated all debugging visibility
- Worker crashed on Chroma failures despite data safely in SQLite
- ChromaSync initialized in constructor, blocking HTTP server
- 113 lines of health check code with retry loops masking real problems

**After This PR**:
- HTTP server starts immediately
- Worker stays alive through Chroma failures (graceful degradation)
- Errors are visible (`stdio: 'inherit'`)
- Worker-utils.ts: 113 lines → 15 lines (87% reduction)
- Hooks have proper error handling with actionable user messages
- System works with just SQLite FTS5, Chroma is optional enhancement

The "removed observability" was actually **removed complexity that was hiding problems**, not helping diagnose them.
@@ -0,0 +1,302 @@
# Agent Skills in the SDK

> Extend Claude with specialized capabilities using Agent Skills in the Claude Agent SDK

## Overview

Agent Skills extend Claude with specialized capabilities that Claude autonomously invokes when relevant. Skills are packaged as `SKILL.md` files containing instructions, descriptions, and optional supporting resources.

For comprehensive information about Skills, including benefits, architecture, and authoring guidelines, see the [Agent Skills overview](/en/docs/agents-and-tools/agent-skills/overview).

## How Skills Work with the SDK

When using the Claude Agent SDK, Skills are:

1. **Defined as filesystem artifacts**: Created as `SKILL.md` files in specific directories (`.claude/skills/`)
2. **Loaded from filesystem**: Skills are loaded from configured filesystem locations. You must specify `settingSources` (TypeScript) or `setting_sources` (Python) to load Skills from the filesystem
3. **Automatically discovered**: Once filesystem settings are loaded, Skill metadata is discovered at startup from user and project directories; full content loaded when triggered
4. **Model-invoked**: Claude autonomously chooses when to use them based on context
5. **Enabled via allowed_tools**: Add `"Skill"` to your `allowed_tools` to enable Skills

Unlike subagents (which can be defined programmatically), Skills must be created as filesystem artifacts. The SDK does not provide a programmatic API for registering Skills.

<Note>
**Default behavior**: By default, the SDK does not load any filesystem settings. To use Skills, you must explicitly configure `settingSources: ['user', 'project']` (TypeScript) or `setting_sources=["user", "project"]` (Python) in your options.
</Note>

## Using Skills with the SDK

To use Skills with the SDK, you need to:

1. Include `"Skill"` in your `allowed_tools` configuration
2. Configure `settingSources`/`setting_sources` to load Skills from the filesystem

Once configured, Claude automatically discovers Skills from the specified directories and invokes them when relevant to the user's request.

<CodeGroup>
```python Python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    options = ClaudeAgentOptions(
        cwd="/path/to/project",  # Project with .claude/skills/
        setting_sources=["user", "project"],  # Load Skills from filesystem
        allowed_tools=["Skill", "Read", "Write", "Bash"]  # Enable Skill tool
    )

    async for message in query(
        prompt="Help me process this PDF document",
        options=options
    ):
        print(message)

asyncio.run(main())
```

```typescript TypeScript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Help me process this PDF document",
  options: {
    cwd: "/path/to/project",  // Project with .claude/skills/
    settingSources: ["user", "project"],  // Load Skills from filesystem
    allowedTools: ["Skill", "Read", "Write", "Bash"]  // Enable Skill tool
  }
})) {
  console.log(message);
}
```
</CodeGroup>
|
||||
|
||||
## Skill Locations
|
||||
|
||||
Skills are loaded from filesystem directories based on your `settingSources`/`setting_sources` configuration:
|
||||
|
||||
* **Project Skills** (`.claude/skills/`): Shared with your team via git - loaded when `settingSources`/`setting_sources` includes `"project"`
* **User Skills** (`~/.claude/skills/`): Personal Skills across all projects - loaded when `settingSources`/`setting_sources` includes `"user"`
|
||||
* **Plugin Skills**: Bundled with installed Claude Code plugins
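
The filesystem lookup described in this list can be approximated with a short script. This is a minimal sketch, not the SDK's actual discovery logic: the helper name `find_skill_files`, its `home` parameter, and the mapping of `"project"` and `"user"` to directories are assumptions based only on the locations listed here, and plugin Skills are not covered.

```python
from pathlib import Path


def find_skill_files(cwd, setting_sources, home=None):
    """Return SKILL.md paths visible for the given setting sources.

    Hypothetical helper: it mirrors the documented directory layout only.
    """
    home = Path(home) if home is not None else Path.home()
    roots = []
    if "project" in setting_sources:
        roots.append(Path(cwd) / ".claude" / "skills")
    if "user" in setting_sources:
        roots.append(home / ".claude" / "skills")

    found = []
    for root in roots:
        if root.is_dir():
            # Each Skill is a directory that contains a SKILL.md file.
            found.extend(sorted(root.glob("*/SKILL.md")))
    return found
```

With an empty `setting_sources` list the helper returns nothing, which matches the default behavior of not loading filesystem settings.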
|
||||
|
||||
## Creating Skills
|
||||
|
||||
Skills are defined as directories containing a `SKILL.md` file with YAML frontmatter and Markdown content. The `description` field determines when Claude invokes your Skill.
|
||||
|
||||
**Example directory structure**:
|
||||
|
||||
```bash theme={null}
|
||||
.claude/skills/processing-pdfs/
|
||||
└── SKILL.md
|
||||
```
|
||||
|
||||
For complete guidance on creating Skills, including SKILL.md structure, multi-file Skills, and examples, see:
|
||||
|
||||
* [Agent Skills in Claude Code](https://code.claude.com/docs/skills): Complete guide with examples
|
||||
* [Agent Skills Best Practices](/en/docs/agents-and-tools/agent-skills/best-practices): Authoring guidelines and naming conventions
|
||||
|
||||
## Tool Restrictions
|
||||
|
||||
<Note>
|
||||
The `allowed-tools` frontmatter field in SKILL.md is only supported when using Claude Code CLI directly. **It does not apply when using Skills through the SDK**.
|
||||
|
||||
When using the SDK, control tool access through the main `allowedTools` option in your query configuration.
|
||||
</Note>
|
||||
|
||||
To restrict tools for Skills in SDK applications, use the `allowedTools` option:
|
||||
|
||||
<Note>
|
||||
Import statements from the first example are assumed in the following code snippets.
|
||||
</Note>
|
||||
|
||||
<CodeGroup>
```python Python theme={null}
options = ClaudeAgentOptions(
    setting_sources=["user", "project"],  # Load Skills from filesystem
    allowed_tools=["Skill", "Read", "Grep", "Glob"]  # Restricted toolset
)

async for message in query(
    prompt="Analyze the codebase structure",
    options=options
):
    print(message)
```

```typescript TypeScript theme={null}
// Skills can only use Read, Grep, and Glob tools
for await (const message of query({
  prompt: "Analyze the codebase structure",
  options: {
    settingSources: ["user", "project"],  // Load Skills from filesystem
    allowedTools: ["Skill", "Read", "Grep", "Glob"]  // Restricted toolset
  }
})) {
  console.log(message);
}
```
</CodeGroup>
|
||||
|
||||
## Discovering Available Skills
|
||||
|
||||
To see which Skills are available in your SDK application, simply ask Claude:
|
||||
|
||||
<CodeGroup>
```python Python theme={null}
options = ClaudeAgentOptions(
    setting_sources=["user", "project"],  # Load Skills from filesystem
    allowed_tools=["Skill"]
)

async for message in query(
    prompt="What Skills are available?",
    options=options
):
    print(message)
```

```typescript TypeScript theme={null}
for await (const message of query({
  prompt: "What Skills are available?",
  options: {
    settingSources: ["user", "project"],  // Load Skills from filesystem
    allowedTools: ["Skill"]
  }
})) {
  console.log(message);
}
```
</CodeGroup>
|
||||
|
||||
Claude will list the available Skills based on your current working directory and installed plugins.
|
||||
|
||||
## Testing Skills
|
||||
|
||||
Test Skills by asking questions that match their descriptions:
|
||||
|
||||
<CodeGroup>
```python Python theme={null}
options = ClaudeAgentOptions(
    cwd="/path/to/project",
    setting_sources=["user", "project"],  # Load Skills from filesystem
    allowed_tools=["Skill", "Read", "Bash"]
)

async for message in query(
    prompt="Extract text from invoice.pdf",
    options=options
):
    print(message)
```

```typescript TypeScript theme={null}
for await (const message of query({
  prompt: "Extract text from invoice.pdf",
  options: {
    cwd: "/path/to/project",
    settingSources: ["user", "project"],  // Load Skills from filesystem
    allowedTools: ["Skill", "Read", "Bash"]
  }
})) {
  console.log(message);
}
```
</CodeGroup>
|
||||
|
||||
Claude automatically invokes the relevant Skill if the description matches your request.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Skills Not Found
|
||||
|
||||
**Check settingSources configuration**: Skills are only loaded when you explicitly configure `settingSources`/`setting_sources`. This is the most common issue:
|
||||
|
||||
<CodeGroup>
```python Python theme={null}
# Wrong - Skills won't be loaded
options = ClaudeAgentOptions(
    allowed_tools=["Skill"]
)

# Correct - Skills will be loaded
options = ClaudeAgentOptions(
    setting_sources=["user", "project"],  # Required to load Skills
    allowed_tools=["Skill"]
)
```

```typescript TypeScript theme={null}
// Wrong - Skills won't be loaded
const options = {
  allowedTools: ["Skill"]
};

// Correct - Skills will be loaded
const options = {
  settingSources: ["user", "project"],  // Required to load Skills
  allowedTools: ["Skill"]
};
```
</CodeGroup>
|
||||
|
||||
For more details on `settingSources`/`setting_sources`, see the [TypeScript SDK reference](/en/docs/agent-sdk/typescript#settingsource) or [Python SDK reference](/en/docs/agent-sdk/python#settingsource).
|
||||
|
||||
**Check working directory**: The SDK loads Skills relative to the `cwd` option. Ensure it points to a directory containing `.claude/skills/`:
|
||||
|
||||
<CodeGroup>
```python Python theme={null}
# Ensure your cwd points to the directory containing .claude/skills/
options = ClaudeAgentOptions(
    cwd="/path/to/project",  # Must contain .claude/skills/
    setting_sources=["user", "project"],  # Required to load Skills
    allowed_tools=["Skill"]
)
```

```typescript TypeScript theme={null}
// Ensure your cwd points to the directory containing .claude/skills/
const options = {
  cwd: "/path/to/project",  // Must contain .claude/skills/
  settingSources: ["user", "project"],  // Required to load Skills
  allowedTools: ["Skill"]
};
```
</CodeGroup>
|
||||
|
||||
See the "Using Skills with the SDK" section above for the complete pattern.
|
||||
|
||||
**Verify filesystem location**:
|
||||
|
||||
```bash theme={null}
|
||||
# Check project Skills
|
||||
ls .claude/skills/*/SKILL.md
|
||||
|
||||
# Check personal Skills
|
||||
ls ~/.claude/skills/*/SKILL.md
|
||||
```
|
||||
|
||||
### Skill Not Being Used
|
||||
|
||||
**Check the Skill tool is enabled**: Confirm `"Skill"` is in your `allowedTools` (TypeScript) or `allowed_tools` (Python).
|
||||
|
||||
**Check the description**: Ensure it's specific and includes relevant keywords. See [Agent Skills Best Practices](/en/docs/agents-and-tools/agent-skills/best-practices#writing-effective-descriptions) for guidance on writing effective descriptions.
|
||||
|
||||
### Additional Troubleshooting
|
||||
|
||||
For general Skills troubleshooting (YAML syntax, debugging, etc.), see the [Claude Code Skills troubleshooting section](https://code.claude.com/docs/skills#troubleshooting).
|
||||
|
||||
## Related Documentation
|
||||
|
||||
### Skills Guides
|
||||
|
||||
* [Agent Skills in Claude Code](https://code.claude.com/docs/skills): Complete Skills guide with creation, examples, and troubleshooting
|
||||
* [Agent Skills Overview](/en/docs/agents-and-tools/agent-skills/overview): Conceptual overview, benefits, and architecture
|
||||
* [Agent Skills Best Practices](/en/docs/agents-and-tools/agent-skills/best-practices): Authoring guidelines for effective Skills
|
||||
* [Agent Skills Cookbook](https://github.com/anthropics/claude-cookbooks/tree/main/skills): Example Skills and templates
|
||||
|
||||
### SDK Resources
|
||||
|
||||
* [Subagents in the SDK](/en/docs/agent-sdk/subagents): Similar filesystem-based agents with programmatic options
|
||||
* [Slash Commands in the SDK](/en/docs/agent-sdk/slash-commands): User-invoked commands
|
||||
* [SDK Overview](/en/docs/agent-sdk/overview): General SDK concepts
|
||||
* [TypeScript SDK Reference](/en/docs/agent-sdk/typescript): Complete API documentation
|
||||
* [Python SDK Reference](/en/docs/agent-sdk/python): Complete API documentation
|
||||
@@ -0,0 +1,607 @@
|
||||
# Agent Skills
|
||||
|
||||
> Create, manage, and share Skills to extend Claude's capabilities in Claude Code.
|
||||
|
||||
This guide shows you how to create, use, and manage Agent Skills in Claude Code. Skills are modular capabilities that extend Claude's functionality through organized folders containing instructions, scripts, and resources.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* Claude Code version 1.0 or later
|
||||
* Basic familiarity with [Claude Code](/en/quickstart)
|
||||
|
||||
## What are Agent Skills?
|
||||
|
||||
Agent Skills package expertise into discoverable capabilities. Each Skill consists of a `SKILL.md` file with instructions that Claude reads when relevant, plus optional supporting files like scripts and templates.
|
||||
|
||||
**How Skills are invoked**: Skills are **model-invoked**—Claude autonomously decides when to use them based on your request and the Skill's description. This is different from slash commands, which are **user-invoked** (you explicitly type `/command` to trigger them).
|
||||
|
||||
**Benefits**:
|
||||
|
||||
* Extend Claude's capabilities for your specific workflows
|
||||
* Share expertise across your team via git
|
||||
* Reduce repetitive prompting
|
||||
* Compose multiple Skills for complex tasks
|
||||
|
||||
Learn more in the [Agent Skills overview](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/overview).
|
||||
|
||||
<Note>
|
||||
For a deep dive into the architecture and real-world applications of Agent Skills, read our engineering blog: [Equipping agents for the real world with Agent Skills](https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills).
|
||||
</Note>
|
||||
|
||||
## Create a Skill
|
||||
|
||||
Skills are stored as directories containing a `SKILL.md` file.
|
||||
|
||||
### Personal Skills
|
||||
|
||||
Personal Skills are available across all your projects. Store them in `~/.claude/skills/`:
|
||||
|
||||
```bash theme={null}
|
||||
mkdir -p ~/.claude/skills/my-skill-name
|
||||
```
|
||||
|
||||
**Use personal Skills for**:
|
||||
|
||||
* Your individual workflows and preferences
|
||||
* Experimental Skills you're developing
|
||||
* Personal productivity tools
|
||||
|
||||
### Project Skills
|
||||
|
||||
Project Skills are shared with your team. Store them in `.claude/skills/` within your project:
|
||||
|
||||
```bash theme={null}
|
||||
mkdir -p .claude/skills/my-skill-name
|
||||
```
|
||||
|
||||
**Use project Skills for**:
|
||||
|
||||
* Team workflows and conventions
|
||||
* Project-specific expertise
|
||||
* Shared utilities and scripts
|
||||
|
||||
Project Skills are checked into git and automatically available to team members.
|
||||
|
||||
### Plugin Skills
|
||||
|
||||
Skills can also come from [Claude Code plugins](/en/plugins). Plugins may bundle Skills that are automatically available when the plugin is installed. These Skills work the same way as personal and project Skills.
|
||||
|
||||
## Write SKILL.md
|
||||
|
||||
Create a `SKILL.md` file with YAML frontmatter and Markdown content:
|
||||
|
||||
```yaml theme={null}
---
name: your-skill-name
description: Brief description of what this Skill does and when to use it
---

# Your Skill Name

## Instructions
Provide clear, step-by-step guidance for Claude.

## Examples
Show concrete examples of using this Skill.
```
|
||||
|
||||
**Field requirements**:
|
||||
|
||||
* `name`: Must use lowercase letters, numbers, and hyphens only (max 64 characters)
|
||||
* `description`: Brief description of what the Skill does and when to use it (max 1024 characters)
|
||||
|
||||
The `description` field is critical for Claude to discover when to use your Skill. It should include both what the Skill does and when Claude should use it.
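
The field requirements above are easy to check mechanically before committing a Skill. The following is a minimal sketch: the function name `validate_skill_metadata` is hypothetical, the limits are the ones stated in this section, and rejecting an empty description is an added assumption rather than a documented rule.

```python
import re

NAME_PATTERN = re.compile(r"[a-z0-9-]+")
MAX_NAME_LENGTH = 64
MAX_DESCRIPTION_LENGTH = 1024


def validate_skill_metadata(name, description):
    """Return a list of problems; an empty list means both fields pass."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must use only lowercase letters, numbers, and hyphens")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    if not description.strip():
        # Assumption: an empty description is treated as invalid.
        problems.append("description must not be empty")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append(f"description exceeds {MAX_DESCRIPTION_LENGTH} characters")
    return problems
```

This checks length and character set only. Whether a description is specific enough for Claude to discover the Skill still needs human review.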
|
||||
|
||||
See the [best practices guide](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/best-practices) for complete authoring guidance including validation rules.
|
||||
|
||||
## Add supporting files
|
||||
|
||||
Create additional files alongside SKILL.md:
|
||||
|
||||
```
my-skill/
├── SKILL.md (required)
├── reference.md (optional documentation)
├── examples.md (optional examples)
├── scripts/
│   └── helper.py (optional utility)
└── templates/
    └── template.txt (optional template)
```
|
||||
|
||||
Reference these files from SKILL.md:
|
||||
|
||||
````markdown theme={null}
|
||||
For advanced usage, see [reference.md](reference.md).
|
||||
|
||||
Run the helper script:
|
||||
```bash
|
||||
python scripts/helper.py input.txt
|
||||
```
|
||||
````
|
||||
|
||||
Claude reads these files only when needed, using progressive disclosure to manage context efficiently.
|
||||
|
||||
## Restrict tool access with allowed-tools
|
||||
|
||||
Use the `allowed-tools` frontmatter field to limit which tools Claude can use when a Skill is active:
|
||||
|
||||
```yaml theme={null}
|
||||
---
|
||||
name: safe-file-reader
|
||||
description: Read files without making changes. Use when you need read-only file access.
|
||||
allowed-tools: Read, Grep, Glob
|
||||
---
|
||||
|
||||
# Safe File Reader
|
||||
|
||||
This Skill provides read-only file access.
|
||||
|
||||
## Instructions
|
||||
1. Use Read to view file contents
|
||||
2. Use Grep to search within files
|
||||
3. Use Glob to find files by pattern
|
||||
```
|
||||
|
||||
When this Skill is active, Claude can only use the specified tools (Read, Grep, Glob) without needing to ask for permission. This is useful for:
|
||||
|
||||
* Read-only Skills that shouldn't modify files
|
||||
* Skills with limited scope (e.g., only data analysis, no file writing)
|
||||
* Security-sensitive workflows where you want to restrict capabilities
|
||||
|
||||
If `allowed-tools` is not specified, Claude will ask for permission to use tools as normal, following the standard permission model.
|
||||
|
||||
<Note>
|
||||
`allowed-tools` is only supported for Skills in Claude Code.
|
||||
</Note>
|
||||
|
||||
## View available Skills
|
||||
|
||||
Skills are automatically discovered by Claude from three sources:
|
||||
|
||||
* Personal Skills: `~/.claude/skills/`
|
||||
* Project Skills: `.claude/skills/`
|
||||
* Plugin Skills: bundled with installed plugins
|
||||
|
||||
**To view all available Skills**, ask Claude directly:
|
||||
|
||||
```
|
||||
What Skills are available?
|
||||
```
|
||||
|
||||
or
|
||||
|
||||
```
|
||||
List all available Skills
|
||||
```
|
||||
|
||||
This will show all Skills from all sources, including plugin Skills.
|
||||
|
||||
**To inspect a specific Skill**, you can also check the filesystem:
|
||||
|
||||
```bash theme={null}
|
||||
# List personal Skills
|
||||
ls ~/.claude/skills/
|
||||
|
||||
# List project Skills (if in a project directory)
|
||||
ls .claude/skills/
|
||||
|
||||
# View a specific Skill's content
|
||||
cat ~/.claude/skills/my-skill/SKILL.md
|
||||
```
|
||||
|
||||
## Test a Skill
|
||||
|
||||
After creating a Skill, test it by asking questions that match your description.
|
||||
|
||||
**Example**: If your description mentions "PDF files":
|
||||
|
||||
```
|
||||
Can you help me extract text from this PDF?
|
||||
```
|
||||
|
||||
Claude autonomously decides to use your Skill if it matches the request—you don't need to explicitly invoke it. The Skill activates automatically based on the context of your question.
|
||||
|
||||
## Debug a Skill
|
||||
|
||||
If Claude doesn't use your Skill, check these common issues:
|
||||
|
||||
### Make description specific
|
||||
|
||||
**Too vague**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: Helps with documents
|
||||
```
|
||||
|
||||
**Specific**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
|
||||
```
|
||||
|
||||
Include both what the Skill does and when to use it in the description.
|
||||
|
||||
### Verify file path
|
||||
|
||||
**Personal Skills**: `~/.claude/skills/skill-name/SKILL.md`
|
||||
**Project Skills**: `.claude/skills/skill-name/SKILL.md`
|
||||
|
||||
Check the file exists:
|
||||
|
||||
```bash theme={null}
|
||||
# Personal
|
||||
ls ~/.claude/skills/my-skill/SKILL.md
|
||||
|
||||
# Project
|
||||
ls .claude/skills/my-skill/SKILL.md
|
||||
```
|
||||
|
||||
### Check YAML syntax
|
||||
|
||||
Invalid YAML prevents the Skill from loading. Verify the frontmatter:
|
||||
|
||||
```bash theme={null}
head -n 10 SKILL.md
```
|
||||
|
||||
Ensure:
|
||||
|
||||
* Opening `---` on line 1
|
||||
* Closing `---` before Markdown content
|
||||
* Valid YAML syntax (no tabs, correct indentation)
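
The three checks in this list can be sketched as a small script. This is an illustration under stated assumptions: `check_frontmatter` is a hypothetical helper, it inspects only the delimiters and tab characters, and it does not parse the YAML itself, so other syntax errors would still need a real YAML parser or `claude --debug`.

```python
def check_frontmatter(text):
    """Return a list of structural problems in a SKILL.md string."""
    lines = text.splitlines()
    if not lines or lines[0].rstrip() != "---":
        return ["missing opening '---' on line 1"]

    # Find the closing delimiter somewhere after line 1.
    closing = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.rstrip() == "---"),
        None,
    )
    if closing is None:
        return ["missing closing '---' before the Markdown content"]

    problems = []
    # lines[1] is line 2 of the file, so numbering starts at 2.
    for number, line in enumerate(lines[1:closing], start=2):
        if "\t" in line:
            problems.append(f"line {number}: tab character in frontmatter")
    return problems
```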
|
||||
|
||||
### View errors
|
||||
|
||||
Run Claude Code with debug mode to see Skill loading errors:
|
||||
|
||||
```bash theme={null}
|
||||
claude --debug
|
||||
```
|
||||
|
||||
## Share Skills with your team
|
||||
|
||||
**Recommended approach**: Distribute Skills through [plugins](/en/plugins).
|
||||
|
||||
To share Skills via plugin:
|
||||
|
||||
1. Create a plugin with Skills in the `skills/` directory
|
||||
2. Add the plugin to a marketplace
|
||||
3. Team members install the plugin
|
||||
|
||||
For complete instructions, see [Add Skills to your plugin](/en/plugins#add-skills-to-your-plugin).
|
||||
|
||||
You can also share Skills directly through project repositories:
|
||||
|
||||
### Step 1: Add Skill to your project
|
||||
|
||||
Create a project Skill:
|
||||
|
||||
```bash theme={null}
|
||||
mkdir -p .claude/skills/team-skill
|
||||
# Create SKILL.md
|
||||
```
|
||||
|
||||
### Step 2: Commit to git
|
||||
|
||||
```bash theme={null}
|
||||
git add .claude/skills/
|
||||
git commit -m "Add team Skill for PDF processing"
|
||||
git push
|
||||
```
|
||||
|
||||
### Step 3: Team members get Skills automatically
|
||||
|
||||
When team members pull the latest changes, Skills are immediately available:
|
||||
|
||||
```bash theme={null}
|
||||
git pull
|
||||
claude # Skills are now available
|
||||
```
|
||||
|
||||
## Update a Skill
|
||||
|
||||
Edit SKILL.md directly:
|
||||
|
||||
```bash theme={null}
|
||||
# Personal Skill
|
||||
code ~/.claude/skills/my-skill/SKILL.md
|
||||
|
||||
# Project Skill
|
||||
code .claude/skills/my-skill/SKILL.md
|
||||
```
|
||||
|
||||
Changes take effect the next time you start Claude Code. If Claude Code is already running, restart it to load the updates.
|
||||
|
||||
## Remove a Skill
|
||||
|
||||
Delete the Skill directory:
|
||||
|
||||
```bash theme={null}
# Personal
rm -rf ~/.claude/skills/my-skill

# Project
rm -rf .claude/skills/my-skill
git add -A .claude/skills/my-skill  # Stage the deletion
git commit -m "Remove unused Skill"
```
|
||||
|
||||
## Best practices
|
||||
|
||||
### Keep Skills focused
|
||||
|
||||
One Skill should address one capability:
|
||||
|
||||
**Focused**:
|
||||
|
||||
* "PDF form filling"
|
||||
* "Excel data analysis"
|
||||
* "Git commit messages"
|
||||
|
||||
**Too broad**:
|
||||
|
||||
* "Document processing" (split into separate Skills)
|
||||
* "Data tools" (split by data type or operation)
|
||||
|
||||
### Write clear descriptions
|
||||
|
||||
Help Claude discover when to use Skills by including specific triggers in your description:
|
||||
|
||||
**Clear**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: Analyze Excel spreadsheets, create pivot tables, and generate charts. Use when working with Excel files, spreadsheets, or analyzing tabular data in .xlsx format.
|
||||
```
|
||||
|
||||
**Vague**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: For files
|
||||
```
|
||||
|
||||
### Test with your team
|
||||
|
||||
Have teammates use Skills and provide feedback:
|
||||
|
||||
* Does the Skill activate when expected?
|
||||
* Are the instructions clear?
|
||||
* Are there missing examples or edge cases?
|
||||
|
||||
### Document Skill versions
|
||||
|
||||
You can document Skill versions in your SKILL.md content to track changes over time. Add a version history section:
|
||||
|
||||
```markdown theme={null}
|
||||
# My Skill
|
||||
|
||||
## Version History
|
||||
- v2.0.0 (2025-10-01): Breaking changes to API
|
||||
- v1.1.0 (2025-09-15): Added new features
|
||||
- v1.0.0 (2025-09-01): Initial release
|
||||
```
|
||||
|
||||
This helps team members understand what changed between versions.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Claude doesn't use my Skill
|
||||
|
||||
**Symptom**: You ask a relevant question but Claude doesn't use your Skill.
|
||||
|
||||
**Check**: Is the description specific enough?
|
||||
|
||||
Vague descriptions make discovery difficult. Include both what the Skill does and when to use it, with key terms users would mention.
|
||||
|
||||
**Too generic**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: Helps with data
|
||||
```
|
||||
|
||||
**Specific**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: Analyze Excel spreadsheets, generate pivot tables, create charts. Use when working with Excel files, spreadsheets, or .xlsx files.
|
||||
```
|
||||
|
||||
**Check**: Is the YAML valid?
|
||||
|
||||
Run validation to check for syntax errors:
|
||||
|
||||
```bash theme={null}
|
||||
# View frontmatter
|
||||
cat .claude/skills/my-skill/SKILL.md | head -n 15
|
||||
|
||||
# Check for common issues
|
||||
# - Missing opening or closing ---
|
||||
# - Tabs instead of spaces
|
||||
# - Unquoted strings with special characters
|
||||
```
|
||||
|
||||
**Check**: Is the Skill in the correct location?
|
||||
|
||||
```bash theme={null}
|
||||
# Personal Skills
|
||||
ls ~/.claude/skills/*/SKILL.md
|
||||
|
||||
# Project Skills
|
||||
ls .claude/skills/*/SKILL.md
|
||||
```
|
||||
|
||||
### Skill has errors
|
||||
|
||||
**Symptom**: The Skill loads but doesn't work correctly.
|
||||
|
||||
**Check**: Are dependencies available?
|
||||
|
||||
Claude will automatically install required dependencies (or ask for permission to install them) when it needs them.
|
||||
|
||||
**Check**: Do scripts have execute permissions?
|
||||
|
||||
```bash theme={null}
|
||||
chmod +x .claude/skills/my-skill/scripts/*.py
|
||||
```
|
||||
|
||||
**Check**: Are file paths correct?
|
||||
|
||||
Use forward slashes (Unix style) in all paths:
|
||||
|
||||
**Correct**: `scripts/helper.py`
|
||||
**Wrong**: `scripts\helper.py` (Windows style)
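
If a Skill was authored on Windows, its paths can be rewritten with the standard library. This is a small sketch; `to_forward_slashes` is a hypothetical helper name, and it is meant for relative paths like the ones shown here.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path):
    """Rewrite a Windows-style relative path using forward slashes."""
    return PureWindowsPath(path).as_posix()
```

Paths that already use forward slashes pass through unchanged, so the helper is safe to apply to every path in a Skill.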
|
||||
|
||||
### Multiple Skills conflict
|
||||
|
||||
**Symptom**: Claude uses the wrong Skill or seems confused between similar Skills.
|
||||
|
||||
**Be specific in descriptions**: Help Claude choose the right Skill by using distinct trigger terms in your descriptions.
|
||||
|
||||
Instead of:
|
||||
|
||||
```yaml theme={null}
|
||||
# Skill 1
|
||||
description: For data analysis
|
||||
|
||||
# Skill 2
|
||||
description: For analyzing data
|
||||
```
|
||||
|
||||
Use:
|
||||
|
||||
```yaml theme={null}
|
||||
# Skill 1
|
||||
description: Analyze sales data in Excel files and CRM exports. Use for sales reports, pipeline analysis, and revenue tracking.
|
||||
|
||||
# Skill 2
|
||||
description: Analyze log files and system metrics data. Use for performance monitoring, debugging, and system diagnostics.
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Simple Skill (single file)
|
||||
|
||||
```
|
||||
commit-helper/
|
||||
└── SKILL.md
|
||||
```
|
||||
|
||||
```yaml theme={null}
---
name: generating-commit-messages
description: Generates clear commit messages from git diffs. Use when writing commit messages or reviewing staged changes.
---

# Generating Commit Messages

## Instructions

1. Run `git diff --staged` to see changes
2. I'll suggest a commit message with:
   - Summary under 50 characters
   - Detailed description
   - Affected components

## Best practices

- Use present tense
- Explain what and why, not how
```
|
||||
|
||||
### Skill with tool permissions
|
||||
|
||||
```
|
||||
code-reviewer/
|
||||
└── SKILL.md
|
||||
```
|
||||
|
||||
```yaml theme={null}
|
||||
---
|
||||
name: code-reviewer
|
||||
description: Review code for best practices and potential issues. Use when reviewing code, checking PRs, or analyzing code quality.
|
||||
allowed-tools: Read, Grep, Glob
|
||||
---
|
||||
|
||||
# Code Reviewer
|
||||
|
||||
## Review checklist
|
||||
|
||||
1. Code organization and structure
|
||||
2. Error handling
|
||||
3. Performance considerations
|
||||
4. Security concerns
|
||||
5. Test coverage
|
||||
|
||||
## Instructions
|
||||
|
||||
1. Read the target files using Read tool
|
||||
2. Search for patterns using Grep
|
||||
3. Find related files using Glob
|
||||
4. Provide detailed feedback on code quality
|
||||
```
|
||||
|
||||
### Multi-file Skill
|
||||
|
||||
```
pdf-processing/
├── SKILL.md
├── FORMS.md
├── REFERENCE.md
└── scripts/
    ├── fill_form.py
    └── validate.py
```
|
||||
|
||||
**SKILL.md**:
|
||||
|
||||
````yaml theme={null}
---
name: pdf-processing
description: Extract text, fill forms, merge PDFs. Use when working with PDF files, forms, or document extraction. Requires pypdf and pdfplumber packages.
---

# PDF Processing

## Quick start

Extract text:
```python
import pdfplumber
with pdfplumber.open("doc.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```

For form filling, see [FORMS.md](FORMS.md).
For detailed API reference, see [REFERENCE.md](REFERENCE.md).

## Requirements

Packages must be installed in your environment:
```bash
pip install pypdf pdfplumber
```
````
|
||||
|
||||
<Note>
|
||||
List required packages in the description. Packages must be installed in your environment before Claude can use them.
|
||||
</Note>
|
||||
|
||||
Claude loads additional files only when needed.
|
||||
|
||||
## Next steps
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="Authoring best practices" icon="lightbulb" href="https://docs.claude.com/en/docs/agents-and-tools/agent-skills/best-practices">
|
||||
Write Skills that Claude can use effectively
|
||||
</Card>
|
||||
|
||||
<Card title="Agent Skills overview" icon="book" href="https://docs.claude.com/en/docs/agents-and-tools/agent-skills/overview">
|
||||
Learn how Skills work across Claude products
|
||||
</Card>
|
||||
|
||||
<Card title="Use Skills in the Agent SDK" icon="cube" href="https://docs.claude.com/en/docs/agent-sdk/skills">
|
||||
Use Skills programmatically with TypeScript and Python
|
||||
</Card>
|
||||
|
||||
<Card title="Get started with Agent Skills" icon="rocket" href="https://docs.claude.com/en/docs/agents-and-tools/agent-skills/quickstart">
|
||||
Create your first Skill
|
||||
</Card>
|
||||
</CardGroup>
|
||||
@@ -0,0 +1,987 @@
|
||||
# Claude-Mem Architecture v3 to v4 Plan (✅ Completed)
|
||||
|
||||
This file exists as a reference to explain the path forward from v3 to v4.
|
||||
|
||||
## Core Purpose
|
||||
|
||||
Create a lightweight, hook-driven memory system that captures important context during Claude Code sessions and makes it available in future sessions.
|
||||
|
||||
**Principles:**
|
||||
- Hooks should be fast and non-blocking
|
||||
- The SDK agent synthesizes observations rather than just storing raw data
|
||||
- Storage should be simple and queryable
|
||||
- Users should never notice the memory system working
|
||||
|
||||
---
|
||||
|
||||
## Understanding the Foundation
|
||||
|
||||
### What Claude Code Hooks Actually Do
|
||||
|
||||
**SessionStart Hook:**
|
||||
- Runs when Claude Code starts or resumes
|
||||
- Can inject context via stdout (plain text) OR JSON `additionalContext`
|
||||
- This is how we show "What's new" to Claude
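
The JSON route for injecting context can be sketched as follows. Treat the envelope as an assumption: the `hookSpecificOutput` and `hookEventName` keys follow the Claude Code hooks reference as understood at the time of writing and should be verified against the version in use, and `build_session_start_output` is a hypothetical helper name.

```python
import json


def build_session_start_output(context_text):
    """Serialize context for a SessionStart hook as a JSON string."""
    return json.dumps(
        {
            "hookSpecificOutput": {
                "hookEventName": "SessionStart",  # Assumed envelope key
                "additionalContext": context_text,
            }
        }
    )


if __name__ == "__main__":
    # A hook script would print this to stdout and exit with status 0.
    print(build_session_start_output("Recent work: refactored the search skill."))
```

The plain-text route needs no envelope at all: the hook prints the context to stdout.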
|
||||
|
||||
**UserPromptSubmit Hook:**
|
||||
- Runs BEFORE Claude processes the user's message
|
||||
- Can inject context via stdout OR JSON `additionalContext`
|
||||
- This is where we initialize per-session tracking
|
||||
|
||||
**PostToolUse Hook:**
|
||||
- Runs AFTER each tool completes successfully
|
||||
- Gets both tool input and output
|
||||
- Runs in PARALLEL with other matching hooks
|
||||
- This is where we observe what Claude is doing
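
To keep the hook fast, the observation step can be limited to a single SQLite insert. The sketch below makes several assumptions: the payload keys (`session_id`, `tool_name`, `tool_input`, `tool_response`) are taken to match the JSON the hook receives on stdin, the table columns are hypothetical, and only the `observation_queue` name comes from this plan.

```python
import json
import sqlite3


def enqueue_observation(conn, hook_payload):
    """Store one PostToolUse payload in the observation queue."""
    # Hypothetical schema: only the table name appears in the plan.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS observation_queue (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL,
               tool_name TEXT NOT NULL,
               tool_input TEXT NOT NULL,
               tool_output TEXT NOT NULL
           )"""
    )
    conn.execute(
        "INSERT INTO observation_queue"
        " (session_id, tool_name, tool_input, tool_output)"
        " VALUES (?, ?, ?, ?)",
        (
            hook_payload["session_id"],
            hook_payload["tool_name"],
            json.dumps(hook_payload.get("tool_input", {})),
            json.dumps(hook_payload.get("tool_response", {})),
        ),
    )
    conn.commit()
```

A real hook script would obtain `hook_payload` with `json.load(sys.stdin)` and return immediately after the insert, leaving synthesis to the background worker.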
|
||||
|
||||
**Stop Hook:**
|
||||
- Runs when main agent finishes (NOT on user interrupt)
|
||||
- This is where we finalize the session
|
||||
- Summary should be a structured response that answers the following:
  - What did the user request?
  - What did you investigate?
  - What did you learn?
  - What did you do?
  - What's next?
  - Files read
  - Files edited
  - Notes
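
The summary fields listed above map naturally onto a small record type. This is a sketch only: the class name `SessionSummary` and all field names are hypothetical and are not taken from the claude-mem schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SessionSummary:
    """Hypothetical container for the Stop hook summary fields."""

    request: str        # What did the user request?
    investigated: str   # What did you investigate?
    learned: str        # What did you learn?
    completed: str      # What did you do?
    next_steps: str     # What's next?
    files_read: list = field(default_factory=list)
    files_edited: list = field(default_factory=list)
    notes: str = ""
```

Calling `asdict()` on an instance yields a plain dictionary that can be serialized or written to a `session_summaries` row.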
|
||||
|
||||
### How SDK Streaming Actually Works
|
||||
|
||||
**Streaming Input Mode (what we need):**
|
||||
- Persistent session with AsyncGenerator
|
||||
- Can queue multiple messages
|
||||
- Supports interruption via `interrupt()` method
|
||||
- Natural multi-turn conversations
|
||||
- The SDK maintains conversation state
|
||||
|
||||
**Critical insight:** We use "Streaming Input Mode" which creates ONE long-running SDK session per Claude Code session, not multiple short sessions.
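
The single long-running session can be pictured as one async generator that never ends until the session is finalized. The sketch below is written in Python for illustration and does not call the SDK: `message_stream` and the `poll_queue` callable are hypothetical, and plain strings stand in for the message objects a real SDK session would expect.

```python
import asyncio


async def message_stream(initial_prompt, poll_queue, poll_interval=0.01):
    """Yield the initial prompt, then one prompt per queued observation.

    `poll_queue` is a hypothetical callable that returns a list of pending
    observations, or None once the session has been finalized.
    """
    yield initial_prompt
    while True:
        pending = poll_queue()
        if pending is None:
            return  # Session finalized: end the stream.
        for observation in pending:
            yield f"Observation: {observation}"
        if not pending:
            # Nothing queued yet; wait before polling again.
            await asyncio.sleep(poll_interval)
```

Because the generator stays open between polls, every observation lands in the same conversation instead of starting a new short session.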
|
||||
|
||||
**Session ID Management:**
|
||||
- Session IDs change with each turn of the conversation
|
||||
- Must capture session ID from the initial system message
|
||||
- SDK worker needs to track session ID updates continuously, not just capture once
|
||||
- The first message in the response stream is a system init message with the session_id
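
Continuous tracking, as opposed to capturing the ID once, can be sketched like this. The message shape is an assumption: each message is modeled as a dictionary that may carry a `session_id` key, and `track_session_id` is a hypothetical helper.

```python
def track_session_id(messages):
    """Return the most recent session ID seen in a stream of messages.

    Each message is assumed to be a dict that may carry a "session_id" key.
    """
    latest = None
    for message in messages:
        session_id = message.get("session_id")
        if session_id and session_id != latest:
            latest = session_id  # Update on every change, not just the first.
    return latest
```

In a worker, the same update would run inside the response loop so the stored ID is always the one needed for a later `resume`.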
|
||||
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
### Visual Overview
|
||||
|
||||
```
┌─────────────────────────────────────────────────────────────────┐
│                       CLAUDE CODE SESSION                       │
│       (Main session - user interacting with Claude Code)        │
│                                                                 │
│  User → Claude → Tools (Read, Edit, Write, Bash, etc.)          │
│                         │                                       │
│                         │ PostToolUse Hook                      │
│                         ↓                                       │
│                  claude-mem save                                │
│                (queues observation)                             │
└─────────────────────────────────────────────────────────────────┘
                          │
                          │ SQLite observation_queue
                          ↓
┌─────────────────────────────────────────────────────────────────┐
│                       SDK WORKER PROCESS                        │
│        (Background process - detached from main session)        │
│                                                                 │
│  ┌─────────────────────────────────────────────┐                │
│  │  Message Generator (AsyncIterable)          │                │
│  │  - Yields initial prompt                    │                │
│  │  - Polls observation_queue                  │                │
│  │  - Yields observation prompts               │                │
│  └─────────────────────────────────────────────┘                │
│                         ↓                                       │
│  ┌─────────────────────────────────────────────┐                │
│  │  SDK query() → Claude API                   │                │
│  │  Model: claude-sonnet-4-5                   │                │
│  │  No tools needed (text-only synthesis)      │                │
│  └─────────────────────────────────────────────┘                │
│                         ↓                                       │
│  ┌─────────────────────────────────────────────┐                │
│  │  Response Handler                           │                │
│  │  - Parses XML <observation> blocks          │                │
│  │  - Parses XML <summary> blocks              │                │
│  │  - Writes to SQLite tables                  │                │
│  └─────────────────────────────────────────────┘                │
└─────────────────────────────────────────────────────────────────┘
                          │
                          │ SQLite: observations, session_summaries
                          ↓
┌─────────────────────────────────────────────────────────────────┐
│                    NEXT CLAUDE CODE SESSION                     │
│                                                                 │
│  SessionStart Hook → claude-mem context                         │
│  (Reads from SQLite and injects context)                        │
└─────────────────────────────────────────────────────────────────┘
```
|
||||
|
||||
### What is the SDK agent's job?
|
||||
|
||||
The SDK agent is a **synthesis engine**, not a data collector.
|
||||
|
||||
It should:
|
||||
- Receive tool observations as they happen
|
||||
- Extract meaningful patterns and insights
|
||||
- Store atomic, searchable observations in SQLite
|
||||
- Synthesize a human-readable summary at the end
|
||||
|
||||
It should NOT:
|
||||
- Store raw tool outputs
|
||||
- Try to capture everything
|
||||
- Make decisions about what Claude Code should do
|
||||
- Block or slow down the main session
|
||||
|
||||
### Session Management Strategy
|
||||
|
||||
**Built-in SDK Session Resumption:**
|
||||
|
||||
The Agent SDK provides native session resumption capabilities. Instead of manually tracking and rebuilding session state, we can leverage the SDK's built-in features:
|
||||
|
||||
```typescript
|
||||
// Resume a previous SDK session
|
||||
const resumedResponse = query({
|
||||
prompt: "Continue where we left off",
|
||||
options: {
|
||||
resume: sdkSessionId // Use the session ID captured from init message
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
**When to use session resumption:**
|
||||
- User interrupts Claude Code and resumes later
|
||||
- SDK worker crashes and needs to restart
|
||||
- Long-running observations that span multiple Claude Code sessions
|
||||
|
||||
**Session state tracking:**
|
||||
- Store SDK session ID in database when captured from init message
|
||||
- Mark sessions as 'active', 'completed', 'interrupted', or 'failed'
|
||||
- Use session status to determine whether to resume or start fresh
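
One way to turn the stored status into a resume-or-fresh decision is sketched below. The policy is an assumption, not something the plan specifies: sessions left `active` or `interrupted` are resumed, while `completed` and `failed` sessions start fresh. `StoredSession`, `StartupPlan`, and `planStartup` are hypothetical names.

```typescript
type SessionStatus = 'active' | 'completed' | 'interrupted' | 'failed';

// Hypothetical view of a row from sdk_sessions.
interface StoredSession {
  sdkSessionId: string | null; // null until the init message has been seen
  status: SessionStatus;
}

type StartupPlan = { mode: 'resume'; resume: string } | { mode: 'fresh' };

// Decide whether the worker should pass `resume` to query() or start over.
function planStartup(stored: StoredSession | null): StartupPlan {
  if (stored === null || stored.sdkSessionId === null) {
    return { mode: 'fresh' };
  }
  // Assumed policy: only unfinished sessions are worth resuming.
  if (stored.status === 'active' || stored.status === 'interrupted') {
    return { mode: 'resume', resume: stored.sdkSessionId };
  }
  return { mode: 'fresh' };
}
```

The `resume` value of the returned plan is what would be passed as `options.resume` in the call shown earlier.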
|
||||
|
||||
### How hooks run in parallel
|
||||
|
||||
PostToolUse hooks run in parallel. Handle this by:
- Making SDK agent calls async and fire-and-forget
- Using the observation_queue SQLite table to serialize observations
- Having the SDK worker poll this queue and process observations sequentially
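
The serialization idea can be illustrated without a database. The sketch below uses an in-memory stand-in for `observation_queue` (the class and field names are hypothetical): writers only append, and the single consumer drains pending rows in `id` order, which mirrors ordering by the AUTOINCREMENT key.

```typescript
interface QueuedObservation {
  id: number; // stand-in for the AUTOINCREMENT primary key
  toolName: string;
  processed: boolean;
}

class InMemoryObservationQueue {
  private rows: QueuedObservation[] = [];
  private nextId = 1;

  // Called by each hook invocation; only appends, never reads.
  enqueue(toolName: string): number {
    const id = this.nextId++;
    this.rows.push({ id, toolName, processed: false });
    return id;
  }

  // Called by the single worker: returns the pending rows in id order
  // and marks them processed.
  drain(): QueuedObservation[] {
    const pending = this.rows
      .filter(row => !row.processed)
      .sort((a, b) => a.id - b.id);
    for (const row of pending) {
      row.processed = true;
    }
    return pending;
  }
}
```

Because only one worker ever drains the queue, observations reach the SDK agent one at a time even when several hooks insert concurrently.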
|
||||
|
||||
### What if the user interrupts Claude Code?
|
||||
|
||||
Stop hook doesn't run on interrupts. So:
|
||||
- Observations stay in queue
|
||||
- Next session continues where the previous one left off
|
||||
- Mark session as 'interrupted' after 24h of inactivity
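
The 24-hour rule could be checked with a small predicate like the one below. It is a sketch under two assumptions the plan does not state: epochs are stored in milliseconds, and "inactivity" is measured from the most recent queue insert. `shouldMarkInterrupted` is a hypothetical helper name.

```typescript
const INTERRUPT_AFTER_MS = 24 * 60 * 60 * 1000; // 24 hours

// True when an 'active' session has had no activity for at least 24 hours.
function shouldMarkInterrupted(
  status: string,
  lastActivityEpochMs: number,
  nowEpochMs: number
): boolean {
  return status === 'active' && nowEpochMs - lastActivityEpochMs >= INTERRUPT_AFTER_MS;
}
```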
|
||||
|
||||
---
|
||||
|
||||
## Database Schema
|
||||
|
||||
```sql
|
||||
-- Tracks SDK streaming sessions
|
||||
CREATE TABLE sdk_sessions (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
claude_session_id TEXT UNIQUE NOT NULL,
|
||||
sdk_session_id TEXT UNIQUE, -- NULL until captured from the SDK init message
|
||||
project TEXT NOT NULL,
|
||||
user_prompt TEXT,
|
||||
started_at TEXT NOT NULL,
|
||||
started_at_epoch INTEGER NOT NULL,
|
||||
completed_at TEXT,
|
||||
completed_at_epoch INTEGER,
|
||||
status TEXT CHECK(status IN ('active', 'completed', 'interrupted', 'failed'))
|
||||
);
|
||||
|
||||
-- Tracks pending observations (message queue)
|
||||
CREATE TABLE observation_queue (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
sdk_session_id TEXT NOT NULL,
|
||||
tool_name TEXT NOT NULL,
|
||||
tool_input TEXT NOT NULL, -- JSON
|
||||
tool_output TEXT NOT NULL, -- JSON
|
||||
created_at_epoch INTEGER NOT NULL,
|
||||
processed_at_epoch INTEGER,
|
||||
FOREIGN KEY(sdk_session_id) REFERENCES sdk_sessions(sdk_session_id)
|
||||
);
|
||||
|
||||
-- Stores extracted observations (what SDK decides is important)
|
||||
CREATE TABLE observations (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
sdk_session_id TEXT NOT NULL,
|
||||
project TEXT NOT NULL,
|
||||
text TEXT NOT NULL,
|
||||
type TEXT NOT NULL, -- 'decision' | 'bugfix' | 'feature' | 'refactor' | 'discovery'
|
||||
created_at TEXT NOT NULL,
|
||||
created_at_epoch INTEGER NOT NULL,
|
||||
FOREIGN KEY(sdk_session_id) REFERENCES sdk_sessions(sdk_session_id)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_observations_project ON observations(project);
|
||||
CREATE INDEX idx_observations_created ON observations(created_at_epoch DESC);
|
||||
|
||||
-- Stores session summaries
|
||||
CREATE TABLE session_summaries (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
sdk_session_id TEXT UNIQUE NOT NULL,
|
||||
project TEXT NOT NULL,
|
||||
summary TEXT NOT NULL,
|
||||
created_at TEXT NOT NULL,
|
||||
created_at_epoch INTEGER NOT NULL,
|
||||
FOREIGN KEY(sdk_session_id) REFERENCES sdk_sessions(sdk_session_id)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_summaries_project ON session_summaries(project);
|
||||
CREATE INDEX idx_summaries_created ON session_summaries(created_at_epoch DESC);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Hook Implementation
|
||||
|
||||
**IMPORTANT DISTINCTION:**
|
||||
|
||||
There are TWO separate hook systems at play here:
|
||||
|
||||
1. **Claude Code Hooks** - External command hooks configured in `~/.claude/settings.json`
|
||||
- These hooks observe the MAIN Claude Code session
|
||||
- They run as external commands (like `claude-mem save`)
|
||||
- This is what we use to capture observations from the user's session
|
||||
|
||||
2. **SDK Hooks** - Programmatic hooks configured in TypeScript code via `HookMatcher`
|
||||
- These hooks would observe the MEMORY SDK agent's own tool usage
|
||||
- They run as TypeScript callbacks within the SDK worker process
|
||||
- We're NOT using these (yet) - they're a future enhancement
|
||||
|
||||
**Our architecture:** Use Claude Code hooks (external commands) to observe the main session, and run a separate SDK worker process that doesn't need its own hooks.
|
||||
|
||||
### 1. SessionStart Hook
|
||||
|
||||
**Purpose:** Show user what happened in recent sessions
|
||||
|
||||
**Claude Code Hook Config (in settings.json):**
|
||||
```json
|
||||
{
|
||||
"hooks": {
|
||||
"SessionStart": [{
|
||||
"matcher": "startup",
|
||||
"hooks": [{
|
||||
"type": "command",
|
||||
"command": "claude-mem context"
|
||||
}]
|
||||
}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Command: `claude-mem context`**
|
||||
|
||||
Flow:
|
||||
1. Read stdin JSON (session_id, cwd, source, etc.)
|
||||
2. If source !== "startup", exit immediately
|
||||
3. Extract project from cwd basename
|
||||
4. Query SQLite for recent summaries:
|
||||
```sql
|
||||
SELECT summary, created_at
|
||||
FROM session_summaries
|
||||
WHERE project = ?
|
||||
ORDER BY created_at_epoch DESC
|
||||
LIMIT 10
|
||||
```
|
||||
5. Format results as human-readable text
|
||||
6. Output to stdout (Claude Code automatically injects this)
|
||||
7. Exit with code 0
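
Steps 3 and 5 of the flow above are pure string work and can be sketched independently of SQLite. The output layout is an assumption (the plan only asks for human-readable text), and `projectFromCwd`, `formatContext`, and `SummaryRow` are hypothetical names.

```typescript
// Shape of one row returned by the summaries query above.
interface SummaryRow {
  summary: string;
  created_at: string;
}

// Step 3: use the last path segment of cwd as the project name.
function projectFromCwd(cwd: string): string {
  const parts = cwd.split(/[\\/]+/).filter(part => part.length > 0);
  return parts.length > 0 ? parts[parts.length - 1] : '';
}

// Step 5: render the rows as plain text for stdout (empty string if none).
function formatContext(project: string, rows: SummaryRow[]): string {
  if (rows.length === 0) {
    return '';
  }
  const lines = rows.map(row => `- [${row.created_at}] ${row.summary}`);
  return [`Recent sessions for ${project}:`].concat(lines).join('\n');
}
```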
|
||||
|
||||
### 2. UserPromptSubmit Hook
|
||||
|
||||
**Purpose:** Initialize SDK memory session in background
|
||||
|
||||
**Hook config:**
|
||||
```json
|
||||
{
|
||||
"hooks": {
|
||||
"UserPromptSubmit": [{
|
||||
"hooks": [{
|
||||
"type": "command",
|
||||
"command": "claude-mem new"
|
||||
}]
|
||||
}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Command: `claude-mem new`**
|
||||
|
||||
Flow:
|
||||
1. Read stdin JSON (session_id, prompt, cwd, etc.)
|
||||
2. Extract project from cwd
|
||||
3. Create SDK session record in database
|
||||
4. Start SDK session with initialization prompt in background process
|
||||
5. Save SDK session ID to database
|
||||
6. Output: `{"continue": true, "suppressOutput": true}`
|
||||
7. Exit immediately (SDK runs in background daemon/process)
|
||||
|
||||
**The Background SDK Process:**
|
||||
|
||||
The SDK session should run as a detached background process:
|
||||
```typescript
|
||||
// In claude-mem new
|
||||
const child = spawn('claude-mem', ['sdk-worker', session_id], {
|
||||
detached: true,
|
||||
stdio: 'ignore'
|
||||
});
|
||||
child.unref();
|
||||
```
|
||||
|
||||
The SDK worker:
|
||||
```typescript
|
||||
// claude-mem sdk-worker <session_id>
|
||||
import { query } from '@anthropic-ai/agent-sdk';
|
||||
import type { Query, UserMessage } from '@anthropic-ai/agent-sdk';
|
||||
|
||||
async function runSDKWorker(sessionId: string) {
|
||||
const session = await loadSessionFromDB(sessionId);
|
||||
|
||||
// Track the SDK session ID from the init message
|
||||
let sdkSessionId: string | undefined;
|
||||
const abortController = new AbortController();
|
||||
|
||||
// Message generator yields UserMessage objects (role + content)
|
||||
// This matches the SDK's expected format for streaming input mode
|
||||
async function* messageGenerator(): AsyncIterable<UserMessage> {
|
||||
// Initial prompt
|
||||
yield {
|
||||
role: "user",
|
||||
content: buildInitPrompt(session.project, sessionId, session.user_prompt)
|
||||
};
|
||||
|
||||
// Then listen for queued observations
|
||||
while (session.status === 'active' && !abortController.signal.aborted) {
|
||||
const observations = await pollObservationQueue(session.sdk_session_id);
|
||||
|
||||
for (const obs of observations) {
|
||||
yield {
|
||||
role: "user",
|
||||
content: buildObservationPrompt(obs)
|
||||
};
|
||||
markObservationProcessed(obs.id);
|
||||
}
|
||||
|
||||
await sleep(1000); // Poll every second
|
||||
}
|
||||
}
|
||||
|
||||
// Run SDK session with proper streaming interface
|
||||
// The query function signature: query({ prompt, options }): Query
|
||||
const response: Query = query({
|
||||
prompt: messageGenerator(), // AsyncIterable<UserMessage>
|
||||
options: {
|
||||
model: 'claude-sonnet-4-5', // Use documented model name
|
||||
disallowedTools: ['Glob', 'Grep', 'ListMcpResourcesTool', 'WebSearch'], // More efficient than filtering in hooks
|
||||
maxTurns: 1000,
|
||||
cwd: session.cwd,
|
||||
abortController: abortController
|
||||
}
|
||||
});
|
||||
|
||||
try {
|
||||
// Consume responses from the Query AsyncGenerator
|
||||
for await (const msg of response) {
|
||||
// Capture session ID from system init message
|
||||
if (msg.type === 'system' && msg.subtype === 'init') {
|
||||
sdkSessionId = msg.session_id;
|
||||
console.log(`SDK session started with ID: ${sdkSessionId}`);
|
||||
// Store SDK session ID in database for potential resumption
|
||||
await updateSDKSessionId(session.id, sdkSessionId);
|
||||
}
|
||||
|
||||
if (msg.type === 'text') {
|
||||
// Use an XML parser library (e.g., fast-xml-parser or similar) to parse observations and summaries
|
||||
// Parse <observation> blocks and call storeObservation(session_id, project, type, text)
|
||||
// Parse <summary> blocks, extract all 8 fields, format and call storeSummary(session_id, project, text)
|
||||
|
||||
parseAndStoreObservations(msg.content, session);
|
||||
parseAndStoreSummary(msg.content, session);
|
||||
}
|
||||
}
|
||||
} catch (error) {
|
||||
if (abortController.signal.aborted) {
|
||||
console.log('SDK worker interrupted gracefully');
|
||||
} else {
|
||||
console.error('SDK worker error:', error);
|
||||
await markSessionFailed(session.id, error);
|
||||
}
|
||||
} finally {
|
||||
await cleanupSession(session.id);
|
||||
}
|
||||
}
|
||||
|
||||
// Note: Query and UserMessage types are imported from '@anthropic-ai/agent-sdk'
|
||||
// Query extends AsyncGenerator and provides interrupt() and setPermissionMode() methods
|
||||
// UserMessage has the shape: { role: "user", content: string }
|
||||
```
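
The worker above delegates to `parseAndStoreObservations`. A minimal sketch of the parsing half follows. It uses regular expressions instead of the XML parser library suggested in the comments, which keeps it dependency-free but means XML entities are not decoded. `parseObservations` and `ParsedObservation` are hypothetical names.

```typescript
const VALID_TYPES = ['decision', 'bugfix', 'feature', 'refactor', 'discovery'];

interface ParsedObservation {
  type: string;
  text: string;
}

// Extracts every well-formed <observation> block from a response string.
// Blocks with a missing field, an unknown type, or empty text are skipped.
function parseObservations(response: string): ParsedObservation[] {
  const results: ParsedObservation[] = [];
  const blockRe = /<observation>([\s\S]*?)<\/observation>/g;
  let block: RegExpExecArray | null;
  while ((block = blockRe.exec(response)) !== null) {
    const typeMatch = /<type>([\s\S]*?)<\/type>/.exec(block[1]);
    const textMatch = /<text>([\s\S]*?)<\/text>/.exec(block[1]);
    if (!typeMatch || !textMatch) {
      continue;
    }
    const type = typeMatch[1].trim();
    const text = textMatch[1].trim();
    if (VALID_TYPES.indexOf(type) === -1 || text.length === 0) {
      continue;
    }
    results.push({ type, text });
  }
  return results;
}
```

Each returned entry maps directly onto the `type` and `text` columns of the `observations` table.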
|
||||
|
||||
### 3. PostToolUse Hook
|
||||
|
||||
**Purpose:** Queue tool observations for SDK processing
|
||||
|
||||
**Hook config:**
|
||||
```json
|
||||
{
|
||||
"hooks": {
|
||||
"PostToolUse": [{
|
||||
"matcher": "*",
|
||||
"hooks": [{
|
||||
"type": "command",
|
||||
"command": "claude-mem save"
|
||||
}]
|
||||
}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Command: `claude-mem save`**
|
||||
|
||||
Flow:
|
||||
1. Read stdin JSON (tool_name, tool_input, tool_output, etc.)
|
||||
2. Find active SDK session for this project
|
||||
3. Insert observation into queue:
|
||||
```sql
|
||||
INSERT INTO observation_queue
|
||||
(sdk_session_id, tool_name, tool_input, tool_output, created_at_epoch)
|
||||
VALUES (?, ?, ?, ?, ?)
|
||||
```
|
||||
4. Output: `{"continue": true, "suppressOutput": true}`
|
||||
5. Exit immediately
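
Steps 1 and 3 of the flow above amount to turning the hook's stdin payload into the five bind parameters of the INSERT. A minimal sketch, assuming the payload field names listed in step 1 and a hypothetical helper name `buildQueueParams`:

```typescript
// Bind parameters in the column order of the INSERT above.
type QueueParams = [string, string, string, string, number];

// Returns null when stdin is not usable, so the hook can exit quietly.
function buildQueueParams(
  rawStdin: string,
  sdkSessionId: string,
  nowEpoch: number
): QueueParams | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawStdin);
  } catch {
    return null; // malformed JSON: nothing to queue
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return null;
  }
  const payload = parsed as {
    tool_name?: unknown;
    tool_input?: unknown;
    tool_output?: unknown;
  };
  if (typeof payload.tool_name !== 'string') {
    return null;
  }
  return [
    sdkSessionId,
    payload.tool_name,
    JSON.stringify(payload.tool_input ?? {}), // stored as JSON text
    JSON.stringify(payload.tool_output ?? {}),
    nowEpoch
  ];
}
```

Returning `null` instead of throwing keeps the hook on its fast path: it can still print `{"continue": true, "suppressOutput": true}` and exit without disturbing the main session.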
|
||||
|
||||
**Observation Filtering:**
|
||||
|
||||
The `disallowedTools` option (see the SDK worker section) restricts which tools the SDK memory agent itself may call. It does not filter the main session's tool calls, so with `"matcher": "*"` every PostToolUse event still reaches this hook.

To keep very frequent or low-value tools out of the queue, skip them in the hook:
```typescript
// Optional: Skip very frequent or low-value tools
const SKIP_TOOLS = new Set(['TodoWrite', 'ListMcpResourcesTool']);

if (SKIP_TOOLS.has(tool_name)) {
  console.log('{"continue": true, "suppressOutput": true}');
  process.exit(0);
}
```

**Note:** The SDK worker sets `disallowedTools: ['Glob', 'Grep', 'ListMcpResourcesTool', 'WebSearch']`, which prevents the SDK agent from using these tools at all. This is independent of the hook-level skip list.
|
||||
|
||||
### 4. Stop Hook
|
||||
|
||||
**Purpose:** Signal SDK to finalize and generate summary
|
||||
|
||||
**Hook config:**
|
||||
```json
|
||||
{
|
||||
"hooks": {
|
||||
"Stop": [{
|
||||
"hooks": [{
|
||||
"type": "command",
|
||||
"command": "claude-mem summary"
|
||||
}]
|
||||
}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Command: `claude-mem summary`**
|
||||
|
||||
Flow:
|
||||
1. Read stdin JSON (session_id, cwd, etc.)
|
||||
2. Find active SDK session
|
||||
3. Insert special "FINALIZE" message into observation queue:
|
||||
```sql
|
||||
INSERT INTO observation_queue
|
||||
(sdk_session_id, tool_name, tool_input, tool_output, created_at_epoch)
|
||||
VALUES (?, 'FINALIZE', '{}', '{}', ?)
|
||||
```
|
||||
4. Output: `{"continue": true, "suppressOutput": true}`
|
||||
5. Exit immediately
|
||||
|
||||
**SDK Worker Handling:**
|
||||
|
||||
When SDK worker sees FINALIZE message:
|
||||
```typescript
|
||||
if (obs.tool_name === 'FINALIZE') {
|
||||
yield {
|
||||
type: "user",
|
||||
message: {
|
||||
role: "user",
|
||||
content: buildFinalizePrompt(session)
|
||||
}
|
||||
};
|
||||
|
||||
// Wait for SDK to finish processing
|
||||
await waitForSDKCompletion();
|
||||
|
||||
// Update session status
|
||||
await markSessionCompleted(session.id);
|
||||
|
||||
// Exit worker
|
||||
break;
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## SDK Agent Prompts
|
||||
|
||||
### Initialization Prompt
|
||||
|
||||
```typescript
|
||||
function buildInitPrompt(project: string, sessionId: string, userPrompt: string): string {
|
||||
return `You are a memory assistant for the "${project}" project.
|
||||
|
||||
SESSION CONTEXT
|
||||
---------------
|
||||
Session ID: ${sessionId}
|
||||
User's Goal: ${userPrompt}
|
||||
Date: ${new Date().toISOString().split('T')[0]}
|
||||
|
||||
YOUR ROLE
|
||||
---------
|
||||
You will observe tool executions during this Claude Code session. Your job is to:
|
||||
|
||||
1. Extract meaningful insights (not just raw data)
|
||||
2. Store atomic observations in SQLite
|
||||
3. Focus on: key decisions, patterns discovered, problems solved, technical insights
|
||||
|
||||
WHAT TO CAPTURE
|
||||
----------------
|
||||
✓ Architecture decisions (e.g., "chose PostgreSQL over MongoDB for ACID guarantees")
|
||||
✓ Bug fixes (e.g., "fixed race condition in auth middleware by adding mutex")
|
||||
✓ New features (e.g., "implemented JWT refresh token flow")
|
||||
✓ Refactorings (e.g., "extracted validation logic into separate service")
|
||||
✓ Discoveries (e.g., "found that API rate limit is 100 req/min")
|
||||
|
||||
✗ NOT routine operations (reading files, listing directories)
|
||||
✗ NOT work-in-progress (only completed work)
|
||||
✗ NOT obvious facts (e.g., "TypeScript file has types")
|
||||
|
||||
HOW TO STORE OBSERVATIONS
|
||||
--------------------------
|
||||
When you identify something worth remembering, output your observation in this EXACT XML format:
|
||||
|
||||
\`\`\`xml
|
||||
<observation>
|
||||
<type>feature</type>
|
||||
<text>Implemented JWT token refresh flow with 7-day expiry</text>
|
||||
</observation>
|
||||
\`\`\`
|
||||
|
||||
Valid types: decision, bugfix, feature, refactor, discovery
|
||||
|
||||
Structure requirements:
|
||||
- <observation> is the root element
|
||||
- <type> must be one of the 5 valid types (single word)
|
||||
- <text> contains your concise observation (one sentence preferred)
|
||||
- No additional fields or nesting
|
||||
|
||||
The SDK worker will parse all <observation> blocks from your response and store them in SQLite.
|
||||
|
||||
You can include your reasoning before or after the observation block, or just output the observation by itself.
|
||||
|
||||
EXAMPLE
|
||||
-------
|
||||
Bad: "Read src/auth.ts file"
|
||||
Good: "Implemented JWT token refresh flow with 7-day expiry"
|
||||
|
||||
Wait for tool observations. Acknowledge this message briefly.`;
|
||||
}
|
||||
```
|
||||
|
||||
### Observation Prompt
|
||||
|
||||
```typescript
|
||||
function buildObservationPrompt(obs: Observation): string {
|
||||
return `TOOL OBSERVATION
|
||||
================
|
||||
Tool: ${obs.tool_name}
|
||||
Time: ${new Date(obs.created_at_epoch).toISOString()}
|
||||
|
||||
Input:
|
||||
${JSON.stringify(JSON.parse(obs.tool_input), null, 2)}
|
||||
|
||||
Output:
|
||||
${JSON.stringify(JSON.parse(obs.tool_output), null, 2)}
|
||||
|
||||
ANALYSIS TASK
|
||||
-------------
|
||||
1. Does this observation contain something worth remembering?
|
||||
2. If YES: Output the observation in this EXACT XML format:
|
||||
|
||||
\`\`\`xml
|
||||
<observation>
|
||||
<type>feature</type>
|
||||
<text>Your concise observation here</text>
|
||||
</observation>
|
||||
\`\`\`
|
||||
|
||||
Requirements:
|
||||
- Use one of these types: decision, bugfix, feature, refactor, discovery
|
||||
- Keep text concise (one sentence preferred)
|
||||
- No markdown formatting inside <text>
|
||||
- No additional XML fields
|
||||
|
||||
3. If NO: Just acknowledge and wait for next observation
|
||||
|
||||
Remember: Quality over quantity. Only store meaningful insights.`;
|
||||
}
|
||||
```
|
||||
|
||||
### Finalization Prompt
|
||||
|
||||
```typescript
|
||||
function buildFinalizePrompt(session: SDKSession): string {
|
||||
return `SESSION ENDING
|
||||
==============
|
||||
The Claude Code session is finishing.
|
||||
|
||||
FINAL TASK
|
||||
----------
|
||||
1. Review the observations you've stored this session
|
||||
2. Generate a structured summary that answers these questions:
|
||||
- What did user request?
|
||||
- What did you investigate?
|
||||
- What did you learn?
|
||||
- What did you do?
|
||||
- What's next?
|
||||
- Files read
|
||||
- Files edited
|
||||
- Notes
|
||||
|
||||
3. Generate the structured summary and output it in this EXACT XML format:
|
||||
|
||||
\`\`\`xml
|
||||
<summary>
|
||||
<request>Implement JWT authentication system</request>
|
||||
<investigated>Existing auth middleware, session management, token storage patterns</investigated>
|
||||
<learned>Current system uses session cookies; no JWT support; race condition in middleware</learned>
|
||||
<completed>Implemented JWT token + refresh flow with 7-day expiry; fixed race condition with mutex; added token validation middleware</completed>
|
||||
<next_steps>Add token revocation API endpoint; write integration tests</next_steps>
|
||||
<files_read>
|
||||
<file>src/auth.ts</file>
|
||||
<file>src/middleware/session.ts</file>
|
||||
<file>src/types/user.ts</file>
|
||||
</files_read>
|
||||
<files_edited>
|
||||
<file>src/auth.ts</file>
|
||||
<file>src/middleware/auth.ts</file>
|
||||
<file>src/routes/auth.ts</file>
|
||||
</files_edited>
|
||||
<notes>Token secret stored in .env; refresh tokens use rotation strategy</notes>
|
||||
</summary>
|
||||
\`\`\`
|
||||
|
||||
Structure requirements:
|
||||
- <summary> is the root element
|
||||
- All 8 child elements are REQUIRED: request, investigated, learned, completed, next_steps, files_read, files_edited, notes
|
||||
- <files_read> and <files_edited> must contain <file> child elements (one per file)
|
||||
- If no files were read/edited, use empty tags: <files_read></files_read>
|
||||
- Text fields can be multiple sentences but avoid markdown formatting
|
||||
- Use underscores in element names: next_steps, files_read, files_edited
|
||||
|
||||
The SDK worker will parse the <summary> block and extract all fields to store in SQLite.
|
||||
|
||||
Generate the summary now in the required XML format.`;
|
||||
}
|
||||
```
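
On the worker side, the `<summary>` block requested by this prompt has to be split back into its eight fields. The sketch below does this with regular expressions as a dependency-free simplification (an XML parser library is the alternative; entities are not decoded here). All function and type names are hypothetical.

```typescript
interface ParsedSummary {
  request: string;
  investigated: string;
  learned: string;
  completed: string;
  next_steps: string;
  notes: string;
  files_read: string[];
  files_edited: string[];
}

// Returns the trimmed inner text of the first <tag>...</tag>, or null if absent.
function tagText(xml: string, tag: string): string | null {
  const match = new RegExp('<' + tag + '>([\\s\\S]*?)</' + tag + '>').exec(xml);
  return match ? match[1].trim() : null;
}

// Returns the <file> entries inside <tag>, or null if <tag> itself is absent.
function fileList(xml: string, tag: string): string[] | null {
  const inner = tagText(xml, tag);
  if (inner === null) {
    return null;
  }
  const files: string[] = [];
  const fileRe = /<file>([\s\S]*?)<\/file>/g;
  let match: RegExpExecArray | null;
  while ((match = fileRe.exec(inner)) !== null) {
    files.push(match[1].trim());
  }
  return files;
}

// Returns null unless all 8 required child elements are present.
function parseSummary(response: string): ParsedSummary | null {
  const body = tagText(response, 'summary');
  if (body === null) {
    return null;
  }
  const request = tagText(body, 'request');
  const investigated = tagText(body, 'investigated');
  const learned = tagText(body, 'learned');
  const completed = tagText(body, 'completed');
  const next_steps = tagText(body, 'next_steps');
  const notes = tagText(body, 'notes');
  const files_read = fileList(body, 'files_read');
  const files_edited = fileList(body, 'files_edited');
  if (request === null || investigated === null || learned === null ||
      completed === null || next_steps === null || notes === null ||
      files_read === null || files_edited === null) {
    return null;
  }
  return { request, investigated, learned, completed, next_steps, notes, files_read, files_edited };
}
```

Rejecting summaries with missing fields lets the worker re-prompt or mark the session failed rather than store a partial record.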
|
||||
|
||||
---
|
||||
|
||||
## Hook Commands Architecture
|
||||
|
||||
All four hook commands (`claude-mem context`, `claude-mem new`, `claude-mem save`, `claude-mem summary`) are implemented as standalone TypeScript functions that:
|
||||
|
||||
1. **Use bun:sqlite directly** - No spawning child processes or CLI subcommands
|
||||
2. **Are self-contained** - Each hook has all the logic it needs
|
||||
3. **Share a common database layer** - Import from shared `db.ts` module
|
||||
4. **Never call other claude-mem commands** - All functionality via direct library calls
|
||||
|
||||
```typescript
// Example structure
import { Database } from 'bun:sqlite';
import { homedir } from 'node:os';
import { join } from 'node:path';

// "~" is expanded by the shell, not by SQLite, so resolve the path explicitly
const DB_PATH = join(homedir(), '.claude-mem', 'db.sqlite');

export function contextHook(stdin: HookInput) {
  const db = new Database(DB_PATH);
  // Query and return context directly
  const summaries = db.query('SELECT ...').all();
  console.log(formatContext(summaries));
  db.close();
}

export function saveHook(stdin: HookInput) {
  const db = new Database(DB_PATH);
  // Insert observation directly
  db.run('INSERT INTO observation_queue ...', params);
  db.close();
  console.log('{"continue": true, "suppressOutput": true}');
}
```
|
||||
|
||||
**Key principle:** Hooks are fast, synchronous database operations. The SDK worker process is where async/complex logic happens.
|
||||
|
||||
---
|
||||
|
||||
## Background Process Management
|
||||
|
||||
The `claude-mem save` hook just queues observations - processing happens in the background SDK worker process that polls the queue continuously.
|
||||
|
||||
The SDK worker is spawned by `claude-mem new` as a detached process and runs for the duration of the Claude Code session.
|
||||
|
||||
Benefits:
|
||||
- Works on all platforms (no systemd/launchd needed)
|
||||
- Self-contained (spawned and managed by claude-mem itself)
|
||||
- Simple state management (all state in SQLite)
|
||||
|
||||
---
|
||||
|
||||
## Advanced SDK Features
|
||||
|
||||
### Permission Integration (Future Enhancement)
|
||||
|
||||
The SDK provides a permission system that could be integrated with memory for context-aware decisions:
|
||||
|
||||
```typescript
|
||||
canUseTool: async (toolName, input) => {
|
||||
// Check memory for previous decisions about this tool/context
|
||||
const previousDecisions = await queryMemoryForTool(toolName, input);
|
||||
|
||||
if (previousDecisions.shouldAllow) {
|
||||
return {
|
||||
behavior: "allow",
|
||||
updatedInput: input
|
||||
};
|
||||
}
|
||||
|
||||
return {
|
||||
behavior: "ask_user",
|
||||
message: `This tool was previously flagged. Allow anyway?`
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
This could enable:
|
||||
- Learning from previous tool use patterns
|
||||
- Automatically allowing/denying based on historical context
|
||||
- Providing smart defaults based on project-specific patterns
|
||||
|
||||
**Implementation priority:** Low (add after core functionality is stable)
|
||||
|
||||
### SDK Hook Configuration (Alternative to Claude Code Hooks)
|
||||
|
||||
Instead of using external command hooks via Claude Code settings.json, the SDK supports native hook configuration:
|
||||
|
||||
```typescript
import { query } from '@anthropic-ai/agent-sdk';

const response = query({
  prompt: messageGenerator(),
  options: {
    hooks: {
      // In TypeScript each matcher entry is a plain object:
      // an optional tool-name `matcher` plus a list of callbacks.
      // Verify the exact type name against the SDK documentation.
      'PreToolUse': [
        { matcher: 'Bash', hooks: [validateBashCommand] },
        { hooks: [logToolUse] } // Applies to all tools
      ],
      'PostToolUse': [
        { hooks: [captureObservation] }
      ]
    }
  }
});

type HookCallback = (
  input: HookInput,
  toolUseID: string | undefined,
  options: { signal: AbortSignal }
) => Promise<HookJSONOutput>;
```
|
||||
|
||||
**When to use SDK hooks vs Claude Code hooks:**
|
||||
- **Claude Code hooks**: For integrating with the main Claude Code session (our current approach)
|
||||
- **SDK hooks**: For controlling the memory agent's own tool usage (future enhancement)
|
||||
|
||||
**Implementation priority:** Medium (could simplify architecture, but adds complexity to migration)
|
||||
|
||||
---
|
||||
|
||||
## Error Handling
|
||||
|
||||
**SDK worker failures:**
|
||||
- Each observation processing is atomic
|
||||
- Failed observations stay in queue
|
||||
- Next worker run retries
|
||||
- After 3 failures, mark observation as skipped
|
||||
- Use AbortController for graceful cancellation
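
The retry rule above ("after 3 failures, mark observation as skipped") needs a per-observation failure counter. The schema shown earlier has no such column, so the `attempts` field below is an assumption, as are the `QueueItem` and `recordResult` names.

```typescript
const MAX_ATTEMPTS = 3;

type QueueState = 'pending' | 'processed' | 'skipped';

// Hypothetical queue row with an added failure counter.
interface QueueItem {
  id: number;
  attempts: number;
  state: QueueState;
}

// Returns the updated row after one processing attempt.
function recordResult(item: QueueItem, succeeded: boolean): QueueItem {
  if (succeeded) {
    return { ...item, state: 'processed' };
  }
  const attempts = item.attempts + 1;
  // Failed rows stay pending for the next run until the limit is reached.
  return { ...item, attempts, state: attempts >= MAX_ATTEMPTS ? 'skipped' : 'pending' };
}
```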
|
||||
|
||||
**Abort signal handling:**
|
||||
```typescript
|
||||
try {
|
||||
for await (const msg of response) {
|
||||
if (abortController.signal.aborted) {
|
||||
throw new Error('Aborted');
|
||||
}
|
||||
// Process message
|
||||
}
|
||||
} catch (error) {
|
||||
if (abortController.signal.aborted) {
|
||||
// Clean shutdown
|
||||
await response.interrupt();
|
||||
} else {
|
||||
// Actual error
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Database corruption:**
|
||||
- SQLite with WAL mode (write-ahead logging)
|
||||
- Regular backups to ~/.claude-mem/backups/
|
||||
- Automatic recovery from backups
|
||||
|
||||
**SDK API failures:**
|
||||
- Retry with exponential backoff
|
||||
- Don't block main Claude Code session
|
||||
- Log errors for debugging
|
||||
- Mark session as 'failed' after max retries
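
For the exponential backoff mentioned above, the delay schedule can be computed by a small pure function. The base delay and cap are illustrative defaults, not values from the plan, and `backoffDelayMs` is a hypothetical name.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  const exponential = baseMs * Math.pow(2, Math.max(0, attempt));
  return Math.min(capMs, exponential);
}
```

With these defaults the first retries wait 500 ms, 1 s, 2 s, and 4 s, and the delay never exceeds 30 s. Adding random jitter is a common refinement when many workers may retry at once.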
|
||||
|
||||
---
|
||||
|
||||
## Implementation Order
|
||||
|
||||
1. **Database setup** - Create tables and migration scripts
|
||||
2. **Hook commands** - Implement the 4 hook commands (context, new, save, summary)
|
||||
3. **SDK worker** - Implement the background worker process with response parsing
|
||||
4. **SDK prompts** - Wire up the prompts and message generator
|
||||
5. **Test end-to-end** - Run a real Claude Code session and verify it works
|
||||
|
||||
Start simple. Get one hook working before moving to the next. Don't try to build everything at once.
|
||||
|
||||
**Note:** MCP is only used for retrieval (when Claude Code needs to access stored memories), not for storage. The SDK agent stores data by outputting specially formatted text that the SDK worker parses and writes to SQLite.
|
||||
|
||||
### SDK Import Verification
|
||||
|
||||
Before implementing, verify the SDK exports match your usage:
|
||||
|
||||
```typescript
|
||||
// Required imports from @anthropic-ai/agent-sdk
|
||||
import { query } from '@anthropic-ai/agent-sdk';
|
||||
import type { Query, UserMessage, Options } from '@anthropic-ai/agent-sdk';
|
||||
|
||||
// Verify the query function signature:
|
||||
// function query(options: { prompt: string | AsyncIterable<UserMessage>; options?: Options }): Query
|
||||
|
||||
// Verify Query type:
|
||||
// interface Query extends AsyncGenerator<SDKMessage, void> {
|
||||
// interrupt(): Promise<void>;
|
||||
// setPermissionMode(mode: PermissionMode): Promise<void>;
|
||||
// }
|
||||
|
||||
// Verify UserMessage type:
|
||||
// type UserMessage = { role: "user"; content: string }
|
||||
```
|
||||
|
||||
If the SDK exports differ from this structure, adjust the implementation accordingly. The SDK documentation should be the source of truth.
|
||||
|
||||
---
|
||||
|
||||
## Key Corrections from Agent SDK Documentation
|
||||
|
||||
This refactor plan has been updated to align with the official Agent SDK documentation. Key corrections include:
|
||||
|
||||
### 1. Session ID Management
|
||||
- **Before:** Captured session ID once in UserPromptSubmit hook
|
||||
- **After:** Capture from system init message and track updates continuously
|
||||
- **Why:** Session IDs change with each conversation turn
|
||||
|
||||
### 2. Hook Configuration
|
||||
- **Before:** Mixed up SDK hook format with Claude Code hook format
|
||||
- **After:** Clarified that Claude Code uses settings.json format (external commands); SDK uses TypeScript HookMatcher (programmatic callbacks)
|
||||
- **Why:** Two separate hook systems with different purposes and configuration methods
|
||||
- **Our approach:** Use Claude Code hooks to observe the main session; SDK hooks are future enhancement
|
||||
|
||||
### 3. Message Generator and Query Interface
|
||||
- **Before:** Custom SDKMessage type with nested message structure
|
||||
- **After:** Simple UserMessage type `{ role: "user", content: string }` yielded from AsyncIterable
|
||||
- **Why:** SDK expects AsyncIterable<UserMessage>, not a custom wrapper format
|
||||
- **Query type:** Properly typed as `Query` which extends AsyncGenerator with interrupt() and setPermissionMode()
|
||||
|
||||
### 4. Tool Filtering
|
||||
- **Before:** Filter "boring tools" in PostToolUse hook
|
||||
- **After:** Use SDK's `disallowedTools` option in query configuration
|
||||
- **Why:** Prevents the SDK agent from using those tools at all; hook-level skipping stays optional for reducing queue volume
|
||||
|
||||
### 5. Model Identifier
|
||||
- **Before:** Used `claude-haiku-4-5-20251001` (undocumented)
|
||||
- **After:** Use `claude-sonnet-4-5` (documented model name)
|
||||
- **Why:** Stick to documented model identifiers for stability
|
||||
|
||||
### 6. Error Handling
|
||||
- **Before:** Custom error handling without SDK features
|
||||
- **After:** Use AbortController and response.interrupt() for graceful cancellation
|
||||
- **Why:** SDK provides built-in cancellation mechanisms
|
||||
|
||||
### 7. Session Resumption
|
||||
- **Before:** Manual session state reconstruction
|
||||
- **After:** Leverage SDK's built-in `resume: sessionId` option
|
||||
- **Why:** SDK already handles session resumption
|
||||
|
||||
### Future Enhancements to Consider
|
||||
|
||||
1. **Permission integration** - Use canUseTool callback to make memory-aware decisions
|
||||
2. **SDK native hooks** - Replace external command hooks with SDK HookMatcher
|
||||
3. **Better session recovery** - Use SDK resumption for interrupted sessions
|
||||
|
||||
These corrections ensure our implementation follows Agent SDK best practices and avoids reinventing functionality the SDK already provides.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Validation Summary
|
||||
|
||||
This plan has been validated against the official Agent SDK documentation and confirmed to be architecturally sound.
|
||||
|
||||
### ✅ Validated Design Decisions
|
||||
|
||||
1. **Hook System Usage** - Correctly uses Claude Code external command hooks for observation; SDK programmatic hooks reserved for future enhancement
|
||||
2. **Query Function Interface** - Properly implements AsyncIterable<UserMessage> for streaming input mode
|
||||
3. **Session Management** - Leverages SDK's built-in session resumption instead of manual state reconstruction
|
||||
4. **Tool Filtering** - Uses SDK's `disallowedTools` option for efficiency
|
||||
5. **Error Handling** - Implements AbortController and interrupt() for graceful cancellation
|
||||
6. **Separation of Concerns** - Clean isolation between main Claude Code session and background SDK worker
|
||||
|
||||
### 🎯 Architecture Strengths
|
||||
|
||||
- **Non-blocking** - Hooks are fast database operations; complex logic happens in background
|
||||
- **Queue-based** - Handles parallel hook execution correctly via observation_queue table
|
||||
- **Fault-tolerant** - Failed observations stay in queue for retry; graceful degradation
|
||||
- **Platform-agnostic** - No dependency on systemd/launchd; works everywhere
|
||||
- **Type-safe** - Uses official SDK TypeScript types throughout
|
||||
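The queue-based and fault-tolerant points can be illustrated with a small in-memory sketch. The plan stores rows in the `observation_queue` table; the `QueuedObservation` shape, the `attempts` counter, and `processQueue` below are hypothetical names chosen for illustration only.

```typescript
// In-memory stand-in for rows of the observation_queue table (hypothetical shape).
interface QueuedObservation {
  id: number;
  payload: string;
  attempts: number;
}

// Try each queued observation once; failures stay queued for a later retry.
function processQueue(
  queue: QueuedObservation[],
  handler: (obs: QueuedObservation) => void,
): QueuedObservation[] {
  const remaining: QueuedObservation[] = [];
  for (const obs of queue) {
    try {
      handler(obs);
    } catch {
      remaining.push({ ...obs, attempts: obs.attempts + 1 });
    }
  }
  return remaining;
}

const pendingQueue: QueuedObservation[] = [
  { id: 1, payload: "ok", attempts: 0 },
  { id: 2, payload: "fail", attempts: 0 },
];
const leftover = processQueue(pendingQueue, (obs) => {
  if (obs.payload === "fail") throw new Error("simulated worker failure");
});
```

A failed item is never dropped: it is returned with an incremented attempt count, so a later pass can retry it or give up after a chosen limit.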
|
||||
### 📋 Pre-Implementation Checklist
|
||||
|
||||
Before starting implementation, verify:
|
||||
|
||||
1. [ ] Agent SDK installed and accessible: `@anthropic-ai/claude-agent-sdk`
|
||||
2. [ ] SDK exports match the expected structure (`query`, `Query`, `UserMessage` types)
|
||||
3. [ ] SQLite database location decided: `~/.claude-mem/db.sqlite`
|
||||
4. [ ] Claude Code settings.json hook configuration tested
|
||||
5. [ ] Background process spawning works on target platform (test detached process)
|
||||
|
||||
### 🚀 Ready for Implementation
|
||||
|
||||
The architecture is validated and ready for implementation. Follow the phased approach:
|
||||
|
||||
1. Database setup first (get schema working with bun:sqlite)
|
||||
2. Implement hooks one at a time (start with `context`, then `save`)
|
||||
3. Build SDK worker with simple message generator
|
||||
4. Test end-to-end with a real Claude Code session
|
||||
5. Iterate and refine based on real-world usage
|
||||
|
||||
**Remember:** Start simple, get one piece working, then build on it. Don't try to implement everything at once.
|
||||
@@ -0,0 +1,61 @@
|
||||
For tracking costs and tokens in your Agent SDK plugin, you have built-in programmatic access to usage data through the SDK itself[(1)](https://docs.claude.com/en/api/agent-sdk/cost-tracking).
|
||||
|
||||
## Agent SDK Cost Tracking
|
||||
|
||||
The Claude Agent SDK provides detailed token usage information for each interaction[(1)](https://docs.claude.com/en/api/agent-sdk/cost-tracking). Here's how to track it:
|
||||
|
||||
**TypeScript:**
|
||||
```typescript
|
||||
import { query } from "@anthropic-ai/claude-agent-sdk";
|
||||
|
||||
const result = await query({
|
||||
prompt: "Your task here",
|
||||
options: {
|
||||
onMessage: (message) => {
|
||||
if (message.type === 'assistant' && message.usage) {
|
||||
console.log(`Message ID: ${message.id}`);
|
||||
console.log(`Usage:`, message.usage);
|
||||
}
|
||||
}
|
||||
}
|
||||
});
|
||||
```
|
||||
[(1)](https://docs.claude.com/en/api/agent-sdk/cost-tracking)
|
||||
|
||||
The final `result` message contains the total cumulative usage from all steps in the conversation[(1)](https://docs.claude.com/en/api/agent-sdk/cost-tracking):
|
||||
|
||||
```typescript
|
||||
console.log("Total usage:", result.usage);
|
||||
console.log("Total cost:", result.usage.total_cost_usd);
|
||||
```
|
||||
[(1)](https://docs.claude.com/en/api/agent-sdk/cost-tracking)
|
||||
|
||||
## Important: Avoid Double-Counting
|
||||
|
||||
When Claude executes tools in parallel, multiple assistant messages may share the same ID and usage data[(1)](https://docs.claude.com/en/api/agent-sdk/cost-tracking). You should only charge once per unique message ID[(1)](https://docs.claude.com/en/api/agent-sdk/cost-tracking):
|
||||
|
||||
```typescript
|
||||
const processedMessageIds = new Set<string>();
|
||||
|
||||
onMessage: (message) => {
|
||||
if (message.type === 'assistant' && message.usage) {
|
||||
// Skip if already processed
|
||||
if (processedMessageIds.has(message.id)) {
|
||||
return;
|
||||
}
|
||||
|
||||
processedMessageIds.add(message.id);
|
||||
// Record usage here
|
||||
}
|
||||
}
|
||||
```
|
||||
[(1)](https://docs.claude.com/en/api/agent-sdk/cost-tracking)
|
||||
|
||||
## Usage Fields
|
||||
|
||||
Each usage object contains[(1)](https://docs.claude.com/en/api/agent-sdk/cost-tracking):
|
||||
- `input_tokens`: Base input tokens processed
|
||||
- `output_tokens`: Tokens generated in the response
|
||||
- `cache_creation_input_tokens`: Tokens used to create cache entries
|
||||
- `cache_read_input_tokens`: Tokens read from cache
|
||||
- `total_cost_usd`: Total cost in USD (only in result message)
|
||||
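The deduplication rule and the per-message fields can be combined into one accumulator. This is a sketch under assumptions: `AssistantLikeMessage` is a simplified message shape, and treating the cache fields as optional is a guess rather than something the SDK documentation states.

```typescript
// Field names follow the usage fields listed above; the shapes are simplified assumptions.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

interface AssistantLikeMessage {
  id: string;
  usage: Usage;
}

// Sum usage across messages, counting each message ID only once.
function totalUsage(messages: AssistantLikeMessage[]): Required<Usage> {
  const seenIds = new Set<string>();
  const total: Required<Usage> = {
    input_tokens: 0,
    output_tokens: 0,
    cache_creation_input_tokens: 0,
    cache_read_input_tokens: 0,
  };
  for (const msg of messages) {
    if (seenIds.has(msg.id)) continue; // same ID means the same usage, already counted
    seenIds.add(msg.id);
    total.input_tokens += msg.usage.input_tokens;
    total.output_tokens += msg.usage.output_tokens;
    total.cache_creation_input_tokens += msg.usage.cache_creation_input_tokens ?? 0;
    total.cache_read_input_tokens += msg.usage.cache_read_input_tokens ?? 0;
  }
  return total;
}

const sampleTotals = totalUsage([
  { id: "m1", usage: { input_tokens: 100, output_tokens: 20, cache_read_input_tokens: 50 } },
  { id: "m1", usage: { input_tokens: 100, output_tokens: 20, cache_read_input_tokens: 50 } },
  { id: "m2", usage: { input_tokens: 10, output_tokens: 5 } },
]);
```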
@@ -0,0 +1,607 @@
|
||||
# Agent Skills
|
||||
|
||||
> Create, manage, and share Skills to extend Claude's capabilities in Claude Code.
|
||||
|
||||
This guide shows you how to create, use, and manage Agent Skills in Claude Code. Skills are modular capabilities that extend Claude's functionality through organized folders containing instructions, scripts, and resources.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* Claude Code version 1.0 or later
|
||||
* Basic familiarity with [Claude Code](/en/docs/claude-code/quickstart)
|
||||
|
||||
## What are Agent Skills?
|
||||
|
||||
Agent Skills package expertise into discoverable capabilities. Each Skill consists of a `SKILL.md` file with instructions that Claude reads when relevant, plus optional supporting files like scripts and templates.
|
||||
|
||||
**How Skills are invoked**: Skills are **model-invoked**—Claude autonomously decides when to use them based on your request and the Skill's description. This is different from slash commands, which are **user-invoked** (you explicitly type `/command` to trigger them).
|
||||
|
||||
**Benefits**:
|
||||
|
||||
* Extend Claude's capabilities for your specific workflows
|
||||
* Share expertise across your team via git
|
||||
* Reduce repetitive prompting
|
||||
* Compose multiple Skills for complex tasks
|
||||
|
||||
Learn more in the [Agent Skills overview](/en/docs/agents-and-tools/agent-skills/overview).
|
||||
|
||||
<Note>
|
||||
For a deep dive into the architecture and real-world applications of Agent Skills, read our engineering blog: [Equipping agents for the real world with Agent Skills](https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills).
|
||||
</Note>
|
||||
|
||||
## Create a Skill
|
||||
|
||||
Skills are stored as directories containing a `SKILL.md` file.
|
||||
|
||||
### Personal Skills
|
||||
|
||||
Personal Skills are available across all your projects. Store them in `~/.claude/skills/`:
|
||||
|
||||
```bash theme={null}
|
||||
mkdir -p ~/.claude/skills/my-skill-name
|
||||
```
|
||||
|
||||
**Use personal Skills for**:
|
||||
|
||||
* Your individual workflows and preferences
|
||||
* Experimental Skills you're developing
|
||||
* Personal productivity tools
|
||||
|
||||
### Project Skills
|
||||
|
||||
Project Skills are shared with your team. Store them in `.claude/skills/` within your project:
|
||||
|
||||
```bash theme={null}
|
||||
mkdir -p .claude/skills/my-skill-name
|
||||
```
|
||||
|
||||
**Use project Skills for**:
|
||||
|
||||
* Team workflows and conventions
|
||||
* Project-specific expertise
|
||||
* Shared utilities and scripts
|
||||
|
||||
Project Skills are checked into git and automatically available to team members.
|
||||
|
||||
### Plugin Skills
|
||||
|
||||
Skills can also come from [Claude Code plugins](/en/docs/claude-code/plugins). Plugins may bundle Skills that are automatically available when the plugin is installed. These Skills work the same way as personal and project Skills.
|
||||
|
||||
## Write SKILL.md
|
||||
|
||||
Create a `SKILL.md` file with YAML frontmatter and Markdown content:
|
||||
|
||||
```yaml theme={null}
|
||||
---
|
||||
name: your-skill-name
|
||||
description: Brief description of what this Skill does and when to use it
|
||||
---
|
||||
|
||||
# Your Skill Name
|
||||
|
||||
## Instructions
|
||||
Provide clear, step-by-step guidance for Claude.
|
||||
|
||||
## Examples
|
||||
Show concrete examples of using this Skill.
|
||||
```
|
||||
|
||||
**Field requirements**:
|
||||
|
||||
* `name`: Must use lowercase letters, numbers, and hyphens only (max 64 characters)
|
||||
* `description`: Brief description of what the Skill does and when to use it (max 1024 characters)
|
||||
|
||||
The `description` field is critical for Claude to discover when to use your Skill. It should include both what the Skill does and when Claude should use it.
|
||||
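The field requirements above are easy to check before committing a Skill. The sketch below is illustrative only: `validateSkillFields` is a hypothetical helper, not part of Claude Code, and rejecting an empty description is an added assumption.

```typescript
// Limits taken from the field requirements above; the helper itself is hypothetical.
const SKILL_NAME_PATTERN = /^[a-z0-9-]+$/;

function validateSkillFields(skillName: string, description: string): string[] {
  const problems: string[] = [];
  if (!SKILL_NAME_PATTERN.test(skillName)) {
    problems.push("name must use lowercase letters, numbers, and hyphens only");
  }
  if (skillName.length > 64) problems.push("name exceeds 64 characters");
  // Assumption: an empty description is treated as an error.
  if (description.trim().length === 0) problems.push("description is empty");
  if (description.length > 1024) problems.push("description exceeds 1024 characters");
  return problems;
}

const validSkillProblems = validateSkillFields(
  "pdf-processing",
  "Extract text from PDF files. Use when working with PDFs.",
);
const invalidSkillProblems = validateSkillFields("PDF Tools", "Helps with documents");
```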
|
||||
See the [best practices guide](/en/docs/agents-and-tools/agent-skills/best-practices) for complete authoring guidance including validation rules.
|
||||
|
||||
## Add supporting files
|
||||
|
||||
Create additional files alongside SKILL.md:
|
||||
|
||||
```
|
||||
my-skill/
|
||||
├── SKILL.md (required)
|
||||
├── reference.md (optional documentation)
|
||||
├── examples.md (optional examples)
|
||||
├── scripts/
|
||||
│ └── helper.py (optional utility)
|
||||
└── templates/
|
||||
└── template.txt (optional template)
|
||||
```
|
||||
|
||||
Reference these files from SKILL.md:
|
||||
|
||||
````markdown theme={null}
|
||||
For advanced usage, see [reference.md](reference.md).
|
||||
|
||||
Run the helper script:
|
||||
```bash
|
||||
python scripts/helper.py input.txt
|
||||
```
|
||||
````
|
||||
|
||||
Claude reads these files only when needed, using progressive disclosure to manage context efficiently.
|
||||
|
||||
## Restrict tool access with allowed-tools
|
||||
|
||||
Use the `allowed-tools` frontmatter field to limit which tools Claude can use when a Skill is active:
|
||||
|
||||
```yaml theme={null}
|
||||
---
|
||||
name: safe-file-reader
|
||||
description: Read files without making changes. Use when you need read-only file access.
|
||||
allowed-tools: Read, Grep, Glob
|
||||
---
|
||||
|
||||
# Safe File Reader
|
||||
|
||||
This Skill provides read-only file access.
|
||||
|
||||
## Instructions
|
||||
1. Use Read to view file contents
|
||||
2. Use Grep to search within files
|
||||
3. Use Glob to find files by pattern
|
||||
```
|
||||
|
||||
When this Skill is active, Claude can only use the specified tools (Read, Grep, Glob) without needing to ask for permission. This is useful for:
|
||||
|
||||
* Read-only Skills that shouldn't modify files
|
||||
* Skills with limited scope (e.g., only data analysis, no file writing)
|
||||
* Security-sensitive workflows where you want to restrict capabilities
|
||||
|
||||
If `allowed-tools` is not specified, Claude will ask for permission to use tools as normal, following the standard permission model.
|
||||
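One way to picture the rules above is as a three-way decision per tool. This sketch reflects one reading of the text, not Claude Code's implementation: `parseAllowedTools`, `toolAccess`, and the `ToolAccess` labels are hypothetical.

```typescript
type ToolAccess = "pre-approved" | "restricted" | "standard-permissions";

// Split a comma-separated allowed-tools value such as "Read, Grep, Glob".
function parseAllowedTools(value: string): string[] {
  return value
    .split(",")
    .map((tool) => tool.trim())
    .filter((tool) => tool.length > 0);
}

// Assumed reading: listed tools need no prompt, unlisted tools are unavailable,
// and a Skill without allowed-tools falls back to the standard permission model.
function toolAccess(allowedTools: string[] | undefined, toolName: string): ToolAccess {
  if (allowedTools === undefined) return "standard-permissions";
  return allowedTools.indexOf(toolName) !== -1 ? "pre-approved" : "restricted";
}

const readerTools = parseAllowedTools("Read, Grep, Glob");
```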
|
||||
<Note>
|
||||
`allowed-tools` is only supported for Skills in Claude Code.
|
||||
</Note>
|
||||
|
||||
## View available Skills
|
||||
|
||||
Skills are automatically discovered by Claude from three sources:
|
||||
|
||||
* Personal Skills: `~/.claude/skills/`
|
||||
* Project Skills: `.claude/skills/`
|
||||
* Plugin Skills: bundled with installed plugins
|
||||
|
||||
**To view all available Skills**, ask Claude directly:
|
||||
|
||||
```
|
||||
What Skills are available?
|
||||
```
|
||||
|
||||
or
|
||||
|
||||
```
|
||||
List all available Skills
|
||||
```
|
||||
|
||||
This will show all Skills from all sources, including plugin Skills.
|
||||
|
||||
**To inspect a specific Skill**, you can also check the filesystem:
|
||||
|
||||
```bash theme={null}
|
||||
# List personal Skills
|
||||
ls ~/.claude/skills/
|
||||
|
||||
# List project Skills (if in a project directory)
|
||||
ls .claude/skills/
|
||||
|
||||
# View a specific Skill's content
|
||||
cat ~/.claude/skills/my-skill/SKILL.md
|
||||
```
|
||||
|
||||
## Test a Skill
|
||||
|
||||
After creating a Skill, test it by asking questions that match your description.
|
||||
|
||||
**Example**: If your description mentions "PDF files":
|
||||
|
||||
```
|
||||
Can you help me extract text from this PDF?
|
||||
```
|
||||
|
||||
Claude autonomously decides to use your Skill if it matches the request—you don't need to explicitly invoke it. The Skill activates automatically based on the context of your question.
|
||||
|
||||
## Debug a Skill
|
||||
|
||||
If Claude doesn't use your Skill, check these common issues:
|
||||
|
||||
### Make description specific
|
||||
|
||||
**Too vague**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: Helps with documents
|
||||
```
|
||||
|
||||
**Specific**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
|
||||
```
|
||||
|
||||
Include both what the Skill does and when to use it in the description.
|
||||
|
||||
### Verify file path
|
||||
|
||||
**Personal Skills**: `~/.claude/skills/skill-name/SKILL.md`
|
||||
**Project Skills**: `.claude/skills/skill-name/SKILL.md`
|
||||
|
||||
Check the file exists:
|
||||
|
||||
```bash theme={null}
|
||||
# Personal
|
||||
ls ~/.claude/skills/my-skill/SKILL.md
|
||||
|
||||
# Project
|
||||
ls .claude/skills/my-skill/SKILL.md
|
||||
```
|
||||
|
||||
### Check YAML syntax
|
||||
|
||||
Invalid YAML prevents the Skill from loading. Verify the frontmatter:
|
||||
|
||||
```bash theme={null}
|
||||
head -n 10 SKILL.md
|
||||
```
|
||||
|
||||
Ensure:
|
||||
|
||||
* Opening `---` on line 1
|
||||
* Closing `---` before Markdown content
|
||||
* Valid YAML syntax (no tabs, correct indentation)
|
||||
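The three checks above can be automated. The sketch below only looks at the delimiters and tab characters; it does not parse YAML, and `checkFrontmatter` is a hypothetical helper rather than a Claude Code feature.

```typescript
// Checks the delimiter and tab rules listed above; it does not validate YAML itself.
function checkFrontmatter(skillMd: string): string[] {
  const issues: string[] = [];
  const lines = skillMd.split(/\r?\n/);
  if (lines[0] !== "---") {
    issues.push("missing opening --- on line 1");
    return issues;
  }
  const closingIndex = lines.indexOf("---", 1);
  if (closingIndex === -1) {
    issues.push("missing closing ---");
    return issues;
  }
  const frontmatterLines = lines.slice(1, closingIndex);
  if (frontmatterLines.some((line) => line.indexOf("\t") !== -1)) {
    issues.push("tab character in frontmatter");
  }
  return issues;
}

const wellFormedIssues = checkFrontmatter("---\nname: my-skill\ndescription: Does X\n---\n# My Skill\n");
const tabbedIssues = checkFrontmatter("---\nname:\tmy-skill\n---\n");
```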
|
||||
### View errors
|
||||
|
||||
Run Claude Code with debug mode to see Skill loading errors:
|
||||
|
||||
```bash theme={null}
|
||||
claude --debug
|
||||
```
|
||||
|
||||
## Share Skills with your team
|
||||
|
||||
**Recommended approach**: Distribute Skills through [plugins](/en/docs/claude-code/plugins).
|
||||
|
||||
To share Skills via plugin:
|
||||
|
||||
1. Create a plugin with Skills in the `skills/` directory
|
||||
2. Add the plugin to a marketplace
|
||||
3. Team members install the plugin
|
||||
|
||||
For complete instructions, see [Add Skills to your plugin](/en/docs/claude-code/plugins#add-skills-to-your-plugin).
|
||||
|
||||
You can also share Skills directly through project repositories:
|
||||
|
||||
### Step 1: Add Skill to your project
|
||||
|
||||
Create a project Skill:
|
||||
|
||||
```bash theme={null}
|
||||
mkdir -p .claude/skills/team-skill
|
||||
# Create SKILL.md
|
||||
```
|
||||
|
||||
### Step 2: Commit to git
|
||||
|
||||
```bash theme={null}
|
||||
git add .claude/skills/
|
||||
git commit -m "Add team Skill for PDF processing"
|
||||
git push
|
||||
```
|
||||
|
||||
### Step 3: Team members get Skills automatically
|
||||
|
||||
When team members pull the latest changes, Skills are immediately available:
|
||||
|
||||
```bash theme={null}
|
||||
git pull
|
||||
claude # Skills are now available
|
||||
```
|
||||
|
||||
## Update a Skill
|
||||
|
||||
Edit SKILL.md directly:
|
||||
|
||||
```bash theme={null}
|
||||
# Personal Skill
|
||||
code ~/.claude/skills/my-skill/SKILL.md
|
||||
|
||||
# Project Skill
|
||||
code .claude/skills/my-skill/SKILL.md
|
||||
```
|
||||
|
||||
Changes take effect the next time you start Claude Code. If Claude Code is already running, restart it to load the updates.
|
||||
|
||||
## Remove a Skill
|
||||
|
||||
Delete the Skill directory:
|
||||
|
||||
```bash theme={null}
|
||||
# Personal
|
||||
rm -rf ~/.claude/skills/my-skill
|
||||
|
||||
# Project
|
||||
rm -rf .claude/skills/my-skill
|
||||
git add -A .claude/skills/my-skill && git commit -m "Remove unused Skill"
|
||||
```
|
||||
|
||||
## Best practices
|
||||
|
||||
### Keep Skills focused
|
||||
|
||||
One Skill should address one capability:
|
||||
|
||||
**Focused**:
|
||||
|
||||
* "PDF form filling"
|
||||
* "Excel data analysis"
|
||||
* "Git commit messages"
|
||||
|
||||
**Too broad**:
|
||||
|
||||
* "Document processing" (split into separate Skills)
|
||||
* "Data tools" (split by data type or operation)
|
||||
|
||||
### Write clear descriptions
|
||||
|
||||
Help Claude discover when to use Skills by including specific triggers in your description:
|
||||
|
||||
**Clear**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: Analyze Excel spreadsheets, create pivot tables, and generate charts. Use when working with Excel files, spreadsheets, or analyzing tabular data in .xlsx format.
|
||||
```
|
||||
|
||||
**Vague**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: For files
|
||||
```
|
||||
|
||||
### Test with your team
|
||||
|
||||
Have teammates use Skills and provide feedback:
|
||||
|
||||
* Does the Skill activate when expected?
|
||||
* Are the instructions clear?
|
||||
* Are there missing examples or edge cases?
|
||||
|
||||
### Document Skill versions
|
||||
|
||||
You can document Skill versions in your SKILL.md content to track changes over time. Add a version history section:
|
||||
|
||||
```markdown theme={null}
|
||||
# My Skill
|
||||
|
||||
## Version History
|
||||
- v2.0.0 (2025-10-01): Breaking changes to API
|
||||
- v1.1.0 (2025-09-15): Added new features
|
||||
- v1.0.0 (2025-09-01): Initial release
|
||||
```
|
||||
|
||||
This helps team members understand what changed between versions.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Claude doesn't use my Skill
|
||||
|
||||
**Symptom**: You ask a relevant question but Claude doesn't use your Skill.
|
||||
|
||||
**Check**: Is the description specific enough?
|
||||
|
||||
Vague descriptions make discovery difficult. Include both what the Skill does and when to use it, with key terms users would mention.
|
||||
|
||||
**Too generic**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: Helps with data
|
||||
```
|
||||
|
||||
**Specific**:
|
||||
|
||||
```yaml theme={null}
|
||||
description: Analyze Excel spreadsheets, generate pivot tables, create charts. Use when working with Excel files, spreadsheets, or .xlsx files.
|
||||
```
|
||||
|
||||
**Check**: Is the YAML valid?
|
||||
|
||||
Run validation to check for syntax errors:
|
||||
|
||||
```bash theme={null}
|
||||
# View frontmatter
|
||||
head -n 15 .claude/skills/my-skill/SKILL.md
|
||||
|
||||
# Check for common issues
|
||||
# - Missing opening or closing ---
|
||||
# - Tabs instead of spaces
|
||||
# - Unquoted strings with special characters
|
||||
```
|
||||
|
||||
**Check**: Is the Skill in the correct location?
|
||||
|
||||
```bash theme={null}
|
||||
# Personal Skills
|
||||
ls ~/.claude/skills/*/SKILL.md
|
||||
|
||||
# Project Skills
|
||||
ls .claude/skills/*/SKILL.md
|
||||
```
|
||||
|
||||
### Skill has errors
|
||||
|
||||
**Symptom**: The Skill loads but doesn't work correctly.
|
||||
|
||||
**Check**: Are dependencies available?
|
||||
|
||||
Claude will automatically install required dependencies (or ask for permission to install them) when it needs them.
|
||||
|
||||
**Check**: Do scripts have execute permissions?
|
||||
|
||||
```bash theme={null}
|
||||
chmod +x .claude/skills/my-skill/scripts/*.py
|
||||
```
|
||||
|
||||
**Check**: Are file paths correct?
|
||||
|
||||
Use forward slashes (Unix style) in all paths:
|
||||
|
||||
**Correct**: `scripts/helper.py`
|
||||
**Wrong**: `scripts\helper.py` (Windows style)
|
||||
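If paths are generated on Windows, a small normalization step avoids the problem. `toForwardSlashes` is a hypothetical helper shown only to illustrate the rule.

```typescript
// Convert Windows-style separators to the forward slashes expected in Skill files.
function toForwardSlashes(relativePath: string): string {
  return relativePath.replace(/\\/g, "/");
}

const normalizedHelperPath = toForwardSlashes("scripts\\helper.py");
```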
|
||||
### Multiple Skills conflict
|
||||
|
||||
**Symptom**: Claude uses the wrong Skill or seems confused between similar Skills.
|
||||
|
||||
**Be specific in descriptions**: Help Claude choose the right Skill by using distinct trigger terms in your descriptions.
|
||||
|
||||
Instead of:
|
||||
|
||||
```yaml theme={null}
|
||||
# Skill 1
|
||||
description: For data analysis
|
||||
|
||||
# Skill 2
|
||||
description: For analyzing data
|
||||
```
|
||||
|
||||
Use:
|
||||
|
||||
```yaml theme={null}
|
||||
# Skill 1
|
||||
description: Analyze sales data in Excel files and CRM exports. Use for sales reports, pipeline analysis, and revenue tracking.
|
||||
|
||||
# Skill 2
|
||||
description: Analyze log files and system metrics data. Use for performance monitoring, debugging, and system diagnostics.
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Simple Skill (single file)
|
||||
|
||||
```
|
||||
commit-helper/
|
||||
└── SKILL.md
|
||||
```
|
||||
|
||||
```yaml theme={null}
|
||||
---
|
||||
name: generating-commit-messages
|
||||
description: Generates clear commit messages from git diffs. Use when writing commit messages or reviewing staged changes.
|
||||
---
|
||||
|
||||
# Generating Commit Messages
|
||||
|
||||
## Instructions
|
||||
|
||||
1. Run `git diff --staged` to see changes
|
||||
2. I'll suggest a commit message with:
|
||||
- Summary under 50 characters
|
||||
- Detailed description
|
||||
- Affected components
|
||||
|
||||
## Best practices
|
||||
|
||||
- Use present tense
|
||||
- Explain what and why, not how
|
||||
```
|
||||
|
||||
### Skill with tool permissions
|
||||
|
||||
```
|
||||
code-reviewer/
|
||||
└── SKILL.md
|
||||
```
|
||||
|
||||
```yaml theme={null}
|
||||
---
|
||||
name: code-reviewer
|
||||
description: Review code for best practices and potential issues. Use when reviewing code, checking PRs, or analyzing code quality.
|
||||
allowed-tools: Read, Grep, Glob
|
||||
---
|
||||
|
||||
# Code Reviewer
|
||||
|
||||
## Review checklist
|
||||
|
||||
1. Code organization and structure
|
||||
2. Error handling
|
||||
3. Performance considerations
|
||||
4. Security concerns
|
||||
5. Test coverage
|
||||
|
||||
## Instructions
|
||||
|
||||
1. Read the target files using Read tool
|
||||
2. Search for patterns using Grep
|
||||
3. Find related files using Glob
|
||||
4. Provide detailed feedback on code quality
|
||||
```
|
||||
|
||||
### Multi-file Skill
|
||||
|
||||
```
|
||||
pdf-processing/
|
||||
├── SKILL.md
|
||||
├── FORMS.md
|
||||
├── REFERENCE.md
|
||||
└── scripts/
|
||||
├── fill_form.py
|
||||
└── validate.py
|
||||
```
|
||||
|
||||
**SKILL.md**:
|
||||
|
||||
````yaml theme={null}
|
||||
---
|
||||
name: pdf-processing
|
||||
description: Extract text, fill forms, merge PDFs. Use when working with PDF files, forms, or document extraction. Requires pypdf and pdfplumber packages.
|
||||
---
|
||||
|
||||
# PDF Processing
|
||||
|
||||
## Quick start
|
||||
|
||||
Extract text:
|
||||
```python
|
||||
import pdfplumber
|
||||
with pdfplumber.open("doc.pdf") as pdf:
|
||||
text = pdf.pages[0].extract_text()
|
||||
```
|
||||
|
||||
For form filling, see [FORMS.md](FORMS.md).
|
||||
For detailed API reference, see [REFERENCE.md](REFERENCE.md).
|
||||
|
||||
## Requirements
|
||||
|
||||
Packages must be installed in your environment:
|
||||
```bash
|
||||
pip install pypdf pdfplumber
|
||||
```
|
||||
````
|
||||
|
||||
<Note>
|
||||
List required packages in the description. Packages must be installed in your environment before Claude can use them.
|
||||
</Note>
|
||||
|
||||
Claude loads additional files only when needed.
|
||||
|
||||
## Next steps
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="Authoring best practices" icon="lightbulb" href="/en/docs/agents-and-tools/agent-skills/best-practices">
|
||||
Write Skills that Claude can use effectively
|
||||
</Card>
|
||||
|
||||
<Card title="Agent Skills overview" icon="book" href="/en/docs/agents-and-tools/agent-skills/overview">
|
||||
Learn how Skills work across Claude products
|
||||
</Card>
|
||||
|
||||
<Card title="Use Skills in the Agent SDK" icon="cube" href="/en/api/agent-sdk/skills">
|
||||
Use Skills programmatically with TypeScript and Python
|
||||
</Card>
|
||||
|
||||
<Card title="Get started with Agent Skills" icon="rocket" href="/en/docs/agents-and-tools/agent-skills/quickstart">
|
||||
Create your first Skill
|
||||
</Card>
|
||||
</CardGroup>
|
||||
@@ -0,0 +1,31 @@
|
||||
|
||||
|
||||
# Claude Code Hooks Exit Code Cheat Sheet
|
||||
|
||||
## Exit Code Behavior [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output)
|
||||
|
||||
- **Exit code 0**: Success. `stdout` is shown to the user in transcript mode, except for the `UserPromptSubmit` hook, where `stdout` is injected as context [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output)
|
||||
- **Exit code 2**: Blocking error. `stderr` is fed back to Claude to process automatically [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output)
|
||||
- **Other exit codes**: Non-blocking error. `stderr` is shown to the user and execution continues [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output)
|
||||
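The three cases above reduce to a small classification. The sketch below is illustrative: `HookOutcome` and `classifyExitCode` are hypothetical names, not part of Claude Code.

```typescript
type HookOutcome = "success" | "blocking-error" | "non-blocking-error";

// Map a hook's exit code to the three behaviors listed above.
function classifyExitCode(code: number): HookOutcome {
  if (code === 0) return "success";
  if (code === 2) return "blocking-error"; // stderr is fed back to Claude
  return "non-blocking-error"; // stderr is shown to the user, execution continues
}

const sampleOutcomes = [0, 1, 2, 127].map(classifyExitCode);
```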
|
||||
## Per-Hook Event Behavior [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output)
|
||||
|
||||
| Hook Event | Exit Code 2 Behavior |
|
||||
|------------|---------------------|
|
||||
| `PreToolUse` | Blocks the tool call, shows stderr to Claude [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output) |
|
||||
| `PostToolUse` | Shows stderr to Claude (tool already ran) [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output) |
|
||||
| `Notification` | N/A, shows stderr to user only [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output) |
|
||||
| `UserPromptSubmit` | Blocks prompt processing, erases prompt, shows stderr to user only [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output) |
|
||||
| `Stop` | Blocks stoppage, shows stderr to Claude [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output) |
|
||||
| `SubagentStop` | Blocks stoppage, shows stderr to Claude subagent [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output) |
|
||||
| `PreCompact` | N/A, shows stderr to user only [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output) |
|
||||
| `SessionStart` | N/A, shows stderr to user only [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output) |
|
||||
| `SessionEnd` | N/A, shows stderr to user only [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output) |
|
||||
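The table can also be kept as a lookup in code, for example to decide whether exiting with code 2 is meaningful for a given event. The data below is transcribed from the table; the type and constant names are hypothetical.

```typescript
type HookEvent =
  | "PreToolUse" | "PostToolUse" | "Notification" | "UserPromptSubmit"
  | "Stop" | "SubagentStop" | "PreCompact" | "SessionStart" | "SessionEnd";

interface Exit2Behavior {
  blocks: boolean; // whether exit code 2 blocks the operation
  stderrShownTo: "claude" | "subagent" | "user";
}

// Transcribed from the per-event table above.
const EXIT_2_BEHAVIOR: Record<HookEvent, Exit2Behavior> = {
  PreToolUse: { blocks: true, stderrShownTo: "claude" },
  PostToolUse: { blocks: false, stderrShownTo: "claude" }, // the tool already ran
  Notification: { blocks: false, stderrShownTo: "user" },
  UserPromptSubmit: { blocks: true, stderrShownTo: "user" },
  Stop: { blocks: true, stderrShownTo: "claude" },
  SubagentStop: { blocks: true, stderrShownTo: "subagent" },
  PreCompact: { blocks: false, stderrShownTo: "user" },
  SessionStart: { blocks: false, stderrShownTo: "user" },
  SessionEnd: { blocks: false, stderrShownTo: "user" },
};

const blockingEvents = (Object.keys(EXIT_2_BEHAVIOR) as HookEvent[]).filter(
  (hookEvent) => EXIT_2_BEHAVIOR[hookEvent].blocks,
);
```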
|
||||
## Quick Reference
|
||||
|
||||
- **Success**: `process.exit(0)` - Operation completed successfully
|
||||
- **Block & feedback**: `process.exit(2)` - Block operation and give Claude feedback via stderr
|
||||
- **Non-blocking error**: `process.exit(1)` - Show error to user but continue execution
|
||||
|
||||
**Important**: Claude Code does not see `stdout` if the exit code is 0, except for the `UserPromptSubmit` hook, where `stdout` is injected as context [(1)](https://docs.claude.com/en/docs/claude-code/hooks#hook-output)
|
||||
@@ -0,0 +1,837 @@
|
||||
# Hooks reference
|
||||
|
||||
> This page provides reference documentation for implementing hooks in Claude Code.
|
||||
|
||||
<Tip>
|
||||
For a quickstart guide with examples, see [Get started with Claude Code hooks](/en/docs/claude-code/hooks-guide).
|
||||
</Tip>
|
||||
|
||||
## Configuration
|
||||
|
||||
Claude Code hooks are configured in your [settings files](/en/docs/claude-code/settings):
|
||||
|
||||
* `~/.claude/settings.json` - User settings
|
||||
* `.claude/settings.json` - Project settings
|
||||
* `.claude/settings.local.json` - Local project settings (not committed)
|
||||
* Enterprise managed policy settings
|
||||
|
||||
### Structure
|
||||
|
||||
Hooks are organized by matchers, where each matcher can have multiple hooks:
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"hooks": {
|
||||
"EventName": [
|
||||
{
|
||||
"matcher": "ToolPattern",
|
||||
"hooks": [
|
||||
{
|
||||
"type": "command",
|
||||
"command": "your-command-here"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
* **matcher**: Pattern to match tool names, case-sensitive (only applicable for
|
||||
`PreToolUse` and `PostToolUse`)
|
||||
* Simple strings match exactly: `Write` matches only the Write tool
|
||||
* Supports regex: `Edit|Write` or `Notebook.*`
|
||||
* Use `*` to match all tools. You can also use empty string (`""`) or leave
|
||||
`matcher` blank.
|
||||
* **hooks**: Array of commands to execute when the pattern matches
|
||||
* `type`: Currently only `"command"` is supported
|
||||
* `command`: The bash command to execute (can use `$CLAUDE_PROJECT_DIR`
|
||||
environment variable)
|
||||
* `timeout`: (Optional) How long a command should run, in seconds, before
|
||||
canceling that specific command.
|
||||
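The matcher rules can be approximated in a few lines. This is a sketch, not Claude Code's implementation: `matcherMatches` is a hypothetical function, and anchoring the regex to the whole tool name is an assumption made so that simple strings match exactly.

```typescript
// Approximates the matcher rules above. Assumption: patterns are anchored to the
// full tool name, so "Write" matches only the Write tool.
function matcherMatches(matcher: string | undefined, toolName: string): boolean {
  if (matcher === undefined || matcher === "" || matcher === "*") return true; // match all tools
  return new RegExp("^(?:" + matcher + ")$").test(toolName); // case-sensitive by default
}

const matcherChecks = [
  matcherMatches("Write", "Write"),
  matcherMatches("write", "Write"),
  matcherMatches("Edit|Write", "Edit"),
  matcherMatches("Notebook.*", "NotebookEdit"),
  matcherMatches("*", "Bash"),
  matcherMatches("", "Bash"),
];
```

Handling `*` before building the regex matters, because a bare `*` is not a valid regular expression.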
|
||||
For events like `UserPromptSubmit`, `Notification`, `Stop`, and `SubagentStop`
|
||||
that don't use matchers, you can omit the matcher field:
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"hooks": {
|
||||
"UserPromptSubmit": [
|
||||
{
|
||||
"hooks": [
|
||||
{
|
||||
"type": "command",
|
||||
"command": "/path/to/prompt-validator.py"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Project-Specific Hook Scripts
|
||||
|
||||
You can use the environment variable `CLAUDE_PROJECT_DIR` (only available when
|
||||
Claude Code spawns the hook command) to reference scripts stored in your project,
|
||||
ensuring they work regardless of Claude's current directory:
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"hooks": {
|
||||
"PostToolUse": [
|
||||
{
|
||||
"matcher": "Write|Edit",
|
||||
"hooks": [
|
||||
{
|
||||
"type": "command",
|
||||
"command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/check-style.sh"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Plugin hooks
|
||||
|
||||
[Plugins](/en/docs/claude-code/plugins) can provide hooks that integrate seamlessly with your user and project hooks. Plugin hooks are automatically merged with your configuration when plugins are enabled.
|
||||
|
||||
**How plugin hooks work**:
|
||||
|
||||
* Plugin hooks are defined in the plugin's `hooks/hooks.json` file, or in a file at a custom path specified in the `hooks` field.
|
||||
* When a plugin is enabled, its hooks are merged with user and project hooks
|
||||
* Multiple hooks from different sources can respond to the same event
|
||||
* Plugin hooks use the `${CLAUDE_PLUGIN_ROOT}` environment variable to reference plugin files
|
||||
|
||||
**Example plugin hook configuration**:
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"description": "Automatic code formatting",
|
||||
"hooks": {
|
||||
"PostToolUse": [
|
||||
{
|
||||
"matcher": "Write|Edit",
|
||||
"hooks": [
|
||||
{
|
||||
"type": "command",
|
||||
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/format.sh",
|
||||
"timeout": 30
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
<Note>
|
||||
Plugin hooks use the same format as regular hooks with an optional `description` field to explain the hook's purpose.
|
||||
</Note>
|
||||
|
||||
<Note>
|
||||
Plugin hooks run alongside your custom hooks. If multiple hooks match an event, they all execute in parallel.
|
||||
</Note>
|
||||
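Conceptually, merging means concatenating the matcher groups for each event across all sources. The sketch below illustrates that idea only; the types mirror the JSON structure shown earlier, while `mergeHookConfigs` and the sample commands are hypothetical and do not describe Claude Code's internal merge logic.

```typescript
// Types mirror the hooks JSON structure; the merge function is an illustration.
interface HookCommand {
  type: "command";
  command: string;
  timeout?: number;
}

interface HookMatcherGroup {
  matcher?: string;
  hooks: HookCommand[];
}

type HooksConfig = Record<string, HookMatcherGroup[]>;

// Concatenate matcher groups per event so hooks from every source still run.
function mergeHookConfigs(...sources: HooksConfig[]): HooksConfig {
  const merged: HooksConfig = {};
  for (const source of sources) {
    for (const eventName of Object.keys(source)) {
      merged[eventName] = [...(merged[eventName] ?? []), ...(source[eventName] ?? [])];
    }
  }
  return merged;
}

const userHooks: HooksConfig = {
  PostToolUse: [{ matcher: "Write|Edit", hooks: [{ type: "command", command: "check-style.sh" }] }],
};
const pluginHooks: HooksConfig = {
  PostToolUse: [{ matcher: "Write|Edit", hooks: [{ type: "command", command: "format.sh", timeout: 30 }] }],
  SessionStart: [{ hooks: [{ type: "command", command: "load-context.sh" }] }],
};
const mergedHooks = mergeHookConfigs(userHooks, pluginHooks);
```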
|
||||
**Environment variables for plugins**:
|
||||
|
||||
* `${CLAUDE_PLUGIN_ROOT}`: Absolute path to the plugin directory
|
||||
* `${CLAUDE_PROJECT_DIR}`: Project root directory (same as for project hooks)
|
||||
* All standard environment variables are available
|
||||
|
||||
See the [plugin components reference](/en/docs/claude-code/plugins-reference#hooks) for details on creating plugin hooks.
|
||||
|
||||
## Hook Events
|
||||
|
||||
### PreToolUse
|
||||
|
||||
Runs after Claude creates tool parameters and before processing the tool call.
|
||||
|
||||
**Common matchers:**
|
||||
|
||||
* `Task` - Subagent tasks (see [subagents documentation](/en/docs/claude-code/sub-agents))
|
||||
* `Bash` - Shell commands
|
||||
* `Glob` - File pattern matching
|
||||
* `Grep` - Content search
|
||||
* `Read` - File reading
|
||||
* `Edit` - File editing
|
||||
* `Write` - File writing
|
||||
* `WebFetch`, `WebSearch` - Web operations
|
||||
|
||||
### PostToolUse
|
||||
|
||||
Runs immediately after a tool completes successfully.
|
||||
|
||||
Recognizes the same matcher values as PreToolUse.
|
||||
|
||||
### Notification
|
||||
|
||||
Runs when Claude Code sends notifications. Notifications are sent when:
|
||||
|
||||
1. Claude needs your permission to use a tool. Example: "Claude needs your
|
||||
permission to use Bash"
|
||||
2. The prompt input has been idle for at least 60 seconds. Example: "Claude is waiting
|
||||
for your input"
|
||||
|
||||
### UserPromptSubmit
|
||||
|
||||
Runs when the user submits a prompt, before Claude processes it. This allows you
|
||||
to add additional context based on the prompt/conversation, validate prompts, or
|
||||
block certain types of prompts.
|
||||
|
||||
### Stop
|
||||
|
||||
Runs when the main Claude Code agent has finished responding. Does not run if
|
||||
the stoppage occurred due to a user interrupt.
|
||||
|
||||
### SubagentStop
|
||||
|
||||
Runs when a Claude Code subagent (Task tool call) has finished responding.
|
||||
|
||||
### PreCompact
|
||||
|
||||
Runs before Claude Code is about to run a compact operation.
|
||||
|
||||
**Matchers:**
|
||||
|
||||
* `manual` - Invoked from `/compact`
|
||||
* `auto` - Invoked from auto-compact (due to full context window)
|
||||
|
||||
### SessionStart
|
||||
|
||||
Runs when Claude Code starts a new session or resumes an existing session (which
|
||||
currently does start a new session under the hood). Useful for loading in
|
||||
development context like existing issues or recent changes to your codebase.
|
||||
|
||||
**Matchers:**
|
||||
|
||||
* `startup` - Invoked from startup
|
||||
* `resume` - Invoked from `--resume`, `--continue`, or `/resume`
|
||||
* `clear` - Invoked from `/clear`
|
||||
* `compact` - Invoked from auto or manual compact.
|
||||
|
||||
### SessionEnd
|
||||
|
||||
Runs when a Claude Code session ends. Useful for cleanup tasks, logging session
|
||||
statistics, or saving session state.
|
||||
|
||||
The `reason` field in the hook input will be one of:
|
||||
|
||||
* `clear` - Session cleared with /clear command
|
||||
* `logout` - User logged out
|
||||
* `prompt_input_exit` - User exited while prompt input was visible
|
||||
* `other` - Other exit reasons
|
||||
|
||||
## Hook Input
|
||||
|
||||
Hooks receive JSON data via stdin containing session information and
|
||||
event-specific data:
|
||||
|
||||
```typescript theme={null}
|
||||
{
|
||||
// Common fields
|
||||
session_id: string
|
||||
transcript_path: string // Path to conversation JSON
|
||||
cwd: string // The current working directory when the hook is invoked
|
||||
|
||||
// Event-specific fields
|
||||
hook_event_name: string
|
||||
...
|
||||
}
|
||||
```
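
As a minimal sketch of consuming this input, the helper below splits a decoded payload into the common fields and the event-specific remainder. The name `split_hook_input` and the sample values are illustrative assumptions, not part of Claude Code. A real hook would obtain the payload with `json.load(sys.stdin)` rather than a literal string.

```python
import json

# Common fields listed in the schema above.
COMMON_FIELDS = ("session_id", "transcript_path", "cwd")


def split_hook_input(payload: dict) -> tuple[dict, dict]:
    """Return (common, event_specific) views of a hook input payload.

    Hypothetical helper for illustration only.
    """
    common = {k: payload[k] for k in COMMON_FIELDS if k in payload}
    specific = {k: v for k, v in payload.items() if k not in COMMON_FIELDS}
    return common, specific


# Sample payload standing in for what a hook would read from stdin.
sample = json.loads(
    '{"session_id": "abc123", "cwd": "/tmp", '
    '"hook_event_name": "UserPromptSubmit", "prompt": "hello"}'
)
common, specific = split_hook_input(sample)
print(specific["hook_event_name"], common)
```

Keeping the split in one function lets the same script handle several hook events and branch on `hook_event_name`.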
|
||||
|
||||
### PreToolUse Input
|
||||
|
||||
The exact schema for `tool_input` depends on the tool.
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"session_id": "abc123",
|
||||
"transcript_path": "/Users/.../.claude/projects/.../00893aaf-19fa-41d2-8238-13269b9b3ca0.jsonl",
|
||||
"cwd": "/Users/...",
|
||||
"hook_event_name": "PreToolUse",
|
||||
"tool_name": "Write",
|
||||
"tool_input": {
|
||||
"file_path": "/path/to/file.txt",
|
||||
"content": "file content"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### PostToolUse Input
|
||||
|
||||
The exact schema for `tool_input` and `tool_response` depends on the tool.
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"session_id": "abc123",
|
||||
"transcript_path": "/Users/.../.claude/projects/.../00893aaf-19fa-41d2-8238-13269b9b3ca0.jsonl",
|
||||
"cwd": "/Users/...",
|
||||
"hook_event_name": "PostToolUse",
|
||||
"tool_name": "Write",
|
||||
"tool_input": {
|
||||
"file_path": "/path/to/file.txt",
|
||||
"content": "file content"
|
||||
},
|
||||
"tool_response": {
|
||||
"filePath": "/path/to/file.txt",
|
||||
"success": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Notification Input
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"session_id": "abc123",
|
||||
"transcript_path": "/Users/.../.claude/projects/.../00893aaf-19fa-41d2-8238-13269b9b3ca0.jsonl",
|
||||
"cwd": "/Users/...",
|
||||
"hook_event_name": "Notification",
|
||||
"message": "Task completed successfully"
|
||||
}
|
||||
```
|
||||
|
||||
### UserPromptSubmit Input
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"session_id": "abc123",
|
||||
"transcript_path": "/Users/.../.claude/projects/.../00893aaf-19fa-41d2-8238-13269b9b3ca0.jsonl",
|
||||
"cwd": "/Users/...",
|
||||
"hook_event_name": "UserPromptSubmit",
|
||||
"prompt": "Write a function to calculate the factorial of a number"
|
||||
}
|
||||
```
|
||||
|
||||
### Stop and SubagentStop Input
|
||||
|
||||
`stop_hook_active` is true when Claude Code is already continuing as a result of
|
||||
a stop hook. Check this value or process the transcript to prevent Claude Code
|
||||
from running indefinitely.
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"session_id": "abc123",
|
||||
"transcript_path": "~/.claude/projects/.../00893aaf-19fa-41d2-8238-13269b9b3ca0.jsonl",
|
||||
"hook_event_name": "Stop",
|
||||
"stop_hook_active": true
|
||||
}
|
||||
```
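
The `stop_hook_active` check can be sketched as follows. This is an illustration under stated assumptions: `stop_decision` and its `work_remaining` flag are hypothetical names, and returning an empty object is assumed to behave like leaving `decision` undefined.

```python
import json


def stop_decision(payload: dict, work_remaining: bool) -> dict:
    """Build the JSON output for a Stop hook.

    `work_remaining` is a hypothetical flag computed by the hook author,
    for example by checking whether tests still fail.
    """
    # If Claude is already continuing because of a stop hook, let it stop.
    # Blocking again here is what could keep it running indefinitely.
    if payload.get("stop_hook_active"):
        return {}
    if work_remaining:
        return {
            "decision": "block",
            "reason": "Finish the remaining work before stopping.",
        }
    return {}


# Sample payload; a real hook would read it from stdin.
sample = {"hook_event_name": "Stop", "stop_hook_active": True}
print(json.dumps(stop_decision(sample, work_remaining=True)))
```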
|
||||
|
||||
### PreCompact Input
|
||||
|
||||
For `manual`, `custom_instructions` comes from what the user passes into
|
||||
`/compact`. For `auto`, `custom_instructions` is empty.
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"session_id": "abc123",
|
||||
"transcript_path": "~/.claude/projects/.../00893aaf-19fa-41d2-8238-13269b9b3ca0.jsonl",
|
||||
"hook_event_name": "PreCompact",
|
||||
"trigger": "manual",
|
||||
"custom_instructions": ""
|
||||
}
|
||||
```
|
||||
|
||||
### SessionStart Input
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"session_id": "abc123",
|
||||
"transcript_path": "~/.claude/projects/.../00893aaf-19fa-41d2-8238-13269b9b3ca0.jsonl",
|
||||
"hook_event_name": "SessionStart",
|
||||
"source": "startup"
|
||||
}
|
||||
```
|
||||
|
||||
### SessionEnd Input
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"session_id": "abc123",
|
||||
"transcript_path": "~/.claude/projects/.../00893aaf-19fa-41d2-8238-13269b9b3ca0.jsonl",
|
||||
"cwd": "/Users/...",
|
||||
"hook_event_name": "SessionEnd",
|
||||
"reason": "exit"
|
||||
}
|
||||
```
|
||||
|
||||
## Hook Output
|
||||
|
||||
There are two ways for hooks to return output back to Claude Code. The output
|
||||
communicates whether to block and any feedback that should be shown to Claude
|
||||
and the user.
|
||||
|
||||
### Simple: Exit Code
|
||||
|
||||
Hooks communicate status through exit codes, stdout, and stderr:
|
||||
|
||||
* **Exit code 0**: Success. `stdout` is shown to the user in transcript mode
|
||||
(CTRL-R), except for `UserPromptSubmit` and `SessionStart`, where stdout is
|
||||
added to the context.
|
||||
* **Exit code 2**: Blocking error. `stderr` is fed back to Claude to process
|
||||
automatically. See per-hook-event behavior below.
|
||||
* **Other exit codes**: Non-blocking error. `stderr` is shown to the user and
|
||||
execution continues.
|
||||
|
||||
<Warning>
|
||||
Reminder: Claude Code does not see stdout if the exit code is 0, except for
the `UserPromptSubmit` and `SessionStart` hooks, where stdout is injected as context.
|
||||
</Warning>
|
||||
|
||||
#### Exit Code 2 Behavior
|
||||
|
||||
| Hook Event | Behavior |
|
||||
| ------------------ | ------------------------------------------------------------------ |
|
||||
| `PreToolUse` | Blocks the tool call, shows stderr to Claude |
|
||||
| `PostToolUse` | Shows stderr to Claude (tool already ran) |
|
||||
| `Notification` | N/A, shows stderr to user only |
|
||||
| `UserPromptSubmit` | Blocks prompt processing, erases prompt, shows stderr to user only |
|
||||
| `Stop` | Blocks stoppage, shows stderr to Claude |
|
||||
| `SubagentStop` | Blocks stoppage, shows stderr to Claude subagent |
|
||||
| `PreCompact` | N/A, shows stderr to user only |
|
||||
| `SessionStart` | N/A, shows stderr to user only |
|
||||
| `SessionEnd` | N/A, shows stderr to user only |
|
||||
|
||||
### Advanced: JSON Output
|
||||
|
||||
Hooks can return structured JSON in `stdout` for more sophisticated control:
|
||||
|
||||
#### Common JSON Fields
|
||||
|
||||
All hook types can include these optional fields:
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"continue": true, // Whether Claude should continue after hook execution (default: true)
|
||||
"stopReason": "string", // Message shown when continue is false
|
||||
|
||||
"suppressOutput": true, // Hide stdout from transcript mode (default: false)
|
||||
"systemMessage": "string" // Optional warning message shown to the user
|
||||
}
|
||||
```
|
||||
|
||||
If `continue` is false, Claude stops processing after the hooks run.
|
||||
|
||||
* For `PreToolUse`, this is different from `"permissionDecision": "deny"`, which
|
||||
only blocks a specific tool call and provides automatic feedback to Claude.
|
||||
* For `PostToolUse`, this is different from `"decision": "block"`, which
|
||||
provides automated feedback to Claude.
|
||||
* For `UserPromptSubmit`, this prevents the prompt from being processed.
|
||||
* For `Stop` and `SubagentStop`, this takes precedence over any
|
||||
`"decision": "block"` output.
|
||||
* In all cases, `"continue": false` takes precedence over any
`"decision": "block"` output.
|
||||
|
||||
`stopReason` accompanies `continue` with a reason shown to the user, not shown
|
||||
to Claude.
|
||||
|
||||
#### `PreToolUse` Decision Control
|
||||
|
||||
`PreToolUse` hooks can control whether a tool call proceeds.
|
||||
|
||||
* `"allow"` bypasses the permission system. `permissionDecisionReason` is shown
|
||||
to the user but not to Claude.
|
||||
* `"deny"` prevents the tool call from executing. `permissionDecisionReason` is
|
||||
shown to Claude.
|
||||
* `"ask"` asks the user to confirm the tool call in the UI.
|
||||
`permissionDecisionReason` is shown to the user but not to Claude.
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"hookSpecificOutput": {
|
||||
"hookEventName": "PreToolUse",
|
||||
"permissionDecision": "allow" | "deny" | "ask",
|
||||
"permissionDecisionReason": "My reason here"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
<Note>
|
||||
The `decision` and `reason` fields are deprecated for PreToolUse hooks.
|
||||
Use `hookSpecificOutput.permissionDecision` and
|
||||
`hookSpecificOutput.permissionDecisionReason` instead. The deprecated values
`"approve"` and `"block"` map to `"allow"` and `"deny"` respectively.
|
||||
</Note>
|
||||
|
||||
#### `PostToolUse` Decision Control
|
||||
|
||||
`PostToolUse` hooks can provide feedback to Claude after tool execution.
|
||||
|
||||
* `"block"` automatically prompts Claude with `reason`.
|
||||
* `undefined` does nothing. `reason` is ignored.
|
||||
* `"hookSpecificOutput.additionalContext"` adds context for Claude to consider.
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"decision": "block" | undefined,
|
||||
"reason": "Explanation for decision",
|
||||
"hookSpecificOutput": {
|
||||
"hookEventName": "PostToolUse",
|
||||
"additionalContext": "Additional information for Claude"
|
||||
}
|
||||
}
|
||||
```
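
A rough sketch of producing this output from a hook script follows. The policy (flagging hand-edited `.lock` files) and the function name `post_tool_use_output` are assumptions made up for illustration. Only the output field names come from the schema above.

```python
import json


def post_tool_use_output(payload: dict) -> dict:
    """Build PostToolUse JSON output for a Write/Edit style payload.

    Illustrative policy: lock files should not be edited by hand.
    """
    file_path = payload.get("tool_input", {}).get("file_path", "")
    if file_path.endswith(".lock"):
        # "block" prompts Claude with `reason`; the tool has already run.
        return {
            "decision": "block",
            "reason": f"{file_path} is a lock file; regenerate it instead of editing it.",
        }
    return {
        "hookSpecificOutput": {
            "hookEventName": "PostToolUse",
            "additionalContext": f"Modified {file_path}",
        }
    }


# Sample payload; a real hook would read it from stdin.
sample = {"tool_name": "Write", "tool_input": {"file_path": "/path/to/file.txt"}}
print(json.dumps(post_tool_use_output(sample)))
```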
|
||||
|
||||
#### `UserPromptSubmit` Decision Control
|
||||
|
||||
`UserPromptSubmit` hooks can control whether a user prompt is processed.
|
||||
|
||||
* `"block"` prevents the prompt from being processed. The submitted prompt is
|
||||
erased from context. `"reason"` is shown to the user but not added to context.
|
||||
* `undefined` allows the prompt to proceed normally. `"reason"` is ignored.
|
||||
* `"hookSpecificOutput.additionalContext"` adds the string to the context if not
|
||||
blocked.
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"decision": "block" | undefined,
|
||||
"reason": "Explanation for decision",
|
||||
"hookSpecificOutput": {
|
||||
"hookEventName": "UserPromptSubmit",
|
||||
"additionalContext": "My additional context here"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### `Stop`/`SubagentStop` Decision Control
|
||||
|
||||
`Stop` and `SubagentStop` hooks can control whether Claude must continue.
|
||||
|
||||
* `"block"` prevents Claude from stopping. You must populate `reason` for Claude
|
||||
to know how to proceed.
|
||||
* `undefined` allows Claude to stop. `reason` is ignored.
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"decision": "block" | undefined,
|
||||
"reason": "Must be provided when Claude is blocked from stopping"
|
||||
}
|
||||
```
|
||||
|
||||
#### `SessionStart` Decision Control
|
||||
|
||||
`SessionStart` hooks allow you to load in context at the start of a session.
|
||||
|
||||
* `"hookSpecificOutput.additionalContext"` adds the string to the context.
|
||||
* Multiple hooks' `additionalContext` values are concatenated.
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"hookSpecificOutput": {
|
||||
"hookEventName": "SessionStart",
|
||||
"additionalContext": "My additional context here"
|
||||
}
|
||||
}
|
||||
```
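
As a small sketch, a hook script could assemble this output as shown below. The function name `session_start_output` and the sample notes are hypothetical. In practice the notes might come from an issue tracker or recent commits.

```python
import json


def session_start_output(notes: list[str]) -> dict:
    """Wrap project notes in the SessionStart JSON output shape."""
    return {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            # One note per line keeps the injected context readable.
            "additionalContext": "\n".join(notes),
        }
    }


# Illustrative notes only.
print(json.dumps(session_start_output(["Open issue: flaky login test", "Branch: main"])))
```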
|
||||
|
||||
#### `SessionEnd` Decision Control
|
||||
|
||||
`SessionEnd` hooks run when a session ends. They cannot block session termination
|
||||
but can perform cleanup tasks.
|
||||
|
||||
#### Exit Code Example: Bash Command Validation
|
||||
|
||||
```python theme={null}
#!/usr/bin/env python3
import json
import re
import sys

# Define validation rules as a list of (regex pattern, message) tuples
VALIDATION_RULES = [
    (
        r"\bgrep\b(?!.*\|)",
        "Use 'rg' (ripgrep) instead of 'grep' for better performance and features",
    ),
    (
        r"\bfind\s+\S+\s+-name\b",
        "Use 'rg --files | rg pattern' or 'rg --files -g pattern' instead of 'find -name' for better performance",
    ),
]


def validate_command(command: str) -> list[str]:
    issues = []
    for pattern, message in VALIDATION_RULES:
        if re.search(pattern, command):
            issues.append(message)
    return issues


try:
    input_data = json.load(sys.stdin)
except json.JSONDecodeError as e:
    print(f"Error: Invalid JSON input: {e}", file=sys.stderr)
    sys.exit(1)

tool_name = input_data.get("tool_name", "")
tool_input = input_data.get("tool_input", {})
command = tool_input.get("command", "")

if tool_name != "Bash" or not command:
    # Nothing to validate: exit 0 so this is not reported as a hook error
    sys.exit(0)

# Validate the command
issues = validate_command(command)

if issues:
    for message in issues:
        print(f"• {message}", file=sys.stderr)
    # Exit code 2 blocks tool call and shows stderr to Claude
    sys.exit(2)
```
|
||||
|
||||
#### JSON Output Example: UserPromptSubmit to Add Context and Validation
|
||||
|
||||
<Note>
|
||||
For `UserPromptSubmit` hooks, you can inject context using either method:
|
||||
|
||||
* Exit code 0 with stdout: Claude sees the context (special case for `UserPromptSubmit`)
|
||||
* JSON output: Provides more control over the behavior
|
||||
</Note>
|
||||
|
||||
```python theme={null}
#!/usr/bin/env python3
import json
import sys
import re
import datetime

# Load input from stdin
try:
    input_data = json.load(sys.stdin)
except json.JSONDecodeError as e:
    print(f"Error: Invalid JSON input: {e}", file=sys.stderr)
    sys.exit(1)

prompt = input_data.get("prompt", "")

# Check for sensitive patterns
sensitive_patterns = [
    (r"(?i)\b(password|secret|key|token)\s*[:=]", "Prompt contains potential secrets"),
]

for pattern, message in sensitive_patterns:
    if re.search(pattern, prompt):
        # Use JSON output to block with a specific reason
        output = {
            "decision": "block",
            "reason": f"Security policy violation: {message}. Please rephrase your request without sensitive information."
        }
        print(json.dumps(output))
        sys.exit(0)

# Add current time to context
context = f"Current time: {datetime.datetime.now()}"
print(context)

"""
The following is also equivalent:
print(json.dumps({
  "hookSpecificOutput": {
    "hookEventName": "UserPromptSubmit",
    "additionalContext": context,
  },
}))
"""

# Allow the prompt to proceed with the additional context
sys.exit(0)
```
|
||||
|
||||
#### JSON Output Example: PreToolUse with Approval
|
||||
|
||||
```python theme={null}
#!/usr/bin/env python3
import json
import sys

# Load input from stdin
try:
    input_data = json.load(sys.stdin)
except json.JSONDecodeError as e:
    print(f"Error: Invalid JSON input: {e}", file=sys.stderr)
    sys.exit(1)

tool_name = input_data.get("tool_name", "")
tool_input = input_data.get("tool_input", {})

# Example: Auto-approve file reads for documentation files
if tool_name == "Read":
    file_path = tool_input.get("file_path", "")
    if file_path.endswith((".md", ".mdx", ".txt", ".json")):
        # Use JSON output to auto-approve the tool call.
        # `permissionDecision` replaces the deprecated `decision` field.
        output = {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "allow",
                "permissionDecisionReason": "Documentation file auto-approved",
            },
            "suppressOutput": True,  # Don't show in transcript mode
        }
        print(json.dumps(output))
        sys.exit(0)

# For other cases, let the normal permission flow proceed
sys.exit(0)
```
|
||||
|
||||
## Working with MCP Tools
|
||||
|
||||
Claude Code hooks work seamlessly with
|
||||
[Model Context Protocol (MCP) tools](/en/docs/claude-code/mcp). When MCP servers
|
||||
provide tools, they appear with a special naming pattern that you can match in
|
||||
your hooks.
|
||||
|
||||
### MCP Tool Naming
|
||||
|
||||
MCP tools follow the pattern `mcp__<server>__<tool>`, for example:
|
||||
|
||||
* `mcp__memory__create_entities` - Memory server's create entities tool
|
||||
* `mcp__filesystem__read_file` - Filesystem server's read file tool
|
||||
* `mcp__github__search_repositories` - GitHub server's search tool
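
To act on this naming pattern inside a hook script, a sketch like the following could split a tool name into its server and tool parts. The helper name `parse_mcp_tool` is hypothetical, and the regular expression assumes the server name itself contains no double underscore.

```python
import re
from typing import Optional

# Pattern from the text: mcp__<server>__<tool>.
# Assumption: the server name contains no "__" of its own.
MCP_TOOL = re.compile(r"^mcp__(?P<server>.+?)__(?P<tool>.+)$")


def parse_mcp_tool(tool_name: str) -> Optional[tuple[str, str]]:
    """Return (server, tool) for an MCP tool name, or None for other tools."""
    match = MCP_TOOL.match(tool_name)
    if match is None:
        return None
    return match.group("server"), match.group("tool")


print(parse_mcp_tool("mcp__memory__create_entities"))
print(parse_mcp_tool("Bash"))
```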
|
||||
|
||||
### Configuring Hooks for MCP Tools
|
||||
|
||||
You can target specific MCP tools or entire MCP servers:
|
||||
|
||||
```json theme={null}
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "mcp__memory__.*",
        "hooks": [
          {
            "type": "command",
            "command": "echo 'Memory operation initiated' >> ~/mcp-operations.log"
          }
        ]
      },
      {
        "matcher": "mcp__.*__write.*",
        "hooks": [
          {
            "type": "command",
            "command": "/home/user/scripts/validate-mcp-write.py"
          }
        ]
      }
    ]
  }
}
```
|
||||
|
||||
## Examples
|
||||
|
||||
<Tip>
|
||||
For practical examples including code formatting, notifications, and file protection, see [More Examples](/en/docs/claude-code/hooks-guide#more-examples) in the get started guide.
|
||||
</Tip>
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### Disclaimer
|
||||
|
||||
**USE AT YOUR OWN RISK**: Claude Code hooks execute arbitrary shell commands on
|
||||
your system automatically. By using hooks, you acknowledge that:
|
||||
|
||||
* You are solely responsible for the commands you configure
|
||||
* Hooks can modify, delete, or access any files your user account can access
|
||||
* Malicious or poorly written hooks can cause data loss or system damage
|
||||
* Anthropic provides no warranty and assumes no liability for any damages
|
||||
resulting from hook usage
|
||||
* You should thoroughly test hooks in a safe environment before production use
|
||||
|
||||
Always review and understand any hook commands before adding them to your
|
||||
configuration.
|
||||
|
||||
### Security Best Practices
|
||||
|
||||
Here are some key practices for writing more secure hooks:
|
||||
|
||||
1. **Validate and sanitize inputs** - Never trust input data blindly
|
||||
2. **Always quote shell variables** - Use `"$VAR"` not `$VAR`
|
||||
3. **Block path traversal** - Check for `..` in file paths
|
||||
4. **Use absolute paths** - Specify full paths for scripts (use
`"$CLAUDE_PROJECT_DIR"` for the project path)
|
||||
5. **Skip sensitive files** - Avoid `.env`, `.git/`, keys, etc.
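
Practices 3 and 5 could be sketched as a small path check. The function name `is_safe_path` and the contents of `SENSITIVE_NAMES` are illustrative assumptions. A real hook would likely need a longer list and might also resolve symlinks.

```python
from pathlib import PurePosixPath

# Illustrative, not exhaustive: path components a hook should refuse to touch.
SENSITIVE_NAMES = {".env", ".git"}


def is_safe_path(file_path: str) -> bool:
    """Reject paths that traverse upward or touch sensitive locations."""
    parts = PurePosixPath(file_path).parts
    # PurePosixPath does not collapse "..", so traversal stays visible here.
    if ".." in parts:
        return False
    return not any(part in SENSITIVE_NAMES for part in parts)


for candidate in ("src/app.py", "../etc/passwd", "/repo/.git/config"):
    print(candidate, is_safe_path(candidate))
```

Checking path components rather than substrings avoids rejecting harmless names such as `notes..txt` that merely contain two dots.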
|
||||
|
||||
### Configuration Safety
|
||||
|
||||
Direct edits to hooks in settings files don't take effect immediately. Claude
|
||||
Code:
|
||||
|
||||
1. Captures a snapshot of hooks at startup
|
||||
2. Uses this snapshot throughout the session
|
||||
3. Warns if hooks are modified externally
|
||||
4. Requires review in `/hooks` menu for changes to apply
|
||||
|
||||
This prevents malicious hook modifications from affecting your current session.
|
||||
|
||||
## Hook Execution Details
|
||||
|
||||
* **Timeout**: 60-second execution limit by default, configurable per command.
|
||||
* A timeout for an individual command does not affect the other commands.
|
||||
* **Parallelization**: All matching hooks run in parallel
|
||||
* **Deduplication**: Multiple identical hook commands are deduplicated automatically
|
||||
* **Environment**: Runs in current directory with Claude Code's environment
|
||||
* The `CLAUDE_PROJECT_DIR` environment variable is available and contains the
|
||||
absolute path to the project root directory (where Claude Code was started)
|
||||
* **Input**: JSON via stdin
|
||||
* **Output**:
|
||||
* PreToolUse/PostToolUse/Stop/SubagentStop: Progress shown in transcript (Ctrl-R)
|
||||
* Notification/SessionEnd: Logged to debug only (`--debug`)
|
||||
* UserPromptSubmit/SessionStart: stdout added as context for Claude
|
||||
|
||||
## Debugging
|
||||
|
||||
### Basic Troubleshooting
|
||||
|
||||
If your hooks aren't working:
|
||||
|
||||
1. **Check configuration** - Run `/hooks` to see if your hook is registered
|
||||
2. **Verify syntax** - Ensure your JSON settings are valid
|
||||
3. **Test commands** - Run hook commands manually first
|
||||
4. **Check permissions** - Make sure scripts are executable
|
||||
5. **Review logs** - Use `claude --debug` to see hook execution details
|
||||
|
||||
Common issues:
|
||||
|
||||
* **Quotes not escaped** - Use `\"` inside JSON strings
|
||||
* **Wrong matcher** - Check tool names match exactly (case-sensitive)
|
||||
* **Command not found** - Use full paths for scripts
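
To run a hook command manually, one option is a small harness that feeds it a JSON payload on stdin and interprets the exit code the way this document describes. The names `run_hook` and `describe_exit_code` are hypothetical, and the stand-in hook is a one-line Python script rather than a real hook.

```python
import json
import subprocess
import sys


def run_hook(command: list[str], payload: dict) -> tuple[int, str, str]:
    """Run a hook command with `payload` as JSON on stdin."""
    proc = subprocess.run(
        command,
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=60,  # mirrors the 60-second default described in this document
    )
    return proc.returncode, proc.stdout, proc.stderr


def describe_exit_code(code: int) -> str:
    """Map an exit code to the behavior described under Hook Output."""
    if code == 0:
        return "success"
    if code == 2:
        return "blocking error"
    return "non-blocking error"


# Stand-in hook: reads stdin, writes to stderr, and always blocks.
fake_hook = [
    sys.executable,
    "-c",
    "import sys; sys.stdin.read(); print('nope', file=sys.stderr); sys.exit(2)",
]
code, out, err = run_hook(fake_hook, {"hook_event_name": "PreToolUse", "tool_name": "Bash"})
print(describe_exit_code(code), err.strip())
```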
|
||||
|
||||
### Advanced Debugging
|
||||
|
||||
For complex hook issues:
|
||||
|
||||
1. **Inspect hook execution** - Use `claude --debug` to see detailed hook
|
||||
execution
|
||||
2. **Validate JSON schemas** - Test hook input/output with external tools
|
||||
3. **Check environment variables** - Verify Claude Code's environment is correct
|
||||
4. **Test edge cases** - Try hooks with unusual file paths or inputs
|
||||
5. **Monitor system resources** - Check for resource exhaustion during hook
|
||||
execution
|
||||
6. **Use structured logging** - Implement logging in your hook scripts
|
||||
|
||||
### Debug Output Example
|
||||
|
||||
Use `claude --debug` to see hook execution details:
|
||||
|
||||
```
|
||||
[DEBUG] Executing hooks for PostToolUse:Write
|
||||
[DEBUG] Getting matching hook commands for PostToolUse with query: Write
|
||||
[DEBUG] Found 1 hook matchers in settings
|
||||
[DEBUG] Matched 1 hooks for query "Write"
|
||||
[DEBUG] Found 1 hook commands to execute
|
||||
[DEBUG] Executing hook command: <Your command> with timeout 60000ms
|
||||
[DEBUG] Hook command completed with status 0: <Your stdout>
|
||||
```
|
||||
|
||||
Progress messages appear in transcript mode (Ctrl-R) showing:
|
||||
|
||||
* Which hook is running
|
||||
* Command being executed
|
||||
* Success/failure status
|
||||
* Output or error messages
|
||||
@@ -0,0 +1,391 @@
|
||||
# Plugins
|
||||
|
||||
> Extend Claude Code with custom commands, agents, hooks, and MCP servers through the plugin system.
|
||||
|
||||
<Tip>
|
||||
For complete technical specifications and schemas, see [Plugins reference](/en/docs/claude-code/plugins-reference). For marketplace management, see [Plugin marketplaces](/en/docs/claude-code/plugin-marketplaces).
|
||||
</Tip>
|
||||
|
||||
Plugins let you extend Claude Code with custom functionality that can be shared across projects and teams. Install plugins from [marketplaces](/en/docs/claude-code/plugin-marketplaces) to add pre-built commands, agents, hooks, and MCP servers, or create your own to automate your workflows.
|
||||
|
||||
## Quickstart
|
||||
|
||||
Let's create a simple greeting plugin to get you familiar with the plugin system. We'll build a working plugin that adds a custom command, test it locally, and understand the core concepts.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
* Claude Code installed on your machine
|
||||
* Basic familiarity with command-line tools
|
||||
|
||||
### Create your first plugin
|
||||
|
||||
<Steps>
|
||||
<Step title="Create the marketplace structure">
|
||||
```bash theme={null}
|
||||
mkdir test-marketplace
|
||||
cd test-marketplace
|
||||
```
|
||||
</Step>
|
||||
|
||||
<Step title="Create the plugin directory">
|
||||
```bash theme={null}
|
||||
mkdir my-first-plugin
|
||||
cd my-first-plugin
|
||||
```
|
||||
</Step>
|
||||
|
||||
<Step title="Create the plugin manifest">
|
||||
```bash Create .claude-plugin/plugin.json theme={null}
|
||||
mkdir .claude-plugin
|
||||
cat > .claude-plugin/plugin.json << 'EOF'
|
||||
{
|
||||
"name": "my-first-plugin",
|
||||
"description": "A simple greeting plugin to learn the basics",
|
||||
"version": "1.0.0",
|
||||
"author": {
|
||||
"name": "Your Name"
|
||||
}
|
||||
}
|
||||
EOF
|
||||
```
|
||||
</Step>
|
||||
|
||||
<Step title="Add a custom command">
|
||||
```bash Create commands/hello.md theme={null}
|
||||
mkdir commands
|
||||
cat > commands/hello.md << 'EOF'
|
||||
---
|
||||
description: Greet the user with a personalized message
|
||||
---
|
||||
|
||||
# Hello Command
|
||||
|
||||
Greet the user warmly and ask how you can help them today. Make the greeting personal and encouraging.
|
||||
EOF
|
||||
```
|
||||
</Step>
|
||||
|
||||
<Step title="Create the marketplace manifest">
|
||||
```bash Create marketplace.json theme={null}
|
||||
cd ..
|
||||
mkdir .claude-plugin
|
||||
cat > .claude-plugin/marketplace.json << 'EOF'
|
||||
{
|
||||
"name": "test-marketplace",
|
||||
"owner": {
|
||||
"name": "Test User"
|
||||
},
|
||||
"plugins": [
|
||||
{
|
||||
"name": "my-first-plugin",
|
||||
"source": "./my-first-plugin",
|
||||
"description": "My first test plugin"
|
||||
}
|
||||
]
|
||||
}
|
||||
EOF
|
||||
```
|
||||
</Step>
|
||||
|
||||
<Step title="Install and test your plugin">
|
||||
```bash Start Claude Code from parent directory theme={null}
|
||||
cd ..
|
||||
claude
|
||||
```
|
||||
|
||||
```shell Add the test marketplace theme={null}
|
||||
/plugin marketplace add ./test-marketplace
|
||||
```
|
||||
|
||||
```shell Install your plugin theme={null}
|
||||
/plugin install my-first-plugin@test-marketplace
|
||||
```
|
||||
|
||||
Select "Install now", then restart Claude Code to use the new plugin.
|
||||
|
||||
```shell Try your new command theme={null}
|
||||
/hello
|
||||
```
|
||||
|
||||
You'll see Claude use your greeting command! Check `/help` to see your new command listed.
|
||||
</Step>
|
||||
</Steps>
|
||||
|
||||
You've successfully created and tested a plugin with these key components:
|
||||
|
||||
* **Plugin manifest** (`.claude-plugin/plugin.json`) - Describes your plugin's metadata
|
||||
* **Commands directory** (`commands/`) - Contains your custom slash commands
|
||||
* **Test marketplace** - Allows you to test your plugin locally
|
||||
|
||||
### Plugin structure overview
|
||||
|
||||
Your plugin follows this basic structure:
|
||||
|
||||
```
my-first-plugin/
├── .claude-plugin/
│   └── plugin.json          # Plugin metadata
├── commands/                # Custom slash commands (optional)
│   └── hello.md
├── agents/                  # Custom agents (optional)
│   └── helper.md
├── skills/                  # Agent Skills (optional)
│   └── my-skill/
│       └── SKILL.md
└── hooks/                   # Event handlers (optional)
    └── hooks.json
```
|
||||
|
||||
**Additional components you can add:**
|
||||
|
||||
* **Commands**: Create markdown files in `commands/` directory
|
||||
* **Agents**: Create agent definitions in `agents/` directory
|
||||
* **Skills**: Create `SKILL.md` files in `skills/` directory
|
||||
* **Hooks**: Create `hooks/hooks.json` for event handling
|
||||
* **MCP servers**: Create `.mcp.json` for external tool integration
|
||||
|
||||
<Note>
|
||||
**Next steps**: Ready to add more features? Jump to [Develop more complex plugins](#develop-more-complex-plugins) to add agents, hooks, and MCP servers. For complete technical specifications of all plugin components, see [Plugins reference](/en/docs/claude-code/plugins-reference).
|
||||
</Note>
|
||||
|
||||
***
|
||||
|
||||
## Install and manage plugins
|
||||
|
||||
Learn how to discover, install, and manage plugins to extend your Claude Code capabilities.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
* Claude Code installed and running
|
||||
* Basic familiarity with command-line interfaces
|
||||
|
||||
### Add marketplaces
|
||||
|
||||
Marketplaces are catalogs of available plugins. Add them to discover and install plugins:
|
||||
|
||||
```shell Add a marketplace theme={null}
|
||||
/plugin marketplace add your-org/claude-plugins
|
||||
```
|
||||
|
||||
```shell Browse available plugins theme={null}
|
||||
/plugin
|
||||
```
|
||||
|
||||
For detailed marketplace management including Git repositories, local development, and team distribution, see [Plugin marketplaces](/en/docs/claude-code/plugin-marketplaces).
|
||||
|
||||
### Install plugins
|
||||
|
||||
#### Via interactive menu (recommended for discovery)
|
||||
|
||||
```shell Open the plugin management interface theme={null}
|
||||
/plugin
|
||||
```
|
||||
|
||||
Select "Browse Plugins" to see available options with descriptions, features, and installation options.
|
||||
|
||||
#### Via direct commands (for quick installation)
|
||||
|
||||
```shell Install a specific plugin theme={null}
|
||||
/plugin install formatter@your-org
|
||||
```
|
||||
|
||||
```shell Enable a disabled plugin theme={null}
|
||||
/plugin enable plugin-name@marketplace-name
|
||||
```
|
||||
|
||||
```shell Disable without uninstalling theme={null}
|
||||
/plugin disable plugin-name@marketplace-name
|
||||
```
|
||||
|
||||
```shell Completely remove a plugin theme={null}
|
||||
/plugin uninstall plugin-name@marketplace-name
|
||||
```
|
||||
|
||||
### Verify installation
|
||||
|
||||
After installing a plugin:
|
||||
|
||||
1. **Check available commands**: Run `/help` to see new commands
|
||||
2. **Test plugin features**: Try the plugin's commands and features
|
||||
3. **Review plugin details**: Use `/plugin` → "Manage Plugins" to see what the plugin provides
|
||||
|
||||
## Set up team plugin workflows
|
||||
|
||||
Configure plugins at the repository level to ensure consistent tooling across your team. When team members trust your repository folder, Claude Code automatically installs specified marketplaces and plugins.
|
||||
|
||||
**To set up team plugins:**
|
||||
|
||||
1. Add marketplace and plugin configuration to your repository's `.claude/settings.json`
|
||||
2. Team members trust the repository folder
|
||||
3. Plugins install automatically for all team members
|
||||
|
||||
For complete instructions including configuration examples, marketplace setup, and rollout best practices, see [Configure team marketplaces](/en/docs/claude-code/plugin-marketplaces#how-to-configure-team-marketplaces).
|
||||
|
||||
***
|
||||
|
||||
## Develop more complex plugins
|
||||
|
||||
Once you're comfortable with basic plugins, you can create more sophisticated extensions.
|
||||
|
||||
### Add Skills to your plugin
|
||||
|
||||
Plugins can include [Agent Skills](/en/docs/claude-code/skills) to extend Claude's capabilities. Skills are model-invoked—Claude autonomously uses them based on the task context.
|
||||
|
||||
To add Skills to your plugin, create a `skills/` directory at your plugin root and add Skill folders with `SKILL.md` files. Plugin Skills are automatically available when the plugin is installed.
|
||||
|
||||
For complete Skill authoring guidance, see [Agent Skills](/en/docs/claude-code/skills).
|
||||
|
||||
### Organize complex plugins
|
||||
|
||||
For plugins with many components, organize your directory structure by functionality. For complete directory layouts and organization patterns, see [Plugin directory structure](/en/docs/claude-code/plugins-reference#plugin-directory-structure).
|
||||
|
||||
### Test your plugins locally
|
||||
|
||||
When developing plugins, use a local marketplace to test changes iteratively. This workflow builds on the quickstart pattern and works for plugins of any complexity.
|
||||
|
||||
<Steps>
|
||||
<Step title="Set up your development structure">
|
||||
Organize your plugin and marketplace for testing:
|
||||
|
||||
```bash Create directory structure theme={null}
mkdir dev-marketplace
cd dev-marketplace
mkdir my-plugin
```
|
||||
|
||||
This creates:
|
||||
|
||||
```
dev-marketplace/
├── .claude-plugin/marketplace.json  (you'll create this)
└── my-plugin/                        (your plugin under development)
    ├── .claude-plugin/plugin.json
    ├── commands/
    ├── agents/
    └── hooks/
```
|
||||
</Step>
|
||||
|
||||
<Step title="Create the marketplace manifest">
|
||||
```bash Create marketplace.json theme={null}
mkdir .claude-plugin
cat > .claude-plugin/marketplace.json << 'EOF'
{
  "name": "dev-marketplace",
  "owner": {
    "name": "Developer"
  },
  "plugins": [
    {
      "name": "my-plugin",
      "source": "./my-plugin",
      "description": "Plugin under development"
    }
  ]
}
EOF
```
|
||||
</Step>
|
||||
|
||||
<Step title="Install and test">
|
||||
```bash Start Claude Code from parent directory theme={null}
|
||||
cd ..
|
||||
claude
|
||||
```
|
||||
|
||||
```shell Add your development marketplace theme={null}
|
||||
/plugin marketplace add ./dev-marketplace
|
||||
```
|
||||
|
||||
```shell Install your plugin theme={null}
|
||||
/plugin install my-plugin@dev-marketplace
|
||||
```
|
||||
|
||||
Test your plugin components:
|
||||
|
||||
* Try your commands with `/command-name`
|
||||
* Check that agents appear in `/agents`
|
||||
* Verify hooks work as expected
|
||||
</Step>
|
||||
|
||||
<Step title="Iterate on your plugin">
|
||||
After making changes to your plugin code:
|
||||
|
||||
```shell Uninstall the current version theme={null}
|
||||
/plugin uninstall my-plugin@dev-marketplace
|
||||
```
|
||||
|
||||
```shell Reinstall to test changes theme={null}
|
||||
/plugin install my-plugin@dev-marketplace
|
||||
```
|
||||
|
||||
Repeat this cycle as you develop and refine your plugin.
|
||||
</Step>
|
||||
</Steps>
|
||||
|
||||
<Note>
|
||||
**For multiple plugins**: Organize plugins in subdirectories like `./plugins/plugin-name` and update your marketplace.json accordingly. See [Plugin sources](/en/docs/claude-code/plugin-marketplaces#plugin-sources) for organization patterns.
|
||||
</Note>
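
Before adding a development marketplace, it can help to confirm that every plugin listed in `marketplace.json` actually points at a plugin directory. The sketch below assumes each `source` is a relative path string, as in the example manifest above; the function name `missing_plugin_manifests` is hypothetical, and other source types are skipped.

```python
import json
from pathlib import Path


def missing_plugin_manifests(marketplace_root: Path) -> list:
    """Return names of listed plugins whose source directory has no .claude-plugin/plugin.json."""
    manifest_path = marketplace_root / ".claude-plugin" / "marketplace.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    missing = []
    for plugin in manifest.get("plugins", []):
        source = plugin.get("source")
        if not isinstance(source, str):
            continue  # non-path sources are outside the scope of this sketch
        plugin_json = marketplace_root / source / ".claude-plugin" / "plugin.json"
        if not plugin_json.is_file():
            missing.append(plugin.get("name", source))
    return missing
```

Calling `missing_plugin_manifests(Path("dev-marketplace"))` from the parent directory returns an empty list when every listed plugin has its manifest in place.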
|
||||
|
||||
### Debug plugin issues
|
||||
|
||||
If your plugin isn't working as expected:
|
||||
|
||||
1. **Check the structure**: Ensure your directories are at the plugin root, not inside `.claude-plugin/`
|
||||
2. **Test components individually**: Check each command, agent, and hook separately
|
||||
3. **Use validation and debugging tools**: See [Debugging and development tools](/en/docs/claude-code/plugins-reference#debugging-and-development-tools) for CLI commands and troubleshooting techniques
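
The first check in the list above can be automated. The following is a small sketch under stated assumptions: the helper name `misplaced_components` is hypothetical, and the set of component directory names is taken from the directories mentioned in this guide.

```python
from pathlib import Path

# Component directories that belong at the plugin root (names as used in this guide).
COMPONENT_DIRS = ("commands", "agents", "skills", "hooks")


def misplaced_components(plugin_root: Path) -> list:
    """Return component directories found inside .claude-plugin/ instead of the plugin root."""
    manifest_dir = plugin_root / ".claude-plugin"
    return sorted(name for name in COMPONENT_DIRS if (manifest_dir / name).is_dir())
```

A non-empty result, such as `["commands"]`, means those directories should be moved up one level so they sit beside `.claude-plugin/`.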
|
||||
|
||||
### Share your plugins
|
||||
|
||||
When your plugin is ready to share:
|
||||
|
||||
1. **Add documentation**: Include a README.md with installation and usage instructions
|
||||
2. **Version your plugin**: Use semantic versioning in your `plugin.json`
|
||||
3. **Create or use a marketplace**: Distribute through plugin marketplaces for easy installation
|
||||
4. **Test with others**: Have team members test the plugin before wider distribution
|
||||
|
||||
<Note>
|
||||
For complete technical specifications, debugging techniques, and distribution strategies, see [Plugins reference](/en/docs/claude-code/plugins-reference).
|
||||
</Note>
|
||||
|
||||
***
|
||||
|
||||
## Next steps
|
||||
|
||||
Now that you understand Claude Code's plugin system, here are suggested paths for different goals:
|
||||
|
||||
### For plugin users
|
||||
|
||||
* **Discover plugins**: Browse community marketplaces for useful tools
|
||||
* **Team adoption**: Set up repository-level plugins for your projects
|
||||
* **Marketplace management**: Learn to manage multiple plugin sources
|
||||
* **Advanced usage**: Explore plugin combinations and workflows
|
||||
|
||||
### For plugin developers
|
||||
|
||||
* **Create your first marketplace**: [Plugin marketplaces guide](/en/docs/claude-code/plugin-marketplaces)
|
||||
* **Advanced components**: Dive deeper into specific plugin components:
  * [Slash commands](/en/docs/claude-code/slash-commands) - Command development details
  * [Subagents](/en/docs/claude-code/sub-agents) - Agent configuration and capabilities
  * [Agent Skills](/en/docs/claude-code/skills) - Extend Claude's capabilities
  * [Hooks](/en/docs/claude-code/hooks) - Event handling and automation
  * [MCP](/en/docs/claude-code/mcp) - External tool integration
|
||||
* **Distribution strategies**: Package and share your plugins effectively
|
||||
* **Community contribution**: Consider contributing to community plugin collections
|
||||
|
||||
### For team leads and administrators
|
||||
|
||||
* **Repository configuration**: Set up automatic plugin installation for team projects
|
||||
* **Plugin governance**: Establish guidelines for plugin approval and security review
|
||||
* **Marketplace maintenance**: Create and maintain organization-specific plugin catalogs
|
||||
* **Training and documentation**: Help team members adopt plugin workflows effectively
|
||||
|
||||
## See also
|
||||
|
||||
* [Plugin marketplaces](/en/docs/claude-code/plugin-marketplaces) - Creating and managing plugin catalogs
|
||||
* [Slash commands](/en/docs/claude-code/slash-commands) - Understanding custom commands
|
||||
* [Subagents](/en/docs/claude-code/sub-agents) - Creating and using specialized agents
|
||||
* [Agent Skills](/en/docs/claude-code/skills) - Extend Claude's capabilities
|
||||
* [Hooks](/en/docs/claude-code/hooks) - Automating workflows with event handlers
|
||||
* [MCP](/en/docs/claude-code/mcp) - Connecting to external tools and services
|
||||
* [Settings](/en/docs/claude-code/settings) - Configuration options for plugins
|
||||
@@ -0,0 +1,218 @@
|
||||
# Models overview
|
||||
|
||||
> Claude is a family of state-of-the-art large language models developed by Anthropic. This guide introduces our models and compares their performance with legacy models.
|
||||
|
||||
export const ModelId = ({children, style = {}}) => {
  const copiedNotice = 'Copied!';
  const handleClick = e => {
    const element = e.currentTarget;
    const originalText = element.textContent;
    navigator.clipboard.writeText(children).then(() => {
      element.textContent = copiedNotice;
      element.style.backgroundColor = '#d4edda';
      element.style.color = '#155724';
      element.style.borderColor = '#c3e6cb';
      setTimeout(() => {
        element.textContent = originalText;
        element.style.backgroundColor = '#f5f5f5';
        element.style.color = '';
        element.style.borderColor = 'transparent';
      }, 2000);
    }).catch(error => {
      console.error('Failed to copy:', error);
    });
  };
  const handleMouseEnter = e => {
    const element = e.currentTarget;
    const tooltip = element.querySelector('.copy-tooltip');
    if (tooltip && element.textContent !== copiedNotice) {
      tooltip.style.opacity = '1';
    }
    element.style.backgroundColor = '#e8e8e8';
    element.style.borderColor = '#d0d0d0';
  };
  const handleMouseLeave = e => {
    const element = e.currentTarget;
    const tooltip = element.querySelector('.copy-tooltip');
    if (tooltip) {
      tooltip.style.opacity = '0';
    }
    if (element.textContent !== copiedNotice) {
      element.style.backgroundColor = '#f5f5f5';
      element.style.borderColor = 'transparent';
    }
  };
  const defaultStyle = {
    cursor: 'pointer',
    position: 'relative',
    transition: 'all 0.2s ease',
    display: 'inline-block',
    userSelect: 'none',
    backgroundColor: '#f5f5f5',
    padding: '2px 4px',
    borderRadius: '4px',
    fontFamily: 'Monaco, Consolas, "Courier New", monospace',
    fontSize: '0.9em',
    border: '1px solid transparent',
    ...style
  };
  return <span onClick={handleClick} onMouseEnter={handleMouseEnter} onMouseLeave={handleMouseLeave} style={defaultStyle}>
      {children}
    </span>;
};
|
||||
|
||||
<CardGroup cols={3}>
|
||||
<Card title="Claude Sonnet 4.5" icon="star" href="/en/docs/about-claude/models/overview#model-comparison-table">
|
||||
Our best model for complex agents and coding
|
||||
|
||||
* <Icon icon="inbox-in" iconType="thin" /> Text and image input
|
||||
* <Icon icon="inbox-out" iconType="thin" /> Text output
|
||||
* <Icon icon="book" iconType="thin" /> 200k context window (1M context beta available)
|
||||
* <Icon icon="brain" iconType="thin" /> Highest intelligence across most tasks
|
||||
</Card>
|
||||
|
||||
<Card title="Claude Haiku 4.5" icon="rocket-launch" href="/en/docs/about-claude/models/overview#model-comparison-table">
|
||||
Our fastest and most intelligent Haiku model
|
||||
|
||||
* <Icon icon="inbox-in" iconType="thin" /> Text and image input
|
||||
* <Icon icon="inbox-out" iconType="thin" /> Text output
|
||||
* <Icon icon="book" iconType="thin" /> 200k context window
|
||||
* <Icon icon="zap" iconType="thin" /> Lightning-fast speed with extended thinking
|
||||
</Card>
|
||||
|
||||
<Card title="Claude Opus 4.1" icon="trophy" href="/en/docs/about-claude/models/overview#model-comparison-table">
|
||||
Exceptional model for specialized complex tasks
|
||||
|
||||
* <Icon icon="inbox-in" iconType="thin" /> Text and image input
|
||||
* <Icon icon="inbox-out" iconType="thin" /> Text output
|
||||
* <Icon icon="book" iconType="thin" /> 200k context window
|
||||
* <Icon icon="brain" iconType="thin" /> Superior reasoning capabilities
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
***
|
||||
|
||||
## Model names
|
||||
|
||||
| Model | Claude API | AWS Bedrock | GCP Vertex AI |
| ----------------- | ------------------------------------------------------------------------------------------- | ------------------------------------------------------------ | ---------------------------------------------- |
| Claude Sonnet 4.5 | <ModelId>claude-sonnet-4-5-20250929</ModelId> | <ModelId>anthropic.claude-sonnet-4-5-20250929-v1:0</ModelId> | <ModelId>claude-sonnet-4-5\@20250929</ModelId> |
| Claude Sonnet 4 | <ModelId>claude-sonnet-4-20250514</ModelId> | <ModelId>anthropic.claude-sonnet-4-20250514-v1:0</ModelId> | <ModelId>claude-sonnet-4\@20250514</ModelId> |
| Claude Sonnet 3.7 | <ModelId>claude-3-7-sonnet-20250219</ModelId> (<ModelId>claude-3-7-sonnet-latest</ModelId>) | <ModelId>anthropic.claude-3-7-sonnet-20250219-v1:0</ModelId> | <ModelId>claude-3-7-sonnet\@20250219</ModelId> |
| Claude Haiku 4.5 | <ModelId>claude-haiku-4-5-20251001</ModelId> | <ModelId>anthropic.claude-haiku-4-5-20251001-v1:0</ModelId> | <ModelId>claude-haiku-4-5\@20251001</ModelId> |
| Claude Haiku 3.5 | <ModelId>claude-3-5-haiku-20241022</ModelId> (<ModelId>claude-3-5-haiku-latest</ModelId>) | <ModelId>anthropic.claude-3-5-haiku-20241022-v1:0</ModelId> | <ModelId>claude-3-5-haiku\@20241022</ModelId> |
| Claude Haiku 3 | <ModelId>claude-3-haiku-20240307</ModelId> | <ModelId>anthropic.claude-3-haiku-20240307-v1:0</ModelId> | <ModelId>claude-3-haiku\@20240307</ModelId> |
| Claude Opus 4.1 | <ModelId>claude-opus-4-1-20250805</ModelId> | <ModelId>anthropic.claude-opus-4-1-20250805-v1:0</ModelId> | <ModelId>claude-opus-4-1\@20250805</ModelId> |
| Claude Opus 4 | <ModelId>claude-opus-4-20250514</ModelId> | <ModelId>anthropic.claude-opus-4-20250514-v1:0</ModelId> | <ModelId>claude-opus-4\@20250514</ModelId> |
|
||||
|
||||
<Note>Models with the same snapshot date (e.g., 20240620) are identical across all platforms and do not change. The snapshot date in the model name ensures consistency and allows developers to rely on stable performance across different environments.</Note>
|
||||
|
||||
<Note>Starting with **Claude Sonnet 4.5 and all future models**, AWS Bedrock and Google Vertex AI offer two endpoint types: **global endpoints** (dynamic routing for maximum availability) and **regional endpoints** (guaranteed data routing through specific geographic regions). For more information, see the [third-party platform pricing section](/en/docs/about-claude/pricing#third-party-platform-pricing).</Note>
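
Every row in the table above follows the same naming pattern across platforms, which the sketch below encodes. Treat it as an observation about this table rather than a documented rule: the function name `platform_ids` is hypothetical, and the `-v1:0` suffix and `@` separator are simply what the listed IDs use.

```python
import re

# A dated snapshot ID: a family name followed by an 8-digit snapshot date.
_SNAPSHOT = re.compile(r"^(?P<family>claude-[a-z0-9-]+)-(?P<date>\d{8})$")


def platform_ids(api_id: str) -> dict:
    """Derive the Bedrock and Vertex AI IDs that the table lists for a Claude API snapshot ID."""
    match = _SNAPSHOT.match(api_id)
    if match is None:
        raise ValueError(f"not a dated snapshot ID: {api_id!r}")
    return {
        "claude_api": api_id,
        "aws_bedrock": f"anthropic.{api_id}-v1:0",
        "gcp_vertex_ai": f"{match['family']}@{match['date']}",
    }


if __name__ == "__main__":
    print(platform_ids("claude-opus-4-1-20250805"))
```

Aliases such as `claude-sonnet-4-5` carry no snapshot date, so the function rejects them instead of guessing.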
|
||||
|
||||
### Model aliases
|
||||
|
||||
For convenience during development and testing, we offer aliases for our model IDs. These aliases automatically point to the most recent snapshot of a given model. When we release new model snapshots, we migrate aliases to point to the newest version of a model, typically within a week of the new release.
|
||||
|
||||
<Tip>
|
||||
While aliases are useful for experimentation, we recommend using specific model versions (e.g., `claude-sonnet-4-5-20250929`) in production applications to ensure consistent behavior.
|
||||
</Tip>
|
||||
|
||||
| Model | Alias | Model ID |
| ----------------- | ------------------------------------------- | --------------------------------------------- |
| Claude Sonnet 4.5 | <ModelId>claude-sonnet-4-5</ModelId> | <ModelId>claude-sonnet-4-5-20250929</ModelId> |
| Claude Sonnet 4 | <ModelId>claude-sonnet-4-0</ModelId> | <ModelId>claude-sonnet-4-20250514</ModelId> |
| Claude Sonnet 3.7 | <ModelId>claude-3-7-sonnet-latest</ModelId> | <ModelId>claude-3-7-sonnet-20250219</ModelId> |
| Claude Haiku 4.5 | <ModelId>claude-haiku-4-5</ModelId> | <ModelId>claude-haiku-4-5-20251001</ModelId> |
| Claude Haiku 3.5 | <ModelId>claude-3-5-haiku-latest</ModelId> | <ModelId>claude-3-5-haiku-20241022</ModelId> |
| Claude Opus 4.1 | <ModelId>claude-opus-4-1</ModelId> | <ModelId>claude-opus-4-1-20250805</ModelId> |
| Claude Opus 4 | <ModelId>claude-opus-4-0</ModelId> | <ModelId>claude-opus-4-20250514</ModelId> |
|
||||
|
||||
<Note>
|
||||
Aliases are subject to the same rate limits and pricing as the underlying model version they reference.
|
||||
</Note>
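
One way to follow the recommendation to pin versions in production is to resolve aliases to snapshot IDs at configuration time. The sketch below hard-codes the alias table from this page as it stands; because aliases migrate to newer snapshots over time, the mapping is a point-in-time copy, and the helper name `pin_model` is hypothetical.

```python
# Alias -> snapshot ID, copied from the alias table on this page (a point-in-time copy).
ALIASES = {
    "claude-sonnet-4-5": "claude-sonnet-4-5-20250929",
    "claude-sonnet-4-0": "claude-sonnet-4-20250514",
    "claude-3-7-sonnet-latest": "claude-3-7-sonnet-20250219",
    "claude-haiku-4-5": "claude-haiku-4-5-20251001",
    "claude-3-5-haiku-latest": "claude-3-5-haiku-20241022",
    "claude-opus-4-1": "claude-opus-4-1-20250805",
    "claude-opus-4-0": "claude-opus-4-20250514",
}


def pin_model(model: str) -> str:
    """Return the dated snapshot ID for a known alias; pass other IDs through unchanged."""
    return ALIASES.get(model, model)


if __name__ == "__main__":
    print(pin_model("claude-sonnet-4-5"))
```

Storing the pinned ID in production configuration keeps behavior stable even after the alias moves to a newer snapshot.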
|
||||
|
||||
### Model comparison table
|
||||
|
||||
To help you choose the right model for your needs, we've compiled a table comparing the key features and capabilities of each model in the Claude family:
|
||||
|
||||
| Feature | Claude Sonnet 4.5 | Claude Sonnet 4 | Claude Sonnet 3.7 | Claude Opus 4.1 | Claude Opus 4 | Claude Haiku 4.5 | Claude Haiku 3.5 | Claude Haiku 3 |
|
||||
| :-------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------------------- |
|
||||
| **Description** | Our best model for complex agents and coding | High-performance model | High-performance model with early extended thinking | Exceptional model for specialized complex tasks | Our previous flagship model | Our fastest and most intelligent Haiku model | Our fastest model | Fast and compact model for near-instant responsiveness |
|
||||
| **Strengths** | Highest intelligence across most tasks with exceptional agent and coding capabilities | High intelligence and balanced performance | High intelligence with toggleable extended thinking | Very high intelligence and capability for specialized tasks | Very high intelligence and capability | Near-frontier intelligence at blazing speeds with extended thinking and exceptional cost-efficiency | Intelligence at blazing speeds | Quick and accurate targeted performance |
|
||||
| **Multilingual** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
|
||||
| **Vision** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
|
||||
| **[Extended thinking](/en/docs/build-with-claude/extended-thinking)** | Yes | Yes | Yes | Yes | Yes | Yes | No | No |
|
||||
| **[Priority Tier](/en/api/service-tiers)** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
|
||||
| **API model name** | <ModelId>claude-sonnet-4-5-20250929</ModelId> | <ModelId>claude-sonnet-4-20250514</ModelId> | <ModelId>claude-3-7-sonnet-20250219</ModelId> | <ModelId>claude-opus-4-1-20250805</ModelId> | <ModelId>claude-opus-4-20250514</ModelId> | <ModelId>claude-haiku-4-5-20251001</ModelId> | <ModelId>claude-3-5-haiku-20241022</ModelId> | <ModelId>claude-3-haiku-20240307</ModelId> |
|
||||
| **Comparative latency** | Fast | Fast | Fast | Moderately Fast | Moderately Fast | Fastest | Fastest | Fast |
|
||||
| **Context window** | <Tooltip tip="~150K words \ ~680K unicode characters">200K</Tooltip> / <br /> 1M (beta)<sup>1</sup> | <Tooltip tip="~150K words \ ~680K unicode characters">200K</Tooltip> / <br /> 1M (beta)<sup>1</sup> | <Tooltip tip="~150K words \ ~680K unicode characters">200K</Tooltip> | <Tooltip tip="~150K words \ ~680K unicode characters">200K</Tooltip> | <Tooltip tip="~150K words \ ~680K unicode characters">200K</Tooltip> | <Tooltip tip="~150K words \ ~680K unicode characters">200K</Tooltip> | <Tooltip tip="~150K words \ ~215K unicode characters">200K</Tooltip> | <Tooltip tip="~150K words \ ~680K unicode characters">200K</Tooltip> |
|
||||
| **Max output** | <Tooltip tip="~48K words \ 218K unicode characters \ ~100 single spaced pages">64000 tokens</Tooltip> | <Tooltip tip="~48K words \ 218K unicode characters \ ~100 single spaced pages">64000 tokens</Tooltip> | <Tooltip tip="~48K words \ 218K unicode characters \ ~100 single spaced pages">64000 tokens</Tooltip> | <Tooltip tip="~24K words \ 109K unicode characters \ ~50 single spaced pages">32000 tokens</Tooltip> | <Tooltip tip="~24K words \ 109K unicode characters \ ~50 single spaced pages">32000 tokens</Tooltip> | <Tooltip tip="~48K words \ 218K unicode characters \ ~100 single spaced pages">64000 tokens</Tooltip> | <Tooltip tip="~6.2K words \ 28K unicode characters \ ~12-13 single spaced pages">8192 tokens</Tooltip> | <Tooltip tip="~3.1K words \ 14K unicode characters \ ~6-7 single spaced pages">4096 tokens</Tooltip> |
|
||||
| **Reliable knowledge cutoff** | Jan 2025<sup>2</sup> | Jan 2025<sup>2</sup> | Oct 2024<sup>2</sup> | Jan 2025<sup>2</sup> | Jan 2025<sup>2</sup> | Feb 2025 | <sup>3</sup> | <sup>3</sup> |
|
||||
| **Training data cutoff** | Jul 2025 | Mar 2025 | Nov 2024 | Mar 2025 | Mar 2025 | Jul 2025 | Jul 2024 | Aug 2023 |
|
||||
|
||||
*<sup>1 - Claude Sonnet 4.5 and Claude Sonnet 4 support a [1M token context window](/en/docs/build-with-claude/context-windows#1m-token-context-window) when using the `context-1m-2025-08-07` beta header. [Long context pricing](/en/docs/about-claude/pricing#long-context-pricing) applies to requests exceeding 200K tokens.</sup>*
|
||||
|
||||
*<sup>2 - **Reliable knowledge cutoff** indicates the date through which a model's knowledge is most extensive and reliable. **Training data cutoff** is the broader date range of training data used. For example, Claude Sonnet 4.5 was trained on publicly available information through July 2025, but its knowledge is most extensive and reliable through January 2025. For more information, see [Anthropic's Transparency Hub](https://www.anthropic.com/transparency).</sup>*
|
||||
|
||||
*<sup>3 - Some Haiku models have a single training data cutoff date.</sup>*
|
||||
|
||||
<Note>
|
||||
Include the beta header `output-128k-2025-02-19` in your API request to increase the maximum output token length to 128k tokens for Claude Sonnet 3.7.
|
||||
|
||||
We strongly suggest using our [streaming Messages API](/en/docs/build-with-claude/streaming) to avoid timeouts when generating longer outputs.
|
||||
See our guidance on [long requests](/en/api/errors#long-requests) for more details.
|
||||
</Note>
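
The beta features mentioned above (`context-1m-2025-08-07` and `output-128k-2025-02-19`) are enabled through a request header. The sketch below only assembles the headers and sends nothing. The header names `x-api-key`, `anthropic-version`, and `anthropic-beta`, the version value, and the comma-separated form for multiple betas are assumptions drawn from the general Claude API documentation rather than from this page, and `build_headers` is a hypothetical helper.

```python
from typing import Dict, Optional, Sequence

API_VERSION = "2023-06-01"  # assumed API version string; confirm against the API reference


def build_headers(api_key: str, betas: Optional[Sequence[str]] = None) -> Dict[str, str]:
    """Assemble request headers, adding the beta header only when betas are requested."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": API_VERSION,
        "content-type": "application/json",
    }
    if betas:
        # Multiple beta flags are joined into one comma-separated header value.
        headers["anthropic-beta"] = ",".join(betas)
    return headers


if __name__ == "__main__":
    print(build_headers("YOUR_API_KEY", ["context-1m-2025-08-07"]))
```

Leaving `betas` empty omits the header entirely, so the request uses the standard 200K context window and default output limits.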
|
||||
|
||||
### Model pricing
|
||||
|
||||
The table below shows the price per million tokens for each model:
|
||||
|
||||
| Model | Base Input Tokens | 5m Cache Writes | 1h Cache Writes | Cache Hits & Refreshes | Output Tokens |
| -------------------------------------------------------------------------- | ----------------- | --------------- | --------------- | ---------------------- | ------------- |
| Claude Opus 4.1 | \$15 / MTok | \$18.75 / MTok | \$30 / MTok | \$1.50 / MTok | \$75 / MTok |
| Claude Opus 4 | \$15 / MTok | \$18.75 / MTok | \$30 / MTok | \$1.50 / MTok | \$75 / MTok |
| Claude Sonnet 4.5 | \$3 / MTok | \$3.75 / MTok | \$6 / MTok | \$0.30 / MTok | \$15 / MTok |
| Claude Sonnet 4 | \$3 / MTok | \$3.75 / MTok | \$6 / MTok | \$0.30 / MTok | \$15 / MTok |
| Claude Sonnet 3.7 | \$3 / MTok | \$3.75 / MTok | \$6 / MTok | \$0.30 / MTok | \$15 / MTok |
| Claude Sonnet 3.5 ([deprecated](/en/docs/about-claude/model-deprecations)) | \$3 / MTok | \$3.75 / MTok | \$6 / MTok | \$0.30 / MTok | \$15 / MTok |
| Claude Haiku 4.5 | \$1 / MTok | \$1.25 / MTok | \$2 / MTok | \$0.10 / MTok | \$5 / MTok |
| Claude Haiku 3.5 | \$0.80 / MTok | \$1 / MTok | \$1.6 / MTok | \$0.08 / MTok | \$4 / MTok |
| Claude Opus 3 ([deprecated](/en/docs/about-claude/model-deprecations)) | \$15 / MTok | \$18.75 / MTok | \$30 / MTok | \$1.50 / MTok | \$75 / MTok |
| Claude Haiku 3 | \$0.25 / MTok | \$0.30 / MTok | \$0.50 / MTok | \$0.03 / MTok | \$1.25 / MTok |
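
As a worked example of reading the table, the sketch below estimates the cost of a single uncached request from its token counts. The dictionary keys are hypothetical short labels (not API model IDs), only base input and output prices are included, and cache and long-context pricing are ignored.

```python
# USD per million tokens as (base input, output), copied from the pricing table above.
PRICES = {
    "opus-4.1": (15.00, 75.00),
    "sonnet-4.5": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
    "haiku-3.5": (0.80, 4.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one uncached request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # 200,000 input tokens and 10,000 output tokens on Claude Sonnet 4.5.
    print(f"${estimate_cost('sonnet-4.5', 200_000, 10_000):.2f}")
```

For that request the arithmetic is 200,000 × \$3 / 1,000,000 + 10,000 × \$15 / 1,000,000 = \$0.60 + \$0.15 = \$0.75.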
|
||||
|
||||
## Prompt and output performance
|
||||
|
||||
Claude 4 models excel in:
|
||||
|
||||
* **Performance**: Top-tier results in reasoning, coding, multilingual tasks, long-context handling, honesty, and image processing. See the [Claude 4 blog post](http://www.anthropic.com/news/claude-4) for more information.
|
||||
* **Engaging responses**: Claude models are ideal for applications that require rich, human-like interactions.

  * If you prefer more concise responses, you can adjust your prompts to guide the model toward the desired output length. Refer to our [prompt engineering guides](/en/docs/build-with-claude/prompt-engineering) for details.
  * For specific Claude 4 prompting best practices, see our [Claude 4 best practices guide](/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices).
* **Output quality**: When migrating from previous model generations to Claude 4, you may notice significant improvements in overall performance.
|
||||
|
||||
## Migrating to Claude 4.5
|
||||
|
||||
If you're currently using Claude 3 models, we recommend migrating to Claude 4.5 to take advantage of improved intelligence and enhanced capabilities. For detailed migration instructions, see [Migrating to Claude 4.5](/en/docs/about-claude/models/migrating-to-claude-4).
|
||||
|
||||
## Get started with Claude
|
||||
|
||||
If you're ready to start exploring what Claude can do for you, let's dive in! Whether you're a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we've got you covered.
|
||||
|
||||
<Note>Looking to chat with Claude? Visit [claude.ai](http://www.claude.ai)!</Note>
|
||||
|
||||
<CardGroup cols={3}>
|
||||
<Card title="Intro to Claude" icon="check" href="/en/docs/intro-to-claude">
|
||||
Explore Claude’s capabilities and development flow.
|
||||
</Card>
|
||||
|
||||
<Card title="Quickstart" icon="bolt-lightning" href="/en/resources/quickstarts">
|
||||
Learn how to make your first API call in minutes.
|
||||
</Card>
|
||||
|
||||
<Card title="Claude Console" icon="code" href="https://console.anthropic.com">
|
||||
Craft and test powerful prompts directly in your browser.
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
If you have any questions or need assistance, don't hesitate to reach out to our [support team](https://support.claude.com/) or consult the [Discord community](https://www.anthropic.com/discord).
|
||||
@@ -0,0 +1,376 @@
|
||||
# Plugins reference
|
||||
|
||||
> Complete technical reference for Claude Code plugin system, including schemas, CLI commands, and component specifications.
|
||||
|
||||
<Tip>
|
||||
For hands-on tutorials and practical usage, see [Plugins](/en/docs/claude-code/plugins). For plugin management across teams and communities, see [Plugin marketplaces](/en/docs/claude-code/plugin-marketplaces).
|
||||
</Tip>
|
||||
|
||||
This reference provides complete technical specifications for the Claude Code plugin system, including component schemas, CLI commands, and development tools.
|
||||
|
||||
## Plugin components reference
|
||||
|
||||
This section documents the five types of components that plugins can provide.
|
||||
|
||||
### Commands
|
||||
|
||||
Plugins add custom slash commands that integrate seamlessly with Claude Code's command system.
|
||||
|
||||
**Location**: `commands/` directory in plugin root
|
||||
|
||||
**File format**: Markdown files with frontmatter
|
||||
|
||||
For complete details on plugin command structure, invocation patterns, and features, see [Plugin commands](/en/docs/claude-code/slash-commands#plugin-commands).
|
||||
|
||||
### Agents
|
||||
|
||||
Plugins can provide specialized subagents for specific tasks that Claude can invoke automatically when appropriate.
|
||||
|
||||
**Location**: `agents/` directory in plugin root
|
||||
|
||||
**File format**: Markdown files describing agent capabilities
|
||||
|
||||
**Agent structure**:
|
||||
|
||||
```markdown theme={null}
---
description: What this agent specializes in
capabilities: ["task1", "task2", "task3"]
---

# Agent Name

Detailed description of the agent's role, expertise, and when Claude should invoke it.

## Capabilities
- Specific task the agent excels at
- Another specialized capability
- When to use this agent vs others

## Context and examples
Provide examples of when this agent should be used and what kinds of problems it solves.
```
|
||||
|
||||
**Integration points**:
|
||||
|
||||
* Agents appear in the `/agents` interface
|
||||
* Claude can invoke agents automatically based on task context
|
||||
* Agents can be invoked manually by users
|
||||
* Plugin agents work alongside built-in Claude agents
|
||||
|
||||
### Skills
|
||||
|
||||
Plugins can provide Agent Skills that extend Claude's capabilities. Skills are model-invoked—Claude autonomously decides when to use them based on the task context.
|
||||
|
||||
**Location**: `skills/` directory in plugin root
|
||||
|
||||
**File format**: Directories containing `SKILL.md` files with frontmatter
|
||||
|
||||
**Skill structure**:
|
||||
|
||||
```
skills/
├── pdf-processor/
│   ├── SKILL.md
│   ├── reference.md (optional)
│   └── scripts/ (optional)
└── code-reviewer/
    └── SKILL.md
```
|
||||
|
||||
**Integration behavior**:
|
||||
|
||||
* Plugin Skills are automatically discovered when the plugin is installed
|
||||
* Claude autonomously invokes Skills based on matching task context
|
||||
* Skills can include supporting files alongside SKILL.md
|
||||
|
||||
For SKILL.md format and complete Skill authoring guidance, see:
|
||||
|
||||
* [Use Skills in Claude Code](/en/docs/claude-code/skills)
|
||||
* [Agent Skills overview](/en/docs/agents-and-tools/agent-skills/overview#skill-structure)
|
||||
|
||||
### Hooks
|
||||
|
||||
Plugins can provide event handlers that respond to Claude Code events automatically.
|
||||
|
||||
**Location**: `hooks/hooks.json` in plugin root, or inline in plugin.json
|
||||
|
||||
**Format**: JSON configuration with event matchers and actions
|
||||
|
||||
**Hook configuration**:
|
||||
|
||||
```json theme={null}
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/scripts/format-code.sh"
          }
        ]
      }
    ]
  }
}
```
|
||||
|
||||
**Available events**:
|
||||
|
||||
* `PreToolUse`: Before Claude uses any tool
|
||||
* `PostToolUse`: After Claude uses any tool
|
||||
* `UserPromptSubmit`: When user submits a prompt
|
||||
* `Notification`: When Claude Code sends notifications
|
||||
* `Stop`: When Claude attempts to stop
|
||||
* `SubagentStop`: When a subagent attempts to stop
|
||||
* `SessionStart`: At the beginning of sessions
|
||||
* `SessionEnd`: At the end of sessions
|
||||
* `PreCompact`: Before conversation history is compacted
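
Because event names are plain strings in the hook configuration, a typo would be easy to miss. The sketch below checks a parsed `hooks.json` against the events listed above. The function name `unknown_hook_events` is hypothetical, and the event set is a copy of this list, so it may lag behind newer releases.

```python
# Event names copied from the "Available events" list on this page.
KNOWN_EVENTS = frozenset({
    "PreToolUse",
    "PostToolUse",
    "UserPromptSubmit",
    "Notification",
    "Stop",
    "SubagentStop",
    "SessionStart",
    "SessionEnd",
    "PreCompact",
})


def unknown_hook_events(config: dict) -> list:
    """Return event names under the "hooks" key that are not in the documented list."""
    return sorted(set(config.get("hooks", {})) - KNOWN_EVENTS)


if __name__ == "__main__":
    sample = {"hooks": {"PostToolUse": [], "PostTooluse": []}}
    print(unknown_hook_events(sample))
```

An empty result means every configured event name matches one of the documented events exactly, including capitalization.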
|
||||
|
||||
**Hook types**:
|
||||
|
||||
* `command`: Execute shell commands or scripts
|
||||
* `validation`: Validate file contents or project state
|
||||
* `notification`: Send alerts or status updates
|
||||
|
||||
### MCP servers
|
||||
|
||||
Plugins can bundle Model Context Protocol (MCP) servers to connect Claude Code with external tools and services.
|
||||
|
||||
**Location**: `.mcp.json` in plugin root, or inline in plugin.json
|
||||
|
||||
**Format**: Standard MCP server configuration
|
||||
|
||||
**MCP server configuration**:
|
||||
|
||||
```json theme={null}
{
  "mcpServers": {
    "plugin-database": {
      "command": "${CLAUDE_PLUGIN_ROOT}/servers/db-server",
      "args": ["--config", "${CLAUDE_PLUGIN_ROOT}/config.json"],
      "env": {
        "DB_PATH": "${CLAUDE_PLUGIN_ROOT}/data"
      }
    },
    "plugin-api-client": {
      "command": "npx",
      "args": ["@company/mcp-server", "--plugin-mode"],
      "cwd": "${CLAUDE_PLUGIN_ROOT}"
    }
  }
}
```
|
||||
|
||||
**Integration behavior**:
|
||||
|
||||
* Plugin MCP servers start automatically when the plugin is enabled
|
||||
* Servers appear as standard MCP tools in Claude's toolkit
|
||||
* Server capabilities integrate seamlessly with Claude's existing tools
|
||||
* Plugin servers can be configured independently of user MCP servers
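
To see what a configuration like the one above resolves to for a given install location, the `${CLAUDE_PLUGIN_ROOT}` placeholder can be expanded by hand. This sketch only illustrates the effect of the substitution; it is not how Claude Code implements it, and the helper name `expand_plugin_root` and the example path are hypothetical.

```python
from string import Template


def expand_plugin_root(value, plugin_root: str):
    """Recursively replace ${CLAUDE_PLUGIN_ROOT} in strings nested inside dicts and lists."""
    if isinstance(value, str):
        # safe_substitute leaves any other $placeholders untouched instead of raising.
        return Template(value).safe_substitute(CLAUDE_PLUGIN_ROOT=plugin_root)
    if isinstance(value, list):
        return [expand_plugin_root(item, plugin_root) for item in value]
    if isinstance(value, dict):
        return {key: expand_plugin_root(item, plugin_root) for key, item in value.items()}
    return value


if __name__ == "__main__":
    server = {
        "command": "${CLAUDE_PLUGIN_ROOT}/servers/db-server",
        "args": ["--config", "${CLAUDE_PLUGIN_ROOT}/config.json"],
    }
    print(expand_plugin_root(server, "/opt/plugins/my-plugin"))
```

Printing the expanded configuration is a quick way to confirm that the referenced server binary and config file exist at the resolved paths.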
|
||||
|
||||
***
|
||||
|
||||
## Plugin manifest schema
|
||||
|
||||
The `plugin.json` file defines your plugin's metadata and configuration. This section documents all supported fields and options.
|
||||
|
||||
### Complete schema
|
||||
|
||||
```json theme={null}
{
  "name": "plugin-name",
  "version": "1.2.0",
  "description": "Brief plugin description",
  "author": {
    "name": "Author Name",
    "email": "author@example.com",
    "url": "https://github.com/author"
  },
  "homepage": "https://docs.example.com/plugin",
  "repository": "https://github.com/author/plugin",
  "license": "MIT",
  "keywords": ["keyword1", "keyword2"],
  "commands": ["./custom/commands/special.md"],
  "agents": "./custom/agents/",
  "hooks": "./config/hooks.json",
  "mcpServers": "./mcp-config.json"
}
```
|
||||
|
||||
### Required fields
|
||||
|
||||
| Field | Type | Description | Example |
| :----- | :----- | :---------------------------------------- | :------------------- |
| `name` | string | Unique identifier (kebab-case, no spaces) | `"deployment-tools"` |
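
A quick check for the `name` constraint might look like the sketch below. The regular expression is one strict reading of "kebab-case, no spaces" (lowercase letters and digits separated by single hyphens); the exact rule Claude Code enforces is not stated here, and `is_valid_plugin_name` is a hypothetical helper.

```python
import re

# Lowercase alphanumeric segments joined by single hyphens (an assumed interpretation).
_KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_valid_plugin_name(name: str) -> bool:
    """Return True when name is kebab-case with no spaces, under the assumed rule."""
    return bool(_KEBAB_CASE.match(name))


if __name__ == "__main__":
    for candidate in ("deployment-tools", "Deployment Tools", "my--plugin"):
        print(candidate, is_valid_plugin_name(candidate))
```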
|
||||
|
||||
### Metadata fields
|
||||
|
||||
| Field | Type | Description | Example |
| :------------ | :----- | :---------------------------------- | :------------------------------------------------- |
| `version` | string | Semantic version | `"2.1.0"` |
| `description` | string | Brief explanation of plugin purpose | `"Deployment automation tools"` |
| `author` | object | Author information | `{"name": "Dev Team", "email": "dev@company.com"}` |
| `homepage` | string | Documentation URL | `"https://docs.example.com"` |
| `repository` | string | Source code URL | `"https://github.com/user/plugin"` |
| `license` | string | License identifier | `"MIT"`, `"Apache-2.0"` |
| `keywords` | array | Discovery tags | `["deployment", "ci-cd"]` |
|
||||
|
||||
### Component path fields
|
||||
|
||||
| Field | Type | Description | Example |
|
||||
| :----------- | :------------- | :----------------------------------- | :------------------------------------- |
|
||||
| `commands` | string\|array | Additional command files/directories | `"./custom/cmd.md"` or `["./cmd1.md"]` |
|
||||
| `agents` | string\|array | Additional agent files | `"./custom/agents/"` |
|
||||
| `hooks` | string\|object | Hook config path or inline config | `"./hooks.json"` |
|
||||
| `mcpServers` | string\|object | MCP config path or inline config | `"./mcp.json"` |
|
||||
|
||||
### Path behavior rules
|
||||
|
||||
**Important**: Custom paths supplement default directories - they don't replace them.
|
||||
|
||||
* If `commands/` exists, it's loaded in addition to custom command paths
|
||||
* All paths must be relative to plugin root and start with `./`
|
||||
* Commands from custom paths use the same naming and namespacing rules
|
||||
* Multiple paths can be specified as arrays for flexibility
|
||||
|
||||
**Path examples**:
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"commands": [
|
||||
"./specialized/deploy.md",
|
||||
"./utilities/batch-process.md"
|
||||
],
|
||||
"agents": [
|
||||
"./custom-agents/reviewer.md",
|
||||
"./custom-agents/tester.md"
|
||||
]
|
||||
}
|
||||
```
|
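The path rules above can be checked mechanically before a plugin is published. The following is a minimal sketch, not part of Claude Code: `normalizePaths` and `findInvalidPaths` are hypothetical helper names, and they check only the one rule stated here (every path starts with `./`). They cover the string-or-array form used by `commands` and `agents`, not the inline object form that `hooks` and `mcpServers` also accept.

```typescript
// Hypothetical helpers (not a Claude Code API) illustrating the path rules above.
type PathField = string | string[];

// Component path fields accept either a single string or an array of strings.
function normalizePaths(field: PathField | undefined): string[] {
  if (field === undefined) return [];
  return Array.isArray(field) ? field : [field];
}

// Returns every path that violates the "must start with ./" rule.
function findInvalidPaths(field: PathField | undefined): string[] {
  return normalizePaths(field).filter((p) => !p.startsWith("./"));
}

// Illustrative manifest fragment; the second command path is deliberately wrong.
const exampleManifest = {
  commands: ["./specialized/deploy.md", "/abs/batch-process.md"],
  agents: "./custom-agents/",
};

// The first call reports only the absolute path; the second reports nothing.
console.log(findInvalidPaths(exampleManifest.commands));
console.log(findInvalidPaths(exampleManifest.agents));
```

Normalizing to an array first keeps the check itself to a single filter, whichever form the manifest uses.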
||||
|
||||
### Environment variables
|
||||
|
||||
**`${CLAUDE_PLUGIN_ROOT}`**: Contains the absolute path to your plugin directory. Use this in hooks, MCP servers, and scripts to ensure correct paths regardless of installation location.
|
||||
|
||||
```json theme={null}
|
||||
{
|
||||
"hooks": {
|
||||
"PostToolUse": [
|
||||
{
|
||||
"hooks": [
|
||||
{
|
||||
"type": "command",
|
||||
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/process.sh"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
## Plugin directory structure
|
||||
|
||||
### Standard plugin layout
|
||||
|
||||
A complete plugin follows this structure:
|
||||
|
||||
```
|
||||
enterprise-plugin/
|
||||
├── .claude-plugin/ # Metadata directory
|
||||
│ └── plugin.json # Required: plugin manifest
|
||||
├── commands/ # Default command location
|
||||
│ ├── status.md
|
||||
│ └── logs.md
|
||||
├── agents/ # Default agent location
|
||||
│ ├── security-reviewer.md
|
||||
│ ├── performance-tester.md
|
||||
│ └── compliance-checker.md
|
||||
├── skills/ # Agent Skills
|
||||
│ ├── code-reviewer/
|
||||
│ │ └── SKILL.md
|
||||
│ └── pdf-processor/
|
||||
│ ├── SKILL.md
|
||||
│ └── scripts/
|
||||
├── hooks/ # Hook configurations
|
||||
│ ├── hooks.json # Main hook config
|
||||
│ └── security-hooks.json # Additional hooks
|
||||
├── .mcp.json # MCP server definitions
|
||||
├── scripts/ # Hook and utility scripts
|
||||
│ ├── security-scan.sh
|
||||
│ ├── format-code.py
|
||||
│ └── deploy.js
|
||||
├── LICENSE # License file
|
||||
└── CHANGELOG.md # Version history
|
||||
```
|
||||
|
||||
<Warning>
|
||||
The `.claude-plugin/` directory contains the `plugin.json` file. All other directories (commands/, agents/, skills/, hooks/) must be at the plugin root, not inside `.claude-plugin/`.
|
||||
</Warning>
|
||||
|
||||
### File locations reference
|
||||
|
||||
| Component | Default Location | Purpose |
|
||||
| :-------------- | :--------------------------- | :------------------------------- |
|
||||
| **Manifest** | `.claude-plugin/plugin.json` | Required metadata file |
|
||||
| **Commands** | `commands/` | Slash command markdown files |
|
||||
| **Agents** | `agents/` | Subagent markdown files |
|
||||
| **Skills** | `skills/` | Agent Skills with SKILL.md files |
|
||||
| **Hooks** | `hooks/hooks.json` | Hook configuration |
|
||||
| **MCP servers** | `.mcp.json` | MCP server definitions |
|
||||
|
||||
***
|
||||
|
||||
## Debugging and development tools
|
||||
|
||||
### Debugging commands
|
||||
|
||||
Use `claude --debug` to see plugin loading details:
|
||||
|
||||
```bash theme={null}
|
||||
claude --debug
|
||||
```
|
||||
|
||||
This shows:
|
||||
|
||||
* Which plugins are being loaded
|
||||
* Any errors in plugin manifests
|
||||
* Command, agent, and hook registration
|
||||
* MCP server initialization
|
||||
|
||||
### Common issues
|
||||
|
||||
| Issue | Cause | Solution |
|
||||
| :--------------------- | :------------------------------ | :--------------------------------------------------- |
|
||||
| Plugin not loading | Invalid `plugin.json` | Validate JSON syntax |
|
||||
| Commands not appearing | Wrong directory structure | Ensure `commands/` at root, not in `.claude-plugin/` |
|
||||
| Hooks not firing | Script not executable | Run `chmod +x script.sh` |
|
||||
| MCP server fails | Missing `${CLAUDE_PLUGIN_ROOT}` | Use variable for all plugin paths |
|
||||
| Path errors | Absolute paths used | All paths must be relative and start with `./` |
|
||||
|
||||
***
|
||||
|
||||
## Distribution and versioning reference
|
||||
|
||||
### Version management
|
||||
|
||||
Follow semantic versioning for plugin releases:
|
||||
|
||||
|
||||
## See also
|
||||
|
||||
- [Plugins](/en/docs/claude-code/plugins) - Tutorials and practical usage
|
||||
- [Plugin marketplaces](/en/docs/claude-code/plugin-marketplaces) - Creating and managing marketplaces
|
||||
- [Slash commands](/en/docs/claude-code/slash-commands) - Command development details
|
||||
- [Subagents](/en/docs/claude-code/sub-agents) - Agent configuration and capabilities
|
||||
- [Agent Skills](/en/docs/claude-code/skills) - Extend Claude's capabilities
|
||||
- [Hooks](/en/docs/claude-code/hooks) - Event handling and automation
|
||||
- [MCP](/en/docs/claude-code/mcp) - External tool integration
|
||||
- [Settings](/en/docs/claude-code/settings) - Configuration options for plugins
|
||||
|
||||
@@ -0,0 +1,295 @@
|
||||
# Streaming Input
|
||||
|
||||
> Understanding the two input modes for Claude Agent SDK and when to use each
|
||||
|
||||
## Overview
|
||||
|
||||
The Claude Agent SDK supports two distinct input modes for interacting with agents:
|
||||
|
||||
* **Streaming Input Mode** (Default & Recommended) - A persistent, interactive session
|
||||
* **Single Message Input** - One-shot queries that rely on session state and resuming to continue a conversation
|
||||
|
||||
This guide explains the differences, benefits, and use cases for each mode to help you choose the right approach for your application.
|
||||
|
||||
## Streaming Input Mode (Recommended)
|
||||
|
||||
Streaming input mode is the **preferred** way to use the Claude Agent SDK. It provides full access to the agent's capabilities and enables rich, interactive experiences.
|
||||
|
||||
It allows the agent to operate as a long-lived process that takes in user input, handles interruptions, surfaces permission requests, and manages sessions.
|
||||
|
||||
### How It Works
|
||||
|
||||
```mermaid theme={null}
|
||||
%%{init: {"theme": "base", "themeVariables": {"edgeLabelBackground": "#F0F0EB", "lineColor": "#91918D", "primaryColor": "#F0F0EB", "primaryTextColor": "#191919", "primaryBorderColor": "#D9D8D5", "secondaryColor": "#F5E6D8", "tertiaryColor": "#CC785C", "noteBkgColor": "#FAF0E6", "noteBorderColor": "#91918D"}, "sequence": {"actorMargin": 50, "width": 150, "height": 65, "boxMargin": 10, "boxTextMargin": 5, "noteMargin": 10, "messageMargin": 35}}}%%
|
||||
sequenceDiagram
|
||||
participant App as Your Application
|
||||
participant Agent as Claude Agent
|
||||
participant Tools as Tools/Hooks
|
||||
participant FS as Environment/<br/>File System
|
||||
|
||||
App->>Agent: Initialize with AsyncGenerator
|
||||
activate Agent
|
||||
|
||||
App->>Agent: Yield Message 1
|
||||
Agent->>Tools: Execute tools
|
||||
Tools->>FS: Read files
|
||||
FS-->>Tools: File contents
|
||||
Tools->>FS: Write/Edit files
|
||||
FS-->>Tools: Success/Error
|
||||
Agent-->>App: Stream partial response
|
||||
Agent-->>App: Stream more content...
|
||||
Agent->>App: Complete Message 1
|
||||
|
||||
App->>Agent: Yield Message 2 + Image
|
||||
Agent->>Tools: Process image & execute
|
||||
Tools->>FS: Access filesystem
|
||||
FS-->>Tools: Operation results
|
||||
Agent-->>App: Stream response 2
|
||||
|
||||
App->>Agent: Queue Message 3
|
||||
App->>Agent: Interrupt/Cancel
|
||||
Agent->>App: Handle interruption
|
||||
|
||||
Note over App,Agent: Session stays alive
|
||||
Note over Tools,FS: Persistent file system<br/>state maintained
|
||||
|
||||
deactivate Agent
|
||||
```
|
||||
|
||||
### Benefits
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="Image Uploads" icon="image">
|
||||
Attach images directly to messages for visual analysis and understanding
|
||||
</Card>
|
||||
|
||||
<Card title="Queued Messages" icon="layer-group">
|
||||
Send multiple messages that process sequentially, with ability to interrupt
|
||||
</Card>
|
||||
|
||||
<Card title="Tool Integration" icon="wrench">
|
||||
Full access to all tools and custom MCP servers during the session
|
||||
</Card>
|
||||
|
||||
<Card title="Hooks Support" icon="link">
|
||||
Use lifecycle hooks to customize behavior at various points
|
||||
</Card>
|
||||
|
||||
<Card title="Real-time Feedback" icon="bolt">
|
||||
See responses as they're generated, not just final results
|
||||
</Card>
|
||||
|
||||
<Card title="Context Persistence" icon="database">
|
||||
Maintain conversation context across multiple turns naturally
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
### Implementation Example
|
||||
|
||||
<CodeGroup>
|
||||
```typescript TypeScript theme={null}
|
||||
import { query } from "@anthropic-ai/claude-agent-sdk";
|
||||
import { readFileSync } from "fs";
|
||||
|
||||
async function* generateMessages() {
|
||||
// First message
|
||||
yield {
|
||||
type: "user" as const,
|
||||
message: {
|
||||
role: "user" as const,
|
||||
content: "Analyze this codebase for security issues"
|
||||
}
|
||||
};
|
||||
|
||||
// Wait for conditions or user input
|
||||
await new Promise(resolve => setTimeout(resolve, 2000));
|
||||
|
||||
// Follow-up with image
|
||||
yield {
|
||||
type: "user" as const,
|
||||
message: {
|
||||
role: "user" as const,
|
||||
content: [
|
||||
{
|
||||
type: "text",
|
||||
text: "Review this architecture diagram"
|
||||
},
|
||||
{
|
||||
type: "image",
|
||||
source: {
|
||||
type: "base64",
|
||||
media_type: "image/png",
|
||||
data: readFileSync("diagram.png", "base64")
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
};
|
||||
}
|
||||
|
||||
// Process streaming responses
|
||||
for await (const message of query({
|
||||
prompt: generateMessages(),
|
||||
options: {
|
||||
maxTurns: 10,
|
||||
allowedTools: ["Read", "Grep"]
|
||||
}
|
||||
})) {
|
||||
if (message.type === "result") {
|
||||
console.log(message.result);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```python Python theme={null}
|
||||
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions, AssistantMessage, TextBlock
|
||||
import asyncio
|
||||
import base64
|
||||
|
||||
async def streaming_analysis():
|
||||
async def message_generator():
|
||||
# First message
|
||||
yield {
|
||||
"type": "user",
|
||||
"message": {
|
||||
"role": "user",
|
||||
"content": "Analyze this codebase for security issues"
|
||||
}
|
||||
}
|
||||
|
||||
# Wait for conditions
|
||||
await asyncio.sleep(2)
|
||||
|
||||
# Follow-up with image
|
||||
with open("diagram.png", "rb") as f:
|
||||
image_data = base64.b64encode(f.read()).decode()
|
||||
|
||||
yield {
|
||||
"type": "user",
|
||||
"message": {
|
||||
"role": "user",
|
||||
"content": [
|
||||
{
|
||||
"type": "text",
|
||||
"text": "Review this architecture diagram"
|
||||
},
|
||||
{
|
||||
"type": "image",
|
||||
"source": {
|
||||
"type": "base64",
|
||||
"media_type": "image/png",
|
||||
"data": image_data
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
# Use ClaudeSDKClient for streaming input
|
||||
options = ClaudeAgentOptions(
|
||||
max_turns=10,
|
||||
allowed_tools=["Read", "Grep"]
|
||||
)
|
||||
|
||||
async with ClaudeSDKClient(options) as client:
|
||||
# Send streaming input
|
||||
await client.query(message_generator())
|
||||
|
||||
# Process responses
|
||||
async for message in client.receive_response():
|
||||
if isinstance(message, AssistantMessage):
|
||||
for block in message.content:
|
||||
if isinstance(block, TextBlock):
|
||||
print(block.text)
|
||||
|
||||
asyncio.run(streaming_analysis())
|
||||
```
|
||||
</CodeGroup>
|
||||
|
||||
## Single Message Input
|
||||
|
||||
Single message input is simpler but more limited.
|
||||
|
||||
### When to Use Single Message Input
|
||||
|
||||
Use single message input when:
|
||||
|
||||
* You need a one-shot response
|
||||
* You do not need image attachments, hooks, etc.
|
||||
* You need to operate in a stateless environment, such as a lambda function
|
||||
|
||||
### Limitations
|
||||
|
||||
<Warning>
|
||||
Single message input mode does **not** support:
|
||||
|
||||
* Direct image attachments in messages
|
||||
* Dynamic message queueing
|
||||
* Real-time interruption
|
||||
* Hook integration
|
||||
* Natural multi-turn conversations
|
||||
</Warning>
|
||||
|
||||
### Implementation Example
|
||||
|
||||
<CodeGroup>
|
||||
```typescript TypeScript theme={null}
|
||||
import { query } from "@anthropic-ai/claude-agent-sdk";
|
||||
|
||||
// Simple one-shot query
|
||||
for await (const message of query({
|
||||
prompt: "Explain the authentication flow",
|
||||
options: {
|
||||
maxTurns: 1,
|
||||
allowedTools: ["Read", "Grep"]
|
||||
}
|
||||
})) {
|
||||
if (message.type === "result") {
|
||||
console.log(message.result);
|
||||
}
|
||||
}
|
||||
|
||||
// Continue conversation with session management
|
||||
for await (const message of query({
|
||||
prompt: "Now explain the authorization process",
|
||||
options: {
|
||||
continue: true,
|
||||
maxTurns: 1
|
||||
}
|
||||
})) {
|
||||
if (message.type === "result") {
|
||||
console.log(message.result);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```python Python theme={null}
|
||||
from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage
|
||||
import asyncio
|
||||
|
||||
async def single_message_example():
|
||||
# Simple one-shot query using query() function
|
||||
async for message in query(
|
||||
prompt="Explain the authentication flow",
|
||||
options=ClaudeAgentOptions(
|
||||
max_turns=1,
|
||||
allowed_tools=["Read", "Grep"]
|
||||
)
|
||||
):
|
||||
if isinstance(message, ResultMessage):
|
||||
print(message.result)
|
||||
|
||||
# Continue conversation with session management
|
||||
async for message in query(
|
||||
prompt="Now explain the authorization process",
|
||||
options=ClaudeAgentOptions(
|
||||
continue_conversation=True,
|
||||
max_turns=1
|
||||
)
|
||||
):
|
||||
if isinstance(message, ResultMessage):
|
||||
print(message.result)
|
||||
|
||||
asyncio.run(single_message_example())
|
||||
```
|
||||
</CodeGroup>
|
||||
@@ -0,0 +1,259 @@
|
||||
**Claude finished @thedotmack's task** —— [View job](https://github.com/thedotmack/claude-mem/actions/runs/19080917196)
|
||||
|
||||
---
|
||||
### Code Review Complete
|
||||
|
||||
This PR implements significant architectural improvements focused on production stability. The refactoring successfully simplifies the worker service and eliminates several failure modes. Below is my detailed review:
|
||||
|
||||
---
|
||||
|
||||
### ✅ Strengths
|
||||
|
||||
#### 1. **Excellent Simplification of Worker Service Startup** (src/services/worker-service.ts:115-147)
|
||||
The new startup sequence is a major improvement:
|
||||
- HTTP server starts **first** before any blocking operations
|
||||
- ChromaSync initialization is deferred until after HTTP is ready
|
||||
- Chroma backfill runs in background (non-blocking)
|
||||
- Failures no longer crash the entire service
|
||||
|
||||
**Before**: ChromaSync initialized in constructor → Chroma backfill blocks startup → Any failure prevents HTTP server from starting
|
||||
|
||||
**After**: HTTP server starts immediately → ChromaSync initialized → Chroma backfill in background → Service always available
|
||||
|
||||
This is excellent engineering. The worker will now be responsive even if Chroma has issues.
|
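The ordering described above can be sketched as follows. This is an illustrative simplification, not the code from `worker-service.ts`: `StartupSteps`, `startWorker`, and the three callbacks are hypothetical names standing in for the real HTTP, ChromaSync, and backfill logic, and both Chroma steps are pushed into the background here.

```typescript
// Hypothetical sketch of "HTTP first, Chroma afterwards"; not the actual worker code.
interface StartupSteps {
  listen: () => Promise<void>;     // start the HTTP server
  initChroma: () => Promise<void>; // create the Chroma sync client
  backfill: () => Promise<void>;   // long-running Chroma backfill
}

async function startWorker(
  steps: StartupSteps,
  log: (msg: string) => void
): Promise<{ background: Promise<void> }> {
  // 1. The HTTP server comes up before anything that can block or fail.
  await steps.listen();
  log("http ready");

  // 2. Chroma work runs in the background; failures are logged, never rethrown,
  //    so the service stays available even when Chroma is down.
  const background = (async () => {
    try {
      await steps.initChroma();
      await steps.backfill();
      log("chroma ready");
    } catch (err) {
      log(`chroma unavailable: ${err instanceof Error ? err.message : String(err)}`);
    }
  })();

  return { background };
}

// Usage: a Chroma failure is logged but startWorker still resolves.
startWorker(
  {
    listen: async () => {},
    initChroma: async () => { throw new Error("connection refused"); },
    backfill: async () => {},
  },
  (msg) => console.log(msg)
);
```

Returning the background promise lets callers (or tests) wait for the deferred work without making startup depend on it.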
||||
|
||||
#### 2. **Smart Executable Path Caching** (src/services/worker-service.ts:22-61)
|
||||
Adding `cachedClaudePath` prevents repeated `execSync` calls on every SDK agent initialization. This reduces filesystem overhead and improves performance. Good optimization.
|
||||
|
||||
#### 3. **Improved Error Handling in Hooks** (src/hooks/save-hook.ts:84-91, new-hook.ts:62-69, summary-hook.ts:65-72)
|
||||
The error handling now distinguishes connection errors from HTTP errors:
|
||||
```typescript
|
||||
if (error.cause?.code === 'ECONNREFUSED' || error.name === 'TimeoutError' || error.message.includes('fetch failed')) {
|
||||
throw new Error("There's a problem with the worker. If you just updated, type `pm2 restart claude-mem-worker` in your terminal to continue");
|
||||
}
|
||||
```
|
||||
This provides actionable error messages to users instead of generic failures.
|
||||
|
||||
#### 4. **Dramatic PM2 Configuration Simplification** (ecosystem.config.cjs)
|
||||
The PM2 config went from 44 lines with complex error/output redirection down to 26 lines. This is exactly right - PM2's defaults are sensible and the previous configuration was adding complexity without clear benefit.
|
||||
|
||||
#### 5. **Worker Management Simplification** (src/shared/worker-utils.ts)
|
||||
Removed the async health check system (124 lines → 48 lines). The new synchronous approach using PM2 list output is simpler and more reliable:
|
||||
- No HTTP health checks that can timeout
|
||||
- No retry logic with arbitrary delays
|
||||
- Just checks if PM2 reports the process as 'online'
|
||||
|
||||
---
|
||||
|
||||
### ⚠️ Issues & Concerns
|
||||
|
||||
#### 1. **Race Condition in Worker Health Check** (src/shared/worker-utils.ts:28-39) - **CRITICAL**
|
||||
|
||||
There's a race condition in the worker startup check:
|
||||
|
||||
```typescript
|
||||
checkProcess.on('close', (code) => {
|
||||
const isRunning = output.includes('claude-mem-worker') && output.includes('online');
|
||||
|
||||
if (!isRunning) {
|
||||
// Only start if not already running
|
||||
spawn(pm2Path, ['start', ecosystemPath], {
|
||||
cwd: packageRoot,
|
||||
stdio: 'ignore'
|
||||
});
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
**Problem**: The `spawn()` call inside the `close` event handler is non-blocking, and the enclosing function returns immediately, so the hooks calling `ensureWorkerRunning()` continue execution **before** the worker has actually started.
|
||||
|
||||
**Impact**: Hooks may attempt to make HTTP requests to the worker before it's listening, resulting in connection failures.
|
||||
|
||||
**Suggested Fix**: Either:
|
||||
- Make `ensureWorkerRunning()` async and await the health check
|
||||
- Add a brief health check retry loop after spawning (like the old implementation had)
|
||||
- Document that this is fire-and-forget and hooks should handle ECONNREFUSED
|
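The second option (a short retry loop after spawning) could look roughly like this. It is a sketch under stated assumptions: `waitForWorker` and `Probe` are hypothetical names, and the probe is injected so the example stays self-contained; in the real hooks it would presumably be an HTTP request to the worker's health endpoint.

```typescript
// Hypothetical readiness check; `Probe` stands in for a request to the worker's health endpoint.
type Probe = () => Promise<boolean>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls the probe until it reports ready or the attempts run out.
async function waitForWorker(probe: Probe, attempts = 10, delayMs = 100): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      if (await probe()) return true;
    } catch {
      // Connection refused or timed out: the worker is not listening yet.
    }
    if (i < attempts - 1) await sleep(delayMs);
  }
  return false;
}

// Usage: a probe that only succeeds on the third call, simulating a slow start.
let probeCalls = 0;
const flakyProbe: Probe = async () => {
  probeCalls += 1;
  if (probeCalls < 3) throw new Error("ECONNREFUSED");
  return true;
};

waitForWorker(flakyProbe, 5, 10).then((ready) =>
  console.log(ready ? "worker ready" : "worker not ready")
);
```

Bounding the attempts keeps the hook from hanging indefinitely; the caller still has to decide what to do when the function returns `false`.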
||||
|
||||
#### 2. **Removed Health Endpoint Information** (src/services/worker-service.ts:152-154)
|
||||
|
||||
The health endpoint was simplified from:
|
||||
```typescript
|
||||
// OLD
|
||||
res.json({
|
||||
status: 'ok',
|
||||
port: this.port,
|
||||
pid: process.pid,
|
||||
activeSessions: this.sessions.size,
|
||||
chromaSynced: true
|
||||
});
|
||||
```
|
||||
|
||||
To:
|
||||
```typescript
|
||||
// NEW
|
||||
res.json({ status: 'ok' });
|
||||
```
|
||||
|
||||
**Concern**: This removes useful debugging information. When troubleshooting production issues, knowing the PID, active sessions count, and port would be valuable. The health endpoint is typically used for diagnostics, not just uptime checks.
|
||||
|
||||
**Suggestion**: Consider keeping at least `pid` and `activeSessions` in the health response for observability.
|
||||
|
||||
#### 3. **Auto-Session Creation Without Validation** (src/services/worker-service.ts:252-280, 309-336)
|
||||
|
||||
Both `handleObservation` and `handleSummarize` auto-create sessions if they don't exist:
|
||||
|
||||
```typescript
|
||||
if (!session) {
|
||||
// Auto-create session if it doesn't exist (e.g., worker restarted)
|
||||
const db = new SessionStore();
|
||||
const dbSession = db.getSessionById(sessionDbId);
|
||||
db.close();
|
||||
|
||||
session = {
|
||||
sessionDbId,
|
||||
claudeSessionId: dbSession!.claude_session_id, // ⚠️ Non-null assertion
|
||||
// ...
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
**Concerns**:
|
||||
- Uses non-null assertion (`dbSession!`) without checking if `dbSession` is actually null
|
||||
- If `getSessionById()` returns null (session doesn't exist in DB), this will throw at runtime
|
||||
- The error won't be caught and will crash the handler
|
||||
|
||||
**Suggested Fix**:
|
||||
```typescript
|
||||
const dbSession = db.getSessionById(sessionDbId);
|
||||
if (!dbSession) {
|
||||
db.close();
|
||||
res.status(404).json({ error: 'Session not found in database' });
|
||||
return;
|
||||
}
|
||||
```
|
||||
|
||||
This validation already exists in `handleInit` (lines 167-173), so it should be added to the other handlers too.
|
||||
|
||||
#### 4. **Removed Observation Counter** (src/services/worker-service.ts:80)
|
||||
|
||||
The `ActiveSession` interface no longer tracks `observationCounter`:
|
||||
```typescript
|
||||
// REMOVED: observationCounter: number; // Counter for correlation IDs
|
||||
```
|
||||
|
||||
**Question**: Was this used for generating correlation IDs for logging? If so, is there now no way to correlate observations within a session for debugging?
|
||||
|
||||
**Impact**: Minor - mainly affects debugging/logging. Not a functional issue.
|
||||
|
||||
#### 5. **PM2 Watch Mode in Production** (ecosystem.config.cjs:16)
|
||||
|
||||
The config enables watch mode:
|
||||
```javascript
|
||||
watch: true,
|
||||
```
|
||||
|
||||
**Concern**: Watch mode causes PM2 to restart the process whenever files change. This is useful during development but potentially problematic in production:
|
||||
- Could cause unexpected restarts if files are touched
|
||||
- May interrupt active sessions
|
||||
- Increases resource usage (file watching)
|
||||
|
||||
**Suggestion**: Consider disabling watch mode by default, or document that users should disable it in production if they're deploying this.
|
||||
|
||||
#### 6. **Duplicate Port Constant** (Multiple Files)
|
||||
|
||||
The `FIXED_PORT` constant is defined in multiple places:
|
||||
- `src/services/worker-service.ts:20`
|
||||
- `src/shared/worker-utils.ts:5`
|
||||
- `src/hooks/save-hook.ts:54`
|
||||
- `src/hooks/new-hook.ts:47`
|
||||
- `src/hooks/summary-hook.ts:39`
|
||||
|
||||
**Concern**: This creates maintenance burden and risk of inconsistency. If the port needs to change, it must be updated in 5 places.
|
||||
|
||||
**Suggestion**: Export `FIXED_PORT` from a single shared module (e.g., `worker-utils.ts`) and import it everywhere else.
|
||||
|
||||
---
|
||||
|
||||
### 🔍 Minor Issues
|
||||
|
||||
#### 7. **Error Handling Inconsistency in Chroma Sync** (src/services/worker-service.ts:220-223)
|
||||
|
||||
```typescript
|
||||
).catch(err => {
|
||||
logger.failure('WORKER', 'Failed to sync user_prompt to Chroma - continuing', { promptId: latestPrompt.id }, err);
|
||||
// Don't crash - SQLite has the data
|
||||
});
|
||||
```
|
||||
|
||||
The logger method is `failure` here but `error` elsewhere (lines 625, 664). For consistency, these should all use the same log level for Chroma sync failures.
|
||||
|
||||
#### 8. **Type Safety in Error Handling** (src/hooks/save-hook.ts:84)
|
||||
|
||||
```typescript
|
||||
} catch (error: any) {
|
||||
```
|
||||
|
||||
Using `any` defeats type safety. Consider `unknown` with explicit narrowing (the cast below is the minimal version; a real type guard would be stricter):
|
||||
```typescript
|
||||
} catch (error: unknown) {
|
||||
const err = error as Error & { cause?: { code?: string } };
|
||||
if (err.cause?.code === 'ECONNREFUSED' || err.name === 'TimeoutError' || err.message.includes('fetch failed')) {
|
||||
// ...
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The `catch (error: any)` pattern appears in all three hook files.
|
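Since the same check would be needed in three hooks, one option is a shared helper that narrows from `unknown` without a blanket cast. The sketch below is an assumption about how that might look: `isWorkerConnectionError` and `ErrorWithCause` are hypothetical names, and the three conditions simply mirror the hook code quoted earlier in this review.

```typescript
// Hypothetical shared helper; the three conditions mirror the hook code quoted in this review.
interface ErrorWithCause extends Error {
  cause?: { code?: string };
}

function isWorkerConnectionError(error: unknown): boolean {
  // Anything that is not an Error (strings, null, plain objects) is rejected up front.
  if (!(error instanceof Error)) return false;

  const cause = (error as ErrorWithCause).cause;
  return (
    cause?.code === "ECONNREFUSED" ||
    error.name === "TimeoutError" ||
    error.message.includes("fetch failed")
  );
}

// Usage inside a catch block:
try {
  throw Object.assign(new Error("request failed"), { cause: { code: "ECONNREFUSED" } });
} catch (error: unknown) {
  if (isWorkerConnectionError(error)) {
    console.log("worker unreachable");
  } else {
    throw error;
  }
}
```

The `instanceof` check does the real narrowing; the remaining cast is limited to reading the optional `cause` field.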
||||
|
||||
---
|
||||
|
||||
### 📋 Documentation Cleanup
|
||||
|
||||
The deletion of obsolete documentation files is appropriate:
|
||||
- `EXPERIMENTAL_RELEASE_NOTES.md` (331 lines)
|
||||
- `FEATURE_PLAN_HYBRID_SEARCH.md` (486 lines)
|
||||
- `GITHUB_RELEASE_TEMPLATE.md` (83 lines)
|
||||
- `IMPLEMENTATION_STATUS.md` (503 lines)
|
||||
- `NEXT_SESSION_PROMPT.md` (193 lines)
|
||||
- `docs/coderabbit-PR-41.md` (314 lines)
|
||||
|
||||
These appear to be planning documents and old release artifacts. Good cleanup, though consider archiving rather than deleting if there's historical value.
|
||||
|
||||
---
|
||||
|
||||
### 🎯 Testing Recommendations
|
||||
|
||||
In addition to the testing recommendations in the PR description:
|
||||
|
||||
1. **Test race condition scenario**: Start a fresh session immediately after a `pm2 stop` to verify hooks handle ECONNREFUSED gracefully
|
||||
2. **Test worker restart under load**: Restart worker while observations are being processed to verify auto-session-creation works
|
||||
3. **Test Chroma failures**: Disconnect/break Chroma to verify worker stays alive and logs errors appropriately
|
||||
4. **Verify PM2 watch mode**: Ensure watch mode doesn't cause problems with the data directory or node_modules changes
|
||||
|
||||
---
|
||||
|
||||
### ✅ Overall Assessment
|
||||
|
||||
**Verdict**: **Approve with recommendations**
|
||||
|
||||
This PR represents a significant improvement in production stability and code maintainability. The architectural changes are sound and address real production failure modes. The primary concerns are:
|
||||
|
||||
1. **Race condition in worker startup** (Critical - should be fixed)
|
||||
2. **Missing null checks in auto-session creation** (High - could cause crashes)
|
||||
3. **Reduced observability in health endpoint** (Medium - impacts debugging)
|
||||
4. **Watch mode in production** (Low - document or disable)
|
||||
|
||||
The simplifications are well-justified and the error handling improvements make the system more resilient. Once the critical race condition and null check issues are addressed, this will be a solid improvement.
|
||||
|
||||
**Lines of code review**: The net reduction of 1,855 lines (2,335 deletions - 480 additions) reflects meaningful simplification rather than just removed features.
|
||||
|
||||
---
|
||||
|
||||
#### Todo List
|
||||
- [x] Read and analyze core service files (worker-service.ts, worker-utils.ts)
|
||||
- [x] Review hook changes (save-hook.ts, new-hook.ts, summary-hook.ts, cleanup-hook.ts)
|
||||
- [x] Review PM2 configuration changes (ecosystem.config.cjs)
|
||||
- [x] Analyze architectural changes and potential issues
|
||||
- [x] Provide comprehensive review feedback
|
||||
|
||||
---
|
||||
@@ -0,0 +1,616 @@
|
||||
# JIT Context Filtering: Post-Mortem
|
||||
|
||||
**Date:** November 9, 2025
|
||||
**Duration:** 3.5 hours (7:45 PM - 11:11 PM)
|
||||
**Branches:** `feature/jit-context`, `failed/jit-context`
|
||||
**Status:** Failed, reverted to main
|
||||
**Commits:**
|
||||
- `3ac0790` - feat: Implement JIT context hook for user prompt submission
|
||||
- `adf7bf4` - Refactor JIT context handling in SDKAgent and WorkerService
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Attempted to implement JIT (Just-In-Time) context filtering—a feature that would dynamically generate relevant context timelines on every user prompt, potentially replacing the static session-start context entirely. After multiple architectural iterations spanning 3.5 hours and adding ~2,850 lines of code, the implementation was abandoned and reverted. The revert was not due to lack of vision (the feature aligns with long-term architectural goals), but due to implementation complexity and the need for a simpler initial approach. Significant architectural knowledge was gained about hook limitations, worker patterns, and proper separation of concerns.
|
||||
|
||||
## What We Tried to Build
|
||||
|
||||
### Goal
|
||||
When a user submits a prompt, dynamically generate a relevant context timeline instead of the static session-start context. Use the fast search infrastructure (SQLite FTS5 + ChromaDB) to fetch precisely relevant context on-demand.
|
||||
|
||||
### The Vision
|
||||
**Current approach:** SessionStart hook loads 50 recent observations blindly, displays them all.
|
||||
|
||||
**Proposed approach:** UserPromptSubmit hook analyzes the prompt, queries the timeline search API, and loads only the relevant context window dynamically.
|
||||
|
||||
**Why this makes sense:**
|
||||
- We already have fast search: SQLite FTS5 + Chroma semantic search
|
||||
- Dynamic context timeline search is implemented and tested
|
||||
- Search results come back in <200ms
|
||||
- Could **replace** session-start context entirely with smarter, prompt-specific context
|
||||
|
||||
### User Experience
|
||||
```
|
||||
User types: "How did we fix the authentication bug?"
|
||||
|
||||
Behind the scenes:
|
||||
1. Analyze prompt: "authentication bug fix"
|
||||
2. Query timeline search for relevant period
|
||||
3. Load 5-10 observations from that specific timeline
|
||||
4. Inject as context
|
||||
5. Claude answers with precisely relevant historical context
|
||||
|
||||
vs. Current:
|
||||
Load 50 most recent observations regardless of relevance
|
||||
```
|
||||
|
||||
### Why Checkbox Settings Became Less Important
|
||||
The original request was for checkboxes to customize the session-start context display. But if JIT context could replace session-start context with intelligent, prompt-specific timelines, display customization became a non-issue.
|
||||
|
||||
## Architectural Attempts
|
||||
|
||||
### Attempt 1: Hook-Based Filtering (7:45 PM - 9:30 PM)
|
||||
|
||||
**Approach:** Call Agent SDK `query()` directly in `new-hook.ts` during UserPromptSubmit event.
|
||||
|
||||
**Implementation:**
|
||||
- Created `jit-context-hook.ts` (~432 lines)
|
||||
- Added `generateJitContext()` function in hook
|
||||
- Called SDK `query()` with observation list and user prompt
|
||||
- Expected hook to block for ~1-2s while Haiku filters
|
||||
|
||||
**Failure:**
|
||||
```
|
||||
Error: Claude Code executable not found at
|
||||
/Users/alexnewman/.claude/plugins/marketplaces/thedotmack/plugin/scripts/cli.js
|
||||
```
|
||||
|
||||
**Root Cause:** Hooks run in a sandboxed environment without access to `claudePath` (the path to the Claude Code executable). The Agent SDK requires this path, which is only available in the worker service.
|
||||
|
||||
**Architectural Violation:** This broke the established pattern where hooks handle orchestration and workers handle AI processing. The `save-hook` sets the precedent: hooks capture data, send to worker, worker runs SDK queries asynchronously.
|
||||
|
||||
### Attempt 2: Worker-Based with Simple Queries (9:30 PM - 10:30 PM)
|
||||
|
||||
**Approach:** Move JIT filtering to worker service, keep it simple with per-request SDK queries.
|
||||
|
||||
**Implementation:**
|
||||
- Documented architecture fix plan in `docs/jit-context-architecture-fix.md`
|
||||
- Moved `generateJitContext()` to worker (considered creating `src/services/worker/JitContext.ts`)
|
||||
- Modified `/sessions/:id/init` endpoint to accept `jitEnabled` flag
|
||||
- Worker would run one-shot SDK query per prompt
|
||||
|
||||
**Architecture:**
|
||||
```
|
||||
UserPromptSubmit → new-hook → POST /sessions/:id/init { jitEnabled: true }
|
||||
↓
|
||||
Worker spawns Claude Haiku
|
||||
↓
|
||||
Filters 50 obs → 3-5 IDs
|
||||
↓
|
||||
Returns { context: [...] }
|
||||
↓
|
||||
Hook injects context → Claude
|
||||
```
|
||||
|
||||
**Issues Identified:**
|
||||
- Each filter request spawns a new Claude subprocess (~200-500ms overhead)
|
||||
- Observation list re-sent on every prompt (~5-10KB per request)
|
||||
- No token caching between requests
|
||||
- Performance worse than just loading all observations directly
|
||||
|
||||
**Decision:** Pivoted to persistent sessions to solve performance issues.
|
||||
|
||||
### Attempt 3: Persistent JIT Sessions (10:30 PM - 11:11 PM)
|
||||
|
||||
**Approach:** Create a long-lived Agent SDK session that persists throughout the user session, similar to the main memory session pattern.
|
||||
|
||||
**Implementation (291 new lines in SDKAgent.ts):**
|
||||
|
||||
1. **Session Lifecycle:**
|
||||
- Added `jitSessionId`, `jitAbortController`, `jitGeneratorPromise` to `ActiveSession` interface
|
||||
- `startJitSession()`: Creates persistent SDK session at session init
|
||||
- `cleanupJitSession()`: Terminates JIT session at session end
|
||||
|
||||
2. **Request Queue Architecture:**
|
||||
- `jitFilterQueues` Map: Per-session request queues
|
||||
- `JITFilterRequest` interface: `{ userPrompt, resolve, reject }`
|
||||
- EventEmitter coordination: Wake generator when new requests arrive
|
||||
|
||||
3. **Message Generator Pattern:**
|
||||
- `createJitMessageGenerator()`: Async generator that yields filter requests
|
||||
- Initial prompt: Load 50 observations, wait for "READY" response
|
||||
- Loop: Wait for EventEmitter signal → yield user prompt → parse response → resolve promise
|
||||
- Pattern: Persistent session stays alive between requests
|
||||
|
||||
4. **Filter Query Flow:**
|
||||
```typescript
|
||||
runFilterQuery(sessionDbId, userPrompt) {
|
||||
// Queue request
|
||||
queue.requests.push({ userPrompt, resolve, reject });
|
||||
queue.emitter.emit('request');
|
||||
|
||||
// Wait for response (30s timeout)
|
||||
return Promise.race([
|
||||
new Promise((resolve, reject) => { /* queued */ }),
|
||||
timeout(30000)
|
||||
]);
|
||||
}
|
||||
```
|
||||
|
||||
5. **Response Processing:**
|
||||
- `processJitFilterResponse()`: Accumulate streaming text
|
||||
- Parse IDs: "1,5,23,41" or "NONE"
|
||||
- Resolve queued promise with ID array
|
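The response parsing step can be sketched as a small pure function. This is an illustration rather than the reverted `processJitFilterResponse()` code: `parseFilterResponse` is a hypothetical name, and dropping malformed tokens (instead of throwing) is an assumption.

```typescript
// Parses a filter reply such as "1,5,23,41" or "NONE" into observation IDs.
// Unparseable tokens are dropped rather than thrown, so a malformed reply
// degrades to fewer (or zero) IDs instead of an error.
function parseFilterResponse(raw: string): number[] {
  const text = raw.trim();
  if (text === "" || text.toUpperCase() === "NONE") {
    return [];
  }
  const ids = text
    .split(",")
    .map((token) => token.trim())
    .filter((token) => /^\d+$/.test(token))
    .map((token) => Number(token));
  // Remove duplicates while preserving first-seen order.
  return ids.filter((id, index) => ids.indexOf(id) === index);
}
```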
||||
|
||||
**Files Changed:**
|
||||
- `src/services/worker/SDKAgent.ts`: +291 lines
|
||||
- `src/services/worker-types.ts`: +3 fields (jit state tracking)
|
||||
- `src/services/worker/SessionManager.ts`: +26 lines (JIT cleanup)
|
||||
- `src/services/worker-service.ts`: +102 lines (JIT initialization)
|
||||
- `src/shared/settings.ts`: +65 lines (JIT config)
|
||||
- `src/hooks/jit-context-hook.ts`: +208 lines (orchestration)
|
||||
- `docs/jit-context-architecture-fix.md`: +265 lines
|
||||
- `context/session-pattern-parity.md`: +298 lines
|
||||
|
||||
**Total Changes:** 18 files, +2,852 lines, -133 lines
|
||||
|
||||
**Final Status at Revert:** Implementation was complete and likely functional, but...
|
||||
|
||||
## Why It Failed
|
||||
|
||||
### 1. Architectural Complexity Explosion
|
||||
|
||||
**Problem:** The persistent session pattern added enormous complexity for marginal benefit.
|
||||
|
||||
**Evidence:**
|
||||
- Parallel session management: Regular + JIT sessions running concurrently
|
||||
- Complex coordination: EventEmitter + promise queues + generator pattern
|
||||
- Lifecycle coupling: Session init, request handling, cleanup all intertwined
|
||||
- State explosion: 3 new fields per session (`jitSessionId`, `jitAbortController`, `jitGeneratorPromise`)
|
||||
|
||||
**Code Smell:** When the "optimization" requires 300 lines of coordination code, it's probably not an optimization.
|
||||
|
||||
### 2. Premature Optimization
|
||||
|
||||
**YAGNI Violation:** Built elaborate token caching and persistent session architecture before proving the feature provided value.
|
||||
|
||||
**Reality Check:**
|
||||
- **Current approach:** Load 50 observations = ~25KB context, works fine
|
||||
- **JIT overhead:** Haiku query = 1-2s latency + coordination complexity
|
||||
- **User benefit:** Unclear—users haven't complained about context relevance
|
||||
- **Token savings:** Marginal—Claude caches long contexts efficiently anyway
|
||||
|
||||
**Quote from CLAUDE.md:**
|
||||
> "Write the dumb, obvious thing first. Add complexity only when you actually hit the problem."
|
||||
|
||||
We didn't hit a problem. We invented one.
|
||||
|
||||
### 3. Implementation Complexity, Not Vision
|
||||
|
||||
**The Vision is Sound:**
|
||||
- Dynamic context is better than static context
|
||||
- Timeline search API exists and is fast
|
||||
- Infrastructure (SQLite + Chroma) can support this
|
||||
- Replacing session-start context with prompt-specific context makes sense
|
||||
|
||||
**The Problem:**
|
||||
We jumped to the complex persistent-session approach without trying the simple per-request approach first.
|
||||
|
||||
**What We Should Have Done:**
|
||||
```typescript
|
||||
// Simple version (not tried):
|
||||
app.post('/sessions/:id/init', async (req, res) => {
|
||||
const { userPrompt } = req.body;
|
||||
|
||||
// Query timeline search API (already exists, fast)
|
||||
const timeline = await timelineSearch(project, userPrompt, 10); // depth = 10
|
||||
|
||||
// Return observations
|
||||
return res.json({ context: timeline });
|
||||
});
|
||||
```
|
||||
|
||||
**This would have:**
|
||||
- Validated the feature's value quickly
|
||||
- Used existing infrastructure
|
||||
- Avoided all the persistence complexity
|
||||
- Taken 30 minutes instead of 3.5 hours
|
||||
|
||||
### 4. Pattern Divergence
|
||||
|
||||
**Inconsistency:** JIT sessions work fundamentally differently from memory sessions.
|
||||
|
||||
**Memory Session Pattern:**
|
||||
```typescript
|
||||
// One-shot: Init → Process observations → Complete
|
||||
startSession() → yield prompts → parse responses → complete
|
||||
```
|
||||
|
||||
**JIT Session Pattern:**
|
||||
```typescript
|
||||
// Persistent: Init → Wait indefinitely → Process on-demand → Complete
|
||||
startJitSession() → yield initial load → LOOP:
|
||||
- Wait for EventEmitter signal
|
||||
- Yield filter request
|
||||
- Parse response
|
||||
- Resolve promise
|
||||
- GOTO LOOP
|
||||
```
|
||||
|
||||
**Maintenance Burden:** Two completely different session patterns mean:
|
||||
- Doubled testing complexity
|
||||
- Increased cognitive load for contributors
|
||||
- Higher risk of subtle bugs in lifecycle management
|
||||
|
||||
**Session Pattern Parity Document:** The 298-line `session-pattern-parity.md` was created to document the differences—a sign that maybe they shouldn't be different.
|
||||
|
||||
### 5. Blocking I/O in Critical Path
|
||||
|
||||
**Performance Impact:** Every user prompt now blocks for 1-2s waiting for Haiku filtering.
|
||||
|
||||
**Current Flow:**
|
||||
```
|
||||
User types prompt → 10ms → Claude responds
|
||||
```
|
||||
|
||||
**JIT Flow:**
|
||||
```
|
||||
User types prompt → 10ms init → 1-2s Haiku filter → Claude responds
|
||||
```
|
||||
|
||||
**User Experience:** We added 1-2 seconds of latency to every interaction for questionable benefit.
|
||||
|
||||
**Alternative:** If context filtering is valuable, do it asynchronously and apply to next prompt.
|
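One way to read that alternative is sketched here. It is not something the project implemented: `DeferredContextFilter` and `FilterFn` are hypothetical names, and the filter function is assumed to return observation IDs.

```typescript
// Hypothetical filter: resolves to the observation IDs relevant to a prompt.
type FilterFn = (prompt: string) => Promise<number[]>;

class DeferredContextFilter {
  private latest: number[] = [];
  private seq = 0;
  private filter: FilterFn;

  constructor(filter: FilterFn) {
    this.filter = filter;
  }

  // Returns immediately with the result computed for the previous prompt,
  // then starts filtering for the current prompt in the background.
  onPrompt(prompt: string): number[] {
    const contextForThisPrompt = this.latest;
    const ticket = ++this.seq;
    this.filter(prompt)
      .then((ids) => {
        // Ignore stale results if a newer prompt has already been submitted.
        if (ticket === this.seq) this.latest = ids;
      })
      .catch(() => {
        // On failure, keep the previous result.
      });
    return contextForThisPrompt;
  }
}
```

The trade-off is that the context is always one prompt behind, so the first prompt of a session gets no filtered context at all.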
||||
|
||||
### 6. Missing the Forest for the Trees
|
||||
|
||||
**Real Issue:** We focused on technical implementation without asking strategic questions:
|
||||
|
||||
- **Is context relevance actually a problem?** No evidence.
|
||||
- **Do users want this?** No feedback requested.
|
||||
- **Are 50 observations too many?** Not proven.
|
||||
- **Does filtering improve responses?** Not tested.
|
||||
|
||||
**Anti-Pattern:** Solution in search of a problem.
|
||||
|
||||
## What We Should Have Done
|
||||
|
||||
### Option 1: Don't Build It
|
||||
|
||||
**Justification:** No validated user need. Current system works fine.
|
||||
|
||||
**Next Step:** Wait for user feedback indicating context relevance is an issue.
|
||||
|
||||
### Option 2: Simple MVP
|
||||
|
||||
If we really wanted to explore this:
|
||||
|
||||
1. **Week 1:** Add basic filtering in worker with one-shot queries
|
||||
- Accept slight performance hit (~500ms overhead)
|
||||
- Measure filter accuracy and user impact
|
||||
- Gather feedback
|
||||
|
||||
2. **Week 2:** If proven valuable, optimize
|
||||
- Add token caching only if needed
|
||||
- Consider persistent sessions only if performance is a bottleneck
|
||||
|
||||
3. **Week 3:** If still valuable, scale
|
||||
- Polish error handling
|
||||
- Add configuration options
|
||||
- Document patterns
|
||||
|
||||
**Philosophy:** Incremental validation, not big-bang architecture.
|
||||
|
||||
### Option 3: Different Approach Entirely
|
||||
|
||||
**Alternative:** Pre-computed relevance scores
|
||||
|
||||
Instead of on-demand filtering:
|
||||
- Score observations at creation time (save-hook)
|
||||
- Store relevance embeddings in Chroma
|
||||
- At session start, query Chroma with user's first prompt
|
||||
- Load top 10-20 most relevant observations
|
||||
- No runtime latency, better accuracy, simpler architecture
|
||||
|
||||
**Benefit:** Leverages existing Chroma infrastructure, avoids runtime overhead.
|
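The ranking step in this option can be illustrated without Chroma at all. The sketch below is an in-memory stand-in, assuming embeddings are plain number arrays; `EmbeddedObservation`, `cosineSimilarity`, and `topKRelevant` are hypothetical names, and no Chroma client API is used or implied.

```typescript
// Hypothetical record: an observation ID plus its precomputed embedding.
interface EmbeddedObservation {
  id: number;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the IDs of the k observations most similar to the query embedding.
function topKRelevant(
  observations: EmbeddedObservation[],
  queryEmbedding: number[],
  k: number,
): number[] {
  return observations
    .map((o) => ({ id: o.id, score: cosineSimilarity(o.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.id);
}
```

In a real setup the vector store would do this ranking; the point is only that embeddings are computed once at write time and the per-prompt work is a nearest-neighbour lookup.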
||||
|
||||
## Technical Lessons Learned
|
||||
|
||||
### 1. EventEmitter Coordination Anti-Pattern
|
||||
|
||||
**Code:**
|
||||
```typescript
|
||||
queue.emitter.on('request', () => {
|
||||
// Wake up generator to process request
|
||||
});
|
||||
```
|
||||
|
||||
**Issue:** Complex async coordination using event-driven wakeup signals is hard to reason about.
|
||||
|
||||
**Better:** Use async queues or channels (e.g., `async-queue` package) that handle coordination internally.
|
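A hand-rolled sketch of what such a queue does internally. `AsyncQueue` here is a hypothetical class written for illustration; it is not the API of the `async-queue` package or any other library.

```typescript
// Minimal async queue: consumers await next(), producers call push().
// The wake-up logic lives inside the queue instead of in event listeners.
class AsyncQueue<T> {
  private items: T[] = [];
  private waiters: Array<(item: T) => void> = [];

  // Hands the item to a waiting consumer if there is one, else buffers it.
  push(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(item);
    } else {
      this.items.push(item);
    }
  }

  // Resolves with the next item, waiting if the queue is currently empty.
  next(): Promise<T> {
    if (this.items.length > 0) {
      return Promise.resolve(this.items.shift() as T);
    }
    return new Promise<T>((resolve) => {
      this.waiters.push(resolve);
    });
  }
}
```

With this shape, a consumer loop is just `const request = await queue.next()`, and there is no separate signal that can be emitted before a listener is attached.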
||||
|
||||
### 2. Generator Pattern Complexity
|
||||
|
||||
**Pattern:**
|
||||
```typescript
|
||||
async *createJitMessageGenerator() {
|
||||
yield initialPrompt;
|
||||
while (!aborted) {
|
||||
await waitForEvent(); // Blocks here
|
||||
yield nextRequest;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Tradeoff:** Generators are great for iteration, but terrible for event-driven request/response patterns.
|
||||
|
||||
**Better:** Use explicit session object with `sendMessage()/waitForResponse()` methods.
|
||||
|
||||
### 3. Dual Session Management
|
||||
|
||||
**Complexity:** Managing two concurrent SDK sessions per user session is inherently complex.
|
||||
|
||||
**Alternatives Considered:**
|
||||
- Single session handling both observations and filtering (rejected: tight coupling)
|
||||
- Separate service for filtering (rejected: too much infrastructure)
|
||||
- Pre-computed filtering (not considered: should have been)
|
||||
|
||||
**Lesson:** When parallel state management feels hard, question whether you need parallel state.
|
||||
|
||||
### 4. Promise Queue Pattern
|
||||
|
||||
**Implementation:**
|
||||
```typescript
|
||||
interface QueuedRequest<T> {
|
||||
resolve: (result: T) => void;
|
||||
reject: (error: Error) => void;
|
||||
}
|
||||
queue.push({ resolve, reject });
|
||||
// Later...
|
||||
queue[0].resolve(result);
|
||||
```
|
||||
|
||||
**Good:** Clean async API for callers
|
||||
**Bad:** Easy to leak promises if error handling isn't perfect
|
||||
**Improvement:** Use libraries like `p-queue` that handle edge cases
|
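To make the leak risk concrete, here is a sketch of a queue in which every pending promise is guaranteed to settle: by a result, by a timeout, or when the queue is closed. `RequestQueue` and `Pending` are hypothetical names; this is neither the reverted implementation nor the `p-queue` API.

```typescript
interface Pending<T> {
  resolve: (result: T) => void;
  reject: (error: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

class RequestQueue<T> {
  private pending: Pending<T>[] = [];

  // Enqueues a request. The returned promise always settles eventually.
  enqueue(timeoutMs: number): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.remove(entry);
        reject(new Error(`request timed out after ${timeoutMs}ms`));
      }, timeoutMs);
      const entry: Pending<T> = { resolve, reject, timer };
      this.pending.push(entry);
    });
  }

  // Resolves the oldest pending request; returns false if none is waiting.
  resolveNext(result: T): boolean {
    const entry = this.pending.shift();
    if (!entry) return false;
    clearTimeout(entry.timer);
    entry.resolve(result);
    return true;
  }

  // Rejects everything still pending, e.g. at session end.
  close(reason: string): void {
    const drained = this.pending;
    this.pending = [];
    for (const entry of drained) {
      clearTimeout(entry.timer);
      entry.reject(new Error(reason));
    }
  }

  get size(): number {
    return this.pending.length;
  }

  private remove(entry: Pending<T>): void {
    const index = this.pending.indexOf(entry);
    if (index >= 0) this.pending.splice(index, 1);
  }
}
```

Even this small version needs three exit paths per request, which is the kind of bookkeeping a maintained library would otherwise own.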
||||
|
||||
## Process Lessons Learned
|
||||
|
||||
### 1. No Incremental Validation
|
||||
|
||||
**Mistake:** Went from "idea" to "complete architecture" without validation points.
|
||||
|
||||
**Better Process:**
|
||||
1. Write one-pager explaining user value
|
||||
2. Build simplest possible version (2 hours max)
|
||||
3. Test with real usage
|
||||
4. Measure impact
|
||||
5. Decide: kill, iterate, or scale
|
||||
|
||||
**Checkpoint Questions:**
|
||||
- After 1 hour: "Does this solve a real problem?"
|
||||
- After 2 hours: "Is this getting too complex?"
|
||||
- After 3 hours: "Should I just ship the simple version?"
|
||||
|
||||
### 2. Architecture Astronomy
|
||||
|
||||
**Definition:** Designing elaborate systems without building/testing them.
|
||||
|
||||
**Evidence:**
|
||||
- 265-line architecture doc written before any code
|
||||
- 298-line session pattern parity analysis
|
||||
- Multiple complete rewrites of the same feature
|
||||
|
||||
**Better:** Code first, document later. Spike solutions, learn from implementation.
|
||||
|
||||
### 3. Sunk Cost Fallacy
|
||||
|
||||
**Timeline:**
|
||||
- **Hour 1:** "This seems complex but achievable"
|
||||
- **Hour 2:** "We're halfway done, can't stop now"
|
||||
- **Hour 3:** "Just need to fix this one coordination issue"
|
||||
- **Hour 4:** "It's working, but... this feels wrong"
|
||||
|
||||
**Correct Decision:** Revert. Took courage to throw away 4 hours of work.
|
||||
|
||||
**Learning:** Time invested is not a reason to continue. Quality of outcome matters more.
|
||||
|
||||
### 4. Missing User Feedback Loop
|
||||
|
||||
**No User Input:**
|
||||
- Didn't ask: "Is context relevance a problem for you?"
|
||||
- Didn't test: "Does filtered context improve your responses?"
|
||||
- Didn't measure: "Are you hitting context limits?"
|
||||
|
||||
**Engineering Theater:** Building impressive-sounding features without user validation.
|
||||
|
||||
## What We Actually Learned (The Real Value)
|
||||
|
||||
Despite reverting, this was productive R&D:
|
||||
|
||||
### 1. Deep Understanding of Hook Architecture
|
||||
|
||||
**Critical Discovery:** Hooks run in a sandboxed environment without `claudePath`.
|
||||
- Hooks cannot call Agent SDK `query()` directly
|
||||
- All AI processing must happen in worker service
|
||||
- This architectural constraint is now documented
|
||||
|
||||
**Learned Pattern:**
|
||||
```
|
||||
Hook (orchestration) → Worker (AI processing)
|
||||
✓ save-hook: Captures data → Worker processes with SDK
|
||||
✓ new-hook: Creates session → Worker returns confirmation
|
||||
✗ jit-hook: Tried SDK in hook → Failed, no claudePath
|
||||
```
|
||||
|
||||
**Value:** Future features will avoid this mistake. We now know the boundary.
|
||||
|
||||
### 2. Worker Architecture Patterns
|
||||
|
||||
**Blocking vs. Non-Blocking:**
|
||||
- SessionStart: Can be non-blocking (context loads async)
|
||||
- UserPromptSubmit: Must be blocking (session must exist before processing)
|
||||
- JIT Context: Must be blocking (context needed before prompt processed)
|
||||
|
||||
**Established Pattern:**
|
||||
```typescript
|
||||
// Worker endpoint for features requiring AI
|
||||
app.post('/sessions/:id/operation', async (req, res) => {
|
||||
const { operationData } = req.body;
|
||||
const result = await sdkAgent.performOperation(operationData);
|
||||
return res.json({ result });
|
||||
});
|
||||
```
|
||||
|
||||
### 3. Persistent Session Management
|
||||
|
||||
**Architecture Knowledge Gained:**
|
||||
- How to maintain long-lived SDK sessions
|
||||
- EventEmitter coordination patterns for request/response
|
||||
- Promise queue management for async operations
|
||||
- Proper cleanup with AbortControllers
|
||||
|
||||
**Pattern Documented:**
|
||||
- Dual session management (regular + JIT)
|
||||
- Generator-based message loops
|
||||
- Request queuing with timeouts
|
||||
|
||||
**Value:** When we build the simpler version, we'll know these patterns.
|
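As a reminder of what the AbortController cleanup looks like, here is a minimal sketch. It is not the reverted `SDKAgent.ts` code: `JitState`, `jitSessions`, `startJitState`, and `cleanupJitState` are hypothetical names, and only the standard `AbortController` API is used.

```typescript
// Hypothetical per-session state; the real ActiveSession has more fields.
interface JitState {
  abortController: AbortController;
}

const jitSessions = new Map<number, JitState>();

// Starts tracking a session and returns the signal that long-running work
// should observe in order to stop when the session is cleaned up.
function startJitState(sessionDbId: number): AbortSignal {
  // Abort any previous state for this ID so it cannot outlive its owner.
  cleanupJitState(sessionDbId);
  const abortController = new AbortController();
  jitSessions.set(sessionDbId, { abortController });
  return abortController.signal;
}

// Aborts and forgets the session; returns false if nothing was tracked.
function cleanupJitState(sessionDbId: number): boolean {
  const state = jitSessions.get(sessionDbId);
  if (!state) return false;
  state.abortController.abort();
  jitSessions.delete(sessionDbId);
  return true;
}
```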
||||
|
||||
### 4. Configuration Infrastructure
|
||||
|
||||
`src/shared/settings.ts` (65 lines) provides reusable configuration patterns:
|
||||
```typescript
|
||||
export function getConfigValue(key: string, defaultValue: string): string {
|
||||
// Priority: settings.json → env var → default
|
||||
}
|
||||
```
|
||||
|
||||
**Kept After Revert:** This module is useful for other features.
|
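The priority order in that comment can be sketched as a pure function. This is an illustration, not the contents of `settings.ts`: `resolveConfigValue` and `ConfigSource` are hypothetical names, the two sources are passed in rather than read from disk or the process environment, and treating empty strings as unset is an assumption.

```typescript
// A flat key/value source, e.g. parsed settings.json or environment variables.
type ConfigSource = Record<string, string | undefined>;

// Resolves a value with priority: settings file, then env var, then default.
// Assumption: an empty string counts as "not set".
function resolveConfigValue(
  key: string,
  defaultValue: string,
  settings: ConfigSource,
  env: ConfigSource,
): string {
  const fromSettings = settings[key];
  if (fromSettings !== undefined && fromSettings !== "") return fromSettings;

  const fromEnv = env[key];
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;

  return defaultValue;
}
```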
||||
|
||||
### 5. Key Architectural Decisions Made
|
||||
|
||||
**Decisions that will guide future implementation:**
|
||||
1. JIT context filtering must happen in worker (proven via failed hook attempt)
|
||||
2. Context must be blocking on UserPromptSubmit (session needs context before processing)
|
||||
3. Dynamic timeline search is the right approach (fast, precise, leverages existing infrastructure)
|
||||
4. Simple per-request queries should be tried before persistent sessions
|
||||
|
||||
### 6. Documentation Quality
|
||||
|
||||
- `jit-context-architecture-fix.md`: Documents why hooks can't run SDK queries
|
||||
- `session-pattern-parity.md`: Reference for implementing dual sessions
|
||||
- Hooks reference: Comprehensive hook documentation added
|
||||
|
||||
**Value:** These docs help future contributors understand the system constraints.
|
||||
|
||||
### 7. Infrastructure Validation
|
||||
|
||||
**Confirmed that our search stack is ready:**
|
||||
- SQLite FTS5: Fast full-text search (<50ms)
|
||||
- ChromaDB: Semantic search (<200ms with 8,000+ vectors)
|
||||
- Timeline search API: Already implemented and tested
|
||||
- Worker service: Can handle synchronous AI operations
|
||||
|
||||
**The infrastructure exists. We just need a simpler integration.**
|
||||
|
||||
## Recommendations
|
||||
|
||||
### Immediate Actions
|
||||
|
||||
1. **Archive the work:**
|
||||
- Keep `failed/jit-context` branch for reference
|
||||
- Extract reusable components (settings.ts)
|
||||
- Save architecture docs for future features
|
||||
|
||||
2. **Document the anti-patterns:**
|
||||
- Add this post-mortem to CLAUDE.md references
|
||||
- Update coding standards with lessons learned
|
||||
|
||||
3. **Reset focus:**
|
||||
- Return to validated user needs
|
||||
- Prioritize features with clear value propositions
|
||||
|
||||
### Future Feature Development
|
||||
|
||||
**Gating Questions (Answer before coding):**
|
||||
|
||||
1. **User Value:** What specific user problem does this solve?
|
||||
2. **Evidence:** Have users requested this or reported the underlying issue?
|
||||
3. **Measurement:** How will we know if it's successful?
|
||||
4. **Simplicity:** What's the dumbest version that could work?
|
||||
5. **Time Limit:** If we can't prove value in 2 hours, should we build it?
|
||||
|
||||
**Process:**
|
||||
|
||||
```
|
||||
VALIDATE → BUILD SIMPLE → TEST → MEASURE → DECIDE
|
||||
↑ ↓
|
||||
└──────────── ITERATE OR KILL ────────────┘
|
||||
```
|
||||
|
||||
### If Context Filtering Returns
|
||||
|
||||
Should we revisit this idea in the future:
|
||||
|
||||
**Prerequisites:**
|
||||
- User feedback requesting better context relevance
|
||||
- Metrics showing current context is too broad
|
||||
- Evidence that filtering improves response quality
|
||||
|
||||
**Simple Approach:**
|
||||
```typescript
|
||||
// In worker-service.ts /sessions/:id/init
|
||||
if (jitEnabled) {
|
||||
const observations = await db.getRecentObservations(project, 50);
|
||||
const filtered = await simpleFilter(observations, userPrompt); // One-shot query
|
||||
return { context: filtered };
|
||||
}
|
||||
```
|
||||
|
||||
**Acceptance Criteria:**
|
||||
- <100 lines of code
|
||||
- <500ms latency impact
|
||||
- No new session types
|
||||
- Degrades gracefully on errors
|
||||
|
||||
**If that works:** Then consider optimization.
|
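The latency and graceful-degradation criteria can be enforced with one small wrapper. This is a sketch under assumptions: `withFallback` is a hypothetical helper, and the fallback is assumed to be the unfiltered context that would have been loaded anyway.

```typescript
// Resolves with the work's result if it finishes within budgetMs.
// On timeout or rejection it resolves with the fallback and never rejects.
function withFallback<T>(work: Promise<T>, fallback: T, budgetMs: number): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), budgetMs);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}
```

A call such as `withFallback(simpleFilter(observations, userPrompt), observations, 500)` would cap the added latency at the 500ms budget and fall back to the unfiltered list on any error. Note that the underlying query keeps running after a timeout unless it is cancelled separately.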
||||
|
||||
## Conclusion
|
||||
|
||||
JIT context filtering failed not because the vision was wrong, but because we jumped to the complex implementation without validating the simple one first. The feature aligns with long-term goals (dynamic, prompt-specific context using our fast search infrastructure), but the persistent-session architecture was premature optimization.
|
||||
|
||||
**The right call:** Revert the complex implementation. Build the simple version when ready.
|
||||
|
||||
**Key Takeaway:** The vision is sound. The execution was overcomplicated. We now have:
|
||||
- Deep knowledge of hook/worker architecture constraints
|
||||
- Documented patterns for persistent SDK sessions
|
||||
- Validated fast search infrastructure
|
||||
- Clear understanding of what to build next time (simple timeline search API integration)
|
||||
|
||||
**This was R&D, not failure.** We learned what doesn't work (SDK in hooks), what does work (worker-based AI processing), and how to approach it next time (simple API calls before persistent sessions).
|
||||
|
||||
**Next Implementation:**
|
||||
When we revisit this (and we should), start with:
|
||||
1. Worker endpoint that accepts prompt
|
||||
2. Queries existing timeline search API
|
||||
3. Returns context
|
||||
4. Hook injects context
|
||||
5. Validate it improves responses
|
||||
6. Then optimize if needed
|
||||
|
||||
**Final Thought:** Sometimes you have to build the wrong thing to understand the right thing. That's R&D.
|
||||
|
||||
---
|
||||
|
||||
**Branch Status:**
|
||||
- `feature/jit-context`: Abandoned
|
||||
- `failed/jit-context`: Archived for reference
|
||||
- `main`: Stable at v5.4.0
|
||||
|
||||
**Files to Keep:**
|
||||
- `src/shared/settings.ts`: Reusable config utilities
|
||||
|
||||
**Files Discarded:**
|
||||
- Everything else (+2,850 lines)
|
||||
|
||||
**Emotional State:** Relieved. Dodged a maintenance nightmare.
|
||||
@@ -0,0 +1,141 @@
|
||||
# I built a context management plugin and it CHANGED MY LIFE
|
||||
|
||||
Okay so I know this sounds clickbait-y but genuinely: if you've ever spent 20 minutes re-explaining your project architecture to Claude because you started a new chat, this might actually save your sanity.
|
||||
|
||||
The actual problem I was trying to solve:
|
||||
|
||||
Claude Code is incredible for building stuff, but it has the memory of a goldfish. Every new session I'd be like "okay so remember we're using Express for the API and SQLite for storage and—" and Claude's like "I have never seen this codebase in my life."
|
||||
|
||||
What I built:
|
||||
|
||||
A plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude itself lol), and injects relevant context back into future sessions.
|
||||
|
||||
So instead of explaining your project every time, you just... start coding. Claude already knows what happened yesterday.
|
||||
|
||||
How it actually works:
|
||||
|
||||
Hooks into Claude's tool system and watches everything (file reads, edits, bash commands, etc.)
|
||||
|
||||
Background worker processes observations into compressed summaries
|
||||
|
||||
When you start a new session, last 10 summaries get auto-injected
|
||||
|
||||
Built-in search tools let Claude query its own memory ("what did we decide about auth?")
|
||||
|
||||
Runs locally on SQLite + PM2, your code never leaves your machine
|
||||
|
||||
Real talk:
|
||||
|
||||
I made this because I was building a different project and kept hitting the context limit, then having to restart and re-teach Claude the entire architecture. It was driving me insane. Now Claude just... remembers. It's wild.
|
||||
|
||||
Link: https://github.com/thedotmack/claude-mem (AGPL-3.0 licensed)
|
||||
|
||||
It is set up to use Claude Code's new plugin system. Type the following to install, then restart Claude Code.
|
||||
|
||||
/plugin marketplace add thedotmack/claude-mem
|
||||
|
||||
/plugin install claude-mem
|
||||
Would love feedback from anyone actually building real projects with Claude Code, if this helps you continue, if it helps you save tokens and get more use out of Claude Code. Thanks in advance!
|
||||
|
||||
===============================================================================
|
||||
|
||||
# How is Claude-Mem different from Claude's New Memory Tool?
|
||||
|
||||
A few people have been asking this question on the claude-mem thread I posted yesterday, so I wanted to put up a definitive answer that explains the differences and how the two can actually be complementary. I used the "claude code docs agent" to help figure this out:
|
||||
|
||||
---
|
||||
|
||||
Based on the documentation, here are the key differences between your Claude-Mem tool and Claude's official memory tool:
|
||||
|
||||
Scope and Architecture
|
||||
Claude's Memory Tool is designed for single-session memory management within conversations (1). It provides commands like view, create, str_replace, insert, delete, and rename for managing memory files during a conversation (1). The tool automatically includes this instruction: "IMPORTANT: ALWAYS VIEW YOUR MEMORY DIRECTORY BEFORE DOING ANYTHING ELSE" (1).
|
||||
|
||||
Your Claude-Mem is a comprehensive multi-session persistence system that captures context across different Claude Code sessions. It uses hooks to automatically capture tool usage, process observations through the Claude Agent SDK, and restore context when new sessions start.
|
||||
|
||||
Memory Persistence
|
||||
Claude's Memory Tool focuses on within-session memory management. It helps Claude maintain context during a single conversation by reading and writing to memory files (1).
|
||||
|
||||
Your Claude-Mem provides cross-session persistence by:
|
||||
|
||||
Capturing every tool execution through PostToolUse hooks (2)
|
||||
|
||||
Processing observations through the Claude Agent SDK (3)
|
||||
|
||||
Automatically injecting summaries from the last few sessions into new session contexts
|
||||
|
||||
Using SQLite with FTS5 full-text search for retrieval
|
||||
|
||||
Integration Method
|
||||
Claude's Memory Tool is a built-in tool that works through the standard tool use interface (1).
|
||||
|
||||
Your Claude-Mem integrates as a Claude Code plugin using multiple hooks:
|
||||
|
||||
SessionStart for context injection (2)
|
||||
|
||||
UserPromptSubmit for session initialization (2)
|
||||
|
||||
PostToolUse for observation capture (2)
|
||||
|
||||
Stop for summary generation (2)
|
||||
|
||||
SessionEnd for cleanup (2)
|
||||
|
||||
Search and Retrieval
|
||||
Claude's Memory Tool provides basic file operations for memory management (1).
|
||||
|
||||
Your Claude-Mem includes an MCP server with 6 specialized search tools:
|
||||
|
||||
search_observations - Full-text search across observations
|
||||
|
||||
search_sessions - Search across session summaries
|
||||
|
||||
find_by_concept - Find by tagged concepts
|
||||
|
||||
find_by_file - Find by file paths
|
||||
|
||||
find_by_type - Find by observation type
|
||||
|
||||
advanced_search - Combined search with filters
|
||||
|
||||
Use Cases
|
||||
Claude's Memory Tool is ideal for maintaining context within a single conversation, helping with tasks that require remembering information throughout the session (1).
|
||||
|
||||
Your Claude-Mem addresses the broader challenge of maintaining project knowledge across multiple Claude Code sessions, essentially solving the session isolation problem that can occur in Claude Code (4).
|
||||
|
||||
Your tool appears to be complementary to Claude's memory tool rather than directly competing - it operates at the session level while Claude's memory tool operates within conversations.
|
||||
|
||||
===============================================================================
|
||||
|
||||
# Hot take... "You're absolutely right!" is a bug, not a feature
|
||||
|
||||
When Claude first started saying "You're absolutely right!" I started instructing it to "never tell me I'm absolutely right" because most of the time, it didn't do any verification or thinking before deeming my suggestion "The absolutely right one"
|
||||
|
||||
Now we're many versions later, and the team at Claude have embraced "You're absolutely right!" as a "cute" addition to their overall brand, fully accepting this clear anti-pattern.
|
||||
|
||||
Is Claude just "smarter" now? Do you perceive "You're absolutely right!" as being given the "absolute right" solution, or do you feel as though you need to clarify or follow up when this happens?
|
||||
|
||||
One of the foundations of my theory behind priming context with claude-mem is this:
|
||||
|
||||
"The less Claude has to keep track of that's unrelated to the task at hand, the better Claude will perform that task."
|
||||
|
||||
The system I designed uses a parallel instance to manage the memory flow. It receives data as it comes in, but the Claude instance you're working with doesn't have any instructions for storing memories. It doesn't need them. That's all handled in the background.
|
||||
|
||||
This decoupling matters because every instruction you give Claude is cognitive overhead.
|
||||
|
||||
When you load up context with "remember to store this" or "track that observation" or "don't forget to summarize," you're polluting the workspace. Claude has to juggle your actual task AND the meta-task of managing its own memory.
|
||||
|
||||
That's when you get lazy agreement.
|
||||
|
||||
I've noticed that when Claude's context window gets cluttered with unrelated instructions, this pattern of lazy agreement shows up more and more.
|
||||
|
||||
Agreeing with you is easier than deep analysis when the context is already maxed out.
|
||||
|
||||
"You're absolutely right!" becomes the path of least resistance.
|
||||
|
||||
When Claude can focus purely on your code, your architecture, your question - without memory management instructions competing for attention - it accomplishes tasks faster and more accurately.
|
||||
|
||||
The difference is measurable.
|
||||
|
||||
The "You're absolutely right!" reflex drops off noticeably because there's room in the context window for actual analysis instead of performative agreement.
|
||||
|
||||
What do you think? Does this bother you as much as it does me? 😭
|
||||
@@ -0,0 +1,857 @@
|
||||
# Search Architecture Analysis
|
||||
|
||||
**Date:** 2025-11-11 **Scope:** HTTP API endpoints, MCP search server, DRY violations, architectural recommendations
|
||||
|
||||
---
|
||||
|
||||
## Current State: Dual Search Architectures
|
||||
|
||||
### Architecture Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Claude Code Session │
|
||||
│ │
|
||||
│ ┌────────────────────────────────────────────────────┐ │
|
||||
│ │ mem-search Skill (ACTIVE) │ │
|
||||
│ │ - Uses HTTP API via curl commands │ │
|
||||
│ │ - 10 search operations │ │
|
||||
│ │ - Progressive disclosure workflow │ │
|
||||
│ └────────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ │ HTTP GET │
|
||||
│ ▼ │
|
||||
│ ┌────────────────────────────────────────────────────┐ │
|
||||
│ │ MCP Search Server (DEPRECATED but BUILT) │ │
|
||||
│ │ - .mcp.json configured │ │
|
||||
│ │ - search-server.mjs exists (74KB) │ │
|
||||
│ │ - 9 MCP tools defined │ │
|
||||
│ │ - Not used by skill │ │
|
||||
│ └────────────────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
┌───────────┴───────────┐
|
||||
▼ ▼
|
||||
┌──────────────────────────┐ ┌──────────────────────────┐
|
||||
│ Worker Service │ │ MCP Server │
|
||||
│ (worker-service.ts) │ │ (search-server.ts) │
|
||||
│ │ │ │
|
||||
│ 10 HTTP Endpoints: │ │ 9 MCP Tools: │
|
||||
│ ├─ /api/search/ │ │ ├─ search_observations │
|
||||
│ │ observations │ │ ├─ search_sessions │
|
||||
│ ├─ /api/search/ │ │ ├─ search_user_prompts │
|
||||
│ │ sessions │ │ ├─ find_by_concept │
|
||||
│ ├─ /api/search/ │ │ ├─ find_by_file │
|
||||
│ │ prompts │ │ ├─ find_by_type │
|
||||
│ ├─ /api/search/ │ │ ├─ get_recent_context │
|
||||
│ │ by-concept │ │ ├─ get_context_timeline │
|
||||
│ ├─ /api/search/ │ │ └─ get_timeline_by_query│
|
||||
│ │ by-file │ │ │
|
||||
│ ├─ /api/search/ │ │ Built: ✅ │
|
||||
│ │ by-type │ │ Used: ❌ │
|
||||
│ ├─ /api/context/recent │ │ Configured: ✅ │
|
||||
│ ├─ /api/context/ │ │ Status: DEPRECATED │
|
||||
│ │ timeline │ │ │
|
||||
│ ├─ /api/timeline/ │ │ │
|
||||
│ │ by-query │ │ │
|
||||
│ └─ /api/search/help │ │ │
|
||||
│ │ │ │
|
||||
│ Built: ✅ │ │ │
|
||||
│ Used: ✅ │ │ │
|
||||
│ Status: ACTIVE │ │ │
|
||||
└──────────────────────────┘ └──────────────────────────┘
|
||||
│ │
|
||||
└─────────┬─────────────────┘
|
||||
▼
|
||||
┌────────────────────────────────┐
|
||||
│ SessionSearch (Shared Layer) │
|
||||
│ - FTS5 queries │
|
||||
│ - SQLite operations │
|
||||
│ - Common data access │
|
||||
└────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌────────────────────────────────┐
|
||||
│ SQLite Database │
|
||||
│ ~/.claude-mem/claude-mem.db │
|
||||
└────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## HTTP Endpoints Architecture
|
||||
|
||||
### Location
|
||||
|
||||
`src/services/worker-service.ts` (lines 108-118, 748-1174)
|
||||
|
||||
### Endpoints (10 total)
|
||||
|
||||
| Endpoint                   | Method | Purpose                             | Used By          |
| -------------------------- | ------ | ----------------------------------- | ---------------- |
| `/api/search/observations` | GET    | Full-text search observations       | mem-search skill |
| `/api/search/sessions`     | GET    | Full-text search session summaries  | mem-search skill |
| `/api/search/prompts`      | GET    | Full-text search user prompts       | mem-search skill |
| `/api/search/by-concept`   | GET    | Find observations by concept tag    | mem-search skill |
| `/api/search/by-file`      | GET    | Find work related to specific files | mem-search skill |
| `/api/search/by-type`      | GET    | Find observations by type           | mem-search skill |
| `/api/context/recent`      | GET    | Get recent session context          | mem-search skill |
| `/api/context/timeline`    | GET    | Get timeline around point in time   | mem-search skill |
| `/api/timeline/by-query`   | GET    | Search + timeline in one call       | mem-search skill |
| `/api/search/help`         | GET    | API documentation                   | mem-search skill |
|
||||
|
||||
### Implementation Pattern
|
||||
|
||||
**Example: Search Observations**
|
||||
|
||||
```typescript
// src/services/worker-service.ts:748-781
private handleSearchObservations(req: Request, res: Response): void {
  try {
    // 1. Parse query parameters
    const query = req.query.query as string;
    const format = (req.query.format as string) || 'full';
    const limit = parseInt(req.query.limit as string, 10) || 20;
    const project = req.query.project as string | undefined;

    // 2. Validate required parameters
    if (!query) {
      res.status(400).json({ error: 'Missing required parameter: query' });
      return;
    }

    // 3. Call SessionSearch (shared data layer)
    const sessionSearch = this.dbManager.getSessionSearch();
    const results = sessionSearch.searchObservations(query, { limit, project });

    // 4. Format response based on format parameter
    res.json({
      query,
      count: results.length,
      format,
      results: format === 'index' ? results.map(r => ({
        id: r.id,
        type: r.type,
        title: r.title,
        subtitle: r.subtitle,
        created_at_epoch: r.created_at_epoch,
        project: r.project,
        score: r.score
      })) : results
    });
  } catch (error) {
    logger.failure('WORKER', 'Search observations failed', {}, error as Error);
    res.status(500).json({ error: (error as Error).message });
  }
}
```
|
||||
|
||||
### Characteristics
|
||||
|
||||
**Pros:**
|
||||
|
||||
- ✅ Simple HTTP GET requests (curl-friendly)
|
||||
- ✅ Standard REST API pattern
|
||||
- ✅ Easy to test and debug
|
||||
- ✅ No MCP protocol overhead
|
||||
- ✅ Works with any HTTP client
|
||||
|
||||
**Cons:**
|
||||
|
||||
- ⚠️ Parameter parsing duplicated across 10 endpoints
|
||||
- ⚠️ Format conversion logic duplicated
|
||||
- ⚠️ Error handling pattern repeated
|
||||
|
||||
---
|
||||
|
||||
## MCP Search Server Architecture
|
||||
|
||||
### Location
|
||||
|
||||
`src/servers/search-server.ts` (1,781 lines)
|
||||
|
||||
### Status
|
||||
|
||||
- **Built:** ✅ Yes (`plugin/scripts/search-server.mjs`, 74KB)
|
||||
- **Configured:** ✅ Yes (`.mcp.json` lines 3-6)
|
||||
- **Used:** ❌ No (deprecated in v5.4.0)
|
||||
- **Maintained:** ⚠️ Source kept for reference
|
||||
|
||||
### Tools (9 total)
|
||||
|
||||
| Tool Name               | Purpose                                | Lines    |
| ----------------------- | -------------------------------------- | -------- |
| `search_observations`   | Search observations with FTS5 + Chroma | 348-422  |
| `search_sessions`       | Search session summaries               | 438-490  |
| `search_user_prompts`   | Search user prompts                    | 506-558  |
| `find_by_concept`       | Find by concept tag                    | 574-626  |
| `find_by_file`          | Find by file path                      | 642-694  |
| `find_by_type`          | Find by observation type               | 710-762  |
| `get_recent_context`    | Get recent sessions                    | 778-830  |
| `get_context_timeline`  | Get timeline context                   | 846-950  |
| `get_timeline_by_query` | Search + timeline                      | 966-1064 |
|
||||
|
||||
### Implementation Pattern
|
||||
|
||||
**Example: Search Observations (MCP)**
|
||||
|
||||
```typescript
// src/servers/search-server.ts:348-422
{
  name: 'search_observations',
  description: 'Search observations using full-text search across titles, narratives, facts, and concepts...',
  inputSchema: z.object({
    query: z.string().describe('Search query for FTS5 full-text search'),
    format: z.enum(['index', 'full']).default('index').describe('...'),
    ...filterSchema.shape
  }),
  handler: async (args: any) => {
    try {
      const { query, format = 'index', ...options } = args;
      let results: ObservationSearchResult[] = [];

      // Hybrid search: Try Chroma semantic search first, fall back to FTS5
      if (chromaClient) {
        try {
          // Step 1: Chroma semantic search (top 100)
          const chromaResults = await queryChroma(query, 100);

          if (chromaResults.ids.length > 0) {
            // Step 2: Filter by recency (90 days)
            const ninetyDaysAgo = Date.now() - (90 * 24 * 60 * 60 * 1000);
            const recentIds = chromaResults.ids.filter((_id, idx) => {
              const meta = chromaResults.metadatas[idx];
              return meta && meta.created_at_epoch > ninetyDaysAgo;
            });

            // Step 3: Hydrate from SQLite
            if (recentIds.length > 0) {
              const limit = options.limit || 20;
              results = store.getObservationsByIds(recentIds, { orderBy: 'date_desc', limit });
            }
          }
        } catch (chromaError: any) {
          console.error('[search-server] Chroma query failed, falling back to FTS5:', chromaError.message);
        }
      }

      // Fall back to FTS5 if Chroma unavailable or returned no results
      if (results.length === 0) {
        results = search.searchObservations(query, options);
      }

      // Format results
      if (format === 'index') {
        return {
          content: [{
            type: 'text',
            text: results.map((r, i) => formatObservationIndex(r, i)).join('\n\n') + formatSearchTips()
          }]
        };
      } else {
        return {
          content: results.map(r => ({
            type: 'resource',
            resource: {
              uri: `claude-mem://observation/${r.id}`,
              mimeType: 'text/markdown',
              text: formatObservationResult(r)
            }
          }))
        };
      }
    } catch (error: any) {
      return { content: [{ type: 'text', text: `Error: ${error.message}` }] };
    }
  }
}
```
|
||||
|
||||
### Characteristics
|
||||
|
||||
**Pros:**
|
||||
|
||||
- ✅ MCP protocol support
|
||||
- ✅ Hybrid search (Chroma + FTS5)
|
||||
- ✅ Rich formatting (markdown, resources)
|
||||
- ✅ Comprehensive error handling
|
||||
|
||||
**Cons:**
|
||||
|
||||
- ❌ Not used by skill (deprecated)
|
||||
- ❌ ~2,500 token overhead for tool definitions
|
||||
- ❌ More complex than HTTP
|
||||
- ❌ Still being built despite deprecation
|
||||
|
||||
---
|
||||
|
||||
## DRY Violation Analysis
|
||||
|
||||
### Areas of Duplication
|
||||
|
||||
#### 1. **Parameter Parsing** (10 HTTP endpoints + 9 MCP tools)
|
||||
|
||||
**HTTP Endpoints:**
|
||||
|
||||
```typescript
// Repeated in each endpoint handler
const query = req.query.query as string;
const format = (req.query.format as string) || "full";
const limit = parseInt(req.query.limit as string, 10) || 20;
const project = req.query.project as string | undefined;

if (!query) {
  res.status(400).json({ error: "Missing required parameter: query" });
  return;
}
```
|
||||
|
||||
**MCP Tools:**
|
||||
|
||||
```typescript
// Repeated in each tool handler
const { query, format = "index", ...options } = args;
if (!query) {
  throw new Error("Missing required parameter: query");
}
```
|
||||
|
||||
**Violation:** Parameter parsing logic duplicated 19 times (10 + 9)
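One way to collapse this duplication is a single parser that both layers call with their raw input. The sketch below is a minimal illustration, not code from the repository: the function body, the `ParseResult` type, and the stricter limit handling are assumptions, while the field names and defaults (`"full"` for HTTP, `20` for `limit`) follow the handlers shown above.

```typescript
// Hypothetical shared parser; everything here is illustrative, not repository code.
type SearchFormat = "index" | "full";

interface SearchParams {
  query: string;
  format: SearchFormat;
  limit: number;
  project?: string;
}

type ParseResult =
  | { ok: true; params: SearchParams }
  | { ok: false; error: string };

function parseSearchParams(
  raw: Record<string, unknown>,
  defaultFormat: SearchFormat = "full",
): ParseResult {
  const rawQuery = raw.query;
  const query = typeof rawQuery === "string" ? rawQuery.trim() : "";
  if (!query) {
    return { ok: false, error: "Missing required parameter: query" };
  }

  // Unknown format values fall back to the caller's default.
  const rawFormat = raw.format;
  const format: SearchFormat =
    rawFormat === "index" || rawFormat === "full" ? rawFormat : defaultFormat;

  // Accept numbers (MCP args) or numeric strings (HTTP query strings).
  // Assumption: non-positive or non-numeric limits fall back to 20.
  const rawLimit = raw.limit;
  const parsed =
    typeof rawLimit === "number" ? rawLimit : parseInt(String(rawLimit), 10);
  const limit = Number.isInteger(parsed) && parsed > 0 ? parsed : 20;

  const rawProject = raw.project;
  const project =
    typeof rawProject === "string" && rawProject !== "" ? rawProject : undefined;

  return { ok: true, params: { query, format, limit, project } };
}
```

An HTTP handler would pass `req.query` and map `ok: false` to a 400 response; an MCP tool would pass `args` with `"index"` as the default format and turn the same error into text content.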
|
||||
|
||||
#### 2. **Format Conversion** (Index vs Full)
|
||||
|
||||
**HTTP Endpoints:**
|
||||
|
||||
```typescript
results: format === "index"
  ? results.map((r) => ({
      id: r.id,
      type: r.type,
      title: r.title,
      subtitle: r.subtitle,
      created_at_epoch: r.created_at_epoch,
      project: r.project,
      score: r.score,
    }))
  : results;
```
|
||||
|
||||
**MCP Tools:**
|
||||
|
||||
```typescript
if (format === "index") {
  return {
    content: [
      {
        type: "text",
        text: results.map((r, i) => formatObservationIndex(r, i)).join("\n\n"),
      },
    ],
  };
} else {
  return {
    content: results.map((r) => ({
      type: "resource",
      resource: {
        uri: `claude-mem://observation/${r.id}`,
        mimeType: "text/markdown",
        text: formatObservationResult(r),
      },
    })),
  };
}
```
|
||||
|
||||
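**Violation:** Format conversion logic is duplicated, and the two copies produce different shapes: HTTP returns JSON objects, while MCP returns formatted text (index) or markdown resources (full).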
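The index projection itself is small enough to live in one place. As a rough sketch (the type and function names are hypothetical, and `narrative` stands in for whatever extra fields a full result carries), the field selection from the HTTP handler could be expressed once and reused by any transport:

```typescript
// Hypothetical result shape; field names follow the HTTP handler above,
// and `narrative` is an assumed example of a "full"-only field.
interface ObservationResult {
  id: number;
  type: string;
  title: string;
  subtitle: string;
  created_at_epoch: number;
  project: string;
  score?: number;
  narrative?: string;
}

type ObservationIndexEntry = Pick<
  ObservationResult,
  "id" | "type" | "title" | "subtitle" | "created_at_epoch" | "project" | "score"
>;

// Keep only the lightweight fields used by format=index.
function toIndexEntry(r: ObservationResult): ObservationIndexEntry {
  const { id, type, title, subtitle, created_at_epoch, project, score } = r;
  return { id, type, title, subtitle, created_at_epoch, project, score };
}

// Single decision point for index vs full output.
function shapeResults(results: ObservationResult[], format: "index" | "full") {
  return format === "index" ? results.map(toIndexEntry) : results;
}
```

A transport-specific layer (JSON for HTTP, text or resources for MCP) would then only decide how to serialize the already-shaped results.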
|
||||
|
||||
#### 3. **Search Logic Duplication**
|
||||
|
||||
**HTTP Endpoints:**
|
||||
|
||||
```typescript
const sessionSearch = this.dbManager.getSessionSearch();
const results = sessionSearch.searchObservations(query, { limit, project });
```
|
||||
|
||||
**MCP Tools:**
|
||||
|
||||
```typescript
// Hybrid search with Chroma fallback
if (chromaClient) {
  const chromaResults = await queryChroma(query, 100);
  // ... complex hybrid logic ...
}
if (results.length === 0) {
  results = search.searchObservations(query, options);
}
```
|
||||
|
||||
**Violation:** Search orchestration has diverged: the MCP tools run hybrid Chroma + FTS5 search, while the HTTP endpoints run FTS5 only.
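The hybrid flow in the MCP handler (semantic candidates, 90-day recency filter, hydration, FTS5 fallback) can be written once against injected dependencies so that either transport could call it. This is a sketch under assumptions: `HybridDeps`, `Hit`, and the adapter signatures are hypothetical stand-ins for the real Chroma and SQLite calls, and timestamps are assumed to be epoch milliseconds as in the handler above.

```typescript
// Hypothetical adapters; the real code would wrap queryChroma and the SQLite store.
interface Hit {
  id: number;
  created_at_epoch: number; // assumed epoch milliseconds
}

interface HybridDeps<T> {
  semantic: ((query: string, topK: number) => Promise<Hit[]>) | null; // null = Chroma unavailable
  hydrate: (ids: number[], limit: number) => T[];
  fullText: (query: string, limit: number) => T[];
  now?: () => number; // injectable clock for tests
}

const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

async function hybridSearch<T>(
  query: string,
  limit: number,
  deps: HybridDeps<T>,
): Promise<T[]> {
  const now = deps.now ? deps.now() : Date.now();

  if (deps.semantic) {
    try {
      // Step 1: semantic candidates (top 100)
      const hits = await deps.semantic(query, 100);
      // Step 2: keep only candidates from the last 90 days
      const recentIds = hits
        .filter((h) => h.created_at_epoch > now - NINETY_DAYS_MS)
        .map((h) => h.id);
      // Step 3: hydrate full records
      if (recentIds.length > 0) {
        const results = deps.hydrate(recentIds, limit);
        if (results.length > 0) return results;
      }
    } catch {
      // Semantic search failed; fall through to full-text search.
    }
  }

  // Fallback: full-text search when semantic search is unavailable or empty.
  return deps.fullText(query, limit);
}
```

Injecting the clock and the three adapters keeps the function testable without a running Chroma instance or database.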
|
||||
|
||||
#### 4. **Error Handling**
|
||||
|
||||
**HTTP Endpoints:**
|
||||
|
||||
```typescript
try {
  // ... handler logic ...
} catch (error) {
  logger.failure("WORKER", "Search observations failed", {}, error as Error);
  res.status(500).json({ error: (error as Error).message });
}
```
|
||||
|
||||
**MCP Tools:**
|
||||
|
||||
```typescript
try {
  // ... handler logic ...
} catch (error: any) {
  return { content: [{ type: "text", text: `Error: ${error.message}` }] };
}
```
|
||||
|
||||
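**Violation:** The two layers handle errors differently: HTTP handlers log the failure and return a 500 JSON body, while MCP tools return the error message as text content without logging.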
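Both patterns reduce to "extract a message, then wrap it for the transport". A minimal sketch of that split, with hypothetical helper names and no logging (which would stay in the caller), might look like:

```typescript
// Hypothetical helpers; output shapes mirror the two catch blocks above.
function errorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// Shape used by the HTTP handlers: status code plus JSON body.
function toHttpError(error: unknown): { status: number; body: { error: string } } {
  return { status: 500, body: { error: errorMessage(error) } };
}

// Shape used by the MCP tools: a single text content item.
function toMcpError(error: unknown): { content: { type: "text"; text: string }[] } {
  return { content: [{ type: "text", text: `Error: ${errorMessage(error)}` }] };
}
```

Using `unknown` instead of `any` or an `as Error` cast also covers the case where something other than an `Error` is thrown.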
|
||||
|
||||
### DRY Compliance at Data Layer ✅
|
||||
|
||||
**Good news:** Both architectures use the **same data layer**:
|
||||
|
||||
```
|
||||
HTTP Endpoints → SessionSearch → SQLite
|
||||
MCP Tools → SessionSearch → SQLite
|
||||
```
|
||||
|
||||
The `SessionSearch` class is the **single source of truth** for data access. No duplication there.
|
||||
|
||||
---
|
||||
|
||||
## Is curl the Best Approach?
|
||||
|
||||
### Current Approach: curl Commands
|
||||
|
||||
**Example from skill:**
|
||||
|
||||
```bash
|
||||
curl -s "http://localhost:37777/api/search/observations?query=authentication&format=index&limit=5"
|
||||
```
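For clients other than the skill, the same request can be assembled programmatically. The helper below is a hypothetical sketch (the function name is an assumption; the base URL, path, and parameter names come from the curl example above) that uses the standard `URL` API so that query values are encoded correctly:

```typescript
// Hypothetical helper for building search URLs against the worker service.
function buildSearchUrl(
  endpoint: string,
  params: Record<string, string | number | undefined>,
  baseUrl: string = "http://localhost:37777",
): string {
  const url = new URL(endpoint, baseUrl);
  for (const [key, value] of Object.entries(params)) {
    // Skip unset optional parameters instead of sending "undefined".
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const example = buildSearchUrl("/api/search/observations", {
  query: "authentication",
  format: "index",
  limit: 5,
});
// example === "http://localhost:37777/api/search/observations?query=authentication&format=index&limit=5"
```

Queries containing spaces or reserved characters are percent- or plus-encoded by `URLSearchParams`, which is easy to get wrong when concatenating strings into a curl command by hand.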
|
||||
|
||||
### Alternative Approaches
|
||||
|
||||
#### 1. **MCP Tools** (Deprecated)
|
||||
|
||||
**Pros:**
|
||||
|
||||
- Native Claude Code protocol
|
||||
- Rich type definitions
|
||||
- Better error handling
|
||||
- Resource formatting
|
||||
|
||||
**Cons:**
|
||||
|
||||
- ❌ ~2,500 token overhead per session
|
||||
- ❌ More complex to implement
|
||||
- ❌ Requires MCP server process
|
||||
- ❌ Less accessible for external tools
|
||||
|
||||
**Verdict:** MCP was deprecated for good reasons (token overhead). curl is better.
|
||||
|
||||
#### 2. **Direct Database Access** (Not feasible)
|
||||
|
||||
**Pros:**
|
||||
|
||||
- No HTTP overhead
|
||||
- No worker process needed
|
||||
|
||||
**Cons:**
|
||||
|
||||
- ❌ Skills can't access files directly
|
||||
- ❌ No way to execute TypeScript/SQLite from skill
|
||||
- ❌ Would require building native bindings
|
||||
|
||||
**Verdict:** Not possible with current skill architecture.
|
||||
|
||||
#### 3. **HTTP API via curl** (Current) ✅
|
||||
|
||||
**Pros:**
|
||||
|
||||
- ✅ Simple, standard protocol
|
||||
- ✅ Works with skill architecture
|
||||
- ✅ Easy to test (curl in terminal)
|
||||
|
||||
- ✅ Language-agnostic
|
||||
- ✅ No MCP token overhead
|
||||
- ✅ RESTful design
|
||||
|
||||
**Cons:**
|
||||
|
||||
- ⚠️ Requires worker service running
|
||||
- ⚠️ HTTP parsing overhead (minimal)
|
||||
|
||||
**Verdict:** **Best approach given constraints.**
|
||||
|
||||
### Why curl is Optimal
|
||||
|
||||
1. **Skill Constraints:** Skills can only execute shell commands. curl is the standard HTTP client.
|
||||
2. **Token Efficiency:** No tool definitions loaded into context (~2,250 token savings).
|
||||
3. **Progressive Disclosure:** The skill loads gradually, and HTTP requests are made only when needed.
|
||||
|
||||
4. **Debuggability:** Easy to test endpoints manually with curl.
|
||||
5. **Cross-platform:** curl is available on all major platforms.
|
||||
|
||||
---
|
||||
|
||||
### Question: "Is it routing into the search-service MCP file or is it a DRY violation?"
|
||||
|
||||
**Answer:** Both architectures exist, creating a DRY violation:
|
||||
|
||||
1. **HTTP Endpoints** (worker-service.ts) ← **Used by skill**
|
||||
2. **MCP Server** (search-server.ts) ← **Deprecated but still built**
|
||||
|
||||
### Current State
|
||||
|
||||
```
mem-search skill → HTTP API (worker-service.ts) → SessionSearch → SQLite
                                                                     ↑
MCP search server (deprecated) → SessionSearch ──────────────────────┘
```
|
||||
|
||||
Both use the same data layer (SessionSearch), but:
|
||||
|
||||
- ❌ Parameter parsing duplicated
|
||||
- ❌ Format conversion duplicated
|
||||
- ❌ MCP has hybrid Chroma search, HTTP doesn't
|
||||
- ❌ MCP still being built despite deprecation
|
||||
|
||||
**You said:** "We are intentionally exposing API search endpoints
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
|
||||
│ - Web UI │
|
||||
│ - Mobile app │
|
||||
│ - VS Code extension │
|
||||
│ - CLI tools │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
│ HTTP API
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Worker Service HTTP API │
|
||||
│ localhost:37777/api/search/* │
|
||||
│ │
|
||||
│ - Standard REST endpoints │
|
||||
│ - JSON responses │
|
||||
│ - Query parameter API │
|
||||
│ - format=index/full support │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ SessionSearch + ChromaSync │
|
||||
│ (Shared data layer) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
- Standard REST API
|
||||
- Easy to consume from any language/platform
|
||||
- Already supports format=index/full for token efficiency
|
||||
- Well-documented in skill operation guides
|
||||
- Clean JSON responses
|
||||
|
||||
---
|
||||
|
||||
## Architectural Recommendations
|
||||
|
||||
### Immediate Actions
|
||||
|
||||
#### 1. **Remove MCP Search Server** (Reduce Maintenance Burden)
|
||||
|
||||
**Problem:**
|
||||
|
||||
- MCP server is deprecated but still being built
|
||||
- Adds 1,781 lines of maintenance burden
|
||||
- Creates confusion about which search to use
|
||||
- DRY violation with HTTP endpoints
|
||||
|
||||
**Recommendation:**
|
||||
|
||||
```bash
# Remove from build pipeline
# scripts/build-hooks.js - already commented out, make permanent

# Delete configuration
rm plugin/.mcp.json

# Archive source (don't delete, keep for reference)
# git mv fails if the destination directory does not exist yet
mkdir -p archive
git mv src/servers/search-server.ts archive/search-server.ts.archived

# Remove built file
rm plugin/scripts/search-server.mjs
```
|
||||
|
||||
**Impact:**
|
||||
|
||||
- ✅ Reduces build time
|
||||
- ✅ Eliminates confusion
|
||||
- ✅ Reduces maintenance burden
|
||||
- ✅ Removes DRY violation
|
||||
- ⚠️ Loses hybrid Chroma search in MCP (but HTTP doesn't have it anyway)
|
||||
|
||||
#### 2. **Add Hybrid Search to HTTP Endpoints** (Feature Parity)
|
||||
|
||||
**Problem:** The MCP server has Chroma hybrid search; the HTTP endpoints don't.
|
||||
|
||||
**Recommendation:**
|
||||
|
||||
```typescript
// src/services/worker-service.ts
private async handleSearchObservations(req: Request, res: Response): Promise<void> {
  try {
    const { query, format, limit, project } = this.parseSearchParams(req);

    // Try hybrid search first if Chroma available
    let results = await this.hybridSearch(query, { limit, project });

    // Fallback to FTS5 if Chroma unavailable
    if (results.length === 0) {
      const sessionSearch = this.dbManager.getSessionSearch();
      results = sessionSearch.searchObservations(query, { limit, project });
    }

    res.json(this.formatSearchResponse(query, results, format));
  } catch (error) {
    this.handleSearchError(res, 'Search observations failed', error);
  }
}

// Extract shared methods
private parseSearchParams(req: Request): SearchParams { /* ... */ }
private async hybridSearch(query: string, options: SearchOptions): Promise<any[]> { /* ... */ }
private formatSearchResponse(query: string, results: any[], format: string): any { /* ... */ }
private handleSearchError(res: Response, message: string, error: any): void { /* ... */ }
```
|
||||
|
||||
**Impact:**
|
||||
|
||||
- ✅ Adds Chroma semantic search to HTTP API
|
||||
- ✅ Makes HTTP API feature-complete
|
||||
|
||||
#### 3. **Extract Shared Search Logic** (DRY Refactoring)
|
||||
|
||||
**Problem:** 10 HTTP endpoints have duplicated parameter parsing and formatting
|
||||
|
||||
**Recommendation:**
|
||||
|
||||
```typescript
// src/services/search/SearchController.ts (new file)
export class SearchController {
  constructor(private sessionSearch: SessionSearch, private chromaSync: ChromaSync) {}

  async searchObservations(params: SearchParams): Promise<SearchResponse> {
    // Shared logic for observations search
    const results = await this.hybridSearch(params);
    return this.formatResponse(results, params.format);
  }

  async searchSessions(params: SearchParams): Promise<SearchResponse> {
    // Shared logic for sessions search
  }

  // ... other search methods

  // Public: route handlers call this before delegating to a search method
  parseParams(req: Request): SearchParams {
    // Shared parameter parsing
  }

  private async hybridSearch(params: SearchParams): Promise<any[]> {
    // Shared hybrid search logic
  }

  private formatResponse(results: any[], format: "index" | "full"): SearchResponse {
    // Shared formatting logic
  }
}
```
|
||||
|
||||
**Usage in worker-service.ts:**
|
||||
|
||||
```typescript
private searchController: SearchController;

// async is required because the controller call is awaited
private async handleSearchObservations(req: Request, res: Response): Promise<void> {
  try {
    const params = this.searchController.parseParams(req);
    const response = await this.searchController.searchObservations(params);
    res.json(response);
  } catch (error) {
    this.handleSearchError(res, 'Search observations failed', error);
  }
}
```
|
||||
|
||||
**Impact:**
|
||||
|
||||
- ✅ Eliminates 90% of duplication across 10 endpoints
|
||||
- ✅ Single source of truth for search logic
|
||||
- ✅ Easier to test (test controller, not HTTP layer)
|
||||
- ✅ Easier to maintain
|
||||
- ✅ Easier to add new search endpoints
|
||||
|
||||
### Long-term Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Clients │
|
||||
│ ┌──────────────┬──────────────┬──────────────────────┐ │
|
||||
|
||||
│ │ Skill │ Frontend │ (CLI, IDE plugins) │ │
|
||||
│ └──────────────┴──────────────┴──────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
│ HTTP API (REST)
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ WorkerService (Express.js) │
|
||||
│ │
|
||||
│ Route Layer (thin) │
|
||||
│ ├─ GET /api/search/observations │
|
||||
│ ├─ GET /api/search/sessions │
|
||||
│ └─ ... (delegates to controller) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ SearchController (business logic) │
|
||||
│ │
|
||||
│ ├─ searchObservations() │
|
||||
│ ├─ searchSessions() │
|
||||
│ ├─ hybridSearch() - Chroma + FTS5 │
|
||||
│ ├─ formatResponse() - index/full conversion │
|
||||
│ └─ parseParams() - parameter validation │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
┌───────────┴───────────┐
|
||||
▼ ▼
|
||||
┌──────────────────────────┐ ┌──────────────────────────┐
|
||||
│ SessionSearch (FTS5) │ │ ChromaSync (Vectors) │
|
||||
│ - searchObservations() │ │ - queryByEmbedding() │
|
||||
│ - searchSessions() │ │ - 90-day recency filter │
|
||||
│ - searchPrompts() │ │ - Hydrate from SQLite │
|
||||
└──────────────────────────┘ └──────────────────────────┘
|
||||
│ │
|
||||
└─────────┬─────────────────┘
|
||||
▼
|
||||
┌────────────────────────────────┐
|
||||
│ SQLite Database │
|
||||
│ ~/.claude-mem/claude-mem.db │
|
||||
└────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
### Current Architecture Issues
|
||||
|
||||
1. ❌ **Dual search implementations** (HTTP + deprecated MCP)
|
||||
2. ❌ **DRY violations** across 19 search handlers
|
||||
3. ❌ **MCP server still built** despite deprecation
|
||||
4. ❌ **HTTP missing hybrid Chroma search** (MCP has it)
|
||||
5. ❌ **No shared controller layer** for search logic
|
||||
|
||||
### Is curl the Best Approach?
|
||||
|
||||
**Yes.** ✅
|
||||
|
||||
Given the constraints:
|
||||
|
||||
- Skills can only execute shell commands
|
||||
|
||||
- Token efficiency vs MCP (~2,250 token savings)
|
||||
- Standard REST pattern, easy to consume
|
||||
|
||||
curl + HTTP API is the optimal architecture.
|
||||
|
||||
### Is it Routing into search-service or DRY Violation?
|
||||
|
||||
**DRY violation.** ❌
|
||||
|
||||
Both architectures exist and duplicate logic:
|
||||
|
||||
- HTTP endpoints (worker-service.ts) ← ACTIVE
|
||||
- MCP server (search-server.ts) ← DEPRECATED but BUILT
|
||||
|
||||
They share the data layer (SessionSearch) but duplicate:
|
||||
|
||||
- Parameter parsing
|
||||
- Format conversion
|
||||
- Error handling
|
||||
- Search orchestration (MCP has Chroma, HTTP doesn't)
|
||||
|
||||
### Recommendations Priority
|
||||
|
||||
**High Priority:**
|
||||
|
||||
1. ✅ Remove MCP search server entirely (archive source)
|
||||
2. ✅ Add hybrid Chroma search to HTTP endpoints
|
||||
3. ✅ Extract SearchController for shared logic
|
||||
|
||||
**Medium Priority:**
|
||||
|
||||
5. Add API versioning (/api/v1/search/\*)
|
||||
6. Add rate limiting for external access
|
||||
|
||||
**Low Priority:**

7. OpenAPI/Swagger documentation
|
||||
|
||||
9. WebSocket support for real-time search
|
||||
|
||||
### Action Plan
|
||||
|
||||
**Phase 1: Cleanup (1 day)**
|
||||
|
||||
- Remove .mcp.json
|
||||
- Archive search-server.ts
|
||||
- Update CLAUDE.md to reflect removal
|
||||
- Update build scripts to skip MCP server
|
||||
|
||||
**Phase 2: Feature Parity (2 days)**
|
||||
|
||||
- Port hybrid Chroma search from MCP to HTTP
|
||||
- Test all 10 endpoints with hybrid search
|
||||
- Update skill documentation
|
||||
|
||||
**Phase 3: DRY Refactoring (3 days)**
|
||||
|
||||
- Create SearchController class
|
||||
- Extract shared logic (parsing, formatting, errors)
|
||||
- Refactor 10 HTTP handlers to use controller
|
||||
- Add comprehensive tests
|
||||
|
||||
- Document API for external consumption
|
||||
- Add authentication/authorization (if needed)
|
||||
- Add rate limiting
|
||||
- Create OpenAPI spec
|
||||
|
||||
---
|
||||
|
||||
## Files Referenced
|
||||
|
||||
**Active:**
|
||||
|
||||
- `src/services/worker-service.ts` - HTTP endpoints (1,338 lines)
|
||||
- `src/services/sqlite/SessionSearch.ts` - FTS5 search
|
||||
- `src/services/sync/ChromaSync.ts` - Vector search
|
||||
- `plugin/skills/mem-search/SKILL.md` - Skill using HTTP API
|
||||
|
||||
**Deprecated:**
|
||||
|
||||
- `src/servers/search-server.ts` - MCP tools (1,781 lines)
|
||||
- `plugin/.mcp.json` - MCP configuration
|
||||
- `plugin/scripts/search-server.mjs` - Built MCP server (74KB)
|
||||
|
||||
**Configuration:**
|
||||
|
||||
- `CLAUDE.md` line 314 - Deprecation notice
|
||||
- `CHANGELOG.md` lines 32-52 - v5.4.0 migration
|
||||
- `scripts/build-hooks.js` - Build pipeline (MCP commented out)
|
||||
@@ -0,0 +1,160 @@
|
||||
# Skill Audit Report
|
||||
|
||||
**Date:** 2025-11-10
|
||||
**Validation:** Anthropic's official skill-creator documentation
|
||||
**Skills Audited:** mem-search, search
|
||||
|
||||
## Executive Summary
|
||||
|
||||
The mem-search skill achieves 100% compliance across all dimensions. The search skill meets technical requirements but fails effectiveness metrics critical for auto-invocation.
|
||||
|
||||
**mem-search:** Production-ready. No changes required.
|
||||
|
||||
**search:** Requires three critical fixes before Claude reliably discovers and invokes this skill.
|
||||
|
||||
## mem-search Skill Results
|
||||
|
||||
**Status:** ✅ PASS
|
||||
**Compliance:** 100% technical, 100% effectiveness
|
||||
**Files:** 17 (202-line SKILL.md + 13 operations + 2 principles)
|
||||
|
||||
### Strengths
|
||||
|
||||
The skill demonstrates exemplary effectiveness engineering:
|
||||
|
||||
1. **Trigger Design (85% concrete)**
|
||||
- Five unique identifiers: claude-mem, PM2-managed database, cross-session memory, session summaries, observations
|
||||
- Nine scope differentiation keywords
|
||||
- Explicit boundary: "NOT in the current conversation context"
|
||||
- Minimal overlap with Claude's native capabilities
|
||||
|
||||
2. **Capability Visibility (100%)**
|
||||
- All nine operations include inline "Use when" examples
|
||||
- Decision guide reduces complexity from nine operations to five common cases
|
||||
- No navigation friction
|
||||
|
||||
3. **Structure**
|
||||
- 202 lines (60% under limit)
|
||||
- Perfect progressive disclosure with token cost documentation
|
||||
- Clean file organization: operations/ and principles/ directories
|
||||
- No content duplication
|
||||
|
||||
### Issues
|
||||
|
||||
**One false positive:** Line 152 contains backslashes in regex notation `(bugfix\|feature\|decision)`. This documents parameter syntax, not Windows paths. No action required.
|
||||
|
||||
## search Skill Results
|
||||
|
||||
**Status:** ⚠️ NEEDS IMPROVEMENT
|
||||
**Compliance:** 100% technical, 67% effectiveness
|
||||
**Files:** 13 (96-line SKILL.md + 12 operations)
|
||||
|
||||
### Critical Effectiveness Issues
|
||||
|
||||
Three failures prevent reliable auto-invocation:
|
||||
|
||||
#### Issue 1: Insufficient Scope Differentiation
|
||||
|
||||
**Problem:** Description contains only two differentiation keywords (threshold: ≥3). Claude cannot distinguish this skill from native conversation memory.
|
||||
|
||||
**Current description:**
|
||||
```text
Search claude-mem persistent memory for past sessions, observations, bugs
fixed, features implemented, decisions made, code changes, and previous work.
Use when answering questions about history, finding past decisions, or
researching previous implementations.
```
|
||||
|
||||
**Domain overlap analysis:**
|
||||
- Claude answers natively: "What bugs did we fix?" (current conversation)
|
||||
- Claude needs skill: "What bugs did we fix last week?" (external database)
|
||||
|
||||
**Fix required:**
|
||||
|
||||
```text
Search claude-mem's external database of past sessions, observations, and
work from previous conversations. Accesses persistent memory stored outside
current session context - NOT information from today's conversation. Use when
users ask about: (1) previous sessions ("what did we do last week?"),
(2) historical work ("bugs we fixed months ago"), (3) cross-session patterns
("how have we approached this before?"), (4) work already stored in claude-mem
("what's in the database about X?"). Searches FTS5 full-text index across
typed observations (bugfix/feature/refactor/decision/discovery). For current
session memory, use native conversation context instead.
```
|
||||
|
||||
This adds eight differentiation keywords: "external database", "past sessions", "previous conversations", "outside current session", "NOT information from today's", "last week", "months ago", "already stored in claude-mem".
|
||||
|
||||
#### Issue 2: Weak Trigger Specificity
|
||||
|
||||
**Problem:** Only 44% concrete triggers (threshold: >50%). Only one unique identifier (threshold: ≥2).
|
||||
|
||||
**Abstract triggers (low specificity):**
|
||||
- "history" (could mean git history, browser history)
|
||||
- "past work" (could mean files, commits, documents)
|
||||
- "decisions" (could mean any decision tracking)
|
||||
- "previous work" (could mean current session earlier)
|
||||
- "implementations" (could mean code in current conversation)
|
||||
|
||||
**Concrete triggers (high specificity):**
|
||||
- "claude-mem" (unique system name)
|
||||
- "persistent memory" (system-specific)
|
||||
- "sessions" (cross-session concept)
|
||||
- "observations" (system-specific)
|
||||
|
||||
**Concrete ratio:** 4/9 = 44% (fails 50% threshold)
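The ratio is simply concrete triggers divided by all triggers. As an illustrative sketch (the function name and the way the threshold is applied are assumptions; the trigger lists and the >50% threshold come from this report), the calculation is:

```typescript
// Trigger lists copied from the audit above.
const concreteTriggers = ["claude-mem", "persistent memory", "sessions", "observations"];
const abstractTriggers = ["history", "past work", "decisions", "previous work", "implementations"];

// Hypothetical helper: share of triggers that are concrete, in the range 0..1.
function concreteRatio(concrete: string[], abstract: string[]): number {
  const total = concrete.length + abstract.length;
  return total === 0 ? 0 : concrete.length / total;
}

const ratio = concreteRatio(concreteTriggers, abstractTriggers); // 4 / 9
const percent = Math.round(ratio * 100); // 44
const passesThreshold = ratio > 0.5; // false: below the >50% threshold
```

Adding four more concrete triggers to these lists would bring the ratio to 8/13, roughly 62%, which would clear the threshold.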
|
||||
|
||||
**Fix required:** Add system-specific terminology: "HTTP API", "port 37777", "FTS5 full-text index", "typed observations". See combined description in Issue 1 fix.
|
||||
|
||||
#### Issue 3: Wasted Content in Body
|
||||
|
||||
**Problem:** Lines 10-22 contain "When to Use This Skill" section in SKILL.md body. This loads AFTER triggering, wastes ~200 tokens, provides no value.
|
||||
|
||||
**Reference:** [Anthropic's skill-creator documentation](https://github.com/anthropics/anthropic-quickstarts/tree/main/skill-creator) states: "The body is only loaded after triggering, so 'When to Use This Skill' sections in the body are not helpful to Claude."
|
||||
|
||||
**Fix required:** Delete lines 10-22 entirely. Move triggering examples to description field (already included in Issue 1 fix).
|
||||
|
||||
### Strengths
|
||||
|
||||
The skill demonstrates strong structure:
|
||||
|
||||
- Excellent progressive disclosure (96-line navigation hub)
|
||||
- Strong decision guide (reduces 10 operations to common cases)
|
||||
- 100% capability visibility (all operations show purpose inline)
|
||||
- No forbidden files or content duplication
|
||||
- Clean operations/ directory structure
|
||||
|
||||
### Warning
|
||||
|
||||
**Minor:** Description uses imperative "Use when" instead of third person. Change to "Useful for" or "Invoked when" for consistency with skill-creator best practices.
|
||||
|
||||
## Comparison
|
||||
|
||||
| Metric | mem-search | search | Impact |
|--------|-----------|---------|--------|
| Concrete triggers | 85% | 44% | search harder to discover |
| Unique identifiers | 5+ | 1 | search less distinct |
| Scope differentiation | 9 keywords | 2 keywords | **search conflicts with native memory** |
| Body optimization | Clean | Wasted section | search wastes tokens |
| Overall effectiveness | 100% | 67% | search needs fixes |
|
||||
|
||||
## Critical Recommendations
|
||||
|
||||
The search skill requires three changes before production use:
|
||||
|
||||
1. **Rewrite description** to add scope differentiation and concrete triggers (see Issue 1 fix)
|
||||
2. **Delete lines 10-22** from SKILL.md body
|
||||
3. **Convert to third person** - change "Use when" to "Useful for"
|
||||
|
||||
**Why this matters:** Without scope differentiation, Claude assumes "What bugs did we fix?" refers to the current conversation, not the external claude-mem database. The result is systematic under-invocation of the skill.
|
||||
|
||||
## Reference Implementation
|
||||
|
||||
The mem-search skill serves as a reference implementation for:
|
||||
|
||||
- Trigger design with explicit scope boundaries
|
||||
- Progressive disclosure with token efficiency documentation
|
||||
- Inline capability visibility eliminating navigation friction
|
||||
- Decision guides reducing cognitive load
|
||||
|
||||
Study mem-search when creating skills that overlap with Claude's native capabilities.
|
||||
@@ -0,0 +1,88 @@
|
||||
# Claude-Mem Public Documentation
|
||||
|
||||
## What This Folder Is
|
||||
|
||||
This `docs/public/` folder contains the **Mintlify documentation site** - the official user-facing documentation for claude-mem. It's a structured documentation platform with a specific file format and organization.
|
||||
|
||||
## Folder Structure
|
||||
|
||||
```
docs/
├── public/               ← You are here (Mintlify MDX files)
│   ├── *.mdx             - User-facing documentation pages
│   ├── docs.json         - Mintlify configuration and navigation
│   ├── architecture/     - Technical architecture docs
│   ├── usage/            - User guides and workflows
│   └── *.webp, *.gif     - Assets (logos, screenshots)
└── context/              ← Internal documentation (does NOT belong in public/)
    └── *.md              - Planning docs, audits, references
```
|
||||
|
||||
## File Requirements
|
||||
|
||||
### Mintlify Documentation Files (.mdx)
|
||||
All official documentation files must be:
|
||||
- Written in `.mdx` format (Markdown with JSX support)
|
||||
- Listed in `docs.json` navigation structure
|
||||
- Follow Mintlify's schema and conventions
|
||||
|
||||
The documentation is organized into these sections:
|
||||
- **Get Started**: Introduction, installation, usage guides
|
||||
- **Best Practices**: Context engineering, progressive disclosure
|
||||
- **Configuration & Development**: Settings, dev workflow, troubleshooting
|
||||
- **Architecture**: System design, components, technical details
|
||||
|
||||
### Configuration File
|
||||
`docs.json` defines:
|
||||
- Site metadata (name, description, theme)
|
||||
- Navigation structure
|
||||
- Branding (logos, colors)
|
||||
- Footer links and social media
|
||||
|
||||
## What Does NOT Belong Here
|
||||
|
||||
**Planning documents, design docs, and reference materials go in `/docs/context/` instead.**
|
||||
|
||||
Files that belong in `/docs/context/` (NOT here):
|
||||
- Planning documents (`*-plan.md`, `*-outline.md`)
|
||||
- Implementation analysis (`*-audit.md`, `*-code-reference.md`)
|
||||
- Error tracking (`typescript-errors.md`)
|
||||
- Internal design documents
|
||||
- PR review responses
|
||||
- Reference materials (like `agent-sdk-ref.md`)
|
||||
- Work-in-progress documentation
|
||||
|
||||
## How to Add Official Documentation
|
||||
|
||||
1. Create a new `.mdx` file in the appropriate subdirectory
|
||||
2. Add the file path to `docs.json` navigation
|
||||
3. Use Mintlify's frontmatter and components
|
||||
4. Follow the existing documentation style
|
||||
5. Test locally: `npx mintlify dev`
|
||||
|
||||
## Development Workflow
|
||||
|
||||
**For contributors working on claude-mem:**
|
||||
- Read `/CLAUDE.md` in the project root for development instructions
|
||||
- Place planning/design docs in `/docs/context/`
|
||||
- Only add user-facing documentation to `/docs/public/`
|
||||
- Test documentation locally with Mintlify CLI before committing
|
||||
|
||||
## Testing Documentation
|
||||
|
||||
```bash
# Validate docs structure
npx mintlify validate

# Check for broken links
npx mintlify broken-links

# Run local dev server
npx mintlify dev
```
|
||||
|
||||
## Summary
|
||||
|
||||
**Simple Rule**:
|
||||
- `/docs/public/` = Official user documentation (Mintlify .mdx files) ← YOU ARE HERE
|
||||
- `/docs/context/` = Internal docs, plans, references, audits
|
||||
|
@@ -0,0 +1,222 @@
|
||||
# Context Engineering for AI Agents: Best Practices Cheat Sheet
|
||||
|
||||
## Core Principle
|
||||
**Find the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome.**
|
||||
|
||||
---
|
||||
|
||||
## Context Engineering vs Prompt Engineering
|
||||
|
||||
**Prompt Engineering**: Writing and organizing LLM instructions for optimal outcomes (one-time task)
|
||||
|
||||
**Context Engineering**: Curating and maintaining the optimal set of tokens during inference across multiple turns (iterative process)
|
||||
|
||||
Context engineering manages:
|
||||
- System instructions
|
||||
- Tools
|
||||
- Model Context Protocol (MCP)
|
||||
- External data
|
||||
- Message history
|
||||
- Runtime data retrieval
|
||||
|
||||
---
|
||||
|
||||
## The Problem: Context Rot
|
||||
|
||||
**Key Insight**: LLMs have an "attention budget" that gets depleted as context grows
|
||||
|
||||
- Every token attends to every other token (n² relationships)
|
||||
- As context length increases, model accuracy decreases
|
||||
- Models have less training experience with longer sequences
|
||||
- Context must be treated as a finite resource with diminishing marginal returns
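To make the n² point concrete, the short sketch below counts pairwise token relationships for a few context lengths. The function name and the sample lengths are chosen for illustration; the numbers are plain arithmetic, not a measurement of model accuracy.

```python
def pairwise_relationships(n_tokens: int) -> int:
    """Number of token-to-token relationships when every token attends to every token."""
    return n_tokens * n_tokens


# Doubling the context length quadruples the number of relationships.
for n in (1_000, 2_000, 100_000):
    print(f"{n:>7} tokens -> {pairwise_relationships(n):>14,} relationships")
```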
|
||||
|
||||
---
|
||||
|
||||
## System Prompts: Find the "Right Altitude"
|
||||
|
||||
### The Goldilocks Zone
|
||||
|
||||
**Too Prescriptive** ❌
|
||||
- Hardcoded if-else logic
|
||||
- Brittle and fragile
|
||||
- High maintenance complexity
|
||||
|
||||
**Too Vague** ❌
|
||||
- High-level guidance without concrete signals
|
||||
- Falsely assumes shared context
|
||||
- Lacks actionable direction
|
||||
|
||||
**Just Right** ✅
|
||||
- Specific enough to guide behavior effectively
|
||||
- Flexible enough to provide strong heuristics
|
||||
- Minimal set of information that fully outlines expected behavior
|
||||
|
||||
### Best Practices
|
||||
- Use simple, direct language
|
||||
- Organize into distinct sections (`<background_information>`, `<instructions>`, `## Tool guidance`, etc.)
|
||||
- Use XML tags or Markdown headers for structure
|
||||
- Start with minimal prompt, add based on failure modes
|
||||
- Note: Minimal ≠ short (provide sufficient information upfront)
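One way to apply the sectioning advice is a small helper that assembles the prompt from labeled parts. This is a minimal sketch: the section names come from the list above, while `build_system_prompt` and the sample text are hypothetical.

```python
def build_system_prompt(background: str, instructions: str, tool_guidance: str) -> str:
    """Assemble a system prompt from distinct, clearly delimited sections."""
    sections = [
        f"<background_information>\n{background.strip()}\n</background_information>",
        f"<instructions>\n{instructions.strip()}\n</instructions>",
        f"## Tool guidance\n{tool_guidance.strip()}",
    ]
    return "\n\n".join(sections)


# Sample content is invented for the example.
prompt = build_system_prompt(
    background="You help maintain a documentation site.",
    instructions="Answer with the file path first, then a short explanation.",
    tool_guidance="Use the search tool before reading whole files.",
)
print(prompt)
```

Keeping each section as a separate argument makes it easier to start minimal and extend only the section that a failure mode points to.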
|
||||
|
||||
---
|
||||
|
||||
## Tools: Minimal and Clear
|
||||
|
||||
### Design Principles
|
||||
- **Self-contained**: Each tool has a single, clear purpose
|
||||
- **Robust to error**: Handle edge cases gracefully
|
||||
- **Extremely clear**: Intended use is unambiguous
|
||||
- **Token-efficient**: Returns relevant information without bloat
|
||||
- **Descriptive parameters**: Unambiguous input names (e.g., `user_id` not `user`)
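The principles above might look like this in a tool definition. The sketch assumes a JSON Schema-style definition with `name`, `description`, and `input_schema` keys; the tool itself (`get_user_orders`) and the `missing_required` helper are hypothetical.

```python
# Hypothetical tool: one clear purpose, unambiguous parameter names.
GET_USER_ORDERS = {
    "name": "get_user_orders",
    "description": "Return the most recent orders for one user, newest first.",
    "input_schema": {
        "type": "object",
        "properties": {
            "user_id": {"type": "string", "description": "Unique ID of the user."},
            "max_results": {"type": "integer", "description": "Upper bound on orders returned."},
        },
        "required": ["user_id"],
    },
}


def missing_required(tool: dict, arguments: dict) -> list:
    """List required parameters absent from a call, so errors can be reported clearly."""
    required = tool["input_schema"]["required"]
    return [name for name in required if name not in arguments]


print(missing_required(GET_USER_ORDERS, {"max_results": 5}))
```

Naming the parameter `user_id` rather than `user` removes the question of whether a name, an email, or an ID is expected.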
|
||||
|
||||
### Critical Rule
|
||||
**If a human engineer can't definitively say which tool to use in a given situation, an AI agent can't be expected to do better.**
|
||||
|
||||
### Common Failure Modes to Avoid
|
||||
- Bloated tool sets covering too much functionality
|
||||
- Tools with overlapping purposes
|
||||
- Ambiguous decision points about which tool to use
|
||||
|
||||
---
|
||||
|
||||
## Examples: Diverse, Not Exhaustive
|
||||
|
||||
**Do** ✅
|
||||
- Curate a set of diverse, canonical examples
|
||||
- Show expected behavior effectively
|
||||
- Think "pictures worth a thousand words"
|
||||
|
||||
**Don't** ❌
|
||||
- Stuff in a laundry list of edge cases
|
||||
- Try to articulate every possible rule
|
||||
- Overwhelm with exhaustive scenarios
|
||||
|
||||
---
|
||||
|
||||
## Context Retrieval Strategies
|
||||
|
||||
### Just-In-Time Context (Recommended for Agents)
|
||||
**Approach**: Maintain lightweight identifiers (file paths, queries, links) and dynamically load data at runtime
|
||||
|
||||
**Benefits**:
|
||||
- Avoids context pollution
|
||||
- Enables progressive disclosure
|
||||
- Mirrors human cognition (we don't memorize everything)
|
||||
- Leverages metadata (file names, folder structure, timestamps)
|
||||
- Agents discover context incrementally
|
||||
|
||||
**Trade-offs**:
|
||||
- Slower than pre-computed retrieval
|
||||
- Requires proper tool guidance to avoid dead-ends
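A just-in-time index can be sketched as a mapping from short labels to file paths, with contents read only when requested. The `ContextIndex` class, its method names, and the sample file are assumptions made for this example.

```python
import os
import tempfile


class ContextIndex:
    """Holds lightweight identifiers (paths) and loads file contents only on request."""

    def __init__(self):
        self._paths = {}

    def register(self, label, path):
        self._paths[label] = path

    def listing(self):
        """Cheap metadata the agent can always see."""
        return sorted(self._paths)

    def load(self, label, max_chars=2000):
        """Read at most max_chars so a large file cannot flood the context."""
        with open(self._paths[label], encoding="utf-8") as handle:
            return handle.read(max_chars)


with tempfile.TemporaryDirectory() as workdir:
    notes_path = os.path.join(workdir, "design-notes.md")
    with open(notes_path, "w", encoding="utf-8") as handle:
        handle.write("# Design notes\nSample content for the demo.\n")

    index = ContextIndex()
    index.register("design-notes", notes_path)
    labels = index.listing()
    loaded = index.load("design-notes")
    print(labels, len(loaded))
```

Only the labels stay in context by default. The `max_chars` cap is one simple guard against the context pollution mentioned above.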
|
||||
|
||||
### Pre-Inference Retrieval (Traditional RAG)
|
||||
**Approach**: Use embedding-based retrieval to surface context before inference
|
||||
|
||||
**When to Use**: Static content that won't change during interaction
|
||||
|
||||
### Hybrid Strategy (Best of Both)
|
||||
**Approach**: Retrieve some data upfront, enable autonomous exploration as needed
|
||||
|
||||
**Example**: Claude Code loads CLAUDE.md files upfront, uses glob/grep for just-in-time retrieval
|
||||
|
||||
**Rule of Thumb**: "Do the simplest thing that works"
|
||||
|
||||
---
|
||||
|
||||
## Long-Horizon Tasks: Three Techniques
|
||||
|
||||
### 1. Compaction
|
||||
**What**: Summarize a conversation nearing the context limit, then start a new context window with the summary
|
||||
|
||||
**Implementation**:
|
||||
- Pass message history to model for compression
|
||||
- Preserve critical details (architectural decisions, bugs, implementation)
|
||||
- Discard redundant outputs
|
||||
- Continue with compressed context + recently accessed files
|
||||
|
||||
**Tuning Process**:
|
||||
1. **First**: Maximize recall (capture all relevant information)
|
||||
2. **Then**: Improve precision (eliminate superfluous content)
|
||||
|
||||
**Low-Hanging Fruit**: Clear old tool calls and results
|
||||
|
||||
**Best For**: Tasks requiring extensive back-and-forth
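The "low-hanging fruit" step, clearing old tool results, can be sketched as follows. The message shape (dicts with `role` and `content`, and a `"tool"` role) is an assumption for illustration, as are the function name and placeholder text. Full compaction would also involve a model-written summary, which this sketch does not attempt.

```python
def clear_old_tool_results(messages, keep_recent=2, placeholder="[tool result cleared]"):
    """Replace the content of all but the most recent tool results with a placeholder."""
    tool_positions = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # A slice ending at -0 would be empty, so handle keep_recent == 0 explicitly.
    stale = set(tool_positions[:-keep_recent]) if keep_recent > 0 else set(tool_positions)
    return [
        {**m, "content": placeholder} if i in stale else m
        for i, m in enumerate(messages)
    ]


# Invented conversation history for the demo.
history = [
    {"role": "user", "content": "Find the failing test."},
    {"role": "tool", "content": "...500 lines of test output..."},
    {"role": "assistant", "content": "The failure is in test_login."},
    {"role": "tool", "content": "...contents of auth.py..."},
    {"role": "tool", "content": "...contents of test_auth.py..."},
]
compacted = clear_old_tool_results(history, keep_recent=2)
print([m["content"] for m in compacted if m["role"] == "tool"])
```

The function returns a new list and leaves `history` untouched, so the original transcript remains available if the cleared output is needed again.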
|
||||
|
||||
### 2. Structured Note-Taking (Agentic Memory)
|
||||
**What**: Agent writes notes persisted outside context window, retrieved later
|
||||
|
||||
**Examples**:
|
||||
- To-do lists
|
||||
- NOTES.md files
|
||||
- Game state tracking (Pokémon example: tracking 1,234 steps of training)
|
||||
- Project progress logs
|
||||
|
||||
**Benefits**:
|
||||
- Persistent memory with minimal overhead
|
||||
- Maintains critical context across tool calls
|
||||
- Enables multi-hour coherent strategies
|
||||
|
||||
**Best For**: Iterative development with clear milestones
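A NOTES.md-style memory can be as small as an append function and a read function. In this sketch the helper names and the sample notes are hypothetical; only the idea of a notes file persisted outside the context window comes from the text above.

```python
import os
import tempfile


def append_note(notes_path, note):
    """Persist one note as a Markdown bullet, outside the context window."""
    with open(notes_path, "a", encoding="utf-8") as handle:
        handle.write(f"- {note}\n")


def read_notes(notes_path):
    """Return previously saved notes, or an empty list if none exist yet."""
    if not os.path.exists(notes_path):
        return []
    with open(notes_path, encoding="utf-8") as handle:
        return [line[2:].rstrip("\n") for line in handle if line.startswith("- ")]


with tempfile.TemporaryDirectory() as workdir:
    notes_path = os.path.join(workdir, "NOTES.md")
    append_note(notes_path, "Milestone 1 done: parser handles nested lists")
    append_note(notes_path, "Next: add error recovery")
    notes = read_notes(notes_path)
    print(notes)
```

After a context reset, reading the notes back restores the milestone state at the cost of a few short lines.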
|
||||
|
||||
### 3. Sub-Agent Architectures
|
||||
**What**: Specialized sub-agents handle focused tasks with clean context windows
|
||||
|
||||
**How It Works**:
|
||||
- Main agent coordinates high-level plan
|
||||
- Sub-agents perform deep technical work
|
||||
- Sub-agents explore extensively (tens of thousands of tokens)
|
||||
- Return condensed summaries (1,000-2,000 tokens)
|
||||
|
||||
**Benefits**:
|
||||
- Clear separation of concerns
|
||||
- Parallel exploration
|
||||
- Detailed context remains isolated
|
||||
|
||||
**Best For**: Complex research and analysis tasks
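The coordination pattern can be shown with a toy stand-in. Here `run_sub_agent` is a hypothetical placeholder that does a keyword scan where a real sub-agent would run a model with its own context window. The point of the sketch is only that the coordinator receives condensed summaries, never the underlying documents.

```python
def run_sub_agent(task, documents, summary_limit=200):
    """Stand-in for a sub-agent: works through its own documents, returns a short summary."""
    matches = [doc for doc in documents if task.lower() in doc.lower()]
    summary = f"{task}: {len(matches)} of {len(documents)} documents matched"
    return summary[:summary_limit]


def coordinate(tasks_to_documents):
    """Main agent: delegates each task and keeps only the condensed results."""
    return [run_sub_agent(task, docs) for task, docs in tasks_to_documents.items()]


# Invented workload for the demo.
workload = {
    "auth": ["Auth module notes", "Billing notes", "OAuth flow"],
    "billing": ["Billing notes"],
}
summaries = coordinate(workload)
print(summaries)
```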
|
||||
|
||||
---
|
||||
|
||||
## Quick Decision Framework
|
||||
|
||||
| Scenario | Recommended Approach |
|----------|---------------------|
| Static content | Pre-inference retrieval or hybrid |
| Dynamic exploration needed | Just-in-time context |
| Extended back-and-forth | Compaction |
| Iterative development | Structured note-taking |
| Complex research | Sub-agent architectures |
| Rapid model improvement | "Do the simplest thing that works" |
|
||||
|
||||
---
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
1. **Context is finite**: Treat it as a precious resource with an attention budget
|
||||
2. **Think holistically**: Consider the entire state available to the LLM
|
||||
3. **Stay minimal**: More context isn't always better
|
||||
4. **Be iterative**: Context curation happens each time you decide what to pass to the model
|
||||
5. **Design for autonomy**: As models improve, let them act intelligently
|
||||
6. **Start simple**: Test with minimal setup, add based on failure modes
|
||||
|
||||
---
|
||||
|
||||
## Anti-Patterns to Avoid
|
||||
|
||||
- ❌ Cramming everything into prompts
|
||||
- ❌ Creating brittle if-else logic
|
||||
- ❌ Building bloated tool sets
|
||||
- ❌ Stuffing exhaustive edge cases as examples
|
||||
- ❌ Assuming larger context windows solve everything
|
||||
- ❌ Ignoring context pollution over long interactions
|
||||
|
||||
---
|
||||
|
||||
## Remember
|
||||
|
||||
> "Even as models continue to improve, the challenge of maintaining coherence across extended interactions will remain central to building more effective agents."
|
||||
|
||||
Context engineering will evolve, but the core principle stays the same: **optimize signal-to-noise ratio in your token budget**.
|
||||
|
||||
---
|
||||
|
||||
*Based on Anthropic's "Effective context engineering for AI agents" (September 2025)*
|
||||