Performance improvements: Token reduction and enhanced summaries (#101)

* refactor: Reduce continuation prompt token usage by 95 lines

Removed redundant instructions from continuation prompt that were originally added to mitigate a session continuity issue. That issue has since been resolved, making these detailed instructions unnecessary on every continuation.

Changes:
- Reduced continuation prompt from ~106 lines to ~11 lines (~95 line reduction)
- Changed "User's Goal:" to "Next Prompt in Session:" (more accurate framing)
- Removed redundant WHAT TO RECORD, WHEN TO SKIP, and OUTPUT FORMAT sections
- Kept concise reminder: "Continue generating observations and progress summaries..."
- Initial prompt still contains all detailed instructions

Impact:
- Significant token savings on every continuation prompt
- Faster context injection with no loss of functionality
- Instructions remain comprehensive in initial prompt

Files modified:
- src/sdk/prompts.ts (buildContinuationPrompt function)
- plugin/scripts/worker-service.cjs (compiled output)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: Enhance observation and summary prompts for clarity and token efficiency

* Enhance prompt clarity and instructions in prompts.ts

- Added a reminder to think about instructions before starting work.
- Simplified the continuation prompt instruction by removing "for this ongoing session."

* feat: Enhance settings.json with permissions and deny access to sensitive files

refactor: Remove PLAN-full-observation-display.md and PR_SUMMARY.md as they are no longer needed
chore: Delete SECURITY_SUMMARY.md since it is redundant after recent changes
fix: Update worker-service.cjs to streamline observation generation instructions
cleanup: Remove src-analysis.md and src-tree.md for a cleaner codebase
refactor: Modify prompts.ts to clarify instructions for memory processing

* refactor: Remove legacy worker service implementation

* feat: Enhance summary hook to extract last assistant message and improve logging

- Added function to extract the last assistant message from the transcript.
- Updated summary hook to include last assistant message in the summary request.
- Modified SDKSession interface to store last assistant message.
- Adjusted buildSummaryPrompt to utilize last assistant message for generating summaries.
- Updated worker service and session manager to handle last assistant message in summarize requests.
- Introduced silentDebug utility for improved logging and diagnostics throughout the summary process.

* docs: Add comprehensive implementation plan for ROI metrics feature

Added detailed implementation plan covering:
- Token usage capture from Agent SDK
- Database schema changes (migration #8)
- Discovery cost tracking per observation
- Context hook display with ROI metrics
- Testing and rollout strategy

Timeline: ~20 hours over 4 days
Goal: Empirical data for YC application amendment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: Add transcript processing scripts for analysis and formatting

- Implemented `dump-transcript-readable.ts` to generate a readable markdown dump of transcripts, excluding certain entry types.
- Created `extract-rich-context-examples.ts` to extract and showcase rich context examples from transcripts, highlighting user requests and assistant reasoning.
- Developed `format-transcript-context.ts` to format transcript context into a structured markdown format for improved observation generation.
- Added `test-transcript-parser.ts` for validating data extraction from transcript JSONL files, including statistics and error reporting.
- Introduced `transcript-to-markdown.ts` for a complete representation of transcript data in markdown format, showing all context data.
- Enhanced type definitions in `transcript.ts` to support new features and ensure type safety.
- Built `transcript-parser.ts` to handle parsing of transcript JSONL files, including error handling and data extraction methods.

* Refactor hooks and SDKAgent for improved observation handling

- Updated `new-hook.ts` to clean user prompts by stripping leading slashes for better semantic clarity.
- Enhanced `save-hook.ts` to include additional tools in the SKIP_TOOLS set, preventing unnecessary observations from certain command invocations.
- Modified `prompts.ts` to change the structure of observation prompts, emphasizing the observational role and providing a detailed XML output format for observations.
- Adjusted `SDKAgent.ts` to enforce stricter tool usage restrictions, ensuring the memory agent operates solely as an observer without any tool access.

* feat: Enhance session initialization to accept user prompts and prompt numbers

- Updated `handleSessionInit` in `worker-service.ts` to extract `userPrompt` and `promptNumber` from the request body and pass them to `initializeSession`.
- Modified `initializeSession` in `SessionManager.ts` to handle optional `currentUserPrompt` and `promptNumber` parameters.
- Added logic to update the existing session's `userPrompt` and `lastPromptNumber` if a `currentUserPrompt` is provided.
- Implemented debug logging for session initialization and updates to track user prompts and prompt numbers.

---------

Co-authored-by: Claude <noreply@anthropic.com>
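
For readers unfamiliar with the summary-hook change described in the message above (extracting the last assistant message from the transcript), here is a minimal sketch of what such an extraction could look like. This is not the code from this commit: the `TranscriptEntry` shape and the names `parseLine` and `extractLastAssistantMessage` are assumptions made for illustration. The message confirms only that transcripts are JSONL files.

```typescript
// Hypothetical entry shape; the real transcript schema is not shown in this commit.
interface TranscriptEntry {
  type: string;
  message?: { content?: string | Array<{ type: string; text?: string }> };
}

// Parse one JSONL line, returning null for malformed or non-object lines.
function parseLine(line: string): TranscriptEntry | null {
  try {
    const parsed: unknown = JSON.parse(line);
    return typeof parsed === "object" && parsed !== null
      ? (parsed as TranscriptEntry)
      : null;
  } catch {
    return null;
  }
}

// Walk the transcript backwards and return the text of the most recent
// assistant entry, or an empty string if there is none.
function extractLastAssistantMessage(jsonl: string): string {
  const lines = jsonl.split("\n").filter((line) => line.trim() !== "");
  for (let i = lines.length - 1; i >= 0; i--) {
    const entry = parseLine(lines[i]);
    if (entry === null || entry.type !== "assistant") continue;

    const content = entry.message?.content;
    if (typeof content === "string") return content;
    if (Array.isArray(content)) {
      // Keep only text blocks; other block types (assumed) are ignored.
      const text = content
        .filter((block) => block.type === "text" && typeof block.text === "string")
        .map((block) => block.text as string)
        .join("\n");
      if (text !== "") return text;
    }
  }
  return "";
}
```

Scanning from the end means the function stops at the first matching entry instead of parsing the whole file. Malformed lines are skipped here for brevity; a stricter version might surface them as errors, in line with the "Fail Fast" principle in CLAUDE.md.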
2025-11-13 18:22:44 -05:00
parent ab5d78717f
commit 68290a9121
39 changed files with 4584 additions and 2809 deletions
@@ -8,25 +8,9 @@ Claude-mem is a Claude Code plugin providing persistent memory across sessions.

 **Current Version**: 5.5.1

-## IMPORTANT: Skills Are Auto-Invoked, Not Commands
+## IMPORTANT: Skills Are Auto-Invoked

-**THERE IS NO `/skill` COMMAND IN CLAUDE CODE.**
-
-Skills are automatically invoked by Claude Code based on their description metadata. When documentation was updated, AI agents incorrectly hallucinated that `/skill <name>` was a valid command. It is not.
-
-**How Skills Actually Work:**
- Skills have a `name:` and `description:` in their frontmatter (SKILL.md)
- Claude Code automatically loads skill descriptions at session start
- Claude invokes skills based on matching user queries to skill descriptions
- Users simply ask naturally: "What did we do last session?" → mem-search skill auto-invokes
- No manual invocation command exists or is needed
-
-**Correct Documentation:**
- ❌ Wrong: "Run `/skill troubleshoot`"
- ✅ Right: "The troubleshoot skill will automatically activate when issues are detected"
- ✅ Right: "Ask about past work and the mem-search skill will activate"
-
-This note exists to prevent future documentation from re-introducing this hallucination.
+**There is no `/skill` command.** Skills auto-invoke based on description metadata matching user queries. Don't document manual invocation (e.g., "Run `/skill troubleshoot`"). Instead: "The troubleshoot skill auto-activates when issues are detected."

 ## Critical Architecture Knowledge

@@ -77,7 +61,6 @@ This note exists to prevent future documentation from re-introducing this halluc
 - Auto-invoked when users ask about past work, decisions, or history
 - Uses HTTP endpoints instead of MCP tools (~2,250 token savings per session)
 - 10 search operations: observations, sessions, prompts, by-type, by-file, by-concept, timelines, etc.
- Enhanced in v5.5.0 with "mem-search" naming for better scope differentiation

 **Chroma Vector Database** (`src/services/sync/ChromaSync.ts`)
 - Hybrid semantic + keyword search architecture
@@ -129,135 +112,25 @@ Changes to React components, styles, or viewer logic require rebuilding and rest
 2. `npm run sync-marketplace` → Syncs to `~/.claude/plugins/marketplaces/thedotmack/`
 3. Changes are live for next session (hooks/skills) or after restart (worker)

-## Coding Standards: DRY, YAGNI, and Anti-Patterns
+## Coding Standards

-**Philosophy**: Write the dumb, obvious thing first. Add complexity only when you actually hit the problem.
+**Philosophy**: Write the dumb, obvious thing first. Add complexity only when you hit the problem.

-### Common Anti-Patterns to Avoid
-
-**1. Wrapper Functions for Constants**
-```typescript
-// ❌ DON'T: Ceremonial wrapper that adds zero value
-export function getWorkerPort(): number {
-  return FIXED_PORT;
-}
-
-// ✅ DO: Export the constant directly
-export const WORKER_PORT = parseInt(process.env.CLAUDE_MEM_WORKER_PORT || "37777", 10);
-```
-
-**2. Unused Default Parameters**
-```typescript
-// ❌ DON'T: Defaults that are never actually used
-async function isHealthy(timeout: number = 3000) { ... }
-// Every call: isHealthy(1000) - the default is dead code
-
-// ✅ DO: Remove the default if no one uses it
-async function isHealthy(timeout: number) { ... }
-```
-
-**3. Magic Numbers Everywhere**
-```typescript
-// ❌ DON'T: Unexplained magic numbers scattered throughout
-if (await isWorkerHealthy(1000)) { ... }
-await waitForHealth(10000);
-setTimeout(resolve, 100);
-
-// ✅ DO: Named constants with context
-const HEALTH_CHECK_TIMEOUT_MS = 1000;
-const HEALTH_CHECK_MAX_WAIT_MS = 10000;
-const HEALTH_CHECK_POLL_INTERVAL_MS = 100;
-```
-
-**4. Overengineered Error Handling**
-```typescript
-// ❌ DON'T: Silent failures and defensive programming for ghosts
-checkProcess.on("close", (code) => {
-  // PM2 list can fail, but we should still continue - just assume worker isn't running
-  resolve(); // <- Silent failure!
-});
-
-// ✅ DO: Fail fast with clear errors
-checkProcess.on("close", (code) => {
-  if (code !== 0) {
-    reject(new Error(`PM2 not found - install dependencies first`));
-  }
-  resolve();
-});
-```
-
-**5. Fragile String Parsing**
-```typescript
-// ❌ DON'T: Parse human-readable output with string matching
-const isRunning = output.includes("claude-mem-worker") && output.includes("online");
-
-// ✅ DO: Use structured output (JSON)
-const processes = JSON.parse(execSync('pm2 jlist'));
-const isRunning = processes.some(p => p.name === 'claude-mem-worker' && p.pm2_env.status === 'online');
-```
-
-**6. Duplicated Promise Wrappers**
-```typescript
-// ❌ DON'T: Copy-paste the same promise pattern multiple times
-await new Promise((resolve, reject) => {
-  process1.on("error", reject);
-  process1.on("close", (code) => { /* ... */ });
-});
-// ... later ...
-await new Promise((resolve, reject) => {
-  process2.on("error", reject);
-  process2.on("close", (code) => { /* ... same pattern */ });
-});
-
-// ✅ DO: Extract a helper function
-async function waitForProcess(process: ChildProcess, validateExitCode = false): Promise<void> {
-  return new Promise((resolve, reject) => {
-    process.on("error", reject);
-    process.on("close", (code) => {
-      if (validateExitCode && code !== 0 && code !== null) {
-        reject(new Error(`Process failed with exit code ${code}`));
-      } else {
-        resolve();
-      }
-    });
-  });
-}
-```
-
-**7. YAGNI Violations - Solving Problems You Don't Have**
-```typescript
-// ❌ DON'T: 50+ lines checking PM2 status before starting
-const checkProcess = spawn(pm2Path, ["list", "--no-color"]);
-// ... parse output ...
-// ... check if running ...
-// ... then maybe start it ...
-
-// ✅ DO: Just start it (PM2 start is idempotent)
-if (!await isWorkerHealthy()) {
-  await startWorker(); // PM2 handles "already running" gracefully
-  if (!await waitForWorkerHealth()) {
-    throw new Error("Worker failed to become healthy");
-  }
-}
-```
-
-### Why These Patterns Appear
-
-These anti-patterns often emerge from:
- **Training bias**: Code that looks "professional" is often overengineered
- **Risk aversion**: Optimizing for "what could go wrong" instead of "what do you actually need"
- **Pattern matching**: Seeing a problem and immediately scaffolding a framework
- **No real-world pain**: Not debugging at 2am means not feeling the cost of complexity
-
-### The Actual Standard
-
-1. **YAGNI (You Aren't Gonna Need It)**: Don't build it until you need it
-2. **DRY (Don't Repeat Yourself)**: Extract patterns after the second duplication, not before
+**Key Principles:**
+1. **YAGNI**: Don't build it until you need it
+2. **DRY**: Extract patterns after second duplication, not before
 3. **Fail Fast**: Explicit errors beat silent failures
-4. **Simple First**: Write the obvious solution, then optimize only if needed
+4. **Simple First**: Write the obvious solution, optimize only if needed
 5. **Delete Aggressively**: Less code = fewer bugs

-**Reference**: See worker-utils.ts critique (conversation 2025-11-05) for detailed examples.
+**Common anti-patterns to avoid:**
+- Ceremonial wrapper functions for constants (just export the constant)
+- Unused default parameters (remove if never used)
+- Magic numbers without named constants
+- Silent failures instead of explicit errors
+- Fragile string parsing (use structured JSON output)
+- Copy-pasted promise wrappers (extract helper functions)
+- Overengineered "defensive" code for problems you don't have

 ## Common Tasks

@@ -291,191 +164,21 @@ pm2 delete claude-mem-worker # Force clean start
 5. Use mem-search skill to verify behavior (auto-invoked when asking about past work)

 ### Version Bumps
-**Note**: There is no version-bump skill currently available. Version bumping must be done manually by updating:
- `package.json` - Update `version` field
- `plugin/.claude-plugin/plugin.json` - Update `version` field  
- `CLAUDE.md` - Update version number at top
- `README.md` - Update version badge
-
-Then run:
-```bash
-npm run build && npm run sync-marketplace
-```
+Use the `version-bump` skill (auto-invokes when requesting version updates). It handles:
+- Semantic version increments (patch/minor/major)
+- Updates all version references (package.json, plugin.json, CLAUDE.md, marketplace.json)
+- Creates git tags and GitHub releases
+- Auto-generates CHANGELOG.md from releases

 ## Investigation Best Practices

-**When investigations are failing persistently**, use Task agents for comprehensive file analysis instead of grep/search:
+When investigations fail persistently, use Task agents for comprehensive file analysis instead of repeated grep/search. Deploy agents to read full files and answer specific questions - more efficient than multiple rounds of searching.

-**❌ Don't:** Repeatedly grep and search for patterns when failing to find the issue
+## Environment Variables

-**✅ Do:** Deploy a Task agent to read files in full and answer specific questions
-```
-"Read these files in full and answer: [specific questions about the implementation]"
- Reduces token usage by delegating to a specialized agent
- Provides comprehensive analysis in one pass
- Finds issues that grep might miss due to poor query formulation
- More efficient than multiple rounds of searching
-```
-
-**Example:**
-```
-Deploy a general-purpose Task agent to:
-1. Read src/hooks/context-hook.ts in full
-2. Read src/services/worker-service.ts in full
-3. Answer: How do these files work together? What's the current implementation state?
-4. Find any bugs or inconsistencies between them
-```
-
-Use this when:
- Investigating how multiple files interact
- Search queries aren't finding what you expect
- Need complete implementation context
- Issue might be a subtle inconsistency between files
-
-## Recent Changes
-
-### v5.5.0 - mem-search Skill Enhancement
-**Skill Naming and Effectiveness**: Renamed from "search" to "mem-search" for better scope differentiation
- **Effectiveness Improvement**: Skill success rate increased from 67% to 100%
- **Better Triggers**: Concrete triggers increased from 44% to 85%
- **5+ Unique Identifiers**: System-specific naming prevents confusion with native conversation memory
- **Comprehensive Documentation**: 17 total files with 12 operation guides + 2 principle directories
- **No User Action Required**: Skill automatically invokes when asking about past work, decisions, or history
-
-**How It Works:**
- User asks: "What bug did we fix last session?"
- Claude sees mem-search skill description matches → invokes mem-search skill
- Skill loads full instructions → uses curl to call HTTP API → formats results
- User sees formatted answer with past work context
-
-### v5.4.0 - Skill-Based Search Migration
-**Breaking Change**: MCP search tools replaced with skill-based approach
- **Token Savings**: ~2,250 tokens per session start
- **Progressive Disclosure**: Skill frontmatter (~250 tokens) instead of 9 MCP tool definitions (~2,500 tokens)
- **New HTTP API**: 10 search endpoints in worker service (localhost:37777/api/search/*)
- **Search Skill**: Auto-invoked when users ask about past work, decisions, or history
- **No User Action Required**: Migration is transparent, searches work automatically
- **Deprecated**: MCP search server (source kept for reference: src/servers/search-server.ts)
-
-**Available Search Operations:**
-1. Search observations (full-text)
-2. Search session summaries (full-text)
-3. Search user prompts (full-text)
-4. Search by observation type (bugfix, feature, refactor, discovery, decision)
-5. Search by concept tag
-6. Search by file path
-7. Get recent context for a project
-8. Get timeline around specific point in time
-9. Get timeline by query (search + timeline in one call)
-10. Get API help documentation
-
-**How It Works:**
- User asks: "What bug did we fix last session?"
- Claude sees skill description matches → invokes search skill
- Skill loads full instructions → uses curl to call HTTP API → formats results
- User sees formatted answer with past work context
-
-### v5.1.2 - Theme Toggle
-**Theme Support**: Light/dark mode for viewer UI
- User-selectable theme with persistent settings
- Automatic system preference detection
- Smooth transitions between themes
- Settings stored in browser localStorage
-
-### v5.1.0 - Web-Based Viewer UI
-**Major Feature**: Web-Based Viewer UI for Real-Time Memory Stream
- Production-ready viewer accessible at http://localhost:37777
- Real-time visualization via Server-Sent Events (SSE) - see observations, sessions, and prompts as they happen
- Infinite scroll pagination with automatic deduplication
- Project filtering to focus on specific codebases
- Settings persistence (sidebar state, selected project)
- Auto-reconnection with exponential backoff
- GPU-accelerated animations for smooth interactions
-
-**Worker Service API Endpoints** (14 HTTP/SSE endpoints total):
-
-*Viewer & Health:*
- `GET /` - Serves viewer HTML (self-contained React app)
- `GET /health` - Health check endpoint
- `GET /stream` - Server-Sent Events for real-time updates
-
-*Data Retrieval:*
- `GET /api/prompts` - Paginated user prompts with project filtering
- `GET /api/observations` - Paginated observations with project filtering
- `GET /api/summaries` - Paginated session summaries with project filtering
- `GET /api/stats` - Database statistics (total counts by project)
-
-*Settings:*
- `GET /api/settings` - Get current viewer settings
- `POST /api/settings` - Update viewer settings
-
-*Session Management:*
- `POST /sessions/:sessionDbId/init` - Initialize new session
- `POST /sessions/:sessionDbId/observations` - Add observations to session
- `POST /sessions/:sessionDbId/summarize` - Generate session summary
- `GET /sessions/:sessionDbId/status` - Get session status
- `DELETE /sessions/:sessionDbId` - Delete session (graceful cleanup)
-
-**Database Enhancements** (+98 lines in SessionStore):
- `getRecentPrompts()` - Paginated prompts with OFFSET/LIMIT
- `getRecentObservations()` - Paginated observations with OFFSET/LIMIT
- `getRecentSummaries()` - Paginated summaries with OFFSET/LIMIT
- `getStats()` - Aggregated statistics by project
- `getUniqueProjects()` - Distinct project names
-
-**Complete React UI** (17 new files, 1,500+ lines):
- Components: Header, Sidebar, Feed, Cards (Observation, Prompt, Summary, Skeleton)
- Hooks: useSSE, usePagination, useSettings, useStats
- Utils: Data merging, formatters, constants
- Assets: Monaspace Radon font, logos (dark mode + logomark)
- Build: esbuild pipeline for self-contained HTML bundle
-
-**Why This Matters**: Users can now visualize their memory stream in real-time. See exactly what claude-mem is capturing as you work, filter by project, and understand the context being injected into sessions.
-
-### v5.0.3 - Smart Install Caching
-**Smart Caching Installer for Windows Compatibility**:
- Eliminated redundant npm install on every SessionStart (2-5s → 10ms)
- Caches version in `.install-version` file
- Only runs npm install when actually needed (first time, version change, missing deps)
- 200x performance improvement for cached installations
-
-### v5.0.0 - Hybrid Search Architecture
-**Major Feature**: Chroma Vector Database Integration
- Hybrid semantic + keyword search combining ChromaDB with SQLite FTS5
- ChromaSync service for automatic vector embedding synchronization (738 lines)
- 90-day recency filtering for contextually relevant results
- Timeline and context search capabilities (now provided via skill-based HTTP API)
- Performance: Semantic search <200ms with 8,000+ vector documents
- Full-text search across observations, sessions, and prompts
-
-## Configuration Users Can Set
-
-**Model Selection** (`~/.claude/settings.json`):
-```json
-{
-  "env": {
-    "CLAUDE_MEM_MODEL": "claude-haiku-4-5"  // or sonnet-4-5, opus-4, etc.
-  }
-}
-```
-
-**Context Observation Count** (`~/.claude/settings.json`):
-```json
-{
-  "env": {
-    "CLAUDE_MEM_CONTEXT_OBSERVATIONS": "50"  // default, adjust based on needs
-  }
-}
-```
-
-**Worker Port** (`~/.claude/settings.json`):
-```json
-{
-  "env": {
-    "CLAUDE_MEM_WORKER_PORT": "37777"  // default
-  }
-}
-```
+- `CLAUDE_MEM_MODEL` - Model for observations/summaries (default: claude-haiku-4-5)
+- `CLAUDE_MEM_CONTEXT_OBSERVATIONS` - Observations injected at SessionStart (default: 50)
+- `CLAUDE_MEM_WORKER_PORT` - Worker service port (default: 37777)

 ## Key Design Decisions

@@ -485,13 +188,13 @@ Hooks have strict timeout limits. PM2 manages a persistent background worker, al
 ### Why SQLite FTS5
 Enables instant full-text search across thousands of observations without external dependencies. Automatic sync triggers keep FTS5 tables synchronized.

-### Why Graceful Cleanup (v4.1.0)
+### Why Graceful Cleanup
 Changed from aggressive DELETE requests to marking sessions complete. Prevents interrupting summary generation and other async operations.

-### Why Smart Install Caching (v5.0.3)
+### Why Smart Install Caching
 npm install is expensive (2-5s). Caching version state and only installing on changes makes SessionStart nearly instant (10ms).

-### Why Web-Based Viewer UI (v5.1.0)
+### Why Web-Based Viewer UI
 Real-time visibility into memory stream helps users understand what's being captured and how context is being built. SSE provides instant updates without polling. Self-contained HTML bundle (esbuild) eliminates deployment complexity - everything served from a single file.

 ## File Locations