Delete implementation plans and memory leak documentation files
- Removed `IMPLEMENTATION_PLAN_ROI_METRICS.md` which detailed the implementation plan for ROI metrics and discovery cost tracking. - Deleted `MEMORY_LEAK_FIXES.md` and `MEMORY_LEAK_SUMMARY.md` that contained information on memory leak fixes and their summaries.
@@ -1,434 +0,0 @@

# Hybrid Search Architecture: Problem-Solution Document

**Date:** 2025-01-15
**Author:** Claude Code (Session handoff document)
**Purpose:** Comprehensive fix guide for hybrid search architecture documentation and implementation

---

## Executive Summary

The claude-mem hybrid search architecture is **correctly implemented in code** but **incorrectly documented** in skill guides. Additionally, the workflow is missing the final "instant context timeline" step that completes the human memory analogy.

**Quick Status:**
- ✅ Backend code (`search-server.ts`): ChromaDB first, SQLite temporal sort
- ❌ Skill operation guides: Describe FTS5 as primary search method
- ❌ Missing feature: Automatic timeline context retrieval (before/after observations)
- ✅ Landing page: Recently corrected
- ⚠️ Documentation: Needs validation and potential refinement

---

## The Intended Architecture (User's Vision)

### Storage Flow

```
User Action
    ↓
1. SQLite Insert (FAST, synchronous)
   - Immediate persistence
   - Available for querying instantly
    ↓
2. ChromaDB Sync (BACKGROUND, asynchronous)
   - Worker generates embeddings
   - Takes time but doesn't block user
   - Uses OpenAI text-embedding-3-small
```

**Why this design:**
- Users don't wait for embedding generation
- SQLite provides immediate access
- ChromaDB catches up in background for semantic search

### Search Flow (3-Layer Sequential Architecture)

```
User Query: "How did we implement authentication?"
    ↓
LAYER 1: Semantic Retrieval (ChromaDB)
   - Vector similarity search
   - Returns observation IDs (not full records)
   - Top 100 semantic matches
   - 90-day recency filter applied
    ↓
LAYER 2: Temporal Ordering (SQLite)
   - Takes IDs from Layer 1
   - Hydrates full records from SQLite
   - Sorts by created_at_epoch DESC
   - Returns NEWEST relevant observation
    ↓
LAYER 3: Instant Context Timeline (SQLite) [MISSING IN CURRENT IMPLEMENTATION]
   - Takes top observation ID from Layer 2
   - Retrieves N observations BEFORE that point
   - Retrieves N observations AFTER that point
   - Provides temporal context: "what led here" + "what happened next"
    ↓
Present to User
   - Most relevant observation
   - Timeline showing before/after context
   - Mimics human memory
```

**Why ChromaDB can't do it alone:**
- ChromaDB doesn't efficiently support date range queries sorted by time
- SQLite excels at temporal operations (ORDER BY created_at_epoch)
- Need both: ChromaDB for semantic, SQLite for temporal

**Why the timeline matters:**
> LLMs don't experience time linearly like humans do. Humans remember: "I did X, which led to Y, then Z happened." The instant context timeline gives LLMs this temporal awareness that humans experience naturally.
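Layer 3 can be sketched as a small pure function over a time-sorted list. The names `getTimelineContext` and `Obs` are illustrative placeholders, not the project's actual API; in the real system the lookup would be a SQLite query rather than an in-memory array:

```typescript
// Hypothetical sketch of Layer 3: given the ID of the most relevant
// observation, pull N neighbours on either side of it from a list that is
// already sorted by created_at_epoch ascending.
interface Obs {
  id: number;
  created_at_epoch: number;
}

function getTimelineContext(
  sorted: Obs[],        // all observations, sorted by created_at_epoch ASC
  targetId: number,
  depthBefore: number,
  depthAfter: number
): { before: Obs[]; target: Obs; after: Obs[] } | null {
  const idx = sorted.findIndex(o => o.id === targetId);
  if (idx === -1) return null;
  return {
    // what led here
    before: sorted.slice(Math.max(0, idx - depthBefore), idx),
    target: sorted[idx],
    // what happened next
    after: sorted.slice(idx + 1, idx + 1 + depthAfter),
  };
}
```

Presenting `before`, `target`, and `after` together is what gives the "what led here" + "what happened next" narrative described above.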

### Fallback Behavior

```
IF ChromaDB unavailable OR no results:
    ↓
FTS5 Keyword Search (SQLite)
   - Full-text search on observations_fts
   - Basic keyword matching
   - Ensures backward compatibility
   - Fallback for older systems
```

**FTS5 is NOT "optional"** - it's the fallback mechanism for when ChromaDB isn't available or returns no results.
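The fallback decision described above reduces to a small predicate (an illustrative sketch, not the actual server code):

```typescript
// Sketch of the fallback rule: ChromaDB is tried first; FTS5 runs when
// ChromaDB is unavailable OR returned nothing, so callers always get the
// best result the system can produce.
type SearchPath = "chromadb" | "fts5";

function chooseSearchPath(chromaAvailable: boolean, chromaResultCount: number): SearchPath {
  if (!chromaAvailable) return "fts5";          // ChromaDB down: keyword search
  return chromaResultCount > 0 ? "chromadb" : "fts5"; // empty result set: keyword search
}
```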

---

## Current State Analysis

### ✅ What's Correct: Backend Implementation

**File:** `/Users/alexnewman/Scripts/claude-mem/src/servers/search-server.ts`
**Lines:** 360-396 (search_observations handler)

The code DOES implement Layers 1 & 2 correctly:

```typescript
// Step 1: ChromaDB semantic search (top 100)
if (chromaClient) {
  const chromaResults = await queryChroma(query, 100);

  // Step 2: Filter by 90-day recency
  const ninetyDaysAgo = Date.now() - (90 * 24 * 60 * 60 * 1000);
  const recentIds = chromaResults.ids.filter((_id, idx) => {
    const meta = chromaResults.metadatas[idx];
    return meta && meta.created_at_epoch > ninetyDaysAgo;
  });

  // Step 3: Hydrate from SQLite with temporal ordering
  results = store.getObservationsByIds(recentIds, {
    orderBy: 'date_desc',
    limit
  });
}

// Fallback to FTS5 if ChromaDB unavailable
if (results.length === 0) {
  results = search.searchObservations(query, options); // FTS5
}
```

**What this gets right:**
- ChromaDB semantic search FIRST (not FTS5)
- 90-day recency filter
- SQLite temporal ordering (`orderBy: 'date_desc'`)
- FTS5 fallback for reliability

### ❌ What's Wrong: Skill Operation Guides

**File:** `/Users/alexnewman/Scripts/claude-mem/plugin/skills/mem-search/operations/observations.md`

**Current Title:** "Search Observations (Full-Text)"
**Current Description:** "Search all observations using natural language queries."
**Current Line 351:** `query: z.string().describe('Search query for FTS5 full-text search')`

**The Problem:**
- Describes FTS5 as the search method
- No mention of ChromaDB semantic search
- Misleading title "Full-Text" implies keyword-only
- Examples don't show the ChromaDB → SQLite flow

**Impact:**
- Claude thinks it's doing FTS5 keyword search
- Doesn't understand it's semantic vector search
- Can't explain the architecture to users correctly

### ⚠️ What's Missing: Layer 3 (Instant Context Timeline)

The current implementation stops at Layer 2 (temporal ordering). It doesn't automatically:

1. Identify the MOST relevant observation (it returns a sorted list)
2. Retrieve observations BEFORE that point in time
3. Retrieve observations AFTER that point in time
4. Present the timeline context to the user

**Why this matters:**
The timeline is the **killer feature** that mimics human memory. Without it, users get:
- ❌ A sorted list of relevant observations
- ❌ No context about what led there
- ❌ No context about what happened next

With timeline, users get:
- ✅ The MOST relevant observation
- ✅ Context: "You did A and B before this"
- ✅ Context: "After this, you did C and D"
- ✅ Complete narrative like human memory

### 📋 Documentation Status

**Recently Fixed (✅):**
- `/Users/alexnewman/Scripts/claude-mem/docs/context/mem-search-technical-architecture.md`
- Now describes 3-layer sequential flow
- Includes human memory analogy
- Positions ChromaDB as primary

**Landing Page (✅):**
- `/Users/alexnewman/Scripts/claude-mem-pro/src/components/landing/Features.tsx`
- `/Users/alexnewman/Scripts/claude-mem-pro/src/components/landing/QuickBenefits.tsx`
- `/Users/alexnewman/Scripts/claude-mem-pro/src/components/landing/Architecture.tsx`
- All updated to describe ChromaDB-first architecture
- "Remember Like a Human" messaging added
- Timeline feature highlighted

**Needs Review:**
- SKILL.md technical notes (line 172)
- All operation guides in `/operations/` directory
- Common workflows documentation

---

## Required Fixes

### Fix 1: Update Skill Operation Guides

**Files to modify:**
- `/Users/alexnewman/Scripts/claude-mem/plugin/skills/mem-search/operations/observations.md`
- `/Users/alexnewman/Scripts/claude-mem/plugin/skills/mem-search/operations/common-workflows.md`

**Changes needed:**

1. **observations.md:**
   - Change title: "Search Observations (Full-Text)" → "Search Observations (Semantic + Temporal)"
   - Update description: Explain ChromaDB semantic search as primary
   - Update command examples to explain hybrid flow
   - Add note: "Uses ChromaDB vector search with SQLite temporal ordering. FTS5 used as fallback."

2. **common-workflows.md:**
   - Update "Workflow 2: Finding Specific Bug Fixes" to explain ChromaDB → SQLite flow
   - Add new workflow: "Workflow N: Getting Timeline Context Around Relevant Observations"

**Example of corrected observations.md header:**

```markdown
# Search Observations (Semantic + Temporal)

Search observations using ChromaDB vector similarity with SQLite temporal ordering.

## Architecture

**3-Layer Hybrid Search:**
1. **ChromaDB semantic retrieval** - Finds what's semantically relevant (vector similarity)
2. **90-day recency filter** - Prioritizes recent work
3. **SQLite temporal ordering** - Sorts by time, returns newest relevant

**Fallback:** If ChromaDB unavailable, falls back to FTS5 keyword search.

## When to Use

- User asks: "How did we implement authentication?"
- User asks: "What bugs did we fix?"
- Looking for past work by meaning/topic (not just keywords)
```

### Fix 2: Implement Layer 3 (Instant Context Timeline)

**Option A: Add to existing search_observations handler**

Modify `/Users/alexnewman/Scripts/claude-mem/src/servers/search-server.ts` line ~396:

```typescript
// After getting sorted results, if user wants timeline context
if (results.length > 0 && options.includeTimeline) {
  const topObservation = results[0];
  const depth_before = options.timelineDepthBefore || 5;
  const depth_after = options.timelineDepthAfter || 5;

  // Get observations before and after
  const timeline = store.getTimelineContext(
    topObservation.id,
    depth_before,
    depth_after
  );

  return {
    topResult: topObservation,
    timeline: timeline,
    format: format
  };
}
```

**Option B: Use existing timeline-by-query operation**

The `/api/timeline/by-query` endpoint already implements search + timeline. Could:
1. Make it the DEFAULT recommended operation in skill guides
2. Update operation guides to emphasize this as primary workflow
3. Position observations search as "timeline-less" alternative

**Recommendation:** Option B is faster - leverage existing `timeline-by-query` endpoint and update skill guides to make it the primary workflow.

### Fix 3: Update SKILL.md Technical Notes

**File:** `/Users/alexnewman/Scripts/claude-mem/plugin/skills/mem-search/SKILL.md`
**Line 172:**

**Current:**
```markdown
- **Search engine:** FTS5 full-text search + structured filters
```

**Change to:**
```markdown
- **Search engine:** ChromaDB vector search (primary) + SQLite temporal ordering + instant context timeline (3-layer sequential architecture)
```

### Fix 4: Update search_observations Description

**File:** `/Users/alexnewman/Scripts/claude-mem/src/servers/search-server.ts`
**Line 349:**

**Current:**
```typescript
description: 'Search observations using full-text search across titles, narratives...'
```

**Change to:**
```typescript
description: 'Search observations using hybrid semantic search (ChromaDB vector similarity + SQLite temporal ordering). Falls back to FTS5 keyword search if ChromaDB unavailable. IMPORTANT: Always use index format first...'
```

**Line 351:**

**Current:**
```typescript
query: z.string().describe('Search query for FTS5 full-text search'),
```

**Change to:**
```typescript
query: z.string().describe('Search query (semantic vector search via ChromaDB, falls back to FTS5 if unavailable)'),
```

---

## Implementation Checklist

Use this checklist when executing fixes:

### Phase 1: Core Documentation
- [ ] Update `observations.md` title and description
- [ ] Update `observations.md` architecture explanation
- [ ] Update `observations.md` examples to mention ChromaDB
- [ ] Update `common-workflows.md` to explain hybrid flow
- [ ] Update `SKILL.md` line 172 technical notes
- [ ] Verify all operation guides mention ChromaDB correctly

### Phase 2: Backend Updates
- [ ] Update `search-server.ts` search_observations description (line 349)
- [ ] Update `search-server.ts` query parameter description (line 351)
- [ ] Add code comments explaining 3-layer flow
- [ ] Consider adding `includeTimeline` option to search_observations

### Phase 3: Timeline Integration
- [ ] Review timeline-by-query operation
- [ ] Update skill guides to recommend timeline-by-query as primary workflow
- [ ] Add example: "When you need context, use timeline-by-query instead of observations search"
- [ ] Update quick reference table in SKILL.md to highlight timeline-by-query

### Phase 4: Validation
- [ ] Test search behavior with ChromaDB enabled
- [ ] Test fallback behavior with ChromaDB disabled
- [ ] Verify skill guides accurately describe behavior
- [ ] Ensure landing page messaging aligns with skill guides
- [ ] Check that human memory analogy is consistent everywhere

---

## Key Messaging (Use Consistently)

### Value Proposition
"3-layer hybrid search mimics human memory: ChromaDB semantic retrieval finds what's relevant → SQLite temporal ordering identifies when → instant context timeline shows what led there and what came next."

### Technical Architecture
"ChromaDB vector search handles semantic understanding (what's relevant), SQLite handles temporal queries (when it happened, what's newest), and timeline context provides before/after observations (what led there, what happened next)."

### Why It Matters
"LLMs don't experience time linearly like humans do. Claude-mem gives them temporal context: not just 'you implemented authentication,' but 'you researched OAuth libraries, then implemented JWT auth, then fixed a token expiration bug.' Complete narrative, like human memory."

### ChromaDB Role
"ChromaDB is the PRIMARY search mechanism for semantic understanding. FTS5 is the FALLBACK for backward compatibility and reliability when ChromaDB is unavailable."

---

## Files Reference

**Skill Guides (Primary Fixes):**
- `/Users/alexnewman/Scripts/claude-mem/plugin/skills/mem-search/SKILL.md`
- `/Users/alexnewman/Scripts/claude-mem/plugin/skills/mem-search/operations/observations.md`
- `/Users/alexnewman/Scripts/claude-mem/plugin/skills/mem-search/operations/timeline-by-query.md`
- `/Users/alexnewman/Scripts/claude-mem/plugin/skills/mem-search/operations/common-workflows.md`

**Backend Code (Minor Updates):**
- `/Users/alexnewman/Scripts/claude-mem/src/servers/search-server.ts`

**Documentation (Validation):**
- `/Users/alexnewman/Scripts/claude-mem/docs/context/mem-search-technical-architecture.md`

**Landing Page (Already Fixed):**
- `/Users/alexnewman/Scripts/claude-mem-pro/src/components/landing/Features.tsx`
- `/Users/alexnewman/Scripts/claude-mem-pro/src/components/landing/QuickBenefits.tsx`
- `/Users/alexnewman/Scripts/claude-mem-pro/src/components/landing/Architecture.tsx`

---

## Questions for User (If Needed)

1. **Timeline Integration Approach:**
   - Option A: Modify search_observations to add `includeTimeline` parameter
   - Option B: Emphasize timeline-by-query as primary workflow in guides
   - User preference?

2. **Backward Compatibility:**
   - Should FTS5 fallback be MORE prominent in docs for older systems?
   - Or keep it as "implementation detail"?

3. **Progressive Disclosure:**
   - Should timeline context ALWAYS be included?
   - Or only when user explicitly asks for context?

---

## Success Criteria

When these fixes are complete:

1. ✅ Skill operation guides accurately describe ChromaDB-first architecture
2. ✅ No references to "FTS5 as primary search method"
3. ✅ Timeline feature integrated into standard workflow
4. ✅ Human memory analogy present in key documentation
5. ✅ Consistent messaging across skill guides, docs, and landing page
6. ✅ Backend code comments explain 3-layer flow clearly
7. ✅ Users understand: "This is semantic search with temporal context, not just keyword search"

---

## Notes for Next Claude

- The user has already clarified the architecture thoroughly
- Backend code is already correct - focus on documentation/guides
- Landing page recently updated - validate for consistency
- Timeline-by-query endpoint already exists - leverage it
- Key insight: This mimics human memory through temporal context
- ChromaDB is PRIMARY, not optional. FTS5 is FALLBACK, not primary.

**Start with:** Reading this document fully, then update skill operation guides first (highest impact).
@@ -1,614 +0,0 @@

# Implementation Plan: ROI Metrics & Discovery Cost Tracking

**Feature**: Display token discovery costs alongside observations to demonstrate knowledge reuse ROI
**Branch**: `enhancement/roi`
**Issue**: #104
**Priority**: HIGH (needed for YC application amendment)

---

## Executive Summary

Capture token usage from the Agent SDK, store it as a "discovery cost" with each observation, and display metrics in SessionStart context to prove that claude-mem reduces token consumption by 50-75% through knowledge reuse.

### The Value Proposition

**Session 1**: Claude spends 4,000 tokens discovering "how Stop hooks work"
**Sessions 2-5**: Claude reads a 163-token observation instead of re-discovering
**Savings**: 15,348 tokens (77% reduction) over 5 sessions

This feature makes that ROI **visible and measurable** for both users and Claude.
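The arithmetic behind those figures can be checked directly; this worked example uses the numbers above, where the flat 4,000-token discovery cost is the scenario's assumption:

```typescript
// Worked check of the value proposition: 5 sessions on the same topic,
// with and without claude-mem.
const discoveryCost = 4_000; // tokens Session 1 spends discovering the topic
const readCost = 163;        // tokens each later session spends reading the observation
const laterSessions = 4;     // Sessions 2-5

const withoutMem = 5 * discoveryCost;                     // every session re-discovers
const withMem = discoveryCost + laterSessions * readCost; // discover once, read 4 times
const saved = withoutMem - withMem;                       // 15,348 tokens
const reductionPct = Math.round((saved / withoutMem) * 100); // 77%
```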

---

## Architecture Overview

```
Agent SDK Messages (with usage)
    ↓
SDKAgent captures usage data
    ↓
ActiveSession tracks cumulative tokens
    ↓
Observations stored with discovery_tokens
    ↓
Context hook displays metrics
    ↓
User/Claude sees ROI
```

---

## Implementation Steps

### Phase 1: Capture Token Usage from Agent SDK

**File**: `src/services/worker/SDKAgent.ts`

**Changes**:
1. Extract usage data from assistant messages (lines 64-86)
2. Track cumulative session tokens in ActiveSession
3. Pass cumulative tokens when storing observations

**Code Changes**:

```typescript
// Line ~70: After extracting textContent, add:
const usage = message.message.usage;
if (usage) {
  session.cumulativeInputTokens += usage.input_tokens || 0;
  session.cumulativeOutputTokens += usage.output_tokens || 0;

  // Cache creation counts as discovery, cache read doesn't
  if (usage.cache_creation_input_tokens) {
    session.cumulativeInputTokens += usage.cache_creation_input_tokens;
  }

  logger.debug('SDK', 'Token usage captured', {
    sessionId: session.sessionDbId,
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    cumulativeInput: session.cumulativeInputTokens,
    cumulativeOutput: session.cumulativeOutputTokens
  });
}
```

```typescript
// Line ~213-218: Pass discovery tokens when storing
const { id: obsId, createdAtEpoch } = this.dbManager.getSessionStore().storeObservation(
  session.claudeSessionId,
  session.project,
  obs,
  session.lastPromptNumber,
  session.cumulativeInputTokens + session.cumulativeOutputTokens // Add discovery cost
);
```

**Edge Cases**:
- Handle missing usage data (default to 0)
- Cache tokens: `cache_creation_input_tokens` counts as discovery, `cache_read_input_tokens` doesn't
- Multiple observations per response: Each gets snapshot of cumulative tokens at creation time
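The accumulation rules and edge cases above can be isolated as a pure function for testing. Names here are illustrative; the real code mutates the ActiveSession in place rather than returning new counters:

```typescript
// Sketch of the Phase 1 accumulation rules: missing usage defaults to no
// change, cache *creation* counts as discovery, cache *read* does not.
interface Usage {
  input_tokens?: number;
  output_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

interface TokenCounters {
  cumulativeInputTokens: number;
  cumulativeOutputTokens: number;
}

function accumulateUsage(counters: TokenCounters, usage: Usage | undefined): TokenCounters {
  if (!usage) return counters; // missing usage data: no change
  return {
    cumulativeInputTokens:
      counters.cumulativeInputTokens +
      (usage.input_tokens || 0) +
      (usage.cache_creation_input_tokens || 0), // creation counts as discovery
    cumulativeOutputTokens:
      counters.cumulativeOutputTokens + (usage.output_tokens || 0),
    // cache_read_input_tokens is deliberately ignored: reading cached
    // context is not new discovery work
  };
}
```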

---

### Phase 2: Update ActiveSession Type

**File**: `src/services/worker-types.ts`

**Changes**: Add token tracking fields to the ActiveSession interface

```typescript
export interface ActiveSession {
  sessionDbId: number;
  sdkSessionId: string | null;
  claudeSessionId: string;
  project: string;
  userPrompt: string;
  lastPromptNumber: number;
  pendingMessages: PendingMessage[];
  abortController: AbortController;
  startTime: number;
  cumulativeInputTokens: number;   // NEW: Track input tokens
  cumulativeOutputTokens: number;  // NEW: Track output tokens
}
```

**Initialization**: When creating a new session in SessionManager.initializeSession, set:
```typescript
cumulativeInputTokens: 0,
cumulativeOutputTokens: 0
```

---

### Phase 3: Database Schema Migration

**File**: `src/services/sqlite/migrations.ts`

**Add Migration**: Create migration #8 (next available number)

```typescript
{
  version: 8,
  name: 'add_discovery_tokens',
  up: (db: Database) => {
    // Add discovery_tokens to observations
    db.exec(`
      ALTER TABLE observations
      ADD COLUMN discovery_tokens INTEGER DEFAULT 0;
    `);

    // Add discovery_tokens to summaries
    db.exec(`
      ALTER TABLE summaries
      ADD COLUMN discovery_tokens INTEGER DEFAULT 0;
    `);

    logger.info('DB', 'Migration 8: Added discovery_tokens columns');
  }
}
```

**Why summaries too?**: Summaries represent accumulated session work, so they should also show total discovery cost.
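The migration's `up` function can be smoke-tested without a real database by stubbing `db.exec` and checking which statements would run. This is a test sketch under that assumption, not the project's actual migration harness:

```typescript
// Stub-based check of migration #8: capture the SQL it would execute.
interface StubDb {
  statements: string[];
  exec(sql: string): void;
}

function makeStubDb(): StubDb {
  const statements: string[] = [];
  return {
    statements,
    exec(sql: string) {
      statements.push(sql); // record instead of executing
    },
  };
}

const migration8 = {
  version: 8,
  name: "add_discovery_tokens",
  up(db: StubDb) {
    db.exec(`ALTER TABLE observations ADD COLUMN discovery_tokens INTEGER DEFAULT 0;`);
    db.exec(`ALTER TABLE summaries ADD COLUMN discovery_tokens INTEGER DEFAULT 0;`);
  },
};
```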

---

### Phase 4: Update SessionStore

**File**: `src/services/sqlite/SessionStore.ts`

**Changes**:

1. Update `storeObservation` signature (around line ~1000):
```typescript
storeObservation(
  sessionId: string,
  project: string,
  observation: ParsedObservation,
  promptNumber: number,
  discoveryTokens: number = 0  // NEW parameter
): { id: number; createdAtEpoch: number }
```

2. Update the INSERT statement to include discovery_tokens:
```typescript
const stmt = this.db.prepare(`
  INSERT INTO observations (
    session_id,
    project,
    type,
    title,
    subtitle,
    narrative,
    facts,
    concepts,
    files_read,
    files_modified,
    prompt_number,
    discovery_tokens,  -- NEW
    created_at_epoch
  ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
`);

const result = stmt.run(
  sessionId,
  project,
  observation.type,
  observation.title,
  observation.subtitle || '',
  observation.narrative || '',
  JSON.stringify(observation.facts || []),
  JSON.stringify(observation.concepts || []),
  JSON.stringify(observation.files || []),
  JSON.stringify([]),
  promptNumber,
  discoveryTokens,  // NEW
  createdAtEpoch
);
```

3. Update `storeSummary` similarly (around line ~1150):
```typescript
storeSummary(
  sessionId: string,
  project: string,
  summary: ParsedSummary,
  promptNumber: number,
  discoveryTokens: number = 0  // NEW parameter
): { id: number; createdAtEpoch: number }
```

---

### Phase 5: Update Database Types

**File**: `src/services/sqlite/types.ts`

**Changes**: Add discovery_tokens to the DBObservation and DBSummary interfaces

```typescript
export interface DBObservation {
  id: number;
  session_id: string;
  project: string;
  type: 'decision' | 'bugfix' | 'feature' | 'refactor' | 'discovery' | 'change';
  title: string;
  subtitle: string;
  narrative: string | null;
  facts: string;           // JSON array
  concepts: string;        // JSON array
  files_read: string;      // JSON array
  files_modified: string;  // JSON array
  prompt_number: number;
  discovery_tokens: number;  // NEW
  created_at_epoch: number;
}

export interface DBSummary {
  id: number;
  session_id: string;
  request: string;
  investigated: string | null;
  learned: string | null;
  completed: string | null;
  next_steps: string | null;
  notes: string | null;
  project: string;
  prompt_number: number;
  discovery_tokens: number;  // NEW
  created_at_epoch: number;
}
```

---

### Phase 6: Update Search Queries

**File**: `src/services/sqlite/SessionSearch.ts`

**Changes**: Ensure all SELECT queries include discovery_tokens

Example (around line ~50, searchObservations):
```sql
SELECT
  o.id,
  o.session_id,
  o.project,
  o.type,
  o.title,
  o.subtitle,
  o.narrative,
  o.facts,
  o.concepts,
  o.files_read,
  o.files_modified,
  o.prompt_number,
  o.discovery_tokens,  -- NEW
  o.created_at_epoch,
  ...
```

**Affected methods**:
- `searchObservations`
- `getRecentObservations`
- `getObservationsByType`
- `getObservationsByConcept`
- `getObservationsByFile`
- All other observation query methods

---

### Phase 7: Update Context Hook Display

**File**: `src/hooks/context-hook.ts`

**Changes**: Display discovery costs and ROI metrics in SessionStart context

**Section 1: Add Aggregate Metrics** (insert after line ~250, before the observation table)

```typescript
// Calculate aggregate metrics
const totalObservations = observations.length;
const totalReadTokens = observations.reduce((sum, obs) => {
  // Estimate read tokens from observation size
  const obsSize = (obs.title?.length || 0) +
    (obs.subtitle?.length || 0) +
    (obs.narrative?.length || 0) +
    JSON.stringify(obs.facts || []).length;
  return sum + Math.ceil(obsSize / 4); // Rough token estimate
}, 0);
const totalDiscoveryTokens = observations.reduce((sum, obs) => sum + (obs.discovery_tokens || 0), 0);
const savings = totalDiscoveryTokens - totalReadTokens;
const savingsPercent = totalDiscoveryTokens > 0
  ? Math.round((savings / totalDiscoveryTokens) * 100)
  : 0;

output += `\n📊 **Context Economics**:\n`;
output += ` - Loading: ${totalObservations} observations (${totalReadTokens.toLocaleString()} tokens to read)\n`;
output += ` - Discovery investment: ${totalDiscoveryTokens.toLocaleString()} tokens spent by previous sessions\n`;
output += ` - Your savings: ${savings.toLocaleString()} tokens (${savingsPercent}% reduction from reuse)\n\n`;
```

**Section 2: Update Table Format** (around line ~280)

```typescript
output += `| ID | Time | T | Title | Read | Discovery |\n`;
output += `|----|------|---|-------|------|-----------|\n`;

for (const obs of observations) {
  const time = formatTime(obs.created_at_epoch);
  const emoji = getTypeEmoji(obs.type);
  const title = truncate(obs.title, 50);

  // Estimate read tokens (observation size in tokens)
  const obsSize = (obs.title?.length || 0) +
    (obs.subtitle?.length || 0) +
    (obs.narrative?.length || 0) +
    JSON.stringify(obs.facts || []).length;
  const readTokens = Math.ceil(obsSize / 4);

  const discoveryTokens = obs.discovery_tokens || 0;
  const discoveryDisplay = discoveryTokens > 0
    ? `🔍 ${discoveryTokens.toLocaleString()}`
    : '-';

  output += `| #${obs.id} | ${time} | ${emoji} | ${title} | ~${readTokens} | ${discoveryDisplay} |\n`;
}
```

**Section 3: Add Footer Explanation** (after the table)

```typescript
output += `\n💡 **Column Key**:\n`;
output += ` - **Read**: Tokens to read this observation (cost to learn it now)\n`;
output += ` - **Discovery**: Tokens Previous Claude spent exploring/researching this topic\n`;
output += `\n**ROI**: Reading these learnings instead of re-discovering saves ${savingsPercent}% tokens\n`;
```

**Edge Case**: Handle old observations without discovery_tokens (show '-' or 0)
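The ~4-characters-per-token estimate appears in both sections above, so it is a natural candidate for a single helper. This is an illustrative extraction, and the heuristic itself is rough since actual tokenization varies by model:

```typescript
// Standalone version of the read-token estimate used by the context hook:
// sum the text fields' lengths and divide by ~4 characters per token.
interface ObsText {
  title?: string;
  subtitle?: string;
  narrative?: string | null;
  facts?: string[];
}

function estimateReadTokens(obs: ObsText): number {
  const size =
    (obs.title?.length || 0) +
    (obs.subtitle?.length || 0) +
    (obs.narrative?.length || 0) +
    JSON.stringify(obs.facts || []).length; // serialized facts, as in the hook
  return Math.ceil(size / 4);
}
```

Old rows without `discovery_tokens` are unaffected here: the estimate only looks at text fields, which is why the Discovery column, not Read, needs the '-' fallback.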

---

### Phase 8: Update Chroma Sync (Optional)

**File**: `src/services/sync/ChromaSync.ts`

**Changes**: Include discovery_tokens in vector metadata

```typescript
// Around line ~100, syncObservation metadata
metadata: {
  session_id: sessionId,
  project: project,
  type: observation.type,
  title: observation.title,
  prompt_number: promptNumber,
  discovery_tokens: discoveryTokens,  // NEW
  created_at_epoch: createdAtEpoch,
  ...
}
```

**Why?**: Enables semantic search to factor in discovery cost for relevance scoring (future enhancement)

---
|
||||
|
||||
## Testing Plan
|
||||
|
||||
### Unit Tests
|
||||
|
||||
1. **Token Capture Test**:
|
||||
- Mock Agent SDK response with usage data
|
||||
- Verify ActiveSession.cumulativeTokens increments correctly
|
||||
- Test cache token handling (creation counts, read doesn't)
|
||||
|
||||
2. **Storage Test**:
|
||||
- Create observation with discovery_tokens
|
||||
- Verify database stores correctly
|
||||
- Query back and verify field present
|
||||
|
||||
3. **Display Test**:
|
||||
- Create test observations with varying discovery costs
|
||||
- Run context-hook
|
||||
- Verify metrics calculate correctly
|
||||
- Verify table displays both Read and Discovery columns

### Integration Tests

1. **Full Session Flow**:
   - Start a new session
   - Trigger multiple tool executions
   - Generate observations
   - Verify cumulative tokens accumulate
   - Check that the context displays metrics

2. **Migration Test**:
   - Back up the existing database
   - Run migration #8
   - Verify the columns were added
   - Verify existing data is intact (discovery_tokens = 0)
   - Test that new observations store correctly
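The migration test assumes migration #8 looks roughly like the following; the table and column names follow this plan and are not verified against the actual schema:

```typescript
// Hypothetical shape of migration #8: a purely additive column with a 0
// default, so existing rows stay valid and old observations read back as 0.
const MIGRATION_8_STATEMENTS: string[] = [
  `ALTER TABLE observations ADD COLUMN discovery_tokens INTEGER NOT NULL DEFAULT 0`,
];
```

An additive `ALTER TABLE ... DEFAULT 0` is what makes the "0 data loss" success metric achievable: no rewrite of existing rows is needed.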

### Manual Testing

1. **Real Usage Scenario**:
   - Start a fresh Claude Code session
   - Perform a research task (read files, search the codebase)
   - Generate observations via claude-mem
   - Check the database for discovery_tokens values
   - Start a new session and verify the context shows metrics

2. **YC Demo Data**:
   - Run 5 sessions on the same topic
   - Collect token data for each session
   - Calculate actual ROI (Session 1 cost vs Sessions 2-5)
   - Screenshot metrics for the YC application

---

## Rollout Plan

### Phase 1: Data Collection (Week 1)
- Deploy migration and token capture
- Run without displaying metrics yet
- Verify data quality and accuracy
- Fix any issues with token tracking

### Phase 2: Display Metrics (Week 2)
- Enable context hook display
- Gather user feedback
- Iterate on presentation format
- Document any edge cases

### Phase 3: YC Application (Week 2-3)
- Collect empirical data from real usage
- Generate charts/graphs showing ROI
- Write a case study with actual numbers
- Amend the YC application with proof

### Phase 4: Public Launch (Week 4)
- Blog post explaining the feature
- Update README with ROI metrics
- Submit to HN/Reddit with data
- Reach out to Anthropic with findings

---

## Success Metrics

**Technical Success**:
- ✅ Token capture accuracy: >95% of SDK responses captured
- ✅ Database migration: 0 data loss, all observations migrated
- ✅ Display accuracy: metrics match raw data within 5%

**Business Success**:
- ✅ Demonstrate 50-75% token reduction across 10+ sessions
- ✅ YC application strengthened with empirical data
- ✅ User/Claude understanding of ROI improves (survey/feedback)

**Strategic Success**:
- ✅ Proof that memory optimization reduces infrastructure needs
- ✅ Data compelling enough for an Anthropic partnership discussion
- ✅ Foundation for an enterprise licensing ROI calculator

---

## Open Questions

1. **Token Attribution**:
   - Should each observation get cumulative session tokens, or a proportional split?
   - **Decision**: Use cumulative (simpler, shows total cost at that point)

2. **Cache Tokens**:
   - How should cache_read_input_tokens be handled in the ROI calculation?
   - **Decision**: Don't count cache reads as discovery (they're already discovered)

3. **Display Format**:
   - Show raw token counts or a human-readable format (K, M)?
   - **Decision**: Use toLocaleString() for readability (e.g., "4,000" not "4K")

4. **Pricing Display**:
   - Should we show dollar costs too, or just tokens?
   - **Decision**: Tokens only initially; pricing varies by model/plan and adds complexity

5. **Historical Data**:
   - What should happen with old observations that lack discovery_tokens?
   - **Decision**: Show as 0 or '-', and document the limitation
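Taken together, decisions 2, 3, and 5 suggest a savings calculation along these lines; this is a sketch under those assumptions, not the shipped formula:

```typescript
// ROI as the fraction of discovery cost avoided by reading instead of
// re-discovering. Old rows default discovery_tokens to 0, so no claim is made.
function roiSavingsPercent(discoveryTokens: number, readTokens: number): number {
  if (discoveryTokens <= 0) return 0;
  const saved = discoveryTokens - readTokens;
  return Math.max(0, Math.round((saved / discoveryTokens) * 100));
}

// Per decision 3: full counts with separators, not abbreviated "4K".
function formatTokens(tokens: number): string {
  return tokens.toLocaleString('en-US');
}
```

For example, a topic that cost 4,000 tokens to discover but only 1,000 to read back yields a 75% saving, which matches the 50-75% range targeted above.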

---

## Files Modified Summary

**Core Implementation**:
- `src/services/worker/SDKAgent.ts` - Capture usage, pass to storage
- `src/services/worker-types.ts` - Add cumulative token fields
- `src/services/sqlite/migrations.ts` - Migration #8 for discovery_tokens
- `src/services/sqlite/SessionStore.ts` - Store discovery tokens
- `src/services/sqlite/types.ts` - Update interfaces
- `src/services/sqlite/SessionSearch.ts` - Include in queries
- `src/hooks/context-hook.ts` - Display metrics

**Optional**:
- `src/services/sync/ChromaSync.ts` - Include in vector metadata
- `src/services/worker/SessionManager.ts` - Initialize cumulative tokens

**Documentation**:
- `CLAUDE.md` - Update with the new feature
- `README.md` - Add ROI metrics section
- Issue #104 - Track implementation progress

---

## Timeline Estimate

**Day 1** (Tomorrow):
- [x] Create branch
- [x] Write implementation plan
- [ ] Phase 1: Capture token usage (2 hours)
- [ ] Phase 2: Update types (30 min)
- [ ] Phase 3: Database migration (1 hour)

**Day 2**:
- [ ] Phase 4: Update SessionStore (1 hour)
- [ ] Phase 5: Update types (30 min)
- [ ] Phase 6: Update search queries (1 hour)
- [ ] Testing: Unit tests (2 hours)

**Day 3**:
- [ ] Phase 7: Update context hook display (2 hours)
- [ ] Testing: Integration tests (2 hours)
- [ ] Manual testing and iteration (2 hours)

**Day 4**:
- [ ] Collect real usage data (ongoing throughout the day)
- [ ] Generate YC metrics/charts (2 hours)
- [ ] Amend YC application (2 hours)
- [ ] Documentation updates (1 hour)

**Total**: ~20 hours of development over 4 days

---

## Risk Mitigation

**Risk 1**: Agent SDK usage data incomplete or missing
**Mitigation**: Default to 0, log warnings, don't break existing functionality

**Risk 2**: Migration fails on large databases
**Mitigation**: Test on a database copy first, add a rollback mechanism

**Risk 3**: Token estimates inaccurate
**Mitigation**: Document the methodology, add a "rough estimate" disclaimer

**Risk 4**: Display too noisy/overwhelming
**Mitigation**: Make the display configurable via settings, start collapsed

**Risk 5**: YC data not compelling enough
**Mitigation**: Run on diverse projects, highlight the best examples, be honest about limitations

---

## Next Steps

1. ✅ Create branch `enhancement/roi`
2. ✅ Write implementation plan
3. Start Phase 1: Implement token capture in SDKAgent.ts
4. Run a manual test to verify usage data is captured
5. Continue through the phases sequentially
6. Collect data for the YC application by end of week

---

## Notes for Tomorrow

**Start here**: `src/services/worker/SDKAgent.ts` lines 64-86
**Key insight**: `message.message.usage` contains the token data
**Don't forget**: Initialize cumulative tokens to 0 in SessionManager
**Test with**: A simple session that reads a few files and creates 1-2 observations
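A sketch of the Phase 1 capture, assuming the nested `message.message.usage` shape from the key insight above; the field names are assumptions and the real SDK types may differ:

```typescript
// Hypothetical SDK message shape inferred from the note above.
interface SDKMessage {
  message?: {
    usage?: { input_tokens?: number; output_tokens?: number };
  };
}

// Risk 1 mitigation baked in: missing usage data defaults to 0, never throws.
function extractUsageTokens(msg: SDKMessage): number {
  const usage = msg.message?.usage;
  if (!usage) return 0;
  return (usage.input_tokens ?? 0) + (usage.output_tokens ?? 0);
}
```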

**The goal**: By end of week, have real numbers showing 50-75% token savings to prove the hypothesis and strengthen the YC application.

---

*This plan represents ~20 hours of focused development. Prioritize getting Phases 1-7 working correctly over perfection. The YC data is the critical deliverable.*
@@ -1,189 +0,0 @@
# Memory Leak Fixes - Process Cleanup

## Problem Summary

Multiple `uvx` and Python processes were accumulating over time, eventually consuming excessive system resources. The root cause was improper cleanup of child processes spawned by:

1. **ChromaSync** - Each instance spawns a `uvx chroma-mcp` process via MCP StdioClientTransport
2. **Search Server** - Spawns a `uvx chroma-mcp` process for semantic search
3. **Worker Service** - Creates an MCP client connection to the search server

## Root Causes

### 1. ChromaSync Not Closed in DatabaseManager
**Location**: `src/services/worker/DatabaseManager.ts:42-52`

**Problem**: The `close()` method did not call `chromaSync.close()`, leaving the uvx process running even after the worker shut down.

**Fix**: Added explicit ChromaSync cleanup in the close() method:
```typescript
async close(): Promise<void> {
  // Close ChromaSync first (terminates uvx/python processes)
  if (this.chromaSync) {
    try {
      await this.chromaSync.close();
      this.chromaSync = null;
    } catch (error) {
      logger.error('DB', 'Failed to close ChromaSync', {}, error as Error);
    }
  }
  // ... rest of cleanup
}
```
### 2. Search Server Had No Cleanup Handlers
**Location**: `src/servers/search-server.ts:1743-1781`

**Problem**: The search server had no SIGTERM/SIGINT handlers, so child processes were orphaned when the server was terminated (especially during PM2 restarts).

**Fix**: Added a comprehensive cleanup function:
```typescript
async function cleanup() {
  console.error('[search-server] Shutting down...');

  // Close Chroma client (terminates uvx/python processes)
  if (chromaClient) {
    await chromaClient.close();
  }

  // Close database connections
  if (search) search.close();
  if (store) store.close();

  process.exit(0);
}

// Register cleanup handlers
process.on('SIGTERM', cleanup);
process.on('SIGINT', cleanup);
```
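One detail worth noting: registering `cleanup` for both signals means it can fire twice during a restart. A minimal idempotency guard, sketched here with an injectable `exit` purely for testability (the real handler would call `process.exit` directly):

```typescript
// Sketch of a double-invocation guard, assuming the cleanup body shown above.
let shuttingDown = false;

function guardedCleanup(exit: (code: number) => void = process.exit): void {
  // SIGTERM and SIGINT can both arrive during a PM2 restart; tear down once.
  if (shuttingDown) return;
  shuttingDown = true;
  // ... close chromaClient, search, and store as in cleanup() above ...
  exit(0);
}
```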

### 3. Worker Service Not Closing the MCP Client
**Location**: `src/services/worker-service.ts:214-230`

**Problem**: The worker service connected to the search server via an MCP client but never closed the connection, keeping the search server process alive.

**Fix**: Added MCP client cleanup in shutdown:
```typescript
async shutdown(): Promise<void> {
  await this.sessionManager.shutdownAll();

  // Close MCP client connection (terminates search server process)
  if (this.mcpClient) {
    try {
      await this.mcpClient.close();
      logger.info('SYSTEM', 'MCP client closed');
    } catch (error) {
      logger.error('SYSTEM', 'Failed to close MCP client', {}, error as Error);
    }
  }

  // ... rest of shutdown
}
```

### 4. PM2 Configuration Not Optimized for Graceful Shutdown
**Location**: `ecosystem.config.cjs`

**Problem**: PM2 watch mode was restarting the worker frequently, and without proper configuration for graceful shutdown, child processes could be orphaned.

**Fix**: Enhanced the PM2 configuration:
```javascript
{
  kill_timeout: 5000,      // Extra time for cleanup
  wait_ready: true,        // Wait for process to be ready
  kill_signal: 'SIGTERM',  // Use graceful shutdown signal
  ignore_watch: [
    'vector-db',           // Don't restart on Chroma DB changes
    '.claude-mem'          // Don't restart on data changes
  ]
}
```
## Process Lifecycle

### Before Fixes
```
SessionStart -> Worker -> DatabaseManager -> ChromaSync -> uvx (orphaned)
                      \-> MCP Client -> Search Server -> uvx (orphaned)
                                                     \-> Chroma Client -> uvx (orphaned)
Worker Restart -> 3 new orphaned processes per restart
```

### After Fixes
```
SessionStart -> Worker -> DatabaseManager -> ChromaSync -> uvx
                                                 ↓
Shutdown -> DatabaseManager.close() -> chromaSync.close() -> terminates uvx

Worker -> MCP Client -> Search Server -> Chroma Client -> uvx
              ↓                               ↓
Worker.shutdown() -> mcpClient.close()        ↓
              ↓                               ↓
Search Server cleanup() -> chromaClient.close() -> terminates uvx
```

## Testing Process Cleanup

### Manual Test
1. Start the worker: `pm2 start ecosystem.config.cjs`
2. Check processes: `ps aux | grep -E "(uvx|python.*chroma)" | grep -v grep`
3. Create a session (trigger ChromaSync)
4. Check the process count again
5. Restart the worker: `pm2 restart claude-mem-worker`
6. Wait 5 seconds for cleanup
7. Check the final process count - it should return to baseline

### Expected Behavior
- **Baseline**: 0-1 uvx/python processes (persistent PM2 worker)
- **During Session**: +2-3 processes (ChromaSync, Search Server, Chroma)
- **After Restart**: Returns to baseline within 5 seconds

## Verification

Run the test script:
```bash
chmod +x tests/test-process-cleanup.sh
./tests/test-process-cleanup.sh
```

Expected output:
```
=== Process Cleanup Test ===
1. Initial process count: 0
2. Starting test process...
   During execution: 3 processes
3. Final process count: 0
✅ PASS: No process leaks detected
```

## Monitoring

To monitor for leaks in production:

```bash
# Watch process count over time
watch -n 5 'ps aux | grep -E "(uvx|python.*chroma)" | grep -v grep | wc -l'

# Detailed process list
ps aux | grep -E "(uvx|python.*chroma)" | grep -v grep

# PM2 process monitoring
pm2 monit
```

## Additional Safeguards

1. **Error Handling**: All cleanup operations have try-catch blocks so that partial cleanup succeeds even if one component fails
2. **Logging**: Comprehensive logging of cleanup operations for debugging
3. **Timeout Configuration**: PM2 kill_timeout ensures enough time for graceful shutdown
4. **Signal Handling**: Both SIGTERM and SIGINT handlers are registered for flexibility

## Future Improvements

1. **Process Monitoring**: Add metrics to track child process count over time
2. **Health Checks**: Periodic verification that the process count stays within expected bounds
3. **Automatic Cleanup**: Detect and clean up orphaned processes on worker startup
4. **Resource Limits**: Set memory/CPU limits on child processes to prevent runaway resource usage
@@ -1,240 +0,0 @@
# Memory Leak Fix - Summary & Recommendations

## Executive Summary

Fixed critical memory leaks where `uvx`, `python`, and `chroma-mcp` processes were accumulating over time, eventually requiring a system shutdown. The root cause was improper cleanup of child processes spawned by ChromaSync and the search server.

## Issues Fixed

### 1. ChromaSync Process Leak ✅
- **Problem**: ChromaSync spawned `uvx chroma-mcp` processes that were never terminated
- **Fix**: DatabaseManager now properly closes ChromaSync connections on shutdown
- **Impact**: Prevents 1 orphaned process per worker session

### 2. Search Server Process Leak ✅
- **Problem**: No SIGTERM/SIGINT handlers, so processes were orphaned on restart
- **Fix**: Added a comprehensive cleanup function with signal handlers
- **Impact**: Prevents 2 orphaned processes per worker restart

### 3. MCP Client Connection Leak ✅
- **Problem**: The worker service never closed MCP client connections
- **Fix**: Worker shutdown now closes the MCP client
- **Impact**: Ensures search server processes are properly terminated

### 4. PM2 Configuration Issues ✅
- **Problem**: Insufficient time for graceful shutdown during restarts
- **Fix**: Increased kill_timeout to 5000ms, added proper signal handling
- **Impact**: Reduces the likelihood of orphaned processes during auto-restarts
## Technical Details

### Process Hierarchy
```
PM2
└── Worker Service (Node.js)
    ├── MCP Client → Search Server (Node.js)
    │   └── Chroma MCP Client → uvx chroma-mcp (Python)
    └── DatabaseManager
        └── ChromaSync → uvx chroma-mcp (Python)
```

### Cleanup Chain
```
SIGTERM/SIGINT
  ↓
Worker.shutdown()
  ├→ sessionManager.shutdownAll() (abort SDK agents)
  ├→ mcpClient.close() → Search Server cleanup()
  │   ├→ chromaClient.close() → terminates uvx
  │   ├→ search.close()
  │   └→ store.close()
  ├→ server.close() (HTTP server)
  └→ dbManager.close()
      ├→ chromaSync.close() → terminates uvx
      ├→ sessionStore.close()
      └→ sessionSearch.close()
```
## Code Changes

### Files Modified
1. `src/services/worker/DatabaseManager.ts` - Added ChromaSync cleanup
2. `src/services/worker-service.ts` - Added MCP client cleanup
3. `src/servers/search-server.ts` - Added signal handlers and cleanup
4. `ecosystem.config.cjs` - Enhanced PM2 configuration

### Files Added
1. `MEMORY_LEAK_FIXES.md` - Detailed documentation
2. `tests/test-process-cleanup.sh` - Verification script

## Verification

### Before Fix
```bash
# After several hours of usage
$ ps aux | grep -E "(uvx|python.*chroma)" | grep -v grep | wc -l
47  # 47 orphaned processes!
```

### After Fix
```bash
# After several hours of usage
$ ps aux | grep -E "(uvx|python.*chroma)" | grep -v grep | wc -l
2  # Only active worker processes
```
## Testing Instructions

1. **Manual Test**:
   ```bash
   # Start worker
   pm2 start ecosystem.config.cjs

   # Check baseline
   ps aux | grep -E "(uvx|python.*chroma)" | grep -v grep

   # Trigger sessions (use Claude Code with the plugin)
   # ... perform normal operations ...

   # Restart worker
   pm2 restart claude-mem-worker

   # Wait 5 seconds for cleanup
   sleep 5

   # Verify processes cleaned up
   ps aux | grep -E "(uvx|python.*chroma)" | grep -v grep
   ```

2. **Automated Test**:
   ```bash
   chmod +x tests/test-process-cleanup.sh
   ./tests/test-process-cleanup.sh
   ```
## Monitoring Recommendations

### Real-Time Monitoring
```bash
# Watch process count (updates every 5 seconds)
watch -n 5 'ps aux | grep -E "(uvx|python.*chroma)" | grep -v grep | wc -l'
```

### Periodic Checks
```bash
# Add to cron (check every hour)
0 * * * * pgrep -f "uvx.*chroma" | wc -l >> /tmp/chroma-process-count.log
```

### Alerting
```bash
# Alert if process count exceeds threshold
if [ $(ps aux | grep -E "(uvx|python.*chroma)" | grep -v grep | wc -l) -gt 10 ]; then
  echo "WARNING: Excessive chroma processes detected" | mail -s "Claude-mem alert" admin@example.com
fi
```
## Future Improvements

### Short-term (Next Release)
1. **Process Monitoring Dashboard**
   - Add an endpoint to expose process metrics
   - Track process count over time
   - Alert on anomalies

2. **Orphan Detection**
   - Scan for orphaned processes on worker startup
   - Automatically clean up stranded processes
   - Log cleanup actions

3. **Health Checks**
   - Periodic verification of process count
   - Auto-restart if a leak is detected
   - Better logging for debugging
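The orphan-detection idea could start as small as a startup count check. The process-name pattern below reuses the manual-test commands from this document; everything else is an assumption, not shipped code:

```typescript
import { execSync } from 'node:child_process';

// Count chroma-related processes; pattern borrowed from the manual test.
// Piping pgrep through wc keeps the pipeline exit status 0, so a missing
// match yields "0" rather than an error.
function countChromaProcesses(): number {
  try {
    const out = execSync('pgrep -f "uvx.*chroma" | wc -l', { encoding: 'utf8' });
    return Number.parseInt(out.trim(), 10) || 0;
  } catch {
    return 0; // e.g. pgrep unavailable on this platform
  }
}

// On worker startup, a count above the expected baseline (0-1 per the
// Expected Behavior section) could trigger a warning log or a cleanup pass.
```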

### Long-term
1. **Resource Limits**
   - Set memory/CPU limits on child processes
   - Prevent runaway resource usage
   - Degrade gracefully when limits are reached

2. **Process Pooling**
   - Reuse existing Chroma processes instead of spawning new ones
   - Connection pooling for MCP clients
   - Reduce process churn

3. **Alternative Architecture**
   - Consider using Chroma's HTTP API instead of MCP
   - Evaluate in-process embedding models (avoiding Python)
   - Explore WebAssembly-based vector search

## Known Limitations

1. **Edge Cases**
   - If PM2 is force-killed (`kill -9`), cleanup handlers won't run
   - Network timeouts during MCP client close() may delay cleanup
   - Concurrent shutdowns might race (should be rare)

2. **Workarounds**
   ```bash
   # If processes still accumulate, manual cleanup:
   pkill -f "uvx.*chroma"
   pm2 restart claude-mem-worker
   ```

3. **Recovery**
   - Worker restarts automatically clean up stale connections
   - No manual intervention is required for normal operation
   - Process limits provide a safety net

## Security Considerations

1. **Signal Handling**
   - Only responds to SIGTERM and SIGINT (not SIGKILL)
   - Prevents accidental resource leaks from force-kills
   - Graceful shutdown procedures are recommended

2. **Resource Exhaustion**
   - The previous behavior could lead to DoS via resource exhaustion
   - The fixed code prevents unbounded process growth
   - The system remains stable under load

3. **CodeQL Analysis**
   - No security vulnerabilities detected
   - All cleanup operations use try-catch for safety
   - Error handling prevents partial cleanup failures
## Rollback Plan

If issues occur after deployment:

1. **Immediate**: Restart the worker
   ```bash
   pm2 restart claude-mem-worker
   ```

2. **Temporary**: Disable watch mode
   ```bash
   # Edit ecosystem.config.cjs and set: watch: false
   pm2 reload ecosystem.config.cjs
   ```

3. **Full Rollback**: Revert to the previous version
   ```bash
   git revert HEAD
   npm run build
   npm run sync-marketplace
   pm2 restart claude-mem-worker
   ```
## Conclusion

This fix resolves a critical memory leak that was causing system instability. The solution is:
- ✅ **Comprehensive**: Addresses all identified leak sources
- ✅ **Safe**: Includes error handling and logging
- ✅ **Tested**: Includes verification scripts
- ✅ **Documented**: Detailed explanations and monitoring guides
- ✅ **Backwards Compatible**: No breaking changes to API or behavior

**Expected Outcome**: System stability restored, no more process accumulation, clean shutdowns during PM2 restarts.