# Issue #514: Orphaned Observer Session Files Analysis

**Date:** January 4, 2026
**Status:** PARTIALLY RESOLVED - Root cause understood; a fix was made but reverted
**Original Issue:** 13,000+ orphaned .jsonl session files created over 2 days

---

## Executive Summary

Issue #514 reported that the plugin created 13,000+ orphaned session .jsonl files in `~/.claude/projects/<project>/`. Each file contained only an initialization message with no actual observations. The hypothesis was that `startSessionProcessor()` in startup-recovery created new observer sessions in a loop.
**Current State:** The issue was **fixed in commit 9a7f662** with a deterministic `mem-${contentSessionId}` prefix approach, but this fix was **reverted in commit f9197b5** because the SDK does not accept custom session IDs. The current code uses a NULL-based initialization pattern that can still create orphaned sessions under certain conditions.

---

## Evidence: Current File Analysis

Filesystem analysis of `~/.claude/projects/-Users-alexnewman-Scripts-claude-mem/`:

| Line Count | Number of Files |
|------------|-----------------|
| 0 lines (empty) | 407 |
| 1 line | **12,562** |
| 2 lines | 3,199 |
| 3+ lines | 3,546 |
| **Total** | **19,714** |

The 12,562 single-line files are consistent with the issue description: sessions that initialized but never received observations.
Sample single-line file content:

```json
{"type":"queue-operation","operation":"dequeue","timestamp":"2025-12-28T20:41:25.484Z","sessionId":"00081a3b-9485-48a4-89f0-fd4dfccd3ac9"}
```

---
## Root Cause Analysis

### The Problem Chain

1. **Worker startup calls `processPendingQueues()`** (line 281 in worker-service.ts)
2. For each session with pending messages, it calls `initializeSession()` and then `startSessionProcessor()`
3. `startSessionProcessor()` invokes `sdkAgent.startSession()`, which calls the Claude Agent SDK `query()` function
4. **If `memorySessionId` is NULL**, no `resume` parameter is passed to `query()`
5. **The SDK creates a NEW .jsonl file** for each `query()` call that lacks a `resume` parameter
6. **If the query aborts before receiving a response** (timeout, crash, abort signal), the `memorySessionId` is never captured
7. On the next startup the cycle repeats, creating yet another orphaned file

### Why Sessions Abort Before Capturing memorySessionId

The `startSessionProcessor()` flow:

```typescript
// worker-service.ts lines 301-321
private startSessionProcessor(session, source) {
  session.generatorPromise = this.sdkAgent.startSession(session, this)
    .catch(error => { /* error handling */ })
    .finally(() => {
      session.generatorPromise = null;
      this.broadcastProcessingStatus();
    });
}
```
And `processPendingQueues()`:

```typescript
// worker-service.ts lines 347-371
for (const sessionDbId of orphanedSessionIds) {
  const session = this.sessionManager.initializeSession(sessionDbId);
  this.startSessionProcessor(session, 'startup-recovery');
  await new Promise(resolve => setTimeout(resolve, 100)); // 100ms delay between sessions
}
```

The problem: starting 50 sessions in rapid succession (only a 100ms delay between them) with pending messages means:

- All 50 SDK queries start nearly simultaneously
- The SDK creates 50 new .jsonl files (since none have a memorySessionId yet)
- If any query fails or aborts before the first response, its memorySessionId is never captured
- On the next startup, those sessions get new files again
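The cascade can be modeled with a toy simulation (illustrative only - `simulateStartups` and its types are invented for this sketch, not plugin code). It assumes the SDK creates exactly one new .jsonl file per `query()` call made without a `resume` parameter:

```typescript
interface SimSession { memorySessionId: string | null; }

// Each startup, every session without a captured ID triggers a new file.
// If the ID is never captured (aborted query), the next startup repeats this.
function simulateStartups(sessions: SimSession[], startups: number, captureSucceeds: boolean): number {
  let filesCreated = 0;
  for (let s = 0; s < startups; s++) {
    for (const sess of sessions) {
      if (sess.memorySessionId === null) {
        filesCreated++; // SDK generates a fresh UUID file
        if (captureSucceeds) sess.memorySessionId = `captured-${filesCreated}`;
      }
    }
  }
  return filesCreated;
}
```

With 50 sessions and failed captures, file count grows linearly with restarts, matching the thousands of single-line files observed; with successful captures it stops at one file per session.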
---

## Code Flow: Where .jsonl Files Are Created

The .jsonl files are created by the **Claude Agent SDK** (`@anthropic-ai/claude-agent-sdk`), not by claude-mem directly.

When `query()` is called in SDKAgent.ts:

```typescript
// SDKAgent.ts lines 89-99
const queryResult = query({
  prompt: messageGenerator,
  options: {
    model: modelId,
    // Resume with captured memorySessionId (null on first prompt, real ID on subsequent)
    ...(hasRealMemorySessionId && { resume: session.memorySessionId }),
    disallowedTools,
    abortController: session.abortController,
    pathToClaudeCodeExecutable: claudePath
  }
});
```

**Key insight:** If `hasRealMemorySessionId` is false (memorySessionId is null), no `resume` parameter is passed. The SDK then generates a new UUID and creates a new file at:

`~/.claude/projects/<dashed-cwd>/<new-uuid>.jsonl`
---

## Fix History

### Commit 9a7f662: The Original Fix (Reverted)

```
fix(sdk): always pass deterministic session ID to prevent orphaned files

Fixes #514 - Excessive observer sessions created during startup-recovery

Root cause: When memorySessionId was null, no `resume` parameter was passed
to the SDK's query(). This caused the SDK to create a NEW session file on
every call. If queries aborted before capturing the SDK's session_id, the
placeholder remained, leading to cascading creation of 13,000+ orphaned files.

Fix:
- Generate deterministic ID `mem-${contentSessionId}` upfront
- Always pass it to `resume` parameter
- Persist immediately to database before query starts
- If SDK returns different ID, capture and use that going forward
```

**This fix was correct in approach**: always passing a `resume` parameter prevents new file creation.
### Commit f9197b5: The Revert

```
fix(sdk): restore session continuity via robust capture-and-resume strategy

Replaces the deterministic 'mem-' ID approach with a capture-based strategy:
1. Passes 'resume' parameter ONLY when a verified memory session ID exists
2. Captures SDK-generated session ID when it differs from current ID
3. Ensures subsequent prompts resume the correctly captured session ID

This resolves the issue where new sessions were created for every message
due to failure to capture/resume the initial session ID, without introducing
potentially invalid deterministic IDs.
```

**The revert explanation suggests the SDK rejected the `mem-` prefix IDs.**

### Commit 005b0f8: Current NULL-based Pattern

Changed `memory_session_id` initialization from `contentSessionId` (placeholder) to `NULL`:

- Simpler logic: `!!session.memorySessionId` instead of `memorySessionId !== contentSessionId`
- But it still creates new files on the first prompt of each session

---
## Relationship with Issue #520 (Stuck Messages)

**Issue #520 is related but distinct:**

| Aspect | Issue #514 (Orphaned Files) | Issue #520 (Stuck Messages) |
|--------|-----------------------------|-----------------------------|
| Problem | Too many .jsonl files | Messages never processed |
| Root Cause | SDK creates a new file per query without resume | Old claim-process-mark pattern left messages in 'processing' state |
| Status | Partially resolved | **Fully resolved** |
| Fix | Needs deterministic resume IDs | Changed to claim-and-delete pattern |

**Connection:** Both issues relate to startup-recovery. Issue #520's fix (the claim-and-delete pattern) doesn't create the loop that #514 describes, but #514 can still occur when:

1. Sessions have pending messages
2. Recovery starts the generator
3. The generator aborts before capturing memorySessionId
4. The next startup repeats the cycle

---
## v8.5.7 Status

**v8.5.7 did NOT fully address Issue #514.** The major changes were:

- Modular architecture refactor
- NULL-based initialization pattern
- Comprehensive test coverage

The deterministic `mem-` prefix fix (9a7f662) was reverted before v8.5.7.

---
## Recommended Fix

### Option 1: Reintroduce Deterministic IDs with SDK Validation

```typescript
// SDKAgent.ts - In startSession()
async startSession(session: ActiveSession, worker?: WorkerRef): Promise<void> {
  // Generate a deterministic ID based on the database session ID (not the UUID-based contentSessionId)
  // Format: "mem-<sessionDbId>" is short and unlikely to conflict
  const deterministicMemoryId = session.memorySessionId || `mem-${session.sessionDbId}`;

  // Always pass resume to prevent orphaned sessions
  const queryResult = query({
    prompt: messageGenerator,
    options: {
      model: modelId,
      resume: deterministicMemoryId, // ALWAYS pass, even if the SDK might reject it
      disallowedTools,
      abortController: session.abortController,
      pathToClaudeCodeExecutable: claudePath
    }
  });

  // Capture whatever ID the SDK actually uses
  for await (const message of queryResult) {
    if (message.session_id && message.session_id !== session.memorySessionId) {
      session.memorySessionId = message.session_id;
      this.dbManager.getSessionStore().updateMemorySessionId(
        session.sessionDbId,
        message.session_id
      );
    }
    // ... rest of processing
  }
}
```
### Option 2: Limit Recovery Scope

Prevent the recovery loop by limiting how many times a session can be recovered:

```typescript
// In processPendingQueues()
for (const sessionDbId of orphanedSessionIds) {
  // Check whether this session was already recovered recently
  const dbSession = this.dbManager.getSessionById(sessionDbId);
  const recoveryAttempts = dbSession.recovery_attempts || 0;

  if (recoveryAttempts >= 3) {
    logger.warn('SYSTEM', 'Session exceeded max recovery attempts, skipping', {
      sessionDbId,
      recoveryAttempts
    });
    continue;
  }

  // Increment the recovery counter
  this.dbManager.getSessionStore().incrementRecoveryAttempts(sessionDbId);

  // ... rest of recovery
}
```
### Option 3: Clean Up Old Files (Mitigation, Not a Fix)

Add a cleanup script that removes orphaned .jsonl files:

```bash
# Remove files with at most 1 line that are older than 7 days
find ~/.claude/projects/ -name "*.jsonl" -mtime +7 \
  -exec sh -c '[ $(wc -l < "$1") -le 1 ] && rm "$1"' _ {} \;
```

---
## Files Involved

| File | Role |
|------|------|
| `src/services/worker-service.ts` | `startSessionProcessor()`, `processPendingQueues()` |
| `src/services/worker/SDKAgent.ts` | `startSession()`, `query()` call with `resume` parameter |
| `src/services/worker/SessionManager.ts` | `initializeSession()`, session lifecycle |
| `src/services/sqlite/sessions/create.ts` | `createSDKSession()`, NULL-based initialization |
| `src/services/sqlite/PendingMessageStore.ts` | `getSessionsWithPendingMessages()` |

---
## Conclusion

Issue #514 was correctly diagnosed. The fix in commit 9a7f662 took the right approach but was reverted because the SDK may not accept arbitrary custom IDs. The current NULL-based pattern (005b0f8) is cleaner but doesn't prevent orphaned files when queries abort before the SDK's session ID is captured.

**Recommendation:** Reintroduce the deterministic ID approach with proper handling of SDK rejections (Option 1). If the SDK rejects the ID and returns a different one, capture and persist that ID immediately. This ensures at most one .jsonl file per database session, even across crashes and restarts.

---
## Appendix: Git Commit References

| Commit | Description |
|--------|-------------|
| 9a7f662 | Original fix: deterministic `mem-` prefix IDs (REVERTED) |
| f9197b5 | Revert: capture-based strategy without deterministic IDs |
| 005b0f8 | NULL-based initialization pattern (current) |
| d72a81e | Queue refactoring (related to #520) |
| eb1a78b | Claim-and-delete pattern (fixes #520) |
# Issue #517 Analysis: Windows PowerShell Escaping in cleanupOrphanedProcesses()

**Date:** 2026-01-04
**Version Analyzed:** 8.5.7
**Status:** NOT FIXED - Issue still present

## Summary

The reported issue involves PowerShell's `$_` variable being interpreted by Bash before PowerShell receives it when the command runs in a Git Bash or WSL environment on Windows. This causes `cleanupOrphanedProcesses()` to fail during worker initialization.
## Current State

The `cleanupOrphanedProcesses()` function is located in:

- **File:** `/Users/alexnewman/Scripts/claude-mem/src/services/infrastructure/ProcessManager.ts`
- **Lines:** 164-251

### Problematic Code (Lines 170-172)

```typescript
if (isWindows) {
  // Windows: Use PowerShell Get-CimInstance to find chroma-mcp processes
  const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.Name -like '*python*' -and $_.CommandLine -like '*chroma-mcp*' } | Select-Object -ExpandProperty ProcessId"`;
  const { stdout } = await execAsync(cmd, { timeout: 60000 });
```

The `$_.Name` and `$_.CommandLine` references contain `$_`, which is a special variable in both PowerShell and Bash. When this command string is executed via Node.js `child_process.exec()` in a Git Bash or WSL environment, Bash may interpret `$_` as its own special variable (the last argument of the previous command) before passing it to PowerShell.
### Additional Occurrence (Lines 91-92)

A similar issue exists in `getChildProcesses()`:

```typescript
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.ParentProcessId -eq ${parentPid} } | Select-Object -ExpandProperty ProcessId"`;
```

## Error Handling Analysis

Both functions have try-catch blocks with non-blocking error handling:

- Lines 208-212: `cleanupOrphanedProcesses()` catches errors, logs a warning, and returns
- Lines 98-102: `getChildProcesses()` catches errors, logs a warning, and returns an empty array

While this prevents worker initialization from crashing, it means orphaned-process cleanup fails silently in affected Windows environments.
## Recommended Fix

Replace the PowerShell commands with WMIC (Windows Management Instrumentation Command-line), which does not use `$_` syntax:

### For cleanupOrphanedProcesses() (Line 171)

**Current:**
```typescript
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.Name -like '*python*' -and $_.CommandLine -like '*chroma-mcp*' } | Select-Object -ExpandProperty ProcessId"`;
```

**Recommended:**
```typescript
const cmd = `wmic process where "name like '%python%' and commandline like '%chroma-mcp%'" get processid /format:list`;
```
### For getChildProcesses() (Line 91)

**Current:**
```typescript
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.ParentProcessId -eq ${parentPid} } | Select-Object -ExpandProperty ProcessId"`;
```

**Recommended:**
```typescript
const cmd = `wmic process where "parentprocessid=${parentPid}" get processid /format:list`;
```
### Implementation Notes

1. WMIC output format differs from PowerShell's - parse the `ProcessId=12345` key-value format
2. WMIC is deprecated in newer Windows versions but still widely available
3. Alternative: use PowerShell with proper escaping (`$$_` or `\$_`, depending on context)
4. Consider adding the `powershell -NoProfile -NonInteractive` flags for faster execution
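The `/format:list` parsing from note 1 can be done with a small helper. This is a sketch: `parseWmicProcessIds` is a hypothetical name, not an existing function in ProcessManager.ts.

```typescript
// WMIC /format:list emits blank-line-separated "Key=Value" records,
// typically with CRLF line endings, e.g. "ProcessId=12345".
function parseWmicProcessIds(stdout: string): number[] {
  return stdout
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.startsWith('ProcessId='))
    .map(line => Number(line.slice('ProcessId='.length)))
    .filter(pid => Number.isInteger(pid) && pid > 0);
}
```

The numeric filter also handles WMIC's "No Instance(s) Available." message, which produces an empty result rather than a parse error.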
## Impact Assessment

- **Severity:** Medium - orphaned-process cleanup fails silently
- **Scope:** Windows users running in Git Bash, WSL, or mixed shell environments
- **Workaround:** None currently - users must manually kill orphaned chroma-mcp processes

## Files to Modify

1. `/src/services/infrastructure/ProcessManager.ts` (lines 91-92, 171-172)
# Issue #520: Stuck Messages Analysis

**Date:** January 4, 2026
**Status:** RESOLVED - Issue no longer exists in current codebase
**Original Issue:** Messages in 'processing' status never recovered after worker crash

---

## Executive Summary

The issue described in GitHub #520 has been **fully resolved** in the current codebase through a fundamental architectural change. The system now uses a **claim-and-delete** pattern instead of the old **claim-process-mark** pattern, which eliminates the stuck 'processing' state problem entirely.
---

## Original Issue Description

The issue claimed that after a worker crash:

1. `getSessionsWithPendingMessages()` returns sessions with `status IN ('pending', 'processing')`
2. But `claimNextMessage()` only looks for `status = 'pending'`
3. So 'processing' messages are orphaned

**Proposed Fix:** Add `resetStuckMessages(0)` at the start of `processPendingQueues()`
---

## Current Code Analysis

### 1. Queue Processing Pattern: Claim-and-Delete

The current architecture uses `claimAndDelete()` instead of `claimNextMessage()`:

**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/sqlite/PendingMessageStore.ts`

```typescript
// Lines 85-104
claimAndDelete(sessionDbId: number): PersistentPendingMessage | null {
  const claimTx = this.db.transaction((sessionId: number) => {
    const peekStmt = this.db.prepare(`
      SELECT * FROM pending_messages
      WHERE session_db_id = ? AND status = 'pending'
      ORDER BY id ASC
      LIMIT 1
    `);
    const msg = peekStmt.get(sessionId) as PersistentPendingMessage | null;

    if (msg) {
      // Delete immediately - no "processing" state needed
      const deleteStmt = this.db.prepare('DELETE FROM pending_messages WHERE id = ?');
      deleteStmt.run(msg.id);
    }
    return msg;
  });

  return claimTx(sessionDbId) as PersistentPendingMessage | null;
}
```

**Key insight:** Messages are atomically selected and deleted in a single transaction. There is no 'processing' state for messages being actively worked on - they simply no longer exist in the database.
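The difference between the two patterns can be illustrated with an in-memory model (a sketch with invented types - the real implementation is the SQLite transaction above). A crash after claiming strands a row under the old pattern but leaves nothing behind under the new one:

```typescript
type QueueRow = { id: number; status: 'pending' | 'processing' };

// New pattern: select + delete in one step; the claimed row is gone.
function claimAndDeleteNext(queue: QueueRow[]): QueueRow | null {
  const idx = queue.findIndex(r => r.status === 'pending');
  return idx === -1 ? null : queue.splice(idx, 1)[0];
}

// Old pattern: mark as 'processing'; a crash before completion strands the row,
// and a pending-only scan will never pick it up again.
function claimNextMessageOld(queue: QueueRow[]): QueueRow | null {
  const row = queue.find(r => r.status === 'pending') ?? null;
  if (row) row.status = 'processing';
  return row;
}
```

In the real system the claimed message lives only in worker memory after the delete, so a crash simply loses the in-flight message back to the caller's retry path rather than leaving a zombie row.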
### 2. Iterator Uses claimAndDelete

**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/queue/SessionQueueProcessor.ts`

```typescript
// Lines 18-38
async *createIterator(sessionDbId: number, signal: AbortSignal): AsyncIterableIterator<PendingMessageWithId> {
  while (!signal.aborted) {
    try {
      // Atomically claim AND DELETE the next message from the DB
      // Message is now in memory only - no "processing" state tracking needed
      const persistentMessage = this.store.claimAndDelete(sessionDbId);

      if (persistentMessage) {
        // Yield the message for processing (it's already deleted from the queue)
        yield this.toPendingMessageWithId(persistentMessage);
      } else {
        // Queue empty - wait for a wake-up event
        await this.waitForMessage(signal);
      }
    } catch (error) {
      // ... error handling
    }
  }
}
```
### 3. getSessionsWithPendingMessages Still Checks Both States

**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/sqlite/PendingMessageStore.ts`

```typescript
// Lines 319-326
getSessionsWithPendingMessages(): number[] {
  const stmt = this.db.prepare(`
    SELECT DISTINCT session_db_id FROM pending_messages
    WHERE status IN ('pending', 'processing')
  `);
  const results = stmt.all() as { session_db_id: number }[];
  return results.map(r => r.session_db_id);
}
```

**This is technically vestigial code** - with the claim-and-delete pattern, messages should never be in the 'processing' state. However, it provides backward compatibility and defense in depth.
### 4. Startup Recovery Still Exists

**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/worker-service.ts`

```typescript
// Lines 236-242
// Recover stuck messages from previous crashes
const { PendingMessageStore } = await import('./sqlite/PendingMessageStore.js');
const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);
const STUCK_THRESHOLD_MS = 5 * 60 * 1000;
const resetCount = pendingStore.resetStuckMessages(STUCK_THRESHOLD_MS);
if (resetCount > 0) {
  logger.info('SYSTEM', `Recovered ${resetCount} stuck messages from previous session`, { thresholdMinutes: 5 });
}
```

This runs BEFORE `processPendingQueues()` is called (line 281), which addresses the original fix request.
---

## Verification of Issue Status

### Does the Issue Exist?

**NO** - the issue as described no longer exists because:

1. **No 'processing' state during normal operation**: With claim-and-delete, messages go directly from 'pending' to deleted. They never enter a 'processing' state.

2. **Startup recovery handles legacy stuck messages**: Even if 'processing' messages exist (from old code or edge cases), `resetStuckMessages()` is called BEFORE `processPendingQueues()` in `initializeBackground()` (lines 236-241 run before line 281).

3. **The architecture fundamentally changed**: The old `claimNextMessage()` function that only looked for `status = 'pending'` no longer exists. It was replaced with `claimAndDelete()`.

### GeminiAgent and OpenRouterAgent Behavior

All three agents (SDKAgent, GeminiAgent, OpenRouterAgent) use the same `SessionManager.getMessageIterator()`, which calls `SessionQueueProcessor.createIterator()`, which uses `claimAndDelete()`:

```typescript
// GeminiAgent.ts:174, OpenRouterAgent.ts:134
for await (const message of this.sessionManager.getMessageIterator(session.sessionDbId)) {
  // ...
}
```

They do NOT handle recovery differently - they all rely on the shared infrastructure.
### What v8.5.7 Changed

From the git history:

```
v8.5.7 (ac03901):
- Minor ESM/CommonJS compatibility fix for isMainModule detection
- No queue-related changes

v8.5.6 -> v8.5.7:
- f21ea97 refactor: decompose monolith into modular architecture with comprehensive test suite (#538)
```

The major refactor happened before v8.5.7; the claim-and-delete pattern was already in place.
---

## Timeline of Resolution

Based on the git history, the issue was likely resolved through these commits:

1. **b8ce27b** - `feat(queue): Simplify queue processing and enhance reliability`
2. **eb1a78b** - `fix: eliminate duplicate observations by simplifying message queue`
3. **d72a81e** - `Refactor session queue processing and database interactions`

These commits appear to have introduced the claim-and-delete pattern that eliminates the original bug.
---

## Conclusion

**Issue #520 should be closed as resolved.**

The described bug (`claimNextMessage()` only checking `status = 'pending'`) no longer exists because:

1. `claimNextMessage()` was replaced with `claimAndDelete()`, which atomically removes messages
2. `resetStuckMessages()` is already called at startup BEFORE `processPendingQueues()`
3. The 'processing' status is now used only for legacy compatibility and edge cases

### No Fix Needed

The proposed fix ("Add `resetStuckMessages(0)` at the start of `processPendingQueues()`") is:

1. **Unnecessary** - the recovery already happens in `initializeBackground()` before `processPendingQueues()` is called
2. **Using the wrong threshold** - `resetStuckMessages(0)` would reset ALL processing messages immediately, which could cause problems if called during normal operation rather than only at startup

The current implementation with a 5-minute threshold is more robust: it recovers only truly stuck messages, not messages that are actively being processed.
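The threshold's effect can be shown with a reduced model (a sketch - the real `resetStuckMessages()` operates on SQLite rows, and `findStuckIds` with its message shape is invented here):

```typescript
interface InFlightMessage { id: number; claimedAt: number; } // epoch ms when claimed

// Only messages older than thresholdMs count as stuck. A threshold of 0
// sweeps up every in-flight message, including actively processed ones.
function findStuckIds(messages: InFlightMessage[], nowMs: number, thresholdMs: number): number[] {
  return messages.filter(m => nowMs - m.claimedAt > thresholdMs).map(m => m.id);
}
```

With a 5-minute threshold, a message claimed one minute ago is left alone; with a 0 threshold it would be reset mid-processing, which is exactly the hazard noted above.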
---

## Appendix: File References

| Component | File | Key Lines |
|-----------|------|-----------|
| claimAndDelete | `src/services/sqlite/PendingMessageStore.ts` | 85-104 |
| Queue Iterator | `src/services/queue/SessionQueueProcessor.ts` | 18-38 |
| Startup Recovery | `src/services/worker-service.ts` | 236-242 |
| processPendingQueues | `src/services/worker-service.ts` | 326-375 |
| getSessionsWithPendingMessages | `src/services/sqlite/PendingMessageStore.ts` | 319-326 |
| resetStuckMessages | `src/services/sqlite/PendingMessageStore.ts` | 279-290 |
# Issue #527: uv Detection Fails on Apple Silicon Macs with Homebrew Installation

**Date**: 2026-01-04
**Issue**: GitHub Issue #527
**Status**: Confirmed - Fix Required

## Summary

The `isUvInstalled()` function fails to detect uv when it is installed via Homebrew on an Apple Silicon Mac because it does not check the `/opt/homebrew/bin/uv` path.
## Analysis

### Files Affected

Two copies of `smart-install.js` exist in the codebase:

1. **Source file**: `/Users/alexnewman/Scripts/claude-mem/scripts/smart-install.js`
2. **Built/deployed file**: `/Users/alexnewman/Scripts/claude-mem/plugin/scripts/smart-install.js`

### Current uv Path Detection

**Source file (`scripts/smart-install.js`)** - Lines 22-24:

```javascript
const UV_COMMON_PATHS = IS_WINDOWS
  ? [join(homedir(), '.local', 'bin', 'uv.exe'), join(homedir(), '.cargo', 'bin', 'uv.exe')]
  : [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv'];
```

**Plugin file (`plugin/scripts/smart-install.js`)** - Lines 103-105:

```javascript
const uvPaths = IS_WINDOWS
  ? [join(homedir(), '.local', 'bin', 'uv.exe'), join(homedir(), '.cargo', 'bin', 'uv.exe')]
  : [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv'];
```
### Paths Currently Checked (Unix/macOS)

| Path | Installer | Architecture |
|------|-----------|--------------|
| `~/.local/bin/uv` | Official installer | Any |
| `~/.cargo/bin/uv` | Cargo/Rust install | Any |
| `/usr/local/bin/uv` | Homebrew (Intel) | x86_64 |

### Missing Path

| Path | Installer | Architecture |
|------|-----------|--------------|
| `/opt/homebrew/bin/uv` | Homebrew (Apple Silicon) | arm64 |
## Root Cause

Homebrew installs to different prefixes depending on architecture:

- **Intel Macs (x86_64)**: `/usr/local/bin/`
- **Apple Silicon Macs (arm64)**: `/opt/homebrew/bin/`

The current implementation only includes the Intel Homebrew path, causing detection to fail on Apple Silicon when:

1. uv is installed via `brew install uv`
2. The user's shell PATH is not available during script execution (common in non-interactive contexts)
## Impact

Users on Apple Silicon Macs who installed uv via Homebrew will:

1. See "uv not found" errors
2. Have uv unnecessarily reinstalled via the official installer
3. End up with duplicate installations
## Recommended Fix

Add `/opt/homebrew/bin/uv` to the Unix paths array.

### Source file (`scripts/smart-install.js`) - Line 24

**Before:**

```javascript
  : [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv'];
```

**After:**

```javascript
  : [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv', '/opt/homebrew/bin/uv'];
```

### Plugin file (`plugin/scripts/smart-install.js`) - Lines 103-105 and 222-224

The same fix should be applied in both locations where `uvPaths` is defined:

- Line 105 in `isUvInstalled()`
- Line 224 in `installUv()`
### Note: Bun Has the Same Issue

The Bun detection has the same gap:

**Current (`scripts/smart-install.js` line 20):**

```javascript
  : [join(homedir(), '.bun', 'bin', 'bun'), '/usr/local/bin/bun'];
```

**Should also add:**

```javascript
  : [join(homedir(), '.bun', 'bin', 'bun'), '/usr/local/bin/bun', '/opt/homebrew/bin/bun'];
```
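Since both uv and Bun need the same set of prefixes, a single helper could build the candidate list for any tool. The sketch below is illustrative only; `candidatePaths` is not a function that exists in `smart-install.js`:

```typescript
import { join } from 'path';

// Illustrative helper (hypothetical, not the plugin's actual code): builds the
// candidate binary paths for a tool, covering both Homebrew prefixes on Unix.
function candidatePaths(tool: string, home: string, isWindows: boolean): string[] {
  if (isWindows) {
    return [
      join(home, '.local', 'bin', `${tool}.exe`),
      join(home, '.cargo', 'bin', `${tool}.exe`),
    ];
  }
  return [
    join(home, '.local', 'bin', tool),
    join(home, '.cargo', 'bin', tool),
    '/usr/local/bin/' + tool,    // Homebrew on Intel Macs
    '/opt/homebrew/bin/' + tool, // Homebrew on Apple Silicon
  ];
}

console.log(candidatePaths('uv', '/Users/alex', false));
```

Checking each candidate with `existsSync()` then stays identical for both tools.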
## Verification

After the fix, verify by:

1. Installing uv via Homebrew on an Apple Silicon Mac
2. Running the smart-install script
3. Confirming uv is detected without attempting reinstallation
## Conclusion

**Fix is required.** The `/opt/homebrew/bin/uv` path is missing from both files. This is a simple one-line addition to the path arrays. The same fix should also be applied to the Bun detection paths for consistency.
@@ -0,0 +1,324 @@
# Issue #532: Memory Leak in SessionManager - Analysis Report

**Date**: 2026-01-04
**Issue**: Memory leak causing 54GB+ VS Code memory consumption after several days of use
**Reported Root Causes**:
1. Sessions never auto-cleanup after SDK agent completes
2. `conversationHistory` array grows unbounded (never trimmed)

---
## Executive Summary

This analysis confirms **both issues exist in the current codebase** (v8.5.7). While v8.5.7 included a major modular refactor, it did **not address either memory leak issue**. The `SessionManager` holds sessions indefinitely in memory with no TTL/cleanup mechanism, and `conversationHistory` arrays grow unbounded within each session (with only OpenRouter implementing partial mitigation).

---
## 1. SessionManager Session Storage Analysis

### Location
`/Users/alexnewman/Scripts/claude-mem/src/services/worker/SessionManager.ts`

### Current Implementation

```typescript
export class SessionManager {
  private sessions: Map<number, ActiveSession> = new Map();
  private sessionQueues: Map<number, EventEmitter> = new Map();
  // ...
}
```

Sessions are stored in an in-memory `Map<number, ActiveSession>` with the session database ID as the key.
### Session Lifecycle

| Event | Method | Behavior |
|-------|--------|----------|
| Session created | `initializeSession()` | Added to `this.sessions` Map (line 152) |
| Session deleted | `deleteSession()` | Removed from `this.sessions` Map (line 293) |
| Worker shutdown | `shutdownAll()` | Calls `deleteSession()` on all sessions |
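The lifecycle above can be modeled in a few lines. This is a toy, not the real `SessionManager`; `ActiveSession` is reduced to one field and the method names are illustrative:

```typescript
// Toy model of the leak: sessions are added on initialize, but after the
// generator completes nothing removes the Map entry.
interface ToySession {
  conversationHistory: { role: string; content: string }[];
}

class LeakySessionManager {
  private sessions = new Map<number, ToySession>();

  initializeSession(id: number): void {
    this.sessions.set(id, { conversationHistory: [] });
  }

  onGeneratorComplete(_id: number): void {
    // Intentionally empty: this mirrors the current behavior under analysis,
    // where completion does not delete the session.
  }

  get size(): number {
    return this.sessions.size;
  }
}

const mgr = new LeakySessionManager();
for (let id = 1; id <= 500; id++) {
  mgr.initializeSession(id);
  mgr.onGeneratorComplete(id); // generator finishes, but the entry stays
}
console.log(mgr.size); // 500
```

Every session ever created remains reachable from the Map, so the garbage collector can never reclaim it.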
### The Problem: No Automatic Cleanup

Looking at `/Users/alexnewman/Scripts/claude-mem/src/services/worker/http/routes/SessionRoutes.ts` (lines 213-216), the session completion handling has this comment:

```typescript
// NOTE: We do NOT delete the session here anymore.
// The generator waits for events, so if it exited, it's either aborted or crashed.
// Idle sessions stay in memory (ActiveSession is small) to listen for future events.
```

**Critical Finding**: Sessions are **intentionally never deleted** after the SDK agent completes. They persist indefinitely "to listen for future events."
### When Sessions ARE Deleted

Sessions are only deleted when:

1. Explicit `DELETE /sessions/:sessionDbId` HTTP request (manual cleanup)
2. `POST /sessions/:sessionDbId/complete` HTTP request (cleanup-hook callback)
3. Worker service shutdown (`shutdownAll()`)

There is **NO automatic cleanup mechanism** based on:

- Session age/TTL
- Session inactivity timeout
- Memory pressure
- Completed/failed status

---
## 2. conversationHistory Analysis

### Location
`/Users/alexnewman/Scripts/claude-mem/src/services/worker-types.ts` (line 34)

### Type Definition

```typescript
export interface ActiveSession {
  // ...
  conversationHistory: ConversationMessage[]; // Shared conversation history for provider switching
  // ...
}
```
### Usage Pattern

The `conversationHistory` array is populated by three agent implementations plus the shared `ResponseProcessor`:

1. **SDKAgent** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/SDKAgent.ts`)
   - Adds user messages at lines 247, 280, 302
   - Assistant responses added via `ResponseProcessor`

2. **GeminiAgent** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/GeminiAgent.ts`)
   - Adds user messages at lines 143, 196, 232
   - Adds assistant responses at lines 148, 202, 238

3. **OpenRouterAgent** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/OpenRouterAgent.ts`)
   - Adds user messages at lines 103, 155, 191
   - Adds assistant responses at lines 108, 161, 197
   - **Implements truncation**: See `truncateHistory()` at lines 262-301

4. **ResponseProcessor** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/agents/ResponseProcessor.ts`)
   - Adds assistant responses at line 57
### The Problem: Unbounded Growth

**For Claude SDK and Gemini agents**, there is **no limit or trimming** of `conversationHistory`. Every message is `push()`ed without checking array size.

**OpenRouter ONLY** has mitigation via `truncateHistory()`:

```typescript
private truncateHistory(history: ConversationMessage[]): ConversationMessage[] {
  const MAX_CONTEXT_MESSAGES = parseInt(settings.CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES) || 20;
  const MAX_ESTIMATED_TOKENS = parseInt(settings.CLAUDE_MEM_OPENROUTER_MAX_TOKENS) || 100000;

  // Sliding window: keep most recent messages within limits
  // ...
}
```
However, this only truncates the copy sent to the OpenRouter API - **it does NOT truncate the actual `session.conversationHistory` array**. The original array still grows unbounded.
### Memory Impact Calculation

Each `ConversationMessage` contains:

- `role`: 'user' | 'assistant' (small string)
- `content`: string (can be very large - full prompts/responses)

A typical session with 100 tool uses could have:

- 1 init prompt (~2KB)
- 100 observation prompts (~5KB each = 500KB)
- 100 responses (~1KB each = 100KB)
- 1 summary prompt + response (~5KB)

**Per session**: ~600KB in `conversationHistory` alone

After several days with many sessions, this adds up to gigabytes.

---
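The ~600KB per-session estimate above can be sanity-checked with quick arithmetic. The session count here is illustrative, not measured:

```typescript
// Rough per-session figures from the estimate above, in KB.
const initPrompt = 2;
const observationPrompts = 100 * 5; // 100 prompts at ~5KB each
const responses = 100 * 1;          // 100 responses at ~1KB each
const summary = 5;                  // summary prompt + response

const perSessionKB = initPrompt + observationPrompts + responses + summary;

// Hypothetical count: 1,000 retained sessions after several days of use.
const retainedSessions = 1000;
const totalGB = (perSessionKB * retainedSessions) / (1024 * 1024);

console.log(perSessionKB, totalGB.toFixed(2)); // 607 0.58
```

Per-session history alone only reaches the tens of gigabytes seen in the report when combined with the rest of each retained `ActiveSession` and process-level overhead, but the unbounded trend is the point.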
## 3. v8.5.7 Refactor Assessment

The v8.5.7 release (2026-01-04) focused on modular architecture refactoring:

### What v8.5.7 DID:
- Extracted SQLite repositories into `/src/services/sqlite/`
- Extracted worker agents into `/src/services/worker/agents/`
- Extracted search strategies into `/src/services/worker/search/`
- Extracted context generation into `/src/services/context/`
- Extracted infrastructure into `/src/services/infrastructure/`
- Added 595 tests across 36 test files

### What v8.5.7 DID NOT address:
- No session TTL or automatic cleanup mechanism
- No `conversationHistory` size limits for Claude SDK or Gemini
- No memory pressure monitoring for sessions
- The "sessions stay in memory" design comment was already present

**Relevant v8.5.2 Note**: There was a related fix for an SDK Agent child-process memory leak (orphaned Claude processes), but that addressed process cleanup, not in-memory session state.
---

## 4. Specific Code Locations Requiring Fixes

### Fix Location 1: SessionManager needs cleanup mechanism
**File**: `/Users/alexnewman/Scripts/claude-mem/src/services/worker/SessionManager.ts`

Add automatic session cleanup based on:

- Session completion (when the generator finishes with no pending work)
- Session age TTL (e.g., 1 hour after last activity)
- Memory pressure (configurable max sessions)

### Fix Location 2: conversationHistory needs bounds
**Files**:

- `/Users/alexnewman/Scripts/claude-mem/src/services/worker/SDKAgent.ts`
- `/Users/alexnewman/Scripts/claude-mem/src/services/worker/GeminiAgent.ts`
- `/Users/alexnewman/Scripts/claude-mem/src/services/worker/agents/ResponseProcessor.ts`

Apply sliding-window truncation similar to OpenRouterAgent's approach, but mutate the original array.

### Fix Location 3: Session cleanup on completion
**File**: `/Users/alexnewman/Scripts/claude-mem/src/services/worker/http/routes/SessionRoutes.ts`

Remove the design decision to keep idle sessions in memory. Add a cleanup timer after the generator completes.
---

## 5. Recommended Fixes
### Fix 1: Add Session TTL and Cleanup Timer

```typescript
// In SessionManager.ts

private readonly SESSION_TTL_MS = 60 * 60 * 1000; // 1 hour
private cleanupTimers: Map<number, NodeJS.Timeout> = new Map();

/**
 * Schedule automatic cleanup for idle sessions
 */
scheduleSessionCleanup(sessionDbId: number): void {
  // Clear existing timer if any
  const existingTimer = this.cleanupTimers.get(sessionDbId);
  if (existingTimer) {
    clearTimeout(existingTimer);
  }

  // Schedule cleanup after TTL
  const timer = setTimeout(() => {
    const session = this.sessions.get(sessionDbId);
    if (session && !session.generatorPromise) {
      // Only delete if no active generator
      this.deleteSession(sessionDbId);
      logger.info('SESSION', 'Session auto-cleaned due to TTL', { sessionDbId });
    }
  }, this.SESSION_TTL_MS);

  this.cleanupTimers.set(sessionDbId, timer);
}

/**
 * Cancel cleanup timer (call when session receives new work)
 */
cancelSessionCleanup(sessionDbId: number): void {
  const timer = this.cleanupTimers.get(sessionDbId);
  if (timer) {
    clearTimeout(timer);
    this.cleanupTimers.delete(sessionDbId);
  }
}
```
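Because the TTL is an hour, the behavior above is awkward to unit-test against real timers. One option is to inject the scheduler, a hypothetical refactor rather than existing claude-mem code, so tests can fire timers synchronously. A minimal sketch:

```typescript
type Timer = { id: number };

// Hypothetical injectable scheduler interface.
interface Scheduler {
  schedule(fn: () => void, ms: number): Timer;
  cancel(t: Timer): void;
}

// Same schedule/cancel shape as the sketch above, with timers injected.
class CleanupScheduler {
  private timers = new Map<number, Timer>();

  constructor(
    private scheduler: Scheduler,
    private ttlMs: number,
    private deleteSession: (id: number) => void,
  ) {}

  scheduleSessionCleanup(sessionDbId: number): void {
    this.cancelSessionCleanup(sessionDbId); // reset TTL if already scheduled
    const t = this.scheduler.schedule(() => this.deleteSession(sessionDbId), this.ttlMs);
    this.timers.set(sessionDbId, t);
  }

  cancelSessionCleanup(sessionDbId: number): void {
    const t = this.timers.get(sessionDbId);
    if (t) {
      this.scheduler.cancel(t);
      this.timers.delete(sessionDbId);
    }
  }
}

// Test double: timers fire only when flush() is called.
class ManualScheduler implements Scheduler {
  private nextId = 0;
  private pending = new Map<number, () => void>();

  schedule(fn: () => void, _ms: number): Timer {
    const id = this.nextId++;
    this.pending.set(id, fn);
    return { id };
  }

  cancel(t: Timer): void {
    this.pending.delete(t.id);
  }

  flush(): void {
    const fns = [...this.pending.values()];
    this.pending.clear();
    for (const fn of fns) fn();
  }
}
```

In production the `Scheduler` would wrap `setTimeout`/`clearTimeout`; in tests, `ManualScheduler.flush()` stands in for the TTL elapsing.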
### Fix 2: Add conversationHistory Bounds

```typescript
// In src/services/worker/SessionManager.ts or new utility file

const MAX_CONVERSATION_HISTORY_LENGTH = 50; // Configurable

/**
 * Trim conversation history to prevent unbounded growth
 * Keeps the most recent messages
 */
export function trimConversationHistory(session: ActiveSession): void {
  if (session.conversationHistory.length > MAX_CONVERSATION_HISTORY_LENGTH) {
    const toRemove = session.conversationHistory.length - MAX_CONVERSATION_HISTORY_LENGTH;
    session.conversationHistory.splice(0, toRemove);
    logger.debug('SESSION', 'Trimmed conversation history', {
      sessionDbId: session.sessionDbId,
      removed: toRemove,
      remaining: session.conversationHistory.length
    });
  }
}
```
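A self-contained version of the same trimming behavior can be run as a quick check. Logging and the `ActiveSession` wrapper are omitted here; the 50-message cap mirrors the sketch above:

```typescript
interface ConversationMessage {
  role: 'user' | 'assistant';
  content: string;
}

const MAX_CONVERSATION_HISTORY_LENGTH = 50;

// In-place trim: splice mutates the array the session actually holds,
// unlike a slice()-based copy. Returns the number of messages dropped.
function trimConversationHistory(history: ConversationMessage[]): number {
  const excess = history.length - MAX_CONVERSATION_HISTORY_LENGTH;
  if (excess <= 0) return 0;
  history.splice(0, excess);
  return excess;
}

const history: ConversationMessage[] = [];
for (let i = 0; i < 120; i++) {
  history.push({ role: i % 2 ? 'assistant' : 'user', content: `msg ${i}` });
  trimConversationHistory(history); // call after every push, as recommended
}

console.log(history.length, history[0].content); // 50 msg 70
```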
Then call this after each message is added in SDKAgent, GeminiAgent, and ResponseProcessor.
### Fix 3: Update SessionRoutes Generator Completion

```typescript
// In SessionRoutes.ts, update the finally block (around line 164)

.finally(() => {
  const sessionDbId = session.sessionDbId;
  const wasAborted = session.abortController.signal.aborted;

  if (wasAborted) {
    logger.info('SESSION', `Generator aborted`, { sessionId: sessionDbId });
  } else {
    logger.info('SESSION', `Generator completed naturally`, { sessionId: sessionDbId });
  }

  session.generatorPromise = null;
  session.currentProvider = null;
  this.workerService.broadcastProcessingStatus();

  // Check for pending work
  const pendingStore = this.sessionManager.getPendingMessageStore();
  const pendingCount = pendingStore.getPendingCount(sessionDbId);

  if (pendingCount > 0 && !wasAborted) {
    // Restart for pending work
    // ... existing restart logic ...
  } else {
    // No pending work - schedule cleanup instead of keeping forever
    this.sessionManager.scheduleSessionCleanup(sessionDbId);
  }
});
```
---

## 6. Configuration Recommendations

Add these to `settings.json` defaults:

```json
{
  "CLAUDE_MEM_SESSION_TTL_MINUTES": 60,
  "CLAUDE_MEM_MAX_CONVERSATION_HISTORY": 50,
  "CLAUDE_MEM_MAX_ACTIVE_SESSIONS": 100
}
```
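These keys are proposals from this report, not existing settings. Reading them defensively, mirroring the `parseInt(...) || default` pattern already used in `truncateHistory()`, might look like:

```typescript
// Proposed settings keys from this report; values arrive as strings.
interface RawSettings {
  [key: string]: string | undefined;
}

// Falls back to the default on missing, non-numeric, or non-positive values.
function intSetting(settings: RawSettings, key: string, fallback: number): number {
  const parsed = parseInt(settings[key] ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

const settings: RawSettings = { CLAUDE_MEM_SESSION_TTL_MINUTES: '30' };

const ttlMs = intSetting(settings, 'CLAUDE_MEM_SESSION_TTL_MINUTES', 60) * 60 * 1000;
const maxHistory = intSetting(settings, 'CLAUDE_MEM_MAX_CONVERSATION_HISTORY', 50);

console.log(ttlMs, maxHistory); // 1800000 50
```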
---

## 7. Testing Recommendations

Add tests for:

1. Session cleanup after TTL expires
2. `conversationHistory` trimming at various sizes
3. Memory monitoring under sustained load
4. Cleanup timer cancellation on new work
---

## Summary

| Issue | Status in v8.5.7 | Fix Required |
|-------|------------------|--------------|
| Sessions never auto-cleanup | NOT FIXED | Yes - add TTL/cleanup mechanism |
| conversationHistory unbounded | NOT FIXED (except partial OpenRouter mitigation) | Yes - add trimming to all agents |

Both memory leaks are confirmed to exist in the current codebase and require the fixes outlined above.
@@ -1,6 +1,6 @@
 {
   "name": "claude-mem-plugin",
-  "version": "8.5.6",
+  "version": "8.5.7",
   "private": true,
   "description": "Runtime dependencies for claude-mem bundled hooks",
   "type": "module",