claude-mem/docs/reports/2026-01-04--issue-514-orphaned-sessions-analysis.md
Alex Newman bb033b95f1 feat(docs): Add analysis reports for issues #514, #517, #520, #527, and #532
- Issue #514: Documented analysis of orphaned observer session files, including root cause, evidence, and recommended fixes.
- Issue #517: Analyzed PowerShell escaping issues in cleanupOrphanedProcesses() on Windows, with recommended fixes using WMIC.
- Issue #520: Confirmed resolution of stuck messages issue through architectural changes to a claim-and-delete pattern.
- Issue #527: Identified detection failure of uv on Apple Silicon Macs with Homebrew installation, proposed path updates for detection.
- Issue #532: Analyzed memory leak issues in SessionManager, detailing session cleanup and conversationHistory growth concerns, with recommended fixes.
2026-01-04 00:21:22 -05:00

Issue #514: Orphaned Observer Session Files Analysis

Date: January 4, 2026
Status: PARTIALLY RESOLVED - Root cause understood, fix was made but reverted
Original Issue: 13,000+ orphaned .jsonl session files created over 2 days


Executive Summary

Issue #514 reported that the plugin created 13,000+ orphaned session .jsonl files in ~/.claude/projects/<project>/. Each file contained only an initialization message with no actual observations. The hypothesis was that startSessionProcessor() in startup-recovery created new observer sessions in a loop.

Current State: The issue was fixed in commit 9a7f662 with a deterministic mem-${contentSessionId} prefix approach, but this fix was reverted in commit f9197b5, apparently because the SDK does not accept custom session IDs (the revert message calls them "potentially invalid"). The current code uses a NULL-based initialization pattern that can still create orphaned sessions when a query aborts before the SDK's session ID is captured.


Evidence: Current File Analysis

Filesystem analysis of ~/.claude/projects/-Users-alexnewman-Scripts-claude-mem/:

| Line Count | Number of Files |
|---|---|
| 0 lines (empty) | 407 |
| 1 line | 12,562 |
| 2 lines | 3,199 |
| 3+ lines | 3,546 |
| Total | ~19,714 |

The 12,562 single-line files are consistent with the issue description - sessions that initialized but never received observations.

Sample single-line file content:

{"type":"queue-operation","operation":"dequeue","timestamp":"2025-12-28T20:41:25.484Z","sessionId":"00081a3b-9485-48a4-89f0-fd4dfccd3ac9"}
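The line-count breakdown in this section can be reproduced with a small classifier. The following is a minimal sketch, not the script used for this report: `countLines` and `bucketByLineCount` are hypothetical helper names, reading the files from disk is left to the caller, and counting non-empty lines is an assumption that may differ slightly from a `wc -l` based count.

```typescript
// Hypothetical helpers for illustration; not part of claude-mem.
type Bucket = "empty" | "one" | "two" | "threePlus";

// Count non-empty lines so a trailing newline does not inflate the count.
function countLines(content: string): number {
  return content.split("\n").filter((line) => line.trim().length > 0).length;
}

// Tally file contents into the same buckets used in this section.
function bucketByLineCount(contents: string[]): Record<Bucket, number> {
  const tally: Record<Bucket, number> = { empty: 0, one: 0, two: 0, threePlus: 0 };
  for (const content of contents) {
    const n = countLines(content);
    if (n === 0) tally.empty += 1;
    else if (n === 1) tally.one += 1;
    else if (n === 2) tally.two += 1;
    else tally.threePlus += 1;
  }
  return tally;
}
```

A file like the sample above, holding only a single queue-operation record, lands in the `one` bucket.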

Root Cause Analysis

The Problem Chain

  1. Worker startup calls processPendingQueues() (line 281 in worker-service.ts)
  2. For each session with pending messages, it calls initializeSession() then startSessionProcessor()
  3. startSessionProcessor() invokes sdkAgent.startSession() which calls the Claude Agent SDK query() function
  4. If memorySessionId is NULL, no resume parameter is passed to query()
  5. The SDK creates a NEW .jsonl file for each query call without a resume parameter
  6. If the query aborts before receiving a response (timeout, crash, abort signal), the memorySessionId is never captured
  7. On next startup, the cycle repeats - creating yet another orphaned file
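The chain above can be modelled in a few lines. This is a toy simulation under stated assumptions, not claude-mem code: `FakeSdk`, `SessionRow`, and `runStartup` are hypothetical names, and the fake SDK does only what the steps describe (create a new file whenever no resume ID is supplied).

```typescript
// Toy model of the loop; all names here are hypothetical.
class FakeSdk {
  files: string[] = [];
  private nextId = 0;

  // Without a resume ID, a new session file is created (step 5).
  query(resume: string | null): string {
    if (resume !== null) return resume;
    const id = `session-${this.nextId++}`;
    this.files.push(`${id}.jsonl`);
    return id;
  }
}

interface SessionRow {
  memorySessionId: string | null;
}

// One worker startup. If the query aborts, the new ID is never stored (step 6).
function runStartup(sdk: FakeSdk, row: SessionRow, abortsBeforeResponse: boolean): void {
  const id = sdk.query(row.memorySessionId);
  if (!abortsBeforeResponse) row.memorySessionId = id;
}

const sdk = new FakeSdk();
const row: SessionRow = { memorySessionId: null };
for (let i = 0; i < 3; i++) runStartup(sdk, row, true); // three aborted recoveries
runStartup(sdk, row, false); // ID finally captured
runStartup(sdk, row, false); // resumes; no new file
console.log(sdk.files.length); // 4 files for one database session
```

In this model every aborted recovery leaves one extra file behind, and file creation stops only once an ID has been captured.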

Why Sessions Abort Before Capturing memorySessionId

Looking at the startSessionProcessor() flow:

// worker-service.ts lines 301-321
private startSessionProcessor(session, source) {
  session.generatorPromise = this.sdkAgent.startSession(session, this)
    .catch(error => { /* error handling */ })
    .finally(() => {
      session.generatorPromise = null;
      this.broadcastProcessingStatus();
    });
}

And processPendingQueues():

// worker-service.ts lines 347-371
for (const sessionDbId of orphanedSessionIds) {
  const session = this.sessionManager.initializeSession(sessionDbId);
  this.startSessionProcessor(session, 'startup-recovery');
  await new Promise(resolve => setTimeout(resolve, 100));  // 100ms delay between sessions
}

The problem: Starting 50 sessions with pending messages in quick succession (100ms apart, so all 50 launch within about 5 seconds) means:

  • All 50 SDK queries start nearly simultaneously
  • The SDK creates 50 new .jsonl files (since none have memorySessionId yet)
  • If any query fails/aborts before the first response, its memorySessionId is never captured
  • On next startup, those sessions get new files again

Code Flow: Where .jsonl Files Are Created

The .jsonl files are created by the Claude Agent SDK (@anthropic-ai/claude-agent-sdk), not by claude-mem directly.

When query() is called in SDKAgent.ts:

// SDKAgent.ts lines 89-99
const queryResult = query({
  prompt: messageGenerator,
  options: {
    model: modelId,
    // Resume with captured memorySessionId (null on first prompt, real ID on subsequent)
    ...(hasRealMemorySessionId && { resume: session.memorySessionId }),
    disallowedTools,
    abortController: session.abortController,
    pathToClaudeCodeExecutable: claudePath
  }
});

Key insight: If hasRealMemorySessionId is false (memorySessionId is null), no resume parameter is passed. The SDK then generates a new UUID and creates a new file at: ~/.claude/projects/<dashed-cwd>/<new-uuid>.jsonl
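The conditional spread is what makes `resume` disappear from the options object. A minimal sketch of that pattern, assuming a trimmed-down `QueryOptions` shape and a hypothetical `buildOptions` helper (the real SDK options contain more fields):

```typescript
// Hypothetical, trimmed-down option shape for illustration only.
interface QueryOptions {
  model: string;
  resume?: string;
}

function buildOptions(model: string, memorySessionId: string | null): QueryOptions {
  const hasRealMemorySessionId = !!memorySessionId;
  return {
    model,
    // Spreading `false` adds nothing, so `resume` is omitted entirely when no ID exists.
    ...(hasRealMemorySessionId && { resume: memorySessionId as string }),
  };
}

console.log("resume" in buildOptions("some-model", null)); // false
console.log(buildOptions("some-model", "abc").resume); // abc
```

Because the key is absent rather than set to null, the callee cannot tell a first prompt from a session whose ID was lost.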


Fix History

Commit 9a7f662: The Original Fix (Reverted)

fix(sdk): always pass deterministic session ID to prevent orphaned files

Fixes #514 - Excessive observer sessions created during startup-recovery

Root cause: When memorySessionId was null, no `resume` parameter was passed
to the SDK's query(). This caused the SDK to create a NEW session file on
every call. If queries aborted before capturing the SDK's session_id, the
placeholder remained, leading to cascading creation of 13,000+ orphaned files.

Fix:
- Generate deterministic ID `mem-${contentSessionId}` upfront
- Always pass it to `resume` parameter
- Persist immediately to database before query starts
- If SDK returns different ID, capture and use that going forward

This fix was correct in approach - always passing a resume parameter prevents new file creation.

Commit f9197b5: The Revert

fix(sdk): restore session continuity via robust capture-and-resume strategy

Replaces the deterministic 'mem-' ID approach with a capture-based strategy:
1. Passes 'resume' parameter ONLY when a verified memory session ID exists
2. Captures SDK-generated session ID when it differs from current ID
3. Ensures subsequent prompts resume the correctly captured session ID

This resolves the issue where new sessions were created for every message
due to failure to capture/resume the initial session ID, without introducing
potentially invalid deterministic IDs.

The revert explanation suggests the SDK rejected the mem- prefix IDs.

Commit 005b0f8: Current NULL-based Pattern

Changed memory_session_id initialization from contentSessionId (placeholder) to NULL:

  • Simpler logic: !!session.memorySessionId instead of memorySessionId !== contentSessionId
  • But still creates new files on first prompt of each session
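The two checks can be compared side by side. This is an illustrative sketch only; `SessionIds`, `hasRealIdPlaceholder`, and `hasRealIdNullBased` are hypothetical names, and the sample IDs are made up.

```typescript
interface SessionIds {
  contentSessionId: string;
  memorySessionId: string | null;
}

// Placeholder pattern (before 005b0f8): memorySessionId starts equal to contentSessionId.
function hasRealIdPlaceholder(s: SessionIds): boolean {
  return s.memorySessionId !== s.contentSessionId;
}

// NULL-based pattern (005b0f8): memorySessionId starts as null.
function hasRealIdNullBased(s: SessionIds): boolean {
  return !!s.memorySessionId;
}

const freshPlaceholder: SessionIds = { contentSessionId: "c1", memorySessionId: "c1" };
const freshNull: SessionIds = { contentSessionId: "c1", memorySessionId: null };
const captured: SessionIds = { contentSessionId: "c1", memorySessionId: "sdk-1" };

console.log(hasRealIdPlaceholder(freshPlaceholder)); // false
console.log(hasRealIdNullBased(freshNull)); // false
console.log(hasRealIdNullBased(captured)); // true
```

Both predicates report "no real ID" for a fresh session, so either way the first prompt runs without `resume` and a new file is created.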

Relationship with Issue #520 (Stuck Messages)

Issue #520 is related but distinct:

| Aspect | Issue #514 (Orphaned Files) | Issue #520 (Stuck Messages) |
|---|---|---|
| Problem | Too many .jsonl files | Messages never processed |
| Root Cause | SDK creates new file per query without resume | Old claim-process-mark pattern left messages in 'processing' state |
| Status | Partially resolved | Fully resolved |
| Fix | Need deterministic resume IDs | Changed to claim-and-delete pattern |

Connection: Both issues relate to startup-recovery. Issue #520's fix (claim-and-delete pattern) doesn't create the loop that #514 describes, but #514 can still occur when:

  1. Sessions have pending messages
  2. Recovery starts the generator
  3. Generator aborts before capturing memorySessionId
  4. Next startup repeats the cycle
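To make the distinction concrete, here is a rough in-memory sketch of a claim-and-delete queue. It is an assumption-laden illustration, not the actual PendingMessageStore: `PendingQueue`, `claimAndDelete`, and `sessionsWithPending` are hypothetical names, and the real store is backed by SQLite.

```typescript
// In-memory stand-in for the pending-message table; names are hypothetical.
interface PendingMessage {
  id: number;
  sessionDbId: number;
  payload: string;
}

class PendingQueue {
  private rows: PendingMessage[] = [];

  enqueue(message: PendingMessage): void {
    this.rows.push(message);
  }

  // Claim-and-delete: the row is removed in the same step that hands it out,
  // so there is no 'processing' state for a crashed worker to leave behind.
  claimAndDelete(sessionDbId: number): PendingMessage | undefined {
    for (let i = 0; i < this.rows.length; i++) {
      if (this.rows[i].sessionDbId === sessionDbId) {
        return this.rows.splice(i, 1)[0];
      }
    }
    return undefined;
  }

  // Sessions that startup-recovery would pick up, in first-seen order.
  sessionsWithPending(): number[] {
    const seen: number[] = [];
    for (const row of this.rows) {
      if (seen.indexOf(row.sessionDbId) === -1) seen.push(row.sessionDbId);
    }
    return seen;
  }
}
```

In this sketch a session stops appearing in `sessionsWithPending()` as soon as its last message is claimed, which removes the stuck-message loop. A session that still has unclaimed messages is picked up again on the next startup, which is where the #514 conditions above apply.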

v8.5.7 Status

v8.5.7 did NOT fully address Issue #514. The major changes were:

  • Modular architecture refactor
  • NULL-based initialization pattern
  • Comprehensive test coverage

The deterministic mem- prefix fix (9a7f662) was reverted before v8.5.7.


Recommended Fixes

Option 1: Reintroduce Deterministic IDs with SDK Validation

// SDKAgent.ts - In startSession()
async startSession(session: ActiveSession, worker?: WorkerRef): Promise<void> {
  // Generate deterministic ID based on database session ID (not UUID-based contentSessionId)
  // Format: "mem-<sessionDbId>" is short and unlikely to conflict
  const deterministicMemoryId = session.memorySessionId || `mem-${session.sessionDbId}`;

  // Always pass resume to prevent orphaned sessions
  const queryResult = query({
    prompt: messageGenerator,
    options: {
      model: modelId,
      resume: deterministicMemoryId,  // ALWAYS pass, even if SDK might reject
      disallowedTools,
      abortController: session.abortController,
      pathToClaudeCodeExecutable: claudePath
    }
  });

  // Capture whatever ID the SDK actually uses
  for await (const message of queryResult) {
    if (message.session_id && message.session_id !== session.memorySessionId) {
      session.memorySessionId = message.session_id;
      this.dbManager.getSessionStore().updateMemorySessionId(
        session.sessionDbId,
        message.session_id
      );
    }
    // ... rest of processing
  }
}
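The capture step at the end of Option 1 can be isolated and exercised without the SDK. A minimal sketch, assuming simplified `SdkMessage` and `SessionState` shapes and a hypothetical `captureSessionId` helper; the simulated stream stands in for an SDK that ignores the supplied ID and reports its own.

```typescript
// Hypothetical minimal shapes; the real SDK message type carries more fields.
interface SdkMessage {
  session_id?: string;
}

interface SessionState {
  sessionDbId: number;
  memorySessionId: string | null;
}

// Returns the ID to persist, or null when nothing changed.
function captureSessionId(session: SessionState, message: SdkMessage): string | null {
  if (!message.session_id || message.session_id === session.memorySessionId) return null;
  session.memorySessionId = message.session_id;
  return message.session_id;
}

// Simulated stream: the SDK reports "sdk-uuid" regardless of what was requested.
const session: SessionState = { sessionDbId: 7, memorySessionId: null };
const persisted: string[] = [];
const stream: SdkMessage[] = [{ session_id: "sdk-uuid" }, {}, { session_id: "sdk-uuid" }];
for (let i = 0; i < stream.length; i++) {
  const id = captureSessionId(session, stream[i]);
  if (id !== null) persisted.push(id); // stands in for updateMemorySessionId()
}
console.log(persisted.length); // 1
```

Only the first message that carries a new ID triggers a write; repeated or missing IDs are ignored.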

Option 2: Limit Recovery Scope

Prevent the recovery loop by limiting how many times a session can be recovered:

// In processPendingQueues()
for (const sessionDbId of orphanedSessionIds) {
  // Check if this session was already recovered recently
  const dbSession = this.dbManager.getSessionById(sessionDbId);
  const recoveryAttempts = dbSession?.recovery_attempts ?? 0;  // tolerate a missing row or column

  if (recoveryAttempts >= 3) {
    logger.warn('SYSTEM', 'Session exceeded max recovery attempts, skipping', {
      sessionDbId,
      recoveryAttempts
    });
    continue;
  }

  // Increment recovery counter
  this.dbManager.getSessionStore().incrementRecoveryAttempts(sessionDbId);

  // ... rest of recovery
}
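The attempt cap can be tried out in isolation. The sketch below is a hypothetical in-memory stand-in for the `recovery_attempts` column, not existing claude-mem code; `RecoveryCounter` and `tryBeginRecovery` are made-up names.

```typescript
const MAX_RECOVERY_ATTEMPTS = 3; // same threshold as the snippet above

// In-memory stand-in for the recovery_attempts column (hypothetical).
class RecoveryCounter {
  private attempts: { [sessionDbId: number]: number } = {};

  // Returns true and records the attempt if the session may still be recovered.
  tryBeginRecovery(sessionDbId: number): boolean {
    const used = this.attempts[sessionDbId] || 0;
    if (used >= MAX_RECOVERY_ATTEMPTS) return false;
    this.attempts[sessionDbId] = used + 1;
    return true;
  }
}

const counter = new RecoveryCounter();
const results: boolean[] = [];
for (let startup = 0; startup < 5; startup++) {
  results.push(counter.tryBeginRecovery(42));
}
console.log(results.join(",")); // true,true,true,false,false
```

Under this model a session that never captures its ID can leave at most three orphaned files instead of one per startup. Whether the counter should reset after a successful run is not specified in this report.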

Option 3: Cleanup Old Files (Mitigation, Not Fix)

Add a cleanup script that removes orphaned .jsonl files:

# Delete .jsonl files older than 7 days that contain at most 1 line (this includes empty files)
find ~/.claude/projects/ -type f -name "*.jsonl" -mtime +7 \
  -exec sh -c '[ $(wc -l < "$1") -le 1 ] && rm -- "$1"' _ {} \;
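If the cleanup were to live inside the plugin rather than in a shell script, the selection logic could be kept as a pure function so it can be tested and run as a dry run first. This is a sketch under assumptions: `FileInfo` and `selectOrphanCandidates` are hypothetical, the caller is expected to gather line counts and modification times, and the age cutoff only approximates the `-mtime +7` rounding used by find.

```typescript
// Hypothetical metadata shape; the caller gathers it from the filesystem.
interface FileInfo {
  path: string;
  lineCount: number;
  mtimeMs: number; // last-modified time, milliseconds since epoch
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Select .jsonl files with at most `maxLines` lines that are older than `minAgeDays`.
// Nothing is deleted here; the caller decides what to do with the returned paths.
function selectOrphanCandidates(
  files: FileInfo[],
  nowMs: number,
  minAgeDays: number = 7,
  maxLines: number = 1
): string[] {
  const cutoff = nowMs - minAgeDays * DAY_MS;
  return files
    .filter((f) => /\.jsonl$/.test(f.path) && f.lineCount <= maxLines && f.mtimeMs < cutoff)
    .map((f) => f.path);
}
```

Printing the returned paths before deleting anything gives a chance to confirm that no legitimate short sessions are caught by the filter.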

Files Involved

| File | Role |
|---|---|
| src/services/worker-service.ts | startSessionProcessor(), processPendingQueues() |
| src/services/worker/SDKAgent.ts | startSession(), query() call with resume parameter |
| src/services/worker/SessionManager.ts | initializeSession(), session lifecycle |
| src/services/sqlite/sessions/create.ts | createSDKSession(), NULL-based initialization |
| src/services/sqlite/PendingMessageStore.ts | getSessionsWithPendingMessages() |

Conclusion

Issue #514 was correctly diagnosed. The fix in commit 9a7f662 was the right approach but was reverted because the SDK may not accept arbitrary custom IDs. The current NULL-based pattern (005b0f8) is cleaner but doesn't prevent orphaned files when queries abort before capturing the SDK's session ID.

Recommendation: Reintroduce the deterministic ID approach with proper handling of SDK rejections (Option 1). If the SDK rejects the ID and returns a different one, capture and persist that ID immediately. This ensures at most one .jsonl file per database session, even across crashes and restarts.


Appendix: Git Commit References

| Commit | Description |
|---|---|
| 9a7f662 | Original fix: deterministic mem- prefix IDs (REVERTED) |
| f9197b5 | Revert: capture-based strategy without deterministic IDs |
| 005b0f8 | NULL-based initialization pattern (current) |
| d72a81e | Queue refactoring (related to #520) |
| eb1a78b | Claim-and-delete pattern (fixes #520) |