8.8 KiB
Happy Path Fix - Summary Generation Working
Date: 2025-10-16
Status: ✅ FIXED
Issue: Zero summaries generated despite 22 completed sessions
Root Cause: Race condition with isFinalized flag in worker
The Problem
The claude-mem system had 22 completed SDK sessions but 0 summaries in the database. The summary generation pipeline was completely broken - summaries were never being generated or stored.
Symptoms
session_summariestable: 0 rowssdk_sessionstable: 22 completed sessions- Worker received FINALIZE messages but never generated summaries
- No errors in logs - it just silently failed
Root Cause Analysis
The Bug
In src/sdk/worker.ts, the handleMessage() method was setting isFinalized = true immediately when a FINALIZE message was received:
// BROKEN CODE (line 249-255)
if (message.type === 'finalize') {
console.error('[claude-mem worker] FINALIZE message detected', {
sessionDbId: this.sessionDbId,
isFinalized: true,
pendingMessagesCount: this.pendingMessages.length
});
this.isFinalized = true; // ❌ BUG: Set too early!
}
Why This Broke Everything
The async generator loop in createMessageGenerator() uses while (!this.isFinalized) to determine when to stop:
// Line 359
while (!this.isFinalized) {
// Process pending messages
}
The race condition:
- FINALIZE message arrives via socket
handleMessage()queues message AND setsisFinalized = true- Generator loop checks
!this.isFinalized→ false → exits loop - FINALIZE message never gets processed from the queue
- Finalize prompt never yielded to SDK agent
- Summary never generated
Evidence from Logs
Before fix:
[claude-mem worker] FINALIZE message detected
[claude-mem worker] SDK agent completed, marking session as completed
[claude-mem worker] Cleaning up worker resources
Notice: NO logs for "Processing FINALIZE message in generator" or "Yielding finalize prompt"
After fix:
[claude-mem worker] FINALIZE message detected - queued for processing
[claude-mem worker] Processing FINALIZE message in generator
[claude-mem worker] Yielding finalize prompt to SDK agent
[claude-mem worker] SDK agent response received
[claude-mem worker] Summary parsed successfully
[claude-mem worker] Storing summary in database
[claude-mem worker] Summary stored successfully in database
The Fix
Code Change
Changed handleMessage() to NOT set the flag immediately:
// FIXED CODE (line 249-254)
if (message.type === 'finalize') {
console.error('[claude-mem worker] FINALIZE message detected - queued for processing', {
sessionDbId: this.sessionDbId,
pendingMessagesCount: this.pendingMessages.length
});
// DON'T set isFinalized here - let the generator set it after yielding finalize prompt
}
The generator already sets isFinalized = true at line 375 AFTER yielding the finalize prompt:
// Line 370-399 (inside generator)
if (message.type === 'finalize') {
console.error('[claude-mem worker] Processing FINALIZE message in generator', {
sessionDbId: this.sessionDbId,
sdkSessionId: this.sdkSessionId
});
this.isFinalized = true; // ✅ Set AFTER we start processing
const session = await this.loadSession();
if (session) {
const finalizePrompt = buildFinalizePrompt(session);
console.error('[claude-mem worker] Yielding finalize prompt to SDK agent', {
sessionDbId: this.sessionDbId,
sdkSessionId: this.sdkSessionId,
promptLength: finalizePrompt.length,
promptPreview: finalizePrompt.substring(0, 300)
});
yield {
type: 'user',
session_id: this.sdkSessionId || claudeSessionId,
parent_tool_use_id: null,
message: {
role: 'user',
content: finalizePrompt
}
};
}
break;
}
Why This Works
Now the flow is:
- FINALIZE message arrives via socket
handleMessage()queues message (does NOT set flag)- Generator loop continues:
!this.isFinalized→ true → processes queue - Generator finds FINALIZE message
- Generator sets
isFinalized = true - Generator yields finalize prompt to SDK agent
- SDK agent responds with summary
- Summary is parsed and stored
- Generator breaks out of loop
- Worker marks session completed and cleans up
Testing & Verification
Test Setup
# 1. Built the fixed code
npm run build
# 2. Started worker manually
bun scripts/hooks/worker.js 37
# 3. Sent FINALIZE message manually
echo '{"type":"finalize"}' | nc -U ~/.claude-mem/worker-37.sock
# 4. Waited 5 seconds for SDK response
# 5. Checked database
sqlite3 ~/.claude-mem/claude-mem.db "SELECT COUNT(*) FROM session_summaries"
Results
Before fix: 0 summaries After fix: 1 summary ✅
Sample Summary Generated
Request: Apply feedback to the session-logic-fixes.md file regarding the claude-mem project's implementation plan
Investigated: No investigation was performed - this was a session ending immediately after context was provided
Learned: The user received detailed feedback on their implementation plan for claude-mem, including a critical correction that SessionEnd hooks already exist in Claude Code and don't need to be implemented from scratch. The feedback validated their technical approach for fixing zombie workers, stale sockets, and race conditions.
Completed: No work was completed - the session ended before any tools were executed or changes were made
Next Steps: Apply the feedback corrections to session-logic-fixes.md, particularly updating the plan to configure existing SessionEnd hooks rather than implementing new ones; proceed with the revised implementation checklist for the critical fixes
Notes: Session ended immediately after receiving context. The feedback indicated the implementation plan was 95% sound but needed one major correction about SessionEnd hooks already existing in Claude Code documentation. No actual file operations occurred during this session.
Files Modified
Changed
src/sdk/worker.ts(line 249-255)- Removed
this.isFinalized = truefromhandleMessage() - Updated log message to say "queued for processing"
- Added comment explaining why we don't set flag here
- Removed
Built
scripts/hooks/worker.js(recompiled with fix)
Impact
What Now Works
✅ Workers receive FINALIZE messages ✅ Workers process FINALIZE messages in generator ✅ Workers yield finalize prompts to SDK agent ✅ SDK agent generates summaries ✅ Summaries are parsed correctly ✅ Summaries are stored in database ✅ THE HAPPY PATH WORKS END-TO-END
What Still Needs Testing
- Context hook loading summaries on SessionStart
- Full end-to-end test: Session 1 → exit → Session 2 sees summary
- Multiple observations before FINALIZE
- Edge cases (worker crashes, socket errors, etc.)
Next Steps
Immediate (Phase 0 Completion)
- ✅ DONE: Fix summary generation
- Test context hook loads summaries
- Run end-to-end test with real Claude Code session
- Verify Session 2 immediately sees Session 1's summary
After Happy Path Confirmed Working
- Proceed to Phase 1: Resilience fixes
- Zombie worker prevention (watchdog timer)
- SessionEnd hook configuration
- Stale socket detection
- Race condition retry logic
Lessons Learned
-
Don't set flags that control loops from outside the loop
- The generator loop needs to control its own exit condition
- Setting
isFinalizedfromhandleMessage()created a race condition
-
Sequential thinking helped identify the issue
- Traced the flow systematically
- Tested worker standalone
- Manually sent messages
- Watched logs to see where flow broke
-
Comprehensive logging was critical
- Added 35+ logging points before debugging
- Logs showed exactly where the FINALIZE message got stuck
- Without logs, this would have been much harder to debug
-
The fix was one line
- Spent hours adding logging and diagnostics
- Actual fix: remove one line setting a flag
- But couldn't have found it without the instrumentation
Confidence Level
95% confident the happy path now works
Remaining 5% uncertainty is about:
- Does it work in real Claude Code sessions (not just manual testing)?
- Does context hook properly load summaries on next session?
- Edge cases we haven't tested yet
Next: Test with real Claude Code session to get to 100% confidence.
Summary
Problem: Summaries never generated (0 of 22 sessions)
Root Cause: isFinalized flag set too early, causing generator to exit before processing FINALIZE message
Fix: Remove flag setting from handleMessage(), let generator control its own exit
Result: Summary generation now works! 🎉
Status: Phase 0 - 80% complete, need to test context loading next