claude-mem

Author	SHA1	Message	Date
Alex Newman	b88251bc8b	fix: self-healing claimNextMessage prevents stuck processing messages (#1159 ) * fix: self-healing claimNextMessage prevents stuck processing messages claimAndDelete → claimNextMessage with atomic self-healing: resets stale processing messages (>60s) back to pending before claiming. Eliminates stuck messages from generator crashes without external timers. Removes redundant idle-timeout reset in worker-service.ts. Adds QUEUE to logger Component type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update stale comments in SessionQueueProcessor to reflect claim-confirm pattern Comments still referenced the old claim-and-delete pattern after the claimNextMessage rename. Updated to accurately describe the current lifecycle where messages are marked as processing and stay in DB until confirmProcessed() is called. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: move Date.now() inside transaction and extract stale threshold constant - Move Date.now() inside claimNextMessage transaction closure so timestamp is fresh if WAL contention causes retry - Extract STALE_PROCESSING_THRESHOLD_MS to module-level constant - Add comment clarifying strict < boundary semantics Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 23:15:46 -05:00
Alex Newman	7566b8c650	fix: add idle timeout to prevent zombie observer processes (#856 ) * fix: add idle timeout to prevent zombie observer processes Root cause fix for zombie observer accumulation. The SessionQueueProcessor iterator now exits gracefully after 3 minutes of inactivity instead of waiting forever for messages. Changes: - Add IDLE_TIMEOUT_MS constant (3 minutes) - waitForMessage() now returns boolean and accepts timeout parameter - createIterator() tracks lastActivityTime and exits on idle timeout - Graceful exit via return (not throw) allows SDK to complete cleanly This addresses the root cause that PR #848 worked around with pattern matching. Observer processes now self-terminate, preventing accumulation when session-complete hooks don't fire. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: trigger abort on idle timeout to actually kill subprocess The previous implementation only returned from the iterator on idle timeout, but this doesn't terminate the Claude subprocess - it just stops yielding messages. The subprocess stays alive as a zombie because: 1. Returning from createIterator() ends the generator 2. The SDK closes stdin via transport.endInput() 3. But the subprocess may not exit on stdin EOF 4. No abort signal is sent to kill it Fix: Add onIdleTimeout callback that SessionManager uses to call session.abortController.abort(). This sends SIGTERM to the subprocess via the SDK's ProcessTransport abort handler. Verified by Codex analysis of the SDK internals: - abort() triggers ProcessTransport abort handler → SIGTERM - transport.close() sends SIGTERM → escalates to SIGKILL after 5s - Just closing stdin is NOT sufficient to guarantee subprocess exit Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: add idle timeout to prevent zombie observer processes Also cleaned up hooks.json to remove redundant start commands. The hook command handler now auto-starts the worker if not running, which is how it should have been since we changed to auto-start. This maintenance change was done manually. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: resolve race condition in session queue idle timeout detection - Reset timer on spurious wakeup when queue is empty but duration check fails - Use optional chaining for onIdleTimeout callback - Include threshold value in idle timeout log message for better diagnostics - Add comprehensive unit tests for SessionQueueProcessor Fixes PR #856 review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: migrate installer to Setup hook - Add plugin/scripts/setup.sh for one-time dependency setup - Add Setup hook to hooks.json (triggers via claude --init) - Remove smart-install.js from SessionStart hook - Keep smart-install.js as manual fallback for Windows/auto-install Setup hook handles: - Bun detection with fallback locations - uv detection (optional, for Chroma) - Version marker to skip redundant installs - Clear error messages with install instructions * feat: add np for one-command npm releases - Add np as dev dependency - Add release, release:patch, release:minor, release:major scripts - Add prepublishOnly hook to run build before publish - Configure np (no yarn, include all contents, run tests) * fix: reduce PostToolUse hook timeout to 30s PostToolUse runs on every tool call, 120s was excessive and could cause hangs. Reduced to 30s for responsive behavior. * docs: add PR shipping report Analyzed 6 PRs for shipping readiness: - #856: Ready to merge (idle timeout fix) - #700, #722, #657: Have conflicts, need rebase - #464: Contributor PR, too large (15K+ lines) - #863: Needs manual review Includes shipping strategy and conflict resolution order. * MAESTRO: Verify PR #856 test suite passes All 797 tests pass (3 skipped, 0 failures). The 11 SessionQueueProcessor idle timeout tests all pass with 20 expect() assertions verified. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * MAESTRO: Verify PR #856 build passes - Ran npm run build successfully with no TypeScript errors - All artifacts generated (worker-service, mcp-server, context-generator, viewer UI) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * MAESTRO: Code review PR #856 implementation verified Verified all requirements in SessionQueueProcessor.ts: - IDLE_TIMEOUT_MS = 180000ms (3 minutes) - waitForMessage() accepts timeout parameter - lastActivityTime reset on spurious wakeup (race condition fix) - Graceful exit logs include thresholdMs parameter - 11 comprehensive test cases in SessionQueueProcessor.test.ts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: bigph00t <166455923+bigph00t@users.noreply.github.com> Co-authored-by: root <root@srv1317155.hstgr.cloud>	2026-02-04 19:31:24 -05:00
Alex Newman	2fc4153bef	refactor: decompose monolithic services into modular architecture (#534 ) * docs: add monolith refactor report with system breakdown Comprehensive analysis of codebase identifying: - 14 files over 500 lines requiring refactoring - 3 critical monoliths (SessionStore, SearchManager, worker-service) - 80% code duplication across agent files - 5-phase refactoring roadmap with domain-based architecture * fix: prevent memory_session_id from equaling content_session_id The bug: memory_session_id was initialized to contentSessionId as a "placeholder for FK purposes". This caused the SDK resume logic to inject memory agent messages into the USER's Claude Code transcript, corrupting their conversation history. Root cause: - SessionStore.createSDKSession initialized memory_session_id = contentSessionId - SDKAgent checked memorySessionId !== contentSessionId but this check only worked if the session was fetched fresh from DB The fix: - SessionStore: Initialize memory_session_id as NULL, not contentSessionId - SDKAgent: Simple truthy check !!session.memorySessionId (NULL = fresh start) - Database migration: Ran UPDATE to set memory_session_id = NULL for 1807 existing sessions that had the bug Also adds [ALIGNMENT] logging across the session lifecycle to help debug session continuity issues: - Hook entry: contentSessionId + promptNumber - DB lookup: contentSessionId → memorySessionId mapping proof - Resume decision: shows which memorySessionId will be used for resume - Capture: logs when memorySessionId is captured from first SDK response UI: Added "Alignment" quick filter button in LogsModal to show only alignment logs for debugging session continuity. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: improve error handling in worker-service.ts - Fix GENERIC_CATCH anti-patterns by logging full error objects instead of just messages - Add [ANTI-PATTERN IGNORED] markers for legitimate cases (cleanup, hot paths) - Simplify error handling comments to be more concise - Improve httpShutdown() error discrimination for ECONNREFUSED - Reduce LARGE_TRY_BLOCK issues in initialization code Part of anti-pattern cleanup plan (132 total issues) * refactor: improve error logging in SearchManager.ts - Pass full error objects to logger instead of just error.message - Fixes PARTIAL_ERROR_LOGGING anti-patterns (10 instances) - Better debugging visibility when Chroma queries fail Part of anti-pattern cleanup (133 remaining) * refactor: improve error logging across SessionStore and mcp-server - SessionStore.ts: Fix error logging in column rename utility - mcp-server.ts: Log full error objects instead of just error.message - Improve error handling in Worker API calls and tool execution Part of anti-pattern cleanup (133 remaining) * Refactor hooks to streamline error handling and loading states - Simplified error handling in useContextPreview by removing try-catch and directly checking response status. - Refactored usePagination to eliminate try-catch, improving readability and maintaining error handling through response checks. - Cleaned up useSSE by removing unnecessary try-catch around JSON parsing, ensuring clarity in message handling. - Enhanced useSettings by streamlining the saving process, removing try-catch, and directly checking the result for success. * refactor: add error handling back to SearchManager Chroma calls - Wrap queryChroma calls in try-catch to prevent generator crashes - Log Chroma errors as warnings and fall back gracefully - Fixes generator failures when Chroma has issues - Part of anti-pattern cleanup recovery * feat: Add generator failure investigation report and observation duplication regression report - Created a comprehensive investigation report detailing the root cause of generator failures during anti-pattern cleanup, including the impact, investigation process, and implemented fixes. - Documented the critical regression causing observation duplication due to race conditions in the SDK agent, outlining symptoms, root cause analysis, and proposed fixes. * fix: address PR #528 review comments - atomic cleanup and detector improvements This commit addresses critical review feedback from PR #528: ## 1. Atomic Message Cleanup (Fix Race Condition) Problem: SessionRoutes.ts generator error handler had race condition - Queried messages then marked failed in loop - If crash during loop → partial marking → inconsistent state Solution: - Added `markSessionMessagesFailed()` to PendingMessageStore.ts - Single atomic UPDATE statement replaces loop - Follows existing pattern from `resetProcessingToPending()` Files: - src/services/sqlite/PendingMessageStore.ts (new method) - src/services/worker/http/routes/SessionRoutes.ts (use new method) ## 2. Anti-Pattern Detector Improvements Problem: Detector didn't recognize logger.failure() method - Lines 212 & 335 already included "failure" - Lines 112-113 (PARTIAL_ERROR_LOGGING detection) did not Solution: Updated regex patterns to include "failure" for consistency Files: - scripts/anti-pattern-test/detect-error-handling-antipatterns.ts ## 3. Documentation PR Comment: Added clarification on memory_session_id fix location - Points to SessionStore.ts:1155 - Explains why NULL initialization prevents message injection bug ## Review Response Addresses "Must Address Before Merge" items from review: ✅ Clarified memory_session_id bug fix location (via PR comment) ✅ Made generator error handler message cleanup atomic ❌ Deferred comprehensive test suite to follow-up PR (keeps PR focused) ## Testing - Build passes with no errors - Anti-pattern detector runs successfully - Atomic cleanup follows proven pattern from existing methods 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: FOREIGN KEY constraint and missing failed_at_epoch column Two critical bugs fixed: 1. Missing failed_at_epoch column in pending_messages table - Added migration 20 to create the column - Fixes error when trying to mark messages as failed 2. FOREIGN KEY constraint failed when storing observations - All three agents (SDK, Gemini, OpenRouter) were passing session.contentSessionId instead of session.memorySessionId - storeObservationsAndMarkComplete expects memorySessionId - Added null check and clear error message However, observations still not saving - see investigation report. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Refactor hook input parsing to improve error handling - Added a nested try-catch block in new-hook.ts, save-hook.ts, and summary-hook.ts to handle JSON parsing errors more gracefully. - Replaced direct error throwing with logging of the error details using logger.error. - Ensured that the process exits cleanly after handling input in all three hooks. * docs: update monolith report post session-logging merge - SessionStore grew to 2,011 lines (49 methods) - highest priority - SearchManager reduced to 1,778 lines (improved) - Agent files reduced by ~45 lines combined - Added trend indicators and post-merge observations - Core refactoring proposal remains valid * refactor(sqlite): decompose SessionStore into modular architecture Extract the 2011-line SessionStore.ts monolith into focused, single-responsibility modules following grep-optimized progressive disclosure pattern: New module structure: - sessions/ - Session creation and retrieval (create.ts, get.ts, types.ts) - observations/ - Observation storage and queries (store.ts, get.ts, recent.ts, files.ts, types.ts) - summaries/ - Summary storage and queries (store.ts, get.ts, recent.ts, types.ts) - prompts/ - User prompt management (store.ts, get.ts, types.ts) - timeline/ - Cross-entity timeline queries (queries.ts) - import/ - Bulk import operations (bulk.ts) - migrations/ - Database migrations (runner.ts) New coordinator files: - Database.ts - ClaudeMemDatabase class with re-exports - transactions.ts - Atomic cross-entity transactions - Named re-export facades (Sessions.ts, Observations.ts, etc.) Key design decisions: - All functions take `db: Database` as first parameter (functional style) - Named re-exports instead of index.ts for grep-friendliness - SessionStore retained as backward-compatible wrapper - Target file size: 50-150 lines (60% compliance) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor(agents): extract shared logic into modular architecture Consolidate duplicate code across SDKAgent, GeminiAgent, and OpenRouterAgent into focused utility modules. Total reduction: 500 lines (29%). New modules in src/services/worker/agents/: - ResponseProcessor.ts: Atomic DB transactions, Chroma sync, SSE broadcast - ObservationBroadcaster.ts: SSE event formatting and dispatch - SessionCleanupHelper.ts: Session state cleanup and stuck message reset - FallbackErrorHandler.ts: Provider error detection for fallback logic - types.ts: Shared interfaces (WorkerRef, SSE payloads, StorageResult) Bug fix: SDKAgent was incorrectly using obs.files instead of obs.files_read and hardcoding files_modified to empty array. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor(search): extract search strategies into modular architecture Decompose SearchManager into focused strategy pattern with: - SearchOrchestrator: Coordinates strategy selection and fallback - ChromaSearchStrategy: Vector semantic search via ChromaDB - SQLiteSearchStrategy: Filter-only queries for date/project/type - HybridSearchStrategy: Metadata filtering + semantic ranking - ResultFormatter: Markdown table formatting for results - TimelineBuilder: Chronological timeline construction - Filter modules: DateFilter, ProjectFilter, TypeFilter SearchManager now delegates to new infrastructure while maintaining full backward compatibility with existing public API. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor(context): decompose context-generator into modular architecture Extract 660-line monolith into focused components: - ContextBuilder: Main orchestrator (~160 lines) - ContextConfigLoader: Configuration loading - TokenCalculator: Token budget calculations - ObservationCompiler: Data retrieval and query building - MarkdownFormatter/ColorFormatter: Output formatting - Section renderers: Header, Timeline, Summary, Footer Maintains full backward compatibility - context-generator.ts now delegates to new ContextBuilder while preserving public API. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor(worker): decompose worker-service into modular infrastructure Split 2000+ line monolith into focused modules: Infrastructure: - ProcessManager: PID files, signal handlers, child process cleanup - HealthMonitor: Port checks, health polling, version matching - GracefulShutdown: Coordinated cleanup on exit Server: - Server: Express app setup, core routes, route registration - Middleware: Re-exports from existing middleware - ErrorHandler: Centralized error handling with AppError class Integrations: - CursorHooksInstaller: Full Cursor IDE integration (registry, hooks, MCP) WorkerService now acts as thin coordinator wiring all components together. Maintains full backward compatibility with existing public API. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Refactor session queue processing and database interactions - Implement claim-and-delete pattern in SessionQueueProcessor to simplify message handling and eliminate duplicate processing. - Update PendingMessageStore to support atomic claim-and-delete operations, removing the need for intermediate processing states. - Introduce storeObservations method in SessionStore for simplified observation and summary storage without message tracking. - Remove deprecated methods and clean up session state management in worker agents. - Adjust response processing to accommodate new storage patterns, ensuring atomic transactions for observations and summaries. - Remove unnecessary reset logic for stuck messages due to the new queue handling approach. * Add duplicate observation cleanup script Script to clean up duplicate observations created by the batching bug where observations were stored once per message ID instead of once per observation. Includes safety checks to always keep at least one copy. Usage: bun scripts/cleanup-duplicates.ts # Dry run bun scripts/cleanup-duplicates.ts --execute # Delete duplicates bun scripts/cleanup-duplicates.ts --aggressive # Ignore time window 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-01-03 21:22:27 -05:00
Alex Newman	b8ce27bd31	feat(queue): Simplify queue processing and enhance reliability - Implemented atomic message claiming in PendingMessageStore with claimNextMessage. - Removed obsolete peekPending method to streamline message retrieval. - Introduced SessionQueueProcessor for robust async message iteration, replacing complex polling logic. - Refactored SessionManager to eliminate in-memory queue state, relying on PendingMessageStore for message tracking. - Cleaned up session handling logic, removing recursive restarts and session deletion on empty queues. - Enhanced error handling and logging for generator failures and session processing. - Updated SessionRoutes to handle crash recovery more effectively without deleting sessions.	2025-12-28 16:28:58 -05:00

4 Commits