2f19eab9c29179e944d6c6efc20f299d12c97314
182 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
c648d5d8d2 |
feat: Knowledge Agents — queryable corpora from claude-mem (#1653)
* feat: add knowledge agent types, store, builder, and renderer Phase 1 of Knowledge Agents feature. Introduces corpus compilation pipeline that filters observations from the database into portable corpus files stored at ~/.claude-mem/corpora/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add corpus CRUD HTTP endpoints and wire into worker service Phase 2 of Knowledge Agents. Adds CorpusRoutes with 5 endpoints (build, list, get, delete, rebuild) and registers them during worker background initialization alongside SearchRoutes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add KnowledgeAgent with V1 SDK prime/query/reprime Phase 3 of Knowledge Agents. Uses Agent SDK V1 query() with resume and disallowedTools for Q&A-only knowledge sessions. Auto-reprimes on session expiry. Adds prime, query, and reprime HTTP endpoints to CorpusRoutes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add MCP tools and skill for knowledge agents Phase 4 of Knowledge Agents. Adds build_corpus, list_corpora, prime_corpus, and query_corpus MCP tools delegating to worker HTTP endpoints. Includes /knowledge-agent skill with workflow docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: handle SDK process exit in KnowledgeAgent, add e2e test The Agent SDK may throw after yielding all messages when the Claude process exits with a non-zero code. Now tolerates this if session_id/answer were already captured. Adds comprehensive e2e test script (31 assertions) orchestrated via tmux-cli. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use settings model ID instead of hardcoded model in KnowledgeAgent Reads CLAUDE_MEM_MODEL from user settings via getModelId(), matching the existing SDKAgent pattern. No more hardcoded model assumptions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: improve knowledge agents developer experience Add public documentation page, rebuild/reprime MCP tools, and actionable error messages. DX review scored knowledge agents 4/10 — core engineering works (31/31 e2e) but the feature was invisible. This addresses discoverability (docs, cross-links), API completeness (missing MCP tools), and error quality (fix/example fields in error responses). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add quick start guide to knowledge agents page Covers the three main use cases upfront: creating an agent, asking a single question, and starting a fresh conversation with reprime. Includes keeping-it-current section for rebuild + reprime workflow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address code review issues — path traversal, session safety, prompt injection - Block path traversal in CorpusStore with alphanumeric name validation and resolved path check - Harden system prompt against instruction injection from untrusted corpus content - Validate question field as non-empty string in query endpoint - Only persist session_id after successful prime (not null on failure) - Persist refreshed session_id after query execution - Only auto-reprime on session resume errors, not all query failures - Add fenced code block language tags to SKILL.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address remaining code review issues — e2e robustness, MCP validation, docs - Harden e2e curl wrappers with connect-timeout, fallback to HTTP 000 on transport failure - Use curl_post wrapper consistently for all long-running POST calls - Add runtime name validation to all corpus MCP tool handlers - Fix docs: soften hallucination guarantee to probabilistic claim - Fix architecture diagram: add missing rebuild_corpus and reprime_corpus tools Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: enforce string[] type in safeParseJsonArray for corpus data integrity Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add blank line before fenced code blocks in SKILL.md maintenance section Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
abd55977ca |
fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node (#1645)
* fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node
The MCP server bundle (mcp-server.cjs) ships with `#!/usr/bin/env node` so
it must run under Node, but commit
|
||
|
|
25bb93a995 |
fix: address PR #1641 review comments (round 2)
- Remove duplicate TranscriptWatcher/config imports in worker-service.ts - Use normalizePlatformSource in handleSessionInitByClaudeId for consistency - Don't skip DB completion when session not in memory (completeByClaudeId) - Add try-catch around fetch in useContextPreview refresh callback - Deduplicate store.getAllProjects() call in DataRoutes - Fix malformed comment separators in migration runner - Fix missing closing brace and JSDoc opener (merge artifact) in migration runner Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
cbb68ad9e1 |
fix: worker startup crash and missing observation columns
Two bugs fixed: 1. SessionCompletionHandler called dbManager.getSessionStore() during WorkerService construction, before DB initialization. Changed to accept DatabaseManager and defer the call to runtime. 2. migration009 (generated_by_model, relevance_count columns) only ran via the deprecated MigrationRunner path, never through SessionStore's migration chain. Added addObservationModelColumns() to SessionStore constructor. Checks column existence directly since schema_versions may have been marked applied without the ALTER TABLE succeeding. Also removed duplicate transcriptWatcher declaration and shutdown block (merge artifact). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
6250a194dd |
Merge branch 'pr-1472' into integration/validation-batch
# Conflicts: # plugin/scripts/context-generator.cjs # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs # plugin/ui/viewer-bundle.js # src/cli/handlers/context.ts # src/services/sqlite/SessionStore.ts # src/services/sqlite/migrations/runner.ts # src/services/worker-service.ts # src/shared/SettingsDefaultsManager.ts |
||
|
|
d570909bf1 |
Merge branch 'pr-1491' into integration/validation-batch
# Conflicts: # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs # src/shared/hook-constants.ts |
||
|
|
753837bff3 |
fix(windows): isMainModule CJS branch fails on Bun — add CLAUDE_MEM_MANAGED fallback
On Bun/Windows, `require.main !== module` in CJS mode causes the worker to exit silently with code 0. The wrapper already sets CLAUDE_MEM_MANAGED=true when spawning the inner worker, so checking this env var is a safe fallback that doesn't affect standalone execution. Ref #1450 (incomplete fix in PR #1518 — ESM path fixed but CJS branch not). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
2495f98496 |
refactor: consolidate MCP factory, add non-TTY support, auto-detect transcript watchers
- Phase 1: Replace 5 duplicate MCP installers with config-driven factory, extract shared context-injection and json-utils utilities, fix process.execPath usage - Phase 2: Add non-TTY fallback for @clack/prompts to prevent ENOENT in CI/Docker - Phase 3: Wire GeminiCliHooksInstaller through hook command framework with adapter - Phase 4: Auto-start transcript watchers on worker boot when config exists Net -107 lines via DRY consolidation of duplicated installer logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
4589b34eab | fix: decouple mcp health from loopback self-check | ||
|
|
b0f1a458cf |
fix: log warning when readiness times out on reused-worker path (#1491)
Mirror the fresh-spawn path's timeout logging for debugging parity. CodeRabbit nitpick on PR #1491. Co-Authored-By: CC <noreply@anthropic.com> |
||
|
|
83f61177c7 |
fix: address CodeRabbit review feedback on PR #1491
- Update POST_SPAWN_WAIT test assertion from 5000 to 15000 to match the constant change in hook-constants.ts - Remove redundant readPidFile() from aggressiveStartupCleanup() — start() writes the new PID before this runs, so it always returns process.pid (already protected) - Add waitForReadiness() to the reused-worker path in ensureWorkerStarted() to prevent concurrent hooks from racing past a cold-starting worker's initialization guard Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
88b47f9e9c |
fix: prevent worker daemon from being killed by its own hooks (#1490)
Three independent fixes for worker daemon instability: 1. Remove version mismatch auto-restart from ensureWorkerStarted() (#1435). The marketplace bundle ships with __DEFAULT_PACKAGE_VERSION__ unbaked, causing BUILT_IN_VERSION to fall back to "development". This creates a 100% reproducible mismatch on every hook call, killing a healthy worker and often failing to restart. Same pattern across #566, #665, #667, #669, #689, #1124, #1145 (8+ releases). 2. Add process.ppid and PID-file PID to aggressiveStartupCleanup() exclusions (#1426). Without this, a newly spawned daemon SIGKILLs the hook process that spawned it and any already-running worker the PID file points to. 3. Increase POST_SPAWN_WAIT from 5s to 15s (#1423). The 5s timeout was sized for Linux (<1s startup) but macOS ARM64 cold starts take 6-8s with Chroma enabled. |
||
|
|
07ab7000a8 |
fix: patch 7 critical bugs affecting all non-dev-machine users and Windows
1. Fix esbuild inlining build-machine __dirname as string literal — use
CJS-compatible runtime banner with require("node:url").fileURLToPath
across worker-service, mcp-server, and context-generator builds.
2. Fix isMainModule check missing .cjs extension and Windows backslash
path normalization.
3. Wrap extractLastMessage in try-catch to prevent infinite Stop hook
feedback loop on malformed transcripts (exit 0 instead of exit 2).
4. Replace heavy SessionEnd hook (Node→Bun→1.7MB CJS→HTTP) with
lightweight inline node -e one-liner (~200ms vs >1s).
5. Add 7 Gemini/OpenRouter error patterns to unrecoverablePatterns
circuit breaker to prevent 77K+ retry loops on expired API keys.
6. Preserve CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_CODE_GIT_BASH_PATH in
sanitizeEnv instead of stripping them with the CLAUDE_CODE_ prefix.
7. Use PowerShell -EncodedCommand for spawnDaemon to fix path quoting
when Windows usernames contain spaces.
Closes #1515, #1495, #1475, #1465, #1500, #1513, #1512, #1450, #1460,
#1486, #1449, #1481, #1451, #1480, #1453, #1445
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
2b60dd2932 |
feat: isolate Claude and Codex session sources
Persist platform_source across session creation, transcript ingestion, API query paths, and viewer state so Claude and Codex data can coexist without bleeding into each other. - add platform-source normalization helpers and persist platform_source in sdk_sessions via migration 24 with backfill and indexing - thread platformSource through CLI hooks, transcript processing, context generation, pagination, search routes, SSE payloads, and session management - expose source-aware project catalogs, viewer tabs, context preview selectors, and source badges for observations, prompts, and summaries - start the transcript watcher from the worker for transcript-based clients and preserve platform source during Codex ingestion - auto-start the worker from the MCP server for MCP-only clients and tighten stdio-driven cleanup during shutdown - keep createSDKSession backward compatible with existing custom-title callers while allowing explicit platform source forwarding |
||
|
|
4d7bec4d05 |
fix: stop spinner from spinning forever (#1440)
* fix: stop spinner from spinning forever due to orphaned DB messages The activity spinner never stopped because isAnySessionProcessing() queried ALL pending/processing messages in the database, including orphaned messages from dead sessions that no generator would ever process. Root cause: isAnySessionProcessing() used hasAnyPendingWork() which is a global DB scan. Changed it to use getTotalQueueDepth() which only checks sessions in the active in-memory Map. Additional fixes: - Add terminateSession() to enforce restart-or-terminate invariant - Fix 3 zombie paths in .finally() handler that left sessions alive - Clean up idle sessions from memory on successful completion - Remove redundant bare isProcessing:true broadcast - Replace inline require() with proper accessor - Add 8 regression tests for session termination invariant Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review findings — idle-timeout race, double broadcast, query amplification - Move pendingCount check before idle-timeout termination to prevent abandoning fresh messages that arrive between idle abort and .finally() - Move broadcastProcessingStatus() inside restart branch only — the else branch already broadcasts via removeSessionImmediate callback - Compute queueDepth once in broadcastProcessingStatus() and derive isProcessing from it, eliminating redundant double iteration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
80a8c90a1a |
feat: add embedded Process Supervisor for unified process lifecycle (#1370)
* feat: add embedded Process Supervisor for unified process lifecycle management Consolidates scattered process management (ProcessManager, GracefulShutdown, HealthMonitor, ProcessRegistry) into a unified src/supervisor/ module. New: ProcessRegistry with JSON persistence, env sanitizer (strips CLAUDECODE_* vars), graceful shutdown cascade (SIGTERM → 5s wait → SIGKILL with tree-kill on Windows), PID file liveness validation, and singleton Supervisor API. Fixes #1352 (worker inherits CLAUDECODE env causing nested sessions) Fixes #1356 (zombie TCP socket after Windows reboot) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add session-scoped process reaping to supervisor Adds reapSession(sessionId) to ProcessRegistry for killing session-tagged processes on session end. SessionManager.deleteSession() now triggers reaping. Tightens orphan reaper interval from 60s to 30s. Fixes #1351 (MCP server processes leak on session end) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Unix domain socket support for worker communication Introduces socket-manager.ts for UDS-based worker communication, eliminating port 37777 collisions between concurrent sessions. Worker listens on ~/.claude-mem/sockets/worker.sock by default with TCP fallback. All hook handlers, MCP server, health checks, and admin commands updated to use socket-aware workerHttpRequest(). Backwards compatible — settings can force TCP mode via CLAUDE_MEM_WORKER_TRANSPORT=tcp. Fixes #1346 (port 37777 collision across concurrent sessions) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove in-process worker fallback from hook command Removes the fallback path where hook scripts started WorkerService in-process, making the worker a grandchild of Claude Code (killed by sandbox). Hooks now always delegate to ensureWorkerStarted() which spawns a fully detached daemon. Fixes #1249 (grandchild process killed by sandbox) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add health checker and /api/admin/doctor endpoint Adds 30-second periodic health sweep that prunes dead processes from the supervisor registry and cleans stale socket files. Adds /api/admin/doctor endpoint exposing supervisor state, process liveness, and environment health. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add comprehensive supervisor test suite 64 tests covering all supervisor modules: process registry (18 tests), env sanitizer (8), shutdown cascade (10), socket manager (15), health checker (5), and supervisor API (6). Includes persistence, isolation, edge cases, and cross-module integration scenarios. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: revert Unix domain socket transport, restore TCP on port 37777 The socket-manager introduced UDS as default transport, but this broke the HTTP server's TCP accessibility (viewer UI, curl, external monitoring). Since there's only ever one worker process handling all sessions, the port collision rationale for UDS doesn't apply. Reverts to TCP-only, removing ~900 lines of unnecessary complexity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: remove dead code found in pre-landing review Remove unused `acceptingSpawns` field from Supervisor class (written but never read — assertCanSpawn uses stopPromise instead) and unused `buildWorkerUrl` import from context handler. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * updated gitignore * fix: address PR review feedback - downgrade HTTP logging, clean up gitignore, harden supervisor - Downgrade request/response HTTP logging from info to debug to reduce noise - Remove unused getWorkerPort imports, use buildWorkerUrl helper - Export ENV_PREFIXES/ENV_EXACT_MATCHES from env-sanitizer, reuse in Server.ts - Fix isPidAlive(0) returning true (should be false) - Add shutdownInitiated flag to prevent signal handler race condition - Make validateWorkerPidFile testable with pidFilePath option - Remove unused dataDir from ShutdownCascadeOptions - Upgrade reapSession log from debug to warn - Rename zombiePidFiles to deadProcessPids (returns actual PIDs) - Clean up gitignore: remove duplicate datasets/, stale ~*/ and http*/ patterns - Fix tests to use temp directories instead of relying on real PID file Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
626654f816 |
fix: prevent infinite restart loop on FOREIGN KEY constraint errors (#1334)
The pending-work-restart logic had no retry limit, causing infinite loops when sessions encountered FOREIGN KEY constraint failures. This led to 2000+ error log entries per minute and eventual worker crash via SIGTERM. Two fixes: 1. Add 'FOREIGN KEY constraint failed' to unrecoverable error patterns so it short-circuits immediately instead of falling through to restart 2. Add MAX_PENDING_RESTARTS (3) limit to pending-work-restart path as a safety net for any future unhandled persistent errors Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
38d9ac7adb |
fix: prevent zombie subprocess accumulation by only trusting exitCode (#1226) (#1325)
proc.killed only means Node sent a signal — the process can still be alive. This caused premature pool slot release, allowing unbounded process spawning. - ensureProcessExit: remove proc.killed from early-exit checks, only trust exitCode - Fix 3 call-site guards that skipped cleanup for signaled-but-alive processes - Add TOTAL_PROCESS_HARD_CAP=10 safety net in waitForSlot() - After SIGKILL, wait up to 1s via exit event instead of blind 200ms sleep - Reduce reaper interval from 5min to 1min, idle threshold from 2min to 1min Closes #1226 Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
ad3d236cec |
fix: resolve hook crashes and CLAUDE_PLUGIN_ROOT fallback (#1215, #1220) (#1229)
* fix: resolve PostToolUse hook crashes and 5s latency (#1220) Three compounding bugs caused hook failures: 1. Missing break statements in worker-service.ts switch — if async code threw before process.exit(), execution fell through to subsequent cases. Added break to all 7 cases missing them. 2. Unhandled promise rejection on main() — added .catch() that logs the error and exits 0 (per project exit code strategy: don't block Claude Code or leave Windows Terminal tabs open). 3. Redundant start commands in hooks.json — PostToolUse, UserPromptSubmit, and Stop groups each had a standalone start command that was redundant (the hook case already calls ensureWorkerStarted internally). The redundant start also caused 5s latency via bun-runner.js collectStdin() timeout since Claude Code never closes stdin. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add CLAUDE_PLUGIN_ROOT fallback for Stop hooks (#1215) Upstream Claude Code bug (anthropics/claude-code#24529) leaves CLAUDE_PLUGIN_ROOT unset for Stop hooks on macOS and ALL hooks on Linux. Two-layer defense: 1. Shell-level: hooks.json commands now use inline fallback _R="${CLAUDE_PLUGIN_ROOT}"; [ -z "$_R" ] && _R="$HOME/..."; falling back to the known marketplace install path. 2. Script-level: bun-runner.js self-resolves plugin root from its own filesystem location via import.meta.url, and fixes broken /scripts/... paths that result from empty expansion. Added test to verify all hook commands include the fallback path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
c6f932988a |
Fix 30+ root-cause bugs across 10 triage phases (#1214)
* MAESTRO: fix ChromaDB core issues — Python pinning, Windows paths, disable toggle, metadata sanitization, transport errors - Add --python version pinning to uvx args in both local and remote mode (fixes #1196, #1206, #1208) - Convert backslash paths to forward slashes for --data-dir on Windows (fixes #1199) - Add CLAUDE_MEM_CHROMA_ENABLED setting for SQLite-only fallback mode (fixes #707) - Sanitize metadata in addDocuments() to filter null/undefined/empty values (fixes #1183, #1188) - Wrap callTool() in try/catch for transport errors with auto-reconnect (fixes #1162) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix data integrity — content-hash deduplication, project name collision, empty project guard, stuck isProcessing - Add SHA-256 content-hash deduplication to observations INSERT (store.ts, transactions.ts, SessionStore.ts) - Add content_hash column via migration 22 with backfill and index - Fix project name collision: getCurrentProjectName() now returns parent/basename - Guard against empty project string with cwd-derived fallback - Fix stuck isProcessing: hasAnyPendingWork() resets processing messages older than 5 minutes - Add 12 new tests covering all four fixes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix hook lifecycle — stderr suppression, output isolation, conversation pollution prevention - Suppress process.stderr.write in hookCommand() to prevent Claude Code showing diagnostic output as error UI (#1181). Restores stderr in finally block for worker-continues case. - Convert console.error() to logger.warn()/error() in hook-command.ts and handlers/index.ts so all diagnostics route to log file instead of stderr. - Verified all 7 handlers return suppressOutput: true (prevents conversation pollution #598, #784). - Verified session-complete is a recognized event type (fixes #984). - Verified unknown event types return no-op handler with exit 0 (graceful degradation). - Added 10 new tests in tests/hook-lifecycle.test.ts covering event dispatch, adapter defaults, stderr suppression, and standard response constants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix worker lifecycle — restart loop coordination, stale transport retry, ENOENT shutdown race - Add PID file mtime guard to prevent concurrent restart storms (#1145): isPidFileRecent() + touchPidFile() coordinate across sessions - Add transparent retry in ChromaMcpManager.callTool() on transport error — reconnects and retries once instead of failing (#1131) - Wrap getInstalledPluginVersion() with ENOENT/EBUSY handling (#1042) - Verified ChromaMcpManager.stop() already called on all shutdown paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix Windows platform support — uvx.cmd spawn, PowerShell $_ elimination, windowsHide, FTS5 fallback - Route uvx spawn through cmd.exe /c on Windows since MCP SDK lacks shell:true (#1190, #1192, #1199) - Replace all PowerShell Where-Object {$_} pipelines with WQL -Filter server-side filtering (#1024, #1062) - Add windowsHide: true to all exec/spawn calls missing it to prevent console popups (#1048) - Add FTS5 runtime probe with graceful fallback when unavailable on Windows (#791) - Guard FTS5 table creation in migrations, SessionSearch, and SessionStore with try/catch Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix skills/ distribution — build-time verification and regression tests (#1187) Add post-build verification in build-hooks.js that fails if critical distribution files (skills, hooks, plugin manifest) are missing. Add 10 regression tests covering skill file presence, YAML frontmatter, hooks.json integrity, and package.json files field. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix MigrationRunner schema initialization (#979) — version conflict between parallel migration systems Root cause: old DatabaseManager migrations 1-7 shared schema_versions table with MigrationRunner's 4-22, causing version number collisions (5=drop tables vs add column, 6=FTS5 vs prompt tracking, 7=discovery_tokens vs remove UNIQUE). initializeSchema() was gated behind maxApplied===0, so core tables were never created when old versions were present. Fixes: - initializeSchema() always creates core tables via CREATE TABLE IF NOT EXISTS - Migrations 5-7 check actual DB state (columns/constraints) not just version tracking - Crash-safe temp table rebuilds (DROP IF EXISTS _new before CREATE) - Added missing migration 21 (ON UPDATE CASCADE) to MigrationRunner - Added ON UPDATE CASCADE to FK definitions in initializeSchema() - All changes applied to both runner.ts and SessionStore.ts Tests: 13 new tests in migration-runner.test.ts covering fresh DB, idempotency, version conflicts, crash recovery, FK constraints, and data integrity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix 21 test failures — stale mocks, outdated assertions, missing OpenClaw guards Server tests (12): Added missing workerPath and getAiStatus to ServerOptions mocks after interface expansion. ChromaSync tests (3): Updated to verify transport cleanup in ChromaMcpManager after architecture refactor. OpenClaw (2): Added memory_ tool skipping and response truncation to prevent recursive loops and oversized payloads. MarkdownFormatter (2): Updated assertions to match current output. SettingsDefaultsManager (1): Used correct default key for getBool test. Logger standards (1): Excluded CLI transcript command from background service check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix Codex CLI compatibility (#744) — session_id fallbacks, unknown platform tolerance, undefined guard Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix Cursor IDE integration (#838, #1049) — adapter field fallbacks, tolerant session-init validation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix /api/logs OOM (#1203) — tail-read replaces full-file readFileSync Replace readFileSync (loads entire file into memory) with readLastLines() that reads only from the end of the file in expanding chunks (64KB → 10MB cap). Prevents OOM on large log files while preserving the same API response shape. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix Settings CORS error (#1029) — explicit methods and allowedHeaders in CORS config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: add session custom_title for agent attribution (#1213) — migration 23, endpoint + store support Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: prevent CLAUDE.md/AGENTS.md writes inside .git/ directories (#1165) Add .git path guard to all 4 write sites to prevent ref corruption when paths resolve inside .git internals. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix plugin disabled state not respected (#781) — early exit check in all hook entry points Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix UserPromptSubmit context re-injection on every turn (#1079) — contextInjected session flag Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix stale AbortController queue stall (#1099) — lastGeneratorActivity tracking + 30s timeout Three-layer fix: 1. Added lastGeneratorActivity timestamp to ActiveSession, updated by processAgentResponse (all agents), getMessageIterator (queue yields), and startGeneratorWithProvider (generator launch) 2. Added stale generator detection in ensureGeneratorRunning — if no activity for >30s, aborts stale controller, resets state, restarts 3. Added AbortSignal.timeout(30000) in deleteSession to prevent indefinite hang when awaiting a stuck generator promise Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
7966c6cba9 |
fix: rename save_memory and fix MCP search instructions + startup hook (#1210)
* fix: rename save_memory to save_observation and fix MCP search instructions Stop the primary agent from proactively saving memories by renaming save_memory to save_observation with a neutral description. Remove "Saving Memories" section from SKILL.md. Update context formatters and output styles to reference the mem-search skill instead of raw MCP tool names. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: split SessionStart hooks so smart-install failure doesn't block worker start smart-install.js and worker-start were in the same hook group, so if smart-install exited non-zero the worker never started. Split into separate hook groups so they run independently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: worker startup waits for readiness before hooks fire Move initializationCompleteFlag to set after DB/search init (not MCP), add waitForReadiness() polling /api/readiness, and extract shared pollEndpointUntilOk helper to DRY up health/readiness checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e788fd3676 |
fix: prevent duplicate worker daemons and zombie processes (#1178)
* fix: prevent duplicate worker daemons and zombie processes Three root causes of chroma-mcp timeouts: 1. HTTP shutdown (POST /api/admin/shutdown) closed resources but never called process.exit(). Zombie workers stayed alive, background tasks reconnected to chroma-mcp, spawning duplicate subprocesses that all contended for the same persistent data directory. 2. No guard against concurrent daemon startup. When hooks fired simultaneously, multiple daemons started before either wrote a PID file. The loser got EADDRINUSE but stayed alive because signal handlers registered in the constructor prevented exit. 3. Corrupt 147GB HNSW index file caused all chroma queries to timeout (MCP error -32001). Data fix: deleted corrupt collection, backfill rebuilds from SQLite. Code fixes: - Add PID-based guard in daemon startup: exit if PID file process alive - Add port-based guard in daemon startup: exit if port already bound (runs before WorkerService constructor registers keepalive handlers) - Add process.exit(0) after HTTP shutdown/restart completes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: aggressive startup cleanup and one-time chroma wipe for upgrade Kill orphaned worker-service.cjs and chroma-mcp processes immediately at startup (no age gate) while keeping 30-min threshold for mcp-server. Wipe corrupt chroma data once on upgrade from pre-v10.3 versions — backfill rebuilds from SQLite automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: wrap shutdown handlers in try/finally to guarantee process.exit If onShutdown() or onRestart() threw, process.exit(0) was never reached, leaving the daemon alive as a zombie. Also removed redundant require('fs') calls in process-manager tests where ESM imports already existed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
40daf8f3fa |
feat: replace WASM embeddings with persistent chroma-mcp MCP connection (#1176)
* feat: replace WASM embeddings with persistent chroma-mcp MCP connection Replace ChromaServerManager (npx chroma run + chromadb npm + ONNX/WASM) with ChromaMcpManager, a singleton stdio MCP client that communicates with chroma-mcp via uvx. This eliminates native binary issues, segfaults, and WASM embedding failures that plagued cross-platform installs. Key changes: - Add ChromaMcpManager: singleton MCP client with lazy connect, auto-reconnect, connection lock, and Zscaler SSL cert support - Rewrite ChromaSync to use MCP tool calls instead of chromadb npm client - Handle chroma-mcp's non-JSON responses (plain text success/error messages) - Treat "collection already exists" as idempotent success - Wire ChromaMcpManager into GracefulShutdown for clean subprocess teardown - Delete ChromaServerManager (no longer needed) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review — connection guard leak, timer leak, async reset - Clear connecting guard in finally block to prevent permanent reconnection block - Clear timeout after successful connection to prevent timer leak - Make reset() async to await stop() before nullifying instance - Delete obsolete chroma-server-manager test (imports deleted class) - Update graceful-shutdown test to use chromaMcpManager property name Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prevent chroma-mcp spawn storm — zombie cleanup, stale onclose guard, reconnect backoff Three bugs caused chroma-mcp processes to accumulate (92+ observed): 1. Zombie on timeout: failed connections left subprocess alive because only the timer was cleared, not the transport. Now catch block explicitly closes transport+client before rethrowing. 2. Stale onclose race: old transport's onclose handler captured `this` and overwrote the current connection reference after reconnect, orphaning the new subprocess. Now guarded with reference check. 3. No backoff: every failure triggered immediate reconnect. With backfill doing hundreds of MCP calls, this created rapid-fire spawning. Added 10s backoff on both connection failure and unexpected process death. Also includes ChromaSync fixes from PR review: - queryChroma deduplication now preserves index-aligned arrays - SQL injection guard on backfill ID exclusion lists Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
5d79bb7a7a |
fix: prevent zombie process accumulation by verifying subprocess exit (#1168) (#1175)
Two changes fix the observer process resource leak: 1. Add ensureProcessExit to generator finally blocks in SessionRoutes and worker-service, matching the pattern already working in SDKAgent. 2. Add stale session reaper (every 2m) that removes sessions with no active generator and no pending work after 15m idle. This unblocks the orphan reaper which previously skipped processes for "active" sessions. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
b88251bc8b |
fix: self-healing claimNextMessage prevents stuck processing messages (#1159)
* fix: self-healing claimNextMessage prevents stuck processing messages claimAndDelete → claimNextMessage with atomic self-healing: resets stale processing messages (>60s) back to pending before claiming. Eliminates stuck messages from generator crashes without external timers. Removes redundant idle-timeout reset in worker-service.ts. Adds QUEUE to logger Component type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update stale comments in SessionQueueProcessor to reflect claim-confirm pattern Comments still referenced the old claim-and-delete pattern after the claimNextMessage rename. Updated to accurately describe the current lifecycle where messages are marked as processing and stay in DB until confirmProcessed() is called. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: move Date.now() inside transaction and extract stale threshold constant - Move Date.now() inside claimNextMessage transaction closure so timestamp is fresh if WAL contention causes retry - Extract STALE_PROCESSING_THRESHOLD_MS to module-level constant - Add comment clarifying strict < boundary semantics Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
ca8421611c |
fix: backfill Chroma vector DB for all projects on startup (#1154)
* fix: backfill all Chroma projects on worker startup ChromaSync.ensureBackfilled() existed but was never called. After v10.2.2's bun cache clear destroyed the ONNX model cache, Chroma only had ~2 days of embeddings while SQLite had 49k+ observations. - Add static backfillAllProjects() to ChromaSync — iterates all projects in SQLite, creates temporary ChromaSync per project, runs smart diff - Call backfillAllProjects() fire-and-forget on worker startup - Add 'CHROMA_SYNC' to logger Component type (pre-existing gap) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: sanitize project names for Chroma collection naming Replace characters outside [a-zA-Z0-9._-] with underscores so projects like "YC Stuff" map to collection "cm__YC_Stuff" instead of failing Chroma's collection name validation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: route backfill to shared cm__claude-mem collection, harden sanitization - Use single ChromaSync('claude-mem') in backfillAllProjects() instead of per-project instances, matching how DatabaseManager and SearchManager operate — fixes critical bug where backfilled data landed in orphaned collections that no search path reads from - Strip trailing non-alphanumeric chars from sanitized collection names to satisfy Chroma's end-character constraint - Guard backfill behind Chroma server readiness to avoid N spurious error logs when Chroma failed to start - Use CHROMA_SYNC log component consistently for backfill messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: pass project as parameter to ensureBackfilled instead of mutating instance state Eliminates shared mutable state in backfillAllProjects() loop. Project scoping is now passed explicitly via parameter to both ensureBackfilled() and getExistingChromaIds(), keeping a single Chroma connection while avoiding fragile instance property mutation across iterations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f24251118e |
fix: bun install, node-addon-api for sharp, consolidate PendingMessageStore (#1140)
* fix: use bun install in sync, add node-addon-api for sharp, consolidate PendingMessageStore - Switch sync-marketplace from npm to bun install - Add node-addon-api as dev dep so sharp builds under bun - Consolidate duplicate PendingMessageStore instantiation in worker-service finally block Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * build assets --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
d2e926fbf7 |
fix: post-merge breakage (Gemini, idle timeout, sharp cache) (#1138)
* fix: add gemini-3-flash to validModels array The model was defined in the type union and RPM limits but missing from the runtime validModels array, causing silent fallback to gemini-2.5-flash. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: skip processing when Gemini returns empty observation response Empty responses were silently consuming messages from the queue via processAgentResponse. Now skips processing on empty content, leaving the message in processing status for stale recovery. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prevent idle timeout from triggering infinite restart loop When a session hits the 3-minute idle timeout, the finally block was seeing stale processing messages and restarting the generator endlessly. Now tracks idle timeout as a distinct exit reason via session flag, resets stale messages, and skips restart. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: clear stale Bun native module cache on update Bun's global cache retains sharp/libvips native binaries with broken dylib references after version upgrades. Clear ~/.bun/install/cache/@img/ before install in both the end-user (smart-install) and dev (sync-marketplace) paths to prevent ERR_DLOPEN_FAILED errors in Chroma sync. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review feedback (empty summary response, session-scoped reset, shell injection) - Apply same empty-response guard to summary path as observation path in GeminiAgent - Add optional sessionDbId param to resetStaleProcessingMessages for session-scoped resets - Use JSON.stringify for gitignore pattern escaping, filter negation patterns Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
c27314f896 | fix: address PR review comments for chroma server lifecycle | ||
|
|
ed313db742 |
Merge main into feat/chroma-http-server
Resolve conflicts between Chroma HTTP server PR and main branch changes (folder CLAUDE.md, exclusion settings, Zscaler SSL, transport cleanup). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
5de728612e |
chore: bump version to 10.0.6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f05f9ca735 |
Merge remote-tracking branch 'origin/main' into openclaw-installer
# Conflicts: # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs |
||
|
|
05e904e613 |
feat: enhance /api/health with version, uptime, workerPath, and AI status
Replace hardcoded TEST-008 build ID with real package version. Add worker filesystem path, uptime counter, and AI provider status (including last interaction success/failure tracking) to the health endpoint response. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
98d87d7573 |
chore: bump version to 10.0.4
Reverts v10.0.3 chroma-mcp spawn storm fix (broken release). Restores codebase to v10.0.2 state. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3f01baebfe |
Merge remote-tracking branch 'origin/main' into fix/chroma-mcp-spawn-storm
# Conflicts: # src/services/worker-service.ts # tests/infrastructure/process-manager.test.ts |
||
|
|
a3f9e7f638 |
fix: prevent chroma-mcp spawn storm with 5-layer defense (641 processes → max 2)
During SIGHUP testing with 6+ active sessions, ChromaSync.ensureConnection()
had no mutex — concurrent fire-and-forget syncObservation() calls each spawned
a chroma-mcp subprocess via StdioClientTransport, creating 641 orphans in ~5min.
Error-driven reconnection formed a positive feedback loop amplifying the storm.
Defense layers:
- Layer 0: Connection mutex via promise memoization (prevents concurrent spawns)
- Layer 1: Pre-spawn process count guard using execFileSync('ps') (kills excess)
- Layer 2: Hardened close() with try-finally + Unix pkill in GracefulShutdown
- Layer 3: Count-based orphan reaper in ProcessManager (not age-based)
- Layer 4: Circuit breaker stops retries after 3 consecutive failures for 60s
Closes #1063, closes #695
Relates to #1010, #707
|
||
|
|
4e67393d27 |
fix: prevent daemon silent death from SIGHUP + unhandled errors
Root cause: registerSignalHandlers() handled SIGTERM/SIGINT but not SIGHUP. When the parent hook process exits, the kernel sends SIGHUP to the daemon, causing immediate termination (default signal action). Belt-and-suspenders fix: 1. SIGHUP handler: ignore in daemon mode, graceful shutdown otherwise 2. setsid: spawn daemon in new session on Linux (prevents SIGHUP delivery) 3. Global unhandledRejection/uncaughtException guards in daemon mode |
||
|
|
418e38ee46 |
fix: hook resilience and worker lifecycle improvements (#957, #923, #984, #987, #1042)
Reduce timeouts to eliminate 10-30s startup delay when worker is dead (common on WSL2 after hibernate). Add stale PID detection, graceful error handling across all handlers, and error classification that distinguishes worker unavailability from handler bugs. - HEALTH_CHECK 30s→3s, new POST_SPAWN_WAIT (5s), PORT_IN_USE_WAIT (3s) - isProcessAlive() with EPERM handling, cleanStalePidFile() - getPluginVersion() try-catch for shutdown race (#1042) - isWorkerUnavailableError: transport+5xx+429→exit 0, 4xx→exit 2 - No-op handler for unknown event types (#984) - Wrap all handler fetch calls in try-catch for graceful degradation - CLAUDE_MEM_HEALTH_TIMEOUT_MS env var override with validation |
||
|
|
5969d670d0 |
chore: bump version to 9.1.1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
8dfcb5e612 |
chore: bump version to 9.1.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
ff503d08a7 |
MAESTRO: Merge PR #657 - Add generate/clean CLI commands for CLAUDE.md management
Cherry-picked source changes from PR #657 (224 commits behind main). Adds `claude-mem generate` and `claude-mem clean` CLI commands: - New src/cli/claude-md-commands.ts with generateClaudeMd() and cleanClaudeMd() - Worker service generate/clean case handlers with --dry-run support - CLAUDE_MD logger component type - Uses shared isDirectChild from path-utils.ts (DRY improvement over PR original) Skipped from PR: 91 CLAUDE.md file deletions (stale), build artifacts, .claude/plans/ dev artifact, smart-install.js shell alias auto-injection (aggressive profile modification without consent). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
98920bd860 |
MAESTRO: Merge PR #662 - Add save_memory MCP tool for manual memory storage
Adds save_memory MCP tool allowing users to manually save observations for semantic search. Source changes cherry-picked from PR #662 by @darconada (build artifact conflicts resolved by direct application). Closes #645. Co-Authored-By: darconadalabarga <darconada@arsys.es> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
5dffb1ebb0 |
MAESTRO: fix(hooks): add session-complete handler to enable orphan reaper cleanup
Cherry-picked from PR #844 by @thusdigital. Sessions stayed in active sessions map forever after summarize, causing the orphan reaper to think all processes were still active. Adds session-complete as Stop phase 2 hook that calls POST /api/sessions/complete to remove sessions from the active map, allowing the reaper to correctly identify and clean up orphaned worker processes. Fixes #842. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
da1d2cd36a |
MAESTRO: fix(db): prevent FK constraint failures on worker restart
Cherry-picked source changes from PR #889 by @Et9797. Fixes #846. Key changes: - Add ensureMemorySessionIdRegistered() guard in SessionStore.ts - Add ON UPDATE CASCADE migration (schema v21) for observations and session_summaries FK constraints - Change message queue from claim-and-delete to claim-confirm pattern (PendingMessageStore.ts) - Add spawn deduplication and unrecoverable error detection in SessionRoutes.ts and worker-service.ts - Add forceInit flag to SDKAgent for stale session recovery Build artifacts skipped (pre-existing dompurify dep issue). Path fixes (HealthMonitor.ts, worker-utils.ts) already merged via PR #634. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
2d40afe7ef |
fix: provider-aware recovery and stale session cleanup (PR #741)
Applies PR #741 by @licutis onto main, resolving conflicts with recently merged PRs #693, #937, and #627. Adds getActiveAgent() to WorkerService so startup-recovery uses the correct provider instead of hardcoding SDKAgent. Also cleans up sessions stuck 'active' for 6+ hours and their pending messages before processing orphaned queues. Co-Authored-By: licutis <43884712+licutis@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f24bba21e9 | fix(worker): gracefully process orphaned pending messages after session termination | ||
|
|
27aa98c269 |
fix: wait for database initialization before processing session-init requests
Fixes a race condition where the UserPromptSubmit hook could call /api/sessions/init before the database is fully initialized, resulting in a 500 error with "Database not initialized". Root cause: - Worker starts HTTP server immediately for fast startup - Health endpoint (/api/health) returns OK when server is listening - session-init hook waits for health check, then calls /api/sessions/init - Database initialization happens in background, creating a race window Solution: - Add early handler for /api/sessions/init that waits for initialization - Uses same pattern as existing /api/context/inject handler - Returns 503 with retry message if initialization times out The fix ensures hooks receive proper responses even during the brief startup window when the server is listening but DB isn't ready yet. Co-Authored-By: Claude Code <noreply@anthropic.com> |
||
|
|
0ecb387f58 |
MAESTRO: Implement Windows spawn guard to prevent repeated worker popups (closes #921)
Adds file-based cooldown lock in ensureWorkerStarted() so failed worker spawn attempts on Windows don't produce repeated bun.exe terminal popups. Based on the approach from PR #931, integrated into the refactored code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
b8b88d998c |
fix: fail open on /api/context/inject during initialization
The /api/context/inject endpoint previously blocked for up to 5 minutes (300s timeout) waiting for initializationComplete, which includes MCP connection setup. On Windows, the MCP connection can hang indefinitely, causing the context hook to never return and blocking Claude Code startup. This change makes the endpoint fail open: if the worker hasn't finished initializing, return empty context immediately instead of blocking. The hook completes fast, and context becomes available on subsequent prompts once initialization finishes in the background. Closes #958 |
||
|
|
4df9f61347 |
refactor: implement in-process worker architecture for hooks (#722)
* fix: stop generating empty CLAUDE.md files - Return empty string instead of "No recent activity" when no observations exist - Skip writing CLAUDE.md files when formatted content is empty - Remove redundant "auto-generated by claude-mem" HTML comment - Clean up 98 existing empty CLAUDE.md files across the codebase - Update tests to expect empty string for empty input Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * build assets * refactor: implement in-process worker architecture for hooks Replaces spawn-based worker startup with in-process architecture: - Hook processes now become the worker when port 37777 is free - Eliminates Windows spawn issues (NO SPAWN rule) - SessionStart chains: smart-install && stop && context Key changes: - worker-service.ts: hook case starts WorkerService in-process - hook-command.ts: skipExit option prevents process.exit() when hosting worker - hooks.json: single chained command replaces separate start/hook commands - worker-utils.ts: ensureWorkerRunning() returns boolean, doesn't block - handlers: graceful fallback when worker unavailable All 761 tests pass. Manual verification confirms hook stays alive as worker. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * context * a * MAESTRO: Mark PR #722 test verification task complete All 797 tests passed (3 skipped, 0 failed) after merge conflict resolution. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * MAESTRO: Mark PR #722 build verification task complete * MAESTRO: Mark PR #722 code review task complete Code review verified: - worker-service.ts hook case starts WorkerService in-process - hook-command.ts has skipExit option - hooks.json uses single chained command - worker-utils.ts ensureWorkerRunning() returns boolean Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * MAESTRO: Mark PR #722 conflict resolution push task complete Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> |