b7c23ca23222d204689e0be2342251c00a4fff00
114 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
abd55977ca |
fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node (#1645)
* fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node
The MCP server bundle (mcp-server.cjs) ships with `#!/usr/bin/env node` so
it must run under Node, but commit
|
||
|
|
6250a194dd |
Merge branch 'pr-1472' into integration/validation-batch
# Conflicts: # plugin/scripts/context-generator.cjs # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs # plugin/ui/viewer-bundle.js # src/cli/handlers/context.ts # src/services/sqlite/SessionStore.ts # src/services/sqlite/migrations/runner.ts # src/services/worker-service.ts # src/shared/SettingsDefaultsManager.ts |
||
|
|
58fcd85724 |
Merge branch 'pr-1578' into integration/validation-batch
# Conflicts: # plugin/scripts/context-generator.cjs # plugin/scripts/worker-service.cjs # src/utils/tag-stripping.ts |
||
|
|
c1a3fc27ec |
Merge branch 'pr-1557' into integration/validation-batch
# Conflicts: # plugin/hooks/hooks.json # tests/infrastructure/plugin-distribution.test.ts |
||
|
|
d570909bf1 |
Merge branch 'pr-1491' into integration/validation-batch
# Conflicts: # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs # src/shared/hook-constants.ts |
||
|
|
5dd2a6f758 |
Merge branch 'pr-1553' into integration/validation-batch
# Conflicts: # src/services/worker/session/SessionCompletionHandler.ts |
||
|
|
c3cb8f81ed |
Merge branch 'pr-1368' into integration/validation-batch
# Conflicts: # plugin/scripts/context-generator.cjs # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs # plugin/ui/viewer-bundle.js |
||
|
|
5cffff7d40 | Merge branch 'pr-1620' into integration/validation-batch | ||
|
|
d63d73acc2 | Merge branch 'pr-1524' into integration/validation-batch | ||
|
|
842d614adb | Merge branch 'pr-1550' into integration/validation-batch | ||
|
|
4d2bb1f13e | Merge branch 'pr-1441' into integration/validation-batch | ||
|
|
a9de029c02 | Merge branch 'pr-1549' into integration/validation-batch | ||
|
|
85f57e6440 | Merge branch 'pr-1554' into integration/validation-batch | ||
|
|
36de44d661 | Merge branch 'pr-1556' into integration/validation-batch | ||
|
|
f32fda8b35 | Merge branch 'pr-1494' into integration/validation-batch | ||
|
|
f81684c61c |
fix: strip persisted-output tags from memory
Fixes #1551 Co-Authored-By: Claude <noreply@anthropic.com> |
||
|
|
9063c5d8a7 |
fix: block memory agent prose-skip responses at prompt and runtime levels
Observer prompt now explicitly requires XML observation blocks or empty responses — prose explanations like "Skipping" are discarded. ResponseProcessor logs a warning when non-XML content is received. Recording focus expanded to include concrete debugging findings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
64cce2bf10 |
fix: resolve 3 upstream bugs (summarize, ChromaSync, HealthMonitor) (#1566)
* fix: resolve 3 upstream bugs in summarize, ChromaSync, and HealthMonitor 1. summarize.ts: Skip summary when transcript has no assistant message. Prevents error loop where empty transcripts cause repeated failed summarize attempts (~30 errors/day observed in production). 2. ChromaSync.ts: Fallback to chroma_update_documents when add fails with "IDs already exist". Handles partial writes after MCP timeout without waiting for next backfill cycle. 3. HealthMonitor.ts: Replace HTTP-based isPortInUse with atomic socket bind on Unix. Eliminates TOCTOU race when two sessions start simultaneously (HTTP check is non-atomic — both see "port free" before either completes listen()). Updated tests accordingly. All three bugs are pre-existing in v10.5.5. Confirmed via log analysis of 543K lines over 17 days of production usage across two servers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: add CONTRIB_NOTES.md to gitignore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address CodeRabbit review on PR #1566 - HealthMonitor: add APPROVED OVERRIDE annotation for Win32 HTTP fallback - ChromaSync: replace chroma_update_documents with delete+add for proper upsert (update only modifies existing IDs, silently ignores missing ones) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Alessandro Costa <alessandro@claudio.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
190c74492f |
fix: address second PR review — clean replace, IDE failure bubbling, bun validation
- cpSync now does rmSync before copy to avoid stale file merges - setupIDEs() returns failed IDE list; install reports partial success - runSmartInstall() returns boolean status instead of void - Worker port in next-steps URL reads CLAUDE_MEM_WORKER_PORT env var - Goose YAML regex stops at column-0 keys (prevents eating sibling sections) - AGENTS.md uninstall removes header-only stub files - findBunPath() validated before use in WindsurfHooksInstaller - Cursor marked unsupported in ide-detection until installer is wired Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
ae6915b88e |
fix: address PR review — shebang, double-escaping, data loss, uninstall scope
- Add shebang banner to NPX CLI esbuild config so npx claude-mem works - Remove manual backslash pre-escaping in WindsurfHooksInstaller (JSON.stringify handles it) - Scope cache deletion to claude-mem only, not entire vendor namespace - Use getWorkerPort() in OpenCodeInstaller instead of hard-coded 37777 - Throw on corrupt JSON in readJsonSafe/readGeminiSettings/Windsurf to prevent data loss - Fix Cursor install stub to warn instead of silently succeeding - Fix Gemini uninstall to remove individual hooks within groups, not whole groups - Update tests for new corrupt-file-throws behavior Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
cdffdba97a |
test: add unit tests for MCP factory, context injection, JSON utils, and non-TTY install
59 tests across 4 files covering: - context-injection: tag injection, replacement, headerLine support, idempotency - json-utils: missing/valid/corrupt JSON handling with generic types - mcp-integrations: factory function, process.execPath, idempotency, merge behavior - install-non-tty: isInteractive detection, runTasks fallback, log wrapper Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
a66b98bcdd |
fix: strip <system-reminder> tags from persisted memory and DRY up regex
System reminders (CLAUDE.md contents, deferred tool lists) were being stored in memory observations. Add system-reminder to the tag stripping pipeline alongside <private> and <system_instruction>, and extract the duplicated regex into a shared SYSTEM_REMINDER_REGEX constant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
d8eb2fa9f9 |
fix: resolve hook failures when CLAUDE_PLUGIN_ROOT is not injected (#1533)
The fallback path for CLAUDE_PLUGIN_ROOT was pointing to the old marketplaces install location which no longer exists. Hooks now first try to find the latest versioned cache directory (~/.claude/plugins/cache/thedotmack/claude-mem/<version>/) using ls -dt, with the marketplaces path kept as a final fallback. This mirrors the self-resolution pattern already used in bun-runner.js (resolve(__bun_runner_dirname, '..')) but at the shell level, so node can find bun-runner.js in the first place. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> |
||
|
|
93a30c5c8f |
fix: skip parseSummary false positives with no sub-tags (#1360)
When an observation response accidentally contains a <summary> tag with plain text (no <request>/<investigated>/etc. sub-tags), parseSummary was creating empty SESSION SUMMARY records with all fields as empty strings. Add an all-null guard AFTER field extraction: if none of the 5 sub-tags matched, the <summary> match is a false positive and we return null. This is distinct from the commented-out validation above (which rejected summaries with SOME missing fields). We only reject when ALL are absent — real partial summaries are still saved per the maintainer's explicit note. Closes #1360 Co-Authored-By: Claude <noreply@anthropic.com> |
||
|
|
2a304d59eb |
fix: handle bare path strings in files_modified/files_read columns (#1359)
JSON.parse('/path/to/file') throws SyntaxError, crashing the viewer and
any code reading observations with legacy bare-path data in those columns.
- Add parseFileList() helper in observations/files.ts — tries JSON.parse,
falls back to wrapping bare strings in an array
- Replace unsafe JSON.parse calls in files.ts, SessionStore.ts, ChromaSync.ts
- Add 9 unit tests covering null, empty, valid JSON, bare paths, invalid JSON
Closes #1359
Co-Authored-By: Claude <noreply@anthropic.com>
|
||
|
|
12501412b9 |
fix: persist session completion to database in completeByDbId (#1532)
completeByDbId only cleaned up in-memory state, leaving sdk_sessions rows with status='active' and completed_at=NULL indefinitely. Ghost sessions accumulated and exhausted the agent pool, causing 60s timeout errors. - Add SessionStore.markSessionCompleted() to set status/completed_at/completed_at_epoch - Call it at the start of completeByDbId before in-memory cleanup - Inject SessionStore into SessionCompletionHandler via constructor - Add 4 tests covering status, timestamps, isolation, and non-existent IDs Closes #1532 Co-Authored-By: Claude <noreply@anthropic.com> |
||
|
|
b81281fd6c |
fix: update default model from claude-sonnet-4-5 to claude-sonnet-4-6 (#1390)
CLAUDE_MEM_MODEL defaulted to the deprecated claude-sonnet-4-5 across source, installer, tests, and documentation. Updated all references to claude-sonnet-4-6. Closes #1390 Co-Authored-By: Claude <noreply@anthropic.com> |
||
|
|
2a6c9ea2b7 |
Add CLAUDE.local.md support via CLAUDE_MEM_FOLDER_USE_LOCAL_MD setting
When CLAUDE_MEM_FOLDER_USE_LOCAL_MD is set to 'true' in settings, claude-mem writes auto-generated context to CLAUDE.local.md instead of CLAUDE.md. This separates personal machine-generated context from shared project instructions, aligning with Claude Code's native CLAUDE.local.md convention where: - CLAUDE.md = team-shared project instructions (checked into git) - CLAUDE.local.md = personal/local context (gitignored) Changes: - Add CLAUDE_MEM_FOLDER_USE_LOCAL_MD setting (default: false) - Add getTargetFilename() helper to resolve target based on settings - Update writeClaudeMdToFolder() to accept optional target filename - Update active-file detection to skip folders with either CLAUDE.md or CLAUDE.local.md being actively read/modified (issue #859 compat) - Add 8 new tests covering filename selection, write behavior, content preservation, atomic writes, and active-file detection Closes #632 |
||
|
|
83f61177c7 |
fix: address CodeRabbit review feedback on PR #1491
- Update POST_SPAWN_WAIT test assertion from 5000 to 15000 to match the constant change in hook-constants.ts - Remove redundant readPidFile() from aggressiveStartupCleanup() — start() writes the new PID before this runs, so it always returns process.pid (already protected) - Add waitForReadiness() to the reused-worker path in ensureWorkerStarted() to prevent concurrent hooks from racing past a cold-starting worker's initialization guard Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f86be1ef2b |
fix(project-name): expand ~ to home directory before project resolution
Fixes #1478 When a terminal reports cwd as '~' or '~/subpath' instead of the full path, getProjectName() fell through to the 'unknown-project' fallback because path.basename('~') returns '~' as-is. Added expandTilde() helper that resolves leading ~ to os.homedir(), called in both getProjectName() and getProjectContext() before path operations and worktree detection. |
||
|
|
07ab7000a8 |
fix: patch 7 critical bugs affecting all non-dev-machine users and Windows
1. Fix esbuild inlining build-machine __dirname as string literal — use
CJS-compatible runtime banner with require("node:url").fileURLToPath
across worker-service, mcp-server, and context-generator builds.
2. Fix isMainModule check missing .cjs extension and Windows backslash
path normalization.
3. Wrap extractLastMessage in try-catch to prevent infinite Stop hook
feedback loop on malformed transcripts (exit 0 instead of exit 2).
4. Replace heavy SessionEnd hook (Node→Bun→1.7MB CJS→HTTP) with
lightweight inline node -e one-liner (~200ms vs >1s).
5. Add 7 Gemini/OpenRouter error patterns to unrecoverablePatterns
circuit breaker to prevent 77K+ retry loops on expired API keys.
6. Preserve CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_CODE_GIT_BASH_PATH in
sanitizeEnv instead of stripping them with the CLAUDE_CODE_ prefix.
7. Use PowerShell -EncodedCommand for spawnDaemon to fix path quoting
when Windows usernames contain spaces.
Closes #1515, #1495, #1475, #1465, #1500, #1513, #1512, #1450, #1460,
#1486, #1449, #1481, #1451, #1480, #1453, #1445
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
9cfa57d498 |
fix: use null-byte delimiter in observation content hash to prevent collisions
Fields concatenated without separators allowed different tuples to produce identical hashes (e.g. session="ab", title="cd" vs session="abc", title="d"). This could cause legitimate observations to be silently deduplicated. Join with \x00 so field boundaries are unambiguous. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
4f6fb9e614 |
fix: address platform source review feedback
Tighten platform source persistence so legacy callers cannot silently relabel existing sessions, repair migration 24 when schema_versions drifts from the real schema, and polish the follow-up UI/error-handler review nits. - only backfill platform_source when it is blank and raise on explicit source conflicts for an existing session - make migration 24 verify both the sdk_sessions column and its index before treating it as applied - expose platform_source from the functional session getters and add regression tests for source preservation and schema drift recovery - add the required APPROVED OVERRIDE annotation for centralized HTTP error translation - keep mobile source pills on a single horizontal row |
||
|
|
df1fb8bb89 |
fix(gemini): add conversation history truncation to prevent O(N²) token cost growth
GeminiAgent sends the full conversation history with every API call, causing quadratic token growth per session. A 100-observation session sends ~30M cumulative input tokens. This ports the proven truncateHistory() sliding window from OpenRouterAgent to GeminiAgent. - Add CLAUDE_MEM_GEMINI_MAX_CONTEXT_MESSAGES (default: 20) and CLAUDE_MEM_GEMINI_MAX_TOKENS (default: 100000) settings - Add truncateHistory() to GeminiAgent using shared estimateTokens() - Always preserve at least the newest message to avoid empty API requests - Add settings validation in SettingsRoutes (1-100 messages, 1K-1M tokens) - Add regression tests for truncation and oversized single-prompt edge case Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
4d7bec4d05 |
fix: stop spinner from spinning forever (#1440)
* fix: stop spinner from spinning forever due to orphaned DB messages The activity spinner never stopped because isAnySessionProcessing() queried ALL pending/processing messages in the database, including orphaned messages from dead sessions that no generator would ever process. Root cause: isAnySessionProcessing() used hasAnyPendingWork() which is a global DB scan. Changed it to use getTotalQueueDepth() which only checks sessions in the active in-memory Map. Additional fixes: - Add terminateSession() to enforce restart-or-terminate invariant - Fix 3 zombie paths in .finally() handler that left sessions alive - Clean up idle sessions from memory on successful completion - Remove redundant bare isProcessing:true broadcast - Replace inline require() with proper accessor - Add 8 regression tests for session termination invariant Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review findings — idle-timeout race, double broadcast, query amplification - Move pendingCount check before idle-timeout termination to prevent abandoning fresh messages that arrive between idle abort and .finally() - Move broadcastProcessingStatus() inside restart branch only — the else branch already broadcasts via removeSessionImmediate callback - Compute queueDepth once in broadcastProcessingStatus() and derive isProcessing from it, eliminating redundant double iteration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
5b041d6b49 |
refactor: rename formatters to AgentFormatter/HumanFormatter for semantic clarity
ColorFormatter and MarkdownFormatter names obscured their actual purpose. The formatters serve two distinct audiences: the AI agent (compressed, token-efficient context) and the human (rich ANSI-colored terminal output). - MarkdownFormatter → AgentFormatter (renderMarkdown* → renderAgent*) - ColorFormatter → HumanFormatter (renderColor* → renderHuman*) - useColors parameter → forHuman across the pipeline - Import aliases Color/Markdown → Human/Agent - API query param `colors=true` unchanged (backward compatible) Pure rename refactor — no logic or behavior changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
9f529a30f5 |
feat: strip <system_instruction> tags before DB storage (#1398)
* feat: strip <system_instruction> tags before database storage Extends the existing tag-stripping mechanism (used for <private> and <claude-mem-context>) to also filter Conductor-injected system instructions, preventing them from being persisted in the observation database. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: also strip <system-instruction> (hyphen variant) before DB storage Conductor uses both <system_instruction> and <system-instruction> tag formats. This adds the hyphen variant to the same stripping mechanism. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
80a8c90a1a |
feat: add embedded Process Supervisor for unified process lifecycle (#1370)
* feat: add embedded Process Supervisor for unified process lifecycle management Consolidates scattered process management (ProcessManager, GracefulShutdown, HealthMonitor, ProcessRegistry) into a unified src/supervisor/ module. New: ProcessRegistry with JSON persistence, env sanitizer (strips CLAUDECODE_* vars), graceful shutdown cascade (SIGTERM → 5s wait → SIGKILL with tree-kill on Windows), PID file liveness validation, and singleton Supervisor API. Fixes #1352 (worker inherits CLAUDECODE env causing nested sessions) Fixes #1356 (zombie TCP socket after Windows reboot) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add session-scoped process reaping to supervisor Adds reapSession(sessionId) to ProcessRegistry for killing session-tagged processes on session end. SessionManager.deleteSession() now triggers reaping. Tightens orphan reaper interval from 60s to 30s. Fixes #1351 (MCP server processes leak on session end) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Unix domain socket support for worker communication Introduces socket-manager.ts for UDS-based worker communication, eliminating port 37777 collisions between concurrent sessions. Worker listens on ~/.claude-mem/sockets/worker.sock by default with TCP fallback. All hook handlers, MCP server, health checks, and admin commands updated to use socket-aware workerHttpRequest(). Backwards compatible — settings can force TCP mode via CLAUDE_MEM_WORKER_TRANSPORT=tcp. Fixes #1346 (port 37777 collision across concurrent sessions) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove in-process worker fallback from hook command Removes the fallback path where hook scripts started WorkerService in-process, making the worker a grandchild of Claude Code (killed by sandbox). Hooks now always delegate to ensureWorkerStarted() which spawns a fully detached daemon. Fixes #1249 (grandchild process killed by sandbox) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add health checker and /api/admin/doctor endpoint Adds 30-second periodic health sweep that prunes dead processes from the supervisor registry and cleans stale socket files. Adds /api/admin/doctor endpoint exposing supervisor state, process liveness, and environment health. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add comprehensive supervisor test suite 64 tests covering all supervisor modules: process registry (18 tests), env sanitizer (8), shutdown cascade (10), socket manager (15), health checker (5), and supervisor API (6). Includes persistence, isolation, edge cases, and cross-module integration scenarios. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: revert Unix domain socket transport, restore TCP on port 37777 The socket-manager introduced UDS as default transport, but this broke the HTTP server's TCP accessibility (viewer UI, curl, external monitoring). Since there's only ever one worker process handling all sessions, the port collision rationale for UDS doesn't apply. Reverts to TCP-only, removing ~900 lines of unnecessary complexity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: remove dead code found in pre-landing review Remove unused `acceptingSpawns` field from Supervisor class (written but never read — assertCanSpawn uses stopPromise instead) and unused `buildWorkerUrl` import from context handler. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * updated gitignore * fix: address PR review feedback - downgrade HTTP logging, clean up gitignore, harden supervisor - Downgrade request/response HTTP logging from info to debug to reduce noise - Remove unused getWorkerPort imports, use buildWorkerUrl helper - Export ENV_PREFIXES/ENV_EXACT_MATCHES from env-sanitizer, reuse in Server.ts - Fix isPidAlive(0) returning true (should be false) - Add shutdownInitiated flag to prevent signal handler race condition - Make validateWorkerPidFile testable with pidFilePath option - Remove unused dataDir from ShutdownCascadeOptions - Upgrade reapSession log from debug to warn - Rename zombiePidFiles to deadProcessPids (returns actual PIDs) - Clean up gitignore: remove duplicate datasets/, stale ~*/ and http*/ patterns - Fix tests to use temp directories instead of relying on real PID file Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
237a4c37f8 |
fix: always pass --ssl flag to chroma-mcp in remote mode (#1286)
* fix: always pass --ssl flag to chroma-mcp in remote mode The chroma-mcp CLI defaults to SSL when using --client-type http. When CLAUDE_MEM_CHROMA_SSL is false (the common case for local ChromaDB servers), buildCommandArgs() omitted --ssl entirely, causing chroma-mcp to attempt an SSL connection to a plain HTTP server and fail with "Could not connect to a Chroma server". Always pass --ssl with an explicit true/false value so the user's CLAUDE_MEM_CHROMA_SSL setting is faithfully forwarded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: add regression tests for ChromaMcpManager SSL flag fix Adds 4 focused test cases verifying buildCommandArgs() produces correct --ssl args, covering SSL=false, SSL=true, unset (defaults to false), and local mode (no --ssl flag). Requested by @xkonjin in PR #1286 review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: rebuild checked-in bundles to include SSL flag fix Rebuild all bundles against upstream/main so the --ssl <true|false> fix is present in the runtime artifacts that hooks and the marketplace plugin actually execute. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
ad902bedd9 |
fix: auto-repair malformed database schema from cross-version sync (#1308)
When a claude-mem DB is synced between machines running different versions, orphaned indexes can reference non-existent columns (e.g. idx_observations_content_hash referencing content_hash). This causes SQLite to throw "malformed database schema" on ALL queries, including PRAGMAs, creating a silent 503 failure loop. The fix detects this on startup, uses Python's sqlite3 module to drop the orphaned schema objects (bun:sqlite doesn't support writable_schema modifications), resets migration versions, and lets the idempotent migration system recreate everything properly. Fixes #1307 Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
10e980cd69 |
fix: remove unrecognized fields from Claude Code Stop hook output (#1291)
* fix: remove unrecognized fields from Claude Code Stop hook output
Claude Code validates Stop hook JSON output against its hook contract
schema which only accepts {decision?, reason?, systemMessage?}. The
formatOutput() function was returning {continue, suppressOutput} which
are not part of the Claude Code hook API, causing "JSON validation
failed" errors on every session stop.
Return an empty object {} for the default case (no hookSpecificOutput),
preserving only systemMessage when present. This is valid for all hook
event types and eliminates the schema validation error.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add unhappy-path tests for formatOutput per PR review
Add edge case coverage for malformed input (undefined/null), falsy
systemMessage values, non-contract field stripping, and contract key
allowlist. Also add defensive null guard to formatOutput matching
normalizeInput pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Alex Worland <alexworland@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
38d9ac7adb |
fix: prevent zombie subprocess accumulation by only trusting exitCode (#1226) (#1325)
proc.killed only means Node sent a signal — the process can still be alive. This caused premature pool slot release, allowing unbounded process spawning. - ensureProcessExit: remove proc.killed from early-exit checks, only trust exitCode - Fix 3 call-site guards that skipped cleanup for signaled-but-alive processes - Add TOTAL_PROCESS_HARD_CAP=10 safety net in waitForSlot() - After SIGKILL, wait up to 1s via exit event instead of blind 200ms sleep - Reduce reaper interval from 5min to 1min, idle threshold from 2min to 1min Closes #1226 Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
4616f7ab1c |
fix: output valid JSON from smart-install.js hook to prevent SessionStart error (#1337)
smart-install.js used stdio: 'inherit' for execSync calls, leaking
plain-text install output (bun/npm progress) to stdout. Claude Code
expects hook output to start with '{' (valid JSON), so this caused
a confusing "SessionStart:startup hook error" on every session start.
Changes:
- Pipe stdout on all execSync calls to prevent non-JSON stdout leak
- Output {"continue":true,"suppressOutput":true} on both success and
error paths so Claude Code always receives valid JSON
- Add tests verifying no stdio:'inherit' remains and JSON is output
Fixes #1253
Vibe-coded by Ousama Ben Younes
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
ad3d236cec |
fix: resolve hook crashes and CLAUDE_PLUGIN_ROOT fallback (#1215, #1220) (#1229)
* fix: resolve PostToolUse hook crashes and 5s latency (#1220) Three compounding bugs caused hook failures: 1. Missing break statements in worker-service.ts switch — if async code threw before process.exit(), execution fell through to subsequent cases. Added break to all 7 cases missing them. 2. Unhandled promise rejection on main() — added .catch() that logs the error and exits 0 (per project exit code strategy: don't block Claude Code or leave Windows Terminal tabs open). 3. Redundant start commands in hooks.json — PostToolUse, UserPromptSubmit, and Stop groups each had a standalone start command that was redundant (the hook case already calls ensureWorkerStarted internally). The redundant start also caused 5s latency via bun-runner.js collectStdin() timeout since Claude Code never closes stdin. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add CLAUDE_PLUGIN_ROOT fallback for Stop hooks (#1215) Upstream Claude Code bug (anthropics/claude-code#24529) leaves CLAUDE_PLUGIN_ROOT unset for Stop hooks on macOS and ALL hooks on Linux. Two-layer defense: 1. Shell-level: hooks.json commands now use inline fallback _R="${CLAUDE_PLUGIN_ROOT}"; [ -z "$_R" ] && _R="$HOME/..."; falling back to the known marketplace install path. 2. Script-level: bun-runner.js self-resolves plugin root from its own filesystem location via import.meta.url, and fixes broken /scripts/... paths that result from empty expansion. Added test to verify all hook commands include the fallback path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
c6f932988a |
Fix 30+ root-cause bugs across 10 triage phases (#1214)
* MAESTRO: fix ChromaDB core issues — Python pinning, Windows paths, disable toggle, metadata sanitization, transport errors - Add --python version pinning to uvx args in both local and remote mode (fixes #1196, #1206, #1208) - Convert backslash paths to forward slashes for --data-dir on Windows (fixes #1199) - Add CLAUDE_MEM_CHROMA_ENABLED setting for SQLite-only fallback mode (fixes #707) - Sanitize metadata in addDocuments() to filter null/undefined/empty values (fixes #1183, #1188) - Wrap callTool() in try/catch for transport errors with auto-reconnect (fixes #1162) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix data integrity — content-hash deduplication, project name collision, empty project guard, stuck isProcessing - Add SHA-256 content-hash deduplication to observations INSERT (store.ts, transactions.ts, SessionStore.ts) - Add content_hash column via migration 22 with backfill and index - Fix project name collision: getCurrentProjectName() now returns parent/basename - Guard against empty project string with cwd-derived fallback - Fix stuck isProcessing: hasAnyPendingWork() resets processing messages older than 5 minutes - Add 12 new tests covering all four fixes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix hook lifecycle — stderr suppression, output isolation, conversation pollution prevention - Suppress process.stderr.write in hookCommand() to prevent Claude Code showing diagnostic output as error UI (#1181). Restores stderr in finally block for worker-continues case. - Convert console.error() to logger.warn()/error() in hook-command.ts and handlers/index.ts so all diagnostics route to log file instead of stderr. - Verified all 7 handlers return suppressOutput: true (prevents conversation pollution #598, #784). - Verified session-complete is a recognized event type (fixes #984). - Verified unknown event types return no-op handler with exit 0 (graceful degradation). - Added 10 new tests in tests/hook-lifecycle.test.ts covering event dispatch, adapter defaults, stderr suppression, and standard response constants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix worker lifecycle — restart loop coordination, stale transport retry, ENOENT shutdown race - Add PID file mtime guard to prevent concurrent restart storms (#1145): isPidFileRecent() + touchPidFile() coordinate across sessions - Add transparent retry in ChromaMcpManager.callTool() on transport error — reconnects and retries once instead of failing (#1131) - Wrap getInstalledPluginVersion() with ENOENT/EBUSY handling (#1042) - Verified ChromaMcpManager.stop() already called on all shutdown paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix Windows platform support — uvx.cmd spawn, PowerShell $_ elimination, windowsHide, FTS5 fallback - Route uvx spawn through cmd.exe /c on Windows since MCP SDK lacks shell:true (#1190, #1192, #1199) - Replace all PowerShell Where-Object {$_} pipelines with WQL -Filter server-side filtering (#1024, #1062) - Add windowsHide: true to all exec/spawn calls missing it to prevent console popups (#1048) - Add FTS5 runtime probe with graceful fallback when unavailable on Windows (#791) - Guard FTS5 table creation in migrations, SessionSearch, and SessionStore with try/catch Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix skills/ distribution — build-time verification and regression tests (#1187) Add post-build verification in build-hooks.js that fails if critical distribution files (skills, hooks, plugin manifest) are missing. Add 10 regression tests covering skill file presence, YAML frontmatter, hooks.json integrity, and package.json files field. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix MigrationRunner schema initialization (#979) — version conflict between parallel migration systems Root cause: old DatabaseManager migrations 1-7 shared schema_versions table with MigrationRunner's 4-22, causing version number collisions (5=drop tables vs add column, 6=FTS5 vs prompt tracking, 7=discovery_tokens vs remove UNIQUE). initializeSchema() was gated behind maxApplied===0, so core tables were never created when old versions were present. Fixes: - initializeSchema() always creates core tables via CREATE TABLE IF NOT EXISTS - Migrations 5-7 check actual DB state (columns/constraints) not just version tracking - Crash-safe temp table rebuilds (DROP IF EXISTS _new before CREATE) - Added missing migration 21 (ON UPDATE CASCADE) to MigrationRunner - Added ON UPDATE CASCADE to FK definitions in initializeSchema() - All changes applied to both runner.ts and SessionStore.ts Tests: 13 new tests in migration-runner.test.ts covering fresh DB, idempotency, version conflicts, crash recovery, FK constraints, and data integrity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix 21 test failures — stale mocks, outdated assertions, missing OpenClaw guards Server tests (12): Added missing workerPath and getAiStatus to ServerOptions mocks after interface expansion. ChromaSync tests (3): Updated to verify transport cleanup in ChromaMcpManager after architecture refactor. OpenClaw (2): Added memory_ tool skipping and response truncation to prevent recursive loops and oversized payloads. MarkdownFormatter (2): Updated assertions to match current output. SettingsDefaultsManager (1): Used correct default key for getBool test. Logger standards (1): Excluded CLI transcript command from background service check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix Codex CLI compatibility (#744) — session_id fallbacks, unknown platform tolerance, undefined guard Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix Cursor IDE integration (#838, #1049) — adapter field fallbacks, tolerant session-init validation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix /api/logs OOM (#1203) — tail-read replaces full-file readFileSync Replace readFileSync (loads entire file into memory) with readLastLines() that reads only from the end of the file in expanding chunks (64KB → 10MB cap). Prevents OOM on large log files while preserving the same API response shape. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix Settings CORS error (#1029) — explicit methods and allowedHeaders in CORS config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: add session custom_title for agent attribution (#1213) — migration 23, endpoint + store support Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: prevent CLAUDE.md/AGENTS.md writes inside .git/ directories (#1165) Add .git path guard to all 4 write sites to prevent ref corruption when paths resolve inside .git internals. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix plugin disabled state not respected (#781) — early exit check in all hook entry points Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix UserPromptSubmit context re-injection on every turn (#1079) — contextInjected session flag Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * MAESTRO: fix stale AbortController queue stall (#1099) — lastGeneratorActivity tracking + 30s timeout Three-layer fix: 1. Added lastGeneratorActivity timestamp to ActiveSession, updated by processAgentResponse (all agents), getMessageIterator (queue yields), and startGeneratorWithProvider (generator launch) 2. Added stale generator detection in ensureGeneratorRunning — if no activity for >30s, aborts stale controller, resets state, restarts 3. Added AbortSignal.timeout(30000) in deleteSession to prevent indefinite hang when awaiting a stuck generator promise Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
d9a30cc7d4 |
MAESTRO: exclude transcript CLI from logger-usage-standards test
src/services/transcripts/cli.ts is a CLI command with user-visible console output, not a background service — console.log is intentional. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
50eeed97e7 |
MAESTRO: fix cache-based install — resolveRoot, post-install verification, CLI path
Replace hardcoded marketplace path in plugin/scripts/smart-install.js with resolveRoot() that uses CLAUDE_PLUGIN_ROOT env var (set by Claude Code for all hooks), with fallback to script location and legacy paths. Fixes #1128, #1166 where cache installs couldn't find or install node_modules. Also fixes installCLI() path (ROOT/plugin/scripts/ → ROOT/scripts/) and adds verifyCriticalModules() post-install check with npm fallback on failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
5f28550551 |
MAESTRO: fix MCP type coercion for batch endpoints, add defensive observation error handling
Add string-to-array coercion for ids and memorySessionIds in DataRoutes.ts batch endpoints so MCP clients sending "[1,2,3]" or "1,2,3" instead of native arrays no longer get 400 errors. Wrap observation storage path in SessionRoutes.ts with try/catch returning 200 on recoverable errors instead of 500, preventing hook breakage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e788fd3676 |
fix: prevent duplicate worker daemons and zombie processes (#1178)
* fix: prevent duplicate worker daemons and zombie processes Three root causes of chroma-mcp timeouts: 1. HTTP shutdown (POST /api/admin/shutdown) closed resources but never called process.exit(). Zombie workers stayed alive, background tasks reconnected to chroma-mcp, spawning duplicate subprocesses that all contended for the same persistent data directory. 2. No guard against concurrent daemon startup. When hooks fired simultaneously, multiple daemons started before either wrote a PID file. The loser got EADDRINUSE but stayed alive because signal handlers registered in the constructor prevented exit. 3. Corrupt 147GB HNSW index file caused all chroma queries to timeout (MCP error -32001). Data fix: deleted corrupt collection, backfill rebuilds from SQLite. Code fixes: - Add PID-based guard in daemon startup: exit if PID file process alive - Add port-based guard in daemon startup: exit if port already bound (runs before WorkerService constructor registers keepalive handlers) - Add process.exit(0) after HTTP shutdown/restart completes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: aggressive startup cleanup and one-time chroma wipe for upgrade Kill orphaned worker-service.cjs and chroma-mcp processes immediately at startup (no age gate) while keeping 30-min threshold for mcp-server. Wipe corrupt chroma data once on upgrade from pre-v10.3 versions — backfill rebuilds from SQLite automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: wrap shutdown handlers in try/finally to guarantee process.exit If onShutdown() or onRestart() threw, process.exit(0) was never reached, leaving the daemon alive as a zombie. Also removed redundant require('fs') calls in process-manager tests where ESM imports already existed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
40daf8f3fa |
feat: replace WASM embeddings with persistent chroma-mcp MCP connection (#1176)
* feat: replace WASM embeddings with persistent chroma-mcp MCP connection Replace ChromaServerManager (npx chroma run + chromadb npm + ONNX/WASM) with ChromaMcpManager, a singleton stdio MCP client that communicates with chroma-mcp via uvx. This eliminates native binary issues, segfaults, and WASM embedding failures that plagued cross-platform installs. Key changes: - Add ChromaMcpManager: singleton MCP client with lazy connect, auto-reconnect, connection lock, and Zscaler SSL cert support - Rewrite ChromaSync to use MCP tool calls instead of chromadb npm client - Handle chroma-mcp's non-JSON responses (plain text success/error messages) - Treat "collection already exists" as idempotent success - Wire ChromaMcpManager into GracefulShutdown for clean subprocess teardown - Delete ChromaServerManager (no longer needed) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review — connection guard leak, timer leak, async reset - Clear connecting guard in finally block to prevent permanent reconnection block - Clear timeout after successful connection to prevent timer leak - Make reset() async to await stop() before nullifying instance - Delete obsolete chroma-server-manager test (imports deleted class) - Update graceful-shutdown test to use chromaMcpManager property name Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prevent chroma-mcp spawn storm — zombie cleanup, stale onclose guard, reconnect backoff Three bugs caused chroma-mcp processes to accumulate (92+ observed): 1. Zombie on timeout: failed connections left subprocess alive because only the timer was cleared, not the transport. Now catch block explicitly closes transport+client before rethrowing. 2. Stale onclose race: old transport's onclose handler captured `this` and overwrote the current connection reference after reconnect, orphaning the new subprocess. Now guarded with reference check. 3. No backoff: every failure triggered immediate reconnect. With backfill doing hundreds of MCP calls, this created rapid-fire spawning. Added 10s backoff on both connection failure and unexpected process death. Also includes ChromaSync fixes from PR review: - queryChroma deduplication now preserves index-aligned arrays - SQL injection guard on backfill ID exclusion lists Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |