Commit Graph

369 Commits

Author SHA1 Message Date
Kamran Khalid 02f7c3c9d0 fix(security): validate and restrict /api/instructions operation and topic params (CWE-22, CWE-1321) (#986) 2026-02-16 00:29:08 -05:00
Mark L a94ddc504f fix(cursor): remove obsolete cursor-hooks directory gate (#1087)
Co-authored-by: root <root@localhost.localdomain>
2026-02-16 00:26:37 -05:00
zhaixingzi 454e9c5870 fix: resolve duplicate assistant messages in OpenRouter agent (#1074)
This commit addresses the issue of duplicate assistant messages appearing
in the conversation history by commenting out the lines that were
unnecessarily pushing assistant responses to the conversationHistory array.

The processAgentResponse function already handles adding assistant messages
to the conversation history, so these additional pushes were causing
duplicate entries.

Changes made:
- Commented out session.conversationHistory.push calls for assistant responses
  in three locations within OpenRouterAgent.ts:
  1. In the init response handling (around line 117)
  2. In the observation response handling (around line 188)
  3. In the summary response handling (around line 230)

This ensures that assistant messages are only added once to the conversation
history, preventing duplication while maintaining the intended functionality.

Co-authored-by: 张坤 <zhangkun@example.com>
2026-02-16 00:26:07 -05:00
SaneApps 2f337dab13 fix: use Gemini v1 API endpoint instead of v1beta (#1082)
v1beta does not support newer models like gemini-3-flash, causing
silent 404 errors that back up the observation queue indefinitely.
Users with CLAUDE_MEM_GEMINI_MODEL=gemini-3-flash get zero observations
stored, with no visible error — the queue just grows silently.

Changes:
- Switch API URL from v1beta/models to v1/models (generateContent
  works identically on both endpoints)
- Add gemini-3-flash to GeminiModel type and RPM limits
- Update test to match new endpoint

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 00:26:01 -05:00
Alex Newman 055888e181 fix: address PR review feedback for subprocess cleanup and binary resolution
Wrap SDK query loop in try/finally so subprocess cleanup runs on error paths.
Swap Chroma binary check order to try project-level .bin first (common case).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 23:24:00 -05:00
Alex Newman 67ba17cc8a fix: use WASM backend for Chroma embeddings to fix cross-platform issues
Chroma requires client-side embeddings — the server is storage only.
The previous commit incorrectly removed @chroma-core/default-embed.

Uses DefaultEmbeddingFunction({ wasm: true }) which forces the WASM
backend instead of native ONNX binaries. Same model (all-MiniLM-L6-v2),
same embeddings, but works on all platforms without segfaults or
ENOENT errors (#1104, #1105, #1110).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 23:14:21 -05:00
Alex Newman e1ef14dbcc fix: resolve orphaned subprocesses and Chroma HTTP regressions
- Add subprocess cleanup after SDK query loop completes, using existing
  ProcessRegistry infrastructure (getProcessBySession + ensureProcessExit)
- Replace npx-based Chroma binary spawning with absolute path resolution
  via require.resolve, falling back to npx with explicit cwd (#1120)
- Remove @chroma-core/default-embed client-side dependency; let Chroma
  HTTP server handle embeddings server-side (#1104, #1105, #1110)

Closes #1010, #1089, #1090, #1068, #1120, #1104, #1105, #1110

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 22:04:52 -05:00
Alex Newman c27314f896 fix: address PR review comments for chroma server lifecycle 2026-02-13 23:39:30 -05:00
Alex Newman 1b68c55763 fix: resolve SDK spawn failures and sharp native binary crashes
- Strip CLAUDECODE env var from SDK subprocesses to prevent "cannot be
  launched inside another Claude Code session" error (Claude Code 2.1.42+)
- Lazy-load @chroma-core/default-embed to avoid eagerly pulling in
  sharp native binaries at bundle startup (fixes ERR_DLOPEN_FAILED)
- Add stderr capture to SDK spawn for diagnosing future process failures
- Exclude lockfiles from marketplace rsync and delete stale lockfiles
  before npm install to prevent native dep version mismatches

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:47:27 -05:00
Alex Newman ed313db742 Merge main into feat/chroma-http-server
Resolve conflicts between Chroma HTTP server PR and main branch changes
(folder CLAUDE.md, exclusion settings, Zscaler SSL, transport cleanup).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:02:54 -05:00
Alex Newman 5de728612e chore: bump version to 10.0.6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 00:02:37 -05:00
Alex Newman f05f9ca735 Merge remote-tracking branch 'origin/main' into openclaw-installer
# Conflicts:
#	plugin/scripts/mcp-server.cjs
#	plugin/scripts/worker-service.cjs
2026-02-12 22:04:03 -05:00
Alex Newman 05e904e613 feat: enhance /api/health with version, uptime, workerPath, and AI status
Replace hardcoded TEST-008 build ID with real package version. Add worker
filesystem path, uptime counter, and AI provider status (including last
interaction success/failure tracking) to the health endpoint response.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 21:16:22 -05:00
Alex Newman 98d87d7573 chore: bump version to 10.0.4
Reverts v10.0.3 chroma-mcp spawn storm fix (broken release).
Restores codebase to v10.0.2 state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 21:36:34 -05:00
Alex Newman 3f01baebfe Merge remote-tracking branch 'origin/main' into fix/chroma-mcp-spawn-storm
# Conflicts:
#	src/services/worker-service.ts
#	tests/infrastructure/process-manager.test.ts
2026-02-11 15:43:08 -05:00
Alex Newman 0b214a59a1 chore: bump version to 10.0.2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 15:25:50 -05:00
Rod Boev 2c5c99c0c7 fix: use etime-based sorting instead of PID ordering for process guards
Addresses Greptile review feedback:
- ChromaSync: replace PID-based sort with ps etime column + parseElapsedTime()
  for reliable age ordering (PIDs wrap and don't guarantee ordering)
- ProcessManager: filter out entries with unparseable etime (-1) before
  sorting to prevent sort corruption in cleanupExcessChromaProcesses()
2026-02-11 07:19:28 -05:00
Rod Boev a3f9e7f638 fix: prevent chroma-mcp spawn storm with 5-layer defense (641 processes → max 2)
During SIGHUP testing with 6+ active sessions, ChromaSync.ensureConnection()
had no mutex — concurrent fire-and-forget syncObservation() calls each spawned
a chroma-mcp subprocess via StdioClientTransport, creating 641 orphans in ~5min.
Error-driven reconnection formed a positive feedback loop amplifying the storm.

Defense layers:
- Layer 0: Connection mutex via promise memoization (prevents concurrent spawns)
- Layer 1: Pre-spawn process count guard using execFileSync('ps') (kills excess)
- Layer 2: Hardened close() with try-finally + Unix pkill in GracefulShutdown
- Layer 3: Count-based orphan reaper in ProcessManager (not age-based)
- Layer 4: Circuit breaker stops retries after 3 consecutive failures for 60s

Closes #1063, closes #695
Relates to #1010, #707
2026-02-11 07:19:28 -05:00
Rod Boev 4e67393d27 fix: prevent daemon silent death from SIGHUP + unhandled errors
Root cause: registerSignalHandlers() handled SIGTERM/SIGINT but not
SIGHUP. When the parent hook process exits, the kernel sends SIGHUP
to the daemon, causing immediate termination (default signal action).

Belt-and-suspenders fix:
1. SIGHUP handler: ignore in daemon mode, graceful shutdown otherwise
2. setsid: spawn daemon in new session on Linux (prevents SIGHUP delivery)
3. Global unhandledRejection/uncaughtException guards in daemon mode
2026-02-11 00:35:53 -05:00
Alex Newman af95461a70 Merge branch 'main' into fix/hook-resilience-worker-lifecycle
# Conflicts:
#	plugin/scripts/mcp-server.cjs
#	plugin/scripts/worker-service.cjs
2026-02-10 23:37:33 -05:00
Rod Boev 418e38ee46 fix: hook resilience and worker lifecycle improvements (#957, #923, #984, #987, #1042)
Reduce timeouts to eliminate 10-30s startup delay when worker is dead
(common on WSL2 after hibernate). Add stale PID detection, graceful
error handling across all handlers, and error classification that
distinguishes worker unavailability from handler bugs.

- HEALTH_CHECK 30s→3s, new POST_SPAWN_WAIT (5s), PORT_IN_USE_WAIT (3s)
- isProcessAlive() with EPERM handling, cleanStalePidFile()
- getPluginVersion() try-catch for shutdown race (#1042)
- isWorkerUnavailableError: transport+5xx+429→exit 0, 4xx→exit 2
- No-op handler for unknown event types (#984)
- Wrap all handler fetch calls in try-catch for graceful degradation
- CLAUDE_MEM_HEALTH_TIMEOUT_MS env var override with validation
2026-02-10 15:34:35 -05:00
xingyu e4e1d3fb92 fix: Windows platform improvements — re-enable Chroma, fix DB race, simplify env isolation
1. ProcessManager: Migrate spawnDaemon() from WMIC to PowerShell Start-Process
   - WMIC deprecated in Windows 11, PowerShell inherits env vars properly
   - Use -WindowStyle Hidden to prevent console popups
   - Fix redundant backslash escaping in PowerShell $_ variables

2. ChromaSync: Re-enable vector search on Windows
   - Remove overly defensive platform check that disabled all semantic search
   - Worker daemon starts with -WindowStyle Hidden; child processes inherit
   - MCP SDK's StdioClientTransport uses shell:false, no new console created

3. worker-service: Unified DB-ready gate middleware
   - Replace single-endpoint /api/sessions/init wait with global middleware
   - Hold all DB-dependent requests until database is initialized (30s timeout)
   - Whitelist static assets, /health, and viewer page for immediate response
   - Separate dbReadyPromise (DB only) from initializationComplete (full init)
   - Fixes "Database not initialized" errors on /stream, /summarize, /init

4. EnvManager: Switch from allowlist to blocklist for subprocess env
   - Only strip ANTHROPIC_API_KEY to prevent Issue #733 billing hijack
   - Pass through all other vars (ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL, etc.)
   - Simpler, less fragile than maintaining an exhaustive system vars allowlist
2026-02-07 18:30:57 +08:00
Alex Newman 5969d670d0 chore: bump version to 9.1.1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 02:18:44 -05:00
Alex Newman 8dfcb5e612 chore: bump version to 9.1.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 01:05:38 -05:00
Alex Newman ff503d08a7 MAESTRO: Merge PR #657 - Add generate/clean CLI commands for CLAUDE.md management
Cherry-picked source changes from PR #657 (224 commits behind main).
Adds `claude-mem generate` and `claude-mem clean` CLI commands:
- New src/cli/claude-md-commands.ts with generateClaudeMd() and cleanClaudeMd()
- Worker service generate/clean case handlers with --dry-run support
- CLAUDE_MD logger component type
- Uses shared isDirectChild from path-utils.ts (DRY improvement over PR original)

Skipped from PR: 91 CLAUDE.md file deletions (stale), build artifacts,
.claude/plans/ dev artifact, smart-install.js shell alias auto-injection
(aggressive profile modification without consent).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 05:52:54 -05:00
Alex Newman 98920bd860 MAESTRO: Merge PR #662 - Add save_memory MCP tool for manual memory storage
Adds save_memory MCP tool allowing users to manually save observations
for semantic search. Source changes cherry-picked from PR #662 by
@darconada (build artifact conflicts resolved by direct application).

Closes #645.

Co-Authored-By: darconadalabarga <darconada@arsys.es>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 04:13:44 -05:00
Alex Newman 5dffb1ebb0 MAESTRO: fix(hooks): add session-complete handler to enable orphan reaper cleanup
Cherry-picked from PR #844 by @thusdigital. Sessions stayed in active
sessions map forever after summarize, causing the orphan reaper to think
all processes were still active. Adds session-complete as Stop phase 2
hook that calls POST /api/sessions/complete to remove sessions from the
active map, allowing the reaper to correctly identify and clean up
orphaned worker processes. Fixes #842.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 03:23:13 -05:00
Alex Newman da1d2cd36a MAESTRO: fix(db): prevent FK constraint failures on worker restart
Cherry-picked source changes from PR #889 by @Et9797. Fixes #846.

Key changes:
- Add ensureMemorySessionIdRegistered() guard in SessionStore.ts
- Add ON UPDATE CASCADE migration (schema v21) for observations and session_summaries FK constraints
- Change message queue from claim-and-delete to claim-confirm pattern (PendingMessageStore.ts)
- Add spawn deduplication and unrecoverable error detection in SessionRoutes.ts and worker-service.ts
- Add forceInit flag to SDKAgent for stale session recovery

Build artifacts skipped (pre-existing dompurify dep issue). Path fixes (HealthMonitor.ts, worker-utils.ts)
already merged via PR #634.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 03:16:17 -05:00
Alex Newman 8030c44af4 MAESTRO: fix(cursor): use bun runtime and fix hooks directory detection
Cherry-picked source changes from PR #721. Fixes two Cursor standalone
setup bugs:

1. findCursorHooksDir() now checks for hooks.json (unified CLI mode)
   in addition to legacy common.sh/common.ps1 scripts
2. installCursorHooks() now uses bun instead of node for hook commands
   since worker-service.cjs depends on bun:sqlite
3. Added findBunPath() to detect bun executable across platforms

Build artifacts skipped (pre-existing dompurify viewer dep issue).
Source-only cherry-pick, TypeScript compilation clean for modified file.

Co-Authored-By: polux0 <aleksaprosperitylabs@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 03:09:26 -05:00
Alex Newman 75a0f2e981 fix: respect CLAUDE_CONFIG_DIR for plugin paths (#626)
Add MARKETPLACE_ROOT constant to paths.ts and update 5 source files to use
centralized path constants instead of hardcoded ~/.claude paths. Preserves
backwards compatibility when CLAUDE_CONFIG_DIR is not set.

Based on PR #634 by @Kuroakira, cherry-picked onto main due to build artifact
merge conflicts (source changes applied cleanly).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 03:00:08 -05:00
Alex Newman 91e1d5baad fix: correct Gemini model name from gemini-3-flash to gemini-3-flash-preview
The Gemini API requires the -preview suffix for the Gemini 3 Flash model.
gemini-3-flash does not exist - only gemini-3-flash-preview is available.
This was causing 404 errors when users selected this model option.

Closes #831

Co-Authored-By: Glucksberg <markuscontasul@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 02:55:30 -05:00
Alex Newman 3ad53733e8 MAESTRO: Merge PR #884 adding Zscaler SSL certificate support for ChromaDB vector search
Adds automatic detection and handling of Zscaler enterprise security certificates
on macOS. Combines standard certifi CA certificates with Zscaler certificates into
a single bundle, passed via SSL_CERT_FILE/REQUESTS_CA_BUNDLE/CURL_CA_BUNDLE env vars
to the chroma-mcp subprocess. Certificate bundle is cached for 24 hours. Falls back
gracefully when Zscaler is not present, with no impact on non-Zscaler environments.

Co-Authored-By: RClark4958 <rickdclark48@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 02:15:19 -05:00
Jenha Poyarkov 9f2a237aaf fix: close transport on connection error to prevent chroma-mcp zombie processes
Fixes #761

Root cause: When connection errors occur (MCP error -32000, Connection closed),
the code was resetting \`connected\` and \`client\` but NOT calling
\`transport.close()\`, leaving the chroma-mcp subprocess alive. Each
reconnection attempt spawned a NEW process while old ones accumulated.

Changes:
- Close transport before resetting state in ensureCollection() error handler
- Close transport before resetting state in queryChroma() error handler
- Set transport = null after closing to match close() method behavior
- Add regression tests for Issue #761 with source code verification

Tested on macOS - no more zombie processes after the fix.
2026-02-06 02:10:18 -05:00
Abdelkarim Mateos Sanchez 9bd56c993c fix: align IDs with metadatas in ChromaSearchStrategy
ChromaSync.queryChroma() returns deduplicated sqlite_ids but the
metadatas array contains multiple entries per observation (narrative +
facts). The filterByRecency() method was iterating over metadatas and
using the index to access ids, causing array out-of-bounds access.

The fix builds a Map from sqlite_id to metadata, then iterates over
the deduplicated ids array to ensure proper alignment.

Symptoms before fix:
- Semantic search returning incorrect/empty results
- Search only working with near-exact queries
- Recent items (same day) not being found

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 02:07:03 -05:00
Alex Newman 711f5455df fix: generate synthetic memorySessionId for stateless providers (PR #615)
Gemini and OpenRouter are stateless APIs that never return session IDs.
Without synthetic IDs, PR #693's defensive memorySessionId checks throw
errors on every observation processing call for these providers.

Generates provider-prefixed IDs (gemini-/openrouter-{contentSessionId}-
{timestamp}) before the first API call, persisted to the database via
updateMemorySessionId(). Applied from PR #615 (closed due to staleness).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 02:04:49 -05:00
Alex Newman 2d40afe7ef fix: provider-aware recovery and stale session cleanup (PR #741)
Applies PR #741 by @licutis onto main, resolving conflicts with
recently merged PRs #693, #937, and #627. Adds getActiveAgent() to
WorkerService so startup-recovery uses the correct provider instead
of hardcoding SDKAgent. Also cleans up sessions stuck 'active' for
6+ hours and their pending messages before processing orphaned queues.

Co-Authored-By: licutis <43884712+licutis@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 01:58:00 -05:00
TranslateMe ea38601564 fix: Reset AbortController before starting generator to prevent infinite abort loop
When a generator exits with wasAborted=true, the AbortController remains in
aborted state but generatorPromise is set to null. When a new observation
arrives, ensureGeneratorRunning() sees generatorPromise=null and tries to
start a new generator, but the new generator immediately sees
signal.aborted=true and exits, causing an infinite "Generator aborted" loop.

This fix resets the AbortController if it's already aborted before starting
a new generator, allowing the session to recover from the stuck state.

Bug reproduction:
1. Session receives observations
2. Something causes the generator to be aborted
3. generatorPromise = null, but abortController.signal.aborted = true
4. New observation arrives → starts generator → immediately aborted → loop

Fix: Check if abortController.signal.aborted before starting generator,
and create a new AbortController if needed.
2026-02-06 01:53:17 -05:00
jayvenn21 f24bba21e9 fix(worker): gracefully process orphaned pending messages after session termination 2026-02-06 01:47:43 -05:00
Michael Lipscombe af308ea5c8 fix: Backfill project field on SDK session creation to prevent race condition
PostToolUse hook can create the session before UserPromptSubmit's
session-init sets the project, leaving it empty. Add an UPDATE after
INSERT OR IGNORE to backfill the project when a later hook provides it.

Closes #939

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 01:44:42 -05:00
Alex Newman 6382d6f9c7 MAESTRO: Merge PR #693 - prevent infinite restart loop that causes runaway API costs
Add restart limit (max 3 consecutive restarts) with exponential backoff
to prevent infinite generator restart loops. Also add defensive
memorySessionId checks in GeminiAgent and OpenRouterAgent before
expensive LLM calls to fail fast when session ID hasn't been captured.

Based on PR #693 by @ajbmachon (applied to current main).

Co-Authored-By: Andre Machon <ajbmachon2@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 01:43:12 -05:00
jayvenn21 93bad99d79 fix: terminate session on prompt-too-long instead of retrying indefinitely 2026-02-06 01:39:55 -05:00
Michel Tomas b07130acc6 fix: handle both boolean and string types for settings
JSON.parse preserves native types, so boolean true/false stay as
booleans rather than strings. The previous check only handled string
'true', meaning users who set `"ENABLED": true` (boolean) wouldn't
have the feature work.

Now both `getBool()` helper and ResponseProcessor check handle:
- String 'true' → enabled
- Boolean true → enabled
- Any other value → disabled

Tested: Confirmed feature stays disabled with string "false" and
no new CLAUDE.md files are created when accessing directories.
2026-02-06 01:36:45 -05:00
Michel Tomas bb96092d74 fix: add FOLDER_CLAUDEMD_ENABLED to settingKeys for API/UI access 2026-02-06 01:36:45 -05:00
Michel Tomas 9907df1db8 fix: respect CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED setting
The CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED setting was documented but never
actually checked in code. The folder CLAUDE.md generation ran
unconditionally whenever files were touched.

Changes:
- Add CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED to SettingsDefaults interface
- Add default value 'false' to DEFAULTS object
- Check setting in ResponseProcessor before calling updateFolderClaudeMdFiles

Fixes the issue identified in issue-600-documentation-audit-features-not-implemented.md:
"The setting CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED is never read."
2026-02-06 01:36:45 -05:00
jayvenn21 5d1ee20076 fix: prevent duplicate generator spawns in handleSessionInit
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-05 23:48:56 -05:00
Rajiv Sinclair 27aa98c269 fix: wait for database initialization before processing session-init requests
Fixes a race condition where the UserPromptSubmit hook could call
/api/sessions/init before the database is fully initialized, resulting
in a 500 error with "Database not initialized".

Root cause:
- Worker starts HTTP server immediately for fast startup
- Health endpoint (/api/health) returns OK when server is listening
- session-init hook waits for health check, then calls /api/sessions/init
- Database initialization happens in background, creating a race window

Solution:
- Add early handler for /api/sessions/init that waits for initialization
- Uses same pattern as existing /api/context/inject handler
- Returns 503 with retry message if initialization times out

The fix ensures hooks receive proper responses even during the brief
startup window when the server is listening but DB isn't ready yet.

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-02-05 22:23:52 -05:00
Alex Newman d333c7dc08 MAESTRO: Expand startup orphan cleanup to target mcp-server and worker-service processes
The startup cleanupOrphanedProcesses() only targeted chroma-mcp, leaving
orphaned mcp-server.cjs and worker-service.cjs processes undetected after
daemon crashes. Expanded to target all claude-mem process types with
30-minute age filtering and current PID exclusion. Closes PR #687 (which
had a spawnDaemon regression removing Windows WMIC support).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 22:06:46 -05:00
david718 cdb0e823aa fix: add daemon children cleanup to orphan reaper
Add killIdleDaemonChildren() to clean up SDK-spawned claude processes
that don't terminate after completing their work.

Problem:
- Worker-service daemon spawns Claude SDK processes
- These processes remain alive after work completes
- They accumulate over time, consuming significant memory
- Existing killSystemOrphans() only handles PPID=1 orphans

Solution:
- Add killIdleDaemonChildren() that finds claude processes where:
  - Parent PID = daemon's PID (children of worker-service)
  - CPU = 0% (idle, not actively working)
  - Running > 2 minutes (completed their work)
- Call it from reapOrphanedProcesses() (runs every 5 minutes)

Testing:
- Verified locally: 15+ zombie processes cleaned up
- Memory saved: ~2GB
- Normal processes (MCP server, Chroma) unaffected

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-05 19:50:46 -05:00
Alex Newman 0ecb387f58 MAESTRO: Implement Windows spawn guard to prevent repeated worker popups (closes #921)
Adds file-based cooldown lock in ensureWorkerStarted() so failed worker
spawn attempts on Windows don't produce repeated bun.exe terminal popups.
Based on the approach from PR #931, integrated into the refactored code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 18:59:53 -05:00
Rod Boev b8b88d998c fix: fail open on /api/context/inject during initialization
The /api/context/inject endpoint previously blocked for up to 5 minutes
(300s timeout) waiting for initializationComplete, which includes MCP
connection setup. On Windows, the MCP connection can hang indefinitely,
causing the context hook to never return and blocking Claude Code startup.

This change makes the endpoint fail open: if the worker hasn't finished
initializing, return empty context immediately instead of blocking. The
hook completes fast, and context becomes available on subsequent prompts
once initialization finishes in the background.

Closes #958
2026-02-05 18:22:08 -05:00