fix: sequoia-territory bug-fix bundle (chroma, env, build, MCP, worker) (#2394)

* fix(mcp): drop ${_R%/} parameter-expansion trim that trips Claude Code MCP validator The POSIX substring trim ${_R%/} is misread by Claude Code's MCP-config validator as a required env var named "_R%/", causing /doctor to flag mcp-search as invalid on every install. POSIX collapses // in paths, so the trim was cosmetic — drop it and the validator passes. Fixes #2350, #2354, #2356. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(env): block ANTHROPIC_BASE_URL leak + three-branch OAuth-skip predicate Issue #2375: parent-shell ANTHROPIC_BASE_URL leaked through to subprocess isolatedEnv, while ANTHROPIC_AUTH_TOKEN was blocked. The OAuth-skip predicate fired on bare BASE_URL, but no auth credential reached the subprocess -> "Not logged in". Add ANTHROPIC_BASE_URL to BLOCKED_ENV_VARS so it can only enter isolatedEnv via ~/.claude-mem/.env. Replace the OAuth-skip predicate with three branches to prevent a second-order security regression: a user with a tokenless gateway configured in .env (BASE_URL only, no token) would otherwise have their Anthropic OAuth token fetched and sent to their gateway. Token leak to third party. Three-branch predicate: 1. BASE_URL set -> return without OAuth (custom gateway, never leak token) 2. API_KEY or AUTH_TOKEN set -> return without OAuth (explicit credentials) 3. Otherwise -> OAuth lookup for api.anthropic.com Adds tests/env-isolation.test.ts. Fixes #2375. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(worker): classify Claude SDK HTTP 400 as unrecoverable ClaudeProvider previously had no explicit HTTP 400 handling — the default branch classified all errors as `transient`, so a permanent 400 (e.g., model rejecting an `effort` parameter forwarded from a leaked CLAUDE_CODE_EFFORT_LEVEL) would be retried indefinitely (#1874+ retries observed in one session per #2357). Mirror GeminiProvider/OpenRouterProvider's pattern: classify 400 as `unrecoverable`, 401/403 as `auth_invalid`, 429 as `rate_limit`, default to `transient`. When the 400 body matches the "effort parameter" signature, emit a one-time SDK warn log pointing at the env-leak fix in ~/.claude-mem/.env. Adds tests/claude-provider-error-classifier.test.ts. Fixes #2357. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroma): pin onnxruntime>=1.20 + protobuf<7 to fix INVALID_PROTOBUF on macOS arm64 The shipped all-MiniLM-L6-v2 model has pytorch-2.0 IR. chroma-mcp 0.2.6 transitively depends on `chromadb>=1.0.16` which only requires `onnxruntime>=1.14.1` — uv can therefore resolve to an onnxruntime old enough to fail every embedding add with `[ONNXRuntimeError] : 7 : INVALID_PROTOBUF` on macOS arm64 / Python 3.13. Semantic search silently degraded to FTS-only and smart backfill broke (#2371). Path B (override) was required because chroma-mcp 0.2.6 is the latest PyPI release — no upstream bump exists. Inject `--with onnxruntime>=1.20 --with protobuf<7` into the uvx spawn args (both persistent and remote modes). The protobuf cap is essential: forcing only `onnxruntime>=1.20` causes uv to re-resolve and land on protobuf 7.x, which trips opentelemetry's `_pb2` stubs with `TypeError: Descriptors cannot be created directly` because they were generated with protoc <3.19. Capping below 7 lands on protobuf 6.x which opentelemetry tolerates. Verified end-to-end: ONNX model loads, embeddings produce a 384-dim vector, PersistentClient init / add / query roundtrip succeeds: uvx --python 3.13 --with "onnxruntime>=1.20" --with "protobuf<7" \ chroma-mcp==0.2.6 --help # clean # programmatic test: onnxruntime 1.26.0, protobuf 6.33.6, # embedding ok 384, query ok ids=[['1']] Fixes #2371. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroma): enforce single chroma-mcp subprocess per worker (#2313) Root cause: every reconnect path in ChromaMcpManager — connectInternal's re-entry, the connect-timeout catch, callTool's transport-error retry, and the transport.onclose handler — used to abandon `this.transport`/`this.client` by calling at most `transport.close()` and nulling the handles. The MCP SDK's StdioClientTransport.close() only signals the direct child (uvx); on Linux the grandchildren (uv -> python -> chroma-mcp) re-parent to init and survive because the SDK does not put the subprocess in its own process group. Each reconnect therefore leaked a full chroma-mcp tree, accumulating 20+ instances per session. Fix: introduce a private disposeCurrentSubprocess() helper that always tree- kills via the existing killProcessTree primitive before nulling the transport reference, and route every "abandon current transport" path (reconnect, connect-timeout, transport error, onclose, stop) through it. The existing `connecting: Promise<void> | null` lock continues to serialize concurrent ensureConnected() callers into a single spawn. Adds tests/services/sync/chroma-mcp-manager-singleton.test.ts covering: - 5 parallel ensureConnected() calls produce exactly one spawn - a transport-error reconnect tree-kills the prior subprocess pid before spawning a replacement - stop() disposes state including any pending connecting promise Manual verification needed on Linux: after a long session with multiple tool uses, `ps aux | grep chroma-mcp | wc -l` should return 1, not 20+. Fixes #2313. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build): polyfill import.meta.url to __filename in CJS worker bundle The worker bundles ESM dependencies (notably @anthropic-ai/claude-agent-sdk's *.mjs files) into CJS output. Those modules call createRequire(import.meta.url) at module-load time. esbuild's CJS output left this as createRequire(ute.url) — where `ute` is its `import.meta` polyfill `{}` — so `ute.url` was undefined and module-load crashed with: TypeError: The argument 'filename' must be a file URL object, file URL string, or absolute path string. Received undefined code: ERR_INVALID_ARG_VALUE Every Stop hook and every worker subprocess invocation hit this. Fix is the esbuild `define` option mapping `import.meta.url` to `__filename` (provided as a real absolute path by the existing CJS prelude in the banner). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: daily dep bump per CLAUDE.md maintenance policy Root: @anthropic-ai/claude-agent-sdk, @clack/prompts, @types/node, dompurify, postcss, react, react-dom, yaml, zod. plugin/: tree-sitter-cli, zod. openclaw/: @types/node. All patch/minor bumps; no major version changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * build: regenerate plugin artifacts after env/chroma/mcp fixes Built artifacts are committed so the marketplace-installable plugin ships with the runtime bundles. Picks up: - d7b145e9 .mcp.json shell-prelude trim drop - a8cbd651 EnvManager BASE_URL block + 3-branch predicate - 8cb73b8c ClaudeProvider HTTP 400 unrecoverable classifier - ecd5b802 ChromaMcpManager onnxruntime/protobuf overrides - c79324ea ChromaMcpManager singleton enforcement - e8376f46 esbuild import.meta.url -> __filename polyfill - a7541d71 daily dep bump Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * build: regenerate plugin artifacts after main merge Bundles now include both v13.0.0 server-beta runtime (server-beta-service.cjs + updated mcp-server.cjs / worker-service.cjs) and this branch's chroma / env / build / Claude SDK fixes. Verified: bun test tests/env-isolation.test.ts \\ tests/claude-provider-error-classifier.test.ts \\ tests/services/sync/chroma-mcp-manager-singleton.test.ts → 13/13 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): address CodeRabbit findings on PR #2394 1. scripts/build-hooks.js — `import.meta.url` now maps to a file:// URL (via pathToFileURL(__filename).href in the CJS banner) instead of the raw __filename path. Preserves URL semantics for any bundled ESM dep that does `new URL(rel, import.meta.url)`. createRequire still works. 2. src/shared/EnvManager.ts — added envFilePath() that resolves CLAUDE_MEM_ENV_FILE lazily (falling back to paths.envFile()), and switched internal load/save call sites to use it. ENV_FILE_PATH is kept as a deprecated snapshot for back-compat. Lets tests target a temp file without depending on module-load order. 3. tests/env-isolation.test.ts — redirects to a temp dir via CLAUDE_MEM_ENV_FILE in beforeAll, removes all mutation of the real ~/.claude-mem/.env, and wraps the OAuth-spy assertion in try/finally so the spy is always restored even if the test fails. Verified: bun test tests/env-isolation.test.ts \ tests/claude-provider-error-classifier.test.ts \ tests/services/sync/chroma-mcp-manager-singleton.test.ts → 13/13 pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 18:05:48 -07:00
parent 13d5fa71c2
commit 5533412984
14 changed files with 1064 additions and 417 deletions
@@ -0,0 +1,122 @@
+import { describe, it, expect, beforeEach, afterEach, spyOn } from 'bun:test';
+
+import {
+  classifyClaudeError,
+  __resetEffortHintLatchForTesting,
+} from '../src/services/worker/ClaudeProvider.js';
+import { isClassified } from '../src/services/worker/provider-errors.js';
+import { logger } from '../src/utils/logger.js';
+
+/**
+ * Tests for HTTP 400 classification in ClaudeProvider's classifyClaudeError.
+ *
+ * Regression coverage for #2357: ClaudeProvider previously had no explicit
+ * HTTP 400 handling, so the default branch classified all 400s as `transient`
+ * and the retry loop would hammer a permanent error indefinitely (e.g. when
+ * CLAUDE_CODE_EFFORT_LEVEL leaks into the SDK subprocess and the model
+ * rejects the `effort` parameter).
+ */
+describe('classifyClaudeError — HTTP 400 handling (#2357)', () => {
+  let warnSpy: ReturnType<typeof spyOn>;
+
+  beforeEach(() => {
+    __resetEffortHintLatchForTesting();
+    warnSpy = spyOn(logger, 'warn').mockImplementation(() => {});
+  });
+
+  afterEach(() => {
+    warnSpy.mockRestore();
+    __resetEffortHintLatchForTesting();
+  });
+
+  it('classifies 400 with "effort parameter" body as unrecoverable AND logs an SDK warn once', () => {
+    const sdkErr = Object.assign(
+      new Error('This model does not support the effort parameter.'),
+      { status: 400 },
+    );
+
+    const classified = classifyClaudeError(sdkErr);
+
+    expect(isClassified(classified)).toBe(true);
+    expect(classified.kind).toBe('unrecoverable');
+    expect(warnSpy).toHaveBeenCalledTimes(1);
+    // First positional arg of logger.warn is the component category.
+    const [component, hintMessage] = warnSpy.mock.calls[0] as [string, string, ...unknown[]];
+    expect(component).toBe('SDK');
+    expect(hintMessage).toMatch(/effort/i);
+    expect(hintMessage).toMatch(/2357/);
+  });
+
+  it('classifies 400 with effort marker in a structured body field', () => {
+    const sdkErr = Object.assign(
+      new Error('Bad request'),
+      {
+        status: 400,
+        body: { error: { message: 'This model does not support the effort parameter.' } },
+      },
+    );
+
+    const classified = classifyClaudeError(sdkErr);
+
+    expect(classified.kind).toBe('unrecoverable');
+    expect(warnSpy).toHaveBeenCalledTimes(1);
+  });
+
+  it('classifies 400 without effort body as unrecoverable WITHOUT firing the effort hint', () => {
+    const sdkErr = Object.assign(
+      new Error('some other 400 error'),
+      { status: 400 },
+    );
+
+    const classified = classifyClaudeError(sdkErr);
+
+    expect(classified.kind).toBe('unrecoverable');
+    expect(warnSpy).not.toHaveBeenCalled();
+  });
+
+  it('throttles the effort hint to one log per process even on repeated 400s', () => {
+    const sdkErr = Object.assign(
+      new Error('This model does not support the effort parameter.'),
+      { status: 400 },
+    );
+
+    for (let i = 0; i < 5; i++) {
+      const classified = classifyClaudeError(sdkErr);
+      expect(classified.kind).toBe('unrecoverable');
+    }
+
+    expect(warnSpy).toHaveBeenCalledTimes(1);
+  });
+});
+
+describe('classifyClaudeError — sibling status codes (regression sanity)', () => {
+  let warnSpy: ReturnType<typeof spyOn>;
+
+  beforeEach(() => {
+    __resetEffortHintLatchForTesting();
+    warnSpy = spyOn(logger, 'warn').mockImplementation(() => {});
+  });
+
+  afterEach(() => {
+    warnSpy.mockRestore();
+    __resetEffortHintLatchForTesting();
+  });
+
+  it('classifies status=401 as auth_invalid', () => {
+    const sdkErr = Object.assign(new Error('unauthorized'), { status: 401 });
+    const classified = classifyClaudeError(sdkErr);
+    expect(classified.kind).toBe('auth_invalid');
+  });
+
+  it('classifies status=429 as rate_limit', () => {
+    const sdkErr = Object.assign(new Error('rate limited'), { status: 429 });
+    const classified = classifyClaudeError(sdkErr);
+    expect(classified.kind).toBe('rate_limit');
+  });
+
+  it('classifies a network error with no status as transient', () => {
+    const networkErr = new Error('ECONNRESET: socket hang up');
+    const classified = classifyClaudeError(networkErr);
+    expect(classified.kind).toBe('transient');
+  });
+});
@@ -0,0 +1,155 @@
+import { describe, it, expect, beforeAll, afterAll, beforeEach, afterEach, spyOn } from 'bun:test';
+import * as fs from 'fs';
+import { tmpdir } from 'os';
+import { join } from 'path';
+import {
+  envFilePath,
+  buildIsolatedEnv,
+  buildIsolatedEnvWithFreshOAuth,
+} from '../src/shared/EnvManager.js';
+import * as oauthToken from '../src/shared/oauth-token.js';
+
+/**
+ * Tests for issue #2375: ANTHROPIC_BASE_URL must not leak from the parent
+ * shell into the spawned worker's isolatedEnv, AND the OAuth-skip predicate
+ * must not inject the user's Anthropic OAuth token onto a custom gateway URL
+ * (which would be a token leak to a third party).
+ *
+ * Redirect EnvManager to a per-suite temp file via CLAUDE_MEM_ENV_FILE so
+ * the user's real ~/.claude-mem/.env is never read or mutated even if a test
+ * fails mid-flight. envFilePath() resolves the override on every call, so
+ * this works regardless of the order other tests imported the module.
+ */
+
+const TEST_DATA_DIR = fs.mkdtempSync(join(tmpdir(), 'claude-mem-env-isolation-'));
+const TEST_ENV_FILE = join(TEST_DATA_DIR, '.env');
+const ORIGINAL_ENV_FILE = process.env.CLAUDE_MEM_ENV_FILE;
+
+const ORIGINAL_BASE_URL = process.env.ANTHROPIC_BASE_URL;
+const ORIGINAL_API_KEY = process.env.ANTHROPIC_API_KEY;
+const ORIGINAL_AUTH_TOKEN = process.env.ANTHROPIC_AUTH_TOKEN;
+const ORIGINAL_OAUTH_TOKEN = process.env.CLAUDE_CODE_OAUTH_TOKEN;
+
+function clearEnvFile(): void {
+  if (fs.existsSync(TEST_ENV_FILE)) {
+    fs.unlinkSync(TEST_ENV_FILE);
+  }
+}
+
+function clearAnthropicEnv(): void {
+  delete process.env.ANTHROPIC_BASE_URL;
+  delete process.env.ANTHROPIC_API_KEY;
+  delete process.env.ANTHROPIC_AUTH_TOKEN;
+  delete process.env.CLAUDE_CODE_OAUTH_TOKEN;
+}
+
+function restoreOriginalEnv(): void {
+  if (ORIGINAL_BASE_URL === undefined) {
+    delete process.env.ANTHROPIC_BASE_URL;
+  } else {
+    process.env.ANTHROPIC_BASE_URL = ORIGINAL_BASE_URL;
+  }
+  if (ORIGINAL_API_KEY === undefined) {
+    delete process.env.ANTHROPIC_API_KEY;
+  } else {
+    process.env.ANTHROPIC_API_KEY = ORIGINAL_API_KEY;
+  }
+  if (ORIGINAL_AUTH_TOKEN === undefined) {
+    delete process.env.ANTHROPIC_AUTH_TOKEN;
+  } else {
+    process.env.ANTHROPIC_AUTH_TOKEN = ORIGINAL_AUTH_TOKEN;
+  }
+  if (ORIGINAL_OAUTH_TOKEN === undefined) {
+    delete process.env.CLAUDE_CODE_OAUTH_TOKEN;
+  } else {
+    process.env.CLAUDE_CODE_OAUTH_TOKEN = ORIGINAL_OAUTH_TOKEN;
+  }
+}
+
+describe('Issue #2375: ANTHROPIC_BASE_URL env-var isolation', () => {
+  beforeAll(() => {
+    fs.mkdirSync(TEST_DATA_DIR, { recursive: true, mode: 0o700 });
+    process.env.CLAUDE_MEM_ENV_FILE = TEST_ENV_FILE;
+    expect(envFilePath()).toBe(TEST_ENV_FILE);
+  });
+
+  afterAll(() => {
+    fs.rmSync(TEST_DATA_DIR, { recursive: true, force: true });
+    if (ORIGINAL_ENV_FILE === undefined) {
+      delete process.env.CLAUDE_MEM_ENV_FILE;
+    } else {
+      process.env.CLAUDE_MEM_ENV_FILE = ORIGINAL_ENV_FILE;
+    }
+  });
+
+  beforeEach(() => {
+    clearEnvFile();
+    clearAnthropicEnv();
+  });
+
+  afterEach(() => {
+    clearEnvFile();
+    restoreOriginalEnv();
+  });
+
+  it('leaked ANTHROPIC_BASE_URL is stripped from isolatedEnv', () => {
+    // No .env file exists. The parent shell sets a stray ANTHROPIC_BASE_URL —
+    // this MUST NOT propagate into the subprocess isolatedEnv, because doing
+    // so used to trigger the OAuth-skip path and leave the worker with no
+    // credentials at all.
+    process.env.ANTHROPIC_BASE_URL = 'https://shouldnotleak.example';
+
+    const result = buildIsolatedEnv();
+
+    expect(result.ANTHROPIC_BASE_URL).toBeUndefined();
+  });
+
+  it('~/.claude-mem/.env BASE_URL + AUTH_TOKEN reaches isolatedEnv', () => {
+    // User intentionally configured a gateway with a gateway-appropriate
+    // auth token. Both must be re-injected into isolatedEnv.
+    fs.writeFileSync(
+      TEST_ENV_FILE,
+      'ANTHROPIC_BASE_URL=https://gateway.example\nANTHROPIC_AUTH_TOKEN=test-token\n',
+      { mode: 0o600 },
+    );
+
+    const result = buildIsolatedEnv();
+
+    expect(result.ANTHROPIC_BASE_URL).toBe('https://gateway.example');
+    expect(result.ANTHROPIC_AUTH_TOKEN).toBe('test-token');
+  });
+
+  it('bare .env BASE_URL alone does not trigger OAuth fetch', async () => {
+    // A user with a tokenless gateway (e.g. mTLS at the network boundary)
+    // configures BASE_URL only. The three-branch predicate must hit the
+    // BASE_URL-set branch BEFORE OAuth lookup, so CLAUDE_CODE_OAUTH_TOKEN
+    // must NOT appear in the result. This is the security-regression guard
+    // against a token leak to a third-party gateway.
+    //
+    // Note: EnvManager captures readClaudeOAuthToken via a named import at
+    // module load, so spyOn on the namespace export only weakly observes
+    // the call (the binding inside EnvManager is independent). The
+    // behavioral assertions (BASE_URL re-injected AND OAuth token NOT
+    // injected) are the load-bearing checks: in the no-OAuth-injection
+    // outcome, the only execution path that produces this combination is
+    // the new BASE_URL-first branch returning early.
+    fs.writeFileSync(
+      TEST_ENV_FILE,
+      'ANTHROPIC_BASE_URL=https://gateway.example\n',
+      { mode: 0o600 },
+    );
+
+    const oauthSpy = spyOn(oauthToken, 'readClaudeOAuthToken');
+
+    try {
+      const result = await buildIsolatedEnvWithFreshOAuth();
+
+      expect(result.ANTHROPIC_BASE_URL).toBe('https://gateway.example');
+      expect(result.CLAUDE_CODE_OAUTH_TOKEN).toBeUndefined();
+      // Best-effort sanity check; see note above.
+      expect(oauthSpy).not.toHaveBeenCalled();
+    } finally {
+      oauthSpy.mockRestore();
+    }
+  });
+});
@@ -0,0 +1,228 @@
+import { describe, it, expect, beforeEach, mock } from 'bun:test';
+
+// Singleton enforcement regression coverage for issue #2313.
+//
+// Hypothesis under test: prior to the fix, ChromaMcpManager could leak its
+// chroma-mcp subprocess tree on every reconnect / transport error, accumulating
+// 20+ instances per session on Linux because the MCP SDK's transport.close()
+// only signals the direct child (uvx). The fix routes every "abandon current
+// transport" path through disposeCurrentSubprocess(), which tree-kills via
+// killProcessTree() before nulling the handles.
+
+let transportCount = 0;
+const transportInstances: Array<FakeTransport> = [];
+
+interface FakeChildProcess {
+  pid: number;
+  once: (event: string, _cb: (...args: unknown[]) => void) => FakeChildProcess;
+  on: (event: string, _cb: (...args: unknown[]) => void) => FakeChildProcess;
+}
+
+class FakeTransport {
+  static nextPid = 100_000;
+  onclose: (() => void) | null = null;
+  closed = false;
+  // Mimic StdioClientTransport's internal `_process` field that the manager
+  // pokes into via `(this.transport as unknown as { _process })._process`.
+  _process: FakeChildProcess;
+
+  constructor(_opts: { command: string; args: string[] }) {
+    transportCount += 1;
+    const pid = FakeTransport.nextPid++;
+    const child: FakeChildProcess = {
+      pid,
+      once: function (this: FakeChildProcess) { return this; },
+      on: function (this: FakeChildProcess) { return this; },
+    };
+    this._process = child;
+    transportInstances.push(this);
+  }
+
+  async close(): Promise<void> {
+    this.closed = true;
+  }
+}
+
+mock.module('@modelcontextprotocol/sdk/client/stdio.js', () => ({
+  StdioClientTransport: FakeTransport,
+}));
+
+let connectImpl: () => Promise<void> = async () => {};
+let callToolImpl: () => Promise<unknown> = async () => ({
+  content: [{ type: 'text', text: '{}' }],
+});
+
+class FakeClient {
+  closed = false;
+  async connect(): Promise<void> {
+    await connectImpl();
+  }
+  async callTool(): Promise<unknown> {
+    return await callToolImpl();
+  }
+  async close(): Promise<void> {
+    this.closed = true;
+  }
+}
+
+mock.module('@modelcontextprotocol/sdk/client/index.js', () => ({
+  Client: FakeClient,
+}));
+
+mock.module('../../../src/shared/SettingsDefaultsManager.js', () => ({
+  SettingsDefaultsManager: {
+    get: () => '',
+    getInt: () => 0,
+    loadFromFile: () => ({}),
+  },
+}));
+
+mock.module('../../../src/shared/paths.js', () => ({
+  USER_SETTINGS_PATH: '/tmp/fake-settings.json',
+  paths: {
+    chroma: () => '/tmp/fake-chroma',
+    combinedCerts: () => '/tmp/fake-combined-certs.pem',
+  },
+}));
+
+mock.module('../../../src/utils/logger.js', () => ({
+  logger: {
+    info: () => {},
+    debug: () => {},
+    warn: () => {},
+    error: () => {},
+    failure: () => {},
+  },
+}));
+
+// Track tree-kill invocations and the transport whose subprocess was killed.
+const killTreeCalls: number[] = [];
+
+mock.module('../../../src/supervisor/index.ts', () => ({
+  getSupervisor: () => ({
+    assertCanSpawn: () => {},
+    registerProcess: () => {},
+    unregisterProcess: () => {},
+  }),
+}));
+
+mock.module('../../../src/supervisor/env-sanitizer.js', () => ({
+  sanitizeEnv: (env: NodeJS.ProcessEnv) => env,
+}));
+
+// Replace child_process.execFile so the static killProcessTree implementation
+// can be observed without actually shelling out. We feed pgrep an empty stdout
+// (no descendants) so the only signal target is the root pid.
+mock.module('child_process', () => {
+  const original = require('node:child_process');
+  return {
+    ...original,
+    execFile: (
+      cmd: string,
+      args: string[],
+      _opts: unknown,
+      cb: (err: Error | null, stdout: { stdout: string; stderr: string }) => void
+    ) => {
+      // Bun's promisify path will call this as if it were a Node-style callback.
+      if (cmd === 'pgrep') {
+        cb(null, { stdout: '', stderr: '' } as any);
+      } else {
+        cb(null, { stdout: '', stderr: '' } as any);
+      }
+    },
+    execSync: () => '',
+  };
+});
+
+// Stub process.kill so the tree-kill path can record targets without crashing
+// the test runner if the synthetic PID happens to collide with a real one.
+const realProcessKill = process.kill.bind(process);
+const stubbedProcessKill = ((pid: number, _signal?: string | number) => {
+  killTreeCalls.push(pid);
+  return true;
+}) as typeof process.kill;
+process.kill = stubbedProcessKill;
+
+import { ChromaMcpManager } from '../../../src/services/sync/ChromaMcpManager.js';
+
+function resetState(): void {
+  transportCount = 0;
+  transportInstances.length = 0;
+  killTreeCalls.length = 0;
+  connectImpl = async () => {};
+  callToolImpl = async () => ({ content: [{ type: 'text', text: '{}' }] });
+}
+
+describe('ChromaMcpManager singleton enforcement (#2313)', () => {
+  beforeEach(async () => {
+    await ChromaMcpManager.reset();
+    resetState();
+  });
+
+  it('serializes concurrent ensureConnected() calls into one spawn', async () => {
+    const mgr = ChromaMcpManager.getInstance();
+
+    // Five parallel callers race ensureConnected via callTool — only one
+    // chroma-mcp subprocess (one transport) should be spawned.
+    await Promise.all(
+      Array.from({ length: 5 }, () =>
+        mgr.callTool('chroma_list_collections', { limit: 1 })
+      )
+    );
+
+    expect(transportCount).toBe(1);
+  });
+
+  it('kills the prior subprocess tree before a reconnect spawn', async () => {
+    const mgr = ChromaMcpManager.getInstance();
+
+    // First call: opens transport #1.
+    await mgr.callTool('chroma_list_collections', { limit: 1 });
+    expect(transportInstances.length).toBe(1);
+    const firstPid = transportInstances[0]._process.pid;
+
+    // Second call: rig callTool to throw a transport error on the FIRST attempt
+    // so the manager runs its reconnect-and-retry path. The retry should
+    // dispose the prior subprocess tree (firstPid) before spawning a new one.
+    let invocations = 0;
+    callToolImpl = async () => {
+      invocations += 1;
+      if (invocations === 1) {
+        throw new Error('Connection closed');
+      }
+      return { content: [{ type: 'text', text: '{}' }] };
+    };
+
+    await mgr.callTool('chroma_list_collections', { limit: 1 });
+
+    expect(transportInstances.length).toBe(2);
+    // The first transport's pid must have been signaled by killProcessTree
+    // before the second transport spawned.
+    expect(killTreeCalls).toContain(firstPid);
+  });
+
+  it('stop() disposes state including any pending connecting promise', async () => {
+    const mgr = ChromaMcpManager.getInstance();
+
+    await mgr.callTool('chroma_list_collections', { limit: 1 });
+    expect(transportInstances.length).toBe(1);
+    const subprocessPid = transportInstances[0]._process.pid;
+
+    await mgr.stop();
+
+    // After stop(), every internal handle should be cleared and the prior
+    // subprocess tree must have been signaled.
+    expect(killTreeCalls).toContain(subprocessPid);
+
+    // A subsequent ensureConnected must spawn a fresh transport (not reuse
+    // a stale one).
+    await mgr.callTool('chroma_list_collections', { limit: 1 });
+    expect(transportInstances.length).toBe(2);
+  });
+});
+
+// Restore the real process.kill once the test module finishes evaluating any
+// late-arriving microtasks.
+process.on('exit', () => {
+  process.kill = realProcessKill;
+});