Files
claude-mem/tests/claude-provider-error-classifier.test.ts
T
Alex Newman 5533412984 fix: sequoia-territory bug-fix bundle (chroma, env, build, MCP, worker) (#2394)
* fix(mcp): drop ${_R%/} parameter-expansion trim that trips Claude Code MCP validator

The POSIX substring trim ${_R%/} is misread by Claude Code's MCP-config
validator as a required env var named "_R%/", causing /doctor to flag
mcp-search as invalid on every install. POSIX collapses // in paths, so
the trim was cosmetic — drop it and the validator passes.

Fixes #2350, #2354, #2356.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(env): block ANTHROPIC_BASE_URL leak + three-branch OAuth-skip predicate

Issue #2375: parent-shell ANTHROPIC_BASE_URL leaked through to subprocess
isolatedEnv, while ANTHROPIC_AUTH_TOKEN was blocked. The OAuth-skip
predicate fired on bare BASE_URL, but no auth credential reached the
subprocess -> "Not logged in". Add ANTHROPIC_BASE_URL to BLOCKED_ENV_VARS
so it can only enter isolatedEnv via ~/.claude-mem/.env.

Replace the OAuth-skip predicate with three branches to prevent a
second-order security regression: a user with a tokenless gateway
configured in .env (BASE_URL only, no token) would otherwise have their
Anthropic OAuth token fetched and sent to their gateway. Token leak to
third party. Three-branch predicate:

1. BASE_URL set -> return without OAuth (custom gateway, never leak token)
2. API_KEY or AUTH_TOKEN set -> return without OAuth (explicit credentials)
3. Otherwise -> OAuth lookup for api.anthropic.com

Adds tests/env-isolation.test.ts.

Fixes #2375.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(worker): classify Claude SDK HTTP 400 as unrecoverable

ClaudeProvider previously had no explicit HTTP 400 handling — the
default branch classified all errors as `transient`, so a permanent
400 (e.g., model rejecting an `effort` parameter forwarded from a
leaked CLAUDE_CODE_EFFORT_LEVEL) would be retried indefinitely
(#1874+ retries observed in one session per #2357).

Mirror GeminiProvider/OpenRouterProvider's pattern: classify 400 as
`unrecoverable`, 401/403 as `auth_invalid`, 429 as `rate_limit`,
default to `transient`. When the 400 body matches the
"effort parameter" signature, emit a one-time SDK warn log pointing
at the env-leak fix in ~/.claude-mem/.env.

Adds tests/claude-provider-error-classifier.test.ts.

Fixes #2357.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroma): pin onnxruntime>=1.20 + protobuf<7 to fix INVALID_PROTOBUF on macOS arm64

The shipped all-MiniLM-L6-v2 model has pytorch-2.0 IR. chroma-mcp 0.2.6
transitively depends on `chromadb>=1.0.16` which only requires
`onnxruntime>=1.14.1` — uv can therefore resolve to an onnxruntime old
enough to fail every embedding add with `[ONNXRuntimeError] : 7 :
INVALID_PROTOBUF` on macOS arm64 / Python 3.13. Semantic search silently
degraded to FTS-only and smart backfill broke (#2371).

Path B (override) was required because chroma-mcp 0.2.6 is the latest
PyPI release — no upstream bump exists.

Inject `--with onnxruntime>=1.20 --with protobuf<7` into the uvx spawn
args (both persistent and remote modes). The protobuf cap is essential:
forcing only `onnxruntime>=1.20` causes uv to re-resolve and land on
protobuf 7.x, which trips opentelemetry's `_pb2` stubs with `TypeError:
Descriptors cannot be created directly` because they were generated
with protoc <3.19. Capping below 7 lands on protobuf 6.x which
opentelemetry tolerates.

Verified end-to-end: ONNX model loads, embeddings produce a 384-dim
vector, PersistentClient init / add / query roundtrip succeeds:

    uvx --python 3.13 --with "onnxruntime>=1.20" --with "protobuf<7" \
        chroma-mcp==0.2.6 --help     # clean
    # programmatic test: onnxruntime 1.26.0, protobuf 6.33.6,
    # embedding ok 384, query ok ids=[['1']]

Fixes #2371.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroma): enforce single chroma-mcp subprocess per worker (#2313)

Root cause: every reconnect path in ChromaMcpManager — connectInternal's
re-entry, the connect-timeout catch, callTool's transport-error retry, and
the transport.onclose handler — used to abandon `this.transport`/`this.client`
by calling at most `transport.close()` and nulling the handles. The MCP SDK's
StdioClientTransport.close() only signals the direct child (uvx); on Linux the
grandchildren (uv -> python -> chroma-mcp) re-parent to init and survive
because the SDK does not put the subprocess in its own process group. Each
reconnect therefore leaked a full chroma-mcp tree, accumulating 20+ instances
per session.

Fix: introduce a private disposeCurrentSubprocess() helper that always tree-
kills via the existing killProcessTree primitive before nulling the transport
reference, and route every "abandon current transport" path (reconnect,
connect-timeout, transport error, onclose, stop) through it. The existing
`connecting: Promise<void> | null` lock continues to serialize concurrent
ensureConnected() callers into a single spawn.

Adds tests/services/sync/chroma-mcp-manager-singleton.test.ts covering:
- 5 parallel ensureConnected() calls produce exactly one spawn
- a transport-error reconnect tree-kills the prior subprocess pid before
  spawning a replacement
- stop() disposes state including any pending connecting promise

Manual verification needed on Linux: after a long session with multiple
tool uses, `ps aux | grep chroma-mcp | wc -l` should return 1, not 20+.

Fixes #2313.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build): polyfill import.meta.url to __filename in CJS worker bundle

The worker bundles ESM dependencies (notably @anthropic-ai/claude-agent-sdk's
*.mjs files) into CJS output. Those modules call createRequire(import.meta.url)
at module-load time. esbuild's CJS output left this as createRequire(ute.url)
— where `ute` is its `import.meta` polyfill `{}` — so `ute.url` was undefined
and module-load crashed with:

  TypeError: The argument 'filename' must be a file URL object, file URL
  string, or absolute path string. Received undefined
  code: ERR_INVALID_ARG_VALUE

Every Stop hook and every worker subprocess invocation hit this. Fix is the
esbuild `define` option mapping `import.meta.url` to `__filename` (provided as
a real absolute path by the existing CJS prelude in the banner).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: daily dep bump per CLAUDE.md maintenance policy

Root: @anthropic-ai/claude-agent-sdk, @clack/prompts, @types/node,
dompurify, postcss, react, react-dom, yaml, zod.
plugin/: tree-sitter-cli, zod.
openclaw/: @types/node.

All patch/minor bumps; no major version changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* build: regenerate plugin artifacts after env/chroma/mcp fixes

Built artifacts are committed so the marketplace-installable plugin
ships with the runtime bundles. Picks up:
- d7b145e9 .mcp.json shell-prelude trim drop
- a8cbd651 EnvManager BASE_URL block + 3-branch predicate
- 8cb73b8c ClaudeProvider HTTP 400 unrecoverable classifier
- ecd5b802 ChromaMcpManager onnxruntime/protobuf overrides
- c79324ea ChromaMcpManager singleton enforcement
- e8376f46 esbuild import.meta.url -> __filename polyfill
- a7541d71 daily dep bump

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* build: regenerate plugin artifacts after main merge

Bundles now include both v13.0.0 server-beta runtime (server-beta-service.cjs
+ updated mcp-server.cjs / worker-service.cjs) and this branch's chroma /
env / build / Claude SDK fixes.

Verified: bun test tests/env-isolation.test.ts \\
  tests/claude-provider-error-classifier.test.ts \\
  tests/services/sync/chroma-mcp-manager-singleton.test.ts
→ 13/13 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): address CodeRabbit findings on PR #2394

1. scripts/build-hooks.js — `import.meta.url` now maps to a file:// URL
   (via pathToFileURL(__filename).href in the CJS banner) instead of the
   raw __filename path. Preserves URL semantics for any bundled ESM dep
   that does `new URL(rel, import.meta.url)`. createRequire still works.

2. src/shared/EnvManager.ts — added envFilePath() that resolves
   CLAUDE_MEM_ENV_FILE lazily (falling back to paths.envFile()), and
   switched internal load/save call sites to use it. ENV_FILE_PATH is
   kept as a deprecated snapshot for back-compat. Lets tests target a
   temp file without depending on module-load order.

3. tests/env-isolation.test.ts — redirects to a temp dir via
   CLAUDE_MEM_ENV_FILE in beforeAll, removes all mutation of the real
   ~/.claude-mem/.env, and wraps the OAuth-spy assertion in try/finally
   so the spy is always restored even if the test fails.

Verified:
  bun test tests/env-isolation.test.ts \
    tests/claude-provider-error-classifier.test.ts \
    tests/services/sync/chroma-mcp-manager-singleton.test.ts
  → 13/13 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 18:05:48 -07:00

123 lines
4.0 KiB
TypeScript

import { describe, it, expect, beforeEach, afterEach, spyOn } from 'bun:test';
import {
classifyClaudeError,
__resetEffortHintLatchForTesting,
} from '../src/services/worker/ClaudeProvider.js';
import { isClassified } from '../src/services/worker/provider-errors.js';
import { logger } from '../src/utils/logger.js';
/**
* Tests for HTTP 400 classification in ClaudeProvider's classifyClaudeError.
*
* Regression coverage for #2357: ClaudeProvider previously had no explicit
* HTTP 400 handling, so the default branch classified all 400s as `transient`
* and the retry loop would hammer a permanent error indefinitely (e.g. when
* CLAUDE_CODE_EFFORT_LEVEL leaks into the SDK subprocess and the model
* rejects the `effort` parameter).
*/
describe('classifyClaudeError — HTTP 400 handling (#2357)', () => {
let warnSpy: ReturnType<typeof spyOn>;
beforeEach(() => {
__resetEffortHintLatchForTesting();
warnSpy = spyOn(logger, 'warn').mockImplementation(() => {});
});
afterEach(() => {
warnSpy.mockRestore();
__resetEffortHintLatchForTesting();
});
it('classifies 400 with "effort parameter" body as unrecoverable AND logs an SDK warn once', () => {
const sdkErr = Object.assign(
new Error('This model does not support the effort parameter.'),
{ status: 400 },
);
const classified = classifyClaudeError(sdkErr);
expect(isClassified(classified)).toBe(true);
expect(classified.kind).toBe('unrecoverable');
expect(warnSpy).toHaveBeenCalledTimes(1);
// First positional arg of logger.warn is the component category.
const [component, hintMessage] = warnSpy.mock.calls[0] as [string, string, ...unknown[]];
expect(component).toBe('SDK');
expect(hintMessage).toMatch(/effort/i);
expect(hintMessage).toMatch(/2357/);
});
it('classifies 400 with effort marker in a structured body field', () => {
const sdkErr = Object.assign(
new Error('Bad request'),
{
status: 400,
body: { error: { message: 'This model does not support the effort parameter.' } },
},
);
const classified = classifyClaudeError(sdkErr);
expect(classified.kind).toBe('unrecoverable');
expect(warnSpy).toHaveBeenCalledTimes(1);
});
it('classifies 400 without effort body as unrecoverable WITHOUT firing the effort hint', () => {
const sdkErr = Object.assign(
new Error('some other 400 error'),
{ status: 400 },
);
const classified = classifyClaudeError(sdkErr);
expect(classified.kind).toBe('unrecoverable');
expect(warnSpy).not.toHaveBeenCalled();
});
it('throttles the effort hint to one log per process even on repeated 400s', () => {
const sdkErr = Object.assign(
new Error('This model does not support the effort parameter.'),
{ status: 400 },
);
for (let i = 0; i < 5; i++) {
const classified = classifyClaudeError(sdkErr);
expect(classified.kind).toBe('unrecoverable');
}
expect(warnSpy).toHaveBeenCalledTimes(1);
});
});
describe('classifyClaudeError — sibling status codes (regression sanity)', () => {
let warnSpy: ReturnType<typeof spyOn>;
beforeEach(() => {
__resetEffortHintLatchForTesting();
warnSpy = spyOn(logger, 'warn').mockImplementation(() => {});
});
afterEach(() => {
warnSpy.mockRestore();
__resetEffortHintLatchForTesting();
});
it('classifies status=401 as auth_invalid', () => {
const sdkErr = Object.assign(new Error('unauthorized'), { status: 401 });
const classified = classifyClaudeError(sdkErr);
expect(classified.kind).toBe('auth_invalid');
});
it('classifies status=429 as rate_limit', () => {
const sdkErr = Object.assign(new Error('rate limited'), { status: 429 });
const classified = classifyClaudeError(sdkErr);
expect(classified.kind).toBe('rate_limit');
});
it('classifies a network error with no status as transient', () => {
const networkErr = new Error('ECONNRESET: socket hang up');
const classified = classifyClaudeError(networkErr);
expect(classified.kind).toBe('transient');
});
});