* fix(mcp): drop ${_R%/} parameter-expansion trim that trips Claude Code MCP validator
The POSIX substring trim ${_R%/} is misread by Claude Code's MCP-config
validator as a required env var named "_R%/", causing /doctor to flag
mcp-search as invalid on every install. POSIX collapses // in paths, so
the trim was cosmetic — drop it and the validator passes.
Fixes#2350, #2354, #2356.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(env): block ANTHROPIC_BASE_URL leak + three-branch OAuth-skip predicate
Issue #2375: parent-shell ANTHROPIC_BASE_URL leaked through to subprocess
isolatedEnv, while ANTHROPIC_AUTH_TOKEN was blocked. The OAuth-skip
predicate fired on bare BASE_URL, but no auth credential reached the
subprocess -> "Not logged in". Add ANTHROPIC_BASE_URL to BLOCKED_ENV_VARS
so it can only enter isolatedEnv via ~/.claude-mem/.env.
Replace the OAuth-skip predicate with three branches to prevent a
second-order security regression: a user with a tokenless gateway
configured in .env (BASE_URL only, no token) would otherwise have their
Anthropic OAuth token fetched and sent to their gateway. Token leak to
third party. Three-branch predicate:
1. BASE_URL set -> return without OAuth (custom gateway, never leak token)
2. API_KEY or AUTH_TOKEN set -> return without OAuth (explicit credentials)
3. Otherwise -> OAuth lookup for api.anthropic.com
Adds tests/env-isolation.test.ts.
Fixes#2375.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(worker): classify Claude SDK HTTP 400 as unrecoverable
ClaudeProvider previously had no explicit HTTP 400 handling — the
default branch classified all errors as `transient`, so a permanent
400 (e.g., model rejecting an `effort` parameter forwarded from a
leaked CLAUDE_CODE_EFFORT_LEVEL) would be retried indefinitely
(#1874+ retries observed in one session per #2357).
Mirror GeminiProvider/OpenRouterProvider's pattern: classify 400 as
`unrecoverable`, 401/403 as `auth_invalid`, 429 as `rate_limit`,
default to `transient`. When the 400 body matches the
"effort parameter" signature, emit a one-time SDK warn log pointing
at the env-leak fix in ~/.claude-mem/.env.
Adds tests/claude-provider-error-classifier.test.ts.
Fixes#2357.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroma): pin onnxruntime>=1.20 + protobuf<7 to fix INVALID_PROTOBUF on macOS arm64
The shipped all-MiniLM-L6-v2 model has pytorch-2.0 IR. chroma-mcp 0.2.6
transitively depends on `chromadb>=1.0.16` which only requires
`onnxruntime>=1.14.1` — uv can therefore resolve to an onnxruntime old
enough to fail every embedding add with `[ONNXRuntimeError] : 7 :
INVALID_PROTOBUF` on macOS arm64 / Python 3.13. Semantic search silently
degraded to FTS-only and smart backfill broke (#2371).
Path B (override) was required because chroma-mcp 0.2.6 is the latest
PyPI release — no upstream bump exists.
Inject `--with onnxruntime>=1.20 --with protobuf<7` into the uvx spawn
args (both persistent and remote modes). The protobuf cap is essential:
forcing only `onnxruntime>=1.20` causes uv to re-resolve and land on
protobuf 7.x, which trips opentelemetry's `_pb2` stubs with `TypeError:
Descriptors cannot be created directly` because they were generated
with protoc <3.19. Capping below 7 lands on protobuf 6.x which
opentelemetry tolerates.
Verified end-to-end: ONNX model loads, embeddings produce a 384-dim
vector, PersistentClient init / add / query roundtrip succeeds:
uvx --python 3.13 --with "onnxruntime>=1.20" --with "protobuf<7" \
chroma-mcp==0.2.6 --help # clean
# programmatic test: onnxruntime 1.26.0, protobuf 6.33.6,
# embedding ok 384, query ok ids=[['1']]
Fixes#2371.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroma): enforce single chroma-mcp subprocess per worker (#2313)
Root cause: every reconnect path in ChromaMcpManager — connectInternal's
re-entry, the connect-timeout catch, callTool's transport-error retry, and
the transport.onclose handler — used to abandon `this.transport`/`this.client`
by calling at most `transport.close()` and nulling the handles. The MCP SDK's
StdioClientTransport.close() only signals the direct child (uvx); on Linux the
grandchildren (uv -> python -> chroma-mcp) re-parent to init and survive
because the SDK does not put the subprocess in its own process group. Each
reconnect therefore leaked a full chroma-mcp tree, accumulating 20+ instances
per session.
Fix: introduce a private disposeCurrentSubprocess() helper that always tree-
kills via the existing killProcessTree primitive before nulling the transport
reference, and route every "abandon current transport" path (reconnect,
connect-timeout, transport error, onclose, stop) through it. The existing
`connecting: Promise<void> | null` lock continues to serialize concurrent
ensureConnected() callers into a single spawn.
Adds tests/services/sync/chroma-mcp-manager-singleton.test.ts covering:
- 5 parallel ensureConnected() calls produce exactly one spawn
- a transport-error reconnect tree-kills the prior subprocess pid before
spawning a replacement
- stop() disposes state including any pending connecting promise
Manual verification needed on Linux: after a long session with multiple
tool uses, `ps aux | grep chroma-mcp | wc -l` should return 1, not 20+.
Fixes#2313.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(build): polyfill import.meta.url to __filename in CJS worker bundle
The worker bundles ESM dependencies (notably @anthropic-ai/claude-agent-sdk's
*.mjs files) into CJS output. Those modules call createRequire(import.meta.url)
at module-load time. esbuild's CJS output left this as createRequire(ute.url)
— where `ute` is its `import.meta` polyfill `{}` — so `ute.url` was undefined
and module-load crashed with:
TypeError: The argument 'filename' must be a file URL object, file URL
string, or absolute path string. Received undefined
code: ERR_INVALID_ARG_VALUE
Every Stop hook and every worker subprocess invocation hit this. Fix is the
esbuild `define` option mapping `import.meta.url` to `__filename` (provided as
a real absolute path by the existing CJS prelude in the banner).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: daily dep bump per CLAUDE.md maintenance policy
Root: @anthropic-ai/claude-agent-sdk, @clack/prompts, @types/node,
dompurify, postcss, react, react-dom, yaml, zod.
plugin/: tree-sitter-cli, zod.
openclaw/: @types/node.
All patch/minor bumps; no major version changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* build: regenerate plugin artifacts after env/chroma/mcp fixes
Built artifacts are committed so the marketplace-installable plugin
ships with the runtime bundles. Picks up:
- d7b145e9 .mcp.json shell-prelude trim drop
- a8cbd651 EnvManager BASE_URL block + 3-branch predicate
- 8cb73b8c ClaudeProvider HTTP 400 unrecoverable classifier
- ecd5b802 ChromaMcpManager onnxruntime/protobuf overrides
- c79324ea ChromaMcpManager singleton enforcement
- e8376f46 esbuild import.meta.url -> __filename polyfill
- a7541d71 daily dep bump
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* build: regenerate plugin artifacts after main merge
Bundles now include both v13.0.0 server-beta runtime (server-beta-service.cjs
+ updated mcp-server.cjs / worker-service.cjs) and this branch's chroma /
env / build / Claude SDK fixes.
Verified: bun test tests/env-isolation.test.ts \\
tests/claude-provider-error-classifier.test.ts \\
tests/services/sync/chroma-mcp-manager-singleton.test.ts
→ 13/13 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(review): address CodeRabbit findings on PR #2394
1. scripts/build-hooks.js — `import.meta.url` now maps to a file:// URL
(via pathToFileURL(__filename).href in the CJS banner) instead of the
raw __filename path. Preserves URL semantics for any bundled ESM dep
that does `new URL(rel, import.meta.url)`. createRequire still works.
2. src/shared/EnvManager.ts — added envFilePath() that resolves
CLAUDE_MEM_ENV_FILE lazily (falling back to paths.envFile()), and
switched internal load/save call sites to use it. ENV_FILE_PATH is
kept as a deprecated snapshot for back-compat. Lets tests target a
temp file without depending on module-load order.
3. tests/env-isolation.test.ts — redirects to a temp dir via
CLAUDE_MEM_ENV_FILE in beforeAll, removes all mutation of the real
~/.claude-mem/.env, and wraps the OAuth-spy assertion in try/finally
so the spy is always restored even if the test fails.
Verified:
bun test tests/env-isolation.test.ts \
tests/claude-provider-error-classifier.test.ts \
tests/services/sync/chroma-mcp-manager-singleton.test.ts
→ 13/13 pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Major version bump following PR #2351 merge — server-beta runtime,
Postgres observation storage, BullMQ queue engine, and Apache 2.0
relicense are now on main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add server beta runtime foundation
* Address server beta review findings
* Resolve server beta review comments
* Tighten server beta review follow-ups
* Harden server beta auth and search
* Avoid unnecessary FTS rebuilds
* Block scoped keys from creating projects
* Release BullMQ claims best effort on close
* Address server beta review blockers
* Reset BullMQ claims best effort
* Add Postgres observation storage foundation
* feat(server-beta): add independent runtime service
Introduce src/server/runtime/ as a self-contained server-beta runtime
that owns its lifecycle, Postgres bootstrap, and HTTP boundary without
depending on WorkerService.
ServerBetaService wraps the existing Server class, exposes
/healthz and /v1/info with runtime="server-beta", and persists state
to dedicated paths (.server-beta.pid|.port|.runtime.json). The four
boundary managers (queue, generation worker, provider registry, event
broadcaster) are intentionally disabled in this phase and report their
status through /v1/info; later phases activate them.
Adds plans/2026-05-07-finish-bullmq-branch-ship-plan.md to track the
remaining work for this branch.
Phase 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): route CLI lifecycle and bundle separate runtime
scripts/build-hooks.js now produces plugin/scripts/server-beta-service.cjs
as a separate Node CJS bundle, alongside the existing worker-service
bundle. The server-beta runtime is now installable independently.
src/npx-cli/commands/server.ts routes start|stop|restart|status to the
server-beta lifecycle instead of the legacy worker. The worker keeps its
own start|stop|restart|status under the worker namespace; the two
runtimes can be operated independently.
src/services/worker-service.ts adds a server-* command parser branch
that delegates to the sibling server-beta-service.cjs bundle so
direct worker-service invocations still route to the right runtime.
tests/npx-cli-server-namespace.test.ts updated to expect server-beta
lifecycle routing.
Includes rebuilt plugin/scripts/*.cjs bundles produced by
build-and-sync.
Phase 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): add BullMQ job queue primitives
Introduce src/server/jobs/ as the queue-side primitives that Phase 3 of
the server-beta runtime needs to operate.
types.ts defines a discriminated union over the four job kinds (event,
event-batch, summary, reindex) and maps each to a per-kind BullMQ queue
name and deterministic-ID prefix.
job-id.ts builds deterministic, colon-free BullMQ jobIds from
(kind, team, project, source). The colon ban exists because BullMQ uses
':' as a Redis key separator internally; embedding ':' in jobIds
breaks scan and state lookups.
ServerJobQueue.ts is a thin wrapper over BullMQ Queue + Worker that
enforces autorun:false, default concurrency 1, and an attached error
listener — all per BullMQ docs requirements. Test seams accept queue
and worker factories so unit tests do not need Redis.
outbox.ts publishes through the Postgres ObservationGenerationJob
repository as canonical history. enqueueOutbox writes the row first,
then publishes to BullMQ; if BullMQ throws, the row is transitioned to
failed and a failed event is appended. reconcileOnStartup re-enqueues
queued + processing rows after a restart, replacing terminal BullMQ
jobs that may still be holding the deterministic ID slot. markCompleted
and markFailed wrap transitionStatus and append the matching event row.
Includes 20 unit tests covering deterministic ID stability, colon-free
output, queue lifecycle, error-listener attachment, double-start
refusal, idempotent enqueue, BullMQ failure rollback, startup
reconciliation, max-attempts skipping, and completion / failure /
retry transitions.
Phase 3 commit 1 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): activate queue boundary in runtime service
Wire ActiveServerBetaQueueManager into the server-beta runtime graph.
The active manager owns one ServerJobQueue per generation kind (event,
event-batch, summary, reindex) and surfaces lane metadata through
boundary health.
Selection is opt-in and fail-fast: if CLAUDE_MEM_QUEUE_ENGINE is set to
bullmq the active manager is constructed (and any Redis/config error
throws — no silent fallback to SQLite, per Phase 3 anti-pattern guard).
For any other engine the disabled boundary remains so worker-era and
test setups stay compatible.
Widens ServerBetaBoundaryHealth.status to a discriminated union
('disabled' | 'active' | 'errored') with optional details. The disabled
adapter still emits status='disabled', which keeps the existing
server-beta-service test green.
ServerBetaService receives the manager through a new optional
queueManager field on CreateServerBetaServiceOptions so test graphs
and Phase 4 wiring can inject custom managers.
Adds tests/server/runtime/active-queue-manager.test.ts covering bullmq
guard, active health shape, per-kind queue access, close behavior, and
post-close errored health.
Phase 3 commit 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(server-beta): cap /v1/events/batch at 500 events
Prevents unbounded array DoS surface flagged in PR review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Correct hook lifecycle list: 6 hooks (Setup, SessionStart,
UserPromptSubmit, PreToolUse, PostToolUse, Stop), not the
fictional 'Summary' / 'SessionEnd' pair.
- Replace misleading 'src/hooks/*.ts' description with the actual
build path from src/services/worker-service.ts via
scripts/build-hooks.js, and list the real subcommands.
- Drop the broken link to private/context/claude-code/exit-codes.md
(path no longer exists in the repo).
PR #2300 moved 21 tree-sitter grammar packages from devDependencies into
root dependencies, claiming "their .wasm files are loaded at runtime by
parser.ts." That justification is wrong for the root claude-mem npm
package: parser.ts compiles into plugin/scripts/worker-service.cjs, which
runs from the marketplace folder where plugin/package.json already lists
every grammar as a runtime dep. Nothing in dist/npx-cli/ ever loads a
grammar, and resolveGrammarPath() handles missing packages gracefully.
The regression: `npx claude-mem@12.6.1 install` now fetches all 21
grammars at npx time. tree-sitter-swift's postinstall pulls a nested
tree-sitter-cli that downloads a Rust binary from GitHub and hangs the
install. npm ignores the trustedDependencies bun-allowlist, so there's
no way to skip the postinstall scripts on a bare `npx` fetch.
Fix: move grammar packages back to root devDependencies. The marketplace
plugin install (installPluginDependencies → bun install in plugin/) still
works because plugin/package.json keeps them as deps and Bun honors
trustedDependencies: ["tree-sitter-cli"] to skip the harmful postinstalls
on every other grammar.
Keep PR #2300's --legacy-peer-deps + --omit=dev install.ts changes —
those address a separate, valid marketplace ERESOLVE.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- waitForSlot now accepts an optional AbortSignal. When the signal
fires (e.g. session.abortController.abort() during shutdown or
cancel), the queued waiter is removed from slotWaiters and the
promise rejects immediately, instead of hanging until a slot
naturally opens. Restores the cancellation guarantee that the
removed 60s timeout used to provide. ClaudeProvider.startSession
now passes session.abortController.signal at the call site.
- EnvManager: a bare ANTHROPIC_BASE_URL now also short-circuits the
OAuth lookup. Tokenless gateways (allowed by the new install flow)
were otherwise being authenticated against api.anthropic.com via the
injected OS-keychain OAuth token.
- install.ts: resolveClaudeAuthMethod now reads the raw stored
CLAUDE_MEM_CLAUDE_AUTH_METHOD value via a direct settings.json read
(readRawStoredAuthMethod), bypassing SettingsDefaultsManager's
default backfill. Without this, getSetting() always returned
'subscription' for unmigrated installs and the env-based fallback
never ran — so the previous fix only addressed the optics, not
the actual misclassification.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- EnvManager: add ANTHROPIC_AUTH_TOKEN to BLOCKED_ENV_VARS so a token
inherited from the parent shell can no longer short-circuit the OAuth
lookup at SDK spawn time. Mirrors the ANTHROPIC_API_KEY treatment
added in issue #733. Explicit gateway tokens in
~/.claude-mem/.env are still re-injected by buildIsolatedEnv().
- install.ts: extract resolveClaudeAuthMethod() that returns a stored
CLAUDE_MEM_CLAUDE_AUTH_METHOD when present and otherwise infers
the mode from ~/.claude-mem/.env (ANTHROPIC_BASE_URL → gateway,
ANTHROPIC_API_KEY → api-key, else subscription). persistClaudeProvider,
the interactive Claude auth flow, and promptClaudeModel now use it,
so older installs that pre-date the setting are no longer
misclassified as 'subscription' (which would clear working
credentials and disable custom gateway models).
- configureDirectApiKey: when an Anthropic API key already exists,
prompt to keep or rotate it instead of silently re-saving — restores
the ability to update a revoked or rotated key from the installer
without losing the cancel-safe behaviour added in 7f3686fd.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Cancel of API-key / Gateway-URL prompts no longer wipes existing
credentials by switching to subscription auth and emptying
ANTHROPIC_API_KEY / ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN. Cancel
now leaves the prior config untouched.
- Empty gateway-token input preserves the existing token instead of
clearing it. The new prompt copy explains that blank keeps the
current token.
- Interactive install no longer hard-locks to Claude when
--provider is unset. Prompt now asks for provider
(claude/gemini/openrouter) up front, then runs the Claude auth flow
only when the user picks Claude.
- Claude auth-mode prompt now seeds initialValue from the stored
CLAUDE_MEM_CLAUDE_AUTH_METHOD setting, so reruns honor existing
configuration instead of always defaulting to subscription.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit flagged a duplicate slot wakeup: spawnSdkProcess's child
'exit' handler called registry.unregister(recordId) and then
notifySlotAvailable() unconditionally. Registry.unregister() already
fires notifySlotAvailable() internally when removing an SDK entry, so
the trailing call woke a second waiter for the same freed slot — both
could see count < maxConcurrent in the same synchronous tick before
either replacement registered, transiently exceeding maxConcurrent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- process-registry.ts: skip the trailing notifySlotAvailable() when
pruneDeadEntries() removed entries — prune already wakes one waiter
per removed SDK process, so the unconditional call double-woke and
could let two waiters spawn in the same synchronous tick, briefly
exceeding maxConcurrent. Only fire the safety-net notify when nothing
was pruned.
- install.ts: persistClaudeProvider() no longer silently rewrites
CLAUDE_MEM_CLAUDE_AUTH_METHOD to 'subscription'. When called without
an explicit auth method, preserve the existing setting; only fall
back to 'subscription' when none is configured. Prevents re-running
'claude-mem install --provider claude' from wiping a user's
configured 'api-key' or 'gateway' auth.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>