Six numbered plan documents covering:
- 01 Hook IO Discipline (#2376)
- 02 Spawn-Contract Templating (#2377)
- 03 Worker / Daemon Lifecycle Hardening (#2378)
- 04 Installer Failure Transparency (#2379)
- 05 Observer SDK Tool Enforcement (#2380)
- 06 Worker Env Isolation (#2381)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plan 06 — Worker Env Isolation
Goal: Stop host-side environment variables from contaminating the worker's Anthropic SDK subprocess. Two confirmed bugs anchor this plan:
ANTHROPIC_BASE_URL leaks from the parent shell while ANTHROPIC_AUTH_TOKEN is blocked, breaking proxy/gateway auth (#2375). Separately, CLAUDE_CODE_EFFORT_LEVEL propagates from host CLI settings into the SDK subprocess. There it triggers a permanent HTTP 400 that the retry classifier mistakes for a transient error (#2357). Adjacent feature #2289 ($TIER alias syntax) is in scope where it shares the same env/model-resolution surface.
Net effect:
- The OAuth-skip predicate requires a real credential (ANTHROPIC_API_KEY or ANTHROPIC_AUTH_TOKEN), not a bare ANTHROPIC_BASE_URL. Proxy/gateway users put credentials in ~/.claude-mem/.env; nothing relies on parent-shell leaks.
- BLOCKED_ENV_VARS adds ANTHROPIC_BASE_URL and the CLAUDE_CODE_EFFORT_LEVEL / CLAUDE_CODE_ALWAYS_ENABLE_EFFORT pair. This is defense in depth alongside the existing env-sanitizer.ts CLAUDE_CODE_* prefix filter.
- The Claude provider's error classifier explicitly handles HTTP 400 as unrecoverable, matching GeminiProvider / OpenRouterProvider. Permanent-error responses no longer cause an unbounded retry loop.
- Every spawn boundary that hands env to a child process applies sanitizeEnv. Every spawn that carries Anthropic credentials (the SDK subprocesses) also applies buildIsolatedEnv. A grep-based CI check rejects subprocess spawns that do not go through sanitizeEnv.
- ~/.claude-mem/.env becomes the single source of truth for non-OAuth Anthropic credentials. The loader's whitelist documents this contract.

Out of scope:
- Hook-side env handling (Plan 01 / 02 territory).
- Worker daemon lifecycle, DB bloat, and chroma-mcp leaks (Plan 03).
- Observer/Knowledge SDK tool enforcement (Plan 05).
- Re-auth UX flow (different concern; out of scope for this plan).
- General provider-router refactor: the $TIER alias is scoped to model resolution only (Phase 4).
Problem Statement (line citations)
Bug A — ANTHROPIC_BASE_URL leaks, OAuth gets skipped, ANTHROPIC_AUTH_TOKEN is missing (#2375)
src/shared/EnvManager.ts lines 14–24 (BLOCKED_ENV_VARS):
const BLOCKED_ENV_VARS = [
'ANTHROPIC_API_KEY', // #733
'ANTHROPIC_AUTH_TOKEN', // added 5edf1557 (2026-05-04) — leak prevention
'CLAUDECODE',
'CLAUDE_CODE_OAUTH_TOKEN', // #2215
];
ANTHROPIC_BASE_URL is not in the list, so it survives buildIsolatedEnv() (lines 166–205) and reaches isolatedEnv from process.env.
buildIsolatedEnvWithFreshOAuth() lines 222–288 then runs the OAuth-skip predicate at lines 237–244:
if (
isolatedEnv.ANTHROPIC_API_KEY ||
isolatedEnv.ANTHROPIC_BASE_URL ||
isolatedEnv.ANTHROPIC_AUTH_TOKEN
) {
clearStaleMarker();
return isolatedEnv;
}
The bare BASE_URL branch was added in commit a122d34e (2026-05-04) under the rationale "tokenless gateways may exist." Combined with the AUTH_TOKEN block from 5edf1557 the same day, the subprocess ends up with:
- ANTHROPIC_BASE_URL ✅ (leaked from parent)
- ANTHROPIC_AUTH_TOKEN ❌ (blocked, and never re-injected because ~/.claude-mem/.env is empty for first-time proxy users)
- CLAUDE_CODE_OAUTH_TOKEN ❌ (the skip path bypassed the keychain read)

Result: every SDK subprocess reports "Not logged in · Please run /login".
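The failure can be reproduced with a minimal, self-contained model of the pre-fix flow. The model makes two simplifying assumptions:
- buildIsolatedEnv is essentially "copy process.env minus BLOCKED_ENV_VARS", as the line citations above suggest.
- The OAuth step is reduced to a boolean.

preFixIsolatedEnv and preFixSkipsOAuth are hypothetical stand-ins, not the EnvManager API.

```typescript
// Hypothetical model of the pre-#2375 EnvManager flow (not the real implementation).
type Env = Record<string, string | undefined>;

const PRE_FIX_BLOCKED_ENV_VARS: readonly string[] = [
  'ANTHROPIC_API_KEY',
  'ANTHROPIC_AUTH_TOKEN',
  'CLAUDECODE',
  'CLAUDE_CODE_OAUTH_TOKEN',
];

// Copy the parent env, dropping every name on the deny-list.
function preFixIsolatedEnv(parent: Env): Env {
  const isolated: Env = {};
  for (const [key, value] of Object.entries(parent)) {
    if (!PRE_FIX_BLOCKED_ENV_VARS.includes(key)) isolated[key] = value;
  }
  return isolated;
}

// Pre-fix OAuth-skip predicate: a bare BASE_URL is enough to skip the keychain read.
function preFixSkipsOAuth(env: Env): boolean {
  return Boolean(env.ANTHROPIC_API_KEY || env.ANTHROPIC_BASE_URL || env.ANTHROPIC_AUTH_TOKEN);
}

// Parent shell of a first-time proxy user: both variables exported, no ~/.claude-mem/.env.
const parentShell: Env = {
  PATH: '/usr/bin',
  ANTHROPIC_BASE_URL: 'https://proxy.example',
  ANTHROPIC_AUTH_TOKEN: 'shell-token',
};

const isolated = preFixIsolatedEnv(parentShell);
console.log('BASE_URL leaked:', isolated.ANTHROPIC_BASE_URL !== undefined); // true
console.log('AUTH_TOKEN present:', isolated.ANTHROPIC_AUTH_TOKEN !== undefined); // false
console.log('OAuth read skipped:', preFixSkipsOAuth(isolated)); // true, so no credential at all
```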
Bug B — CLAUDE_CODE_EFFORT_LEVEL triggers permanent 400 + unbounded retry (#2357)
The Anthropic SDK subprocess reads CLAUDE_CODE_EFFORT_LEVEL from its env and forwards it as the effort parameter on Messages API calls. claude-mem's source contains zero references to effort — the leak path is environmental, not code. Models without effort support (Haiku 4.5, Sonnet 4.5, older) reject with HTTP 400.
src/supervisor/env-sanitizer.ts lines 1–51 already filters CLAUDE_CODE_* via ENV_PREFIXES (with explicit allowances in ENV_PRESERVE). But:
- buildIsolatedEnv does NOT call sanitizeEnv internally; callers are expected to chain them.
- BLOCKED_ENV_VARS is the canonical leak deny-list and does not name CLAUDE_CODE_EFFORT_LEVEL. Defense in depth is currently single-layer.
- The retry classifier in src/services/worker/ClaudeProvider.ts has no HTTP 400 case. The default branch at line 98 returns kind: 'transient', so a permanent 400 loops forever.
src/services/worker/GeminiProvider.ts lines 89–94 and src/services/worker/OpenRouterProvider.ts lines 82–87 already classify 400 as unrecoverable; that pattern is the copy-target for ClaudeProvider.
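A toy retry driver shows why the transient default turns a permanent 400 into an unbounded loop. The kind union is the one cited from retry.ts. The body of isRetryableKind, runWithRetry, and the maxAttempts cap are assumptions for illustration. The cap exists only so the demo terminates; per #2357, the real loop has no such bound on this path.

```typescript
// Kind union as cited from src/services/worker/retry.ts.
type ProviderErrorKind = 'transient' | 'rate_limit' | 'unrecoverable' | 'auth_invalid' | 'quota_exhausted';

// Assumption: transient and rate_limit are the retryable kinds.
function isRetryableKind(kind: ProviderErrorKind): boolean {
  return kind === 'transient' || kind === 'rate_limit';
}

// Toy driver: retries `attempt` until it succeeds, the error is not retryable,
// or `maxAttempts` is reached. Returns the number of attempts made.
async function runWithRetry(
  attempt: () => Promise<void>,
  classify: (err: unknown) => ProviderErrorKind,
  maxAttempts: number,
): Promise<number> {
  for (let n = 1; n <= maxAttempts; n++) {
    try {
      await attempt();
      return n;
    } catch (err) {
      if (!isRetryableKind(classify(err)) || n === maxAttempts) return n;
    }
  }
  return maxAttempts;
}

// Every call fails with the same permanent 400.
const alwaysBadRequest = async (): Promise<void> => {
  throw Object.assign(new Error('This model does not support the effort parameter.'), { status: 400 });
};

const preFix = (_err: unknown): ProviderErrorKind => 'transient'; // today's default branch
const postFix = (err: unknown): ProviderErrorKind =>
  (err as { status?: number }).status === 400 ? 'unrecoverable' : 'transient';

async function main(): Promise<void> {
  console.log('pre-fix attempts:', await runWithRetry(alwaysBadRequest, preFix, 50)); // 50
  console.log('post-fix attempts:', await runWithRetry(alwaysBadRequest, postFix, 50)); // 1
}
main();
```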
Adjacent — $TIER alias syntax (#2289)
src/shared/SettingsDefaultsManager.ts line 116 already implements a portable 'haiku' alias for CLAUDE_MEM_TIER_SIMPLE_MODEL (per #1463). What's missing is the user-facing $TIER syntax in the CLAUDE_MEM_MODEL field that resolves to a provider-appropriate model at request time. Same code surface (model resolution in ClaudeProvider.getModelId at lines 442–446); minimal extension.
Phase 0 — Documentation Discovery (already completed)
Findings below are direct file reads dated 2026-05-08. Each implementation phase cites by line number; do not re-derive. Confidence: HIGH on file/API inventory. Local-only files were read end-to-end.
Allowed APIs / patterns to copy
| Item | Location | What to copy |
|---|---|---|
| BLOCKED_ENV_VARS array | src/shared/EnvManager.ts:14–24 | Add new entries; keep the comment-per-entry convention |
| buildIsolatedEnv filter pattern | src/shared/EnvManager.ts:166–205 | Filter on BLOCKED_ENV_VARS.includes(key); defensive delete isolatedEnv.X post-filter |
| buildIsolatedEnvWithFreshOAuth skip-check | src/shared/EnvManager.ts:237–244 | Restrict predicate to real credentials only |
| loadClaudeMemEnv whitelist + ClaudeMemEnv interface | src/shared/EnvManager.ts:26–32, 79–100 | Single source of truth for what ~/.claude-mem/.env accepts |
| ENV_PRESERVE / ENV_EXACT_MATCHES / ENV_PREFIXES | src/supervisor/env-sanitizer.ts:1–51 | Whitelist-based env stripping; do NOT add CLAUDE_CODE_EFFORT_LEVEL to ENV_PRESERVE |
| Provider error classifier (HTTP 400 → unrecoverable) | src/services/worker/GeminiProvider.ts:89–94, src/services/worker/OpenRouterProvider.ts:82–87 | Identical pattern to apply in ClaudeProvider |
| ClassifiedProviderError constructor + kind: 'unrecoverable' \| 'auth_invalid' \| 'transient' \| 'rate_limit' \| 'quota_exhausted' | src/services/worker/retry.ts | Use the existing kind enum; do not invent permanent |
| isRetryableKind predicate | src/services/worker/retry.ts:37–44 | Used by all retry sites; no edit needed once the classifier is correct |
| Tier model resolution + 'haiku' alias | src/services/worker/http/routes/SessionRoutes.ts:503–521, src/shared/SettingsDefaultsManager.ts:51–53, 115–117 | Pattern for extending $TIER syntax |
| Settings flat-key + loadFromFile | src/shared/SettingsDefaultsManager.ts:6–67, 70–131, 137–139, 161–206 | New keys MUST be added to the interface AND the DEFAULTS block |
| Plan format (phase numbering, line-cited edits, anti-patterns block) | plans/01-hook-io-discipline.md, plans/05-observer-tool-enforcement.md | Reuse layout |
Anti-patterns / methods that DO NOT exist (avoid inventing)
- claude-mem source has zero references to effort, CLAUDE_CODE_EFFORT_LEVEL, CLAUDE_CODE_ALWAYS_ENABLE_EFFORT, or reasoning_effort. Do not "remove the effort parameter we forward": there is none. The leak is the SDK subprocess reading the env var directly.
- BLOCKED_ENV_VARS is an Array<string> with an .includes lookup. Do NOT convert it to a Set in the same change; that touches every caller and is an unrelated refactor.
- ClassifiedProviderError.kind does NOT support the value 'permanent'. The existing enum is 'transient' | 'rate_limit' | 'unrecoverable' | 'auth_invalid' | 'quota_exhausted'. Use unrecoverable for permanent 400s.
- pending_messages has no retry_count column. It was dropped; see the deadColumns array at src/services/sqlite/SessionStore.ts:104. Issue #2357's "retry counter climbed past #1874" refers to log-line numbering, not a DB counter. Do not add a counter as part of this plan; that's Plan 03 territory.
- sanitizeEnv strips names that match its filter lists (ENV_PREFIXES covers CLAUDE_CODE_*), unless the name is listed in ENV_PRESERVE. ENV_PRESERVE overrides the strip. Do not add CLAUDE_CODE_EFFORT_LEVEL to ENV_PRESERVE; that would exempt exactly the variable we need stripped.
- buildIsolatedEnv and sanitizeEnv are independent layers. Some callers chain them (sanitizeEnv(buildIsolatedEnv(...))); some use only one. Do not assume chaining is universal; Phase 5 audits every spawn boundary.
- The ~/.claude-mem/.env loader at src/shared/EnvManager.ts:79–100 uses property-by-property assignment as an implicit whitelist. Do NOT replace it with Object.assign(result, parsed); that breaks the whitelist guarantee.
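A minimal .env loader illustrates the property-by-property whitelist. parseDotEnv, loadWhitelisted and ClaudeMemEnvSketch are hypothetical names. The sketch makes two simplifications:
- The parser handles only KEY=VALUE lines (no quoting or export prefixes).
- The interface lists just the three Anthropic keys this plan routes through ~/.claude-mem/.env; the real ClaudeMemEnv may include more.

The point is the contrast with Object.assign, which would let any key in the file through.

```typescript
// Hypothetical subset of ClaudeMemEnv: the three keys this plan routes through ~/.claude-mem/.env.
interface ClaudeMemEnvSketch {
  ANTHROPIC_API_KEY?: string;
  ANTHROPIC_AUTH_TOKEN?: string;
  ANTHROPIC_BASE_URL?: string;
}

// Minimal KEY=VALUE parser: skips blank lines and # comments, no quote handling.
function parseDotEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split('\n')) {
    const line = raw.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq <= 0) continue;
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}

// Whitelist by construction: only explicitly named properties are copied.
function loadWhitelisted(text: string): ClaudeMemEnvSketch {
  const parsed = parseDotEnv(text);
  const result: ClaudeMemEnvSketch = {};
  if (parsed.ANTHROPIC_API_KEY) result.ANTHROPIC_API_KEY = parsed.ANTHROPIC_API_KEY;
  if (parsed.ANTHROPIC_AUTH_TOKEN) result.ANTHROPIC_AUTH_TOKEN = parsed.ANTHROPIC_AUTH_TOKEN;
  if (parsed.ANTHROPIC_BASE_URL) result.ANTHROPIC_BASE_URL = parsed.ANTHROPIC_BASE_URL;
  // Anti-pattern: Object.assign(result, parsed) would also copy CLAUDE_CODE_EFFORT_LEVEL below.
  return result;
}

const fileText = [
  '# proxy credentials',
  'ANTHROPIC_BASE_URL=https://proxy.example',
  'ANTHROPIC_AUTH_TOKEN=test-token',
  'CLAUDE_CODE_EFFORT_LEVEL=MAX',
].join('\n');

// Only the two ANTHROPIC_* keys survive; CLAUDE_CODE_EFFORT_LEVEL is dropped.
console.log(loadWhitelisted(fileText));
```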
File inventory used by this plan
| File | Lines | Disposition |
|---|---|---|
| src/shared/EnvManager.ts | 319 | Edited heavily (Phase 2, Phase 3) |
| src/supervisor/env-sanitizer.ts | 51 | Light edit (Phase 3: comment change only; the CLAUDE_CODE_* prefix already filters EFFORT_LEVEL) |
| src/services/worker/ClaudeProvider.ts | 448 | Edited (Phase 3: error classifier on the query() rejection path; Phase 4: getModelId) |
| src/services/worker/retry.ts | small | Confirm-only (Phase 3: isRetryableKind already correct) |
| src/services/worker/GeminiProvider.ts | reference only | Read for pattern (Phase 3) |
| src/services/worker/OpenRouterProvider.ts | reference only | Read for pattern (Phase 3) |
| src/shared/SettingsDefaultsManager.ts | 209 | Edited (Phase 4: new tier keys for $TIER alias resolution) |
| src/services/worker/model-aliases.ts | NEW | CREATED (Phase 4: resolveTierAlias) |
| src/services/worker/http/routes/SessionRoutes.ts | reference | Read tier-routing pattern (Phase 4) |
| src/services/infrastructure/ProcessManager.ts | line 415 | Audit (Phase 5): confirm the sanitizeEnv chain is sufficient |
| src/services/sync/ChromaMcpManager.ts | line 585 | Audit (Phase 5) |
| src/supervisor/process-registry.ts | line 539 | Audit (Phase 5) |
| src/services/worker-service.ts | line 412 | Audit (Phase 5) |
| src/services/worker/knowledge/KnowledgeAgent.ts | lines 54, 149 | Edited (Phase 4: getModelId at line 149); confirm-only for spawn env (Phase 5) |
| tests/env-isolation.test.ts | NEW | CREATED (Phase 1; extended in Phase 5) |
| scripts/check-spawn-env-discipline.cjs | NEW | CREATED (Phase 7) |
| CLAUDE.md | small | Edited (Phase 7: document the ~/.claude-mem/.env contract) |
Phase 1 — Audit & write the failing tests first
Goal: Pin down current behavior with red tests so the fix can prove itself green. No production-code changes in this phase.
1.1 Tests to add (tests/env-isolation.test.ts)
Use bun:test per package.json "test": "bun test". Pattern from tests/claude-provider-resume.test.ts:1.
1. buildIsolatedEnvWithFreshOAuth strips ANTHROPIC_BASE_URL when no .env credentials are configured
   - Stub process.env.ANTHROPIC_BASE_URL = 'https://proxy.example', with no ~/.claude-mem/.env and no API_KEY/AUTH_TOKEN in env.
   - Call buildIsolatedEnvWithFreshOAuth().
   - Assert: the result does NOT have ANTHROPIC_BASE_URL (post-fix). Currently RED.
2. OAuth-skip does not fire on a bare ANTHROPIC_BASE_URL
   - Same setup. Spy on readClaudeOAuthToken.
   - Assert: readClaudeOAuthToken was called, because BASE_URL alone is not enough to skip. Currently RED: readClaudeOAuthToken is NOT called today.
3. ANTHROPIC_AUTH_TOKEN from ~/.claude-mem/.env reaches the isolated env
   - Write a temp .env with ANTHROPIC_AUTH_TOKEN=test-token and ANTHROPIC_BASE_URL=https://proxy.example.
   - Assert: isolatedEnv.ANTHROPIC_AUTH_TOKEN === 'test-token' AND isolatedEnv.ANTHROPIC_BASE_URL === 'https://proxy.example'. Currently GREEN (already works); the test guards against regression.
4. CLAUDE_CODE_EFFORT_LEVEL is stripped from the isolated env
   - Stub process.env.CLAUDE_CODE_EFFORT_LEVEL = 'MAX'.
   - Assert: sanitizeEnv(buildIsolatedEnv()) does NOT contain CLAUDE_CODE_EFFORT_LEVEL. Currently GREEN via env-sanitizer's ENV_PREFIXES; the test guards against regression.
5. CLAUDE_CODE_EFFORT_LEVEL is in BLOCKED_ENV_VARS for defense in depth
   - Assert: BLOCKED_ENV_VARS.includes('CLAUDE_CODE_EFFORT_LEVEL'). Export BLOCKED_ENV_VARS for the test if it is not exported today. Currently RED.
6. HTTP 400 from the Claude SDK is classified unrecoverable
   - Construct an error matching the SDK's 400 shape: error.status === 400, with message text containing "does not support the effort parameter". The message is where classifyClaudeProviderError reads the body.
   - Assert: classifyClaudeProviderError({ cause: err }).kind === 'unrecoverable'. Currently RED: it falls through to transient.
7. HTTP 400 with an effort-parameter body emits a once-only warn log
   - Same setup as test 6, plus capture logger.warn calls.
   - Assert: warn fires once with category SDK and a hint pointing at #2357 / ~/.claude-mem/.env. Currently RED.
1.2 Verification checklist (Phase 1)
- All 7 tests added; tests 1, 2, 5, 6, 7 are RED; tests 3, 4 are GREEN.
- bun test tests/env-isolation.test.ts runs cleanly (RED tests fail with the expected assertion, no other errors).
- No production-code changes in this phase (git diff src/ is empty).
1.3 Anti-pattern guards
- Do NOT mock EnvManager.buildIsolatedEnv; it's the unit under test.
- Do NOT use vi.* (the project uses bun:test, not vitest).
- Do NOT skip cleanup of temp .env files. Use a per-test beforeEach/afterEach with mkdtempSync.
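The cleanup rules above can be packaged as two small helpers. The sketch uses only Node's fs/os/path modules, so it runs under bun:test or plain Node. withEnv and withTempDotEnv are hypothetical names. How EnvManager is pointed at the temporary directory (for example, an overridable data-dir path) is an assumption this plan does not cover.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Apply env overrides while `fn` runs, then restore the previous values exactly.
// Synchronous callbacks only: an async `fn` would see the env restored before it settles.
function withEnv<T>(overrides: Record<string, string | undefined>, fn: () => T): T {
  const saved: Record<string, string | undefined> = {};
  for (const [key, value] of Object.entries(overrides)) {
    saved[key] = process.env[key];
    if (value === undefined) delete process.env[key];
    else process.env[key] = value;
  }
  try {
    return fn();
  } finally {
    for (const [key, value] of Object.entries(saved)) {
      if (value === undefined) delete process.env[key];
      else process.env[key] = value;
    }
  }
}

// Create a throwaway directory holding a .env file, run `fn` with its path, always clean up.
function withTempDotEnv<T>(contents: string, fn: (dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), 'claude-mem-env-'));
  try {
    writeFileSync(join(dir, '.env'), contents, { mode: 0o600 });
    return fn(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Usage: overrides are visible only inside the callback, and the temp dir is gone afterwards.
const previous = process.env.ANTHROPIC_BASE_URL;
const seenBaseUrl = withEnv({ ANTHROPIC_BASE_URL: 'https://proxy.example' }, () => process.env.ANTHROPIC_BASE_URL);
let seenDir = '';
const dotEnvBody = withTempDotEnv('ANTHROPIC_AUTH_TOKEN=test-token\n', (dir) => {
  seenDir = dir;
  return readFileSync(join(dir, '.env'), 'utf8');
});
console.log(seenBaseUrl, process.env.ANTHROPIC_BASE_URL === previous); // https://proxy.example true
console.log(dotEnvBody.trim(), existsSync(seenDir)); // ANTHROPIC_AUTH_TOKEN=test-token false
```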
Phase 2 — Fix #2375 (BASE_URL leak + OAuth-skip predicate)
Goal: Make the OAuth-skip require a real credential, and add ANTHROPIC_BASE_URL to the deny-list so it can only be configured via ~/.claude-mem/.env.
2.1 Edit src/shared/EnvManager.ts:14–24 — extend BLOCKED_ENV_VARS
Before:
const BLOCKED_ENV_VARS = [
'ANTHROPIC_API_KEY',
'ANTHROPIC_AUTH_TOKEN',
'CLAUDECODE',
'CLAUDE_CODE_OAUTH_TOKEN',
];
After (add ANTHROPIC_BASE_URL):
const BLOCKED_ENV_VARS = [
'ANTHROPIC_API_KEY', // #733
'ANTHROPIC_AUTH_TOKEN', // 5edf1557 — leak prevention; re-injected from ~/.claude-mem/.env when configured
'ANTHROPIC_BASE_URL', // #2375 — same leak class as AUTH_TOKEN; re-injected from ~/.claude-mem/.env. Without this entry, a leaked BASE_URL alone triggered the OAuth-skip while no auth credential reached the subprocess.
'CLAUDECODE',
'CLAUDE_CODE_OAUTH_TOKEN', // #2215
];
2.2 Edit src/shared/EnvManager.ts:237–244 — restrict OAuth-skip to real credentials
Before:
if (
isolatedEnv.ANTHROPIC_API_KEY ||
isolatedEnv.ANTHROPIC_BASE_URL ||
isolatedEnv.ANTHROPIC_AUTH_TOKEN
) {
clearStaleMarker();
return isolatedEnv;
}
After:
// Skip OAuth lookup ONLY when a real credential is configured. A bare
// ANTHROPIC_BASE_URL is not a credential — every documented gateway needs
// either an AUTH_TOKEN or an API_KEY. This guards #2375 against a class of
// leaks where a parent shell exports BASE_URL (e.g. for the Claude Code CLI
// itself) while no token is present.
if (isolatedEnv.ANTHROPIC_API_KEY || isolatedEnv.ANTHROPIC_AUTH_TOKEN) {
clearStaleMarker();
return isolatedEnv;
}
2.3 Verify the ~/.claude-mem/.env re-injection at src/shared/EnvManager.ts:178–195
Currently the loader path covers BASE_URL re-injection from .env. Confirm by reading the function. No code change required here, but add a TS comment block above lines 178–195 documenting the new contract:
// Contract (post-#2375): ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and
// ANTHROPIC_API_KEY are *only* populated from ~/.claude-mem/.env. They are
// in BLOCKED_ENV_VARS so parent-shell values never leak through.
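Taken together, 2.1 through 2.3 give the post-fix flow sketched below. This is a simplified model, not the EnvManager code, and it relies on several assumptions:
- The function name is hypothetical.
- The .env values arrive as an already-loaded object.
- The keychain read is reduced to a callback.
- Passing the OAuth token on as CLAUDE_CODE_OAUTH_TOKEN is inferred from the Bug A description.

```typescript
type Env = Record<string, string | undefined>;

const BLOCKED_ENV_VARS: readonly string[] = [
  'ANTHROPIC_API_KEY',
  'ANTHROPIC_AUTH_TOKEN',
  'ANTHROPIC_BASE_URL',
  'CLAUDECODE',
  'CLAUDE_CODE_OAUTH_TOKEN',
];

const DOTENV_KEYS = ['ANTHROPIC_API_KEY', 'ANTHROPIC_AUTH_TOKEN', 'ANTHROPIC_BASE_URL'] as const;

// Hypothetical model of buildIsolatedEnvWithFreshOAuth after Phase 2.
function buildIsolatedEnvSketch(parent: Env, dotEnv: Env, readOAuthToken: () => string | undefined): Env {
  // 1. Drop every deny-listed name inherited from the parent shell.
  const isolated: Env = {};
  for (const [key, value] of Object.entries(parent)) {
    if (!BLOCKED_ENV_VARS.includes(key)) isolated[key] = value;
  }
  // 2. Re-inject Anthropic settings only from ~/.claude-mem/.env.
  for (const key of DOTENV_KEYS) {
    if (dotEnv[key]) isolated[key] = dotEnv[key];
  }
  // 3. Skip the keychain only when a real credential is present.
  if (isolated.ANTHROPIC_API_KEY || isolated.ANTHROPIC_AUTH_TOKEN) return isolated;
  const token = readOAuthToken();
  if (token) isolated.CLAUDE_CODE_OAUTH_TOKEN = token;
  return isolated;
}

let keychainReads = 0;
const fakeKeychain = (): string => {
  keychainReads++;
  return 'oauth-token';
};

// Shell exports BASE_URL only, no .env: BASE_URL is dropped and the keychain is read.
const shellOnly = buildIsolatedEnvSketch({ PATH: '/usr/bin', ANTHROPIC_BASE_URL: 'https://proxy.example' }, {}, fakeKeychain);
// .env holds proxy credentials: both are re-injected and the keychain is not touched.
const fromDotEnv = buildIsolatedEnvSketch(
  { PATH: '/usr/bin' },
  { ANTHROPIC_BASE_URL: 'https://proxy.example', ANTHROPIC_AUTH_TOKEN: 'test-token' },
  fakeKeychain,
);
console.log(shellOnly.ANTHROPIC_BASE_URL, shellOnly.CLAUDE_CODE_OAUTH_TOKEN); // undefined oauth-token
console.log(fromDotEnv.ANTHROPIC_AUTH_TOKEN, keychainReads); // test-token 1
```

A tokenless .env that sets only ANTHROPIC_BASE_URL would still fall through to the keychain read in this model. That is the intended post-fix behavior.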
2.4 Verification checklist (Phase 2)
- Tests 1, 2 from Phase 1 now GREEN.
- Existing test suite still passes (bun test).
- grep -n "ANTHROPIC_BASE_URL" src/shared/EnvManager.ts shows entries in BLOCKED_ENV_VARS, the ClaudeMemEnv interface, the loader, and the re-injection block. It shows NOT in the OAuth-skip predicate expression (the predicate's explanatory comment may still mention it).
- Smoke: with a ~/.claude-mem/.env containing ANTHROPIC_BASE_URL=... and ANTHROPIC_AUTH_TOKEN=..., the worker actually authenticates against the proxy. Test with BigModel or any sandboxed proxy.
2.5 Anti-pattern guards
- Do NOT add ANTHROPIC_BASE_URL to ENV_PRESERVE in env-sanitizer.ts. BLOCKED_ENV_VARS is the right layer; env-sanitizer is a downstream filter.
- Do NOT keep the BASE_URL branch in the OAuth-skip predicate on the "tokenless gateways may exist" rationale. Every documented gateway requires a token; the skip path was a misdesign.
- Do NOT delete the existing delete isolatedEnv.CLAUDE_CODE_OAUTH_TOKEN defensive line at line 229. That guard stays; it's belt-and-suspenders for #2215 and orthogonal to this plan.
Phase 3 — Fix #2357 (CLAUDE_CODE_EFFORT_LEVEL leak + 400 retry classification)
Goal: Two-layer defense for the env leak (existing CLAUDE_CODE_* prefix filter + new BLOCKED_ENV_VARS entries), plus a permanent classification for the resulting HTTP 400 so the retry loop terminates if the leak ever sneaks past either layer.
3.1 Edit src/shared/EnvManager.ts:14–24 — add EFFORT entries to BLOCKED_ENV_VARS
After the Phase 2 edit, the list is:
const BLOCKED_ENV_VARS = [
'ANTHROPIC_API_KEY',
'ANTHROPIC_AUTH_TOKEN',
'ANTHROPIC_BASE_URL',
'CLAUDECODE',
'CLAUDE_CODE_OAUTH_TOKEN',
// #2357 — host CLI config, not part of the plugin's contract. The
// env-sanitizer's CLAUDE_CODE_* prefix filter strips these for spawn paths
// that go through it, but BLOCKED_ENV_VARS is the canonical deny-list and
// belongs in defense-in-depth.
'CLAUDE_CODE_EFFORT_LEVEL',
'CLAUDE_CODE_ALWAYS_ENABLE_EFFORT',
];
3.2 Edit src/services/worker/ClaudeProvider.ts — classify HTTP 400 as unrecoverable
Locate the existing error-classification path. The Anthropic SDK raises errors with error.status and a body containing the failure description. Pattern from src/services/worker/GeminiProvider.ts:89–94 (the canonical copy-target):
if (status === 400) {
return new ClassifiedProviderError(
`Gemini bad request (status 400)`,
{ kind: 'unrecoverable', cause: input.cause },
);
}
Add the equivalent in ClaudeProvider's error classifier (new function or existing — read the file; create if absent, mirroring GeminiProvider shape):
function classifyClaudeProviderError(input: { cause: unknown }): ClassifiedProviderError {
const err = input.cause;
const status = (err as { status?: number })?.status;
const bodyText = String((err as { message?: string })?.message ?? '');
// Permanent: SDK rejected the request itself. Most common cause in the wild
// is a leaked CLAUDE_CODE_EFFORT_LEVEL the SDK subprocess forwarded as
// `effort` against a model that doesn't support it (#2357). The leak is
// also blocked at BLOCKED_ENV_VARS + env-sanitizer; this classifier ends
// the retry loop if either layer is bypassed.
if (status === 400) {
if (/effort parameter/i.test(bodyText)) {
logger.warn(
'SDK',
'Claude API rejected effort parameter — likely CLAUDE_CODE_EFFORT_LEVEL leaked into SDK env (issue #2357). Configure CLAUDE_MEM_MODEL or set credentials in ~/.claude-mem/.env.',
{ status, bodyText },
);
}
return new ClassifiedProviderError(
`Claude bad request (status 400): ${bodyText}`,
{ kind: 'unrecoverable', cause: input.cause },
);
}
// 401 / 403 → auth_invalid (existing pattern from GeminiProvider:96-103)
if (status === 401 || status === 403) {
return new ClassifiedProviderError(
`Claude auth rejected (status ${status})`,
{ kind: 'auth_invalid', cause: input.cause },
);
}
// 429 → rate_limit
if (status === 429) {
return new ClassifiedProviderError(
`Claude rate limited (status 429)`,
{ kind: 'rate_limit', cause: input.cause },
);
}
// Default: transient (preserves the existing fall-through behavior).
return new ClassifiedProviderError(
`Claude SDK error: ${bodyText}`,
{ kind: 'transient', cause: input.cause },
);
}
Wire this classifier into the existing try { ... } catch around query(...) in ClaudeProvider.ts. Read the actual catch shape before editing. The function lives near lines 180–195, and the existing for await over queryResult is where rejections surface.
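The wiring amounts to wrapping the for await loop so that any rejection leaves the provider already classified. The sketch uses three stand-ins, all assumptions rather than the real worker code:
- ClassifiedProviderError is a minimal class with the { kind, cause } options shape used in this plan.
- classify is a trimmed, status-only version of classifyClaudeProviderError (no logging).
- fakeQuery replaces the SDK's query() stream.

```typescript
type ProviderErrorKind = 'transient' | 'rate_limit' | 'unrecoverable' | 'auth_invalid' | 'quota_exhausted';

// Minimal stand-in for the worker's ClassifiedProviderError.
class ClassifiedProviderError extends Error {
  readonly kind: ProviderErrorKind;
  constructor(message: string, opts: { kind: ProviderErrorKind; cause: unknown }) {
    super(message);
    this.name = 'ClassifiedProviderError';
    this.kind = opts.kind;
    // The real class also keeps opts.cause; omitted in this stand-in.
  }
}

// Trimmed classifier: status-based only (the full version above also logs the effort hint).
function classify(cause: unknown): ClassifiedProviderError {
  const status = (cause as { status?: number } | null)?.status;
  if (status === 400) return new ClassifiedProviderError('Claude bad request (status 400)', { kind: 'unrecoverable', cause });
  if (status === 401 || status === 403) return new ClassifiedProviderError(`Claude auth rejected (status ${status})`, { kind: 'auth_invalid', cause });
  if (status === 429) return new ClassifiedProviderError('Claude rate limited (status 429)', { kind: 'rate_limit', cause });
  return new ClassifiedProviderError('Claude SDK error', { kind: 'transient', cause });
}

// Stand-in for the SDK stream: yields one message, then rejects with an HTTP 400.
async function* fakeQuery(): AsyncGenerator<string> {
  yield 'assistant: partial';
  throw Object.assign(new Error('This model does not support the effort parameter.'), { status: 400 });
}

// The wiring: consume the stream; any rejection is rethrown already classified.
async function consumeQuery(stream: AsyncIterable<string>): Promise<string[]> {
  const received: string[] = [];
  try {
    for await (const message of stream) received.push(message);
  } catch (err) {
    throw classify(err);
  }
  return received;
}

consumeQuery(fakeQuery()).catch((err: ClassifiedProviderError) => {
  console.log(err.kind); // unrecoverable
});
```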
3.3 Confirm src/supervisor/env-sanitizer.ts already strips CLAUDE_CODE_EFFORT_LEVEL
Read lines 1–51. Verify:
- ENV_PREFIXES includes 'CLAUDE_CODE_'.
- ENV_PRESERVE does NOT include CLAUDE_CODE_EFFORT_LEVEL or CLAUDE_CODE_ALWAYS_ENABLE_EFFORT.
Add an inline comment at the ENV_PREFIXES declaration:
// Filters CLAUDE_CODE_* unless explicitly preserved in ENV_PRESERVE.
// This is layer 2 of defense for #2357 — layer 1 is BLOCKED_ENV_VARS in EnvManager.
No code change to behavior here.
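The semantics that 3.3 relies on are: strip by prefix or exact name, exempt by name. The sketch below shows roughly how that works. The contents of ENV_PREFIXES, ENV_EXACT_MATCHES and ENV_PRESERVE are illustrative assumptions, not a copy of env-sanitizer.ts. CLAUDE_CODE_EXAMPLE_PRESERVED is a made-up placeholder for whatever the real file preserves.

```typescript
type Env = Record<string, string | undefined>;

// Illustrative values only; the real lists live in src/supervisor/env-sanitizer.ts.
const ENV_PREFIXES = ['CLAUDE_CODE_'];
const ENV_EXACT_MATCHES = ['CLAUDECODE'];
const ENV_PRESERVE = new Set(['CLAUDE_CODE_EXAMPLE_PRESERVED']); // hypothetical allowance

// Strip names that match a prefix or an exact entry, unless explicitly preserved.
function sanitizeEnvSketch(env: Env): Env {
  const out: Env = {};
  for (const [key, value] of Object.entries(env)) {
    const matches = ENV_EXACT_MATCHES.includes(key) || ENV_PREFIXES.some((p) => key.startsWith(p));
    if (matches && !ENV_PRESERVE.has(key)) continue;
    out[key] = value;
  }
  return out;
}

const cleaned = sanitizeEnvSketch({
  PATH: '/usr/bin',
  CLAUDECODE: '1',
  CLAUDE_CODE_EFFORT_LEVEL: 'MAX',
  CLAUDE_CODE_ALWAYS_ENABLE_EFFORT: 'true',
  CLAUDE_CODE_EXAMPLE_PRESERVED: 'kept',
});
console.log(Object.keys(cleaned)); // [ 'PATH', 'CLAUDE_CODE_EXAMPLE_PRESERVED' ]
```

In this model the ENV_PRESERVE check wins over the prefix match. Adding CLAUDE_CODE_EFFORT_LEVEL there would therefore reopen the leak.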
3.4 Verification checklist (Phase 3)
- Tests 5, 6, 7 from Phase 1 now GREEN.
- grep -rn "CLAUDE_CODE_EFFORT_LEVEL" src/ returns hits only in EnvManager.ts (BLOCKED_ENV_VARS) and ClaudeProvider.ts (the classifier's comment and warn message). Outside src/, the only hit is tests/env-isolation.test.ts.
- Reproduce the #2357 scenario locally:
  CLAUDE_CODE_EFFORT_LEVEL=MAX bun run src/services/worker-service.ts --daemon
  # Observe: no `effort` parameter on outgoing requests.
- If a 400 is forced (e.g., via a mocked SDK rejection), the retry loop terminates after the first attempt and logger.warn fires once.
3.5 Anti-pattern guards
- Do NOT add a separate "permanent error" enum value. kind: 'unrecoverable' already exists and is the right slot.
- Do NOT regex on the entire error stack. error.status === 400 is the deterministic signal; the body-text check exists purely for the user-facing log hint.
- Do NOT log inside classifyClaudeProviderError for every 400. Only the effort-parameter sub-case warrants a hint; generic 400s are noisy enough at the call site.
- Do NOT classify 400s whose body matches /effort/i as auth_invalid. That would trigger the "re-login" flow incorrectly; use unrecoverable.
- Do NOT rely on the SDK supporting an effort option that we strip. The SDK type does not expose effort; the leak is the SDK's own subprocess (pathToClaudeCodeExecutable) reading the env var. Stripping at our env layer is the only fix we control.
Phase 4 — $TIER alias syntax (#2289)
Goal: Allow CLAUDE_MEM_MODEL=$TIER:summary (and similar) to resolve at request time to a provider-appropriate model, reusing the existing 'haiku' portable alias machinery (line 116, #1463). Optional phase; can be deferred without blocking Phase 2/3.
4.1 Edit src/shared/SettingsDefaultsManager.ts — extend tier interface
Add to the SettingsDefaults interface near lines 51–53:
CLAUDE_MEM_TIER_FAST_MODEL: string; // for $TIER:fast — defaults to 'haiku'
CLAUDE_MEM_TIER_SMART_MODEL: string; // for $TIER:smart — defaults to 'sonnet' (or provider-equivalent)
Add to the DEFAULTS block near lines 115–117:
CLAUDE_MEM_TIER_FAST_MODEL: 'haiku',
CLAUDE_MEM_TIER_SMART_MODEL: 'sonnet',
4.2 Edit src/services/worker/ClaudeProvider.ts:442–446 — add $TIER resolution
Replace getModelId():
private getModelId(): string {
const settingsPath = paths.settings();
const settings = SettingsDefaultsManager.loadFromFile(settingsPath);
return resolveTierAlias(settings.CLAUDE_MEM_MODEL, settings);
}
Add resolveTierAlias to a shared util (src/services/worker/model-aliases.ts, NEW):
import type { SettingsDefaults } from '../../shared/SettingsDefaultsManager';
const TIER_PATTERN = /^\$TIER:(fast|smart|simple|summary)$/;
export function resolveTierAlias(model: string, settings: SettingsDefaults): string {
const match = TIER_PATTERN.exec(model);
if (!match) return model;
switch (match[1]) {
case 'fast': return settings.CLAUDE_MEM_TIER_FAST_MODEL || 'haiku';
case 'smart': return settings.CLAUDE_MEM_TIER_SMART_MODEL || 'sonnet';
case 'simple': return settings.CLAUDE_MEM_TIER_SIMPLE_MODEL || 'haiku';
case 'summary': {
// getModelId passes CLAUDE_MEM_MODEL itself, so falling back to it would hand
// the unresolved '$TIER:summary' to the SDK. The smart-tier last resort is an assumption.
const summary = settings.CLAUDE_MEM_TIER_SUMMARY_MODEL || settings.CLAUDE_MEM_MODEL;
return TIER_PATTERN.test(summary) ? (settings.CLAUDE_MEM_TIER_SMART_MODEL || 'sonnet') : summary;
}
default: return model;
}
}
4.3 Same call site in KnowledgeAgent.ts:149 (getModelId)
Apply the same resolveTierAlias wrap. Knowledge agent uses the same settings path.
4.4 Verification checklist (Phase 4)
- New test: resolveTierAlias('$TIER:fast', settings) returns settings.CLAUDE_MEM_TIER_FAST_MODEL.
- New test: resolveTierAlias('claude-haiku-4-5-20251001', settings) returns the input unchanged (non-tier passthrough).
- Setting CLAUDE_MEM_MODEL=$TIER:fast and starting the worker actually queries against the fast-tier model.
- Documentation updated in docs/public/configuration.mdx with the four tier aliases.
4.5 Anti-pattern guards
- Do NOT match $TIER:* greedily; the regex is anchored.
- Do NOT add $PROVIDER: or $MODEL: aliases in this phase. They are out of scope; one syntax at a time.
- Do NOT mutate settings inside resolveTierAlias; pure function only.
- Do NOT resolve the alias at settings-load time. Resolve at request time so users can edit settings without restarting the worker.
Phase 5 — Cross-spawn-boundary audit
Goal: Every place claude-mem spawns a subprocess must apply sanitizeEnv. Every spawn that carries Anthropic credentials (the SDK subprocesses) must additionally apply buildIsolatedEnv (or the async variant). A grep-based check codifies the sanitizeEnv rule.
5.1 Audit table — current state per call site
| File | Line | Spawn target | Env construction | Sufficient? |
|---|---|---|---|---|
| src/services/worker/ClaudeProvider.ts | 155 | Anthropic SDK subprocess | sanitizeEnv(await buildIsolatedEnvWithFreshOAuth()) | ✅ |
| src/services/worker/knowledge/KnowledgeAgent.ts | 54, 149 | Knowledge SDK subprocess | sanitizeEnv(await buildIsolatedEnvWithFreshOAuth()) | ✅ |
| src/services/infrastructure/ProcessManager.ts | 415 | Worker daemon | sanitizeEnv({...process.env, CLAUDE_MEM_WORKER_PORT, ...extraEnv}) | ⚠️ The daemon inherits the parent env, then sanitizes; it does not pass through buildIsolatedEnv. Document why this is OK: the daemon is the trust boundary, and the parent env IS the truth. It should still strip CLAUDE_CODE_EFFORT_LEVEL via the prefix filter; confirm. |
| src/services/sync/ChromaMcpManager.ts | 585 | chroma-mcp subprocess | sanitizeEnv(process.env) | ⚠️ Same as above. |
| src/supervisor/process-registry.ts | 539 | Generic spawn factory | sanitizeEnv(options.env ?? process.env) | ⚠️ Same. |
| src/services/worker-service.ts | 412 | MCP server subprocess | sanitizeEnv(process.env) | ⚠️ Same. |
For the worker-daemon and downstream MCP/chroma spawns, parent-process env IS the source of truth — they are pre-credential paths. As long as CLAUDE_CODE_EFFORT_LEVEL and the Anthropic credentials are stripped (which sanitizeEnv does via CLAUDE_CODE_* prefix and the existing ANTHROPIC_AUTH_TOKEN block), behavior is correct. The plan does not change these paths — it adds tests that prove they stay correct.
5.2 Add audit test — tests/env-isolation.test.ts
8. every documented spawn site applies sanitizeEnv
   - Read each file from the audit table.
   - Assert: each cited line contains sanitizeEnv(. Currently GREEN; the test prevents regression.
9. worker-daemon spawn env does not contain CLAUDE_CODE_EFFORT_LEVEL
   - Stub process.env.CLAUDE_CODE_EFFORT_LEVEL = 'MAX'.
   - Construct the env block as ProcessManager.ts:415 does.
   - Assert: the result does not contain CLAUDE_CODE_EFFORT_LEVEL. Currently GREEN.
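Test 8 stays framework-agnostic if the line check is factored into a pure helper that takes a file reader. SPAWN_SITES mirrors the audit table above; findViolations and readSource are hypothetical names. A cited line that falls past the end of the file counts as a violation. Line drift therefore shows up as a failure rather than a silent pass.

```typescript
import { readFileSync } from 'node:fs';

// Spawn sites from the Phase 5.1 audit table (file path -> 1-based line numbers).
const SPAWN_SITES: Record<string, number[]> = {
  'src/services/worker/ClaudeProvider.ts': [155],
  'src/services/worker/knowledge/KnowledgeAgent.ts': [54, 149],
  'src/services/infrastructure/ProcessManager.ts': [415],
  'src/services/sync/ChromaMcpManager.ts': [585],
  'src/supervisor/process-registry.ts': [539],
  'src/services/worker-service.ts': [412],
};

// Returns "path:line" for every cited line that does not call sanitizeEnv(.
function findViolations(sites: Record<string, number[]>, read: (path: string) => string): string[] {
  const violations: string[] = [];
  for (const [path, lines] of Object.entries(sites)) {
    const source = read(path).split('\n');
    for (const line of lines) {
      if (!/sanitizeEnv\s*\(/.test(source[line - 1] ?? '')) violations.push(`${path}:${line}`);
    }
  }
  return violations;
}

const readSource = (path: string): string => readFileSync(path, 'utf8');
// In bun:test: expect(findViolations(SPAWN_SITES, readSource)).toEqual([]);

// In-memory demo of the helper.
const demo: Record<string, string> = {
  'ok.ts': 'const env = sanitizeEnv(process.env);\n',
  'bad.ts': 'spawn(cmd, args, { env: process.env });\n',
};
console.log(findViolations({ 'ok.ts': [1], 'bad.ts': [1] }, (p) => demo[p] ?? '')); // [ 'bad.ts:1' ]
```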
5.3 Verification checklist (Phase 5)
- Tests 8, 9 GREEN.
- No new spawn sites introduced; if any are added by accident, the CI check (Phase 7) flags them.
5.4 Anti-pattern guards
- Do NOT add buildIsolatedEnv calls to the ProcessManager / ChromaMcpManager / MCP server spawn paths. They legitimately need parent-shell PATH, HOME, etc., which the credential-isolated builder would wipe.
- Do NOT consolidate the two layers into one helper "for clarity". They have distinct contracts and are layered intentionally.
Phase 6 — Test the full integration end-to-end
Goal: Smoke test the proxy/gateway path so we know the fix works in the real world.
6.1 Manual smoke (BigModel proxy or any equivalent)
# Setup:
cat > ~/.claude-mem/.env <<'EOF'
ANTHROPIC_BASE_URL=https://open.bigmodel.cn/api/anthropic
ANTHROPIC_AUTH_TOKEN=<your-bigmodel-token>
EOF
chmod 600 ~/.claude-mem/.env
# Reset worker:
npm run build-and-sync
pkill -f worker-service.cjs
# Trigger:
# In any Claude Code session, use any tool — PostToolUse hook should land an observation.
# Verify:
tail -f ~/.claude-mem/logs/claude-mem-$(date +%Y-%m-%d).log
# Expect: no "Not logged in" errors; observations land via the proxy.
6.2 Manual smoke (CLAUDE_CODE_EFFORT_LEVEL leak)
# Setup:
export CLAUDE_CODE_EFFORT_LEVEL=MAX
export CLAUDE_CODE_ALWAYS_ENABLE_EFFORT=true
# Restart Claude Code so the env propagates to the hook subprocess.
# Verify:
tail -f ~/.claude-mem/logs/claude-mem-$(date +%Y-%m-%d).log
# Expect: NO repeated "API Error: 400 This model does not support the effort parameter."
# Expect: NO "PARSER returned non-XML response; marking messages as failed for retry".
6.3 Verification checklist (Phase 6)
- Both smoke scenarios pass.
- bun test is green.
- One iteration on a fresh machine confirms ~/.claude-mem/.env is the only knob users need for proxy auth.
Phase 7 — CI guard + documentation
Goal: A grep-based CI check rejects PRs that introduce a subprocess spawn without sanitizeEnv. Documentation aligns with the new contract.
7.1 Add scripts/check-spawn-env-discipline.cjs
Pattern from plans/01-hook-io-discipline.md Phase 6 (scripts/check-hook-io-discipline.cjs):
#!/usr/bin/env node
// Forbid raw process.env in subprocess spawn calls. Every spawn must use
// sanitizeEnv(...) and (where credentials are involved) buildIsolatedEnv*.
const { execSync } = require('node:child_process');
const { readFileSync } = require('node:fs');
const VIOLATIONS = [];
// Find every `spawn(` / `spawnSync(` call in src/ (this also catches `child_process.spawn(`).
// `|| true` keeps execSync from throwing when grep finds no matches (exit status 1).
const grep = execSync(
`grep -rEn "spawn(Sync)?\\(" src/ | grep -v "node_modules" | grep -v "\\.test\\." || true`,
{ encoding: 'utf8' },
);
for (const line of grep.split('\n').filter(Boolean)) {
// Allow if the same logical block contains sanitizeEnv
// (heuristic: scan the matched line plus the 8 lines after it)
const [filePath, lineNumStr] = line.split(':', 2);
const lineNum = Number.parseInt(lineNumStr, 10);
const src = readFileSync(filePath, 'utf8').split('\n');
const window = src.slice(lineNum - 1, lineNum + 8).join('\n');
if (!/sanitizeEnv\s*\(/.test(window)) {
VIOLATIONS.push(`${filePath}:${lineNum}: spawn without sanitizeEnv`);
}
}
if (VIOLATIONS.length > 0) {
console.error('Spawn-env discipline check FAILED:');
VIOLATIONS.forEach(v => console.error(' ' + v));
process.exit(1);
}
console.log('Spawn-env discipline check passed.');
Wire it up as the test:env-discipline script in package.json (node scripts/check-spawn-env-discipline.cjs). Add it to CI alongside the existing hook checks.
7.2 Edit CLAUDE.md — document the ~/.claude-mem/.env contract
Add a section under "Configuration":
### Anthropic Credentials (proxies, gateways, BigModel, etc.)
For non-OAuth Anthropic credentials (proxies / gateways / `ANTHROPIC_AUTH_TOKEN` / `ANTHROPIC_API_KEY`), put them in `~/.claude-mem/.env`:
\```
ANTHROPIC_BASE_URL=https://your-proxy.example
ANTHROPIC_AUTH_TOKEN=your-token
\```
The file is read at worker spawn time and re-injected into the SDK subprocess. **Parent-shell exports of these variables are intentionally ignored** — they are in `BLOCKED_ENV_VARS` to prevent host-config bleed-through (#2375).
If you only have an OAuth subscription, no `.env` is needed; the worker reads the token from your keychain at spawn time.
7.3 Verification checklist (Phase 7)
- npm run test:env-discipline passes on the post-fix tree.
- CI pipeline runs the new check.
- CLAUDE.md section exists and accurately reflects the new contract.
7.4 Anti-pattern guards
- Do NOT extend the CI check to flag every process.env read. Only spawn*() call sites need sanitizeEnv; reads are fine.
- Do NOT add the .env file path to .gitignore. It lives in ~/.claude-mem/, not in the repo, so it's already outside version control.
Cross-plan dependencies
- Plan 01 (Hook IO Discipline): Independent. Both can be implemented in parallel.
- Plan 02 (Spawn-Contract Templating): Independent. Both touch templating but at different layers.
- Plan 03 (Worker Lifecycle): Phase 3.2's HTTP 400 classification removes a class of unbounded retries. Plan 03's "circuit breaker" + "stale-session sweep" handles other retry classes. Merge order: this plan first (small, surgical), then Plan 03.
- Plan 04 (Installer Transparency): Independent.
- Plan 05 (Observer Tool Enforcement): Adjacent. KnowledgeAgent is touched in both plans (this one for getModelId, Plan 05 for tool enforcement). Sequence Plan 05 first (security urgency), then Plan 06.
Pre-/do checklist
- Verify BLOCKED_ENV_VARS is still an Array<string> and has not been converted to a Set (Phase 2 refactor risk).
- Verify the existing test suite passes against current main before starting (bun test).
- Re-confirm effort is still absent from src/ (grep -rn "effort" src/). If a future change adds the parameter, Phase 3.2's regex needs revisiting.
- Read node_modules/@anthropic-ai/claude-agent-sdk/sdk.d.ts to confirm the query() options do NOT support effort natively. If the SDK adds it, Phase 3.2's body-text regex still works as a fallback, but a code-level strip becomes the right fix.
- Verify ~/.claude-mem/.env permissions are 0o600 post-fix (the saver enforces this; readers should not weaken it).
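The permission check can be scripted. hasOwnerOnlyMode is a hypothetical helper. It masks the mode from fs.statSync down to the permission bits and compares them against 0o600. It is only meaningful on POSIX filesystems, since Windows reports synthetic modes. The demo runs on a temp file rather than the real ~/.claude-mem/.env.

```typescript
import { chmodSync, mkdtempSync, rmSync, statSync, writeFileSync } from 'node:fs';
import { homedir, tmpdir } from 'node:os';
import { join } from 'node:path';

// True when the file's permission bits are exactly rw------- (0o600).
function hasOwnerOnlyMode(path: string): boolean {
  return (statSync(path).mode & 0o777) === 0o600;
}

// Real target would be join(homedir(), '.claude-mem', '.env'); demo on a temp file instead.
const demoDir = mkdtempSync(join(tmpdir(), 'perm-check-'));
const demoFile = join(demoDir, '.env');
writeFileSync(demoFile, 'ANTHROPIC_AUTH_TOKEN=test-token\n');
chmodSync(demoFile, 0o600);
const strictOk = hasOwnerOnlyMode(demoFile); // true
chmodSync(demoFile, 0o644);
const looseOk = hasOwnerOnlyMode(demoFile); // false
rmSync(demoDir, { recursive: true, force: true });
console.log({ strictOk, looseOk, target: join(homedir(), '.claude-mem', '.env') });
```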