* fix: mirror migration 28 in SessionStore so pending_messages.tool_use_id and worker_pid columns are created (#2139)
SessionStore's inline migration list jumped from v27 to v29, skipping
rebuildPendingMessagesForSelfHealingClaim. The worker uses SessionStore
directly via worker/DatabaseManager.ts and bypasses the canonical
MigrationRunner, so fresh installs ended up at "max v29" with neither
column present — every queue claim and observation insert failed.
Adds addPendingMessagesToolUseIdAndWorkerPidColumns following the existing
mirror precedent (addObservationSubagentColumns / addObservationsUniqueContentHashIndex).
Uses ALTER TABLE + column-existence guards so already-broken DBs at v29
self-heal on next worker boot.
Verified on fresh DB and on a synthetic v29-without-v28 broken DB:
both columns and indexes (idx_pending_messages_worker_pid,
ux_pending_session_tool) appear after one boot.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: wrap v28 mirror dedup+index creation in transaction
Addresses Greptile P2 review on PR #2140: matches the existing pattern in
addObservationsUniqueContentHashIndex (v29 mirror at SessionStore.ts:1127)
and runner.ts rebuildPendingMessagesForSelfHealingClaim. A crash between
the dedup DELETE and the schema_versions INSERT no longer leaves the DB
in a half-applied state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(plan): cynical-deletion plan for 29 open issues
9-phase plan applying delete-first lens to triaged issue corpus.
Headlines: kill defenders (orphan cleanup, EncodedCommand spawn,
restart-port-steal) and tolerators (silent JSON drops, drifted SSE
filters). Each phase closes a named subset of issues.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: delete process-management theater (Phase 1: DEL-1 + DEL-2)
Delete aggressiveStartupCleanup, the PowerShell -EncodedCommand
spawn branch, and the restart-with-port-steal sequence. Replace
daemon spawning with a single uniform child_process.spawn path
using arg-array form, keeping setsid on Unix when available.
The defenders (orphan cleanup, duplicate-worker probes, port
stealing) bred more bugs than they fixed. PID file with start-time
token already provides correct OS-trust ownership; restart now
requests httpShutdown, waits 5s for the port to free, then exits 1
if it didn't (user resolves). Net -247 lines.
Closes #2090, #2095 (already fixed at session-init.ts:78), #2107,
#2111, #2114, #2117, #2123, #2097, #2135.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: observer-sessions trust boundary via CLAUDE_MEM_INTERNAL env (Phase 2: DEL-9)
Replace the cwd === OBSERVER_SESSIONS_DIR discriminator (which every
consumer must repeat and inevitably drifts) with a single env-var
trust boundary set once at spawn time in buildIsolatedEnv.
- buildIsolatedEnv now sets CLAUDE_MEM_INTERNAL=1, covering all three
spawn sites (SDKAgent, KnowledgeAgent.prime, KnowledgeAgent.executeQuery)
- shouldTrackProject checks the env var first (cwd check stays as
belt-and-braces fallback)
- New shared shouldEmitProjectRow predicate — SSE broadcaster and
pagination filter share the same predicate so they can never drift
apart (#2118)
- ObservationBroadcaster filters observer rows from SSE stream
- PaginationHelper hardcoded 'observer-sessions' replaced with
OBSERVER_SESSIONS_PROJECT const
- project-filter basename match pass — *observer-sessions* now matches
basename, not just full path (globToRegex's [^/]* can't cross /)
(#2126 item 1)
- New `claude-mem cleanup [--dry-run]` subcommand wires CleanupV12_4_3
through to the worker for #2126 item 5
Closes #2118, #2126.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: strip proxy env vars before spawning worker (Phase 4: CON-1)
User's HTTP_PROXY/HTTPS_PROXY config was bleeding into internal AI
calls when claude-mem spawns the claude subprocess, causing
connection failures. Strip unconditionally — no passthrough knob,
which rejects #2099's whitelist proposal.
Closes #2115, #2099.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: fail-fast on silent drops in stdin/file-context/memory-save (Phase 5: FF-1)
Three independent fail-fast fixes:
#2089 — stdin-reader silent drop. Non-empty stdin that fails JSON.parse
now rejects with a clear error instead of resolving undefined. Empty
stdin still resolves undefined.
#2094 — PreToolUse:Read truncation Edit deadlock. file-context handler
no longer returns a fake truncated Read result via updatedInput.
Removes userOffset/userLimit/truncated machinery; injects the timeline
via additionalContext only and lets the real Read pass through. Read
state and Claude's expectation now stay consistent, eliminating the
infinite Edit retry loop.
#2116 — /api/memory/save metadata drop + project bug. Schema accepts
metadata as a documented JSON column (migration 30 adds observations.
metadata TEXT, mirrored in SessionStore). Schema also tightened to
.strict() so unknown top-level fields fail fast instead of being
silently dropped. Project resolution now consults metadata.project as
a fallback before defaultProject.
Closes #2089, #2094, #2116.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: small deletions — Zod externalize / Gemini fallback / session timeout / installCLI alias (Phase 6)
DEL-4 (#2113): Externalize zod from mcp-server.cjs and context-generator.cjs
hook bundles so OpenCode's runtime resolves a single Zod copy. Worker
keeps Zod bundled (it's a daemon subprocess, not in OpenCode's hook
bundle). Added zod to plugin/package.json so externalized requires
resolve at runtime.
DEL-5 (#2087): Delete the never-wired GeminiAgent → Claude fallback.
fallbackAgent was always null in production. On 429 the agent now
throws cleanly (message stays pending for retry). Removed
setFallbackAgent, FallbackAgent interface, and the 429 fallback
branch from both GeminiAgent and OpenRouterAgent. Updated docs
that claimed automatic Claude fallback.
DEL-6 (#2127, #2098): Raise MAX_SESSION_WALL_CLOCK_MS from 4h to
24h. The timeout is a real guard against runaway-cost loops (per
issue #1590), but 4h kills legitimate long Claude Code days. 24h
preserves the guard while never hitting in normal use. No knob —
a session approaching this age is a bug worth investigating, not
a value worth tuning.
DEL-8 (#2054): Delete installCLI() alias function. Saves 4 keystrokes
at the cost of cross-platform shell-config mutation surface — not
worth it. Canonical entry is npx claude-mem (and bunx). Uninstall
now strips legacy alias/function lines from ~/.bashrc, ~/.zshrc,
and the PowerShell profile.
Closes #2087, #2098, #2113, #2127, #2054.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: de-hardcode worker port + multi-account commit (Phase 3: CON-2 + DEL-7)
Replace hardcoded 37777 fallbacks with SettingsDefaultsManager.get(
'CLAUDE_MEM_WORKER_PORT') in npx-cli (runtime/install/uninstall),
opencode-plugin, OpenClaw installer, SearchRoutes example URLs.
Timeline-report SKILL.md now resolves WORKER_PORT from settings.json
at the top and uses ${WORKER_PORT} in all curl invocations.
Remaining 37777 literals are doc comments + viewer build-time form-
field placeholder (which is replaced by /api/settings on mount).
hooks.json: add cygpath POSIX→Windows path translation between _R
resolution and node invocation. No-op on macOS/Linux. Closes the
Windows + Git Bash MODULE_NOT_FOUND in #2109.
CLAUDE.md gains a Multi-account section documenting CLAUDE_MEM_DATA_DIR
+ optional CLAUDE_MEM_WORKER_PORT — every existing path/port code
path now honors them.
Closes #2103, #2109, #2101.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: install/uninstall improvements (Phase 7: #2106)
5 fixes for the install/uninstall flow:
Item 1 — multiselect default. install.ts no longer pre-selects every
detected IDE; user explicitly opts in.
Item 3 — shutdown-before-overwrite. New
src/services/install/shutdown-helper.ts shared by install and
uninstall: POSTs /api/admin/shutdown then polls /api/health until
the worker stops responding. install calls it before
copyPluginToMarketplace so reinstall over a running worker doesn't
conflict; uninstall calls it before deletion.
Item 4 — uninstall path coverage. Removes ~/.npm/_npx/*/node_modules/
claude-mem, ~/.cache/claude-cli-nodejs/*/mcp-logs-plugin-claude-mem-*,
~/.claude/plugins/data/claude-mem-thedotmack/. Best-effort: per-path
try/catch so a single permission failure doesn't abort uninstall.
chroma-mcp shutdown is implicit via the worker's GracefulShutdown
cascade in item 3's helper.
Item 5 — install summary documents "Close all Claude Code sessions
before uninstalling, or ~/.claude-mem will be recreated by active
hooks."
Item 6 — real-port query. After install, fetches /api/health on the
configured port with 3s timeout. Reports actually-bound port if the
response carries it; falls back to requested port. No retry loop.
Closes #2106 (items 1, 3, 4, 5, 6). Items 2, 7 closed separately
as already-fixed and insufficient-detail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: pin chroma-mcp to 0.2.6 (Phase 8: DEL-3 lite)
Replace unpinned 'chroma-mcp' arg with chroma-mcp==0.2.6 in both
local and remote modes. Pinning makes installs deterministic across
machines and across time, eliminating the dependency-drift class
of bugs.
Verified 0.2.6 in a clean uv cache: starts cleanly, no httpcore/
httpx ImportError, no --with flags needed. The --with flags removed
in a0dd516c are not required at this pin (transitive deps resolve
correctly when the top-level version is fixed).
#2102's three protections (transport cleanup on failure, stale onclose
handler guard, 10s reconnect backoff) confirmed intact.
Closes #2046, #2085, #2102.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: update stale assertions for per-UID port + migration 30 (Phase 9)
SettingsDefaultsManager.CLAUDE_MEM_WORKER_PORT default is per-UID
(37700 + uid%100), not literal '37777'. Three assertions in
settings-defaults-manager.test.ts now compute the expected value
the same way the source does.
migration-runner.test.ts: drop expect(versions).toContain(19)
(version 19 was a noop never recorded — pre-existing bug at parent),
add expect(versions).toContain(30) for the new observations.metadata
column added in Phase 5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address Greptile P1/P2 review comments on PR #2141
P1: spawnDaemon return value was unchecked in worker-service.ts restart
case, so a failed spawn silently exited 0 with a misleading "Worker
restart spawned" log. Now error and exit 1 when restartPid is undefined.
P2: shutdown-helper.ts health-poll catch treated AbortError (timeout)
the same as connection-refused, so a slow worker could be reported
confirmedStopped while still holding file locks. Now distinguish:
AbortError continues polling; other errors return confirmedStopped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* build: rebuild plugin artifacts after merging main
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address CodeRabbit review comments on PR #2141
- hooks.json: quote $HOME in cache lookup so paths with spaces work
- timeline-report SKILL.md: fall back when process.getuid is unavailable (Windows)
- opencode-plugin: validate CLAUDE_MEM_WORKER_PORT before using
- uninstall.ts: only strip alias lines, not function declarations (multi-line bodies left intact)
- MemoryRoutes: trim whitespace-only project before precedence resolution
- SessionStore migration 21: preserve metadata column if observations already has it
- stdin-reader test: restore full property descriptor to avoid cross-test pollution
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cynical Deletion Plan — 29 issues → ~7 deletions
Date: 2026-04-25
Branch: claude-mem-skill-invocation-and-github-issue-2139
Source: Triage of all 29 open issues for thedotmack/claude-mem, viewed through a delete-first lens.
Headline
The codebase has accumulated defenders (orphan cleanup → duplicate detection → restart-port-stealing) and tolerators (silent JSON drops, drifted SQL/SSE filters, silent metadata drops). Each defender breeds two more bugs; each tolerator hides the bug it tolerates until it explodes as a "regression." The work is deleting the moats, not patching them.
Coverage map (29 issues)
| Phase | Action | Closes |
|---|---|---|
| P1 | DEL-1 + DEL-2: process-management theater + shell-string spawning | #2090, #2095, #2107, #2111, #2114, #2117, #2135, #2123, #2097 |
| P2 | DEL-9: observer-sessions trust boundary (CLAUDE_MEM_INTERNAL env) | #2126, #2118 |
| P3 | CON-2 + DEL-7: multi-account commit, port/path de-hardcoding | #2103, #2109, #2101 |
| P4 | CON-1: extend env sanitizer to proxy vars | #2115, #2099 |
| P5 | FF-1: fail-fast cleanup | #2089, #2094, #2116 |
| P6 | DEL-4 + DEL-5 + DEL-6 + DEL-8: small deletions | #2113, #2087, #2127, #2098, #2054 |
| P7 | #2106 install fixes (UX + shutdown-before-overwrite + uninstall coverage + real-port query) | #2106 |
| P8 | DEL-3 lite: pin chroma-mcp deterministically (full sqlite-vec migration deferred) | #2046, #2085, #2102 |
| P9 | Verification + close-as-dup/already-fixed | #2112, #2123→#2135, #2097→#2135, #2098→#2127, #2126 (closed by P2) |
Phase 0 — Documentation Discovery (DONE)
Allowed APIs (verified)
- `child_process.spawn(cmd, [args], { detached, stdio, env })`: Node API used in `ProcessManager.ts`. `Bun.spawn` does NOT support `detached: true` (per `process-registry.ts:633-639` comment). Use Node `child_process` for daemon spawning.
- `Bun.spawn([args], { env })`: used for non-detached children (e.g. `chroma-vector-sync.test.ts:25`). Arg-array form bypasses shell on all platforms.
- Agent SDK `query({ cwd, env, spawnClaudeCodeProcess })`: used by `SDKAgent.ts:145-163` and `KnowledgeAgent.ts:75-84`. Custom `spawnClaudeCodeProcess` lets us inject env vars into the spawned `claude` subprocess.
- `sanitizeEnv()` from `src/supervisor/env-sanitizer.ts`: currently strips `CLAUDE_CODE_*` and `CLAUDECODE_*` (preserve list: `CLAUDE_CODE_OAUTH_TOKEN`, `CLAUDE_CODE_GIT_BASH_PATH`).
- `SettingsDefaultsManager.get('CLAUDE_MEM_WORKER_PORT')`: canonical port reader. Default: `37700 + (uid % 100)`.
- `paths.ts` exports: `DATA_DIR`, `OBSERVER_SESSIONS_DIR`, `OBSERVER_SESSIONS_PROJECT`, `USER_SETTINGS_PATH`, `DB_PATH`. All resolve under `CLAUDE_MEM_DATA_DIR` if set.
- Hook exit-code contract (`CLAUDE.md:48-58`): exit 0 = success, exit 1 = non-blocking error, exit 2 = blocking error. Worker errors should exit 0 to prevent Windows Terminal tab accumulation.
Anti-patterns to avoid
- Don't invent shell-string variants of spawn. Use arg-array form everywhere. PowerShell `-EncodedCommand` and quoting heuristics are deletable once we stop building shell strings.
- Don't add new defender code (orphan janitors, duplicate-worker probes, retry-with-backoff loops). The existing defenders are what we're removing.
- Don't add new config knobs (env-passthrough whitelist, configurable timeout). Fix the default instead.
- Don't add tolerators (`|| true`, silent JSON drops, `.passthrough()` schemas that drop fields). Fail loud or accept the input.
- Don't start a sqlite-vec migration in this plan. It's a separate plan with its own discovery.
Surprising findings worth re-verifying mid-plan
- #2090/#2095 may already be fixed: `session-init.ts:78` returns `EXIT_CODE.SUCCESS` on worker-unreachable. Verify against the issue's repro before patching.
- #2115 root cause confirmed: `sanitizeEnv` does NOT strip `HTTP_PROXY`/`HTTPS_PROXY`/`NO_PROXY`. Extend the sanitizer; don't add a passthrough knob (#2099).
- #2094: the `file-context.ts:184,196` truncation is intentional token economics. The bug is that the truncated Read return value confuses Claude into infinite Edit retries. Fix: don't return a partial Read result from a hook; emit an injected-context note instead, or let the full Read happen.
- #2126 items 2, 3, 4, 6 collapse into the P2 trust-boundary fix. Items 1 (basename glob) and 5 (cleanup CLI extension) are real but small.
Phase 1 — Delete process-management theater (DEL-1 + DEL-2)
Closes: #2090, #2095, #2107, #2111, #2114, #2117, #2135, #2123, #2097
What to delete
- `aggressiveStartupCleanup()` at `src/services/infrastructure/ProcessManager.ts:659-727`. Including:
  - Windows WQL filter block (lines 563-606): deletable; PowerShell WQL bug (#2114) disappears
  - Linux/macOS `ps -eo pid,etime,command | grep` block (lines 607-644)
  - `AGGRESSIVE_CLEANUP_PATTERNS` and `AGE_GATED_CLEANUP_PATTERNS` constants
  - `ORPHAN_MAX_AGE_MINUTES` constant
  - All callers of `aggressiveStartupCleanup` (grep for usage; expected: `worker-service.ts` startup)
- PowerShell `-EncodedCommand` wrapper at `ProcessManager.ts:944-1041`. Replace with `child_process.spawn(cmd, [args], { detached: true, stdio: 'ignore', windowsHide: true })`. Arg-array form bypasses shell on Windows, no quoting needed. The `setsid` Unix wrapper stays (it's correct).
- Restart-with-port-steal sequence at `worker-service.ts:1154-1175`. Replace with: try `httpShutdown(port)` → if port still bound after 5s, log error and exit 1 (let user resolve). Don't loop. Don't kill PID by force. The user sees the error and acts.
- Worker-cli duplicate-worker self-detection. Read `src/cli/worker-cli.js` (or wherever the restart entry-point lives). Find the path that triggers duplicate detection on a `restart` command and remove it. The PID file owns the lock; restart should atomically swap.
What stays
- `verifyPidFileOwnership()` at `process-registry.ts:160-182` and `captureProcessStartToken()` at lines 94-146: these are correct. PID file with start-time token is exactly the OS-trust pattern we want.
- The PID file itself at `~/.claude-mem/worker.pid` (or `$DATA_DIR/worker.pid`). This is the lock.
- `waitForPortFree()` with a short timeout: used to confirm shutdown completed. Stays.
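The start-time-token idea can be sketched as a pure check. This is only an illustration: the record shape, the `ownsPidFile` name, and the injected `lookupStartToken` callback (a stand-in for whatever `captureProcessStartToken()` returns) are assumptions, since the real signatures are not shown in this plan.

```typescript
// Hypothetical PID-file record; the real on-disk format is not shown in the plan.
interface PidRecord {
  pid: number;
  startToken: string; // opaque start-time token captured when the worker booted
}

// Injected stand-in for captureProcessStartToken(): returns the live token
// for a PID, or undefined when no such process exists.
type StartTokenLookup = (pid: number) => string | undefined;

function ownsPidFile(raw: string, lookupStartToken: StartTokenLookup): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false; // unreadable PID file: nobody owns the lock
  }
  if (typeof parsed !== "object" || parsed === null) return false;

  const { pid, startToken } = parsed as Partial<PidRecord>;
  if (typeof pid !== "number" || !Number.isInteger(pid)) return false;
  if (typeof startToken !== "string") return false;

  // A recycled PID reports a different start token, so a stale file never matches.
  return lookupStartToken(pid) === startToken;
}
```

Because ownership is decided by comparing tokens, a stale file left by a crashed worker fails the check without any process scanning or killing.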
Implementation steps
- `git grep -n aggressiveStartupCleanup` → list every callsite. Delete the function and every callsite. Run `npm run build-and-sync`.
- Replace daemon-spawn body in `ProcessManager.ts:944-1041`:
  - Single platform-uniform path: `child_process.spawn(execPath, args, { detached: true, stdio: 'ignore', windowsHide: true }).unref()`
  - Keep `setsid` wrapper on Unix when available (process-group cleanup on parent death).
  - Delete the PowerShell branch entirely.
- Rewrite `worker-service.ts:1154-1175` restart case:

      await httpShutdown(port)
      const free = await waitForPortFree(port, 5000)
      if (!free) {
        console.error('Port still bound after shutdown. Resolve manually.')
        process.exit(1)
      }
      removePidFile()
      spawnDaemon(__filename, port)

- Re-verify #2090/#2095 are already fixed by reading `session-init.ts:30-80`. If yes, log "no-op" in plan execution notes. If the original repro still fires, add `|| true`-equivalent at the hooks.json shell wrapper layer (NOT in the handler itself).
- Confirm #2117 (cleanup SIGKILLs own ancestors) goes away once cleanup is deleted.
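The uniform spawn path in the steps above can be sketched as a pure function that decides what to pass to `child_process.spawn`. The `buildDaemonSpawnPlan` name, its parameters, and the way `setsid` is prepended are assumptions for illustration; the plan does not show how `ProcessManager.ts` detects or applies `setsid`.

```typescript
// Hypothetical helper: computes the argv for a detached daemon without ever
// building a shell string, so paths with spaces need no quoting.
interface DaemonSpawnPlan {
  command: string;
  args: string[];
  options: { detached: true; stdio: "ignore"; windowsHide: true };
}

function buildDaemonSpawnPlan(
  execPath: string,
  scriptArgs: string[],
  platform: string, // e.g. process.platform
  hasSetsid: boolean, // assumed to be probed once by the caller
): DaemonSpawnPlan {
  const options = { detached: true, stdio: "ignore", windowsHide: true } as const;

  // The only platform branch: wrap in setsid on Unix when it is available.
  if (platform !== "win32" && hasSetsid) {
    return { command: "setsid", args: [execPath, ...scriptArgs], options };
  }
  return { command: execPath, args: [...scriptArgs], options };
}

// Intended use (not executed here):
//   const plan = buildDaemonSpawnPlan(process.execPath, [script], process.platform, hasSetsid);
//   child_process.spawn(plan.command, plan.args, plan.options).unref();
```

Keeping the decision separate from the actual `spawn` call makes the Windows username-with-space case checkable in a unit test without a Windows VM.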
Verification
- `git grep aggressiveStartupCleanup` returns zero hits.
- `git grep -E "EncodedCommand|powershell.*Start-Process"` returns zero hits in `src/`.
- Manual: kill worker, restart, confirm clean restart. Spawn 3 workers in parallel from different shells, confirm 2 fail with PID-file-owned errors and the first one wins (no kill cascade).
- Windows VM (or CI): username with space (`C:\Users\Alex Newman\`); confirm spawn works without quoting drama. Closes #2135/#2123/#2097.
- Manually verify #2094 is NOT regressed (separate concern; covered in P5).
Anti-pattern guards
- Don't add a "lighter" cleanup. There is no lighter cleanup. The OS owns process lifecycle.
- Don't add a "warn user about orphan workers" branch. If orphans exist, they're someone else's bug.
- Don't add platform branches in the spawn code beyond the existing `setsid` check.
Phase 2 — Observer-sessions trust boundary (DEL-9)
Closes: #2126 (items 2, 3, 4, 6 by deletion; items 1, 5 by small fix), #2118
What to do
Replace the cwd === OBSERVER_SESSIONS_DIR discriminator pattern (which has to be repeated by every consumer and inevitably drifts) with a single env-var trust boundary.
Implementation steps
- Set the env var at every spawn site:
  - `src/services/worker/SDKAgent.ts:113` (`buildIsolatedEnv`): add `CLAUDE_MEM_INTERNAL: '1'` to the returned env.
  - `src/services/worker/knowledge/KnowledgeAgent.ts:73`: same.
  - Confirm both call Agent SDK `query()` with `env: isolatedEnv` so the spawned `claude` subprocess inherits.
- Check the env var first in `shouldTrackProject`:
  - `src/shared/should-track-project.ts:35-44`: first line of function: `if (process.env.CLAUDE_MEM_INTERNAL === '1') return false;`
  - Keep the existing `isWithin(cwd, OBSERVER_SESSIONS_DIR)` check as a belt-and-braces fallback.
- Delete now-redundant filters:
  - `src/services/worker/PaginationHelper.ts:115-117`: keep (UI hides observer rows; harmless).
  - `src/services/worker/PaginationHelper.ts:178`: change hardcoded string `'observer-sessions'` to `OBSERVER_SESSIONS_PROJECT` const for consistency. Tiny fix.
  - `src/services/worker/SSEBroadcaster.ts:45-60`: add the SAME filter that SearchManager uses (`SearchManager.ts:194`). Don't invent a new one. Extract the filter predicate to a shared helper used by both. Closes #2118.
- #2126 item 1 (basename glob fix): Read the issue's exact bug. Likely `EXCLUDED_PROJECTS` matches by full path instead of basename. Fix in the matcher; one-liner.
- #2126 item 5 (cleanup CLI): Extend `src/services/infrastructure/CleanupV12_4_3.ts:185-205` to take a `--dry-run` and report counts. Don't write a new CLI; add the flag to existing.
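A minimal sketch of the env-var trust boundary described in the first two steps. The env is passed in as a parameter rather than read from `process.env` so the check stays testable; the directory value and the simplified `isWithin` are assumptions, and the real `shouldTrackProject` presumably carries other rules that are omitted here.

```typescript
type EnvMap = Record<string, string | undefined>;

// Assumed value for illustration; the real constant is exported from paths.ts.
const OBSERVER_SESSIONS_DIR = "/home/user/.claude-mem/observer-sessions";

// Set once at spawn time, so every internal subprocess inherits the marker.
function buildIsolatedEnv(base: EnvMap): EnvMap {
  return { ...base, CLAUDE_MEM_INTERNAL: "1" };
}

// Simplified containment check for POSIX-style paths (the real isWithin is not shown).
function isWithin(child: string, parent: string): boolean {
  const prefix = parent.endsWith("/") ? parent : parent + "/";
  return child === parent || child.startsWith(prefix);
}

function shouldTrackProject(cwd: string, env: EnvMap): boolean {
  // Trust boundary first: internal sessions are never tracked, whatever their cwd.
  if (env.CLAUDE_MEM_INTERNAL === "1") return false;
  // Belt-and-braces fallback for processes spawned without the marker.
  if (isWithin(cwd, OBSERVER_SESSIONS_DIR)) return false;
  return true;
}
```

The point of the ordering is that consumers no longer need to know where observer sessions live; the cwd check only catches processes that predate the marker.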
Verification
- Add a test: spawn `SDKAgent`, verify the spawned subprocess has `CLAUDE_MEM_INTERNAL=1` in its env.
- Add a test: `shouldTrackProject('/any/path')` with `CLAUDE_MEM_INTERNAL=1` set returns `false`.
- Manual: trigger an observer session, confirm zero new rows under user's project in the DB.
- SSE: connect a client to `/api/events`, trigger an observer session, confirm no observer events on the SSE stream.
Anti-pattern guards
- Don't add a `CLAUDE_MEM_OBSERVER_SESSION_DIR` env override (#2126 item 2). `CLAUDE_MEM_DATA_DIR` already overrides; the observer dir is derived.
- Don't add per-consumer filter knobs. One trust boundary, two existing filters (PaginationHelper, SSE), shared helper.
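The "shared helper" guard can be illustrated with one predicate consumed by both the pagination path and the SSE path. The row shape and the `paginateRows` / `broadcastRow` wrappers are hypothetical; only the predicate name `shouldEmitProjectRow` and the `'observer-sessions'` string come from this document.

```typescript
// Value taken from the hardcoded string the plan replaces; the real constant lives in paths.ts.
const OBSERVER_SESSIONS_PROJECT = "observer-sessions";

interface ProjectRow {
  project: string;
}

// Single source of truth: both consumers below call this and nothing else.
function shouldEmitProjectRow(row: ProjectRow): boolean {
  return row.project !== OBSERVER_SESSIONS_PROJECT;
}

// Hypothetical pagination consumer: filter first, then slice the page.
function paginateRows<T extends ProjectRow>(rows: T[], offset: number, limit: number): T[] {
  return rows.filter(shouldEmitProjectRow).slice(offset, offset + limit);
}

// Hypothetical SSE consumer: returns whether the row was actually sent.
function broadcastRow<T extends ProjectRow>(row: T, send: (row: T) => void): boolean {
  if (!shouldEmitProjectRow(row)) return false;
  send(row);
  return true;
}
```

With the rule defined in one place, a change to what counts as an observer row cannot reach one consumer and miss the other.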
Phase 3 — Multi-account commit + port/path de-hardcoding (CON-2 + DEL-7)
Closes: #2103, #2109, #2101
Discovery showed multi-account is ~80% there: DATA_DIR is fully overridable, per-UID port already exists, PID files are DATA_DIR-relative. The remaining gap is 8 hardcoded 37777 literals + hooks.json bare-port assumption.
What to do
- Eliminate every hardcoded `37777`:
  - `src/ui/viewer/constants/settings.ts:8`: change to read from settings/env at runtime if possible; otherwise leave as build-time default (least bad).
  - `src/npx-cli/commands/runtime.ts:154`, `install.ts:545`, `uninstall.ts:109`: replace fallback with `SettingsDefaultsManager.get('CLAUDE_MEM_WORKER_PORT')`.
  - `src/integrations/opencode-plugin/index.ts:97`: same. Read from settings.
  - `src/services/integrations/OpenClawInstaller.ts:171`: drop the default; require the caller to pass it.
  - `plugin/skills/timeline-report/SKILL.md:23,53`: replace literal with `${CLAUDE_MEM_WORKER_PORT:-37700}` or instruct the skill to read from settings.json. Closes #2103.
- Fix hooks.json port handling for #2109:
  - `plugin/hooks/hooks.json`: every hook command needs to either (a) inherit the port from env or (b) read from settings.json. Update the `bun-runner.js` wrapper to do this once.
  - On Windows + Git Bash, ensure POSIX path → Windows path conversion happens before passing to `node.exe`. The `bun-runner.js` wrapper is the right place.
- Multi-account commit:
  - Document in CLAUDE.md: multi-account works by setting `CLAUDE_MEM_DATA_DIR=/path/to/account-N` per shell. All paths derive from it. Per-UID port collision is handled automatically.
  - Add a one-line CLI command: `claude-mem profile use <name>` that exports the right env vars (or just print the export command for user to eval).
  - Close #2101 with documentation pointing at the above.
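Port resolution as described above can be sketched as follows. The `37700 + (uid % 100)` default comes from this document; the function names, the validation range, the silent fall-through on an invalid value, and the `37777` fallback when no uid is available (for example on Windows, where `process.getuid` does not exist) are assumptions for illustration.

```typescript
const BASE_WORKER_PORT = 37700;

// Per-UID default so two OS users on one machine do not collide.
// fallbackPort is an assumed value for platforms without a uid.
function defaultWorkerPort(uid: number | undefined, fallbackPort = 37777): number {
  if (uid === undefined) return fallbackPort;
  return BASE_WORKER_PORT + (uid % 100);
}

// configured is the raw CLAUDE_MEM_WORKER_PORT string from env or settings.json, if any.
function resolveWorkerPort(configured: string | undefined, uid: number | undefined): number {
  if (configured !== undefined) {
    const trimmed = configured.trim();
    if (/^\d+$/.test(trimmed)) {
      const port = Number(trimmed);
      if (port >= 1 && port <= 65535) return port;
    }
    // Assumption: an invalid value falls through to the default instead of throwing.
  }
  return defaultWorkerPort(uid);
}
```

A caller would pass something like `process.getuid?.()` as `uid`, which covers the Windows case without a platform branch.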
Verification
- `git grep -nE "37777" src/ plugin/` returns only the build-time default in `settings.ts`.
- Run two workers in parallel under different `CLAUDE_MEM_DATA_DIR` values; both bind successfully on different ports; both have separate PID files; both serve separate SSE streams.
- Run timeline-report skill against a non-default port; it picks up the right port from settings.
Anti-pattern guards
- Don't add a "discover running workers on common ports" probe. The settings.json port is the source of truth.
- Don't add a `--port` flag to every CLI command. The env / settings.json owns it.
Phase 4 — Extend env sanitizer (CON-1)
Closes: #2115, #2099
What to do
- `src/supervisor/env-sanitizer.ts`: extend `ENV_PREFIXES` and/or add a `PROXY_VARS` set that strips:
  - `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `NO_PROXY` (and lowercase variants)
  - Optionally: `npm_config_proxy`, `npm_config_https_proxy`
- Decide whether the strip should be unconditional or opt-in. Default: unconditional. Worker spawns `claude` for internal AI calls; the user's proxy config should not bleed in.
- Reject #2099's passthrough-whitelist feature. Close with: "we now strip proxy vars by default; if you have a real use case for letting them through, file a new issue with details."
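A sketch of the unconditional proxy strip, assuming a standalone helper rather than the real `sanitizeEnv` (whose internals are not shown here). The `PROXY_VARS` name comes from the plan; `stripProxyVars` and the case-insensitive lookup are illustrative choices.

```typescript
// Lowercased names; lookups lowercase the key so HTTP_PROXY and http_proxy both match.
const PROXY_VARS = new Set([
  "http_proxy",
  "https_proxy",
  "all_proxy",
  "no_proxy",
  "npm_config_proxy",
  "npm_config_https_proxy",
]);

function stripProxyVars(env: Record<string, string | undefined>): Record<string, string | undefined> {
  const clean: Record<string, string | undefined> = {};
  for (const [key, value] of Object.entries(env)) {
    if (PROXY_VARS.has(key.toLowerCase())) continue; // no passthrough knob
    clean[key] = value;
  }
  return clean;
}
```

The function returns a copy and leaves the caller's environment untouched, so only the spawned subprocess sees the sanitized view.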
Verification
- Set `HTTPS_PROXY=http://bad-proxy:1234` in the worker shell. Spawn an SDK subprocess. Confirm the subprocess's env does NOT contain `HTTPS_PROXY`. Add a test for this.
- `git grep -n "HTTP_PROXY\|HTTPS_PROXY"` shows the sanitizer is the only place that knows about them.
Phase 5 — Fail-fast cleanup (FF-1)
Closes: #2089, #2094, #2116. #2118 is closed by P2.
#2089 — stdin-reader silent drop
src/cli/stdin-reader.ts:156-164 — onEnd resolves with undefined even on parse failure. Change to: if input is non-empty AND parse fails, throw or call the safety-timeout error path. Match what the issue asks for: distinguish "no input" from "malformed input." Document in the function header.
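The "no input" versus "malformed input" distinction can be sketched as a small pure function. `parseHookStdin` is a hypothetical name; the real `onEnd` handler in `stdin-reader.ts` works on a stream and a promise, which this sketch leaves out.

```typescript
// Returns undefined for empty input, the parsed value for valid JSON,
// and throws for non-empty input that is not valid JSON.
function parseHookStdin(raw: string): unknown {
  if (raw.trim() === "") return undefined; // legitimately nothing to read

  try {
    return JSON.parse(raw);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // Fail loud: the caller sees an error instead of a silent undefined.
    throw new Error(`stdin contained malformed JSON: ${reason}`);
  }
}
```

In a promise-based reader, the thrown error would map to a rejection, while the empty case keeps resolving `undefined`.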
#2094 — PreToolUse:Read truncation causes Edit deadlock
src/cli/handlers/file-context.ts:141-143, 184, 196 — the truncation is intentional (token economics), but returning a truncated Read result confuses Claude. Fix:
- Hooks should not return modified Read results. They can inject context as `additionalContext` or skip entirely.
- Audit what the handler returns to Claude Code. If it returns a fake Read response with 1 line, that's the bug. It should either return `{ continue: true }` (let the real Read happen) or inject context via `additionalContext` field.
- Read Claude Code's PreToolUse hook contract for what fields are allowed in the response.
#2116 — /api/memory/save silently drops metadata
src/services/worker/http/routes/MemoryRoutes.ts:16-20, 38-67 — the schema uses .passthrough() which keeps unknown fields, but discovery suggests fields are dropped at insert time. Audit:
- Where do the schema's accepted fields get inserted? If only `text`/`title`/`project` are in the INSERT statement, the metadata is dropped silently.
- Fix: either accept arbitrary metadata into a `metadata` JSON column, or reject requests with unknown fields (`.strict()` instead of `.passthrough()`). Pick one. Default: accept into a JSON column.
- The "force project to plugin's own project" line at `MemoryRoutes.ts:40` (`const targetProject = project || this.defaultProject`) is fine. It uses caller's value if provided. Verify the issue reporter wasn't omitting `project` field.
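A hand-rolled sketch of the accept-or-reject behaviour, written without Zod so it stays self-contained; it stands in for what `.strict()` does and is not the route's actual schema. The field list is taken from the bullet above, and the project precedence (explicit `project`, then `metadata.project`, then the default, with whitespace-only values ignored) follows what the commit message at the top of this document reports. All function names are hypothetical.

```typescript
interface SaveBody {
  text: string;
  title?: string;
  project?: string;
  metadata?: Record<string, unknown>;
}

const ALLOWED_SAVE_KEYS = new Set(["text", "title", "project", "metadata"]);

// Rejects unknown top-level fields instead of silently dropping them.
function validateSaveBody(body: Record<string, unknown>): SaveBody {
  const unknownKeys = Object.keys(body).filter((key) => !ALLOWED_SAVE_KEYS.has(key));
  if (unknownKeys.length > 0) {
    throw new Error(`Unknown field(s): ${unknownKeys.join(", ")}`);
  }
  const { text, title, project, metadata } = body;
  if (typeof text !== "string") throw new Error("text must be a string");
  if (title !== undefined && typeof title !== "string") throw new Error("title must be a string");
  if (project !== undefined && typeof project !== "string") throw new Error("project must be a string");
  if (metadata !== undefined && (typeof metadata !== "object" || metadata === null || Array.isArray(metadata))) {
    throw new Error("metadata must be a JSON object");
  }
  return { text, title, project, metadata: metadata as Record<string, unknown> | undefined };
}

// Precedence: explicit project, then metadata.project, then the default.
function resolveProject(body: SaveBody, defaultProject: string): string {
  const explicit = body.project?.trim();
  if (explicit) return explicit;
  const fromMetadata = body.metadata?.["project"];
  if (typeof fromMetadata === "string" && fromMetadata.trim() !== "") return fromMetadata.trim();
  return defaultProject;
}
```

The metadata object would then be serialized with `JSON.stringify` into the `metadata` column at insert time, so nothing the schema accepts is lost between validation and storage.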
Verification
- Test: `POST /api/memory/save` with `metadata: { foo: 'bar' }`; confirm the data is retrievable.
- Test: malformed JSON to stdin-reader fires error, not silent undefined.
- Manual: trigger PreToolUse:Read on a large file and confirm Edit succeeds afterward (no deadlock).
Phase 6 — Small deletions (DEL-4 + DEL-5 + DEL-6 + DEL-8)
DEL-4 — Un-bundle Zod from hook scripts (#2113)
- `scripts/build-hooks.js:163-171, 203-230, 294`: add `'zod'` to the `external` list for hook builds.
- If hooks need validation, write a 20-line shape check (`typeof x.foo === 'string'` etc.). Don't reach for Zod for hook input.
- Audit `src/hooks/` for Zod imports; replace with hand-rolled checks.
- Worker (`worker-service.cjs`) can still bundle Zod; the conflict is only in hook-bundled scripts loaded by OpenCode.
Verification: node -e "require('./plugin/scripts/<hook>.js')" shows no Zod in the bundle. Run with OpenCode hook environment; #2113's TypeError doesn't reproduce.
DEL-5 — Delete GeminiAgent fallback (#2087)
- `src/services/worker/GeminiAgent.ts:130-132`: delete `setFallbackAgent`.
- `src/services/worker/GeminiAgent.ts:365`: delete the `if (this.fallbackAgent)` branch. On 429: log + throw.
- `src/services/worker/OpenRouterAgent.ts:79-81`: same.
- `tests/gemini_agent.test.ts:279, 313`: delete the fallback tests; add an explicit "429 throws" test.
- Update docs anywhere that mentions Gemini-falls-back-to-Claude (it never did in production).
DEL-6 — Delete the 4-hour session timeout knob request (#2127, #2098)
- Find `MAX_SESSION_WALL_CLOCK_MS` (likely `src/services/worker/sessions/SessionManager.ts` or similar). Read the surrounding code: what does the timeout do? (Likely cleanup of stale sessions.)
- If the timeout is arbitrary: raise to 24h or remove. Document why.
- If the timeout exists for a real reason (memory pressure, abandoned sessions): document the reason in code, raise to a value nobody hits in practice, and close both issues with the explanation.
- Close #2098 as dup of #2127.
DEL-8 — Delete installCLI() alias (#2054)
- `plugin/scripts/smart-install.js:345-395`: delete `installCLI` function.
- `plugin/scripts/smart-install.js:633`: delete the call.
- `src/npx-cli/commands/uninstall.ts`: add a one-time legacy-alias-strip pass:
  - Read `~/.bashrc`, `~/.zshrc`, `~/Documents/PowerShell/Microsoft.PowerShell_profile.ps1`.
  - Remove any line matching `^alias claude-mem=` or `^function claude-mem`.
  - Print "Removed legacy claude-mem alias from " so users know.
- Update README + docs: canonical entry points are `npx claude-mem <cmd>` and `bunx claude-mem <cmd>`.
Verification: Fresh install creates no shell-config mutations. Existing user with the alias runs uninstall — alias is gone. which claude-mem after uninstall returns nothing.
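The legacy-alias strip can be sketched as a pure transformation over the contents of a shell config file. `stripLegacyAlias` is a hypothetical name. This version removes only `alias claude-mem=` lines, following the review note in the commit message at the top of this document that multi-line function bodies are left intact; matching `^function claude-mem` as well would need body-aware parsing.

```typescript
interface StripResult {
  content: string; // file contents with legacy alias lines removed
  removed: number; // how many lines were dropped
}

function stripLegacyAlias(content: string): StripResult {
  const lines = content.split("\n");
  // Anchor at line start so commented-out or indented mentions are left alone.
  const kept = lines.filter((line) => !/^alias claude-mem=/.test(line));
  return { content: kept.join("\n"), removed: lines.length - kept.length };
}
```

A caller would write the file back only when `removed` is greater than zero, which keeps uninstall from touching shell configs that never had the alias.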
Phase 7 — #2106 install fixes (modest scope)
Closes: #2106 (items 1, 3, 4, 6 by fix; items 2, 7 by close-as-already-fixed/insufficient-detail; item 5 by documentation).
Fixes
- Item 1 (multiselect default): `src/npx-cli/commands/install.ts:275-277`: change `initialValues: detected.filter(...).map(...)` to `initialValues: []`. Force explicit opt-in.
- Item 3 (install-shutdown-before-overwrite): Extract `uninstall.ts:109-132` (HTTP shutdown + poll) to `src/services/install/shutdown-helper.ts`. Call it from both `uninstall.ts` and `install.ts` before `copyPluginToMarketplace`.
- Item 4 (uninstall path coverage): `src/npx-cli/commands/uninstall.ts`: add removal of:
  - `~/.npm/_npx/*/node_modules/claude-mem`
  - `~/.cache/claude-cli-nodejs/*/mcp-logs-plugin-claude-mem-*`
  - `~/.claude/plugins/data/claude-mem-thedotmack/`
  - Cascade shutdown to chroma-mcp (call its shutdown endpoint or kill PID).
- Item 6 (real port query): `install.ts:545`: after `smart-install.js` completes, hit `http://127.0.0.1:<settingsPort>/api/health` and report the actually-bound port. If health fails, just print "worker not yet ready" and exit cleanly.
- Item 5 (documentation): Add to install summary output: "Close all Claude Code sessions before uninstalling, or `~/.claude-mem` will be recreated by active hooks."
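The shutdown-then-poll helper from item 3 might look roughly like the sketch below. The probe is injected so the loop does not depend on a particular HTTP client; all names, the attempt count, and the delay are assumptions. The timeout handling follows the review fix recorded in the commit message: an `AbortError` means the worker may merely be slow, so polling continues, while any other failure is read as the worker being gone.

```typescript
type ProbeFailure = "keep-polling" | "stopped";

// Distinguish "request timed out" from "nothing is listening".
function classifyProbeFailure(err: unknown): ProbeFailure {
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";
  return name === "AbortError" ? "keep-polling" : "stopped";
}

// probe is a hypothetical callback: it resolves when /api/health answered
// and rejects when the request failed or was aborted by a timeout.
async function waitForWorkerStop(
  probe: () => Promise<void>,
  maxAttempts: number,
  delayMs: number,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await probe(); // worker still answering: keep waiting
    } catch (err) {
      if (classifyProbeFailure(err) === "stopped") return true;
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // never confirmed stopped within the attempt budget
}
```

Returning `false` rather than throwing lets install and uninstall decide for themselves how to report a worker that would not stop.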
Close
- Item 2 (SQLite migration race): closed as already fixed by `ba37b2b2` / `68e92edc`.
- Item 7 (vague SessionStart errors): closed as insufficient detail.
Verification
- Fresh install on a clean VM: only the IDEs the user explicitly checks are installed.
- Reinstall while worker is running: install succeeds, no "overwrite" loop.
- Uninstall + `find ~/.npm ~/.cache ~/.claude -name "*claude-mem*"` returns empty.
- Install summary prints the actual port when the user has overridden via env or settings.
Phase 8 — Chroma deterministic pinning (DEL-3 lite)
Closes: #2046, #2085, #2102
Full sqlite-vec migration is a separate plan (would require replacing the embedding pipeline currently owned by chroma-mcp's bundled SBERT). For this plan: stop using uvx --with flags ad-hoc and pin chroma-mcp to a specific version with locked deps.
Implementation
- Pin chroma-mcp version. `src/services/sync/ChromaMcpManager.ts:200-244`: change `buildCommandArgs()` to invoke a specific pinned version: `uvx --python 3.11 chroma-mcp==<X.Y.Z>` (pick a known-good version that bundles its own deps).
- Re-add `--with httpcore --with httpx` ONLY if the pinned version requires them. Verify by running the pinned command in a clean uvx cache. If the deps are declared properly upstream, the `--with` flags are unnecessary.
- Verify #2102 fix is intact: commit `05114bec` added transport cleanup on timeout, stale onclose handler guard, and 10s reconnect backoff. Read `ChromaMcpManager.ts` to confirm these are still present.
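A sketch of a pinned argument builder in the shape the first step describes. The function name and the `serverArgs` parameter are hypothetical and the real `buildCommandArgs()` is not shown here; the `--python 3.11` and `chroma-mcp==<X.Y.Z>` form is copied from the plan, and the commit message at the top of this document records 0.2.6 as the version that was eventually pinned.

```typescript
// Builds the argument list passed to uvx. Refuses anything that is not an
// exact X.Y.Z pin, so an unpinned or ranged spec cannot slip back in.
function buildPinnedChromaArgs(version: string, serverArgs: string[]): string[] {
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    throw new Error(`chroma-mcp version must be an exact X.Y.Z pin, got "${version}"`);
  }
  return ["--python", "3.11", `chroma-mcp==${version}`, ...serverArgs];
}
```

Validating the pin at build time turns dependency drift into an immediate, visible error instead of a machine-specific import failure later.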
Decision deferred to a separate plan
- Replacing chroma-mcp with sqlite-vec or a different vector store. This requires picking an embedding strategy (OpenAI? local model?) and rewriting `ChromaSync.ts`. Not in this plan.
Verification
- Fresh install on a clean machine: `~/.claude-mem/chroma/` populates, `chroma_query_documents` returns results without errors.
- No "No module named 'httpcore'" error in worker logs (closes #2046, #2085).
- Force a chroma-mcp timeout (e.g. kill the subprocess); confirm the worker reconnects after backoff without spawning duplicate subprocesses (closes #2102).
Phase 9 — Verification + close-as-dup
Cross-cutting verification
- `git grep -nE "aggressiveStartupCleanup|EncodedCommand|setFallbackAgent|installCLI"`: all return zero hits.
- `git grep -nE "37777" src/ plugin/`: only the build-time default in `viewer/constants/settings.ts`.
- Full test suite passes.
- `npm run build-and-sync` completes; worker starts; SessionStart context injection works (manual test: open a new session, confirm memory recap appears).
- CI runs on Windows (or manual VM): username with space spawns successfully.
Close issues
- #2112: already fixed → close with link to fix commit.
- #2123: dup of #2135.
- #2097: dup of #2135.
- #2098: dup of #2127.
- #2126: closed by P2 trust-boundary fix.
- #2099: closed by P4 (proxy strip is the right fix; passthrough whitelist not needed).
- #2101: closed by P3 documentation + multi-account commit.
- #2117: closed by P1 (deletion of aggressive cleanup).
- #2087: closed by P6 (DEL-5).
All other issues close as part of their respective phase verification.
Plan execution order
P1 first (highest leverage; closes 9 issues; reverses regression treadmill). Then P2 (single trust boundary closes 2 issues + prevents future leaks). P3-P8 are independent and can run in parallel by different sessions. P9 last.
If time-constrained, the high-value subset is P1 + P2 + P5: kills the two structural patterns (defenders, tolerators) plus the trust-boundary leak. That alone closes 14 of 29 issues with mostly deletions.