Six numbered plan documents covering: - 01 Hook IO Discipline (#2376) - 02 Spawn-Contract Templating (#2377) - 03 Worker / Daemon Lifecycle Hardening (#2378) - 04 Installer Failure Transparency (#2379) - 05 Observer SDK Tool Enforcement (#2380) - 06 Worker Env Isolation (#2381) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
47 KiB
Installer Failure Transparency — Cross-IDE Matrix
Goal: Stop the universal installer (npx claude-mem install) from silently swallowing real failures and falsely reporting "installed successfully" on all 12 IDEs. Convert every error-suppression site to a single installerError(severity, ctx) decision point driven by an explicit taxonomy. Make tree-sitter ERESOLVE conflicts and missing uv fail loudly with platform-specific remediation. Add a 12-IDE × 4-failure-mode validation matrix and CI postinstall regression guards inspired by the v12.6.2 tree-sitter-swift fix.
Net effect:
- "Installation Complete" is only printed when every ABORT-level dependency was satisfied. Partial outcomes get a yellow "Installation Partial" headline with a remediation block.
runNpmInstallInMarketplace()runs strict first;--legacy-peer-depsis only applied on a confirmedERESOLVEtoken, with the fallback announced loudly.- Missing
uvafter auto-install attempt = ABORT with platform-specific instructions surfaced as the primary message (not buried under a wrapped "version probe failed" line). When the user has opted out of vector search, downgrade to WARN_CONTINUE. - Postinstall regression guard: any new transitive dep with
scripts.postinstallorscripts.installthat is not in an explicit allowlist fails the build, preventing a re-run of the v12.6.1tree-sitter-swifthang. - Cross-IDE test matrix: 12 IDEs × 4 scenarios (happy / ERESOLVE / missing uv / missing bun) = 48 cells, each asserting exit code, summary text, and remediation presence.
Out of scope (defer to follow-up plans):
- Replacing
bun-runner.js(its own runtime concerns; tracked inplans/2026-04-29-installer-streamline.md). - Re-architecting
bufferConsoleto a structured event stream (this plan only fixes the data loss; full streaming UX is later). - Internationalizing the new remediation messages (English-only for now).
- Migrating
openclaw/install.shfrom bash to TypeScript (audit only; remediation is in-place hardening).
Problem Statement (with line citations)
Concrete swallowed errors that exist today:
| # | File | Line(s) | Current behavior | Why it matters |
|---|---|---|---|---|
| 1 | src/npx-cli/commands/install.ts |
1126–1135 | Catches every npm install error, prints console.warn, returns the misleading task message Dependencies may need manual install ⚠. The surrounding install still ends with installed successfully!. |
A genuine ERESOLVE (or any npm crash) becomes a yellow tip the user immediately ignores. |
| 2 | src/npx-cli/commands/install.ts |
565–581 | runNpmInstallInMarketplace always uses npm install --omit=dev --legacy-peer-deps. The flag papers over real peer conflicts unconditionally. |
The next time a tree-sitter peer range tightens, --legacy-peer-deps will quietly install a broken tree, and we'll only see runtime failures. |
| 3 | src/npx-cli/install/setup-runtime.ts |
206–219 | If getUvVersion() returns null after auto-install, throws "uv installed but version probe failed." runInstallCommand does not wrap this with platform-specific instructions; the user sees the wrapped error during a clack spinner that may overwrite it. |
Honors CLAUDE.md's "uv auto-installed if missing" promise on the happy path but degrades to a confusing one-liner on failure. |
| 4 | src/npx-cli/commands/install.ts |
163–169, 328–347 | Per-IDE failures push into pendingErrors[] via bufferConsole (lines 43–64). installStatus (line 1197) only reads failedIDEs.length > 0, so an IDE that throws after bufferConsole returns 0 is invisible. The summary line "Failed: …" is the only signal. |
A single failed IDE produces a yellow note that scrolls off-screen above the green "installed successfully!" outro. |
| 5 | src/npx-cli/commands/install.ts |
1131 | console.warn('[install] npm install error:', …) — error is logged but not classified, retried, or surfaced in the summary. |
Same root cause as #1: stderr disappears, exit code stays 0. |
| 6 | src/npx-cli/commands/install.ts |
1161–1166 | disableClaudeAutoMemory failures classified as "WARN_CONTINUE" today (correct severity), but the implementation is ad-hoc. |
Inconsistent — every other catch in this file uses different logging shapes. |
| 7 | openclaw/install.sh |
36 occurrences of 2>/dev/null / || true (e.g. lines 169, 224–229, 251, 255, 289, 293, 405, 435, 471, 495, 572, 612, 631, 670, 1076, 1155, 1161, 1185) |
Bash-level error suppression on curl/jq/find/health-check pipelines. Many are correct (best-effort probes), but several mask genuine install failures. | Some || true patterns hide a missing bun or unwritable plugin dir. |
| 8 | src/services/integrations/*.ts |
50+ catch blocks across 7 files (Codex, Cursor, Gemini, OpenCode, OpenClaw, Windsurf, MCP) | Each integration installer has its own ad-hoc error handling. Errors return non-zero, are buffered by bufferConsole, then dropped. |
The IDE matrix has 12 different failure UX paths. |
| 9 | scripts/build-hooks.js |
Generates plugin/package.json with all tree-sitter deps and trustedDependencies: ['tree-sitter-cli']. No CI guard prevents adding a new package with scripts.postinstall outside this allowlist. |
The exact root cause of v12.6.1 — re-runnable by anyone editing this file. |
Reference incident (canonical learning)
CHANGELOG.md:93–110 documents v12.6.1 → v12.6.2: PR #2300 moved 21 tree-sitter grammars from devDependencies to dependencies; tree-sitter-swift's postinstall pulled a nested tree-sitter-cli that downloaded a Rust binary and SIGINT'd. Lesson: npm does not honor trustedDependencies (Bun-only). Any new transitive dep with a network postinstall can hang npx claude-mem install. Phase 7 turns this into a CI guard.
Phase 0 — Documentation Discovery
Each implementation phase below cites these facts by line number; do not re-derive.
Allowed APIs / patterns to copy
| Item | Location | What to copy |
|---|---|---|
Existing clack runTasks / bufferConsole pattern |
src/npx-cli/commands/install.ts:32–64 |
Tasks return a string; orchestrator handles spinner. Reuse, but route every error through installerError. |
describeExecError (stdout/stderr extractor) |
src/npx-cli/install/setup-runtime.ts:100–112 |
Already canonical for child_process errors. Move to a shared module. |
| Marker write pattern for partial state | src/npx-cli/install/setup-runtime.ts:262–275 |
Use the same JSON shape ({ severity, component, phase, cause, …}) for the new ~/.claude-mem/last-install-error.json. |
| Plugin-cache resolution | src/npx-cli/utils/paths.ts (pluginCacheDirectory, marketplaceDirectory) |
All path resolution must honor CLAUDE_MEM_DATA_DIR; reuse instead of inventing. |
| Existing IDE list (canonical 12) | src/npx-cli/commands/ide-detection.ts:40–129 |
claude-code, gemini-cli, opencode, openclaw, windsurf, codex-cli, cursor, copilot-cli, antigravity, goose, roo-code, warp. |
trustedDependencies allowlist (postinstall guard) |
scripts/build-hooks.js:106–108 and root package.json:190–202 |
The pattern Phase 7 enforces. |
| Existing install tests (extend, don't replace) | tests/install-non-tty.test.ts, tests/setup-runtime.test.ts, tests/install-disable-auto-memory.test.ts |
Same harness shape (mocked spawn, isolated TMPDIR HOME). |
| Docker harness (clean Linux) | Dockerfile.test-installer |
Already supports running install with no bun/uv preinstalled. Phase 6 forks this for the matrix runner. |
| CLAUDE.md exit-code contract | CLAUDE.md "Exit Code Strategy" section |
Hooks: exit 0 = success, 1 = non-blocking, 2 = blocking. Installer is NOT a hook — it can exit 1 or 2 for ABORT. Phase 8 cross-references. |
| Prior plan format | plans/2026-04-29-installer-streamline.md, plans/2026-04-30-onboarding-ux-overhaul.md |
Phased layout, file inventory, anti-patterns table. |
| v12.6.2 incident text | CHANGELOG.md:93–110 |
Phase 7 quotes this verbatim in code comments. |
External facts (cited)
| Topic | Source / canonical reference | Key fact |
|---|---|---|
npm ERESOLVE semantics |
npm install docs (npm v10+) and npm RFC 0023 |
ERESOLVE is emitted on stderr with a deterministic prefix npm error code ERESOLVE followed by While resolving: block. --legacy-peer-deps skips peer-dep resolution; --force accepts conflicting trees. They are NOT equivalent — --force is more aggressive and is not what we want. |
| Bun install errors | bun install source / docs |
Stderr lines start with error:. A peer-dep violation prints error: package "X" has unmet peer "Y". A network failure prints error: failed to resolve. |
| uv install script return codes | https://astral.sh/uv/install.sh |
Exits 0 on success even when binary lands in a non-PATH dir (e.g. ~/.local/bin not yet on PATH). The version probe must check UV_COMMON_PATHS after the script runs. |
| Claude Code hook exit-code contract | CLAUDE.md "Exit Code Strategy" |
Worker/hook errors exit 0 (Windows Terminal hygiene). The npx claude-mem install CLI is NOT a hook and is allowed to exit non-zero on ABORT. |
Anti-patterns / API methods that DO NOT exist (avoid inventing)
- There is no central
installerErrorfunction today. Phase 3 must create it. Do not reach for a non-existent helper. --forceis not a substitute for--legacy-peer-deps. Phase 4 must not "upgrade" the fallback to--force— that masks more than ERESOLVE.- npm has no
--no-postinstallflag at the CLI level. The correct flag is--ignore-scripts. Don't invent. - Bun's
trustedDependenciesis not honored by npm. Do not assume the same allowlist works for both. Phase 7 enforces a separate npm-level guard. process.exitCode = 1(line 1324 of install.ts) does not abort an in-flightawaitchain. Phase 3'sInstallAbortErrormust throw, not just setexitCode.- The
bufferConsolewrapper (install.ts:43–64) swallows stderr inside the buffer; do not assume stderr ever reaches the terminal in non-interactive mode unless explicitly flushed. clack'sp.spinner()overwrites the line on.stop(). Errors emitted viaconsole.warnduring a spinner are lost. Phase 3's WARN_CONTINUE must enqueue to a summary list, not log live.ensureUv()already throws on failure — but the throw is caught one level up by clack's task runner, which displays the message in a single line. Do not assume the user reads it; Phase 5 must add an explicit ABORT block.- The
install/public/install.shandinstall/public/installer.jsfiles are already deprecated stubs (verified — both just print "use npx claude-mem install"). Don't waste audit time on them. openclaw/install.shis the active shell installer (1653 lines). It has its own bash-level audit in Phase 1.
File inventory
| File | Lines | Disposition |
|---|---|---|
src/npx-cli/commands/install.ts |
1371 | Edited heavily (Phase 1, 3, 4, 5) |
src/npx-cli/install/setup-runtime.ts |
288 | Edited (Phase 5, 7) |
src/npx-cli/install/error-taxonomy.ts |
NEW | CREATED (Phase 2) |
src/npx-cli/install/error-reporter.ts |
NEW | CREATED (Phase 3) |
src/services/integrations/CodexCliInstaller.ts |
~360 | Edited (Phase 3) — every catch routed to installerError |
src/services/integrations/CursorHooksInstaller.ts |
~530 | Edited (Phase 3) |
src/services/integrations/GeminiCliHooksInstaller.ts |
~310 | Edited (Phase 3) |
src/services/integrations/OpenCodeInstaller.ts |
~250 | Edited (Phase 3) |
src/services/integrations/OpenClawInstaller.ts |
~260 | Edited (Phase 3) |
src/services/integrations/WindsurfHooksInstaller.ts |
~395 | Edited (Phase 3) |
src/services/integrations/McpIntegrations.ts |
~220 | Edited (Phase 3) |
openclaw/install.sh |
1653 | Audited and selectively hardened (Phase 1) |
scripts/build-hooks.js |
~250 | Edited (Phase 7) — postinstall allowlist guard |
scripts/check-postinstall-allowlist.js |
NEW | CREATED (Phase 7) — pre-publish CI script |
tests/install-error-matrix.test.ts |
NEW | CREATED (Phase 6) — 12 × 4 matrix |
tests/install-non-tty.test.ts |
277 | Extended (Phase 6) |
tests/setup-runtime.test.ts |
135 | Extended (Phase 5) |
Dockerfile.test-installer-matrix |
NEW | CREATED (Phase 6) |
docs/public/troubleshooting.mdx |
NEW or extended | Edited (Phase 8) |
CLAUDE.md "Exit Code Strategy" |
Existing | Edited (Phase 8) — cross-reference taxonomy |
CHANGELOG.md |
— | DO NOT EDIT — generated automatically per CLAUDE.md |
Phase 1 — Audit every error-suppression pattern
Goal: Produce a definitive table of every catch, || true, 2>/dev/null, and try {} catch {} in installer paths. Every row gets a proposed Phase 2 classification (ABORT / FAIL_LOUD_PER_IDE / WARN_CONTINUE / SILENT_RETRY).
Deliverable: plans/audit-installer-errors.csv (committed alongside this plan), with columns:
file, line, kind (catch | bash-or-true | bash-redirect), current_behavior, proposed_severity, proposed_remediation_text, notes.
What to audit (exact greps)
Run these greps from repo root and turn every hit into a row:
# TS catch blocks
grep -nE 'catch\s*(\(|\{)' src/npx-cli/ src/services/integrations/ -r
# TS empty catch
grep -nB1 'catch\s*\{\s*\}' src/npx-cli/ src/services/integrations/ -r
# TS console.warn after caught error
grep -nE 'catch.*\{' src/npx-cli/ src/services/integrations/ -r -A 3 | grep -A 0 'console\.warn\|log\.warn'
# Shell silent failures
grep -nE '\|\| true|2>/dev/null|2>&1.*\|\|' openclaw/install.sh
# Build / sync scripts
grep -nE 'catch|process\.exit\(0\)' scripts/build-hooks.js scripts/sync-marketplace.cjs
# Plugin hooks
grep -nE 'catch|exit 0' plugin/scripts/version-check.js plugin/scripts/bun-runner.js
Known counts (from the initial audit baked into this plan)
src/npx-cli/commands/install.ts: 14 catch blocks (lines 387, 393, 406, 455, 596, 613, 631, 725, 980, 1056, 1131, 1161, 1243, 1252).src/npx-cli/install/setup-runtime.ts: 5 catch blocks (lines 38, 60, 73, 95, 233).src/services/integrations/CursorHooksInstaller.ts: 8 catch blocks.src/services/integrations/CodexCliInstaller.ts: 8 catch blocks.src/services/integrations/WindsurfHooksInstaller.ts: 9 catch blocks.src/services/integrations/OpenCodeInstaller.ts: 8 catch blocks.src/services/integrations/OpenClawInstaller.ts: 4 catch blocks.src/services/integrations/GeminiCliHooksInstaller.ts: 4 catch blocks.src/services/integrations/McpIntegrations.ts: 2 catch blocks.scripts/sync-marketplace.cjs: 6 catch blocks (line 28, 75, 90, 101, 111, 188, 220).scripts/build-hooks.js: 1 catch block (line 422).openclaw/install.sh: 36|| true/2>/dev/nullpatterns.
Audit total ≈ 105 sites. Each row in the CSV must end with a Phase 2 severity proposal.
Verification checklist
- CSV row count ≥ 100 (matches grep counts above ± 5%).
- Every row has a non-empty
proposed_severity. - No row has
proposed_severity = SILENT— that severity does not exist; the closest valid choice is SILENT_RETRY. - CSV is committed at
plans/audit-installer-errors.csvand referenced from this plan.
Anti-pattern guards
- Do not classify "this catch logs a warning today" as "WARN_CONTINUE" automatically. Read each one and decide. Some are genuine ABORTs masquerading as warnings.
- Do not classify any
2>/dev/nullon acurlhealth probe as ABORT — health probes are best-effort by design. - Do not mark
installClaudeCode()(line 416–462) failures as ABORT; the user explicitly opted into "install Claude Code now" and a failure should be FAIL_LOUD with manual remediation, not abort the install.
Phase 2 — Define error taxonomy
Goal: Single source-of-truth typed enum + lookup table that classifies every installer error and prescribes a remediation string.
File to create: src/npx-cli/install/error-taxonomy.ts
What to implement
Copy the structure from this skeleton (paraphrased; do not edit copy verbatim — adapt to actual TypeScript types in the repo):
export enum ErrorSeverity {
ABORT = 'ABORT', // exit 1, do not continue
FAIL_LOUD_PER_IDE = 'FAIL_LOUD_PER_IDE', // exit 1 if all IDEs fail; otherwise partial summary
WARN_CONTINUE = 'WARN_CONTINUE', // print warning to end-of-install summary, continue
SILENT_RETRY = 'SILENT_RETRY', // retry once with backoff; escalate to WARN_CONTINUE
}
export interface ErrorCategory {
id: string; // 'tree-sitter-eresolve', 'uv-missing', etc.
severity: ErrorSeverity;
match: (cause: unknown, ctx: { component: string; phase: string }) => boolean;
remediation: (ctx: { platform: NodeJS.Platform; dataDir: string }) => string;
}
export const ERROR_CATEGORIES: ErrorCategory[] = [ /* see seed list below */ ];
Seed taxonomy (the categories Phase 3 must implement)
| id | Severity | Match heuristic | Remediation summary |
|---|---|---|---|
bun-missing-after-install |
ABORT | cause.message.includes('Bun executable not found') |
"Install Bun manually then re-run npx claude-mem install. macOS/Linux: curl -fsSL https://bun.sh/install | bash. Windows: winget install Oven-sh.Bun." |
uv-missing-after-install |
ABORT (downgradable to WARN_CONTINUE if user opted out of vector search — see Phase 5) | cause.message.includes('uv executable not found') || cause.message.includes('uv installed but version probe failed') |
Platform-specific block from installUv() (lines 164–166) surfaced as primary message. |
tree-sitter-eresolve |
ABORT (after one retry with --legacy-peer-deps) |
stderr contains literal ERESOLVE AND --legacy-peer-deps retry also failed |
"ERESOLVE conflict in marketplace deps that --legacy-peer-deps could not resolve. Open an issue at https://github.com/thedotmack/claude-mem/issues with the conflicting peer ranges below: <details>." |
bun-install-network-fail |
SILENT_RETRY → WARN_CONTINUE | bun stderr error: failed to resolve for a known package on first try, repeated on retry |
"bun install failed to resolve packages — check network connectivity and re-run npx claude-mem install. Cached packages in ~/.bun/install/cache will be reused." |
marketplace-dir-not-writable |
ABORT | EACCES/EPERM on mkdirSync / writeFileSync to marketplaceDirectory() |
"Cannot write to marketplace directory ${dataDir}/.claude/plugins/.... Check filesystem permissions or set CLAUDE_MEM_DATA_DIR to a writable path." |
plugin-json-corrupt |
ABORT | JSON.parse error on plugin.json |
"Existing plugin.json is corrupt. Run rm -rf ~/.claude/plugins/marketplaces/thedotmack and re-run install." |
all-ides-failed |
ABORT | failedIDEs.length === selectedIDEs.length && selectedIDEs.length > 0 |
"Every selected IDE integration failed. See per-IDE errors above. Re-run with --ide=<single> to isolate." |
single-ide-failed |
FAIL_LOUD_PER_IDE | per-IDE installer non-zero exit | Echo first 20 lines of stderr + "Run npx claude-mem install --ide=<name> to retry just this IDE." |
mcp-integration-optional-fail |
WARN_CONTINUE | MCP installer non-zero AND IDE has alternate (non-MCP) integration path | "MCP setup for ${ide} failed; non-MCP features still work. Run npx claude-mem mcp ${ide} later." |
path-update-failed |
WARN_CONTINUE | applyClaudeCodePathSetupIfNeeded write fails |
"Could not auto-update PATH in ${configFile}. Run manually: echo '...' >> ${configFile}." |
auto-memory-toggle-failed |
WARN_CONTINUE | disableClaudeAutoMemory throws |
"Could not disable Claude Code auto-memory. Add CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 to ~/.claude/settings.json env block." |
version-probe-transient |
SILENT_RETRY → WARN_CONTINUE | bun/uv --version returns non-zero once |
(no message on first try; on retry: "Could not verify ${tool} version — installation likely OK.") |
idempotent-json-merge-race |
SILENT_RETRY | EEXIST/ENOENT race during writeJsonFileAtomic retry |
(silent; retry once.) |
child-process-timeout |
ABORT | spawnSync/execSync timeout (Phase 7's wrapper) | "${command} did not finish in ${timeout}s. Check network connectivity. If the host is slow, set CLAUDE_MEM_INSTALL_TIMEOUT_MS." |
Verification checklist
error-taxonomy.tsexportsErrorSeverity,ErrorCategory,ERROR_CATEGORIES.ERROR_CATEGORIEScontains exactly the 14 rows above (extensions allowed).- Every category's
remediation()readsdataDirfrom a passed-in context, not fromprocess.envdirectly (so multi-account setups work — see CLAUDE.md "Multi-account"). npm run typecheckpasses.
Anti-pattern guards
- Do not include a
SILENTseverity (no remediation, no log). It does not exist in this taxonomy. - Do not hard-code
~/.claude-mempaths in remediation strings. Always interpolatedataDir. - Do not add a category for "unknown error" with low severity. Unknown errors must default to ABORT until classified — fail loud is the safe default.
Phase 3 — Implement installerError(severity, ctx) central handler
Goal: Single function every catch in installer paths must call. ABORTs throw a typed error; WARN_CONTINUEs enqueue to a summary list; SILENT_RETRYs re-invoke the wrapped action.
Files to create: src/npx-cli/install/error-reporter.ts
What to implement
Skeleton (adapt to actual repo conventions; do not paste verbatim):
export class InstallAbortError extends Error {
readonly category: ErrorCategory;
readonly remediation: string;
readonly cause: unknown;
}
export interface ErrorContext {
component: string; // 'cursor', 'codex-cli', 'marketplace-npm-install', 'uv-install', etc.
phase: string; // 'setup-runtime', 'ide-install', 'marketplace-deps', etc.
cause: unknown;
remediation?: string; // optional override; default from taxonomy
eresolveDetails?: string; // raw stderr block to surface verbatim
}
export interface InstallSummary {
warnings: Array<{ component: string; message: string; remediation: string }>;
failedIDEs: string[];
retryCount: Record<string, number>;
}
export function createInstallSummary(): InstallSummary;
export function installerError(
severity: ErrorSeverity,
ctx: ErrorContext,
summary: InstallSummary
): never | void;
export async function withRetry<T>(
action: () => Promise<T>,
ctx: ErrorContext,
summary: InstallSummary,
maxAttempts: number = 2
): Promise<T>;
export function flushSummary(summary: InstallSummary, isInteractive: boolean): void;
Behavior contract
| Severity | Behavior |
|---|---|
ABORT |
Write ~/.claude-mem/last-install-error.json (path resolved via pluginCacheDirectory / CLAUDE_MEM_DATA_DIR), print remediation block to stderr (ANSI-colored only when process.stderr.isTTY), throw InstallAbortError with cause chained. The top-level runInstallCommand catches InstallAbortError, prints the headline "Installation Aborted: <category.id>", and process.exit(1). |
FAIL_LOUD_PER_IDE |
Append to summary.failedIDEs, append a remediation block to summary.warnings. Continue. The top-level summary prints "Installation Partial" (red, not green). Exits 1 only if all IDEs fail (which then triggers all-ides-failed ABORT). |
WARN_CONTINUE |
Append to summary.warnings. Do not log live (clack spinner would clobber). flushSummary prints all warnings after the spinner / outro. |
SILENT_RETRY |
Increment summary.retryCount[component]. If count > 1, escalate to WARN_CONTINUE. Caller uses withRetry helper to wrap the action. |
Refactor every audited catch
For each row in plans/audit-installer-errors.csv produced by Phase 1, replace the existing handler with a call to installerError(severity, ctx, summary). Before/after example:
Before (install.ts:1126–1135):
try {
runNpmInstallInMarketplace();
return `Dependencies installed ${pc.green('OK')}`;
} catch (error: unknown) {
console.warn('[install] npm install error:', error instanceof Error ? error.message : String(error));
return `Dependencies may need manual install ${pc.yellow('!')}`;
}
After:
try {
await runNpmInstallInMarketplace(); // Phase 4: now async w/ ERESOLVE handling
return `Dependencies installed ${pc.green('OK')}`;
} catch (error: unknown) {
installerError(ErrorSeverity.ABORT, {
component: 'marketplace-npm-install',
phase: 'marketplace-deps',
cause: error,
}, summary);
// installerError throws — unreachable, but TypeScript needs a return
return '';
}
Rework bufferConsole
src/npx-cli/commands/install.ts:43–64 currently swallows stderr into a string buffer and only surfaces it via pendingErrors. After this phase:
- A non-zero result from the wrapped function must preserve the stderr verbatim in the returned object (already does).
setupIDEs(lines 328–347) must callinstallerError(FAIL_LOUD_PER_IDE, …)witheresolveDetails: output.slice(0, 4000)(first ~80 lines).- The IDE summary block must show the exit code + first 20 lines of stderr verbatim, not a generic "X failed" line.
Top-level wiring
In runInstallCommand (install.ts:961), thread summary through:
- Create
summaryat the top. - Pass to
setupIDEs, everyrunTaskstask,ensureBun/ensureUv,runNpmInstallInMarketplace. - After all tasks, call
flushSummary(summary, isInteractive)before the existingp.note(summaryLines, installStatus). - Wrap the entire body in
try { … } catch (e) { if (e instanceof InstallAbortError) { … print + exit 1 } else throw }.
Verification checklist
grep -rE 'console\.warn\(.*install' src/npx-cli/ src/services/integrations/returns 0 hits (all warnings go viainstallerError).grep -rE 'catch.*\{[^}]*//.*ignore' src/npx-cli/ src/services/integrations/returns 0 hits.- Every catch in the Phase 1 CSV has been edited (verify by line-number cross-check).
- New unit test: ABORT throws
InstallAbortError, WARN_CONTINUE appends to summary, SILENT_RETRY escalates after 2 attempts. npm run typecheckpasses.npm run testpasses (existing tests must keep passing — refactor must be behavior-preserving on the happy path).
Anti-pattern guards
- Do not call
process.exit()directly insideinstallerError— throwInstallAbortErrorso the top-level handler can flush the summary and print a coherent outro. - Do not print warnings live during a clack spinner. Always enqueue to
summary.warningsand flush at the end. - Do not introduce a new global module.
summaryis an explicit parameter (testability). - Do not silence the stack trace inside
InstallAbortError— Node's defaultstackis fine; the user wants debug info.
Phase 4 — tree-sitter ERESOLVE detection and explicit handling
Goal: Replace the unconditional --legacy-peer-deps with strict-first, fall-back-on-confirmed-ERESOLVE-only.
File to edit: src/npx-cli/commands/install.ts:565–581
What to implement
Rewrite runNpmInstallInMarketplace:
async function runNpmInstallInMarketplace(summary: InstallSummary): Promise<void> {
const marketplaceDir = marketplaceDirectory();
const packageJsonPath = join(marketplaceDir, 'package.json');
if (!existsSync(packageJsonPath)) return;
// Phase 7: --ignore-scripts is the default. The 12.6.2 incident proved that
// any new transitive dep with a postinstall (e.g. tree-sitter-swift's
// tree-sitter-cli download) can hang `npx claude-mem install`.
const baseFlags = ['install', '--omit=dev', '--ignore-scripts'];
const strictResult = await runNpmStrict(marketplaceDir, baseFlags);
if (strictResult.code === 0) return;
const stderr = strictResult.stderr ?? '';
const isEresolve = /\bERESOLVE\b/.test(stderr) || /code ERESOLVE/.test(stderr);
if (!isEresolve) {
installerError(ErrorSeverity.ABORT, {
component: 'marketplace-npm-install',
phase: 'marketplace-deps',
cause: new Error(`npm install failed (exit ${strictResult.code})`),
eresolveDetails: stderr.slice(0, 4000),
}, summary);
}
// Confirmed ERESOLVE — log loudly, attempt one fallback with --legacy-peer-deps.
log.warn(`npm reported ERESOLVE peer-dependency conflict in marketplace deps; retrying with --legacy-peer-deps. Conflict details:`);
log.warn(extractEresolveBlock(stderr));
const legacyResult = await runNpmStrict(marketplaceDir, [...baseFlags, '--legacy-peer-deps']);
if (legacyResult.code === 0) {
summary.warnings.push({
component: 'marketplace-npm-install',
message: 'tree-sitter peer-dep ERESOLVE was resolved with --legacy-peer-deps fallback. This is benign for the marketplace install but should be re-evaluated when tree-sitter peer ranges change.',
remediation: 'No action required.',
});
return;
}
installerError(ErrorSeverity.ABORT, {
component: 'marketplace-npm-install',
phase: 'marketplace-deps',
cause: new Error(`npm install --legacy-peer-deps still failed (exit ${legacyResult.code})`),
eresolveDetails: legacyResult.stderr?.slice(0, 4000),
}, summary);
}
Helpers (extract to src/npx-cli/install/npm-install-helper.ts):
runNpmStrict(cwd, flags): Promise<{ code: number; stdout: string; stderr: string }>— wrapsspawnSyncwith timeout (Phase 7).extractEresolveBlock(stderr): string— pulls theWhile resolving:…Conflicting peer dependency:block for display.
Bun install hardening (installPluginDependencies setup-runtime.ts:221–239)
Same pattern: wrap with runBunStrict, parse stderr for error: failed to resolve (network) vs error: package "X" not found (real missing dep). Network failures = SILENT_RETRY (one retry); real missing = ABORT.
Verification checklist
- Existing test
tests/install-non-tty.test.tsstill passes (happy path). - New unit test: simulated
npm installexit 1 withERESOLVEin stderr triggers fallback path. - New unit test: simulated
npm installexit 1 withoutERESOLVE→ immediate ABORT (no fallback). - New unit test: both strict and legacy fail → ABORT with first-20-lines stderr in
eresolveDetails. grep -n "legacy-peer-deps" src/npx-cli/commands/install.tsonly appears insiderunNpmInstallInMarketplace's fallback path, never on first try.
Anti-pattern guards
- Do not use
--force. It accepts conflicting trees that--legacy-peer-depswould skip — different semantics. - Do not retry the strict install — strict failure with no ERESOLVE means a real bug; retrying just hides it.
- Do not assume
ERESOLVEis always present in lowercase. The npm format is uppercase; match/\bERESOLVE\b/not/eresolve/i. - Do not parse stderr with a fragile regex; the simple
\bERESOLVE\btoken check is sufficient. KeepextractEresolveBlockdefensive (return raw stderr if the block markers aren't found).
Phase 5 — Missing-uv auto-detection and explicit failure
Goal: Honor CLAUDE.md's "uv auto-installed if missing" promise, but make the failure case loud and platform-specific. Downgrade to WARN_CONTINUE if the user opted out of vector search.
File to edit: src/npx-cli/install/setup-runtime.ts:206–219
What to implement
Augment ensureUv():
export async function ensureUv(
summary: InstallSummary,
options: { allowVectorSearchOptOut?: boolean } = {}
): Promise<{ uvPath: string; version: string } | { uvPath: null; version: null }> {
if (!isUvInstalled()) {
installUv(); // existing logic — already throws platform-specific error on failure
}
// Post-install verification: PATH may not yet include ~/.local/bin in the
// current shell. Re-probe UV_COMMON_PATHS explicitly.
let uvPath = getUvPath();
if (!uvPath) {
// One more direct check of UV_COMMON_PATHS (in case install just wrote there).
uvPath = UV_COMMON_PATHS.find(existsSync) ?? null;
}
if (!uvPath) {
if (options.allowVectorSearchOptOut && userHasOptedOutOfVectorSearch()) {
installerError(ErrorSeverity.WARN_CONTINUE, {
component: 'uv-install',
phase: 'setup-runtime',
cause: new Error('uv binary not found after install; vector search disabled — continuing.'),
}, summary);
return { uvPath: null, version: null };
}
installerError(ErrorSeverity.ABORT, {
component: 'uv-install',
phase: 'setup-runtime',
cause: new Error('uv binary not found after auto-install attempt'),
remediation: platformUvRemediation(), // surfaced as PRIMARY message
}, summary);
}
const version = getUvVersion();
if (!version) {
// Probe failed once — retry with a 1-second sleep (sometimes new binaries need a moment).
await new Promise((r) => setTimeout(r, 1000));
const retried = getUvVersion();
if (!retried) {
installerError(ErrorSeverity.WARN_CONTINUE, {
component: 'uv-version-probe',
phase: 'setup-runtime',
cause: new Error(`uv binary at ${uvPath} did not respond to --version after retry`),
}, summary);
return { uvPath, version: 'unknown' };
}
return { uvPath, version: retried };
}
return { uvPath, version };
}
Helpers:
userHasOptedOutOfVectorSearch()— checkSettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH)for aCLAUDE_MEM_DISABLE_VECTOR_SEARCHsetting (define if it does not exist; default false).platformUvRemediation()— extract the existing platform-specific block frominstallUv(lines 164–166) into a standalone exported function so both error paths share it.
Apply same pattern to ensureBun
ensureBun (lines 191–204): same retry-after-1s, same platformBunRemediation(). Bun has no opt-out — bun is mandatory for hooks.
Verification checklist
tests/setup-runtime.test.tsextended: case whereinstallUvsucceeds butgetUvPathstill returns null (mockexistsSyncto lie) → ABORT with platform string.- Test: same scenario but with vector search opted out → WARN_CONTINUE,
ensureUvreturns{uvPath: null}. - Test:
getUvVersionreturns null on first call, version on second → returns{ version: ...}after retry, no warning. - Test:
getUvVersionreturns null both times → WARN_CONTINUE,version: 'unknown'.
Anti-pattern guards
- Do not call
installUv()more than once perensureUv()invocation. The auto-install attempt is one-shot; if it fails, ABORT with manual instructions. Do not loop. - Do not silently swallow
installUv()'s thrown error — its message already contains the platform-specific instructions; let them propagate as the ABORT remediation. - Do not add a "press enter to continue" prompt on missing uv — non-interactive installs would hang.
Phase 6 — Cross-IDE validation matrix (12 × 4 = 48 cells)
Goal: Every IDE × every failure mode asserts the right outcome.
Files to create:
tests/install-error-matrix.test.tsDockerfile.test-installer-matrix
What to implement
Use bun test's existing harness. For each of the 12 IDEs (claude-code, gemini-cli, opencode, openclaw, windsurf, codex-cli, cursor, copilot-cli, antigravity, goose, roo-code, warp) and for each of 4 scenarios, generate one test case:
| Scenario | Fixture / mock | Assertions |
|---|---|---|
| Happy path | Mock spawnSync so bun --version, uv --version, npm install all return 0. |
exit 0, stdout contains installed successfully, summary failedIDEs.length === 0, summary.warnings.length === 0. |
| tree-sitter ERESOLVE | Mock npm install to exit 1 with npm error code ERESOLVE in stderr; mock --legacy-peer-deps retry to also exit 1. |
exit 1, stderr contains Installation Aborted: tree-sitter-eresolve, stderr contains the conflicting peer ranges block, stdout does not contain installed successfully. |
| Missing uv (auto-install fails) | Mock getUvPath to return null; mock installUv to throw with astral.sh 404. |
exit 1, stderr contains Installation Aborted: uv-missing-after-install, stderr contains platform-specific manual instructions (curl -LsSf https://astral.sh/uv/install.sh | sh on Linux, winget install astral-sh.uv on Windows). |
| Missing bun (auto-install fails) | Mock getBunPath to return null; mock installBun to throw with bun.sh 404. |
exit 1, stderr contains Installation Aborted: bun-missing-after-install, stderr contains platform-specific manual instructions. |
Helpers needed
setupIsolatedHome(): { home: string; cleanup: () => void }— creates a temp HOME, setsCLAUDE_MEM_DATA_DIR=$home/.claude-mem,HOME=$home, returns paths.mockSpawnSync(matrix: Record<string, { code: number; stdout?: string; stderr?: string }>): void— installs a mock that matches by command+arg.runInstallSubprocess(ide: string, env: Record<string, string>): Promise<{ exitCode: number; stdout: string; stderr: string }>— spawnsbun src/npx-cli/index.ts install --no-auto-start --ide=${ide}with mocked env via a wrapper that injects the spawn mocks.
Docker matrix runner
Dockerfile.test-installer-matrix extends Dockerfile.test-installer:
- Adds
RUN bun installfor the test deps. - ENTRYPOINT runs
bun test tests/install-error-matrix.test.ts --reporter junit > /workspace/results.xml. - A
scripts/run-matrix-docker.shwrapper builds the image and runs it; CI invokes this on every PR that touchessrc/npx-cli/,src/services/integrations/,scripts/build-hooks.js, ortests/install-*.
Verification checklist
bun test tests/install-error-matrix.test.tsproduces 48 test cases (12 × 4).- Every case asserts at least: exit code, summary headline (
installed successfullyvsInstallation Aborted), specific remediation substring, structured stderr. - Docker matrix run completes in < 5 minutes.
- CI fails the PR if any of the 48 cells regresses.
Anti-pattern guards
- Do not test against the real
~/.claude— every case must use isolated TMPDIR HOME. - Do not mock at the
installerErrorlevel. Mock the underlyingspawnSync/existsSyncso the full pipeline is exercised. - Do not skip the IDEs marked
coming soonin the matrix — the install command can still be invoked with them. The matrix should assert that they exit cleanly with a "support coming soon" message and exit 0 (they are not failures). - Do not rely on
process.env.HOMEmutations inside the test process — spawn a subprocess with the env override.
Phase 7 — Postinstall regression guards (12.6.2 lesson)
Goal: Prevent another tree-sitter-swift-style hang. CI must fail when a new transitive dep with scripts.postinstall or scripts.install lands outside the explicit allowlist.
Files to create / edit:
scripts/check-postinstall-allowlist.js(NEW, pre-publish CI)package.jsonprepublishOnlyscript (extend)src/npx-cli/install/setup-runtime.tsinstallPluginDependencies(timeout wrapper)
CI guard
scripts/check-postinstall-allowlist.js:
#!/usr/bin/env node
// Enforces: no transitive dep with scripts.postinstall|scripts.install may
// land in plugin/ or root node_modules unless allowlisted.
//
// Why: see CHANGELOG.md:93–110 (12.6.1 → 12.6.2 incident). npm does NOT honor
// trustedDependencies (Bun-only). Any new package with a network postinstall
// will hang `npx claude-mem install`.
const ALLOWLIST = new Set([
'tree-sitter-cli', // builds bindings; trusted because we explicitly need it
'esbuild', // platform-specific binary download is the package itself
]);
// Walk node_modules, parse each package.json, fail if scripts.postinstall or
// scripts.install is present and the package name is not in ALLOWLIST.
// Run against both root and plugin/ trees.
Wire into prepublishOnly: "prepublishOnly": "npm run build && node scripts/check-postinstall-allowlist.js".
Runtime --ignore-scripts default
installPluginDependencies (setup-runtime.ts:228–233): pass --ignore-scripts to bun install. Add comment:
// Per CHANGELOG.md:93–110 (v12.6.1 → v12.6.2): tree-sitter-swift's
// nested tree-sitter-cli postinstall downloads a Rust binary and can
// hang the install. We allowlist the small set of packages that legitimately
// need postinstall (tree-sitter-cli, esbuild) via package.json
// trustedDependencies. Bun honors trustedDependencies; npm does not, which is
// why we additionally pass --ignore-scripts and why root devDependencies stay
// out of npx fetch (v12.6.2 fix).
execSync(`${bunCmd} install --ignore-scripts`, { ... });
runNpmInstallInMarketplace already has --ignore-scripts from Phase 4.
Timeout wrapper
Every execSync/spawnSync install command must have an explicit timeout:
const TIMEOUT_FIRST_RUN_MS = 5 * 60 * 1000; // 5 min
const TIMEOUT_SUBSEQUENT_MS = 2 * 60 * 1000; // 2 min
const installTimeout = process.env.CLAUDE_MEM_INSTALL_TIMEOUT_MS
? Number(process.env.CLAUDE_MEM_INSTALL_TIMEOUT_MS)
: (isFirstRun ? TIMEOUT_FIRST_RUN_MS : TIMEOUT_SUBSEQUENT_MS);
spawnSync returns signal === 'SIGTERM' on timeout. Convert to ABORT with child-process-timeout category.
Apply to all install spawns
Audit-driven list of spawns to wrap:
installBun(line 122–127) — curl pipe-bash, 5 min timeout, allow override.installUv(line 152–155) — curl pipe-bash, 5 min timeout.installPluginDependenciesbun install — 5 min first run, 2 min subsequent.runNpmStrictandrunNpmStrict --legacy-peer-deps— 5 min first run, 2 min subsequent.installClaudeCode(line 426) — already has its own spinner, but no timeout. Add 5 min.
Verification checklist
node scripts/check-postinstall-allowlist.jsagainst the current tree exits 0 (no offenders today).- Adding
tree-sitter-haskell-evil(hypothetical fixture) with a fake postinstall breaks CI. grep -n "ignore-scripts" src/npx-cli/install/setup-runtime.ts src/npx-cli/commands/install.tsshows the flag in bothbun installandnpm installpaths.- Test:
spawnSyncwithtimeout: 100mson a slow command returnssignal: 'SIGTERM'and triggers ABORT.
Anti-pattern guards
- Do not auto-add packages to the allowlist when CI fails. Failing CI is the point — a human reviews each new postinstall.
- Do not add
tree-sitter-clito the allowlist twice (it already lives intrustedDependenciesin package.json:190 andscripts/build-hooks.js:106). The new allowlist is just a CI-time guard, not a duplicate of trustedDependencies. - Do not remove
--ignore-scriptsfrombun installeven though Bun honorstrustedDependencies— the belt-and-suspenders is intentional. - Do not make the timeout configurable per-IDE — one global
CLAUDE_MEM_INSTALL_TIMEOUT_MSenv var is sufficient.
Phase 8 — Documentation and cross-references
Goal: Document the taxonomy and remediation map for end-users and contributors. Update CLAUDE.md to cross-reference.
Files to edit / create:
docs/public/troubleshooting.mdx(CREATE or EXTEND if it exists)CLAUDE.md"Exit Code Strategy" sectionplans/04-installer-transparency.md(this file — already)
What to write
docs/public/troubleshooting.mdx:
- Section "Installation errors": lists each
idfrom the taxonomy table, the error message format, and the remediation. Markdown table mirroring Phase 2's seed taxonomy. - Section "Reading the error": shows a sample stderr block and how to copy-paste the bottom block into a GitHub issue.
- Section "Debug": doc the
CLAUDE_MEM_INSTALL_TIMEOUT_MSenv var and~/.claude-mem/last-install-error.json.
CLAUDE.md "Exit Code Strategy" — append:
**Installer exit codes** (note: installer is NOT a hook; it follows standard CLI exit semantics):
- **Exit 0**: install succeeded; "Installation Complete" headline; summary may include `WARN_CONTINUE` warnings.
- **Exit 1**: ABORT or partial-IDE failures. Headline is "Installation Aborted: \<category\>" or "Installation Partial". Structured cause written to `~/.claude-mem/last-install-error.json` (or `$CLAUDE_MEM_DATA_DIR/last-install-error.json`). See `src/npx-cli/install/error-taxonomy.ts` for the full category list.
docs.json (Mintlify nav): add a link to the new troubleshooting page.
Verification checklist
troubleshooting.mdxcovers all 14 categories from Phase 2.- CLAUDE.md cross-reference points to the right file.
docs.jsonupdated.- CHANGELOG.md is NOT edited (auto-generated per CLAUDE.md's "No need to edit the changelog ever, it's generated automatically.").
Anti-pattern guards
- Do not edit CHANGELOG.md.
- Do not add a "report this error to support" link to a non-existent endpoint. Use the GitHub issues URL from
package.json:25–27. - Do not localize the remediation strings yet — English-only for this phase.
Phase 9 — Final verification
Whole-system checks
npm run typecheckpasses (root + viewer).npm run testpasses (all suites including the new matrix).bun test tests/install-error-matrix.test.tsproduces 48 test cases, all green.- Docker matrix runner (
scripts/run-matrix-docker.sh) green on clean Linux. npm run build-and-synccompletes without errors and the worker restarts cleanly.- Manual test:
bun src/npx-cli/index.ts install --no-auto-starton a fresh test home (HOME=/tmp/test-home) — should succeed and produce a clean summary. - Manual test: same command after
mv ~/.bun /tmp/.bun-stash(simulate missing bun) — should ABORT with platform-specific instructions. grep -nE 'console\.warn\(' src/npx-cli/ src/services/integrations/— should only show non-installer-error usage (e.g.bug-reportscript), no swallowed-error patterns.grep -nE '\|\| true' openclaw/install.sh— sites that should remain (best-effort probes) are documented; sites that should fail loud are converted to\|\| { error "..."; exit 1; }.
Anti-pattern guards (sweep)
- No new
try {} catch {}empty handlers introduced. - No new
console.warnin installer paths that bypassinstallerError. - No use of
--forceanywhere in install scripts. - No removal of
--ignore-scriptsfrombun installornpm installcalls. - No edits to CHANGELOG.md.
Rollback plan
If post-merge a real-world install regression appears:
- Revert PR. Each phase is on a separate commit so partial revert is feasible.
- The pre-existing
--legacy-peer-depsunconditional behavior is preserved in git history at the line numbers cited in this plan. - The
~/.claude-mem/last-install-error.jsonfile written byinstallerErrorprovides a reproducible diagnostic for any user who hits an ABORT — capture this in the rollback issue.
Phase boundaries / ordering
Phases must execute in numerical order:
- Phase 0 → Phase 1: discovery before audit.
- Phase 2 (taxonomy) blocks Phase 3 (reporter uses the enum).
- Phase 3 (reporter) blocks Phase 4 / 5 (both call
installerError). - Phase 4 + 5 land independently after Phase 3.
- Phase 6 (matrix tests) needs 3, 4, 5 complete to assert correct behavior.
- Phase 7 (postinstall guards) can land any time after Phase 3 — independent.
- Phase 8 (docs) is last (documents what shipped).
Each phase is a separate commit (and each is a runnable mini-task in a fresh chat context).