Cynical deletion: close 27 issues by removing defenders + tolerators (#2141)

* fix: mirror migration 28 in SessionStore so pending_messages.tool_use_id and worker_pid columns are created (#2139)

SessionStore's inline migration list jumped from v27 to v29, skipping rebuildPendingMessagesForSelfHealingClaim. The worker uses SessionStore directly via worker/DatabaseManager.ts and bypasses the canonical MigrationRunner, so fresh installs ended up at "max v29" with neither column present — every queue claim and observation insert failed. Adds addPendingMessagesToolUseIdAndWorkerPidColumns following the existing mirror precedent (addObservationSubagentColumns / addObservationsUniqueContentHashIndex). Uses ALTER TABLE + column-existence guards so already-broken DBs at v29 self-heal on next worker boot.

Verified on fresh DB and on a synthetic v29-without-v28 broken DB: both columns and indexes (idx_pending_messages_worker_pid, ux_pending_session_tool) appear after one boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: wrap v28 mirror dedup+index creation in transaction

Addresses Greptile P2 review on PR #2140: matches the existing pattern in addObservationsUniqueContentHashIndex (v29 mirror at SessionStore.ts:1127) and runner.ts rebuildPendingMessagesForSelfHealingClaim. A crash between the dedup DELETE and the schema_versions INSERT no longer leaves the DB in a half-applied state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plan): cynical-deletion plan for 29 open issues

9-phase plan applying delete-first lens to triaged issue corpus. Headlines: kill defenders (orphan cleanup, EncodedCommand spawn, restart-port-steal) and tolerators (silent JSON drops, drifted SSE filters). Each phase closes a named subset of issues.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: delete process-management theater (Phase 1: DEL-1 + DEL-2)

Delete aggressiveStartupCleanup, the PowerShell -EncodedCommand spawn branch, and the restart-with-port-steal sequence.
Replace daemon spawning with a single uniform child_process.spawn path using arg-array form, keeping setsid on Unix when available. The defenders (orphan cleanup, duplicate-worker probes, port stealing) bred more bugs than they fixed. PID file with start-time token already provides correct OS-trust ownership; restart now requests httpShutdown, waits 5s for the port to free, then exits 1 if it didn't (user resolves). Net -247 lines.

Closes #2090, #2095 (already fixed at session-init.ts:78), #2107, #2111, #2114, #2117, #2123, #2097, #2135.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: observer-sessions trust boundary via CLAUDE_MEM_INTERNAL env (Phase 2: DEL-9)

Replace the cwd === OBSERVER_SESSIONS_DIR discriminator (which every consumer must repeat and inevitably drifts) with a single env-var trust boundary set once at spawn time in buildIsolatedEnv.

- buildIsolatedEnv now sets CLAUDE_MEM_INTERNAL=1, covering all three spawn sites (SDKAgent, KnowledgeAgent.prime, KnowledgeAgent.executeQuery)
- shouldTrackProject checks the env var first (cwd check stays as belt-and-braces fallback)
- New shared shouldEmitProjectRow predicate — SSE broadcaster and pagination filter share the same predicate so they can never drift apart (#2118)
- ObservationBroadcaster filters observer rows from SSE stream
- PaginationHelper hardcoded 'observer-sessions' replaced with OBSERVER_SESSIONS_PROJECT const
- project-filter basename match pass — *observer-sessions* now matches basename, not just full path (globToRegex's [^/]* can't cross /) (#2126 item 1)
- New `claude-mem cleanup [--dry-run]` subcommand wires CleanupV12_4_3 through to the worker for #2126 item 5

Closes #2118, #2126.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: strip proxy env vars before spawning worker (Phase 4: CON-1)

User's HTTP_PROXY/HTTPS_PROXY config was bleeding into internal AI calls when claude-mem spawns the claude subprocess, causing connection failures.
Strip unconditionally — no passthrough knob, which rejects #2099's whitelist proposal.

Closes #2115, #2099.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: fail-fast on silent drops in stdin/file-context/memory-save (Phase 5: FF-1)

Three independent fail-fast fixes:

#2089 — stdin-reader silent drop. Non-empty stdin that fails JSON.parse now rejects with a clear error instead of resolving undefined. Empty stdin still resolves undefined.

#2094 — PreToolUse:Read truncation Edit deadlock. file-context handler no longer returns a fake truncated Read result via updatedInput. Removes userOffset/userLimit/truncated machinery; injects the timeline via additionalContext only and lets the real Read pass through. Read state and Claude's expectation now stay consistent, eliminating the infinite Edit retry loop.

#2116 — /api/memory/save metadata drop + project bug. Schema accepts metadata as a documented JSON column (migration 30 adds observations.metadata TEXT, mirrored in SessionStore). Schema also tightened to .strict() so unknown top-level fields fail fast instead of being silently dropped. Project resolution now consults metadata.project as a fallback before defaultProject.

Closes #2089, #2094, #2116.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: small deletions — Zod externalize / Gemini fallback / session timeout / installCLI alias (Phase 6)

DEL-4 (#2113): Externalize zod from mcp-server.cjs and context-generator.cjs hook bundles so OpenCode's runtime resolves a single Zod copy. Worker keeps Zod bundled (it's a daemon subprocess, not in OpenCode's hook bundle). Added zod to plugin/package.json so externalized requires resolve at runtime.

DEL-5 (#2087): Delete the never-wired GeminiAgent → Claude fallback. fallbackAgent was always null in production. On 429 the agent now throws cleanly (message stays pending for retry).
Removed setFallbackAgent, FallbackAgent interface, and the 429 fallback branch from both GeminiAgent and OpenRouterAgent. Updated docs that claimed automatic Claude fallback.

DEL-6 (#2127, #2098): Raise MAX_SESSION_WALL_CLOCK_MS from 4h to 24h. The timeout is a real guard against runaway-cost loops (per issue #1590), but 4h kills legitimate long Claude Code days. 24h preserves the guard while never hitting in normal use. No knob — a session approaching this age is a bug worth investigating, not a value worth tuning.

DEL-8 (#2054): Delete installCLI() alias function. Saves 4 keystrokes at the cost of cross-platform shell-config mutation surface — not worth it. Canonical entry is npx claude-mem (and bunx). Uninstall now strips legacy alias/function lines from ~/.bashrc, ~/.zshrc, and the PowerShell profile.

Closes #2087, #2098, #2113, #2127, #2054.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: de-hardcode worker port + multi-account commit (Phase 3: CON-2 + DEL-7)

Replace hardcoded 37777 fallbacks with SettingsDefaultsManager.get('CLAUDE_MEM_WORKER_PORT') in npx-cli (runtime/install/uninstall), opencode-plugin, OpenClaw installer, SearchRoutes example URLs.

Timeline-report SKILL.md now resolves WORKER_PORT from settings.json at the top and uses ${WORKER_PORT} in all curl invocations. Remaining 37777 literals are doc comments + viewer build-time form-field placeholder (which is replaced by /api/settings on mount).

hooks.json: add cygpath POSIX→Windows path translation between _R resolution and node invocation. No-op on macOS/Linux. Closes the Windows + Git Bash MODULE_NOT_FOUND in #2109.

CLAUDE.md gains a Multi-account section documenting CLAUDE_MEM_DATA_DIR + optional CLAUDE_MEM_WORKER_PORT — every existing path/port code path now honors them.

Closes #2103, #2109, #2101.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: install/uninstall improvements (Phase 7: #2106)

5 fixes for the install/uninstall flow:

Item 1 — multiselect default. install.ts no longer pre-selects every detected IDE; user explicitly opts in.

Item 3 — shutdown-before-overwrite. New src/services/install/shutdown-helper.ts shared by install and uninstall: POSTs /api/admin/shutdown then polls /api/health until the worker stops responding. install calls it before copyPluginToMarketplace so reinstall over a running worker doesn't conflict; uninstall calls it before deletion.

Item 4 — uninstall path coverage. Removes ~/.npm/_npx/*/node_modules/claude-mem, ~/.cache/claude-cli-nodejs/*/mcp-logs-plugin-claude-mem-*, ~/.claude/plugins/data/claude-mem-thedotmack/. Best-effort: per-path try/catch so a single permission failure doesn't abort uninstall. chroma-mcp shutdown is implicit via the worker's GracefulShutdown cascade in item 3's helper.

Item 5 — install summary documents "Close all Claude Code sessions before uninstalling, or ~/.claude-mem will be recreated by active hooks."

Item 6 — real-port query. After install, fetches /api/health on the configured port with 3s timeout. Reports actually-bound port if the response carries it; falls back to requested port. No retry loop.

Closes #2106 (items 1, 3, 4, 5, 6). Items 2, 7 closed separately as already-fixed and insufficient-detail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pin chroma-mcp to 0.2.6 (Phase 8: DEL-3 lite)

Replace unpinned 'chroma-mcp' arg with chroma-mcp==0.2.6 in both local and remote modes. Pinning makes installs deterministic across machines and across time, eliminating the dependency-drift class of bugs.

Verified 0.2.6 in a clean uv cache: starts cleanly, no httpcore/httpx ImportError, no --with flags needed.
The --with flags removed in a0dd516c are not required at this pin (transitive deps resolve correctly when the top-level version is fixed). #2102's three protections (transport cleanup on failure, stale onclose handler guard, 10s reconnect backoff) confirmed intact.

Closes #2046, #2085, #2102.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: update stale assertions for per-UID port + migration 30 (Phase 9)

SettingsDefaultsManager.CLAUDE_MEM_WORKER_PORT default is per-UID (37700 + uid%100), not literal '37777'. Three assertions in settings-defaults-manager.test.ts now compute the expected value the same way the source does.

migration-runner.test.ts: drop expect(versions).toContain(19) (version 19 was a noop never recorded — pre-existing bug at parent), add expect(versions).toContain(30) for the new observations.metadata column added in Phase 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile P1/P2 review comments on PR #2141

P1: spawnDaemon return value was unchecked in worker-service.ts restart case, so a failed spawn silently exited 0 with a misleading "Worker restart spawned" log. Now error and exit 1 when restartPid is undefined.

P2: shutdown-helper.ts health-poll catch treated AbortError (timeout) the same as connection-refused, so a slow worker could be reported confirmedStopped while still holding file locks. Now distinguish: AbortError continues polling; other errors return confirmedStopped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* build: rebuild plugin artifacts after merging main

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review comments on PR #2141

- hooks.json: quote $HOME in cache lookup so paths with spaces work
- timeline-report SKILL.md: fall back when process.getuid is unavailable (Windows)
- opencode-plugin: validate CLAUDE_MEM_WORKER_PORT before using
- uninstall.ts: only strip alias lines, not function declarations (multi-line bodies left intact)
- MemoryRoutes: trim whitespace-only project before precedence resolution
- SessionStore migration 21: preserve metadata column if observations already has it
- stdin-reader test: restore full property descriptor to avoid cross-test pollution

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
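The first bullet's self-healing relies on column-existence guards: the mirror adds a column only when it is missing, so a DB that already claims v29 without the v28 columns gets repaired on the next boot. A minimal sketch of that guard as a pure planning step follows. It assumes the current column names come from something like `PRAGMA table_info(pending_messages)`. `ColumnSpec` and `planColumnAdditions` are hypothetical names that are not part of SessionStore, and the column types are assumptions, since the commit names only the columns.

```typescript
// Hypothetical description of a column the mirror migration requires.
interface ColumnSpec {
  name: string; // column name as it would appear in PRAGMA table_info
  ddl: string; // full column definition passed to ADD COLUMN
}

/**
 * Return only the ALTER TABLE statements that are still needed, so running
 * the migration against a fresh, half-migrated, or already-migrated table is
 * safe. `existingColumns` would typically be the `name` field of each row
 * returned by `PRAGMA table_info(<table>)`.
 */
function planColumnAdditions(
  table: string,
  existingColumns: readonly string[],
  required: readonly ColumnSpec[],
): string[] {
  // SQLite column names are case-insensitive, so compare in lower case.
  const present = new Set(existingColumns.map((c) => c.toLowerCase()));
  return required
    .filter((col) => !present.has(col.name.toLowerCase()))
    .map((col) => `ALTER TABLE ${table} ADD COLUMN ${col.ddl}`);
}

// Column types here are assumptions; the commit message names only the columns.
const pendingMessagesColumns: ColumnSpec[] = [
  { name: 'tool_use_id', ddl: 'tool_use_id TEXT' },
  { name: 'worker_pid', ddl: 'worker_pid INTEGER' },
];
```

The caller is responsible for executing the planned statements. Per the second bullet, the real mirror wraps its dedup DELETE, index creation, and schema_versions INSERT in a transaction so a crash cannot leave a half-applied state.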
2026-04-25 21:23:24 -07:00
parent 7f255cbc51
commit d13662d5d8
52 changed files with 2312 additions and 1222 deletions
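Phase 2 (buildIsolatedEnv sets `CLAUDE_MEM_INTERNAL=1`) and Phase 4 (proxy variables stripped unconditionally) both reduce to building the child environment once, at spawn time, rather than having every consumer infer context from the cwd. The following is a minimal sketch of that idea, not claude-mem's actual `buildIsolatedEnv`. `buildChildEnv` and `isInternalSession` are hypothetical names. The commit names only `HTTP_PROXY`/`HTTPS_PROXY`, so including the lowercase spellings is an assumption.

```typescript
// Proxy variables from the Phase 4 bullet, plus lowercase spellings that many
// HTTP clients also honor (the lowercase entries are an assumption).
const PROXY_VARS = ['HTTP_PROXY', 'HTTPS_PROXY', 'http_proxy', 'https_proxy'] as const;

type Env = Record<string, string | undefined>;

/** Copy the parent env, strip proxies unconditionally, and mark the child as internal. */
function buildChildEnv(parent: Env): Env {
  const env: Env = { ...parent }; // never mutate the parent's environment
  for (const name of PROXY_VARS) {
    delete env[name];
  }
  // Single trust boundary: consumers check this flag instead of comparing cwd.
  env.CLAUDE_MEM_INTERNAL = '1';
  return env;
}

/** Consumer-side check: internal (observer) sessions are never tracked as projects. */
function isInternalSession(env: Env): boolean {
  return env.CLAUDE_MEM_INTERNAL === '1';
}
```

With Node's `child_process.spawn`, the result would be passed as the `env` option, for example `spawn(command, args, { env: buildChildEnv(process.env) })`.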
@@ -106,8 +106,7 @@ function deduplicateObservations(

 function formatFileTimeline(
  observations: ObservationRow[],
-  filePath: string,
-  truncated: boolean
+  filePath: string
 ): string {
  // Escape filePath for safe interpolation into recovery hints (quotes, backslashes, newlines)
  const safePath = filePath.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
@@ -138,17 +137,14 @@ function formatFileTimeline(
  }).toLowerCase().replace(' ', '');
  const currentTimezone = now.toLocaleTimeString('en-US', { timeZoneName: 'short' }).split(' ').pop();

-  const headerLine = truncated
-    ? `This file has prior observations. Only line 1 was read to save tokens.`
-    : `This file has prior observations. The requested section was read normally.`;
-
+  // The hook never modifies the Read call (#2094) — Claude always sees the
+  // full requested section. The timeline below is supplementary priming, not
+  // a replacement for the file contents.
  const lines: string[] = [
    `Current: ${currentDate} ${currentTime} ${currentTimezone}`,
-    headerLine,
-    `- **Already know enough?** The timeline below may be all you need (semantic priming).`,
-    `- **Need details?** get_observations([IDs]) — ~300 tokens each.`,
-    `- **Need full file?** Read again with offset/limit for the section you need.`,
-    `- **Need to edit?** Edit works — the file is registered as read. Use smart_outline("${safePath}") for line numbers.`,
+    `This file has prior observations — supplementary context follows. The Read result below is the full requested section.`,
+    `- **Need details on a past observation?** get_observations([IDs]) — ~300 tokens each.`,
+    `- **Need a structural map first?** smart_outline("${safePath}") — line numbers only, cheaper than re-reading.`,
  ];

  for (const [day, dayObservations] of sortedDays) {
@@ -176,15 +172,8 @@ export const fileContextHandler: EventHandler = {
      return { continue: true, suppressOutput: true };
    }

-    // Preserve user-supplied offset/limit to avoid read-dedup collisions (fixes #1719)
-    const userOffset = typeof toolInput?.offset === 'number' && Number.isFinite(toolInput.offset) && toolInput.offset >= 0
-      ? Math.floor(toolInput.offset) : undefined;
-    const userLimit = typeof toolInput?.limit === 'number' && Number.isFinite(toolInput.limit) && toolInput.limit > 0
-      ? Math.floor(toolInput.limit) : undefined;
-    const isTargetedRead = userOffset !== undefined || userLimit !== undefined;
-
    // Stat the file once: size (gate) + mtime (cache invalidation).
-    // 0 = stat failed non-fatally (e.g. EPERM) — skip mtime check, fall through to truncation.
+    // 0 = stat failed non-fatally (e.g. EPERM) — skip mtime check, fall through to context injection.
    let fileMtimeMs = 0;
    try {
      const statPath = path.isAbsolute(filePath)
@@ -241,12 +230,12 @@ export const fileContextHandler: EventHandler = {
      return { continue: true, suppressOutput: true };
    }

-    // mtime invalidation: bypass truncation when the file is newer than the latest observation.
-    // Uses >= to handle same-millisecond edits (cost: one extra full read vs risk of stuck truncation).
+    // mtime invalidation: skip the timeline injection when the file is newer than the latest
+    // observation — past observations are stale and adding them risks misleading the model.
    if (fileMtimeMs > 0) {
      const newestObservationMs = Math.max(...data.observations.map(o => o.created_at_epoch));
      if (fileMtimeMs >= newestObservationMs) {
-        logger.debug('HOOK', 'File modified since last observation, skipping truncation', {
+        logger.debug('HOOK', 'File modified since last observation, skipping context injection', {
          filePath: relativePath,
          fileMtimeMs,
          newestObservationMs,
@@ -261,23 +250,18 @@ export const fileContextHandler: EventHandler = {
      return { continue: true, suppressOutput: true };
    }

-    // Unconstrained → truncate to 1 line; targeted → preserve offset/limit.
-    const truncated = !isTargetedRead;
-    const timeline = formatFileTimeline(dedupedObservations, filePath, truncated);
-    const updatedInput: Record<string, unknown> = { file_path: filePath };
-    if (isTargetedRead) {
-      if (userOffset !== undefined) updatedInput.offset = userOffset;
-      if (userLimit !== undefined) updatedInput.limit = userLimit;
-    } else {
-      updatedInput.limit = 1;
-    }
+    // #2094: never modify the Read call. Returning `updatedInput` with `limit: 1` previously
+    // truncated unconstrained reads, leaving Claude with a stale 1-line snapshot in context
+    // while the timeline told it not to re-read. Subsequent Edit calls then deadlocked because
+    // Claude Code's read-state tracker reported the file as "read" but the actual content was
+    // missing. The hook now only injects supplementary context — the Read proceeds unmodified.
+    const timeline = formatFileTimeline(dedupedObservations, filePath);

    return {
      hookSpecificOutput: {
        hookEventName: 'PreToolUse',
        additionalContext: timeline,
        permissionDecision: 'allow',
-        updatedInput,
      },
    };
  },
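With `updatedInput` gone, the tail of `fileContextHandler` reduces to a single decision. If the file is at least as new as the newest observation, the handler skips injection. Otherwise it attaches the timeline and lets the Read run unmodified. The sketch below condenses that decision into a pure function. `decideReadHookOutput`, `ObservationLike`, and `HookResult` are hypothetical names, and the empty-list check stands in for the handler's earlier guard clauses.

```typescript
interface ObservationLike {
  created_at_epoch: number; // milliseconds, compared directly with fs mtimeMs
}

type HookResult =
  | { continue: true; suppressOutput: true }
  | {
      hookSpecificOutput: {
        hookEventName: 'PreToolUse';
        additionalContext: string;
        permissionDecision: 'allow';
        // Deliberately no updatedInput: the Read call is never rewritten (#2094).
      };
    };

function decideReadHookOutput(
  fileMtimeMs: number, // 0 means stat failed non-fatally; skip the mtime check
  observations: readonly ObservationLike[],
  timeline: string,
): HookResult {
  if (observations.length === 0) {
    return { continue: true, suppressOutput: true };
  }
  if (fileMtimeMs > 0) {
    const newest = Math.max(...observations.map((o) => o.created_at_epoch));
    // >= treats a same-millisecond edit as "modified": no context beats stale context.
    if (fileMtimeMs >= newest) {
      return { continue: true, suppressOutput: true };
    }
  }
  return {
    hookSpecificOutput: {
      hookEventName: 'PreToolUse',
      additionalContext: timeline,
      permissionDecision: 'allow',
    },
  };
}
```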
@@ -6,6 +6,16 @@
 // Solution: JSON is self-delimiting. We detect complete JSON by attempting
 // to parse after each chunk. Once we have valid JSON, we resolve immediately
 // without waiting for EOF. This is the proper fix, not a timeout workaround.
+//
+// Resolve/reject contract:
+//   - Resolves with parsed JSON value when stdin yields valid JSON.
+//   - Resolves with `undefined` when stdin is unavailable, closes empty,
+//     or emits a stream error.
+//   - Rejects with an Error when stdin closes (or the safety timeout fires)
+//     after non-empty bytes that never form valid JSON. Malformed input is
+//     a handler/client bug — surfacing it lets the upstream exit-code
+//     strategy treat it as a blocking error (exit 2) rather than silently
+//     proceeding as if no input was given. (#2089)

 import { logger } from '../utils/logger.js';

@@ -157,8 +167,14 @@ export async function readJsonFromStdin(): Promise<unknown> {
      // stdin closed - parse whatever we have
      if (!resolved) {
        if (!tryResolveWithJson()) {
-          // Empty or invalid - resolve with undefined
-          resolveWith(input.trim() ? undefined : undefined);
+          // Mirror the safety-timeout semantics (#2089):
+          // non-empty bytes that never parsed = malformed input, surface it.
+          // Empty stdin = "no input given", resolve undefined.
+          if (input.trim()) {
+            rejectWith(new Error(`Malformed JSON at stdin EOF: ${input.slice(0, 100)}...`));
+          } else {
+            resolveWith(undefined);
+          }
        }
      }
    };
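Stripped of the stream plumbing, the EOF branch is a three-way rule. Valid JSON resolves with the value, empty input resolves `undefined`, and anything else rejects. The following minimal sketch expresses that rule as a synchronous function. `settleAtEof` is a hypothetical name and is not part of the reader.

```typescript
/**
 * Decide how a readJsonFromStdin-style reader should settle once stdin closes.
 * Returns the parsed value, or undefined for "no input given"; throws for
 * non-empty input that never formed valid JSON (#2089).
 */
function settleAtEof(input: string): unknown {
  if (input.trim() === '') {
    return undefined; // empty stdin: caller proceeds as if no input was given
  }
  try {
    return JSON.parse(input);
  } catch {
    // Keep the error readable: show at most the first 100 characters.
    const preview = input.length > 100 ? `${input.slice(0, 100)}...` : input;
    throw new Error(`Malformed JSON at stdin EOF: ${preview}`);
  }
}
```

One deliberate difference from the diff: the sketch appends `...` to the preview only when it actually truncated the input.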
@@ -94,7 +94,24 @@ interface SessionDeletedEvent {
 // Constants
 // ============================================================================

-const WORKER_BASE_URL = "http://127.0.0.1:37777";
+/**
+ * Resolve the worker port matching SettingsDefaultsManager's algorithm:
+ *   process.env.CLAUDE_MEM_WORKER_PORT, else 37700 + (uid % 100).
+ * Required for multi-account isolation (#2101) and so this plugin talks to
+ * the same worker the rest of claude-mem (hooks, npx-cli) connects to.
+ * Inlined rather than imported to keep this OpenCode plugin standalone.
+ */
+function resolveWorkerPort(): string {
+  const fromEnv = process.env.CLAUDE_MEM_WORKER_PORT;
+  const parsed = fromEnv ? Number.parseInt(fromEnv.trim(), 10) : NaN;
+  if (Number.isInteger(parsed) && parsed >= 1 && parsed <= 65535) {
+    return String(parsed);
+  }
+  const uid = typeof process.getuid === "function" ? process.getuid() : 77;
+  return String(37700 + (uid % 100));
+}
+
+const WORKER_BASE_URL = `http://127.0.0.1:${resolveWorkerPort()}`;
 const MAX_TOOL_RESPONSE_LENGTH = 1000;

 // ============================================================================
@@ -55,6 +55,8 @@ import {
  writeJsonFileAtomic,
 } from '../utils/paths.js';
 import { readJsonSafe } from '../../utils/json-utils.js';
+import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
+import { shutdownWorkerAndWait } from '../../services/install/shutdown-helper.js';
 import { detectInstalledIDEs } from './ide-detection.js';

 // ---------------------------------------------------------------------------
@@ -272,9 +274,9 @@ async function promptForIDESelection(): Promise<string[]> {
  const result = await p.multiselect({
    message: 'Which IDEs do you use?',
    options,
-    initialValues: detected
-      .filter((ide) => ide.supported)
-      .map((ide) => ide.id),
+    // No pre-selection — users must explicitly opt in to each IDE so we
+    // never wire up an integration the user did not actually request (#2106).
+    initialValues: [],
    required: true,
  });

@@ -458,6 +460,19 @@ export async function runInstallCommand(options: InstallOptions = {}): Promise<v
  const needsManualInstall = selectedIDEs.some((id) => id !== 'claude-code');

  if (needsManualInstall) {
+    // Shut down any running worker FIRST so it isn't holding open file
+    // handles when we overwrite plugin files (#2106 item 3). Best-effort:
+    // helper swallows its own errors when no worker is running.
+    const installPort = SettingsDefaultsManager.get('CLAUDE_MEM_WORKER_PORT');
+    try {
+      const result = await shutdownWorkerAndWait(installPort, 10000);
+      if (result.workerWasRunning) {
+        log.info('Stopped running worker before overwrite.');
+      }
+    } catch (error: unknown) {
+      console.warn('[install] Pre-overwrite worker shutdown failed:', error instanceof Error ? error.message : String(error));
+    }
+
    await runTasks([
      {
        title: 'Copying plugin files',
@@ -542,12 +557,47 @@ export async function runInstallCommand(options: InstallOptions = {}): Promise<v
    summaryLines.forEach(l => console.log(`  ${l}`));
  }

-  const workerPort = process.env.CLAUDE_MEM_WORKER_PORT || '37777';
+  // Resolve port via SettingsDefaultsManager so CLAUDE_MEM_WORKER_PORT env
+  // takes priority and the per-UID default (37700 + uid % 100) is used
+  // otherwise. Required for multi-account isolation (#2101).
+  const workerPort = SettingsDefaultsManager.get('CLAUDE_MEM_WORKER_PORT');
+
+  // Probe the actually-bound port (#2106 item 6). smart-install just
+  // started the worker; if it's reachable we report the real port the
+  // worker bound to. If the probe fails, the worker is still spinning
+  // up — say so plainly and exit cleanly. Don't loop, don't block.
+  let actualPort: number | string = workerPort;
+  let workerReady = false;
+  try {
+    const healthResponse = await fetch(`http://127.0.0.1:${workerPort}/api/health`, {
+      signal: AbortSignal.timeout(3000),
+    });
+    if (healthResponse.ok) {
+      workerReady = true;
+      try {
+        const body = await healthResponse.json() as { port?: number | string };
+        if (body && (typeof body.port === 'number' || typeof body.port === 'string')) {
+          actualPort = body.port;
+        }
+      } catch {
+        // Health endpoint returned non-JSON — keep using the requested port.
+      }
+    }
+  } catch {
+    // Health probe failed — worker may still be starting.
+  }
+
+  const portLine = workerReady
+    ? `Worker port: ${pc.cyan(String(actualPort))}`
+    : `Worker port: ${pc.cyan(String(workerPort))} (worker not yet ready -- still starting up; check ${pc.bold('claude-mem status')} later)`;
+
  const nextSteps = [
    'Open Claude Code and start a conversation -- memory is automatic!',
-    `View your memories: ${pc.underline(`http://localhost:${workerPort}`)}`,
+    portLine,
+    `View your memories: ${pc.underline(`http://localhost:${actualPort}`)}`,
    `Search past work: use ${pc.bold('/mem-search')} in Claude Code`,
    `Start worker: ${pc.bold('npx claude-mem start')}`,
+    `Note: Close all Claude Code sessions before uninstalling, or ${pc.cyan('~/.claude-mem')} will be recreated by active hooks.`,
  ];

  if (isInteractive) {
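The post-install probe reports the worker's own port only when the health body carries a usable one. That body check can be pulled out as a pure function. `pickReportedPort` is a hypothetical name. Unlike the diff, which accepts any number or string, this sketch also range-checks the value the same way the OpenCode plugin's `resolveWorkerPort` does; treating that as the right notion of "usable" is an assumption.

```typescript
/**
 * Choose the port to show after install: the one the worker reports in its
 * /api/health body if it is a valid TCP port, else the requested port.
 */
function pickReportedPort(requested: string, body: unknown): string {
  if (typeof body === 'object' && body !== null && 'port' in body) {
    const raw = (body as { port: unknown }).port;
    const n =
      typeof raw === 'number' ? raw
      : typeof raw === 'string' ? Number.parseInt(raw, 10)
      : NaN;
    if (Number.isInteger(n) && n >= 1 && n <= 65535) {
      return String(n);
    }
  }
  return requested; // no usable port in the body: keep what we asked for
}
```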
@@ -12,6 +12,7 @@ import { join } from 'path';
 import pc from 'picocolors';
 import { resolveBunBinaryPath } from '../utils/bun-resolver.js';
 import { isPluginInstalled, marketplaceDirectory } from '../utils/paths.js';
+import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';

 // ---------------------------------------------------------------------------
 // Installation guard
@@ -139,6 +140,15 @@ export function runAdoptCommand(extraArgs: string[] = []): void {
  });
 }

+/**
+ * Run the one-time v12.4.3 pollution cleanup, or preview it via --dry-run.
+ * Delegates to the worker-service.cjs `cleanup` subcommand so the scan and
+ * (optional) deletion run in Bun (needed for bun:sqlite). (#2126 item 5)
+ */
+export function runCleanupCommand(extraArgs: string[] = []): void {
+  spawnBunWorkerCommand('cleanup', extraArgs);
+}
+
 /**
 * Search the worker API at `GET /api/search?query=<query>`.
 */
@@ -151,7 +161,10 @@ export async function runSearchCommand(queryParts: string[]): Promise<void> {
    process.exit(1);
  }

-  const workerPort = process.env.CLAUDE_MEM_WORKER_PORT || '37777';
+  // Resolve port via SettingsDefaultsManager so CLAUDE_MEM_WORKER_PORT env
+  // takes priority and the per-UID default (37700 + uid % 100) is used
+  // otherwise. Required for multi-account isolation (#2101).
+  const workerPort = SettingsDefaultsManager.get('CLAUDE_MEM_WORKER_PORT');
  const searchUrl = `http://127.0.0.1:${workerPort}/api/search?query=${encodeURIComponent(query)}`;

  let response: Response;
@@ -9,7 +9,8 @@
 */
 import * as p from '@clack/prompts';
 import pc from 'picocolors';
-import { existsSync, rmSync } from 'fs';
+import { existsSync, readFileSync, readdirSync, rmSync, writeFileSync } from 'fs';
+import { homedir } from 'os';
 import { join } from 'path';
 import {
  claudeSettingsPath,
@@ -21,6 +22,8 @@ import {
  writeJsonFileAtomic,
 } from '../utils/paths.js';
 import { readJsonSafe } from '../../utils/json-utils.js';
+import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
+import { shutdownWorkerAndWait } from '../../services/install/shutdown-helper.js';

 // ---------------------------------------------------------------------------
 // Cleanup helpers
@@ -60,6 +63,48 @@ function removeFromInstalledPlugins(): void {
  }
 }

+/**
+ * Strip the legacy `claude-mem` shell alias from common shell rc files
+ * (#2054). The alias used to be added by `installCLI()` in smart-install.js;
+ * that function was deleted, but existing users still have the line. This is
+ * a one-time best-effort cleanup: idempotent (no-op if the line is absent),
+ * and it matches only lines that BEGIN with `alias claude-mem=` to avoid
+ * mangling unrelated code. Multi-line `function claude-mem` declarations
+ * are deliberately left in place for the user to remove.
+ */
+function stripLegacyClaudeMemAlias(): void {
+  const home = homedir();
+  const candidateFiles = [
+    join(home, '.bashrc'),
+    join(home, '.zshrc'),
+    join(home, 'Documents', 'PowerShell', 'Microsoft.PowerShell_profile.ps1'),
+  ];
+
+  // Only strip simple aliases. A function declaration would span multiple
+  // lines and can't be safely removed by a line filter — leave it for the
+  // user to remove manually.
+  const aliasLineRegex = /^\s*alias\s+claude-mem\s*=/;
+
+  for (const filePath of candidateFiles) {
+    if (!existsSync(filePath)) continue;
+    let content: string;
+    try {
+      content = readFileSync(filePath, 'utf-8');
+    } catch (error: unknown) {
+      console.warn(`[uninstall] Could not read ${filePath}:`, error instanceof Error ? error.message : String(error));
+      continue;
+    }
+    const lines = content.split('\n');
+    const filtered = lines.filter((line) => !aliasLineRegex.test(line));
+    if (filtered.length === lines.length) continue; // no match — leave file untouched
+    try {
+      writeFileSync(filePath, filtered.join('\n'));
+      console.error(`Removed legacy claude-mem alias from ${filePath}`);
+    } catch (error: unknown) {
+      console.warn(`[uninstall] Could not rewrite ${filePath}:`, error instanceof Error ? error.message : String(error));
+    }
+  }
+}
+
 function removeFromClaudeSettings(): void {
  const settings = readJsonSafe<Record<string, any>>(claudeSettingsPath(), {});
  if (settings.enabledPlugins?.['claude-mem@thedotmack'] !== undefined) {
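`stripLegacyClaudeMemAlias` is a read-filter-write loop over three rc files. Isolating the filter as a pure function over file contents makes the "single-line aliases only" rule easy to test. `stripAliasLines` is a hypothetical name, and the regex is the one from the diff.

```typescript
// Same pattern as the diff: only lines that start with `alias claude-mem=`.
const ALIAS_LINE = /^\s*alias\s+claude-mem\s*=/;

/** Remove `alias claude-mem=...` lines and report whether anything changed. */
function stripAliasLines(content: string): { changed: boolean; content: string } {
  const lines = content.split('\n');
  const kept = lines.filter((line) => !ALIAS_LINE.test(line));
  // Callers can skip the write entirely when nothing matched.
  return { changed: kept.length !== lines.length, content: kept.join('\n') };
}
```

The pattern only recognizes POSIX-style `alias claude-mem=` lines. Whether that also covers the PowerShell profile depends on what the old `installCLI()` wrote there.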
@@ -68,6 +113,90 @@ function removeFromClaudeSettings(): void {
  }
 }

+/**
+ * Best-effort cleanup of stray claude-mem residue (#2106 item 4) that
+ * accumulates outside of `~/.claude/plugins/marketplaces/thedotmack/`:
+ *
+ *   - `~/.npm/_npx/<hash>/node_modules/claude-mem` (npx install caches)
+ *   - `~/.cache/claude-cli-nodejs/<project>/mcp-logs-plugin-claude-mem-*`
+ *   - `~/.claude/plugins/data/claude-mem-thedotmack/`
+ *
+ * Each step is wrapped in its own try/catch — a failure on one path
+ * (e.g. permissions denied on a single npx hash dir) must not abort
+ * the rest. We log the failure and continue.
+ *
+ * Returns the count of paths actually removed (purely for reporting).
+ */
+function removeStrayClaudeMemPaths(): number {
+  const home = homedir();
+  let removedCount = 0;
+
+  // 1. ~/.npm/_npx/*/node_modules/claude-mem
+  const npxRoot = join(home, '.npm', '_npx');
+  if (existsSync(npxRoot)) {
+    let hashDirs: string[] = [];
+    try {
+      hashDirs = readdirSync(npxRoot);
+    } catch (error: unknown) {
+      console.warn(`[uninstall] Could not read ${npxRoot}:`, error instanceof Error ? error.message : String(error));
+    }
+    for (const hashDir of hashDirs) {
+      const candidate = join(npxRoot, hashDir, 'node_modules', 'claude-mem');
+      if (!existsSync(candidate)) continue;
+      try {
+        rmSync(candidate, { recursive: true, force: true });
+        removedCount++;
+      } catch (error: unknown) {
+        console.warn(`[uninstall] Could not remove ${candidate}:`, error instanceof Error ? error.message : String(error));
+      }
+    }
+  }
+
+  // 2. ~/.cache/claude-cli-nodejs/*/mcp-logs-plugin-claude-mem-*
+  const cacheRoot = join(home, '.cache', 'claude-cli-nodejs');
+  if (existsSync(cacheRoot)) {
+    let projectDirs: string[] = [];
+    try {
+      projectDirs = readdirSync(cacheRoot);
+    } catch (error: unknown) {
+      console.warn(`[uninstall] Could not read ${cacheRoot}:`, error instanceof Error ? error.message : String(error));
+    }
+    for (const projectDir of projectDirs) {
+      const projectPath = join(cacheRoot, projectDir);
+      let logEntries: string[] = [];
+      try {
+        logEntries = readdirSync(projectPath);
+      } catch (error: unknown) {
+        console.warn(`[uninstall] Could not read ${projectPath}:`, error instanceof Error ? error.message : String(error));
+        continue;
+      }
+      for (const entry of logEntries) {
+        if (!entry.startsWith('mcp-logs-plugin-claude-mem-')) continue;
+        const logPath = join(projectPath, entry);
+        try {
+          rmSync(logPath, { recursive: true, force: true });
+          removedCount++;
+        } catch (error: unknown) {
+          console.warn(`[uninstall] Could not remove ${logPath}:`, error instanceof Error ? error.message : String(error));
+        }
+      }
+    }
+  }
+
+  // 3. ~/.claude/plugins/data/claude-mem-thedotmack/
+  const pluginDataDir = join(home, '.claude', 'plugins', 'data', 'claude-mem-thedotmack');
+  if (existsSync(pluginDataDir)) {
+    try {
+      rmSync(pluginDataDir, { recursive: true, force: true });
+      removedCount++;
+    } catch (error: unknown) {
+      console.warn(`[uninstall] Could not remove ${pluginDataDir}:`, error instanceof Error ? error.message : String(error));
+    }
+  }
+
+  return removedCount;
+}
+
 // ---------------------------------------------------------------------------
 // Public API
 // ---------------------------------------------------------------------------
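The second sweep in removeStrayClaudeMemPaths only deletes entries whose names start with `mcp-logs-plugin-claude-mem-`. Below is a minimal sketch of that selection rule as a pure function, so it can be unit-tested without touching the filesystem. `planStrayLogRemovals` is a hypothetical name, not a helper in this PR.

```typescript
// Hypothetical pure helper: given a listing of <cacheRoot>/<projectDir>/<entry>,
// return the paths that the uninstaller's prefix rule would delete.
const STRAY_LOG_PREFIX = 'mcp-logs-plugin-claude-mem-';

function planStrayLogRemovals(
  cacheRoot: string,
  listing: Record<string, readonly string[]>,
): string[] {
  const targets: string[] = [];
  for (const projectDir of Object.keys(listing)) {
    const entries = listing[projectDir] ?? [];
    for (const entry of entries) {
      // Exact-prefix match only: logs from other MCP plugins stay put.
      if (!entry.startsWith(STRAY_LOG_PREFIX)) continue;
      // The real code uses path.join; '/' keeps this sketch's output deterministic.
      targets.push(`${cacheRoot}/${projectDir}/${entry}`);
    }
  }
  return targets;
}
```

Separating "what to delete" from "delete it" also keeps the per-entry `rmSync` try/catch in the real code focused purely on I/O failures.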
@@ -105,30 +234,23 @@ export async function runUninstallCommand(): Promise<void> {
    }
  }

-  // Stop the worker and wait for it to exit before deleting files
-  const workerPort = process.env.CLAUDE_MEM_WORKER_PORT || '37777';
+  // Stop the worker and wait for it to exit before deleting files.
+  // Resolve port via SettingsDefaultsManager so CLAUDE_MEM_WORKER_PORT env
+  // takes priority and the per-UID default (37700 + uid % 100) is used
+  // otherwise. Required for multi-account isolation (#2101).
+  //
+  // The worker's graceful shutdown also stops chroma-mcp via
+  // GracefulShutdown -> ChromaMcpManager.stop(), so this single call
+  // cascades to the chroma-mcp subprocess as well.
+  const workerPort = SettingsDefaultsManager.get('CLAUDE_MEM_WORKER_PORT');
  try {
-    await fetch(`http://127.0.0.1:${workerPort}/api/admin/shutdown`, {
-      method: 'POST',
-      signal: AbortSignal.timeout(5000),
-    });
-    // Poll health endpoint until worker is gone (max 10s)
-    for (let attempt = 0; attempt < 20; attempt++) {
-      await new Promise((resolve) => setTimeout(resolve, 500));
-      try {
-        await fetch(`http://127.0.0.1:${workerPort}/api/health`, {
-          signal: AbortSignal.timeout(1000),
-        });
-        // Still alive — keep waiting
-      } catch (error: unknown) {
-        // Connection refused = worker is gone (expected shutdown behavior)
-        console.error('[uninstall] Worker health check failed (worker stopped):', error instanceof Error ? error.message : String(error));
-        break;
-      }
+    const result = await shutdownWorkerAndWait(workerPort, 10000);
+    if (result.workerWasRunning) {
+      p.log.info('Worker service stopped.');
    }
-    p.log.info('Worker service stopped.');
-  } catch {
-    // Worker may not be running — that is fine
+  } catch (error: unknown) {
+    // shutdownWorkerAndWait swallows its own errors, but guard anyway.
+    console.warn('[uninstall] Worker shutdown attempt failed:', error instanceof Error ? error.message : String(error));
  }

  await p.tasks([
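The comment above describes the port resolution order: an explicit `CLAUDE_MEM_WORKER_PORT` wins, and otherwise a per-UID default of `37700 + uid % 100` is used. Here is a minimal sketch of that order. `resolveWorkerPortSketch` is hypothetical and only stands in for SettingsDefaultsManager. Two behaviors are assumptions: that an unparseable env value falls back to the default, and that a missing UID is treated as 0.

```typescript
const PORT_BASE = 37700;

// Hypothetical stand-in for SettingsDefaultsManager's port lookup.
function resolveWorkerPortSketch(
  env: Record<string, string | undefined>,
  uid: number | undefined,
): number {
  const fromEnv = env.CLAUDE_MEM_WORKER_PORT;
  if (fromEnv !== undefined && fromEnv.trim() !== '') {
    const parsed = Number.parseInt(fromEnv, 10);
    // Assumption: invalid values fall through to the per-UID default.
    if (Number.isInteger(parsed) && parsed > 0 && parsed < 65536) return parsed;
  }
  // uid % 100 keeps every derived port in 37700..37799, one slot per account.
  // Assumption: no UID available (e.g. Windows) maps to the base port.
  return PORT_BASE + ((uid ?? 0) % 100);
}
```

In Node the UID would come from `process.getuid?.()`, which is only defined on POSIX platforms.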
@@ -171,6 +293,22 @@ export async function runUninstallCommand(): Promise<void> {
        return `Claude settings updated ${pc.green('OK')}`;
      },
    },
+    {
+      title: 'Removing legacy claude-mem shell alias',
+      task: async () => {
+        stripLegacyClaudeMemAlias();
+        return `Legacy alias check complete ${pc.green('OK')}`;
+      },
+    },
+    {
+      title: 'Removing stray claude-mem caches and logs',
+      task: async () => {
+        const removed = removeStrayClaudeMemPaths();
+        return removed > 0
+          ? `Stray paths removed: ${removed} ${pc.green('OK')}`
+          : `No stray paths found ${pc.dim('skipped')}`;
+      },
+    },
  ]);

  // Remove IDE-specific hooks and config (best-effort, each is independent)
@@ -53,6 +53,7 @@ ${pc.bold('Runtime Commands')} (requires Bun, delegates to installed plugin):
  ${pc.cyan('npx claude-mem status')}               Show worker status
  ${pc.cyan('npx claude-mem search <query>')}       Search observations
  ${pc.cyan('npx claude-mem adopt [--dry-run] [--branch <name>]')}    Stamp merged worktrees into parent project
+  ${pc.cyan('npx claude-mem cleanup [--dry-run]')}    Run one-time v12.4.3 pollution cleanup (or preview counts)
  ${pc.cyan('npx claude-mem transcript watch')}     Start transcript watcher

 ${pc.bold('IDE Identifiers')}:
@@ -153,6 +154,13 @@ async function main(): Promise<void> {
      break;
    }

+    // -- One-time v12.4.3 cleanup ------------------------------------------
+    case 'cleanup': {
+      const { runCleanupCommand } = await import('./commands/runtime.js');
+      runCleanupCommand(args.slice(1));
+      break;
+    }
+
    // -- Transcript --------------------------------------------------------
    case 'transcript': {
      const subCommand = args[1]?.toLowerCase();
@@ -50,29 +50,52 @@ interface MarkerPayload {
 * the marker file ensures the work runs at most once per data directory.
 *
 * @param dataDirectory - Override for DATA_DIR (used in tests)
+ * @param options.dryRun - When true, scans + reports counts but performs NO
+ *        DB writes, NO backup, NO chroma wipe, and does NOT write the marker.
+ *        It also ignores an existing marker and CLAUDE_MEM_SKIP_CLEANUP_V12_4_3,
+ *        so `claude-mem cleanup --dry-run` can preview counts at any time. (#2126 item 5)
 */
-export function runOneTimeV12_4_3Cleanup(dataDirectory?: string): void {
+export function runOneTimeV12_4_3Cleanup(
+  dataDirectory?: string,
+  options: { dryRun?: boolean } = {},
+): CleanupCounts | undefined {
+  const dryRun = options.dryRun === true;
  const effectiveDataDir = dataDirectory ?? DATA_DIR;
  const markerPath = path.join(effectiveDataDir, MARKER_FILENAME);

-  if (existsSync(markerPath)) {
+  if (existsSync(markerPath) && !dryRun) {
    logger.debug('SYSTEM', 'v12.4.3 cleanup marker exists, skipping');
    return;
  }

-  if (process.env.CLAUDE_MEM_SKIP_CLEANUP_V12_4_3 === '1') {
+  if (process.env.CLAUDE_MEM_SKIP_CLEANUP_V12_4_3 === '1' && !dryRun) {
    logger.warn('SYSTEM', 'v12.4.3 cleanup skipped via CLAUDE_MEM_SKIP_CLEANUP_V12_4_3=1; marker not written');
    return;
  }

  const dbPath = path.join(effectiveDataDir, 'claude-mem.db');
  if (!existsSync(dbPath)) {
+    if (dryRun) {
+      logger.info('SYSTEM', 'v12.4.3 cleanup --dry-run: no DB present, nothing to scan', { dbPath });
+      return emptyCounts();
+    }
    mkdirSync(effectiveDataDir, { recursive: true });
    writeMarker(markerPath, { appliedAt: new Date().toISOString(), backupPath: null, chromaWiped: false, counts: emptyCounts(), skipped: 'no-db' });
    logger.debug('SYSTEM', 'No DB present, v12.4.3 cleanup marker written without work', { dbPath });
    return;
  }

+  if (dryRun) {
+    logger.info('SYSTEM', 'Running v12.4.3 cleanup --dry-run (read-only scan, no writes)', { dbPath });
+    try {
+      return scanCleanupCounts(dbPath);
+    } catch (err: unknown) {
+      const error = err instanceof Error ? err : new Error(String(err));
+      logger.error('SYSTEM', 'v12.4.3 cleanup --dry-run scan failed', {}, error);
+      return undefined;
+    }
+  }
+
  logger.warn('SYSTEM', 'Running one-time v12.4.3 pollution cleanup', { dbPath });

  try {
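With `dryRun`, the early returns in runOneTimeV12_4_3Cleanup reduce to a small decision table. Guards that exist only to prevent a second *mutation* are bypassed in dry-run mode, which keeps a preview possible after the real cleanup has already run. The sketch below expresses that table as a pure function. `decideCleanup` and the action names are hypothetical labels for the branches shown in the diff.

```typescript
type CleanupDecision =
  | { action: 'skip'; reason: 'marker-exists' | 'env-skip' }
  | { action: 'scan' }         // dry-run: read-only COUNT(*) scan, no marker
  | { action: 'empty-counts' } // dry-run, no DB: report zeros
  | { action: 'mark-no-db' }   // real run, no DB: write marker, do no work
  | { action: 'execute' };     // real run: backup, purge, write marker

function decideCleanup(input: {
  dryRun: boolean;
  markerExists: boolean;
  skipEnvSet: boolean;
  dbExists: boolean;
}): CleanupDecision {
  const { dryRun, markerExists, skipEnvSet, dbExists } = input;
  // Both guards only protect against mutation, so dry-run skips them.
  if (markerExists && !dryRun) return { action: 'skip', reason: 'marker-exists' };
  if (skipEnvSet && !dryRun) return { action: 'skip', reason: 'env-skip' };
  if (!dbExists) return dryRun ? { action: 'empty-counts' } : { action: 'mark-no-db' };
  return dryRun ? { action: 'scan' } : { action: 'execute' };
}
```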
@@ -83,6 +106,43 @@ export function runOneTimeV12_4_3Cleanup(dataDirectory?: string): void {
  }
 }

+/**
+ * Read-only scan: count what runOneTimeV12_4_3Cleanup *would* delete.
+ * Mirrors the COUNT(*) queries from runObserverSessionsPurge and
+ * runStuckPendingPurge. Opens the DB read-only — never mutates.
+ */
+function scanCleanupCounts(dbPath: string): CleanupCounts {
+  const counts = emptyCounts();
+  const db = new Database(dbPath, { readonly: true });
+  try {
+    counts.observerSessions = (
+      db.prepare(`SELECT COUNT(*) AS n FROM sdk_sessions WHERE project = ?`).get(OBSERVER_SESSIONS_PROJECT) as { n: number }
+    ).n;
+    counts.observerCascadeRows =
+      (db.prepare(`SELECT COUNT(*) AS n FROM user_prompts WHERE content_session_id IN (SELECT content_session_id FROM sdk_sessions WHERE project = ?)`).get(OBSERVER_SESSIONS_PROJECT) as { n: number }).n
+      + (db.prepare(`SELECT COUNT(*) AS n FROM observations WHERE memory_session_id IN (SELECT memory_session_id FROM sdk_sessions WHERE project = ? AND memory_session_id IS NOT NULL)`).get(OBSERVER_SESSIONS_PROJECT) as { n: number }).n
+      + (db.prepare(`SELECT COUNT(*) AS n FROM session_summaries WHERE memory_session_id IN (SELECT memory_session_id FROM sdk_sessions WHERE project = ? AND memory_session_id IS NOT NULL)`).get(OBSERVER_SESSIONS_PROJECT) as { n: number }).n;
+    counts.stuckPendingMessages = (db.prepare(
+      `SELECT COUNT(*) AS n FROM pending_messages
+         WHERE status IN ('failed', 'processing')
+           AND session_db_id IN (
+             SELECT session_db_id FROM pending_messages
+              WHERE status IN ('failed', 'processing')
+              GROUP BY session_db_id
+              HAVING COUNT(*) >= ?
+           )`
+    ).get(STUCK_PENDING_THRESHOLD) as { n: number }).n;
+  } finally {
+    db.close();
+  }
+  logger.info('SYSTEM', 'v12.4.3 cleanup --dry-run scan complete', {
+    observerSessions: counts.observerSessions,
+    observerCascadeRows: counts.observerCascadeRows,
+    stuckPendingMessages: counts.stuckPendingMessages,
+  });
+  return counts;
+}
+
 function executeCleanup(dbPath: string, effectiveDataDir: string, markerPath: string): void {
  const dbSize = statSync(dbPath).size;
  const required = Math.ceil(dbSize * 1.2) + 100 * 1024 * 1024;
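The stuck-pending query in scanCleanupCounts counts `failed`/`processing` rows, but only within sessions that have at least `STUCK_PENDING_THRESHOLD` such rows. Here is an in-memory sketch of the same two-step semantics, which helps when writing fixtures for the scan. The threshold's actual value is not shown in this diff, so it is passed as a parameter. `countStuckPending` is a hypothetical name.

```typescript
interface PendingRow {
  sessionDbId: number;
  status: string;
}

function countStuckPending(rows: readonly PendingRow[], threshold: number): number {
  const stuckStatuses = new Set(['failed', 'processing']);
  // Step 1 (the inner GROUP BY ... HAVING): stuck-row count per session.
  const perSession = new Map<number, number>();
  for (const row of rows) {
    if (!stuckStatuses.has(row.status)) continue;
    perSession.set(row.sessionDbId, (perSession.get(row.sessionDbId) ?? 0) + 1);
  }
  // Step 2 (the outer COUNT(*)): sum stuck rows in sessions at or over the threshold.
  let total = 0;
  perSession.forEach((n) => {
    if (n >= threshold) total += n;
  });
  return total;
}
```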
@@ -541,191 +541,6 @@ export async function cleanupOrphanedProcesses(): Promise<void> {
  logger.info('SYSTEM', 'Orphaned processes cleaned up', { count: pidsToKill.length });
 }

-// Patterns that should be killed immediately at startup (no age gate)
-// These are child processes that should not outlive their parent worker
-const AGGRESSIVE_CLEANUP_PATTERNS = ['worker-service.cjs', 'chroma-mcp'];
-
-// Patterns that keep the age-gated threshold (may be legitimately running)
-const AGE_GATED_CLEANUP_PATTERNS = ['mcp-server.cjs'];
-
-/**
- * Enumerate processes for aggressive startup cleanup. Aggressive patterns are
- * killed immediately; age-gated patterns only if older than ORPHAN_MAX_AGE_MINUTES.
- */
-async function enumerateAggressiveCleanupProcesses(
-  isWindows: boolean,
-  currentPid: number,
-  protectedPids: Set<number>,
-  allPatterns: string[]
-): Promise<number[]> {
-  const pidsToKill: number[] = [];
-
-  if (isWindows) {
-    // Use WQL -Filter for server-side filtering (no $_ pipeline syntax).
-    // Avoids Git Bash $_ interpretation (#1062) and PowerShell syntax errors (#1024).
-    const wqlPatternConditions = allPatterns
-      .map(p => `CommandLine LIKE '%${p}%'`)
-      .join(' OR ');
-
-    const cmd = `powershell -NoProfile -NonInteractive -Command "Get-CimInstance Win32_Process -Filter '(${wqlPatternConditions}) AND ProcessId != ${currentPid}' | Select-Object ProcessId, CommandLine, CreationDate | ConvertTo-Json"`;
-    const { stdout } = await execAsync(cmd, { timeout: HOOK_TIMEOUTS.POWERSHELL_COMMAND, windowsHide: true });
-
-    if (!stdout.trim() || stdout.trim() === 'null') {
-      logger.debug('SYSTEM', 'No orphaned claude-mem processes found (Windows)');
-      return [];
-    }
-
-    const processes = JSON.parse(stdout);
-    const processList = Array.isArray(processes) ? processes : [processes];
-    const now = Date.now();
-
-    for (const proc of processList) {
-      const pid = proc.ProcessId;
-      if (!Number.isInteger(pid) || pid <= 0 || protectedPids.has(pid)) continue;
-
-      const commandLine = proc.CommandLine || '';
-      const isAggressive = AGGRESSIVE_CLEANUP_PATTERNS.some(p => commandLine.includes(p));
-
-      if (isAggressive) {
-        // Kill immediately — no age check
-        pidsToKill.push(pid);
-        logger.debug('SYSTEM', 'Found orphaned process (aggressive)', { pid, commandLine: commandLine.substring(0, 80) });
-      } else {
-        // Age-gated: only kill if older than threshold
-        const creationMatch = proc.CreationDate?.match(/\/Date\((\d+)\)\//);
-        if (creationMatch) {
-          const creationTime = parseInt(creationMatch[1], 10);
-          const ageMinutes = (now - creationTime) / (1000 * 60);
-          if (ageMinutes >= ORPHAN_MAX_AGE_MINUTES) {
-            pidsToKill.push(pid);
-            logger.debug('SYSTEM', 'Found orphaned process (age-gated)', { pid, ageMinutes: Math.round(ageMinutes) });
-          }
-        }
-      }
-    }
-  } else {
-    // Unix: Use ps with elapsed time
-    const patternRegex = allPatterns.join('|');
-    const { stdout } = await execAsync(
-      `ps -eo pid,etime,command | grep -E "${patternRegex}" | grep -v grep || true`
-    );
-
-    if (!stdout.trim()) {
-      logger.debug('SYSTEM', 'No orphaned claude-mem processes found (Unix)');
-      return [];
-    }
-
-    const lines = stdout.trim().split('\n');
-    for (const line of lines) {
-      const match = line.trim().match(/^(\d+)\s+(\S+)\s+(.*)$/);
-      if (!match) continue;
-
-      const pid = parseInt(match[1], 10);
-      const etime = match[2];
-      const command = match[3];
-
-      if (!Number.isInteger(pid) || pid <= 0 || protectedPids.has(pid)) continue;
-
-      const isAggressive = AGGRESSIVE_CLEANUP_PATTERNS.some(p => command.includes(p));
-
-      if (isAggressive) {
-        // Kill immediately — no age check
-        pidsToKill.push(pid);
-        logger.debug('SYSTEM', 'Found orphaned process (aggressive)', { pid, command: command.substring(0, 80) });
-      } else {
-        // Age-gated: only kill if older than threshold
-        const ageMinutes = parseElapsedTime(etime);
-        if (ageMinutes >= ORPHAN_MAX_AGE_MINUTES) {
-          pidsToKill.push(pid);
-          logger.debug('SYSTEM', 'Found orphaned process (age-gated)', { pid, ageMinutes, command: command.substring(0, 80) });
-        }
-      }
-    }
-  }
-
-  return pidsToKill;
-}
-
-/**
- * Aggressive startup cleanup for orphaned claude-mem processes.
- *
- * Unlike cleanupOrphanedProcesses() which age-gates everything at 30 minutes,
- * this function kills worker-service.cjs and chroma-mcp processes immediately
- * (they should not outlive their parent worker). Only mcp-server.cjs keeps
- * the age threshold since it may be legitimately running.
- *
- * Called once at daemon startup.
- */
-export async function aggressiveStartupCleanup(): Promise<void> {
-  const isWindows = process.platform === 'win32';
-  const currentPid = process.pid;
-  const allPatterns = [...AGGRESSIVE_CLEANUP_PATTERNS, ...AGE_GATED_CLEANUP_PATTERNS];
-
-  // Protect parent process (the hook that spawned us) from being killed.
-  // Without this, a new daemon kills its own parent hook process (#1426).
-  //
-  // Note: readPidFile() is not used here because start() writes the new PID
-  // before initializeBackground() calls this function, so readPidFile() would
-  // just return process.pid (already protected). If a pre-existing worker needs
-  // protection, ensureWorkerStarted() handles that by returning early when a
-  // healthy worker is detected — we never reach this code in that case.
-  const protectedPids = new Set<number>([currentPid]);
-  if (process.ppid && process.ppid > 0) {
-    protectedPids.add(process.ppid);
-  }
-
-  let pidsToKill: number[];
-  try {
-    pidsToKill = await enumerateAggressiveCleanupProcesses(isWindows, currentPid, protectedPids, allPatterns);
-  } catch (error: unknown) {
-    if (error instanceof Error) {
-      logger.error('SYSTEM', 'Failed to enumerate orphaned processes during aggressive cleanup', {}, error);
-    } else {
-      logger.error('SYSTEM', 'Failed to enumerate orphaned processes during aggressive cleanup', {}, new Error(String(error)));
-    }
-    return;
-  }
-
-  if (pidsToKill.length === 0) {
-    return;
-  }
-
-  logger.info('SYSTEM', 'Aggressive startup cleanup: killing orphaned processes', {
-    platform: isWindows ? 'Windows' : 'Unix',
-    count: pidsToKill.length,
-    pids: pidsToKill
-  });
-
-  if (isWindows) {
-    for (const pid of pidsToKill) {
-      if (!Number.isInteger(pid) || pid <= 0) continue;
-      try {
-        execSync(`taskkill /PID ${pid} /T /F`, { timeout: HOOK_TIMEOUTS.POWERSHELL_COMMAND, stdio: 'ignore', windowsHide: true });
-      } catch (error: unknown) {
-        if (error instanceof Error) {
-          logger.debug('SYSTEM', 'Failed to kill process, may have already exited', { pid }, error);
-        } else {
-          logger.debug('SYSTEM', 'Failed to kill process, may have already exited', { pid }, new Error(String(error)));
-        }
-      }
-    }
-  } else {
-    for (const pid of pidsToKill) {
-      try {
-        process.kill(pid, 'SIGKILL');
-      } catch (error: unknown) {
-        if (error instanceof Error) {
-          logger.debug('SYSTEM', 'Process already exited', { pid }, error);
-        } else {
-          logger.debug('SYSTEM', 'Process already exited', { pid }, new Error(String(error)));
-        }
-      }
-    }
-  }
-
-  logger.info('SYSTEM', 'Aggressive startup cleanup complete', { count: pidsToKill.length });
-}
-
 const CHROMA_MIGRATION_MARKER_FILENAME = '.chroma-cleaned-v10.3';

 /**
@@ -929,14 +744,20 @@ function executeCwdRemap(dbPath: string, effectiveDataDir: string, markerPath: s
 }

 /**
- * Spawn a detached daemon process
- * Returns the child PID or undefined if spawn failed
+ * Spawn a detached daemon process.
 *
- * On Windows, uses PowerShell Start-Process with -WindowStyle Hidden to spawn
- * a truly independent process without console popups. Unlike WMIC, PowerShell
- * inherits environment variables from the parent process.
+ * Uses Node's child_process.spawn with the arg-array form on every platform.
+ * The arg-array form bypasses the shell entirely on Windows, so no quoting
+ * heuristics or PowerShell wrappers are needed (handles paths with spaces
+ * like `C:\Users\Alex Newman\...` natively).
 *
- * On Unix, uses standard detached spawn.
+ * On Unix, prefer setsid to detach from the controlling terminal so SIGHUP
+ * can't reach the daemon even if the in-process handler fails. The
+ * `detached: true` option already makes the child the leader of a new
+ * process group and session on POSIX; setsid is the belt-and-suspenders extra.
+ *
+ * Bun.spawn is intentionally NOT used here: it does not support detached
+ * spawning (see comment in process-registry.ts:633-639).
 *
 * PID file is written by the worker itself after listen() succeeds,
 * not by the spawner (race-free, works on all platforms).
@@ -946,7 +767,6 @@ export function spawnDaemon(
  port: number,
  extraEnv: Record<string, string> = {}
 ): number | undefined {
-  const isWindows = process.platform === 'win32';
  getSupervisor().assertCanSpawn('worker daemon');

  const env = sanitizeEnv({
@@ -957,9 +777,7 @@ export function spawnDaemon(

  // worker-service.cjs imports `bun:sqlite`, so the spawned runtime MUST be
  // Bun on every platform — never the current process.execPath, which may be
-  // Node when the caller is the MCP server. Resolve once before the OS branch
-  // split so we don't pay for a duplicate PATH lookup if Bun isn't found at a
-  // well-known path. See resolveWorkerRuntimePath() for the candidate list.
+  // Node when the caller is the MCP server.
  const runtimePath = resolveWorkerRuntimePath();
  if (!runtimePath) {
    logger.error(
@@ -969,65 +787,20 @@ export function spawnDaemon(
    return undefined;
  }

-  if (isWindows) {
-    // Use PowerShell Start-Process to spawn a hidden, independent process
-    // Unlike WMIC, PowerShell inherits environment variables from parent
-    // -WindowStyle Hidden prevents console popup
-
-    // Use -EncodedCommand to avoid all shell quoting issues with spaces in paths
-    const psScript = `Start-Process -FilePath '${runtimePath.replace(/'/g, "''")}' -ArgumentList @('${scriptPath.replace(/'/g, "''")}','--daemon') -WindowStyle Hidden`;
-    const encodedCommand = Buffer.from(psScript, 'utf16le').toString('base64');
-
-    try {
-      execSync(`powershell -NoProfile -EncodedCommand ${encodedCommand}`, {
-        stdio: 'ignore',
-        windowsHide: true,
-        env
-      });
-      // Windows success sentinel: PowerShell `Start-Process` does not return
-      // the spawned PID, and we don't want to pay for an extra `Get-Process`
-      // round-trip just to discover it. Return 0 (a conventionally invalid
-      // Unix PID) so callers can distinguish "spawn dispatched" from "spawn
-      // failed". Callers MUST use `pid === undefined` to detect failure —
-      // never falsy checks like `if (!pid)`, which would silently treat
-      // success as failure here.
-      return 0;
-    } catch (error: unknown) {
-      // APPROVED OVERRIDE: Windows daemon spawn is best-effort; log and let callers fall back to health checks/retry flow.
-      if (error instanceof Error) {
-        logger.error('SYSTEM', 'Failed to spawn worker daemon on Windows', { runtimePath }, error);
-      } else {
-        logger.error('SYSTEM', 'Failed to spawn worker daemon on Windows', { runtimePath }, new Error(String(error)));
-      }
-      return undefined;
-    }
-  }
-
-  // Unix: Use setsid to create a new session, fully detaching from the
-  // controlling terminal. This prevents SIGHUP from reaching the daemon
-  // even if the in-process SIGHUP handler somehow fails (belt-and-suspenders).
-  // Fall back to standard detached spawn if setsid is not available.
-  // `runtimePath` was resolved at the top of this function (see comment there).
+  // On Unix, prefer setsid to fully detach from the controlling terminal.
+  // On Windows or systems without setsid, spawn the runtime directly.
  const setsidPath = '/usr/bin/setsid';
-  if (existsSync(setsidPath)) {
-    const child = spawn(setsidPath, [runtimePath, scriptPath, '--daemon'], {
-      detached: true,
-      stdio: 'ignore',
-      env
-    });
+  const useSetsid = process.platform !== 'win32' && existsSync(setsidPath);

-    if (child.pid === undefined) {
-      return undefined;
-    }
+  const execPath = useSetsid ? setsidPath : runtimePath;
+  const args = useSetsid
+    ? [runtimePath, scriptPath, '--daemon']
+    : [scriptPath, '--daemon'];

-    child.unref();
-    return child.pid;
-  }
-
-  // Fallback: standard detached spawn (macOS, systems without setsid)
-  const child = spawn(runtimePath, [scriptPath, '--daemon'], {
+  const child = spawn(execPath, args, {
    detached: true,
    stdio: 'ignore',
+    windowsHide: true,
    env
  });

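After this change, the only platform-specific decision left in spawnDaemon is which executable to launch with which arguments. The sketch below lifts that choice into a pure function so the Windows and no-setsid paths can be tested on any host. `planDaemonLaunch` is hypothetical. The caller would pass `existsSync('/usr/bin/setsid')` as `setsidAvailable`.

```typescript
interface DaemonLaunch {
  execPath: string;
  args: string[];
}

// Mirrors the useSetsid / execPath / args selection in spawnDaemon.
function planDaemonLaunch(
  runtimePath: string,
  scriptPath: string,
  platform: string,
  setsidAvailable: boolean,
  setsidPath: string = '/usr/bin/setsid',
): DaemonLaunch {
  const useSetsid = platform !== 'win32' && setsidAvailable;
  return useSetsid
    ? { execPath: setsidPath, args: [runtimePath, scriptPath, '--daemon'] }
    : { execPath: runtimePath, args: [scriptPath, '--daemon'] };
}
```

The result feeds straight into `spawn(plan.execPath, plan.args, { detached: true, stdio: 'ignore', windowsHide: true, env })`. Because paths travel as separate array elements, a runtime path containing spaces needs no quoting.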
@@ -1036,7 +809,6 @@ export function spawnDaemon(
  }

  child.unref();
-
  return child.pid;
 }

@@ -0,0 +1,58 @@
+/**
+ * Shared worker-shutdown helper used by both `install` (to clear out a
+ * running worker before overwriting plugin files) and `uninstall` (to
+ * release file locks before deletion).
+ *
+ * Posts to `/api/admin/shutdown`, then polls `/api/health` until the
+ * connection is refused (= worker is gone) or the timeout elapses.
+ *
+ * Best-effort: if the worker is not running, the POST rejects and we
+ * return immediately. This function never throws; callers need not catch.
+ */
+
+export interface ShutdownResult {
+  /** True if we actively shut down a worker; false if none was running. */
+  workerWasRunning: boolean;
+  /** True if we observed the worker stop responding before the timeout. */
+  confirmedStopped: boolean;
+}
+
+export async function shutdownWorkerAndWait(
+  port: number | string,
+  timeoutMs: number = 10000,
+): Promise<ShutdownResult> {
+  const baseUrl = `http://127.0.0.1:${port}`;
+  let workerWasRunning = false;
+
+  try {
+    await fetch(`${baseUrl}/api/admin/shutdown`, {
+      method: 'POST',
+      signal: AbortSignal.timeout(5000),
+    });
+    workerWasRunning = true;
+  } catch {
+    // Worker not running (connection refused) or shutdown POST timed out.
+    // Either way, nothing more to do.
+    return { workerWasRunning: false, confirmedStopped: true };
+  }
+
+  const pollIntervalMs = 500;
+  const maxAttempts = Math.ceil(timeoutMs / pollIntervalMs);
+  for (let attempt = 0; attempt < maxAttempts; attempt++) {
+    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
+    try {
+      await fetch(`${baseUrl}/api/health`, {
+        signal: AbortSignal.timeout(1000),
+      });
+      // Health endpoint still responding — worker is still alive, keep waiting.
+    } catch (err) {
+      // TimeoutError (AbortSignal.timeout's abort reason) or AbortError = health
+      // endpoint timed out (worker still accepting connections but slow). Keep
+      // polling. Any other error (ECONNREFUSED, ECONNRESET) means the worker is gone.
+      if (err instanceof Error && (err.name === 'TimeoutError' || err.name === 'AbortError')) continue;
+      return { workerWasRunning, confirmedStopped: true };
+    }
+  }
+
+  return { workerWasRunning, confirmedStopped: false };
+}
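Under the DOM specification, `AbortSignal.timeout()` aborts with a `DOMException` whose `name` is `"TimeoutError"`, and spec-following `fetch` implementations reject with that reason. A check for `AbortError` alone would therefore treat a slow but live worker as stopped. It is worth confirming this against the Bun version in use. Below is a minimal sketch of a classifier that accepts both names, plus the attempt arithmetic. Both helper names are hypothetical.

```typescript
// Classify a rejection from the /api/health probe.
// true  => the request timed out: the worker may still be alive, keep polling.
// false => connection-level failure (e.g. ECONNREFUSED): treat the worker as gone.
function healthErrorMeansStillAlive(err: unknown): boolean {
  return (
    err instanceof Error &&
    (err.name === 'TimeoutError' || err.name === 'AbortError')
  );
}

// Number of probes the loop makes. Each probe also waits up to its own
// per-request timeout, so worst-case wall time exceeds timeoutMs.
function pollAttempts(timeoutMs: number, pollIntervalMs: number = 500): number {
  return Math.ceil(timeoutMs / pollIntervalMs);
}
```

With the defaults, 10 000 ms yields 20 attempts. If every probe hits its 1 000 ms timeout, the loop can run for about 20 × 1.5 s = 30 s before it reports `confirmedStopped: false`.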
@@ -26,6 +26,7 @@ import {
  unlinkSync,
 } from 'fs';
 import { logger } from '../../utils/logger.js';
+import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';

 // ============================================================================
 // Path Resolution
@@ -168,7 +169,7 @@ function writeOpenClawConfig(config: Record<string, any>): void {
 * and the memory slot.
 */
 function registerPluginInOpenClawConfig(
-  workerPort: number = 37777,
+  workerPort: number,
  project: string = 'openclaw',
  syncMemoryFile: boolean = true,
 ): void {
@@ -305,7 +306,11 @@ function copyPluginFilesAndRegister(
    'utf-8',
  );

-  registerPluginInOpenClawConfig();
+  // Resolve port via SettingsDefaultsManager so CLAUDE_MEM_WORKER_PORT env
+  // takes priority and the per-UID default (37700 + uid % 100) is used
+  // otherwise. Required for multi-account isolation (#2101).
+  const workerPort = SettingsDefaultsManager.getInt('CLAUDE_MEM_WORKER_PORT');
+  registerPluginInOpenClawConfig(workerPort);
  console.log(`  Registered in openclaw.json`);

  logger.info('OPENCLAW', 'Plugin installed', { destination: extensionDirectory });
@@ -75,6 +75,7 @@ export class SessionStore {
    this.addObservationSubagentColumns();
    this.addPendingMessagesToolUseIdAndWorkerPidColumns();
    this.addObservationsUniqueContentHashIndex();
+    this.addObservationsMetadataColumn();
  }

  /**
@@ -715,6 +716,14 @@ export class SessionStore {
    // Clean up leftover temp table from a previously-crashed run
    this.db.run('DROP TABLE IF EXISTS observations_new');

+    // If the live observations table already has metadata (added in v30 or
+    // by an older bundled artifact that ran v30 before v21 was recorded),
+    // preserve it so this rebuild doesn't silently drop the column's data.
+    const observationsCols = this.db.query('PRAGMA table_info(observations)').all() as TableColumnInfo[];
+    const observationsHasMetadata = observationsCols.some(c => c.name === 'metadata');
+    const metadataColumnSQL = observationsHasMetadata ? ',\n        metadata TEXT' : '';
+    const metadataSelectSQL = observationsHasMetadata ? ', metadata' : '';
+
    const observationsNewSQL = `
      CREATE TABLE observations_new (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
@@ -732,7 +741,7 @@ export class SessionStore {
        prompt_number INTEGER,
        discovery_tokens INTEGER DEFAULT 0,
        created_at TEXT NOT NULL,
-        created_at_epoch INTEGER NOT NULL,
+        created_at_epoch INTEGER NOT NULL${metadataColumnSQL},
        FOREIGN KEY(memory_session_id) REFERENCES sdk_sessions(memory_session_id) ON DELETE CASCADE ON UPDATE CASCADE
      )
    `;
@@ -740,7 +749,7 @@ export class SessionStore {
      INSERT INTO observations_new
      SELECT id, memory_session_id, project, text, type, title, subtitle, facts,
             narrative, concepts, files_read, files_modified, prompt_number,
-             discovery_tokens, created_at, created_at_epoch
+             discovery_tokens, created_at, created_at_epoch${metadataSelectSQL}
      FROM observations
    `;
    const observationsIndexesSQL = `
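The rebuild keeps `metadata` by splicing optional fragments into both the CREATE and the INSERT ... SELECT, so the two must stay in sync. The sketch below builds the copy statement from a single column list. `buildObservationsCopySQL` is hypothetical. Unlike the PR, it names the target columns explicitly. That is a deliberate variation, not what the diff does: it removes the dependence on positional column order.

```typescript
const OBSERVATIONS_BASE_COLUMNS = [
  'id', 'memory_session_id', 'project', 'text', 'type', 'title', 'subtitle',
  'facts', 'narrative', 'concepts', 'files_read', 'files_modified',
  'prompt_number', 'discovery_tokens', 'created_at', 'created_at_epoch',
] as const;

function buildObservationsCopySQL(liveColumns: readonly string[]): string {
  const columns: string[] = [...OBSERVATIONS_BASE_COLUMNS];
  // Carry metadata across only if the live table already has it.
  if (liveColumns.indexOf('metadata') !== -1) columns.push('metadata');
  const list = columns.join(', ');
  // One list drives both sides, so target and source columns cannot drift apart.
  return `INSERT INTO observations_new (${list}) SELECT ${list} FROM observations`;
}
```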
@@ -1156,6 +1165,29 @@ export class SessionStore {
    }
  }

+  /**
+   * Add metadata TEXT column to observations (migration 30).
+   *
+   * Mirrors MigrationRunner.addObservationsMetadataColumn so bundled artifacts
+   * that embed SessionStore (e.g. worker-service.cjs, context-generator.cjs)
+   * stay schema-consistent. Without this, INSERT … (..., metadata, ...) raises
+   * "table observations has no column named metadata" and POST /api/memory/save
+   * starts failing on every call once it begins persisting metadata (#2116).
+   *
+   * Idempotent via PRAGMA table_info guard.
+   */
+  private addObservationsMetadataColumn(): void {
+    const cols = this.db.query('PRAGMA table_info(observations)').all() as TableColumnInfo[];
+    const hasColumn = cols.some(c => c.name === 'metadata');
+
+    if (!hasColumn) {
+      this.db.run('ALTER TABLE observations ADD COLUMN metadata TEXT');
+      logger.debug('DB', 'Added metadata column to observations table (#2116)');
+    }
+
+    this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(30, new Date().toISOString());
+  }
+
  /**
   * Update the memory session ID for a session
   * Called by SDKAgent when it captures the session ID from the first SDK message
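The same v30 step now lives in both `MigrationRunner` and `SessionStore`, so it has to be safe to run twice. It also has to work on a DB whose `schema_versions` already claims v30 without the column. The sketch below models the step as a plan of statements, which makes both properties testable. `planMetadataMigration` is hypothetical. It assumes `schema_versions.version` is unique, since that is what makes `INSERT OR IGNORE` a no-op on repeats.

```typescript
interface PlannedStatement {
  sql: string;
  params: unknown[];
}

function planMetadataMigration(
  existingColumns: readonly string[],
  appliedAtIso: string,
): PlannedStatement[] {
  const plan: PlannedStatement[] = [];
  // Mirrors the PRAGMA table_info guard: SQLite's ADD COLUMN has no IF NOT EXISTS.
  if (existingColumns.indexOf('metadata') === -1) {
    plan.push({ sql: 'ALTER TABLE observations ADD COLUMN metadata TEXT', params: [] });
  }
  // Always (re)record version 30; assuming a unique version column, repeats are ignored.
  plan.push({
    sql: 'INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)',
    params: [30, appliedAtIso],
  });
  return plan;
}
```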
@@ -2009,6 +2041,9 @@ export class SessionStore {
      files_modified: string[];
      agent_type?: string | null;
      agent_id?: string | null;
+      // Caller-supplied JSON metadata, stored verbatim in the metadata column (#2116).
+      // Pre-stringified by the caller so we don't double-encode an already-JSON value.
+      metadata?: string | null;
    },
    promptNumber?: number,
    discoveryTokens: number = 0,
@@ -2027,8 +2062,8 @@ export class SessionStore {
      INSERT INTO observations
      (memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
       files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch,
-       generated_by_model)
-      VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+       generated_by_model, metadata)
+      VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
      ON CONFLICT(memory_session_id, content_hash) DO NOTHING
      RETURNING id, created_at_epoch
    `);
@@ -2051,7 +2086,8 @@ export class SessionStore {
      contentHash,
      timestampIso,
      timestampEpoch,
-      generatedByModel || null
+      generatedByModel || null,
+      observation.metadata ?? null
    ) as { id: number; created_at_epoch: number } | null;

    if (inserted) {
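The insert binds `observation.metadata ?? null`, so callers must pass either a JSON string or nothing. The sketch below shows one way a route could build that value from the extra fields Zod's `.passthrough()` lets through. The helper name `extractMetadata` is an assumption, as is the choice of which keys count as "known". Neither comes from this PR's route code.

```typescript
// Collect body fields the schema does not model and JSON-encode them once.
// Returns null when there is nothing extra, which binds as SQL NULL.
function extractMetadata(
  body: Record<string, unknown>,
  knownKeys: readonly string[],
): string | null {
  const extra: Record<string, unknown> = {};
  let count = 0;
  for (const key of Object.keys(body)) {
    if (knownKeys.indexOf(key) !== -1) continue;
    extra[key] = body[key];
    count++;
  }
  // Stringify here, exactly once, so SessionStore can store the value verbatim.
  return count > 0 ? JSON.stringify(extra) : null;
}
```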
@@ -40,6 +40,7 @@ export class MigrationRunner {
    this.addObservationSubagentColumns();
    this.rebuildPendingMessagesForSelfHealingClaim();
    this.addObservationsUniqueContentHashIndex();
+    this.addObservationsMetadataColumn();
  }

  /**
@@ -1204,4 +1205,27 @@ export class MigrationRunner {
      throw new Error(`Migration 29 failed: ${String(error)}`);
    }
  }
+
+  /**
+   * Add metadata TEXT column to observations (migration 30).
+   *
+   * Backward-compatible: nullable, no default. Holds JSON-encoded arbitrary
+   * metadata supplied by callers of POST /api/memory/save (#2116). Without
+   * this column, the route's Zod `.passthrough()` accepted unknown fields
+   * but the INSERT silently dropped them — a quiet contract violation.
+   *
+   * Idempotent via PRAGMA table_info guard so cross-machine DB sync that
+   * leaves schema_versions ahead of actual schema still self-heals.
+   */
+  private addObservationsMetadataColumn(): void {
+    const cols = this.db.query('PRAGMA table_info(observations)').all() as TableColumnInfo[];
+    const hasColumn = cols.some(c => c.name === 'metadata');
+
+    if (!hasColumn) {
+      this.db.run('ALTER TABLE observations ADD COLUMN metadata TEXT');
+      logger.debug('DB', 'Added metadata column to observations table (#2116)');
+    }
+
+    this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(30, new Date().toISOString());
+  }
 }
@@ -74,6 +74,7 @@ CREATE TABLE IF NOT EXISTS observations (
  agent_id             TEXT,
  merged_into_project  TEXT,
  generated_by_model   TEXT,
+  metadata             TEXT,
  created_at           TEXT    NOT NULL,
  created_at_epoch     INTEGER NOT NULL,
  FOREIGN KEY(memory_session_id) REFERENCES sdk_sessions(memory_session_id)
@@ -31,6 +31,24 @@ const RECONNECT_BACKOFF_MS = 10_000; // Don't retry connections faster than this
 const DEFAULT_CHROMA_DATA_DIR = path.join(os.homedir(), '.claude-mem', 'chroma');
 const CHROMA_SUPERVISOR_ID = 'chroma-mcp';

+/**
+ * Pinned chroma-mcp version for deterministic installs.
+ *
+ * Why pin: `uvx chroma-mcp` (unpinned) resolves whatever version PyPI happens
+ * to serve at install time. That has bitten us multiple ways:
+ *   - #2046: transient missing httpcore/httpx after dependency resolver shifts
+ *   - #2085: surprise breaking changes between point releases
+ *   - #2102: subprocess spawn storms triggered by version drift in chromadb deps
+ *
+ * Pinning to a specific known-good version makes installs reproducible across
+ * machines and across time. Bump deliberately, not accidentally.
+ *
+ * Verified 2026-04-25 with `uvx --python 3.13 chroma-mcp==0.2.6 --help` in a
+ * clean uv cache: starts cleanly, no httpcore/httpx ImportError, no `--with`
+ * flags required. If that changes on a future bump, re-add the flags here.
+ */
+const CHROMA_MCP_PINNED_VERSION = '0.2.6';
+
 export class ChromaMcpManager {
  private static instance: ChromaMcpManager | null = null;
  private client: Client | null = null;
@@ -212,7 +230,7 @@ export class ChromaMcpManager {

      const args = [
        '--python', pythonVersion,
-        'chroma-mcp',
+        `chroma-mcp==${CHROMA_MCP_PINNED_VERSION}`,
        '--client-type', 'http',
        '--host', chromaHost,
        '--port', chromaPort
@@ -238,7 +256,7 @@ export class ChromaMcpManager {
    // Local mode: persistent client with data directory
    return [
      '--python', pythonVersion,
-      'chroma-mcp',
+      `chroma-mcp==${CHROMA_MCP_PINNED_VERSION}`,
      '--client-type', 'persistent',
      '--data-dir', DEFAULT_CHROMA_DATA_DIR.replace(/\\/g, '/')
    ];
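Both `uvx` argument lists now carry the same pinned package spec. Folding them into one pure function makes the pin easy to assert in a test. In this sketch, `buildChromaMcpArgs` and `ChromaMode` are hypothetical names (the diff keeps two inline arrays), and the host, port, and data-dir values are placeholders.

```typescript
const CHROMA_MCP_PINNED_VERSION = '0.2.6';

// Hypothetical discriminated union covering the two client modes in the diff.
type ChromaMode =
  | { kind: 'http'; host: string; port: string }
  | { kind: 'persistent'; dataDir: string };

// Hypothetical builder for the argv passed to `uvx`; the package spec is always pinned.
function buildChromaMcpArgs(pythonVersion: string, mode: ChromaMode): string[] {
  const base = ['--python', pythonVersion, `chroma-mcp==${CHROMA_MCP_PINNED_VERSION}`];
  if (mode.kind === 'http') {
    return [...base, '--client-type', 'http', '--host', mode.host, '--port', mode.port];
  }
  // Forward slashes for the data dir, as in the persistent-mode branch of the diff.
  return [...base, '--client-type', 'persistent', '--data-dir', mode.dataDir.replace(/\\/g, '/')];
}

const httpArgs = buildChromaMcpArgs('3.13', { kind: 'http', host: '127.0.0.1', port: '8000' });
const localArgs = buildChromaMcpArgs('3.13', { kind: 'persistent', dataDir: 'C:\\Users\\me\\.claude-mem\\chroma' });
console.log(httpArgs.join(' '));
console.log(localArgs.join(' '));
```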
@@ -44,7 +44,6 @@ import {
  readPidFile,
  removePidFile,
  getPlatformTimeout,
-  aggressiveStartupCleanup,
  runOneTimeChromaMigration,
  runOneTimeCwdRemap,
  cleanStalePidFile,
@@ -386,7 +385,6 @@ export class WorkerService implements WorkerRef {
  private async initializeBackground(): Promise<void> {
    try {
      logger.info('WORKER', 'Background initialization starting...');
-      await aggressiveStartupCleanup();

      // Load mode configuration
      const { ModeManager } = await import('./domain/ModeManager.js');
@@ -1154,34 +1152,21 @@ async function main() {
    case 'restart': {
      logger.info('SYSTEM', 'Restarting worker');
      await httpShutdown(port);
-      const restartFreed = await waitForPortFree(port, getPlatformTimeout(15000));
+      const restartFreed = await waitForPortFree(port, 5000);
      if (!restartFreed) {
-        logger.error('SYSTEM', 'Port did not free up after shutdown, aborting restart', { port });
-        process.exit(0);
+        // Don't loop, don't force-kill, don't steal the port. The PID file
+        // owns the lock; if the previous worker won't release the port, the
+        // user has to resolve it manually.
+        console.error('Port still bound after shutdown. Resolve manually.');
+        process.exit(1);
      }
      removePidFile();
-
-      const pid = spawnDaemon(__filename, port);
-      if (pid === undefined) {
-        logger.error('SYSTEM', 'Failed to spawn worker daemon during restart');
-        // Exit gracefully: Windows Terminal won't keep tab open on exit 0
-        // The wrapper/plugin will handle restart logic if needed
-        process.exit(0);
+      const restartPid = spawnDaemon(__filename, port);
+      if (restartPid === undefined) {
+        console.error('Failed to spawn worker daemon during restart.');
+        process.exit(1);
      }
-
-      // PID file is written by the worker itself after listen() succeeds
-      // This is race-free and works correctly on Windows where cmd.exe PID is useless
-
-      const healthy = await waitForHealth(port, getPlatformTimeout(HOOK_TIMEOUTS.POST_SPAWN_WAIT));
-      if (!healthy) {
-        removePidFile();
-        logger.error('SYSTEM', 'Worker failed to restart');
-        // Exit gracefully: Windows Terminal won't keep tab open on exit 0
-        // The wrapper/plugin will handle restart logic if needed
-        process.exit(0);
-      }
-
-      logger.info('SYSTEM', 'Worker restarted successfully');
+      logger.info('SYSTEM', 'Worker restart spawned', { pid: restartPid });
      process.exit(0);
      break;
    }
@@ -1298,6 +1283,26 @@ async function main() {
      process.exit(0);
    }

+    case 'cleanup': {
+      // CLI surface for the v12.4.3 pollution cleanup. Shares its scan logic
+      // with the auto-run-on-startup path so --dry-run reports counts that
+      // exactly match what the next startup would delete. (#2126 item 5)
+      const dryRun = process.argv.includes('--dry-run');
+      const counts = runOneTimeV12_4_3Cleanup(undefined, { dryRun });
+      const tag = dryRun ? '(dry-run, no changes made)' : '(applied)';
+      console.log(`\nv12.4.3 cleanup ${tag}`);
+      if (counts) {
+        console.log(`  Observer sessions:        ${counts.observerSessions}`);
+        console.log(`  Observer cascade rows:    ${counts.observerCascadeRows}`);
+        console.log(`  Stuck pending_messages:   ${counts.stuckPendingMessages}`);
+      } else if (dryRun) {
+        console.log('  Scan failed — see worker log for details.');
+      } else {
+        console.log('  Already applied (marker present) or skipped.');
+      }
+      process.exit(0);
+    }
+
    case '--daemon':
    default: {
      // GUARD 1: Refuse to start if another worker is already alive.
@@ -25,10 +25,8 @@ import { ModeManager } from '../domain/ModeManager.js';
 import type { ModeConfig } from '../domain/types.js';
 import {
  processAgentResponse,
-  shouldFallbackToClaude,
  isAbortError,
-  type WorkerRef,
-  type FallbackAgent
+  type WorkerRef
 } from './agents/index.js';

 // Gemini API endpoint — use v1 (stable), not v1beta.
@@ -116,21 +114,12 @@ interface GeminiContent {
 export class GeminiAgent {
  private dbManager: DatabaseManager;
  private sessionManager: SessionManager;
-  private fallbackAgent: FallbackAgent | null = null;

  constructor(dbManager: DatabaseManager, sessionManager: SessionManager) {
    this.dbManager = dbManager;
    this.sessionManager = sessionManager;
  }

-  /**
-   * Set the fallback agent (Claude SDK) for when Gemini API fails
-   * Must be set after construction to avoid circular dependency
-   */
-  setFallbackAgent(agent: FallbackAgent): void {
-    this.fallbackAgent = agent;
-  }
-
  /**
   * Start Gemini agent for a session
   * Uses multi-turn conversation to maintain context across messages
@@ -352,28 +341,19 @@ export class GeminiAgent {
  }

  /**
-   * Handle errors from Gemini API calls with abort detection and Claude fallback.
+   * Handle errors from Gemini API calls with abort detection.
   * Shared by init query and message processing try blocks.
+   *
+   * Note: The previous Claude-SDK fallback path was removed in #2087 — it was
+   * never wired in production (`fallbackAgent` was always null), so 429s
+   * already threw in practice. The throw is now explicit.
   */
-  private handleGeminiError(error: unknown, session: ActiveSession, worker?: WorkerRef): Promise<void> | never {
+  private handleGeminiError(error: unknown, session: ActiveSession, _worker?: WorkerRef): never {
    if (isAbortError(error)) {
      logger.warn('SDK', 'Gemini agent aborted', { sessionId: session.sessionDbId });
      throw error;
    }

-    // Check if we should fall back to Claude
-    if (shouldFallbackToClaude(error) && this.fallbackAgent) {
-      logger.warn('SDK', 'Gemini API failed, falling back to Claude SDK', {
-        sessionDbId: session.sessionDbId,
-        error: error instanceof Error ? error.message : String(error),
-        historyLength: session.conversationHistory.length
-      });
-
-      // Fall back to Claude - it will use the same session with shared conversationHistory
-      // Note: With claim-and-delete queue pattern, messages are already deleted on claim
-      return this.fallbackAgent.startSession(session, worker);
-    }
-
    logger.failure('SDK', 'Gemini agent error', { sessionDbId: session.sessionDbId }, error instanceof Error ? error : new Error(String(error)));
    throw error;
  }
@@ -24,8 +24,6 @@ import { SessionManager } from './SessionManager.js';
 import {
  isAbortError,
  processAgentResponse,
-  shouldFallbackToClaude,
-  type FallbackAgent,
  type WorkerRef
 } from './agents/index.js';

@@ -65,21 +63,12 @@ interface OpenRouterResponse {
 export class OpenRouterAgent {
  private dbManager: DatabaseManager;
  private sessionManager: SessionManager;
-  private fallbackAgent: FallbackAgent | null = null;

  constructor(dbManager: DatabaseManager, sessionManager: SessionManager) {
    this.dbManager = dbManager;
    this.sessionManager = sessionManager;
  }

-  /**
-   * Set the fallback agent (Claude SDK) for when OpenRouter API fails
-   * Must be set after construction to avoid circular dependency
-   */
-  setFallbackAgent(agent: FallbackAgent): void {
-    this.fallbackAgent = agent;
-  }
-
  /**
   * Start OpenRouter agent for a session
   * Uses multi-turn conversation to maintain context across messages
@@ -327,27 +316,18 @@ export class OpenRouterAgent {
  }

  /**
-   * Handle errors from session processing: abort re-throw, fallback to Claude, or log and re-throw.
+   * Handle errors from session processing: abort re-throw or log and re-throw.
+   *
+   * Note: The previous Claude-SDK fallback path was removed in #2087 — it was
+   * never wired in production (`fallbackAgent` was always null), so 429s
+   * already threw in practice. The throw is now explicit.
   */
-  private async handleSessionError(error: unknown, session: ActiveSession, worker?: WorkerRef): Promise<never | void> {
+  private async handleSessionError(error: unknown, session: ActiveSession, _worker?: WorkerRef): Promise<never> {
    if (isAbortError(error)) {
      logger.warn('SDK', 'OpenRouter agent aborted', { sessionId: session.sessionDbId });
      throw error;
    }

-    if (shouldFallbackToClaude(error) && this.fallbackAgent) {
-      logger.warn('SDK', 'OpenRouter API failed, falling back to Claude SDK', {
-        sessionDbId: session.sessionDbId,
-        error: error instanceof Error ? error.message : String(error),
-        historyLength: session.conversationHistory.length
-      });
-
-      // Fall back to Claude - it will use the same session with shared conversationHistory
-      // Note: With claim-and-delete queue pattern, messages are already deleted on claim
-      await this.fallbackAgent.startSession(session, worker);
-      return;
-    }
-
    logger.failure('SDK', 'OpenRouter agent error', { sessionDbId: session.sessionDbId }, error instanceof Error ? error : new Error(String(error)));
    throw error;
  }
@@ -175,7 +175,8 @@ export class PaginationHelper {
      params.push(project, project);
    } else {
      // Hide internal observer-session rows from the unfiltered UI list.
-      conditions.push("ss.project != 'observer-sessions'");
+      conditions.push('ss.project != ?');
+      params.push(OBSERVER_SESSIONS_PROJECT);
    }

    if (platformSource) {
@@ -229,7 +230,8 @@ export class PaginationHelper {
      params.push(project);
    } else {
      // Hide internal observer-session rows from the unfiltered UI list.
-      conditions.push("s.project != 'observer-sessions'");
+      conditions.push('s.project != ?');
+      params.push(OBSERVER_SESSIONS_PROJECT);
    }

    if (platformSource) {
@@ -13,6 +13,7 @@

 import type { WorkerRef, ObservationSSEPayload, SummarySSEPayload } from './types.js';
 import { logger } from '../../../utils/logger.js';
+import { shouldEmitProjectRow } from '../../../shared/should-track-project.js';

 /**
 * Broadcast a new observation to SSE clients
@@ -28,6 +29,18 @@ export function broadcastObservation(
    return;
  }

+  // Parity with PaginationHelper's unfiltered-list SQL filter (#2118):
+  // observer-session rows are internal and must not stream to viewer clients.
+  // Both filters key off OBSERVER_SESSIONS_PROJECT (the SQL binds it as a
+  // parameter; this path calls shouldEmitProjectRow), so the name can't drift.
+  if (!shouldEmitProjectRow(payload.project)) {
+    logger.debug('WORKER', 'SSE observation broadcast skipped (internal project)', {
+      project: payload.project,
+      id: payload.id,
+    });
+    return;
+  }
+
  worker.sseBroadcaster.broadcast({
    type: 'new_observation',
    observation: payload
@@ -48,6 +61,15 @@ export function broadcastSummary(
    return;
  }

+  // Parity with PaginationHelper's unfiltered-list SQL filter (#2118).
+  if (!shouldEmitProjectRow(payload.project)) {
+    logger.debug('WORKER', 'SSE summary broadcast skipped (internal project)', {
+      project: payload.project,
+      id: payload.id,
+    });
+    return;
+  }
+
  worker.sseBroadcaster.broadcast({
    type: 'new_summary',
    summary: payload
@@ -6,7 +6,7 @@
 *
 * Usage:
 * ```typescript
- * import { processAgentResponse, shouldFallbackToClaude } from './agents/index.js';
+ * import { processAgentResponse, isAbortError } from './agents/index.js';
 * ```
 */

@@ -19,7 +19,6 @@ export type {
  StorageResult,
  ResponseProcessingContext,
  ParsedResponse,
-  FallbackAgent,
  BaseAgentConfig,
 } from './types.js';

@@ -98,17 +98,6 @@ export interface ParsedResponse {
  summary: ParsedSummary | null;
 }

-// ============================================================================
-// Fallback Agent Interface
-// ============================================================================
-
-/**
- * Interface for fallback agent (used by Gemini/OpenRouter to fall back to Claude)
- */
-export interface FallbackAgent {
-  startSession(session: ActiveSession, worker?: WorkerRef): Promise<void>;
-}
-
 // ============================================================================
 // Agent Configuration Types
 // ============================================================================
@@ -13,11 +13,22 @@ import { logger } from '../../../../utils/logger.js';
 import type { DatabaseManager } from '../../DatabaseManager.js';

 // Plan 06 Phase 3 — per-route Zod schema.
+//
+// `metadata` is an arbitrary JSON object the caller can use to attach
+// integration-specific provenance (e.g. obsidian_note, claude_mem_version,
+// custom_key). It is stored verbatim in the observations.metadata column
+// (migration 30) — no schema enforcement on its keys (#2116).
+//
+// `metadata.project`, when present and the top-level `project` is omitted,
+// is honored as the project assignment. This lets integrating plugins file
+// observations under a project other than their own without having to know
+// the top-level field name.
 const saveMemorySchema = z.object({
  text: z.string().trim().min(1),
  title: z.string().optional(),
  project: z.string().optional(),
-}).passthrough();
+  metadata: z.record(z.string(), z.unknown()).optional(),
+}).strict();

 export class MemoryRoutes extends BaseRouteHandler {
  constructor(
@@ -33,11 +44,26 @@ export class MemoryRoutes extends BaseRouteHandler {

  /**
   * POST /api/memory/save - Save a manual memory/observation
-   * Body: { text: string, title?: string, project?: string }
+   * Body: {
+   *   text: string,
+   *   title?: string,
+   *   project?: string,
+   *   metadata?: Record<string, unknown>  // arbitrary JSON, persisted verbatim (#2116)
+   * }
+   *
+   * Project resolution order: top-level `project` → `metadata.project` (string)
+   * → this.defaultProject. Unknown top-level fields are now rejected (400) —
+   * `.strict()` replaced `.passthrough()` so silent drops can't recur.
   */
  private handleSaveMemory = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
-    const { text, title, project } = req.body as z.infer<typeof saveMemorySchema>;
-    const targetProject = project || this.defaultProject;
+    const { text, title, project, metadata } = req.body as z.infer<typeof saveMemorySchema>;
+    const explicitProject = typeof project === 'string' && project.trim()
+      ? project.trim()
+      : undefined;
+    const metadataProject = typeof metadata?.project === 'string' && metadata.project.trim()
+      ? metadata.project.trim()
+      : undefined;
+    const targetProject = explicitProject || metadataProject || this.defaultProject;

    const sessionStore = this.dbManager.getSessionStore();
    const chromaSync = this.dbManager.getChromaSync();
@@ -54,7 +80,10 @@ export class MemoryRoutes extends BaseRouteHandler {
      narrative: text,
      concepts: [] as string[],
      files_read: [] as string[],
-      files_modified: [] as string[]
+      files_modified: [] as string[],
+      // Stringify here so the storage layer doesn't need to know about JSON shape.
+      // Preserved verbatim, including nested objects.
+      metadata: metadata ? JSON.stringify(metadata) : null,
    };

    // 3. Store to SQLite
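The resolution order documented on `handleSaveMemory` (trimmed top-level `project`, then a trimmed string `metadata.project`, then the default) and the effect of switching to `.strict()` can be sketched without Express or Zod. `ALLOWED_KEYS`, `rejectUnknownKeys`, `resolveTargetProject`, and `serializeMetadata` are hypothetical helpers that mirror the diff; Zod's real error reporting is not reproduced.

```typescript
// Hypothetical mirror of the keys declared in saveMemorySchema; .strict() rejects anything else.
const ALLOWED_KEYS = new Set(['text', 'title', 'project', 'metadata']);

// Hypothetical helper: returns the top-level keys a strict schema would reject.
function rejectUnknownKeys(body: Record<string, unknown>): string[] {
  return Object.keys(body).filter(key => !ALLOWED_KEYS.has(key));
}

// Trimmed non-empty string, otherwise undefined (same test as the handler's ternaries).
function nonEmpty(value: unknown): string | undefined {
  return typeof value === 'string' && value.trim() ? value.trim() : undefined;
}

// Hypothetical helper: top-level project, then metadata.project, then the worker default.
function resolveTargetProject(
  project: string | undefined,
  metadata: Record<string, unknown> | undefined,
  defaultProject: string,
): string {
  return nonEmpty(project) ?? nonEmpty(metadata?.project) ?? defaultProject;
}

// Hypothetical helper: metadata is stored as a JSON string, or null when absent.
function serializeMetadata(metadata: Record<string, unknown> | undefined): string | null {
  return metadata ? JSON.stringify(metadata) : null;
}

const meta = { project: ' notes ', obsidian_note: 'Daily/2026-04-25.md' };
console.log(resolveTargetProject(undefined, meta, 'default-project')); // notes
console.log(resolveTargetProject('   ', meta, 'default-project'));     // notes
console.log(resolveTargetProject('app', meta, 'default-project'));     // app
console.log(rejectUnknownKeys({ text: 'hi', tags: ['x'] }));
console.log(serializeMetadata(meta));
```

A non-string `metadata.project` (a number, say) is ignored and the default project is used, matching the `typeof ... === 'string'` checks in the handler.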
@@ -449,6 +449,10 @@ export class SearchRoutes extends BaseRouteHandler {
   * GET /api/search/help
   */
  private handleSearchHelp = this.wrapHandler((req: Request, res: Response): void => {
+    // Use the actual host:port the request came in on so example URLs always
+    // round-trip back to this same worker — matters for multi-account / non-
+    // default-port setups (#2101, #2103).
+    const baseUrl = `http://${req.headers.host ?? 'localhost'}`;
    res.json({
      title: 'Claude-Mem Search API',
      description: 'HTTP API for searching persistent memory',
@@ -551,10 +555,10 @@ export class SearchRoutes extends BaseRouteHandler {
        }
      ],
      examples: [
-        'curl "http://localhost:37777/api/search/observations?query=authentication&limit=5"',
-        'curl "http://localhost:37777/api/search/by-type?type=bugfix&limit=10"',
-        'curl "http://localhost:37777/api/context/recent?project=claude-mem&limit=3"',
-        'curl "http://localhost:37777/api/context/timeline?anchor=123&depth_before=5&depth_after=5"'
+        `curl "${baseUrl}/api/search/observations?query=authentication&limit=5"`,
+        `curl "${baseUrl}/api/search/by-type?type=bugfix&limit=10"`,
+        `curl "${baseUrl}/api/context/recent?project=claude-mem&limit=3"`,
+        `curl "${baseUrl}/api/context/timeline?anchor=123&depth_before=5&depth_after=5"`
      ]
    });
  });
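A sketch of what the Host-derived base URL produces, using a hypothetical `exampleBaseUrl` that mirrors the template in the handler. One consequence of the `?? 'localhost'` fallback is that a request without a Host header gets examples with no port, i.e. port 80. HTTP/1.1 clients are required to send Host, so in practice this only affects unusual callers.

```typescript
// Hypothetical helper mirroring `http://${req.headers.host ?? 'localhost'}` in handleSearchHelp.
function exampleBaseUrl(hostHeader: string | undefined): string {
  return `http://${hostHeader ?? 'localhost'}`;
}

// Hypothetical helper: two of the example URLs from the help payload.
function exampleUrls(hostHeader: string | undefined): string[] {
  const baseUrl = exampleBaseUrl(hostHeader);
  return [
    `curl "${baseUrl}/api/search/observations?query=authentication&limit=5"`,
    `curl "${baseUrl}/api/context/recent?project=claude-mem&limit=3"`,
  ];
}

// A worker on a non-default port, reached with the port in the Host header.
console.log(exampleUrls('127.0.0.1:37701')[0]);
// curl "http://127.0.0.1:37701/api/search/observations?query=authentication&limit=5"
console.log(exampleUrls(undefined)[0]);
// curl "http://localhost/api/search/observations?query=authentication&limit=5"
```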
@@ -95,7 +95,18 @@ export class SessionRoutes extends BaseRouteHandler {
   * The next generator will use the new provider with shared conversationHistory.
   */
  private static readonly STALE_GENERATOR_THRESHOLD_MS = 30_000; // 30 seconds (#1099)
-  private static readonly MAX_SESSION_WALL_CLOCK_MS = 4 * 60 * 60 * 1000; // 4 hours (#1590)
+
+  // Wall-clock cap on a single in-memory session. It exists to prevent runaway
+  // API costs from a session that is somehow stuck in a re-activation loop
+  // (#1590, #2127, #2098). 4h was the original value, picked when bugs in the
+  // re-activation path made cost runaways more plausible; in practice users
+  // have legitimate all-day Claude Code sessions, well past 4h, that this cap
+  // killed without warning. 24h is the new ceiling: long enough that a real
+  // human workday never hits it, short enough that a runaway loop is still
+  // bounded. We deliberately do NOT expose this as a config knob: a session
+  // approaching this age is almost certainly a bug worth investigating, not a
+  // knob worth tuning.
+  private static readonly MAX_SESSION_WALL_CLOCK_MS = 24 * 60 * 60 * 1000; // 24 hours (#1590, #2127)

  public ensureGeneratorRunning(sessionDbId: number, source: string): void {
    const session = this.sessionManager.getSession(sessionDbId);
@@ -217,6 +217,13 @@ export function buildIsolatedEnv(includeCredentials: boolean = true): Record<str
  // 2. Override SDK entrypoint marker
  isolatedEnv.CLAUDE_CODE_ENTRYPOINT = 'sdk-ts';

+  // 2a. Mark this as an internal claude-mem subprocess so spawned hooks can
+  // skip tracking unconditionally. This is the single trust boundary for
+  // observer-session detection — every consumer can check
+  // process.env.CLAUDE_MEM_INTERNAL instead of repeating cwd-based exclusion
+  // checks (which inevitably drift; see #2118 / #2126).
+  isolatedEnv.CLAUDE_MEM_INTERNAL = '1';
+
  // 3. Re-inject managed credentials from claude-mem's .env file
  if (includeCredentials) {
    const credentials = loadClaudeMemEnv();
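The marker is a contract between two processes: the worker stamps `CLAUDE_MEM_INTERNAL=1` into the child's env, and the hook-side `shouldTrackProject` checks it before any cwd logic. A minimal sketch of both halves follows. It passes the env explicitly instead of reading `process.env` so it can run in one process. `markInternal`, `shouldTrack`, and the observer directory path are hypothetical, and the exclusion-glob step is left out.

```typescript
import { relative, isAbsolute, sep } from 'node:path';

type Env = Record<string, string | undefined>;

// Hypothetical parent side: copy the env and stamp the internal-subprocess marker.
function markInternal(parentEnv: Env): Env {
  return { ...parentEnv, CLAUDE_MEM_INTERNAL: '1' };
}

// True when `child` is `parent` or lies beneath it (separator-agnostic via path.relative).
function isWithin(child: string, parent: string): boolean {
  const rel = relative(parent, child);
  return rel === '' || (!isAbsolute(rel) && rel !== '..' && !rel.startsWith(`..${sep}`));
}

// Hypothetical child side: the env marker short-circuits; the cwd check stays as belt-and-braces.
function shouldTrack(env: Env, cwd: string, observerDir: string): boolean {
  if (env.CLAUDE_MEM_INTERNAL === '1') return false;
  if (!cwd) return true;
  return !isWithin(cwd, observerDir);
}

const observerDir = '/home/me/.claude-mem/observer-sessions'; // hypothetical path
const childEnv = markInternal({ PATH: '/usr/bin' });
console.log(shouldTrack(childEnv, '/home/me/work/app', observerDir));  // false: marker set
console.log(shouldTrack({}, '/home/me/work/app', observerDir));        // true: ordinary project
console.log(shouldTrack({}, `${observerDir}/run-1`, observerDir));     // false: inside observer dir
```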
@@ -14,7 +14,7 @@
 import { relative, isAbsolute } from 'path';
 import { isProjectExcluded } from '../utils/project-filter.js';
 import { loadFromFileOnce } from './hook-settings.js';
-import { OBSERVER_SESSIONS_DIR } from './paths.js';
+import { OBSERVER_SESSIONS_DIR, OBSERVER_SESSIONS_PROJECT } from './paths.js';

 function isWithin(child: string, parent: string): boolean {
  if (child === parent) return true;
@@ -27,12 +27,18 @@ function isWithin(child: string, parent: string): boolean {
 *          tracking, i.e., the hook should proceed; false when the project
 *          matches one of the exclusion globs.
 *
- * Hard-excludes OBSERVER_SESSIONS_DIR: the SDK agent spawns Claude Code with
- * that cwd, and its hooks must never feed the worker — otherwise the observer's
+ * Single trust boundary: when the spawning worker set CLAUDE_MEM_INTERNAL=1
+ * (see EnvManager.buildIsolatedEnv), the spawned subprocess is an internal
+ * claude-mem agent and must never feed the worker — otherwise the observer's
 * own init/continuation/summary prompts end up stored as `user_prompts` and
- * leak into the viewer (meta-observation).
+ * leak into the viewer (meta-observation; see #2118, #2126).
+ *
+ * The cwd-based OBSERVER_SESSIONS_DIR check stays as belt-and-braces for any
+ * pre-env-var spawn path (e.g., user manually launching `claude` inside the
+ * observer dir) and for tests that don't exercise the env var.
 */
 export function shouldTrackProject(cwd: string): boolean {
+  if (process.env.CLAUDE_MEM_INTERNAL === '1') return false;
  if (!cwd) return true;
  // path.relative handles separator differences (Windows '\\' vs POSIX '/')
  // and trailing-slash variance, which a literal startsWith would miss.
@@ -42,3 +48,17 @@ export function shouldTrackProject(cwd: string): boolean {
  const settings = loadFromFileOnce();
  return !isProjectExcluded(cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS);
 }
+
+/**
+ * Shared predicate: should a row tagged with `project` be emitted to user-facing
+ * surfaces (SSE stream, viewer UI list)? SSEBroadcaster calls it directly and
+ * PaginationHelper's SQL filters bind the same constant, so the excluded name cannot drift.
+ *
+ * Internal claude-mem rows (project === OBSERVER_SESSIONS_PROJECT) are hidden
+ * from the unfiltered list view and the live SSE stream. They remain queryable
+ * by id and by explicit `project=observer-sessions` filter for diagnostics.
+ */
+export function shouldEmitProjectRow(project: string | null | undefined): boolean {
+  if (!project) return true;
+  return project !== OBSERVER_SESSIONS_PROJECT;
+}
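Both observer-session filters now share the `OBSERVER_SESSIONS_PROJECT` constant, but different engines evaluate them. The sketch below puts the two side by side, using `unfilteredProjectCondition` and `sqlWhereKeeps` as hypothetical helpers. It surfaces one edge case worth checking. In SQL, `project != ?` is NULL when `project` is NULL, so the WHERE clause drops the row. `shouldEmitProjectRow(null)`, by contrast, returns true. Whether that ever matters depends on whether the `project` columns involved can be NULL, which this diff does not show.

```typescript
// Value taken from the SQL literal this diff replaces.
const OBSERVER_SESSIONS_PROJECT = 'observer-sessions';

// Same logic as shouldEmitProjectRow in the diff.
function shouldEmitProjectRow(project: string | null | undefined): boolean {
  if (!project) return true;
  return project !== OBSERVER_SESSIONS_PROJECT;
}

// Hypothetical builder for the unfiltered-list condition PaginationHelper pushes.
function unfilteredProjectCondition(alias: string): { sql: string; params: string[] } {
  return { sql: `${alias}.project != ?`, params: [OBSERVER_SESSIONS_PROJECT] };
}

// Hypothetical emulation of SQL three-valued logic for `project != ?` inside WHERE:
// NULL compared with anything is NULL, and WHERE keeps only rows that evaluate to TRUE.
function sqlWhereKeeps(project: string | null): boolean {
  if (project === null) return false;
  return project !== OBSERVER_SESSIONS_PROJECT;
}

for (const project of ['claude-mem', 'observer-sessions', '', null]) {
  console.log(JSON.stringify(project), shouldEmitProjectRow(project), sqlWhereKeeps(project));
}
```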
@@ -55,7 +55,8 @@ let cachedHost: string | null = null;

 /**
 * Get the worker port number from settings
- * Uses CLAUDE_MEM_WORKER_PORT from settings file or default (37777)
+ * Uses CLAUDE_MEM_WORKER_PORT from settings file, or the per-UID default
+ * (37700 + uid % 100) defined in SettingsDefaultsManager.
 * Caches the port value to avoid repeated file reads
 */
 export function getWorkerPort(): number {
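The per-UID default is plain arithmetic. The sketch below uses a hypothetical `defaultWorkerPort` that takes the UID as an argument. How SettingsDefaultsManager obtains the UID, and what happens on platforms without UIDs, is not shown in this diff.

```typescript
// Hypothetical helper: the per-UID default described above, 37700 + (uid % 100).
function defaultWorkerPort(uid: number): number {
  return 37700 + (uid % 100);
}

console.log(defaultWorkerPort(501));  // 37701
console.log(defaultWorkerPort(1000)); // 37700
console.log(defaultWorkerPort(1001)); // 37701
```

Because only `uid % 100` is used, UIDs that differ by a multiple of 100 (501 and 1001 above) get the same default port. One of those accounts would need `CLAUDE_MEM_WORKER_PORT` set explicitly.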
@@ -6,6 +6,24 @@ export const ENV_EXACT_MATCHES = new Set([
  'MCP_SESSION_ID',
 ]);

+/**
+ * Proxy-related env vars are stripped before spawning the worker / `claude` subprocess.
+ * Letting the user's proxy config bleed into internal AI calls causes connection failures
+ * (see issues #2115, #2099). Stripped unconditionally; there is no opt-in flag.
+ */
+export const ENV_PROXY_VARS = new Set([
+  'HTTP_PROXY',
+  'HTTPS_PROXY',
+  'ALL_PROXY',
+  'NO_PROXY',
+  'http_proxy',
+  'https_proxy',
+  'all_proxy',
+  'no_proxy',
+  'npm_config_proxy',
+  'npm_config_https_proxy',
+]);
+
 /** Vars that start with CLAUDE_CODE_ but must be preserved for subprocess auth/tooling */
 export const ENV_PRESERVE = new Set([
  'CLAUDE_CODE_OAUTH_TOKEN',
@@ -19,6 +37,7 @@ export function sanitizeEnv(env: NodeJS.ProcessEnv = process.env): NodeJS.Proces
    if (value === undefined) continue;
    if (ENV_PRESERVE.has(key)) { sanitized[key] = value; continue; }
    if (ENV_EXACT_MATCHES.has(key)) continue;
+    if (ENV_PROXY_VARS.has(key)) continue;
    if (ENV_PREFIXES.some(prefix => key.startsWith(prefix))) continue;
    sanitized[key] = value;
  }
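The sketch below is a reduced version of the sanitizer that keeps only the proxy rule. The real function also applies `ENV_PRESERVE`, `ENV_EXACT_MATCHES`, and `ENV_PREFIXES`. `stripProxyVars` is a hypothetical name.

```typescript
const ENV_PROXY_VARS = new Set([
  'HTTP_PROXY', 'HTTPS_PROXY', 'ALL_PROXY', 'NO_PROXY',
  'http_proxy', 'https_proxy', 'all_proxy', 'no_proxy',
  'npm_config_proxy', 'npm_config_https_proxy',
]);

// Hypothetical reduced sanitizer: drops undefined values and proxy vars, keeps everything else.
function stripProxyVars(env: Record<string, string | undefined>): Record<string, string> {
  const sanitized: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue;
    if (ENV_PROXY_VARS.has(key)) continue;
    sanitized[key] = value;
  }
  return sanitized;
}

const childEnv = stripProxyVars({
  PATH: '/usr/bin',
  HTTPS_PROXY: 'http://proxy.corp:3128',
  no_proxy: 'localhost',
  Http_Proxy: 'http://proxy.corp:3128',
  EMPTY: undefined,
});
console.log(Object.keys(childEnv));
```

The lookup is an exact, case-sensitive `Set` match. Both the upper- and lower-case spellings therefore have to be listed, and a mixed-case key such as `Http_Proxy` passes through unchanged.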
@@ -5,6 +5,12 @@
 export const DEFAULT_SETTINGS = {
  CLAUDE_MEM_MODEL: 'claude-sonnet-4-6',
  CLAUDE_MEM_CONTEXT_OBSERVATIONS: '50',
+  // Build-time placeholder only. The viewer runs in-browser served by the
+  // worker itself, so actual API calls use window.location and the real port
+  // is fetched from /api/settings into useSettings(). This literal is just a
+  // form-field fallback rendered for the brief moment before the API response
+  // arrives. Multi-account / per-UID port resolution lives in
+  // SettingsDefaultsManager (server-side); see CLAUDE.md → Multi-account.
  CLAUDE_MEM_WORKER_PORT: '37777',
  CLAUDE_MEM_WORKER_HOST: '127.0.0.1',

@@ -6,6 +6,7 @@
 */

 import { homedir } from 'os';
+import { basename } from 'path';

 /**
 * Convert a glob pattern to a regular expression
@@ -50,6 +51,11 @@ export function isProjectExcluded(projectPath: string, exclusionPatterns: string

  // Normalize cwd path separators
  const normalizedProjectPath = projectPath.replace(/\\/g, '/');
+  // Basename match pass: users intuitively expect `observer-sessions` or
+  // `*observer-sessions*` to match any cwd whose final segment matches, but
+  // globToRegex translates `*` → `[^/]*` which can't cross `/`. Without this,
+  // both bare names and basename globs silently fail (#2126 item 1).
+  const projectBasename = basename(normalizedProjectPath);

  // Parse comma-separated patterns
  const patternList = exclusionPatterns
@@ -60,7 +66,7 @@ export function isProjectExcluded(projectPath: string, exclusionPatterns: string
  for (const pattern of patternList) {
    try {
      const regex = globToRegex(pattern);
-      if (regex.test(normalizedProjectPath)) {
+      if (regex.test(normalizedProjectPath) || regex.test(projectBasename)) {
        return true;
      }
    } catch (error: unknown) {
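The sketch below shows the two-pass match. It assumes a minimal glob translator that anchors the pattern and maps `*` to `[^/]*` and `?` to `[^/]`. The real `globToRegex` body, including any `~` handling, is outside this hunk. `globToRegexSketch` and `isProjectExcludedSketch` are hypothetical names.

```typescript
import { basename } from 'node:path';

// Hypothetical glob translator: anchored, `*` -> `[^/]*`, `?` -> `[^/]`, everything else literal.
function globToRegexSketch(pattern: string): RegExp {
  let source = '';
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch === '*') source += '[^/]*';
    else if (ch === '?') source += '[^/]';
    else source += ch.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  }
  return new RegExp(`^${source}$`);
}

// Hypothetical two-pass check: full normalized path first, then its final segment.
function isProjectExcludedSketch(projectPath: string, exclusionPatterns: string): boolean {
  const normalizedProjectPath = projectPath.replace(/\\/g, '/');
  const projectBasename = basename(normalizedProjectPath);
  return exclusionPatterns
    .split(',')
    .map(p => p.trim())
    .filter(p => p.length > 0)
    .some(p => {
      const regex = globToRegexSketch(p);
      return regex.test(normalizedProjectPath) || regex.test(projectBasename);
    });
}

const observerCwd = '/home/me/.claude-mem/observer-sessions';
console.log(isProjectExcludedSketch(observerCwd, 'observer-sessions'));   // true (basename pass)
console.log(isProjectExcludedSketch(observerCwd, '*observer-sessions*')); // true (basename pass)
console.log(isProjectExcludedSketch('/home/me/a/b', '/home/me/*'));       // false: `*` stops at `/`
```

The basename pass trades precision for breadth. A bare pattern such as `app` now excludes every project whose final folder is named `app`, wherever it lives. Full-path globs keep their old meaning.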