v12.4.3: one-time pollution cleanup migration + v12.4.1/v12.4.2 fixes (#2133)

* fix: 5 trivial bugs from v12.4.1 issue triage - #2092: emit CJS-safe banner (no import.meta.url) in worker-service.cjs - #2100: PreToolUse Read hook timeout 2000s → 60s - #2131: add "shell": "bash" to every hook for Windows compat - #2132: Antigravity dir typo .agent → .agents - #2088: clear inherited MCP servers in worker SDK query() calls Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: stop context overflow loop + block task-notification leak - SDKAgent: clear memorySessionId on "prompt is too long" so crash-recovery starts a fresh SDK session instead of resuming the same poisoned context forever (was producing 68+ failed pending_messages on a single stuck session in the wild) - tag-stripping: new isInternalProtocolPayload() predicate; session-init hook + SessionRoutes both skip storage when entire prompt is one of Claude Code's autonomous protocol blocks (currently <task-notification>; conservative deny-list — does NOT touch <command-name>/<command-message> which wrap real user slash-commands) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version to 12.4.2 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: update CHANGELOG.md for v12.4.2 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cleanup): one-time v12.4.3 migration purges observer-sessions and stuck pending_messages Adds CleanupV12_4_3 module that runs once per data dir on worker startup (after migrations apply, before Chroma backfill). Drops accumulated pollution that v12.4.0 (observer-sessions filter) and v12.4.2 (context-overflow guard + task-notification leak block) prevent from recurring: - DELETE FROM sdk_sessions WHERE project='observer-sessions' (cascades to user_prompts, observations, session_summaries via existing FK ON DELETE CASCADE) - DELETE FROM pending_messages stuck in 'failed'/'processing' for any session with >=10 such rows (poisoned chains from the pre-v12.4.2 retry loop; threshold spares legitimate transient failures) - Wipes ~/.claude-mem/chroma and chroma-sync-state.json so backfillAllProjects rebuilds the vector store from cleaned SQLite Pre-flight checks free disk (1.2x DB size + 100MB) via fs.statfsSync; backs up via VACUUM INTO with copyFileSync fallback; PRAGMA foreign_keys=ON on the cleanup connection (off by default in bun:sqlite). Marker file ~/.claude-mem/.cleanup-v12.4.3-applied records backup path and counts. Opt-out via CLAUDE_MEM_SKIP_CLEANUP_V12_4_3=1. Verified locally: 311MB DB backed up to 277MB in 943ms; 11 observer sessions + 3 cascade rows + 141 stuck pending_messages purged; chroma rebuilt via backfill. Total cleanup time 1.1s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address PR #2133 code review - SessionRoutes: check isInternalProtocolPayload before stripping tags so internal protocol prompts skip the strip work entirely. - tag-stripping: bound isInternalProtocolPayload input length to 256KB to prevent ReDoS-class scans on malformed unclosed tags. - SDKAgent: extract resetSessionForFreshStart helper; both context-overflow paths now share one nullification routine. - worker-service: drop the per-startup "Checking for one-time v12.4.3 cleanup" info log — runs every boot even after marker exists; the function already logs at debug/warn when relevant. - tests: add isInternalProtocolPayload edge cases (whitespace, attributes, partial tags, unrelated tags, oversize input). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address Greptile P2 comments on PR #2133 CleanupV12_4_3.ts: derive backup directory and restore-hint path from effectiveDataDir instead of the module-level BACKUPS_DIR/DB_PATH constants. The dataDirectory override is meant for test isolation; the prior version still wrote backups to the production directory. SessionRoutes.ts: move isInternalProtocolPayload guard to the top of handleSessionInitByClaudeId, before createSDKSession. The previous position blocked the user_prompts insert but still created an empty sdk_sessions row, asymmetric with the hook-layer guard in session-init.ts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cleanup): retry on disk-skip; survive chroma wipe failure CodeRabbit Major + Claude review: - Disk pre-flight skip no longer writes the marker. A user temporarily low on disk would otherwise have the cleanup permanently disabled even after freeing space. Retry on next startup instead. - Wrap wipeChromaArtifacts in try/catch and write the marker even on failure (with chromaWipeError captured). Without this, an rmSync permission failure on chroma/ left writeMarker unreached, so every subsequent boot re-ran the SQL purge AND created a fresh backup, consuming disk indefinitely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cleanup): close backup handle before copyFileSync fallback Claude review: - backupDb is now closed before falling into the copyFileSync fallback. On Windows an open SQLite handle holds a file lock that can prevent the fallback copy from reading the source. The previous version only closed after both branches completed. - Add empty-body <task-notification></task-notification> case to the isInternalProtocolPayload tests for completeness. Cascade-row count queries already match the actual FK columns (content_session_id for user_prompts, memory_session_id for observations / session_summaries) — no fix needed there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cleanup): accurate session count + add migration tests Claude review v3: session-init.ts: filter on rawPrompt before the [media prompt] substitution. Functionally equivalent but explicit — the check no longer depends on the substitution leaving real protocol payloads untouched. CleanupV12_4_3.ts: counts.observerSessions now comes from a pre-DELETE COUNT(*), not from result.changes. bun:sqlite inflates result.changes with FTS-trigger and cascade row counts (the user_prompts_fts triggers inflate a 3-session purge to 19 changes). The previous code logged a misleading total and wrote it to the marker. tests/infrastructure/cleanup-v12_4_3.test.ts: happy-path coverage of the migration against a real on-disk SQLite under a tmpdir. Verifies observer-session purge with cascades, stuck pending_messages purge, chroma artifact wipe, marker payload shape, idempotency on re-run, and CLAUDE_MEM_SKIP_CLEANUP_V12_4_3 opt-out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(protocol-filter): close two-block false positive; address review CodeRabbit + Claude review v5: tag-stripping.ts: PROTOCOL_ONLY_REGEX rewritten with a negative-lookahead body so a prompt like "<task-notification>x</task-notification> hi <task-notification>y</task-notification>" no longer matches as a single outer block — the prior greedy [\s\S]* spanned the middle user text and would have silently dropped a real prompt. Confirmed via probe. tag-stripping.test.ts: drop the 50ms wall-clock assertion (CI flake); add the two-block-with-text case as a regression test. SessionRoutes.ts: filter on req.body.prompt directly, before the [media prompt] substitution and 256KB truncation. Mirrors the session-init.ts hook-layer ordering and ensures a protocol payload that happens to be near the byte limit isn't truncated before the filter runs. cleanup-v12_4_3.test.ts: add stuckCount=9 below-threshold case verifying pending_messages with <10 stuck rows are preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cleanup): include WAL/SHM in backup fallback; safer rollback CodeRabbit Major + Claude review v6: CleanupV12_4_3.ts: when VACUUM INTO fails and copyFileSync runs, also copy any -wal/-shm sidecars. The DB is configured WAL mode, so recent committed pages can live in those files; copying only the .db would miss them. VACUUM INTO already captures everything in one file, so the happy path is unaffected. CleanupV12_4_3.ts: wrap ROLLBACK in try/catch so a no-op rollback (SQLite already rolled back on a constraint failure) cannot shadow the original purge error. SDKAgent.ts: align both context-overflow log levels to error. Both branches are fatal-recovery paths; the previous warn/error split was inconsistent and made the throw branch easy to miss in logs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: pre-count stuck pending_messages; document adjacent-block fall-through Claude review v7: CleanupV12_4_3.ts: runStuckPendingPurge now uses a SELECT COUNT(*) before the DELETE, matching the pattern in runObserverSessionsPurge. result.changes is reliable today (no FTS on pending_messages) but the explicit count protects against future schema additions, and keeps the two purges symmetric. tag-stripping.test.ts: add test documenting that adjacent protocol blocks (no user text between) deliberately fall through to storage. The deny-list is per-block; concatenations are out of scope. Skipped per project rules / Node API constraints: - frsize fallback in disk check: Node/Bun StatFs doesn't expose frsize - VACUUM-INTO comment: comment-only suggestion - Overflow string constant extraction: low value Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 16:30:34 -07:00
parent a2e174b90f
commit 703c64c756
24 changed files with 1191 additions and 511 deletions
@@ -0,0 +1,196 @@
+/**
+ * Happy-path tests for runOneTimeV12_4_3Cleanup.
+ *
+ * Uses a real on-disk SQLite under a tmpdir so VACUUM INTO, statSync,
+ * statfsSync, and marker-file writes all exercise their real code paths.
+ */
+
+import { describe, it, expect, beforeEach, afterEach, spyOn } from 'bun:test';
+import { mkdtempSync, rmSync, existsSync, writeFileSync, mkdirSync, readFileSync, readdirSync } from 'fs';
+import path from 'path';
+import { tmpdir } from 'os';
+import { Database } from 'bun:sqlite';
+import { runOneTimeV12_4_3Cleanup } from '../../src/services/infrastructure/CleanupV12_4_3.js';
+import { ClaudeMemDatabase } from '../../src/services/sqlite/Database.js';
+import { OBSERVER_SESSIONS_PROJECT } from '../../src/shared/paths.js';
+import { logger } from '../../src/utils/logger.js';
+
+let loggerSpies: ReturnType<typeof spyOn>[] = [];
+
+function silenceLogger(): void {
+  loggerSpies = [
+    spyOn(logger, 'info').mockImplementation(() => {}),
+    spyOn(logger, 'debug').mockImplementation(() => {}),
+    spyOn(logger, 'warn').mockImplementation(() => {}),
+    spyOn(logger, 'error').mockImplementation(() => {}),
+  ];
+}
+
+function restoreLogger(): void {
+  loggerSpies.forEach(s => s.mockRestore());
+  loggerSpies = [];
+}
+
+function seedDatabase(dbPath: string, opts: { observerSessions: number; stuckCount: number }): { observerSessionDbIds: number[]; keepSessionDbId: number } {
+  const seed = new ClaudeMemDatabase(dbPath);
+  const db = seed.db;
+  const now = new Date().toISOString();
+  const epoch = Date.now();
+
+  const insertSession = db.prepare(
+    `INSERT INTO sdk_sessions (content_session_id, memory_session_id, project, started_at, started_at_epoch)
+     VALUES (?, ?, ?, ?, ?)`
+  );
+  const insertPrompt = db.prepare(
+    `INSERT INTO user_prompts (content_session_id, prompt_number, prompt_text, created_at, created_at_epoch)
+     VALUES (?, 1, ?, ?, ?)`
+  );
+  const insertObservation = db.prepare(
+    `INSERT INTO observations (memory_session_id, project, type, text, created_at, created_at_epoch)
+     VALUES (?, ?, 'discovery', ?, ?, ?)`
+  );
+
+  const observerSessionDbIds: number[] = [];
+  for (let i = 0; i < opts.observerSessions; i++) {
+    const result = insertSession.run(`obs-content-${i}`, `obs-memory-${i}`, OBSERVER_SESSIONS_PROJECT, now, epoch);
+    observerSessionDbIds.push(Number(result.lastInsertRowid));
+    insertPrompt.run(`obs-content-${i}`, `prompt ${i}`, now, epoch);
+    insertObservation.run(`obs-memory-${i}`, OBSERVER_SESSIONS_PROJECT, `obs ${i}`, now, epoch);
+  }
+
+  // Real session that should survive
+  const keepResult = insertSession.run('keep-content', 'keep-memory', 'real-project', now, epoch);
+  const keepSessionDbId = Number(keepResult.lastInsertRowid);
+  insertPrompt.run('keep-content', 'survives', now, epoch);
+
+  // Stuck pending_messages tied to the surviving session (so FK passes).
+  const insertPending = db.prepare(
+    `INSERT INTO pending_messages (session_db_id, content_session_id, message_type, status, created_at_epoch)
+     VALUES (?, 'keep-content', 'observation', 'failed', ?)`
+  );
+  for (let i = 0; i < opts.stuckCount; i++) {
+    insertPending.run(keepSessionDbId, epoch);
+  }
+
+  seed.close();
+  return { observerSessionDbIds, keepSessionDbId };
+}
+
+describe('runOneTimeV12_4_3Cleanup', () => {
+  let tmpDataDir: string;
+
+  beforeEach(() => {
+    tmpDataDir = mkdtempSync(path.join(tmpdir(), 'cleanup-v12_4_3-'));
+    silenceLogger();
+  });
+
+  afterEach(() => {
+    restoreLogger();
+    rmSync(tmpDataDir, { recursive: true, force: true });
+  });
+
+  it('writes a no-db marker when the DB is missing', () => {
+    runOneTimeV12_4_3Cleanup(tmpDataDir);
+
+    const markerPath = path.join(tmpDataDir, '.cleanup-v12.4.3-applied');
+    expect(existsSync(markerPath)).toBe(true);
+
+    const payload = JSON.parse(readFileSync(markerPath, 'utf8'));
+    expect(payload.skipped).toBe('no-db');
+    expect(payload.backupPath).toBeNull();
+    expect(payload.counts).toEqual({ observerSessions: 0, observerCascadeRows: 0, stuckPendingMessages: 0 });
+  });
+
+  it('purges observer-sessions and stuck pending_messages, writes marker, wipes chroma', () => {
+    const dbPath = path.join(tmpDataDir, 'claude-mem.db');
+    seedDatabase(dbPath, { observerSessions: 3, stuckCount: 12 });
+
+    // chroma artifacts that should be wiped
+    mkdirSync(path.join(tmpDataDir, 'chroma'), { recursive: true });
+    writeFileSync(path.join(tmpDataDir, 'chroma', 'collection.bin'), 'opaque');
+    writeFileSync(path.join(tmpDataDir, 'chroma-sync-state.json'), '{}');
+
+    runOneTimeV12_4_3Cleanup(tmpDataDir);
+
+    const markerPath = path.join(tmpDataDir, '.cleanup-v12.4.3-applied');
+    expect(existsSync(markerPath)).toBe(true);
+    const payload = JSON.parse(readFileSync(markerPath, 'utf8'));
+
+    expect(payload.counts.observerSessions).toBe(3);
+    expect(payload.counts.observerCascadeRows).toBe(6); // 3 user_prompts + 3 observations
+    expect(payload.counts.stuckPendingMessages).toBe(12);
+    expect(payload.chromaWiped).toBe(true);
+    expect(payload.chromaWipeError).toBeUndefined();
+    expect(payload.backupPath).toBeTruthy();
+
+    // Backup file is real and non-empty
+    expect(existsSync(payload.backupPath)).toBe(true);
+
+    // Chroma artifacts gone
+    expect(existsSync(path.join(tmpDataDir, 'chroma'))).toBe(false);
+    expect(existsSync(path.join(tmpDataDir, 'chroma-sync-state.json'))).toBe(false);
+
+    // Real session still present; observer rows gone
+    const verify = new Database(dbPath, { readonly: true });
+    const observerCount = (verify.prepare('SELECT COUNT(*) AS n FROM sdk_sessions WHERE project = ?').get(OBSERVER_SESSIONS_PROJECT) as { n: number }).n;
+    const realCount = (verify.prepare(`SELECT COUNT(*) AS n FROM sdk_sessions WHERE project = 'real-project'`).get() as { n: number }).n;
+    const survivingPrompts = (verify.prepare('SELECT COUNT(*) AS n FROM user_prompts').get() as { n: number }).n;
+    const survivingPending = (verify.prepare('SELECT COUNT(*) AS n FROM pending_messages').get() as { n: number }).n;
+    verify.close();
+
+    expect(observerCount).toBe(0);
+    expect(realCount).toBe(1);
+    expect(survivingPrompts).toBe(1); // only the keep-content prompt
+    expect(survivingPending).toBe(0);
+  });
+
+  it('preserves pending_messages when stuck count is below the threshold of 10', () => {
+    const dbPath = path.join(tmpDataDir, 'claude-mem.db');
+    seedDatabase(dbPath, { observerSessions: 0, stuckCount: 9 });
+
+    runOneTimeV12_4_3Cleanup(tmpDataDir);
+
+    const markerPath = path.join(tmpDataDir, '.cleanup-v12.4.3-applied');
+    const payload = JSON.parse(readFileSync(markerPath, 'utf8'));
+    expect(payload.counts.stuckPendingMessages).toBe(0);
+
+    const verify = new Database(dbPath, { readonly: true });
+    const survivingPending = (verify.prepare('SELECT COUNT(*) AS n FROM pending_messages').get() as { n: number }).n;
+    verify.close();
+    expect(survivingPending).toBe(9);
+  });
+
+  it('is idempotent: a second invocation does no work and does not create a second backup', () => {
+    const dbPath = path.join(tmpDataDir, 'claude-mem.db');
+    seedDatabase(dbPath, { observerSessions: 1, stuckCount: 10 });
+
+    runOneTimeV12_4_3Cleanup(tmpDataDir);
+    const backupsAfterFirst = readdirSync(path.join(tmpDataDir, 'backups'));
+    expect(backupsAfterFirst.length).toBe(1);
+
+    runOneTimeV12_4_3Cleanup(tmpDataDir);
+    const backupsAfterSecond = readdirSync(path.join(tmpDataDir, 'backups'));
+    expect(backupsAfterSecond).toEqual(backupsAfterFirst);
+  });
+
+  it('honors CLAUDE_MEM_SKIP_CLEANUP_V12_4_3=1 by exiting without writing the marker', () => {
+    const dbPath = path.join(tmpDataDir, 'claude-mem.db');
+    seedDatabase(dbPath, { observerSessions: 1, stuckCount: 10 });
+
+    const original = process.env.CLAUDE_MEM_SKIP_CLEANUP_V12_4_3;
+    process.env.CLAUDE_MEM_SKIP_CLEANUP_V12_4_3 = '1';
+    try {
+      runOneTimeV12_4_3Cleanup(tmpDataDir);
+    } finally {
+      if (original === undefined) delete process.env.CLAUDE_MEM_SKIP_CLEANUP_V12_4_3;
+      else process.env.CLAUDE_MEM_SKIP_CLEANUP_V12_4_3 = original;
+    }
+
+    expect(existsSync(path.join(tmpDataDir, '.cleanup-v12.4.3-applied'))).toBe(false);
+
+    const verify = new Database(dbPath, { readonly: true });
+    const observerCount = (verify.prepare('SELECT COUNT(*) AS n FROM sdk_sessions WHERE project = ?').get(OBSERVER_SESSIONS_PROJECT) as { n: number }).n;
+    verify.close();
+    expect(observerCount).toBe(1); // untouched
+  });
+});