v12.4.3: one-time pollution cleanup migration + v12.4.1/v12.4.2 fixes (#2133)

* fix: 5 trivial bugs from v12.4.1 issue triage

- #2092: emit CJS-safe banner (no import.meta.url) in worker-service.cjs
- #2100: PreToolUse Read hook timeout 2000s → 60s
- #2131: add "shell": "bash" to every hook for Windows compat
- #2132: Antigravity dir typo .agent → .agents
- #2088: clear inherited MCP servers in worker SDK query() calls

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: stop context overflow loop + block task-notification leak

- SDKAgent: clear memorySessionId on "prompt is too long" so crash-recovery
  starts a fresh SDK session instead of resuming the same poisoned context
  forever (was producing 68+ failed pending_messages on a single stuck
  session in the wild)
- tag-stripping: new isInternalProtocolPayload() predicate; session-init
  hook + SessionRoutes both skip storage when entire prompt is one of
  Claude Code's autonomous protocol blocks (currently <task-notification>;
  conservative deny-list — does NOT touch <command-name>/<command-message>
  which wrap real user slash-commands)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version to 12.4.2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update CHANGELOG.md for v12.4.2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cleanup): one-time v12.4.3 migration purges observer-sessions and stuck pending_messages

Adds CleanupV12_4_3 module that runs once per data dir on worker startup
(after migrations apply, before Chroma backfill). Drops accumulated pollution
that v12.4.0 (observer-sessions filter) and v12.4.2 (context-overflow guard +
task-notification leak block) prevent from recurring:

- DELETE FROM sdk_sessions WHERE project='observer-sessions' (cascades to
  user_prompts, observations, session_summaries via existing FK ON DELETE CASCADE)
- DELETE FROM pending_messages stuck in 'failed'/'processing' for any session
  with >=10 such rows (poisoned chains from the pre-v12.4.2 retry loop;
  threshold spares legitimate transient failures)
- Wipes ~/.claude-mem/chroma and chroma-sync-state.json so backfillAllProjects
  rebuilds the vector store from cleaned SQLite
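
The threshold rule in the second bullet can be sketched as a pure function. This is only an illustration: the migration itself runs the purge in SQL, and the `PendingRow` shape and the `sessionsToPurge` name are hypothetical.

```typescript
// Hypothetical row shape; the real pending_messages table has more columns.
interface PendingRow {
  sessionDbId: number;
  status: string;
}

const STUCK_STATUSES = new Set(["failed", "processing"]);
const STUCK_THRESHOLD = 10; // per the commit message: sessions with >=10 stuck rows

// Returns the ids of sessions whose stuck rows would be purged.
// Sessions below the threshold are treated as transient failures and spared.
function sessionsToPurge(rows: PendingRow[], threshold: number = STUCK_THRESHOLD): number[] {
  const stuckPerSession = new Map<number, number>();
  for (const row of rows) {
    if (!STUCK_STATUSES.has(row.status)) continue;
    stuckPerSession.set(row.sessionDbId, (stuckPerSession.get(row.sessionDbId) ?? 0) + 1);
  }
  return Array.from(stuckPerSession.entries())
    .filter(([, count]) => count >= threshold)
    .map(([sessionDbId]) => sessionDbId);
}
```

A session with 12 failed rows is selected; one with 9 is left alone, as are rows in any other status.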

The pre-flight check requires free disk of at least 1.2x the DB size plus
100MB, measured via fs.statfsSync. The backup uses VACUUM INTO, falling back to
copyFileSync. The cleanup connection sets PRAGMA foreign_keys=ON (off by
default in bun:sqlite) so the cascades fire. The marker file
~/.claude-mem/.cleanup-v12.4.3-applied records the backup path and counts. Opt
out via CLAUDE_MEM_SKIP_CLEANUP_V12_4_3=1.
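
A minimal sketch of the disk pre-flight arithmetic, assuming "100MB" means 100 MiB and that free space is computed as `bavail * bsize`. The function names are hypothetical and the real module may differ.

```typescript
import { statSync, statfsSync } from "node:fs";
import path from "node:path";

const HEADROOM_BYTES = 100 * 1024 * 1024; // assumption: "100MB" read as 100 MiB

// 1.2x the DB size plus fixed headroom, using integer math to avoid float drift.
function requiredFreeBytes(dbSizeBytes: number): number {
  return Math.ceil((dbSizeBytes * 12) / 10) + HEADROOM_BYTES;
}

// True when the filesystem holding the DB has room for a backup copy.
function hasRoomForBackup(dbPath: string): boolean {
  const dbSize = statSync(dbPath).size;
  const fsStats = statfsSync(path.dirname(dbPath));
  const freeBytes = fsStats.bavail * fsStats.bsize; // blocks available to unprivileged users
  return freeBytes >= requiredFreeBytes(dbSize);
}
```

Keeping the arithmetic in its own function makes the threshold testable without touching the filesystem.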

Verified locally: 311MB DB backed up to 277MB in 943ms; 11 observer sessions
+ 3 cascade rows + 141 stuck pending_messages purged; chroma rebuilt via
backfill. Total cleanup time 1.1s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address PR #2133 code review

- SessionRoutes: check isInternalProtocolPayload before stripping tags
  so internal protocol prompts skip the strip work entirely.
- tag-stripping: bound isInternalProtocolPayload input length to
  256KB to prevent ReDoS-class scans on malformed unclosed tags.
- SDKAgent: extract resetSessionForFreshStart helper; both
  context-overflow paths now share one nullification routine.
- worker-service: drop the per-startup "Checking for one-time
  v12.4.3 cleanup" info log; it fired on every boot, even after the
  marker exists, and the function already logs at debug/warn when relevant.
- tests: add isInternalProtocolPayload edge cases (whitespace,
  attributes, partial tags, unrelated tags, oversize input).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile P2 comments on PR #2133

CleanupV12_4_3.ts: derive backup directory and restore-hint path from
effectiveDataDir instead of the module-level BACKUPS_DIR/DB_PATH
constants. The dataDirectory override is meant for test isolation;
the prior version still wrote backups to the production directory.

SessionRoutes.ts: move isInternalProtocolPayload guard to the top of
handleSessionInitByClaudeId, before createSDKSession. The previous
position blocked the user_prompts insert but still created an empty
sdk_sessions row, asymmetric with the hook-layer guard in
session-init.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cleanup): retry on disk-skip; survive chroma wipe failure

CodeRabbit Major + Claude review:

- Disk pre-flight skip no longer writes the marker. A user temporarily
  low on disk would otherwise have the cleanup permanently disabled
  even after freeing space. Retry on next startup instead.

- Wrap wipeChromaArtifacts in try/catch and write the marker even on
  failure (with chromaWipeError captured). Without this, an rmSync
  permission failure on chroma/ left writeMarker unreached, so every
  subsequent boot re-ran the SQL purge AND created a fresh backup,
  consuming disk indefinitely.
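
The marker policy described in these two bullets could look roughly like the following. This is a sketch with injected steps, not the module's actual control flow: the `CleanupSteps` interface and `runCleanup` name are hypothetical, while the `chromaWiped` and `chromaWipeError` field names follow the marker payload.

```typescript
interface MarkerPayload {
  chromaWiped: boolean;
  chromaWipeError?: string;
}

// Hypothetical seams standing in for the real disk check, SQL purge, and file I/O.
interface CleanupSteps {
  hasEnoughDisk(): boolean;
  purge(): void;
  wipeChroma(): void;
  writeMarker(payload: MarkerPayload): void;
}

type Outcome = "skipped-low-disk" | "applied";

function runCleanup(steps: CleanupSteps): Outcome {
  if (!steps.hasEnoughDisk()) {
    // Deliberately no marker here, so the next startup retries.
    return "skipped-low-disk";
  }
  steps.purge();

  let chromaWiped = true;
  let chromaWipeError: string | undefined;
  try {
    steps.wipeChroma();
  } catch (err) {
    // Record the failure but keep going so the marker is still written.
    chromaWiped = false;
    chromaWipeError = err instanceof Error ? err.message : String(err);
  }

  steps.writeMarker({ chromaWiped, chromaWipeError });
  return "applied";
}
```

The asymmetry is intentional: a low-disk skip is recoverable and should retry, whereas a failed chroma wipe after a successful purge must not trigger another purge and backup on every boot.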

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cleanup): close backup handle before copyFileSync fallback

Claude review:

- backupDb is now closed before falling into the copyFileSync fallback.
  On Windows an open SQLite handle holds a file lock that can prevent
  the fallback copy from reading the source. The previous version only
  closed after both branches completed.

- Add empty-body <task-notification></task-notification> case to the
  isInternalProtocolPayload tests for completeness.

Cascade-row count queries already match the actual FK columns
(content_session_id for user_prompts, memory_session_id for
observations / session_summaries) — no fix needed there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cleanup): accurate session count + add migration tests

Claude review v3:

session-init.ts: filter on rawPrompt before the [media prompt]
substitution. Functionally equivalent but explicit — the check no
longer depends on the substitution leaving real protocol payloads
untouched.

CleanupV12_4_3.ts: counts.observerSessions now comes from a pre-DELETE
COUNT(*), not from result.changes. bun:sqlite inflates result.changes
with FTS-trigger and cascade row counts (the user_prompts_fts triggers
inflate a 3-session purge to 19 changes). The previous code logged a
misleading total and wrote it to the marker.

tests/infrastructure/cleanup-v12_4_3.test.ts: happy-path coverage of
the migration against a real on-disk SQLite under a tmpdir. Verifies
observer-session purge with cascades, stuck pending_messages purge,
chroma artifact wipe, marker payload shape, idempotency on re-run, and
CLAUDE_MEM_SKIP_CLEANUP_V12_4_3 opt-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(protocol-filter): close two-block false positive; address review

CodeRabbit + Claude review v5:

tag-stripping.ts: PROTOCOL_ONLY_REGEX rewritten with a negative-lookahead
body so a prompt like "<task-notification>x</task-notification> hi
<task-notification>y</task-notification>" no longer matches as a single
outer block — the prior greedy [\s\S]* spanned the middle user text and
would have silently dropped a real prompt. Confirmed via probe.
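
To illustrate the difference, here is a sketch contrasting a greedy body with a negative-lookahead ("tempered") body. These patterns are written for illustration and are not the actual PROTOCOL_ONLY_REGEX, which this message does not show; `isProtocolOnlySketch` is a hypothetical name.

```typescript
// Greedy body: [\s\S]* runs to the LAST closing tag, so user text sandwiched
// between two blocks is swallowed into one apparent outer block.
const GREEDY_SKETCH =
  /^\s*<task-notification(?:\s[^>]*)?>[\s\S]*<\/task-notification>\s*$/;

// Tempered body: each character is consumed only if a closing tag does not
// start at that position, so the body stops at the FIRST closing tag.
const TEMPERED_SKETCH =
  /^\s*<task-notification(?:\s[^>]*)?>(?:(?!<\/task-notification>)[\s\S])*<\/task-notification>\s*$/;

// True only when the whole prompt is exactly one protocol block.
function isProtocolOnlySketch(prompt: string): boolean {
  return TEMPERED_SKETCH.test(prompt);
}

const twoBlocksWithText =
  "<task-notification>x</task-notification> hi <task-notification>y</task-notification>";

console.log(GREEDY_SKETCH.test(twoBlocksWithText));   // true: the real "hi" would be dropped
console.log(isProtocolOnlySketch(twoBlocksWithText)); // false: prompt falls through to storage
```

With the tempered body, anything after the first closing tag other than whitespace prevents a match, so the prompt is stored rather than dropped.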

tag-stripping.test.ts: drop the 50ms wall-clock assertion (CI flake);
add the two-block-with-text case as a regression test.

SessionRoutes.ts: filter on req.body.prompt directly, before the
[media prompt] substitution and 256KB truncation. Mirrors the
session-init.ts hook-layer ordering and ensures a protocol payload
that happens to be near the byte limit isn't truncated before the
filter runs.

cleanup-v12_4_3.test.ts: add stuckCount=9 below-threshold case
verifying pending_messages with <10 stuck rows are preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cleanup): include WAL/SHM in backup fallback; safer rollback

CodeRabbit Major + Claude review v6:

CleanupV12_4_3.ts: when VACUUM INTO fails and copyFileSync runs, also
copy any -wal/-shm sidecars. The DB is configured WAL mode, so recent
committed pages can live in those files; copying only the .db would
miss them. VACUUM INTO already captures everything in one file, so
the happy path is unaffected.
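
A rough sketch of a fallback copy that also picks up the sidecars. The `FileOps` seam and the `copyDbWithSidecars` name are hypothetical, added here so the logic can be exercised without real files.

```typescript
import { copyFileSync, existsSync } from "node:fs";

// Hypothetical seam over the two fs calls the fallback needs.
interface FileOps {
  exists(path: string): boolean;
  copy(src: string, dest: string): void;
}

const nodeFileOps: FileOps = { exists: existsSync, copy: copyFileSync };

// In WAL mode, recently committed pages may live in <db>-wal, indexed by <db>-shm.
const SIDECAR_SUFFIXES = ["-wal", "-shm"];

// Copies the main DB file plus any sidecars that exist; returns the paths written.
function copyDbWithSidecars(srcDb: string, destDb: string, ops: FileOps = nodeFileOps): string[] {
  ops.copy(srcDb, destDb);
  const copied = [destDb];
  for (const suffix of SIDECAR_SUFFIXES) {
    if (!ops.exists(srcDb + suffix)) continue; // sidecars are optional
    ops.copy(srcDb + suffix, destDb + suffix);
    copied.push(destDb + suffix);
  }
  return copied;
}
```

Keeping the sidecar names aligned with the backup's base name matters: SQLite looks for `-wal` and `-shm` next to the database file it opens.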

CleanupV12_4_3.ts: wrap ROLLBACK in try/catch so a no-op rollback
(SQLite already rolled back on a constraint failure) cannot shadow
the original purge error.
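
The rollback guard can be sketched as below, assuming a database handle that exposes an `exec(sql)` method. The `TxDb` interface and `runInTransaction` name are hypothetical.

```typescript
// Hypothetical minimal handle; assumes exec() throws on SQL errors.
interface TxDb {
  exec(sql: string): void;
}

function runInTransaction(db: TxDb, work: () => void): void {
  db.exec("BEGIN");
  try {
    work();
    db.exec("COMMIT");
  } catch (originalError) {
    try {
      db.exec("ROLLBACK");
    } catch {
      // The transaction may already have been rolled back; a failure here
      // must not replace the error the caller actually needs to see.
    }
    throw originalError;
  }
}
```

Without the inner try/catch, a failed ROLLBACK would surface in place of the original constraint error.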

SDKAgent.ts: align both context-overflow log levels to error. Both
branches are fatal-recovery paths; the previous warn/error split was
inconsistent and made the throw branch easy to miss in logs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pre-count stuck pending_messages; document adjacent-block fall-through

Claude review v7:

CleanupV12_4_3.ts: runStuckPendingPurge now uses a SELECT COUNT(*)
before the DELETE, matching the pattern in runObserverSessionsPurge.
result.changes is reliable today (no FTS on pending_messages) but the
explicit count protects against future schema additions, and keeps
the two purges symmetric.

tag-stripping.test.ts: add test documenting that adjacent protocol
blocks (no user text between) deliberately fall through to storage.
The deny-list is per-block; concatenations are out of scope.

Skipped per project rules / Node API constraints:
- frsize fallback in disk check: Node/Bun StatFs doesn't expose frsize
- VACUUM-INTO comment: comment-only suggestion
- Overflow string constant extraction: low value

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Alex Newman
2026-04-25 16:30:34 -07:00
committed by GitHub
parent a2e174b90f
commit 703c64c756
24 changed files with 1191 additions and 511 deletions
@@ -0,0 +1,196 @@
/**
 * Happy-path tests for runOneTimeV12_4_3Cleanup.
 *
 * Uses a real on-disk SQLite under a tmpdir so VACUUM INTO, statSync,
 * statfsSync, and marker-file writes all exercise their real code paths.
 */
import { describe, it, expect, beforeEach, afterEach, spyOn } from 'bun:test';
import { mkdtempSync, rmSync, existsSync, writeFileSync, mkdirSync, readFileSync, readdirSync } from 'fs';
import path from 'path';
import { tmpdir } from 'os';
import { Database } from 'bun:sqlite';
import { runOneTimeV12_4_3Cleanup } from '../../src/services/infrastructure/CleanupV12_4_3.js';
import { ClaudeMemDatabase } from '../../src/services/sqlite/Database.js';
import { OBSERVER_SESSIONS_PROJECT } from '../../src/shared/paths.js';
import { logger } from '../../src/utils/logger.js';

let loggerSpies: ReturnType<typeof spyOn>[] = [];

function silenceLogger(): void {
  loggerSpies = [
    spyOn(logger, 'info').mockImplementation(() => {}),
    spyOn(logger, 'debug').mockImplementation(() => {}),
    spyOn(logger, 'warn').mockImplementation(() => {}),
    spyOn(logger, 'error').mockImplementation(() => {}),
  ];
}

function restoreLogger(): void {
  loggerSpies.forEach(s => s.mockRestore());
  loggerSpies = [];
}

function seedDatabase(dbPath: string, opts: { observerSessions: number; stuckCount: number }): { observerSessionDbIds: number[]; keepSessionDbId: number } {
  const seed = new ClaudeMemDatabase(dbPath);
  const db = seed.db;
  const now = new Date().toISOString();
  const epoch = Date.now();

  const insertSession = db.prepare(
    `INSERT INTO sdk_sessions (content_session_id, memory_session_id, project, started_at, started_at_epoch)
     VALUES (?, ?, ?, ?, ?)`
  );
  const insertPrompt = db.prepare(
    `INSERT INTO user_prompts (content_session_id, prompt_number, prompt_text, created_at, created_at_epoch)
     VALUES (?, 1, ?, ?, ?)`
  );
  const insertObservation = db.prepare(
    `INSERT INTO observations (memory_session_id, project, type, text, created_at, created_at_epoch)
     VALUES (?, ?, 'discovery', ?, ?, ?)`
  );

  const observerSessionDbIds: number[] = [];
  for (let i = 0; i < opts.observerSessions; i++) {
    const result = insertSession.run(`obs-content-${i}`, `obs-memory-${i}`, OBSERVER_SESSIONS_PROJECT, now, epoch);
    observerSessionDbIds.push(Number(result.lastInsertRowid));
    insertPrompt.run(`obs-content-${i}`, `prompt ${i}`, now, epoch);
    insertObservation.run(`obs-memory-${i}`, OBSERVER_SESSIONS_PROJECT, `obs ${i}`, now, epoch);
  }

  // Real session that should survive
  const keepResult = insertSession.run('keep-content', 'keep-memory', 'real-project', now, epoch);
  const keepSessionDbId = Number(keepResult.lastInsertRowid);
  insertPrompt.run('keep-content', 'survives', now, epoch);

  // Stuck pending_messages tied to the surviving session (so FK passes).
  const insertPending = db.prepare(
    `INSERT INTO pending_messages (session_db_id, content_session_id, message_type, status, created_at_epoch)
     VALUES (?, 'keep-content', 'observation', 'failed', ?)`
  );
  for (let i = 0; i < opts.stuckCount; i++) {
    insertPending.run(keepSessionDbId, epoch);
  }

  seed.close();
  return { observerSessionDbIds, keepSessionDbId };
}

describe('runOneTimeV12_4_3Cleanup', () => {
  let tmpDataDir: string;

  beforeEach(() => {
    tmpDataDir = mkdtempSync(path.join(tmpdir(), 'cleanup-v12_4_3-'));
    silenceLogger();
  });

  afterEach(() => {
    restoreLogger();
    rmSync(tmpDataDir, { recursive: true, force: true });
  });

  it('writes a no-db marker when the DB is missing', () => {
    runOneTimeV12_4_3Cleanup(tmpDataDir);

    const markerPath = path.join(tmpDataDir, '.cleanup-v12.4.3-applied');
    expect(existsSync(markerPath)).toBe(true);
    const payload = JSON.parse(readFileSync(markerPath, 'utf8'));
    expect(payload.skipped).toBe('no-db');
    expect(payload.backupPath).toBeNull();
    expect(payload.counts).toEqual({ observerSessions: 0, observerCascadeRows: 0, stuckPendingMessages: 0 });
  });

  it('purges observer-sessions and stuck pending_messages, writes marker, wipes chroma', () => {
    const dbPath = path.join(tmpDataDir, 'claude-mem.db');
    seedDatabase(dbPath, { observerSessions: 3, stuckCount: 12 });

    // chroma artifacts that should be wiped
    mkdirSync(path.join(tmpDataDir, 'chroma'), { recursive: true });
    writeFileSync(path.join(tmpDataDir, 'chroma', 'collection.bin'), 'opaque');
    writeFileSync(path.join(tmpDataDir, 'chroma-sync-state.json'), '{}');

    runOneTimeV12_4_3Cleanup(tmpDataDir);

    const markerPath = path.join(tmpDataDir, '.cleanup-v12.4.3-applied');
    expect(existsSync(markerPath)).toBe(true);
    const payload = JSON.parse(readFileSync(markerPath, 'utf8'));
    expect(payload.counts.observerSessions).toBe(3);
    expect(payload.counts.observerCascadeRows).toBe(6); // 3 user_prompts + 3 observations
    expect(payload.counts.stuckPendingMessages).toBe(12);
    expect(payload.chromaWiped).toBe(true);
    expect(payload.chromaWipeError).toBeUndefined();
    expect(payload.backupPath).toBeTruthy();

    // Backup file exists on disk (existence only; size is not checked here)
    expect(existsSync(payload.backupPath)).toBe(true);

    // Chroma artifacts gone
    expect(existsSync(path.join(tmpDataDir, 'chroma'))).toBe(false);
    expect(existsSync(path.join(tmpDataDir, 'chroma-sync-state.json'))).toBe(false);

    // Real session still present; observer rows gone
    const verify = new Database(dbPath, { readonly: true });
    const observerCount = (verify.prepare('SELECT COUNT(*) AS n FROM sdk_sessions WHERE project = ?').get(OBSERVER_SESSIONS_PROJECT) as { n: number }).n;
    const realCount = (verify.prepare(`SELECT COUNT(*) AS n FROM sdk_sessions WHERE project = 'real-project'`).get() as { n: number }).n;
    const survivingPrompts = (verify.prepare('SELECT COUNT(*) AS n FROM user_prompts').get() as { n: number }).n;
    const survivingPending = (verify.prepare('SELECT COUNT(*) AS n FROM pending_messages').get() as { n: number }).n;
    verify.close();

    expect(observerCount).toBe(0);
    expect(realCount).toBe(1);
    expect(survivingPrompts).toBe(1); // only the keep-content prompt
    expect(survivingPending).toBe(0);
  });

  it('preserves pending_messages when stuck count is below the threshold of 10', () => {
    const dbPath = path.join(tmpDataDir, 'claude-mem.db');
    seedDatabase(dbPath, { observerSessions: 0, stuckCount: 9 });

    runOneTimeV12_4_3Cleanup(tmpDataDir);

    const markerPath = path.join(tmpDataDir, '.cleanup-v12.4.3-applied');
    const payload = JSON.parse(readFileSync(markerPath, 'utf8'));
    expect(payload.counts.stuckPendingMessages).toBe(0);

    const verify = new Database(dbPath, { readonly: true });
    const survivingPending = (verify.prepare('SELECT COUNT(*) AS n FROM pending_messages').get() as { n: number }).n;
    verify.close();
    expect(survivingPending).toBe(9);
  });

  it('is idempotent: a second invocation does no work and does not create a second backup', () => {
    const dbPath = path.join(tmpDataDir, 'claude-mem.db');
    seedDatabase(dbPath, { observerSessions: 1, stuckCount: 10 });

    runOneTimeV12_4_3Cleanup(tmpDataDir);
    const backupsAfterFirst = readdirSync(path.join(tmpDataDir, 'backups'));
    expect(backupsAfterFirst.length).toBe(1);

    runOneTimeV12_4_3Cleanup(tmpDataDir);
    const backupsAfterSecond = readdirSync(path.join(tmpDataDir, 'backups'));
    expect(backupsAfterSecond).toEqual(backupsAfterFirst);
  });

  it('honors CLAUDE_MEM_SKIP_CLEANUP_V12_4_3=1 by exiting without writing the marker', () => {
    const dbPath = path.join(tmpDataDir, 'claude-mem.db');
    seedDatabase(dbPath, { observerSessions: 1, stuckCount: 10 });

    const original = process.env.CLAUDE_MEM_SKIP_CLEANUP_V12_4_3;
    process.env.CLAUDE_MEM_SKIP_CLEANUP_V12_4_3 = '1';
    try {
      runOneTimeV12_4_3Cleanup(tmpDataDir);
    } finally {
      if (original === undefined) delete process.env.CLAUDE_MEM_SKIP_CLEANUP_V12_4_3;
      else process.env.CLAUDE_MEM_SKIP_CLEANUP_V12_4_3 = original;
    }

    expect(existsSync(path.join(tmpDataDir, '.cleanup-v12.4.3-applied'))).toBe(false);

    const verify = new Database(dbPath, { readonly: true });
    const observerCount = (verify.prepare('SELECT COUNT(*) AS n FROM sdk_sessions WHERE project = ?').get(OBSERVER_SESSIONS_PROJECT) as { n: number }).n;
    verify.close();
    expect(observerCount).toBe(1); // untouched
  });
});