v12.4.3: one-time pollution cleanup migration + v12.4.1/v12.4.2 fixes (#2133)
* fix: 5 trivial bugs from v12.4.1 issue triage

  - #2092: emit CJS-safe banner (no import.meta.url) in worker-service.cjs
  - #2100: PreToolUse Read hook timeout 2000s → 60s
  - #2131: add "shell": "bash" to every hook for Windows compat
  - #2132: Antigravity dir typo .agent → .agents
  - #2088: clear inherited MCP servers in worker SDK query() calls

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: stop context overflow loop + block task-notification leak

  - SDKAgent: clear memorySessionId on "prompt is too long" so crash-recovery starts a fresh SDK session instead of resuming the same poisoned context forever (was producing 68+ failed pending_messages on a single stuck session in the wild)
  - tag-stripping: new isInternalProtocolPayload() predicate; session-init hook + SessionRoutes both skip storage when the entire prompt is one of Claude Code's autonomous protocol blocks (currently <task-notification>; conservative deny-list — does NOT touch <command-name>/<command-message>, which wrap real user slash-commands)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version to 12.4.2

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update CHANGELOG.md for v12.4.2

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cleanup): one-time v12.4.3 migration purges observer-sessions and stuck pending_messages

  Adds CleanupV12_4_3 module that runs once per data dir on worker startup (after migrations apply, before Chroma backfill).
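For reference, the predicate's matching behaviour can be sketched standalone. This mirrors the shape of the PROTOCOL_ONLY_REGEX that ships in tag-stripping.ts; the names and the inlined 256KB bound here are illustrative, not the module's exports:

```typescript
// Standalone sketch of the protocol-payload predicate. A prompt counts as an
// internal protocol payload only when the ENTIRE text is a single deny-listed
// block: the negative lookahead stops the body at the first nested open/close
// of the same tag, so user text between two blocks defeats the match.
const TAGS = ['task-notification'];
const PROTOCOL_ONLY_REGEX = new RegExp(
  `^\\s*<(${TAGS.join('|')})\\b[^>]*>(?:(?!<\\1\\b|</\\1\\b)[\\s\\S])*</\\1>\\s*$`,
);

function isInternalProtocolPayload(text: string): boolean {
  if (!text) return false;
  if (text.length > 256 * 1024) return false; // ReDoS bound on malformed input
  return PROTOCOL_ONLY_REGEX.test(text);
}
```

A whole-prompt `<task-notification>…</task-notification>` block matches (including with attributes or surrounding whitespace); two blocks with user text between them, `<command-name>` wrappers, and plain text all fall through to storage.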
  Drops accumulated pollution that v12.4.0 (observer-sessions filter) and v12.4.2 (context-overflow guard + task-notification leak block) prevent from recurring:

  - DELETE FROM sdk_sessions WHERE project='observer-sessions' (cascades to user_prompts, observations, session_summaries via existing FK ON DELETE CASCADE)
  - DELETE FROM pending_messages stuck in 'failed'/'processing' for any session with >=10 such rows (poisoned chains from the pre-v12.4.2 retry loop; threshold spares legitimate transient failures)
  - Wipes ~/.claude-mem/chroma and chroma-sync-state.json so backfillAllProjects rebuilds the vector store from cleaned SQLite

  Pre-flight checks free disk (1.2x DB size + 100MB) via fs.statfsSync; backs up via VACUUM INTO with copyFileSync fallback; PRAGMA foreign_keys=ON on the cleanup connection (off by default in bun:sqlite). Marker file ~/.claude-mem/.cleanup-v12.4.3-applied records backup path and counts. Opt-out via CLAUDE_MEM_SKIP_CLEANUP_V12_4_3=1.

  Verified locally: 311MB DB backed up to 277MB in 943ms; 11 observer sessions + 3 cascade rows + 141 stuck pending_messages purged; chroma rebuilt via backfill. Total cleanup time 1.1s.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address PR #2133 code review

  - SessionRoutes: check isInternalProtocolPayload before stripping tags so internal protocol prompts skip the strip work entirely.
  - tag-stripping: bound isInternalProtocolPayload input length to 256KB to prevent ReDoS-class scans on malformed unclosed tags.
  - SDKAgent: extract resetSessionForFreshStart helper; both context-overflow paths now share one nullification routine.
  - worker-service: drop the per-startup "Checking for one-time v12.4.3 cleanup" info log — it runs every boot even after the marker exists; the function already logs at debug/warn when relevant.
  - tests: add isInternalProtocolPayload edge cases (whitespace, attributes, partial tags, unrelated tags, oversize input).
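The disk pre-flight arithmetic above is simple enough to state as a standalone helper. This is illustrative only — the shipped code inlines the check and reads free space from fs.statfsSync's bavail * bsize:

```typescript
// Illustrative version of the backup pre-flight: require 1.2x the current DB
// size plus a fixed 100MB cushion before attempting VACUUM INTO.
function hasEnoughDisk(dbSizeBytes: number, freeBytes: number): boolean {
  const required = Math.ceil(dbSizeBytes * 1.2) + 100 * 1024 * 1024;
  return freeBytes >= required;
}
```

For the 311MB DB from the local verification, the requirement works out to roughly 473MB of free space.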
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile P2 comments on PR #2133

  - CleanupV12_4_3.ts: derive backup directory and restore-hint path from effectiveDataDir instead of the module-level BACKUPS_DIR/DB_PATH constants. The dataDirectory override is meant for test isolation; the prior version still wrote backups to the production directory.
  - SessionRoutes.ts: move the isInternalProtocolPayload guard to the top of handleSessionInitByClaudeId, before createSDKSession. The previous position blocked the user_prompts insert but still created an empty sdk_sessions row, asymmetric with the hook-layer guard in session-init.ts.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cleanup): retry on disk-skip; survive chroma wipe failure

  CodeRabbit Major + Claude review:

  - Disk pre-flight skip no longer writes the marker. A user temporarily low on disk would otherwise have the cleanup permanently disabled even after freeing space. Retry on next startup instead.
  - Wrap wipeChromaArtifacts in try/catch and write the marker even on failure (with chromaWipeError captured). Without this, an rmSync permission failure on chroma/ left writeMarker unreached, so every subsequent boot re-ran the SQL purge AND created a fresh backup, consuming disk indefinitely.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cleanup): close backup handle before copyFileSync fallback

  Claude review:

  - backupDb is now closed before falling into the copyFileSync fallback. On Windows an open SQLite handle holds a file lock that can prevent the fallback copy from reading the source. The previous version only closed after both branches completed.
  - Add an empty-body <task-notification></task-notification> case to the isInternalProtocolPayload tests for completeness.
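The marker-writing policy settled across these review rounds reduces to one rule. A hypothetical distillation (not code from the module): only a completed SQL purge writes the marker; a disk-space skip or backup failure retries on the next boot, and a chroma wipe failure is recorded on the marker but never blocks it.

```typescript
// Hypothetical distillation of the marker policy: the marker exists to stop
// re-runs, so it must follow the expensive, destructive step (the SQL purge),
// not the best-effort chroma wipe, and must not be written on a skipped run.
type CleanupOutcome =
  | { kind: 'disk-skip' }                          // retry next startup
  | { kind: 'backup-failed' }                      // retry next startup
  | { kind: 'purged'; chromaWipeError?: string };  // marker written regardless

function shouldWriteMarker(outcome: CleanupOutcome): boolean {
  return outcome.kind === 'purged';
}
```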
  - Cascade-row count queries already match the actual FK columns (content_session_id for user_prompts, memory_session_id for observations / session_summaries) — no fix needed there.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cleanup): accurate session count + add migration tests

  Claude review v3:

  - session-init.ts: filter on rawPrompt before the [media prompt] substitution. Functionally equivalent but explicit — the check no longer depends on the substitution leaving real protocol payloads untouched.
  - CleanupV12_4_3.ts: counts.observerSessions now comes from a pre-DELETE COUNT(*), not from result.changes. bun:sqlite inflates result.changes with FTS-trigger and cascade row counts (the user_prompts_fts triggers inflate a 3-session purge to 19 changes). The previous code logged a misleading total and wrote it to the marker.
  - tests/infrastructure/cleanup-v12_4_3.test.ts: happy-path coverage of the migration against a real on-disk SQLite under a tmpdir. Verifies observer-session purge with cascades, stuck pending_messages purge, chroma artifact wipe, marker payload shape, idempotency on re-run, and the CLAUDE_MEM_SKIP_CLEANUP_V12_4_3 opt-out.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(protocol-filter): close two-block false positive; address review

  CodeRabbit + Claude review v5:

  - tag-stripping.ts: PROTOCOL_ONLY_REGEX rewritten with a negative-lookahead body so a prompt like "<task-notification>x</task-notification> hi <task-notification>y</task-notification>" no longer matches as a single outer block — the prior greedy [\s\S]* spanned the middle user text and would have silently dropped a real prompt. Confirmed via probe.
  - tag-stripping.test.ts: drop the 50ms wall-clock assertion (CI flake); add the two-block-with-text case as a regression test.
  - SessionRoutes.ts: filter on req.body.prompt directly, before the [media prompt] substitution and 256KB truncation.
    Mirrors the session-init.ts hook-layer ordering and ensures a protocol payload that happens to be near the byte limit isn't truncated before the filter runs.
  - cleanup-v12_4_3.test.ts: add a stuckCount=9 below-threshold case verifying pending_messages with <10 stuck rows are preserved.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cleanup): include WAL/SHM in backup fallback; safer rollback

  CodeRabbit Major + Claude review v6:

  - CleanupV12_4_3.ts: when VACUUM INTO fails and copyFileSync runs, also copy any -wal/-shm sidecars. The DB is configured in WAL mode, so recent committed pages can live in those files; copying only the .db would miss them. VACUUM INTO already captures everything in one file, so the happy path is unaffected.
  - CleanupV12_4_3.ts: wrap ROLLBACK in try/catch so a no-op rollback (SQLite already rolled back on a constraint failure) cannot shadow the original purge error.
  - SDKAgent.ts: align both context-overflow log levels to error. Both branches are fatal-recovery paths; the previous warn/error split was inconsistent and made the throw branch easy to miss in logs.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pre-count stuck pending_messages; document adjacent-block fall-through

  Claude review v7:

  - CleanupV12_4_3.ts: runStuckPendingPurge now uses a SELECT COUNT(*) before the DELETE, matching the pattern in runObserverSessionsPurge. result.changes is reliable today (no FTS on pending_messages), but the explicit count protects against future schema additions and keeps the two purges symmetric.
  - tag-stripping.test.ts: add a test documenting that adjacent protocol blocks (no user text between) deliberately fall through to storage. The deny-list is per-block; concatenations are out of scope.
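The WAL/SHM fallback rule can be sketched as a pure planning helper. This is hypothetical — the shipped code calls copyFileSync directly rather than building a plan:

```typescript
// Hypothetical plan builder for the copyFileSync fallback: the main .db plus
// any -wal/-shm sidecars, since WAL mode can hold recent committed pages
// outside the main file. VACUUM INTO needs no such plan — it emits one file.
function fallbackCopyPlan(
  dbPath: string,
  backupPath: string,
  exists: (p: string) => boolean,
): Array<[src: string, dest: string]> {
  const plan: Array<[string, string]> = [[dbPath, backupPath]];
  for (const ext of ['-wal', '-shm'] as const) {
    if (exists(dbPath + ext)) plan.push([dbPath + ext, backupPath + ext]);
  }
  return plan;
}
```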
  Skipped per project rules / Node API constraints:

  - frsize fallback in disk check: Node/Bun StatFs doesn't expose frsize
  - VACUUM-INTO comment: comment-only suggestion
  - Overflow string constant extraction: low value

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
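The ≥10 stuck-row threshold used throughout the migration can be modelled in memory as follows. Illustrative only — the real purge is a single SQL DELETE with a GROUP BY ... HAVING subquery:

```typescript
// In-memory model of the stuck pending_messages purge: only sessions with at
// least `threshold` rows stuck in 'failed'/'processing' are purged, sparing
// sessions with a handful of legitimate transient failures.
interface PendingMessage {
  sessionDbId: number;
  status: 'queued' | 'failed' | 'processing' | 'done';
}

function selectStuckForPurge(rows: PendingMessage[], threshold = 10): PendingMessage[] {
  const stuck = rows.filter((r) => r.status === 'failed' || r.status === 'processing');
  const perSession = new Map<number, number>();
  for (const r of stuck) perSession.set(r.sessionDbId, (perSession.get(r.sessionDbId) ?? 0) + 1);
  return stuck.filter((r) => (perSession.get(r.sessionDbId) ?? 0) >= threshold);
}
```

A session with exactly 9 stuck rows is spared, matching the below-threshold test case added in review.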
@@ -12,6 +12,7 @@ import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
 import { shouldTrackProject } from '../../shared/should-track-project.js';
 import { loadFromFileOnce } from '../../shared/hook-settings.js';
 import { normalizePlatformSource } from '../../shared/platform-source.js';
+import { isInternalProtocolPayload } from '../../utils/tag-stripping.js';
 
 interface SessionInitResponse {
   sessionDbId: number;
@@ -43,6 +44,15 @@ export const sessionInitHandler: EventHandler = {
       return { continue: true, suppressOutput: true };
     }
 
+    // Filter on the raw prompt so the check is independent of the
+    // [media prompt] substitution below.
+    if (rawPrompt && isInternalProtocolPayload(rawPrompt)) {
+      logger.debug('HOOK', 'session-init: skipping internal protocol payload', {
+        preview: rawPrompt.slice(0, 80),
+      });
+      return { continue: true, suppressOutput: true };
+    }
+
     // Handle image-only prompts (where text prompt is empty/undefined)
     // Use placeholder so sessions still get created and tracked for memory
     const prompt = (!rawPrompt || !rawPrompt.trim()) ? '[media prompt]' : rawPrompt;
@@ -0,0 +1,276 @@
+/**
+ * One-time v12.4.3 pollution cleanup.
+ *
+ * Removes accumulated junk that v12.4.0/v12.4.2 fixes prevent from ever recurring:
+ * 1. observer-sessions: rows that polluted user-facing search/timeline before
+ *    the observer-sessions filter shipped. Cascades to user_prompts, observations,
+ *    and session_summaries via existing FK ON DELETE CASCADE.
+ * 2. Stuck pending_messages: poisoned chains where ≥10 rows for a single
+ *    session_db_id are stuck in 'failed' or 'processing'. Threshold spares
+ *    legitimate transient failures while clearing the cascade-failure cases
+ *    from the pre-v12.4.2 context-overflow loop.
+ *
+ * After SQLite is cleaned, ~/.claude-mem/chroma/ and ~/.claude-mem/chroma-sync-state.json
+ * are removed so backfillAllProjects rebuilds the vector store from the cleaned SQLite.
+ *
+ * Marker-file gated. Idempotent. Opt-out via CLAUDE_MEM_SKIP_CLEANUP_V12_4_3=1.
+ *
+ * Mirrors the runOneTimeChromaMigration / runOneTimeCwdRemap pattern in
+ * ProcessManager.ts. Must run AFTER dbManager.initialize() (so migrations have
+ * applied) and BEFORE ChromaSync.backfillAllProjects (so backfill sees the
+ * cleaned state).
+ */
+
+import path from 'path';
+import { existsSync, writeFileSync, mkdirSync, rmSync, statSync, copyFileSync, statfsSync } from 'fs';
+import { Database } from 'bun:sqlite';
+import { DATA_DIR, OBSERVER_SESSIONS_PROJECT } from '../../shared/paths.js';
+import { logger } from '../../utils/logger.js';
+
+const MARKER_FILENAME = '.cleanup-v12.4.3-applied';
+const STUCK_PENDING_THRESHOLD = 10;
+
+interface CleanupCounts {
+  observerSessions: number;
+  observerCascadeRows: number;
+  stuckPendingMessages: number;
+}
+
+interface MarkerPayload {
+  appliedAt: string;
+  backupPath: string | null;
+  chromaWiped: boolean;
+  chromaWipeError?: string;
+  counts: CleanupCounts;
+  skipped?: string;
+}
+
+/**
+ * Run the one-time v12.4.3 cleanup. Safe to call on every worker startup;
+ * the marker file ensures the work runs at most once per data directory.
+ *
+ * @param dataDirectory - Override for DATA_DIR (used in tests)
+ */
+export function runOneTimeV12_4_3Cleanup(dataDirectory?: string): void {
+  const effectiveDataDir = dataDirectory ?? DATA_DIR;
+  const markerPath = path.join(effectiveDataDir, MARKER_FILENAME);
+
+  if (existsSync(markerPath)) {
+    logger.debug('SYSTEM', 'v12.4.3 cleanup marker exists, skipping');
+    return;
+  }
+
+  if (process.env.CLAUDE_MEM_SKIP_CLEANUP_V12_4_3 === '1') {
+    logger.warn('SYSTEM', 'v12.4.3 cleanup skipped via CLAUDE_MEM_SKIP_CLEANUP_V12_4_3=1; marker not written');
+    return;
+  }
+
+  const dbPath = path.join(effectiveDataDir, 'claude-mem.db');
+  if (!existsSync(dbPath)) {
+    mkdirSync(effectiveDataDir, { recursive: true });
+    writeMarker(markerPath, { appliedAt: new Date().toISOString(), backupPath: null, chromaWiped: false, counts: emptyCounts(), skipped: 'no-db' });
+    logger.debug('SYSTEM', 'No DB present, v12.4.3 cleanup marker written without work', { dbPath });
+    return;
+  }
+
+  logger.warn('SYSTEM', 'Running one-time v12.4.3 pollution cleanup', { dbPath });
+
+  try {
+    executeCleanup(dbPath, effectiveDataDir, markerPath);
+  } catch (err: unknown) {
+    const error = err instanceof Error ? err : new Error(String(err));
+    logger.error('SYSTEM', 'v12.4.3 cleanup failed, marker not written (will retry on next startup)', {}, error);
+  }
+}
+
+function executeCleanup(dbPath: string, effectiveDataDir: string, markerPath: string): void {
+  const dbSize = statSync(dbPath).size;
+  const required = Math.ceil(dbSize * 1.2) + 100 * 1024 * 1024;
+
+  let backupPath: string | null = null;
+  try {
+    const fs = statfsSync(effectiveDataDir);
+    const free = Number(fs.bavail) * Number(fs.bsize);
+    if (free < required) {
+      // Don't write the marker — once the user frees disk space, the next
+      // worker startup should retry the cleanup rather than skipping forever.
+      logger.error('SYSTEM', 'Insufficient disk for v12.4.3 backup; skipping cleanup (will retry on next startup)', { dbSize, free, required });
+      return;
+    }
+  } catch (err: unknown) {
+    const error = err instanceof Error ? err : new Error(String(err));
+    logger.warn('SYSTEM', 'statfsSync failed; proceeding without disk-space pre-flight', {}, error);
+  }
+
+  const effectiveBackupsDir = path.join(effectiveDataDir, 'backups');
+  mkdirSync(effectiveBackupsDir, { recursive: true });
+  const ts = new Date().toISOString().replace(/[:.]/g, '-');
+  backupPath = path.join(effectiveBackupsDir, `claude-mem-pre-12.4.3-${ts}.db`);
+
+  const backupDb = new Database(dbPath, { readonly: true });
+  let vacuumFailed = false;
+  let vacuumError: Error | null = null;
+  try {
+    backupDb.run(`VACUUM INTO '${backupPath.replace(/'/g, "''")}'`);
+    logger.info('SYSTEM', 'v12.4.3 backup created via VACUUM INTO', { backupPath, dbSize });
+  } catch (err: unknown) {
+    vacuumFailed = true;
+    vacuumError = err instanceof Error ? err : new Error(String(err));
+  }
+  // Close before any fallback: on Windows an open SQLite handle holds a
+  // file lock that can prevent copyFileSync from reading the source.
+  backupDb.close();
+
+  if (vacuumFailed) {
+    logger.warn('SYSTEM', 'VACUUM INTO failed, falling back to copyFileSync', {}, vacuumError ?? undefined);
+    try {
+      copyFileSync(dbPath, backupPath);
+      // The DB is in WAL mode; recent committed pages may live in -wal/-shm.
+      // VACUUM INTO captures them automatically; copyFileSync does not, so
+      // mirror them alongside so the backup represents the same state.
+      const walPath = `${dbPath}-wal`;
+      const shmPath = `${dbPath}-shm`;
+      if (existsSync(walPath)) copyFileSync(walPath, `${backupPath}-wal`);
+      if (existsSync(shmPath)) copyFileSync(shmPath, `${backupPath}-shm`);
+      logger.info('SYSTEM', 'v12.4.3 backup created via copyFileSync (incl. -wal/-shm if present)', { backupPath, dbSize });
+    } catch (copyErr: unknown) {
+      const copyError = copyErr instanceof Error ? copyErr : new Error(String(copyErr));
+      logger.error('SYSTEM', 'v12.4.3 backup failed via both VACUUM INTO and copyFileSync; aborting cleanup', {}, copyError);
+      return;
+    }
+  }
+
+  const counts = emptyCounts();
+  const db = new Database(dbPath);
+  // PRAGMA foreign_keys must be set OUTSIDE a transaction to take effect on this connection.
+  db.run('PRAGMA foreign_keys = ON');
+
+  try {
+    runObserverSessionsPurge(db, counts);
+    runStuckPendingPurge(db, counts);
+  } finally {
+    db.close();
+  }
+
+  // SQLite purge succeeded; chroma wipe failure must NOT re-run the migration
+  // on the next startup or we accumulate one new backup per boot. Capture the
+  // failure on the marker instead.
+  let chromaWiped = false;
+  let chromaWipeError: string | undefined;
+  try {
+    chromaWiped = wipeChromaArtifacts(effectiveDataDir);
+  } catch (err: unknown) {
+    const error = err instanceof Error ? err : new Error(String(err));
+    chromaWipeError = error.message;
+    logger.error('SYSTEM', 'v12.4.3: Chroma wipe failed; marker still written so cleanup does not re-run', {}, error);
+  }
+
+  writeMarker(markerPath, {
+    appliedAt: new Date().toISOString(),
+    backupPath,
+    chromaWiped,
+    chromaWipeError,
+    counts,
+  });
+
+  logger.info('SYSTEM', 'v12.4.3 cleanup complete', {
+    backupPath,
+    chromaWiped,
+    ...counts,
+  });
+  logger.info('SYSTEM', `To restore: cp '${backupPath}' '${dbPath}'`);
+}
+
+function runObserverSessionsPurge(db: Database, counts: CleanupCounts): void {
+  db.run('BEGIN IMMEDIATE');
+  try {
+    // Count rows before the delete: bun:sqlite's result.changes inflates with
+    // FTS-trigger and cascade row counts, so it can't stand in for a session
+    // count or a cascade-row count on its own.
+    const sessionCount = (db.prepare(`SELECT COUNT(*) AS n FROM sdk_sessions WHERE project = ?`).get(OBSERVER_SESSIONS_PROJECT) as { n: number }).n;
+    const cascadeRows =
+      (db.prepare(`SELECT COUNT(*) AS n FROM user_prompts WHERE content_session_id IN (SELECT content_session_id FROM sdk_sessions WHERE project = ?)`).get(OBSERVER_SESSIONS_PROJECT) as { n: number }).n
+      + (db.prepare(`SELECT COUNT(*) AS n FROM observations WHERE memory_session_id IN (SELECT memory_session_id FROM sdk_sessions WHERE project = ? AND memory_session_id IS NOT NULL)`).get(OBSERVER_SESSIONS_PROJECT) as { n: number }).n
+      + (db.prepare(`SELECT COUNT(*) AS n FROM session_summaries WHERE memory_session_id IN (SELECT memory_session_id FROM sdk_sessions WHERE project = ? AND memory_session_id IS NOT NULL)`).get(OBSERVER_SESSIONS_PROJECT) as { n: number }).n;
+
+    db.run(`DELETE FROM sdk_sessions WHERE project = ?`, [OBSERVER_SESSIONS_PROJECT]);
+    counts.observerSessions = sessionCount;
+    counts.observerCascadeRows = cascadeRows;
+
+    db.run('COMMIT');
+    logger.info('SYSTEM', 'v12.4.3: observer-sessions purge committed', {
+      sessions: counts.observerSessions,
+      cascadeRows: counts.observerCascadeRows,
+    });
+  } catch (err: unknown) {
+    // Defensive: SQLite may have already auto-rolled back on certain
+    // constraint failures. Don't let a no-op ROLLBACK shadow the real error.
+    try { db.run('ROLLBACK'); } catch { /* already rolled back */ }
+    throw err;
+  }
+}
+
+function runStuckPendingPurge(db: Database, counts: CleanupCounts): void {
+  db.run('BEGIN IMMEDIATE');
+  try {
+    // Pre-count for consistency with runObserverSessionsPurge: result.changes
+    // would be reliable today (no FTS on pending_messages) but the explicit
+    // count protects against future schema changes.
+    const stuckCount = (db.prepare(
+      `SELECT COUNT(*) AS n FROM pending_messages
+       WHERE status IN ('failed', 'processing')
+         AND session_db_id IN (
+           SELECT session_db_id FROM pending_messages
+           WHERE status IN ('failed', 'processing')
+           GROUP BY session_db_id
+           HAVING COUNT(*) >= ?
+         )`
+    ).get(STUCK_PENDING_THRESHOLD) as { n: number }).n;
+
+    db.run(
+      `DELETE FROM pending_messages
+       WHERE status IN ('failed', 'processing')
+         AND session_db_id IN (
+           SELECT session_db_id FROM pending_messages
+           WHERE status IN ('failed', 'processing')
+           GROUP BY session_db_id
+           HAVING COUNT(*) >= ?
+         )`,
+      [STUCK_PENDING_THRESHOLD]
+    );
+    counts.stuckPendingMessages = stuckCount;
+    db.run('COMMIT');
+    logger.info('SYSTEM', 'v12.4.3: stuck pending_messages purge committed', { rows: counts.stuckPendingMessages });
+  } catch (err: unknown) {
+    // Defensive: SQLite may have already auto-rolled back on certain
+    // constraint failures. Don't let a no-op ROLLBACK shadow the real error.
+    try { db.run('ROLLBACK'); } catch { /* already rolled back */ }
+    throw err;
+  }
+}
+
+function wipeChromaArtifacts(effectiveDataDir: string): boolean {
+  const chromaDir = path.join(effectiveDataDir, 'chroma');
+  const stateFile = path.join(effectiveDataDir, 'chroma-sync-state.json');
+  let wiped = false;
+
+  if (existsSync(chromaDir)) {
+    rmSync(chromaDir, { recursive: true, force: true });
+    logger.info('SYSTEM', 'v12.4.3: chroma directory removed (will rebuild via backfill)', { chromaDir });
+    wiped = true;
+  }
+  if (existsSync(stateFile)) {
+    rmSync(stateFile, { force: true });
+    logger.info('SYSTEM', 'v12.4.3: chroma-sync-state.json removed', { stateFile });
+    wiped = true;
+  }
+  return wiped;
+}
+
+function writeMarker(markerPath: string, payload: MarkerPayload): void {
+  writeFileSync(markerPath, JSON.stringify(payload, null, 2));
+}
+
+function emptyCounts(): CleanupCounts {
+  return { observerSessions: 0, observerCascadeRows: 0, stuckPendingMessages: 0 };
+}
@@ -5,3 +5,4 @@
 export * from './ProcessManager.js';
 export * from './HealthMonitor.js';
 export * from './GracefulShutdown.js';
+export * from './CleanupV12_4_3.js';
@@ -185,7 +185,7 @@ const ANTIGRAVITY_CONFIG: McpInstallerConfig = {
   configPath: path.join(homedir(), '.gemini', 'antigravity', 'mcp_config.json'),
   configKey: 'mcpServers',
   contextFile: {
-    path: path.join(process.cwd(), '.agent', 'rules', 'claude-mem-context.md'),
+    path: path.join(process.cwd(), '.agents', 'rules', 'claude-mem-context.md'),
     isWorkspaceRelative: true,
   },
 };
@@ -52,6 +52,7 @@ import {
   spawnDaemon,
   touchPidFile
 } from './infrastructure/ProcessManager.js';
+import { runOneTimeV12_4_3Cleanup } from './infrastructure/CleanupV12_4_3.js';
 import {
   isPortInUse,
   waitForHealth,
@@ -453,6 +454,10 @@ export class WorkerService implements WorkerRef {
       logger.warn('QUEUE', 'Startup GC for failed pending_messages rows failed', {}, err instanceof Error ? err : undefined);
     }
 
+    // One-time v12.4.3 pollution cleanup. Runs AFTER migrations have applied
+    // and BEFORE backfillAllProjects so the rebuilt Chroma sees a clean SQLite.
+    runOneTimeV12_4_3Cleanup();
+
     // Initialize search services
     logger.info('WORKER', 'Initializing search services...');
     const formattingService = new FormattingService();
@@ -42,6 +42,12 @@ export class SDKAgent {
     this.sessionManager = sessionManager;
   }
 
+  private resetSessionForFreshStart(session: ActiveSession): void {
+    this.dbManager.getSessionStore().updateMemorySessionId(session.sessionDbId, null);
+    session.memorySessionId = null;
+    session.forceInit = true;
+  }
+
   /**
    * Start SDK agent for a session (event-driven, no polling)
    * @param worker WorkerService reference for spinner control (optional)
@@ -151,7 +157,8 @@ export class SDKAgent {
       // Custom spawn factory: spawns the SDK child in its own POSIX process
       // group so the worker can tear down the whole subtree on shutdown.
       spawnClaudeCodeProcess: createSdkSpawnFactory(session.sessionDbId),
-      env: isolatedEnv // Use isolated credentials from ~/.claude-mem/.env, not process.env
+      env: isolatedEnv, // Use isolated credentials from ~/.claude-mem/.env, not process.env
+      mcpServers: {},
     }
   });
 
@@ -208,7 +215,8 @@ export class SDKAgent {
     // Check for context overflow - prevents infinite retry loops
     if (textContent.includes('prompt is too long') ||
        textContent.includes('context window')) {
-      logger.error('SDK', 'Context overflow detected - terminating session');
+      logger.error('SDK', 'Context overflow detected - terminating session and forcing fresh start');
+      this.resetSessionForFreshStart(session);
       session.abortController.abort();
       return;
     }
@@ -259,6 +267,12 @@ export class SDKAgent {
 
     // Detect fatal context overflow and terminate gracefully (issue #870)
     if (typeof textContent === 'string' && textContent.includes('Prompt is too long')) {
+      // Resume of this SDK session will overflow forever. Force a fresh session on the
+      // next spawn so crash-recovery can drain remaining pending messages successfully.
+      this.resetSessionForFreshStart(session);
+      logger.error('SDK', 'Context overflow — cleared memorySessionId so next spawn starts fresh', {
+        sessionDbId: session.sessionDbId
+      });
       throw new Error('Claude session context overflow: prompt is too long');
     }
 
@@ -11,7 +11,7 @@ import { ingestObservation } from '../shared.js';
 import { validateBody } from '../middleware/validateBody.js';
 import { getWorkerPort } from '../../../../shared/worker-utils.js';
 import { logger } from '../../../../utils/logger.js';
-import { stripMemoryTagsFromJson, stripMemoryTagsFromPrompt } from '../../../../utils/tag-stripping.js';
+import { stripMemoryTagsFromJson, stripMemoryTagsFromPrompt, isInternalProtocolPayload } from '../../../../utils/tag-stripping.js';
 import { SessionManager } from '../../SessionManager.js';
 import { DatabaseManager } from '../../DatabaseManager.js';
 import { SDKAgent } from '../../SDKAgent.js';
@@ -857,10 +857,20 @@ export class SessionRoutes extends BaseRouteHandler {
     // Only contentSessionId is truly required — Cursor and other platforms
     // may omit prompt/project in their payload (#838, #1049)
     const project = req.body.project || 'unknown';
-    let prompt = req.body.prompt || '[media prompt]';
+    const rawPrompt = typeof req.body.prompt === 'string' ? req.body.prompt : undefined;
     const platformSource = normalizePlatformSource(req.body.platformSource);
     const customTitle = req.body.customTitle || undefined;
 
+    // Filter on the raw prompt before truncation / [media prompt] substitution
+    // so the check is independent of those transforms.
+    if (rawPrompt && isInternalProtocolPayload(rawPrompt)) {
+      logger.debug('HTTP', 'session-init: skipping internal protocol payload before session creation', { contentSessionId });
+      res.json({ skipped: true, reason: 'internal_protocol' });
+      return;
+    }
+
+    let prompt = rawPrompt || '[media prompt]';
+
     const promptByteLength = Buffer.byteLength(prompt, 'utf8');
     if (promptByteLength > MAX_USER_PROMPT_BYTES) {
       logger.warn('HTTP', 'SessionRoutes: oversized prompt truncated at session-init boundary', {
@@ -79,7 +79,8 @@ export class KnowledgeAgent {
       cwd: OBSERVER_SESSIONS_DIR,
       disallowedTools: KNOWLEDGE_AGENT_DISALLOWED_TOOLS,
       pathToClaudeCodeExecutable: claudePath,
-      env: isolatedEnv
+      env: isolatedEnv,
+      mcpServers: {},
     }
   });
 
@@ -195,7 +196,8 @@ export class KnowledgeAgent {
       cwd: OBSERVER_SESSIONS_DIR,
       disallowedTools: KNOWLEDGE_AGENT_DISALLOWED_TOOLS,
       pathToClaudeCodeExecutable: claudePath,
-      env: isolatedEnv
+      env: isolatedEnv,
+      mcpServers: {},
     }
   });
 
@@ -104,3 +104,38 @@ export function stripMemoryTagsFromJson(content: string): string {
 export function stripMemoryTagsFromPrompt(content: string): string {
   return stripTags(content).stripped;
 }
+
+/**
+ * Tag names that Claude Code emits autonomously into the prompt stream as
+ * protocol notifications — never authored by the user. When the entire prompt
+ * payload is one of these blocks (with no surrounding user text), the hook
+ * MUST skip storage to keep `user_prompts` clean.
+ *
+ * Conservative deny-list: do NOT add `<command-name>` / `<command-message>`
+ * here — those wrap genuine user slash-command invocations.
+ */
+const PROTOCOL_ONLY_TAGS = ['task-notification'] as const;
+
+// Negative lookahead in the body keeps a payload like
+// "<task-notification>x</task-notification> hi <task-notification>y</task-notification>"
+// from matching as a single outer block (greedy [\s\S]* would otherwise span
+// the middle user text and silently drop a real prompt).
+const PROTOCOL_ONLY_REGEX = new RegExp(
+  `^\\s*<(${PROTOCOL_ONLY_TAGS.join('|')})\\b[^>]*>(?:(?!<\\1\\b|</\\1\\b)[\\s\\S])*</\\1>\\s*$`,
+);
+
+// Bounds the unanchored `[\s\S]*` body to keep a malformed 1MB+ payload that
+// opens a protocol tag and never closes it from running the regex engine
+// against the whole prompt before failing.
+const MAX_PROTOCOL_PAYLOAD_BYTES = 256 * 1024;
+
+/**
+ * Returns true when `text` is *entirely* a Claude Code protocol payload
+ * (e.g. a `<task-notification>` block emitted on background Agent completion)
+ * with no surrounding user-authored content.
+ */
+export function isInternalProtocolPayload(text: string): boolean {
+  if (!text) return false;
+  if (text.length > MAX_PROTOCOL_PAYLOAD_BYTES) return false;
+  return PROTOCOL_ONLY_REGEX.test(text);
+}