Files
claude-mem/plans/issue-2139-pending-messages-migration.md
T
Alex Newman d13662d5d8 Cynical deletion: close 27 issues by removing defenders + tolerators (#2141)
* fix: mirror migration 28 in SessionStore so pending_messages.tool_use_id and worker_pid columns are created (#2139)

SessionStore's inline migration list jumped from v27 to v29, skipping
rebuildPendingMessagesForSelfHealingClaim. The worker uses SessionStore
directly via worker/DatabaseManager.ts and bypasses the canonical
MigrationRunner, so fresh installs ended up at "max v29" with neither
column present — every queue claim and observation insert failed.

Adds addPendingMessagesToolUseIdAndWorkerPidColumns following the existing
mirror precedent (addObservationSubagentColumns / addObservationsUniqueContentHashIndex).
Uses ALTER TABLE + column-existence guards so already-broken DBs at v29
self-heal on next worker boot.

Verified on fresh DB and on a synthetic v29-without-v28 broken DB:
both columns and indexes (idx_pending_messages_worker_pid,
ux_pending_session_tool) appear after one boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: wrap v28 mirror dedup+index creation in transaction

Addresses Greptile P2 review on PR #2140: matches the existing pattern in
addObservationsUniqueContentHashIndex (v29 mirror at SessionStore.ts:1127)
and runner.ts rebuildPendingMessagesForSelfHealingClaim. A crash between
the dedup DELETE and the schema_versions INSERT no longer leaves the DB
in a half-applied state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plan): cynical-deletion plan for 29 open issues

9-phase plan applying delete-first lens to triaged issue corpus.
Headlines: kill defenders (orphan cleanup, EncodedCommand spawn,
restart-port-steal) and tolerators (silent JSON drops, drifted SSE
filters). Each phase closes a named subset of issues.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: delete process-management theater (Phase 1: DEL-1 + DEL-2)

Delete aggressiveStartupCleanup, the PowerShell -EncodedCommand
spawn branch, and the restart-with-port-steal sequence. Replace
daemon spawning with a single uniform child_process.spawn path
using arg-array form, keeping setsid on Unix when available.

The defenders (orphan cleanup, duplicate-worker probes, port
stealing) bred more bugs than they fixed. PID file with start-time
token already provides correct OS-trust ownership; restart now
requests httpShutdown, waits 5s for the port to free, then exits 1
if it didn't (user resolves). Net -247 lines.

Closes #2090, #2095 (already fixed at session-init.ts:78), #2107,
#2111, #2114, #2117, #2123, #2097, #2135.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: observer-sessions trust boundary via CLAUDE_MEM_INTERNAL env (Phase 2: DEL-9)

Replace the cwd === OBSERVER_SESSIONS_DIR discriminator (which every
consumer must repeat and inevitably drifts) with a single env-var
trust boundary set once at spawn time in buildIsolatedEnv.

- buildIsolatedEnv now sets CLAUDE_MEM_INTERNAL=1, covering all three
  spawn sites (SDKAgent, KnowledgeAgent.prime, KnowledgeAgent.executeQuery)
- shouldTrackProject checks the env var first (cwd check stays as
  belt-and-braces fallback)
- New shared shouldEmitProjectRow predicate — SSE broadcaster and
  pagination filter share the same predicate so they can never drift
  apart (#2118)
- ObservationBroadcaster filters observer rows from SSE stream
- PaginationHelper hardcoded 'observer-sessions' replaced with
  OBSERVER_SESSIONS_PROJECT const
- project-filter basename match pass — *observer-sessions* now matches
  basename, not just full path (globToRegex's [^/]* can't cross /)
  (#2126 item 1)
- New `claude-mem cleanup [--dry-run]` subcommand wires CleanupV12_4_3
  through to the worker for #2126 item 5

Closes #2118, #2126.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: strip proxy env vars before spawning worker (Phase 4: CON-1)

User's HTTP_PROXY/HTTPS_PROXY config was bleeding into internal AI
calls when claude-mem spawns the claude subprocess, causing
connection failures. Strip unconditionally — no passthrough knob,
which rejects #2099's whitelist proposal.

Closes #2115, #2099.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: fail-fast on silent drops in stdin/file-context/memory-save (Phase 5: FF-1)

Three independent fail-fast fixes:

#2089 — stdin-reader silent drop. Non-empty stdin that fails JSON.parse
now rejects with a clear error instead of resolving undefined. Empty
stdin still resolves undefined.

#2094 — PreToolUse:Read truncation Edit deadlock. file-context handler
no longer returns a fake truncated Read result via updatedInput.
Removes userOffset/userLimit/truncated machinery; injects the timeline
via additionalContext only and lets the real Read pass through. Read
state and Claude's expectation now stay consistent, eliminating the
infinite Edit retry loop.

#2116 — /api/memory/save metadata drop + project bug. Schema accepts
metadata as a documented JSON column (migration 30 adds observations.
metadata TEXT, mirrored in SessionStore). Schema also tightened to
.strict() so unknown top-level fields fail fast instead of being
silently dropped. Project resolution now consults metadata.project as
a fallback before defaultProject.

Closes #2089, #2094, #2116.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: small deletions — Zod externalize / Gemini fallback / session timeout / installCLI alias (Phase 6)

DEL-4 (#2113): Externalize zod from mcp-server.cjs and context-generator.cjs
hook bundles so OpenCode's runtime resolves a single Zod copy. Worker
keeps Zod bundled (it's a daemon subprocess, not in OpenCode's hook
bundle). Added zod to plugin/package.json so externalized requires
resolve at runtime.

DEL-5 (#2087): Delete the never-wired GeminiAgent → Claude fallback.
fallbackAgent was always null in production. On 429 the agent now
throws cleanly (message stays pending for retry). Removed
setFallbackAgent, FallbackAgent interface, and the 429 fallback
branch from both GeminiAgent and OpenRouterAgent. Updated docs
that claimed automatic Claude fallback.

DEL-6 (#2127, #2098): Raise MAX_SESSION_WALL_CLOCK_MS from 4h to
24h. The timeout is a real guard against runaway-cost loops (per
issue #1590), but 4h kills legitimate long Claude Code days. 24h
preserves the guard while never hitting in normal use. No knob —
a session approaching this age is a bug worth investigating, not
a value worth tuning.

DEL-8 (#2054): Delete installCLI() alias function. Saves 4 keystrokes
at the cost of cross-platform shell-config mutation surface — not
worth it. Canonical entry is npx claude-mem (and bunx). Uninstall
now strips legacy alias/function lines from ~/.bashrc, ~/.zshrc,
and the PowerShell profile.

Closes #2087, #2098, #2113, #2127, #2054.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: de-hardcode worker port + multi-account commit (Phase 3: CON-2 + DEL-7)

Replace hardcoded 37777 fallbacks with SettingsDefaultsManager.get(
'CLAUDE_MEM_WORKER_PORT') in npx-cli (runtime/install/uninstall),
opencode-plugin, OpenClaw installer, SearchRoutes example URLs.
Timeline-report SKILL.md now resolves WORKER_PORT from settings.json
at the top and uses ${WORKER_PORT} in all curl invocations.
Remaining 37777 literals are doc comments + viewer build-time form-
field placeholder (which is replaced by /api/settings on mount).

hooks.json: add cygpath POSIX→Windows path translation between _R
resolution and node invocation. No-op on macOS/Linux. Closes the
Windows + Git Bash MODULE_NOT_FOUND in #2109.

CLAUDE.md gains a Multi-account section documenting CLAUDE_MEM_DATA_DIR
+ optional CLAUDE_MEM_WORKER_PORT — every existing path/port code
path now honors them.

Closes #2103, #2109, #2101.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: install/uninstall improvements (Phase 7: #2106)

5 fixes for the install/uninstall flow:

Item 1 — multiselect default. install.ts no longer pre-selects every
detected IDE; user explicitly opts in.

Item 3 — shutdown-before-overwrite. New
src/services/install/shutdown-helper.ts shared by install and
uninstall: POSTs /api/admin/shutdown then polls /api/health until
the worker stops responding. install calls it before
copyPluginToMarketplace so reinstall over a running worker doesn't
conflict; uninstall calls it before deletion.

Item 4 — uninstall path coverage. Removes ~/.npm/_npx/*/node_modules/
claude-mem, ~/.cache/claude-cli-nodejs/*/mcp-logs-plugin-claude-mem-*,
~/.claude/plugins/data/claude-mem-thedotmack/. Best-effort: per-path
try/catch so a single permission failure doesn't abort uninstall.
chroma-mcp shutdown is implicit via the worker's GracefulShutdown
cascade in item 3's helper.

Item 5 — install summary documents "Close all Claude Code sessions
before uninstalling, or ~/.claude-mem will be recreated by active
hooks."

Item 6 — real-port query. After install, fetches /api/health on the
configured port with 3s timeout. Reports actually-bound port if the
response carries it; falls back to requested port. No retry loop.

Closes #2106 (items 1, 3, 4, 5, 6). Items 2, 7 closed separately
as already-fixed and insufficient-detail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pin chroma-mcp to 0.2.6 (Phase 8: DEL-3 lite)

Replace unpinned 'chroma-mcp' arg with chroma-mcp==0.2.6 in both
local and remote modes. Pinning makes installs deterministic across
machines and across time, eliminating the dependency-drift class
of bugs.

Verified 0.2.6 in a clean uv cache: starts cleanly, no httpcore/
httpx ImportError, no --with flags needed. The --with flags removed
in a0dd516c are not required at this pin (transitive deps resolve
correctly when the top-level version is fixed).

#2102's three protections (transport cleanup on failure, stale onclose
handler guard, 10s reconnect backoff) confirmed intact.

Closes #2046, #2085, #2102.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: update stale assertions for per-UID port + migration 30 (Phase 9)

SettingsDefaultsManager.CLAUDE_MEM_WORKER_PORT default is per-UID
(37700 + uid%100), not literal '37777'. Three assertions in
settings-defaults-manager.test.ts now compute the expected value
the same way the source does.

migration-runner.test.ts: drop expect(versions).toContain(19)
(version 19 was a noop never recorded — pre-existing bug at parent),
add expect(versions).toContain(30) for the new observations.metadata
column added in Phase 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile P1/P2 review comments on PR #2141

P1: spawnDaemon return value was unchecked in worker-service.ts restart
case, so a failed spawn silently exited 0 with a misleading "Worker
restart spawned" log. Now error and exit 1 when restartPid is undefined.

P2: shutdown-helper.ts health-poll catch treated AbortError (timeout)
the same as connection-refused, so a slow worker could be reported
confirmedStopped while still holding file locks. Now distinguish:
AbortError continues polling; other errors return confirmedStopped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* build: rebuild plugin artifacts after merging main

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review comments on PR #2141

- hooks.json: quote $HOME in cache lookup so paths with spaces work
- timeline-report SKILL.md: fall back when process.getuid is unavailable (Windows)
- opencode-plugin: validate CLAUDE_MEM_WORKER_PORT before using
- uninstall.ts: only strip alias lines, not function declarations (multi-line bodies left intact)
- MemoryRoutes: trim whitespace-only project before precedence resolution
- SessionStore migration 21: preserve metadata column if observations already has it
- stdin-reader test: restore full property descriptor to avoid cross-test pollution

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:23:24 -07:00

15 KiB
Raw Blame History

Plan: Fix Issue #2139 — Missing migration for pending_messages.tool_use_id and pending_messages.worker_pid

Root Cause (verified)

There are two parallel migration code paths in this repo:

  1. src/services/sqlite/migrations/runner.ts::MigrationRunner.runAllMigrations() — the canonical runner. It includes rebuildPendingMessagesForSelfHealingClaim() (v28) which adds tool_use_id + worker_pid columns and the idx_pending_messages_worker_pid + ux_pending_session_tool indexes.
  2. src/services/sqlite/SessionStore.ts constructor (lines 5677) — a duplicated inline migration list. It is missing migration 28 entirely — it calls addObservationSubagentColumns() (v27) directly followed by addObservationsUniqueContentHashIndex() (v29).

The worker bypasses Database.ts → MigrationRunner and instantiates SessionStore directly via src/services/worker/DatabaseManager.ts:34 (this.sessionStore = new SessionStore(this.db);). So in a fresh worker boot, the worker only runs SessionStore's incomplete list, leaves v28 unapplied, marks v29 as applied, and the bundled plugin/scripts/worker-service.cjs ships without v28's logic (verified: grep -c "rebuildPendingMessagesForSelfHealingClaim" plugin/scripts/worker-service.cjs returns 0; .run(28, is absent while .run(27, and .run(29, are present).

Result: pending_messages is created from createPendingMessagesTable() (v16) which has neither column, no later step adds them, and every queue claim and observation insert fails as the issue describes.

Fix Strategy

Mirror MigrationRunner.rebuildPendingMessagesForSelfHealingClaim into SessionStore.ts following the exact mirror precedent already established in that file at SessionStore.ts:1003-1039 (addObservationSubagentColumns) and SessionStore.ts:1041-… (addObservationsUniqueContentHashIndex). Each existing mirror's docstring explicitly says: "Mirrors MigrationRunner.<name> so bundled artifacts that embed SessionStore (e.g. worker-service.cjs, context-generator.cjs) stay schema-consistent."

We do not need a new schema_versions number. The existing migration is v28; we just need SessionStore to apply it. The mirror should be column-existence driven (not version-trust driven) per the SessionStore convention at line 952: "Cannot trust schema_versions alone — the old MigrationRunner may have recorded version 26 without the ALTER TABLE actually succeeding. Always check column existence directly." This matters because real-world affected DBs already have v29 recorded (per the issue) — checking version alone would skip the fix.

We should use the simple ALTER TABLE approach the issue suggests rather than the full table-rebuild from runner.ts, because:

  • ALTER TABLE is safe to run on DBs that already reached v29 with rows present.
  • The runner.ts rebuild's only extra work was dropping a legacy stale-reset epoch column that hasn't existed since v20 in DBs created by the SessionStore path.
  • Idempotency is achieved by PRAGMA table_info + column-name guards.

Phase 0: Documentation Discovery (already done inline above)

Sources consulted:

  • src/services/sqlite/SessionStore.ts:30-77 (constructor migration list)
  • src/services/sqlite/SessionStore.ts:949-1100 (existing mirror methods + docstrings)
  • src/services/sqlite/migrations/runner.ts:22-43 (canonical migration order)
  • src/services/sqlite/migrations/runner.ts:1005-1153 (canonical v28 logic)
  • src/services/sqlite/PendingMessageStore.ts:106-194 (consumer SQL using both columns)
  • src/services/sqlite/schema.sql:121-156 (canonical fresh-DB schema — already has both columns + indexes)
  • src/services/worker/DatabaseManager.ts:31-35 (worker uses SessionStore directly)
  • plugin/scripts/worker-service.cjs — confirmed bundled artifact has .run(27, and .run(29, but no .run(28, and no rebuildPendingMessagesForSelfHealingClaim symbol.

Allowed APIs (verified to exist):

  • this.db.query('PRAGMA table_info(pending_messages)').all() as TableColumnInfo[] — used at SessionStore.ts:1024.
  • this.db.run('ALTER TABLE pending_messages ADD COLUMN <col> <type>') — used at SessionStore.ts:1029, 1032.
  • this.db.run('CREATE INDEX IF NOT EXISTS …') — used throughout.
  • this.db.run('CREATE UNIQUE INDEX IF NOT EXISTS …') — used at runner.ts:1134.
  • this.db.prepare('INSERT OR IGNORE INTO schema_versions …').run(28, new Date().toISOString()) — same pattern as v27, v29 mirrors.
  • TableColumnInfo is already imported at SessionStore.ts top.

Anti-patterns to avoid:

  • Do NOT trust schema_versions.version = 28 alone — check PRAGMA table_info for column existence first (real-world DBs from issue #2139 already have v29 recorded with no v28 logic ever applied).
  • Do NOT do a full table rebuild in SessionStore — risky on populated DBs and unnecessary; use ALTER TABLE.
  • Do NOT add a new version number (e.g. v30). The migration is v28 — we are completing what was already specified, not creating new schema.
  • Do NOT modify runner.ts — its v28 is correct already; the bug is only that SessionStore doesn't mirror it.
  • Do NOT remove the duplicated migration system. That's a larger refactor (see observation 71512). For this fix, just complete the mirror.

Phase 1: Add the mirror method to SessionStore.ts

File: src/services/sqlite/SessionStore.ts

1A. Add the call site

In the constructor migration list, insert one line between line 75 (this.addObservationSubagentColumns();) and line 76 (this.addObservationsUniqueContentHashIndex();):

this.addObservationSubagentColumns();
this.addPendingMessagesToolUseIdAndWorkerPidColumns();   // ← new
this.addObservationsUniqueContentHashIndex();

This places the call in the same relative position as rebuildPendingMessagesForSelfHealingClaim in runner.ts:41.

1B. Add the method body

Insert immediately before addObservationsUniqueContentHashIndex (around SessionStore.ts:1041), following the docstring pattern of the two adjacent mirrors:

/**
 * Add tool_use_id and worker_pid columns + indexes to pending_messages (migration 28).
 *
 * Mirrors MigrationRunner.rebuildPendingMessagesForSelfHealingClaim so bundled
 * artifacts that embed SessionStore (e.g. worker-service.cjs, context-generator.cjs)
 * stay schema-consistent. Without this, every queue-claim cycle fails with
 * "no such column: worker_pid" and every observation insert fails with
 * "table pending_messages has no column named tool_use_id" (issue #2139).
 *
 * Uses ALTER TABLE rather than the full table rebuild from MigrationRunner because:
 *   - It's safe on populated DBs that already reached v29 without ever applying v28.
 *   - The legacy stale-reset epoch column the rebuild dropped never existed in
 *     pending_messages tables created by the SessionStore migration path.
 *
 * Column existence is checked directly — schema_versions cannot be trusted because
 * affected DBs may already have v29 recorded with neither column present (#2139).
 */
private addPendingMessagesToolUseIdAndWorkerPidColumns(): void {
  // pending_messages may not exist yet on freshly-created DBs at this point in
  // the migration order — createPendingMessagesTable (v16) has already run by
  // the time we get here, so this guard is defensive only.
  const tables = this.db.query(
    "SELECT name FROM sqlite_master WHERE type='table' AND name='pending_messages'"
  ).all() as TableNameRow[];
  if (tables.length === 0) {
    this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(28, new Date().toISOString());
    return;
  }

  const cols = this.db.query('PRAGMA table_info(pending_messages)').all() as TableColumnInfo[];
  const hasToolUseId = cols.some(c => c.name === 'tool_use_id');
  const hasWorkerPid = cols.some(c => c.name === 'worker_pid');

  if (!hasToolUseId) {
    this.db.run('ALTER TABLE pending_messages ADD COLUMN tool_use_id TEXT');
  }
  if (!hasWorkerPid) {
    this.db.run('ALTER TABLE pending_messages ADD COLUMN worker_pid INTEGER');
  }

  // Indexes are idempotent — match runner.ts:1117-1120 + 1134-1138.
  this.db.run('CREATE INDEX IF NOT EXISTS idx_pending_messages_worker_pid ON pending_messages(worker_pid)');

  // The UNIQUE partial index requires no duplicate (content_session_id, tool_use_id)
  // pairs. Dedup before creating it (matches runner.ts:1124-1132). Safe to run
  // unconditionally — if tool_use_id was just added, every row has it as NULL
  // and the WHERE filter excludes them.
  this.db.run(`
    DELETE FROM pending_messages
     WHERE tool_use_id IS NOT NULL
       AND id NOT IN (
         SELECT MIN(id) FROM pending_messages
          WHERE tool_use_id IS NOT NULL
          GROUP BY content_session_id, tool_use_id
       )
  `);
  this.db.run(`
    CREATE UNIQUE INDEX IF NOT EXISTS ux_pending_session_tool
    ON pending_messages(content_session_id, tool_use_id)
    WHERE tool_use_id IS NOT NULL
  `);

  this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(28, new Date().toISOString());
}

TableNameRow is not currently imported in SessionStore.ts. Check the existing imports; if absent, either:

  • Add TableNameRow to the existing import { TableColumnInfo, … } from '../../types/database.js'; line, or
  • Inline the cast as as Array<{ name: string }> (matches the inline pattern used elsewhere in the file).

1C. Anti-pattern guards

  • Do not wrap in BEGIN TRANSACTION — the surrounding constructor doesn't, and ALTER TABLE … ADD COLUMN is auto-committed in SQLite.
  • Do not call PRAGMA foreign_keys = OFF — only needed for table rebuilds, not ALTER TABLE.
  • Do not key off SELECT version FROM schema_versions WHERE version = 28 to early-return — affected DBs have v29 recorded without v28 columns. Always inspect PRAGMA table_info first.

1D. Verification (Phase 1)

# Source-side smoke checks
grep -n "addPendingMessagesToolUseIdAndWorkerPidColumns" src/services/sqlite/SessionStore.ts
# Should show 2 matches (call site + method definition)

# Confirm relative ordering is correct
grep -n "addObservationSubagentColumns\|addPendingMessagesToolUseIdAndWorkerPid\|addObservationsUniqueContentHashIndex" src/services/sqlite/SessionStore.ts | head -3
# Should print three lines in order: subagent, pending-messages, unique-hash

Phase 2: Build and verify the bundle

npm run build-and-sync

Verification:

# Bundled artifact must now contain v28 logic.
grep -c "addPendingMessagesToolUseIdAndWorkerPidColumns\|tool_use_id" plugin/scripts/worker-service.cjs
# tool_use_id count should rise from 6 to >=10 (CREATE INDEX strings + new ALTERs).

grep -on ".run(2[7-9]," plugin/scripts/worker-service.cjs
# Must now include .run(28, in addition to existing .run(27, and .run(29,

Phase 3: End-to-end verification on a real worker

  1. Move the existing DB aside to simulate a fresh install:
    mv ~/.claude-mem/claude-mem.db ~/.claude-mem/claude-mem.db.preissue2139
    mv ~/.claude-mem/claude-mem.db-wal ~/.claude-mem/claude-mem.db-wal.preissue2139 2>/dev/null
    mv ~/.claude-mem/claude-mem.db-shm ~/.claude-mem/claude-mem.db-shm.preissue2139 2>/dev/null
    
  2. Restart the worker (kill PID from ~/.claude-mem/supervisor.json; the supervisor respawns it).
  3. Confirm the schema:
    sqlite3 ~/.claude-mem/claude-mem.db "PRAGMA table_info(pending_messages);" | grep -E 'tool_use_id|worker_pid'
    # Both rows must appear.
    sqlite3 ~/.claude-mem/claude-mem.db "SELECT version FROM schema_versions ORDER BY version;"
    # Must include 28 and 29.
    sqlite3 ~/.claude-mem/claude-mem.db ".indexes pending_messages" | grep -E 'worker_pid|session_tool'
    # idx_pending_messages_worker_pid and ux_pending_session_tool must appear.
    
  4. Run a tool call in Claude Code so PostToolUse fires.
  5. tail -n 200 ~/.claude-mem/logs/<latest>.log | grep -E 'no such column|has no column' — must be empty.
  6. sqlite3 ~/.claude-mem/claude-mem.db "SELECT COUNT(*) FROM observations;" — must be > 0 after a real session.
  7. Restore the original DB so the test isn't destructive:
    mv ~/.claude-mem/claude-mem.db.preissue2139 ~/.claude-mem/claude-mem.db
    # (and the -wal/-shm if they existed)
    

Phase 4: Existing-DB upgrade verification

The user's reported scenario (v29 already applied, columns missing) must also self-heal once the bundle ships. To prove that without waiting for an external user:

  1. Copy the current dev DB to a scratch path.
  2. Force the broken state:
    cp ~/.claude-mem/claude-mem.db /tmp/issue2139-test.db
    sqlite3 /tmp/issue2139-test.db "
      ALTER TABLE pending_messages DROP COLUMN tool_use_id;
      ALTER TABLE pending_messages DROP COLUMN worker_pid;
      DROP INDEX IF EXISTS idx_pending_messages_worker_pid;
      DROP INDEX IF EXISTS ux_pending_session_tool;
      INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (29, datetime('now'));
    "
    # If DROP COLUMN errors on an older sqlite3 build, simulate via a fresh DB
    # at a 12.4.4-equivalent state instead.
    
  3. Point a one-off SessionStore at it (a tiny bun run script invoking new SessionStore('/tmp/issue2139-test.db')).
  4. Re-run PRAGMA table_info(pending_messages) — both columns must be present, and schema_versions must contain 28.

Phase 5: Issue follow-through

  1. Reply on issue #2139:
    • Confirm the diagnosis (SessionStore mirror missing v28).
    • Note the fix is shipping — give the version number after version-bump.
    • Thank the reporter (offer was already in their post; we don't need a PR from them).
  2. After the next claude-mem release, the affected user's worker will self-heal on next boot via the column-existence guards.

Anti-Pattern Audit (final)

  • No new schema_versions number invented (we use existing v28).
  • No version-trust early returns added — column-existence is the source of truth.
  • No table rebuild — straight ALTER TABLE to keep the existing rows safe.
  • No edits to runner.ts (already correct).
  • Mirror docstring follows the exact precedent at SessionStore.ts:1003 + :1041.
  • Bundle rebuilt and grep-verified to include .run(28,.

Risk Assessment

  • Low risk: ALTER TABLE ADD COLUMN with a NULLable type cannot fail on a non-empty table; CREATE INDEX IF NOT EXISTS is no-op on subsequent boots; the dedup DELETE is bounded by tool_use_id IS NOT NULL, which is empty immediately after the first ALTER.
  • No data loss: Adding columns and partial unique indexes is non-destructive. The dedup DELETE only fires if duplicate (content_session_id, tool_use_id) pairs already exist — an impossibility in the broken-DB scenario where tool_use_id was never persisted.
  • Idempotent: Repeated boots are safe — PRAGMA table_info + IF NOT EXISTS + INSERT OR IGNORE.