claude-mem/plans/2026-05-07-finish-bullmq-branch-ship-plan.md
Alex Newman 36b0929fae Server-beta: Postgres storage + independent runtime + BullMQ queue (Phases 1–3) (#2351)
* Add server beta runtime foundation

* Address server beta review findings

* Resolve server beta review comments

* Tighten server beta review follow-ups

* Harden server beta auth and search

* Avoid unnecessary FTS rebuilds

* Block scoped keys from creating projects

* Release BullMQ claims best effort on close

* Address server beta review blockers

* Reset BullMQ claims best effort

* Add Postgres observation storage foundation

* feat(server-beta): add independent runtime service

Introduce src/server/runtime/ as a self-contained server-beta runtime
that owns its lifecycle, Postgres bootstrap, and HTTP boundary without
depending on WorkerService.

ServerBetaService wraps the existing Server class, exposes
/healthz and /v1/info with runtime="server-beta", and persists state
to dedicated paths (.server-beta.pid|.port|.runtime.json). The four
boundary managers (queue, generation worker, provider registry, event
broadcaster) are intentionally disabled in this phase and report their
status through /v1/info; later phases activate them.

Adds plans/2026-05-07-finish-bullmq-branch-ship-plan.md to track the
remaining work for this branch.

Phase 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server-beta): route CLI lifecycle and bundle separate runtime

scripts/build-hooks.js now produces plugin/scripts/server-beta-service.cjs
as a separate Node CJS bundle, alongside the existing worker-service
bundle. The server-beta runtime is now installable independently.

src/npx-cli/commands/server.ts routes start|stop|restart|status to the
server-beta lifecycle instead of the legacy worker. The worker keeps its
own start|stop|restart|status under the worker namespace; the two
runtimes can be operated independently.

src/services/worker-service.ts adds a server-* command parser branch
that delegates to the sibling server-beta-service.cjs bundle so
direct worker-service invocations still route to the right runtime.

tests/npx-cli-server-namespace.test.ts updated to expect server-beta
lifecycle routing.

Includes rebuilt plugin/scripts/*.cjs bundles produced by
build-and-sync.

Phase 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server-beta): add BullMQ job queue primitives

Introduce src/server/jobs/ as the queue-side primitives that Phase 3 of
the server-beta runtime needs to operate.

types.ts defines a discriminated union over the four job kinds (event,
event-batch, summary, reindex) and maps each to a per-kind BullMQ queue
name and deterministic-ID prefix.

job-id.ts builds deterministic, colon-free BullMQ jobIds from
(kind, team, project, source). The colon ban exists because BullMQ uses
':' as a Redis key separator internally; embedding ':' in jobIds
breaks scan and state lookups.

ServerJobQueue.ts is a thin wrapper over BullMQ Queue + Worker that
enforces autorun:false, default concurrency 1, and an attached error
listener — all per BullMQ docs requirements. Test seams accept queue
and worker factories so unit tests do not need Redis.

outbox.ts publishes through the Postgres ObservationGenerationJob
repository as canonical history. enqueueOutbox writes the row first,
then publishes to BullMQ; if BullMQ throws, the row is transitioned to
failed and a failed event is appended. reconcileOnStartup re-enqueues
queued + processing rows after a restart, replacing terminal BullMQ
jobs that may still be holding the deterministic ID slot. markCompleted
and markFailed wrap transitionStatus and append the matching event row.

Includes 20 unit tests covering deterministic ID stability, colon-free
output, queue lifecycle, error-listener attachment, double-start
refusal, idempotent enqueue, BullMQ failure rollback, startup
reconciliation, max-attempts skipping, and completion / failure /
retry transitions.

Phase 3 commit 1 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server-beta): activate queue boundary in runtime service

Wire ActiveServerBetaQueueManager into the server-beta runtime graph.
The active manager owns one ServerJobQueue per generation kind (event,
event-batch, summary, reindex) and surfaces lane metadata through
boundary health.

Selection is opt-in and fail-fast: if CLAUDE_MEM_QUEUE_ENGINE is set to
bullmq the active manager is constructed (and any Redis/config error
throws — no silent fallback to SQLite, per Phase 3 anti-pattern guard).
For any other engine the disabled boundary remains so worker-era and
test setups stay compatible.

Widens ServerBetaBoundaryHealth.status to a discriminated union
('disabled' | 'active' | 'errored') with optional details. The disabled
adapter still emits status='disabled', which keeps the existing
server-beta-service test green.

ServerBetaService receives the manager through a new optional
queueManager field on CreateServerBetaServiceOptions so test graphs
and Phase 4 wiring can inject custom managers.

Adds tests/server/runtime/active-queue-manager.test.ts covering bullmq
guard, active health shape, per-kind queue access, close behavior, and
post-close errored health.

Phase 3 commit 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(server-beta): cap /v1/events/batch at 500 events

Prevents unbounded array DoS surface flagged in PR review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 01:20:07 -07:00


Finish BullMQ Observation Queue Branch — Ship Plan

Date: 2026-05-07
Branch: bullmq-vs-bee-queue-for-claude-mem-observation-que
Base: origin/main @ 0a43ab76
Parent plan: plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md

Reframe

The prior session believed Phase 1 was ungated because two reviewer agents failed (one returned not_found, "Carver" was user-aborted at 111.9s). That belief was based on a stale snapshot that predated commit 4e0fc77a Add Postgres observation storage foundation. Phase 1 is committed. git status shows zero uncommitted changes under src/storage/postgres/.

What is actually dirty in the worktree is Phase 2: Define Server Runtime Boundary. The dirty files map 1:1 to that phase's "What To Implement" section. The remaining work to "finish this branch" is: confirm Phase 1 with concrete checks (not another reviewer agent), land Phase 2, push.

Phases 3–13 (BullMQ queue, event-to-job pipeline, provider extraction, hook routing, MCP, compat, Docker, team auth, observability, final verification) are explicitly out of scope for this branch. The PR is already 167 files / 23.5K insertions. Continuing past Phase 2 here would make review impossible.

Phase 0: Documentation Discovery

Sources Read

  • plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md (parent plan, 987 lines, all 14 sections from Phase 0 through Phase 13)
  • PR_REORIENTATION_REPORT.md (660 lines) — independent inventory of committed + dirty surfaces
  • git status, git log --oneline -15, git diff --stat HEAD
  • Worktree: src/server/runtime/{ServerBetaService.ts,create-server-beta-service.ts,types.ts}
  • Worktree: src/storage/postgres/ — already in commit 4e0fc77a

Concrete Findings

  • Phase 1 (Postgres storage foundation) is committed in 4e0fc77a. Includes scoped addSource, transitionStatus, generation-job event append, FTS via generated content_search tsvector + GIN index, tenant-scoped uniqueness constraints, and 20 integration tests including the negative-scope mutation test.
  • Phase 2 (server runtime boundary) is implemented but uncommitted. Files match the parent plan's Phase 2 deliverables exactly: independent ServerBetaService, create-server-beta-service, disabled boundary types, .server-beta.{pid,port,runtime.json} paths, runtime labels in /api/health and /v1/info, server-beta CLI lifecycle, build-hooks split into a separate server-beta-service.cjs bundle, ephemeral-port test for /api/health and /v1/info.
  • Two doc artifacts (AGENTS.md, PR_REORIENTATION_REPORT.md) are also untracked. Decide before push.

Anti-Pattern Guards (carried from parent plan)

  • Do not spawn a third reviewer agent to "gate" Phase 1. The integration test suite plus the plan's grep checklist is the gate. Reviewer agents are a second opinion, not the primary gate.
  • Do not pull Phase 3+ work into this branch.
  • Do not amend 4e0fc77a to "tidy" Phase 1; create new commits.
  • Do not couple Phase 2 to WorkerService (the entire point of Phase 2 is independence).

Phase A: Re-Confirm Phase 1 Gate (Deterministic, No Reviewer Agent)

What To Run

  1. tsc --noEmit scoped to Postgres storage:
    bunx tsc --noEmit src/storage/postgres/*.ts
    
  2. Postgres integration suite (requires DATABASE_URL or local Postgres on default port):
    bun test tests/storage/postgres
    
  3. Anti-pattern greps (must all return zero matches in src/storage/postgres/):
    rg -n "UNIQUE\s*\(\s*source_type\s*,\s*source_id\s*,\s*job_type\s*\)" src/storage/postgres
    rg -n "UNIQUE\s*\(\s*observation_id\s*,\s*source_type\s*,\s*source_id\s*\)" src/storage/postgres
    
  4. Scoped-mutation grep (must show projectId/teamId parameters):
    rg -n "addSource|transitionStatus|append" src/storage/postgres
    

Verification Checklist

  • TypeScript clean.
  • All 20 Postgres integration tests pass, including the negative-scope mutation test.
  • Both anti-pattern greps return empty.
  • Scoped-mutation grep shows projectId/teamId in every signature.
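
The scoped-mutation check above looks for tenant parameters in every mutating signature. A minimal in-memory sketch of what such a signature enforces is shown below. The `TenantScope`, `JobRow`, and `InMemoryJobRepo` names are hypothetical and do not come from src/storage/postgres/; in SQL the same guard would presumably be a WHERE clause on team and project in addition to the row ID.

```typescript
// Hypothetical tenant scope; the real repositories may name these differently.
interface TenantScope {
  teamId: string;
  projectId: string;
}

interface JobRow extends TenantScope {
  id: string;
  status: string;
}

class InMemoryJobRepo {
  private rows = new Map<string, JobRow>();

  insert(row: JobRow): void {
    this.rows.set(row.id, { ...row });
  }

  // Mutates only when the row belongs to the caller's team AND project.
  // Returns false (and changes nothing) for a missing or out-of-scope row.
  transitionStatus(scope: TenantScope, id: string, status: string): boolean {
    const row = this.rows.get(id);
    if (!row || row.teamId !== scope.teamId || row.projectId !== scope.projectId) {
      return false;
    }
    row.status = status;
    return true;
  }

  statusOf(id: string): string | undefined {
    return this.rows.get(id)?.status;
  }
}
```

A negative-scope test in this style inserts a row for one tenant, attempts the mutation with another tenant's scope, and asserts that the call is refused and the stored status is unchanged.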

Anti-Pattern Guards

  • Do not edit src/storage/postgres/*.ts in this phase. If Phase A fails, open a separate fix-up commit; do not amend 4e0fc77a.

Phase B: Land Phase 2 (Server Runtime Boundary)

What To Run

  1. Phase 2 independence grep — Server beta runtime must not import worker:
    rg -n "WorkerService|services/worker-service|worker/http" \
      src/server/runtime src/npx-cli/commands/server.ts
    
    Allowed: matches inside src/services/worker-service.ts itself (delegation back to server-beta is fine). Forbidden: any import inside src/server/runtime/.
  2. Server-beta service test:
    bun test tests/server/server-beta-service.test.ts
    
  3. CLI namespace test:
    bun test tests/npx-cli-server-namespace.test.ts
    
  4. Build verifies server-beta-service.cjs bundle is produced:
    npm run build-and-sync
    ls -la plugin/scripts/server-beta-service.cjs
    
  5. Smoke test independence:
    npx claude-mem server status      # before start
    npx claude-mem server start
    npx claude-mem server status      # running, runtime=server-beta
    curl -s http://127.0.0.1:$(cat ~/.claude-mem/.server-beta.port)/healthz
    curl -s http://127.0.0.1:$(cat ~/.claude-mem/.server-beta.port)/v1/info
    npx claude-mem server stop
    
    Worker start|stop|status must remain functional throughout.

Commit Layout

Two commits, in order:

  1. feat(server-beta): add independent runtime service

    • src/server/runtime/ServerBetaService.ts
    • src/server/runtime/create-server-beta-service.ts
    • src/server/runtime/types.ts
    • src/server/routes/v1/ServerV1Routes.ts (runtime label)
    • src/services/server/Server.ts (runtime option)
    • src/shared/paths.ts (.server-beta.{pid,port,runtime.json})
    • tests/server/server-beta-service.test.ts
  2. feat(server-beta): route CLI lifecycle and build a separate bundle

    • scripts/build-hooks.js (server-beta bundle output)
    • src/npx-cli/commands/runtime.ts (server-beta lifecycle commands)
    • src/npx-cli/commands/server.ts (CLI routing)
    • src/services/worker-service.ts (delegate server-start|stop|restart|status to sibling bundle)
    • tests/npx-cli-server-namespace.test.ts

Documentation References

  • Parent plan, lines 469–514: Phase 2 deliverables and verification checklist.
  • src/services/server/Server.ts: existing route-composition style to copy.
  • src/services/infrastructure/ProcessManager.ts: PID-file safety patterns.

Verification Checklist

  • All five Phase B steps pass.
  • Worker lifecycle still works while server-beta is running, and vice versa.
  • Two commits land cleanly with no --amend or force operations.

Anti-Pattern Guards

  • Do not import WorkerService from src/server/runtime/.
  • Do not overload worker PID/port files.
  • Do not boot worker as a background dependency of server-beta.
  • Do not silently fall back from server-beta to worker.

Phase C: Decide Doc Artifacts

What To Decide

| File | Recommendation | Rationale |
| --- | --- | --- |
| PR_REORIENTATION_REPORT.md | Use as PR body, then delete (or move to docs/internal/). | It's a snapshot, not durable docs. Useful for the PR reviewer; rots in-tree. |
| AGENTS.md | Read first, then either commit (if generally useful guidance) or move under .scratch/. | Decision depends on content. |

Verification

  • Final git status shows only intended doc artifacts (or none).
  • .scratch/ is gitignored if used.

Anti-Pattern Guard

  • Do not push PR_REORIENTATION_REPORT.md to main as a doc; it has a date and a HEAD SHA, it ages immediately.

Phase D: Push and Open/Update PR

What To Run

  1. git push -u origin bullmq-vs-bee-queue-for-claude-mem-observation-que
  2. gh pr view --web (if PR exists) or gh pr create with body sourced from PR_REORIENTATION_REPORT.md.
  3. PR body must explicitly carve scope: "Includes Phase 1 + Phase 2 from plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md. Phases 3–13 are follow-ups on separate branches."

Verification Checklist

  • PR title is short (under 70 chars) and reflects scope: e.g., "Add Postgres storage + independent server-beta runtime (Phases 1–2)".
  • PR body lists out-of-scope phases.
  • CI is green.

Anti-Pattern Guards

  • Do not force-push to main.
  • Do not merge without CI green.

Phase E: Branch Closeout

Once the PR merges, this branch is done. Phase 3 (BullMQ-First Server Queue) starts on a fresh branch off main. Do not reuse this branch for Phase 3 work — keep the queue/runtime split visible in history.

Final Verification (cross-phase)

Run after Phases A–D:

git status                                   # clean or only intended doc artifacts
git log --oneline origin/main..HEAD          # 4e0fc77a + Phase 2 commits, no force-push markers
bun test tests/storage/postgres tests/server tests/npx-cli-server-namespace.test.ts
rg -n "WorkerService|services/worker-service|worker/http" src/server/runtime
rg -n "PendingMessageStore|SessionQueueProcessor" src/server/runtime

Expected:

  • All three test paths green.
  • Both greps return zero matches.
  • Branch ready to merge.

Decisions Locked

  1. Phase 1 gate: orchestrator-managed deterministic checks (no reviewer agent).
  2. AGENTS.md + PR_REORIENTATION_REPORT.md: discard before commit.
  3. Scope: this branch ships Phases 1 + 2 + 3 (BullMQ-First Server Queue). Phase E becomes Phase 3 work, push moves to Phase F.

Phase D (revised): Discard Untracked Doc Artifacts

rm AGENTS.md PR_REORIENTATION_REPORT.md

Verification: git status shows neither file.

Phase E: Implement Phase 3 — BullMQ-First Server Queue

Source: parent plan lines 515–570.

What To Implement

  • src/server/jobs/types.ts — job-shape types:
    • ServerGenerationJob (base)
    • GenerateObservationsForEventJob
    • GenerateObservationsForEventBatchJob
    • GenerateSessionSummaryJob
    • ReindexObservationJob
    • Every job carries team_id, project_id, source_type, source_id, generation_job_id. Event jobs add agent_event_id. Summary jobs add server_session_id. Reindex jobs add target observation ID or deterministic reindex scope ID.
  • src/server/jobs/job-id.ts — deterministic, colon-free job IDs (port the SHA-256-safe pattern from src/server/queue/BullMqObservationQueueEngine.ts).
  • src/server/jobs/ServerJobQueue.ts — thin wrapper around BullMQ Queue, Worker, QueueEvents. Use autorun: false, explicit concurrency: 1 default per lane, and an error listener on every Worker.
  • src/server/jobs/outbox.ts — durable outbox over ObservationGenerationJobRepository. Statuses: queued, processing, completed, failed, cancelled. Tracks attempts, last error, timestamps, and tenant/project/session IDs.
  • Startup reconciliation:
    • Re-enqueue rows in queued or stale processing.
    • Skip rows already completed.
    • Replace terminal BullMQ jobs before reusing deterministic IDs.
  • Wire queue health into /v1/info, /api/health, and claude-mem server status via the existing runtime label hook.
  • Activate the queue boundary in ServerBetaService (Phase 2 left it disabled). Provide a real adapter when CLAUDE_MEM_QUEUE_ENGINE=bullmq and REDIS_URL are present; keep the disabled adapter as the fallback.
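
The job shapes and deterministic IDs listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of src/server/jobs/: the `kind` discriminator values follow the four job kinds named in the commit message, while `agent_event_ids`, `reindex_target_id`, `JobIdentity`, `buildJobId`, the `-` separator, and the 32-character digest truncation are all hypothetical choices made for this example.

```typescript
import { createHash } from "node:crypto";

// Fields every job carries, per the list above.
interface ServerGenerationJobBase {
  team_id: string;
  project_id: string;
  source_type: string;
  source_id: string;
  generation_job_id: string;
}

interface GenerateObservationsForEventJob extends ServerGenerationJobBase {
  kind: "event";
  agent_event_id: string;
}

interface GenerateObservationsForEventBatchJob extends ServerGenerationJobBase {
  kind: "event-batch";
  agent_event_ids: string[]; // assumption: a batch carries several event IDs
}

interface GenerateSessionSummaryJob extends ServerGenerationJobBase {
  kind: "summary";
  server_session_id: string;
}

interface ReindexObservationJob extends ServerGenerationJobBase {
  kind: "reindex";
  reindex_target_id: string; // assumption: observation ID or reindex scope ID
}

// Discriminated union: narrowing on `kind` selects the per-kind fields.
type ServerGenerationJob =
  | GenerateObservationsForEventJob
  | GenerateObservationsForEventBatchJob
  | GenerateSessionSummaryJob
  | ReindexObservationJob;

// The inputs the ID is derived from: (kind, team, project, source).
interface JobIdentity {
  kind: ServerGenerationJob["kind"];
  team_id: string;
  project_id: string;
  source_type: string;
  source_id: string;
}

// Hashing the identity tuple keeps the ID stable across runs and guarantees
// that no ':' from user-supplied values can leak into the job ID.
function buildJobId(identity: JobIdentity): string {
  const digest = createHash("sha256")
    .update(
      JSON.stringify([
        identity.kind,
        identity.team_id,
        identity.project_id,
        identity.source_type,
        identity.source_id,
      ]),
    )
    .digest("hex");
  return `${identity.kind}-${digest.slice(0, 32)}`;
}
```

Because the ID depends only on the identity tuple, enqueueing the same source twice yields the same job ID, which is what makes duplicate suppression and terminal-job replacement possible.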

Documentation References

Verification Checklist

Unit tests under tests/server/jobs/:

  • job-id.test.ts — deterministic IDs, no colons, stable across runs, content-derived.
  • server-job-queue.test.ts — Queue/Worker lifecycle, error listener attached, concurrency honored, autorun false.
  • outbox.test.ts — duplicate enqueue suppression, terminal job replacement, status transitions, attempt counting.
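
The properties that server-job-queue.test.ts checks (autorun false, concurrency honored, error listener attached, double start refused) can be exercised without Redis if the wrapper accepts a worker factory. The sketch below shows one possible shape. `JobLane`, `WorkerLike`, `LaneWorkerOptions`, `JobProcessor`, and `WorkerFactory` are hypothetical names; in production the factory would presumably construct a BullMQ `Worker` with the same `autorun` and `concurrency` options and adapt its job argument to the processor.

```typescript
// Structural subset of a queue worker that the wrapper relies on.
interface WorkerLike {
  on(event: "error", listener: (err: Error) => void): unknown;
  run(): Promise<void>;
  close(): Promise<void>;
}

interface LaneWorkerOptions {
  autorun: boolean;
  concurrency: number;
}

type JobProcessor<T> = (data: T) => Promise<void>;

// Test seam: unit tests pass a fake factory instead of a Redis-backed worker.
type WorkerFactory<T> = (
  queueName: string,
  processor: JobProcessor<T>,
  options: LaneWorkerOptions,
) => WorkerLike;

class JobLane<T> {
  private worker: WorkerLike | null = null;
  private readonly queueName: string;
  private readonly createWorker: WorkerFactory<T>;
  private readonly concurrency: number;

  constructor(queueName: string, createWorker: WorkerFactory<T>, concurrency = 1) {
    this.queueName = queueName;
    this.createWorker = createWorker;
    this.concurrency = concurrency;
  }

  start(processor: JobProcessor<T>, onError: (err: Error) => void): void {
    if (this.worker) {
      throw new Error(`lane ${this.queueName} is already started`);
    }
    // autorun:false means the worker is idle until run() is called explicitly.
    const worker = this.createWorker(this.queueName, processor, {
      autorun: false,
      concurrency: this.concurrency,
    });
    // Attach the error listener before running so no error goes unhandled.
    worker.on("error", onError);
    this.worker = worker;
    worker.run().catch(onError);
  }

  async close(): Promise<void> {
    const worker = this.worker;
    this.worker = null;
    if (worker) await worker.close();
  }
}
```

Keeping the factory as a constructor argument means the lifecycle rules live in one place and the unit tests only need a fake that records the options it was given.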

Integration tests under tests/server/queue-bootstrap/:

  • Start ServerBetaService with Postgres + Valkey + queue boundary enabled.
  • Insert outbox rows directly through ObservationGenerationJobRepository.
  • Enqueue fake jobs; restart before fake processing completes.
  • Assert reconciliation re-enqueues exactly once and outbox status reaches completed exactly once.
  • Assert Redis-down fails Server beta startup when CLAUDE_MEM_QUEUE_ENGINE=bullmq; no silent fallback to SQLite.
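
The reconciliation behavior these integration tests assert can be sketched against in-memory fakes. The code below is a simplified model, not the real outbox: `OutboxRow`, `QueueLike`, `ReconcileOptions`, the staleness threshold, and the `"missing"` state are assumptions made for illustration. With BullMQ, `getState`, `remove`, and `add` would presumably be backed by looking up the job by its deterministic ID, removing it, and adding it again with the same `jobId`.

```typescript
type OutboxStatus = "queued" | "processing" | "completed" | "failed" | "cancelled";

interface OutboxRow {
  jobId: string;
  status: OutboxStatus;
  attempts: number;
  updatedAtMs: number;
}

// "missing" is a hypothetical marker for "no job with this ID in the queue".
type QueueJobState = "waiting" | "active" | "completed" | "failed" | "missing";

interface QueueLike {
  getState(jobId: string): Promise<QueueJobState>;
  remove(jobId: string): Promise<void>;
  add(jobId: string): Promise<void>;
}

interface ReconcileOptions {
  nowMs: number;
  staleAfterMs: number; // assumption: how long "processing" may sit untouched
  maxAttempts: number;
}

async function reconcileOnStartup(
  rows: OutboxRow[],
  queue: QueueLike,
  opts: ReconcileOptions,
): Promise<string[]> {
  const requeued: string[] = [];
  for (const row of rows) {
    const stale =
      row.status === "processing" && opts.nowMs - row.updatedAtMs >= opts.staleAfterMs;
    // Only queued rows and stale processing rows are candidates.
    if (row.status !== "queued" && !stale) continue;
    // Rows that exhausted their attempts are left alone.
    if (row.attempts >= opts.maxAttempts) continue;

    const state = await queue.getState(row.jobId);
    // A live job already holds the deterministic ID; adding again would duplicate work.
    if (state === "waiting" || state === "active") continue;
    // A terminal job still occupies the ID slot and must be replaced first.
    if (state === "completed" || state === "failed") await queue.remove(row.jobId);

    await queue.add(row.jobId);
    requeued.push(row.jobId);
  }
  return requeued;
}
```

Running the function twice over the same rows re-enqueues nothing the second time, because the jobs added on the first pass are reported as waiting or active.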

Greps:

rg -n "Bull(MQ|Mq).*\.add\(" src/server/jobs        # uses BullMQ Queue.add
rg -n "autorun" src/server/jobs                     # workers explicitly set autorun
rg -n "on\(['\"]error" src/server/jobs              # error listener attached
rg -n ":job:|:obs:" src/server/jobs                 # NO colons in deterministic IDs

The colon-grep must return zero matches.

Anti-Pattern Guards

  • Do not treat BullMQ completed/failed state as canonical history — Postgres outbox is canonical.
  • Do not require event-route wiring or provider generation here (Phase 4 territory).
  • Do not allow duplicate processor side effects on retry — keep observation writes idempotent by deterministic key.
  • Do not use BullMQ Pro-only features (groups).
  • Do not leave pending work only in Redis.
  • Do not silently fall back from BullMQ to SQLite when CLAUDE_MEM_QUEUE_ENGINE=bullmq is set.
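
The first and fifth guards above (Postgres outbox is canonical, no pending work only in Redis) imply a write-then-publish order with a recorded failure when publishing throws. A minimal sketch of that ordering follows. `OutboxStore`, `QueuePublisher`, and the method names are hypothetical stand-ins for the repository and the queue wrapper; the real signatures may differ.

```typescript
type RowStatus = "queued" | "processing" | "completed" | "failed" | "cancelled";

// Hypothetical stand-in for the Postgres-backed generation-job repository.
interface OutboxStore {
  insertQueued(jobId: string): Promise<void>;
  transition(jobId: string, to: RowStatus, lastError?: string): Promise<void>;
  appendEvent(jobId: string, event: string): Promise<void>;
}

// Hypothetical stand-in for the queue wrapper's publish path.
interface QueuePublisher {
  publish(jobId: string): Promise<void>;
}

async function enqueueOutbox(
  jobId: string,
  store: OutboxStore,
  publisher: QueuePublisher,
): Promise<RowStatus> {
  // 1. Durable row first, so the work is never known only to Redis.
  await store.insertQueued(jobId);
  try {
    // 2. Then hand the job to the queue.
    await publisher.publish(jobId);
    return "queued";
  } catch (err) {
    // 3. Publishing failed: record it in the canonical store instead of
    //    leaving a row that claims to be queued with no job behind it.
    const message = err instanceof Error ? err.message : String(err);
    await store.transition(jobId, "failed", message);
    await store.appendEvent(jobId, "failed");
    return "failed";
  }
}
```

If the process dies between steps 1 and 2, the row remains in queued and startup reconciliation picks it up, which is why the row is written before the publish rather than after.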

Commit Layout

Two commits:

  1. feat(server-beta): add BullMQ job queue primitives

    • src/server/jobs/types.ts
    • src/server/jobs/job-id.ts
    • src/server/jobs/ServerJobQueue.ts
    • src/server/jobs/outbox.ts
    • tests/server/jobs/*.test.ts
  2. feat(server-beta): activate queue boundary in runtime service

    • src/server/runtime/ServerBetaService.ts (queue boundary wiring)
    • src/server/runtime/create-server-beta-service.ts (boundary selection from env)
    • src/server/runtime/types.ts (active queue manager interface)
    • Health surface updates in /v1/info and /api/health if not already covered by Phase 2 runtime label.
    • tests/server/queue-bootstrap/*.test.ts
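
Boundary selection from env, as listed for create-server-beta-service.ts above, might look like the following sketch. The `BoundaryHealth` statuses follow the commit message ('disabled' | 'active' | 'errored' with optional details); `QueueManager`, `QueueEnv`, `selectQueueManager`, and the decision to throw when REDIS_URL is missing are assumptions that follow the no-silent-fallback guard rather than confirmed behavior.

```typescript
type BoundaryHealth =
  | { status: "disabled" }
  | { status: "active"; details?: Record<string, unknown> }
  | { status: "errored"; details?: Record<string, unknown> };

interface QueueManager {
  health(): BoundaryHealth;
}

interface QueueEnv {
  CLAUDE_MEM_QUEUE_ENGINE?: string;
  REDIS_URL?: string;
}

function selectQueueManager(
  env: QueueEnv,
  createActive: (redisUrl: string) => QueueManager,
): QueueManager {
  // Any engine other than bullmq keeps the disabled boundary.
  if (env.CLAUDE_MEM_QUEUE_ENGINE !== "bullmq") {
    return { health: () => ({ status: "disabled" }) };
  }
  // bullmq was requested explicitly: fail fast instead of falling back.
  if (!env.REDIS_URL) {
    throw new Error("CLAUDE_MEM_QUEUE_ENGINE=bullmq requires REDIS_URL");
  }
  // Errors thrown by createActive (for example, Redis unreachable) propagate.
  return createActive(env.REDIS_URL);
}
```

Injecting `createActive` keeps the selection logic testable without Redis and leaves connection errors to surface at startup.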

Phase F: Push and Open/Update PR

git push -u origin bullmq-vs-bee-queue-for-claude-mem-observation-que
gh pr view --web   # if PR exists
# else:
gh pr create --title "Server-beta: Postgres storage + independent runtime + BullMQ queue (Phases 1–3)"

PR body must list:

  • Scope: Phases 1, 2, 3 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.
  • Out of scope: Phases 4–13 (event-to-job pipeline, provider extraction, hook routing, MCP, compat, Docker, team auth, observability, final verification).

Verification Checklist

  • git status clean.
  • git log --oneline origin/main..HEAD shows all expected commits, no force-push markers.
  • CI green.

Final Cross-Phase Verification

git status                                                            # clean
bun test tests/storage/postgres tests/server tests/npx-cli-server-namespace.test.ts
rg -n "WorkerService|services/worker-service|worker/http" src/server/runtime    # zero
rg -n "PendingMessageStore|SessionQueueProcessor" src/server/runtime src/server/jobs  # zero