* Add server beta runtime foundation
* Address server beta review findings
* Resolve server beta review comments
* Tighten server beta review follow-ups
* Harden server beta auth and search
* Avoid unnecessary FTS rebuilds
* Block scoped keys from creating projects
* Release BullMQ claims best effort on close
* Address server beta review blockers
* Reset BullMQ claims best effort
* Add Postgres observation storage foundation
* feat(server-beta): add independent runtime service

  Introduce src/server/runtime/ as a self-contained server-beta runtime that owns its lifecycle, Postgres bootstrap, and HTTP boundary without depending on WorkerService. ServerBetaService wraps the existing Server class, exposes /healthz and /v1/info with runtime="server-beta", and persists state to dedicated paths (.server-beta.pid|.port|.runtime.json).

  The four boundary managers (queue, generation worker, provider registry, event broadcaster) are intentionally disabled in this phase and report their status through /v1/info; later phases activate them.

  Adds plans/2026-05-07-finish-bullmq-branch-ship-plan.md to track the remaining work for this branch.

  Phase 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): route CLI lifecycle and bundle separate runtime

  scripts/build-hooks.js now produces plugin/scripts/server-beta-service.cjs as a separate Node CJS bundle, alongside the existing worker-service bundle. The server-beta runtime is now installable independently.

  src/npx-cli/commands/server.ts routes start|stop|restart|status to the server-beta lifecycle instead of the legacy worker. The worker keeps its own start|stop|restart|status under the worker namespace; the two runtimes can be operated independently.

  src/services/worker-service.ts adds a server-* command parser branch that delegates to the sibling server-beta-service.cjs bundle so direct worker-service invocations still route to the right runtime.

  tests/npx-cli-server-namespace.test.ts updated to expect server-beta lifecycle routing.

  Includes rebuilt plugin/scripts/*.cjs bundles produced by build-and-sync.

  Phase 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): add BullMQ job queue primitives

  Introduce src/server/jobs/ as the queue-side primitives that Phase 3 of the server-beta runtime needs to operate.

  types.ts defines a discriminated union over the four job kinds (event, event-batch, summary, reindex) and maps each to a per-kind BullMQ queue name and deterministic-ID prefix.

  job-id.ts builds deterministic, colon-free BullMQ jobIds from (kind, team, project, source). The colon ban exists because BullMQ uses ':' as a Redis key separator internally; embedding ':' in jobIds breaks scan and state lookups.

  ServerJobQueue.ts is a thin wrapper over BullMQ Queue + Worker that enforces autorun:false, default concurrency 1, and an attached error listener, all per BullMQ docs requirements. Test seams accept queue and worker factories so unit tests do not need Redis.

  outbox.ts publishes through the Postgres ObservationGenerationJob repository as canonical history. enqueueOutbox writes the row first, then publishes to BullMQ; if BullMQ throws, the row is transitioned to failed and a failed event is appended. reconcileOnStartup re-enqueues queued + processing rows after a restart, replacing terminal BullMQ jobs that may still be holding the deterministic ID slot. markCompleted and markFailed wrap transitionStatus and append the matching event row.

  Includes 20 unit tests covering deterministic ID stability, colon-free output, queue lifecycle, error-listener attachment, double-start refusal, idempotent enqueue, BullMQ failure rollback, startup reconciliation, max-attempts skipping, and completion / failure / retry transitions.

  Phase 3 commit 1 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): activate queue boundary in runtime service

  Wire ActiveServerBetaQueueManager into the server-beta runtime graph. The active manager owns one ServerJobQueue per generation kind (event, event-batch, summary, reindex) and surfaces lane metadata through boundary health.

  Selection is opt-in and fail-fast: if CLAUDE_MEM_QUEUE_ENGINE is set to bullmq the active manager is constructed (and any Redis/config error throws; no silent fallback to SQLite, per Phase 3 anti-pattern guard). For any other engine the disabled boundary remains so worker-era and test setups stay compatible.

  Widens ServerBetaBoundaryHealth.status to a discriminated union ('disabled' | 'active' | 'errored') with optional details. The disabled adapter still emits status='disabled', which keeps the existing server-beta-service test green.

  ServerBetaService receives the manager through a new optional queueManager field on CreateServerBetaServiceOptions so test graphs and Phase 4 wiring can inject custom managers.

  Adds tests/server/runtime/active-queue-manager.test.ts covering bullmq guard, active health shape, per-kind queue access, close behavior, and post-close errored health.

  Phase 3 commit 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(server-beta): cap /v1/events/batch at 500 events

  Prevents unbounded array DoS surface flagged in PR review.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Finish BullMQ Observation Queue Branch — Ship Plan
Date: 2026-05-07
Branch: bullmq-vs-bee-queue-for-claude-mem-observation-que
Base: origin/main @ 0a43ab76
Parent plan: plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md
Reframe
The prior session believed Phase 1 was ungated because two reviewer agents failed: one returned not_found, and the other ("Carver") was user-aborted at 111.9s. That belief was based on a stale snapshot that predated commit 4e0fc77a ("Add Postgres observation storage foundation"). Phase 1 is committed. git status shows zero uncommitted changes under src/storage/postgres/.
What is actually dirty in the worktree is Phase 2: Define Server Runtime Boundary. The dirty files map 1:1 to that phase's "What To Implement" section. The remaining work to "finish this branch" is: confirm Phase 1 with concrete checks (not another reviewer agent), land Phase 2, push.
Phases 3–13 (BullMQ queue, event-to-job pipeline, provider extraction, hook routing, MCP, compat, Docker, team auth, observability, final verification) are explicitly out of scope for this branch. The PR is already 167 files / 23.5K insertions. Continuing past Phase 2 here would make review impossible.
Phase 0: Documentation Discovery
Sources Read
- plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md (parent plan, 987 lines, all 14 sections from Phase 0 through Phase 13)
- PR_REORIENTATION_REPORT.md (660 lines): independent inventory of committed + dirty surfaces
- git status, git log --oneline -15, git diff --stat HEAD
- Worktree: src/server/runtime/{ServerBetaService.ts,create-server-beta-service.ts,types.ts}
- Worktree: src/storage/postgres/ (already in commit 4e0fc77a)
Concrete Findings
- Phase 1 (Postgres storage foundation) is committed in 4e0fc77a. Includes scoped addSource, transitionStatus, generation-job event append, FTS via generated content_search tsvector + GIN index, tenant-scoped uniqueness constraints, and 20 integration tests including the negative-scope mutation test.
- Phase 2 (server runtime boundary) is implemented but uncommitted. Files match the parent plan's Phase 2 deliverables exactly: independent ServerBetaService, create-server-beta-service, disabled boundary types, .server-beta.{pid,port,runtime.json} paths, runtime labels in /api/health and /v1/info, server-beta CLI lifecycle, build-hooks split into a separate server-beta-service.cjs bundle, ephemeral-port test for /api/health and /v1/info.
- Two doc artifacts (AGENTS.md, PR_REORIENTATION_REPORT.md) are also untracked. Decide before push.
Anti-Pattern Guards (carried from parent plan)
- Do not spawn a third reviewer agent to "gate" Phase 1. The integration test suite plus the plan's grep checklist is the gate. Reviewer agents are a second opinion, not the primary gate.
- Do not pull Phase 3+ work into this branch.
- Do not amend 4e0fc77a to "tidy" Phase 1; create new commits.
- Do not couple Phase 2 to WorkerService (the entire point of Phase 2 is independence).
Phase A: Re-Confirm Phase 1 Gate (Deterministic, No Reviewer Agent)
What To Run
- tsc --noEmit scoped to Postgres storage:

      bunx tsc --noEmit src/storage/postgres/*.ts

- Postgres integration suite (requires DATABASE_URL or local Postgres on default port):

      bun test tests/storage/postgres

- Anti-pattern greps (must all return zero matches in src/storage/postgres/):

      rg -n "UNIQUE\s*\(\s*source_type\s*,\s*source_id\s*,\s*job_type\s*\)" src/storage/postgres
      rg -n "UNIQUE\s*\(\s*observation_id\s*,\s*source_type\s*,\s*source_id\s*\)" src/storage/postgres

- Scoped-mutation grep (must show projectId/teamId parameters):

      rg -n "addSource|transitionStatus|append" src/storage/postgres
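To make the intent of the two anti-pattern greps concrete, here is a small sketch that runs the same patterns against two DDL fragments. The fragments are hypothetical and only illustrate the shape being guarded against (a uniqueness constraint with no tenant columns in front); they are not copied from the repository's schema.

```typescript
// The same patterns the rg commands use, written as JavaScript regexes.
const unscopedJobUnique =
  /UNIQUE\s*\(\s*source_type\s*,\s*source_id\s*,\s*job_type\s*\)/;
const unscopedSourceUnique =
  /UNIQUE\s*\(\s*observation_id\s*,\s*source_type\s*,\s*source_id\s*\)/;

// Hypothetical DDL fragments, for illustration only.
const unscoped = "UNIQUE (source_type, source_id, job_type)";
const scoped = "UNIQUE (team_id, project_id, source_type, source_id, job_type)";

// True when a constraint omits the tenant columns the greps are guarding.
function hasUnscopedConstraint(ddl: string): boolean {
  return unscopedJobUnique.test(ddl) || unscopedSourceUnique.test(ddl);
}
```

A constraint that leads with tenant columns does not match, because each pattern requires the unscoped column to follow the opening parenthesis directly.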
Verification Checklist
- TypeScript clean.
- All 20 Postgres integration tests pass, including the negative-scope mutation test.
- Both anti-pattern greps return empty.
- Scoped-mutation grep shows projectId/teamId in every signature.
Anti-Pattern Guards
- Do not edit src/storage/postgres/*.ts in this phase. If Phase A fails, open a separate fix-up commit; do not amend 4e0fc77a.
Phase B: Land Phase 2 (Server Runtime Boundary)
What To Run
- Phase 2 independence grep: Server beta runtime must not import worker:

      rg -n "WorkerService|services/worker-service|worker/http" \
        src/server/runtime src/npx-cli/commands/server.ts

  Allowed: matches inside src/services/worker-service.ts itself (delegation back to server-beta is fine). Forbidden: any import inside src/server/runtime/.

- Server-beta service test:

      bun test tests/server/server-beta-service.test.ts

- CLI namespace test:

      bun test tests/npx-cli-server-namespace.test.ts

- Build verifies server-beta-service.cjs bundle is produced:

      npm run build-and-sync
      ls -la plugin/scripts/server-beta-service.cjs

- Smoke test independence:

      npx claude-mem server status   # before start
      npx claude-mem server start
      npx claude-mem server status   # running, runtime=server-beta
      curl -s http://127.0.0.1:$(cat ~/.claude-mem/.server-beta.port)/healthz
      curl -s http://127.0.0.1:$(cat ~/.claude-mem/.server-beta.port)/v1/info
      npx claude-mem server stop

  Worker start|stop|status must remain functional throughout.
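The independence grep could also live as a unit-test helper so the check runs with the rest of the suite. The sketch below assumes file contents have already been read into a map; findForbiddenReferences and Violation are hypothetical names, not existing repo code.

```typescript
// Same alternation as the rg independence grep.
const FORBIDDEN = /WorkerService|services\/worker-service|worker\/http/;

interface Violation {
  file: string;
  line: number; // 1-based, like rg -n
  text: string;
}

// Hypothetical helper: scan in-memory sources and report every matching line.
function findForbiddenReferences(sources: Record<string, string>): Violation[] {
  const violations: Violation[] = [];
  for (const [file, content] of Object.entries(sources)) {
    content.split("\n").forEach((text, index) => {
      if (FORBIDDEN.test(text)) {
        violations.push({ file, line: index + 1, text: text.trim() });
      }
    });
  }
  return violations;
}
```

An empty result corresponds to the grep returning zero matches for the files under src/server/runtime/.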
Commit Layout
Two commits, in order:
- feat(server-beta): add independent runtime service
  - src/server/runtime/ServerBetaService.ts
  - src/server/runtime/create-server-beta-service.ts
  - src/server/runtime/types.ts
  - src/server/routes/v1/ServerV1Routes.ts (runtime label)
  - src/services/server/Server.ts (runtime option)
  - src/shared/paths.ts (.server-beta.{pid,port,runtime.json})
  - tests/server/server-beta-service.test.ts
- feat(server-beta): route CLI lifecycle and build a separate bundle
  - scripts/build-hooks.js (server-beta bundle output)
  - src/npx-cli/commands/runtime.ts (server-beta lifecycle commands)
  - src/npx-cli/commands/server.ts (CLI routing)
  - src/services/worker-service.ts (delegate server-start|stop|restart|status to sibling bundle)
  - tests/npx-cli-server-namespace.test.ts
Documentation References
- Parent plan, lines 469–514: Phase 2 deliverables and verification checklist.
- src/services/server/Server.ts: existing route-composition style to copy.
- src/services/infrastructure/ProcessManager.ts: PID-file safety patterns.
Verification Checklist
- All five Phase B steps pass.
- Worker lifecycle still works while server-beta is running, and vice versa.
- Two commits land cleanly with no --amend or force operations.
Anti-Pattern Guards
- Do not import WorkerService anywhere inside src/server/runtime/.
- Do not overload worker PID/port files.
- Do not boot worker as a background dependency of server-beta.
- Do not silently fall back from server-beta to worker.
Phase C: Decide Doc Artifacts
What To Decide
| File | Recommendation | Rationale |
|---|---|---|
| PR_REORIENTATION_REPORT.md | Use as PR body, then delete (or move to docs/internal/). | It's a snapshot, not durable docs. Useful for the PR reviewer; rots in-tree. |
| AGENTS.md | Read first, then either commit (if generally useful guidance) or move under .scratch/. | Decision depends on content. |
Verification
- Final git status shows only intended doc artifacts (or none).
- .scratch/ is gitignored if used.
Anti-Pattern Guard
- Do not push PR_REORIENTATION_REPORT.md to main as a doc; it carries a date and a HEAD SHA, so it ages immediately.
Phase D: Push and Open/Update PR
What To Run
- git push -u origin bullmq-vs-bee-queue-for-claude-mem-observation-que
- gh pr view --web (if PR exists) or gh pr create with body sourced from PR_REORIENTATION_REPORT.md.
- PR body must explicitly carve scope: "Includes Phase 1 + Phase 2 from plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md. Phases 3–13 are follow-ups on separate branches."
Verification Checklist
- PR title is short (under 70 chars) and reflects scope: e.g., "Add Postgres storage + independent server-beta runtime (Phases 1–2)".
- PR body lists out-of-scope phases.
- CI is green.
Anti-Pattern Guards
- Do not force-push to main.
- Do not merge without CI green.
Phase E: Branch Closeout
Once the PR merges, this branch is done. Phase 3 (BullMQ-First Server Queue) starts on a fresh branch off main. Do not reuse this branch for Phase 3 work — keep the queue/runtime split visible in history.
Final Verification (cross-phase)
Run after Phases A–D:
git status # clean or only intended doc artifacts
git log --oneline origin/main..HEAD # 4e0fc77a + Phase 2 commits, no force-push markers
bun test tests/storage/postgres tests/server tests/npx-cli-server-namespace.test.ts
rg -n "WorkerService|services/worker-service|worker/http" src/server/runtime
rg -n "PendingMessageStore|SessionQueueProcessor" src/server/runtime
Expected:
- All three test paths green.
- Both greps return zero matches.
- Branch ready to merge.
Decisions Locked
- Phase 1 gate: orchestrator-managed deterministic checks (no reviewer agent).
- AGENTS.md + PR_REORIENTATION_REPORT.md: discard before commit.
- Scope: this branch ships Phases 1 + 2 + 3 (BullMQ-First Server Queue). Phase E becomes Phase 3 work, push moves to Phase F.
Phase D (revised): Discard Untracked Doc Artifacts
rm AGENTS.md PR_REORIENTATION_REPORT.md
Verification: git status shows neither file.
Phase E: Implement Phase 3 — BullMQ-First Server Queue
Source: parent plan lines 515–570.
What To Implement
- src/server/jobs/types.ts: job-shape types:
  - ServerGenerationJob (base)
  - GenerateObservationsForEventJob
  - GenerateObservationsForEventBatchJob
  - GenerateSessionSummaryJob
  - ReindexObservationJob
  - Every job carries team_id, project_id, source_type, source_id, generation_job_id. Event jobs add agent_event_id. Summary jobs add server_session_id. Reindex jobs add target observation ID or deterministic reindex scope ID.
- src/server/jobs/job-id.ts: deterministic, colon-free job IDs (port the SHA-256-safe pattern from src/server/queue/BullMqObservationQueueEngine.ts).
- src/server/jobs/ServerJobQueue.ts: thin wrapper around BullMQ Queue, Worker, QueueEvents. Use autorun: false, explicit concurrency: 1 default per lane, and an error listener on every Worker.
- src/server/jobs/outbox.ts: durable outbox over ObservationGenerationJobRepository. Statuses: queued, processing, completed, failed, cancelled. Tracks attempts, last error, timestamps, and tenant/project/session IDs.
- Startup reconciliation:
  - Re-enqueue rows in queued or stale processing.
  - Skip rows already completed.
  - Replace terminal BullMQ jobs before reusing deterministic IDs.
- Wire queue health into /v1/info, /api/health, and claude-mem server status via the existing runtime label hook.
- Activate the queue boundary in ServerBetaService (Phase 2 left it disabled). Provide a real adapter when CLAUDE_MEM_QUEUE_ENGINE=bullmq and REDIS_URL are present; keep the disabled adapter as the fallback.
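As a rough illustration of the job shapes and deterministic IDs listed above, the following sketch models the job kinds as a discriminated union and derives a colon-free ID from a SHA-256 digest. The field names follow the list; the kind values (event, event-batch, summary, reindex) come from the branch's commit notes. The reindex_target_id field, the buildJobId and jobIdFor helpers, the JSON-tuple hashing, and the 32-character truncation are assumptions made for illustration. This is not the pattern from BullMqObservationQueueEngine.ts, which is not reproduced here.

```typescript
import { createHash } from "node:crypto";

type JobKind = "event" | "event-batch" | "summary" | "reindex";

// Fields every job carries, per the list above.
interface ServerGenerationJobBase {
  team_id: string;
  project_id: string;
  source_type: string;
  source_id: string;
  generation_job_id: string;
}

// Discriminated union keyed on `kind`; batch-specific fields are not specified
// in the plan, so the batch variant only carries the base fields here.
type ServerGenerationJob =
  | (ServerGenerationJobBase & { kind: "event"; agent_event_id: string })
  | (ServerGenerationJobBase & { kind: "event-batch" })
  | (ServerGenerationJobBase & { kind: "summary"; server_session_id: string })
  | (ServerGenerationJobBase & { kind: "reindex"; reindex_target_id: string }); // hypothetical field name

// Hypothetical helper: deterministic, colon-free ID from (kind, team, project, source).
function buildJobId(kind: JobKind, teamId: string, projectId: string, sourceId: string): string {
  // JSON-encode the tuple so ("a", "bc") and ("ab", "c") hash differently.
  const digest = createHash("sha256")
    .update(JSON.stringify([kind, teamId, projectId, sourceId]))
    .digest("hex");
  // Only the kind label, '-', and hex digits appear, so no ':' can leak in from inputs.
  return `${kind}-${digest.slice(0, 32)}`;
}

function jobIdFor(job: ServerGenerationJob): string {
  return buildJobId(job.kind, job.team_id, job.project_id, job.source_id);
}
```

Hashing the inputs rather than concatenating them keeps the ID free of colons even when team, project, or source identifiers contain them.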
Documentation References
- BullMQ Workers: https://docs.bullmq.io/guide/workers
- BullMQ Concurrency: https://docs.bullmq.io/guide/workers/concurrency
- BullMQ Stalled Jobs: https://docs.bullmq.io/guide/jobs/stalled
- src/server/queue/BullMqObservationQueueEngine.ts: copy deterministic job-ID + Redis health patterns; do not copy the worker-iterator compatibility shape.
- src/server/queue/redis-config.ts: Valkey/Redis health checks.
- src/storage/postgres/generation-jobs.ts: outbox repository (already committed in 4e0fc77a).
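The worker rules from the BullMQ references (autorun off, explicit concurrency, an error listener on every worker) can be sketched with an injected factory so that no Redis is needed in unit tests. WorkerLike, LaneWorkerOptions, WorkerFactory, and LaneWorker are hypothetical names for this sketch; the real ServerJobQueue.ts may be shaped differently.

```typescript
// Structural subset of a BullMQ Worker that this sketch relies on.
interface WorkerLike {
  on(event: "error", listener: (err: Error) => void): unknown;
  run(): Promise<unknown>;
  close(): Promise<void>;
}

interface LaneWorkerOptions {
  autorun: boolean;
  concurrency: number;
}

// Test seam: production code would build a real BullMQ Worker here.
type WorkerFactory = (queueName: string, options: LaneWorkerOptions) => WorkerLike;

class LaneWorker {
  private readonly worker: WorkerLike;
  private started = false;
  readonly errors: Error[] = [];

  constructor(queueName: string, createWorker: WorkerFactory, concurrency = 1) {
    // autorun: false so nothing is processed until start() is called explicitly.
    this.worker = createWorker(queueName, { autorun: false, concurrency });
    // Attach an error listener so worker errors are observed, not left unhandled.
    this.worker.on("error", (err) => this.errors.push(err));
  }

  start(): void {
    if (this.started) throw new Error("worker already started");
    this.started = true;
    // run() resolves when the worker stops; record a rejection instead of dropping it.
    void this.worker.run().catch((err: unknown) => {
      this.errors.push(err instanceof Error ? err : new Error(String(err)));
    });
  }

  close(): Promise<void> {
    return this.worker.close();
  }
}
```

With BullMQ itself, the factory would presumably construct `new Worker(name, processor, { connection, ...options })`; the processor and connection wiring are left out of this sketch.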
Verification Checklist
Unit tests under tests/server/jobs/:
- job-id.test.ts: deterministic IDs, no colons, stable across runs, content-derived.
- server-job-queue.test.ts: Queue/Worker lifecycle, error listener attached, concurrency honored, autorun false.
- outbox.test.ts: duplicate enqueue suppression, terminal job replacement, status transitions, attempt counting.
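The outbox behaviour these unit tests target (row written first, publish second, failure recorded on the row) can be sketched with an in-memory stand-in. The sketch is synchronous for brevity, whereas the real repository and BullMQ calls are asynchronous; InMemoryOutbox, OutboxEntry, and the event strings are hypothetical and do not mirror ObservationGenerationJobRepository.

```typescript
type OutboxEntryStatus = "queued" | "processing" | "completed" | "failed" | "cancelled";

interface OutboxEntry {
  jobId: string;
  status: OutboxEntryStatus;
  attempts: number;
  lastError?: string;
}

// In-memory stand-in for the Postgres-backed outbox, for illustration only.
class InMemoryOutbox {
  readonly rows = new Map<string, OutboxEntry>();
  readonly events: string[] = [];

  enqueue(jobId: string, publish: (jobId: string) => void): OutboxEntry {
    const existing = this.rows.get(jobId);
    // Duplicate enqueue is suppressed: the deterministic ID already has a row.
    if (existing) return existing;

    // 1. Record the durable row first, so pending work never lives only in the queue.
    const row: OutboxEntry = { jobId, status: "queued", attempts: 0 };
    this.rows.set(jobId, row);
    this.events.push(`${jobId} queued`);

    try {
      // 2. Then hand the job to the queue.
      publish(jobId);
    } catch (err) {
      // 3. If publishing throws, mark the row failed and append a failed event.
      row.status = "failed";
      row.lastError = err instanceof Error ? err.message : String(err);
      this.events.push(`${jobId} failed`);
    }
    return row;
  }
}
```

Writing the row before publishing is what lets the outbox, rather than queue state, serve as canonical history.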
Integration tests under tests/server/queue-bootstrap/:
- Start ServerBetaService with Postgres + Valkey + queue boundary enabled.
- Insert outbox rows directly through ObservationGenerationJobRepository.
- Enqueue fake jobs; restart before fake processing completes.
- Assert reconciliation re-enqueues exactly once and outbox status reaches completed exactly once.
- Assert Redis-down fails Server beta startup when CLAUDE_MEM_QUEUE_ENGINE=bullmq; no silent fallback to SQLite.
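The reconciliation rules these integration tests assert can be expressed as a pure planning step, which is easier to unit-test than the full restart scenario. In this sketch the job-state union is a simplified local type ("absent" is a label invented here for "no job with that ID"), every processing row is treated as stale after a restart, and planReconciliation is a hypothetical function name.

```typescript
type RowStatus = "queued" | "processing" | "completed" | "failed" | "cancelled";

// Simplified view of queue-side job state; "absent" means no job holds that ID.
type BullJobState = "waiting" | "active" | "delayed" | "completed" | "failed" | "absent";

interface OutboxRow {
  id: string;
  jobId: string;
  status: RowStatus;
}

type ReconcileAction =
  | { type: "skip"; rowId: string }
  | { type: "enqueue"; rowId: string; jobId: string }
  | { type: "replace-then-enqueue"; rowId: string; jobId: string };

// Hypothetical planner: decide per row what startup reconciliation should do.
function planReconciliation(
  rows: OutboxRow[],
  bullStateOf: (jobId: string) => BullJobState,
): ReconcileAction[] {
  return rows.map((row): ReconcileAction => {
    // Only unfinished rows are candidates; completed/failed/cancelled rows are skipped.
    if (row.status !== "queued" && row.status !== "processing") {
      return { type: "skip", rowId: row.id };
    }
    const state = bullStateOf(row.jobId);
    // A terminal job still occupies the deterministic ID slot: remove it first.
    if (state === "completed" || state === "failed") {
      return { type: "replace-then-enqueue", rowId: row.id, jobId: row.jobId };
    }
    if (state === "absent") {
      return { type: "enqueue", rowId: row.id, jobId: row.jobId };
    }
    // The job is still live in the queue; adding it again would duplicate work.
    return { type: "skip", rowId: row.id };
  });
}
```

Keeping the decision separate from the side effects makes the "exactly once" assertion checkable without Redis.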
Greps:
rg -n "Bull(MQ|Mq).*\.add\(" src/server/jobs # uses BullMQ Queue.add
rg -n "autorun" src/server/jobs # workers explicitly set autorun
rg -n "on\(['\"]error" src/server/jobs # error listener attached
rg -n ":job:|:obs:" src/server/jobs # NO colons in deterministic IDs
The colon-grep must return zero matches.
Anti-Pattern Guards
- Do not treat BullMQ completed/failed state as canonical history — Postgres outbox is canonical.
- Do not require event-route wiring or provider generation here (Phase 4 territory).
- Do not allow duplicate processor side effects on retry — keep observation writes idempotent by deterministic key.
- Do not use BullMQ Pro-only features (groups).
- Do not leave pending work only in Redis.
- Do not silently fall back from BullMQ to SQLite when CLAUDE_MEM_QUEUE_ENGINE=bullmq is set.
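The no-silent-fallback guard amounts to a fail-fast selection step at startup. A minimal sketch follows, assuming the two environment variables named in this plan; selectQueueBoundary and QueueBoundary are hypothetical names, and treating a missing REDIS_URL as an error (rather than as "disabled") is one reading of the guard, not a confirmed detail of create-server-beta-service.ts.

```typescript
type QueueBoundary =
  | { kind: "disabled" }
  | { kind: "bullmq"; redisUrl: string };

// Hypothetical selector: opt in via CLAUDE_MEM_QUEUE_ENGINE, fail fast on bad config.
function selectQueueBoundary(env: Record<string, string | undefined>): QueueBoundary {
  // Any engine other than bullmq keeps the disabled boundary.
  if (env.CLAUDE_MEM_QUEUE_ENGINE !== "bullmq") {
    return { kind: "disabled" };
  }
  const redisUrl = env.REDIS_URL;
  if (!redisUrl) {
    // BullMQ was requested explicitly, so refuse to start instead of degrading quietly.
    throw new Error("CLAUDE_MEM_QUEUE_ENGINE=bullmq requires REDIS_URL");
  }
  return { kind: "bullmq", redisUrl };
}
```

In a runtime this would typically be called with process.env; connection failures after selection should surface as startup errors in the same way.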
Commit Layout
Two commits:
- feat(server-beta): add BullMQ job queue primitives
  - src/server/jobs/types.ts
  - src/server/jobs/job-id.ts
  - src/server/jobs/ServerJobQueue.ts
  - src/server/jobs/outbox.ts
  - tests/server/jobs/*.test.ts
- feat(server-beta): activate queue boundary in runtime service
  - src/server/runtime/ServerBetaService.ts (queue boundary wiring)
  - src/server/runtime/create-server-beta-service.ts (boundary selection from env)
  - src/server/runtime/types.ts (active queue manager interface)
  - Health surface updates in /v1/info and /api/health if not already covered by Phase 2 runtime label.
  - tests/server/queue-bootstrap/*.test.ts
Phase F: Push and Open/Update PR
git push -u origin bullmq-vs-bee-queue-for-claude-mem-observation-que
gh pr view --web # if PR exists
# else:
gh pr create --title "Server-beta: Postgres storage + independent runtime + BullMQ queue (Phases 1–3)"
PR body must list:
- Scope: Phases 1, 2, 3 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.
- Out of scope: Phases 4–13 (event-to-job pipeline, provider extraction, hook routing, MCP, compat, Docker, team auth, observability, final verification).
Verification Checklist
- git status clean.
- git log --oneline origin/main..HEAD shows all expected commits, no force-push markers.
- CI green.
Final Cross-Phase Verification
git status # clean
bun test tests/storage/postgres tests/server tests/npx-cli-server-namespace.test.ts
rg -n "WorkerService|services/worker-service|worker/http" src/server/runtime # zero
rg -n "PendingMessageStore|SessionQueueProcessor" src/server/runtime src/server/jobs # zero