# Finish BullMQ Observation Queue Branch — Ship Plan

Date: 2026-05-07
Branch: `bullmq-vs-bee-queue-for-claude-mem-observation-que`
Base: `origin/main` @ `0a43ab76`
Parent plan: `plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md`

## Reframe

The prior session believed Phase 1 was ungated because two reviewer agents failed: one returned `not_found`, and "Carver" was user-aborted at 111.9s. That belief was based on a stale snapshot that predated commit `4e0fc77a Add Postgres observation storage foundation`. **Phase 1 is committed.** `git status` shows zero uncommitted changes under `src/storage/postgres/`.

What is actually dirty in the worktree is **Phase 2: Define Server Runtime Boundary**. The dirty files map 1:1 to that phase's "What To Implement" section. The remaining work to "finish this branch" is: confirm Phase 1 with concrete checks (not another reviewer agent), land Phase 2, and push.

Phases 3–13 (BullMQ queue, event-to-job pipeline, provider extraction, hook routing, MCP, compat, Docker, team auth, observability, final verification) are explicitly **out of scope** for this branch. The PR is already 167 files / 23.5K insertions. Continuing past Phase 2 here would make review impossible.

## Phase 0: Documentation Discovery

### Sources Read

- `plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md` (parent plan, 987 lines, all 14 sections from Phase 0 through Phase 13)
- `PR_REORIENTATION_REPORT.md` (660 lines): independent inventory of committed + dirty surfaces
- `git status`, `git log --oneline -15`, `git diff --stat HEAD`
- Worktree: `src/server/runtime/{ServerBetaService.ts,create-server-beta-service.ts,types.ts}`
- Worktree: `src/storage/postgres/` (already in commit `4e0fc77a`)

### Concrete Findings

- Phase 1 (Postgres storage foundation) is committed in `4e0fc77a`. Includes scoped `addSource`, `transitionStatus`, generation-job event `append`, FTS via generated `content_search` tsvector + GIN index, tenant-scoped uniqueness constraints, and 20 integration tests including the negative-scope mutation test.
- Phase 2 (server runtime boundary) is implemented but uncommitted. Files match the parent plan's Phase 2 deliverables exactly: independent `ServerBetaService`, `create-server-beta-service`, disabled boundary types, `.server-beta.{pid,port,runtime.json}` paths, runtime labels in `/api/health` and `/v1/info`, server-beta CLI lifecycle, build-hooks split into a separate `server-beta-service.cjs` bundle, ephemeral-port test for `/api/health` and `/v1/info`.
- Two doc artifacts (`AGENTS.md`, `PR_REORIENTATION_REPORT.md`) are also untracked. Decide before push.

### Anti-Pattern Guards (carried from parent plan)

- Do not spawn a third reviewer agent to "gate" Phase 1. The integration test suite plus the plan's grep checklist is the gate. Reviewer agents are a second opinion, not the primary gate.
- Do not pull Phase 3+ work into this branch.
- Do not amend `4e0fc77a` to "tidy" Phase 1; create new commits.
- Do not couple Phase 2 to `WorkerService` (the entire point of Phase 2 is independence).

## Phase A: Re-Confirm Phase 1 Gate (Deterministic, No Reviewer Agent)

### What To Run

1. `tsc --noEmit` scoped to Postgres storage:

   ```bash
   # Note: tsc ignores tsconfig.json when files are passed explicitly.
   bunx tsc --noEmit src/storage/postgres/*.ts
   ```

2. Postgres integration suite (requires `DATABASE_URL` or local Postgres on default port):

   ```bash
   bun test tests/storage/postgres
   ```

3. Anti-pattern greps (must all return zero matches in `src/storage/postgres/`):

   ```bash
   rg -n "UNIQUE\s*\(\s*source_type\s*,\s*source_id\s*,\s*job_type\s*\)" src/storage/postgres
   rg -n "UNIQUE\s*\(\s*observation_id\s*,\s*source_type\s*,\s*source_id\s*\)" src/storage/postgres
   ```

4. Scoped-mutation grep (must show `projectId`/`teamId` parameters):

   ```bash
   rg -n "addSource|transitionStatus|append" src/storage/postgres
   ```

### Verification Checklist

- TypeScript clean.
- All 20 Postgres integration tests pass, including the negative-scope mutation test.
- Both anti-pattern greps return empty.
- Scoped-mutation grep shows `projectId`/`teamId` in every signature.

### Anti-Pattern Guards

- Do not edit `src/storage/postgres/*.ts` in this phase. If Phase A fails, open a separate fix-up commit; do not amend `4e0fc77a`.

## Phase B: Land Phase 2 (Server Runtime Boundary)

### What To Run

1. Phase 2 independence grep. The server-beta runtime must not import the worker:

   ```bash
   rg -n "WorkerService|services/worker-service|worker/http" \
     src/server/runtime src/npx-cli/commands/server.ts
   ```

   Allowed: matches inside `src/services/worker-service.ts` itself (delegation back to server-beta is fine). Forbidden: any import inside `src/server/runtime/`.

2. Server-beta service test:

   ```bash
   bun test tests/server/server-beta-service.test.ts
   ```

3. CLI namespace test:

   ```bash
   bun test tests/npx-cli-server-namespace.test.ts
   ```

4. Build verifies `server-beta-service.cjs` bundle is produced:

   ```bash
   npm run build-and-sync
   ls -la plugin/scripts/server-beta-service.cjs
   ```

5. Smoke test independence:

   ```bash
   npx claude-mem server status   # before start
   npx claude-mem server start
   npx claude-mem server status   # running, runtime=server-beta
   curl -s http://127.0.0.1:$(cat ~/.claude-mem/.server-beta.port)/healthz
   curl -s http://127.0.0.1:$(cat ~/.claude-mem/.server-beta.port)/v1/info
   npx claude-mem server stop
   ```

   Worker `start|stop|status` must remain functional throughout.

### Commit Layout

Two commits, in order:

1. **`feat(server-beta): add independent runtime service`**
   - `src/server/runtime/ServerBetaService.ts`
   - `src/server/runtime/create-server-beta-service.ts`
   - `src/server/runtime/types.ts`
   - `src/server/routes/v1/ServerV1Routes.ts` (runtime label)
   - `src/services/server/Server.ts` (runtime option)
   - `src/shared/paths.ts` (`.server-beta.{pid,port,runtime.json}`)
   - `tests/server/server-beta-service.test.ts`

2. **`feat(server-beta): route CLI lifecycle and build a separate bundle`**
   - `scripts/build-hooks.js` (server-beta bundle output)
   - `src/npx-cli/commands/runtime.ts` (server-beta lifecycle commands)
   - `src/npx-cli/commands/server.ts` (CLI routing)
   - `src/services/worker-service.ts` (delegate `server-start|stop|restart|status` to sibling bundle)
   - `tests/npx-cli-server-namespace.test.ts`

### Documentation References

- Parent plan, lines 469–514: Phase 2 deliverables and verification checklist.
- `src/services/server/Server.ts`: existing route-composition style to copy.
- `src/services/infrastructure/ProcessManager.ts`: PID-file safety patterns.

### Verification Checklist

- All five Phase B steps pass.
- Worker lifecycle still works while server-beta is running, and vice versa.
- Two commits land cleanly with no `--amend` or force operations.

### Anti-Pattern Guards

- Do not import `WorkerService` from `src/server/runtime/`.
- Do not overload worker PID/port files.
- Do not boot worker as a background dependency of server-beta.
- Do not silently fall back from server-beta to worker.

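
To make the Phase 2 "disabled boundary" idea concrete, here is a minimal sketch of how the four boundaries (queue, generation worker, provider registry, event broadcaster) might report themselves through `/v1/info`. Only the `runtime` label value `server-beta` and the fact that the boundaries are disabled in this phase come from the plan; `BoundaryName`, `BoundaryHealth`, `RuntimeInfo`, and both helper functions are hypothetical names, not the types in `src/server/runtime/types.ts`.

```typescript
// Hypothetical shapes for illustration; the real types live in
// src/server/runtime/types.ts and may differ.
type BoundaryName =
  | "queue"
  | "generationWorker"
  | "providerRegistry"
  | "eventBroadcaster";

interface BoundaryHealth {
  status: "disabled";
  reason: string;
}

interface RuntimeInfo {
  runtime: "server-beta";
  boundaries: Record<BoundaryName, BoundaryHealth>;
}

// A disabled boundary does no work; it only describes itself.
function disabledBoundary(name: BoundaryName): BoundaryHealth {
  return { status: "disabled", reason: `${name} is not activated in Phase 2` };
}

// Assumed payload shape for /v1/info: a runtime label plus per-boundary health.
function buildRuntimeInfo(): RuntimeInfo {
  return {
    runtime: "server-beta",
    boundaries: {
      queue: disabledBoundary("queue"),
      generationWorker: disabledBoundary("generationWorker"),
      providerRegistry: disabledBoundary("providerRegistry"),
      eventBroadcaster: disabledBoundary("eventBroadcaster"),
    },
  };
}

console.log(JSON.stringify(buildRuntimeInfo(), null, 2));
```

Reporting a disabled boundary explicitly, rather than omitting it, keeps the smoke test in step 5 able to distinguish "not yet activated" from "missing".
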
## Phase C: Decide Doc Artifacts

### What To Decide

| File | Recommendation | Rationale |
|------|---------------|-----------|
| `PR_REORIENTATION_REPORT.md` | Use as PR body, then delete (or move to `docs/internal/`). | It's a snapshot, not durable docs. Useful for the PR reviewer; rots in-tree. |
| `AGENTS.md` | Read first, then either commit (if generally useful guidance) or move under `.scratch/`. | Decision depends on content. |

### Verification

- Final `git status` shows only intended doc artifacts (or none).
- `.scratch/` is gitignored if used.

### Anti-Pattern Guard

- Do not push `PR_REORIENTATION_REPORT.md` to main as a doc; it carries a date and a HEAD SHA, so it ages immediately.

## Phase D: Push and Open/Update PR

### What To Run

1. `git push -u origin bullmq-vs-bee-queue-for-claude-mem-observation-que`
2. `gh pr view --web` (if PR exists) or `gh pr create` with body sourced from `PR_REORIENTATION_REPORT.md`.
3. PR body must explicitly carve scope: "Includes Phase 1 + Phase 2 from `plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md`. Phases 3–13 are follow-ups on separate branches."

### Verification Checklist

- PR title is short (under 70 chars) and reflects scope: e.g., "Add Postgres storage + independent server-beta runtime (Phases 1–2)".
- PR body lists out-of-scope phases.
- CI is green.

### Anti-Pattern Guards

- Do not force-push to main.
- Do not merge without CI green.

## Phase E: Branch Closeout

Once the PR merges, this branch is done. Phase 3 (BullMQ-First Server Queue) starts on a fresh branch off main. Do not reuse this branch for Phase 3 work; keep the queue/runtime split visible in history.

## Final Verification (cross-phase)

Run after Phases A–D:

```bash
git status                                  # clean or only intended doc artifacts
git log --oneline origin/main..HEAD         # 4e0fc77a + Phase 2 commits, no force-push markers
bun test tests/storage/postgres tests/server tests/npx-cli-server-namespace.test.ts
rg -n "WorkerService|services/worker-service|worker/http" src/server/runtime
rg -n "PendingMessageStore|SessionQueueProcessor" src/server/runtime
```

Expected:

- All three test paths green.
- Both greps return zero matches.
- Branch ready to merge.

## Decisions Locked

1. Phase 1 gate: orchestrator-managed deterministic checks (no reviewer agent).
2. `AGENTS.md` + `PR_REORIENTATION_REPORT.md`: **discard** before commit.
3. Scope: this branch ships Phases 1 + 2 + **3** (BullMQ-First Server Queue). Phase E becomes Phase 3 work, push moves to Phase F.

## Phase D (revised): Discard Untracked Doc Artifacts

```bash
rm AGENTS.md PR_REORIENTATION_REPORT.md
```

Verification: `git status` shows neither file.

## Phase E: Implement Phase 3 (BullMQ-First Server Queue)

Source: parent plan lines 515–570.

### What To Implement

- `src/server/jobs/types.ts` defines the job-shape types:
  - `ServerGenerationJob` (base)
  - `GenerateObservationsForEventJob`
  - `GenerateObservationsForEventBatchJob`
  - `GenerateSessionSummaryJob`
  - `ReindexObservationJob`
  - Every job carries `team_id`, `project_id`, `source_type`, `source_id`, `generation_job_id`. Event jobs add `agent_event_id`. Summary jobs add `server_session_id`. Reindex jobs add target observation ID or deterministic reindex scope ID.
- `src/server/jobs/job-id.ts`: deterministic, colon-free job IDs (port the SHA-256-safe pattern from `src/server/queue/BullMqObservationQueueEngine.ts`).
- `src/server/jobs/ServerJobQueue.ts`: thin wrapper around BullMQ `Queue`, `Worker`, `QueueEvents`. Use `autorun: false`, explicit `concurrency: 1` default per lane, and an `error` listener on every `Worker`.
- `src/server/jobs/outbox.ts`: durable outbox over `ObservationGenerationJobRepository`. Statuses: `queued`, `processing`, `completed`, `failed`, `cancelled`. Tracks attempts, last error, timestamps, and tenant/project/session IDs.
- Startup reconciliation:
  - Re-enqueue rows in `queued` or stale `processing`.
  - Skip rows already `completed`.
  - Replace terminal BullMQ jobs before reusing deterministic IDs.
- Wire queue health into `/v1/info`, `/api/health`, and `claude-mem server status` via the existing runtime label hook.
- Activate the queue boundary in `ServerBetaService` (Phase 2 left it disabled). Provide a real adapter when `CLAUDE_MEM_QUEUE_ENGINE=bullmq` and `REDIS_URL` are present. Keep the disabled adapter for any other engine value; when `bullmq` is selected, a Redis or config error must fail startup rather than fall back.

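
The job shapes listed above lend themselves to a discriminated union. The following is a minimal sketch under stated assumptions, not the contents of `src/server/jobs/types.ts`: the `kind` discriminant, the `agent_event_ids` field on batch jobs, the `target` field on reindex jobs, the `AnyServerGenerationJob` alias, and the queue names in `QUEUE_NAMES` are all hypothetical. Only the five type names and the shared/extra field names come from the plan.

```typescript
// Hypothetical discriminant; the plan names the job types but not this field.
type JobKind = "event" | "event-batch" | "summary" | "reindex";

// Fields every job carries, per the plan.
interface ServerGenerationJob {
  kind: JobKind;
  team_id: string;
  project_id: string;
  source_type: string;
  source_id: string;
  generation_job_id: string;
}

interface GenerateObservationsForEventJob extends ServerGenerationJob {
  kind: "event";
  agent_event_id: string;
}

interface GenerateObservationsForEventBatchJob extends ServerGenerationJob {
  kind: "event-batch";
  agent_event_ids: string[]; // assumption: a batch carries several event IDs
}

interface GenerateSessionSummaryJob extends ServerGenerationJob {
  kind: "summary";
  server_session_id: string;
}

interface ReindexObservationJob extends ServerGenerationJob {
  kind: "reindex";
  // Either a single observation or a deterministic reindex scope.
  target: { observation_id: string } | { reindex_scope_id: string };
}

type AnyServerGenerationJob =
  | GenerateObservationsForEventJob
  | GenerateObservationsForEventBatchJob
  | GenerateSessionSummaryJob
  | ReindexObservationJob;

// Hypothetical lane names, one queue per kind, deliberately colon-free.
const QUEUE_NAMES: Record<JobKind, string> = {
  event: "server-beta-event",
  "event-batch": "server-beta-event-batch",
  summary: "server-beta-summary",
  reindex: "server-beta-reindex",
};

// Exhaustive switch: adding a new kind without handling it fails to compile.
function describeJob(job: AnyServerGenerationJob): string {
  switch (job.kind) {
    case "event":
      return `${QUEUE_NAMES.event} <- event ${job.agent_event_id}`;
    case "event-batch":
      return `${QUEUE_NAMES["event-batch"]} <- ${job.agent_event_ids.length} events`;
    case "summary":
      return `${QUEUE_NAMES.summary} <- session ${job.server_session_id}`;
    case "reindex":
      return "observation_id" in job.target
        ? `${QUEUE_NAMES.reindex} <- observation ${job.target.observation_id}`
        : `${QUEUE_NAMES.reindex} <- scope ${job.target.reindex_scope_id}`;
    default: {
      const unreachable: never = job;
      throw new Error(`unhandled job: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Keeping the tenant and project fields on the base interface means no job kind can be constructed without scope, which mirrors the scoped-mutation requirement from Phase 1.
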
### Documentation References

- BullMQ Workers: https://docs.bullmq.io/guide/workers
- BullMQ Concurrency: https://docs.bullmq.io/guide/workers/concurrency
- BullMQ Stalled Jobs: https://docs.bullmq.io/guide/jobs/stalled
- `src/server/queue/BullMqObservationQueueEngine.ts`: copy deterministic job-ID + Redis health patterns; do **not** copy the worker-iterator compatibility shape.
- `src/server/queue/redis-config.ts`: Valkey/Redis health checks.
- `src/storage/postgres/generation-jobs.ts`: outbox repository (already committed in `4e0fc77a`).

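
As an illustration of the deterministic, colon-free job-ID requirement, here is a minimal sketch that hashes the identifying fields with SHA-256. It is not the pattern in `BullMqObservationQueueEngine.ts`, which this plan references but does not show; `JobIdParts` and `buildJobId` are hypothetical names, and the choice of input fields is an assumption. The colon ban matters because BullMQ uses `:` as the separator in its Redis keys.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape; the real helper may take different fields.
interface JobIdParts {
  kind: string;
  teamId: string;
  projectId: string;
  sourceType: string;
  sourceId: string;
}

function buildJobId(parts: JobIdParts): string {
  // JSON-encode an ordered tuple so field boundaries stay unambiguous
  // ("a" + "bc" must not collide with "ab" + "c").
  const canonical = JSON.stringify([
    parts.kind,
    parts.teamId,
    parts.projectId,
    parts.sourceType,
    parts.sourceId,
  ]);
  // A hex digest contains only [0-9a-f], so it can never contain ':'.
  const digest = createHash("sha256").update(canonical).digest("hex");
  // Sanitize the readable prefix too, in case a kind ever contains ':'.
  const prefix = parts.kind.replace(/[^a-zA-Z0-9_-]/g, "-");
  return `${prefix}-${digest}`;
}

// Usage: the same inputs always produce the same ID.
const exampleId = buildJobId({
  kind: "event",
  teamId: "team-1",
  projectId: "project-1",
  sourceType: "agent_event",
  sourceId: "evt-42",
});
console.log(exampleId);
```

Because the ID is derived purely from content, a retried enqueue for the same source maps to the same BullMQ job slot, which is what makes duplicate suppression possible.
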
### Verification Checklist

Unit tests under `tests/server/jobs/`:

- `job-id.test.ts`: deterministic IDs, no colons, stable across runs, content-derived.
- `server-job-queue.test.ts`: Queue/Worker lifecycle, `error` listener attached, concurrency honored, autorun false.
- `outbox.test.ts`: duplicate enqueue suppression, terminal job replacement, status transitions, attempt counting.

Integration tests under `tests/server/queue-bootstrap/`:

- Start `ServerBetaService` with Postgres + Valkey + queue boundary enabled.
- Insert outbox rows directly through `ObservationGenerationJobRepository`.
- Enqueue fake jobs; restart before fake processing completes.
- Assert reconciliation re-enqueues exactly once and outbox status reaches `completed` exactly once.
- Assert Redis-down fails Server beta startup when `CLAUDE_MEM_QUEUE_ENGINE=bullmq`; no silent fallback to SQLite.

Greps:

```bash
rg -n "Bull(MQ|Mq).*\.add\(" src/server/jobs   # uses BullMQ Queue.add
rg -n "autorun" src/server/jobs                # workers explicitly set autorun
rg -n "on\(['\"]error" src/server/jobs         # error listener attached
rg -n ":job:|:obs:" src/server/jobs            # NO colons in deterministic IDs
```

The colon-grep must return zero matches.

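
The reconciliation behavior that the integration tests assert can be sketched against an in-memory stand-in, with no Redis or Postgres involved. Everything here is hypothetical: `QueueSeam`, `InMemoryQueue`, `OutboxRow`, the `maxAttempts` default, and the decision to leave jobs that are still `waiting` or `active` alone are assumptions made for illustration, not the behavior of `src/server/jobs/outbox.ts`. The fake ignores an `add` for an ID that already exists, which is the reason terminal jobs must be removed before a deterministic ID is reused.

```typescript
type OutboxStatus = "queued" | "processing" | "completed" | "failed" | "cancelled";
type QueueJobState = "waiting" | "active" | "completed" | "failed";

interface OutboxRow {
  id: string;
  jobId: string; // deterministic, colon-free
  status: OutboxStatus;
  attempts: number;
}

// Hypothetical seam; a real adapter would wrap a BullMQ Queue.
interface QueueSeam {
  getState(jobId: string): Promise<QueueJobState | undefined>;
  remove(jobId: string): Promise<void>;
  add(jobId: string, rowId: string): Promise<void>;
}

// In-memory fake: an add with an existing ID is ignored.
class InMemoryQueue implements QueueSeam {
  readonly jobs = new Map<string, QueueJobState>();
  async getState(jobId: string): Promise<QueueJobState | undefined> {
    return this.jobs.get(jobId);
  }
  async remove(jobId: string): Promise<void> {
    this.jobs.delete(jobId);
  }
  async add(jobId: string, _rowId: string): Promise<void> {
    if (!this.jobs.has(jobId)) this.jobs.set(jobId, "waiting");
  }
}

// Returns the outbox row IDs that were re-enqueued.
async function reconcileOnStartup(
  rows: OutboxRow[],
  queue: QueueSeam,
  maxAttempts = 3, // assumption: rows at the attempt limit are skipped
): Promise<string[]> {
  const reEnqueued: string[] = [];
  for (const row of rows) {
    // Only unfinished rows are candidates; completed rows are skipped.
    if (row.status !== "queued" && row.status !== "processing") continue;
    if (row.attempts >= maxAttempts) continue;

    const state = await queue.getState(row.jobId);
    // Assumption: a job still live in the queue needs no second copy.
    if (state === "waiting" || state === "active") continue;
    // A terminal job still holds the deterministic ID slot; free it first.
    if (state === "completed" || state === "failed") await queue.remove(row.jobId);

    await queue.add(row.jobId, row.id);
    row.status = "queued";
    reEnqueued.push(row.id);
  }
  return reEnqueued;
}
```

The outbox row, not the queue entry, decides whether work is pending; the queue is only consulted to avoid duplicates and to free ID slots.
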
### Anti-Pattern Guards

- Do not treat BullMQ completed/failed state as canonical history; the Postgres outbox is canonical.
- Do not require event-route wiring or provider generation here (Phase 4 territory).
- Do not allow duplicate processor side effects on retry; keep observation writes idempotent by deterministic key.
- Do not use BullMQ Pro-only features (groups).
- Do not leave pending work only in Redis.
- Do not silently fall back from BullMQ to SQLite when `CLAUDE_MEM_QUEUE_ENGINE=bullmq` is set.

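
The "no silent fallback" guard can be expressed as a small selection function. This is a sketch of one reading of the plan, not the code in `create-server-beta-service.ts`: `selectQueueBoundary`, `QueueEnv`, and the `QueueBoundary` shape are hypothetical names. The assumed rule is that the disabled boundary is chosen only when the engine is something other than `bullmq`; once `bullmq` is requested, a missing `REDIS_URL` throws.

```typescript
// Hypothetical view of the relevant environment variables.
interface QueueEnv {
  CLAUDE_MEM_QUEUE_ENGINE?: string;
  REDIS_URL?: string;
}

// Hypothetical boundary descriptor returned by selection.
type QueueBoundary =
  | { status: "disabled"; engine: null }
  | { status: "active"; engine: "bullmq"; redisUrl: string };

function selectQueueBoundary(env: QueueEnv): QueueBoundary {
  // Opt-in: any engine other than bullmq keeps the disabled boundary.
  if (env.CLAUDE_MEM_QUEUE_ENGINE !== "bullmq") {
    return { status: "disabled", engine: null };
  }
  // Fail fast: bullmq was requested explicitly, so a config gap is an error,
  // never a reason to fall back to another engine.
  if (!env.REDIS_URL) {
    throw new Error(
      "CLAUDE_MEM_QUEUE_ENGINE=bullmq requires REDIS_URL; refusing to fall back",
    );
  }
  return { status: "active", engine: "bullmq", redisUrl: env.REDIS_URL };
}

// Usage: an unset engine stays disabled, an explicit bullmq request is strict.
console.log(selectQueueBoundary({}).status);
console.log(
  selectQueueBoundary({
    CLAUDE_MEM_QUEUE_ENGINE: "bullmq",
    REDIS_URL: "redis://127.0.0.1:6379",
  }).status,
);
```

A real implementation would also need to treat an unreachable Redis as a startup failure, which this sketch does not attempt since it performs no network I/O.
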
### Commit Layout

Two commits:

1. **`feat(server-beta): add BullMQ job queue primitives`**
   - `src/server/jobs/types.ts`
   - `src/server/jobs/job-id.ts`
   - `src/server/jobs/ServerJobQueue.ts`
   - `src/server/jobs/outbox.ts`
   - `tests/server/jobs/*.test.ts`

2. **`feat(server-beta): activate queue boundary in runtime service`**
   - `src/server/runtime/ServerBetaService.ts` (queue boundary wiring)
   - `src/server/runtime/create-server-beta-service.ts` (boundary selection from env)
   - `src/server/runtime/types.ts` (active queue manager interface)
   - Health surface updates in `/v1/info` and `/api/health` if not already covered by Phase 2 runtime label.
   - `tests/server/queue-bootstrap/*.test.ts`

## Phase F: Push and Open/Update PR

```bash
git push -u origin bullmq-vs-bee-queue-for-claude-mem-observation-que
gh pr view --web   # if PR exists
# else:
gh pr create --title "Server-beta: Postgres storage + independent runtime + BullMQ queue (Phases 1–3)"
```

PR body must list:

- Scope: Phases 1, 2, 3 of `plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md`.
- Out of scope: Phases 4–13 (event-to-job pipeline, provider extraction, hook routing, MCP, compat, Docker, team auth, observability, final verification).

### Verification Checklist

- `git status` clean.
- `git log --oneline origin/main..HEAD` shows all expected commits, no force-push markers.
- CI green.

## Final Cross-Phase Verification

```bash
git status                                                                            # clean
bun test tests/storage/postgres tests/server tests/npx-cli-server-namespace.test.ts
rg -n "WorkerService|services/worker-service|worker/http" src/server/runtime          # zero
rg -n "PendingMessageStore|SessionQueueProcessor" src/server/runtime src/server/jobs  # zero
```