The repo shipped both a root-level .mcp.json and plugin/.mcp.json with
identical mcp-search launchers — kept in sync by a build-time guard and
a test. The root file was a holdover from when devs working inside the
repo could load mem-search without installing the plugin. With the
plugin universally installed, every plugin user now sees `/doctor` warn:
Plugin (claude-mem @ plugin:claude-mem:mcp-search): MCP server
"mcp-search" skipped — same command/URL as already-configured
"mcp-search"
…because Claude Code dedupes by command and skips the plugin's
namespaced registration. The duplicate is functionally harmless but
suppresses the canonical `plugin:claude-mem:mcp-search` entry.
This removes the root .mcp.json entirely and re-points everything that
referenced it at the bundled plugin copy:
- .mcp.json: deleted
- .codex-plugin/plugin.json: mcpServers → ./plugin/.mcp.json
- package.json: drop .mcp.json from files
- scripts/build-hooks.js: drop root-file requirement + sync check
- scripts/sync-marketplace.cjs: drop syncManagedFiles entry
- src/npx-cli/commands/install.ts: drop from allowedTopLevelEntries
- tests/infrastructure/plugin-distribution.test.ts: drop two tests
enforcing the now-removed root file
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(skills): add oh-my-issues for root-cause issue clustering
Codifies the consolidation method that turned ~100 open issues into 6
plan-master issues during the v13.0.1 cycle. Three modes: cluster pass
(initial reduction), triage (route a new bug into an existing master),
bundle (ship a PR that closes the cluster atomically).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(skills): correct oh-my-issues cluster-pass instructions
- Cluster pass step 1: drop the misleading single-call pattern;
point to a paginated list + per-issue comment fetch since
`gh issue list --json comments` returns only counts and
`--limit` silently truncates large backlogs.
- GitHub CLI primitives: replace the buggy snippet with a
total-count check, paginated listing, per-issue comment loop,
and REST API fallback for repos with >1000 open issues.
- make-plan: add See Also section linking oh-my-issues so the
planning skill knows about its issue-side sibling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(skills): use search API for issue counts and tag fenced block
- repos/{owner}/{repo}.open_issues_count includes PRs. Switch to the
search/issues API which differentiates issues from PRs so the
cluster-pass count is accurate.
- Add `text` language tag to the standardized redirect comment
fenced block (MD040).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(skills): add weekly-digests skill for serial timeline narrative
Generate a chapter-per-ISO-week narrative digest of a project's full
claude-mem history. Splits the timeline by ISO week, then runs
consecutive (non-parallel) subagents — each receiving the prior week's
carry-forward block — to produce a coherent multi-chapter serial.
Encodes the pipeline discipline that emerged from running it end-to-end:
narrative budget scaled to obs count, carry-forward capped and pruned,
register evolution tracked explicitly, components as characters,
silence as story, no false ending in the final chapter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(skills/weekly-digests): degeneralize from hardcoded 30-week assumption
The skill was overfit to a single run that happened to span 30 ISO weeks.
On a 2-week project the prompt template would tell the subagent it was
writing chapter N of a "30-part serial narrative", which is false.
Changes:
- Frontmatter and opening prose no longer claim a fixed chapter count.
- Subagent prompt template uses "chapter N of TOTAL" wording that scales
to any N including 1.
- Added explicit N=1 handling: apply first-and-final treatment together.
- Genericized component-as-character and meta-recursion examples — they
no longer import claude-mem's specific cast as if mandatory.
- Filename zero-pad width now derived from N (works past 99 weeks).
- Examples section shows long-project, short-project, and N=1 flows.
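A minimal sketch of deriving the zero-pad width from the chapter count, as the filename bullet above describes. The `chapterFilename` helper, the `chapter-NN.md` naming scheme, and the minimum width of 2 are assumptions for illustration; the skill's real naming is not shown in this log.

```typescript
// Hypothetical helper: builds a sortable chapter filename whose pad width
// scales with the total number of chapters.
function chapterFilename(index: number, total: number): string {
  if (!Number.isInteger(index) || !Number.isInteger(total) || index < 1 || index > total) {
    throw new RangeError(`chapter ${index} is outside 1..${total}`);
  }
  // Width follows the digit count of the total; the floor of 2 is an assumption.
  const width = Math.max(2, String(total).length);
  return `chapter-${String(index).padStart(width, "0")}.md`;
}
```

With 30 chapters this yields two-digit names; with 120 chapters it yields three-digit names, so lexical order keeps matching chapter order.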
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audits a design against Rams' ten principles with evidence-cited
scores (0-3 per principle), produces a NEW/REFINE/REDESIGN verdict,
and hands off a ready-to-run /make-plan prompt for the chosen outcome.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(skills): wowerpoint share-link upload step
After the kawaii NotebookLM PDF lands on disk, the subagent now also POSTs
it to the WOWerpoint Server (if configured) and reports back a share URL.
The PDF is still the backup; the share URL is the primary deliverable.
Gated on three env vars (WOWERPOINT_API_BASE, WOWERPOINT_VIEWER_BASE,
WOWERPOINT_UPLOAD_TOKEN) — if any are missing the skill skips the upload
silently and behaves exactly as before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(skills): address CodeRabbit + Greptile findings on wowerpoint
- Drop the ~/.wowerpoint.env reference: the subagent inherits the parent's
environment and never sources a dotenv file, so storing vars there would
silently disable the upload step. Documented only the shell-export path.
- Switch jq parsing to `.id // empty` so a missing key yields an empty
string instead of the literal "null", letting the [ -z "$DECK_ID" ] guard
fire correctly on error responses.
- Capture the full JSON response so a non-empty .error field is surfaced as
a warning rather than emitting an invalid …/d/null share URL.
- Add TITLE to the subagent template's Inputs block so the parent agent
knows it must supply a title slot the curl command depends on.
- Make step 6 itself guard on the env vars instead of relying on prose, so
the snippet works in isolation if a future agent skips the surrounding
instructions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(skills): gate the top-level upload snippet on env vars too
CodeRabbit pointed out the prose snippet at the top of the Share-link
section uploaded unconditionally, while the subagent step 6 version had the
env-var guard. Anyone copying the standalone snippet without the env vars
set would have hit a failed curl request instead of the intended silent
skip. Wrapping both in the same guard
keeps the two snippets in sync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(skills): cap wowerpoint upload curls at 30 s
Greptile flagged that a bare curl on an unreachable WOWERPOINT_API_BASE can
sit on the OS TCP timeout (75–130 s) before returning, stalling the
background subagent and delaying the completion notification. Adding
--connect-timeout 10 --max-time 30 to both upload snippets bounds the
hang and lets the share-link step fail fast.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(skills): wowerpoint slug example reflects 3-word IDs
Server now mints adjective-noun-creature slugs (e.g. quirky-compass-hawk)
instead of base64url. The curl/jq snippets are unchanged — they already
parse .id as opaque — but the prose was stale.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(skills): wowerpoint slug example reflects title-aware IDs
Server now slugifies the title and appends a creature suffix
(tokenrouter-quest-hawk) instead of three random words. Falls back to a
3-word slug when the title is empty or non-ASCII. The curl/jq snippets
are unchanged — they parse .id as opaque — but the prose was stale.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(skills): add wowerpoint for one-doc kawaii NotebookLM slide decks
A skill that turns one source document into a kawaii NotebookLM slide-deck
PDF. Wraps the `notebooklm` CLI with the kawaii-prompt + `--format detailed`
defaults, and the spawn-subagent-and-end-turn pattern so generation (~10 min)
never blocks the main conversation.
Single-source-per-deck is enforced by the workflow shape: step 1 is "confirm
or write the source doc"; step 3 adds exactly one source. If the doc is
non-existent or thin, write it first using mem-search and sequential
thinking — don't paper over a weak source by stacking more sources.
Slide-deck only — videos and podcasts from the same engine are noticeably
worse and out of scope; refer the user to the `notebooklm` CLI directly if
they want those.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(skills): address CodeRabbit findings on wowerpoint
- Document `jq` as a required workflow dependency in setup
- Add `text` language identifier to three unlabeled fenced code blocks
(MD040 lint compliance)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): Phase 4 — Postgres event-to-generation-job pipeline
Adds POST /v1/events, POST /v1/events/batch, GET /v1/jobs/:id, GET /v1/events/:id,
and POST /v1/memories on the server-beta runtime, backed by Postgres.
- Event row + outbox generation-job row insert in one withPostgresTransaction.
- BullMQ enqueue happens after commit; enqueue failure leaves the row queued
for Phase 3 startup reconciliation.
- ?generate=false skips the outbox; ?wait=true returns queue status only,
never observation IDs (provider generation is Phase 5).
- Batch pre-validates all event projectIds against api-key scope before any
write; mixed-project batches reject 403 with zero side effects.
- /v1/memories is a direct insert alias — no generator, no outbox.
- Cross-tenant /v1/jobs/:id returns 404 to avoid leaking row existence.
- New PostgresAuthMiddleware reads api_keys by SHA-256 hash; populates
req.authContext.teamId/projectId; legacy ServerV1Routes (SQLite, used by
worker runtime) is left untouched.
- Tests: unit suite hardened with stubbed pool.query so route registration
is safe; integration tests skip cleanly without CLAUDE_MEM_TEST_POSTGRES_URL.
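The validate-everything-before-any-write rule for batches can be sketched roughly as below. This is an illustration only: `ingestBatch`, the single-project key scope, the `insert` callback (standing in for the transactional Postgres write), and the 201 success status are all assumptions, not the actual route code.

```typescript
type IncomingEvent = { projectId: string; payload: unknown };
type BatchResult =
  | { status: 201; inserted: number } // success status code is an assumption
  | { status: 403; error: string };

// Hypothetical sketch: reject a mixed-project batch before touching storage.
function ingestBatch(
  events: IncomingEvent[],
  allowedProjectId: string, // assumes the api key is scoped to one project
  insert: (event: IncomingEvent) => void,
): BatchResult {
  // Pass 1: validate every event; nothing has been written yet.
  const offending = events.find((e) => e.projectId !== allowedProjectId);
  if (offending) {
    return { status: 403, error: `project ${offending.projectId} is outside api-key scope` };
  }
  // Pass 2: only reached when the whole batch is in scope.
  for (const event of events) insert(event);
  return { status: 201, inserted: events.length };
}
```

Because validation finishes before the first insert, a rejected batch leaves zero side effects, which is the property the commit calls out.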
Verification: 87 pass / 1 skip / 0 fail. No new typecheck errors. Required
greps for WorkerService and MemoryItemsRepository in src/server/routes/v1
and src/server/runtime return no hits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): Phase 5 — provider observation generator
Adds independent provider generation under src/server/generation/ with no
worker coupling. Server beta can now generate observations end-to-end:
event -> outbox -> BullMQ -> provider -> parser -> persisted observation.
- ProviderObservationGenerator orchestrates: lock outbox (queued -> processing),
reload agent_event from Postgres (BullMQ payload is advisory only), call
provider, hand raw text to processGeneratedResponse, route errors via
markGenerationFailed with retryable flag from ServerClassifiedProviderError.
- processGeneratedResponse parses with parseAgentXml, persists via
PostgresObservationRepository with deterministic
generation_key = generation:v1:{job_id}:{index}:{fingerprint},
links via PostgresObservationSourcesRepository, advances outbox status,
appends observation_generation_job_events, audits — all in one
withPostgresTransaction. Idempotent on retry via UNIQUE constraints.
- Three provider adapters under src/server/generation/providers/:
Claude, Gemini, OpenRouter. Self-contained — no imports from
src/services/worker/*. Worker providers unchanged.
- Shared error classification + prompt builder under providers/shared/.
Prompt builder strips <private> at the edge; fully-private batches
emit <skip_summary /> without billing the provider.
- ActiveServerBetaGenerationWorkerManager wires BullMQ Worker via
ServerJobQueue.start(...) with concurrency 1 + autorun:false +
worker.on('error') per BullMQ docs.
- New GET /v1/events/:id/observations on ServerV1PostgresRoutes returns
observations linked via observation_sources, team/project scoped.
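To illustrate why a deterministic generation_key makes retries idempotent, here is a small in-memory sketch. The key layout follows the format quoted above; the `InMemoryObservationStore` class and its insert-if-absent behavior are stand-ins for the Postgres UNIQUE constraint, and the fingerprint is treated as an opaque string because its derivation is not described in this log.

```typescript
// Builds the key in the generation:v1:{job_id}:{index}:{fingerprint} layout.
function generationKey(jobId: string, index: number, fingerprint: string): string {
  return `generation:v1:${jobId}:${index}:${fingerprint}`;
}

// Hypothetical stand-in for a table with a UNIQUE generation_key column.
class InMemoryObservationStore {
  private readonly rows = new Map<string, string>();

  // Returns true when a row was inserted, false when the key already existed.
  insertIfAbsent(key: string, content: string): boolean {
    if (this.rows.has(key)) return false;
    this.rows.set(key, content);
    return true;
  }

  get size(): number {
    return this.rows.size;
  }
}
```

A retried job that re-parses the same provider response produces the same keys, so the second attempt inserts nothing new.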
Verification: 104 pass / 4 skip / 0 fail. No typecheck regressions.
Anti-pattern greps clean for services/worker imports under src/server,
WorkerRef/ActiveSession/SessionStore in src/server/generation.
Deferred: ModeManager loading uses a stable fallback observation type
list; summary and reindex queue lanes are not yet wired.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): Phase 6 — independent server session semantics
server_sessions is now the canonical Server beta session model. Sessions
are independent of legacy worker ActiveSession state.
- PostgresServerSessionRepository extended: findByExternalIdForScope,
endSession (idempotent via COALESCE(ended_at, now())),
markGenerationStarted/Completed/Failed, listUnprocessedEvents (excludes
agent_events that already have a completed agent_event job).
- ServerSessionRuntimeRepository wraps the repo; every method requires
explicit team_id + project_id and validates scope via assertProjectOwnership.
- SessionGenerationPolicy supports per-event (default), debounce
(BullMQ delayed-job replace via getJob+remove+add), and end-of-session.
Configured via CLAUDE_MEM_SERVER_SESSION_POLICY and
CLAUDE_MEM_SERVER_SESSION_DEBOUNCE_MS env vars; per-team override hooks
are exposed on ServerV1PostgresRoutesOptions for future settings layer.
- POST /v1/sessions/start (find-or-create on (project_id, external_session_id)),
GET /v1/sessions/:id (scoped 404), POST /v1/sessions/:id/end
(transactional: end + create summary outbox via UNIQUE collapse +
enqueue post-commit). Re-ending is fully idempotent.
- processSessionSummaryResponse persists summary as kind='summary'
observation with the same idempotency model
(generation_key + observation_sources UNIQUE).
- ProviderObservationGenerator dispatches on source_type:
agent_event -> processGeneratedResponse, session_summary ->
processSessionSummaryResponse; loadEvents handles session-summary
by loading unprocessed events.
- ActiveServerBetaGenerationWorkerManager wires summary BullMQ lane
alongside event lane (concurrency=1, autorun=false, error listener
attached per BullMQ docs).
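The idempotent endSession behavior mentioned above (COALESCE(ended_at, now())) can be mirrored in plain TypeScript as follows. The `ServerSession` shape and the `endSessionSketch` name are illustrative assumptions; the real repository performs this in SQL.

```typescript
type ServerSession = { id: string; endedAt: Date | null };

// Mirrors `SET ended_at = COALESCE(ended_at, now())`: the first end time wins,
// so ending an already-ended session changes nothing.
function endSessionSketch(session: ServerSession, now: Date): ServerSession {
  return { ...session, endedAt: session.endedAt ?? now };
}
```

Calling it twice with different timestamps keeps the first timestamp, which is what makes re-ending safe.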
Verification: 110 pass / 6 skip / 0 fail. Net typecheck error count
unchanged at 24 (pre-existing, none in Phase 6 files). Anti-pattern
greps clean for ActiveSession/SessionStore in src/server/runtime,
no worker imports anywhere in src/server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): Phase 7 — hook routing without worker dependency
Hooks can now talk directly to server-beta when CLAUDE_MEM_RUNTIME=server-beta
is selected, with a clean worker fallback when server-beta is unhealthy.
- src/services/hooks/server-beta-client.ts — typed HTTP client for
/v1/sessions/start, /v1/events, /v1/sessions/:id/end. Throws
ServerBetaClientError with kind classification (missing_api_key,
transport, timeout, http_error, invalid_response) and isFallbackEligible
helper. Zero imports from services/worker/.
- src/services/hooks/runtime-selector.ts — reads CLAUDE_MEM_RUNTIME from
settings, returns worker or server-beta context, logs
[server-beta-fallback] reason=<code> on every config-time fallback.
- src/services/hooks/server-beta-bootstrap.ts — Postgres-backed API key
bootstrap. Find-or-creates local-hook-team + local-hook-project,
generates cmem_<random> key (SHA-256 hashed), inserts into api_keys
with scopes events:write/sessions:write/observations:read/jobs:read.
Settings file written with chmod 0600. rotateServerBetaApiKey() wired
to a new `claude-mem server keys rotate` command.
- src/cli/handlers/{observation,session-init,summarize}.ts — every hook
handler tries server-beta first when configured, falls through to the
existing worker path on transport/5xx/429/missing-key. One WARN line
per fallback. Hook JSON output shape unchanged.
- src/shared/SettingsDefaultsManager.ts — three new keys with defaults:
CLAUDE_MEM_SERVER_BETA_URL, CLAUDE_MEM_SERVER_BETA_API_KEY,
CLAUDE_MEM_SERVER_BETA_PROJECT_ID.
- src/npx-cli/commands/install.ts — when installer selects server-beta
runtime and CLAUDE_MEM_SERVER_DATABASE_URL is set, bootstraps a local
API key automatically. Warns and continues if the DB URL is missing.
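A rough sketch of the fallback decision described in the handler bullet above (transport, 5xx, 429, and missing key fall through to the worker). The error kinds are taken from this log, but the `ClientFailure` shape and the function body are assumptions, and the treatment of `timeout` and `invalid_response` in particular is a guess, since the log does not state it.

```typescript
type ServerBetaErrorKind =
  | "missing_api_key"
  | "transport"
  | "timeout"
  | "http_error"
  | "invalid_response";

// Hypothetical plain-object view of a client failure.
type ClientFailure = { kind: ServerBetaErrorKind; status?: number };

function isFallbackEligibleSketch(failure: ClientFailure): boolean {
  if (failure.kind === "transport" || failure.kind === "missing_api_key") return true;
  // Assumption: a timeout is treated like a transport failure.
  if (failure.kind === "timeout") return true;
  if (failure.kind === "http_error" && failure.status !== undefined) {
    // 429 and any 5xx fall back; other 4xx responses do not.
    return failure.status === 429 || (failure.status >= 500 && failure.status <= 599);
  }
  // Assumption: an invalid response body is not retried on the worker path.
  return false;
}
```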
plugin/scripts/*.cjs bundles rebuilt via npm run build to pick up the
new hook handler code path. No plaintext keys in the bundle (verified).
Verification: 16 hook unit tests pass; 275 server/storage/services tests
pass with 7 pre-existing failures (verified independent of this change
via git stash --include-untracked). Build clean. No new typecheck
errors in Phase 7 files.
Anti-pattern guards verified:
- /api/sessions/observations only reached via explicit fallback path
- server-beta runtime never starts the worker process
- API keys live only in ~/.claude-mem/settings.json (chmod 0600), never
in the bundle (grep confirmed)
- Worker fallback preserved, observable via single WARN line per call
Deferred: semantic context injection (UserPromptSubmit hook) stays
worker-only; server-beta does not yet expose /v1/context/semantic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): Phase 8 — MCP backed by server-beta core
MCP tools now route through server-beta in server-beta mode while keeping
worker-mode search/timeline/get_observations tools fully working.
- src/servers/mcp-server.ts — five new observation_* tools registered:
observation_add, observation_record_event, observation_search,
observation_context, observation_generation_status. Three memory_*
compatibility aliases delegate to the canonical handlers. Worker
auto-start is gated when selectRuntime() === 'server-beta' so MCP
in server-beta mode never spawns the worker.
- src/services/hooks/server-beta-client.ts — addObservation,
searchObservations, contextObservations, getJobStatus added so MCP
shares one transport with hooks (Phase 7).
- src/server/routes/v1/ServerV1PostgresRoutes.ts — POST /v1/search and
POST /v1/context REST cores backed by PostgresObservationRepository
full-text search (GIN tsvector from Phase 1).
- Existing memory_search/timeline/get_observations tools call
callWorkerAPI unchanged in worker mode; worker tests unaffected.
Verification: 39 pass / 4 skip / 0 fail on targeted suite. Pre-existing
7 baseline failures verified independent (git stash). No new typecheck
errors. WorkerService grep clean across src/servers/mcp-server.ts and
src/server/.
Anti-pattern guards verified:
- No duplicate generation logic in MCP — observation_record_event hits
/v1/events which owns event+outbox+enqueue inside one tx
- WorkerService not imported anywhere under MCP server-beta path
- No hardcoded worker URLs — all transport via Phase 7 ServerBetaClient
- memory_* aliases retained, single handler per pair
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): Phase 9 — compatibility adapters without coupling
Legacy /api/sessions/observations and /api/sessions/summarize endpoints
keep working on server-beta runtime by translating to AgentEvent and
session-end calls — no worker code, no route duplication.
- src/server/services/IngestEventsService.ts — shared event-ingest path
used by both /v1/events and the compat adapter. Owns transactional
event row + outbox row + lifecycle log + post-commit BullMQ enqueue,
honors Phase 6 SessionGenerationPolicy.
- src/server/services/EndSessionService.ts — shared session-end path
used by both /v1/sessions/:id/end and the compat adapter. Idempotent
ended_at + summary outbox + deterministic summary job id.
- src/server/compat/SessionsObservationsAdapter.ts — translates legacy
POST /api/sessions/observations payload (Claude Code transcript shape)
-> AgentEvent (source_adapter='claude-code-compat',
event_type='tool_use') -> IngestEventsService.ingestOne. Resolves
contentSessionId to server_sessions via find-or-create.
- src/server/compat/SessionsSummarizeAdapter.ts — translates legacy
POST /api/sessions/summarize -> EndSessionService.end. Preserves the
legacy agentId -> {status:'skipped', reason:'subagent_context'}
behavior so existing clients see the same response shape.
- src/server/routes/v1/ServerV1PostgresRoutes.ts — refactored to
delegate to the new shared services (-203 LoC net) so /v1 and
/api compat both call the SAME canonical code path.
- src/server/runtime/ServerBetaService.ts — registers both compat
adapters alongside ServerV1PostgresRoutes, sharing service instances.
- docs/server-beta-parity-map.md — full enumeration of legacy /api/*
routes labeled native, adapter, or unsupported (with reasons).
Viewer read-path adapters explicitly listed as unsupported pending
a future viewer-rewrite phase.
Verification: 7 compat tests pass, 6 v1-routes tests still pass
(refactor preserved behavior), 4 session-routes tests pass. Pre-
existing 16 baseline failures verified independent via git stash.
Zero new typecheck errors.
Anti-pattern guards verified:
- No services/worker/http/routes or WorkerService imports under
src/server/compat or src/server/runtime
- Compat adapters are thin translators with names ending in *Adapter
and a top-of-file comment noting they are legacy compatibility
- /v1/* remains the canonical Server beta API; compat adapters
call shared services rather than acting as a parallel API
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): Phase 10 — Docker stack and deployable runtime
Server beta now ships as a Docker stack with no worker process anywhere
and a separate horizontal generation worker for scaling.
- src/server/runtime/create-server-beta-service.ts — validateServerBetaEnv()
fails fast on missing CLAUDE_MEM_SERVER_DATABASE_URL, requires
CLAUDE_MEM_QUEUE_ENGINE=bullmq in Docker, rejects
CLAUDE_MEM_AUTH_MODE=local-dev and CLAUDE_MEM_ALLOW_LOCAL_DEV_BYPASS
inside containers (detected via /.dockerenv or CLAUDE_MEM_DOCKER=1).
Adds CLAUDE_MEM_GENERATION_DISABLED so the HTTP service can run
generator-free.
- src/server/runtime/ServerBetaService.ts — runServerBetaGenerationWorker
for the dedicated consumer process; runServerBetaApiKeyCli is a new
Postgres-backed `server api-key` command (the legacy worker CLI wrote
to SQLite and was invisible to the Postgres runtime); getQueueHealth
shim feeds /api/health a consistent ObservationQueueHealth shape.
- src/npx-cli/commands/{runtime,server}.ts — `claude-mem server worker
start` subcommand that boots only the BullMQ consumer.
- docker/claude-mem/{Dockerfile,entrypoint.sh} — entrypoint forces
CLAUDE_MEM_DOCKER=1 + CLAUDE_MEM_RUNTIME=server-beta and exposes
three modes: server (HTTP only, generation disabled), worker (BullMQ
consumer), shell. Worker bundle is no longer the default CMD.
- docker-compose.yml — full stack: postgres + valkey + claude-mem-server
(HTTP-only) + claude-mem-worker (generation consumer). Wires
service-to-service env vars.
- scripts/e2e-server-beta-docker.sh + docker/e2e/server-beta-e2e.mjs —
E2E now hits /v1/sessions/start, /v1/events?wait=true, /v1/jobs/:id;
asserts no worker-service.cjs process anywhere in the stack;
one-shot docker compose run --rm verifies local-dev auth is
rejected with the expected stderr; restart-and-verify confirms
Postgres durability and BullMQ retry idempotency.
- docs/server.md — full Phase 10 doc: stack diagram, env table,
worker mode, auth-in-Docker policy.
- docs/api.md — event generation semantics (wait=true, generationJob).
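The startup checks attributed to validateServerBetaEnv() above might look roughly like this. It is a sketch under assumptions: it collects problems in an array instead of failing fast, takes `inDocker` as a parameter rather than probing /.dockerenv, and treats any non-empty CLAUDE_MEM_ALLOW_LOCAL_DEV_BYPASS value as a violation. The environment variable names come from this log; everything else is illustrative.

```typescript
type EnvMap = Record<string, string | undefined>;

// Hypothetical sketch: returns human-readable problems; empty means the env is acceptable.
function validateEnvSketch(env: EnvMap, inDocker: boolean): string[] {
  const problems: string[] = [];
  if (!env["CLAUDE_MEM_SERVER_DATABASE_URL"]) {
    problems.push("CLAUDE_MEM_SERVER_DATABASE_URL is required");
  }
  if (inDocker) {
    if (env["CLAUDE_MEM_QUEUE_ENGINE"] !== "bullmq") {
      problems.push("CLAUDE_MEM_QUEUE_ENGINE must be bullmq in Docker");
    }
    if (env["CLAUDE_MEM_AUTH_MODE"] === "local-dev") {
      problems.push("CLAUDE_MEM_AUTH_MODE=local-dev is not allowed in Docker");
    }
    // Assumption: any non-empty value counts as enabling the bypass.
    if (env["CLAUDE_MEM_ALLOW_LOCAL_DEV_BYPASS"]) {
      problems.push("CLAUDE_MEM_ALLOW_LOCAL_DEV_BYPASS is not allowed in Docker");
    }
  }
  return problems;
}
```

A caller that wants fail-fast behavior could throw as soon as the returned array is non-empty.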
Verification: full Docker E2E PASSED on live daemon
(phase1 + phase2 + restart-and-verify + revoked-key + no-worker-
process + local-dev-rejected). Unit tests 292 pass / 9 skip / 7 fail
(7 fails pre-existing baseline). Zero new typecheck errors.
Anti-pattern guards verified:
- entrypoint never execs worker-service.cjs; E2E greps prove no
worker process anywhere in the stack
- validateServerBetaEnv refuses local-dev auth in Docker with explicit
remediation message; ALLOW_LOCAL_DEV_BYPASS rejected the same way
- Docker requires CLAUDE_MEM_QUEUE_ENGINE=bullmq; in-process queue
rejected at startup
- claude-mem worker / worker-service / WorkerService greps clean
in docker/
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): Phase 11 — team-aware generation with audit chain
Generation jobs now carry team_id/project_id/api_key_id/actor_id/
source_adapter from enqueue through execution; the outbox is reloaded
from Postgres before any side effect so BullMQ payload can never act
as auth authority.
- src/server/jobs/types.ts — ServerGenerationJobPayloadSchema (Zod
discriminated union) requires team_id, project_id, generation_job_id,
source_adapter, api_key_id, actor_id (nullable), source_type, source_id,
plus event_id / server_session_id per kind. assertServerGenerationJobPayload
is called at enqueue (outbox.ts) and again at execution boundary.
- src/server/services/{IngestEventsService,EndSessionService}.ts +
SessionGenerationPolicy.ts — thread identity context (apiKeyId, actorId,
sourceAdapter) into both event and summary BullMQ payloads.
- src/server/generation/ProviderObservationGenerator.ts —
loadCanonicalOutbox loads the outbox row WITHOUT scope filter, then
compares candidate.team_id/project_id to payload.team_id/project_id;
mismatch -> ServerGenerationScopeViolationError (non-retryable),
failed status, generation_job.scope_violation audit. isApiKeyRevoked
checks api_keys (revoked_at, expires_at, row missing) before any
provider call; revoked -> generation_job.revoked_key audit + non-
retryable failure. generation_job.processing audit emitted on lock.
- src/server/generation/processGeneratedResponse.ts — generated
observations carry team_id/project_id/server_session_id from the
reloaded source row (not job payload). observation_sources.metadata
records source_adapter, actor_id, api_key_id for traceability.
observation.created audit per observation; generation_job.completed
audit per terminal transition. All audit rows reference the same
generation_job_id in details.
- src/server/routes/v1/ServerV1PostgresRoutes.ts — GET /v1/teams/:id/jobs
and GET /v1/projects/:id/jobs with SQL-layer scoping (WHERE team_id=$1
[AND project_id=$2] [AND status=$3]); cross-tenant returns 404 to
avoid leaking row existence. Pagination via status/limit/offset.
audit_log rows for event.received, event.batch_received, observation.read.
- src/server/compat/{SessionsObservationsAdapter,SessionsSummarizeAdapter}.ts —
propagate apiKeyId and sourceAdapter='claude-code-compat'.
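The reload-then-compare step described for loadCanonicalOutbox above can be sketched as a pure function. The `OutboxRowSketch` shape, the outcome names, and the `missing` case are assumptions for illustration; in the commit, a mismatch raises ServerGenerationScopeViolationError and is non-retryable.

```typescript
type ScopeIds = { team_id: string; project_id: string };
type OutboxRowSketch = ScopeIds & { id: string };
type ScopeOutcome = "proceed" | "scope_violation" | "missing";

// The row is loaded WITHOUT a scope filter, then compared to the queue payload,
// so a tampered payload is detected instead of silently matching nothing.
function checkOutboxScope(row: OutboxRowSketch | undefined, payload: ScopeIds): ScopeOutcome {
  if (row === undefined) return "missing"; // assumption: handling of a missing row is not specified in the log
  if (row.team_id !== payload.team_id || row.project_id !== payload.project_id) {
    return "scope_violation"; // treated as non-retryable per the commit text
  }
  return "proceed";
}
```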
Verification: 162 pass / 10 skip / 0 fail. Pre-existing failures in
tests/services/queue and tests/services/worker confirmed independent
via git stash. Zero new typecheck errors in server-beta files.
Required greps:
rg "team_id.*req\.body|project_id.*req\.body" src/server -> 0 matches
Audit chain integration test passes — generation_job.processing,
observation.created, and generation_job.completed audit rows all
share the same generation_job_id reference.
Anti-pattern guards verified:
- BullMQ payload never acts as auth authority — Postgres outbox
reload with mismatch check happens before every side effect
- team_id / project_id never derived from request body for scope
decisions; always req.authContext.teamId / projectId
- Application-layer team/project filtering forbidden — listJobsForScope
pushes scope into the SQL WHERE clause
- Project-scoped key on cross-project /v1/teams/:id/jobs returns 404
- Revoked api keys cause non-retryable failure with audit before
any provider call
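As an illustration of pushing scope into the SQL WHERE clause rather than filtering in application code, here is a small parameterized query builder. The `generation_jobs` table name, the `created_at` ordering column, and the `buildListJobsQuery` helper are hypothetical; only the team_id, project_id, and status filters and the limit/offset pagination come from this log.

```typescript
type JobListFilter = {
  teamId: string;
  projectId?: string;
  status?: string;
  limit: number;
  offset: number;
};

// Hypothetical builder: every filter becomes a bound parameter, never string-concatenated input.
function buildListJobsQuery(filter: JobListFilter): { text: string; values: (string | number)[] } {
  const values: (string | number)[] = [filter.teamId];
  const where: string[] = ["team_id = $1"];
  if (filter.projectId !== undefined) {
    values.push(filter.projectId);
    where.push(`project_id = $${values.length}`);
  }
  if (filter.status !== undefined) {
    values.push(filter.status);
    where.push(`status = $${values.length}`);
  }
  values.push(filter.limit);
  const limitIndex = values.length;
  values.push(filter.offset);
  const offsetIndex = values.length;
  const text =
    `SELECT * FROM generation_jobs WHERE ${where.join(" AND ")} ` +
    `ORDER BY created_at DESC LIMIT $${limitIndex} OFFSET $${offsetIndex}`;
  return { text, values };
}
```

Placeholder numbers are assigned in the order filters are present, so a status-only filter uses $2 rather than $3.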
Deferred: a redundant generation_job.queued audit_log row (already
covered by observation_generation_job_events lifecycle log per Phase 1
schema split). Compat adapters set actor_id=null but propagate
api_key_id which is the canonical reference downstream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): Phase 12 — observability and operations
Operators can now inspect, retry, and cancel generation jobs from the
CLI; queue lane metrics flow into /api/health and /v1/info; every
request gets a stable request_id that flows through HTTP -> audit ->
outbox -> generator -> completion log.
- src/server/middleware/request-id.ts — honors safe inbound X-Request-Id,
mints uuid v4 otherwise. Set on req.requestId and echoed via response
header so external traces can correlate.
- src/server/jobs/ServerJobQueue.ts — QueueEvents wired with completed,
failed, progress, stalled, error listeners; lifecycle counters
exposed via observe() API. Logs emitted as
[generation] job=<id> source_type=<...> duration=<ms> attempts=<N>
reason=<message>. Stalled and error counters survive worker restart.
- src/server/jobs/types.ts — ServerGenerationJob payload schema
extended with optional request_id; flows through from HTTP into
every BullMQ job.
- src/server/queue/ObservationQueueEngine.ts — health snapshot now
carries per-lane (event, summary) counts via
ObservationQueueHealthLaneSnapshot.
- src/server/runtime/{ActiveServerBetaQueueManager,
ActiveServerBetaGenerationWorkerManager,ServerBetaService}.ts —
per-lane getJobCounts feed /api/health and /v1/info; stalled events
audit through audit_log with action generation_job.stalled.
- src/server/routes/v1/ServerV1PostgresRoutes.ts —
GET /v1/jobs (status/source_type/since/limit/offset, scope from
api-key, payload stripped unless ?include=payload AND admin scope),
POST /v1/jobs/:id/retry (idempotent; queued -> no-op; audit
generation_job.retried_by_operator), POST /v1/jobs/:id/cancel
(terminal -> no-op; audit generation_job.cancelled_by_operator;
generator reload-before-side-effects already prevents double work).
- src/server/services/IngestEventsService.ts +
SessionGenerationPolicy.ts + ProviderObservationGenerator.ts —
request_id propagated end to end. Generator extracts request_id
from BullMQ payload and includes it in lock/processing/completion
logs and audit details.
- src/npx-cli/commands/server-jobs.ts +
src/npx-cli/commands/server.ts — `claude-mem server jobs
status|failed|retry|cancel`. status compares Postgres outbox counts
to BullMQ queue counts and surfaces divergence. failed prints
attempts + last_error message. --team and --project filters.
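The request-id rule above (honor a safe inbound X-Request-Id, mint a new id otherwise) could be sketched like this. What counts as "safe" is not defined in the log, so the character set and 128-character cap below are assumptions, and the id generator is injected as `mint` so the sketch stays free of runtime dependencies.

```typescript
// Assumption: "safe" means a short token of URL-friendly characters.
const SAFE_REQUEST_ID = /^[A-Za-z0-9._-]{1,128}$/;

// Returns the inbound id when it looks safe, otherwise a freshly minted one.
function resolveRequestId(inbound: string | undefined, mint: () => string): string {
  if (inbound !== undefined && SAFE_REQUEST_ID.test(inbound)) return inbound;
  return mint();
}
```

In a real middleware, `mint` would presumably be a uuid v4 generator, and the result would be stored on the request and echoed in the response header.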
Verification: 350 pass / 12 skip / 7 fail (pre-existing baseline,
verified independent via git stash). 18 new tests added (request-id
middleware, server-jobs CLI seams, jobs list/retry/cancel routes
Postgres-gated). Zero new typecheck errors.
Anti-pattern guards verified:
- agent_events.payload only emitted in /v1/jobs response inside the
admin-gated branch (?include=payload + admin scope) — returns 403
otherwise
- jobs retry on a queued row is a no-op (no double BullMQ enqueue,
no double UPDATE)
- Every operator action writes to audit_log with the
*_by_operator action and request_id correlation in details
- Stalled events audit through generation_job.stalled
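The no-op rules for operator retry and cancel can be modeled as a tiny state function. This is a sketch under assumptions: the status names, which statuses count as terminal, the handling of retry on non-failed rows, and the choice to emit an audit action only when state changes are all guesses. Only the two audit action names and the "queued retry is a no-op" and "terminal cancel is a no-op" rules come from the log.

```typescript
type JobStatusSketch = "queued" | "processing" | "completed" | "failed" | "cancelled";
type OperatorResult = { status: JobStatusSketch; changed: boolean; audit?: string };

function retryJobSketch(status: JobStatusSketch): OperatorResult {
  // Retrying an already-queued row must not enqueue or update twice.
  if (status === "queued") return { status, changed: false };
  if (status === "failed") {
    return { status: "queued", changed: true, audit: "generation_job.retried_by_operator" };
  }
  // Assumption: other statuses are left untouched.
  return { status, changed: false };
}

function cancelJobSketch(status: JobStatusSketch): OperatorResult {
  // Assumption: these three statuses are the terminal ones.
  const terminal = status === "completed" || status === "failed" || status === "cancelled";
  if (terminal) return { status, changed: false };
  return { status: "cancelled", changed: true, audit: "generation_job.cancelled_by_operator" };
}
```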
Sample correlated trace (one request_id end to end):
HTTP middleware: req.requestId = 'req-abc'
audit event.received: details.requestId = 'req-abc'
BullMQ payload: { request_id: 'req-abc', generation_job_id: 'gj_x' }
generator lock log: [generation] job locked { jobId, requestId }
audit generation_job.processing: details.requestId = 'req-abc'
completion log: [generation] job=evt_... duration=1230ms
Deferred: live /api/health round-trip integration test (needs
Redis); stalled event live integration test (needs Redis); storing
request_id on the observations row itself (spec did not require).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(server-beta): add Phase 13 release readiness report
Captures the final verification gate: tests (1749 pass; 45 fail, all
pre-existing baseline; zero regressions), required greps clean,
Docker E2E green end-to-end, all 7 exit criteria met, build clean,
typecheck unchanged from main. Documents deferred items.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* build(server-beta): rebuild server-beta-service bundle
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(server-beta): address Greptile review on PR #2383
- ProviderObservationGenerator.lockOutbox: skip duplicate worker run when
another lock is active instead of returning the row, which previously let
two BullMQ workers issue the (paid, rate-limited) external provider call
before the persistence-layer terminal-status guard collapsed the duplicate.
Reconciliation still recovers from a stale lock on startup or next retry.
- docker-compose.yml: require POSTGRES_USER/PASSWORD/DB env vars (no
defaults). Stack refuses to start without explicit secrets. Added a header
warning that the file must not be deployed unmodified.
- e2e-server-beta-docker.sh: export ephemeral test creds for the new
required env vars so the Docker E2E driver still runs unattended.
- ServerBetaService api-key list: bound query with LIMIT/OFFSET (default 100,
max 500) and add optional --team filter to prevent unintentional
cross-tenant key metadata disclosure on shared admin hosts.
- SessionGenerationPolicy: fix dead `??` fallback for NaN parseInt result;
use `||` so DEFAULT_DEBOUNCE_MS actually applies.
- ServerV1PostgresRoutes: `?wait=true` now actually waits — polls the outbox
row until terminal status (timeout 30s, 100ms interval) on both
/v1/events and /v1/events/batch. Returns `waitTimedOut: true` if the cap
is hit so callers can re-poll the status endpoints.
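The `??` versus `||` fix above comes down to how JavaScript treats NaN. A minimal sketch, assuming hypothetical helper names and an illustrative default value (the real DEFAULT_DEBOUNCE_MS value and call site are not shown in this log):

```typescript
// Illustrative value only; the real DEFAULT_DEBOUNCE_MS is not shown in this log.
const DEFAULT_DEBOUNCE_MS = 1500;

// Old shape: `??` only falls back on null/undefined. parseInt returns NaN
// (never null/undefined) for unparseable input, so the default never applies.
function debounceWithNullish(raw: string | undefined): number {
  const parsed: number = parseInt(raw ?? "", 10);
  return parsed ?? DEFAULT_DEBOUNCE_MS;
}

// Fixed shape: NaN is falsy, so `||` substitutes the default.
// Note that 0 is also falsy and would be replaced by the default as well.
function debounceWithOr(raw: string | undefined): number {
  return parseInt(raw ?? "", 10) || DEFAULT_DEBOUNCE_MS;
}
```

If a debounce of 0 ever needs to be a valid setting, an explicit `isNaN` check would be the safer variant.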
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(server-beta): address CodeRabbit + Greptile second review on PR #2383
P1 fixes
- Operator retry endpoint was re-publishing the Postgres outbox metadata
column as the BullMQ payload; the worker's
assertServerGenerationJobPayload always rejected it, leaving the row
stuck in queued until startup reconciliation. Persist the BullMQ payload
on the outbox row at create-time inside IngestEventsService and
EndSessionService, then re-enqueue that canonical payload on retry.
Major fixes
- prompt-builder: escape server_session_id when interpolating into the
XML prompt; previously a session id containing `<`, `&`, or quotes
could inject XML into the provider input.
- ServerJobQueue: route both worker.on('stalled') and the QueueEvents
'stalled' subscriber through a single notifyStalled helper that
dedupes by jobId for 30s, so counters.stalled increments once per
stall. QueueEvents 'error' now routes through notifyQueueError so
it increments counters.errored and runs onError listeners — keeping
observability symmetric across both sources.
- ServerV1PostgresRoutes: convert PostgresObservationRepository from
three dynamic imports to a single static import for consistency.
- mcp-server / ServerBetaClient: actually forward the
observation_record_event tool's `generate` flag through to the
/v1/events endpoint as `?generate=false` instead of voiding it.
- server-sessions.markGenerationFailed: guard jsonb_set against a null
error payload so the failure path can't null out metadata before the
generation_status='failed' write commits.
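The prompt-builder escaping fix above can be sketched as follows. This is an illustration only: `escapeXml`, `buildSessionTag`, and the `<session>` element are hypothetical names, not the actual prompt-builder code.

```typescript
// Escape the five XML-significant characters. `&` must be replaced first,
// otherwise the ampersands introduced by the other replacements get re-escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical prompt fragment; the real prompt layout is not shown in this log.
function buildSessionTag(serverSessionId: string): string {
  return `<session id="${escapeXml(serverSessionId)}" />`;
}
```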
Minor fixes
- server-sessions.endSession: keep updated_at stable on repeated calls
so the documented idempotency contract holds.
- SettingsDefaultsManager + ServerBetaService.getServerBetaPort: derive
the server-beta default port from UID (37877 + uid%100), matching the
worker port pattern, so two users on the same host don't collide.
Docker stacks always pass CLAUDE_MEM_SERVER_PORT explicitly so the
containerized deployment is unaffected.
- server-session-runtime test: close the pg.Pool in afterAll.
- server-beta-release-readiness.md: escape pipes inside table inline
code, add `text` language tag to the fenced log block.
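The UID-derived default port above amounts to a small piece of arithmetic. A sketch under assumptions: the function name is hypothetical, the precedence of an explicit CLAUDE_MEM_SERVER_PORT value is inferred from the Docker note, and a missing UID (for example on platforms without one) is assumed to map to the base port.

```typescript
const SERVER_BETA_BASE_PORT = 37877;

// uid: the numeric user id, or undefined where the platform has none (assumption).
// explicitPort: the raw CLAUDE_MEM_SERVER_PORT value, if set.
function defaultServerBetaPort(uid: number | undefined, explicitPort?: string): number {
  const parsed = parseInt(explicitPort ?? "", 10);
  // An explicit, valid port wins; NaN fails both comparisons and falls through.
  if (parsed > 0 && parsed < 65536) return parsed;
  // 37877 + uid % 100 keeps each user inside a 100-port window.
  return SERVER_BETA_BASE_PORT + ((uid ?? 0) % 100);
}
```

Two users whose UIDs differ by a multiple of 100 would still collide, so this reduces collisions rather than ruling them out.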
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(server-beta): address Greptile + CodeRabbit third review on PR #2383
P1 fixes
- SessionsObservationsAdapter.resolveServerSession: catch unique-violation
(23505) on concurrent compat inserts and re-fetch instead of returning
500. Two compat callers carrying the same contentSessionId can both
observe `existing===null` and race on the (project_id,
external_session_id) unique constraint; the second now resolves to the
raced row instead of dropping the event.
- /v1/events/batch: pass `sourceAdapter: null` to ingestBatch so each
event's BullMQ payload (and persisted outbox payload column) reflects
its own event.sourceAdapter via buildEventBullmqPayload's fallback,
rather than stamping the whole batch with the first event's adapter.
Minor
- server-session-runtime test afterEach: wrap DROP SCHEMA in try/finally
so client.release() always runs even if the drop throws.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(test): drop `pool as never` cast — pg.Pool already matches PostgresPool
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(server-beta): retry of completed job now 409s instead of duplicating
retryGenerationJob previously fell through to the reset+re-enqueue path
when called on a job in `completed` status. The observations index
dedupes on (generation_job_id, parsed_observation_index, content) but
LLM output is non-deterministic, so a second provider run almost always
produced a different content string and bypassed the index, persisting a
parallel set of observation rows attributed to the same generation job.
Match cancelGenerationJob's 409 guard for completed jobs. failed and
cancelled remain valid retry targets.
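The resulting retry rules can be summarized as a small decision function. This is a sketch, not the retryGenerationJob code: the type and function names are hypothetical, and treating `processing` as a no-op is an assumption (the log only states that `queued` is a no-op, `completed` returns 409, and `failed` / `cancelled` are retryable).

```typescript
type JobStatus = "queued" | "processing" | "completed" | "failed" | "cancelled";

type RetryDecision =
  | { action: "reject"; httpStatus: 409 }
  | { action: "reenqueue" }
  | { action: "noop" };

function decideRetry(status: JobStatus): RetryDecision {
  // A completed job already produced observations; a second provider run
  // would likely yield different content and slip past the dedupe index.
  if (status === "completed") return { action: "reject", httpStatus: 409 };
  if (status === "failed" || status === "cancelled") return { action: "reenqueue" };
  // queued is documented as a no-op; processing is assumed to behave the same.
  return { action: "noop" };
}
```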
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* build(server-beta): rebuild bundles after rebase onto main
Regenerates the three plugin bundles so they reflect the rebased source
state. Mechanical rebuild output only — no source changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(server-beta): wrap resolveServerSession in try/catch for structured error response
Greptile P1 on PR #2383: resolveServerSession was called before the try/catch
in both compat adapters, so Postgres errors during session lookup (timeout,
pool exhaustion, etc.) escaped to Express's default error handler and returned
HTML/text 500s. Legacy clients calling response.json() would get a parse
failure instead of the documented { stored: false, reason: 'internal_error' }
(or { status: 'error', reason: 'internal_error' } for the summarize adapter)
shape.
Move the resolveServerSession call inside the existing try block in both
adapters so any failure flows through the structured catch handler.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(server-beta): catch 23505 unique violation in POST /v1/sessions/start
Greptile P1 on PR #2383: concurrent requests with the same externalSessionId
can both pass the findByExternalIdForScope check, both call repo.create,
and the loser hits the (project_id, external_session_id) unique constraint.
The handler treated that as an unknown error and returned a 500.
Apply the same pattern resolveServerSession already uses: catch error.code
'23505' when externalSessionId is set, refetch the row inserted by the
winning request, and return 200 with that session.
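The catch-and-refetch pattern described above can be sketched with an in-memory stand-in for the repository. Everything here is illustrative: the repo class, `createOrRefetch`, and the synchronous shape are assumptions (the real handlers are async and talk to Postgres); only the `23505` error code and the (project_id, external_session_id) key come from the log.

```typescript
interface Session { id: string; projectId: string; externalSessionId: string; }

// Mimics the shape of a Postgres driver error carrying SQLSTATE 23505.
function uniqueViolation(message: string): Error & { code: string } {
  const err = new Error(message) as Error & { code: string };
  err.code = "23505";
  return err;
}

function isUniqueViolation(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { code?: unknown }).code === "23505";
}

// In-memory stand-in enforcing uniqueness on (projectId, externalSessionId).
class InMemorySessionRepo {
  private rows: Record<string, Session> = {};
  private nextId = 1;

  private key(projectId: string, externalSessionId: string): string {
    return JSON.stringify([projectId, externalSessionId]);
  }

  find(projectId: string, externalSessionId: string): Session | null {
    return this.rows[this.key(projectId, externalSessionId)] ?? null;
  }

  create(projectId: string, externalSessionId: string): Session {
    const key = this.key(projectId, externalSessionId);
    if (this.rows[key]) throw uniqueViolation("duplicate key value violates unique constraint");
    const session: Session = { id: `sess_${this.nextId++}`, projectId, externalSessionId };
    this.rows[key] = session;
    return session;
  }
}

// The losing request catches 23505 and returns the winner's row instead of failing.
function createOrRefetch(repo: InMemorySessionRepo, projectId: string, externalSessionId: string): Session {
  try {
    return repo.create(projectId, externalSessionId);
  } catch (err) {
    if (!isUniqueViolation(err)) throw err;
    const winner = repo.find(projectId, externalSessionId);
    if (winner === null) throw err; // nothing to refetch: surface the original error
    return winner;
  }
}
```

The sketch skips the initial existence check on purpose: under a race that check can return null for both callers, so the unique constraint is the only reliable arbiter.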
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(mcp): drop ${_R%/} parameter-expansion trim that trips Claude Code MCP validator
The POSIX substring trim ${_R%/} is misread by Claude Code's MCP-config
validator as a required env var named "_R%/", causing /doctor to flag
mcp-search as invalid on every install. POSIX collapses // in paths, so
the trim was cosmetic — drop it and the validator passes.
Fixes #2350, #2354, #2356.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(env): block ANTHROPIC_BASE_URL leak + three-branch OAuth-skip predicate
Issue #2375: parent-shell ANTHROPIC_BASE_URL leaked through to subprocess
isolatedEnv, while ANTHROPIC_AUTH_TOKEN was blocked. The OAuth-skip
predicate fired on bare BASE_URL, but no auth credential reached the
subprocess -> "Not logged in". Add ANTHROPIC_BASE_URL to BLOCKED_ENV_VARS
so it can only enter isolatedEnv via ~/.claude-mem/.env.
Replace the OAuth-skip predicate with three branches to prevent a
second-order security regression: a user with a tokenless gateway
configured in .env (BASE_URL only, no token) would otherwise have their
Anthropic OAuth token fetched and sent to their gateway, leaking the token
to a third party. The three-branch predicate:
1. BASE_URL set -> return without OAuth (custom gateway, never leak token)
2. API_KEY or AUTH_TOKEN set -> return without OAuth (explicit credentials)
3. Otherwise -> OAuth lookup for api.anthropic.com
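The env filtering and the three branches above can be sketched together. This is an approximation under assumptions: the function names and the `AuthPlan` type are hypothetical, the BLOCKED_ENV_VARS list shown is only the subset named in this message, and `ANTHROPIC_API_KEY` is assumed to be the full name behind "API_KEY".

```typescript
type Env = Record<string, string | undefined>;

// Illustrative subset; the real BLOCKED_ENV_VARS list may contain more entries.
const BLOCKED_ENV_VARS = ["ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_BASE_URL"];

// Blocked names never flow in from the parent shell; they can only
// come from the values loaded out of ~/.claude-mem/.env (dotEnvFile).
function buildIsolatedEnv(parentEnv: Env, dotEnvFile: Env): Env {
  const isolated: Env = {};
  for (const name of Object.keys(parentEnv)) {
    if (BLOCKED_ENV_VARS.indexOf(name) === -1) isolated[name] = parentEnv[name];
  }
  for (const name of Object.keys(dotEnvFile)) isolated[name] = dotEnvFile[name];
  return isolated;
}

type AuthPlan = "custom-gateway" | "explicit-credentials" | "oauth";

function planAuth(env: Env): AuthPlan {
  // 1. Custom gateway: never look up (and so never leak) the OAuth token.
  if (env.ANTHROPIC_BASE_URL) return "custom-gateway";
  // 2. Explicit credentials: nothing to look up.
  if (env.ANTHROPIC_API_KEY || env.ANTHROPIC_AUTH_TOKEN) return "explicit-credentials";
  // 3. Default: OAuth lookup for api.anthropic.com.
  return "oauth";
}
```

Checking BASE_URL before the credential branch is what keeps a tokenless gateway from ever triggering the OAuth lookup.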
Adds tests/env-isolation.test.ts.
Fixes #2375.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(worker): classify Claude SDK HTTP 400 as unrecoverable
ClaudeProvider previously had no explicit HTTP 400 handling — the
default branch classified all errors as `transient`, so a permanent
400 (e.g., model rejecting an `effort` parameter forwarded from a
leaked CLAUDE_CODE_EFFORT_LEVEL) would be retried indefinitely
(1874+ retries observed in one session, per #2357).
Mirror GeminiProvider/OpenRouterProvider's pattern: classify 400 as
`unrecoverable`, 401/403 as `auth_invalid`, 429 as `rate_limit`,
default to `transient`. When the 400 body matches the
"effort parameter" signature, emit a one-time SDK warn log pointing
at the env-leak fix in ~/.claude-mem/.env.
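The status mapping above can be sketched as a pure function. The names below are hypothetical and the `/effort/i` match is only a stand-in for the real "effort parameter" signature, which this message does not spell out.

```typescript
type ErrorClass = "unrecoverable" | "auth_invalid" | "rate_limit" | "transient";

function classifyHttpStatus(status: number | undefined): ErrorClass {
  if (status === 400) return "unrecoverable"; // permanent: retrying cannot help
  if (status === 401 || status === 403) return "auth_invalid";
  if (status === 429) return "rate_limit";
  return "transient"; // everything else stays retryable
}

// One-time warning gate for 400s that look like the leaked effort setting.
let warnedAboutEffort = false;

function shouldWarnAboutEffort(status: number | undefined, body: string): boolean {
  if (warnedAboutEffort) return false;
  if (status !== 400 || !/effort/i.test(body)) return false; // assumed signature
  warnedAboutEffort = true;
  return true;
}
```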
Adds tests/claude-provider-error-classifier.test.ts.
Fixes #2357.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroma): pin onnxruntime>=1.20 + protobuf<7 to fix INVALID_PROTOBUF on macOS arm64
The shipped all-MiniLM-L6-v2 model has pytorch-2.0 IR. chroma-mcp 0.2.6
transitively depends on `chromadb>=1.0.16` which only requires
`onnxruntime>=1.14.1` — uv can therefore resolve to an onnxruntime old
enough to fail every embedding add with `[ONNXRuntimeError] : 7 :
INVALID_PROTOBUF` on macOS arm64 / Python 3.13. Semantic search silently
degraded to FTS-only and smart backfill broke (#2371).
Path B (override) was required because chroma-mcp 0.2.6 is the latest
PyPI release — no upstream bump exists.
Inject `--with onnxruntime>=1.20 --with protobuf<7` into the uvx spawn
args (both persistent and remote modes). The protobuf cap is essential:
forcing only `onnxruntime>=1.20` causes uv to re-resolve and land on
protobuf 7.x, which trips opentelemetry's `_pb2` stubs with `TypeError:
Descriptors cannot be created directly` because they were generated
with protoc <3.19. Capping below 7 lands on protobuf 6.x which
opentelemetry tolerates.
Verified end-to-end: ONNX model loads, embeddings produce a 384-dim
vector, PersistentClient init / add / query roundtrip succeeds:
uvx --python 3.13 --with "onnxruntime>=1.20" --with "protobuf<7" \
chroma-mcp==0.2.6 --help # clean
# programmatic test: onnxruntime 1.26.0, protobuf 6.33.6,
# embedding ok 384, query ok ids=[['1']]
Fixes #2371.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chroma): enforce single chroma-mcp subprocess per worker (#2313)
Root cause: every reconnect path in ChromaMcpManager — connectInternal's
re-entry, the connect-timeout catch, callTool's transport-error retry, and
the transport.onclose handler — used to abandon `this.transport`/`this.client`
by calling at most `transport.close()` and nulling the handles. The MCP SDK's
StdioClientTransport.close() only signals the direct child (uvx); on Linux the
grandchildren (uv -> python -> chroma-mcp) re-parent to init and survive
because the SDK does not put the subprocess in its own process group. Each
reconnect therefore leaked a full chroma-mcp tree, accumulating 20+ instances
per session.
Fix: introduce a private disposeCurrentSubprocess() helper that always tree-
kills via the existing killProcessTree primitive before nulling the transport
reference, and route every "abandon current transport" path (reconnect,
connect-timeout, transport error, onclose, stop) through it. The existing
`connecting: Promise<void> | null` lock continues to serialize concurrent
ensureConnected() callers into a single spawn.
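The `connecting` promise lock mentioned above is a single-flight pattern. A minimal sketch, assuming a hypothetical `SingleFlightConnector` class with an injected spawn function; it leaves out the tree-kill and transport handling that ChromaMcpManager also performs.

```typescript
class SingleFlightConnector {
  private connecting: Promise<void> | null = null;
  private connected = false;
  private readonly spawn: () => Promise<void>;
  spawnCount = 0;

  constructor(spawn: () => Promise<void>) {
    this.spawn = spawn;
  }

  ensureConnected(): Promise<void> {
    if (this.connected) return Promise.resolve();
    // Concurrent callers share the in-flight attempt instead of spawning again.
    if (this.connecting) return this.connecting;
    this.spawnCount++;
    const attempt = this.spawn().then(
      () => {
        this.connected = true;
        this.connecting = null;
      },
      (err: unknown) => {
        // Clear the lock so a later call can retry, then surface the failure.
        this.connecting = null;
        throw err;
      },
    );
    this.connecting = attempt;
    return attempt;
  }
}
```

Because the lock is assigned synchronously inside the first call, later callers see it even when they run before the spawn settles.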
Adds tests/services/sync/chroma-mcp-manager-singleton.test.ts covering:
- 5 parallel ensureConnected() calls produce exactly one spawn
- a transport-error reconnect tree-kills the prior subprocess pid before
spawning a replacement
- stop() disposes state including any pending connecting promise
Manual verification needed on Linux: after a long session with multiple
tool uses, `ps aux | grep chroma-mcp | wc -l` should return 1, not 20+.
Fixes #2313.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(build): polyfill import.meta.url to __filename in CJS worker bundle
The worker bundles ESM dependencies (notably @anthropic-ai/claude-agent-sdk's
*.mjs files) into CJS output. Those modules call createRequire(import.meta.url)
at module-load time. esbuild's CJS output left this as createRequire(ute.url)
— where `ute` is its `import.meta` polyfill `{}` — so `ute.url` was undefined
and module-load crashed with:
TypeError: The argument 'filename' must be a file URL object, file URL
string, or absolute path string. Received undefined
code: ERR_INVALID_ARG_VALUE
Every Stop hook and every worker subprocess invocation hit this. Fix is the
esbuild `define` option mapping `import.meta.url` to `__filename` (provided as
a real absolute path by the existing CJS prelude in the banner).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: daily dep bump per CLAUDE.md maintenance policy
Root: @anthropic-ai/claude-agent-sdk, @clack/prompts, @types/node,
dompurify, postcss, react, react-dom, yaml, zod.
plugin/: tree-sitter-cli, zod.
openclaw/: @types/node.
All patch/minor bumps; no major version changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* build: regenerate plugin artifacts after env/chroma/mcp fixes
Built artifacts are committed so the marketplace-installable plugin
ships with the runtime bundles. Picks up:
- d7b145e9 .mcp.json shell-prelude trim drop
- a8cbd651 EnvManager BASE_URL block + 3-branch predicate
- 8cb73b8c ClaudeProvider HTTP 400 unrecoverable classifier
- ecd5b802 ChromaMcpManager onnxruntime/protobuf overrides
- c79324ea ChromaMcpManager singleton enforcement
- e8376f46 esbuild import.meta.url -> __filename polyfill
- a7541d71 daily dep bump
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* build: regenerate plugin artifacts after main merge
Bundles now include both v13.0.0 server-beta runtime (server-beta-service.cjs
+ updated mcp-server.cjs / worker-service.cjs) and this branch's chroma /
env / build / Claude SDK fixes.
Verified: bun test tests/env-isolation.test.ts \
tests/claude-provider-error-classifier.test.ts \
tests/services/sync/chroma-mcp-manager-singleton.test.ts
→ 13/13 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(review): address CodeRabbit findings on PR #2394
1. scripts/build-hooks.js — `import.meta.url` now maps to a file:// URL
(via pathToFileURL(__filename).href in the CJS banner) instead of the
raw __filename path. Preserves URL semantics for any bundled ESM dep
that does `new URL(rel, import.meta.url)`. createRequire still works.
2. src/shared/EnvManager.ts — added envFilePath() that resolves
CLAUDE_MEM_ENV_FILE lazily (falling back to paths.envFile()), and
switched internal load/save call sites to use it. ENV_FILE_PATH is
kept as a deprecated snapshot for back-compat. Lets tests target a
temp file without depending on module-load order.
3. tests/env-isolation.test.ts — redirects to a temp dir via
CLAUDE_MEM_ENV_FILE in beforeAll, removes all mutation of the real
~/.claude-mem/.env, and wraps the OAuth-spy assertion in try/finally
so the spy is always restored even if the test fails.
Verified:
bun test tests/env-isolation.test.ts \
tests/claude-provider-error-classifier.test.ts \
tests/services/sync/chroma-mcp-manager-singleton.test.ts
→ 13/13 pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Major version bump following PR #2351 merge — server-beta runtime,
Postgres observation storage, BullMQ queue engine, and Apache 2.0
relicense are now on main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add server beta runtime foundation
* Address server beta review findings
* Resolve server beta review comments
* Tighten server beta review follow-ups
* Harden server beta auth and search
* Avoid unnecessary FTS rebuilds
* Block scoped keys from creating projects
* Release BullMQ claims best effort on close
* Address server beta review blockers
* Reset BullMQ claims best effort
* Add Postgres observation storage foundation
* feat(server-beta): add independent runtime service
Introduce src/server/runtime/ as a self-contained server-beta runtime
that owns its lifecycle, Postgres bootstrap, and HTTP boundary without
depending on WorkerService.
ServerBetaService wraps the existing Server class, exposes
/healthz and /v1/info with runtime="server-beta", and persists state
to dedicated paths (.server-beta.pid|.port|.runtime.json). The four
boundary managers (queue, generation worker, provider registry, event
broadcaster) are intentionally disabled in this phase and report their
status through /v1/info; later phases activate them.
Adds plans/2026-05-07-finish-bullmq-branch-ship-plan.md to track the
remaining work for this branch.
Phase 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): route CLI lifecycle and bundle separate runtime
scripts/build-hooks.js now produces plugin/scripts/server-beta-service.cjs
as a separate Node CJS bundle, alongside the existing worker-service
bundle. The server-beta runtime is now installable independently.
src/npx-cli/commands/server.ts routes start|stop|restart|status to the
server-beta lifecycle instead of the legacy worker. The worker keeps its
own start|stop|restart|status under the worker namespace; the two
runtimes can be operated independently.
src/services/worker-service.ts adds a server-* command parser branch
that delegates to the sibling server-beta-service.cjs bundle so
direct worker-service invocations still route to the right runtime.
tests/npx-cli-server-namespace.test.ts updated to expect server-beta
lifecycle routing.
Includes rebuilt plugin/scripts/*.cjs bundles produced by
build-and-sync.
Phase 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): add BullMQ job queue primitives
Introduce src/server/jobs/ as the queue-side primitives that Phase 3 of
the server-beta runtime needs to operate.
types.ts defines a discriminated union over the four job kinds (event,
event-batch, summary, reindex) and maps each to a per-kind BullMQ queue
name and deterministic-ID prefix.
job-id.ts builds deterministic, colon-free BullMQ jobIds from
(kind, team, project, source). The colon ban exists because BullMQ uses
':' as a Redis key separator internally; embedding ':' in jobIds
breaks scan and state lookups.
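One way to get deterministic, colon-free ids from the four inputs is to percent-encode each part and join on a character the encoder always escapes. This is a sketch, not the job-id.ts implementation: `buildJobId`, the `/` separator, and the use of `encodeURIComponent` are all assumptions.

```typescript
type JobKind = "event" | "event-batch" | "summary" | "reindex";

interface JobIdParts {
  kind: JobKind;
  team: string;
  project: string;
  source: string;
}

// encodeURIComponent escapes ':' (as %3A) and '/' (as %2F), so the output never
// contains a colon and the '/' separator cannot appear inside an encoded part.
function buildJobId(parts: JobIdParts): string {
  return [parts.kind, parts.team, parts.project, parts.source]
    .map((part) => encodeURIComponent(part))
    .join("/");
}
```

The same inputs always produce the same id, which is what lets a re-enqueue land on the existing job slot instead of creating a duplicate.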
ServerJobQueue.ts is a thin wrapper over BullMQ Queue + Worker that
enforces autorun:false, default concurrency 1, and an attached error
listener — all per BullMQ docs requirements. Test seams accept queue
and worker factories so unit tests do not need Redis.
outbox.ts publishes through the Postgres ObservationGenerationJob
repository as canonical history. enqueueOutbox writes the row first,
then publishes to BullMQ; if BullMQ throws, the row is transitioned to
failed and a failed event is appended. reconcileOnStartup re-enqueues
queued + processing rows after a restart, replacing terminal BullMQ
jobs that may still be holding the deterministic ID slot. markCompleted
and markFailed wrap transitionStatus and append the matching event row.
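The write-then-publish ordering above can be sketched with in-memory stand-ins. Assumptions are loud here: the row shape, the `publish` callback, the synchronous flow, and returning the failed row instead of rethrowing are all illustrative; the real outbox.ts is async and goes through the Postgres ObservationGenerationJob repository.

```typescript
type OutboxStatus = "queued" | "processing" | "completed" | "failed";

interface OutboxRow {
  id: string;
  status: OutboxStatus;
  events: string[]; // stand-in for the appended event rows
}

function enqueueOutbox(
  rows: Record<string, OutboxRow>,
  publish: (jobId: string) => void, // stand-in for the BullMQ publish call
  jobId: string,
): OutboxRow {
  // Idempotent: an existing row for this deterministic id is returned untouched.
  const existing = rows[jobId];
  if (existing) return existing;

  // 1. Write the canonical row first, so history exists even if the queue is down.
  const row: OutboxRow = { id: jobId, status: "queued", events: ["queued"] };
  rows[jobId] = row;

  // 2. Publish; on failure, transition the row to failed and record the event.
  try {
    publish(jobId);
  } catch (err) {
    row.status = "failed";
    row.events.push("failed");
  }
  return row;
}
```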
Includes 20 unit tests covering deterministic ID stability, colon-free
output, queue lifecycle, error-listener attachment, double-start
refusal, idempotent enqueue, BullMQ failure rollback, startup
reconciliation, max-attempts skipping, and completion / failure /
retry transitions.
Phase 3 commit 1 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server-beta): activate queue boundary in runtime service
Wire ActiveServerBetaQueueManager into the server-beta runtime graph.
The active manager owns one ServerJobQueue per generation kind (event,
event-batch, summary, reindex) and surfaces lane metadata through
boundary health.
Selection is opt-in and fail-fast: if CLAUDE_MEM_QUEUE_ENGINE is set to
bullmq the active manager is constructed (and any Redis/config error
throws — no silent fallback to SQLite, per Phase 3 anti-pattern guard).
For any other engine the disabled boundary remains so worker-era and
test setups stay compatible.
Widens ServerBetaBoundaryHealth.status to a discriminated union
('disabled' | 'active' | 'errored') with optional details. The disabled
adapter still emits status='disabled', which keeps the existing
server-beta-service test green.
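The opt-in selection and the widened health status described above can be sketched together. The names `QueueManager`, `disabledQueueManager`, and `selectQueueManager` are hypothetical, and the shape of `details` is an assumption; only the CLAUDE_MEM_QUEUE_ENGINE variable, the `bullmq` value, and the three status literals come from this message.

```typescript
type BoundaryHealth =
  | { status: "disabled" }
  | { status: "active"; details?: Record<string, unknown> }
  | { status: "errored"; details?: Record<string, unknown> };

interface QueueManager {
  health(): BoundaryHealth;
}

const disabledQueueManager: QueueManager = {
  health: () => ({ status: "disabled" }),
};

function selectQueueManager(
  env: Record<string, string | undefined>,
  createActive: () => QueueManager, // may throw on Redis or config errors
): QueueManager {
  // Any engine other than bullmq keeps the disabled boundary.
  if (env.CLAUDE_MEM_QUEUE_ENGINE !== "bullmq") return disabledQueueManager;
  // Fail fast: construction errors propagate, with no fallback to another engine.
  return createActive();
}
```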
ServerBetaService receives the manager through a new optional
queueManager field on CreateServerBetaServiceOptions so test graphs
and Phase 4 wiring can inject custom managers.
Adds tests/server/runtime/active-queue-manager.test.ts covering bullmq
guard, active health shape, per-kind queue access, close behavior, and
post-close errored health.
Phase 3 commit 2 of plans/2026-05-07-server-beta-independent-bullmq-observation-runtime.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(server-beta): cap /v1/events/batch at 500 events
Prevents unbounded array DoS surface flagged in PR review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>