# Observation Queue Engine Deep Dive: BullMQ vs Bee-Queue Date: 2026-05-06 ## Executive decision If claude-mem replaces its observation queue with one of the two Redis-backed libraries, choose **BullMQ**, not Bee-Queue. That said, the current observation queue is not a generic background job queue. It is a durable, per-session input stream feeding long-lived provider generators. Replacing it with Redis should not be the default local install path unless claude-mem is willing to require, bundle, or supervise Redis. If Redis is not acceptable as a new operational dependency, the better path is to keep the SQLite queue and fix the contract/test drift. Recommended path: 1. Stabilize the current queue contract and tests. 2. Add a queue-engine adapter boundary. 3. Keep SQLite as the default backend. 4. Add BullMQ as an optional backend for users who explicitly configure Redis. 5. Do not adopt Bee-Queue. ## Current claude-mem queue shape The active queue path is: - `src/services/worker/http/shared.ts` and `SessionRoutes.ts` ingest observations/summarize requests. - `SessionManager.queueObservation()` and `queueSummarize()` persist rows through `PendingMessageStore.enqueue()`. - `SessionQueueProcessor.createIterator()` claims one row at a time and wakes via a per-session `EventEmitter`. - Provider loops in `ClaudeProvider`, `GeminiProvider`, and `OpenRouterProvider` consume `sessionManager.getMessageIterator(sessionDbId)`. - Parsed agent output is stored through `processAgentResponse()`, then `SessionManager.clearPendingForSession()` clears that session's pending rows. Key semantics that must survive any replacement: - Per-session FIFO ordering. - At-most-one active consumer per session. - Durable queue across worker restarts. - Startup recovery from `processing` back to `pending`. - Low-latency wakeup when new tool observations arrive. - Deduplication by `content_session_id + tool_use_id`. - Original observation timestamp preservation for storage/broadcast. - Queue depth for `/api/processing-status` and SSE. - Local-first behavior and simple install are product requirements, not just implementation details. Important mismatch found during the dive: - Current `PendingMessageStore` only models `pending` and `processing`. - Older migrations, tests, and scripts still reference `processed`, `failed`, `retry_count`, `completed_at_epoch`, `failed_at_epoch`, and `worker_pid`. - `storeObservationsAndMarkComplete()` still updates a row to `processed`, while the currently visible queue path clears all pending messages for the session after parsing. - `src/services/sqlite/schema.sql` still creates `idx_pending_messages_worker_pid` even though the visible table definition has no `worker_pid`. Focused test run: ```sh bun test tests/services/sqlite/PendingMessageStore.test.ts tests/services/queue/SessionQueueProcessor.test.ts ``` Result: 10 pass, 6 fail. Failures show stale tests/contract drift: - `PendingMessageStore.test.ts` passes `3` as constructor arg, but constructor now expects `onMutate?: () => void`. - `SessionQueueProcessor.test.ts` expects retry-after-store-error behavior, but current implementation logs and exits the iterator on claim failure. This needs to be reconciled before swapping engines; otherwise the migration will encode inconsistent behavior. ## BullMQ deep dive Sources checked: - GitHub: https://github.com/taskforcesh/bullmq - NPM: https://www.npmjs.com/package/bullmq - Docs: https://docs.bullmq.io/ - Queues: https://docs.bullmq.io/guide/queues - Connections/Redis constraints: https://docs.bullmq.io/guide/connections - Production notes: https://docs.bullmq.io/guide/going-to-production - Manual processing: https://docs.bullmq.io/patterns/manually-fetching-jobs - Job IDs/dedupe: https://docs.bullmq.io/guide/jobs/job-ids - Stalled jobs: https://docs.bullmq.io/guide/workers/stalled-jobs Current package/repo facts captured on 2026-05-06: - NPM latest: `bullmq@5.76.5`. - NPM modified: 2026-05-02. - GitHub pushed: 2026-05-05. - GitHub stars/forks/open issues at capture time: 8808 / 606 / 414. - License: MIT. - Unpacked size: about 2.5 MB. - Dependencies: `ioredis`, `cron-parser`, `msgpackr`, `node-abort-controller`, `semver`, `tslib`. - TypeScript types are bundled. - A Bun import smoke test succeeded for `import { Queue } from 'bullmq'`. Strengths for claude-mem: - Actively maintained and widely used. - Built-in TypeScript API. - Redis-backed durability and distributed workers. - Built-in stalled-job recovery, retry attempts, fixed/exponential backoff, delays, priorities, FIFO/LIFO, auto-removal, QueueEvents, manual processing APIs, and job ID based dedupe. - BullMQ docs explicitly support manual job fetching with `Worker#getNextJob()`, `moveToCompleted()`, `moveToFailed()`, and lock extension. This matters because claude-mem's provider loop is closer to a stream consumer than a normal job processor. Costs and risks: - Redis becomes required for the queue backend. BullMQ docs require a Redis connection to use queues and recommend Redis compatibility 6.2+. - Redis must be configured like durable infrastructure, not cache: AOF persistence and `maxmemory-policy=noeviction` are recommended/required for correctness. - Connection count increases. BullMQ docs note each class consumes at least one Redis connection; `Worker` and `QueueEvents` need blocking/duplicated connections in some cases. - Jobs store data in Redis in clear text unless claude-mem encrypts or avoids sensitive payload fields. Tool input/output can be sensitive. - BullMQ job completion/failure semantics do not map directly to claude-mem's current "provider consumes many messages, parses one response, then clears the session" behavior. - Per-session FIFO with parallel sessions is not free in OSS BullMQ. A single global queue with worker concurrency > 1 can violate same-session ordering unless we add a scheduler. BullMQ Pro groups would address this, but claude-mem should not depend on Pro. - Custom `jobId` is useful for `tool_use_id` dedupe, but BullMQ custom job IDs must not contain `:`. Use a hash or safe delimiter. - Manual processing requires lock management. BullMQ docs call out that manually fetched jobs do not get automatic lock renewal like standard processors; claude-mem would need `extendLock()` for long provider calls or a large lock duration. Best BullMQ shape if adopted: - Prefer **one queue per active session** over one global queue initially: - Queue name: `claude-mem:session:` or a hashed content-session suffix. - Worker/manual consumer concurrency: `1`. - Preserves per-session FIFO without BullMQ Pro groups. - Active session counts are naturally low for local claude-mem usage. - Cleanup queue keys when a session is deleted or after idle timeout. - Use `jobId` for observation dedupe: - `obs_`. - Summaries should use a distinct id scheme and usually should not dedupe unless the current summarize semantics require it. - Use `removeOnComplete` aggressively if SQLite remains the source of truth for stored observations. - Keep only bounded failed jobs for debugging. - Treat Redis as queue state only; SQLite remains the canonical observation/session store. - Add config: - `CLAUDE_MEM_QUEUE_ENGINE=sqlite|bullmq` - `CLAUDE_MEM_REDIS_URL` - `CLAUDE_MEM_QUEUE_REDIS_PREFIX` - `CLAUDE_MEM_QUEUE_ENCRYPT_PAYLOADS=true|false` if sensitive fields are stored. ## Bee-Queue deep dive Sources checked: - GitHub: https://github.com/bee-queue/bee-queue - NPM: https://www.npmjs.com/package/bee-queue - README/API docs in repository. Current package/repo facts captured on 2026-05-06: - NPM latest: `bee-queue@2.0.0`. - NPM modified: 2025-12-08. - GitHub pushed: 2026-04-10. - GitHub stars/forks/open issues at capture time: 4027 / 221 / 47. - License field from NPM: MIT. GitHub API license metadata returned `NOASSERTION`. - Unpacked size: about 107 KB. - Dependencies: `redis@^3.1.2`, `p-finally`, `promise-callbacks`. - NPM package exposes `./index.d.ts`. - A Bun import smoke test succeeded for `import BeeQueue from 'bee-queue'`. Strengths for claude-mem: - Very small and simple. - Designed for short, real-time jobs. - Redis-backed with Lua/pipelining and low overhead. - Supports concurrency, retries, retry strategies, timeouts, scheduled jobs, pub/sub events, results to producers, and stalled job retry. - Redis requirement is lighter in docs: Redis 2.8+, with Redis 3.2+ recommended for delayed jobs. Costs and risks: - Narrower feature set by design. The README says priorities and repeatable jobs are not currently supported. - CommonJS-first API; workable, but less idiomatic for this ESM TypeScript codebase. - Uses the older `redis` v3 client line, not modern `redis` v4/v5 or `ioredis`. - Observability and operational tooling are thinner than BullMQ. - Same per-session ordering mismatch exists as BullMQ, but with fewer escape hatches. - Delayed retry behavior requires `activateDelayedJobs` on at least one queue instance. - The package is newly revived, but not as active/mature as BullMQ for a queue-engine foundation. Conclusion: Bee-Queue is attractive if the only goal is "small Redis queue for short jobs." claude-mem needs a durable session stream with strict per-session semantics, good TypeScript ergonomics, explicit recovery behavior, and long-term maintenance. Bee-Queue is the wrong tradeoff. ## Scorecard | Area | Current SQLite | BullMQ | Bee-Queue | | --- | --- | --- | --- | | Local-first install | Strong | Weak unless Redis is bundled/optional | Weak unless Redis is bundled/optional | | Per-session FIFO | Strong | Medium with per-session queues; weak with one global queue | Medium with per-session queues; weak with one global queue | | Restart durability | Strong, SQLite-backed | Strong if Redis persistence configured | Strong if Redis persistence configured | | Stalled recovery | Custom/simple | Strong built-in | Built-in | | TypeScript fit | Strong | Strong | Medium | | Maintenance/activity | Internal | Strong | Medium | | Operational complexity | Low | High | Medium-high | | Queue observability | Custom/basic | Strong | Medium | | Dependency footprint | Low | Larger | Small | | Privacy/data locality | SQLite local file | Redis clear-text unless handled | Redis clear-text unless handled | | Best use in claude-mem | Default | Optional advanced backend | Do not use | ## Migration plan Phase 0: Fix the existing contract - Decide whether `pending_messages.status` is only `pending|processing`, or whether `processed|failed` is coming back. - Fix `schema.sql` and migrations so `worker_pid` indexes are not created after `worker_pid` is dropped. - Fix `storeObservationsAndMarkComplete()` or remove it if no longer used. - Update queue tests to match real behavior: - constructor signature; - claim error behavior; - reset-on-start behavior; - dedupe by `tool_use_id`; - clear-session behavior. Phase 1: Add an adapter boundary Define a small interface around current behavior, not around BullMQ: ```ts interface ObservationQueueEngine { enqueue(sessionDbId: number, contentSessionId: string, message: PendingMessage): Promise; createIterator(sessionDbId: number, signal: AbortSignal, onIdleTimeout?: () => void): AsyncIterableIterator; clearPendingForSession(sessionDbId: number): Promise; resetProcessingToPending(sessionDbId: number): Promise; getPendingCount(sessionDbId: number): Promise; getTotalQueueDepth(): Promise; close(): Promise; } ``` Keep `SqliteObservationQueueEngine` as the first implementation by moving the current `PendingMessageStore + SessionQueueProcessor` behavior behind this interface. Phase 2: Add BullMQ backend behind feature flag - Add `BullMqObservationQueueEngine`. - Use per-session queues with concurrency/manual fetch of 1. - Use safe hashed `jobId` for observation dedupe. - Preserve `_originalTimestamp` in job data. - Keep provider loops unchanged by preserving the async iterator interface. - Implement lock extension if manual processing can exceed the configured lock duration. - Keep SQLite as the observation/session truth; Redis is transport. - Add Redis connectivity health to `/api/health` only when BullMQ backend is enabled. Phase 3: Migration and fallback - On startup with BullMQ enabled, migrate existing SQLite `pending_messages` rows into BullMQ once, then mark/delete migrated rows. - If Redis is unavailable at startup, fail loudly for `CLAUDE_MEM_QUEUE_ENGINE=bullmq`; do not silently drop observations. - For default `sqlite`, do not require Redis. Phase 4: Tests - Unit-test the adapter contract with a shared test suite. - Run the suite against SQLite always. - Run BullMQ tests only when Redis is available, or spin Redis in CI. - Add crash/restart tests: - enqueue, kill worker, restart, process; - claimed job stalls and returns; - duplicate `tool_use_id` is suppressed; - per-session FIFO across concurrent sessions; - idle timeout still aborts provider subprocesses. ## Final recommendation Do not do a direct swap from SQLite to either library. If the product goal is to keep claude-mem easy to install and local-first, invest in the current SQLite queue: clean up the schema/status drift, restore tests, add explicit retries/failure rows if needed, and keep the in-process wakeup path. If the product goal is to support distributed workers or stronger queue observability, add **BullMQ as an optional backend** through an adapter. It has the right maintenance profile, TypeScript support, recovery primitives, and docs. Bee-Queue is too narrow and too legacy-client-oriented for this role.