# Issue Blowout 2026 - Running TODO

Branch: `issue-blowout-2026` (merged as PR #2079)
Strategy: Cynical dev. Every bug report is suspect — look for overengineered band-aids as root cause.
Test gate: After every build-and-sync, verify observations are flowing.
Released: **v12.3.2** on 2026-04-19

## Instructions for Continuation

### Workflow per issue
1. Use `/make-plan` and `/do` to attack each issue's root cause
2. Be cynical — most bug reports are surface-level; the real issue is usually overengineered band-aids
3. After every `npm run build-and-sync`, verify observations flow:
   ```bash
   sleep 5 && sqlite3 ~/.claude-mem/claude-mem.db "SELECT COUNT(*) FROM observations WHERE created_at_epoch > (strftime('%s','now') - 120) * 1000"
   ```
4. If observations stop flowing, that's a regression — fix it before continuing

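
The verification query in step 3 compares `created_at_epoch` against a cutoff scaled by 1000, which suggests the column is treated as epoch milliseconds while `strftime('%s','now')` yields whole seconds. A minimal TypeScript sketch of the same arithmetic follows; `recentCutoffMs` and `countRecent` are hypothetical helper names for illustration, not functions from the codebase.

```typescript
// Hypothetical helpers mirroring the SQL check; not part of claude-mem.

/** Cutoff in epoch milliseconds: (now in whole seconds - window) * 1000. */
function recentCutoffMs(nowMs: number, windowSeconds: number = 120): number {
  const nowSeconds = Math.floor(nowMs / 1000); // same granularity as strftime('%s','now')
  return (nowSeconds - windowSeconds) * 1000;
}

/** Counts timestamps strictly newer than the cutoff, like the WHERE clause. */
function countRecent(createdAtEpochs: number[], nowMs: number): number {
  const cutoff = recentCutoffMs(nowMs);
  return createdAtEpochs.filter((t) => t > cutoff).length;
}
```

A count of zero from the real query, two minutes after a build-and-sync, is the regression signal described in step 4.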
### Docker isolation
- **Port 37777**: Host's live bun worker (YOUR claude-mem instance — don't touch)
- **Port 37778**: Another agent's docker container (`claude-mem-dev`) — hands off
- **Your docker**: Use tag `claude-mem:blowout`, data dir `.docker-blowout-data/`
  ```bash
  TAG=claude-mem:blowout docker/claude-mem/build.sh
  HOST_MEM_DIR=$(pwd)/.docker-blowout-data TAG=claude-mem:blowout docker/claude-mem/run.sh
  ```
- Check observations in docker DB:
  ```bash
  sqlite3 .docker-blowout-data/claude-mem.db 'select count(*) from observations'
  ```

### PR → Review → Merge → Release cycle
1. Create PR from feature branch to main
2. Start review loop: `/loop 2m` to check and resolve review comments
   - CodeRabbit and Greptile post inline comments — read, fix, commit, push, reply
   - `claude-review` is a CI check — just needs to pass
   - CodeRabbit can take 5-10 min to process after each push
3. When all reviews pass: `gh pr merge <PR#> --repo thedotmack/claude-mem --squash --delete-branch --admin`
4. Close resolved issues: `for issue in <numbers>; do gh issue close $issue --repo thedotmack/claude-mem --comment "Fixed in PR #XXXX"; done`
5. Version bump:
   ```bash
   cd ~/Scripts/claude-mem
   git pull origin main
   # Run /version-bump patch (or use the skill: claude-mem:version-bump)
   # It handles: version files → build → commit → tag → push → gh release → changelog
   ```

### Key files in the codebase
- **Parser**: `src/sdk/parser.ts` — observation and summary XML parsing
- **Prompts**: `src/sdk/prompts.ts` — LLM prompt templates (observation, summary, continuation)
- **ResponseProcessor**: `src/services/worker/agents/ResponseProcessor.ts` — unified response handler
- **SessionManager**: `src/services/worker/SessionManager.ts` — queue, sessions, circuit breaker
- **SessionSearch**: `src/services/sqlite/SessionSearch.ts` — FTS5 and filter queries
- **SearchManager**: `src/services/worker/SearchManager.ts` — hybrid Chroma+SQLite orchestration
- **Worker service**: `src/services/worker-service.ts` — periodic reapers, startup
- **Summarize hook**: `src/cli/handlers/summarize.ts` — Stop hook entry point
- **SessionRoutes**: `src/services/worker/http/routes/SessionRoutes.ts` — HTTP API
- **ViewerRoutes**: `src/services/worker/http/routes/ViewerRoutes.ts` — /health endpoint
- **Agents**: `src/services/worker/SDKAgent.ts`, `GeminiAgent.ts`, `OpenRouterAgent.ts`
- **Modes**: `plugin/modes/code.json` — prompt field values for the default mode
- **Migrations**: `src/services/sqlite/migrations/runner.ts`
- **PendingMessageStore**: `src/services/sqlite/PendingMessageStore.ts` — queue persistence

## Completed Phase 2-5 (16 more issues — this session)

| # | Component | Issue | Resolution |
|---|-----------|-------|------------|
| 2053 | worker | Generator restart guard strands pending messages | FIXED — Time-windowed RestartGuard replaces flat counter (10 restarts/60s window, 5min decay) |
| 1868 | worker | SDK pool deadlock: idle sessions monopolize slots | FIXED — evictIdlestSession() callback in waitForSlot() preempts idle sessions |
| 1876 | worker | MCP loopback self-check fails; crash misclassification | FIXED — process.execPath replaces bare 'node'; removed false "exited unexpectedly" log |
| 1901 | hooks | Summarize stop hook exits code 2 on errors | FIXED — workerHttpRequest wrapped in try/catch, exits gracefully |
| 1907 | hooks | Linux/WSL session-init before worker healthy | FIXED — health-check curl loop added to UserPromptSubmit hook; HTTP call wrapped |
| 1896 | hooks | PreToolUse file-context caps Read to limit:1 | CLOSED — already fixed (mtime comparison at file-context.ts:255-267) |
| 1903 | hooks | PostToolUse/Stop/SessionEnd never fire | CLOSED — no-repro (hooks.json correct; Claude Code 12.0.1 platform bug) |
| 1932 | security | Admin endpoints spoofable requireLocalhost | FIXED — bearer token auth on all API endpoints |
| 1933 | security | Unauthenticated HTTP API exposes 30+ endpoints | FIXED — auto-generated token at ~/.claude-mem/worker-auth-token (mode 0600) |
| 1934 | security | watch.context.path written without validation | FIXED — path traversal protection validates against project root / data dir |
| 1935 | security | Unbounded input, no rate limits | FIXED — 5MB body limit (was 50MB), 300 req/min/IP rate limiter |
| 1936 | security | Multi-user macOS shared port cross-user MCP | FIXED — per-user port derivation from UID (37700 + uid%100) |
| 1911 | search | search()/timeline() cross-project results | FIXED — project filter passed to Chroma queries and timeline anchor searches |
| 1912 | search | /api/search per-type endpoints ignore project | FIXED — project $or clause added to searchObservations/Sessions/UserPrompts |
| 1914 | search | Imported observations invisible to MCP search | FIXED — ChromaSync.syncObservation() called after import |
| 1918 | search | SessionStart "no previous sessions" on fresh sessions | FIXED — session-init cwd fallback matches context.ts (process.cwd()) |

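
The time-windowed restart cap noted for #2053 can be illustrated roughly as below. This is only a sketch under assumptions: the class body, method name and constructor shape are hypothetical and not taken from the real `RestartGuard`, and the 5-minute decay is left out because its exact semantics are not spelled out in this file.

```typescript
// Hypothetical sketch of a sliding-window restart cap; the real RestartGuard may differ.
class RestartGuard {
  private restartTimes: number[] = []; // epoch-ms timestamps of recent restarts
  private readonly windowMs: number;   // only restarts inside this window count
  private readonly maxRestarts: number; // cap within the window

  constructor(windowMs: number = 60000, maxRestarts: number = 10) {
    this.windowMs = windowMs;
    this.maxRestarts = maxRestarts;
  }

  /** Records a restart at `now` and returns true while the cap is not exceeded. */
  recordRestart(now: number): boolean {
    // Forget restarts that have aged out of the window.
    this.restartTimes = this.restartTimes.filter((t) => now - t < this.windowMs);
    this.restartTimes.push(now);
    return this.restartTimes.length <= this.maxRestarts;
  }
}
```

Unlike a flat consecutive counter, old restarts stop counting once they leave the window, so a long-running session that restarts occasionally is not permanently blocked.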
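
The per-user port formula from #1936 (`37700 + uid%100`) is small enough to sketch directly. The function name is hypothetical, and falling back to 37777 when no numeric UID is available (for example on Windows) is an assumption made for this illustration.

```typescript
// Hypothetical helper; only the 37700 + uid % 100 formula comes from the table above.
const BASE_PORT = 37700;
const FALLBACK_PORT = 37777; // assumed default when no usable UID exists

function deriveWorkerPort(uid: number | undefined): number {
  if (uid === undefined || !Number.isInteger(uid) || uid < 0) {
    return FALLBACK_PORT;
  }
  return BASE_PORT + (uid % 100);
}
```

With this formula a UID of 501 maps to port 37701 and a UID of 1000 maps to 37700. Two users whose UIDs differ by a multiple of 100 would still land on the same port, so the derivation reduces collisions rather than ruling them out.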
## Completed (9 issues — PR #2079, v12.3.2)

| # | Component | Issue | Resolution |
|---|-----------|-------|------------|
| 1908 | summarizer | parseSummary discards output when LLM emits observation tags | CLOSED — already fixed by Gen 3 coercion (coerceObservationToSummary in parser.ts) |
| 1953 | db | Migration 7 rebuilds table every startup | CLOSED — already fixed by commit 59ce0fc5 (origin !== 'pk' filter) |
| 1916 | search | /api/search/by-concept emits malformed SQL | FIXED — concept→concepts remap in SearchManager.normalizeParams() |
| 1913 | search | Text search returns empty when ChromaDB disabled | FIXED — FTS5 keyword fallback in SessionSearch + SearchManager |
| 2048 | search | Text queries should fall back to FTS5 when Chroma disabled | FIXED — same as #1913 |
| 1957 | db | pending_messages: failed rows never purged | FIXED — periodic clearFailed() in stale session reaper (every 2 min) |
| 1956 | db | WAL grows unbounded, no checkpoint schedule | FIXED — journal_size_limit=4MB + periodic wal_checkpoint(PASSIVE) |
| 1874 | worker | processAgentResponse deletes queued messages on non-XML output | FIXED — mark messages failed (with retry) instead of confirming |
| 1867 | worker | Queue processor dies while /health stays green | FIXED — activeSessions count added to /health endpoint |

Also fixed (not an issue): docker/claude-mem/run.sh nounset-safe TTY_ARGS expansion.
Also fixed (Greptile review): cached isFts5Available() at construction time.

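
The `concept`→`concepts` remap listed for #1916 might look something like the sketch below. The `SearchParams` type and the function body are assumptions for illustration; the real `SearchManager.normalizeParams()` presumably handles more fields than this.

```typescript
// Hypothetical shape; only the concept -> concepts remap is taken from the table above.
type SearchParams = Record<string, unknown>;

function normalizeParams(raw: SearchParams): SearchParams {
  const params: SearchParams = { ...raw }; // never mutate the caller's object
  // Treat the singular `concept` as an alias, but let an explicit `concepts` win.
  if (params["concept"] !== undefined && params["concepts"] === undefined) {
    params["concepts"] = params["concept"];
  }
  delete params["concept"];
  return params;
}
```

Normalizing once at the boundary means downstream query builders only ever see the plural key.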
## Remaining — CRITICAL (6)

| # | Component | Issue |
|---|-----------|-------|
| 1925 | mcp | chroma-mcp subprocess leak via null-before-close |
| 1926 | mcp | chroma-mcp stdio handshake broken across all versions |
| 1942 | auth | Default model not resolved on Bedrock/Vertex/Azure |
| 1943 | auth | SDK pipeline rejects Bedrock auth |
| 1880 | windows | Ghost LISTEN socket on port 37777 after crash |
| 1887 | windows | Failing worker blocks Claude Code MCP 10+ min in hook-restart loop |

## Remaining — HIGH (32)

| # | Component | Issue |
|---|-----------|-------|
| 1869 | worker | No mid-session auto-restart after inner crash |
| 1870 | worker | Stop hook blocks ~110s when SDK pool saturated |
| 1871 | worker | generateContext opens fresh SessionStore per call |
| 1875 | worker | Spawns uvx/node/claude by bare name; silent fail in non-interactive |
| 1877 | worker | Cross-session context bleed in same project dir |
| 1879 | worker | Session completion races in-flight summarize |
| 1890 | sdk-pool | SDK session resume during summarize causes context-overflow |
| 1892 | sdk-pool | Memory agent prompt defeats cache (dynamic before static) |
| 1895 | hooks | Stop hook spins 110s when worker older than v12.1.0 |
| 1897 | hooks | PreToolUse:Read lacks PATH export and cache-path lookup |
| 1899 | hooks | SessionStart additionalContext >10KB truncated to 2KB |
| 1902 | hooks | Stop and PostToolUse hooks synchronously block up to 120s |
| 1904 | hooks | UserPromptSubmit hooks skipped in git worktree sessions |
| 1905 | hooks | Saved_hook_context entries peg CPU at 100% on session load |
| 1906 | hooks | PR #1229 fallback path points to source, not cache |
| 1909 | summarizer | Summarize hook doesn't recognize Gemini transcripts |
| 1921 | mcp | Root .mcp.json is empty, mcp-search never registers |
| 1922 | mcp | MCP server uses 3s timeout for corpus prime/query |
| 1929 | installer | "Update now" fails for cache-only installs |
| 1930 | installer | Windows 11 ships smart-explore without tree-sitter |
| 1937 | observer | JSONL files accumulate indefinitely, tens of GB |
| 1938 | observer | Observer background sessions burn tokens with no budget |
| 1939 | cross-platform | Project key uses basename(cwd), fragmenting worktrees |
| 1941 | cross-platform | Linux worker with live-but-unhealthy PID blocks restart |
| 1944 | auth | ANTHROPIC_AUTH_TOKEN not forwarded to SDK subprocess |
| 1945 | auth | Vertex AI CLI auth fails silently on expired OAuth |
| 1947 | plugin-lifecycle | OpenCode tool args as plain objects not Zod schemas |
| 1948 | plugin-lifecycle | OpenClaw installer "plugin not found" |
| 1949 | plugin-lifecycle | OpenClaw per-agent memory isolation broken |
| 1950 | plugin-lifecycle | OpenClaw missing skills, session drift, workspaceDir loss |
| 1952 | db | ON UPDATE CASCADE rewrites historical session attribution |
| 1954 | db | observation_feedback schema mismatch source vs compiled |
| 1958 | viewer | Settings model dropdown destroys precise model IDs |
| 1881-1888 | windows | 8 Windows-specific bugs (paths, spawning, timeouts) |

## Remaining — MEDIUM (22)

| # | Component | Issue |
|---|-----------|-------|
| 1872 | worker | Gemini 400/401 triggers 2-min crash-recovery loop |
| 1873 | worker | worker-service.cjs killed by SIGKILL (unbounded heap) |
| 1878 | worker | Logger caches log file path, never rotates |
| 1891 | sdk-pool | Mode prompts in user messages, not system prompt |
| 1893 | sdk-pool | SDK sub-agents hardcoded permissionMode:"default" |
| 1894 | hooks | SessionStart can't find claude at ~/.local/bin |
| 1898 | hooks | SessionStart health-check uses hardcoded port 37777 |
| 1900 | hooks | Setup hook references non-existent scripts/setup.sh |
| 1910 | summarizer | Summary prompt leaks observation tags, ignores user_prompt |
| 1915 | search | Search results not deduplicated |
| 1917 | search | $CMEM context preview shows oldest instead of newest |
| 1920 | search | Context footer "ID" ambiguous across 3 ID spaces |
| 1923 | mcp | smart_outline empty for .txt files |
| 1924 | mcp | chroma-mcp child not terminated on exit |
| 1927 | mcp | chroma-mcp fails on WSL with ALL_PROXY=socks5 |
| 1928 | installer | BranchManager.pullUpdates() fails on cache-layout |
| 1931 | installer | npm run worker:status ENOENT .claude/package.json |
| 1940 | cross-platform | cmux.app wrapper "Claude executable not found" |
| 1946 | auth | OpenRouter 401 Missing Authentication header |
| 1955 | db | Duplicate observations bypass content-hash dedup |
| 1959 | viewer | SSE new_prompt broadcast dies after /reload-plugins |
| 1961 | misc | Traditional Chinese falls back to Simplified |

## Remaining — LOW (3)

| # | Component | Issue |
|---|-----------|-------|
| 1919 | search | Shared jsts tree-sitter query applies TS-only to JS |
| 1951 | plugin-lifecycle | OpenClaw lifecycle events stored as observations |
| 1960 | misc | OpenRouter URL hardcoded |

## Remaining — NON-LABELED (1)

| # | Component | Issue |
|---|-----------|-------|
| 2054 | installer | installCLI version-pinned alias can't self-update |

## Suggested Next Attack Order

### Phase 2: Worker stability — DONE
### Phase 3: Hooks reliability — DONE
### Phase 4: Security hardening — DONE
### Phase 5: Search remaining — DONE

### Phase 6: MCP + Auth
- #1925, #1926, #1942, #1943

### Phase 7: Windows
- #1880, #1887, #1881-1888

### Phase 8: MCP / Chroma
- #1925, #1926, #2046, #1921

### Phase 9: Everything else
- Remaining hooks, installer, windows, observer, viewer, auth, plugin-lifecycle

## Progress Log

| Time | Action | Result |
|------|--------|--------|
| 9:40p | #1908 analyzed | Already fixed by Gen 3 coercion. Closed. |
| 9:51p | #1916 fixed | concept→concepts remap in normalizeParams |
| 9:53p | #1913/#2048 fixed | FTS5 fallback in SessionSearch + SearchManager |
| 9:57p | #1953 closed | Already fixed by commit 59ce0fc5 |
| 9:57p | #1957 fixed | Periodic clearFailed() in stale session reaper |
| 9:58p | #1956 fixed | journal_size_limit + periodic WAL checkpoint |
| 10:01p | #1874 fixed | Non-XML responses mark messages failed instead of confirming |
| 10:01p | #1867 fixed | Health endpoint includes activeSessions count |
| 10:02p | build-and-sync | Observations flowing. No regression. |
| 10:03p | PR #2079 created | 2 commits pushed |
| 10:06p | Greptile review | 2 comments — cached isFts5Available(). Fixed + pushed. |
| 10:20p | PR #2079 merged | All reviews passed (CodeRabbit, Greptile, claude-review) |
| 10:25p | v12.3.2 released | Tag pushed, GitHub release created, CHANGELOG updated |