Commit Graph

435 Commits

Author SHA1 Message Date
Alessandro Costa c3e5f3a79e fix: wire generated_by_model into observation write path
The generated_by_model column was added to the observations table in the
Phase 0 governance schema migration but never wired into the INSERT
statements. All 3,878+ observations in production have this field NULL.

This fix threads the model ID from each agent (SDKAgent, GeminiAgent,
OpenRouterAgent) through processAgentResponse() into storeObservation(),
storeObservations(), and storeObservationsAndMarkComplete().

Unblocks Thompson Sampling RFC (#1571) which needs {obs_type}:{model}
as the bandit arm key.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:25:18 -03:00
Alex Newman 76207fb8d6 Merge branch 'feat/tier-routing-feedback' into thedotmack/merge-alessandro-prs 2026-04-04 15:18:34 -07:00
Alessandro Costa 42cc863bf2 fix: address CodeRabbit review on PR #1569
Critical:
- migrations: change version 8 → 25 to avoid collision with
  MigrationRunner.addObservationHierarchicalFields (uses version 8)
- SessionRoutes: remove duplicate imports that prevent compilation

Major:
- SessionRoutes: call applyTierRouting() before every generator spawn
  (stale-recovery and crash-recovery paths were missing it)
- applyTierRouting: clear session.modelOverride at top before re-evaluating
  to prevent stale tier from persisting across spawns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:18:13 -07:00
Alessandro Costa 0fcc078873 feat: tier routing by queue complexity + observation feedback table
Tier Routing:
- Inspect pending queue before starting generator
- Summarize messages → CLAUDE_MEM_TIER_SUMMARY_MODEL (e.g., Opus)
- All simple tools (Read, Glob, Grep, LS) → CLAUDE_MEM_TIER_SIMPLE_MODEL (Haiku)
- Mixed/complex → default model (no override)
- session.modelOverride in ActiveSession, used by SDKAgent.getModelId()
- peekPendingTypes() in PendingMessageStore for non-claiming inspection
- Configurable via CLAUDE_MEM_TIER_ROUTING_ENABLED (default: true)

Feedback Collection (schema only):
- New observation_feedback table via MigrationRunner (schema version 24)
- Tracks signal_type (semantic_inject_hit, search_accessed, etc.)
- Indexes on observation_id and signal_type
- Foundation for future Thompson Sampling optimization

Production data (24h tier routing test):
- 36 Haiku observations in 4 min, quality indistinguishable from Sonnet
- Estimated ~52% cost reduction on SDK Agent usage
- 835 → 6,695 feedback signals collected over 13 days

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:18:13 -07:00
Alex Newman d11c0821bb fix: correct semantic endpoint doc comment GET→POST, clamp limit 1-20
Follow-up to PR #1568: fix stale doc comment that still said GET, and add
limit parameter validation (default 5, clamped to 1-20 range).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:17:11 -07:00
Alessandro Costa 876cc4d837 feat: semantic context injection via Chroma on UserPromptSubmit (#1568)
* feat: semantic context injection via Chroma on every UserPromptSubmit

On each prompt, queries ChromaDB for the top-N most relevant past
observations and injects them as additionalContext. Replaces the
recency-based "last N observations" approach with relevance-based
semantic search.

Changes:
- session-init.ts: After session init, query /api/context/semantic
  with user's prompt text. If results found, return as
  hookSpecificOutput with hookEventName 'UserPromptSubmit'.
- SearchRoutes.ts: New GET /api/context/semantic endpoint that queries
  SearchManager with format='json' and formats results as markdown.
- SettingsDefaultsManager.ts: New settings CLAUDE_MEM_SEMANTIC_INJECT
  (default: true) and CLAUDE_MEM_SEMANTIC_INJECT_LIMIT (default: 5).

Key behaviors:
- Fires on every UserPromptSubmit (not just SessionStart)
- Minimum prompt length: 20 chars (skips "ok", "yes", etc.)
- Skips media-only prompts
- Graceful degradation: if worker/Chroma unavailable, no injection
- Survives /clear: re-injects on next prompt (not session-bound)
- Uses workerHttpRequest (v10.6.3 API, not raw fetch)

Production data (23 days, 3,400+ observations):
- Before: 8 most recent observations (often irrelevant to current topic)
- After: 5 most relevant observations (semantic match)
- Token cost: ~1800 → ~800-1200 per injection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review on PR #1568

- session-init: don't skip semantic injection when contextInjected=true
  (only skip agent re-init, semantic lookup must run every prompt)
- session-init: normalize SEMANTIC_INJECT toggle via String().toLowerCase()
- semantic endpoint: change from GET to POST to avoid URL-length limits
  and prompt exposure in access logs. Handler accepts both body and query
  for backwards compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Alessandro Costa <alessandro@claudio.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:16:46 -07:00
Alessandro Costa 64cce2bf10 fix: resolve 3 upstream bugs (summarize, ChromaSync, HealthMonitor) (#1566)
* fix: resolve 3 upstream bugs in summarize, ChromaSync, and HealthMonitor

1. summarize.ts: Skip summary when transcript has no assistant message.
   Prevents error loop where empty transcripts cause repeated failed
   summarize attempts (~30 errors/day observed in production).

2. ChromaSync.ts: Fallback to chroma_update_documents when add fails
   with "IDs already exist". Handles partial writes after MCP timeout
   without waiting for next backfill cycle.

3. HealthMonitor.ts: Replace HTTP-based isPortInUse with atomic socket
   bind on Unix. Eliminates TOCTOU race when two sessions start
   simultaneously (HTTP check is non-atomic — both see "port free"
   before either completes listen()). Updated tests accordingly.

All three bugs are pre-existing in v10.5.5. Confirmed via log analysis
of 543K lines over 17 days of production usage across two servers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add CONTRIB_NOTES.md to gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review on PR #1566

- HealthMonitor: add APPROVED OVERRIDE annotation for Win32 HTTP fallback
- ChromaSync: replace chroma_update_documents with delete+add for proper
  upsert (update only modifies existing IDs, silently ignores missing ones)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Alessandro Costa <alessandro@claudio.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:15:08 -07:00
Alessandro Costa 8958c3335d feat: drain orphaned pending messages on SIGTERM session completion (#1567)
* feat: drain orphaned pending messages on session completion (SIGTERM)

When deleteSession() aborts the SDK agent via SIGTERM, pending messages
in the queue are never processed. Without drain, they remain in
'pending' status forever — no future generator picks them up because
the session is already completed.

Adds markAllSessionMessagesAbandoned() call after deleteSession() in
completeByDbId(). This reuses the existing PendingMessageStore method
already used by worker-service.ts terminateSession().

Production evidence: 15 orphaned summarize messages found across
completed sessions (ages 3h to 3 days) before this fix. After fix:
0 orphaned messages over 23 days of operation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: document best-effort drain limitation per CodeRabbit review #1567

Add comment noting the rare race condition when generators outlive the
30s SIGTERM timeout. Practical risk is negligible (0 orphans over 23 days).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Alessandro Costa <alessandro@claudio.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:14:25 -07:00
Alex Newman c7c68e81f4 fix: address 10 unresolved PR review threads
- README: add language specifier to fenced code block
- paths.ts: guard npmPackageRootDirectory() against bundle structure drift
- OpenCodeInstaller: resolve bundle from import.meta.url, not process.cwd()
- OpenCodeInstaller: log warnings on AGENTS.md injection failures
- WindsurfHooksInstaller: key registry by full workspace path, not basename
- uninstall.ts: poll health endpoint to wait for worker exit before file deletion
- uninstall.ts: call IDE-specific uninstallers (Gemini, Windsurf, OpenCode, OpenClaw, Codex)
- opencode-plugin: cap session tracking Map at 1000 entries with LRU eviction
- GeminiCliHooksInstaller: document intentional JSON double-escaping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 14:45:53 -07:00
Alex Newman 4de417663c fix: catch corrupt JSON in Gemini CLI status command
readGeminiSettings() throws on corrupt JSON since ae6915b, but
checkGeminiCliHooksStatus() called it without catching — violating
its "returns 0 always" contract.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 14:29:08 -07:00
Alex Newman 190c74492f fix: address second PR review — clean replace, IDE failure bubbling, bun validation
- cpSync now does rmSync before copy to avoid stale file merges
- setupIDEs() returns failed IDE list; install reports partial success
- runSmartInstall() returns boolean status instead of void
- Worker port in next-steps URL reads CLAUDE_MEM_WORKER_PORT env var
- Goose YAML regex stops at column-0 keys (prevents eating sibling sections)
- AGENTS.md uninstall removes header-only stub files
- findBunPath() validated before use in WindsurfHooksInstaller
- Cursor marked unsupported in ide-detection until installer is wired

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 14:17:18 -07:00
Alex Newman ae6915b88e fix: address PR review — shebang, double-escaping, data loss, uninstall scope
- Add shebang banner to NPX CLI esbuild config so npx claude-mem works
- Remove manual backslash pre-escaping in WindsurfHooksInstaller (JSON.stringify handles it)
- Scope cache deletion to claude-mem only, not entire vendor namespace
- Use getWorkerPort() in OpenCodeInstaller instead of hard-coded 37777
- Throw on corrupt JSON in readJsonSafe/readGeminiSettings/Windsurf to prevent data loss
- Fix Cursor install stub to warn instead of silently succeeding
- Fix Gemini uninstall to remove individual hooks within groups, not whole groups
- Update tests for new corrupt-file-throws behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 13:49:14 -07:00
Alex Newman 2495f98496 refactor: consolidate MCP factory, add non-TTY support, auto-detect transcript watchers
- Phase 1: Replace 5 duplicate MCP installers with config-driven factory, extract
  shared context-injection and json-utils utilities, fix process.execPath usage
- Phase 2: Add non-TTY fallback for @clack/prompts to prevent ENOENT in CI/Docker
- Phase 3: Wire GeminiCliHooksInstaller through hook command framework with adapter
- Phase 4: Auto-start transcript watchers on worker boot when config exists

Net -107 lines via DRY consolidation of duplicated installer logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 00:35:55 -07:00
Alex Newman a2ac116aac fix: move summary wait + session-complete into Stop hook to prevent lost summaries
SessionEnd has a 1.5s hardcoded cap from Claude Code (CLAUDE_CODE_SESSIONEND_HOOKS_TIMEOUT_MS),
making it unsuitable for waiting on async work. Previously, the Stop hook would fire-and-forget
the summarize request, then SessionEnd would immediately call deleteSession — aborting the SDK
agent mid-summary.

Now the Stop hook (120s timeout, no cap) owns the full lifecycle:
1. Queue summarize request
2. Poll new GET /api/sessions/status endpoint until queue drains
3. Call /api/sessions/complete after summary finishes

SessionEnd is now a true fire-and-forget fallback (process.exit(0) immediately).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:05:53 -07:00
Alex Newman 8265fc7aa1 Merge remote-tracking branch 'origin/thedotmack/npx-gemini-cli' into thedotmack/npx-gemini-cli
Resolve merge conflicts in adapter index, gemini-cli adapter, and rebuilt CJS artifacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 13:47:49 -07:00
Alex Newman 76a880a3d6 feat: update install CLI, ESM compat, and Gemini CLI docs
Fixes CursorHooksInstaller ESM compatibility, updates install command
with improved path resolution, and refreshes built plugin artifacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 12:38:45 -07:00
Alex Newman a66b98bcdd fix: strip <system-reminder> tags from persisted memory and DRY up regex
System reminders (CLAUDE.md contents, deferred tool lists) were being
stored in memory observations. Add system-reminder to the tag stripping
pipeline alongside <private> and <system_instruction>, and extract the
duplicated regex into a shared SYSTEM_REMINDER_REGEX constant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 00:25:13 -07:00
Alex Newman 67645041fa Merge main into thedotmack/file-read-timeline-inject
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 16:11:41 -07:00
Ousama Ben Younes 2a304d59eb fix: handle bare path strings in files_modified/files_read columns (#1359)
JSON.parse('/path/to/file') throws SyntaxError, crashing the viewer and
any code reading observations with legacy bare-path data in those columns.

- Add parseFileList() helper in observations/files.ts — tries JSON.parse,
  falls back to wrapping bare strings in an array
- Replace unsafe JSON.parse calls in files.ts, SessionStore.ts, ChromaSync.ts
- Add 9 unit tests covering null, empty, valid JSON, bare paths, invalid JSON

Closes #1359

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-01 06:17:35 +00:00
Ousama Ben Younes 12501412b9 fix: persist session completion to database in completeByDbId (#1532)
completeByDbId only cleaned up in-memory state, leaving sdk_sessions rows
with status='active' and completed_at=NULL indefinitely. Ghost sessions
accumulated and exhausted the agent pool, causing 60s timeout errors.

- Add SessionStore.markSessionCompleted() to set status/completed_at/completed_at_epoch
- Call it at the start of completeByDbId before in-memory cleanup
- Inject SessionStore into SessionCompletionHandler via constructor
- Add 4 tests covering status, timestamps, isolation, and non-existent IDs

Closes #1532

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-01 06:02:14 +00:00
Oracle Public Cloud User 4589b34eab fix: decouple mcp health from loopback self-check 2026-03-30 16:37:56 +00:00
Ryan Malia b0f1a458cf fix: log warning when readiness times out on reused-worker path (#1491)
Mirror the fresh-spawn path's timeout logging for debugging parity.
CodeRabbit nitpick on PR #1491.

Co-Authored-By: CC <noreply@anthropic.com>
2026-03-30 03:47:08 -07:00
Ryan Malia 83f61177c7 fix: address CodeRabbit review feedback on PR #1491
- Update POST_SPAWN_WAIT test assertion from 5000 to 15000 to match
  the constant change in hook-constants.ts
- Remove redundant readPidFile() from aggressiveStartupCleanup() —
  start() writes the new PID before this runs, so it always returns
  process.pid (already protected)
- Add waitForReadiness() to the reused-worker path in
  ensureWorkerStarted() to prevent concurrent hooks from racing
  past a cold-starting worker's initialization guard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 03:43:36 -07:00
Ryan Malia 88b47f9e9c fix: prevent worker daemon from being killed by its own hooks (#1490)
Three independent fixes for worker daemon instability:

1. Remove version mismatch auto-restart from ensureWorkerStarted() (#1435).
   The marketplace bundle ships with __DEFAULT_PACKAGE_VERSION__ unbaked,
   causing BUILT_IN_VERSION to fall back to "development". This creates a
   100% reproducible mismatch on every hook call, killing a healthy worker
   and often failing to restart. Same pattern across #566, #665, #667,
   #669, #689, #1124, #1145 (8+ releases).

2. Add process.ppid and PID-file PID to aggressiveStartupCleanup()
   exclusions (#1426). Without this, a newly spawned daemon SIGKILLs
   the hook process that spawned it and any already-running worker
   the PID file points to.

3. Increase POST_SPAWN_WAIT from 5s to 15s (#1423). The 5s timeout was
   sized for Linux (<1s startup) but macOS ARM64 cold starts take 6-8s
   with Chroma enabled.
2026-03-30 03:43:36 -07:00
Alex Newman 80d1deedbe fix: address PR review feedback from CodeRabbit
- Add sessionId to summarize.ts warning log for easier triage
- Add APPROVED OVERRIDE annotation to Windows spawn catch block

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:34:42 -07:00
Alex Newman 07ab7000a8 fix: patch 7 critical bugs affecting all non-dev-machine users and Windows
1. Fix esbuild inlining build-machine __dirname as string literal — use
   CJS-compatible runtime banner with require("node:url").fileURLToPath
   across worker-service, mcp-server, and context-generator builds.

2. Fix isMainModule check missing .cjs extension and Windows backslash
   path normalization.

3. Wrap extractLastMessage in try-catch to prevent infinite Stop hook
   feedback loop on malformed transcripts (exit 0 instead of exit 2).

4. Replace heavy SessionEnd hook (Node→Bun→1.7MB CJS→HTTP) with
   lightweight inline node -e one-liner (~200ms vs >1s).

5. Add 7 Gemini/OpenRouter error patterns to unrecoverablePatterns
   circuit breaker to prevent 77K+ retry loops on expired API keys.

6. Preserve CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_CODE_GIT_BASH_PATH in
   sanitizeEnv instead of stripping them with the CLAUDE_CODE_ prefix.

7. Use PowerShell -EncodedCommand for spawnDaemon to fix path quoting
   when Windows usernames contain spaces.

Closes #1515, #1495, #1475, #1465, #1500, #1513, #1512, #1450, #1460,
#1486, #1449, #1481, #1451, #1480, #1453, #1445

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:20:29 -07:00
Conductor 5621b67ccd Saving uncommitted changes before archiving 2026-03-26 19:35:27 -07:00
79475432@qq.com 9cfa57d498 fix: use null-byte delimiter in observation content hash to prevent collisions
Fields concatenated without separators allowed different tuples to produce
identical hashes (e.g. session="ab", title="cd" vs session="abc", title="d").
This could cause legitimate observations to be silently deduplicated.

Join with \x00 so field boundaries are unambiguous.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 17:16:17 +08:00
huakson 4f6fb9e614 fix: address platform source review feedback
Tighten platform source persistence so legacy callers cannot silently relabel existing sessions, repair migration 24 when schema_versions drifts from the real schema, and polish the follow-up UI/error-handler review nits.

- only backfill platform_source when it is blank and raise on explicit source conflicts for an existing session
- make migration 24 verify both the sdk_sessions column and its index before treating it as applied
- expose platform_source from the functional session getters and add regression tests for source preservation and schema drift recovery
- add the required APPROVED OVERRIDE annotation for centralized HTTP error translation
- keep mobile source pills on a single horizontal row
2026-03-24 10:46:48 -03:00
huakson 2b60dd2932 feat: isolate Claude and Codex session sources
Persist platform_source across session creation, transcript ingestion, API query paths, and viewer state so Claude and Codex data can coexist without bleeding into each other.

- add platform-source normalization helpers and persist platform_source in sdk_sessions via migration 24 with backfill and indexing
- thread platformSource through CLI hooks, transcript processing, context generation, pagination, search routes, SSE payloads, and session management
- expose source-aware project catalogs, viewer tabs, context preview selectors, and source badges for observations, prompts, and summaries
- start the transcript watcher from the worker for transcript-based clients and preserve platform source during Codex ingestion
- auto-start the worker from the MCP server for MCP-only clients and tighten stdio-driven cleanup during shutdown
- keep createSDKSession backward compatible with existing custom-title callers while allowing explicit platform source forwarding
2026-03-24 08:46:18 -03:00
Alex Newman 031513d723 feat: add Codex CLI, OpenClaw, and MCP-based IDE integrations
Codex CLI: transcript-based integration watching ~/.codex/sessions/,
schema bumped to v0.3 with exec_command support, AGENTS.md context.

OpenClaw: installer wires pre-built plugin to ~/.openclaw/extensions/,
registers in openclaw.json with memory slot and sync config.

MCP integrations (6 IDEs): Copilot CLI, Antigravity, Goose, Crush,
Roo Code, and Warp — config writing + context injection. Goose uses
string-based YAML manipulation (no parser dependency).

All 13 IDE targets now supported in npx claude-mem install.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 23:02:18 -07:00
Alex Newman f2cc33b494 feat: add Gemini CLI, OpenCode, and Windsurf IDE integrations
Gemini CLI: platform adapter mapping 6 of 11 hooks, settings.json
deep-merge installer, GEMINI.md context injection.

OpenCode: plugin with tool.execute.after interceptor, bus events for
session lifecycle, claude_mem_search custom tool, AGENTS.md context.

Windsurf: platform adapter for tool_info envelope format, hooks.json
installer for 5 post-action hooks, .windsurf/rules context injection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 23:02:18 -07:00
vnz df1fb8bb89 fix(gemini): add conversation history truncation to prevent O(N²) token cost growth
GeminiAgent sends the full conversation history with every API call,
causing quadratic token growth per session. A 100-observation session
sends ~30M cumulative input tokens. This ports the proven truncateHistory()
sliding window from OpenRouterAgent to GeminiAgent.

- Add CLAUDE_MEM_GEMINI_MAX_CONTEXT_MESSAGES (default: 20) and
  CLAUDE_MEM_GEMINI_MAX_TOKENS (default: 100000) settings
- Add truncateHistory() to GeminiAgent using shared estimateTokens()
- Always preserve at least the newest message to avoid empty API requests
- Add settings validation in SettingsRoutes (1-100 messages, 1K-1M tokens)
- Add regression tests for truncation and oversized single-prompt edge case

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 07:37:58 +01:00
Alex Newman 4d7bec4d05 fix: stop spinner from spinning forever (#1440)
* fix: stop spinner from spinning forever due to orphaned DB messages

The activity spinner never stopped because isAnySessionProcessing() queried
ALL pending/processing messages in the database, including orphaned messages
from dead sessions that no generator would ever process.

Root cause: isAnySessionProcessing() used hasAnyPendingWork() which is a
global DB scan. Changed it to use getTotalQueueDepth() which only checks
sessions in the active in-memory Map.

Additional fixes:
- Add terminateSession() to enforce restart-or-terminate invariant
- Fix 3 zombie paths in .finally() handler that left sessions alive
- Clean up idle sessions from memory on successful completion
- Remove redundant bare isProcessing:true broadcast
- Replace inline require() with proper accessor
- Add 8 regression tests for session termination invariant

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review findings — idle-timeout race, double broadcast, query amplification

- Move pendingCount check before idle-timeout termination to prevent
  abandoning fresh messages that arrive between idle abort and .finally()
- Move broadcastProcessingStatus() inside restart branch only — the else
  branch already broadcasts via removeSessionImmediate callback
- Compute queueDepth once in broadcastProcessingStatus() and derive
  isProcessing from it, eliminating redundant double iteration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 14:13:10 -07:00
Alex Newman 5b041d6b49 refactor: rename formatters to AgentFormatter/HumanFormatter for semantic clarity
ColorFormatter and MarkdownFormatter names obscured their actual purpose.
The formatters serve two distinct audiences: the AI agent (compressed,
token-efficient context) and the human (rich ANSI-colored terminal output).

- MarkdownFormatter → AgentFormatter (renderMarkdown* → renderAgent*)
- ColorFormatter → HumanFormatter (renderColor* → renderHuman*)
- useColors parameter → forHuman across the pipeline
- Import aliases Color/Markdown → Human/Agent
- API query param `colors=true` unchanged (backward compatible)

Pure rename refactor — no logic or behavior changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 11:50:41 -07:00
Alex Newman c80763390b feat: file-read decision gate — block reads when observation history exists
Add a PreToolUse gate that blocks file reads on first attempt when rich
observation history exists, presenting the timeline as feedback. Claude
then decides: use get_observations() (skip read, save tokens) or re-read
(allowed on second attempt).

- FileReadGate: in-memory session-scoped gate with 4h TTL
- POST /api/file-context/gate endpoint in worker
- stderrMessage plumbing in hook-command for exit code 2
- file-context handler uses gate to block/allow reads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 12:11:02 -07:00
Alex Newman e07b13f7de fix: proper project isolation and relative path matching for file-context hook
- Use getProjectContext(cwd).allProjects for project scoping (same as SessionStart)
- Convert absolute file_path to relative using cwd (observations store relative paths)
- API accepts comma-separated projects param with IN() SQL filter
- Remove basename matching — use full relative path to avoid cross-file collisions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 15:38:53 -07:00
Alex Newman fb9d917f8a feat: inject file observation timeline on PreToolUse Read hook
When Claude reads a file, the PreToolUse hook queries for existing
observations about that file and injects the timeline into context
via additionalContext + permissionDecision: allow. This prevents
duplicate observations and saves tokens through active rediscovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 15:18:54 -07:00
Alex Newman 7e07210635 feat: add timeline-report skill with token economics, compress context output 53%
## Summary
- New timeline-report skill for generating narrative project history reports
- Compressed markdown context output ~53% (tables → flat compact lines, verbose labels → terse format)
- Added `full=true` param to /api/context/inject for fetching all observations
- Split TimelineRenderer into separate markdown/color rendering paths
- Removed arbitrary file write vulnerability (dump_to_file param)
- Fixed timestamp ditto marker leaking across session summary boundaries

## Review
- Rebased on main (v10.6.0) to preserve OpenClaw system prompt injection
- Reviewed by /review (gstack) + /octo:review (Codex, Gemini, Claude fleet)
- Security fix (dump_to_file removal) confirmed by all 3 reviewers
- Timestamp bug caught by Codex, fixed

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-03-18 13:57:20 -07:00
Alex Newman 80a8c90a1a feat: add embedded Process Supervisor for unified process lifecycle (#1370)
* feat: add embedded Process Supervisor for unified process lifecycle management

Consolidates scattered process management (ProcessManager, GracefulShutdown,
HealthMonitor, ProcessRegistry) into a unified src/supervisor/ module.

New: ProcessRegistry with JSON persistence, env sanitizer (strips CLAUDECODE_*
vars), graceful shutdown cascade (SIGTERM → 5s wait → SIGKILL with tree-kill
on Windows), PID file liveness validation, and singleton Supervisor API.

Fixes #1352 (worker inherits CLAUDECODE env causing nested sessions)
Fixes #1356 (zombie TCP socket after Windows reboot)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add session-scoped process reaping to supervisor

Adds reapSession(sessionId) to ProcessRegistry for killing session-tagged
processes on session end. SessionManager.deleteSession() now triggers reaping.
Tightens orphan reaper interval from 60s to 30s.

Fixes #1351 (MCP server processes leak on session end)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Unix domain socket support for worker communication

Introduces socket-manager.ts for UDS-based worker communication, eliminating
port 37777 collisions between concurrent sessions. Worker listens on
~/.claude-mem/sockets/worker.sock by default with TCP fallback.

All hook handlers, MCP server, health checks, and admin commands updated to
use socket-aware workerHttpRequest(). Backwards compatible — settings can
force TCP mode via CLAUDE_MEM_WORKER_TRANSPORT=tcp.

Fixes #1346 (port 37777 collision across concurrent sessions)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove in-process worker fallback from hook command

Removes the fallback path where hook scripts started WorkerService in-process,
making the worker a grandchild of Claude Code (killed by sandbox). Hooks now
always delegate to ensureWorkerStarted() which spawns a fully detached daemon.

Fixes #1249 (grandchild process killed by sandbox)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add health checker and /api/admin/doctor endpoint

Adds 30-second periodic health sweep that prunes dead processes from the
supervisor registry and cleans stale socket files. Adds /api/admin/doctor
endpoint exposing supervisor state, process liveness, and environment health.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add comprehensive supervisor test suite

64 tests covering all supervisor modules: process registry (18 tests),
env sanitizer (8), shutdown cascade (10), socket manager (15), health
checker (5), and supervisor API (6). Includes persistence, isolation,
edge cases, and cross-module integration scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert Unix domain socket transport, restore TCP on port 37777

The socket-manager introduced UDS as default transport, but this broke
the HTTP server's TCP accessibility (viewer UI, curl, external monitoring).
Since there's only ever one worker process handling all sessions, the
port collision rationale for UDS doesn't apply. Reverts to TCP-only,
removing ~900 lines of unnecessary complexity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove dead code found in pre-landing review

Remove unused `acceptingSpawns` field from Supervisor class (written but
never read — assertCanSpawn uses stopPromise instead) and unused
`buildWorkerUrl` import from context handler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* updated gitignore

* fix: address PR review feedback - downgrade HTTP logging, clean up gitignore, harden supervisor

- Downgrade request/response HTTP logging from info to debug to reduce noise
- Remove unused getWorkerPort imports, use buildWorkerUrl helper
- Export ENV_PREFIXES/ENV_EXACT_MATCHES from env-sanitizer, reuse in Server.ts
- Fix isPidAlive(0) returning true (should be false)
- Add shutdownInitiated flag to prevent signal handler race condition
- Make validateWorkerPidFile testable with pidFilePath option
- Remove unused dataDir from ShutdownCascadeOptions
- Upgrade reapSession log from debug to warn
- Rename zombiePidFiles to deadProcessPids (returns actual PIDs)
- Clean up gitignore: remove duplicate datasets/, stale ~*/ and http*/ patterns
- Fix tests to use temp directories instead of relying on real PID file

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 14:49:23 -07:00
Vincent Leraitre 237a4c37f8 fix: always pass --ssl flag to chroma-mcp in remote mode (#1286)
* fix: always pass --ssl flag to chroma-mcp in remote mode

The chroma-mcp CLI defaults to SSL when using --client-type http.
When CLAUDE_MEM_CHROMA_SSL is false (the common case for local
ChromaDB servers), buildCommandArgs() omitted --ssl entirely,
causing chroma-mcp to attempt an SSL connection to a plain HTTP
server and fail with "Could not connect to a Chroma server".

Always pass --ssl with an explicit true/false value so the user's
CLAUDE_MEM_CHROMA_SSL setting is faithfully forwarded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add regression tests for ChromaMcpManager SSL flag fix

Adds 4 focused test cases verifying buildCommandArgs() produces correct
--ssl args, covering SSL=false, SSL=true, unset (defaults to false), and
local mode (no --ssl flag). Requested by @xkonjin in PR #1286 review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rebuild checked-in bundles to include SSL flag fix

Rebuild all bundles against upstream/main so the --ssl <true|false>
fix is present in the runtime artifacts that hooks and the marketplace
plugin actually execute.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:03:58 -07:00
laihenyi 626654f816 fix: prevent infinite restart loop on FOREIGN KEY constraint errors (#1334)
The pending-work-restart logic had no retry limit, causing infinite loops
when sessions encountered FOREIGN KEY constraint failures. This led to
2000+ error log entries per minute and eventual worker crash via SIGTERM.

Two fixes:
1. Add 'FOREIGN KEY constraint failed' to unrecoverable error patterns
   so it short-circuits immediately instead of falling through to restart
2. Add MAX_PENDING_RESTARTS (3) limit to pending-work-restart path as a
   safety net for any future unhandled persistent errors

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:03:48 -07:00
enzoricciulli e7ba9acaa7 fix: add content-hash dedup to batch observation store methods (#1302)
storeObservations() and storeObservationsAndMarkComplete() were missing
the content-hash deduplication that storeObservation() (singular) already
had via computeObservationContentHash() and findDuplicateObservation().

This caused the Gemini provider (and potentially others that return
multiple observations per response) to insert 2-10x duplicate rows per
tool use, since the batch methods inserted unconditionally without
checking content_hash.

The fix adds the same dedup pattern from storeObservation() to both
batch methods:
1. Compute content hash via computeObservationContentHash()
2. Check for existing observation within 30s window via findDuplicateObservation()
3. Skip insert and reuse existing ID if duplicate found
4. Include content_hash column in INSERT statement

Fixes #1158 (duplicate observations with Gemini provider)

Co-authored-by: Enzo Ricciulli <e.ricciulli@systhema.ai>
2026-03-12 20:01:53 -07:00
antmid ad902bedd9 fix: auto-repair malformed database schema from cross-version sync (#1308)
When a claude-mem DB is synced between machines running different versions,
orphaned indexes can reference non-existent columns (e.g. idx_observations_content_hash
referencing content_hash). This causes SQLite to throw "malformed database schema"
on ALL queries, including PRAGMAs, creating a silent 503 failure loop.

The fix detects this on startup, uses Python's sqlite3 module to drop the
orphaned schema objects (bun:sqlite doesn't support writable_schema modifications),
resets migration versions, and lets the idempotent migration system recreate
everything properly.

Fixes #1307

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:01:51 -07:00
Nir Alfasi 38d9ac7adb fix: prevent zombie subprocess accumulation by only trusting exitCode (#1226) (#1325)
proc.killed only means Node sent a signal — the process can still be alive.
This caused premature pool slot release, allowing unbounded process spawning.

- ensureProcessExit: remove proc.killed from early-exit checks, only trust exitCode
- Fix 3 call-site guards that skipped cleanup for signaled-but-alive processes
- Add TOTAL_PROCESS_HARD_CAP=10 safety net in waitForSlot()
- After SIGKILL, wait up to 1s via exit event instead of blind 200ms sleep
- Reduce reaper interval from 5min to 1min, idle threshold from 2min to 1min

Closes #1226

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 19:59:42 -07:00
Ben Younes 503bda4868 fix: add null guards for getChromaSync() when Chroma is disabled (#1336)
When CLAUDE_MEM_CHROMA_ENABLED=false, getChromaSync() returns null.
Two call sites were missing null guards, causing "null is not an object"
errors on every UserPromptSubmit / session init.

Fixes #1294

Vibe-coded by Ousama Ben Younes

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 19:58:03 -07:00
Umut Polat 73113321a1 fix: respect dateStart/dateEnd filters in Chroma search path (#1343)
When a search query includes dateStart/dateEnd parameters, the Chroma
semantic search path (PATH 2) ignored them entirely and only applied a
hardcoded 90-day recency window. This meant date-filtered searches
returned results from outside the requested range.

Now the Chroma path checks for a user-provided dateRange first. If
present, it filters results by the requested start/end bounds. The
90-day window is only used as the default when no date filter is
specified, preserving backward compatibility.

Fixes #1324

Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
2026-03-12 19:57:58 -07:00
Alex Newman 6581d2ef45 fix: unify mode type/concept loading to always use mode definition (#1316)
* fix: unify mode type/concept loading to always use mode definition

Code mode previously read observation types/concepts from settings.json
while non-code modes read from their mode JSON definition. This caused
stale filters to persist when switching between modes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: remove dead observation type/concept settings constants

CLAUDE_MEM_CONTEXT_OBSERVATION_TYPES and OBSERVATION_CONCEPTS are no
longer read by ContextConfigLoader since all modes now use their mode
definition. Removes the constants, defaults, UI controls, and the
now-empty observation-metadata.ts file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 03:00:20 -07:00
Alex Newman 0e502dbd21 feat: add smart-explore AST-based code navigation (#1244)
* feat: add smart-file-read module for token-optimized semantic code search

- Created package.json for the smart-file-read module with dependencies and scripts.
- Implemented parser.ts for code structure parsing using tree-sitter, supporting multiple languages.
- Developed search.ts for searching code files and symbols with grep-style and structural matching.
- Added test-run.mjs for testing search and outline functionalities.
- Configured TypeScript with tsconfig.json for strict type checking and module resolution.

* fix: update .gitignore to include _tree-sitter and remove unused subproject

* feat: add preliminary results and skill recommendation for smart-explore module

* chore: remove outdated plan.md file detailing session start hook issues

* feat: update Smart File Read integration plan and skill documentation for smart-explore

* feat: migrate Smart File Read to web-tree-sitter WASM for cross-platform compatibility

* refactor: switch to tree-sitter CLI for parsing and enhance search functionality

- Updated `parser.ts` to utilize the tree-sitter CLI for AST extraction instead of native bindings, improving compatibility and performance.
- Removed grammar loading logic and replaced it with a path resolution for grammar packages.
- Implemented batch parsing in `parseFilesBatch` to handle multiple files in a single CLI call, enhancing search speed.
- Refactored `searchCodebase` to collect files and parse them in batches, streamlining the search process.
- Adjusted symbol extraction logic to accommodate the new parsing method and ensure accurate symbol matching.

* feat: update Smart File Read integration plan to utilize tree-sitter CLI for improved performance and cross-platform compatibility

* feat: add smart-file-read parser and search to src/services

Copy validated tree-sitter CLI-based parser and search modules from
smart-file-read prototype into the claude-mem source tree for MCP
tool integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: register smart_search, smart_unfold, smart_outline MCP tools

Add 3 tree-sitter AST-based code exploration tools to the MCP server.
Direct execution (no HTTP delegation) — they call parser/search
functions directly for sub-second response times.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add tree-sitter CLI deps to build system and plugin runtime

Externalize tree-sitter packages in esbuild MCP server build. Add
10 grammar packages + CLI to plugin package.json for runtime install.
Remove unused @chroma-core/default-embed from plugin deps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: create smart-explore skill with 3-layer workflow docs

Progressive disclosure workflow: search -> outline -> unfold.
Documents all 3 MCP tools with parameters and token economics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add comprehensive documentation for the smart-explore feature

- Introduced a detailed technical reference covering the architecture, parser, search engine, and tool registration for the smart-explore feature in claude-mem.
- Documented the three-layer workflow: search, outline, and unfold, along with their respective MCP tools.
- Explained the parsing process using tree-sitter, including language support, query patterns, and symbol extraction.
- Outlined the search module's functionality, including file discovery, batch parsing, and relevance scoring.
- Provided insights into build system integration and token economics for efficient code exploration.

* chore: remove experiment artifacts, prototypes, and plan files

Remove A/B test docs, prototype smart-file-read directory, and
implementation plans. Keep only production code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: simplify hooks configuration and remove setup script

* fix: use execFileSync to prevent command injection in tree-sitter parser

Replaces execSync shell string with execFileSync + argument array,
eliminating shell interpretation of file paths. Also corrects
file_pattern description from "Glob pattern" to "Substring filter".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:00:26 -05:00
Alex Newman ad3d236cec fix: resolve hook crashes and CLAUDE_PLUGIN_ROOT fallback (#1215, #1220) (#1229)
* fix: resolve PostToolUse hook crashes and 5s latency (#1220)

Three compounding bugs caused hook failures:

1. Missing break statements in worker-service.ts switch — if async
   code threw before process.exit(), execution fell through to
   subsequent cases. Added break to all 7 cases missing them.

2. Unhandled promise rejection on main() — added .catch() that logs
   the error and exits 0 (per project exit code strategy: don't block
   Claude Code or leave Windows Terminal tabs open).

3. Redundant start commands in hooks.json — PostToolUse,
   UserPromptSubmit, and Stop groups each had a standalone start
   command that was redundant (the hook case already calls
   ensureWorkerStarted internally). The redundant start also caused
   5s latency via bun-runner.js collectStdin() timeout since Claude
   Code never closes stdin.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add CLAUDE_PLUGIN_ROOT fallback for Stop hooks (#1215)

Upstream Claude Code bug (anthropics/claude-code#24529) leaves
CLAUDE_PLUGIN_ROOT unset for Stop hooks on macOS and ALL hooks
on Linux. Two-layer defense:

1. Shell-level: hooks.json commands now use inline fallback
   _R="${CLAUDE_PLUGIN_ROOT}"; [ -z "$_R" ] && _R="$HOME/...";
   falling back to the known marketplace install path.

2. Script-level: bun-runner.js self-resolves plugin root from
   its own filesystem location via import.meta.url, and fixes
   broken /scripts/... paths that result from empty expansion.

Added test to verify all hook commands include the fallback path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 19:31:26 -05:00