e1ab73decc
* docs: add folder index generator plan RFC for auto-generating folder-level CLAUDE.md files with observation timelines. Includes IDE symlink support and root CLAUDE.md integration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: implement folder index generator (Phase 1) Add automatic CLAUDE.md generation for folders containing observed files. This enables IDE context providers to access relevant memory observations. Core modules: - FolderDiscovery: Extract folders from observation file paths - FolderTimelineCompiler: Compile chronological timeline per folder - ClaudeMdGenerator: Write CLAUDE.md with tag-based content replacement - FolderIndexOrchestrator: Coordinate regeneration on observation save Integration: - Event-driven regeneration after observation save in ResponseProcessor - HTTP endpoints for folder discovery, timeline, and manual generation - Settings for enabling/configuring folder index behavior The <claude-mem-context> tag wrapping ensures: - Manual CLAUDE.md content is preserved - Auto-generated content won't be recursively observed - Clean separation between user and system content 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: add updateFolderClaudeMd function to CursorHooksInstaller Adds function to update CLAUDE.md files for folders touched by observations. Uses existing /api/search/by-file endpoint, preserves content outside <claude-mem-context> tags, and writes atomically via temp file + rename. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: hook updateFolderClaudeMd into ResponseProcessor Calls updateFolderClaudeMd after observation save to update folder-level CLAUDE.md files. Uses fire-and-forget pattern with error logging. Extracts file paths from saved observations and workspace path from registry. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: add timeline formatting for folder CLAUDE.md files Implements formatTimelineForClaudeMd function that transforms API response into compact markdown table format. Converts emojis to text labels, handles ditto marks for timestamps, and groups under "Recent" header. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: remove old folder-index implementation Deletes redundant folder-index services that were replaced by the simpler updateFolderClaudeMd approach in CursorHooksInstaller.ts. Removed: - src/services/folder-index/ directory (5 files) - FolderIndexRoutes.ts - folder-index settings from SettingsDefaultsManager - folder-index route registration from worker-service 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: add worktree-aware project filtering for unified timelines Detect git worktrees and show both parent repo and worktree observations in the session start timeline. When running in a worktree, the context now includes observations from both projects, interleaved chronologically. - Add detectWorktree() utility to identify worktree directories - Add getProjectContext() to return parent + worktree projects - Update context hook to pass multi-project queries - Add queryObservationsMulti() and querySummariesMulti() for IN clauses - Maintain backward compatibility with single-project queries 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * fix: restructure logging to prove session correctness and reduce noise Add critical logging at each stage of the session lifecycle to prove the session ID chain (contentSessionId → sessionDbId → memorySessionId) stays aligned. 
New logs include CREATED, ENQUEUED, CLAIMED, MEMORY_ID_CAPTURED, STORING, and STORED. Move intermediate migration and backfill progress logs to DEBUG level to reduce noise, keeping only essential initialization and completion logs at INFO level. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * refactor: extract folder CLAUDE.md utils to shared location Moves folder CLAUDE.md utilities from CursorHooksInstaller to a new shared utils file. Removes Cursor registry dependency - file paths from observations are already absolute, no workspace lookup needed. New file: src/utils/claude-md-utils.ts - replaceTaggedContent() - preserves user content outside tags - writeClaudeMdToFolder() - atomic writes with tag preservation - formatTimelineForClaudeMd() - API response to compact markdown - updateFolderClaudeMdFiles() - orchestrates folder updates 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: trigger folder CLAUDE.md updates when observations are saved The folder CLAUDE.md update was previously only triggered in syncAndBroadcastSummary, but summaries run with observationCount=0 (observations are saved separately). Moved the update logic to syncAndBroadcastObservations where file paths are available. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * all the claudes * test: add unit tests for claude-md-utils pure functions Add 11 tests covering replaceTaggedContent and formatTimelineForClaudeMd: - replaceTaggedContent: empty content, tag replacement, appending, partial tags - formatTimelineForClaudeMd: empty input, parsing, ditto marks, session IDs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * test: add integration tests for file operation functions Add 9 tests for writeClaudeMdToFolder and updateFolderClaudeMdFiles: - writeClaudeMdToFolder: folder creation, content preservation, nested dirs, atomic writes - updateFolderClaudeMdFiles: empty skip, fetch/write, deduplication, error handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * test: add unit tests for timeline-formatting utilities Add 14 tests for extractFirstFile and groupByDate functions: - extractFirstFile: relative paths, fallback to files_read, null handling, invalid JSON - groupByDate: empty arrays, date grouping, chronological sorting, item preservation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * chore: rebuild plugin scripts with merged features * docs: add project-specific CLAUDE.md with architecture and development notes * fix: exclude project root from auto-generated CLAUDE.md updates Skip folders containing .git directory when auto-updating subfolder CLAUDE.md files. This ensures: 1. Root CLAUDE.md remains user-managed and untouched by the system 2. SessionStart context injection stays pristine throughout the session 3. Subfolder CLAUDE.md files continue to receive live context updates 4. 
Cleaner separation between user-authored root docs and auto-generated folder indexes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: prevent crash from resuming stale SDK sessions on worker restart When the worker restarts, it was incorrectly passing the `resume` parameter to INIT prompts (lastPromptNumber=1) when a memorySessionId existed from a previous SDK session. This caused "Claude Code process exited with code 1" crashes because the SDK tried to resume into a session that no longer exists. Root cause: The resume condition only checked `hasRealMemorySessionId` but did not verify that this was a CONTINUATION prompt (lastPromptNumber > 1). Fix: Add `session.lastPromptNumber > 1` check to the resume condition: - Before: `...(hasRealMemorySessionId && { resume: session.memorySessionId })` - After: `...(hasRealMemorySessionId && session.lastPromptNumber > 1 && { resume: ... })` Also added: - Enhanced debug logging that warns when skipping resume for INIT prompts - Unit tests in tests/sdk-agent-resume.test.ts (9 test cases) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: properly handle Chroma MCP connection errors Previously, ensureCollection() caught ALL errors from chroma_get_collection_info and assumed they meant "collection doesn't exist", triggering unnecessary collection creation attempts. Connection errors like "Not connected" or "MCP error -32000: Connection closed" would cascade into failed creation attempts. Similarly, queryChroma() would silently return empty results when the MCP call failed, masking the underlying connection problem. 
Changes: - ensureCollection(): Detect connection errors and re-throw immediately instead of attempting collection creation - queryChroma(): Wrap MCP call in try-catch and throw connection errors instead of returning empty results - Both methods reset connection state (connected=false, client=null) on connection errors so subsequent operations can attempt to reconnect 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * pushed * fix: scope regenerate-claude-md.ts to current working directory Critical bug fix: The script was querying ALL observations from the database across ALL projects ever recorded (1396+ folders), then attempting to write CLAUDE.md files everywhere including other projects, non-existent paths, and ignored directories. Changes: - Use git ls-files to discover folders (respects .gitignore automatically) - Filter database query to current project only (by folder name) - Use relative paths for database queries (matches storage format) - Add --clean flag to remove auto-generated CLAUDE.md files - Add fallback directory walker for non-git repos Now correctly scopes to 26 folders with observations instead of 1396+. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs and adjustments * fix: cleanup mode strips tags instead of deleting files blindly The cleanup mode was incorrectly deleting entire files that contained <claude-mem-context> tags. The correct behavior (per original design): 1. Strip the <claude-mem-context>...</claude-mem-context> section 2. If empty after stripping → delete the file 3. If has remaining content → save the stripped version Now properly preserves user content in CLAUDE.md files while removing only the auto-generated sections. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * deleted some files * chore: regenerate folder CLAUDE.md files with fixed script Regenerated 23 folder CLAUDE.md files using the corrected script that: - Scopes to current working directory only - Uses git ls-files to respect .gitignore - Filters by project name 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Update CLAUDE.md files for January 5, 2026 - Regenerated and staged 23 CLAUDE.md files with a mix of new and modified content. - Fixed cleanup mode to properly strip tags instead of deleting files blindly. - Cleaned up empty CLAUDE.md files from various directories, including ~/.claude and ~/Scripts. - Conducted dry-run cleanup that identified a significant reduction in auto-generated CLAUDE.md files. - Removed the isAutoGeneratedClaudeMd function due to incorrect file deletion behavior. * feat: use settings for observation limit in batch regeneration script Replace hard-coded limit of 10 with configurable CLAUDE_MEM_CONTEXT_OBSERVATIONS setting (default: 50). This allows users to control how many observations appear in folder CLAUDE.md files. Changes: - Import SettingsDefaultsManager and load settings at script startup - Use OBSERVATION_LIMIT constant derived from settings at both call sites - Remove stale default parameter from findObservationsByFolder function 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: use settings for observation limit in event-driven updates Replace hard-coded limit of 10 in updateFolderClaudeMdFiles with configurable CLAUDE_MEM_CONTEXT_OBSERVATIONS setting (default: 50). 
Changes: - Import SettingsDefaultsManager and os module - Load settings at function start (once, not in loop) - Use limit from settings in API call 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: Implement configurable observation limits and enhance search functionality - Added configurable observation limits to batch regeneration scripts. - Enhanced SearchManager to handle folder queries and normalize parameters. - Introduced methods to check for direct child files in observations and sessions. - Updated SearchOptions interface to include isFolder flag for filtering. - Improved code quality with comprehensive reviews and anti-pattern checks. - Cleaned up auto-generated CLAUDE.md files across various directories. - Documented recent changes and improvements in CLAUDE.md files. * build asset * Project Context from Claude-Mem auto-added (can be auto removed at any time) * CLAUDE.md updates * fix: resolve CLAUDE.md files to correct directory in worktree setups When using git worktrees, CLAUDE.md files were being written relative to the worker's process.cwd() instead of the actual project directory. This fix threads the project's cwd from message processing through to the file writing utilities, ensuring CLAUDE.md files are created in the correct project directory regardless of where the worker was started. 
Changes: - Add projectRoot parameter to updateFolderClaudeMdFiles for path resolution - Thread projectRoot through ResponseProcessor call chain - Track lastCwd from messages in SDKAgent, GeminiAgent, OpenRouterAgent - Add tests for relative/absolute path handling with projectRoot 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * more project context updates * context updates * fix: preserve actual dates in folder CLAUDE.md generation Previously, formatTimelineForClaudeMd used today's date for all observations because the API only returned time (e.g., "4:30 PM") without date information. This caused all historical observations to appear as if they happened today. Changes: - SearchManager.findByFile now groups results by date with headers (e.g., "### Jan 4, 2026") matching formatSearchResults behavior - formatTimelineForClaudeMd now parses these date headers and uses the correct date when constructing epochs for date grouping The timeline dates are critical for claude-mem context - LLMs need accurate temporal context to understand when work happened. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * build: update worker assets with date parsing fix 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * claude-mem context: Fixed critical date parsing bug in PR #556 * fix: address PR #556 review items - Use getWorkerHost() instead of hard-coded 127.0.0.1 in claude-md-utils - Add error message and stack details to FOLDER_INDEX logging - Add 5 new tests for worktree/projectRoot path resolution 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Refactor CLAUDE documentation across multiple components and tests - Updated CLAUDE.md files in src/ui/viewer, src/ui/viewer/constants, src/ui/viewer/hooks, tests/server, tests/worker/agents, and plans to reflect recent changes and improvements. - Removed outdated entries and consolidated recent activities for clarity. - Enhanced documentation for hooks, settings, and pagination implementations. - Streamlined test suite documentation for server and worker agents, indicating recent test audits and cleanup efforts. - Adjusted plans to remove obsolete entries and focus on current implementation strategies. * docs: comprehensive v9.0 documentation audit and updates - Add usage/folder-context to docs.json navigation (was documented but hidden!) 
- Update introduction.mdx with v9.0 release notes (Live Context, Worktree Support, Windows Fixes) - Add CLAUDE_MEM_WORKER_HOST setting to configuration.mdx - Add Folder Context Files section with link to detailed docs - Document worktree support in folder-context.mdx - Update terminology from "mem-search skill" to "MCP tools" throughout active docs - Update Search Pipeline in architecture/overview.mdx - Update usage/getting-started.mdx with MCP tools terminology - Update usage/claude-desktop.mdx title and terminology - Update hooks-architecture.mdx reference 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: add recent activity log for worker CLI with detailed entries * chore: update CLAUDE.md context files 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: add brainstorming report for CLAUDE.md distribution architecture --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
318 lines
12 KiB
Markdown
# Logging Analysis and Recommendations

**Date**: 2026-01-04
**Status**: CRITICAL - Current logging does not prove system correctness
**Goal**: Enable operators to visually verify the system is working and quickly discover when it isn't

---

## Executive Summary

The current logging is **noisy bullshit that doesn't cover the important parts of the system**. The logging should:

1. **PROVE** the system is working correctly (not just record activity)
2. **MAKE OBVIOUS** when things break (clear error paths)
3. **TRACE** data end-to-end through the pipeline

### Critical Finding: Session ID Alignment is BROKEN and UNVERIFIABLE

The system has **three session ID types** that must stay aligned:
- `contentSessionId` - from Claude Code (user's session)
- `sessionDbId` - our internal database ID (integer)
- `memorySessionId` - from Claude SDK (enables resume)

**The [ALIGNMENT] logs exist because this mapping is STILL a regression bug.** The current logs show intermediate values but **don't prove correctness**.

---

## Critical System Operations

### 1. Session ID Mapping Chain (MOST CRITICAL)

```
contentSessionId (from hook)
→ sessionDbId (our DB lookup)
→ memorySessionId (captured from SDK)
```

**If this breaks, observations go to wrong sessions = DATA CORRUPTION**

**Current State:**

| Operation | Has Logging? | Proves Correctness? |
|-----------|-------------|---------------------|
| Hook receives contentSessionId | YES | NO - just logs receipt |
| DB creates/looks up sessionDbId | PARTIAL | NO - no verification |
| SDK response gives memorySessionId | YES | NO - no DB update verification |
| Observations stored with memorySessionId | PARTIAL | NO - doesn't show which IDs used |

**What's MISSING:**

```
[INFO] [SESSION] SESSION_CREATED | contentSessionId=abc123 → sessionDbId=42 | isNew=true
[INFO] [SESSION] MEMORY_ID_CAPTURED | sessionDbId=42 | memorySessionId=xyz789 | dbUpdateSuccess=true
[INFO] [SESSION] E2E_VERIFIED | contentSessionId=abc123 → sessionDbId=42 → memorySessionId=xyz789
```

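
One way to make an `E2E_VERIFIED` line prove something is to derive it from a comparison against the database rather than from the values already held in memory. The sketch below is illustrative only: `SessionRow`, `verifyChain`, and the `E2E_MISMATCH` event name are hypothetical and are not taken from the codebase, and the caller is assumed to have just re-read the session row from the database.

```typescript
// Hypothetical shape of a session row re-read from the DB; field names are assumptions.
interface SessionRow {
  id: number; // sessionDbId
  contentSessionId: string;
  memorySessionId: string | null;
}

interface ChainCheck {
  valid: boolean;
  line: string; // message ready to hand to the logger
}

// Compares the IDs held in memory against the row read back from the database.
function verifyChain(
  contentSessionId: string,
  sessionDbId: number,
  memorySessionId: string | null,
  row: SessionRow | undefined,
): ChainCheck {
  const problems: string[] = [];
  if (!row) {
    problems.push("rowMissing");
  } else {
    if (row.id !== sessionDbId) problems.push("sessionDbIdMismatch");
    if (row.contentSessionId !== contentSessionId) problems.push("contentSessionIdMismatch");
    if (row.memorySessionId !== memorySessionId) problems.push("memorySessionIdMismatch");
  }
  // A chain without a memorySessionId is never considered verified.
  if (memorySessionId === null) problems.push("memorySessionIdNull");

  const chain =
    `contentSessionId=${contentSessionId} → sessionDbId=${sessionDbId}` +
    ` → memorySessionId=${memorySessionId ?? "null"}`;
  const valid = problems.length === 0;
  const line = valid
    ? `E2E_VERIFIED | ${chain}`
    : `E2E_MISMATCH | ${chain} | problems=${problems.join(",")}`;
  return { valid, line };
}
```

A caller would log `line` at INFO when `valid` is true and at ERROR otherwise, so a broken chain cannot produce a reassuring message.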
### 2. Observation Storage Pipeline (CRITICAL)

**The pipeline:**
```
Hook captures tool use
→ Worker receives observation
→ Queued to pending_messages
→ SDK agent claims message
→ SDK processes → generates XML
→ Observations parsed
→ Stored to DB with memorySessionId
→ Synced to Chroma
```

**Current State:**

| Operation | Has Logging? | Proves Correctness? |
|-----------|-------------|---------------------|
| Hook captures tool | YES | Noise - "Received hook input" |
| Observation queued | YES | Noise - just says "queued" |
| Message claimed from queue | NO | MISSING |
| Observation parsed | NO | MISSING |
| Observation stored to DB | PARTIAL | NO - doesn't show IDs used |
| DB transaction committed | NO | MISSING |
| Chroma sync complete | DEBUG only | Should be INFO for failures |

**What's MISSING:**

```
[INFO] [QUEUE] CLAIMED | sessionDbId=42 | messageId=5 | type=observation | tool=Bash(npm test)
[INFO] [DB   ] STORED | sessionDbId=42 | memorySessionId=xyz789 | observations=2 | ids=[101,102]
[INFO] [QUEUE] COMPLETED | sessionDbId=42 | messageId=5 | processingTime=1.2s
```

### 3. Queue Processing (CRITICAL)

Messages can fail, get stuck, or be lost. Current logging doesn't show:
- When a message is claimed
- When a message is completed
- When a message fails and WHY
- Queue depth and processing latency

**Current State:**
- Queue enqueue: `logger.debug` (not visible at INFO)
- Queue claim: NO LOGGING
- Queue completion: NO LOGGING
- Queue failure: `logger.error` (exists but rare)
- Recovery of stuck messages: `logger.info` (good)

**What's MISSING:**

```
[INFO] [QUEUE] ENQUEUE | sessionDbId=42 | type=observation | queueDepth=3
[INFO] [QUEUE] CLAIM | sessionDbId=42 | messageId=5 | waitTime=0.1s
[INFO] [QUEUE] COMPLETE | sessionDbId=42 | messageId=5 | success=true
[ERROR][QUEUE] FAILED | sessionDbId=42 | messageId=5 | error="SDK timeout" | willRetry=true
```

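
The four queue events above could be emitted from a thin wrapper around the queue operations. The following is a minimal in-memory sketch, assuming a hypothetical `LoggedQueue` class, a simplified `LogFn` logger signature, and an injectable clock. The real `PendingMessageStore` is SQLite-backed and its API is not shown in this document, so none of these names should be read as the actual implementation.

```typescript
// Simplified logger signature; the project's real logger API is an assumption here.
type LogFn = (level: "INFO" | "ERROR", component: string, message: string) => void;

interface PendingMessage {
  messageId: number;
  sessionDbId: number;
  type: "observation" | "summarize";
  enqueuedAt: number; // epoch milliseconds
}

class LoggedQueue {
  private pending: PendingMessage[] = [];
  private nextId = 1;
  private log: LogFn;
  private now: () => number;

  constructor(log: LogFn, now: () => number = () => Date.now()) {
    this.log = log;
    this.now = now;
  }

  enqueue(sessionDbId: number, type: PendingMessage["type"]): number {
    const messageId = this.nextId++;
    this.pending.push({ messageId, sessionDbId, type, enqueuedAt: this.now() });
    this.log("INFO", "QUEUE", `ENQUEUE | sessionDbId=${sessionDbId} | type=${type} | queueDepth=${this.pending.length}`);
    return messageId;
  }

  // Removes the oldest message and reports how long it waited in the queue.
  claim(): PendingMessage | undefined {
    const msg = this.pending.shift();
    if (!msg) return undefined;
    const waitTime = ((this.now() - msg.enqueuedAt) / 1000).toFixed(1);
    this.log("INFO", "QUEUE", `CLAIM | sessionDbId=${msg.sessionDbId} | messageId=${msg.messageId} | waitTime=${waitTime}s`);
    return msg;
  }

  complete(msg: PendingMessage): void {
    this.log("INFO", "QUEUE", `COMPLETE | sessionDbId=${msg.sessionDbId} | messageId=${msg.messageId} | success=true`);
  }

  // Logs the failure reason and re-queues the message when a retry is requested.
  fail(msg: PendingMessage, error: string, willRetry: boolean): void {
    if (willRetry) this.pending.push({ ...msg, enqueuedAt: this.now() });
    this.log("ERROR", "QUEUE", `FAILED | sessionDbId=${msg.sessionDbId} | messageId=${msg.messageId} | error="${error}" | willRetry=${willRetry}`);
  }
}
```

Passing the clock in as a parameter keeps `waitTime` deterministic in tests. Logging inside the queue methods, rather than at each call site, means a claim or completion cannot happen without leaving a trace.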
### 4. Context Injection (IMPORTANT)

When a session starts, relevant past observations should be injected. Current logging doesn't show:
- What context was searched for
- What was found
- What was injected

**Current State:** Effectively no logging for context injection success path.

---

## What's Currently NOISE (Should Be DEBUG or Removed)

### Chatty Session Init Logs (new-hook.ts)
```typescript
// 7 INFO logs for a single session init
logger.info('HOOK', 'new-hook: Received hook input'); // WHO CARES
logger.info('HOOK', 'new-hook: Calling /api/sessions/init'); // WHO CARES
logger.info('HOOK', 'new-hook: Received from /api/sessions/init'); // WHO CARES
logger.info('HOOK', 'new-hook: Session N, prompt #M'); // CONSOLIDATE INTO ONE
logger.info('HOOK', 'new-hook: Calling /sessions/{id}/init'); // WHO CARES
```

**Should be ONE log:** `SESSION_INIT | sessionDbId=42 | promptNumber=1 | project=foo`

### Chatty SessionManager Logs
```typescript
logger.info('SESSION', 'initializeSession called'); // WHO CARES
logger.info('SESSION', 'Returning cached session'); // DEBUG
logger.info('SESSION', 'Fetched session from database'); // DEBUG
logger.info('SESSION', 'Creating new session object'); // DEBUG
logger.info('SESSION', 'Session initialized'); // GOOD - KEEP
logger.info('SESSION', 'Observation queued'); // DEBUG - happens constantly
logger.info('SESSION', 'Summarize queued'); // DEBUG - happens constantly
```

### Chatty Chroma Backfill Logs
```typescript
// Logs EVERY batch at INFO - should be DEBUG for progress
logger.info('CHROMA_SYNC', 'Backfill progress', { processed, remaining }); // DEBUG
```

**Should be START and END only at INFO level.**

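
The START/END-only rule for backfill can be sketched as follows. This is an illustration, not the `ChromaSync.ts` implementation: `backfill`, `syncBatch`, and the `BackfillLogFn` signature are hypothetical names, and the loop is kept synchronous for brevity even though a real Chroma write would presumably be awaited.

```typescript
// Simplified logger signature; the project's real logger API is an assumption here.
type BackfillLogFn = (
  level: "DEBUG" | "INFO",
  component: string,
  message: string,
  data: Record<string, number>,
) => void;

// Processes `total` items in batches; `syncBatch` stands in for the real Chroma write.
function backfill(
  total: number,
  batchSize: number,
  syncBatch: (offset: number, count: number) => void,
  log: BackfillLogFn,
): void {
  if (batchSize <= 0) throw new RangeError("batchSize must be positive");

  // One INFO line when the backfill starts...
  log("INFO", "CHROMA_SYNC", "Backfill started", { total });

  let processed = 0;
  while (processed < total) {
    const count = Math.min(batchSize, total - processed);
    syncBatch(processed, count);
    processed += count;
    // ...per-batch progress stays at DEBUG so it does not flood INFO output...
    log("DEBUG", "CHROMA_SYNC", "Backfill progress", { processed, remaining: total - processed });
  }

  // ...and one INFO line when it finishes.
  log("INFO", "CHROMA_SYNC", "Backfill complete", { processed });
}
```

With this structure the number of INFO lines is constant (two) regardless of how many batches run.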
### Duplicate Migration Logs
Both `SessionStore.ts` and `migrations/runner.ts` have ~25 identical log statements. **DEDUPLICATE.**

---

## [ALIGNMENT] Logs: The Problem

The [ALIGNMENT] logs were added to debug session ID issues. They're in the RIGHT places but they **don't prove anything**:

```typescript
// Current - shows values but doesn't verify
logger.info('SDK', `[ALIGNMENT] Resume Decision | contentSessionId=${...} | memorySessionId=${...}`);

// What's needed - proves correctness
logger.info('SDK', `[ALIGNMENT] VERIFIED | contentSessionId=${...} → sessionDbId=${...} → memorySessionId=${...} | dbMatch=true | resumeValid=true`);
```

**Current problems:**
1. Log values without validation
2. Don't show if DB operations succeeded
3. Don't trace end-to-end
4. Mixed in with noise - hard to see

**What they should do:**
1. Log the mapping chain ONCE with verification
2. Show DB operation success/failure
3. Provide clear end-to-end trace
4. Stand out from noise with consistent prefix

---

## Proposed Logging Architecture

### Log Levels by Purpose

| Level | Purpose | Examples |
|-------|---------|----------|
| ERROR | Something FAILED | DB write failed, SDK crashed, queue overflow |
| WARN | Something UNEXPECTED but handled | Fallback used, retry needed, timeout |
| INFO | KEY OPERATIONS completed | Session created, observation stored, queue processed |
| DEBUG | Detailed tracing | Cache hits, intermediate states, parsing details |

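
To illustrate how the level table translates into filtering, here is a minimal sketch of a threshold check. `shouldLog`, `createLogger`, and the sink callback are assumptions made for illustration and do not describe the project's actual `logger` API.

```typescript
// Levels ordered from least to most severe.
const LEVELS = ["DEBUG", "INFO", "WARN", "ERROR"] as const;
type Level = (typeof LEVELS)[number];

// Returns true when a message at `level` should be emitted for the configured threshold.
function shouldLog(level: Level, threshold: Level): boolean {
  return LEVELS.indexOf(level) >= LEVELS.indexOf(threshold);
}

// Builds a logger that drops anything below `threshold` and hands the rest to `sink`.
function createLogger(threshold: Level, sink: (line: string) => void) {
  return (level: Level, component: string, message: string): void => {
    if (shouldLog(level, threshold)) {
      sink(`[${level}] [${component}] ${message}`);
    }
  };
}
```

Running at an `INFO` threshold, cache hits and intermediate states logged at `DEBUG` disappear from the output while key operations and failures remain visible.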
### Critical Path Logging (Must be INFO)

#### Session Lifecycle
```
[INFO] [SESSION] CREATED | contentSessionId=abc → sessionDbId=42 | project=foo
[INFO] [SESSION] MEMORY_ID_CAPTURED | sessionDbId=42 → memorySessionId=xyz | dbUpdated=true
[INFO] [SESSION] VERIFIED | chain: abc→42→xyz | valid=true
[INFO] [SESSION] COMPLETED | sessionDbId=42 | duration=45s | observations=12 | summaries=1
```

#### Observation Pipeline
```
[INFO] [QUEUE] ENQUEUED | sessionDbId=42 | type=observation | tool=Bash(npm test) | depth=1
[INFO] [QUEUE] CLAIMED | sessionDbId=42 | messageId=5 | waitTime=0.1s
[INFO] [DB   ] STORED | sessionDbId=42 | memorySessionId=xyz | obsIds=[101,102] | txnCommit=true
[INFO] [QUEUE] COMPLETED | sessionDbId=42 | messageId=5 | duration=1.2s
```

#### Error Conditions
```
[ERROR] [SESSION] MEMORY_ID_MISMATCH | expected=xyz | got=abc | sessionDbId=42
[ERROR] [DB     ] STORE_FAILED | sessionDbId=42 | error="FK constraint" | observations=2
[ERROR] [QUEUE  ] STUCK | sessionDbId=42 | stuckFor=5min | action=marking_failed
[ERROR] [SDK    ] CRASHED | sessionDbId=42 | error="Claude process died" | pendingWork=3
```

### Health Dashboard Output

After fixes, a healthy session should produce:
```
[INFO] [SESSION] CREATED | contentSessionId=abc → sessionDbId=42
[INFO] [SESSION] GENERATOR_STARTED | sessionDbId=42 | provider=claude-sdk
[INFO] [QUEUE  ] CLAIMED | sessionDbId=42 | messageId=1 | type=observation
[INFO] [SESSION] MEMORY_ID_CAPTURED | sessionDbId=42 → memorySessionId=xyz
[INFO] [DB     ] STORED | sessionDbId=42 | memorySessionId=xyz | obsIds=[1]
[INFO] [QUEUE  ] COMPLETED | sessionDbId=42 | messageId=1
... (more observations)
[INFO] [QUEUE  ] CLAIMED | sessionDbId=42 | messageId=5 | type=summarize
[INFO] [DB     ] STORED | sessionDbId=42 | summaryId=1
[INFO] [QUEUE  ] COMPLETED | sessionDbId=42 | messageId=5
[INFO] [SESSION] COMPLETED | sessionDbId=42 | duration=45s | observations=12
```

An UNHEALTHY session should make problems OBVIOUS:
```
[INFO]  [SESSION] CREATED | contentSessionId=abc → sessionDbId=42
[INFO]  [SESSION] GENERATOR_STARTED | sessionDbId=42 | provider=claude-sdk
[ERROR] [SESSION] MEMORY_ID_NOT_CAPTURED | sessionDbId=42 | waited=30s
[ERROR] [DB     ] STORE_FAILED | sessionDbId=42 | error="memorySessionId is null"
[WARN ] [QUEUE  ] STUCK | sessionDbId=42 | messageId=1 | age=60s | action=retry
[ERROR] [SESSION] GENERATOR_CRASHED | sessionDbId=42 | error="SDK timeout"
```

---

## Implementation Priorities

### P0: Fix Critical Missing Logs (Session Alignment)

1. **ResponseProcessor.ts** - Add logging BEFORE storeObservations:
   ```typescript
   logger.info('DB', 'STORING | sessionDbId=... | memorySessionId=... | count=...');
   ```

2. **SDKAgent.ts** - Verify DB update after memorySessionId capture:
   ```typescript
   const updated = store.updateMemorySessionId(sessionDbId, memorySessionId);
   logger.info('SESSION', `MEMORY_ID_CAPTURED | sessionDbId=${...} | memorySessionId=${...} | dbUpdated=${updated}`);
   ```

3. **SessionRoutes.ts** - Log session creation with verification:
   ```typescript
   logger.info('SESSION', `CREATED | contentSessionId=${...} → sessionDbId=${...} | verified=true`);
   ```

### P1: Fix Queue Processing Logs

1. **SessionQueueProcessor.ts** - Add CLAIM/COMPLETE logs
2. **PendingMessageStore.ts** - Add enqueue/dequeue logs

### P2: Reduce Noise

1. Move chatty logs to DEBUG level
2. Deduplicate migration logs
3. Consolidate hook init logs

### P3: Add Health Validation

1. Periodic verification log: `[INFO] [HEALTH] OK | sessions=3 | pending=0 | chroma=connected`
2. On-demand chain verification: `[INFO] [VERIFY] contentSessionId=abc chain is VALID`

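
The periodic health line in P3 could be produced by a small pure formatter that a timer calls at a fixed interval. The sketch below is an assumption-heavy illustration: `HealthSnapshot`, `formatHealth`, the `DEGRADED` status, and the `maxPending` threshold are all hypothetical and are not part of the proposal above. Only the shape of the healthy line follows the example in item 1.

```typescript
// Snapshot fields are assumptions chosen to match the example health line in P3.
interface HealthSnapshot {
  sessions: number; // active sessions
  pending: number; // messages waiting in the queue
  chromaConnected: boolean;
}

interface HealthReport {
  level: "INFO" | "WARN";
  line: string;
}

// Formats one health line; anything unhealthy is raised to WARN so it stands out.
// The DEGRADED status and the maxPending default are hypothetical.
function formatHealth(snapshot: HealthSnapshot, maxPending = 100): HealthReport {
  const healthy = snapshot.chromaConnected && snapshot.pending <= maxPending;
  const level = healthy ? "INFO" : "WARN";
  const status = healthy ? "OK" : "DEGRADED";
  const chroma = snapshot.chromaConnected ? "connected" : "disconnected";
  return {
    level,
    line: `[${level}] [HEALTH] ${status} | sessions=${snapshot.sessions} | pending=${snapshot.pending} | chroma=${chroma}`,
  };
}
```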
---

## Files Requiring Changes

| File | Priority | Changes |
|------|----------|---------|
| `src/services/worker/agents/ResponseProcessor.ts` | P0 | Add pre-store logging with IDs |
| `src/services/worker/SDKAgent.ts` | P0 | Verify DB update, consolidate ALIGNMENT logs |
| `src/services/worker/http/routes/SessionRoutes.ts` | P0 | Add session creation verification log |
| `src/services/queue/SessionQueueProcessor.ts` | P1 | Add CLAIM/COMPLETE logs |
| `src/services/sqlite/PendingMessageStore.ts` | P1 | Add enqueue/dequeue logs |
| `src/services/worker/SessionManager.ts` | P2 | Move chatty logs to DEBUG |
| `src/hooks/new-hook.ts` | P2 | Consolidate to single INFO log |
| `src/services/sync/ChromaSync.ts` | P2 | Move progress to DEBUG, keep start/end INFO |
| `src/services/sqlite/SessionStore.ts` | P2 | Remove duplicate migration logs |

---

## Verification Checklist

After implementing changes, verify:

- [ ] Can trace contentSessionId → sessionDbId → memorySessionId in logs
- [ ] Can see when observation storage succeeds/fails
- [ ] Can see queue claim/complete for each message
- [ ] Errors are OBVIOUS and include context for debugging
- [ ] Noise is reduced to the point where INFO level is useful
- [ ] A "normal" session produces ~10-15 INFO logs, not 50+

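
The last checklist item can be checked mechanically against captured log output. The sketch below assumes log lines begin with `[INFO]` and carry a `sessionDbId=<number>` field, as in the examples in this document. `countInfoLogsPerSession`, `noisySessions`, and the default budget of 15 are hypothetical helpers for illustration, not existing tooling.

```typescript
// Counts INFO-level lines per sessionDbId in captured log text.
function countInfoLogsPerSession(logText: string): Map<number, number> {
  const counts = new Map<number, number>();
  for (const rawLine of logText.split("\n")) {
    const line = rawLine.trim();
    if (line.indexOf("[INFO]") !== 0) continue; // only INFO lines count toward the budget
    const match = /sessionDbId=(\d+)/.exec(line);
    if (!match) continue; // lines without a session ID are ignored
    const id = Number(match[1]);
    counts.set(id, (counts.get(id) ?? 0) + 1);
  }
  return counts;
}

// Returns the sessions whose INFO volume exceeds the budget (15 by default).
function noisySessions(logText: string, budget = 15): number[] {
  const noisy: number[] = [];
  countInfoLogsPerSession(logText).forEach((count, id) => {
    if (count > budget) noisy.push(id);
  });
  return noisy;
}
```

Feeding one session's worth of worker output through `noisySessions` gives a quick pass/fail signal for the "~10-15 INFO logs" target. An empty result means every session stayed within budget.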