Alex Newman 417acb0f81 fix: comprehensive error handling improvements and architecture documentation (#522)

* Add enforceable anti-pattern detection for try-catch abuse

PROBLEM:
- Overly-broad try-catch blocks waste 10+ hours of debugging time
- Empty catch blocks silently swallow errors
- AI assistants use try-catch to paper over uncertainty instead of doing research

SOLUTION:
1. Created detect-error-handling-antipatterns.ts test
   - Detects empty catch blocks (45 CRITICAL found)
   - Detects catch without logging (45 CRITICAL total)
   - Detects large try blocks (>10 lines)
   - Detects generic catch without type checking
   - Detects catch-and-continue on critical paths
   - Exit code 1 if critical issues found

2. Updated CLAUDE.md with MANDATORY ERROR HANDLING RULES
   - 5-question pre-flight checklist before any try-catch
   - FORBIDDEN patterns with examples
   - ALLOWED patterns with examples
   - Meta-rule: UNCERTAINTY TRIGGERS RESEARCH, NOT TRY-CATCH
   - Critical path protection list

3. Created comprehensive try-catch audit report
   - Documents all 96 try-catch blocks in worker service
   - Identifies critical issue at worker-service.ts:748-750
   - Categorizes patterns and provides recommendations

This is enforceable via test, not just instructions that can be ignored.

Current state: 163 anti-patterns detected (45 critical, 47 high, 71 medium)
Next: Fix critical issues identified by test

🤖 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add logging to 5 critical empty catch blocks (Wave 1)

Wave 1 of error handling cleanup - fixing empty catch blocks that
silently swallow errors without any trace.

Fixed files:
- src/bin/import-xml-observations.ts:80 - Log skipped invalid JSON
- src/utils/bun-path.ts:33 - Log when bun not in PATH
- src/utils/cursor-utils.ts:44 - Log failed registry reads
- src/utils/cursor-utils.ts:149 - Log corrupt MCP config
- src/shared/worker-utils.ts:128 - Log failed health checks

All catch blocks now have proper logging with context and error details.

Progress: 41 → 39 CRITICAL issues remaining

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add logging to promise catches on critical paths (Wave 2)

Wave 2 of error handling cleanup - fixing empty promise catch handlers
that silently swallow errors on critical code paths. These are the
patterns that caused the 10-hour debugging session.

Fixed empty promise catches:
- worker-service.ts:642 - Background initialization failures
- SDKAgent.ts:372,446 - Session processor errors
- GeminiAgent.ts:408,475 - Finalization failures
- OpenRouterAgent.ts:451,518 - Finalization failures
- SessionManager.ts:289 - Generator promise failures

Added justification comments to catch-and-continue blocks:
- worker-service.ts:68 - PID file removal (cleanup, non-critical)
- worker-service.ts:130 - Cursor context update (non-critical)

All promise rejection handlers now log errors with context, preventing
silent failures that were nearly impossible to debug.

Note: The anti-pattern detector only tracks try-catch blocks, not
standalone promise chains. These fixes address the root cause of the
original 10-hour debugging session even though the detector count
remains unchanged.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add logging and documentation to error handling patterns (Wave 3)

Wave 3 of error handling cleanup - comprehensive review and fixes for
remaining critical issues identified by the anti-pattern detector.

Changes organized by severity:

**Wave 3.1: Fixed 2 EMPTY_CATCH blocks**
- worker-service.ts:162 - Health check polling now logs failures
- worker-service.ts:610 - Process cleanup logs failures

**Wave 3.2: Reviewed 12 CATCH_AND_CONTINUE patterns**
- Verified all are correct (log errors AND exit/return HTTP errors)
- Added justification comment to session recovery (line 829)
- All patterns properly notify callers of failures

**Wave 3.3: Fixed 29 NO_LOGGING_IN_CATCH issues**

Added logging to 16 catch blocks:
- UI layer: useSettings.ts, useContextPreview.ts (console logging)
- Servers: mcp-server.ts health checks and tool execution
- Worker: version fetch, cleanup, config corruption
- Routes: error handler, session recovery, settings validation
- Services: branch checkout, timeline queries

Documented 13 intentional exceptions with comments explaining why:
- Hot paths (port checks, process checks in tight loops)
- Error accumulation (transcript parser collects for batch retrieval)
- Special cases (logger can't log its own failures)
- Fallback parsing (JSON parse in optional data structures)

All changes follow error handling guidelines from CLAUDE.md:
- Appropriate log levels (error/warn/debug)
- Context objects with relevant details
- Descriptive messages explaining failures
- Error extraction pattern for Error instances

Progress: 41 → 29 detector warnings
Remaining warnings are conservative flags on verified-correct patterns
(catch-and-continue blocks that properly log + notify callers).

Build verified successful. All error handling now provides visibility
for debugging while avoiding excessive logging on hot paths.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: add queue:clear command to remove failed messages

Added functionality to clear failed messages from the observation queue:

**Changes:**
- PendingMessageStore: Added clearFailed() method to delete failed messages
- DataRoutes: Added DELETE /api/pending-queue/failed endpoint
- CLI: Created scripts/clear-failed-queue.ts for interactive queue clearing
- package.json: Added npm run queue:clear script

**Usage:**
  npm run queue:clear          # Interactive - prompts for confirmation
  npm run queue:clear -- --force  # Non-interactive - clears without prompt

Failed messages are observations that exceeded max retry count. They
remain in the queue for debugging but won't be processed. This command
removes them to clean up the queue.

Works alongside existing queue:check and queue:process commands to
provide complete queue management capabilities.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: add --all flag to queue:clear for complete queue reset

Extended queue clearing functionality to support clearing all messages,
not just failed ones.

**Changes:**
- PendingMessageStore: Added clearAll() method to clear pending, processing, and failed
- DataRoutes: Added DELETE /api/pending-queue/all endpoint
- clear-failed-queue.ts: Added --all flag to clear everything
- Updated help text and UI to distinguish between failed-only and all-clear modes

**Usage:**
  npm run queue:clear              # Clear failed only (interactive)
  npm run queue:clear -- --all     # Clear ALL messages (interactive)
  npm run queue:clear -- --all --force  # Clear all without confirmation

The --all flag provides a complete queue reset, removing pending,
processing, and failed messages. Useful when you want a fresh start
or need to cancel stuck sessions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: add comprehensive documentation for session ID architecture and validation tests

* feat: add logs viewer with clear functionality to UI

- Add LogsRoutes API endpoint for fetching and clearing worker logs
- Create LogsModal component with auto-refresh and clear button
- Integrate logs viewer button into Header component
- Add comprehensive CSS styling for logs modal
- Logs accessible via new document icon button in header

Logs viewer features:
- Display last 1000 lines of current day's log file
- Auto-refresh toggle (2s interval)
- Clear logs button with confirmation
- Monospace font for readable log output
- Responsive modal design matching existing UI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: redesign logs as Chrome DevTools-style console drawer

Major UX improvements to match Chrome DevTools console:
- Convert from modal to bottom drawer that slides up
- Move toggle button to bottom-left corner (floating button)
- Add draggable resize handle for height adjustment
- Use plain monospace font (SF Mono/Monaco/Consolas) instead of Monaspace
- Simplify controls with icon-only buttons
- Add Console tab UI matching DevTools aesthetic

Changes:
- Renamed LogsModal to LogsDrawer with drawer implementation
- Added resize functionality with mouse drag
- Removed logs button from header
- Added floating console toggle button in bottom-left
- Updated all CSS to match Chrome console styling
- Minimum height: 150px, maximum: window height - 100px

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: suppress /api/logs endpoint logging to reduce noise

Skip logging GET /api/logs requests in HTTP middleware to prevent
log spam from auto-refresh polling (every 2s). Keeps the auto-refresh
feature functional while eliminating the repetitive log entries.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: enhance error handling guidelines with approved overrides for justified exceptions

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-01-01 23:38:22 -05:00

7.8 KiB


/* To @claude: be vigilant about leaving only evergreen context in this file; claude-mem handles working context separately. */

⚠️ MANDATORY ERROR HANDLING RULES ⚠️

The Try-Catch Problem That Cost 10 Hours

A single overly-broad try-catch block wasted 10 hours of debugging time by silently swallowing errors. This pattern is BANNED.

BEFORE You Write Any Try-Catch

RUN THIS TEST FIRST:

bun run scripts/detect-error-handling-antipatterns.ts

You MUST answer these 5 questions to the user BEFORE writing try-catch:

1. What SPECIFIC error am I catching? (Name the error type: FileNotFoundError, NetworkTimeout, ValidationError)
2. Show documentation proving this error can occur (Link to docs or show me the source code)
3. Why can't this error be prevented? (If it can be prevented, prevent it instead)
4. What will the catch block DO? (Must include logging + either rethrow OR explicit fallback)
5. Why shouldn't this error propagate? (Justify swallowing it rather than letting caller handle)

If you cannot answer ALL 5 questions with specifics, DO NOT write the try-catch.

FORBIDDEN PATTERNS (Zero Tolerance)

🔴 CRITICAL - Never Allowed

// ❌ FORBIDDEN: Empty catch
try {
  doSomething();
} catch {}

// ❌ FORBIDDEN: Catch without logging
try {
  doSomething();
} catch (error) {
  return null;  // Silent failure!
}

// ❌ FORBIDDEN: Large try blocks (>10 lines)
try {
  // 50 lines of code
  // Multiple operations
  // Different failure modes
} catch (error) {
  logger.error('Something failed');  // Which thing?!
}

// ❌ FORBIDDEN: Promise empty catch
promise.catch(() => {});  // Error disappears into void

// ❌ FORBIDDEN: Try-catch to fix TypeScript errors
try {
  // @ts-ignore
  const value = response.propertyThatDoesntExist;
} catch {}

✅ ALLOWED Patterns

// ✅ GOOD: Specific, logged, explicit handling
try {
  await fetch(url);
} catch (error) {
  if (error instanceof NetworkError) {
    logger.warn('SYNC', 'Network request failed, will retry', { url }, error);
    return null;  // Explicit: null means "fetch failed"
  }
  throw error;  // Unexpected errors propagate
}

// ✅ GOOD: Minimal scope, clear recovery
try {
  JSON.parse(data);
} catch (error) {
  logger.error('CONFIG', 'Corrupt settings file, using defaults', {}, error);
  return DEFAULT_SETTINGS;
}

// ✅ GOOD: Fire-and-forget with logging
backgroundTask()
  .catch(error => logger.warn('BACKGROUND', 'Task failed', {}, error));

// ✅ GOOD: Approved override for justified exceptions
try {
  JSON.parse(optionalField);
} catch (error) {
  // [APPROVED OVERRIDE]: Expected JSON parse failures for optional fields, too frequent to log
  return [];
}

Approved Overrides

When you have a justified reason to violate the error handling rules (e.g., performance-critical hot paths, expected frequent failures), you can use an approved override:

// [APPROVED OVERRIDE]: Brief explanation of why this is necessary

Rules for approved overrides:

- Must have a specific, technical reason (not "seemed fine" or "works for me")
- Reason must explain why the violation is necessary, not just what it does
- Examples of valid reasons:
  - "Expected JSON parse failures for optional fields, too frequent to log"
  - "Logger can't log its own failures, using stderr as last resort"
  - "Health check port scan, expected connection failures"
- The detector will flag these as APPROVED_OVERRIDE (warning level) for review
- Invalid or outdated reasons should be challenged during code review
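
The comment format above could be checked mechanically along these lines. This is a minimal sketch, not the logic of detect-error-handling-antipatterns.ts: the function name `checkApprovedOverride`, the `OverrideCheck` shape, and the minimum reason length are all assumptions made for illustration, and a length check is only a crude stand-in for "specific, technical reason".

```typescript
// Hypothetical helper, NOT the project's detector: decide whether a catch
// block body carries a usable [APPROVED OVERRIDE] comment.
interface OverrideCheck {
  approved: boolean;
  reason: string | null;
}

// Captures everything after the marker up to the end of that line.
const OVERRIDE_PATTERN = /\/\/\s*\[APPROVED OVERRIDE\]:\s*(.*)/;

// Arbitrary assumption: reject empty or token reasons such as "works".
const MIN_REASON_LENGTH = 15;

function checkApprovedOverride(catchBody: string): OverrideCheck {
  const match = OVERRIDE_PATTERN.exec(catchBody);
  if (match === null) {
    // No marker at all: the normal error handling rules apply.
    return { approved: false, reason: null };
  }
  const reason = (match[1] ?? "").trim();
  // A marker with a too-short reason is reported but not approved.
  return { approved: reason.length >= MIN_REASON_LENGTH, reason };
}
```

A heuristic like this can only catch missing or trivially short reasons; whether a reason is actually valid still has to be judged in code review.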

The Meta-Rule

UNCERTAINTY TRIGGERS RESEARCH, NOT TRY-CATCH

When you're unsure if a property exists or a method signature is correct:

1. READ the source code or documentation
2. VERIFY with the Read tool
3. USE TypeScript types to catch errors at compile time
4. WRITE code you KNOW is correct

Never use try-catch to paper over uncertainty. That wastes hours of debugging time later.
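
As one illustration of using types instead of try-catch, the sketch below validates an unknown value with a type guard and fails loudly when the shape is wrong. The `SessionResponse` shape and both function names are hypothetical and are not taken from the codebase.

```typescript
// Hypothetical response shape, for illustration only.
interface SessionResponse {
  sessionId: string;
  summary?: string;
}

// Type guard: narrows unknown input to SessionResponse without any try-catch.
function isSessionResponse(value: unknown): value is SessionResponse {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  const summary = candidate["summary"];
  return (
    typeof candidate["sessionId"] === "string" &&
    (summary === undefined || typeof summary === "string")
  );
}

function describeSession(value: unknown): string {
  if (!isSessionResponse(value)) {
    // Explicit, visible failure instead of a swallowed runtime error.
    throw new TypeError("Response is not a SessionResponse");
  }
  // The compiler knows `summary` is optional, so a fallback is required here.
  return `${value.sessionId}: ${value.summary ?? "(no summary)"}`;
}
```

Accessing a property that is not declared on `SessionResponse` after the guard is a compile-time error, which is the point of the rule: the mistake surfaces before the code runs rather than inside a catch block.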

Critical Path Protection

These files are NEVER allowed to have catch-and-continue:

- SDKAgent.ts - Errors must propagate, not hide
- GeminiAgent.ts - Must fail loud, not silent
- OpenRouterAgent.ts - Must fail loud, not silent
- SessionStore.ts - Database errors must propagate
- worker-service.ts - Core service errors must be visible

On critical paths, prefer NO TRY-CATCH and let errors propagate naturally.
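
A lint step or review script could recognize these files with a check like the one below. It is a sketch under assumptions: the file names come from the list above, but `isCriticalPath` and its match-by-file-name logic are hypothetical and may not reflect how the project's detector identifies critical paths.

```typescript
// File names copied from the list above; the matching logic is an assumption.
const CRITICAL_PATH_FILES: readonly string[] = [
  "SDKAgent.ts",
  "GeminiAgent.ts",
  "OpenRouterAgent.ts",
  "SessionStore.ts",
  "worker-service.ts",
];

function isCriticalPath(filePath: string): boolean {
  // Normalize Windows separators, then compare only the final path segment.
  const segments = filePath.replace(/\\/g, "/").split("/");
  const fileName = segments[segments.length - 1] ?? "";
  return CRITICAL_PATH_FILES.includes(fileName);
}
```

Matching on the file name alone keeps the check independent of where the repository is checked out, at the cost of also matching an unrelated file that happens to share one of these names.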

Claude-Mem: AI Development Instructions

What This Project Is

Claude-mem is a Claude Code plugin providing persistent memory across sessions. It captures tool usage, compresses observations using the Claude Agent SDK, and injects relevant context into future sessions.

Architecture

5 Lifecycle Hooks: SessionStart → UserPromptSubmit → PostToolUse → Summary → SessionEnd

Hooks (src/hooks/*.ts) - TypeScript → ESM, built to plugin/scripts/*-hook.js

Worker Service (src/services/worker-service.ts) - Express API on port 37777, Bun-managed, handles AI processing asynchronously

Database (src/services/sqlite/) - SQLite3 at ~/.claude-mem/claude-mem.db

Search Skill (plugin/skills/mem-search/SKILL.md) - HTTP API for searching past work, auto-invoked when users ask about history

Chroma (src/services/sync/ChromaSync.ts) - Vector embeddings for semantic search

Viewer UI (src/ui/viewer/) - React interface at http://localhost:37777, built to plugin/ui/viewer.html

Privacy Tags

Dual-Tag System for meta-observation control:

- <private>content</private> - User-level privacy control (manual, prevents storage)
- <claude-mem-context>content</claude-mem-context> - System-level tag (auto-injected observations, prevents recursive storage)

Implementation: Tag stripping happens at hook layer (edge processing) before data reaches worker/database. See src/utils/tag-stripping.ts for shared utilities.
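
Tag stripping of this kind can be sketched as follows. The shared utilities in src/utils/tag-stripping.ts may work differently; `stripMemoryTags` is a hypothetical name, and handling of nested or unclosed tags is deliberately left out.

```typescript
// Hypothetical sketch; the real utilities live in src/utils/tag-stripping.ts
// and may behave differently.
const STRIPPED_TAGS = ["private", "claude-mem-context"] as const;

function stripMemoryTags(text: string): string {
  let result = text;
  for (const tag of STRIPPED_TAGS) {
    // [\s\S]*? matches across newlines, non-greedily, up to the first closing tag.
    const pattern = new RegExp(`<${tag}>[\\s\\S]*?</${tag}>`, "g");
    result = result.replace(pattern, "");
  }
  // Unclosed tags are left untouched; only complete pairs are removed.
  return result.trim();
}
```

Running this at the hook layer means the worker and database never see the tagged content, which matches the edge-processing approach described above.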

Build Commands

npm run build-and-sync        # Build, sync to marketplace, restart worker

Configuration

Settings are managed in ~/.claude-mem/settings.json. The file is auto-created with defaults on first run.
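
Loading settings with a fallback to defaults might look like the sketch below, which also follows the ALLOWED pattern from the error handling rules (minimal try scope, logging, explicit fallback). The `Settings` keys, `DEFAULT_SETTINGS` values other than the port, and `parseSettings` are assumptions for illustration, and `console.error` stands in for the project's logger.

```typescript
// Hypothetical settings shape; the real keys in settings.json may differ.
interface Settings {
  workerPort: number;
  logLevel: string;
}

const DEFAULT_SETTINGS: Settings = { workerPort: 37777, logLevel: "info" };

// `raw` is the file content, or null when the file does not exist yet.
function parseSettings(raw: string | null): Settings {
  if (raw === null) {
    return { ...DEFAULT_SETTINGS }; // first run: nothing to parse
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw); // the only statement that can throw here
  } catch (error) {
    if (error instanceof SyntaxError) {
      console.error("CONFIG: corrupt settings file, using defaults:", error.message);
      return { ...DEFAULT_SETTINGS };
    }
    throw error; // unexpected errors propagate
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    console.error("CONFIG: settings file is not a JSON object, using defaults");
    return { ...DEFAULT_SETTINGS };
  }
  // Stored values override defaults; missing keys keep their default.
  // Value types are not validated in this sketch.
  return { ...DEFAULT_SETTINGS, ...(parsed as Partial<Settings>) };
}
```

Keeping the parser a pure function of the file content leaves file reading, and creation of the file on first run, to the caller.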

File Locations

Source: <project-root>/src/
Built Plugin: <project-root>/plugin/
Installed Plugin: ~/.claude/plugins/marketplaces/thedotmack/
Database: ~/.claude-mem/claude-mem.db
Chroma: ~/.claude-mem/chroma/

Requirements

Bun (all platforms - auto-installed if missing)
uv (all platforms - auto-installed if missing, provides Python for Chroma)
Node.js

Documentation

Public Docs: https://docs.claude-mem.ai (Mintlify)
Source: docs/public/ - MDX files, edit docs.json for navigation
Deploy: Auto-deploys from GitHub on push to main

Pro Features Architecture

Claude-mem is designed with a clean separation between open-source core functionality and optional Pro features.

Open-Source Core (this repository):

- All worker API endpoints on localhost:37777 remain fully open and accessible
- Pro features are headless - no proprietary UI elements in this codebase
- Pro integration points are minimal: settings for license keys, tunnel provisioning logic
- The architecture ensures Pro features extend rather than replace core functionality

Pro Features (coming soon, external):

- Enhanced UI (Memory Stream) connects to the same localhost:37777 endpoints as the open viewer
- Additional features like advanced filtering, timeline scrubbing, and search tools
- Access gated by license validation, not by modifying core endpoints
- Users without Pro licenses continue using the full open-source viewer UI without limitation

This architecture preserves the open-source nature of the project while enabling sustainable development through optional paid features.

Important

Never edit the changelog by hand; it is generated automatically.
