Files
claude-mem/.claude/reports/test-audit-2026-01-05.md
T
Alex Newman f38b5b85bc fix: resolve issues #543, #544, #545, #557 (#558)
* docs: add investigation reports for 5 open GitHub issues

Comprehensive analysis of issues #543, #544, #545, #555, and #557:

- #557: settings.json not generated, module loader error (node/bun mismatch)
- #555: Windows hooks not executing, hasIpc always false
- #545: formatTool crashes on non-JSON tool_input strings
- #544: mem-search skill hint shown incorrectly to Claude Code users
- #543: /claude-mem slash command unavailable despite installation

Each report includes root cause analysis, affected files, and proposed fixes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(logger): handle non-JSON tool_input in formatTool (#545)

Wrap JSON.parse in try-catch to handle raw string inputs (e.g., Bash
commands) that aren't valid JSON. Falls back to using the string as-is.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(context): update mem-search hint to reference MCP tools (#544)

Update hint messages to reference MCP tools (search, get_observations)
instead of the deprecated "mem-search skill" terminology.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(settings): auto-create settings.json on first load (#557, #543)

When settings.json doesn't exist, create it with defaults instead of
returning in-memory defaults. Creates parent directory if needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(hooks): use bun runtime for hooks except smart-install (#557)

Change hook commands from node to bun since hooks use bun:sqlite.
Keep smart-install.js on node since it bootstraps bun installation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: rebuild plugin scripts

* docs: clarify that build artifacts must be committed

* fix(docs): update build artifacts directory reference in CLAUDE.md

* test: add test coverage for PR #558 fixes

- Fix 2 failing tests: update "mem-search skill" → "MCP tools" expectations
- Add 56 tests for formatTool() JSON.parse crash fix (Issue #545)
- Add 27 tests for settings.json auto-creation (Issue #543)

Test coverage includes:
- formatTool: JSON parsing, raw strings, objects, null/undefined, all tool types
- Settings: file creation, directory creation, schema migration, edge cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(tests): clean up flaky tests and fix circular dependency

Phase 1 of test quality improvements:

- Delete 6 harmful/worthless test files that used problematic mock.module()
  patterns or tested implementation details rather than behavior:
  - context-builder.test.ts (tested internal implementation)
  - export-types.test.ts (fragile mock patterns)
  - smart-install.test.ts (shell script testing antipattern)
  - session_id_refactor.test.ts (outdated, tested refactoring itself)
  - validate_sql_update.test.ts (one-time migration validation)
  - observation-broadcaster.test.ts (excessive mocking)

- Fix circular dependency between logger.ts and SettingsDefaultsManager.ts
  by using late binding pattern - logger now lazily loads settings

- Refactor mock.module() to spyOn() in several test files for more
  maintainable and less brittle tests:
  - observation-compiler.test.ts
  - gemini_agent.test.ts
  - error-handler.test.ts
  - server.test.ts
  - response-processor.test.ts

All 649 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(tests): phase 2 - reduce mock-heavy tests and improve focus

- Remove mock-heavy query tests from observation-compiler.test.ts, keep real buildTimeline tests
- Convert session_id_usage_validation.test.ts from 477 to 178 lines of focused smoke tests
- Remove tests for language built-ins from worker-spawn.test.ts (JSON.parse, array indexing)
- Rename logger-coverage.test.ts to logger-usage-standards.test.ts for clarity

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(tests): phase 3 - add JSDoc mock justification to test files

Document mock usage rationale in 5 test files to improve maintainability:
- error-handler.test.ts: Express req/res mocks, logger spies (~11%)
- fallback-error-handler.test.ts: Zero mocks, pure function tests
- session-cleanup-helper.test.ts: Session fixtures, worker mocks (~19%)
- hook-constants.test.ts: process.platform mock for Windows tests (~12%)
- session_store.test.ts: Zero mocks, real SQLite :memory: database

Part of ongoing effort to document mock justifications per TESTING.md guidelines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test(integration): phase 5 - add 72 tests for critical coverage gaps

Add comprehensive test coverage for previously untested areas:

- tests/integration/hook-execution-e2e.test.ts (10 tests)
  Tests lifecycle hooks execution flow and context propagation

- tests/integration/worker-api-endpoints.test.ts (19 tests)
  Tests all worker service HTTP endpoints without heavy mocking

- tests/integration/chroma-vector-sync.test.ts (16 tests)
  Tests vector embedding synchronization with ChromaDB

- tests/utils/tag-stripping.test.ts (27 tests)
  Tests privacy tag stripping utilities for both <private> and
  <meta-observation> tags

All tests use real implementations where feasible, following the
project's testing philosophy of preferring integration-style tests
over unit tests with extensive mocking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* context update

* docs: add comment linking DEFAULT_DATA_DIR locations

Added NOTE comment in logger.ts pointing to the canonical DEFAULT_DATA_DIR
in SettingsDefaultsManager.ts. This addresses PR reviewer feedback about
the fragility of having the default defined in two places to avoid
circular dependencies.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:45:09 -05:00

16 KiB

Test Quality Audit Report

Date: 2026-01-05 Auditor: Claude Code (Opus 4.5) Methodology: Deep analysis with focus on anti-pattern prevention, actual functionality testing, and regression prevention


Executive Summary

Total Test Files Audited: 41 Total Test Cases: ~450+

Score Distribution

Score Category Count Percentage
5 Essential 8 19.5%
4 Valuable 15 36.6%
3 Marginal 11 26.8%
2 Weak 5 12.2%
1 Delete 2 4.9%

Key Findings

Strengths:

  • SQLite database tests are exemplary - real database operations with proper setup/teardown
  • Infrastructure tests (WMIC parsing, token calculator) use pure unit testing with no mocks
  • Search strategy tests have comprehensive coverage of edge cases
  • Logger formatTool tests are thorough and test actual transformation logic

Critical Issues:

  • context-builder.test.ts has incomplete mocks that pollute the module cache, causing 81 test failures when run with the full suite
  • Several tests verify mock behavior rather than actual functionality
  • Type validation tests (export-types.test.ts) provide minimal value - TypeScript already validates types at compile time
  • Some "validation" tests only verify code patterns exist, not that they work

Recommendations:

  1. Fix or delete context-builder.test.ts - it actively harms the test suite
  2. Delete trivial type validation tests that duplicate TypeScript compiler checks
  3. Convert heavy-mock tests to integration tests where feasible
  4. Add integration tests for critical paths (hook execution, worker API endpoints)

Detailed Scores

Score 5 - Essential (8 tests)

These tests catch real bugs, use minimal mocking, and test actual behavior.

File Test Count Notes
tests/sqlite/observations.test.ts 25+ Real SQLite operations, in-memory DB, tests actual data persistence and retrieval
tests/sqlite/sessions.test.ts 20+ Real database CRUD operations, status transitions, relationship integrity
tests/sqlite/transactions.test.ts 15+ Critical transaction isolation tests, rollback behavior, error handling
tests/context/token-calculator.test.ts 35+ Pure unit tests, no mocks, tests actual token estimation algorithms
tests/infrastructure/wmic-parsing.test.ts 20+ Pure parsing logic tests, validates Windows process enumeration edge cases
tests/utils/logger-format-tool.test.ts 56 Comprehensive formatTool tests, validates JSON parsing, tool output formatting
tests/server/server.test.ts 15+ Real HTTP server integration tests, actual endpoint validation
tests/cursor-hook-outputs.test.ts 12+ Integration tests running actual hook scripts, validates real output

Why Essential: These tests catch actual bugs before production. They test real behavior with minimal abstraction. The SQLite tests in particular are exemplary - they use an in-memory database but perform real SQL operations.


Score 4 - Valuable (15 tests)

Good tests with acceptable mocking that still verify meaningful behavior.

File Test Count Notes
tests/sqlite/prompts.test.ts 15+ Real DB operations for user prompts, timestamp handling
tests/sqlite/summaries.test.ts 15+ Real DB operations for session summaries
tests/worker/search/search-orchestrator.test.ts 30+ Comprehensive strategy selection logic, good edge case coverage
tests/worker/search/strategies/sqlite-search-strategy.test.ts 25+ Filter logic tests, date range handling
tests/worker/search/strategies/hybrid-search-strategy.test.ts 20+ Ranking preservation, merge logic
tests/worker/search/strategies/chroma-search-strategy.test.ts 20+ Vector search behavior, doc_type filtering
tests/worker/search/result-formatter.test.ts 15+ Output formatting validation
tests/gemini_agent.test.ts 20+ Multi-turn conversation flow, rate limiting fallback
tests/infrastructure/health-monitor.test.ts 15+ Health check logic, threshold validation
tests/infrastructure/graceful-shutdown.test.ts 15+ Shutdown sequence, timeout handling
tests/infrastructure/process-manager.test.ts 12+ Process lifecycle management
tests/cursor-mcp-config.test.ts 10+ MCP configuration generation validation
tests/cursor-hooks-json-utils.test.ts 8+ JSON parsing utilities
tests/shared/settings-defaults-manager.test.ts 27 Settings validation, migration logic
tests/context/formatters/markdown-formatter.test.ts 15+ Markdown generation, terminology consistency

Why Valuable: These tests have some mocking but still verify important business logic. The search strategy tests are particularly good at testing the decision-making logic for query routing.


Score 3 - Marginal (11 tests)

Tests with moderate value, often too much mocking or testing obvious behavior.

File Test Count Issues
tests/worker/agents/observation-broadcaster.test.ts 15+ Heavy mocking of SSE workers, tests mock behavior more than actual broadcasting
tests/worker/agents/fallback-error-handler.test.ts 10+ Error message formatting tests, low complexity
tests/worker/agents/session-cleanup-helper.test.ts 10+ Cleanup logic with mocked dependencies
tests/context/observation-compiler.test.ts 20+ Mock database, tests query building not actual compilation
tests/server/error-handler.test.ts 8+ Mock Express response, tests formatting only
tests/cursor-registry.test.ts 8+ Registry pattern tests, low risk area
tests/cursor-context-update.test.ts 5+ File format validation, could be stricter
tests/hook-constants.test.ts 5+ Constant validation, low value
tests/session_store.test.ts 10+ In-memory store tests, straightforward logic
tests/logger-coverage.test.ts 8+ Coverage verification, not functionality
tests/scripts/smart-install.test.ts 25+ Path array tests, replicates rather than imports logic

Why Marginal: These tests provide some regression protection but either mock too heavily or test low-risk areas. The smart-install tests notably replicate the path arrays from the source file rather than testing the actual module.


Score 2 - Weak (5 tests)

Tests that mostly verify mocks work or provide little value.

File Test Count Issues
tests/worker/agents/response-processor.test.ts 20+ Heavy mocking: >50% setup is mock configuration. Tests verify mocks are called, not that XML parsing actually works
tests/session_id_refactor.test.ts 10+ Code pattern validation: Tests that certain patterns exist in code, not that they work
tests/session_id_usage_validation.test.ts 5+ Static analysis as tests: Reads files and checks for string patterns. Should be a lint rule, not a test
tests/validate_sql_update.test.ts 5+ One-time validation: Validated a migration, no ongoing value
tests/worker-spawn.test.ts 5+ Trivial mocking: Tests spawn config exists, doesn't test actual spawning

Why Weak: These tests create false confidence. The response-processor tests in particular set up elaborate mocks and then verify those mocks were called - they don't verify actual XML parsing or database operations work correctly.


Score 1 - Delete (2 tests)

Tests that actively harm the codebase or provide zero value.

File Test Count Issues
tests/context/context-builder.test.ts 20+ CRITICAL: Incomplete logger mock pollutes module cache. Causes 81 test failures when run with full suite. Tests verify mocks, not actual context building
tests/scripts/export-types.test.ts 30+ Zero runtime value: Tests TypeScript type definitions compile. TypeScript compiler already does this. These tests can literally never fail at runtime

Why Delete:

  • context-builder.test.ts: This test is actively harmful. It imports the logger module with an incomplete mock (only 4 of 13+ methods mocked), and this polluted mock persists in Bun's module cache. When other tests run afterwards, they get the broken logger singleton. The test itself only verifies that mocked methods were called with expected arguments - it doesn't test actual context building logic.
  • export-types.test.ts: These tests instantiate TypeScript interfaces and verify properties exist. TypeScript already validates this at compile time. If a type definition is wrong, the code won't compile. These runtime tests add overhead without catching any bugs that TypeScript wouldn't already catch.

Missing Test Coverage

Critical Gaps

Area Risk Current Coverage Recommendation
Hook execution E2E HIGH None Add integration tests that run hooks with real Claude Code SDK
Worker API endpoints HIGH Partial (server.test.ts) Add tests for all REST endpoints: /observe, /search, /health
Chroma vector sync HIGH None Add tests for ChromaSync.ts embedding generation and retrieval
Database migrations MEDIUM None Add tests for schema migrations, especially version upgrades
Settings file I/O MEDIUM Partial Add tests for settings file creation, corruption recovery
Tag stripping MEDIUM None Add tests for <private> and <meta-observation> tag handling
MCP tool handlers MEDIUM None Add tests for search, timeline, get_observations MCP tools
Error recovery MEDIUM Minimal Add tests for worker crash recovery, database corruption handling
  1. tests/integration/hook-execution.test.ts

    • Run actual hooks with mocked Claude Code environment
    • Verify data flows correctly through SessionStart -> PostToolUse -> SessionEnd
  2. tests/integration/worker-api.test.ts

    • Start actual worker server
    • Make real HTTP requests to all endpoints
    • Verify response formats and error handling
  3. tests/services/chroma-sync.test.ts

    • Test embedding generation with real text
    • Test semantic similarity retrieval
    • Test sync between SQLite and Chroma
  4. tests/utils/tag-stripping.test.ts

    • Test <private> tag removal
    • Test <meta-observation> tag handling
    • Test nested tag scenarios

Recommendations

Immediate Actions

  1. Delete or fix tests/context/context-builder.test.ts (Priority: CRITICAL)

    • This test causes 81 other tests to fail due to module cache pollution
    • Either complete the logger mock (all 13+ methods) or delete entirely
    • Recommended: Delete and rewrite as integration test without mocks
  2. Delete tests/scripts/export-types.test.ts (Priority: HIGH)

    • Zero runtime value - TypeScript compiler already validates types
    • Remove to reduce test suite noise
  3. Delete or convert validation tests (Priority: MEDIUM)

    • tests/session_id_refactor.test.ts - Was useful during migration, no longer needed
    • tests/session_id_usage_validation.test.ts - Convert to lint rule
    • tests/validate_sql_update.test.ts - Was useful during migration, no longer needed

Architecture Improvements

  1. Create test utilities for common mocks

    • Centralize logger mock in tests/utils/mock-logger.ts with ALL methods
    • Centralize database mock with proper transaction support
    • Prevent incomplete mocks from polluting module cache
  2. Add integration test suite

    • Create tests/integration/ directory
    • Run with real worker server (separate database)
    • Test actual data flow, not mock interactions
  3. Implement test isolation

    • Use beforeEach to reset module state
    • Consider test file ordering to prevent cache pollution
    • Add cleanup hooks for database state

Quality Guidelines

For future tests, follow these principles:

  1. Prefer real implementations over mocks

    • Use in-memory SQLite instead of mock database
    • Use real HTTP requests instead of mock req/res
    • Mock only external services (AI APIs, file system when needed)
  2. Test behavior, not implementation

    • Bad: "verify function X was called with argument Y"
    • Good: "verify output contains expected data after operation"
  3. Each test should be able to fail

    • If a test cannot fail (like type validation tests), it's not testing anything
    • Write tests that would catch real bugs
  4. Keep test setup minimal

    • If >50% of test is mock setup, consider integration testing
    • Complex mock setup often indicates testing the wrong thing

Appendix: Full Test File Inventory

File Score Tests LOC Mock %
tests/context/context-builder.test.ts 1 20+ 400+ 80%
tests/context/formatters/markdown-formatter.test.ts 4 15+ 200+ 10%
tests/context/observation-compiler.test.ts 3 20+ 300+ 60%
tests/context/token-calculator.test.ts 5 35+ 400+ 0%
tests/cursor-context-update.test.ts 3 5+ 100+ 20%
tests/cursor-hook-outputs.test.ts 5 12+ 250+ 10%
tests/cursor-hooks-json-utils.test.ts 4 8+ 150+ 0%
tests/cursor-mcp-config.test.ts 4 10+ 200+ 20%
tests/cursor-registry.test.ts 3 8+ 150+ 30%
tests/gemini_agent.test.ts 4 20+ 400+ 40%
tests/hook-constants.test.ts 3 5+ 80+ 0%
tests/infrastructure/graceful-shutdown.test.ts 4 15+ 300+ 40%
tests/infrastructure/health-monitor.test.ts 4 15+ 250+ 30%
tests/infrastructure/process-manager.test.ts 4 12+ 200+ 35%
tests/infrastructure/wmic-parsing.test.ts 5 20+ 240+ 0%
tests/logger-coverage.test.ts 3 8+ 150+ 20%
tests/scripts/export-types.test.ts 1 30+ 350+ 0%
tests/scripts/smart-install.test.ts 3 25+ 230+ 0%
tests/server/error-handler.test.ts 3 8+ 150+ 50%
tests/server/server.test.ts 5 15+ 300+ 20%
tests/session_id_refactor.test.ts 2 10+ 200+ N/A
tests/session_id_usage_validation.test.ts 2 5+ 150+ N/A
tests/session_store.test.ts 3 10+ 180+ 10%
tests/shared/settings-defaults-manager.test.ts 4 27 400+ 20%
tests/sqlite/observations.test.ts 5 25+ 400+ 0%
tests/sqlite/prompts.test.ts 4 15+ 250+ 0%
tests/sqlite/sessions.test.ts 5 20+ 350+ 0%
tests/sqlite/summaries.test.ts 4 15+ 250+ 0%
tests/sqlite/transactions.test.ts 5 15+ 300+ 0%
tests/utils/logger-format-tool.test.ts 5 56 1000+ 0%
tests/validate_sql_update.test.ts 2 5+ 100+ N/A
tests/worker/agents/fallback-error-handler.test.ts 3 10+ 200+ 40%
tests/worker/agents/observation-broadcaster.test.ts 3 15+ 350+ 60%
tests/worker/agents/response-processor.test.ts 2 20+ 500+ 70%
tests/worker/agents/session-cleanup-helper.test.ts 3 10+ 200+ 50%
tests/worker/search/result-formatter.test.ts 4 15+ 250+ 20%
tests/worker/search/search-orchestrator.test.ts 4 30+ 500+ 45%
tests/worker/search/strategies/chroma-search-strategy.test.ts 4 20+ 350+ 50%
tests/worker/search/strategies/hybrid-search-strategy.test.ts 4 20+ 300+ 45%
tests/worker/search/strategies/sqlite-search-strategy.test.ts 4 25+ 350+ 40%
tests/worker-spawn.test.ts 2 5+ 100+ 60%

Report generated by Claude Code (Opus 4.5) on 2026-01-05