Files

T

Alex Newman f38b5b85bc fix: resolve issues #543 , #544 , #545 , #557 (#558 )

* docs: add investigation reports for 5 open GitHub issues

Comprehensive analysis of issues #543, #544, #545, #555, and #557:

- #557: settings.json not generated, module loader error (node/bun mismatch)
- #555: Windows hooks not executing, hasIpc always false
- #545: formatTool crashes on non-JSON tool_input strings
- #544: mem-search skill hint shown incorrectly to Claude Code users
- #543: /claude-mem slash command unavailable despite installation

Each report includes root cause analysis, affected files, and proposed fixes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(logger): handle non-JSON tool_input in formatTool (#545)

Wrap JSON.parse in try-catch to handle raw string inputs (e.g., Bash
commands) that aren't valid JSON. Falls back to using the string as-is.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(context): update mem-search hint to reference MCP tools (#544)

Update hint messages to reference MCP tools (search, get_observations)
instead of the deprecated "mem-search skill" terminology.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(settings): auto-create settings.json on first load (#557, #543)

When settings.json doesn't exist, create it with defaults instead of
returning in-memory defaults. Creates parent directory if needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(hooks): use bun runtime for hooks except smart-install (#557)

Change hook commands from node to bun since hooks use bun:sqlite.
Keep smart-install.js on node since it bootstraps bun installation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: rebuild plugin scripts

* docs: clarify that build artifacts must be committed

* fix(docs): update build artifacts directory reference in CLAUDE.md

* test: add test coverage for PR #558 fixes

- Fix 2 failing tests: update "mem-search skill" → "MCP tools" expectations
- Add 56 tests for formatTool() JSON.parse crash fix (Issue #545)
- Add 27 tests for settings.json auto-creation (Issue #543)

Test coverage includes:
- formatTool: JSON parsing, raw strings, objects, null/undefined, all tool types
- Settings: file creation, directory creation, schema migration, edge cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(tests): clean up flaky tests and fix circular dependency

Phase 1 of test quality improvements:

- Delete 6 harmful/worthless test files that used problematic mock.module()
  patterns or tested implementation details rather than behavior:
  - context-builder.test.ts (tested internal implementation)
  - export-types.test.ts (fragile mock patterns)
  - smart-install.test.ts (shell script testing antipattern)
  - session_id_refactor.test.ts (outdated, tested refactoring itself)
  - validate_sql_update.test.ts (one-time migration validation)
  - observation-broadcaster.test.ts (excessive mocking)

- Fix circular dependency between logger.ts and SettingsDefaultsManager.ts
  by using late binding pattern - logger now lazily loads settings

- Refactor mock.module() to spyOn() in several test files for more
  maintainable and less brittle tests:
  - observation-compiler.test.ts
  - gemini_agent.test.ts
  - error-handler.test.ts
  - server.test.ts
  - response-processor.test.ts

All 649 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(tests): phase 2 - reduce mock-heavy tests and improve focus

- Remove mock-heavy query tests from observation-compiler.test.ts, keep real buildTimeline tests
- Convert session_id_usage_validation.test.ts from 477 to 178 lines of focused smoke tests
- Remove tests for language built-ins from worker-spawn.test.ts (JSON.parse, array indexing)
- Rename logger-coverage.test.ts to logger-usage-standards.test.ts for clarity

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(tests): phase 3 - add JSDoc mock justification to test files

Document mock usage rationale in 5 test files to improve maintainability:
- error-handler.test.ts: Express req/res mocks, logger spies (~11%)
- fallback-error-handler.test.ts: Zero mocks, pure function tests
- session-cleanup-helper.test.ts: Session fixtures, worker mocks (~19%)
- hook-constants.test.ts: process.platform mock for Windows tests (~12%)
- session_store.test.ts: Zero mocks, real SQLite :memory: database

Part of ongoing effort to document mock justifications per TESTING.md guidelines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test(integration): phase 5 - add 72 tests for critical coverage gaps

Add comprehensive test coverage for previously untested areas:

- tests/integration/hook-execution-e2e.test.ts (10 tests)
  Tests lifecycle hooks execution flow and context propagation

- tests/integration/worker-api-endpoints.test.ts (19 tests)
  Tests all worker service HTTP endpoints without heavy mocking

- tests/integration/chroma-vector-sync.test.ts (16 tests)
  Tests vector embedding synchronization with ChromaDB

- tests/utils/tag-stripping.test.ts (27 tests)
  Tests privacy tag stripping utilities for both <private> and
  <meta-observation> tags

All tests use real implementations where feasible, following the
project's testing philosophy of preferring integration-style tests
over unit tests with extensive mocking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* context update

* docs: add comment linking DEFAULT_DATA_DIR locations

Added NOTE comment in logger.ts pointing to the canonical DEFAULT_DATA_DIR
in SettingsDefaultsManager.ts. This addresses PR reviewer feedback about
the fragility of having the default defined in two places to avoid
circular dependencies.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-05 19:45:09 -05:00

16 KiB

Raw Blame History

Test Quality Audit Report

Date: 2026-01-05 Auditor: Claude Code (Opus 4.5) Methodology: Deep analysis with focus on anti-pattern prevention, actual functionality testing, and regression prevention

Executive Summary

Total Test Files Audited: 41 Total Test Cases: ~450+

Score Distribution

Score	Category	Count	Percentage
5	Essential	8	19.5%
4	Valuable	15	36.6%
3	Marginal	11	26.8%
2	Weak	5	12.2%
1	Delete	2	4.9%

Key Findings

Strengths:

SQLite database tests are exemplary - real database operations with proper setup/teardown
Infrastructure tests (WMIC parsing, token calculator) use pure unit testing with no mocks
Search strategy tests have comprehensive coverage of edge cases
Logger formatTool tests are thorough and test actual transformation logic

Critical Issues:

context-builder.test.ts has incomplete mocks that pollute the module cache, causing 81 test failures when run with the full suite
Several tests verify mock behavior rather than actual functionality
Type validation tests (export-types.test.ts) provide minimal value - TypeScript already validates types at compile time
Some "validation" tests only verify code patterns exist, not that they work

Recommendations:

Fix or delete context-builder.test.ts - it actively harms the test suite
Delete trivial type validation tests that duplicate TypeScript compiler checks
Convert heavy-mock tests to integration tests where feasible
Add integration tests for critical paths (hook execution, worker API endpoints)

Detailed Scores

Score 5 - Essential (8 tests)

These tests catch real bugs, use minimal mocking, and test actual behavior.

File	Test Count	Notes
`tests/sqlite/observations.test.ts`	25+	Real SQLite operations, in-memory DB, tests actual data persistence and retrieval
`tests/sqlite/sessions.test.ts`	20+	Real database CRUD operations, status transitions, relationship integrity
`tests/sqlite/transactions.test.ts`	15+	Critical transaction isolation tests, rollback behavior, error handling
`tests/context/token-calculator.test.ts`	35+	Pure unit tests, no mocks, tests actual token estimation algorithms
`tests/infrastructure/wmic-parsing.test.ts`	20+	Pure parsing logic tests, validates Windows process enumeration edge cases
`tests/utils/logger-format-tool.test.ts`	56	Comprehensive formatTool tests, validates JSON parsing, tool output formatting
`tests/server/server.test.ts`	15+	Real HTTP server integration tests, actual endpoint validation
`tests/cursor-hook-outputs.test.ts`	12+	Integration tests running actual hook scripts, validates real output

Why Essential: These tests catch actual bugs before production. They test real behavior with minimal abstraction. The SQLite tests in particular are exemplary - they use an in-memory database but perform real SQL operations.

Score 4 - Valuable (15 tests)

Good tests with acceptable mocking that still verify meaningful behavior.

File	Test Count	Notes
`tests/sqlite/prompts.test.ts`	15+	Real DB operations for user prompts, timestamp handling
`tests/sqlite/summaries.test.ts`	15+	Real DB operations for session summaries
`tests/worker/search/search-orchestrator.test.ts`	30+	Comprehensive strategy selection logic, good edge case coverage
`tests/worker/search/strategies/sqlite-search-strategy.test.ts`	25+	Filter logic tests, date range handling
`tests/worker/search/strategies/hybrid-search-strategy.test.ts`	20+	Ranking preservation, merge logic
`tests/worker/search/strategies/chroma-search-strategy.test.ts`	20+	Vector search behavior, doc_type filtering
`tests/worker/search/result-formatter.test.ts`	15+	Output formatting validation
`tests/gemini_agent.test.ts`	20+	Multi-turn conversation flow, rate limiting fallback
`tests/infrastructure/health-monitor.test.ts`	15+	Health check logic, threshold validation
`tests/infrastructure/graceful-shutdown.test.ts`	15+	Shutdown sequence, timeout handling
`tests/infrastructure/process-manager.test.ts`	12+	Process lifecycle management
`tests/cursor-mcp-config.test.ts`	10+	MCP configuration generation validation
`tests/cursor-hooks-json-utils.test.ts`	8+	JSON parsing utilities
`tests/shared/settings-defaults-manager.test.ts`	27	Settings validation, migration logic
`tests/context/formatters/markdown-formatter.test.ts`	15+	Markdown generation, terminology consistency

Why Valuable: These tests have some mocking but still verify important business logic. The search strategy tests are particularly good at testing the decision-making logic for query routing.

Score 3 - Marginal (11 tests)

Tests with moderate value, often too much mocking or testing obvious behavior.

File	Test Count	Issues
`tests/worker/agents/observation-broadcaster.test.ts`	15+	Heavy mocking of SSE workers, tests mock behavior more than actual broadcasting
`tests/worker/agents/fallback-error-handler.test.ts`	10+	Error message formatting tests, low complexity
`tests/worker/agents/session-cleanup-helper.test.ts`	10+	Cleanup logic with mocked dependencies
`tests/context/observation-compiler.test.ts`	20+	Mock database, tests query building not actual compilation
`tests/server/error-handler.test.ts`	8+	Mock Express response, tests formatting only
`tests/cursor-registry.test.ts`	8+	Registry pattern tests, low risk area
`tests/cursor-context-update.test.ts`	5+	File format validation, could be stricter
`tests/hook-constants.test.ts`	5+	Constant validation, low value
`tests/session_store.test.ts`	10+	In-memory store tests, straightforward logic
`tests/logger-coverage.test.ts`	8+	Coverage verification, not functionality
`tests/scripts/smart-install.test.ts`	25+	Path array tests, replicates rather than imports logic

Why Marginal: These tests provide some regression protection but either mock too heavily or test low-risk areas. The smart-install tests notably replicate the path arrays from the source file rather than testing the actual module.

Score 2 - Weak (5 tests)

Tests that mostly verify mocks work or provide little value.

File	Test Count	Issues
`tests/worker/agents/response-processor.test.ts`	20+	Heavy mocking: >50% setup is mock configuration. Tests verify mocks are called, not that XML parsing actually works
`tests/session_id_refactor.test.ts`	10+	Code pattern validation: Tests that certain patterns exist in code, not that they work
`tests/session_id_usage_validation.test.ts`	5+	Static analysis as tests: Reads files and checks for string patterns. Should be a lint rule, not a test
`tests/validate_sql_update.test.ts`	5+	One-time validation: Validated a migration, no ongoing value
`tests/worker-spawn.test.ts`	5+	Trivial mocking: Tests spawn config exists, doesn't test actual spawning

Why Weak: These tests create false confidence. The response-processor tests in particular set up elaborate mocks and then verify those mocks were called - they don't verify actual XML parsing or database operations work correctly.

Score 1 - Delete (2 tests)

Tests that actively harm the codebase or provide zero value.

File	Test Count	Issues
`tests/context/context-builder.test.ts`	20+	CRITICAL: Incomplete logger mock pollutes module cache. Causes 81 test failures when run with full suite. Tests verify mocks, not actual context building
`tests/scripts/export-types.test.ts`	30+	Zero runtime value: Tests TypeScript type definitions compile. TypeScript compiler already does this. These tests can literally never fail at runtime

Why Delete:

context-builder.test.ts: This test is actively harmful. It imports the logger module with an incomplete mock (only 4 of 13+ methods mocked), and this polluted mock persists in Bun's module cache. When other tests run afterwards, they get the broken logger singleton. The test itself only verifies that mocked methods were called with expected arguments - it doesn't test actual context building logic.
export-types.test.ts: These tests instantiate TypeScript interfaces and verify properties exist. TypeScript already validates this at compile time. If a type definition is wrong, the code won't compile. These runtime tests add overhead without catching any bugs that TypeScript wouldn't already catch.

Missing Test Coverage

Critical Gaps

Area	Risk	Current Coverage	Recommendation
Hook execution E2E	HIGH	None	Add integration tests that run hooks with real Claude Code SDK
Worker API endpoints	HIGH	Partial (server.test.ts)	Add tests for all REST endpoints: `/observe`, `/search`, `/health`
Chroma vector sync	HIGH	None	Add tests for ChromaSync.ts embedding generation and retrieval
Database migrations	MEDIUM	None	Add tests for schema migrations, especially version upgrades
Settings file I/O	MEDIUM	Partial	Add tests for settings file creation, corruption recovery
Tag stripping	MEDIUM	None	Add tests for `<private>` and `<meta-observation>` tag handling
MCP tool handlers	MEDIUM	None	Add tests for search, timeline, get_observations MCP tools
Error recovery	MEDIUM	Minimal	Add tests for worker crash recovery, database corruption handling

Recommendations

Immediate Actions

Delete or fix tests/context/context-builder.test.ts (Priority: CRITICAL)
- This test causes 81 other tests to fail due to module cache pollution
- Either complete the logger mock (all 13+ methods) or delete entirely
- Recommended: Delete and rewrite as integration test without mocks
Delete tests/scripts/export-types.test.ts (Priority: HIGH)
- Zero runtime value - TypeScript compiler already validates types
- Remove to reduce test suite noise
Delete or convert validation tests (Priority: MEDIUM)
- tests/session_id_refactor.test.ts - Was useful during migration, no longer needed
- tests/session_id_usage_validation.test.ts - Convert to lint rule
- tests/validate_sql_update.test.ts - Was useful during migration, no longer needed

Architecture Improvements

Create test utilities for common mocks
- Centralize logger mock in tests/utils/mock-logger.ts with ALL methods
- Centralize database mock with proper transaction support
- Prevent incomplete mocks from polluting module cache
Add integration test suite
- Create tests/integration/ directory
- Run with real worker server (separate database)
- Test actual data flow, not mock interactions
Implement test isolation
- Use beforeEach to reset module state
- Consider test file ordering to prevent cache pollution
- Add cleanup hooks for database state

Quality Guidelines

For future tests, follow these principles:

Prefer real implementations over mocks
- Use in-memory SQLite instead of mock database
- Use real HTTP requests instead of mock req/res
- Mock only external services (AI APIs, file system when needed)
Test behavior, not implementation
- Bad: "verify function X was called with argument Y"
- Good: "verify output contains expected data after operation"
Each test should be able to fail
- If a test cannot fail (like type validation tests), it's not testing anything
- Write tests that would catch real bugs
Keep test setup minimal
- If >50% of test is mock setup, consider integration testing
- Complex mock setup often indicates testing the wrong thing

Appendix: Full Test File Inventory

File	Score	Tests	LOC	Mock %
`tests/context/context-builder.test.ts`	1	20+	400+	80%
`tests/context/formatters/markdown-formatter.test.ts`	4	15+	200+	10%
`tests/context/observation-compiler.test.ts`	3	20+	300+	60%
`tests/context/token-calculator.test.ts`	5	35+	400+	0%
`tests/cursor-context-update.test.ts`	3	5+	100+	20%
`tests/cursor-hook-outputs.test.ts`	5	12+	250+	10%
`tests/cursor-hooks-json-utils.test.ts`	4	8+	150+	0%
`tests/cursor-mcp-config.test.ts`	4	10+	200+	20%
`tests/cursor-registry.test.ts`	3	8+	150+	30%
`tests/gemini_agent.test.ts`	4	20+	400+	40%
`tests/hook-constants.test.ts`	3	5+	80+	0%
`tests/infrastructure/graceful-shutdown.test.ts`	4	15+	300+	40%
`tests/infrastructure/health-monitor.test.ts`	4	15+	250+	30%
`tests/infrastructure/process-manager.test.ts`	4	12+	200+	35%
`tests/infrastructure/wmic-parsing.test.ts`	5	20+	240+	0%
`tests/logger-coverage.test.ts`	3	8+	150+	20%
`tests/scripts/export-types.test.ts`	1	30+	350+	0%
`tests/scripts/smart-install.test.ts`	3	25+	230+	0%
`tests/server/error-handler.test.ts`	3	8+	150+	50%
`tests/server/server.test.ts`	5	15+	300+	20%
`tests/session_id_refactor.test.ts`	2	10+	200+	N/A
`tests/session_id_usage_validation.test.ts`	2	5+	150+	N/A
`tests/session_store.test.ts`	3	10+	180+	10%
`tests/shared/settings-defaults-manager.test.ts`	4	27	400+	20%
`tests/sqlite/observations.test.ts`	5	25+	400+	0%
`tests/sqlite/prompts.test.ts`	4	15+	250+	0%
`tests/sqlite/sessions.test.ts`	5	20+	350+	0%
`tests/sqlite/summaries.test.ts`	4	15+	250+	0%
`tests/sqlite/transactions.test.ts`	5	15+	300+	0%
`tests/utils/logger-format-tool.test.ts`	5	56	1000+	0%
`tests/validate_sql_update.test.ts`	2	5+	100+	N/A
`tests/worker/agents/fallback-error-handler.test.ts`	3	10+	200+	40%
`tests/worker/agents/observation-broadcaster.test.ts`	3	15+	350+	60%
`tests/worker/agents/response-processor.test.ts`	2	20+	500+	70%
`tests/worker/agents/session-cleanup-helper.test.ts`	3	10+	200+	50%
`tests/worker/search/result-formatter.test.ts`	4	15+	250+	20%
`tests/worker/search/search-orchestrator.test.ts`	4	30+	500+	45%
`tests/worker/search/strategies/chroma-search-strategy.test.ts`	4	20+	350+	50%
`tests/worker/search/strategies/hybrid-search-strategy.test.ts`	4	20+	300+	45%
`tests/worker/search/strategies/sqlite-search-strategy.test.ts`	4	25+	350+	40%
`tests/worker-spawn.test.ts`	2	5+	100+	60%

Report generated by Claude Code (Opus 4.5) on 2026-01-05

16 KiB Raw Blame History