* chore: bump version to 7.3.6 in package.json
* Enhance worker readiness checks and MCP connection handling
- Updated health check endpoint to /api/readiness for better initialization tracking.
- Increased timeout for health checks and worker startup retries, especially for Windows.
- Added initialization flags to track MCP readiness and overall worker initialization status.
- Implemented a timeout guard for MCP connection to prevent hanging.
- Adjusted logging to reflect readiness state and errors more accurately.
* fix(windows): use Bun PATH detection in worker wrapper
Phase 2/8: Fix Bun PATH Detection in Worker Wrapper
- Import getBunPath() in worker-wrapper.ts for Bun detection
- Add Bun path resolution before spawning inner worker process
- Update spawn call to use detected Bun path instead of process.execPath
- Add logging to bun-path.ts when PATH detection succeeds
- Add logging when fallback paths are used
- Add Windows-specific validation for .exe extension
- Log warning with searched paths when Bun not found
- Fail fast with clear error message if Bun cannot be detected
This ensures worker-wrapper uses the correct Bun executable on Windows
even when Bun is not in PATH, fixing issue #371 where users reported
"Bun not in PATH" errors despite Bun being installed.
Addresses: #371🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(windows): standardize child process spawning with windowsHide
Phase 3/8: Standardize Child Process Spawning (Windows)
Changes:
- Added windowsHide flag to ChromaSync MCP subprocess spawn
- Added Windows-specific process tracking (childPid) in ChromaSync
- Force-kill subprocess on Windows before closing transport to prevent zombie processes
- Updated cleanupOrphanedProcesses() to support Windows using PowerShell Get-CimInstance
- Use taskkill /T /F for proper process tree cleanup on Windows
- Audited BranchManager - confirmed windowsHide already present on all spawn calls
This prevents PowerShell windows from appearing during ChromaSync operations
and ensures proper cleanup of subprocess trees on Windows.
Addresses: #363, #361, #367, #371, #373, #374🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(windows): enhance socket cleanup with recursive process tree management
Phase 4/8: Enhanced Socket Cleanup & Process Tree Management
Changes:
- Added recursive process tree enumeration in worker-wrapper.ts for Windows
- Enhanced killInner() to enumerate all descendants before killing
- Added fallback individual process kill if taskkill /T fails
- Added 10s timeout to ChromaSync.close() in DatabaseManager to prevent hangs
- Force nullify ChromaSync even on close failure to prevent resource leaks
- Improved logging to show full process tree during cleanup
This ensures complete cleanup of all child processes (ChromaSync MCP subprocess,
Python processes, etc.) preventing socket leaks and CLOSE_WAIT states.
Addresses: #363, #361
* fix(windows): consolidate project name extraction with drive root handling
Phase 5/8: Project Name Extraction Consolidation
- Created shared getProjectName() utility in src/utils/project-name.ts
- Handles edge case: drive roots (C:\, J:\) now return "drive-X" format
- Handles edge case: null/undefined/empty cwd now returns "unknown-project"
- Fixed missing null check bug in new-hook.ts
- Replaced duplicated path.basename(cwd) logic in:
- src/hooks/context-hook.ts
- src/hooks/new-hook.ts
- src/services/context-generator.ts
Addresses: #374🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(windows): increase timeouts and improve error messages
Phase 6/8: Increase Timeouts & Improve Error Messages
- Enhanced logger.ts with platform prefix (WIN32/DARWIN) and PID in all logs
- Added comprehensive Windows troubleshooting to ProcessManager error messages
- Enhanced Bun detection error message with Windows-specific troubleshooting
- All error messages now include GitHub issue numbers and docs links
- Windows timeout already increased to 2.0x multiplier in previous phases
Changes:
- src/utils/logger.ts: Added platform prefix and PID to all log output
- src/services/process/ProcessManager.ts: Enhanced error messages with troubleshooting steps
- src/utils/bun-path.ts: Added Windows-specific Bun detection error guidance
Addresses: #363, #361, #367, #371, #373, #374🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(windows): add comprehensive Windows CI testing
Phase 7/8: Add Windows CI Testing
- Create automated Windows testing workflow
- Test worker startup/shutdown cycles
- Verify Bun PATH detection on Windows
- Test rapid restart scenarios
- Validate port cleanup after shutdown
- Check for zombie processes
- Run on all pushes and PRs to main/fix/feature branches
Addresses: #363, #361, #367, #371, #373, #374🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* ci(windows): remove build steps from Windows CI workflow
Build files are already included in the plugin folder, so npm install
and npm run build are unnecessary steps in the CI workflow.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* revert: remove Windows CI workflow
The CI workflow cannot be properly implemented in the current architecture
due to limitations in testing the worker service in CI environments.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* security: add PID validation and improve ChromaSync timeout handling
Address critical security and reliability issues identified in PR review:
**Security Fixes:**
- Add PID validation before all PowerShell/taskkill command execution
- Validate PIDs are positive integers to prevent command injection
- Apply validation in worker-wrapper.ts, worker-service.ts, and ChromaSync.ts
**Reliability Improvements:**
- Add timeout handling to ChromaSync client.close() (10s timeout)
- Add timeout handling to ChromaSync transport.close() (5s timeout)
- Implement force-kill fallback when ChromaSync close operations timeout
- Prevents hanging on shutdown and ensures subprocess cleanup
**Implementation Details:**
- PID validation checks: Number.isInteger(pid) && pid > 0
- Applied before all execSync taskkill calls on Windows
- Applied in process enumeration (Get-CimInstance) PowerShell commands
- ChromaSync.close() uses Promise.race for timeout enforcement
- Graceful degradation with force-kill fallback on timeout
Addresses PR #378 review feedback
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Refactor ChromaSync client and transport closure logic
- Removed timeout handling for closing the Chroma client and transport.
- Simplified error logging for client and transport closure.
- Ensured subprocess cleanup logic is more straightforward.
* fix(worker): streamline Windows process management and cleanup
* revert: remove speculative LLM-generated complexity
Reverts defensive code that was added speculatively without user-reported issues:
- ChromaSync: Remove PID extraction and explicit taskkill (wrapper handles this)
- worker-wrapper: Restore simple taskkill /T /F (validated in v7.3.5)
- DatabaseManager: Remove Promise.race timeout wrapper
- hook-constants: Restore original timeout values
- logger: Remove platform/PID additions to every log line
- bun-path: Remove speculative logging
Keeps only changes that map to actual GitHub issues:
- #374: Drive root project name fix (getProjectName utility)
- #363: Readiness endpoint and Windows orphan cleanup
- #367: windowsHide on ChromaSync transport
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- Removed all instances of happy_path_error__with_fallback from various hooks, services, and utilities.
- Introduced logger.happyPathError for consistent logging of unexpected nulls and fallback values.
- Updated the logger utility to include a new happyPathError method with enhanced context and stack trace.
- Deprecated silent-debug utility as all logging functionality has been migrated to the logger.
* Enhance error logging in hooks
- Added detailed error logging in context-hook, new-hook, save-hook, and summary-hook to capture status, project, port, and relevant session information on failures.
- Improved error messages thrown in save-hook and summary-hook to include specific context about the failure.
* Refactor migration logging to use console.log instead of console.error
- Updated SessionSearch and SessionStore classes to replace console.error with console.log for migration-related messages.
- Added notes in the documentation to clarify the use of console.log for migration messages due to the unavailability of the structured logger during constructor execution.
* Refactor SDKAgent and silent-debug utility to simplify error handling
- Updated SDKAgent to use direct defaults instead of happy_path_error__with_fallback for non-critical fields such as last_user_message, last_assistant_message, title, filesRead, filesModified, concepts, and summary.request.
- Enhanced silent-debug documentation to clarify appropriate use cases for happy_path_error__with_fallback, emphasizing its role in handling unexpected null/undefined values while discouraging its use for nullable fields with valid defaults.
* fix: correct happy_path_error__with_fallback usage to prevent false errors
Fixes false "Missing cwd" and "Missing transcript_path" errors that were
flooding silent.log even when values were present.
Root cause: happy_path_error__with_fallback was being called unconditionally
instead of only when the value was actually missing.
Pattern changed from:
value: happy_path_error__with_fallback('Missing', {}, value || '')
To correct usage:
value: value || happy_path_error__with_fallback('Missing', {}, '')
Fixed in:
- src/hooks/save-hook.ts (PostToolUse hook)
- src/hooks/summary-hook.ts (Stop hook)
- src/services/worker/http/routes/SessionRoutes.ts (2 instances)
Impact: Eliminates false error noise, making actual errors visible.
Addresses issue #260 - users were seeing "Missing cwd" errors despite
Claude Code correctly passing all required fields.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Enhance error logging and handling across services
- Improved error messages in SessionStore to include project context when fetching boundary observations and timestamps.
- Updated ChromaSync error handling to provide more informative messages regarding client initialization failures, including the project context.
- Enhanced error logging in WorkerService to include the package path when reading version fails.
- Added detailed error logging in worker-utils to capture expected and running versions during health checks.
- Extended WorkerErrorMessageOptions to include actualError for more informative restart instructions.
* Refactor error handling in hooks to use standardized fetch error handler
- Introduced a new error handler `handleFetchError` in `shared/error-handler.ts` to standardize logging and user-facing error messages for fetch failures across hooks.
- Updated `context-hook.ts`, `new-hook.ts`, `save-hook.ts`, and `summary-hook.ts` to utilize the new error handler, improving consistency and maintainability.
- Removed redundant imports and error handling logic related to worker restart instructions from the hooks.
* feat: add comprehensive error handling tests for hooks and ChromaSync client
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Critical bugfix: ChromaSync now properly cleans up chroma-mcp subprocesses
when the worker is restarted, preventing memory exhaustion from orphaned
processes accumulating over time.
Changes:
- Store reference to StdioClientTransport subprocess
- Explicitly close transport in close() method to kill subprocess
- Add error handling to ensure cleanup even on failures
- Reset all state in finally block
Problem:
Each worker restart spawned a new chroma-mcp process but never killed the
old one. After multiple restarts, orphaned processes accumulated (16+ seen
in production), consuming 900MB+ RAM and eventually causing OOM kills that
silently failed backfills.
Impact:
- Eliminates process accumulation
- Prevents memory exhaustion from leaked subprocesses
- Fixes silent backfill failures caused by OOM kills
- Ensures graceful cleanup on worker shutdown
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Moved SettingsDefaultsManager from worker/settings to shared directory.
- Updated all import paths across the codebase to reflect the new location.
- Removed early-settings.ts as its functionality is now handled by SettingsDefaultsManager.
- Adjusted logger and paths to utilize SettingsDefaultsManager for configuration values.
- Replaced instances of silentDebug with happy_path_error__with_fallback across multiple files to improve error logging and handling.
- Updated the utility function to provide clearer semantics for error handling when expected values are missing.
- Introduced a script to find potential silent failures in the codebase that may need to be addressed with the new error handling approach.
- Updated paths in troubleshooting documentation to reflect new settings file location.
- Modified diagnostics and reference files to read from ~/.claude-mem/settings.json.
- Introduced getWorkerPort utility for cleaner worker port retrieval.
- Enhanced ChromaSync and SDKAgent to load Python version and Claude path from settings.
- Updated SettingsRoutes to validate new settings: CLAUDE_MEM_LOG_LEVEL and CLAUDE_MEM_PYTHON_VERSION.
- Added early-settings module to load settings for logger and other early-stage modules.
- Adjusted logger to use early-loaded log level setting.
- Refactored paths to utilize early-loaded data directory setting.
Resolved conflicts:
- SearchManager.ts: Keep refactored class version (main had old search-server.ts startup code)
- worker-service.cjs, search-server.cjs: Keep our built versions
- package-lock.json: Take main's version
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extracted timeline-related functionality from mcp-server.ts to a dedicated TimelineService class.
- Added methods to build, filter, and format timeline items based on observations, sessions, and prompts.
- Introduced interfaces for TimelineItem and TimelineData to standardize data structures.
- Implemented sorting and grouping of timeline items by date, with markdown formatting for output.
- Included utility methods for date and time formatting, as well as token estimation.
* feat: Add discovery_tokens for ROI tracking in observations and session summaries
- Introduced `discovery_tokens` column in `observations` and `session_summaries` tables to track token costs associated with discovering and creating each observation and summary.
- Updated relevant services and hooks to calculate and display ROI metrics based on discovery tokens.
- Enhanced context economics reporting to include savings from reusing previous observations.
- Implemented migration to ensure the new column is added to existing tables.
- Adjusted data models and sync processes to accommodate the new `discovery_tokens` field.
* refactor: streamline context hook by removing unused functions and updating terminology
- Removed the estimateTokens and getObservations helper functions as they were not utilized.
- Updated the legend and output messages to replace "discovery" with "work" for clarity.
- Changed the emoji representation for different observation types to better reflect their purpose.
- Enhanced output formatting for improved readability and understanding of token usage.
* Refactor user-message-hook and context-hook for improved clarity and functionality
- Updated user-message-hook.js to enhance error messaging and improve variable naming for clarity.
- Modified context-hook.ts to include a new column key section, improved context index instructions, and added emoji icons for observation types.
- Adjusted footer messages in context-hook.ts to emphasize token savings and access to past research.
- Changed user-message-hook.ts to update the feedback and support message for clarity.
* fix: Critical ROI tracking fixes from PR review
Addresses critical findings from PR #111 review:
1. **Fixed incorrect discovery token calculation** (src/services/worker/SDKAgent.ts)
- Changed from passing cumulative total to per-response delta
- Now correctly tracks token cost for each observation/summary
- Captures token state before/after response processing
- Prevents all observations getting inflated cumulative values
2. **Fixed schema version mismatch** (src/services/sqlite/SessionStore.ts)
- Changed ensureDiscoveryTokensColumn() from version 11 to version 7
- Now matches migration007 definition in migrations.ts
- Ensures consistent version tracking across migration system
These fixes ensure ROI metrics accurately reflect token costs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
- Changed user prompt formatting to use full text instead of truncated version.
- Updated date filtering logic to use milliseconds instead of seconds for 90-day cutoff.
- Renamed doc_type values in ChromaSync to ensure consistency and prevent deduplication issues.
- Improved documentation for concept tags in input schema.
- Updated the StdioClientTransport configuration in search-server.ts to include 'stderr: ignore' for the Chroma client.
- Modified the ChromaSync class in ChromaSync.ts to also set 'stderr: ignore' when initializing the Chroma client.
- Added `getObservationById` method to retrieve observations by ID in SessionStore.
- Introduced `getSessionSummariesByIds` and `getUserPromptsByIds` methods for fetching session summaries and user prompts by IDs.
- Developed `getTimelineAroundTimestamp` and `getTimelineAroundObservation` methods to provide a unified timeline of observations, sessions, and prompts around a specified anchor point.
- Enhanced ChromaSync to format and sync user prompts, including a new `syncUserPrompt` method.
- Updated WorkerService to sync the latest user prompt to Chroma after updating the worker port.
- Created tests for timeline querying and MCP handler logic to ensure functionality.
- Documented the implementation plan for user prompts and timeline context tool in the Chroma search completion plan.