Files

T

Alex Newman 2659ec3231 fix: Claude Code 2.1.1 compatibility + log-level audit + path validation fixes (#614 )

* Refactor CLAUDE.md and related files for December 2025 updates

- Updated CLAUDE.md in src/services/worker with new entries for December 2025, including changes to Search.ts, GeminiAgent.ts, SDKAgent.ts, and SessionManager.ts.
- Revised CLAUDE.md in src/shared to reflect updates and new entries for December 2025, including paths.ts and worker-utils.ts.
- Modified hook-constants.ts to clarify exit codes and their behaviors.
- Added comprehensive hooks reference documentation for Claude Code, detailing usage, events, and examples.
- Created initial CLAUDE.md files in various directories to track recent activity.

* fix: Merge user-message-hook output into context-hook hookSpecificOutput

- Add footer message to additionalContext in context-hook.ts
- Remove user-message-hook from SessionStart hooks array
- Fixes issue where stderr+exit(1) approach was silently discarded

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update logs and documentation for recent plugin and worker service changes

- Added detailed logs for worker service activities from Dec 10, 2025 to Jan 7, 2026, including initialization patterns, cleanup confirmations, and diagnostic logging.
- Updated plugin documentation with recent activities, including plugin synchronization and configuration changes from Dec 3, 2025 to Jan 7, 2026.
- Enhanced the context hook and worker service logs to reflect improvements and fixes in the plugin architecture.
- Documented the migration and verification processes for the Claude memory system and its integration with the marketplace.

* Refactor hooks architecture and remove deprecated user-message-hook

- Updated hook configurations in CLAUDE.md and hooks.json to reflect changes in session start behavior.
- Removed user-message-hook functionality as it is no longer utilized in Claude Code 2.1.0; context is now injected silently.
- Enhanced context-hook to handle session context injection without user-visible messages.
- Cleaned up documentation across multiple files to align with the new hook structure and removed references to obsolete hooks.
- Adjusted timing and command execution for hooks to improve performance and reliability.

* fix: Address PR #610 review issues

- Replace USER_MESSAGE_ONLY test with BLOCKING_ERROR test in hook-constants.test.ts
- Standardize Claude Code 2.1.0 note wording across all three documentation files
- Exclude deprecated user-message-hook.ts from logger-usage-standards test

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Remove hardcoded fake token counts from context injection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Address PR #610 review issues by fixing test files, standardizing documentation notes, and verifying code quality improvements.

* fix: Add path validation to CLAUDE.md distribution to prevent invalid directory creation

- Add isValidPathForClaudeMd() function to reject invalid paths:
  - Tilde paths (~) that Node.js doesn't expand
  - URLs (http://, https://)
  - Paths with spaces (likely command text or PR references)
  - Paths with # (GitHub issue/PR references)
  - Relative paths that escape project boundary

- Integrate validation in updateFolderClaudeMdFiles loop
- Add 6 unit tests for path validation
- Update .gitignore to prevent accidental commit of malformed directories
- Clean up existing invalid directories (~/, PR #610..., git diff..., https:)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: Implement path validation in CLAUDE.md generation to prevent invalid directory creation

- Added `isValidPathForClaudeMd()` function to validate file paths in `src/utils/claude-md-utils.ts`.
- Integrated path validation in `updateFolderClaudeMdFiles` to skip invalid paths.
- Added 6 new unit tests in `tests/utils/claude-md-utils.test.ts` to cover various rejection cases.
- Updated `.gitignore` to prevent tracking of invalid directories.
- Cleaned up existing invalid directories in the repository.

* feat: Promote critical WARN logs to ERROR level across codebase

Comprehensive log-level audit promoting 38+ WARN messages to ERROR for
improved debugging and incident response:

- Parser: observation type errors, data contamination
- SDK/Agents: empty init responses (Gemini, OpenRouter)
- Worker/Queue: session recovery, auto-recovery failures
- Chroma: sync failures, search failures (now treated as critical)
- SQLite: search failures (primary data store)
- Session/Generator: failures, missing context
- Infrastructure: shutdown, process management failures
- File Operations: CLAUDE.md updates, config reads
- Branch Management: recovery checkout failures

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: Address PR #614 review issues

- Remove incorrectly tracked tilde-prefixed files from git
- Fix absolute path validation to check projectRoot boundaries
- Add test coverage for absolute path validation edge cases

Closes review issues:
- Issue 1: ~/ prefixed files removed from tracking
- Issue 3: Absolute paths now validated against projectRoot
- Issue 4: Added 3 new test cases for absolute path scenarios

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* build assets and context

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-07 23:34:20 -05:00

13 KiB

Raw Blame History

Issue #596: ProcessTransport is not ready for writing - Generator aborted on every observation

Date: 2026-01-07 Issue: #596 Reported by: soho-dev-account Severity: Critical Status: Under Investigation Labels: bug

1. Executive Summary

After a clean install of claude-mem v9.0.0, the SDK agent aborts every observation with a "ProcessTransport is not ready for writing" error. The worker starts successfully and the HTTP API responds, but no observations are stored to the database. The error originates from the Claude Agent SDK's internal transport layer, specifically in the bundled worker-service.cjs at line 1119.

Key Finding: This is a race condition or timing issue in the Claude Agent SDK's ProcessTransport initialization. The SDK attempts to write messages to its subprocess transport before the transport's ready state is established.

Impact: Complete loss of memory functionality. The system appears operational but silently fails to capture any development context.

2. Problem Analysis

2.1 Symptoms

Worker starts successfully - No startup errors, HTTP endpoints respond
Observations are queued - HTTP 200 responses from /api/sessions/observations
Generator aborts immediately - Every queued message triggers generator abort
No observations stored - Database remains empty despite active usage

2.2 Error Signature

error: ProcessTransport is not ready for writing at write (/Users/.../worker-service.cjs:1119:5337)

2.3 Worker Logs Pattern

[INFO ] [SDK ] Starting SDK query...
[INFO ] [SDK ] Creating message generator...
[INFO ] [SESSION] [session-3458] Generator aborted

The log shows:

SDK query starts (line 78-85 in SDKAgent.ts)
Message generator created (line 266-272 in SDKAgent.ts)
Generator aborts immediately (line 169 in SessionRoutes.ts)

The gap between "Creating message generator" and "Generator aborted" indicates the SDK's query() function throws before yielding any messages.

2.4 Environment Context

OS: macOS 26.3, Apple Silicon
Bun: 1.3.5
Node: v22.21.1
Claude Code: 2.0.75
claude-mem: v9.0.0 (clean install)

3. Technical Details

3.1 ProcessTransport in the Agent SDK

The ProcessTransport class is part of the Claude Agent SDK (@anthropic-ai/claude-agent-sdk), bundled into worker-service.cjs during the build process. This transport manages bidirectional IPC communication between:

Parent process: The claude-mem worker service
Child process: Claude Code subprocess spawned for SDK queries

The transport uses stdin/stdout pipes to exchange JSON messages with the Claude Code process.

3.2 The Ready State Problem

ProcessTransport maintains a ready state that gates write operations:

// Approximate structure from bundled code
class ProcessTransport {
  ready = false;

  write(data) {
    if (!this.ready) {
      throw new Error("ProcessTransport is not ready for writing");
    }
    // ... actual write to subprocess stdin
  }

  async start() {
    // Spawn subprocess
    // Set up pipes
    this.ready = true;
  }
}

The error occurs when write() is called before start() completes, or when the transport initialization fails silently.

3.3 Code Flow Analysis

Session initialization (SessionRoutes.ts:237-299)
- HTTP request creates/fetches session
- Calls startGeneratorWithProvider()
Generator startup (SessionRoutes.ts:118-217)
- Sets session.currentProvider
- Calls agent.startSession(session, worker)
- Wraps in Promise with error/finally handlers

SDK query invocation (SDKAgent.ts:102-114)

const queryResult = query({
  prompt: messageGenerator,
  options: {
    model: modelId,
    ...(hasRealMemorySessionId && session.lastPromptNumber > 1 && { resume: session.memorySessionId }),
    disallowedTools,
    abortController: session.abortController,
    pathToClaudeCodeExecutable: claudePath
  }
});

SDK internal flow (inside query())
- Creates ProcessTransport
- Spawns Claude subprocess
- RACE: Attempts to write before ready

3.4 Abort Controller Signal Path

When ProcessTransport throws, the error propagates through:

query() async iterator throws
for await loop in startSession() exits
Generator promise rejects
SessionRoutes .catch() handler executes
Checks session.abortController.signal.aborted
Since not manually aborted, logs "Generator failed" at ERROR level
.finally() handler executes
Logs "Generator aborted" (misleading - it wasn't aborted, it crashed)

4. Impact Assessment

4.1 Functional Impact

Component	Status	Notes
Worker startup	Working	HTTP server binds correctly
HTTP API	Working	Endpoints respond with 200
Session creation	Working	Database rows created
Observation queueing	Working	Messages added to pending queue
SDK query	Failing	ProcessTransport error
Observation storage	Failing	No observations saved
Summary generation	Failing	Depends on working SDK
CLAUDE.md generation	Partial	No recent activity to show

4.2 User Impact

100% loss of memory functionality - No observations captured
Silent failure mode - Worker appears healthy
Queue grows indefinitely - Messages stuck in "processing"
No error visible to user - Requires checking worker logs

4.3 System Recovery

After this failure:

Pending messages remain in database (crash-safe design)
On worker restart, messages are recoverable
If SDK issue is resolved, backlog will process

5. Root Cause Analysis

5.1 Primary Hypothesis: SDK Version Incompatibility

Confidence: 85%

The Claude Agent SDK version (^0.1.76) may have introduced changes to ProcessTransport initialization timing that conflict with how claude-mem invokes query().

Evidence:

v9.0.0 works for some users but fails for others
Error occurs in SDK internals, not claude-mem code
Similar timing issues seen in previous SDK versions

5.2 Alternative Hypothesis: Subprocess Spawn Race

Confidence: 70%

The Claude Code subprocess may fail to start or respond in time, causing the transport to remain in non-ready state.

Evidence:

pathToClaudeCodeExecutable is auto-detected
Different Claude Code versions may have different startup times
Apple Silicon Bun may spawn processes differently

5.3 Alternative Hypothesis: Bun-Specific IPC Issue

Confidence: 50%

Bun's process spawning may handle stdin/stdout pipes differently than Node.js, causing transport initialization to fail.

Evidence:

claude-mem runs under Bun
Agent SDK may not be tested extensively with Bun runtime
Bun 1.3.5 is relatively new

Commit e22e2bfc fixed a version mismatch causing infinite worker restart loops. This touched:

plugin/package.json
plugin/scripts/worker-service.cjs
Hook scripts

While this fix addressed restart loops, it may have introduced timing changes that expose this race condition.

6. Recommended Solutions

6.1 Immediate Workarounds

Option A: Retry with Backoff (Quick Fix)

Add retry logic around query() invocation:

// SDKAgent.ts - wrap query() with retry
async function queryWithRetry(options: QueryOptions, maxRetries = 3): Promise<QueryResult> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return query(options);
    } catch (error) {
      if (error.message?.includes('ProcessTransport is not ready') && attempt < maxRetries - 1) {
        await new Promise(resolve => setTimeout(resolve, 100 * (attempt + 1)));
        continue;
      }
      throw error;
    }
  }
}

Pros: Quick to implement, may resolve timing-sensitive cases Cons: Masks underlying issue, adds latency

Option B: Verify Claude Executable Before Query

Add explicit verification that Claude is responsive:

// Before calling query()
const testResult = execSync(`${claudePath} --version`, { timeout: 5000 });
if (!testResult) {
  throw new Error('Claude executable not responding');
}

Pros: Catches subprocess spawn failures early Cons: Adds startup latency, doesn't address transport race

6.2 Medium-Term Fixes

Option C: Pin SDK Version

Lock to a known-working SDK version:

{
  "dependencies": {
    "@anthropic-ai/claude-agent-sdk": "0.1.75"
  }
}

Pros: Immediate resolution if regression confirmed Cons: Misses security updates, may not match Claude Code version

Option D: Add Transport Ready Callback

Request SDK feature to expose transport ready state:

// Hypothetical API
const queryResult = query({
  prompt: messageGenerator,
  options: { ... },
  onTransportReady: () => logger.info('SDK', 'Transport ready')
});

Pros: Proper fix at SDK level Cons: Requires SDK changes

6.3 Long-Term Solutions

Option E: V2 SDK Migration

The V2 SDK (unstable_v2_createSession) uses a different session-based architecture that may not have this race condition:

await using session = unstable_v2_createSession({
  model: 'claude-sonnet-4-5-20250929'
});

await session.send(initPrompt);  // Explicit send/receive
for await (const msg of session.receive()) { ... }

Pros: Modern API, explicit lifecycle control Cons: V2 is "unstable preview", requires significant refactor

Option F: Alternative Agent Provider

Use Gemini or OpenRouter as default when SDK fails:

// SessionRoutes.ts - fallback logic
try {
  await sdkAgent.startSession(session, worker);
} catch (error) {
  if (error.message?.includes('ProcessTransport')) {
    logger.warn('SESSION', 'SDK transport failed, falling back to Gemini');
    await geminiAgent.startSession(session, worker);
  }
}

Pros: System remains functional Cons: Different model behavior, requires API key

7. Priority/Severity Assessment

7.1 Severity: Critical

Criterion	Rating	Justification
Functional Impact	Critical	Core feature completely broken
User Count	Unknown	Appears on clean installs
Data Loss	Low	No data corrupted, queue preserved
Recoverability	Medium	Worker restart may help
Workaround Available	Limited	Use alternative provider

7.2 Priority: P0

This should be treated as a P0 (highest priority) issue because:

Core functionality broken - Memory capture is the primary feature
Silent failure - Users may not realize observations aren't being saved
Clean install affected - New users cannot use the product
No easy workaround - Requires code changes or provider switching

7.3 Recommended Action Plan

Immediate (Day 1)
- Reproduce issue in controlled environment
- Test with pinned SDK version 0.1.75
- Test with Node.js instead of Bun
- Add explicit error message to SessionRoutes for this failure mode
Short-term (Week 1)
- Implement retry logic (Option A)
- Add transport failure telemetry
- Document workaround in issue comments
- File SDK issue with Anthropic if confirmed regression
Medium-term (Week 2-4)
- Evaluate V2 SDK migration timeline
- Add graceful fallback to alternative providers
- Improve generator error visibility in viewer UI

8. Appendix

File	Relevance
`src/services/worker/SDKAgent.ts`	SDK query invocation
`src/services/worker/http/routes/SessionRoutes.ts`	Generator lifecycle management
`src/services/worker/SessionManager.ts`	Session state and queue management
`src/services/worker-types.ts`	ActiveSession type definition
`plugin/scripts/worker-service.cjs`	Bundled worker with SDK code

#567 - Version mismatch causing infinite worker restart loop (may be related)
#520 - Stuck messages analysis (similar symptom pattern)
#532 - Memory leak analysis (generator lifecycle issues)

docs/context/agent-sdk-v2-preview.md - V2 SDK documentation
docs/context/agent-sdk-v2-examples.ts - V2 SDK code examples
docs/reports/2026-01-02--generator-failure-investigation.md - Previous generator failure analysis

8.4 Test Commands

# Check worker logs
tail -f ~/.claude-mem/logs/worker-$(date +%Y-%m-%d).log

# Check pending queue
npm run queue

# Restart worker
npm run worker:restart

# Test with specific SDK version
npm install @anthropic-ai/claude-agent-sdk@0.1.75
npm run build-and-sync

Report prepared by: Claude Code Analysis date: 2026-01-07 Next review: After reproduction attempt

13 KiB Raw Blame History

Issue #596: ProcessTransport is not ready for writing - Generator aborted on every observation

1. Executive Summary

2. Problem Analysis

2.1 Symptoms

2.2 Error Signature

2.3 Worker Logs Pattern

2.4 Environment Context

3. Technical Details

3.1 ProcessTransport in the Agent SDK

3.2 The Ready State Problem

3.3 Code Flow Analysis

3.4 Abort Controller Signal Path

4. Impact Assessment

4.1 Functional Impact

4.2 User Impact

4.3 System Recovery

5. Root Cause Analysis

5.1 Primary Hypothesis: SDK Version Incompatibility

5.2 Alternative Hypothesis: Subprocess Spawn Race

5.3 Alternative Hypothesis: Bun-Specific IPC Issue

5.4 Related: Recent Version Mismatch Fix (#567)

6. Recommended Solutions

6.1 Immediate Workarounds

Option A: Retry with Backoff (Quick Fix)

Option B: Verify Claude Executable Before Query

6.2 Medium-Term Fixes

Option C: Pin SDK Version

Option D: Add Transport Ready Callback

6.3 Long-Term Solutions

Option E: V2 SDK Migration

Option F: Alternative Agent Provider

7. Priority/Severity Assessment

7.1 Severity: Critical

7.2 Priority: P0

7.3 Recommended Action Plan

8. Appendix

8.1 Related Files

8.2 Related Issues

8.3 Related Documentation

8.4 Test Commands

13 KiB

Raw Blame History