Fix 30+ root-cause bugs across 10 triage phases (#1214)

* MAESTRO: fix ChromaDB core issues — Python pinning, Windows paths, disable toggle, metadata sanitization, transport errors

- Add --python version pinning to uvx args in both local and remote mode (fixes #1196, #1206, #1208)
- Convert backslash paths to forward slashes for --data-dir on Windows (fixes #1199)
- Add CLAUDE_MEM_CHROMA_ENABLED setting for SQLite-only fallback mode (fixes #707)
- Sanitize metadata in addDocuments() to filter null/undefined/empty values (fixes #1183, #1188)
- Wrap callTool() in try/catch for transport errors with auto-reconnect (fixes #1162)
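
The metadata sanitization and Windows path conversion above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `sanitizeMetadata` and `toForwardSlashes` are hypothetical helper names, and the exact set of values the real addDocuments() filters may differ.

```typescript
// Hypothetical helpers illustrating two of the fixes above; names and exact rules are assumptions.
type MetadataValue = string | number | boolean;

// Drop null, undefined, and empty-string values so only concrete scalars remain.
function sanitizeMetadata(raw: Record<string, unknown>): Record<string, MetadataValue> {
  const clean: Record<string, MetadataValue> = {};
  for (const [key, value] of Object.entries(raw)) {
    if (value === null || value === undefined || value === '') continue;
    if (typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
      clean[key] = value;
    }
  }
  return clean;
}

// Convert Windows backslashes to forward slashes before passing a path as a CLI argument.
function toForwardSlashes(p: string): string {
  return p.replace(/\\/g, '/');
}
```

Falsy but meaningful values such as `0` and `false` are kept on purpose; only values that carry no information are dropped.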

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix data integrity — content-hash deduplication, project name collision, empty project guard, stuck isProcessing

- Add SHA-256 content-hash deduplication to observations INSERT (store.ts, transactions.ts, SessionStore.ts)
- Add content_hash column via migration 22 with backfill and index
- Fix project name collision: getCurrentProjectName() now returns parent/basename
- Guard against empty project string with cwd-derived fallback
- Fix stuck isProcessing: hasAnyPendingWork() resets processing messages older than 5 minutes
- Add 12 new tests covering all four fixes
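
A minimal sketch of content-hash deduplication, assuming the hash covers a session id plus the observation text. The `ObservationInput` shape, the `computeContentHash` name, and the in-memory `ObservationTable` are hypothetical stand-ins; the fields hashed by the real INSERT are not shown in this commit message.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical observation shape; the real hashed columns are an assumption.
interface ObservationInput {
  sessionId: string;
  title: string;
  narrative: string;
}

// SHA-256 over the fields that define "same content"; the NUL separator avoids ambiguous joins.
function computeContentHash(obs: ObservationInput): string {
  return createHash('sha256')
    .update([obs.sessionId, obs.title, obs.narrative].join('\u0000'))
    .digest('hex');
}

// In-memory stand-in for the observations table with a lookup on content_hash.
class ObservationTable {
  private byHash = new Map<string, number>();
  private nextId = 1;

  // Returns the existing row id when identical content was already stored.
  insert(obs: ObservationInput): { id: number; deduplicated: boolean } {
    const hash = computeContentHash(obs);
    const existing = this.byHash.get(hash);
    if (existing !== undefined) return { id: existing, deduplicated: true };
    const id = this.nextId++;
    this.byHash.set(hash, id);
    return { id, deduplicated: false };
  }
}
```

In SQLite the same idea would typically be a lookup on the indexed content_hash column before the INSERT.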

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix hook lifecycle — stderr suppression, output isolation, conversation pollution prevention

- Suppress process.stderr.write in hookCommand() to prevent Claude Code from showing diagnostic
  output as error UI (#1181). Restore stderr in a finally block for the worker-continues case.
- Convert console.error() to logger.warn()/error() in hook-command.ts and handlers/index.ts
  so all diagnostics route to log file instead of stderr.
- Verified all 7 handlers return suppressOutput: true (prevents conversation pollution #598, #784).
- Verified session-complete is a recognized event type (fixes #984).
- Verified unknown event types return no-op handler with exit 0 (graceful degradation).
- Added 10 new tests in tests/hook-lifecycle.test.ts covering event dispatch, adapter defaults,
  stderr suppression, and standard response constants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix worker lifecycle — restart loop coordination, stale transport retry, ENOENT shutdown race

- Add PID file mtime guard to prevent concurrent restart storms (#1145):
  isPidFileRecent() + touchPidFile() coordinate across sessions
- Add transparent retry in ChromaMcpManager.callTool() on transport
  error — reconnects and retries once instead of failing (#1131)
- Wrap getInstalledPluginVersion() with ENOENT/EBUSY handling (#1042)
- Verified ChromaMcpManager.stop() already called on all shutdown paths
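
The PID file mtime guard can be sketched as below. The function names `isPidFileRecent` and `touchPidFile` come from the message above, but their bodies, the 15-second window, and the `tryClaimRestart` wrapper are assumptions made for illustration.

```typescript
import { existsSync, mkdtempSync, statSync, utimesSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Assumption: a restart counts as "in progress" if the PID file was touched within this window.
const RECENT_WINDOW_MS = 15_000;

// True when the PID file exists and its mtime falls inside the window.
function isPidFileRecent(pidFilePath: string, windowMs: number = RECENT_WINDOW_MS): boolean {
  try {
    return Date.now() - statSync(pidFilePath).mtimeMs < windowMs;
  } catch {
    return false; // missing or unreadable file: nobody is restarting
  }
}

// Create the PID file if needed, otherwise bump its mtime to "now".
function touchPidFile(pidFilePath: string): void {
  if (!existsSync(pidFilePath)) {
    writeFileSync(pidFilePath, String(process.pid));
    return;
  }
  const now = new Date();
  utimesSync(pidFilePath, now, now);
}

// Hypothetical wrapper: only one session per window gets to restart the worker.
function tryClaimRestart(pidFilePath: string): boolean {
  if (isPidFileRecent(pidFilePath)) return false;
  touchPidFile(pidFilePath);
  return true;
}

// Usage: the first caller claims the restart, a second caller inside the window backs off.
const demoPidFile = join(mkdtempSync(join(tmpdir(), 'pid-guard-')), 'worker.pid');
const first = tryClaimRestart(demoPidFile);
const second = tryClaimRestart(demoPidFile);
```

The check-then-touch sequence is not atomic, so this narrows the window for a restart storm rather than closing it completely.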

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Windows platform support — uvx.cmd spawn, PowerShell $_ elimination, windowsHide, FTS5 fallback

- Route uvx spawn through cmd.exe /c on Windows since MCP SDK lacks shell:true (#1190, #1192, #1199)
- Replace all PowerShell Where-Object {$_} pipelines with WQL -Filter server-side filtering (#1024, #1062)
- Add windowsHide: true to all exec/spawn calls missing it to prevent console popups (#1048)
- Add FTS5 runtime probe with graceful fallback when unavailable on Windows (#791)
- Guard FTS5 table creation in migrations, SessionSearch, and SessionStore with try/catch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix skills/ distribution — build-time verification and regression tests (#1187)

Add post-build verification in build-hooks.js that fails if critical
distribution files (skills, hooks, plugin manifest) are missing. Add
10 regression tests covering skill file presence, YAML frontmatter,
hooks.json integrity, and package.json files field.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix MigrationRunner schema initialization (#979) — version conflict between parallel migration systems

Root cause: old DatabaseManager migrations 1-7 shared schema_versions table with
MigrationRunner's 4-22, causing version number collisions (5=drop tables vs add column,
6=FTS5 vs prompt tracking, 7=discovery_tokens vs remove UNIQUE).  initializeSchema()
was gated behind maxApplied===0, so core tables were never created when old versions
were present.

Fixes:
- initializeSchema() always creates core tables via CREATE TABLE IF NOT EXISTS
- Migrations 5-7 check actual DB state (columns/constraints) not just version tracking
- Crash-safe temp table rebuilds (DROP IF EXISTS _new before CREATE)
- Added missing migration 21 (ON UPDATE CASCADE) to MigrationRunner
- Added ON UPDATE CASCADE to FK definitions in initializeSchema()
- All changes applied to both runner.ts and SessionStore.ts

Tests: 13 new tests in migration-runner.test.ts covering fresh DB, idempotency,
version conflicts, crash recovery, FK constraints, and data integrity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix 21 test failures — stale mocks, outdated assertions, missing OpenClaw guards

Server tests (12): Added missing workerPath and getAiStatus to ServerOptions
mocks after interface expansion. ChromaSync tests (3): Updated to verify
transport cleanup in ChromaMcpManager after architecture refactor. OpenClaw (2):
Added memory_ tool skipping and response truncation to prevent recursive loops
and oversized payloads. MarkdownFormatter (2): Updated assertions to match
current output. SettingsDefaultsManager (1): Used correct default key for
getBool test. Logger standards (1): Excluded CLI transcript command from
background service check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Codex CLI compatibility (#744) — session_id fallbacks, unknown platform tolerance, undefined guard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Cursor IDE integration (#838, #1049) — adapter field fallbacks, tolerant session-init validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix /api/logs OOM (#1203) — tail-read replaces full-file readFileSync

Replace readFileSync (loads entire file into memory) with readLastLines()
that reads only from the end of the file in expanding chunks (64KB → 10MB cap).
Prevents OOM on large log files while preserving the same API response shape.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Settings CORS error (#1029) — explicit methods and allowedHeaders in CORS config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: add session custom_title for agent attribution (#1213) — migration 23, endpoint + store support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: prevent CLAUDE.md/AGENTS.md writes inside .git/ directories (#1165)

Add .git path guard to all 4 write sites to prevent ref corruption when
paths resolve inside .git internals.
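
A guard of this kind might look like the sketch below. `isInsideGitDir` is a hypothetical name, and matching on an exact `.git` path segment is an assumption about how the check works.

```typescript
import { resolve } from 'node:path';

// Hypothetical guard: true when any segment of the resolved path is exactly ".git".
// Splitting on both separators keeps the check usable on Windows and POSIX.
function isInsideGitDir(targetPath: string): boolean {
  return resolve(targetPath).split(/[\\/]/).includes('.git');
}
```

Comparing whole segments rather than substrings means directories such as `.github` are not rejected by mistake.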

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix plugin disabled state not respected (#781) — early exit check in all hook entry points

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix UserPromptSubmit context re-injection on every turn (#1079) — contextInjected session flag

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix stale AbortController queue stall (#1099) — lastGeneratorActivity tracking + 30s timeout

Three-layer fix:
1. Added lastGeneratorActivity timestamp to ActiveSession, updated by
   processAgentResponse (all agents), getMessageIterator (queue yields),
   and startGeneratorWithProvider (generator launch)
2. Added stale generator detection in ensureGeneratorRunning — if no
   activity for >30s, aborts stale controller, resets state, restarts
3. Added AbortSignal.timeout(30000) in deleteSession to prevent
   indefinite hang when awaiting a stuck generator promise
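
Layers 2 and 3 can be sketched as follows. The threshold constant matches the 30s value above; `isGeneratorStale` and `waitWithTimeout` are hypothetical helpers. The wait helper is a variant that uses setTimeout instead of AbortSignal.timeout and reports which side of the race won, so a caller can log the timeout case explicitly.

```typescript
const STALE_GENERATOR_THRESHOLD_MS = 30_000;

// Layer 2 (assumed shape): a generator is stale when it has shown no activity for longer than the threshold.
function isGeneratorStale(
  lastActivityMs: number,
  nowMs: number,
  thresholdMs: number = STALE_GENERATOR_THRESHOLD_MS
): boolean {
  return nowMs - lastActivityMs > thresholdMs;
}

// Layer 3 (variant): wait for a promise, but give up after timeoutMs and say which side won.
async function waitWithTimeout(
  work: Promise<unknown>,
  timeoutMs: number
): Promise<'settled' | 'timeout'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<'timeout'>(resolve => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });
  // Rejections count as "settled": the generator is gone either way.
  const settled = work.then(() => 'settled' as const, () => 'settled' as const);
  try {
    return await Promise.race([settled, timeout]);
  } finally {
    clearTimeout(timer); // do not keep the event loop alive after the race is decided
  }
}
```

A caller would check the result, for example logging a warning only when it is `'timeout'`, before forcing cleanup.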

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Alex Newman
2026-02-23 19:34:35 -05:00
committed by GitHub
parent d9a30cc7d4
commit c6f932988a
62 changed files with 3639 additions and 793 deletions
+12 -7
@@ -11,6 +11,8 @@
 import { SessionStore } from '../sqlite/SessionStore.js';
 import { SessionSearch } from '../sqlite/SessionSearch.js';
 import { ChromaSync } from '../sync/ChromaSync.js';
+import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
+import { USER_SETTINGS_PATH } from '../../shared/paths.js';
 import { logger } from '../../utils/logger.js';
 import type { DBSession } from '../worker-types.js';
@@ -27,8 +29,14 @@ export class DatabaseManager {
     this.sessionStore = new SessionStore();
     this.sessionSearch = new SessionSearch();
-    // Initialize ChromaSync (lazy - connects on first search, not at startup)
-    this.chromaSync = new ChromaSync('claude-mem');
+    // Initialize ChromaSync only if Chroma is enabled (SQLite-only fallback when disabled)
+    const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
+    const chromaEnabled = settings.CLAUDE_MEM_CHROMA_ENABLED !== 'false';
+    if (chromaEnabled) {
+      this.chromaSync = new ChromaSync('claude-mem');
+    } else {
+      logger.info('DB', 'Chroma disabled via CLAUDE_MEM_CHROMA_ENABLED=false, using SQLite-only search');
+    }
     logger.info('DB', 'Database initialized');
   }
@@ -75,12 +83,9 @@ export class DatabaseManager {
   }
   /**
-   * Get ChromaSync instance (throws if not initialized)
+   * Get ChromaSync instance (returns null if Chroma is disabled)
    */
-  getChromaSync(): ChromaSync {
-    if (!this.chromaSync) {
-      throw new Error('ChromaSync not initialized');
-    }
+  getChromaSync(): ChromaSync | null {
     return this.chromaSync;
   }
+1 -1
@@ -39,7 +39,7 @@ export class SearchManager {
   constructor(
     private sessionSearch: SessionSearch,
     private sessionStore: SessionStore,
-    private chromaSync: ChromaSync,
+    private chromaSync: ChromaSync | null,
     private formatter: FormattingService,
     private timelineService: TimelineService
   ) {
+13 -3
@@ -155,7 +155,8 @@ export class SessionManager {
       conversationHistory: [], // Initialize empty - will be populated by agents
       currentProvider: null, // Will be set when generator starts
       consecutiveRestarts: 0, // Track consecutive restart attempts to prevent infinite loops
-      processingMessageIds: [] // CLAIM-CONFIRM: Track message IDs for confirmProcessed()
+      processingMessageIds: [], // CLAIM-CONFIRM: Track message IDs for confirmProcessed()
+      lastGeneratorActivity: Date.now() // Initialize for stale detection (Issue #1099)
     };
     logger.debug('SESSION', 'Creating new session object (memorySessionId cleared to prevent stale resume)', {
@@ -286,11 +287,17 @@ export class SessionManager {
     // 1. Abort the SDK agent
     session.abortController.abort();
-    // 2. Wait for generator to finish
+    // 2. Wait for generator to finish (with 30s timeout to prevent stale stall, Issue #1099)
     if (session.generatorPromise) {
-      await session.generatorPromise.catch(() => {
+      const generatorDone = session.generatorPromise.catch(() => {
         logger.debug('SYSTEM', 'Generator already failed, cleaning up', { sessionId: session.sessionDbId });
       });
+      const timeoutDone = new Promise<void>(resolve => {
+        AbortSignal.timeout(30_000).addEventListener('abort', () => resolve(), { once: true });
+      });
+      await Promise.race([generatorDone, timeoutDone]).then(() => {}, () => {
+        logger.warn('SESSION', 'Generator did not exit within 30s after abort, forcing cleanup (#1099)', { sessionDbId });
+      });
     }
     // 3. Verify subprocess exit with 5s timeout (Issue #737 fix)
@@ -468,6 +475,9 @@ export class SessionManager {
         session.earliestPendingTimestamp = Math.min(session.earliestPendingTimestamp, message._originalTimestamp);
       }
+      // Update generator activity for stale detection (Issue #1099)
+      session.lastGeneratorActivity = Date.now();
       yield message;
     }
   }
@@ -56,6 +56,9 @@ export async function processAgentResponse(
   agentName: string,
   projectRoot?: string
 ): Promise<void> {
+  // Track generator activity for stale detection (Issue #1099)
+  session.lastGeneratorActivity = Date.now();
   // Add assistant response to shared conversation history for provider interop
   if (text) {
     session.conversationHistory.push({ role: 'assistant', content: text });
@@ -189,8 +192,8 @@ async function syncAndBroadcastObservations(
     const obs = observations[i];
     const chromaStart = Date.now();
-    // Sync to Chroma (fire-and-forget)
-    dbManager.getChromaSync().syncObservation(
+    // Sync to Chroma (fire-and-forget, skipped if Chroma is disabled)
+    dbManager.getChromaSync()?.syncObservation(
       obsId,
       session.contentSessionId,
       session.project,
@@ -282,8 +285,8 @@ async function syncAndBroadcastSummary(
   const chromaStart = Date.now();
-  // Sync to Chroma (fire-and-forget)
-  dbManager.getChromaSync().syncSummary(
+  // Sync to Chroma (fire-and-forget, skipped if Chroma is disabled)
+  dbManager.getChromaSync()?.syncSummary(
     result.summaryId,
     session.contentSessionId,
     session.project,
+2
@@ -37,6 +37,8 @@ export function createMiddleware(
         callback(new Error('CORS not allowed'));
       }
     },
+    methods: ['GET', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE'],
+    allowedHeaders: ['Content-Type', 'Authorization', 'X-Requested-With'],
     credentials: false
   }));
+78 -9
@@ -5,12 +5,85 @@
  */
 import express, { Request, Response } from 'express';
-import { readFileSync, existsSync, writeFileSync, readdirSync } from 'fs';
+import { openSync, fstatSync, readSync, closeSync, existsSync, writeFileSync } from 'fs';
 import { join } from 'path';
 import { logger } from '../../../../utils/logger.js';
 import { SettingsDefaultsManager } from '../../../../shared/SettingsDefaultsManager.js';
 import { BaseRouteHandler } from '../BaseRouteHandler.js';
+/**
+ * Read the last N lines from a file without loading the entire file into memory.
+ * Reads backwards from the end of the file in chunks until enough lines are found.
+ */
+export function readLastLines(filePath: string, lineCount: number): { lines: string; totalEstimate: number } {
+  const fd = openSync(filePath, 'r');
+  try {
+    const stat = fstatSync(fd);
+    const fileSize = stat.size;
+    if (fileSize === 0) {
+      return { lines: '', totalEstimate: 0 };
+    }
+    // Start with a reasonable chunk size, expand if needed
+    const INITIAL_CHUNK_SIZE = 64 * 1024; // 64KB
+    const MAX_READ_SIZE = 10 * 1024 * 1024; // 10MB cap to prevent OOM on huge single-line files
+    let readSize = Math.min(INITIAL_CHUNK_SIZE, fileSize);
+    let content = '';
+    let newlineCount = 0;
+    while (readSize <= fileSize && readSize <= MAX_READ_SIZE) {
+      const startPosition = Math.max(0, fileSize - readSize);
+      const bytesToRead = fileSize - startPosition;
+      const buffer = Buffer.alloc(bytesToRead);
+      readSync(fd, buffer, 0, bytesToRead, startPosition);
+      content = buffer.toString('utf-8');
+      // Count newlines to see if we have enough
+      newlineCount = 0;
+      for (let i = 0; i < content.length; i++) {
+        if (content[i] === '\n') newlineCount++;
+      }
+      // We need lineCount newlines to get lineCount full lines (trailing newline)
+      if (newlineCount >= lineCount || startPosition === 0) {
+        break;
+      }
+      // Double the read size for next attempt
+      readSize = Math.min(readSize * 2, fileSize, MAX_READ_SIZE);
+    }
+    // Split and take the last N lines
+    const allLines = content.split('\n');
+    // Remove trailing empty element from final newline
+    if (allLines.length > 0 && allLines[allLines.length - 1] === '') {
+      allLines.pop();
+    }
+    const startIndex = Math.max(0, allLines.length - lineCount);
+    const resultLines = allLines.slice(startIndex);
+    // Estimate total lines: if we read the whole file, we know exactly; otherwise estimate
+    let totalEstimate: number;
+    if (fileSize <= readSize) {
+      totalEstimate = allLines.length;
+    } else {
+      // Rough estimate based on average line length in the chunk we read
+      const avgLineLength = content.length / Math.max(newlineCount, 1);
+      totalEstimate = Math.round(fileSize / avgLineLength);
+    }
+    return {
+      lines: resultLines.join('\n'),
+      totalEstimate,
+    };
+  } finally {
+    closeSync(fd);
+  }
+}
 export class LogsRoutes extends BaseRouteHandler {
   private getLogFilePath(): string {
     const dataDir = SettingsDefaultsManager.get('CLAUDE_MEM_DATA_DIR');
@@ -50,19 +123,15 @@ export class LogsRoutes extends BaseRouteHandler {
     const requestedLines = parseInt(req.query.lines as string || '1000', 10);
     const maxLines = Math.min(requestedLines, 10000); // Cap at 10k lines
-    const content = readFileSync(logFilePath, 'utf-8');
-    const lines = content.split('\n');
-    // Return the last N lines
-    const startIndex = Math.max(0, lines.length - maxLines);
-    const recentLines = lines.slice(startIndex).join('\n');
+    const { lines: recentLines, totalEstimate } = readLastLines(logFilePath, maxLines);
+    const returnedLines = recentLines === '' ? 0 : recentLines.split('\n').length;
     res.json({
       logs: recentLines,
       path: logFilePath,
       exists: true,
-      totalLines: lines.length,
-      returnedLines: lines.length - startIndex
+      totalLines: totalEstimate,
+      returnedLines,
     });
   });
@@ -90,6 +90,8 @@ export class SessionRoutes extends BaseRouteHandler {
    * we let the current generator finish naturally (max 5s linger timeout).
    * The next generator will use the new provider with shared conversationHistory.
    */
+  private static readonly STALE_GENERATOR_THRESHOLD_MS = 30_000; // 30 seconds (#1099)
   private ensureGeneratorRunning(sessionDbId: number, source: string): void {
     const session = this.sessionManager.getSession(sessionDbId);
     if (!session) return;
@@ -109,6 +111,26 @@ export class SessionRoutes extends BaseRouteHandler {
       return;
     }
+    // Generator is running - check if stale (no activity for 30s) to prevent queue stall (#1099)
+    const timeSinceActivity = Date.now() - session.lastGeneratorActivity;
+    if (timeSinceActivity > SessionRoutes.STALE_GENERATOR_THRESHOLD_MS) {
+      logger.warn('SESSION', 'Stale generator detected, aborting to prevent queue stall (#1099)', {
+        sessionId: sessionDbId,
+        timeSinceActivityMs: timeSinceActivity,
+        thresholdMs: SessionRoutes.STALE_GENERATOR_THRESHOLD_MS,
+        source
+      });
+      // Abort the stale generator and reset state
+      session.abortController.abort();
+      session.generatorPromise = null;
+      session.abortController = new AbortController();
+      session.lastGeneratorActivity = Date.now();
+      // Start a fresh generator
+      this.spawnInProgress.set(sessionDbId, true);
+      this.startGeneratorWithProvider(session, selectedProvider, 'stale-recovery');
+      return;
+    }
     // Generator is running - check if provider changed
     if (session.currentProvider && session.currentProvider !== selectedProvider) {
       logger.info('SESSION', `Provider changed, will switch after current generator finishes`, {
@@ -155,8 +177,9 @@ export class SessionRoutes extends BaseRouteHandler {
       historyLength: session.conversationHistory.length
     });
-    // Track which provider is running
+    // Track which provider is running and mark activity for stale detection (#1099)
     session.currentProvider = provider;
+    session.lastGeneratorActivity = Date.now();
     session.generatorPromise = agent.startSession(session, this.workerService)
       .catch(error => {
@@ -669,23 +692,30 @@ export class SessionRoutes extends BaseRouteHandler {
    * Returns: { sessionDbId, promptNumber, skipped: boolean, reason?: string }
    */
   private handleSessionInitByClaudeId = this.wrapHandler((req: Request, res: Response): void => {
-    const { contentSessionId, project, prompt } = req.body;
+    const { contentSessionId } = req.body;
+    // Only contentSessionId is truly required — Cursor and other platforms
+    // may omit prompt/project in their payload (#838, #1049)
+    const project = req.body.project || 'unknown';
+    const prompt = req.body.prompt || '[media prompt]';
+    const customTitle = req.body.customTitle || undefined;
     logger.info('HTTP', 'SessionRoutes: handleSessionInitByClaudeId called', {
       contentSessionId,
       project,
-      prompt_length: prompt?.length
+      prompt_length: prompt?.length,
+      customTitle
     });
     // Validate required parameters
-    if (!this.validateRequired(req, res, ['contentSessionId', 'project', 'prompt'])) {
+    if (!this.validateRequired(req, res, ['contentSessionId'])) {
       return;
     }
     const store = this.dbManager.getSessionStore();
     // Step 1: Create/get SDK session (idempotent INSERT OR IGNORE)
-    const sessionDbId = store.createSDKSession(contentSessionId, project, prompt);
+    const sessionDbId = store.createSDKSession(contentSessionId, project, prompt, customTitle);
     // Verify session creation with DB lookup
     const dbSession = store.getSessionById(sessionDbId);
@@ -729,16 +759,22 @@ export class SessionRoutes extends BaseRouteHandler {
     // Step 5: Save cleaned user prompt
     store.saveUserPrompt(contentSessionId, promptNumber, cleanedPrompt);
+    // Step 6: Check if SDK agent is already running for this session (#1079)
+    // If contextInjected is true, the hook should skip re-initializing the SDK agent
+    const contextInjected = this.sessionManager.getSession(sessionDbId) !== undefined;
     // Debug-level log since CREATED already logged the key info
     logger.debug('SESSION', 'User prompt saved', {
       sessionId: sessionDbId,
-      promptNumber
+      promptNumber,
+      contextInjected
     });
     res.json({
       sessionDbId,
       promptNumber,
-      skipped: false
+      skipped: false,
+      contextInjected
     });
   });
 }