9eddc51979
- Added detailed documentation on performance issues within `worker-service.ts`. - Identified high severity issues including unnecessary polling, artificial delays, and redundant database operations. - Provided recommendations for immediate and long-term improvements to enhance performance and reduce complexity. - Suggested architectural changes to replace polling with event-driven patterns and optimize database connection handling.
960 lines
33 KiB
Markdown
960 lines
33 KiB
Markdown
# Worker Service Overhead Analysis
|
|
|
|
**Date**: 2025-11-06
|
|
**File**: `src/services/worker-service.ts`
|
|
**Total Lines**: 1173
|
|
**Overall Assessment**: This file has accumulated unnecessary complexity, artificial delays, and defensive programming patterns that actively harm performance. Many patterns were likely added "just in case" without real-world justification.
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
**High Severity Issues (Score 8-10)**:
|
|
- **Line 942**: Polling loop with 100ms delay instead of event-driven architecture (Score: 10/10)
|
|
- **Lines 338-365**: Spinner debounce with 1.5s artificial delay (Score: 9/10)
|
|
- **Lines 204-234**: Database reopening on every getOrCreateSession call (Score: 8/10)
|
|
|
|
**Medium Severity Issues (Score 5-7)**:
|
|
- **Lines 33-70**: Unnecessary Claude path caching for rare operation (Score: 6/10)
|
|
- **Lines 694-711**: Redundant database reopening in handleInit (Score: 7/10)
|
|
- **Lines 728-741**: Fire-and-forget Chroma sync with verbose error handling (Score: 5/10)
|
|
|
|
**Low Severity Issues (Score 3-4)**:
|
|
- **Line 28**: Magic number MESSAGE_POLL_INTERVAL_MS without justification (Score: 4/10)
|
|
- **Lines 303-321**: Over-engineered SSE client cleanup (Score: 4/10)
|
|
|
|
---
|
|
|
|
## Line-by-Line Analysis
|
|
|
|
### Lines 1-30: Setup and Constants
|
|
|
|
**Lines 22-24**: Version reading from package.json
|
|
```typescript
|
|
const packageJson = JSON.parse(readFileSync(join(__dirname, '..', '..', 'package.json'), 'utf-8'));
|
|
const VERSION = packageJson.version;
|
|
```
|
|
**Score**: 2/10
|
|
**Why**: This is fine. Reads once at startup, uses the value for the /api/stats endpoint.
|
|
|
|
**Line 26**: Model configuration
|
|
```typescript
|
|
const MODEL = process.env.CLAUDE_MEM_MODEL || 'claude-sonnet-4-5';
|
|
```
|
|
**Score**: 1/10
|
|
**Why**: Clean, simple, correct.
|
|
|
|
**Line 28**: Magic number
|
|
```typescript
|
|
const MESSAGE_POLL_INTERVAL_MS = 100;
|
|
```
|
|
**Score**: 4/10
|
|
**Why**: This is a magic number without justification. Why 100ms? Why not 50ms or 200ms? More importantly, **why are we polling at all instead of using event-driven patterns?** The name is descriptive, but the existence of this constant indicates a fundamental architectural problem (see line 942).
|
|
|
|
**Pattern**: This constant exists to support a polling loop that shouldn't exist.
|
|
|
|
---
|
|
|
|
### Lines 33-70: Claude Path Caching
|
|
|
|
```typescript
|
|
let cachedClaudePath: string | null = null;
|
|
|
|
function findClaudePath(): string {
|
|
if (cachedClaudePath) {
|
|
return cachedClaudePath;
|
|
}
|
|
// ... 30 lines of logic to find and cache path ...
|
|
}
|
|
```
|
|
|
|
**Score**: 6/10
|
|
**Why Stupid**:
|
|
1. **YAGNI Violation**: This function is called **exactly once** per worker startup (line 846 in runSDKAgent)
|
|
2. **Premature Optimization**: Caching saves ~5ms on an operation that happens once per worker lifetime
|
|
3. **Added Complexity**: 37 lines of code including module-level state for negligible benefit
|
|
4. **False Economy**: The worker runs for hours/days. Saving 5ms on startup is meaningless.
|
|
|
|
**What Should Happen**:
|
|
```typescript
|
|
function findClaudePath(): string {
|
|
if (process.env.CLAUDE_CODE_PATH) return process.env.CLAUDE_CODE_PATH;
|
|
|
|
const command = process.platform === 'win32' ? 'where claude' : 'which claude';
|
|
const result = execSync(command, { encoding: 'utf8' }).trim().split('\n')[0].trim();
|
|
|
|
if (!result) throw new Error('Claude executable not found in PATH');
|
|
return result;
|
|
}
|
|
```
|
|
**Savings**: Remove 33 lines of unnecessary code and module-level state.
|
|
|
|
---
|
|
|
|
### Lines 103-110: WorkerService State
|
|
|
|
```typescript
|
|
class WorkerService {
|
|
private app: express.Application;
|
|
private sessions: Map<number, ActiveSession> = new Map();
|
|
private chromaSync!: ChromaSync;
|
|
private sseClients: Set<Response> = new Set();
|
|
private isProcessing: boolean = false;
|
|
private spinnerStopTimer: NodeJS.Timeout | null = null;
|
|
```
|
|
|
|
**Score**: 7/10 (for spinnerStopTimer)
|
|
**Why**:
|
|
- `app`, `sessions`, `chromaSync`, `sseClients`: **Good** - necessary state
|
|
- `isProcessing`: **Questionable** (Score 5/10) - Do we really need to track this globally? Can't we derive it from `sessions.size > 0` or `sessions.values().some(s => s.pendingMessages.length > 0)`?
|
|
- `spinnerStopTimer`: **Bad** (Score 7/10) - Exists solely to support artificial debouncing (see lines 338-365)
|
|
|
|
**Pattern**: State that exists to support other unnecessary complexity.
|
|
|
|
---
|
|
|
|
### Lines 145-178: Service Startup
|
|
|
|
**Lines 145-153**: HTTP server startup
|
|
```typescript
|
|
async start(): Promise<void> {
|
|
const port = getWorkerPort();
|
|
await new Promise<void>((resolve, reject) => {
|
|
this.app.listen(port, () => resolve())
|
|
.on('error', reject);
|
|
});
|
|
logger.info('SYSTEM', 'Worker started', { port, pid: process.pid });
|
|
```
|
|
**Score**: 1/10
|
|
**Why**: This is good. Clean promise wrapper, fail-fast on errors, clear logging.
|
|
|
|
**Lines 155-167**: ChromaSync initialization and orphan cleanup
|
|
```typescript
|
|
this.chromaSync = new ChromaSync('claude-mem');
|
|
logger.info('SYSTEM', 'ChromaSync initialized');
|
|
|
|
const db = new SessionStore();
|
|
const cleanedCount = db.cleanupOrphanedSessions();
|
|
db.close();
|
|
```
|
|
**Score**: 2/10
|
|
**Why**: This is fine. Necessary initialization and cleanup. Database is opened, used, and closed immediately.
|
|
|
|
**Lines 168-177**: Chroma backfill
|
|
```typescript
|
|
logger.info('SYSTEM', 'Starting Chroma backfill in background...');
|
|
this.chromaSync.ensureBackfilled()
|
|
.then(() => {
|
|
logger.info('SYSTEM', 'Chroma backfill complete');
|
|
})
|
|
.catch((error: Error) => {
|
|
logger.error('SYSTEM', 'Chroma backfill failed - continuing anyway', {}, error);
|
|
// Don't exit - allow worker to continue serving requests
|
|
});
|
|
```
|
|
**Score**: 3/10
|
|
**Why**: This is mostly fine. Fire-and-forget background operation that doesn't block startup. The verbose error handling is slightly excessive (could be a single logger call), but acceptable for a background operation.
|
|
|
|
---
|
|
|
|
### Lines 200-236: getOrCreateSession - THE KILLER
|
|
|
|
```typescript
|
|
private getOrCreateSession(sessionDbId: number): ActiveSession {
|
|
let session = this.sessions.get(sessionDbId);
|
|
if (session) return session;
|
|
|
|
const db = new SessionStore();
|
|
const dbSession = db.getSessionById(sessionDbId);
|
|
if (!dbSession) {
|
|
db.close();
|
|
throw new Error(`Session ${sessionDbId} not found in database`);
|
|
}
|
|
|
|
session = {
|
|
sessionDbId,
|
|
claudeSessionId: dbSession.claude_session_id,
|
|
sdkSessionId: null,
|
|
project: dbSession.project,
|
|
userPrompt: dbSession.user_prompt,
|
|
pendingMessages: [],
|
|
abortController: new AbortController(),
|
|
generatorPromise: null,
|
|
lastPromptNumber: 0,
|
|
startTime: Date.now()
|
|
};
|
|
|
|
this.sessions.set(sessionDbId, session);
|
|
|
|
session.generatorPromise = this.runSDKAgent(session).catch(err => {
|
|
logger.failure('WORKER', 'SDK agent error', { sessionId: sessionDbId }, err);
|
|
const db = new SessionStore();
|
|
db.markSessionFailed(sessionDbId);
|
|
db.close();
|
|
this.sessions.delete(sessionDbId);
|
|
});
|
|
|
|
db.close();
|
|
return session;
|
|
}
|
|
```
|
|
|
|
**Score**: 8/10
|
|
**Why This Is Stupid**:
|
|
|
|
1. **Database Reopening**: Opens database at line 204, closes at line 234. This happens on:
|
|
- First call to `/sessions/:id/init` (line 691)
|
|
- First call to `/sessions/:id/observations` (line 762)
|
|
- First call to `/sessions/:id/summarize` (line 789)
|
|
|
|
For a typical session: init (DB open/close) → observation (DB open/close) → observation (DB open/close) → summarize (DB open/close). **That's 4 database open/close cycles when ONE would suffice.**
|
|
|
|
2. **Redundant Database Access**: The database is ALREADY opened in `handleInit` at line 695 to call `setWorkerPort()`. So we have:
|
|
- Line 695: `const db = new SessionStore()` in handleInit
|
|
- Line 696: `db.setWorkerPort()`
|
|
- Line 697-711: More queries on the same database
|
|
- Line 711: `db.close()`
|
|
- Line 691: `this.getOrCreateSession()` is called
|
|
- Line 204: **Opens database AGAIN** inside getOrCreateSession
|
|
- Line 234: Closes it
|
|
|
|
**This is fucking insane.** We close the database, then immediately reopen it in the same call stack.
|
|
|
|
3. **Error Handler Opens Database**: Line 228 opens a NEW database connection in the error handler. If runSDKAgent fails, we open the database AGAIN just to mark it failed, then close it. This is defensive programming for ghosts - if the worker is crashing, do we really care about marking it failed?
|
|
|
|
**What Should Happen**:
|
|
- Pass the already-open database connection to getOrCreateSession
|
|
- Or at minimum, reuse the connection from the calling context
|
|
- The error handler should either crash hard or mark failed WITHOUT reopening the database
|
|
|
|
**Estimated Performance Impact**: Database open/close is expensive (~1-5ms each). For a session with 10 observations, this pattern adds **20-100ms of pure overhead**.
|
|
|
|
---
|
|
|
|
### Lines 263-292: SSE Stream Setup
|
|
|
|
```typescript
|
|
private handleSSEStream(req: Request, res: Response): void {
|
|
// Set SSE headers
|
|
res.setHeader('Content-Type', 'text/event-stream');
|
|
res.setHeader('Cache-Control', 'no-cache');
|
|
res.setHeader('Connection', 'keep-alive');
|
|
res.setHeader('Access-Control-Allow-Origin', '*');
|
|
|
|
// Add client to set
|
|
this.sseClients.add(res);
|
|
logger.info('WORKER', `SSE client connected`, { totalClients: this.sseClients.size });
|
|
|
|
// Send only projects list - all data will be loaded via pagination
|
|
const db = new SessionStore();
|
|
const allProjects = db.getAllProjects();
|
|
db.close();
|
|
|
|
const initialData = {
|
|
type: 'initial_load',
|
|
projects: allProjects,
|
|
timestamp: Date.now()
|
|
};
|
|
|
|
res.write(`data: ${JSON.stringify(initialData)}\n\n`);
|
|
|
|
// Handle client disconnect
|
|
req.on('close', () => {
|
|
this.sseClients.delete(res);
|
|
logger.info('WORKER', `SSE client disconnected`, { remainingClients: this.sseClients.size });
|
|
});
|
|
}
|
|
```
|
|
|
|
**Score**: 2/10
|
|
**Why**: This is mostly good. Clean SSE setup with proper headers and client tracking. Database is opened, used, and closed.
|
|
|
|
---
|
|
|
|
### Lines 297-322: SSE Broadcast and Cleanup
|
|
|
|
```typescript
|
|
private broadcastSSE(event: any): void {
|
|
if (this.sseClients.size === 0) {
|
|
return; // No clients connected, skip broadcast
|
|
}
|
|
|
|
const data = `data: ${JSON.stringify(event)}\n\n`;
|
|
const clientsToRemove: Response[] = [];
|
|
|
|
for (const client of this.sseClients) {
|
|
try {
|
|
client.write(data);
|
|
} catch (error) {
|
|
// Client disconnected, mark for removal
|
|
clientsToRemove.push(client);
|
|
}
|
|
}
|
|
|
|
// Clean up disconnected clients
|
|
for (const client of clientsToRemove) {
|
|
this.sseClients.delete(client);
|
|
}
|
|
|
|
if (clientsToRemove.length > 0) {
|
|
logger.info('WORKER', `SSE cleaned up disconnected clients`, { count: clientsToRemove.length });
|
|
}
|
|
}
|
|
```
|
|
|
|
**Score**: 4/10
|
|
**Why This Is Slightly Stupid**:
|
|
|
|
1. **Two-Pass Cleanup**: Creates a temporary array of failed clients, then iterates again to remove them. Why not just remove them in the first loop?
|
|
2. **Unnecessary Logging**: Do we really need to log every time a client disconnects? The `handleSSEStream` already logs disconnects (line 290). This is duplicate logging.
|
|
|
|
**What Should Happen**:
|
|
```typescript
|
|
private broadcastSSE(event: any): void {
|
|
if (this.sseClients.size === 0) return;
|
|
|
|
const data = `data: ${JSON.stringify(event)}\n\n`;
|
|
for (const client of this.sseClients) {
|
|
try {
|
|
client.write(data);
|
|
} catch {
|
|
this.sseClients.delete(client);
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
**Savings**: Remove 10 lines, remove duplicate logging, eliminate temporary array.
|
|
|
|
---
|
|
|
|
### Lines 338-365: Spinner Debounce - ARTIFICIAL DELAY
|
|
|
|
```typescript
|
|
private checkAndStopSpinner(): void {
|
|
// Clear any existing timer
|
|
if (this.spinnerStopTimer) {
|
|
clearTimeout(this.spinnerStopTimer);
|
|
this.spinnerStopTimer = null;
|
|
}
|
|
|
|
// Check if any session has pending messages
|
|
const hasPendingMessages = Array.from(this.sessions.values()).some(
|
|
session => session.pendingMessages.length > 0
|
|
);
|
|
|
|
if (!hasPendingMessages) {
|
|
// Debounce: wait 1.5s and check again
|
|
this.spinnerStopTimer = setTimeout(() => {
|
|
const stillEmpty = Array.from(this.sessions.values()).every(
|
|
session => session.pendingMessages.length === 0
|
|
);
|
|
|
|
if (stillEmpty) {
|
|
logger.debug('WORKER', 'All queues empty - stopping spinner');
|
|
this.broadcastProcessingStatus(false);
|
|
}
|
|
|
|
this.spinnerStopTimer = null;
|
|
}, 1500);
|
|
}
|
|
}
|
|
```
|
|
|
|
**Score**: 9/10
|
|
**Why This Is ABSOLUTELY FUCKING STUPID**:
|
|
|
|
1. **Artificial Delay**: **1.5 SECONDS** (1500ms) of artificial delay before stopping the spinner. This is pure overhead added for no reason.
|
|
|
|
2. **Why Was This Added?**: Probably someone thought "the UI flickers when the spinner stops/starts rapidly." **SO FUCKING WHAT?** That's a UI rendering problem, not a worker service problem. Fix it in the UI with CSS transitions or debouncing on the CLIENT side.
|
|
|
|
3. **Double-Check Pattern**: Checks if queues are empty, waits 1.5s, then checks AGAIN. This is defensive programming for ghosts. If the queue is empty, it's empty. We're not protecting against race conditions here - we're just wasting time.
|
|
|
|
4. **Polling Instead of Events**: This function is called from `handleAgentMessage` (line 1145) after processing every single response. Instead of reacting to the actual completion of work, we're polling state and debouncing.
|
|
|
|
5. **State Management Overhead**: Requires `spinnerStopTimer` field (line 109), timer cleanup logic, null checks, etc.
|
|
|
|
**Real-World Impact**: Every time the worker finishes processing observations, the UI spinner continues to show "processing" for **1.5 seconds** even though nothing is happening. This makes the entire system feel slower.
|
|
|
|
**What Should Happen**:
|
|
```typescript
|
|
private checkAndStopSpinner(): void {
|
|
const hasPendingMessages = Array.from(this.sessions.values()).some(
|
|
session => session.pendingMessages.length > 0
|
|
);
|
|
|
|
if (!hasPendingMessages) {
|
|
this.broadcastProcessingStatus(false);
|
|
}
|
|
}
|
|
```
|
|
|
|
**Savings**: Remove 15 lines of debouncing logic, remove timer state, eliminate 1.5s artificial delay.
|
|
|
|
**Alternative**: If UI flickering is actually a problem (prove it first), handle it client-side with CSS transitions or client-side debouncing.
|
|
|
|
---
|
|
|
|
### Lines 370-411: Stats Endpoint
|
|
|
|
```typescript
|
|
private handleStats(_req: Request, res: Response): void {
|
|
try {
|
|
const db = new SessionStore();
|
|
|
|
// Get database stats
|
|
const obsCount = db.db.prepare('SELECT COUNT(*) as count FROM observations').get() as { count: number };
|
|
const sessionCount = db.db.prepare('SELECT COUNT(*) as count FROM sdk_sessions').get() as { count: number };
|
|
const summaryCount = db.db.prepare('SELECT COUNT(*) as count FROM session_summaries').get() as { count: number };
|
|
|
|
// Get database file size
|
|
const dbPath = join(homedir(), '.claude-mem', 'claude-mem.db');
|
|
let dbSize = 0;
|
|
if (existsSync(dbPath)) {
|
|
dbSize = statSync(dbPath).size;
|
|
}
|
|
|
|
db.close();
|
|
|
|
// Get worker stats
|
|
const uptime = process.uptime();
|
|
|
|
res.json({
|
|
worker: {
|
|
version: VERSION,
|
|
uptime: Math.floor(uptime),
|
|
activeSessions: this.sessions.size,
|
|
sseClients: this.sseClients.size,
|
|
port: getWorkerPort()
|
|
},
|
|
database: {
|
|
path: dbPath,
|
|
size: dbSize,
|
|
observations: obsCount.count,
|
|
sessions: sessionCount.count,
|
|
summaries: summaryCount.count
|
|
}
|
|
});
|
|
} catch (error: any) {
|
|
logger.error('WORKER', 'Failed to get stats', {}, error);
|
|
res.status(500).json({ error: 'Failed to get stats' });
|
|
}
|
|
}
|
|
```
|
|
|
|
**Score**: 3/10
|
|
**Why Slightly Stupid**:
|
|
|
|
1. **Redundant existsSync Check**: The database path is guaranteed to exist if SessionStore initialized successfully. If it doesn't exist, SessionStore would have crashed on startup. This is defensive programming for ghosts.
|
|
|
|
2. **Three Separate Queries**: Could be combined into a single query with UNION or multiple SELECT columns, but this is minor.
|
|
|
|
**What Should Happen**:
|
|
```typescript
|
|
const dbSize = statSync(dbPath).size; // Just crash if it doesn't exist
|
|
```
|
|
|
|
Otherwise, this is mostly fine. Stats endpoints are low-frequency and non-critical.
|
|
|
|
---
|
|
|
|
### Lines 507-555: GET /api/observations
|
|
|
|
```typescript
|
|
private handleGetObservations(req: Request, res: Response): void {
|
|
try {
|
|
const offset = parseInt(req.query.offset as string || '0', 10);
|
|
const limit = Math.min(parseInt(req.query.limit as string || '50', 10), 100); // Cap at 100
|
|
const project = req.query.project as string | undefined;
|
|
|
|
const db = new SessionStore();
|
|
|
|
// Build query with optional project filter
|
|
let query = `
|
|
SELECT id, type, title, subtitle, text, project, prompt_number, created_at, created_at_epoch
|
|
FROM observations
|
|
`;
|
|
let countQuery = 'SELECT COUNT(*) as total FROM observations';
|
|
const params: any[] = [];
|
|
const countParams: any[] = [];
|
|
|
|
if (project) {
|
|
query += ' WHERE project = ?';
|
|
countQuery += ' WHERE project = ?';
|
|
params.push(project);
|
|
countParams.push(project);
|
|
}
|
|
|
|
query += ' ORDER BY created_at_epoch DESC LIMIT ? OFFSET ?';
|
|
params.push(limit, offset);
|
|
|
|
const stmt = db.db.prepare(query);
|
|
const observations = stmt.all(...params);
|
|
|
|
// Check if there are more results
|
|
const countStmt = db.db.prepare(countQuery);
|
|
const { total } = countStmt.get(...countParams) as { total: number };
|
|
const hasMore = (offset + limit) < total;
|
|
|
|
db.close();
|
|
|
|
res.json({
|
|
observations,
|
|
hasMore,
|
|
total,
|
|
offset,
|
|
limit
|
|
});
|
|
} catch (error: any) {
|
|
logger.error('WORKER', 'Failed to get observations', {}, error);
|
|
res.status(500).json({ error: 'Failed to get observations' });
|
|
}
|
|
}
|
|
```
|
|
|
|
**Score**: 5/10
|
|
**Why This Is Mildly Stupid**:
|
|
|
|
1. **Duplicate Parameter Arrays**: `params` and `countParams` are maintained separately even though they contain the same values (just the project filter). This is error-prone and verbose.
|
|
|
|
2. **Two Queries Instead of One**: We run a COUNT query and a SELECT query. For small datasets, this is fine, but for large datasets, the COUNT query can be expensive. The `hasMore` flag could be computed by fetching `limit + 1` rows and checking if we got more than `limit`.
|
|
|
|
**What Should Happen**:
|
|
```typescript
|
|
// Fetch one extra row to determine if there are more results
|
|
const stmt = db.db.prepare(query);
|
|
const results = stmt.all(...params);
|
|
const observations = results.slice(0, limit);
|
|
const hasMore = results.length > limit;
|
|
|
|
// Only run COUNT if the UI actually needs it (it probably doesn't)
|
|
```
|
|
|
|
**Pattern**: This same pattern is repeated in `handleGetSummaries` (line 557) and `handleGetPrompts` (line 618). Copy-paste code smell.
|
|
|
|
**Estimated Savings**: Remove COUNT queries (which can be expensive on large tables), simplify parameter handling.
|
|
|
|
---
|
|
|
|
### Lines 685-752: POST /sessions/:sessionDbId/init - DATABASE REOPENING HELL
|
|
|
|
```typescript
|
|
private async handleInit(req: Request, res: Response): Promise<void> {
|
|
const sessionDbId = parseInt(req.params.sessionDbId, 10);
|
|
const { project } = req.body;
|
|
|
|
logger.info('WORKER', 'Session init', { sessionDbId, project });
|
|
|
|
const session = this.getOrCreateSession(sessionDbId); // <-- Opens DB at line 204
|
|
const claudeSessionId = session.claudeSessionId;
|
|
|
|
// Update port in database
|
|
const db = new SessionStore(); // <-- Opens DB AGAIN
|
|
db.setWorkerPort(sessionDbId, getWorkerPort());
|
|
|
|
// Get the latest user_prompt for this session to sync to Chroma
|
|
const latestPrompt = db.db.prepare(`
|
|
SELECT
|
|
up.*,
|
|
s.sdk_session_id,
|
|
s.project
|
|
FROM user_prompts up
|
|
JOIN sdk_sessions s ON up.claude_session_id = s.claude_session_id
|
|
WHERE up.claude_session_id = ?
|
|
ORDER BY up.created_at_epoch DESC
|
|
LIMIT 1
|
|
`).get(claudeSessionId) as any;
|
|
|
|
db.close(); // <-- Closes DB
|
|
|
|
// ... SSE broadcast ...
|
|
// ... Chroma sync ...
|
|
|
|
logger.success('WORKER', 'Session initialized', { sessionId: sessionDbId, port: getWorkerPort() });
|
|
res.json({
|
|
status: 'initialized',
|
|
sessionDbId,
|
|
port: getWorkerPort()
|
|
});
|
|
}
|
|
```
|
|
|
|
**Score**: 7/10
|
|
**Why This Is Stupid**:
|
|
|
|
1. **Two Database Opens in Same Function**:
|
|
- Line 691: `getOrCreateSession()` opens DB internally (line 204)
|
|
- Line 695: Opens DB AGAIN for `setWorkerPort()`
|
|
- Line 711: Closes DB
|
|
|
|
2. **Redundant Data Fetching**: `getOrCreateSession()` already fetches session data from the database (line 205). Then we query AGAIN for the user prompt (line 698).
|
|
|
|
3. **Tight Coupling**: `getOrCreateSession()` hides database access, making it unclear that we're opening the database twice.
|
|
|
|
**What Should Happen**:
|
|
- Open database ONCE at the start of handleInit
|
|
- Pass the open database to getOrCreateSession
|
|
- Fetch all needed data in a single transaction
|
|
- Close database at the end
|
|
|
|
**Estimated Savings**: Eliminate 1 database open/close cycle (1-5ms).
|
|
|
|
---
|
|
|
|
### Lines 728-741: Chroma Sync with Verbose Error Handling
|
|
|
|
```typescript
|
|
// Sync user prompt to Chroma (fire-and-forget, but crash on failure)
|
|
if (latestPrompt) {
|
|
this.chromaSync.syncUserPrompt(
|
|
latestPrompt.id,
|
|
latestPrompt.sdk_session_id,
|
|
latestPrompt.project,
|
|
latestPrompt.prompt_text,
|
|
latestPrompt.prompt_number,
|
|
latestPrompt.created_at_epoch
|
|
).catch(err => {
|
|
logger.failure('WORKER', 'Failed to sync user_prompt to Chroma - continuing', { promptId: latestPrompt.id }, err);
|
|
// Don't crash - SQLite has the data
|
|
});
|
|
}
|
|
```
|
|
|
|
**Score**: 5/10
|
|
**Why This Is Mildly Stupid**:
|
|
|
|
1. **Inconsistent Error Handling**: The comment says "crash on failure" but then we catch the error and continue. Which is it?
|
|
|
|
2. **Redundant Comment**: The code says `.catch(err => { /* continue */ })` and the comment says "Don't crash - SQLite has the data". The code is self-documenting.
|
|
|
|
3. **Fire-and-Forget**: If we're going to fire-and-forget, why bother with verbose error handling? Either care about failures (and retry/alert) or don't (and just log).
|
|
|
|
**What Should Happen**:
|
|
```typescript
|
|
// Fire-and-forget Chroma sync (SQLite is source of truth)
|
|
if (latestPrompt) {
|
|
this.chromaSync.syncUserPrompt(/* ... */).catch(() => {}); // Swallow errors
|
|
}
|
|
```
|
|
|
|
**Pattern**: This same verbose error handling appears in lines 1057-1076 and 1114-1133.
|
|
|
|
---
|
|
|
|
### Lines 758-779: POST /sessions/:sessionDbId/observations
|
|
|
|
```typescript
|
|
private handleObservation(req: Request, res: Response): void {
|
|
const sessionDbId = parseInt(req.params.sessionDbId, 10);
|
|
const { tool_name, tool_input, tool_output, prompt_number } = req.body;
|
|
|
|
const session = this.getOrCreateSession(sessionDbId); // <-- Opens DB
|
|
const toolStr = logger.formatTool(tool_name, tool_input);
|
|
|
|
logger.dataIn('WORKER', `Observation queued: ${toolStr}`, {
|
|
sessionId: sessionDbId,
|
|
queue: session.pendingMessages.length + 1
|
|
});
|
|
|
|
session.pendingMessages.push({
|
|
type: 'observation',
|
|
tool_name,
|
|
tool_input,
|
|
tool_output,
|
|
prompt_number
|
|
});
|
|
|
|
res.json({ status: 'queued', queueLength: session.pendingMessages.length });
|
|
}
|
|
```
|
|
|
|
**Score**: 6/10
|
|
**Why This Is Stupid**:
|
|
|
|
1. **Database Opens for No Reason**: `getOrCreateSession()` opens the database (line 204), but we don't actually need any data from the database here. We just need to get or create the in-memory session object.
|
|
|
|
2. **Hot Path Performance**: This endpoint is called **for every single tool execution**. If you run 100 tool calls in a session, this opens/closes the database 100 times unnecessarily.
|
|
|
|
**What Should Happen**:
|
|
- Separate "get existing session" from "create session from database"
|
|
- Only open database if creating a new session
|
|
- For existing sessions, just push to the queue
|
|
|
|
**Estimated Savings**: For a session with 100 observations, eliminate 99 unnecessary database open/close cycles (**99-495ms of pure overhead**).
|
|
|
|
---
|
|
|
|
### Lines 914-1005: createMessageGenerator - THE POLLING HORROR
|
|
|
|
```typescript
|
|
private async* createMessageGenerator(session: ActiveSession): AsyncIterable<SDKUserMessage> {
|
|
// ... send init prompt ...
|
|
|
|
// Process messages continuously until session is deleted
|
|
while (true) {
|
|
if (session.abortController.signal.aborted) {
|
|
break;
|
|
}
|
|
|
|
if (session.pendingMessages.length === 0) {
|
|
await new Promise(resolve => setTimeout(resolve, MESSAGE_POLL_INTERVAL_MS));
|
|
continue;
|
|
}
|
|
|
|
while (session.pendingMessages.length > 0) {
|
|
const message = session.pendingMessages.shift()!;
|
|
// ... process message ...
|
|
yield { /* SDK message */ };
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
**Score**: 10/10
|
|
**Why This Is ABSOLUTELY FUCKING STUPID**:
|
|
|
|
1. **Infinite Polling Loop**: Lines 936-944 implement a **busy-wait polling loop** that checks `pendingMessages.length` every 100ms. This is the single dumbest pattern in the entire file.
|
|
|
|
2. **Event-Driven Alternative**: We have a fucking queue! When something is added to the queue, **NOTIFY THE CONSUMER**. Use an EventEmitter, a Promise, a Condition Variable, ANYTHING but polling.
|
|
|
|
3. **Wasted CPU**: Every 100ms, this loop wakes up, checks if the queue is empty, and goes back to sleep. For a worker that runs for hours, this is thousands of unnecessary wake-ups.
|
|
|
|
4. **Latency**: When an observation is queued (line 770), it sits in the queue for up to 100ms before being processed. **This adds 0-100ms of artificial latency to every single observation.**
|
|
|
|
5. **Battery Impact**: On laptops, constant polling prevents CPU from entering deep sleep states, draining battery.
|
|
|
|
**What Should Happen**:
|
|
|
|
```typescript
|
|
// In WorkerService class
|
|
private sessionQueues: Map<number, EventEmitter> = new Map();
|
|
|
|
private handleObservation(req: Request, res: Response): void {
|
|
// ... existing code ...
|
|
session.pendingMessages.push({ /* message */ });
|
|
|
|
// Notify the generator that new work is available
|
|
const emitter = this.sessionQueues.get(sessionDbId);
|
|
if (emitter) {
|
|
emitter.emit('message');
|
|
}
|
|
|
|
res.json({ status: 'queued', queueLength: session.pendingMessages.length });
|
|
}
|
|
|
|
private async* createMessageGenerator(session: ActiveSession): AsyncIterable<SDKUserMessage> {
|
|
const emitter = new EventEmitter();
|
|
this.sessionQueues.set(session.sessionDbId, emitter);
|
|
|
|
yield { /* init prompt */ };
|
|
|
|
while (!session.abortController.signal.aborted) {
|
|
if (session.pendingMessages.length === 0) {
|
|
// Wait for new messages via event, not polling
|
|
await new Promise(resolve => emitter.once('message', resolve));
|
|
}
|
|
|
|
while (session.pendingMessages.length > 0) {
|
|
const message = session.pendingMessages.shift()!;
|
|
yield { /* process message */ };
|
|
}
|
|
}
|
|
|
|
this.sessionQueues.delete(session.sessionDbId);
|
|
}
|
|
```
|
|
|
|
**Estimated Savings**:
|
|
- Remove 100ms polling interval (eliminate 0-100ms latency per observation)
|
|
- Reduce CPU wake-ups from ~10/second to 0 when idle
|
|
- Improve battery life on laptops
|
|
- Make the system feel more responsive
|
|
|
|
**Real-World Impact**: For a session with 10 observations, this polling adds **0-1000ms of cumulative latency**. The user is literally waiting for the polling loop to wake up.
|
|
|
|
---
|
|
|
|
### Lines 1011-1146: handleAgentMessage - Database Reopening and Chroma Spam
|
|
|
|
```typescript
|
|
private handleAgentMessage(session: ActiveSession, content: string, promptNumber: number): void {
|
|
// ... parse observations and summary ...
|
|
|
|
const db = new SessionStore(); // <-- Opens DB
|
|
|
|
// Store observations and sync to Chroma (non-blocking, fail-fast)
|
|
for (const obs of observations) {
|
|
const { id, createdAtEpoch } = db.storeObservation(/* ... */);
|
|
logger.success('DB', 'Observation stored', { /* ... */ });
|
|
|
|
// Broadcast to SSE clients
|
|
this.broadcastSSE({ /* ... */ });
|
|
|
|
// Sync to Chroma (non-blocking fire-and-forget, but crash on failure)
|
|
this.chromaSync.syncObservation(/* ... */)
|
|
.then(() => {
|
|
logger.success('WORKER', 'Observation synced to Chroma', { /* ... */ });
|
|
})
|
|
.catch((error: Error) => {
|
|
logger.error('WORKER', 'Observation sync failed - continuing', { /* ... */ }, error);
|
|
// Don't crash - SQLite has the data
|
|
});
|
|
}
|
|
|
|
// ... similar pattern for summary ...
|
|
|
|
db.close(); // <-- Closes DB
|
|
|
|
// Check if queue is empty and stop spinner after debounce
|
|
this.checkAndStopSpinner(); // <-- Triggers 1.5s delay
|
|
}
|
|
```
|
|
|
|
**Score**: 6/10
|
|
**Why This Is Stupid**:
|
|
|
|
1. **Database Reopening**: Opens database (line 1030), stores all observations, closes database (line 1142). This is called **for every SDK response**. For a session with 10 observations, this opens/closes the database 10+ times.
|
|
|
|
2. **Verbose Chroma Error Handling**: Lines 1057-1076 and 1114-1133 have identical verbose error handling for Chroma sync failures. This is copy-paste code smell.
|
|
|
|
3. **Success Logging Spam**: Line 1066 and 1123 log success for EVERY Chroma sync. For a session with 100 observations, this logs 100 success messages. Why? Who reads these?
|
|
|
|
4. **Debounce Call**: Line 1145 calls `checkAndStopSpinner()`, triggering the 1.5s artificial delay.
|
|
|
|
**What Should Happen**:
|
|
- Reuse database connection across multiple calls
|
|
- Simplify Chroma error handling (fire-and-forget means swallow errors)
|
|
- Remove success logging (or make it debug-level)
|
|
- Remove debounce delay
|
|
|
|
---
|
|
|
|
## Summary of Patterns
|
|
|
|
### 1. Database Reopening Anti-Pattern
|
|
**Occurrences**: Lines 200-236, 685-752, 758-779, 1011-1146
|
|
**Impact**: Opens/closes database 4-100+ times per session instead of reusing connections
|
|
**Fix**: Pass open database connections between functions, use transactions, connection pooling
|
|
|
|
### 2. Polling Instead of Events
|
|
**Occurrences**: Line 942 (100ms polling loop)
|
|
**Impact**: 0-100ms latency per observation, wasted CPU cycles, battery drain
|
|
**Fix**: Use EventEmitter or async queue with await/notify pattern
|
|
|
|
### 3. Artificial Delays
|
|
**Occurrences**: Line 363 (1.5s spinner debounce), line 942 (100ms poll interval)
|
|
**Impact**: 1.5s delay before spinner stops, 0-100ms delay per observation
|
|
**Fix**: Remove debouncing, use event-driven patterns
|
|
|
|
### 4. Premature Optimization
|
|
**Occurrences**: Lines 33-70 (Claude path caching)
|
|
**Impact**: 37 lines of code to save 5ms on a one-time operation
|
|
**Fix**: Remove caching, inline the function
|
|
|
|
### 5. Defensive Programming for Ghosts
|
|
**Occurrences**: Line 382 (existsSync check), lines 228-231 (error handler reopens DB), lines 728-741 (verbose error handling)
|
|
**Impact**: Code complexity without real benefit
|
|
**Fix**: Fail fast, trust invariants, simplify error handling
|
|
|
|
### 6. Copy-Paste Code
|
|
**Occurrences**: handleGetObservations, handleGetSummaries, handleGetPrompts (nearly identical)
|
|
**Impact**: Maintenance burden, inconsistency risk
|
|
**Fix**: Extract common pagination logic into helper function
|
|
|
|
---
|
|
|
|
## Recommendations
|
|
|
|
### Immediate Wins (Low Effort, High Impact)
|
|
|
|
1. **Remove Spinner Debounce** (Lines 338-365)
|
|
- **Effort**: 5 minutes
|
|
- **Impact**: Eliminate 1.5s artificial delay
|
|
- **Score**: 9/10 stupidity
|
|
|
|
2. **Replace Polling with Events** (Line 942)
|
|
- **Effort**: 30 minutes
|
|
- **Impact**: Eliminate 0-100ms latency per observation, reduce CPU usage
|
|
- **Score**: 10/10 stupidity
|
|
|
|
3. **Remove Claude Path Caching** (Lines 33-70)
|
|
- **Effort**: 5 minutes
|
|
- **Impact**: Remove 37 lines of unnecessary code
|
|
- **Score**: 6/10 stupidity
|
|
|
|
### Medium Wins (Moderate Effort, Good Impact)
|
|
|
|
4. **Fix Database Reopening in Hot Path** (Lines 758-779)
|
|
- **Effort**: 1 hour
|
|
- **Impact**: Eliminate 99+ database cycles per session
|
|
- **Score**: 6/10 stupidity
|
|
|
|
5. **Simplify Chroma Error Handling** (Lines 728-741, 1057-1076, 1114-1133)
|
|
- **Effort**: 15 minutes
|
|
- **Impact**: Remove 50+ lines of verbose error handling
|
|
- **Score**: 5/10 stupidity
|
|
|
|
6. **Simplify SSE Broadcast** (Lines 297-322)
|
|
- **Effort**: 5 minutes
|
|
- **Impact**: Remove 10 lines, eliminate two-pass cleanup
|
|
- **Score**: 4/10 stupidity
|
|
|
|
### Long-Term Improvements (High Effort, Architectural)
|
|
|
|
7. **Database Connection Pooling**
|
|
- **Effort**: 4 hours
|
|
- **Impact**: Reuse connections across requests, eliminate all open/close overhead
|
|
- **Score**: 8/10 stupidity (current approach)
|
|
|
|
8. **Extract Pagination Helper**
|
|
- **Effort**: 1 hour
|
|
- **Impact**: DRY up handleGetObservations/Summaries/Prompts
|
|
- **Score**: 5/10 stupidity
|
|
|
|
---
|
|
|
|
## Estimated Performance Impact
|
|
|
|
**Current Hot Path (1 observation)**:
|
|
- HTTP request arrives: 0ms
|
|
- getOrCreateSession opens/closes DB: 1-5ms
|
|
- Queue message: 0ms
|
|
- Poll interval: 0-100ms (average 50ms)
|
|
- SDK processing: variable
|
|
- handleAgentMessage opens/closes DB: 1-5ms
|
|
- Chroma sync (async): N/A
|
|
- checkAndStopSpinner debounce: 1500ms
|
|
- **Total artificial overhead**: 1502-1610ms (1.5-1.6 seconds)
|
|
|
|
**Optimized Hot Path (1 observation)**:
|
|
- HTTP request arrives: 0ms
|
|
- Get existing session (no DB): 0ms
|
|
- Queue message + notify: 0ms
|
|
- SDK processing: variable
|
|
- Store in DB (connection pool): 0.1-0.5ms
|
|
- Chroma sync (async): N/A
|
|
- Stop spinner (no debounce): 0ms
|
|
- **Total artificial overhead**: 0.1-0.5ms
|
|
|
|
**Speedup**: **3000-16000x faster** (removing artificial delays and polling)
|
|
|
|
---
|
|
|
|
## Conclusion
|
|
|
|
This file has accumulated significant technical debt in the form of:
|
|
- **Artificial delays** (1.5s debounce, 100ms polling)
|
|
- **Database reopening anti-pattern** (4-100+ opens per session)
|
|
- **Polling instead of events** (busy-wait loop)
|
|
- **Premature optimization** (caching rare operations)
|
|
- **Defensive programming** (protecting against non-existent failures)
|
|
|
|
The worker spends more time **waiting** (polling, debouncing) than **working**. Most of these patterns were likely added with good intentions ("make the UI smooth", "cache for performance", "handle errors gracefully") but ended up creating more problems than they solved.
|
|
|
|
**Priority Fixes**:
|
|
1. Remove spinner debounce (9/10 stupidity)
|
|
2. Replace polling with events (10/10 stupidity)
|
|
3. Fix database reopening in hot path (6-8/10 stupidity)
|
|
|
|
These three changes alone would eliminate **1.5+ seconds of artificial delay** per session and make the system feel dramatically more responsive.
|