refactor: Complete rewrite of worker-utils.ts and cleanup of worker-service.ts

- Removed fragile PM2 string parsing and replaced with direct PM2 restart logic.
- Eliminated silent error handling in worker-utils.ts for better error visibility.
- Extracted duplicated session auto-creation logic into a new helper method getOrCreateSession() in worker-service.ts.
- Centralized configuration values and replaced magic numbers with named constants.
- Updated health check logic to ensure worker is restarted if unhealthy.
- Removed unnecessary getWorkerPort() wrapper function.
- Improved overall code quality and maintainability by applying DRY and YAGNI principles.
This commit is contained in:
Alex Newman
2025-11-06 22:00:07 -05:00
parent f8dc7f940f
commit 3030f518b5
18 changed files with 2454 additions and 346 deletions
+416
View File
@@ -0,0 +1,416 @@
# Processing Indicator "Fucking Stupid" Audit
## What It SHOULD Do (Simple Version)
1. **Page load**: Check if worker is already processing → spin or don't spin
2. **UserPromptSubmit**: Start spinning, set worker status "on"
3. **Summary complete**: Stop spinning, set worker status "off"
**Result**: One boolean. Simple. Clear.
---
## What It ACTUALLY Does (Overcomplicated Version)
### Problem 1: Set<string> Instead of Boolean
**Current**: `processingSessions: Set<string>` - tracks individual session IDs
**File**: `src/ui/viewer/hooks/useSSE.ts:12`
```typescript
const [processingSessions, setProcessingSessions] = useState<Set<string>>(new Set());
```
**Why it's stupid**: We don't care WHICH sessions are processing. We just need to know IF anything is processing. The conversion to boolean happens anyway:
**File**: `src/ui/viewer/App.tsx:92`
```typescript
isProcessing={processingSessions.size > 0} // ← Converting Set to boolean!
```
**Fix**: Just use `const [isProcessing, setIsProcessing] = useState(false)`
---
### Problem 2: Complex Set Manipulation
**Current**: Add/remove session IDs from Set based on SSE events
**File**: `src/ui/viewer/hooks/useSSE.ts:90-104`
```typescript
case 'processing_status':
if (data.processing) {
const processing = data.processing;
console.log('[SSE] Processing status:', processing);
setProcessingSessions(prev => {
const next = new Set(prev);
if (processing.is_processing) {
next.add(processing.session_id); // ← Why track session ID?
} else {
next.delete(processing.session_id); // ← Just need true/false
}
return next;
});
}
break;
```
**Why it's stupid**: Creating new Sets, adding/removing items, all to track individual sessions when we only care about "any processing yes/no"
**Fix**: `setIsProcessing(data.is_processing)`
---
### Problem 3: Defensive Cleanup in Multiple Places
**Current**: Two places remove sessions from the Set
**Location 1** - `useSSE.ts:90-104` - Handles `processing_status` events
**Location 2** - `useSSE.ts:73-78` - Handles `new_summary` events
```typescript
// Mark session as no longer processing (summary is the final step)
setProcessingSessions(prev => {
const next = new Set(prev);
next.delete(summary.session_id); // ← Defensive cleanup
return next;
});
```
**Why it's stupid**: We're defensively cleaning up in case events arrive out of order. This is a band-aid for not having a single source of truth.
**Fix**: One place sets `isProcessing = false` (summary complete). No defensive cleanup needed.
---
### Problem 4: SSE Event Includes Session ID
**Current**: Processing status events include session ID
**File**: `src/services/worker-service.ts:277-285`
```typescript
private broadcastProcessingStatus(claudeSessionId: string, isProcessing: boolean): void {
this.broadcastSSE({
type: 'processing_status',
processing: {
session_id: claudeSessionId, // ← Why send session ID?
is_processing: isProcessing
}
});
}
```
**Why it's stupid**: We send session_id but never use it for the spinner decision. The logomark doesn't care WHICH session is processing.
**Fix**: `{ type: 'processing_status', isProcessing: boolean }` - That's it.
---
### Problem 5: TypeScript Interface Overcomplicated
**Current**: StreamEvent includes processing object with session_id
**File**: `src/ui/viewer/types.ts:54-57`
```typescript
processing?: {
session_id: string; // ← Unnecessary
is_processing: boolean;
};
```
**Why it's stupid**: Adds complexity to type definitions when we only need the boolean.
**Fix**: `isProcessing?: boolean;`
---
### Problem 6: Multiple Broadcast Points (But No Initial State!)
**Current**: 3 places broadcast processing status in worker-service.ts
1. **Line 817**: `handleSummarize()``broadcastProcessingStatus(session.claudeSessionId, true)`
2. **Line 1153**: `processSummarizeMessage()` success → `broadcastProcessingStatus(session.claudeSessionId, false)`
3. **Line 1183**: `processSummarizeMessage()` no summary → `broadcastProcessingStatus(session.claudeSessionId, false)`
**Why it's stupid**: We broadcast changes but there's NO WAY TO GET INITIAL STATE on page load. If you open the viewer while processing is active, you won't see the spinner until the next status change.
**Fix**: Add `/api/processing-status` endpoint that returns current state. Call it on page load.
---
### Problem 7: Skeleton Cards Require Session Tracking
**Current**: Feed.tsx creates skeleton cards for each processing session
**File**: `src/ui/viewer/components/Feed.tsx:66-80`
```typescript
const skeletons: FeedItem[] = [];
processingSessions.forEach(sessionId => { // ← Iterating over Set
if (!sessionsWithSummaries.has(sessionId)) {
const prompt = sessionPrompts.get(sessionId);
skeletons.push({
itemType: 'skeleton',
id: sessionId,
session_id: sessionId, // ← Using individual session IDs
project: prompt?.project,
created_at_epoch: Date.now()
});
}
});
```
**Why it's relevant**: This is the ONLY place that actually uses individual session IDs. If we want per-session skeleton cards, we need session tracking.
**Question for you**: Do we still want skeleton cards in the feed? Or just the logomark spinner?
**Option A**: Keep skeleton cards → Need to track session IDs (current complexity justified)
**Option B**: Remove skeleton cards → Use simple boolean for logomark only
---
### Problem 8: No Synchronization Between Worker State and UI State
**Current**: Worker doesn't maintain processing state. It just broadcasts events.
**Why it's stupid**: If the UI disconnects/reconnects, it loses processing state. Worker should be the source of truth.
**Fix**: Worker maintains `private isProcessing: boolean = false`
- Set to true on summarize request
- Set to false when summary completes
- Expose via `/api/processing-status` endpoint
- Broadcast changes via SSE
---
## The "Fucking Stupid" Score
| Issue | Complexity Cost | Why It's Stupid |
|-------|----------------|-----------------|
| Set<string> instead of boolean | HIGH | We convert it to boolean anyway |
| Complex Set manipulation | HIGH | 10+ lines of code to add/remove from Set |
| Defensive cleanup in 2 places | MEDIUM | Band-aid for lack of single source of truth |
| SSE includes unused session_id | LOW | Minor overhead, but conceptually wrong |
| Overcomplicated TypeScript types | LOW | Makes code harder to read |
| No initial state endpoint | HIGH | Broken user experience (no spinner on page load during active processing) |
| Session tracking for skeletons | ??? | Depends if we want per-session skeletons or not |
| Worker has no state | HIGH | UI is source of truth, should be worker |
---
## Proposed Simple Architecture
### Worker Service (Source of Truth)
```typescript
class WorkerService {
private isProcessing: boolean = false; // Single source of truth
// New endpoint: GET /api/processing-status
private handleGetProcessingStatus(req: Request, res: Response): void {
res.json({ isProcessing: this.isProcessing });
}
// On summarize request
private handleSummarize(req: Request, res: Response): void {
// ... existing code ...
this.isProcessing = true;
this.broadcastSSE({ type: 'processing_status', isProcessing: true });
// ...
}
// On summary complete
private processSummarizeMessage(session: SessionState, message: Message): void {
// ... existing code ...
// After summary is saved/failed:
this.isProcessing = false;
this.broadcastSSE({ type: 'processing_status', isProcessing: false });
}
}
```
### React Hook (Simple Boolean)
```typescript
export function useSSE() {
const [isProcessing, setIsProcessing] = useState(false);
// On mount: Get initial state
useEffect(() => {
fetch('/api/processing-status')
.then(res => res.json())
.then(data => setIsProcessing(data.isProcessing));
}, []);
// Listen for changes
useEffect(() => {
const eventSource = new EventSource('/stream');
eventSource.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.type === 'processing_status') {
setIsProcessing(data.isProcessing); // Simple!
}
};
return () => eventSource.close();
}, []);
return { isProcessing, /* other state */ };
}
```
### TypeScript Types (Simplified)
```typescript
export interface StreamEvent {
type: 'initial_load' | 'new_observation' | 'new_summary' | 'new_prompt' | 'processing_status';
observations?: Observation[];
summaries?: Summary[];
prompts?: UserPrompt[];
projects?: string[];
observation?: Observation;
summary?: Summary;
prompt?: UserPrompt;
isProcessing?: boolean; // Simple!
}
```
### React Components (No Changes Needed!)
```typescript
// App.tsx
const { isProcessing } = useSSE(); // Already a boolean now!
<Header isProcessing={isProcessing} /> // Just pass it through
// Header.tsx (no changes needed)
<img className={`logomark ${isProcessing ? 'spinning' : ''}`} />
```
---
## Breaking Changes & Decisions
### Decision 1: What About Skeleton Cards?
**Current**: Skeleton cards in feed show "Generating..." for each processing session
**Options**:
**A) Keep skeleton cards** (requires session tracking)
- Need to track individual session IDs
- Justifies the Set<string> complexity
- Provides per-session feedback in feed
**B) Remove skeleton cards** (simplest)
- Only logomark spins (global processing indicator)
- No need to track individual sessions
- Simpler architecture
**C) Hybrid: Single skeleton card** (middle ground)
- Show ONE skeleton card when `isProcessing === true`
- Don't tie it to specific sessions
- Keep it simple but provide feed feedback
**What do you want?**
---
### Decision 2: Multiple Concurrent Sessions?
**Question**: Can multiple sessions be processing simultaneously?
**Current assumption**: Yes (hence the Set<string>)
**Reality check**: Worker processes messages from a queue. Can it actually process multiple sessions at once, or is it sequential?
**If sequential**: We DEFINITELY don't need session tracking. One boolean is perfect.
**If concurrent**: We still might not need session tracking for the logomark (just spin if ANY processing), but skeleton cards would need session IDs.
---
## Recommended Implementation Plan
### Phase 1: Add Initial State (Quick Win)
**File**: `src/services/worker-service.ts`
- Add `private isProcessing: boolean = false;`
- Add GET `/api/processing-status` endpoint
- Set `this.isProcessing = true` on line 817
- Set `this.isProcessing = false` on lines 1153, 1183
**File**: `src/ui/viewer/hooks/useSSE.ts`
- Add `fetch('/api/processing-status')` on mount
- Initialize `isProcessing` state from response
**Impact**: Fixes the "no spinner on page load" bug without breaking changes.
---
### Phase 2: Simplify State (Breaking Change)
**File**: `src/services/worker-service.ts`
- Change `broadcastProcessingStatus()` to send `{ type: 'processing_status', isProcessing: boolean }`
- Remove session_id from broadcast
**File**: `src/ui/viewer/hooks/useSSE.ts`
- Change `processingSessions` Set to `isProcessing` boolean
- Simplify event handler: `setIsProcessing(data.isProcessing)`
- Remove defensive cleanup from `new_summary` handler
**File**: `src/ui/viewer/types.ts`
- Simplify `StreamEvent.processing` to just `isProcessing?: boolean`
**File**: `src/ui/viewer/App.tsx`
- Change `processingSessions.size > 0` to just `isProcessing`
**File**: `src/ui/viewer/components/Feed.tsx`
- **Decision needed**: Remove skeleton cards or show single generic skeleton?
**Impact**: Cleaner code, easier to maintain, fewer bugs.
---
## Files That Need Changes
### Worker Service
- `src/services/worker-service.ts` (add state, endpoint, update broadcasts)
### React
- `src/ui/viewer/hooks/useSSE.ts` (boolean instead of Set, fetch initial state)
- `src/ui/viewer/types.ts` (simplify StreamEvent)
- `src/ui/viewer/App.tsx` (pass boolean instead of Set.size > 0)
- `src/ui/viewer/components/Feed.tsx` (handle skeleton cards decision)
- `src/ui/viewer/constants/api.ts` (add PROCESSING_STATUS endpoint)
### No Changes Needed
- `src/ui/viewer/components/Header.tsx` (already receives boolean)
- `src/ui/viewer/components/SummarySkeleton.tsx` (might be removed)
- CSS/animations (work the same with boolean)
---
## Summary: What's Fucking Stupid
1. **Set<string> when we only need boolean** ← Biggest offender
2. **No initial state on page load** ← Broken UX
3. **Complex Set manipulation** ← 10+ lines for add/remove
4. **Defensive cleanup in multiple places** ← No single source of truth
5. **Session IDs in SSE events** ← Data we don't use
6. **Worker doesn't maintain state** ← UI is source of truth (backwards!)
**Complexity Score**: 7/10 stupid
**After refactor**: 2/10 (the remaining complexity is React/SSE boilerplate)
---
## What Do You Want To Do?
Tell me:
1. **Skeleton cards**: Keep (per-session), remove entirely, or show one generic skeleton?
2. **Breaking changes**: OK to simplify now, or do you want backwards compatibility?
3. **Implementation**: Want me to do Phase 1 (quick fix), Phase 2 (full refactor), or both?
+564
View File
@@ -0,0 +1,564 @@
# Processing Indicator: Complete Code Reference
This document provides a line-by-line breakdown of every piece of code related to the processing/activity indicator (the spinning logomark in the top left corner of the viewer UI).
## Overview
The processing indicator is a visual cue that shows when the worker service is actively processing memories (observations or summaries). It consists of:
1. **Logomark Image**: `claude-mem-logomark.webp` in the header
2. **Spinning Animation**: Applied via CSS class when processing is active
3. **State Management**: Tracked via Server-Sent Events (SSE) from the worker
4. **Processing Sessions Set**: Maintains active session IDs being processed
## Data Flow
```
Worker Service
└─> broadcastProcessingStatus(sessionId, isProcessing)
└─> broadcastSSE({ type: 'processing_status', ... })
└─> SSE Event Stream (/stream)
└─> useSSE Hook (React)
└─> processingSessions Set<string>
└─> App.tsx: isProcessing={processingSessions.size > 0}
└─> Header.tsx: className={isProcessing ? 'spinning' : ''}
└─> CSS Animation: @keyframes spin
```
---
## 1. TypeScript Types
### File: `src/ui/viewer/types.ts`
**Lines 45-58: StreamEvent interface with processing_status type**
```typescript
export interface StreamEvent {
type: 'initial_load' | 'new_observation' | 'new_summary' | 'new_prompt' | 'processing_status';
observations?: Observation[];
summaries?: Summary[];
prompts?: UserPrompt[];
projects?: string[];
observation?: Observation;
summary?: Summary;
prompt?: UserPrompt;
processing?: {
session_id: string;
is_processing: boolean;
};
}
```
**Purpose**: Defines the structure of SSE events. The `processing_status` type includes a `processing` object that indicates whether a session is currently being processed.
---
## 2. Worker Service (Backend)
### File: `src/services/worker-service.ts`
**Lines 247-272: broadcastSSE() - Core SSE broadcasting**
```typescript
/**
* Broadcast SSE event to all connected clients
*/
private broadcastSSE(event: any): void {
if (this.sseClients.size === 0) {
return; // No clients connected, skip broadcast
}
const data = `data: ${JSON.stringify(event)}\n\n`;
const clientsToRemove: Response[] = [];
for (const client of this.sseClients) {
try {
client.write(data);
} catch (error) {
// Client disconnected, mark for removal
clientsToRemove.push(client);
}
}
// Clean up disconnected clients
for (const client of clientsToRemove) {
this.sseClients.delete(client);
}
if (clientsToRemove.length > 0) {
logger.info('WORKER', `SSE cleaned up disconnected clients`, { count: clientsToRemove.length });
}
}
```
**Purpose**: Broadcasts SSE events to all connected UI clients. Handles disconnected clients gracefully.
---
**Lines 274-285: broadcastProcessingStatus() - Processing indicator control**
```typescript
/**
* Broadcast processing status to SSE clients
*/
private broadcastProcessingStatus(claudeSessionId: string, isProcessing: boolean): void {
this.broadcastSSE({
type: 'processing_status',
processing: {
session_id: claudeSessionId,
is_processing: isProcessing
}
});
}
```
**Purpose**: Dedicated method for broadcasting processing status changes. Called when sessions start/stop processing.
---
**Line 817: Summarize request triggers processing start**
```typescript
// Notify UI that processing is active
this.broadcastProcessingStatus(session.claudeSessionId, true);
```
**Context**: In `handleSummarize()` method - when a summary request is queued, processing starts.
**File location**: `src/services/worker-service.ts:817`
---
**Line 1153: Summary generation complete - processing stops**
```typescript
// Notify UI that processing is complete (summary is the final step)
this.broadcastProcessingStatus(session.claudeSessionId, false);
```
**Context**: In `processSummarizeMessage()` after successfully generating and saving a summary.
**File location**: `src/services/worker-service.ts:1153`
---
**Line 1183: No summary generated - still mark processing complete**
```typescript
// Still mark processing as complete even if no summary was generated
this.broadcastProcessingStatus(session.claudeSessionId, false);
```
**Context**: In `processSummarizeMessage()` when no summary tags are found in the AI response.
**File location**: `src/services/worker-service.ts:1183`
---
## 3. React Hook: SSE Connection
### File: `src/ui/viewer/hooks/useSSE.ts`
**Line 12: processingSessions state initialization**
```typescript
const [processingSessions, setProcessingSessions] = useState<Set<string>>(new Set());
```
**Purpose**: Maintains a Set of session IDs currently being processed. Used to determine if any processing is active.
---
**Lines 90-104: processing_status event handler**
```typescript
case 'processing_status':
if (data.processing) {
const processing = data.processing;
console.log('[SSE] Processing status:', processing);
setProcessingSessions(prev => {
const next = new Set(prev);
if (processing.is_processing) {
next.add(processing.session_id);
} else {
next.delete(processing.session_id);
}
return next;
});
}
break;
```
**Purpose**: Listens for `processing_status` SSE events and updates the processingSessions Set:
- `is_processing: true` → Adds session ID to Set
- `is_processing: false` → Removes session ID from Set
**File location**: `src/ui/viewer/hooks/useSSE.ts:90-104`
---
**Lines 73-78: Summary completion also clears processing status**
```typescript
// Mark session as no longer processing (summary is the final step)
setProcessingSessions(prev => {
const next = new Set(prev);
next.delete(summary.session_id);
return next;
});
```
**Purpose**: When a `new_summary` event arrives, remove the session from processingSessions (defensive cleanup in case the processing_status event was missed).
**File location**: `src/ui/viewer/hooks/useSSE.ts:73-78`
---
**Line 125: Hook return value includes processingSessions**
```typescript
return { observations, summaries, prompts, projects, processingSessions, isConnected };
```
**Purpose**: Exposes processingSessions Set to consuming components.
---
## 4. React Component: App
### File: `src/ui/viewer/App.tsx`
**Line 20: Destructure processingSessions from useSSE**
```typescript
const { observations, summaries, prompts, projects, processingSessions, isConnected } = useSSE();
```
**Purpose**: Gets the processingSessions Set from the SSE hook.
---
**Line 92: Convert Set to boolean for Header component**
```typescript
isProcessing={processingSessions.size > 0}
```
**Purpose**: Passes `true` to Header if ANY session is being processed (Set has items), `false` otherwise.
**File location**: `src/ui/viewer/App.tsx:92`
---
## 5. React Component: Header
### File: `src/ui/viewer/components/Header.tsx`
**Line 12: isProcessing prop definition**
```typescript
interface HeaderProps {
isConnected: boolean;
projects: string[];
currentFilter: string;
onFilterChange: (filter: string) => void;
onSettingsToggle: () => void;
sidebarOpen: boolean;
isProcessing: boolean; // ← Processing indicator prop
themePreference: ThemePreference;
onThemeChange: (theme: ThemePreference) => void;
}
```
**Purpose**: Defines the isProcessing boolean prop for the Header component.
---
**Line 24: isProcessing destructured from props**
```typescript
export function Header({
isConnected,
projects,
currentFilter,
onFilterChange,
onSettingsToggle,
sidebarOpen,
isProcessing, // ← Received from App.tsx
themePreference,
onThemeChange
}: HeaderProps) {
```
---
**Line 31: Logomark with conditional spinning class**
```typescript
<img src="claude-mem-logomark.webp" alt="" className={`logomark ${isProcessing ? 'spinning' : ''}`} />
```
**Purpose**: The core of the processing indicator. When `isProcessing` is `true`, adds the `spinning` CSS class to the logomark image, triggering the rotation animation.
**File location**: `src/ui/viewer/components/Header.tsx:31`
**Rendered HTML Examples**:
- Not processing: `<img src="claude-mem-logomark.webp" alt="" className="logomark" />`
- Processing: `<img src="claude-mem-logomark.webp" alt="" className="logomark spinning" />`
---
## 6. CSS Styling & Animation
### File: `plugin/ui/viewer.html` (compiled output)
**Lines 342-349: Logomark and spinning class styles**
```css
.logomark {
height: 32px;
width: auto;
}
.logomark.spinning {
animation: spin 1.5s linear infinite;
}
```
**Purpose**:
- `.logomark`: Base styles for the logo image (32px height, auto width)
- `.logomark.spinning`: Applies the spin animation when processing is active
- **Duration**: 1.5 seconds per rotation
- **Timing**: Linear (constant speed)
- **Iteration**: Infinite (continues until class is removed)
**File location**: `plugin/ui/viewer.html:342-349`
---
**Lines 701-705: Spin animation keyframes**
```css
@keyframes spin {
to {
transform: rotate(360deg);
}
}
```
**Purpose**: Defines the rotation animation. Rotates the element from 0° (implicit) to 360° (full circle).
**File location**: `plugin/ui/viewer.html:701-705`
---
## 7. API Endpoint: Stream
### File: `src/ui/viewer/constants/api.ts`
**Line 11: SSE stream endpoint**
```typescript
export const API_ENDPOINTS = {
OBSERVATIONS: '/api/observations',
SUMMARIES: '/api/summaries',
PROMPTS: '/api/prompts',
SETTINGS: '/api/settings',
STATS: '/api/stats',
STREAM: '/stream', // ← SSE endpoint for processing events
} as const;
```
**Purpose**: Centralized API endpoint constant. The `/stream` endpoint is used by `useSSE.ts` to establish the EventSource connection.
---
## Bonus: Feed Skeleton Processing Indicator
While not part of the logomark spinner, the feed also shows processing state with skeleton cards and a smaller spinner.
### File: `src/ui/viewer/components/Feed.tsx`
**Lines 66-80: Create skeleton items for processing sessions**
```typescript
// Create skeleton items for sessions being processed that don't have summaries yet
const skeletons: FeedItem[] = [];
processingSessions.forEach(sessionId => {
if (!sessionsWithSummaries.has(sessionId)) {
const prompt = sessionPrompts.get(sessionId);
skeletons.push({
itemType: 'skeleton',
id: sessionId,
session_id: sessionId,
project: prompt?.project,
// Always use current time so skeletons appear at top of feed
created_at_epoch: Date.now()
});
}
});
```
**Purpose**: Creates temporary skeleton cards for sessions currently being processed (from `processingSessions` Set).
---
**Line 104: Render SummarySkeleton component**
```typescript
} else if (item.itemType === 'skeleton') {
return <SummarySkeleton key={key} sessionId={item.session_id} project={item.project} />;
```
---
### File: `src/ui/viewer/components/SummarySkeleton.tsx`
**Lines 14-17: Processing indicator in skeleton card**
```typescript
<div className="processing-indicator">
<div className="spinner"></div>
<span>Generating...</span>
</div>
```
**Purpose**: Shows a smaller inline spinner with "Generating..." text in skeleton summary cards.
---
### CSS for Feed Spinner
**Lines 682-690: Processing indicator container**
```css
.processing-indicator {
display: inline-flex;
align-items: center;
gap: 6px;
color: var(--color-accent-focus);
font-size: 11px;
font-weight: 500;
margin-left: auto;
}
```
---
**Lines 692-700: Small spinner for skeleton cards**
```css
.spinner {
width: 12px;
height: 12px;
border: 2px solid var(--color-border-primary);
border-top-color: var(--color-accent-focus);
border-radius: 50%;
animation: spin 0.8s linear infinite;
}
```
**Purpose**: Smaller circular spinner (12px) with faster rotation (0.8s) used in skeleton cards. Uses the same `@keyframes spin` animation.
---
**Lines 711-715: Skeleton card opacity**
```css
.summary-skeleton {
opacity: 0.7;
}
.summary-skeleton .processing-indicator {
margin-left: auto;
}
```
---
**Lines 715-740: Skeleton line animations (shimmer effect)**
```css
.skeleton-line {
height: 16px;
background: linear-gradient(90deg, var(--color-skeleton-base) 25%, var(--color-skeleton-highlight) 50%, var(--color-skeleton-base) 75%);
background-size: 200% 100%;
animation: shimmer 1.5s infinite;
border-radius: 4px;
margin-bottom: 8px;
}
.skeleton-title {
height: 20px;
width: 80%;
margin-bottom: 10px;
}
.skeleton-subtitle {
height: 16px;
width: 90%;
}
.skeleton-subtitle.short {
width: 60%;
}
@keyframes shimmer {
0% {
background-position: 200% 0;
}
100% {
background-position: -200% 0;
}
}
```
**Purpose**: Creates animated placeholder lines with a shimmer effect while summary is being generated.
---
## Summary: Complete Processing Flow
1. **User submits prompt** → Claude Code session starts
2. **Worker receives summarize request**`worker-service.ts:817` calls `broadcastProcessingStatus(sessionId, true)`
3. **SSE broadcasts**`{ type: 'processing_status', processing: { session_id: '...', is_processing: true } }`
4. **React receives event**`useSSE.ts:90-104` adds sessionId to `processingSessions` Set
5. **State flows down**`App.tsx:92` converts Set size to boolean → `Header.tsx:31` receives `isProcessing={true}`
6. **CSS class applied**`className="logomark spinning"` triggers animation
7. **Logomark spins** → CSS animation `@keyframes spin` rotates 360° every 1.5s
8. **Feed shows skeleton**`Feed.tsx:66-80` creates skeleton cards for processing sessions
9. **Summary completes**`worker-service.ts:1153` calls `broadcastProcessingStatus(sessionId, false)`
10. **SSE broadcasts**`{ type: 'processing_status', processing: { session_id: '...', is_processing: false } }`
11. **React clears state**`useSSE.ts:90-104` removes sessionId from Set
12. **Animation stops**`isProcessing={false}` removes `spinning` class, logomark stops rotating
---
## File Summary
| File | Lines | Purpose |
|------|-------|---------|
| `src/ui/viewer/types.ts` | 45-58 | Defines `StreamEvent` interface with `processing_status` type |
| `src/services/worker-service.ts` | 247-285, 817, 1153, 1183 | Broadcasts processing status via SSE |
| `src/ui/viewer/hooks/useSSE.ts` | 12, 73-78, 90-104, 125 | Manages `processingSessions` Set from SSE events |
| `src/ui/viewer/App.tsx` | 20, 92 | Converts Set to boolean, passes to Header |
| `src/ui/viewer/components/Header.tsx` | 12, 24, 31 | Applies `spinning` class to logomark |
| `plugin/ui/viewer.html` (CSS) | 342-349, 701-705 | Styles logomark and defines spin animation |
| `src/ui/viewer/components/Feed.tsx` | 66-80, 104 | Creates skeleton cards for processing sessions |
| `src/ui/viewer/components/SummarySkeleton.tsx` | 14-17 | Renders inline spinner in skeleton cards |
| `plugin/ui/viewer.html` (CSS) | 682-740 | Styles for skeleton cards and inline spinner |
---
## Key Design Decisions
1. **Set vs Boolean**: Using a `Set<string>` for `processingSessions` allows tracking multiple concurrent sessions. The UI shows spinning as long as *any* session is processing.
2. **Defensive Cleanup**: Both `processing_status` events AND `new_summary` events clear processing state, ensuring the spinner stops even if events arrive out of order.
3. **CSS-Only Animation**: No JavaScript animation loops - pure CSS transforms provide smooth, GPU-accelerated rotation with minimal performance impact.
4. **Dual Indicators**: Header logomark (global processing state) + skeleton cards (per-session processing state) provide both overview and detail-level feedback.
5. **SSE Architecture**: Server-Sent Events provide real-time updates without polling, keeping UI responsive with minimal network overhead.
+303
View File
@@ -0,0 +1,303 @@
# Worker Service Refactor Plan
**Date**: 2025-11-06
**Based on**: worker-service-analysis.md
**Branch**: cleanup/worker
---
## Decisions Made
### 🔥🔥🔥🔥🔥 Critical Fixes
#### Issue #1: Fragile PM2 String Parsing
**Decision**: DELETE all PM2 status checking code
- Remove lines 54-98 in worker-utils.ts (PM2 list parsing)
- Replace with simple: health check → if unhealthy, restart → wait for health
- PM2 restart is idempotent - handles "not started" and "started but broken"
- Rationale: "Just ping localhost:37777" - if unhealthy, restart it
#### Issue #2: Silent PM2 Error Handling
**Decision**: AUTOMATICALLY RESOLVED by Issue #1
- Gets deleted with PM2 status checking code
- New approach naturally fails fast on execSync
#### Issue #3: Session Auto-Creation Duplication
**Decision**: EXTRACT to helper method
- Create `private getOrCreateSession(sessionDbId): ActiveSession`
- Remove 60+ lines of duplicated code from:
- handleInit() (lines 663-733)
- handleObservation() (lines 754-785)
- handleSummarize() (lines 813-844)
- Rationale: DRY principle
#### Issue #4: No "Running But Unhealthy" Handling
**Decision**: AUTOMATICALLY RESOLVED by Issue #1
- New approach always restarts if unhealthy
- PM2 restart handles all cases
#### Issue #5: Useless getWorkerPort() Wrapper
**Decision**: CREATE proper settings reader
- Delete the wrapper function
- Create settings reader that:
1. Reads from `~/.claude-mem/settings.json`
2. Falls back to `process.env.CLAUDE_MEM_WORKER_PORT`
3. Falls back to `37777`
- Rationale: UI writes to `~/.claude-mem/settings.json`, worker/hooks must read from there
---
### 🔥🔥🔥 Cleanup
#### Issue #6: 1500ms Debounce Too Long
**Decision**: SKIP - not a concern
#### Issue #7: Magic Numbers Throughout
**Decision**: DELETE unnecessary magic numbers, UNIFY required ones
- Remove hardcoded defaults that aren't needed
- Centralize remaining constants with named variables
- Locations:
- worker-utils.ts: timeout values (100ms, 1000ms, 10000ms)
- worker-service.ts: Line 997 (100ms), Line 109 ('50mb'), etc.
#### Issue #8: Configuration Duplication
**Decision**: AUTOMATICALLY RESOLVED by Issue #7
- Centralizing constants solves this
#### Issue #9: Hardcoded Model Validation
**Decision**: AUTOMATICALLY RESOLVED by Issue #7
- Delete hardcoded model list
- Let SDK handle validation
#### Issue #10: Hardcoded Version Fallback
**Decision**: READ from package.json
- Line 343: Replace `'5.0.3'` with dynamic read from package.json
- Rationale: Why hardcode a version that gets stale?
#### Issue #11: Unnecessary this.port Instance Variable
**Decision**: DELETE `this.port`
- worker-service.ts:100 - remove instance variable
- Replace all `this.port` uses with direct constant/settings reader
- Used at lines 351, 738, 742
---
## Implementation Plan
### Phase 1: worker-utils.ts Complete Rewrite
**File**: `src/shared/worker-utils.ts`
**Changes**:
1. Create settings reader function:
```typescript
function getWorkerPort(): number {
try {
const settingsPath = join(homedir(), '.claude-mem', 'settings.json');
if (existsSync(settingsPath)) {
const settings = JSON.parse(readFileSync(settingsPath, 'utf-8'));
const port = parseInt(settings.env?.CLAUDE_MEM_WORKER_PORT, 10);
if (!isNaN(port)) return port;
}
} catch {}
return parseInt(process.env.CLAUDE_MEM_WORKER_PORT || '37777', 10);
}
```
2. Add named constants:
```typescript
const HEALTH_CHECK_TIMEOUT_MS = 100;
const HEALTH_CHECK_POLL_INTERVAL_MS = 100;
const HEALTH_CHECK_MAX_WAIT_MS = 10000;
```
3. Simplify `ensureWorkerRunning()`:
```typescript
export async function ensureWorkerRunning(): Promise<void> {
if (await isWorkerHealthy()) return;
const packageRoot = getPackageRoot();
const pm2Path = path.join(packageRoot, "node_modules", ".bin", "pm2");
const ecosystemPath = path.join(packageRoot, "ecosystem.config.cjs");
execSync(`"${pm2Path}" restart "${ecosystemPath}"`, {
cwd: packageRoot,
stdio: 'pipe'
});
if (!await waitForWorkerHealth()) {
throw new Error("Worker failed to become healthy after restart");
}
}
```
4. Update `isWorkerHealthy()` and `waitForWorkerHealth()` to use constants
**Result**: ~50 lines (vs 110 original), all bugs fixed
---
### Phase 2: worker-service.ts Cleanup
**File**: `src/services/worker-service.ts`
**Changes**:
1. **Read version from package.json** (line 343):
```typescript
import { readFileSync } from 'fs';
import { join, dirname } from 'path';
import { fileURLToPath } from 'url';
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const packageJson = JSON.parse(readFileSync(join(__dirname, '../../package.json'), 'utf-8'));
const VERSION = packageJson.version;
```
2. **Extract getOrCreateSession() helper**:
```typescript
private getOrCreateSession(sessionDbId: number): ActiveSession {
let session = this.sessions.get(sessionDbId);
if (session) return session;
const db = new SessionStore();
const dbSession = db.getSessionById(sessionDbId);
if (!dbSession) {
db.close();
throw new Error(`Session ${sessionDbId} not found in database`);
}
session = {
sessionDbId,
claudeSessionId: dbSession.claude_session_id,
sdkSessionId: null,
project: dbSession.project,
userPrompt: dbSession.user_prompt,
pendingMessages: [],
abortController: new AbortController(),
generatorPromise: null,
lastPromptNumber: 0,
startTime: Date.now()
};
this.sessions.set(sessionDbId, session);
session.generatorPromise = this.runSDKAgent(session).catch(err => {
logger.failure('WORKER', 'SDK agent error', { sessionId: sessionDbId }, err);
const db = new SessionStore();
db.markSessionFailed(sessionDbId);
db.close();
this.sessions.delete(sessionDbId);
});
db.close();
return session;
}
```
3. **Update handleInit(), handleObservation(), handleSummarize()**:
Replace duplication with single line:
```typescript
const session = this.getOrCreateSession(sessionDbId);
```
4. **Delete model validation** (lines 407+):
Remove hardcoded validModels array and validation check
5. **Delete this.port instance variable** (line 100):
- Remove `private port: number = FIXED_PORT;`
- Replace all `this.port` references with `FIXED_PORT` or settings reader
6. **Add named constants** at top of file:
```typescript
const MESSAGE_POLL_INTERVAL_MS = 100;
const MAX_REQUEST_SIZE = '50mb';
```
7. **Use named constants** throughout (lines 109, 997, etc.)
---
### Phase 3: Update Hooks
**Files**:
- `src/hooks/new-hook.ts`
- `src/hooks/save-hook.ts`
- `src/hooks/summary-hook.ts`
- `src/hooks/cleanup-hook.ts`
**Changes**:
1. Import settings reader from worker-utils
2. Replace `const FIXED_PORT = parseInt(process.env.CLAUDE_MEM_WORKER_PORT || '37777', 10);`
with call to settings reader
3. Update cleanup-hook.ts line 74 to use settings reader as fallback
---
### Phase 4: Update user-message-hook.ts
**File**: `src/hooks/user-message-hook.ts`
**Changes**:
- Line 53: Replace hardcoded `http://localhost:37777/` with dynamic port from settings reader
---
## Files Changed
1. `src/shared/worker-utils.ts` - Complete rewrite (~50 lines)
2. `src/services/worker-service.ts` - Major cleanup (remove ~60 lines duplication, add helper)
3. `src/hooks/new-hook.ts` - Use settings reader
4. `src/hooks/save-hook.ts` - Use settings reader
5. `src/hooks/summary-hook.ts` - Use settings reader
6. `src/hooks/cleanup-hook.ts` - Use settings reader
7. `src/hooks/user-message-hook.ts` - Dynamic port in message
---
## Testing Checklist
After implementation:
- [ ] Build: `npm run build`
- [ ] Sync: `npm run sync-marketplace`
- [ ] Restart worker: `npm run worker:restart`
- [ ] Start new Claude Code session (hooks should work)
- [ ] Change port in UI settings to 38888
- [ ] Restart worker
- [ ] Verify worker binds to 38888
- [ ] Verify hooks connect to 38888
- [ ] Verify UI connects to 38888
- [ ] Change port back to 37777
- [ ] Test all endpoints work
---
## Expected Outcomes
**Lines Removed**: ~130 lines (60 from duplication, 70 from PM2 parsing)
**Lines Added**: ~50 lines (helper method, settings reader, constants)
**Net Change**: -80 lines
**Bugs Fixed**:
- ✅ PM2 string parsing false positives
- ✅ Silent error handling
- ✅ No restart when unhealthy
- ✅ Port configuration not synchronized with UI
**Code Quality**:
- ✅ DRY principle applied (no duplication)
- ✅ YAGNI principle applied (removed ceremony)
- ✅ Fail fast error handling
- ✅ Named constants instead of magic numbers
- ✅ Single source of truth for configuration
---
## Notes
- This plan addresses all Severity 5 and Severity 4 issues from the analysis
- Skipped Severity 2 issues that aren't actual problems (debounce timing)
- All "automatically resolved" issues are covered by the main fixes
- Settings synchronization bug (port not working) is now fixed
+907
View File
@@ -0,0 +1,907 @@
# Worker Service & Worker Utils: Comprehensive YAGNI Analysis
**Date**: 2025-11-06
**Files Analyzed**:
- `src/services/worker-service.ts` (1228 lines)
- `src/shared/worker-utils.ts` (110 lines)
**Overall Assessment**: 80% excellent architecture, 20% cleanup needed. Worker-service is well-structured with proper error handling priorities, but worker-utils contains critical bugs and YAGNI violations.
---
## Executive Summary
### What These Files Do
**worker-service.ts**: Long-running Express HTTP service managed by PM2. Handles AI compression of observations, session management, SSE streaming for web UI, and Chroma vector sync. This is the heart of claude-mem's async processing.
**worker-utils.ts**: Utilities for ensuring the worker is running. Called by hooks at session start to verify/start the PM2 worker process.
### Critical Findings
#### 🔥🔥🔥🔥🔥 SEVERITY 5 - MUST FIX IMMEDIATELY
1. **worker-utils.ts:75** - Fragile string parsing of PM2 output causes false positives
2. **worker-service.ts:754-844** - 60+ lines of identical session auto-creation code duplicated 3 times
3. **worker-utils.ts:70** - Silent error handling defers PM2 failures instead of failing fast
#### 🔥🔥🔥 SEVERITY 3 - FIX SOON
4. **worker-utils.ts:77-95** - No handling for "running but unhealthy" case
5. **worker-utils.ts:107-109** - Useless `getWorkerPort()` wrapper function
6. **worker-service.ts:316** - 1500ms debounce is 10x too long
#### 🔥🔥 SEVERITY 2 - CLEANUP WHEN CONVENIENT
7. Multiple magic numbers (100ms, 1000ms, 10000ms) without named constants
8. Hardcoded default values duplicated across multiple locations
9. Hardcoded model validation list that will become stale
---
## Complete Function Catalog
### worker-utils.ts Functions
| Function | Lines | Purpose | Status |
|----------|-------|---------|--------|
| `isWorkerHealthy(timeoutMs)` | 10-19 | Check /health endpoint responds | ✅ OK |
| `waitForWorkerHealth(maxWaitMs)` | 24-36 | Poll until worker healthy | 🔥 Inefficient timeout |
| `ensureWorkerRunning()` | 43-102 | Main orchestrator to start worker | 🔥🔥🔥🔥🔥 CRITICAL BUGS |
| `getWorkerPort()` | 107-109 | Returns FIXED_PORT constant | 🔥🔥🔥🔥🔥 DELETE THIS |
### worker-service.ts Functions
| Function | Lines | Purpose | Status |
|----------|-------|---------|--------|
| `findClaudePath()` | 35-65 | Find Claude Code executable | ✅ Excellent |
| Constructor | 107-139 | Setup Express routes | ✅ Good |
| `start()` | 141-173 | Start HTTP server, init Chroma | ✅ Excellent prioritization |
| `getUIDirectory()` | 178-189 | Get UI path (CJS/ESM) | ✅ Good defensive code |
| `handleHealth()` | 194-196 | GET /health | ✅ PERFECT |
| `handleViewerHTML()` | 201-211 | GET / | ✅ Good |
| `handleSSEStream()` | 216-245 | GET /stream (SSE) | ✅ Good |
| `broadcastSSE()` | 250-275 | Broadcast to clients | ✅ Excellent defensive code |
| `broadcastProcessingStatus()` | 280-286 | Broadcast processing state | ✅ Good |
| `checkAndStopSpinner()` | 291-318 | Debounced spinner stop | 🔥 1500ms too long |
| `handleStats()` | 323-365 | GET /api/stats | 🔥 Hardcoded paths/version |
| `handleGetSettings()` | 370-397 | GET /api/settings | 🔥 Duplicated defaults |
| `handlePostSettings()` | 402-461 | POST /api/settings | 🔥 Hardcoded model list |
| `handleGetObservations()` | 467-515 | GET /api/observations | ✅ Excellent |
| `handleGetSummaries()` | 517-576 | GET /api/summaries | ✅ Excellent |
| `handleGetPrompts()` | 578-631 | GET /api/prompts | ✅ Excellent |
| `handleGetProcessingStatus()` | 637-639 | GET /api/processing-status | ✅ Good |
| `handleInit()` | 645-744 | POST /sessions/:id/init | ✅ Good but has duplication |
| `handleObservation()` | 750-803 | POST /sessions/:id/observations | 🔥🔥🔥🔥🔥 MASSIVE DUPLICATION |
| `handleSummarize()` | 809-858 | POST /sessions/:id/summarize | 🔥🔥🔥🔥🔥 MASSIVE DUPLICATION |
| `handleComplete()` | 864-873 | POST /sessions/:id/complete | ✅ PERFECT |
| `handleStatus()` | 878-893 | GET /sessions/:id/status | ✅ Good |
| `runSDKAgent()` | 898-963 | Run SDK agent loop | ✅ Excellent |
| `createMessageGenerator()` | 969-1060 | Async generator for SDK | ✅ Excellent |
| `handleAgentMessage()` | 1066-1201 | Parse and store AI response | ✅ EXCELLENT |
| `main()` | 1205-1225 | Entry point + signals | ✅ Good |
---
## Line-by-Line Analysis
### worker-utils.ts
#### Lines 1-5: Imports and Constants
```typescript
const FIXED_PORT = parseInt(process.env.CLAUDE_MEM_WORKER_PORT || "37777", 10);
```
**What**: Parse port from env var with fallback to 37777
**Why**: Need to know which port to connect to
**Critique**: ✅ Good - simple constant, no unnecessary abstraction
---
#### Lines 10-19: `isWorkerHealthy(timeoutMs = 100)`
```typescript
async function isWorkerHealthy(timeoutMs: number = 100): Promise<boolean> {
try {
const response = await fetch(`http://127.0.0.1:${FIXED_PORT}/health`, {
signal: AbortSignal.timeout(timeoutMs)
});
return response.ok;
} catch {
return false;
}
}
```
**What**: Checks if /health endpoint responds within timeout
**Why**: Need to know if worker is running before trying to start it
**Critique**:
- Default 100ms is used once (line 45 initial check)
- Explicit 1000ms passed at line 29 (during startup polling)
- This inconsistency is actually INTENTIONAL: quick initial check vs. waiting for startup
-**VERDICT**: Reasonable pattern
**Why the two timeouts?**
- 100ms: "Is it already running?" (fast check, don't wait)
- 1000ms: "Is it starting up?" (wait for initialization)
---
#### Lines 24-36: `waitForWorkerHealth(maxWaitMs = 10000)`
```typescript
async function waitForWorkerHealth(maxWaitMs: number = 10000): Promise<boolean> {
const start = Date.now();
const checkInterval = 100; // Check every 100ms
while (Date.now() - start < maxWaitMs) {
if (await isWorkerHealthy(1000)) {
return true;
}
// Wait before next check
await new Promise(resolve => setTimeout(resolve, checkInterval));
}
return false;
}
```
**What**: Polls health endpoint every 100ms until healthy or timeout
**Why**: Worker takes time to start, need to wait
**Critique**:
🔥 **MAGIC NUMBER #1**: Line 26 `checkInterval = 100` - no units! Is this milliseconds? Should be `CHECK_INTERVAL_MS = 100`
🔥 **MAGIC NUMBER #2**: Line 29 `isWorkerHealthy(1000)` - why 1000ms timeout per check?
🔥 **INEFFICIENCY**: Each health check has 1000ms timeout, but we check every 100ms. If the worker is down, each check waits 1000ms to timeout. We could fail faster with a 100ms timeout since we retry quickly anyway.
**The Math**:
- Check interval: 100ms
- Health timeout: 1000ms
- If worker is down, first check fails after 1000ms, then we wait 100ms, then try again
- Total time to detect "worker is down" on first check: 1000ms (could be 100ms)
**RECOMMENDED**: Use 100ms timeout for health checks since we retry every 100ms anyway:
```typescript
const HEALTH_CHECK_TIMEOUT_MS = 100;
const HEALTH_CHECK_POLL_INTERVAL_MS = 100;
const HEALTH_CHECK_MAX_WAIT_MS = 10000;
async function waitForWorkerHealth(): Promise<boolean> {
const start = Date.now();
while (Date.now() - start < HEALTH_CHECK_MAX_WAIT_MS) {
if (await isWorkerHealthy(HEALTH_CHECK_TIMEOUT_MS)) return true;
await new Promise(resolve => setTimeout(resolve, HEALTH_CHECK_POLL_INTERVAL_MS));
}
return false;
}
```
---
#### Lines 43-102: `ensureWorkerRunning()` - 🔥🔥🔥🔥🔥 THE DISASTER ZONE
```typescript
export async function ensureWorkerRunning(): Promise<void> {
// First, check if worker is already healthy
if (await isWorkerHealthy()) {
return; // Worker is already running and responsive
}
const packageRoot = getPackageRoot();
const pm2Path = path.join(packageRoot, "node_modules", ".bin", "pm2");
const ecosystemPath = path.join(packageRoot, "ecosystem.config.cjs");
// Check PM2 status to see if worker process exists
const checkProcess = spawn(pm2Path, ["list", "--no-color"], {
cwd: packageRoot,
stdio: ["ignore", "pipe", "ignore"],
});
let output = "";
checkProcess.stdout?.on("data", (data) => {
output += data.toString();
});
// Wait for PM2 list to complete
await new Promise<void>((resolve, reject) => {
checkProcess.on("error", (error) => reject(error));
checkProcess.on("close", (code) => {
// PM2 list can fail, but we should still continue - just assume worker isn't running
// This handles cases where PM2 isn't installed yet
resolve();
});
});
// Check if 'claude-mem-worker' is in the PM2 list output and is 'online'
const isRunning = output.includes("claude-mem-worker") && output.includes("online");
if (!isRunning) {
// Start the worker
const startProcess = spawn(pm2Path, ["start", ecosystemPath], {
cwd: packageRoot,
stdio: "ignore",
});
// Wait for PM2 start command to complete
await new Promise<void>((resolve, reject) => {
startProcess.on("error", (error) => reject(error));
startProcess.on("close", (code) => {
if (code !== 0 && code !== null) {
reject(new Error(`PM2 start command failed with exit code ${code}`));
} else {
resolve();
}
});
});
}
// Wait for worker to become healthy (either just started or was starting)
const healthy = await waitForWorkerHealth(10000);
if (!healthy) {
throw new Error("Worker failed to become healthy after starting");
}
}
```
**What**: Ensure PM2 worker is running - check health, check PM2 status, start if needed, wait for health
**Why**: Hooks need worker running to process observations
#### 🔥🔥🔥🔥🔥 CRITICAL BUG #1: Fragile String Parsing (Line 75)
```typescript
const isRunning = output.includes("claude-mem-worker") && output.includes("online");
```
**THE PROBLEM**: This checks if BOTH strings exist ANYWHERE in the output. This is WRONG.
**Counter-Example**:
```
PM2 Process List:
┌─────┬────────────────────┬─────────┐
│ id │ name │ status │
├─────┼────────────────────┼─────────┤
│ 0 │ claude-mem-worker │ stopped │
│ 1 │ some-other-app │ online │
└─────┴────────────────────┴─────────┘
```
This would return `true` because output contains "claude-mem-worker" AND "online", even though the worker is STOPPED!
**Impact**:
- False positive: Worker is stopped, but code thinks it's running
- Result: Skip starting worker (line 77 `if (!isRunning)`), wait for health
- Health check fails because worker isn't actually running
- Entire function fails with "Worker failed to become healthy"
- User sees cryptic error instead of "Worker is stopped, restarting..."
**THE FIX**: Use PM2's JSON output
```typescript
const result = execSync(`"${pm2Path}" jlist`, { encoding: 'utf8' });
const processes = JSON.parse(result);
const worker = processes.find(p => p.name === 'claude-mem-worker');
const isRunning = worker?.pm2_env?.status === 'online';
```
#### 🔥🔥🔥🔥🔥 CRITICAL BUG #2: Silent Error Handling (Lines 65-72)
```typescript
await new Promise<void>((resolve, reject) => {
checkProcess.on("error", (error) => reject(error));
checkProcess.on("close", (code) => {
// PM2 list can fail, but we should still continue - just assume worker isn't running
// This handles cases where PM2 isn't installed yet
resolve(); // ← ALWAYS RESOLVES, NEVER REJECTS
});
});
```
**THE PROBLEM**:
1. If PM2 isn't installed, `pm2 list` fails
2. Line 70: ALWAYS resolves, ignoring the failure
3. `output` is empty string
4. Line 75: `isRunning = false` (correct by accident)
5. Line 77-94: Try to START the worker... which will ALSO fail because PM2 isn't installed
6. Line 85-93: THIS finally rejects with error
**Why This Is Terrible**:
- Defers error detection to the start command instead of failing fast
- Confusing error message: "PM2 start command failed" instead of "PM2 not found - run npm install"
- User wastes time waiting for PM2 list to fail, then waiting for PM2 start to fail
- The comment is a LIE: "we should still continue" - no, we shouldn't! If PM2 isn't installed, FAIL IMMEDIATELY.
**THE FIX**: Fail fast
```typescript
await new Promise<void>((resolve, reject) => {
checkProcess.on("error", reject);
checkProcess.on("close", (code) => {
if (code !== 0 && code !== null) {
reject(new Error(`PM2 not found - install dependencies first (npm install)`));
}
resolve();
});
});
```
#### 🔥🔥🔥🔥 CRITICAL BUG #3: No Handling for "Running But Unhealthy" (Lines 77-98)
**THE LOGIC**:
1. Line 45: Check if worker is healthy → NO (or we would have returned)
2. Line 54-75: Check if PM2 says worker is running
3. Line 77: `if (!isRunning)` → start the worker
4. Line 98: Wait for worker to become healthy
**THE PROBLEM**: What if PM2 says worker IS running but our health check (line 45) failed?
**Answer**: We do NOTHING. We skip the `if (!isRunning)` block and jump straight to line 98, waiting for it to become healthy.
**Why This Is Wrong**: If the worker is started but unhealthy, it won't magically heal itself. It needs to be RESTARTED.
**Scenarios**:
- Worker crashed but PM2 hasn't noticed yet → Status: "online", Health: failed → We wait forever
- Worker is in infinite loop → Status: "online", Health: timeout → We wait forever
- Worker port is wrong → Status: "online", Health: failed → We wait forever
**THE FIX**: Restart if unhealthy
```typescript
if (!await isWorkerHealthy()) {
// Not healthy - restart it (PM2 restart is idempotent)
execSync(`"${pm2Path}" restart "${ecosystemPath}"`);
if (!await waitForWorkerHealth()) {
throw new Error("Worker failed to become healthy after restart");
}
}
```
Or even simpler: Just always restart if health fails. PM2 handles "not started" vs "started" gracefully.
---
#### Lines 107-109: `getWorkerPort()` - 🔥🔥🔥🔥🔥 DELETE THIS
```typescript
/**
* Get the worker port number (fixed port)
*/
export function getWorkerPort(): number {
return FIXED_PORT;
}
```
**What**: Returns the FIXED_PORT constant
**Why**: ???
**Critique**: 🔥🔥🔥🔥🔥 **TEXTBOOK YAGNI VIOLATION**
This is the "wrapper function for a constant" anti-pattern from CLAUDE.md.
**THE PROBLEM**: This function adds ZERO value. It's pure ceremony.
**Callers should just**:
```typescript
import { FIXED_PORT } from './worker-utils.js';
// Use FIXED_PORT directly
```
**Instead of**:
```typescript
import { getWorkerPort } from './worker-utils.js';
const port = getWorkerPort(); // Why???
```
**Why This Exists**: Training bias. Code that looks "professional" often includes ceremonial getters for constants. But this is WRONG. Delete it and export the constant.
**THE FIX**:
```typescript
export const WORKER_PORT = parseInt(process.env.CLAUDE_MEM_WORKER_PORT || "37777", 10);
```
Then update all callers to use `WORKER_PORT` instead of `getWorkerPort()`.
---
### worker-utils.ts COMPLETE REWRITE
Here's what this file SHOULD be:
```typescript
import path from "path";
import { execSync } from "child_process";
import { getPackageRoot } from "./paths.js";
// Configuration
export const WORKER_PORT = parseInt(process.env.CLAUDE_MEM_WORKER_PORT || "37777", 10);
const HEALTH_CHECK_TIMEOUT_MS = 100;
const HEALTH_CHECK_POLL_INTERVAL_MS = 100;
const HEALTH_CHECK_MAX_WAIT_MS = 10000;
/**
* Check if worker is responsive by trying the health endpoint
*/
async function isWorkerHealthy(): Promise<boolean> {
try {
const response = await fetch(`http://127.0.0.1:${WORKER_PORT}/health`, {
signal: AbortSignal.timeout(HEALTH_CHECK_TIMEOUT_MS)
});
return response.ok;
} catch {
return false;
}
}
/**
* Wait for worker to become healthy, polling every 100ms
*/
async function waitForWorkerHealth(): Promise<boolean> {
const start = Date.now();
while (Date.now() - start < HEALTH_CHECK_MAX_WAIT_MS) {
if (await isWorkerHealthy()) return true;
await new Promise(resolve => setTimeout(resolve, HEALTH_CHECK_POLL_INTERVAL_MS));
}
return false;
}
/**
* Ensure worker service is running and healthy
* Restarts worker if not healthy (PM2 restart is idempotent)
*/
export async function ensureWorkerRunning(): Promise<void> {
if (await isWorkerHealthy()) return;
const packageRoot = getPackageRoot();
const pm2Path = path.join(packageRoot, "node_modules", ".bin", "pm2");
const ecosystemPath = path.join(packageRoot, "ecosystem.config.cjs");
// PM2 restart is idempotent - handles both "not started" and "started but broken"
try {
const result = execSync(`"${pm2Path}" restart "${ecosystemPath}"`, {
cwd: packageRoot,
encoding: 'utf8',
stdio: 'pipe'
});
if (!await waitForWorkerHealth()) {
throw new Error(`Worker failed to become healthy. PM2 output:\n${result}`);
}
} catch (error: any) {
if (error.code === 'ENOENT' || error.message.includes('not found')) {
throw new Error('PM2 not found - run: npm install');
}
throw error;
}
}
```
**Line Count**: 43 lines (vs 110 original)
**Complexity**: 1/3 of original
**Bugs Fixed**: All of them
**Ceremony Removed**: All of it
**What Changed**:
1. Removed `getWorkerPort()` wrapper - export constant directly
2. Removed PM2 status checking - just restart if unhealthy
3. Removed string parsing - use PM2's idempotent restart
4. Removed silent error handling - fail fast on PM2 not found
5. Named all magic numbers as constants
6. Simplified to: "Unhealthy? Restart. Wait for health. Done."
---
## worker-service.ts Analysis
### Overall Structure
**Lines 1-24**: Imports and constants ✅
**Lines 27-65**: `findClaudePath()` ✅ Excellent
**Lines 67-96**: Type definitions ✅
**Lines 98-1228**: WorkerService class
### Critical Issues in worker-service.ts
#### 🔥🔥🔥🔥🔥 ISSUE #1: Massive Code Duplication (Lines 754-844)
**THE PROBLEM**: Session auto-creation logic is COPIED THREE TIMES:
1. `handleInit()` (lines 663-733)
2. `handleObservation()` (lines 754-785)
3. `handleSummarize()` (lines 813-844)
**The Duplicated Code** (20+ lines per copy):
```typescript
let session = this.sessions.get(sessionDbId);
if (!session) {
const db = new SessionStore();
const dbSession = db.getSessionById(sessionDbId);
db.close();
session = {
sessionDbId,
claudeSessionId: dbSession!.claude_session_id,
sdkSessionId: null,
project: dbSession!.project,
userPrompt: dbSession!.user_prompt,
pendingMessages: [],
abortController: new AbortController(),
generatorPromise: null,
lastPromptNumber: 0,
startTime: Date.now()
};
this.sessions.set(sessionDbId, session);
session.generatorPromise = this.runSDKAgent(session).catch(err => {
logger.failure('WORKER', 'SDK agent error', { sessionId: sessionDbId }, err);
const db = new SessionStore();
db.markSessionFailed(sessionDbId);
db.close();
this.sessions.delete(sessionDbId);
});
}
```
**Impact**: 60+ lines of duplicated code across 3 functions
**THE FIX**: Extract to helper method
```typescript
private getOrCreateSession(sessionDbId: number): ActiveSession {
let session = this.sessions.get(sessionDbId);
if (session) return session;
const db = new SessionStore();
const dbSession = db.getSessionById(sessionDbId);
if (!dbSession) {
db.close();
throw new Error(`Session ${sessionDbId} not found in database`);
}
session = {
sessionDbId,
claudeSessionId: dbSession.claude_session_id,
sdkSessionId: null,
project: dbSession.project,
userPrompt: dbSession.user_prompt,
pendingMessages: [],
abortController: new AbortController(),
generatorPromise: null,
lastPromptNumber: 0,
startTime: Date.now()
};
this.sessions.set(sessionDbId, session);
// Start SDK agent in background
session.generatorPromise = this.runSDKAgent(session).catch(err => {
logger.failure('WORKER', 'SDK agent error', { sessionId: sessionDbId }, err);
const db = new SessionStore();
db.markSessionFailed(sessionDbId);
db.close();
this.sessions.delete(sessionDbId);
});
db.close();
return session;
}
```
Then all three functions become:
```typescript
private handleObservation(req: Request, res: Response): void {
const sessionDbId = parseInt(req.params.sessionDbId, 10);
const { tool_name, tool_input, tool_output, prompt_number } = req.body;
const session = this.getOrCreateSession(sessionDbId);
session.pendingMessages.push({
type: 'observation',
tool_name,
tool_input,
tool_output,
prompt_number
});
res.json({ status: 'queued', queueLength: session.pendingMessages.length });
}
```
**Savings**: Remove 60 lines, improve maintainability 10x
---
#### 🔥🔥 ISSUE #2: Magic Numbers Throughout
**Line 316**: `setTimeout(() => { ... }, 1500);` - Why 1500ms debounce?
**Line 997**: `setTimeout(resolve, 100)` - Why 100ms polling?
**Line 343**: `const version = process.env.npm_package_version || '5.0.3';` - Hardcoded fallback
**Line 109**: `express.json({ limit: '50mb' })` - Why 50mb?
**THE FIX**: Named constants
```typescript
const SPINNER_DEBOUNCE_MS = 200; // Debounce spinner to prevent flicker
const MESSAGE_POLL_INTERVAL_MS = 100; // Check for new messages every 100ms
const MAX_REQUEST_SIZE = '50mb'; // Allow large tool outputs
```
---
#### 🔥🔥 ISSUE #3: Configuration Duplication
Default values appear in multiple places:
- Line 377-380: Default settings in GET handler
- Line 22: MODEL default
- Throughout: Port defaults, observation count defaults
**THE FIX**: Centralize
```typescript
export const DEFAULT_CONFIG = {
MODEL: 'claude-haiku-4-5',
CONTEXT_OBSERVATIONS: 50,
WORKER_PORT: 37777,
VALID_MODELS: ['claude-haiku-4-5', 'claude-sonnet-4-5', 'claude-opus-4'],
MAX_CONTEXT_OBSERVATIONS: 200,
MIN_PORT: 1024,
MAX_PORT: 65535
} as const;
```
---
#### 🔥 ISSUE #4: Hardcoded Model Validation (Line 407)
```typescript
const validModels = ['claude-haiku-4-5', 'claude-sonnet-4-5', 'claude-opus-4'];
```
**THE PROBLEM**: This list will get stale when new models are released.
**YAGNI QUESTION**: Do we even need to validate? The SDK will error if model doesn't exist.
**ANSWER**: Better error messages for users. But this should be a WARNING, not a blocker.
**THE FIX**: Remove validation or make it advisory
```typescript
// Let SDK handle validation - it knows the current model list
// We don't need to duplicate that logic here
if (CLAUDE_MEM_MODEL) {
settings.env.CLAUDE_MEM_MODEL = CLAUDE_MEM_MODEL;
logger.info('WORKER', `Model changed to ${CLAUDE_MEM_MODEL}`, {});
}
```
---
### What worker-service.ts Does RIGHT ✅
#### 1. Excellent Error Handling Priority
```typescript
// Store to SQLite FIRST (source of truth)
const { id, createdAtEpoch } = db.storeObservation(...);
// Broadcast to SSE (real-time UI updates)
this.broadcastSSE({ type: 'new_observation', ... });
// Sync to Chroma ASYNC (fire-and-forget, non-critical)
this.chromaSync.syncObservation(...)
.catch((error: Error) => {
logger.error('...continuing', ...);
// Don't crash - SQLite has the data
});
```
**Priority**: SQLite > SSE > Chroma
**Philosophy**: Write to source of truth first, update UI second, sync to vector DB last. Chroma failures don't crash the worker.
#### 2. Clean Pagination APIs
All data endpoints follow consistent pattern:
- Parse `offset`, `limit`, `project` from query params
- Cap limit at 100 to prevent abuse
- Return `{ items, hasMore, total, offset, limit }`
- Use parameterized queries (SQL injection safe)
Example: `handleGetObservations()` (lines 467-515) is textbook good API design.
#### 3. Proper Async Generator Pattern
`createMessageGenerator()` (lines 969-1060) is an excellent implementation:
- Yields init prompt immediately
- Polls message queue with proper abort signal handling
- No busy-waiting (100ms sleep between polls)
- Clean message type discrimination
- Proper error propagation
#### 4. Defensive SSE Cleanup
`broadcastSSE()` (lines 250-275):
- Early return if no clients (optimization)
- Two-phase cleanup (collect failures, then remove)
- Doesn't modify Set during iteration
- Handles disconnected clients gracefully
This is GOOD defensive programming, not YAGNI violation.
---
## Severity-Ranked YAGNI Violations
### 🔥🔥🔥🔥🔥 SEVERITY 5: CRITICAL - FIX IMMEDIATELY
| Issue | File | Lines | Problem | Impact |
|-------|------|-------|---------|--------|
| Fragile string parsing | worker-utils | 75 | `output.includes("claude-mem-worker") && output.includes("online")` | False positives cause failures |
| Session auto-creation duplication | worker-service | 754-844 | 60+ lines copied 3 times | Maintenance nightmare |
| Silent PM2 error handling | worker-utils | 70 | Always resolves, defers errors | Confusing error messages |
### 🔥🔥🔥🔥 SEVERITY 4: MAJOR - FIX SOON
| Issue | File | Lines | Problem | Impact |
|-------|------|-------|---------|--------|
| No "running but unhealthy" handling | worker-utils | 77-98 | Skip restart if PM2 says running | Worker never recovers |
| Useless getWorkerPort() wrapper | worker-utils | 107-109 | Ceremony for a constant | Code bloat |
### 🔥🔥🔥 SEVERITY 3: MODERATE - FIX WHEN CONVENIENT
| Issue | File | Lines | Problem | Impact |
|-------|------|-------|---------|--------|
| 1500ms debounce too long | worker-service | 316 | Should be 100-200ms | Spinner lags |
| Hardcoded model validation | worker-service | 407 | List will get stale | Blocks valid models |
| Hardcoded fallback version | worker-service | 343 | '5.0.3' will get stale | Wrong stats |
### 🔥🔥 SEVERITY 2: MINOR - CLEANUP
| Issue | File | Lines | Problem | Impact |
|-------|------|-------|---------|--------|
| Magic numbers everywhere | Both | Multiple | 100, 1000, 1500, etc | Hard to maintain |
| Duplicated default configs | worker-service | Multiple | Defaults in many places | Inconsistency risk |
| Unnecessary this.port | worker-service | 100 | Should use FIXED_PORT | Confusion |
---
## Recommended Action Plan
### Phase 1: Critical Fixes (Do Today)
1. **Fix worker-utils.ts completely** - Use the rewrite provided above (43 lines)
- Remove getWorkerPort()
- Fix PM2 string parsing → use `pm2 restart` (idempotent)
- Remove silent error handling
- Named constants for all timeouts
2. **Extract getOrCreateSession()** in worker-service.ts
- Remove 60 lines of duplication
- Update handleInit, handleObservation, handleSummarize
### Phase 2: Cleanup (Do This Week)
3. **Centralize configuration**
- Create DEFAULT_CONFIG constant
- Remove duplicated defaults
- Update all references
4. **Fix magic numbers**
- SPINNER_DEBOUNCE_MS = 200
- MESSAGE_POLL_INTERVAL_MS = 100
- HEALTH_CHECK_TIMEOUT_MS = 100
- etc.
5. **Remove hardcoded validations**
- Model validation (let SDK handle it)
- Fallback version (read from package.json)
### Phase 3: Polish (Do Next Week)
6. **Fix minor issues**
- Remove `this.port` instance variable
- Update debounce to 200ms
- Add constants for all magic numbers
---
## The YAGNI Philosophy Applied
### What YAGNI Means Here
**You Aren't Gonna Need It**: Don't build infrastructure for problems you don't have.
### Examples from This Code
#### YAGNI Violation ❌
```typescript
export function getWorkerPort(): number {
return FIXED_PORT; // Wrapper for a constant
}
```
**Why**: Adds zero value. Pure ceremony. Just export the constant.
#### YAGNI Compliance ✅
```typescript
export const WORKER_PORT = parseInt(...);
```
**Why**: Solves the actual need (get port) without ceremony.
---
#### YAGNI Violation ❌
```typescript
// Check PM2 status with string parsing
const checkProcess = spawn(pm2Path, ["list", "--no-color"]);
let output = "";
checkProcess.stdout?.on("data", (data) => { output += data.toString(); });
// ... 30 lines of promise wrappers and parsing ...
const isRunning = output.includes("claude-mem-worker") && output.includes("online");
if (!isRunning) {
// Start worker
}
// But what if it's running AND unhealthy? Do nothing!
```
**Why**: Solving a problem that doesn't exist. PM2 restart is idempotent - it handles both "not started" and "started but broken". We don't need to distinguish.
#### YAGNI Compliance ✅
```typescript
if (!await isWorkerHealthy()) {
execSync(`pm2 restart ecosystem.config.cjs`);
await waitForWorkerHealth();
}
```
**Why**: Solves the actual problem (ensure worker is healthy) in the simplest way.
---
### The Pattern
**YAGNI Violations Follow This Pattern**:
1. Imagine a scenario ("what if PM2 isn't installed?")
2. Write defensive code for the scenario (silent error handling)
3. Defer the error to a later point
4. Make the actual error message worse
**YAGNI Compliance Follows This Pattern**:
1. Write the obvious solution (check health, restart if unhealthy)
2. Let errors propagate naturally
3. Add error handling only where actually needed
4. Keep error messages clear and direct
---
## Conclusion
### Overall Assessment
**worker-utils.ts**: 🔥🔥🔥🔥 2/5 - Needs complete rewrite
**worker-service.ts**: ✅✅✅✅🔥 4/5 - Mostly excellent, fix duplication
### The Good
- worker-service.ts has excellent architecture (SQLite > SSE > Chroma priority)
- Clean pagination APIs with proper parameterization
- Good async generator pattern for SDK streaming
- Proper SSE client management with defensive cleanup
- Non-blocking Chroma sync with graceful failures
### The Bad
- worker-utils.ts has 3 critical bugs (string parsing, silent errors, missing restart)
- 60+ lines of duplicated session auto-creation code
- Magic numbers everywhere without named constants
- Hardcoded defaults in multiple locations
### The Ugly
- `getWorkerPort()` is pure ceremony - delete it
- 1500ms debounce is 10x too long
- PM2 string parsing is fragile and will break
- Silent error handling makes debugging impossible
### Time to Fix
- Critical fixes (worker-utils rewrite + extract getOrCreateSession): **2 hours**
- Cleanup (centralize config, fix magic numbers): **2 hours**
- Polish (minor issues): **1 hour**
**Total**: 5 hours to bring codebase from 80% to 95% quality.
### Final Verdict
This code is **80% excellent, 20% disaster**. The disaster is concentrated in worker-utils.ts (which is called on EVERY session start) and the session auto-creation duplication (which makes maintenance painful). Fix these two issues and you have a rock-solid codebase.
The worker-service.ts architecture is actually brilliant - the prioritization of SQLite > SSE > Chroma is exactly right, and the async generator pattern for SDK streaming is textbook perfect. Don't let the duplication overshadow the good design.
**Recommendation**: Fix worker-utils.ts TODAY (it has production bugs), extract getOrCreateSession() THIS WEEK (it's painful to maintain), and clean up the rest NEXT WEEK.