97d565e3cd
* feat: add mem-search skill with progressive disclosure architecture Add comprehensive mem-search skill for accessing claude-mem's persistent cross-session memory database. Implements progressive disclosure workflow and token-efficient search patterns. Features: - 12 search operations (observations, sessions, prompts, by-type, by-concept, by-file, timelines, etc.) - Progressive disclosure principles to minimize token usage - Anti-patterns documentation to guide LLM behavior - HTTP API integration for all search functionality - Common workflows with composition examples Structure: - SKILL.md: Entry point with temporal trigger patterns - principles/: Progressive disclosure + anti-patterns - operations/: 12 search operation files 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * docs: add CHANGELOG entry for mem-search skill Document mem-search skill addition in Unreleased section with: - 100% effectiveness compliance metrics - Comparison to previous search skill implementation - Progressive disclosure architecture details - Reference to audit report documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * docs: add mem-search skill audit report Add comprehensive audit report validating mem-search skill against Anthropic's official skill-creator documentation. Report includes: - Effectiveness metrics comparison (search vs mem-search) - Critical issues analysis for production readiness - Compliance validation across 6 key dimensions - Reference implementation guidance Result: mem-search achieves 100% compliance vs search's 67% 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * feat: Add comprehensive search architecture analysis document - Document current state of dual search architectures (HTTP API and MCP) - Analyze HTTP endpoints and MCP search server architectures - Identify DRY violations across search implementations - Evaluate the use of curl as the optimal approach for search - Provide architectural recommendations for immediate and long-term improvements - Outline action plan for cleanup, feature parity, DRY refactoring * refactor: Remove deprecated search skill documentation and operations * refactor: Reorganize documentation into public and context directories Changes: - Created docs/public/ for Mintlify documentation (.mdx files) - Created docs/context/ for internal planning and implementation docs - Moved all .mdx files and assets to docs/public/ - Moved all internal .md files to docs/context/ - Added CLAUDE.md to both directories explaining their purpose - Updated docs.json paths to work with new structure Benefits: - Clear separation between user-facing and internal documentation - Easier to maintain Mintlify docs in dedicated directory - Internal context files organized separately 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Enhance session management and continuity in hooks - Updated new-hook.ts to clarify session_id threading and idempotent session creation. - Modified prompts.ts to require claudeSessionId for continuation prompts, ensuring session context is maintained. - Improved SessionStore.ts documentation on createSDKSession to emphasize idempotent behavior and session connection. - Refined SDKAgent.ts to detail continuation prompt logic and its reliance on session.claudeSessionId for unified session handling. --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Alex Newman <thedotmack@gmail.com>
259 lines
10 KiB
Markdown
259 lines
10 KiB
Markdown
**Claude finished @thedotmack's task** —— [View job](https://github.com/thedotmack/claude-mem/actions/runs/19080917196)
|
|
|
|
---
|
|
### Code Review Complete
|
|
|
|
This PR implements significant architectural improvements focused on production stability. The refactoring successfully simplifies the worker service and eliminates several failure modes. Below is my detailed review:
|
|
|
|
---
|
|
|
|
### ✅ Strengths
|
|
|
|
#### 1. **Excellent Simplification of Worker Service Startup** (src/services/worker-service.ts:115-147)
|
|
The new startup sequence is a major improvement:
|
|
- HTTP server starts **first** before any blocking operations
|
|
- ChromaSync initialization is deferred until after HTTP is ready
|
|
- Chroma backfill runs in background (non-blocking)
|
|
- Failures no longer crash the entire service
|
|
|
|
**Before**: ChromaSync initialized in constructor → Chroma backfill blocks startup → Any failure prevents HTTP server from starting
|
|
|
|
**After**: HTTP server starts immediately → ChromaSync initialized → Chroma backfill in background → Service always available
|
|
|
|
This is excellent engineering. The worker will now be responsive even if Chroma has issues.
|
|
|
|
#### 2. **Smart Executable Path Caching** (src/services/worker-service.ts:22-61)
|
|
Adding `cachedClaudePath` prevents repeated `execSync` calls on every SDK agent initialization. This reduces filesystem overhead and improves performance. Good optimization.
|
|
|
|
#### 3. **Improved Error Handling in Hooks** (src/hooks/save-hook.ts:84-91, new-hook.ts:62-69, summary-hook.ts:65-72)
|
|
The error handling now distinguishes between connection errors vs HTTP errors:
|
|
```typescript
|
|
if (error.cause?.code === 'ECONNREFUSED' || error.name === 'TimeoutError' || error.message.includes('fetch failed')) {
|
|
throw new Error("There's a problem with the worker. If you just updated, type `pm2 restart claude-mem-worker` in your terminal to continue");
|
|
}
|
|
```
|
|
This provides actionable error messages to users instead of generic failures.
|
|
|
|
#### 4. **Dramatic PM2 Configuration Simplification** (ecosystem.config.cjs)
|
|
The PM2 config went from 44 lines with complex error/output redirection down to 26 lines. This is exactly right - PM2's defaults are sensible and the previous configuration was adding complexity without clear benefit.
|
|
|
|
#### 5. **Worker Management Simplification** (src/shared/worker-utils.ts)
|
|
Removed the async health check system (124 lines → 48 lines). The new synchronous approach using PM2 list output is simpler and more reliable:
|
|
- No HTTP health checks that can timeout
|
|
- No retry logic with arbitrary delays
|
|
- Just checks if PM2 reports the process as 'online'
|
|
|
|
---
|
|
|
|
### ⚠️ Issues & Concerns
|
|
|
|
#### 1. **Race Condition in Worker Health Check** (src/shared/worker-utils.ts:28-39) - **CRITICAL**
|
|
|
|
There's a race condition in the worker startup check:
|
|
|
|
```typescript
|
|
checkProcess.on('close', (code) => {
|
|
const isRunning = output.includes('claude-mem-worker') && output.includes('online');
|
|
|
|
if (!isRunning) {
|
|
// Only start if not already running
|
|
spawn(pm2Path, ['start', ecosystemPath], {
|
|
cwd: packageRoot,
|
|
stdio: 'ignore'
|
|
});
|
|
}
|
|
});
|
|
```
|
|
|
|
**Problem**: The `spawn()` call inside the `close` event handler is non-blocking, but the function returns immediately. The hooks calling `ensureWorkerRunning()` will continue execution **before** the worker has actually started.
|
|
|
|
**Impact**: Hooks may attempt to make HTTP requests to the worker before it's listening, resulting in connection failures.
|
|
|
|
**Suggested Fix**: Either:
|
|
- Make `ensureWorkerRunning()` async and await the health check
|
|
- Add a brief health check retry loop after spawning (like the old implementation had)
|
|
- Document that this is fire-and-forget and hooks should handle ECONNREFUSED
|
|
|
|
#### 2. **Removed Health Endpoint Information** (src/services/worker-service.ts:152-154)
|
|
|
|
The health endpoint was simplified from:
|
|
```typescript
|
|
// OLD
|
|
res.json({
|
|
status: 'ok',
|
|
port: this.port,
|
|
pid: process.pid,
|
|
activeSessions: this.sessions.size,
|
|
chromaSynced: true
|
|
});
|
|
```
|
|
|
|
To:
|
|
```typescript
|
|
// NEW
|
|
res.json({ status: 'ok' });
|
|
```
|
|
|
|
**Concern**: This removes useful debugging information. When troubleshooting production issues, knowing the PID, active sessions count, and port would be valuable. The health endpoint is typically used for diagnostics, not just uptime checks.
|
|
|
|
**Suggestion**: Consider keeping at least `pid` and `activeSessions` in the health response for observability.
|
|
|
|
#### 3. **Auto-Session Creation Without Validation** (src/services/worker-service.ts:252-280, 309-336)
|
|
|
|
Both `handleObservation` and `handleSummarize` auto-create sessions if they don't exist:
|
|
|
|
```typescript
|
|
if (!session) {
|
|
// Auto-create session if it doesn't exist (e.g., worker restarted)
|
|
const db = new SessionStore();
|
|
const dbSession = db.getSessionById(sessionDbId);
|
|
db.close();
|
|
|
|
session = {
|
|
sessionDbId,
|
|
claudeSessionId: dbSession!.claude_session_id, // ⚠️ Non-null assertion
|
|
// ...
|
|
};
|
|
}
|
|
```
|
|
|
|
**Concerns**:
|
|
- Uses non-null assertion (`dbSession!`) without checking if `dbSession` is actually null
|
|
- If `getSessionById()` returns null (session doesn't exist in DB), this will throw at runtime
|
|
- The error won't be caught and will crash the handler
|
|
|
|
**Suggested Fix**:
|
|
```typescript
|
|
const dbSession = db.getSessionById(sessionDbId);
|
|
if (!dbSession) {
|
|
db.close();
|
|
res.status(404).json({ error: 'Session not found in database' });
|
|
return;
|
|
}
|
|
```
|
|
|
|
This validation already exists in `handleInit` (lines 167-173), so it should be added to the other handlers too.
|
|
|
|
#### 4. **Removed Observation Counter** (src/services/worker-service.ts:80)
|
|
|
|
The `ActiveSession` interface no longer tracks `observationCounter`:
|
|
```typescript
|
|
// REMOVED: observationCounter: number; // Counter for correlation IDs
|
|
```
|
|
|
|
**Question**: Was this used for generating correlation IDs for logging? If so, is there now no way to correlate observations within a session for debugging?
|
|
|
|
**Impact**: Minor - mainly affects debugging/logging. Not a functional issue.
|
|
|
|
#### 5. **PM2 Watch Mode in Production** (ecosystem.config.cjs:16)
|
|
|
|
The config enables watch mode:
|
|
```javascript
|
|
watch: true,
|
|
```
|
|
|
|
**Concern**: Watch mode causes PM2 to restart the process whenever files change. This is useful during development but potentially problematic in production:
|
|
- Could cause unexpected restarts if files are touched
|
|
- May interrupt active sessions
|
|
- Increases resource usage (file watching)
|
|
|
|
**Suggestion**: Consider disabling watch mode by default, or document that users should disable it in production if they're deploying this.
|
|
|
|
#### 6. **Duplicate Port Constant** (Multiple Files)
|
|
|
|
The `FIXED_PORT` constant is defined in multiple places:
|
|
- `src/services/worker-service.ts:20`
|
|
- `src/shared/worker-utils.ts:5`
|
|
- `src/hooks/save-hook.ts:54`
|
|
- `src/hooks/new-hook.ts:47`
|
|
- `src/hooks/summary-hook.ts:39`
|
|
|
|
**Concern**: This creates maintenance burden and risk of inconsistency. If the port needs to change, it must be updated in 5 places.
|
|
|
|
**Suggestion**: Export `FIXED_PORT` from a single shared module (e.g., `worker-utils.ts`) and import it everywhere else.
|
|
|
|
---
|
|
|
|
### 🔍 Minor Issues
|
|
|
|
#### 7. **Error Handling Inconsistency in Chroma Sync** (src/services/worker-service.ts:220-223)
|
|
|
|
```typescript
|
|
).catch(err => {
|
|
logger.failure('WORKER', 'Failed to sync user_prompt to Chroma - continuing', { promptId: latestPrompt.id }, err);
|
|
// Don't crash - SQLite has the data
|
|
});
|
|
```
|
|
|
|
The logger method is `failure` here but `error` elsewhere (lines 625, 664). For consistency, these should all use the same log level for Chroma sync failures.
|
|
|
|
#### 8. **Type Safety in Error Handling** (src/hooks/save-hook.ts:84)
|
|
|
|
```typescript
|
|
} catch (error: any) {
|
|
```
|
|
|
|
Using `any` defeats type safety. Consider using `unknown` and type guards:
|
|
```typescript
|
|
} catch (error: unknown) {
|
|
const err = error as Error & { cause?: { code?: string } };
|
|
if (err.cause?.code === 'ECONNREFUSED' || err.name === 'TimeoutError' || err.message.includes('fetch failed')) {
|
|
// ...
|
|
}
|
|
}
|
|
```
|
|
|
|
This pattern is used in all three hook files.
|
|
|
|
---
|
|
|
|
### 📋 Documentation Cleanup
|
|
|
|
The deletion of obsolete documentation files is appropriate:
|
|
- `EXPERIMENTAL_RELEASE_NOTES.md` (331 lines)
|
|
- `FEATURE_PLAN_HYBRID_SEARCH.md` (486 lines)
|
|
- `GITHUB_RELEASE_TEMPLATE.md` (83 lines)
|
|
- `IMPLEMENTATION_STATUS.md` (503 lines)
|
|
- `NEXT_SESSION_PROMPT.md` (193 lines)
|
|
- `docs/coderabbit-PR-41.md` (314 lines)
|
|
|
|
These appear to be planning documents and old release artifacts. Good cleanup, though consider archiving rather than deleting if there's historical value.
|
|
|
|
---
|
|
|
|
### 🎯 Testing Recommendations
|
|
|
|
In addition to the testing recommendations in the PR description:
|
|
|
|
1. **Test race condition scenario**: Start a fresh session immediately after a `pm2 stop` to verify hooks handle ECONNREFUSED gracefully
|
|
2. **Test worker restart under load**: Restart worker while observations are being processed to verify auto-session-creation works
|
|
3. **Test Chroma failures**: Disconnect/break Chroma to verify worker stays alive and logs errors appropriately
|
|
4. **Verify PM2 watch mode**: Ensure watch mode doesn't cause problems with the data directory or node_modules changes
|
|
|
|
---
|
|
|
|
### ✅ Overall Assessment
|
|
|
|
**Verdict**: **Approve with recommendations**
|
|
|
|
This PR represents a significant improvement in production stability and code maintainability. The architectural changes are sound and address real production failure modes. The primary concerns are:
|
|
|
|
1. **Race condition in worker startup** (Critical - should be fixed)
|
|
2. **Missing null checks in auto-session creation** (High - could cause crashes)
|
|
3. **Reduced observability in health endpoint** (Medium - impacts debugging)
|
|
4. **Watch mode in production** (Low - document or disable)
|
|
|
|
The simplifications are well-justified and the error handling improvements make the system more resilient. Once the critical race condition and null check issues are addressed, this will be a solid improvement.
|
|
|
|
**Lines of code review**: The 1,855 line reduction (2,335 deletions - 480 additions) accurately reflects meaningful simplification rather than just removing features.
|
|
|
|
---
|
|
|
|
#### Todo List
|
|
- [x] Read and analyze core service files (worker-service.ts, worker-utils.ts)
|
|
- [x] Review hook changes (save-hook.ts, new-hook.ts, summary-hook.ts, cleanup-hook.ts)
|
|
- [x] Review PM2 configuration changes (ecosystem.config.cjs)
|
|
- [x] Analyze architectural changes and potential issues
|
|
- [x] Provide comprehensive review feedback
|
|
|
|
--- |