Fixes blank console windows appearing on Windows 11 when spawning the worker process. ## Problem On Windows, `windowsHide: true` doesn't work when combined with `detached: true` in child_process.spawn(). This is a Node.js limitation (nodejs/node#21825) that **Bun also inherits** because Bun uses Node.js process spawning semantics. Result: Blank console windows with "claude" title appear during claude-mem operations. ## Solution Use PowerShell's `Start-Process -WindowStyle Hidden` on Windows to properly hide console windows. Unix platforms continue using standard `spawn()` with `detached: true`. ## Testing Validated by ToxMox on Windows 11 in PR #315: - windowsHide approach: ❌ Still shows blank consoles - PowerShell approach: ✅ Properly hides windows ## Implementation ```typescript // Windows: PowerShell workaround Start-Process -FilePath 'bun' -ArgumentList '${script}' -WindowStyle Hidden // Unix: Standard spawn (works fine) spawn(bunPath, [script], { detached: true }) ``` ## Notes - Affects BOTH Bun and Node.js runtimes on Windows - This is a **high-priority fix** for Windows users - Keeps Bun runtime (doesn't address zombie socket issue) - Zombie socket issue is separate and requires different solution ## References - Issue: #304 (Multiple visible console windows on Windows 11) - Testing: PR #315 (ToxMox's detailed analysis) - Node.js bug: nodejs/node#21825 - Extracted from: PR #335 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
4.9 KiB
⏺ PR #335 Review Summary: Queue Monitoring UI + Windows Fixes
What PR #335 Does
Additions: 3,724 lines Author: ToxMox Main Changes:
- Switch worker runtime from Bun → Node.js (all platforms)
- Add persistent message queue in SQLite (pending_messages table)
- Add queue monitoring UI with drawer interface
- Add watchdog service for auto-recovery (30s polling)
- Fix Windows console popups using PowerShell Start-Process -WindowStyle Hidden
Initial Concerns (My Review)
Raised red flags about:
- Lack of evidence for "zombie socket" issue (no GitHub issue, only ToxMox reported)
- Over-engineering: Full persistent queue + watchdog + retry logic + UI for unproven problems
- Mixing multiple concerns in one PR (should be 3-4 separate PRs)
- No automated tests for complex state machine logic
- Global runtime change (Bun→Node) affects all platforms for Windows-specific issue
- Command injection vulnerability in PowerShell string (ProcessManager.ts:67)
What We Discovered
- Problems ARE Real & Documented
- Found detailed analysis in PR #315 comments by ToxMox
- Zombie socket issue has upstream Bun GitHub issues linked:
- oven-sh/bun#12127, #5774, #8786
- windowsHide: true doesn't work with detached: true (Node.js bug #21825)
- SDK subprocess hangs documented with testing details
- Queue UI Has Real Value
- You saw video demo and said it's "gorgeous and helpful"
- Provides visibility into worker activity
- Recovery controls prevent user frustration
- Real-time updates via existing SSE infrastructure
- Architecture Makes Sense
Why persistent worker is needed:
- Real-time UI at http://localhost:37777 requires persistent process
- SSE (Server-Sent Events) for live updates
- Can't do on-demand worker if UI needs to be always available
Why queue in database is justified:
- Transactional consistency (save observation + enqueue atomically)
- Relational queries (JOIN with sessions/projects)
- Foreign key cascades (session deleted → queue entries auto-cleaned)
- Already have SQLite infrastructure
- Storage Optimization Strategy
Smart cleanup approach (your insight):
- Keep full data while pending/processing (needed for retry)
- Clear payloads immediately on completion: Set tool_input, tool_response, last_user_message, last_assistant_message to NULL
- Keep lightweight metadata for "Recently Processed" UI
- Eventually delete old completed records (after 1hr or >100 count)
- Result: 100x storage reduction while keeping UI history
Rejected approach: Linking to transcript files (overly complex, YAGNI)
Critical Insight: Bun → Node Switch Likely Unnecessary
Your final assessment: "honestly thats more an llm hallucinating an overengineered solution based on incorrect data that probably could be solved by just killing the process correctly"
The real issue is probably:
- Missing cleanup handlers (server.close() before exit)
- Process killed too fast (SIGKILL before cleanup finishes)
- Not receiving SIGTERM properly
- No registered signal handlers for graceful shutdown
Simple fix to try FIRST: const server = app.listen(port);
async function cleanup() { server.close(); // Close server sessionManager.abortAll(); // Stop active work db.close(); // Close DB process.exit(0); }
process.on('SIGTERM', cleanup); process.on('SIGINT', cleanup);
Final Assessment
PR #335 is mostly solid with real benefits, BUT:
✅ What's Good:
- Queue UI provides real value
- Persistent queue in DB is architecturally justified
- Auto-recovery prevents stuck sessions
- Problems are real and documented
- Comprehensive solution to multiple pain points
⚠️ What Needs Validation:
- Bun zombie socket issue - Only ToxMox reported, not validated by you
- Proper cleanup handlers - Try fixing process termination before switching runtimes
- Platform-specific runtime - If Bun issue is real, use Node only on Windows, keep Bun on Mac/Linux
🔧 What Needs Fixing:
- Add tests - Zero automated tests for complex state machine
- Fix command injection - ProcessManager.ts:67 PowerShell string interpolation
- Implement payload cleanup - Clear heavy data immediately on completion
- Try simple fix first - Proper signal handlers before runtime switch
Action Items for Next Session
- Ask ToxMox: Did you try proper cleanup handlers before switching runtimes?
- Suggest: Platform-specific runtime (Bun on Unix, Node on Windows only if needed)
- Request: Reproduction steps for zombie socket issue
- Require: Basic tests before merge
- Fix: Command injection vulnerability
- Consider: Splitting into separate PRs (optional, not required)
Key Takeaway
The PR solves real problems with solid architecture, but the Bun→Node switch is likely over-engineered. Try proper process cleanup first.