Improve error handling and logging across worker services (#528)
* fix: prevent memory_session_id from equaling content_session_id The bug: memory_session_id was initialized to contentSessionId as a "placeholder for FK purposes". This caused the SDK resume logic to inject memory agent messages into the USER's Claude Code transcript, corrupting their conversation history. Root cause: - SessionStore.createSDKSession initialized memory_session_id = contentSessionId - SDKAgent checked memorySessionId !== contentSessionId but this check only worked if the session was fetched fresh from DB The fix: - SessionStore: Initialize memory_session_id as NULL, not contentSessionId - SDKAgent: Simple truthy check !!session.memorySessionId (NULL = fresh start) - Database migration: Ran UPDATE to set memory_session_id = NULL for 1807 existing sessions that had the bug Also adds [ALIGNMENT] logging across the session lifecycle to help debug session continuity issues: - Hook entry: contentSessionId + promptNumber - DB lookup: contentSessionId → memorySessionId mapping proof - Resume decision: shows which memorySessionId will be used for resume - Capture: logs when memorySessionId is captured from first SDK response UI: Added "Alignment" quick filter button in LogsModal to show only alignment logs for debugging session continuity. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: improve error handling in worker-service.ts - Fix GENERIC_CATCH anti-patterns by logging full error objects instead of just messages - Add [ANTI-PATTERN IGNORED] markers for legitimate cases (cleanup, hot paths) - Simplify error handling comments to be more concise - Improve httpShutdown() error discrimination for ECONNREFUSED - Reduce LARGE_TRY_BLOCK issues in initialization code Part of anti-pattern cleanup plan (132 total issues) * refactor: improve error logging in SearchManager.ts - Pass full error objects to logger instead of just error.message - Fixes PARTIAL_ERROR_LOGGING anti-patterns (10 instances) - Better debugging visibility when Chroma queries fail Part of anti-pattern cleanup (133 remaining) * refactor: improve error logging across SessionStore and mcp-server - SessionStore.ts: Fix error logging in column rename utility - mcp-server.ts: Log full error objects instead of just error.message - Improve error handling in Worker API calls and tool execution Part of anti-pattern cleanup (133 remaining) * Refactor hooks to streamline error handling and loading states - Simplified error handling in useContextPreview by removing try-catch and directly checking response status. - Refactored usePagination to eliminate try-catch, improving readability and maintaining error handling through response checks. - Cleaned up useSSE by removing unnecessary try-catch around JSON parsing, ensuring clarity in message handling. - Enhanced useSettings by streamlining the saving process, removing try-catch, and directly checking the result for success. * refactor: add error handling back to SearchManager Chroma calls - Wrap queryChroma calls in try-catch to prevent generator crashes - Log Chroma errors as warnings and fall back gracefully - Fixes generator failures when Chroma has issues - Part of anti-pattern cleanup recovery * feat: Add generator failure investigation report and observation duplication regression report - Created a comprehensive investigation report detailing the root cause of generator failures during anti-pattern cleanup, including the impact, investigation process, and implemented fixes. - Documented the critical regression causing observation duplication due to race conditions in the SDK agent, outlining symptoms, root cause analysis, and proposed fixes. * fix: address PR #528 review comments - atomic cleanup and detector improvements This commit addresses critical review feedback from PR #528: ## 1. Atomic Message Cleanup (Fix Race Condition) **Problem**: SessionRoutes.ts generator error handler had race condition - Queried messages then marked failed in loop - If crash during loop → partial marking → inconsistent state **Solution**: - Added `markSessionMessagesFailed()` to PendingMessageStore.ts - Single atomic UPDATE statement replaces loop - Follows existing pattern from `resetProcessingToPending()` **Files**: - src/services/sqlite/PendingMessageStore.ts (new method) - src/services/worker/http/routes/SessionRoutes.ts (use new method) ## 2. Anti-Pattern Detector Improvements **Problem**: Detector didn't recognize logger.failure() method - Lines 212 & 335 already included "failure" - Lines 112-113 (PARTIAL_ERROR_LOGGING detection) did not **Solution**: Updated regex patterns to include "failure" for consistency **Files**: - scripts/anti-pattern-test/detect-error-handling-antipatterns.ts ## 3. Documentation **PR Comment**: Added clarification on memory_session_id fix location - Points to SessionStore.ts:1155 - Explains why NULL initialization prevents message injection bug ## Review Response Addresses "Must Address Before Merge" items from review: ✅ Clarified memory_session_id bug fix location (via PR comment) ✅ Made generator error handler message cleanup atomic ❌ Deferred comprehensive test suite to follow-up PR (keeps PR focused) ## Testing - Build passes with no errors - Anti-pattern detector runs successfully - Atomic cleanup follows proven pattern from existing methods 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: FOREIGN KEY constraint and missing failed_at_epoch column Two critical bugs fixed: 1. Missing failed_at_epoch column in pending_messages table - Added migration 20 to create the column - Fixes error when trying to mark messages as failed 2. FOREIGN KEY constraint failed when storing observations - All three agents (SDK, Gemini, OpenRouter) were passing session.contentSessionId instead of session.memorySessionId - storeObservationsAndMarkComplete expects memorySessionId - Added null check and clear error message However, observations still not saving - see investigation report. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Refactor hook input parsing to improve error handling - Added a nested try-catch block in new-hook.ts, save-hook.ts, and summary-hook.ts to handle JSON parsing errors more gracefully. - Replaced direct error throwing with logging of the error details using logger.error. - Ensured that the process exits cleanly after handling input in all three hooks. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -3,11 +3,11 @@ import{stdin as y}from"process";var U=JSON.stringify({continue:!0,suppressOutput
|
||||
${t.stack}`:t.message;if(Array.isArray(t))return`[${t.length} items]`;let r=Object.keys(t);return r.length===0?"{}":r.length<=3?JSON.stringify(t):`{${r.length} keys: ${r.slice(0,3).join(", ")}...}`}return String(t)}formatTool(t,r){if(!r)return t;let e=typeof r=="string"?JSON.parse(r):r;if(t==="Bash"&&e.command)return`${t}(${e.command})`;if(e.file_path)return`${t}(${e.file_path})`;if(e.notebook_path)return`${t}(${e.notebook_path})`;if(t==="Glob"&&e.pattern)return`${t}(${e.pattern})`;if(t==="Grep"&&e.pattern)return`${t}(${e.pattern})`;if(e.url)return`${t}(${e.url})`;if(e.query)return`${t}(${e.query})`;if(t==="Task"){if(e.subagent_type)return`${t}(${e.subagent_type})`;if(e.description)return`${t}(${e.description})`}return t==="Skill"&&e.skill?`${t}(${e.skill})`:t==="LSP"&&e.operation?`${t}(${e.operation})`:t}formatTimestamp(t){let r=t.getFullYear(),e=String(t.getMonth()+1).padStart(2,"0"),n=String(t.getDate()).padStart(2,"0"),o=String(t.getHours()).padStart(2,"0"),i=String(t.getMinutes()).padStart(2,"0"),E=String(t.getSeconds()).padStart(2,"0"),l=String(t.getMilliseconds()).padStart(3,"0");return`${r}-${e}-${n} ${o}:${i}:${E}.${l}`}log(t,r,e,n,o){if(t<this.getLevel())return;let i=this.formatTimestamp(new Date),E=M[t].padEnd(5),l=r.padEnd(6),c="";n?.correlationId?c=`[${n.correlationId}] `:n?.sessionId&&(c=`[session-${n.sessionId}] `);let g="";o!=null&&(o instanceof Error?g=this.getLevel()===0?`
|
||||
${o.message}
|
||||
${o.stack}`:` ${o.message}`:this.getLevel()===0&&typeof o=="object"?g=`
|
||||
`+JSON.stringify(o,null,2):g=" "+this.formatData(o));let O="";if(n){let{sessionId:D,memorySessionId:q,correlationId:z,...m}=n;Object.keys(m).length>0&&(O=` {${Object.entries(m).map(([P,$])=>`${P}=${$}`).join(", ")}}`)}let C=`[${i}] [${E}] [${l}] ${c}${e}${O}${g}`;if(this.logFilePath)try{W(this.logFilePath,C+`
|
||||
`+JSON.stringify(o,null,2):g=" "+this.formatData(o));let u="";if(n){let{sessionId:D,memorySessionId:q,correlationId:z,...m}=n;Object.keys(m).length>0&&(u=` {${Object.entries(m).map(([P,$])=>`${P}=${$}`).join(", ")}}`)}let C=`[${i}] [${E}] [${l}] ${c}${e}${u}${g}`;if(this.logFilePath)try{W(this.logFilePath,C+`
|
||||
`,"utf8")}catch(D){process.stderr.write(`[LOGGER] Failed to write to log file: ${D}
|
||||
`)}else process.stderr.write(C+`
|
||||
`)}debug(t,r,e,n){this.log(0,t,r,e,n)}info(t,r,e,n){this.log(1,t,r,e,n)}warn(t,r,e,n){this.log(2,t,r,e,n)}error(t,r,e,n){this.log(3,t,r,e,n)}dataIn(t,r,e,n){this.info(t,`\u2192 ${r}`,e,n)}dataOut(t,r,e,n){this.info(t,`\u2190 ${r}`,e,n)}success(t,r,e,n){this.info(t,`\u2713 ${r}`,e,n)}failure(t,r,e,n){this.error(t,`\u2717 ${r}`,e,n)}timing(t,r,e,n){this.info(t,`\u23F1 ${r}`,n,{duration:`${e}ms`})}happyPathError(t,r,e,n,o=""){let c=((new Error().stack||"").split(`
|
||||
`)[2]||"").match(/at\s+(?:.*\s+)?\(?([^:]+):(\d+):(\d+)\)?/),g=c?`${c[1].split("/").pop()}:${c[2]}`:"unknown",O={...e,location:g};return this.warn(t,`[HAPPY-PATH] ${r}`,O,n),o}},_=new f;import A from"path";import{homedir as G}from"os";import{readFileSync as K}from"fs";var p={DEFAULT:3e5,HEALTH_CHECK:3e4,WORKER_STARTUP_WAIT:1e3,WORKER_STARTUP_RETRIES:300,PRE_RESTART_SETTLE_DELAY:2e3,WINDOWS_MULTIPLIER:1.5};function h(s){return process.platform==="win32"?Math.round(s*p.WINDOWS_MULTIPLIER):s}function I(s={}){let{port:t,includeSkillFallback:r=!1,customPrefix:e,actualError:n}=s,o=e||"Worker service connection failed.",i=t?` (port ${t})`:"",E=`${o}${i}
|
||||
`)[2]||"").match(/at\s+(?:.*\s+)?\(?([^:]+):(\d+):(\d+)\)?/),g=c?`${c[1].split("/").pop()}:${c[2]}`:"unknown",u={...e,location:g};return this.warn(t,`[HAPPY-PATH] ${r}`,u,n),o}},_=new f;import A from"path";import{homedir as G}from"os";import{readFileSync as K}from"fs";var p={DEFAULT:3e5,HEALTH_CHECK:3e4,WORKER_STARTUP_WAIT:1e3,WORKER_STARTUP_RETRIES:300,PRE_RESTART_SETTLE_DELAY:2e3,WINDOWS_MULTIPLIER:1.5};function h(s){return process.platform==="win32"?Math.round(s*p.WINDOWS_MULTIPLIER):s}function I(s={}){let{port:t,includeSkillFallback:r=!1,customPrefix:e,actualError:n}=s,o=e||"Worker service connection failed.",i=t?` (port ${t})`:"",E=`${o}${i}
|
||||
|
||||
`;return E+=`To restart the worker:
|
||||
`,E+=`1. Exit Claude Code completely
|
||||
@@ -16,4 +16,4 @@ ${o.stack}`:` ${o.message}`:this.getLevel()===0&&typeof o=="object"?g=`
|
||||
|
||||
If that doesn't work, try: /troubleshoot`),n&&(E=`Worker Error: ${n}
|
||||
|
||||
${E}`),E}var X=A.join(G(),".claude","plugins","marketplaces","thedotmack"),At=h(p.HEALTH_CHECK),T=null;function u(){if(T!==null)return T;let s=A.join(a.get("CLAUDE_MEM_DATA_DIR"),"settings.json"),t=a.loadFromFile(s);return T=parseInt(t.CLAUDE_MEM_WORKER_PORT,10),T}async function V(){let s=u();return(await fetch(`http://127.0.0.1:${s}/api/readiness`)).ok}function j(){let s=A.join(X,"package.json");return JSON.parse(K(s,"utf-8")).version}async function B(){let s=u(),t=await fetch(`http://127.0.0.1:${s}/api/version`);if(!t.ok)throw new Error(`Failed to get worker version: ${t.status}`);return(await t.json()).version}async function Y(){let s=j(),t=await B();s!==t&&_.debug("SYSTEM","Version check",{pluginVersion:s,workerVersion:t,note:"Mismatch will be auto-restarted by worker-service start command"})}async function N(){for(let r=0;r<75;r++){try{if(await V()){await Y();return}}catch(e){_.debug("SYSTEM","Worker health check failed, will retry",{attempt:r+1,maxRetries:75,error:e instanceof Error?e.message:String(e)})}await new Promise(e=>setTimeout(e,200))}throw new Error(I({port:u(),customPrefix:"Worker did not become ready within 15 seconds."}))}async function J(s){if(await N(),!s)throw new Error("saveHook requires input");let{session_id:t,cwd:r,tool_name:e,tool_input:n,tool_response:o}=s,i=u(),E=_.formatTool(e,n);if(_.dataIn("HOOK",`PostToolUse: ${E}`,{workerPort:i}),!r)throw new Error(`Missing cwd in PostToolUse hook input for session ${t}, tool ${e}`);let l=await fetch(`http://127.0.0.1:${i}/api/sessions/observations`,{method:"POST",headers:{"Content-Type":"application/json"},body:JSON.stringify({contentSessionId:t,tool_name:e,tool_input:n,tool_response:o,cwd:r})});if(!l.ok)throw new Error(`Observation storage failed: ${l.status}`);_.debug("HOOK","Observation sent successfully",{toolName:e}),console.log(U)}var L="";y.on("data",s=>L+=s);y.on("end",async()=>{let s;try{s=L?JSON.parse(L):void 0}catch(t){throw new Error(`Failed to parse hook input: ${t instanceof Error?t.message:String(t)}`)}await J(s)});
|
||||
${E}`),E}var X=A.join(G(),".claude","plugins","marketplaces","thedotmack"),At=h(p.HEALTH_CHECK),T=null;function O(){if(T!==null)return T;let s=A.join(a.get("CLAUDE_MEM_DATA_DIR"),"settings.json"),t=a.loadFromFile(s);return T=parseInt(t.CLAUDE_MEM_WORKER_PORT,10),T}async function V(){let s=O();return(await fetch(`http://127.0.0.1:${s}/api/readiness`)).ok}function j(){let s=A.join(X,"package.json");return JSON.parse(K(s,"utf-8")).version}async function B(){let s=O(),t=await fetch(`http://127.0.0.1:${s}/api/version`);if(!t.ok)throw new Error(`Failed to get worker version: ${t.status}`);return(await t.json()).version}async function Y(){let s=j(),t=await B();s!==t&&_.debug("SYSTEM","Version check",{pluginVersion:s,workerVersion:t,note:"Mismatch will be auto-restarted by worker-service start command"})}async function N(){for(let r=0;r<75;r++){try{if(await V()){await Y();return}}catch(e){_.debug("SYSTEM","Worker health check failed, will retry",{attempt:r+1,maxRetries:75,error:e instanceof Error?e.message:String(e)})}await new Promise(e=>setTimeout(e,200))}throw new Error(I({port:O(),customPrefix:"Worker did not become ready within 15 seconds."}))}async function J(s){if(await N(),!s)throw new Error("saveHook requires input");let{session_id:t,cwd:r,tool_name:e,tool_input:n,tool_response:o}=s,i=O(),E=_.formatTool(e,n);if(_.dataIn("HOOK",`PostToolUse: ${E}`,{workerPort:i}),!r)throw new Error(`Missing cwd in PostToolUse hook input for session ${t}, tool ${e}`);let l=await fetch(`http://127.0.0.1:${i}/api/sessions/observations`,{method:"POST",headers:{"Content-Type":"application/json"},body:JSON.stringify({contentSessionId:t,tool_name:e,tool_input:n,tool_response:o,cwd:r})});if(!l.ok)throw new Error(`Observation storage failed: ${l.status}`);_.debug("HOOK","Observation sent successfully",{toolName:e}),console.log(U)}var L="";y.on("data",s=>L+=s);y.on("end",async()=>{try{let s;try{s=L?JSON.parse(L):void 0}catch(t){throw new Error(`Failed to parse hook input: ${t instanceof Error?t.message:String(t)}`)}await J(s)}catch(s){_.error("HOOK","save-hook failed",{},s)}finally{process.exit(0)}});
|
||||
|
||||
Reference in New Issue
Block a user