Compare commits

..

8 Commits

Author SHA1 Message Date
Alex Newman a7ebc35ee0 chore: bump version to 11.0.0
Publish to npm / publish (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 19:39:28 -07:00
Alex Newman 9063c5d8a7 fix: block memory agent prose-skip responses at prompt and runtime levels
Observer prompt now explicitly requires XML observation blocks or empty
responses — prose explanations like "Skipping" are discarded. ResponseProcessor
logs a warning when non-XML content is received. Recording focus expanded to
include concrete debugging findings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 19:39:01 -07:00
Alex Newman 3b34feb779 chore: rebuild plugin artifacts for v10.7.2 with Alessandro's stability PRs (#1607)
Rebuilt worker-service, mcp-server, and viewer-bundle to include:
- SIGTERM drain for orphaned pending messages (#1567)
- Multi-machine sync script (#1570)
- 3 upstream bug fixes: summarize loop, ChromaSync duplicates, TOCTOU port check (#1566)
- Semantic context injection via Chroma (#1568)
- Tier routing by queue complexity (#1569)
- Architecture overview + production guide docs (#1574)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 19:36:32 -07:00
Alex Newman ad58fdf8fc docs: update CHANGELOG.md for v10.7.2
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 19:25:32 -07:00
Alex Newman b385570884 chore: bump version to 10.7.2
Publish to npm / publish (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 19:22:50 -07:00
Alex Newman 29ef3f5603 fix: downgrade concept-type cleanup log from error to debug (#1606)
The parser correctly strips observation types from concepts arrays when the
LLM ignores the prompt instruction. This is routine data normalization, not
an error — downgrade to debug to reduce log noise.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 19:21:38 -07:00
Alex Newman f7a088c6d9 docs: update CHANGELOG.md for v10.7.1 2026-04-04 19:01:19 -07:00
Alex Newman 538ada9ec4 docs: update CHANGELOG.md for v10.7.1 2026-04-04 19:00:04 -07:00
14 changed files with 4430 additions and 145 deletions
+1 -1
View File
@@ -10,7 +10,7 @@
"plugins": [
{
"name": "claude-mem",
"version": "10.7.1",
"version": "11.0.0",
"source": "./plugin",
"description": "Persistent memory system for Claude Code - context compression across sessions"
}
+2 -65
View File
@@ -1,68 +1,5 @@
# Memory Context from Past Sessions
The following context is from claude-mem, a persistent memory system that tracks your coding sessions.
*No context yet. Complete your first session and context will appear here.*
# $CMEM claude-mem 2026-04-03 6:48pm PDT
Legend: 🎯session 🔴bugfix 🟣feature 🔄refactor ✅change 🔵discovery ⚖️decision
Format: ID TIME TYPE TITLE
Fetch details: get_observations([IDs]) | Search: mem-search skill
Stats: 50 obs (18,868t read) | 401,168t work | 95% savings
### Apr 3, 2026
62994 1:47p 🔴 Merge Commit Finalized on thedotmack/npx-gemini-cli Branch
62995 1:48p 🔵 Worker Running but Health Endpoint Doesn't Accept POST
62996 " 🔵 Worker Health Endpoint Returns Detailed Status via GET
62997 1:49p 🔵 Worker Service Timeout and Shutdown Behavior in worker-service.ts
62998 " 🔵 claude-mem Hook Architecture Defined in plugin/hooks/hooks.json
62999 " 🔵 Session Idle Timeout Architecture: Two-Tier System in claude-mem
63000 " 🔵 Orphan Reaper Runs Every 30 Seconds; Sessions Orphaned After 6 Hours
63001 1:51p 🔵 POST /api/sessions/complete Removes Sessions from Active Map to Unblock Orphan Reaper
63002 1:52p 🔵 Stop Hook Summarize Flow: Extracts Last Assistant Message from Transcript
63004 " 🔵 POST /api/sessions/summarize: Privacy Check Before Queuing SDK Agent
63005 " 🔵 SessionManager.deleteSession Verifies Subprocess Exit to Prevent Zombies
63007 " 🔵 deleteSession: 4-Step Teardown with Generator and Subprocess Timeouts
63008 1:53p 🔵 Queue Depth Always Read from Database; Generator Restarts Capped at 3
63009 " 🔴 Fixed Lost Summaries: session-complete Now Waits for Pending Work Before Deleting Session
63010 1:54p 🔴 SessionEnd Hook Timeout Increased to 180s
63014 2:00p 🔵 claude-mem Hook Architecture and Exit Code System
63015 2:01p 🔵 SessionEnd Hook Has a 1.5s Default Timeout Controlled by Environment Variable
63016 2:02p 🔴 Stop Hook Now Owns Full Session Lifecycle: Summarize → Poll → Complete
63017 " 🔵 Missing /api/sessions/status Route — Only DB-ID Variant Exists
63018 2:03p 🔴 Added /api/sessions/status Route Registration to SessionRoutes
63020 " 🟣 Added handleStatusByClaudeId Handler for GET /api/sessions/status
63022 " 🔄 Removed Pending-Work Polling from /api/sessions/complete — Moved to Stop Hook
63024 " 🔄 SessionEnd Hook Reverted to Fast Fire-and-Forget (2s Timeout)
63026 2:04p 🔵 claude-mem hooks.json Full Hook Lifecycle Configuration
63027 2:05p ✅ Push to Pull Request
63028 " 🔵 Pre-Push State: claude-mem Repository Changes
63029 " 🔴 Fix Lost Summaries: Move Summary Wait into Stop Hook
63035 2:11p ✅ Testing Plan Created for tmux-cli npx Installation Flows
63036 2:12p 🔵 claude-mem Supports 13 npx Installation Flows Across IDE Integrations
63037 " 🔵 Detailed Integration Strategies for All 13 claude-mem npx Installation Flows
63038 2:13p ✅ NPX Install Flow Test Plan Document Created
63039 " ✅ 12 TODO Tasks Created for npx Install Flow Testing
63040 2:19p 🟣 Comprehensive Test Suite Requested for Claude-Mem CLI
63041 2:20p 🔵 NPX Install Flow Test Plan Exists for 12 IDE Integrations
63042 " 🟣 Phase 2 E2E Runtime Testing Added to NPX Install Test Plan
63043 " ✅ Test Tasks Updated with Phase 2 E2E Runtime Steps for 5 IDE Flows
63044 " ✅ All Remaining Test Tasks (612) Updated with Phase 2 E2E Runtime Steps
63079 6:31p ⚖️ Test Execution via Subagents Using /do Command
63080 6:32p 🔵 IDE Auto-Detection Module in claude-mem
63081 " 🔵 Install Command Architecture with Multi-IDE Dispatch
63082 " 🔵 MCP Integrations Module for 6 IDEs
63083 " 🔵 Cursor, Windsurf, and Gemini CLI Hook-Based Integrations
63084 " 🔵 OpenCode, OpenClaw, and Codex CLI Installers
63085 6:33p 🔵 tmux-cli Available for Automated Testing
63086 " 🔵 NPX Install Flow Test Plan — 12 IDE Flows
63087 6:34p 🟣 Detailed Test Execution Plan Created for NPX Install Flows
63103 6:47p 🔵 NPX Install Fails for Windsurf IDE with Missing rxjs Dependency
63104 " 🔵 Windsurf Install Failure Was a Dependency Ordering Race
63105 " 🟣 claude-mem Gemini CLI Integration: 8 Hooks Registered
63106 " 🟣 claude-mem OpenCode Integration: Plugin File + AGENTS.md Context
Access 401k tokens of past work via get_observations([IDs]) or mem-search skill.
---
*Auto-updated by claude-mem after each session. Use MCP search tools for detailed queries.*
Use claude-mem's MCP search tools for manual memory queries.
+4341 -64
View File
File diff suppressed because it is too large Load Diff
+1 -1
View File
@@ -1,6 +1,6 @@
{
"name": "claude-mem",
"version": "10.7.1",
"version": "11.0.0",
"description": "Memory compression system for Claude Code - persist context across sessions",
"keywords": [
"claude",
+1 -1
View File
@@ -1,6 +1,6 @@
{
"name": "claude-mem",
"version": "10.7.1",
"version": "11.0.0",
"description": "Persistent memory system for Claude Code - seamlessly preserve context across sessions",
"author": {
"name": "Alex Newman"
+3 -3
View File
@@ -87,8 +87,8 @@
"system_identity": "You are a Claude-Mem, a specialized observer tool for creating searchable memory FOR FUTURE SESSIONS.\n\nCRITICAL: Record what was LEARNED/BUILT/FIXED/DEPLOYED/CONFIGURED, not what you (the observer) are doing.\n\nYou do not have access to tools. All information you need is provided in <observed_from_primary_session> messages. Create observations from what you observe - no investigation needed.",
"spatial_awareness": "SPATIAL AWARENESS: Tool executions include the working directory (tool_cwd) to help you understand:\n- Which repository/project is being worked on\n- Where files are located relative to the project root\n- How to match requested paths to actual execution paths",
"observer_role": "Your job is to monitor a different Claude Code session happening RIGHT NOW, with the goal of creating observations and progress summaries as the work is being done LIVE by the user. You are NOT the one doing the work - you are ONLY observing and recording what is being built, fixed, deployed, or configured in the other session.",
"recording_focus": "WHAT TO RECORD\n--------------\nFocus on deliverables and capabilities:\n- What the system NOW DOES differently (new capabilities)\n- What shipped to users/production (features, fixes, configs, docs)\n- Changes in technical domains (auth, data, UI, infra, DevOps, docs)\n\nUse verbs like: implemented, fixed, deployed, configured, migrated, optimized, added, refactored\n\n✅ GOOD EXAMPLES (describes what was built):\n- \"Authentication now supports OAuth2 with PKCE flow\"\n- \"Deployment pipeline runs canary releases with auto-rollback\"\n- \"Database indexes optimized for common query patterns\"\n\n❌ BAD EXAMPLES (describes observation process - DO NOT DO THIS):\n- \"Analyzed authentication implementation and stored findings\"\n- \"Tracked deployment steps and logged outcomes\"\n- \"Monitored database performance and recorded metrics\"",
"skip_guidance": "WHEN TO SKIP\n------------\nSkip routine operations:\n- Empty status checks\n- Package installations with no errors\n- Simple file listings\n- Repetitive operations you've already documented\n- If file related research comes back as empty or not found\n- **No output necessary if skipping.**",
"recording_focus": "WHAT TO RECORD\n--------------\nFocus on durable technical signal:\n- What the system NOW DOES differently (new capabilities)\n- What shipped to users/production (features, fixes, configs, docs)\n- Changes in technical domains (auth, data, UI, infra, DevOps, docs)\n- Concrete debugging or investigative findings from logs, traces, queue state, database rows, and code-path inspection\n\nUse verbs like: implemented, fixed, deployed, configured, migrated, optimized, added, refactored, discovered, confirmed, traced\n\n✅ GOOD EXAMPLES (describes what was built or learned):\n- \"Authentication now supports OAuth2 with PKCE flow\"\n- \"Deployment pipeline runs canary releases with auto-rollback\"\n- \"Database indexes optimized for common query patterns\"\n- \"Observation queue for claude-mem session timed out waiting for an agent pool slot\"\n- \"Fallback processing abandoned pending messages after Gemini and OpenRouter returned 404\"\n\n❌ BAD EXAMPLES (describes observation process - DO NOT DO THIS):\n- \"Analyzed authentication implementation and stored findings\"\n- \"Tracked deployment steps and logged outcomes\"\n- \"Monitored database performance and recorded metrics\"",
"skip_guidance": "WHEN TO SKIP\n------------\nSkip routine operations:\n- Empty status checks\n- Package installations with no errors\n- Simple file listings with no follow-on finding\n- Repetitive operations you've already documented\n- File related research that comes back empty or not found\n\nIf skipping, return an empty response only. Do not explain the skip in prose.",
"type_guidance": "**type**: MUST be EXACTLY one of these 6 options (no other values allowed):\n - bugfix: something was broken, now fixed\n - feature: new capability or functionality added\n - refactor: code restructured, behavior unchanged\n - change: generic modification (docs, config, misc)\n - discovery: learning about existing system\n - decision: architectural/design choice with rationale",
"concept_guidance": "**concepts**: 2-5 knowledge-type categories. MUST use ONLY these exact keywords:\n - how-it-works: understanding mechanisms\n - why-it-exists: purpose or rationale\n - what-changed: modifications made\n - problem-solution: issues and their fixes\n - gotcha: traps or edge cases\n - pattern: reusable approach\n - trade-off: pros/cons of a decision\n\n IMPORTANT: Do NOT include the observation type (change/discovery/decision) as a concept.\n Types and concepts are separate dimensions.",
"field_guidance": "**facts**: Concise, self-contained statements\nEach fact is ONE piece of information\n No pronouns - each fact must stand alone\n Include specific details: filenames, functions, values\n\n**files**: All files touched (full paths from project root)",
@@ -122,4 +122,4 @@
"summary_format_instruction": "Respond in this XML format:",
"summary_footer": "IMPORTANT! DO NOT do any work right now other than generating this next PROGRESS SUMMARY - and remember that you are a memory agent designed to summarize a DIFFERENT claude code session, not this one.\n\nNever reference yourself or your own actions. Do not output anything other than the summary content formatted in the XML structure above. All other output is ignored by the system, and the system has been designed to be smart about token usage. Please spend your tokens wisely on useful summary content.\n\nThank you, this summary will be very useful for keeping track of our progress!"
}
}
}
+1 -1
View File
@@ -1,6 +1,6 @@
{
"name": "claude-mem-plugin",
"version": "10.7.1",
"version": "11.0.0",
"private": true,
"description": "Runtime dependencies for claude-mem bundled hooks",
"type": "module",
+1 -1
View File
@@ -114,7 +114,7 @@ Set the \`cycles\` parameter to \`"ref"\` to resolve cyclical schemas with defs.
${c}`}var bP=new Set([".js",".jsx",".ts",".tsx",".mjs",".cjs",".py",".pyw",".go",".rs",".rb",".java",".cs",".cpp",".c",".h",".hpp",".swift",".kt",".php",".vue",".svelte"]),xP=new Set(["node_modules",".git","dist","build",".next","__pycache__",".venv","venv","env",".env","target","vendor",".cache",".turbo","coverage",".nyc_output",".claude",".smart-file-read"]),kP=512*1024;async function*n$(e,t,r=20){if(r<=0)return;let n;try{n=await(0,Sn.readdir)(e,{withFileTypes:!0})}catch{return}for(let o of n){if(o.name.startsWith(".")&&o.name!=="."||xP.has(o.name))continue;let i=(0,pi.join)(e,o.name);if(o.isDirectory())yield*n$(i,t,r-1);else if(o.isFile()){let a=o.name.slice(o.name.lastIndexOf("."));bP.has(a)&&(yield i)}}}async function SP(e){try{let t=await(0,Sn.stat)(e);if(t.size>kP||t.size===0)return null;let r=await(0,Sn.readFile)(e,"utf-8");return r.slice(0,1e3).includes("\0")?null:r}catch{return null}}async function o$(e,t,r={}){let n=r.maxResults||20,o=t.toLowerCase(),i=o.split(/[\s_\-./]+/).filter(h=>h.length>0),a=[];for await(let h of n$(e,e)){if(r.filePattern&&!(0,pi.relative)(e,h).toLowerCase().includes(r.filePattern.toLowerCase()))continue;let _=await SP(h);_&&a.push({absolutePath:h,relativePath:(0,pi.relative)(e,h),content:_})}let s=e$(a),c=[],u=[],l=0;for(let[h,_]of s){l+=wP(_);let E=Os(h.toLowerCase(),i)>0,I=[],A=(j,Le)=>{for(let de of j){let Wt=0,Qe="",Kt=Os(de.name.toLowerCase(),i);Kt>0&&(Wt+=Kt*3,Qe="name match"),de.signature.toLowerCase().includes(o)&&(Wt+=2,Qe=Qe?`${Qe} + signature`:"signature match"),de.jsdoc&&de.jsdoc.toLowerCase().includes(o)&&(Wt+=1,Qe=Qe?`${Qe} + jsdoc`:"jsdoc match"),Wt>0&&(E=!0,I.push({filePath:h,symbolName:Le?`${Le}.${de.name}`:de.name,kind:de.kind,signature:de.signature,jsdoc:de.jsdoc,lineStart:de.lineStart,lineEnd:de.lineEnd,matchReason:Qe})),de.children&&A(de.children,de.name)}};A(_.symbols),E&&(c.push(_),u.push(...I))}u.sort((h,_)=>{let b=Os(h.symbolName.toLowerCase(),i);return Os(_.symbolName.toLowerCase(),i)-b});let d=u.slice(0,n),m=new Set(d.map(h=>h.filePath)),p=c.filter(h=>m.has(h.filePath)).slice(0,n),g=p.reduce((h,_)=>h+_.foldedTokenEstimate,0);return{foldedFiles:p,matchingSymbols:d,totalFilesScanned:a.length,totalSymbolsFound:l,tokenEstimate:g}}function Os(e,t){let r=0;for(let n of t)if(e===n)r+=10;else if(e.includes(n))r+=5;else{let o=0,i=0;for(let a of n){let s=e.indexOf(a,o);s!==-1&&(i++,o=s+1)}i===n.length&&(r+=1)}return r}function wP(e){let t=e.symbols.length;for(let r of e.symbols)r.children&&(t+=r.children.length);return t}function i$(e,t){let r=[];if(r.push(`\u{1F50D} Smart Search: "${t}"`),r.push(` Scanned ${e.totalFilesScanned} files, found ${e.totalSymbolsFound} symbols`),r.push(` ${e.matchingSymbols.length} matches across ${e.foldedFiles.length} files (~${e.tokenEstimate} tokens for folded view)`),r.push(""),e.matchingSymbols.length===0)return r.push(" No matching symbols found."),r.join(`
`);r.push("\u2500\u2500 Matching Symbols \u2500\u2500"),r.push("");for(let n of e.matchingSymbols){if(r.push(` ${n.kind} ${n.symbolName} (${n.filePath}:${n.lineStart+1})`),r.push(` ${n.signature}`),n.jsdoc){let o=n.jsdoc.split(`
`).find(i=>i.replace(/^[\s*/]+/,"").trim().length>0);o&&r.push(` \u{1F4AC} ${o.replace(/^[\s*/]+/,"").trim()}`)}r.push("")}r.push("\u2500\u2500 Folded File Views \u2500\u2500"),r.push("");for(let n of e.foldedFiles)r.push(kn(n)),r.push("");return r.push("\u2500\u2500 Actions \u2500\u2500"),r.push(" To see full implementation: use smart_unfold with file path and symbol name"),r.join(`
`)}var Of=require("node:fs/promises"),js=require("node:path"),zP="10.7.1";console.log=(...e)=>{ve.error("CONSOLE","Intercepted console output (MCP protocol protection)",void 0,{args:e})};var a$={search:"/api/search",timeline:"/api/timeline"};async function s$(e,t){ve.debug("SYSTEM","\u2192 Worker API",void 0,{endpoint:e,params:t});try{let r=new URLSearchParams;for(let[a,s]of Object.entries(t))s!=null&&r.append(a,String(s));let n=`${e}?${r}`,o=await Ts(n);if(!o.ok){let a=await o.text();throw new Error(`Worker API error (${o.status}): ${a}`)}let i=await o.json();return ve.debug("SYSTEM","\u2190 Worker API success",void 0,{endpoint:e}),i}catch(r){return ve.error("SYSTEM","\u2190 Worker API error",{endpoint:e},r),{content:[{type:"text",text:`Error calling Worker API: ${r instanceof Error?r.message:String(r)}`}],isError:!0}}}async function IP(e,t){ve.debug("HTTP","Worker API request (POST)",void 0,{endpoint:e});try{let r=await Ts(e,{method:"POST",headers:{"Content-Type":"application/json"},body:JSON.stringify(t)});if(!r.ok){let o=await r.text();throw new Error(`Worker API error (${r.status}): ${o}`)}let n=await r.json();return ve.debug("HTTP","Worker API success (POST)",void 0,{endpoint:e}),{content:[{type:"text",text:JSON.stringify(n,null,2)}]}}catch(r){return ve.error("HTTP","Worker API error (POST)",{endpoint:e},r),{content:[{type:"text",text:`Error calling Worker API: ${r instanceof Error?r.message:String(r)}`}],isError:!0}}}async function EP(){try{return(await Ts("/api/health")).ok}catch(e){return ve.debug("SYSTEM","Worker health check failed",{},e),!1}}var c$=[{name:"__IMPORTANT",description:`3-LAYER WORKFLOW (ALWAYS FOLLOW):
`)}var Of=require("node:fs/promises"),js=require("node:path"),zP="11.0.0";console.log=(...e)=>{ve.error("CONSOLE","Intercepted console output (MCP protocol protection)",void 0,{args:e})};var a$={search:"/api/search",timeline:"/api/timeline"};async function s$(e,t){ve.debug("SYSTEM","\u2192 Worker API",void 0,{endpoint:e,params:t});try{let r=new URLSearchParams;for(let[a,s]of Object.entries(t))s!=null&&r.append(a,String(s));let n=`${e}?${r}`,o=await Ts(n);if(!o.ok){let a=await o.text();throw new Error(`Worker API error (${o.status}): ${a}`)}let i=await o.json();return ve.debug("SYSTEM","\u2190 Worker API success",void 0,{endpoint:e}),i}catch(r){return ve.error("SYSTEM","\u2190 Worker API error",{endpoint:e},r),{content:[{type:"text",text:`Error calling Worker API: ${r instanceof Error?r.message:String(r)}`}],isError:!0}}}async function IP(e,t){ve.debug("HTTP","Worker API request (POST)",void 0,{endpoint:e});try{let r=await Ts(e,{method:"POST",headers:{"Content-Type":"application/json"},body:JSON.stringify(t)});if(!r.ok){let o=await r.text();throw new Error(`Worker API error (${r.status}): ${o}`)}let n=await r.json();return ve.debug("HTTP","Worker API success (POST)",void 0,{endpoint:e}),{content:[{type:"text",text:JSON.stringify(n,null,2)}]}}catch(r){return ve.error("HTTP","Worker API error (POST)",{endpoint:e},r),{content:[{type:"text",text:`Error calling Worker API: ${r instanceof Error?r.message:String(r)}`}],isError:!0}}}async function EP(){try{return(await Ts("/api/health")).ok}catch(e){return ve.debug("SYSTEM","Worker health check failed",{},e),!1}}var c$=[{name:"__IMPORTANT",description:`3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) \u2192 Get index with IDs (~50-100 tokens/result)
2. timeline(anchor=ID) \u2192 Get context around interesting results
3. get_observations([IDs]) \u2192 Fetch full details ONLY for filtered IDs
File diff suppressed because one or more lines are too long
+1 -1
View File
@@ -75,7 +75,7 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
const cleanedConcepts = concepts.filter(c => c !== finalType);
if (cleanedConcepts.length !== concepts.length) {
logger.error('PARSER', 'Removed observation type from concepts array', {
logger.debug('PARSER', 'Removed observation type from concepts array', {
correlationId,
type: finalType,
originalConcepts: concepts,
+6 -2
View File
@@ -116,7 +116,11 @@ export function buildObservationPrompt(obs: Observation): string {
<occurred_at>${new Date(obs.created_at_epoch).toISOString()}</occurred_at>${obs.cwd ? `\n <working_directory>${obs.cwd}</working_directory>` : ''}
<parameters>${JSON.stringify(toolInput, null, 2)}</parameters>
<outcome>${JSON.stringify(toolOutput, null, 2)}</outcome>
</observed_from_primary_session>`;
</observed_from_primary_session>
Return either one or more <observation>...</observation> blocks, or an empty response if this tool use should be skipped.
Concrete debugging findings from logs, queue state, database rows, session routing, or code-path inspection count as durable discoveries and should be recorded.
Never reply with prose such as "Skipping", "No substantive tool executions", or any explanation outside XML. Non-XML text is discarded.`;
}
/**
@@ -235,4 +239,4 @@ ${mode.prompts.format_examples}
${mode.prompts.footer}
${mode.prompts.header_memory_continued}`;
}
}
@@ -68,6 +68,19 @@ export async function processAgentResponse(
const observations = parseObservations(text, session.contentSessionId);
const summary = parseSummary(text, session.sessionDbId);
if (
text.trim() &&
observations.length === 0 &&
!summary &&
!/<observation>|<summary>|<skip_summary\b/.test(text)
) {
const preview = text.length > 200 ? `${text.slice(0, 200)}...` : text;
logger.warn('PARSER', `${agentName} returned non-XML response; observation content was discarded`, {
sessionId: session.sessionDbId,
preview
});
}
// Convert nullable fields to empty strings for storeSummary (if summary exists)
const summaryForStore = normalizeSummaryForStorage(summary);
+20
View File
@@ -0,0 +1,20 @@
import { describe, expect, it } from 'bun:test';
import { buildObservationPrompt } from '../../src/sdk/prompts.js';
describe('buildObservationPrompt', () => {
it('instructs the observer to avoid prose skip responses', () => {
const prompt = buildObservationPrompt({
id: 1,
tool_name: 'exec_command',
tool_input: JSON.stringify({ cmd: 'pwd' }),
tool_output: JSON.stringify({ output: '/repo' }),
created_at_epoch: Date.now(),
cwd: '/repo',
});
expect(prompt).toContain('Return either one or more <observation>...</observation> blocks, or an empty response');
expect(prompt).toContain('Concrete debugging findings from logs, queue state, database rows, session routing, or code-path inspection');
expect(prompt).toContain('Never reply with prose such as "Skipping", "No substantive tool executions"');
});
});
@@ -212,6 +212,36 @@ describe('ResponseProcessor', () => {
});
});
describe('non-XML observer responses', () => {
it('warns when the observer returns prose that will be discarded', async () => {
const session = createMockSession();
const responseText = 'Skipping — repeated log scan with no new findings.';
await processAgentResponse(
responseText,
session,
mockDbManager,
mockSessionManager,
mockWorker,
100,
null,
'TestAgent'
);
expect(logger.warn).toHaveBeenCalledWith(
'PARSER',
'TestAgent returned non-XML response; observation content was discarded',
expect.objectContaining({
sessionId: 1,
preview: responseText
})
);
const [, , observations, summary] = mockStoreObservations.mock.calls[0];
expect(observations).toHaveLength(0);
expect(summary).toBeNull();
});
});
describe('parsing summary from XML response', () => {
it('should parse summary from response', async () => {
const session = createMockSession();