* Initial plan

* fix: break infinite summary-retry loop (#1633)

Three-part fix:
1. Parser coercion: When LLM returns <observation> tags instead of <summary>, coerce observation content into summary fields (root cause fix)
2. Stronger summary prompt: Add clearer tag requirements with warnings
3. Circuit breaker: Track consecutive summary failures per session, skip further attempts after 3 failures to prevent unbounded prompt growth

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* refactor: extract shared constants for summary mode marker and failure threshold

Addresses code review feedback: SUMMARY_MODE_MARKER and MAX_CONSECUTIVE_SUMMARY_FAILURES are now defined once in sdk/prompts.ts and imported by ResponseProcessor and SessionManager.

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* fix: guard summary failure counter on summaryExpected (Greptile P1)

The circuit breaker counter previously incremented on any response containing <observation> or <summary> tags — which matches virtually every normal observation response. After 3 observations the breaker would open and permanently block summarization, reproducing the data-loss scenario #1633 was meant to prevent.

Gate the increment block on summaryExpected (already computed for parseSummary coercion) so the counter only tracks actual summary attempts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: cover circuit-breaker + apply review polish

- Use findLast / at(-1) for last-user-message lookup instead of filter + index (O(1) common case).
- Drop redundant `|| 0` fallback — field is required and initialized.
- Add comment noting counter is ephemeral by design.
- Add ResponseProcessor tests covering:
  * counter NOT incrementing on normal observation responses (regression guard for the Greptile P1)
  * counter incrementing when a summary was expected but missing
  * counter resetting to 0 on successful summary storage

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: iterate all observation blocks; don't count skip_summary as failure

Addresses CodeRabbit review on #2072:
- coerceObservationToSummary now iterates all <observation> blocks with a global regex and returns the first block that has title, narrative, or facts. Previously, an empty leading observation would short-circuit and discard populated follow-ups.
- Circuit-breaker counter now treats explicit <skip_summary/> as neutral — neither a failure nor a success — so a run that happens to end on a skip doesn't punish the session or mask a prior bad streak. Real failures (no summary, no skip) still increment.
- Tests added for both cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: reference SUMMARY_MODE_MARKER constant instead of hardcoded string

Addresses CodeRabbit nitpick: tests should pull the marker from the canonical source so they don't silently drift when the constant is renamed or edited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: also coerce observations when <summary> has empty sub-tags

When the LLM wraps an empty <summary></summary> around real observation content, the #1360 empty-subtag guard rejects the summary and returns null — which would lose the observation content and resurrect the #1633 retry loop.

Fall back to coerceObservationToSummary in that branch too, mirroring the unmatched-<summary> path.

Adds a test covering the empty-summary-wraps-observation case and a guard test for empty summary with no observation content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>
Co-authored-by: Alex Newman <thedotmack@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
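The counter rule these commits converge on (count only turns where a summary was requested, reset when a summary is parsed, treat `<skip_summary/>` as neutral) fits in a small pure function. This is a sketch under assumptions: `SummaryTurn`, `nextFailureCount` and `isBreakerOpen` are hypothetical names, and in the actual change the logic sits inline in ResponseProcessor and SessionManager rather than in helpers like these.

```typescript
// Hypothetical helpers modelling the circuit-breaker rule from the commit
// message; these are not the project's actual API.

// Mirrors MAX_CONSECUTIVE_SUMMARY_FAILURES from the commit.
const MAX_CONSECUTIVE_SUMMARY_FAILURES = 3;

interface SummaryTurn {
  summaryExpected: boolean; // last user message carried the summary-mode marker
  hasSummary: boolean;      // a summary was parsed (possibly via coercion)
  skipped: boolean;         // response contained an explicit <skip_summary .../>
}

function nextFailureCount(current: number, turn: SummaryTurn): number {
  // Ordinary observation turns never touch the counter (the Greptile P1 fix).
  if (!turn.summaryExpected) return current;
  // A usable summary ends the failure streak.
  if (turn.hasSummary) return 0;
  // An intentional skip is neutral: neither success nor failure.
  if (turn.skipped) return current;
  // A summary was requested but nothing usable came back.
  return current + 1;
}

function isBreakerOpen(failures: number): boolean {
  return failures >= MAX_CONSECUTIVE_SUMMARY_FAILURES;
}

// Three observation turns, two failed summary attempts, then an explicit skip.
const turns: SummaryTurn[] = [
  { summaryExpected: false, hasSummary: false, skipped: false },
  { summaryExpected: false, hasSummary: false, skipped: false },
  { summaryExpected: false, hasSummary: false, skipped: false },
  { summaryExpected: true, hasSummary: false, skipped: false },
  { summaryExpected: true, hasSummary: false, skipped: false },
  { summaryExpected: true, hasSummary: false, skipped: true },
];
const failures = turns.reduce(nextFailureCount, 0);
console.log(failures, isBreakerOpen(failures)); // 2 false
```

Dropping the `summaryExpected` check from this function would let the three observation turns count as failures and open the breaker before any summary was requested, which is the regression the Greptile P1 commit describes.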
+75
-3
@@ -113,8 +113,13 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
 /**
 * Parse summary XML block from SDK response
 * Returns null if no valid summary found or if summary was skipped
+*
+* @param coerceFromObservation - When true, attempts to convert <observation> tags
+* into summary fields if no <summary> tags are found. Only set this when the
+* response was expected to be a summary (i.e., a summarize message was sent).
+* Prevents the infinite retry loop described in #1633.
 */
-export function parseSummary(text: string, sessionId?: number): ParsedSummary | null {
+export function parseSummary(text: string, sessionId?: number, coerceFromObservation: boolean = false): ParsedSummary | null {
 // Check for skip_summary first
 const skipRegex = /<skip_summary\s+reason="([^"]+)"\s*\/>/;
 const skipMatch = skipRegex.exec(text);
@@ -132,11 +137,23 @@ export function parseSummary(text: string, sessionId?: number): ParsedSummary |
 const summaryMatch = summaryRegex.exec(text);

 if (!summaryMatch) {
-// Log when the response contains <observation> instead of <summary>
-// to help diagnose prompt conditioning issues (see #1312)
+// When the LLM returns <observation> tags instead of <summary> tags,
+// coerce the observation content into summary fields rather than discarding it.
+// This breaks the infinite retry loop described in #1633: without coercion,
+// the summary is silently dropped, the session completes without a summary,
+// a new session is spawned with an ever-growing prompt, and the cycle repeats.
+// Only coerce when explicitly requested (i.e., when a summarize message was sent).
 if (/<observation>/.test(text)) {
+if (coerceFromObservation) {
+const coerced = coerceObservationToSummary(text, sessionId);
+if (coerced) {
+return coerced;
+}
+logger.warn('PARSER', 'Summary response contained <observation> tags instead of <summary> — coercion failed, no usable content', { sessionId });
+} else {
 logger.warn('PARSER', 'Summary response contained <observation> tags instead of <summary> — prompt conditioning may need strengthening', { sessionId });
 }
+}
 return null;
 }

@@ -171,6 +188,17 @@ export function parseSummary(text: string, sessionId?: number): ParsedSummary |
 // This is NOT the same as missing some fields (which we intentionally allow above).
 // Fix for #1360.
 if (!request && !investigated && !learned && !completed && !next_steps) {
+// If the response also contains <observation> tags with real content, fall
+// back to coercion rather than discarding the response entirely — this covers
+// the case where the LLM wraps empty <summary></summary> around observation
+// content, which would otherwise resurrect the #1633 retry loop.
+if (coerceFromObservation && /<observation>/.test(text)) {
+const coerced = coerceObservationToSummary(text, sessionId);
+if (coerced) {
+logger.warn('PARSER', 'Empty <summary> match rejected — coerced from <observation> fallback (#1633)', { sessionId });
+return coerced;
+}
+}
 logger.warn('PARSER', 'Summary match has no sub-tags — skipping false positive', { sessionId });
 return null;
 }
@@ -185,6 +213,50 @@ export function parseSummary(text: string, sessionId?: number): ParsedSummary |
 };
 }

+/**
+* Coerce <observation> response into a ParsedSummary when <summary> tags are missing.
+* Maps observation fields to the closest summary equivalents so that a usable
+* summary is stored instead of nothing — breaking the retry loop (#1633).
+*/
+function coerceObservationToSummary(text: string, sessionId?: number): ParsedSummary | null {
+// Iterate all <observation> blocks — if the LLM emits multiple and the first is
+// empty, we still want to salvage the first one that has usable content.
+const obsRegex = /<observation>([\s\S]*?)<\/observation>/g;
+let obsMatch: RegExpExecArray | null;
+let blockIndex = 0;
+
+while ((obsMatch = obsRegex.exec(text)) !== null) {
+const obsContent = obsMatch[1];
+const title = extractField(obsContent, 'title');
+const subtitle = extractField(obsContent, 'subtitle');
+const narrative = extractField(obsContent, 'narrative');
+const facts = extractArrayElements(obsContent, 'facts', 'fact');
+
+if (title || narrative || facts.length > 0) {
+// Map observation fields → summary fields (best-effort)
+const request = title || subtitle || null;
+const investigated = narrative || null;
+const learned = facts.length > 0 ? facts.join('; ') : null;
+const completed = title ? `${title}${subtitle ? ' — ' + subtitle : ''}` : null;
+const next_steps = null; // No direct observation equivalent
+
+logger.warn('PARSER', 'Coerced <observation> response into <summary> to prevent retry loop (#1633)', {
+sessionId,
+blockIndex,
+hasTitle: !!title,
+hasNarrative: !!narrative,
+factCount: facts.length,
+});
+
+return { request, investigated, learned, completed, next_steps, notes: null };
+}
+
+blockIndex++;
+}
+
+return null;
+}
+
 /**
 * Extract a simple field value from XML content
 * Returns null for missing or empty/whitespace-only fields
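Stripped of logging, the coercion path in the parser diff reduces to: walk every `<observation>` block, skip blocks with no title, narrative, or facts, and map the first populated one onto summary fields. The sketch below is self-contained under stated assumptions: `extractField` and `extractArrayElements` are simplified stand-ins written for illustration (the parser's real helpers are only partly visible in this diff), and it uses `String.prototype.matchAll` in place of the `exec` loop, which visits the same blocks in the same order.

```typescript
// Self-contained sketch of observation -> summary coercion. The helper
// implementations below are simplified stand-ins, not the parser's own.

interface ParsedSummary {
  request: string | null;
  investigated: string | null;
  learned: string | null;
  completed: string | null;
  next_steps: string | null;
  notes: string | null;
}

// Stand-in: trimmed text of the first <field>...</field>, or null if missing/blank.
function extractField(content: string, field: string): string | null {
  const match = new RegExp(`<${field}>([\\s\\S]*?)</${field}>`).exec(content);
  const value = match?.[1].trim();
  return value ? value : null;
}

// Stand-in: trimmed, non-empty <element> values inside the first <array>...</array>.
function extractArrayElements(content: string, array: string, element: string): string[] {
  const block = new RegExp(`<${array}>([\\s\\S]*?)</${array}>`).exec(content);
  if (!block) return [];
  const itemRegex = new RegExp(`<${element}>([\\s\\S]*?)</${element}>`, 'g');
  return Array.from(block[1].matchAll(itemRegex), (m) => m[1].trim()).filter((v) => v.length > 0);
}

function coerceObservationToSummary(text: string): ParsedSummary | null {
  // The /g flag lets matchAll visit every <observation> block in order.
  for (const match of text.matchAll(/<observation>([\s\S]*?)<\/observation>/g)) {
    const body = match[1];
    const title = extractField(body, 'title');
    const subtitle = extractField(body, 'subtitle');
    const narrative = extractField(body, 'narrative');
    const facts = extractArrayElements(body, 'facts', 'fact');

    // Blocks with only a <type> (or nothing) are skipped, not returned.
    if (!title && !narrative && facts.length === 0) continue;

    return {
      request: title || subtitle || null,
      investigated: narrative || null,
      learned: facts.length > 0 ? facts.join('; ') : null,
      // '\u2014' is the em dash separator used by the diff's mapping.
      completed: title ? `${title}${subtitle ? ' \u2014 ' + subtitle : ''}` : null,
      next_steps: null, // no observation equivalent
      notes: null,
    };
  }
  return null;
}

const coerced = coerceObservationToSummary(
  '<observation><type>discovery</type></observation>' +
    '<observation><title>second block</title><narrative>fixed the crash</narrative></observation>',
);
console.log(coerced?.request, '|', coerced?.investigated); // second block | fixed the crash
```

The empty leading block is passed over rather than ending the search, which is the behavior the CodeRabbit follow-up commit added.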
+20
-3
@@ -6,6 +6,20 @@
 import { logger } from '../utils/logger.js';
 import type { ModeConfig } from '../services/domain/types.js';

+/**
+* Marker string embedded in summary prompts — used by ResponseProcessor to detect
+* whether the most recent user message was a summary request (enables observation→summary
+* coercion for #1633). Keep in sync with buildSummaryPrompt below.
+*/
+export const SUMMARY_MODE_MARKER = 'MODE SWITCH: PROGRESS SUMMARY';
+
+/**
+* Maximum consecutive summary failures before the circuit breaker opens.
+* After this many failures, SessionManager.queueSummarize will skip further
+* summarize requests to prevent the infinite retry loop (#1633).
+*/
+export const MAX_CONSECUTIVE_SUMMARY_FAILURES = 3;
+
 export interface Observation {
 id: number;
 tool_name: string;
@@ -134,9 +148,11 @@ export function buildSummaryPrompt(session: SDKSession, mode: ModeConfig): strin
 return '';
 })();

-return `--- MODE SWITCH: PROGRESS SUMMARY ---
-Do NOT output <observation> tags. This is a summary request, not an observation request.
-Your response MUST use <summary> tags ONLY. Any <observation> output will be discarded.
+return `--- ${SUMMARY_MODE_MARKER} ---
+⚠️ CRITICAL TAG REQUIREMENT — READ CAREFULLY:
+• You MUST wrap your ENTIRE response in <summary>...</summary> tags.
+• Do NOT use <observation> tags. <observation> output will be DISCARDED and cause a system error.
+• The ONLY accepted root tag is <summary>. Any other root tag is a protocol violation.

 ${mode.prompts.header_summary_checkpoint}
 ${mode.prompts.summary_instruction}
@@ -154,6 +170,7 @@ ${mode.prompts.summary_format_instruction}
 <notes>${mode.prompts.xml_summary_notes_placeholder}</notes>
 </summary>

+REMINDER: Your response MUST use <summary> as the root tag, NOT <observation>.
 ${mode.prompts.summary_footer}`;
 }

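On the consuming side, ResponseProcessor decides whether coercion applies by checking the latest user turn for `SUMMARY_MODE_MARKER`: it takes the last history entry when that entry is a user message and only falls back to `findLast` otherwise. A standalone sketch of that check follows; the `ConversationMessage` shape is an assumption (the diff only shows `role` and `content` being read), and `findLast` needs an ES2023 runtime such as Node 18+ or Bun.

```typescript
// Sketch of the summaryExpected check; ConversationMessage is an assumed shape.
interface ConversationMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Same literal as the SUMMARY_MODE_MARKER constant added to the prompts module.
const SUMMARY_MODE_MARKER = 'MODE SWITCH: PROGRESS SUMMARY';

function isSummaryExpected(history: ConversationMessage[]): boolean {
  const last = history.at(-1);
  // Common case: the newest entry is the prompt itself, so no scan is needed.
  const lastUser = last?.role === 'user' ? last : history.findLast((m) => m.role === 'user');
  return lastUser?.content.includes(SUMMARY_MODE_MARKER) ?? false;
}

const summaryPrompt = `--- ${SUMMARY_MODE_MARKER} ---\nSummarize the session.`;
console.log(isSummaryExpected([{ role: 'user', content: summaryPrompt }])); // true
console.log(isSummaryExpected([
  { role: 'user', content: summaryPrompt },
  { role: 'assistant', content: '<summary></summary>' },
])); // true
console.log(isSummaryExpected([{ role: 'user', content: 'record a new observation' }])); // false
```

Because detection is a plain substring check, the prompt template and the detector must agree on the exact text; exporting the marker as one constant is what keeps them from drifting apart.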
@@ -46,6 +46,9 @@ export interface ActiveSession {
 // Track whether the most recent storage operation persisted a summary record.
 // Used by the status endpoint so the Stop hook can detect silent summary loss (#1633).
 lastSummaryStored?: boolean;
+// Circuit breaker: track consecutive summary failures to prevent infinite retry loops (#1633).
+// When this reaches MAX_CONSECUTIVE_SUMMARY_FAILURES, further summarize requests are skipped.
+consecutiveSummaryFailures: number;
 }

 export interface PendingMessage {
@@ -16,6 +16,7 @@ import { PendingMessageStore } from '../sqlite/PendingMessageStore.js';
 import { SessionQueueProcessor } from '../queue/SessionQueueProcessor.js';
 import { getProcessBySession, ensureProcessExit } from './ProcessRegistry.js';
 import { getSupervisor } from '../../supervisor/index.js';
+import { MAX_CONSECUTIVE_SUMMARY_FAILURES } from '../../sdk/prompts.js';

 /** Idle threshold before a stuck generator (zombie subprocess) is force-killed. */
 export const MAX_GENERATOR_IDLE_MS = 5 * 60 * 1000; // 5 minutes
@@ -219,7 +220,8 @@ export class SessionManager {
 currentProvider: null, // Will be set when generator starts
 consecutiveRestarts: 0, // Track consecutive restart attempts to prevent infinite loops
 processingMessageIds: [], // CLAIM-CONFIRM: Track message IDs for confirmProcessed()
-lastGeneratorActivity: Date.now() // Initialize for stale detection (Issue #1099)
+lastGeneratorActivity: Date.now(), // Initialize for stale detection (Issue #1099)
+consecutiveSummaryFailures: 0 // Circuit breaker for summary retry loop (#1633)
 };

 logger.debug('SESSION', 'Creating new session object (memorySessionId cleared to prevent stale resume)', {
@@ -312,6 +314,18 @@ export class SessionManager {
 session = this.initializeSession(sessionDbId);
 }

+// Circuit breaker: skip summarize if too many consecutive failures (#1633).
+// This prevents the infinite loop where each failed summary spawns a new session
+// with an ever-growing prompt. Counter is in-memory per ActiveSession — it resets
+// on worker restart, which is acceptable because session state is already ephemeral.
+if (session.consecutiveSummaryFailures >= MAX_CONSECUTIVE_SUMMARY_FAILURES) {
+logger.warn('SESSION', `Circuit breaker OPEN: skipping summarize after ${session.consecutiveSummaryFailures} consecutive failures (#1633)`, {
+sessionId: sessionDbId,
+contentSessionId: session.contentSessionId
+});
+return;
+}
+
 // CRITICAL: Persist to database FIRST
 const message: PendingMessage = {
 type: 'summarize',
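The SessionManager hunk adds the queue-side half of the breaker: once `consecutiveSummaryFailures` reaches the threshold, `queueSummarize` logs and returns before anything is persisted. The model below is a simplified sketch; `SummaryQueue` and its boolean return value are illustrative inventions (the real method returns nothing), and an in-memory array stands in for the database-backed pending-message store.

```typescript
// Simplified model of the queue-side gate; SummaryQueue is hypothetical.
const MAX_CONSECUTIVE_SUMMARY_FAILURES = 3;

interface SessionState {
  consecutiveSummaryFailures: number;
}

class SummaryQueue {
  // In-memory stand-in for the persisted pending-message queue.
  readonly pending: Array<{ type: 'summarize'; sessionDbId: number }> = [];

  /** Returns false when the breaker is open and the request is dropped. */
  queueSummarize(sessionDbId: number, session: SessionState): boolean {
    if (session.consecutiveSummaryFailures >= MAX_CONSECUTIVE_SUMMARY_FAILURES) {
      console.warn(`breaker open: skipping summarize for session ${sessionDbId}`);
      return false;
    }
    this.pending.push({ type: 'summarize', sessionDbId });
    return true;
  }
}

const queue = new SummaryQueue();
const session: SessionState = { consecutiveSummaryFailures: 2 };
console.log(queue.queueSummarize(7, session)); // true
session.consecutiveSummaryFailures = 3;
console.log(queue.queueSummarize(7, session)); // false
console.log(queue.pending.length); // 1
```

As far as this diff shows, the counter is reset only by a successful summary turn, and no new summary turn is queued while the breaker is open, so an open breaker should stay open for the life of the in-memory `ActiveSession`; the code comment notes that a worker restart clears it.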
@@ -13,6 +13,7 @@

 import { logger } from '../../../utils/logger.js';
 import { parseObservations, parseSummary, type ParsedObservation, type ParsedSummary } from '../../../sdk/parser.js';
+import { SUMMARY_MODE_MARKER, MAX_CONSECUTIVE_SUMMARY_FAILURES } from '../../../sdk/prompts.js';
 import { updateCursorContextForProject } from '../../integrations/CursorHooksInstaller.js';
 import { updateFolderClaudeMdFiles } from '../../../utils/claude-md-utils.js';
 import { getWorkerPort } from '../../../shared/worker-utils.js';
@@ -67,7 +68,17 @@ export async function processAgentResponse(

 // Parse observations and summary
 const observations = parseObservations(text, session.contentSessionId);
-const summary = parseSummary(text, session.sessionDbId);
+
+// Detect whether the most recent prompt was a summary request.
+// If so, enable observation-to-summary coercion to prevent the infinite
+// retry loop described in #1633.
+const lastMessage = session.conversationHistory.at(-1);
+const lastUserMessage = lastMessage?.role === 'user'
+? lastMessage
+: session.conversationHistory.findLast(m => m.role === 'user') ?? null;
+const summaryExpected = lastUserMessage?.content?.includes(SUMMARY_MODE_MARKER) ?? false;
+
+const summary = parseSummary(text, session.sessionDbId, summaryExpected);

 if (
 text.trim() &&
@@ -130,6 +141,32 @@ export async function processAgentResponse(
 // to the Stop hook for silent-summary-loss detection (#1633)
 session.lastSummaryStored = result.summaryId !== null;

+// Circuit breaker: track consecutive summary failures (#1633).
+// Only evaluate when a summary was actually expected (summarize message was sent).
+// Without this guard, the counter would increment on every normal observation
+// response, tripping the breaker after 3 observations and permanently blocking
+// summarization — reproducing the data-loss scenario this fix is meant to prevent.
+if (summaryExpected) {
+const skippedIntentionally = /<skip_summary\b/.test(text);
+if (summaryForStore !== null) {
+// Summary was present in the response — reset the failure counter
+session.consecutiveSummaryFailures = 0;
+} else if (skippedIntentionally) {
+// Explicit <skip_summary/> is a valid protocol response — neither success
+// nor failure. Leave the counter unchanged so we don't mask a bad run that
+// happens to end on a skip, but also don't punish intentional skips.
+} else {
+// Summary was expected but none was stored — count as failure
+session.consecutiveSummaryFailures += 1;
+if (session.consecutiveSummaryFailures >= MAX_CONSECUTIVE_SUMMARY_FAILURES) {
+logger.error('SESSION', `Circuit breaker: ${session.consecutiveSummaryFailures} consecutive summary failures — further summarize requests will be skipped (#1633)`, {
+sessionId: session.sessionDbId,
+contentSessionId: session.contentSessionId
+});
+}
+}
+}
+
 // CLAIM-CONFIRM: Now that storage succeeded, confirm all processing messages (delete from queue)
 // This is the critical step that prevents message loss on generator crash
 const pendingStore = sessionManager.getPendingMessageStore();
@@ -8,10 +8,14 @@ import { describe, it, expect } from 'bun:test';
 import { parseSummary } from '../../src/sdk/parser.js';

 describe('parseSummary', () => {
-it('returns null when no <summary> tag present', () => {
+it('returns null when no <summary> tag present and coercion disabled', () => {
 expect(parseSummary('<observation><title>foo</title></observation>')).toBeNull();
 });
+
+it('returns null when no <summary> or <observation> tags present', () => {
+expect(parseSummary('Some plain text response without any XML tags')).toBeNull();
+});

 it('returns null when <summary> has no sub-tags (false positive — fix for #1360)', () => {
 // This is the bug: observation response accidentally contains <summary>some text</summary>
 expect(parseSummary('<observation>done <summary>some content here</summary></observation>')).toBeNull();
@@ -50,4 +54,96 @@ describe('parseSummary', () => {
 it('returns null when skip_summary tag is present', () => {
 expect(parseSummary('<skip_summary reason="no work done"/>')).toBeNull();
 });
+
+// Observation-to-summary coercion tests (#1633)
+it('coerces <observation> with content into a summary when coerceFromObservation=true (#1633)', () => {
+const result = parseSummary('<observation><title>foo</title></observation>', undefined, true);
+expect(result).not.toBeNull();
+expect(result?.request).toBe('foo');
+expect(result?.completed).toBe('foo');
+});
+
+it('coerces observation with narrative into summary with investigated field (#1633)', () => {
+const text = `<observation>
+<type>refactor</type>
+<title>UObjectArray refactored</title>
+<narrative>Removed local XXXX and migrated to new pattern</narrative>
+</observation>`;
+const result = parseSummary(text, undefined, true);
+expect(result).not.toBeNull();
+expect(result?.request).toBe('UObjectArray refactored');
+expect(result?.investigated).toBe('Removed local XXXX and migrated to new pattern');
+});
+
+it('coerces observation with facts into summary with learned field (#1633)', () => {
+const text = `<observation>
+<type>discovery</type>
+<title>JWT token handling</title>
+<facts>
+<fact>Tokens expire after 1 hour</fact>
+<fact>Refresh flow uses rotating keys</fact>
+</facts>
+</observation>`;
+const result = parseSummary(text, undefined, true);
+expect(result).not.toBeNull();
+expect(result?.request).toBe('JWT token handling');
+expect(result?.learned).toBe('Tokens expire after 1 hour; Refresh flow uses rotating keys');
+});
+
+it('coerces observation with subtitle into completed field (#1633)', () => {
+const text = `<observation>
+<type>config</type>
+<title>Database migration</title>
+<subtitle>Added new index for performance</subtitle>
+</observation>`;
+const result = parseSummary(text, undefined, true);
+expect(result).not.toBeNull();
+expect(result?.completed).toBe('Database migration — Added new index for performance');
+});
+
+it('returns null for empty observation even with coercion enabled (#1633)', () => {
+const text = `<observation><type>config</type></observation>`;
+expect(parseSummary(text, undefined, true)).toBeNull();
+});
+
+it('prefers <summary> tags over observation coercion when both present (#1633)', () => {
+const text = `<observation><title>obs title</title></observation>
+<summary><request>summary request</request></summary>`;
+const result = parseSummary(text, undefined, true);
+expect(result).not.toBeNull();
+expect(result?.request).toBe('summary request');
+});
+
+it('falls back to observation coercion when <summary> matches but has empty sub-tags (#1633)', () => {
+// LLM wraps an empty summary around real observation content — without the
+// fallback, the empty-subtag guard (#1360) rejects the summary and we lose
+// the observation content, resurrecting the retry loop.
+const text = `<summary></summary>
+<observation>
+<title>the real work</title>
+<narrative>what actually happened</narrative>
+</observation>`;
+const result = parseSummary(text, undefined, true);
+expect(result).not.toBeNull();
+expect(result?.request).toBe('the real work');
+expect(result?.investigated).toBe('what actually happened');
+});
+
+it('empty <summary> with no observation content still returns null (coercion disabled)', () => {
+const text = '<summary></summary>';
+expect(parseSummary(text, undefined, true)).toBeNull();
+});
+
+it('skips empty leading observation blocks and coerces from the first populated one (#1633)', () => {
+const text = `<observation><type>discovery</type></observation>
+<observation>
+<type>bugfix</type>
+<title>second block has content</title>
+<narrative>fixed the crash</narrative>
+</observation>`;
+const result = parseSummary(text, undefined, true);
+expect(result).not.toBeNull();
+expect(result?.request).toBe('second block has content');
+expect(result?.investigated).toBe('fixed the crash');
+});
 });
@@ -31,6 +31,7 @@ mock.module('../../../src/services/domain/ModeManager.js', () => ({

 // Import after mocks
 import { processAgentResponse } from '../../../src/services/worker/agents/ResponseProcessor.js';
+import { SUMMARY_MODE_MARKER } from '../../../src/sdk/prompts.js';
 import type { WorkerRef, StorageResult } from '../../../src/services/worker/agents/types.js';
 import type { ActiveSession } from '../../../src/services/worker-types.js';
 import type { DatabaseManager } from '../../../src/services/worker/DatabaseManager.js';
@@ -130,8 +131,9 @@ describe('ResponseProcessor', () => {
 conversationHistory: [],
 currentProvider: 'claude',
 processingMessageIds: [], // CLAIM-CONFIRM pattern: track message IDs being processed
+consecutiveSummaryFailures: 0,
 ...overrides,
-};
+} as ActiveSession;
 }

 describe('parsing observations from XML response', () => {
@@ -726,4 +728,103 @@ describe('ResponseProcessor', () => {
 expect(session.lastSummaryStored).toBe(false);
 });
 });
+
+describe('circuit breaker: consecutiveSummaryFailures counter (#1633)', () => {
+const SUMMARY_PROMPT = `--- ${SUMMARY_MODE_MARKER} ---\nDo the summary now.`;
+
+it('does NOT increment the counter on normal observation responses (P1 regression guard)', async () => {
+// Session where the last user message is an OBSERVATION request, not a summary request.
+// The counter must stay at 0 even though the response has <observation> tags and no summary.
+mockStoreObservations.mockImplementation(() => ({
+observationIds: [1],
+summaryId: null,
+createdAtEpoch: 1700000000000,
+} as StorageResult));
+
+const session = createMockSession({
+conversationHistory: [{ role: 'user', content: 'record a new observation' }],
+});
+const obsResponse = `
+<observation>
+<type>discovery</type>
+<title>found a thing</title>
+<narrative>it happened</narrative>
+<facts></facts>
+<concepts></concepts>
+<files_read></files_read>
+<files_modified></files_modified>
+</observation>
+`;
+
+// Drive multiple observation responses — counter must never increment.
+for (let i = 0; i < 5; i++) {
+await processAgentResponse(obsResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');
+}
+
+expect(session.consecutiveSummaryFailures).toBe(0);
+});
+
+it('increments the counter when a summary was expected but none was stored', async () => {
+mockStoreObservations.mockImplementation(() => ({
+observationIds: [],
+summaryId: null,
+createdAtEpoch: 1700000000000,
+} as StorageResult));
+
+const session = createMockSession({
+conversationHistory: [{ role: 'user', content: SUMMARY_PROMPT }],
+});
+// LLM returned nothing structured — no summary stored
+const badResponse = 'I cannot comply with that request.';
+
+await processAgentResponse(badResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');
+
+expect(session.consecutiveSummaryFailures).toBe(1);
+});
+
+it('does NOT increment the counter on intentional <skip_summary/> responses', async () => {
+mockStoreObservations.mockImplementation(() => ({
+observationIds: [],
+summaryId: null,
+createdAtEpoch: 1700000000000,
+} as StorageResult));
+
+const session = createMockSession({
+consecutiveSummaryFailures: 1,
+conversationHistory: [{ role: 'user', content: SUMMARY_PROMPT }],
+});
+const skipResponse = '<skip_summary reason="no meaningful work this session"/>';
+
+await processAgentResponse(skipResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');
+
+// Skip is neutral — counter stays where it was, no spurious increment
+expect(session.consecutiveSummaryFailures).toBe(1);
+});
+
+it('resets the counter to 0 when a summary is successfully stored', async () => {
+mockStoreObservations.mockImplementation(() => ({
+observationIds: [],
+summaryId: 42,
+createdAtEpoch: 1700000000000,
+} as StorageResult));
+
+const session = createMockSession({
+consecutiveSummaryFailures: 2,
+conversationHistory: [{ role: 'user', content: SUMMARY_PROMPT }],
+});
+const goodResponse = `
+<summary>
+<request>wrap it up</request>
+<investigated>the thing</investigated>
+<learned>the answer</learned>
+<completed>the work</completed>
+<next_steps>none</next_steps>
+</summary>
+`;
+
+await processAgentResponse(goodResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');
+
+expect(session.consecutiveSummaryFailures).toBe(0);
+});
+});
 });