fix: break infinite summary-retry loop (#1633) (#2072)

* Initial plan

* fix: break infinite summary-retry loop (#1633)

Three-part fix:
1. Parser coercion: When the LLM returns <observation> tags instead of <summary>,
   coerce the observation content into summary fields (root-cause fix)
2. Stronger summary prompt: Add clearer tag requirements with warnings
3. Circuit breaker: Track consecutive summary failures per session,
   skip further attempts after 3 failures to prevent unbounded prompt growth
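
As a rough illustration of part 3, the check that skips further summarize attempts might look like the sketch below. Only the threshold of 3 comes from this commit; `BreakerState`, `consecutiveSummaryFailures`, and `shouldQueueSummarize` are hypothetical names, not the actual SessionManager code.

```typescript
// Threshold taken from the commit message: stop after 3 consecutive failures.
const MAX_CONSECUTIVE_SUMMARY_FAILURES = 3;

// Hypothetical per-session state; the real session object carries more fields.
interface BreakerState {
  consecutiveSummaryFailures: number;
}

// Returns false once the breaker is open, so no further summarize request
// (and no further prompt growth) is queued for this session.
function shouldQueueSummarize(session: BreakerState): boolean {
  return session.consecutiveSummaryFailures < MAX_CONSECUTIVE_SUMMARY_FAILURES;
}
```

Keeping the check as a pure function of the counter makes the threshold easy to test in isolation.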

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* refactor: extract shared constants for summary mode marker and failure threshold

Addresses code review feedback: SUMMARY_MODE_MARKER and
MAX_CONSECUTIVE_SUMMARY_FAILURES are now defined once in sdk/prompts.ts
and imported by ResponseProcessor and SessionManager.

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* fix: guard summary failure counter on summaryExpected (Greptile P1)

The circuit breaker counter previously incremented on any response
containing <observation> or <summary> tags — which matches virtually
every normal observation response. After 3 observations the breaker
would open and permanently block summarization, reproducing the
data-loss scenario #1633 was meant to prevent.

Gate the increment block on summaryExpected (already computed for
parseSummary coercion) so the counter only tracks actual summary
attempts.
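
A minimal sketch of the gating described above, assuming a per-session counter field. The names `SessionCounters` and `recordResponse` are hypothetical; only the idea of gating on `summaryExpected` is taken from this commit.

```typescript
// Hypothetical state shape, for illustration only.
interface SessionCounters {
  consecutiveSummaryFailures: number;
}

function recordResponse(
  session: SessionCounters,
  summaryExpected: boolean,
  summaryStored: boolean,
): void {
  // Ordinary observation responses never touch the counter.
  if (!summaryExpected) return;
  if (summaryStored) {
    session.consecutiveSummaryFailures = 0;
  } else {
    session.consecutiveSummaryFailures += 1;
  }
}

// Three normal observation responses leave the breaker closed.
const session: SessionCounters = { consecutiveSummaryFailures: 0 };
for (let i = 0; i < 3; i++) recordResponse(session, false, false);
console.log(session.consecutiveSummaryFailures); // 0
```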

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: cover circuit-breaker + apply review polish

- Use findLast / at(-1) for last-user-message lookup instead of
  filter + index (O(1) common case).
- Drop redundant `|| 0` fallback — field is required and initialized.
- Add comment noting counter is ephemeral by design.
- Add ResponseProcessor tests covering:
  * counter NOT incrementing on normal observation responses
    (regression guard for the Greptile P1)
  * counter incrementing when a summary was expected but missing
  * counter resetting to 0 on successful summary storage

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: iterate all observation blocks; don't count skip_summary as failure

Addresses CodeRabbit review on #2072:

- coerceObservationToSummary now iterates all <observation> blocks
  with a global regex and returns the first block that has title,
  narrative, or facts. Previously, an empty leading observation
  would short-circuit and discard populated follow-ups.

- Circuit-breaker counter now treats explicit <skip_summary/> as
  neutral — neither a failure nor a success — so a run that happens
  to end on a skip doesn't punish the session or mask a prior bad
  streak. Real failures (no summary, no skip) still increment.

- Tests added for both cases.
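
The three counter outcomes from the second bullet can be sketched as a small pure function. `SummaryOutcome` and `nextFailureCount` are hypothetical names chosen for illustration; the real logic lives in ResponseProcessor and may be structured differently.

```typescript
// Hypothetical outcome type; names are illustrative only.
type SummaryOutcome = "stored" | "skipped" | "missing";

function nextFailureCount(current: number, outcome: SummaryOutcome): number {
  switch (outcome) {
    case "stored":
      return 0; // success resets the streak
    case "skipped":
      return current; // explicit <skip_summary/> is neutral
    case "missing":
      return current + 1; // no summary and no skip: a real failure
  }
}

// A skip neither clears nor extends a prior bad streak.
const outcomes: SummaryOutcome[] = ["missing", "missing", "skipped", "missing"];
const finalCount = outcomes.reduce(nextFailureCount, 0);
console.log(finalCount); // 3
```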

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: reference SUMMARY_MODE_MARKER constant instead of hardcoded string

Addresses CodeRabbit nitpick: tests should pull the marker from the
canonical source so they don't silently drift when the constant is
renamed or edited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: also coerce observations when <summary> has empty sub-tags

When the LLM wraps an empty <summary></summary> around real observation
content, the #1360 empty-subtag guard rejects the summary and returns
null — which would lose the observation content and resurrect the
#1633 retry loop. Fall back to coerceObservationToSummary in that
branch too, mirroring the unmatched-<summary> path.

Adds a test covering the empty-summary-wraps-observation case and
a guard test for empty summary with no observation content.
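
The fallback can be illustrated with a simplified, self-contained sketch. `MiniSummary`, `field`, `coerce`, and `parse` are hypothetical stand-ins for the real ParsedSummary, extractField, coerceObservationToSummary, and parseSummary, and they handle only two fields.

```typescript
// Simplified stand-ins for the parser helpers; the real ones cover more fields.
interface MiniSummary {
  request: string | null;
  investigated: string | null;
}

// Returns the trimmed tag content, or null when missing or empty.
function field(xml: string, tag: string): string | null {
  const m = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(xml);
  const value = (m?.[1] ?? "").trim();
  return value === "" ? null : value;
}

function coerce(text: string): MiniSummary | null {
  // Global regex: keep scanning past empty leading blocks.
  const obsRegex = /<observation>([\s\S]*?)<\/observation>/g;
  let match: RegExpExecArray | null;
  while ((match = obsRegex.exec(text)) !== null) {
    const body = match[1] ?? "";
    const title = field(body, "title");
    const narrative = field(body, "narrative");
    if (title || narrative) {
      return { request: title, investigated: narrative };
    }
  }
  return null;
}

function parse(text: string, coerceFromObservation: boolean): MiniSummary | null {
  const summary = /<summary>([\s\S]*?)<\/summary>/.exec(text);
  if (summary) {
    const body = summary[1] ?? "";
    const request = field(body, "request");
    const investigated = field(body, "investigated");
    if (request || investigated) return { request, investigated };
  }
  // Missing or empty <summary>: fall back to observation content if allowed.
  return coerceFromObservation ? coerce(text) : null;
}

const response =
  "<summary></summary><observation></observation>" +
  "<observation><title>Fix loop</title><narrative>Added breaker</narrative></observation>";
console.log(parse(response, true));
```

With coercion enabled, the empty summary and the empty leading observation are both passed over and the populated second observation is used. With coercion disabled, or with no observation content at all, the result is null.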

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>
Co-authored-by: Alex Newman <thedotmack@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot
2026-04-19 12:00:38 -07:00
committed by GitHub
parent beea7899b9
commit 8ec91e7ffa
7 changed files with 351 additions and 11 deletions
+76 -4
@@ -113,8 +113,13 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
/**
* Parse summary XML block from SDK response
* Returns null if no valid summary found or if summary was skipped
*
* @param coerceFromObservation - When true, attempts to convert <observation> tags
* into summary fields if no <summary> tags are found. Only set this when the
* response was expected to be a summary (i.e., a summarize message was sent).
* Prevents the infinite retry loop described in #1633.
*/
-export function parseSummary(text: string, sessionId?: number): ParsedSummary | null {
+export function parseSummary(text: string, sessionId?: number, coerceFromObservation: boolean = false): ParsedSummary | null {
// Check for skip_summary first
const skipRegex = /<skip_summary\s+reason="([^"]+)"\s*\/>/;
const skipMatch = skipRegex.exec(text);
@@ -132,10 +137,22 @@ export function parseSummary(text: string, sessionId?: number): ParsedSummary |
const summaryMatch = summaryRegex.exec(text);
if (!summaryMatch) {
-// Log when the response contains <observation> instead of <summary>
-// to help diagnose prompt conditioning issues (see #1312)
+// When the LLM returns <observation> tags instead of <summary> tags,
+// coerce the observation content into summary fields rather than discarding it.
+// This breaks the infinite retry loop described in #1633: without coercion,
+// the summary is silently dropped, the session completes without a summary,
+// a new session is spawned with an ever-growing prompt, and the cycle repeats.
+// Only coerce when explicitly requested (i.e., when a summarize message was sent).
 if (/<observation>/.test(text)) {
-logger.warn('PARSER', 'Summary response contained <observation> tags instead of <summary> — prompt conditioning may need strengthening', { sessionId });
+if (coerceFromObservation) {
+const coerced = coerceObservationToSummary(text, sessionId);
+if (coerced) {
+return coerced;
+}
+logger.warn('PARSER', 'Summary response contained <observation> tags instead of <summary> — coercion failed, no usable content', { sessionId });
+} else {
+logger.warn('PARSER', 'Summary response contained <observation> tags instead of <summary> — prompt conditioning may need strengthening', { sessionId });
+}
}
return null;
}
@@ -171,6 +188,17 @@ export function parseSummary(text: string, sessionId?: number): ParsedSummary |
// This is NOT the same as missing some fields (which we intentionally allow above).
// Fix for #1360.
if (!request && !investigated && !learned && !completed && !next_steps) {
// If the response also contains <observation> tags with real content, fall
// back to coercion rather than discarding the response entirely — this covers
// the case where the LLM wraps empty <summary></summary> around observation
// content, which would otherwise resurrect the #1633 retry loop.
if (coerceFromObservation && /<observation>/.test(text)) {
const coerced = coerceObservationToSummary(text, sessionId);
if (coerced) {
logger.warn('PARSER', 'Empty <summary> match rejected — coerced from <observation> fallback (#1633)', { sessionId });
return coerced;
}
}
logger.warn('PARSER', 'Summary match has no sub-tags — skipping false positive', { sessionId });
return null;
}
@@ -185,6 +213,50 @@ export function parseSummary(text: string, sessionId?: number): ParsedSummary |
};
}
/**
* Coerce <observation> response into a ParsedSummary when <summary> tags are missing.
* Maps observation fields to the closest summary equivalents so that a usable
* summary is stored instead of nothing — breaking the retry loop (#1633).
*/
function coerceObservationToSummary(text: string, sessionId?: number): ParsedSummary | null {
// Iterate all <observation> blocks — if the LLM emits multiple and the first is
// empty, we still want to salvage the first one that has usable content.
const obsRegex = /<observation>([\s\S]*?)<\/observation>/g;
let obsMatch: RegExpExecArray | null;
let blockIndex = 0;
while ((obsMatch = obsRegex.exec(text)) !== null) {
const obsContent = obsMatch[1];
const title = extractField(obsContent, 'title');
const subtitle = extractField(obsContent, 'subtitle');
const narrative = extractField(obsContent, 'narrative');
const facts = extractArrayElements(obsContent, 'facts', 'fact');
if (title || narrative || facts.length > 0) {
// Map observation fields → summary fields (best-effort)
const request = title || subtitle || null;
const investigated = narrative || null;
const learned = facts.length > 0 ? facts.join('; ') : null;
const completed = title ? `${title}${subtitle ? ' — ' + subtitle : ''}` : null;
const next_steps = null; // No direct observation equivalent
logger.warn('PARSER', 'Coerced <observation> response into <summary> to prevent retry loop (#1633)', {
sessionId,
blockIndex,
hasTitle: !!title,
hasNarrative: !!narrative,
factCount: facts.length,
});
return { request, investigated, learned, completed, next_steps, notes: null };
}
blockIndex++;
}
return null;
}
/**
* Extract a simple field value from XML content
* Returns null for missing or empty/whitespace-only fields
+20 -3
@@ -6,6 +6,20 @@
import { logger } from '../utils/logger.js';
import type { ModeConfig } from '../services/domain/types.js';
/**
* Marker string embedded in summary prompts — used by ResponseProcessor to detect
* whether the most recent user message was a summary request (enables observation→summary
* coercion for #1633). Keep in sync with buildSummaryPrompt below.
*/
export const SUMMARY_MODE_MARKER = 'MODE SWITCH: PROGRESS SUMMARY';
/**
* Maximum consecutive summary failures before the circuit breaker opens.
* After this many failures, SessionManager.queueSummarize will skip further
* summarize requests to prevent the infinite retry loop (#1633).
*/
export const MAX_CONSECUTIVE_SUMMARY_FAILURES = 3;
export interface Observation {
id: number;
tool_name: string;
@@ -134,9 +148,11 @@ export function buildSummaryPrompt(session: SDKSession, mode: ModeConfig): strin
return '';
})();
-return `--- MODE SWITCH: PROGRESS SUMMARY ---
-Do NOT output <observation> tags. This is a summary request, not an observation request.
-Your response MUST use <summary> tags ONLY. Any <observation> output will be discarded.
+return `--- ${SUMMARY_MODE_MARKER} ---
+⚠️ CRITICAL TAG REQUIREMENT — READ CAREFULLY:
+You MUST wrap your ENTIRE response in <summary>...</summary> tags.
+• Do NOT use <observation> tags. <observation> output will be DISCARDED and cause a system error.
+• The ONLY accepted root tag is <summary>. Any other root tag is a protocol violation.
${mode.prompts.header_summary_checkpoint}
${mode.prompts.summary_instruction}
@@ -154,6 +170,7 @@ ${mode.prompts.summary_format_instruction}
<notes>${mode.prompts.xml_summary_notes_placeholder}</notes>
</summary>
REMINDER: Your response MUST use <summary> as the root tag, NOT <observation>.
${mode.prompts.summary_footer}`;
}
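
For context on how SUMMARY_MODE_MARKER is meant to be consumed, here is a hedged sketch of detecting whether the most recent user message was a summary request. The marker value is from this diff; `ChatMessage` and `isSummaryExpected` are hypothetical, and the real ResponseProcessor lookup may differ (the commit message mentions findLast / at(-1)).

```typescript
const SUMMARY_MODE_MARKER = "MODE SWITCH: PROGRESS SUMMARY";

// Hypothetical message shape; the real conversation history type may differ.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// True when the most recent user message was a summary request.
function isSummaryExpected(history: ChatMessage[]): boolean {
  // Walk backwards so the common case stops at or near the last element.
  for (let i = history.length - 1; i >= 0; i--) {
    const message = history[i];
    if (message && message.role === "user") {
      return message.content.includes(SUMMARY_MODE_MARKER);
    }
  }
  return false;
}
```

The result of such a check would be what gets passed as `coerceFromObservation` and what gates the failure counter.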