fix: break infinite summary-retry loop (#1633) (#2072)

* Initial plan

* fix: break infinite summary-retry loop (#1633)

Three-part fix:
1. Parser coercion: When LLM returns <observation> tags instead of
   <summary>, coerce observation content into summary fields (root
   cause fix)
2. Stronger summary prompt: Add clearer tag requirements with warnings
3. Circuit breaker: Track consecutive summary failures per session,
   skip further attempts after 3 failures to prevent unbounded prompt
   growth

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* refactor: extract shared constants for summary mode marker and failure threshold

Addresses code review feedback: SUMMARY_MODE_MARKER and
MAX_CONSECUTIVE_SUMMARY_FAILURES are now defined once in sdk/prompts.ts
and imported by ResponseProcessor and SessionManager.

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* fix: guard summary failure counter on summaryExpected (Greptile P1)

The circuit breaker counter previously incremented on any response
containing <observation> or <summary> tags, which matches virtually
every normal observation response. After 3 observations the breaker
would open and permanently block summarization, reproducing the
data-loss scenario #1633 was meant to prevent.

Gate the increment block on summaryExpected (already computed for
parseSummary coercion) so the counter only tracks actual summary
attempts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: cover circuit-breaker + apply review polish

- Use findLast / at(-1) for last-user-message lookup instead of
  filter + index (O(1) common case).
- Drop redundant `|| 0` fallback: field is required and initialized.
- Add comment noting counter is ephemeral by design.
- Add ResponseProcessor tests covering:
  * counter NOT incrementing on normal observation responses
    (regression guard for the Greptile P1)
  * counter incrementing when a summary was expected but missing
  * counter resetting to 0 on successful summary storage

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: iterate all observation blocks; don't count skip_summary as failure

Addresses CodeRabbit review on #2072:

- coerceObservationToSummary now iterates all <observation> blocks with
  a global regex and returns the first block that has title, narrative,
  or facts. Previously, an empty leading observation would short-circuit
  and discard populated follow-ups.
- Circuit-breaker counter now treats explicit <skip_summary/> as neutral
  (neither a failure nor a success) so a run that happens to end on a
  skip doesn't punish the session or mask a prior bad streak. Real
  failures (no summary, no skip) still increment.
- Tests added for both cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: reference SUMMARY_MODE_MARKER constant instead of hardcoded string

Addresses CodeRabbit nitpick: tests should pull the marker from the
canonical source so they don't silently drift when the constant is
renamed or edited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: also coerce observations when <summary> has empty sub-tags

When the LLM wraps an empty <summary></summary> around real observation
content, the #1360 empty-subtag guard rejects the summary and returns
null, which would lose the observation content and resurrect the #1633
retry loop. Fall back to coerceObservationToSummary in that branch too,
mirroring the unmatched-<summary> path.

Adds a test covering the empty-summary-wraps-observation case and a
guard test for empty summary with no observation content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>
Co-authored-by: Alex Newman <thedotmack@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:00:38 -07:00
parent beea7899b9
commit 8ec91e7ffa
7 changed files with 351 additions and 11 deletions
@@ -113,8 +113,13 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
 /**
 * Parse summary XML block from SDK response
 * Returns null if no valid summary found or if summary was skipped
+ *
+ * @param coerceFromObservation - When true, attempts to convert <observation> tags
+ *   into summary fields if no <summary> tags are found or the <summary> block has
+ *   no populated sub-tags. Only set this when the response was expected to be a
+ *   summary (i.e., a summarize message was sent). Prevents the retry loop in #1633.
 */
-export function parseSummary(text: string, sessionId?: number): ParsedSummary | null {
+export function parseSummary(text: string, sessionId?: number, coerceFromObservation: boolean = false): ParsedSummary | null {
  // Check for skip_summary first
  const skipRegex = /<skip_summary\s+reason="([^"]+)"\s*\/>/;
  const skipMatch = skipRegex.exec(text);
@@ -132,10 +137,22 @@ export function parseSummary(text: string, sessionId?: number): ParsedSummary |
  const summaryMatch = summaryRegex.exec(text);

  if (!summaryMatch) {
-    // Log when the response contains <observation> instead of <summary>
-    // to help diagnose prompt conditioning issues (see #1312)
+    // When the LLM returns <observation> tags instead of <summary> tags,
+    // coerce the observation content into summary fields rather than discarding it.
+    // This breaks the infinite retry loop described in #1633: without coercion,
+    // the summary is silently dropped, the session completes without a summary,
+    // a new session is spawned with an ever-growing prompt, and the cycle repeats.
+    // Only coerce when explicitly requested (i.e., when a summarize message was sent).
    if (/<observation>/.test(text)) {
-      logger.warn('PARSER', 'Summary response contained <observation> tags instead of <summary> — prompt conditioning may need strengthening', { sessionId });
+      if (coerceFromObservation) {
+        const coerced = coerceObservationToSummary(text, sessionId);
+        if (coerced) {
+          return coerced;
+        }
+        logger.warn('PARSER', 'Summary response contained <observation> tags instead of <summary> — coercion failed, no usable content', { sessionId });
+      } else {
+        logger.warn('PARSER', 'Summary response contained <observation> tags instead of <summary> — prompt conditioning may need strengthening', { sessionId });
+      }
    }
    return null;
  }
@@ -171,6 +188,17 @@ export function parseSummary(text: string, sessionId?: number): ParsedSummary |
  // This is NOT the same as missing some fields (which we intentionally allow above).
  // Fix for #1360.
  if (!request && !investigated && !learned && !completed && !next_steps) {
+    // If the response also contains <observation> tags with real content, fall
+    // back to coercion rather than discarding the response entirely — this covers
+    // the case where the LLM wraps empty <summary></summary> around observation
+    // content, which would otherwise resurrect the #1633 retry loop.
+    if (coerceFromObservation && /<observation>/.test(text)) {
+      const coerced = coerceObservationToSummary(text, sessionId);
+      if (coerced) {
+        logger.warn('PARSER', 'Empty <summary> match rejected — coerced from <observation> fallback (#1633)', { sessionId });
+        return coerced;
+      }
+    }
    logger.warn('PARSER', 'Summary match has no sub-tags — skipping false positive', { sessionId });
    return null;
  }
@@ -185,6 +213,50 @@ export function parseSummary(text: string, sessionId?: number): ParsedSummary |
  };
 }

+/**
+ * Coerce <observation> response into a ParsedSummary when <summary> tags are missing.
+ * Maps observation fields to the closest summary equivalents so that a usable
+ * summary is stored instead of nothing — breaking the retry loop (#1633).
+ */
+function coerceObservationToSummary(text: string, sessionId?: number): ParsedSummary | null {
+  // Iterate all <observation> blocks — if the LLM emits multiple and the first is
+  // empty, we still want to salvage the first one that has usable content.
+  const obsRegex = /<observation>([\s\S]*?)<\/observation>/g;
+  let obsMatch: RegExpExecArray | null;
+  let blockIndex = 0;
+
+  while ((obsMatch = obsRegex.exec(text)) !== null) {
+    const obsContent = obsMatch[1];
+    const title = extractField(obsContent, 'title');
+    const subtitle = extractField(obsContent, 'subtitle');
+    const narrative = extractField(obsContent, 'narrative');
+    const facts = extractArrayElements(obsContent, 'facts', 'fact');
+
+    if (title || narrative || facts.length > 0) {
+      // Map observation fields → summary fields (best-effort)
+      const request = title || subtitle || null;
+      const investigated = narrative || null;
+      const learned = facts.length > 0 ? facts.join('; ') : null;
+      const completed = title ? `${title}${subtitle ? ' — ' + subtitle : ''}` : null;
+      const next_steps = null; // No direct observation equivalent
+
+      logger.warn('PARSER', 'Coerced <observation> response into <summary> to prevent retry loop (#1633)', {
+        sessionId,
+        blockIndex,
+        hasTitle: !!title,
+        hasNarrative: !!narrative,
+        factCount: facts.length,
+      });
+
+      return { request, investigated, learned, completed, next_steps, notes: null };
+    }
+
+    blockIndex++;
+  }
+
+  return null;
+}
+
 /**
 * Extract a simple field value from XML content
 * Returns null for missing or empty/whitespace-only fields
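Review note: the coercion added in this hunk can be exercised in isolation with a reduced sketch. The helper names `firstTagText`, `allTagTexts`, and `coerceSketch`, and the `SummarySketch` type, are hypothetical stand-ins for illustration; they are not the project's `extractField`, `extractArrayElements`, or `ParsedSummary`. The sketch maps only three fields and leaves out `completed`, `next_steps`, and `notes`.

```typescript
// Hypothetical, reduced summary shape (the real ParsedSummary has more fields).
interface SummarySketch {
  request: string | null;
  investigated: string | null;
  learned: string | null;
}

// Returns every non-blank, trimmed <tag>...</tag> body found in xml.
function allTagTexts(xml: string, tag: string): string[] {
  const regex = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`, 'g');
  const values: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = regex.exec(xml)) !== null) {
    const value = (match[1] ?? '').trim();
    if (value !== '') values.push(value);
  }
  return values;
}

// Returns the first non-blank <tag> body, or null when none exists.
function firstTagText(xml: string, tag: string): string | null {
  const values = allTagTexts(xml, tag);
  return values.length > 0 ? values[0] ?? null : null;
}

// Walks all <observation> blocks and coerces the first one with usable content.
function coerceSketch(text: string): SummarySketch | null {
  for (const block of allTagTexts(text, 'observation')) {
    const title = firstTagText(block, 'title');
    const subtitle = firstTagText(block, 'subtitle');
    const narrative = firstTagText(block, 'narrative');
    const facts = allTagTexts(block, 'fact');

    // A block counts as usable only if it has a title, a narrative, or facts.
    if (title === null && narrative === null && facts.length === 0) continue;

    return {
      request: title ?? subtitle,
      investigated: narrative,
      learned: facts.length > 0 ? facts.join('; ') : null,
    };
  }
  return null;
}
```

Because the loop continues past blocks with no usable fields, an empty leading `<observation>` does not hide a populated one that follows it.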
@@ -6,6 +6,20 @@
 import { logger } from '../utils/logger.js';
 import type { ModeConfig } from '../services/domain/types.js';

+/**
+ * Marker string embedded in summary prompts; used by ResponseProcessor to detect
+ * whether the most recent user message was a summary request (enables observation→summary
+ * coercion for #1633). buildSummaryPrompt below interpolates this constant directly.
+ */
+export const SUMMARY_MODE_MARKER = 'MODE SWITCH: PROGRESS SUMMARY';
+
+/**
+ * Maximum consecutive summary failures before the circuit breaker opens.
+ * After this many failures, SessionManager.queueSummarize will skip further
+ * summarize requests to prevent the infinite retry loop (#1633).
+ */
+export const MAX_CONSECUTIVE_SUMMARY_FAILURES = 3;
+
 export interface Observation {
  id: number;
  tool_name: string;
@@ -134,9 +148,11 @@ export function buildSummaryPrompt(session: SDKSession, mode: ModeConfig): strin
    return '';
  })();

-  return `--- MODE SWITCH: PROGRESS SUMMARY ---
-Do NOT output <observation> tags. This is a summary request, not an observation request.
-Your response MUST use <summary> tags ONLY. Any <observation> output will be discarded.
+  return `--- ${SUMMARY_MODE_MARKER} ---
+⚠️ CRITICAL TAG REQUIREMENT — READ CAREFULLY:
+• You MUST wrap your ENTIRE response in <summary>...</summary> tags.
+• Do NOT use <observation> tags. <observation> output will be DISCARDED and cause a system error.
+• The ONLY accepted root tag is <summary>. Any other root tag is a protocol violation.

 ${mode.prompts.header_summary_checkpoint}
 ${mode.prompts.summary_instruction}
@@ -154,6 +170,7 @@ ${mode.prompts.summary_format_instruction}
  <notes>${mode.prompts.xml_summary_notes_placeholder}</notes>
 </summary>

+REMINDER: Your response MUST use <summary> as the root tag, NOT <observation>.
 ${mode.prompts.summary_footer}`;
 }

@@ -46,6 +46,9 @@ export interface ActiveSession {
  // Track whether the most recent storage operation persisted a summary record.
  // Used by the status endpoint so the Stop hook can detect silent summary loss (#1633).
  lastSummaryStored?: boolean;
+  // Circuit breaker: track consecutive summary failures to prevent infinite retry loops (#1633).
+  // When this reaches MAX_CONSECUTIVE_SUMMARY_FAILURES, further summarize requests are skipped.
+  consecutiveSummaryFailures: number;
 }

 export interface PendingMessage {
@@ -16,6 +16,7 @@ import { PendingMessageStore } from '../sqlite/PendingMessageStore.js';
 import { SessionQueueProcessor } from '../queue/SessionQueueProcessor.js';
 import { getProcessBySession, ensureProcessExit } from './ProcessRegistry.js';
 import { getSupervisor } from '../../supervisor/index.js';
+import { MAX_CONSECUTIVE_SUMMARY_FAILURES } from '../../sdk/prompts.js';

 /** Idle threshold before a stuck generator (zombie subprocess) is force-killed. */
 export const MAX_GENERATOR_IDLE_MS = 5 * 60 * 1000; // 5 minutes
@@ -219,7 +220,8 @@ export class SessionManager {
      currentProvider: null,  // Will be set when generator starts
      consecutiveRestarts: 0,  // Track consecutive restart attempts to prevent infinite loops
      processingMessageIds: [],  // CLAIM-CONFIRM: Track message IDs for confirmProcessed()
-      lastGeneratorActivity: Date.now()  // Initialize for stale detection (Issue #1099)
+      lastGeneratorActivity: Date.now(),  // Initialize for stale detection (Issue #1099)
+      consecutiveSummaryFailures: 0  // Circuit breaker for summary retry loop (#1633)
    };

    logger.debug('SESSION', 'Creating new session object (memorySessionId cleared to prevent stale resume)', {
@@ -312,6 +314,18 @@ export class SessionManager {
      session = this.initializeSession(sessionDbId);
    }

+    // Circuit breaker: skip summarize if too many consecutive failures (#1633).
+    // This prevents the infinite loop where each failed summary spawns a new session
+    // with an ever-growing prompt. Counter is in-memory per ActiveSession — it resets
+    // on worker restart, which is acceptable because session state is already ephemeral.
+    if (session.consecutiveSummaryFailures >= MAX_CONSECUTIVE_SUMMARY_FAILURES) {
+      logger.warn('SESSION', `Circuit breaker OPEN: skipping summarize after ${session.consecutiveSummaryFailures} consecutive failures (#1633)`, {
+        sessionId: sessionDbId,
+        contentSessionId: session.contentSessionId
+      });
+      return;
+    }
+
    // CRITICAL: Persist to database FIRST
    const message: PendingMessage = {
      type: 'summarize',
@@ -13,6 +13,7 @@

 import { logger } from '../../../utils/logger.js';
 import { parseObservations, parseSummary, type ParsedObservation, type ParsedSummary } from '../../../sdk/parser.js';
+import { SUMMARY_MODE_MARKER, MAX_CONSECUTIVE_SUMMARY_FAILURES } from '../../../sdk/prompts.js';
 import { updateCursorContextForProject } from '../../integrations/CursorHooksInstaller.js';
 import { updateFolderClaudeMdFiles } from '../../../utils/claude-md-utils.js';
 import { getWorkerPort } from '../../../shared/worker-utils.js';
@@ -67,7 +68,17 @@ export async function processAgentResponse(

  // Parse observations and summary
  const observations = parseObservations(text, session.contentSessionId);
-  const summary = parseSummary(text, session.sessionDbId);
+
+  // Detect whether the most recent prompt was a summary request.
+  // If so, enable observation-to-summary coercion to prevent the infinite
+  // retry loop described in #1633.
+  const lastMessage = session.conversationHistory.at(-1);
+  const lastUserMessage = lastMessage?.role === 'user'
+    ? lastMessage
+    : session.conversationHistory.findLast(m => m.role === 'user') ?? null;
+  const summaryExpected = lastUserMessage?.content?.includes(SUMMARY_MODE_MARKER) ?? false;
+
+  const summary = parseSummary(text, session.sessionDbId, summaryExpected);

  if (
    text.trim() &&
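Review note: the `summaryExpected` check in this hunk amounts to "find the most recent user message and look for the marker". A minimal sketch follows, assuming a simplified `ChatMessage` shape and a hypothetical `wasSummaryExpected` helper; the real conversation history type is not shown in this diff. It uses a plain backward loop instead of `findLast` so that it compiles against older targets.

```typescript
// Assumed to match the marker constant introduced in this commit.
const SUMMARY_MODE_MARKER = 'MODE SWITCH: PROGRESS SUMMARY';

// Hypothetical minimal message shape for illustration only.
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

// True when the most recent user message carries the summary marker.
function wasSummaryExpected(history: ChatMessage[]): boolean {
  for (let i = history.length - 1; i >= 0; i--) {
    const message = history[i];
    if (message !== undefined && message.role === 'user') {
      // Only the latest user message decides; older summary prompts are ignored.
      return message.content.indexOf(SUMMARY_MODE_MARKER) !== -1;
    }
  }
  return false; // No user message at all.
}
```

Checking only the latest user message matters: an earlier summary prompt followed by a normal observation prompt should not enable coercion for the observation response.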
@@ -130,6 +141,32 @@ export async function processAgentResponse(
  // to the Stop hook for silent-summary-loss detection (#1633)
  session.lastSummaryStored = result.summaryId !== null;

+  // Circuit breaker: track consecutive summary failures (#1633).
+  // Only evaluate when a summary was actually expected (summarize message was sent).
+  // Without this guard, the counter would increment on every normal observation
+  // response, tripping the breaker after 3 observations and permanently blocking
+  // summarization — reproducing the data-loss scenario this fix is meant to prevent.
+  if (summaryExpected) {
+    const skippedIntentionally = /<skip_summary\b/.test(text);
+    if (summaryForStore !== null) {
+      // Summary was present in the response — reset the failure counter
+      session.consecutiveSummaryFailures = 0;
+    } else if (skippedIntentionally) {
+      // Explicit <skip_summary/> is a valid protocol response — neither success
+      // nor failure. Leave the counter unchanged so we don't mask a bad run that
+      // happens to end on a skip, but also don't punish intentional skips.
+    } else {
+      // Summary was expected but none was stored — count as failure
+      session.consecutiveSummaryFailures += 1;
+      if (session.consecutiveSummaryFailures >= MAX_CONSECUTIVE_SUMMARY_FAILURES) {
+        logger.error('SESSION', `Circuit breaker: ${session.consecutiveSummaryFailures} consecutive summary failures — further summarize requests will be skipped (#1633)`, {
+          sessionId: session.sessionDbId,
+          contentSessionId: session.contentSessionId
+        });
+      }
+    }
+  }
+
  // CLAIM-CONFIRM: Now that storage succeeded, confirm all processing messages (delete from queue)
  // This is the critical step that prevents message loss on generator crash
  const pendingStore = sessionManager.getPendingMessageStore();
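Review note: the counter logic above has three outcomes per summarize attempt (stored, explicitly skipped, missing), and the gate in `queueSummarize` reads the same counter. The sketch below models both under stated assumptions: `SummaryOutcome`, `BreakerState`, `recordSummaryOutcome`, and `isBreakerOpen` are hypothetical names chosen for illustration, and the threshold is assumed to equal the constant defined in this commit.

```typescript
// Assumed to match the threshold constant introduced in this commit.
const MAX_CONSECUTIVE_SUMMARY_FAILURES = 3;

// Hypothetical labels for the result of one summarize attempt.
type SummaryOutcome = 'stored' | 'skipped' | 'missing';

// Hypothetical slice of session state holding only the counter.
interface BreakerState {
  consecutiveSummaryFailures: number;
}

// Applies one summarize outcome to the counter.
function recordSummaryOutcome(state: BreakerState, outcome: SummaryOutcome): void {
  if (outcome === 'stored') {
    state.consecutiveSummaryFailures = 0; // Success resets the streak.
  } else if (outcome === 'missing') {
    state.consecutiveSummaryFailures += 1; // Real failure extends the streak.
  }
  // 'skipped' is neutral: the counter is left unchanged.
}

// True once further summarize requests should be skipped.
function isBreakerOpen(state: BreakerState): boolean {
  return state.consecutiveSummaryFailures >= MAX_CONSECUTIVE_SUMMARY_FAILURES;
}
```

Treating a skip as neutral means a streak of two failures followed by a skip still needs only one more failure to open the breaker, while a stored summary clears the streak entirely.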