* Initial plan

* fix: break infinite summary-retry loop (#1633)

Three-part fix:
1. Parser coercion: When LLM returns <observation> tags instead of <summary>, coerce observation content into summary fields (root cause fix)
2. Stronger summary prompt: Add clearer tag requirements with warnings
3. Circuit breaker: Track consecutive summary failures per session, skip further attempts after 3 failures to prevent unbounded prompt growth

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* refactor: extract shared constants for summary mode marker and failure threshold

Addresses code review feedback: SUMMARY_MODE_MARKER and MAX_CONSECUTIVE_SUMMARY_FAILURES are now defined once in sdk/prompts.ts and imported by ResponseProcessor and SessionManager.

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* fix: guard summary failure counter on summaryExpected (Greptile P1)

The circuit breaker counter previously incremented on any response containing <observation> or <summary> tags, which matches virtually every normal observation response. After 3 observations the breaker would open and permanently block summarization, reproducing the data-loss scenario #1633 was meant to prevent.

Gate the increment block on summaryExpected (already computed for parseSummary coercion) so the counter only tracks actual summary attempts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: cover circuit-breaker + apply review polish

- Use findLast / at(-1) for last-user-message lookup instead of filter + index (O(1) common case).
- Drop redundant `|| 0` fallback; the field is required and initialized.
- Add comment noting counter is ephemeral by design.
- Add ResponseProcessor tests covering:
  * counter NOT incrementing on normal observation responses (regression guard for the Greptile P1)
  * counter incrementing when a summary was expected but missing
  * counter resetting to 0 on successful summary storage

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: iterate all observation blocks; don't count skip_summary as failure

Addresses CodeRabbit review on #2072:
- coerceObservationToSummary now iterates all <observation> blocks with a global regex and returns the first block that has title, narrative, or facts. Previously, an empty leading observation would short-circuit and discard populated follow-ups.
- Circuit-breaker counter now treats explicit <skip_summary/> as neutral (neither a failure nor a success), so a run that happens to end on a skip doesn't punish the session or mask a prior bad streak. Real failures (no summary, no skip) still increment.
- Tests added for both cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: reference SUMMARY_MODE_MARKER constant instead of hardcoded string

Addresses CodeRabbit nitpick: tests should pull the marker from the canonical source so they don't silently drift when the constant is renamed or edited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: also coerce observations when <summary> has empty sub-tags

When the LLM wraps an empty <summary></summary> around real observation content, the #1360 empty-subtag guard rejects the summary and returns null, which would lose the observation content and resurrect the #1633 retry loop. Fall back to coerceObservationToSummary in that branch too, mirroring the unmatched-<summary> path.

Adds a test covering the empty-summary-wraps-observation case and a guard test for empty summary with no observation content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>
Co-authored-by: Alex Newman <thedotmack@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
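This commit describes a parser-coercion fallback. It scans every `<observation>` block, takes the first one that has a title, narrative, or facts, and uses it when `<summary>` is either missing or contains only empty sub-tags. That flow can be sketched roughly as shown below. The sketch is a minimal illustration under stated assumptions and is not the project's `parseSummary` or `coerceObservationToSummary`. The following pieces are all hypothetical: the `ParsedSummary` type, the helper names `extractTag` and `parseSummarySketch`, and the observation-to-summary field mapping (title to `request`, facts to `investigated`, narrative to `learned`).

```typescript
// Hypothetical summary shape; keys mirror the <summary> sub-tags used in the tests.
interface ParsedSummary {
  request: string | null;
  investigated: string | null;
  learned: string | null;
  completed: string | null;
  next_steps: string | null;
}

const SUMMARY_FIELDS = ["request", "investigated", "learned", "completed", "next_steps"] as const;

// Trimmed text inside the first <tag>...</tag> in `block`, or null if absent or empty.
function extractTag(block: string, tag: string): string | null {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(block);
  if (!match) return null;
  const text = (match[1] ?? "").trim();
  return text.length > 0 ? text : null;
}

// Scan every <observation> block and coerce the first populated one.
// The field mapping below is an assumption made for illustration.
function coerceObservationToSummary(response: string): ParsedSummary | null {
  // Global regex created per call, so lastIndex starts at 0 and exec() walks all blocks.
  const observationRe = /<observation>([\s\S]*?)<\/observation>/g;
  let match: RegExpExecArray | null;
  while ((match = observationRe.exec(response)) !== null) {
    const body = match[1] ?? "";
    const title = extractTag(body, "title");
    const narrative = extractTag(body, "narrative");
    const facts = extractTag(body, "facts");
    if (title || narrative || facts) {
      return { request: title, investigated: facts, learned: narrative, completed: null, next_steps: null };
    }
    // Empty block: keep scanning instead of giving up.
  }
  return null;
}

function parseSummarySketch(response: string): ParsedSummary | null {
  const summaryMatch = /<summary>([\s\S]*?)<\/summary>/.exec(response);
  if (!summaryMatch) {
    // No <summary> at all: the LLM may have answered with <observation> tags instead.
    return coerceObservationToSummary(response);
  }
  const body = summaryMatch[1] ?? "";
  const summary: ParsedSummary = {
    request: extractTag(body, "request"),
    investigated: extractTag(body, "investigated"),
    learned: extractTag(body, "learned"),
    completed: extractTag(body, "completed"),
    next_steps: extractTag(body, "next_steps"),
  };
  const hasContent = SUMMARY_FIELDS.some((field) => summary[field] !== null);
  // Empty sub-tags are rejected, but any observation content is still salvaged.
  return hasContent ? summary : coerceObservationToSummary(response);
}
```

The observation regex carries the `g` flag, so `exec` advances through every block in order. An empty leading `<observation>` is therefore skipped instead of ending the search. The empty-summary branch and the missing-summary branch both fall back to the same coercion.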
@@ -31,6 +31,7 @@ mock.module('../../../src/services/domain/ModeManager.js', () => ({

// Import after mocks
import { processAgentResponse } from '../../../src/services/worker/agents/ResponseProcessor.js';
import { SUMMARY_MODE_MARKER } from '../../../src/sdk/prompts.js';
import type { WorkerRef, StorageResult } from '../../../src/services/worker/agents/types.js';
import type { ActiveSession } from '../../../src/services/worker-types.js';
import type { DatabaseManager } from '../../../src/services/worker/DatabaseManager.js';
@@ -130,8 +131,9 @@ describe('ResponseProcessor', () => {
conversationHistory: [],
currentProvider: 'claude',
processingMessageIds: [], // CLAIM-CONFIRM pattern: track message IDs being processed
consecutiveSummaryFailures: 0,
...overrides,
};
} as ActiveSession;
}

describe('parsing observations from XML response', () => {
@@ -726,4 +728,103 @@ describe('ResponseProcessor', () => {
      expect(session.lastSummaryStored).toBe(false);
    });
  });

  describe('circuit breaker: consecutiveSummaryFailures counter (#1633)', () => {
    const SUMMARY_PROMPT = `--- ${SUMMARY_MODE_MARKER} ---\nDo the summary now.`;

    it('does NOT increment the counter on normal observation responses (P1 regression guard)', async () => {
      // Session where the last user message is an OBSERVATION request, not a summary request.
      // The counter must stay at 0 even though the response has <observation> tags and no summary.
      mockStoreObservations.mockImplementation(() => ({
        observationIds: [1],
        summaryId: null,
        createdAtEpoch: 1700000000000,
      } as StorageResult));

      const session = createMockSession({
        conversationHistory: [{ role: 'user', content: 'record a new observation' }],
      });
      const obsResponse = `
        <observation>
          <type>discovery</type>
          <title>found a thing</title>
          <narrative>it happened</narrative>
          <facts></facts>
          <concepts></concepts>
          <files_read></files_read>
          <files_modified></files_modified>
        </observation>
      `;

      // Drive multiple observation responses: the counter must never increment.
      for (let i = 0; i < 5; i++) {
        await processAgentResponse(obsResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');
      }

      expect(session.consecutiveSummaryFailures).toBe(0);
    });

    it('increments the counter when a summary was expected but none was stored', async () => {
      mockStoreObservations.mockImplementation(() => ({
        observationIds: [],
        summaryId: null,
        createdAtEpoch: 1700000000000,
      } as StorageResult));

      const session = createMockSession({
        conversationHistory: [{ role: 'user', content: SUMMARY_PROMPT }],
      });
      // LLM returned nothing structured, so no summary is stored
      const badResponse = 'I cannot comply with that request.';

      await processAgentResponse(badResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');

      expect(session.consecutiveSummaryFailures).toBe(1);
    });

    it('does NOT increment the counter on intentional <skip_summary/> responses', async () => {
      mockStoreObservations.mockImplementation(() => ({
        observationIds: [],
        summaryId: null,
        createdAtEpoch: 1700000000000,
      } as StorageResult));

      const session = createMockSession({
        consecutiveSummaryFailures: 1,
        conversationHistory: [{ role: 'user', content: SUMMARY_PROMPT }],
      });
      const skipResponse = '<skip_summary reason="no meaningful work this session"/>';

      await processAgentResponse(skipResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');

      // Skip is neutral: the counter stays where it was, with no spurious increment
      expect(session.consecutiveSummaryFailures).toBe(1);
    });

    it('resets the counter to 0 when a summary is successfully stored', async () => {
      mockStoreObservations.mockImplementation(() => ({
        observationIds: [],
        summaryId: 42,
        createdAtEpoch: 1700000000000,
      } as StorageResult));

      const session = createMockSession({
        consecutiveSummaryFailures: 2,
        conversationHistory: [{ role: 'user', content: SUMMARY_PROMPT }],
      });
      const goodResponse = `
        <summary>
          <request>wrap it up</request>
          <investigated>the thing</investigated>
          <learned>the answer</learned>
          <completed>the work</completed>
          <next_steps>none</next_steps>
        </summary>
      `;

      await processAgentResponse(goodResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');

      expect(session.consecutiveSummaryFailures).toBe(0);
    });
  });
});
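The four circuit-breaker tests in this diff pin down how the counter should move:
- it stays untouched when no summary was requested;
- it goes up by 1 when a summary is missing;
- it is unchanged on `<skip_summary/>`;
- it resets to 0 when a summary is stored.

A minimal sketch of that rule follows. It assumes a hypothetical `SummaryOutcome` type and a trimmed-down `SessionState` standing in for `ActiveSession`, and it uses a hypothetical `shouldAttemptSummary` gate. The `SUMMARY_MODE_MARKER` string here is a placeholder, since the real value lives in `sdk/prompts.ts`. The threshold of 3 comes from the commit message.

```typescript
// Placeholder: the real marker string is exported from sdk/prompts.ts.
const SUMMARY_MODE_MARKER = "SUMMARY MODE";
// Threshold from the commit message: stop retrying after 3 consecutive failures.
const MAX_CONSECUTIVE_SUMMARY_FAILURES = 3;

interface Message {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical subset of ActiveSession used by the breaker.
interface SessionState {
  conversationHistory: Message[];
  consecutiveSummaryFailures: number; // kept in memory only (ephemeral by design)
}

// Hypothetical result of processing one agent response.
type SummaryOutcome = "stored" | "skipped" | "missing";

// Same result as history.findLast((m) => m.role === "user"), written as a
// reverse loop so the sketch does not require the ES2023 lib.
function lastUserMessage(history: Message[]): Message | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    const message = history[i];
    if (message && message.role === "user") return message;
  }
  return undefined;
}

// A summary counts as "expected" only if the latest user turn was a summary prompt.
function isSummaryExpected(session: SessionState): boolean {
  const last = lastUserMessage(session.conversationHistory);
  return last !== undefined && last.content.includes(SUMMARY_MODE_MARKER);
}

function updateSummaryFailureCounter(session: SessionState, outcome: SummaryOutcome): void {
  // Observation responses never touch the counter (the Greptile P1 guard).
  if (!isSummaryExpected(session)) return;
  if (outcome === "stored") {
    session.consecutiveSummaryFailures = 0;
  } else if (outcome === "missing") {
    session.consecutiveSummaryFailures += 1;
  }
  // "skipped" (<skip_summary/>) is neutral: neither a failure nor a success.
}

// Hypothetical gate: once the breaker opens, further summary attempts are skipped.
function shouldAttemptSummary(session: SessionState): boolean {
  return session.consecutiveSummaryFailures < MAX_CONSECUTIVE_SUMMARY_FAILURES;
}
```

The sketch checks `isSummaryExpected` before touching the counter. That check keeps ordinary observation traffic from tripping the breaker, which was the regression the Greptile P1 fix addressed.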