8ec91e7ffa
* Initial plan * fix: break infinite summary-retry loop (#1633) Three-part fix: 1. Parser coercion: When LLM returns <observation> tags instead of <summary>, coerce observation content into summary fields (root cause fix) 2. Stronger summary prompt: Add clearer tag requirements with warnings 3. Circuit breaker: Track consecutive summary failures per session, skip further attempts after 3 failures to prevent unbounded prompt growth Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77 Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com> * refactor: extract shared constants for summary mode marker and failure threshold Addresses code review feedback: SUMMARY_MODE_MARKER and MAX_CONSECUTIVE_SUMMARY_FAILURES are now defined once in sdk/prompts.ts and imported by ResponseProcessor and SessionManager. Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77 Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com> * fix: guard summary failure counter on summaryExpected (Greptile P1) The circuit breaker counter previously incremented on any response containing <observation> or <summary> tags — which matches virtually every normal observation response. After 3 observations the breaker would open and permanently block summarization, reproducing the data-loss scenario #1633 was meant to prevent. Gate the increment block on summaryExpected (already computed for parseSummary coercion) so the counter only tracks actual summary attempts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: cover circuit-breaker + apply review polish - Use findLast / at(-1) for last-user-message lookup instead of filter + index (O(1) common case). - Drop redundant `|| 0` fallback — field is required and initialized. - Add comment noting counter is ephemeral by design. - Add ResponseProcessor tests covering: * counter NOT incrementing on normal observation responses (regression guard for the Greptile P1) * counter incrementing when a summary was expected but missing * counter resetting to 0 on successful summary storage Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: iterate all observation blocks; don't count skip_summary as failure Addresses CodeRabbit review on #2072: - coerceObservationToSummary now iterates all <observation> blocks with a global regex and returns the first block that has title, narrative, or facts. Previously, an empty leading observation would short-circuit and discard populated follow-ups. - Circuit-breaker counter now treats explicit <skip_summary/> as neutral — neither a failure nor a success — so a run that happens to end on a skip doesn't punish the session or mask a prior bad streak. Real failures (no summary, no skip) still increment. - Tests added for both cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: reference SUMMARY_MODE_MARKER constant instead of hardcoded string Addresses CodeRabbit nitpick: tests should pull the marker from the canonical source so they don't silently drift when the constant is renamed or edited. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: also coerce observations when <summary> has empty sub-tags When the LLM wraps an empty <summary></summary> around real observation content, the #1360 empty-subtag guard rejects the summary and returns null — which would lose the observation content and resurrect the #1633 retry loop. Fall back to coerceObservationToSummary in that branch too, mirroring the unmatched-<summary> path. Adds a test covering the empty-summary-wraps-observation case and a guard test for empty summary with no observation content. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com> Co-authored-by: Alex Newman <thedotmack@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
150 lines
6.3 KiB
TypeScript
150 lines
6.3 KiB
TypeScript
/**
|
|
* Tests for parseSummary (fix for #1360)
|
|
*
|
|
* Validates that false-positive summary matches (no sub-tags) are rejected
|
|
* while real summaries — even with some missing fields — are still saved.
|
|
*/
|
|
import { describe, it, expect } from 'bun:test';
|
|
import { parseSummary } from '../../src/sdk/parser.js';
|
|
|
|
describe('parseSummary', () => {
|
|
it('returns null when no <summary> tag present and coercion disabled', () => {
|
|
expect(parseSummary('<observation><title>foo</title></observation>')).toBeNull();
|
|
});
|
|
|
|
it('returns null when no <summary> or <observation> tags present', () => {
|
|
expect(parseSummary('Some plain text response without any XML tags')).toBeNull();
|
|
});
|
|
|
|
it('returns null when <summary> has no sub-tags (false positive — fix for #1360)', () => {
|
|
// This is the bug: observation response accidentally contains <summary>some text</summary>
|
|
expect(parseSummary('<observation>done <summary>some content here</summary></observation>')).toBeNull();
|
|
});
|
|
|
|
it('returns null for bare <summary> with only plain text, no sub-tags', () => {
|
|
expect(parseSummary('<summary>This session was productive.</summary>')).toBeNull();
|
|
});
|
|
|
|
it('returns summary when at least one sub-tag is present (respects maintainer note)', () => {
|
|
const text = `<summary><request>Fix the bug</request></summary>`;
|
|
const result = parseSummary(text);
|
|
expect(result).not.toBeNull();
|
|
expect(result?.request).toBe('Fix the bug');
|
|
expect(result?.investigated).toBeNull();
|
|
expect(result?.learned).toBeNull();
|
|
});
|
|
|
|
it('returns full summary when all fields are present', () => {
|
|
const text = `<summary>
|
|
<request>Fix login bug</request>
|
|
<investigated>Auth flow and JWT expiry</investigated>
|
|
<learned>Token was expiring too soon</learned>
|
|
<completed>Extended token TTL to 24h</completed>
|
|
<next_steps>Monitor error rates</next_steps>
|
|
</summary>`;
|
|
const result = parseSummary(text);
|
|
expect(result).not.toBeNull();
|
|
expect(result?.request).toBe('Fix login bug');
|
|
expect(result?.investigated).toBe('Auth flow and JWT expiry');
|
|
expect(result?.learned).toBe('Token was expiring too soon');
|
|
expect(result?.completed).toBe('Extended token TTL to 24h');
|
|
expect(result?.next_steps).toBe('Monitor error rates');
|
|
});
|
|
|
|
it('returns null when skip_summary tag is present', () => {
|
|
expect(parseSummary('<skip_summary reason="no work done"/>')).toBeNull();
|
|
});
|
|
|
|
// Observation-to-summary coercion tests (#1633)
|
|
it('coerces <observation> with content into a summary when coerceFromObservation=true (#1633)', () => {
|
|
const result = parseSummary('<observation><title>foo</title></observation>', undefined, true);
|
|
expect(result).not.toBeNull();
|
|
expect(result?.request).toBe('foo');
|
|
expect(result?.completed).toBe('foo');
|
|
});
|
|
|
|
it('coerces observation with narrative into summary with investigated field (#1633)', () => {
|
|
const text = `<observation>
|
|
<type>refactor</type>
|
|
<title>UObjectArray refactored</title>
|
|
<narrative>Removed local XXXX and migrated to new pattern</narrative>
|
|
</observation>`;
|
|
const result = parseSummary(text, undefined, true);
|
|
expect(result).not.toBeNull();
|
|
expect(result?.request).toBe('UObjectArray refactored');
|
|
expect(result?.investigated).toBe('Removed local XXXX and migrated to new pattern');
|
|
});
|
|
|
|
it('coerces observation with facts into summary with learned field (#1633)', () => {
|
|
const text = `<observation>
|
|
<type>discovery</type>
|
|
<title>JWT token handling</title>
|
|
<facts>
|
|
<fact>Tokens expire after 1 hour</fact>
|
|
<fact>Refresh flow uses rotating keys</fact>
|
|
</facts>
|
|
</observation>`;
|
|
const result = parseSummary(text, undefined, true);
|
|
expect(result).not.toBeNull();
|
|
expect(result?.request).toBe('JWT token handling');
|
|
expect(result?.learned).toBe('Tokens expire after 1 hour; Refresh flow uses rotating keys');
|
|
});
|
|
|
|
it('coerces observation with subtitle into completed field (#1633)', () => {
|
|
const text = `<observation>
|
|
<type>config</type>
|
|
<title>Database migration</title>
|
|
<subtitle>Added new index for performance</subtitle>
|
|
</observation>`;
|
|
const result = parseSummary(text, undefined, true);
|
|
expect(result).not.toBeNull();
|
|
expect(result?.completed).toBe('Database migration — Added new index for performance');
|
|
});
|
|
|
|
it('returns null for empty observation even with coercion enabled (#1633)', () => {
|
|
const text = `<observation><type>config</type></observation>`;
|
|
expect(parseSummary(text, undefined, true)).toBeNull();
|
|
});
|
|
|
|
it('prefers <summary> tags over observation coercion when both present (#1633)', () => {
|
|
const text = `<observation><title>obs title</title></observation>
|
|
<summary><request>summary request</request></summary>`;
|
|
const result = parseSummary(text, undefined, true);
|
|
expect(result).not.toBeNull();
|
|
expect(result?.request).toBe('summary request');
|
|
});
|
|
|
|
it('falls back to observation coercion when <summary> matches but has empty sub-tags (#1633)', () => {
|
|
// LLM wraps an empty summary around real observation content — without the
|
|
// fallback, the empty-subtag guard (#1360) rejects the summary and we lose
|
|
// the observation content, resurrecting the retry loop.
|
|
const text = `<summary></summary>
|
|
<observation>
|
|
<title>the real work</title>
|
|
<narrative>what actually happened</narrative>
|
|
</observation>`;
|
|
const result = parseSummary(text, undefined, true);
|
|
expect(result).not.toBeNull();
|
|
expect(result?.request).toBe('the real work');
|
|
expect(result?.investigated).toBe('what actually happened');
|
|
});
|
|
|
|
it('empty <summary> with no observation content still returns null (coercion disabled)', () => {
|
|
const text = '<summary></summary>';
|
|
expect(parseSummary(text, undefined, true)).toBeNull();
|
|
});
|
|
|
|
it('skips empty leading observation blocks and coerces from the first populated one (#1633)', () => {
|
|
const text = `<observation><type>discovery</type></observation>
|
|
<observation>
|
|
<type>bugfix</type>
|
|
<title>second block has content</title>
|
|
<narrative>fixed the crash</narrative>
|
|
</observation>`;
|
|
const result = parseSummary(text, undefined, true);
|
|
expect(result).not.toBeNull();
|
|
expect(result?.request).toBe('second block has content');
|
|
expect(result?.investigated).toBe('fixed the crash');
|
|
});
|
|
});
|