Files
claude-mem/src/sdk/parser.ts
T
Alex Newman 68290a9121 Performance improvements: Token reduction and enhanced summaries (#101)
* refactor: Reduce continuation prompt token usage by 95 lines

Removed redundant instructions from continuation prompt that were originally
added to mitigate a session continuity issue. That issue has since been
resolved, making these detailed instructions unnecessary on every continuation.

Changes:
- Reduced continuation prompt from ~106 lines to ~11 lines (~95 line reduction)
- Changed "User's Goal:" to "Next Prompt in Session:" (more accurate framing)
- Removed redundant WHAT TO RECORD, WHEN TO SKIP, and OUTPUT FORMAT sections
- Kept concise reminder: "Continue generating observations and progress summaries..."
- Initial prompt still contains all detailed instructions

Impact:
- Significant token savings on every continuation prompt
- Faster context injection with no loss of functionality
- Instructions remain comprehensive in initial prompt

Files modified:
- src/sdk/prompts.ts (buildContinuationPrompt function)
- plugin/scripts/worker-service.cjs (compiled output)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: Enhance observation and summary prompts for clarity and token efficiency

* Enhance prompt clarity and instructions in prompts.ts

- Added a reminder to think about instructions before starting work.
- Simplified the continuation prompt instruction by removing "for this ongoing session."

* feat: Enhance settings.json with permissions and deny access to sensitive files

refactor: Remove PLAN-full-observation-display.md and PR_SUMMARY.md as they are no longer needed

chore: Delete SECURITY_SUMMARY.md since it is redundant after recent changes

fix: Update worker-service.cjs to streamline observation generation instructions

cleanup: Remove src-analysis.md and src-tree.md for a cleaner codebase

refactor: Modify prompts.ts to clarify instructions for memory processing

* refactor: Remove legacy worker service implementation

* feat: Enhance summary hook to extract last assistant message and improve logging

- Added function to extract the last assistant message from the transcript.
- Updated summary hook to include last assistant message in the summary request.
- Modified SDKSession interface to store last assistant message.
- Adjusted buildSummaryPrompt to utilize last assistant message for generating summaries.
- Updated worker service and session manager to handle last assistant message in summarize requests.
- Introduced silentDebug utility for improved logging and diagnostics throughout the summary process.

* docs: Add comprehensive implementation plan for ROI metrics feature

Added detailed implementation plan covering:
- Token usage capture from Agent SDK
- Database schema changes (migration #8)
- Discovery cost tracking per observation
- Context hook display with ROI metrics
- Testing and rollout strategy

Timeline: ~20 hours over 4 days
Goal: Empirical data for YC application amendment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: Add transcript processing scripts for analysis and formatting

- Implemented `dump-transcript-readable.ts` to generate a readable markdown dump of transcripts, excluding certain entry types.
- Created `extract-rich-context-examples.ts` to extract and showcase rich context examples from transcripts, highlighting user requests and assistant reasoning.
- Developed `format-transcript-context.ts` to format transcript context into a structured markdown format for improved observation generation.
- Added `test-transcript-parser.ts` for validating data extraction from transcript JSONL files, including statistics and error reporting.
- Introduced `transcript-to-markdown.ts` for a complete representation of transcript data in markdown format, showing all context data.
- Enhanced type definitions in `transcript.ts` to support new features and ensure type safety.
- Built `transcript-parser.ts` to handle parsing of transcript JSONL files, including error handling and data extraction methods.

* Refactor hooks and SDKAgent for improved observation handling

- Updated `new-hook.ts` to clean user prompts by stripping leading slashes for better semantic clarity.
- Enhanced `save-hook.ts` to include additional tools in the SKIP_TOOLS set, preventing unnecessary observations from certain command invocations.
- Modified `prompts.ts` to change the structure of observation prompts, emphasizing the observational role and providing a detailed XML output format for observations.
- Adjusted `SDKAgent.ts` to enforce stricter tool usage restrictions, ensuring the memory agent operates solely as an observer without any tool access.

* feat: Enhance session initialization to accept user prompts and prompt numbers

- Updated `handleSessionInit` in `worker-service.ts` to extract `userPrompt` and `promptNumber` from the request body and pass them to `initializeSession`.
- Modified `initializeSession` in `SessionManager.ts` to handle optional `currentUserPrompt` and `promptNumber` parameters.
- Added logic to update the existing session's `userPrompt` and `lastPromptNumber` if a `currentUserPrompt` is provided.
- Implemented debug logging for session initialization and updates to track user prompts and prompt numbers.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-13 18:22:44 -05:00

197 lines
5.8 KiB
TypeScript

/**
* XML Parser Module
* Parses observation and summary XML blocks from SDK responses
*/
import { logger } from '../utils/logger.js';
export interface ParsedObservation {
type: string;
title: string | null;
subtitle: string | null;
facts: string[];
narrative: string | null;
concepts: string[];
files_read: string[];
files_modified: string[];
}
export interface ParsedSummary {
request: string | null;
investigated: string | null;
learned: string | null;
completed: string | null;
next_steps: string | null;
notes: string | null;
}
/**
* Parse observation XML blocks from SDK response
* Returns all observations found in the response
*/
export function parseObservations(text: string, correlationId?: string): ParsedObservation[] {
const observations: ParsedObservation[] = [];
// Match <observation>...</observation> blocks (non-greedy)
const observationRegex = /<observation>([\s\S]*?)<\/observation>/g;
let match;
while ((match = observationRegex.exec(text)) !== null) {
const obsContent = match[1];
// Extract all fields
const type = extractField(obsContent, 'type');
const title = extractField(obsContent, 'title');
const subtitle = extractField(obsContent, 'subtitle');
const narrative = extractField(obsContent, 'narrative');
const facts = extractArrayElements(obsContent, 'facts', 'fact');
const concepts = extractArrayElements(obsContent, 'concepts', 'concept');
const files_read = extractArrayElements(obsContent, 'files_read', 'file');
const files_modified = extractArrayElements(obsContent, 'files_modified', 'file');
// NOTE FROM THEDOTMACK: ALWAYS save observations - never skip. 10/24/2025
// All fields except type are nullable in schema
// If type is missing or invalid, use "change" as catch-all fallback
// Determine final type
let finalType = 'change'; // Default catch-all
if (type) {
const validTypes = ['bugfix', 'feature', 'refactor', 'change', 'discovery', 'decision'];
if (validTypes.includes(type.trim())) {
finalType = type.trim();
} else {
logger.warn('PARSER', `Invalid observation type: ${type}, using "change"`, { correlationId });
}
} else {
logger.warn('PARSER', 'Observation missing type field, using "change"', { correlationId });
}
// All other fields are optional - save whatever we have
// Filter out type from concepts array (types and concepts are separate dimensions)
const cleanedConcepts = concepts.filter(c => c !== finalType);
if (cleanedConcepts.length !== concepts.length) {
logger.warn('PARSER', 'Removed observation type from concepts array', {
correlationId,
type: finalType,
originalConcepts: concepts,
cleanedConcepts
});
}
observations.push({
type: finalType,
title,
subtitle,
facts,
narrative,
concepts: cleanedConcepts,
files_read,
files_modified
});
}
return observations;
}
/**
* Parse summary XML block from SDK response
* Returns null if no valid summary found or if summary was skipped
*/
export function parseSummary(text: string, sessionId?: number): ParsedSummary | null {
// Check for skip_summary first
const skipRegex = /<skip_summary\s+reason="([^"]+)"\s*\/>/;
const skipMatch = skipRegex.exec(text);
if (skipMatch) {
logger.info('PARSER', 'Summary skipped', {
sessionId,
reason: skipMatch[1]
});
return null;
}
// Match <summary>...</summary> block (non-greedy)
const summaryRegex = /<summary>([\s\S]*?)<\/summary>/;
const summaryMatch = summaryRegex.exec(text);
if (!summaryMatch) {
return null;
}
const summaryContent = summaryMatch[1];
// Extract fields
const request = extractField(summaryContent, 'request');
const investigated = extractField(summaryContent, 'investigated');
const learned = extractField(summaryContent, 'learned');
const completed = extractField(summaryContent, 'completed');
const next_steps = extractField(summaryContent, 'next_steps');
const notes = extractField(summaryContent, 'notes'); // Optional
// NOTE FROM THEDOTMACK: 100% of the time we must SAVE the summary, even if fields are missing. 10/24/2025
// NEVER DO THIS NONSENSE AGAIN.
// Validate required fields are present (notes is optional)
// if (!request || !investigated || !learned || !completed || !next_steps) {
// logger.warn('PARSER', 'Summary missing required fields', {
// sessionId,
// hasRequest: !!request,
// hasInvestigated: !!investigated,
// hasLearned: !!learned,
// hasCompleted: !!completed,
// hasNextSteps: !!next_steps
// });
// return null;
// }
return {
request,
investigated,
learned,
completed,
next_steps,
notes
};
}
/**
* Extract a simple field value from XML content
* Returns null for missing or empty/whitespace-only fields
*/
function extractField(content: string, fieldName: string): string | null {
const regex = new RegExp(`<${fieldName}>([^<]*)</${fieldName}>`);
const match = regex.exec(content);
if (!match) return null;
const trimmed = match[1].trim();
return trimmed === '' ? null : trimmed;
}
/**
* Extract array of elements from XML content
*/
function extractArrayElements(content: string, arrayName: string, elementName: string): string[] {
const elements: string[] = [];
// Match the array block
const arrayRegex = new RegExp(`<${arrayName}>(.*?)</${arrayName}>`, 's');
const arrayMatch = arrayRegex.exec(content);
if (!arrayMatch) {
return elements;
}
const arrayContent = arrayMatch[1];
// Extract individual elements
const elementRegex = new RegExp(`<${elementName}>([^<]+)</${elementName}>`, 'g');
let elementMatch;
while ((elementMatch = elementRegex.exec(arrayContent)) !== null) {
elements.push(elementMatch[1].trim());
}
return elements;
}