* refactor: Reduce continuation prompt token usage by 95 lines

  Removed redundant instructions from the continuation prompt that were originally added to mitigate a session continuity issue. That issue has since been resolved, making these detailed instructions unnecessary on every continuation.

  Changes:
  - Reduced continuation prompt from ~106 lines to ~11 lines (~95 line reduction)
  - Changed "User's Goal:" to "Next Prompt in Session:" (more accurate framing)
  - Removed redundant WHAT TO RECORD, WHEN TO SKIP, and OUTPUT FORMAT sections
  - Kept concise reminder: "Continue generating observations and progress summaries..."
  - Initial prompt still contains all detailed instructions

  Impact:
  - Significant token savings on every continuation prompt
  - Faster context injection with no loss of functionality
  - Instructions remain comprehensive in initial prompt

  Files modified:
  - src/sdk/prompts.ts (buildContinuationPrompt function)
  - plugin/scripts/worker-service.cjs (compiled output)

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: Enhance observation and summary prompts for clarity and token efficiency

* Enhance prompt clarity and instructions in prompts.ts

  - Added a reminder to think about instructions before starting work.
  - Simplified the continuation prompt instruction by removing "for this ongoing session."

* feat: Enhance settings.json with permissions and deny access to sensitive files

  refactor: Remove PLAN-full-observation-display.md and PR_SUMMARY.md as they are no longer needed
  chore: Delete SECURITY_SUMMARY.md since it is redundant after recent changes
  fix: Update worker-service.cjs to streamline observation generation instructions
  cleanup: Remove src-analysis.md and src-tree.md for a cleaner codebase
  refactor: Modify prompts.ts to clarify instructions for memory processing

* refactor: Remove legacy worker service implementation

* feat: Enhance summary hook to extract last assistant message and improve logging

  - Added function to extract the last assistant message from the transcript.
  - Updated summary hook to include last assistant message in the summary request.
  - Modified SDKSession interface to store last assistant message.
  - Adjusted buildSummaryPrompt to utilize last assistant message for generating summaries.
  - Updated worker service and session manager to handle last assistant message in summarize requests.
  - Introduced silentDebug utility for improved logging and diagnostics throughout the summary process.

* docs: Add comprehensive implementation plan for ROI metrics feature

  Added detailed implementation plan covering:
  - Token usage capture from Agent SDK
  - Database schema changes (migration #8)
  - Discovery cost tracking per observation
  - Context hook display with ROI metrics
  - Testing and rollout strategy

  Timeline: ~20 hours over 4 days
  Goal: Empirical data for YC application amendment

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude <noreply@anthropic.com>

* feat: Add transcript processing scripts for analysis and formatting

  - Implemented `dump-transcript-readable.ts` to generate a readable markdown dump of transcripts, excluding certain entry types.
  - Created `extract-rich-context-examples.ts` to extract and showcase rich context examples from transcripts, highlighting user requests and assistant reasoning.
  - Developed `format-transcript-context.ts` to format transcript context into a structured markdown format for improved observation generation.
  - Added `test-transcript-parser.ts` for validating data extraction from transcript JSONL files, including statistics and error reporting.
  - Introduced `transcript-to-markdown.ts` for a complete representation of transcript data in markdown format, showing all context data.
  - Enhanced type definitions in `transcript.ts` to support new features and ensure type safety.
  - Built `transcript-parser.ts` to handle parsing of transcript JSONL files, including error handling and data extraction methods.

* Refactor hooks and SDKAgent for improved observation handling

  - Updated `new-hook.ts` to clean user prompts by stripping leading slashes for better semantic clarity.
  - Enhanced `save-hook.ts` to include additional tools in the SKIP_TOOLS set, preventing unnecessary observations from certain command invocations.
  - Modified `prompts.ts` to change the structure of observation prompts, emphasizing the observational role and providing a detailed XML output format for observations.
  - Adjusted `SDKAgent.ts` to enforce stricter tool usage restrictions, ensuring the memory agent operates solely as an observer without any tool access.

* feat: Enhance session initialization to accept user prompts and prompt numbers

  - Updated `handleSessionInit` in `worker-service.ts` to extract `userPrompt` and `promptNumber` from the request body and pass them to `initializeSession`.
  - Modified `initializeSession` in `SessionManager.ts` to handle optional `currentUserPrompt` and `promptNumber` parameters.
  - Added logic to update the existing session's `userPrompt` and `lastPromptNumber` if a `currentUserPrompt` is provided.
  - Implemented debug logging for session initialization and updates to track user prompts and prompt numbers.

---------

Co-authored-by: Claude <noreply@anthropic.com>
# Claude Code Transcript Data Discovery

## Executive Summary
This document details findings from implementing a validated transcript parser for Claude Code JSONL transcripts. The parser enables extraction of rich contextual data that can optimize prompt generation and track token usage for ROI metrics.
## Transcript Structure

### File Location

```
~/.claude/projects/<encoded-project-path>/<session-id>.jsonl
```

Example:

```
~/.claude/projects/-Users-alexnewman-Scripts-claude-mem/2933cff9-f0a7-4f0b-8296-0a030e7658a6.jsonl
```
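The `<encoded-project-path>` appears to be the project's absolute path with every `/` replaced by `-` (the example above matches `/Users/alexnewman/Scripts/claude-mem`). A minimal sketch of resolving the directory under that assumption; `transcriptDir` is a hypothetical helper, not an existing API:

```typescript
import { homedir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: resolve a project's transcript directory, assuming
// the encoding simply replaces every "/" in the absolute path with "-".
function transcriptDir(projectPath: string): string {
  const encoded = projectPath.replace(/\//g, '-');
  return join(homedir(), '.claude', 'projects', encoded);
}

// transcriptDir('/Users/alexnewman/Scripts/claude-mem')
// => ~/.claude/projects/-Users-alexnewman-Scripts-claude-mem
```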
### Entry Types

Discovered six transcript entry types:

1. `file-history-snapshot` (NEW - not in the Python model)
   - Purpose: track file state snapshots
   - Frequency: ~10 entries per session
2. `user` - User messages and tool results
   - Contains actual user text messages OR tool result data
   - Can have string content or an array of ContentItems
3. `assistant` - Assistant responses and tool uses
   - Contains text responses, tool uses, and thinking blocks
   - Critical: contains usage data with token counts
4. `summary` (not yet observed in test data)
   - Session summaries
5. `system` (not yet observed in test data)
   - System messages/warnings
6. `queue-operation` (not yet observed in test data)
   - Queue tracking for message flow
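A discriminated union over these types makes the extraction sketches below concrete. This is illustrative only, not the actual contents of `src/types/transcript.ts`: the `type` values are confirmed by this document, but the field shapes are assumptions based on the findings that follow.

```typescript
// Illustrative shapes only - the real definitions live in src/types/transcript.ts.
interface UsageInfo {
  input_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
  output_tokens?: number;
}

interface ContentItem {
  type: 'text' | 'tool_use' | 'tool_result' | 'thinking';
  text?: string;   // present on text items
  name?: string;   // tool name, present on tool_use items
  input?: unknown; // tool parameters, present on tool_use items
}

type TranscriptEntry =
  | { type: 'file-history-snapshot' }
  | { type: 'user'; message?: { content: string | ContentItem[] } }
  | { type: 'assistant'; message?: { content: ContentItem[]; usage?: UsageInfo } }
  | { type: 'summary' }
  | { type: 'system' }
  | { type: 'queue-operation' };
```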
## Key Findings

### 1. Message Extraction Complexity

Problem: Naively taking the "last" entry doesn't work because:

- The last `user` entry might be a tool result, not a text message
- The last `assistant` entry might contain only tool uses, with no text

Solution: Iterate backward through the entries to find the last entry with actual text content.
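A minimal sketch of that backward scan, reusing the illustrative entry types from the sketch above (the real logic lives in `TranscriptParser.getLastUserMessage()`):

```typescript
// Sketch: walk entries from the end and return the first user entry that
// carries real text, skipping entries whose content is only tool results.
function lastUserText(entries: TranscriptEntry[]): string | undefined {
  for (let i = entries.length - 1; i >= 0; i--) {
    const entry = entries[i];
    if (entry.type !== 'user' || !entry.message) continue;
    const { content } = entry.message;
    if (typeof content === 'string' && content.trim()) return content;
    if (Array.isArray(content)) {
      const textItem = content.find((c) => c.type === 'text' && c.text);
      if (textItem?.text) return textItem.text;
    }
  }
  return undefined; // session contains no user text message
}
```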
### 2. Tool Use Tracking

Discovery: Tool uses are recorded in assistant messages, not user messages.

Data available:

```typescript
{
  name: string;       // Tool name (e.g., "Bash", "Read", "TodoWrite")
  timestamp: string;  // When the tool was used
  input: any;         // Full tool input parameters
}
```
Test session results (168 entries):

- 42 tool uses across 7 different tool types
- Most used: Bash (24x), TodoWrite (5x), Edit (4x)
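Counts like these fall out of a single pass over assistant content items. A sketch under the same illustrative types; `TranscriptParser.getToolUseHistory()` is the real entry point:

```typescript
// Sketch: tally tool_use items found in assistant messages, yielding the
// "Bash (24x), TodoWrite (5x)" style frequencies reported above.
function toolUseCounts(entries: TranscriptEntry[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    if (entry.type !== 'assistant' || !entry.message) continue;
    for (const item of entry.message.content) {
      if (item.type === 'tool_use' && item.name) {
        counts.set(item.name, (counts.get(item.name) ?? 0) + 1);
      }
    }
  }
  return counts;
}
```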
### 3. Token Usage Data (ROI Foundation)

Critical discovery: every assistant message contains complete token usage data:

```typescript
interface UsageInfo {
  input_tokens?: number;                // Total input tokens (includes context)
  cache_creation_input_tokens?: number; // Tokens used to create cache
  cache_read_input_tokens?: number;     // Cached tokens read (discounted cost)
  output_tokens?: number;               // Model output tokens
}
```
Test session token analysis:

```
Input tokens:          858
Output tokens:         44,165
Cache creation tokens: 469,650
Cache read tokens:     5,294,101  ← 5.29M tokens saved by caching!
Total tokens:          45,023     (input + output)
```
ROI implication: this validates our ROI implementation plan. We can track:

- Discovery cost = sum of all input + output tokens across the session
- Context savings = cache_read_input_tokens (tokens NOT paid for in full)
- ROI = Discovery cost / Context savings
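Put as code, the ratio is a straightforward fold over per-message `UsageInfo`. A sketch only; whether cache-creation tokens belong in the discovery cost is an open accounting choice, and `sessionRoi` is a hypothetical helper:

```typescript
// Sketch: aggregate usage across a session and compute the ratio defined
// above (discovery cost / context savings). Lower is better.
function sessionRoi(usages: UsageInfo[]): number {
  let discoveryCost = 0;
  let contextSavings = 0;
  for (const u of usages) {
    discoveryCost += (u.input_tokens ?? 0) + (u.output_tokens ?? 0);
    contextSavings += u.cache_read_input_tokens ?? 0;
  }
  if (contextSavings === 0) return Infinity; // nothing cached, no savings
  return discoveryCost / contextSavings;
}

// Test session: (858 + 44165) / 5294101 ≈ 0.0085
```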
### 4. Parse Reliability

Result: a 0.00% parse failure rate on a production transcript with 168 entries.

Conclusion: the JSONL format is stable and well-formed, so extensive defensive error handling is unnecessary. Parse failures should still be tracked rather than silently dropped, though (see the PR review section below).
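The 0.00% figure comes from counting failures instead of swallowing them. A sketch of that tracked-parse approach, assuming the illustrative `ParseStats` shape below rather than `TranscriptParser`'s actual internals:

```typescript
import { readFileSync } from 'node:fs';

interface ParseStats {
  totalLines: number;
  parsedEntries: number;
  failedLines: number;
  errors: { line: number; message: string }[];
}

// Sketch: parse JSONL line by line, recording every failure so the
// failure rate is observable instead of a silent skip.
function parseJsonl(path: string): { entries: unknown[]; stats: ParseStats } {
  const lines = readFileSync(path, 'utf8').split('\n').filter((l) => l.trim());
  const entries: unknown[] = [];
  const stats: ParseStats = { totalLines: lines.length, parsedEntries: 0, failedLines: 0, errors: [] };
  lines.forEach((line, i) => {
    try {
      entries.push(JSON.parse(line));
      stats.parsedEntries += 1;
    } catch (err) {
      stats.failedLines += 1;
      stats.errors.push({ line: i + 1, message: String(err) });
    }
  });
  return { entries, stats };
}
```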
## Implementation Files

### Created Files

1. `src/types/transcript.ts` - TypeScript types matching the Python Pydantic model
   - All entry types, content types, usage info
   - Drop-in compatible with the Python model structure
2. `src/utils/transcript-parser.ts` - Robust transcript parsing class
   - Handles all entry types
   - Smart message extraction (finds the last text message, not just the last entry)
   - Tool use history extraction
   - Token usage aggregation
   - Parse statistics and error tracking
3. `scripts/test-transcript-parser.ts` - Validation script
   - Tests all extraction methods
   - Reports parse statistics
   - Shows token usage breakdown
   - Lists tool use history
### Usage Example

```typescript
import { TranscriptParser } from '../src/utils/transcript-parser.js';

const parser = new TranscriptParser('/path/to/transcript.jsonl');

// Extract messages
const lastUserMsg = parser.getLastUserMessage();
const lastAssistantMsg = parser.getLastAssistantMessage();

// Get tool history
const tools = parser.getToolUseHistory();
// => [{name: 'Bash', timestamp: '...', input: {...}}, ...]

// Get token usage
const tokens = parser.getTotalTokenUsage();
// => {inputTokens: 858, outputTokens: 44165, cacheReadTokens: 5294101, ...}

// Parse statistics
const stats = parser.getParseStats();
// => {totalLines: 168, parsedEntries: 168, failedLines: 0, ...}
```
## Next Steps for PR Review

### Addressing the "Drops Unknown Lines" Concern

Original issue: the summary hook silently skipped malformed lines with no visibility.

Root cause: we didn't understand the full transcript model; "skip malformed lines" was a band-aid.

Solution: replace the ad-hoc parsing in summary-hook.ts with the validated TranscriptParser class.

Before (summary-hook.ts:38-117):
```typescript
// Manually parsing with try/catch, no type safety
for (let i = lines.length - 1; i >= 0; i--) {
  try {
    const line = JSON.parse(lines[i]);
    if (line.type === 'user' && line.message?.content) {
      // ... extraction logic
    }
  } catch (parseError) {
    // Skip malformed lines ← BLACK HOLE
    continue;
  }
}
```
After (using TranscriptParser):

```typescript
import { TranscriptParser } from '../utils/transcript-parser.js';

const parser = new TranscriptParser(transcriptPath);
const lastUserMessage = parser.getLastUserMessage();
const lastAssistantMessage = parser.getLastAssistantMessage();
// Parse errors are tracked in parser.getParseErrors()
```
Benefits:

- ✅ Type-safe extraction based on validated model
- ✅ No silent failures - parse errors are tracked
- ✅ Smart extraction (finds last TEXT message, not last entry)
- ✅ Reusable across all hooks and scripts
- ✅ Enables token usage tracking (ROI metrics)
- ✅ Enables tool use tracking (prompt optimization)
### Prompt Optimization Opportunities

With rich transcript data available, we can enhance prompts with:

1. Tool Use Patterns (see the sketch after this list)
   - "In this session you've used: Bash (24x), TodoWrite (5x), Edit (4x)"
   - Helps Claude understand what kind of work is being done
2. Token Economics Awareness
   - "Cache read tokens: 5.29M (context savings)"
   - Reinforces the value of the memory system
3. Session Flow Understanding
   - Number of user/assistant exchanges
   - Tools used per exchange
   - Session complexity metrics
4. File History Snapshots
   - Track which files were modified during the session
   - Provide file change context to summaries
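For the first item, the tool-use line can be rendered directly from the frequency map built earlier. A sketch; `formatToolLine` is a hypothetical helper, not an existing API:

```typescript
// Sketch: render toolUseCounts() output as the one-line session summary
// proposed above, most-used tools first.
function formatToolLine(counts: Map<string, number>): string {
  const parts = [...counts.entries()]
    .sort(([, a], [, b]) => b - a)
    .map(([name, n]) => `${name} (${n}x)`);
  return `In this session you've used: ${parts.join(', ')}`;
}

// => "In this session you've used: Bash (24x), TodoWrite (5x), Edit (4x)"
```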
## Testing

Run the validation script:

```bash
# Find your current session transcript
ls -lt ~/.claude/projects/-Users-alexnewman-Scripts-claude-mem/*.jsonl | head -1

# Test the parser
npx tsx scripts/test-transcript-parser.ts <path-to-transcript.jsonl>
```
## Conclusion

The transcript parser implementation:

- ✅ Addresses PR review concern about dropped lines
- ✅ Validates the ROI metrics implementation plan
- ✅ Enables prompt optimization with rich context
- ✅ Provides foundation for future enhancements

Recommendation: Replace ad-hoc transcript parsing in hooks with the TranscriptParser class for improved reliability and feature richness.