Alex Newman 68290a9121 Performance improvements: Token reduction and enhanced summaries (#101)

2025-11-13 18:22:44 -05:00

Transcript Data Analysis: Available Context for Memory Worker

Generated: 2025-11-13

Purpose: Document what contextual data exists in Claude Code transcripts and identify opportunities to improve memory worker observation generation.

Executive Summary

Current State: The memory worker receives isolated tool executions via save-hook.ts:

- Tool name
- Tool input (parameters)
- Tool output (results)

Available in Transcripts: Rich contextual data that could dramatically improve observation quality:

- User's original request/intent
- Assistant's reasoning (thinking blocks)
- Full conversation context
- Tool result data
- Token usage and performance metrics
- Session metadata (timestamps, UUIDs, CWD)

Recommendation: Enhance the memory worker to receive full conversation context for each tool execution, not just isolated tool data.

Transcript Structure

Entry Types

The transcript file (~/.claude/projects/-{project}/{session-id}.jsonl) contains four entry types:

- summary entries (149 in sample)
- file-history-snapshot entries (18 in sample)
- user entries (86 in sample)
- assistant entries (155 in sample)
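To reproduce counts like these for another transcript, a minimal sketch is shown below. It assumes one JSON object per line with a top-level `type` field, as described above; `countEntryTypes` is a hypothetical helper, not part of the existing codebase.

```typescript
// Hypothetical helper: tallies transcript entries by their `type` field.
// Assumes JSONL input (one JSON object per line). Malformed lines are
// counted rather than thrown, so one bad line does not abort the scan.
function countEntryTypes(jsonl: string): { counts: Record<string, number>; malformed: number } {
  const counts: Record<string, number> = {};
  let malformed = 0;

  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines

    try {
      const entry = JSON.parse(line) as { type?: unknown };
      const type = typeof entry.type === 'string' ? entry.type : 'unknown';
      counts[type] = (counts[type] ?? 0) + 1;
    } catch {
      malformed += 1;
    }
  }

  return { counts, malformed };
}
```

Reading the file contents (for example with `fs.readFileSync(path, 'utf8')`) is left to the caller so the function stays easy to test.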

Conversation Turn Pattern

Each conversation turn consists of:

1. User Entry - User's request
2. Assistant Entry - Assistant's response
3. User Entry - Tool results submitted back (automatic)
4. Assistant Entry - Assistant processes results and continues

This creates a pattern: User → Assistant → User (tool results) → Assistant (continues) → ...

Available Data by Entry Type

1. User Entries

Current Save-Hook Access:

- Tool name
- Tool input
- Tool output

Additional Data Available in User Entries:

interface UserTranscriptEntry {
  type: 'user';
  timestamp: string;           // ISO timestamp
  uuid: string;                // Unique entry ID
  sessionId: string;           // Session identifier
  cwd: string;                 // Working directory
  parentUuid?: string;         // Parent entry reference
  isSidechain: boolean;        // Is this a side conversation?
  userType: string;            // 'human' or 'system'
  version: string;             // Claude Code version

  message: {
    role: 'user';
    content: string | ContentItem[];  // Can be text or structured
  };

  toolUseResult?: ToolUseResult;  // Legacy field, may contain results
}

When content is an array, it contains:

- Text blocks with user's actual request
- Tool result blocks with complete output data

Example Structure:

{
  "type": "user",
  "timestamp": "2025-11-13T17:10:31.963Z",
  "uuid": "364676a7-51c3-4036-afc3-7ff8f7301a8f",
  "sessionId": "57dcc12f-4751-46bb-82b4-2aa96a3e226d",
  "cwd": "/Users/alexnewman/Scripts/claude-mem",
  "message": {
    "role": "user",
    "content": [
      {
        "type": "tool_result",
        "tool_use_id": "toolu_01T477WUra1sDR6gHaqZHhKT",
        "content": "[actual tool output data]"
      }
    ]
  }
}
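Because `content` can be either a string or an array of blocks, any consumer has to handle both shapes. The sketch below shows one way to do that. The `ContentItem` shape is an assumption reduced to the fields used here, and `isToolResultOnly` is a hypothetical helper; `extractTextFromContent` is one possible body for the helper of that name used later in this document.

```typescript
// Assumed minimal shape of a content block; real entries carry more fields.
interface ContentItem {
  type: string;
  text?: string;
  tool_use_id?: string;
  content?: unknown;
}

// Returns only the human-readable text, ignoring tool_result blocks.
function extractTextFromContent(content: string | ContentItem[]): string {
  if (typeof content === 'string') return content;
  return content
    .filter((item) => item.type === 'text' && typeof item.text === 'string')
    .map((item) => item.text as string)
    .join('\n');
}

// True for the automatic user entries that only carry tool results.
function isToolResultOnly(content: string | ContentItem[]): boolean {
  return (
    Array.isArray(content) &&
    content.length > 0 &&
    content.every((item) => item.type === 'tool_result')
  );
}
```

Separating the two checks lets a caller skip automatic tool-result entries when looking for the user's actual request.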

2. Assistant Entries

Current Save-Hook Access:

Nothing from assistant entries (the hook is passed only the tool call and its result)

Available Data in Assistant Entries:

interface AssistantTranscriptEntry {
  type: 'assistant';
  timestamp: string;
  uuid: string;
  sessionId: string;
  cwd: string;
  parentUuid?: string;
  isSidechain: boolean;
  userType: string;
  version: string;
  requestId?: string;  // API request ID

  message: {
    id: string;
    type: 'message';
    role: 'assistant';
    model: string;              // e.g., "claude-sonnet-4-5-20250929"
    content: ContentItem[];     // Array of content blocks
    stop_reason?: string;       // 'tool_use' | 'end_turn' | etc.
    stop_sequence?: string;
    usage?: UsageInfo;          // Token usage stats
  };
}

Content Block Types in message.content:

Thinking Blocks - Internal reasoning before acting

{
  type: 'thinking';
  thinking: string;  // Full reasoning text
  signature?: string;
}

Text Blocks - Assistant's visible response

{
  type: 'text';
  text: string;  // Response text
}

Tool Use Blocks - Tool invocations

{
  type: 'tool_use';
  id: string;              // Tool use ID
  name: string;            // Tool name (e.g., 'Read', 'Edit')
  input: Record<string, any>;  // Complete tool parameters
}

Token Usage Data:

interface UsageInfo {
  input_tokens?: number;
  output_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
  service_tier?: string;
}

3. Summary Entries

interface SummaryTranscriptEntry {
  type: 'summary';
  summary: string;     // Generated summary text
  leafUuid: string;    // UUID of summarized entry
  cwd?: string;
}

These appear frequently (149 in sample) and provide high-level summaries of work done.

Data Flow: Current vs Potential

Current Flow (Save-Hook Only)

User: "Fix the bug in login.ts"
  ↓
Assistant: [uses Edit tool]
  ↓
Tool Execution: Edit(file_path: "login.ts", old_string: "...", new_string: "...")
  ↓
Save-Hook receives:
  - toolName: "Edit"
  - toolInput: { file_path: "login.ts", old_string: "...", new_string: "..." }
  - toolOutput: { success: true }
  ↓
Memory Worker generates observation from ONLY tool data
  - No user intent
  - No assistant reasoning
  - No context about WHY this change was made

Enhanced Flow (With Transcript Context)

User: "Fix the authentication bug - users getting logged out randomly"
  ↓
Assistant (thinking): "This sounds like a token expiration issue.
                       Let me check the JWT handling in login.ts..."
  ↓
Assistant (uses Edit tool)
  ↓
Save-Hook receives:
  - toolName: "Edit"
  - toolInput: { file_path: "login.ts", ... }
  - toolOutput: { success: true }
  - PLUS:
    - userRequest: "Fix the authentication bug - users getting logged out randomly"
    - assistantReasoning: "This sounds like a token expiration issue..."
    - conversationContext: Previous 2-3 turns
    - sessionMetadata: { cwd, timestamp, sessionId }
  ↓
Memory Worker generates richer observation:
  - "Fixed authentication bug causing random logouts"
  - "Problem: JWT tokens expiring too quickly"
  - "Solution: Updated token expiration to 24h in login.ts"
  - "Files: src/auth/login.ts"
  - "Concepts: authentication, token-management, bugfix"

Specific Opportunities

1. User Intent Extraction

Problem: Current observations lack user intent.

Solution: Parse the most recent user text entry before the tool execution.

Implementation:

1. Walk backward from tool execution entry
2. Find first user entry with text content
3. Extract text blocks (filter out tool_result blocks)

Example:

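// In save-hook.ts
const userEntries = parser.getUserEntries();
const recentUserMessage = findUserMessageBeforeTool(userEntries, toolExecutionTimestamp);
// No earlier text message may exist (e.g., at session start), so fall back to an empty intent
const userIntent = recentUserMessage ? extractTextFromContent(recentUserMessage.content) : '';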
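The backward walk described in the steps above could look roughly like this. This is a sketch under assumptions: entries are already in chronological order, timestamps are ISO strings, and the entry shape is reduced to the fields the search needs. The body of `findUserMessageBeforeTool` is hypothetical.

```typescript
// Assumed minimal shapes; real transcript entries carry more fields.
interface ContentItem {
  type: string;
  text?: string;
  tool_use_id?: string;
}

interface UserEntry {
  timestamp: string; // ISO timestamp
  message: { content: string | ContentItem[] };
}

// True when the entry contains text typed by the user, not just tool results.
function hasUserText(content: string | ContentItem[]): boolean {
  if (typeof content === 'string') return content.trim() !== '';
  return content.some(
    (item) => item.type === 'text' && typeof item.text === 'string' && item.text.trim() !== ''
  );
}

// Walks backward so the first match is the most recent qualifying entry.
function findUserMessageBeforeTool(
  entries: UserEntry[],
  toolExecutionTimestamp: string
): UserEntry | undefined {
  const cutoff = Date.parse(toolExecutionTimestamp);
  for (let i = entries.length - 1; i >= 0; i--) {
    const entry = entries[i];
    if (Date.parse(entry.timestamp) <= cutoff && hasUserText(entry.message.content)) {
      return entry;
    }
  }
  return undefined;
}
```

Returning `undefined` instead of throwing keeps the hook usable when a tool runs before any text prompt is recorded.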

2. Assistant Reasoning

Problem: We don't capture WHY the assistant chose to use a tool.

Solution: Extract thinking blocks from the assistant entry that contains the tool use.

Implementation:

1. Find assistant entry that contains the tool_use block
2. Extract thinking blocks from same entry
3. Include first ~500 chars of thinking in observation context

Example:

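const assistantEntry = findAssistantEntryWithToolUse(toolUseId);
// Guard against a missing entry; thinking blocks are optional
const thinkingBlocks = (assistantEntry?.message.content ?? []).filter(c => c.type === 'thinking');
// Cap at ~500 chars to limit tokens sent to the worker
const reasoning = thinkingBlocks.map(b => b.thinking).join('\n').slice(0, 500);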

3. Tool Results Context

Problem: Tool output alone doesn't show what was found or changed.

Solution: Access full tool result content from next user entry.

Implementation:

- Tool execution happens in assistant entry
- Results come back in next user entry as tool_result content
- Save-hook can access both

Current Structure:

Assistant Entry:
  { type: 'tool_use', id: 'toolu_123', name: 'Read', input: {...} }
    ↓
User Entry (automatic):
  { type: 'tool_result', tool_use_id: 'toolu_123', content: "file contents..." }

Opportunity: Match tool_use_id to tool_result and include full result content.
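One way to do this matching in a single pass is sketched below. `pairToolCalls` and the reduced `Entry` shape are hypothetical; the sketch assumes entries arrive in transcript order, so each `tool_use` is seen before its `tool_result`.

```typescript
// Assumed minimal shapes; real transcript entries carry more fields.
interface ContentItem {
  type: string;
  id?: string; // present on tool_use blocks
  name?: string; // present on tool_use blocks
  tool_use_id?: string; // present on tool_result blocks
  content?: unknown;
}

interface Entry {
  type: 'user' | 'assistant';
  message: { content: string | ContentItem[] };
}

interface ToolPair {
  name: string;
  result?: unknown; // stays undefined until the matching tool_result is seen
}

// Builds a map from tool use ID to the tool name and its result.
function pairToolCalls(entries: Entry[]): Map<string, ToolPair> {
  const pairs = new Map<string, ToolPair>();

  for (const entry of entries) {
    const content = entry.message.content;
    if (!Array.isArray(content)) continue; // plain-string user prompts have no blocks

    for (const item of content) {
      if (entry.type === 'assistant' && item.type === 'tool_use' && item.id) {
        pairs.set(item.id, { name: item.name ?? 'unknown' });
      } else if (entry.type === 'user' && item.type === 'tool_result' && item.tool_use_id) {
        const pair = pairs.get(item.tool_use_id);
        if (pair) pair.result = item.content; // results without a known tool_use are ignored
      }
    }
  }

  return pairs;
}
```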

4. Conversation Context

Problem: Isolated tool executions miss the larger conversation flow.

Solution: Include last N conversation turns (2-3 turns is usually sufficient).

Implementation:

- Get entries from transcript within time window (e.g., last 5 minutes)
- Include user messages and assistant text responses
- Exclude thinking blocks to save tokens

Example Context:

Turn 1:
User: "I need to add dark mode support"
Assistant: "I'll help you add dark mode. Let me start by..."

Turn 2:
User: [tool results]
Assistant: "Now I'll update the theme configuration..."

Turn 3: [current tool execution]
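A rough sketch of how such turns might be assembled follows. It groups each user entry with the assistant text that follows it and keeps only the last N groups. `getRecentTurns` and its entry shape are hypothetical, and the sketch selects by turn count rather than by time window.

```typescript
// Assumed minimal shapes; real transcript entries carry more fields.
interface ContentItem {
  type: string;
  text?: string;
}

interface Entry {
  type: 'user' | 'assistant';
  message: { content: string | ContentItem[] };
}

interface Turn {
  user: string;
  assistant: string[];
}

// Keeps only text blocks, which drops thinking, tool_use, and tool_result blocks.
function visibleText(content: string | ContentItem[]): string {
  if (typeof content === 'string') return content;
  return content
    .filter((item) => item.type === 'text' && typeof item.text === 'string')
    .map((item) => item.text as string)
    .join('\n');
}

function getRecentTurns(entries: Entry[], maxTurns: number): Turn[] {
  const turns: Turn[] = [];

  for (const entry of entries) {
    const text = visibleText(entry.message.content);
    if (entry.type === 'user') {
      // A user entry with no text is an automatic tool-result submission.
      turns.push({ user: text || '[tool results]', assistant: [] });
    } else if (turns.length > 0 && text) {
      turns[turns.length - 1].assistant.push(text);
    }
  }

  // slice(-0) would return everything, so handle non-positive limits explicitly.
  return maxTurns > 0 ? turns.slice(-maxTurns) : [];
}
```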

5. Session Metadata

Problem: Observations lack temporal and project context.

Solution: Include session metadata in observation generation.

Available Fields:

- cwd - Working directory (project path)
- timestamp - Exact time of execution
- sessionId - Session identifier
- uuid - Entry identifier
- version - Claude Code version

Use Case: Helps with project-specific context and temporal queries.

6. Token Usage Metrics

Problem: No visibility into performance and cost.

Solution: Track token usage per observation.

Available Data:

- Input tokens
- Output tokens
- Cache creation tokens
- Cache read tokens

Use Case:

- Performance monitoring
- Cost attribution
- Cache effectiveness analysis
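As an illustration of the cache effectiveness use case, the sketch below sums usage across entries and computes the share of input tokens served from cache. `sumUsage` and `cacheReadRatio` are hypothetical helpers. The ratio assumes that `input_tokens` excludes cached tokens, so total input is the sum of the three input-side counters; that assumption should be checked against the API documentation before relying on the numbers.

```typescript
// Mirrors the numeric fields of the UsageInfo interface shown earlier.
interface UsageInfo {
  input_tokens?: number;
  output_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

interface UsageTotals {
  input: number;
  output: number;
  cacheCreation: number;
  cacheRead: number;
}

// Sums usage across entries; entries without usage data contribute nothing.
function sumUsage(usages: Array<UsageInfo | undefined>): UsageTotals {
  const totals: UsageTotals = { input: 0, output: 0, cacheCreation: 0, cacheRead: 0 };
  for (const usage of usages) {
    if (!usage) continue;
    totals.input += usage.input_tokens ?? 0;
    totals.output += usage.output_tokens ?? 0;
    totals.cacheCreation += usage.cache_creation_input_tokens ?? 0;
    totals.cacheRead += usage.cache_read_input_tokens ?? 0;
  }
  return totals;
}

// Fraction of all input-side tokens that were read from cache (0 when no input).
function cacheReadRatio(totals: UsageTotals): number {
  const totalInput = totals.input + totals.cacheCreation + totals.cacheRead;
  return totalInput === 0 ? 0 : totals.cacheRead / totalInput;
}
```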

Recommended Implementation Strategy

Phase 1: User Intent (High Impact, Low Effort)

Change: Modify save-hook to extract user's most recent message.

Implementation:

// In save-hook.ts
import { TranscriptParser } from '../utils/transcript-parser';

const parser = new TranscriptParser(transcriptPath);
const userIntent = parser.getLastUserMessage();

// Send to worker
await workerService.saveToolExecution({
  ...existingData,
  userIntent,  // NEW
});

Impact: Observations now include "what the user wanted to do".

Phase 2: Assistant Reasoning (High Impact, Medium Effort)

Change: Extract thinking blocks from assistant entry containing tool use.

Implementation:

const assistantEntries = parser.getAssistantEntries();
const toolUseEntry = findEntryWithToolUse(assistantEntries, toolUseId);
const thinking = extractThinkingBlocks(toolUseEntry);

await workerService.saveToolExecution({
  ...existingData,
  userIntent,
  assistantReasoning: thinking,  // NEW
});

Impact: Observations include "why the assistant chose this approach".

Phase 3: Conversation Context (Medium Impact, High Effort)

Change: Include last 2-3 conversation turns.

Implementation:

const recentTurns = getRecentConversationTurns(parser, 3);

await workerService.saveToolExecution({
  ...existingData,
  userIntent,
  assistantReasoning: thinking,
  conversationContext: recentTurns,  // NEW
});

Impact: Observations understand multi-turn workflows.

Phase 4: Enhanced Metadata (Low Impact, Low Effort)

Change: Include session and performance metadata.

Implementation:

await workerService.saveToolExecution({
  ...existingData,
  userIntent,
  assistantReasoning: thinking,
  conversationContext: recentTurns,
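  metadata: {  // NEW
    // toolUseEntry is the assistant entry found in Phase 2; usage lives on assistant entries
    cwd: toolUseEntry.cwd,
    timestamp: toolUseEntry.timestamp,
    sessionId: toolUseEntry.sessionId,
    tokenUsage: toolUseEntry.message.usage,
  },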
});

Impact: Better analytics and debugging.

Example: Before and After

Current Observation (Tool Data Only)

{
  "type": "feature",
  "title": "Updated login.ts",
  "narrative": "Modified authentication logic in src/auth/login.ts",
  "files": ["src/auth/login.ts"],
  "concepts": ["authentication"],
  "facts": []
}

Enhanced Observation (With Transcript Context)

{
  "type": "bugfix",
  "title": "Fixed authentication bug causing random logouts",
  "narrative": "Users were experiencing random logouts due to JWT token expiration. Updated token expiration from 1h to 24h in token validation logic. Modified src/auth/login.ts to use longer-lived tokens and improved error handling for expired tokens.",
  "files": ["src/auth/login.ts"],
  "concepts": ["authentication", "jwt", "token-management", "bugfix"],
  "facts": [
    "JWT token expiration was too short (1h)",
    "Updated expiration to 24h",
    "Added error handling for expired tokens"
  ]
}

Improvement:

- Clear problem statement
- Explicit solution
- Specific technical details
- Better concept tagging
- Actionable facts

Technical Considerations

1. Performance

Concern: Parsing entire transcript on every tool execution.

Solution:

- TranscriptParser already loads full file (unavoidable)
- Use caching for transcript parsing within same session
- Only parse once per session, reuse parsed entries

Benchmark:

- Current: ~10ms to parse 408-line transcript
- Impact: Negligible (save-hook already reads transcript)
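A minimal sketch of the caching idea is shown below. `TranscriptCache` is hypothetical, and it only helps when parsing happens inside one long-lived process. Because a transcript grows with every tool call, the sketch keys each cached value on a caller-supplied version (for example the file's modification time in milliseconds) and re-parses when that version changes.

```typescript
// Hypothetical in-memory cache for parsed transcripts.
// T is whatever the parser returns (for example an array of entries).
class TranscriptCache<T> {
  private readonly entries = new Map<string, { version: number; value: T }>();
  private readonly parse: (path: string) => T;

  constructor(parse: (path: string) => T) {
    this.parse = parse;
  }

  // `version` is any number that changes when the file changes,
  // such as its mtime in milliseconds or its size in bytes.
  get(path: string, version: number): T {
    const hit = this.entries.get(path);
    if (hit && hit.version === version) return hit.value;

    const value = this.parse(path);
    this.entries.set(path, { version, value });
    return value;
  }
}
```

Injecting the parse function keeps the cache independent of any particular parser and makes the invalidation rule easy to exercise without touching the file system.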

2. Token Usage

Concern: Sending more context to worker increases tokens.

Solution:

- Thinking blocks: Limit to first 500 chars
- Conversation context: Only last 2-3 turns
- Tool results: Truncate large outputs to 500 chars
- User intent: Full text (usually short)

Estimate:

- Current: ~200 tokens per observation generation
- Enhanced: ~500 tokens per observation generation
- Increase: ~150%
- Cost: Still < $0.001 per observation with Haiku
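The character limits and the percentage above can be expressed as two small helpers. Both are hypothetical sketches: `truncate` appends a marker so the worker can tell that text was cut, and `percentIncrease` shows the arithmetic behind the ~150% figure, (500 - 200) / 200 = 1.5.

```typescript
// Cuts text to at most maxChars characters and marks the cut with "...".
function truncate(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + '...';
}

// Relative growth from `before` to `after`, as a percentage.
function percentIncrease(before: number, after: number): number {
  return ((after - before) / before) * 100;
}
```

With the estimates above, `percentIncrease(200, 500)` gives 150.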

3. Implementation Complexity

Concern: Matching tool executions to transcript entries.

Solution:

- Tool use IDs are in both places
- Timestamps provide ordering
- UUID chains provide parent-child relationships

Example Matching:

function findToolContext(parser: TranscriptParser, toolUseId: string) {
  // 1. Find the assistant entry whose content holds the matching tool_use block
  const assistantEntry = parser.getAssistantEntries()
    .find(entry =>
      entry.message.content.some(c =>
        c.type === 'tool_use' && c.id === toolUseId
      )
    );

  // 2. Find the user entry carrying the matching tool_result.
  //    User content may be a plain string, so check for an array first.
  const userEntry = parser.getUserEntries()
    .find(entry =>
      Array.isArray(entry.message.content) &&
      entry.message.content.some(c =>
        c.type === 'tool_result' && c.tool_use_id === toolUseId
      )
    );

  // Either value is undefined when no match exists
  return { assistantEntry, userEntry };
}

Next Steps

1. Validate Approach
   - Review this analysis with project team
   - Confirm data availability in all transcript scenarios
   - Identify any privacy concerns
2. Implement Phase 1
   - Update save-hook.ts to extract user intent
   - Modify worker service to accept new fields
   - Update observation prompt to use user intent
3. Test and Measure
   - Compare observation quality before/after
   - Measure token usage increase
   - Validate performance impact
4. Iterate
   - Roll out Phase 2 (assistant reasoning)
   - Roll out Phase 3 (conversation context)
   - Monitor improvements at each phase

Appendix: Data Samples

Complete Markdown Representation

See /Users/alexnewman/Scripts/claude-mem/docs/context/transcript-complete-readable.md for a full 1:1 markdown representation of the first 10 conversation turns from the sample transcript, including:

- Complete user messages
- Full assistant responses
- Thinking blocks (truncated to 2000 chars)
- Tool uses with complete input JSON
- Tool results with actual output data (truncated to 500 chars)
- Token usage stats
- All metadata (timestamps, UUIDs, session IDs, CWD)

Sample Tool Result Structure

// User entry containing tool result
{
  "type": "user",
  "message": {
    "content": [
      {
        "type": "tool_result",
        "tool_use_id": "toolu_01T477WUra1sDR6gHaqZHhKT",
        "content": [
          {
            "type": "text",
            "text": "{\n  \"thoughtNumber\": 1,\n  \"totalThoughts\": 8,\n  \"nextThoughtNeeded\": true,\n  \"branches\": [],\n  \"thoughtHistoryLength\": 1\n}"
          }
        ]
      }
    ]
  }
}

Conclusion

The Claude Code transcript files contain contextual data that the memory worker does not currently use:

- User intent (the "what" and "why")
- Assistant reasoning (the "how" and "because")
- Tool results (the "outcome")
- Conversation context (the "flow")
- Session metadata (the "when" and "where")

Extracting these fields would let the memory worker generate richer observations that better capture the intent, decisions, and outcomes of each coding session.

The data is already there - we just need to read it.
