Files

T

Alex Newman 938eb9dc0e feat: Implement new memory processing prompts and XML structures

- Added final finalize prompt for session summary generation with required XML fields.
- Introduced recommended prompt flow with structured observation format and hierarchical storage principles.
- Created final init prompt for processing tool executions with clear guidelines on when to store observations.
- Developed final observation prompt for analyzing tool outputs and generating structured observations.
- Migrated old prompt flow to a new system with improved clarity and structured data handling.
- Updated parser and storage mechanisms to accommodate new observation formats and fields.
- Enhanced documentation for new prompts and their usage in memory processing sessions.

2025-10-18 17:27:46 -04:00

14 KiB

Raw Blame History

Prompt Flow Analysis & Rankings

Rating System

✅ Smart: Well-designed, clear purpose, effective
⚠️ Problematic: Has issues but salvageable
❌ Stupid: Poorly designed, confusing, or counterproductive
🧠 Context Poison: Will confuse the AI or create inconsistent behavior
🔍 No Clear Purpose: Exists but unclear why
🎯 Clarity Score: 1-10 (10 = crystal clear, 1 = incomprehensible)

Element-by-Element Comparison

INIT PROMPTS (Session Start)

CURRENT: "You are a memory processor"

You will PROCESS tool executions during this Claude Code session. Your job is to:
1. ANALYZE each tool response for meaningful content
2. DECIDE whether it contains something worth storing
3. EXTRACT the key insight
4. STORE it as an observation in the XML format below

For MOST meaningful tool outputs, you should generate an observation. Only skip truly routine operations.

Rating: ❌ Stupid + 🧠 Context Poison Clarity: 3/10

Issues:

"For MOST" is ambiguous - does that mean 51%? 80%? 95%?
Creates bias toward over-storage (fear of missing things)
Contradicts "Only skip truly routine operations" later in prompt
No clear guidance on what "meaningful" actually means
"Only skip truly routine" implies almost everything should be stored

Why Context Poison:

Agent will second-guess every decision
Creates inconsistent thresholds across sessions
User gets frustrated with noise

OLD: "You are a semantic memory compressor"

## FIRST: Generate Session Title
IMMEDIATELY generate a title and subtitle for this session based on the user request.

## THEN: Process Tool Responses
You will receive a stream of tool responses. For each one:
1. ANALYZE: Does this contain information worth remembering?
2. DECIDE: Should I store this or skip it?
3. EXTRACT: What are the key semantic concepts?
4. DECOMPOSE: Break into title + subtitle + atomic facts + narrative
5. STORE: Use bash to save the hierarchical memory
6. TRACK: Keep count of stored memories (001, 002, 003...)

# IMPORTANT REMINDERS
- Be selective - quality over quantity

Rating: ⚠️ Problematic but contains ✅ Smart elements Clarity: 6/10

Issues:

"IMMEDIATELY" vs "THEN" creates ordering confusion
Session title generation is unclear when it should happen
Bash tool dependency is fragile
Manual counter tracking is error-prone

Smart Elements:

"Quality over quantity" is clear directive
Hierarchical decomposition gives structure
Explicit state tracking (counter)
"Be selective" is unambiguous

Verdict: The philosophy is better (selective, quality-focused), execution is messier (bash commands, ordering confusion)

OBSERVATION PROMPTS

CURRENT: buildObservationPrompt

ANALYSIS TASK
-------------
ANALYZE this tool response and DECIDE: Does it contain something worth storing?

Most Read, Edit, Grep, Bash, and Write operations contain meaningful content.

If this contains something worth remembering, output the observation...

Rating: ❌ Stupid + 🧠 Context Poison Clarity: 4/10

Issues:

"Most Read, Edit, Grep, Bash, and Write operations contain meaningful content" - Why repeat this per observation?
Contradicts init prompt's "be selective" guidance
Creates bias: agent thinks "well, this is a Read, so I should probably store it"
No guidance on WHAT to extract from the content
One-sentence observations lose context

Why Context Poison:

Tool-type bias ("It's a Read, so store it") instead of content-based evaluation
Encourages lazy pattern matching instead of semantic analysis
Results in garbage like "Read package.json file" with no insight

OLD: Tool Message

# Tool Response {timeFormatted}

Tool: {toolName}
User Context: "{userPrompt}"

{toolResponse}


Analyze and store if meaningful.

Rating: ✅ Smart (minimal, non-leading) Clarity: 8/10

Smart Elements:

Doesn't tell agent what to think about tool types
Trusts agent's judgment from system prompt
Short and clear
Includes user context for relevance filtering

Issue:

Maybe TOO minimal - no reminder of format requirements

STORAGE FORMATS

CURRENT: XML Observations

<observation>
  <type>feature</type>
  <text>Implemented JWT token refresh flow with 7-day expiry</text>
</observation>

Rating: ⚠️ Problematic Clarity: 7/10

Issues:

One sentence only - loses narrative context
Five types (decision, bugfix, feature, refactor, discovery) - are these actually useful categories?
No file associations
No concept tagging
Flat structure - all observations equal weight

Smart Elements:

Simple to parse
Structured typing
Regex-parseable

OLD: Hierarchical Memory (4 levels)

--title "SDK Transcript Auto-Cleanup"
--subtitle "Automatic deletion of SDK transcripts after completion prevents memory conversations from appearing in UI history"
--facts '["stop-streaming.js: Deletes SDK transcript after overview generation", "Path: ~/.claude/projects/{sanitized-cwd}/{sessionId}.jsonl"]'
--concepts '["cleanup", "SDK-lifecycle", "UX"]'
--files '["hooks/stop-streaming.js"]'

Rating: ✅ Smart (structure) but ❌ Stupid (execution via bash) Clarity: 8/10 (concept), 3/10 (implementation)

Smart Elements:

Multiple levels of granularity (title → subtitle → facts → narrative)
Atomic facts enable precise retrieval
File associations explicit
Concept tags for categorization
Subtitle gives the "why it matters"

Stupid Elements:

Bash command execution is fragile
Quote escaping nightmare
Manual counter tracking
JSON in bash arguments is error-prone

Verdict: Great data model, terrible implementation

SUMMARY/FINALIZE PROMPTS

CURRENT: buildFinalizePrompt (per prompt)

<summary>
  <request>Implement JWT authentication system</request>
  <investigated>Existing auth middleware, session management</investigated>
  <learned>Current system uses session cookies; no JWT support</learned>
  <completed>Implemented JWT token + refresh flow</completed>
  <next_steps>Add token revocation API endpoint</next_steps>
  <files_read><file>src/auth.ts</file></files_read>
  <files_edited><file>src/auth.ts</file></files_edited>
  <notes>Token secret stored in .env</notes>
</summary>

Rating: ✅ Smart (structure) but 🔍 No Clear Purpose (frequency) Clarity: 9/10

Smart Elements:

Structured format with clear fields
Tracks what was learned (semantic value)
Files read/edited tracked explicitly
Next steps captured

Issues:

Generated PER PROMPT - is this too granular?
Will create many summaries per session
Unclear how these summaries are used
No aggregation across prompts

Question: Should this be per-session instead of per-prompt?

OLD: Session Overview (per session)

claude-mem store-overview --project "{project}" --session "{sessionId}" --content "2-3 sentence overview"

Rating: ⚠️ Problematic Clarity: 5/10

Issues:

Only 2-3 sentences - very lossy
No structured fields
Happens once at end - loses per-prompt context
Relies on agent's memory of entire session

Smart Element:

One overview per session (not noisy)

DECISION GUIDANCE

CURRENT: What to Store/Skip

Store these:
✓ File contents with logic, algorithms, or patterns
✓ Search results revealing project structure
✓ Build errors or test failures with context
...

Skip these:
✗ Simple status checks (git status with no changes)
✗ Trivial edits (one-line config changes)
...

Rating: ✅ Smart Clarity: 8/10

Smart Elements:

Concrete examples
Both positive and negative cases
Action-oriented

Issue:

Contradicted by "For MOST" and "Most Read, Edit..." statements elsewhere

OLD: What to Store/Skip

Store these:
- File contents with logic, algorithms, or patterns
- Search results revealing project structure
...

Skip these:
- Simple status checks (git status with no changes)
- Trivial edits (one-line config changes)
- Binary data or noise
- Anything without semantic value

Rating: ✅ Smart Clarity: 8/10

Same as current, which is good.

CRITICAL ISSUES RANKED

1. "For MOST meaningful tool outputs" - 🧠 CONTEXT POISON #1

Severity: CRITICAL Impact: Destroys selectivity, fills DB with noise Fix: Remove entirely. Replace with: "Be selective. Only store if it reveals important information about the codebase."

Severity: CRITICAL Impact: Creates tool-type bias instead of content-based evaluation Fix: Remove entirely. It's redundant and harmful.

3. One-sentence observations lose context - ❌ STUPID

Severity: HIGH Impact: Can't understand observation without narrative Fix: Add narrative field to observations (like old system)

4. No hierarchical structure in current system - ❌ STUPID

Severity: HIGH Impact: Can't do granular retrieval (fact-level vs narrative-level) Fix: Adopt 4-level hierarchy from old system

5. Bash command execution in old system - ❌ STUPID

Severity: HIGH Impact: Fragile, error-prone, quote-escaping nightmare Fix: Keep current approach (XML parsing + direct DB writes)

6. Manual memory counter in old system - ⚠️ PROBLEMATIC

Severity: MEDIUM Impact: Agent forgets, skips numbers, duplicates Fix: Auto-increment in database (current approach)

7. Per-prompt summaries unclear purpose - 🔍 NO CLEAR PURPOSE

Severity: MEDIUM Impact: Creates many summaries, unclear how they're used Fix: Decide: per-session summary only, or per-prompt with aggregation?

8. Five observation types unclear value - 🔍 NO CLEAR PURPOSE

Severity: LOW Impact: Are these categories actually useful for retrieval? Fix: Evaluate if types should be: (1) kept as-is, (2) expanded, (3) removed

BEST ELEMENTS FROM EACH SYSTEM

From OLD System (Keep These)

✅ 4-level hierarchy (title → subtitle → facts → narrative)
✅ "Be selective - quality over quantity"
✅ Atomic facts (50-150 char, self-contained, no pronouns)
✅ Concept tagging
✅ File associations
✅ Minimal observation prompts (don't bias agent)

From CURRENT System (Keep These)

✅ XML parsing (not bash commands)
✅ Auto-increment IDs (not manual counters)
✅ Structured summary format (8 fields)
✅ Per-prompt tracking
✅ Foreign key integrity
✅ Typed observations (decision/bugfix/feature/refactor/discovery)

From NEITHER System (Add These)

Clear threshold guidance: "Only store if it reveals important information about the codebase"
Explicit narrative field in observations
Vector embeddings for semantic search (current stores in SQLite only)

RECOMMENDED HYBRID SYSTEM

Storage Format: Hierarchical Observations (XML)

<observation>
  <type>feature</type>
  <title>JWT Token Refresh Implementation</title>
  <subtitle>Added 7-day refresh token rotation with Redis storage</subtitle>
  <facts>
    <fact>src/auth.ts: refreshToken() generates new JWT with 7-day expiry</fact>
    <fact>Redis key format: refresh:{userId}:{tokenId} with TTL 604800s</fact>
    <fact>Old token invalidated on refresh to prevent replay attacks</fact>
  </facts>
  <narrative>Implemented JWT refresh token functionality in src/auth.ts. The refreshToken() function validates the old refresh token from Redis, generates a new JWT access token (7-day expiry) and new refresh token, stores the new refresh token in Redis with key format refresh:{userId}:{tokenId} and TTL of 604800 seconds (7 days), and invalidates the old refresh token to prevent replay attacks. This enables long-lived authenticated sessions without requiring users to re-login while maintaining security through token rotation.</narrative>
  <concepts>
    <concept>authentication</concept>
    <concept>security</concept>
    <concept>session-management</concept>
  </concepts>
  <files>
    <file>src/auth.ts</file>
    <file>src/middleware/auth.ts</file>
  </files>
</observation>

Guidance: Clear and Unambiguous

Be selective. Only store observations when the tool output reveals important information about:
- Architecture or design patterns
- Implementation details of features or bug fixes
- System state or configuration
- Business logic or algorithms

Skip routine operations like empty git status, simple npm installs, or trivial config changes.

Each observation should be self-contained and searchable.

Summary: Per-Session (Not Per-Prompt)

Generate ONE summary when session ends
Aggregate all observations from session
Use current structured format (request, investigated, learned, completed, next_steps, files_read, files_edited, notes)

FINAL VERDICT

Element	Current	Old	Winner
Storage Structure	Flat one-sentence	4-level hierarchy	OLD
Storage Implementation	XML parsing	Bash commands	CURRENT
Decision Guidance	Contradictory	Clear	OLD
Session Metadata	None	Title + subtitle	OLD
Per-Prompt Tracking	Yes (summaries)	No	CURRENT
Semantic Search	No	Yes (ChromaDB)	OLD
Observation Prompts	Biased, repetitive	Minimal, clear	OLD
Auto-Increment IDs	Yes	No (manual)	CURRENT
File Associations	No	Yes	OLD
Concept Tagging	No	Yes	OLD

Optimal System: Hybrid - Old system's data model + Current system's implementation approach

14 KiB Raw Blame History

Prompt Flow Analysis & Rankings

Rating System

Element-by-Element Comparison

INIT PROMPTS (Session Start)

CURRENT: "You are a memory processor"

OLD: "You are a semantic memory compressor"

OBSERVATION PROMPTS

CURRENT: buildObservationPrompt

OLD: Tool Message

STORAGE FORMATS

CURRENT: XML Observations

OLD: Hierarchical Memory (4 levels)

SUMMARY/FINALIZE PROMPTS

CURRENT: buildFinalizePrompt (per prompt)

OLD: Session Overview (per session)

DECISION GUIDANCE

CURRENT: What to Store/Skip

OLD: What to Store/Skip

CRITICAL ISSUES RANKED

1. "For MOST meaningful tool outputs" - 🧠 CONTEXT POISON #1

2. "Most Read, Edit, Grep, Bash, and Write operations contain meaningful content" - 🧠 CONTEXT POISON #2

3. One-sentence observations lose context - ❌ STUPID

4. No hierarchical structure in current system - ❌ STUPID

5. Bash command execution in old system - ❌ STUPID

6. Manual memory counter in old system - ⚠️ PROBLEMATIC

7. Per-prompt summaries unclear purpose - 🔍 NO CLEAR PURPOSE

8. Five observation types unclear value - 🔍 NO CLEAR PURPOSE

BEST ELEMENTS FROM EACH SYSTEM

From OLD System (Keep These)

From CURRENT System (Keep These)

From NEITHER System (Add These)

RECOMMENDED HYBRID SYSTEM

Storage Format: Hierarchical Observations (XML)

Guidance: Clear and Unambiguous

Summary: Per-Session (Not Per-Prompt)

FINAL VERDICT

14 KiB

Raw Blame History