- Added final finalize prompt for session summary generation with required XML fields. - Introduced recommended prompt flow with structured observation format and hierarchical storage principles. - Created final init prompt for processing tool executions with clear guidelines on when to store observations. - Developed final observation prompt for analyzing tool outputs and generating structured observations. - Migrated old prompt flow to a new system with improved clarity and structured data handling. - Updated parser and storage mechanisms to accommodate new observation formats and fields. - Enhanced documentation for new prompts and their usage in memory processing sessions.
14 KiB
Prompt Flow Analysis & Rankings
Rating System
- ✅ Smart: Well-designed, clear purpose, effective
- ⚠️ Problematic: Has issues but salvageable
- ❌ Stupid: Poorly designed, confusing, or counterproductive
- 🧠 Context Poison: Will confuse the AI or create inconsistent behavior
- 🔍 No Clear Purpose: Exists but unclear why
- 🎯 Clarity Score: 1-10 (10 = crystal clear, 1 = incomprehensible)
Element-by-Element Comparison
INIT PROMPTS (Session Start)
CURRENT: "You are a memory processor"
You will PROCESS tool executions during this Claude Code session. Your job is to:
1. ANALYZE each tool response for meaningful content
2. DECIDE whether it contains something worth storing
3. EXTRACT the key insight
4. STORE it as an observation in the XML format below
For MOST meaningful tool outputs, you should generate an observation. Only skip truly routine operations.
Rating: ❌ Stupid + 🧠 Context Poison Clarity: 3/10
Issues:
- "For MOST" is ambiguous - does that mean 51%? 80%? 95%?
- Creates bias toward over-storage (fear of missing things)
- Contradicts "Only skip truly routine operations" later in prompt
- No clear guidance on what "meaningful" actually means
- "Only skip truly routine" implies almost everything should be stored
Why Context Poison:
- Agent will second-guess every decision
- Creates inconsistent thresholds across sessions
- User gets frustrated with noise
OLD: "You are a semantic memory compressor"
## FIRST: Generate Session Title
IMMEDIATELY generate a title and subtitle for this session based on the user request.
## THEN: Process Tool Responses
You will receive a stream of tool responses. For each one:
1. ANALYZE: Does this contain information worth remembering?
2. DECIDE: Should I store this or skip it?
3. EXTRACT: What are the key semantic concepts?
4. DECOMPOSE: Break into title + subtitle + atomic facts + narrative
5. STORE: Use bash to save the hierarchical memory
6. TRACK: Keep count of stored memories (001, 002, 003...)
# IMPORTANT REMINDERS
- Be selective - quality over quantity
Rating: ⚠️ Problematic but contains ✅ Smart elements Clarity: 6/10
Issues:
- "IMMEDIATELY" vs "THEN" creates ordering confusion
- Session title generation is unclear when it should happen
- Bash tool dependency is fragile
- Manual counter tracking is error-prone
Smart Elements:
- "Quality over quantity" is clear directive
- Hierarchical decomposition gives structure
- Explicit state tracking (counter)
- "Be selective" is unambiguous
Verdict: The philosophy is better (selective, quality-focused), execution is messier (bash commands, ordering confusion)
OBSERVATION PROMPTS
CURRENT: buildObservationPrompt
ANALYSIS TASK
-------------
ANALYZE this tool response and DECIDE: Does it contain something worth storing?
Most Read, Edit, Grep, Bash, and Write operations contain meaningful content.
If this contains something worth remembering, output the observation...
Rating: ❌ Stupid + 🧠 Context Poison Clarity: 4/10
Issues:
- "Most Read, Edit, Grep, Bash, and Write operations contain meaningful content" - Why repeat this per observation?
- Contradicts init prompt's "be selective" guidance
- Creates bias: agent thinks "well, this is a Read, so I should probably store it"
- No guidance on WHAT to extract from the content
- One-sentence observations lose context
Why Context Poison:
- Tool-type bias ("It's a Read, so store it") instead of content-based evaluation
- Encourages lazy pattern matching instead of semantic analysis
- Results in garbage like "Read package.json file" with no insight
OLD: Tool Message
# Tool Response {timeFormatted}
Tool: {toolName}
User Context: "{userPrompt}"
{toolResponse}
Analyze and store if meaningful.
Rating: ✅ Smart (minimal, non-leading) Clarity: 8/10
Smart Elements:
- Doesn't tell agent what to think about tool types
- Trusts agent's judgment from system prompt
- Short and clear
- Includes user context for relevance filtering
Issue:
- Maybe TOO minimal - no reminder of format requirements
STORAGE FORMATS
CURRENT: XML Observations
<observation>
<type>feature</type>
<text>Implemented JWT token refresh flow with 7-day expiry</text>
</observation>
Rating: ⚠️ Problematic Clarity: 7/10
Issues:
- One sentence only - loses narrative context
- Five types (decision, bugfix, feature, refactor, discovery) - are these actually useful categories?
- No file associations
- No concept tagging
- Flat structure - all observations equal weight
Smart Elements:
- Simple to parse
- Structured typing
- Regex-parseable
OLD: Hierarchical Memory (4 levels)
--title "SDK Transcript Auto-Cleanup"
--subtitle "Automatic deletion of SDK transcripts after completion prevents memory conversations from appearing in UI history"
--facts '["stop-streaming.js: Deletes SDK transcript after overview generation", "Path: ~/.claude/projects/{sanitized-cwd}/{sessionId}.jsonl"]'
--concepts '["cleanup", "SDK-lifecycle", "UX"]'
--files '["hooks/stop-streaming.js"]'
Rating: ✅ Smart (structure) but ❌ Stupid (execution via bash) Clarity: 8/10 (concept), 3/10 (implementation)
Smart Elements:
- Multiple levels of granularity (title → subtitle → facts → narrative)
- Atomic facts enable precise retrieval
- File associations explicit
- Concept tags for categorization
- Subtitle gives the "why it matters"
Stupid Elements:
- Bash command execution is fragile
- Quote escaping nightmare
- Manual counter tracking
- JSON in bash arguments is error-prone
Verdict: Great data model, terrible implementation
SUMMARY/FINALIZE PROMPTS
CURRENT: buildFinalizePrompt (per prompt)
<summary>
<request>Implement JWT authentication system</request>
<investigated>Existing auth middleware, session management</investigated>
<learned>Current system uses session cookies; no JWT support</learned>
<completed>Implemented JWT token + refresh flow</completed>
<next_steps>Add token revocation API endpoint</next_steps>
<files_read><file>src/auth.ts</file></files_read>
<files_edited><file>src/auth.ts</file></files_edited>
<notes>Token secret stored in .env</notes>
</summary>
Rating: ✅ Smart (structure) but 🔍 No Clear Purpose (frequency) Clarity: 9/10
Smart Elements:
- Structured format with clear fields
- Tracks what was learned (semantic value)
- Files read/edited tracked explicitly
- Next steps captured
Issues:
- Generated PER PROMPT - is this too granular?
- Will create many summaries per session
- Unclear how these summaries are used
- No aggregation across prompts
Question: Should this be per-session instead of per-prompt?
OLD: Session Overview (per session)
claude-mem store-overview --project "{project}" --session "{sessionId}" --content "2-3 sentence overview"
Rating: ⚠️ Problematic Clarity: 5/10
Issues:
- Only 2-3 sentences - very lossy
- No structured fields
- Happens once at end - loses per-prompt context
- Relies on agent's memory of entire session
Smart Element:
- One overview per session (not noisy)
DECISION GUIDANCE
CURRENT: What to Store/Skip
Store these:
✓ File contents with logic, algorithms, or patterns
✓ Search results revealing project structure
✓ Build errors or test failures with context
...
Skip these:
✗ Simple status checks (git status with no changes)
✗ Trivial edits (one-line config changes)
...
Rating: ✅ Smart Clarity: 8/10
Smart Elements:
- Concrete examples
- Both positive and negative cases
- Action-oriented
Issue:
- Contradicted by "For MOST" and "Most Read, Edit..." statements elsewhere
OLD: What to Store/Skip
Store these:
- File contents with logic, algorithms, or patterns
- Search results revealing project structure
...
Skip these:
- Simple status checks (git status with no changes)
- Trivial edits (one-line config changes)
- Binary data or noise
- Anything without semantic value
Rating: ✅ Smart Clarity: 8/10
Same as current, which is good.
CRITICAL ISSUES RANKED
1. "For MOST meaningful tool outputs" - 🧠 CONTEXT POISON #1
Severity: CRITICAL Impact: Destroys selectivity, fills DB with noise Fix: Remove entirely. Replace with: "Be selective. Only store if it reveals important information about the codebase."
2. "Most Read, Edit, Grep, Bash, and Write operations contain meaningful content" - 🧠 CONTEXT POISON #2
Severity: CRITICAL Impact: Creates tool-type bias instead of content-based evaluation Fix: Remove entirely. It's redundant and harmful.
3. One-sentence observations lose context - ❌ STUPID
Severity: HIGH Impact: Can't understand observation without narrative Fix: Add narrative field to observations (like old system)
4. No hierarchical structure in current system - ❌ STUPID
Severity: HIGH Impact: Can't do granular retrieval (fact-level vs narrative-level) Fix: Adopt 4-level hierarchy from old system
5. Bash command execution in old system - ❌ STUPID
Severity: HIGH Impact: Fragile, error-prone, quote-escaping nightmare Fix: Keep current approach (XML parsing + direct DB writes)
6. Manual memory counter in old system - ⚠️ PROBLEMATIC
Severity: MEDIUM Impact: Agent forgets, skips numbers, duplicates Fix: Auto-increment in database (current approach)
7. Per-prompt summaries unclear purpose - 🔍 NO CLEAR PURPOSE
Severity: MEDIUM Impact: Creates many summaries, unclear how they're used Fix: Decide: per-session summary only, or per-prompt with aggregation?
8. Five observation types unclear value - 🔍 NO CLEAR PURPOSE
Severity: LOW Impact: Are these categories actually useful for retrieval? Fix: Evaluate if types should be: (1) kept as-is, (2) expanded, (3) removed
BEST ELEMENTS FROM EACH SYSTEM
From OLD System (Keep These)
- ✅ 4-level hierarchy (title → subtitle → facts → narrative)
- ✅ "Be selective - quality over quantity"
- ✅ Atomic facts (50-150 char, self-contained, no pronouns)
- ✅ Concept tagging
- ✅ File associations
- ✅ Minimal observation prompts (don't bias agent)
From CURRENT System (Keep These)
- ✅ XML parsing (not bash commands)
- ✅ Auto-increment IDs (not manual counters)
- ✅ Structured summary format (8 fields)
- ✅ Per-prompt tracking
- ✅ Foreign key integrity
- ✅ Typed observations (decision/bugfix/feature/refactor/discovery)
From NEITHER System (Add These)
- Clear threshold guidance: "Only store if it reveals important information about the codebase"
- Explicit narrative field in observations
- Vector embeddings for semantic search (current stores in SQLite only)
RECOMMENDED HYBRID SYSTEM
Storage Format: Hierarchical Observations (XML)
<observation>
<type>feature</type>
<title>JWT Token Refresh Implementation</title>
<subtitle>Added 7-day refresh token rotation with Redis storage</subtitle>
<facts>
<fact>src/auth.ts: refreshToken() generates new JWT with 7-day expiry</fact>
<fact>Redis key format: refresh:{userId}:{tokenId} with TTL 604800s</fact>
<fact>Old token invalidated on refresh to prevent replay attacks</fact>
</facts>
<narrative>Implemented JWT refresh token functionality in src/auth.ts. The refreshToken() function validates the old refresh token from Redis, generates a new JWT access token (7-day expiry) and new refresh token, stores the new refresh token in Redis with key format refresh:{userId}:{tokenId} and TTL of 604800 seconds (7 days), and invalidates the old refresh token to prevent replay attacks. This enables long-lived authenticated sessions without requiring users to re-login while maintaining security through token rotation.</narrative>
<concepts>
<concept>authentication</concept>
<concept>security</concept>
<concept>session-management</concept>
</concepts>
<files>
<file>src/auth.ts</file>
<file>src/middleware/auth.ts</file>
</files>
</observation>
Guidance: Clear and Unambiguous
Be selective. Only store observations when the tool output reveals important information about:
- Architecture or design patterns
- Implementation details of features or bug fixes
- System state or configuration
- Business logic or algorithms
Skip routine operations like empty git status, simple npm installs, or trivial config changes.
Each observation should be self-contained and searchable.
Summary: Per-Session (Not Per-Prompt)
- Generate ONE summary when session ends
- Aggregate all observations from session
- Use current structured format (request, investigated, learned, completed, next_steps, files_read, files_edited, notes)
FINAL VERDICT
| Element | Current | Old | Winner |
|---|---|---|---|
| Storage Structure | Flat one-sentence | 4-level hierarchy | OLD |
| Storage Implementation | XML parsing | Bash commands | CURRENT |
| Decision Guidance | Contradictory | Clear | OLD |
| Session Metadata | None | Title + subtitle | OLD |
| Per-Prompt Tracking | Yes (summaries) | No | CURRENT |
| Semantic Search | No | Yes (ChromaDB) | OLD |
| Observation Prompts | Biased, repetitive | Minimal, clear | OLD |
| Auto-Increment IDs | Yes | No (manual) | CURRENT |
| File Associations | No | Yes | OLD |
| Concept Tagging | No | Yes | OLD |
Optimal System: Hybrid - Old system's data model + Current system's implementation approach