Add stderr test hook for UI experiment
This commit is contained in:
@@ -0,0 +1,222 @@
|
||||
# Context Engineering for AI Agents: Best Practices Cheat Sheet
|
||||
|
||||
## Core Principle
|
||||
**Find the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome.**
|
||||
|
||||
---
|
||||
|
||||
## Context Engineering vs Prompt Engineering
|
||||
|
||||
**Prompt Engineering**: Writing and organizing LLM instructions for optimal outcomes (one-time task)
|
||||
|
||||
**Context Engineering**: Curating and maintaining the optimal set of tokens during inference across multiple turns (iterative process)
|
||||
|
||||
Context engineering manages:
|
||||
- System instructions
|
||||
- Tools
|
||||
- Model Context Protocol (MCP)
|
||||
- External data
|
||||
- Message history
|
||||
- Runtime data retrieval
|
||||
|
||||
---
|
||||
|
||||
## The Problem: Context Rot
|
||||
|
||||
**Key Insight**: LLMs have an "attention budget" that gets depleted as context grows
|
||||
|
||||
- Every token attends to every other token (n² relationships)
|
||||
- As context length increases, model accuracy decreases
|
||||
- Models have less training experience with longer sequences
|
||||
- Context must be treated as a finite resource with diminishing marginal returns
|
||||
|
||||
---
|
||||
|
||||
## System Prompts: Find the "Right Altitude"
|
||||
|
||||
### The Goldilocks Zone
|
||||
|
||||
**Too Prescriptive** ❌
|
||||
- Hardcoded if-else logic
|
||||
- Brittle and fragile
|
||||
- High maintenance complexity
|
||||
|
||||
**Too Vague** ❌
|
||||
- High-level guidance without concrete signals
|
||||
- Falsely assumes shared context
|
||||
- Lacks actionable direction
|
||||
|
||||
**Just Right** ✅
|
||||
- Specific enough to guide behavior effectively
|
||||
- Flexible enough to provide strong heuristics
|
||||
- Minimal set of information that fully outlines expected behavior
|
||||
|
||||
### Best Practices
|
||||
- Use simple, direct language
|
||||
- Organize into distinct sections (`<background_information>`, `<instructions>`, `## Tool guidance`, etc.)
|
||||
- Use XML tags or Markdown headers for structure
|
||||
- Start with minimal prompt, add based on failure modes
|
||||
- Note: Minimal ≠ short (provide sufficient information upfront)
|
||||
|
||||
---
|
||||
|
||||
## Tools: Minimal and Clear
|
||||
|
||||
### Design Principles
|
||||
- **Self-contained**: Each tool has a single, clear purpose
|
||||
- **Robust to error**: Handle edge cases gracefully
|
||||
- **Extremely clear**: Intended use is unambiguous
|
||||
- **Token-efficient**: Returns relevant information without bloat
|
||||
- **Descriptive parameters**: Unambiguous input names (e.g., `user_id` not `user`)
|
||||
|
||||
### Critical Rule
|
||||
**If a human engineer can't definitively say which tool to use in a given situation, an AI agent can't be expected to do better.**
|
||||
|
||||
### Common Failure Modes to Avoid
|
||||
- Bloated tool sets covering too much functionality
|
||||
- Tools with overlapping purposes
|
||||
- Ambiguous decision points about which tool to use
|
||||
|
||||
---
|
||||
|
||||
## Examples: Diverse, Not Exhaustive
|
||||
|
||||
**Do** ✅
|
||||
- Curate a set of diverse, canonical examples
|
||||
- Show expected behavior effectively
|
||||
- Think "pictures worth a thousand words"
|
||||
|
||||
**Don't** ❌
|
||||
- Stuff in a laundry list of edge cases
|
||||
- Try to articulate every possible rule
|
||||
- Overwhelm with exhaustive scenarios
|
||||
|
||||
---
|
||||
|
||||
## Context Retrieval Strategies
|
||||
|
||||
### Just-In-Time Context (Recommended for Agents)
|
||||
**Approach**: Maintain lightweight identifiers (file paths, queries, links) and dynamically load data at runtime
|
||||
|
||||
**Benefits**:
|
||||
- Avoids context pollution
|
||||
- Enables progressive disclosure
|
||||
- Mirrors human cognition (we don't memorize everything)
|
||||
- Leverages metadata (file names, folder structure, timestamps)
|
||||
- Agents discover context incrementally
|
||||
|
||||
**Trade-offs**:
|
||||
- Slower than pre-computed retrieval
|
||||
- Requires proper tool guidance to avoid dead-ends
|
||||
|
||||
### Pre-Inference Retrieval (Traditional RAG)
|
||||
**Approach**: Use embedding-based retrieval to surface context before inference
|
||||
|
||||
**When to Use**: Static content that won't change during interaction
|
||||
|
||||
### Hybrid Strategy (Best of Both)
|
||||
**Approach**: Retrieve some data upfront, enable autonomous exploration as needed
|
||||
|
||||
**Example**: Claude Code loads CLAUDE.md files upfront, uses glob/grep for just-in-time retrieval
|
||||
|
||||
**Rule of Thumb**: "Do the simplest thing that works"
|
||||
|
||||
---
|
||||
|
||||
## Long-Horizon Tasks: Three Techniques
|
||||
|
||||
### 1. Compaction
|
||||
**What**: Summarize conversation nearing context limit, reinitiate with summary
|
||||
|
||||
**Implementation**:
|
||||
- Pass message history to model for compression
|
||||
- Preserve critical details (architectural decisions, bugs, implementation)
|
||||
- Discard redundant outputs
|
||||
- Continue with compressed context + recently accessed files
|
||||
|
||||
**Tuning Process**:
|
||||
1. **First**: Maximize recall (capture all relevant information)
|
||||
2. **Then**: Improve precision (eliminate superfluous content)
|
||||
|
||||
**Low-Hanging Fruit**: Clear old tool calls and results
|
||||
|
||||
**Best For**: Tasks requiring extensive back-and-forth
|
||||
|
||||
### 2. Structured Note-Taking (Agentic Memory)
|
||||
**What**: Agent writes notes persisted outside context window, retrieved later
|
||||
|
||||
**Examples**:
|
||||
- To-do lists
|
||||
- NOTES.md files
|
||||
- Game state tracking (Pokémon example: tracking 1,234 steps of training)
|
||||
- Project progress logs
|
||||
|
||||
**Benefits**:
|
||||
- Persistent memory with minimal overhead
|
||||
- Maintains critical context across tool calls
|
||||
- Enables multi-hour coherent strategies
|
||||
|
||||
**Best For**: Iterative development with clear milestones
|
||||
|
||||
### 3. Sub-Agent Architectures
|
||||
**What**: Specialized sub-agents handle focused tasks with clean context windows
|
||||
|
||||
**How It Works**:
|
||||
- Main agent coordinates high-level plan
|
||||
- Sub-agents perform deep technical work
|
||||
- Sub-agents explore extensively (tens of thousands of tokens)
|
||||
- Return condensed summaries (1,000-2,000 tokens)
|
||||
|
||||
**Benefits**:
|
||||
- Clear separation of concerns
|
||||
- Parallel exploration
|
||||
- Detailed context remains isolated
|
||||
|
||||
**Best For**: Complex research and analysis tasks
|
||||
|
||||
---
|
||||
|
||||
## Quick Decision Framework
|
||||
|
||||
| Scenario | Recommended Approach |
|
||||
|----------|---------------------|
|
||||
| Static content | Pre-inference retrieval or hybrid |
|
||||
| Dynamic exploration needed | Just-in-time context |
|
||||
| Extended back-and-forth | Compaction |
|
||||
| Iterative development | Structured note-taking |
|
||||
| Complex research | Sub-agent architectures |
|
||||
| Rapid model improvement | "Do the simplest thing that works" |
|
||||
|
||||
---
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
1. **Context is finite**: Treat it as a precious resource with an attention budget
|
||||
2. **Think holistically**: Consider the entire state available to the LLM
|
||||
3. **Stay minimal**: More context isn't always better
|
||||
4. **Be iterative**: Context curation happens each time you pass to the model
|
||||
5. **Design for autonomy**: As models improve, let them act intelligently
|
||||
6. **Start simple**: Test with minimal setup, add based on failure modes
|
||||
|
||||
---
|
||||
|
||||
## Anti-Patterns to Avoid
|
||||
|
||||
- ❌ Cramming everything into prompts
|
||||
- ❌ Creating brittle if-else logic
|
||||
- ❌ Building bloated tool sets
|
||||
- ❌ Stuffing exhaustive edge cases as examples
|
||||
- ❌ Assuming larger context windows solve everything
|
||||
- ❌ Ignoring context pollution over long interactions
|
||||
|
||||
---
|
||||
|
||||
## Remember
|
||||
|
||||
> "Even as models continue to improve, the challenge of maintaining coherence across extended interactions will remain central to building more effective agents."
|
||||
|
||||
Context engineering will evolve, but the core principle stays the same: **optimize signal-to-noise ratio in your token budget**.
|
||||
|
||||
---
|
||||
|
||||
*Based on Anthropic's "Effective context engineering for AI agents" (September 2025)*
|
||||
Reference in New Issue
Block a user