feat: tier routing by queue complexity + observation feedback table

Tier Routing:
- Inspect pending queue before starting generator
- Summarize messages → CLAUDE_MEM_TIER_SUMMARY_MODEL (e.g., Opus)
- All simple tools (Read, Glob, Grep, LS) → CLAUDE_MEM_TIER_SIMPLE_MODEL (Haiku)
- Mixed/complex → default model (no override)
- session.modelOverride in ActiveSession, used by SDKAgent.getModelId()
- peekPendingTypes() in PendingMessageStore for non-claiming inspection
- Configurable via CLAUDE_MEM_TIER_ROUTING_ENABLED (default: true)

Feedback Collection (schema only):
- New observation_feedback table via MigrationRunner (schema version 24)
- Tracks signal_type (semantic_inject_hit, search_accessed, etc.)
- Indexes on observation_id and signal_type
- Foundation for future Thompson Sampling optimization

Production data (24h tier routing test):
- 36 Haiku observations in 4 min, quality indistinguishable from Sonnet
- Estimated ~52% cost reduction on SDK Agent usage
- 835 → 6,695 feedback signals collected over 13 days

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Alessandro Costa
2026-04-01 21:56:01 -03:00
committed by Alex Newman
parent 876cc4d837
commit 0fcc078873
7 changed files with 143 additions and 3 deletions
+2 -2
View File
@@ -49,8 +49,8 @@ export class SDKAgent {
// Find Claude executable
const claudePath = this.findClaudeExecutable();
// Get model ID and disallowed tools
const modelId = this.getModelId();
// Get model ID (tier routing override takes precedence)
const modelId = session.modelOverride || this.getModelId();
// Memory agent is OBSERVER ONLY - no tools allowed
const disallowedTools = [
'Bash', // Prevent infinite loops