# Dual-Tag System Architecture

**Date**: 2025-11-30
**Branch**: `feature/meta-observation-control`
**Status**: Implemented
**Based on**: PR #105 dual-tag system

## Overview

The dual-tag system provides fine-grained control over what content gets persisted in claude-mem's observation database. It uses an edge processing pattern to filter tagged content at the hook layer before it reaches the worker service.

## The Two Tags

### Tag 1: `<private>`

**Purpose**: User-controlled privacy
**Status**: User-facing feature (documented)
**Use case**: Users wrap content they don't want persisted

```xml
<private>
This content won't be stored in observations
</private>
```

**Examples**:
- Sensitive information (API keys, credentials, internal URLs)
- Temporary context (deadlines, personal notes)
- Debug output (logs, stack traces)
- Exploratory prompts (brainstorming, hypotheticals)

### Tag 2: `<claude-mem-context>`

**Purpose**: System-level meta-observation control
**Status**: Infrastructure-ready (not user-facing yet)
**Use case**: Prevents recursive storage when real-time context injection is active

```xml
<claude-mem-context>
# Relevant Context from Past Sessions

[Auto-injected past observations...]
</claude-mem-context>
```

**Context**: This tag is used by the real-time context injection feature (not yet shipped). When past observations are injected into new prompts, they are wrapped in this tag so that they are not re-stored as new observations (the recursive storage problem).

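
To make the recursive storage problem concrete, here is a minimal sketch of how injected context might be wrapped before it is attached to a prompt. The helper name `wrapInjectedContext` and the sample strings are hypothetical illustrations and are not part of claude-mem; only the tag name and the heading text come from this document.

```typescript
// Tag name taken from this document; the constants themselves are illustrative.
const CONTEXT_OPEN = '<claude-mem-context>';
const CONTEXT_CLOSE = '</claude-mem-context>';

// Hypothetical helper: wraps past observations so a later filter can recognize them.
function wrapInjectedContext(pastObservations: string[]): string {
  const body = ['# Relevant Context from Past Sessions', '', ...pastObservations].join('\n');
  return `${CONTEXT_OPEN}\n${body}\n${CONTEXT_CLOSE}`;
}

// The injected block travels together with the user's own text...
const prompt =
  `${wrapInjectedContext(['Fixed auth bug in session 12'])}\n\nWhat should I work on next?`;

// ...so without a filter, the old observation would be persisted again as if it were new.
console.log(prompt);
```

Because the wrapper is a fixed, recognizable pair of tags, a filter can remove the whole injected block without needing to know what it contains.
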
## Architecture Pattern: Edge Processing

**Principle**: "Process at edge, send clean data to server"

The dual-tag system follows the edge processing pattern from hooks-in-composition:

```text
UserPrompt → [Hook Layer] → Worker → Database
                  ↑
            Filter here
        (strip tags at edge)
```

### Data Flow

**Without Filtering** (broken):

```
UserPrompt with <private> → PostToolUse hook → Worker → Memory Agent → Database
                                                                          ↓
                                                               Private content stored
```

**With Edge Processing** (correct):

```
UserPrompt with <private> → PostToolUse hook → stripMemoryTags() → Worker → Memory Agent → Database
                                                       ↑                                      ↓
                                                Filter at edge                      Only clean data stored
```

## Implementation

### File: `src/hooks/save-hook.ts`

**Function Added** (lines 31-53):

```typescript
/**
 * Strip memory tags to prevent recursive storage and enable privacy control
 */
function stripMemoryTags(content: string): string {
  if (typeof content !== 'string') {
    silentDebug('[save-hook] stripMemoryTags received non-string:', { type: typeof content });
    return '{}';  // Safe default for JSON context
  }

  return content
    .replace(/<claude-mem-context>[\s\S]*?<\/claude-mem-context>/g, '')
    .replace(/<private>[\s\S]*?<\/private>/g, '')
    .trim();
}
```

**Application** (lines 95-100):

```typescript
tool_input: tool_input !== undefined
  ? stripMemoryTags(JSON.stringify(tool_input))
  : '{}',
tool_response: tool_response !== undefined
  ? stripMemoryTags(JSON.stringify(tool_response))
  : '{}',
```

### File: `tests/strip-memory-tags.test.ts`

**Test Coverage**: 19 tests across 4 categories:

1. **Basic Functionality** (7 tests)
   - Strip `<private>` tags
   - Strip `<claude-mem-context>` tags
   - Strip both tag types
   - Handle nested tags
   - Multiline content
   - Multiple tags
   - Empty results

2. **Edge Cases** (5 tests)
   - Malformed tags (unclosed)
   - Tag-like strings (not actual tags)
   - Very large content (10k+ chars)
   - Whitespace trimming
   - Strings without tags

3. **Type Safety** (5 tests)
   - Non-string inputs (number, null, undefined, object, array)
   - All return safe default '{}'

4. **Real-World Scenarios** (2 tests)
   - JSON.stringify output
   - Efficient large content handling

**All tests passing** ✅ (19/19)

## Design Decisions

### 1. Always Active (No Configuration)

**Decision**: Tag stripping is always on; no environment variable is needed

**Rationale**: Privacy and anti-recursion protection should be the default, not opt-in

### 2. Edge Processing (Not Worker-Level)

**Decision**: Filter at the hook layer before sending to the worker

**Rationale**:
- Keeps worker service simple
- Follows one-way data stream
- No worker changes needed
- Hook becomes a filter/gateway

### 3. Defensive Coding with Silent Debug

**Decision**: Handle non-string inputs with silentDebug and return a safe default

**Rationale**:
- Never block the agent (hooks-in-composition principle)
- Log issues for observability
- Safe fallback maintains system stability

### 4. Both Tags Now (Progressive Enhancement)

**Decision**: Implement both tags even though only `<private>` is user-facing

**Rationale**:
- Infrastructure ready for real-time context feature
- No rework needed when context injection ships
- Same code path for both tags (simple)
- Progressive enhancement approach

### 5. Regex-Based Stripping

**Decision**: Use regex `/<tag>[\s\S]*?<\/tag>/g` instead of an XML parser

**Rationale**:
- No dependencies needed
- Handles multiline content (`[\s\S]*?`)
- Non-greedy (`*?`) prevents over-matching
- Global flag (`g`) handles multiple tags
- Good enough for this use case

## Edge Cases Handled

| Case | Input | Output | Why |
|------|-------|--------|-----|
| Nested tags | `<private>a <claude-mem-context>b</claude-mem-context> a</private>` | (empty string) | Inner tag is stripped first, then the outer tag matches the rest |
| Malformed | `<private>unclosed` | `<private>unclosed` | Regex requires closing tag |
| Multiple | `<private>a</private> b <private>c</private>` | `b` | Global flag removes all |
| Empty | `<private></private>` | (empty string) | Matches and removes |
| Tag-like | `not private` wrapped in a different tag name | (unchanged) | Only the two exact tag names match |
| Large content | 10,000+ character string | (stripped) | Single pass over well-formed input |
| Non-string | `123`, `null`, `{}` | `'{}'` | Defensive default |

## Future Enhancements

### 1. Real-Time Context Injection

**Status**: Deferred (not in this PR)

**When ready**: The `<claude-mem-context>` tag infrastructure is already in place

The missing piece is in `src/hooks/new-hook.ts`:
- Select relevant observations from timeline
- Wrap in `<claude-mem-context>` tags
- Return via `hookSpecificOutput`
- Tag stripping already handles the rest

### 2. System-Level Meta-Observation Tagging

**Concept**: Auto-tag observations about observations

**Examples**:
- Search skill results: `<claude-mem-context>[search results]</claude-mem-context>`
- Memory lookups: Fetched observations wrapped in tag
- Observation summaries: Meta-level analysis wrapped

**Implementation**: Tools/skills that produce meta-observations can wrap output in `<claude-mem-context>` tags to prevent recursive storage.

### 3. Additional Tag Types

**Potential tags**:
- An alias for `<private>`: Content that should be seen but not stored
- A debug tag: Debug output that should be logged but not persisted
- A planning tag: Thinking/planning content not meant for observations

**Note**: The current implementation strips only the tags that have their own `.replace()` call in the regex chain. Adding a new tag requires a one-line change in `stripMemoryTags()`.

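
The one-line change mentioned in the note above could alternatively be expressed as a tag list, so that adding a tag means adding a string. The following is a minimal sketch under that assumption: `MEMORY_TAGS`, `escapeRegExp`, and `stripTags` are hypothetical names and not the shipped `stripMemoryTags()`, and the non-string guard from the real function is omitted for brevity.

```typescript
// The two tag names come from this document; the list itself is an illustrative assumption.
const MEMORY_TAGS: readonly string[] = ['claude-mem-context', 'private'];

// Escape regex metacharacters so a tag name is always matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of the stripping step: one regex per tag name in the list.
function stripTags(content: string, tags: readonly string[] = MEMORY_TAGS): string {
  let result = content;
  for (const tag of tags) {
    const name = escapeRegExp(tag);
    // Same shape as the documented pattern: non-greedy, multiline-safe, global.
    const pattern = new RegExp(`<${name}>[\\s\\S]*?</${name}>`, 'g');
    result = result.replace(pattern, '');
  }
  return result.trim();
}

// Default list: both documented tags are removed.
console.log(stripTags('keep <private>secret</private> this'));

// Adding a tag is a matter of extending the list ('scratch' is a made-up tag name).
console.log(stripTags('a <scratch>b</scratch>', [...MEMORY_TAGS, 'scratch']));
```

The trade-off is a small amount of indirection in exchange for keeping all tag names in one place; the chained `.replace()` calls in the current implementation remain simpler while there are only two tags.
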
## Testing Strategy

### Unit Tests

```bash
node --test tests/strip-memory-tags.test.ts
```

**Expected**: 19/19 passing ✅

### Integration Tests

**Test 1: Basic Privacy**

```bash
# Submit prompt with <private> tag
# Query database: should not contain private content
sqlite3 ~/.claude-mem/claude-mem.db "SELECT COUNT(*) FROM observations WHERE narrative LIKE '%<private>%';"
# Expected: 0
```

**Test 2: Dual Tags**

```bash
# Submit prompt with both tags
# Verify neither tag appears in database
sqlite3 ~/.claude-mem/claude-mem.db "SELECT COUNT(*) FROM observations WHERE narrative LIKE '%<private>%' OR narrative LIKE '%<claude-mem-context>%';"
# Expected: 0
```

**Test 3: Function Exists**

```bash
# Verify stripMemoryTags in built file
grep -c "claude-mem-context.*private.*trim" ~/.claude/plugins/marketplaces/thedotmack/plugin/scripts/save-hook.js
# Expected: 1
```

### Regression Tests

**Ensure**:
- Normal observations still work (no tags broken)
- Worker service receives clean data
- No errors in `~/.claude-mem/silent.log`
- Tool executions still captured correctly

## Known Limitations

### 1. Tag Format is Fixed

Tags must use the exact XML-style format: `<private>content</private>`

**Won't work**:
- `[private]content[/private]` (wrong syntax)
- `<!-- private -->content<!-- /private -->` (comment syntax)
- `{{private}}content{{/private}}` (curly braces)

**Future**: Could add support for alternative formats if needed.

### 2. Partial Tag Matching

If a user writes about tags without intending to use them:

```
I want to add a <private> tag feature to my app
```

This won't be stripped (no closing tag). But if they accidentally write:

```
I want to add a <private>tag</private> feature
```

"tag" gets stripped.

**Mitigation**: Documentation educates users on proper usage.

### 3. Performance with Very Large Content

Regex matching is roughly O(n) in content length for well-formed input. Content with many unclosed opening tags forces the engine to rescan the remainder once per opening tag, which can be slower.

**Tested**: Works fine with 10,000 character strings

**Unknown**: Performance with multi-megabyte tool responses

**Mitigation**: Most tool I/O is small.
If issues arise, possible optimizations include:
- Early exit if no '<' character found
- Streaming regex for very large content
- Size limits on stripMemoryTags input

## Documentation

### User-Facing

**Location**: `docs/public/usage/private-tags.mdx`

**Content**:
- How to use `<private>` tags
- Use cases and examples
- Best practices
- Troubleshooting

**Available in**: Mintlify docs site, navigation under "Get Started"

### Technical/Internal

**Location**: `docs/context/dual-tag-system-architecture.md` (this file)

**Content**:
- Complete dual-tag system architecture
- Implementation details
- Design decisions
- Future enhancements

**Audience**: Contributors, maintainers, future developers

## References

### Original Work

- **PR #105**: Real-time context injection with dual-tag system
- **Branch**: `feature/real-time-context` (merged to main)
- **Investigator**: @basher83

### Documentation

- **Investigation**: `docs/context/real-time-context-recursive-memory-investigation.md`
- **User Guide**: `docs/public/usage/private-tags.mdx`
- **This Document**: `docs/context/dual-tag-system-architecture.md`

### Patterns Applied

- **Edge Processing**: From hooks-in-composition pattern
- **Never Block the Agent**: Defensive coding, safe defaults
- **One-Way Data Stream**: Hook → Worker → Database

## Summary

The dual-tag system is a complete, production-ready implementation that:

- ✅ Gives users privacy control via `<private>` tags
- ✅ Prepares infrastructure for real-time context injection
- ✅ Uses edge processing pattern for clean architecture
- ✅ Has comprehensive test coverage (19 tests, all passing)
- ✅ Includes user documentation and technical reference
- ✅ Requires no configuration (always active)
- ✅ Handles edge cases defensively

**Status**: Ready to ship 🚀