f38b5b85bc
* docs: add investigation reports for 5 open GitHub issues

  Comprehensive analysis of issues #543, #544, #545, #555, and #557:
  - #557: settings.json not generated, module loader error (node/bun mismatch)
  - #555: Windows hooks not executing, hasIpc always false
  - #545: formatTool crashes on non-JSON tool_input strings
  - #544: mem-search skill hint shown incorrectly to Claude Code users
  - #543: /claude-mem slash command unavailable despite installation

  Each report includes root cause analysis, affected files, and proposed fixes.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(logger): handle non-JSON tool_input in formatTool (#545)

  Wrap JSON.parse in try-catch to handle raw string inputs (e.g., Bash commands) that aren't valid JSON. Falls back to using the string as-is.

* fix(context): update mem-search hint to reference MCP tools (#544)

  Update hint messages to reference MCP tools (search, get_observations) instead of the deprecated "mem-search skill" terminology.

* fix(settings): auto-create settings.json on first load (#557, #543)

  When settings.json doesn't exist, create it with defaults instead of returning in-memory defaults. Creates parent directory if needed.

* fix(hooks): use bun runtime for hooks except smart-install (#557)

  Change hook commands from node to bun since hooks use bun:sqlite. Keep smart-install.js on node since it bootstraps bun installation.

* chore: rebuild plugin scripts

* docs: clarify that build artifacts must be committed

* fix(docs): update build artifacts directory reference in CLAUDE.md

* test: add test coverage for PR #558 fixes

  - Fix 2 failing tests: update "mem-search skill" → "MCP tools" expectations
  - Add 56 tests for formatTool() JSON.parse crash fix (Issue #545)
  - Add 27 tests for settings.json auto-creation (Issue #543)

  Test coverage includes:
  - formatTool: JSON parsing, raw strings, objects, null/undefined, all tool types
  - Settings: file creation, directory creation, schema migration, edge cases

* fix(tests): clean up flaky tests and fix circular dependency

  Phase 1 of test quality improvements:
  - Delete 6 harmful/worthless test files that used problematic mock.module() patterns or tested implementation details rather than behavior:
    - context-builder.test.ts (tested internal implementation)
    - export-types.test.ts (fragile mock patterns)
    - smart-install.test.ts (shell script testing antipattern)
    - session_id_refactor.test.ts (outdated, tested refactoring itself)
    - validate_sql_update.test.ts (one-time migration validation)
    - observation-broadcaster.test.ts (excessive mocking)
  - Fix circular dependency between logger.ts and SettingsDefaultsManager.ts by using late binding pattern - logger now lazily loads settings
  - Refactor mock.module() to spyOn() in several test files for more maintainable and less brittle tests:
    - observation-compiler.test.ts
    - gemini_agent.test.ts
    - error-handler.test.ts
    - server.test.ts
    - response-processor.test.ts

  All 649 tests pass.

* refactor(tests): phase 2 - reduce mock-heavy tests and improve focus

  - Remove mock-heavy query tests from observation-compiler.test.ts, keep real buildTimeline tests
  - Convert session_id_usage_validation.test.ts from 477 to 178 lines of focused smoke tests
  - Remove tests for language built-ins from worker-spawn.test.ts (JSON.parse, array indexing)
  - Rename logger-coverage.test.ts to logger-usage-standards.test.ts for clarity

* docs(tests): phase 3 - add JSDoc mock justification to test files

  Document mock usage rationale in 5 test files to improve maintainability:
  - error-handler.test.ts: Express req/res mocks, logger spies (~11%)
  - fallback-error-handler.test.ts: Zero mocks, pure function tests
  - session-cleanup-helper.test.ts: Session fixtures, worker mocks (~19%)
  - hook-constants.test.ts: process.platform mock for Windows tests (~12%)
  - session_store.test.ts: Zero mocks, real SQLite :memory: database

  Part of ongoing effort to document mock justifications per TESTING.md guidelines.

* test(integration): phase 5 - add 72 tests for critical coverage gaps

  Add comprehensive test coverage for previously untested areas:
  - tests/integration/hook-execution-e2e.test.ts (10 tests): tests lifecycle hooks execution flow and context propagation
  - tests/integration/worker-api-endpoints.test.ts (19 tests): tests all worker service HTTP endpoints without heavy mocking
  - tests/integration/chroma-vector-sync.test.ts (16 tests): tests vector embedding synchronization with ChromaDB
  - tests/utils/tag-stripping.test.ts (27 tests): tests privacy tag stripping utilities for both <private> and <meta-observation> tags

  All tests use real implementations where feasible, following the project's testing philosophy of preferring integration-style tests over unit tests with extensive mocking.

* context update

* docs: add comment linking DEFAULT_DATA_DIR locations

  Added NOTE comment in logger.ts pointing to the canonical DEFAULT_DATA_DIR in SettingsDefaultsManager.ts. This addresses PR reviewer feedback about the fragility of having the default defined in two places to avoid circular dependencies.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
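The formatTool fix (#545) described in the commit message above wraps JSON.parse in a try-catch and falls back to the raw string. A minimal sketch of that pattern follows; `parseToolInput` is a hypothetical helper name used only for illustration, and the real formatTool() in the logger may be structured differently.

```typescript
// Hypothetical helper illustrating the try/catch fallback from the commit message.
// It is not the project's actual formatTool() implementation.
function parseToolInput(toolInput: unknown): unknown {
  // Non-string inputs (objects, null, undefined) are passed through untouched.
  if (typeof toolInput !== 'string') return toolInput;
  try {
    // Valid JSON strings are parsed into their structured form.
    return JSON.parse(toolInput);
  } catch {
    // Raw strings such as Bash commands are not valid JSON; use them as-is.
    return toolInput;
  }
}
```

With this shape, a JSON string yields a parsed object, while a raw command string such as `ls -la` comes back unchanged instead of throwing.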
402 lines
14 KiB
TypeScript
import { describe, it, expect, beforeEach, afterEach, spyOn, mock } from 'bun:test';
import { writeFileSync, mkdirSync, rmSync, existsSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';
import { GeminiAgent } from '../src/services/worker/GeminiAgent';
import { DatabaseManager } from '../src/services/worker/DatabaseManager';
import { SessionManager } from '../src/services/worker/SessionManager';
import { ModeManager } from '../src/services/domain/ModeManager';
import { SettingsDefaultsManager } from '../src/shared/SettingsDefaultsManager';

// Track rate limiting setting (controls Gemini RPM throttling)
// Set to 'false' to disable rate limiting for faster tests
let rateLimitingEnabled = 'false';

// Mock mode config
const mockMode = {
  name: 'code',
  prompts: {
    init: 'init prompt',
    observation: 'obs prompt',
    summary: 'summary prompt'
  },
  observation_types: [{ id: 'discovery' }, { id: 'bugfix' }],
  observation_concepts: []
};

// Use spyOn for all dependencies to avoid affecting other test files.
// Spies are restored in afterEach via mockRestore(), whereas mock.module() overrides persist.
let loadFromFileSpy: ReturnType<typeof spyOn>;
let getSpy: ReturnType<typeof spyOn>;
let modeManagerSpy: ReturnType<typeof spyOn>;

describe('GeminiAgent', () => {
  let agent: GeminiAgent;
  let originalFetch: typeof global.fetch;

  // Mocks
  let mockStoreObservation: any;
  let mockStoreObservations: any; // Plural - atomic transaction method used by ResponseProcessor
  let mockStoreSummary: any;
  let mockMarkSessionCompleted: any;
  let mockSyncObservation: any;
  let mockSyncSummary: any;
  let mockMarkProcessed: any;
  let mockCleanupProcessed: any;
  let mockResetStuckMessages: any;
  let mockDbManager: DatabaseManager;
  let mockSessionManager: SessionManager;

  beforeEach(() => {
    // Reset rate limiting to disabled by default (speeds up tests)
    rateLimitingEnabled = 'false';

    // Mock ModeManager using spyOn (restores properly)
    modeManagerSpy = spyOn(ModeManager, 'getInstance').mockImplementation(() => ({
      getActiveMode: () => mockMode,
      loadMode: () => {},
    } as any));

    // Mock SettingsDefaultsManager methods using spyOn (restores properly)
    loadFromFileSpy = spyOn(SettingsDefaultsManager, 'loadFromFile').mockImplementation(() => ({
      ...SettingsDefaultsManager.getAllDefaults(),
      CLAUDE_MEM_GEMINI_API_KEY: 'test-api-key',
      CLAUDE_MEM_GEMINI_MODEL: 'gemini-2.5-flash-lite',
      CLAUDE_MEM_GEMINI_RATE_LIMITING_ENABLED: rateLimitingEnabled,
      CLAUDE_MEM_DATA_DIR: '/tmp/claude-mem-test',
    }));

    getSpy = spyOn(SettingsDefaultsManager, 'get').mockImplementation((key: string) => {
      if (key === 'CLAUDE_MEM_GEMINI_API_KEY') return 'test-api-key';
      if (key === 'CLAUDE_MEM_GEMINI_MODEL') return 'gemini-2.5-flash-lite';
      if (key === 'CLAUDE_MEM_GEMINI_RATE_LIMITING_ENABLED') return rateLimitingEnabled;
      if (key === 'CLAUDE_MEM_DATA_DIR') return '/tmp/claude-mem-test';
      return SettingsDefaultsManager.getAllDefaults()[key as keyof ReturnType<typeof SettingsDefaultsManager.getAllDefaults>] ?? '';
    });

    // Initialize mocks
    mockStoreObservation = mock(() => ({ id: 1, createdAtEpoch: Date.now() }));
    mockStoreSummary = mock(() => ({ id: 1, createdAtEpoch: Date.now() }));
    mockMarkSessionCompleted = mock(() => {});
    mockSyncObservation = mock(() => Promise.resolve());
    mockSyncSummary = mock(() => Promise.resolve());
    mockMarkProcessed = mock(() => {});
    mockCleanupProcessed = mock(() => 0);
    mockResetStuckMessages = mock(() => 0);

    // Mock for storeObservations (plural) - the atomic transaction method called by ResponseProcessor
    mockStoreObservations = mock(() => ({
      observationIds: [1],
      summaryId: 1,
      createdAtEpoch: Date.now()
    }));

    const mockSessionStore = {
      storeObservation: mockStoreObservation,
      storeObservations: mockStoreObservations, // Required by ResponseProcessor.ts
      storeSummary: mockStoreSummary,
      markSessionCompleted: mockMarkSessionCompleted
    };

    const mockChromaSync = {
      syncObservation: mockSyncObservation,
      syncSummary: mockSyncSummary
    };

    mockDbManager = {
      getSessionStore: () => mockSessionStore,
      getChromaSync: () => mockChromaSync
    } as unknown as DatabaseManager;

    const mockPendingMessageStore = {
      markProcessed: mockMarkProcessed,
      cleanupProcessed: mockCleanupProcessed,
      resetStuckMessages: mockResetStuckMessages
    };

    mockSessionManager = {
      getMessageIterator: async function* () { yield* []; },
      getPendingMessageStore: () => mockPendingMessageStore
    } as unknown as SessionManager;

    agent = new GeminiAgent(mockDbManager, mockSessionManager);
    originalFetch = global.fetch;
  });

  afterEach(() => {
    global.fetch = originalFetch;
    // Restore spied methods
    if (modeManagerSpy) modeManagerSpy.mockRestore();
    if (loadFromFileSpy) loadFromFileSpy.mockRestore();
    if (getSpy) getSpy.mockRestore();
    mock.restore();
  });

  it('should initialize with correct config', async () => {
    const session = {
      sessionDbId: 1,
      contentSessionId: 'test-session',
      memorySessionId: 'mem-session-123',
      project: 'test-project',
      userPrompt: 'test prompt',
      conversationHistory: [],
      lastPromptNumber: 1,
      cumulativeInputTokens: 0,
      cumulativeOutputTokens: 0,
      pendingMessages: [],
      abortController: new AbortController(),
      generatorPromise: null,
      earliestPendingTimestamp: null,
      currentProvider: null,
      startTime: Date.now()
    } as any;

    global.fetch = mock(() => Promise.resolve(new Response(JSON.stringify({
      candidates: [{
        content: {
          parts: [{ text: '<observation><type>discovery</type><title>Test</title></observation>' }]
        }
      }],
      usageMetadata: { totalTokenCount: 100 }
    }))));

    await agent.startSession(session);

    expect(global.fetch).toHaveBeenCalledTimes(1);
    const url = (global.fetch as any).mock.calls[0][0];
    expect(url).toContain('https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent');
    expect(url).toContain('key=test-api-key');
  });

  it('should handle multi-turn conversation', async () => {
    const session = {
      sessionDbId: 1,
      contentSessionId: 'test-session',
      memorySessionId: 'mem-session-123',
      project: 'test-project',
      userPrompt: 'test prompt',
      conversationHistory: [{ role: 'user', content: 'prev context' }, { role: 'assistant', content: 'prev response' }],
      lastPromptNumber: 2,
      cumulativeInputTokens: 0,
      cumulativeOutputTokens: 0,
      pendingMessages: [],
      abortController: new AbortController(),
      generatorPromise: null,
      earliestPendingTimestamp: null,
      currentProvider: null,
      startTime: Date.now()
    } as any;

    global.fetch = mock(() => Promise.resolve(new Response(JSON.stringify({
      candidates: [{ content: { parts: [{ text: 'response' }] } }]
    }))));

    await agent.startSession(session);

    const body = JSON.parse((global.fetch as any).mock.calls[0][1].body);
    expect(body.contents).toHaveLength(3);
    expect(body.contents[0].role).toBe('user');
    expect(body.contents[1].role).toBe('model');
    expect(body.contents[2].role).toBe('user');
  });

  it('should process observations and store them', async () => {
    const session = {
      sessionDbId: 1,
      contentSessionId: 'test-session',
      memorySessionId: 'mem-session-123',
      project: 'test-project',
      userPrompt: 'test prompt',
      conversationHistory: [],
      lastPromptNumber: 1,
      cumulativeInputTokens: 0,
      cumulativeOutputTokens: 0,
      pendingMessages: [],
      abortController: new AbortController(),
      generatorPromise: null,
      earliestPendingTimestamp: null,
      currentProvider: null,
      startTime: Date.now()
    } as any;

    const observationXml = `
      <observation>
        <type>discovery</type>
        <title>Found bug</title>
        <subtitle>Null pointer</subtitle>
        <narrative>Found a null pointer in the code</narrative>
        <facts><fact>Null check missing</fact></facts>
        <concepts><concept>bug</concept></concepts>
        <files_read><file>src/main.ts</file></files_read>
        <files_modified></files_modified>
      </observation>
    `;

    global.fetch = mock(() => Promise.resolve(new Response(JSON.stringify({
      candidates: [{ content: { parts: [{ text: observationXml }] } }],
      usageMetadata: { totalTokenCount: 50 }
    }))));

    await agent.startSession(session);

    // ResponseProcessor uses storeObservations (plural) for atomic transactions
    expect(mockStoreObservations).toHaveBeenCalled();
    expect(mockSyncObservation).toHaveBeenCalled();
    expect(session.cumulativeInputTokens).toBeGreaterThan(0);
  });

  it('should fallback to Claude on rate limit error', async () => {
    const session = {
      sessionDbId: 1,
      contentSessionId: 'test-session',
      memorySessionId: 'mem-session-123',
      project: 'test-project',
      userPrompt: 'test prompt',
      conversationHistory: [],
      lastPromptNumber: 1,
      cumulativeInputTokens: 0,
      cumulativeOutputTokens: 0,
      pendingMessages: [],
      abortController: new AbortController(),
      generatorPromise: null,
      earliestPendingTimestamp: null,
      currentProvider: null,
      startTime: Date.now()
    } as any;

    global.fetch = mock(() => Promise.resolve(new Response('Resource has been exhausted (e.g. check quota).', { status: 429 })));

    const fallbackAgent = {
      startSession: mock(() => Promise.resolve())
    };
    agent.setFallbackAgent(fallbackAgent);

    await agent.startSession(session);

    // Verify fallback to Claude was triggered
    expect(fallbackAgent.startSession).toHaveBeenCalledWith(session, undefined);
    // Note: resetStuckMessages is called by worker-service.ts, not by GeminiAgent
  });

  it('should NOT fallback on other errors', async () => {
    const session = {
      sessionDbId: 1,
      contentSessionId: 'test-session',
      memorySessionId: 'mem-session-123',
      project: 'test-project',
      userPrompt: 'test prompt',
      conversationHistory: [],
      lastPromptNumber: 1,
      cumulativeInputTokens: 0,
      cumulativeOutputTokens: 0,
      pendingMessages: [],
      abortController: new AbortController(),
      generatorPromise: null,
      earliestPendingTimestamp: null,
      currentProvider: null,
      startTime: Date.now()
    } as any;

    global.fetch = mock(() => Promise.resolve(new Response('Invalid argument', { status: 400 })));

    const fallbackAgent = {
      startSession: mock(() => Promise.resolve())
    };
    agent.setFallbackAgent(fallbackAgent);

    await expect(agent.startSession(session)).rejects.toThrow('Gemini API error: 400 - Invalid argument');
    expect(fallbackAgent.startSession).not.toHaveBeenCalled();
  });

  it('should respect rate limits when rate limiting enabled', async () => {
    // Enable rate limiting - this means requests will be throttled
    // Note: CLAUDE_MEM_GEMINI_RATE_LIMITING_ENABLED !== 'false' means enabled
    rateLimitingEnabled = 'true';

    const originalSetTimeout = global.setTimeout;
    const mockSetTimeout = mock((cb: any) => cb());
    global.setTimeout = mockSetTimeout as any;

    try {
      const session = {
        sessionDbId: 1,
        contentSessionId: 'test-session',
        memorySessionId: 'mem-session-123',
        project: 'test-project',
        userPrompt: 'test prompt',
        conversationHistory: [],
        lastPromptNumber: 1,
        cumulativeInputTokens: 0,
        cumulativeOutputTokens: 0,
        pendingMessages: [],
        abortController: new AbortController(),
        generatorPromise: null,
        earliestPendingTimestamp: null,
        currentProvider: null,
        startTime: Date.now()
      } as any;

      global.fetch = mock(() => Promise.resolve(new Response(JSON.stringify({
        candidates: [{ content: { parts: [{ text: 'ok' }] } }]
      }))));

      await agent.startSession(session);
      await agent.startSession(session);

      expect(mockSetTimeout).toHaveBeenCalled();
    } finally {
      global.setTimeout = originalSetTimeout;
    }
  });

  describe('gemini-3-flash model support', () => {
    it('should accept gemini-3-flash as a valid model', async () => {
      // The GeminiModel type includes gemini-3-flash - compile-time check
      const validModels = [
        'gemini-2.5-flash-lite',
        'gemini-2.5-flash',
        'gemini-2.5-pro',
        'gemini-2.0-flash',
        'gemini-2.0-flash-lite',
        'gemini-3-flash'
      ];

      // Verify all models are strings (type guard)
      expect(validModels.every(m => typeof m === 'string')).toBe(true);
      expect(validModels).toContain('gemini-3-flash');
    });

    it('should have rate limit defined for gemini-3-flash', async () => {
      // GEMINI_RPM_LIMITS['gemini-3-flash'] = 5
      // This is enforced at compile time, but we can test the rate limiting behavior
      // by checking that the rate limit is applied when using gemini-3-flash
      const session = {
        sessionDbId: 1,
        contentSessionId: 'test-session',
        memorySessionId: 'mem-session-123',
        project: 'test-project',
        userPrompt: 'test prompt',
        conversationHistory: [],
        lastPromptNumber: 1,
        cumulativeInputTokens: 0,
        cumulativeOutputTokens: 0,
        pendingMessages: [],
        abortController: new AbortController(),
        generatorPromise: null,
        earliestPendingTimestamp: null,
        currentProvider: null,
        startTime: Date.now()
      } as any;

      global.fetch = mock(() => Promise.resolve(new Response(JSON.stringify({
        candidates: [{ content: { parts: [{ text: 'ok' }] } }],
        usageMetadata: { totalTokenCount: 10 }
      }))));

      // This validates that gemini-3-flash is a valid model at runtime
      // The agent's validation array includes gemini-3-flash
      await agent.startSession(session);
      expect(global.fetch).toHaveBeenCalled();
    });
  });
});
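Each test above builds the same session object literal by hand. A minimal sketch of a fixture factory that could reduce that repetition is shown below; `TestSession` and `createTestSession` are hypothetical names that do not exist in the repository, and the field types are assumptions inferred from the literals used in the tests.

```typescript
// Hypothetical fixture type: field names mirror the session literals in the tests,
// but the types are assumptions, not the project's real session type.
interface TestSession {
  sessionDbId: number;
  contentSessionId: string;
  memorySessionId: string;
  project: string;
  userPrompt: string;
  conversationHistory: Array<{ role: string; content: string }>;
  lastPromptNumber: number;
  cumulativeInputTokens: number;
  cumulativeOutputTokens: number;
  pendingMessages: unknown[];
  abortController: AbortController;
  generatorPromise: Promise<void> | null;
  earliestPendingTimestamp: number | null;
  currentProvider: string | null;
  startTime: number;
}

// Builds a fresh session per call so tests never share mutable state
// (arrays and the AbortController are created anew each time).
function createTestSession(overrides: Partial<TestSession> = {}): TestSession {
  return {
    sessionDbId: 1,
    contentSessionId: 'test-session',
    memorySessionId: 'mem-session-123',
    project: 'test-project',
    userPrompt: 'test prompt',
    conversationHistory: [],
    lastPromptNumber: 1,
    cumulativeInputTokens: 0,
    cumulativeOutputTokens: 0,
    pendingMessages: [],
    abortController: new AbortController(),
    generatorPromise: null,
    earliestPendingTimestamp: null,
    currentProvider: null,
    startTime: Date.now(),
    ...overrides, // per-test differences, e.g. conversationHistory or lastPromptNumber
  };
}
```

Under these assumptions, the multi-turn test would only need to pass its two history entries and `lastPromptNumber: 2` as overrides, while the other tests could call `createTestSession()` with no arguments.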