159 lines
5.1 KiB
Markdown
159 lines
5.1 KiB
Markdown
# CivilPlan MCP v2 - Bird's-Eye View Generation Design Spec
|
|
|
|
## Overview
|
|
|
|
Add a new MCP tool `generate_birdseye_view` to CivilPlan MCP that generates 3D architectural/civil engineering bird's-eye view and perspective renderings using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Additionally, remove all local LLM dependencies and create a polished release with comprehensive documentation.
|
|
|
|
## Scope
|
|
|
|
### In Scope
|
|
1. **New MCP tool**: `generate_birdseye_view` — generates 2 images (bird's-eye + perspective)
|
|
2. **Nano Banana Pro integration** via `google-genai` Python SDK
|
|
3. **Project-type-specific prompt templates** (road, building, water/sewerage, river, landscaping)
|
|
4. **Local LLM removal** — delete all local LLM code and dependencies
|
|
5. **Release v2.0.0** — GitHub release with detailed README and connection guides
|
|
|
|
### Out of Scope
|
|
- Night/day or seasonal variations
|
|
- Video/animation generation
|
|
- 3D model file export (OBJ, FBX, etc.)
|
|
|
|
## Architecture
|
|
|
|
### Data Flow
|
|
|
|
```
|
|
MCP Client (Claude / ChatGPT)
|
|
|
|
|
| MCP Protocol (HTTP)
|
|
v
|
|
CivilPlan MCP Server (FastMCP)
|
|
|
|
|
| generate_birdseye_view tool called
|
|
v
|
|
BirdseyeViewGenerator
|
|
|
|
|
|-- [If SVG drawing exists] Convert SVG to PNG reference image
|
|
|-- [Always] Build optimized prompt from project data
|
|
|
|
|
v
|
|
Google Gemini API (Nano Banana Pro model)
|
|
|
|
|
v
|
|
2x PNG images returned (bird's-eye + perspective)
|
|
|
|
|
|-- Save to output directory
|
|
|-- Return base64 + file paths via MCP response
|
|
```
|
|
|
|
### New Files
|
|
|
|
| File | Purpose |
|
|
|------|---------|
|
|
| `civilplan_mcp/tools/birdseye_generator.py` | MCP tool implementation |
|
|
| `civilplan_mcp/prompts/birdseye_templates.py` | Project-type prompt templates |
|
|
| `civilplan_mcp/services/gemini_image.py` | Nano Banana Pro API client wrapper |
|
|
| `tests/test_birdseye_generator.py` | Unit tests |
|
|
|
|
### Tool Interface
|
|
|
|
```python
|
|
@mcp.tool()
|
|
async def generate_birdseye_view(
|
|
project_summary: str, # Parsed project description (from project_parser)
|
|
project_type: str, # "road" | "building" | "water" | "river" | "landscape" | "mixed"
|
|
svg_drawing: str | None, # Optional SVG drawing content from drawing_generator
|
|
resolution: str = "2k", # "2k" | "4k"
|
|
output_dir: str = "./output/renders"
|
|
) -> dict:
|
|
"""
|
|
Returns:
|
|
{
|
|
"birdseye_view": {"path": str, "base64": str},
|
|
"perspective_view": {"path": str, "base64": str},
|
|
"prompt_used": str,
|
|
"model": "nano-banana-pro"
|
|
}
|
|
"""
|
|
```
|
|
|
|
### Prompt Template Strategy
|
|
|
|
Each project type gets a specialized prompt template:
|
|
|
|
- **Road**: Emphasize road alignment, terrain, surrounding land use, utility corridors
|
|
- **Building**: Emphasize building mass, facade, site context, parking/landscaping
|
|
- **Water/Sewerage**: Emphasize pipeline routes, treatment facilities, connection points
|
|
- **River**: Emphasize riverbank, embankments, bridges, flood plains
|
|
- **Landscape**: Emphasize vegetation, pathways, public spaces, terrain grading
|
|
- **Mixed**: Combine relevant elements from applicable types
|
|
|
|
Template format:
|
|
```
|
|
"Create a photorealistic {view_type} of a {project_type} project:
|
|
{project_details}
|
|
Style: Professional architectural visualization, Korean construction context,
|
|
clear weather, daytime, {resolution} resolution"
|
|
```
|
|
|
|
### API Configuration
|
|
|
|
- API key stored via existing `.env` / `secure_store.py` pattern
|
|
- New env var: `GEMINI_API_KEY`
|
|
- SDK: `google-genai` (official Google Gen AI Python SDK)
|
|
- Model: `gemini-3-pro-image` (Nano Banana Pro)
|
|
- Error handling: On API failure, return error message without crashing the MCP tool
|
|
|
|
### SVG-to-PNG Conversion
|
|
|
|
When an SVG drawing is provided as reference:
|
|
1. Convert SVG to PNG using `cairosvg` or `Pillow`
|
|
2. Send as reference image alongside the text prompt
|
|
3. Nano Banana Pro uses it for spatial understanding
|
|
|
|
### Local LLM Removal
|
|
|
|
Identify and remove:
|
|
- Any local model loading code (transformers, llama-cpp, ollama, etc.)
|
|
- Related dependencies in `requirements.txt` / `pyproject.toml`
|
|
- Config entries referencing local models
|
|
- Replace with Gemini API calls where needed
|
|
|
|
## Release Plan
|
|
|
|
### Version: v2.0.0
|
|
|
|
### README Overhaul
|
|
- Project overview with feature highlights
|
|
- Quick start guide (clone, install, configure, run)
|
|
- Tool reference table (all 20 tools including new birdseye)
|
|
- Claude Desktop connection guide (step-by-step with screenshots description)
|
|
- ChatGPT / OpenAI connection guide
|
|
- API key setup guide (Gemini, public data portal)
|
|
- Example outputs (birdseye rendering description)
|
|
- Troubleshooting FAQ
|
|
|
|
### GitHub Release
|
|
- Tag: `v2.0.0`
|
|
- Release notes summarizing changes
|
|
- Installation instructions
|
|
|
|
## Testing Strategy
|
|
|
|
- Unit test for prompt template generation
|
|
- Unit test for SVG-to-PNG conversion
|
|
- Integration test with mocked Gemini API response
|
|
- Manual end-to-end test with real API key
|
|
|
|
## Dependencies Added
|
|
|
|
| Package | Purpose |
|
|
|---------|---------|
|
|
| `google-genai` | Gemini API SDK (Nano Banana Pro) |
|
|
| `cairosvg` | SVG to PNG conversion |
|
|
| `Pillow` | Image processing |
|
|
|
|
## Dependencies Removed
|
|
|
|
All local LLM packages (to be identified during implementation by scanning current requirements).
|