Add design spec for bird's-eye view generation and v2.0.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
158
docs/superpowers/specs/2026-04-04-birdseye-view-design.md
Normal file
158
docs/superpowers/specs/2026-04-04-birdseye-view-design.md
Normal file
@@ -0,0 +1,158 @@
|
||||
# CivilPlan MCP v2 - Bird's-Eye View Generation Design Spec
|
||||
|
||||
## Overview
|
||||
|
||||
Add a new MCP tool `generate_birdseye_view` to CivilPlan MCP that generates 3D architectural/civil engineering bird's-eye view and perspective renderings using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Additionally, remove all local LLM dependencies and create a polished release with comprehensive documentation.
|
||||
|
||||
## Scope
|
||||
|
||||
### In Scope
|
||||
1. **New MCP tool**: `generate_birdseye_view` — generates 2 images (bird's-eye + perspective)
|
||||
2. **Nano Banana Pro integration** via `google-genai` Python SDK
|
||||
3. **Project-type-specific prompt templates** (road, building, water/sewerage, river, landscaping)
|
||||
4. **Local LLM removal** — delete all local LLM code and dependencies
|
||||
5. **Release v2.0.0** — GitHub release with detailed README and connection guides
|
||||
|
||||
### Out of Scope
|
||||
- Night/day or seasonal variations
|
||||
- Video/animation generation
|
||||
- 3D model file export (OBJ, FBX, etc.)
|
||||
|
||||
## Architecture
|
||||
|
||||
### Data Flow
|
||||
|
||||
```
|
||||
MCP Client (Claude / ChatGPT)
|
||||
|
|
||||
| MCP Protocol (HTTP)
|
||||
v
|
||||
CivilPlan MCP Server (FastMCP)
|
||||
|
|
||||
| generate_birdseye_view tool called
|
||||
v
|
||||
BirdseyeViewGenerator
|
||||
|
|
||||
|-- [If SVG drawing exists] Convert SVG to PNG reference image
|
||||
|-- [Always] Build optimized prompt from project data
|
||||
|
|
||||
v
|
||||
Google Gemini API (Nano Banana Pro model)
|
||||
|
|
||||
v
|
||||
2x PNG images returned (bird's-eye + perspective)
|
||||
|
|
||||
|-- Save to output directory
|
||||
|-- Return base64 + file paths via MCP response
|
||||
```
|
||||
|
||||
### New Files
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `civilplan_mcp/tools/birdseye_generator.py` | MCP tool implementation |
|
||||
| `civilplan_mcp/prompts/birdseye_templates.py` | Project-type prompt templates |
|
||||
| `civilplan_mcp/services/gemini_image.py` | Nano Banana Pro API client wrapper |
|
||||
| `tests/test_birdseye_generator.py` | Unit tests |
|
||||
|
||||
### Tool Interface
|
||||
|
||||
```python
|
||||
@mcp.tool()
|
||||
async def generate_birdseye_view(
|
||||
project_summary: str, # Parsed project description (from project_parser)
|
||||
project_type: str, # "road" | "building" | "water" | "river" | "landscape" | "mixed"
|
||||
svg_drawing: str | None, # Optional SVG drawing content from drawing_generator
|
||||
resolution: str = "2k", # "2k" | "4k"
|
||||
output_dir: str = "./output/renders"
|
||||
) -> dict:
|
||||
"""
|
||||
Returns:
|
||||
{
|
||||
"birdseye_view": {"path": str, "base64": str},
|
||||
"perspective_view": {"path": str, "base64": str},
|
||||
"prompt_used": str,
|
||||
"model": "nano-banana-pro"
|
||||
}
|
||||
"""
|
||||
```
|
||||
|
||||
### Prompt Template Strategy
|
||||
|
||||
Each project type gets a specialized prompt template:
|
||||
|
||||
- **Road**: Emphasize road alignment, terrain, surrounding land use, utility corridors
|
||||
- **Building**: Emphasize building mass, facade, site context, parking/landscaping
|
||||
- **Water/Sewerage**: Emphasize pipeline routes, treatment facilities, connection points
|
||||
- **River**: Emphasize riverbank, embankments, bridges, flood plains
|
||||
- **Landscape**: Emphasize vegetation, pathways, public spaces, terrain grading
|
||||
- **Mixed**: Combine relevant elements from applicable types
|
||||
|
||||
Template format:
|
||||
```
|
||||
"Create a photorealistic {view_type} of a {project_type} project:
|
||||
{project_details}
|
||||
Style: Professional architectural visualization, Korean construction context,
|
||||
clear weather, daytime, {resolution} resolution"
|
||||
```
|
||||
|
||||
### API Configuration
|
||||
|
||||
- API key stored via existing `.env` / `secure_store.py` pattern
|
||||
- New env var: `GEMINI_API_KEY`
|
||||
- SDK: `google-genai` (official Google Gen AI Python SDK)
|
||||
- Model: `gemini-3-pro-image` (Nano Banana Pro)
|
||||
- Error handling: On API failure, return error message without crashing the MCP tool
|
||||
|
||||
### SVG-to-PNG Conversion
|
||||
|
||||
When an SVG drawing is provided as reference:
|
||||
1. Convert SVG to PNG using `cairosvg` or `Pillow`
|
||||
2. Send as reference image alongside the text prompt
|
||||
3. Nano Banana Pro uses it for spatial understanding
|
||||
|
||||
### Local LLM Removal
|
||||
|
||||
Identify and remove:
|
||||
- Any local model loading code (transformers, llama-cpp, ollama, etc.)
|
||||
- Related dependencies in `requirements.txt` / `pyproject.toml`
|
||||
- Config entries referencing local models
|
||||
- Replace with Gemini API calls where needed
|
||||
|
||||
## Release Plan
|
||||
|
||||
### Version: v2.0.0
|
||||
|
||||
### README Overhaul
|
||||
- Project overview with feature highlights
|
||||
- Quick start guide (clone, install, configure, run)
|
||||
- Tool reference table (all 20 tools including new birdseye)
|
||||
- Claude Desktop connection guide (step-by-step with screenshots description)
|
||||
- ChatGPT / OpenAI connection guide
|
||||
- API key setup guide (Gemini, public data portal)
|
||||
- Example outputs (birdseye rendering description)
|
||||
- Troubleshooting FAQ
|
||||
|
||||
### GitHub Release
|
||||
- Tag: `v2.0.0`
|
||||
- Release notes summarizing changes
|
||||
- Installation instructions
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
- Unit test for prompt template generation
|
||||
- Unit test for SVG-to-PNG conversion
|
||||
- Integration test with mocked Gemini API response
|
||||
- Manual end-to-end test with real API key
|
||||
|
||||
## Dependencies Added
|
||||
|
||||
| Package | Purpose |
|
||||
|---------|---------|
|
||||
| `google-genai` | Gemini API SDK (Nano Banana Pro) |
|
||||
| `cairosvg` | SVG to PNG conversion |
|
||||
| `Pillow` | Image processing |
|
||||
|
||||
## Dependencies Removed
|
||||
|
||||
All local LLM packages (to be identified during implementation by scanning current requirements).
|
||||
Reference in New Issue
Block a user