Files

sinmb79 121df5b63c Add design spec for bird's-eye view generation and v2.0.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-04 18:31:35 +09:00

5.1 KiB

Raw Blame History

CivilPlan MCP v2 - Bird's-Eye View Generation Design Spec

Overview

Add a new MCP tool generate_birdseye_view to CivilPlan MCP that generates 3D architectural/civil engineering bird's-eye view and perspective renderings using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Additionally, remove all local LLM dependencies and create a polished release with comprehensive documentation.

Scope

In Scope

New MCP tool: generate_birdseye_view — generates 2 images (bird's-eye + perspective)
Nano Banana Pro integration via google-genai Python SDK
Project-type-specific prompt templates (road, building, water/sewerage, river, landscaping)
Local LLM removal — delete all local LLM code and dependencies
Release v2.0.0 — GitHub release with detailed README and connection guides

Out of Scope

Night/day or seasonal variations
Video/animation generation
3D model file export (OBJ, FBX, etc.)

Architecture

Data Flow

MCP Client (Claude / ChatGPT)
        |
        | MCP Protocol (HTTP)
        v
CivilPlan MCP Server (FastMCP)
        |
        | generate_birdseye_view tool called
        v
BirdseyeViewGenerator
        |
        |-- [If SVG drawing exists] Convert SVG to PNG reference image
        |-- [Always] Build optimized prompt from project data
        |
        v
Google Gemini API (Nano Banana Pro model)
        |
        v
2x PNG images returned (bird's-eye + perspective)
        |
        |-- Save to output directory
        |-- Return base64 + file paths via MCP response

New Files

File	Purpose
`civilplan_mcp/tools/birdseye_generator.py`	MCP tool implementation
`civilplan_mcp/prompts/birdseye_templates.py`	Project-type prompt templates
`civilplan_mcp/services/gemini_image.py`	Nano Banana Pro API client wrapper
`tests/test_birdseye_generator.py`	Unit tests

Tool Interface

@mcp.tool()
async def generate_birdseye_view(
    project_summary: str,       # Parsed project description (from project_parser)
    project_type: str,          # "road" | "building" | "water" | "river" | "landscape" | "mixed"
    svg_drawing: str | None,    # Optional SVG drawing content from drawing_generator
    resolution: str = "2k",     # "2k" | "4k"
    output_dir: str = "./output/renders"
) -> dict:
    """
    Returns:
    {
        "birdseye_view": {"path": str, "base64": str},
        "perspective_view": {"path": str, "base64": str},
        "prompt_used": str,
        "model": "nano-banana-pro"
    }
    """

Prompt Template Strategy

Each project type gets a specialized prompt template:

Road: Emphasize road alignment, terrain, surrounding land use, utility corridors
Building: Emphasize building mass, facade, site context, parking/landscaping
Water/Sewerage: Emphasize pipeline routes, treatment facilities, connection points
River: Emphasize riverbank, embankments, bridges, flood plains
Landscape: Emphasize vegetation, pathways, public spaces, terrain grading
Mixed: Combine relevant elements from applicable types

Template format:

"Create a photorealistic {view_type} of a {project_type} project:
{project_details}
Style: Professional architectural visualization, Korean construction context,
clear weather, daytime, {resolution} resolution"

API Configuration

API key stored via existing .env / secure_store.py pattern
New env var: GEMINI_API_KEY
SDK: google-genai (official Google Gen AI Python SDK)
Model: gemini-3-pro-image (Nano Banana Pro)
Error handling: On API failure, return error message without crashing the MCP tool

SVG-to-PNG Conversion

When an SVG drawing is provided as reference:

Convert SVG to PNG using cairosvg or Pillow
Send as reference image alongside the text prompt
Nano Banana Pro uses it for spatial understanding

Local LLM Removal

Identify and remove:

Any local model loading code (transformers, llama-cpp, ollama, etc.)
Related dependencies in requirements.txt / pyproject.toml
Config entries referencing local models
Replace with Gemini API calls where needed

Release Plan

Version: v2.0.0

README Overhaul

Project overview with feature highlights
Quick start guide (clone, install, configure, run)
Tool reference table (all 20 tools including new birdseye)
Claude Desktop connection guide (step-by-step with screenshots description)
ChatGPT / OpenAI connection guide
API key setup guide (Gemini, public data portal)
Example outputs (birdseye rendering description)
Troubleshooting FAQ

GitHub Release

Tag: v2.0.0
Release notes summarizing changes
Installation instructions

Testing Strategy

Unit test for prompt template generation
Unit test for SVG-to-PNG conversion
Integration test with mocked Gemini API response
Manual end-to-end test with real API key

Dependencies Added

Package	Purpose
`google-genai`	Gemini API SDK (Nano Banana Pro)
`cairosvg`	SVG to PNG conversion
`Pillow`	Image processing

Dependencies Removed

All local LLM packages (to be identified during implementation by scanning current requirements).

5.1 KiB Raw Blame History