pftom ac70719d4d feat(media): add image / video / audio surfaces with unified od media generate dispatcher

Extends Open Design from web-only to a multi-modal creation tool. The
unifying contract is one code-agent loop driven by skills + project
metadata + prompt constraints; for non-web surfaces the agent shells
out to a single dispatcher (`od media generate`) that the daemon
routes per (surface, model).

- Types: new Surface union, MediaAspect / AudioKind, image/video/audio
  ProjectKind + ProjectMetadata fields, video/audio ProjectFileKind.
- NewProjectPanel: top-level surface picker + Image / Video / Audio
  forms with model, aspect, length, duration, voice, audio-kind pickers.
- ExamplesTab + DesignSystemsTab: surface filter row that scopes
  before mode / scenario / category filters.
- FileViewer / FileWorkspace: native <video> and <audio> previews and
  matching tab icons.
- Daemon: parses `od.surface` and `> Surface:` blockquotes; recognises
  mp4 / webm / mov / mp3 / wav / ogg / m4a / flac extensions; spawns
  agents with OD_BIN / OD_DAEMON_URL / OD_PROJECT_ID / OD_PROJECT_DIR
  env so any code-agent CLI with shell access can call the dispatcher.
- daemon/media.js + daemon/media-models.js: surface-agnostic dispatcher
  with stub providers that emit deterministic placeholder bytes
  (1x1 PNG, valid mp4 ftyp, mp3 frame / silent WAV) so the framework
  works without API keys; real provider integrations slot in later.
- daemon/cli.js: `od media generate --surface ... --model ...`
  subcommand routes to POST /api/projects/:id/media/generate and
  prints one JSON line for the agent to parse.
- prompts/media-contract.ts: hard contract pinned LAST in the system
  prompt for image/video/audio surfaces — env vars, exact invocation,
  registered model IDs per surface, six workflow rules. system.ts
  metadata block updated to point at the contract.
- Seed skills: image-poster, video-shortform, audio-jingle each ship a
  SKILL.md with `mode/surface: image|video|audio` and a stylized
  example.html preview, and instruct the agent to dispatch via the
  contract.

Made-with: Cursor

2026-04-28 22:40:58 +08:00

3.4 KiB


name: image-poster
description: Single-image generation skill for posters, key art, and editorial illustrations. Defaults to gpt-image-2 but is provider-agnostic; the same workflow drives Flux, Imagen, or Midjourney via the active upstream tooling. Output is one or more PNG/JPEG files saved to the project folder.
triggers:
  - poster
  - key art
  - illustration
  - image
  - cover art
  - 海报
  - 插画
od:
  mode: image
  surface: image
  scenario: design
  preview:
    type: html
    entry: example.html
  design_system:
    requires: false
  example_prompt: Editorial poster for an indie film festival, with one bold abstract silhouette over a warm, slightly grainy paper background; hand-set sans serif title at the top, festival dates and venue at the bottom in monospace. Muted ochre + ink palette.

Image Poster Skill

Produce one finished image asset per turn unless the user asks for variations. Image generation rewards a tight, structured prompt — your job is to assemble that prompt from the user's brief, then dispatch.

Resource map

image-poster/
├── SKILL.md         ← you're reading this
└── example.html     ← what the resulting card looks like in Examples

Workflow

Step 0 — Read the project metadata

The active project carries imageModel, imageAspect, and (optional) imageStyle notes. Use them as the upstream model + canvas + style anchor; only ask the user to fill them in if they're marked (unknown — ask).
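
A minimal sketch of the Step 0 check, assuming the metadata reaches the agent as a plain object with string fields. The `ImageMetadata` shape and the `fieldsToAsk` helper are hypothetical names used for illustration; only the three field names and the `(unknown — ask)` marker come from this skill.

```typescript
// Hypothetical shape: only the three field names come from this skill.
interface ImageMetadata {
  imageModel: string;
  imageAspect: string;
  imageStyle?: string; // optional style notes
}

// Marker copied verbatim from the skill text. How the daemon actually
// encodes an unset field is an assumption.
const UNKNOWN_MARKER = "(unknown — ask)";

// Return the names of the fields the agent should ask the user about.
function fieldsToAsk(meta: ImageMetadata): string[] {
  const entries: Array<[string, string | undefined]> = [
    ["imageModel", meta.imageModel],
    ["imageAspect", meta.imageAspect],
    ["imageStyle", meta.imageStyle],
  ];
  return entries
    .filter(([, value]) => value !== undefined && value.trim() === UNKNOWN_MARKER)
    .map(([name]) => name);
}
```

An empty result means every field is usable as-is and the agent can move straight to composing the prompt.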

Step 1 — Compose the prompt

Plan in this exact order before calling any tool:

1. Subject + composition: what is in the frame, where, at what scale; eye-line and crop.
2. Lighting + mood: natural / studio / moody; warm / cool; key plus rim plus fill; time of day if outdoor.
3. Palette + textures: hex anchors when the user gave a brand palette; otherwise a 3-word mood tag (e.g. "muted ochre + ink").
4. Camera / lens: only if the user wants photographic realism ("85mm portrait, shallow DOF") or a specific film stock.
5. What to avoid: common AI-slop patterns ("no extra fingers, no warped text, no logo placeholders").
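
The ordering above can be sketched as a small assembly function. This is an illustration only: `PromptPlan` and `assemblePrompt` are hypothetical names, and joining the parts with sentence breaks is an assumption, not something the skill prescribes.

```typescript
// Hypothetical container for the five planning steps, in the order listed above.
interface PromptPlan {
  subject: string;   // subject + composition
  lighting: string;  // lighting + mood
  palette: string;   // palette + textures
  camera?: string;   // only for photographic realism or a specific film stock
  avoid: string[];   // patterns to steer away from
}

// Join the parts in the planned order, skipping anything left empty.
function assemblePrompt(plan: PromptPlan): string {
  const parts: string[] = [plan.subject, plan.lighting, plan.palette];
  if (plan.camera) parts.push(plan.camera);
  if (plan.avoid.length > 0) parts.push(`Avoid: ${plan.avoid.join(", ")}`);
  return (
    parts
      .map((p) => p.trim().replace(/\.+$/, "")) // drop trailing periods before joining
      .filter((p) => p.length > 0)
      .join(". ") + "."
  );
}
```

Keeping `camera` optional mirrors the rule that lens details belong in the prompt only when the user wants photographic realism.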

Step 2 — Dispatch via the media contract

Use the unified dispatcher — do not call upstream provider APIs by hand. Run from your shell tool:

node "$OD_BIN" media generate \
  --project "$OD_PROJECT_ID" \
  --surface image \
  --model "<imageModel from metadata>" \
  --aspect "<imageAspect from metadata>" \
  --output "<short-descriptive-name>.png" \
  --prompt "<the full assembled prompt from Step 1>"

The command prints one line of JSON: {"file": {"name": "...", ...}}. The daemon writes the bytes into the project folder; the FileViewer picks it up automatically.
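
A hedged sketch of reading that JSON line back, for cases where the agent wants to confirm the filename before handing off. Only `file.name` is documented here, so every other field is treated as opaque; `parseDispatcherOutput` and `GeneratedFile` are hypothetical names.

```typescript
// Only `file.name` is documented in this skill; other fields stay opaque.
interface GeneratedFile {
  name: string;
  [key: string]: unknown;
}

// Parse the single JSON line printed by the dispatcher.
// Throws if the output does not match the documented {"file": {"name": ...}} shape.
function parseDispatcherOutput(stdout: string): GeneratedFile {
  const parsed: unknown = JSON.parse(stdout.trim());
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("dispatcher output is not a JSON object");
  }
  const file = (parsed as { file?: unknown }).file;
  if (typeof file !== "object" || file === null) {
    throw new Error('dispatcher output has no "file" object');
  }
  const name = (file as { name?: unknown }).name;
  if (typeof name !== "string" || name.length === 0) {
    throw new Error('dispatcher output has no "file.name" string');
  }
  return file as GeneratedFile;
}
```

Failing loudly on an unexpected shape keeps the agent from reporting a filename that was never written.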

Step 3 — Hand off

Reply with a one-paragraph summary of the prompt you used and the filename returned by the dispatcher (e.g. "I generated hero-poster.png with gpt-image-2 at 1:1."). Do not emit an <artifact> tag.

Hard rules

- One image per turn unless asked for variations.
- Honor imageAspect exactly. The upstream cost is the same at any aspect, and matching it on the first render avoids a re-render.
- No filler typography in the image itself unless the user asked for in-frame text. Real copy beats lorem.
- Save every render. Never describe an image without producing the file; the user expects something to open in the file viewer.
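
The aspect rule can be checked arithmetically once a render's pixel dimensions are known. The sketch below assumes imageAspect is written as `W:H` (as in the `1:1` example above); `parseAspect` and `matchesAspect` are hypothetical helpers, and the comparison is exact, so a provider that rounds dimensions to a block size would need a tolerance instead.

```typescript
// Parse an aspect string such as "1:1" or "16:9" into its two integer terms.
function parseAspect(aspect: string): [number, number] {
  const match = /^(\d+):(\d+)$/.exec(aspect.trim());
  if (!match) throw new Error(`unrecognised aspect: ${aspect}`);
  const w = Number(match[1]);
  const h = Number(match[2]);
  if (w === 0 || h === 0) throw new Error(`aspect terms must be non-zero: ${aspect}`);
  return [w, h];
}

// True when width / height equals the aspect exactly.
// Cross-multiplying avoids floating-point division.
function matchesAspect(width: number, height: number, aspect: string): boolean {
  const [w, h] = parseAspect(aspect);
  return width * h === height * w;
}
```

For instance, a 1920 by 1080 render satisfies `16:9` because 1920 × 9 and 1080 × 16 both equal 17280.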
