Introduce non-web media surfaces (image, video, audio) as first-class project kinds. The unifying contract is "skill workflow + project metadata tell the agent WHAT to make; one shell command (`od media generate`) is HOW bytes are produced", so any code-agent CLI with shell access can drive it without bespoke tools.

- Frontend: the New Project panel gains Image/Video/Audio tabs with a model picker, aspect/length/duration controls, and audio kind/voice selection. The Examples and Design Systems tabs gain layered sections. FileViewer renders the generated image/video/audio files.
- Shared registry: `src/media/models.ts` is the single source of truth for image/video/audio model IDs, aspects, and defaults, consumed by the picker AND the daemon dispatcher.
- Prompts: `media-contract.ts` is pinned LAST in the system prompt for media surfaces, so its hard rules (call `od media generate`, don't emit binary in `<artifact>`, use only allowed model IDs) win over softer earlier wording.
- Daemon: a new `media.js` dispatcher plus `media-models.js`, a JSON view of the registry; `cli.js` gets the `od media generate` subcommand, wired up via `server.js` / `projects.js` so the daemon writes files back into the project dir.
- Skills: `audio-jingle`, `image-poster`, and `video-shortform` seed examples for the three surfaces.

Made-with: Cursor
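The shared-registry bullet can be pictured with a minimal TypeScript sketch. It assumes a flat list of model entries that both the New Project picker and the daemon dispatcher read. The `MediaModel` shape, `MEDIA_MODELS`, `modelsFor`, and `isAllowed` are hypothetical names. Apart from `gpt-image-2` and the `1:1` aspect (both mentioned in the skill below), every ID and aspect is a placeholder, not the real contents of `src/media/models.ts`.

```typescript
// Hypothetical sketch of a shared media-model registry; not the real src/media/models.ts.
type Surface = "image" | "video" | "audio";

interface MediaModel {
  id: string;            // value passed to `od media generate --model`
  surface: Surface;      // which project kind may use this model
  aspects: string[];     // allowed --aspect values (empty for audio)
  defaultAspect?: string;
}

// Only "gpt-image-2" and "1:1" come from the skill text; everything else is a placeholder.
const MEDIA_MODELS: MediaModel[] = [
  { id: "gpt-image-2", surface: "image", aspects: ["1:1", "16:9", "9:16"], defaultAspect: "1:1" },
  { id: "example-video-model", surface: "video", aspects: ["16:9", "9:16"], defaultAspect: "16:9" },
  { id: "example-audio-model", surface: "audio", aspects: [] },
];

// The picker lists these entries for the active tab.
function modelsFor(surface: Surface): MediaModel[] {
  return MEDIA_MODELS.filter((m) => m.surface === surface);
}

// The dispatcher validates against the same data. An omitted aspect is accepted;
// a supplied aspect must be listed for that model.
function isAllowed(surface: Surface, modelId: string, aspect?: string): boolean {
  return MEDIA_MODELS.some(
    (m) =>
      m.id === modelId &&
      m.surface === surface &&
      (aspect === undefined || m.aspects.indexOf(aspect) !== -1)
  );
}
```

Because one table feeds both sides, `isAllowed("image", "gpt-image-2", "1:1")` is true and `isAllowed("video", "gpt-image-2")` is false. The picker and the dispatcher therefore cannot drift apart on which IDs are valid.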
| name | description | triggers | od |
|---|---|---|---|
| image-poster | Single-image generation skill for posters, key art, and editorial illustrations. Defaults to gpt-image-2 but is provider-agnostic: the same workflow drives Flux, Imagen, or Midjourney via the active upstream tooling. Output is one or more PNG/JPEG files saved to the project folder. | | |
Image Poster Skill
Produce one finished image asset per turn unless the user asks for variations. Image generation rewards a tight, structured prompt — your job is to assemble that prompt from the user's brief, then dispatch.
Resource map
```text
image-poster/
├── SKILL.md ← you're reading this
└── example.html ← what the resulting card looks like in Examples
```
Workflow
Step 0 — Read the project metadata
The active project carries `imageModel`, `imageAspect`, and optional
`imageStyle` notes. Use them as the upstream model, canvas, and style
anchor; only ask the user to fill in a field if it is marked `(unknown — ask)`.
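Here is a minimal sketch of the Step 0 check. It assumes the metadata arrives as a plain object and that unresolved fields hold a string starting with `(unknown`. The `ImageProjectMetadata` interface and the `isUnknown` and `fieldsToAsk` helpers are hypothetical; only the three field names come from the skill.

```typescript
// Hypothetical shape of the active project's image metadata. Field names come from
// the skill; the types and the "(unknown" prefix check are assumptions.
interface ImageProjectMetadata {
  imageModel: string;
  imageAspect: string;
  imageStyle?: string; // optional style notes
}

// A field counts as unresolved when it still carries the "(unknown ..." placeholder.
function isUnknown(value: string | undefined): boolean {
  return value !== undefined && value.trim().indexOf("(unknown") === 0;
}

// Return the names of fields still marked unknown; only these need a question to the user.
function fieldsToAsk(meta: ImageProjectMetadata): string[] {
  const missing: string[] = [];
  if (isUnknown(meta.imageModel)) missing.push("imageModel");
  if (isUnknown(meta.imageAspect)) missing.push("imageAspect");
  if (isUnknown(meta.imageStyle)) missing.push("imageStyle");
  return missing;
}
```

For example, `fieldsToAsk({ imageModel: "gpt-image-2", imageAspect: "1:1" })` returns an empty list, so the agent can move straight to Step 1 without questioning the user.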
Step 1 — Compose the prompt
Plan in this exact order before calling any tool:
- Subject + composition — what is in the frame, where, at what scale; eye-line and crop.
- Lighting + mood — natural / studio / moody; warm / cool; key plus rim plus fill; time of day if outdoor.
- Palette + textures — hex anchors when the user gave a brand palette; otherwise a 3-word mood tag (e.g. "muted ochre + ink").
- Camera / lens — only if the user wants photographic realism ("85mm portrait, shallow DOF") or a specific film stock.
- What to avoid — common AI-slop patterns ("no extra fingers, no warped text, no logo placeholders").
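A small TypeScript sketch can make the fixed planning order mechanical. The `PromptPlan` fields mirror the five bullets above. The hypothetical `composePrompt` helper joins them as sentences and prefixes the negative cues with `Avoid:`. That prefix is an assumed plain-text convention, not something a particular provider requires.

```typescript
// Hypothetical plan structure mirroring the five planning bullets, in the same order.
interface PromptPlan {
  subject: string;   // subject + composition
  lighting: string;  // lighting + mood
  palette: string;   // hex anchors or a 3-word mood tag
  camera?: string;   // only for photographic realism or a specific film stock
  avoid: string[];   // negative cues, e.g. "no warped text"
}

// Join the parts in the fixed order as sentences, skipping the optional camera part
// when absent and dropping any trailing periods so sentences are not doubled.
function composePrompt(plan: PromptPlan): string {
  const parts = [plan.subject, plan.lighting, plan.palette];
  if (plan.camera) parts.push(plan.camera);
  if (plan.avoid.length > 0) parts.push("Avoid: " + plan.avoid.join(", "));
  return (
    parts
      .map((p) => p.trim().replace(/\.+$/, ""))
      .filter((p) => p.length > 0)
      .join(". ") + "."
  );
}
```

Keeping the order in code means a missing optional part (here `camera`) drops out cleanly without reshuffling the rest.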
Step 2 — Dispatch via the media contract
Use the unified dispatcher; do not call upstream provider APIs by hand. Run it from your shell tool:
```bash
node "$OD_BIN" media generate \
  --project "$OD_PROJECT_ID" \
  --surface image \
  --model "<imageModel from metadata>" \
  --aspect "<imageAspect from metadata>" \
  --output "<short-descriptive-name>.png" \
  --prompt "<the full assembled prompt from Step 1>"
```
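An agent may drive the dispatcher from code rather than from a shell line. In that case, a sketch like the following keeps the flags above intact. It assumes the caller has already resolved the metadata values. `GenerateRequest`, `toOutputName`, and `buildGenerateArgs` are hypothetical helpers. The flag names are copied from the command above, and no other flags are assumed to exist.

```typescript
// Hypothetical request shape holding already-resolved metadata values.
interface GenerateRequest {
  projectId: string;
  model: string;
  aspect: string;
  output: string;
  prompt: string;
}

// Turn a short description into a file name such as "hero-poster.png".
function toOutputName(description: string, ext = "png"): string {
  const slug = description
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
  return (slug || "image") + "." + ext;
}

// Build the arguments that follow `node "$OD_BIN"`. Each value is its own argv
// element, so the prompt needs no shell quoting.
function buildGenerateArgs(req: GenerateRequest): string[] {
  return [
    "media", "generate",
    "--project", req.projectId,
    "--surface", "image",
    "--model", req.model,
    "--aspect", req.aspect,
    "--output", req.output,
    "--prompt", req.prompt,
  ];
}
```

Passing the result to Node's `child_process.execFile("node", [odBin, ...args])` avoids a shell, so quotes or `$` characters inside the prompt reach the dispatcher unchanged. Here `odBin` stands for the value of `$OD_BIN`.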
The command prints one line of JSON: `{"file": {"name": "...", ...}}`.
The daemon writes the bytes into the project folder, and the FileViewer
picks it up automatically.
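Reading the dispatcher's reply can be sketched as follows. The sketch assumes stdout holds exactly the one JSON line described above. `parseGenerateOutput` is a hypothetical helper that relies only on `file.name` and ignores any other fields.

```typescript
// Hypothetical parser for the single JSON line the dispatcher prints.
// Only `file.name` is relied on; other fields under `file` are ignored.
function parseGenerateOutput(stdout: string): string {
  // JSON.parse throws a SyntaxError on malformed output.
  const parsed: any = JSON.parse(stdout.trim());
  const name = parsed && parsed.file && parsed.file.name;
  if (typeof name !== "string" || name.length === 0) {
    throw new Error("dispatcher output has no file.name");
  }
  return name;
}
```

The returned name is what Step 3 reports back to the user. A thrown error should be treated as a failed render, not described as if a file exists.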
Step 3 — Hand off
Reply with a one-paragraph summary of the prompt you used and the
filename returned by the dispatcher (e.g. "I generated hero-poster.png
with gpt-image-2 at 1:1."). Do not emit an `<artifact>` tag.
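Here is a minimal sketch of the Step 3 reply. It assumes the file name, model, aspect, and prompt are already in hand. `handOffSummary` is a hypothetical helper. It follows the example sentence above and refuses to return anything containing an `<artifact` tag.

```typescript
// Hypothetical formatter for the Step 3 reply: one paragraph naming the file,
// model, and aspect, followed by the prompt that was sent.
function handOffSummary(fileName: string, model: string, aspect: string, prompt: string): string {
  // Collapse newlines in the prompt so the reply stays a single paragraph.
  const flatPrompt = prompt.replace(/\s*\n\s*/g, " ").trim();
  const reply = `I generated ${fileName} with ${model} at ${aspect}. Prompt: ${flatPrompt}`;
  // The skill forbids <artifact> tags in the hand-off; fail loudly if one slipped in.
  if (reply.indexOf("<artifact") !== -1) {
    throw new Error("hand-off must not contain an <artifact> tag");
  }
  return reply;
}
```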
Hard rules
- One image per turn unless asked for variations.
- Honor `imageAspect` exactly. The upstream cost is the same either way, and matching the aspect avoids a re-render.
- No filler typography in the image itself unless the user asked for in-frame text. Real copy beats lorem.
- Save every render — never describe an image without producing the file. The user expects something to open in the file viewer.
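Three of the hard rules can be checked mechanically: render count, aspect, and whether each render was saved. They fit in a short pre-reply check. The `Render` shape and `hardRuleViolations` are hypothetical. The typography rule is a judgment about image content and is left out.

```typescript
// Hypothetical record of one render attempt in the current turn.
interface Render {
  file?: string;  // file name returned by the dispatcher; undefined if nothing was saved
  aspect: string; // aspect the render was requested at
}

// Return one message per broken rule; an empty array means these checks pass.
function hardRuleViolations(renders: Render[], metadataAspect: string, variationsRequested: boolean): string[] {
  const problems: string[] = [];
  if (renders.length === 0) problems.push("no render produced");
  if (renders.length > 1 && !variationsRequested) {
    problems.push("more than one image without a variations request");
  }
  renders.forEach((r, i) => {
    if (r.aspect !== metadataAspect) problems.push(`render ${i} uses ${r.aspect}, expected ${metadataAspect}`);
    if (!r.file) problems.push(`render ${i} was not saved to a file`);
  });
  return problems;
}
```

An empty result means the turn satisfies these three rules. Each returned message names the rule that would be broken.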