
pftom ac70719d4d feat(media): add image / video / audio surfaces with unified od media generate dispatcher

Extends Open Design from web-only to a multi-modal creation tool. The
unifying contract is one code-agent loop driven by skills + project
metadata + prompt constraints; for non-web surfaces the agent shells
out to a single dispatcher (`od media generate`) that the daemon
routes per (surface, model).

- Types: new Surface union, MediaAspect / AudioKind, image/video/audio
  ProjectKind + ProjectMetadata fields, video/audio ProjectFileKind.
- NewProjectPanel: top-level surface picker + Image / Video / Audio
  forms with model, aspect, length, duration, voice, audio-kind pickers.
- ExamplesTab + DesignSystemsTab: surface filter row that scopes
  before mode / scenario / category filters.
- FileViewer / FileWorkspace: native <video> and <audio> previews and
  matching tab icons.
- Daemon: parses `od.surface` and `> Surface:` blockquotes; recognises
  mp4 / webm / mov / mp3 / wav / ogg / m4a / flac extensions; spawns
  agents with OD_BIN / OD_DAEMON_URL / OD_PROJECT_ID / OD_PROJECT_DIR
  env so any code-agent CLI with shell access can call the dispatcher.
- daemon/media.js + daemon/media-models.js: surface-agnostic dispatcher
  with stub providers that emit deterministic placeholder bytes
  (1x1 PNG, valid mp4 ftyp, mp3 frame / silent WAV) so the framework
  works without API keys; real provider integrations slot in later.
- daemon/cli.js: `od media generate --surface ... --model ...`
  subcommand routes to POST /api/projects/:id/media/generate and
  prints one JSON line for the agent to parse.
- prompts/media-contract.ts: hard contract pinned LAST in the system
  prompt for image/video/audio surfaces — env vars, exact invocation,
  registered model IDs per surface, six workflow rules. system.ts
  metadata block updated to point at the contract.
- Seed skills: image-poster, video-shortform, audio-jingle each ship a
  SKILL.md with `mode/surface: image|video|audio` and a stylized
  example.html preview, and instruct the agent to dispatch via the
  contract.

Made-with: Cursor

2026-04-28 22:40:58 +08:00

3.3 KiB


name, description, triggers, od

Field	Value
name	video-shortform
description	Short-form video generation skill: 3-10 second clips for product reveals, motion teasers, ambient loops. Defaults to Seedance 2 but works the same with Kling 3 / 4, Veo 3 or Sora 2. Output is one MP4 saved to the project folder. When the workspace also ships an interactive-video / hyperframes skill, prefer composing several short shots into a single timeline rather than one long monolithic clip.
triggers	video, clip, shortform, reel, 短视频 (short video), 动效 (motion effects)

od	Value
mode / surface	video
scenario	marketing
preview	type: html; entry: example.html
design_system	requires: false
example_prompt	5-second product reveal: ceramic coffee mug rotating on a soft paper backdrop, warm side-light from camera-left, micro dust motes drifting through the beam. Cinematic, 16:9, slow drift on the camera.

Video Shortform Skill

Short-form (≤ 10s) is the sweet spot for current text-to-video models — they're great at one shot with one idea, weaker at multi-cut narratives. Plan one shot per call.

Resource map

video-shortform/
├── SKILL.md
└── example.html

Workflow

Step 0 — Read the project metadata

Read videoModel, videoLength (seconds), and videoAspect. These are hard locks: clamp the request to whatever the chosen model supports (Seedance 2 caps at 10s; Kling 4 supports up to 10s plus image-to-video; Veo 3 supports 8s with audio).
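
The clamp described in Step 0 could be sketched roughly as follows. This is a minimal illustration, not the project's code: the model keys (`seedance-2`, `kling-4`, `veo-3`) and the `clampVideoLength` helper are hypothetical, and only the caps themselves come from the text above. Use the model IDs registered in the media contract in practice.

```javascript
// Hypothetical per-model caps, taken from the limits quoted in Step 0.
// The keys are illustrative labels, not registered model IDs.
const MAX_SECONDS = new Map([
  ["seedance-2", 10],
  ["kling-4", 10],
  ["veo-3", 8],
]);

// Return the requested length, reduced to the model's cap when it exceeds it.
// Unknown models and non-positive lengths are rejected rather than guessed at.
function clampVideoLength(videoModel, videoLength) {
  const cap = MAX_SECONDS.get(videoModel);
  if (cap === undefined) {
    throw new Error(`No length cap known for model: ${videoModel}`);
  }
  if (!Number.isFinite(videoLength) || videoLength <= 0) {
    throw new Error(`videoLength must be a positive number, got: ${videoLength}`);
  }
  return Math.min(videoLength, cap);
}
```

For example, a 10-second request against the 8-second cap comes back as 8, while a 5-second request passes through unchanged.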

Step 1 — Plan the shot

Write the shotlist BEFORE calling the model:

Slot	Content
Subject	What's in frame?
Camera	Static / pan / push-in / orbit?
Lighting	Key direction + temperature
Motion	What moves, at what pace? Subject motion vs camera motion.
Sound	Ambient bed? (only if the model supports audio)

Show this to the user as a one-sentence plan before dispatching — they can redirect cheaply.
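
One way to turn the shot list into the one-sentence plan is sketched below. The object shape (keys named after the table's slots) and the `planSentence` helper are assumptions made for illustration; the skill itself does not prescribe a data structure.

```javascript
// Slots that every shot list must fill; "sound" is optional and only
// included when the chosen model supports audio.
const REQUIRED_SLOTS = ["subject", "camera", "lighting", "motion"];

// Format a shot list (hypothetical shape: one string per slot) as a single
// sentence that can be shown to the user before dispatching.
function planSentence(shot) {
  const missing = REQUIRED_SLOTS.filter((slot) => !shot[slot]);
  if (missing.length > 0) {
    throw new Error(`Shot list is missing: ${missing.join(", ")}`);
  }
  const slots = shot.sound ? [...REQUIRED_SLOTS, "sound"] : REQUIRED_SLOTS;
  return slots.map((slot) => `${slot}: ${shot[slot]}`).join("; ") + ".";
}
```

Rejecting an incomplete shot list up front keeps a half-planned shot from reaching the model.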

Step 2 — Compose the prompt

Use the prompt format the upstream model prefers (Seedance: motion + camera + mood; Kling: subject + camera + style; Veo: subject + cinematography + sound). Pass the project's videoAspect and videoLength through the dispatcher's --aspect and --length flags; never write them into the prompt prose.
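
The per-model ordering in Step 2 can be illustrated with a small sketch. The family labels (`seedance`, `kling`, `veo`), the `slots` object, and the comma-joined output are assumptions; only the slot order per model is taken from the text above.

```javascript
// Slot order per model family, as listed in Step 2.
// The family keys are illustrative labels, not registered model IDs.
const PROMPT_ORDER = new Map([
  ["seedance", ["motion", "camera", "mood"]],
  ["kling", ["subject", "camera", "style"]],
  ["veo", ["subject", "cinematography", "sound"]],
]);

// Assemble the prompt prose in the order the model family prefers.
// Aspect and length are deliberately absent: they travel as flags, not prose.
function composePrompt(family, slots) {
  const order = PROMPT_ORDER.get(family);
  if (!order) {
    throw new Error(`No prompt format known for: ${family}`);
  }
  return order
    .map((name) => slots[name])
    .filter((text) => typeof text === "string" && text.trim() !== "")
    .map((text) => text.trim())
    .join(", ");
}
```

Slots that a family does not use are ignored, so the same shot list can feed any of the three formats.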

Step 3 — Dispatch via the media contract

Use the unified dispatcher — do not call provider APIs by hand:

node "$OD_BIN" media generate \
  --project "$OD_PROJECT_ID" \
  --surface video \
  --model "<videoModel from metadata>" \
  --aspect "<videoAspect from metadata>" \
  --length <videoLength seconds> \
  --output "<short-slug>-<seconds>s.mp4" \
  --prompt "<assembled shot prompt from Step 2>"

The command prints one line of JSON: {"file": {"name": "...", ...}}. The file is written into the project folder, and the FileViewer plays it automatically.
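
For an agent that drives the dispatcher from Node rather than a shell, the invocation and the JSON line could be handled along these lines. The flag names come from the command above; the `meta` object shape and the helper names (`outputName`, `buildGenerateArgs`, `parseGeneratedFile`) are hypothetical, and only the `file.name` field of the output is assumed.

```javascript
// Build the "<short-slug>-<seconds>s.mp4" output name used in Step 3.
function outputName(slug, seconds) {
  return `${slug}-${seconds}s.mp4`;
}

// Build the argument list for `node "$OD_BIN" media generate ...`.
// `env` is expected to carry OD_BIN and OD_PROJECT_ID; `meta` is an assumed
// object holding videoModel, videoAspect and videoLength.
function buildGenerateArgs(env, meta, slug, prompt) {
  return [
    env.OD_BIN,
    "media", "generate",
    "--project", env.OD_PROJECT_ID,
    "--surface", "video",
    "--model", meta.videoModel,
    "--aspect", meta.videoAspect,
    "--length", String(meta.videoLength),
    "--output", outputName(slug, meta.videoLength),
    "--prompt", prompt,
  ];
}

// Parse the dispatcher's output (last non-empty line) and return file.name.
function parseGeneratedFile(stdout) {
  const line = stdout.trim().split("\n").filter(Boolean).pop();
  const parsed = JSON.parse(line);
  if (!parsed.file || typeof parsed.file.name !== "string") {
    throw new Error("Dispatcher output has no file.name");
  }
  return parsed.file.name;
}
```

With Node's built-in child_process module, the array can be passed as `execFileSync(process.execPath, args, { encoding: "utf8" })`, which avoids shell quoting of the prompt text.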

Step 4 — Hand off

Reply with: shot summary, the filename returned by the dispatcher, and one sentence on what to try if the user wants a variation.

Hard rules

One shot per turn. Multi-shot timelines belong in a hyperframes / interactive-video skill, not here.
Match videoAspect exactly — re-renders are slow.
Never ship a video without saving the file — the user expects something to play in the file viewer.
When the underlying model fails (NSFW filter, content policy, timeout), report the error verbatim. Don't silently retry.
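
The last rule (report verbatim, no silent retry) might look like this in a Node wrapper. It is a sketch under the assumption that the dispatcher signals failure with a non-zero exit code and writes the provider error to stderr; the source does not specify that, and `runOnce` is a hypothetical helper.

```javascript
const { spawnSync } = require("node:child_process");

// Run a command exactly once. On failure, return its stderr unchanged so the
// caller can relay it to the user. There is deliberately no retry loop.
function runOnce(file, args) {
  const result = spawnSync(file, args, { encoding: "utf8" });
  if (result.error) {
    // The process could not be started at all (for example, file not found).
    return { ok: false, message: result.error.message };
  }
  if (result.status !== 0) {
    return { ok: false, message: result.stderr };
  }
  return { ok: true, stdout: result.stdout };
}
```

The caller decides what to tell the user; the wrapper only guarantees that the error text is not rewritten or swallowed.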
