feat(media): add image / video / audio surfaces with unified od media generate dispatcher

Extends Open Design from web-only to a multi-modal creation tool. The
unifying contract is one code-agent loop driven by skills, project
metadata and prompt constraints; for non-web surfaces the agent shells
out to a single dispatcher (`od media generate`), which the daemon
routes to a provider by (surface, model).
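A minimal sketch of what routing by (surface, model) can look like, assuming a simple in-memory provider table. `registerProvider`, `dispatch`, the `stub-image` / `stub-audio` model IDs and the provider shape are hypothetical illustrations, not the actual daemon/media.js API:

```javascript
// Hypothetical sketch: a provider table keyed by (surface, model).
// Surface names come from the commit; model IDs and the provider shape
// ({ generate(request) -> result }) are assumptions for illustration.
const providers = new Map();

function routeKey(surface, model) {
  return `${surface}::${model}`;
}

function registerProvider(surface, model, provider) {
  providers.set(routeKey(surface, model), provider);
}

function dispatch({ surface, model, prompt }) {
  const provider = providers.get(routeKey(surface, model));
  if (!provider) {
    // Unknown pairs fail loudly instead of silently falling back.
    throw new Error(`no provider registered for surface=${surface} model=${model}`);
  }
  return provider.generate({ surface, model, prompt });
}

// Keyless stub providers: deterministic output, no network calls.
registerProvider('image', 'stub-image', {
  generate: ({ prompt }) => ({ ext: '.png', note: `placeholder for: ${prompt}` }),
});
registerProvider('audio', 'stub-audio', {
  generate: ({ prompt }) => ({ ext: '.wav', note: `placeholder for: ${prompt}` }),
});

const result = dispatch({ surface: 'image', model: 'stub-image', prompt: 'poster' });
console.log(result.ext); // .png
```

Keying on the pair rather than on the model alone lets the same model ID mean different things on different surfaces, and adding a real provider becomes one more `registerProvider` call.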

- Types: new Surface union, MediaAspect / AudioKind, image/video/audio
  ProjectKind + ProjectMetadata fields, video/audio ProjectFileKind.
- NewProjectPanel: top-level surface picker + Image / Video / Audio
  forms with model, aspect, length, duration, voice, audio-kind pickers.
- ExamplesTab + DesignSystemsTab: a surface filter row, applied before
  the mode / scenario / category filters.
- FileViewer / FileWorkspace: native <video> and <audio> previews and
  matching tab icons.
- Daemon: parses `od.surface` and `> Surface:` blockquotes; recognises
  mp4 / m4v / webm / mov and mp3 / wav / ogg / oga / m4a / flac / aac
  extensions; spawns agents with OD_BIN / OD_DAEMON_URL / OD_PROJECT_ID /
  OD_PROJECT_DIR in the environment so any code-agent CLI with shell
  access can call the dispatcher.
- daemon/media.js + daemon/media-models.js: surface-agnostic dispatcher
  with stub providers that emit deterministic placeholder bytes (a 1x1
  PNG, an mp4 with a valid ftyp box, an mp3 frame or a silent WAV) so the
  framework works without API keys; real provider integrations slot in later.
- daemon/cli.js: `od media generate --surface ... --model ...`
  subcommand routes to POST /api/projects/:id/media/generate and
  prints one JSON line for the agent to parse.
- prompts/media-contract.ts: hard contract pinned LAST in the system
  prompt for image/video/audio surfaces — env vars, exact invocation,
  registered model IDs per surface, six workflow rules. system.ts
  metadata block updated to point at the contract.
- Seed skills: image-poster, video-shortform, audio-jingle each ship a
  SKILL.md with `mode/surface: image|video|audio` and a stylized
  example.html preview, and instruct the agent to dispatch via the
  contract.
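The keyless stub providers above can be illustrated with the silent-WAV case. This is a sketch assuming 16-bit mono PCM at 8 kHz; `silentWav` and its defaults are hypothetical, not the commit's actual stub. It builds the standard 44-byte RIFF/WAVE header followed by zeroed samples:

```javascript
// Hypothetical sketch of a keyless stub audio provider: a short, silent,
// valid PCM WAV so file writes, MIME lookup and the <audio> preview can be
// exercised without a real model. Defaults are assumptions.
function silentWav({ sampleRate = 8000, seconds = 1, channels = 1 } = {}) {
  const bitsPerSample = 16;
  const blockAlign = channels * (bitsPerSample / 8);
  const byteRate = sampleRate * blockAlign;
  const dataSize = Math.round(sampleRate * seconds) * blockAlign;
  // Buffer.alloc zero-fills, and 0 is silence for signed 16-bit PCM.
  const buf = Buffer.alloc(44 + dataSize);

  buf.write('RIFF', 0);
  buf.writeUInt32LE(36 + dataSize, 4); // RIFF chunk size (file size - 8)
  buf.write('WAVE', 8);
  buf.write('fmt ', 12);
  buf.writeUInt32LE(16, 16); // fmt chunk size for PCM
  buf.writeUInt16LE(1, 20); // audio format 1 = PCM
  buf.writeUInt16LE(channels, 22);
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(byteRate, 28);
  buf.writeUInt16LE(blockAlign, 32);
  buf.writeUInt16LE(bitsPerSample, 34);
  buf.write('data', 36);
  buf.writeUInt32LE(dataSize, 40); // size of the sample data
  return buf;
}

const wav = silentWav();
console.log(wav.length); // 16044
```

Because the output depends only on the parameters, repeated calls produce byte-identical files, which keeps tests and previews reproducible.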
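Since `od media generate` prints one JSON line for the agent, a wrapper on the agent side only needs to parse that line. A minimal sketch follows; `parseDispatcherOutput` and the `path` field are hypothetical, since the real output schema is not shown here:

```javascript
// Hypothetical sketch: consume the dispatcher's stdout, assuming it ends
// with exactly one JSON line. The `path` field is an assumed example.
function parseDispatcherOutput(stdout) {
  const lines = stdout
    .split(/\r?\n/)
    .map((l) => l.trim())
    .filter(Boolean);
  if (lines.length === 0) throw new Error('dispatcher printed nothing');
  // Use the last non-empty line so stray log output printed earlier on
  // stdout does not break parsing.
  const last = lines[lines.length - 1];
  try {
    return JSON.parse(last);
  } catch (err) {
    throw new Error(`expected one JSON line, got: ${last} (${err.message})`);
  }
}

const out = parseDispatcherOutput('{"path":"media/poster.png"}\n');
console.log(out.path); // media/poster.png
```

Reading only the final line is a defensive choice: the contract promises a single line, but tolerating leading noise costs nothing.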

Made-with: Cursor
pftom
2026-04-28 22:40:58 +08:00
parent bc7c057216
commit ac70719d4d
28 changed files with 2902 additions and 78 deletions
+19
@@ -156,6 +156,21 @@ const EXT_MIME = {
  '.gif': 'image/gif',
  '.webp': 'image/webp',
  '.avif': 'image/avif',
  // Video: the container formats most generators emit, previewed natively
  // via <video> in the FileViewer (no transcode; .mov support varies by browser).
  '.mp4': 'video/mp4',
  '.m4v': 'video/mp4',
  '.webm': 'video/webm',
  '.mov': 'video/quicktime',
  // Audio: music / TTS generators commonly produce mp3 / wav / ogg /
  // m4a; flac is rarer but cheap to support.
  '.mp3': 'audio/mpeg',
  '.wav': 'audio/wav',
  '.ogg': 'audio/ogg',
  '.oga': 'audio/ogg',
  '.m4a': 'audio/mp4',
  '.flac': 'audio/flac',
  '.aac': 'audio/aac',
};
export function mimeFor(name) {
@@ -175,6 +190,10 @@ export function kindFor(name) {
    if (name.startsWith('sketch-')) return 'sketch';
    return 'image';
  }
  if (['.mp4', '.m4v', '.webm', '.mov'].includes(ext)) return 'video';
  if (['.mp3', '.wav', '.ogg', '.oga', '.m4a', '.flac', '.aac'].includes(ext)) {
    return 'audio';
  }
  if (['.md', '.txt'].includes(ext)) return 'text';
  if (['.js', '.mjs', '.cjs', '.ts', '.tsx', '.json', '.css'].includes(ext)) {
    return 'code';