feat(media): add image / video / audio surfaces with unified od media generate dispatcher
Extends Open Design from web-only to a multi-modal creation tool. The unifying contract is one code-agent loop driven by skills + project metadata + prompt constraints; for non-web surfaces the agent shells out to a single dispatcher (`od media generate`) that the daemon routes per (surface, model).

- Types: new Surface union, MediaAspect / AudioKind, image/video/audio ProjectKind + ProjectMetadata fields, video/audio ProjectFileKind.
- NewProjectPanel: top-level surface picker + Image / Video / Audio forms with model, aspect, length, duration, voice, audio-kind pickers.
- ExamplesTab + DesignSystemsTab: surface filter row that scopes before mode / scenario / category filters.
- FileViewer / FileWorkspace: native <video> and <audio> previews and matching tab icons.
- Daemon: parses `od.surface` and `> Surface:` blockquotes; recognises mp4 / webm / mov / mp3 / wav / ogg / m4a / flac extensions; spawns agents with OD_BIN / OD_DAEMON_URL / OD_PROJECT_ID / OD_PROJECT_DIR env so any code-agent CLI with shell access can call the dispatcher.
- daemon/media.js + daemon/media-models.js: surface-agnostic dispatcher with stub providers that emit deterministic placeholder bytes (1x1 PNG, valid mp4 ftyp, mp3 frame / silent WAV) so the framework works without API keys; real provider integrations slot in later.
- daemon/cli.js: `od media generate --surface ... --model ...` subcommand routes to POST /api/projects/:id/media/generate and prints one JSON line for the agent to parse.
- prompts/media-contract.ts: hard contract pinned LAST in the system prompt for image/video/audio surfaces: env vars, exact invocation, registered model IDs per surface, six workflow rules. system.ts metadata block updated to point at the contract.
- Seed skills: image-poster, video-shortform, audio-jingle each ship a SKILL.md with `mode/surface: image|video|audio` and a stylized example.html preview, and instruct the agent to dispatch via the contract.

Made-with: Cursor
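The agent-side half of the dispatcher contract described above could be sketched roughly as follows. This is a minimal illustration, not the code shipped in this commit: the helper names `buildMediaArgs`, `parseDispatcherOutput` and `generateMedia` are hypothetical, and it assumes `OD_BIN` points at something directly executable. Only the `media generate` subcommand, the `--surface` / `--model` flags, the `OD_*` environment variables and the "one JSON line" output come from the commit message; the shape of that JSON is not specified there, so the sketch returns it unmodified.

```javascript
const { execFileSync } = require("node:child_process");

// Build the argv for `od media generate`. Only --surface and --model are
// named in the commit message; any further flags are left out on purpose.
function buildMediaArgs({ surface, model }) {
  return ["media", "generate", "--surface", surface, "--model", model];
}

// The CLI is described as printing one JSON line. To tolerate stray log
// output (an assumption), parse the last non-empty line of stdout.
function parseDispatcherOutput(stdout) {
  const lines = stdout
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean);
  if (lines.length === 0) {
    throw new Error("dispatcher printed no output");
  }
  return JSON.parse(lines[lines.length - 1]);
}

// Hypothetical wrapper: run the dispatcher with the env the daemon injected
// (OD_BIN, OD_DAEMON_URL, OD_PROJECT_ID, OD_PROJECT_DIR) and return the
// parsed result object as-is.
function generateMedia(request, env = process.env) {
  const bin = env.OD_BIN;
  if (!bin) {
    throw new Error("OD_BIN is not set; was this process spawned by the daemon?");
  }
  const stdout = execFileSync(bin, buildMediaArgs(request), {
    env,
    encoding: "utf8",
  });
  return parseDispatcherOutput(stdout);
}
```

Keeping argument construction and output parsing separate from the process spawn means both can be checked without a running daemon, which fits the commit's stated goal of a framework that works without API keys.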
@@ -25,12 +25,16 @@ export async function listSkills(skillsRoot) {
     const { data, body } = parseFrontmatter(raw);
     const hasAttachments = await dirHasAttachments(dir);
     const mode = data.od?.mode || inferMode(body, data.description);
+    const surface = normalizeSurface(data.od?.surface, mode);
     out.push({
       id: data.name || entry.name,
       name: data.name || entry.name,
       description: data.description || "",
       triggers: Array.isArray(data.triggers) ? data.triggers : [],
       mode,
+      // Surface defaults to inferring from `mode` so legacy SKILL.md
+      // files (no `od.surface` declared) keep classifying correctly.
+      surface,
       platform: normalizePlatform(
         data.od?.platform,
         mode,
@@ -159,6 +163,20 @@ function inferMode(body, description) {
   return "prototype";
 }
 
+// Surface is the high-level output bucket — web, image, video or audio.
+// Authors can pin it via `od.surface`; otherwise we derive from `mode`,
+// then fall back to the safe default ('web') so existing skills classify
+// unchanged.
+const KNOWN_SURFACES = new Set(["web", "image", "video", "audio"]);
+function normalizeSurface(value, mode) {
+  if (typeof value === "string") {
+    const v = value.trim().toLowerCase();
+    if (KNOWN_SURFACES.has(v)) return v;
+  }
+  if (mode === "image" || mode === "video" || mode === "audio") return mode;
+  return "web";
+}
+
 // Validate platform tag — only desktop / mobile are meaningful for the
 // Examples gallery. Falls back to autodetecting "mobile" from descriptions
 // so legacy skills sort under the right pill without authoring changes.
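
To make the fallback order in `normalizeSurface` concrete, the function from the hunk above is reproduced here with a few sample calls. The function body is taken from the diff; the sample inputs (such as the `"hologram"` surface and the `"deck"` mode) are made up for illustration and do not come from the repository.

```javascript
const KNOWN_SURFACES = new Set(["web", "image", "video", "audio"]);

function normalizeSurface(value, mode) {
  // 1. An explicit, recognised `od.surface` wins (case and whitespace tolerant).
  if (typeof value === "string") {
    const v = value.trim().toLowerCase();
    if (KNOWN_SURFACES.has(v)) return v;
  }
  // 2. Otherwise a media mode doubles as the surface.
  if (mode === "image" || mode === "video" || mode === "audio") return mode;
  // 3. Everything else is treated as a web skill.
  return "web";
}

// Illustrative inputs only: [od.surface value, mode]
const samples = [
  [" Video ", "prototype"], // pinned surface, normalised  -> "video"
  [undefined, "audio"],     // legacy skill, media mode    -> "audio"
  ["hologram", "image"],    // unknown surface, media mode -> "image"
  [undefined, "prototype"], // legacy web skill            -> "web"
  [42, "deck"],             // non-string surface          -> "web"
];

for (const [value, mode] of samples) {
  console.log(JSON.stringify(value), mode, "=>", normalizeSurface(value, mode));
}
```

Note that an unrecognised `od.surface` value is not an error: it is silently ignored and classification falls through to `mode`, which is what keeps existing SKILL.md files working unchanged.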