Add initial project structure with essential files

- Created .gitignore to exclude build artifacts and dependencies.
- Added index.html as the main entry point for the application.
- Included LICENSE file with Apache 2.0 terms.
- Initialized package.json and package-lock.json for project dependencies.
- Added pnpm-lock.yaml for package management.
- Created QUICKSTART.md for setup instructions.
- Added README.md and README.zh-CN.md for project documentation in English and Chinese.
This commit is contained in:
pftom
2026-04-28 12:25:59 +08:00
commit a98096a042
258 changed files with 67862 additions and 0 deletions
+258
View File
@@ -0,0 +1,258 @@
# Agent Adapters
**Parent:** [`spec.md`](spec.md) · **Siblings:** [`architecture.md`](architecture.md) · [`skills-protocol.md`](skills-protocol.md) · [`modes.md`](modes.md)
The adapter layer is OCD's most load-bearing design decision. We delegate the **entire agent loop** — model calls, tool use, context management, permission handling, resume, cancel — to the user's existing code agent CLI. OCD's job is to detect it, feed it a skill + prompt + working directory, and stream its output back to the web UI.
> **Thesis:** The code agent space has already converged on a few strong implementations (Claude Code, Codex, Cursor Agent, Gemini CLI, OpenCode, OpenClaw). Reimplementing a 7th is worse than just talking to all of them.
>
> **Inspiration:** [multica](https://github.com/multica-ai/multica) (PATH-scan detection + daemon architecture) and [cc-switch](https://github.com/farion1231/cc-switch) (per-agent config format knowledge + symlink-based skill distribution).
---
## 1. Adapter interface (TypeScript)
Every adapter implements this interface. Full types in [`schemas/adapter.md`](schemas/adapter.md) (TODO).
```ts
interface AgentAdapter {
readonly id: string; // "claude-code" | "codex" | …
readonly displayName: string;
// -- discovery --------------------------------------------------
detect(): Promise<AgentDetection | null>; // null if not installed
// -- capability negotiation ------------------------------------
capabilities(): AgentCapabilities;
// -- execution -------------------------------------------------
run(params: AgentRunParams): AsyncIterable<AgentEvent>;
cancel(runId: string): Promise<void>;
resume?(runId: string, message: string): AsyncIterable<AgentEvent>;
}
interface AgentDetection {
executablePath: string; // absolute path to CLI
version: string;
configDir?: string; // e.g. ~/.claude
skillsDir?: string; // e.g. ~/.claude/skills
authState: "ok" | "missing" | "expired";
}
interface AgentCapabilities {
surgicalEdit: boolean; // can edit a targeted region without rewriting file
nativeSkillLoading: boolean; // picks up ~/.<agent>/skills/ automatically
streaming: boolean; // emits tool calls in real time
resume: boolean; // can continue an interrupted run
permissionMode: "strict" | "permissive" | "none";
contextWindowHint?: number; // in tokens
}
interface AgentRunParams {
runId: string;
cwd: string; // absolute path — artifact dir
systemPrompt: string; // skill's SKILL.md body + DESIGN.md excerpt
userPrompt: string;
skillDir?: string; // if set, adapter should make skill files available
allowedTools?: string[]; // for agents that support it
timeoutMs?: number;
}
type AgentEvent =
| { type: "thinking"; text: string }
| { type: "tool_call"; name: string; input: unknown; id: string }
| { type: "tool_result"; id: string; output: unknown }
| { type: "text_delta"; text: string }
| { type: "file_write"; path: string } // synthesized by adapter if agent doesn't emit natively
| { type: "error"; error: string }
| { type: "done"; reason: "completed" | "cancelled" | "error" };
```
## 2. Detection strategy
Run all adapters' `detect()` in parallel on daemon start, then cache results in `~/.open-claude-design/agents.json` with a 24h TTL. Re-detect on daemon `SIGHUP`.
Each adapter uses **two signals**:
1. **PATH scan.** `which <binary>` for each known executable name. Fast (<10ms).
2. **Config-dir probe.** Check for `~/.claude/`, `~/.codex/`, `~/.cursor/`, etc. This catches cases where the CLI was installed via npm global into a shell-specific PATH.
If both signals agree, detection is confident. If only one signal fires, we mark `authState: "missing"` and prompt the user to run the CLI's auth flow.
## 3. Adapter catalog (v1 target)
| Adapter | CLI command | Config dir | Skills dir | Native skill loading | Surgical edit | Streaming | Priority |
|---|---|---|---|---|---|---|---|
| **claude-code** | `claude` | `~/.claude/` | `~/.claude/skills/` | ✅ | ✅ | ✅ | P0 (MVP) |
| **api-fallback** | *(direct Anthropic API)* | — | — | ❌ (prompt-injected) | 〜 | ✅ | P0 (MVP) |
| **codex** | `codex` | `~/.codex/` | `~/.codex/skills/` | 〜 (varies by version) | 〜 (regenerate w/ scoping) | ✅ | P1 |
| **cursor-agent** | `cursor-agent` | `~/.cursor/` | n/a (via project `.cursorrules`) | ❌ (prompt-injected) | ✅ | ✅ | P1 |
| **gemini-cli** | `gemini` | `~/.config/gemini/` | ❌ | ❌ (prompt-injected) | ❌ (regenerate) | ✅ | P2 |
| **opencode** | `opencode` | `~/.opencode/` | 〜 | 〜 | ✅ | P2 |
| **openclaw** | `openclaw` | `~/.openclaw/` | 〜 | 〜 | 〜 | P2 |
"P0/P1/P2" correspond to the roadmap phases in [`roadmap.md`](roadmap.md).
## 4. Skill injection per adapter
Skills travel into each agent via one of three strategies, in order of preference:
### 4.1 Native skill loading (preferred)
Agent scans its own `~/.<agent>/skills/` on launch. We symlink OCD's skill into that dir (see [`skills-protocol.md`](skills-protocol.md) §3) and let the agent pick it up natively. Zero prompt overhead.
- **Works for:** Claude Code. Codex (version-dependent). OpenCode.
### 4.2 Prompt injection (fallback)
We read the skill's `SKILL.md` body + any `references/*.md` files it has, concatenate them into the system prompt, and copy `assets/` files into the cwd. The agent has no concept of "skills" but has the instructions.
- **Works for:** everyone. Default for API fallback, Cursor Agent, Gemini CLI.
- **Cost:** more tokens per run. Mitigation: prune `references/` to the files the skill body actually mentions.
### 4.3 File-placed workflow (hybrid)
For agents that support `AGENTS.md` / `.cursorrules` / similar project-level instruction files (Cursor Agent, OpenCode), we write a project-scoped instruction file in the artifact cwd before running the agent. The agent picks it up automatically.
- **Works for:** Cursor Agent (`.cursorrules`), some OpenCode configurations.
The adapter declares which strategy to use via `capabilities().nativeSkillLoading` and a private `skillInjectionStrategy` field.
## 5. Per-adapter notes
### 5.1 Claude Code (reference implementation)
- Invocation: `claude --print --output-format stream-json --cwd <artifact-dir> "<prompt>"`.
- Streaming format: JSON Lines over stdout; each line is an event we can map to `AgentEvent` directly.
- Skill loading: native. Just ensure the skill is symlinked in `~/.claude/skills/`.
- Surgical edits: use the `Edit` tool; Claude Code's own loop handles this.
- Permission: set `--allowed-tools "Read,Edit,Write"` to restrict blast radius.
- Cancel: send `SIGTERM`; Claude Code flushes and exits.
- **Gotchas:** Claude Code's JSON stream schema is versioned — pin to a known version, warn on mismatch.
### 5.2 API fallback (no CLI)
- Invocation: direct Anthropic Messages API with `stream: true`.
- Skill loading: prompt injection only — read the skill dir, inline everything.
- Tool use: we register `Read/Write/Edit` as tools, implement them in the daemon against the artifact cwd, and run the loop ourselves. This is the one place OCD does own the loop — because the user has no agent at all. Keep it as dumb as possible.
- Surgical edits: approximated by regenerating the whole target file with "only change X" in the prompt.
- Model: Claude Sonnet 4.6 default; Opus 4.7 behind a flag.
- **Why ship this at all?** Topology C requires it (no daemon available in a pure-Vercel deploy). Also, users trying OCD for the first time without a CLI installed still get a working experience.
### 5.3 Codex
- Invocation: `codex exec --cwd <dir> "<prompt>"`.
- Streaming: line-based; parse with a regex-based state machine. Less rich than Claude Code's JSON stream.
- Skill loading: varies. Newer Codex versions read `~/.codex/skills/`; older versions don't. Detect by version string; fall back to prompt injection.
- Surgical edits: Codex's edit tool exists but the tool-call schema is different enough that we regenerate files instead in v1. Revisit in v2.
- **Gotcha:** Codex's CLI auth state can be "authenticated to wrong org." Detect by running `codex whoami` at detect time.
### 5.4 Cursor Agent
- Invocation: `cursor-agent --workspace <dir> "<prompt>"` (rough; verify with CLI docs at implementation time).
- Streaming: yes, JSON lines.
- Skill loading: no native skill concept. We write a `.cursorrules` file into the artifact dir before running. The rules file contains the skill's SKILL.md body (minus front-matter).
- Surgical edits: Cursor's inline edit tool is strong; map our `refine` call to its edit protocol.
- **Gotcha:** Cursor Agent operates on workspaces, not single files. Constrain the workspace to the artifact dir to prevent over-broad changes.
### 5.5 Gemini CLI
- Invocation: `gemini --prompt "<prompt>" --cwd <dir>`.
- Streaming: yes, but less structured tool events. Expect fewer surgical-edit capabilities.
- Skill loading: prompt injection only.
- Surgical edits: regenerate whole file.
- **Gotcha:** Gemini's tool-use format is distinct; we translate our file-write tool to its `file_tool` equivalent.
### 5.6 OpenCode / OpenClaw
- Less-matured CLIs. Targeting P2. Expect bumps; adapter implementations will likely be the thinnest possible "shell out, parse output, synthesize events" approach.
## 6. Capability-driven UI
The web UI reads `agents.capabilities()` and disables features that the active adapter can't support:
| UI feature | Requires | If missing |
|---|---|---|
| Comment mode (click to refine) | `surgicalEdit: true` | Hidden; show tooltip explaining why |
| Streaming tool-call feed | `streaming: true` | Show a spinner only |
| Resume interrupted run | `resume: true` | "Cancel + restart" only |
| Skill picker shows skill with `ocd.capabilities_required` | all listed caps | Skill greyed out with reason |
This is how we avoid "works on my Claude Code, breaks on your Gemini" — we detect, degrade, and document.
## 7. Agent switching
The user can switch active agent per session:
```
POST agents.setActive { agentId: "codex" }
→ capabilities() reported
→ web UI refreshes feature gates
→ next generation runs on Codex
```
Switching mid-run is not allowed (cancel first). The artifact is agent-agnostic; only the generation process differs.
## 8. Fallback chain
If the user's preferred agent fails (crash, auth, timeout), OCD offers a one-click fallback in this order:
1. User's preferred agent (e.g. Cursor Agent)
2. Any other detected agent (Claude Code, if installed)
3. API fallback (direct Anthropic, requires key)
The user explicitly opts in to fallback — we don't silently switch, because a skill may have been authored for a specific agent's capabilities.
## 9. Detection UX
First run:
```
$ pnpm dev
[ocd] daemon starting on :7431
[ocd] detecting agents…
[ocd] ✓ claude-code v0.6.3 (auth: ok, skills dir linked)
[ocd] ✓ codex v0.8.1 (auth: ok)
[ocd] ✗ cursor-agent (not installed)
[ocd] ✗ gemini-cli (installed but not authenticated; run `gemini auth login`)
[ocd] ✓ api-fallback (ANTHROPIC_API_KEY found)
[ocd] daemon ready; 3 agents available
```
Web UI mirrors this in an agent-selector dropdown, with unauthenticated agents shown greyed out with a fix-it tooltip.
## 10. Authorization boundaries
We inherit the underlying agent's permission model rather than building our own. This means:
- **Claude Code** respects its own `--allowed-tools` and `--permission-mode` flags. OCD passes through user preferences.
- **Codex / Cursor** sandbox by workspace; OCD always sets cwd to the artifact dir so nothing outside is visible by default.
- **API fallback** is the one case we own. We implement a whitelist: only `Read`, `Write`, `Edit` tools, all rooted at the artifact cwd. Network access is off.
The daemon never grants more authority to an agent than it had on its own. We don't run the agent in a privileged mode "for convenience."
## 11. Adapter source layout
```
apps/daemon/src/adapters/
├── base.ts # shared interface + utility helpers
├── claude-code/
│ ├── adapter.ts
│ ├── stream-parser.ts # JSON-lines → AgentEvent
│ └── detect.ts
├── api-fallback/
│ ├── adapter.ts
│ ├── tool-loop.ts # the minimal tool-use loop
│ └── tools.ts # Read/Write/Edit implementations
├── codex/ # Phase 1
├── cursor-agent/ # Phase 1
├── gemini-cli/ # Phase 2
├── opencode/ # Phase 2
└── openclaw/ # Phase 2
```
Each adapter is a separate module so community contributions can add new ones without touching core daemon code.
## 12. Open questions
- **Nested agents.** What if Claude Code's agent itself spawns a subagent? We receive events from the outer process only. v1 policy: surface only top-level events; summarize subagent activity as "sub-task" placeholder.
- **Billing awareness.** Some agents bill per message, some per token. OCD doesn't track cost in MVP; v1 adds an optional "usage" event from adapters that expose it.
- **Windows support.** PATH scanning and `spawn` semantics differ on Windows. v1 targets macOS and Linux; Windows is best-effort.
- **Docker-contained agents.** Some users run Claude Code in a container. Adapter needs a "remote" mode — probably same interface but talks over SSH. Phase 2+.
+340
View File
@@ -0,0 +1,340 @@
# Architecture
**Parent:** [`spec.md`](spec.md) · **Siblings:** [`skills-protocol.md`](skills-protocol.md) · [`agent-adapters.md`](agent-adapters.md) · [`modes.md`](modes.md)
This doc describes the system topology, runtime modes, data flow, and file layout. Design rationale lives in [`spec.md`](spec.md); protocol details for skills and agent adapters live in their own docs.
[ocod]: https://github.com/OpenCoworkAI/open-codesign
[acd]: https://github.com/VoltAgent/awesome-claude-design
[piai]: https://github.com/mariozechner/pi-ai
[guizang]: https://github.com/op7418/guizang-ppt-skill
---
## 1. Three deployment topologies
OCD is a web app plus a local daemon. The split means the same UI can run in three shapes:
### Topology A — Fully local (the default)
```
┌────────────────── user's machine ──────────────────┐
│ │
│ browser ──► Next.js dev server (localhost:3000) │
│ │ │
│ │ ws://localhost:7431 │
│ ▼ │
│ ocd daemon (Node, long-running) │
│ │ │
│ ▼ │
│ spawns: claude / codex / cursor / … │
└────────────────────────────────────────────────────┘
```
One `pnpm dev` starts both the Next.js app and the daemon (via a predev script). Zero config. No accounts.
### Topology B — Web on Vercel + daemon on user's machine
```
browser ──► ocd.yourdomain.com (Vercel)
│ ws(s):// user-provided URL (e.g. cloudflared tunnel)
ocd daemon on user's laptop
spawns: claude / codex / …
```
The user runs `ocd daemon --expose` which prints a tunnel URL; they paste the URL into the deployed web app's "Connect daemon" screen. Daemon holds secrets; Vercel holds nothing sensitive.
### Topology C — Web on Vercel + direct API (no daemon)
```
browser ──► ocd.yourdomain.com (Vercel serverless)
Anthropic Messages API (BYOK stored in browser)
```
No local CLI, no daemon. Degraded experience — no Claude Code skills, no filesystem artifacts (stored in IndexedDB), no PPTX export. But it's the "just try it" path. Keys stored `localStorage` with explicit warning.
The three topologies share the same web bundle; the difference is which transports are enabled.
## 2. Component diagram (logical)
```
┌─────────────────────────────── Web App ─────────────────────────────┐
│ │
│ ┌──────────┐ ┌─────────────┐ ┌───────────┐ ┌────────────────┐ │
│ │ chat pane│ │ artifact │ │ preview │ │ comment / │ │
│ │ │ │ tree │ │ iframe │ │ slider overlay │ │
│ └────┬─────┘ └──────┬──────┘ └─────┬─────┘ └────────┬───────┘ │
│ │ │ │ │ │
│ └─────────── session bus (in-memory) ──────────────┘ │
│ │ │
│ ▼ │
│ Transport layer (ws-rpc | http-direct | indexeddb) │
└─────────────────────────┬───────────────────────────────────────────┘
┌───────────────────────┴────────────────────────────────┐
│ │
▼ (topology A/B) ▼ (topology C)
┌─────────────────────── Daemon ───────────────────────┐ ┌────────────┐
│ │ │ browser- │
│ session manager skill registry │ │ only │
│ agent adapter pool design-system resolver │ │ runtime │
│ artifact store preview compile pipeline │ │ (limited) │
│ export pipeline detection service │ └────────────┘
│ │
└─┬────────────────────────────────────────────────┬───┘
│ │
▼ ▼
┌─ agent CLIs ─┐ ┌─ filesystem ─┐
│ claude │ │ ./.ocd/ │
│ codex │ │ ~/.ocd/ │
│ cursor-agent │ │ skills/ │
│ gemini │ │ DESIGN.md │
│ opencode │ └──────────────┘
│ openclaw │
└──────────────┘
```
## 3. Key components
### 3.1 Web app (Next.js 15, App Router)
- **Why Next.js, not Vite SPA?** We want SSR for the marketing landing page + serverless routes for Topology C's direct-API path + Vercel deployment as a first-class citizen. An SPA would need a separate server for any of that.
- **State:** Zustand for session/artifact stores. Browser-side only; hydrated from daemon on connect.
- **Iframe preview:** Vendored React 18 + Babel standalone for JSX artifacts, following [Open CoDesign][ocod]'s approach. HTML artifacts load raw. See [§5](#5-preview-renderer).
- **Comment mode:** Click captures `[data-ocd-id]` on preview DOM, opens a popover, sends `{artifact_id, element_id, note}` to daemon → agent gets a surgical edit instruction.
- **Slider UI:** When an agent emits a "tweak parameter" tool call (see [`skills-protocol.md`](skills-protocol.md) §4.2), the web app renders a live-update control that re-sends parameterized prompts without round-tripping the chat.
### 3.2 Local daemon (`ocd daemon`)
Single binary via `pkg` or a thin Node script distributed over npm. Responsibilities:
- Listen on `ws://localhost:7431` (configurable). Accept JSON-RPC-over-WebSocket.
- Maintain a **session** per web tab. Sessions hold: active agent, active skill, active artifact, in-flight tool calls, design-system reference.
- Operate the **agent adapter pool**: one detected CLI = one adapter instance, reused across sessions.
- Scan and index **skills** from `~/.claude/skills/`, `./skills/`, `./.claude/skills/` on startup and on FS-watch events.
- Own the **artifact store** — writes files to disk, never in memory.
- Run the **preview compile pipeline** (Babel transform for JSX, CSS inliner for HTML exports).
- Spawn headless Chromium via Puppeteer for **PDF export**. PPTX via `pptxgenjs`.
### 3.3 Agent adapter pool
See [`agent-adapters.md`](agent-adapters.md) for the full interface. Each adapter:
1. **Detects** its target CLI (PATH lookup + config-dir probe).
2. **Spawns** the CLI with a standardized wrapper prompt + skill context + design-system context + CWD set to the project's artifact root.
3. **Streams** stdout/stderr as structured events (JSON Lines if the CLI supports it; line-based parser otherwise).
4. **Reports capabilities** — does it support multi-turn? Surgical edits? Native skill loading? Tool use?
### 3.4 Skill registry
See [`skills-protocol.md`](skills-protocol.md). Scans three locations and merges:
| Source | Priority | Purpose |
|---|---|---|
| `./.claude/skills/` | highest | project-private skills |
| `./skills/` | medium | project-declared skills |
| `~/.claude/skills/` | lowest | user-global skills |
Conflicts resolve by priority (higher wins). Each skill parsed once; watched for changes in dev.
### 3.5 Design-system resolver
- Looks for `./DESIGN.md` first, then `./design-system/DESIGN.md`, then user-configured path.
- Parses the 9-section format (see [awesome-claude-design][acd] schema).
- Injects as a prepended system message on every agent run, plus as a `{{ design_system }}` template variable skills can reference.
- Hot-reloads on file change in dev.
### 3.6 Artifact store
Plain files on disk. Conventional layout per project:
```
./.ocd/
├── config.json # project-level daemon config
├── artifacts/
│ ├── 2026-04-24T10-03-12-landing/
│ │ ├── artifact.json # metadata (skill, mode, prompt, parent)
│ │ ├── index.html # primary output (or .jsx, .md, .pptx.json)
│ │ └── assets/ # skill-generated images, fonts, etc.
│ └── …
├── history.jsonl # append-only action log (generations, edits, comments)
└── sessions/
└── <session-id>.json # transient; garbage-collected after 24h
```
Rationale:
- **Plain files** → users can `git add ./.ocd/artifacts/` and review designs in PRs.
- **`artifact.json` metadata** → OCD can reconstruct the artifact tree without a DB.
- **`history.jsonl` not SQLite** → append-only, git-friendly, greppable. [Open CoDesign][ocod] uses SQLite; we deliberately don't.
- **Sessions separate from artifacts** → sessions are ephemeral UI state; artifacts are durable.
### 3.7 Export pipeline
| Format | How |
|---|---|
| HTML (self-contained) | Inline all CSS, rewrite asset URLs to data: URIs |
| PDF | `puppeteer``page.pdf()` on the rendered HTML |
| PPTX | `deck-skill` outputs a JSON intermediate (`slides.json`); `pptxgenjs` generates the `.pptx` |
| ZIP | `archiver` over `artifacts/<id>/` |
| Markdown | direct copy if artifact is `.md`, otherwise skill-defined render |
## 4. Data flow — a typical "generate prototype" turn
```
1. User types prompt in web chat.
2. Web sends { method: "session.generate", params: {
sessionId, prompt, modeHint: "prototype"
}} to daemon via WS.
3. Daemon:
a. picks active skill (prototype-skill)
b. loads design-system (DESIGN.md)
c. materializes a new artifact dir under ./.ocd/artifacts/<slug>/
d. invokes agent adapter with:
- system: skill's SKILL.md contents + DESIGN.md
- user: original prompt
- cwd: the new artifact dir
e. streams agent events back to web as they arrive:
- "tool_call" (edit file, write file, read file)
- "text_delta"
- "thinking" (if supported)
4. Web shows:
- running tool-call feed in the side panel
- artifact tree updates as files materialize
- preview iframe loads the primary output file when agent signals "done"
- slider/comment overlay activates once preview loads
5. On completion, daemon appends:
{ ts, sessionId, artifactId, action: "generate", skill, promptHash }
to history.jsonl.
6. User comments on an element → web sends { method: "session.refine", params: {
sessionId, artifactId, elementId, note }}
7. Daemon re-invokes agent with surgical-edit instruction + the note.
Adapter translates based on capabilities:
- Claude Code → native tool loop, edits that region only
- Codex → regenerates the file with "only change element X" constraint
- API fallback → same as Codex path
```
## 5. Preview renderer
**Constraints:**
- Must isolate artifact code from the host app (no access to window, cookies, parent DOM).
- Must hot-reload as the agent streams writes.
- Must support both static HTML and JSX artifacts.
**Design:**
- Always an `<iframe sandbox="allow-scripts">` — no `allow-same-origin`.
- Static HTML: `srcdoc` load of the inlined artifact.
- JSX: inject a small bootstrap that imports vendored React 18 + Babel standalone, then dynamically evals the JSX as Babel-transformed code. (This is what [Open CoDesign][ocod] does, and it works; no reason to reinvent.)
- Agent writes trigger a debounced rebuild + iframe `srcdoc` replace. Full reload each time — React state loss is acceptable at this scope.
## 6. Config files
| File | Purpose |
|---|---|
| `~/.open-claude-design/config.toml` | daemon-global: default agent preference, keys (optional, BYOK), telemetry opt-in (default off) |
| `~/.open-claude-design/agents.json` | cached agent detection results |
| `./.ocd/config.json` | project-local: active design system, preferred skills, preferred mode |
| `./skills/<skill>/SKILL.md` | skill manifest (standard Claude Code format) |
| `./DESIGN.md` | active design system ([awesome-claude-design][acd] format) |
All config is plain text / TOML / JSON — no binary formats, no sqlite. Reviewable in PRs.
## 7. Protocol between web and daemon
JSON-RPC 2.0 over WebSocket. Why not REST? Because we need streaming. Why not gRPC? Over-engineering for a single-origin local app.
Methods (MVP):
```
session.open(params) -> { sessionId }
session.generate(params) -> streams: tool_call | text_delta | artifact_update | done
session.refine(params) -> streams: same
session.cancel(params) -> { ok }
agents.list() -> [{ id, name, version, capabilities }]
agents.setActive({ agentId }) -> { ok }
skills.list() -> [{ id, name, mode, manifest }]
skills.setActive({ sessionId, skillId }) -> { ok }
skills.install({ source }) -> streams: progress
design.setActive({ path }) -> { designSystem }
artifacts.list({ projectRoot }) -> [Artifact]
artifacts.get({ id }) -> Artifact
artifacts.export({ id, format }) -> streams: progress | { url | path }
```
Full schema in [`schemas/protocol.md`](schemas/protocol.md) (TODO: write).
## 8. Deployment
### Local
```sh
pnpm install
pnpm dev # starts daemon on :7431, next on :3000
```
### Docker
```yaml
# docker-compose.yml
services:
daemon:
image: openclaudedesign/daemon
volumes: [ "~/.open-claude-design:/root/.open-claude-design", "./:/workspace" ]
ports: ["7431:7431"]
web:
image: openclaudedesign/web
ports: ["3000:3000"]
environment: [ "OCD_DAEMON_URL=ws://daemon:7431" ]
```
### Vercel + local daemon (Topology B)
```sh
vercel deploy # web only
ocd daemon --expose # user runs locally; prints tunnel URL
# user pastes URL into /connect UI
```
### Vercel direct (Topology C)
```sh
vercel deploy # same bundle
# flip VERCEL env flag OCD_MODE=direct to hide daemon-connect UI
```
## 9. Security model
| Surface | Threat | Mitigation |
|---|---|---|
| Daemon WebSocket | Arbitrary local process talks to daemon | Token handshake; token printed on `ocd daemon` start, required in WS URL |
| Artifact code in preview | XSS/cookie theft from host | `<iframe sandbox="allow-scripts">`, no `allow-same-origin` |
| Agent running on user's machine | Agent reads/writes outside project | Adapter sets `cwd` to artifact dir; relies on agent's own permission system (Claude Code's `--allowed-tools` etc.) |
| User secrets | Leak to cloud | BYOK stored only in daemon's `config.toml` (mode 0600) or browser `localStorage` in Topology C, never sent to OCD's own servers (we don't have any) |
| Skill from untrusted source | Malicious skill in `~/.claude/skills/` | Install-time warning; skills run under the agent's permission model, not ours |
| Vercel web bundle | Compromised build | Standard Vercel integrity; bundle has zero secrets |
We inherit the agent's permission model on purpose — we don't invent our own sandbox, because Claude Code's `--permission-mode` / Codex's sandboxing / Cursor's containment already exist and are maintained.
## 10. Performance notes
- Daemon startup: < 500 ms (lazy adapter init).
- Agent detection: < 200 ms (parallel PATH probes).
- First generation latency: dominated by agent model time; OCD overhead should be < 50 ms.
- Preview reload: debounced 100 ms on artifact file writes.
- Skill index: cold scan < 100 ms for ~50 skills; watched with `chokidar`.
## 11. What's explicitly out of scope for MVP
- Multi-user / RBAC / orgs
- Hosted skill marketplace (git URLs only in v1)
- Figma export (post-1.0, same as [Open CoDesign][ocod])
- Collaborative editing
- Mobile web support (desktop only in MVP)
- Offline mode (beyond "the agent is local" — we don't cache model responses)
Binary file not shown.

After

Width:  |  Height:  |  Size: 2.7 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.0 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.8 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 229 KiB

View File
+64
View File
@@ -0,0 +1,64 @@
# Warm Editorial
> Sample DESIGN.md demonstrating the 9-section format. Used as a reference / test fixture.
> Referenced from [`../spec.md`](../spec.md), [`../skills-protocol.md`](../skills-protocol.md), [`../modes.md`](../modes.md).
## Visual Theme & Atmosphere
Warm, unhurried, magazine-like. Think "a New Yorker interview column online." Generous whitespace, long-form readability, restrained chrome. Playful but never novelty.
## Color Palette & Roles
- **Background:** `#FAF7F2` (warm off-white paper)
- **Foreground:** `#1C1A17` (near-black, slightly warm)
- **Accent (primary):** `#C0512F` (terracotta) — used for links, primary CTAs, 1 hero element max per page
- **Accent (secondary):** `#2F5B4F` (forest) — section dividers, tags
- **Muted:** `#8A817A` (mid-warm-grey) — timestamps, metadata
- **Surface:** `#FFFFFF` — elevated cards only
Never use pure black or pure white anywhere user-facing.
## Typography Rules
- **Display / headings:** "GT Sectra" or fallback serif (`'GT Sectra', 'Times New Roman', serif`)
- **Body:** "Söhne" or fallback sans (`'Söhne', -apple-system, system-ui, sans-serif`)
- **Mono:** `'JetBrains Mono', ui-monospace, monospace` for code only
- Scale (px): 12 · 14 · 16 · 20 · 28 · 40 · 56 · 80
- Line-height: 1.6 for body, 1.2 for display
- Letter-spacing: -0.02em for display sizes above 40px; default elsewhere
## Component Stylings
- **Buttons:** flat fill, 12px radius, 14px padding-block, 20px padding-inline. Primary = terracotta fill, off-white label. Secondary = outlined 1px foreground, transparent fill.
- **Cards:** off-white background, 1px forest-at-8%-opacity border, 16px radius, 2432px internal padding. No shadow except hover (y+2px, blur 16, foreground-at-6%).
- **Inputs:** underline only (no box), 1px muted baseline, terracotta baseline on focus, 16px vertical padding.
- **Links:** terracotta, 1px terracotta-at-40% underline, no underline on hover (swap for terracotta-at-8% background).
## Layout Principles
- 12-column grid, 1200px max-width, 24px gutters.
- Hero sections: 72vh minimum, 120vh maximum. Content top-biased, never centered vertically.
- Body sections: 80px top+bottom spacing at desktop; 48px at tablet; 32px at phone.
- One accent color per screen. If a page has a terracotta hero, secondary CTAs are foreground-only, not forest.
## Depth & Elevation
Minimal. Only two elevation levels:
- **Flat (0):** everything by default.
- **Raised (1):** cards on hover, dropdown menus, floating CTAs. 2px y-offset, 16px blur, foreground at 6% opacity.
No shadows on inputs. No shadows on the hero. No neumorphism, no glassmorphism.
## Do's and Don'ts
- ✅ Let whitespace breathe. A short headline on 50% of the viewport height is correct.
- ✅ Use serif for numbers when they matter (pricing, stats).
- ✅ Draw one accent element per page; the rest is foreground.
- ❌ No gradients.
- ❌ No emojis in product copy.
- ❌ No sentence-case for headings — use title case for H1/H2, sentence case for H3 and below.
- ❌ No border-radius above 24px; no border-radius below 8px.
## Responsive Behavior
- **Desktop ≥ 1024px:** 12-col grid, full hero heights, side-by-side columns.
- **Tablet 6401023px:** 8-col grid; hero drops to 60vh; columns stack at 3+.
- **Phone < 640px:** 4-col grid; single-column layout; hero drops to 50vh; all padding -33%.
## Agent Prompt Guide
When generating artifacts against this design system:
- Lead with typography and whitespace; chrome (borders, shadows) is subtractive.
- If you need more than one accent element on a screen, you're doing too much — cut one.
- When asked for "professional" or "serious," lean harder on serif + whitespace. When asked for "modern," this system isn't the right answer; pick a different DESIGN.md.
- Color tokens are non-negotiable. Do not invent new hex values. If the request needs a color outside this palette, produce a warning comment in the artifact and use the closest existing token.
- Prefer 1 hero + 35 body sections over 1 hero + 8+ sections. Editorial means restraint.
+120
View File
@@ -0,0 +1,120 @@
---
name: saas-landing
description: |
Single-page SaaS landing with hero, features, social proof, pricing, and CTA.
Respects the active DESIGN.md color/typography/layout tokens.
Trigger keywords: "saas landing", "marketing page", "product landing".
triggers:
- "saas landing"
- "marketing page"
- "product landing"
ocd:
mode: prototype
preview:
type: html
entry: index.html
reload: debounce-100
design_system:
requires: true
sections: [color, typography, layout, components]
inputs:
- name: product_name
type: string
required: true
- name: tagline
type: string
required: true
- name: has_pricing
type: boolean
default: true
- name: proof_count
type: integer
default: 3
min: 0
max: 6
parameters:
- name: hero_density
type: spacing
default: 96
range: [48, 200]
- name: accent_strength
type: opacity
default: 1.0
range: [0.5, 1.0]
outputs:
primary: index.html
capabilities_required:
- file_write
---
# SaaS Landing Skill
Produce a single-page SaaS landing. Agent, follow this workflow exactly.
## 1. Read context
Before writing anything:
- Read `DESIGN.md` in the current working directory. If missing, stop and ask for one.
- Identify the color palette, typography tokens, and layout principles.
- Note the "Agent Prompt Guide" section — it overrides any instruction here if they conflict.
## 2. Plan sections
Required sections, in order:
1. **Hero** — logo-or-wordmark, headline (tagline input), subhead (12 sentences), primary CTA, secondary CTA. Use the hero_density parameter as vertical padding in px.
2. **Features** — 36 feature tiles. Each: icon, short title, 12 sentence body.
3. **Social proof**`proof_count` logos or testimonials. If 0, skip this section.
4. **Pricing** — 23 tiers. Include only if `has_pricing` is true.
5. **Footer CTA** — large accent-colored band with one-button call to action.
6. **Footer** — minimal: links + copyright.
## 3. Apply design system
- All colors must come from DESIGN.md tokens. Do not invent hex values.
- Typography: use the declared display font for headlines, body font for everything else.
- Layout: respect the grid, max-width, and section spacing rules.
- Components: use declared button/card/input patterns. Do not add shadows if DESIGN.md's Depth & Elevation says minimal.
- Accent: use the accent color only once in the hero, once in the footer CTA, and for all links. Do not flood the page.
## 4. Write the file
Output a single self-contained `index.html` with:
- All CSS inlined in a `<style>` block in `<head>`.
- System font fallbacks if DESIGN.md fonts aren't loadable from Google Fonts etc.
- No external JS.
- Semantic HTML (`<header>`, `<main>`, `<section>`, `<footer>`).
- Each editable element tagged with `data-ocd-id="<unique-slug>"` so the host app's comment mode can target it.
## 5. Self-check
Before finishing, verify:
- [ ] All text is content-meaningful, not lorem ipsum (use product_name and tagline inputs; generate plausible specific copy for the rest).
- [ ] No broken color references (every CSS color value is in DESIGN.md's palette or a valid alpha/fallback variant).
- [ ] Responsive breakpoints match DESIGN.md's Responsive Behavior section.
- [ ] The page looks good at 1440w, 768w, and 375w (mentally simulate).
- [ ] Accent used no more than twice total.
## 6. Done
Write only `index.html`. Do not generate a separate CSS file, JS file, or README.
---
## For skill authors reading this as a reference
This is a minimal but complete skill. Structure:
```
saas-landing-skill/
├── SKILL.md ← you are here
└── assets/
└── base.html (optional starter template; this skill doesn't use one)
```
Things to notice:
- The `ocd:` front-matter block is optional for Claude-Code-only compatibility, but adding it lights up OCD's typed inputs, sliders, preview metadata, and capability gating.
- The workflow below the front-matter is plain Markdown that the agent reads as its system prompt.
- DESIGN.md is treated as a collaborator, not an override. The skill gives the agent authority to override when the brief conflicts, but never to invent new tokens.
- `data-ocd-id` tagging is how we wire elements to comment mode. Skills that want comment-mode compatibility must annotate their output.
See [`../../skills-protocol.md`](../../skills-protocol.md) for the full protocol.
+273
View File
@@ -0,0 +1,273 @@
# Modes
**Parent:** [`spec.md`](spec.md) · **Siblings:** [`architecture.md`](architecture.md) · [`skills-protocol.md`](skills-protocol.md) · [`agent-adapters.md`](agent-adapters.md)
OCD exposes four user-facing modes. Modes are not arbitrary; each maps to a distinct **skill type** (see [`skills-protocol.md`](skills-protocol.md) §4) and a distinct **workflow shape**. Keeping them separate lets us tune UI affordances, export pipelines, and default skills per mode.
| Mode | What you get | Time to first result | Skill type |
|---|---|---|---|
| **Prototype** | A single editable screen (HTML/JSX) | ~60120s | `prototype-skill` |
| **Deck** | Multi-slide HTML presentation | ~90180s | `deck-skill` |
| **Template** | Populated copy of a curated template | ~2040s | `template-skill` |
| **Design System** | A `DESIGN.md` + sample preview | ~60180s | `design-system-skill` |
Modes compose: Design System first → everything else reads from it. Template reuse is the fast path; Prototype/Deck are the generative path.
---
## 1. Prototype mode
### Purpose
One high-fidelity screen or flow. User brief → working HTML/JSX in a sandboxed iframe.
### UX flow
```
[ mode picker: Prototype ]
[ skill picker: saas-landing | dashboard | login-flow | … ]
[ inputs form (if skill declares ocd.inputs) ]
[ free-text prompt box ]
[ generate ]
[ streaming tool-call feed · artifact tree · preview iframe ]
[ comment mode (if adapter supports surgicalEdit) ]
[ parameter sliders (if skill declares ocd.parameters) ]
[ export: html · pdf · zip ]
```
### Inputs
- Skill selection (defaults to first matching trigger)
- Optional structured inputs from skill (e.g. `product_name`, `has_pricing`)
- Optional free-text prompt
- Active DESIGN.md (auto-injected if skill requires it)
### Outputs
- `index.html` (primary) or `Prototype.jsx` (if skill outputs JSX)
- `assets/` (images, fonts generated by skill)
- `artifact.json` metadata
### Preview
- HTML → `<iframe sandbox="allow-scripts" srcdoc>` with hot-reload on writes.
- JSX → iframe with vendored React 18 + Babel standalone.
- Multi-frame toggle: desktop / tablet / phone widths (borrowed from Open CoDesign).
### Refinement surfaces
- **Chat:** free-text "move the CTA above the fold."
- **Comment mode:** click an element → popover → "make this card glassmorphic." Only available if `capabilities.surgicalEdit === true`.
- **Sliders:** any `ocd.parameters` the skill declared. Slider movements re-prompt with the parameter value only; no full regeneration.
### Default v1 skills
- `saas-landing`
- `dashboard`
- `login-flow`
- `empty-state-pack`
- `pricing-page`
### Failure modes
- Skill requires DESIGN.md but none is set → UI prompts to create one (offers Design System mode).
- Agent times out mid-generation → partial artifact preserved; "resume" button if adapter supports it, else "regenerate."
- Preview iframe fails to render (JSX parse error) → show raw code with error annotation.
---
## 2. Deck mode
### Purpose
Multi-slide presentation. Sliding, magazine, minimal, dark, whatever — as long as the skill supports it.
### UX flow
Same as Prototype, but:
- Skill picker shows only `mode: deck` skills
- Preview renders the full deck with arrow-key navigation (keyboard, scrollwheel, touch) — the deck skill's own navigation
- Export adds `pptx` and `pdf` as first-class options
### Inputs
- Slide count (skill usually declares `ocd.inputs.slide_count`)
- Topic / outline (free text or structured)
- Theme preset (skill-defined enum; e.g. `editorial | minimal | brutalist`)
- DESIGN.md (optional — many deck skills don't need one because they have their own theme system)
### Outputs
- `index.html` — single-file deck, self-contained
- `slides.json` — optional machine-readable outline, used by PPTX export
- `assets/` — images, fonts
### Preview
Just an iframe loading `index.html`. Navigation is the skill's own responsibility. We add a minimal overlay with slide count and keyboard hints.
### Refinement
- **Chat:** "add a slide about pricing between 4 and 5."
- **Per-slide comment:** click a slide → popover → "make this more data-heavy." Translates to surgical edit of that slide's section.
- **Theme slider:** if skill exposes theme parameters (e.g. `accent_hue`), adjustable post-generation.
### Default v1 skills
- `magazine-web-ppt` (fork of [guizang-ppt-skill](https://github.com/op7418/guizang-ppt-skill))
- `pitch-deck` (minimal, investor-oriented)
- `product-demo-deck` (screenshot-heavy)
### Failure modes
- Agent produces deck with `slides.json` missing → PPTX export falls back to page-capture (uglier output). Document per-skill.
- Too many slides → context blown for small-context agents. Skill declares `max_slides` in front-matter; we warn before generating.
---
## 3. Template mode
### Purpose
Skip generation entirely. Curated templates with proven aesthetics. Agent only personalizes content. This is the fastest path and the highest floor — good for users who don't want to prompt.
### UX flow
```
[ template gallery: cards showing thumbnail + name + inferred design system ]
[ pick one ]
[ inputs form: what to personalize (brand name, content blocks, links) ]
[ generate ]
[ preview with populated content ]
[ optional: "restyle to match my DESIGN.md" button ]
[ export ]
```
### Inputs
- Template selection (from bundled gallery + user-added templates)
- Structured content inputs (template declares what slots need filling)
- Optional: re-skin to a target DESIGN.md
### Outputs
Same shape as Prototype mode — the template is just a higher-quality starting artifact.
### How it differs from Prototype
- **No design decisions from the agent.** Layout, spacing, typography all pre-decided.
- **Faster.** Typically 2040s because the agent only fills text.
- **Lower ceiling.** You can't go off-script without falling back to Prototype mode.
### Template format
A template is a special kind of skill (`mode: template`) with this layout:
```
<template-root>/
├── SKILL.md # declares inputs; workflow says "copy and fill"
├── preview.png # gallery thumbnail
├── assets/
│ └── base.html # the template HTML with {{ handlebars }} slots
└── references/
└── DESIGN.md # template's own inferred design system (for re-skin)
```
### Default v1 templates
- `stripe-ish-landing`
- `linear-ish-docs`
- `notion-ish-workspace`
- `vercel-ish-pricing`
- (Names are nods to inspirations, not copies; we don't ship infringing clones.)
### Failure modes
- User-provided content violates template constraints (e.g. too-long heading) → agent auto-truncates with a warning in the artifact metadata.
- Re-skin to DESIGN.md produces ugly result → keep the original; re-skin is non-destructive.
---
## 4. Design System mode
### Purpose
Produce a `DESIGN.md` file. This is the *meta* mode: the output is the input for other modes.
### UX flow
```
[ choose input source ]
→ option A: screenshot upload
→ option B: brand guide PDF upload
→ option C: public URL ("analyze airbnb.com")
→ option D: free-text brief ("warm editorial, terracotta accent…")
[ skill picker: design-system-from-screenshot | … ]
[ generate ]
[ preview: rendered DESIGN.md + sample components demo ]
[ edit the DESIGN.md inline or via chat ]
[ "Set as active design system" button → writes to ./DESIGN.md ]
```
### Inputs
- One of: screenshot, PDF, URL, free-text brief
- Optional: existing DESIGN.md to refine
### Outputs
- `DESIGN.md` — the canonical 9-section format
- `preview.html` — a sample components page rendered against the new design system (hero, buttons, card, form, table)
- `tokens.json` — optional, machine-readable version of the color/typography tokens (for devs who want to import into code)
### Preview
Split view:
- Left: editable DESIGN.md in a Markdown editor
- Right: `preview.html` rendering sample components
### 9-section DESIGN.md format (per [awesome-claude-design](https://github.com/VoltAgent/awesome-claude-design))
1. Visual Theme & Atmosphere
2. Color Palette & Roles
3. Typography Rules
4. Component Stylings
5. Layout Principles
6. Depth & Elevation
7. Do's and Don'ts
8. Responsive Behavior
9. Agent Prompt Guide
This format is not ours. We adopt it because awesome-claude-design has already shipped 68 of them — immediate ecosystem compatibility.
### Default v1 skills
- `design-system-from-screenshot` (vision-capable agent required)
- `design-system-from-brief` (text-only)
- `design-system-refine` (takes existing DESIGN.md + notes)
### Failure modes
- Screenshot upload but active agent has no vision (e.g. older Codex) → suggest switching agent or fall back to "describe the screenshot in text."
- DESIGN.md parse errors when set as active → validator highlights which section is malformed; user edits and retries.
---
## 5. Mode selection & heuristics
### Explicit
User picks a mode from the top-level navigation. Each mode shows only compatible skills.
### Inferred (chat-first flow)
If the user just types a prompt without selecting a mode:
```
prompt contains "slide" | "deck" | "ppt" | "presentation" → Deck
prompt contains "design system" | "tokens" | "brand" → Design System
prompt contains "template" + named template → Template
else → Prototype
```
Inference is a hint; user can override via a mode picker on the artifact page.
## 6. Cross-mode composition examples
- **Design System → Prototype:** run Design System mode once; every Prototype/Deck/Template run after that picks it up from `./DESIGN.md`.
- **Template → Prototype:** pick a template, export as starting artifact, re-open in Prototype mode for free-form edits.
- **Prototype → Design System:** if a generated prototype hits a good aesthetic, we plan a "freeze as design system" action in v1.5. Not in MVP.
## 7. Keyboard & UI affordances (cross-mode)
| Action | Shortcut | Available in |
|---|---|---|
| Generate | ⌘/Ctrl+Enter | all |
| Cancel run | Esc | all |
| Toggle comment mode | ⌘/Ctrl+; | Prototype, Deck |
| Cycle preview frame | ⌘/Ctrl+\\ | Prototype |
| Export | ⌘/Ctrl+E | all |
| Set active design system | n/a (button) | Design System |
## 8. What mode ≠
Modes are **workflow containers**, not product subscriptions or pricing tiers. They all run on the same infrastructure, the same skills protocol, and the same agent adapters. A user can move between modes freely at zero cost.
## 9. Out of scope for MVP
- Hybrid modes (e.g. "deck with a prototype-screen embedded")
- Auto-mode-switching mid-session
- Collaborative multi-user mode
- Mobile-first layouts for the modes themselves (the web UI is desktop-only in MVP)
+146
View File
@@ -0,0 +1,146 @@
# References
**Parent:** [`spec.md`](spec.md)
Every external project this spec leans on. Three questions per entry: what is it, what do we borrow, and what do we deliberately not take.
---
## Primary references
### [Anthropic Claude Design][cd]
- **URL:** [claude.ai/design][cd] · [release announcement](https://www.infoq.cn/article/TH0QVHpvVGZ7VP3hAEmm) · [ifanr review](https://www.ifanr.com/1662860)
[cd]: https://www.anthropic.com/news/claude-design
- **What it is:** Anthropic's closed-source AI design product. Released 2026-04-17. Powered by Opus 4.7. Web-only (claude.ai). Generates prototypes, wireframes, decks, marketing pages, complex prototypes with voice/video/3D/shaders.
- **Why it matters to us:** Defines the category. Its viral moment (~60M X impressions week 1) proves the market.
- **What we borrow:** The high-level value prop — "natural language → editable visual design." Feature inspiration for modes (prototype, deck, marketing). UI ideas around inline editing and custom sliders.
- **What we don't:** Closed source. Anthropic-only models. No self-hosting. Paid tiers (Pro/Max/Team/Enterprise) only. We are not trying to be a drop-in clone; we are an **open substrate** for the same category.
### [Open CoDesign][ocod] (OpenCoworkAI)
- **Repo:** [github.com/OpenCoworkAI/open-codesign][ocod]
- **Site:** [opencoworkai.github.io/open-codesign](https://opencoworkai.github.io/open-codesign/)
- **What it is:** The main open-source Claude Design alternative. MIT-licensed. Electron desktop app. React 19 + Vite + Tailwind v4. [`@mariozechner/pi-ai`][piai] for multi-provider. SQLite for version history. 12 built-in design skill modules. HTML/JSX sandboxed iframe preview. Exports HTML/PDF/PPTX/ZIP/MD. 15 templates. Comment mode + slider controls + multi-frame preview.
[ocod]: https://github.com/OpenCoworkAI/open-codesign
[piai]: https://github.com/mariozechner/pi-ai
- **Why it matters:** Direct competitor; most overlap with what we're building.
- **What we borrow:**
- UI concepts: **comment mode** (click-to-pin element edits), **tweak sliders** (agent-emitted parameters), **multi-frame preview** (desktop/tablet/phone).
- Sandboxed iframe preview (`<iframe sandbox="allow-scripts">` with vendored React 18 + Babel standalone for JSX).
- Export pipeline shape (HTML/PDF/PPTX/ZIP/MD).
- **What we don't:**
- **Electron** — we go Next.js web app instead (runs local and deploys to Vercel).
- **Bundled agent on `pi-ai`** — we delegate to the user's existing CLI.
- **Proprietary skill format** (TypeScript modules compiled into the app) — we use Claude Code's `SKILL.md` so third-party skills drop in.
- **SQLite for artifacts** — plain files + `.jsonl` history, so git tracks it naturally.
- **Sole focus on UI panels** — we add Design System mode and `DESIGN.md` as first-class.
### [multica][multica] (multica-ai)
- **Repo:** [github.com/multica-ai/multica][multica]
[multica]: https://github.com/multica-ai/multica
- **What it is:** Open-source "managed agents platform." Frontend: Next.js 16. Backend: Go + Chi + WebSocket. DB: PostgreSQL + pgvector. **Local daemon auto-detects CLIs on PATH: Claude Code, Codex, OpenClaw, OpenCode, Hermes, Gemini, Pi, Cursor Agent.** Assigns work via a web board view; agents execute; WebSocket streams progress.
- **Why it matters:** They already solved the "detect and wrap local code agents" problem.
- **What we borrow:**
- **PATH-scan + config-dir probe detection** strategy.
- **Local daemon + WebSocket** topology (daemon on user's machine, thin web client).
- Agent catalog (our P0P2 list maps closely to theirs).
- **What we don't:**
- Go backend + PostgreSQL — overkill for our scope; Node daemon + filesystem is enough.
- Team / board / issue-assignment model — not our domain.
- pgvector — we don't embed anything in MVP.
### [cc-switch][ccsw] (farion1231)
- **Repo:** [github.com/farion1231/cc-switch][ccsw]
[ccsw]: https://github.com/farion1231/cc-switch
- **What it is:** Tauri desktop app for managing five CLI tools (Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw). Provider management, MCP server config, skills install, session browsing. SQLite at `~/.cc-switch/cc-switch.db`. **Skills dir at `~/.cc-switch/skills/` with symlinks into each agent's config dir.** 50+ provider presets.
- **Why it matters:** Shows exactly how to live beside multiple code-agent CLIs without stepping on their config.
- **What we borrow:**
- **Symlink-based skill distribution.** Canonical skill location + symlinks to each agent's skills dir.
- Knowledge of per-agent config dir locations (`~/.claude/`, `~/.codex/`, …).
- "Provider presets" idea — a curated list we can ship so users don't have to hand-enter endpoint URLs for OpenAI-compatible relays.
- **What we don't:**
- Tauri / desktop app — not our shape.
- Provider-switching as core feature — we defer that to the underlying agent. If a user wants to switch providers inside Claude Code, they use Claude Code's config, not ours.
- Tray icon / system integration — out of scope.
### [awesome-claude-design][acd] (VoltAgent)
- **Repo:** [github.com/VoltAgent/awesome-claude-design][acd]
[acd]: https://github.com/VoltAgent/awesome-claude-design
- **Ecosystem:** 68 DESIGN.md files for named brands. Referenced schema has **9 standardized sections**: Visual Theme & Atmosphere, Color Palette & Roles, Typography Rules, Component Stylings, Layout Principles, Depth & Elevation, Do's and Don'ts, Responsive Behavior, Agent Prompt Guide.
- **Related URLs:** claude.ai/design, getdesign.md, Discord community
- **Why it matters:** Defines the de-facto portable design-system format for AI agents.
- **What we borrow:**
- **The entire `DESIGN.md` format, unchanged.** We adopt their 9-section schema as OCD's canonical design-system format.
- Ecosystem compatibility: any of their 68 DESIGN.md files works as an OCD active design system out of the box.
- **What we don't:**
- Their curated list itself — we don't fork their 68 files; we reference upstream.
- Their Discord / community layer — not our product.
### [guizang-ppt-skill][guizang] (op7418)
- **Repo:** [github.com/op7418/guizang-ppt-skill][guizang]
[guizang]: https://github.com/op7418/guizang-ppt-skill
- **What it is:** A Claude Code skill producing magazine-style, horizontal-swipe web decks. Structure: `SKILL.md` + `assets/template.html` + `references/{components,layouts,themes,checklist}.md`. 6-step workflow. Single-file HTML output with embedded CSS/WebGL. Keyboard/scroll/touch navigation.
- **Why it matters:** Reference implementation of a high-quality Claude skill, and our default deck skill.
- **What we borrow:**
- **The whole skill, unmodified.** It's our default v1 `deck-skill`. A user runs `ocd skill add https://github.com/op7418/guizang-ppt-skill` and it works.
- Skill directory convention (`assets/` + `references/` + `SKILL.md`) as the pattern we document for skill authors.
- The "6-step workflow + quality-checklist rubric" pattern for authoring new skills.
- **What we don't:** Nothing — this is pure reuse. We add an `ocd:` block to its front-matter only if we want to expose theme sliders; the skill works without it.
---
## Secondary references (format / protocol / UI ideas)
| Project | Relevance |
|---|---|
| [Claude Code skills docs](https://docs.anthropic.com/) | Source of the `SKILL.md` format we adopt |
| [Cursor .cursorrules](https://docs.cursor.com/) | Informs how the Cursor Agent adapter injects skill context |
| [Reveal.js](https://revealjs.com/) / [Marp](https://marp.app/) | Reference for deck HTML navigation patterns |
| [Shadcn/ui](https://ui.shadcn.com/) | Likely component library for the web UI shell |
| [Vercel AI SDK](https://sdk.vercel.ai/) | Streaming primitives for the API-fallback adapter |
| [Puppeteer](https://pptr.dev/) | PDF export engine |
| [pptxgenjs](https://gitbrent.github.io/PptxGenJS/) | PPTX export engine |
| [chokidar](https://github.com/paulmillr/chokidar) | Filesystem watching for skill / artifact hot-reload |
---
## Compatibility & differentiation matrix
| Dimension | [Claude Design][cd] | [Open CoDesign][ocod] | [multica][multica] | [cc-switch][ccsw] | **OCD** |
|---|---|---|---|---|---|
| Open source | ❌ | ✅ | ✅ | ✅ | ✅ |
| Primary form factor | Web (hosted) | Electron | Web + Go daemon | Tauri | **Next.js web + Node daemon** |
| Vercel-deployable | ❌ | ❌ | ❌ | ❌ | **✅** |
| Runs local-only | ❌ | ✅ | ✅ | ✅ | **✅** |
| Generates design artifacts | ✅ | ✅ | ❌ (general coding) | ❌ | **✅** |
| Uses existing code agent | — (owns it) | ❌ | ✅ | ✅ | **✅** |
| Supports Claude Code skills (`SKILL.md`) | — | ❌ | ✅ | ✅ | **✅** |
| `DESIGN.md` as first-class | ❌ | ❌ | — | — | **✅** |
| Deck mode / PPTX export | ✅ | ✅ | ❌ | ❌ | **✅ (via skill)** |
| Template gallery | ✅ | ✅ (15) | ❌ | ❌ | **✅** |
| Design-system authoring mode | ❌ | ❌ | ❌ | ❌ | **✅** |
The two empty-column crossings where OCD lights up and others don't: **Vercel-deployable + design-system authoring**, and **uses existing code agent + first-class DESIGN.md**. That's the niche.
---
## What we explicitly don't borrow (and why)
- **Desktop packaging** (Electron / Tauri). Every minute spent on code-signing is a minute not spent on skills. If a user wants a tray icon, [`cc-switch`][ccsw] already does that — install both.
- **SQLite for artifacts** (from [Open CoDesign][ocod] and [cc-switch][ccsw]). Plain files + JSONL history are reviewable in git, trivially portable, and match the "skills are files" ethos.
- **Bundled model router** ([`pi-ai`][piai] from [Open CoDesign][ocod]). The user's code agent already routes. Two routers is worse than one.
- **PostgreSQL + pgvector** (from [multica][multica]). We don't embed anything in MVP. When we do, SQLite + `sqlite-vec` is enough for single-user scale.
- **Board / issue model** (from [multica][multica]). Off-brand for a design tool.
These "don'ts" are what keep the MVP achievable in 68 weeks.
---
## Living references
This file is maintained. When we add an adapter or borrow a pattern from a new upstream, add it here with the same three-question format. When upstream licensing or direction changes materially, flag it here and cross-link from [`spec.md`](spec.md).
+210
View File
@@ -0,0 +1,210 @@
# Roadmap
**Parent:** [`spec.md`](spec.md) · **Siblings:** [`architecture.md`](architecture.md) · [`skills-protocol.md`](skills-protocol.md) · [`agent-adapters.md`](agent-adapters.md) · [`modes.md`](modes.md)
Phased plan from "spec-only today" to "usable MVP" to "published v1." All estimates assume one focused developer; multiply by 0.6 for two and 0.4 for three.
---
## Phase 0 — Spec finalization (current, ~35 days)
**Goal:** get the interfaces right before writing implementation code. All decisions that are cheap to change on paper and expensive to change in code live here.
**Deliverables:**
- [x] `README.md` + `docs/spec.md` + architecture / protocol / adapter / modes / references docs (this repo, as of now)
- [ ] `docs/schemas/skill-manifest.json` — JSON Schema for the `ocd:` front-matter block
- [ ] `docs/schemas/design-system.md` — formal spec of the 9-section `DESIGN.md`
- [ ] `docs/schemas/protocol.md` — JSON-RPC method signatures
- [ ] `docs/schemas/adapter.md` — adapter interface in TypeScript, printed out
- [ ] `docs/examples/DESIGN.sample.md` — a working example design system
- [ ] `docs/examples/saas-landing-skill/` — a working example skill (the one sketched in `skills-protocol.md` §8)
- [ ] Resolve the four "open questions" at the end of each spec doc
**Exit criteria:** every interface we'll implement has a signed-off schema in this repo. No code yet.
---
## Phase 1 — MVP (~68 weeks)
**Goal:** a single developer can clone, install, start the daemon, point at Claude Code, and produce a prototype and a deck from scratch. The tool is usable for real work even if not polished.
### Scope
**Included:**
- Web app (Next.js 15, App Router)
- chat pane · artifact tree · sandboxed iframe preview · export menu
- skill picker · mode picker · design-system picker
- **no** comment mode yet · **no** sliders yet · **no** template gallery UI yet
- Local daemon (Node)
- JSON-RPC over WebSocket on `:7431`
- agent detection + cached results
- skill registry (scan three dirs, hot-reload)
- artifact store (plain files + `history.jsonl`)
- design-system resolver
- export pipeline (HTML + ZIP only; PDF/PPTX in Phase 2)
- Agent adapters
- **`claude-code`** — native skill loading, streaming, surgical edit
- **`api-fallback`** — direct Anthropic Messages API, minimal tool loop (Read/Write/Edit only)
- Skills shipped in repo
- `saas-landing` (Prototype)
- `magazine-web-ppt` (Deck, fork of guizang-ppt-skill)
- Modes available
- **Prototype** (fully working)
- **Deck** (fully working)
- **Design System** (basic: from text brief only; no screenshot input yet)
- **Template** (deferred to Phase 2)
- Topologies
- **A — fully local** (primary)
- **C — Vercel + direct API** (partial; no daemon features)
**Explicitly out of MVP:**
- Codex / Cursor / Gemini adapters
- Comment mode + sliders
- Template gallery + template skill
- Design System from screenshot (vision) / PDF / URL
- PDF / PPTX export
- Topology B (Vercel + tunneled local daemon)
- Docker compose file
- Skill tests (`ocd skill test`)
- Auth / multi-user
### Week-by-week breakdown
| Week | Theme | Concrete deliverables |
|---|---|---|
| 1 | Scaffolding | monorepo (pnpm workspaces: `apps/web`, `apps/daemon`, `packages/shared`); Next.js 15 base; daemon CLI skeleton; CI green |
| 2 | Daemon core | JSON-RPC over WS; session manager; skill registry scanning; artifact store (write files + `history.jsonl`); design-system resolver loading `./DESIGN.md` |
| 3 | Claude Code adapter | detection (PATH + `~/.claude/` probe); spawn with `--output-format stream-json`; parser from JSON-lines → `AgentEvent`; streaming to daemon's session; cancel via SIGTERM |
| 4 | API-fallback adapter | Anthropic Messages streaming; minimal tool loop (Read/Write/Edit rooted to artifact cwd); integration with skill prompt injection |
| 5 | Web UI — chat + artifact tree | Zustand session store; WS client; chat pane; artifact tree reflects filesystem; skill picker |
| 6 | Web UI — preview + export | sandboxed iframe with hot reload; JSX → vendored React/Babel runtime; export ZIP; export self-contained HTML (inline CSS) |
| 7 | Default skills | port `guizang-ppt-skill` (no modifications; add `ocd:` extension block); write `saas-landing` skill; write 12 DESIGN.md examples; docs for skill authors |
| 8 | Polish + dogfood | end-to-end dogfooding; performance pass (daemon <500ms cold start, first generation overhead <50ms); bug-fixing; first publishable alpha |
### MVP exit criteria
1. `pnpm install && pnpm dev` works on clean macOS and Linux.
2. With Claude Code installed: prototype + deck generation works end-to-end.
3. Without Claude Code installed: API-fallback produces prototypes (not decks — guizang-ppt-skill needs native skill loading).
4. A user can drop a DESIGN.md into the project root and subsequent generations respect it.
5. A third party can publish a skill repo; `ocd skill add <url>` installs it and it works.
6. Artifacts are plain files; `git add ./.ocd/artifacts/` and `git log` tell a sensible story.
7. No Electron, no Tauri, no desktop packaging anywhere in the repo.
---
## Phase 2 — v1 (~8 weeks after MVP)
**Goal:** feature parity with the "UI-polish-heavy" parts of Open CoDesign + multi-agent support + the full four modes.
### Scope
**Agent adapters:**
- `codex` (P1)
- `cursor-agent` (P1)
- capability-driven UI gating (disable features per adapter)
- agent fallback chain
**UI:**
- **Comment mode** (click element → surgical edit; only when `capabilities.surgicalEdit`)
- **Slider parameters** (live-tweak `ocd.parameters`)
- **Multi-frame preview** (desktop / tablet / phone)
- **Template gallery** UI with thumbnails
- **Design System editor** (split view: markdown ↔ sample-components preview)
**Skills:**
- Template skills: `stripe-ish-landing`, `linear-ish-docs`, `notion-ish-workspace`, `vercel-ish-pricing`
- More Prototype skills: `dashboard`, `login-flow`, `empty-state-pack`, `pricing-page`
- More Deck skills: `pitch-deck`, `product-demo-deck`
- Design System skills: `design-system-from-screenshot`, `design-system-refine`
**Modes:**
- **Template mode** fully shipped
- **Design System mode** extended: screenshot input, URL input
**Export:**
- PDF (Puppeteer)
- PPTX (pptxgenjs, driven by `slides.json`)
**Deployment:**
- Docker compose file
- Topology B: Vercel web + tunneled local daemon
- Ship a helper subcommand: `ocd daemon --expose` using `cloudflared` (opt-in, documented)
**Dev experience:**
- `ocd skill test` with cheap-model runs
- Skill author starter template: `ocd skill scaffold`
### v1 exit criteria
1. All four modes fully functional.
2. Three adapters working (Claude Code, Codex, Cursor Agent); fallback chain shipping.
3. PDF + PPTX export working for at least the `magazine-web-ppt` + `pitch-deck` skills.
4. Deployed example at `demo.open-claude-design.dev` (Topology C).
5. Skill author docs published; at least one third-party skill submitted.
6. Documentation site rebuilt from these spec docs.
---
## Phase 3 — v2 (~12 weeks after v1)
**Goal:** ecosystem + robustness.
**Scope sketch (non-binding):**
- Skill marketplace UI — searchable, categorized, install with one click
- Skill signing / checksums
- Gemini CLI + OpenCode + OpenClaw adapters (P2 tier)
- Windows support
- Collaborative mode (multi-user session on a single daemon)
- "Freeze prototype as design system" action
- Figma export (behind the Open CoDesign post-1.0 line; borrow their approach when they ship it)
- Telemetry (opt-in, self-hosted, never phoning home to a central service)
- Hosted SaaS offering (optional; full-local stays primary)
v2 isn't promised. It's the direction if v1 lands.
---
## Risk register
| Risk | Impact | Mitigation |
|---|---|---|
| Claude Code JSON stream format changes between versions | adapter breaks | pin version range; write a compatibility test; keep a parser for each major release |
| Third-party agent CLIs don't expose enough to stream tool calls | UX degrades silently | capability flags + feature gates; document per-adapter limitations in-product |
| `@mariozechner/pi-ai` or similar abstractions get popular and contributors ask us to support them | scope creep | defer; if demand is real, add as yet-another-adapter next to `api-fallback` |
| Vercel deploy (Topology B) flaky because of tunnel setup | users can't try the cloud path | ship Topology C (direct API) as the always-works path; document Topology B as advanced |
| `guizang-ppt-skill` or similar upstream skill changes format | default deck skill breaks | pin git SHA in our default install; monitor upstream |
| DESIGN.md format evolves in awesome-claude-design | incompatibility | track upstream; adopt changes; our resolver is tolerant of missing sections |
| Anthropic ships an open-source Claude Design | differentiation collapses | our moat is the "uses user's existing agent" angle; Anthropic is unlikely to ship that |
| Skill security (malicious skill via `ocd skill add`) | user machine compromise | install-time warning; rely on agent's own permission model; document best practices |
---
## Decision log (lightweight)
Record one line per material decision as we go. Example entries:
- 2026-04-24 — Use plain files + `history.jsonl` over SQLite for artifacts. *Why:* git-reviewable, no driver dependency, matches "skills are files" ethos.
- 2026-04-24 — Adopt `DESIGN.md` (awesome-claude-design) verbatim rather than inventing a new format. *Why:* 68 existing files are immediately compatible.
- 2026-04-24 — Do not ship an Electron / Tauri wrapper. *Why:* every minute on code-signing is a minute not on skills; `cc-switch` already solves the tray-icon use case.
- 2026-04-24 — Delegate the entire agent loop to the user's CLI. *Why:* reimplementing is worse than integrating; ecosystem compatibility beats control.
Decisions supersede each other; keep the log append-only and date every entry.
---
## What to do right after reading this
If you're the implementer:
1. Read [`spec.md`](spec.md) top to bottom.
2. Skim [`architecture.md`](architecture.md), [`skills-protocol.md`](skills-protocol.md), [`agent-adapters.md`](agent-adapters.md).
3. Argue with anything in the four "open questions" sections; file one-line decisions.
4. Fill in the missing Phase 0 deliverables (the `docs/schemas/` and `docs/examples/` files).
5. Scaffold the monorepo and start Week 1.
If you're evaluating the concept:
1. Read [`README.md`](../README.md) + [`spec.md`](spec.md) §13.
2. Check the comparison matrix in [`references.md`](references.md).
3. Look at the worked example in [`skills-protocol.md`](skills-protocol.md) §7 — that's the end-to-end feel.
Binary file not shown.

After

Width:  |  Height:  |  Size: 148 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 263 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 217 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 237 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 79 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 101 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 215 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 115 KiB

+322
View File
@@ -0,0 +1,322 @@
# Skills Protocol
**Parent:** [`spec.md`](spec.md) · **Siblings:** [`architecture.md`](architecture.md) · [`agent-adapters.md`](agent-adapters.md) · [`modes.md`](modes.md)
A **Skill** is the atomic unit of design capability in OCD. We adopt Claude Code's `SKILL.md` convention verbatim as the base format, then add optional fields for design-specific features (preview type, input schema, slider parameters). A skill written for plain Claude Code runs in OCD. An OCD skill that doesn't use our extensions runs in plain Claude Code.
> **Compatibility promise:** A skill like [`guizang-ppt-skill`](https://github.com/op7418/guizang-ppt-skill) works in OCD **without modification**. It just drops into `~/.claude/skills/` and OCD discovers it.
---
## 1. Base format (unchanged from Claude Code)
Every skill is a directory containing at minimum a `SKILL.md`:
```
<skill-root>/
├── SKILL.md # manifest + workflow instructions
├── assets/ # templates, images, boilerplate the skill writes
│ └── …
└── references/ # knowledge files the skill reads during planning
├── components.md
├── layouts.md
└── …
```
`SKILL.md` front-matter (YAML):
```yaml
---
name: magazine-web-ppt
description: |
Magazine-style horizontal-swipe web deck.
Trigger keywords: 杂志风 PPT, magazine deck, swipe slides.
triggers:
- "magazine deck"
- "杂志风 PPT"
- "horizontal swipe presentation"
---
```
Body is free-form Markdown that describes the workflow the agent should follow — typically a numbered step list plus principles. This is what [guizang-ppt-skill](https://github.com/op7418/guizang-ppt-skill) does.
**OCD reads all of this as-is.** No changes required.
## 2. OCD extensions (optional)
Skills can declare additional front-matter fields to unlock OCD-specific UI. All fields are optional; absent fields fall back to sensible defaults.
```yaml
---
name: magazine-web-ppt
description:
triggers: […]
# --- OCD extensions below this line ---
ocd:
mode: deck # one of: prototype | deck | template | design-system
preview:
type: html # html | jsx | pptx | markdown
entry: index.html # relative path produced by the skill
reload: debounce-100 # how the preview refreshes
design_system:
requires: true # this skill reads the active DESIGN.md
sections: [color, typography] # which sections it actually uses (for prompt pruning)
inputs: # typed inputs the user can fill in the UI
- name: title
type: string
required: true
- name: slide_count
type: integer
default: 8
min: 4
max: 20
- name: theme
type: enum
values: [editorial, minimal, brutalist, dark-glass, warm]
default: editorial
parameters: # live-tweakable sliders after first generation
- name: accent_hue
type: hue # hue | spacing | font-scale | opacity
default: 18
range: [0, 360]
- name: section_spacing
type: spacing
default: 48
range: [16, 128]
outputs:
primary: index.html
secondary: [slides.json] # for PPTX export
capabilities_required:
- surgical_edit # comment mode needs this
- file_write
---
```
### 2.1 What OCD uses each field for
| Field | Used by |
|---|---|
| `ocd.mode` | routing (which mode picker the skill shows up under) |
| `ocd.preview.type` | picking the right iframe renderer |
| `ocd.design_system.requires` | whether to inject `DESIGN.md` |
| `ocd.design_system.sections` | pruning the injected DESIGN.md to relevant sections only (token savings) |
| `ocd.inputs` | rendering a typed form in the sidebar instead of only free-text |
| `ocd.parameters` | rendering live sliders that re-prompt on change |
| `ocd.outputs.primary` | which file the iframe loads |
| `ocd.outputs.secondary` | which files export pipelines read (e.g. `slides.json` for PPTX) |
| `ocd.capabilities_required` | gating: if the active agent lacks surgical edit, comment mode is disabled for this skill |
### 2.2 If a skill omits `ocd:` entirely
Defaults:
- `mode`: inferred from name/description (best-effort keyword match) or "prototype"
- `preview.type`: sniff for `*.html` → html, `*.jsx` → jsx, else "markdown"
- `preview.entry`: first file matching the sniffed type
- `design_system.requires`: true if the skill body mentions "design system" or "DESIGN.md"
- `inputs`, `parameters`: none (free-text prompt only)
The goal: **zero-config compatibility** for existing Claude Code skills.
## 3. Skill discovery & precedence
The daemon's skill registry scans three locations:
| Location | Priority | Purpose |
|---|---|---|
| `./.claude/skills/` | 1 (highest) | project-private skills, not committed |
| `./skills/` | 2 | project-committed skills |
| `~/.claude/skills/` | 3 | user-global skills |
Conflicts by `name` resolve to the higher-priority version. All locations are watched with `chokidar` in dev and re-scanned on `SIGHUP` in production.
### Symlink strategy (borrowed from [cc-switch](https://github.com/farion1231/cc-switch))
`cc-switch` maintains a central skill dir at `~/.cc-switch/skills/` and symlinks it into each agent's expected location (`~/.claude/skills/`, `~/.codex/skills/`, etc.). OCD can opt into the same model:
```
~/.open-claude-design/skills/
magazine-web-ppt/ (canonical location)
~/.claude/skills/
magazine-web-ppt → ~/.open-claude-design/skills/magazine-web-ppt
~/.codex/skills/
magazine-web-ppt → ~/.open-claude-design/skills/magazine-web-ppt
```
One install → every agent sees the skill. This is optional; users who only use one agent don't need it.
## 4. Skill types (by mode)
Each mode expects a slightly different skill shape. The required outputs and expected workflow differ.
### 4.1 `prototype-skill`
- **Purpose:** single-screen interactive prototype.
- **Preview:** `html` or `jsx`.
- **Primary output:** `index.html` or `Prototype.jsx`.
- **Typical workflow:** clarify brief → resolve design tokens → write component tree → write file.
- **Example skills:** `saas-landing`, `dashboard`, `login-flow`, `empty-states`.
### 4.2 `deck-skill`
- **Purpose:** multi-slide presentation.
- **Preview:** `html` (single-file deck with in-page navigation).
- **Primary output:** `index.html`.
- **Secondary output:** `slides.json` (for PPTX export).
- **Typical workflow:** clarify topic + slide count → pick theme → populate slides from layout catalog → self-check against quality rubric.
- **Reference implementation:** [guizang-ppt-skill](https://github.com/op7418/guizang-ppt-skill) — fork this for v1.
### 4.3 `template-skill`
- **Purpose:** start from a pre-built artifact; agent only personalizes content, doesn't design from scratch.
- **Preview:** inherits from the template bundle (`html` typically).
- **Primary output:** a populated copy of the template.
- **Typical workflow:** copy `assets/template/` to artifact dir → replace content placeholders → optionally tweak tokens to match design system.
- **Why separate from `prototype-skill`:** much faster (no design decisions), higher-quality floor, worse ceiling.
### 4.4 `design-system-skill`
- **Purpose:** produce a `DESIGN.md` from inputs (brand brief, screenshot, URL).
- **Preview:** `markdown` (render the resulting DESIGN.md with a sample-components preview).
- **Primary output:** `DESIGN.md`.
- **Typical workflow:** analyze input → draft 9 sections per awesome-claude-design schema → generate sample component preview → finalize.
- **Post-run:** OCD prompts the user to set this DESIGN.md as the project's active design system.
## 5. The DESIGN.md as skill context
Every nondesign-system skill (modes 13) can consume the active `DESIGN.md`. OCD injects it as:
1. **System-prompt prefix** (required sections only, per `ocd.design_system.sections`).
2. **File available in CWD** named `DESIGN.md` — skills can `Read` it directly via their agent.
3. **Template variable** `{{ design_system }}` if the skill body references it in Mustache-style.
The 9-section DESIGN.md format is **not invented by OCD**; it's the [awesome-claude-design](https://github.com/VoltAgent/awesome-claude-design) convention, reproduced here for convenience:
```markdown
# <Brand Name>
## Visual Theme & Atmosphere
## Color Palette & Roles
## Typography Rules
## Component Stylings
## Layout Principles
## Depth & Elevation
## Do's and Don'ts
## Responsive Behavior
## Agent Prompt Guide
```
Full schema and examples: [`schemas/design-system.md`](schemas/design-system.md) and [`examples/DESIGN.sample.md`](examples/DESIGN.sample.md) (TODO).
## 6. Skill installation
```sh
ocd skill add https://github.com/op7418/guizang-ppt-skill
# → clones into ~/.open-claude-design/skills/magazine-web-ppt
# → symlinks into ~/.claude/skills/ (and any other active agent dirs)
# → re-indexes registry
ocd skill add ./path/to/my-skill
# → symlinks local dir (no copy) into skills registry
ocd skill list
# → table: name, mode, source, agent compatibility
ocd skill remove <name>
# → unlinks; does not delete the source
```
## 7. Worked example — running `guizang-ppt-skill` under OCD
The skill is unchanged. Here's the full path:
1. User: `ocd skill add https://github.com/op7418/guizang-ppt-skill`
2. Registry indexes it. No `ocd:` block in front-matter → defaults applied:
- `mode`: inferred from body mentioning "PPT" → `deck`.
- `preview.type`: sniffed from `assets/template.html``html`.
- `preview.entry`: `index.html` (convention).
- `design_system.requires`: false (skill body doesn't mention DESIGN.md).
3. User switches to `deck` mode in the web UI; skill appears in the skill picker.
4. User types "给我做一份杂志风 8 页投资人 PPT".
5. Daemon dispatches to active agent (Claude Code) with:
- system message: skill's `SKILL.md` body
- cwd: `./.ocd/artifacts/2026-04-24-pitch-deck/`
- files already placed in cwd: `template.html` (from skill's `assets/`)
6. Agent runs its 6-step workflow (clarify → copy template → populate → self-check → preview → refine).
7. OCD streams the agent's tool calls as UI events; artifact dir grows.
8. Agent signals done; daemon sets preview iframe to `index.html`.
9. User clicks "Export PPTX" — export pipeline notices the skill has no `slides.json` output (the upstream skill doesn't produce one). OCD falls back to "print to PDF then page-to-slide PPTX," which is uglier but works. This is a known limitation documented per-skill.
## 8. Writing a new skill — minimal example
```
saas-landing-skill/
├── SKILL.md
└── assets/
└── base.html
```
```markdown
---
name: saas-landing
description: |
Produce a single-page SaaS landing with hero, features, social proof, pricing, CTA.
Trigger: "saas landing", "marketing page", "product landing".
triggers:
- "saas landing"
- "marketing page"
ocd:
mode: prototype
preview:
type: html
entry: index.html
design_system:
requires: true
sections: [color, typography, layout, components]
inputs:
- name: product_name
type: string
required: true
- name: tagline
type: string
required: true
- name: has_pricing
type: boolean
default: true
parameters:
- name: hero_density
type: spacing
default: 96
range: [48, 200]
---
# Workflow
1. Read DESIGN.md from cwd. Adopt its color/typography/layout rules.
2. Copy `assets/base.html` to `index.html` in cwd.
3. Fill sections: hero, features (36), social proof, pricing (if `has_pricing`), CTA, footer.
4. Inline all CSS. Use system font stack as fallback if DESIGN.md typography fails to load.
5. Respect `hero_density` parameter as the hero section's vertical padding in px.
6. Write `index.html`. Done.
```
## 9. Testing skills
A skill ships with optional test inputs that OCD uses for CI:
```
<skill-root>/
└── tests/
├── basic.prompt
├── basic.expected.manifest.json # assertions: files produced, preview.type, etc.
└── basic.expected.regex.txt # text regex assertions against the primary output
```
`ocd skill test <name>` runs the skill against each case using a cheap model (e.g. Haiku 4.5) and asserts on the manifest + regex. Low-fidelity but catches structural regressions.
## 10. Open questions
- **Skill signing.** Can we verify a skill hasn't been tampered with between publish and install? Simplest answer: `ocd skill add` records the git commit SHA; reinstall-on-update warns on signature change. Deferred to v1.
- **Skill composition.** Can a `prototype-skill` call a `deck-skill` for a sub-artifact? Not in v1; skills are leaf-level. Composition would require a meta-skill concept, which is speculative.
- **Parameter stability.** When sliders change, should the agent re-plan or just re-render? Lean: re-render (fast path), with an "also re-plan" button for larger changes.
+142
View File
@@ -0,0 +1,142 @@
# Open Claude Design — Product Spec
**Status:** Draft v0.1 · 2026-04-24
**Scope:** Product definition, scenarios, non-goals, high-level modules, and positioning against both [Anthropic's Claude Design][cd] and the existing open-source alternative ([Open CoDesign][ocod]).
[cd]: https://www.anthropic.com/news/claude-design
[ocod]: https://github.com/OpenCoworkAI/open-codesign
[guizang]: https://github.com/op7418/guizang-ppt-skill
[multica]: https://github.com/multica-ai/multica
[ccsw]: https://github.com/farion1231/cc-switch
[acd]: https://github.com/VoltAgent/awesome-claude-design
[piai]: https://github.com/mariozechner/pi-ai
Other docs:
- Architecture → [`architecture.md`](architecture.md)
- Skills protocol → [`skills-protocol.md`](skills-protocol.md)
- Agent adapters → [`agent-adapters.md`](agent-adapters.md)
- Modes → [`modes.md`](modes.md)
- References & credits → [`references.md`](references.md)
- Roadmap → [`roadmap.md`](roadmap.md)
---
## 1. Product in one sentence
> **A web app that turns natural-language briefs into editable, previewable design artifacts (prototypes, decks, templates, design systems) by orchestrating the code agent already installed on the user's machine.**
## 2. Core bets (and why they're different)
| # | Bet | [Anthropic Claude Design][cd] | [Open CoDesign][ocod] | OCD |
|---|---|---|---|---|
| 1 | Where the product runs | claude.ai only | Local Electron app | **Next.js web app**`pnpm dev`, `vercel deploy`, or `docker compose up` |
| 2 | Who owns the agent loop | Anthropic, closed | [Open CoDesign][ocod] itself, via [`pi-ai`][piai] | **The user's existing code agent CLI** (Claude Code, Codex, Cursor Agent, Gemini CLI, OpenCode, OpenClaw); direct Anthropic API as fallback |
| 3 | What "design skills" are | Proprietary internal tools | TypeScript modules baked into the app | **File-based skills** that follow Claude Code's `SKILL.md` spec — forkable, versionable, shareable, installable by symlink |
| 4 | How design systems are authored | Implicit in prompt | N/A | **`DESIGN.md` files** following the [awesome-claude-design][acd] 9-section schema |
| 5 | Extension point | Anthropic only | Custom PRs | **Drop a folder into `skills/`** — composable by third parties |
The differentiation is not "yet another design generator." It is **an integration shell that refuses to own the agent, the model, or the skill catalog** — all three are external and pluggable.
## 3. Target users
- **Indie devs / designers** who already pay for one coding agent and don't want a second subscription or a second model router just to get design output.
- **Design system maintainers** who want to codify their system as a `DESIGN.md` and have every skill respect it automatically.
- **Skill authors** who want to publish a design skill (e.g. "SaaS marketing page with glassmorphism") and have it run inside any compatible agent without porting.
- **Teams self-hosting AI tooling** who need a web deployment, not an Electron binary, and who need to keep keys in their own infra.
## 4. User scenarios
### S1 — "Give me a prototype"
User opens the web app, types *"Airbnb-style search page, use our internal design system"*, OCD picks the `prototype-skill`, resolves the user's `DESIGN.md`, dispatches to Claude Code with both files plus the brief, streams tool calls into the UI, and renders the resulting HTML in an iframe preview. User clicks an element, drops a comment, the agent rewrites just that region.
### S2 — "Make me a deck"
User says *"8-slide magazine-style pitch deck for my seed round"*. OCD routes to `deck-skill` (a fork of [`guizang-ppt-skill`][guizang]). Output is a single-file HTML deck; preview is the deck itself with arrow-key navigation; export is PDF/PPTX.
### S3 — "Start from a template"
User picks "SaaS landing — Stripe-ish" from a gallery. Template is a pre-filled artifact bundle plus a `DESIGN.md` reference. Agent only fills content; structure is already there. This is the fastest mode — useful for users who don't want to prompt at all.
### S4 — "Set up our design system"
User uploads a screenshot, brand guide PDF, or Figma link. OCD runs `design-system-skill` which produces a `DESIGN.md` following the 9-section format. That file is then referenced by every subsequent generation — prototypes, decks, templates all pick up the tokens.
These four scenarios map 1:1 to the four modes in [`modes.md`](modes.md).
## 5. High-level modules
```
┌──────────────────────────────────────────────────────────────────┐
│ Web App (Next.js) │
│ chat · artifact tree · iframe preview · comment mode · exports │
└────────────┬─────────────────────────────────┬───────────────────┘
│ WebSocket (JSON-RPC) │ HTTPS (BYOK direct)
┌────────────▼──────────────────┐ ┌────────▼─────────────────┐
│ Local Daemon (ocd daemon) │ │ Anthropic Messages API │
│ · agent detection │ │ (fallback when no CLI) │
│ · skill registry │ └──────────────────────────┘
│ · artifact store │
│ · design-system resolver │
└────────────┬──────────────────┘
│ spawn / stdio / SDK
┌────────────▼──────────────────────────────────────────────────┐
│ Code Agent CLIs on user's machine (one or more of): │
│ Claude Code · Codex · Cursor Agent · Gemini CLI · OpenCode │
└───────────────────────────────────────────────────────────────┘
```
Module responsibilities:
- **Web app** — chat UI, artifact tree, sandboxed iframe preview, comment mode, slider controls, export UI. Stateless; all state lives in the daemon or in the browser's IndexedDB for cloud deploys.
- **Daemon** — long-running local process. Detects agents, registers skills, manages artifacts on disk, resolves the active design system, and brokers WebSocket sessions.
- **Agent adapters** — one adapter per supported CLI; see [`agent-adapters.md`](agent-adapters.md).
- **Skill registry** — scans `~/.claude/skills/`, `./skills/`, and `./.claude/skills/`; merges and exposes a typed catalog.
- **Artifact store** — project-scoped folder (default `./.ocd/`) holding generated files, version snapshots (git-friendly), and per-artifact metadata.
- **Design-system resolver** — loads the active `DESIGN.md`, injects it as skill context.
- **Preview renderer** — sandboxed iframe with vendored React + Babel for JSX artifacts; plain iframe for HTML; PDF via the daemon's headless Chrome.
- **Export pipeline** — HTML (inlined), PDF, PPTX, ZIP, Markdown.
## 6. Non-goals
- **We do not ship a model router.** If the user's agent supports 20 providers, great. If it only supports Anthropic, that's the ceiling. We don't layer our own provider abstraction on top of someone else's.
- **We do not ship a desktop app.** No Electron, no Tauri. The "local" story is a Next.js dev server + a Node daemon. If someone wants a tray icon, that's [`cc-switch`][ccsw]'s job, not ours.
- **We do not reinvent the agent loop.** No custom tool-use harness, no bespoke context-manager. Everything goes through the detected agent's native loop.
- **We do not maintain a skill marketplace in v1.** Skills are git URLs and local folders. A browseable UI is v2.
- **We do not try to compete with Figma.** Output is code (HTML/JSX) and content (`DESIGN.md`, Markdown, PPTX), not editable vector canvases.
- **We do not implement auth / billing / orgs in MVP.** Single-user, single-machine. Multi-user is post-v1 and optional.
## 7. Why not just extend [Open CoDesign][ocod]?
We seriously considered it. The concrete blockers:
1. **It's Electron.** Porting to a web architecture requires ripping out ~40% of the code and rewriting the renderer/main IPC layer. At that point it's a rewrite.
2. **It owns the agent loop.** [`pi-ai`][piai] is a perfectly fine provider abstraction, but it means every skill is written against `pi-ai`'s tool-use format — not against whatever Claude Code, Codex, or Cursor Agent natively speak. We can't reuse existing skills, and existing skills can't reuse us.
3. **Skill format is proprietary.** Its 12 skills are TypeScript modules compiled into the app. A user cannot drop [`guizang-ppt-skill`][guizang] in and have it work; there's no `SKILL.md` loader.
4. **No design system abstraction.** Design tokens live in prompts, not in a versioned file that can be shared across projects.
We keep the good parts: comment mode, slider-emitted parameters, multi-frame preview, single-file HTML export, sandboxed iframe rendering. These are all UI ideas that are orthogonal to the agent layer and we'll absolutely borrow them. See [`references.md`](references.md) for the explicit borrow list.
## 8. Positioning against Anthropic's [Claude Design][cd]
We are **not** trying to out-feature [Claude Design][cd]. Claude Design has Anthropic's model team, internal tooling, and a rendering pipeline we can't match. What we offer instead:
- **Self-hostable.** Run on your laptop, your Vercel, your k8s. Secrets never leave.
- **BYO-agent.** If you're already paying for Cursor, that's your agent. If you've standardized on Codex inside your company, use Codex. No mandatory Anthropic subscription.
- **Skills as files.** Version them in git. Fork them. Ship them to teammates as a repo. Run your team's branded deck skill without rebuilding a product.
- **Design systems as files.** A `DESIGN.md` is an artifact you can review in a PR. Claude Design's "design system" lives in an ephemeral chat.
In short: Claude Design is a product; OCD is a **substrate**.
## 9. Success criteria for v1
- One developer can `git clone && pnpm install && pnpm dev`, point at their Claude Code install, and produce a prototype in under 5 minutes.
- A third party can author a skill in a separate git repo, publish it, and have a user install it by running `ocd skill add <git-url>` without touching OCD's source.
- A design system author can write a `DESIGN.md`, point OCD at it, and have the style propagate across prototype / deck / template outputs.
- Deploying to Vercel with a local daemon works end-to-end (the daemon is reachable via localhost tunnel or a user-provided URL).
- Swapping the underlying agent from Claude Code to Codex requires zero skill changes.
## 10. Open questions (to resolve before coding)
- **Daemon ↔ Vercel bridge.** Do we ship a reverse-tunnel helper (like `cloudflared`), require the user to set one up, or punt to "run locally for now"? My current lean: punt for MVP, helper in v1.
- **Artifact versioning.** Git, or SQLite, or both? [Open CoDesign][ocod] uses SQLite; that's easier but less reviewable. Lean: write artifacts as plain files + a `.ocd/history.jsonl` log. Git is the user's business.
- **Comment mode on non-Claude-Code agents.** Claude Code supports surgical edits via its tool loop. Codex and Gemini CLI are less graceful. Do we degrade to "regenerate whole file" for weaker agents? Lean: yes, document clearly in the adapter table.
- **Skill trust model.** Skills can shell out via the agent. We should at minimum warn on install, and probably sandbox the agent's cwd to the project directory. Claude Code's permission mode handles this for us if we use it; Codex is looser. Needs a per-adapter note.
These go on the roadmap as Phase 0 discovery items.