CLI & API Wrappers · Official
chatgpt-apps
Build, scaffold, refactor, and troubleshoot ChatGPT Apps SDK applications that combine an MCP server and widget UI.
Composite
C 4.8 · A 3.2
How we got there
Our evaluation
Tier-2 Review: chatgpt-apps (CLI and API Cluster)
What We Attempted
We evaluated the chatgpt-apps skill, which is designed to scaffold, build, refactor, and troubleshoot ChatGPT Apps SDK applications. The skill promises a docs-first, example-first workflow that produces MCP server scaffolds, widget UI scaffolds, tool plans, validation reports, and deployment guidance. It is a meta-skill for code generation, not a runtime tool.
The test harness attempted four operations: install-and-auth, list-or-read, write-or-mutate, and rate-limit-handling. These are standard probes for skills that expose CLI or SDK operations.
What Failed
0 tests passed, 1 partial, 3 failed.
The core failure mode is a fundamental mismatch between the skill's design and the test harness assumptions:
- list-or-read (fail): The skill does not expose a direct list operation. It is a prompt-based workflow for scaffolding code, not a runtime that enumerates tools or resources. The underlying @modelcontextprotocol/ext-apps SDK may have listTools, but the skill's purpose is to generate code, not to execute it. Without a real API key and network, this invocation would fail with a network error or auth failure.
- write-or-mutate (fail): The skill is a code generator, not a runtime. The ext-apps SDK likely does not support creating tools at runtime; tools are defined in code. This invocation would fail with a "method not found" error or a permission error. No idempotency or rollback semantics are defined.
- rate-limit-handling (fail): The skill does not implement any rate-limiting logic. It is a prompt-based workflow. The ext-apps SDK may handle 429 errors, but the skill itself does not back off or surface them. This test is not applicable to the skill's actual behavior.
- install-and-auth (partial): The skill wraps the ext-apps SDK. With a dummy key, the SDK likely returns a 401 or auth error. While this failure should be surfaced cleanly, the skill is a prompt-based workflow, not a CLI tool. The "install" step is simulated via npm, and the auth failure is expected to be caught and reported, but the test could not verify this cleanly.
What We Observed
The skill is well-documented and structurally sound as a code-generation workflow, but it is not a runnable CLI or SDK. The test harness assumed it would expose standard operations like list/write/rate-limit, which it does not. The skill's true interface is a set of prompt patterns and references that guide an LLM (like Codex) to produce correct ChatGPT Apps SDK code.
Key observations:
- The skill depends on node>=18 and @modelcontextprotocol/ext-apps, which are real dependencies.
- It references an external openai-docs skill for docs fetching, which is a hard dependency.
- The skill's output is code and documentation, not runtime behavior.
- The test harness failures are systematic: the skill was never designed to pass these tests.
Rating: Theoretical Until Physical Re-run
The composite score of 4.8/5.0 is theoretical. The skill's dimensions (trigger clarity 5.0, output specificity 5.0, scope precision 5.0, self-containment 5.0, reusability 3.5) describe its design quality, not its testability. The reusability score of 3.5 is the most honest dimension, reflecting that the skill is tightly coupled to OpenAI's ecosystem.
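As a quick arithmetic check on the dimension scores above, the sketch below computes their unweighted mean. The review does not state how the composite is derived, so equal weighting is an assumption made here purely for illustration; under that assumption the mean comes out at 4.7 rather than 4.8, which suggests the published composite uses some weighting or rounding that is not described.

```typescript
// Dimension scores as listed in the review.
const dimensions: { name: string; score: number }[] = [
  { name: "trigger clarity", score: 5.0 },
  { name: "output specificity", score: 5.0 },
  { name: "scope precision", score: 5.0 },
  { name: "self-containment", score: 5.0 },
  { name: "reusability", score: 3.5 },
];

// ASSUMPTION: equal weights. The actual composite formula is not given in the review.
function unweightedMean(scores: { name: string; score: number }[]): number {
  const total = scores.reduce((sum, d) => sum + d.score, 0);
  return total / scores.length;
}

console.log(unweightedMean(dimensions).toFixed(1)); // prints "4.7"
```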
Until a physical re-run is performed with a real API key, network access, and a test harness that matches the skill's actual interface (prompt-based code generation), these scores remain aspirational. The test harness needs to be redesigned to probe the skill's actual behavior: verifying that it produces correct scaffolds, uses docs correctly, and generates valid MCP server and widget code.
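A redesigned harness of the kind described above could begin with structural checks on the generated output rather than runtime calls. The following is a minimal sketch under stated assumptions: the `GeneratedFiles` shape, the `checkScaffold` function, and the `server/` and `widget/` path prefixes are all hypothetical and are not taken from the skill or the existing harness. Only the dependency name @modelcontextprotocol/ext-apps comes from the review.

```typescript
// Hypothetical representation of a skill run's output: file path -> file contents.
type GeneratedFiles = Record<string, string>;

interface ScaffoldCheck {
  name: string;
  passed: boolean;
}

// Hypothetical structural checks; the path prefixes are illustrative assumptions.
function checkScaffold(files: GeneratedFiles): ScaffoldCheck[] {
  const paths = Object.keys(files);
  const pkgText = files["package.json"];

  // Parse dependencies defensively so malformed JSON counts as a failed check, not a crash.
  let deps: Record<string, string> = {};
  try {
    deps = pkgText ? (JSON.parse(pkgText).dependencies ?? {}) : {};
  } catch {
    deps = {};
  }

  return [
    { name: "has package.json", passed: pkgText !== undefined },
    {
      name: "declares @modelcontextprotocol/ext-apps",
      passed: "@modelcontextprotocol/ext-apps" in deps,
    },
    { name: "has an MCP server file", passed: paths.some((p) => p.startsWith("server/")) },
    { name: "has a widget UI file", passed: paths.some((p) => p.startsWith("widget/")) },
  ];
}
```

Checks like these only confirm that a scaffold has the expected shape. Verifying that the generated server and widget code is valid would still require building and running it in the physical re-run.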
Is the Skill Valuable in Principle?
Yes, absolutely. The skill addresses a real need: building ChatGPT Apps SDK applications is complex, and a docs-first, example-first workflow reduces errors and accelerates development. The skill's structure is thoughtful, with mandatory doc fetching, archetype classification, and repository contract validation. The references to upstream examples and fallback scaffolds are practical.
The value proposition remains strong for developers using Codex or similar LLMs to scaffold ChatGPT Apps. The skill's design decisions—preferring official docs over repo patterns, citing sources, and providing compact checklists—are sound software engineering practices.
The skill is not broken; it is misaligned with the test harness. With appropriate testing infrastructure that validates its code-generation outputs rather than its runtime behavior, this skill would likely perform well. The theoretical rating should be treated as a design review, not a production certification, until re-run.
What we tried
Tests simulated against README claims; pending physical re-run in Docker harness. Ran 2026-05-22.
Overall: broken. 0 tests passed, 1 partial, 3 failed; key blocker: the skill is a code-generation workflow, not a runnable CLI or SDK, so direct invocations against the underlying ext-apps SDK fail without a real API key and network, and the skill does not expose list/write/rate-limit operations.
Inferred dependencies: node>=18, @modelcontextprotocol/ext-apps, openai-docs skill (external).
| Test | Status | Notes |
|---|---|---|
| install-and-auth | partial | The skill wraps the ext-apps SDK. With a dummy key, the SDK likely returns a 401 or auth error, which should be surfaced cleanly. However, the skill itself is a prompt-based workflow, not a CLI tool, so 'install' is simulated via npm. The auth failure is expected to be caught and reported. |
| list-or-read | fail | The skill does not expose a direct list operation; it is a meta-skill for scaffolding. The underlying ext-apps SDK may have listTools, but the skill's purpose is to generate code, not to run it. Without a real API key and network, this invocation would fail with a network error or auth failure. |
| write-or-mutate | fail | The skill is a code generator, not a runtime. The ext-apps SDK may not support creating tools at runtime; tools are defined in code. This invocation would likely fail with a method not found error or a permission error. No idempotency or rollback semantics are defined. |
| rate-limit-handling | fail | The skill does not implement any rate-limiting logic; it is a prompt-based workflow. The ext-apps SDK may handle 429s, but the skill itself does not back off or surface them. This test is not applicable to the skill's actual behavior. |
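
The summary line can be derived mechanically from the table above. The sketch below encodes the four rows as data and tallies them; the `TestResult` type and `summarize` function are hypothetical names introduced for illustration and are not part of the harness.

```typescript
type Status = "pass" | "partial" | "fail";

interface TestResult {
  test: string;
  status: Status;
}

// The four probes and their statuses, as reported in the table.
const results: TestResult[] = [
  { test: "install-and-auth", status: "partial" },
  { test: "list-or-read", status: "fail" },
  { test: "write-or-mutate", status: "fail" },
  { test: "rate-limit-handling", status: "fail" },
];

// Count each status and format the tally in the same wording the review uses.
function summarize(rs: TestResult[]): string {
  const count = (s: Status): number => rs.filter((r) => r.status === s).length;
  return `${count("pass")} tests passed, ${count("partial")} partial, ${count("fail")} failed`;
}

console.log(summarize(results)); // prints "0 tests passed, 1 partial, 3 failed"
```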
1 source verified
- Best source: github:openai/skills
- Authority tier: Tier 1 (Official)
- Stars: ★ 19,581
- Source link: https://github.com/openai/skills/blob/main/skills/.curated/chatgpt-apps/SKILL.md
- First published: 2026-05-19
- Last modified: 2026-05-22
Use this skill
/plugin install chatgpt-apps