02 — Review

Our evaluation

Tier-2 Review: chatgpt-apps (CLI and API Cluster)

Name: chatgpt-apps
Rating: 4.8 (1 review)
Author: openai

What We Attempted

We evaluated the chatgpt-apps skill, which is designed to scaffold, build, refactor, and troubleshoot ChatGPT Apps SDK applications. The skill promises a docs-first, example-first workflow that produces MCP server scaffolds, widget UI scaffolds, tool plans, validation reports, and deployment guidance. It is a meta-skill for code generation, not a runtime tool.

The test harness attempted four operations: install-and-auth, list-or-read, write-or-mutate, and rate-limit-handling. These are standard probes for skills that expose CLI or SDK operations.

What Failed

0 tests passed, 1 partial, 3 failed.

The core failure mode is a fundamental mismatch between the skill's design and the test harness's assumptions:

list-or-read (fail): The skill does not expose a direct list operation. It is a prompt-based workflow for scaffolding code, not a runtime that enumerates tools or resources. The underlying @modelcontextprotocol/ext-apps SDK may have listTools, but the skill's purpose is to generate code, not to execute it. Without a real API key and network, this invocation would fail with a network error or auth failure.
write-or-mutate (fail): The skill is a code generator, not a runtime. The ext-apps SDK likely does not support creating tools at runtime; tools are defined in code. This invocation would fail with a "method not found" error or a permission error. No idempotency or rollback semantics are defined.
rate-limit-handling (fail): The skill does not implement any rate-limiting logic. It is a prompt-based workflow. The ext-apps SDK may handle 429 errors, but the skill itself does not back off or surface them. This test is not applicable to the skill's actual behavior.
install-and-auth (partial): The skill wraps the ext-apps SDK but is itself a prompt-based workflow, not a CLI tool, so the "install" step was simulated via npm. With a dummy key, the SDK likely returns a 401 or similar auth error, which should be caught and reported cleanly, but the simulated run could not verify this.

What We Observed

The skill is well-documented and structurally sound as a code-generation workflow, but it is not a runnable CLI or SDK. The test harness assumed it would expose standard operations like list/write/rate-limit, which it does not. The skill's true interface is a set of prompt patterns and references that guide an LLM (like Codex) to produce correct ChatGPT Apps SDK code.

Key observations:

The skill depends on node>=18 and @modelcontextprotocol/ext-apps, both of which are real, installable dependencies.
It relies on an external openai-docs skill to fetch documentation; this is a hard dependency.
The skill's output is code and documentation, not runtime behavior.
The test harness failures are systematic: the skill was never designed to pass these tests.
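
One way a harness could make this mismatch explicit is to check a skill's interface kind before running probes, and to report "not-applicable" instead of "fail" when a probe cannot meaningfully run. The sketch below is illustrative only: the interface labels, the Probe class, and the plan function are hypothetical names, not part of the skill or of the harness described here.

```python
from dataclasses import dataclass

# Hypothetical interface labels; neither the skill nor the harness defines them.
PROMPT_WORKFLOW = "prompt-workflow"
CLI = "cli"
SDK = "sdk"


@dataclass(frozen=True)
class Probe:
    name: str
    requires: frozenset  # interface kinds this probe can meaningfully exercise


# The four probes named in the review, all assumed to need a CLI or SDK surface.
PROBES = [
    Probe("install-and-auth", frozenset({CLI, SDK})),
    Probe("list-or-read", frozenset({CLI, SDK})),
    Probe("write-or-mutate", frozenset({CLI, SDK})),
    Probe("rate-limit-handling", frozenset({CLI, SDK})),
]


def plan(skill_kind: str) -> dict:
    """Map each probe name to 'run' or 'not-applicable' for the given skill kind."""
    return {
        probe.name: "run" if skill_kind in probe.requires else "not-applicable"
        for probe in PROBES
    }


for name, decision in plan(PROMPT_WORKFLOW).items():
    print(f"{name}: {decision}")
```

Under these assumptions, a prompt-based skill would be routed away from all four probes rather than being recorded as failing them.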

Rating: Theoretical Until Physical Re-run

The composite score of 4.8/5.0 is theoretical. The skill's dimensions (trigger clarity 5.0, output specificity 5.0, scope precision 5.0, self-containment 5.0, reusability 3.5) describe its design quality, not its testability. The reusability score of 3.5 is the most honest dimension, reflecting that the skill is tightly coupled to OpenAI's ecosystem.
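
The review does not state how the composite is derived from the five dimensions. As a quick sanity check, assuming equal weights (an assumption, not a documented formula), the quoted scores can be averaged directly:

```python
# Dimension scores as quoted in the review.
dimensions = {
    "trigger clarity": 5.0,
    "output specificity": 5.0,
    "scope precision": 5.0,
    "self-containment": 5.0,
    "reusability": 3.5,
}

# Assumption: every dimension carries equal weight.
unweighted_mean = sum(dimensions.values()) / len(dimensions)
print(round(unweighted_mean, 2))
```

The equal-weight mean is 23.5 / 5 = 4.7, not 4.8, so the published composite presumably uses a weighting or rounding scheme that is not described here.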

Until a physical re-run is performed with a real API key, network access, and a test harness that matches the skill's actual interface (prompt-based code generation), these scores remain aspirational. The test harness needs to be redesigned to probe the skill's actual behavior: verifying that it produces correct scaffolds, uses docs correctly, and generates valid MCP server and widget code.
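
As a minimal sketch of what an output-focused check might look like, the function below inspects a generated scaffold for the two dependencies the review infers (node>=18 and @modelcontextprotocol/ext-apps). The scaffold layout (a top-level package.json), the check_scaffold name, and the sample inputs are assumptions for illustration; a real harness would also need to validate the MCP server and widget code itself.

```python
import json

REQUIRED_DEP = "@modelcontextprotocol/ext-apps"  # dependency named in the review


def check_scaffold(files: dict) -> list:
    """Return a list of problems found in a generated scaffold.

    `files` maps relative paths to file contents. The expected layout
    (a top-level package.json) is an assumption, not a documented contract.
    """
    raw = files.get("package.json")
    if raw is None:
        return ["package.json is missing"]
    try:
        manifest = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"package.json is not valid JSON: {exc}"]
    if not isinstance(manifest, dict):
        return ["package.json does not contain a JSON object"]

    problems = []
    deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
    if REQUIRED_DEP not in deps:
        problems.append(f"{REQUIRED_DEP} is not declared as a dependency")
    if "node" not in manifest.get("engines", {}):
        problems.append("engines.node is not declared (the review infers node>=18)")
    return problems


# Hypothetical sample scaffolds, not real skill output.
good = {"package.json": json.dumps({
    "dependencies": {REQUIRED_DEP: "*"},
    "engines": {"node": ">=18"},
})}
bad = {"package.json": json.dumps({"dependencies": {}})}

print(check_scaffold(good))
print(check_scaffold(bad))
```

A check of this shape runs without an API key or network access, which is why it suits a code-generation skill better than the runtime probes above.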

Is the Skill Valuable in Principle?

Yes, absolutely. The skill addresses a real need: building ChatGPT Apps SDK applications is complex, and a docs-first, example-first workflow reduces errors and accelerates development. The skill's structure is thoughtful, with mandatory doc fetching, archetype classification, and repository contract validation. The references to upstream examples and fallback scaffolds are practical.

The value proposition remains strong for developers using Codex or similar LLMs to scaffold ChatGPT Apps. The skill's design decisions—preferring official docs over repo patterns, citing sources, and providing compact checklists—are sound software engineering practices.

The skill is not broken; it is misaligned with the test harness. With appropriate testing infrastructure that validates its code-generation outputs rather than its runtime behavior, this skill would likely perform well. Until the tests are physically re-run, the theoretical rating should be treated as a design review, not a production certification.

03 — Tests

What we tried

Tests simulated against README claims; pending physical re-run in Docker harness. Ran 2026-05-22.

Overall: broken. 0 tests passed, 1 partial, 3 failed. Key blocker: the skill is a code-generation workflow, not a runnable CLI or SDK. It does not expose list, write, or rate-limit operations, and direct invocations against the underlying ext-apps SDK fail without a real API key and network access.

Inferred dependencies: node>=18, @modelcontextprotocol/ext-apps, openai-docs skill (external).

Test	Status	Notes
install-and-auth	partial	The skill wraps the ext-apps SDK. With a dummy key, the SDK likely returns a 401 or auth error, which should be surfaced cleanly. However, the skill itself is a prompt-based workflow, not a CLI tool, so 'install' is simulated via npm. The auth failure is expected to be caught and reported.
list-or-read	fail	The skill does not expose a direct list operation; it is a meta-skill for scaffolding. The underlying ext-apps SDK may have listTools, but the skill's purpose is to generate code, not to run it. Without a real API key and network, this invocation would fail with a network error or auth failure.
write-or-mutate	fail	The skill is a code generator, not a runtime. The ext-apps SDK may not support creating tools at runtime; tools are defined in code. This invocation would likely fail with a method not found error or a permission error. No idempotency or rollback semantics are defined.
rate-limit-handling	fail	The skill does not implement any rate-limiting logic; it is a prompt-based workflow. The ext-apps SDK may handle 429s, but the skill itself does not back off or surface them. This test is not applicable to the skill's actual behavior.
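
The summary line can be reproduced from the table by counting statuses. This is a small sketch: the test names and statuses are copied from the table above, while the "pass" label is an assumption, since no test in this run passed.

```python
from collections import Counter

# Test names and statuses copied from the table above.
results = {
    "install-and-auth": "partial",
    "list-or-read": "fail",
    "write-or-mutate": "fail",
    "rate-limit-handling": "fail",
}

# Counter returns 0 for labels that never occur, such as "pass" here.
counts = Counter(results.values())
summary = f"{counts['pass']} tests passed, {counts['partial']} partial, {counts['fail']} failed"
print(summary)  # 0 tests passed, 1 partial, 3 failed
```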

04 — Cross-validation

1 source verified

Best source: github:openai/skills
Authority tier: Tier 1 (Official)
Stars: ★ 19,581
Source link: https://github.com/openai/skills/blob/main/skills/.curated/chatgpt-apps/SKILL.md
First published: 2026-05-19
Last modified: 2026-05-22

Install

Use this skill

/plugin install chatgpt-apps
