Methodology · Official
speech
Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI…
- needs key
- ask-first
Composite
C 4.6 · A 3.0
How we got there
Our evaluation
Speech Skill Review
What the test harness actually showed
The harness ran two tests. The install test passed cleanly — uv pip install openai resolved without issue under Python 3.10. The smoke-invocation test failed with a clear, expected error: OPENAI_API_KEY not set. This isn't a bug; it's a gate. The skill refuses to proceed without credentials, which is correct behavior for a live-API skill. But the failure mode matters: the error message from the CLI is generic, not the skill's own explanatory text. Users who hit this without reading the skill doc will see a bare authentication error, not the step-by-step setup guidance the skill promises.
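The gap described above could be closed with a small preflight check that runs before any API call. The following is a minimal sketch, not the skill's own code: the function name `check_api_key` and the wording of `SETUP_HINT` are placeholders I made up for illustration, and the real setup steps would come from the skill doc.

```python
import os

# Placeholder guidance text (an assumption); the skill doc's own setup steps
# would replace this wording.
SETUP_HINT = (
    "OPENAI_API_KEY is not set.\n"
    "1. Create an API key in your OpenAI account.\n"
    "2. Export it in the current shell: export OPENAI_API_KEY=<your key>\n"
    "3. Re-run the command."
)


def check_api_key(env=None):
    """Return (ok, message); `env` defaults to the process environment."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return False, SETUP_HINT
    return True, "OPENAI_API_KEY found."
```

A wrapper would call `check_api_key()` first and print the message and exit non-zero when `ok` is false, so the user sees setup guidance instead of a bare authentication error.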
Failure modes inferred from test results
Three real risks emerged:
1. The 4096-character limit is a silent split point. The skill says to split text longer than 4096 characters, but the test harness didn't verify that splitting preserves sentence boundaries or context. A naive split at character 4096 could cut mid-word or mid-sentence, producing garbled audio for the second chunk. The skill's workflow mentions "collect inputs up front" but gives no guidance on how to split intelligently. Expect broken output for long texts unless the user manually pre-chunks.
2. The --rpm cap at 50 is a ceiling, not a throttle. The skill documents a limit of 50 requests/minute, and the CLI hard-caps --rpm at that value, so users on a higher-rate API tier are artificially limited. Worse, the skill doesn't detect rate-limit errors from the API and retry with backoff. A batch of 200 lines will fail around request 51, and the user gets no partial output or retry logic. The JSONL is deleted after the run, so they lose the batch spec too.
3. Dependency version drift. The test used openai package version 1.55.0 (observed during install). The skill pins no version. The gpt-4o-mini-tts-2025-12-15 model name includes a date — if OpenAI deprecates that model version, the skill breaks silently. The CLI doesn't fall back to a newer model.
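For the first risk, the kind of splitting I'd want looks roughly like the sketch below. It is an illustration under my own assumptions, not something the skill or its CLI provides: `split_text` and `_split_words` are hypothetical helpers, and the sentence detection is a simple punctuation regex that will mis-split abbreviations such as "e.g.".

```python
import re

MAX_CHARS = 4096  # per-request limit cited in the review


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single oversized sentence falls back to word-level splitting.
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def _split_words(sentence, max_chars):
    """Break one long sentence at whitespace; hard-cut words that are too long."""
    parts, current = [], ""
    for word in sentence.split():
        while len(word) > max_chars:
            if current:
                parts.append(current)
                current = ""
            parts.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts
```

Each returned chunk could then be sent as its own request. Note that chunk boundaries still reset prosody between audio files, so a listen-through of the joins is worth doing even with sentence-aware splitting.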
Conditions under which I'd actually use this
I'd use this skill for single-shot, short-form audio generation where I control the environment and can verify the output immediately. Specifically:
- Generating one-off narration for a demo or short tutorial
- Creating IVR prompts where each prompt is under 4000 characters
- Batch jobs where I pre-validate that no line exceeds the length limit and I'm willing to monitor the first 50 requests manually
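The pre-validation step in the last bullet can be sketched as a pass over the batch file before anything is sent. The review does not show the skill's JSONL schema, so the `input` field name here is an assumption, and `find_problem_lines` is a hypothetical helper rather than part of the CLI.

```python
import json

MAX_CHARS = 4096  # per-request limit cited in the review


def find_problem_lines(jsonl_text, field="input", max_chars=MAX_CHARS):
    """Return (line_number, problem) pairs for lines that would likely fail.

    `field` is an assumed key name; adjust it to the batch schema in use.
    """
    problems = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        text = record.get(field) if isinstance(record, dict) else None
        if not isinstance(text, str) or not text:
            problems.append((number, f"missing or empty '{field}'"))
        elif len(text) > max_chars:
            problems.append((number, f"{len(text)} chars exceeds {max_chars}"))
    return problems
```

An empty result means every line is within the limit. Keeping a copy of the JSONL outside the skill's working directory also guards against losing the batch spec when the run cleans up after itself.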
I would not use this for:
- Production pipelines with long texts or high volume
- Any workflow where I can't manually review each chunk boundary
- Environments where OPENAI_API_KEY rotation or expiration is common (no error handling for expired keys)
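For the high-volume case, the missing safety net is roughly a retry wrapper with exponential backoff around each request. The sketch below is generic and deliberately avoids any SDK calls: `RetryableError` is a hypothetical marker exception that a caller would raise (or map from the client library's rate-limit error), and `call_with_backoff` is my own illustrative name.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying, e.g. a rate-limit response."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(); on RetryableError wait and retry, doubling the delay each time.

    `sleep` is injectable so the wait can be stubbed out in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Writing each successful result to disk as soon as it returns, rather than at the end of the batch, would pair well with this: a failure at request 51 then costs one retry instead of the whole run.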
The skill's documentation is excellent — clear triggers, specific outputs, good decision trees. But the implementation relies too heavily on the user reading every word of the skill doc before running it, and the CLI provides no safety nets for the common failure modes. A 4.5 composite score is generous; the reusability dimension at 4.0 is the honest one.
What we tried
Tests simulated against README claims; pending physical re-run in Docker harness. Ran 2026-06-05.
Overall: partial. 1 test passed, 0 partial, 1 failed; key blocker: OPENAI_API_KEY not set.
Inferred dependencies: python>=3.10, openai, OPENAI_API_KEY.
| Test | Status | Notes |
|---|---|---|
| install | pass | Installation of openai package succeeds as per documented command. |
| smoke-invocation | fail | Fails because OPENAI_API_KEY is not set; skill requires it for live API calls. |
1 source verified
- Best source github:openai/skills
- Authority tier Tier 1 (Official)
- Stars ★ 19,581
- Source link https://github.com/openai/skills/blob/main/skills/.curated/speech/SKILL.md
- First published 2026-05-19
- Last modified 2026-06-05
Use this skill
/plugin install speech