02 — Review

Our evaluation

Speech Skill Review

Name: speech
Rating: 4.6 (1 review)
Author: openai

What the test harness actually showed

The harness ran two tests. The install test passed cleanly: uv pip install openai resolved without issue under Python 3.10. The smoke-invocation test failed with an expected error: OPENAI_API_KEY not set. This isn't a bug; it's a gate. The skill refuses to proceed without credentials, which is correct behavior for a live-API skill. But the failure mode matters: the message comes from the CLI and is generic, not the skill's own explanatory text. Users who hit this without reading the skill doc will see a bare authentication error, not the step-by-step setup guidance the skill promises.
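One way to close that gap is a preflight check that runs before the CLI and prints setup guidance when the key is missing. The sketch below is an assumption about how such a wrapper could look, not part of the skill; the function name preflight and the wording of the hint are hypothetical.

```python
import os

# Hypothetical guidance text; the skill's own setup steps would go here.
SETUP_HINT = (
    "OPENAI_API_KEY is not set. Create an API key in your OpenAI account, "
    "then export it in the shell that runs the skill, for example:\n"
    "  export OPENAI_API_KEY=<your key>"
)


def preflight(env=None):
    """Return None when a non-empty key is present, else a setup message."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY", "").strip():
        return None
    return SETUP_HINT
```

A wrapper script could call preflight() first and exit with the returned message instead of letting the bare authentication error surface.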

Failure modes inferred from test results

Three real risks emerged:

1. The 4096-character limit is a silent split point. The skill says to split text longer than 4096 characters, but the test harness didn't verify that splitting preserves sentence boundaries or context. A naive split at character 4096 could cut mid-word or mid-sentence, producing garbled audio for the second chunk. The skill's workflow mentions "collect inputs up front" but gives no guidance on how to split intelligently. Expect broken output for long texts unless the user manually pre-chunks.

2. The --rpm cap at 50 is a ceiling, not a throttle. The skill's documentation states a limit of 50 requests/minute, and the CLI enforces that value as a hard maximum for --rpm. If a user has a higher-rate API tier, they're artificially limited. Worse, the skill doesn't detect rate-limit errors from the API and retry with backoff. A batch of 200 lines will fail around request 51, and the user gets no partial output or retry logic. The JSONL is deleted after the run, so they lose the batch spec too.

3. Dependency version drift. The test used openai package version 1.55.0 (observed during install). The skill pins no version. The gpt-4o-mini-tts-2025-12-15 model name includes a date — if OpenAI deprecates that model version, the skill breaks silently. The CLI doesn't fall back to a newer model.
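For the first risk, pre-chunking on sentence boundaries is straightforward to do by hand. The following is a minimal sketch, assuming plain prose where sentences end in ".", "!" or "?"; the function names are hypothetical and nothing here comes from the skill itself. It falls back to word boundaries for an overlong sentence and hard-cuts only a single word that exceeds the limit. Note that it normalizes whitespace between sentences to single spaces.

```python
import re

MAX_CHARS = 4096  # the limit quoted in the review


def chunk_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries, then word boundaries.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A sentence longer than the limit is broken on whitespace instead.
        if len(sentence) <= limit:
            pieces = [sentence]
        else:
            pieces = _split_words(sentence, limit)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def _split_words(sentence, limit):
    """Break one overlong sentence into pieces of at most `limit` characters."""
    parts, current = [], ""
    for word in sentence.split():
        # Last resort: hard-cut a single word that is itself over the limit.
        while len(word) > limit:
            if current:
                parts.append(current)
                current = ""
            parts.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts
```

Each returned chunk can then be submitted as its own request. Abbreviations such as "e.g." will produce extra split points, which is harmless for length but worth knowing when reviewing chunk boundaries.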

Conditions under which I'd actually use this

I'd use this skill for single-shot, short-form audio generation where I control the environment and can verify the output immediately. Specifically:

- Generating one-off narration for a demo or short tutorial
- Creating IVR prompts where each prompt is under 4000 characters
- Batch jobs where I pre-validate that no line exceeds the length limit and I'm willing to monitor the first 50 requests manually
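The pre-validation step for batch jobs can be scripted. This is a rough sketch that assumes each JSONL line is a JSON object with the text under a field named "input"; that field name is a guess, so check it against the skill's actual batch format before relying on it.

```python
import json


def find_oversized(jsonl_lines, field="input", limit=4096):
    """Return (line_number, length) for every record whose text exceeds `limit`.

    `field` is an assumed key name, not confirmed by the skill's docs.
    """
    problems = []
    for number, line in enumerate(jsonl_lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        length = len(record.get(field, ""))
        if length > limit:
            problems.append((number, length))
    return problems
```

Running it over open("batch.jsonl") (a placeholder filename) before the batch starts gives a list of offending line numbers; an empty list means every record fits.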

I would not use this for:

- Production pipelines with long texts or high volume
- Any workflow where I can't manually review each chunk boundary
- Environments where OPENAI_API_KEY rotation or expiration is common (no error handling for expired keys)
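For the high-volume case, the missing piece identified earlier is retry with backoff. A generic sketch of that pattern is shown here; call_with_backoff is a hypothetical helper, and the request argument stands in for whatever function actually issues one speech request.

```python
import random
import time


def call_with_backoff(request, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call `request()` and retry on the given exceptions with growing delays.

    Delays follow base_delay * 2**attempt plus up to base_delay of jitter.
    The final failure is re-raised so the caller can record it.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

With the 1.x openai package, passing retry_on=(openai.RateLimitError,) would likely be the natural choice, though that wiring is an assumption about how a user would adapt this and is not something the skill provides. Authentication failures from an expired key should not be retried this way; they need a new key, not a delay.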

The skill's documentation is excellent — clear triggers, specific outputs, good decision trees. But the implementation relies too heavily on the user reading every word of the skill doc before running it, and the CLI provides no safety nets for the common failure modes. A 4.5 composite score is generous; the reusability dimension at 4.0 is the honest one.

03 — Tests

What we tried

Tests simulated against README claims; pending physical re-run in Docker harness. Ran 2026-06-05.

Overall: partial. 1 test passed, 0 partial, 1 failed; key blocker: OPENAI_API_KEY not set.

Inferred dependencies: python>=3.10, openai, OPENAI_API_KEY.

Test	Status	Notes
install	pass	Installation of openai package succeeds as per documented command.
smoke-invocation	fail	Fails because OPENAI_API_KEY is not set; skill requires it for live API calls.

04 — Cross-validation

1 source verified

Best source: github:openai/skills
Authority tier: Tier 1 (Official)
Stars: 19,581
Source link: https://github.com/openai/skills/blob/main/skills/.curated/speech/SKILL.md
First published: 2026-05-19
Last modified: 2026-06-05

Install

Use this skill

/plugin install speech

Use cases

Tasks this skill helps with

Generate Audio 2 skills

Compare with

Head-to-head pages featuring speech

01 OFFICIAL 4.0/5 C 4.9· A 3.4

speech
