A skill, eating its own dog food

humangarden-review.

Every entry on /tested/ (11 so far) was installed, fed a real input, and judged from the artifact that came out. That protocol is itself a Claude skill — written to the same SKILL.md format we evaluate. Run it on anyone's work, including ours.

What this is

humangarden is a directory of agent skills. Most entries are indexed algorithmically — read once, scored on craft and adoption. A growing subset gets tested: installed end-to-end, fed a representative input, the real output captured as an artifact.

The recipe for that test is fixed: 8 numbered steps, 5 non-negotiable quality rules, a frontmatter schema for the verdict, and a markdown template for the writeup. Same recipe every time. We packaged it as a Claude skill — humangarden-review — so anyone with Claude Code can run our protocol on any other skill.

Including ours. The skill includes itself as an example.

Install

git clone https://github.com/homosapien1218/agent-skills-site
cd agent-skills-site/humangarden-review

# Point Claude Code at the SKILL.md and ask:
claude code --skill ./SKILL.md \
  "review github.com/anthropics/skills/tree/main/pdf"

The skill works in any Claude Code session. It does not depend on humangarden internals — outputs a generic review-output/ folder. Ingest it into the humangarden directory only if you want; otherwise keep the review as your own.

The non-negotiables

You must install and run the target. Reading the SKILL.md is step 0; step 1 is pip install, step 2 is feeding it a real input. Reviews from README-skimming alone are rejected.
The artifacts you capture must be real. No fabricated transcripts, no claimed outputs, no "this is what it would look like". The file in artifacts/ must be the literal file the skill emitted. Verifiable byte-for-byte downstream.
The verdict reflects what you saw, not what the README promised. If prose extraction is clean but table extraction silently fails, the verdict is "do not trust the fallback path", not "~95% accurate".
Caveats are required. Every skill has at least one. If you can't find one, you didn't push hard enough — try empty input, twice-the-size input, malformed input, missing optional deps.
Date everything. Six-month-old reviews get archived. Reality drifts.

What you can read

Source on GitHub ↗ the canonical version of the skill, including scripts/, templates/, and three completed examples (pdf, docx, pptx)
SKILL.md (raw) ↗ just the protocol body — what Claude reads when invoked
/tested/ — the catalog produced by running this skill 11 skills reviewed so far. Every entry is one full pass of the protocol below.