Agent Infrastructure  ·  Curated marketplace

agent-eval

Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics


Composite

4.2

C 4.2 · A 0.0

How we got there

Craft · D1–D5

D1 · Trigger clarity 4.5
D2 · Output specificity 4.5
D3 · Scope precision 4.5
D4 · Self-containment 3.5
D5 · Reusability 4.0

02 — Cross-validation

1 source verified

Install

Use this skill

/plugin install agent-eval

Auto-indexed. Editorial review pending — score is based on the rubric only.