evaluation

This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge,…


Composite 3.0

Craft (C) 3.9 · Adoption (A) 2.4

How we got there

Craft · D1–D5

D1 · Trigger clarity 4.5
D2 · Output specificity 3.5
D3 · Scope precision 4.0
D4 · Self-containment 3.5
D5 · Reusability 4.0

Adoption · A1–A5

A1 · Maintenance 2.5
A2 · Documentation 1.0
A3 · License 2.5
A4 · Adoption 4.2
A5 · Authorship 2.0
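
As a rough check on the two category scores, the sketch below assumes (the page does not say so) that each category score is the unweighted mean of its five dimensions, rounded to one decimal. Under that assumption the dimensions listed above reproduce C 3.9 and A 2.4. The variable names are illustrative only, and how the two category scores combine into the composite is not stated on this page, so that step is left out.

```python
from statistics import mean

# Dimension scores copied from the listing above.
craft = {"D1": 4.5, "D2": 3.5, "D3": 4.0, "D4": 3.5, "D5": 4.0}
adoption = {"A1": 2.5, "A2": 1.0, "A3": 2.5, "A4": 4.2, "A5": 2.0}

# Assumption: a category score is the plain mean of its dimensions,
# rounded to one decimal place. The page does not confirm this.
craft_score = round(mean(craft.values()), 1)
adoption_score = round(mean(adoption.values()), 1)

print(f"C {craft_score} · A {adoption_score}")
```

The Adoption mean is pulled down mostly by A2 (Documentation, 1.0); A4 (Adoption, 4.2) is the only Adoption dimension above 2.5.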

Cross-validation

1 source verified

Install

/plugin install evaluation

Auto-indexed. Editorial review is pending, so the score is based on the rubric only.