YAML-driven eval harness. Pair a prompt with a golden set, define rubrics, and run the suite across multiple models in CI. Strong for catching prompt regressions before they hit production.
Promptfoo
Open-source LLM eval CLI. Rubric scoring + golden sets.
Open source · CLI · macOS · Windows · Linux · Vetted
Model support
BYO key / model
- Claude
- GPT
- Gemini
- Local
Where it runs
- CLI
- macOS
- Windows
- Linux
Tags
- #eval
- #ci
- #rubric
- #open-source
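Example
The workflow described for Promptfoo (one prompt, a golden set, a rubric, several models, a CI gate) can be sketched in a few lines of Python. This is an illustration of the idea only, not Promptfoo's config format or API: the names GoldenCase, pass_rates, and regressions, the stub models, and the 0.9 threshold are all hypothetical, and the "rubric" here is just a case-insensitive substring check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A model is anything that maps a rendered prompt to an output string.
Model = Callable[[str], str]


@dataclass
class GoldenCase:
    """One golden-set entry: an input plus the rubric it must satisfy."""
    input_text: str
    must_contain: str  # hypothetical rubric: substring expected in the output


def pass_rates(template: str, golden_set: List[GoldenCase],
               models: Dict[str, Model]) -> Dict[str, float]:
    """Run every golden case through every model and return pass rate per model."""
    rates = {}
    for name, model in models.items():
        passed = 0
        for case in golden_set:
            output = model(template.format(text=case.input_text))
            if case.must_contain.lower() in output.lower():
                passed += 1
        rates[name] = passed / len(golden_set)
    return rates


def regressions(rates: Dict[str, float], threshold: float) -> List[str]:
    """Names of models whose pass rate fell below the CI threshold."""
    return sorted(name for name, rate in rates.items() if rate < threshold)


# Stub models standing in for real providers (assumption: no network calls).
def echo_model(prompt: str) -> str:
    return prompt


def blank_model(prompt: str) -> str:
    return ""


GOLDEN_SET = [
    GoldenCase("Paris is the capital of France.", "Paris"),
    GoldenCase("Water boils at 100 C at sea level.", "100"),
]
PROMPT = "Summarize in one sentence: {text}"

rates = pass_rates(PROMPT, GOLDEN_SET, {"echo": echo_model, "blank": blank_model})
failing = regressions(rates, threshold=0.9)
```

In a CI job, a non-empty failing list would be the signal to exit non-zero and block the change that altered the prompt.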
Related in Eval
Braintrust
Eval · Freemium
Hosted eval + tracing platform for LLM apps.
Production-grade eval orchestration with a dashboard, dataset versioning, and OpenTelemetry tracing. Useful once eval volume outgrows a CI YAML file.
- eval
- tracing
- datasets
- production