Scorers are functions that measure your AI capability’s output. They receive the inputs and outputs of a capability run, and return a score. The same `Scorer` API works in both offline and online evaluations.
The key difference between the two contexts is what the scorer receives:

- Offline scorers receive `input`, `output`, and `expected` (ground truth from your test collection).
- Online scorers are reference-free. They receive `input` and `output` without an `expected` value.
## Create scorers

Create scorers using the `Scorer` wrapper. A scorer takes a name and a scoring function:
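As a sketch of the shape involved (the `Scorer` stand-in below is defined inline so the example runs on its own; in real code you would import it from the SDK):

```typescript
// Hypothetical stand-in for the SDK's Scorer wrapper, defined inline so this
// sketch is self-contained; in real code, import Scorer from the SDK instead.
type ScorerArgs = { input: string; output: string; expected?: string };
type ScoreFn = (args: ScorerArgs) => number | boolean;

const Scorer = (name: string, fn: ScoreFn) => ({ name, fn });

// A scorer takes a name and a scoring function.
const containsGreeting = Scorer("contains-greeting", ({ output }) =>
  output.toLowerCase().includes("hello"),
);
```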
## Return types

Scorers can return three types of values.

### Boolean
Return `true` or `false` for simple pass/fail checks. The SDK converts booleans to 1 (pass) or 0 (fail) and marks the score as boolean in telemetry.
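For instance, a minimal pass/fail sketch (the JSON-validity check is illustrative):

```typescript
// Illustrative pass/fail scorer: the returned boolean is recorded by the SDK
// as 1 (pass) or 0 (fail) and flagged as boolean in telemetry.
const isValidJson = ({ output }: { output: string }): boolean => {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
};
```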
### Numeric

Return a number between 0 and 1 for graded scoring:
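For example, a graded sketch that scores keyword coverage (the required keyword list is an assumption for illustration):

```typescript
// Illustrative graded scorer: fraction of required keywords found in the
// output, always a value between 0 and 1.
const keywordCoverage = ({ output }: { output: string }): number => {
  const required = ["refund", "policy", "days"]; // assumed requirements
  const hits = required.filter((k) => output.toLowerCase().includes(k)).length;
  return hits / required.length;
};
```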
### Score with metadata

Return an object with `score` and `metadata` to attach additional context to the eval span:
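A sketch of the object form (the length budget is an assumption for illustration):

```typescript
// Illustrative scorer returning a score plus metadata; the metadata object
// is attached to the eval span alongside the numeric score.
const withinLengthBudget = ({ output }: { output: string }) => {
  const maxLength = 500; // assumed budget
  return {
    score: output.length <= maxLength ? 1 : maxLength / output.length,
    metadata: { outputLength: output.length, maxLength },
  };
};
```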
## Scorer patterns

### Exact match (offline)
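A minimal sketch of an exact-match scorer (trimming whitespace before comparing is a choice, not a requirement):

```typescript
// Illustrative exact-match scorer: needs the expected value, so it only
// applies in offline evaluations where ground truth exists.
const exactMatch = ({ output, expected }: { output: string; expected: string }): boolean =>
  output.trim() === expected.trim();
```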
Compare the output directly against the expected value. This pattern only works in offline evaluations where ground truth is available.

### Heuristic checks
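A sketch of a reference-free heuristic (the URL-citation check is illustrative):

```typescript
// Illustrative heuristic scorer: checks output format without ground truth,
// so it can run in both offline and online evaluations.
const citesUrl = ({ output }: { output: string }): boolean =>
  /https?:\/\/\S+/.test(output);
```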
Validate output structure or format without ground truth. These scorers work in both offline and online evaluations.

### LLM-as-judge
Use a second model to evaluate the output. Async scorers are useful in both contexts, especially in online evaluations where you don’t have ground truth and need semantic quality assessment.

LLM judge scorers add latency and cost per evaluation. In online evaluations, use sampling to control how often they run.
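A sketch of an async judge with sampling (the `judgeModel` stand-in and the 10% sample rate are assumptions, not SDK APIs):

```typescript
// Illustrative async LLM-as-judge scorer with sampling. The second parameter
// defaults to a random sampling decision; pass it explicitly for testing.
const SAMPLE_RATE = 0.1; // assumed: judge roughly 10% of runs

// Stand-in for a real model client; replace with your provider's call.
async function judgeModel(_prompt: string): Promise<string> {
  return "1";
}

async function llmJudge(
  { input, output }: { input: string; output: string },
  sampled: boolean = Math.random() < SAMPLE_RATE,
): Promise<number | null> {
  if (!sampled) return null; // skipped to control latency and cost
  const verdict = await judgeModel(
    `Rate from 0 to 1 how well the answer addresses the question.\nQ: ${input}\nA: ${output}\nScore:`,
  );
  return Number(verdict);
}
```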
## Use autoevals
The autoevals library provides prebuilt scorers for common tasks:
## Telemetry

Each scorer produces an OTel span with the following attributes:

| Attribute | Description |
|---|---|
| `gen_ai.operation.name` | Always `eval.score` |
| `eval.name` | The eval name |
| `eval.score.name` | The scorer name |
| `eval.score.value` | The numeric score (0-1) |
| `eval.score.metadata` | JSON string of scorer metadata. Includes `eval.score.is_boolean: true` when the scorer returned a boolean. |
| `eval.capability.name` | The capability being evaluated |
| `eval.step.name` | The step within the capability (when set) |
| `eval.tags` | `["online"]` for online evaluations |
## What’s next?
- Use scorers in offline evaluations to test against known-good answers before shipping.
- Use scorers in online evaluations to monitor production quality continuously.