Learn how to measure the quality of your AI capabilities by running evaluations against ground truth data.
Evaluations are defined with the `Eval` function, which will be available in the `@axiomhq/ai/evals` package. It provides a simple, declarative way to define a test suite for your capability directly in your codebase.
An `Eval` is structured around a few key parameters:

- `data`: An async function that returns your collection of `{ input, expected }` pairs, which serve as your ground truth.
- `task`: The function that executes your AI capability, taking an `input` and producing an `output`.
- `scorers`: An array of grader functions that score the `output` against the `expected` value.
- `threshold`: A score between 0 and 1 that determines the pass/fail condition for the evaluation.
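Put together, a suite could look something like the following sketch. This is illustrative only: the `Eval(name, options)` call shape, the `summarize-ticket` name, and the `summarizeTicket` capability are assumptions for the example, not the confirmed API.

```typescript
import { Eval } from '@axiomhq/ai/evals';

// `summarizeTicket` stands in for the AI capability under test.
// Its implementation is omitted here; in a real suite it would call your model.
declare function summarizeTicket(input: string): Promise<string>;

Eval('summarize-ticket', {
  // Ground truth: an async function returning { input, expected } pairs.
  data: async () => [
    {
      input: 'Customer cannot reset their password from the mobile app.',
      expected: 'User reports password reset failing on mobile.',
    },
  ],
  // The task runs the capability, turning an input into an output.
  task: async (input) => summarizeTicket(input),
  // Scorers grade each output against its expected value.
  scorers: [({ output, expected }) => (output === expected ? 1 : 0)],
  // The eval fails if the aggregate score falls below this threshold.
  threshold: 0.8,
});
```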
Each scorer is a grading function that receives the `input`, the generated `output`, and the `expected` value, and must return a score.
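A custom grader can be factored out into a reusable function. The sketch below assumes a grader receives `{ input, output, expected }` and returns a number between 0 and 1; the exact argument and return shapes may differ.

```typescript
// Hypothetical reusable grader: exact-match scoring.
// Assumes graders receive the input, output, and expected value and
// return a score between 0 and 1; the real contract may differ.
type GraderArgs = {
  input: string;
  output: string;
  expected: string;
};

export function exactMatch({ output, expected }: GraderArgs): number {
  return output.trim() === expected.trim() ? 1 : 0;
}
```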
You will run your evaluation suites from the command line using the `axiom` CLI, which uses `vitest` in the background to execute them. Note that `vitest` will be a peer dependency for this functionality.
Each evaluation run is captured in Axiom with rich `eval.*` attributes, allowing you to deeply analyze results in the Axiom Console.
The Console will feature leaderboards and comparison views to track score progression across different versions of a capability, helping you verify that your changes are leading to measurable improvements.