Evaluation

Systematically measuring whether an AI workflow produces good outputs.

Definition

Evaluation can take the form of human review, automated checks, or a second model grading the first. Without evaluation, you cannot tell whether a change to a prompt, model, or workflow made things better or worse.
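As a rough illustration of the "automated checks" option, the sketch below runs a few pass/fail rules over a generated draft. The function names (`check_draft`, `pass_rate`), the specific rules, and the 150-word limit are all hypothetical choices for this example, not part of any particular tool.

```python
import re


def check_draft(draft: str, lead_name: str, max_words: int = 150) -> dict[str, bool]:
    """Run simple pass/fail checks on one generated email draft.

    The rules here are illustrative assumptions; a real workflow would
    encode whatever "good output" means for its own use case.
    """
    return {
        # The draft should address the lead by name (case-insensitive).
        "mentions_lead": lead_name.lower() in draft.lower(),
        # The draft should stay within a word budget.
        "within_length": len(draft.split()) <= max_words,
        # No leftover template placeholders such as {{first_name}} or [NAME].
        "no_unfilled_placeholder": re.search(r"\{\{.*?\}\}|\[[A-Z_ ]+\]", draft) is None,
    }


def pass_rate(results: list[dict[str, bool]]) -> float:
    """Fraction of drafts that pass every check."""
    if not results:
        return 0.0
    return sum(all(r.values()) for r in results) / len(results)
```

Running the same checks before and after a prompt or model change gives two pass rates to compare, which is the simplest form of the before/after measurement described above.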

Example

Before swapping the model behind your sales-email generator, you run 50 historical leads through both versions and have a reviewer rate the drafts. The new model wins on 38 of 50 (76%), so you can ship the swap with evidence.
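The comparison in this example can be tallied in a few lines. The sketch below assumes each lead yields exactly one reviewer verdict, "new" or "old", with no ties; the labels and function names are hypothetical. The sign test is an added sanity check not mentioned in the example: it estimates how likely a split this lopsided would be if the two models were equally good.

```python
from math import comb


def win_rate(ratings: list[str], candidate: str = "new") -> float:
    """Share of head-to-head comparisons won by the candidate version."""
    return sum(r == candidate for r in ratings) / len(ratings)


def sign_test_p_value(wins: int, n: int) -> float:
    """Two-sided exact sign test.

    Probability of a split at least this lopsided (in either direction)
    if each comparison were a fair coin flip. Assumes no ties.
    """
    k = max(wins, n - wins)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)


# Reviewer verdicts for the 50 historical leads in the example:
# the new model's draft was preferred 38 times, the old model's 12 times.
ratings = ["new"] * 38 + ["old"] * 12

rate = win_rate(ratings)                      # 0.76
p = sign_test_p_value(ratings.count("new"), len(ratings))
print(f"win rate: {rate:.0%}, sign-test p-value: {p:.4f}")
```

A small p-value suggests the preference is unlikely to be reviewer noise alone. It does not say the reviewer's judgment matches what customers value, so the choice of rater and rubric still matters.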

