@llm_test Decorator

The @llm_test decorator is the primary way to write LLM tests.

Basic Usage

test_capital.py
from assertllm import expect, llm_test

@llm_test(
    expect.contains("Paris"),
    expect.latency_under(2000),
    model="gpt-4o-mini",
)
def test_capital(llm):
    llm("What is the capital of France?")

Output
test_capital.py::test_capital
  "The capital of France is Paris."
  ✓ contains("Paris")
  ✓ latency_under(2000) — 823ms
  PASSED [0.8s]

────────── assertllm summary ──────────
LLM tests: 1 passed
Assertions: 2/2 passed
Total cost: $0.000018
Avg latency: 823ms

Parameters

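| Parameter     | Type          | Default  | Description                      |
|---------------|---------------|----------|----------------------------------|
| *assertions   | BaseAssertion |          | Assertions to check after test   |
| provider      | str           | "openai" | LLM provider                     |
| model         | str           |          | Model name                       |
| system_prompt | str           | None     | System prompt                    |
| temperature   | float         | None     | Sampling temperature             |
| max_tokens    | int           | None     | Max output tokens                |
| tags          | list[str]     | None     | Test tags for filtering          |
| retries       | int           | 0        | Retry count on failure           |
| retry_delay   | float         | 1.0      | Seconds between retries          |
| runs          | int           | 1        | Number of times to run the test  |
| min_pass_rate | float         | 1.0      | Minimum pass rate (0.0–1.0)      |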

How It Works

The decorator injects the llm callable, runs your test, then checks all assertions against the last LLMOutput. If any fail, the test fails with a detailed error message.

  1. The decorator injects the llm callable into your test function
  2. Your test runs, making LLM calls via llm()
  3. After the test completes, all assertions are checked against the last LLMOutput
  4. If any assertion fails, the test fails with a detailed error message
  5. If retries > 0 and the test fails, it is retried up to retries times, waiting retry_delay seconds between attempts
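
The steps above can be sketched in plain Python. This is an illustration only and not assertllm's implementation: `sketch_llm_test`, `FakeOutput`, `contains`, and the `backend` argument are hypothetical names invented for this example, and the "LLM" is a local function so that the sketch runs without network access.

```python
import functools
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FakeOutput:
    """Hypothetical stand-in for the LLMOutput mentioned above."""
    text: str


def contains(needle: str) -> Callable[[FakeOutput], bool]:
    """Hypothetical assertion: passes when the output text contains `needle`."""
    return lambda output: needle in output.text


def sketch_llm_test(*assertions, retries: int = 0, retry_delay: float = 0.0,
                    backend: Callable[[str], str] = lambda prompt: ""):
    """Illustrative decorator only; NOT how assertllm is actually implemented."""
    def decorate(test_fn):
        @functools.wraps(test_fn)
        def wrapper():
            error: Optional[AssertionError] = None
            # One initial attempt plus up to `retries` further attempts (step 5).
            for attempt in range(retries + 1):
                outputs: List[FakeOutput] = []

                def llm(prompt: str) -> FakeOutput:
                    # Step 1: the callable injected into the test function.
                    output = FakeOutput(backend(prompt))
                    outputs.append(output)
                    return output

                try:
                    test_fn(llm)  # Step 2: the test body makes its llm() calls.
                    if not outputs:
                        raise AssertionError("test made no llm() call")
                    last = outputs[-1]  # Step 3: only the last output is checked.
                    failed = [i for i, check in enumerate(assertions)
                              if not check(last)]
                    if failed:
                        # Step 4: fail with a message naming the failed checks.
                        raise AssertionError(
                            f"assertions {failed} failed for {last.text!r}")
                    return last
                except AssertionError as exc:
                    error = exc
                    if attempt < retries:
                        time.sleep(retry_delay)
            raise error
        return wrapper
    return decorate


@sketch_llm_test(contains("Paris"),
                 backend=lambda prompt: "The capital of France is Paris.")
def check_capital(llm):
    llm("What is the capital of France?")


result = check_capital()
```

In this sketch a failing attempt is simply re-run from scratch, so each retry starts with an empty list of outputs. Whether assertllm resets state between retries in the same way is not stated here.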

Per-call Overrides

test_override.py
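from assertllm import expect, llm_test

@llm_test(
    expect.is_not_empty(),
    provider="anthropic",
    model="claude-sonnet-4-6",
)
def test_with_override(llm):
    # Keyword arguments passed to llm() override the decorator's settings for this call.
    llm(
        "Explain gravity",
        model="claude-haiku-4-5-20251001",
        temperature=0.0,
    )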
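
One way to think about per-call overrides is as a dictionary merge in which the call's keyword arguments take precedence over the decorator's settings. The sketch below assumes that model; `resolve_call_settings` is a hypothetical helper written for this example, and assertllm's actual resolution logic is not described on this page.

```python
from typing import Any, Dict


def resolve_call_settings(decorator_settings: Dict[str, Any],
                          call_overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: per-call values win over decorator-level values."""
    merged = dict(decorator_settings)  # copy so the decorator settings stay intact
    merged.update(call_overrides)      # keys given on the call replace the defaults
    return merged


decorator_settings = {"provider": "anthropic", "model": "claude-sonnet-4-6"}
call_overrides = {"model": "claude-haiku-4-5-20251001", "temperature": 0.0}

settings = resolve_call_settings(decorator_settings, call_overrides)
```

Under this assumption the call uses the overridden model and temperature while still inheriting the provider from the decorator, and later calls without overrides fall back to the decorator's values.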

Statistical Testing

Set runs to execute a test multiple times. The test passes if the fraction of passing runs is at least min_pass_rate.

test_stats.py
from assertllm import expect, llm_test

@llm_test(
    expect.contains("Paris"),
    expect.not_contains("I'm not sure"),
    expect.latency_under(3000),
    model="claude-sonnet-4-6",
    runs=10,
    min_pass_rate=0.8,
)
def test_capital_reliable(llm):
    llm("What is the capital of France?")

Output
test_stats.py::test_capital_reliable
  Run 1/10: PASSED
  Run 2/10: PASSED
  ...
  Run 10/10: PASSED
  Pass rate: 9/10 (90%) >= 80% required
  PASSED [8.2s]

This runs the test 10 times and passes if at least 8/10 (80%) succeed.
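
The pass-rate rule can be written out as a small function. This is a sketch of the arithmetic only, assuming the comparison is "passed runs divided by total runs is at least min_pass_rate", as the sample output suggests. `passes_statistically` and `run_once` are hypothetical names, not part of assertllm.

```python
from typing import Callable


def passes_statistically(run_once: Callable[[], bool],
                         runs: int = 1,
                         min_pass_rate: float = 1.0) -> bool:
    """Hypothetical helper: run the check `runs` times and compare the pass rate."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    passed = sum(1 for _ in range(runs) if run_once())
    return passed / runs >= min_pass_rate


# Simulate ten runs in which nine pass and one fails.
outcomes = iter([True] * 9 + [False])
verdict = passes_statistically(lambda: next(outcomes), runs=10, min_pass_rate=0.8)
```

With nine of ten runs passing, the rate is 0.9, which meets the 0.8 threshold. Seven passing runs would give 0.7 and fail.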
