@llm_test Decorator
The primary way to write LLM tests.
Basic Usage
test_capital.py

```python
from assertllm import expect, llm_test

@llm_test(
    expect.contains("Paris"),
    expect.latency_under(2000),
    model="gpt-4o-mini",
)
def test_capital(llm):
    llm("What is the capital of France?")
```

Output
```
test_capital.py::test_capital
  "The capital of France is Paris."
  ✓ contains("Paris")
  ✓ latency_under(2000) — 823ms
PASSED [0.8s]

────────── assertllm summary ──────────
LLM tests: 1 passed
Assertions: 2/2 passed
Total cost: $0.000018
Avg latency: 823ms
```

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `*assertions` | `BaseAssertion` | — | Assertions to check after the test |
| `provider` | `str` | `"openai"` | LLM provider |
| `model` | `str` | — | Model name |
| `system_prompt` | `str` | `None` | System prompt |
| `temperature` | `float` | `None` | Sampling temperature |
| `max_tokens` | `int` | `None` | Max output tokens |
| `tags` | `list[str]` | `None` | Test tags for filtering |
| `retries` | `int` | `0` | Retry count on failure |
| `retry_delay` | `float` | `1.0` | Seconds between retries |
| `runs` | `int` | `1` | Number of times to run the test |
| `min_pass_rate` | `float` | `1.0` | Minimum pass rate (0.0–1.0) |
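The `retries` and `retry_delay` parameters can be pictured with a small sketch. The helper name and structure here are illustrative, not assertllm's actual internals:

```python
import time

def run_with_retries(test_fn, retries=0, retry_delay=1.0):
    # Illustrative sketch of the retries/retry_delay semantics above:
    # one initial attempt plus up to `retries` extra attempts, sleeping
    # `retry_delay` seconds between them.
    attempts = retries + 1
    for attempt in range(attempts):
        try:
            return test_fn()
        except AssertionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure
            time.sleep(retry_delay)
```

A test that fails twice and then passes would succeed under `retries=2` but fail under `retries=1`.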
How It Works
The decorator injects the `llm` callable, runs your test, then checks all assertions against the last `LLMOutput`. If any assertion fails, the test fails with a detailed error message.

- The decorator injects the `llm` callable into your test function
- Your test runs, making LLM calls via `llm()`
- After the test completes, all assertions are checked against the last `LLMOutput`
- If any assertion fails, the test fails with a detailed error message
- If `retries > 0` and the test fails, it retries up to N times
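The steps above can be sketched in plain Python. This is a simplified illustration, not assertllm's real implementation: the stub `llm` returns a canned string, and assertions are modeled as plain callables that raise `AssertionError`:

```python
def llm_test(*assertions, retries=0, **params):
    """Simplified stand-in for the real decorator."""
    def decorator(test_fn):
        def wrapper():
            last_error = None
            for attempt in range(retries + 1):
                outputs = []

                def llm(prompt, **overrides):
                    # Stub provider call; records each output so the
                    # assertions can inspect the last one.
                    output = f"stub response to: {prompt}"
                    outputs.append(output)
                    return output

                try:
                    test_fn(llm)                  # inject llm, run the test
                    for check in assertions:      # check the last output
                        check(outputs[-1])
                    return                        # everything passed
                except AssertionError as e:
                    last_error = e                # retry if attempts remain
            raise last_error
        return wrapper
    return decorator

def contains(needle):
    # Toy assertion in the spirit of expect.contains()
    def check(output):
        assert needle in output, f"{needle!r} not in {output!r}"
    return check

@llm_test(contains("capital"))
def test_capital(llm):
    llm("What is the capital of France?")

test_capital()  # passes: the stub response echoes the prompt
```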
Per-call Overrides
test_override.py

```python
@llm_test(
    expect.is_not_empty(),
    provider="anthropic",
    model="claude-sonnet-4-6",
)
def test_with_override(llm):
    llm(
        "Explain gravity",
        model="claude-haiku-4-5-20251001",
        temperature=0.0,
    )
```

Statistical Testing
Run a test multiple times and pass if enough runs succeed.
test_stats.py

```python
@llm_test(
    expect.contains("Paris"),
    expect.not_contains("I'm not sure"),
    expect.latency_under(3000),
    model="claude-sonnet-4-6",
    runs=10,
    min_pass_rate=0.8,
)
def test_capital_reliable(llm):
    llm("What is the capital of France?")
```

Output
```
test_stats.py::test_capital_reliable
  Run 1/10: PASSED
  Run 2/10: PASSED
  ...
  Run 10/10: PASSED
  Pass rate: 9/10 (90%) >= 80% required
PASSED [8.2s]
```

This runs the test 10 times and passes if at least 8/10 (80%) succeed.
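The pass-rate check itself is simple arithmetic. Here is a sketch of the `runs` / `min_pass_rate` logic with illustrative names, not the library's internals:

```python
def run_statistical(test_fn, runs=1, min_pass_rate=1.0):
    # Run the test `runs` times, count passes, and compare the
    # observed pass rate against the required threshold.
    passed = 0
    for _ in range(runs):
        try:
            test_fn()
            passed += 1
        except AssertionError:
            pass
    rate = passed / runs
    return rate >= min_pass_rate

# Example: a test that fails on every tenth run still meets an 80% bar.
counter = {"n": 0}
def sometimes_fails():
    counter["n"] += 1
    if counter["n"] % 10 == 0:
        raise AssertionError("flaky run")

print(run_statistical(sometimes_fails, runs=10, min_pass_rate=0.8))  # True (9/10 = 90%)
```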