@llm_test Decorator
The primary way to write LLM tests.
Basic Usage
test_capital.py

```python
from assertllm import expect, llm_test

@llm_test(
    expect.contains("Paris"),
    expect.latency_under(2000),
    model="gpt-4o-mini",
)
def test_capital(llm):
    llm("What is the capital of France?")
```

Output
```
test_capital.py::test_capital
  "The capital of France is Paris."
  ✓ contains("Paris")
  ✓ latency_under(2000) — 823ms
PASSED [0.8s]

────────── assertllm summary ──────────
LLM tests: 1 passed
Assertions: 2/2 passed
Total cost: $0.000018
Avg latency: 823ms
```

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `*assertions` | `BaseAssertion` | — | Assertions to check after the test |
| `provider` | `str` | `"openai"` | LLM provider |
| `model` | `str` | — | Model name |
| `system_prompt` | `str` | `None` | System prompt |
| `temperature` | `float` | `None` | Sampling temperature |
| `max_tokens` | `int` | `None` | Max output tokens |
| `tags` | `list[str]` | `None` | Test tags for filtering |
| `retries` | `int` | `0` | Retry count on failure |
| `retry_delay` | `float` | `1.0` | Seconds between retries |
| `runs` | `int` | `1` | Number of times to run the test |
| `min_pass_rate` | `float` | `1.0` | Minimum pass rate (0.0–1.0) |
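The `retries` and `retry_delay` parameters can be pictured with a small sketch. The helper name and structure here are illustrative, not assertllm's actual internals:

```python
import time

def run_with_retries(test_fn, retries=0, retry_delay=1.0):
    # Illustrative sketch of the retries/retry_delay semantics above:
    # one initial attempt plus up to `retries` extra attempts, sleeping
    # `retry_delay` seconds between them.
    attempts = retries + 1
    for attempt in range(attempts):
        try:
            return test_fn()
        except AssertionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure
            time.sleep(retry_delay)
```

A test that fails twice and then passes would succeed under `retries=2` but fail under `retries=1`.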
How It Works
The decorator injects the `llm` callable, runs your test, then checks all assertions against the last `LLMOutput`. If any assertion fails, the test fails with a detailed error message.

- The decorator injects the `llm` callable into your test function
- Your test runs, making LLM calls via `llm()`
- After the test completes, all assertions are checked against the last `LLMOutput`
- If any assertion fails, the test fails with a detailed error message
- If `retries > 0` and the test fails, it retries up to N times
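The steps above can be sketched in plain Python. This is a simplified illustration, not assertllm's real implementation: the stub `llm` returns a canned string, and assertions are modeled as plain callables that raise `AssertionError`:

```python
def llm_test(*assertions, retries=0, **params):
    """Simplified stand-in for the real decorator."""
    def decorator(test_fn):
        def wrapper():
            last_error = None
            for attempt in range(retries + 1):
                outputs = []

                def llm(prompt, **overrides):
                    # Stub provider call; records each output so the
                    # assertions can inspect the last one.
                    output = f"stub response to: {prompt}"
                    outputs.append(output)
                    return output

                try:
                    test_fn(llm)                  # inject llm, run the test
                    for check in assertions:      # check the last output
                        check(outputs[-1])
                    return                        # everything passed
                except AssertionError as e:
                    last_error = e                # retry if attempts remain
            raise last_error
        return wrapper
    return decorator

def contains(needle):
    # Toy assertion in the spirit of expect.contains()
    def check(output):
        assert needle in output, f"{needle!r} not in {output!r}"
    return check

@llm_test(contains("capital"))
def test_capital(llm):
    llm("What is the capital of France?")

test_capital()  # passes: the stub response echoes the prompt
```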
Per-call Overrides
test_override.py

```python
@llm_test(
    expect.is_not_empty(),
    provider="anthropic",
    model="claude-sonnet-4-6",
)
def test_with_override(llm):
    llm(
        "Explain gravity",
        model="claude-haiku-4-5-20251001",
        temperature=0.0,
    )
```

Statistical Testing
Run a test multiple times and pass if enough runs succeed.
test_stats.py

```python
@llm_test(
    expect.contains("Paris"),
    expect.not_contains("I'm not sure"),
    expect.latency_under(3000),
    model="claude-sonnet-4-6",
    runs=10,
    min_pass_rate=0.8,
)
def test_capital_reliable(llm):
    llm("What is the capital of France?")
```

Output
```
test_stats.py::test_capital_reliable
  Run 1/10: PASSED
  Run 2/10: PASSED
  ...
  Run 10/10: PASSED
  Pass rate: 9/10 (90%) >= 80% required
PASSED [8.2s]
```

This runs the test 10 times and passes if at least 8/10 (80%) succeed.
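The pass-rate check itself is simple arithmetic. Here is a sketch of the `runs` / `min_pass_rate` logic with illustrative names, not the library's internals:

```python
def run_statistical(test_fn, runs=1, min_pass_rate=1.0):
    # Run the test `runs` times, count passes, and compare the
    # observed pass rate against the required threshold.
    passed = 0
    for _ in range(runs):
        try:
            test_fn()
            passed += 1
        except AssertionError:
            pass
    rate = passed / runs
    return rate >= min_pass_rate

# Example: a test that fails on every tenth run still meets an 80% bar.
counter = {"n": 0}
def sometimes_fails():
    counter["n"] += 1
    if counter["n"] % 10 == 0:
        raise AssertionError("flaky run")

print(run_statistical(sometimes_fails, runs=10, min_pass_rate=0.8))  # True (9/10 = 90%)
```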