Performance Assertions

All performance assertions are deterministic: they are evaluated without any LLM calls.

latency_under

Check that the response time is within a limit, given in milliseconds.

expect.latency_under(2000) # must respond within 2 seconds
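To make the idea concrete, here is a minimal sketch of what a latency check measures: wall-clock time around a single call, compared against a millisecond limit. This is not how assertllm implements it; `measure_latency_ms` and `fake_model` are hypothetical names invented for illustration.

```python
import time


def measure_latency_ms(fn, *args, **kwargs):
    """Call fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fake_model(prompt):
    # Hypothetical stand-in for a real model call.
    time.sleep(0.01)
    return "4"


reply, elapsed_ms = measure_latency_ms(fake_model, "What is 2+2?")

# Same threshold as the example above: 2000 ms.
latency_ok = elapsed_ms < 2000
print(f"{elapsed_ms:.1f} ms, within limit: {latency_ok}")
```

Because the measurement covers network round trips as well as generation, the same prompt can land on either side of a tight limit from run to run, so it may help to leave some headroom.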

cost_under

Check that the cost of the call stays within a budget, given in dollars.

expect.cost_under(0.01) # must cost less than 1 cent
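For intuition about where a cost figure can come from, the sketch below estimates cost from token counts and per-token prices. Everything here is an assumption made for illustration: the `estimate_cost` helper, the token counts, and the prices are hypothetical, and the library's own pricing logic may differ.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate the cost of one call from token counts and per-million prices."""
    input_cost = prompt_tokens * input_price_per_million / 1_000_000
    output_cost = completion_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Hypothetical token counts and prices, chosen only for illustration.
cost = estimate_cost(
    prompt_tokens=20,
    completion_tokens=18,
    input_price_per_million=0.15,
    output_price_per_million=0.60,
)

# Same budget as the example above: 0.01 (one cent).
within_budget = cost < 0.01
print(f"estimated cost: {cost:.6f}, within budget: {within_budget}")
```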

token_count_under

Check that total token usage is within a limit.

expect.token_count_under(500) # max 500 total tokens

Combining

test_perf.py
from assertllm import expect, llm_test

@llm_test(
    expect.is_not_empty(),
    expect.latency_under(2000),
    expect.cost_under(0.01),
    expect.token_count_under(500),
    model="gpt-4o-mini",
)
def test_fast_and_cheap(llm):
    llm("What is 2+2?")
Output
test_perf.py::test_fast_and_cheap
  "2 + 2 = 4"
  ✓ is_not_empty()
  ✓ latency_under(2000) — 412ms
  ✓ cost_under(0.01) — $0.000008
  ✓ token_count_under(500) — 38 tokens
PASSED [0.4s]

Cost and token assertions require the provider to return usage data. Most cloud providers do; Ollama may not.
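The sketch below shows one way a token check could tell "over the limit" apart from "no usage data available". It is not assertllm's actual behavior, which this page does not specify. The helper names, the `prompt_tokens` and `completion_tokens` keys, and the strict `<` comparison are all assumptions.

```python
from typing import Optional


def total_tokens(usage: Optional[dict]) -> Optional[int]:
    """Return prompt + completion tokens, or None when usage is missing."""
    if not usage:
        return None
    return usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)


def check_token_limit(usage: Optional[dict], limit: int) -> str:
    """Classify a response as 'pass', 'fail', or 'skipped' (hypothetical helper)."""
    total = total_tokens(usage)
    if total is None:
        # Without usage data there is nothing to compare against the limit.
        return "skipped"
    # Whether the boundary itself passes is an assumption; strict '<' is used here.
    return "pass" if total < limit else "fail"


# The 20/18 split is made up for illustration.
with_usage = check_token_limit({"prompt_tokens": 20, "completion_tokens": 18}, 500)
over_limit = check_token_limit({"prompt_tokens": 400, "completion_tokens": 200}, 500)
without_usage = check_token_limit(None, 500)
print(with_usage, over_limit, without_usage)
```

When running against a local provider, it may be worth confirming that usage data is actually returned before relying on cost or token limits.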
