Performance Assertions
All performance assertions are deterministic: the checks themselves make no LLM calls.
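To illustrate what "deterministic" means here, the sketch below treats each check as a plain comparison of a recorded measurement against a limit. It is not assertllm's implementation: the `Metrics` container and the `is_under` helper are hypothetical names, and the strict "less than" comparison is an assumption. The measurements are the ones shown in the sample output on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Measurements recorded for one LLM call (hypothetical container)."""
    latency_ms: float
    cost_usd: float
    total_tokens: int


def is_under(value: float, limit: float) -> bool:
    # Assumption: "under" means strictly less than the limit.
    return value < limit


# Values taken from the sample output on this page.
recorded = Metrics(latency_ms=412, cost_usd=0.000008, total_tokens=38)

results = {
    "latency_under(2000)": is_under(recorded.latency_ms, 2000),
    "cost_under(0.01)": is_under(recorded.cost_usd, 0.01),
    "token_count_under(500)": is_under(recorded.total_tokens, 500),
}

for name, passed in results.items():
    print("✓" if passed else "✗", name)
```

Because the verdict depends only on the recorded numbers, evaluating the same measurements again always gives the same result.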
latency_under
Check that the response time is within a limit.
expect.latency_under(2000)  # must respond within 2 seconds
cost_under
Check that the cost stays within budget.
expect.cost_under(0.01)  # must cost less than 1 cent
token_count_under
Check that total token usage is within a limit.
expect.token_count_under(500)  # max 500 total tokens
Combining
test_perf.py
from assertllm import expect, llm_test

@llm_test(
    expect.is_not_empty(),
    expect.latency_under(2000),
    expect.cost_under(0.01),
    expect.token_count_under(500),
    model="gpt-4o-mini",
)
def test_fast_and_cheap(llm):
    llm("What is 2+2?")
Output
test_perf.py::test_fast_and_cheap
"2 + 2 = 4"
✓ is_not_empty()
✓ latency_under(2000) — 412ms
✓ cost_under(0.01) — $0.000008
✓ token_count_under(500) — 38 tokens
PASSED [0.4s]
Cost and token assertions require the provider to return usage data. Most cloud providers do; Ollama may not.
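The dependence on usage data can be seen in a minimal sketch of how cost and token totals are commonly derived. This is an illustration only, not how assertllm computes them: the `prompt_tokens` and `completion_tokens` keys, the `PRICE_PER_1M` table, and both helper functions are assumptions, and the prices are placeholders.

```python
from typing import Optional

# Hypothetical prices in USD per 1M tokens; real prices vary by provider and model.
PRICE_PER_1M = {"input": 0.15, "output": 0.60}


def total_tokens(usage: Optional[dict]) -> Optional[int]:
    """Return prompt + completion tokens, or None when no usage was reported."""
    if not usage:
        return None
    return usage["prompt_tokens"] + usage["completion_tokens"]


def estimate_cost(usage: Optional[dict]) -> Optional[float]:
    """Return the estimated cost in USD, or None when no usage was reported."""
    if not usage:
        return None
    input_cost = usage["prompt_tokens"] * PRICE_PER_1M["input"] / 1_000_000
    output_cost = usage["completion_tokens"] * PRICE_PER_1M["output"] / 1_000_000
    return input_cost + output_cost


# A response that reports usage (key names are an assumption).
with_usage = {"prompt_tokens": 30, "completion_tokens": 8}
print(total_tokens(with_usage), estimate_cost(with_usage))

# A response without usage data: there is nothing to compare against a limit.
print(total_tokens(None), estimate_cost(None))
```

Both values are computed from the token counts the provider reports, so when a response carries no usage data there is no number to check against the limit.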