Performance Assertions

All performance assertions are deterministic: they make no LLM calls of their own.

latency_under

Check that the response time is within a limit, given in milliseconds.


expect.latency_under(2000)  # must respond within 2 seconds
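
As a rough illustration of why this check needs no LLM judge, the sketch below times a call and compares the elapsed milliseconds to a limit. The helper names (`timed_call`, `latency_ok`, `fake_model`) are hypothetical and not part of assertllm, and the strict "less than" comparison is an assumption about boundary behavior.

```python
import time
from typing import Callable, Tuple


def timed_call(fn: Callable[[], str]) -> Tuple[str, float]:
    """Run fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def latency_ok(elapsed_ms: float, limit_ms: float) -> bool:
    """Assumed rule: pass when the measured latency is strictly below the limit."""
    return elapsed_ms < limit_ms


def fake_model() -> str:
    """Hypothetical stand-in for a real LLM call; sleeps for about 50 ms."""
    time.sleep(0.05)
    return "4"


reply, elapsed_ms = timed_call(fake_model)
print(reply, f"{elapsed_ms:.0f}ms", latency_ok(elapsed_ms, 2000))
```

Because the result depends only on a wall-clock measurement, the same measured latency always yields the same verdict, although the latency itself will vary from run to run.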

cost_under

Check that the cost of the call stays within a budget, given in dollars.


expect.cost_under(0.01)   # must cost less than 1 cent
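
One plausible way to derive a cost figure is to multiply the reported token counts by per-token prices. The sketch below assumes that approach; the `Usage` class, the helper names, and the prices are all hypothetical placeholders, not assertllm internals or real provider rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counts as a provider might report them (hypothetical shape)."""
    prompt_tokens: int
    completion_tokens: int


# Hypothetical prices in dollars per one million tokens, not real rates.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 2.00


def estimate_cost(usage: Usage) -> float:
    """Price prompt and completion tokens separately, then add them."""
    input_cost = usage.prompt_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = usage.completion_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


def cost_ok(cost: float, budget: float) -> bool:
    """Assumed rule: pass when the cost is strictly below the budget."""
    return cost < budget


usage = Usage(prompt_tokens=14, completion_tokens=24)
cost = estimate_cost(usage)
print(f"${cost:.6f}", cost_ok(cost, 0.01))
```

Input and output tokens are priced separately here because many providers charge different rates for each; a single blended rate would work the same way.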

token_count_under

Check that total token usage is within a limit.


expect.token_count_under(500)  # max 500 total tokens

Combining

test_perf.py


from assertllm import expect, llm_test
 
@llm_test(
    expect.is_not_empty(),
    expect.latency_under(2000),
    expect.cost_under(0.01),
    expect.token_count_under(500),
    model="gpt-4o-mini",
)
def test_fast_and_cheap(llm):
    llm("What is 2+2?")

Output


test_perf.py::test_fast_and_cheap
   "2 + 2 = 4"
  ✓ is_not_empty()
  ✓ latency_under(2000) — 412ms
  ✓ cost_under(0.01) — $0.000008
  ✓ token_count_under(500) — 38 tokens
  PASSED    [0.4s]

Cost and token assertions require the provider to return usage data. Most cloud providers do, but Ollama may not.
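
To make the dependency on usage data concrete, here is a minimal sketch of a token check that distinguishes "over the limit" from "no usage reported". The function names and the three result strings are hypothetical; how assertllm itself reports missing usage data is not stated here.

```python
from typing import Optional


def total_tokens(
    prompt_tokens: Optional[int], completion_tokens: Optional[int]
) -> Optional[int]:
    """Return prompt + completion tokens, or None if the provider sent no usage."""
    if prompt_tokens is None or completion_tokens is None:
        return None
    return prompt_tokens + completion_tokens


def check_token_count(total: Optional[int], limit: int) -> str:
    """Return 'pass', 'fail', or 'unavailable' (assumed, illustrative outcomes)."""
    if total is None:
        return "unavailable"
    return "pass" if total < limit else "fail"


with_usage = check_token_count(total_tokens(14, 24), 500)
without_usage = check_token_count(total_tokens(None, None), 500)
print(with_usage, without_usage)
```

Treating missing data as its own outcome, rather than as zero tokens, avoids a check that silently passes when the provider reports nothing.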