# CI/CD

## GitHub Actions

### Set up the workflow
`.github/workflows/llm-tests.yml`

```yaml
name: LLM Tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev]"
      - run: pytest tests/ -v --ignore=tests/test_real_*

  llm-tests:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev,openai,anthropic]"
      - run: pytest -m llmtest --tb=short
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

### Add test markers
`test_ci.py`

```python
import pytest

@pytest.mark.llmtest
def test_with_llm(llm):
    output = llm("Hello", model="gpt-4o-mini")
    assert output.content
```

### Run selectively
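Pytest warns about unknown marks unless `llmtest` is registered. A minimal sketch, assuming you register the marker yourself in `conftest.py` (the plugin may already do this for you):

```python
# conftest.py — register the custom "llmtest" marker so pytest
# doesn't emit PytestUnknownMarkWarning for @pytest.mark.llmtest.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "llmtest: tests that call a real LLM and may cost money"
    )
```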
Terminal

```shell
pytest -m llmtest          # only LLM tests
pytest -m "not llmtest"    # everything except LLM tests
```

Cost control — LLM tests cost money. Only run them on main or use cheap models like gpt-4o-mini in CI.
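Contributors running the suite locally may not have an API key at all. One common pattern (an illustration, not part of the workflow above) is to skip marked tests when the key is missing instead of letting them error:

```python
import os

import pytest

# Skip (rather than fail) LLM tests when no API key is available locally.
requires_openai = pytest.mark.skipif(
    not os.environ.get("OPENAI_API_KEY"),
    reason="OPENAI_API_KEY not set",
)

@requires_openai
@pytest.mark.llmtest
def test_with_real_llm(llm):
    output = llm("Hello", model="gpt-4o-mini")
    assert output.content
```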
## Cost Control
`test_budget.py`

```python
from assertllm import expect, llm_test

@llm_test(
    expect.cost_under(0.01),
    expect.latency_under(10000),
    model="gpt-4o-mini",
)
def test_ci_safe(llm):
    llm("What is 2+2?")
```

Output
```
test_budget.py::test_ci_safe
  "4"
  ✓ cost_under(0.01) — $0.000005
  ✓ latency_under(10000) — 234ms
PASSED [0.2s]
```

Use `expect.cost_under()` to set a hard budget per test. Tests that exceed the limit fail immediately.
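As a rough mental model of what a budget assertion checks, per-call cost is token usage times the model's per-token price. A self-contained sketch (the prices and `call_cost` function below are illustrative assumptions, not assertllm internals):

```python
# Illustrative USD prices per 1M tokens — check your provider's current
# pricing; these numbers are assumptions for the sketch.
PRICE_PER_1M = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single completion call."""
    p = PRICE_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A "What is 2+2?" call uses only a handful of tokens, so its estimated
# cost sits far below a $0.01 budget.
assert call_cost("gpt-4o-mini", input_tokens=12, output_tokens=2) < 0.01
```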