# CI/CD

## GitHub Actions

### Set up the workflow
`.github/workflows/llm-tests.yml`

```yaml
name: LLM Tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev]"
      - run: pytest tests/ -v --ignore=tests/test_real_*

  llm-tests:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev,openai,anthropic]"
      - run: pytest -m llmtest --tb=short
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

### Add test markers
`test_ci.py`

```python
import pytest

@pytest.mark.llmtest
def test_with_llm(llm):
    output = llm("Hello", model="gpt-4o-mini")
    assert output.content
```

### Run selectively
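Pytest warns about unknown marks unless `llmtest` is registered. A minimal sketch, assuming you register the marker yourself in `conftest.py` (the plugin may already do this for you):

```python
# conftest.py — register the custom "llmtest" marker so pytest
# doesn't emit PytestUnknownMarkWarning for @pytest.mark.llmtest.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "llmtest: tests that call a real LLM and may cost money"
    )
```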
Terminal

```shell
pytest -m llmtest          # only LLM tests
pytest -m "not llmtest"    # everything except LLM tests
```

Cost control — LLM tests cost money. Only run them on main or use cheap models like gpt-4o-mini in CI.
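Contributors running the suite locally may not have an API key at all. One common pattern (an illustration, not part of the workflow above) is to skip marked tests when the key is missing instead of letting them error:

```python
import os

import pytest

# Skip (rather than fail) LLM tests when no API key is available locally.
requires_openai = pytest.mark.skipif(
    not os.environ.get("OPENAI_API_KEY"),
    reason="OPENAI_API_KEY not set",
)

@requires_openai
@pytest.mark.llmtest
def test_with_real_llm(llm):
    output = llm("Hello", model="gpt-4o-mini")
    assert output.content
```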
## Cost Control
`test_budget.py`

```python
from assertllm import expect, llm_test

@llm_test(
    expect.cost_under(0.01),
    expect.latency_under(10000),
    model="gpt-4o-mini",
)
def test_ci_safe(llm):
    llm("What is 2+2?")
```

Output
```
test_budget.py::test_ci_safe
  "4"
  ✓ cost_under(0.01) — $0.000005
  ✓ latency_under(10000) — 234ms
PASSED [0.2s]
```

Use `expect.cost_under()` to set a hard budget per test. Tests that exceed the limit fail immediately.
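As a rough mental model of what a budget assertion checks, per-call cost is token usage times the model's per-token price. A self-contained sketch (the prices and `call_cost` function below are illustrative assumptions, not assertllm internals):

```python
# Illustrative USD prices per 1M tokens — check your provider's current
# pricing; these numbers are assumptions for the sketch.
PRICE_PER_1M = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single completion call."""
    p = PRICE_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A "What is 2+2?" call uses only a handful of tokens, so its estimated
# cost sits far below a $0.01 budget.
assert call_cost("gpt-4o-mini", input_tokens=12, output_tokens=2) < 0.01
```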