Case

Case is a container for a single scenario: inputs you will pass to your System Under Test (SUT), plus optional reference values you’ll assert against. Using Case enables:

Iterating one merit function over many scenarios with @merit.iter_cases(...)
Grouping related cases with CaseGroup and iterating with @merit.iter_case_groups(...)
Keeping inputs in sut_input_values (to call sut(**case.sut_input_values))
Storing reference data in references (typed or untyped)
Tagging and filtering with tags, and attaching context with metadata

Basic Usage

import merit
from merit import Case


def classifier(text: str) -> str:
    return "positive" if "love" in text.lower() else "negative"


case1 = Case(
    sut_input_values={"text": "I love this"},
    references={"expected_label": "positive"},
    tags={"smoke"},
)

case2 = Case(
    sut_input_values={"text": "This is bad"},
    references={"expected_label": "negative"},
    tags={"regression"},
)


@merit.iter_cases(*[c for c in (case1, case2) if "smoke" in c.tags])
def merit_classifier(case: Case):
    label = classifier(**case.sut_input_values)
    assert label == case.references["expected_label"]

Case API

Adding input values

sut_input_values is a dictionary of keyword arguments that will be passed to your SUT:

from merit import Case

case = Case(sut_input_values={"prompt": "Hello"})

Typed references

Use Case[YourModel] to get IDE autocomplete and runtime validation of references.

from pydantic import BaseModel
from merit import Case


class AgentReference(BaseModel):
    expected_keywords: list[str]
    min_response_length: int = 10


case = Case[AgentReference](
    sut_input_values={"prompt": "Say hello"},
    references=AgentReference(
        expected_keywords=["hello"],
        min_response_length=5,
    ),
)

assert case.references.min_response_length == 5

Untyped references

By default, Case uses untyped references: references is a plain dict that defaults to {}.

from merit import Case

case = Case(
    sut_input_values={"x": 2},
    references={"expected": 4},
)

assert case.references["expected"] == 4

Providing ID for persistence

Each case has an id: UUID (auto-generated by default). Provide it explicitly when you want stable case IDs across runs or when you store cases in datasets. This matters because @merit.iter_cases(...) uses case.id to build readable, stable parametrization IDs.

from uuid import UUID

from merit import Case


case = Case(
    id=UUID("00000000-0000-0000-0000-000000000001"),
    sut_input_values={"x": 1},
)
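When cases come from a dataset that already has a stable key (a row ID, a file name), one possible way to get a repeatable id is to derive it with uuid5 from the standard library. This is a minimal sketch: the stable_case_id helper, the namespace URL, and the example key are hypothetical and not part of merit.

```python
from uuid import NAMESPACE_URL, UUID, uuid5

# Hypothetical namespace for this project's cases; any fixed UUID works.
CASE_NAMESPACE = uuid5(NAMESPACE_URL, "https://example.com/merit-cases")


def stable_case_id(dataset_key: str) -> UUID:
    """Return the same UUID every time for the same dataset key."""
    return uuid5(CASE_NAMESPACE, dataset_key)


case_id = stable_case_id("sentiment/row-17")
# Pass it on as Case(id=case_id, sut_input_values=...)
```

Because uuid5 is deterministic, the same key gives the same id on every run, with no need to store generated UUIDs alongside the data.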

Adding metadata

Use metadata for extra context and reporting. Values must be JSON-like primitives (str | int | float | bool | None).

from merit import Case

case = Case(
    sut_input_values={"prompt": "Hello"},
    metadata={"priority": "high", "latency_budget_ms": 200},
)
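If metadata is loaded from an external source, it can help to check the value types before constructing a Case. The sketch below uses only the standard library and mirrors the primitive types listed above. The invalid_metadata_keys helper is hypothetical, and how merit itself reports a violation is not covered here.

```python
ALLOWED_METADATA_TYPES = (str, int, float, bool, type(None))


def invalid_metadata_keys(metadata: dict) -> list[str]:
    """Return the keys whose values are not str, int, float, bool, or None."""
    return [
        key
        for key, value in metadata.items()
        if not isinstance(value, ALLOWED_METADATA_TYPES)
    ]


good = {"priority": "high", "latency_budget_ms": 200}
bad = {"priority": "high", "owners": ["alice", "bob"]}  # a list is not a primitive

good_problems = invalid_metadata_keys(good)
bad_problems = invalid_metadata_keys(bad)
```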

Adding tags for filtering

Use tags to label cases and then select subsets with normal Python:

from merit import Case

cases = [
    Case(sut_input_values={"x": 1}, tags={"smoke"}),
    Case(sut_input_values={"x": 2}, tags={"regression"}),
]

smoke_cases = [c for c in cases if "smoke" in c.tags]
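For selections that combine several tags, a small helper keeps the filtering readable. This is a sketch that assumes tags behaves like a set, as in the examples above. select_by_tags is a hypothetical helper, and SimpleNamespace objects stand in for Case so the snippet runs without merit installed.

```python
from types import SimpleNamespace


def select_by_tags(cases, any_of=frozenset(), none_of=frozenset()):
    """Keep cases that carry at least one tag in any_of and no tag in none_of."""
    any_of, none_of = set(any_of), set(none_of)
    return [
        c
        for c in cases
        if (not any_of or c.tags & any_of) and not (c.tags & none_of)
    ]


# Stand-ins with a .tags set; the Case examples above expose the same attribute.
cases = [
    SimpleNamespace(name="a", tags={"smoke", "fast"}),
    SimpleNamespace(name="b", tags={"regression", "slow"}),
    SimpleNamespace(name="c", tags={"smoke", "slow"}),
]

fast_smoke = select_by_tags(cases, any_of={"smoke"}, none_of={"slow"})
```

With an empty any_of, the helper keeps every case that is not excluded by none_of.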

Validating cases for a SUT

If cases come from files/APIs, validate them against your SUT’s signature by attaching them to @merit.sut(validate_cases=...):

import merit
from merit import Case


cases = [
    Case(sut_input_values={"prompt": "Hello", "temperature": 0.5}),
    Case(sut_input_values={"prompt": "Hi"}),
]


@merit.sut(validate_cases=cases)
def my_agent():
    def run(prompt: str, temperature: float = 0.7) -> str:
        return f"{prompt} @ {temperature}"

    return run


@merit.iter_cases(*cases)
def merit_my_agent(case: Case, my_agent):
    my_agent(**case.sut_input_values)
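To make "validate against the SUT's signature" concrete: the check amounts to asking whether each case's sut_input_values could be bound to the callable's parameters. The sketch below does that with inspect.signature from the standard library. It illustrates the idea only and is not merit's implementation; check_inputs is a hypothetical helper.

```python
import inspect
from typing import Optional


def run(prompt: str, temperature: float = 0.7) -> str:
    return f"{prompt} @ {temperature}"


def check_inputs(fn, input_values: dict) -> Optional[str]:
    """Return None if input_values can be bound to fn, else the error text."""
    try:
        inspect.signature(fn).bind(**input_values)
    except TypeError as exc:
        return str(exc)
    return None


ok = check_inputs(run, {"prompt": "Hi"})                      # binds cleanly
missing = check_inputs(run, {"temperature": 0.5})             # no "prompt"
unknown = check_inputs(run, {"prompt": "Hi", "top_p": 0.9})   # extra key
```

Binding checks parameter names and required arguments only. It does not check value types against the annotations.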

CaseGroup

A CaseGroup bundles related Case objects together with group-level references and a pass threshold (min_passes). This is useful when your evaluation naturally splits into logical groups (e.g. topics, languages, difficulty tiers) and you want to:

Assert on group-level data that applies to every case in the group
Set a per-group pass threshold instead of a single global one
Get hierarchical reporting: run → groups → cases

Basic Usage

import merit
from merit import Case, CaseGroup


geography = CaseGroup(
    name="geography",
    cases=[
        Case(sut_input_values={"prompt": "Capital of France?"}, references={"expected": "Paris"}),
        Case(sut_input_values={"prompt": "Capital of Germany?"}, references={"expected": "Berlin"}),
    ],
    min_passes=2,  # strict: both must pass
)

music = CaseGroup(
    name="music",
    cases=[
        Case(sut_input_values={"prompt": "Best rock band?"}, references={"expected": "Metallica"}),
        Case(sut_input_values={"prompt": "Best pop artist?"}, references={"expected": "Lady Gaga"}),
    ],
    min_passes=1,  # tolerant: at least one must pass
)


@merit.iter_case_groups(geography, music)
def merit_chatbot(group: CaseGroup, case: Case, chatbot):
    response = chatbot(**case.sut_input_values)
    assert case.references["expected"] in response

@merit.iter_case_groups injects two parameters by name:

group — the current CaseGroup (with its name, references, and cases)
case — the current Case inside that group

The merit passes only if every group meets its own min_passes threshold.
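The pass rule can be sketched in a few lines of plain Python. This models the semantics described above and is not merit's internal code; group_passes and merit_passes are hypothetical names, and case outcomes are represented as booleans.

```python
def group_passes(case_results: list[bool], min_passes: int) -> bool:
    """A group passes when at least min_passes of its cases passed."""
    return sum(case_results) >= min_passes


def merit_passes(groups: dict[str, tuple[list[bool], int]]) -> bool:
    """The merit passes only if every group meets its own threshold."""
    return all(
        group_passes(results, threshold) for results, threshold in groups.values()
    )


outcome = {
    "geography": ([True, True], 2),  # strict: both passed
    "music": ([True, False], 1),     # tolerant: one pass is enough
}

overall = merit_passes(outcome)
```

A single failure in geography would fail the whole merit, while music tolerates one failing case.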

Typed group references

Where Case takes a single type parameter (Case[RefsT]), CaseGroup takes two: the first for case-level references and the second for group-level references.

from pydantic import BaseModel
from merit import Case, CaseGroup


class CaseRefs(BaseModel):
    expected: str


class GroupRefs(BaseModel):
    stop_keywords: list[str]


group = CaseGroup[CaseRefs, GroupRefs](
    name="geography",
    references=GroupRefs(stop_keywords=["Lol", "Kek"]),
    cases=[
        Case[CaseRefs](
            sut_input_values={"prompt": "Capital of France?"},
            references=CaseRefs(expected="Paris"),
        ),
    ],
)

# IDE autocomplete works on both levels
group.references.stop_keywords   # list[str]
group.cases[0].references.expected  # str

Validation

CaseGroup enforces these constraints at creation time:

cases must contain at least 1 case
min_passes must be ≥ 1
min_passes cannot exceed the number of cases

# All of these raise ValueError at creation time:
CaseGroup(name="empty", cases=[], min_passes=1)              # no cases
CaseGroup(name="zero", cases=[Case(...)], min_passes=0)      # min_passes < 1
CaseGroup(name="over", cases=[Case(...)], min_passes=2)      # min_passes > len(cases)
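When group sizes vary (for example, groups built from a dataset), a fixed min_passes can easily violate these constraints. One option is to derive it from a target pass rate. The min_passes_for helper below is a hypothetical sketch, not a merit function.

```python
import math


def min_passes_for(n_cases: int, pass_rate: float) -> int:
    """Convert a pass rate in (0, 1] to a min_passes value valid for n_cases."""
    if n_cases < 1:
        raise ValueError("a group needs at least one case")
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    # ceil never rounds the threshold below the requested rate;
    # min/max keep the result within 1..n_cases.
    return min(n_cases, max(1, math.ceil(pass_rate * n_cases)))


strict = min_passes_for(2, 1.0)     # every case must pass
tolerant = min_passes_for(2, 0.5)   # half of the cases must pass
```

The result can then be passed as CaseGroup(..., min_passes=...).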

Validating grouped cases for a SUT

To validate all case inputs against your SUT signature, flatten the groups:

import merit
from merit import CaseGroup

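# geography and music are the CaseGroup objects defined in Basic Usage above
all_groups = [geography, music]

@merit.sut(validate_cases=[case for group in all_groups for case in group.cases])
def chatbot():
    # my_chatbot_fn is your chatbot callable, defined elsewhere
    return my_chatbot_fn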

Recommendations

1. Use Cases when data comes from external sources

If you’re hardcoding inputs directly in your merit functions, you probably don’t need Case. Don’t do this:

from merit import Case, iter_cases

# Hardcoding simple data in Case objects is unnecessary
cases = [Case(sut_input_values={"x": 1}), Case(sut_input_values={"x": 2})]

@iter_cases(*cases)
def merit_simple(case, add_one):
    result = add_one(**case.sut_input_values)
    assert result == case.sut_input_values["x"] + 1

Do this:

import merit

# For simple hardcoded data, use parametrize directly
@merit.parametrize("x", [1, 2, 3, 4, 5])
def merit_simple(x, add_one):
    result = add_one(x)
    assert result == x + 1

When loading cases from JSON/APIs, prefer Case[YourModel] (typed references) and validate inputs up-front with @merit.sut(validate_cases=...).
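As a rough sketch of the loading step, the snippet below turns JSON records into keyword arguments for Case. The JSON layout (keys named after the Case fields) and the to_case_kwargs helper are assumptions made for illustration. Only the standard library is used, so the Case construction itself appears as a comment.

```python
import json
from uuid import UUID

# Hypothetical dataset; in practice this would come from a file or an API.
RAW = """
[
  {"id": "00000000-0000-0000-0000-000000000001",
   "sut_input_values": {"text": "I love this"},
   "references": {"expected_label": "positive"},
   "tags": ["smoke"]},
  {"id": "00000000-0000-0000-0000-000000000002",
   "sut_input_values": {"text": "This is bad"},
   "references": {"expected_label": "negative"}}
]
"""


def to_case_kwargs(record: dict) -> dict:
    """Convert one JSON record into keyword arguments for Case(...)."""
    return {
        "id": UUID(record["id"]),
        "sut_input_values": record["sut_input_values"],
        "references": record.get("references", {}),
        "tags": set(record.get("tags", [])),  # JSON has no set type
    }


case_kwargs = [to_case_kwargs(r) for r in json.loads(RAW)]
# With merit installed: cases = [Case(**kw) for kw in case_kwargs]
```

Keeping the id in the dataset means the parametrization IDs stay the same from run to run.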

2. Use tags and metadata for dynamic case selection

Structure your cases with tags and metadata to enable flexible selection without changing merit code.

import os

import merit
from merit import Case

cases = [
    Case(
        sut_input_values={"text": "short"},
        tags={"smoke", "fast"},
        metadata={"execution_time_ms": 50}
    ),
    Case(
        sut_input_values={"text": "very long input" * 100},
        tags={"regression", "slow"},
        metadata={"execution_time_ms": 5000}
    ),
]

# Run different subsets based on context
if os.getenv("CI_QUICK"):
    # Fast tests only in CI
    test_cases = [c for c in cases if "fast" in c.tags]
else:
    # Full suite locally
    test_cases = cases

@merit.iter_cases(*test_cases)
def merit_text_processor(case: Case, processor):
    processor(**case.sut_input_values)
