Case is a container for a single scenario: inputs you will pass to your System Under Test (SUT), plus optional reference values you’ll assert against.
Using Case enables:
- Iterating one merit function over many scenarios with @merit.iter_cases(...)
- Grouping related cases with CaseGroup and iterating with @merit.iter_case_groups(...)
- Keeping inputs in sut_input_values (to call sut(**case.sut_input_values))
- Storing reference data in references (typed or untyped)
- Tagging and filtering with tags, and attaching context with metadata
Basic Usage
import merit
from merit import Case

def classifier(text: str) -> str:
    return "positive" if "love" in text.lower() else "negative"

case1 = Case(
    sut_input_values={"text": "I love this"},
    references={"expected_label": "positive"},
    tags={"smoke"},
)
case2 = Case(
    sut_input_values={"text": "This is bad"},
    references={"expected_label": "negative"},
    tags={"regression"},
)

# Iterate only over the cases tagged "smoke"
@merit.iter_cases(*[c for c in (case1, case2) if "smoke" in c.tags])
def merit_classifier(case: Case):
    label = classifier(**case.sut_input_values)
    assert label == case.references["expected_label"]
Case API
sut_input_values is a dictionary of keyword arguments that will be passed to your SUT:
from merit import Case
case = Case(sut_input_values={"prompt": "Hello"})
Typed references
Use Case[YourModel] to get IDE autocomplete and runtime validation of references.
from pydantic import BaseModel
from merit import Case

class AgentReference(BaseModel):
    expected_keywords: list[str]
    min_response_length: int = 10

case = Case[AgentReference](
    sut_input_values={"prompt": "Say hello"},
    references=AgentReference(
        expected_keywords=["hello"],
        min_response_length=5,
    ),
)
assert case.references.min_response_length == 5
Untyped references
A plain Case uses untyped references (a dict); when omitted, references defaults to {}:
from merit import Case
case = Case(
    sut_input_values={"x": 2},
    references={"expected": 4},
)
assert case.references["expected"] == 4
Providing ID for persistence
Each case has an id: UUID (auto-generated by default). Provide it explicitly when you want stable case IDs across runs or when you store cases in datasets.
This matters because @merit.iter_cases(...) uses case.id to build readable, stable parametrization IDs.
from uuid import UUID
from merit import Case
case = Case(
    id=UUID("00000000-0000-0000-0000-000000000001"),
    sut_input_values={"x": 1},
)
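When cases come from a dataset, one way to get the same id on every run is to derive it from a stable key such as a row identifier. The sketch below uses the standard library's uuid.uuid5 for this. The namespace URL and the stable_case_id helper are illustrative assumptions, not part of merit.

```python
from uuid import NAMESPACE_URL, UUID, uuid5

# Hypothetical namespace for this project's cases; any fixed UUID works.
CASE_NAMESPACE = uuid5(NAMESPACE_URL, "https://example.com/my-eval-dataset")


def stable_case_id(dataset_key: str) -> UUID:
    """Derive a deterministic UUID from a stable dataset key."""
    return uuid5(CASE_NAMESPACE, dataset_key)


# The same key always maps to the same UUID across runs and machines.
first = stable_case_id("row-0001")
second = stable_case_id("row-0001")
other = stable_case_id("row-0002")
```

The resulting value can then be passed as the id argument when constructing a Case.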
Use metadata for extra context and reporting. Values must be JSON-like primitives (str | int | float | bool | None).
from merit import Case
case = Case(
    sut_input_values={"prompt": "Hello"},
    metadata={"priority": "high", "latency_budget_ms": 200},
)
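Lists and dicts are not among the allowed primitive types, so it can help to check metadata loaded from external sources before constructing a Case. This is a minimal sketch: invalid_metadata_keys is a hypothetical helper, and how merit itself reports an invalid value is not covered here.

```python
# Primitive types allowed as metadata values, per the rule stated above.
PRIMITIVES = (str, int, float, bool, type(None))


def invalid_metadata_keys(metadata: dict) -> list[str]:
    """Return the keys whose values are not str, int, float, bool, or None."""
    return [key for key, value in metadata.items() if not isinstance(value, PRIMITIVES)]


ok = invalid_metadata_keys({"priority": "high", "latency_budget_ms": 200})
bad = invalid_metadata_keys({"owners": ["alice", "bob"], "retries": 3})
```

Here ok is empty, while bad names the key that holds a list.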
Use tags to label cases and then select subsets with normal Python:
from merit import Case
cases = [
    Case(sut_input_values={"x": 1}, tags={"smoke"}),
    Case(sut_input_values={"x": 2}, tags={"regression"}),
]
smoke_cases = [c for c in cases if "smoke" in c.tags]
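For selections that combine several tags, set operations keep the filter readable. The sketch below assumes only that each case exposes a tags set, as shown above. select_by_tags is a hypothetical helper, and SimpleNamespace objects stand in for Case so the snippet runs without merit installed.

```python
from types import SimpleNamespace


def select_by_tags(cases, include=frozenset(), exclude=frozenset()):
    """Keep cases that carry every tag in `include` and none in `exclude`."""
    include, exclude = set(include), set(exclude)
    return [
        c for c in cases
        if include <= set(c.tags) and not (exclude & set(c.tags))
    ]


# Stand-ins with a `tags` attribute (not real Case objects).
cases = [
    SimpleNamespace(name="a", tags={"smoke", "fast"}),
    SimpleNamespace(name="b", tags={"regression", "slow"}),
    SimpleNamespace(name="c", tags={"smoke", "slow"}),
]

fast_smoke = select_by_tags(cases, include={"smoke"}, exclude={"slow"})
```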
Validating cases for a SUT
If cases come from files/APIs, validate them against your SUT’s signature by attaching them to @merit.sut(validate_cases=...):
import merit
from merit import Case

cases = [
    Case(sut_input_values={"prompt": "Hello", "temperature": 0.5}),
    Case(sut_input_values={"prompt": "Hi"}),
]

@merit.sut(validate_cases=cases)
def my_agent():
    def run(prompt: str, temperature: float = 0.7) -> str:
        return f"{prompt} @ {temperature}"
    return run

@merit.iter_cases(*cases)
def merit_my_agent(case: Case, my_agent):
    my_agent(**case.sut_input_values)
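To illustrate what checking inputs against a signature means, the sketch below binds keyword arguments to a function with the standard library's inspect module. It is not merit's implementation: check_inputs is a hypothetical helper, and it checks argument names and required parameters only, not value types.

```python
import inspect
from typing import Optional


def check_inputs(func, sut_input_values: dict) -> Optional[str]:
    """Return an error message if the kwargs do not fit func's signature."""
    try:
        inspect.signature(func).bind(**sut_input_values)
    except TypeError as exc:
        return str(exc)
    return None


def run(prompt: str, temperature: float = 0.7) -> str:
    return f"{prompt} @ {temperature}"


ok = check_inputs(run, {"prompt": "Hi"})  # fits: temperature has a default
missing = check_inputs(run, {"temperature": 0.5})  # prompt is required
unexpected = check_inputs(run, {"prompt": "Hi", "top_p": 1.0})  # unknown keyword
```

Only the first call returns None; the other two return the TypeError message raised by bind.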
CaseGroup
A CaseGroup bundles related Case objects together with group-level references and a pass threshold (min_passes). This is useful when your evaluation naturally splits into logical groups (e.g. topics, languages, difficulty tiers) and you want to:
- Assert on group-level data that applies to every case in the group
- Set a per-group pass threshold instead of a single global one
- Get hierarchical reporting: run → groups → cases
Basic Usage
import merit
from merit import Case, CaseGroup

geography = CaseGroup(
    name="geography",
    cases=[
        Case(sut_input_values={"prompt": "Capital of France?"}, references={"expected": "Paris"}),
        Case(sut_input_values={"prompt": "Capital of Germany?"}, references={"expected": "Berlin"}),
    ],
    min_passes=2,  # strict: both must pass
)
music = CaseGroup(
    name="music",
    cases=[
        Case(sut_input_values={"prompt": "Best rock band?"}, references={"expected": "Metallica"}),
        Case(sut_input_values={"prompt": "Best pop artist?"}, references={"expected": "Lady Gaga"}),
    ],
    min_passes=1,  # tolerant: at least one must pass
)

@merit.iter_case_groups(geography, music)
def merit_chatbot(group: CaseGroup, case: Case, chatbot):
    response = chatbot(**case.sut_input_values)
    assert case.references["expected"] in response
@merit.iter_case_groups injects two parameters by name:
- group: the current CaseGroup (with its name, references, and cases)
- case: the current Case inside that group
The merit passes only if every group meets its own min_passes threshold.
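The pass rule can be pictured with plain Python. This sketch is only an illustration of the rule as stated, not merit's internals. merit_passes is a hypothetical function, and the input layout (per-case results plus min_passes for each group) is an assumption made for the example.

```python
def merit_passes(groups: dict[str, tuple[list[bool], int]]) -> bool:
    """Each value is (per-case pass/fail results, min_passes) for one group."""
    return all(
        sum(results) >= min_passes
        for results, min_passes in groups.values()
    )


outcome_ok = merit_passes({
    "geography": ([True, True], 2),   # strict: both passed
    "music": ([True, False], 1),      # tolerant: one pass is enough
})
outcome_fail = merit_passes({
    "geography": ([True, False], 2),  # only 1 of the 2 required passes
    "music": ([True, True], 1),
})
```

A single group falling short of its threshold is enough to fail the whole merit, even when every other group passes.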
Typed group references
Where Case[RefsT] takes one type parameter, CaseGroup takes two: the first for case-level references and the second for group-level references.
from pydantic import BaseModel
from merit import Case, CaseGroup

class CaseRefs(BaseModel):
    expected: str

class GroupRefs(BaseModel):
    stop_keywords: list[str]

group = CaseGroup[CaseRefs, GroupRefs](
    name="geography",
    references=GroupRefs(stop_keywords=["Lol", "Kek"]),
    cases=[
        Case[CaseRefs](
            sut_input_values={"prompt": "Capital of France?"},
            references=CaseRefs(expected="Paris"),
        ),
    ],
)
# IDE autocomplete works on both levels
group.references.stop_keywords  # list[str]
group.cases[0].references.expected  # str
Validation
CaseGroup enforces these constraints at creation time:
- cases must contain at least 1 case
- min_passes must be ≥ 1
- min_passes cannot exceed the number of cases
# All of these raise ValueError at creation time:
CaseGroup(name="empty", cases=[], min_passes=1) # no cases
CaseGroup(name="zero", cases=[Case(...)], min_passes=0) # min_passes < 1
CaseGroup(name="over", cases=[Case(...)], min_passes=2) # min_passes > len(cases)
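When groups are built from external files, it can be convenient to collect every violated constraint before calling CaseGroup, so a bad record is reported in full. The sketch below restates the three rules above in plain Python. group_constraint_errors is a hypothetical helper, and its messages are not the ones merit raises.

```python
def group_constraint_errors(num_cases: int, min_passes: int) -> list[str]:
    """List every CaseGroup constraint the given sizes would violate."""
    errors = []
    if num_cases < 1:
        errors.append("cases must contain at least 1 case")
    if min_passes < 1:
        errors.append("min_passes must be >= 1")
    if min_passes > num_cases:
        errors.append("min_passes cannot exceed the number of cases")
    return errors


valid = group_constraint_errors(num_cases=2, min_passes=2)
empty = group_constraint_errors(num_cases=0, min_passes=1)
over = group_constraint_errors(num_cases=1, min_passes=2)
```

Note that an empty group with min_passes=1 breaks two rules at once, since the threshold also exceeds the number of cases.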
Validating grouped cases for a SUT
To validate all case inputs against your SUT signature, flatten the groups:
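import merit
from merit import CaseGroup

# geography and music are the CaseGroup objects from the Basic Usage example;
# my_chatbot_fn stands for your own chatbot callable.
all_groups = [geography, music]

@merit.sut(validate_cases=[case for group in all_groups for case in group.cases])
def chatbot():
    return my_chatbot_fn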
Recommendations
1. Use Cases when data comes from external sources
If you’re hardcoding inputs directly in your merit functions, you probably don’t need Case.
Don’t do this:
from merit import Case, iter_cases

# Hardcoding simple data in Case objects is unnecessary
cases = [Case(sut_input_values={"x": 1}), Case(sut_input_values={"x": 2})]

@iter_cases(*cases)
def merit_simple(case, add_one):
    result = add_one(**case.sut_input_values)
    assert result
Do this:
import merit

# For simple hardcoded data, use parametrize directly
@merit.parametrize("x", [1, 2, 3, 4, 5])
def merit_simple(x, add_one):
    result = add_one(x)
    assert result
When loading cases from JSON/APIs, prefer Case[YourModel] (typed references) and validate inputs up-front with @merit.sut(validate_cases=...).
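As a rough sketch of the loading step, the snippet below maps JSON records onto the keyword arguments that Case takes in the examples above. The record layout (input, expected, tags) and the to_case_kwargs helper are hypothetical, and untyped references are used only to keep the snippet free of dependencies.

```python
import json

# Hypothetical dataset; in practice this would come from a file or an API.
RAW = """
[
  {"input": {"prompt": "Hello"}, "expected": "Hi", "tags": ["smoke"]},
  {"input": {"prompt": "Bye"}, "expected": "Goodbye"}
]
"""


def to_case_kwargs(record: dict) -> dict:
    """Map one JSON record onto keyword arguments for Case."""
    kwargs = {
        "sut_input_values": record["input"],
        "references": {"expected": record["expected"]},
    }
    if "tags" in record:
        kwargs["tags"] = set(record["tags"])  # JSON has no set type
    return kwargs


case_kwargs = [to_case_kwargs(record) for record in json.loads(RAW)]
# With merit installed, each entry can then be expanded as Case(**kwargs).
```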
Structure your cases with tags and metadata to enable flexible selection without changing merit code.
import os

import merit
from merit import Case

cases = [
    Case(
        sut_input_values={"text": "short"},
        tags={"smoke", "fast"},
        metadata={"execution_time_ms": 50},
    ),
    Case(
        sut_input_values={"text": "very long input" * 100},
        tags={"regression", "slow"},
        metadata={"execution_time_ms": 5000},
    ),
]

# Run different subsets based on context
if os.getenv("CI_QUICK"):
    # Fast tests only in CI
    test_cases = [c for c in cases if "fast" in c.tags]
else:
    # Full suite locally
    test_cases = cases

@merit.iter_cases(*test_cases)
def merit_text_processor(case: Case, processor):
    processor(**case.sut_input_values)