Merit Definitions API

Decorators

@resource

Register a factory function as an injectable resource. Signature:

@resource(
    fn: Callable | None = None,
    *,
    scope: Scope | str = Scope.CASE,
    on_resolve: Callable[[Any], Any] | None = None,
    on_injection: Callable[[Any], Any] | None = None,
    on_teardown: Callable[[Any], Any] | None = None,
)

Parameters:

Name	Type	Default	Description
`fn`	`Callable | None`	`None`	The resource factory function
`scope`	`Scope | str`	`Scope.CASE`	Lifecycle scope: `"case"`, `"suite"`, or `"session"`
`on_resolve`	`Callable | None`	`None`	Hook called once, when the resource is first created
`on_injection`	`Callable | None`	`None`	Hook called every time the resource is injected
`on_teardown`	`Callable | None`	`None`	Hook called after the generator's teardown code runs

Returns: Decorated function registered as a resource

Example:

import merit

@merit.resource
def api_client():
    return APIClient()

@merit.resource(scope="session")
async def database():
    conn = await connect()
    yield conn
    await conn.close()

@merit.resource(
    scope="suite",
    on_injection=lambda client: client.refresh_token()
)
def authenticated_client():
    client = APIClient()
    client.login()
    yield client
    client.logout()

def merit_test(api_client, database, authenticated_client):
    # All resources automatically injected
    pass
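The hook order in the parameter table can be modelled with a small synchronous stand-in. This is a hypothetical sketch, not merit's implementation. `LifecycleSketch` is an illustrative name. Async factories and scopes are left out, and hook return values are ignored. The sketch also assumes `on_teardown` fires at teardown even for non-generator factories.

```python
import inspect
from typing import Any, Callable, Optional

_UNSET = object()  # sentinel: resource not created yet


class LifecycleSketch:
    """Hypothetical model of one resource instance and its hooks (not merit's code)."""

    def __init__(
        self,
        factory: Callable[[], Any],
        on_resolve: Optional[Callable[[Any], Any]] = None,
        on_injection: Optional[Callable[[Any], Any]] = None,
        on_teardown: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.factory = factory
        self.on_resolve = on_resolve
        self.on_injection = on_injection
        self.on_teardown = on_teardown
        self._gen = None
        self._value: Any = _UNSET

    def inject(self) -> Any:
        if self._value is _UNSET:
            result = self.factory()
            if inspect.isgenerator(result):
                self._gen = result
                self._value = next(result)  # run setup code up to the first yield
            else:
                self._value = result
            if self.on_resolve:
                self.on_resolve(self._value)  # once, on first creation
        if self.on_injection:
            self.on_injection(self._value)  # on every injection
        return self._value

    def teardown(self) -> None:
        if self._value is _UNSET:
            return  # never created, nothing to tear down
        if self._gen is not None:
            try:
                next(self._gen)  # run the code after `yield`
            except StopIteration:
                pass
        if self.on_teardown:
            self.on_teardown(self._value)  # after generator teardown
        self._value = _UNSET
        self._gen = None


events: list = []


def client_factory():
    events.append("setup")
    yield "client"
    events.append("cleanup")


resource = LifecycleSketch(
    client_factory,
    on_resolve=lambda c: events.append("resolve"),
    on_injection=lambda c: events.append("inject"),
    on_teardown=lambda c: events.append("teardown"),
)
resource.inject()
resource.inject()
resource.teardown()
print(events)  # ['setup', 'resolve', 'inject', 'inject', 'cleanup', 'teardown']
```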

@sut

Register a system-under-test (SUT) factory as a traced resource. Signature:

@sut(
    fn: Callable | None = None,
    *,
    scope: Scope | str = Scope.CASE,
    method: str = "__call__",
    validate_cases: list[Case[Any]] | None = None,
)

Parameters:

Name	Type	Default	Description
`fn`	`Callable | None`	`None`	The SUT factory function to register
`scope`	`Scope | str`	`Scope.CASE`	Resource lifecycle scope: `"case"`, `"suite"`, `"session"`
`method`	`str`	`"__call__"`	Method to trace when the factory returns a non-callable instance
`validate_cases`	`list[Case[Any]] | None`	`None`	Cases to validate against the resolved SUT signature (raises on invalid input)

Returns: Decorated function registered as a traced resource

Example:

import merit

from rag import retrieve
from agents import Agent

@merit.sut
def rag():
    return retrieve

@merit.sut(method="run")
def agent():
    agent = Agent(tools="all")
    return agent

cases = [
    merit.Case(sut_input_values={"query": "Privacy policy"}),
    merit.Case(sut_input_values={"query": "How do you store data?"}),
]

@merit.sut(validate_cases=cases)
def search_agent():
    return retrieve

def merit_test(agent, rag):
    # Resolved values behave like original callable/instance APIs
    context = rag("Privacy policy")
    output = agent.run("How do you store my data?", context=context)
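The `method` parameter decides which entry point is traced. The resolved value keeps its original API. The sketch below is a hypothetical model of that idea, not merit's tracing code. `trace_sut`, `_TracedInstance`, and the call log are illustrative names. The sketch wraps the object itself when `method="__call__"` and the object is callable. Otherwise it proxies the instance and wraps only the named method.

```python
import functools
from typing import Any, Callable, Optional


def _record(fn: Callable[..., Any], log: list) -> Callable[..., Any]:
    """Wrap fn so each call appends (args, kwargs, result) to log."""
    @functools.wraps(fn)
    def traced(*args: Any, **kwargs: Any) -> Any:
        result = fn(*args, **kwargs)
        log.append((args, kwargs, result))
        return result
    return traced


class _TracedInstance:
    """Forward attribute access to the wrapped instance, tracing one method."""

    def __init__(self, target: Any, method: str, log: list) -> None:
        self._target = target
        self._method = method
        self._traced = _record(getattr(target, method), log)

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the proxy itself
        if name == self._method:
            return self._traced
        return getattr(self._target, name)


def trace_sut(obj: Any, method: str = "__call__", log: Optional[list] = None) -> Any:
    """Hypothetical: return obj with its traced entry point wrapped."""
    log = [] if log is None else log
    if method == "__call__" and callable(obj):
        return _record(obj, log)  # plain callable: wrap it directly
    return _TracedInstance(obj, method, log)  # instance: wrap one method
```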

@parametrize

Run a merit function with multiple parameter combinations. Signature:

@parametrize(
    argnames: str | Sequence[str],
    argvalues: Iterable[Any],
    *,
    ids: Sequence[str] | None = None,
)

Parameters:

Name	Type	Default	Description
`argnames`	`str | Sequence[str]`	-	Parameter name(s), as a comma-separated string or a sequence
`argvalues`	`Iterable[Any]`	-	One entry per combination: a tuple of values when several names are given, or a single value for one name
`ids`	`Sequence[str] | None`	`None`	Optional custom IDs, one per combination

Returns: Decorator that applies parametrization to the target function

Example:

import merit

@merit.parametrize("city,state", [
    ("Boston", "Massachusetts"),
    ("Austin", "Texas"),
    ("Miami", "Florida"),
])
def merit_geography(city: str, state: str, bot):
    result = bot.ask(f"What state is {city} in?")
    assert state in result

# Stacked parametrization (creates cartesian product)
@merit.parametrize("model", ["gpt-4", "claude-3"])
@merit.parametrize("temperature", [0.0, 0.7])
def merit_combinations(model: str, temperature: float):
    # Runs 4 times: 2 models × 2 temperatures
    pass

# Custom IDs
@merit.parametrize("value", [1, 2, 3], ids=["one", "two", "three"])
def merit_custom_ids(value: int):
    assert value > 0
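Stacked decorators multiply out as a cartesian product. The helpers below show how the inputs combine into per-run argument dicts. Comma-separated names, bare values for a single name, and stacked layers are all handled. `normalize` and `expand` are hypothetical names, not part of merit. The order in which merit actually schedules the combinations is an assumption here.

```python
from itertools import product
from typing import Any, Dict, Iterable, List, Sequence, Tuple, Union

ArgNames = Union[str, Sequence[str]]


def normalize(argnames: ArgNames, argvalues: Iterable[Any]) -> List[Dict[str, Any]]:
    """Turn one @parametrize call into a list of {name: value} dicts."""
    if isinstance(argnames, str):
        names = [name.strip() for name in argnames.split(",")]
    else:
        names = list(argnames)
    rows = []
    for value in argvalues:
        # A single name takes bare values; several names take one tuple per row
        values = tuple(value) if len(names) > 1 else (value,)
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(values)}: {value!r}")
        rows.append(dict(zip(names, values)))
    return rows


def expand(*layers: Tuple[ArgNames, Iterable[Any]]) -> List[Dict[str, Any]]:
    """Cartesian product of stacked parametrize layers, merged into one dict each."""
    combos = []
    for parts in product(*(normalize(names, values) for names, values in layers)):
        merged: Dict[str, Any] = {}
        for part in parts:
            merged.update(part)
        combos.append(merged)
    return combos


combos = expand(("model", ["gpt-4", "claude-3"]), ("temperature", [0.0, 0.7]))
print(len(combos))  # 4
```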

@repeat

Run a merit function multiple times to test consistency. Signature:

@repeat(
    count: int,
    *,
    min_passes: int | None = None,
)

Parameters:

Name	Type	Default	Description
`count`	`int`	-	Number of times to run the merit
`min_passes`	`int | None`	`None`	Minimum passes required; `None` requires all `count` runs to pass

Returns: Decorator that applies repeat configuration to the target

Example:

import merit

@merit.repeat(10)
def merit_consistent(llm):
    # All 10 runs must pass
    response = llm.generate("Say hello")
    assert "hello" in response.lower()

@merit.repeat(10, min_passes=8)
def merit_mostly_reliable(llm):
    # At least 8 out of 10 must pass
    response = llm.generate("Translate 'hello' to Spanish")
    assert "hola" in response.lower()
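The `min_passes` rule counts passing runs against a threshold that defaults to `count`. This is a hypothetical sketch. `run_repeated` and `flaky_check` are illustrative names. Only `AssertionError` counts as a failed run; how merit treats other exceptions is not covered.

```python
from typing import Callable, Optional, Tuple


def run_repeated(
    check: Callable[[int], None], count: int, min_passes: Optional[int] = None
) -> Tuple[int, bool]:
    """Run check `count` times; return (passes, whether the merit passes)."""
    required = count if min_passes is None else min_passes  # default: every run
    passes = 0
    for attempt in range(count):
        try:
            check(attempt)
        except AssertionError:
            continue  # a failed assertion counts as a failed run
        passes += 1
    return passes, passes >= required


def flaky_check(attempt: int) -> None:
    # Stand-in for a nondeterministic LLM check: attempts 3 and 7 fail
    assert attempt not in (3, 7)


print(run_repeated(flaky_check, 10))                # (8, False): all 10 required
print(run_repeated(flaky_check, 10, min_passes=8))  # (8, True)
```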

@run_inline

Run a synchronous merit inline on the event-loop thread. By default, synchronous merits are offloaded to a worker thread via asyncio.to_thread(...). Use @run_inline to opt out when thread affinity matters. Signature:

@run_inline

Parameters:

Name	Type	Description
`fn`	`Callable[..., Any]`	Synchronous merit function to mark for inline execution

Returns: Decorated sync function marked to run inline

Notes:

Applies only to synchronous def merit_* functions.
async def merit_* functions already run on the event-loop thread.

Example:

import threading
import merit

def merit_default_threaded():
    # Default sync behavior: worker thread.
    assert threading.current_thread() is not threading.main_thread()

@merit.run_inline
def merit_main_thread_only():
    # Opt-out: runs inline on the event-loop thread (the main thread here).
    assert threading.current_thread() is threading.main_thread()
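The split between the default and inline execution can be reproduced with the standard library. `dispatch` is a hypothetical stand-in for merit's scheduler. Only `asyncio.to_thread`, `asyncio.run`, and `threading.get_ident` are real APIs here. Comparing thread identifiers avoids assuming which thread hosts the loop.

```python
import asyncio
import threading
from typing import Any, Callable


async def dispatch(fn: Callable[[], Any], inline: bool) -> Any:
    """Hypothetical dispatcher: call inline, or offload to a worker thread."""
    if inline:
        return fn()  # blocks the event loop while fn runs
    return await asyncio.to_thread(fn)  # the documented default for sync merits


async def main() -> tuple:
    loop_thread = threading.get_ident()
    offloaded = await dispatch(threading.get_ident, inline=False)
    inline = await dispatch(threading.get_ident, inline=True)
    return loop_thread, offloaded, inline


loop_thread, offloaded, inline = asyncio.run(main())
print(offloaded != loop_thread)  # True: ran in a worker thread
print(inline == loop_thread)     # True: ran on the event-loop thread
```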

@tag

Add tags to merit functions or classes for filtering and organization. Signature:

@tag(*names: str)

Parameters:

Name	Type	Description
`*names`	`str`	Tag names to apply

Returns: Decorator that adds tags to the target

Example:

import merit

@merit.tag("smoke", "fast")
def merit_health_check(api):
    assert api.health_check()

@merit.tag("integration", "slow")
def merit_end_to_end(system):
    pass

# Tag entire classes
@merit.tag("customer-support")
class MeritSupportBot:
    @merit.tag("greeting")
    def merit_hello(self, bot):
        pass

    @merit.tag("farewell")
    def merit_goodbye(self, bot):
        pass

CLI Usage:

merit test --tag smoke    # Only smoke tests
merit test --tag slow     # Only slow tests
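Tag selection comes down to a set-membership test. This sketch makes two assumptions. First, a method's effective tags are its own tags plus its class's tags. Second, it models only a single `--tag` filter; how merit combines several `--tag` flags is not described here. `effective_tags` and `select` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Set


def effective_tags(own: Iterable[str], class_tags: Iterable[str] = ()) -> Set[str]:
    """Tags on a merit plus tags inherited from its class (assumed union)."""
    return set(own) | set(class_tags)


def select(merits: Dict[str, Set[str]], tag: str) -> List[str]:
    """Names of merits carrying `tag`, analogous to `merit test --tag <tag>`."""
    return sorted(name for name, tags in merits.items() if tag in tags)


merits = {
    "merit_health_check": effective_tags({"smoke", "fast"}),
    "merit_end_to_end": effective_tags({"integration", "slow"}),
    "MeritSupportBot.merit_hello": effective_tags({"greeting"}, {"customer-support"}),
    "MeritSupportBot.merit_goodbye": effective_tags({"farewell"}, {"customer-support"}),
}

print(select(merits, "smoke"))             # ['merit_health_check']
print(select(merits, "customer-support"))  # both MeritSupportBot merits
```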

@tag.skip

Skip a merit function with an optional reason. Signature:

@tag.skip(*, reason: str | None = None)

Parameters:

Name	Type	Default	Description
`reason`	`str | None`	`None`	Optional explanation for why the merit is skipped

Returns: Decorator that marks the target as skipped

Example:

import merit

@merit.tag.skip(reason="Feature not implemented yet")
def merit_upcoming_feature():
    pass

@merit.tag.skip(reason="Requires API key")
def merit_external_api():
    pass

@tag.xfail

Mark a merit as expected to fail. Signature:

@tag.xfail(
    *,
    reason: str | None = None,
    strict: bool = False,
)

Parameters:

Name	Type	Default	Description
`reason`	`str | None`	`None`	Optional explanation for expected failure
`strict`	`bool`	`False`	If `True`, passing is treated as a failure (unexpected pass)

Returns: Decorator that marks the target as expected to fail

Example:

import merit

@merit.tag.xfail(reason="Known bug #123")
def merit_known_issue():
    # This failure won't fail the merit suite
    assert False

@merit.tag.xfail(reason="Model not accurate yet", strict=True)
def merit_strict_xfail():
    # If this passes, it's an error (unexpected)
    pass
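The possible outcomes of an `@tag.xfail` merit fit in a few lines. The labels `xfailed`, `xpassed`, and `failed` are assumed names. The sketch also assumes a non-strict unexpected pass does not fail the run. The reference above only states that `strict=True` turns a pass into a failure.

```python
def xfail_outcome(body_passed: bool, strict: bool = False) -> str:
    """Hypothetical mapping from an @tag.xfail merit's result to a report outcome."""
    if not body_passed:
        return "xfailed"  # expected failure: does not fail the suite
    # Unexpected pass: treated as a failure only when strict=True
    return "failed" if strict else "xpassed"


for passed in (False, True):
    for strict in (False, True):
        print(f"passed={passed!s:5} strict={strict!s:5} -> {xfail_outcome(passed, strict)}")
```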

Classes

Case

Container for test case inputs and reference data. Attributes:

Name	Type	Description
`id`	`UUID`	Unique identifier (auto-generated)
`tags`	`set[str]`	Tags for filtering or categorization
`metadata`	`dict[str, str | int | float | bool | None]`	Arbitrary key-value pairs
`references`	`RefsT`	Reference data for validation (typed or dict), defaults to `{}`
`sut_input_values`	`dict[str, Any]`	Input arguments to pass to the SUT

Example:

from merit import Case
import json

# Create cases programmatically
cases = [
    Case(
        tags={"smoke"},
        metadata={"category": "greeting"},
        sut_input_values={"prompt": "Hello"},
        references={"expected": "Hi there!"}
    ),
    Case(
        sut_input_values={"prompt": "Goodbye"},
        references={"expected": "See you later!"}
    ),
]

# Load from JSON
with open("cases.json") as f:
    cases = [Case(**item) for item in json.load(f)]

# Typed references
from typing import TypedDict

class MyRefs(TypedDict):
    expected_label: str
    confidence_threshold: float

case = Case[MyRefs](
    sut_input_values={"text": "Sample input"},
    references={"expected_label": "positive", "confidence_threshold": 0.8}
)
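`Case(**item)` unpacks each JSON object, so the object keys must match `Case` attribute names. The stand-in dataclass below mirrors the attribute table to show a compatible `cases.json` layout. `CaseSketch` is hypothetical. Converting a JSON list into `set[str]` for `tags` is an assumption about how the real `Case` coerces input.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Set
from uuid import UUID, uuid4


@dataclass
class CaseSketch:
    """Stand-in mirroring the Case attribute table; not merit's Case class."""
    sut_input_values: Dict[str, Any] = field(default_factory=dict)
    references: Dict[str, Any] = field(default_factory=dict)
    tags: Set[str] = field(default_factory=set)
    metadata: Dict[str, Any] = field(default_factory=dict)
    id: UUID = field(default_factory=uuid4)  # auto-generated when omitted

    def __post_init__(self) -> None:
        self.tags = set(self.tags)  # JSON arrays arrive as lists


# Each JSON object's keys are Case field names, so Case(**item) can consume it
CASES_JSON = """
[
  {"sut_input_values": {"prompt": "Hello"},
   "references": {"expected": "Hi there!"},
   "tags": ["smoke"],
   "metadata": {"category": "greeting"}},
  {"sut_input_values": {"prompt": "Goodbye"},
   "references": {"expected": "See you later!"}}
]
"""

cases = [CaseSketch(**item) for item in json.loads(CASES_JSON)]
print(cases[0].tags)                    # {'smoke'}
print(cases[1].references["expected"])  # See you later!
```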

CaseGroup

Container for grouping related cases with group-level references and a pass threshold. Type Parameters:

Name	Default	Description
`RefsT`	`dict[str, Any]`	Type of each case’s `references`
`GroupRefsT`	`dict[str, Any]`	Type of the group’s `references`

Attributes:

Name	Type	Description
`name`	`str`	Unique group identifier (used in reports and ID suffixes)
`cases`	`list[Case[RefsT]]`	One or more cases in this group (min 1)
`references`	`GroupRefsT`	Group-level reference data, defaults to `{}`
`min_passes`	`int`	Minimum case passes required for the group to pass (default `1`, must be `≥ 1` and `≤ len(cases)`)

Example:

from pydantic import BaseModel
from merit import Case, CaseGroup


class CaseRefs(BaseModel):
    expected: str

class GroupRefs(BaseModel):
    stop_keywords: list[str]


geography = CaseGroup[CaseRefs, GroupRefs](
    name="geography",
    references=GroupRefs(stop_keywords=["Lol"]),
    cases=[
        Case[CaseRefs](
            sut_input_values={"prompt": "Capital of France?"},
            references=CaseRefs(expected="Paris"),
        ),
        Case[CaseRefs](
            sut_input_values={"prompt": "Capital of Germany?"},
            references=CaseRefs(expected="Berlin"),
        ),
    ],
    min_passes=2,
)

# Untyped (dict references)
simple = CaseGroup(
    name="simple",
    cases=[Case(sut_input_values={"x": 1})],
)
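The documented `CaseGroup` constraints can be checked at construction time: at least one case, and `1 ≤ min_passes ≤ len(cases)`. This hypothetical stand-in does that check. `CaseGroupSketch` uses plain dicts for cases. Raising `ValueError` is an assumption; merit may surface a different error type.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class CaseGroupSketch:
    """Stand-in enforcing the documented CaseGroup constraints; not merit's class."""
    name: str
    cases: List[Dict[str, Any]]  # plain dicts stand in for Case objects
    references: Dict[str, Any] = field(default_factory=dict)
    min_passes: int = 1

    def __post_init__(self) -> None:
        if not self.cases:
            raise ValueError(f"group {self.name!r} needs at least one case")
        if not 1 <= self.min_passes <= len(self.cases):
            raise ValueError(
                f"group {self.name!r}: min_passes must be between 1 and "
                f"{len(self.cases)}, got {self.min_passes}"
            )


ok = CaseGroupSketch(name="geography", cases=[{"x": 1}, {"x": 2}], min_passes=2)
try:
    CaseGroupSketch(name="too-strict", cases=[{"x": 1}], min_passes=2)
except ValueError as exc:
    print(exc)  # group 'too-strict': min_passes must be between 1 and 1, got 2
```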

Scope

Enum defining resource lifecycle scopes. Values:

Value	Description
`Scope.CASE`	Fresh instance per parametrized merit case
`Scope.SUITE`	Shared within a single merit file/module
`Scope.SESSION`	Shared across entire merit run

Example:

from merit import resource
from merit.resources import Scope

@resource(scope=Scope.SESSION)
def expensive_model():
    return load_model()  # Loaded once

@resource(scope=Scope.SUITE)
def api_client():
    return APIClient()  # Shared within file

@resource(scope=Scope.CASE)
def temp_dir():
    import shutil
    import tempfile
    tmpdir = tempfile.mkdtemp()
    yield tmpdir
    shutil.rmtree(tmpdir)  # Fresh per case
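Scope decides how widely a resource instance is shared. This hypothetical cache models that by choosing what goes into the cache key. `ScopedCache` and the local `Scope` enum are stand-ins; the real enum lives in `merit.resources`. Teardown is omitted.

```python
from enum import Enum
from typing import Any, Callable, Dict, Hashable, Tuple


class Scope(str, Enum):
    """Stand-in mirroring the documented Scope values."""
    CASE = "case"
    SUITE = "suite"
    SESSION = "session"


class ScopedCache:
    """Hypothetical cache: the scope decides which context goes into the key."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[Hashable, ...], Any] = {}

    def get(
        self, name: str, factory: Callable[[], Any], scope: Scope, *, module: str, case_id: int
    ) -> Any:
        if scope is Scope.SESSION:
            key: Tuple[Hashable, ...] = (name,)  # one value for the whole run
        elif scope is Scope.SUITE:
            key = (name, module)  # one value per merit file/module
        else:
            key = (name, module, case_id)  # one value per case
        if key not in self._values:
            self._values[key] = factory()
        return self._values[key]


calls = {"session": 0, "suite": 0, "case": 0}


def counting_factory(label: str) -> Callable[[], str]:
    def factory() -> str:
        calls[label] += 1
        return label
    return factory


cache = ScopedCache()
for module in ("test_a.py", "test_b.py"):
    for case_id in (1, 2):
        for scope in Scope:
            cache.get(scope.value, counting_factory(scope.value), scope, module=module, case_id=case_id)

print(calls)  # {'session': 1, 'suite': 2, 'case': 4}
```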

Runner

Execute discovered merits and return a MeritRun. Signature:

class Runner:
    def __init__(
        self,
        *,
        reporters: list[Reporter],
        maxfail: int | None = None,
        fail_fast: bool = False,
        verbosity: int = 0,
        concurrency: int = 1,
        timeout: float | None = None,
        enable_tracing: bool = False,
        trace_output: Path | str | None = None,
        capture_output: bool = True,
        save_to_db: bool = True,
        db_path: Path | str | None = None,
        run_id: UUID | str | None = None,
    ) -> None: ...

    async def run(
        self,
        items: list[MeritTestDefinition] | None = None,
        path: str | None = None,
        run_id: UUID | str | None = None,
    ) -> MeritRun: ...

    def run_id_exists(self, run_id: UUID | str) -> bool: ...

Timeout behavior (timeout):

timeout is a run-level limit (not per-test).
Cancellation is cooperative: on timeout, the run is marked stopped_early and no new tests are started.
In-flight work may not stop immediately, especially synchronous merits already executing in worker threads.
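The cooperative timeout can be modelled as a deadline check before each test starts. This hypothetical model runs tests one after another, like `concurrency=1`. It never interrupts a test already in progress. A fake clock keeps the example deterministic. `run_with_deadline` is not part of merit's API.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Sequence, Tuple


async def run_with_deadline(
    tests: Sequence[Tuple[str, Callable[[], Awaitable[None]]]],
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], bool]:
    """Run tests in order; stop starting new ones once the deadline passes.

    Returns (completed test names, stopped_early).
    """
    deadline = clock() + timeout
    completed: List[str] = []
    for name, test in tests:
        if clock() >= deadline:
            return completed, True  # no new tests after the deadline
        await test()  # in-flight work is never interrupted in this sketch
        completed.append(name)
    return completed, False


class FakeClock:
    """Deterministic clock so the example does not depend on real sleeps."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()


def make_test(seconds: float) -> Callable[[], Awaitable[None]]:
    async def test() -> None:
        clock.now += seconds  # pretend the test took this long
    return test


tests = [(f"merit_{i}", make_test(2.0)) for i in range(4)]
print(asyncio.run(run_with_deadline(tests, timeout=3.0, clock=clock)))
# (['merit_0', 'merit_1'], True): merit_1 overran the deadline but finished
```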

Run UUID behavior:

run_id accepts either a UUID object or UUID string.
A run_id passed to run() overrides the constructor's run_id.
If neither is provided, Merit auto-generates a new UUID.
If save_to_db=True and the selected run UUID already exists, run() raises ValueError.
run_id_exists() can be used as a preflight check before execution.
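The run UUID rules above form a short precedence chain. This hypothetical helper mirrors them. A plain set of UUIDs stands in for the database lookup that `run_id_exists()` performs. `resolve_run_id` is an illustrative name.

```python
import uuid
from typing import Optional, Set, Union
from uuid import UUID

RunId = Union[UUID, str]


def resolve_run_id(
    run_arg: Optional[RunId],
    ctor_arg: Optional[RunId],
    existing: Set[UUID],
    save_to_db: bool = True,
) -> UUID:
    """Hypothetical model of the documented run_id precedence rules."""
    chosen = run_arg if run_arg is not None else ctor_arg  # run() wins
    if chosen is None:
        run_id = uuid.uuid4()  # neither given: auto-generate
    else:
        run_id = chosen if isinstance(chosen, UUID) else UUID(chosen)  # str accepted
    if save_to_db and run_id in existing:
        raise ValueError(f"run_id {run_id} already exists")
    return run_id


FIRST = "00000000-0000-0000-0000-000000000001"
SECOND = "00000000-0000-0000-0000-000000000002"

print(resolve_run_id(SECOND, UUID(FIRST), existing=set()))  # 00000000-0000-0000-0000-000000000002
```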

Example:

import asyncio
from uuid import UUID

from merit.reports import ConsoleReporter
from merit.testing import Runner


async def main() -> None:
    runner = Runner(
        reporters=[ConsoleReporter()],
        db_path=".merit/merit.db",
        run_id=UUID("00000000-0000-0000-0000-000000000001"),
    )

    # Preflight check: with save_to_db=True, run() raises ValueError for an existing UUID
    if runner.run_id_exists("00000000-0000-0000-0000-000000000001"):
        raise ValueError("run_id already exists")

    # Uses constructor run_id
    result = await runner.run(path="tests/")

    # Overrides constructor run_id for this run only
    result = await runner.run(
        path="tests/",
        run_id="00000000-0000-0000-0000-000000000002",
    )


# run() is a coroutine, so drive it from an event loop
asyncio.run(main())

Functions

Imperative Outcome Control

Merit provides imperative functions to control test outcomes at runtime. These are different from decorators and are used for conditional control flow within tests.

skip

Skip the current test imperatively. Signature:

def skip(reason: str = "") -> NoReturn

Parameters:

Name	Type	Default	Description
`reason`	`str`	`""`	Explanation for why the test is being skipped

Returns: Never returns (raises SkipTest exception)

Example:

import merit
import os

def merit_requires_api_key():
    """Skip test if API key is not configured."""
    if "API_KEY" not in os.environ:
        merit.skip("API_KEY not configured")

    # Test continues only if API_KEY exists
    api_client = APIClient(api_key=os.environ["API_KEY"])
    assert api_client.health_check()

def merit_conditional_skip(database):
    """Skip based on runtime conditions."""
    if not database.is_available():
        merit.skip("Database not available")

    result = database.query("SELECT 1")
    assert result

fail

Explicitly fail the current test. Signature:

def fail(reason: str = "") -> NoReturn

Parameters:

Name	Type	Default	Description
`reason`	`str`	`""`	Explanation for why the test is failing

Returns: Never returns (raises FailTest exception)

Example:

import merit

def merit_explicit_failure(api_client):
    """Fail test when detecting invalid state."""
    response = api_client.get("/status")

    if response.status_code == 500:
        merit.fail(f"Server returned 500 error: {response.text}")

    assert response.status_code == 200

def merit_validation_failure(data_processor):
    """Fail when preconditions aren't met."""
    data = data_processor.load()

    if len(data) == 0:
        merit.fail("No data loaded - cannot run test")

    result = data_processor.process(data)
    assert result

xfail

Mark the current test as expected to fail and stop execution. Signature:

def xfail(reason: str = "") -> NoReturn

Parameters:

Name	Type	Default	Description
`reason`	`str`	`""`	Explanation for why the test is expected to fail

Returns: Never returns (raises XFailTest exception)

Example:

import merit
import sys

def merit_known_bug():
    """Mark test as expected failure for a known bug."""
    merit.xfail("issue #42: division by zero not handled")

    # This code won't execute
    result = 1 / 0
    assert result == 0

def merit_conditional_xfail():
    """Conditionally mark as expected failure."""
    if sys.version_info < (3, 12):
        merit.xfail("feature requires Python 3.12+")

    # Test continues on Python 3.12+
    assert True

Note: These imperative functions are different from the decorators:

merit.skip() (function) vs @merit.tag.skip() (decorator)
merit.fail() (function) vs no decorator equivalent
merit.xfail() (function) vs @merit.tag.xfail() (decorator)

Use decorators for unconditional outcomes known at definition time. Use functions for conditional outcomes determined at runtime.
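Each imperative function stops the test by raising, so code after the call never runs. In this sketch the exception names come from the reference above. The classes, functions, and the `outcome_of` runner step are local stand-ins, and the outcome labels are assumptions.

```python
from typing import Callable, NoReturn, Tuple


class SkipTest(Exception):
    """Stand-in for merit's SkipTest."""


class FailTest(Exception):
    """Stand-in for merit's FailTest."""


class XFailTest(Exception):
    """Stand-in for merit's XFailTest."""


def skip(reason: str = "") -> NoReturn:
    raise SkipTest(reason)


def fail(reason: str = "") -> NoReturn:
    raise FailTest(reason)


def xfail(reason: str = "") -> NoReturn:
    raise XFailTest(reason)


def outcome_of(test: Callable[[], None]) -> Tuple[str, str]:
    """Hypothetical runner step: map the raised exception to an outcome label."""
    try:
        test()
    except SkipTest as exc:
        return "skipped", str(exc)
    except XFailTest as exc:
        return "xfailed", str(exc)
    except (FailTest, AssertionError) as exc:
        return "failed", str(exc)
    return "passed", ""


def merit_needs_key() -> None:
    config: dict = {}  # simulates a missing API_KEY
    if "API_KEY" not in config:
        skip("API_KEY not configured")
    raise RuntimeError("never reached")


print(outcome_of(merit_needs_key))  # ('skipped', 'API_KEY not configured')
```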

iter_cases

Decorator to run a merit function for each case. Signature:

def iter_cases(
    *cases: Case[RefsT],
    min_passes: int | None = None,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]

Parameters:

Name	Type	Description
`*cases`	`Case[RefsT]`	One or more test cases to iterate over
`min_passes`	`int | None`	Minimum number of passing case executions required for the parent merit to pass. Defaults to all cases.

Returns: Decorator that applies parametrization using the cases

Validation:

min_passes must be >= 1
min_passes cannot exceed the number of provided cases

Example:

from merit import Case, iter_cases
import json

# Load test cases
with open("test_cases.json") as f:
    cases = [Case(**item) for item in json.load(f)]

@iter_cases(*cases)
def merit_from_dataset(case: Case, classifier):
    result = classifier(**case.sut_input_values)

    if case.references:
        expected = case.references["expected_label"]
        assert result == expected

# Passes if at least 8 cases pass; the dataset must contain at least 8 cases
@iter_cases(*cases, min_passes=8)
def merit_from_dataset_pass_at_k(case: Case, classifier):
    result = classifier(**case.sut_input_values)
    assert result in {"positive", "negative", "neutral"}
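Per-case execution with a pass threshold can be modelled as follows. `run_iter_cases` is hypothetical, and cases are plain dicts rather than `Case` objects. The classifier is a toy stand-in. The range check mirrors the validation rules listed above.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def run_iter_cases(
    fn: Callable[[Dict[str, Any]], None],
    cases: List[Dict[str, Any]],
    min_passes: Optional[int] = None,
) -> Tuple[List[bool], bool]:
    """Hypothetical model: one execution per case, then a pass threshold."""
    if not cases:
        raise ValueError("at least one case is required")
    required = len(cases) if min_passes is None else min_passes  # default: all
    if not 1 <= required <= len(cases):
        raise ValueError(f"min_passes must be between 1 and {len(cases)}, got {required}")
    results = []
    for case in cases:
        try:
            fn(case)
            results.append(True)
        except AssertionError:
            results.append(False)
    return results, sum(results) >= required


def classifier_check(case: Dict[str, Any]) -> None:
    # Toy stand-in for classifier(**case.sut_input_values) vs. a reference label
    predicted = "positive" if "good" in case["text"] else "negative"
    assert predicted == case["expected_label"]


cases = [
    {"text": "good service", "expected_label": "positive"},
    {"text": "bad service", "expected_label": "negative"},
    {"text": "not good at all", "expected_label": "negative"},  # misclassified
]
print(run_iter_cases(classifier_check, cases))                # ([True, True, False], False)
print(run_iter_cases(classifier_check, cases, min_passes=2))  # ([True, True, False], True)
```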

iter_case_groups

Decorator to run a merit function for each case group, iterating cases within each group. Signature:

def iter_case_groups(
    *groups: CaseGroup[RefsT, GroupRefsT],
) -> Callable[[Callable[..., Any]], Callable[..., Any]]

Parameters:

Name	Type	Description
`*groups`	`CaseGroup[RefsT, GroupRefsT]`	One or more case groups to iterate over

Returns: Decorator that applies group-level iteration to the target function

Injected parameters:

Name	Type	Description
`group`	`CaseGroup`	The current group being executed
`case`	`Case`	The current case within the group

Execution semantics:

Each group produces a nested execution; within each group, cases are iterated using the group’s min_passes threshold.
The parent merit passes only if all groups pass (i.e. every group meets its own min_passes).

Validation:

At least one group is required (calling it with no groups records a deferred definition error rather than raising at decoration time)

Example:

import merit
from merit import Case, CaseGroup

geography = CaseGroup(
    name="geography",
    cases=[
        Case(sut_input_values={"prompt": "Capital of France?"}, references={"expected": "Paris"}),
        Case(sut_input_values={"prompt": "Capital of Germany?"}, references={"expected": "Berlin"}),
    ],
    min_passes=2,
)

music = CaseGroup(
    name="music",
    cases=[
        Case(sut_input_values={"prompt": "Best rock band?"}, references={"expected": "Metallica"}),
    ],
    min_passes=1,
)

@merit.iter_case_groups(geography, music)
def merit_chatbot(group: CaseGroup, case: Case, chatbot):
    response = chatbot(**case.sut_input_values)
    assert case.references["expected"] in response
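The two-level pass rule reduces to a small aggregation. Each group is checked against its own `min_passes`, then the parent requires all groups to pass. `group_results` is a hypothetical helper that takes already-computed per-case results. The numbers reuse the groups from the example.

```python
from typing import Dict, List, Tuple


def group_results(groups: Dict[str, Tuple[List[bool], int]]) -> Tuple[Dict[str, bool], bool]:
    """Hypothetical aggregation for iter_case_groups.

    `groups` maps a group name to (per-case pass/fail results, group min_passes).
    A group passes when its passing cases reach min_passes; the parent merit
    passes only if every group passes.
    """
    per_group = {
        name: sum(results) >= min_passes
        for name, (results, min_passes) in groups.items()
    }
    return per_group, all(per_group.values())


outcomes = {
    "geography": ([True, False], 2),  # Berlin answer missed: 1 of 2, needs 2
    "music": ([True], 1),
}
print(group_results(outcomes))  # ({'geography': False, 'music': True}, False)
```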

SUT case validation

Validate case inputs by passing them to @sut(validate_cases=...). Validation runs during SUT resolution and raises if any case does not match the resolved callable/method signature.

import merit

cases = [
    merit.Case(sut_input_values={"prompt": "Hello", "temperature": 0.5}),
    merit.Case(sut_input_values={"prompt": "Hi"}),
]

@merit.sut(validate_cases=cases)
def my_agent():
    def run(prompt: str, temperature: float = 0.7) -> str:
        return f"{prompt} @ {temperature}"

    return run
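One plausible way to implement such a check is `inspect.signature(...).bind`, which raises `TypeError` when arguments are missing or unexpected. This is a sketch under that assumption. Merit's actual mechanism and exception type are not specified here. `validate_cases` below is a standalone stand-in, not the `@sut` parameter.

```python
import inspect
from typing import Any, Callable, Dict, List


def validate_cases(fn: Callable[..., Any], inputs: List[Dict[str, Any]]) -> None:
    """Hypothetical check: every case's sut_input_values must bind to fn's signature."""
    signature = inspect.signature(fn)
    for index, values in enumerate(inputs):
        try:
            signature.bind(**values)  # raises TypeError on missing/unexpected args
        except TypeError as exc:
            raise ValueError(f"case {index} does not match {fn.__name__}{signature}: {exc}") from exc


def run(prompt: str, temperature: float = 0.7) -> str:
    return f"{prompt} @ {temperature}"


validate_cases(run, [{"prompt": "Hello", "temperature": 0.5}, {"prompt": "Hi"}])  # passes

try:
    validate_cases(run, [{"promt": "typo"}])
except ValueError as exc:
    print(exc)
```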
