Decorators

@resource

Register a function as a dependency injection resource.
Signature:
@resource(
    fn: Callable | None = None,
    *,
    scope: Scope | str = Scope.CASE,
    on_resolve: Callable[[Any], Any] | None = None,
    on_injection: Callable[[Any], Any] | None = None,
    on_teardown: Callable[[Any], Any] | None = None,
)
Parameters:
  • fn (Callable | None, default None): The resource factory function
  • scope (Scope | str, default Scope.CASE): Lifecycle scope: "case", "suite", or "session"
  • on_resolve (Callable | None, default None): Hook called once when resource is first created
  • on_injection (Callable | None, default None): Hook called every time resource is injected
  • on_teardown (Callable | None, default None): Hook called after generator teardown runs
Returns: Decorated function registered as a resource.
Example:
import merit

@merit.resource
def api_client():
    return APIClient()

@merit.resource(scope="session")
async def database():
    conn = await connect()
    yield conn
    await conn.close()

@merit.resource(
    scope="suite",
    on_injection=lambda client: client.refresh_token()
)
def authenticated_client():
    client = APIClient()
    client.login()
    yield client
    client.logout()

def merit_test(api_client, database, authenticated_client):
    # All resources automatically injected
    pass

@sut

Register a function factory as a traced system-under-test resource.
Signature:
@sut(
    fn: Callable | None = None,
    *,
    scope: Scope | str = Scope.CASE,
    method: str = "__call__",
    validate_cases: list[Case[Any]] | None = None,
)
Parameters:
  • fn (Callable | None, default None): The SUT factory function to register
  • scope (Scope | str, default Scope.CASE): Resource lifecycle scope: "case", "suite", "session"
  • method (str, default "__call__"): Method to trace when the factory returns a non-callable instance
  • validate_cases (list[Case[Any]] | None, default None): Cases to validate against the resolved SUT signature (raises on invalid input)
Returns: Decorated function registered as a traced resource.
Example:
import merit

from rag import retrieve
from agents import Agent

@merit.sut
def rag():
    return retrieve

@merit.sut(method="run")
def agent():
    agent = Agent(tools="all")
    return agent

cases = [
    merit.Case(sut_input_values={"query": "Privacy policy"}),
    merit.Case(sut_input_values={"query": "How do you store data?"}),
]

@merit.sut(validate_cases=cases)
def search_agent():
    return retrieve

def merit_test(agent, rag):
    # Resolved values behave like original callable/instance APIs
    context = rag("Privacy policy")
    output = agent.run("How do you store my data?", context=context)

@parametrize

Run a merit function with multiple parameter combinations.
Signature:
@parametrize(
    argnames: str | Sequence[str],
    argvalues: Iterable[Any],
    *,
    ids: Sequence[str] | None = None,
)
Parameters:
  • argnames (str | Sequence[str], required): Parameter name(s) as string or sequence
  • argvalues (Iterable[Any], required): List of value tuples, one per parameter combination
  • ids (Sequence[str] | None, default None): Optional custom IDs for each test case
Returns: Decorator that applies parametrization to the target function.
Example:
import merit

@merit.parametrize("city,state", [
    ("Boston", "Massachusetts"),
    ("Austin", "Texas"),
    ("Miami", "Florida"),
])
def merit_geography(city: str, state: str, bot):
    result = bot.ask(f"What state is {city} in?")
    assert state in result

# Stacked parametrization (creates cartesian product)
@merit.parametrize("model", ["gpt-4", "claude-3"])
@merit.parametrize("temperature", [0.0, 0.7])
def merit_combinations(model: str, temperature: float):
    # Runs 4 times: 2 models × 2 temperatures
    pass

# Custom IDs
@merit.parametrize("value", [1, 2, 3], ids=["one", "two", "three"])
def merit_custom_ids(value: int):
    assert value > 0
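Stacked parametrization expands to the cartesian product of the value lists. As a plain-Python sketch of that expansion (illustrative only, not Merit internals):

```python
from itertools import product

models = ["gpt-4", "claude-3"]
temperatures = [0.0, 0.7]

# Each stacked @parametrize multiplies the case count: 2 models x 2 temperatures.
combinations = list(product(models, temperatures))
print(len(combinations))  # 4 generated cases
```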

@repeat

Run a merit function multiple times to test consistency.
Signature:
@repeat(
    count: int,
    *,
    min_passes: int | None = None,
)
Parameters:
  • count (int, required): Number of times to run the merit
  • min_passes (int | None, default count): Minimum passes required (defaults to all)
Returns: Decorator that applies repeat configuration to the target.
Example:
import merit

@merit.repeat(10)
def merit_consistent(llm):
    # All 10 runs must pass
    response = llm.generate("Say hello")
    assert "hello" in response.lower()

@merit.repeat(10, min_passes=8)
def merit_mostly_reliable(llm):
    # At least 8 out of 10 must pass
    response = llm.generate("Translate 'hello' to Spanish")
    assert "hola" in response.lower()

@run_inline

Run a synchronous merit inline on the event-loop thread. By default, synchronous merits are offloaded to a worker thread via asyncio.to_thread(...). Use @run_inline to opt out when thread affinity matters.
Signature:
@run_inline
Parameters:
  • fn (Callable[..., Any]): Synchronous merit function to mark for inline execution
Returns: Decorated sync function marked to run inline.
Notes:
  • Applies only to synchronous def merit_* functions.
  • async def merit_* functions already run on the event-loop thread.
Example:
import threading
import merit

def merit_default_threaded():
    # Default sync behavior: worker thread.
    assert threading.current_thread() is not threading.main_thread()

@merit.run_inline
def merit_main_thread_only():
    # Opt-out: runs inline on main event-loop thread.
    assert threading.current_thread() is threading.main_thread()

@tag

Add tags to merit functions or classes for filtering and organization.
Signature:
@tag(*names: str)
Parameters:
  • *names (str): Tag names to apply
Returns: Decorator that adds tags to the target.
Example:
import merit

@merit.tag("smoke", "fast")
def merit_health_check(api):
    assert api.health_check()

@merit.tag("integration", "slow")
def merit_end_to_end(system):
    pass

# Tag entire classes
@merit.tag("customer-support")
class MeritSupportBot:
    @merit.tag("greeting")
    def merit_hello(self, bot):
        pass

    @merit.tag("farewell")
    def merit_goodbye(self, bot):
        pass
CLI Usage:
merit test --tag smoke    # Only smoke tests
merit test --tag slow     # Only slow tests

@tag.skip

Skip a merit function with an optional reason.
Signature:
@tag.skip(*, reason: str | None = None)
Parameters:
  • reason (str | None, default None): Optional explanation for why the merit is skipped
Returns: Decorator that marks the target as skipped.
Example:
import merit

@merit.tag.skip(reason="Feature not implemented yet")
def merit_upcoming_feature():
    pass

@merit.tag.skip(reason="Requires API key")
def merit_external_api():
    pass

@tag.xfail

Mark a merit as expected to fail.
Signature:
@tag.xfail(
    *,
    reason: str | None = None,
    strict: bool = False,
)
Parameters:
  • reason (str | None, default None): Optional explanation for the expected failure
  • strict (bool, default False): If True, passing is treated as a failure (unexpected pass)
Returns: Decorator that marks the target as expected to fail.
Example:
import merit

@merit.tag.xfail(reason="Known bug #123")
def merit_known_issue():
    # This failure won't fail the merit suite
    assert False

@merit.tag.xfail(reason="Model not accurate yet", strict=True)
def merit_strict_xfail():
    # If this passes, it's an error (unexpected)
    pass

Classes

Case

Container for test case inputs and reference data.
Attributes:
  • id (UUID): Unique identifier (auto-generated)
  • tags (set[str]): Tags for filtering or categorization
  • metadata (dict[str, str | int | float | bool | None]): Arbitrary key-value pairs
  • references (RefsT): Reference data for validation (typed or dict), defaults to {}
  • sut_input_values (dict[str, Any]): Input arguments to pass to the SUT
Example:
from merit import Case
import json

# Create cases programmatically
cases = [
    Case(
        tags={"smoke"},
        metadata={"category": "greeting"},
        sut_input_values={"prompt": "Hello"},
        references={"expected": "Hi there!"}
    ),
    Case(
        sut_input_values={"prompt": "Goodbye"},
        references={"expected": "See you later!"}
    ),
]

# Load from JSON
with open("cases.json") as f:
    cases = [Case(**item) for item in json.load(f)]

# Typed references
from typing import TypedDict

class MyRefs(TypedDict):
    expected_label: str
    confidence_threshold: float

case = Case[MyRefs](
    sut_input_values={"text": "Sample input"},
    references={"expected_label": "positive", "confidence_threshold": 0.8}
)

CaseGroup

Container for grouping related cases with group-level references and a pass threshold.
Type Parameters:
  • RefsT (default dict[str, Any]): Type of each case's references
  • GroupRefsT (default dict[str, Any]): Type of the group's references
Attributes:
  • name (str): Unique group identifier (used in reports and ID suffixes)
  • cases (list[Case[RefsT]]): One or more cases in this group (min 1)
  • references (GroupRefsT): Group-level reference data, defaults to {}
  • min_passes (int): Minimum case passes required for the group to pass (default 1, must be ≥ 1 and ≤ len(cases))
Example:
from pydantic import BaseModel
from merit import Case, CaseGroup


class CaseRefs(BaseModel):
    expected: str

class GroupRefs(BaseModel):
    stop_keywords: list[str]


geography = CaseGroup[CaseRefs, GroupRefs](
    name="geography",
    references=GroupRefs(stop_keywords=["Lol"]),
    cases=[
        Case[CaseRefs](
            sut_input_values={"prompt": "Capital of France?"},
            references=CaseRefs(expected="Paris"),
        ),
        Case[CaseRefs](
            sut_input_values={"prompt": "Capital of Germany?"},
            references=CaseRefs(expected="Berlin"),
        ),
    ],
    min_passes=2,
)

# Untyped (dict references)
simple = CaseGroup(
    name="simple",
    cases=[Case(sut_input_values={"x": 1})],
)

Scope

Enum defining resource lifecycle scopes.
Values:
  • Scope.CASE: Fresh instance per parametrized merit case
  • Scope.SUITE: Shared within a single merit file/module
  • Scope.SESSION: Shared across the entire merit run
Example:
from merit import resource
from merit.resources import Scope

@resource(scope=Scope.SESSION)
def expensive_model():
    return load_model()  # Loaded once

@resource(scope=Scope.SUITE)
def api_client():
    return APIClient()  # Shared within file

@resource(scope=Scope.CASE)
def temp_dir():
    import shutil
    import tempfile

    tmpdir = tempfile.mkdtemp()
    yield tmpdir
    shutil.rmtree(tmpdir)  # Fresh per case
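The three scopes differ only in how widely a resolved instance is shared. A hypothetical cache sketch (the keying scheme and names here are illustrative, not Merit internals):

```python
caches = {"session": {}, "suite": {}, "case": {}}

def resolve(name, factory, scope, suite_id=None, case_id=None):
    # Wider scopes use coarser cache keys, so more callers share one instance.
    key = {
        "session": name,                    # one instance for the whole run
        "suite": (name, suite_id),          # one instance per file/module
        "case": (name, suite_id, case_id),  # fresh instance per case
    }[scope]
    cache = caches[scope]
    if key not in cache:
        cache[key] = factory()
    return cache[key]

a = resolve("model", object, "session")
b = resolve("model", object, "session", suite_id="other_file")
print(a is b)  # True: session scope shares one instance everywhere
```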

Runner

Execute discovered merits and return a MeritRun.
Signature:
class Runner:
    def __init__(
        self,
        *,
        reporters: list[Reporter],
        maxfail: int | None = None,
        fail_fast: bool = False,
        verbosity: int = 0,
        concurrency: int = 1,
        timeout: float | None = None,
        enable_tracing: bool = False,
        trace_output: Path | str | None = None,
        capture_output: bool = True,
        save_to_db: bool = True,
        db_path: Path | str | None = None,
        run_id: UUID | str | None = None,
    ) -> None: ...

    async def run(
        self,
        items: list[MeritTestDefinition] | None = None,
        path: str | None = None,
        run_id: UUID | str | None = None,
    ) -> MeritRun: ...

    def run_id_exists(self, run_id: UUID | str) -> bool: ...
Timeout behavior (timeout):
  • timeout is a run-level limit (not per-test).
  • Cancellation is cooperative: on timeout, the run is marked stopped_early and no new tests are started.
  • In-flight work may not stop immediately, especially synchronous merits already executing in worker threads.
Run UUID behavior:
  • run_id accepts either a UUID object or UUID string.
  • run() run_id overrides constructor run_id.
  • If neither is provided, Merit auto-generates a new UUID.
  • If save_to_db=True and the selected run UUID already exists, run() raises ValueError.
  • run_id_exists() can be used as a preflight check before execution.
Example:
from uuid import UUID

from merit.reports import ConsoleReporter
from merit.testing import Runner

runner = Runner(
    reporters=[ConsoleReporter()],
    db_path=".merit/merit.db",
    run_id=UUID("00000000-0000-0000-0000-000000000001"),
)

if runner.run_id_exists("00000000-0000-0000-0000-000000000001"):
    raise ValueError("run_id already exists")

# Uses constructor run_id
result = await runner.run(path="tests/")

# Overrides constructor run_id for this run only
result = await runner.run(
    path="tests/",
    run_id="00000000-0000-0000-0000-000000000002",
)
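The run-level timeout semantics above (no new tests started after the deadline, run marked stopped early) can be sketched in plain asyncio; this is a simplified illustration, not Merit's Runner:

```python
import asyncio

async def run_all(tests, timeout: float):
    # Cooperative cancellation: once the deadline passes, no new tests
    # are started; tests already awaited run to completion.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    results, stopped_early = [], False
    for test in tests:
        if loop.time() >= deadline:
            stopped_early = True
            break
        results.append(await test())
    return results, stopped_early

async def quick():
    return "ok"

results, stopped_early = asyncio.run(run_all([quick, quick, quick], timeout=5.0))
print(results, stopped_early)  # ['ok', 'ok', 'ok'] False
```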

Functions

Imperative Outcome Control

Merit provides imperative functions to control test outcomes at runtime. These are different from decorators and are used for conditional control flow within tests.

skip

Skip the current test imperatively.
Signature:
def skip(reason: str = "") -> NoReturn
Parameters:
  • reason (str, default ""): Explanation for why the test is being skipped
Returns: Never returns (raises SkipTest exception).
Example:
import merit
import os

def merit_requires_api_key():
    """Skip test if API key is not configured."""
    if "API_KEY" not in os.environ:
        merit.skip("API_KEY not configured")

    # Test continues only if API_KEY exists
    api_client = APIClient(api_key=os.environ["API_KEY"])
    assert api_client.health_check()

def merit_conditional_skip(database):
    """Skip based on runtime conditions."""
    if not database.is_available():
        merit.skip("Database not available")

    result = database.query("SELECT 1")
    assert result

fail

Explicitly fail the current test.
Signature:
def fail(reason: str = "") -> NoReturn
Parameters:
  • reason (str, default ""): Explanation for why the test is failing
Returns: Never returns (raises FailTest exception).
Example:
import merit

def merit_explicit_failure(api_client):
    """Fail test when detecting invalid state."""
    response = api_client.get("/status")

    if response.status_code == 500:
        merit.fail(f"Server returned 500 error: {response.text}")

    assert response.status_code == 200

def merit_validation_failure(data_processor):
    """Fail when preconditions aren't met."""
    data = data_processor.load()

    if len(data) == 0:
        merit.fail("No data loaded - cannot run test")

    result = data_processor.process(data)
    assert result

xfail

Mark the current test as expected to fail and stop execution.
Signature:
def xfail(reason: str = "") -> NoReturn
Parameters:
  • reason (str, default ""): Explanation for why the test is expected to fail
Returns: Never returns (raises XFailTest exception).
Example:
import merit
import sys

def merit_known_bug():
    """Mark test as expected failure for a known bug."""
    merit.xfail("issue #42: division by zero not handled")

    # This code won't execute
    result = 1 / 0
    assert result == 0

def merit_conditional_xfail():
    """Conditionally mark as expected failure."""
    if sys.version_info < (3, 12):
        merit.xfail("feature requires Python 3.12+")

    # Test continues on Python 3.12+
    assert True
Note: These imperative functions are different from the decorators:
  • merit.skip() (function) vs @merit.tag.skip() (decorator)
  • merit.fail() (function) has no decorator equivalent
  • merit.xfail() (function) vs @merit.tag.xfail() (decorator)
Use decorators for unconditional outcomes known at definition time. Use functions for conditional outcomes determined at runtime.

iter_cases

Decorator to run a merit function for each case.
Signature:
def iter_cases(
    *cases: Case[RefsT],
    min_passes: int | None = None,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]
Parameters:
  • *cases (Case[RefsT]): One or more test cases to iterate over
  • min_passes (int | None): Minimum number of passing case executions required for the parent merit to pass. Defaults to all cases.
Returns: Decorator that applies parametrization using the cases.
Validation:
  • min_passes must be >= 1
  • min_passes cannot exceed the number of provided cases
Example:
from merit import Case, iter_cases
import json

# Load test cases
with open("test_cases.json") as f:
    cases = [Case(**item) for item in json.load(f)]

@iter_cases(*cases)
def merit_from_dataset(case: Case, classifier):
    result = classifier(**case.sut_input_values)

    if case.references:
        expected = case.references["expected_label"]
        assert result == expected

@iter_cases(*cases, min_passes=8)
def merit_from_dataset_pass_at_k(case: Case, classifier):
    result = classifier(**case.sut_input_values)
    assert result in {"positive", "negative", "neutral"}

iter_case_groups

Decorator to run a merit function for each case group, iterating cases within each group.
Signature:
def iter_case_groups(
    *groups: CaseGroup[RefsT, GroupRefsT],
) -> Callable[[Callable[..., Any]], Callable[..., Any]]
Parameters:
  • *groups (CaseGroup[RefsT, GroupRefsT]): One or more case groups to iterate over
Returns: Decorator that applies group-level iteration to the target function.
Injected parameters:
  • group (CaseGroup): The current group being executed
  • case (Case): The current case within the group
Execution semantics:
  • Each group produces a nested execution; within each group, cases are iterated using the group’s min_passes threshold.
  • The parent merit passes only if all groups pass (i.e. every group meets its own min_passes).
Validation:
  • At least one group is required (an empty call records a deferred definition error)
Example:
import merit
from merit import Case, CaseGroup

geography = CaseGroup(
    name="geography",
    cases=[
        Case(sut_input_values={"prompt": "Capital of France?"}, references={"expected": "Paris"}),
        Case(sut_input_values={"prompt": "Capital of Germany?"}, references={"expected": "Berlin"}),
    ],
    min_passes=2,
)

music = CaseGroup(
    name="music",
    cases=[
        Case(sut_input_values={"prompt": "Best rock band?"}, references={"expected": "Metallica"}),
    ],
    min_passes=1,
)

@merit.iter_case_groups(geography, music)
def merit_chatbot(group: CaseGroup, case: Case, chatbot):
    response = chatbot(**case.sut_input_values)
    assert case.references["expected"] in response
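The execution semantics above reduce to per-group thresholds combined with an all-groups conjunction. A plain-Python sketch of the outcome logic (illustrative, not Merit internals):

```python
def group_passes(case_results: list[bool], min_passes: int) -> bool:
    # A group passes when enough of its cases pass.
    return sum(case_results) >= min_passes

def run_passes(groups: dict[str, tuple[list[bool], int]]) -> bool:
    # The parent merit passes only if every group meets its own threshold.
    return all(group_passes(results, m) for results, m in groups.values())

outcome = run_passes({
    "geography": ([True, True], 2),  # both capitals answered correctly
    "music": ([False], 1),           # the single music case failed
})
print(outcome)  # False: the music group missed its threshold
```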

SUT case validation

Validate case inputs by passing them to @sut(validate_cases=...). Validation runs during SUT resolution and raises if any case does not match the resolved callable/method signature.
import merit

cases = [
    merit.Case(sut_input_values={"prompt": "Hello", "temperature": 0.5}),
    merit.Case(sut_input_values={"prompt": "Hi"}),
]

@merit.sut(validate_cases=cases)
def my_agent():
    def run(prompt: str, temperature: float = 0.7) -> str:
        return f"{prompt} @ {temperature}"

    return run