Decorators
@resource
Register a function as a dependency injection resource.
Signature:
```python
@resource(
    fn: Callable | None = None,
    *,
    scope: Scope | str = Scope.CASE,
    on_resolve: Callable[[Any], Any] | None = None,
    on_injection: Callable[[Any], Any] | None = None,
    on_teardown: Callable[[Any], Any] | None = None,
)
```
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| fn | Callable \| None | None | The resource factory function |
| scope | Scope \| str | Scope.CASE | Lifecycle scope: "case", "suite", or "session" |
| on_resolve | Callable \| None | None | Hook called once when the resource is first created |
| on_injection | Callable \| None | None | Hook called every time the resource is injected |
| on_teardown | Callable \| None | None | Hook called after generator teardown runs |
Returns: Decorated function registered as a resource
Example:
```python
import merit

# APIClient and connect() stand in for your own application code.

@merit.resource
def api_client():
    return APIClient()

@merit.resource(scope="session")
async def database():
    conn = await connect()
    yield conn
    await conn.close()  # code after yield is teardown

@merit.resource(
    scope="suite",
    on_injection=lambda client: client.refresh_token(),
)
def authenticated_client():
    client = APIClient()
    client.login()
    yield client
    client.logout()

def merit_test(api_client, database, authenticated_client):
    # All resources automatically injected
    pass
```
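The stdlib-only model below makes the hook ordering concrete for a generator resource. It is a sketch under stated assumptions and not Merit's implementation. `ResourceSlot` is a hypothetical name. The model also assumes that every hook receives the resource value and that hook return values are ignored, and the reference above confirms neither point.

```python
from __future__ import annotations

from collections.abc import Callable, Generator
from typing import Any


class ResourceSlot:
    """Hypothetical model of one resource's lifecycle (not Merit's code)."""

    def __init__(
        self,
        factory: Callable[[], Generator[Any, None, None]],
        on_resolve: Callable[[Any], Any] | None = None,
        on_injection: Callable[[Any], Any] | None = None,
        on_teardown: Callable[[Any], Any] | None = None,
    ) -> None:
        self.factory = factory
        self.on_resolve = on_resolve
        self.on_injection = on_injection
        self.on_teardown = on_teardown
        self._gen: Generator[Any, None, None] | None = None
        self._value: Any = None

    def inject(self) -> Any:
        if self._gen is None:
            # First request: run the factory up to its `yield`.
            self._gen = self.factory()
            self._value = next(self._gen)
            if self.on_resolve:
                self.on_resolve(self._value)  # once, on creation
        if self.on_injection:
            self.on_injection(self._value)  # on every injection
        return self._value

    def teardown(self) -> None:
        if self._gen is None:
            return
        next(self._gen, None)  # resume so the code after `yield` runs
        if self.on_teardown:
            self.on_teardown(self._value)  # after generator teardown
        self._gen = None


events: list[str] = []


def client():
    events.append("setup")
    yield "client"
    events.append("teardown")


slot = ResourceSlot(
    client,
    on_resolve=lambda v: events.append("on_resolve"),
    on_injection=lambda v: events.append("on_injection"),
    on_teardown=lambda v: events.append("on_teardown"),
)
slot.inject()
slot.inject()
slot.teardown()
print(events)
# ['setup', 'on_resolve', 'on_injection', 'on_injection', 'teardown', 'on_teardown']
```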
@sut
Register a function factory as a traced system-under-test resource.
Signature:
```python
@sut(
    fn: Callable | None = None,
    *,
    scope: Scope | str = Scope.CASE,
    method: str = "__call__",
    validate_cases: list[Case[Any]] | None = None,
)
```
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| fn | Callable \| None | None | The SUT factory function to register |
| scope | Scope \| str | Scope.CASE | Resource lifecycle scope: "case", "suite", or "session" |
| method | str | "\_\_call\_\_" | Method to trace when the factory returns a non-callable instance |
| validate_cases | list[Case[Any]] \| None | None | Cases to validate against the resolved SUT signature (raises on invalid input) |
Returns: Decorated function registered as a traced resource
Example:
```python
import merit
from rag import retrieve    # example application modules
from agents import Agent

@merit.sut
def rag():
    return retrieve

@merit.sut(method="run")
def agent():
    # Trace Agent.run() rather than __call__
    return Agent(tools="all")

cases = [
    merit.Case(sut_input_values={"query": "Privacy policy"}),
    merit.Case(sut_input_values={"query": "How do you store data?"}),
]

@merit.sut(validate_cases=cases)
def search_agent():
    return retrieve

def merit_test(agent, rag):
    # Resolved values behave like the original callable/instance APIs
    context = rag("Privacy policy")
    output = agent.run("How do you store my data?", context=context)
```
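The `method` parameter decides what gets traced. A plain callable is wrapped as-is. For an instance, only the named method is wrapped, so `agent.run(...)` keeps working. The stdlib sketch below models that behavior. It is not Merit's tracer: `trace`, `resolve_sut`, and the `calls` log are hypothetical, and the `retrieve` and `Agent` defined here are toy stand-ins.

```python
import functools
from typing import Any, Callable

calls: list[tuple[str, tuple, dict]] = []  # hypothetical trace log


def trace(fn: Callable[..., Any], name: str) -> Callable[..., Any]:
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        calls.append((name, args, kwargs))  # record, then delegate
        return fn(*args, **kwargs)
    return wrapper


def resolve_sut(obj: Any, method: str = "__call__") -> Any:
    """Hypothetical: trace a plain callable, or one method of an instance."""
    if method == "__call__" and callable(obj):
        return trace(obj, getattr(obj, "__name__", "sut"))
    # Instance case: shadow the bound method on this instance only,
    # so the original instance API still works but calls are recorded.
    setattr(obj, method, trace(getattr(obj, method), f"{type(obj).__name__}.{method}"))
    return obj


def retrieve(query: str) -> str:  # toy stand-in
    return f"docs for {query!r}"


class Agent:  # toy stand-in
    def run(self, question: str, context: str = "") -> str:
        return f"answer to {question!r}"


rag = resolve_sut(retrieve)
agent = resolve_sut(Agent(), method="run")
context = rag("Privacy policy")
agent.run("How do you store my data?", context=context)
print([name for name, _, _ in calls])  # ['retrieve', 'Agent.run']
```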
@parametrize
Run a merit function with multiple parameter combinations.
Signature:
```python
@parametrize(
    argnames: str | Sequence[str],
    argvalues: Iterable[Any],
    *,
    ids: Sequence[str] | None = None,
)
```
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| argnames | str \| Sequence[str] | - | Parameter name(s), as a comma-separated string or a sequence of strings |
| argvalues | Iterable[Any] | - | One entry per combination: a tuple of values when several names are given, or a single value for one name |
| ids | Sequence[str] \| None | None | Optional custom IDs, one per entry in argvalues |
Returns: Decorator that applies parametrization to the target function
Example:
```python
import merit

@merit.parametrize("city,state", [
    ("Boston", "Massachusetts"),
    ("Austin", "Texas"),
    ("Miami", "Florida"),
])
def merit_geography(city: str, state: str, bot):
    result = bot.ask(f"What state is {city} in?")
    assert state in result

# Stacked parametrization (creates cartesian product)
@merit.parametrize("model", ["gpt-4", "claude-3"])
@merit.parametrize("temperature", [0.0, 0.7])
def merit_combinations(model: str, temperature: float):
    # Runs 4 times: 2 models × 2 temperatures
    pass

# Custom IDs
@merit.parametrize("value", [1, 2, 3], ids=["one", "two", "three"])
def merit_custom_ids(value: int):
    assert value > 0
```
@repeat
Run a merit function multiple times to test consistency.
Signature:
```python
@repeat(
    count: int,
    *,
    min_passes: int | None = None,
)
```
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| count | int | - | Number of times to run the merit |
| min_passes | int \| None | None | Minimum number of passing runs required; None requires all count runs to pass |
Returns: Decorator that applies repeat configuration to the target
Example:
```python
import merit

@merit.repeat(10)
def merit_consistent(llm):
    # All 10 runs must pass
    response = llm.generate("Say hello")
    assert "hello" in response.lower()

@merit.repeat(10, min_passes=8)
def merit_mostly_reliable(llm):
    # At least 8 out of 10 must pass
    response = llm.generate("Translate 'hello' to Spanish")
    assert "hola" in response.lower()
```
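The pass rule for `@repeat` reduces to a count comparison. The minimal model below is hypothetical (`repeat_passes` is not a Merit function). It takes one boolean per run and applies the documented default, under which every run must pass.

```python
from __future__ import annotations


def repeat_passes(results: list[bool], min_passes: int | None = None) -> bool:
    """Hypothetical model of @repeat(count, min_passes=...) aggregation."""
    required = len(results) if min_passes is None else min_passes  # default: all runs
    return sum(results) >= required


runs = [True] * 8 + [False] * 2  # 8 of 10 runs passed
print(repeat_passes(runs))                # False: all 10 required by default
print(repeat_passes(runs, min_passes=8))  # True
```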
@run_inline
Run a synchronous merit inline on the event-loop thread.
By default, synchronous merits are offloaded to a worker thread via asyncio.to_thread(...). Use @run_inline to opt out when thread affinity matters.
Signature:
```python
@run_inline(fn: Callable[..., Any])
```
Parameters:
| Name | Type | Description |
|---|---|---|
| fn | Callable[..., Any] | Synchronous merit function to mark for inline execution |
Returns: Decorated sync function marked to run inline
Notes:
- Applies only to synchronous `def merit_*` functions.
- `async def merit_*` functions already run on the event-loop thread.
Example:
```python
import threading

import merit

def merit_default_threaded():
    # Default sync behavior: runs in a worker thread.
    assert threading.current_thread() is not threading.main_thread()

@merit.run_inline
def merit_main_thread_only():
    # Opt-out: runs inline on the main event-loop thread.
    assert threading.current_thread() is threading.main_thread()
```
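The two execution paths described above can be reproduced with plain asyncio. The snippet below is a stdlib illustration, not Merit's scheduler. It runs the same synchronous function once through `asyncio.to_thread` (the default for sync merits) and once directly on the loop's thread (what `@run_inline` opts into), then compares thread identity.

```python
import asyncio
import threading


def current() -> threading.Thread:
    return threading.current_thread()


async def main() -> tuple[bool, bool]:
    loop_thread = threading.current_thread()       # thread running the event loop
    offloaded = await asyncio.to_thread(current)   # default path: worker thread
    inline = current()                             # inline path: same thread as the loop
    return offloaded is loop_thread, inline is loop_thread


print(asyncio.run(main()))  # (False, True)
```

Inline execution also blocks the event loop while the function runs. That cost is the trade-off for keeping thread affinity.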
@tag
Add tags to merit functions or classes for filtering and organization.
Signature:
```python
@tag(*names: str)
```
Parameters:
| Name | Type | Description |
|---|---|---|
| *names | str | Tag names to apply |
Returns: Decorator that adds tags to the target
Example:
```python
import merit

@merit.tag("smoke", "fast")
def merit_health_check(api):
    assert api.health_check()

@merit.tag("integration", "slow")
def merit_end_to_end(system):
    pass

# Tag entire classes
@merit.tag("customer-support")
class MeritSupportBot:
    @merit.tag("greeting")
    def merit_hello(self, bot):
        pass

    @merit.tag("farewell")
    def merit_goodbye(self, bot):
        pass
```
CLI Usage:
```shell
merit test --tag smoke  # Only smoke tests
merit test --tag slow   # Only slow tests
```
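The sketch below models tag-based selection. It is hypothetical and not Merit's collector. It assumes that a class-level tag applies to every merit method in the class, as "Tag entire classes" suggests, and that `--tag X` selects every item carrying `X`. The `Item` dataclass and `select` function are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    tags: set[str] = field(default_factory=set)
    class_tags: set[str] = field(default_factory=set)  # assumed inherited from the class

    def effective_tags(self) -> set[str]:
        return self.tags | self.class_tags


def select(items: list[Item], tag: str) -> list[str]:
    """Hypothetical model of `merit test --tag <tag>`."""
    return [i.name for i in items if tag in i.effective_tags()]


items = [
    Item("merit_health_check", {"smoke", "fast"}),
    Item("merit_end_to_end", {"integration", "slow"}),
    Item("merit_hello", {"greeting"}, {"customer-support"}),
    Item("merit_goodbye", {"farewell"}, {"customer-support"}),
]
print(select(items, "smoke"))             # ['merit_health_check']
print(select(items, "customer-support"))  # ['merit_hello', 'merit_goodbye']
```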
@tag.skip
Skip a merit function with an optional reason.
Signature:
```python
@tag.skip(*, reason: str | None = None)
```
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| reason | str \| None | None | Optional explanation for why the merit is skipped |
Returns: Decorator that marks the target as skipped
Example:
```python
import merit

@merit.tag.skip(reason="Feature not implemented yet")
def merit_upcoming_feature():
    pass

@merit.tag.skip(reason="Requires API key")
def merit_external_api():
    pass
```
@tag.xfail
Mark a merit as expected to fail.
Signature:
```python
@tag.xfail(
    *,
    reason: str | None = None,
    strict: bool = False,
)
```
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| reason | str \| None | None | Optional explanation for the expected failure |
| strict | bool | False | If True, an unexpected pass is treated as a failure |
Returns: Decorator that marks the target as expected to fail
Example:
```python
import merit

@merit.tag.xfail(reason="Known bug #123")
def merit_known_issue():
    # This failure won't fail the merit suite
    assert False

@merit.tag.xfail(reason="Model not accurate yet", strict=True)
def merit_strict_xfail():
    # If this passes, it's an error (unexpected pass)
    pass
```
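The four combinations of "did the merit pass?" and `strict` map to outcomes as follows. This is a hypothetical model. The labels `xfailed` and `xpassed` are illustrative and not necessarily the names Merit reports. The mapping assumes that a non-strict unexpected pass does not count as a failure, which the `strict` description implies.

```python
def xfail_outcome(passed: bool, strict: bool = False) -> str:
    """Hypothetical outcome labels for a merit marked with @tag.xfail."""
    if not passed:
        return "xfailed"  # the expected failure; does not fail the suite
    if strict:
        return "failed"   # strict=True: an unexpected pass counts as a failure
    return "xpassed"      # strict=False: unexpected pass, not counted as a failure


for passed in (False, True):
    for strict in (False, True):
        print(f"passed={passed}, strict={strict} -> {xfail_outcome(passed, strict)}")
```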
Classes
Case
Container for test case inputs and reference data.
Attributes:
| Name | Type | Description |
|---|---|---|
| id | UUID | Unique identifier (auto-generated) |
| tags | set[str] | Tags for filtering or categorization |
| metadata | dict[str, str \| int \| float \| bool \| None] | Arbitrary key-value pairs |
| references | RefsT | Reference data for validation (typed or dict); defaults to {} |
| sut_input_values | dict[str, Any] | Input arguments to pass to the SUT |
Example:
```python
import json
from typing import TypedDict

from merit import Case

# Create cases programmatically
cases = [
    Case(
        tags={"smoke"},
        metadata={"category": "greeting"},
        sut_input_values={"prompt": "Hello"},
        references={"expected": "Hi there!"},
    ),
    Case(
        sut_input_values={"prompt": "Goodbye"},
        references={"expected": "See you later!"},
    ),
]

# Load from JSON
with open("cases.json") as f:
    cases = [Case(**item) for item in json.load(f)]

# Typed references
class MyRefs(TypedDict):
    expected_label: str
    confidence_threshold: float

case = Case[MyRefs](
    sut_input_values={"text": "Sample input"},
    references={"expected_label": "positive", "confidence_threshold": 0.8},
)
```
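The JSON-loading pattern `Case(**item)` requires each JSON object's keys to match Case's keyword arguments. The sketch below shows one plausible shape for `cases.json` and parses it into a stdlib stand-in, so it runs without Merit installed. `CaseStub` is hypothetical. The reference does not say whether `merit.Case` itself converts a JSON list into `set[str]` for `tags`, so the stand-in does that conversion explicitly.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CaseStub:
    """Hypothetical stand-in exposing the documented Case keyword arguments."""

    sut_input_values: dict[str, Any] = field(default_factory=dict)
    references: dict[str, Any] = field(default_factory=dict)
    tags: set[str] = field(default_factory=set)
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # JSON has no set type, so tags arrive as a list.
        self.tags = set(self.tags)


# One JSON object per case; keys mirror the keyword arguments above.
raw = """
[
  {"sut_input_values": {"prompt": "Hello"},
   "references": {"expected": "Hi there!"},
   "tags": ["smoke"],
   "metadata": {"category": "greeting"}},
  {"sut_input_values": {"prompt": "Goodbye"},
   "references": {"expected": "See you later!"}}
]
"""

cases = [CaseStub(**item) for item in json.loads(raw)]
print(cases[0].tags, cases[1].references)  # {'smoke'} {'expected': 'See you later!'}
```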
CaseGroup
Container for grouping related cases with group-level references and a pass threshold.
Type Parameters:
| Name | Default | Description |
|---|---|---|
| RefsT | dict[str, Any] | Type of each case’s references |
| GroupRefsT | dict[str, Any] | Type of the group’s references |
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Unique group identifier (used in reports and ID suffixes) |
| cases | list[Case[RefsT]] | One or more cases in this group (min 1) |
| references | GroupRefsT | Group-level reference data; defaults to {} |
| min_passes | int | Minimum case passes required for the group to pass (default 1; must be ≥ 1 and ≤ len(cases)) |
Example:
```python
from pydantic import BaseModel

from merit import Case, CaseGroup

class CaseRefs(BaseModel):
    expected: str

class GroupRefs(BaseModel):
    stop_keywords: list[str]

geography = CaseGroup[CaseRefs, GroupRefs](
    name="geography",
    references=GroupRefs(stop_keywords=["Lol"]),
    cases=[
        Case[CaseRefs](
            sut_input_values={"prompt": "Capital of France?"},
            references=CaseRefs(expected="Paris"),
        ),
        Case[CaseRefs](
            sut_input_values={"prompt": "Capital of Germany?"},
            references=CaseRefs(expected="Berlin"),
        ),
    ],
    min_passes=2,
)

# Untyped (dict references)
simple = CaseGroup(
    name="simple",
    cases=[Case(sut_input_values={"x": 1})],
)
```
Scope
Enum defining resource lifecycle scopes.
Values:
| Value | Description |
|---|---|
| Scope.CASE | Fresh instance per parametrized merit case |
| Scope.SUITE | Shared within a single merit file/module |
| Scope.SESSION | Shared across the entire merit run |
Example:
```python
import shutil
import tempfile

from merit import resource
from merit.resources import Scope

# load_model() and APIClient stand in for your own code.

@resource(scope=Scope.SESSION)
def expensive_model():
    return load_model()  # Loaded once per run

@resource(scope=Scope.SUITE)
def api_client():
    return APIClient()  # Shared within one merit file

@resource(scope=Scope.CASE)
def temp_dir():
    # Fresh directory per case, removed during teardown
    tmpdir = tempfile.mkdtemp()
    yield tmpdir
    shutil.rmtree(tmpdir)
```
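Each scope in the table can be modeled as a different cache key: session ignores module and case, suite keys on the module, and case keys on both. The stdlib sketch below uses that keying and counts how many times a factory would run across two merit files with two cases each. `ScopedCache` is a hypothetical name, the model is not Merit's resolver, and it omits teardown.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any


class ScopedCache:
    """Hypothetical model of how scope decides when a resource is rebuilt."""

    def __init__(self) -> None:
        self._cache: dict[tuple[str, ...], Any] = {}
        self.created = 0

    def get(self, name: str, scope: str, factory: Callable[[], Any],
            *, module: str, case_id: str) -> Any:
        if scope == "session":
            key = (name, scope)                   # one instance for the whole run
        elif scope == "suite":
            key = (name, scope, module)           # one per merit file/module
        elif scope == "case":
            key = (name, scope, module, case_id)  # fresh per case
        else:
            raise ValueError(f"unknown scope: {scope!r}")
        if key not in self._cache:
            self._cache[key] = factory()
            self.created += 1
        return self._cache[key]


def count_creations(scope: str) -> int:
    cache = ScopedCache()
    for module in ("merit_a.py", "merit_b.py"):
        for case_id in ("case-1", "case-2"):
            cache.get("res", scope, object, module=module, case_id=case_id)
    return cache.created


print({s: count_creations(s) for s in ("session", "suite", "case")})
# {'session': 1, 'suite': 2, 'case': 4}
```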
Runner
Execute discovered merits and return a MeritRun.
Signature:
```python
class Runner:
    def __init__(
        self,
        *,
        reporters: list[Reporter],
        maxfail: int | None = None,
        fail_fast: bool = False,
        verbosity: int = 0,
        concurrency: int = 1,
        timeout: float | None = None,
        enable_tracing: bool = False,
        trace_output: Path | str | None = None,
        capture_output: bool = True,
        save_to_db: bool = True,
        db_path: Path | str | None = None,
        run_id: UUID | str | None = None,
    ) -> None: ...

    async def run(
        self,
        items: list[MeritTestDefinition] | None = None,
        path: str | None = None,
        run_id: UUID | str | None = None,
    ) -> MeritRun: ...

    def run_id_exists(self, run_id: UUID | str) -> bool: ...
```
Timeout behavior (timeout):
- timeout is a run-level limit (not per-test).
- Cancellation is cooperative: on timeout, the run is marked stopped_early and no new tests are started.
- In-flight work may not stop immediately, especially synchronous merits already executing in worker threads.
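The cooperative timeout can be modeled as a deadline that is checked only before new work starts. The sketch below is hypothetical and deliberately sequential (the real Runner also supports `concurrency`). `run_with_deadline` is not a Merit API. The test that is already running finishes even if it overruns the deadline, and no further test starts after that.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import Awaitable, Callable


async def run_with_deadline(
    tests: list[Callable[[], Awaitable[None]]],
    timeout: float | None,
) -> tuple[int, bool]:
    """Hypothetical sequential model: returns (tests_started, stopped_early)."""
    deadline = None if timeout is None else time.monotonic() + timeout
    started = 0
    for test in tests:
        # The deadline is only consulted before starting new work.
        if deadline is not None and time.monotonic() >= deadline:
            return started, True
        started += 1
        await test()  # an in-flight test runs to completion, even past the deadline
    return started, False


async def slow_test() -> None:
    await asyncio.sleep(0.05)


started, stopped_early = asyncio.run(run_with_deadline([slow_test] * 5, timeout=0.075))
print(started, stopped_early)
```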
Run UUID behavior:
- run_id accepts either a UUID object or a UUID string.
- A run_id passed to run() overrides the constructor run_id.
- If neither is provided, Merit auto-generates a new UUID.
- If save_to_db=True and the selected run UUID already exists, run() raises ValueError.
- run_id_exists() can be used as a preflight check before execution.
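The precedence rules above fit in one small function. The stdlib sketch below is hypothetical: `select_run_id` is not part of Merit, and the `existing` set stands in for the run database. It normalizes strings to `uuid.UUID`, prefers the `run()` argument over the constructor value, falls back to `uuid4()`, and rejects a duplicate when saving is enabled.

```python
from __future__ import annotations

import uuid
from uuid import UUID


def select_run_id(
    call_run_id: UUID | str | None,
    ctor_run_id: UUID | str | None,
    existing: set[UUID],
    save_to_db: bool = True,
) -> UUID:
    """Hypothetical model of the documented run_id precedence rules."""
    chosen = call_run_id if call_run_id is not None else ctor_run_id
    run_id = uuid.uuid4() if chosen is None else UUID(str(chosen))  # str or UUID accepted
    if save_to_db and run_id in existing:
        raise ValueError(f"run_id already exists: {run_id}")
    return run_id


ctor = UUID("00000000-0000-0000-0000-000000000001")
existing = {ctor}
print(select_run_id("00000000-0000-0000-0000-000000000002", ctor, existing))
# 00000000-0000-0000-0000-000000000002
try:
    select_run_id(None, ctor, existing)  # falls back to the constructor value
except ValueError as exc:
    print(exc)  # run_id already exists: 00000000-0000-0000-0000-000000000001
```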
Example:
```python
import asyncio
from uuid import UUID

from merit.reports import ConsoleReporter
from merit.testing import Runner

async def main() -> None:
    runner = Runner(
        reporters=[ConsoleReporter()],
        db_path=".merit/merit.db",
        run_id=UUID("00000000-0000-0000-0000-000000000001"),
    )

    # Preflight check: run() would raise ValueError for an existing UUID
    if runner.run_id_exists("00000000-0000-0000-0000-000000000001"):
        raise ValueError("run_id already exists")

    # Uses constructor run_id
    result = await runner.run(path="tests/")

    # Overrides constructor run_id for this run only
    result = await runner.run(
        path="tests/",
        run_id="00000000-0000-0000-0000-000000000002",
    )

asyncio.run(main())
```
Functions
Imperative Outcome Control
Merit provides imperative functions to control test outcomes at runtime. These are different from decorators and are used for conditional control flow within tests.
skip
Skip the current test imperatively.
Signature:
```python
def skip(reason: str = "") -> NoReturn
```
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| reason | str | "" | Explanation for why the test is being skipped |
Returns: Never returns (raises SkipTest exception)
Example:
```python
import os

import merit

def merit_requires_api_key():
    """Skip test if API key is not configured."""
    if "API_KEY" not in os.environ:
        merit.skip("API_KEY not configured")
    # Test continues only if API_KEY exists
    api_client = APIClient(api_key=os.environ["API_KEY"])
    assert api_client.health_check()

def merit_conditional_skip(database):
    """Skip based on runtime conditions."""
    if not database.is_available():
        merit.skip("Database not available")
    result = database.query("SELECT 1")
    assert result
```
fail
Explicitly fail the current test.
Signature:
```python
def fail(reason: str = "") -> NoReturn
```
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| reason | str | "" | Explanation for why the test is failing |
Returns: Never returns (raises FailTest exception)
Example:
```python
import merit

def merit_explicit_failure(api_client):
    """Fail test when detecting invalid state."""
    response = api_client.get("/status")
    if response.status_code == 500:
        merit.fail(f"Server returned 500 error: {response.text}")
    assert response.status_code == 200

def merit_validation_failure(data_processor):
    """Fail when preconditions aren't met."""
    data = data_processor.load()
    if len(data) == 0:
        merit.fail("No data loaded - cannot run test")
    result = data_processor.process(data)
    assert result
```
xfail
Mark the current test as expected to fail and stop execution.
Signature:
```python
def xfail(reason: str = "") -> NoReturn
```
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| reason | str | "" | Explanation for why the test is expected to fail |
Returns: Never returns (raises XFailTest exception)
Example:
```python
import sys

import merit

def merit_known_bug():
    """Mark test as expected failure for a known bug."""
    merit.xfail("issue #42: division by zero not handled")
    # This code won't execute
    result = 1 / 0
    assert result == 0

def merit_conditional_xfail():
    """Conditionally mark as expected failure."""
    if sys.version_info < (3, 12):
        merit.xfail("feature requires Python 3.12+")
    # Test continues on Python 3.12+
    assert True
```
Note: These imperative functions differ from the decorators:
- merit.skip() (function) vs @merit.tag.skip() (decorator)
- merit.fail() (function), which has no decorator equivalent
- merit.xfail() (function) vs @merit.tag.xfail() (decorator)

Use decorators for unconditional outcomes known at definition time. Use functions for conditional outcomes determined at runtime.
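Each imperative function works by raising an exception, which is why it is typed `NoReturn` and why no code after the call runs. The runner then translates that exception into an outcome. The stdlib sketch below models the mechanism. The exception classes are local stand-ins named after the documented `SkipTest`, `FailTest`, and `XFailTest`, not imports from Merit. `run_one` and its outcome labels are hypothetical.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import NoReturn


# Local stand-ins named after the exceptions documented above.
class SkipTest(Exception):
    pass


class FailTest(Exception):
    pass


class XFailTest(Exception):
    pass


def skip(reason: str = "") -> NoReturn:
    raise SkipTest(reason)


def fail(reason: str = "") -> NoReturn:
    raise FailTest(reason)


def xfail(reason: str = "") -> NoReturn:
    raise XFailTest(reason)


def run_one(test: Callable[[], None]) -> tuple[str, str]:
    """Hypothetical runner step: translate the raised exception into an outcome."""
    try:
        test()
    except SkipTest as exc:
        return "skipped", str(exc)
    except XFailTest as exc:
        return "xfailed", str(exc)
    except (FailTest, AssertionError) as exc:
        return "failed", str(exc)
    return "passed", ""


def merit_requires_api_key(env: dict[str, str]) -> None:
    if "API_KEY" not in env:
        skip("API_KEY not configured")
    assert env["API_KEY"]  # only reached when the key exists


print(run_one(lambda: merit_requires_api_key({})))                # ('skipped', 'API_KEY not configured')
print(run_one(lambda: merit_requires_api_key({"API_KEY": "k"})))  # ('passed', '')
print(run_one(lambda: fail("Server returned 500")))               # ('failed', 'Server returned 500')
```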
iter_cases
Decorator to run a merit function for each case.
Signature:
```python
def iter_cases(
    *cases: Case[RefsT],
    min_passes: int | None = None,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]
```
Parameters:
| Name | Type | Description |
|---|---|---|
| *cases | Case[RefsT] | One or more test cases to iterate over |
| min_passes | int \| None | Minimum number of passing case executions required for the parent merit to pass. Defaults to all cases. |
Returns: Decorator that applies parametrization using the cases
Validation:
- min_passes must be >= 1
- min_passes cannot exceed the number of provided cases
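The validation rules and the default threshold above can be written as a small pure function. The sketch below is hypothetical. The names are illustrative, and raising `ValueError` is an assumption, because the reference does not state which error Merit reports or when.

```python
from __future__ import annotations


def resolve_min_passes(num_cases: int, min_passes: int | None = None) -> int:
    """Hypothetical check mirroring the documented iter_cases rules."""
    if min_passes is None:
        return num_cases  # default: every case must pass
    if min_passes < 1:
        raise ValueError("min_passes must be >= 1")
    if min_passes > num_cases:
        raise ValueError(f"min_passes={min_passes} exceeds the {num_cases} provided cases")
    return min_passes


def parent_passes(case_results: list[bool], min_passes: int | None = None) -> bool:
    """The parent merit passes when enough case executions pass."""
    return sum(case_results) >= resolve_min_passes(len(case_results), min_passes)


results = [True] * 8 + [False] * 2            # 8 of 10 cases passed
print(parent_passes(results))                 # False (all 10 required by default)
print(parent_passes(results, min_passes=8))   # True
```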
Example:
```python
import json

from merit import Case, iter_cases

# Load test cases
with open("test_cases.json") as f:
    cases = [Case(**item) for item in json.load(f)]

@iter_cases(*cases)
def merit_from_dataset(case: Case, classifier):
    result = classifier(**case.sut_input_values)
    if case.references:
        expected = case.references["expected_label"]
        assert result == expected

# min_passes=8 requires at least 8 cases in test_cases.json
@iter_cases(*cases, min_passes=8)
def merit_from_dataset_pass_at_k(case: Case, classifier):
    result = classifier(**case.sut_input_values)
    assert result in {"positive", "negative", "neutral"}
```
iter_case_groups
Decorator to run a merit function for each case group, iterating cases within each group.
Signature:
```python
def iter_case_groups(
    *groups: CaseGroup[RefsT, GroupRefsT],
) -> Callable[[Callable[..., Any]], Callable[..., Any]]
```
Parameters:
| Name | Type | Description |
|---|---|---|
| *groups | CaseGroup[RefsT, GroupRefsT] | One or more case groups to iterate over |
Returns: Decorator that applies group-level iteration to the target function
Injected parameters:
| Name | Type | Description |
|---|---|---|
| group | CaseGroup | The current group being executed |
| case | Case | The current case within the group |
Execution semantics:
- Each group produces a nested execution; within each group, cases are iterated using the group’s min_passes threshold.
- The parent merit passes only if all groups pass (i.e. every group meets its own min_passes).

Validation:
- At least one group is required (an empty call sets a deferred definition error).
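The two-level pass rule above can be modeled directly: each group applies its own `min_passes`, and the parent requires every group to pass. The sketch below is hypothetical, with illustrative helper names that are not Merit APIs. It uses booleans in place of real case executions and does not model the empty-groups definition error.

```python
from __future__ import annotations


def group_passes(case_results: list[bool], min_passes: int = 1) -> bool:
    """A group passes when at least min_passes of its cases pass (default 1)."""
    return sum(case_results) >= min_passes


def parent_passes(groups: dict[str, tuple[list[bool], int]]) -> bool:
    """The parent merit passes only if every group meets its own threshold."""
    return all(group_passes(results, k) for results, k in groups.values())


outcomes = {
    "geography": ([True, False], 2),  # one of two answers correct; needs 2
    "music": ([True], 1),
}
print(parent_passes(outcomes))  # False: geography misses its min_passes
outcomes["geography"] = ([True, True], 2)
print(parent_passes(outcomes))  # True
```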
Example:
```python
import merit
from merit import Case, CaseGroup

geography = CaseGroup(
    name="geography",
    cases=[
        Case(sut_input_values={"prompt": "Capital of France?"}, references={"expected": "Paris"}),
        Case(sut_input_values={"prompt": "Capital of Germany?"}, references={"expected": "Berlin"}),
    ],
    min_passes=2,
)

music = CaseGroup(
    name="music",
    cases=[
        Case(sut_input_values={"prompt": "Best rock band?"}, references={"expected": "Metallica"}),
    ],
    min_passes=1,
)

@merit.iter_case_groups(geography, music)
def merit_chatbot(group: CaseGroup, case: Case, chatbot):
    response = chatbot(**case.sut_input_values)
    assert case.references["expected"] in response
```
SUT case validation
Validate case inputs by passing them to @sut(validate_cases=...). Validation runs during SUT resolution and raises if any case does not match the resolved callable/method signature.
```python
import merit

cases = [
    merit.Case(sut_input_values={"prompt": "Hello", "temperature": 0.5}),
    merit.Case(sut_input_values={"prompt": "Hi"}),  # temperature uses its default
]

@merit.sut(validate_cases=cases)
def my_agent():
    def run(prompt: str, temperature: float = 0.7) -> str:
        return f"{prompt} @ {temperature}"
    return run
```
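Python's standard library already provides a check that fits "does this case match the signature": `inspect.signature(fn).bind(**values)` raises `TypeError` for missing required or unexpected keyword arguments. The sketch below uses it to validate `sut_input_values`. It is one way such validation could work, not Merit's implementation. `validate_inputs` is hypothetical, raising `ValueError` is an arbitrary choice, and the check covers argument names only, not types.

```python
import inspect
from typing import Any, Callable


def validate_inputs(fn: Callable[..., Any], cases: list[dict[str, Any]]) -> None:
    """Hypothetical: check each case's sut_input_values against fn's signature."""
    sig = inspect.signature(fn)
    for i, values in enumerate(cases):
        try:
            sig.bind(**values)  # raises TypeError on missing/unknown arguments
        except TypeError as exc:
            raise ValueError(f"case {i} does not match {fn.__name__}{sig}: {exc}") from exc


def run(prompt: str, temperature: float = 0.7) -> str:
    return f"{prompt} @ {temperature}"


validate_inputs(run, [{"prompt": "Hello", "temperature": 0.5}, {"prompt": "Hi"}])  # passes
try:
    validate_inputs(run, [{"temperature": 0.5}])  # missing required "prompt"
except ValueError as exc:
    print(exc)
```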