Merit provides a pytest-inspired CLI for running tests, filtering by tags or keywords, controlling concurrency, and reporting results. This page covers how to run merits and how the reporting system works, referencing where the behavior lives in the codebase.

Basic Usage

Run all discovered merits in the current directory:
merit test
Run merits from specific paths:
merit test tests/
merit test merit_chatbot.py merit_agent.py

Filtering Tests

By Keyword Expression

Use -k to filter tests by name with boolean expressions:
# Run tests with "chatbot" in the name
merit test -k chatbot

# Boolean operators: and, or, not
merit test -k "chatbot and not slow"
merit test -k "gpt4 or claude"

# Grouping with parentheses
merit test -k "(fast or smoke) and not flaky"
Keyword matching is substring-based: -k agent matches merit_agent_response, merit_weather_agent, etc.
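The substring-plus-boolean semantics can be sketched in a few lines of plain Python. This is an illustration of the matching rules only, not Merit's actual implementation; `matches_keyword` is a hypothetical helper:

```python
import re

def matches_keyword(name: str, expr: str) -> bool:
    """Sketch of -k evaluation: each identifier becomes a substring
    check against the test name, while and/or/not and parentheses
    keep Python's own boolean semantics."""
    def to_check(match: re.Match) -> str:
        word = match.group(0)
        if word in ("and", "or", "not"):
            return word
        return repr(word in name)  # substring match, like -k
    return eval(re.sub(r"[\w.]+", to_check, expr))

print(matches_keyword("merit_agent_response", "agent"))               # True
print(matches_keyword("merit_chatbot_slow", "chatbot and not slow"))  # False
```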

By Tags

Use -t/--tag to include tests with specific tags:
# Run tests tagged "smoke"
merit test --tag smoke

# Multiple tags (OR logic)
merit test --tag smoke --tag fast
Use --skip-tag to exclude tests:
# Skip slow tests
merit test --skip-tag slow

# Skip multiple tags
merit test --skip-tag slow --skip-tag flaky
Combine filters:
# Run fast smoke tests about chatbots
merit test --tag smoke --tag fast -k chatbot
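The tag semantics above (OR logic within --tag, exclusion via --skip-tag) can be sketched as a selection predicate. `is_selected` is a hypothetical helper, and the assumption that exclusion takes precedence over inclusion is ours:

```python
def is_selected(test_tags: set[str], include: set[str], skip: set[str]) -> bool:
    # Assumed precedence: any matching --skip-tag excludes the test.
    if test_tags & skip:
        return False
    # --tag is OR logic: when include tags are given, at least one must match.
    if include and not (test_tags & include):
        return False
    return True

print(is_selected({"smoke", "fast"}, include={"smoke"}, skip={"slow"}))  # True
```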

Controlling Execution

Stop on Failure

--maxfail N - Stop after N failures:
merit test --maxfail 3  # Stop after 3 failures
--fail-fast - Stop at the first failed assertion within a test:
merit test --fail-fast
Without --fail-fast, Merit collects all assertion failures in a test. With it, the test stops at the first failure.
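The run-level --maxfail behavior can be sketched as a loop that stops scheduling after N failures; `run_until_maxfail` and the zero-argument callables standing in for merits are illustrative:

```python
def run_until_maxfail(tests, maxfail: int) -> list[bool]:
    """Sketch of --maxfail: stop starting new tests after N failures.
    `tests` is a list of callables returning True on pass."""
    results, failures = [], 0
    for test in tests:
        passed = test()
        results.append(passed)
        if not passed:
            failures += 1
            if failures >= maxfail:
                break  # remaining tests never start
    return results

outcomes = run_until_maxfail(
    [lambda: True, lambda: False, lambda: False, lambda: True], maxfail=2
)
print(outcomes)  # [True, False, False] - the fourth test never ran
```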

Concurrency

Control parallel test execution with --concurrency:
# Sequential (default)
merit test --concurrency 1

# 5 concurrent tests
merit test --concurrency 5

# Unlimited (capped at 10)
merit test --concurrency 0
When to use concurrency:
  • Sequential (1): Default. Predictable output, easier debugging.
  • Concurrent (>1): Faster runs for independent tests. Use with stateless SUTs.
  • Unlimited (0): Maximum parallelism for large test suites (the 0 setting is capped at 10 workers).
For synchronous merits (def merit_*), Merit runs test bodies in worker threads by default to keep the event loop responsive. Use @merit.run_inline on a sync merit when it must run on the main event-loop thread.
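The bounded-parallelism behavior can be sketched with an asyncio semaphore; `run_concurrently` and `fake_merit` are illustrative stand-ins, not Merit's scheduler:

```python
import asyncio

async def run_concurrently(tests, concurrency: int):
    """Sketch of --concurrency semantics: 1 is sequential, N>1 bounds
    the number of in-flight tests, and 0 means unlimited capped at 10."""
    limit = concurrency if concurrency > 0 else 10
    sem = asyncio.Semaphore(limit)

    async def run_one(test):
        async with sem:  # at most `limit` tests run at once
            return await test()

    return await asyncio.gather(*(run_one(t) for t in tests))

async def demo():
    async def fake_merit():
        await asyncio.sleep(0.01)
        return "passed"
    return await run_concurrently([fake_merit] * 5, concurrency=5)

print(asyncio.run(demo()))  # ['passed', 'passed', 'passed', 'passed', 'passed']
```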

Timeout

Set a global timeout for the entire test run:
# Stop after 300 seconds (5 minutes)
merit test --timeout 300
The timeout applies to the entire test session, not individual tests. Timeout is cooperative: when the timeout is reached, Merit marks the run as stopped early and stops starting new tests, but in-flight work may not stop immediately (especially synchronous merits already running in worker threads).
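The cooperative semantics can be sketched as a deadline check before each new test starts; `run_session` is a hypothetical stand-in for Merit's session loop (in-flight work is not modeled here):

```python
import time

def run_session(tests, timeout_s: float):
    """Cooperative session-timeout sketch: once the deadline passes,
    no new tests are started and the run is flagged as stopped early."""
    deadline = time.monotonic() + timeout_s
    results, stopped_early = [], False
    for test in tests:
        if time.monotonic() >= deadline:
            stopped_early = True
            break
        results.append(test())
    return results, stopped_early

results, stopped = run_session([lambda: "passed"] * 3, timeout_s=0.0)
print(results, stopped)  # [] True - deadline already passed, nothing started
```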

Verbosity

Control output detail with -v (verbose) or -q (quiet):
# Minimal output (only failures)
merit test -q

# Very minimal
merit test -qq

# Verbose output
merit test -v

# Very verbose
merit test -vv
Verbosity levels:
  • -qq or lower: Only failed/errored tests shown
  • -q: Less output
  • Default (0): Standard output
  • -v, -vv: More detail

Output Capture

By default, Merit captures stdout and stderr during test execution. Use -s to show output live:
# Show output in real-time (still captured for reports)
merit test -s

# Or the long form
merit test --show-output
This is useful for debugging tests with print statements or logging output.
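The -s behavior (show output live while still capturing it for reports) can be sketched with a tee-style stream; the `Tee` class here is illustrative, not Merit's capture machinery:

```python
import contextlib
import io
import sys

class Tee(io.TextIOBase):
    """Sketch of -s/--show-output: writes pass through to the real
    stream AND are captured for later reporting."""
    def __init__(self, original):
        self.original = original
        self.captured = io.StringIO()

    def write(self, s: str) -> int:
        self.original.write(s)       # live output
        return self.captured.write(s)  # kept for the report

tee = Tee(sys.stdout)
with contextlib.redirect_stdout(tee):
    print("hello from a test")
# The line appeared live above, and is also available afterwards:
assert tee.captured.getvalue() == "hello from a test\n"
```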

Tracing

Enable OpenTelemetry tracing to capture spans from your SUT and tests:
# Enable tracing (writes to .merit/traces.jsonl)
merit test --trace

# Custom output path
merit test --trace --trace-output my-traces.jsonl
Use traces for:
  • Asserting tool calls in agent tests
  • Debugging LLM request/response flows
  • Performance analysis
See SUT for trace assertions.

Custom Run UUID

By default, Merit generates a run UUID automatically. You can provide one explicitly when you need a stable external correlation ID (for example, linking CI jobs to Merit runs).

CLI

Provide a UUID with --run-id:
merit test --run-id 3f5f5e9a-1c2d-4b5f-9c2b-7f6d8a9b0c1d
If that UUID already exists in the configured SQLite database, the command exits with code 2 and no tests are executed.

Python API

You can set a default run UUID on the runner, and override it per run() call:
from uuid import UUID
from merit.reports import ConsoleReporter
from merit.testing import Runner

runner = Runner(
    reporters=[ConsoleReporter()],
    run_id=UUID("00000000-0000-0000-0000-000000000001"),
)

# Uses constructor run_id
await runner.run(path="tests/")

# Overrides constructor run_id for this run only
await runner.run(
    path="tests/",
    run_id="00000000-0000-0000-0000-000000000002",
)
Run IDs are currently configured only via CLI --run-id or Python API parameters. They are not read from pyproject.toml, merit.toml, or environment variables. If save_to_db=True and the selected run UUID already exists, Runner.run() raises ValueError.
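The duplicate-UUID guard can be sketched against SQLite; the `runs` table and `run_id` column below are an assumed schema for illustration, not necessarily Merit's real one:

```python
import sqlite3
import uuid

def ensure_unique_run_id(conn: sqlite3.Connection, run_id: uuid.UUID) -> None:
    """Sketch of the duplicate-run-id check: raise if the UUID is
    already recorded (illustrative schema)."""
    row = conn.execute(
        "SELECT 1 FROM runs WHERE run_id = ?", (str(run_id),)
    ).fetchone()
    if row is not None:
        raise ValueError(f"run UUID {run_id} already exists")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY)")
rid = uuid.UUID("00000000-0000-0000-0000-000000000001")
ensure_unique_run_id(conn, rid)  # fresh UUID: no error
conn.execute("INSERT INTO runs VALUES (?)", (str(rid),))
# A second run reusing the same UUID would now raise ValueError.
```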

Configuration Files

Define default options in pyproject.toml or merit.toml.
pyproject.toml:
[tool.merit]
test-paths = ["tests", "integrations"]
include-tags = ["smoke"]
exclude-tags = ["slow", "flaky"]
maxfail = 5
verbosity = 1
concurrency = 4
addopts = ["--fail-fast"]
merit.toml:
test_paths = ["tests"]
verbosity = 1
concurrency = 4
Precedence: CLI args override config files. Config files are discovered by walking up the directory tree from the current working directory.

Understanding Test Output

Merit reports test status as tests complete with a compact line per file by default:
===== MERIT RUN STARTS =====
platform macOS-14.6.1-arm64-arm-64bit -- python 3.12.2 -- merit 0.9.1
rootdir: /Users/you/workspace/project
run_id: 3f5f5e9a-1c2d-4b5f-9c2b-7f6d8a9b0c1d
git: main (abc12345) dirty

Collected 6 tests

tests/unit/test_chatbot.py ✓✓✗
tests/unit/test_db.py !-
tests/unit/test_known_issues.py x

===== SUMMARY =====
run_id: 3f5f5e9a-1c2d-4b5f-9c2b-7f6d8a9b0c1d | 2 passed, 1 failed, 1 error, 1 skipped, 1 xfailed in 892ms
==================
Use -v to show per-test lines with durations and detailed sub-results.
When using ConsoleReporter live mode (internally implemented with Rich Live), very long verbose output can exceed the terminal’s vertical render limit. In that case, live in-flight updates may appear to stop until the run finishes, which is more likely with many iterated/case-grouped subtests at -v/-vv. No result data is lost: Merit still records everything and prints the full output at run completion.
Status symbols:
  • ✓ (green): PASSED - Test succeeded
  • ✗ (red): FAILED - Assertion failed
  • ! (yellow): ERROR - Unexpected exception
  • - (yellow): SKIPPED - Test was skipped
  • x (blue): XFAILED - Expected failure occurred
  • ! (magenta): XPASSED - Expected failure passed (usually bad)
Exit codes:
  • 0: All tests passed (or only skipped/xfailed)
  • 1: At least one test failed or errored
  • 2: Invalid CLI usage or configuration error (including duplicate --run-id)

Repeated Tests

For tests with @merit.repeat(), verbose output shows aggregated results:
  ✓ merit_llm_consistency (1234.5ms) PASSED
  ✗ merit_flaky_api (567.8ms) FAILED
The individual run results are attached to execution.sub_executions.

Reporter System

Merit uses an async reporter architecture for flexible output handling.

The Reporter Interface

All reporters implement the Reporter ABC from src/merit/reports/base.py:
class Reporter(ABC):
    @abstractmethod
    async def on_no_tests_found(self) -> None:
        """Called when test collection finds no tests."""

    @abstractmethod
    async def on_collection_complete(self, items: list[MeritTestDefinition]) -> None:
        """Called after test collection completes."""

    async def on_test_start(self, item: MeritTestDefinition) -> None:
        """Called before a test starts executing (optional override)."""

    async def on_subtest_complete(
        self,
        parent: MeritTestDefinition,
        sub_execution: TestExecution,
    ) -> None:
        """Called when a subtest completes (optional override)."""

    @abstractmethod
    async def on_test_complete(self, execution: TestExecution) -> None:
        """Called after each test completes."""

    @abstractmethod
    async def on_run_complete(self, merit_run: MeritRun) -> None:
        """Called after all tests complete."""

    @abstractmethod
    async def on_run_stopped_early(self, failure_count: int) -> None:
        """Called when run stops early due to maxfail limit."""

    @abstractmethod
    async def on_tracing_enabled(self, output_path: Path) -> None:
        """Called when tracing is enabled to report output location."""

Built-in Reporters

ConsoleReporter (default): Outputs to terminal with Rich formatting.
from merit.reports import ConsoleReporter

reporter = ConsoleReporter(verbosity=1)

Creating Custom Reporters

Create custom reporters by subclassing Reporter and implementing all abstract methods. Override on_test_start if you need live in-flight state:
from pathlib import Path
from merit.reports.base import Reporter
from merit.testing import MeritTestDefinition, MeritRun, TestExecution
import json

class JsonFileReporter(Reporter):
    """Custom reporter that writes results to JSON file."""

    def __init__(self, output_path: str):
        self.output_path = Path(output_path)
        self.results = []

    async def on_no_tests_found(self) -> None:
        print("No tests found")

    async def on_collection_complete(self, items: list[MeritTestDefinition]) -> None:
        print(f"Collected {len(items)} tests")

    async def on_test_start(self, item: MeritTestDefinition) -> None:
        # Optional lifecycle hook for live/in-flight reporting
        pass

    async def on_test_complete(self, execution: TestExecution) -> None:
        # Record each test result
        self.results.append({
            "name": execution.item.full_name,
            "status": execution.status.value,
            "duration_ms": execution.result.duration_ms,
            "passed": execution.status.value == "passed"
        })

    async def on_run_complete(self, merit_run: MeritRun) -> None:
        # Write all results to JSON file
        output = {
            "run_id": str(merit_run.run_id),
            "total": merit_run.result.total,
            "passed": merit_run.result.passed,
            "failed": merit_run.result.failed,
            "duration_ms": merit_run.result.total_duration_ms,
            "tests": self.results
        }

        self.output_path.write_text(json.dumps(output, indent=2))
        print(f"Results written to {self.output_path}")

    async def on_run_stopped_early(self, failure_count: int) -> None:
        print(f"Run stopped after {failure_count} failures")

    async def on_tracing_enabled(self, output_path: Path) -> None:
        print(f"Tracing enabled: {output_path}")

Using Multiple Reporters

Use multiple reporters simultaneously with the programmatic API:
import asyncio
from merit.testing import Runner
from merit.reports import ConsoleReporter
from my_reporters import JsonFileReporter, SlackReporter

async def main():
    # Create multiple reporters
    console = ConsoleReporter(verbosity=1)
    json_reporter = JsonFileReporter("results.json")
    slack = SlackReporter(webhook_url="https://...")

    # Pass all reporters to runner
    runner = Runner(reporters=[console, json_reporter, slack])

    # Run tests - all reporters receive events
    result = await runner.run(path="tests/")

    print(f"Results: {result.result.passed}/{result.result.total} passed")

asyncio.run(main())
Note: Currently, only ConsoleReporter is built-in. Create custom reporters for JSON, HTML, database, or external service integration.

Examples

Run smoke tests concurrently:
merit test --tag smoke --concurrency 5
Debug a specific test with tracing:
merit test -k "agent_tool_call" --trace -vv
Run fast tests, stop on first failure:
merit test --tag fast --fail-fast
CI configuration (pyproject.toml):
[tool.merit]
test-paths = ["tests"]
exclude-tags = ["manual", "slow"]
maxfail = 10
concurrency = 4
addopts = ["--fail-fast"]

Database Persistence

Merit automatically persists test run data to a SQLite database for historical tracking and analysis.

Database Location

By default, Merit stores the database at the project root:
<project-root>/.merit/merit.db
The database location is determined by walking up from the current directory to find the first directory containing pyproject.toml.

Disabling Database Persistence

To disable database writes (e.g., for CI environments or quick local runs):
merit test --no-db

Custom Database Path

Specify a custom database location:
merit test --db-path /path/to/custom/merit.db
Or configure in pyproject.toml:
[tool.merit]
db-path = "/path/to/custom/merit.db"

Database Management Commands

Merit provides CLI commands to manage the database:

Check Database Status

View current schema version and pending migrations:
merit db status
Output:
Database: /Users/you/project/.merit/merit.db
Current version: 1
Target version: 1
Status: Up to date

Run Migrations

Apply pending schema migrations:
merit db migrate
Dry run (preview without applying):
merit db migrate --dry-run

Backup Database

Create a timestamped backup:
merit db backup
Creates: .merit/merit.db.backup.YYYYMMDD_HHMMSS
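The backup behavior can be sketched with stdlib copying; `backup_db` is a hypothetical equivalent of the CLI command, reusing the filename pattern shown above:

```python
import datetime
import shutil
import tempfile
from pathlib import Path

def backup_db(db_path: Path) -> Path:
    """Sketch of `merit db backup`: copy the database next to itself
    with a timestamped name like merit.db.backup.YYYYMMDD_HHMMSS."""
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = db_path.with_name(f"{db_path.name}.backup.{stamp}")
    shutil.copy2(db_path, dest)
    return dest

db = Path(tempfile.mkdtemp()) / "merit.db"
db.write_bytes(b"fake database contents")
backup = backup_db(db)
print(backup.name.startswith("merit.db.backup."))  # True
```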

Reset Database

Delete and recreate the database (destructive):
merit db reset --yes
Without --yes, Merit prints a warning and exits without resetting the database.

What’s Stored

The database stores:
  • Test run metadata (timestamp, environment, git info)
  • Individual test executions and results
  • Assertion results and predicate outcomes
  • Metric results and statistical data
  • Trace references (if tracing enabled)
  • Error tracebacks for failed tests

Database Schema

The schema is versioned and managed through migrations. Current tables include:
  • runs - Test run sessions
  • test_executions - Individual test results
  • metrics - Aggregated metrics
  • assertions - Assertion outcomes
  • predicates - Predicate results linked to assertions
  • trace_spans - Trace spans linked to executions
Schema version is stored in SQLite PRAGMA user_version (shown by merit db status as “Current version”).
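Reading and writing that PRAGMA from Python is straightforward; the helper names below are illustrative:

```python
import sqlite3

def get_schema_version(conn: sqlite3.Connection) -> int:
    # PRAGMA user_version is where the schema version lives (per the docs).
    return conn.execute("PRAGMA user_version").fetchone()[0]

def set_schema_version(conn: sqlite3.Connection, version: int) -> None:
    # PRAGMA statements don't accept bound parameters, hence the int() guard.
    conn.execute(f"PRAGMA user_version = {int(version)}")

conn = sqlite3.connect(":memory:")
print(get_schema_version(conn))  # 0 (fresh database)
set_schema_version(conn, 1)
print(get_schema_version(conn))  # 1
```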