A trace is the structured execution record of a single merit case. In Merit, each test run gets its own OpenTelemetry trace: a tree of spans showing what happened (tool calls, retrieval steps, LLM calls), when it happened, and how long it took.
Using traces enables:
- Debugging and explaining why a merit failed (what steps ran, in what order, and how long they took)
- Asserting on execution behavior, not just outputs (for example, that a specific tool was called or that retrieval ran)
- Correlating LLM spans with your SUT spans and custom pipeline steps
How Merit traces are structured
When tracing is enabled, Merit wraps each merit case in a root span named test.<full_name>. Spans recorded during the case sit beneath it:
- SUT spans: created by @merit.sut, named sut.<sut_name>
- Custom step spans: created by merit.trace_step("...")
- LLM spans: auto-instrumented spans whose names usually start with openai., anthropic., or gen_ai.
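To make that hierarchy concrete, the following sketch models one merit case's trace as a tree and labels each span using the naming conventions above. Span, span_kind, and render_tree are hypothetical helpers written for illustration, not Merit's API, and the span names and timings are made up. Because custom step names are not tied to a prefix, the sketch assumes that any span matching none of the known prefixes is a step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical stand-in for an exported span; Merit's real span objects may differ.
@dataclass
class Span:
    name: str
    start_ms: float
    end_ms: float
    children: list[Span] = field(default_factory=list)

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


# Prefixes that auto-instrumented LLM spans usually use (per the list above).
LLM_PREFIXES = ("openai.", "anthropic.", "gen_ai.")


def span_kind(name: str) -> str:
    """Label a span by the naming conventions described above."""
    if name.startswith("test."):
        return "root"
    if name.startswith("sut."):
        return "sut"
    if name.startswith(LLM_PREFIXES):
        return "llm"
    # Assumption: anything else is treated as a custom trace_step span.
    return "step"


def render_tree(span: Span, depth: int = 0) -> list[str]:
    """Return one indented line per span with its kind and duration."""
    lines = [f"{'  ' * depth}{span.name} [{span_kind(span.name)}] {span.duration_ms:.0f} ms"]
    for child in span.children:
        lines.extend(render_tree(child, depth + 1))
    return lines


# Made-up trace for a single merit case.
trace = Span("test.test_answers_question", 0, 1200, [
    Span("sut.agent", 10, 1190, [
        Span("retrieve", 20, 220),
        Span("openai.chat", 230, 1100),
    ]),
])
print("\n".join(render_tree(trace)))
# test.test_answers_question [root] 1200 ms
#   sut.agent [sut] 1180 ms
#     retrieve [step] 200 ms
#     openai.chat [llm] 870 ms
```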
Enable tracing
Tracing is disabled by default. Enable it from the CLI with the --trace flag. Traces are written to .merit/traces.jsonl unless you override the output path.
The injected trace_context parameter is only available when tracing is enabled. Without --trace, resolving trace_context raises an error at runtime.
Basic usage
Use trace_context to query spans created during the current merit case execution.
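For the actual trace_context methods, see the tracing API docs (docs/apis/tracing.mdx); they are not reproduced here. As a hedged illustration of the kinds of queries a merit case typically makes, the sketch below filters a plain list of span records. SpanRecord, spans_named, spans_with_prefix, the top_k attribute, and all span names are assumptions for the example; in a real case the spans would come from trace_context.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    # Hypothetical span record for illustration; not Merit's span type.
    name: str
    duration_ms: float
    attributes: dict[str, object] = field(default_factory=dict)


def spans_named(spans: list[SpanRecord], name: str) -> list[SpanRecord]:
    """All spans whose name matches exactly."""
    return [s for s in spans if s.name == name]


def spans_with_prefix(spans: list[SpanRecord], prefix: str) -> list[SpanRecord]:
    """All spans whose name starts with `prefix` (e.g. "sut.")."""
    return [s for s in spans if s.name.startswith(prefix)]


# Made-up spans from one execution of a hypothetical RAG merit case.
spans = [
    SpanRecord("test.test_rag_answer", 950.0),
    SpanRecord("sut.rag_pipeline", 900.0),
    SpanRecord("retrieve", 120.0, {"top_k": 5}),
    SpanRecord("openai.chat", 640.0),
]

# The SUT ran exactly once, and retrieval happened with the expected setting.
assert len(spans_named(spans, "sut.rag_pipeline")) == 1
retrieve = spans_named(spans, "retrieve")
assert retrieve, "expected a retrieve step"
assert retrieve[0].attributes.get("top_k") == 5

# Every SUT span stayed within a (hypothetical) 2-second budget.
assert all(s.duration_ms < 2000 for s in spans_with_prefix(spans, "sut."))
```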
Common patterns
Assert tool-calling contracts (tool dependency + no loops + permissions)
The main point of tracing is enforcing workflow contracts that matter in production (especially for agents): not just “did we return a good string”, but “did we call the right tools, in the right shape, without runaway loops”.
1. If tool A was called, tool B must also be called
Example contract: “if we called search, we must also call cite_sources”.
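A minimal sketch of that contract, assuming you have already extracted the ordered list of tool names from the trace. How tool calls show up as spans depends on your instrumentation, so that extraction step is left out; check_tool_dependency and its require_after option are hypothetical.

```python
from __future__ import annotations


def check_tool_dependency(
    calls: list[str], trigger: str, required: str, require_after: bool = False
) -> None:
    """Raise AssertionError if `trigger` ran but `required` did not.

    `calls` is the ordered list of tool names taken from the trace. With
    require_after=True, `required` must also appear after the first `trigger`.
    """
    if trigger not in calls:
        return  # The contract only applies when the trigger tool ran.
    if required not in calls:
        raise AssertionError(
            f"{trigger!r} was called but {required!r} never was; calls={calls}"
        )
    if require_after and required not in calls[calls.index(trigger) + 1:]:
        raise AssertionError(
            f"{required!r} was not called after {trigger!r}; calls={calls}"
        )


check_tool_dependency(["search", "cite_sources"], "search", "cite_sources")  # passes
check_tool_dependency(["summarize"], "search", "cite_sources")  # passes: no search
try:
    check_tool_dependency(["search", "summarize"], "search", "cite_sources")
except AssertionError as err:
    print(err)
# 'search' was called but 'cite_sources' never was; calls=['search', 'summarize']
```

Presence-only is the default because the contract as stated says nothing about ordering; pass require_after=True when cite_sources must follow search.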
2. Assert there are no tool-calling loops (ABAB…, ABCABC…)
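One way to detect such loops, sketched under the same assumption (an ordered list of tool names already pulled from the spans): look for a block of 2 or 3 consecutive calls that repeats back to back. find_tool_loop and its defaults are hypothetical choices, not a Merit API.

```python
from __future__ import annotations


def find_tool_loop(
    calls: list[str], min_period: int = 2, max_period: int = 3, min_repeats: int = 2
) -> tuple[int, tuple[str, ...]] | None:
    """Return (start_index, cycle) for a tight tool-calling cycle, else None.

    A tight cycle is a block of `period` consecutive calls repeated back to back
    at least `min_repeats` times (ABAB for period 2, ABCABC for period 3).
    Shorter periods are checked first.
    """
    for period in range(min_period, max_period + 1):
        window = period * min_repeats
        for start in range(len(calls) - window + 1):
            cycle = calls[start:start + period]
            if all(calls[start + k] == cycle[k % period] for k in range(window)):
                return start, tuple(cycle)
    return None


print(find_tool_loop(["search", "fetch", "search", "fetch"]))  # (0, ('search', 'fetch'))
print(find_tool_loop(["plan", "a", "b", "c", "a", "b", "c"]))  # (1, ('a', 'b', 'c'))
print(find_tool_loop(["search", "cite_sources"]))  # None
```

A run of a single tool (AAAA) also matches with period 2, as the cycle ('A', 'A'); add a len(set(cycle)) > 1 filter before returning if you only care about multi-tool cycles. Inside a merit case, the check reduces to assert find_tool_loop(calls) is None.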
This catches common failure modes like calling the same 2–3 tools in a tight cycle.
Inspect LLM calls
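Instrumented LLM spans share the name prefixes listed under How Merit traces are structured, so a prefix filter is usually enough to find them. The sketch below counts those spans and sums their durations; SpanRecord, llm_spans, the span names, and the three-call budget are hypothetical, and the prefix tuple may need extending for other instrumentation libraries.

```python
from __future__ import annotations

from dataclasses import dataclass

# Prefixes that auto-instrumented LLM spans usually use; extend as needed.
LLM_PREFIXES = ("openai.", "anthropic.", "gen_ai.")


@dataclass
class SpanRecord:
    # Hypothetical span record for illustration; not Merit's span type.
    name: str
    duration_ms: float


def llm_spans(spans: list[SpanRecord]) -> list[SpanRecord]:
    """Keep spans whose names use one of the usual LLM instrumentation prefixes."""
    return [s for s in spans if s.name.startswith(LLM_PREFIXES)]


# Made-up spans from one merit case.
spans = [
    SpanRecord("sut.agent", 2100.0),
    SpanRecord("retrieve", 150.0),
    SpanRecord("openai.chat", 800.0),
    SpanRecord("openai.chat", 950.0),
]

calls = llm_spans(spans)
total_ms = sum(s.duration_ms for s in calls)
print(len(calls), total_ms)  # 2 1750.0

# Example contract (hypothetical budget): at most 3 LLM calls per case.
assert len(calls) <= 3, f"too many LLM calls: {len(calls)}"
```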
If your SUT triggers instrumented LLM clients, you can locate those spans in the trace; their names usually start with openai., anthropic., or gen_ai.
Recommendations
1. Prefer trace assertions for execution guarantees
If correctness depends on how the system behaves (e.g., “must call retrieval”, “must call tool X”), asserting on spans is more robust than parsing free-form text output.
2. Keep spans high-signal
Create a small number of meaningful steps (retrieve, rerank, generate) rather than tracing every minor helper function.
3. Be deliberate about content capture
Traces may include request/response content depending on configuration. See the tracing API docs (especially MERIT_TRACE_CONTENT) in docs/apis/tracing.mdx.