
Setup Function

init_tracing

Initialize OpenTelemetry tracing with streaming file export. Signature:
def init_tracing(
    *,
    service_name: str = "merit",
    trace_content: bool | None = None,
    output_path: Path | str = ".merit/traces.jsonl",
) -> None
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| service_name | str | "merit" | Service name in trace metadata |
| trace_content | bool \| None | None | Content capture toggle. Note: today Merit uses MERIT_TRACE_CONTENT to control SUT span input/output attributes; trace_content is not currently applied to OpenAI/Anthropic client instrumentation. |
| output_path | Path \| str | ".merit/traces.jsonl" | File path for trace export |
Returns: None
Side Effects:
  • Sets up OpenTelemetry tracer provider
  • Instruments OpenAI and Anthropic clients
  • Creates/truncates output file
Important: Must be called before instantiating LLM clients to ensure instrumentation captures all calls.
Example:
from merit import init_tracing

# Basic setup
init_tracing()

# Custom configuration
init_tracing(
    service_name="my-ai-system",
    output_path="traces/run_001.jsonl"
)

# Now create LLM clients - they'll be automatically traced
from openai import OpenAI
client = OpenAI()  # All calls traced

from anthropic import Anthropic
claude = Anthropic()  # All calls traced
Environment Variables:
  • MERIT_TRACE_CONTENT: Set to "false" to avoid recording SUT input/output content (sut.input.*, sut.output). This does not currently guarantee LLM client instrumentation content is redacted.

Context Manager

trace_step

Create a custom span for tracing application logic. Signature:
@contextmanager
def trace_step(
    name: str,
    attributes: dict[str, Any] | None = None
) -> Iterator[Span]
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | - | Name of the span |
| attributes | dict[str, Any] \| None | None | Optional attributes to attach to span |
Yields: Span - OpenTelemetry span object
Example:
from merit import trace_step

async def merit_agent_pipeline(agent):
    with trace_step("retrieve_context", {"query": "user question"}):
        context = agent.retrieve("user question")

    with trace_step("generate_response", {"context_length": len(context)}) as span:
        response = await agent.generate(context)
        span.set_attribute("response_length", len(response))

    assert response
Nested Spans:
from merit import trace_step

def complex_pipeline():
    with trace_step("pipeline"):
        with trace_step("stage_1"):
            result_1 = process_stage_1()

        with trace_step("stage_2"):
            result_2 = process_stage_2(result_1)

        with trace_step("stage_3"):
            return process_stage_3(result_2)

Classes

TraceContext

Provides access to trace data for the current test execution.
Injection: TraceContext is automatically injected when a merit function declares trace_context as a parameter. It enables querying child spans, LLM calls, and setting custom attributes on the test span.
Properties:
| Name | Type | Description |
| --- | --- | --- |
| trace_id | str | The trace ID for this test's span (32 hex characters) |
| span_id | str | The span ID for this test's span (16 hex characters) |
| is_enabled | bool | Whether tracing is currently enabled |
Methods:
| Method | Returns | Description |
| --- | --- | --- |
| get_child_spans() | list[ReadableSpan] | Get all spans created during this test's execution |
| get_llm_calls() | list[ReadableSpan] | Get spans from LLM API calls (OpenAI, Anthropic) |
| get_sut_spans(name=None) | list[ReadableSpan] | Get spans from @merit.sut decorated functions, optionally filtered by name |
| set_attribute(key, value) | None | Set a custom attribute on the test span |
Example:
import merit

@merit.sut
def my_agent(prompt: str) -> str:
    with merit.trace_step("retrieve"):
        docs = retrieve_docs(prompt)

    with merit.trace_step("generate"):
        return generate_response(docs)

def merit_agent_workflow(my_agent, trace_context):
    """Use trace_context to assert on execution flow."""
    result = my_agent("What is Python?")

    # Set custom attributes on test span
    trace_context.set_attribute("response.length", len(result))

    # Get all child spans
    all_spans = trace_context.get_child_spans()
    assert len(all_spans) >= 2, "Expected at least 2 trace steps"

    # Get SUT-specific spans
    sut_spans = trace_context.get_sut_spans(name="my_agent")
    assert len(sut_spans) == 1
    assert sut_spans[0].name == "sut.my_agent"

    # Get LLM calls (if any)
    llm_calls = trace_context.get_llm_calls()
    for call in llm_calls:
        model = call.attributes.get("llm.model")
        print(f"LLM call used model: {model}")

    # Check trace IDs for correlation
    print(f"Trace ID: {trace_context.trace_id}")
    print(f"Span ID: {trace_context.span_id}")
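Spans returned by get_llm_calls() can also be aggregated, for example to total token usage across a test. A minimal sketch: total_tokens is a hypothetical helper (not part of Merit), and it assumes the instrumentor records usage under the OpenTelemetry GenAI semantic-convention keys gen_ai.usage.input_tokens / gen_ai.usage.output_tokens; verify the actual keys against a span in your own trace file first.

```python
from typing import Iterable, Mapping

def total_tokens(attr_dicts: Iterable[Mapping]) -> dict:
    """Sum token usage over LLM span attribute dicts.

    Assumes OpenTelemetry GenAI semantic-convention attribute names;
    missing keys count as zero.
    """
    totals = {"input": 0, "output": 0}
    for attrs in attr_dicts:
        totals["input"] += int(attrs.get("gen_ai.usage.input_tokens", 0) or 0)
        totals["output"] += int(attrs.get("gen_ai.usage.output_tokens", 0) or 0)
    return totals

# In a merit test you might call it as:
#   usage = total_tokens(s.attributes or {} for s in trace_context.get_llm_calls())
```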
Filtering SUT Spans:
def merit_multiple_suts(agent_a, agent_b, trace_context):
    """Filter SUT spans by name when multiple SUTs are used."""
    result_a = agent_a("query 1")
    result_b = agent_b("query 2")

    # Get spans for specific SUT
    agent_a_spans = trace_context.get_sut_spans(name="agent_a")
    agent_b_spans = trace_context.get_sut_spans(name="agent_b")

    assert len(agent_a_spans) == 1
    assert len(agent_b_spans) == 1
Conditional Logic Based on Tracing:
def merit_tracing_enabled_only(my_agent, trace_context):
    """This test requires tracing (run with `merit test --trace`)."""
    result = my_agent("test query")

    spans = trace_context.get_child_spans()
    assert len(spans) > 0, "Expected trace spans"
    assert result is not None

Utility Functions

get_tracer

Get an OpenTelemetry tracer instance for creating custom spans. Signature:
def get_tracer(name: str = "merit") -> Tracer
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | "merit" | Tracer name |
Returns: Tracer - OpenTelemetry tracer instance
Example:
from merit.tracing import get_tracer

tracer = get_tracer("my-component")

def custom_function():
    with tracer.start_as_current_span("custom_operation") as span:
        span.set_attribute("custom_key", "custom_value")
        # Your code here
        result = do_work()
        span.set_attribute("result_size", len(result))
        return result

clear_traces

Clear the trace output file. Signature:
def clear_traces() -> None
Parameters: None
Returns: None
Example:
from merit.tracing import clear_traces, init_tracing

# Setup tracing
init_tracing(output_path=".merit/traces.jsonl")

# Run some tests...
merit_run_1()

# Clear traces before next run
clear_traces()

# Run more tests with fresh trace file
merit_run_2()

set_trace_output_path

Change the trace output path for the current exporter. Signature:
def set_trace_output_path(output_path: Path | str) -> None
Parameters:
| Name | Type | Description |
| --- | --- | --- |
| output_path | Path \| str | New file path for trace export |
Returns: None
Example:
from merit.tracing import set_trace_output_path, init_tracing

# Initial setup
init_tracing(output_path=".merit/traces.jsonl")

# Change output path mid-run
set_trace_output_path("traces/experiment_2.jsonl")

get_span_collector

Get the current span collector instance for accessing collected spans. Signature:
def get_span_collector() -> InMemorySpanCollector | None
Parameters: None
Returns: InMemorySpanCollector | None - The active span collector, or None if tracing is not enabled
Example:
from merit.tracing import get_span_collector

def merit_advanced_tracing(my_agent, trace_context):
    """Access raw OpenTelemetry spans for the current test (requires `--trace`)."""
    result = my_agent("test query")

    collector = get_span_collector()
    if collector:
        spans = collector.get_spans(trace_context.trace_id)
        print(f"Total spans in this test trace: {len(spans)}")

    assert result
Note: Most tests should use trace_context parameter instead, which provides a cleaner API scoped to the current test. get_span_collector() is useful for advanced scenarios requiring access to all spans.

InMemorySpanCollector

Internal class that collects and stores OpenTelemetry spans during test execution.
Purpose: This is an advanced/internal API used by Merit's tracing system. Most users should use TraceContext instead.
Methods:
| Method | Returns | Description |
| --- | --- | --- |
| get_spans(trace_id) | list[ReadableSpan] | Get all spans for a specific trace ID |
| clear(trace_id) | None | Clear spans for a specific trace ID |
| clear_all() | None | Clear all collected spans |
Example:
from merit.tracing import get_span_collector

def merit_inspect_current_trace(trace_context):
    """Inspect raw spans for the current test (requires `--trace`)."""
    collector = get_span_collector()
    if not collector:
        return

    spans = collector.get_spans(trace_context.trace_id)
    llm_spans = [s for s in spans if s.name.startswith(("openai.", "anthropic.", "gen_ai."))]
    print(f"Total spans: {len(spans)}")
    print(f"Total LLM spans: {len(llm_spans)}")
When to use:
  • Custom test runners or frameworks built on Merit
  • Advanced trace analysis across multiple tests
  • Performance profiling and debugging
When NOT to use:
  • Regular test assertions (use trace_context parameter)
  • Single-test trace inspection (use trace_context.get_child_spans())

Automatic Tracing

LLM Client Instrumentation

When init_tracing() is called, Merit automatically instruments:
  • OpenAI - openai package
  • Anthropic - anthropic package
All LLM calls are captured with:
  • Request parameters (model, temperature, messages, etc.)
  • Response/content details depend on the underlying OpenTelemetry instrumentor configuration (Merit does not currently toggle this via trace_content)
  • Timing information
  • Token usage
  • Error details
Example:
from merit import init_tracing
from openai import OpenAI

# Enable tracing
init_tracing()

# Create client - automatically instrumented
client = OpenAI()

# This call is automatically traced
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# Trace includes:
# - Model name: "gpt-4"
# - Messages: [{"role": "user", "content": "Hello"}]
# - Response content: "Hi there!"
# - Tokens used
# - Latency

SUT Tracing

Functions and classes decorated with @sut are automatically traced:
from merit import sut

@sut
async def my_agent(prompt: str) -> str:
    # This entire function execution is traced as "sut.my_agent"
    return await llm.generate(prompt)

@sut
class RAGSystem:
    def __call__(self, query: str) -> str:
        # Traced as "sut.rag_system"
        context = self.retrieve(query)
        return self.generate(context)

async def merit_test(my_agent, rag_system):
    # Both calls create spans with input/output
    result1 = await my_agent("Hello")
    result2 = rag_system("Question")
Captured Information:
  • Input arguments (args and kwargs)
  • Output values
  • Execution time
  • Nested LLM calls (as child spans)
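The captured input/output can be read back from a SUT span's attributes. A sketch, using the Merit-added attribute keys listed under "Trace File Format" (sut.input.args, sut.input.kwargs, sut.output); summarize_sut_span is a hypothetical helper, and how the values are serialized may vary by Merit version, so treat this as illustrative.

```python
def summarize_sut_span(attributes: dict) -> dict:
    """Pull the Merit-added input/output attributes out of a SUT span.

    Keys follow the "Trace File Format" section; absent keys (e.g. when
    MERIT_TRACE_CONTENT=false) come back as None.
    """
    return {
        "args": attributes.get("sut.input.args"),
        "kwargs": attributes.get("sut.input.kwargs"),
        "output": attributes.get("sut.output"),
    }

# e.g. in a test:
#   for span in trace_context.get_sut_spans():
#       print(summarize_sut_span(dict(span.attributes or {})))
```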

Usage Patterns

Basic Setup

from merit import init_tracing, sut

# Initialize tracing
init_tracing(output_path=".merit/traces.jsonl")

@sut
async def chatbot(prompt: str) -> str:
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def merit_chatbot(chatbot):
    response = await chatbot("Hello")
    assert "hello" in response.lower()
Trace Structure:
merit_chatbot
└── sut.chatbot
    └── openai.chat.completions.create

Custom Steps

from merit import init_tracing, trace_step, sut

init_tracing()

@sut
class RAGPipeline:
    def __call__(self, query: str) -> str:
        with trace_step("retrieve", {"query": query}):
            docs = self.retrieve(query)

        with trace_step("rerank", {"doc_count": len(docs)}):
            top_docs = self.rerank(docs, query)

        with trace_step("generate"):
            return self.generate(query, top_docs)

def merit_rag(rag_pipeline):
    result = rag_pipeline("What is Python?")
    assert result
Trace Structure:
merit_rag
└── sut.rag_pipeline
    ├── retrieve
    ├── rerank
    └── generate
        └── openai.chat.completions.create

Debugging with Traces

from merit import init_tracing, trace_step
import json

# Enable tracing
init_tracing(output_path="debug_traces.jsonl")

# Run merit
# ... merit functions execute ...

# Analyze traces
with open("debug_traces.jsonl") as f:
    traces = [json.loads(line) for line in f]

for trace in traces:
    # Each line is an OpenTelemetry span serialized via `ReadableSpan.to_json()`.
    name = trace.get("name", "<unknown>")
    attributes = trace.get("attributes") or {}

    print(name)
    if isinstance(attributes, dict) and "llm.model" in attributes:
        print(f"  Model: {attributes['llm.model']}")

Privacy Controls

from merit import init_tracing
import os

# Disable content capture for sensitive data
os.environ["MERIT_TRACE_CONTENT"] = "false"

init_tracing()

# Now traces capture:
# - Timing information
# - Model names
# - Token counts
# - Parameter counts
# But NOT:
# - SUT input/output values (`sut.input.*`, `sut.output`)
#
# Note: Merit does not currently guarantee LLM client instrumentation content is redacted.

CI/CD Integration

from merit import init_tracing
import os

# Trace to different files per run
run_id = os.environ.get("CI_RUN_ID", "local")
init_tracing(
    service_name=f"merit-{run_id}",
    output_path=f"traces/run_{run_id}.jsonl"
)

# Run merit suite
# ...

# Upload traces to analysis platform
# upload_traces(f"traces/run_{run_id}.jsonl")

Trace File Format

Traces are exported as JSONL (JSON Lines). Each line is a complete OpenTelemetry span serialized via ReadableSpan.to_json(). Because the exact shape can vary by OpenTelemetry version and installed instrumentations, inspect a line directly in your trace file. Merit-added attributes to look for (may be absent when MERIT_TRACE_CONTENT=false):
  • merit.sut / merit.sut.name
  • sut.input.args / sut.input.kwargs / sut.input.count
  • sut.output / sut.output.type
{
  "name": "sut.chatbot",
  "attributes": {
    "...": "span attributes (varies by instrumentation)"
  }
}
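Given that each line is one serialized span, SUT spans can be picked out of a trace file with a few lines of Python. A sketch, assuming only that each non-empty line parses as JSON with a top-level "name" field; load_sut_spans is a hypothetical helper, and the rest of the span layout varies by OpenTelemetry SDK version, so print one line from your own file to confirm its shape.

```python
import json

def load_sut_spans(lines):
    """Parse JSONL span lines and keep spans whose name starts with 'sut.'."""
    spans = [json.loads(line) for line in lines if line.strip()]
    return [s for s in spans if s.get("name", "").startswith("sut.")]

# with open(".merit/traces.jsonl") as f:
#     for span in load_sut_spans(f):
#         print(span["name"], (span.get("attributes") or {}).get("merit.sut.name"))
```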

CLI Integration

The Merit CLI automatically enables tracing when the --trace flag is used:
# Enable tracing
merit test --trace

# Custom output path
merit test --trace --trace-output my_traces.jsonl

# Disable content (only metadata)
MERIT_TRACE_CONTENT=false merit test --trace