What is Tracing?

Merit includes OpenTelemetry tracing to help you:
  • Debug complex test flows
  • Track LLM calls and costs
  • Identify performance bottlenecks
  • Understand test execution paths

Enable Tracing

Run tests with tracing enabled:
# Enable tracing via CLI
merit --trace

# Or trace a specific test file
merit your_tests.py --trace
Traces are exported to traces.json at the end of the run. You can also initialize tracing programmatically:
from merit import init_tracing

# Enable tracing with default settings
init_tracing()

# Your tests here
async def merit_chatbot():
    response = await chatbot("Hello")
    assert response is not None

Custom Service Name

Identify your tests with a custom service name:
from merit import init_tracing

init_tracing(service_name="my-ai-tests")

@sut Decorator

Mark your system under test for automatic tracing:
import merit

@merit.sut
def chatbot(prompt: str) -> str:
    """All calls to this function are automatically traced."""
    # Your AI system
    return generate_response(prompt)

# Use as a resource
def merit_chatbot_test(chatbot):
    """chatbot is injected and traced automatically."""
    response = chatbot("Hello")
    assert response is not None
Works with:
  • Functions (sync and async)
  • Classes (traces __call__ method)
  • Resources

Class-Based SUT

@merit.sut
class RAGPipeline:
    """Class-based SUT with automatic tracing."""
    
    def __call__(self, query: str) -> str:
        """Main entry point - traced automatically."""
        docs = self._retrieve(query)
        return self._generate(docs, query)
    
    def _retrieve(self, query: str) -> list[str]:
        """Add custom trace steps for internal methods."""
        with merit.trace_step("retrieve", {"query": query}):
            return ["doc1", "doc2"]
    
    def _generate(self, docs: list[str], query: str) -> str:
        with merit.trace_step("generate", {"doc_count": len(docs)}):
            return f"Answer based on {len(docs)} docs"

# Use in tests
def merit_rag_pipeline(rag_pipeline):
    """rag_pipeline is injected as a resource."""
    result = rag_pipeline("What is Python?")
    assert "Answer" in result

Async SUT

@merit.sut
async def async_chatbot(prompt: str) -> str:
    """Async SUT - works the same way."""
    return await generate_async_response(prompt)

async def merit_async_test(async_chatbot):
    response = await async_chatbot("Hello")
    assert response is not None

Trace Steps

Add custom trace spans to track specific operations:
from merit import trace_step
from merit.predicates import has_facts

async def merit_complex_workflow():
    with trace_step("Load context"):
        context = await load_knowledge_base()
    
    with trace_step("Generate response"):
        response = await llm_call(context)
    
    with trace_step("Validate output"):
        assert await has_facts(response, expected_facts)

Automatic Instrumentation

Merit automatically traces the following (see the example after this list):
  • LLM calls - OpenAI and Anthropic API calls
  • Test execution - Test start, duration, results
  • Predicate evaluations - AI assertion calls
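
For example, once tracing is initialized, a test like the sketch below should emit spans for the LLM call, the test run, and the predicate without any explicit trace_step blocks. The OpenAI client usage and model name here are illustrative assumptions, not part of Merit's API:
from openai import AsyncOpenAI
from merit import init_tracing
from merit.predicates import has_facts

init_tracing(service_name="auto-instrumentation-demo")

client = AsyncOpenAI()  # assumes OPENAI_API_KEY is set in the environment

async def merit_auto_traced():
    """The OpenAI call, the test itself, and the predicate below are all
    expected to appear as spans without any manual trace_step."""
    completion = await client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": "Tell me about Paris"}],
    )
    response = completion.choices[0].message.content

    # Predicate evaluations are traced automatically as well
    assert await has_facts(response, "Paris is the capital of France")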

What Gets Tracked

LLM Calls

For each LLM API call, Merit records (illustrated by the sketch after this list):
  • Model name
  • Token counts (input/output)
  • Cost estimates
  • Latency
  • Success/failure status
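
The exported span schema isn't documented here, so the sketch below only illustrates the kind of data involved; the attribute names and values are hypothetical placeholders, not Merit's actual keys:
# Hypothetical shape of one LLM-call span in traces.json.
# All attribute names and values are illustrative placeholders.
llm_span = {
    "name": "llm.call",
    "attributes": {
        "llm.model": "gpt-4o-mini",   # model name
        "llm.tokens.input": 152,      # input token count
        "llm.tokens.output": 87,      # output token count
        "llm.cost_usd": 0.0003,       # cost estimate
        "llm.latency_ms": 842,        # latency
        "llm.success": True,          # success/failure status
    },
}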

Test Execution

For each test, Merit records:
  • Test name and location
  • Start time and duration
  • Pass/fail status
  • Error messages (if failed)

AI Predicates

For each predicate call, Merit records:
  • Predicate type (has_facts, has_topics, etc.)
  • Input/reference text lengths
  • API latency
  • Result (passed/failed)
  • Confidence score

Example with Tracing

from merit import init_tracing, trace_step
from merit.predicates import has_facts, has_unsupported_facts

# Enable tracing
init_tracing(service_name="chatbot-tests")

async def merit_chatbot_accuracy():
    """All LLM calls and predicates are automatically traced."""
    
    with trace_step("Generate response"):
        response = await chatbot("Tell me about Paris")
    
    with trace_step("Fact checking"):
        # Automatically traced
        assert await has_facts(response, "Paris is the capital of France")
    
    with trace_step("Hallucination detection"):
        # Automatically traced
        assert not await has_unsupported_facts(response, source_docs)

Viewing Traces (TODO)

TODO: Document trace viewing and export options. Expected integrations:
  • Export to Jaeger
  • Export to Zipkin
  • Console output
  • JSON export for analysis

Cost Tracking

Use tracing to track LLM costs:
from merit import init_tracing
from merit.predicates import has_facts

init_tracing(service_name="cost-tracking")

async def merit_expensive_test():
    """Traces will show total cost of all LLM calls."""
    
    # Each call is tracked with cost estimate
    response1 = await llm_call("Long prompt...")
    response2 = await llm_call("Another call...")
    
    # Predicate calls are also tracked
    assert await has_facts(response1, "...")
After running, check the exported traces to see (one way to do this is sketched after this list):
  • Total tokens used
  • Estimated cost per test
  • Most expensive operations
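
One way to get these numbers is to post-process traces.json after the run. This is a minimal sketch that assumes the export is a JSON list of spans and that the token, cost, and test-name attributes use the hypothetical names shown; adjust the keys to match your actual export:
import json
from collections import defaultdict

# Minimal sketch: aggregate hypothetical token/cost attributes per test.
# "test.name", "llm.tokens.input", "llm.tokens.output", and "llm.cost_usd"
# are assumed attribute names, not Merit's documented schema.
with open("traces.json") as f:
    spans = json.load(f)

cost_per_test = defaultdict(float)
total_tokens = 0

for span in spans:
    attrs = span.get("attributes", {})
    total_tokens += attrs.get("llm.tokens.input", 0) + attrs.get("llm.tokens.output", 0)
    cost_per_test[attrs.get("test.name", "unknown")] += attrs.get("llm.cost_usd", 0.0)

print(f"Total tokens: {total_tokens}")
for test, cost in sorted(cost_per_test.items(), key=lambda kv: -kv[1]):
    print(f"{test}: ~${cost:.4f}")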

Performance Analysis

Identify slow operations:
from merit import trace_step
from merit.predicates import has_facts

async def merit_performance_check():
    """Find bottlenecks in test execution."""
    
    with trace_step("Database query"):
        # If this is slow, trace will show it
        data = await db.query("SELECT * FROM users")
    
    with trace_step("LLM call"):
        # Compare timing to other operations
        response = await llm_call(prompt)
    
    with trace_step("AI predicate"):
        # Predicate calls are usually 1-3 seconds
        assert await has_facts(response, reference)

Disabling Tracing

Tracing has minimal overhead, but you can disable it:
# Don't call init_tracing()

async def merit_no_tracing():
    """Runs without tracing."""
    response = await chatbot("Hello")
    assert response is not None

Environment Configuration (TODO)

TODO: Document environment variables for tracing. Expected configuration:
# Enable tracing
MERIT_TRACING_ENABLED=true

# Set service name
MERIT_SERVICE_NAME=my-tests

# Export endpoint
MERIT_OTLP_ENDPOINT=http://localhost:4318

# Sampling rate (0.0 to 1.0)
MERIT_TRACE_SAMPLE_RATE=1.0

Integration with Observability Platforms (TODO)

TODO: Document integration with observability tools. Expected integrations:
  • Jaeger
  • Zipkin
  • Honeycomb
  • DataDog
  • New Relic

Next Steps