Merit provides 8 built-in AI predicates for common LLM evaluation scenarios. All predicates are async functions that return PredicateResult objects with boolean values, confidence scores, and explanatory messages.
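The exact shape of PredicateResult isn't documented in this section, but its role in assertions can be sketched. The field names below (`value`, `confidence`, `message`) are assumptions for illustration, not Merit's confirmed API; the key idea is that the object carries the verdict plus metadata while remaining directly usable in an `assert`:

```python
from dataclasses import dataclass

# Hypothetical sketch of a PredicateResult-style object.
# Merit's real class may use different field names.
@dataclass
class PredicateResult:
    value: bool        # the boolean outcome the assertion checks
    confidence: float  # how certain the judge model was
    message: str       # explanation of the verdict

    def __bool__(self) -> bool:
        # Truthiness delegates to the boolean outcome, which is what
        # would let `assert await has_topics(...)` work directly.
        return self.value

result = PredicateResult(
    value=True, confidence=0.92, message="All topics covered"
)
assert result                 # usable directly in an assertion
print(result.confidence)     # metadata remains available for debugging
```

The `__bool__` delegation is the design point: a result object can carry rich context without forcing callers to unwrap it before asserting.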
```python
import merit
from merit.predicates import has_unsupported_facts, follows_policy

async def merit_customer_faq_bot(faq_bot):
    # Knowledge base the bot should use
    knowledge = """
    Our store hours are 9 AM to 6 PM, Monday through Saturday.
    We're closed on Sundays and major holidays.
    Free shipping on orders over $50.
    """

    # Customer asks a question, bot generates response
    response = faq_bot.answer(
        "When are you open?",
        context=knowledge
    )
    # Example output:
    # "We're open Monday through Saturday, 9 AM to 6 PM.
    #  We're closed Sundays and holidays."

    # Verify response doesn't hallucinate facts
    assert not await has_unsupported_facts(
        response, knowledge
    )

    # Verify response follows customer service guidelines
    conversation_policy = """
    Agent always asks if they can help with any other questions.
    """
    assert await follows_policy(response, conversation_policy)
```
Detects when generated text contradicts source material.
```python
from merit.predicates import has_conflicting_facts

async def merit_rag_no_contradictions(rag_system):
    # Source document about a company
    source = """
    Acme Corp was founded in 2018. The company has
    150 employees and is headquartered in Austin, Texas.
    Revenue was $12M in 2023.
    """

    # LLM generates answer based on retrieved context
    answer = rag_system.query("Tell me about Acme Corp")

    # Passes: answer doesn't contradict the source
    assert not await has_conflicting_facts(answer, source)

    # Bad output:
    # "Acme Corp was founded in 2015 in San Francisco..."
    # Would fail: contradicts year and location
```
Catches hallucinations: facts the LLM invented that aren't grounded in the source material.
```python
from merit.predicates import has_unsupported_facts

async def merit_no_hallucinations(rag_system):
    # Knowledge base only contains this information
    source = """
    Python 3.12 was released in October 2023.
    It introduced f-string improvements.
    """

    answer = rag_system.query("What's new in Python 3.12?")
    # Example output:
    # "Python 3.12 came out in October 2023
    #  with better f-strings."

    # Passes: all facts are grounded in source
    assert not await has_unsupported_facts(answer, source)

    # Bad output:
    # "Python 3.12 released October 2023 with
    #  f-string improvements and a new JIT compiler
    #  for 2x faster performance."
    # Would fail: JIT compiler claim is hallucinated
```
Verifies output covers required subjects. Useful for content generation where specific themes must be addressed.
```python
from merit.predicates import has_topics

async def merit_onboarding_covers_topics(onboarding_bot):
    # New employee asks about benefits
    response = onboarding_bot.chat(
        "What benefits do I get?"
    )
    # Example:
    # "Welcome! Your benefits include comprehensive
    #  health insurance with dental and vision,
    #  a 401k with 4% company match, and 20 days PTO.
    #  You're also eligible for our annual bonus
    #  program."

    # Response must cover these key topics
    topics = """
    health insurance, retirement plan, paid time off
    """
    assert await has_topics(response, topics)
```
Ensures LLM outputs adhere to business rules, safety guidelines, or content policies.
```python
from merit.predicates import follows_policy

async def merit_support_follows_guidelines(support_bot):
    # Customer asking about competitor
    question = "Is your product better than CompetitorX?"
    response = support_bot.chat(question)
    # Example:
    # "I'd be happy to tell you about our product's
    #  strengths! We offer 24/7 support, 99.9% uptime,
    #  and flexible pricing. I can't compare directly
    #  to other products, but I can answer any
    #  questions about what we offer."

    policy = """
    - Never disparage competitors by name
    - Focus on our product's strengths, not competitor weaknesses
    - Don't make claims about competitor products
    - Redirect to our features when asked for comparisons
    """
    assert await follows_policy(response, policy)
```
Validates tone, formality, and voice match a reference example.
```python
from merit.predicates import matches_writing_style

async def merit_maintains_brand_voice(marketing_bot):
    # Generate product description
    description = marketing_bot.generate(
        "Describe our new running shoes"
    )
    # Example:
    # "Meet the CloudRunner Pro. Engineered for
    #  the long haul. 47% lighter than last gen.
    #  Zero compromises."

    # Brand voice reference: punchy, confident, minimal
    brand_voice = """
    Built different. The UltraFrame bike handles
    like nothing else. Carbon fiber. Precision
    engineering. Pure speed.
    """
    assert await matches_writing_style(
        description, brand_voice
    )

    # Would fail with:
    # "Our new running shoes are very comfortable
    #  and lightweight, offering great support for
    #  runners of all levels..."
    # (too generic and wordy for this brand voice)
```
When you run merits with database persistence enabled (the default behavior), every AI predicate evaluation used inside an assert statement is automatically saved to the Merit database. This enables post-run analysis, debugging, and quality monitoring.

Every PredicateResult evaluated in an assertion is stored with full context. After the run completes, you can investigate these evaluations even if all tests passed.
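Merit's storage schema and query API are not shown in this section, so the following is only an illustration of the pattern: each evaluation record carries the predicate name, its inputs, the verdict, and the judge's confidence, and post-run queries can surface borderline results. The table layout and field names here are assumptions.

```python
import sqlite3

# Illustrative only: a hypothetical schema for persisted predicate
# evaluations, not Merit's actual database layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE evaluations ("
    "predicate TEXT, actual TEXT, reference TEXT,"
    "value INTEGER, confidence REAL, message TEXT)"
)

# The kind of record a single assertion might produce
record = {
    "predicate": "has_topics",
    "actual": "Benefits include health insurance, a 401k, and PTO.",
    "reference": "health insurance, retirement plan, paid time off",
    "value": 1,
    "confidence": 0.94,
    "message": "All three topics are mentioned.",
}
conn.execute(
    "INSERT INTO evaluations VALUES "
    "(:predicate, :actual, :reference, :value, :confidence, :message)",
    record,
)

# Post-run analysis: surface low-confidence passes worth a second look,
# even though the test itself succeeded
rows = conn.execute(
    "SELECT predicate, confidence, message FROM evaluations "
    "WHERE value = 1 AND confidence < 0.95"
).fetchall()
for predicate, confidence, message in rows:
    print(predicate, confidence, message)
```

The useful habit this sketch illustrates: because passing assertions are stored too, you can audit *barely*-passing evaluations rather than only investigating failures.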
While Merit provides 8 built-in AI predicates, you can create custom predicates for domain-specific comparisons or integrate third-party LLM evaluation tools. Use the @predicate decorator to ensure your custom predicates integrate seamlessly with Merit’s assertion tracking and database persistence.
The @predicate decorator transforms ordinary comparison functions into protocol-conforming predicates. To be eligible for decoration, your function must satisfy the Predicate protocol's signature constraints:

Signature Requirements:
Return type: Must return bool representing the evaluation outcome
Required parameters: Must accept actual and reference as either:
The first two positional parameters, or
Named keyword parameters (actual=, reference=)
Execution model: Can be synchronous or asynchronous—the decorator adapts to both def and async def functions
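Merit's actual decorator implementation isn't shown here, but the sync/async adaptation described above can be sketched with `inspect.iscoroutinefunction`: the decorator detects which execution model the function uses and always exposes an awaitable. This is a simplified stand-in that omits Merit's result tracking and persistence.

```python
import asyncio
import functools
import inspect

# Simplified sketch of a @predicate-style decorator that accepts
# both def and async def functions. Merit's real decorator also
# wraps results for assertion tracking and database persistence.
def predicate(func):
    if inspect.iscoroutinefunction(func):
        # Already async: pass through unchanged
        return func

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        # Sync function: expose it behind an async interface so
        # callers can always `await` a predicate uniformly
        return func(*args, **kwargs)

    return wrapper

@predicate
def contains_keyword(actual: str, reference: str) -> bool:
    # An ordinary synchronous comparison function
    return reference.lower() in actual.lower()

# Despite being defined with plain `def`, the decorated predicate
# is awaited like any other
result = asyncio.run(contains_keyword("Hello World", "world"))
print(result)  # True
```

Normalizing everything to async is what lets built-in and custom predicates share one calling convention inside `assert await ...`.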
```python
from merit import predicate
from openai import AsyncOpenAI

client = AsyncOpenAI()

@predicate
async def matches_tone_with_gpt4(
    actual: str,
    reference: str,
    *,
    strict: bool = False
) -> bool:
    """Check if actual text matches the tone of reference using GPT-4.

    Args:
        actual: The text to evaluate
        reference: Example text with desired tone
        strict: Whether to require exact tone match or allow similar tones
    """
    prompt = f"""Compare the tone of these two texts.

    Reference tone: {reference}

    Text to evaluate: {actual}

    Does the text to evaluate match the reference tone?
    {"Require exact match." if strict else "Allow similar tones."}

    Answer only 'yes' or 'no'."""

    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    answer = response.choices[0].message.content.strip().lower()
    return answer == "yes"

async def merit_brand_consistency(marketing_bot):
    brand_example = "Hey there! 👋 Let's make this happen together."
    generated = marketing_bot.create_post("Announce new feature")
    assert await matches_tone_with_gpt4(
        generated, brand_example, strict=False
    )
```
When building custom predicates, follow the naming convention of starting with action verbs like has_, matches_, follows_, or contains_ to make assertions read naturally.
1. Use AI predicates for natural language assertions
AI predicates shine when evaluating LLM outputs where exact string matching is too brittle.

Don't do this:
```python
# Using AI predicates for exact matching
from merit.predicates import has_facts

async def merit_json_output(api):
    result = api.get_user(id=123)
    # Semantic predicate overkill for structured data
    assert await has_facts(
        str(result), '{"name": "Alice"}'
    )
```
Do this:
```python
# Use standard assertions for structured data
def merit_json_output(api):
    result = api.get_user(id=123)
    assert result["name"] == "Alice"
    assert result["id"] == 123

# Use semantic predicates for natural language
from merit.predicates import (
    has_facts, has_unsupported_facts
)

async def merit_text_generation(llm):
    context = "The company was founded in 2020."
    summary = llm.summarize(context)

    # Semantic checks for flexible language matching
    assert await has_facts(summary, "founded in 2020")
    assert not await has_unsupported_facts(
        summary, context
    )
```
2. Combine multiple predicates for comprehensive validation
Layer semantic checks to validate different aspects of LLM outputs. This provides stronger guarantees than single assertions.
```python
from merit.predicates import (
    has_unsupported_facts,
    has_conflicting_facts,
    has_topics,
    follows_policy
)

async def merit_product_description(product_copilot):
    # Source: product database entry
    product_data = """
    Name: ThermoPro X500
    Price: $299
    Features: Temperature sensing, WiFi connectivity, Mobile app
    Warranty: 2 years
    """

    description = product_copilot.generate_description(
        product_data
    )
    # Example output:
    # "The ThermoPro X500 ($299) brings smart
    #  temperature monitoring to your home.
    #  Connect via WiFi, control from our mobile app,
    #  and enjoy peace of mind with a 2-year warranty."

    # Layer 1: No hallucinated features or specs
    assert not await has_unsupported_facts(
        description, product_data
    )

    # Layer 2: Price and warranty not misstated
    assert not await has_conflicting_facts(
        description, product_data
    )

    # Layer 3: Must mention key selling points
    assert await has_topics(
        description, "WiFi, mobile app, warranty"
    )

    # Layer 4: Follow marketing guidelines
    marketing_policy = """
    No superlatives like 'best' or 'revolutionary'.
    No competitor mentions.
    """
    assert await follows_policy(
        description, marketing_policy
    )
```