RheaDuoBoost: Cost-Effective AI Enhancement
RheaDuoBoost is our proprietary AI orchestration system that delivers premium AI quality at >99% lower cost compared to using premium models directly. This sophisticated 4-phase collaborative workflow combines multiple smaller models to achieve results comparable to GPT-4.1 while dramatically reducing costs through intelligent model orchestration.
Overview
RheaDuoBoost is Rhea AI's flagship orchestration system. By pairing a primary model with a reviewer model, it creates a systematic process for generating, reviewing, and refining AI responses.
Architecture Overview
RheaDuoBoost follows a layered architecture with clear separation of concerns: a workflow layer (the 4-phase process), a provider layer (the unified AIInteraction interface), and a registry layer (configuration and cost tracking).
The Cost Challenge
Traditional AI pricing makes it difficult to offer advanced AI features at scale:
- GPT-4.1 Input: $2 per million tokens
- GPT-4.1 Output: $8 per million tokens
- RheaDuoBoost Input: $0.02 per million tokens (100x cheaper)
- RheaDuoBoost Output: $0.04 per million tokens (200x cheaper)
For a typical co-parenting app with thousands of daily users, this difference is dramatic:
- Traditional approach: $500-1,000+ per day in AI costs
- RheaDuoBoost approach: $5-10 per day in AI costs
The 4-Phase Workflow
RheaDuoBoost implements a systematic 4-phase process designed to maximize response quality through multi-model collaboration:
Phase 1: Inception 🎯
Purpose: Optimize the query and generate the ideal AI persona
Process:
- Persona Generation: Reviewer model analyzes the query to create optimal system prompt
- Query Optimization: Reviewer model enhances user query for better AI understanding
- Parallel Execution: Both tasks run concurrently for efficiency
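The parallel inception step can be sketched as follows; `reviewerCall` is an illustrative stand-in for the real reviewer-model request, not RheaDuoBoost's actual API:

```typescript
// Illustrative stand-in for a call to the reviewer model (Gemma-2-9B).
// A real implementation would hit the provider API instead.
async function reviewerCall(task: string, query: string): Promise<string> {
  return `${task} for: ${query}`;
}

// Phase 1: run persona generation and query optimization concurrently
async function runInception(query: string) {
  const [persona, optimizedQuery] = await Promise.all([
    reviewerCall("persona", query),
    reviewerCall("optimize", query),
  ]);
  return { persona, optimizedQuery };
}
```

Because the two reviewer calls are independent, `Promise.all` lets them run concurrently, so phase 1 takes roughly one round-trip instead of two.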
Example from our logs:
Original Query: "Explain the concept of a Large Language Model in simple terms, for a 13 year old."
Generated Persona: "You are Bard, an AI assistant who specializes in explaining
complex topics like artificial intelligence in a way that makes sense to teenagers.
Use simple language, relatable examples, and be patient in explaining the concept."
Optimized Query: "Imagine you had a super smart computer program that could read
and understand any text you gave it, just like a human. This program is called
a Large Language Model, or LLM. Can you explain how LLMs work, using examples
like how they learn from books and websites, and what they can do, like write
stories or answer questions, in a way that a 13-year-old could easily understand?"
Phase 2: Primary Generation 🚀
Purpose: Generate initial response using optimized inputs
Process:
- Primary model (Mistral-7B) receives the generated persona as system prompt
- Uses the optimized query instead of original user input
- Generates comprehensive initial response
- Captures detailed usage metadata
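In code, phase 2's input is just a two-message chat payload; `ChatMessage` and `buildPrimaryMessages` are illustrative names, not the production interface:

```typescript
// Phase 2 input: persona as the system prompt, optimized query as the user turn
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildPrimaryMessages(persona: string, optimizedQuery: string): ChatMessage[] {
  return [
    { role: "system", content: persona },        // generated in phase 1
    { role: "user", content: optimizedQuery },   // replaces the raw user input
  ];
}
```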
Phase 3: Review 🔍
Purpose: Critical analysis by independent reviewer model (Gemma-2-9B)
Process:
- Reviewer model evaluates the primary response objectively
- Focuses on 2-3 most important improvements
- Provides specific, actionable feedback
- Maintains quality through separate model perspective
Example Review from our logs:
Review of LLM explanation:
• Clarify "reading and understanding": explicitly state that LLMs analyze
text patterns, not understand meaning like humans
• Explain "training data": briefly define what constitutes training data
(books, websites, etc.) for better comprehension
• Condense examples: streamline into one concise example
Phase 4: Synthesis ⚡
Purpose: Generate final improved response incorporating review feedback
Process:
- Primary model receives complete context (persona + query + initial response + feedback)
- Generates final synthesized response addressing all concerns
- Keeps the final output coherent with the persona and earlier phases
- Delivers polished, high-quality result
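One way to assemble the synthesis request (field names and prompt wording here are illustrative, not the production template):

```typescript
// Phase 4 input: fold the full context into a single synthesis request
interface SynthesisContext {
  persona: string;          // from phase 1
  optimizedQuery: string;   // from phase 1
  initialResponse: string;  // from phase 2
  reviewFeedback: string;   // from phase 3
}

function buildSynthesisPrompt(ctx: SynthesisContext): { system: string; user: string } {
  return {
    system: ctx.persona,
    user: [
      `Original question: ${ctx.optimizedQuery}`,
      `Your first draft: ${ctx.initialResponse}`,
      `Reviewer feedback: ${ctx.reviewFeedback}`,
      "Rewrite the draft so it addresses every point of feedback while keeping the same tone.",
    ].join("\n\n"),
  };
}
```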
Detailed Cost Breakdown
RheaDuoBoost uses approximately 2,000 prompt tokens and 1,000 completion tokens per complex request:
Per-Request Cost Calculation
RheaDuoBoost:
- Prompt tokens: 2,000 × $0.02 ÷ 1,000,000 = $0.00004
- Completion tokens: 1,000 × $0.04 ÷ 1,000,000 = $0.00004
- Total per request: $0.00008 (0.008¢)
GPT-4.1:
- Prompt tokens: 2,000 × $2 ÷ 1,000,000 = $0.004
- Completion tokens: 1,000 × $8 ÷ 1,000,000 = $0.008
- Total per request: $0.012 (1.2¢)
Cost Reduction: 99.3% savings per request
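The arithmetic above can be verified in a few lines; the `costUSD` helper is illustrative, not part of the RheaDuoBoost API:

```typescript
// Cost in USD for a token count at a per-million-token price
function costUSD(tokens: number, pricePerMTok: number): number {
  return (tokens * pricePerMTok) / 1_000_000;
}

const duoBoost = costUSD(2_000, 0.02) + costUSD(1_000, 0.04); // $0.00008
const gpt41 = costUSD(2_000, 2) + costUSD(1_000, 8);          // $0.012
const savings = 1 - duoBoost / gpt41;                          // ≈ 0.993, i.e. 99.3%
```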
The Four Stages
- Inception Stage: Persona generation and prompt optimization
- Primary Generation Stage: Main response creation
- Review Stage: Quality analysis and feedback
- Synthesis Stage: Final response polishing
All stages combined use roughly 2,000 prompt tokens and generate 1,000 completion tokens.
Provider System
Supported Providers
RheaDuoBoost integrates with multiple AI providers through a unified interface:
| Provider | Models Used | Strengths | Use Cases |
|---|---|---|---|
| OpenRouter | Mistral-7B, Gemma-2-9B | Cost-effective, diverse models | Primary generation, review |
| OpenAI | GPT-4.1, GPT-4o | Advanced reasoning | Fallback, comparison |
| Anthropic | Claude Sonnet, Haiku | Safety, instruction following | Review, critique |
| Ollama | Local models | Privacy, no API costs | Development, private data |
AIInteraction Contract
All provider communication uses the standardized AIInteraction format:
interface AIInteraction {
  interaction_type: string      // 'chat', 'completion', 'embedding'
  content: Record<string, any>  // {'messages': [...]} or {'prompt': "..."}
  model_name: string            // Specific model identifier
  expect_json?: boolean         // Structured output flag
  agent_id?: string             // Agent identification
  kwargs?: Record<string, any>  // Provider options
}
Registry and Configuration
The AIProviderRegistry manages:
- Provider Discovery: Dynamic loading of available providers
- Model Catalogs: Comprehensive model information and capabilities
- Cost Calculation: Centralized token pricing and cost tracking
- Configuration Management: Environment-based and file-based settings
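A minimal registry sketch, assuming a map-based design (the class and method names are illustrative, not the production code):

```typescript
// Minimal sketch of a provider registry with centralized cost tracking
type ProviderFn = (prompt: string) => Promise<string>;
interface ModelPrice { inputPerMTok: number; outputPerMTok: number }

class AIProviderRegistry {
  private providers = new Map<string, ProviderFn>();
  private prices = new Map<string, ModelPrice>();

  register(name: string, fn: ProviderFn, price: ModelPrice): void {
    this.providers.set(name, fn);
    this.prices.set(name, price);
  }

  // Cost calculation never throws: unknown models simply cost 0.0
  cost(name: string, inputTokens: number, outputTokens: number): number {
    const p = this.prices.get(name);
    if (!p) return 0.0;
    return (inputTokens * p.inputPerMTok + outputTokens * p.outputPerMTok) / 1_000_000;
  }
}
```

Returning 0.0 for unknown models mirrors the resilience rule described later: cost-tracking failures degrade gracefully rather than crash the workflow.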
Configuration Sources
The registry supports multiple configuration sources with clear precedence:
1. Environment Variables (Highest Priority)
export OPENROUTER_API_KEY="sk-..."
export OPENAI_API_KEY="sk-..."
2. Provider JSON Files
{
  "openrouter": {
    "enabled": true,
    "default_model": "mistralai/mistral-7b-instruct",
    "models": {
      "mistralai/mistral-7b-instruct": {
        "description": "Fast, cost-effective generation",
        "costs": { "input_per_mtok": 0.02, "output_per_mtok": 0.04 }
      }
    }
  }
}
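The precedence rule reduces to a nullish-coalescing lookup; `resolveApiKey` and `FileConfig` are hypothetical helpers used for illustration:

```typescript
// Environment variables win over file-based configuration (sketch)
interface FileConfig { apiKey?: string; defaultModel?: string }

function resolveApiKey(
  env: Record<string, string | undefined>,
  fileConfig: FileConfig,
): string | undefined {
  // Highest priority: environment variable; fallback: provider JSON file
  return env["OPENROUTER_API_KEY"] ?? fileConfig.apiKey;
}
```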
Supabase Edge Functions Integration
RheaDuoBoost is implemented as Supabase Edge Functions for seamless integration with our backend:
// Example: RheaDuoBoost Edge Function
import { serve } from "https://deno.land/std@0.168.0/http/server.ts";
import { RheaDuoBoostService } from "./rheaduoboost/service.ts";

serve(async (req) => {
  try {
    const { query, lodgeId, userId } = await req.json();

    // Initialize RheaDuoBoost service
    const rheaService = new RheaDuoBoostService(
      "openrouter/mistralai/mistral-7b-instruct", // Primary model
      "openrouter/google/gemma-2-9b-it"           // Reviewer model
    );

    // Process through 4-phase workflow
    const result = await rheaService.process(query);

    // Store result in database (supabase client assumed to be initialized elsewhere)
    await storeAiResponse(supabase, lodgeId, userId, result);

    return new Response(JSON.stringify({
      response: result.final_response,
      cost: result.metadata.total_cost,
      phases: result.metadata.stages.length
    }), { headers: { 'Content-Type': 'application/json' } });
  } catch (error) {
    return new Response(JSON.stringify({ error: error.message }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' }
    });
  }
});
Metadata Collection and Transparency
Each phase contributes detailed metadata for cost tracking and debugging:
{
"query": "Explain the concept of a Large Language Model in simple terms, for a 13 year old.",
"stages": [
{
"name": "inception_persona",
"model": "openrouter/google/gemma-2-9b-it",
"usage": {
"input_tokens": 182,
"output_tokens": 100,
"total_tokens": 282
},
"cost": 0.0000076
},
{
"name": "inception_prompt_optimization",
"model": "openrouter/google/gemma-2-9b-it",
"usage": {
"input_tokens": 193,
"output_tokens": 89,
"total_tokens": 282
},
"cost": 0.0000075
},
{
"name": "primary_generation",
"model": "openrouter/mistralai/mistral-7b-instruct",
"usage": {
"input_tokens": 197,
"output_tokens": 433,
"total_tokens": 630
},
"cost": 0.0000213
},
{
"name": "review",
"model": "openrouter/google/gemma-2-9b-it",
"usage": {
"input_tokens": 762,
"output_tokens": 89,
"total_tokens": 851
},
"cost": 0.0000188
},
{
"name": "final_synthesis",
"model": "openrouter/mistralai/mistral-7b-instruct",
"usage": {
"input_tokens": 1057,
"output_tokens": 328,
"total_tokens": 1385
},
"cost": 0.0000342
}
],
"total_cost": 0.0000894,
"final_response": "Alright, so imagine you've been reading tons of stories..."
}
Total actual cost from logs: $0.0000894 (0.009¢) vs GPT-4.1 estimated cost of $0.012 (1.2¢)
Error Handling and Resilience
RheaDuoBoost implements graceful degradation across multiple levels:
- Provider Level: API failures logged, alternative providers attempted
- Phase Level: Individual phase failures don't crash entire workflow
- Registry Level: Missing configurations result in warnings, not crashes
- Cost Calculation: Failures return 0.0 instead of exceptions
Fallback Strategy
// Fallback hierarchy for critical phases
const fallbackChain = [
  "openrouter/mistralai/mistral-7b-instruct", // Primary choice
  "openai/gpt-4o-mini",                       // Fallback 1
  "anthropic/claude-haiku"                    // Fallback 2
];

for (const model of fallbackChain) {
  try {
    return await generateResponse(model, prompt);
  } catch (error) {
    console.warn(`Model ${model} failed, trying next...`);
  }
}
// Every model failed: surface a single error instead of silently returning undefined
throw new Error("All fallback models failed");
Cost Comparison
For a typical OurOtters user making 10 AI requests per day:
| Model | Cost per Request | Daily Cost (10 requests) | Monthly Cost | Savings vs GPT-4.1 |
|---|---|---|---|---|
| GPT-4.1 | $0.012 (1.2¢) | $0.12 | $3.60 | - |
| RheaDuoBoost | $0.00008 (0.008¢) | $0.0008 | $0.024 | 99.3% |
Annual savings for 1,000 active users: ~$43,000 vs GPT-4.1 (1,000 users × ~$3.58/month × 12)
Quality Assurance
Despite the dramatic cost savings, RheaDuoBoost maintains high quality through:
- Multi-Model Verification: Each response is checked by multiple models
- Specialized Personas: Tailored AI personalities for specific tasks
- Continuous Feedback Loop: The review stage catches and corrects issues
- Context Preservation: Maintains conversation context across stages
Implementation in OurOtters
RheaDuoBoost powers several key features:
Co-Parenting Assistant
- Provides empathetic, balanced advice
- Understands family dynamics
- Maintains appropriate tone for sensitive topics
Document Analysis
- Extracts key information from legal documents
- Summarizes medical records
- Identifies important dates and obligations
Conflict Resolution
- Suggests compromise solutions
- Maintains neutrality
- Focuses on children's best interests
Expense Categorization
- Automatically categorizes receipts
- Suggests fair splits
- Tracks patterns over time
Technical Implementation
Inside the Edge Function, the four stages run in sequence:
// Example: the 4-stage RheaDuoBoost pipeline
export async function processWithRheaDuoBoost(query: string) {
  // Stage 1: Inception
  const inception = await runInception(query);

  // Stage 2: Primary Generation
  const primary = await generatePrimary(
    inception.persona,
    inception.optimizedPrompt
  );

  // Stage 3: Review
  const review = await reviewResponse(
    query,
    primary.response
  );

  // Stage 4: Synthesis
  const final = await synthesizeFinal(
    primary.response,
    review.feedback,
    inception.persona
  );

  return {
    response: final.response,
    metadata: {
      stages: [inception, primary, review, final],
      totalCost: calculateCost(),
      processingTime: getProcessingTime()
    }
  };
}
Future Enhancements
Planned Improvements
- Dynamic Model Selection: Choose models based on query complexity
- Caching Layer: Cache common patterns to reduce costs further
- Fine-Tuning: Custom models trained on co-parenting scenarios
- Parallel Processing: Run stages concurrently when possible
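The planned caching layer could start as a simple in-memory memoization keyed by query (a sketch of the idea, not the production design):

```typescript
// Sketch of the planned caching layer: memoize responses by query text
const responseCache = new Map<string, string>();

async function cachedProcess(
  query: string,
  process: (q: string) => Promise<string>,
): Promise<string> {
  const hit = responseCache.get(query);
  if (hit !== undefined) return hit; // cache hit: zero model cost
  const result = await process(query);
  responseCache.set(query, result);
  return result;
}
```

A production version would add an eviction policy, a TTL, and likely a normalized or semantically hashed cache key so near-identical queries also hit.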
Scaling Strategy
- Load Balancing: Distribute requests across multiple providers
- Regional Deployment: Deploy closer to users for lower latency
- Batch Processing: Group similar requests for efficiency
Conclusion
RheaDuoBoost represents a paradigm shift in AI economics. By intelligently orchestrating multiple smaller models, we achieve:
- >99% cost reduction compared to premium models
- Maintained quality through multi-stage verification
- Scalability to serve thousands of users affordably
- Flexibility to adapt to different use cases
This technology enables OurOtters to democratize access to advanced AI features, ensuring that all families can benefit from AI-powered co-parenting assistance regardless of their subscription tier.
Technical Deep Dive
For developers interested in implementing similar systems, key considerations include:
- Model Selection: Choose complementary models with different strengths
- Prompt Engineering: Invest in sophisticated prompt templates
- Error Handling: Implement fallbacks for each stage
- Monitoring: Track quality metrics across all stages
- Cost Tracking: Monitor token usage per stage
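Per-stage cost tracking amounts to aggregating the usage metadata shown earlier; this sketch replays the logged run (the `StageUsage` shape is illustrative):

```typescript
// Per-stage cost tracking: aggregate the usage metadata emitted by each phase
interface StageUsage { name: string; inputTokens: number; outputTokens: number; cost: number }

function totalCost(stages: StageUsage[]): number {
  return stages.reduce((sum, s) => sum + s.cost, 0);
}

// The five stages from the logged run shown earlier
const run: StageUsage[] = [
  { name: "inception_persona", inputTokens: 182, outputTokens: 100, cost: 0.0000076 },
  { name: "inception_prompt_optimization", inputTokens: 193, outputTokens: 89, cost: 0.0000075 },
  { name: "primary_generation", inputTokens: 197, outputTokens: 433, cost: 0.0000213 },
  { name: "review", inputTokens: 762, outputTokens: 89, cost: 0.0000188 },
  { name: "final_synthesis", inputTokens: 1057, outputTokens: 328, cost: 0.0000342 },
];
// totalCost(run) matches the logged total_cost of $0.0000894
```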
RheaDuoBoost proves that innovative architecture can overcome the cost barriers of AI, making advanced features accessible to everyone.