07. AI Platform
RheaDuoBoost

RheaDuoBoost: Cost-Effective AI Enhancement

RheaDuoBoost is our proprietary AI orchestration system that delivers premium AI quality at >99% lower cost than using premium models directly. Its 4-phase collaborative workflow combines multiple smaller models to achieve results comparable to GPT-4.1, cutting costs dramatically through intelligent model orchestration.

Overview

RheaDuoBoost is Rhea AI's flagship orchestration system. By pairing a primary model with a reviewer model, it creates a systematic 4-phase process for generating, reviewing, and refining AI responses.

Architecture Overview

RheaDuoBoost follows a layered architecture with clear separation of concerns.

The Cost Challenge

Traditional AI pricing makes it difficult to offer advanced AI features at scale:

  • GPT-4.1 Input: $2 per million tokens
  • GPT-4.1 Output: $8 per million tokens
  • RheaDuoBoost Input: $0.02 per million tokens (100x cheaper)
  • RheaDuoBoost Output: $0.04 per million tokens (200x cheaper)

For a typical co-parenting app with thousands of daily users, this difference is dramatic:

  • Traditional approach: $500-1,000+ per day in AI costs
  • RheaDuoBoost approach: $5-10 per day in AI costs

The 4-Phase Workflow

RheaDuoBoost implements a systematic 4-phase process designed to maximize response quality through multi-model collaboration:

Phase 1: Inception 🎯

Purpose: Optimize the query and generate the ideal AI persona

Process:

  1. Persona Generation: Reviewer model analyzes the query to create optimal system prompt
  2. Query Optimization: Reviewer model enhances user query for better AI understanding
  3. Parallel Execution: Both tasks run concurrently for efficiency

Example from our logs:

Original Query: "Explain the concept of a Large Language Model in simple terms, for a 13 year old."

Generated Persona: "You are Bard, an AI assistant who specializes in explaining 
complex topics like artificial intelligence in a way that makes sense to teenagers. 
Use simple language, relatable examples, and be patient in explaining the concept."

Optimized Query: "Imagine you had a super smart computer program that could read 
and understand any text you gave it, just like a human. This program is called 
a Large Language Model, or LLM. Can you explain how LLMs work, using examples 
like how they learn from books and websites, and what they can do, like write 
stories or answer questions, in a way that a 13-year-old could easily understand?"
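The parallel execution in step 3 can be sketched as follows. This is a minimal illustration, not the production API: `runInception` and `callReviewer` are hypothetical names, and the prompt wording is invented for the example. The key point is that the two reviewer-model tasks are independent, so they run concurrently via Promise.all.

```typescript
// Sketch of Phase 1: persona generation and query optimization run in
// parallel. `callReviewer` stands in for a call to the reviewer model.
async function runInception(
  query: string,
  callReviewer: (prompt: string) => Promise<string>
): Promise<{ persona: string; optimizedQuery: string }> {
  const [persona, optimizedQuery] = await Promise.all([
    callReviewer(`Generate the ideal system-prompt persona for: ${query}`),
    callReviewer(`Rewrite this query for better AI understanding: ${query}`),
  ]);
  return { persona, optimizedQuery };
}
```

Because both calls await together, the phase takes roughly the time of the slower call rather than the sum of both.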

Phase 2: Primary Generation 🚀

Purpose: Generate initial response using optimized inputs

Process:

  1. Primary model (Mistral-7B) receives the generated persona as system prompt
  2. Uses the optimized query instead of original user input
  3. Generates comprehensive initial response
  4. Captures detailed usage metadata

Phase 3: Review 🔍

Purpose: Critical analysis by independent reviewer model (Gemma-2-9B)

Process:

  1. Reviewer model evaluates the primary response objectively
  2. Focuses on 2-3 most important improvements
  3. Provides specific, actionable feedback
  4. Maintains quality through separate model perspective

Example Review from our logs:

Review of LLM explanation:
• Clarify "reading and understanding": explicitly state that LLMs analyze 
  text patterns, not understand meaning like humans
• Explain "training data": briefly define what constitutes training data 
  (books, websites, etc.) for better comprehension
• Condense examples: streamline into one concise example

Phase 4: Synthesis ⚡

Purpose: Generate final improved response incorporating review feedback

Process:

  1. Primary model receives the complete context (persona + query + initial response + feedback)
  2. Generates a final synthesized response that addresses every review point
  3. Preserves coherence while incorporating the feedback
  4. Delivers a polished, high-quality result
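The context assembly in step 1 might look like the sketch below. The function name and instruction wording are illustrative, not the production prompt; what matters is that persona, optimized query, initial response, and reviewer feedback all reach the primary model as one message sequence.

```typescript
// Sketch of Phase 4 context assembly: the full history plus the review
// feedback is folded into a single chat message sequence.
function buildSynthesisMessages(
  persona: string,
  optimizedQuery: string,
  initialResponse: string,
  feedback: string
): { role: string; content: string }[] {
  return [
    { role: "system", content: persona },
    { role: "user", content: optimizedQuery },
    { role: "assistant", content: initialResponse },
    {
      role: "user",
      content:
        `A reviewer suggested these improvements:\n${feedback}\n` +
        `Rewrite your answer to address each point while staying coherent.`,
    },
  ];
}
```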

Detailed Cost Breakdown

RheaDuoBoost uses approximately 2,000 prompt tokens and 1,000 completion tokens per complex request:

Per-Request Cost Calculation

RheaDuoBoost:

  • Prompt tokens: 2,000 × $0.02 ÷ 1,000,000 = $0.00004
  • Completion tokens: 1,000 × $0.04 ÷ 1,000,000 = $0.00004
  • Total per request: $0.00008 (0.008¢)

GPT-4.1:

  • Prompt tokens: 2,000 × $2 ÷ 1,000,000 = $0.004
  • Completion tokens: 1,000 × $8 ÷ 1,000,000 = $0.008
  • Total per request: $0.012 (1.2¢)

Cost Reduction: 99.3% savings per request
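The arithmetic above generalizes to a one-line helper (prices in dollars per million tokens; the function name is ours, for illustration):

```typescript
// Per-request cost: tokens × price-per-million-tokens ÷ 1,000,000
function requestCost(
  promptTokens: number,
  completionTokens: number,
  inputPerMTok: number,
  outputPerMTok: number
): number {
  return (
    (promptTokens * inputPerMTok) / 1_000_000 +
    (completionTokens * outputPerMTok) / 1_000_000
  );
}

requestCost(2000, 1000, 0.02, 0.04); // RheaDuoBoost: ≈ $0.00008
requestCost(2000, 1000, 2, 8);       // GPT-4.1:      ≈ $0.012
```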

The Four Stages

  1. Inception Stage: Persona generation and prompt optimization
  2. Primary Generation Stage: Main response creation
  3. Review Stage: Quality analysis and feedback
  4. Synthesis Stage: Final response polishing

All stages combined use roughly 2,000 prompt tokens and generate 1,000 completion tokens.

Provider System

Supported Providers

RheaDuoBoost integrates with multiple AI providers through a unified interface:

  • OpenRouter (Mistral-7B, Gemma-2-9B): cost-effective, diverse models; used for primary generation and review
  • OpenAI (GPT-4.1, GPT-4o): advanced reasoning; used for fallback and comparison
  • Anthropic (Claude Sonnet, Haiku): safety and instruction following; used for review and critique
  • Ollama (local models): privacy, no API costs; used for development and private data

AIInteraction Contract

All provider communication uses the standardized AIInteraction format:

interface AIInteraction {
  interaction_type: string      // 'chat', 'completion', 'embedding'
  content: Record<string, any>  // {'messages': [...]} or {'prompt': "..."}
  model_name: string           // Specific model identifier
  expect_json?: boolean        // Structured output flag
  agent_id?: string           // Agent identification
  kwargs?: Record<string, any> // Provider options
}
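For instance, a chat request to the primary model would be shaped like this (the message contents and `kwargs` values are illustrative):

```typescript
// A chat-style payload matching the AIInteraction contract above
const chatInteraction = {
  interaction_type: "chat",
  content: {
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Explain LLMs in simple terms." },
    ],
  },
  model_name: "openrouter/mistralai/mistral-7b-instruct",
  expect_json: false,
  kwargs: { temperature: 0.7 },
};
```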

Registry and Configuration

The AIProviderRegistry manages:

  • Provider Discovery: Dynamic loading of available providers
  • Model Catalogs: Comprehensive model information and capabilities
  • Cost Calculation: Centralized token pricing and cost tracking
  • Configuration Management: Environment-based and file-based settings

Configuration Sources

The registry supports multiple configuration sources with clear precedence:

  1. Environment Variables (Highest Priority)

    export OPENROUTER_API_KEY="sk-..."
    export OPENAI_API_KEY="sk-..."
  2. Provider JSON Files

    {
      "openrouter": {
        "enabled": true,
        "default_model": "mistralai/mistral-7b-instruct",
        "models": {
          "mistralai/mistral-7b-instruct": {
            "description": "Fast, cost-effective generation",
            "costs": {
              "input_per_mtok": 0.02,
              "output_per_mtok": 0.04
            }
          }
        }
      }
    }
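The precedence rule reduces to a null-coalescing lookup. In this sketch `env` is passed in explicitly (in the Edge Functions it would come from `Deno.env`), and the `api_key` field on the provider config is an assumed addition for the example:

```typescript
// Environment variables (highest priority) override provider JSON files
function resolveApiKey(
  provider: string,
  env: Record<string, string | undefined>,
  fileConfig: Record<string, { api_key?: string }>
): string | undefined {
  return env[`${provider.toUpperCase()}_API_KEY`] ?? fileConfig[provider]?.api_key;
}
```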

Supabase Edge Functions Integration

RheaDuoBoost is implemented as Supabase Edge Functions for seamless integration with our backend:

// Example: RheaDuoBoost Edge Function
import { serve } from "https://deno.land/std@0.168.0/http/server.ts";
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";
import { RheaDuoBoostService } from "./rheaduoboost/service.ts";
 
// Supabase client for persisting results (service-role key, server-side only)
const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
);
 
serve(async (req) => {
  try {
    const { query, lodgeId, userId } = await req.json();
    
    // Initialize RheaDuoBoost service
    const rheaService = new RheaDuoBoostService(
      "openrouter/mistralai/mistral-7b-instruct",  // Primary model
      "openrouter/google/gemma-2-9b-it"            // Reviewer model
    );
    
    // Process through 4-phase workflow
    const result = await rheaService.process(query);
    
    // Store result in database (app-level helper defined elsewhere)
    await storeAiResponse(supabase, lodgeId, userId, result);
    
    return new Response(JSON.stringify({
      response: result.final_response,
      cost: result.metadata.total_cost,
      phases: result.metadata.stages.length
    }), {
      headers: { 'Content-Type': 'application/json' }
    });
    
  } catch (error) {
    return new Response(JSON.stringify({ error: error.message }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' }
    });
  }
});

Metadata Collection and Transparency

Each phase contributes detailed metadata for cost tracking and debugging:

{
  "query": "Explain the concept of a Large Language Model in simple terms, for a 13 year old.",
  "stages": [
    {
      "name": "inception_persona",
      "model": "openrouter/google/gemma-2-9b-it", 
      "usage": {
        "input_tokens": 182,
        "output_tokens": 100,
        "total_tokens": 282
      },
      "cost": 0.0000076
    },
    {
      "name": "inception_prompt_optimization",
      "model": "openrouter/google/gemma-2-9b-it",
      "usage": {
        "input_tokens": 193, 
        "output_tokens": 89,
        "total_tokens": 282
      },
      "cost": 0.0000075
    },
    {
      "name": "primary_generation", 
      "model": "openrouter/mistralai/mistral-7b-instruct",
      "usage": {
        "input_tokens": 197,
        "output_tokens": 433, 
        "total_tokens": 630
      },
      "cost": 0.0000213
    },
    {
      "name": "review",
      "model": "openrouter/google/gemma-2-9b-it",
      "usage": {
        "input_tokens": 762,
        "output_tokens": 89,
        "total_tokens": 851  
      },
      "cost": 0.0000188
    },
    {
      "name": "final_synthesis",
      "model": "openrouter/mistralai/mistral-7b-instruct", 
      "usage": {
        "input_tokens": 1057,
        "output_tokens": 328,
        "total_tokens": 1385
      },
      "cost": 0.0000342
    }
  ],
  "total_cost": 0.0000894,
  "final_response": "Alright, so imagine you've been reading tons of stories..."
}

Total actual cost from logs: $0.0000894 (0.009¢) vs GPT-4.1 estimated cost of $0.012 (1.2¢)
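The `total_cost` field is simply the sum of the per-stage `cost` entries; a reduce over the stages array reproduces the logged figure:

```typescript
// Total workflow cost = sum of per-stage costs
function totalCost(stages: { cost: number }[]): number {
  return stages.reduce((sum, s) => sum + s.cost, 0);
}

totalCost([
  { cost: 0.0000076 }, // inception_persona
  { cost: 0.0000075 }, // inception_prompt_optimization
  { cost: 0.0000213 }, // primary_generation
  { cost: 0.0000188 }, // review
  { cost: 0.0000342 }, // final_synthesis
]); // ≈ $0.0000894
```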

Error Handling and Resilience

RheaDuoBoost implements graceful degradation across multiple levels:

  1. Provider Level: API failures logged, alternative providers attempted
  2. Phase Level: Individual phase failures don't crash entire workflow
  3. Registry Level: Missing configurations result in warnings, not crashes
  4. Cost Calculation: Failures return 0.0 instead of exceptions

Fallback Strategy

// Fallback hierarchy for critical phases
const fallbackChain = [
  "openrouter/mistralai/mistral-7b-instruct",  // Primary choice
  "openai/gpt-4o-mini",                        // Fallback 1
  "anthropic/claude-haiku"                     // Fallback 2
];
 
for (const model of fallbackChain) {
  try {
    return await generateResponse(model, prompt);
  } catch (error) {
    console.warn(`Model ${model} failed, trying next...`);
  }
}
 
// Every model failed: surface the error instead of silently returning undefined
throw new Error("All fallback models failed");

Cost Comparison

For a typical OurOtters user making 10 AI requests per day:

  • GPT-4.1: $0.012 per request (1.2¢), $0.12 per day, $3.60 per month
  • RheaDuoBoost: $0.00008 per request (0.008¢), $0.0008 per day, $0.024 per month, a 99.3% savings vs GPT-4.1

Annual savings for 1,000 active users: ~$43,000 vs GPT-4.1 (roughly $3,600 vs $24 per month across the user base)

Quality Assurance

Despite the dramatic cost savings, RheaDuoBoost maintains high quality through:

  1. Multi-Model Verification: Each response is checked by multiple models
  2. Specialized Personas: Tailored AI personalities for specific tasks
  3. Continuous Feedback Loop: The review stage catches and corrects issues
  4. Context Preservation: Maintains conversation context across stages

Implementation in OurOtters

RheaDuoBoost powers several key features:

Co-Parenting Assistant

  • Provides empathetic, balanced advice
  • Understands family dynamics
  • Maintains appropriate tone for sensitive topics

Document Analysis

  • Extracts key information from legal documents
  • Summarizes medical records
  • Identifies important dates and obligations

Conflict Resolution

  • Suggests compromise solutions
  • Maintains neutrality
  • Focuses on children's best interests

Expense Categorization

  • Automatically categorizes receipts
  • Suggests fair splits
  • Tracks patterns over time

Technical Implementation

Inside the Edge Function, the 4-phase workflow maps onto four sequential calls:

// Example: RheaDuoBoost Edge Function
export async function processWithRheaDuoBoost(query: string) {
  // Stage 1: Inception
  const inception = await runInception(query);
  
  // Stage 2: Primary Generation
  const primary = await generatePrimary(
    inception.persona,
    inception.optimizedPrompt
  );
  
  // Stage 3: Review
  const review = await reviewResponse(
    query,
    primary.response
  );
  
  // Stage 4: Synthesis
  const final = await synthesizeFinal(
    primary.response,
    review.feedback,
    inception.persona
  );
  
  return {
    response: final.response,
    metadata: {
      stages: [inception, primary, review, final],
      totalCost: calculateCost(),
      processingTime: getProcessingTime()
    }
  };
}

Future Enhancements

Planned Improvements

  • Dynamic Model Selection: Choose models based on query complexity
  • Caching Layer: Cache common patterns to reduce costs further
  • Fine-Tuning: Custom models trained on co-parenting scenarios
  • Parallel Processing: Run stages concurrently when possible
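As a rough sketch of what the planned caching layer could look like (purely illustrative; nothing here is implemented yet), a map keyed on the normalized query would let repeated questions bypass the 4-phase workflow entirely:

```typescript
// Hypothetical cache: repeated queries skip the full workflow
const responseCache = new Map<string, string>();

async function cachedProcess(
  query: string,
  runWorkflow: (q: string) => Promise<string>
): Promise<string> {
  const key = query.trim().toLowerCase();    // naive normalization
  const hit = responseCache.get(key);
  if (hit !== undefined) return hit;         // cache hit: zero token cost
  const response = await runWorkflow(query); // cache miss: full 4-phase run
  responseCache.set(key, response);
  return response;
}
```

A production version would need an eviction policy and probably semantic (embedding-based) rather than exact-string matching.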

Scaling Strategy

  • Load Balancing: Distribute requests across multiple providers
  • Regional Deployment: Deploy closer to users for lower latency
  • Batch Processing: Group similar requests for efficiency

Conclusion

RheaDuoBoost represents a paradigm shift in AI economics. By intelligently orchestrating multiple smaller models, we achieve:

  • >99% cost reduction compared to premium models
  • Maintained quality through multi-stage verification
  • Scalability to serve thousands of users affordably
  • Flexibility to adapt to different use cases

This technology enables OurOtters to democratize access to advanced AI features, ensuring that all families can benefit from AI-powered co-parenting assistance regardless of their subscription tier.

Technical Deep Dive

For developers interested in implementing similar systems, key considerations include:

  1. Model Selection: Choose complementary models with different strengths
  2. Prompt Engineering: Invest in sophisticated prompt templates
  3. Error Handling: Implement fallbacks for each stage
  4. Monitoring: Track quality metrics across all stages
  5. Cost Tracking: Monitor token usage per stage

RheaDuoBoost proves that innovative architecture can overcome the cost barriers of AI, making advanced features accessible to everyone.