Model Development
Our AI Architecture Strategy
We've learned the hard way that most AI startups burn through money by running expensive cloud models for users who never pay. OurOtters takes a different approach: we use offline models that run on users' phones for our free tier, and only use expensive cloud AI for paying customers. This isn't just about saving money - it's about building a sustainable business that can actually serve millions of co-parents without going bankrupt.
Offline Models: The Foundation of Our Free Tier
Why Offline Models Matter
Here's the brutal math: if 95% of our users stay free and we pay OpenAI $0.02 per AI request, we'd lose money on every single interaction. Even if a free user only makes 10 AI requests per month, that's $0.20 in costs with zero revenue. Scale that to 100,000 free users and we're bleeding $20,000 monthly just on AI costs for non-paying users.
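A quick back-of-the-envelope version of that calculation, using the same assumed numbers from the paragraph above:

```python
# Back-of-the-envelope cost of serving free users with a metered cloud API.
# All numbers are the assumptions stated above, not measured costs.
cost_per_request = 0.02      # USD per cloud AI request
requests_per_user = 10       # monthly requests from a typical free user
free_users = 100_000

monthly_burn = cost_per_request * requests_per_user * free_users
print(f"Monthly AI spend on free users: ${monthly_burn:,.0f}")  # $20,000
```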
Offline models change the game completely. Once a user downloads the model to their phone, every AI interaction costs us literally nothing. The user's device does all the work, using their electricity and their processing power. It's the difference between a sustainable business and a money pit.
Gemma 3n: Our Primary Offline Model
Google released Gemma 3n in June 2025, and it's exactly what we needed. Think of it as a smaller, mobile-optimized version of the AI models you're used to, but one that can actually run on a phone without draining the battery in an hour.
The technical specs are impressive: it only needs 2-3GB of memory (most modern phones have 8GB+), and it can handle text, images, audio, and video all in one model. That means our OtterSnap feature can analyze a photo of a report card, extract the text, understand what it means, and suggest appropriate responses - all without sending anything to the cloud.
What makes Gemma 3n special is its mobile-first design. Google built it specifically for phones and tablets, not servers. The MatFormer architecture is like having a series of nested models - if the phone is running hot or low on battery, it can switch to a smaller, faster version of itself. The Per-Layer Embeddings feature is clever too: a large share of the model's parameters can stay in ordinary memory and be handled by the phone's main processor, so only the core weights need to fit into the limited memory of the graphics or AI accelerator.
Implementation Strategy
We're creating two different AI experiences that feel cohesive but use completely different technology under the hood.
Free users get Gemma 3n running locally on their phones. When they use OtterSnap to photograph a school permission slip, their phone processes the image, extracts the text, understands it's a deadline that needs to go on the calendar, and suggests the appropriate action. When they need help with a difficult co-parenting conversation, OtterChat provides basic suggestions using patterns learned from thousands of successful family communications. The AI feels smart and helpful, but it never sends their private family information anywhere.
Paid users get the premium experience with cloud-based models like GPT-4 and Claude. These models are dramatically more sophisticated - they can analyze complex legal documents, provide nuanced communication advice for difficult situations, predict scheduling conflicts weeks in advance, and handle multiple languages with cultural context. The trade-off is that some data gets processed in the cloud, but we use enterprise-grade encryption and never store personal information.
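A minimal sketch of how that tier split could look as routing logic; the class and function names here are illustrative placeholders, not our production API:

```python
# Tier-based routing idea: free users stay on-device, paid users may be
# routed to a cloud model. Names are placeholders for illustration only.
from dataclasses import dataclass

@dataclass
class User:
    tier: str           # "free" or "paid"
    cloud_opt_in: bool  # paid users can still choose to stay fully on-device

def route_request(user: User, task: str) -> str:
    """Decide where an AI request should run."""
    if user.tier == "paid" and user.cloud_opt_in:
        return "cloud"      # e.g. GPT-4 / Claude behind our API gateway
    return "on_device"      # Gemma 3n running locally; nothing leaves the phone

# Example: OtterSnap OCR for a free user never leaves the device.
print(route_request(User(tier="free", cloud_opt_in=False), "ottersnap_ocr"))  # on_device
```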
Model Development Lifecycle
Phase 1: Model Selection and Evaluation
Choosing the right AI models is like picking tools for different jobs. We don't just look at how smart the model is - we need to consider whether it can actually run on a three-year-old Android phone without melting the processor.
Performance matters, but so does efficiency. A model that's brilliant but takes 30 seconds to respond to a simple question isn't useful for busy parents. Cost is critical too - both the upfront cost of licensing or training the model, and the ongoing cost of running it millions of times. Privacy capabilities determine whether we can keep sensitive family data on-device, which is increasingly important to parents. And multimodal support means one model can handle photos, voice messages, and text instead of needing three separate systems.
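One way to make that trade-off concrete is a weighted scoring sheet. The criteria mirror the paragraph above; the weights and scores below are purely illustrative examples, not our actual evaluation numbers:

```python
# Illustrative scoring sheet for comparing candidate models.
criteria_weights = {
    "quality": 0.30,      # accuracy on co-parenting tasks
    "latency": 0.20,      # responsiveness on a mid-range phone
    "cost": 0.20,         # licensing plus per-inference cost
    "privacy": 0.20,      # can it run fully on-device?
    "multimodal": 0.10,   # text, image, audio in one model
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine 0-10 scores per criterion into one weighted number."""
    return sum(criteria_weights[c] * scores.get(c, 0.0) for c in criteria_weights)

candidates = {
    "gemma-3n (on-device)": {"quality": 7, "latency": 8, "cost": 10, "privacy": 10, "multimodal": 9},
    "gpt-4 (cloud)":        {"quality": 10, "latency": 6, "cost": 3, "privacy": 4, "multimodal": 8},
}
for name, scores in candidates.items():
    print(name, round(weighted_score(scores), 2))
```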
Phase 2: Custom Fine-Tuning
General AI models know a lot about everything, but they don't know much about the specific challenges of co-parenting. We're teaching our models to understand the nuances of family communication, recognize legal document types that matter to separated parents, understand child development stages, and suggest ways to de-escalate conflicts before they spiral.
Our training data comes from several sources, all carefully anonymized. Users can opt in to share their interaction patterns (never the actual content, just the patterns). We use publicly available co-parenting resources, family law templates, and academic research on family psychology. For edge cases - like what to do when a co-parent misses pickup or how to handle emergency custody changes - we create simulated scenarios based on expert guidance.
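As a rough sketch, fine-tuning along these lines could use parameter-efficient adapters (LoRA) on top of an open base model. The base model name, dataset file, and hyperparameters below are placeholders, not our actual training setup:

```python
# Parameter-efficient fine-tuning sketch: train small LoRA adapters on
# anonymized, opt-in examples instead of updating all model weights.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

base = "google/gemma-2-2b"  # placeholder base model, not necessarily what we ship
tokenizer = AutoTokenizer.from_pretrained(base)
model = get_peft_model(
    AutoModelForCausalLM.from_pretrained(base),
    LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),  # small trainable adapters only
)

# One JSON object with a "text" field per line (anonymized training examples).
data = load_dataset("json", data_files="coparenting_examples.jsonl")["train"]
data = data.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", per_device_train_batch_size=4, num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # labels = inputs
)
trainer.train()
```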
Phase 3: On-Device Optimization
We use several techniques to make large AI models fit on phones. Quantization is like compressing a high-resolution photo - you lose some detail, but the file becomes much smaller; in practice it means storing weights as 8-bit integers instead of 32-bit floating-point numbers. Pruning removes parts of the model that don't contribute much to accuracy, like editing out unnecessary scenes from a movie. Knowledge distillation teaches a smaller model to mimic a larger one, capturing most of the intelligence in a fraction of the size.
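For illustration, here is the standard knowledge-distillation recipe in outline: a small student model is trained to match a larger teacher's softened output distribution while still learning the true labels. The models and data batches are stand-ins:

```python
# Knowledge distillation sketch: soft targets from the teacher plus hard
# targets from the real labels. Temperature T and mixing weight alpha are
# typical defaults, not tuned values.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: still learn the ground-truth labels directly.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

def train_step(student, teacher, batch, optimizer):
    inputs, labels = batch
    with torch.no_grad():
        teacher_logits = teacher(inputs)   # teacher is frozen
    student_logits = student(inputs)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```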
Each platform needs different optimizations. Apple's phones have specialized AI chips that work best with Core ML, while Android devices prefer TensorFlow Lite. For web browsers, we use WebAssembly to run AI models at near-native speed without requiring users to install anything.
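A minimal example of the Android-side export path, assuming a TensorFlow SavedModel as the starting point (the directory name is a placeholder; the iOS path would use Apple's coremltools in a similar spirit):

```python
# Convert a trained model to TensorFlow Lite with post-training quantization
# so it can run on-device on Android.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enables weight quantization
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```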
Phase 4: Deployment and Monitoring
We roll out new models gradually, testing them with small groups of users first. Different phones perform differently, so we monitor how the model runs on everything from flagship iPhones to budget Android devices. User feedback tells us when the AI is being helpful versus annoying, and we continuously improve based on real usage patterns rather than theoretical benchmarks.
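One common way to implement that gradual rollout is stable hash-based bucketing, sketched below; the function names and the 5% example are illustrative, not our actual deployment tooling:

```python
# Gradual rollout gate: each user falls into a stable 0-99 bucket per model
# version, and the new model is only enabled below the rollout percentage.
import hashlib

def rollout_bucket(user_id: str, model_version: str) -> int:
    """Stable 0-99 bucket for a (user, model version) pair."""
    digest = hashlib.sha256(f"{user_id}:{model_version}".encode()).hexdigest()
    return int(digest, 16) % 100

def new_model_enabled(user_id: str, model_version: str, rollout_percent: int) -> bool:
    return rollout_bucket(user_id, model_version) < rollout_percent

# Example: a 5% canary rollout of a hypothetical new model build.
print(new_model_enabled("user-123", "gemma3n-v2", 5))
```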
Privacy and Security
On-Device Benefits
The privacy benefits of offline AI are huge for families. When you photograph your child's medical records or record a voice note about a custody issue, that information never leaves your phone for basic AI processing. There's no risk of a data breach exposing your family's private information because the data never goes anywhere.
This approach makes regulatory compliance much simpler. HIPAA compliance is straightforward when health information stays on the patient's device. GDPR compliance is easier when you're not collecting and storing personal data on servers. And we avoid complex international data transfer regulations because the data doesn't transfer anywhere.
Hybrid Security Model
For free users, security is built into the architecture. Their data is encrypted on their device, processed locally, and never transmitted for basic AI features. We collect minimal information - just enough to sync calendars and messages between co-parents.
Paid users get enterprise-level security when using cloud features. All data is encrypted in transit and at rest, we maintain audit trails of AI interactions for legal compliance, and we use advanced threat detection to prevent unauthorized access.
Technical Challenges and Solutions
Challenge 1: Model Size vs. Performance
Problem: Balancing model capability with mobile device constraints
Solution: Gemma 3n's MatFormer architecture allows dynamic sizing. We can deploy the E4B model but run E2B inference when battery or thermal constraints require it.
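Illustrative decision logic for that switch; the thresholds and device-state inputs are our own assumptions, not part of Gemma 3n's API:

```python
# Pick the full E4B configuration when the device has headroom, fall back to
# the smaller nested E2B configuration under battery or thermal pressure.
def choose_model_config(battery_percent: float, thermal_state: str, low_power_mode: bool) -> str:
    if low_power_mode or battery_percent < 20:
        return "E2B"
    if thermal_state in ("serious", "critical"):
        return "E2B"
    return "E4B"

print(choose_model_config(battery_percent=85, thermal_state="nominal", low_power_mode=False))  # E4B
print(choose_model_config(battery_percent=15, thermal_state="nominal", low_power_mode=False))  # E2B
```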
Challenge 2: Cold Start Performance
Problem: Initial model loading time on app launch
Solution: Background pre-loading during app idle time, progressive loading of model components, and caching frequently used model states.
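A sketch of the pre-loading idea, assuming a hypothetical load function for the on-device runtime; names are placeholders:

```python
# Kick off model loading in the background at app launch and cache the result,
# so the first real request only waits for whatever loading is still outstanding.
import threading

class ModelCache:
    def __init__(self, load_fn):
        self._load_fn = load_fn
        self._model = None
        self._ready = threading.Event()

    def preload_in_background(self):
        threading.Thread(target=self._load, daemon=True).start()

    def _load(self):
        self._model = self._load_fn()   # e.g. memory-map the on-device weights
        self._ready.set()

    def get(self, timeout=None):
        self._ready.wait(timeout)       # first caller blocks only for the remaining load time
        return self._model
```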
Challenge 3: Keeping Models Updated
Problem: Updating offline models without breaking user experience
Solution: Delta updates for model improvements, backward compatibility guarantees, and seamless model swapping during app updates.
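A simplified sketch of the verify-then-swap part of that flow; the paths are placeholders, and the "delta" here is a naive full-file copy rather than a real binary diff:

```python
# Safe model update: rebuild the new file, verify its checksum, then swap it
# in atomically so a failed update never leaves the user with a broken model.
import hashlib, os, shutil

def apply_model_update(current_path: str, delta_path: str, expected_sha256: str) -> bool:
    new_path = current_path + ".new"
    shutil.copy(delta_path, new_path)           # real code would apply a binary diff here

    with open(new_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != expected_sha256:
        os.remove(new_path)                     # verification failed: keep the old model
        return False

    os.replace(new_path, current_path)          # atomic swap on the same filesystem
    return True
```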
Future Roadmap
Short Term (6-12 months)
- Gemma 3n integration and optimization
- Basic multimodal features for free users
- Cloud model integration for paid features
Medium Term (12-24 months)
- Custom fine-tuned models for co-parenting scenarios
- Advanced on-device capabilities
- Real-time model switching based on device capabilities
Long Term (24+ months)
- Federated learning across user base
- Personalized models for individual families
- Integration with emerging on-device AI hardware
Success Metrics
Model Performance:
- Accuracy on co-parenting specific tasks
- User satisfaction with AI features
- Conversion from free to paid tiers
Technical Metrics:
- Model inference latency
- Memory usage across devices
- Battery impact on mobile devices
Business Metrics:
- Cost per AI interaction
- User engagement with AI features
- Premium feature adoption rates
This hybrid approach ensures we can offer meaningful AI capabilities to all users while maintaining a sustainable business model. Offline models like Gemma 3n aren't just a cost-saving measure - they're a strategic advantage that provides better privacy, reliability, and user experience.