From RFC to Production: What They Don't Tell You About Implementation
An honest take on the gap between beautiful RFC designs and messy production reality, featuring real-world lessons from implementing notification systems at scale
Abstract
RFCs rarely survive contact with production unchanged, and that's not necessarily a problem. By examining notification system implementations, we can learn how elegant designs evolve when they meet organizational constraints, timeline pressures, and unexpected requirements. This exploration reveals patterns that help bridge the gap between theoretical design and practical implementation.
Situation: The Beautiful RFC vs. Production Reality
You know that feeling when you're reading through a beautifully crafted RFC, nodding along to the elegant architecture diagrams, and thinking "This is it, this is the design that will finally work perfectly"? Then six months later you're knee-deep in production issues, the timeline has doubled, and that pristine database schema looks like it went through a blender?
This pattern emerges repeatedly across system implementations. The gap between RFC and production isn't a bug; it's a feature of building complex systems with teams under business pressure. Understanding this gap helps us plan more effectively and set realistic expectations.
Note: The following examples are adapted from multiple notification system implementations across different organizations. While specific details may vary, the patterns and challenges described are representative of common experiences in this domain.
Task: Building a Notification System from RFC to Reality
The task seemed straightforward from the RFC perspective: a comprehensive notification system with clean architecture diagrams, well-planned database schemas, and a phased rollout strategy. The specification looked thorough and the timeline appeared conservative.
It covered rate limiting, deduplication, preference management, and user-experience considerations such as quiet hours. The phased approach seemed reasonable; delivering core infrastructure in 4 weeks felt achievable.
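To make those pre-send checks concrete, here is a minimal Python sketch of a gate that combines deduplication, quiet hours, and per-user rate limiting. The class name, the 22:00-07:00 quiet window, and the five-per-hour limit are assumptions for illustration, and the state lives in memory rather than in the shared store a real deployment would need.

```python
from dataclasses import dataclass, field
from datetime import datetime, time


@dataclass
class NotificationGate:
    """Hypothetical pre-send checks: dedup, quiet hours, per-user rate limit."""
    quiet_start: time = time(22, 0)   # assumed quiet window 22:00-07:00
    quiet_end: time = time(7, 0)
    max_per_window: int = 5           # assumed per-user limit...
    window_seconds: float = 3600.0    # ...per hour
    _seen_keys: set = field(default_factory=set)
    _sent_at: dict = field(default_factory=dict)  # user_id -> send timestamps

    def in_quiet_hours(self, local_now: datetime) -> bool:
        t = local_now.time()
        if self.quiet_start <= self.quiet_end:
            return self.quiet_start <= t < self.quiet_end
        # Window wraps past midnight (e.g. 22:00-07:00).
        return t >= self.quiet_start or t < self.quiet_end

    def decide(self, user_id: str, dedup_key: str, local_now: datetime) -> str:
        if dedup_key in self._seen_keys:
            return "duplicate"
        if self.in_quiet_hours(local_now):
            return "deferred"  # not marked as seen, so it can be retried later
        now = local_now.timestamp()
        recent = [ts for ts in self._sent_at.get(user_id, [])
                  if now - ts < self.window_seconds]
        if len(recent) >= self.max_per_window:
            self._sent_at[user_id] = recent
            return "rate_limited"
        recent.append(now)
        self._sent_at[user_id] = recent
        self._seen_keys.add(dedup_key)
        return "send"


gate = NotificationGate(max_per_window=2)
noon = datetime(2024, 1, 1, 12, 0)
print(gate.decide("u1", "order-42:shipped", noon))                  # send
print(gate.decide("u1", "order-42:shipped", noon))                  # duplicate
print(gate.decide("u1", "promo-7", datetime(2024, 1, 1, 23, 30)))   # deferred
```

Even this toy version shows why each check interacts with the others: a deferred notification must not be recorded as a duplicate, or it would never be delivered.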
Action: Implementation Challenges and Adaptations
Database Schema Evolution
The initial database schema design emphasized clean normalization with proper foreign keys and constraints.
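As an illustration only (the original schema is not reproduced here), a normalized design of this kind might look like the following SQLite sketch. All table and column names are hypothetical.

```python
import sqlite3

# Hypothetical normalized schema: each table has one job, and relationships
# are enforced with foreign keys and constraints.
NORMALIZED_SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE channels (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE              -- 'email', 'push', 'websocket', ...
);
CREATE TABLE user_preferences (
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    channel_id INTEGER NOT NULL REFERENCES channels(id),
    enabled INTEGER NOT NULL DEFAULT 1 CHECK (enabled IN (0, 1)),
    PRIMARY KEY (user_id, channel_id)
);
CREATE TABLE notifications (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    channel_id INTEGER NOT NULL REFERENCES channels(id),
    body TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(NORMALIZED_SCHEMA)
```

Every read on the delivery path in a design like this joins through `channels` and `user_preferences`, which is tidy on paper and is often the first thing to change under load.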
Three months into production, the schema had evolved significantly.
Each schema change addressed production incidents, performance bottlenecks, or requirements that emerged during implementation. These adaptations reflect the natural evolution from theoretical design to operational system.
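The kinds of changes described above might look like this hypothetical "after" version of the notifications table: a denormalized channel column, a dedup key enforced by a unique constraint so that retries become no-ops, and an index shaped around a delivery worker's "oldest pending first" query. None of these columns come from the RFC; they are assumptions chosen to show the pattern.

```python
import sqlite3

# Hypothetical evolved schema: fewer joins on the hot path, an idempotency
# key to absorb retries, and an index matching the worker's query.
EVOLVED_SCHEMA = """
CREATE TABLE notifications (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,              -- FK dropped on the hot write path
    channel TEXT NOT NULL,                 -- denormalized channel name
    dedup_key TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'sent', 'failed', 'suppressed')),
    attempts INTEGER NOT NULL DEFAULT 0,
    payload TEXT NOT NULL,                 -- pre-rendered JSON
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, dedup_key)
);
CREATE INDEX idx_pending ON notifications (status, created_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(EVOLVED_SCHEMA)


def enqueue(user_id: int, channel: str, dedup_key: str, payload: str) -> bool:
    """Insert once per (user_id, dedup_key); a retried insert is ignored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO notifications (user_id, channel, dedup_key, payload) "
        "VALUES (?, ?, ?, ?)",
        (user_id, channel, dedup_key, payload),
    )
    return cur.rowcount == 1  # 0 when the unique constraint swallowed a duplicate
```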
WebSocket Connection Management Complexity
The RFC specified WebSocket-based delivery for optimal performance, and the initial implementation approach was straightforward.
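A minimal sketch of that kind of straightforward first version, assuming the server library hands us an async send function per connection (modelled here as a plain callable, since no specific WebSocket library is implied). Its weaknesses are the ones production tends to expose: a single failed send makes the whole push raise even if other connections succeeded, there is no limit on message size, and nothing slows a burst down.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Set

# Stand-in for a WebSocket connection's send method (an assumption).
Send = Callable[[str], Awaitable[None]]


class ConnectionRegistry:
    """Naive first version: track open connections per user, push to all."""

    def __init__(self) -> None:
        self._conns: Dict[str, Set[Send]] = {}

    def register(self, user_id: str, send: Send) -> None:
        self._conns.setdefault(user_id, set()).add(send)

    def unregister(self, user_id: str, send: Send) -> None:
        conns = self._conns.get(user_id)
        if conns:
            conns.discard(send)
            if not conns:
                del self._conns[user_id]

    async def push(self, user_id: str, message: str) -> int:
        """Send to every open connection for the user; return how many were tried."""
        targets = list(self._conns.get(user_id, ()))
        # No error isolation, chunking, or rate limiting yet.
        await asyncio.gather(*(send(message) for send in targets))
        return len(targets)
```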
Production requirements revealed additional complexity. After addressing connection management challenges during mobile app deployments, the implementation had grown considerably.
Each addition addressed specific production challenges: circuit breakers for cascading failures, message chunking for large payloads, and sophisticated rate limiting for notification storms. These patterns emerge consistently when simple designs meet complex operational requirements.
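Two of those additions can be sketched minimally in Python: a circuit breaker that opens after consecutive failures and allows trial calls after a cooldown, and a helper that splits a large payload into bounded frames. The thresholds, the injectable clock, and the frame size are assumptions for illustration.

```python
import time
from typing import Callable, List, Optional


class CircuitBreaker:
    """Minimal breaker: opens after N consecutive failures, retries after a cooldown."""

    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        # Half-open: after the cooldown, allow trial calls; since the failure
        # count is still at the threshold, one more failure re-opens it.
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def chunk_message(payload: bytes, max_frame: int) -> List[bytes]:
    """Split a large payload into frames of at most max_frame bytes."""
    if max_frame <= 0:
        raise ValueError("max_frame must be positive")
    return [payload[i:i + max_frame] for i in range(0, len(payload), max_frame)]
```

Injecting the clock keeps the breaker testable without sleeping; in production the default `time.monotonic` avoids surprises from wall-clock adjustments.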
Timeline and Scope Evolution
The RFC outlined a structured development approach:
- Phase 1 (Weeks 1-4): Core Infrastructure
- Phase 2 (Weeks 5-8): Advanced Features
- Phase 3 (Weeks 9-12): Integration & Optimization
The implementation timeline revealed different patterns:
Weeks 1-4: Infrastructure Foundation Challenges
Environment setup and capacity planning consumed more time than anticipated. Database throughput requirements exceeded initial assumptions, and competing production priorities affected team availability.
Weeks 5-12: Scope Expansion
Early demonstrations generated enthusiasm and additional requirements. Channel diversity expanded beyond initial specifications as business needs emerged during development.
Months 4-6: Integration Complexity
The clean API design assumed consistent authentication patterns across services. Production revealed three different authentication systems requiring unified notification support.
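One way to keep the notification API clean in that situation is an adapter per authentication scheme, each mapping a request onto a single caller type. This sketch shows two hypothetical schemes (a static API key header and a bearer token checked by an injected validator); the third would be one more adapter. The header names and the `Caller` type are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol, Sequence


@dataclass(frozen=True)
class Caller:
    service: str
    scheme: str


class AuthAdapter(Protocol):
    def authenticate(self, headers: Dict[str, str]) -> Optional[Caller]: ...


class ApiKeyAuth:
    """Hypothetical scheme: a static API key per calling service."""

    def __init__(self, keys: Dict[str, str]) -> None:
        self.keys = keys  # api key -> service name

    def authenticate(self, headers: Dict[str, str]) -> Optional[Caller]:
        service = self.keys.get(headers.get("X-Api-Key", ""))
        return Caller(service, "api_key") if service else None


class BearerAuth:
    """Hypothetical scheme: bearer tokens checked by an injected validator."""

    def __init__(self, validate: Callable[[str], Optional[str]]) -> None:
        self.validate = validate  # token -> service name, or None if invalid

    def authenticate(self, headers: Dict[str, str]) -> Optional[Caller]:
        auth = headers.get("Authorization", "")
        if not auth.startswith("Bearer "):
            return None
        service = self.validate(auth[len("Bearer "):])
        return Caller(service, "bearer") if service else None


class UnifiedAuth:
    """Try each adapter in order; the notification API only ever sees a Caller."""

    def __init__(self, adapters: Sequence[AuthAdapter]) -> None:
        self.adapters = list(adapters)

    def authenticate(self, headers: Dict[str, str]) -> Optional[Caller]:
        for adapter in self.adapters:
            caller = adapter.authenticate(headers)
            if caller is not None:
                return caller
        return None
```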
Months 7-8: Performance Optimization
While functional, the system required significant performance work to meet throughput requirements. Template rendering emerged as an unexpected bottleneck, with personalization features requiring multiple API calls per notification.
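When per-notification personalization calls become the bottleneck, one common remedy is to batch the lookups. The sketch below assumes a hypothetical batch profile endpoint (`fetch_profiles`) and uses Python's `string.Template` as a stand-in for whatever template engine is in use; it turns several round trips per notification into one per batch.

```python
from string import Template
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical batch endpoint: one call returns profile fields for many users.
FetchProfiles = Callable[[List[str]], Dict[str, Dict[str, str]]]


def render_batch(
    template: Template,
    recipients: Iterable[Tuple[str, Dict[str, str]]],
    fetch_profiles: FetchProfiles,
) -> List[str]:
    """Render one template for many (user_id, event_data) pairs, in order."""
    recipients = list(recipients)
    user_ids = sorted({user_id for user_id, _ in recipients})
    profiles = fetch_profiles(user_ids)  # one call for the whole batch
    rendered = []
    for user_id, event_data in recipients:
        # Event fields override profile fields with the same name.
        context = {**profiles.get(user_id, {}), **event_data}
        rendered.append(template.safe_substitute(context))
    return rendered
```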
Team Scaling and Organizational Changes
The RFC specified "2 developers for 12 weeks." The implementation team evolved differently:
- 2 senior engineers (supposed to be full-time, averaged 60% due to production support)
- 1 junior engineer (added month 2, spent month 3 learning the codebase)
- 2 contractors (added month 4 for "quick wins," spent month 5 fixing their code)
- 1 DevOps engineer (supposedly "consulting," became full-time by month 3)
- 1 database expert (brought in month 5 for performance crisis)
- Product manager (changed twice during the project)
- 3 different engineering managers (reorg happened in month 6)
Team changes introduced context transfer challenges and architectural reviews. Contractor contributions required additional integration work, and organizational restructuring prompted design reassessment that affected project momentum.
Monitoring Requirements Discovery
The RFC monitoring section covered standard metrics: delivery rate, response time, and error rate. Production operation revealed observability requirements well beyond these.
Each additional metric addressed a specific operational challenge that emerged during production use, highlighting the difference between design-time and runtime observability needs.
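As a sketch of what runtime-oriented metrics can look like beyond the RFC's three, here is an in-memory stand-in for a metrics client that labels each attempt by channel and outcome, then reports a suppression rate and a p99 latency. The class and the outcome names are hypothetical; a real deployment would export these through its metrics library rather than keep them in process.

```python
from collections import Counter
from statistics import quantiles
from typing import Dict, List, Optional

SUPPRESSED = {"duplicate", "deferred", "rate_limited"}  # assumed outcome labels


class NotificationMetrics:
    """In-memory stand-in for a metrics client, labelled by channel and outcome."""

    def __init__(self) -> None:
        self.outcomes: Counter = Counter()           # (channel, outcome) -> count
        self.latencies_ms: Dict[str, List[float]] = {}

    def record(self, channel: str, outcome: str,
               latency_ms: Optional[float] = None) -> None:
        self.outcomes[(channel, outcome)] += 1
        if latency_ms is not None:
            self.latencies_ms.setdefault(channel, []).append(latency_ms)

    def suppression_rate(self, channel: str) -> float:
        """Share of attempts that were deduplicated, deferred, or rate-limited."""
        total = sum(n for (ch, _), n in self.outcomes.items() if ch == channel)
        suppressed = sum(n for (ch, o), n in self.outcomes.items()
                         if ch == channel and o in SUPPRESSED)
        return suppressed / total if total else 0.0

    def p99_ms(self, channel: str) -> float:
        samples = self.latencies_ms.get(channel, [])
        if len(samples) < 2:  # quantiles() needs at least two points before 3.13
            return samples[0] if samples else 0.0
        return quantiles(samples, n=100, method="inclusive")[98]
```

A suppression rate like this is the kind of number that answers "why did users stop getting notifications?" far faster than an error rate does.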
Technical Debt Accumulation Patterns
Technical debt considerations weren't explicit in the RFC. By month 8, several patterns had emerged:
Template System Complexity
Multiple template engines emerged to support different team requirements, creating a hybrid system that required ongoing maintenance.
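A hybrid system of this kind tends to end up with a dispatch layer like the one sketched here, where every stored template must record which engine it was written for. The two engines below (`string.Template` and `str.format_map`) are stand-ins, not the engines any particular team chose, and the engine names are hypothetical.

```python
from string import Template
from typing import Callable, Dict, Mapping

# Each engine turns (source, context) into text.
Renderer = Callable[[str, Mapping[str, str]], str]

ENGINES: Dict[str, Renderer] = {
    "dollar": lambda src, ctx: Template(src).substitute(ctx),  # $name style
    "format": lambda src, ctx: src.format_map(ctx),             # {name} style
}


def render(engine: str, source: str, context: Mapping[str, str]) -> str:
    """Dispatch on the engine name stored alongside each template."""
    try:
        renderer = ENGINES[engine]
    except KeyError:
        raise ValueError(f"unknown template engine: {engine!r}") from None
    return renderer(source, context)
```

Every entry in that dictionary is a dependency with its own escaping rules and failure modes, which is where the ongoing maintenance cost comes from.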
Schema Migration Challenges
The evolution from initial to optimized schema required careful migration planning. Running parallel schemas during transition introduced synchronization complexity.
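A minimal sketch of the parallel-schema approach, assuming both schemas live in the same database so that a single transaction can cover both writes. The table names and the drift check are hypothetical; a migration across separate databases would need a different synchronization mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notifications_v1 (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
CREATE TABLE notifications_v2 (
    id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT,
    status TEXT NOT NULL DEFAULT 'pending'
);
""")


def write_notification(user_id: int, body: str) -> int:
    """Dual-write: both schemas change in one transaction so they cannot drift."""
    with conn:  # commits on success, rolls back both inserts on error
        cur = conn.execute(
            "INSERT INTO notifications_v1 (user_id, body) VALUES (?, ?)",
            (user_id, body))
        conn.execute(
            "INSERT INTO notifications_v2 (id, user_id, body) VALUES (?, ?, ?)",
            (cur.lastrowid, user_id, body))
    return cur.lastrowid


def find_drift() -> list:
    """Rows present in v1 but missing or different in v2 (check before cutover)."""
    return conn.execute("""
        SELECT v1.id FROM notifications_v1 v1
        LEFT JOIN notifications_v2 v2 ON v2.id = v1.id
        WHERE v2.id IS NULL OR v2.body IS NOT v1.body OR v2.user_id IS NOT v1.user_id
    """).fetchall()
```

The drift check matters because not every writer goes through `write_notification`; any code path that still writes only to the old table shows up here.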
Result: Lessons from Implementation Experience
The RFC specified technical success criteria: 99.9% uptime, sub-100ms delivery, and 10,000 notifications per second. Meeting these targets revealed that user and business metrics mattered just as much.
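For a sense of scale, here is a back-of-the-envelope check on those targets, assuming a 30-day month and, for the second line, the peak rate sustained for a full day:

```latex
\begin{aligned}
\text{downtime budget} &= (1 - 0.999) \times 30 \times 24 \times 60\ \text{min} = 43.2\ \text{min/month} \\
\text{volume at peak}  &= 10^{4}\ \text{s}^{-1} \times 86{,}400\ \text{s/day} = 8.64 \times 10^{8}\ \text{notifications/day}
\end{aligned}
```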
What actually mattered:
- User happiness: We had a 99% delivery rate, but users hated the notifications because they were poorly timed
- Developer productivity: Other teams couldn't integrate with our "clean" API without extensive hand-holding
- Operational burden: The system required constant babysitting despite all our automation
- Business value: Marketing couldn't use half the features because they were too complex
Key Implementation Insights
Several patterns emerge consistently across notification system implementations:
1. RFCs as Starting Hypotheses
Treating RFCs as initial hypotheses rather than fixed specifications enables better adaptation. Documents should evolve with implementation learning rather than remaining static reference points.
2. Planning for Emergent Requirements
Allocating a significant buffer for unexpected requirements reflects implementation reality. Doubling estimates and adding contingency help absorb what is discovered during development.
3. Evolution-Ready Design
Systems inevitably require migration, versioning, and compatibility features. Building these capabilities early reduces future technical debt and operational complexity.
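Building versioning in early can be as simple as storing a version number with each payload and upgrading old payloads on read. This sketch assumes JSON payloads and invents two schema changes for illustration (a single channel becoming a list, and an added locale field with an assumed "en-US" default).

```python
import json
from typing import Callable, Dict

CURRENT_VERSION = 3


def _v1_to_v2(p: dict) -> dict:
    # Hypothetical change: v1 had a single "channel"; v2 allows several.
    p = dict(p)  # copy so the caller's dict is not mutated
    p["channels"] = [p.pop("channel")]
    return p


def _v2_to_v3(p: dict) -> dict:
    # Hypothetical change: v3 adds a locale, defaulting to an assumed "en-US".
    return {**p, "locale": p.get("locale", "en-US")}


# Each step lifts a payload from version N to N + 1.
UPGRADES: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2, 2: _v2_to_v3}


def load_payload(raw: str) -> dict:
    """Parse a stored payload and upgrade it step by step to CURRENT_VERSION."""
    envelope = json.loads(raw)
    version = envelope.get("version", 1)  # unversioned rows are treated as v1
    if version > CURRENT_VERSION:
        raise ValueError(f"payload version {version} is newer than this reader")
    payload = envelope["payload"]
    while version < CURRENT_VERSION:
        payload = UPGRADES[version](payload)
        version += 1
    return payload
```

Chaining single-step upgrades keeps each migration small and testable on its own, and rejecting newer versions stops an old reader from silently misinterpreting data during a rolling deploy.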
4. Edge Cases as Core Requirements
Scenarios discussed during design reviews typically manifest in production. Planning for these cases during initial implementation proves more efficient than reactive fixes.
5. Organizational Context Integration
Technical design success depends on organizational alignment. Team changes, restructuring, and varying stakeholder priorities affect implementation more than architectural elegance.
6. Operational Observability Focus
Effective monitoring addresses incident response needs rather than design documentation requirements. Business impact, user experience, and operational detail provide more valuable debugging information.
Bridging Design and Implementation
Several strategies help minimize the RFC-to-production gap:
Progressive Feature Development
Starting with well-executed core functionality enables better iteration than comprehensive initial implementation. Perfect email notifications provide a stronger foundation than basic multi-channel support.
Adaptability Over Optimization
Systems designed for graceful evolution handle changing requirements better than those optimized for predicted scenarios. Flexibility often proves more valuable than initial perfection.
Developer Experience Investment
Easy integration and operation drive adoption more effectively than raw performance. API usability often determines system success more than technical specifications.
Documentation Evolution
Maintaining documentation as living artifacts rather than historical records improves team understanding. Sections for original design, current implementation, and learned insights provide comprehensive context.
Comprehensive Feedback Integration
Feedback loops across user experience, operational metrics, and developer workflow enable rapid iteration. Quick learning cycles accelerate problem identification and resolution.
Conclusion: Embracing Implementation Reality
Learning to work with implementation evolution rather than against it improves outcomes. Pristine RFCs naturally become complex as they address user needs. Beautiful architectures develop practical extensions. Clean codebases accumulate necessary technical debt. This represents successful problem-solving rather than design failure.
The RFC-to-production gap requires management rather than elimination. Effective engineering adapts to emerging reality while maintaining system coherence and user value.
Reflecting on notification system implementations, final systems rarely match initial designs. They're typically more complex and take longer to build, but they're also more capable and solve problems that weren't apparent during initial planning.
When writing RFCs, remember: you're starting a conversation with implementation reality rather than defining fixed specifications. This perspective enables better planning and more realistic expectations.