Database Selection Guide: From Classical to Edge - A Complete Engineering Perspective
Comprehensive guide to choosing the right database for your project - covering SQL, NoSQL, NewSQL, and edge solutions with real-world implementation stories and performance benchmarks.
Database selection mistakes can be costly. A simple product catalog that works perfectly with 1,000 products can suddenly grind to a halt at 100,000. When MongoDB collections aren't indexed properly and queries scan entire collections, that "web-scale" solution becomes an expensive lesson in database fundamentals.
This isn't uncommon. Database choices derail more projects than most engineers realize. The database is your application's foundation - get it wrong, and everything built on top starts to crack.
Why Database Choice Matters More Than You Think
The Real Cost of Wrong Choices
Technical Debt Explosion: Switching databases mid-project isn't just a migration - it's architectural surgery. Using MySQL for time-series data creates query complexity with excessive date functions and subqueries. Moving to a specialized database like InfluxDB can take 6 months and require rewriting significant application logic.
Team Productivity Impact: Your choice directly affects development velocity. A team comfortable with SQL struggling with MongoDB document queries can lose 40% productivity for months. Conversely, NoSQL experts forced into rigid SQL schemas often over-engineer simple problems.
Hidden Operational Costs:
- PostgreSQL self-hosted: $200/month server + 20 hours admin time
- RDS PostgreSQL: $400/month but 2 hours admin time
- DynamoDB: $50/month usage + zero admin time (when designed properly)
The cheapest option often costs the most when you factor in engineering time.
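Those numbers are easy to sanity-check. A minimal sketch of the comparison, assuming a $100/hour fully loaded engineering rate (an assumption; swap in your own):

```python
# Rough total-cost-of-ownership comparison using the figures above.
# The $100/hour engineering rate is an assumption, not a benchmark.
def monthly_tco(infra_cost, admin_hours, hourly_rate=100):
    """Infrastructure spend plus the engineering time it consumes."""
    return infra_cost + admin_hours * hourly_rate

options = {
    "PostgreSQL self-hosted": monthly_tco(200, 20),  # $2,200/month all-in
    "RDS PostgreSQL": monthly_tco(400, 2),           # $600/month all-in
    "DynamoDB": monthly_tco(50, 0),                  # $50/month all-in
}
```

The "cheap" self-hosted option comes out roughly 3-4x more expensive than RDS once admin time is priced in.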
Classical Database Categories
Relational (SQL) Databases
PostgreSQL: The Swiss Army Knife
PostgreSQL is an excellent default choice for most projects. It's boring in the best possible way - reliable, well-documented, and handles edge cases gracefully. PostgreSQL 16 (current major version) continues to improve performance and adds features like SQL/JSON standard support.
When PostgreSQL Shines:
- Complex business logic requiring ACID transactions
- Analytics workloads with sophisticated queries
- Applications needing both relational and document storage (JSONB)
- Teams comfortable with SQL
Production Story: We migrated a Rails app from MySQL to PostgreSQL specifically for its JSON capabilities. Being able to store flexible metadata alongside relational data eliminated the need for a separate document store. Query performance improved 3x, and we avoided the complexity of maintaining two database systems.
Gotchas:
- Write amplification with frequent updates (use HOT updates wisely)
- Connection management - use pgBouncer in production
- Vacuum tuning required for high-write workloads
MySQL: The Web-Scale Workhorse
MySQL earned its reputation powering the web's biggest sites. It's fast, well-understood, and has an ecosystem built around web applications.
When MySQL Works:
- Read-heavy web applications
- Applications requiring primary-replica replication
- Teams with existing MySQL expertise
- Cost-conscious projects (excellent community support)
Real-World Performance: MySQL clusters can handle 50K+ QPS across multiple read replicas. The key is treating it like a cache layer - denormalized data, aggressive indexing, and strategic partitioning.
Trade-offs:
- Less sophisticated query planner than PostgreSQL
- JSON support exists but feels bolted-on
- Replication lag can be tricky in multi-master setups
SQLite: The Embedded Champion
Don't underestimate SQLite. It's not just for mobile apps anymore. With proper configuration, it can handle surprising workloads.
Perfect For:
- Edge applications with local data requirements
- Development and testing environments
- Applications with <100GB data and modest concurrency
- Embedded systems and IoT devices
Performance Reality Check: SQLite can handle 100K reads/second on modern hardware. The bottleneck is usually concurrent writes, not reads.
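As a concrete example of "proper configuration," the standard tuning step is enabling WAL mode, which lets readers proceed while a single writer commits. A minimal sketch using Python's built-in sqlite3 module (file path and table are illustrative):

```python
import os
import sqlite3
import tempfile

# WAL mode allows concurrent readers alongside a single writer, which is
# usually the first step before SQLite handles serious read concurrency.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block on writes
conn.execute("PRAGMA synchronous=NORMAL")  # safe with WAL, fewer fsyncs

conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [("event-%d" % i,) for i in range(1000)],
)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Note that WAL requires a real file; it does not apply to in-memory databases.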
NoSQL Databases
MongoDB: The Document Store
MongoDB gets a lot of hate, often deserved, but it genuinely excels in specific scenarios. The key is understanding its strengths and designing around its limitations.
Where MongoDB Excels:
- Rapid prototyping with evolving schemas
- Content management systems
- Catalog systems with varied product attributes
- Applications where document structure matches business logic
Important Consideration: Always design your indexes first. MongoDB without proper indexes is like a Ferrari without wheels - impressive specs but unusable performance.
Production Gotchas:
- Memory usage grows with working set size
- Aggregation pipelines can be memory-intensive
- Sharding requires careful planning of shard keys
Redis: The Speed Demon
Redis isn't just a cache - it's a data structure server that can solve complex problems elegantly.
Redis Use Cases Beyond Caching:
- Session storage with automatic expiration
- Rate limiting with sliding windows
- Real-time leaderboards and counters
- Pub/sub for real-time features
- Distributed locks for coordination
Proven Pattern: Using Redis for distributed rate limiting across microservices:
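A runnable sketch of the sliding-window approach. To keep it self-contained, an in-memory dict stands in for Redis; the comments note the sorted-set commands (ZREMRANGEBYSCORE, ZCARD, ZADD) a production version would issue against one key per client:

```python
import time

class SlidingWindowLimiter:
    """Sliding-window rate limiter. With Redis this is a sorted set per key:
    scores are timestamps, old entries are trimmed, the remainder is counted.
    Here a dict of timestamp lists stands in so the sketch is runnable."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = {}  # key -> timestamps (a Redis sorted set in production)

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window_start = now - self.window
        # Drop entries outside the window (ZREMRANGEBYSCORE key 0 window_start)
        recent = [t for t in self.hits.get(key, []) if t > window_start]
        if len(recent) >= self.limit:  # ZCARD key
            self.hits[key] = recent
            return False
        recent.append(now)             # ZADD key now now
        self.hits[key] = recent
        return True
```

In production the trim/count/add steps should run atomically, typically via a small Lua script or a MULTI/EXEC pipeline, so concurrent services cannot race past the limit.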
DynamoDB: The Serverless Powerhouse
DynamoDB is either amazing or terrible depending on how well you understand its data model. There's no middle ground.
DynamoDB Strengths:
- True serverless with pay-per-use pricing
- Predictable single-digit millisecond latency
- Automatic scaling and backup
- Global tables for multi-region applications
The DynamoDB Mental Model: Stop thinking in SQL. Start thinking in access patterns. Design your table structure around how you'll query the data, not how you'll store it.
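What "thinking in access patterns" looks like in practice is composing partition and sort keys so each query maps to a single Query call. A sketch in the single-table style; the USER#/ORDER# prefixes and attribute names are our own conventions for illustration, not DynamoDB requirements:

```python
# Access-pattern-first key design: keys encode the queries you plan to run.
def user_key(user_id):
    # "Fetch one user" is a GetItem on this exact PK/SK pair.
    return {"PK": f"USER#{user_id}", "SK": f"USER#{user_id}"}

def order_key(user_id, order_ts):
    # Orders share the user's partition, so "all orders for a user, sorted
    # by time" is one Query on PK with SK begins_with("ORDER#").
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_ts}"}

item = {**order_key("42", "2024-11-05T12:00:00Z"), "total": 99}
```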
DynamoDB Gotchas:
- Hot partitions can throttle your entire application
- Query patterns must be known upfront
- Complex relationships require careful GSI design
- FilterExpressions still consume read capacity
NewSQL: Best of Both Worlds
CockroachDB: Distributed SQL Done Right
CockroachDB promises PostgreSQL compatibility with global distribution. In practice, it delivers on most of these promises with some important caveats. CockroachDB v23.2 (current stable) offers improved PostgreSQL compatibility and better performance for distributed workloads.
When CockroachDB Makes Sense:
- Global applications requiring strong consistency
- Financial systems needing ACID across regions
- Applications outgrowing single-node PostgreSQL
- Teams wanting SQL with automatic sharding
Implementation Example: CockroachDB works well for fintech applications spanning multiple regions. The automatic geo-partitioning keeps user data in the right regions for compliance, while maintaining strong consistency for financial transactions.
Trade-offs:
- Higher latency than single-node databases due to consensus
- More expensive than traditional PostgreSQL
- Some PostgreSQL features still missing or different
Edge Database Solutions
PouchDB/CouchDB: Offline-First Architecture
For applications that need to work offline, CouchDB's replication model is unmatched. PouchDB brings this to the browser seamlessly.
Perfect For:
- Field service applications
- Mobile apps in areas with poor connectivity
- Collaborative applications with eventual consistency needs
Implementation Pattern:
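A minimal sketch of the server side: CouchDB replications are started by POSTing a JSON document to the /_replicate endpoint (PouchDB's sync() speaks the same replication protocol from the browser). The URLs below are placeholders:

```python
import json

# Builds the body for CouchDB's POST /_replicate endpoint, which drives the
# offline-first sync described above. continuous=True keeps the replication
# running as changes arrive instead of doing a one-shot copy.
def replication_request(source, target, continuous=True):
    return {"source": source, "target": target, "continuous": continuous}

body = replication_request(
    "http://localhost:5984/field_notes",               # local CouchDB (default port)
    "https://central.example.com/field_notes",         # placeholder central server
)
payload = json.dumps(body)
# In production, POST `payload` to f"{server}/_replicate" with a
# Content-Type: application/json header (e.g. via the requests library).
```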
InfluxDB: Time-Series Specialist
When you're dealing with metrics, logs, or IoT data, specialized time-series databases like InfluxDB outperform general-purpose databases dramatically.
InfluxDB Advantages:
- Automatic downsampling and retention policies
- Built-in time-based functions and aggregations
- Efficient storage for time-series data
- Native integration with monitoring tools
Database Selection Matrix
By Use Case
E-commerce Platform:
- Catalog: PostgreSQL (structured product data + JSONB for attributes)
- Sessions: Redis (fast access + automatic expiration)
- Orders: PostgreSQL (ACID compliance for financial data)
- Analytics: ClickHouse or BigQuery (analytical workloads)
IoT Application:
- Device State: Redis (real-time updates)
- Time Series: InfluxDB (sensor data)
- Configuration: PostgreSQL (device management)
- Edge Cache: SQLite (local device storage)
Social Media App:
- User Profiles: PostgreSQL (relational data)
- Posts/Timeline: DynamoDB (high scale, simple queries)
- Real-time: Redis Streams (notifications, chat)
- Search: Elasticsearch (content discovery)
By Scale Requirements
Small Scale (1K-100K users): PostgreSQL + Redis covers 90% of use cases. Simple, well-understood, cost-effective.
Medium Scale (100K-10M users):
- Read replicas for PostgreSQL
- DynamoDB for high-traffic features
- Elasticsearch for search
- Redis cluster for caching
Large Scale (10M+ users):
- Sharded PostgreSQL or CockroachDB
- DynamoDB with careful partition design
- Redis Cluster with consistent hashing
- Specialized databases for specific workloads
Selection Criteria Deep Dive
Consistency Requirements
Strong Consistency (ACID): PostgreSQL, CockroachDB, SQL Server
- Financial transactions
- Inventory management
- User authentication
Eventual Consistency (BASE): DynamoDB, MongoDB, Cassandra
- Social media feeds
- Content catalogs
- Analytics data
Choose Strong When: data integrity is more important than availability.
Choose Eventual When: availability and partition tolerance are the priority.
Performance Patterns
Read-Heavy Workloads: MySQL with read replicas, Redis caching layer
Write-Heavy Workloads: DynamoDB, Cassandra, or sharded PostgreSQL
Mixed Workloads: PostgreSQL with proper indexing and connection pooling
Latency Requirements:
- <1ms: Redis (in-memory)
- <10ms: DynamoDB, well-tuned PostgreSQL
- <100ms: Most SQL databases with proper indexing
- >100ms: Acceptable for analytical workloads
Real-World Migration Stories
The MongoDB to PostgreSQL Migration
The Problem: Content management systems using MongoDB can struggle with complex queries. Aggregation pipelines become unmaintainable, and the lack of schema validation causes data quality issues.
The Solution: Migrated to PostgreSQL with JSONB columns for flexible content, maintaining the benefits of document storage while gaining SQL's query power.
Timeline: 3 months with zero downtime using a dual-write pattern:
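In outline, the dual-write wrapper looks like the sketch below. In-memory dicts stand in for the two databases and the class name is ours; the point is that writes hit both stores while reads prefer the new one, so either side can lag briefly without breaking the application:

```python
# Dual-write cutover sketch: every write lands in both stores, reads are
# served from the new store once backfilled, with a fallback to the old one.
class DualWriteRepo:
    def __init__(self, old_store, new_store):
        self.old, self.new = old_store, new_store

    def save(self, key, doc):
        self.old[key] = doc  # legacy store stays authoritative during backfill
        self.new[key] = doc  # new store receives every write from day one

    def get(self, key):
        if key in self.new:  # prefer the new store
            return self.new[key]
        return self.old.get(key)  # fall back for not-yet-backfilled rows

old, new = {"a": {"v": 1}}, {}
repo = DualWriteRepo(old, new)
repo.save("b", {"v": 2})
```

Once the backfill completes and reads no longer hit the fallback path, the old store can be retired with no cutover downtime.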
Outcomes:
- Schema validation in PostgreSQL catches data quality issues early
- JSONB queries can be faster than MongoDB aggregations for complex queries
- Proper database selection significantly improves developer productivity
The Single-Region to Multi-Region Challenge
The Challenge: Growing SaaS applications need to expand from single-region to global, requiring data residency compliance and low latency worldwide.
The Solution: Migrating from single PostgreSQL to CockroachDB with geo-partitioning allows user data to stay in their regions while maintaining global consistency for billing and analytics.
Implementation:
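A sketch of the multi-region DDL behind this pattern, issued through any PostgreSQL-compatible driver. Database, table, and region names are illustrative; the statements use CockroachDB's multi-region SQL, where REGIONAL BY ROW pins each row's replicas to its region and GLOBAL suits read-mostly reference data:

```python
# CockroachDB multi-region DDL for the geo-partitioning pattern above.
GEO_PARTITION_DDL = [
    'ALTER DATABASE saas SET PRIMARY REGION "us-east1"',
    'ALTER DATABASE saas ADD REGION "eu-west1"',
    "ALTER TABLE users SET LOCALITY REGIONAL BY ROW",  # per-user data residency
    "ALTER TABLE billing_rates SET LOCALITY GLOBAL",   # fast reads everywhere
]

def apply_ddl(cursor, statements=GEO_PARTITION_DDL):
    # Works with any DB-API cursor (e.g. psycopg2 against CockroachDB).
    for stmt in statements:
        cursor.execute(stmt)
```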
Results:
- Latency can reduce from 200ms to 50ms for international users
- GDPR compliance achieved through data localization
- Development complexity increases but operational benefits often justify the cost
Performance Benchmarking
Read Performance Comparison
Based on typical testing with 1M records (results vary by hardware and configuration):
Simple Key Lookups (ops/second):
- Redis: 100,000+
- DynamoDB: 50,000
- PostgreSQL (indexed): 25,000
- MongoDB (indexed): 20,000
- MySQL (indexed): 22,000
Complex Queries (analytical workloads):
- PostgreSQL: Excellent (sophisticated query planner)
- CockroachDB: Good (distributed but still SQL)
- MongoDB: Poor (aggregation pipelines)
- DynamoDB: Not applicable (limited query capabilities)
Write Performance Under Load
Concurrent Writes (1000 clients):
- DynamoDB: Scales automatically, consistent performance
- Redis: Excellent until memory limit
- PostgreSQL: Good with proper connection pooling
- MongoDB: Degrades with document size growth
Implementation Patterns
Database Sharding Strategies
Horizontal Sharding (dividing data across servers):
Vertical Sharding (separating by feature):
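A minimal sketch of both strategies, assuming a string shard key. Note the caveat in the comments: plain modulo hashing remaps most keys when the shard count changes, which is why consistent hashing appears at larger scale:

```python
import hashlib

# Horizontal sharding: a stable hash of the shard key routes each row to one
# of N servers. The hash must be deterministic (not Python's salted hash())
# so the same user always lands on the same shard across processes.
def shard_for(shard_key: str, num_shards: int) -> int:
    digest = hashlib.sha256(shard_key.encode()).hexdigest()
    # Caveat: modulo routing remaps most keys if num_shards changes;
    # consistent hashing limits that reshuffle to ~1/N of keys.
    return int(digest, 16) % num_shards

# Vertical sharding: routing by feature instead of by key.
# The feature-to-cluster mapping below is illustrative.
FEATURE_SHARDS = {"users": "db-core", "analytics": "db-analytics"}
```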
Connection Management
PostgreSQL Connection Pooling:
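The mechanics that pgBouncer (or psycopg2's built-in pool) handles for you reduce to checkout/check-in over a fixed set of connections. A toy version to make the behavior concrete; connect is any zero-argument connection factory:

```python
import queue

# Minimal checkout/check-in pool: a fixed set of connections is reused
# instead of opening a new one per request, which is what protects
# PostgreSQL's max_connections limit under load.
class ConnectionPool:
    def __init__(self, connect, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self, timeout=1.0):
        # Blocks up to `timeout`, then raises queue.Empty: failing fast
        # beats piling connections onto an already-saturated database.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(connect=lambda: object(), size=2)
conn = pool.acquire()
pool.release(conn)
```

In production, prefer pgBouncer in transaction-pooling mode in front of PostgreSQL rather than per-process pools alone, since every application replica multiplies the total connection count.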
Monitoring and Troubleshooting
Key Metrics to Track
PostgreSQL Essential Metrics:
- Connection usage (pg_stat_activity)
- Query performance (pg_stat_statements)
- Index usage (pg_stat_user_indexes)
- Replication lag (pg_stat_replication)
DynamoDB CloudWatch Metrics:
- ConsumedReadCapacityUnits / ConsumedWriteCapacityUnits
- ThrottledRequests (critical!)
- SuccessfulRequestLatency
- SystemErrors
Note: DynamoDB pricing was reduced by ~50% in November 2024, making on-demand pricing more cost-effective for variable workloads.
MongoDB Key Metrics:
- Operations per second (opcounters)
- Working set size vs available memory
- Lock percentage
- Replication lag
Common Performance Issues
The N+1 Query Problem: loading a list and then issuing one query per item turns a single page view into hundreds of round trips; batch the lookups with a join or an IN clause.
Connection Pool Exhaustion: slow queries that hold connections while new requests keep arriving will stall the whole application; cap pool sizes, set acquisition timeouts, and put pgBouncer in front of PostgreSQL.
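The N+1 difference is easy to demonstrate with a fake database that counts round trips (the posts/authors schema is illustrative):

```python
# A stand-in database that counts queries, to contrast the two patterns.
class CountingDB:
    def __init__(self, authors):
        self.authors = authors
        self.queries = 0

    def get_author(self, author_id):    # one round trip per call
        self.queries += 1
        return self.authors[author_id]

    def get_authors(self, author_ids):  # one round trip for the whole batch
        self.queries += 1               # SELECT ... WHERE id IN (...)
        return {a: self.authors[a] for a in set(author_ids)}

posts = [{"id": i, "author_id": i % 3} for i in range(100)]
db = CountingDB({0: "ana", 1: "bo", 2: "cy"})

# N+1 pattern: one author lookup per post -> 100 queries.
for p in posts:
    p["author_naive"] = db.get_author(p["author_id"])
naive_queries = db.queries

db.queries = 0
# Batched pattern: a single IN query resolves every author -> 1 query.
authors = db.get_authors([p["author_id"] for p in posts])
for p in posts:
    p["author"] = authors[p["author_id"]]
batched_queries = db.queries
```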
Future-Proofing Your Database Choice
Technology Trends to Watch
Vector Databases for AI/ML: pgvector for PostgreSQL, Pinecone, Weaviate
- Embedding storage for semantic search
- RAG (Retrieval-Augmented Generation) applications
- Image and document similarity search
Multi-Model Databases: FaunaDB, Azure Cosmos DB
- Single database supporting multiple data models
- Reduced operational complexity
- Unified query interfaces
Serverless-First Architectures:
- PlanetScale (serverless MySQL)
- Neon (serverless PostgreSQL)
- FaunaDB (serverless transactional)
Planning for Growth
Capacity Planning Framework:
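One way to make the framework concrete: project peak load forward and ask when it crosses the comfortable ceiling of the current setup. All inputs below are assumptions to replace with your own measurements:

```python
# Back-of-envelope growth projection: given current peak QPS and a monthly
# growth rate, how many months until the current setup's ceiling is hit?
def months_until_ceiling(current_qps, monthly_growth, ceiling_qps):
    months, qps = 0, current_qps
    while qps < ceiling_qps:
        qps *= 1 + monthly_growth
        months += 1
    return months

# Example: 2K QPS growing 15%/month against an assumed 25K QPS
# single-node ceiling leaves roughly a year and a half of runway.
runway = months_until_ceiling(2_000, 0.15, 25_000)
```

If the runway is under six months, start the sharding or cloud-native migration now; these projects reliably take longer than the spreadsheet says.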
Team Development Strategy
Skill Building Path:
- Foundation: Master one SQL database deeply (PostgreSQL recommended)
- NoSQL Understanding: Learn one document store (MongoDB) and one key-value (Redis)
- Cloud Native: Understand one cloud database (DynamoDB or Cosmos DB)
- Specialization: Deep dive into domain-specific databases (time-series, graph, etc.)
Knowledge Sharing Practices:
- Database design reviews for all new features
- Regular performance analysis sessions
- Post-mortem analysis of database-related incidents
- Cross-training on different database technologies
Decision Framework
When choosing a database for a new project, ask these questions in order:
1. Consistency Requirements
- Do you need ACID transactions? → SQL databases
- Can you work with eventual consistency? → NoSQL options open up
2. Query Complexity
- Complex analytical queries? → PostgreSQL, CockroachDB
- Simple key-value lookups? → Redis, DynamoDB
- Full-text search required? → Elasticsearch + primary database
3. Scale and Performance
- Current scale: <100K users → PostgreSQL + Redis
- Growth trajectory: >1M users → Consider sharding or cloud-native options
- Latency requirements: <10ms → In-memory (Redis) or optimized NoSQL
4. Team and Operational Constraints
- Team expertise: Stick close to existing skills initially
- Operational budget: Managed services vs. self-hosted
- Compliance requirements: Data residency, encryption, audit trails
5. Future Flexibility
- How likely is the data model to change? → Document stores for high change rate
- Multi-region expansion planned? → Consider distributed databases early
- Integration requirements: What other systems need to connect?