Building AWS Serverless with TypeScript: Hard-Won Lessons from Lambda at Scale
Why I moved from Express.js to Lambda, the costly mistakes I made along the way, and the TypeScript patterns that saved my team thousands in AWS bills.
I was running a traditional Express.js API on EC2 instances. Fixed costs, predictable scaling, 99.9% uptime. Life was good. Then our biggest client asked for a feature that needed to process 50,000 webhooks in under 10 minutes, once per month.
Keeping EC2 instances running 24/7 for a 10-minute monthly spike felt wasteful. That's when I dove headfirst into AWS Lambda. Here's what I learned from building production Lambda functions, making every serverless mistake possible, and spending way too much on AWS bills.
Why I Finally Embraced Serverless (After Years of Resistance)
I used to be that guy who called serverless "vendor lock-in with extra steps." Coming from a background of managing Kubernetes clusters and fine-tuning JVM garbage collectors, Lambda felt like giving up control. But three incidents changed my mind:
The Unexpected Traffic Spike (June 2022)
Our Express API got featured on Hacker News at 2 AM. Traffic went from 100 req/min to 5,000 req/min. Our auto-scaling group took 8 minutes to spin up new instances. By then, we'd experienced significant payment processing failures and our Redis cache was overwhelmed.
Lambda would have scaled instantly, with no eight-minute wait for instances to boot.
The Webhook Processing Challenge (August 2022)
A client needed to process Stripe webhooks that could arrive in bursts of 10,000+ events. With EC2, we had two bad options:
- Over-provision for peak load (expensive)
- Use queues and risk webhook timeouts (unreliable)
Lambda's automatic concurrency scaling solved this elegantly. Each webhook got its own function instance. No queues, no timeouts, no over-provisioning.
The Compute Utilization Analysis (October 2022)
Analyzing our actual compute utilization revealed that our API servers sat idle 87% of the time, yet we paid for 100% of the capacity, month after month.
Lambda's pay-per-millisecond model addressed this inefficiency directly.
The Stack That Actually Works in Production
After burning through multiple approaches, here's what we settled on: API Gateway in front of TypeScript Lambda functions, DynamoDB (pay-per-request) for storage, and CloudWatch for monitoring and alerting.
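In infrastructure-as-code terms, the shape looks roughly like this, sketched with AWS CDK v2 (construct names, memory, and timeout values are illustrative):

```typescript
import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Runtime } from 'aws-cdk-lib/aws-lambda';
import { LambdaRestApi } from 'aws-cdk-lib/aws-apigateway';
import { Table, AttributeType, BillingMode } from 'aws-cdk-lib/aws-dynamodb';

export class ApiStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Pay-per-request DynamoDB: no capacity planning for bursty traffic.
    const table = new Table(this, 'Events', {
      partitionKey: { name: 'pk', type: AttributeType.STRING },
      sortKey: { name: 'sk', type: AttributeType.STRING },
      billingMode: BillingMode.PAY_PER_REQUEST,
    });

    // NodejsFunction bundles the TypeScript entry with esbuild at deploy time.
    const handler = new NodejsFunction(this, 'ApiHandler', {
      entry: 'src/handler.ts',
      runtime: Runtime.NODEJS_18_X,
      memorySize: 1024,              // see the memory discussion below
      timeout: Duration.seconds(29), // API Gateway caps integrations at 29s
      environment: { TABLE_NAME: table.tableName },
    });
    table.grantReadWriteData(handler);

    // One REST API proxying everything to the handler.
    new LambdaRestApi(this, 'Api', { handler });
  }
}
```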
The Lambda Handler That Handles Reality
Here's the shape of our production Lambda handler, condensed to the lessons from countless production incidents: validate input before doing any work, keep SDK clients at module scope, and log errors in a structured form. The table and payload fields below are illustrative:
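```typescript
import { APIGatewayProxyEvent, APIGatewayProxyResult, Context } from 'aws-lambda';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';

// Module scope: created once per container, reused on warm invocations.
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const handler = async (
  event: APIGatewayProxyEvent,
  context: Context
): Promise<APIGatewayProxyResult> => {
  try {
    // Fail fast on malformed input instead of burning paid milliseconds.
    if (!event.body) {
      return { statusCode: 400, body: JSON.stringify({ error: 'Missing body' }) };
    }
    const payload = JSON.parse(event.body); // throws -> caught below

    await ddb.send(new PutCommand({
      TableName: process.env.TABLE_NAME!,
      Item: { pk: `EVENT#${payload.id}`, sk: new Date().toISOString(), ...payload },
    }));

    return { statusCode: 200, body: JSON.stringify({ ok: true }) };
  } catch (error) {
    // One structured log line per failure: CloudWatch metric filters
    // and alarms key off these fields.
    console.error(JSON.stringify({
      level: 'error',
      requestId: context.awsRequestId,
      message: error instanceof Error ? error.message : String(error),
    }));
    return { statusCode: 500, body: JSON.stringify({ error: 'Internal error' }) };
  }
};
```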
Cost Optimization Lessons That Saved Thousands
1. Memory vs. CPU Trade-offs
I spent weeks optimizing our Lambda memory settings. Here's what I learned:
1024 MB was our sweet spot. More memory = faster execution = lower cost, up to a point.
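The reason more memory can cost less: Lambda allocates CPU in proportion to memory and bills per GB-second, so a function that runs twice as fast at double the memory costs the same and returns sooner. Illustrative arithmetic:

```typescript
// Lambda bills per GB-second, and CPU share scales with memory,
// so faster runs can offset the higher per-second rate.
const PRICE_PER_GB_SECOND = 0.0000166667; // us-east-1 x86 pricing; varies by region

const cost = (memoryMb: number, durationMs: number) =>
  (memoryMb / 1024) * (durationMs / 1000) * PRICE_PER_GB_SECOND;

console.log(cost(512, 400).toFixed(10));  // 512 MB for 400 ms
console.log(cost(1024, 200).toFixed(10)); // 1024 MB for 200 ms: same cost, half the latency
```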
2. Connection Reuse Saved 15% on AWS Bills
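The change is mostly about where clients are created: at module scope, once per container, so warm invocations skip the TCP and TLS handshake entirely. A sketch with AWS SDK v3 (the socket settings are illustrative):

```typescript
import https from 'node:https';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { NodeHttpHandler } from '@smithy/node-http-handler';

// One client per container, created at module scope: every warm
// invocation reuses it. (SDK v3 keeps connections alive by default;
// the explicit agent makes the intent visible and lets you tune sockets.)
const ddb = new DynamoDBClient({
  requestHandler: new NodeHttpHandler({
    httpsAgent: new https.Agent({ keepAlive: true, maxSockets: 50 }),
  }),
});

// Anti-pattern for contrast: `new DynamoDBClient({})` inside the
// handler pays the connection setup cost on every single invocation.
export const handler = async (): Promise<void> => {
  // ...use `ddb` here
};
```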
3. Bundle Size Optimization
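Every megabyte in the deployment package adds cold-start time (roughly +400ms per 10MB of bundle in our measurements, see the numbers below). We bundle with esbuild and keep the AWS SDK out of the bundle, since the Node.js 18 runtime already ships it. A sketch (paths are illustrative):

```typescript
// build.ts, run with e.g. `npx tsx build.ts` (paths are illustrative)
import { build } from 'esbuild';

await build({
  entryPoints: ['src/handler.ts'],
  bundle: true,             // one tree-shaken file; no node_modules in the zip
  minify: true,
  platform: 'node',
  target: 'node18',
  external: ['@aws-sdk/*'], // the Node 18 runtime already includes AWS SDK v3
  outfile: 'dist/handler.js',
});
```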
The Monitoring Setup That Actually Alerts on Real Issues
After too many alerts for non-issues, here's our production monitoring:
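The principle: alarm on sustained rates and tail latency, never on single data points. A CDK sketch (thresholds are illustrative; ours came from baselining real traffic):

```typescript
import { Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as lambda from 'aws-cdk-lib/aws-lambda';

export function addAlarms(scope: Construct, fn: lambda.Function): void {
  // Alert on sustained error volume, not single failures: one bad
  // request at 3 AM is not an incident.
  new cloudwatch.Alarm(scope, 'Errors', {
    metric: fn.metricErrors({ period: Duration.minutes(5) }),
    threshold: 10,
    evaluationPeriods: 3, // must breach for 15 consecutive minutes
  });

  // p95 latency catches regressions that averages hide.
  new cloudwatch.Alarm(scope, 'P95Duration', {
    metric: fn.metricDuration({ period: Duration.minutes(5), statistic: 'p95' }),
    threshold: 1000, // ms
    evaluationPeriods: 3,
  });

  // Throttles mean we hit a concurrency limit somewhere; act immediately.
  new cloudwatch.Alarm(scope, 'Throttles', {
    metric: fn.metricThrottles({ period: Duration.minutes(1) }),
    threshold: 1,
    evaluationPeriods: 1,
  });
}
```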
The Mistakes That Cost Me Sleep (and Money)
1. The Concurrent Execution Limit Issue
During a high-traffic event, our webhook processing Lambda consumed all 1,000 concurrent executions in our AWS account. Our main API experienced downtime because it couldn't get any Lambda capacity.
Fix: Set reserved concurrency on critical functions:
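In CDK this is one property per function; a sketch (the 200/700 split is illustrative):

```typescript
import { Construct } from 'constructs';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

// Reserved concurrency is both a floor (this many executions are always
// available to the function) and a ceiling (it can never consume more).
export function defineFunctions(scope: Construct) {
  const apiFn = new NodejsFunction(scope, 'ApiHandler', {
    entry: 'src/handler.ts',
    reservedConcurrentExecutions: 200, // the API can never be starved
  });
  const webhookFn = new NodejsFunction(scope, 'WebhookProcessor', {
    entry: 'src/webhooks.ts',
    reservedConcurrentExecutions: 700, // and webhooks can't eat the account
  });
  return { apiFn, webhookFn };
}
```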
2. The DynamoDB Hot Partition Problem
We used sequential IDs as DynamoDB partition keys, which funneled all traffic to a single partition. The resulting read/write throttling significantly degraded performance.
Fix: Distributed partition keys:
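The idea: derive a shard number from the ID itself, so sequential IDs fan out across partitions while lookups stay deterministic. A sketch (shard count and key format are illustrative):

```typescript
import { createHash } from 'node:crypto';

const SHARD_COUNT = 10; // illustrative; size it to your write throughput

// Deterministic shard from the ID: the same webhook always maps to the
// same partition, but consecutive IDs spread evenly across all shards.
function partitionKey(webhookId: string): string {
  const hash = createHash('md5').update(webhookId).digest();
  const shard = hash.readUInt16BE(0) % SHARD_COUNT;
  return `WEBHOOK#${shard}#${webhookId}`;
}

// Sequential IDs no longer pile onto one partition:
console.log(partitionKey('1001')); // e.g. WEBHOOK#3#1001
console.log(partitionKey('1002')); // e.g. WEBHOOK#8#1002
```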
3. The 15-Minute Timeout Discovery
Our batch-processing functions were timing out after exactly 15 minutes. I initially suspected a memory leak, then discovered that 15 minutes is Lambda's hard maximum execution time. We were processing large batches synchronously in a single invocation.
Fix: Batch processing with pagination:
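The pattern: process one page at a time, watch context.getRemainingTimeInMillis(), and hand the cursor to a fresh invocation before hitting the wall. A sketch (fetchPage and processItem are illustrative stand-ins for the real data source and business logic):

```typescript
import { Context } from 'aws-lambda';
import { LambdaClient, InvokeCommand, InvocationType } from '@aws-sdk/client-lambda';

declare function fetchPage(cursor?: string): Promise<{ items: unknown[]; nextCursor?: string }>;
declare function processItem(item: unknown): Promise<void>;

const lambdaClient = new LambdaClient({});
const SAFETY_MARGIN_MS = 60_000; // bail out a full minute before the limit

export const handler = async (event: { cursor?: string }, context: Context) => {
  let cursor = event.cursor;

  do {
    const page = await fetchPage(cursor); // e.g. a paginated DynamoDB Query
    for (const item of page.items) await processItem(item);
    cursor = page.nextCursor;

    if (cursor && context.getRemainingTimeInMillis() < SAFETY_MARGIN_MS) {
      // Hand the cursor to a fresh 15-minute invocation and exit cleanly.
      await lambdaClient.send(new InvokeCommand({
        FunctionName: context.functionName,
        InvocationType: InvocationType.Event, // async: don't wait for it
        Payload: Buffer.from(JSON.stringify({ cursor })),
      }));
      return;
    }
  } while (cursor);
};
```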
TypeScript Patterns That Saved My Sanity
1. Strict Event Type Definitions
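Every handler declares an explicit event type (the @types/aws-lambda package covers AWS-shaped events), and internal events are discriminated unions, so the compiler enforces exhaustive handling. A sketch with illustrative event shapes:

```typescript
// Internal events as a discriminated union: the `type` field narrows
// the payload, so handling a case wrong is a compile error.
type WebhookEvent =
  | { type: 'payment.succeeded'; paymentId: string; amountCents: number }
  | { type: 'payment.failed'; paymentId: string; reason: string }
  | { type: 'customer.created'; customerId: string };

function handleWebhook(event: WebhookEvent): string {
  switch (event.type) {
    case 'payment.succeeded':
      return `captured ${event.amountCents}`; // amountCents only exists here
    case 'payment.failed':
      return `failed: ${event.reason}`;
    case 'customer.created':
      return `welcome ${event.customerId}`;
    default: {
      // Exhaustiveness check: adding a new event type without a case
      // refuses to compile instead of failing at runtime.
      const _exhaustive: never = event;
      return _exhaustive;
    }
  }
}
```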
2. Environment Variable Validation
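We validate process.env once at module load, so a misconfigured function fails its very first invocation with a clear error instead of mid-request with undefined. A sketch (zod shown here as one option; the variable names are illustrative):

```typescript
import { z } from 'zod';

// Parse once at module load; any schema validator works the same way.
const Env = z.object({
  TABLE_NAME: z.string().min(1),
  STRIPE_WEBHOOK_SECRET: z.string().min(1),
  LOG_LEVEL: z.enum(['debug', 'info', 'warn', 'error']).default('info'),
});

export const env = Env.parse(process.env); // throws if anything is missing

// Everywhere else: env.TABLE_NAME is `string`, never `string | undefined`.
```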
3. Result Types for Error Handling
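Expected failures come back as values instead of thrown exceptions, so the compiler forces callers to handle the error branch. A minimal sketch with an illustrative parsing example:

```typescript
// A Result type makes failure part of the signature.
type Result<T, E = Error> =
  | { ok: true; value: T }
  | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Illustrative: parsing a webhook body without throwing across layers.
function parseBody(raw: string | null): Result<{ id: string }, string> {
  if (!raw) return err('missing body');
  try {
    const parsed = JSON.parse(raw);
    return typeof parsed.id === 'string' ? ok(parsed) : err('missing id');
  } catch {
    return err('invalid JSON');
  }
}

const result = parseBody('{"id":"evt_123"}');
if (result.ok) console.log(result.value.id); // narrowed: value exists here
else console.error(result.error);            // narrowed: error exists here
```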
Performance Insights from Production Data
After 18 months in production with detailed monitoring:
Cold Start Analysis
- Average cold start: 850ms
- P95 cold start: 1,200ms
- Bundle size impact: 10MB bundle = +400ms cold start
- Memory impact: 1024MB vs 512MB = -200ms cold start
Cost Breakdown (Monthly)
- Lambda execution: $89/month (8M invocations)
- API Gateway: $28/month (8M requests)
- DynamoDB: $67/month (pay-per-request)
- CloudWatch logs: $12/month
- Total: $196/month (vs. ~$800/month for the EC2 equivalent)
Reliability Metrics
- Uptime: 99.97% (vs. 99.9% on EC2)
- Error rate: 0.02% (mostly client errors)
- P95 response time: 180ms
When NOT to Use Serverless
Serverless isn't always the answer. Here's when I stick with containers:
- Long-running processes - Video encoding, large batch jobs
- WebSocket-heavy apps - Real-time gaming, chat apps
- Legacy applications - Complex deployment requirements
- Stateful workloads - In-memory caches, sessions
- Cold-start-sensitive paths - Sub-100ms response requirements
The Deployment Pipeline That Doesn't Break
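The shape of it, sketched with CDK Pipelines (repo name and commands are illustrative): type-checking and tests gate every deploy, and the pipeline updates itself when its own definition changes.

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { CodePipeline, CodePipelineSource, ShellStep } from 'aws-cdk-lib/pipelines';

export class PipelineStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    new CodePipeline(this, 'Pipeline', {
      // Assumes a GitHub token is already configured for CodePipeline.
      synth: new ShellStep('Synth', {
        input: CodePipelineSource.gitHub('my-org/my-api', 'main'),
        commands: [
          'npm ci',
          'npm run typecheck', // tsc --noEmit: type errors fail the build
          'npm test',          // unit tests gate every deploy
          'npx cdk synth',
        ],
      }),
      // Deployment stages (staging -> prod, manual approval) attach here.
    });
  }
}
```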
Final Thoughts
Serverless with TypeScript transformed how our team ships features. We went from weekly deployments to daily deployments. Our AWS costs dropped to roughly a quarter of the EC2 equivalent, and our uptime improved to 99.97%.
The biggest benefit? Reduced operational overhead. Fewer emergency calls about server crashes, minimal capacity planning, and no operating system patching.
The serverless learning curve is steep, but the productivity gains are measurable, and you should expect to make mistakes along the way. Ready to dive in? Start small with a simple CRUD API, add proper monitoring from day one, and build incrementally as you learn the platform's characteristics.