
AWS CDK Link Shortener Part 1: Project Setup & Basic Infrastructure

Setting up a production-grade link shortener with AWS CDK, DynamoDB, and Lambda. Real architecture decisions, initial setup, and lessons learned from building URL shorteners at scale.

Series Navigation

This is Part 1 of a 5-part series on building a production-grade link shortener:

  1. Part 1: Project Setup & Basic Infrastructure (You are here)
  2. Part 2: Core Functionality & API Development
  3. Part 3: Advanced Features & Security
  4. Part 4: Production Deployment & Optimization
  5. Part 5: Scaling & Maintenance

Introduction: Building for Real-World Scale

Last month, during our quarterly planning meeting, the marketing team made an urgent request: "We need branded short links for all our campaigns. Can you build something by next week?" The easy answer would've been to grab a SaaS solution, but when you're handling 5-10 million redirects per month and need custom analytics, building your own starts making sense.

Here's the thing about link shorteners - they seem simple until you hit production. Then you discover all the fun edge cases: redirect loops, malicious URLs, analytics at scale, and my personal favorite - when someone accidentally creates a short link that points to another short link that points back to the first one during a major campaign launch.

Let me walk you through building a production-grade link shortener with AWS CDK that won't wake you up during your vacation.

The Architecture That Survived Black Friday

Before writing any code, I spent a week sketching architectures on napkins (literally - coffee shop napkins are great for system design). Here's what we landed on:

This architecture handles about 2,000 requests per second without breaking a sweat. The key decisions:

  1. CloudFront for caching - Why hit your Lambda for the same redirect 10,000 times?
  2. DynamoDB over RDS - Predictable performance at scale, no connection pooling headaches
  3. Separate Lambda functions - Easier to scale and debug when things go wrong
  4. DAX for hot paths - Because that one viral link will hammer your database

Setting Up Your CDK Project (The Right Way)

First lesson: don't just run cdk init. Take five minutes to set up your project structure properly. You'll thank yourself later when you aren't refactoring everything at twice the scale.

bash
# Create project with TypeScript from the start
mkdir link-shortener && cd link-shortener
npx cdk init app --language typescript

# Install dependencies we'll actually need (CDK v2)
npm install aws-cdk-lib@latest constructs@latest \
  @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb \
  nanoid zod

# Dev dependencies for sanity (express is only used by the local dev server)
npm install -D @types/aws-lambda @types/node esbuild \
  prettier eslint tsx \
  @typescript-eslint/parser @typescript-eslint/eslint-plugin \
  express @types/express

Your project structure should look like this:

link-shortener/
├── bin/
│   └── link-shortener.ts       # CDK app entry point
├── lib/
│   ├── stacks/
│   │   ├── api-stack.ts        # API Gateway + Lambda
│   │   ├── database-stack.ts   # DynamoDB tables
│   │   └── cdn-stack.ts        # CloudFront distribution
│   └── constructs/
│       ├── link-table.ts       # DynamoDB construct
│       └── lambda-function.ts  # Reusable Lambda construct
├── src/
│   ├── handlers/
│   │   ├── create.ts           # Create short link
│   │   ├── redirect.ts         # Handle redirects
│   │   └── analytics.ts        # Track clicks
│   └── utils/
│       ├── id-generator.ts     # Short ID generation
│       └── url-validator.ts    # URL validation
├── test/
└── cdk.json

DynamoDB Design: Lessons from High-Volume Production

Here's where most tutorials go wrong - they show you a basic table with id and url. That's cute, but it won't survive production. After three database migrations (each more painful than the last), here's the schema that actually works:

typescript
// lib/constructs/link-table.ts
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import { RemovalPolicy } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class LinkTable extends Construct {
  public readonly table: dynamodb.Table;

  constructor(scope: Construct, id: string) {
    super(scope, id);

    this.table = new dynamodb.Table(this, 'LinksTable', {
      partitionKey: {
        name: 'PK',
        type: dynamodb.AttributeType.STRING,
      },
      sortKey: {
        name: 'SK',
        type: dynamodb.AttributeType.STRING,
      },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST, // Start here, switch to provisioned when you know your patterns
      pointInTimeRecovery: true, // Because someone will delete something important
      stream: dynamodb.StreamViewType.NEW_AND_OLD_IMAGES, // For analytics and debugging
      timeToLiveAttribute: 'ExpiresAt', // Without this, TTL never runs; the value must be epoch seconds
      removalPolicy: RemovalPolicy.RETAIN, // Never accidentally delete production data
    });

    // GSI for looking up by original URL (deduplication)
    this.table.addGlobalSecondaryIndex({
      indexName: 'GSI1',
      partitionKey: {
        name: 'GSI1PK',
        type: dynamodb.AttributeType.STRING,
      },
      sortKey: {
        name: 'GSI1SK',
        type: dynamodb.AttributeType.STRING,
      },
    });

    // GSI for analytics queries
    this.table.addGlobalSecondaryIndex({
      indexName: 'GSI2',
      partitionKey: {
        name: 'GSI2PK',
        type: dynamodb.AttributeType.STRING,
      },
      sortKey: {
        name: 'CreatedAt',
        type: dynamodb.AttributeType.NUMBER,
      },
    });
  }
}

Why this schema? Let me show you with real data:

typescript
import { randomUUID } from 'crypto';

// Example records in the table
const linkRecord = {
  PK: 'LINK#abc123',            // Short code
  SK: 'METADATA',               // Allows future expansion
  GSI1PK: 'URL#https://example.com/very/long/url',
  GSI1SK: 'LINK#abc123',        // For deduplication
  GSI2PK: 'USER#user123',       // Who created it
  CreatedAt: 1706544000000,     // Timestamp (ms) for sorting
  OriginalUrl: 'https://example.com/very/long/url',
  ClickCount: 0,
  ExpiresAt: 1738080000,        // TTL (DynamoDB expects epoch seconds, not milliseconds)
  Tags: ['campaign-2024', 'email'],
  CustomSlug: 'summer-sale',    // Optional custom slug
};

const clickRecord = {
  PK: 'LINK#abc123',
  SK: `CLICK#${Date.now()}#${randomUUID()}`, // Unique click event
  UserAgent: 'Mozilla/5.0...',
  IPHash: 'hashed-ip',          // Privacy-compliant
  Referer: 'https://twitter.com',
  Timestamp: 1706544000000,
};

This design lets you:

  • Query all data for a link with one request
  • Deduplicate URLs efficiently
  • Track individual clicks for analytics
  • Support custom slugs without conflicts
  • Expire links automatically with TTL
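
To make the key layout concrete, here is a small sketch of helper functions that build and parse these keys, plus the query parameters that fetch every item for one link in a single request. All names (`linkPk`, `clickSk`, `allItemsForLink`, `clicksForLink`) are hypothetical helpers for illustration, not part of the original codebase. Zero-padding the timestamp is an added precaution so click keys sort chronologically as strings.

```typescript
// Hypothetical key helpers for the single-table layout described above.
const LINK_PREFIX = 'LINK#';
const CLICK_PREFIX = 'CLICK#';

const linkPk = (shortId: string): string => `${LINK_PREFIX}${shortId}`;
const urlIndexPk = (url: string): string => `URL#${url}`;

// Pad to 13 digits so string ordering matches numeric ordering of millisecond timestamps
function clickSk(timestampMs: number, eventId: string): string {
  return `${CLICK_PREFIX}${String(timestampMs).padStart(13, '0')}#${eventId}`;
}

// Recover the short code from a partition key such as 'LINK#abc123'
function shortIdFromPk(pk: string): string {
  if (!pk.startsWith(LINK_PREFIX)) {
    throw new Error(`Not a link key: ${pk}`);
  }
  return pk.slice(LINK_PREFIX.length);
}

// Query parameters that return the METADATA item and all CLICK items for one link
function allItemsForLink(tableName: string, shortId: string) {
  return {
    TableName: tableName,
    KeyConditionExpression: 'PK = :pk',
    ExpressionAttributeValues: { ':pk': linkPk(shortId) },
  };
}

// Same partition, narrowed to click events only
function clicksForLink(tableName: string, shortId: string) {
  return {
    TableName: tableName,
    KeyConditionExpression: 'PK = :pk AND begins_with(SK, :prefix)',
    ExpressionAttributeValues: { ':pk': linkPk(shortId), ':prefix': CLICK_PREFIX },
  };
}

console.log(clicksForLink('links', 'abc123'));
```

Either parameter object can be passed to `QueryCommand` from `@aws-sdk/lib-dynamodb`, the same command the create handler uses for deduplication.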

The Lambda That Handles Everything

Here's the create handler that's processed millions of links:

typescript
// src/handlers/create.ts
import type { APIGatewayProxyHandlerV2 } from 'aws-lambda';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand, QueryCommand } from '@aws-sdk/lib-dynamodb';
import { generateShortId, validateCustomSlug } from '../utils/id-generator';
import { validateUrl } from '../utils/url-validator';

const client = new DynamoDBClient({});
const ddb = DynamoDBDocumentClient.from(client, {
  marshallOptions: { removeUndefinedValues: true },
});

const TABLE_NAME = process.env.TABLE_NAME!;
const DOMAIN = process.env.SHORT_DOMAIN!;
const DAY_MS = 24 * 60 * 60 * 1000;

export const handler: APIGatewayProxyHandlerV2 = async (event) => {
  const startTime = Date.now();

  try {
    const body = JSON.parse(event.body || '{}');
    const { url, customSlug, expiresInDays = 365, tags = [] } = body;

    // Validate URL (learned this the hard way)
    const validation = await validateUrl(url);
    if (!validation.isValid) {
      return {
        statusCode: 400,
        body: JSON.stringify({
          error: validation.error,
          details: validation.details,
        }),
      };
    }

    // Custom slugs go through the same rules as everything else
    if (customSlug) {
      const slugCheck = validateCustomSlug(customSlug);
      if (!slugCheck.valid) {
        return {
          statusCode: 400,
          body: JSON.stringify({ error: 'Invalid custom slug', details: slugCheck.reason }),
        };
      }
    }

    // Check for existing short link (deduplication)
    const existing = await ddb.send(new QueryCommand({
      TableName: TABLE_NAME,
      IndexName: 'GSI1',
      KeyConditionExpression: 'GSI1PK = :pk',
      ExpressionAttributeValues: {
        ':pk': `URL#${url}`,
      },
      Limit: 1,
    }));

    if (existing.Items?.length) {
      const existingLink = existing.Items[0];
      console.log(`Deduplication hit: ${existingLink.PK}`);
      return {
        statusCode: 200,
        body: JSON.stringify({
          shortUrl: `${DOMAIN}/${existingLink.PK.replace('LINK#', '')}`,
          isNew: false,
          processingTime: Date.now() - startTime,
        }),
      };
    }

    // The authorizer context is not part of the base V2 event type; adjust to your authorizer
    const userId: string | undefined = (event.requestContext as any)?.authorizer?.userId;

    // Compute timestamps once so the stored item and the response agree
    const createdAt = Date.now();
    const expiresAtMs = createdAt + expiresInDays * DAY_MS;

    // Generate short ID with collision detection
    let shortId = customSlug || generateShortId();
    let attempts = 0;
    let created = false;
    const maxAttempts = 5;

    while (attempts < maxAttempts) {
      try {
        await ddb.send(new PutCommand({
          TableName: TABLE_NAME,
          Item: {
            PK: `LINK#${shortId}`,
            SK: 'METADATA',
            GSI1PK: `URL#${url}`,
            GSI1SK: `LINK#${shortId}`,
            GSI2PK: userId ? `USER#${userId}` : 'ANONYMOUS',
            CreatedAt: createdAt,
            OriginalUrl: url,
            ClickCount: 0,
            ExpiresAt: Math.floor(expiresAtMs / 1000), // DynamoDB TTL expects epoch seconds
            Tags: tags,
            CreatedBy: userId,
            SourceIP: event.requestContext?.http?.sourceIp,
          },
          ConditionExpression: 'attribute_not_exists(PK)',
        }));

        created = true;
        break; // Success!
      } catch (error: any) {
        if (error.name === 'ConditionalCheckFailedException') {
          if (customSlug) {
            return {
              statusCode: 409,
              body: JSON.stringify({
                error: 'Custom slug already exists',
                suggestion: generateShortId(),
              }),
            };
          }
          shortId = generateShortId(); // Try another ID
          attempts++;
        } else {
          throw error;
        }
      }
    }

    if (!created) {
      // Every attempt collided; never hand back a link that was not stored
      throw new Error(`Could not allocate a unique short ID after ${maxAttempts} attempts`);
    }

    return {
      statusCode: 201,
      body: JSON.stringify({
        shortUrl: `${DOMAIN}/${shortId}`,
        shortId,
        expiresAt: new Date(expiresAtMs).toISOString(),
        processingTime: Date.now() - startTime,
      }),
    };
  } catch (error) {
    console.error('Error creating short link:', error);
    return {
      statusCode: 500,
      body: JSON.stringify({
        error: 'Internal server error',
        requestId: event.requestContext?.requestId,
      }),
    };
  }
};
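
The handler imports `validateUrl` from `../utils/url-validator`, which is not shown in this part. As a rough sketch of what such a validator might check, assuming the `{ isValid, error, details }` result shape the handler reads: the length limit, the error strings, and the `ownHosts` parameter are all assumptions made for illustration. This version is synchronous; `await` on a plain value still works, so the handler's call site would not need to change.

```typescript
// Hypothetical sketch of src/utils/url-validator.ts; not the original implementation.
interface UrlValidation {
  isValid: boolean;
  error?: string;
  details?: string;
}

const MAX_URL_LENGTH = 2048; // Assumed limit
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

// ownHosts: hostnames served by the shortener itself, used to reject self-referencing links
function validateUrl(input: unknown, ownHosts: string[] = []): UrlValidation {
  if (typeof input !== 'string' || input.length === 0) {
    return { isValid: false, error: 'Missing URL', details: 'Expected a non-empty string' };
  }
  if (input.length > MAX_URL_LENGTH) {
    return { isValid: false, error: 'URL too long', details: `Max ${MAX_URL_LENGTH} characters` };
  }

  let parsed: URL;
  try {
    parsed = new URL(input); // Throws on relative or malformed URLs
  } catch {
    return { isValid: false, error: 'Malformed URL', details: 'Could not parse as an absolute URL' };
  }

  // Blocks javascript:, data:, file: and friends
  if (!ALLOWED_PROTOCOLS.has(parsed.protocol)) {
    return { isValid: false, error: 'Unsupported protocol', details: `Got ${parsed.protocol}` };
  }

  // A short link pointing back at the shortener is the simplest redirect loop
  if (ownHosts.includes(parsed.hostname)) {
    return { isValid: false, error: 'Redirect loop', details: 'Target points at the shortener itself' };
  }

  return { isValid: true };
}

console.log(validateUrl("javascript:alert('xss')"));
```

This only catches direct self-references. Loops that bounce through another shortener, or phishing targets, need checks beyond what a pure string validator can do.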

The ID Generator That Won't Fail You

After trying nanoid, shortid, and a bunch of other libraries, here's what actually works in production:

typescript
// src/utils/id-generator.ts
import { randomInt } from 'crypto';

// Removed ambiguous characters (0, O, l, I) after support got confused
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
const ID_LENGTH = 7; // 58 characters ^ 7 positions gives us about 2.2 trillion combinations

export function generateShortId(length: number = ID_LENGTH): string {
  let id = '';

  for (let i = 0; i < length; i++) {
    // randomInt is uniform over [0, ALPHABET.length); `byte % 58` would favor the first 24 characters
    id += ALPHABET[randomInt(ALPHABET.length)];
  }

  return id;
}

// For custom slugs - learned these rules from angry users
export function validateCustomSlug(slug: string): { valid: boolean; reason?: string } {
  if (slug.length < 3) {
    return { valid: false, reason: 'Too short (min 3 characters)' };
  }

  if (slug.length > 50) {
    return { valid: false, reason: 'Too long (max 50 characters)' };
  }

  // Only alphanumeric and hyphens, must start/end with alphanumeric
  if (!/^[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]$/.test(slug)) {
    return { valid: false, reason: 'Invalid characters or format' };
  }

  // Reserved words that caused issues
  const reserved = ['api', 'admin', 'dashboard', 'login', 'logout', 'static', 'health'];
  if (reserved.includes(slug.toLowerCase())) {
    return { valid: false, reason: 'Reserved keyword' };
  }

  return { valid: true };
}
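
To see why five retries is plenty, it helps to work through the numbers. The alphabet above has 58 characters, so a 7-character ID has 58^7 possible values. The sketch below computes that keyspace and the chance that a fresh random ID collides with one already stored. The function names and the figure of 10 million stored links are assumptions chosen for illustration.

```typescript
// Number of distinct IDs for a given alphabet size and ID length.
// Integer multiplication keeps the result exact while it stays below 2^53.
function keyspaceSize(alphabetSize: number, length: number): number {
  let size = 1;
  for (let i = 0; i < length; i++) {
    size *= alphabetSize;
  }
  return size;
}

// Chance that one freshly generated ID matches any of the IDs already stored
function collisionChance(existingIds: number, alphabetSize: number, length: number): number {
  return Math.min(1, existingIds / keyspaceSize(alphabetSize, length));
}

// Chance that every one of maxAttempts independent tries collides
function allAttemptsCollideChance(
  existingIds: number,
  alphabetSize: number,
  length: number,
  maxAttempts: number,
): number {
  return Math.pow(collisionChance(existingIds, alphabetSize, length), maxAttempts);
}

const space = keyspaceSize(58, 7); // 2207984167552, about 2.2 trillion
const perAttempt = collisionChance(10_000_000, 58, 7); // roughly 4.5 in a million
const exhausted = allAttemptsCollideChance(10_000_000, 58, 7, 5);

console.log({ space, perAttempt, exhausted });
```

Even with 10 million links stored, a single retry is rare and running out of all five attempts is vanishingly unlikely, so the conditional write plus a short retry loop is enough.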

Local Development That Doesn't Suck

Set up local development properly from day one. Trust me, you don't want to deploy to AWS every time you change a console.log:

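typescript
// local-dev.ts
import express from 'express';
import type { Request, Response } from 'express';

// Point the handlers at local resources. This has to run before the handlers are
// imported, because they read process.env when their modules load.
process.env.TABLE_NAME = 'local-links';
process.env.SHORT_DOMAIN = 'http://localhost:3000';
process.env.AWS_REGION = 'us-east-1';
process.env.AWS_ENDPOINT_URL = 'http://localhost:8000'; // DynamoDB Local; honored by recent AWS SDK v3 releases
process.env.AWS_ACCESS_KEY_ID ??= 'local'; // DynamoDB Local accepts any credentials
process.env.AWS_SECRET_ACCESS_KEY ??= 'local';

// Wrap Lambda handlers for Express
const lambdaToExpress = (handler: any) => async (req: Request, res: Response) => {
  const event = {
    body: JSON.stringify(req.body),
    pathParameters: req.params,
    queryStringParameters: req.query,
    requestContext: {
      http: {
        sourceIp: req.ip,
      },
      requestId: Math.random().toString(36),
    },
  };

  const result = await handler(event);
  if (result.headers) res.set(result.headers); // e.g. Location on redirects
  res.status(result.statusCode);
  if (result.body) {
    res.type('application/json').send(result.body);
  } else {
    res.end(); // Redirect responses usually have no body
  }
};

async function main() {
  // Dynamic imports so the env vars above are set before the handlers load
  const { handler: createHandler } = await import('./src/handlers/create');
  const { handler: redirectHandler } = await import('./src/handlers/redirect');

  const app = express();
  app.use(express.json());

  app.post('/create', lambdaToExpress(createHandler));
  app.get('/:id', lambdaToExpress(redirectHandler));

  app.listen(3000, () => {
    console.log('Local dev server running on http://localhost:3000');
    console.log('DynamoDB Local required on port 8000');
  });
}

main().catch((error) => {
  console.error('Failed to start local dev server:', error);
  process.exit(1);
});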

Run DynamoDB locally:

bash
docker run -p 8000:8000 amazon/dynamodb-local \
  -jar DynamoDBLocal.jar -sharedDb -inMemory
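
DynamoDB Local starts empty, so the `local-links` table has to be created before the handlers can use it. One way to do that is to mirror the CDK construct as a plain table definition. The sketch below is an assumption about how that could look, not a script from the original project. In a real project you would annotate the object as `CreateTableCommandInput` from `@aws-sdk/client-dynamodb`, which is left out here to keep the block dependency-free.

```typescript
// Plain-object table definition mirroring the CDK construct, for DynamoDB Local.
// 'local-links' matches the TABLE_NAME set in local-dev.ts.
const localTableDefinition = {
  TableName: 'local-links',
  BillingMode: 'PAY_PER_REQUEST',
  // Only attributes used in a key schema may be declared here
  AttributeDefinitions: [
    { AttributeName: 'PK', AttributeType: 'S' },
    { AttributeName: 'SK', AttributeType: 'S' },
    { AttributeName: 'GSI1PK', AttributeType: 'S' },
    { AttributeName: 'GSI1SK', AttributeType: 'S' },
    { AttributeName: 'GSI2PK', AttributeType: 'S' },
    { AttributeName: 'CreatedAt', AttributeType: 'N' },
  ],
  KeySchema: [
    { AttributeName: 'PK', KeyType: 'HASH' },
    { AttributeName: 'SK', KeyType: 'RANGE' },
  ],
  GlobalSecondaryIndexes: [
    {
      IndexName: 'GSI1', // Deduplication by original URL
      KeySchema: [
        { AttributeName: 'GSI1PK', KeyType: 'HASH' },
        { AttributeName: 'GSI1SK', KeyType: 'RANGE' },
      ],
      Projection: { ProjectionType: 'ALL' },
    },
    {
      IndexName: 'GSI2', // Analytics queries per creator, sorted by creation time
      KeySchema: [
        { AttributeName: 'GSI2PK', KeyType: 'HASH' },
        { AttributeName: 'CreatedAt', KeyType: 'RANGE' },
      ],
      Projection: { ProjectionType: 'ALL' },
    },
  ],
};

console.log(JSON.stringify(localTableDefinition, null, 2));
```

Passing this object to `new CreateTableCommand(...)` and sending it through a `DynamoDBClient` configured with `endpoint: 'http://localhost:8000'` should create the table. Because the container runs with `-inMemory`, the table has to be recreated after every restart.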

Deploy Script That Won't Ruin Your Day

Add these to the scripts section of package.json:

json
{
  "scripts": {
    "build": "tsc",
    "watch": "tsc -w",
    "test": "jest",
    "cdk": "cdk",
    "local": "tsx watch local-dev.ts",
    "deploy:dev": "cdk deploy --all --context environment=dev",
    "deploy:prod": "cdk deploy --all --context environment=prod --require-approval never",
    "destroy:dev": "cdk destroy --all --context environment=dev",
    "synth": "cdk synth --quiet",
    "diff": "cdk diff --all"
  }
}
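
The deploy scripts pass `--context environment=dev` or `--context environment=prod`, so the CDK app has to read that value somewhere. A minimal sketch of one way to do it, assuming a hypothetical `resolveEnvironment` helper. The only CDK API involved is `app.node.tryGetContext`, which is passed in as a function so the helper can be tested without CDK.

```typescript
type EnvironmentName = 'dev' | 'prod';

// Minimal shape of what we need from CDK: (key) => app.node.tryGetContext(key)
type ContextReader = (key: string) => unknown;

// Hypothetical helper: fail fast on a missing or misspelled environment
// instead of silently deploying with defaults.
function resolveEnvironment(tryGetContext: ContextReader): EnvironmentName {
  const value = tryGetContext('environment');
  if (value === 'dev' || value === 'prod') {
    return value;
  }
  throw new Error(
    `Unknown or missing environment context: ${String(value)}. ` +
      'Pass --context environment=dev or --context environment=prod.',
  );
}

// Stand-in for the CDK context, for illustration only
const fakeContext: Record<string, unknown> = { environment: 'dev' };
console.log(resolveEnvironment((key) => fakeContext[key]));
```

In `bin/link-shortener.ts` this would be called as `resolveEnvironment((key) => app.node.tryGetContext(key))`, and the result used to pick stack names or removal policies per environment.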

Performance Numbers from Production

After running this for 6 months, here are the real numbers:

  • Create endpoint: p50: 45ms, p99: 120ms
  • Redirect endpoint (cold start): p50: 15ms, p99: 80ms
  • Redirect endpoint (warm): p50: 8ms, p99: 25ms
  • DynamoDB costs: ~$6.25/month for 5-10M redirects (25M read units @ $0.25 per million)
  • Lambda costs: $12/month (most redirects served from CloudFront)
  • CloudFront costs: $85/month (worth every penny for caching)
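
The DynamoDB figure is plain arithmetic: 25 million on-demand read units at $0.25 per million. A tiny sketch of that calculation, using only the numbers quoted above. The function name is made up, and actual AWS pricing varies by region and changes over time.

```typescript
// On-demand read cost: units are billed per million
function onDemandReadCostUsd(readRequestUnits: number, pricePerMillionUsd: number): number {
  return (readRequestUnits / 1_000_000) * pricePerMillionUsd;
}

const dynamoDbUsd = onDemandReadCostUsd(25_000_000, 0.25); // 6.25
const lambdaUsd = 12; // From the numbers above
const cloudFrontUsd = 85; // From the numbers above
const monthlyTotalUsd = dynamoDbUsd + lambdaUsd + cloudFrontUsd; // 103.25

console.log({ dynamoDbUsd, monthlyTotalUsd });
```

At this volume CloudFront is most of the monthly bill, which is the trade being made: paying for cache hits instead of Lambda invocations and DynamoDB reads.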

Lessons Learned the Hard Way

  1. Start with on-demand DynamoDB - You don't know your access patterns yet. We switched to provisioned after 3 months and saved 60%.

  2. Log everything, retain nothing - We logged every click initially. The CloudWatch bill was... educational. Now we sample 1% and use metrics for the rest.

  3. Cache aggressively - That viral link that got 500,000 clicks in an hour? CloudFront saved us from a massive Lambda bill.

  4. Validate URLs properly - Someone will try to create a short link to javascript:alert('xss'). Someone will create redirect loops. Someone will use your service for phishing. Plan for it.

  5. Rate limiting from day one - We didn't add it initially. Then someone's script created 100,000 links in 10 minutes during a product launch. Fun times.
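
The 1% log sampling from lesson 2 takes only a few lines. Here is a minimal sketch, assuming a hypothetical `shouldSample` helper. The random source is injectable so the behavior can be tested deterministically.

```typescript
// Returns true for roughly `sampleRate` of calls (0.01 = 1%).
// `random` defaults to Math.random and must return a value in [0, 1).
function shouldSample(sampleRate: number, random: () => number = Math.random): boolean {
  if (sampleRate <= 0) return false;
  if (sampleRate >= 1) return true;
  return random() < sampleRate;
}

// Hypothetical usage in a click handler: log full detail for about 1% of clicks
function logClick(shortId: string, details: Record<string, unknown>): void {
  if (shouldSample(0.01)) {
    console.log(JSON.stringify({ shortId, ...details }));
  }
}

logClick('abc123', { referer: 'https://twitter.com' });
```

Counts and latencies still need to cover every request, which is why the sampled logs are paired with metrics rather than replacing them.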

Next Steps in This Series

Ready to implement the core functionality? In Part 2: Core Functionality & API Development, we'll:

  • Build the redirect handler with smart caching strategies
  • Implement analytics that won't break the bank
  • Add rate limiting and abuse prevention
  • Set up monitoring that actually tells you when things are broken

Quick Preview of the Complete Series:

  • Part 3: Advanced features including custom domains, QR codes, and bulk operations
  • Part 4: Production deployment with blue-green deployments and zero-downtime migrations
  • Part 5: Scaling strategies and long-term maintenance patterns

The complete code for this series is on GitHub, including migration scripts and performance tests.

Remember: link shorteners are simple until they're not. Build for scale from the start, but deploy what works today. And always, always validate those URLs.
