
API Versioning with AWS CDK: A Production Case Study

A technical case study on implementing multi-version APIs in production. Failed approaches, working solutions, and CDK patterns for managing API evolution.

Abstract

This case study examines the implementation of a production API versioning system using AWS CDK. Through analysis of three failed approaches and one working solution, we explore practical patterns for managing API evolution while maintaining client compatibility. The approach we ultimately adopted, path-based versioning with explicit lifecycle management, lets several API versions run side by side while keeping operational overhead predictable.

Problem Statement

API evolution creates an inevitable conflict: the need to improve and change the API while maintaining backward compatibility for existing clients. The challenge intensifies in enterprise environments where clients have varying update capabilities and deployment windows.

The specific challenge addressed here involved:

  • Multiple enterprise clients with different integration capabilities
  • Varying deployment cycles (from weekly to 18-month government cycles)
  • Need for API improvements without breaking existing integrations
  • Limited development resources for maintaining multiple versions

Failed Approaches

Three approaches were attempted before arriving at the working solution, each failing for different technical and operational reasons.

Failed Approach #1: No Versioning Strategy

The initial approach assumed all clients could be updated simultaneously, eliminating the need for versioning.

Implementation: Single API endpoint with continuous updates

Timeline: 6 months from launch to failure

Client Growth: 5 initial clients → 50 clients

Failure Points:

  • Government client with air-gapped networks required 18-month update cycles
  • Manual backporting of security fixes became unsustainable
  • Shadow API maintenance created significant infrastructure complexity
  • Development velocity decreased as every change required compatibility analysis

Failed Approach #2: Over-Versioning

The second approach attempted to version every aspect of the API independently.

Implementation: Separate versioning for endpoints, headers, and response formats

```http
GET /v2/users?response_version=1.3
X-API-Version: 2.1
Accept: application/vnd.company.user.v4+json
```

Failure Points:

  • 25+ version combinations created exponential testing complexity
  • Developer cognitive load became unsustainable
  • Client integration difficulty increased significantly
  • Documentation maintenance became impossible

Failed Approach #3: Intelligent Routing

The third approach used client fingerprinting to automatically route requests to appropriate API versions.

Implementation: Lambda@Edge function with client detection logic

Performance Impact: +150ms latency per request

Failure Points:

  • Single point of failure affected all API versions
  • Client detection logic proved unreliable
  • Performance degradation unacceptable for production use
  • High operational complexity for minimal benefit

Working Solution: Path-Based Versioning with Lifecycle Management

The successful approach combines path-based versioning with comprehensive lifecycle management and automated deprecation warnings.

```typescript
// lib/config/api-versions.ts
export interface ApiVersion {
  version: string;
  status: 'alpha' | 'beta' | 'stable' | 'deprecated' | 'sunset';
  launchedAt: Date;
  deprecatedAt?: Date;
  sunsetAt?: Date;
  monthlyActiveClients?: number;  // Track this!
  breakingChanges: string[];
  supportedFeatures: Set<string>;
}

export const API_VERSIONS: Record<string, ApiVersion> = {
  v1: {
    version: 'v1',
    status: 'deprecated',
    launchedAt: new Date('2022-01-15'),
    deprecatedAt: new Date('2024-01-15'),
    sunsetAt: new Date('2025-01-15'),
    monthlyActiveClients: 28,  // Legacy government clients
    breakingChanges: [],
    supportedFeatures: new Set(['basic-crud']),
  },
  v2: {
    version: 'v2',
    status: 'stable',
    launchedAt: new Date('2023-06-01'),
    monthlyActiveClients: 156,
    breakingChanges: [
      'Changed userId to user_id in all responses',
      'Removed XML support',
      'Made email field required',
    ],
    supportedFeatures: new Set(['basic-crud', 'pagination', 'filtering']),
  },
  v3: {
    version: 'v3',
    status: 'beta',
    launchedAt: new Date('2024-03-01'),
    monthlyActiveClients: 42,
    breakingChanges: [
      'Moved to JSON:API spec',
      'Changed all IDs to UUIDs',
      'Nested resources under data property',
    ],
    supportedFeatures: new Set([
      'basic-crud',
      'pagination',
      'filtering',
      'webhooks',
      'graphql',
      'batch-operations',
    ]),
  },
};
```
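One benefit of keeping lifecycle data in a single config is that deprecation headers can be derived from it instead of being hard-coded in each handler. The following is a minimal sketch of that idea, not the production code: `lifecycleHeaders` and the trimmed-down `VersionLifecycle` type are hypothetical names, and the header set simply mirrors what the v1 handler in this article sends.

```typescript
// Hypothetical helper: derives response headers from a version's lifecycle entry.
type VersionStatus = 'alpha' | 'beta' | 'stable' | 'deprecated' | 'sunset';

interface VersionLifecycle {
  version: string;
  status: VersionStatus;
  sunsetAt?: Date;
}

function lifecycleHeaders(
  v: VersionLifecycle,
  successor?: string,
): Record<string, string> {
  const headers: Record<string, string> = { 'X-API-Version': v.version };
  // Versions that are not deprecated or sunset only get the version header.
  if (v.status !== 'deprecated' && v.status !== 'sunset') return headers;

  headers['X-API-Deprecated'] = 'true';
  if (v.sunsetAt) {
    // Same YYYY-MM-DD form the v1 handler uses for X-API-Sunset.
    headers['X-API-Sunset'] = v.sunsetAt.toISOString().slice(0, 10);
  }
  const hint = successor ? ` Please migrate to ${successor}.` : '';
  headers['Warning'] = `299 - "API ${v.version} is deprecated.${hint}"`;
  return headers;
}

const v1Headers = lifecycleHeaders(
  { version: 'v1', status: 'deprecated', sunsetAt: new Date('2025-01-15') },
  'v2',
);
console.log(v1Headers);
```

With a helper along these lines, changing a sunset date in the config would update every response without touching handler code.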

The CDK Stack That Powers Our APIs

The production CDK implementation handles substantial traffic across multiple API versions:

```typescript
// lib/stacks/versioned-api-stack.ts
import { IResource, LambdaIntegration, MethodLoggingLevel, RestApi } from 'aws-cdk-lib/aws-apigateway';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import { Alarm, Metric } from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';
import { API_VERSIONS, ApiVersion } from '../config/api-versions';

export class VersionedApiStack extends Stack {
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);

    const api = new RestApi(this, 'MultiVersionAPI', {
      restApiName: 'production-api',
      // Learned this the hard way: always enable CloudWatch
      deployOptions: {
        loggingLevel: MethodLoggingLevel.INFO,
        // Essential for debugging version-specific issues.
        // Note: this logs full request/response payloads, so review it for sensitive data.
        dataTraceEnabled: true,
        metricsEnabled: true,
        tracingEnabled: true,
      },
    });

    // Add the version check Lambda - this is crucial
    const versionCheckFn = new NodejsFunction(this, 'VersionCheck', {
      entry: 'src/middleware/version-check.ts',
      memorySize: 256,  // Don't need much
      timeout: Duration.seconds(3),
      environment: {
        // Sets serialize to {} with plain JSON.stringify, so convert them to arrays
        VERSIONS: JSON.stringify(API_VERSIONS, (_key, value) =>
          value instanceof Set ? [...value] : value,
        ),
        SLACK_WEBHOOK: process.env.SLACK_WEBHOOK!,  // Alert on deprecated version usage
      },
    });

    // Set up each version
    Object.entries(API_VERSIONS).forEach(([version, config]) => {
      if (config.status === 'sunset') return;  // Don't deploy sunset versions

      const versionResource = api.root.addResource(version);
      this.setupVersionEndpoints(versionResource, config);
    });

    // Critical: version discovery endpoint
    // (addVersionDiscovery is defined elsewhere in this class; not shown here)
    this.addVersionDiscovery(api);

    // The alarm that saved us during the v1 sunset
    new Alarm(this, 'DeprecatedVersionHighUsage', {
      metric: new Metric({
        namespace: 'API/Versions',
        metricName: 'DeprecatedVersionCalls',
        statistic: 'Sum',
      }),
      threshold: 1000,
      evaluationPeriods: 1,
    });
  }

  private setupVersionEndpoints(resource: IResource, config: ApiVersion) {
    // Architecture: 24 Lambda functions across versions
    // Separate functions per version ensure isolation
    const handlers = new Map<string, NodejsFunction>();

    // User endpoints - the source of most breaking changes
    const usersResource = resource.addResource('users');

    const listUsersHandler = new NodejsFunction(this, `ListUsers-${config.version}`, {
      entry: `src/handlers/${config.version}/users/list.ts`,
      memorySize: config.version === 'v1' ? 512 : 1024,  // V1 is inefficient
      timeout: Duration.seconds(29),  // API Gateway maximum timeout (up to 29 seconds for REST APIs)
      environment: {
        TABLE_NAME: process.env.USERS_TABLE!,
        VERSION: config.version,
        FEATURES: [...config.supportedFeatures].join(','),
        // This saved debugging time countless times
        DEPLOYMENT_TIME: new Date().toISOString(),
      },
      bundling: {
        // Version-specific dependencies
        externalModules: [
          '@aws-sdk/client-dynamodb',  // AWS SDK v3 for Node.js 18+ runtime
          '@aws-sdk/client-cloudwatch',
          ...(config.version === 'v1' ? ['xmlbuilder'] : []),  // V1 XML support
        ],
      },
    });
    handlers.set('listUsers', listUsersHandler);

    usersResource.addMethod('GET', new LambdaIntegration(listUsersHandler), {
      // In API Gateway, a value of true marks the parameter as required
      requestParameters: {
        'method.request.querystring.page': config.supportedFeatures.has('pagination'),
        'method.request.querystring.limit': config.supportedFeatures.has('pagination'),
        'method.request.querystring.filter': config.supportedFeatures.has('filtering'),
        // V3 specific parameters
        'method.request.querystring.include': config.version === 'v3',
        'method.request.querystring.fields': config.version === 'v3',
      },
    });

    // Track every version call - this metric is gold
    listUsersHandler.metricInvocations().createAlarm(this, `HighTraffic-${config.version}`, {
      threshold: 10000,
      evaluationPeriods: 1,
      alarmDescription: `High traffic on ${config.version} - check scaling`,
    });
  }
}
```

The Version Handlers That Actually Run

Here's the real code with all its warts:

```typescript
// src/handlers/v1/users/list.ts
// Legacy v1 implementation with minimal changes
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
// getAllUsers comes from the shared data layer (not shown)

export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  console.log('V1 handler called', {
    path: event.path,
    clientIp: event.requestContext.identity.sourceIp,
    userAgent: event.headers['User-Agent'],
  });

  try {
    // V1 doesn't support pagination, returns everything
    // V1 design limitation - maintained for compatibility
    const users = await getAllUsers();  // Returns all users - pagination added in v2

    // The field that caused the incident
    const transformedUsers = users.map(u => ({
      userId: u.user_id,  // V1 uses camelCase
      userName: u.name,
      userEmail: u.email,
      createdDate: u.created_at,  // Different field name because reasons
    }));

    return {
      statusCode: 200,
      headers: {
        'Content-Type': 'application/json',
        'X-API-Version': 'v1',
        'X-API-Deprecated': 'true',
        'X-API-Sunset': '2025-01-15',
        'Warning': '299 - "API v1 is deprecated. Please migrate to v2. Guides: https://docs.api.com/migration"',
        // Required by financial industry clients
        'X-Total-Count': transformedUsers.length.toString(),
      },
      body: JSON.stringify(transformedUsers),
    };
  } catch (error) {
    // Comprehensive error logging for troubleshooting
    console.error('V1 handler error', {
      error,
      // `error` is typed unknown in a catch clause, so narrow before reading .stack
      stack: error instanceof Error ? error.stack : undefined,
      event: JSON.stringify(event),
    });

    return {
      statusCode: 500,
      body: JSON.stringify({
        error: 'Internal Server Error',
        // V1 clients expect this exact format
        errorCode: 'INTERNAL_ERROR',
        timestamp: new Date().toISOString(),
      }),
    };
  }
};

// src/handlers/v2/users/list.ts
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
// cloudwatch, logger, getUsersPaginated and getRateLimitRemaining come from shared modules (not shown)

export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  // V2 added proper pagination after the 50K incident
  // Fall back to the defaults when a value is missing or not a number
  const page = Math.max(1, parseInt(event.queryStringParameters?.page || '1', 10) || 1);
  const limit = Math.min(
    Math.max(1, parseInt(event.queryStringParameters?.limit || '20', 10) || 20),
    100  // Maximum page size for performance
  );

  const metrics = {
    version: 'v2',
    page,
    limit,
    clientIp: event.requestContext.identity.sourceIp,
  };

  // Track deprecated version usage
  if (event.headers['User-Agent']?.includes('OldSDK/1.')) {
    await cloudwatch.putMetricData({
      Namespace: 'API/Clients',
      MetricData: [{
        MetricName: 'OutdatedSDKUsage',
        Value: 1,
        Dimensions: [{ Name: 'Version', Value: 'v2' }],
      }],
    }).promise();
  }

  try {
    const { users, total } = await getUsersPaginated({ page, limit });
    const totalPages = Math.ceil(total / limit);

    // V2 response format with pagination
    const response = {
      data: users.map(u => ({
        id: u.user_id,  // Changed from userId
        name: u.name,
        email: u.email,
        status: u.status || 'active',  // New required field
        created_at: u.created_at,  // Snake case everywhere
        updated_at: u.updated_at,
      })),
      pagination: {
        page,
        limit,
        total,
        total_pages: totalPages,
        has_next: page < totalPages,
        has_prev: page > 1,
      },
      // HATEOAS links for client navigation
      _links: {
        self: `/v2/users?page=${page}&limit=${limit}`,
        next: page < totalPages ? `/v2/users?page=${page + 1}&limit=${limit}` : null,
        prev: page > 1 ? `/v2/users?page=${page - 1}&limit=${limit}` : null,
      },
    };

    return {
      statusCode: 200,
      headers: {
        'Content-Type': 'application/json',
        'X-API-Version': 'v2',
        'X-RateLimit-Limit': '500',
        'X-RateLimit-Remaining': await getRateLimitRemaining(event),
        'Cache-Control': 'private, max-age=60',  // Prevent unintended caching
      },
      body: JSON.stringify(response),
    };
  } catch (error) {
    logger.error('V2 handler error', { error, metrics });
    throw error;  // Let API Gateway handle it (an unhandled Lambda error surfaces as a 502)
  }
};

// src/handlers/v3/users/list.ts
// V3: JSON:API specification implementation
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
// middy, its middleware, and the helpers used below are imported from their packages and shared modules (not shown)

export const handler = middy(async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  // JSON:API compliance for enterprise integration
  const params = parseJsonApiParams(event.queryStringParameters);

  // Feature flags for gradual rollout
  const features = await getFeatureFlags('v3', event.headers['X-Client-Id']);

  const { users, total, included } = await getUsersWithRelationships({
    ...params,
    includeRelationships: params.include,
    sparseFields: params.fields,
    experimentalFeatures: features,
  });

  // JSON:API format - love it or hate it
  const response = {
    data: users.map(u => ({
      type: 'users',
      id: u.id,  // UUID format for consistency
      attributes: {
        name: u.name,
        email: u.email,
        status: u.status,
        created_at: u.created_at,
        updated_at: u.updated_at,
      },
      relationships: {
        organization: {
          data: { type: 'organizations', id: u.organization_id },
        },
        roles: {
          data: u.role_ids.map(id => ({ type: 'roles', id })),
        },
      },
      links: {
        self: `/v3/users/${u.id}`,
      },
    })),
    included: included,  // Related resources
    meta: {
      pagination: {
        page: params.page.number,
        pages: Math.ceil(total / params.page.size),
        count: users.length,
        total: total,
      },
      api_version: 'v3',
      generated_at: new Date().toISOString(),
      experimental_features: [...features],
    },
    links: generateJsonApiLinks(params, total),
  };

  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/vnd.api+json',  // JSON:API requirement
      'X-API-Version': 'v3',
      'X-RateLimit-Limit': '1000',
      'X-RateLimit-Remaining': await getRateLimitRemaining(event),
      'Vary': 'Accept, X-Client-Id',  // Important for caching
    },
    body: JSON.stringify(response),
  };
})
  .use(jsonBodyParser())
  .use(httpErrorHandler())
  .use(correlationIds())
  .use(logTimeout())
  .use(warmup());
```
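The v3 handler relies on a `parseJsonApiParams` helper that is not shown. The sketch below is one plausible shape for it, written only to illustrate the `params.page.number`, `params.page.size`, `params.include` and `params.fields` values the handler reads. The `page[number]`/`page[size]` parameter names, the default page size of 20 and the cap of 100 are assumptions (the last two borrowed from the v2 handler), not confirmed details of the production code.

```typescript
// Hypothetical sketch of parseJsonApiParams; the real helper is not shown in this article.
interface JsonApiParams {
  page: { number: number; size: number };
  include: string[];
  fields: Record<string, string[]>;
}

type Query = Record<string, string | undefined> | null;

// Split a comma-separated value into trimmed, non-empty entries.
const splitList = (value: string | undefined): string[] =>
  value ? value.split(',').map(s => s.trim()).filter(s => s.length > 0) : [];

// Parse a positive integer, falling back when the value is missing or invalid.
const positiveInt = (value: string | undefined, fallback: number): number => {
  const n = Number.parseInt(value ?? '', 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
};

function parseJsonApiParams(query: Query, maxSize = 100): JsonApiParams {
  const q = query ?? {};
  const fields: Record<string, string[]> = {};
  for (const [key, value] of Object.entries(q)) {
    // Sparse fieldsets arrive as fields[TYPE]=a,b
    const type = /^fields\[([^\]]+)\]$/.exec(key)?.[1];
    if (type) fields[type] = splitList(value);
  }
  return {
    page: {
      number: positiveInt(q['page[number]'], 1),
      size: Math.min(positiveInt(q['page[size]'], 20), maxSize),
    },
    include: splitList(q['include']),
    fields,
  };
}

const parsed = parseJsonApiParams({
  'page[number]': '3',
  'page[size]': '500',
  include: 'organization, roles',
  'fields[users]': 'name,email',
});
console.log(parsed);
```

Clamping the page size inside the parser keeps the v2 lesson (unbounded result sets) from reappearing in v3.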

Migration Pain Points and Solutions

The Database Migration That Almost Killed Us

When moving from V1 to V2, we needed to change userId (string) to user_id (UUID). Here's how we did it without downtime:

```typescript
// migrations/v1-to-v2-user-ids.ts
// `dynamodb` (assumed to be an AWS SDK v2 DocumentClient) and generateUUID are set up elsewhere (not shown)
export const migrateUserIds = async () => {
  // BatchWriteItem accepts at most 25 put/delete requests per call
  const BATCH_SIZE = 25;
  const tableName = process.env.USERS_TABLE!;
  let lastEvaluatedKey: any = undefined;
  let migrated = 0;
  let failed = 0;

  // First pass: Add new field
  do {
    const { Items, LastEvaluatedKey } = await dynamodb.scan({
      TableName: tableName,
      Limit: BATCH_SIZE,
      ExclusiveStartKey: lastEvaluatedKey,
    }).promise();

    const batch = Items?.map(item => ({
      PutRequest: {
        Item: {
          ...item,
          user_id: item.userId || generateUUID(),  // New field
          _migration: 'v1-to-v2-phase1',
          _migrated_at: new Date().toISOString(),
        },
      },
    })) || [];

    if (batch.length > 0) {
      try {
        const result = await dynamodb.batchWrite({
          RequestItems: { [tableName]: batch },
        }).promise();
        // Throttled writes come back as UnprocessedItems rather than as an error
        const unprocessed = result.UnprocessedItems?.[tableName]?.length ?? 0;
        migrated += batch.length - unprocessed;
        failed += unprocessed;
      } catch (error) {
        // Log but don't stop - we'll retry failed items
        console.error('Batch failed', { error, batch: batch.map(b => b.PutRequest.Item.userId) });
        failed += batch.length;
      }
    }

    lastEvaluatedKey = LastEvaluatedKey;

    // Throttle to avoid hot partitions
    await new Promise(resolve => setTimeout(resolve, 100));

  } while (lastEvaluatedKey);

  console.log(`Migration complete: ${migrated} succeeded, ${failed} failed`);

  // Second pass: Remove old field (after all clients updated)
  // We waited 6 months for this
};
```
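Two DynamoDB behaviors matter for a migration like this: `BatchWriteItem` accepts at most 25 requests per call, and throttled writes are returned as unprocessed items instead of raising an error. The sketch below shows one way to chunk and retry with exponential backoff. It is deliberately decoupled from the AWS SDK: `writeAll`, `chunk` and the `WriteFn` callback are hypothetical names, and the callback is assumed to resolve to whatever items were left unprocessed.

```typescript
// Sketch: chunk items into groups of 25 and retry leftovers with backoff.
// WriteFn is an assumed abstraction over a batch write; it resolves to the unprocessed items.
type WriteFn<T> = (batch: T[]) => Promise<T[]>;

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function writeAll<T>(
  items: T[],
  write: WriteFn<T>,
  maxAttempts = 5,
  baseDelayMs = 50,
): Promise<{ written: number; failed: T[] }> {
  let written = 0;
  const failed: T[] = [];
  for (const batch of chunk(items, 25)) {
    let pending = batch;
    for (let attempt = 0; attempt < maxAttempts && pending.length > 0; attempt++) {
      if (attempt > 0) {
        // Exponential backoff before retrying whatever was left over.
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
      }
      const unprocessed = await write(pending);
      written += pending.length - unprocessed.length;
      pending = unprocessed;
    }
    // Anything still pending after the last attempt is reported, not dropped.
    failed.push(...pending);
  }
  return { written, failed };
}
```

Returning the failed items (rather than only a count) makes it possible to re-run the migration for just those records.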

Client SDK Backwards Compatibility

Our SDK had to work with all API versions. This is messy but necessary:

```typescript
// sdk/src/client.ts
export class ApiClient {
  private version: string;
  private warned = new Set<string>();

  constructor(options: ClientOptions = {}) {
    this.version = options.version || 'v2';  // Default to stable

    if (this.version === 'v1' && !this.warned.has('deprecation')) {
      console.warn(
        '\x1b[33m%s\x1b[0m',  // Yellow text
        '[DEPRECATION] API v1 will be sunset on 2025-01-15. ' +
        'Migration guide: https://docs.api.com/migration'
      );
      this.warned.add('deprecation');

      // Track SDK version usage
      this.trackEvent('sdk_deprecation_warning', { version: 'v1' });
    }
  }

  async getUsers(options?: GetUsersOptions) {
    const url = this.buildUrl('users', options);
    const response = await this.request(url);

    // Normalize responses across versions
    return this.normalizeUserResponse(response);
  }

  private normalizeUserResponse(response: any): User[] {
    switch (this.version) {
      case 'v1':
        // V1 returns flat array
        return response.map((u: any) => ({
          id: u.userId,
          name: u.userName,
          email: u.userEmail,
          createdAt: new Date(u.createdDate),
          // V1 doesn't have these
          status: 'active',
          updatedAt: new Date(u.createdDate),
        }));

      case 'v2':
        // V2 returns paginated response
        return response.data.map((u: any) => ({
          id: u.id,
          name: u.name,
          email: u.email,
          status: u.status,
          createdAt: new Date(u.created_at),
          updatedAt: new Date(u.updated_at),
        }));

      case 'v3':
        // V3 returns JSON:API format
        return response.data.map((u: any) => ({
          id: u.id,
          name: u.attributes.name,
          email: u.attributes.email,
          status: u.attributes.status,
          createdAt: new Date(u.attributes.created_at),
          updatedAt: new Date(u.attributes.updated_at),
          // V3 includes relationships
          organizationId: u.relationships?.organization?.data?.id,
          roleIds: u.relationships?.roles?.data?.map((r: any) => r.id) || [],
        }));

      default:
        throw new Error(`Unknown API version: ${this.version}`);
    }
  }
}
```
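The client above calls a `buildUrl` method that is not shown. Because the three versions treat pagination differently (none in v1, `page`/`limit` in v2, JSON:API-style paging in v3), URL construction has to branch on version too. The following standalone sketch illustrates that branching under stated assumptions: the function signature, the `ListOptions` type, the `page[number]`/`page[size]` parameter names for v3 and the `https://api.example.com` base URL are all hypothetical.

```typescript
// Hypothetical standalone version of the SDK's buildUrl; names and parameters are assumptions.
type ApiVersionId = 'v1' | 'v2' | 'v3';

interface ListOptions {
  page?: number;
  limit?: number;
}

function buildUrl(
  baseUrl: string,
  version: ApiVersionId,
  resource: string,
  options: ListOptions = {},
): string {
  const params = new URLSearchParams();
  const { page, limit } = options;
  if (version === 'v2') {
    if (page !== undefined) params.set('page', String(page));
    if (limit !== undefined) params.set('limit', String(limit));
  } else if (version === 'v3') {
    // Assumes the JSON:API page[number]/page[size] convention.
    if (page !== undefined) params.set('page[number]', String(page));
    if (limit !== undefined) params.set('page[size]', String(limit));
  }
  // v1 has no pagination, so paging options are dropped for it.
  const query = params.toString();
  const path = `${baseUrl.replace(/\/+$/, '')}/${version}/${resource}`;
  return query ? `${path}?${query}` : path;
}

console.log(buildUrl('https://api.example.com', 'v2', 'users', { page: 2, limit: 50 }));
console.log(buildUrl('https://api.example.com', 'v3', 'users', { page: 2, limit: 50 }));
```

Note that `URLSearchParams` percent-encodes the square brackets in the v3 parameter names, so the server side needs to decode keys before matching them.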

Monitoring and Alerting That Actually Helps

The monitoring system provides visibility into version usage patterns and performance:

```typescript
// lib/constructs/api-monitoring.ts
import { Duration } from 'aws-cdk-lib';
import {
  Alarm,
  Color,
  ComparisonOperator,
  Dashboard,
  GraphWidget,
  MathExpression,
  TreatMissingData,
} from 'aws-cdk-lib/aws-cloudwatch';
import { SnsAction } from 'aws-cdk-lib/aws-cloudwatch-actions';
import { Topic } from 'aws-cdk-lib/aws-sns';
import { Construct } from 'constructs';
// v1Percentage, v2Percentage, v3Percentage and v1Errors are metrics defined elsewhere (not shown)

export class ApiMonitoring extends Construct {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // Dashboard that actually gets looked at
    const dashboard = new Dashboard(this, 'ApiDashboard', {
      dashboardName: 'api-versions-prod',
      defaultInterval: Duration.hours(3),  // Recent enough to be useful
    });

    // Version distribution - watched this like a hawk during v2 rollout
    dashboard.addWidgets(
      new GraphWidget({
        title: 'API Version Distribution (% of requests)',
        left: [v1Percentage, v2Percentage, v3Percentage],
        leftYAxis: { max: 100, min: 0 },
        period: Duration.minutes(5),
        statistic: 'Average',
        // Minimum usage threshold for sunset decisions
        leftAnnotations: [{
          label: 'Min safe threshold',
          value: 5,
          color: Color.RED,
        }],
      })
    );

    // The metric that matters: client errors by version
    dashboard.addWidgets(
      new GraphWidget({
        title: '4xx Errors by Version',
        left: [
          new MathExpression({
            expression: 'RATE(m1)',
            usingMetrics: {
              m1: v1Errors,
            },
            label: 'V1 Error Rate',
            color: Color.RED,
          }),
          // Similar for v2, v3
        ],
      })
    );

    // Deprecation warning effectiveness
    const deprecationAlarm = new Alarm(this, 'V1StillHighUsage', {
      metric: v1Percentage,
      threshold: 10,
      evaluationPeriods: 3,
      comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
      alarmDescription: 'V1 still above 10% - delay sunset?',
      treatMissingData: TreatMissingData.NOT_BREACHING,
    });

    deprecationAlarm.addAlarmAction(
      new SnsAction(Topic.fromTopicArn(this, 'AlertTopic', process.env.ALERT_TOPIC_ARN!))
    );
  }
}
```
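The dashboard and alarm both depend on a per-version percentage of requests, whose definition is not shown. As a plain-TypeScript illustration of the arithmetic, the sketch below computes each version's share from raw request counts and classifies it against the two thresholds used above (10% for the alarm, 5% for the dashboard annotation). The function names and the three-level `alarm`/`watch`/`safe` classification are assumptions made for this example, and the request counts are invented.

```typescript
// Sketch: percentage of requests per version, plus a sunset-readiness signal.
type RequestCounts = Record<string, number>;

function versionShare(counts: RequestCounts): Record<string, number> {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const share: Record<string, number> = {};
  for (const [version, n] of Object.entries(counts)) {
    // Guard against an empty window so we never divide by zero.
    share[version] = total === 0 ? 0 : (n * 100) / total;
  }
  return share;
}

function sunsetSignal(pct: number): 'alarm' | 'watch' | 'safe' {
  if (pct > 10) return 'alarm';  // same threshold as the V1StillHighUsage alarm
  if (pct >= 5) return 'watch';  // at or above the dashboard's 5% annotation line
  return 'safe';
}

// Illustrative counts only, not production data.
const share = versionShare({ v1: 50, v2: 400, v3: 50 });
console.log(share, sunsetSignal(share['v1'] ?? 0));
```

In CloudWatch itself the same ratio would typically be expressed with metric math over the per-version request counts.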

Lessons Learned

1. Version Sunset Complexity

28 clients remain on V1 after two years of deprecation due to:

  • Government deployment cycles requiring 18-month lead times
  • IoT devices with firmware-embedded URLs
  • Legacy systems with hard-coded integrations

V1 maintenance still consumes engineering time, because these clients depend on it for critical integrations.

2. Exponential Testing Complexity

Each versioned dimension multiplies the size of the test matrix:

  • 3 API versions
  • 3 SDK versions
  • 4 response formats
  • = 36 test combinations

Integration test suite: 25 minutes execution time
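The 36 comes from multiplying the three dimensions (3 × 3 × 4). A small sketch makes the growth concrete by enumerating the matrix as a Cartesian product. The SDK and format labels below are placeholders, since the article does not name them.

```typescript
// Sketch: enumerate the test matrix as a Cartesian product of its dimensions.
function cartesian<T>(...axes: T[][]): T[][] {
  let combos: T[][] = [[]];
  for (const axis of axes) {
    const next: T[][] = [];
    for (const combo of combos) {
      for (const value of axis) next.push([...combo, value]);
    }
    combos = next;
  }
  return combos;
}

const apiVersions = ['v1', 'v2', 'v3'];
const sdkVersions = ['sdk-a', 'sdk-b', 'sdk-c'];                      // placeholder labels
const responseFormats = ['format-1', 'format-2', 'format-3', 'format-4'];  // placeholder labels

const matrix = cartesian(apiVersions, sdkVersions, responseFormats);
console.log(matrix.length);  // 3 * 3 * 4 = 36

// Adding a single API version adds a whole slice of combinations.
const withV4 = cartesian([...apiVersions, 'v4'], sdkVersions, responseFormats);
console.log(withV4.length);  // 4 * 3 * 4 = 48
```

Generating the matrix this way also makes it straightforward to prune combinations that cannot occur in practice before they reach the CI pipeline.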

3. Documentation Maintenance

Documentation drift creates hidden dependencies. V1 documentation lag led to:

  • Client reliance on undocumented behavior
  • Need for feature flags to maintain compatibility
  • Additional development overhead for legacy behavior

4. Version Discovery Is Critical

```typescript
// This endpoint saves more support tickets than any other
app.get('/api', (req, res) => {
  res.json({
    versions: {
      v1: {
        status: 'deprecated',
        sunset_date: '2025-01-15',
        docs: 'https://docs.api.com/v1',
        migration_guide: 'https://docs.api.com/v1-to-v2',
      },
      v2: {
        status: 'stable',
        docs: 'https://docs.api.com/v2',
      },
      v3: {
        status: 'beta',
        docs: 'https://docs.api.com/v3',
        breaking_changes: 'https://docs.api.com/v3-breaking-changes',
      },
    },
    current_stable: 'v2',
    recommended: 'v2',
    your_version: detectVersion(req),  // What the client is using
  });
});
```
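The `detectVersion` helper used by the discovery endpoint is not shown. One plausible approach, sketched below, is to read the version from a leading path segment and fall back to a client-supplied header when the path carries none (as is the case for `/api` itself). Both the fallback and the use of an `X-API-Version` request header are assumptions made for this example, as is the simplified request shape.

```typescript
// Hypothetical sketch of detectVersion; the production implementation is not shown.
const KNOWN_VERSIONS = ['v1', 'v2', 'v3'] as const;
type KnownVersion = (typeof KNOWN_VERSIONS)[number];

interface RequestLike {
  path: string;
  headers: Record<string, string | undefined>;
}

function detectVersion(req: RequestLike): KnownVersion | null {
  // A leading /vN segment wins, e.g. /v2/users or /v2
  const fromPath = /^\/(v\d+)(?:\/|$)/.exec(req.path)?.[1];
  // Otherwise fall back to an assumed X-API-Version request header (case-insensitive lookup).
  const headerEntry = Object.entries(req.headers)
    .find(([name]) => name.toLowerCase() === 'x-api-version');
  const candidate = fromPath ?? headerEntry?.[1];
  // Only report versions that actually exist.
  return KNOWN_VERSIONS.find(v => v === candidate) ?? null;
}

console.log(detectVersion({ path: '/v2/users', headers: {} }));
console.log(detectVersion({ path: '/api', headers: { 'X-API-Version': 'v1' } }));
```

Returning `null` for unknown versions, rather than guessing, avoids repeating the unreliable client detection that sank the intelligent-routing approach.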

Operational Considerations

Maintaining multiple API versions carries measurable costs:

  • Infrastructure: roughly 3x the Lambda functions and API Gateway configuration to deploy and operate
  • Development: 35% longer implementation time for cross-version features
  • Testing: CI/CD pipeline extended from 8 minutes to 25 minutes due to comprehensive version coverage
  • Documentation: Dedicated resources needed for version-specific documentation
  • Support: 25% of tickets related to version confusion requiring clear migration guides

Implementation Recommendations

  1. Design for versioning from initial release - Retrofitting versioning increases complexity 8-10x
  2. Bundle breaking changes - Batch related changes to reduce version proliferation
  3. Automate migration tooling - Build client migration tools before they're needed
  4. Plan realistic sunset timelines - Enterprise clients require 12-18 month migration windows
  5. Implement usage tracking early - Version analytics inform sunset decisions
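Recommendation 4 can be enforced mechanically. The sketch below checks that every version with both a deprecation date and a sunset date leaves at least a minimum migration window, using the 12-month lower bound mentioned above as the default. The function names and the `rushed` example entry are hypothetical; the `v1` dates come from the version config in this article.

```typescript
// Sketch: flag versions whose deprecation-to-sunset window is shorter than a minimum.
interface SunsetDates {
  deprecatedAt?: Date;
  sunsetAt?: Date;
}

// Whole calendar months between two dates, evaluated in UTC.
function monthsBetween(from: Date, to: Date): number {
  const months =
    (to.getUTCFullYear() - from.getUTCFullYear()) * 12 +
    (to.getUTCMonth() - from.getUTCMonth());
  // Do not count the final month unless its day-of-month has been reached.
  return to.getUTCDate() < from.getUTCDate() ? months - 1 : months;
}

function sunsetWindowIssues(
  versions: Record<string, SunsetDates>,
  minMonths = 12,
): string[] {
  const issues: string[] = [];
  for (const [name, v] of Object.entries(versions)) {
    if (!v.deprecatedAt || !v.sunsetAt) continue;  // nothing to check yet
    const window = monthsBetween(v.deprecatedAt, v.sunsetAt);
    if (window < minMonths) {
      issues.push(`${name}: only ${window} months between deprecation and sunset`);
    }
  }
  return issues;
}

const issues = sunsetWindowIssues({
  v1: { deprecatedAt: new Date('2024-01-15'), sunsetAt: new Date('2025-01-15') },
  rushed: { deprecatedAt: new Date('2024-01-15'), sunsetAt: new Date('2024-07-01') },  // hypothetical
});
console.log(issues);
```

A check like this could run as a unit test against the version config, so an overly aggressive sunset date fails CI before it is ever announced to clients.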

The CDK Pattern That Actually Works

If you're starting fresh, use this structure:

```
/api
  /v1
    /users
    /orders
    /internal/health
  /v2
    /users
    /orders
    /internal/health
  /versions (discovery endpoint)
  /health (version-agnostic)
```

Keep your Lambda code organized by version:

```
/src
  /handlers
    /v1
      /users
      /orders
    /v2
      /users
      /orders
  /shared
    /database
    /auth
    /utils
```

Conclusion

Successful API versioning balances technical elegance with business reality. The path-based versioning approach with lifecycle management provides:

  • Client Compatibility: Maintains service for diverse client update cycles
  • Development Efficiency: Clear separation of version-specific logic
  • Operational Visibility: Comprehensive monitoring and deprecation warnings
  • Business Continuity: Revenue protection during API evolution

Implementing production-ready API versioning requires 4-6 months initial investment and ongoing operational complexity, but provides essential client compatibility during API evolution and protects critical business relationships.
