Saga Pattern for Distributed Transactions: Maintaining Consistency Without ACID

Abstract

The Saga pattern solves one of the most challenging problems in microservices architectures: maintaining data consistency across services without traditional ACID transactions. In this guide, I'll share practical patterns for implementing sagas using AWS Step Functions orchestration and EventBridge choreography, designing effective compensation logic, ensuring idempotency, and handling the isolation challenges that arise in distributed systems. You'll learn when to choose orchestration versus choreography, how to implement semantic locking to prevent concurrent saga conflicts, and the critical observability patterns needed for production environments.

The Distributed Transaction Problem

When building microservices, you quickly encounter a fundamental challenge: coordinating multi-step transactions across independent services. Traditional ACID transactions don't work across service boundaries, and two-phase commit (2PC) creates tight coupling and single points of failure.

Consider a typical e-commerce order flow: creating an order, reserving inventory, processing payment, and confirming shipment. Each step involves a different microservice with its own database. If payment processing fails after inventory has been reserved, you need a reliable way to roll back the inventory reservation. Without proper patterns, services drift into inconsistent states: orders created, payments failed, inventory never restored.

The Saga pattern provides a structured approach to this problem through local transactions and compensating transactions. Instead of a distributed transaction, you execute a sequence of local transactions where each step can be undone by a compensating transaction if a later step fails.

Saga Pattern Fundamentals

A saga is a sequence of local transactions where:

Each transaction updates data within a single service
Each transaction publishes an event or message to trigger the next transaction
If a transaction fails, the saga executes compensating transactions to undo completed steps
The system achieves eventual consistency instead of immediate ACID consistency
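
To make the forward/compensate relationship concrete, here is a minimal in-memory sketch of that sequence. It is not tied to any AWS service; the names `SagaStep`, `SagaResult`, and `runSaga` are hypothetical and exist only for illustration.

```typescript
// One saga step: a forward local transaction plus the compensation that undoes it.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

interface SagaResult {
  status: 'COMPLETED' | 'COMPENSATED';
  failedStep?: string;
}

// Runs the steps in order. If a step fails, the steps that already completed
// are compensated in reverse order and the saga reports which step failed.
async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      completed.push(step);
    } catch {
      for (const done of completed.reverse()) {
        await done.compensate();
      }
      return { status: 'COMPENSATED', failedStep: step.name };
    }
  }
  return { status: 'COMPLETED' };
}
```

If the third of three steps throws, the second step's compensation runs first, then the first step's. A real implementation would also need to persist saga state and retry failed compensations, which this sketch leaves out.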

Key characteristics that make sagas work:

Eventual Consistency: The system becomes consistent over time, not immediately
Local Transactions: Each service manages its own database with local ACID guarantees
Compensating Transactions: Explicit rollback logic for each forward step
Idempotency: All saga steps must be safe to retry
Observability: Critical for debugging distributed flows

When to use the Saga pattern:

Microservices architecture with multiple databases
Business processes spanning multiple services
Cannot use distributed transactions due to performance or coupling concerns
Acceptable to have temporary inconsistency
Typically 3-5 steps maximum (complexity increases rapidly beyond this)

Orchestration vs Choreography

There are two main approaches to coordinating sagas: orchestration and choreography. Understanding when to use each is critical for successful implementation.

Choreography: Event-Driven Coordination

In choreography, services coordinate through domain events without a central coordinator. Each service knows what to do when it receives an event.
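
As a rough illustration of that idea, the sketch below wires three services to a tiny in-memory bus. The `EventBus` class and the event names are assumptions made for this example; in a real system the bus would be something like EventBridge and each subscription would live in its own service.

```typescript
type SagaEvent = { type: string; orderId: string };
type Handler = (event: SagaEvent) => void;

// Hypothetical in-memory stand-in for an event bus.
class EventBus {
  private handlers: Record<string, Handler[]> = {};
  readonly history: string[] = [];

  subscribe(type: string, handler: Handler): void {
    (this.handlers[type] = this.handlers[type] || []).push(handler);
  }

  publish(event: SagaEvent): void {
    this.history.push(event.type);
    for (const handler of this.handlers[event.type] || []) {
      handler(event);
    }
  }
}

// Each service reacts only to the events it cares about; nobody owns the whole flow.
function wireServices(bus: EventBus, paymentSucceeds: boolean): void {
  // Inventory service: reserve on new orders, release when payment fails
  bus.subscribe('OrderCreated', e => bus.publish({ type: 'InventoryReserved', orderId: e.orderId }));
  bus.subscribe('PaymentFailed', e => bus.publish({ type: 'InventoryReleased', orderId: e.orderId }));

  // Payment service: charge once inventory is reserved
  bus.subscribe('InventoryReserved', e =>
    bus.publish({ type: paymentSucceeds ? 'PaymentCompleted' : 'PaymentFailed', orderId: e.orderId }));

  // Order service: confirm on payment, cancel once inventory is released
  bus.subscribe('PaymentCompleted', e => bus.publish({ type: 'OrderConfirmed', orderId: e.orderId }));
  bus.subscribe('InventoryReleased', e => bus.publish({ type: 'OrderCancelled', orderId: e.orderId }));
}
```

Publishing `OrderCreated` with `paymentSucceeds` set to `false` produces the chain OrderCreated, InventoryReserved, PaymentFailed, InventoryReleased, OrderCancelled. The full flow only becomes visible in `history`, which hints at why tracing choreographed sagas is harder.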

Advantages:

Loose coupling between services
No single point of failure
Scales well for independent services
Natural fit for event-driven architecture

Disadvantages:

Control flow not visible in one place
Difficult to understand complete saga flow
Debugging complexity (distributed logic)
Harder to track saga state
Risk of cyclic dependencies

Best for:

3-4 services maximum
Independent, autonomous services
Event-driven architecture already in place
Simple linear flows

Orchestration: Centralized Coordination

In orchestration, a central coordinator (on AWS, typically Step Functions) manages the saga flow, telling each service what to do.

Advantages:

Clear control flow visualization
Easier debugging and monitoring
Centralized error handling
State management built-in
Better for complex flows

Disadvantages:

Orchestrator is a central coordination point
Services coupled to orchestrator
Orchestrator must know all services

Best for:

Complex multi-step workflows
Need visibility into saga state
Human approval steps
More than 4 services involved
Strict ordering requirements

Decision Framework

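Use the criteria above when choosing your approach: prefer choreography for simple linear flows across at most 3-4 autonomous services where an event-driven architecture is already in place; prefer orchestration when more than 4 services are involved, the workflow is complex, ordering is strict, human approval steps exist, or you need visibility into saga state.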
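
One way to encode the "Best for" criteria from the two sections above is a small helper like the following. The `SagaContext` fields and the `chooseApproach` function are hypothetical, and the final fallback (orchestration when no event-driven architecture exists yet) is my own assumption rather than a hard rule.

```typescript
interface SagaContext {
  serviceCount: number;
  complexFlow: boolean;              // branching or non-linear workflow
  strictOrdering: boolean;
  hasHumanApprovalSteps: boolean;
  needsSagaStateVisibility: boolean;
  eventDrivenAlreadyInPlace: boolean;
}

type Approach = 'orchestration' | 'choreography';

function chooseApproach(ctx: SagaContext): Approach {
  // Any of these points toward a central coordinator.
  if (
    ctx.serviceCount > 4 ||
    ctx.complexFlow ||
    ctx.strictOrdering ||
    ctx.hasHumanApprovalSteps ||
    ctx.needsSagaStateVisibility
  ) {
    return 'orchestration';
  }
  // Simple linear flow with few services: choreography fits if events are already in use.
  // Assumption: otherwise default to orchestration rather than building an event backbone first.
  return ctx.eventDrivenAlreadyInPlace ? 'choreography' : 'orchestration';
}
```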

Orchestration Implementation with AWS Step Functions

Let me show you a production-ready implementation of an e-commerce order saga using AWS Step Functions orchestration with AWS CDK.

Infrastructure Setup

typescript

import * as cdk from 'aws-cdk-lib';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';

export class OrderSagaStack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // DynamoDB tables for saga state persistence
    const ordersTable = new dynamodb.Table(this, 'Orders', {
      partitionKey: { name: 'orderId', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'transactionId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST
    });

    const inventoryTable = new dynamodb.Table(this, 'Inventory', {
      partitionKey: { name: 'productId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST
    });

    const paymentsTable = new dynamodb.Table(this, 'Payments', {
      partitionKey: { name: 'paymentId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST
    });

    // Forward transaction Lambdas
    const createOrder = new lambda.Function(this, 'CreateOrder', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'createOrder.handler',
      code: lambda.Code.fromAsset('lambda'),
      environment: {
        ORDERS_TABLE: ordersTable.tableName
      }
    });

    const reserveInventory = new lambda.Function(this, 'ReserveInventory', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'reserveInventory.handler',
      code: lambda.Code.fromAsset('lambda'),
      environment: {
        INVENTORY_TABLE: inventoryTable.tableName
      }
    });

    const processPayment = new lambda.Function(this, 'ProcessPayment', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'processPayment.handler',
      code: lambda.Code.fromAsset('lambda'),
      environment: {
        PAYMENTS_TABLE: paymentsTable.tableName
      }
    });

    const confirmOrder = new lambda.Function(this, 'ConfirmOrder', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'confirmOrder.handler',
      code: lambda.Code.fromAsset('lambda'),
      environment: {
        ORDERS_TABLE: ordersTable.tableName
      }
    });

    // Compensating transaction Lambdas (rollback)
    const cancelOrder = new lambda.Function(this, 'CancelOrder', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'cancelOrder.handler',
      code: lambda.Code.fromAsset('lambda'),
      environment: {
        ORDERS_TABLE: ordersTable.tableName
      }
    });

    const releaseInventory = new lambda.Function(this, 'ReleaseInventory', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'releaseInventory.handler',
      code: lambda.Code.fromAsset('lambda'),
      environment: {
        INVENTORY_TABLE: inventoryTable.tableName
      }
    });

    const refundPayment = new lambda.Function(this, 'RefundPayment', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'refundPayment.handler',
      code: lambda.Code.fromAsset('lambda'),
      environment: {
        PAYMENTS_TABLE: paymentsTable.tableName
      }
    });

    // Grant table permissions
    ordersTable.grantReadWriteData(createOrder);
    ordersTable.grantReadWriteData(confirmOrder);
    ordersTable.grantReadWriteData(cancelOrder);
    inventoryTable.grantReadWriteData(reserveInventory);
    inventoryTable.grantReadWriteData(releaseInventory);
    paymentsTable.grantReadWriteData(processPayment);
    paymentsTable.grantReadWriteData(refundPayment);

    // Step Functions tasks with retry configuration.
    // Each result is stored under its own key (resultPath) instead of replacing
    // the state, so the original saga input (orderId, productId, ...) stays
    // available to later steps and to the compensations.
    const createOrderTask = new tasks.LambdaInvoke(this, 'CreateOrderTask', {
      lambdaFunction: createOrder,
      payloadResponseOnly: true,
      resultPath: '$.createOrderResult',
      retryOnServiceExceptions: true
    }).addRetry({
      errors: ['ThrottlingException', 'ServiceUnavailable'],
      interval: cdk.Duration.seconds(2),
      maxAttempts: 3,
      backoffRate: 2.0
    });

    const reserveInventoryTask = new tasks.LambdaInvoke(this, 'ReserveInventoryTask', {
      lambdaFunction: reserveInventory,
      payloadResponseOnly: true,
      resultPath: '$.reserveInventoryResult',
      retryOnServiceExceptions: true
    }).addRetry({
      errors: ['ThrottlingException'],
      interval: cdk.Duration.seconds(1),
      maxAttempts: 3,
      backoffRate: 1.5
    });

    const processPaymentTask = new tasks.LambdaInvoke(this, 'ProcessPaymentTask', {
      lambdaFunction: processPayment,
      payloadResponseOnly: true,
      resultPath: '$.processPaymentResult',
      retryOnServiceExceptions: true,
      taskTimeout: sfn.Timeout.duration(cdk.Duration.seconds(30))
    }).addRetry({
      errors: ['PaymentProcessingException'],
      interval: cdk.Duration.seconds(3),
      maxAttempts: 2,
      backoffRate: 2.0
    });

    const confirmOrderTask = new tasks.LambdaInvoke(this, 'ConfirmOrderTask', {
      lambdaFunction: confirmOrder,
      payloadResponseOnly: true,
      resultPath: '$.confirmOrderResult'
    });

    // Compensation tasks. A state can have only one successor, so each
    // compensation chain gets its own task instances. The Lambda result is
    // discarded so every compensation receives the same saga input.
    const compensationTask = (taskId: string, fn: lambda.IFunction) =>
      new tasks.LambdaInvoke(this, taskId, {
        lambdaFunction: fn,
        resultPath: sfn.JsonPath.DISCARD
      });

    // Saga orchestration with compensation logic.
    // Compensation chains undo the forward steps in reverse order.
    const compensatePayment = compensationTask('RefundPaymentTask', refundPayment)
      .next(compensationTask('ReleaseInventoryAfterPaymentTask', releaseInventory))
      .next(compensationTask('CancelOrderAfterPaymentTask', cancelOrder))
      .next(new sfn.Fail(this, 'OrderFailed', {
        error: 'OrderProcessingFailed',
        cause: 'Payment processing failed, all steps compensated'
      }));

    const compensateInventory = compensationTask('ReleaseInventoryTask', releaseInventory)
      .next(compensationTask('CancelOrderTask', cancelOrder))
      .next(new sfn.Fail(this, 'InventoryFailed', {
        error: 'InventoryReservationFailed',
        cause: 'Inventory reservation failed, order cancelled'
      }));

    // Build forward flow with catch blocks for compensation
    const sagaDefinition = createOrderTask
      .next(reserveInventoryTask
        .addCatch(compensateInventory, {
          errors: ['States.ALL'],
          resultPath: '$.inventoryError'
        })
      )
      .next(processPaymentTask
        .addCatch(compensatePayment, {
          errors: ['States.ALL'],
          resultPath: '$.paymentError'
        })
      )
      .next(confirmOrderTask)
      .next(new sfn.Succeed(this, 'OrderSuccess'));

    // Create saga state machine
    const orderSaga = new sfn.StateMachine(this, 'OrderSaga', {
      definitionBody: sfn.DefinitionBody.fromChainable(sagaDefinition),
      stateMachineType: sfn.StateMachineType.STANDARD,
      timeout: cdk.Duration.minutes(5),
      tracingEnabled: true
    });

    new cdk.CfnOutput(this, 'OrderSagaArn', {
      value: orderSaga.stateMachineArn,
      description: 'Order Saga State Machine ARN'
    });
  }
}

This infrastructure sets up a complete order processing saga with proper compensation chains. Notice how each step has retry configuration for transient errors, and compensation flows are built in reverse order; this is critical for properly undoing completed steps.
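
To make the retry settings concrete: Step Functions waits `interval` before the first retry and multiplies the wait by `backoffRate` for each further attempt. The helper below is a hypothetical function for working out the resulting schedule; it ignores jitter and any maximum-delay cap.

```typescript
// Returns the wait (in seconds) before each retry attempt.
function retryDelaysSeconds(
  intervalSeconds: number,
  maxAttempts: number,
  backoffRate: number
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    delays.push(intervalSeconds * Math.pow(backoffRate, attempt));
  }
  return delays;
}

retryDelaysSeconds(2, 3, 2.0); // CreateOrderTask      → [2, 4, 8]
retryDelaysSeconds(1, 3, 1.5); // ReserveInventoryTask → [1, 1.5, 2.25]
retryDelaysSeconds(3, 2, 2.0); // ProcessPaymentTask   → [3, 6]
```

With these values, the payment step can spend up to 9 seconds waiting between attempts on top of its own execution time, which is worth checking against the 5-minute state machine timeout.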

Implementing Idempotent Operations

Idempotency is non-negotiable in sagas. Steps may execute multiple times due to retries, failures, or network issues. Here's how to implement properly idempotent operations:

typescript

import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, UpdateCommand, GetCommand } from '@aws-sdk/lib-dynamodb';

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

interface ReserveInventoryEvent {
  orderId: string;
  productId: string;
  quantity: number;
  transactionId: string; // Idempotency key
}

export const handler = async (event: ReserveInventoryEvent) => {
  console.log('Reserving inventory', { orderId: event.orderId, productId: event.productId });

  try {
    // Idempotent check: Has this transaction already completed?
    const existingReservation = await docClient.send(
      new GetCommand({
        TableName: process.env.INVENTORY_TABLE!,
        Key: { productId: event.productId }
      })
    );

    const reservation = existingReservation.Item?.reservations?.[event.transactionId];
    if (reservation?.status === 'RESERVED') {
      console.log('Inventory already reserved for this transaction', {
        transactionId: event.transactionId
      });
      return {
        success: true,
        message: 'Idempotent: Already reserved',
        reservationId: event.transactionId
      };
    }

    // Atomic update with conditional expression (semantic lock)
    const result = await docClient.send(
      new UpdateCommand({
        TableName: process.env.INVENTORY_TABLE!,
        Key: { productId: event.productId },
        UpdateExpression:
          'SET availableQuantity = availableQuantity - :qty, ' +
          'reservedQuantity = reservedQuantity + :qty, ' +
          'reservations.#txId = :reservation',
        ConditionExpression: 'availableQuantity >= :qty',
        ExpressionAttributeNames: {
          '#txId': event.transactionId
        },
        ExpressionAttributeValues: {
          ':qty': event.quantity,
          ':reservation': {
            orderId: event.orderId,
            quantity: event.quantity,
            status: 'RESERVED',
            timestamp: Date.now(),
            expiresAt: Date.now() + (15 * 60 * 1000) // 15 min timeout
          }
        },
        ReturnValues: 'ALL_NEW'
      })
    );

    console.log('Inventory reserved successfully', {
      orderId: event.orderId,
      reservationId: event.transactionId
    });

    return {
      success: true,
      reservationId: event.transactionId,
      availableQuantity: result.Attributes?.availableQuantity
    };
  } catch (error: any) {
    if (error.name === 'ConditionalCheckFailedException') {
      console.error('Insufficient inventory', {
        productId: event.productId,
        requested: event.quantity
      });
      throw new Error('InsufficientInventory');
    }

    console.error('Failed to reserve inventory', error);
    throw error;
  }
};

The idempotency check at the beginning ensures that if this function executes multiple times with the same transactionId, it returns the same result without further side effects. The conditional expression acts as an atomic "semantic lock", reserving inventory only if sufficient quantity is available.

Compensation: Releasing Inventory

typescript

import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, UpdateCommand, GetCommand } from '@aws-sdk/lib-dynamodb';

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

// Same event shape as the reserve handler
interface ReserveInventoryEvent {
  orderId: string;
  productId: string;
  quantity: number;
  transactionId: string; // Idempotency key
}

// Note: the CDK stack points at 'releaseInventory.handler', so export this
// function under the name `handler` when deploying it.
export const releaseInventoryHandler = async (event: ReserveInventoryEvent) => {
  console.log('Releasing inventory reservation', {
    orderId: event.orderId,
    transactionId: event.transactionId
  });

  try {
    const reservation = await docClient.send(
      new GetCommand({
        TableName: process.env.INVENTORY_TABLE!,
        Key: { productId: event.productId }
      })
    );

    const existingReservation = reservation.Item?.reservations?.[event.transactionId];

    if (!existingReservation) {
      console.log('Idempotent: Reservation already released or never existed', {
        transactionId: event.transactionId
      });
      return { success: true, message: 'Already released' };
    }

    // Atomic release with conditional check
    await docClient.send(
      new UpdateCommand({
        TableName: process.env.INVENTORY_TABLE!,
        Key: { productId: event.productId },
        UpdateExpression:
          'SET availableQuantity = availableQuantity + :qty, ' +
          'reservedQuantity = reservedQuantity - :qty ' +
          'REMOVE reservations.#txId',
        ConditionExpression: 'attribute_exists(reservations.#txId)',
        ExpressionAttributeNames: {
          '#txId': event.transactionId
        },
        ExpressionAttributeValues: {
          ':qty': existingReservation.quantity
        }
      })
    );

    console.log('Inventory released successfully', { transactionId: event.transactionId });
    return { success: true };
  } catch (error: any) {
    if (error.name === 'ConditionalCheckFailedException') {
      // Reservation already released - idempotent success
      return { success: true, message: 'Already released' };
    }
    throw error;
  }
};

Notice how the compensation is also idempotent. If the reservation doesn't exist, we return success, because the desired end state has already been achieved.
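
To check this behaviour without DynamoDB, the same reserve and release rules can be modelled in memory. This is a test-oriented sketch under simplifying assumptions (single product, no concurrency, no reservation expiry); the `InMemoryInventory` class is hypothetical and not part of the handlers above.

```typescript
interface Reservation {
  orderId: string;
  quantity: number;
}

class InMemoryInventory {
  private reservations: Record<string, Reservation> = {};

  constructor(public availableQuantity: number) {}

  // Forward step: a retry with the same transactionId changes nothing.
  reserve(transactionId: string, orderId: string, quantity: number): void {
    if (this.reservations[transactionId]) {
      return; // already reserved for this transaction
    }
    if (this.availableQuantity < quantity) {
      throw new Error('InsufficientInventory');
    }
    this.availableQuantity -= quantity;
    this.reservations[transactionId] = { orderId, quantity };
  }

  // Compensation: releasing a missing reservation counts as success.
  release(transactionId: string): void {
    const reservation = this.reservations[transactionId];
    if (!reservation) {
      return; // already released or never reserved
    }
    this.availableQuantity += reservation.quantity;
    delete this.reservations[transactionId];
  }
}
```

Calling `reserve` twice with the same transactionId decrements stock once, and calling `release` twice restores it once, which is the property the saga relies on when Step Functions retries a task.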

Payment Processing with Idempotency

typescript

import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, UpdateCommand, GetCommand } from '@aws-sdk/lib-dynamodb';

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

interface ProcessPaymentEvent {
  orderId: string;
  customerId: string;
  amount: number;
  currency: string;
  paymentMethodId: string;
  idempotencyKey: string;
}

export const processPaymentHandler = async (event: ProcessPaymentEvent) => {
  console.log('Processing payment', {
    orderId: event.orderId,
    amount: event.amount
  });

  try {
    // Check idempotency: Has this payment already been processed?
    const existingPayment = await docClient.send(
      new GetCommand({
        TableName: process.env.PAYMENTS_TABLE!,
        Key: { paymentId: event.idempotencyKey }
      })
    );

    if (existingPayment.Item?.status === 'COMPLETED') {
      console.log('Idempotent: Payment already processed', {
        paymentId: event.idempotencyKey
      });
      return {
        success: true,
        paymentId: event.idempotencyKey,
        transactionId: existingPayment.Item.transactionId,
        message: 'Already processed'
      };
    }

    // Create payment record with PENDING status first
    const paymentId = event.idempotencyKey;
    await docClient.send(
      new UpdateCommand({
        TableName: process.env.PAYMENTS_TABLE!,
        Key: { paymentId },
        UpdateExpression:
          'SET #status = :pending, orderId = :orderId, ' +
          'amount = :amount, createdAt = :timestamp',
        ConditionExpression: 'attribute_not_exists(paymentId)',
        ExpressionAttributeNames: { '#status': 'status' },
        ExpressionAttributeValues: {
          ':pending': 'PENDING',
          ':orderId': event.orderId,
          ':amount': event.amount,
          ':timestamp': Date.now()
        }
      })
    );

    // Call external payment provider (Stripe, etc.)
    const paymentResult = await callPaymentProvider({
      amount: event.amount,
      currency: event.currency,
      customerId: event.customerId,
      paymentMethodId: event.paymentMethodId,
      idempotencyKey: event.idempotencyKey
    });

    if (!paymentResult.success) {
      await docClient.send(
        new UpdateCommand({
          TableName: process.env.PAYMENTS_TABLE!,
          Key: { paymentId },
          UpdateExpression:
            'SET #status = :failed, failureReason = :reason, updatedAt = :timestamp',
          ExpressionAttributeNames: { '#status': 'status' },
          ExpressionAttributeValues: {
            ':failed': 'FAILED',
            ':reason': paymentResult.error,
            ':timestamp': Date.now()
          }
        })
      );

      throw new Error(`PaymentFailed: ${paymentResult.error}`);
    }

    // Update to COMPLETED status
    await docClient.send(
      new UpdateCommand({
        TableName: process.env.PAYMENTS_TABLE!,
        Key: { paymentId },
        UpdateExpression:
          'SET #status = :completed, transactionId = :txId, updatedAt = :timestamp',
        ExpressionAttributeNames: { '#status': 'status' },
        ExpressionAttributeValues: {
          ':completed': 'COMPLETED',
          ':txId': paymentResult.transactionId,
          ':timestamp': Date.now()
        }
      })
    );

    return {
      success: true,
      paymentId,
      transactionId: paymentResult.transactionId
    };
  } catch (error: any) {
    if (error.name === 'ConditionalCheckFailedException') {
      // Another execution already created the payment record
      throw new Error('ConcurrentPaymentAttempt');
    }
    throw error;
  }
};

// Simulated payment provider call
async function callPaymentProvider(params: any) {
  // In production: Stripe, Adyen, etc. with their idempotency keys
  return {
    success: Math.random() > 0.1, // 90% success rate
    transactionId: `txn_${Date.now()}`,
    error: 'CardDeclined'
  };
}


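This payment implementation uses three-phase idempotency: check for an existing COMPLETED record, create a PENDING record with a conditional write, then update the record to its final state. Combined with the payment provider's own idempotency key, this prevents double-charging a customer even if the function executes multiple times. Note that a retry arriving while the record is still PENDING, or after it was marked FAILED, fails the conditional write and surfaces as ConcurrentPaymentAttempt, so the caller has to decide how to treat that error.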

Choreography Implementation with EventBridge

For simpler flows, choreography can provide better decoupling. Here's how to implement event-driven saga coordination:

typescript

import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';

// Order Service: Creates order and publishes OrderCreated event
export const createOrderHandler = async (event: any) => {
  const orderId = `ord_${Date.now()}`;

  // Create order in database
  await saveOrder({
    orderId,
    customerId: event.customerId,
    items: event.items,
    status: 'PENDING'
  });

  // Publish OrderCreated event
  const eventBridge = new EventBridgeClient({});
  await eventBridge.send(
    new PutEventsCommand({
      Entries: [{
        Source: 'order.service',
        DetailType: 'OrderCreated',
        Detail: JSON.stringify({
          orderId,
          customerId: event.customerId,
          items: event.items,
          totalAmount: event.totalAmount,
          timestamp: Date.now()
        })
      }]
    })
  );

  return { orderId, status: 'PENDING' };
};

// Inventory Service: Listens to OrderCreated, reserves inventory
export const reserveInventoryOnOrderHandler = async (event: any) => {
  const { orderId, items } = event.detail;

  try {
    const reservationResult = await reserveInventoryForOrder(orderId, items);

    const eventBridge = new EventBridgeClient({});
    await eventBridge.send(
      new PutEventsCommand({
        Entries: [{
          Source: 'inventory.service',
          DetailType: 'InventoryReserved',
          Detail: JSON.stringify({
            orderId,
            reservationId: reservationResult.reservationId,
            items,
            timestamp: Date.now()
          })
        }]
      })
    );
  } catch (error: any) {
    // Publish failure event
    const eventBridge = new EventBridgeClient({});
    await eventBridge.send(
      new PutEventsCommand({
        Entries: [{
          Source: 'inventory.service',
          DetailType: 'InventoryReservationFailed',
          Detail: JSON.stringify({
            orderId,
            reason: error.message,
            timestamp: Date.now()
          })
        }]
      })
    );
  }
};

// Payment Service: Listens to InventoryReserved, processes payment
export const processPaymentOnInventoryReservedHandler = async (event: any) => {
  const { orderId } = event.detail;

  try {
    const order = await getOrder(orderId);
    const paymentResult = await processPayment({
      orderId,
      customerId: order.customerId,
      amount: order.totalAmount
    });

    const eventBridge = new EventBridgeClient({});
    await eventBridge.send(
      new PutEventsCommand({
        Entries: [{
          Source: 'payment.service',
          DetailType: 'PaymentCompleted',
          Detail: JSON.stringify({
            orderId,
            paymentId: paymentResult.paymentId,
            transactionId: paymentResult.transactionId,
            timestamp: Date.now()
          })
        }]
      })
    );
  } catch (error: any) {
    // Publish failure event - triggers compensation
    const eventBridge = new EventBridgeClient({});
    await eventBridge.send(
      new PutEventsCommand({
        Entries: [{
          Source: 'payment.service',
          DetailType: 'PaymentFailed',
          Detail: JSON.stringify({
            orderId,
            reason: error.message,
            timestamp: Date.now()
          })
        }]
      })
    );
  }
};

// Inventory Service: Listens to PaymentFailed, releases inventory (compensation)
export const releaseInventoryOnPaymentFailedHandler = async (event: any) => {
  const { orderId } = event.detail;

  console.log('Compensating: Releasing inventory due to payment failure', { orderId });

  await releaseInventory(orderId);

  const eventBridge = new EventBridgeClient({});
  await eventBridge.send(
    new PutEventsCommand({
      Entries: [{
        Source: 'inventory.service',
        DetailType: 'InventoryReleased',
        Detail: JSON.stringify({
          orderId,
          reason: 'PaymentFailed',
          timestamp: Date.now()
        })
      }]
    })
  );
};


In choreography, each service is responsible for listening to relevant events and publishing new events. Compensation happens through the same event mechanism; when a service publishes a failure event, other services react by executing their compensating transactions.
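
To follow the event flow without any AWS infrastructure, the sketch below replaces EventBridge with a tiny in-memory bus. The `InMemoryBus` class and the `wireOrderSaga` function are hypothetical names invented for this illustration, not part of any SDK; the event names match the handlers above. Delivery here is synchronous and in order, which real EventBridge does not guarantee, so treat it as a way to reason about the flow rather than a model of production behavior.

```typescript
type SagaEvent = {
  detailType: string;
  detail: { orderId: string; reason?: string };
};
type Handler = (event: SagaEvent) => void;

// Hypothetical stand-in for EventBridge: synchronous, in-process delivery.
class InMemoryBus {
  private handlers = new Map<string, Handler[]>();
  readonly log: string[] = [];

  subscribe(detailType: string, handler: Handler): void {
    const list = this.handlers.get(detailType) ?? [];
    list.push(handler);
    this.handlers.set(detailType, list);
  }

  publish(event: SagaEvent): void {
    this.log.push(event.detailType); // record every event for inspection
    for (const handler of this.handlers.get(event.detailType) ?? []) {
      handler(event);
    }
  }
}

// Wires the three services together and returns the set of orders
// that currently hold an inventory reservation.
function wireOrderSaga(bus: InMemoryBus, paymentSucceeds: boolean): Set<string> {
  const reserved = new Set<string>();

  // Inventory service: reserve stock when an order is created
  bus.subscribe('OrderCreated', (e) => {
    reserved.add(e.detail.orderId);
    bus.publish({ detailType: 'InventoryReserved', detail: e.detail });
  });

  // Payment service: charge once inventory is reserved
  bus.subscribe('InventoryReserved', (e) => {
    bus.publish(
      paymentSucceeds
        ? { detailType: 'PaymentCompleted', detail: e.detail }
        : { detailType: 'PaymentFailed', detail: { ...e.detail, reason: 'CardDeclined' } }
    );
  });

  // Inventory service (compensation): release stock when payment fails
  bus.subscribe('PaymentFailed', (e) => {
    reserved.delete(e.detail.orderId);
    bus.publish({
      detailType: 'InventoryReleased',
      detail: { ...e.detail, reason: 'PaymentFailed' }
    });
  });

  return reserved;
}
```

Publishing an `OrderCreated` event with `paymentSucceeds` set to `false` produces the sequence OrderCreated, InventoryReserved, PaymentFailed, InventoryReleased in `bus.log`, and the returned reservation set ends up empty.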

Semantic Locking for Isolation

Sagas lack traditional transaction isolation, which can lead to concurrent saga conflicts. Semantic locking provides application-level isolation:

typescript

interface SagaLock {
  sagaId: string;
  lockedAt: number;
  expiresAt: number;
}

export const acquireSagaLock = async (orderId: string, sagaId: string) => {
  const lockDuration = 5 * 60 * 1000; // 5 minutes
  const now = Date.now();

  try {
    await docClient.send(
      new UpdateCommand({
        TableName: process.env.ORDERS_TABLE!,
        Key: { orderId },
        UpdateExpression:
          'SET sagaLock = :lock, #status = :processing',
        ConditionExpression:
          'attribute_not_exists(sagaLock) OR sagaLock.expiresAt < :now',
        ExpressionAttributeNames: {
          '#status': 'status'
        },
        ExpressionAttributeValues: {
          ':lock': {
            sagaId,
            lockedAt: now,
            expiresAt: now + lockDuration
          },
          ':processing': 'PROCESSING',
          ':now': now
        }
      })
    );

    console.log('Saga lock acquired', { orderId, sagaId });
    return true;
  } catch (error: any) {
    if (error.name === 'ConditionalCheckFailedException') {
      console.warn('Saga lock already held by another saga', { orderId });
      throw new Error('SagaLockConflict');
    }
    throw error;
  }
};

export const releaseSagaLock = async (orderId: string, sagaId: string) => {
  await docClient.send(
    new UpdateCommand({
      TableName: process.env.ORDERS_TABLE!,
      Key: { orderId },
      UpdateExpression: 'REMOVE sagaLock',
      ConditionExpression: 'sagaLock.sagaId = :sagaId',
      ExpressionAttributeValues: {
        ':sagaId': sagaId
      }
    })
  );

  console.log('Saga lock released', { orderId, sagaId });
};


This semantic lock prevents two sagas from modifying the same order concurrently. The lock includes an expiration time to handle cases where a saga crashes without releasing the lock.
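
The acquire, expire, and release rules can be checked in isolation. The sketch below mirrors the two condition expressions against an in-memory map instead of DynamoDB. `InMemoryLockTable` is a hypothetical class used only for illustration, and the current time is passed in explicitly so the expiry behavior is easy to exercise.

```typescript
interface LockRecord {
  sagaId: string;
  lockedAt: number;
  expiresAt: number;
}

// Hypothetical in-memory model of the sagaLock attribute on the orders table.
class InMemoryLockTable {
  private locks = new Map<string, LockRecord>();
  private lockDurationMs: number;

  constructor(lockDurationMs: number) {
    this.lockDurationMs = lockDurationMs;
  }

  // Mirrors: attribute_not_exists(sagaLock) OR sagaLock.expiresAt < :now
  acquire(orderId: string, sagaId: string, now: number): boolean {
    const current = this.locks.get(orderId);
    if (current && current.expiresAt >= now) {
      return false; // lock is held and has not expired
    }
    this.locks.set(orderId, {
      sagaId,
      lockedAt: now,
      expiresAt: now + this.lockDurationMs
    });
    return true;
  }

  // Mirrors: sagaLock.sagaId = :sagaId
  release(orderId: string, sagaId: string): boolean {
    const current = this.locks.get(orderId);
    if (!current || current.sagaId !== sagaId) {
      return false; // lock is missing or owned by a different saga
    }
    this.locks.delete(orderId);
    return true;
  }
}
```

One consequence worth noticing: if saga A's lock expires and saga B takes it over, A's later release is rejected. In the DynamoDB version that rejection arrives as a ConditionalCheckFailedException from `releaseSagaLock`, which is a useful signal that A ran longer than the lock duration.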

Cost Analysis and Trade-offs

Understanding the cost implications helps you choose the right approach.

Step Functions Orchestration Costs

For 100,000 orders per month with an 8-step saga:

Total state transitions: 800,000
Cost: (800,000 / 1,000) × $0.025 = $20/month
Failed sagas (5% with 4-step compensation): ~20,000 transitions = +$0.50/month
Total: ~$20.50/month

EventBridge Choreography Costs

For 100,000 orders per month with 4 events per order:

Total events: 400,000
Cost: (400,000 / 1,000,000) × $1.00 = $0.40/month
Failed orders with compensation: +$0.03/month
Total: ~$0.43/month
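
The arithmetic in both cost sections can be reproduced with a small calculator, which makes it easy to plug in your own volumes. The function names are hypothetical, and the two price constants are simply the per-unit rates used in the calculations above; check current AWS pricing for your region before relying on them.

```typescript
// Rates taken from the calculations above; verify against current AWS pricing.
const STEP_FUNCTIONS_USD_PER_1K_TRANSITIONS = 0.025;
const EVENTBRIDGE_USD_PER_1M_EVENTS = 1.0;

// Monthly Step Functions cost, including compensation transitions for failed sagas.
function stepFunctionsMonthlyCost(
  ordersPerMonth: number,
  stepsPerSaga: number,
  failureRate: number,
  compensationSteps: number
): number {
  const happyPathTransitions = ordersPerMonth * stepsPerSaga;
  const compensationTransitions = ordersPerMonth * failureRate * compensationSteps;
  const totalTransitions = happyPathTransitions + compensationTransitions;
  return (totalTransitions / 1000) * STEP_FUNCTIONS_USD_PER_1K_TRANSITIONS;
}

// Monthly EventBridge cost for the events published on the happy path.
function eventBridgeMonthlyCost(ordersPerMonth: number, eventsPerOrder: number): number {
  const totalEvents = ordersPerMonth * eventsPerOrder;
  return (totalEvents / 1_000_000) * EVENTBRIDGE_USD_PER_1M_EVENTS;
}
```

With the figures above, `stepFunctionsMonthlyCost(100_000, 8, 0.05, 4)` works out to about $20.50 and `eventBridgeMonthlyCost(100_000, 4)` to about $0.40, matching the totals before the small compensation surcharge on the EventBridge side.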

The Real Trade-off

Choreography is significantly cheaper but comes with higher development and debugging complexity. Orchestration costs more but provides better visibility and easier troubleshooting. For most production systems, the improved observability of orchestration is worth the additional cost.

Common Pitfalls and Solutions

Pitfall 1: Non-Idempotent Operations

Problem: Payment charged multiple times on retry.

Solution: Always implement idempotency checks and use provider idempotency keys.

Pitfall 2: Incomplete Compensation Chains

Problem: Only compensating the last step, leaving earlier steps in inconsistent state.

Solution: Chain all compensations in reverse order. Each catch block must compensate all previous steps.
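
A minimal sketch of reverse-order compensation, assuming each step carries its own undo function. `SagaStep` and `runSaga` are hypothetical names, and the steps are synchronous only to keep the example short; real steps would be async calls to other services.

```typescript
interface SagaStep {
  name: string;
  action: () => void;      // throws on failure
  compensate: () => void;  // undoes a completed action
}

interface SagaResult {
  status: 'COMPLETED' | 'COMPENSATED';
  compensated: string[];   // names of compensated steps, in execution order
}

// Runs steps in order. On the first failure, compensates every step that
// already completed, newest first, so nothing is left half-done.
function runSaga(steps: SagaStep[]): SagaResult {
  const completed: SagaStep[] = [];

  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
    } catch {
      const compensated: string[] = [];
      for (const done of [...completed].reverse()) {
        done.compensate();
        compensated.push(done.name);
      }
      return { status: 'COMPENSATED', compensated };
    }
  }

  return { status: 'COMPLETED', compensated: [] };
}
```

For a saga of createOrder, reserveInventory, and processPayment where the payment step throws, `compensated` comes back as reserveInventory followed by createOrder. The failed step itself is not compensated because it never completed. This sketch lets an exception from `compensate` propagate; handling that case is the subject of Pitfall 3.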

Pitfall 3: Ignoring Compensation Failures

Problem: Compensation fails, saga hangs.

Solution: Aggressive retry for compensations (10+ attempts) with dead-letter queue for manual intervention.
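
One way to sketch this in application code is a retry loop with exponential backoff that hands the compensation to a dead-letter store once attempts are exhausted. Everything here is an assumption for illustration: `compensateWithRetry`, the `DeadLetter` shape, and the in-memory array standing in for a real queue such as SQS. In a Step Functions saga the same effect is usually configured on the compensation state rather than hand-written.

```typescript
interface DeadLetter {
  sagaId: string;
  step: string;
  lastError: string;
  attempts: number;
}

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  sleep: (ms: number) => Promise<void>; // injected so callers control waiting
  deadLetters: DeadLetter[];            // stand-in for a real dead-letter queue
}

// Retries a compensation with exponential backoff. Returns true on success;
// after the final failed attempt, records a dead letter and returns false.
async function compensateWithRetry(
  sagaId: string,
  step: string,
  compensate: () => Promise<void>,
  options: RetryOptions
): Promise<boolean> {
  let lastError = '';

  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      await compensate();
      return true;
    } catch (error) {
      lastError = error instanceof Error ? error.message : String(error);
      if (attempt < options.maxAttempts) {
        // Delay doubles each attempt, capped at 30 seconds
        const delay = Math.min(options.baseDelayMs * 2 ** (attempt - 1), 30_000);
        await options.sleep(delay);
      }
    }
  }

  options.deadLetters.push({ sagaId, step, lastError, attempts: options.maxAttempts });
  return false;
}
```

Because the compensation may run many times, it has to be idempotent. The dead-letter entry keeps the saga ID, step name, and last error so that whoever picks it up has enough context to finish the rollback manually.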

Pitfall 4: Timeout Too Short

Problem: Timeout triggers compensation, but operation actually succeeded.

Solution: Set realistic timeouts with buffer. Verify actual state before compensating.
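
The "verify before compensating" rule can be written down as a small decision function. This is a sketch under assumptions: the statuses PENDING, COMPLETED, and FAILED are the ones stored on the payment record earlier, while UNKNOWN and the `decideAfterTimeout` name are hypothetical additions for the case where the record cannot be read.

```typescript
type PaymentStatus = 'PENDING' | 'COMPLETED' | 'FAILED' | 'UNKNOWN';
type TimeoutDecision = 'CONTINUE' | 'COMPENSATE' | 'RECHECK';

// Decides what the saga should do after a step timed out, based on the
// state actually stored for that step rather than on the timeout alone.
function decideAfterTimeout(actualStatus: PaymentStatus): TimeoutDecision {
  switch (actualStatus) {
    case 'COMPLETED':
      return 'CONTINUE';   // the step succeeded; only the response was lost
    case 'FAILED':
      return 'COMPENSATE'; // the step definitely failed; safe to roll back
    default:
      return 'RECHECK';    // PENDING or UNKNOWN: outcome not known yet
  }
}
```

The RECHECK branch is the important one: compensating while a payment is still PENDING risks releasing inventory for an order that is about to be charged.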

Pitfall 5: No Saga State Tracking in Choreography

Problem: Can't determine which orders are in compensation.

Solution: Persist saga state even in choreography for observability.
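
A minimal sketch of such tracking: one extra subscriber records every saga event against the order ID, so the current state of any saga can be queried. `SagaStateTracker`, the status names, and the event-to-status mapping are all assumptions made for this example; in practice the records would live in a table rather than in memory.

```typescript
type SagaStatus =
  | 'STARTED'
  | 'INVENTORY_RESERVED'
  | 'COMPLETED'
  | 'COMPENSATING'
  | 'COMPENSATED'
  | 'FAILED';

// Hypothetical mapping from the event names used in this article to a saga status.
const STATUS_BY_EVENT: Record<string, SagaStatus> = {
  OrderCreated: 'STARTED',
  InventoryReserved: 'INVENTORY_RESERVED',
  PaymentCompleted: 'COMPLETED',
  PaymentFailed: 'COMPENSATING',
  InventoryReleased: 'COMPENSATED',
  InventoryReservationFailed: 'FAILED'
};

interface SagaRecord {
  orderId: string;
  status: SagaStatus;
  history: string[];   // event names in the order they were recorded
  updatedAt: number;
}

class SagaStateTracker {
  private records = new Map<string, SagaRecord>();

  // Called for every saga event; events without a mapping are ignored.
  record(orderId: string, detailType: string, timestamp: number): void {
    const status = STATUS_BY_EVENT[detailType];
    if (!status) return;
    const existing = this.records.get(orderId);
    const history = existing ? [...existing.history, detailType] : [detailType];
    this.records.set(orderId, { orderId, status, history, updatedAt: timestamp });
  }

  // Answers questions like "which orders are currently in compensation?"
  ordersIn(status: SagaStatus): string[] {
    return Array.from(this.records.values())
      .filter((r) => r.status === status)
      .map((r) => r.orderId);
  }
}
```

This version applies events in arrival order, so the last event recorded wins. A production tracker would also need to handle duplicate and out-of-order delivery, for example by comparing event timestamps before overwriting the status.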

Key Takeaways

Choose orchestration for complex flows (>4 services), choreography for simple linear flows
Idempotency is critical; every saga step must be idempotent with proper keys
Compensate in reverse order; make compensations idempotent and retryable
Use semantic locking to prevent concurrent saga conflicts
Set realistic timeouts and verify state before compensating
Implement observability from day one; structured logging, metrics, tracing
Keep sagas simple (3-5 steps); chain multiple sagas for complex workflows
Persist saga state even in choreography for debugging
Retry compensations aggressively with DLQ for failures requiring manual intervention
Consider cost vs complexity; orchestration provides better visibility but costs more

The Saga pattern provides a robust solution for distributed transactions in microservices. By understanding the trade-offs between orchestration and choreography, implementing proper idempotency and compensation, and establishing strong observability, you can build reliable distributed systems that maintain consistency across service boundaries.

Helper Functions

typescript

// Simplified helper functions referenced in examples
import { PutCommand } from '@aws-sdk/lib-dynamodb';

async function saveOrder(order: any) {
  await docClient.send(
    new PutCommand({
      TableName: process.env.ORDERS_TABLE!,
      Item: order
    })
  );
}

async function getOrder(orderId: string) {
  const result = await docClient.send(
    new GetCommand({
      TableName: process.env.ORDERS_TABLE!,
      Key: { orderId }
    })
  );
  return result.Item as any;
}

async function reserveInventoryForOrder(orderId: string, items: any[]) {
  return { reservationId: `res_${Date.now()}` };
}

async function releaseInventory(orderId: string) {
  // Release logic
}

async function processPayment(params: any) {
  return {
    paymentId: `pay_${Date.now()}`,
    transactionId: `txn_${Date.now()}`
  };
}

async function updateOrderStatus(orderId: string, status: string) {
  await docClient.send(
    new UpdateCommand({
      TableName: process.env.ORDERS_TABLE!,
      Key: { orderId },
      UpdateExpression: 'SET #status = :status',
      ExpressionAttributeNames: { '#status': 'status' },
      ExpressionAttributeValues: { ':status': status }
    })
  );
}


