Async API Patterns for Web and Mobile: An Opinionated Default

One default shape for long-running work across a browser SPA and a mobile app, with the cases where it should be overridden.

The Problem, Made Concrete

Most product APIs begin with plain synchronous request and response. It works for login, for reading a user record, for a simple search. Then one endpoint needs to do real work.

Consider a payment authorization that takes six to ten seconds. A spinner that long drops checkout conversion; users retry the button, and without an idempotency layer the backend accepts a duplicate charge.

Or consider a mobile upload that triggers a two-minute transcode. The user backgrounds the app, the cellular socket dies on carrier NAT or OS suspension, the job finishes successfully minutes later, and the UI still shows a spinner pinned on "uploading" because the client never learned. Webhook-driven outcomes (a payment-provider callback arriving after the originating mobile session is gone) have the same shape: the backend knows; the client does not.

These are not edge cases. They are the normal failure shape of long-running work over unreliable mobile links. The fix is not more polling. The fix is picking the right coordination pattern per operation and using an idempotency layer underneath.

The Default

For most product teams running a web app and a mobile app on the same backend, one pattern covers the long tail of long-running operations. Reach for it first, and override it only when a specific constraint demands something else.

  1. Client sends POST /resource with an Idempotency-Key header. Server returns 202 Accepted with a Location: /jobs/{id} header and a jobId in the body.
  2. Client subscribes to GET /jobs/{id}/events over SSE. On disconnect, the browser (or a React Native SSE library) reconnects automatically with Last-Event-ID, and the server replays missed events.
  3. On app reopen after a long background, the client calls GET /jobs/{id} once to read the terminal state before re-subscribing.

One server API covers three client paths: the web user watching in a tab, the mobile user staying in the foreground, and the mobile user who backgrounds the app and returns hours later. The sequence below is that default drawn out end to end.

Runnable Example

Here is a small Node sketch of the submit endpoint plus the SSE stream. Error handling is trimmed for space. The three-branch idempotency cache is the shape worth copying: match returns the cached 202, mismatch returns 422, fresh creates the job.

```typescript
import express from "express";
import { createHash, randomUUID } from "crypto";
import { EventEmitter } from "events";

const app = express();
app.use(express.json());

interface Job {
  id: string;
  status: "queued" | "processing" | "completed" | "failed";
  progress: number;
  result?: string;
}

const jobs = new Map<string, Job>();
const events = new EventEmitter();

// Idempotency-Key cache: key -> { bodyHash, jobId }
const idem = new Map<string, { bodyHash: string; jobId: string }>();

const hash = (s: string): string =>
  createHash("sha256").update(s).digest("hex");

// Worker hand-off, elided here: a real worker updates the job record and
// calls events.emit("job", job) as it progresses.
function enqueue(_jobId: string): void {}

app.post("/transcodes", (req, res) => {
  const key = req.header("Idempotency-Key");
  const bodyHash = hash(JSON.stringify(req.body));

  if (key) {
    const cached = idem.get(key);
    if (cached) {
      if (cached.bodyHash === bodyHash) {
        // Same key, same body: a retry. Reuse the original job.
        return res.status(202).json({ jobId: cached.jobId });
      }
      // Same key, different body: a client bug.
      return res
        .status(422)
        .json({ error: "idempotency_key_reused_with_different_body" });
    }
  }

  // Fresh request: create the job and record the key.
  const jobId = randomUUID();
  const job: Job = { id: jobId, status: "queued", progress: 0 };
  jobs.set(jobId, job);
  if (key) idem.set(key, { bodyHash, jobId });

  enqueue(jobId);
  res.status(202).location(`/jobs/${jobId}`).json({ jobId });
});

app.get("/jobs/:id/events", (req, res) => {
  const { id } = req.params;
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  // Replay current state on connect, then stream updates.
  const job = jobs.get(id);
  if (job) {
    res.write(`event: status\ndata: ${JSON.stringify(job)}\n\n`);
  }

  const listener = (update: Job) => {
    if (update.id !== id) return;
    res.write(`id: ${Date.now()}\n`);
    res.write(`event: ${update.status}\n`);
    res.write(`data: ${JSON.stringify(update)}\n\n`);
  };

  events.on("job", listener);
  req.on("close", () => events.off("job", listener));
});

app.get("/jobs/:id", (req, res) => {
  const job = jobs.get(req.params.id);
  if (!job) return res.sendStatus(404);
  res.json(job);
});

app.listen(3000);
```

Two things earn their keep. The Idempotency-Key cache lets a retry after a network hiccup reuse the same job ID instead of creating a duplicate. The separate GET /jobs/{id} endpoint is the reconciliation path. A mobile client that missed the terminal event still has a clean way to learn the final state.

Browser Client

The browser side is short because EventSource does the hard parts.

```typescript
async function transcode(file: File) {
  const key = crypto.randomUUID();
  const res = await fetch("/transcodes", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Idempotency-Key": key,
    },
    body: JSON.stringify({ fileId: file.name }),
  });
  const { jobId } = await res.json();

  const events = new EventSource(`/jobs/${jobId}/events`);
  // Event names match the job status the server writes into `event:`.
  events.addEventListener("processing", (e) =>
    console.log("progress", JSON.parse(e.data)),
  );
  events.addEventListener("completed", (e) => {
    console.log("done", JSON.parse(e.data));
    events.close();
  });
  events.onerror = () => {
    // EventSource auto-reconnects. Only close if we decide to give up.
  };
}
```

React Native Note

React Native does not ship EventSource. The two common paths are the react-native-sse library or parsing the stream yourself from fetch with a ReadableStream polyfill. Parsing is not hard; each record is event: plus data: plus a blank line. The library is fine for most teams.

On reopen, do not trust the stream alone. Call GET /jobs/{id} and render whatever terminal state comes back. If you only rely on the SSE stream, a background kill that drops the final event leaves your UI stuck.
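A minimal sketch of that reopen flow. The `reconcileJob` name and its callbacks are illustrative, not a library API; the fetch and subscribe functions are injected so the shape works the same in React Native or a test.

```typescript
type FetchLike = (url: string) => Promise<{ json(): Promise<any> }>;

// On app reopen: read the authoritative state first, subscribe only if
// the job is still running. A terminal state renders immediately.
async function reconcileJob(
  jobId: string,
  fetchFn: FetchLike,
  subscribe: (jobId: string) => void,
  render: (job: { status: string }) => void,
): Promise<void> {
  const job = await (await fetchFn(`/jobs/${jobId}`)).json();
  render(job);
  if (job.status === "queued" || job.status === "processing") {
    subscribe(jobId); // only re-attach the stream while work is in flight
  }
}
```

The point of the ordering is that the GET is the source of truth; the stream is only an optimization for jobs that have not finished yet.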

Failure Modes that Make the Default Work

Idempotency Keys, the Default's First Contract

Any mutating endpoint a mobile client calls needs an Idempotency-Key. The client generates a UUID per logical operation and reuses it on retry. The server caches the request fingerprint and response for about 24 hours. The IETF draft for the Idempotency-Key header has gone through several revisions, and Stripe has run the pattern in production for years. There is no reason to invent a different key name.

The server stores three things per key: the body fingerprint, the response body, and a status. If a retry comes in with the same key and a matching fingerprint, return the cached response. If the fingerprint differs, that is a client bug. Stripe returns 400 with an idempotency_error code; 409 Conflict and 422 Unprocessable Entity are also reasonable choices. Pick one and document it.
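One way to sketch that per-key record in TypeScript. The `in_flight` state is an assumption on top of the text above: it covers the race where a retry lands while the original request is still being processed, before any response exists to cache.

```typescript
// Three fields per key: body fingerprint, cached response, and a status.
type IdemRecord =
  | { bodyHash: string; state: "in_flight" }
  | { bodyHash: string; state: "done"; responseBody: string };

const store = new Map<string, IdemRecord>();

type Decision =
  | { kind: "fresh" } // no record: run the operation
  | { kind: "replay"; body: string } // same key + body, finished: return cache
  | { kind: "in_flight" } // same key + body, still running: wait or 409
  | { kind: "mismatch" }; // same key, different body: client bug

function checkKey(key: string, bodyHash: string): Decision {
  const rec = store.get(key);
  if (!rec) return { kind: "fresh" };
  if (rec.bodyHash !== bodyHash) return { kind: "mismatch" };
  if (rec.state === "in_flight") return { kind: "in_flight" };
  return { kind: "replay", body: rec.responseBody };
}
```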

Retry with Jitter

Client retries need exponential backoff with jitter. If many clients retry at exactly 1s, 2s, 4s, 8s, they synchronize. The server never gets a quiet moment to recover. The AWS Builders' Library writeup on timeouts and backoff is the reference most teams cite. Full jitter over the current backoff window is a good default.
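A sketch of full jitter over the current backoff window. The 500ms base, 30s cap, and five attempts are placeholder numbers to tune, not recommendations from the AWS writeup.

```typescript
// Full jitter: wait a uniform random slice of the current backoff window,
// so retrying clients spread out instead of synchronizing.
function fullJitterDelay(attempt: number, baseMs = 500, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * ceiling);
}

async function retryWithJitter<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // out of attempts: surface it
      await new Promise((r) => setTimeout(r, fullJitterDelay(attempt)));
    }
  }
}
```

Pair this with the Idempotency-Key from the previous section; retrying a POST without one is the duplicate-charge bug again.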

Reconciling Missed Events on Reconnect

When the SSE connection drops and comes back, the client's first signal is Last-Event-ID. The server replays events from that point forward. When the app reopens after a long background, a single GET /jobs/{id} is the authoritative answer. It returns the terminal state if the job finished, or the current progress if it is still running. Only after that does the client try to resubscribe.

Do not build reconciliation at the transport layer. Build it at the business layer, keyed by a correlation ID your domain already owns. "Order 9f3c completed" reconciles fine across a six-hour app kill. "Socket 0x42 payload 17" does not.
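One way to sketch the server side of replay, assuming each event carries a monotonically increasing sequence number that the server writes into the SSE id: field. The in-memory log here is a stand-in for whatever store you actually use.

```typescript
interface JobEvent {
  id: number; // monotonically increasing; becomes the SSE `id:` field
  jobId: string;
  type: string;
  data: string;
}

// Append-only per-job event log, in memory for the sketch.
const log: JobEvent[] = [];

// Everything after the client's Last-Event-ID, or the full log on first connect.
function eventsSince(jobId: string, lastEventId: number | null): JobEvent[] {
  return log.filter(
    (e) => e.jobId === jobId && (lastEventId === null || e.id > lastEventId),
  );
}

function formatSse(e: JobEvent): string {
  return `id: ${e.id}\nevent: ${e.type}\ndata: ${e.data}\n\n`;
}
```

On reconnect, the handler parses the Last-Event-ID request header, writes `eventsSince(...)` to the response, and then streams live events as before.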

SSE Buffering and Proxy Pitfalls

SSE works through almost every proxy, but a surprising number buffer the response by default. Events then arrive in a clump or never at all. Nginx needs X-Accel-Buffering: no on the response. Other proxies have similar knobs. If your CDN sits in front of the SSE endpoint, disable response compression for text/event-stream and confirm the connection is not being held until the full body arrives. Test from the networks your users actually sit behind, not only from your office.
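If a Node server terminates the SSE response behind Nginx, the header set might look like this; X-Accel-Buffering is Nginx-specific, and the flushHeaders call is a small assumption that pushing headers out early helps the client see the stream open promptly.

```typescript
import type { ServerResponse } from "http";

// Headers for an SSE endpoint sitting behind Nginx. X-Accel-Buffering: no
// tells the proxy not to buffer this response; other proxies have their
// own knobs for the same thing.
const sseHeaders = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  Connection: "keep-alive",
  "X-Accel-Buffering": "no",
} as const;

function startSse(res: ServerResponse): void {
  res.writeHead(200, sseHeaders);
  res.flushHeaders?.(); // send headers immediately where supported
}
```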

When to Override the Default

The default covers most long-running operations. Five situations earn an override.

Operation completes under about one second at p99 on a degraded mobile network. Plain synchronous request and response. No job ID, no SSE, no coordination. The boundary is not what you see on your laptop; it is what a user on LTE with 30% packet loss sees. Overbuilding here costs you battery and code for no gain.

Bidirectional, high-frequency client writes. WebSockets. Chat, collaborative cursors, multiplayer input. The radio stays awake, a Wi-Fi to LTE handoff breaks the connection, and mobile background policies kill sockets fast. You will build heartbeats, reconnect with jitter, and a server-side session resume token. Budget for that from day one, and do not reach for WebSockets as "future flexibility" on a problem SSE already covers.

A corporate proxy strips SSE. Long polling is the fallback. The client sends a GET that the server may hold for up to N seconds, flushing when data appears or at timeout, and the client immediately reconnects. It costs one round trip per event with no multiplexing, but every proxy passes it through.
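A sketch of the client loop under those rules, assuming the server answers 200 with an event or 204 at timeout; the `timeout` query parameter and the injected fetch function are illustrative choices, not a standard.

```typescript
type PollFetch = (url: string) => Promise<{ status: number; json(): Promise<any> }>;

// Long-poll loop: the server may hold each GET up to ~25s, flushing when
// data appears (200) or at timeout (204); the client reconnects immediately.
async function longPoll(
  url: string,
  fetchFn: PollFetch,
  onEvent: (e: any) => void,
  isDone: () => boolean,
): Promise<void> {
  while (!isDone()) {
    const res = await fetchFn(`${url}?timeout=25`);
    if (res.status === 200) onEvent(await res.json());
    // 204: server timed out with nothing to say; just loop and reconnect.
  }
}
```

A production version adds the backoff-with-jitter from earlier for error responses; a tight loop against a failing server is the outage-amplifying polling mistake from the pitfalls list.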

A multi-step workflow that runs for hours with per-step retries and timeouts. A workflow engine like Temporal replaces the plain queue behind the default API. If you try to rebuild retries, compensations and per-step timeouts on top of a message queue, you will reinvent workflow orchestration badly. The default's POST and GET /jobs/{id} endpoints stay the same; only the worker changes shape.

Pure backend decoupling with no client waiting. A message queue like SQS or BullMQ, invisible behind the default API. The client still POSTs and gets a 202 with a job ID, but the SSE stream may never carry more than a single completed event, and in some cases the client does not subscribe at all.

Webhooks deserve one note, not a chapter. They are server-to-server only. A webhook is never the sole result channel for a web or mobile client, because the client is not a server. When a provider calls your webhook endpoint, verify the signature, persist the event, ACK, and then bridge the outcome to the user's SSE stream or mark the job store so the next GET /jobs/{id} returns the terminal state.
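A sketch of the verification step, assuming the provider signs the raw request body with HMAC-SHA256 and sends the hex digest in a header. Real providers differ (Stripe, for instance, signs a timestamped payload), so check your provider's docs for the actual scheme.

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Constant-time comparison of the provider's signature against our own
// HMAC of the raw body. Never compare signatures with ===.
function verifySignature(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const given = Buffer.from(signatureHex, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

The handler order from the paragraph above still applies around this: verify, persist the event, ACK the provider, and only then do the bridging work asynchronously.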

Decision Framework

The flowchart below starts from the default and branches to overrides.

One guardrail before the overrides: if the server takes longer than about 15 seconds to respond, mobile clients on cellular tend to lose the connection before your handler returns. The OS does not impose a fixed request cap, but carrier timeouts and app suspension routinely break longer-held requests. Treat roughly 15s as a practical ceiling for anything a mobile client holds open in the foreground. Past that, you are in default territory whether you wanted to be or not.

Common Pitfalls

The mistakes below show up in nearly every project that bolts async onto sync.

  • Treating a 30-second synchronous endpoint as "probably fine". It is not. Mobile LTE timeouts are shorter than your laptop's Wi-Fi tolerance.
  • Retrying POST without an idempotency key. Duplicate charges and duplicate orders follow within a week.
  • Assuming WebSocket reconnect "just works". It does not across network switches.
  • Webhook handlers that do the work synchronously before ACKing. The provider retries. You do the work twice.
  • SSE behind a reverse proxy with response buffering on. Events arrive in a clump or never at all. Nginx needs X-Accel-Buffering: no; other proxies have similar knobs.
  • Polling interval tuned for the happy path, not the outage path. One-second polling, multiplied by many users and a ten-minute outage, becomes a self-inflicted denial of service.
  • No dead-letter queue. One poison message consumes the whole worker pool.
  • Correlation ID invented at the transport layer instead of the business layer. Impossible to reconcile when a user reopens the app hours later.
  • Skipping webhook signature verification. Every major provider offers it; using it is a one-line change.

Two Honest Disagreements in the Field

Two points divide experienced practitioners. It is worth naming them rather than picking a winner.

The first is WebSockets versus SSE on mobile battery. Some sources argue WebSockets are more efficient once connected because they avoid the wake-up cost of reconnecting. Others report WebSockets draining more battery because the radio stays in a higher power state. The honest answer depends on message frequency. Sparse traffic favors SSE with keepalive. Chatty traffic favors WebSockets. Measure on the devices and networks your users sit on.

The second is queue choice for long-running jobs on Node. BullMQ advocates argue it is the right default for Node shops. Temporal advocates argue that if the work has retries, timeouts and multiple steps, a queue library is the wrong abstraction and you will rebuild workflow orchestration poorly on top of it. Both views are correct for different shapes of work. The question is the shape of your job, not the library's popularity.
