Database Transactions in Serverless: What Can Go Wrong (And Often Does)

We had a production bug where users were being charged without getting access to what they paid for. The payment went through, the webhook fired, but the database ended up half-updated. No error in Sentry. No obvious failure. Just a angry support email at 7am. That's when we got serious about transactions in serverless environments.

Transactions in traditional server setups are annoying but manageable. In serverless — with cold starts, function timeouts, connection limits, and stateless execution — they become a whole different problem. Let's go through exactly what breaks and why.

The Cold Start Connection Problem

In a traditional server, you establish a database connection once and keep it alive. You can start a transaction, do work, commit or roll back — all on the same stable connection. In serverless, every function invocation might be starting from scratch. New process, new connection, and you're paying for the overhead.

The problem isn't just latency. It's that connection-level constructs — and transactions are connection-level — don't play well with ephemeral execution contexts. If your function times out mid-transaction, you need the database to eventually roll it back. Most databases do this when the connection drops, but with connection poolers like PgBouncer in transaction mode, the connection goes back to the pool while your transaction is still 'open' from the application's perspective.

// This looks fine but can silently break with PgBouncer in transaction mode
export async function POST(req: Request) {
  const client = await pool.connect();
  
  try {
    await client.query('BEGIN');
    await client.query('INSERT INTO orders ...');
    // Function times out here — what happens to the transaction?
    await client.query('UPDATE inventory ...');
    await client.query('COMMIT');
  } catch (e) {
    await client.query('ROLLBACK');
    throw e;
  } finally {
    client.release();
  }
}

If you're using Neon, Supabase, or PlanetScale with serverless drivers, each of them has specific transaction semantics you need to understand. They're not all the same. Neon's HTTP driver doesn't support interactive transactions at all unless you use their transaction helper. PlanetScale doesn't support multi-statement transactions by default (foreign keys too, but that's a separate rant).

Timeouts Are Silent Killers

Vercel's free tier gives you 10 seconds. Pro gives you 60 seconds for serverless, 800 for Edge. AWS Lambda defaults to 3 seconds if you don't change it. The naive assumption is: my query is fast, so I'll never hit this. Then you get a traffic spike, your database is under load, query takes 12 seconds instead of 200ms, and your function exits mid-transaction.

The database connection drops, the database (eventually) rolls back the transaction, but from your application's perspective — nothing happened. You caught no error. The operation just... didn't complete. If you're lucky, you have idempotency checks. If you're not, you have partial data.

A partial write with no error is worse than a complete failure with an error. At least with an error you know something went wrong.

The fix here is aggressive timeout budgeting. Set your database query timeout lower than your function timeout, so the database throws before your function exits unexpectedly. If you're using Drizzle or Prisma, you can set statement timeouts at the query level.

import { drizzle } from 'drizzle-orm/neon-http';
import { neon } from '@neondatabase/serverless';

const sql = neon(process.env.DATABASE_URL!);
const db = drizzle(sql);

// Set statement timeout lower than your function timeout
// Vercel serverless = 60s, so set DB timeout to 45s
async function withTimeout<T>(fn: () => Promise<T>, ms: number): Promise<T> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error(`Query timed out after ${ms}ms`)), ms)
  );
  return Promise.race([fn(), timeout]);
}

export async function createOrderWithInventory(userId: string, productId: string) {
  return withTimeout(async () => {
    return await db.transaction(async (tx) => {
      const [order] = await tx
        .insert(orders)
        .values({ userId, status: 'pending' })
        .returning();

      await tx
        .update(inventory)
        .set({ reserved: sql`reserved + 1` })
        .where(eq(inventory.productId, productId));

      return order;
    });
  }, 45_000);
}

Retries Without Idempotency Are a Disaster

Here's a fun scenario: your transaction fails because of a transient network error. You retry. The transaction succeeds — but was it the first one that actually committed and you just didn't get the confirmation? Or did neither commit and the retry was correct? Now you've created two orders for the same cart.

This isn't hypothetical. Stripe webhooks retry on failure. Queue systems retry failed jobs. Your client-side retry logic retries on network errors. All of these can trigger the same transactional operation more than once. If your transaction isn't idempotent, you will have data integrity problems eventually.

The standard fix is idempotency keys. Generate a unique key for the operation on the client side (or derive it from immutable inputs), and make your transaction check for it before doing anything.

// Schema: add idempotency_key unique column to relevant tables
// idempotency_keys table for cross-entity operations

export async function processWebhookPayment(
  stripeEventId: string, // use stripe's event ID as idempotency key
  userId: string,
  amount: number
) {
  return await db.transaction(async (tx) => {
    // Check if we've already processed this event
    const existing = await tx
      .select()
      .from(idempotencyKeys)
      .where(eq(idempotencyKeys.key, stripeEventId))
      .limit(1);

    if (existing.length > 0) {
      // Already processed, return the cached result
      return { alreadyProcessed: true, result: existing[0].result };
    }

    // Do the actual work
    const [payment] = await tx
      .insert(payments)
      .values({ userId, amount, status: 'completed' })
      .returning();

    await tx
      .update(users)
      .set({ credits: sql`credits + ${amount}` })
      .where(eq(users.id, userId));

    // Record that we processed this event
    await tx.insert(idempotencyKeys).values({
      key: stripeEventId,
      result: JSON.stringify({ paymentId: payment.id }),
      createdAt: new Date(),
    });

    return { alreadyProcessed: false, result: { paymentId: payment.id } };
  });
}

Distributed Transactions Are (Almost) Never Worth It

At some point someone on a team decides they need a transaction that spans two databases, or a database and an external API call (like Stripe). This is a trap. There is no clean solution here that doesn't involve a lot of complexity.

The classic mistake: call Stripe inside a database transaction to charge the user, then insert the payment record. If the Stripe call succeeds but the insert fails, you charged someone without recording it. If you flip the order, you record a payment before taking money. Neither option is clean.

The honest answer is you can't have ACID guarantees across system boundaries. What you can have is eventual consistency with compensation logic. The pattern that works in practice:

Write your intent first (pending state in the DB) — this is your durable record that something is happening
Do the external operation (Stripe charge, send email, whatever)
Update your DB to reflect the result (success or failure)
Have a background job that finds 'stuck' pending records and either completes or cancels them
Make each step idempotent so retries don't double-charge or double-send

This is basically the Saga pattern, and it's more ceremony than a simple transaction, but it's honest about what distributed systems can actually guarantee. Two-phase commit exists, but 'I'm not going to run a transaction coordinator in my serverless app' is a valid stance.

Connection Pooling Modes Actually Matter

If you're using Supabase with the pooler (which you should for serverless), you need to understand the difference between Session mode and Transaction mode.

Transaction mode is the right choice for serverless — it gives you a connection only for the duration of a transaction and returns it to the pool immediately after. But it breaks prepared statements and advisory locks, which some ORMs use internally. Session mode is more compatible but holds a connection for the duration of a client session, which defeats the point of pooling in serverless.

Prisma with Supabase in Transaction mode used to require specific configuration to disable prepared statements. Drizzle with Neon's serverless driver uses HTTP by default, which doesn't have a persistent connection at all — great for isolation, means you need Neon's transaction helper for multi-statement transactions.

// Neon serverless — interactive transactions via HTTP
import { neon } from '@neondatabase/serverless';

const sql = neon(process.env.DATABASE_URL!);

// This will NOT work — each query is an independent HTTP request
async function brokenTransaction() {
  await sql`BEGIN`; // HTTP request 1
  await sql`INSERT INTO orders ...`; // HTTP request 2 — different 'connection'
  await sql`COMMIT`; // HTTP request 3 — commits nothing
}

// This WILL work — neon's transaction() wraps it properly
async function correctTransaction() {
  const result = await sql.transaction([
    sql`INSERT INTO orders (user_id, status) VALUES (1, 'pending') RETURNING id`,
    sql`UPDATE inventory SET reserved = reserved + 1 WHERE product_id = 42`,
  ]);
  return result;
}

// Or with Drizzle on top of Neon:
async function drizzleTransaction(db: NeonDatabase) {
  return await db.transaction(async (tx) => {
    const [order] = await tx.insert(orders).values({ userId: 1 }).returning();
    await tx.update(inventory)
      .set({ reserved: sql`reserved + 1` })
      .where(eq(inventory.productId, 42));
    return order;
  });
}

The Practical Checklist

After enough production incidents, we've settled on a list of things to verify before any transactional code goes to production in a serverless environment.

Your DB query timeout is set lower than your function timeout — never let the function be the first to give up
Every operation that gets retried (webhook handlers, queue consumers) has an idempotency check
You're not mixing external API calls inside database transactions — accept eventual consistency instead
You know which pooling mode you're running and what it breaks (prepared statements, advisory locks, etc.)
You have a way to detect and recover from stuck 'pending' records — a cron job, a dead letter queue, something
Your transaction only covers what actually needs atomicity — don't wrap unrelated operations in a transaction 'just to be safe'

That last one is underrated. We've seen code where someone put 6 different operations in a single transaction because 'they should all succeed together', but three of them were reads and one was an external API call. The transaction bought nothing and made rollback behavior confusing.

A transaction should cover the minimum set of operations that must be atomic. Everything else is just adding failure surface area.

One More Thing: Serialization Failures

If you're using SERIALIZABLE isolation (and you probably aren't, but maybe you should be for financial operations), PostgreSQL will sometimes throw serialization errors when two transactions conflict. This is correct behavior — it's the database telling you 'these transactions would have conflicted, try again.'

In a serverless handler, you need to be ready to retry these. They're not bugs, they're expected. The retry logic should be at the transaction level, not the HTTP level, and it needs to be bounded — don't retry forever, pick a max attempts count and return a 503 if you exhaust it.

async function withSerializableRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3
): Promise<T> {
  let attempt = 0;
  
  while (attempt < maxAttempts) {
    try {
      return await fn();
    } catch (e: any) {
      // PostgreSQL serialization failure error code
      if (e?.code === '40001' && attempt < maxAttempts - 1) {
        attempt++;
        // Small jitter before retry
        await new Promise(r => setTimeout(r, Math.random() * 100));
        continue;
      }
      throw e;
    }
  }
  
  throw new Error('Transaction failed after max retries');
}

// Usage
await withSerializableRetry(() =>
  db.transaction(async (tx) => {
    // your transactional code here
  }, { isolationLevel: 'serializable' })
);

All of this is the kind of stuff we've baked into the database layer of peal.dev templates — the transaction helpers, the idempotency patterns, the pooling configuration that actually works on Vercel out of the box. Because this stuff is genuinely annoying to set up from scratch every project.

The main takeaway: serverless doesn't break transactions, but it does expose every assumption you were making about long-lived connections, reliable timing, and single-execution guarantees. Audit your transactional code with the assumption that the function will die at the worst possible moment. If your data is still consistent after that, you're in good shape.