Monitoring Next.js Apps in Production: Errors, Performance, and Uptime Without Losing Sleep

Here's a scenario that happened to us: we shipped a template update on a Friday evening (classic), went to sleep, and woke up to three support emails asking why checkout was throwing a 500 error. The app was technically 'up' — the server was responding — but the payment flow was completely broken. We had no idea until users told us. That was the moment we got serious about monitoring.

Monitoring is one of those things that feels optional until it absolutely isn't. And with Next.js specifically, there are a few unique gotchas: you've got Server Components, Client Components, Route Handlers, Server Actions, and middleware all potentially failing in different ways. A generic Node.js setup won't catch half of it. Let's fix that.

The Three Pillars You Actually Need

Monitoring for a web app breaks down into three distinct concerns, and people often conflate them. Error tracking tells you when something breaks. Performance monitoring tells you when something is slow. Uptime monitoring tells you when nothing responds at all. You need all three, and they're different tools for different jobs.

Error tracking — catching exceptions, unhandled rejections, and broken API responses
Performance monitoring — Core Web Vitals, server response times, database query durations
Uptime monitoring — external checks that confirm your app actually responds from the outside

The mistake we made early on was thinking that having server logs on Vercel was enough. It's not. Logs are reactive — you're fishing through them after something already went wrong. Good monitoring is proactive. You want to know the second something starts failing, not an hour later when a user's patience runs out.

Error Tracking: Catching What Next.js Hides From You

The Next.js App Router gives you several error boundaries: error.tsx files at different layout levels, plus global-error.tsx for failures in the root layout (not-found.tsx plays a similar role for 404s). These exist for the user experience. They render a fallback UI, but they don't report the error anywhere on their own. A user hits an error boundary, sees a 'Something went wrong' page, and closes the tab. You never find out.

The fix is wiring up an error tracking service and calling it from both your error boundaries and your server-side code. We use Sentry — it's not the sexiest choice, but the Next.js integration is genuinely good and it handles both client and server errors in one place. Here's how we set it up:

// sentry.client.config.ts
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
  replaysSessionSampleRate: 0.05,
  replaysOnErrorSampleRate: 1.0,
  integrations: [
    Sentry.replayIntegration({
      maskAllText: true,
      blockAllMedia: false,
    }),
  ],
});

// sentry.server.config.ts
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
  // Don't set replays on server — that's client-only
});

Notice that tracesSampleRate is set to 0.1 in production. At 1.0, you'll rack up Sentry quota fast on any app with real traffic; 10% sampling gives you enough data to spot patterns without burning through your plan. Errors are a separate matter: they are governed by the sampleRate option, which defaults to 1.0, so Sentry still captures every error unless you change it. tracesSampleRate only applies to performance traces.
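
A flat rate is the simplest option. If some routes matter more than others, Sentry's tracesSampler option accepts a function that returns a rate per transaction instead. The sketch below is only the decision logic, kept as a plain function so it can be tested on its own. The route patterns and rates are assumptions for illustration, not values from our setup.

```typescript
// Hypothetical per-route sampling rules. The first matching rule wins.
const SAMPLING_RULES: Array<{ pattern: RegExp; rate: number }> = [
  { pattern: /\/api\/health/, rate: 0 },  // uptime pings: never trace
  { pattern: /\/checkout/, rate: 0.5 },   // critical flow: sample more
];

const DEFAULT_RATE = 0.1; // same 10% as the flat rate above

// Returns the fraction of transactions with this name that should be traced.
export function pickTraceSampleRate(transactionName: string): number {
  for (const { pattern, rate } of SAMPLING_RULES) {
    if (pattern.test(transactionName)) return rate;
  }
  return DEFAULT_RATE;
}
```

Wired into the config, this would replace the flat tracesSampleRate with something like `tracesSampler: (ctx) => pickTraceSampleRate(ctx.name)`. Treat the `ctx.name` field as an assumption and check it against the sampling-context type of the SDK version you have installed.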

Your error.tsx files also need to report to Sentry explicitly, because the boundary catches the error and renders its fallback, so nothing propagates to a global handler:

// app/error.tsx
'use client';

import * as Sentry from '@sentry/nextjs';
import { useEffect } from 'react';

export default function ErrorBoundary({
  error,
  reset,
}: {
  error: Error & { digest?: string };
  reset: () => void;
}) {
  useEffect(() => {
    // Report to Sentry — Next.js won't do this automatically
    Sentry.captureException(error, {
      tags: {
        digest: error.digest,
      },
    });
  }, [error]);

  return (
    <div className="flex flex-col items-center justify-center min-h-[400px] gap-4">
      <h2 className="text-xl font-semibold">Something went wrong</h2>
      <button
        onClick={reset}
        className="px-4 py-2 bg-primary text-primary-foreground rounded-md"
      >
        Try again
      </button>
    </div>
  );
}

One thing worth knowing: the `digest` property is a hash that Next.js generates for errors thrown on the server. In production the original message and stack trace are stripped before the error reaches the client, and the same digest shows up in the server logs. Include it in your Sentry context and you can match a sanitized client-side report to the server-side error that caused it.
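
Error boundaries only cover rendering. Route Handlers and Server Actions need the same treatment on the server side. Here is a minimal sketch of a wrapper, assuming a `report` callback that you would point at Sentry.captureException. The wrapper name and the `report` signature are illustrative, not part of Next.js or Sentry.

```typescript
type Reporter = (error: unknown, context: Record<string, unknown>) => void;

// Wraps any async function so failures are reported, then rethrown.
// Rethrowing matters: Next.js still needs to see the error to render
// the nearest error boundary or return a 500.
export function withErrorReporting<Args extends unknown[], Result>(
  name: string,
  fn: (...args: Args) => Promise<Result>,
  report: Reporter
): (...args: Args) => Promise<Result> {
  return async (...args: Args) => {
    try {
      return await fn(...args);
    } catch (error) {
      report(error, { handler: name });
      throw error;
    }
  };
}
```

One caveat: redirect() and notFound() work by throwing, so a production version should let those pass through without reporting them.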

Performance Monitoring: Core Web Vitals Without the Guesswork

Next.js has a built-in hook for reporting Web Vitals that almost nobody uses. You can drop it in your root layout and pipe the data wherever you want — Sentry, your own analytics endpoint, Datadog, whatever.

// app/_components/web-vitals.tsx
'use client';

import { useReportWebVitals } from 'next/web-vitals';

export function WebVitals() {
  useReportWebVitals((metric) => {
    // Only report in production — dev numbers are meaningless
    if (process.env.NODE_ENV !== 'production') return;

    const { name, value, rating } = metric;

    // Send to your analytics endpoint
    fetch('/api/vitals', {
      method: 'POST',
      body: JSON.stringify({ name, value, rating }),
      headers: { 'Content-Type': 'application/json' },
      // Use keepalive so the request completes even if the page unloads
      keepalive: true,
    }).catch(() => {
      // Silently fail — don't let monitoring break the app
    });

    // Or send directly to Sentry
    if (rating === 'poor') {
      // Only alert on poor ratings to avoid noise
      import('@sentry/nextjs')
        .then(({ captureMessage }) => {
          captureMessage(`Poor Web Vital: ${name}`, {
            level: 'warning',
            extra: { value, rating },
          });
        })
        .catch(() => {
          // Same rule as above: a failed report must not become an unhandled rejection
        });
    }
  });

  return null;
}

// app/layout.tsx — add to your root layout
// <WebVitals />

The metrics you'll get: CLS (Cumulative Layout Shift), INP (Interaction to Next Paint, which replaced FID as the responsiveness metric in March 2024), LCP (Largest Contentful Paint), FCP (First Contentful Paint), and TTFB (Time to First Byte). Of these, LCP, INP, and CLS are the Core Web Vitals, the three that feed into Google's page experience signals. TTFB is useful for diagnosing slow server-side rendering.
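
The component above posts to `/api/vitals`, which has to exist. Here is a minimal sketch of that Route Handler, assuming you only want to validate the payload and log it. The file path, the `parseVitalPayload` helper, and the logging destination are assumptions, so swap the console.log for whatever store you use.

```typescript
// app/api/vitals/route.ts (hypothetical path matching the fetch call above)
const KNOWN_METRICS = ['CLS', 'FCP', 'FID', 'INP', 'LCP', 'TTFB'] as const;
const KNOWN_RATINGS = ['good', 'needs-improvement', 'poor'] as const;

type VitalPayload = {
  name: (typeof KNOWN_METRICS)[number];
  value: number;
  rating: (typeof KNOWN_RATINGS)[number];
};

// Returns a typed payload, or null if the body is not what we expect.
// The endpoint is public, so everything is treated as untrusted input.
export function parseVitalPayload(body: unknown): VitalPayload | null {
  if (typeof body !== 'object' || body === null) return null;
  const { name, value, rating } = body as Record<string, unknown>;
  if (!KNOWN_METRICS.includes(name as VitalPayload['name'])) return null;
  if (!KNOWN_RATINGS.includes(rating as VitalPayload['rating'])) return null;
  if (typeof value !== 'number' || !Number.isFinite(value) || value < 0) return null;
  return { name, value, rating } as VitalPayload;
}

export async function POST(request: Request): Promise<Response> {
  // A malformed JSON body becomes null and is rejected below.
  const body: unknown = await request.json().catch(() => null);
  const payload = parseVitalPayload(body);
  if (payload === null) return new Response(null, { status: 400 });

  console.log('web-vital', payload); // replace with your analytics store
  return new Response(null, { status: 204 });
}
```

Rejecting unknown metric names keeps junk out of your data, at the cost of dropping any custom metrics you might add later. Loosen the allowlist if you need them.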

For server-side performance — database query times, external API latency, that kind of thing — you need instrumentation at the code level. We use a simple wrapper around our database calls that logs slow queries:

// lib/instrumented-query.ts
import * as Sentry from '@sentry/nextjs';

const SLOW_QUERY_THRESHOLD_MS = 1000;

export async function instrumentedQuery<T>(
  name: string,
  queryFn: () => Promise<T>
): Promise<T> {
  const start = performance.now();

  try {
    const result = await queryFn();
    const duration = performance.now() - start;

    if (duration > SLOW_QUERY_THRESHOLD_MS) {
      console.warn(`Slow query detected: ${name} took ${duration.toFixed(0)}ms`);
      Sentry.captureMessage(`Slow query: ${name}`, {
        level: 'warning',
        extra: { duration, threshold: SLOW_QUERY_THRESHOLD_MS },
      });
    }

    return result;
  } catch (error) {
    const duration = performance.now() - start;
    Sentry.captureException(error, {
      extra: { queryName: name, duration },
    });
    throw error;
  }
}

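// Usage (inside an async Server Component, Route Handler, or Server Action)
// Assumes Drizzle, with `db` exported from '@/lib/db' as in the health check below.
// The '@/lib/schema' path for `usersTable` is illustrative.
import { eq } from 'drizzle-orm';
import { db } from '@/lib/db';
import { usersTable } from '@/lib/schema';
import { instrumentedQuery } from '@/lib/instrumented-query';

const users = await instrumentedQuery('fetch-active-users', () =>
  db.select().from(usersTable).where(eq(usersTable.active, true))
);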

This won't replace proper APM tooling like Datadog or New Relic if you're running a large-scale app, but for most SaaS products doing a few thousand requests a day, it's more than enough to catch problems before they become incidents.

Uptime Monitoring: The Check You Don't Control

Here's the thing about uptime monitoring that people miss: it has to run from outside your infrastructure. If your Vercel deployment is down, a monitor running on Vercel won't tell you anything. You need an external service making HTTP requests to your app on a schedule.

We use Better Uptime (now just 'Better Stack') and have been happy with it, but Checkly, UptimeRobot, and Freshping all work. The free tiers are usually sufficient for basic monitoring. What matters is the setup:

Check your main URL every 1-3 minutes — not just for a 200, but validate response content
Check your health endpoint, not just the homepage, which might be served from cache even when your DB is down
Check your critical user flows: login page, checkout, any page that hits the database
Set up alert channels: email for low priority, SMS or PagerDuty for anything that's been down more than 5 minutes
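
Validating response content is the part most checks skip. Hosted monitors usually let you assert on the body through their UI, and the logic amounts to something like the sketch below. It assumes the health endpoint returns JSON with a top-level `status` field, as the one built in the next few paragraphs does. The function name and result shape are illustrative.

```typescript
type ProbeResult = { ok: true } | { ok: false; reason: string };

// Decides whether a response from the health endpoint counts as "up".
// A 200 with an unexpected body is treated as a failure, since a cached
// page or an error page served with 200 would otherwise slip through.
export function evaluateHealthResponse(httpStatus: number, rawBody: string): ProbeResult {
  if (httpStatus !== 200) {
    return { ok: false, reason: `unexpected HTTP status ${httpStatus}` };
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { ok: false, reason: 'body is not valid JSON' };
  }

  const status =
    typeof parsed === 'object' && parsed !== null
      ? (parsed as Record<string, unknown>).status
      : undefined;

  if (status !== 'healthy' && status !== 'degraded') {
    return { ok: false, reason: `unexpected status field: ${String(status)}` };
  }
  return { ok: true };
}
```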

The health endpoint is crucial. Here's how we build ours — it checks actual dependencies, not just whether the server is running:

// app/api/health/route.ts
import { db } from '@/lib/db';
import { sql } from 'drizzle-orm';
import { NextResponse } from 'next/server';

export const runtime = 'nodejs';
export const dynamic = 'force-dynamic';

type HealthStatus = 'healthy' | 'degraded' | 'unhealthy';

interface HealthCheck {
  status: HealthStatus;
  latencyMs: number;
  error?: string;
}

async function checkDatabase(): Promise<HealthCheck> {
  const start = performance.now();
  try {
    await db.execute(sql`SELECT 1`);
    return { status: 'healthy', latencyMs: performance.now() - start };
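  } catch (error) {
    // Log the details server-side; the public response stays generic,
    // since driver error messages can include hostnames or connection details
    console.error('Health check: database check failed', error);
    return {
      status: 'unhealthy',
      latencyMs: performance.now() - start,
      error: 'Database check failed',
    };
  }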
}

export async function GET() {
  const [database] = await Promise.all([
    checkDatabase(),
    // Add more checks here: Redis, external APIs, etc.
  ]);

  const overallStatus: HealthStatus =
    database.status === 'unhealthy' ? 'unhealthy' :
    database.status === 'degraded' ? 'degraded' :
    'healthy';

  // 'degraded' still returns 200, so only a truly unhealthy app trips the uptime monitor
  const statusCode = overallStatus === 'unhealthy' ? 503 : 200;

  return NextResponse.json(
    {
      status: overallStatus,
      timestamp: new Date().toISOString(),
      checks: { database },
    },
    { status: statusCode }
  );
}

Important: don't expose sensitive info in your health endpoint. No database connection strings, no internal IPs, no stack traces. The endpoint should be public — which means it should be boring. Status + latency is enough.
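
As written, `checkDatabase` only ever returns 'healthy' or 'unhealthy', so the 'degraded' branch never fires. One way to put it to use is to classify a successful check by how long it took. This is a sketch, and the thresholds are assumptions to tune against your own baseline.

```typescript
// Repeated here so the snippet runs on its own.
type HealthStatus = 'healthy' | 'degraded' | 'unhealthy';

// Hypothetical thresholds: tune them to your database's normal latency.
const DEGRADED_AFTER_MS = 300;
const UNHEALTHY_AFTER_MS = 2000;

// Maps the duration of a successful check onto a status.
export function statusFromLatency(latencyMs: number): HealthStatus {
  if (latencyMs >= UNHEALTHY_AFTER_MS) return 'unhealthy';
  if (latencyMs >= DEGRADED_AFTER_MS) return 'degraded';
  return 'healthy';
}

// Combines several checks: the worst individual status wins.
export function worstStatus(statuses: HealthStatus[]): HealthStatus {
  if (statuses.includes('unhealthy')) return 'unhealthy';
  if (statuses.includes('degraded')) return 'degraded';
  return 'healthy';
}
```

In `checkDatabase`, the success branch would then return `statusFromLatency(latencyMs)` instead of a hard-coded 'healthy'. Once you add a second check, `worstStatus` can stand in for the nested ternary.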

Alert Fatigue Is Real — Design Your Alerts Carefully

We went through a phase where we had alerts set up for everything. Every 4xx response, every slow query, every blip. Within two weeks we were ignoring all of them. If every alert is urgent, no alert is urgent.

The framework we landed on: page us (SMS/push) for things that are definitely broken right now and affecting users. Email us for things that are trending wrong but not yet an emergency. Log everything else for when we're actively debugging something.

Page immediately: health endpoint returning 503, error rate > 5% of requests in a 5-minute window, checkout flow completely broken
Email within an hour: queries that consistently take longer than 2s, an error-rate uptick that hasn't crossed 5%, a spike in 4xx errors that might indicate a broken deployment
Log silently: individual errors that happen once, queries between 500ms-1s, 404s on URLs that were probably just bots
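
The three tiers above can be written down as a small routing function, which is a useful exercise even if the real rules live in Sentry or your uptime tool. This is a sketch: the `Signal` shape is made up for illustration, "consistently slow" is approximated with a p95, and "uptick" is taken to mean double the baseline. Only the 503, 5%, and 2s thresholds come from the list.

```typescript
type Channel = 'page' | 'email' | 'log';

// Hypothetical snapshot of the last five minutes.
type Signal = {
  healthStatusCode: number;   // latest response from the health endpoint
  errorRate: number;          // failed requests / total requests, 0..1
  baselineErrorRate: number;  // what "normal" looks like for this app
  p95QueryMs: number;         // 95th percentile query duration
};

export function routeAlert(signal: Signal): Channel {
  // Definitely broken right now
  if (signal.healthStatusCode === 503) return 'page';
  if (signal.errorRate > 0.05) return 'page';

  // Trending wrong, not yet an emergency
  if (signal.p95QueryMs > 2000) return 'email';
  if (signal.errorRate > signal.baselineErrorRate * 2) return 'email';

  // Everything else: keep it for debugging
  return 'log';
}
```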

In Sentry, you can set up alert rules with filters and thresholds. The key is adding a minimum event count before alerting: a single error at 3am shouldn't wake you up, but the same error firing 10 times in 10 minutes should. That's a pattern, not a one-off.
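
The rule itself is a count over a sliding window. Sentry evaluates this for you, so the sketch below is only there to make the logic concrete. The function name is illustrative, and the defaults mirror the "10 times in 10 minutes" example.

```typescript
// Returns true once the same error has fired at least `minCount` times
// within the trailing `windowMs`. Timestamps are epoch milliseconds.
export function shouldAlert(
  eventTimestamps: number[],
  now: number,
  windowMs: number = 10 * 60 * 1000,
  minCount: number = 10
): boolean {
  const windowStart = now - windowMs;
  const recent = eventTimestamps.filter((t) => t > windowStart && t <= now);
  return recent.length >= minCount;
}
```

Ten occurrences spread over an hour never trip this, while ten in five minutes do, which is the difference between background noise and an incident in progress.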

Putting It All Together Without Over-Engineering

Here's the setup we actually run and would recommend to anyone building a Next.js SaaS:

Sentry — error tracking for both client and server, plus performance traces at 10% sampling
Better Stack (or UptimeRobot if you're cost-sensitive) — external uptime checks every 2 minutes on health endpoint and key pages
Vercel Analytics — built-in, free, gives you real-user Web Vitals without any extra setup
Vercel's built-in logs — for debugging specific incidents, not proactive monitoring
A simple /api/health route that checks database connectivity

Total cost for a small SaaS: probably $20-30/month. At the time of writing, Sentry's free tier covers up to 5k errors/month and 10k performance events, and Better Stack's free tier gives you 10 monitors with 3-minute intervals, so checking every 2 minutes means a paid plan. The free tiers are enough to start.

If you're using one of the peal.dev templates, we include a basic health check endpoint and a Sentry configuration file out of the box — so you're not starting from scratch. Wiring up the monitoring service itself still takes an hour, but at least the Next.js side is already structured correctly.

One Last Thing: Test Your Monitoring

This sounds obvious but almost nobody does it: deliberately break something in a staging environment and verify that your alerts fire. Throw an unhandled exception and confirm Sentry captures it. Take your database offline for a minute and make sure the uptime monitor pages you. Set a query to sleep for 3 seconds and check that the slow query log shows up.
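
For the first of those tests, a route that fails on demand is handy. Here is a sketch, assuming you guard it with a secret so it can't be triggered by anyone who finds the URL. The path, header name, and factory function are all illustrative.

```typescript
// app/api/monitoring-test/route.ts (hypothetical path)

// Builds a handler that throws on purpose when the caller presents the
// expected token, and pretends not to exist otherwise.
export function createTestErrorHandler(expectedToken: string | undefined) {
  return async function GET(request: Request): Promise<Response> {
    const provided = request.headers.get('x-monitoring-test-token');

    // No token configured, or wrong token: behave like a missing route.
    if (!expectedToken || provided !== expectedToken) {
      return new Response('Not found', { status: 404 });
    }

    // Deliberately unhandled, so it travels the same path as a real bug.
    throw new Error('Monitoring test: deliberate failure');
  };
}

// In the real route file you would export the handler, for example:
// export const GET = createTestErrorHandler(process.env.MONITORING_TEST_TOKEN);
```

Hit it with the header set, then confirm the event shows up in Sentry and that the alert rule routes it where you expect. Whether an unhandled error in a Route Handler is captured automatically depends on how the SDK is wired in. If nothing shows up, that is exactly the gap this test exists to find.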

Monitoring that you've never tested is just a false sense of security. We learned this after setting up what we thought was a solid alerting pipeline, only to discover six months later that we'd fat-fingered a webhook URL and alerts had been silently failing for ages. Nothing quite like finding that out during an actual incident.

The goal of monitoring isn't to know everything — it's to know the right things fast enough that you can fix them before users give up and leave. Three well-configured alerts beat thirty noisy ones every time.

Set up your health endpoint today, wire up Sentry this week, and add an external uptime check before you go live. Future you — the one getting paged at 2am — will be genuinely grateful.