AI Pair Programming: The Honest Breakdown After 18 Months of Daily Use

We've been using AI coding assistants every single day for the past 18 months — Cursor, Claude, GPT-4, the whole circus. And we've gone through every phase: the honeymoon where it feels like a superpower, the crash when it confidently destroys something critical, and finally the boring-but-effective middle ground where you just know what to use it for and what to handle yourself.

This isn't a hot take post. It's what we actually learned shipping real product — templates, landing pages, auth flows, payment integrations, database migrations. The stuff where 'mostly correct' can cost you a 2am production incident or a security hole you don't notice for weeks.

What AI is genuinely great at

Let's start with the wins, because there are real ones. The biggest unlock for us was boilerplate elimination. Not the trivial stuff like 'write a for loop' — the genuinely annoying medium-complexity boilerplate that eats 40 minutes of your day.

Think: spinning up a new Zod schema that mirrors a database table, writing the 12th slightly-different version of a Next.js route handler, generating TypeScript types from a JSON response you just copy-pasted. In our experience, AI does all of this roughly 10x faster than we can and gets it right about 85% of the time. That 15% failure rate sounds scary until you remember you'd be reviewing the code anyway.

// Prompt: "Given this Drizzle schema, generate a Zod validation schema for inserts"
// Input:
import { pgTable, text, timestamp, uuid } from 'drizzle-orm/pg-core';
import { z } from 'zod';

const users = pgTable('users', {
  id: uuid('id').primaryKey().defaultRandom(),
  email: text('email').notNull().unique(),
  name: text('name').notNull(),
  role: text('role', { enum: ['admin', 'user'] }).default('user'),
  createdAt: timestamp('created_at').defaultNow(),
});

// AI output (correct, with the caveats in the comments):
const insertUserSchema = z.object({
  email: z.string().email(),
  // The length limits are the AI's own guess; the text column itself has none.
  name: z.string().min(1).max(255),
  // .optional() is redundant here, since .default() already accepts a missing value.
  role: z.enum(['admin', 'user']).optional().default('user'),
});
// Note: id and createdAt are correctly excluded because the database generates them

That's legitimately useful. It correctly understood which fields are DB-generated versus user-provided. Not magic, but genuinely saves 10 minutes of tedious work.

The other area where AI earns its keep: code transformation tasks. Renaming things consistently across a large file, converting a class component to hooks, migrating from one API shape to another, reformatting a bunch of similar functions to match a new pattern. These are tasks where humans make dumb typos from boredom. AI just... does it.

To sum up, the tasks where AI consistently pays off for us:
Generating boilerplate for new features (route handlers, form schemas, API clients)
Code transformation: refactoring patterns, renaming, migrating API shapes
Writing test cases — especially the boring happy-path and edge-case enumeration
Explaining unfamiliar code you didn't write (or wrote 8 months ago and forgot)
First drafts of regex patterns, SQL queries, and utility functions
Documentation and inline comments you'll never write yourself

Where it falls apart (and why)

Here's where we get honest. AI fails in specific, predictable ways — and once you know the pattern, you stop being surprised.

The biggest one: it has no idea what your application actually does. It sees code, not systems. Ask it to refactor a function and it'll optimize that function perfectly while breaking the invariant the rest of your app depends on. We had this exact thing happen with our subscription status logic — AI rewrote a check in a way that was technically cleaner but silently broke the grace period behavior for canceled subscriptions. Clean code. Wrong behavior.

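// What we had (a bit messy but intentional).
// addDays and isBefore are assumed here to come from date-fns:
import { addDays, isBefore } from 'date-fns';
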
function canAccessPremiumFeature(user: User): boolean {
  if (user.subscriptionStatus === 'active') return true;
  // Allow access during 3-day grace period after cancellation
  if (user.subscriptionStatus === 'canceled' && user.canceledAt) {
    const gracePeriodEnd = addDays(user.canceledAt, 3);
    return isBefore(new Date(), gracePeriodEnd);
  }
  return false;
}

// What AI 'cleaned it up' to:
function canAccessPremiumFeature(user: User): boolean {
  return user.subscriptionStatus === 'active';
}

// Technically simpler. Also silently removed a business rule.
// We caught it in review. Barely.

The AI wasn't wrong by any local measure — that code is simpler. It just had no way to know about the grace period behavior we'd intentionally built in. This is the core failure mode: AI optimizes locally, and local optimization breaks global behavior.
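One way to make that kind of rule harder to delete silently is to pin it with a small table of cases, each named after the business rule it protects. The following is only a sketch under assumptions: the `User` shape, the `'expired'` status, and the helper names are hypothetical, `now` is injected so the check is deterministic, and the grace period is computed as 72 hours rather than with calendar-aware date arithmetic.

```typescript
// Hypothetical minimal shapes for illustration; the real User type is not shown here.
type SubscriptionStatus = 'active' | 'canceled' | 'expired';

interface User {
  subscriptionStatus: SubscriptionStatus;
  canceledAt?: Date;
}

const GRACE_PERIOD_DAYS = 3;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

type AccessCheck = (user: User, now: Date) => boolean;

// Same rule as the original function, with `now` injected so results are deterministic.
const withGracePeriod: AccessCheck = (user, now) => {
  if (user.subscriptionStatus === 'active') return true;
  if (user.subscriptionStatus === 'canceled' && user.canceledAt) {
    const graceEnd = user.canceledAt.getTime() + GRACE_PERIOD_DAYS * MS_PER_DAY;
    return now.getTime() < graceEnd;
  }
  return false;
};

// The "cleaned up" version that dropped the grace period.
const simplified: AccessCheck = (user) => user.subscriptionStatus === 'active';

const canceledAt = new Date('2024-03-01T00:00:00Z');
const daysAfterCancel = (days: number): Date =>
  new Date(canceledAt.getTime() + days * MS_PER_DAY);

// Each case is named after the business rule it protects.
const cases: Array<{ rule: string; user: User; now: Date; expected: boolean }> = [
  { rule: 'active users have access',
    user: { subscriptionStatus: 'active' }, now: daysAfterCancel(0), expected: true },
  { rule: 'canceled users keep access inside the grace period',
    user: { subscriptionStatus: 'canceled', canceledAt }, now: daysAfterCancel(1), expected: true },
  { rule: 'access ends exactly when the grace period ends',
    user: { subscriptionStatus: 'canceled', canceledAt }, now: daysAfterCancel(3), expected: false },
  { rule: 'canceled users without a cancellation date have no access',
    user: { subscriptionStatus: 'canceled' }, now: daysAfterCancel(1), expected: false },
  { rule: 'expired users have no access',
    user: { subscriptionStatus: 'expired' }, now: daysAfterCancel(1), expected: false },
];

// Returns the names of the rules that a given implementation violates.
function brokenRules(check: AccessCheck): string[] {
  return cases
    .filter((c) => check(c.user, c.now) !== c.expected)
    .map((c) => c.rule);
}

console.log(brokenRules(withGracePeriod)); // []
console.log(brokenRules(simplified)); // [ 'canceled users keep access inside the grace period' ]
```

With a table like this in the test suite, the simplified version fails on a case whose name says exactly which rule went missing, so the review no longer depends on someone spotting it by eye.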

Second failure mode: security-sensitive code. AI will write auth flows, permission checks, and crypto operations that look completely reasonable and are subtly wrong. Not obviously broken — subtly wrong. Timing attacks in password comparison, permission checks that miss an edge case, JWT validation that skips the algorithm check. It has read all the tutorials. It has also read all the tutorials that had mistakes.

// AI-generated password comparison — looks fine, has a timing issue:
function verifyPassword(input: string, stored: string): boolean {
  const inputHash = hashPassword(input);
  return inputHash === stored; // === on strings is not guaranteed to run in constant time
}

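// What you actually want (assuming hashPassword is deterministic; with salted
// bcrypt or argon2 hashes, use the library's own verify function instead):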
import { timingSafeEqual } from 'crypto';

function verifyPassword(input: string, stored: string): boolean {
  const inputHash = Buffer.from(hashPassword(input));
  const storedHash = Buffer.from(stored);
  if (inputHash.length !== storedHash.length) return false;
  return timingSafeEqual(inputHash, storedHash);
}

// AI will generate the first version most of the time.
// It might even generate the second if you ask specifically about timing attacks.
// The problem is you have to know to ask.
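Both versions above assume `hashPassword` returns the same hash for the same input, which only holds for unsalted hashes. For completeness, here is a minimal sketch of a salted variant using only Node's built-in `node:crypto` module. The `saltHex:hashHex` storage format and the key length are our assumptions for illustration, not a recommendation over a maintained password-hashing library.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from 'node:crypto';

const KEY_LENGTH = 64; // bytes of derived key; an illustrative choice

// Produces "saltHex:hashHex". This storage format is an assumption of the sketch.
function hashPassword(password: string): string {
  const salt = randomBytes(16); // fresh random salt per password
  const hash = scryptSync(password, salt, KEY_LENGTH);
  return `${salt.toString('hex')}:${hash.toString('hex')}`;
}

function verifyPassword(input: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(':');
  if (!saltHex || !hashHex) return false; // malformed record

  const expected = Buffer.from(hashHex, 'hex');
  // timingSafeEqual throws on length mismatch, so reject bad records up front.
  if (expected.length !== KEY_LENGTH) return false;

  // Re-derive the key with the stored salt, then compare in constant time.
  const actual = scryptSync(input, Buffer.from(saltHex, 'hex'), KEY_LENGTH);
  return timingSafeEqual(actual, expected);
}

const record = hashPassword('correct horse battery staple');
console.log(verifyPassword('correct horse battery staple', record)); // true
console.log(verifyPassword('wrong password', record)); // false
```

Because the salt is random, hashing the same password twice gives two different records, which is why verification has to re-derive the key from the stored salt instead of re-hashing and comparing strings.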

The confidence problem

The thing that makes AI dangerous isn't that it's wrong — it's that it's wrong with the same tone it uses when it's right. There's no 'I'm guessing here' indicator. It'll confidently tell you a Next.js API is available when it isn't, confidently generate code for a library version that doesn't exist, and confidently explain behavior that was true in version 12 but changed in version 14.

We've started calling this the 'competent intern' problem. A competent intern writes code quickly, explains their reasoning clearly, and is wrong about 1 in 5 things — but doesn't know which 1 in 5. You wouldn't ship a competent intern's PR without review. Same rule applies.

The danger isn't that AI makes mistakes. It's that it makes mistakes with the same confident tone it uses when it's right. Calibrate your review accordingly.

Version hallucination deserves its own mention. We've wasted real hours chasing bugs caused by AI generating code for APIs that don't exist in the version we're actually running. It confidently uses `useFormState` on stable React 18 (where it only ever shipped in canary builds, before being renamed `useActionState` in React 19), or a Drizzle method that was added in a later release, or a Next.js config option that has since been deprecated. Always check the docs for the version you have installed when AI introduces an API you haven't used before.
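Part of that check can be mechanical. The sketch below compares an installed version string against the release that introduced an API. The `introducedIn` table is a hypothetical structure you would fill in yourself from official changelogs, the function names are ours, and prerelease tags such as `-rc.1` are deliberately ignored to keep it short.

```typescript
interface Version {
  major: number;
  minor: number;
  patch: number;
}

// Parses "19.0.0" or "v19.0.0". Anything after the patch number is ignored.
function parseVersion(raw: string): Version {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(raw.trim());
  if (!match) throw new Error(`Not a semantic version: ${raw}`);
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// True when `installed` is the same as or newer than `required`.
function isAtLeast(installed: string, required: string): boolean {
  const a = parseVersion(installed);
  const b = parseVersion(required);
  if (a.major !== b.major) return a.major > b.major;
  if (a.minor !== b.minor) return a.minor > b.minor;
  return a.patch >= b.patch;
}

// Hypothetical lookup table: API name -> first stable release that ships it.
// Populate it from the official changelog, not from an AI answer.
const introducedIn: Record<string, string> = {
  useActionState: '19.0.0',
};

function checkApi(api: string, installedVersion: string): string {
  const required = introducedIn[api];
  if (required === undefined) return `unknown: no entry for ${api}, check the docs`;
  return isAtLeast(installedVersion, required)
    ? 'available'
    : `unavailable: needs >= ${required}`;
}

console.log(checkApi('useActionState', '18.3.1')); // unavailable: needs >= 19.0.0
console.log(checkApi('useActionState', '19.1.0')); // available
```

The installed version would typically come from the `version` field of the dependency's own `package.json`. The table is only as good as the changelog reading behind it, so this complements checking the docs and does not replace it.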

The workflow that actually works

After 18 months, we've settled into a pattern that's less 'AI writes code, we ship it' and more 'AI drafts, we design'.

The split looks roughly like this: AI handles the first draft of anything self-contained. A new utility function, a new form schema, a new test file, a component that doesn't touch business logic. We handle architecture decisions, anything touching auth or payments, anything that crosses module boundaries, and anything where 'works in isolation' isn't the same as 'correct in context'.

The rules we follow in practice:
Let AI write the first draft, since it's faster and you'll edit it anyway
Never let AI make the architectural decision, only implement it once you've decided
Be explicit about constraints: 'this must be timing-safe', 'don't remove the grace period logic'
When AI refactors, diff carefully — look for behavior changes hiding inside style changes
For anything security-related, write it yourself and use AI to review it instead
Version-check any API you haven't personally used before
Use AI to generate test cases — it's surprisingly good at enumerating edge cases you'll miss
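The "diff carefully" rule above can be backed by a quick differential check: run the old and the refactored function over the same inputs and list every input where the results differ. This is a rough sketch, not a testing framework. `findBehaviorDiffs` and the discount functions are hypothetical names invented for the example, and the JSON-based comparison only suits plain data.

```typescript
interface BehaviorDiff<I, O> {
  input: I;
  before: O;
  after: O;
}

// Runs both implementations on every input and collects the disagreements.
function findBehaviorDiffs<I, O>(
  before: (input: I) => O,
  after: (input: I) => O,
  inputs: readonly I[],
): BehaviorDiff<I, O>[] {
  const diffs: BehaviorDiff<I, O>[] = [];
  for (const input of inputs) {
    const oldResult = before(input);
    const newResult = after(input);
    // Comparing JSON text is a simplification: fine for plain data,
    // not for Dates, Maps, or functions.
    if (JSON.stringify(oldResult) !== JSON.stringify(newResult)) {
      diffs.push({ input, before: oldResult, after: newResult });
    }
  }
  return diffs;
}

// Hypothetical example: a flat discount for orders of 100 or more,
// and a "tidier" rewrite that quietly moved the boundary.
const discountedTotal = (total: number): number =>
  total >= 100 ? total - 10 : total;
const discountedTotalRefactored = (total: number): number =>
  total > 100 ? total - 10 : total;

const diffs = findBehaviorDiffs(
  discountedTotal,
  discountedTotalRefactored,
  [0, 99, 100, 101, 250],
);
console.log(diffs); // [ { input: 100, before: 90, after: 100 } ]
```

The value of the check depends on the inputs you feed it, so boundary values belong in the list. Enumerating those boundaries is itself a task AI tends to do well.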

The 'use AI to review your own code' pattern is underrated. Writing something yourself and then asking Claude 'what could go wrong here?' or 'what edge cases am I missing?' is often more valuable than having it write the code in the first place. It's good at finding problems. Less good at designing correct solutions in complex domains.

What this means for your codebase long-term

There's a creeping risk we've noticed in AI-heavy codebases: the code starts to look right but mean nothing in particular. Functions get cleaner, abstractions get more consistent, but the actual decisions — why this works this way, what invariant this is protecting — get lost. Because AI optimizes for code that looks good, not code that encodes intent.

The fix is boring: write more comments. Not 'this function adds two numbers' comments — comments that explain why. Why is this check here. What breaks if you remove this. What the business rule is. AI can't know this. You can. Future you will thank past you when you come back in six months and Claude helpfully 'simplifies' the function again.

// Don't:
function getSubscriptionStatus(user: User) {
  if (user.canceledAt && isBefore(new Date(), addDays(user.canceledAt, 3))) {
    return 'active';
  }
  return user.subscriptionStatus;
}

// Do:
function getSubscriptionStatus(user: User) {
  // We give users a 3-day grace period after cancellation.
  // This prevents accidental lockouts for users who cancel and re-subscribe.
  // Business decision from 2024-03 — talk to Stefan before changing this.
  if (user.canceledAt && isBefore(new Date(), addDays(user.canceledAt, 3))) {
    return 'active';
  }
  return user.subscriptionStatus;
}

// Now the AI has the context it needs to leave this alone. So does the next developer.

The honest summary

AI coding tools are real productivity multipliers for the right tasks. We're not going back to writing all our Zod schemas by hand or manually generating test cases. But they're also not a replacement for actually understanding what your code needs to do.

The developers getting burned by AI aren't the ones who use it too much — they're the ones who use it in the wrong places. Letting it design your auth flow because it sounds confident. Shipping its database migration without reading it. Accepting a 'cleanup' refactor without checking what got removed.

For us, the biggest practical win has been in template work — the stuff we build and sell at peal.dev. AI dramatically speeds up the 'implementation' phase of building a new template (the boilerplate, the test coverage, the TypeScript types) while we stay in control of the architecture and the tricky business logic. That's the split that's worked.

Use AI like a fast typist who's read a lot of Stack Overflow. Extremely useful. Should not be left unsupervised.

The developers who get the most out of these tools aren't the ones who trust them the most. They're the ones who know exactly where to stop trusting and take the wheel back. That line moves as the models improve — but it hasn't disappeared yet, and anyone who tells you it has is either selling something or hasn't shipped enough production code to know the difference.