We've been using AI coding assistants heavily for the past year — Cursor, Claude, GitHub Copilot, the whole circus. And we have opinions. Not the "AI will replace developers" kind, and not the "AI is just autocomplete" dismissal either. The truth is more boring and more useful: AI is exceptional at some things, embarrassingly bad at others, and the developers getting the most value are the ones who've figured out which is which.
This isn't a review post. It's a breakdown based on actual usage — shipping real features, debugging production issues, building templates. We've made enough mistakes trusting AI in the wrong places that we feel qualified to tell you where to be careful.
What AI Is Genuinely Great At
Let's start positive. There are tasks where AI makes you feel like you're cheating, in the best way.
Boilerplate generation is the obvious one. Writing a Zod schema, a basic API route handler, a TypeScript interface from a JSON blob — AI does this faster than you can type and it's almost always correct. Not because AI is smart, but because these patterns are so common and well-represented in training data that there's basically no room to go wrong.
// You describe this in plain English to AI:
// "Create a Zod schema for a SaaS user profile with
// name, email, optional avatar URL, plan type enum
// (free/pro/enterprise), and created timestamp"
// And you get back something like:
import { z } from 'zod'
export const UserProfileSchema = z.object({
  name: z.string().min(1).max(100),
  email: z.string().email(),
  avatarUrl: z.string().url().optional(),
  plan: z.enum(['free', 'pro', 'enterprise']),
  createdAt: z.coerce.date(),
})
export type UserProfile = z.infer<typeof UserProfileSchema>
That takes 5 seconds instead of 2 minutes. You still read it and verify it, but the friction is gone. Same goes for writing regex patterns, CSS that you'd otherwise spend 20 minutes fighting with, SQL queries for straightforward data fetching, and unit tests for pure functions.
Explaining unfamiliar code is another strong suit. You paste in a gnarly piece of middleware or someone else's custom hook, ask "what does this do and why", and you get a clear explanation in 10 seconds. Stefan does this constantly when we're integrating third-party libraries with poor documentation. It's like having a senior dev on call who has actually read every npm package ever published.
Refactoring well-scoped functions is also great. Give AI a function that does one thing, tell it to make it cleaner/faster/more typesafe, and it usually delivers. The key is "well-scoped" — we'll come back to that.
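As a rough illustration of what "well-scoped" means here, the sketch below shows the kind of before/after that tends to go well: one function, one job, no hidden dependencies. The `Plan` type mirrors the schema earlier in this post; `Account`, `countByPlanLoose`, and `countByPlan` are hypothetical names invented for the example, not code from a real project.

```typescript
// Hypothetical example: the plan names mirror the Zod schema shown earlier.
type Plan = 'free' | 'pro' | 'enterprise'

interface Account {
  id: string
  plan: Plan
}

// Before: works, but `any` hides typos, and unknown plans slip through silently.
function countByPlanLoose(accounts: any[]): any {
  const out: any = {}
  for (let i = 0; i < accounts.length; i++) {
    const p = accounts[i].plan
    if (out[p]) {
      out[p] = out[p] + 1
    } else {
      out[p] = 1
    }
  }
  return out
}

// After: same counts, plus an explicit zero for plans with no accounts.
// The compiler now checks the shape of both the input and the result.
function countByPlan(accounts: readonly Account[]): Record<Plan, number> {
  const counts: Record<Plan, number> = { free: 0, pro: 0, enterprise: 0 }
  for (const account of accounts) {
    counts[account.plan] += 1
  }
  return counts
}

const sample: Account[] = [
  { id: 'a1', plan: 'free' },
  { id: 'a2', plan: 'pro' },
  { id: 'a3', plan: 'free' },
]

console.log(countByPlanLoose(sample))
console.log(countByPlan(sample))
```

The refactor is easy to verify by reading because nothing outside the function affects its result, which is the property that makes a task safe to hand off.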
The "Confidently Wrong" Problem
Here's where it gets dangerous. AI doesn't know what it doesn't know. It will give you a complete, syntactically valid, beautifully formatted answer that is subtly broken in ways that only surface in production at 2am.
The worst case we personally experienced: we asked Claude to help implement a webhook signature verification endpoint. It gave us code that looked correct, and it even added a comment explaining the security reasoning. What it missed was that our framework exposes the request body as a stream that can only be read once, and we needed the raw bytes for HMAC verification, not the parsed JSON. The signature check always failed. The code was logically sound but architecturally wrong for our specific setup, and AI had no way to know about that context.
// What AI confidently gave us:
export async function POST(req: Request) {
  const body = await req.json() // <-- problem here
  const signature = req.headers.get('stripe-signature')
  // This fails because req.json() parses the body,
  // but Stripe verifies against the RAW request bytes
  const event = stripe.webhooks.constructEvent(
    JSON.stringify(body), // can't reconstruct exact bytes
    signature!,
    process.env.STRIPE_WEBHOOK_SECRET!
  )
}
// What we actually needed:
export async function POST(req: Request) {
  const rawBody = await req.text() // raw bytes preserved
  const signature = req.headers.get('stripe-signature')
  const event = stripe.webhooks.constructEvent(
    rawBody,
    signature!,
    process.env.STRIPE_WEBHOOK_SECRET!
  )
}
The fix was obvious once we found it. But AI didn't flag any uncertainty: it presented both versions of the logic with equal confidence. This is the pattern that bites people: the code looks right, tests might even pass locally, and you only find out something's wrong when real conditions hit.
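To see why the first version can never verify, it helps to look at what a parse-and-reserialize round trip does to a payload. The snippet below is a minimal sketch with a made-up payload (the field names are illustrative, not Stripe's actual event format): `JSON.stringify(JSON.parse(raw))` yields equivalent data but different characters, so a signature computed over the original bytes will not match the re-serialized ones.

```typescript
// A made-up webhook payload, as it might arrive on the wire. Senders are free
// to choose their own whitespace, escapes, and number formatting.
const rawBody = '{ "id": "evt_123", "amount": 10.0, "note": "caf\\u00e9" }'

// What the broken handler effectively signs: parse, then serialize again.
const reserialized = JSON.stringify(JSON.parse(rawBody))

// Same data...
const sameData =
  JSON.stringify(JSON.parse(rawBody)) === JSON.stringify(JSON.parse(reserialized))

// ...but not the same characters, so not the same bytes under the HMAC.
const sameBytes = rawBody === reserialized

console.log('raw:         ', rawBody)
console.log('reserialized:', reserialized)
console.log({ sameData, sameBytes })
```

Whitespace, the `10.0` literal, and the `\u00e9` escape all change in the round trip, which is why the working handler reads the body with `req.text()` and passes it through untouched.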
Where AI Consistently Falls Apart
We've identified a few categories where you should treat AI output as a first draft that needs serious review, not a solution.
- Security-sensitive code: auth flows, token handling, permission checks, anything cryptographic. AI knows the concepts but misses the subtle implementation details that matter.
- Code that spans multiple files and depends on your specific architecture. AI works well in isolation; it gets confused when the answer depends on how your app is actually wired together.
- Anything involving race conditions, concurrency, or timing. AI will give you code that works 95% of the time and breaks in hard-to-reproduce ways under load.
- Migrations and schema changes. AI is dangerously optimistic about data migrations. It'll write you a migration that works on empty tables and destroys data on production ones.
- Edge cases and error handling. Ask AI to write a function and it'll nail the happy path. Ask it to handle every realistic failure mode and it either misses half of them or over-engineers it into unreadability.
- Performance optimization at scale. AI can suggest micro-optimizations but has no sense of what the actual bottleneck in your system is.
AI is optimistic by nature. It will solve the problem you described, not the problem you actually have. The gap between those two things is where bugs live.
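The edge-case item in the list above is the easiest one to show in a few lines. Below is a hedged sketch of a pagination-limit parser: `parseLimitHappyPath` is the sort of thing a one-line prompt tends to produce, and `parseLimit` is what it looks like after a human lists the failure modes. Both function names, the default of 20, and the cap of 100 are assumptions made up for this example.

```typescript
// Happy-path version: fine for "25", wrong or surprising for almost anything else.
function parseLimitHappyPath(raw: string | null): number {
  return Number(raw)
}

// Reviewed version. DEFAULT_LIMIT and MAX_LIMIT are illustrative values.
const DEFAULT_LIMIT = 20
const MAX_LIMIT = 100

function parseLimit(raw: string | null): number {
  // Missing or blank parameter: fall back instead of returning 0.
  if (raw === null || raw.trim() === '') return DEFAULT_LIMIT

  const value = Number(raw)

  // "abc", "Infinity", "12.5": reject anything that is not a finite integer.
  if (!Number.isInteger(value)) return DEFAULT_LIMIT

  // Zero and negative limits make no sense for a page size.
  if (value < 1) return DEFAULT_LIMIT

  // Clamp oversized requests rather than letting a client ask for a million rows.
  return Math.min(value, MAX_LIMIT)
}

const inputs: Array<string | null> = ['25', null, '', 'abc', '-5', '12.5', '1000000']
for (const input of inputs) {
  console.log(JSON.stringify(input), parseLimitHappyPath(input), parseLimit(input))
}
```

Whether a bad value should fall back to a default or be rejected with an error is itself a design decision that depends on the API, which is exactly the kind of context a prompt rarely carries.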
The Context Problem Is Real
Most AI failures we've seen come down to context. The AI doesn't know that you're running the Next.js App Router rather than the Pages Router, that you're using Prisma with connection pooling through PgBouncer, that your deployment environment has specific memory constraints, or that this endpoint needs to handle 10,000 requests per minute, not 10.
You can dump context into the prompt, and that helps — but there's a ceiling. You can't fully describe a complex codebase in a prompt. The AI will fill in the gaps with reasonable assumptions that happen to be wrong for your situation.
The developers who get the most out of AI tools are the ones who've learned to be extremely specific about constraints. Don't ask "how do I cache this data" — ask "how do I cache this data in a Next.js 14 App Router context, with Redis via Upstash, where the cache needs to be invalidated when a user updates their profile, and the handler is behind an auth middleware that adds user data to the request." More context, better output.
// Vague prompt result — works but doesn't fit your stack:
async function getCachedUser(userId: string) {
  const cacheKey = `user:${userId}`
  const cached = await redis.get(cacheKey)
  if (cached) return JSON.parse(cached)
  const user = await db.user.findUnique({ where: { id: userId } })
  await redis.set(cacheKey, JSON.stringify(user), 'EX', 3600)
  return user
}
// Specific prompt result — mentions Next.js App Router,
// Upstash Redis, and your invalidation requirement:
import { Redis } from '@upstash/redis'
import { revalidateTag } from 'next/cache'
const redis = Redis.fromEnv()
export async function getCachedUser(userId: string) {
  const cacheKey = `user:${userId}`
  const cached = await redis.get<User>(cacheKey)
  if (cached) return cached
  const user = await db.user.findUnique({ where: { id: userId } })
  if (!user) return null
  // Cache for one hour; call invalidateUserCache() below when the profile changes
  await redis.set(cacheKey, user, { ex: 3600 })
  return user
}
export async function invalidateUserCache(userId: string) {
  await redis.del(`user:${userId}`)
  revalidateTag(`user-${userId}`)
}
A Workflow That Actually Works
After a year of trial and error, here's roughly how we use AI tools now:
For greenfield features, we'll sketch the architecture ourselves — what components, what data flow, what the API surface looks like. Then we hand off individual, well-scoped pieces to AI. "Write this one function", "generate this schema", "create the test cases for this logic". We never hand off the whole feature and ask AI to design it.
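One way to picture that handoff, as a sketch rather than a prescription: the human writes the types and the contract, and only the body of a single function is delegated. Everything below is hypothetical, including the `Invoice` shape and the `summarizeInvoice` name; the point is which part is decided up front and which part is generated.

```typescript
// Written by a human: the data model and the contract are architecture decisions.
interface LineItem {
  description: string
  unitPriceCents: number // integer cents to avoid floating-point money bugs
  quantity: number
}

interface Invoice {
  id: string
  items: LineItem[]
  taxRatePercent: number // e.g. 20 for 20%
}

interface InvoiceSummary {
  subtotalCents: number
  taxCents: number
  totalCents: number
}

/**
 * Contract handed to the AI along with the types above:
 * - subtotal is the sum of unitPriceCents * quantity
 * - tax is rounded to the nearest cent
 * - an invoice with no items summarizes to all zeros
 */
function summarizeInvoice(invoice: Invoice): InvoiceSummary {
  // Delegated: a small, pure function body that is easy to verify by reading.
  const subtotalCents = invoice.items.reduce(
    (sum, item) => sum + item.unitPriceCents * item.quantity,
    0,
  )
  const taxCents = Math.round((subtotalCents * invoice.taxRatePercent) / 100)
  return { subtotalCents, taxCents, totalCents: subtotalCents + taxCents }
}

const summary = summarizeInvoice({
  id: 'inv_1',
  items: [
    { description: 'Pro plan', unitPriceCents: 1999, quantity: 2 },
    { description: 'Extra seat', unitPriceCents: 500, quantity: 1 },
  ],
  taxRatePercent: 20,
})
console.log(summary)
```

Each line of the contract doubles as a test case, so checking the generated body takes a few minutes instead of a design review.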
For debugging, AI is useful for generating hypotheses, not solutions. Paste the error, paste the relevant code, ask "what could cause this". You get a list of possibilities and you go investigate them. Don't just take the first fix AI suggests and ship it.
For code review, AI is surprisingly good at catching obvious issues in isolated functions — unused variables, missing null checks, off-by-one errors. We'll sometimes ask "review this function for edge cases" as a sanity check before PR. It's not a replacement for human review but it catches the embarrassing stuff.
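For a sense of scale, this is the kind of slip a quick AI review pass tends to flag. The example is contrived, and `pageCountBuggy` and `pageCount` are hypothetical names: the first rounds down and silently drops a partial last page, the second rounds up and rejects a zero or negative page size.

```typescript
// Buggy: 41 items at 20 per page is 3 pages, but Math.floor reports 2.
function pageCountBuggy(totalItems: number, pageSize: number): number {
  return Math.floor(totalItems / pageSize)
}

// Reviewed: round up so a partial last page counts, and reject zero or
// negative page sizes instead of dividing by them.
function pageCount(totalItems: number, pageSize: number): number {
  if (pageSize <= 0) {
    throw new RangeError('pageSize must be greater than zero')
  }
  return Math.ceil(totalItems / pageSize)
}

console.log(pageCountBuggy(41, 20), pageCount(41, 20))
console.log(pageCountBuggy(40, 20), pageCount(40, 20))
```

The two versions agree whenever the total divides evenly, which is why this class of bug survives a casual manual test.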
For anything touching auth, payments, or data integrity — we write it ourselves or we review AI output line by line with extreme skepticism. The time you save using AI on these isn't worth the time you lose debugging subtle security or data issues.
The Skill That Actually Matters Now
There's a lot of talk about whether AI will replace developers. We think that's the wrong question. The real question is: which developers will become more effective with AI, and which will become dependent on it in ways that hurt them?
The developers who get better with AI are the ones who already have strong fundamentals. They can read AI output and know immediately whether it's correct, because they understand what correct looks like. They can give AI precise context because they understand their own system well. They can catch the subtle mistakes because they know what the subtle mistakes look like.
The developers who get hurt by AI are the ones who use it to avoid learning. If you're using AI to generate code you don't understand, you're borrowing against your own skill development. You get faster in the short term and less capable over time. We've seen this with junior developers who can ship features quickly but can't debug them when things go wrong.
AI amplifies what you already know. If you know a lot, it makes you significantly faster. If you know a little, it makes you confidently wrong faster.
When we built the peal.dev templates, we used AI heavily for the repetitive parts — generating TypeScript types from API responses, writing utility functions, creating test fixtures. But the architecture decisions, the auth flows, the Stripe webhook handling, the database schema design — those we did ourselves. The templates work reliably in production because the parts that matter were done by humans who understand the failure modes.
The Practical Takeaway
Stop thinking about AI as a tool that writes code for you. Think of it as a tool that removes friction from the parts of coding that are tedious but not complex. Generating boilerplate, writing tests for pure functions, explaining unfamiliar APIs, converting between formats — these are high-value, low-risk uses.
For anything where being wrong has real consequences — security, data integrity, performance under real load — use AI for ideas and first drafts, then verify everything yourself. The 20 minutes you spend reviewing AI-generated auth code is not wasted time. It's the time that keeps you from spending 3 days fixing a security issue you didn't write but shipped anyway.
- High trust: boilerplate, schemas, type definitions, utility functions, test cases for pure logic
- Medium trust: API routes, component code, SQL queries — read carefully, check edge cases
- Low trust: auth flows, payment handling, migrations, anything with concurrency or race conditions
- Always verify: anything where the failure mode involves losing data or exposing user information
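If it helps to make the tiers concrete, they can be written down as data. The sketch below is one hypothetical way to do it: a small lookup that maps a changed file path to the level of trust its AI-generated code should get. The path patterns and the `reviewLevelFor` function are assumptions invented for illustration and would need to match your own repository layout.

```typescript
type TrustLevel = 'high' | 'medium' | 'low'

interface TrustRule {
  pattern: RegExp
  level: TrustLevel
}

// Hypothetical path patterns; the first matching rule wins, so the
// riskiest areas are listed first.
const rules: TrustRule[] = [
  { pattern: /(^|\/)(auth|payments|migrations)\//, level: 'low' },
  { pattern: /(^|\/)(api|components|queries)\//, level: 'medium' },
  { pattern: /(^|\/)(schemas|types|utils)\//, level: 'high' },
  { pattern: /\.test\.ts$/, level: 'high' },
]

// Paths that match no rule default to the most cautious level.
function reviewLevelFor(path: string): TrustLevel {
  for (const rule of rules) {
    if (rule.pattern.test(path)) return rule.level
  }
  return 'low'
}

const changedFiles = [
  'src/schemas/user.ts',
  'src/api/profile/route.ts',
  'src/auth/session.ts',
  'scripts/one-off.ts',
]
for (const file of changedFiles) {
  console.log(file, '->', reviewLevelFor(file))
}
```

The fourth tier, "always verify", is about failure modes rather than file locations, so it stays a human judgment call and is deliberately left out of the lookup.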
The developers having the best time with AI tools right now aren't the ones who've handed over the most code generation to AI. They're the ones who've figured out exactly where AI earns its keep and where it wastes yours. That line is different for every stack and every team, but drawing it clearly is the most valuable thing you can do.
