Prompt Engineering for Code Generation: What Actually Works in 2025

We've all been burned. You write a vague prompt, the AI produces something that looks right, you paste it in, it half-works, you iterate five more times, and somehow you've wasted more time than if you'd just written it yourself. Or worse: it appears to work, then ships a subtle bug that bites you three weeks later at 2am.

Prompt engineering for code generation isn't magic, and most of the advice out there is either too abstract ('be specific!') or cargo-culted from marketing copy. This post is what we've actually learned after using AI tools daily for the past year building templates at peal.dev — the patterns that genuinely improve output quality, and the ones that are a waste of time.

The Core Problem: AI Fills in Blanks With Assumptions

Every gap in your prompt gets filled from the model's training data. That sounds fine until you realize the training data contains every Stack Overflow answer from 2015, every tutorial that uses deprecated patterns, and a million 'hello world' examples that don't reflect production code. When you write 'add authentication to my app', the model reaches for the most statistically common answer, which might be Passport.js, JWTs stored in localStorage, or whatever was popular when the training data was collected.

The fix isn't writing longer prompts. It's eliminating ambiguity by stating your constraints explicitly: what you're using, what you're NOT using, and what the output should and shouldn't do. Think of it like writing a function signature before the body: constrain the output space first.

Lead With Context, Not the Request

Most people write prompts backwards. They lead with the ask and append context as an afterthought. Flip it. Give the model its operating environment before you make the request. This isn't about writing more — it's about ordering information so the model builds the right mental model before it starts generating.

// Bad prompt
"Add rate limiting to my API route"

// Better prompt
"Tech stack: Next.js 15 App Router, TypeScript strict mode, deployed on Vercel (serverless).
Using: Upstash Redis for any stateful storage.
Not using: any middleware packages beyond what Next.js provides natively.

Add rate limiting to this API route handler. Limit to 10 requests per minute per IP.
Return 429 with a Retry-After header when exceeded.
Read the client IP from X-Forwarded-For (Vercel sets this header) and handle the case where it is missing.

[paste your route code here]"

Notice what changed: you've ruled out express-rate-limit, you've specified the storage layer, you've called out the Vercel-specific header behavior. The model can no longer guess wrong on any of those. You're not writing more words for the sake of it — every line eliminates a decision the model would otherwise make for you.
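For reference, here's roughly the shape of answer the better prompt is steering toward. Treat it as a minimal sketch under stated assumptions, not what any model will literally return: an in-memory Map stands in for Upstash Redis so the example runs on its own (on serverless you'd need the shared store, which is exactly why the prompt names it), plain values stand in for the Next.js request and response objects, and clientIp, checkRateLimit and toResponse are names we made up for illustration.

```typescript
// Sketch only: an in-memory Map stands in for Upstash Redis, and plain objects
// stand in for the Next.js request/response types. All names are hypothetical.

type RateLimitResult =
  | { allowed: true; remaining: number }
  | { allowed: false; retryAfterSeconds: number };

const WINDOW_MS = 60_000; // one minute
const MAX_REQUESTS = 10; // per window, per client IP

// ip -> request count and the time the current window started
const windows = new Map<string, { count: number; windowStart: number }>();

// X-Forwarded-For may hold a comma-separated chain; the leftmost entry is
// conventionally the originating client. A missing or empty header falls
// back to a shared "unknown" bucket.
function clientIp(forwardedFor: string | null | undefined): string {
  const first = forwardedFor?.split(",")[0]?.trim();
  return first ? first : "unknown";
}

// Fixed-window counter: the first request opens a window, later requests
// increment the count until the window expires.
function checkRateLimit(ip: string, now: number = Date.now()): RateLimitResult {
  const entry = windows.get(ip);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    windows.set(ip, { count: 1, windowStart: now });
    return { allowed: true, remaining: MAX_REQUESTS - 1 };
  }
  if (entry.count < MAX_REQUESTS) {
    entry.count += 1;
    return { allowed: true, remaining: MAX_REQUESTS - entry.count };
  }
  const retryAfterSeconds = Math.ceil((entry.windowStart + WINDOW_MS - now) / 1000);
  return { allowed: false, retryAfterSeconds };
}

// Translates a denied result into the status and header the prompt asked for.
function toResponse(result: RateLimitResult): { status: number; headers: Record<string, string> } {
  return result.allowed
    ? { status: 200, headers: {} }
    : { status: 429, headers: { "Retry-After": String(result.retryAfterSeconds) } };
}
```

In a real route handler you'd pass the X-Forwarded-For header value into clientIp, call checkRateLimit, and return the 429 early when the check fails. The point is that every branch here traces back to a line in the prompt.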

The 'Show Your Types' Trick

If you're working with TypeScript — and you should be — paste your types before asking for implementation code. This is probably the single highest-leverage thing you can do. The model will generate code that matches your actual data shape instead of inventing one that looks plausible.

// Paste this BEFORE your actual request:
type User = {
  id: string
  email: string
  role: 'admin' | 'member' | 'viewer'
  organizationId: string
  createdAt: Date
}

type Session = {
  userId: string
  expiresAt: Date
  metadata: {
    ipAddress: string
    userAgent: string
  }
}

// Now your request:
// "Write a function that validates a session and returns the associated user.
//  Return null if session is expired. Throw an AuthError if userId doesn't match any user.
//  Use Drizzle ORM with the db import from '@/lib/db'."

You'd be shocked how much better the output is when the model knows your role enum is 'admin' | 'member' | 'viewer' instead of the generic 'user' | 'admin' it would invent. It also stops the model from inventing properties your types don't have, which is a very common failure mode.
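To make that concrete, here's a sketch of the kind of function the request above is aiming for, written against the pasted types. Two things are assumptions on our part: AuthError is a hypothetical error class (the request only names it), and the user lookup is injected as a findUserById function so the example runs without a database. In the real request that lookup would be a Drizzle query using the db import from '@/lib/db'.

```typescript
type User = {
  id: string;
  email: string;
  role: "admin" | "member" | "viewer";
  organizationId: string;
  createdAt: Date;
};

type Session = {
  userId: string;
  expiresAt: Date;
  metadata: { ipAddress: string; userAgent: string };
};

// Hypothetical error type; the original request only names it.
class AuthError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AuthError";
  }
}

// Assumption: the lookup is injected so the sketch runs without a database.
type FindUserById = (id: string) => Promise<User | undefined>;

async function validateSession(
  session: Session,
  findUserById: FindUserById,
  now: Date = new Date(),
): Promise<User | null> {
  // An expired session is not an error, it just yields no user.
  if (session.expiresAt.getTime() <= now.getTime()) return null;

  const user = await findUserById(session.userId);
  if (!user) {
    throw new AuthError(`No user found for session userId ${session.userId}`);
  }
  return user;
}
```

Because the types came first, there is nothing to invent: the return type is User | null, the expiry check uses the expiresAt field that actually exists, and no extra properties sneak in.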

Negative Constraints Are Underrated

Tell the model what NOT to do. This sounds counterintuitive — surely you want to describe the solution, not non-solutions? But think about how many times you've gotten code that technically works but uses a pattern you hate, a library you're not installing, or an approach that doesn't fit your architecture. Negative constraints are preventive debugging.

- "Do not use any additional npm packages": prevents library suggestions you have to research and install
- "Do not use useEffect for data fetching": steers the model toward Server Components, or toward React Query if you've said you use it
- "Do not use try/catch inside the component": pushes error handling to your error boundaries
- "Do not create new files, only modify what I paste": keeps scope contained
- "Do not change the function signature": prevents the model from refactoring things you didn't ask it to refactor

That last one is huge. AI models have a compulsive refactoring instinct. You ask them to fix one bug and they restructure your whole module. Telling them explicitly 'only change what's needed to fix the bug, leave everything else as-is' saves you from reviewing a diff that's 10x larger than it needed to be.

Prompt Patterns That Actually Work

After enough repetition, you start to notice which prompt structures reliably produce good code and which ones are a coin flip. Here are the ones we actually use:

// Pattern 1: The Spec Pattern
// Great for new features where you know what you want but not how to implement it

"Implement the following spec:
- Input: [describe inputs]
- Output: [describe outputs / return type]
- Behavior: [describe logic]
- Error cases: [how should errors be handled]
- Constraints: [performance, security, etc.]

Use [your tech stack]. Here's the existing code it needs to integrate with:
[paste relevant code]"

// Pattern 2: The Diff Pattern
// Great for modifications — you want minimal blast radius

"Here's the current implementation:
[paste code]

Here's what needs to change:
- [specific change 1]
- [specific change 2]

Return only the modified sections, either as a unified diff or as rewritten versions of the changed functions.
Do not refactor anything that isn't directly related to these changes."

// Pattern 3: The Review Pattern
// Great for catching bugs before they ship

"Review this code for:
1. Security issues (auth bypass, injection, data exposure)
2. Edge cases I haven't handled
3. Performance problems at scale (assume 1000 concurrent users)
4. TypeScript type safety holes

Do NOT suggest style changes or refactoring. Only flag actual bugs or risks.
For each issue, tell me: severity (critical/medium/low), what breaks, and how to fix it.

[paste code]"

The Review Pattern is something we started using after a particularly rough deploy. You catch things that look fine but aren't — missing authorization checks, N+1 queries hiding in loops, error messages that leak internal implementation details. It's basically a free code reviewer that doesn't have feelings.
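If you reuse these patterns often, it can help to turn them into small template functions so the structure stays consistent. Here's a minimal sketch for Pattern 1 (the Spec Pattern). The Spec type, its field names, and buildSpecPrompt are all hypothetical; the optional doNot list is our addition for attaching negative constraints, not part of the pattern as written above.

```typescript
// Hypothetical helper: field names mirror the Spec Pattern bullets.
type Spec = {
  input: string;
  output: string;
  behavior: string;
  errorCases: string;
  constraints: string;
  stack: string;
  existingCode: string;
  doNot?: string[]; // optional negative constraints, e.g. "change the function signature"
};

function buildSpecPrompt(spec: Spec): string {
  const lines: string[] = [
    "Implement the following spec:",
    `- Input: ${spec.input}`,
    `- Output: ${spec.output}`,
    `- Behavior: ${spec.behavior}`,
    `- Error cases: ${spec.errorCases}`,
    `- Constraints: ${spec.constraints}`,
    "",
  ];

  // Negative constraints go before the code so they are read as part of the spec.
  const doNot = spec.doNot ?? [];
  if (doNot.length > 0) {
    lines.push(...doNot.map((rule) => `Do not ${rule}.`), "");
  }

  lines.push(`Use ${spec.stack}. Here's the existing code it needs to integrate with:`);
  lines.push(spec.existingCode);
  return lines.join("\n");
}
```

The value is less in the string concatenation than in the required fields: the type won't let you send a spec without saying how errors should be handled.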

Iterating Without Losing Your Mind

Iteration is where most people's prompt strategy falls apart. The first output isn't right, so they write 'that's not quite right, can you fix it' and get back something that changes too much or too little. You need to be surgical about correction prompts.

// Bad correction prompt
"That's not right, the error handling is wrong"

// Good correction prompt
"The error handling is wrong in two specific ways:
1. Line 23: you're catching all errors and returning null, but AuthError should re-throw
2. The expired session case returns undefined instead of null (breaks the return type)

Fix only those two issues. Leave everything else exactly as-is."

If you give vague feedback, you get vague fixes. The model will rewrite things it doesn't need to rewrite because it doesn't know what specifically was wrong. Pinpoint the issue, describe the expected behavior, and explicitly constrain the scope of the fix.
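To show why the good correction prompt works, here's a sketch of what its two targeted fixes might look like in code. The function itself is hypothetical (getSessionUser, the load callback, and the AuthError class are stand-ins we invented), but the two commented lines correspond one-to-one to the two numbered issues in the prompt.

```typescript
type User = { id: string; email: string };

// Hypothetical error type used by the code under correction.
class AuthError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AuthError";
    // Keeps instanceof working when compiling to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// `load` stands in for whatever lookup the generated code performed.
function getSessionUser(
  expiresAt: Date,
  load: () => User,
  now: Date = new Date(),
): User | null {
  // Fix 2: the expired case returns null explicitly rather than falling
  // through to undefined, so the declared return type holds.
  if (expiresAt.getTime() <= now.getTime()) return null;

  try {
    return load();
  } catch (err) {
    // Fix 1: AuthError propagates; only other failures collapse to null.
    if (err instanceof AuthError) throw err;
    return null;
  }
}
```

Two issues named, two small edits, and a diff you can review in seconds.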

Also: don't iterate more than 3-4 times on the same code without starting fresh. After a few rounds of back-and-forth, the model starts making compensating changes that mask earlier problems rather than actually fixing them. If you're on iteration five and things are getting worse, scrap the conversation, write a better initial prompt incorporating what you learned, and start clean.

The Dirty Secret: Context Window Hygiene Matters More Than Prompt Cleverness

Fancy prompt techniques won't save you if you're feeding the model 3000 lines of irrelevant code along with a 50-line function you want help with. Models tend to recall material from the middle of a long context less reliably than material near the beginning or end, so the function you actually care about can get lost in the noise. This is why 'I pasted my whole codebase and the output was garbage' is such a common complaint.

- Paste only the files directly relevant to the task, usually 1-3 files maximum
- If you need to paste a large file, trim it: remove imports the AI doesn't need, collapse functions it shouldn't touch with a comment like '// [other methods omitted]'
- Put your actual question/request at the END of the prompt, after all context, because models tend to anchor on the last thing they see
- For long sessions, periodically summarize what's been established and start a new conversation; don't trust that the model remembers accurately what was agreed 20 messages ago
- Explicitly reference line numbers or function names rather than saying 'the function above', since positional references break down in long contexts

The best prompt is one that makes it easier for the model to be right than wrong. You're not persuading it — you're constraining the solution space until only the correct answer fits.
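Two of the hygiene rules above (trim imports, put the request last) are mechanical enough to script. Here's a minimal sketch; stripImports, assemblePrompt, and the ContextFile type are hypothetical names, and the import detection is a line-based heuristic that assumes imports start at column 0 and end on a line closing with a quoted module name. That covers common single-line and multi-line forms, not every possible one.

```typescript
type ContextFile = { path: string; source: string };

// Replaces all import statements with a single placeholder comment.
function stripImports(source: string): string {
  const out: string[] = [];
  let inImport = false;
  let placeholderAdded = false;

  for (const line of source.split("\n")) {
    // Static imports only: `import(` (dynamic import) is deliberately not matched.
    if (!inImport && /^import[\s{*'"]/.test(line)) inImport = true;

    if (inImport) {
      // The statement ends on the line that closes with the quoted module name.
      if (/['"]\s*;?\s*$/.test(line)) inImport = false;
      if (!placeholderAdded) {
        out.push("// [imports omitted]");
        placeholderAdded = true;
      }
      continue;
    }
    out.push(line);
  }
  return out.join("\n");
}

// Context first, request last.
function assemblePrompt(files: ContextFile[], request: string): string {
  const context = files.map((f) => `// File: ${f.path}\n${stripImports(f.source)}`);
  return [...context, request].join("\n\n");
}
```

Only strip imports when they don't matter to the task. If the question is about which package a symbol comes from, leave them in.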

When to Stop Prompting and Just Write It

This is the thing nobody says enough: sometimes prompting is not the right tool. If you've spent 15 minutes writing and refining a prompt for something that would take 10 minutes to write manually, you've made a bad trade. AI code generation earns its keep on boilerplate, repetitive patterns, code you broadly understand but don't want to type out, and unfamiliar APIs where you need a working starting point.

It earns its keep less on complex business logic with subtle requirements, security-critical code where you need to fully understand every line anyway, and anything where the iteration cost is high (database migrations, schema changes, anything touching production data). For all of the templates we build at peal.dev, we use AI heavily for the scaffolding and repetitive wiring — auth callbacks, API route structure, component boilerplate — but we write the business logic and security checks ourselves. The boundary isn't fuzzy once you've shipped enough code.

The practical takeaway: prompt engineering is a skill, but the highest-leverage skill is knowing when to use it. Invest in writing better prompts for the code you generate often. For the rest, type it yourself and move on.