We've all done it. You have a bug, you're frustrated, you hit Ctrl+A on your entire component folder and dump it into Claude or ChatGPT, then wonder why it starts confidently suggesting changes to files you didn't even ask about. The AI isn't dumb; you just handed it a fire hose when it needed a cup.
Context window management is the skill nobody talks about but every developer using AI tools needs. It's not just about token limits. It's about signal-to-noise ratio. Garbage in, garbage out — and with LLMs, 'garbage' means irrelevant-but-valid code that pulls the model's attention in the wrong direction.
Why the Context Window Is More Like Working Memory Than a Filing Cabinet
The mental model most people use for context windows is wrong. They think of it like a hard drive: just fill it up with stuff and the AI will 'have access' to it. But it's closer to human working memory. The more things you ask someone to hold in their head simultaneously, the worse they perform on any individual task. LLMs show a broadly similar effect.
Research has shown that models tend to perform worse on information buried in the middle of long contexts, the so-called 'lost in the middle' problem: material near the beginning and end of the context is used most reliably. So if you bury the code that actually matters under 8,000 tokens of boilerplate, it lands in exactly the region the model handles worst, and you've kneecapped yourself before you've even asked the question.
The best context isn't the most complete context. It's the most relevant context. You're curating, not archiving.
The Anatomy of a Good Context Window
When we're using AI for a specific coding task — say, debugging a Next.js server action or wiring up a new Stripe webhook — we've landed on a consistent structure that gets dramatically better results.
- The immediate problem (2-3 sentences, brutally specific)
- The file(s) directly involved (not the whole module, just the relevant file)
- Type definitions or interfaces the code depends on
- The error message or unexpected behavior, verbatim
- One example of related working code if the pattern isn't obvious
- Any constraints the AI should know (version numbers, external service limitations)
Notice what's not on that list: your entire project structure, every utility function in your codebase, your README, or your config files unless they're directly relevant. The AI doesn't need to understand your whole app to fix your specific bug.
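If you assemble these prompts often, the checklist above can be turned into a small helper. This is a minimal TypeScript sketch, not code from our templates: `DebugContext` and `buildDebugPrompt` are hypothetical names, and the section titles are just one way to label each piece. Optional pieces are omitted entirely when empty, so nothing pads the prompt.

```typescript
// Hypothetical shape for the pieces of a focused debugging prompt,
// mirroring the checklist: problem, files, types, error, example, constraints.
interface DebugContext {
  problem: string // 2-3 sentences, brutally specific
  files: { path: string; code: string }[] // only the files directly involved
  types?: string // type definitions the code depends on
  error: string // error message or unexpected behavior, verbatim
  workingExample?: string // one related working example, if the pattern isn't obvious
  constraints?: string[] // version numbers, external service limitations
}

// Built at runtime so this source never contains a literal triple backtick.
const FENCE = '`'.repeat(3)

// Labels one piece of context with a heading.
function section(title: string, body: string): string {
  return `## ${title}\n${body.trim()}`
}

function buildDebugPrompt(ctx: DebugContext): string {
  const parts: string[] = [section('The Problem', ctx.problem)]
  for (const file of ctx.files) {
    parts.push(section(`File: ${file.path}`, `${FENCE}\n${file.code.trim()}\n${FENCE}`))
  }
  if (ctx.types) parts.push(section('Types', ctx.types))
  parts.push(section('Error (verbatim)', ctx.error))
  if (ctx.workingExample) parts.push(section('Related Working Code', ctx.workingExample))
  if (ctx.constraints && ctx.constraints.length > 0) {
    parts.push(section('Constraints', ctx.constraints.map((c) => `- ${c}`).join('\n')))
  }
  // A blank line between sections keeps the labels easy to scan.
  return parts.join('\n\n')
}
```

Because every field has a slot, the helper also nudges you to ask whether each piece actually belongs before you fill it in.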
Practical Patterns for Structuring What You Feed In
We've started using a simple wrapper format when pasting code. It sounds obvious, but explicitly labeling what each piece of code is — and why you're including it — genuinely improves output quality. It also forces you to think about whether each piece actually belongs there.
````markdown
## Context
I'm building a Next.js 14 app with the App Router. I'm using Prisma with PostgreSQL.
## The Problem
My server action throws `Error: PrismaClientKnownRequestError` when trying to create a user that already exists. I want to catch this specific error and return a user-friendly message instead of crashing.
## The File With the Bug
```typescript
// app/actions/auth.ts
'use server'
import { prisma } from '@/lib/prisma'

export async function createUser(email: string, password: string) {
  const user = await prisma.user.create({
    data: { email, password: await hashPassword(password) },
  })
  return { success: true, user }
}
```
## Relevant Type (Prisma error codes I know about)
Prisma unique constraint violation is error code P2002.
## What I've Tried
Wrapping in try/catch works but I don't know how to check if it's specifically a unique constraint error versus a database connection error.
````

That whole prompt is maybe 200 tokens. You'll get a better answer than if you'd pasted your entire auth module, your Prisma schema, your middleware, and your environment setup. We've tested this, obsessively, because we spent weeks building templates and needed the AI to help us debug patterns we were seeing for the first time.
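For reference, here is a hedged sketch of the kind of answer that prompt is after. In a real Prisma project the usual check is `err instanceof Prisma.PrismaClientKnownRequestError && err.code === 'P2002'`, with `Prisma` imported from `@prisma/client`; to keep this sketch runnable without Prisma installed, it checks the `code` property structurally instead. `isUniqueConstraintError`, `runCreate`, and the error messages are hypothetical, and the `{ success, error }` result shape is the "return, don't throw" style for server actions.

```typescript
// Structural check for Prisma's unique-constraint error code (P2002).
// With Prisma installed you would more likely write:
//   err instanceof Prisma.PrismaClientKnownRequestError && err.code === 'P2002'
function isUniqueConstraintError(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    'code' in err &&
    (err as { code: unknown }).code === 'P2002'
  )
}

type ActionResult<T> = { success: true; data: T } | { success: false; error: string }

// Hypothetical wrapper: runs a create call and maps failures to
// user-friendly messages instead of letting the server action throw.
async function runCreate<T>(create: () => Promise<T>): Promise<ActionResult<T>> {
  try {
    return { success: true, data: await create() }
  } catch (err) {
    if (isUniqueConstraintError(err)) {
      return { success: false, error: 'An account with that email already exists.' }
    }
    // Anything else (for example a connection failure) gets a generic message.
    return { success: false, error: 'Something went wrong. Please try again.' }
  }
}
```

Inside `createUser` you would wrap the call as `runCreate(() => prisma.user.create({ ... }))`. Mapping unknown errors to a generic message keeps the action from crashing, but you will probably still want to log the original error.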
The Types File Trick
Here's one that took us embarrassingly long to figure out: when you're working in a typed codebase, your type definitions and interfaces are some of the highest-value context you can include. They're compact, they're information-dense, and they tell the AI exactly what shape your data is in.
```typescript
// Instead of pasting your whole 400-line component, paste the types it uses:
interface Subscription {
  id: string
  status: 'active' | 'canceled' | 'past_due' | 'trialing'
  currentPeriodEnd: Date
  planId: string
  customerId: string
}

interface User {
  id: string
  email: string
  subscription: Subscription | null
  role: 'admin' | 'member' | 'viewer'
}

// planIncludes lives elsewhere in the codebase; declaring its signature
// (assumed from how it's called) tells the AI what it takes and returns.
declare function planIncludes(planId: string, feature: string): boolean

// Then paste ONLY the specific function that's broken:
export function canAccessFeature(user: User, feature: string): boolean {
  if (!user.subscription) return false
  // bug is here somewhere...
  return user.subscription.status === 'active' && planIncludes(user.subscription.planId, feature)
}
```

The AI now understands exactly what a User and Subscription look like in your system, without you having to include the 12 other places those types get used. This is especially useful when you're debugging data transformation logic or asking the AI to write a function that operates on your specific domain objects.
When You Actually Do Need More Context
Everything above is about reducing context. But sometimes you genuinely need to give the AI a lot. Architectural questions, refactoring tasks, or 'why does this pattern exist' questions benefit from broader context. The key is being intentional about it.
For larger context dumps, we use a different structure. Start with the question at the very top — before any code. This is counterintuitive because it feels like you're asking before you've given context, but it anchors the AI's attention on what matters. Then provide context. Then restate the specific question at the bottom.
```markdown
## Question (read this first)
Should this data fetching logic live in the Server Component or should I extract it into a separate server action? I want to understand the trade-offs for this specific pattern.
## The Current Implementation
[paste your code here]
## Constraints
- This page needs to be streamed with Suspense
- The data is user-specific (can't be statically cached)
- We reuse similar fetching logic in 3 other places
## Restate: What I need
A recommendation with reasoning, not just 'it depends'. Pick one and tell me why.
```

The 'restate at the bottom' trick is genuinely useful. Because models use the start and end of a long context most reliably (the 'lost in the middle' effect again), putting your question at both ends helps keep the response focused on what you actually asked, even with 3,000 tokens of code in the middle.
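The layout is mechanical enough to script. A minimal sketch, assuming you collect the bulk context as titled blocks; `sandwichPrompt` is a hypothetical helper, and by default it restates the original question verbatim at the end, though a sharper restatement like the one above usually works better.

```typescript
// One titled chunk of bulk context (code, constraints, and so on).
interface ContextBlock {
  title: string
  body: string
}

// Hypothetical helper: question first, bulk context in the middle,
// question restated last, so the ask sits at both ends of the prompt.
function sandwichPrompt(
  question: string,
  blocks: ContextBlock[],
  restatement: string = question,
): string {
  const parts = [
    `## Question (read this first)\n${question}`,
    ...blocks.map((b) => `## ${b.title}\n${b.body}`),
    `## Restate: What I need\n${restatement}`,
  ]
  return parts.join('\n\n')
}
```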
Managing Context Across a Conversation
Long conversations degrade. This is just reality. After 10-15 back-and-forths, the model starts losing track of earlier context, contradicting itself, or giving solutions that conflict with decisions made earlier in the thread. We learned this the hard way building out the auth flow for a template — 40 messages in, Claude started suggesting we use a library we'd explicitly said we weren't using in message 3.
The solution is to treat each conversation like a sprint. Pick a narrow scope, work through it, then start a fresh conversation for the next thing. When you start fresh, open with a brief summary of decisions already made (a 'state of the world' paragraph), so you're not re-litigating old ground, and the model is working from a short, clean context instead of a 40-message backlog.
```markdown
## State of the World (decisions already made, don't re-suggest these)
- Auth: NextAuth v5 with Credentials + Google provider
- DB: Prisma + Neon PostgreSQL
- Session: JWT (not database sessions)
- No Redis, keeping infra simple for now
- User roles stored in JWT claims, not DB lookup on every request
## New Task
I need to implement the 'forgot password' flow. Given the above constraints, what's the right approach for generating and storing password reset tokens?
```

This 'state of the world' pattern has cut our frustrating AI conversations by probably 60%. It sounds like extra work but it takes 2 minutes to write and saves you from arguing with a chatbot about why you're not using Redis.
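If you keep the decision list somewhere durable (a notes file, say), opening each fresh conversation becomes copy-paste. A minimal sketch with a hypothetical `freshConversationPrompt` helper; it only formats text, so the list is only as good as your habit of updating it after each decision.

```typescript
// Hypothetical helper: renders decisions already made as a "State of the World"
// preamble, followed by the single, narrowly scoped task for this conversation.
function freshConversationPrompt(decisions: string[], task: string): string {
  const state = decisions.map((d) => `- ${d}`).join('\n')
  return [
    "## State of the World (decisions already made, don't re-suggest these)",
    state,
    '',
    '## New Task',
    task,
  ].join('\n')
}

// Usage: the same list is reused to open every fresh thread.
const decisions = [
  'Auth: NextAuth v5 with Credentials + Google provider',
  'Session: JWT (not database sessions)',
  'No Redis, keeping infra simple for now',
]
const opener = freshConversationPrompt(decisions, "Implement the 'forgot password' flow.")
```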
IDE Tools vs. Chat: Different Context Strategies
Cursor, GitHub Copilot, and similar IDE-integrated tools handle context differently than a chat interface. With IDE tools, you have less manual control — the tool is deciding what to include. Knowing how each tool works helps you work with it instead of against it.
Cursor's @-mentions system is excellent because it makes context inclusion explicit. Instead of hoping the tool grabs the right files, you tell it exactly what's relevant: @auth.ts @types/user.ts @middleware.ts. This is the same principle as manual curation but with a better UX. We use this constantly when working on the peal.dev templates — especially when jumping between the auth layer and the payment layer, which share types but have very different logic.
- For Cursor: Use @file mentions explicitly rather than relying on automatic context detection
- For Copilot: Open only the files you want it to consider, close everything else
- For Claude/ChatGPT in browser: Use the structure patterns above, be explicit with labels
- For all tools: Never assume the AI knows your project structure — state it when relevant
The .cursorrules / System Prompt Trick for Persistent Context
Some context is so consistent across your project that it shouldn't have to live in every individual prompt. Framework versions, coding conventions, architectural decisions — these belong in a system prompt or project-level rules file that applies everywhere.
```markdown
# .cursorrules (or your system prompt)
## Project: SaaS template built with Next.js 14 App Router
### Tech stack (don't suggest alternatives)
- Next.js 14 with App Router
- TypeScript (strict mode)
- Prisma + PostgreSQL
- NextAuth v5
- Stripe for payments
- Resend for email
- Tailwind CSS + shadcn/ui
### Conventions
- Server components by default, add 'use client' only when necessary
- Server actions for all form submissions and mutations
- All database calls go through /lib/db — never import prisma directly in components
- Error handling: return { success: boolean, error?: string } objects, don't throw from server actions
- Types go in /types directory, not colocated with components
### Don't suggest
- tRPC (we evaluated it, decided against it)
- React Query (server components handle caching)
- Any solution that requires running a separate server process
```

This kind of persistent context means you're not burning tokens explaining your architecture in every conversation, and the AI isn't suggesting you switch to a different ORM every time you ask a database question. It's especially valuable when multiple people are working on the same codebase: everyone gets consistent AI behavior without having to memorize what to include in each prompt.
Project-level rules files are documentation that actually gets read: by your AI tools, on every single request. That's more than you can say for most READMEs.
This is something we bake into every peal.dev template — a solid starting .cursorrules file that reflects the actual architecture decisions in the template, so when you start building on top of it, the AI already knows the lay of the land.
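If you call a model API directly instead of going through Cursor, the same rules file can ride along as the system prompt on every request. A minimal sketch, assuming the request shape of Anthropic's Messages API (a top-level `system` string alongside `model`, `max_tokens`, and `messages`); `withProjectRules` is a hypothetical helper, the model name is left as a parameter, and the sketch only builds the request body without sending it.

```typescript
// Request body shape for Anthropic's Messages API (POST /v1/messages).
// Only the fields used here are modeled; the API accepts more.
interface MessagesRequest {
  model: string
  max_tokens: number
  system: string
  messages: { role: 'user' | 'assistant'; content: string }[]
}

// Sends the project rules as the system prompt on every request, so each
// user message only needs to carry the task itself.
function withProjectRules(
  rules: string,
  userPrompt: string,
  model: string,
  maxTokens = 1024,
): MessagesRequest {
  return {
    model,
    max_tokens: maxTokens,
    system: rules.trim(),
    messages: [{ role: 'user', content: userPrompt }],
  }
}
```

In Node, `rules` could come from `readFileSync('.cursorrules', 'utf8')` (from `node:fs`), and the resulting object is what you would pass to the SDK's `messages.create` call. With OpenAI's Chat Completions API, the rules would instead go in a first message with `role: 'system'`.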
What to Cut When You're Over the Limit
Sometimes you genuinely need to include a lot and you're bumping against practical limits — either hard token limits or the 'too much noise' soft limit. When you need to trim, cut in this order:
- Comments that explain what the code does (the AI can read code; it doesn't need your comments)
- Import statements (mention the library names in your description instead)
- Unrelated functions in the same file (cut to just the function that matters)
- Boilerplate that follows obvious conventions (if your Next.js layout is the stock one, say so instead of pasting it)
- Old iterations of code you tried (mention what you tried in prose, don't paste it)
What you should almost never cut: type definitions, error messages (paste the exact error, always), and the specific function signature or component API that the broken code needs to match.
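The first two cuts are mechanical enough to automate. A deliberately naive TypeScript sketch: the hypothetical `trimForPrompt` drops single-line import statements and full-line `//` comments and leaves everything else, including types, untouched. It does not parse the code, so multi-line imports, `/* */` blocks, and trailing comments pass through unchanged, and it will also drop a comment you meant to keep (like a `// bug is here` marker), so skim the result. `estimateTokens` uses the common rule of thumb of roughly four characters per token; real counts depend on the model's tokenizer.

```typescript
// Matches single-line imports such as:
//   import { prisma } from '@/lib/prisma'
//   import './globals.css';
const SINGLE_LINE_IMPORT = /^import\s.*\sfrom\s+['"][^'"]+['"];?$|^import\s+['"][^'"]+['"];?$/

// Naive trim: drops single-line imports and full-line // comments.
function trimForPrompt(code: string): string {
  return code
    .split('\n')
    .filter((line) => {
      const t = line.trim()
      return !SINGLE_LINE_IMPORT.test(t) && !t.startsWith('//')
    })
    .join('\n')
}

// Rough heuristic only (about 4 characters per token); use the model's
// tokenizer if you need an exact count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}
```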
Context window management isn't a one-time skill you learn and apply forever unchanged. Models improve, tools change, and what works with Claude 3.5 might not be optimal for whatever comes out next month. The underlying principle stays constant though: you are the curator. The AI is powerful but it's not psychic, and feeding it the right information at the right time is still a human job — for now, at least.
