
Why You Hit Your Claude Limit So Fast

The context window, tokens, and how to stop torching your usage

Use This Right Now

Chat getting long? Don't just start over and lose everything. Paste this in before you hit the wall — Claude writes a handoff document capturing exactly where you've got to, plus a ready-made prompt to paste into a fresh chat so the next one picks up with all the context intact. The rest of this page explains why this works.

The handoff prompt
This chat is getting long. Before we lose the thread, write me a handoff document so a fresh chat can pick up exactly where we are. Save it as HANDOFF.md (or just output it if you can't write files), and cover:

- Goal: what we're ultimately trying to do
- Where we've got to: what's done, what works, what's been confirmed
- Current state: the key files, decisions, and anything important you're holding in context right now
- Open threads: what's in progress, what's undecided, what's blocked
- Next step: the very next thing to do
- Gotchas: anything that bit us, or that the next chat needs to know not to repeat

Keep it tight and factual. These are notes for whoever picks this up, not a report.

Then, at the very end, write me a short "kickoff prompt" I can paste into a brand-new chat. It should tell that chat to read HANDOFF.md first and continue from the next step, with everything above in mind.

Then open a new chat and paste the kickoff prompt it gives you. The fresh chat reads the handoff, picks up at the next step, and you're working with a near-empty context window again — fast, cheap, and sharp.

The Idea

You're burning through your Claude limit faster than you think you should — and it's probably not because you're using it too much. Here's the bit nobody tells you:

Every message you send, Claude re-reads the entire conversation to reply.

Claude has no memory between messages. So a new message in a long chat doesn't cost you just that one message: to answer, Claude reprocesses everything said so far, every single time. The longer the chat runs, the more each new reply costs. That conversation Claude holds in its head is the context window, and your usage limit is, in effect, a cap on how much of it gets processed.
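To make the re-reading concrete, here is a minimal sketch of how the cost stacks up. It assumes every turn adds a fixed number of tokens and ignores caching; the function name and the 500-token figure are made up for illustration.

```python
def tokens_processed_per_turn(turn_sizes):
    """Return how many tokens get re-read on each turn.

    turn_sizes: tokens added to the chat on each turn (message plus reply).
    Simplified model: each turn re-reads everything said so far.
    """
    processed = []
    running_total = 0
    for added in turn_sizes:
        running_total += added           # the chat keeps growing
        processed.append(running_total)  # and the whole thing is read again
    return processed


# Hypothetical chat: 10 turns, each adding 500 tokens.
per_turn = tokens_processed_per_turn([500] * 10)
print(per_turn[0])    # 500
print(per_turn[-1])   # 5000
print(sum(per_turn))  # 27500
```

In this toy model the chat only contains 5,000 tokens by the end, but 27,500 tokens were processed along the way, because the early turns were read again and again.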

First, What's A Token?

Claude doesn't read words — it reads tokens, the little chunks it breaks text into. A token is roughly four characters, or about three-quarters of a word. Everything is measured in these: your messages, Claude's replies, the files you paste, all of it. Tokens are the unit your limit is counted in.

~4 characters in a token
~¾ of a word per token
~750 words in 1,000 tokens
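If you want a quick estimate before pasting something in, the rule of thumb (roughly four characters per token) can be sketched in a few lines. This is only an approximation, not a real tokenizer, and both function names are invented for the example.

```python
import math

CHARS_PER_TOKEN = 4     # rule of thumb, not an exact tokenizer
WORDS_PER_TOKEN = 0.75  # about three-quarters of a word per token


def estimate_tokens(text: str) -> int:
    """Rough token count for a piece of text, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_words(tokens: int) -> int:
    """Rough number of words that fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


print(estimate_tokens("hello world!"))  # 3  (12 characters / 4)
print(estimate_words(1000))             # 750
```

Real counts vary with the language and the kind of text (code and non-English text often use more tokens), so treat the result as a ballpark.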

And The Context Window?

The context window is the most Claude can hold in its head at once: the running total of everything in the conversation, measured in tokens. It's big. A current model like Claude Opus 4.8 holds around 1,000,000 tokens, which is roughly 750,000 words, or several full-length novels. (The lightweight Haiku model holds 200,000.)

Big, but not free. Every message fills more of that window, and every reply has to read whatever's in it. A near-empty window is fast and cheap; a near-full one is slow and expensive. You're not paying for the size of the window. You're paying, on every turn, for how full you've let it get.
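A small sketch of that last point, using the window sizes quoted above. The 150,000-token chat is a made-up figure and `window_fill` is a hypothetical helper, not part of any Claude tooling.

```python
def window_fill(tokens_in_chat: int, window_size: int) -> float:
    """Fraction of the context window currently in use, capped at 1.0."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return min(tokens_in_chat / window_size, 1.0)


chat_tokens = 150_000  # hypothetical long chat

print(window_fill(chat_tokens, 200_000))    # 0.75
print(window_fill(chat_tokens, 1_000_000))  # 0.15
```

The bigger window looks much emptier, but in both cases the same 150,000 tokens are sitting in the chat and get read on the next reply. A larger window gives you more room, not a cheaper conversation.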

What Fills It Up

Everything in the conversation lives in the window — and it all gets re-read on every reply:

The system instructions

The standing rules Claude works under — and your CLAUDE.md, if you're in Claude Code. Loaded before you even type.

Every message you've sent

The whole back-and-forth, all of it, from the first message to the last.

Every reply Claude's given

Claude's own answers count too — and they're often the longest part.

Everything you've pasted in

Files, screenshots, code, error logs, documents. A single pasted file can take up as many tokens as hundreds or even thousands of short messages.

In Claude Code: the work itself

Every file it reads, every command it runs, every search result — all of it lands back in the window.
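To get a feel for how much a file or folder would add to the window, you can size it up before sharing it. The sketch below walks a folder and applies the rough four-characters-per-token estimate; the function name is hypothetical and the numbers it gives are ballpark only.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer


def estimate_folder_tokens(root):
    """Return (relative path, estimated tokens) pairs, largest first."""
    root = Path(root)
    sizes = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Decode leniently so a stray binary file doesn't stop the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        relative_name = path.relative_to(root).as_posix()
        sizes.append((relative_name, len(text) // CHARS_PER_TOKEN))
    return sorted(sizes, key=lambda item: item[1], reverse=True)


# Example use (left commented out so nothing is scanned by accident):
# for name, tokens in estimate_folder_tokens("my_project")[:10]:
#     print(f"{tokens:>8}  {name}")
```

The files at the top of that list are the ones that will fill the window fastest if they get pasted in or read.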

A Long Chat Costs You Twice

The obvious cost is your limit — a full window means more to re-read on every reply, so each message eats more of your quota than the last. (Claude does cache parts of the conversation to soften the blow, but a longer chat is still a more expensive one.)

The quieter cost is quality. As the window fills, Claude's attention spreads thin and the oldest instructions start getting crowded out — it drifts, forgets your rules, gets vaguer. Shorter chats don't just save your limit; they keep Claude sharper. (That's the same slide the Canary Trick is built to catch.)

How To Manage It

The fix is simple, and it's mostly one habit: don't let one chat run forever.

1. One chat per task

Don't let a single conversation run all day. Finish a task, start a fresh chat for the next one. The new chat starts near-empty — and cheap.

2. Clear or compact when it gets heavy

In Claude Code, /clear wipes the context for a clean slate, and /compact summarises the chat so far into a short brief — keeping the gist without the full weight.

3. Keep the important stuff in a file

Your plan, your project rules, your CLAUDE.md — put them in files. Then starting fresh costs you almost nothing, because the context you care about is one quick read away, not buried in a three-hour chat.

4. Paste only what's needed

Dropping a whole repo or a giant log in "just in case" fills the window fast. Give it the file that matters, not the folder.
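Here is a rough sketch of why these habits pay off, comparing one long chat with the same work split into fresh chats. It assumes each turn adds a fixed number of tokens, that each fresh chat starts by reading a short notes file, and that caching is ignored; all the numbers are invented for illustration.

```python
def total_tokens_processed(turns, tokens_per_turn, starting_context=0):
    """Total tokens re-read over a chat where every turn re-reads everything so far."""
    total = 0
    context = starting_context
    for _ in range(turns):
        context += tokens_per_turn  # this turn's message and reply join the chat
        total += context            # and the whole chat is read to answer
    return total


# Hypothetical workload: 40 turns of 500 tokens each, all in one chat.
one_long_chat = total_tokens_processed(40, 500)

# The same 40 turns as four 10-turn chats, each starting from a 1,000-token notes file.
four_fresh_chats = 4 * total_tokens_processed(10, 500, starting_context=1000)

print(one_long_chat)     # 410000
print(four_fresh_chats)  # 150000
```

Under these assumptions, splitting the work cuts the tokens processed by more than half, even after paying to re-read the notes file at the start of every chat.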

The one-line version

Claude re-reads the whole chat on every reply, so a long conversation gets slower, pricier, and dumber as it goes. Start fresh for each new task, keep what matters in a file, and you'll spend less of your limit and get sharper answers.