
Stop Hitting Claude's Rate Limits: My 3-Tool $25 Stack

Anthropic admitted the compute shortage. Switching all-in to DeepSeek doesn't work either. Here's the 3-tool AI stack that beats Claude's rate limits for about $25 a month.


Answer in 60 seconds: Anthropic admitted a compute shortage in early 2026. Claude Plus rate limits got tighter for everyone. Switching entirely to DeepSeek or another tool costs you quality on the 20 percent of work that actually needs Claude. The fix is a 3-tool stack: Opus 4.7 for the hard 20 percent, DeepSeek V4 Flash for the cheap 80 percent of drafting and rewriting, Gemini Flash-Lite for routing and extraction. Total cost is about $25 a month with no rate-limit walls, versus about $60 a month for stacking three “Plus” subscriptions that still hit them.

You hit 4pm. You’re mid-thought on a draft. Claude Plus throws up the wall: “You’ve reached your message limit. Try again at 6:42pm.”

You stare at it. Two and a half hours of nothing.

You pay $20 a month for this. The thing you bought it for is now unavailable for the rest of the workday. Anthropic’s status page says everything’s fine.

Yeah, no.

What’s actually happening

Anthropic admitted it in March 2026: they’re out of compute. Demand outpaced supply and they tightened the screws on rate limits across all paid tiers. Most Claude Plus subscribers dropped from effectively unlimited Sonnet usage to about 240 messages per 5 hours, and from comfortable Opus usage to about 45 messages per 5 hours.

If you’re a heavy Plus user, you hit those walls by lunch.

Anthropic’s fix is more compute. That takes time. Months. Maybe longer.

Your fix is not waiting.

Key Takeaway: Claude’s rate limits in 2026 aren’t a glitch. Anthropic publicly admitted a compute shortage. Plan around it.
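One way to plan around a cap like the 45-Opus-messages-per-5-hours figure above is to keep your own count. The sketch below is a minimal rolling-window tracker. The class name `MessageBudget` and the rolling-window behavior are my assumptions for illustration; Anthropic's actual accounting may work differently.

```python
from collections import deque


class MessageBudget:
    """Count messages inside a rolling time window.

    Assumption: the limit behaves like a rolling window. This is a
    personal planning aid, not a model of Anthropic's real accounting.
    """

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.sent = deque()  # timestamps of messages still inside the window

    def _expire(self, now: float) -> None:
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()

    def remaining(self, now: float) -> int:
        """Messages left in the current window at time `now` (seconds)."""
        self._expire(now)
        return self.limit - len(self.sent)

    def record(self, now: float) -> bool:
        """Record one message; return False if the budget is already spent."""
        if self.remaining(now) <= 0:
            return False
        self.sent.append(now)
        return True


# The Opus figure quoted above: 45 messages per 5 hours.
opus_budget = MessageBudget(limit=45, window_seconds=5 * 3600)
```

In practice you would pass `time.time()` as `now`. Timestamps are explicit arguments here so the logic is easy to check by hand.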

Quick truth, before I sell you anything

I’m on Claude Max, not Plus. About $200 a month, because I run Claude Cowork and code in long desktop sessions. For the load I push, the math works out. It’s expensive, and I don’t have many other choices given what I do.

So I’m not the one hitting your 4pm wall today. I won’t pretend I am.

But I was the regular user before. I remember that wall. So I worked out the routing below for you, because the timer beating you down all afternoon is a problem worth solving even when it’s not mine right now.

Cost numbers in the tables come from posted API prices, not from my own bills on this stack.

The wrong fix most people try first

Switch to DeepSeek. Or ChatGPT. Or Gemini. Pick one, jump ship for good.

A lot of people tried it that way. It works for about three days. Then you hit a task that needs Opus-level reasoning, watch DeepSeek get it visibly wrong, and you’re back at app.claude.com hitting refresh on the rate-limit timer.

Single-tool switching trades one set of weaknesses for a different set. You don’t end up with a better stack. You end up in a different cage.

The way out is using all three for the parts each one is best at, and dropping the idea that one tool covers everything.

The 3-tool stack

Three models. Each handles a specific category of thinking. The cost math falls out of the routing.

Opus 4.7 (Anthropic): the hard 20 percent

What you use it for: long-context reasoning, multi-step planning, code architecture, full-chapter writing, anything that needs the model to hold 50 things in mind and not lose its place. The work where DeepSeek and Gemini still trip up.

How you access it: Claude Plus subscription ($20/mo) or pay-per-use API ($5 input / $25 output per million tokens). The Plus plan covers the hard 20 percent of your workload comfortably if you stop using Opus for tasks the cheaper models can do.

Where it breaks: rate limits. The whole reason you’re reading this.

When to reach for it: the message has to be smart. Smart-and-fast doesn’t matter. Smart matters. If you find yourself thinking “the answer better not be wrong,” it’s an Opus message.

DeepSeek V4 Flash: the cheap 80 percent

What you use it for: drafts, rewrites, brainstorming, summaries, second-pass editing, repetitive AI work that doesn’t need top-shelf reasoning. The middle of any writing or research workflow.

How you access it: deepseek.com web app (free tier is generous) or API at $0.14 input / $0.28 output per million tokens. Roughly 35x cheaper than Opus on input, close to 90x cheaper on output. No rate-limit walls in normal use.
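The price ratios follow directly from the per-million-token prices quoted in this article. Here is a quick check in Python. The variable and function names are mine, and the prices are the ones stated here, not live values.

```python
# API prices in USD per million tokens, as quoted in this article.
OPUS = {"input": 5.00, "output": 25.00}
DEEPSEEK_FLASH = {"input": 0.14, "output": 0.28}


def price_ratio(expensive: dict, cheap: dict) -> dict:
    """Return how many times more the expensive model costs, per token type."""
    return {kind: expensive[kind] / cheap[kind] for kind in expensive}


ratios = price_ratio(OPUS, DEEPSEEK_FLASH)
for kind, ratio in ratios.items():
    print(f"{kind}: about {ratio:.1f}x cheaper on DeepSeek")
```

The input ratio lands between 35x and 36x and the output ratio between 89x and 90x, which is where the rounded figures above come from.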

Where it breaks: deep reasoning, novel-length context, anything legal or technical where you need to trust every sentence.

When to reach for it: most of the time. If you’re writing 5,000 words a week, 4,000 of those are first-draft DeepSeek work. The 1,000 that matter most go to Opus.

Gemini Flash-Lite: routing and extraction

What you use it for: parsing data, simple Q&A, classification, pulling structured info out of unstructured text, summary tasks where speed beats brilliance. The “I just need an answer” tier.

How you access it: Gemini app or API at $0.10 input / $0.40 output per million tokens. Cheapest of the three. Fastest of the three. Best at structured outputs (JSON, tables, lists).

Where it breaks: anything that needs personality or nuance. Gemini Flash-Lite is a tool, not a colleague.

When to reach for it: when you’d rather not burn an Opus message on something a calculator could do.

Key Takeaway: The whole stack works because each tool plays its position. Opus for the hard 20 percent, DeepSeek for the cheap 80 percent, Gemini for the parsing layer underneath. Stop using Opus for jobs Gemini handles for a tenth of a cent.

The model decision card

Use this for the next two weeks and watch your Claude usage drop sharply, without losing a single hard task.

| Task type | Use this | Why |
| --- | --- | --- |
| Drafting an article from outline | DeepSeek V4 Flash | Cheap, fast, “good enough” prose you’ll edit anyway |
| Editing your own writing for tone | DeepSeek V4 Flash | Same cost reasoning |
| Long-form reasoning, multi-step plans | Opus 4.7 | Only model that holds 50 things in mind |
| Writing a chapter of a book | Opus 4.7 | Voice consistency over 3,000+ words |
| Code architecture or refactor | Opus 4.7 | DeepSeek breaks at multi-file reasoning |
| Quick code snippet, single function | DeepSeek V4 Flash | Faster, cheaper, same accuracy on small jobs |
| Pulling data from a PDF or email | Gemini Flash-Lite | Best at structured extraction, cheapest |
| Yes/no classification questions | Gemini Flash-Lite | Speed matters more than nuance here |
| “What does this error mean” | DeepSeek V4 Flash | Free tier covers it |
| Anything where being wrong matters a lot | Opus 4.7 | The whole point of paying for Opus |
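If you script any of your workflow, the decision card reduces to a lookup table plus one override. The sketch below is a minimal version. The task-type keys, the `pick_model` function, and the fallback for unlisted tasks are all my own hypothetical choices, not part of any vendor's API.

```python
# Hypothetical task categories; the mapping mirrors the decision card above.
ROUTES = {
    "draft": "DeepSeek V4 Flash",
    "tone_edit": "DeepSeek V4 Flash",
    "long_reasoning": "Opus 4.7",
    "book_chapter": "Opus 4.7",
    "code_architecture": "Opus 4.7",
    "code_snippet": "DeepSeek V4 Flash",
    "extraction": "Gemini Flash-Lite",
    "classification": "Gemini Flash-Lite",
    "error_explanation": "DeepSeek V4 Flash",
}


def pick_model(task_type: str, high_stakes: bool = False) -> str:
    """Return the model name for a task.

    High-stakes work always goes to Opus, matching the card's last row.
    """
    if high_stakes:
        return "Opus 4.7"
    # Assumption: unlisted task types fall back to the cheap drafting model.
    return ROUTES.get(task_type, "DeepSeek V4 Flash")


print(pick_model("extraction"))               # Gemini Flash-Lite
print(pick_model("draft", high_stakes=True))  # Opus 4.7
```

The `high_stakes` flag is checked before the table lookup on purpose: the card's last row overrides every other row.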

The cost math (real numbers)

The stacked-subscription setup most heavy users build by accident:

| Tool | Monthly cost | What you use it for |
| --- | --- | --- |
| Claude Plus | $20 | Everything, hits limits by 4pm |
| ChatGPT Plus | $20 | Backup for when Claude’s down |
| Gemini Advanced | $20 | “Just in case” |
| Total | $60/mo | And you still hit walls |

The 3-tool stack done right (API + free tiers):

| Tool | Monthly cost | What you use it for |
| --- | --- | --- |
| Claude Plus | $20 | The hard 20 percent only |
| DeepSeek V4 Flash | $3-5 (API) or $0 (free tier) | The cheap 80 percent |
| Gemini Flash-Lite | $1-2 (API) or $0 (free tier) | Extraction, classification |
| Total | ~$25/mo | No rate-limit walls |

If you’re heavy on API, you might pay more. If you stick to free tiers, you might pay less. The shape stays the same: Claude does less work but the right work, the cheap models do more work for ~10 percent of the cost.
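To see where your own bill would land, you can plug token volumes into the posted prices. The sketch below does that. The monthly volumes are made-up placeholders, not measurements, and the function name `api_cost` is mine.

```python
# USD per million tokens, as quoted in this article.
PRICES = {
    "DeepSeek V4 Flash": {"input": 0.14, "output": 0.28},
    "Gemini Flash-Lite": {"input": 0.10, "output": 0.40},
}
CLAUDE_SUBSCRIPTION = 20.00  # flat monthly fee for Claude Plus


def api_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a volume given in millions of tokens."""
    price = PRICES[model]
    return input_mtok * price["input"] + output_mtok * price["output"]


# Hypothetical monthly volumes (millions of tokens); replace with your own.
deepseek = api_cost("DeepSeek V4 Flash", input_mtok=10, output_mtok=10)
gemini = api_cost("Gemini Flash-Lite", input_mtok=5, output_mtok=2)
total = CLAUDE_SUBSCRIPTION + deepseek + gemini

print(f"DeepSeek ${deepseek:.2f}, Gemini ${gemini:.2f}, total ${total:.2f}")
```

With these placeholder volumes the total comes to $25.50. Heavier or lighter usage moves the two API lines but leaves the $20 subscription fixed.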

Key Takeaway: The savings aren’t from switching tools. The savings come from sending each task to the cheapest model that can still do it well. The bill drops because you stop paying $20 of compute for tasks that cost three cents elsewhere.

What the day looks like on this stack

This is the routine I’d map onto a typical Plus-user writing day. Paste it into your week if it fits.

The outline of a piece goes to Opus. Outlining is where being wrong matters most. Five minutes of Opus time.

The section drafts go to DeepSeek. Drafting needs volume more than brilliance. Thirty minutes of DeepSeek work generates maybe 3,000 words of rough prose you’ll edit down.

A second DeepSeek pass for tone catches obvious clunkers. Free.

Gemini Flash-Lite pulls quotes from a PDF you’re citing. Two cents.

The hard parts (the headline, the closer, the paragraph that needs a real receipt) go back to Opus. Five more minutes.

Opus time for the whole article comes to about 10 minutes (five for the outline, five for the hard parts), well under the rate-limit wall. The rest of the work happens in tools that don’t have walls.

The catch

This stack has three real problems.

Context-switching tax. You’re now juggling three browser tabs instead of one. After two weeks the muscle memory clicks in, but the first week is annoying. If you live entirely inside one tool’s chat history, this is a real switch.

Quality drift if you route wrong. If you give DeepSeek a job that actually needed Opus, you’ll waste 30 minutes editing slop the better model wouldn’t have written. The decision card above helps. The reflex builds with practice.

You still need Claude Plus. Don’t cancel it. The whole goal here is to stop using Claude for jobs it shouldn’t be doing. Cancel Plus and you’ll need API credits for the hard 20 percent, and that math gets uglier fast for heavy users.
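A rough way to see when API-only Opus stops being cheaper than the subscription is to compute the break-even volume at the posted prices. This is a sketch under one loud assumption: the input-to-output token mix, which I've set to a hypothetical 3:1. The function name is mine.

```python
OPUS_INPUT = 5.00     # USD per million input tokens (from this article)
OPUS_OUTPUT = 25.00   # USD per million output tokens (from this article)
SUBSCRIPTION = 20.00  # Claude Plus monthly fee


def breakeven_output_mtok(input_per_output: float) -> float:
    """Millions of output tokens per month at which API spend equals the subscription.

    input_per_output is the number of input tokens sent per output token received.
    """
    cost_per_output_mtok = OPUS_OUTPUT + input_per_output * OPUS_INPUT
    return SUBSCRIPTION / cost_per_output_mtok


# Assumed mix: 3 input tokens for every output token (hypothetical).
out_mtok = breakeven_output_mtok(3.0)
print(f"Break-even: {out_mtok:.2f}M output tokens plus {3 * out_mtok:.2f}M input tokens")
```

Under that assumed mix, $20 of API credit covers about half a million Opus output tokens a month. Long pasted documents push the input side up and the break-even point down.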

Why most “stop hitting Claude limits” advice misses the point

Most posts on this topic tell you the same five things: turn off Projects, shorten your prompts, switch to Sonnet, upgrade to Max, log out and back in. Surface fixes that buy you maybe 10 percent more headroom.

Rate limits are a compute issue at Anthropic’s end, not a usage-pattern issue at yours. Tightening your prompts won’t fix a GPU shortage. The way out of the wall is to send 80 percent of your traffic somewhere the wall doesn’t exist.

Single-tool optimization tries to make Claude carry the whole workload more efficiently. The 3-tool stack moves the workload off Claude for everything that doesn’t need it.

For the frame on why no single AI deserves your full trust, see “the smartest AI model lies the most.” The broader case for stack-thinking lives in “why is AI bad,” and the Claude-specific prompting that makes Opus pay off when you do reach for it sits in “how to use Claude Opus 4.7.”

FAQ

Why does Claude have rate limits in 2026?

Anthropic ran into a compute shortage in early 2026 and admitted publicly that they cannot meet demand. Claude Plus dropped from effectively unlimited to roughly 45 messages per 5 hours on Opus and 240 on Sonnet for most subscribers. The fix from Anthropic’s end is more compute, which takes time. The fix from your end is not waiting.

Should I just switch from Claude to DeepSeek?

Switching one tool for another doesn’t solve the problem. DeepSeek V4 is cheaper but weaker at deep reasoning. If you switch entirely, you save money but lose quality on the 20 percent of tasks that actually need Claude. The right move is using both for what each one is good at.

How much does a 3-tool stack cost compared to Claude Plus and ChatGPT Plus?

About $25 a month total if you route tasks correctly, versus around $60 a month if you stack Claude Plus, ChatGPT Plus, and Gemini Advanced at $20 each, and around $300+ a month if you climb the Max and Pro tiers chasing higher limits. The math depends on your usage, but the routing is what saves the money.

Which model do I use for what?

Opus 4.7 for the hard 20 percent (long-context reasoning, multi-step planning, code architecture, novel-length writing). DeepSeek V4 Flash for the cheap 80 percent (drafts, rewrites, brainstorming, summaries, repetitive AI work). Gemini Flash-Lite for routing and extraction (parsing data, simple Q&A, classification, structured outputs).

Does this work for non-coders or non-developers?

Yes. The same routing logic works for writers, marketers, researchers, and small-business owners. The 3-tool stack is about which tool handles which type of thinking, not about API integrations. You can switch between the web apps manually.

The workflow as a checklist

Save this. Try it for two weeks.

  1. Keep Claude Plus open in tab one. Use it only for the hard 20 percent of tasks.
  2. Open DeepSeek in tab two. Send drafts, rewrites, brainstorms, summaries here.
  3. Open Gemini Flash-Lite in tab three. Use it for parsing data, classification, structured outputs.
  4. Use the decision card above as your routing reflex. Two weeks builds the muscle memory.
  5. Watch your Claude rate-limit hits drop to near zero.

Box score:

  • Total cost: about $25/month with API, less with free tiers
  • Total tools open: 3 tabs
  • Time you spend hitting rate-limit walls: zero

Run the stack for two weeks. The 4pm wall stops being part of your day.


Want the full task-routing matrix as a printable card you can keep next to your monitor? It goes out in the Stash newsletter when I refresh it. Sign up at theaistash.com/newsletter. I send something new whenever I find it worth your inbox.