Free Tiers · Side-by-Side Comparison

How to Choose a Free Tier

Most major providers offer a free tier, but the rate limits, concurrency, expiry, and data-usage policies differ widely and are often buried deep in the docs. This page lays the key terms of each provider's free tier side by side, so you can pick what fits your needs and sidestep the common limits.

Updated 2026-06-18 · verified periodically against official docs; for later changes, check each provider's site
Free Tiers, Side by Side
Everything here is "free," but how usable it actually is varies a lot. The columns worth watching most are "Rate limit" and "Peak performance."
| Platform | What's free | Rate limit (nominal) | Peak performance | Card required | Used for training | Main limitations |
| --- | --- | --- | --- | --- | --- | --- |
| Groq (LPU ultra-fast inference) | Open models: Llama / Qwen / GPT-OSS | 30 req/min; 14,400 req/day | Stable speed; runs on dedicated hardware | No | No | 5 concurrent requests; limits counted per model, and some models have lower daily caps |
| Google Gemini (AI Studio) | Gemini Flash / Flash-Lite (Pro is trial-tier) | 15 req/min; 1,500 req/day | Throttled at peak; shared compute, no SLA | No | Yes (free tier) | Free-tier requests may be used for model training; enabling billing cancels the free tier for that project and every call is then charged |
| OpenRouter (aggregates 400+ models) | Models with the :free suffix (25+ across 4 providers) | 20 req/min; 50 req/day (rises to 1,000 after a $10 top-up) | Throttled / queued at peak; free models get lower priority | No | Depends on the model | Failed requests still count toward your quota, so it burns down fast while debugging; un-topped-up accounts are capped at 50/day |
| Together AI (open-model platform) | $1 free credit on signup + select free endpoints (Llama / Qwen / DeepSeek / Mixtral) | Varies by model; credit-based, then pay-as-you-go | Production-grade stability; dedicated serving | Optional | No | The $1 signup credit is one-time; once spent you pay standard rates, though select endpoints remain free |
| Cloudflare Workers AI (edge inference) | Generous daily allocation (10,000 Neurons/day); Llama / Mistral / open models | 10,000 Neurons/day; daily Neuron-based quota | Low latency; global edge network | No | No | Neurons reset daily; heavier models consume them faster, so a big model can exhaust the allocation quickly |
| DeepSeek (first-party) | New-user signup credit (DeepSeek V4 Flash / Pro) | One-time credit; V4 Flash $0.14/M after | Stable; official provider | No | No | Signup credit is one-time, then billed at standard rates; V4 Flash is among the cheapest paid rates globally, so the credit stretches a long way |
| Cohere (first-party) | Free trial API key (Command R / R+) | 1,000 calls/month; rate-limited trial key | Good for prototyping; official endpoint | No | Yes (trial) | Trial data may be used to improve models; trial keys are rate-limited and not intended for production |
| Mistral AI (La Plateforme) | Free experimentation tier (Mistral open models) | Limited rate; evaluation-oriented | EU-hosted; official endpoint | May be required | Check data policy | The free tier is intended for evaluation; confirm the current data-usage policy before sending sensitive content |
| Hugging Face (Inference API) | Free serverless inference for many open models | Shared pool; rate-limited, best-effort | Variable latency at peak; shared free pool, no SLA | No | No (public models) | Free tier is best-effort with no SLA; cold starts and queueing are common under load |
| GitHub Models (via GitHub account) | Free access to GPT / Llama / Phi for prototyping | Low per-model limits; dev/testing only | Strict rate limits; throttled under load | No | No | Intended for development and testing only, not production; rate limits are strict and enforced per model |

Rate limits are the nominal values from each provider's official docs; real-world experience depends on time of day, region, and account status. The "Peak performance" column is a qualitative read of public documentation and user reports, not live probe data. Policies change often (Gemini, for example, cut its free quota 50–80% in late 2025), so the provider's current docs are always the source of truth.
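
To see how the two numbers in the "Rate limit" column interact, it can help to divide the daily cap by the per-minute limit: the result is how long you could send at the maximum nominal rate before the daily cap stops you. The sketch below uses only figures from the table; the `NOMINAL_LIMITS` name and the helper are illustrative, and it assumes OpenRouter's per-minute limit stays at 20 after a top-up, which the table does not state.

```python
# Nominal limits transcribed from the table: (requests per minute, requests per day).
# Assumption: the OpenRouter per-minute limit is unchanged after the $10 top-up.
NOMINAL_LIMITS = {
    "Groq": (30, 14_400),
    "Google Gemini": (15, 1_500),
    "OpenRouter (no top-up)": (20, 50),
    "OpenRouter ($10 top-up)": (20, 1_000),
}


def minutes_to_exhaust(per_minute: int, per_day: int) -> float:
    """Minutes of sustained max-rate traffic before the daily cap is reached."""
    return per_day / per_minute


for name, (rpm, rpd) in NOMINAL_LIMITS.items():
    minutes = minutes_to_exhaust(rpm, rpd)
    print(f"{name}: daily cap reached after {minutes:.1f} min at {rpm} req/min")
```

On these nominal figures, Groq's daily cap lasts 480 minutes (8 hours) of full-rate traffic, Gemini's lasts 100 minutes, and an un-topped-up OpenRouter account runs out in 2.5 minutes.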

A Few Things Worth Knowing Before You Pick a Free Tier
Quota size is only the surface metric. The factors below have a more direct impact on what it's actually like to use.
Rate limits often matter more than quota
What usually caps your throughput isn't the total quota but the per-minute request limit. Batch jobs, shared keys, or bursty traffic hit the limiter fast. The table lists the rate limit ahead of the qualitative columns so it's easy to compare first.
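
One way to stay under a per-minute limit is to throttle on the client side before the provider does it for you. Below is a minimal sliding-window sketch; the `SlidingWindowLimiter` class is hypothetical, not part of any provider's SDK, and it only guards a single process (keys shared across machines would need a shared counter).

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard allowing at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def acquire(self, sleep=time.sleep):
        """Block until a slot is free, then record the request."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        self._sent.append(self.clock())


# Example: a 30 req/min nominal limit; the first calls return immediately.
limiter = SlidingWindowLimiter(max_requests=30)
for _ in range(3):
    limiter.acquire()  # call this right before each API request
print(limiter.wait_time())
```

Setting `max_requests` a little below the nominal figure leaves headroom for clock drift and for other clients sharing the same key.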
Peak hours can slow things down
Free tiers typically run on shared compute, with no SLA and lower priority. The same model can vary in speed and reliability across the day. Free does not mean stable.
Whether a card is required
Some platforms (Groq, Gemini) let you use the free tier without a card on file. On others, enabling billing ends the free tier; with Gemini, for example, turning on billing cancels the free tier for that project. Adding a card means potential charges, so confirm the terms first.
Mind the data-usage policy
Some free tiers use your request content for model training (Gemini's free tier, for one). For sensitive information or customer data, it's best to avoid those providers.
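
If you route requests programmatically, the "Used for training" column can be turned into a simple allow-list. The sketch below transcribes that column from the table and keeps only providers marked with an unqualified "No"; the function name is illustrative, and the data is a snapshot of this page, not a live check of any provider's policy.

```python
# "Used for training" column from the table, transcribed as data.
USED_FOR_TRAINING = {
    "Groq": "No",
    "Google Gemini": "Yes (free tier)",
    "OpenRouter": "Depends on the model",
    "Together AI": "No",
    "Cloudflare Workers AI": "No",
    "DeepSeek": "No",
    "Cohere": "Yes (trial)",
    "Mistral AI": "Check data policy",
    "Hugging Face": "No (public models)",
    "GitHub Models": "No",
}


def providers_for_sensitive_data(policies):
    """Keep only providers whose entry is exactly "No".

    Qualified or uncertain entries ("Depends on the model",
    "No (public models)", "Check data policy") are excluded on purpose.
    """
    return sorted(name for name, policy in policies.items() if policy == "No")


print(providers_for_sensitive_data(USED_FOR_TRAINING))
```

The strict equality check is a deliberately conservative choice: anything that needs a closer read of the provider's policy stays off the list until someone has read it.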
Credits usually expire
Signup credits typically come with an expiry date and reset to zero when they lapse; they aren't a long-term free supply. New-user promos are often time-limited too, reverting to standard pricing afterward.
Failed requests can count against quota
On some platforms (OpenRouter, for example) failed requests still count toward the daily quota. While debugging or retrying, your quota can drain faster than you'd expect.
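
When failures count against quota, it helps to track attempts rather than successes on your side too. The following is a minimal sketch of a local budget that charges every attempt, including retries; `DailyAttemptBudget` and `send` are hypothetical names, the daily reset is left out, and a real client would catch narrower exceptions than `Exception`.

```python
class DailyAttemptBudget:
    """Counts every attempt, successful or not, against a local daily allowance."""

    def __init__(self, daily_limit):
        self.daily_limit = daily_limit
        self.used = 0

    def remaining(self):
        return self.daily_limit - self.used

    def call(self, send, max_attempts=3):
        """Run `send()` with retries; stop early once the budget is spent."""
        last_error = None
        for _ in range(max_attempts):
            if self.remaining() <= 0:
                raise RuntimeError("local daily budget exhausted") from last_error
            self.used += 1  # charged before the outcome is known: failures count too
            try:
                return send()
            except Exception as error:  # assumption: narrow this in real code
                last_error = error
        raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Example: a request that fails twice, then succeeds.
outcomes = iter([ValueError("boom"), ValueError("boom"), "ok"])


def flaky_send():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


budget = DailyAttemptBudget(daily_limit=50)  # the un-topped-up OpenRouter cap
result = budget.call(flaky_send)
print(result, budget.remaining())
```

In this run one successful answer costs three of the 50 daily requests, which is the effect described above.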
