Free Tiers · Side-by-Side Comparison

How to Choose a Free Tier

Most major providers offer a free tier, but rate limits, concurrency, expiry, and data-usage policies differ widely and are often buried deep in the docs. This page lays out the key terms of each provider's free tier side by side, so you can pick the one that fits your needs and avoid the most common limits.

Updated 2026-06-18 · Verified periodically against official docs; for changes since then, check each provider's site

Free Tiers, Side by Side

Everything here is "free," but how usable it actually is varies a lot. The columns worth watching most are "Rate limit" and "Peak performance."

Platform	What's free	Rate limit (nominal)	Peak performance	Card required	Used for training	Main limitations
Groq (LPU ultra-fast inference)	Open models: Llama / Qwen / GPT-OSS	30 req/min; 14,400 req/day	Consistently fast and stable on dedicated hardware	No	No	5 concurrent requests; limits are counted per model, and some models have lower daily caps
Google Gemini (AI Studio)	Gemini Flash / Flash-Lite (Pro is trial-tier)	15 req/min; 1,500 req/day	Throttled at peak (shared compute, no SLA)	No	Yes (free tier)	Free-tier requests may be used for model training; enabling billing ends the free tier for that project, and every call is then charged
OpenRouter (aggregates 400+ models)	Models with the `:free` suffix (25+ across 4 providers)	20 req/min; 50 req/day (1,000 after a $10 top-up)	Throttled or queued at peak (free models get lower priority)	No	Depends on the model	Failed requests still count toward your quota, so it drains fast while debugging; accounts without a top-up are capped at 50/day
Together AI (open-model platform)	$1 free credit on signup + select free endpoints (Llama / Qwen / DeepSeek / Mixtral)	Varies by model (credit-based, then pay-as-you-go)	Production-grade stability (dedicated serving)	Optional	No	The $1 signup credit is one-time; once it is spent you pay standard rates, though select endpoints remain free
Cloudflare Workers AI (edge inference)	Daily allocation of 10,000 Neurons; Llama / Mistral / other open models	10,000 Neurons/day (Neuron-based quota)	Low latency (global edge network)	No	No	Neurons reset daily; heavier models consume them faster, so a large model can exhaust the allocation quickly
DeepSeek (first-party)	New-user signup credit (DeepSeek V4 Flash / Pro)	One-time credit; V4 Flash at $0.14/M tokens afterward	Stable (official provider)	No	No	The signup credit is one-time, then billed at standard rates; V4 Flash has among the cheapest paid rates anywhere, so the credit stretches a long way
Cohere (first-party)	Free trial API key (Command R / R+)	1,000 calls/month (rate-limited trial key)	Good for prototyping (official endpoint)	No	Yes (trial)	Trial data may be used to improve models; trial keys are rate-limited and not intended for production
Mistral AI (La Plateforme)	Free experimentation tier (Mistral open models)	Limited rate (evaluation-oriented)	EU-hosted (official endpoint)	May be required	Check data policy	The free tier is intended for evaluation; confirm the current data-usage policy before sending sensitive content
Hugging Face (Inference API)	Free serverless inference for many open models	Shared pool (rate-limited, best-effort)	Variable latency at peak (shared free pool, no SLA)	No	No (public models)	Best-effort with no SLA; cold starts and queueing are common under load
GitHub Models (via GitHub account)	Free access to GPT / Llama / Phi for prototyping	Low per-model limits (dev/testing only)	Strict rate limits (throttled under load)	No	No	Intended for development and testing only, not production; rate limits are strict and enforced per model

Rate limits are the nominal values from each provider's official docs; real-world experience depends on time of day, region, and account status. The "Peak performance" column is a qualitative read of public documentation and user reports, not live probe data. Policies change often (Gemini, for example, cut its free quota 50–80% in late 2025), so the provider's current docs are always the source of truth.
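To see how the nominal per-minute and per-day figures interact, here is a rough Python sketch that converts a few of them into time. It uses only the nominal values from the table, ignores concurrency and token limits, and assumes every request succeeds; the helper names are illustrative, not from any provider SDK.

```python
import math

def minutes_to_daily_cap(rpm: int, rpd: int) -> float:
    """Minutes of back-to-back requests at the per-minute limit before the daily cap is reached."""
    return rpd / rpm

def batch_days(n_requests: int, rpd: int) -> int:
    """Days a batch job needs if the daily cap is the binding constraint."""
    return math.ceil(n_requests / rpd)

# Nominal free-tier limits (req/min, req/day) copied from the table; these change often.
NOMINAL_LIMITS = {
    "Groq": (30, 14_400),
    "Google Gemini": (15, 1_500),
    "OpenRouter (no top-up)": (20, 50),
    "OpenRouter ($10 top-up)": (20, 1_000),
}

for name, (rpm, rpd) in NOMINAL_LIMITS.items():
    print(f"{name}: daily cap after {minutes_to_daily_cap(rpm, rpd):g} min; "
          f"10,000 requests need {batch_days(10_000, rpd)} day(s)")
```

At these nominal values, reaching Groq's daily cap takes 480 minutes (8 hours) of back-to-back requests, Gemini's takes 100 minutes, and an OpenRouter account without a top-up hits its cap in 2.5 minutes.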

A Few Things Worth Knowing Before You Pick a Free Tier

Quota size is only the surface metric. The factors below have a more direct impact on what it's actually like to use.

Rate limits often matter more than quota

What usually caps your throughput isn't the total quota but the per-minute request limit. Batch jobs, shared keys, and bursty traffic hit the limiter fast, which is why the rate-limit column sits near the front of the table for easy comparison.
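A client-side limiter keeps a batch job or bursty workload under the per-minute cap instead of tripping the provider's limiter. The Python sketch below assumes the provider counts requests over a rolling 60-second window; providers may use fixed windows or token buckets instead, so setting `max_requests` a little below the nominal limit is a safer margin. `SlidingWindowLimiter` is a hypothetical helper, and it is per-process: a key shared across processes or machines would need shared state.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most `max_requests` per rolling `window_s` seconds."""

    def __init__(self, max_requests: int, window_s: float = 60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock
        self.sent = deque()  # send timestamps still inside the window

    def wait_time(self) -> float:
        """Seconds until the next request may be sent (0.0 if a slot is free now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()  # drop timestamps that have left the window
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window_s - (now - self.sent[0])

    def try_acquire(self) -> bool:
        """Record a request and return True if a slot is free; otherwise return False."""
        if self.wait_time() > 0:
            return False
        self.sent.append(self.clock())
        return True

    def acquire(self) -> None:
        """Block until a slot is free, then record the request (not thread-safe)."""
        while not self.try_acquire():
            time.sleep(self.wait_time())
```

For example, `SlidingWindowLimiter(28)` leaves a small margin under Groq's nominal 30 req/min; call `acquire()` before each request.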

Peak hours can slow things down

Free tiers typically run on shared compute, with no SLA and lower priority. The same model can vary in speed and reliability across the day. Free does not mean stable.
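When a shared free pool is busy, requests often fail with throttling errors (commonly HTTP 429 or 503) rather than simply slowing down. A common mitigation is capped exponential backoff with full jitter, sketched below in Python. `RateLimitedError` is a hypothetical exception standing in for however your client surfaces those status codes, and the retry count and delays are illustrative defaults, not provider recommendations.

```python
import random
import time

class RateLimitedError(Exception):
    """Hypothetical error raised by your client on a throttled (e.g. 429/503) response."""

def call_with_backoff(request, max_retries=4, base_s=1.0, cap_s=30.0, sleep=time.sleep):
    """Call `request()`, retrying throttled failures with capped exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitedError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Full jitter: wait a random time between 0 and the capped exponential delay.
            delay = random.uniform(0, min(cap_s, base_s * 2 ** attempt))
            sleep(delay)
```

If the provider sends a `Retry-After` header, honoring it is usually better than a computed delay. On tiers where failed requests count toward the quota, keep `max_retries` low.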

Whether a card is required

Some platforms (Groq, Gemini) let you use the free tier without a card on file. Enabling billing can also end the free tier outright: on Gemini it does so for that project, and every call is then charged. Since a card on file means potential charges, confirm the terms first.

Mind the data-usage policy

Some free tiers (Gemini's free tier and Cohere's trial, for example) may use your request content for model training. For sensitive information or customer data, avoid those tiers.
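One way to make these checks mechanical is to encode the table's "Card required" and "Used for training" columns and filter on them, as in the Python sketch below. The data is a snapshot of this page and will go stale; `FreeTier` and `shortlist` are illustrative names. Entries marked "May be required", "Depends on the model", or "Check data policy" are stored as `None` and treated as not meeting a requirement, so the filter fails closed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FreeTier:
    name: str
    card_required: Optional[bool]   # None = "May be required"
    trains_on_data: Optional[bool]  # None = "Depends on the model" / "Check data policy"

# Snapshot of the table's "Card required" and "Used for training" columns.
TIERS = [
    FreeTier("Groq", False, False),
    FreeTier("Google Gemini", False, True),
    FreeTier("OpenRouter", False, None),
    FreeTier("Together AI", False, False),           # card is optional, not required
    FreeTier("Cloudflare Workers AI", False, False),
    FreeTier("DeepSeek", False, False),
    FreeTier("Cohere", False, True),                 # trial data may be used
    FreeTier("Mistral AI", None, None),
    FreeTier("Hugging Face", False, False),          # "No" applies to public models
    FreeTier("GitHub Models", False, False),
]

def shortlist(tiers, no_card: bool = True, no_training: bool = True) -> list:
    """Names of tiers known (not just possibly) to meet each requested condition."""
    picked = []
    for t in tiers:
        if no_card and t.card_required is not False:
            continue  # required or unknown: fail closed
        if no_training and t.trains_on_data is not False:
            continue  # trains or unknown: fail closed
        picked.append(t.name)
    return picked

print(shortlist(TIERS))
```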

Credits usually expire

Signup credits typically come with an expiry date and reset to zero when they lapse; they aren't a long-term free supply. New-user promos are often time-limited too, reverting to standard pricing afterward.

Failed requests can count against quota

On some platforms (OpenRouter, for example) failed requests still count toward the daily quota. While debugging or retrying, your quota can drain faster than you'd expect.
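If failed attempts count, it helps to track every attempt locally and stop retrying before the quota runs dry. The Python sketch below assumes the provider counts each attempt (as described above for OpenRouter) and resets at UTC midnight; both are assumptions to check against the provider's docs, and `DailyQuota` and `utc_today` are hypothetical helpers.

```python
from datetime import datetime, timezone

def utc_today():
    """Current UTC date; assumes the provider resets its daily quota at UTC midnight."""
    return datetime.now(timezone.utc).date()

class DailyQuota:
    """Hypothetical tracker that counts every attempt, successful or failed, against a daily cap."""

    def __init__(self, daily_cap: int, reserve: int = 0, today=utc_today):
        self.daily_cap = daily_cap
        self.reserve = reserve  # attempts held back for new work rather than retries
        self.today = today
        self.day = today()
        self.used = 0

    def _roll_over(self) -> None:
        current = self.today()
        if current != self.day:  # new quota day: start counting from zero
            self.day = current
            self.used = 0

    def remaining(self) -> int:
        self._roll_over()
        return max(self.daily_cap - self.used, 0)

    def record_attempt(self) -> None:
        """Call once per request sent, whether or not it succeeds."""
        self._roll_over()
        self.used += 1

    def may_retry(self) -> bool:
        """Allow a retry only if it would not dip into the reserved budget."""
        return self.remaining() > self.reserve

quota = DailyQuota(50, reserve=10)  # e.g. OpenRouter's nominal 50/day without a top-up
```

Checking `may_retry()` before each retry keeps a debugging loop from spending the last of the day's quota.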

Free-tier policies change. We keep tracking them.

Rate limits, quotas, expiry windows, and data-usage policies shift often. Subscribe and we'll round up the changes and send them over, so you don't have to recheck the docs yourself.
