Cerebras Inference - Free 1M Tokens/Day
Source: https://inference.cerebras.ai/
Description
Cerebras offers 1 million free tokens per day through its Inference API, running on proprietary Wafer-Scale Engine (WSE-3) hardware. No credit card required, no waitlist (fully open since June 2025). The API is OpenAI-compatible — swap the base URL to https://api.cerebras.ai/v1 and use your existing OpenAI SDK code. Speeds range from ~1,000 to ~3,000 tokens/second depending on the model, making it one of the fastest inference providers available. Supported models include Llama, Qwen, GPT-OSS, and Z.ai GLM families.

Getting Started

1. Go to cloud.cerebras.ai
2. Create an account (email signup)
3. Verify your email address
4. Navigate to "API Keys" in the left sidebar
5. Click "Create API Key", give it a name, and copy the key immediately
6. Set it as an environment variable: export CEREBRAS_API_KEY="your-key-here"
7. Done — you can start making API calls right away
Important:
• No credit card is required for the free tier
• No waitlist or approval process — instant access
• The free tier resets daily (1M tokens per day, not cumulative)
• Unused tokens do not roll over
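
As a rough sense of scale, the daily allowance can be translated into a request budget. A quick sketch (the 2,000-token average request size is an illustrative assumption, not a Cerebras figure):

```python
# Back-of-envelope sizing of the 1M-token/day free allowance.
DAILY_TOKEN_CAP = 1_000_000

# Assumed average request size (prompt + completion) -- illustrative only.
AVG_TOKENS_PER_REQUEST = 2_000

requests_per_day = DAILY_TOKEN_CAP // AVG_TOKENS_PER_REQUEST
print(requests_per_day)  # 500 requests/day at this average size
```

At smaller average requests the cap stretches proportionally further; at long-context workloads it shrinks just as fast.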

Production Models

Model | Model ID | Parameters | Speed | Context (Free) | Status
Llama 3.1 8B | llama3.1-8b | 8B | ~2,200 tok/s | 8,192 tokens | Active
Llama 3.3 70B | llama-3.3-70b | 70B | ~2,100 tok/s | 8,192 tokens | Deprecating Feb 16, 2026
OpenAI GPT-OSS 120B | gpt-oss-120b | 120B | ~3,000 tok/s | 8,192 tokens | Active
Qwen 3 32B | qwen-3-32b | 32B | ~2,600 tok/s | 8,192 tokens | Deprecating Feb 16, 2026
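
The quoted throughputs translate directly into generation latency. A small sketch using the approximate figures from the table (decode time only; real latency also includes network round-trip and prompt processing):

```python
# Approximate time to decode a 500-token completion at each model's
# quoted throughput (tokens per second, from the table above).
SPEEDS_TOK_PER_S = {
    "llama3.1-8b": 2_200,
    "llama-3.3-70b": 2_100,
    "gpt-oss-120b": 3_000,
    "qwen-3-32b": 2_600,
}

COMPLETION_TOKENS = 500
for model, tps in SPEEDS_TOK_PER_S.items():
    print(f"{model}: {COMPLETION_TOKENS / tps:.2f}s")
```

Even the slowest production model finishes a 500-token answer in roughly a quarter of a second of decode time.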

Preview Models

Model | Model ID | Parameters | Speed
Qwen 3 235B Instruct | qwen-3-235b-a22b-instruct-2507 | 235B (22B active MoE) | ~1,400 tok/s
Z.ai GLM 4.7 | zai-glm-4.7 | 355B | ~1,000 tok/s
Note: Preview models have lower rate limits (especially zai-glm-4.7 at 10 RPM, 100 RPH, 100 RPD) and are not recommended for production use.

Rate Limits

Limit | Free Tier | Developer Tier (paid)
Tokens per Minute | 60,000 | 1,000,000
Tokens per Hour | 1,000,000 | Unlimited
Tokens per Day | 1,000,000 | Unlimited (pay-as-you-go)
Requests per Minute | 30 | 1,000
Requests per Hour | 900 | Unlimited
Requests per Day | 14,400 | Unlimited
Context Window | 8,192 tokens | Up to 128K+
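
Because several limits apply at once, it is worth checking which one binds first under sustained load. A quick calculation with the free-tier numbers above:

```python
# Free-tier limits from the table above.
TOKENS_PER_MIN, TOKENS_PER_DAY = 60_000, 1_000_000
REQS_PER_MIN, REQS_PER_HOUR = 30, 900

# Driving the API at the full 60,000 tokens/minute exhausts the
# 1M hourly/daily token caps in under 17 minutes:
minutes_to_token_cap = TOKENS_PER_DAY / TOKENS_PER_MIN
print(f"{minutes_to_token_cap:.1f} min")  # 16.7 min

# Likewise, a sustained 30 requests/minute hits the 900/hour request
# cap at the 30-minute mark, so the hourly limit binds within an hour:
minutes_to_request_cap = REQS_PER_HOUR / REQS_PER_MIN
print(f"{minutes_to_request_cap:.0f} min")  # 30 min
```

In practice the per-minute figures govern bursts, while the hourly and daily caps govern anything long-running.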
Rate limits use a token bucket mechanism — capacity replenishes continuously rather than resetting at fixed intervals. Whichever limit (tokens or requests) triggers first will restrict access.
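
The continuous-replenishment behavior can be sketched as a minimal token bucket (an illustration of the mechanism, not Cerebras's actual implementation; the 60,000 tokens/minute free-tier limit is used as the example rate):

```python
class TokenBucket:
    """Capacity refills continuously at `rate` units per second."""

    def __init__(self, capacity, rate, now=0.0):
        self.capacity = capacity
        self.rate = rate
        self.level = capacity  # bucket starts full
        self.last = now

    def try_consume(self, amount, now):
        # Replenish in proportion to elapsed time, capped at capacity.
        self.level = min(self.capacity, self.level + (now - self.last) * self.rate)
        self.last = now
        if amount <= self.level:
            self.level -= amount
            return True
        return False

# 60,000 tokens/minute = 1,000 tokens/second replenishment.
bucket = TokenBucket(capacity=60_000, rate=1_000)
print(bucket.try_consume(60_000, now=0))   # True: bucket starts full
print(bucket.try_consume(1_000, now=0.5))  # False: only 500 replenished so far
print(bucket.try_consume(1_000, now=1.0))  # True: 1,000 available again
```

The upshot for clients: after a burst, capacity comes back gradually, so spacing requests out recovers faster than waiting for a fixed window to reset.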

Context Window

The free tier context window is temporarily limited to 8,192 tokens across all models. This is a significant constraint for use cases requiring long documents or multi-turn conversations.
On the paid Developer tier, context windows expand substantially — up to 131K tokens for Qwen 3 235B Instruct and up to 128K for other models. If you need longer context, the Developer tier starts at just $10.
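
Until then, applications on the free tier have to budget prompts themselves. A minimal sketch of trimming chat history to fit the 8,192-token window (the `trim_history` helper and the 4-characters-per-token estimate are illustrative assumptions; the model's actual tokenizer is more accurate):

```python
# Keep a chat history within the free tier's 8,192-token window by
# dropping the oldest turns first.
CONTEXT_LIMIT = 8_192

def estimate_tokens(text):
    # Crude ~4-characters-per-token heuristic.
    return max(1, len(text) // 4)

def trim_history(messages, reserve_for_reply=1_024):
    """Return the most recent messages that fit the token budget."""
    budget = CONTEXT_LIMIT - reserve_for_reply
    kept, total = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```

A real implementation would also pin the system prompt rather than letting it fall off the front; this sketch drops turns purely by recency.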

OpenAI Compatibility

Cerebras provides an OpenAI-compatible Chat Completions endpoint, making migration straightforward. Change two things in your existing code:
1. Set base_url to https://api.cerebras.ai/v1
2. Use your Cerebras API key instead of your OpenAI key

import openai

client = openai.OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key="your-cerebras-api-key"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)

Unsupported OpenAI parameters:
• frequency_penalty, presence_penalty, logit_bias
• Streaming + JSON mode on reasoning models (streaming works fine with gpt-oss-120b and non-reasoning models)
• Text Completions endpoint (only Chat Completions is supported)
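
Code being ported from OpenAI may still pass those parameters. One defensive option is a thin shim that strips them before the request goes out (`sanitize_for_cerebras` is a hypothetical helper for illustration, not part of any SDK):

```python
# Drop parameters the Cerebras endpoint does not accept before
# sending an otherwise-unchanged OpenAI-style request.
UNSUPPORTED_PARAMS = {"frequency_penalty", "presence_penalty", "logit_bias"}

def sanitize_for_cerebras(params):
    """Return (cleaned params, names of dropped params)."""
    dropped = sorted(k for k in params if k in UNSUPPORTED_PARAMS)
    clean = {k: v for k, v in params.items() if k not in UNSUPPORTED_PARAMS}
    return clean, dropped

request = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "frequency_penalty": 0.5,  # fine for OpenAI, unsupported here
}
clean, dropped = sanitize_for_cerebras(request)
print(dropped)  # ['frequency_penalty']
```

Logging the dropped names (rather than silently swallowing them) makes the behavioral difference between providers visible during migration.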
Cerebras also provides native SDKs for Python (pip install cerebras_cloud_sdk) and Node.js (npm install @cerebras/cerebras_cloud_sdk).

Pricing Tiers

Tier | Cost | Key Differences
Free | $0 | 1M tokens/day, 8K context, community support
Developer | From $10 (pay-as-you-go) | 10x higher rate limits, up to 128K+ context, no daily cap
Enterprise | Custom pricing | Dedicated queue priority, custom model weights, fine-tuning, SLA support
The Developer tier has no contracts — deposit $10 to your account and pay per token consumed. No auto-billing traps on the free tier.

Alternative Access

You can also access Cerebras-powered inference through:
• OpenRouter — unified API supporting multiple providers
• Hugging Face Inference — access Cerebras models from the HF Hub
• AWS Marketplace — Cerebras Fast Inference Cloud available as a marketplace product

Notes

• 1M tokens/day is enough for serious prototyping, small internal tools, or early-stage pilots — but the 8K context window on the free tier is the real bottleneck for many use cases
• Model deprecation — Llama 3.3 70B and Qwen 3 32B are scheduled for deprecation on February 16, 2026. Plan migrations to newer models
• gpt-oss-120b system role behavior — the system role maps to developer-level instructions with stronger influence than standard OpenAI, so identical prompts may produce different results
• Cerebras Code — a separate product (VS Code extension) with Pro ($50/mo, 24M tokens/day) and Max ($200/mo, 120M tokens/day) plans for coding assistance
• No region restrictions — the API is globally available, with data centers across North America and Europe
• Speed advantage — Cerebras benchmarked Llama 4 Maverick at 2,522 tok/s vs. NVIDIA Blackwell at 1,038 tok/s for the same model
Sources:
• Cerebras Inference
• Cerebras Pricing
• Cerebras Supported Models
• Cerebras Rate Limits
• Cerebras Quickstart
• Cerebras OpenAI Compatibility
• Cerebras Inference Free Tier Analysis (Adam Holter)
• CloudCredits.io - Cerebras Free Tier
• Cerebras Blog: Pay Per Token