Inference.net: $25 free credits for OSS-model inference
Source: https://inference.net/pricing
Description
Inference.net hands every new account $25 in free credits to use against its OpenAI-compatible serverless inference API for open-source LLMs and vision-language models (Gemma 3, GPT-OSS 120B, NVIDIA Nemotron, plus Inference.net's own Schematron/ClipTagger families). Marketing promises rates up to ~90% lower than legacy providers — at $0.02/$0.05 per 1M tokens for the cheapest Schematron model, that $25 stretches a long way for evaluation, prototyping, batch jobs, structured-output pipelines, and OSS app development.
1. Go to inference.net and click Sign up (or jump straight to the docs at docs.inference.net).
2. Create an account with email or a supported SSO option.
3. The $25 free credit is auto-applied to new accounts — you do not need to enter a credit card to start using the Playground or API.
4. Open the dashboard sidebar and go to API Keys.
5. Click Create new key (or use the default key that's pre-generated for the account).
6. Export the key locally:

export INFERENCE_API_KEY=
7. Point any OpenAI SDK at https://api.inference.net/v1 and you're done — the first request will start drawing from the $25 balance.
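Before wiring up any SDK, the endpoint can be smoke-tested from the shell. This is a sketch: it assumes the endpoint follows the standard OpenAI /chat/completions request shape and reuses the model id from this page's Python example; it spends a few tokens of the $25 balance.

```shell
# Assumes INFERENCE_API_KEY was exported in step 6.
# Request shape follows the standard OpenAI chat-completions convention.
curl https://api.inference.net/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemma-3-27b-instruct/bf-16",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

A successful response is a standard OpenAI-style JSON object with a `choices` array.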
Important:
• No credit card required to claim the $25 (verified via signup flow + docs).
• Credits are usage-based — they only deplete when you actually call the API; idle accounts don't lose them.
• No public expiry on the $25 grant (treat it as ongoing until used).
• Going beyond $25 requires adding a payment method and switching to pay-as-you-go.
Inference.net is a strict OpenAI-compatible endpoint. Migrating from OpenAI / Anthropic / Together / DeepInfra is a one-line change:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.environ["INFERENCE_API_KEY"],
)

response = client.chat.completions.create(
    model="google/gemma-3-27b-instruct/bf-16",
    messages=[{"role": "user", "content": "Hello"}],
)
Supported features:
• Chat completions (primary endpoint)
• Structured outputs (JSON schema)
• Function / tool calling via tools parameter
• Streaming responses
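The structured-outputs feature can be made concrete with the request body the SDK would send. A minimal sketch, assuming Inference.net mirrors OpenAI's `response_format` / `json_schema` field layout; the invoice schema and the model id here are made-up illustrations, not values from this page.

```python
import json

# Hypothetical structured-output request body, OpenAI "json_schema" style.
# The invoice schema and "schematron-3b" id are illustrative placeholders.
payload = {
    "model": "schematron-3b",  # placeholder; check the live models page for real ids
    "messages": [
        {"role": "user", "content": "Extract fields from: Invoice #42, total $19.99"}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "integer"},
                    "total": {"type": "number"},
                },
                "required": ["invoice_number", "total"],
            },
        },
    },
}

body = json.dumps(payload)  # what gets POSTed to /v1/chat/completions
```

With the OpenAI SDK, the same `response_format` dict is passed directly to `client.chat.completions.create(...)`.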
Model | Context | Input / Output ($/1M tokens) | Notes
NVIDIA Nemotron 3 Super (FP8) | 1M | $2.50 / $5.00 | JSON, tool calling
Schematron 3B (Inference.net, BF16) | 125K | $0.02 / $0.05 | Cheapest; JSON output
Schematron 8B (Inference.net, BF16) | 125K | $0.04 / $0.10 | JSON output
Schematron V2 Small (BF16) | 125K | $0.05 / $0.25 | JSON output
Schematron V2 Turbo (BF16) | 125K | $0.03 / $0.15 | JSON output
Model | Context | Input / Output ($/1M tokens) | Notes
Google Gemma 3 (BF16) | 125K | $0.15 / $0.30 | VLM, multimodal, JSON
ClipTagger 12B (GrassData, FP8) | 8K | $0.30 / $0.50 | VLM for video frame tagging
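A small helper makes the per-1M rates above concrete. The numbers are copied from the tables on this page and will drift, so treat them as a snapshot, not ground truth.

```python
# Per-1M-token rates (input, output), copied from the tables above.
# Snapshot only: re-check the live pricing page before depending on them.
RATES = {
    "schematron-3b": (0.02, 0.05),
    "gemma-3": (0.15, 0.30),
    "nemotron-3-super": (2.50, 5.00),
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a job at the listed per-1M-token rates."""
    in_rate, out_rate = RATES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 10M-input / 2M-output extraction batch on Schematron 3B costs about $0.30.
```

The same function, fed your real traffic counts, tells you how far the $25 grant goes for your workload.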
• Kimi K2.5 (Moonshot AI)
• MiniMax-M2.5
• GLM-5 (Z.ai)
• GPT-OSS 120B (OpenAI)
Larger models are priced per GPU-hour (dedicated deploys), not per-token, so they are best evaluated on the $25 balance with short test runs.
Latest catalog: see the official Inference.net models page — pricing and lineup change frequently.
Using the cheapest catalog model (Schematron 3B, $0.02 input / $0.05 output per 1M tokens):
• ~1.25 billion input tokens, or
• ~500 million output tokens, or
• A typical 50/50 split: hundreds of millions of tokens

Using Gemma 3 vision ($0.15 / $0.30 per 1M tokens):
• ~166 million input tokens / ~83 million output tokens

Using Nemotron 3 Super ($2.50 / $5.00 per 1M):
• ~10 million input tokens / ~5 million output tokens (still huge for evaluation)
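The figures above are just $25 divided by the per-1M rate. A quick sanity check of the arithmetic:

```python
# Tokens purchasable for a dollar budget at a per-1M-token rate.
def tokens_for_budget(budget_usd: float, rate_per_1m_usd: float) -> float:
    return budget_usd / rate_per_1m_usd * 1_000_000

assert round(tokens_for_budget(25, 0.02)) == 1_250_000_000  # Schematron 3B input: 1.25B
assert round(tokens_for_budget(25, 0.05)) == 500_000_000    # Schematron 3B output: 500M
assert int(tokens_for_budget(25, 0.15) / 1e6) == 166        # Gemma 3 input: ~166M
assert round(tokens_for_budget(25, 2.50)) == 10_000_000     # Nemotron input: 10M
```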
The $25 grant is genuinely useful — well beyond a token-tasting demo.
If you maintain or contribute to an open-source AI project, Inference.net runs a Grants Program offering free compute beyond the $25 starter:
• Free compute credits for OSS AI projects
• Applications reviewed within ~24 hours
• Useful for OSS model authors, eval frameworks, agent libraries, etc.
Apply via the Grants link on inference.net.
The $25 also unlocks Catalyst, Inference.net's broader platform:
• Observe — log production LLM traffic
• Datasets — manage eval/training data
• Evaluate — compare model quality
• Train — fine-tune custom models from your traffic
• Deploy — serve fine-tunes on dedicated GPU infra
This matters if you want to start with the free credits, then graduate to fine-tuning your own task-specific small model (the Schematron family is their reference example of this pipeline).
• No catch on the $25 — no card required, no auto-billing; the balance simply runs out and your API calls start returning 402-style errors until you top up.
• Frontier OSS models are GPU-hour priced ($9.98/hr on B200), so a couple of hours of testing Kimi K2.5 / GLM-5 / GPT-OSS-120B will eat the $25 quickly. Use Schematron / Gemma 3 for long-running token-cheap workloads.
• No published rate-limit ceiling for free-tier accounts — typical OpenAI-compatible limits apply; high-RPS workloads should contact sales.
• Catalog evolves fast — model availability and pricing change; always re-check the live models page and pricing page before committing code to a specific model id.
• OpenAI SDK drop-in — swap base_url and api_key only; everything else (streaming, tool calling, JSON mode) just works.
• Pair with OpenRouter for fallback — if Inference.net runs out of capacity for a specific model, OpenRouter often hosts the same OSS model.
• Schematron family is unique to Inference.net — purpose-built for structured/JSON output at very low cost. Worth the $25 just to benchmark against your current GPT-4o-mini structured-output pipeline.
• Production migration — combine the $25 with the OSS Grants Program for sustained free usage if you're shipping an open-source agent / eval framework.
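The OpenRouter-fallback tip boils down to trying providers in order. A minimal, provider-agnostic sketch: the callables stand in for real SDK calls, so no specific client API is assumed here.

```python
from typing import Callable, Sequence

def complete_with_fallback(providers: Sequence[tuple[str, Callable[[], str]]]) -> str:
    """Try each (name, call) pair in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return call()
        except Exception as exc:  # capacity, quota, and network errors all fall through
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# In practice each callable wraps client.chat.completions.create(...) with a
# client pointed at that provider's base_url (api.inference.net, openrouter.ai, ...).
```

Keeping provider order in one list also gives you a single place to log which backend actually served each request.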
Sources:
• Inference.net Pricing
• Inference.net Homepage & Models
• Inference.net API Quickstart
• Catalyst Platform Docs
• Keywords AI: Introducing Inference.net