Z.AI: GLM-4.5-Flash, GLM-4.7-Flash and GLM-4.6V-Flash 100% free via API
Source: https://docs.z.ai/guides/overview/pricing
Description
Z.AI (Zhipu AI's international platform, the company behind ChatGLM) exposes three "Flash" models priced at $0 for input, cached input, cached storage AND output tokens. There is no monthly cap, no credit balance to top up, and no card required to use them. The endpoint is OpenAI-compatible, so you can drop these models straight into any existing OpenAI SDK / LangChain / OpenRouter-style client by changing the base_url and api_key.
The three free models cover the most common use-cases for indie builders: a fast general chat model (GLM-4.5-Flash), the latest-generation general chat model (GLM-4.7-Flash), and a vision-language model that can read images (GLM-4.6V-Flash). All three sit on the international api.z.ai endpoint, so no China-mainland phone number is required.
1. Go to z.ai/model-api (the international Open Platform — not bigmodel.cn, which is the China-mainland version with different rules).
2. Click Sign Up and register with email + password (Google/GitHub SSO is also offered). No phone verification is required for basic signup on the international platform.
3. Verify your email via the link Z.AI sends.
4. Open the API Keys page at z.ai/manage-apikey/apikey-list and click Create API Key. Copy it once; Z.AI does not show it again.
5. Point your OpenAI client at the Z.AI base URL:
• Base URL: https://api.z.ai/api/paas/v4/
• Auth header: Authorization: Bearer <YOUR_API_KEY>
6. Set the model parameter to one of glm-4.5-flash, glm-4.7-flash, or glm-4.6v-flash and call chat/completions as you would with OpenAI.

Important:
• No credit card is required for the free Flash models. You only need to add a payment method if you want to use the paid models (GLM-4.7, GLM-4.6V, GLM-5.1, etc.).
• The international platform (z.ai) and the China-mainland platform (open.bigmodel.cn) are separate accounts with separate keys; pick the one closest to your users for latency.
• Quickstart example:

curl 'https://api.z.ai/api/paas/v4/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"model":"glm-4.5-flash","messages":[{"role":"user","content":"Hello"}]}'
Model          | Type            | Input | Cached input | Output | Notes
GLM-4.5-Flash  | Text chat       | $0    | $0           | $0     | Lightweight general-purpose; 128K context
GLM-4.7-Flash  | Text chat       | $0    | $0           | $0     | Newer-generation lightweight model in the GLM-4.7 family
GLM-4.6V-Flash | Vision-language | $0    | $0           | $0     | Multimodal; accepts image inputs alongside text

All three are billed at $0 / 1M tokens across the board (input, cached input, cached storage, output). This is the official Free tier on the pricing page, not a promo.

For reference, the paid counterparts are not free:
• GLM-4.7-Flash (free) vs. GLM-4.7 (paid)
• GLM-4.6V-Flash (free) vs. GLM-4.6V (≈ $0.30 / $0.90 per 1M tokens)
• GLM-4.7 and GLM-5.1 are paid models

Source of model list and prices: docs.z.ai/guides/overview/pricing.

Z.AI's chat/completions endpoint is a near drop-in for the OpenAI SDK. In Python:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_KEY",
    base_url="https://api.z.ai/api/paas/v4/",
)

resp = client.chat.completions.create(
    model="glm-4.5-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

For GLM-4.6V-Flash (vision), pass image content using the standard OpenAI multimodal content array (type: "image_url"). Streaming, tool calling, and JSON-mode responses are supported on the Flash models.
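That multimodal content array can be sketched in Python as follows. This is a non-authoritative sketch: the prompt, the placeholder image URL, and the ZAI_API_KEY environment variable are illustrative assumptions, not values from the docs.

```python
# Minimal sketch of a vision request to glm-4.6v-flash. The prompt and
# image URL below are placeholder assumptions.
import os

def build_vision_messages(prompt, image_url):
    """Return the standard OpenAI-style multimodal message list:
    one user message carrying a text part and an image_url part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

# Only attempt the network call when a key is configured.
if os.environ.get("ZAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["ZAI_API_KEY"],
        base_url="https://api.z.ai/api/paas/v4/",
    )
    resp = client.chat.completions.create(
        model="glm-4.6v-flash",
        messages=build_vision_messages(
            "Describe this image.",
            "https://example.com/photo.jpg",
        ),
    )
    print(resp.choices[0].message.content)
```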
Z.AI does not rate-limit by RPM/TPM the way OpenAI does. Instead, free accounts are limited by concurrency (the number of in-flight requests).

Key caveats from the official rate-limits page:
• Free-trial accounts have lower concurrency than balance-funded accounts.
• For Flash-class models, requests with context length over 8K tokens are throttled to roughly 1% of the standard concurrency limit during periods of platform stress. In practice, small prompts fly while a single 100K-context request can stall behind paid traffic.
• Concurrency limits are dynamic and per-model; Z.AI does not publish exact numbers, and they may change without notice.
• Adding a small balance (or subscribing to the Coding Plan) raises concurrency caps even though Flash usage stays $0.
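Under a concurrency cap like this, a client-side guard is more useful than RPM throttling. A rough sketch, assuming an arbitrary cap of 4 in-flight requests (Z.AI publishes no exact number):

```python
# Sketch, not official guidance: cap in-flight requests with a semaphore
# and retry transient failures with exponential backoff.
# MAX_CONCURRENCY = 4 is an assumed value, not a published Z.AI limit.
import threading
import time

MAX_CONCURRENCY = 4
_slots = threading.Semaphore(MAX_CONCURRENCY)

def call_with_limits(fn, retries=3, base_delay=1.0):
    """Run fn() while holding a concurrency slot; back off and retry on errors."""
    with _slots:
        for attempt in range(retries):
            try:
                return fn()
            except Exception:
                if attempt == retries - 1:
                    raise  # out of retries; surface the error
                time.sleep(base_delay * (2 ** attempt))
```

Wrap each API call in call_with_limits so at most MAX_CONCURRENCY requests are ever in flight, matching how the platform actually meters free accounts.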
If you need predictable throughput for a production workload, treat the free Flash models as dev/eval/burst infrastructure, not as the load-bearing backbone of a paying-customer product.

See: z.ai/manage-apikey/rate-limits.
Great fit:
• Indie builders prototyping LLM features without putting a card on file
• Coding agents/scripts where you want a free fallback after hitting OpenAI/Anthropic limits
• Vision experiments where 25 free Stability credits or one image per day on other free vision tiers isn't enough
• Multi-language apps: GLM models are particularly strong on Chinese, but English is fully supported
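The fallback pattern from the second bullet can be sketched as below. The string-matching error check is a deliberate simplification; real SDKs raise typed exceptions (e.g. openai.RateLimitError) that you should catch instead.

```python
# Sketch of the "free fallback" pattern: try the primary provider first,
# fall back to a free Flash model on rate-limit style errors.
# Detecting rate limits via the error message is a simplifying assumption.

def chat_with_fallback(prompt, primary, fallback):
    """Call primary(prompt); on a rate-limit failure, use fallback(prompt)."""
    try:
        return primary(prompt)
    except Exception as exc:
        msg = str(exc).lower()
        if "429" in msg or "rate" in msg:
            return fallback(prompt)  # e.g. a glm-4.5-flash client
        raise  # unrelated errors should not be masked
```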

Skip if:
• You need GPT-4-class quality on hard reasoning: GLM-4.5-Flash is a small, fast model, not a frontier model
• You need strict data residency in the US/EU: Z.AI infrastructure is operated from China by Zhipu AI, so check your compliance posture
• You need guaranteed throughput SLAs on the free tier (you won't get them; see the rate limits above)
• GLM Coding Plan (optional, paid): Z.AI also sells a Claude-Code-style subscription (Lite ~$10/mo, Pro ~$30/mo, Max ~$80/mo, billed quarterly). The promotional $3/mo tier was discontinued 2026-02-11. The Coding Plan is unrelated to the free Flash models; you can use the free models without ever subscribing.
• OpenRouter alternative: the same GLM-4.5/4.7/4.6V models are also available via OpenRouter, which can be useful if you already have an OpenRouter key. Free routing through OpenRouter applies only to providers OpenRouter marks as free.
• Two platforms; don't mix them up:
  • International: z.ai + api.z.ai (covered by this listing).
  • Mainland China: bigmodel.cn + open.bigmodel.cn (separate accounts, separate pricing, China phone number required).
• Watch for model deprecations: GLM-4-Flash (the older model without a version number) is being phased out. The currently supported free models are the three listed above; re-check the pricing page if a model stops responding.
• Region/latency: the API origin is in Asia. Expect ~150–400 ms of extra round-trip from the US/EU compared to a domestic provider; fine for chat, painful for token-by-token streaming UX in some cases.
Sources:
• Z.AI Pricing — Overview
• Z.AI HTTP API Introduction & Quickstart
• Z.AI Rate Limits page
• Z.AI Open Platform / Model API
• Z.AI API Key management
• Zhipu AI GLM Pricing 2026 review