Hugging Face ZeroGPU for Spaces - Free H200 GPU Access
Source: https://huggingface.co/docs/hub/en/spaces-zerogpu
Description
ZeroGPU is Hugging Face's dynamic GPU sharing system that allocates NVIDIA H200 GPUs to Gradio-based Spaces on demand. Instead of renting a dedicated GPU 24/7, your app grabs a powerful H200 slice when it needs compute and releases it immediately after. Free accounts get 3.5 minutes of daily GPU quota, and PRO subscribers ($9/month) get 25 minutes/day with highest queue priority. All users can use existing ZeroGPU Spaces for free; hosting your own requires a PRO or Team/Enterprise subscription.

Using Existing ZeroGPU Spaces (Free)

1. Go to huggingface.co and create a free account
2. Browse the curated list of ZeroGPU Spaces
3. Use any Space directly -- a GPU is allocated automatically when you run inference
4. Even unauthenticated users get 2 minutes/day, but logging in bumps you to 3.5 minutes

Hosting Your Own ZeroGPU Space

1. Subscribe to Hugging Face PRO ($9/month) or join a Team/Enterprise org
2. Create a new Space and select Gradio as the SDK
3. In the Space settings, choose ZeroGPU as the hardware
4. In your code, import spaces and decorate GPU-dependent functions with @spaces.GPU
5. Push your code -- the Space auto-builds, and a GPU is allocated dynamically per request
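The hosting steps above can be sketched as a single minimal app.py. This is a sketch, not the official template: the generate() body is a placeholder rather than a real model call, and the ImportError fallback (a hypothetical shim) simply lets the same file run on machines where the spaces package is not installed.

```python
# app.py -- minimal ZeroGPU Space sketch.
# Assumptions: generate() is placeholder work, and _SpacesShim is an
# illustrative no-op stand-in used only when `spaces` is unavailable.
try:
    import spaces                      # provided inside Hugging Face Spaces
except ImportError:                    # local fallback: decorator becomes a no-op
    class _SpacesShim:
        def GPU(self, func=None, **kwargs):
            if callable(func):
                return func            # bare @spaces.GPU usage
            return lambda f: f         # parameterized @spaces.GPU(...) usage
    spaces = _SpacesShim()

@spaces.GPU                            # an H200 slice is attached only while this runs
def generate(prompt: str) -> str:
    # placeholder for real GPU work, e.g. pipe(prompt).images
    return f"generated: {prompt}"

# In a real Space you would then wire up the UI (Gradio is the only supported SDK):
#   import gradio as gr
#   gr.Interface(fn=generate, inputs="text", outputs="text").launch()
```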

Important:
• ZeroGPU only works with the Gradio SDK -- Streamlit and Docker Spaces are not supported
• Personal PRO accounts can host up to 10 ZeroGPU Spaces
• Team/Enterprise organizations can host up to 50 ZeroGPU Spaces
• torch.compile is not supported; use PyTorch ahead-of-time compilation (torch 2.8+) instead
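A Space declares its SDK in the YAML front matter of its README.md. A minimal sketch for step 2 above (the title, emoji, and pinned sdk_version are illustrative values, not requirements):

```yaml
---
title: My ZeroGPU Demo        # illustrative values
emoji: "⚡"
sdk: gradio                   # ZeroGPU requires the Gradio SDK
sdk_version: "4.44.0"         # hypothetical pinned version
app_file: app.py
---
```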

Hardware

GPU Size          Backing Hardware    VRAM     Quota Cost
large (default)   Half NVIDIA H200    70 GB    1x
xlarge            Full NVIDIA H200    141 GB   2x

The NVIDIA H200 features HBM3e memory and the Hopper architecture, making it one of the most powerful GPUs available. You get access to this hardware for free -- a significant advantage over platforms offering older T4 or P100 GPUs.

Usage Example

import spaces

# pipe: your preloaded model pipeline (e.g. a diffusers pipeline)

@spaces.GPU
def generate(prompt):
    return pipe(prompt).images

@spaces.GPU(size="xlarge")
def generate_large(prompt):
    return pipe(prompt).images

Quotas and Limits

Account Type             Daily GPU Quota   Queue Priority
Unauthenticated          2 minutes         Low
Free account             3.5 minutes       Medium
PRO account ($9/month)   25 minutes        Highest
Team org member          25 minutes        Highest
Enterprise org member    45 minutes        Highest

Limit                        Value
Quota reset                  24 hours after first GPU usage
Default function timeout     60 seconds
Max function duration        Configurable via @spaces.GPU(duration=120)
ZeroGPU Spaces limit (PRO)   10 per personal account
ZeroGPU Spaces limit (Org)   50 per organization

Supported Versions

Framework      Supported Versions
Gradio         4+ (required -- the only supported SDK)
PyTorch        2.1.0 through latest (2.9.1)
Python         3.10.13, 3.12.12
HF Libraries   transformers, diffusers, accelerate (enhanced compatibility)

Tips and Best Practices

• Maximize quota efficiency: set shorter duration values for quick functions to improve queue priority, e.g. @spaces.GPU(duration=30) for functions that finish in under 30 seconds
• Use dynamic duration: pass a callable to duration that estimates runtime from the input parameters, so longer tasks get enough time without wasting quota on short ones
• xlarge is rarely needed: only use size="xlarge" when your model truly needs more than 70 GB of VRAM; it consumes 2x quota and has longer queue wait times
• Ahead-of-time compilation: since torch.compile is not supported, use PyTorch AOT compilation (torch 2.8+) for performance optimization -- see the official blog post
• Use flash-attention 3: for large models, combining AOT compilation with flash-attention 3 significantly reduces inference time and quota consumption
• The @spaces.GPU decorator is effect-free outside ZeroGPU: your code will run normally on local machines or other platforms without modification
• Remaining quota affects priority: the more quota you have left, the higher your priority in the queue, so heavy users get deprioritized within a day
• PRO subscription is excellent value: at $9/month you get 25 minutes of H200 time daily -- equivalent to hundreds of dollars of GPU time per month if rented conventionally
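The two duration tips above can be sketched as follows. The _SpacesShim fallback and the estimate_seconds() heuristic are illustrative assumptions; per the tips, spaces.GPU's duration parameter accepts a fixed number of seconds or a callable that receives the same arguments as the decorated function.

```python
# Sketch of quota-aware GPU durations. Assumptions: _SpacesShim is a no-op
# stand-in for local runs, and estimate_seconds() is a made-up heuristic.
try:
    import spaces                          # real decorator inside a Space
except ImportError:
    class _SpacesShim:                     # local fallback: decorator is a no-op
        def GPU(self, func=None, *, duration=None, size=None):
            def wrap(f):
                return f
            return wrap if func is None else wrap(func)
    spaces = _SpacesShim()

@spaces.GPU(duration=30)                   # short cap -> better queue priority
def quick_infer(prompt):
    return prompt.upper()                  # placeholder for fast GPU work

def estimate_seconds(prompt, steps=20):
    # hypothetical estimator: allotted time grows with the number of steps
    return min(120, 10 + 2 * steps)

@spaces.GPU(duration=estimate_seconds)     # callable receives the call's arguments
def long_infer(prompt, steps=20):
    return prompt * steps                  # placeholder for slow GPU work
```

Shorter declared durations both protect your quota and, per the first tip, move your requests up the queue; the callable form avoids over-reserving time for small inputs.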
Sources:
• Spaces ZeroGPU Documentation
• ZeroGPU AOT Compilation Blog Post
• Hugging Face Pricing
• Hugging Face PRO Account
• ZeroGPU Explorers Community
• Advanced Compute Options
• Using GPU Spaces