GPU Fleet
T4 through B200.
The fleet spans ten GPU classes, from a 16 GB T4 for small batch jobs to a 192 GB B200 for frontier models and the lowest latency.
You do not have to know the catalog — the optimizer maps your model and priority onto the right card — but you can pin a specific GPU when you want to.
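The optimizer's actual logic is not described on this page. As a rough illustration only, the kind of mapping it performs might look like the toy sketch below. Everything in it is an assumption made for the example: the `pick_gpu` function, the `priority` values `"cost"` and `"latency"`, the weights-only memory estimate (parameters × bytes per parameter, with 20% headroom), and the use of tier as a stand-in for latency. The VRAM sizes and prices are copied from Schedule A.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering of tiers, used here as a crude proxy for latency.
TIER_RANK = {"Budget": 0, "Mid": 1, "Performance": 2, "Top": 3}


@dataclass(frozen=True)
class Gpu:
    name: str
    tier: str
    vram_gb: int
    usd_per_hour: float


# Rows copied from Schedule A on this page.
CATALOG = {
    g.name: g
    for g in [
        Gpu("T4", "Budget", 16, 0.59),
        Gpu("L4", "Budget", 24, 0.80),
        Gpu("A10", "Budget", 24, 1.10),
        Gpu("L40S", "Mid", 48, 1.95),
        Gpu("A100 40 GB", "Mid", 40, 2.10),
        Gpu("RTX Pro 6000", "Mid", 96, 3.05),
        Gpu("H100", "Performance", 80, 3.95),
        Gpu("H200", "Performance", 141, 4.54),
        Gpu("B200", "Top", 192, 6.25),
        Gpu("A100 80 GB", "Performance", 80, 8.99),
    ]
}


def pick_gpu(
    params_billion: float,
    priority: str = "cost",
    pinned: Optional[str] = None,
    bytes_per_param: float = 2.0,  # assumption: 16-bit weights
    headroom: float = 1.2,         # assumption: 20% extra for runtime overhead
) -> Gpu:
    """Hypothetical single-GPU selection; not the real optimizer."""
    if pinned is not None:
        return CATALOG[pinned]  # raises KeyError for an unknown GPU name

    # 1e9 params * bytes per param / 1e9 bytes per GB = params_billion * bytes
    need_gb = params_billion * bytes_per_param * headroom
    fits = [g for g in CATALOG.values() if g.vram_gb >= need_gb]
    if not fits:
        raise ValueError(f"no single GPU has {need_gb:.1f} GB of VRAM")

    if priority == "cost":
        # Cheapest card that fits.
        return min(fits, key=lambda g: g.usd_per_hour)
    if priority == "latency":
        # Highest tier that fits; the cheaper card wins a tie.
        return max(fits, key=lambda g: (TIER_RANK[g.tier], -g.usd_per_hour))
    raise ValueError(f"unknown priority: {priority!r}")
```

Under these assumptions, `pick_gpu(8)` returns the L4 and `pick_gpu(13)` returns the L40S, while `pick_gpu(8, priority="latency")` returns the B200 and `pick_gpu(8, pinned="H100")` returns the H100 regardless of priority. The sketch ignores KV cache, quantization, and multi-GPU sharding, all of which a real placement decision would need to account for.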
What you get
No. 01
Budget tier
T4, L4, and A10 for small models, batch work, and latency-tolerant serving where tokens-per-dollar wins.
No. 02
Mid & performance
L40S, A100, and H100/H200 for 8B–70B production inference with real latency SLAs.
No. 03
Top tier
B200 with 192 GB and 8 TB/s bandwidth for frontier models and the lowest achievable latency.
Full fleet · Schedule A
| GPU | Tier | VRAM | Best for | Price / hour |
|---|---|---|---|---|
| T4 | Budget | 16 GB | Small models, batch jobs | $0.59 |
| L4 | Budget | 24 GB | 7B–8B chat, moderate throughput | $0.80 |
| A10 | Budget | 24 GB | Latency-tolerant serving | $1.10 |
| L40S | Mid | 48 GB | 8B–13B production inference | $1.95 |
| A100 40 GB | Mid | 40 GB | Mid-size models, stable workloads | $2.10 |
| RTX Pro 6000 | Mid | 96 GB | Memory-heavy single-GPU serving | $3.05 |
| H100 | Performance | 80 GB | 30B–70B, demanding latency SLAs | $3.95 |
| H200 | Performance | 141 GB | Long-context 70B, MoE models | $4.54 |
| B200 (8 TB/s) | Top | 192 GB | Frontier models, lowest latency | $6.25 |
| A100 80 GB | Performance | 80 GB | Legacy workloads requiring A100 | $8.99 |
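
To turn the hourly rates above into a budget figure, multiply the rate by the number of GPUs and the hours they run. The sketch below assumes plain per-hour billing with no minimums, discounts, or partial-hour rules (none of which are stated on this page) and uses 730 hours as an average month. The `monthly_cost` helper and its parameters are hypothetical names chosen for the example.

```python
from decimal import Decimal

# Hourly prices copied from Schedule A, kept as Decimal to avoid float rounding.
HOURLY_USD = {
    "T4": Decimal("0.59"),
    "L4": Decimal("0.80"),
    "A10": Decimal("1.10"),
    "L40S": Decimal("1.95"),
    "A100 40 GB": Decimal("2.10"),
    "RTX Pro 6000": Decimal("3.05"),
    "H100": Decimal("3.95"),
    "H200": Decimal("4.54"),
    "B200": Decimal("6.25"),
    "A100 80 GB": Decimal("8.99"),
}

HOURS_PER_MONTH = 730  # 8,760 hours per year / 12 months


def monthly_cost(gpu: str, count: int = 1, hours: int = HOURS_PER_MONTH) -> Decimal:
    """Cost in USD for `count` GPUs running `hours` hours each (assumed flat rate)."""
    return HOURLY_USD[gpu] * count * hours
```

For example, `monthly_cost("H100")` gives 2883.50, so one H100 running around the clock comes to roughly $2,883.50 per month under these assumptions, and `monthly_cost("T4", count=2, hours=100)` gives 118.00 for a short two-GPU batch job.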