
GPU Fleet

T4 through B200.

The fleet spans ten GPU classes, from a 16 GB T4 for small batch jobs to a 192 GB B200 for frontier models and the lowest latency.

You do not need to know the catalog: the optimizer maps your model and priority onto the right card. You can still pin a specific GPU when you want to.
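To give a feel for the kind of mapping involved, here is a minimal sketch that picks the cheapest single card whose VRAM can hold a model's weights. It is not the optimizer's actual logic, which this page does not describe. The names `Gpu`, `SCHEDULE_A`, and `cheapest_fit` are hypothetical. The 20% headroom for KV cache and runtime overhead is an assumption. Latency priority and multi-GPU sharding are ignored. VRAM and hourly prices are copied from Schedule A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: int
    usd_per_hour: float


# VRAM and hourly prices copied from Schedule A on this page.
SCHEDULE_A = [
    Gpu("T4", 16, 0.59),
    Gpu("L4", 24, 0.80),
    Gpu("A10", 24, 1.10),
    Gpu("L40S", 48, 1.95),
    Gpu("A100 40 GB", 40, 2.10),
    Gpu("RTX Pro 6000", 96, 3.05),
    Gpu("H100", 80, 3.95),
    Gpu("H200", 141, 4.54),
    Gpu("B200", 192, 6.25),
    Gpu("A100 80 GB", 80, 8.99),
]


def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory: (params * 1e9) * bytes_per_param / 1e9 bytes per GB."""
    return params_billion * bytes_per_param


def cheapest_fit(params_billion: float, bytes_per_param: float, headroom: float = 1.2):
    """Return the cheapest single GPU whose VRAM covers weights * headroom, or None.

    Hypothetical heuristic: headroom=1.2 is an assumed 20% allowance for
    KV cache and runtime overhead, not a platform-defined value.
    """
    needed = weight_gb(params_billion, bytes_per_param) * headroom
    fits = [g for g in SCHEDULE_A if g.vram_gb >= needed]
    return min(fits, key=lambda g: g.usd_per_hour) if fits else None
```

For example, `cheapest_fit(8, 2)` (an 8B model with 16-bit weights) returns the L4, in line with the table's 7B–8B guidance. A 70B model with 16-bit weights needs about 168 GB with headroom, so only the B200 fits on a single card.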

What you get

No. 01

Budget tier

T4, L4, and A10 for small models, batch work, and latency-tolerant serving where tokens-per-dollar wins.

No. 02

Mid & performance

L40S, A100, RTX Pro 6000, and H100/H200 for 8B–70B production inference with strict latency SLAs.

No. 03

Top tier

B200, with 192 GB of VRAM and 8 TB/s of memory bandwidth, for frontier models and the lowest achievable latency.
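The budget tier is framed around tokens-per-dollar. One quick way to compare cards on that axis is sustained throughput multiplied by 3,600 seconds, then divided by the hourly price. The `tokens_per_dollar` helper below is hypothetical. The throughput inputs are placeholders, not measured figures for any card, so substitute your own benchmarks. Prices come from Schedule A.

```python
def tokens_per_dollar(tokens_per_second: float, usd_per_hour: float) -> float:
    """Tokens generated per dollar at a sustained throughput and hourly price."""
    if usd_per_hour <= 0:
        raise ValueError("usd_per_hour must be positive")
    return tokens_per_second * 3600 / usd_per_hour


# Throughput values are hypothetical placeholders, not benchmarks.
t4 = tokens_per_dollar(tokens_per_second=400, usd_per_hour=0.59)     # T4 price from Schedule A
h100 = tokens_per_dollar(tokens_per_second=2_000, usd_per_hour=3.95)  # H100 price from Schedule A
```

With these placeholder inputs, the T4 yields roughly 2.44 million tokens per dollar and the H100 roughly 1.82 million. The slower card still wins on cost, even though the faster one serves five times as many tokens per second.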

Full fleet · Schedule A

GPU | Tier | VRAM | Best for | Price/hour
T4 | Budget | 16 GB | Small models, batch jobs | $0.59
L4 | Budget | 24 GB | 7B–8B chat, moderate throughput | $0.80
A10 | Budget | 24 GB | Latency-tolerant serving | $1.10
L40S | Mid | 48 GB | 8B–13B production inference | $1.95
A100 40 GB | Mid | 40 GB | Mid-size models, stable workloads | $2.10
RTX Pro 6000 | Mid | 96 GB | Memory-heavy single-GPU serving | $3.05
H100 | Performance | 80 GB | 30B–70B, demanding latency SLAs | $3.95
H200 | Performance | 141 GB | Long-context 70B, MoE models | $4.54
B200 | Top | 192 GB (8 TB/s bandwidth) | Frontier models, lowest latency | $6.25
A100 80 GB | Performance | 80 GB | Legacy workloads requiring A100 | $8.99
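To turn the hourly prices above into a monthly budget for always-on serving, multiply the rate by the number of cards and by the hours in a month. This is a minimal sketch under two assumptions: a month is taken as about 730 hours (365 × 24 / 12), and billing is assumed to be linear in hours, since this page does not state billing granularity. The names `HOURS_PER_MONTH`, `PRICE_PER_HOUR`, and `monthly_cost` are hypothetical.

```python
# Assumption: ~730 hours per month (365 * 24 / 12) and billing linear in hours.
HOURS_PER_MONTH = 365 * 24 / 12

# Hourly prices copied from Schedule A (subset for illustration).
PRICE_PER_HOUR = {"T4": 0.59, "H100": 3.95, "B200": 6.25}


def monthly_cost(gpu: str, gpus: int = 1) -> float:
    """Always-on monthly cost for `gpus` cards of one type, rounded to cents."""
    return round(PRICE_PER_HOUR[gpu] * gpus * HOURS_PER_MONTH, 2)
```

Under these assumptions, one H100 running continuously comes to about $2,883.50 a month, and two B200s to about $9,125.00.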