GPU Fleet

Name: GPU Fleet
Brand: Seattle Compute Company

T4 through B200.

The fleet spans ten GPU classes, from a 16 GB T4 for small batch jobs to a 192 GB B200 for frontier models and the lowest latency.

You do not have to know the catalog — the optimizer maps your model and priority onto the right card — but you can pin a specific GPU when you want to.


What you get

No. 01

Budget tier

T4, L4, and A10 for small models, batch work, and latency-tolerant serving where tokens-per-dollar wins.

No. 02

Mid & performance

L40S, A100, RTX Pro 6000, and H100/H200 for 8B–70B production inference with strict latency SLAs.

No. 03

Top tier

B200 with 192 GB of memory and 8 TB/s of memory bandwidth for frontier models and the lowest achievable latency.
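Why bandwidth drives latency: at batch size 1, LLM decoding is usually memory-bound, because producing each token means reading every model weight from GPU memory. A rough ceiling on single-stream decode speed is therefore memory bandwidth divided by weight bytes. The sketch below applies that approximation to the B200's 8 TB/s. The 70B FP8 model is a hypothetical example, and the estimate ignores KV-cache reads, compute time, and interconnect overhead, so real throughput will be lower.

```python
TB = 1e12  # bytes; bandwidth specs use decimal units

def decode_ceiling_tokens_per_s(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on batch-1 decode speed for a memory-bound model.

    Assumes each generated token streams every weight from memory exactly once
    and ignores KV-cache traffic and compute time.
    """
    return bandwidth_bytes_per_s / weight_bytes

# Hypothetical example: a 70B-parameter model stored in FP8 (1 byte per parameter).
weights_bytes = 70e9 * 1
ceiling = decode_ceiling_tokens_per_s(weights_bytes, 8 * TB)  # B200: 8 TB/s
latency_floor_ms = 1000 / ceiling  # minimum time per generated token

print(f"~{ceiling:.0f} tokens/s ceiling, >= {latency_floor_ms:.2f} ms per token")
```

Because the bound scales linearly with bandwidth, a card with more memory bandwidth lowers the per-token floor for the same model, which is what "lowest achievable latency" refers to.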

Full fleet · Schedule A

GPU	Tier	VRAM	Best for	Price/hour (USD)
T4	Budget	16 GB	Small models, batch jobs	$0.59
L4	Budget	24 GB	7B–8B chat, moderate throughput	$0.80
A10	Budget	24 GB	Latency-tolerant serving	$1.10
L40S	Mid	48 GB	8B–13B production inference	$1.95
A100 40 GB	Mid	40 GB	Mid-size models, stable workloads	$2.10
RTX Pro 6000	Mid	96 GB	Memory-heavy single-GPU serving	$3.05
H100	Performance	80 GB	30B–70B, demanding latency SLAs	$3.95
H200	Performance	141 GB	Long-context 70B, MoE models	$4.54
B200	Top	192 GB	Frontier models, lowest latency	$6.25
A100 80 GB	Performance	80 GB	Legacy workloads requiring A100	$8.99
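The optimizer's actual selection logic is not described on this page, so the following is only a hypothetical sketch of the simplest form of the idea: choose the cheapest single GPU in Schedule A whose VRAM holds the model weights plus some headroom. The 1.2× headroom factor, the 730-hour month, and all function names are assumptions for illustration; a real placement would also weigh your latency priority, which this sketch ignores.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: int
    usd_per_hour: float

# VRAM and hourly rates copied from Schedule A.
FLEET = [
    Gpu("T4", 16, 0.59),
    Gpu("L4", 24, 0.80),
    Gpu("A10", 24, 1.10),
    Gpu("L40S", 48, 1.95),
    Gpu("A100 40 GB", 40, 2.10),
    Gpu("RTX Pro 6000", 96, 3.05),
    Gpu("H100", 80, 3.95),
    Gpu("H200", 141, 4.54),
    Gpu("B200", 192, 6.25),
    Gpu("A100 80 GB", 80, 8.99),
]

def cheapest_fit(params_billion: float, bytes_per_param: float, headroom: float = 1.2) -> Gpu:
    """Return the cheapest single GPU whose VRAM holds weights * headroom.

    bytes_per_param: 2 for FP16/BF16, 1 for FP8.
    headroom: hypothetical multiplier reserving space for KV cache and activations.
    """
    needed_gb = params_billion * bytes_per_param * headroom
    candidates = [g for g in FLEET if g.vram_gb >= needed_gb]
    if not candidates:
        raise ValueError(f"no single GPU has {needed_gb:.0f} GB of VRAM")
    return min(candidates, key=lambda g: g.usd_per_hour)

def monthly_cost(gpu: Gpu, hours: float = 730) -> float:
    """Cost of running one GPU continuously; 730 h is an average month (assumption)."""
    return gpu.usd_per_hour * hours

if __name__ == "__main__":
    gpu = cheapest_fit(8, 2)  # 8B model in FP16 needs ~19.2 GB -> L4
    print(gpu.name, f"${monthly_cost(gpu):,.2f}/month")
```

With these assumptions, an 8B FP16 model lands on the L4, matching the table's "7B–8B chat" row, while a 70B FP16 model (about 168 GB with headroom) only fits on the B200. Pinning a specific GPU amounts to skipping this search and choosing the row yourself.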
