Journal · Sheet 05

Field notes on inference.

Practical writing on GPU selection, inference engines, and the economics of running dedicated endpoints. Written by the team that runs them.

Featured

Choosing the right GPU for your model

A practical map from model size and precision to the cheapest GPU that still hits your latency target — from a T4 to a B200.

By The Seattle Compute Team

All posts · Schedule B

Shared inference APIs are perfect until they are not. Here is where the wall is, why it is there, and what dedicated GPUs fix.

Both are excellent. They optimize for different shapes of traffic. Here is the short version of when to reach for each.