Journal · Sheet 05
Field notes on inference.
Practical writing on GPU selection, inference engines, and the economics of running dedicated endpoints. Written by the team that runs them.
All posts · Schedule B
2 min read · Inference
Why dedicated inference beats shared APIs at scale
Shared inference APIs are perfect until they are not. Here is where the wall is, why it is there, and what dedicated GPUs fix.
2 min read · Inference
vLLM vs SGLang: picking an inference engine
Both are excellent. They optimize for different shapes of traffic. Here is the short version of when to reach for each.