[Figure: LLM inference gateway benchmark results. Y-axis: requests/sec. Annotation: +130% vs. cold at 60% prompt reuse.]
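The median request takes the cache fast path (around 10 ms), but the roughly 34% of requests that miss still pay full backend latency. Hits make up more than half of traffic, so p50 lands on a hit. Misses make up more than 5% of traffic (and therefore more than 1%), so p95 and p99 both land on misses and stay high. Caching improves the median, not the tail.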
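The percentile argument above can be checked with a small sketch. It assumes a simplified two-point latency model. Every hit costs about 10 ms and 34% of requests miss; both numbers come from the text. Every miss costs a hypothetical 800 ms of backend latency, which is not a measured figure. The function names are illustrative, and nearest-rank is only one of several percentile definitions.

```python
import math


def percentile(sorted_ms, p):
    """Nearest-rank percentile of an ascending list of latencies in ms."""
    # p * n is an exact integer for integer p, so dividing by 100 stays exact.
    rank = math.ceil(p * len(sorted_ms) / 100)  # 1-based nearest rank
    return sorted_ms[max(rank, 1) - 1]


def mixture_latencies(n, miss_rate, hit_ms, miss_ms):
    """Deterministic sample where a fixed fraction of requests miss the cache."""
    misses = round(n * miss_rate)
    return sorted([hit_ms] * (n - misses) + [miss_ms] * misses)


# 10 ms hit path and 34% miss rate from the text; 800 ms miss cost is hypothetical.
lat = mixture_latencies(n=1000, miss_rate=0.34, hit_ms=10.0, miss_ms=800.0)
for p in (50, 95, 99):
    # p50 falls inside the 66% of hits; p95 and p99 fall inside the 34% of misses.
    print(f"p{p}: {percentile(lat, p):.0f} ms")
```

With these assumptions the sketch prints 10 ms for p50 and 800 ms for both p95 and p99. If the miss rate drops to 0.02, p95 moves onto the hit path (10 ms) while p99 stays at 800 ms. A given tail percentile only improves once the miss rate falls below the fraction of traffic above it.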