Velo

LLM inference gateway · benchmark results

Requests / sec: +130% vs cold at 60% prompt reuse
p50 latency: −84% vs cold
TTFT p50: −88% vs cold
Cache hit rate: roughly 66%, given the roughly 34% miss rate (vs 0% on cold)

Latency distribution

The median rides the cache fast path (around 10 ms), but the roughly 34% of requests that miss still pay full backend latency, so p95 and p99 stay high. Caching improves the median, not the tail.
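The effect can be reproduced with a small two-point latency mixture. The sketch below is illustrative only: the 10 ms hit latency and 34% miss rate come from the text above, while the 800 ms backend latency, the sample count, and the helper names `mixture` and `percentile` are assumptions made up for the example, not values or code from the benchmark.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100).
// It returns 0 for an empty input.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// mixture builds n latencies in milliseconds: a missRate fraction pays
// backendMs, and the remainder is served from the cache at hitMs.
func mixture(n int, missRate, hitMs, backendMs float64) []float64 {
	misses := int(math.Round(missRate * float64(n)))
	out := make([]float64, 0, n)
	for i := 0; i < n; i++ {
		if i < misses {
			out = append(out, backendMs)
		} else {
			out = append(out, hitMs)
		}
	}
	return out
}

func main() {
	const backendMs = 800 // hypothetical backend latency, not a measured value
	lat := mixture(1000, 0.34, 10, backendMs)
	for _, p := range []float64{50, 95, 99} {
		fmt.Printf("p%.0f = %.0f ms\n", p, percentile(lat, p))
	}
}
```

With a miss rate above 5%, both p95 and p99 land on the miss latency no matter how fast the hit path is, while p50 stays on the hit path as long as the miss rate is below 50%.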

Go · pgvector · Redis · 16 workers × 30s · threshold 0.92 · 60% prompt reuse · 0 errors
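One plausible reading of the 0.92 threshold is a minimum cosine similarity between a new prompt's embedding and a cached one before the cached response is reused. The page does not confirm this, so treat the sketch below as an assumption: `cosine`, `isCacheHit`, and the toy three-dimensional vectors are hypothetical and are not taken from Velo's code.

```go
package main

import (
	"fmt"
	"math"
)

// similarityThreshold mirrors the 0.92 in the run configuration.
// Interpreting it as a cosine-similarity cutoff is an assumption.
const similarityThreshold = 0.92

// cosine returns the cosine similarity of a and b, or 0 when the
// vectors are empty, differ in length, or have zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// isCacheHit reports whether the query embedding is close enough to a
// cached embedding to reuse its response.
func isCacheHit(query, cached []float64) bool {
	return cosine(query, cached) >= similarityThreshold
}

func main() {
	cached := []float64{1, 0, 0}
	near := []float64{0.95, 0.2, 0.1} // cosine is about 0.97
	far := []float64{0.5, 0.8, 0.3}   // cosine is about 0.51
	fmt.Println(isCacheHit(near, cached), isCacheHit(far, cached)) // prints "true false"
}
```

Raising the cutoff trades hit rate for answer fidelity: fewer prompts qualify for the fast path, but those that do are closer matches to what was cached.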