GLM-5.2 FAST on Fireworks
Real, measured numbers — GSM8K at climbing concurrency and a 2K → 32K context sweep. No vendor stats, just the stopwatch.
Speed report · Fireworks FAST is the fastest model we've tested
Throughput
769.8
tok/s aggregate
total output ÷ batch wall time
Combined tokens/sec across 16 parallel requests — the system's total output under load. 1.7× the nearest provider.
TTFT
0.95s
time to first token
send → first token
Wait from sending the prompt to the first token coming back — the prefill cost before any text appears.
Decode
1,089
tok/s generation
output ÷ (latency − TTFT)
How fast a single request streams tokens once generation starts (prefill excluded, 2K context). 3× the next model.
Prefill
31,885
tok/s ingest
prompt tokens ÷ latency
How fast it reads the prompt — a full 32K-token context ingested in about a second.
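The formulas on the four cards above can be sketched in a few lines of Python. This is a minimal illustration, not the harness that produced the numbers: the `RequestTiming` record and the function names are hypothetical, and the values in the usage note are made up to show the arithmetic. The Prefill card does not say which latency it divides by, so `prefill_speed` takes the denominator as an explicit argument.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timing for one streamed request (hypothetical record layout)."""
    output_tokens: int
    ttft_s: float     # send -> first token
    latency_s: float  # send -> last token


def aggregate_throughput(batch, batch_wall_s):
    """Throughput card: total output tokens / batch wall time."""
    return sum(r.output_tokens for r in batch) / batch_wall_s


def decode_speed(r):
    """Decode card: output tokens / (latency - TTFT), prefill excluded."""
    return r.output_tokens / (r.latency_s - r.ttft_s)


def prefill_speed(prompt_tokens, latency_s):
    """Prefill card: prompt tokens / latency (caller chooses the latency)."""
    return prompt_tokens / latency_s


# Illustrative values only, not measurements from this report.
batch = [RequestTiming(output_tokens=500, ttft_s=0.5, latency_s=1.5)
         for _ in range(16)]
print(aggregate_throughput(batch, batch_wall_s=10.0))  # 16 * 500 / 10
print(decode_speed(batch[0]))                          # 500 / (1.5 - 0.5)
print(prefill_speed(32_000, 1.0))                      # 32,000 / 1.0
```

As a check against the Prefill card, 32,000 tokens at 31,885 tok/s works out to roughly 1.0 s, which matches "about a second".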
Throughput vs concurrency
aggregate tok/s: does it keep scaling, or flatten?
total output ÷ batch wall time
System throughput under load: climbs until the server saturates, then flattens.
Why it matters: the steeper this climbs, the more users you can serve at once without each request slowing down.
Note: this assumes constant available capacity. In practice the endpoint shares capacity with other users and may be autoscaling, so the curve can shift between runs.
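One way to read "climbs, then flattens" off a concurrency sweep is to look at the relative gain between consecutive levels. The sketch below is an assumption about how you might do that, not part of this report's method: the `saturation_point` helper, the 10% threshold, and the sample points are all hypothetical.

```python
def saturation_point(points, min_gain=0.10):
    """Return the last concurrency level that still scaled.

    points: iterable of (concurrency, aggregate_tok_per_s) pairs.
    A step counts as flat when throughput grows by less than min_gain
    (as a fraction) over the previous level. Returns None if every
    step still gains at least min_gain.
    """
    ordered = sorted(points)
    for (c_prev, t_prev), (_, t_next) in zip(ordered, ordered[1:]):
        if (t_next - t_prev) / t_prev < min_gain:
            return c_prev
    return None


# Made-up sweep for illustration; not data from this report.
sweep = [(1, 100.0), (2, 195.0), (4, 380.0), (8, 700.0), (16, 730.0)]
print(saturation_point(sweep))  # the 8 -> 16 step gains only about 4%
```

Because shared capacity and autoscaling can move the curve, a single sweep only gives a rough estimate of where saturation starts.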
Decode speed vs context length
generation tok/s, prefill excluded, 2K → 32K tokens
output ÷ (latency − TTFT)
Pure generation speed: flat means each request decodes at the same rate regardless of context length.
Why it matters: the flatter this stays, the less your long prompts slow generation down — big context stays fast.