GLM-5.2 FAST on Fireworks
We loaded it up and it didn't flinch. These are real, measured numbers: GSM8K prompts fired at climbing concurrency levels, plus a 2K → 32K context-length sweep. No vendor stats, just the stopwatch.
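A load test of this shape can be sketched with `asyncio`: fire a batch of N requests at once, repeat for increasing N, and keep per-request timings for the metrics defined further down. This is a minimal sketch, not the harness behind these numbers. `fake_request` is a hypothetical stand-in that sleeps instead of calling Fireworks, the prompts are placeholder GSM8K-style questions, and the concurrency levels are illustrative.

```python
import asyncio
import time


async def fake_request(prompt: str) -> dict:
    """Hypothetical stand-in for one streamed completion call.

    A real harness would send `prompt` to the provider's API here;
    this stub only sleeps so the sketch runs offline.
    """
    start = time.perf_counter()
    await asyncio.sleep(0.01)  # pretend prefill + decode take ~10 ms
    end = time.perf_counter()
    return {"start": start, "end": end, "output_tokens": 32}


async def run_batch(prompts: list[str]) -> list[dict]:
    # Launch every request in the batch at once and wait for all of them.
    return await asyncio.gather(*(fake_request(p) for p in prompts))


async def sweep(questions: list[str], levels: list[int]) -> dict[int, list[dict]]:
    results = {}
    for n in levels:
        # Reuse the question pool cyclically so any concurrency level can be filled.
        batch = [questions[i % len(questions)] for i in range(n)]
        results[n] = await run_batch(batch)
    return results


if __name__ == "__main__":
    qs = ["What is 17 * 23?", "A train covers 60 km in 1.5 h. What is its speed?"]
    out = asyncio.run(sweep(qs, [1, 2, 4, 8, 16]))
    for n, records in out.items():
        print(n, len(records))
```

Running levels one after another keeps each batch's wall time clean; overlapping levels would blur where saturation begins.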
Speed report · Fireworks FAST is the fastest model we've tested
Throughput
769.8 tok/s aggregate at 16 concurrent requests, 1.7× the nearest provider.
TTFT
0.95 s time to first token: prefill finishes in under a second.
Decode
1,089 tok/s generation at 2K context with prefill excluded, 3× the next model.
Prefill
31,885 tok/s ingest, enough to take in a 32K-token prompt in about a second.
[Chart: Throughput vs concurrency. Aggregate tok/s: does it keep scaling, or flatten?]
[Chart: Decode speed vs context length. Generation tok/s, prefill excluded, 2K → 32K tokens.]
TTFT
Time to first token: seconds from sending the request to receiving the first chunk. It is mostly prefill cost (ingesting the prompt) before generation starts, plus network and queueing time.
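As a rough sketch, TTFT and total latency can both be taken from one pass over a streamed response: start the clock at send, stamp the first chunk, stop at the last. The function assumes the stream is any iterable that yields pieces as they arrive; `fake_stream` is a hypothetical stand-in with an artificially long first wait to mimic prefill. It counts chunks, which are not always one token each.

```python
import time
from typing import Iterable, Iterator


def time_stream(chunks: Iterable[str]) -> dict:
    """Time a streamed response: TTFT, total latency, and chunk count.

    The clock starts when this function is called, so call it as the
    request is sent.
    """
    t_send = time.perf_counter()
    ttft = None
    count = 0
    for _ in chunks:
        if ttft is None:
            # First chunk arrived: the wait so far was prefill plus network/queueing.
            ttft = time.perf_counter() - t_send
        count += 1
    latency = time.perf_counter() - t_send  # total wall time for this request
    return {"ttft": ttft, "latency": latency, "chunks": count}


def fake_stream(n: int, delay: float) -> Iterator[str]:
    # Hypothetical stream: a longer first wait mimics prefill, then steady decode.
    time.sleep(delay * 5)
    for i in range(n):
        if i:
            time.sleep(delay)
        yield f"tok{i}"


result = time_stream(fake_stream(5, 0.002))
print(result)
```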
Latency p50 / p95
Total wall time per request. A p95 that pulls far ahead of p50 signals queueing and contention under load.
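A small sketch of how p50 and p95 might be computed from per-request latencies, using the nearest-rank definition (an assumption; the report does not say which percentile method it uses). The latency values are made up for illustration, not measurements.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative latencies in seconds: one request got stuck behind others.
latencies = [2.1, 2.3, 2.2, 2.4, 2.2, 2.3, 2.5, 2.2, 6.8, 2.3]
p50 = percentile(latencies, 50)  # 2.3
p95 = percentile(latencies, 95)  # 6.8
print(p50, p95)
```

One slow request barely moves p50 but drags p95 out to 6.8 s, which is the gap the metric is meant to expose. Nearest-rank is only one convention; NumPy's `percentile` interpolates by default and can return different values on small samples.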
Aggregate tok/s
Total output tokens ÷ batch wall time. This is system throughput under load: it climbs until the server saturates, then flattens.
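A minimal sketch of the aggregate metric, assuming "batch wall time" runs from the first request sent to the last one finished. `RequestRecord` is a hypothetical record type and the numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    start: float        # seconds, when the request was sent
    end: float          # seconds, when the last chunk arrived
    output_tokens: int


def aggregate_tps(batch: list[RequestRecord]) -> float:
    """Total output tokens divided by the batch's wall time.

    Overlapping requests share the same seconds, so concurrency raises
    this number until the server saturates.
    """
    wall = max(r.end for r in batch) - min(r.start for r in batch)
    return sum(r.output_tokens for r in batch) / wall


solo = [RequestRecord(0.0, 2.0, 200)]
four = [RequestRecord(0.0, 2.0, 200) for _ in range(4)]
print(aggregate_tps(solo))  # 100.0
print(aggregate_tps(four))  # 400.0
```

Four overlapping requests that finish in the same 2 s quadruple aggregate throughput even though no single request got faster. Past saturation, extra requests stretch the wall time instead, and the curve flattens.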
Decode tok/s
Output tokens ÷ (latency − TTFT), prefill excluded. This is pure generation speed: a flat line means each request decodes at the same rate regardless of load.
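The decode formula as a function, with a guard for the degenerate case where no time is left after the first token. The request values are illustrative only.

```python
def decode_tps(output_tokens: int, latency: float, ttft: float) -> float:
    """Output tokens / (latency - TTFT): generation speed with prefill removed."""
    decode_time = latency - ttft
    if decode_time <= 0:
        raise ValueError("latency must exceed TTFT to measure decode speed")
    return output_tokens / decode_time


# Illustrative request: 1,000 output tokens, 2.5 s total, 0.5 s to first token.
print(decode_tps(1000, 2.5, 0.5))  # 500.0 tok/s
print(1000 / 2.5)                  # 400.0 tok/s if prefill time were left in
```

Leaving prefill in would understate generation speed, and the error grows with context length because TTFT grows with the prompt.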
Prefill tok/s
Prompt tokens ÷ latency. This shows how fast the server ingests context: a roughly flat line means a steady ingest rate, while a dropping one means a throughput ceiling.
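A sketch that applies the prefill formula exactly as written (prompt tokens ÷ latency) and labels a context sweep as flat or dropping. The `drop_ratio` threshold of 0.8 is a hypothetical choice, not something the report specifies, and the sweep values are made up.

```python
def prefill_tps(prompt_tokens: int, latency: float) -> float:
    """Prompt tokens / latency, following the glossary definition literally."""
    return prompt_tokens / latency


def ingest_trend(sweep: dict[int, float], drop_ratio: float = 0.8) -> str:
    """Label a sweep {prompt_tokens: prefill tok/s} as 'flat' or 'dropping'.

    Hypothetical rule: 'dropping' if the rate at the largest context falls
    below drop_ratio times the rate at the smallest context.
    """
    ordered = [sweep[k] for k in sorted(sweep)]
    return "dropping" if ordered[-1] < drop_ratio * ordered[0] else "flat"


print(prefill_tps(32000, 2.0))  # 16000.0
print(ingest_trend({2000: 30000.0, 8000: 29500.0, 32000: 28000.0}))  # flat
print(ingest_trend({2000: 30000.0, 32000: 12000.0}))                 # dropping
```

If a harness divides by TTFT instead, which isolates prompt ingestion from decode time, the same function works with TTFT passed as the second argument.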