GLM-5.2 FAST on Fireworks

Real, measured numbers — GSM8K at climbing concurrency and a 2K → 32K context sweep. No vendor stats, just the stopwatch.

Speed report · Fireworks FAST is the fastest model we've tested

Throughput

769.8

tok/s aggregate

total output ÷ batch wall time

Combined tokens/sec across 16 parallel requests — the system's total output under load. 1.7× the nearest provider.
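As a minimal sketch of the formula on this card (total output ÷ batch wall time), assuming each request records its send time, finish time, and output token count. The `RequestResult` type and the sample numbers are hypothetical and are not the measured data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestResult:
    start: float        # seconds, when the request was sent
    end: float          # seconds, when the last token arrived
    output_tokens: int  # tokens generated by this request


def aggregate_throughput(results: List[RequestResult]) -> float:
    """Total output tokens divided by batch wall time.

    Wall time runs from the earliest send to the latest finish, so
    parallel requests share one clock instead of summing latencies.
    """
    if not results:
        raise ValueError("no results")
    wall = max(r.end for r in results) - min(r.start for r in results)
    if wall <= 0:
        raise ValueError("non-positive wall time")
    return sum(r.output_tokens for r in results) / wall


# Illustrative values only: two parallel requests, 500 tokens in 5 s.
batch = [RequestResult(0.0, 4.0, 200), RequestResult(0.0, 5.0, 300)]
print(aggregate_throughput(batch))  # 100.0
```

Dividing by wall time rather than by summed latencies is what makes this a system-level number: 16 requests that overlap completely count their tokens 16 times but the clock only once.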

TTFT

0.95s

time to first token

send → first token

Wait from sending the prompt to the first token coming back — the prefill cost before any text appears.
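One way to take this measurement, sketched under the assumption that the client exposes the response as a lazy iterator of text chunks. The `time_stream` helper and its injectable `clock` parameter are hypothetical names for illustration, not part of any provider SDK.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_stream(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Consume a streamed response and return (ttft, latency, chunks).

    The clock starts just before iteration begins, which is assumed to
    be when the request goes out. ttft is None if nothing was streamed.
    """
    start = clock()
    ttft = None
    chunks = 0
    for _ in stream:
        if ttft is None:
            ttft = clock() - start  # first chunk back: prefill is done
        chunks += 1
    return ttft, clock() - start, chunks
```

Note that this counts streamed chunks, not tokens. A chunk can carry more than one token, so token counts for rate calculations are better taken from the usage figures the API reports, where available.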

Decode

1,089

tok/s generation

output ÷ (latency − TTFT)

How fast a single request streams tokens once generation starts (prefill excluded, 2K context). 3× the next model.
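The decode formula above, output ÷ (latency − TTFT), written out as a small function. This is a sketch with made-up inputs; the function name is an assumption.

```python
def decode_rate(output_tokens: int, latency_s: float, ttft_s: float) -> float:
    """Generation speed with prefill excluded: output / (latency - TTFT)."""
    generation_s = latency_s - ttft_s
    if generation_s <= 0:
        raise ValueError("latency must exceed TTFT")
    return output_tokens / generation_s


# Illustrative values only: 1,000 tokens, 3.0 s total, 1.0 s to first token.
print(decode_rate(1000, 3.0, 1.0))  # 500.0
```

Subtracting TTFT matters most on short outputs, where the wait for the first token can be a large share of total latency and would otherwise drag the apparent generation speed down.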

Prefill

31,885

tok/s ingest

prompt tokens ÷ latency

How fast it reads the prompt — a full 32K-token context ingested in about a second.
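A quick arithmetic check of the "about a second" claim, using the 31,885 tok/s figure from this card. The sketch assumes "32K" means 32,768 tokens; with 32,000 tokens the result is still roughly one second.

```python
def ingest_time(prompt_tokens: int, prefill_tok_s: float) -> float:
    """Seconds needed to read a prompt at a given prefill rate."""
    if prefill_tok_s <= 0:
        raise ValueError("prefill rate must be positive")
    return prompt_tokens / prefill_tok_s


# Assumption: 32K = 32,768 tokens. The rate is the figure on this card.
print(round(ingest_time(32_768, 31_885), 2))  # 1.03
```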

Throughput vs concurrency

aggregate tok/s — does it keep scaling, or flatten?

total output ÷ batch wall time

System throughput under load: climbs until the server saturates, then flattens.

Why it matters: the steeper this climbs, the more users you can serve at once without each request slowing down.

Note: this assumes constant available capacity. In practice the endpoint is shared with other users and may be autoscaling, so the curve can shift between runs.
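One way to put a number on "keeps scaling or flattens" is scaling efficiency relative to the lowest concurrency level tested. This is a sketch of one possible metric, not the method used for the chart; the function name and the sweep values are hypothetical.

```python
from typing import List, Tuple


def scaling_efficiency(points: List[Tuple[int, float]]) -> List[float]:
    """Throughput gain divided by concurrency gain, per sweep point.

    points: (concurrency, aggregate tok/s) pairs sorted by concurrency.
    1.0 means perfectly linear scaling from the first point; values
    falling toward 0 mean the curve has flattened.
    """
    if not points:
        raise ValueError("no points")
    c0, t0 = points[0]
    if c0 <= 0 or t0 <= 0:
        raise ValueError("baseline must be positive")
    return [(t / t0) / (c / c0) for c, t in points]


# Hypothetical sweep, not the measured data.
sweep = [(1, 100.0), (2, 200.0), (4, 300.0), (8, 320.0)]
print([round(e, 2) for e in scaling_efficiency(sweep)])  # [1.0, 1.0, 0.75, 0.4]
```

In this made-up sweep, efficiency holds at 1.0 up to 2 parallel requests and drops to 0.4 at 8, which is the numeric signature of a curve that has started to flatten.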

generation tok/s, prefill excluded — 2K → 32K tokens

output ÷ (latency − TTFT)

Pure generation speed: flat means each request decodes at the same rate regardless of context length.

Why it matters: the flatter this stays, the less your long prompts slow generation down — big context stays fast.
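Flatness across the context sweep can be summarised as the slowest decode rate divided by the fastest. The sketch below uses hypothetical rates, and the metric itself is an assumption about how one might score the chart, not something the report defines.

```python
from typing import Dict


def decode_flatness(rates_by_context: Dict[int, float]) -> float:
    """Slowest decode rate divided by fastest across context sizes.

    1.0 means context length has no effect on generation speed;
    lower values mean long prompts slow decoding down.
    """
    rates = list(rates_by_context.values())
    if not rates or min(rates) <= 0:
        raise ValueError("need positive decode rates")
    return min(rates) / max(rates)


# Hypothetical decode rates (tok/s) keyed by context size in tokens.
sweep = {2_000: 1000.0, 8_000: 950.0, 32_000: 900.0}
print(decode_flatness(sweep))  # 0.9
```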