GLM-5.2 FAST on Fireworks

We loaded it up and it didn't flinch. Real, measured numbers: GSM8K prompts fired at climbing concurrency, plus a 2K → 32K context sweep. No vendor stats, just the stopwatch.

Speed report · GLM-5.2 FAST on Fireworks is the fastest model we've tested

Throughput

769.8

tok/s aggregate

16 concurrent requests — 1.7× the nearest provider

TTFT

0.95s

time to first token

first token in under a second

Decode

1,089

tok/s generation

at 2K context, prefill excluded — 3× the next model

Prefill

31,885

tok/s ingest

ingests a 32K-token prompt in about one second

Throughput vs concurrency

aggregate tok/s — does it keep scaling, or flatten?

Decode speed vs context length

generation tok/s, prefill excluded — 2K → 32K tokens

TTFT

Time to first token. Seconds from send to the first streamed chunk. Mostly prefill cost (ingesting the prompt) before generation starts, plus any network and queueing delay.
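A minimal sketch of how TTFT can be timed on a streaming response. The `fake_stream` generator is a hypothetical stand-in for a real streaming client, and its delays are made up; only the timing pattern is the point.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_stream(prefill_s: float, chunks: int, gap_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API: waits, then yields chunks."""
    time.sleep(prefill_s)          # simulated prompt ingestion
    for i in range(chunks):
        if i:
            time.sleep(gap_s)      # simulated per-chunk decode time
        yield f"tok{i}"


def time_stream(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_s, total_latency_s, chunk_count) for one request."""
    start = time.perf_counter()    # the "send" moment
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            ttft = time.perf_counter() - start   # first chunk arrived
        count += 1
    total = time.perf_counter() - start
    if ttft is None:
        raise ValueError("stream produced no chunks")
    return ttft, total, count


ttft_s, latency_s, n_chunks = time_stream(fake_stream(0.05, 5, 0.01))
```

Using a monotonic clock such as `time.perf_counter` keeps the measurement immune to wall-clock adjustments during a run.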

Latency p50 / p95

Total wall time per request. p95 is never below p50; a p95 far above p50 points to queueing and contention under load.
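One way to compute these percentiles, sketched with the nearest-rank method. The method choice and the sample latencies are assumptions for illustration, not the values or procedure behind this report.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))   # 1-based rank
    return ordered[rank - 1]


# Made-up per-request latencies in seconds, with one slow straggler.
latencies = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 6.0]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
tail_ratio = p95 / p50   # a large ratio suggests some requests waited in a queue
```

With only a handful of requests per batch, p95 is effectively the slowest request, so small batches should be read with care.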

Aggregate tok/s

Total output tokens ÷ batch wall time. System throughput under load: it climbs until the server saturates, then flattens.
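The formula above, sketched under one assumption: batch wall time runs from the earliest send to the latest finish in the batch. `RequestResult` is a hypothetical record type and the numbers are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestResult:
    start_s: float        # send time, in seconds on a shared clock
    end_s: float          # time the last chunk arrived
    output_tokens: int


def aggregate_tok_s(results: Sequence[RequestResult]) -> float:
    """Total output tokens divided by the wall time of the whole batch."""
    if not results:
        raise ValueError("empty batch")
    wall = max(r.end_s for r in results) - min(r.start_s for r in results)
    if wall <= 0:
        raise ValueError("non-positive batch wall time")
    return sum(r.output_tokens for r in results) / wall


# Two concurrent requests: 1000 output tokens over a 5.0 s batch.
batch = [RequestResult(0.0, 4.0, 400), RequestResult(0.0, 5.0, 600)]
rate = aggregate_tok_s(batch)
```

Because the slowest request sets the wall time, one straggler drags down the aggregate figure for the whole batch.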

Decode tok/s

Output tokens ÷ (latency − TTFT), so prefill is excluded. Pure generation speed: a flat curve means each request decodes at the same rate regardless of load.
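As a small worked sketch of that formula (the function name and the example numbers are illustrative, not measurements from this report):

```python
def decode_tok_s(output_tokens: int, latency_s: float, ttft_s: float) -> float:
    """Output tokens divided by the time spent generating (latency minus TTFT)."""
    decode_time = latency_s - ttft_s
    if decode_time <= 0:
        raise ValueError("latency must exceed TTFT")
    return output_tokens / decode_time


# 500 tokens, 1.5 s total, 0.5 s of it before the first token: 500 / 1.0
example = decode_tok_s(output_tokens=500, latency_s=1.5, ttft_s=0.5)
```

Subtracting TTFT matters most on short outputs, where prefill would otherwise dominate the total and understate generation speed.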

Prefill tok/s

Prompt tokens ÷ TTFT (the prefill time). How fast the server ingests context: a roughly flat curve means a steady ingest rate; a falling one signals a throughput ceiling.
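A sketch of the ingest-rate arithmetic, assuming the divisor is the time until the first token and that "32K" means 32,768 tokens. Both are assumptions; the function names are illustrative.

```python
def prefill_tok_s(prompt_tokens: int, ttft_s: float) -> float:
    """Prompt tokens divided by the time spent before the first token."""
    if ttft_s <= 0:
        raise ValueError("TTFT must be positive")
    return prompt_tokens / ttft_s


def ingest_seconds(prompt_tokens: int, rate_tok_s: float) -> float:
    """Inverse view: how long a prompt takes to ingest at a given rate."""
    if rate_tok_s <= 0:
        raise ValueError("rate must be positive")
    return prompt_tokens / rate_tok_s


# At the reported 31,885 tok/s, a 32,768-token prompt takes about one second.
seconds_for_32k = ingest_seconds(32_768, 31_885.0)

# Made-up example: a 2,000-token prompt with a 0.1 s TTFT.
example_rate = prefill_tok_s(2_000, 0.1)
```

Since TTFT also carries network and queueing delay, a rate computed this way is a lower bound on the server's raw ingest speed.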