
Throughput

[/ˈθruːpaʊt/]

noun · Technology · #performance #scale #capacity #metrics

Definitions


The amount of work a system completes in a given time period. For APIs it is usually measured in requests per second; for AI inference, in tokens per second. Throughput and latency are related but distinct: a system that processes many requests in parallel or in batches can have high throughput even though each individual request still has high latency.

The inference cluster sustained a throughput of 10,000 tokens per second across all concurrent users.
by @cloudarch, 1/1/1970
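
To make the distinction between throughput and latency concrete, here is a minimal Python sketch. The two servers and all of their numbers are hypothetical, chosen only for illustration: one handles a single request at a time, the other processes requests in batches of 64.

```python
def throughput(units_completed: int, elapsed_seconds: float) -> float:
    """Return work completed per second (requests/s, tokens/s, and so on)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return units_completed / elapsed_seconds


# Hypothetical server A: handles one request at a time, 0.1 s per request.
serial_latency_s = 0.1
serial_throughput = throughput(1, serial_latency_s)

# Hypothetical server B: handles 64 requests per batch, 2.0 s per batch.
# Every request in the batch waits for the whole batch to finish.
batched_latency_s = 2.0
batched_throughput = throughput(64, batched_latency_s)

print(f"serial:  {serial_throughput:.0f} req/s, {serial_latency_s} s per request")
print(f"batched: {batched_throughput:.0f} req/s, {batched_latency_s} s per request")
```

Under these assumed numbers, the batched server completes more requests per second, yet each individual request waits far longer for its answer, which is the high-throughput, high-latency case the definition describes.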
