5 Performance metrics every system architect should know

Justin San Juan
4 min read · Oct 28, 2022


In this article, I describe a few (non-exhaustive) performance metrics every system architect should know.

The goal of system architects is to design and oversee the development of IT infrastructure that supports business goals

Firstly, we need to understand what a system architect does:

A system architect is in charge of devising, configuring, operating, and maintaining both computer and networking systems. They objectively analyze desired processes and outcomes and advise on the right combination of IT systems and components to achieve specific business, department, team, or functional goals (Shiff, 2022).

Given this, every system architect must fully understand the different IT services being supported, how they interact with each other, and the performance requirements of each service individually and of the system as a whole.

There are 5 important metrics useful for measuring service performance

Figure 1. A simple request, process, response sequence between two services.

1. Latency (first-byte)

The first important metric is the first-byte latency. This is the time it takes for the smallest input (typically a single byte) to be processed, from the start of the request to the end of the response. This is the figure usually quoted in hardware or system specifications (e.g. disk latency, memory access latency, etc.).

2. Latency (end-to-end)

The end-to-end latency, similar to first-byte latency, is the time it takes from the start to the finish of the transaction. The difference between the two arises from processing times that depend on the size of the input. For example, multiplying two 1000x1000 matrices takes longer than multiplying two 2x2 matrices.

This is the performance metric that system architects need to be aware of for every service. When gathering numbers, it is especially important to keep this relationship between latency and input size in mind. For example, comparing the end-to-end latency of processing a small input with that of processing a large input is like comparing apples to oranges.
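To make the dependence on input size concrete, here is a minimal sketch that times a naive pure-Python matrix multiplication for a few sizes. The function names and the chosen sizes are my own assumptions for illustration, and the absolute numbers will differ from machine to machine.

```python
import time


def matmul(a, b):
    """Multiply two square matrices given as lists of lists (naive O(n^3))."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def end_to_end_latency(n, repeats=3):
    """Best-of-`repeats` wall-clock seconds to multiply two n x n matrices."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical input sizes; larger inputs should show higher latency.
for n in (2, 20, 80):
    print(f"{n}x{n}: {end_to_end_latency(n) * 1e3:.3f} ms")
```

Latency figures like these are only comparable when they are reported together with the input size they were measured on.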

3. Throughput

Throughput is the number of tasks that a service completes within a given time range. For example, services are often measured in terms of requests per second (rps).

4. Bandwidth

Bandwidth is the maximum rated capacity of a service. A typical example of this is network bandwidth, which is what is advertised by the Internet Service Provider (ISP) based on the specifications of hardware used.

For a service, the bandwidth is essentially the maximum number of requests it can process per second. In contrast, throughput is the actual number of requests per second that the system achieves, which is equal to or less than the bandwidth.
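As a rough sketch of the distinction, throughput can be computed from observed counts, and then compared against the rated bandwidth. The request count, time window, and bandwidth figure below are made-up values for illustration only.

```python
def throughput_rps(completed_requests, window_seconds):
    """Observed throughput: completed requests divided by the measurement window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return completed_requests / window_seconds


def utilization(throughput, bandwidth):
    """Fraction of the rated capacity (bandwidth) that is actually being used."""
    return throughput / bandwidth


# Hypothetical measurement: 9,000 requests completed in a 60-second window,
# on a service assumed to be rated at 200 requests per second.
observed = throughput_rps(completed_requests=9_000, window_seconds=60)
print(observed)                    # 150.0
print(utilization(observed, 200))  # 0.75
```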

5. Concurrency

Finally, concurrency is the number of requests that a service can process at the same time. Note that, unlike throughput, this is not measured with a time duration as the denominator (e.g. 100 concurrent requests vs. 1000 requests per second).

The maximum concurrency of a system and the average latency of requests define the bandwidth of the system:

100 concurrent requests / 0.5 s average latency per request = bandwidth of 200 requests per second
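This relationship is a form of Little's law: bandwidth is the maximum concurrency divided by the average latency. A minimal sketch of the calculation, with a function name of my own choosing:

```python
def bandwidth_rps(max_concurrency, avg_latency_seconds):
    """Maximum requests per second = concurrent requests / average latency (s)."""
    if avg_latency_seconds <= 0:
        raise ValueError("avg_latency_seconds must be positive")
    return max_concurrency / avg_latency_seconds


# 100 concurrent requests, each taking 500 ms on average.
print(bandwidth_rps(max_concurrency=100, avg_latency_seconds=0.5))  # 200.0
```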

System architects need to identify the bottleneck in order to find opportunities for improvement

To increase the performance of a system, different strategies could be used. For example, decreasing the end-to-end latency of requests by minimizing the processing time can increase the bandwidth of the service. Similarly, horizontally scaling the service by adding more worker nodes can increase the concurrency and thus can also increase the bandwidth of the service.
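The two strategies can be compared with a small calculation. This is only a sketch: it assumes that concurrency scales linearly with the number of worker nodes and that latency is unaffected by scaling, which real systems may not satisfy. The node counts and latencies are hypothetical.

```python
def bandwidth_rps(nodes, concurrency_per_node, avg_latency_seconds):
    """Bandwidth under the simplifying assumption of perfectly linear scaling."""
    return nodes * concurrency_per_node / avg_latency_seconds


# Hypothetical baseline: 4 nodes, 25 concurrent requests each, 500 ms latency.
baseline = bandwidth_rps(nodes=4, concurrency_per_node=25, avg_latency_seconds=0.5)

# Strategy 1: halve the end-to-end latency by reducing processing time.
faster = bandwidth_rps(nodes=4, concurrency_per_node=25, avg_latency_seconds=0.25)

# Strategy 2: scale horizontally by doubling the number of worker nodes.
scaled = bandwidth_rps(nodes=8, concurrency_per_node=25, avg_latency_seconds=0.5)

print(baseline, faster, scaled)  # 200.0 400.0 400.0
```

Under these assumptions both strategies double the bandwidth, so the choice between them comes down to cost and to which resource is actually the bottleneck.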

To identify such opportunities, system architects can think like chemists

An analogy to Chemistry is that in a chemical reaction, with material inputs converted to outputs, the expected amount of output can be determined by finding the “limiting reagent”.

In IT services, the bottleneck can be the memory space of a system. For example, a worker node may only be able to host 4 instances of a service because each instance requires a quarter of the worker node’s memory capacity.

Since memory is the limiting reagent, the compute power of the worker node is likely to be underutilized. To tackle this, decreasing the memory footprint of the instances, or even swapping instances in and out of memory through paging, could be techniques to increase the bandwidth of the service.
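The limiting-reagent idea can be sketched as a small calculation: for each resource, work out how many instances would fit, and the smallest of those numbers identifies the bottleneck. The resource names and capacities below are hypothetical values chosen to match the example of 4 instances per node.

```python
def limiting_resource(node_capacity, per_instance_demand):
    """Return (resource_name, max_instances) for the most constraining resource."""
    fits = {
        resource: node_capacity[resource] // demand
        for resource, demand in per_instance_demand.items()
    }
    bottleneck = min(fits, key=fits.get)
    return bottleneck, fits[bottleneck]


# Hypothetical worker node and service instance requirements.
node = {"memory_gb": 16, "cpu_cores": 16}
instance = {"memory_gb": 4, "cpu_cores": 1}

resource, count = limiting_resource(node, instance)
print(resource, count)  # memory_gb 4

# With only 4 instances, most of the CPU capacity stays idle.
cpu_utilization = count * instance["cpu_cores"] / node["cpu_cores"]
print(cpu_utilization)  # 0.25
```

In this sketch, halving the memory needed per instance would double the number of instances that fit, while adding CPU cores would change nothing.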


In this article, I have described 5 important performance metrics that system architects should be aware of. These metrics are first-byte latency, end-to-end latency, throughput, bandwidth, and concurrency. To identify ways to improve a service and meet desired performance levels, the system architect must identify the bottleneck in the system. Knowing the bottleneck will point to solutions that will help, and rule out solutions that will not.



