This article describes the similarities and differences of scheduling policies in serverless. In particular, I will focus on the dimension of proactive vs. reactive provisioning in serverless systems. I list the different policies by goal and effect, then briefly describe each one.
The paradigm of “serverless” consists of two main parties: the service provider, and the business customer.
The business customer itself includes developers that upload code to the service provider. The business customer also has end-customers that request the usage of the uploaded code via an Application Programming Interface (API). These requests may be triggered from interacting with the business’ website, or other devices used by the business.
The overall objective of serverless is to provide businesses with an economical “pay-as-you-go” option for hosting code, with performance comparable to on-site servers.
Serverless service providers are able to build a business through economies of scale and by sharing their infrastructure among multiple tenants (business customers). For example, a business that purchases a couple of server-grade machines is unlikely to run them at 100% (maximum utilization), which leaves spare capacity that could be shared. Pooling and sharing that capacity is what serverless service providers do.
On the business customer side, this enables businesses to use the compute infrastructure on demand, paying only for what they use rather than paying the up-front capital cost of building and setting up servers and the necessary network infrastructure.
The main problem in serverless is “cold starts”: in order to share the infrastructure, the business customers’ code needs to be shuffled in and out, which means that some requests arrive when no instance of the code is already running and must wait for one to start.
To tackle this problem, serverless service providers rely on scheduling techniques to manage the trade-off between cold start penalties and server utilization.
There are several examples of scheduling policies adopted by current commercial offerings of Serverless:
- AWS Lambda: Bin-Packing. AWS Lambda appears to aim to have the fewest possible number of instances running to serve a load of requests. AWS Lambda uses the amount of memory required by the function, as declared by the developer, to determine if new VMs need to be instantiated to meet demand.
- Google Cloud Functions (GCF): Utilization-based. Google enables the most configurability in its developer-specified scaling policy. However, this is arguably contradictory to the paradigm of serverless, where it should be the role of the service provider to scale instances in and out. GCF allows developers to use CPU utilization, HTTP serving rate, and cloud metrics as signals to scale in and out. GCF also allows developers to set the minimum number of instances to reduce the cold start frequency for a function.
- Microsoft Azure Functions: Queue-Length Based. Azure Functions scale differently depending on the trigger. For example, with an Azure Queue storage trigger, the number of instances is scaled using the queue length and the age of the oldest queue message.
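To make the bin-packing idea concrete, here is a minimal first-fit sketch in Python. It is only an illustration of placing function instances by declared memory, not a description of how AWS Lambda is actually implemented. The `VM` class, the `place_instance` function, and the 4096 MB VM size are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """A hypothetical VM that hosts function instances up to a memory cap."""
    capacity_mb: int
    used_mb: int = 0
    instances: list = field(default_factory=list)

    def fits(self, memory_mb: int) -> bool:
        return self.used_mb + memory_mb <= self.capacity_mb


def place_instance(vms, function_name, memory_mb, vm_capacity_mb=4096):
    """First-fit placement: reuse the first VM with room, else start a new VM.

    Returns the index of the VM that received the instance.
    """
    if memory_mb > vm_capacity_mb:
        raise ValueError("function does not fit on an empty VM")
    for index, vm in enumerate(vms):
        if vm.fits(memory_mb):
            break
    else:
        # No existing VM has enough free memory: provision a new one.
        vm = VM(capacity_mb=vm_capacity_mb)
        vms.append(vm)
        index = len(vms) - 1
    vm.used_mb += memory_mb
    vm.instances.append(function_name)
    return index


vms = []
place_instance(vms, "resize-image", 3000)  # starts VM 0
place_instance(vms, "send-email", 2000)    # does not fit on VM 0, starts VM 1
place_instance(vms, "thumbnail", 1000)     # fits in the space left on VM 0
```

A new VM is only started when the declared memory does not fit anywhere else, which keeps the number of running VMs low at the price of a cold start whenever that happens.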
Other scheduling techniques include:
- Heuristics. Least Connections, Round Robin. These scheduling policies are often mentioned in load balancing. As a note, these do not determine the amount of server resources allocated to a certain piece of code and thus are not provisioning policies.
- More Bin-Packing Scheduling. In contrast to these heuristics, some scheduling policies such as bin-packing can influence the system’s provisioning strategy. For example, the popular open-source OpenWhisk serverless platform relies on keep-alive timeouts to reduce the number of instances when an instance is not used. It uses a hashing scheduler that attempts to schedule requests primarily on the “first” available worker based on an order defined by a hashing function, allowing older instances to become stale and be deallocated.
- More Utilization-based Scaling. One of the challenges with bin-packing scheduling is runtime performance degradation due to sharing VM resources among multiple containers, as discussed in TK1. To address this, TK2 aims to balance runtime performance with resource efficiency by setting a target range of CPU / memory utilization and scaling instances in and out based on that.
- Prediction-based. Ideally, if the number of incoming requests can be predicted, instances can be pre-provisioned as the rate of requests increases. Several works, including (Zhang et al, 2013) and (Zhang et al, 2019), use Auto-Regressive Integrated Moving Average (ARIMA) and LSTM ML models respectively to predict the future request rate and scale in and out correspondingly. In addition, functions in a pipeline (function chain) can be more easily predicted (Daw et al, 2020), (Banaei and Sharifi, 2021). The main limitation of these methods is that they depend on the predictability of the workload and on the amount of historical information available about it. These approaches also incur large computational overhead, because ML inference has to be performed for scheduling actions that occur at thousands of requests per second.
- Logistics-based. Another class of scheduling techniques is based on the study of logistics and queuing theory. For example, (Suresh et al, 2020) uses the square-root staffing policy to pre-provision extra resources based on the volatility of demand and a target service rate (cold start frequency).
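As a rough illustration of the logistics-based idea, the textbook square-root staffing rule from queuing theory provisions `R + beta * sqrt(R)` instances, where `R` is the offered load (arrival rate times mean service time). The sketch below shows that general rule, not necessarily the exact formulation used by (Suresh et al, 2020). The function name and the `beta` parameter, which would be tuned to the target cold start frequency, are assumptions for this example.

```python
import math


def square_root_staffing(arrival_rate, mean_service_time, beta=1.0):
    """Instances to keep warm under the square-root staffing rule.

    arrival_rate: requests per second.
    mean_service_time: seconds a request occupies an instance.
    beta: safety factor (assumed here); larger values mean fewer cold starts
          but more idle instances.
    """
    offered_load = arrival_rate * mean_service_time
    return math.ceil(offered_load + beta * math.sqrt(offered_load))


# 100 requests/s at 1 s each: 100 busy instances on average, plus 10 spare.
print(square_root_staffing(100, 1.0, beta=1.0))  # prints 110
```

The extra capacity grows with the square root of the load rather than linearly, so larger workloads need proportionally less headroom to reach the same service target.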
A provisioning strategy determines when instances of code are “warmed up”.
Now, let’s divide these methods by proactive and reactive provisioning strategies.
Proactive provisioning methods typically aim to increase performance (by reducing cold start frequency) while sacrificing server utilization. This is done by proactively running instances of code, which could remain unused but are prepared to receive new requests.
Serverless service providers that use this policy tend to continue charging by the actual number and duration of requests, thus absorbing the idle resource cost instead of passing the cost of running idle servers to the business customers.
Proactive provisioning policies include:
- Prediction-based policies
- Utilization-based policies
- Logistics-based policies
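A utilization-based policy can be sketched as a simple proportional rule: scale the instance count by the ratio of observed to target utilization. This is a common autoscaling formula and only a sketch under that assumption, not the rule used by any specific provider. The function name, parameters, and default bounds are hypothetical.

```python
import math


def desired_instances(current_instances, observed_utilization,
                      target_utilization, min_instances=0, max_instances=100):
    """Proportional scaling toward a target utilization (values in 0..1)."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    raw = math.ceil(current_instances * observed_utilization / target_utilization)
    # A minimum above zero keeps warm instances around even with no traffic.
    return max(min_instances, min(max_instances, raw))


# 4 instances running at 90% while targeting 50%: scale out to 8.
print(desired_instances(4, 0.9, 0.5))  # prints 8
```

Setting the target below 100% is what makes this proactive: the gap between the target and full utilization is spare capacity that absorbs new requests without cold starts.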
Reactive provisioning methods aim instead to minimize resource costs by lazily instantiating new instances of the requested code when the current allocation of servers does not meet an increasing load of incoming requests.
By doing this, serverless service providers do not incur additional idle server costs, but sacrifice performance by having an increased frequency of cold starts. This may be unattractive for business customers that have stringent latency requirements.
Reactive provisioning policies include:
- Bin-packing policies
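The hashing scheduler with keep-alive timeouts described earlier for OpenWhisk can be sketched roughly as follows. This is a simplified illustration of the idea, not OpenWhisk's actual code: the `Worker` class, the function names, the slot model, and the 600-second timeout are all assumptions made for the example.

```python
import hashlib


class Worker:
    """A hypothetical worker with a fixed number of instance slots."""

    def __init__(self, slots):
        self.slots = slots
        self.busy = 0


def home_index(function_name, num_workers):
    """Stable starting worker for a function, derived from a hash of its name."""
    digest = hashlib.sha256(function_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def schedule(function_name, workers):
    """Probe workers in hash order; use the first one with a free slot.

    Returns the chosen worker index, or None if every worker is full.
    """
    count = len(workers)
    start = home_index(function_name, count)
    for offset in range(count):
        index = (start + offset) % count
        if workers[index].busy < workers[index].slots:
            workers[index].busy += 1
            return index
    return None


def expire_idle(last_used, now, keep_alive_s=600.0):
    """Return ids of instances idle for longer than the keep-alive timeout."""
    return [iid for iid, used_at in last_used.items() if now - used_at > keep_alive_s]
```

Because requests for a function always try the same workers first, instances further along the probe order see no traffic once load drops, exceed the keep-alive timeout, and can be deallocated.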
How can these problems be solved?
There are different research directions to tackle this trade-off between resource efficiency and performance, which include:
- Reducing the cost of cold starts. By reducing the cost of cold starts, reactive provisioning can become more attractive. This is because idle resources can instead be used for actual computation rather than awaiting future requests that may not arrive. Reactive provisioning with low cold start costs is especially attractive with the increasing adoption of accelerators such as GPUs in serverless, due to the expensive nature of GPUs.
- Accurately predicting future workload. As noted previously, accurate prediction of workloads can ideally allow maximum resource utilization while minimizing the cold start penalty. Workload prediction applies to both individual functions and function chains.
- Reducing the cost of idle resources. Under the paradigm of proactive provisioning to minimize cold start frequencies, an alternative is to reduce the cost of idle resources. This can be done by using cheaper / lower-end machines to host proactively provisioned instances.
This article compares and contrasts proactive and reactive provisioning strategies. There are more diverse proactive provisioning policies than reactive. This article starts by identifying the motives of actors in the serverless paradigm, mainly service providers and business customers, then by identifying commercial scheduling policies and relating them back to provisioning policies. Finally, future directions for research in serverless are provided.
If you enjoyed this article and want more, I would really appreciate your clap :)
If you have more ideas that you think can help, send me a message or comment down below!