Proactive or Reactive Provisioning: Which Is Better in Serverless Systems?

The overall objective of serverless computing is to provide businesses with an economical “pay-as-you-go” option for hosting code, with performance comparable to on-premises servers.

Serverless service providers build a business through economies of scale, sharing their infrastructure among multiple tenants (business customers). For example, a business that purchases a couple of server-grade machines is unlikely to run them at 100% (maximum) utilization, which leaves spare capacity that can be shared — and sharing that capacity is exactly what serverless providers do.


The main problem in serverless is “cold starts” — to share the infrastructure, the business customers’ code must be shuffled in and out, so some requests arrive when no instance of the code is already running.

To tackle this problem, serverless service providers rely on scheduling techniques to manage the trade-off between cold start penalties and server utilization.

  • AWS Lambda: Bin Packing — AWS Lambda appears to aim to run the fewest possible instances needed to serve the incoming load of requests. It uses the amount of memory declared for the function by the developer to determine whether new VMs need to be instantiated to meet demand.
  • Google Cloud Functions (GCF): Utilization-based — Google offers the most configurable developer-specified scaling policy. This is arguably contradictory to the serverless paradigm, in which scaling instances in and out should be the role of the service provider. GCF allows developers to use CPU utilization, HTTP serving rate, and cloud metrics as signals to scale in and out, and also lets developers set a minimum number of instances to reduce a function’s cold start frequency.
  • Microsoft Azure Functions: Queue-Length Based — Azure Functions scales differently depending on the trigger. For example, with an Azure Queue storage trigger, the number of instances is scaled based on the queue length and the age of the oldest queue message.
  • Heuristics — Least Connections, Round Robin. These scheduling policies are often mentioned in load balancing. Note, however, that they do not determine the amount of server resources allocated to a given piece of code, and are therefore not provisioning policies.
  • More Bin-Packing Scheduling — In contrast, some scheduling policies like bin packing can influence the system’s provisioning strategy. For example, the popular open-source OpenWhisk serverless platform relies on keep-alive timeouts to reduce the number of instances when an instance goes unused. It uses a hashing scheduler that attempts to place requests primarily on the “first” available worker in an order defined by a hashing function, allowing older instances to become stale and be deallocated (see the first sketch after this list).
  • More Utilization-based Scaling — One of the challenges with bin-packing scheduling is runtime performance degradation due to sharing VM resources among multiple containers, as discussed in TK1. To address this, TK2 aims to balance runtime performance with resource efficiency by setting a target range of CPU/memory utilization and scaling instances in and out to stay within it (a target-range sketch follows this list).
  • Prediction-based — Ideally, if the number of incoming requests can be predicted, instances can be pre-provisioned as the request rate increases. Several works, including (Zhang et al., 2013) and (Zhang et al., 2019), use Auto-Regressive Integrated Moving Average (ARIMA) and LSTM models, respectively, to predict the future request rate and scale in or out accordingly. In addition, functions in a pipeline (function chain) can be predicted more easily (Daw et al., 2020), (Banaei and Sharifi, 2021). The main limitation of these methods is their dependence on the predictability of the workload and the amount of historical information available. These works also incur large computational overhead, since ML inference must be performed for scheduling actions that occur at thousands of requests per second (a simplified forecasting sketch follows this list).
  • Logistics-based — Another class of scheduling techniques is based on the study of logistics and queuing theory. For example, (Suresh et al., 2020) uses the square-root staffing policy to pre-provision extra resources based on the volatility of demand and a target service rate (cold start frequency); see the staffing sketch after this list.
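
To make the bin-packing and keep-alive behavior concrete, here is a minimal Python sketch of a hashing scheduler with idle-timeout eviction. The names, the capacity model, and the 600-second timeout are illustrative assumptions, not OpenWhisk’s actual implementation.

```python
import time

KEEP_ALIVE_SECONDS = 600   # illustrative idle timeout, not OpenWhisk's actual default

class Worker:
    def __init__(self, capacity):
        self.capacity = capacity    # max warm instances this worker can hold
        self.last_used = {}         # function id -> timestamp of last request

    def has_capacity(self):
        return len(self.last_used) < self.capacity

    def run(self, fn_id):
        self.last_used[fn_id] = time.time()   # reuse a warm instance or warm a new one

    def evict_stale(self):
        now = time.time()
        for fn_id, ts in list(self.last_used.items()):
            if now - ts > KEEP_ALIVE_SECONDS:
                del self.last_used[fn_id]      # keep-alive expired: deallocate

def schedule(fn_id, workers):
    """Try workers in a fixed order derived from the function's hash, so
    requests concentrate on the "first" available workers while instances
    on the tail of the order go stale and are evicted."""
    start = hash(fn_id) % len(workers)
    order = workers[start:] + workers[:start]
    for w in order:
        if fn_id in w.last_used or w.has_capacity():
            w.run(fn_id)
            return w
    return order[0]   # everything full: fall back to the home worker
```

Because requests always probe workers in the same order, load packs onto a few busy workers, and idle ones naturally drain via the keep-alive timeout.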
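A target-range utilization autoscaler in the same vein could look like the sketch below; the 50–80% band and the proportional step are assumptions chosen for illustration, not any platform’s defaults.

```python
from math import ceil, floor

def desired_instances(current, utilization, low=0.5, high=0.8):
    """Keep average utilization inside [low, high].
    Above the band: scale out proportionally so utilization drops back in.
    Below the band: scale in, but never below one instance."""
    if utilization > high:
        return max(current + 1, ceil(current * utilization / high))
    if utilization < low and current > 1:
        return max(1, floor(current * utilization / low))
    return current

# e.g. 4 instances at 90% CPU -> 5 instances (utilization falls to ~72%)
print(desired_instances(4, 0.9))
```

The proportional rule (scale by the ratio of observed to target utilization) is what lets the controller converge in one or two steps instead of creeping one instance at a time.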
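The prediction-based works above use ARIMA or LSTM models; as a lightweight stand-in, the following sketch forecasts the request rate with an exponentially weighted moving average and pre-warms enough instances for the forecast. The per-instance throughput parameter is an assumption the operator would have to measure.

```python
import math

class RatePredictor:
    """Toy stand-in for the ARIMA/LSTM predictors cited above: an
    exponentially weighted moving average of the observed request rate."""
    def __init__(self, alpha=0.3):
        self.alpha = alpha      # weight given to the newest observation
        self.rate = 0.0         # current forecast (requests/sec)

    def observe(self, measured_rate):
        self.rate = self.alpha * measured_rate + (1 - self.alpha) * self.rate
        return self.rate

def instances_to_prewarm(predicted_rate, per_instance_rps=100):
    # warm enough instances for the forecast before the requests arrive
    return math.ceil(predicted_rate / per_instance_rps)

predictor = RatePredictor()
for measured in [80, 120, 200, 350]:        # rate sampled each interval
    forecast = predictor.observe(measured)
    print(instances_to_prewarm(forecast))
```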
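Finally, the square-root staffing rule from queuing theory admits a one-line implementation: provision N = R + β·√R instances, where the offered load R is the arrival rate times the average service time, and β tunes how much headroom is reserved. The example numbers below are assumptions.

```python
import math

def square_root_staffing(offered_load, beta=1.0):
    """N = R + beta * sqrt(R): provision for the mean demand R plus a
    buffer that grows with demand volatility. A larger beta lowers cold
    start frequency at the cost of more idle instances."""
    return math.ceil(offered_load + beta * math.sqrt(offered_load))

# e.g. 400 requests/sec at 0.25 sec/request -> R = 100 busy instances
print(square_root_staffing(400 * 0.25, beta=2.0))   # -> 120
```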

A provisioning strategy determines when instances of code are “warmed up”.

Now, let’s divide these methods by proactive and reactive provisioning strategies.

Proactive Provisioning

These methods typically aim to increase performance (by reducing cold start frequency) while sacrificing server utilization. This is done by proactively running instances of code, which may remain unused but are prepared to receive new requests.

  • Prediction-based policies
  • Utilization-based policies
  • Logistics-based policies

Reactive Provisioning

These methods instead aim to minimize resource costs by lazily instantiating new instances of the requested code when the current allocation of servers does not meet an increasing load of incoming requests.

  • Bin-packing policies

How can this trade-off be addressed?

There are different research directions to tackle this trade-off between resource efficiency and performance, which include:

  • Reducing the cost of cold starts. Cheaper cold starts make reactive provisioning more attractive, because idle resources can be used for actual computation rather than awaiting future requests that may never arrive. Reactive provisioning with low cold start costs is especially attractive given the increasing adoption of expensive accelerators such as GPUs in serverless.
  • Accurately predicting future workload. As noted previously, accurate workload prediction can ideally allow maximum resource utilization while minimizing the cold start penalty. Workload prediction applies both to individual functions and to function chains.
  • Reducing the cost of idle resources. Under the paradigm of proactive provisioning to minimize cold start frequency, an alternative is to reduce the cost of the idle resources themselves, for example by hosting proactively provisioned instances on cheaper, lower-end machines.

Conclusion

This article compares and contrasts proactive and reactive provisioning strategies; notably, proactive provisioning policies are more diverse than reactive ones. It starts by identifying the motives of the actors in the serverless paradigm, mainly service providers and business customers, then surveys commercial scheduling policies and relates them back to provisioning policies. Finally, it suggests future directions for serverless research.
