How to optimize AWS Lambda functions with provisioned concurrency and autoscaling


AWS Lambda functions are a serverless way to run code in the cloud without managing your own servers. Their main disadvantage is that initialization times can be long, leading to higher latency on some requests. With provisioned concurrency, you can resolve this issue.

What is provisioned concurrency?

Lambda functions run in their own execution environments, which are typically started automatically when a request comes in. After an invocation, the environment stays “warm” for roughly 5-15 minutes, during which the initialization code does not need to run again.
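This is why Lambda handlers are conventionally structured with expensive setup outside the handler function: that code runs once per execution environment, i.e. only during a cold start. A minimal Python sketch (the config values are illustrative placeholders):

```python
import time

# Initialization code: runs once per execution environment, so warm
# invocations skip it entirely. Expensive setup (SDK clients, config
# loading, connection pools) belongs here.
_init_started = time.time()
CONFIG = {"table": "my-table"}  # placeholder for real setup work
INIT_SECONDS = time.time() - _init_started

def handler(event, context):
    # Runs on every invocation; warm calls pay none of the init cost.
    return {
        "statusCode": 200,
        "initSeconds": INIT_SECONDS,
        "table": CONFIG["table"],
    }
```

Provisioned concurrency moves that one-time initialization out of the request path entirely by running it ahead of traffic.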

However, after that period, or when multiple requests must be served simultaneously, a new execution environment has to be spun up, and the resulting increase in startup time is known as a “cold start.” This can be especially pronounced with runtimes that need to do large amounts of JIT compilation at startup, such as Java and .NET.

To solve this, AWS offers a feature called Provisioned Concurrency, which essentially lets you reserve a certain number of execution environments that are kept warm around the clock. All of the initialization code runs in advance, so requests served by provisioned capacity never experience cold starts.

If you rely on Lambda to serve user requests, you might want to consider provisioned concurrency even if it ends up costing a bit more. While cold starts typically account for only around 1% of requests, that 1% can mean extra seconds of waiting for an app to load, depending on the runtime and the size of your code.

How much does provisioned concurrency cost?

Unlike EC2 Reserved Instances, Provisioned Concurrency is still primarily a “pay as you go” price, like the rest of Lambda. You pay a small fee for each hour that each environment is provisioned, and then pay for Lambda requests as usual.

However, because traffic is more predictable on the AWS end and it’s cheaper not to run initialization code all the time, the per-request compute cost for invocations served by provisioned concurrency is actually lower. There’s also no downside to exceeding your provisioned limit: overflow requests are simply served by standard on-demand concurrency at the usual price.

In general, provisioned concurrency can be a bit cheaper (around 5-10%) if you have very predictable traffic and reserve exactly that much capacity. However, it can also be a bit more expensive in some cases. You’ll want to check your analytics and plug them into the AWS Lambda Pricing Calculator for more information.
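As a rough illustration of the break-even math, here is a sketch using the published us-east-1 x86 per-GB-second prices at the time of writing; treat the exact figures, and the workload numbers, as assumptions to check against the current price list and your own analytics:

```python
# Hypothetical workload: a 1 GB function with 2 provisioned environments
# that stay ~80% busy around the clock (high, steady utilization).
GB = 1.0
ENVS = 2
UTILIZATION = 0.8
SECONDS = 730 * 3600             # seconds in an average month

# us-east-1 x86 prices at the time of writing -- verify against the
# current AWS Lambda price list before relying on them.
ON_DEMAND = 0.0000166667         # compute at standard concurrency, per GB-s
PC_KEEPALIVE = 0.0000041667      # charge for keeping capacity provisioned
PC_COMPUTE = 0.0000097222        # compute served by provisioned capacity

busy_gb_s = ENVS * GB * SECONDS * UTILIZATION

on_demand_cost = busy_gb_s * ON_DEMAND
provisioned_cost = ENVS * GB * SECONDS * PC_KEEPALIVE + busy_gb_s * PC_COMPUTE

print(f"on-demand:   ${on_demand_cost:.2f}/month")
print(f"provisioned: ${provisioned_cost:.2f}/month")
```

At roughly 80% sustained utilization the provisioned option comes out about 10% cheaper in this sketch; at lower utilization the hourly keep-alive charge dominates and it becomes more expensive, which is why checking your real traffic first matters.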

Enabling provisioned concurrency

Enabling provisioned concurrency is fairly simple, but it has one catch: you can’t point it at the default $LATEST version. $LATEST is a mutable pointer rather than a specific published version, and provisioned concurrency must reserve capacity for a specific version. So you’ll need to publish a version of your function, if you don’t already have one:

Then set up an alias pointing to that version. The alias can be updated later, which triggers an update of the provisioned environments.
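Both steps can be done with the AWS CLI, for example (the function name, alias name, and version numbers here are placeholders):

```shell
# Publish an immutable version from the current $LATEST code
aws lambda publish-version --function-name MyFunction

# Point an alias at that version (use the version number the
# previous command returned, e.g. 1)
aws lambda create-alias --function-name MyFunction \
    --name LatestProvisioned --function-version 1

# On later deployments: publish a new version, then repoint the alias
aws lambda update-alias --function-name MyFunction \
    --name LatestProvisioned --function-version 2
```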

Once you’ve set up your alias, you can add a provisioned concurrency setting from the function’s Configuration > Concurrency section in the Lambda console. You can also set this directly from the alias settings.

Configuration for provisioned concurrency is simple: select an alias and enter an amount to provision.

You can also set and update this value using the AWS API or CLI, which can be used to automate it throughout the day:

aws lambda put-provisioned-concurrency-config --function-name MyFunction \
--qualifier LatestProvisioned --provisioned-concurrent-executions 10

Autoscale with provisioned concurrency

Since provisioned concurrency can be adjusted throughout the day, it can also be connected to AWS Application Auto Scaling to adjust based on usage. Hooking this up is simple and only requires a few AWS CLI or API commands, as there is no management console for this yet.

First, you’ll need to register the Lambda function as a scaling target. Replace the function name (MyFunction) and alias (LatestProvisioned) with your own, and adjust the minimum and maximum capacity to suit.

aws application-autoscaling register-scalable-target --service-namespace lambda \
--resource-id function:MyFunction:LatestProvisioned --min-capacity 2 --max-capacity 10 \
--scalable-dimension lambda:function:ProvisionedConcurrency

You can then attach an autoscaling policy, using the function name and alias as the resource ID, and configure it with a JSON scaling policy. This example scales up or down as LambdaProvisionedConcurrencyUtilization rises above or falls below 70%.

aws application-autoscaling put-scaling-policy --service-namespace lambda \
--scalable-dimension lambda:function:ProvisionedConcurrency --resource-id function:MyFunction:LatestProvisioned \
--policy-name my-policy --policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{ "TargetValue": 0.7, "PredefinedMetricSpecification": { "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization" }}'

This doesn’t cover every case, such as short bursts of traffic that don’t last long, but it works well for steady workloads and will save you money overnight when traffic is low.
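If your traffic follows the clock rather than a smooth curve, Application Auto Scaling also supports scheduled actions, which let you raise and lower the capacity floor at fixed times of day. A sketch using the same function and alias names as above (the action names, cron times, and capacity numbers are illustrative):

```shell
# Scale up before the morning peak (cron times are UTC)
aws application-autoscaling put-scheduled-action --service-namespace lambda \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --resource-id function:MyFunction:LatestProvisioned \
    --scheduled-action-name scale-up-morning \
    --schedule "cron(0 8 * * ? *)" \
    --scalable-target-action MinCapacity=10,MaxCapacity=20

# Drop back down overnight
aws application-autoscaling put-scheduled-action --service-namespace lambda \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --resource-id function:MyFunction:LatestProvisioned \
    --scheduled-action-name scale-down-night \
    --schedule "cron(0 22 * * ? *)" \
    --scalable-target-action MinCapacity=2,MaxCapacity=10
```

Scheduled actions and the target-tracking policy can be combined: the schedule moves the capacity range, and the utilization metric fine-tunes within it.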
