How Do You Like Your AWS Lambda Concurrency — Part 1: Introduction

Developer Experience Review and Performance Benchmark of Python, Node.js, Go, Rust, Java and C++ in AWS Lambda

VilleKr
7 min read · Aug 24, 2023

In this five-part blog series, we explore how much speed improvement can be achieved when simple sequential logic is converted to a concurrent implementation. We review the developer experience of implementing the same concurrent logic in Python, Node.js, Go, Rust, Java and C++ and benchmark each implementation in AWS Lambda with different resource configurations.

You can jump directly to the different articles in this series:

With each of these programming languages, we implement the same logic, which involves a series of IO-bound tasks. We start from Lambda’s minimum vCPU resources, gradually increase them up to the maximum, and see how that affects processing performance.

We measure the execution speed of each implementation and compare the languages against each other from an objective perspective, i.e. in terms of measured execution speed.

We also evaluate each language from a subjective developer-experience perspective, i.e. how easy it is to implement the concurrent logic and what it takes to get your code running in AWS Lambda.

Key Terminology and Concepts

First let’s clarify the following key terms and concepts:

  • Serial, parallel and concurrent execution
  • IO-bound and CPU-bound operations
  • Synchronous (blocking) and asynchronous (non-blocking) programming model

Serial execution refers to the sequential processing of tasks, where one task is completed before moving on to the next. This linear approach ensures that each task is executed one after the other, creating a predictable and ordered flow.

Parallel execution, on the other hand, involves the simultaneous execution of multiple tasks, utilizing multiple processing units or cores or higher architectural constructs.

Concurrent execution introduces the notion of overlapping tasks and allows them to make progress during overlapping time periods, though not necessarily at the same instant. The key distinction from parallel execution is that concurrency focuses on managing and coordinating tasks efficiently, optimizing resource utilization and responsiveness.

IO-bound operations refer to tasks that spend a significant amount of time waiting for input/output operations, such as reading from or writing to external resources like files or networks. These operations are “input/output bound” because they are limited by the speed of external devices or network connections.

CPU-bound operations are computational tasks that primarily utilize the processing power of the CPU. These tasks involve extensive calculations or complex algorithms and are “CPU bound” because they are limited by the speed and capabilities of the CPU.

In the synchronous (or blocking) programming model, tasks are executed sequentially, one after the other. Each task must wait for the previous one to complete before it can proceed. This synchronous execution ensures ordered and deterministic processing.

Asynchronous (or non-blocking) programming models allow tasks to proceed independently without waiting for each other to finish. A task can initiate an IO operation and, while it waits for the result, other tasks are executed.
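Here’s a minimal Python sketch of the asynchronous model (the fetch_one coroutine and its one-second sleep are illustrative stand-ins for a real IO operation, not part of the benchmark). Ten such tasks executed synchronously would take about ten seconds; with asyncio they all wait concurrently and finish in about one:

import asyncio

async def fetch_one(i):
    # The sleep stands in for a network call or other IO wait.
    await asyncio.sleep(1)
    return i

async def main():
    # All ten tasks are in flight at once, so total runtime is ~1 second, not ~10.
    return await asyncio.gather(*(fetch_one(i) for i in range(10)))

results = asyncio.run(main())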

For CPU-bound operations, the choice of programming language and libraries, as well as the efficiency of the implemented algorithm, plays a big role. If the work can be divided into separate tasks, then parallel execution provides great performance improvements. For IO-bound operations, concurrent execution using an asynchronous programming model provides good results while keeping resource requirements low.
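For the CPU-bound side, here’s a hedged sketch of dividing work across processes with Python’s concurrent.futures (cpu_heavy is a made-up stand-in for a real computation):

from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # Stand-in for a real computation: burns CPU by summing squares.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Each input runs in a separate process, so all available cores can be used.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(cpu_heavy, [10_000_000] * 4))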

Some Limitations to Consider

Our implementation runs in AWS Lambda and uses language-specific AWS SDKs to access AWS S3. There are some limitations and behaviors we need to be aware of and consider during the evaluation and when analyzing the results.

The AWS S3 API has a limit of 5,500 read requests per second (per S3 bucket prefix), and it scales up to that limit gradually. Therefore we use a separate prefix for each language, so at least the effects of S3 scaling up the read throughput should be comparable between the languages. The read request threshold may still be somewhat of a limiting factor in this evaluation. AWS S3 ListObjects also returns at most 1,000 keys per request, so to keep the code simple we keep the number of files within that threshold to avoid pagination.
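For reference, going past the 1,000-key limit would only take a paginator. A minimal boto3 sketch (the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
# Iterates over as many 1000-key pages as needed.
for page in paginator.paginate(Bucket="<bucket_name>", Prefix="python/1000/"):
    for obj in page.get("Contents", []):
        print(obj["Key"])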

AWS Lambda generally supports both Intel and ARM processor architectures, but some language runtimes only support Intel. For this evaluation we use existing runtimes and the ARM architecture whenever possible.

AWS Lambda resources are configured by specifying a memory size between 128 MB and 10,240 MB in 1 MB steps. CPU power scales proportionally to memory size, starting from a fraction of a vCPU at 128 MB up to 6 vCPUs at 10,240 MB. The following table lists the memory configuration steps where (in my tests at the time of writing) the AWS Lambda Python runtime reported the different CPU counts:

+-------------+----------------+
| Memory (MB) | Number of CPUs |
+-------------+----------------+
|         128 |              2 |
|        3072 |              3 |
|        5376 |              4 |
|        7168 |              5 |
|        8960 |              6 |
+-------------+----------------+
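The counts above can be read with a trivial probe. Here’s a sketch of the kind of handler that reports them; using os.cpu_count() is my assumption for how the runtime exposes the CPU count:

import os

def handler(event, context):
    # os.cpu_count() returns the number of CPUs visible to the runtime at this memory size.
    return {"cpu_count": os.cpu_count()}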

Even though in this evaluation we are not particularly interested in cost optimization, cost is always good to keep an eye on when designing and implementing cloud services. AWS Lambda execution is billed per 1 ms of execution time, at a rate determined by the memory configuration.

Pricing and the number of reported vCPUs don’t exactly line up at every step. It’s probably a good idea to specify a memory configuration just below a pricing step to get the most runtime performance for the money: instead of using 1024 MB it might be better to use 1023 MB and get almost the same performance at the lower pricing tier. This is definitely micro-level cost optimization. If your code can’t benefit from multiple processors, there’s really no benefit in going above e.g. 2048 MB, and certainly not above 3072 MB (unless memory is the limiting factor), although it’s a bit difficult to tell exactly at which memory configuration you get one full CPU’s worth of processing power.

So among these memory configurations, we choose the steps that are just 1 MB below the point where an additional vCPU is reported, and just below the next pricing point. This way we can assume we always get the full processing power of all the vCPUs available at that configuration, at the best price. The chosen configurations lean towards the lower end of the spectrum, as those are the ones you would generally end up using. The following table lists the 10 chosen memory configurations and the number of CPUs reported by the runtime:

+-------------+-----------+
| Memory (MB) | CPU count |
+-------------+-----------+
|         128 |         2 |
|         256 |         2 |
|         511 |         2 |
|        1023 |         2 |
|        3008 |         2 |
|        3071 |         2 |
|        6143 |         3 |
|        7167 |         4 |
|        9215 |         5 |
|       10240 |         6 |
+-------------+-----------+
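Stepping through these configurations is easy to script. A minimal boto3 sketch, assuming a deployed benchmark function whose name replaces the <function_name> placeholder:

import boto3

lam = boto3.client("lambda")
for memory in [128, 256, 511, 1023, 3008, 3071, 6143, 7167, 9215, 10240]:
    lam.update_function_configuration(FunctionName="<function_name>", MemorySize=memory)
    # Wait until the configuration change has been applied before invoking.
    lam.get_waiter("function_updated").wait(FunctionName="<function_name>")
    lam.invoke(FunctionName="<function_name>")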

We use the most obvious approach in each language to achieve concurrency/parallelism. For Python and Node.js this is the asynchronous programming model, which means those implementations don’t benefit from having more than 1 vCPU available. For the other languages, the implementations are done so that they should take advantage of all the vCPUs available in the execution environment. Unfortunately AWS Lambda offers no way to monitor execution resource usage at that level of detail.

Word of Warning

Designing and implementing parallel processing inside an AWS Lambda function is by no means the first and best approach in most cases. Typically it’s much better to structure the logic so that you utilize cloud services in a way that achieves true parallelism and lets you easily recover from situations where the processing of some tasks fails. In AWS there are many services and patterns you can use, e.g. Step Functions (whenever you need to implement a process, think about Step Functions), SQS together with Lambda, or AWS Glue, just to name a few. In this case, however, we are more interested in evaluating the differences between programming languages, and for that kind of experiment the AWS Lambda environment is a good choice.

Sequential Implementation as a Baseline

To understand how much improvement can be achieved with concurrent or parallel processing, we need a baseline from sequential processing. The core logic is shown in the following simplified Python snippet:

import time
import boto3

# BUCKET and PREFIX are assumed to be defined elsewhere (one prefix per language).
s3 = boto3.client("s3")
start = time.perf_counter()
for obj in s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)["Contents"]:
    # Read and decode each object's body sequentially.
    body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read().decode()
elapsed = time.perf_counter() - start

The implementation lists the objects in a specified S3 bucket and then reads the body of each one sequentially in a loop. This piece of logic is wrapped in a timer to measure the execution time. We use Python 3.11, which at the time of writing is the latest version available in the AWS Lambda Python runtime.

We generate 1000 files with random content and varying sizes averaging around 250 KB and upload them to S3 using a different prefix for each language: e.g. the Python implementation reads files from s3://<bucket_name>/python/1000 and the Node.js implementation reads them from s3://<bucket_name>/nodejs/1000. With this approach we try to give equal ground to each language irrespective of how S3 scales the read throughput.
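The test data can be produced with a small helper along these lines (a sketch: the exact size spread around the 250 KB average and the key naming are my assumptions):

import random
import string
import boto3

s3 = boto3.client("s3")
for i in range(1000):
    # Random ASCII payload between ~100 KB and ~400 KB, averaging ~250 KB.
    size = random.randint(100_000, 400_000)
    body = "".join(random.choices(string.ascii_letters, k=size))
    s3.put_object(Bucket="<bucket_name>", Key=f"python/1000/{i}.txt", Body=body.encode())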

The following graph shows how the execution speed evolved when this Python implementation was executed a few times:

The graph shows quite well the effect of S3 gradually scaling up the throughput. The first execution took 80.2 seconds. After a few runs the time drops significantly, and the average execution time over the last 5 attempts is 22.3 s. In other words, the execution time of the same sequential implementation with the same Lambda configuration drops to roughly a quarter, purely because S3 has had time to scale up the throughput.

In the upcoming posts we convert this sequential implementation to process the tasks concurrently. We explore the results for 6 different languages: Python and Node.js, Go and Rust, Java and C++. Each pair of languages is covered in a separate post, and finally a summary concludes the different evaluations.

Clap, comment and share this post, and follow me for more interesting upcoming writing. Also, experiment by yourself with the provided code!

Links to GitHub repositories:
