How Do You Like Your AWS Lambda Concurrency? — Part 2: Python and Node.js

Developer Experience Review and Performance Benchmark of Python, Node.js, Go, Rust, Java and C++ in AWS Lambda

VilleKr
Aug 24, 2023

In this five-part blog series, we explore how much speed improvement can be achieved when a simple sequential logic is converted to a concurrent implementation. We review the developer experience on implementing the same concurrent logic using Python, Node.js, Go, Rust, Java and C++ and benchmark each implementation in AWS Lambda with different resource configurations.

You can jump directly to the different articles in this series:

In this post we focus on two modern, asynchronous, dynamic languages: Python and Node.js. First we go through the implementation and deployment details. Then we review both languages from a subjective developer experience perspective. The objective execution-speed comparison across all the languages is presented in the final summary article.

Python Implementation

The most efficient and effective way to achieve concurrency in Python is to use an asynchronous programming model with asyncio. It's possible to use multiple threads or processes, but due to Python's infamous GIL (global interpreter lock) that approach is not a good choice in this case.

Converting the synchronous implementation to an asynchronous one requires three things: all the used libraries must support asynchronous execution, the async/await keywords must be used, and the asynchronous function calls must be wrapped in the asyncio.run function.

Unfortunately boto3, the AWS SDK for Python, only supports synchronous execution. There are two approaches to overcome this limitation: write wrapper code that runs boto3's synchronous functions correctly within an event loop, or use an existing library like aioboto3 that does this for you.
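
To illustrate the first approach, a rough sketch of such wrapping could push the blocking boto3 call into a thread-pool executor so it doesn't block the event loop (the bucket and key parameters below are just placeholders; the rest of this series uses aioboto3 instead):

import asyncio
import boto3

s3 = boto3.client("s3")

def get_object_sync(bucket, key):
    # Plain synchronous boto3 call
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

async def get_object(bucket, key):
    # Run the blocking call in the default thread-pool executor so that
    # other coroutines can make progress while this one waits on the network
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, get_object_sync, bucket, key)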

The Lambda runtime for Python includes a collection of commonly used packages. But as aioboto3 is not among them, we need to deal with all the complexity of deploying an extra package alongside our Python code.

In order to have some common structure across different languages we implement three different functions:

  • handler-function, which acts as the Lambda entry point
  • processor-function, which acts as a coordinator
  • get-function, which performs a single operation, i.e. reads an object from S3

Let’s see the handler-function for the Python implementation:
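
A minimal sketch of it looks roughly like this (the returned metric field names, and passing the optional find flag in the event, are assumptions rather than the exact code):

import asyncio
import time

def handler(event, context):
    # Lambda entry point: time the asynchronous processor run
    start = time.perf_counter()
    processed = asyncio.run(processor(find=event.get("find", False)))
    duration = time.perf_counter() - start
    # These metrics are what the benchmark harness collects
    return {"processed": processed, "duration": duration}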

We have timing functions and processor-function invocation. The handler-function returns metrics that are used to collect benchmark results.

Next we have processor-function:
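
In outline it could look like the following sketch (the bucket name, environment variable and search string are placeholders):

import asyncio
import os

import aioboto3

BUCKET = os.environ.get("BUCKET_NAME", "benchmark-bucket")  # placeholder

async def processor(find=False):
    session = aioboto3.Session()
    async with session.client("s3") as s3:
        # List the objects to be processed
        listing = await s3.list_objects_v2(Bucket=BUCKET)
        keys = [obj["Key"] for obj in listing.get("Contents", [])]
        if find:
            # CPU-bound variant: also search for a string in each object body
            awaitables = [get(s3, key, find="needle") for key in keys]
        else:
            # Primary benchmark path: plain reads only
            awaitables = [get(s3, key) for key in keys]
        # gather schedules all the awaitables on the event loop
        await asyncio.gather(*awaitables)
    return len(keys)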

We initialize aioboto3’s session and create an S3 client. Under the async context manager we list the objects in an S3 bucket. Then we create a list of awaitables (i.e. coroutine objects that can be awaited) and pass that list to asyncio’s gather function, which schedules the awaitables for execution in the event loop. Finally, the function returns the number of processed files for later bookkeeping. There’s also an if-else statement which adds some CPU-bound work to the mix; in the primary benchmark case the execution follows the find==False path.

Finally there’s the get-function:
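
A corresponding sketch, using the same placeholder names as above:

async def get(s3, key, find=None):
    # Fetch a single object and decode its body
    response = await s3.get_object(Bucket=BUCKET, Key=key)
    async with response["Body"] as stream:
        body = await stream.read()
    text = body.decode("utf-8")
    if find is not None:
        # CPU-bound path: look for the given string in the decoded body
        return find in text
    return len(text)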

The get-function asynchronously gets an object from S3, then reads and decodes its body. The CPU-bound execution path also tries to find a given string in the decoded body.

Node.js Implementation

For Node.js, the best way to achieve concurrency is to utilize the asynchronous programming model with async and await. Node.js doesn’t have a restriction like Python’s GIL that would prevent us from using multiple threads or processes effectively, but we stick to the async implementation for the sake of simplicity and because it is likely the most performant option as well.

The AWS SDK for JavaScript fully supports asynchronous invocation and the Node.js Lambda entry point is asynchronous by default, so things are much more straightforward compared to Python. There’s no need to deploy any extra packages, and we don’t need to wrap the processor-function in anything.

Here’s the Node.js handler-function:
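
Sketched out, it could look roughly like this (the metric field names and the find flag in the event are assumptions, and processor refers to the function sketched in the next snippet):

export const handler = async (event) => {
  // The Lambda entry point is already async, so processor can be awaited directly.
  // hrtime.bigint() returns nanoseconds, hence the conversion below.
  const start = process.hrtime.bigint();
  const processed = await processor(event?.find === true);
  const duration = Number(process.hrtime.bigint() - start) / 1e9;
  return { processed, duration };
};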

We can directly invoke the processor-function with the await keyword. Calculating the elapsed time is a little messier compared to Python.

Next let’s see processor-function:
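
A minimal sketch, assuming the bucket name comes from an environment variable and using a placeholder search string:

import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

const client = new S3Client({});
const BUCKET = process.env.BUCKET_NAME; // placeholder environment variable

const processor = async (find) => {
  // v3 "command & send" style: build a command object, then send it
  const listing = await client.send(new ListObjectsV2Command({ Bucket: BUCKET }));
  const keys = (listing.Contents ?? []).map((obj) => obj.Key);
  // map() builds the array of promises, Promise.all awaits them concurrently
  const promises = keys.map((key) => get(key, find ? "needle" : null));
  await Promise.all(promises);
  return keys.length;
};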

We’re using the latest v3 version of the AWS SDK for JavaScript, which introduced a modular package structure and completely changed the APIs. Compared to Python’s boto3 API and the previous v2 SDK, I find this new API a bit cumbersome with its command & send paradigm.

I also find Python’s list-comprehension syntax much more elegant than Node.js’s map. Promise.all then awaits the completion of all the promises.

Finally we have get-function:
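
A corresponding sketch, reusing the client and BUCKET defined in the previous snippet:

import { GetObjectCommand } from "@aws-sdk/client-s3";

const get = async (key, find) => {
  // Same command & send pattern for reading a single object
  const response = await client.send(
    new GetObjectCommand({ Bucket: BUCKET, Key: key })
  );
  // transformToString() turns the streamed body into a string
  const body = await response.Body.transformToString();
  if (find) {
    // CPU-bound path: search for the given string in the body
    return body.includes(find);
  }
  return body.length;
};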

Getting objects from S3 follows the same awkward command & send paradigm. However, there’s a handy transformToString method on the S3 response body for getting a string representation of the body content.

Building Deployment Packages

In this blog series we don’t cover the details of deploying all the required AWS resources. In order to get your code to Lambda there are three distinct steps that you need to take care of:

  • Develop the code
  • Build the deployment package
  • Deploy the cloud resources

For interpreted languages like Python and Node.js, building the deployment package is straightforward unless there’s a need to include external packages. It’s good practice to utilize docker containers as the build environment here.

Dockerfile for Python:

FROM public.ecr.aws/sam/build-python3.11
RUN pip install poetry==1.5.1

AWS Serverless Application Model (SAM) maintains a set of docker images targeted at building deployment packages, and we use one of them as our base image. We use Poetry for Python dependency management, and that’s the only extra package we need to add to the build image.

Dockerfile for Node.js:

FROM public.ecr.aws/sam/build-nodejs18.x
ENV NODE_ENV production

Here we set the NODE_ENV value to production so that some development packages we’re using won’t end up in the deployment package.

For deploying cloud resources we use Terraform together with Terragrunt. There’s an excellent terraform-aws-modules/lambda/aws Terraform module for deploying AWS Lambda functions. Currently that module only supports Python and Node.js and has an option to use a docker container as the deployment-package build environment.

Subjective Review of Python and Node.js

The following section is based on my own personal opinions. I defined seven categories to cover various developer-experience aspects and used grades from 1 to 6, where higher is better.

+----------------------------------------------------+--------+---------+
| Category                                           | Python | Node.js |
+----------------------------------------------------+--------+---------+
| Code amount                                        | 6      | 6       |
| Code complexity                                    | 5      | 6       |
| AWS SDK developer experience                       | 4      | 5       |
| Code formatting and linting                        | 4      | 4       |
| Ease of setting up project and 3rd party packages  | 5      | 6       |
| Ease of debugging runtime/compile time errors      | 6      | 6       |
| Ease of creating deployment package                | 5      | 5       |
| TOTAL                                              | 35     | 38      |
+----------------------------------------------------+--------+---------+

The amount of code is pretty much equivalent in both cases. When the whole implementation, with all the extra boilerplate, fits on one screen, that’s pretty much the best you can get.

For code complexity, in Python’s case I decided to deduct 1 point due to the burden of using asyncio.run and an async context manager, as well as the slightly strange syntax of asyncio.gather. In Node.js, the map-based equivalents of list comprehensions and some other pieces of code were not as elegant as in Python. The main reason I decided to take 1 point away from Node.js is the baffling choice between CommonJS and ECMAScript modules.

For the AWS SDK developer experience I take 2 points away from Python because boto3 doesn’t support the async programming model. This is something AWS should really improve soon (e.g. the Azure SDK for Python has async support built in). From a developer-experience standpoint it’s not OK to have to rely on 3rd-party libraries, or to complicate your code by wrapping synchronous boto3 requests, just to be able to use async/await in Python. For Node.js the only complaint is the awkward command & send paradigm in v3 SDK requests.

For code formatting you can rely either on your IDE’s built-in support or use external tools, as there are no native formatters in either Python or Node.js. In the Python ecosystem, Black has become the de facto standard and I really like what it does; Prettier plays the same role on the Node.js side. I deducted 1 point from both languages because you have to rely on (and know) external tools for code formatting.

For code linting I use flake8 in Python and ESLint in Node.js. 1 point is deducted because you have to know which tool to use and configure it to suit your needs.

The next category is how easily you can set up your project with a specific language version and add the necessary 3rd-party packages. Typically you will need to have multiple language versions available. For Python I’ve settled on pyenv for installing and switching between Python versions and virtual environments, and I use Poetry to manage the project’s dependencies. For Node.js things are much simpler. You can use tools like nvm to manage different Node versions, but as node_modules are installed in your development folder next to package.json, you don’t have the same hassle as with Python’s virtual environments.

Once you have everything set up, running locally and debugging is a breeze for Python in PyCharm and for Node.js in IntelliJ IDEA.

Creating a deployment package, especially one with external packages, can be challenging. Using a docker container as the build environment ensures that you get the correct versions of packages matching the target processor architecture when those packages contain pre-built binaries, as is the case e.g. with Python’s pandas package. You can develop your own scripts to perform the packaging or use something like the terraform-aws-modules/lambda/aws module instead.

Conclusion

Both Python and Node.js are great and popular choices for Lambda functions. Even though I personally prefer Python, I have to admit that Node.js wins this duel from a developer experience perspective.

In the final part of this blog series we see how these languages compare from the objective performance perspective.

Next in this series we cover the Go and Rust implementations.

Clap, comment on and share this post, and follow me for more interesting upcoming posts. Also, experiment by yourself with the provided code!

Links to GitHub repositories:

Written by VilleKr

Working at NordHero as Cloud Solutions Architect.
