Notes on Serverless 2: Confusing Benchmarks

I’m due to give a talk on Java serverless at the end of this month. The difference between standard lambdas, Snapstart and provisioned concurrency is simple in theory, but digging into it has proved complicated. I’ve been using the simplest lambda possible, one that prints a single string to standard output. In this situation an unoptimised lambda proved the fastest option, although a ‘primed’ Snapstart lambda (one that calls the handler method before the CRaC checkpoint) was only slightly slower.

Running my simple lambda produced the following output:

| Request | Init Duration (ms) | Duration (ms) | Billed Duration (ms) |
|---|---|---|---|
| 1st execution | 438.23 | 209.36 | 210 |
| 2nd execution | - | 10.54 | 11 |
| Execution after 30 minutes | 455.72 | 258.06 | 259 |

What I hadn’t expected was for the duration, as well as the init duration, to be so much longer on a cold request: 209.36 ms against 10.54 ms once the lambda was warm. I was also shocked that the simplest lambda possible was taking so long to run. I’m aware that a single request is not statistically significant, but this matches what I’ve seen on other occasions.

I tried the same thing with the Snapstart lambda. My first attempt didn’t work: calling the lambda in the normal way still produced an ordinary Init Duration, with no sign of Snapstart:

| Request | Init Duration (ms) | Duration (ms) | Billed Duration (ms) |
|---|---|---|---|
| 1st execution | 472.25 | 212.41 | 213 |
| 2nd execution | - | 7.12 | 8 |
| Execution after 30 minutes | 500.80 | 223.55 | 224 |

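I recreated the Snapstart lambda and then explicitly published a version to see if that was the problem. Snapstart only applies to published versions, so I had to run the test against that specific version, and this produced different CloudWatch logs (a Restore Duration in place of the Init Duration) and different speeds: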

| Request | Restore Duration (ms) | Duration (ms) | Billed Duration (ms) |
|---|---|---|---|
| 1st execution | 660.45 | 269.75 | 473 |
| Following day | 703.86 | 256.52 | 239 |
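The Init Duration and Restore Duration columns above come from the REPORT line that Lambda writes to CloudWatch at the end of every invocation; a warm invocation has neither field. Copying these figures out by hand gets tedious, so the following is a minimal sketch in plain Java (no AWS SDK; the class and method names are hypothetical) that pulls the millisecond fields out of one REPORT line and classifies the invocation as cold, Snapstart restore or warm. It assumes the fields are tab-separated, as Lambda writes them; text copied from the console may have had its tabs turned into spaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: extracts the "<name>: <value> ms" fields from a Lambda REPORT line.
public class ReportLineParser {

    // Returns each millisecond field keyed by its name, e.g. "Duration", "Billed Duration",
    // "Init Duration" or "Restore Duration". Non-millisecond fields (memory sizes) are skipped.
    public static Map<String, Double> parseMillis(String reportLine) {
        Map<String, Double> timings = new LinkedHashMap<>();
        for (String field : reportLine.split("\t")) {
            String trimmed = field.trim();
            int colon = trimmed.indexOf(": ");
            if (colon < 0 || !trimmed.endsWith(" ms")) {
                continue; // e.g. "REPORT RequestId: ..." or "Memory Size: 512 MB"
            }
            String name = trimmed.substring(0, colon);
            String value = trimmed.substring(colon + 2, trimmed.length() - " ms".length());
            timings.put(name, Double.parseDouble(value));
        }
        return timings;
    }

    // Cold starts report an Init Duration, Snapstart restores a Restore Duration, warm calls neither.
    public static String startType(Map<String, Double> timings) {
        if (timings.containsKey("Restore Duration")) {
            return "snapstart-restore";
        }
        if (timings.containsKey("Init Duration")) {
            return "cold";
        }
        return "warm";
    }
}
```

Fed a REPORT line carrying the figures from the first Snapstart execution above, it should return a Restore Duration of 660.45 and a Duration of 269.75, and `startType` should report `snapstart-restore`.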

I decided to make the timings more obvious by adding a 6s sleep to the lambda’s constructor and a 3s sleep to the handler method.

| Request | Restore Duration (ms) | Duration (ms) | Billed Duration (ms) |
|---|---|---|---|
| 1st execution | 739.57 | 3250.47 | 3455 |
| Following day | 755.28 | 3235.88 | 3420 |

This lambda demonstrates that a restore does not recreate the lambda: the restore duration (around 740 to 755 ms) is nowhere near the 6s constructor sleep, while the 3s handler sleep shows up in full in the duration. We can also see that Snapstart carries a restore penalty which, for a simple lambda, is longer than the init duration of the non-Snapstart version (roughly 660 to 755 ms against 440 to 460 ms). There is still what we might refer to as a ‘cold start’, albeit a reduced one. (I am assuming here that the cold start does indeed call the constructor, and I need to go back and confirm this!)
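For reference, the deliberately slow lambda might look something like the sketch below. It is a hypothetical reconstruction rather than the exact code behind these figures: it assumes the Java runtime’s support for a plain handler method (no aws-lambda-java-core interface and no Context argument) so that it compiles with only the JDK, and the class name, message and the package-private constructor taking shorter sleeps are illustrative. Lambda only uses the public no-arg constructor, during the init phase.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical sketch of the deliberately slow test lambda: 6 s of work in the
// constructor (init phase) and 3 s in the handler (invocation phase).
public class SleepyHandler {
    private static final Duration CONSTRUCTOR_SLEEP = Duration.ofSeconds(6);
    private static final Duration HANDLER_SLEEP = Duration.ofSeconds(3);
    static final String MESSAGE = "Hello from a deliberately slow lambda";

    private final Duration handlerSleep;

    // The constructor Lambda calls while initialising the execution environment.
    public SleepyHandler() {
        this(CONSTRUCTOR_SLEEP, HANDLER_SLEEP);
    }

    // Hypothetical extra constructor so the class can be tried locally with short sleeps.
    SleepyHandler(Duration constructorSleep, Duration handlerSleep) {
        this.handlerSleep = handlerSleep;
        sleep(constructorSleep);
    }

    // Handler method (configured as SleepyHandler::handleRequest); the event is ignored.
    public String handleRequest(Map<String, Object> event) {
        sleep(handlerSleep);
        System.out.println(MESSAGE); // standard output ends up in CloudWatch Logs
        return MESSAGE;
    }

    private static void sleep(Duration duration) {
        try {
            Thread.sleep(duration.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while sleeping", e);
        }
    }
}
```

The second constructor exists only so the class can be exercised locally in milliseconds rather than nine seconds. Under Snapstart, the 6s constructor sleep happens before the snapshot is taken, so it should not reappear on restore.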

While looking into this, I checked what I was seeing against the results in Max Day’s Lambda cold start analysis. Yesterday’s results (Saturday 11th May) included the following:

| Runtime | Cold start duration (ms) | Duration (ms) |
|---|---|---|
| C++ (fastest available) | 12.7 | 1.62 |
| GraalVM Java 17 | 126.86 | 77.60 |
| NodeJS 20 | 138.43 | 13.53 |
| Java 17 | 202.28 | 8.28 |
| Java 11 Snapstart | 652.48 | 42.48 |

I’d long wondered why Day was getting such poor results from Snapstart. Looking at the results above, it now makes sense: Snapstart only becomes helpful for complicated lambdas with a lot of initialisation work. What I’m now wondering is why Day’s Java 17 cold start (202.28 ms) is so much lower than the init durations of around 440 to 460 ms that I measured.

One other trick I’ve seen, which has worked for me, is to invoke the lambda handler in the beforeCheckpoint method. This ensures that the stored Snapstart image includes as much of the class loading and JIT compilation as possible. It seems to work, giving start times of around 650 ms against around 1000 ms for a straightforward Snapstart lambda.
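A sketch of that priming trick follows. In a real Snapstart function the class would implement `org.crac.Resource` (from the org.crac library), register itself with `Core.getGlobalContext().register(this)` in its constructor, and receive `beforeCheckpoint` and `afterRestore` calls that take an `org.crac.Context` argument. To keep the sketch compilable with only the JDK, `CheckpointHook` below is a hypothetical stand-in for that interface, and the class name, message and invocation counter are illustrative too.

```java
import java.util.Map;

// Hypothetical stand-in for org.crac.Resource so this sketch compiles with only the JDK.
// In the real interface both methods take an org.crac.Context<? extends Resource> argument.
interface CheckpointHook {
    void beforeCheckpoint() throws Exception;

    void afterRestore() throws Exception;
}

// Hypothetical primed handler: the handler is invoked once before the Snapstart
// snapshot is taken, so the request path has already run inside the snapshot.
public class PrimedHandler implements CheckpointHook {
    static final String MESSAGE = "Hello from a primed lambda";

    private int invocations = 0; // illustrative: lets a local test see the priming call

    public PrimedHandler() {
        // Real code would register here: org.crac.Core.getGlobalContext().register(this);
    }

    public String handleRequest(Map<String, Object> event) {
        invocations++;
        System.out.println(MESSAGE);
        return MESSAGE;
    }

    // Called by the runtime after init and before the snapshot is taken.
    @Override
    public void beforeCheckpoint() {
        // Priming: one call with a dummy event exercises the handler ahead of the snapshot.
        handleRequest(Map.<String, Object>of("priming", true));
    }

    // Called after a restore; nothing needs re-establishing for this simple lambda.
    @Override
    public void afterRestore() {
    }

    int invocationCount() {
        return invocations;
    }
}
```

A single priming call mainly makes sure the classes on the request path are loaded and initialised before the snapshot; how much JIT-compiled code ends up in the image depends on how often the handler is exercised.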

The next step is to repeat these investigations for a lambda with a severe cold start problem, which I expect to see in a lambda that accesses S3 or DynamoDB.
