java serverless

Notes on Serverless 1: Does Java work for AWS Lambda?

A new project at work has got me thinking about whether Java works as a language for AWS Lambda applications. The more I’ve looked into this, the more my research has expanded, and I’ve got a little lost in the topic. This post is a set of notes intended to add some structure to my thoughts. In time, this may become a talk or a long piece of writing.

  • The biggest issue with Java on Lambda is cold starts. A cold start is the initial delay in executing a function after it has been idle or newly deployed, which occurs while the runtime environment is set up. Given that the Java platform requires a JVM to be started, this adds a significant delay compared with other platforms.
  • Amazon evidently understand that cold starts are an issue, since they offer a number of workarounds, such as provisioned concurrency (paying extra to ensure that some Lambda instances are always kept warm). There is also a Java-specific option, SnapStart, which works by storing a snapshot of the memory and disk state of an initialised Lambda environment and restoring from that.
  • Maxime David has set up a site to benchmark Lambda cold starts on different platforms. The fastest is C++ at ~12ms, with Graal at 124ms and Java at around 200ms. Weirdly, Java using SnapStart is the slowest of all at >=600ms (depending on Java version). This is counter-intuitive and there is an open issue raised about it.
  • Yan Cui, who writes on AWS as theburningmonk, posted a ‘hot take’ on LinkedIn suggesting that people worry too much about cold starts: “for most people, cold starts account for less than 1% of invocations in production and do not impact p99 latencies”. He goes on to warn against synchronously calling Lambdas from other Lambdas(!), and discusses how traffic patterns affect initialisation.
  • There’s an excellent article from Yan Cui that digs further into this question of traffic patterns: ‘I’m afraid you’re thinking about AWS Lambda cold starts all wrong’. It looks at Lambdas in relation to API Gateway in particular, but makes the point that concurrent requests to a Lambda can cause a new instance to be spun up, which then incurs the cold start penalty for one of the requests.
  • The article goes on to suggest ‘pre-warming’ Lambdas before expected spikes as one option to limit the impact, possibly even short-circuiting the usual work of that Lambda for these wake-up requests. It also suggests using cron to make requests to rarely-used endpoints to keep them warm. The article is from 2018, so does not take account of some of the newer solutions – although I’ve seen this idea of pinging Lambdas used recently as a quick-and-dirty solution.
  • It’s easy to get Graal working with Spring Boot, producing an executable that can be run by AWS Lambda. This gets the cold start of Spring Boot down to about 500ms, which is quite impressive – although still larger than many other platforms. Nihat Önder has made a GitHub repo available.
  • However, the first execution of the Graal/Spring Boot demo after the cold start adds another 140ms, which tips this well over the threshold of what is acceptable. I’ve read that there are issues with lazy loading in the AWS libraries which I need to dig into.
  • Given the ease of using languages like TypeScript, it’s hard to make a case for using Java in AWS Lambda when synchronous performance is important – particularly if you’re building simple serverless functions rather than using huge frameworks like Spring Boot.
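The ‘pre-warming’ idea from the notes above, where wake-up requests skip the usual work, can be sketched in plain Java. This is only an illustration: the `WarmableHandler` class and the `warmUp` event key are hypothetical names, and a real function would implement the `RequestHandler` interface from the AWS Lambda Java core library rather than a bare method.

```java
import java.util.Map;

// Hypothetical handler: the class name and the "warmUp" event key are
// assumptions for illustration, not part of any AWS API.
class WarmableHandler {
    private int realInvocations = 0;

    // A plain method keeps this sketch free of dependencies.
    String handle(Map<String, Object> event) {
        // Wake-up pings (for example from a scheduled rule) return straight away,
        // so the instance stays warm without doing the usual work.
        if (Boolean.TRUE.equals(event.get("warmUp"))) {
            return "warm";
        }
        realInvocations++;
        return "processed request " + realInvocations;
    }

    int realInvocations() {
        return realInvocations;
    }
}
```

The point of the early return is that a ping should cost as little as possible; anything expensive stays behind the check.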

Next steps

Before going too much further into this, I should try to produce some simple benchmarks, looking at a trivial example of a Java function and comparing Graal, the regular Java runtime and SnapStart. This will provide an idea of the lower limits for these start times. It would also be useful to look at the times of a Lambda that accesses other AWS services, such as one that queries S3 and DynamoDB, to see how this more complicated task affects the cold start time.

Given a benchmark for a more realistic Lambda, it’s then worth thinking about how to optimise a particular function. Using more memory should help, for example, as should moving complicated set-up into the initialisation phase (static initialisers or the handler’s constructor) rather than the request path. How much can a particular Lambda be sped up?
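As a rough sketch of moving set-up out of the request path, the class below does its slow work once in the constructor and leaves the per-request method with nothing but a lookup. Everything here is hypothetical (`PriceLookupHandler`, `loadPrices`, the data); in a real function the slow work would be things like building SDK clients or reading configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handler: the names and data are illustrative only.
class PriceLookupHandler {
    // Counts how often the expensive set-up runs, so the effect is visible.
    static int setUpCalls = 0;

    private final Map<String, Integer> prices;

    // Runs once per instance, during initialisation (the cold start),
    // rather than on every request.
    PriceLookupHandler() {
        this.prices = loadPrices();
    }

    // Stand-in for slow work such as building clients or reading configuration.
    private static Map<String, Integer> loadPrices() {
        setUpCalls++;
        Map<String, Integer> loaded = new HashMap<>();
        loaded.put("apple", 30);
        loaded.put("pear", 45);
        return loaded;
    }

    // The per-request path only reads the state prepared above.
    int handle(String item) {
        return prices.getOrDefault(item, -1);
    }
}
```

However many requests a warm instance serves, `loadPrices` has run only once for it. The cost has not disappeared, it has moved into the cold start, which is why this needs measuring alongside the other options.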

It’s also worth considering what would be an acceptable response time for a Lambda endpoint – noting that this depends very much on traffic patterns. If only 1-in-100 requests have a cold start, is that acceptable? What about for a rarely-used endpoint, which always has a cold start?
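To make the 1-in-100 question concrete, here is a small piece of arithmetic in Java. The latencies used in the note below the code (20ms warm, 600ms cold) are made-up round numbers rather than measurements, and the `ColdStartMaths` class is purely illustrative.

```java
import java.util.Arrays;

// Illustrative arithmetic only: the latencies passed in are assumed figures.
class ColdStartMaths {
    // Mean latency when a given fraction of requests pays the cold-start cost.
    static double meanMs(double coldFraction, double warmMs, double coldMs) {
        return coldFraction * coldMs + (1 - coldFraction) * warmMs;
    }

    // Nearest-rank percentile: the smallest sample that has at least
    // pct percent of all samples at or below it.
    static double percentile(double[] samples, int pct) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (pct * sorted.length + 99) / 100; // integer ceiling
        return sorted[Math.max(rank, 1) - 1];
    }

    // Builds a sample of `total` requests, of which `cold` were cold starts.
    static double[] sample(int total, int cold, double warmMs, double coldMs) {
        double[] latencies = new double[total];
        Arrays.fill(latencies, warmMs);
        Arrays.fill(latencies, 0, cold, coldMs);
        return latencies;
    }
}
```

With those assumed figures, 10 cold starts in 1,000 requests lifts the mean from 20ms to 25.8ms while `percentile(sample(1000, 10, 20, 600), 99)` is still 20ms. At 20 cold starts in 1,000 the p99 jumps to 600ms, and for an endpoint that is always cold every figure is 600ms.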

java programming springboot

Refactoring and microservices

In recent cloud projects, I keep seeing the same Spring application anti-pattern. There are controllers for a number of REST endpoints. Each REST endpoint calls a separate class, which carries out the business logic for that action. The problem is that such classes can easily grow to a thousand lines or more, and I’ve often seen single methods over a hundred lines long – an anti-pattern sometimes referred to as ‘god classes’. Code is sometimes extracted to private methods within these classes, which can obscure the fact that there is a single execution flow hundreds of lines long. The addition of unit testing means that long, repetitive tests with complicated set-up are needed to provide coverage for branches deep within these classes. These complicated tests then make it difficult to refactor the code.

This problem comes from applying sensible principles in the wrong way. We have the Controller logic separate from the Business logic, and the Model managed by Spring Data classes. It’s a rough MVC pattern – and Spring makes this separation very easy. The problem is that the Controller logic is usually trivial, just an annotation that might as well have been put on the Service class. It’s this Service class that you really want to be split out into smaller classes.
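As a minimal sketch of what splitting the Service class might look like, here is a service reduced to a coordinator with two small collaborators. All of the names (`OrderValidator`, `OrderPricer`, `OrderService`) are hypothetical, and Spring annotations are left out to keep the example dependency-free.

```java
import java.util.List;

// All names below are hypothetical, chosen only to illustrate the split.
class OrderValidator {
    void validate(List<Integer> itemPricesInPence) {
        if (itemPricesInPence.isEmpty()) {
            throw new IllegalArgumentException("order has no items");
        }
    }
}

class OrderPricer {
    int total(List<Integer> itemPricesInPence) {
        int sum = 0;
        for (int price : itemPricesInPence) {
            sum += price;
        }
        return sum;
    }
}

// The service only coordinates; each collaborator can be unit tested on its own.
class OrderService {
    private final OrderValidator validator;
    private final OrderPricer pricer;

    OrderService(OrderValidator validator, OrderPricer pricer) {
        this.validator = validator;
        this.pricer = pricer;
    }

    int placeOrder(List<Integer> itemPricesInPence) {
        validator.validate(itemPricesInPence);
        return pricer.total(itemPricesInPence);
    }
}
```

Each branch now lives in a class small enough to test directly, so the tests for `OrderService` only need to check that the steps happen in the right order.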

One of the promises of microservices is that they should be nimble, something that can be quickly built and replaced. But such large classes produce microservices which are, basically, tiny monoliths. The complex tests act as a drag on refactoring, making the services little tangles of legacy code.

The Single Responsibility Principle is the sort of thing that comes up in interviews as one of the SOLID Principles, and I’ve never heard anyone argue that it’s a bad thing. Everyone seems to agree that god classes are a bad thing too, which makes it all the stranger that the principle does not seem to be applied in practice.

One answer here, which I’ve proposed before, is to use TDD properly. This is the ideal way to solve the problem, preventing it from happening by applying best practice. In his recent book on Software Engineering, Dave Farley suggests that proper use of TDD avoids this sort of coupled code:

The strongest argument against TDD that I sometimes hear is that it compromises the quality of design and limits our ability to change code, because the tests are coupled to the code. I have simply never seen this in a codebase created with “test-first TDD.” It is common—I’d say inevitable—as a result of “test-after unit testing,” though. So my suspicion is that when people say “TDD doesn’t work,”  what they really mean is that they haven’t really tried TDD, and while I am sure that this is probably not true in all cases, I am equally certain that it is true in the majority and so a good approximation for truth.

The other potential solution is to enforce good class design with method size limits in quality-checking tools such as Sonar. This restricts developer autonomy in an unpleasant manner, although this is better than the alternative of unmaintainable code. Farley suggests using tools to reject any method of more than a certain number of lines and parameters. He writes:

I will establish a check in the continuous delivery deployment pipeline, in the “commit stage,” that does exactly this kind of test and rejects any commit that contains a method longer than 20 or 30 lines of code. I also reject method signatures with more than five or six parameters. These are arbitrary values, based on my experience and preferences with the teams that I have worked on.

There are actually good arguments for this in that, as Farley points out, “Most optimizers in compilers simply give up trying once the cyclomatic complexity of a block of code exceeds some threshold”. But the most important thing here is that such limits push people away from writing business actions as procedural, linear code, and towards decomposing them into single-responsibility classes. There are ways to write poor code within these constraints, but it’s not so easy to do.

java testing

Mutation Testing can help write better unit tests

I was introduced to mutation testing in my last job and I am very excited about its potential. Mutation testing evaluates how good a set of unit tests is. We used pitest and, applying it against an existing project, discovered that a number of tests were not working as they should have been, despite providing code coverage. We also found a couple of minor bugs.

Mutation testing works by changing the bytecode for a piece of software then running the tests against this changed code. In theory, one of the tests providing coverage for that line ought to fail if the line changes. If this is not the case, then the code coverage is not actually asserting anything about that piece of code. A good introduction is a video by pitest’s creator, Henry Coles, Testing Like It’s 1971. (The title refers to the fact that mutation testing was invented in the 1970s but is only now achieving its potential).
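A hand-written illustration may help here. The `Discount` class, its rule and the ‘mutant’ below are all hypothetical; a tool such as pitest produces its mutants automatically from the bytecode, so there would be no `qualifiesMutant` method in a real project.

```java
import java.util.function.IntPredicate;

// Hypothetical example: the class, the rule and the hand-written mutant
// are for illustration only.
class Discount {
    // Original rule: orders of 100 or more qualify.
    static boolean qualifies(int total) {
        return total >= 100;
    }

    // The kind of change a mutation tool might make: >= becomes >.
    static boolean qualifiesMutant(int total) {
        return total > 100;
    }
}

class DiscountChecks {
    // Executes the line, so it counts as coverage, but it cannot tell the
    // original and the mutant apart: the mutant survives.
    static boolean weakCheck(IntPredicate rule) {
        return rule.test(150) && !rule.test(50);
    }

    // Adds the boundary value, so the mutant fails this check and is killed.
    static boolean strongCheck(IntPredicate rule) {
        return weakCheck(rule) && rule.test(100);
    }
}
```

Both versions of the rule pass `weakCheck`, which is exactly the situation where coverage looks fine but nothing is asserted about the boundary. Only `strongCheck` fails for the mutant.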

I’d expected mutation testing to be painfully slow, but pitest can work through large code-bases surprisingly quickly. In smaller experiments, I found I could use pitest as part of the TDD cycle with little pain.

Working with mutation testing forces code coverage to be very high. It’s easy to exclude certain external calls, but all the other code within a project will need to be both covered and asserted. For some legacy codebases, adding such high coverage is going to be difficult. High coverage without TDD often produces brittle codebases that are hard to refactor, and adding tests retrospectively to these is expensive. Rather than using mutation testing for such codebases, it is probably more important to look initially at breaking down the code from large business logic classes (sometimes known as God classes) into smaller classes using the single responsibility principle.

But that’s another story. Whatever your situation, it’s worth looking into mutation testing, and thinking about how you can introduce it into your software build process.


Why Java Still Matters

One of the last things I did before finishing at Mindera was to write a blog post, Why Java Still Matters. This piece begins by looking at the history of Java, particularly the wilderness years, which I’ve previously written about in my post on Bruce A Tate’s Beyond Java.

The Mindera piece goes on to argue that Java’s lack of sophistication, often seen as a weakness, is actually a strength. For me, Java is a more robust language than many of the alternatives – although new features are diluting this.

Java is now over a quarter of a century old. It emerged on a wave of hype in 1996, promising to be a programming language for the Internet. But, unfortunately, it very soon came to feel awkward and was mocked as a boring, corporate language. Ten years later, people were writing book-length obituaries for Java, suggesting that developers move on.

You can read the full post on the Mindera blog.

One thing I couldn’t quite squeeze into the post was a discussion of how applets were withdrawn. I’d have loved to add a link to Simon Ritter’s post No Longer the Applet of the Developer’s Eye, where he tries to run a 1996 demo in Java 8.


Java Peaked With Version 7

Next week sees the launch of Java 17. This is the eighth release since Java moved to its six-monthly release cycle, which means eight releases have taken place in less time than it took to go from version 6 to 7.

But, personally, I think the best version of Java is Java 7. The first time I said this I was joking, having just wrestled with some very complicated code that someone had written using streams. But the more I think about this, the more certain I am that this is right.

I can understand the pressure to keep adding features to improve Java. I remember the noughties, when everyone said Java was going to be replaced by newer, cooler languages. Java might be verbose and clunky but it’s also consistent. Pre-Java 8, there tended to be only one or two ways of doing things, and code would look relatively similar between different companies. Being a simple language, it was easy for developers to follow what was happening. There was less space for clever code that junior developers couldn’t understand. 

Java 8 was exciting, providing new paradigms for Java. But it’s made code more idiomatic and it’s easier to write obfuscated code. Compare it with Perl, a language that was intentionally designed to be expressive. It’s not used in many large-scale systems.
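As a small illustration of there now being more than one way of doing things, here is the same logic written in the Java 7 style and as a Java 8 stream. The `NameFilters` class and its task are made up for the comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example: the class and the task are illustrative only.
class NameFilters {
    // Java 7 style: an explicit loop.
    static List<String> longNamesLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() > 3) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // Java 8 style: the same logic as a stream pipeline.
    static List<String> longNamesStream(List<String> names) {
        return names.stream()
                .filter(name -> name.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```

At this size the stream version is arguably the clearer of the two. The trouble starts when pipelines grow nested collectors and long lambdas, at which point the loop would have been easier to follow.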

I can understand the pressure to add new features. Sometimes Java is frustrating. But Kotlin and Scala have set out their stalls as advanced languages on the JVM. They’re compatible with Java too, so there’s a strong argument for keeping Java as the dull, boring option. I might want to use Scala or Kotlin in my own projects, but where I’m collaborating with multinational teams working agilely (which, in practice, means no documentation) I like my Java code as simple as it can be.