Categories
GenAI SpringAI

Retrieval-augmented generation using SpringAI

On Tuesday, after a long day working in Leeds, I came home and decided to play with SpringAI, trying to see if I could set up a retrieval-augmented generation example. It took me just over an hour to get something running.

The documentation for SpringAI feels a little shinier and more solid than that for LangChain4j. Both projects have similar aims, providing abstractions for working with common AI tools, and both are explicitly inspired by the LangChain project.

As with LangChain4j, there were issues caused by rapid changes in the project’s APIs. I started work with an example built against Azure OpenAI. It was simple enough to switch this to working against OpenAI, requiring just a change in Spring dependencies and a few properties – Spring magic did the rest. The main problem was updating the code from 0.2.0-SNAPSHOT to 0.8.0-SNAPSHOT (I’d not realised how old the example I’d started with was).

The actual code itself is, once again, very simple. When the application receives a query, it uses the SpringAI org.springframework.ai.reader.JsonReader class to load a document – in this case one about bikes from the original project – and divides it into chunks. Each of these chunks is run through an org.springframework.ai.embedding.EmbeddingClient, which produces a vector describing that chunk, and these are placed in an org.springframework.ai.vectorstore.SimpleVectorStore. Once I’d found the updated classes, the APIs were all very straightforward to work with.
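The chunk, embed and store flow can be sketched in plain Java. This is not the SpringAI API: the embed method is a toy bag-of-words hash standing in for a real EmbeddingClient, ToyVectorStore is a hypothetical stand-in for SimpleVectorStore, and the catalog sentences are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RagIngestSketch {

    // Size of the toy embedding vectors (an arbitrary choice for this sketch)
    static final int DIMENSIONS = 64;

    // Toy embedding: hash each word into a bucket and count occurrences.
    // A real embedding model would capture meaning, not just shared words.
    static double[] embed(String text) {
        double[] vector = new double[DIMENSIONS];
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                vector[Math.floorMod(word.hashCode(), DIMENSIONS)] += 1.0;
            }
        }
        return vector;
    }

    // Cosine similarity between two vectors of equal length
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // A stored chunk together with its embedding
    record Entry(String chunk, double[] vector) {}

    // Hypothetical in-memory store: keeps every entry and scans them all on search
    static class ToyVectorStore {
        private final List<Entry> entries = new ArrayList<>();

        void add(String chunk) {
            entries.add(new Entry(chunk, embed(chunk)));
        }

        // Returns the topK chunks most similar to the query, best match first
        List<String> similaritySearch(String query, int topK) {
            double[] queryVector = embed(query);
            return entries.stream()
                    .sorted(Comparator.comparingDouble((Entry e) -> -cosine(queryVector, e.vector())))
                    .limit(topK)
                    .map(Entry::chunk)
                    .toList();
        }
    }

    public static void main(String[] args) {
        ToyVectorStore store = new ToyVectorStore();
        store.add("The SwiftRide Hybrid is a commuter bike with a light frame");
        store.add("The TrailBlazer is a mountain bike with front suspension");
        store.add("Our helmets come in three sizes");
        System.out.println(store.similaritySearch("Which bike has suspension?", 1));
    }
}
```

A linear scan over every entry is fine for a single small document; a larger corpus would want a proper vector database.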

An incoming query is then compared against the document database to find likely matches. These are then compiled into a SystemQuery template, which contains a natural-language prompt explaining the LLM’s role in this application (You’re assisting with questions about products in a bicycle catalog). The application sends the SystemQuery alongside the specific UserQuery, which contains the user’s submitted question.
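A minimal plain-Java sketch of that prompt assembly follows. The Message record, the {documents} placeholder and the second line of the template are assumptions made for illustration; they are not the SpringAI classes or the template from the original example.

```java
import java.util.List;

class PromptAssemblySketch {

    // Hypothetical message type: a role ("system" or "user") plus its text
    record Message(String role, String content) {}

    // System prompt with a placeholder for the retrieved chunks.
    // Only the first sentence comes from the example; the rest is assumed.
    static final String SYSTEM_TEMPLATE = """
            You're assisting with questions about products in a bicycle catalog.
            Use only the information in the DOCUMENTS section to answer.

            DOCUMENTS:
            {documents}
            """;

    // Fills the template with the retrieved chunks and pairs it with the user's question
    static List<Message> buildMessages(List<String> retrievedChunks, String userQuestion) {
        String documents = String.join("\n", retrievedChunks);
        String systemText = SYSTEM_TEMPLATE.replace("{documents}", documents);
        return List.of(new Message("system", systemText), new Message("user", userQuestion));
    }

    public static void main(String[] args) {
        List<Message> messages = buildMessages(
                List.of("The SwiftRide Hybrid size L suits riders 175 to 186 cm tall."),
                "Will the SwiftRide Hybrid fit someone who is 190 cm tall?");
        messages.forEach(m -> System.out.println(m.role() + ": " + m.content()));
    }
}
```

Every retrieved chunk is pasted into the system message, so the request grows with the amount of matched text. That is where most of the tokens in a RAG query go.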

The responses from the ChatGPT4 model combined the user query with the document, producing obviously relevant responses in natural language. For example:

The SwiftRide Hybrid’s largest size (L) is suitable for riders with a height of 175 – 186 cm (5’9″ – 6’1″). If the person is taller than 6’1″, this bike may not be the best fit.

Playing around with this was not cheap – the RAG method sends a lot of data to OpenAI, and was burning through $0.10-$0.16 worth of tokens in each query. I also managed to hit my account’s rate limit of 10000 per minute playing with this. I’m not sure how feasible using the OpenAI model in production would be.

Notes and follow-ups

  • I need to put some of the code onto GitHub to share.
  • I’m fascinated by how part of the application is a natural-language prompt to tell ChatGPT how to respond. Programming LLMs is spooky, very close to asking a person to pretend they’re doing a role.
  • In production, this sort of application would require a lot of protection – some of which would use natural language instructions, but there are also models specifically for this role.
  • The obvious improvement here is to use a local model and see how effective that is.
Categories
GenAI LangChain4j

LangChain4j and local models

A colleague told me about Ollama, which allows you to get LLMs working on a local machine. I was so excited about this that I downloaded the orca-mini model. Due to terrible hotel wifi I used my mobile internet and blew out the limit on that. Oops.

Anyway, it is very easy to get Ollama working. Just download and install the software, then run a model with, for example, ollama run orca-mini. It has a simple REST interface:

curl -X POST http://localhost:11434/api/generate -d '{
    "model": "orca-mini",
    "prompt": "tell me a joke"
}'
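The same call can be made from Java with the JDK’s built-in HttpClient. This is a rough sketch rather than a robust client: the buildRequest helper is a name made up here, the JSON is built by naive string concatenation, and it assumes an Ollama server is running locally with the orca-mini model available. The stream flag is set to false so that the reply arrives as a single JSON object rather than a stream of partial responses.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class OllamaRestSketch {

    // Builds the POST request for Ollama's generate endpoint.
    // Naive JSON building: fine for simple strings without quotes,
    // but real code should use a JSON library to escape the values.
    static HttpRequest buildRequest(String model, String prompt) {
        String body = "{\"model\": \"" + model + "\", \"prompt\": \"" + prompt + "\", \"stream\": false}";
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Requires a local Ollama server; fails with a connection error otherwise
        HttpResponse<String> response = client.send(
                buildRequest("orca-mini", "tell me a joke"),
                HttpResponse.BodyHandlers.ofString());
        // The body is a JSON document; the generated text is in its "response" field
        System.out.println(response.body());
    }
}
```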

It was easy enough to get this working with LangChain4j, although the APIs were not quite the same as for the OpenAI models.


import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;

public class App
{
    public static void main( String[] args )
    {
        // Name of a model that has already been downloaded by Ollama
        String modelName = "orca-mini";
        // Address of the local Ollama server
        String localUrl = "http://localhost:11434/";

        ChatLanguageModel model = OllamaChatModel.builder()
                .baseUrl(localUrl)
                .modelName(modelName)
                .build();

        // generate(String) sends a single user message and returns the reply text
        String answer = model.generate("tell me a joke about space");

        System.out.println("Reply\n" + answer);
    }
}

While these local models are less powerful than OpenAI’s, they seem fairly decent on a first examination. They are also a much cheaper way to work with an LLM, and I am going to use this to set up a simple RAG (retrieval-augmented generation) example in LangChain4j.

Categories
GenAI LangChain4j

First steps with LangChain4j

I found myself with some free time this week when train problems forced me to travel from Manchester to Sheffield via Leeds. I used that delay to set up a basic ‘Hello World’ example using LangChain4j. This proved a touch harder than expected.

The example on https://langchain4j.github.io/langchain4j/docs/get-started/ used a generate method on ChatLanguageModel that didn’t work for the latest versions of the libraries (0.26.1 at the time of writing).

Not a helpful example…

I soon cobbled together some working code using the latest version of the langchain4j-core and langchain4j libraries as well as a langchain4j-open-ai dependency. I originally used a couple of hello world queries, which produced boring responses, so I decided to ask OpenAI to tell me a joke.

package com.orbific;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.model.output.Response;

public class App
{
    public static void main( String[] args )
    {
        // ApiKeys is assumed to be a class in this package that holds the OpenAI API key
        ChatLanguageModel model = OpenAiChatModel.builder()
                .apiKey(ApiKeys.OPENAI_API_KEY)
                .build();

        String message = "Tell me a joke.";
        // The prompt is the user's turn in the conversation, so it is sent as a
        // UserMessage; AiMessage is the type used for the model's replies
        UserMessage mine = UserMessage.from(message);
        Response<AiMessage> answer = model.generate(mine);
        System.out.println("Reply\n" + answer.content().text());
    }
}

The response made me smile:

Why don’t scientists trust atoms?

Because they make up everything!

What’s weird is that I kept getting the same joke, even when setting a higher temperature in the model or rephrasing the query. But requesting a joke about cats produced a pun about cheetahs. And asking repeatedly for jokes about underwater creatures brings back different responses. There’s obviously something here that I’m missing.

I set up a paid ChatGPT account, but that did not seem to grant me access to the API, and I had to top up some credits as well. I’m not entirely sure whether I needed the paid account, so I will look into that before the subscription renews.

There’s an interesting question as to whether it would have been faster for me to read the documentation rather than flail around for a solution, but that’s the whole point of a quickstart, right? Although my flailing wasn’t helped much by tiredness and a dodgy mobile internet connection.

I have a genuine excitement about getting this working. It’s not much, but it opens up some exciting possibilities. Now to go and read some documentation.

Categories
serverless

Thoughts about serverless

I drafted this post some time in 2022, and never got around to posting it. I wanted to publish it as it contains some good links and thinking points.

My last role gave me a chance to play more with serverless code in the form of AWS Lambda. While the issues around cold starts still need managing in some way, I am excited about serverless as a technology and think it should be more widely adopted.

The main advantage of serverless is not having to think about servers. They are still there, but can be mostly ignored. As Justin Etheredge neatly put it, “Managing servers is a nasty side effect of wanting to execute code.”

Not having to think about servers means a lot of things become simpler. Most compelling of these is having a smaller attack surface against hackers. Another is not having to maintain servers. Amazon has dedicated engineers responsible for managing the machines and upgrading them, and has the advantage of massive economies of scale. Companies can focus on the code that delivers the value for their customers.

We’ve moved from having ‘servers-as-pets’, keeping the same instance running for months; to ‘servers-as-cattle’ with Puppet to create new ones; to ephemeral containers – but we still have to manage resources, even if they’re just Docker config files. This is a very different role to programming, and leads to the dev/ops split. All servers are a drag, even if they are containers being managed by Kubernetes.

Which is not to say that serverless means being able to ignore ops completely, as Charity Majors has explained. Observability is vital, and you will still encounter issues where the abstractions of serverless leak through. The structure of an application comes to contain a significant amount of logic (for example which queues connect serverless applications) and one needs to be careful of this.

For me, one of the main advantages of serverless is that it enforces good behaviour. AWS Lambda is inherently stateless, since any state can last only for a single request. Paying for the time a request takes focuses developers on writing smaller pieces of code, thereby following more effective cloud patterns. The ease of adding lambdas also avoids the problem with persistent servers where it is easier to add to an existing microservice than handle the overhead of creating a new one, even where it is necessary.

One of the risks is lock-in. From a code point of view, serverless abstractions have appeared, and well-written code ought to be easy to port. However, moving the data for a cloud application would likely be fearsome and expensive, and I’ve not seen much writing about how that would occur. Picking serverless over container-based code is probably the least of your problems with that sort of migration.

Another issue is that serverless is not perfect for all situations – long running processes or those dealing with calls to high-latency services are probably better handled by container-based services – although I think people do not make enough use of serverless.

I’ve seen less discussion than I would expect of serverless as a hobbyist option. In one way, it’s as straightforward as cgi-bin, but there is the risk of cost, given that you’re paying for every bot that visits your application. Having said that, serverless applications can still be as cost-effective as hosted applications for small-scale apps. The monitoring and management of AWS costs is an ongoing problem.

Gunnar Morling gave a good talk at QCon, Serverless Search for My Blog with Java, Quarkus & AWS Lambda, which explored all aspects of using serverless for a hobby project. There is also Robin Sloan’s discussion of cloud on his blog, including how he uses a hack to get around the cold-start issue. Such hacks are probably more relevant to hobby sites than production software, but his discussion of the topic is illuminating.

Categories
java programming springboot

Refactoring and microservices

In recent cloud projects, I keep seeing the same Spring application anti-pattern. There are controllers for a number of REST endpoints. Each REST endpoint calls a separate class, which carries out the business logic for that action. The problem is that such classes can easily grow to a thousand lines or more, and I’ve often seen single methods over a hundred lines long – an anti-pattern sometimes referred to as ‘god classes’. Code is sometimes extracted to private methods within these classes, which can obscure that there is a single execution flow hundreds of lines long. The addition of unit testing means that long, repetitive tests with complicated set-up are needed to provide coverage for branches deep within these classes. These complicated tests then make it difficult to refactor the code.

This problem comes from applying sensible principles in the wrong way. We have the Controller logic separate from the Business logic, and the Model managed by Spring Data classes. It’s a rough MVC pattern – and Spring makes this separation very easy. The problem is that the Controller logic is usually trivial, just an annotation that might as well have been put on the Service class. It’s this Service class that you really want to be split out into smaller classes.
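As a rough sketch of what splitting the Service class might look like, the example below breaks an order-handling flow into small collaborators. All of the names (OrderLine, OrderValidator, PriceCalculator, OrderService) are hypothetical, and the Spring annotations are left out so that it runs as plain Java.

```java
import java.util.List;

class ServiceSplitSketch {

    record OrderLine(String sku, int quantity, double unitPrice) {}

    // Single responsibility: decide whether an order is acceptable
    static class OrderValidator {
        void validate(List<OrderLine> lines) {
            if (lines.isEmpty()) {
                throw new IllegalArgumentException("Order has no lines");
            }
            for (OrderLine line : lines) {
                if (line.quantity() <= 0) {
                    throw new IllegalArgumentException("Bad quantity for " + line.sku());
                }
            }
        }
    }

    // Single responsibility: work out what an order costs
    static class PriceCalculator {
        double total(List<OrderLine> lines) {
            return lines.stream().mapToDouble(l -> l.quantity() * l.unitPrice()).sum();
        }
    }

    // The service only orchestrates; each step can be tested on its own
    static class OrderService {
        private final OrderValidator validator;
        private final PriceCalculator calculator;

        OrderService(OrderValidator validator, PriceCalculator calculator) {
            this.validator = validator;
            this.calculator = calculator;
        }

        double placeOrder(List<OrderLine> lines) {
            validator.validate(lines);
            return calculator.total(lines);
        }
    }

    public static void main(String[] args) {
        OrderService service = new OrderService(new OrderValidator(), new PriceCalculator());
        List<OrderLine> order = List.of(new OrderLine("bike", 1, 500.0), new OrderLine("helmet", 2, 25.0));
        System.out.println("Total: " + service.placeOrder(order));
    }
}
```

The tests for OrderValidator and PriceCalculator need almost no set-up, and the test for OrderService only has to check that the steps are wired together in the right order.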

One of the promises of microservices is that they should be nimble, something that can be quickly built and replaced. But such large classes produce microservices which are, basically, tiny monoliths. The complex tests act as a drag on refactoring, making the services little tangles of legacy code.

The Single Responsibility Principle is the sort of thing that comes up in interviews as one of the SOLID Principles, and I’ve never heard anyone argue that it’s a bad thing. Everyone seems to agree that god classes are a bad thing too, which makes it all the stranger that the principle does not seem to be applied in practice.

One answer here, which I’ve proposed before, is to use TDD properly. This is the ideal way to solve the problem, preventing it from happening by applying best practice. In his recent book on Software Engineering, Dave Farley suggests that proper use of TDD avoids this sort of coupled code:

The strongest argument against TDD that I sometimes hear is that it compromises the quality of design and limits our ability to change code, because the tests are coupled to the code. I have simply never seen this in a codebase created with “test-first TDD.” It is common—I’d say inevitable—as a result of “test-after unit testing,” though. So my suspicion is that when people say “TDD doesn’t work,”  what they really mean is that they haven’t really tried TDD, and while I am sure that this is probably not true in all cases, I am equally certain that it is true in the majority and so a good approximation for truth.

The other potential solution is to enforce good class design with method size limits in quality-checking tools such as Sonar. This restricts developer autonomy in an unpleasant manner, although this is better than the alternative of unmaintainable code. Farley suggests using tools to reject any method of more than a certain number of lines and parameters. He writes:

I will establish a check in the continuous delivery deployment pipeline, in the “commit stage,” that does exactly this kind of test and rejects any commit that contains a method longer than 20 or 30 lines of code. I also reject method signatures with more than five or six parameters. These are arbitrary values, based on my experience and preferences with the teams that I have worked on.

There are actually good arguments for this in that, as Farley points out, “Most optimizers in compilers simply give up trying once the cyclomatic complexity of a block of code exceeds some threshold”. But the most important thing here is that such limits force people out of writing procedural, linear code to produce business actions, and decompose these into single-responsibility classes. There are ways to write poor code within these constraints, but it’s not so easy to do.

Categories
java testing

Mutation Testing can help write better unit tests

I was introduced to mutation testing in my last job and I am very excited about its potential. Mutation testing evaluates how good a set of unit tests is. We used pitest and, applying it against an existing project, discovered a number of tests were not working as they should have been, despite providing code coverage. We also found a couple of minor bugs.

Mutation testing works by changing the bytecode for a piece of software then running the tests against this changed code. In theory, one of the tests providing coverage for that line ought to fail if the line changes. If this is not the case, then the code coverage is not actually asserting anything about that piece of code. A good introduction is a video by pitest’s creator, Henry Coles, Testing Like It’s 1971. (The title refers to the fact that mutation testing was invented in the 1970s but is only now achieving its potential).
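The idea can be shown by hand in a few lines of Java. This sketch does not use pitest or touch bytecode: the mutant is written out manually, and the isAdult rule and the two tests are invented for illustration. Changing >= to > is the kind of change pitest’s conditionals boundary mutator makes.

```java
import java.util.function.IntPredicate;

class MutationSketch {

    // The code under test
    static final IntPredicate ORIGINAL = age -> age >= 18;

    // A hand-written mutant: the boundary condition has been changed
    static final IntPredicate MUTANT = age -> age > 18;

    // Weak test: executes the line (so it counts as coverage) but never checks the boundary
    static boolean weakTestPasses(IntPredicate isAdult) {
        return isAdult.test(30) && !isAdult.test(5);
    }

    // Strong test: also asserts the behaviour at exactly 18
    static boolean strongTestPasses(IntPredicate isAdult) {
        return isAdult.test(30) && !isAdult.test(5) && isAdult.test(18);
    }

    public static void main(String[] args) {
        // A mutant is "killed" when a test that passes on the original fails on the mutant
        System.out.println("Weak test kills mutant: " + !weakTestPasses(MUTANT));
        System.out.println("Strong test kills mutant: " + !strongTestPasses(MUTANT));
    }
}
```

Both tests give full line coverage of the original code, but only the strong one fails when the code is changed. A surviving mutant points at an assertion that is missing.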

I’d expected mutation testing to be painfully slow, but pitest can work through large code-bases surprisingly quickly. In smaller experiments, I found I could use pitest as part of the TDD cycle with little pain.

Working with mutation testing forces code coverage to be very high. It’s easy to exclude certain external calls, but all the other code within a project will need to be both covered and asserted. For some legacy codebases, adding such high coverage is going to be difficult. High coverage without TDD often produces brittle codebases that are hard to refactor, and adding tests retrospectively to these is expensive. Rather than using mutation testing for such codebases, it is probably more important to look initially at breaking down the code from large business logic classes (sometimes known as God classes) into smaller classes using the single responsibility principle.

But that’s another story. Whatever your situation, it’s worth looking into mutation testing, and thinking about how you can introduce it into your software build process.

Categories
java

Why Java Still Matters

One of the last things I did before finishing at Mindera was to write a blog post, Why Java Still Matters. This piece begins by looking at the history of Java, particularly the wilderness years, which I’ve previously written about in my post on Bruce A Tate’s Beyond Java.

The Mindera piece goes on to argue that Java’s lack of sophistication, often seen as a weakness, is actually a strength. For me, Java is a more robust language than many of the alternatives – although new features are diluting this.

Java is now over a quarter of a century old. It emerged on a wave of hype in 1996, promising to be a programming language for the Internet. But, unfortunately, it very soon came to feel awkward and was mocked as a boring, corporate language. Ten years later, people were writing book-length obituaries for Java, suggesting that developers move on.

You can read the full post on the Mindera blog.

One thing I couldn’t quite squeeze into the post was a discussion of how applets were withdrawn. I’d have loved to add a link to Simon Ritter’s post No Longer the Applet of the Developer’s Eye, where he tries to run a 1996 demo in Java 8.

Categories
books programming

A review of Dave Farley’s Modern Software Engineering

My colleague Luke Punnett recently recommended Dave Farley’s book ‘Modern Software Engineering’. While it’s not quite a classic, it’s a superb summary of the state of the art in software development. Anyone writing enterprise software should read this, and ideally follow the book’s advice.

Farley attempts to build a foundation for software engineering as a discipline using the scientific method. Writing code is formalised as a series of experiments, set within the process of ‘characterise, hypothesise, predict and experiment’: “Software engineering is the application of an empirical, scientific approach to finding efficient, economic solutions to practical problems in software.”

What was most valuable about this book for me was getting a glimpse of how a very good and experienced software engineer approaches his work (Farley mentions a couple of times that he was involved in the LMAX exchange project).

There are several topics on which the book is particularly strong. Farley approaches agile from a fresh angle, renewing my faith in it – a faith that had been ground down by SAFe and ritualised agile processes. Farley is also excellent on test-driven development, arguing that TDD is not about producing code coverage, but a method of working that produces “a pressure … to write code that is more testable”. Farley then argues that testable code has the same attributes as code that is easy to maintain. There is also some excellent discussion of the pros and cons of microservices, arguing that their main strength is in allowing smaller, more focussed teams.

Robert C Martin’s Clean Code feels like it has reached the end of its life. People are less comfortable with some of the guidelines that it proposes. Farley’s book is, I think, an excellent replacement. It is short and well-argued, and sets out a clear case for its recommendations. Some of the things Farley proposes – TDD, microservices, test automation etc – are still controversial in some companies. Hopefully this book will help towards their wider adoption.

Categories
Uncategorized

Developer Experience

I’ve written a post on the Mindera company blog, Why Everyone Needs to be Thinking About Developer Experience. It’s a topic I don’t think gets addressed enough. Too many companies erect unnecessary barriers to their developers producing good work.

Developer Experience is the idea of ensuring that developers’ tools, practices and working environment are as good as they can be to support their job.

It emerged from considering User Experience (UX) for developer-focused products. Some companies use these ideas to assess their internal platforms and processes. It’s an idea that shouldn’t seem radical — improving developers’ experience increases the speed and quality of their work.

Read more on the Mindera blog…

Categories
Uncategorized

Time to stop lying about TDD

I’ve just had a post published on the Mindera company blog, It’s Time to Stop Lying About TDD. This post came from a frustration with both companies and developers using the term TDD (Test-Driven Development) when what they really mean is that they have a code-coverage threshold. Adding coverage without using TDD removes many of the benefits of the practice. TDD is hard and takes practice to get good at, and few developers give it the time needed.

Many of the CVs that I see as a Java interviewer for Mindera mention test-driven development (TDD). It’s in most of the job roles I see advertised too. But I am fairly certain that most people are not actually doing TDD.

How do I know this? Because when we do live-coding exercises, very few candidates write tests, let alone do them before they add any code. It’s the same with the code samples sent in — very few of them include tests. Maybe these developers are using TDD, but they are not demonstrating it.

Read more on the Mindera blog…