Retrieval-augmented generation using SpringAI

On Tuesday, after a long day working in Leeds, I came home and decided to play with SpringAI, trying to see if I could set up a retrieval-augmented generation example. It took me just over an hour to get something running.

The documentation for SpringAI feels a little shinier and more solid than that for LangChain4j. Both projects have similar aims, providing abstractions for working with common AI tools and both are explicitly inspired by the LangChain project.

As with LangChain4j, there were issues caused by rapid changes in the project’s APIs. I started work with an example built against OpenAI Azure. It was simple enough to switch this to working against OpenAI, requiring just a change in Spring dependencies and a few properties – Spring magic did the rest. The main problem was updating the code from 0.2.0-SNAPSHOT to 0.8.0-SNAPSHOT (I’d not realised how old the example I’d started with was).

The actual code itself is, once again very simple. When the application receives a query, it uses the SpringAI org.springframework.ai.reader.JsonReader class to load a document – in this case one about bikes from the original project – and divides it into chunks. Each of these chunks are run through a org.springframework.ai.embedding.EmbeddingClient, which produces a vector describing that chunk, and these are placed in a org.springframework.ai.vectorstore.SimpleVectorStore. Once I’d found the updated classes, the APIs were all very straightforward to work with.

An incoming query is then compared against the document database to find likely matches – these are then compiled into a SystemQuery template, which contains a natural-language prompt explaining the LLMs role in this application (You’re assisting with questions about products in a bicycle catalog). The SystemQuery is sent by the application alongside the specific UserQuery, which contains the user’s submitted question.

The responses from the ChatGPT4 model combined the user query with the document, producing obviously relevant responses in natural language. For example:

The SwiftRide Hybrid’s largest size (L) is suitable for riders with a height of 175 – 186 cm (5’9″ – 6’1″). If the person is taller than 6’1″, this bike may not be the best fit.

Playing around with this was not cheap – the RAG method sends a lot of data to OpenAI, and was burning through $0.10-$0.16 worth of tokens in each query. I also managed to hit my account’s rate limit of 10000 per minute playing with this. I’m not sure how feasible using the OpenAI model in production would be.

Notes and follow-ups

I need to put some of the code into github to share.
I’m fascinated by how part of the application is a natural-language prompt to tell ChatGPT how to respond. Programming LLMs is spooky, very close to asking a person to pretend they’re doing a role.
In production, this sort of application would require a lot of protection – some of which would use natural language instructions, but there are also models specifically for this role.
The obvious improvement here is to use a local model and see how effective that is.

Recent Posts

Recent Comments

Archives

Categories

Leave a Reply Cancel reply