Categories
GenAI SpringAI

Retrieval-augmented generation using SpringAI

On Tuesday, after a long day working in Leeds, I came home and decided to play with SpringAI, trying to see if I could set up a retrieval-augmented generation example. It took me just over an hour to get something running.

The documentation for SpringAI feels a little shinier and more solid than that for LangChain4j. Both projects have similar aims, providing abstractions for working with common AI tools, and both are explicitly inspired by the LangChain project.

As with LangChain4j, there were issues caused by rapid changes in the project’s APIs. I started work with an example built against Azure OpenAI. It was simple enough to switch this to working against OpenAI directly, requiring just a change in Spring dependencies and a few properties – Spring magic did the rest. The main problem was updating the code from 0.2.0-SNAPSHOT to 0.8.0-SNAPSHOT (I’d not realised how old the example I’d started with was).

The code itself is, once again, very simple. When the application receives a query, it uses the SpringAI org.springframework.ai.reader.JsonReader class to load a document – in this case one about bikes from the original project – and divides it into chunks. Each of these chunks is run through an org.springframework.ai.embedding.EmbeddingClient, which produces a vector describing that chunk, and these vectors are placed in an org.springframework.ai.vectorstore.SimpleVectorStore. Once I’d found the updated classes, the APIs were all very straightforward to work with.
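The chunking step can be sketched independently of Spring AI’s classes. This toy splitter (my own illustration, not the library’s implementation) breaks text into fixed-size pieces with a small overlap, so content cut at a chunk boundary still appears whole in at least one chunk:

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Naive character-based splitter: real splitters work on tokens,
    // but the overlapping-window idea is the same.
    public static List<String> chunk(String text, int chunkSize, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // last chunk reached
        }
        return chunks;
    }
}
```

Each chunk would then be passed to the embedding client to get its vector before being stored.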

An incoming query is then compared against the document database to find likely matches – these are then compiled into a SystemQuery template, which contains a natural-language prompt explaining the LLM’s role in this application (“You’re assisting with questions about products in a bicycle catalog”). The SystemQuery is sent by the application alongside the specific UserQuery, which contains the user’s submitted question.
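Under the hood, “likely matches” means comparing the query’s embedding vector against each stored chunk’s vector, usually with cosine similarity. A minimal version in plain Java (the vectors here are hypothetical stand-ins for what the EmbeddingClient would produce):

```java
public class Similarity {
    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
    // Higher scores mean the two embeddings point in closer directions.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Chunks whose embeddings score highest against the query’s embedding are the ones compiled into the prompt.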

The responses from the GPT-4 model combined the user query with the document, producing obviously relevant responses in natural language. For example:

The SwiftRide Hybrid’s largest size (L) is suitable for riders with a height of 175 – 186 cm (5’9″ – 6’1″). If the person is taller than 6’1″, this bike may not be the best fit.

Playing around with this was not cheap – the RAG method sends a lot of data to OpenAI, and each query burned through $0.10–$0.16 worth of tokens. I also managed to hit my account’s rate limit of 10,000 tokens per minute while playing with this. I’m not sure how feasible using the OpenAI model in production would be.
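As a rough sanity check on those numbers: assuming GPT-4’s then-current prompt pricing of about $0.03 per 1K tokens (an assumption – check the current price list), a $0.12 query corresponds to roughly 4,000 tokens sent per request, which fits with stuffing several document chunks into every prompt:

```java
public class CostEstimate {
    // Back-of-envelope only; pricing figures are assumptions,
    // not taken from OpenAI's published rates at any given time.
    public static double tokensForCost(double dollars, double pricePer1kTokens) {
        return dollars / pricePer1kTokens * 1000;
    }
}
```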

Notes and follow-ups

  • I need to put some of the code on GitHub to share it.
  • I’m fascinated by how part of the application is a natural-language prompt telling ChatGPT how to respond. Programming LLMs is spooky: it’s very close to asking a person to play a role.
  • In production, this sort of application would require a lot of protection – some of which would use natural language instructions, but there are also models specifically for this role.
  • The obvious improvement here is to use a local model and see how effective that is.
Categories
GenAI LangChain4j

LangChain4j and local models

A colleague told me about Ollama, which lets you run LLMs on a local machine. I was so excited about this that I downloaded the orca-mini model immediately. Due to terrible hotel wifi I used my mobile internet, and blew through the limit on that. Oops.

Anyway, it is very easy to get Ollama working. Just download and install the software, then run ollama run orca-mini, which downloads the model and starts it. It has a simple REST interface:

curl -X POST http://localhost:11434/api/generate -d '{
    "model": "orca-mini",
    "prompt": "tell me a joke"
}'
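The same request can be built with Java’s standard HttpClient classes – a sketch assuming Ollama’s default port, with "stream": false added so the server returns a single JSON object rather than a token stream:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class OllamaRequest {
    // Builds the same request as the curl example above
    public static HttpRequest build(String model, String prompt) {
        String payload = String.format(
                "{\"model\": \"%s\", \"prompt\": \"%s\", \"stream\": false}",
                model, prompt);
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
    }
}
```

Actually sending it needs a running Ollama instance: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).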

It was easy enough to get this working with LangChain4J, although the APIs were not quite the same as for the OpenAI models.


import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;

public class App
{
    public static void main( String[] args )
    {
        String modelName = "orca-mini";
        String localUrl = "http://localhost:11434/";

        // Point LangChain4j's chat model at the local Ollama server
        ChatLanguageModel model =
                OllamaChatModel.builder().baseUrl(localUrl).modelName(modelName).build();

        String answer = model.generate("tell me a joke about space");

        System.out.println("Reply\n" + answer);
    }
}
While these local models are less powerful than OpenAI’s, they seem fairly decent on first examination. They are also a much cheaper way to work with an LLM, and I am going to use this to set up a simple RAG (retrieval-augmented generation) example in LangChain4J.

Categories
GenAI LangChain4j

First steps with LangChain4j

I found myself with some free time this week when train problems forced me to travel from Manchester to Sheffield via Leeds. I used that delay to set up a basic ‘Hello World’ example using LangChain4j. This proved a touch harder than expected.

The example on https://langchain4j.github.io/langchain4j/docs/get-started/ used a generate method on ChatLanguageModel that didn’t work for the latest versions of the libraries (0.26.1 at the time of writing).

Not a helpful example…

I soon cobbled together some working code using the latest version of the langchain4j-core and langchain4j libraries as well as a langchain4j-open-ai dependency. I originally used a couple of hello world queries, which produced boring responses, so I decided to ask OpenAI to tell me a joke.

package com.orbific;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.model.output.Response;

public class App
{
    public static void main( String[] args )
    {
        ChatLanguageModel model = OpenAiChatModel.builder()
                .apiKey(ApiKeys.OPENAI_API_KEY)
                .build();

        // Send the question as a UserMessage; the reply comes back
        // wrapped in a Response<AiMessage>
        String message = "Tell me a joke.";
        UserMessage question = UserMessage.userMessage(message);
        Response<AiMessage> answer = model.generate(question);
        System.out.println("Reply\n" + answer.content().text());
    }
}

The response made me smile:

Why don’t scientists trust atoms?

Because they make up everything!

What’s weird is that I kept getting the same joke, even when setting a higher temperature in the model or rephrasing the query. But requesting a joke about cats produced a pun about cheetahs. And asking repeatedly for jokes about underwater creatures brings back different responses. There’s obviously something here that I’m missing.

I set up a paid ChatGPT account, but that did not seem to grant me access to the API, and I had to top up some API credits as well. I’m not entirely sure whether I needed the paid account, so I will look into that before the subscription renews.

There’s an interesting question as to whether it would have been faster for me to read the documentation rather than flail around for a solution, but that’s the whole point of a quickstart, right? Although my flailing wasn’t helped much by tiredness and a dodgy mobile internet connection.

I have a genuine excitement about getting this working. It’s not much, but it opens up some exciting possibilities. Now to go and read some documentation.