I’ve been thinking recently about Retrieval-Augmented Generation (RAG) and vector databases, and I wanted to gather some notes towards a talk.
The question
Most of the applications I’m seeing for Generative AI seem to involve RAG. But I feel that the vector database is doing most of the interesting work here – and, in many cases, an LLM-generated response is not the best ‘view’ of the data that is returned. I want to dig a little more into vector databases, how they work, and what can be done with them.
Panda Smith wrote, “If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves”; I want to learn more about this search part of RAG.
Previous posts
- Playing with embeddings was a post I wrote in September ’24 looking at using vector databases for ‘vibes-based search’
- GenAI is already useful for historians discussed an article about a historian using GenAI to find diary entries relevant to their research
- Retrieval-augmented generation using SpringAI was a Spring AI RAG demo I built for a previous talk
Notes from Wikipedia
Pulling out some notes from the Wikipedia article:
- “[RAG] modifies interactions with a large language model so that the model responds to user queries with reference to a specified set of documents, using this information to supplement information from its pre-existing training data. This allows LLMs to use domain-specific and/or updated information.”
- There are two phases – information retrieval and response generation
- RAG was first proposed in the paper ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks’ in 2020
- The process involves the following stages (there’s a rough sketch of how they fit together after these notes):
- Indexing – the documents to be searched are stored – usually by converting the data into vector representations
- Retrieval – given a user query, a document retriever finds the most relevant documents for the query
- Augmentation – the documents are put in a prompt for an LLM
- Generation – the LLM generates a response based upon the prompt
- The ‘chunking’ of the documents (how they are divided up into pieces to be stored) affects how good the responses are.
- Risks of RAG
- While RAG reduces hallucinations in the responses, it cannot eliminate them.
- There is a danger of losing important context in the chunking phase.
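To make those four stages concrete, here is a minimal sketch in Python of how indexing, retrieval, augmentation and generation fit together, including a naive chunking step. This isn’t from the Wikipedia article: the chunk sizes, the toy hashed bag-of-words embed() and the stubbed generate() are my own placeholders – a real system would use an embedding model, a vector database and an actual LLM call.

```python
import numpy as np

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character chunks.
    The chunk size and overlap are the knobs that affect answer quality."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str, dims: int = 256) -> np.ndarray:
    """Toy embedding: a hashed bag of words, normalised to unit length.
    A real system would call an embedding model here."""
    v = np.zeros(dims)
    for token in text.lower().split():
        v[hash(token) % dims] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def generate(prompt: str) -> str:
    """Stub: this is where the call to an LLM would go."""
    return "<LLM response based on the prompt>"

# 1. Indexing: store a vector for every chunk of every document.
documents = ["...full text of document one...", "...full text of document two..."]
chunks = [c for doc in documents for c in chunk(doc)]
index = np.stack([embed(c) for c in chunks])

# 2. Retrieval: cosine similarity between the query and every stored vector
#    (the vectors are unit length, so a dot product is enough).
query = "What does the documentation say about indexing?"
scores = index @ embed(query)
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:3]]

# 3. Augmentation: put the retrieved chunks into the prompt.
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n---\n".join(top_chunks) + "\n\nQuestion: " + query
)

# 4. Generation: hand the augmented prompt to the LLM.
print(generate(prompt))
```

Even in this toy version, steps 1 and 2 are just a search engine – which is the point Panda Smith’s quote is making – and the chunking choices in step 1 decide what context the LLM ever gets to see.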
Interesting links
- Twitter thread by Jo Kristian Bergum on ‘The rise and fall of the vector database infrastructure category’
- The Best Way to Use Text Embeddings Portably is With Parquet and Polars – fascinating discussion of vector databases, using Magic: The Gathering cards as a dataset
- Embeddings: What they are and why they matter – text of a talk by Simon Willison: “Embeddings are a really neat trick that often come wrapped in a pile of intimidating jargon”