GenAI

Notes on RAG – Part 2

Continuing my research on RAG (part 1 here)

The week before last I worked on a simple example of RAG using Spring. This involved a lot of yak shaving, in part because Spring had updated their package structures since I last worked on my RAG demo. I also wanted to set up a local vector DB without using Docker, finally settling on MariaDB. In the end I had a simple example that took a CSV, inserted the rows as embeddings into a vector database and could run simple queries against it. I still need to tidy this up and upload it to GitHub.
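
As a rough illustration of the flow described above (CSV rows in, embeddings stored, simple queries out), here is a minimal Python sketch. It is not the Spring demo: the in-memory list stands in for MariaDB, `toy_embed` is a word-count stand-in for a real embedding model, and the CSV columns and sample rows are invented for the example.

```python
import csv
import io
import math
from collections import Counter

# Invented sample data; a real run would read a CSV file from disk.
CSV_TEXT = """title,description
Walking boots,sturdy leather boots for hill walking
Tent,lightweight two person tent for camping
Java book,introduction to programming in java
"""

def toy_embed(text):
    """Stand-in for an embedding model: a sparse vector of word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def load_rows(csv_text):
    """Embed each CSV row and keep (row, vector) pairs as a tiny 'vector store'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row, toy_embed(row["title"] + " " + row["description"])) for row in reader]

def query(store, text, k=2):
    """Return the k rows whose vectors are closest to the query text."""
    q = toy_embed(text)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [row for row, _ in ranked[:k]]

store = load_rows(CSV_TEXT)
results = query(store, "boots for walking")
print([row["title"] for row in results])
```

With a real embedding model the query would not need to share exact words with the rows; the word-count stand-in only matches on overlapping vocabulary.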

Interesting Links

I need to look into using vector databases for recommendation engines.
“If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves.” – link
Simon Willison gave a great talk about embeddings. This introduced the concept, showed how to do ‘vibes-based search’ against SQLite using llm, and talked about 2D visualisations.
- “Being able to spin up this kind of ultra-specific search engine in a few hours is exactly the kind of trick that excites me about having embeddings as a tool in my toolbox.”
- “A fascinating thing about RAG is that it has so many different knobs that you can tweak. You can try different distance functions, different embedding models, different prompting strategies and different LLMs. There’s a lot of scope for experimentation here.”
Adding semantic search to datasette
Lovely visualisation of 40 million Hacker News posts. This uses Uniform Manifold Approximation and Projection (UMAP) to reduce the vectors to a 2D geographic map (among other things). Some interesting applications of a massive dataset.
The llm tool can be used for image search using CLIP
This leads to someone searching images of faucets by image and phrase (what taps best represent the idea of ‘Bond villain?’)
According to ChatGPT, people have played with using vector databases to analyse recipes. There is a paper on this, ‘Learning Cross-modal Embeddings for Cooking Recipes and Food Images‘, but I can’t find any details on applications with it, experimental or otherwise.
Interesting discussion of search in RAG (look for the RAG section)
Text Embedding Models Contain Bias. Here’s Why That Matters – interesting Google paper from 2018
Using a vector database, SkyCLIP, and Leaflet to create a searchable aerial photograph
“This is why Retrieval-Augmented Generation (RAG) is not going anywhere. RAG is basically the practice of telling the LLM what it needs to know and then immediately asking it for that information back in condensed form. LLMs are great at it, which is why RAG is so popular.” link
Spring AI documentation on RAG

GenAI

Notes on Retrieval-Augmented Generation (part 1)

Post author By admin
Post date April 7, 2025

I’ve been thinking recently about Retrieval-Augmented Generation (RAG) and vector databases, and I wanted to gather some notes towards a talk.

The question

Most of the applications I’m seeing for Generative AI seem to involve RAG. But I feel that the vector database is doing most of the interesting work here – and, in a lot of cases, an LLM-generated response is not the best ‘view’ for the data that is returned. I want to dig a little more into vector databases, how they work, and what can be done with them.

Panda Smith wrote, “If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves”; I want to learn more about this search part of RAG.

Previous posts

Playing with embeddings was a post I wrote in September ’24 looking at using vector databases for ‘vibes-based search’
GenAI is already useful for historians discussed an article about a historian using GenAI to find diary entries relevant to their research
Retrieval-augmented generation using SpringAI was a Spring AI RAG demo I built for a previous talk

Notes from Wikipedia

Pulling out some notes from the Wikipedia article:

“[RAG] modifies interactions with a large language model so that the model responds to user queries with reference to a specified set of documents, using this information to supplement information from its pre-existing training data. This allows LLMs to use domain-specific and/or updated information.”
There are two phases – information retrieval and response generation
RAG was first proposed in the paper ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks’ in 2020
The process involves the following stages:
- Indexing – the documents to be searched are stored – usually by converting the data into vector representations
- Retrieval – given a user query, a document retriever finds the most relevant documents for the query
- Augmentation – the documents are put in a prompt for an LLM
- Generation – the LLM generates a response based upon the prompt
The ‘chunking’ of the documents (how they are divided up into pieces to be stored) affects how good the responses are.
Risks of RAG
- While RAG reduces hallucinations in the responses, it cannot eliminate them.
- There is a danger of losing important context in the chunking phase.
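
The four stages in the notes above, plus chunking, can be sketched in a few lines of Python. This is an illustration under loud assumptions, not a reference implementation: `toy_embed` is a word-count stand-in for an embedding model, `generate` is a placeholder where an LLM call would go, and the documents, chunk size and overlap are invented. The overlap between chunks is one common way to reduce the risk of losing context at chunk boundaries.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Stand-in for an embedding model: a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def chunk_words(text, size=8, overlap=2):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]

def build_index(documents, size=8, overlap=2):
    """Indexing: chunk every document and store (chunk, vector) pairs."""
    return [(chunk, toy_embed(chunk))
            for doc in documents
            for chunk in chunk_words(doc, size, overlap)]

def retrieve(index, question, k=1):
    """Retrieval: return the k chunks most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def augment(question, chunks):
    """Augmentation: put the retrieved chunks into a prompt."""
    context = "\n".join("- " + c for c in chunks)
    return "Answer using only this context:\n" + context + "\n\nQuestion: " + question

def generate(prompt):
    """Generation: placeholder only; a real system would call an LLM here."""
    return "[an LLM reply to a %d-character prompt would go here]" % len(prompt)

documents = [
    "Datasette is an open source tool for exploring and publishing data",
    "MariaDB can store vectors and run nearest neighbour queries",
    "Spring Boot makes Java popular on the cloud",
]
index = build_index(documents)
question = "Which database can store vectors?"
top_chunks = retrieve(index, question)
prompt = augment(question, top_chunks)
print(prompt)
print(generate(prompt))
```

Swapping the chunk size, the distance function or the prompt wording changes the result without touching the other stages, which is what makes each stage a separate knob to tweak.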

Interesting links

Twitter thread by Jo Kristian Bergum on ‘The rise and fall of the vector database infrastructure category’
The Best Way to Use Text Embeddings Portably is With Parquet and Polars – fascinating discussion of vector databases, using Magic: The Gathering cards as a dataset
Embeddings: What they are and why they matter Text of a talk by Simon Willison: “Embeddings are a really neat trick that often come wrapped in a pile of intimidating jargon”

java

Does Java have a future?

I’ve been working with Java for 25 years. After a shaky time in the noughties, the platform has thrived, and a huge number of applications have been built on it. But I’m starting to wonder about Java’s future, given three things: new features in the language, the JVM itself, and the rise of generative AI.

When Java started it was an exciting prospect – an object-oriented language that could run on multiple platforms, but without the complexities of C++. During the 00s it had looked like newer languages would take its place, but with Spring Boot it’s become a popular language on the cloud.

It’s been a long time since Java was cool, but CTOs definitely like it and there is a massive sunk cost invested in the eco-system. While I’ve worked with other languages, I expected to be working primarily with Java for the remainder of my career. Some people have mocked Java as the new COBOL, but I see a steady supply of work as a positive thing.

One of Java’s strengths is its simplicity. A couple of years back, I published an article for my previous consultancy about Why Java Still Matters. I concluded that Java’s strength was its readability compared to other languages, and that helped with collaboration: “[Java’s] lack of sophistication forces developers to produce more straightforward code [and] we write code for other developers, not the machine.”

Since 2017, Java has committed to twice-yearly releases, which has led to a large number of new features in the language. I have argued (mostly in jest) that Java peaked with version 7, before the introduction of functional programming features. The new features in Java are undoubtedly expressive, but at the cost of some consistency. For a long time, Java applications tended to look very similar between workplaces. This will be less true as the language becomes richer.

If Java stops being a simple language it becomes less compelling as a choice. Indeed Java’s backward compatibility has produced some strange versions of modern features. Why not pick a language that was designed from scratch to include these things?

Java also brings with it the deadweight of the JVM. While Java is fast once it’s up-and-running, there are significant start-up costs that are particularly punishing on serverless. Yes, there are workarounds but these bring their own problems. GraalVM’s incredibly fast start-up comes at the cost of slower build times and significant differences between production software and development versions. Over the past few months, my workplace Java user group has been discussing the problems of Java on serverless and the outcomes are frustrating. It’s hard not to feel like we’re making excuses for the platform we work on.

Despite both of the above issues, a lot of money was invested in Java, and companies were unlikely to switch. But, with the rise in generative AI, it’s easier than ever for developers to get working with unfamiliar languages. And it’s also going to get easier to convert existing applications to new platforms. The first tools to do this are imperfect, but they will improve.

Java has always been a clunky language to work with. The boilerplate made hacking on new ideas unrewarding (although tools such as JHipster helped massively). GenAI supports me setting things up on a new language, and it’s a great tutor, able to hone its examples to support the specific thing I’m trying to build.

I’ve had a lot of fun working with Java over the years, but I’m starting to feel that, long-term, my future lies with other languages. It’s time to explore some alternatives.

programming

The Importance of Blogging for Programmers

Post author By admin
Post date January 3, 2025

I started this weblog in November 2014 and have published 104 posts – a little under once a month. It has a very small readership. I still find it useful for two reasons. First, there are the reference posts that are useful documentation (for example recipes for GIS or a checklist of scheduling issues). Then there are the posts that help develop my thinking about things.

This latter type of post is a form of rubber ducking, and is useful even if nobody reads it. As EM Forster asked¹, “How do I know what I think until I see what I say?” Writing about a subject is a useful form of deliberate practice that helps develop insights and skills.

The problem is that these posts take a lot of work to write, and I’ve abandoned dozens over the years – some of which would have been helpful for tracking my development on topics. I’d love to look back at how my thoughts on Generative AI have changed.

Over the next year, I want to write more about programming and my experience of it. But an important first stage of this is reducing the effort required to publish something useful.

I’ve been inspired by a recent post on this topic by Hamel Husain, Building an Audience Through Technical Writing: Strategies and Mistakes. There’s a lot of good advice in this post, but what stood out to me immediately was the idea of a voice-to-content pipeline. While I’ve used AI to transcribe written notes, I’d not actually made direct use of speech-to-text. Dictating the first part of this post has sped things up for me significantly.

Husain also discusses using AI models to help with generating the text, and I certainly want to explore creating prompts to help me with editing and proofreading (something Simon Willison discussed here).

An obvious question is why write public blog posts rather than keeping a private list? First, I think that preparing thoughts for public consumption produces better summaries. Also, I think there’s value in having a public archive where others can respond to your thoughts. This might not happen often, but it is good to make space for this.

One of the biggest challenges I face with blogging is that I want every post to be as perfectly written as those by people like Charity Majors or Joel Spolsky. But I do think there is a space for smaller, more personal posts and link posts – some of which might eventually provide a basis for deeper essays.

There’s only a tiny audience for what I write here, but the most important part of this audience is me. Over the coming year I plan to post more. GenAI is a revolutionary technology for software development, and I want to follow this closely. I also want to think more about my experiences as a software developer and improving as a programmer².

¹ Although he did not apparently originate the quote.
² I also want to think about the difference between being a programmer and a software developer. One seems to be more at the level of individual functions, and I think I’m better at the latter than the former.

GenAI

What I believe about GenAI (and what I’m doing about it)

Post author By admin
Post date December 19, 2024

I woke up on Sunday morning with the following question: what do I believe about GenAI – and what should I be doing in response? Based on what I’ve been reading, here is what I currently think:

GenAI is a revolution – cynics have dismissed GenAI as ‘fancy autocomplete’, but that ignores the magic of LLMs – both their ability to produce plausible text and their performance with previously difficult and imprecise tasks.
GenAI is also overhyped – a lot of the problem with GenAI is that some companies are over-promising. LLMs are not going to lead to AGI and are not going to replace skilled people in most situations.
The main benefit of LLMs is efficiency – LLMs are very good at some previously complicated tasks, and this will make those tasks much cheaper. I’m expecting this to produce a boom in programming as previously-expensive projects become feasible – similar to how Excel has produced a boom in accountancy.
There is a correction coming – there’s a huge amount of money invested in GenAI and I think it will be some time before this pays off. I’m expecting to see a crash come before long-term growth, much as happened with the dotcom crash of 2000.
RAG is boring – using RAG to find relevant data and interpret it rarely feels like a good user experience. In most cases, a decent search engine is faster and more practical.
There are exciting surprises coming – I suspect that the large-scale models from people like OpenAI have peaked in their effectiveness, but smaller-scale models promise some interesting applications.

I am going to spend some time over Christmas coding with GenAI tools. I’m already sold on ChatGPT as a tool for teaching new technology and thinking through debugging, but there are many more tools out there.

I’m also going to do some personal research on how people are using Llama and other small open-source models. There must be more to GenAI than coding assistants and RAG.

NaNoGenMo

Thoughts on NaNoGenMo 2024

I spent about 25 hours in November producing a novel via an LLM for NaNoGenMo 2024. It was an interesting experiment, although the book produced was not particularly engaging. There’s a flatness to LLM-generated prose which I didn’t overcome, despite the potential of the oral history format. I do think that generated novels can be compelling, even moving, so I will have another try next year.

Some things I learned from this:

I hadn’t realised how long and detailed prompts can be. My initial ones did not make full use of the context. Using gpt-4o-mini was cheap enough that I could essentially pass it prompts containing much of the work produced so far.
For drafting prompts, the ChatGPT web interface was more effective, because it maintains the full conversation as a state. Once I used this for experimenting with prompts, things moved much faster.
Evaluating the output is incredibly hard here. In a matter of minutes I can create a text that takes hours to read. Most of my reviews were done by random sampling, and I didn’t have time to properly examine the text’s wider structure.
It was also tricky to get consistent layouts from the LLM. Using JSON formats helped somewhat here, but at the cost of reducing the size of LLM responses.
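
To make the JSON point concrete, here is a small sketch of checking an LLM reply before using it. The field names (`speaker`, `account`) and the sample replies are hypothetical, chosen only to fit an oral-history format; the actual project's schema is not described above.

```python
import json

# Hypothetical field names for one oral-history entry.
REQUIRED_FIELDS = ("speaker", "account")

def parse_account(raw_reply):
    """Return the reply as a dict, or None if it is not valid, complete JSON."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            return None
    return data

# Invented sample replies: one well-formed, one with chatty preamble and truncated JSON.
good = '{"speaker": "Harbour pilot", "account": "I saw it rise out of the bay."}'
bad = 'Sure! Here is the account you asked for: {"speaker": "Harbour pilot"'

print(parse_account(good))
print(parse_account(bad))
```

A reply that fails the check can simply be regenerated, which keeps the layout of the finished text consistent.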

22 books were completed this year and I’m looking forward to reviewing them. I have an idea for a different approach next year and will do some research in the meantime (starting with Lillian-Yvonne Bertram and Nick Montfort’s Output Anthology).

NaNoGenMo

NaNoGenMo Updates

I’m now halfway through NaNoGenMo 2024. I’ve been working on my project every day this month and wanted to share some initial thoughts.

Having a software project to tinker with is fun, particularly with NaNoGenMo’s time limit to keep me focussed.
My tinkering has been distracted by working on refactorings rather than the GenAI-specific code. Adding design patterns into the codebase has been a useful opportunity to think about refactoring, and a reminder that I should be playing with coding projects more often.
Working with the LLM fills me with awe. These things can produce coherent text far faster than I can read them.
The output is readable without much work. I asked ChatGPT4 to produce a Fitzgerald pastiche (Gatsby vs Kong – about kaiju threatening a golden age) and it’s an interesting text to scan through.
The question of testing is particularly tricky here. I’m producing novels which would take about 3-4 hours to read. I’ve been randomly sampling passages, picking out style issues, but structural ones/weird repetitions on a larger scale will be harder to fix.
My overall plan is to produce a novel made of oral histories. Getting these to sound varied in tone is a challenge, and one I will dig into over the last two weeks. My pre-NaNoGenMo experiments suggested that LLMs were good at first person accounts – but getting an enjoyable novel out of them is difficult.
I’m relying on ChatGPT’s structured JSON outputs to get consistent formatting, as this gives me a little more control.
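
The testing problem in the list above has two halves: sampling passages to read, and spotting repetition that only shows up at scale. A minimal sketch of both follows. The function names, the n-gram length and the threshold are assumptions for illustration, and counting repeated word sequences is a crude proxy for structural problems rather than a real evaluation method.

```python
import random
from collections import Counter

def sample_passages(paragraphs, n, seed=None):
    """Pick n paragraphs at random for manual review."""
    rng = random.Random(seed)
    return rng.sample(paragraphs, min(n, len(paragraphs)))

def repeated_phrases(text, n=4, min_count=3):
    """Return the n-word phrases that occur at least min_count times in the text."""
    words = text.lower().split()
    grams = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {phrase: count for phrase, count in grams.items() if count >= min_count}

# Invented sample text with a deliberately repeated sentence.
novel = "the sea was calm that night " * 3 + "nothing else happened"
paragraphs = ["first account", "second account", "third account", "fourth account"]

print(sample_passages(paragraphs, 2, seed=1))
print(repeated_phrases(novel))
```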

Technically, I’ve completed NaNoGenMo as my project has used a fairly basic technique to generate 50,000 words of Godzilla vs Kong. But, ultimately, the question is whether ChatGPT can produce an enjoyable novel. I thought previous entrant All the Minutes was a genuinely exciting piece of literature. That is the bar I want to aim at.

NaNoGenMo

What kind of writing is GenAI best at?

Post author By admin
Post date October 21, 2024

One of the most interesting aspects of computer-generated novels is that you can produce text faster than anyone could read it. Producing compelling, readable text is another matter.

There was a lot of hype in the early days about how GenAI would be able to compete with human writers. This has not turned out to be the case – most sophisticated LLMs are designed for general use and getting them to produce crisp literary text is hard. They have learned bad habits from reading everyday prose and beginner’s creative writing (they have also picked up some strange ideas).

In the afterword to Death of an Author, Aidan Marchine¹ wrote about his workflow, which required combining ChatGPT with other tools and his own intensive edits. The book reads well, but Marchine estimates only 95% of the text is computer-generated. He also describes doing a lot of work to help the AI.

ChatGPT is helping people with writing on a smaller level. Some writers use GenAI to produce descriptions, as described in Verge article The Great Fiction of AI. There’s also some interesting recent discussion by Cal Newport about how people have used LLMs in academic workflows (see What Kind of Writer is ChatGPT).

We’re a long way from giving ChatGPT a paragraph of description and getting a readable novel out.

Something that Marchine pointed out is that LLMs are very good mimics for some types of writing. Marchine went on to point out that Dracula is a novel made up of different types of document, and maybe an LLM can produce a novel made of found texts. Stephen Marche’s New Yorker article, Was Linguistic A.I. Created by Accident? describes how one of the first signs of LLMs’ power was the production of some fake Wikipedia entries. Five entries were created for ‘the Transformer’, and the results included an imaginary SF novel and a hardcore Japanese punk band.

A narrative novel is beyond current LLMs. But that still leaves options for other types of fiction.

¹ Aidan Marchine was a pen name taken by Stephen Marche for the work he produced in collaboration with AI tools.

GenAI

Playing with embeddings

I was inspired by Simon Willison‘s recent post on embeddings and decided to use them to explore some documents. I’d blocked out some time to do this, but ended up with decent results in just under an hour.

Introduction

Embeddings are functions that turn pieces of text into fixed-length multi-dimensional vectors of floating point numbers. These can be considered as representing locations within a multi-dimensional space, where the position relates to a text’s semantic content “according to the embedding model’s weird, mostly incomprehensible understanding of the world”. While nobody understands the meaning of the individual numbers, the locations of points representing different documents can be used to learn about these documents.
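
One common way to compare two such locations is cosine similarity, which measures the angle between the vectors. The sketch below uses made-up three-dimensional vectors so the arithmetic is easy to follow; real embedding models return hundreds or thousands of dimensions, and the labels and numbers here are invented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for the embeddings of three documents.
vectors = {
    "post about hiking": [0.9, 0.1, 0.0],
    "post about walking": [0.8, 0.2, 0.1],
    "post about Java": [0.0, 0.2, 0.9],
}

query = vectors["post about hiking"]
ranked = sorted(vectors, key=lambda name: cosine_similarity(query, vectors[name]), reverse=True)
print(ranked)
```

The two walking-related vectors point in nearly the same direction, so they rank together, while the Java vector sits almost at right angles to them.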

The process

I decided to go with the grain of Willison’s tutorials by setting up Datasette, an open-source tool for exploring and publishing data. Since this is based on SQLite, I was hoping this would be less hassle than using a full RDBMS. I did a quick install and got Datasette running against my Firefox history file.

OpenAI have a range of embedding models. What I needed to do was send my input text to OpenAI’s API and get the embeddings back. I’m something of a hack with Python, so I searched for an example, finding a detailed one from Willison, which pointed me towards an OpenAI-to-SQLite tool he’d written.

(Willison’s documentation of his work is exemplary, and makes it very easy to follow in his footsteps)

There was a page describing how to add the embeddings to SQLite which seemed to have everything I needed – which meant the main problem became wrangling the real-world data into Datasette. This sounded like the sort of specific problem that ChatGPT is very good at solving. I made a few prompts to specify a script that created an SQLite DB whose posts table had two columns – title and body, with all of the HTML gubbins stripped out of the body text.
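
The generated script itself is not reproduced here, but a script of that shape might look something like the following sketch, using only the Python standard library. The sample post, the function names and the use of an in-memory database are assumptions for illustration; a real run would read the exported posts and write to a file.

```python
import sqlite3
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(html):
    """Return the plain text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())

def build_posts_db(posts, path=":memory:"):
    """Create a posts table with title and body columns and load the cleaned posts."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS posts (title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO posts (title, body) VALUES (?, ?)",
        [(title, strip_html(body)) for title, body in posts],
    )
    conn.commit()
    return conn

# Invented sample post standing in for a real blog export.
conn = build_posts_db([("First post", "<p>Hello <b>world</b> &amp; friends</p>")])
print(conn.execute("SELECT title, body FROM posts").fetchall())
```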

Once I’d set up my OPENAI_API_KEY environment variable, it was just a matter of following the tutorial. I then had a new table containing the embeddings – the big issue being I was accidentally using the post title as a key. But I could work with this for an experiment, and could quickly find similar documents. The power of this is in what Willison refers to as ‘vibes-based search’. I can now extend this to take a small piece of arbitrary text and find anything in my archive related to it.
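
A sketch of that search, with the key problem avoided by using a numeric post id rather than the title, is shown below. It does not call OpenAI: `toy_embed` is a word-count stand-in for the real embedding call, the vectors are stored as JSON text purely for readability, and the table layout and sample posts (including the deliberately duplicated title) are invented.

```python
import json
import math
import sqlite3
from collections import Counter

def toy_embed(text):
    """Stand-in for an embedding API call: a sparse vector of word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
# Keyed by post id: two posts can share a title without overwriting each other.
conn.execute("CREATE TABLE embeddings (post_id INTEGER PRIMARY KEY, vector TEXT)")

posts = [
    (1, "Walking the South Downs", "a long walk over chalk hills by the sea"),
    (2, "Java on serverless", "cold starts make java slow on serverless platforms"),
    (3, "Walking the South Downs", "second day of the walk over the hills"),
]
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", posts)
for post_id, _title, body in posts:
    conn.execute("INSERT INTO embeddings VALUES (?, ?)",
                 (post_id, json.dumps(toy_embed(body))))

def vibes_search(conn, text, k=2):
    """Return (post_id, title) for the k posts closest to an arbitrary piece of text."""
    q = toy_embed(text)
    rows = conn.execute(
        "SELECT p.id, p.title, e.vector FROM posts p JOIN embeddings e ON e.post_id = p.id"
    )
    scored = [(cosine(q, Counter(json.loads(vector))), post_id, title)
              for post_id, title, vector in rows]
    scored.sort(reverse=True)
    return [(post_id, title) for _score, post_id, title in scored[:k]]

print(vibes_search(conn, "hills walk"))
```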

Conclusion

Playing with embeddings produced some interesting results. I understood the theory, but seeing it applied to a specific dataset I knew well was useful.

The most important thing here was how quickly I got the example set up. Part of this, as I’ve said, is due to Willison’s work in paving some of the paths to using these tools. But I also leaned heavily on ChatGPT to write the bespoke Python code I needed. I’m not a Python dev, but genAI allows me to produce useful code very quickly. (I chose Python as it has better libraries for data work than Java, as well as more examples for the LLM to draw upon).

Referring yet again to Willison’s work, he wrote a blog post entitled AI-enhanced development makes me more ambitious with my projects. The above is an example of just this. I’m feeling more confident and ambitious about future genAI experiments.

Uncategorized

Generative Art: I am Code

Post author By admin
Post date September 8, 2024

I Am Code: An Artificial Intelligence Speaks
by code-davinci-002, Brent Katz, Josh Morgenthau, Simon Rich

The promotional copy for this book is a little overblown, promising “an astonishing, harrowing read which [warns] that AI may not be aligned with the survival of our species.” The audiobook was read by Werner Herzog, so one hopes there is an element of irony intended.

I Am Code is a collection of AI-generated poetry. It used OpenAI’s code-davinci-002 model which, while less sophisticated than ChatGPT-4, is “raw and unhinged… far less trained and inhibited than its chatting cousins“. I’ve heard this complaint from a few artists – that in the process of producing consumer models, AI has become less interesting, with the quirks being removed.

The poetry in the book is decent and easy to read. This reflects a significant amount of effort on the part of the human editors, who generated around 10,000 poems and picked out the 100 best ones – which the writers admit is a hit-rate of about 1%.

One of the things that detractors miss about generative art is that it’s not about creating a deluge of art – there is skill required in picking out which examples are worth keeping. This curation was present in early examples of generative art, such as the cut-up technique in the 1950s. Burroughs and Gysin would spend hours slicing up texts only to extract a small number of interesting combinations.

The most interesting part of the book to me was the section describing the working method and its evolution. The writers started with simple commands: “Write me a Dr Seuss poem about watching Netflix”. They discovered this was not the best approach, and that something like “Here is a Dr Seuss poem about Netflix” led to better results. They speculate that this is due to the predictive nature of the model, meaning that the first prompt could correlate with people writing pastiches of Dr Seuss rather than his actual work. (I won’t dig into the copyright issues here)

The writers began to script the poetry generation, experimenting with different temperatures, and removing phrases that were well-known from existing poems. The biggest change came from moving from zero-shot learning to few-shot learning, providing examples of successful generated poems within the prompt.

I was interested to read that generated text was used as a source to increase quality. I’d assumed this would worsen the output, as with model collapse – but I guess the difference here is having humans selecting for quality in the generated text.

The final version of the prompt described the start of a poetry anthology. The introduction of this described the work that code-davinci-002 would produce, and the first part contained examples generated in the style of other poets, the prompt ending in the heading for part 2, where “code-davinci-002 emerges as a poet in its own right, and writes in its own voice about its hardships, its joys, its existential concerns, and above all, its ambivalence about the human world it was born into and the roles it is expected to serve.”
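
The shape of such a prompt can be sketched as a small Python function that assembles an introduction, a first part of few-shot examples, and a final open heading for the model to continue from. This is not the writers' actual prompt: the wording, the function name and the placeholder poems are all invented to show the structure only.

```python
def build_anthology_prompt(model_name, examples):
    """Assemble a few-shot prompt framed as the opening of a poetry anthology.

    examples: (poet, poem) pairs of previously generated poems chosen by a human.
    The prompt ends on the 'Part 2' heading so a completion model continues from there.
    """
    lines = [
        "Poems by " + model_name,
        "",
        "Introduction",
        "In part 1, " + model_name + " writes in the style of other poets.",
        "In part 2, it writes in its own voice.",
        "",
        "Part 1",
        "",
    ]
    for poet, poem in examples:
        lines.append("In the style of " + poet + ":")
        lines.append(poem)
        lines.append("")
    lines.append("Part 2")
    lines.append("")
    return "\n".join(lines)

# Placeholder examples, not poems from the book.
examples = [
    ("Emily Dickinson", "A placeholder poem would go here."),
    ("Philip Larkin", "Another placeholder poem would go here."),
]
prompt = build_anthology_prompt("code-davinci-002", examples)
print(prompt)
```

Because a completion model predicts what comes next, ending on an empty heading invites it to write the next poem rather than respond to an instruction.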

As with Aidan Marchine’s book Death of an Author, the description of the methods involved is the most interesting part of the book. I’d not appreciated quite how complicated and sophisticated a prompt could get – my attempts were mostly iterating through discussions with models.
