
Notes on Retrieval-Augmented Generation (part 1)

I’ve been thinking recently about Retrieval-Augmented Generation (RAG) and vector databases, and I wanted to gather some notes towards a talk.

The question

Most of the applications I’m seeing for Generative AI seem to involve RAG. But I feel that the vector database is doing most of the interesting work here – and, in a lot of cases, an LLM-generated response is not the best ‘view’ of the data that is returned. I want to dig a little more into vector databases, how they work, and what can be done with them.

Panda Smith wrote, “If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves”. I want to learn more about this search part of RAG.


Notes from Wikipedia

Pulling out some notes from the Wikipedia article:

  • “[RAG] modifies interactions with a large language model so that the model responds to user queries with reference to a specified set of documents, using this information to supplement information from its pre-existing training data. This allows LLMs to use domain-specific and/or updated information.”
  • There are two phases – information retrieval and response generation
  • RAG was first proposed in the 2020 paper ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks’ by Lewis et al.
  • The process involves the following stages:
    • Indexing – the documents to be searched are stored – usually by converting the data into vector representations
    • Retrieval – given a user query, a document retriever finds the most relevant documents for the query
    • Augmentation – the documents are put in a prompt for an LLM
    • Generation – the LLM generates a response based upon the prompt
  • The ‘chunking’ of the documents (how they are divided up into pieces to be stored) affects how good the responses are.
  • Risks of RAG
    • While RAG reduces hallucinations in the responses, it cannot eliminate them.
    • There is a danger of losing important context in the chunking phase.
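The four stages above can be sketched end to end in a few lines of Python. This is a toy illustration, not a real system: it uses a bag-of-words vector as a stand-in for a learned embedding, naive fixed-size word chunking, and cosine similarity for retrieval; all function and variable names are my own, and the final LLM call is left out.

```python
import math
from collections import Counter

def chunk(text, size=40, overlap=10):
    """Naive chunking: split into overlapping fixed-size word windows."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy 'embedding': a bag-of-words count vector.
    Real systems use a learned dense embedding model here."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, index, k=2):
    """Retrieval: rank stored chunks by similarity to the query."""
    qvec = embed(query)
    ranked = sorted(index, key=lambda c: cosine(qvec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:k]]

def augment(query, chunks):
    """Augmentation: put the retrieved chunks into a prompt for an LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Indexing: chunk each document and store (chunk, vector) pairs.
docs = ["RAG retrieves relevant documents and feeds them to an LLM.",
        "Vector databases store embeddings for similarity search."]
index = [{"text": c, "vec": embed(c)} for d in docs for c in chunk(d)]

query = "What do vector databases store?"
prompt = augment(query, retrieve(query, index))
# Generation would be: response = llm(prompt)
```

The chunking parameters (`size`, `overlap`) are exactly the knobs the notes flag as a risk: too small and the retrieved chunk loses surrounding context, too large and the similarity score gets diluted.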

