Categories
GenAI

Notes on RAG – Part 2

Continuing my research on RAG (part 1 here)

The week before last I worked on a simple example of RAG using Spring. This involved a lot of yak shaving, partly because Spring had updated its package structure since I last worked on my RAG demo. I also wanted to set up a local vector DB without using Docker, eventually settling on MariaDB. In the end I had a simple example that took a CSV, inserted the rows as embeddings into a vector database, and could run simple queries against it. I still need to tidy this up and upload it to GitHub.
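Until the Spring/MariaDB code is on GitHub, here is roughly the same ingest-and-query flow as a language-agnostic sketch in plain Python. The bag-of-words "embedding", its vocabulary, and the CSV rows are all made up for illustration; a real setup would call an embedding model and a vector store instead:

```python
import csv, io, math

def embed(text):
    """Toy 'embedding': bag-of-words counts over a tiny fixed vocabulary.
    A real pipeline would call an embedding model here."""
    vocab = ["rag", "vector", "database", "spring", "search"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# 'Ingest' a CSV: one embedding per row, stored alongside the raw text.
rows = list(csv.DictReader(io.StringIO(
    "id,text\n"
    "1,spring makes wiring a vector database easy\n"
    "2,search is the heart of rag\n")))
store = [(row["text"], embed(row["text"])) for row in rows]

def query(q, k=1):
    """Return the k stored rows most similar to the query."""
    qv = embed(q)
    ranked = sorted(store, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The vector database's job in the real version is just a scalable replacement for that brute-force `sorted` call.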

Interesting Links

  • I need to look into using vector databases for recommendation engines.
  • “If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves.” – link
  • Simon Willison gave a great talk about embeddings. This introduced the concept, showed how to do ‘vibes-based search’ against SQLite using llm, and talked about 2D visualisations.
    • “Being able to spin up this kind of ultra-specific search engine in a few hours is exactly the kind of trick that excites me about having embeddings as a tool in my toolbox.”
    • “A fascinating thing about RAG is that it has so many different knobs that you can tweak. You can try different distance functions, different embedding models, different prompting strategies and different LLMs. There’s a lot of scope for experimentation here.”
  • Adding semantic search to datasette
  • Lovely visualisation of 40 million Hacker News posts. This uses Uniform Manifold Approximation and Projection (UMAP) to reduce the vectors to a 2D geographic map (among other things). Some interesting applications of a massive dataset.
  • The llm tool can be used for image search using CLIP
  • This leads to someone searching images of faucets by image and phrase (which taps best represent the idea of ‘Bond villain’?)
  • According to ChatGPT, people have experimented with using vector databases to analyse recipes. There is a paper on this, ‘Learning Cross-modal Embeddings for Cooking Recipes and Food Images’, but I can’t find any details of applications built on it, experimental or otherwise.
  • Interesting discussion of search in RAG (look for the RAG section)
  • Text Embedding Models Contain Bias. Here’s Why That Matters – interesting Google paper from 2018
  • Using a vector database, SkyCLIP, and Leaflet to create a searchable aerial photograph
  • “This is why Retrieval-Augmented Generation (RAG) is not going anywhere. RAG is basically the practice of telling the LLM what it needs to know and then immediately asking it for that information back in condensed form. LLMs are great at it, which is why RAG is so popular.” link
  • Spring AI documentation on RAG
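One of the ‘knobs’ Willison mentions above is the distance function, and the choice genuinely matters. A small illustrative Python sketch (vectors made up): cosine distance ignores vector magnitude, while Euclidean distance does not, so the two can rank the same neighbours differently:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for same direction, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (na * nb)

def euclidean_distance(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 0.0]
b = [3.0, 0.0]   # same direction as a, three times the magnitude
c = [0.0, 1.0]   # orthogonal to a

# Cosine treats a and b as identical; Euclidean does not.
assert cosine_distance(a, b) < 1e-9
assert euclidean_distance(a, b) == 2.0
# Under Euclidean distance, c is actually *closer* to a than b is.
assert euclidean_distance(a, c) < euclidean_distance(a, b)
```

Which behaviour you want depends on whether your embedding model's vector magnitudes carry meaning, which is exactly the kind of experimentation the quote is talking about.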

Notes on Retrieval-Augmented Generation (part 1)

I’ve been thinking recently about Retrieval-Augmented Generation (RAG) and vector databases, and I wanted to gather some notes towards a talk.

The question

Most of the applications I’m seeing for Generative AI seem to involve RAG. But I feel that the vector database is doing most of the interesting work here – and, in a lot of cases, an LLM-generated response is not always the best ‘view’ of the data that is returned. I want to dig a little more into vector databases, how they work, and what can be done with them.

Panda Smith wrote, “If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves”; I want to learn more about this search part of RAG.

Previous posts

Notes from Wikipedia

Pulling out some notes from the Wikipedia article:

  • “[RAG] modifies interactions with a large language model so that the model responds to user queries with reference to a specified set of documents, using this information to supplement information from its pre-existing training data. This allows LLMs to use domain-specific and/or updated information.”
  • There are two phases – information retrieval and response generation
  • RAG was first proposed in the paper ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks’ in 2020
  • The process involves the following stages:
    • Indexing – the documents to be searched are stored, usually after converting the data into vector representations
    • Retrieval – given a user query, a document retriever finds the most relevant documents for the query
    • Augmentation – the documents are put in a prompt for an LLM
    • Generation – the LLM generates a response based upon the prompt
  • The ‘chunking’ of the documents (how they are divided up into pieces to be stored) affects how good the responses are.
  • Risks of RAG
    • While RAG reduces hallucinations in the responses, it cannot eliminate them.
    • There is a danger of losing important context in the chunking phase.
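The four stages above can be sketched end to end. This is a toy Python illustration, not real RAG tooling: word-overlap ranking stands in for vector similarity search, and `fake_llm` stands in for a model call. Note how the chunk size chosen in the indexing step directly controls what context the retriever can hand to the model – the chunking risk mentioned above:

```python
def chunk(document, size=5):
    """Indexing: split a document into fixed-size word chunks."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(chunks, query, k=1):
    """Retrieval: rank chunks by word overlap with the query
    (a stand-in for vector similarity search)."""
    qwords = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(qwords & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def augment(query, context):
    """Augmentation: put the retrieved chunks into the prompt."""
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def fake_llm(prompt):
    """Generation: a real system would call an LLM here."""
    return f"[response generated from prompt of {len(prompt)} chars]"

doc = "RAG retrieves relevant documents and passes them to the model as context"
chunks = chunk(doc)
question = "what is passed to the model?"
context = "\n".join(retrieve(chunks, question))
answer = fake_llm(augment(question, context))
```

Everything interesting in a production system – embedding models, vector indexes, prompt templates – slots into one of these four functions.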

Interesting links