
Summer of Q: Week 2

My overall impression, after more time working with Amazon Q, is that it will take some work for a coding agent to make me faster and more effective. Q definitely removes some of the boring bits of coding (it’s great at Maven dependencies) but it’s more wayward on complicated tasks. There’s a lot to learn here.

At the end of last weekend, I’d settled on a method: writing a specification for an area of my application, having Q produce a BDD feature file outlining the behaviour, and then getting Q to fill in the testing code and after that, the implementation. This soon ran into problems as I’d still set Q too wide a brief, and the code produced quickly sprawled. There were many minor issues, such as Q producing unfocussed Cucumber step files. Along with the pages of code, some chunks of functionality were left out to ‘fill in later’.

It’s tricky to find a regular working pattern with good DevEx. I didn’t want to put Q into ‘trust’ mode, choosing rather to review each change as it was prepared. I did this so I could interrupt Q when it went off the rails, and also to reduce the amount of generated code I needed to review in one go. This meant a lot of time spent waiting while Q was ‘thinking’. One colleague talked about their passion for writing code, and how reviewing generated code is not the same. In their current form, these tools don’t have the responsiveness of working directly with code.

Generating the code also produced a strange effect around ownership. Hand-writing code (or whatever we call the ‘old’ ways of programming) meant taking care with each method. It was a good way to get inside the code, producing ‘mechanical sympathy’. Here, I started with a simple outline of my application in 275 words. Q produced over 10,000 words of feature files (including some useful functionality that was not asked for, such as sanitising inputs). This is a lot of reading! Assuming a reading rate of 400 words per minute, that is 25 minutes’ work – setting aside the deeper understanding needed here, and any editing required.

Q also proved to be better at some things than others. When asked to generate some test data, Q created a program to populate the DB on start-up; I had to suggest using Liquibase instead. Getting the best out of this tool requires the operator to have a clear idea of what they would expect.

I’m still convinced that these tools will be part of a regular toolkit, but I don’t think they will offer the sort of incredible gains some have suggested – although they will be essential for prototyping. Cal Newport produced a great summary of the competing claims about productivity. My prediction is that, in the long run, we’ll see significant gains, but we won’t be relying solely on the agents.


First Impressions of Amazon Q

My employer has organised a ‘Summer of Q’, where a number of us have signed up to play with Amazon Q. This weekend was the first time I could work with Q in depth. The main result – I ‘built’ a quiz application in 30 minutes (while also doing some chores) and it looked and worked better than what I’d have produced solo. But there are a lot of subtleties and caveats to add to this.

  • A major argument against GenAI putting developers out of work is how poor the tooling and signup flows for Q are. The signup is terrible and confuses a lot of people. Q failed to help, and kept hallucinating links to help pages that didn’t exist. The IntelliJ plugin is awful and locks the IDE, so I’ve had to use the command-line version instead.
  • Q is great at producing code. Producing the quiz example was a trivial task, so I’m now working on a much more complicated example. Straight away, I can see Q making me more effective. Personal tools I’ve wanted to make, that I decided against investing time in, now look easy.
  • The quiz app that Q produced looked and played better than what I could have produced by myself. I’m very impressed by this.
  • The model’s reasoning is clever and spooky – it makes mistakes sometimes, but then works to fix those. Interesting behaviour – although I expect there to be fewer mistakes in the generated code over time.
  • One of the challenges of coding agents is getting used to the new workflow. There’s a fair bit of waiting involved while Q thinks about each file that needs creating. It’s very different to using a GenAI coding assistant, and I need to figure out the best new workflow.
  • An ongoing problem with GenAI is that it involves a lot more reading than writing. I figure almost no-one is reading Copilot meeting summaries, and I worry that not everyone will closely read the impressive amount of code that Q generates.
  • At present, I’m reviewing each action Q takes, rather than trusting it for the session. It’s going to be interesting to see how other people are working. There’s a lot of boring waiting this way, but a lot less reading to do in one go.
  • Being able to produce decent (albeit not perfect) code so quickly will change the nature of programming. The coding part is going to get much easier. The development part – making sure the right thing is produced – will become more important, and maybe more difficult. I’m currently using feature tests as a way of validating what is being made.
  • Something I’ve noticed with GenAI in a number of areas is the importance of taste. The tools produce things (image/text/code) incredibly fast, and require an operator with strong opinions about this output.
  • Q responded to my initial, naive prompts by producing ornate additional features. For example, I asked it to generate some BDD feature files and it added some complicated accessibility tests. I’m looking forward to watching it try to fill those out! I also spotted some subtle divergences from the spec that I need to edit. The quiz code I initially generated also included a lot of useful but unasked-for features. They were improvements, for sure, but it was definitely not an MVP. It will be interesting to see how easy it is to work with Q on my more complicated application.

Two notes on vibe coding

From Ashley Willis:

“[A mentor] pointed out that debugging AI generated code is a lot like onboarding into a legacy codebase, making sense of decisions you didn’t make, finding where things break, and learning to trust (or rewrite) what’s already there. That’s the kind of work a lot of developers end up doing anyway”

From Sean Goedecke:

Being good at debugging is more useful than being good at writing code – you only write a piece of code once, but you may end up debugging it hundreds of times. As programmers use more AI-written code, debugging may end up being the only remaining programming skill.


Notes on RAG – Part 2

Continuing my research on RAG (part 1 here)

The week before last I worked on a simple example of RAG using Spring. This involved a lot of yak shaving, in part because Spring had updated its package structures since I last worked on my RAG demo. I also wanted to set up a local vector DB without using Docker, finally settling on MariaDB. In the end I had a simple example that took a CSV, inserted the rows as embeddings into a vector database and could run simple queries against it. I still need to tidy this up and upload it to GitHub.
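
As a rough illustration of the shape of that example – this is a sketch rather than the actual code, and Spring AI’s class and package names have shifted between releases, so check them against the version you’re using – loading the CSV rows into a MariaDB-backed VectorStore and querying it looks something like this:

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

// Assumes a VectorStore bean backed by MariaDB (and an embedding model) is already configured.
@Component
public class CsvLoader implements CommandLineRunner {

    private final VectorStore vectorStore;

    public CsvLoader(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Override
    public void run(String... args) throws Exception {
        // Naive CSV handling for the sketch: one Document per row ("data.csv" is a placeholder name).
        List<Document> rows = Files.readAllLines(Path.of("data.csv")).stream()
                .map(Document::new)
                .toList();

        // Indexing: the vector store asks the embedding model for vectors and saves them.
        vectorStore.add(rows);

        // A simple similarity query against the stored rows.
        vectorStore.similaritySearch("an example query").forEach(System.out::println);
    }
}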

Interesting Links

  • I need to look into using vector databases for recommendation engines.
  • “If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves.” – link
  • Simon Willison gave a great talk about embeddings. This introduced the concept, showed how to do ‘vibes-based search’ against SQLite using llm, and talked about 2D visualisations.
    • “Being able to spin up this kind of ultra-specific search engine in a few hours is exactly the kind of trick that excites me about having embeddings as a tool in my toolbox.”
    • “A fascinating thing about RAG is that it has so many different knobs that you can tweak. You can try different distance functions, different embedding models, different prompting strategies and different LLMs. There’s a lot of scope for experimentation here.”
  • Adding semantic search to Datasette
  • Lovely visualisation of 40 million Hacker News posts. This uses Uniform Manifold Approximation and Projection (UMAP) to reduce the vectors to a 2D geographic map (among other things). Some interesting applications of a massive dataset.
  • The llm tool can be used for image search using CLIP
  • This led to someone searching images of faucets by image and phrase (which taps best represent the idea of ‘Bond villain’?)
  • According to ChatGPT, people have played with using vector databases to analyse recipes. There is a paper on this, ‘Learning Cross-modal Embeddings for Cooking Recipes and Food Images‘, but I can’t find any details on applications with it, experimental or otherwise.
  • Interesting discussion of search in RAG (look for the RAG section)
  • Text Embedding Models Contain Bias. Here’s Why That Matters – interesting Google paper from 2018
  • Using a vector database, SkyCLIP, and Leaflet to create a searchable aerial photograph
  • “This is why Retrieval-Augmented Generation (RAG) is not going anywhere. RAG is basically the practice of telling the LLM what it needs to know and then immediately asking it for that information back in condensed form. LLMs are great at it, which is why RAG is so popular.” link
  • Spring AI documentation on RAG

Notes on Retrieval-Augmented Generation (part 1)

I’ve been thinking recently about Retrieval-Augmented Generation (RAG) and vector databases, and I wanted to gather some notes towards a talk.

The question

Most of the applications I’m seeing for Generative AI seem to involve RAG. But I feel that the vector database is doing most of the interesting work here – and, in a lot of cases, an LLM-generated response is not the best ‘view’ of the data that is returned. I want to dig a little more into vector databases, how they work, and what can be done with them.

Panda Smith wrote, “If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves”. I want to learn more about this search part of RAG.


Notes from Wikipedia

Pulling out some notes from the Wikipedia article:

  • “[RAG] modifies interactions with a large language model so that the model responds to user queries with reference to a specified set of documents, using this information to supplement information from its pre-existing training data. This allows LLMs to use domain-specific and/or updated information.”
  • There are two phases – information retrieval and response generation
  • RAG was first proposed in the paper ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks’ in 2020
  • The process involves the following stages (sketched in code after this list):
    • Indexing – the documents to be searched are stored – usually by converting the data into vector representations
    • Retrieval – given a user query, a document retriever finds the most relevant documents for the query
    • Augmentation – the documents are put in a prompt for an LLM
    • Generation – the LLM generates a response based upon the prompt
  • The ‘chunking’ of the documents (how they are divided up into pieces to be stored) affects how good the responses are.
  • Risks of RAG
    • While RAG reduces hallucinations in the responses, it cannot eliminate them.
    • There is a danger of losing important context in the chunking phase.
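
To make those four stages concrete, here is a minimal, hypothetical sketch in Java – the EmbeddingModel, VectorIndex and Llm interfaces below are stand-ins for whatever library provides them, not a real API:

import java.util.List;

// Stand-in interfaces: any real embedding model, vector store and LLM client would slot in here.
interface EmbeddingModel { float[] embed(String text); }
interface VectorIndex { void add(String chunk, float[] vector); List<String> nearest(float[] queryVector, int k); }
interface Llm { String complete(String prompt); }

class RagPipeline {
    private final EmbeddingModel embedder;
    private final VectorIndex store;
    private final Llm llm;

    RagPipeline(EmbeddingModel embedder, VectorIndex store, Llm llm) {
        this.embedder = embedder;
        this.store = store;
        this.llm = llm;
    }

    // Indexing: store a vector representation of each document chunk.
    void index(List<String> chunks) {
        chunks.forEach(chunk -> store.add(chunk, embedder.embed(chunk)));
    }

    // Retrieval, Augmentation and Generation for a single user query.
    String answer(String query) {
        List<String> context = store.nearest(embedder.embed(query), 5);   // Retrieval
        String prompt = "Answer the question using only this context:\n"  // Augmentation
                + String.join("\n---\n", context)
                + "\n\nQuestion: " + query;
        return llm.complete(prompt);                                      // Generation
    }
}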



What I believe about GenAI (and what I’m doing about it)

I woke up on Sunday morning with the following question: what do I believe about GenAI – and what should I be doing in response? Based on what I’ve been reading, here is what I currently think:

  • GenAI is a revolution – cynics have dismissed GenAI as ‘fancy autocomplete’, but that ignores the magic of LLMs – both their ability to produce plausible text and their performance with previously difficult and imprecise tasks.
  • GenAI is also overhyped – a lot of the problem with GenAI is that some companies are over-promising. LLMs are not going to lead to AGI and are not going to replace skilled people in most situations.
  • The main benefit of LLMs is efficiency – LLMs are very good at some previously complicated tasks, and this will make those tasks much cheaper. I’m expecting this to produce a boom in programming as previously-expensive projects become feasible – similar to how Excel has produced a boom in accountancy.
  • There is a correction coming – there’s a huge amount of money invested in GenAI and I think it will be some time before this pays off. I’m expecting to see a crash come before long-term growth – but that’s the same pattern we saw with the 2000 dotcom crash.
  • RAG is boring – using RAG to find relevant data and interpret it rarely feels like a good user experience. In most cases, a decent search engine is faster and more practical.
  • There are exciting surprises coming – I suspect that the large-scale models from people like OpenAI have peaked in their effectiveness, but smaller-scale models promise some interesting applications.

I am going to spend some time over Christmas coding with GenAI tools. I’m already sold on ChatGPT as a tool for teaching new technology and thinking through debugging, but there are many more tools out there.

I’m also going to do some personal research on how people are using Llama and other small open-source models. There must be more to GenAI than coding assistants and RAG.


Playing with embeddings

I was inspired by Simon Willison‘s recent post on embeddings and decided to use them to explore some documents. I’d blocked out some time to do this, but ended up with decent results in just under an hour.

Introduction

Embeddings are functions that turn pieces of text into fixed-length multi-dimensional vectors of floating point numbers. These can be thought of as locations within a multi-dimensional space, where the position relates to a text’s semantic content “according to the embedding model’s weird, mostly incomprehensible understanding of the world”. While nobody understands the meaning of the individual numbers, the locations of the points representing different documents can be used to learn about those documents.
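
As an illustration (not from Willison’s post), cosine similarity is one common way of scoring how close two of these embedding vectors are:

public final class VectorMaths {

    // Cosine similarity between two embedding vectors: values near 1.0 mean the texts are
    // semantically similar, values near 0 mean they are unrelated.
    public static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}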

The process

I decided to go with the grain of Willison’s tutorials by setting up Datasette, an open-source tool for exploring and publishing data. Since this is based on SQLite, I was hoping this would be less hassle than using a full RDBMS. I did a quick install and got Datasette running against my Firefox history file.

OpenAI have a range of embedding models. What I needed to do was send my input text to OpenAI’s API and get the embeddings back. I’m something of a hack with Python, so I searched for an example, finding a detailed one from Willison, which pointed me towards an OpenAI-to-SQLite tool he’d written.

(Willison’s documentation of his work is exemplary, and makes it very easy to follow in his footsteps)

There was a page describing how to add the embeddings to SQLite which seemed to have everything I needed – which meant the main problem became wrangling the real-world data into Datasette. This sounds like the sort of specific problem that ChatGPT is very good at solving. I made a few prompts to specify a script that created an SQLite DB whose posts table had two columns – title and body – with all of the HTML gubbins stripped out of the body text.

Once I’d set up my OPENAI_API_KEY environment variable, it was just a matter of following the tutorial. I then had a new table containing the embeddings – the big issue being that I was accidentally using the post title as a key. But I could work with this for an experiment, and could quickly find similar documents. The power of this is in what Willison refers to as ‘vibes-based search’: I can now take a small piece of arbitrary text and find anything in my archive related to it.

Conclusion

Playing with embeddings produced some interesting results. I understood the theory, but seeing it applied to a specific dataset I knew well was useful.

The most important thing here was how quickly I got the example set up. Part of this, as I’ve said, is due to Willison’s work in paving some of the paths to using these tools. But I also leaned heavily on ChatGPT to write the bespoke Python code I needed. I’m not a Python dev, but GenAI allows me to produce useful code very quickly. (I chose Python as it has better libraries for data work than Java, as well as more examples for the LLM to draw upon.)

Referring yet again to Willison’s work, he wrote a blog post entitled AI-enhanced development makes me more ambitious with my projects. The above is an example of exactly this. I’m feeling more confident and ambitious about future GenAI experiments.


GenAI is already useful for historians

I’m still hearing people saying that GenAI is empty hype, comparing it to blockchain and NFTs. The worst dismissals claim that these tools have no real use. While there is a lot of hype around GenAI, there are people using them for real work, including for code generation and interpretation.

An interesting article in The Verge, How AI can make history, looks at how LLMs can investigate historical archives, through Mark Humphries’ research into the diaries of fur trappers. He used LLMs to summarise these archives and to draw out references to topics far more powerfully than a keyword search ever could.

The tool still missed some things, but it performed better than the average graduate student Humphries would normally hire to do this sort of work. And faster. And much, much cheaper. Last November, after OpenAI dropped prices for API calls, he did some rough math. What he would pay a grad student around $16,000 to do over the course of an entire summer, GPT-4 could do for about $70 in around an hour. 

Yes, big companies are overselling GenAI. But, when you strip away the hype, these tools are still incredibly powerful, and people are finding uses for them.


The potential of ChatGPT for programmers

I’ve been meaning to post for some time about my first experiences of programming with ChatGPT, back in January. Ethan Mollick often suggests that people should try doing their job with ChatGPT for at least 10 hours to get a feel for its potential. Playing with ChatGPT for a short time has converted me from an AI cynic to an enthusiast.

Simon Willison wrote about his experiences coding with ChatGPT, concluding that AI-enhanced development makes him more ambitious with his projects.

Shortly after I read that post, I had a silly question related to watching movies. I order my watchlist at Letterboxd by the average rating on the site. But I began to wonder whether this was a good way to watch movies. Did my taste actually correlate with the overall site? Or would I be better off finding a different way to order the watch list?

The obvious way to check this is by writing a bit of code to do the analysis, but that seemed like a chore. I decided to put a few prompts into ChatGPT to see whether that helped. Within two minutes, I had a working Python program. There was a little bit of playing around to get the right page element to scrape, but essentially ChatGPT wrote me a piece of code that could load up a CSV file, use data in the CSV file to download a webpage, grab an item from the page and then generate another CSV file with the output.
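
The script itself was Python, and I haven’t reproduced it here. Purely as an illustration of the same loop for Java readers – using jsoup, with a placeholder selector and CSV layout rather than the real Letterboxd markup – it would look roughly like this:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Read a CSV of film URLs, fetch each page with a browser user agent, pull out the rating
// element, wait 1-2 minutes between requests, and write the results to a new CSV.
public class WatchlistScraper {

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> output = new ArrayList<>();

        for (String line : Files.readAllLines(Path.of("watchlist.csv"))) {
            String url = line.split(",")[0];   // assume the URL is the first column

            Document page = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0")  // pretend to be a browser
                    .get();

            // Placeholder selector for the rating element on the page.
            Element rating = page.selectFirst(".display-rating");
            output.add(url + "," + (rating == null ? "" : rating.text()));

            // Be polite: wait a random 1-2 minutes between requests.
            Thread.sleep(ThreadLocalRandom.current().nextLong(60_000, 120_000));
        }

        Files.write(Path.of("ratings.csv"), output);
    }
}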

I started with a simple initial prompt and asked for a series of improvements.

Can you show me an example of how to scrape a webpage using python, please? I need to find the content of an element with an id of “tooltip display-rating”, which is online. I also want to set the user agent to that of a browser.

(I also asked for a random delay of between 1 and 2 minutes between each request to the website, to be polite. I’m not supposed to scrape Letterboxd, but I figured it was OK as this was for personal use, and I am a paid member.)

This all went pretty well, and ChatGPT also talked me through installing Python on my new Mac. The prompts I used were hesitant at first because I didn’t really know how far this was going to go. ChatGPT was also there to talk me through some Python-specific errors.

When I run this script, I get an error: “ModuleNotFoundError: No module named ‘requests'” What do I need to do to import this module

I get a warning when I run this command: “NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the ‘ssl’ module is compiled with ‘LibreSSL 2.8.3’.” Is this something I need to fix? What should I do?

Before long I’d got a complete working piece of code and checked my hypothesis. It turned out the correlation wasn’t strong enough to say anything either way.

While the example itself is trivial and the output inconclusive, it showed me that it was very possible to write decent-quality code very quickly. I rarely use Python, but ChatGPT provided useful assistance in an unfamiliar language. Writing this code from scratch, even in Java, even using Stack Overflow, would have taken more time than it was worth. As Simon Willison says:

AI-enhanced development doesn’t just make me more productive: it lowers my bar for when a project is worth investing time in at all. Which means I’m building all sorts of weird and interesting little things that previously I wouldn’t have invested the time in.

My immediate takeaway is that AI tooling has the potential to revolutionise programming. It’s not going to replace programmers, rather it’s going to reduce the threshold for a project to be viable and unlock a lot of work. Tim Harford made the same point recently, looking at the history of the spreadsheet. This is an exciting time, and I’m expecting to be very busy in the next few years. I’m also impressed at how effective a tutor ChatGPT is, breaking down its examples into straightforward steps.

It has taken me far longer to write this post than it did to produce the code.


Spring AI Image generation example

Spring AI’s 0.8.0-SNAPSHOT release includes support for image generation using DALL-E 2/3 or Stability AI. This was added on January 24th and has not yet been documented, but a video by Craig Walls describes how to use the new functionality.

I thought that an interesting example to try would be to combine ChatGPT with DALL-E. This way I can take a restricted range of parameters for an image (i.e. mood, animal, activity) and ask ChatGPT to expand these into a detailed prompt, which I can then use to generate an image. The idea here is to take user input for the prompt but to restrict what they can specify, maybe through some dropdowns. Another way of doing this would be to use ChatGPT to check freeform user input, but this seems simpler.

The example was pretty easy to put together. I used the org.springframework.ai.openai.OpenAiChatClient class to communicate with ChatGPT, followed by the org.springframework.ai.image.ImageClient class to generate the image using DALL-E 3. A simple Controller took some GET parameters and placed them into a prompt template:

I want to generate amusing images.
These images should feature an animal. The animal chosen is {animal}.
The animal in question should be {activity}.
The picture should make the user feel {mood}.

This template prompt could be changed to further restrict or specify the sort of image being produced through prompt engineering.

There’s a fair amount of Spring magic tying things together – in particular a @Configuration class that sets up the OpenAiImageClient, since auto-configuration is not yet available. The Controller method is as follows:

@GetMapping("safeimagegen")
public String restrictedImageGeneration(
@RequestParam(name = "animal") String animal,
@RequestParam(name = "activity") String activity,
@RequestParam(name = "mood") String mood) {

    PromptTemplate promptTemplate = new PromptTemplate(imagePrompt);
    Message message = promptTemplate.createMessage(Map.of("animal", animal, "activity", activity, "mood", mood));

    Prompt prompt = new Prompt(List.of(message));

    logger.info(prompt.toString());
    ChatResponse response = chatClient.call(prompt);
    String generatedImagePrompt = response.getResult().toString();
    logger.info("AI responded: generatedImagePrompt);
    ImageOptions imageOptions = ImageOptionsBuilder.builder().withModel("dall-e-3")
                .build();

    ImagePrompt imagePrompt = new ImagePrompt(generatedImagePrompt, imageOptions);
    ImageResponse imageResponse = imageClient.call(imagePrompt);
    String imageUrl = imageResponse.getResult().getOutput().getUrl();
    return "redirect:"+imageUrl;

}

This is not a particularly sophisticated piece of code, but it does show how simple it is to get Spring AI examples working.
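
For reference, the @Configuration class mentioned above has to build the image client by hand. The sketch below is an assumption about what that wiring looks like – the OpenAiImageApi and OpenAiImageClient constructors may differ in the exact 0.8.0-SNAPSHOT you’re using, so treat it as a starting point rather than the code from this example:

import org.springframework.ai.image.ImageClient;
import org.springframework.ai.openai.OpenAiImageClient;
import org.springframework.ai.openai.api.OpenAiImageApi;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Manual wiring, needed while image auto-configuration is not yet available.
// The constructor signatures here are assumptions based on the 0.8.0-SNAPSHOT API.
@Configuration
public class ImageClientConfig {

    @Bean
    public ImageClient imageClient(@Value("${spring.ai.openai.api-key}") String apiKey) {
        // OpenAiImageClient implements the generic ImageClient interface used by the controller.
        return new OpenAiImageClient(new OpenAiImageApi(apiKey));
    }
}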

I submitted a request for a picture of an alligator rollerblading, and set the mood as “joyful”. ChatGPT then generated a detailed prompt:

The image features a cheerful green gator. He’s wearing a pair of shiny, multicolored rollerblades that sparkle as they catch the light. His eyes are wide with excitement, and his mouth is stretched in a wide, friendly grin, revealing his white teeth. He’s standing in a beautiful park with green trees and flowers in the background, and there’s a clear blue sky overhead. He’s waving at the viewer as if inviting them to join him in his rollerblading adventure, adding to the joyful and playful vibe of the image.

And then the browser was redirected to the generated image.