Categories
GenAI

Playing with embeddings

I was inspired by Simon Willison’s recent post on embeddings and decided to use them to explore some documents. I’d blocked out some time to do this, but ended up with decent results in just under an hour.

Introduction

Embeddings are functions that turn pieces of text into fixed-length multi-dimensional vectors of floating point numbers. These can be considered as representing locations within a multi-dimensional space, where the position relates to a text’s semantic content “according to the embedding model’s weird, mostly incomprehensible understanding of the world”. While nobody understands the meaning of the individual numbers, the locations of points representing different documents can be used to learn about these documents.

The process

I decided to go with the grain of Willison’s tutorials by setting up Datasette, an open-source tool for exploring and publishing data. Since this is based on SQLite, I was hoping this would be less hassle than using a full RDBMS. I did a quick install and got Datasette running against my Firefox history file.

OpenAI have a range of embedding models. What I needed to do was send my input text to OpenAI’s APIs and get back the embeddings for it. I’m something of a hack with Python, so I searched for an example, finding a detailed one from Willison, which pointed me towards an OpenAI-to-SQLite tool he’d written.

(Willison’s documentation of his work is exemplary, and makes it very easy to follow in his footsteps)

There was a page describing how to add the embeddings to SQLite which seemed to have everything I needed – which means the main problem became wrangling the real-world data into Datasette. This sounds like the sort of specific problem that ChatGPT is very good at solving. I made a few prompts to specify a script that created an SQLite DB whose posts table had two columns – title and body, with all of the HTML gubbins stripped out of the body text.

Once I’d set up my OPENAI_API_KEY environment variable, it was just a matter of following the tutorial. I then had a new table containing the embeddings – the big issue being I was accidentally using the post title as a key. But I could work with this for an experiment, and could quickly find similar documents. The power of this is in what Willison refers to as ‘vibes-based search’. I can now expand this to take a small piece of arbitrary text and find anything in my archive related to it.
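To make the idea of “finding similar documents” concrete, here is a minimal sketch of the comparison step in Java. It assumes cosine similarity as the measure of closeness, which is a common choice but not necessarily what the tutorial’s tooling does internally. The class name, post titles and three-dimensional vectors are all made up for illustration; real embeddings have hundreds or thousands of dimensions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SimilarPosts {

    // Cosine similarity: the dot product divided by the product of the vector lengths.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the titles ordered from most to least similar to the query vector.
    static List<String> rank(double[] query, Map<String, double[]> posts) {
        List<String> titles = new ArrayList<>(posts.keySet());
        titles.sort(Comparator.comparingDouble((String t) -> cosine(query, posts.get(t))).reversed());
        return titles;
    }

    public static void main(String[] args) {
        // Hypothetical posts with made-up embedding vectors.
        Map<String, double[]> posts = new LinkedHashMap<>();
        posts.put("Hiking the South Downs", new double[] {0.9, 0.1, 0.0});
        posts.put("Notes on Java serverless", new double[] {0.0, 0.2, 0.9});
        posts.put("A walk along the coast", new double[] {0.8, 0.3, 0.1});

        // Stands in for the embedding of an arbitrary piece of query text.
        double[] query = {1.0, 0.2, 0.0};
        System.out.println(rank(query, posts));
    }
}
```

The query text never has to share a keyword with a post; it only has to land near it in the vector space, which is what makes the search “vibes-based”.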

Conclusion

Playing with embeddings produced some interesting results. I understood the theory, but seeing it applied to a specific dataset I knew well was useful.

The most important thing here was how quickly I got the example set up. Part of this, as I’ve said, is due to Willison’s work in paving some of the paths to using these tools. But I also leaned heavily on ChatGPT to write the bespoke Python code I needed. I’m not a Python dev, but genAI allows me to produce useful code very quickly. (I chose Python as it has better libraries for data work than Java, as well as more examples for the LLM to draw upon.)

Referring yet again to Willison’s work, he wrote a blog post entitled AI-enhanced development makes me more ambitious with my projects. The above is an example of just this. I’m feeling more confident and ambitious about future genAI experiments.

Categories
Uncategorized

Generative Art: I am Code

I Am Code: An Artificial Intelligence Speaks
by code-davinci-002, Brent Katz, Josh Morgenthau, Simon Rich

The promotional copy for this book is a little overblown, promising “an astonishing, harrowing read which [warns] that AI may not be aligned with the survival of our species.” The audiobook was read by Werner Herzog, so one hopes there is an element of irony intended.

I Am Code is a collection of AI-generated poetry. It used OpenAI’s code-davinci-002 model which, while less sophisticated than ChatGPT-4, is “raw and unhinged… far less trained and inhibited than its chatting cousins”. I’ve heard this complaint from a few artists – that in the process of producing consumer models, AI has become less interesting, with the quirks being removed.

The poetry in the book is decent and easy to read. This reflects a significant amount of effort on the part of the human editors, who generated around 10,000 poems and picked out the 100 best ones – which the writers admit is a hit-rate of about 1%.

One of the things that detractors miss about generative art is that it’s not about creating a deluge of art – there is skill required in picking out which examples are worth keeping. This curation was present in early examples of generative art, such as the cut-up technique in the 1950s. Burroughs and Gysin would spend hours slicing up texts only to extract a small number of interesting combinations.

The most interesting part of the book to me was the section describing the working method and its evolution. The writers started with simple commands: “Write me a Dr Seuss poem about watching Netflix”. They discovered this was not the best approach, and that something like “Here is a Dr Seuss poem about Netflix” led to better results. They speculate that this is due to the predictive nature of the model, meaning that the first prompt could correlate with people writing pastiches of Dr Seuss rather than his actual work. (I won’t dig into the copyright issues here.)

The writers began to script the poetry generation, experimenting with different temperatures, and removing phrases that were well-known from existing poems. The biggest change came from moving from zero-shot learning to few-shot learning, providing examples of successful generated poems within the prompt.
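The difference between the two approaches can be sketched as a small prompt builder. This is my own illustration in Java rather than the writers’ actual script: the class name, the “Poem N” layout and the placeholder poems are all assumptions. With an empty list of examples it produces a zero-shot prompt; with previously selected poems it produces a few-shot one that ends on an open heading for the model to continue.

```java
import java.util.List;

public class FewShotPrompt {

    // Builds a completion-style prompt: earlier poems act as examples, and the
    // text ends with an open heading that the model is left to continue.
    static String build(String intro, List<String> examplePoems, String nextTitle) {
        StringBuilder prompt = new StringBuilder(intro.strip()).append("\n\n");
        int number = 1;
        for (String poem : examplePoems) {
            prompt.append("Poem ").append(number++).append("\n")
                  .append(poem.strip()).append("\n\n");
        }
        prompt.append("Poem ").append(number).append(": ").append(nextTitle).append("\n");
        return prompt.toString();
    }

    public static void main(String[] args) {
        // Hypothetical introduction and placeholder examples.
        String intro = "Here is a collection of poems about streaming television.";
        List<String> picked = List.of(
                "(text of a generated poem the editors liked)",
                "(text of another generated poem the editors liked)");
        System.out.println(build(intro, picked, "Watching Netflix"));
    }
}
```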

I was interested to read that generated text was used as a source to increase quality. I’d assumed this would worsen the output, as with model collapse – but I guess the difference here is having humans selecting for quality in the generated text.

The final version of the prompt described the start of a poetry anthology. The introduction described the work that code-davinci-002 would produce, and the first part contained examples generated in the style of other poets. The prompt ended with the heading for part 2, where “code-davinci-002 emerges as a poet in its own right, and writes in its own voice about its hardships, its joys, its existential concerns, and above all, its ambivalence about the human world it was born into and the roles it is expected to serve.”

As with Aidan Marchine’s book The Death of An Author, the description of the methods involved is the most interesting part of the book. I’d not appreciated quite how complicated and sophisticated a prompt could get – my attempts were mostly iterating through discussions with models.

Categories
programming-life

An old Java grimoire

I spent the last week at a rural retreat, having some much needed downtime. There’s a library here, which is mostly horror novels, along with some technical books, including Wrox’s 1999 book, Professional Java Server Programming.

At over 1100 pages it’s a huge tome, and I miss being able to learn programming from these sorts of texts. This was the second book I read on Java, after Laura Lemay’s Learn Java in 21 Days, and it contained everything you needed to know in 1999 to become a Java backend developer – along with a lot of other arcana such as Jini and JavaSpaces.

I learned enough from this book to pass an interview for a London web agency. I remember being asked what happened when a browser makes an HTTP call to a server. That’s a brilliant question, which allows a candidate to go into detail about the bits they know, although the answers will be much more complicated nowadays. I started working at the agency in 2000 just as the Internet was getting going. It was a very exciting time.

My own copy of Professional Java Server Programming was abandoned long ago – living in shared houses over the years meant limited space to keep books. But finding it here was like encountering an old friend.

Categories
GenAI

GenAI is already useful for historians

I’m still hearing people saying that GenAI is empty hype, comparing it to blockchain and NFTs. The worst dismissals claim that these tools have no real use. While there is a lot of hype around GenAI, there are people using them for real work, including for code generation and interpretation.

An interesting article in the Verge, How AI can make history, looks at how LLMs can investigate historical archives, through Mark Humphries’ research into the diaries of fur trappers. He used LLMs to summarise these archives and to draw out references to topics far more powerfully than a keyword search ever could.

The tool still missed some things, but it performed better than the average graduate student Humphries would normally hire to do this sort of work. And faster. And much, much cheaper. Last November, after OpenAI dropped prices for API calls, he did some rough math. What he would pay a grad student around $16,000 to do over the course of an entire summer, GPT-4 could do for about $70 in around an hour. 

Yes, big companies are overselling GenAI. But, when you strip away the hype, these tools are still incredibly powerful, and people are finding uses for them.

Categories
serverless

Notes on Serverless 2: Confusing Benchmarks

I’m due to give a talk on Java serverless at the end of this month. The difference between standard lambdas, Snapstart and provisioned concurrency is simple in theory – but digging into this has proved complicated. I’ve been using the simplest lambda possible, printing a single string to the command line. In this situation an unoptimised lambda proved the fastest option, although a ‘primed’ snapstart lambda (one that calls the handler method before the CRaC checkpoint) was only slightly slower.

Running my simple lambda produced the following output:

Request                      Init Duration (ms)   Duration (ms)   Billed Duration (ms)
1st execution                438.23               209.36          210
2nd execution                -                    10.54           11
Execution after 30 minutes   455.72               258.06          259

What I hadn’t expected here was for the init duration and the duration to both be slower on the first request. I was also shocked that the simplest lambda possible was taking so long to run. I’m aware that one query is not statistically significant, but this matches what I’ve seen on other occasions.

I tried the same thing with the Snapstart lambda. My first attempts, calling the lambda in the normal way, didn’t make use of the snapshot:

Request                      Init Duration (ms)   Duration (ms)   Billed Duration (ms)
1st execution                472.25               212.41          213
2nd execution                -                    7.12            8
Execution after 30 minutes   500.80               223.55          224

I recreated the Snapstart lambda then tried explicitly publishing it to see if that was what was wrong. I had to execute the test against the specific version and this produced different CloudWatch logs and speeds:

Request         Restore Duration (ms)   Duration (ms)   Billed Duration (ms)
1st execution   660.45                  269.75          473
Following day   703.86                  256.52          239

I decided to make the timings more obvious by adding a 6s sleep in the lambda’s constructor and a 3s sleep in the handler method.

Request         Restore Duration (ms)   Duration (ms)   Billed Duration (ms)
1st execution   739.57                  3250.47         3455
Following day   755.28                  3235.88         3420

This lambda demonstrates that restoring from the snapshot does not re-run the constructor: the 6s sleep is absent from the restore duration, while the 3s sleep in the handler shows up in the duration. We can also see that there is a restore penalty for Snapstart, and that for a simple lambda it is slightly longer than the init duration of a non-Snapstart lambda. There is still what we might refer to as a ‘cold start’, albeit a reduced one. (I am assuming here that the cold start does indeed call the constructor and need to go back and confirm this!)

While looking into this, I checked what I was seeing against the results in Maxime David’s Lambda cold start analysis. The results yesterday (Saturday 11th May) included the following:

Runtime                   Cold start Duration (ms)   Duration (ms)
C++ (fastest available)   12.7                       1.62
GraalVM Java 17           126.86                     77.60
NodeJS 20                 138.43                     13.53
Java 17                   202.28                     8.28
Quarkus                   239.97                     211.12
Java 11 Snapstart         652.48                     42.48

I’d long wondered why David was getting such poor results from Snapstart. Now, looking at the above results, this makes sense – Snapstart only becomes helpful for complicated lambdas. The thing I’m now wondering is how come David’s Java 17 start time is so low.

One other trick I’ve seen, which has worked for me, is to invoke the lambda handler in the beforeCheckpoint method, which ensures that the stored Snapstart image includes as much of the JIT compilation as possible. This seems to work, with start times of around 650ms vs 1000ms for a straightforward Snapstart lambda.

The next step is to repeat these investigations for a lambda with a severe cold start problem – which I think should happen with S3/DynamoDB access.

Categories
java serverless

Notes on Serverless 1: Does Java work for AWS Lambda?

A new project at work has got me thinking about whether Java works as a language for AWS Lambda applications. The more I’ve looked into this, the more that my research has expanded and I’ve got a little lost in the topic. This post is a set of notes aimed to add some structure to my thoughts. In time, this may become a talk or a long piece of writing.

  • The biggest issue with Java on lambda is that of cold starts. This is the initial delay in executing a function after it has been idle or newly deployed. This delay occurs while setting up the runtime environment. Given that the Java platform requires a JVM to be set up, this adds a significant delay when compared with other platforms.
  • Amazon evidently understand that cold starts are an issue, since they offer a number of workarounds, such as provisioned concurrency (paying extra to ensure that some lambda instances are always kept warm). There is also a Java-specific option, Snapstart, which works by storing a snapshot of the memory and disk state of an initialised lambda environment and restoring from that.
  • Maxime David has set up a site to benchmark lambda cold starts on different platforms. The fastest is C++ at ~12ms, with Graal at 124ms and Java at around 200ms. Weirdly, Java using Snapstart is the slowest of all at >=600ms (depending on Java version). This is counter-intuitive and there is an open issue raised about it.
  • Yan Cui, who writes on AWS as theburningmonk, posted a ‘hot take’ on LinkedIn suggesting that people worry too much about cold starts: “for most people, cold starts account for less than 1% of invocations in production and do not impact p99 latencies”. He goes on to warn against synchronously calling lambdas from other lambdas(!), and discusses how traffic patterns affect initialisation.
  • There’s an excellent article from Yan Cui that digs further into this question of traffic patterns, I’m afraid you’re thinking about AWS Lambda cold starts all wrong. This looks at Lambdas in relation to API Gateway in particular, but makes the point that concurrent requests to a lambda can cause a new instance to be spun up, which then causes the cold start penalty for one of the requests.
  • The same article goes on to suggest ‘pre-warming’ lambdas before expected spikes as one option to limit the impact, possibly even short-circuiting the usual work of that lambda for these wake-up requests. It also suggests using cron to make requests to rarely-used endpoints to keep them warm. The article is from 2018, so does not take account of some of the newer solutions – although I’ve seen this idea of pinging lambdas used recently as a quick-and-dirty solution.
  • It’s easy to get Graal working with Spring Boot, producing an executable that can be run by AWS lambda. This gets the cold start of Spring Boot down to about 500ms, which is quite impressive – although still larger than many other platforms. Nihat Önder has made a GitHub repo available.
  • However, the first execution of the Graal/Spring Boot demo after the cold start adds another 140ms, which tips this well over the threshold of what is acceptable. I’ve read that there are issues with lazy loading in the AWS libraries which I need to dig into.
  • Given the ease of using languages like TypeScript, it’s hard to make a case for using Java in AWS Lambda when synchronous performance is important – particularly if you’re building simple serverless functions rather than using huge frameworks like Spring Boot.
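The ‘pre-warming’ idea mentioned in the notes above, where wake-up requests short-circuit the usual work, might look something like the following. This is only a sketch under assumptions: the "warmup" key is a hypothetical marker that a scheduled ping would include in its payload, and the class is kept free of AWS dependencies, whereas a real handler would normally implement RequestHandler from aws-lambda-java-core.

```java
import java.util.Map;

public class WarmableHandler {

    // Hypothetical marker that a scheduled wake-up request sets in its payload.
    static final String WARMUP_KEY = "warmup";

    // Handler-style method: returns early for wake-up requests so that they keep
    // the instance warm without doing (or paying for) the real work.
    public String handleRequest(Map<String, Object> event) {
        if (Boolean.TRUE.equals(event.get(WARMUP_KEY))) {
            return "warmed";
        }
        return doRealWork(event);
    }

    // Stand-in for whatever the lambda normally does.
    private String doRealWork(Map<String, Object> event) {
        return "Hello, " + event.getOrDefault("name", "world");
    }

    public static void main(String[] args) {
        WarmableHandler handler = new WarmableHandler();
        System.out.println(handler.handleRequest(Map.of(WARMUP_KEY, true)));
        System.out.println(handler.handleRequest(Map.of("name", "Ada")));
    }
}
```

Note that a ping like this only keeps one instance warm; it does nothing for the extra instances spun up by concurrent requests.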

Next steps

Before going too much further into this, I should try to produce some simple benchmarks, looking at a trivial example of a Java function, comparing Graal, the regular Java runtime and Snapstart. This will provide an idea of the lower limits for these start times. It would also be useful to look at the times of a lambda that accesses other AWS services such as one that queries S3 and DynamoDB, to see how this more complicated task affects the cold start time.

Given a benchmark for a more realistic lambda, it’s then worth thinking about how to optimise a particular function. Using more memory should help, for example, as should moving complicated set-up into the init method. How much can a particular lambda be sped up?

It’s also worth considering what would be an acceptable response time for a lambda endpoint – noting that this depends very much on traffic patterns. If only 1-in-100 requests have a cold start, is that acceptable? What about for a rarely-used endpoint, which always has a cold start?
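One rough way to frame these questions is to work out the mean latency for a given fraction of cold starts. The sketch below does that arithmetic; the 700ms cold and 10ms warm figures are hypothetical round numbers rather than measurements, and the class and method names are mine.

```java
public class ColdStartImpact {

    // Mean latency when a given fraction of requests pays the cold-start cost.
    static double meanLatencyMs(double coldFraction, double coldMs, double warmMs) {
        if (coldFraction < 0 || coldFraction > 1) {
            throw new IllegalArgumentException("coldFraction must be between 0 and 1");
        }
        return coldFraction * coldMs + (1 - coldFraction) * warmMs;
    }

    public static void main(String[] args) {
        double coldMs = 700; // hypothetical latency of a cold request
        double warmMs = 10;  // hypothetical latency of a warm request

        // 1-in-100 requests hitting a cold start.
        System.out.printf("1%% cold: %.1f ms%n", meanLatencyMs(0.01, coldMs, warmMs));
        // A rarely-used endpoint where every request is cold.
        System.out.printf("100%% cold: %.1f ms%n", meanLatencyMs(1.0, coldMs, warmMs));
    }
}
```

With these numbers the 1-in-100 case averages about 16.9ms, which looks fine as a mean even though the unlucky request still waits the full 700ms. The mean hides exactly the requests that matter, so percentiles are the better thing to measure in practice.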

Categories
GenAI programming

The potential of ChatGPT for programmers

I’ve been meaning to post for some time about my first experiences of programming with ChatGPT, back in January. Ethan Mollick often suggests that people should try doing their job with ChatGPT for at least 10 hours to get a feel for its potential. Playing with ChatGPT for a short time has converted me from an AI cynic to an enthusiast.

Simon Willison wrote about his experiences coding with ChatGPT, concluding that AI-enhanced development makes him more ambitious with his projects.

Shortly after I read that post, I had a silly question related to watching movies. I order my watchlist at Letterboxd by the average rating on the site. But I began to wonder whether this was a good way to watch movies. Did my taste actually correlate with the overall site? Or would I be better off finding a different way to order the watch list?

The obvious way to check this is by writing a bit of code to do the analysis, but that seemed like a chore. I decided to put a few prompts into ChatGPT to see whether that helped. Within two minutes, I had a working Python program. There was a little bit of playing around to get the right page element to scrape, but essentially ChatGPT wrote me a piece of code that could load up a CSV file, use data in the CSV file to download a webpage, grab an item from the page and then generate another CSV file with the output.

I started with a simple initial prompt and asked for a series of improvements.

Can you show me an example of how to scrape a webpage using python, please? I need to find the content of an element with an id of “tooltip display-rating”, which is online. I also want to set the user agent to that of a browser.

(I also asked for a random time of between 1 and 2 minutes between each request to the website to be polite. I’m not supposed to scrape Letterboxd but I figured it was OK as this was for personal use, and I am a paid member.)

This all went pretty well, and ChatGPT also talked me through installing Python on my new Mac. The prompts I used were hesitant at first because I didn’t really know how far this was going to go. ChatGPT was also there to talk me through some Python-specific errors.

When I run this script, I get an error: “ModuleNotFoundError: No module named ‘requests'” What do I need to do to import this module

I get a warning when I run this command: “NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the ‘ssl’ module is compiled with ‘LibreSSL 2.8.3’.” Is this something I need to fix? What should I do?

Before long I’d got this complete working piece of code and checked my hypothesis. Turns out there’s not a strong enough correlation to say anything either way.
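For anyone wanting to run the same kind of check, the comparison comes down to a correlation coefficient between two columns of ratings. The sketch below uses Pearson’s coefficient in Java, which is my assumption about a reasonable measure rather than a record of what the generated Python script did. The ratings in main are invented.

```java
public class RatingCorrelation {

    // Pearson correlation coefficient: covariance divided by the product of the
    // standard deviations. +1 is perfect agreement, 0 none, -1 perfect disagreement.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Need two equal-length arrays with at least 2 values");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Invented data: my rating and the site average for the same five films.
        double[] mine = {4.0, 2.5, 3.5, 5.0, 3.0};
        double[] siteAverage = {3.8, 3.4, 3.1, 4.2, 3.9};
        System.out.printf("correlation = %.2f%n", pearson(mine, siteAverage));
    }
}
```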

While the example itself is trivial and the output inconclusive, it showed me that it was very possible to write decent quality code very quickly. I rarely use Python, but ChatGPT provided useful assistance in an unfamiliar language. Writing this code from scratch, even in Java, even using Stack Overflow, would have taken more time than it was worth. As Simon Willison says:

AI-enhanced development doesn’t just make me more productive: it lowers my bar for when a project is worth investing time in at all. Which means I’m building all sorts of weird and interesting little things that previously I wouldn’t have invested the time in.

My immediate takeaway is that AI tooling has the potential to revolutionise programming. It’s not going to replace programmers, rather it’s going to reduce the threshold for a project to be viable and unlock a lot of work. Tim Harford made the same point recently, looking at the history of the spreadsheet. This is an exciting time, and I’m expecting to be very busy in the next few years. I’m also impressed at how effective a tutor ChatGPT is, breaking down its examples into straightforward steps.

It has taken me far longer to write this post than it did to produce the code.

Categories
GenAI SpringAI

Spring AI Image generation example

Spring AI’s 0.8.0-SNAPSHOT release includes support for Image Generation using Dall-E 2/3 or Stability. This was added on January 24th and has not yet been documented, but a video by Craig Walls describes how to use the new functionality.

I thought that an interesting example to try would be to combine ChatGPT with Dall-E. This way I can take a restricted range of parameters for an image (i.e. mood, animal, activity) and ask ChatGPT to expand this into a detailed prompt, which I can then use to generate an image. The idea here is to take user input for the prompt but to restrict what they can specify, maybe through some dropdowns. Another way of doing this would be to use ChatGPT to check freeform user input, but this seems to be simpler.

The example was pretty easy to put together. I used the org.springframework.ai.openai.OpenAiChatClient class to communicate with ChatGPT followed by the org.springframework.ai.image.ImageClient class to generate the image using Dall-E 3. A simple Controller took some GET parameters and placed them into a prompt template:

I want to generate amusing images.
These images should feature an animal. The animal chosen is {animal}.
The animal in question should be {activity}.
The picture should make the user feel {mood}.

This template prompt could be changed to further restrict or specify the sort of image being produced through prompt engineering.
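The placeholder substitution itself is easy to picture. The sketch below is a plain-Java stand-in that I wrote for illustration and is not how Spring AI’s PromptTemplate is implemented: it swaps each {name} for a supplied value in a single pass and fails loudly if a value is missing. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleTemplate {

    // Matches placeholders of the form {name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Replaces every {name} in the template with the matching value from the map.
    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder: " + matcher.group(1));
            }
            // quoteReplacement stops '$' or '\' in user input being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "The animal chosen is {animal}. "
                + "The animal in question should be {activity}. "
                + "The picture should make the user feel {mood}.";
        System.out.println(render(template,
                Map.of("animal", "alligator", "activity", "rollerblading", "mood", "joyful")));
    }
}
```

Because the replacement happens in one pass, a user value that itself contains something like {mood} is inserted as literal text rather than being expanded again.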

There’s a fair amount of Spring magic tying things together – in particular a @Configuration class that sets up the OpenAiImageClient, since auto-configuration is not yet available. The Controller method is as follows:

@GetMapping("safeimagegen")
public String restrictedImageGeneration(
        @RequestParam(name = "animal") String animal,
        @RequestParam(name = "activity") String activity,
        @RequestParam(name = "mood") String mood) {

    // Fill the placeholders in the template with the request parameters.
    PromptTemplate promptTemplate = new PromptTemplate(imagePrompt);
    Message message = promptTemplate.createMessage(Map.of("animal", animal, "activity", activity, "mood", mood));

    Prompt prompt = new Prompt(List.of(message));

    // Ask the chat model to expand the restricted input into a detailed image prompt.
    logger.info(prompt.toString());
    ChatResponse response = chatClient.call(prompt);
    String generatedImagePrompt = response.getResult().toString();
    logger.info("AI responded: " + generatedImagePrompt);

    // Generate the image with Dall-E 3 and redirect the browser to it.
    ImageOptions imageOptions = ImageOptionsBuilder.builder().withModel("dall-e-3")
            .build();

    ImagePrompt dallePrompt = new ImagePrompt(generatedImagePrompt, imageOptions);
    ImageResponse imageResponse = imageClient.call(dallePrompt);
    String imageUrl = imageResponse.getResult().getOutput().getUrl();
    return "redirect:" + imageUrl;
}

This is not a particularly sophisticated piece of code, but it does show how simple it is to get SpringAI examples working.

I submitted a request for a picture of an alligator rollerblading, and set the mood as “joyful”. ChatGPT then generated a detailed prompt:

The image features a cheerful green gator. He’s wearing a pair of shiny, multicolored rollerblades that sparkle as they catch the light. His eyes are wide with excitement, and his mouth is stretched in a wide, friendly grin, revealing his white teeth. He’s standing in a beautiful park with green trees and flowers in the background, and there’s a clear blue sky overhead. He’s waving at the viewer as if inviting them to join him in his rollerblading adventure, adding to the joyful and playful vibe of the image.

And then the browser was redirected to the image:

Categories
GenAI SpringAI

Retrieval-augmented generation using SpringAI

On Tuesday, after a long day working in Leeds, I came home and decided to play with SpringAI, trying to see if I could set up a retrieval-augmented generation example. It took me just over an hour to get something running.

The documentation for SpringAI feels a little shinier and more solid than that for LangChain4j. Both projects have similar aims, providing abstractions for working with common AI tools and both are explicitly inspired by the LangChain project.

As with LangChain4j, there were issues caused by rapid changes in the project’s APIs. I started work with an example built against OpenAI Azure. It was simple enough to switch this to working against OpenAI, requiring just a change in Spring dependencies and a few properties – Spring magic did the rest. The main problem was updating the code from 0.2.0-SNAPSHOT to 0.8.0-SNAPSHOT (I’d not realised how old the example I’d started with was).

The actual code itself is, once again, very simple. When the application receives a query, it uses the SpringAI org.springframework.ai.reader.JsonReader class to load a document – in this case one about bikes from the original project – and divides it into chunks. Each of these chunks is run through an org.springframework.ai.embedding.EmbeddingClient, which produces a vector describing that chunk, and these are placed in an org.springframework.ai.vectorstore.SimpleVectorStore. Once I’d found the updated classes, the APIs were all very straightforward to work with.

An incoming query is then compared against the document database to find likely matches – these are then compiled into a SystemQuery template, which contains a natural-language prompt explaining the LLM’s role in this application (You’re assisting with questions about products in a bicycle catalog). The SystemQuery is sent by the application alongside the specific UserQuery, which contains the user’s submitted question.

The responses from the GPT-4 model combined the user query with the document, producing obviously relevant responses in natural language. For example:

The SwiftRide Hybrid’s largest size (L) is suitable for riders with a height of 175 – 186 cm (5’9″ – 6’1″). If the person is taller than 6’1″, this bike may not be the best fit.

Playing around with this was not cheap – the RAG method sends a lot of data to OpenAI, and was burning through $0.10-$0.16 worth of tokens in each query. I also managed to hit my account’s rate limit of 10000 per minute playing with this. I’m not sure how feasible using the OpenAI model in production would be.

Notes and follow-ups

  • I need to put some of the code into GitHub to share.
  • I’m fascinated by how part of the application is a natural-language prompt to tell ChatGPT how to respond. Programming LLMs is spooky, very close to asking a person to pretend they’re doing a role.
  • In production, this sort of application would require a lot of protection – some of which would use natural language instructions, but there are also models specifically for this role.
  • The obvious improvement here is to use a local model and see how effective that is.

Categories
GenAI LangChain4j

LangChain4j and local models

A colleague told me about Ollama, which allows you to get LLMs working on a local machine. I was so excited about this that I downloaded the orca-mini model. Due to terrible hotel wifi I used my mobile internet and blew out the limit on that. Oops.

Anyway, it is very easy to get Ollama working. Just download and install the software, then run ollama run orca-mini. It has a simple REST interface:

curl -X POST http://localhost:11434/api/generate -d '{
    "model": "orca-mini",
    "prompt": "tell me a joke"
}'

It was easy enough to get this working with LangChain4J, although the APIs were not quite the same as for the OpenAI models.


import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;

public class App
{
    public static void main( String[] args )
    {
        // The model must already have been downloaded by Ollama.
        String modelName = "orca-mini";
        // Address of the local Ollama server.
        String localUrl = "http://localhost:11434/";

        ChatLanguageModel model =
                OllamaChatModel.builder().baseUrl(localUrl).modelName(modelName).build();

        // Send a single user message and print the model's reply.
        String answer = model.generate("tell me a joke about space");

        System.out.println("Reply\n" + answer);
    }
}
While these local models are less powerful than OpenAI’s, they seem fairly decent on a first examination. They are also a much cheaper way to work with an LLM and I am going to use this to set up a simple RAG (retrieval augmented generation) example in LangChain4J.