Categories
weeknotes

Weeknotes: 2025-30/29

  • I drafted some notes last week, but didn’t press publish, so these notes are two weeks’ worth.
  • A client colleague prompted me to make more use of Copilot in Teams. It’s hugely useful, but there’s a gap between reading and writing in all these tools – it’s too easy to copy and paste the application summary rather than edit it (particularly if you have another meeting to get to, since the context disappears when you move away). It’s going to be interesting to see how helpful this proves in the long run.
  • I wonder if remote working is increasing the number of meetings as it is so easy to book them – and cameras off means that there are people multi-tasking, rather than looking bored in the room. There’s no feedback to prompt people to push back against the calls.
  • I’ve been playing with Amazon Q. The UX is an atrocity, but the tool itself is impressive and compelling. There are, however, a lot of subtleties about how this would work as a development workflow, and how it will scale up for use in large organisations. I’m using the Nilenso piece on AI-coding as a guideline. I made a post about my initial response to Q and another one about my second week.

Links

  • I’ve been catching up on Sean Goedecke’s excellent writing. In Do Not Yell at the Language Model he talks about how berating a language model for mistakes might create a negative context, producing worse results.
  • Peter Hilton describes an amazing lightning talk, where Chris Oldwood told programming jokes for 5 minutes. Hilton goes on to imagine a book of 97 Jokes Every Programmer Should Know, suggesting that such jokes are a good way to learn some aspects of programming. “There are 10 kinds of programmers: those who understand binary, those who don’t, and those who weren’t expecting a base 3 joke.”
  • Charity Majors wrote an interesting piece, On How Long it Takes to Know if a Job is Right for You or Not, in which she talks about the need for alignment between a manager’s values and the company they work for.
  • The striking thing about Bo Frese’s The 13 Ways We Kill High-Performing Agile Teams was how often these occur, despite going against well-known best practice. It was also interesting to see that the Scrum Guide had removed ‘the three questions’ as a stand-up practice.
  • Good retros are hard, and Who Needs Action Items by Daniel Cooper is a good piece on this. “Eventually, people stop bringing anything that actually matters and it’ll all be fluff. No one wants to accidentally become the owner of ‘improve emotional tone in retros (Q3 OKR)’.”

Books

I completed a re-read of Kent Beck’s Extreme Programming Explained, which I last read back around 2001. I have a lot of notes to reflect on, but the biggest surprise was how little empirical evidence Beck had for his theories. Which is not to say I think Beck is wrong per se, rather that his insights are based on a particular set of experiences. There were also some provocative thoughts about documentation which go against what I think, and are worth interrogating.

Categories
GenAI

Summer of Q: Week 2

My overall impression, after more time working with Amazon Q, is that it will take some work for a coding agent to make me faster and more effective. Q definitely removes some of the boring bits of coding (it’s great at Maven dependencies) but it’s more wayward on complicated tasks. There’s a lot to learn here.

At the end of last weekend, I’d settled on a method: writing a specification for an area of my application, having Q produce a BDD feature file outlining the behaviour, and then getting Q to fill in the testing code and after that, the implementation. This soon ran into problems as I’d still set Q too wide a brief, and the code produced quickly sprawled. There were many minor issues, such as Q producing unfocussed Cucumber step files. Along with the pages of code, some chunks of functionality were left out to ‘fill in later’.
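
For contrast, this is the kind of focused step class I was hoping Q would produce – a minimal sketch assuming cucumber-java and JUnit, with the quiz scoring rule invented purely for illustration:

```java
import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;

import static org.junit.jupiter.api.Assertions.assertEquals;

// One step class per feature, covering a single behaviour (scoring),
// rather than the sprawling catch-all step files Q tended to generate.
public class QuizScoringSteps {

    private int questionCount;
    private int correctAnswers;

    @Given("a quiz with {int} questions")
    public void aQuizWithQuestions(int count) {
        this.questionCount = count;
    }

    @When("the player answers {int} questions correctly")
    public void thePlayerAnswers(int correct) {
        this.correctAnswers = correct;
    }

    @Then("the score is {int} percent")
    public void theScoreIs(int expectedPercent) {
        assertEquals(expectedPercent, (correctAnswers * 100) / questionCount);
    }
}
```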

It’s tricky to find a regular working pattern with good DevEx. I didn’t want to put Q into ‘trust’ mode, choosing rather to review each change as it was prepared. I did this so I could interrupt Q when it went off the rails, and also to reduce the amount of generated code I needed to review. This meant a lot of time waiting while Q was ‘thinking’. One colleague talked about their passion for writing code and how reviewing generated things is not the same. In their current form, these tools don’t have the responsiveness of working directly with code.

Generating the code also produced a strange effect around ownership. Hand-writing code (or whatever we call the ‘old’ ways of programming) meant taking care with each method. It was a good way to get inside the code, producing ‘mechanical sympathy’. Here, I started with a simple outline of my application in 275 words. Q produced over 10,000 words of feature files (including some useful functionality that was not asked for, such as sanitising inputs). This is a lot of reading! Assuming a reading rate of 400 words per minute, that is 25 minutes’ work – setting aside the deeper understanding needed here, and any editing required.

Q also proved to be better at some things than others. When asked to generate some test data, Q created a program to populate the DB on start-up. I had to suggest using Liquibase instead. Getting the best out of this tool requires the operator to have a clear idea of what they expect.

I’m still convinced that these tools will be part of a regular toolkit, but I don’t think they will offer the sort of incredible gains some have suggested – although they will be essential for prototyping. Cal Newport produced a great summary of the competing claims about productivity. My prediction is that, in the long run, we’ll see significant gains, but we won’t be relying solely on the agents.

Categories
GenAI

First Impressions of Amazon Q

My employer has organised a ‘Summer of Q’, where a number of us have signed up to play with Amazon Q. This weekend was the first time I could work with Q in depth. The main result – I ‘built’ a quiz application in 30 minutes (while also doing some chores) and it looked and worked better than what I’d have produced solo. But there are a lot of subtleties and caveats to add to this.

  • A major argument against GenAI putting developers out of work is how poor the tooling and signup flows for Q are. The signup is terrible and confuses a lot of people. Q failed to help, and kept hallucinating links to help pages that didn’t exist. The IntelliJ plugin is awful and locks the IDE, so I’ve had to use the command-line version instead.
  • Q is great at producing code. The quiz example was a trivial task, so I’m now working on a much more complicated one. Straight away, I can see Q making me more effective. Personal tools I’ve wanted to make, but decided against investing time in, now look easy.
  • The quiz app that Q produced looked and played better than what I could have produced by myself. I’m very impressed by this.
  • The model’s reasoning is clever and spooky – it makes mistakes sometimes, but then works to fix those. Interesting behaviour – although I expect there to be fewer mistakes in the generated code over time.
  • One of the challenges of coding agents is getting used to the new workflow. There’s a fair bit of waiting involved while Q thinks about each file that needs creating. It’s very different to using a GenAI coding assistant, and I need to figure out the best new workflow.
  • An ongoing problem with GenAI is that it involves a lot more reading than writing. I figure almost no-one is reading Copilot meeting summaries, and I worry that not everyone will closely read the impressive amount of code that Q generates.
  • At present, I’m reviewing each action Q takes, rather than trusting it for the session. It’s going to be interesting to see how other people are working. There’s a lot of boring waiting this way, but a lot less reading to do in one go.
  • Being able to produce decent (albeit not perfect) code so quickly will change the nature of programming. The coding part is going to get much easier. The development part – making sure the right thing is produced – will become more important, and maybe more difficult. I’m currently using feature tests as a way of validating what is being made.
  • Something I’ve noticed with GenAI in a number of areas is the importance of taste. The tools produce things (image/text/code) incredibly fast, and require an operator with strong opinions about this output.
  • Q responded to my initial, naive prompts by producing ornate additional features. For example, I asked it to generate some BDD feature files and it added some complicated accessibility tests. I’m looking forward to watching it try to fill those out! I also spotted some subtle divergences from the spec that I need to edit. The quiz code I initially generated also included a lot of useful but unasked-for features. They were improvements, for sure, but it was definitely not an MVP. It will be interesting to see how easy it is to work with Q on my more complicated application.
Categories
weeknotes

Weeknotes: 2025-28

  • I’ve been working this week on MongoDB replica sets and I’m very impressed with their resilience, particularly the use of an intelligent client in the driver to handle failover etc. – there’s a minimal driver sketch below.
  • As part of an initiative at work, I started playing with Amazon Q, initially asking it to generate some basic arcade games. My first impression was how impressive the simple examples were, while being aware of the challenge of getting precise results from a coding agent. Something I need to spend more time on.
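
Connecting to a replica set through the Java driver is pleasingly small. This is a minimal sketch (the hostnames and the set name rs0 are placeholders): the client is given every member, discovers the topology itself, and retries writes if the primary changes.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class ReplicaSetDemo {
    public static void main(String[] args) {
        // List every member plus the set name; the driver monitors the topology
        // and transparently fails over to a new primary when needed.
        String uri = "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/"
                + "?replicaSet=rs0&retryWrites=true&w=majority";
        try (MongoClient client = MongoClients.create(uri)) {
            MongoCollection<Document> notes = client.getDatabase("demo").getCollection("notes");
            notes.insertOne(new Document("week", "2025-28").append("topic", "replica sets"));
            System.out.println("documents: " + notes.countDocuments());
        }
    }
}
```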

Links

  • An excellent post from Sean Goedecke, AI Interpretability is further along than I thought, talks about the internals of language models – it was a useful reminder of why telling a chatbot that it’s an expert works.
  • AI-assisted coding for teams that can’t get away with vibes (via Simon Willison) was a useful primer on large-scale coding with GenAI. A useful rule here was ‘what helps the human helps the AI’, including linting, CI/CD, documentation and clearly defined features. There are some good examples around prompting, and how AIs are used to build the prompts to code from. The most interesting bit, and something I’d like to go back to, is the claim that the DRY principle is less useful when working with LLMs. This is a living document maintained by nilenso, which I will have to keep an eye on.
  • Could HTTP 402 be the Future of the Web was a good speculative article about the need for micropayments and how charging AI crawlers could lead to that.
  • Some excellent words of wisdom from Everything is Prioritization: “If you’re remote and still feel frazzled, you’re not doing remote wrong. You’re just prioritizing availability over impact.” The article talks about the need to avoid tempting distractions: “The best teams aren’t full of geniuses. They’re full of people who keep their focus and say ‘no’ without having a breakdown”.
  • I’ve long disliked the cargo cult metaphor, and this is deconstructed in The origin of the cargo cult metaphor, which points out a lot of the errors and misconceptions in the popular understanding of actual cargo cults. “The cargo cult metaphor is best avoided”.
  • Simon Willison’s Identify, solve, verify is a short piece on the role of the programmer in the era of GenAI. “The more time I spend using LLMs for code, the less I worry about my career”.
  • The Elegance Question: What Makes Some Systems Just Work? set out some simple principles for building ‘elegant’ systems. This was thought-provoking, particularly around the question of why so many systems go against these principles.

Books

No time for reading this week – and I’ve been distracted by a non-tech book.

Categories
weeknotes

Weeknotes: 2025-27

  • I’m going to try writing a few weeknotes to see how they feel. I need some way to consolidate everything I’m reading and thinking about, but longer blog posts are not coming together. These weeknotes will help me track my technical interests – and hopefully help me find interesting blog posts when I need to refer back to them.
  • Last Sunday, I had an interesting conversation with Laurence where I found myself asking whether agile is too hard for most teams. Laurence pointed out that the core of agile is simple, but it does place a lot of demand on developers. I think the widely perceived failures of agile need much more consideration.
  • In another discussion with Laurence, I realised how vital GenAI skills will be for technical managers – there is a huge change in software development coming and staying current will require understanding those skills – not least to be able to support and unblock those who use them most.
  • Something I’ve not blogged about over the past few weeks is the decline of Stack Overflow. It’s been interesting to see how the references for learning technical skills have changed over the years.
  • One of the things I like most about working in a large consultancy is the number of talks and activities going on. An ‘unadvent of code’ group has started to look at the Advent of Code puzzles from 2018. This has got me playing with Go as a coding activity, which I’m enjoying.

Links

  • I watched the video Java for AI by Java Library Architect Paul Sandoz – another example of the Java platform’s strength as a combination of the JVM and its libraries. It will be good to see Java become a first-class platform for AI.
  • AI Is Poised to Re-write History is an interesting article looking at GenAI as a reading machine rather than a writing machine. It also interviews Mark Humphries, who was discussed in the excellent Feb 2024 Verge article How AI can make history.

Reading

Writing for Developers

I started reading this book, a recommendation from my colleague Matt. The book could probably be titled ‘Blogging for Developers’, and it’s interesting to see someone writing such a book in 2025. I like the book, but I definitely have philosophical differences with it, in that it focusses on blogging as a way to go viral, sometimes neglecting the more personal uses of blogging (such as weeknotes). A good counterpoint occurs in Simon Willison’s piece on keeping a link blog.

Categories
GenAI

Two notes on vibe coding

From Ashley Willis:

“[A mentor] pointed out that debugging AI generated code is a lot like onboarding into a legacy codebase, making sense of decisions you didn’t make, finding where things break, and learning to trust (or rewrite) what’s already there. That’s the kind of work a lot of developers end up doing anyway”

From Sean Goedecke:

Being good at debugging is more useful than being good at writing code – you only write a piece of code once, but you may end up debugging it hundreds of times. As programmers use more AI-written code, debugging may end up being the only remaining programming skill.

Categories
GenAI

Notes on RAG – Part 2

Continuing my research on RAG (part 1 here).

The week before last I worked on a simple example of RAG using Spring. This involved a lot of yak shaving, in part because Spring had updated its package structures since I last worked on my RAG demo. I also wanted to set up a local vector DB without using Docker, finally settling on MariaDB. In the end I had a simple example that took a CSV, inserted the rows as embeddings into a vector database and could run simple queries against it. I still need to tidy this up and upload it to GitHub.
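
The core of the demo ended up being small. Here’s a rough sketch of its shape, assuming Spring AI’s VectorStore abstraction (add / similaritySearch) with an auto-configured embedding model and the MariaDB vector store behind it – the exact class and package names have moved between Spring AI releases, so treat the details as approximate:

```java
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.ApplicationRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

@Configuration
public class RagDemo {

    // Load a CSV at start-up, store each row as an embedded document,
    // then run a sample similarity query against the vector store.
    @Bean
    ApplicationRunner loadAndQuery(VectorStore vectorStore) {
        return args -> {
            List<Document> rows = Files.readAllLines(Path.of("data.csv")).stream()
                    .skip(1) // header row
                    .map(row -> new Document(row, Map.of("source", "data.csv")))
                    .toList();
            vectorStore.add(rows); // embeds each row and writes it to the vector DB

            List<Document> matches = vectorStore.similaritySearch("an example query");
            matches.forEach(System.out::println);
        };
    }
}
```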

Interesting Links

  • I need to look into using vector databases for recommendation engines.
  • “If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves.” – link
  • Simon Willison gave a great talk about embeddings. This introduced the concept, showed how to do ‘vibes-based search’ against SQLite using llm, and talked about 2D visualisations.
    • “Being able to spin up this kind of ultra-specific search engine in a few hours is exactly the kind of trick that excites me about having embeddings as a tool in my toolbox.”
    • “A fascinating thing about RAG is that it has so many different knobs that you can tweak. You can try different distance functions, different embedding models, different prompting strategies and different LLMs. There’s a lot of scope for experimentation here.”
  • Adding semantic search to Datasette
  • Lovely visualisation of 40 million Hacker News posts. This uses Uniform Manifold Approximation and Projection (UMAP) to reduce the vectors to a 2D geographic map (among other things). Some interesting applications of a massive dataset.
  • The llm tool can be used for image search using CLIP
  • This leads to someone searching images of faucets by image and phrase (what taps best represent the idea of ‘Bond villain’?)
  • According to ChatGPT, people have played with using vector databases to analyse recipes. There is a paper on this, ‘Learning Cross-modal Embeddings for Cooking Recipes and Food Images’, but I can’t find any details on applications with it, experimental or otherwise.
  • Interesting discussion of search in RAG (look for the RAG section)
  • Text Embedding Models Contain Bias. Here’s Why That Matters – interesting Google paper from 2018
  • Using a vector database, SkyCLIP, and Leaflet to create a searchable aerial photograph
  • “This is why Retrieval-Augmented Generation (RAG) is not going anywhere. RAG is basically the practice of telling the LLM what it needs to know and then immediately asking it for that information back in condensed form. LLMs are great at it, which is why RAG is so popular.” link
  • Spring AI documentation on RAG
Categories
GenAI

Notes on Retrieval-Augmented Generation (part 1)

I’ve been thinking recently about Retrieval-Augmented Generation (RAG) and vector databases, and I wanted to gather some notes towards a talk.

The question

Most of the applications I’m seeing for Generative AI seem to involve RAG. But I feel that the vector database is doing most of the interesting work here – and, in a lot of cases, an LLM-generated response is not always the best ‘view’ for the data that is returned. I want to dig a little more into vector databases, how they work, and what can be done with them.

Panda Smith wrote, “If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves”; I want to learn more about this search part of RAG.

Previous posts

Notes from Wikipedia

Pulling out some notes from the Wikipedia article:

  • “[RAG] modifies interactions with a large language model so that the model responds to user queries with reference to a specified set of documents, using this information to supplement information from its pre-existing training data. This allows LLMs to use domain-specific and/or updated information.”
  • There are two phases – information retrieval and response generation
  • RAG was first proposed in the paper ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks’ in 2020
  • The process involves the following stages (a toy sketch in code follows this list):
    • Indexing – the documents to be searched are stored – usually by converting the data into vector representations
    • Retrieval – given a user query, a document retriever finds the most relevant documents for the query
    • Augmentation – the documents are put in a prompt for an LLM
    • Generation – the LLM generates a response based upon the prompt
  • The ‘chunking’ of the documents (how they are divided up into pieces to be stored) affects how good the responses are.
  • Risks of RAG
    • While RAG reduces hallucinations in the responses, it cannot eliminate them.
    • There is a danger of losing important context in the chunking phase.
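
Putting those stages together, here’s a toy version in plain Java – the embedding model and the LLM call are passed in as functions and are stand-ins rather than any particular library, and a real system would add proper chunking and a vector database:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Indexing, retrieval (by cosine similarity), augmentation and generation
// in one small class.
public class TinyRag {

    record Chunk(String text, double[] vector) {}

    private final Function<String, double[]> embed;   // stand-in for an embedding model
    private final Function<String, String> complete;  // stand-in for an LLM call
    private final List<Chunk> store = new ArrayList<>();

    public TinyRag(Function<String, double[]> embed, Function<String, String> complete) {
        this.embed = embed;
        this.complete = complete;
    }

    // Indexing: keep each chunk alongside its vector representation.
    public void index(List<String> chunks) {
        chunks.forEach(text -> store.add(new Chunk(text, embed.apply(text))));
    }

    // Retrieval + augmentation + generation for a single question.
    public String answer(String question, int topK) {
        double[] q = embed.apply(question);
        String context = store.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> cosine(q, c.vector())).reversed())
                .limit(topK)
                .map(Chunk::text)
                .collect(Collectors.joining("\n"));
        String prompt = "Answer using only this context:\n" + context
                + "\n\nQuestion: " + question;
        return complete.apply(prompt);
    }

    private static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```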

Interesting links

Categories
java

Does Java have a future?

I’ve been working with Java for 25 years. After a shaky time in the noughties, the platform has thrived, and a huge number of applications have been built on it. But I’m starting to wonder about Java’s future, given three things: new features in the language, the JVM itself, and the rise of generative AI.

When Java started it was an exciting prospect – an object-oriented language that could run on multiple platforms, but without the complexities of C++. During the 00s it looked like newer languages would take its place, but with Spring Boot it’s become a popular language on the cloud.

It’s been a long time since Java was cool, but CTOs definitely like it and there is a massive sunk cost invested in the ecosystem. While I’ve worked with other languages, I expected to be working primarily with Java for the remainder of my career. Some people have mocked Java as the new COBOL, but I see a steady supply of work as a positive thing.

One of Java’s strengths is its simplicity. A couple of years back, I published an article for my previous consultancy about Why Java Still Matters. I concluded that Java’s strength was its readability compared to other languages, and that helped with collaboration: “[Java’s] lack of sophistication forces developers to produce more straightforward code [and] we write code for other developers, not the machine.”

Since 2017, Java has committed to twice-yearly releases, which has led to a large number of new features in the language. I have argued (mostly in jest) that Java peaked with version 7, before the introduction of functional programming features. The new features in Java are undoubtedly expressive, but at the cost of some consistency. For a long time, Java applications tended to look very similar between workplaces. This will be less true as the language becomes richer.
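
As a small illustration of how much the surface of the language has shifted, this (my own example) uses records, sealed interfaces and pattern matching for switch – none of which existed in Java 7:

```java
// A sealed hierarchy with records and an exhaustive pattern-matching switch
// (Java 21) – expressive, but a long way from 'classic' Java style.
sealed interface Shape permits Circle, Rectangle {}
record Circle(double radius) implements Shape {}
record Rectangle(double width, double height) implements Shape {}

class Areas {
    static double area(Shape shape) {
        return switch (shape) {
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Rectangle r -> r.width() * r.height();
        };
    }
}
```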

If Java stops being a simple language it becomes less compelling as a choice. Indeed, Java’s backward compatibility has produced some strange versions of modern features. Why not pick a language that was designed from scratch to include these things?

Java also brings with it the deadweight of the JVM. While Java is fast once it’s up-and-running, there are significant start-up costs that are particularly punishing on serverless. Yes, there are workarounds but these bring their own problems. GraalVM’s incredibly fast start-up comes at the cost of slower build times and significant differences between production software and development versions. Over the past few months, my workplace Java user group has been discussing the problems of Java on serverless and the outcomes are frustrating. It’s hard not to feel like we’re making excuses for the platform we work on.

Despite both of the above issues, a lot of money has been invested in Java, and companies are unlikely to switch. But, with the rise of generative AI, it’s easier than ever for developers to get working with unfamiliar languages. And it’s also going to get easier to convert existing applications to new platforms. The first tools to do this are imperfect, but they will improve.

Java has always been a clunky language to work with. The boilerplate made hacking on new ideas unrewarding (although tools such as JHipster helped massively). GenAI supports me in setting things up in a new language, and it’s a great tutor, able to hone its examples to support the specific thing I’m trying to build.

I’ve had a lot of fun working with Java over the years, but I’m starting to feel that, long-term, my future lies with other languages. It’s time to explore some alternatives.

Categories
programming

The Importance of Blogging for Programmers

I started this weblog in November 2014 and have published 104 posts – a little under once a month. It has a very small readership. I still find it useful for two reasons. First, there are the reference posts that are useful documentation (for example recipes for GIS or a checklist of scheduling issues). Then there are the posts that help develop my thinking about things.

This latter type of post is a form of rubber ducking, and is useful even if nobody reads it. As EM Forster asked1, “How do I know what I think until I see what I say?” Writing about a subject is a useful form of deliberate practice that helps develop insights and skills.

The problem is that these posts take a lot of work to write, and I’ve abandoned dozens over the years – some of which would have been helpful for tracking my development on topics. I’d love to look back at how my thoughts on Generative AI have changed.

Over the next year, I want to write more about programming and my experience of it. But an important first stage of this is reducing the effort required to publish something useful.

I’ve been inspired by a recent post on this topic by Hamel Husain, Building an Audience Through Technical Writing: Strategies and Mistakes. There’s a lot of good advice in this post, but what stood out to me immediately was the idea of a voice-to-content pipeline. While I’ve used AI to transcribe written notes, I’d not actually made direct use of speech-to-text. Dictating the first part of this post has sped things up for me significantly.

Husain also discusses using AI models to help with generating the text, and I certainly want to explore creating prompts to help me with editing and proofreading (something Simon Willison discussed here).

An obvious question is why write public blog posts rather than keeping a private list? First, I think that preparing thoughts for public consumption produces better summaries. Also, I think there’s value in having a public archive where others can respond to your thoughts. This might not happen often, but it is good to make space for this.

One of the biggest challenges I face with blogging is that I want every post to be as perfectly written as those by people like Charity Majors or Joel Spolsky. But I do think there is a space for smaller, more personal posts and link posts – some of which might eventually provide a basis for deeper essays.

There’s only a tiny audience for what I write here, but the most important part of this audience is me. Over the coming year I plan to post more. GenAI is a revolutionary technology for software development, and I want to follow this closely. I also want to think more about my experiences as a software developer and improving as a programmer2.

  1. Although he did not apparently originate the quote. ↩︎
  2. I also want to think about the difference between being a programmer and a software developer. One seems to be more at the level of individual functions, and I think I’m better at the latter than the former. ↩︎