My overall impression, after more time working with Amazon Q, is that it will take some work for a coding agent to make me faster and more effective. Q definitely removes some of the boring bits of coding (it’s great at Maven dependencies) but it’s more wayward on complicated tasks. There’s a lot to learn here.
At the end of last weekend, I’d settled on a method: write a specification for an area of my application, have Q produce a BDD feature file outlining the behaviour, then get Q to fill in the test code and, after that, the implementation. This soon ran into problems, as I’d still set Q too wide a brief, and the code it produced quickly sprawled. There were many minor issues, such as Q producing unfocussed Cucumber step files, and along with the pages of code, some chunks of functionality were left out to ‘fill in later’.
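For context, this is roughly the shape of artefact the workflow aims at: a short Gherkin scenario backed by a small, single-purpose step-definition class. The sketch below uses the standard cucumber-java annotations; the domain names (RegistrationService, registration.feature and so on) are hypothetical, not from my project.

```java
// Hypothetical example of a focused step-definition class covering one feature,
// rather than a sprawling catch-all step file.
//
// Gherkin scenario this class backs (from an illustrative registration.feature):
//   Scenario: Registering a new account
//     Given no account exists for "alice@example.com"
//     When a user registers with the email "alice@example.com"
//     Then the registration is "ACCEPTED"
package example.bdd;

import static org.junit.jupiter.api.Assertions.assertEquals;

import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;

public class RegistrationSteps {

    // Assumed application classes, named for illustration only.
    private final RegistrationService service = new RegistrationService();
    private RegistrationResult result;

    @Given("no account exists for {string}")
    public void noAccountExistsFor(String email) {
        service.deleteAccount(email);
    }

    @When("a user registers with the email {string}")
    public void aUserRegistersWith(String email) {
        result = service.register(email);
    }

    @Then("the registration is {string}")
    public void theRegistrationIs(String expectedStatus) {
        assertEquals(expectedStatus, result.status());
    }
}
```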
It’s tricky to find a regular working pattern with good DevEx. I didn’t want to put Q into ‘trust’ mode, choosing instead to review each change as it was prepared – both so I could interrupt Q when it went off the rails and to reduce the amount of generated code I needed to review. This meant a lot of time spent waiting while Q was ‘thinking’. One colleague talked about their passion for writing code, and how reviewing generated code is not the same. In their current form, these tools don’t have the responsiveness of working directly with code.
Producing code this way also had a strange effect around ownership. Hand-writing code (or whatever we call the ‘old’ ways of programming) meant taking care with each method; it was a good way to get inside the code, building ‘mechanical sympathy’. Here, I started with a simple outline of my application in 275 words. Q produced over 10,000 words of feature files (including some useful functionality I had not asked for, such as sanitising inputs). That is a lot of reading! Assuming a reading rate of 400 words per minute, it is 25 minutes’ work – setting aside the deeper understanding needed here, and any editing required.
Q also proved better at some things than others. When asked to generate some test data, it created a program to populate the database on start-up; I had to suggest using Liquibase instead. Getting the best out of the tool requires the operator to have a clear idea of what they would expect the solution to look like.
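For comparison, seeding test data with Liquibase is a small, declarative changelog rather than start-up code. The snippet below is a minimal sketch using Liquibase’s formatted-SQL changelog syntax; the table, columns and values are hypothetical.

```sql
--liquibase formatted sql

--changeset dev:seed-test-users context:test
-- Hypothetical seed data: table and column names are illustrative only.
INSERT INTO users (id, username, email) VALUES (1, 'alice', 'alice@example.com');
INSERT INTO users (id, username, email) VALUES (2, 'bob', 'bob@example.com');
--rollback DELETE FROM users WHERE id IN (1, 2);
```

Because the changeset is tagged with a ‘test’ context, Liquibase only applies it when that context is enabled, so the seed data stays out of production migrations.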
I’m still convinced that these tools will become part of the regular toolkit, but I don’t think they will offer the sort of incredible gains some have suggested – although they will be essential for prototyping. Cal Newport has produced a great summary of the competing claims about productivity. My prediction is that, in the long run, we’ll see significant gains, but we won’t be relying solely on the agents.