projectmanagement

Effective Retrospectives

tl;dr – retrospectives should be effective or abandoned, and they can only be effective if time and effort are spent on fixing the issues raised.

I first worked with agile in 2003. Back then, it was very much developer-led and it sometimes took significant effort to persuade clients or management to adopt these practices. Agile had to be justified by results, and those were compelling enough that larger organisations adopted frameworks such as Scrum.

Fitting flexible methodologies into corporate structures that require predictability is challenging. Sean Goedecke’s essay Seeing like a software company discusses how large companies sacrifice fast delivery for ‘legibility’. He makes a case for this trade-off, but the need for predictability often limits the benefits of agile, leading to top-down structures and, before long, something like SAFe, in which the team’s choices are restricted by the perceived need for centralised co-ordination.

It’s interesting to look back at things that were considered important in the early years of agile that have fallen by the wayside. A lot of teams have unclear or multiple sprint goals, and are no longer able to deliver a single unit of value in a sprint. A sprint is never allowed to fail and restart, because teams are often expected to follow a company-wide agile calendar. Commitments to build features are made months in advance, defining future sprints and removing any real agility. Organisations even try standardising story points between teams to compare their output.

Another lost concept is that of the ‘sprint’ being a sustained, time-boxed period of effort, with breaks in between. You had a two- or three-week sprint, followed by a few days to reset, improve tooling, and prepare for the next one. Planning for the next sprint was not crammed into the closing days of the previous one. This allowed time for one of the critical aspects of successful agile: continuous improvement.

Too often, particularly under SAFe, teams have several development sprints planned out, leaving no room to invest in improvements. This leads to pointless retrospectives. The same complaints surface time after time, but there is no opportunity to address them. Developers are not allowed to improve their working methods – even though good developer experience and happy teams correlate strongly with high performance.

As a result, many agile teams fall into learned helplessness, feeling unable to change things. Teams accept limitations and stop challenging them. Rather than empowering teams, retrospectives simply confront the team with their own powerlessness, damaging morale. You can dress your retro up with as many light-hearted themes as you like, but if you’re not fixing anything, what’s the point? At its worst, a lack of continuous improvement means that process debt and technical debt slow the team down more as time goes on. Before you know it, you’re looking at year-long release cycles and all that.

Teams need to be given the space to improve their working practices and environments. It should be easy to make the case for this against feature work. Time spent improving the team’s velocity will always be a good investment.

In short, when something turns up repeatedly in retros it needs to be dealt with. Retro actions should be prioritised in the same list of work as everything else (see this essay by James Stanier, One List to Rule Them All). And, if your retros aren’t getting anywhere, then there are more useful ways to spend the time.

projectmanagement

How can we get better at enterprise development?

Post author By admin
Post date January 15, 2026

This post is a summary of where I’m at with thinking about the title question. I’m running an internal discussion on this topic later in the month.

In many software projects, coding skills are not essential to success. Most systems I’ve worked on involve simple, well-known functionality – sharing messages, automating workflows, making payments. These are all things people have been working on for decades but somehow projects still fail – despite advances in technology and methodologies. It’s not code that makes the difference.

For a while, I thought the issue was project resources – people trying to do things with too few people and too little time. But now I’ve come to think that the problems of maintaining a large code base for an organisation over years require skills very different to simply writing code. My thoughts on this have been focussed by reading Sean Goedecke’s essays, in particular Pure and impure software engineering.

…pure engineering – is interested in solving a technical problem as perfectly as possible… The second kind – impure engineering – is interested in solving a real-world problem as efficiently as possible.

impure engineering is a brawl: you’re fighting decades of previous technical decisions, competing political views about how the product ought to work, consensus among your colleagues or the company at large, and in general much more incidental complexity.

This has led to me thinking about three questions:

Why is enterprise software development so slow?
What skills does it demand?
How can we get better at it?

Why is enterprise software development so slow?

In a discussion about AI project success, Goedecke quoted the statistic that 84% of non-AI projects fail. Indeed, “The infamous 2015 CHAOS report found a 61% failure rate, going up to 98% for ‘large, complex projects’.” Most IT projects fail.

One explanation Goedecke offers is the wicked feature, the sort of addition to software that affects every other feature – for example, user types or different deployment options. As software grows, such changes accumulate, making it harder to add new features. Companies can try to resist such changes but eventually give in to them to unlock new markets.

Incidentally, this is why the quality of big tech engineering is sometimes worse than you would expect. The cognitive load of operating in this environment is pretty intense. Delivering anything at all is really, really difficult. So really obvious balls sometimes get dropped because the engineer in question was trying to thread a path between ten other features.

What skills does it demand?

One skill Goedecke has pointed out is the ability to explain large and subtle pieces of software, providing what he refers to as ‘technical clarity’: “In an organization, technical clarity is when non-technical decision makers have a good-enough practical understanding of what changes they can make to their software systems.”

Goedecke also sets out shipping as a skill distinct from writing code: “The default state of a project is to not ship: to be delayed indefinitely, cancelled, or to go out half-baked and burst into flames. Projects do not ship automatically once all the code has been written or all the Jira tickets closed. They ship because someone takes up the difficult and delicate job of shipping them.”

Some others I would add:

Being able to track the work that needs doing
Effective asynchronous collaboration, particularly with external teams
Running effective and efficient meetings

How can we get better at it?

I’ve only started thinking about this third question. My main concern is that most of the skills listed above are non-technical. I still think technical skills are important, but they are not sufficient. I’m looking forward to exploring this further.

GenAI

Links for Leeds AWS talk

I gave a talk at October’s Leeds AWS User Group about Amazon Q, and my employer’s ‘Summer of Q’ experiment. This page lists the references for the talk.

NaNoGenMo is the annual project where people try to write software to produce a novel each November.
Simon Willison: AI-enhanced development makes me more ambitious with my projects – Willison’s blog is very much a must-read
Strategies for an Accelerating Future by Ethan Mollick: “No amount of reading and research can substitute for spending 10 hours or so with a frontier model, learning what it can do”
The reasons for being polite to models are discussed by Sean Goedecke in Do not yell at the language model
Elevator muzak for agentic coding
Sam Kriss: The Internet is Made of Demons – if you want to think more about AI programming and its relation to demonology
Sean Goedecke: The three great virtues of an AI-assisted programmer – talks about the slot machine effect. Another great blog, with some fascinating discussion around enterprise development.
Where’s the Shovelware? Why AI Coding Claims Don’t Add Up – a fascinating argument that AI coding may not be producing more finished software
What the birth of the spreadsheet teaches us about generative AI – an interesting counterpoint to gloom around tech jobs

weeknotes

Weeknotes: 2025-35/34

August is the last month of my employer’s financial year, which means preparing for annual appraisals just as everyone is using up their leave – and this is on top of client work. While I’m enjoying being busy, it’s also been tiring.
We finished off our internal Summer of Q, where about 40 of us explored agentic AIs for coding. It’s one of several things I need to write up before I forget about them. I love working with agentic AI, but doing it effectively will take some more work.
I continue to think about Sean Goedecke’s essays on value, alongside his one on Pure and Impure Software Engineering. I’ve been mentioning Goedecke in most of my weeknotes recently. I have been rereading some of his pieces and want to put them into practice when I start a new role in September.
My new role involves React, so it’s time to get to grips with that properly.

Weeknotes: 2025-33 to 30

It’s been a few weeks since my last weeknotes. I had a restful holiday in Wales and a lot of work things to catch up on.
I’ve been thinking a lot about Sean Goedecke’s essays on the role of a programmer in the current economy, and the need to provide demonstrable value – particularly for seniors and managers. This is shaping my approach to my work.
I wrote a post about how The Joel Test turned 25 this month, looking at its historical importance and pondering what a modern Joel test might include.
In another flight of nostalgia I looked up Brad Fitzpatrick and Lisa Phillips’s notes on scaling LiveJournal, which were a useful reference for scaling Flirtomatic many years ago.
I continued my experiments with Amazon Q and learned a lot. I should have those written up by the end of the month.

Links

The 2025 Stack Overflow Developer Survey has been published, although it may be a few weeks before I get around to reading this properly.
Another great piece from Charity Majors was In Praise of ‘Normal’ Engineers, where she talks about the importance of teams rather than individual developers, and how systems must be easy to use: “The best engineering orgs are the ones where normal engineers can do great work”
Sean Goedecke’s What’s Going to Happen to Junior Engineers was a good exploration of the potential long-term meaning of the fall in junior tech positions.
The conclusions of Anthropic’s Project Vend feel quite speculative, but they raise some interesting questions: “we think this experiment suggests that AI middle-managers are plausibly on the horizon… the AI won’t have to be perfect to be adopted; it will just have to be competitive with human performance at a lower cost in some cases… we don’t know if AI middle managers would actually replace many existing jobs or instead spawn a new category of businesses”.
Welcoming the Next Generation of Programmers argues that vibe coders are programmers and it’s important to onboard and welcome them to the existing developer communities.
Christina Wodtke posted on LinkedIn about how a lot of Gen-X programmers have the same passion for GenAI as for the early web – in a way that didn’t happen with blockchain or VR (via Simon Willison)

Books

Tidy First? by Kent Beck

I read Kent Beck’s Tidy First? while away – some thought-provoking ideas, but they rely on being able to produce and merge changes easily. Again, I need to find time to write up my notes.

programming programming-life

The Joel Test is 25

On Saturday, the Joel Test turned 25 years old. This is a 12-question checklist to assess the quality of a software team. Many of its points are now universal, but it got me thinking about what might be on a modern Joel test.

The test was written by Joel Spolsky, then of Fog Creek Software, but better known now for Stack Overflow. According to Spolsky, a score of 10 or lower out of 12 suggested serious problems. The Joel Test provided me with a useful set of questions to ask potential employers and helped me avoid some dodgy companies.
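To make the scoring concrete, here is a minimal Python sketch. The short question labels are paraphrases of the twelve items, and the function names and the example team are hypothetical. The only rule taken from the text above is that ten or lower out of twelve suggests serious problems.

```python
# Short labels for the twelve questions, paraphrased for illustration.
JOEL_QUESTIONS = [
    "source control",
    "one-step build",
    "daily builds",
    "bug database",
    "fix bugs before new code",
    "up-to-date schedule",
    "written spec",
    "quiet working conditions",
    "best tools money can buy",
    "dedicated testers",
    "candidates write code in interviews",
    "hallway usability testing",
]


def joel_score(answers):
    """Count the 'yes' answers; questions missing from `answers` count as 'no'."""
    return sum(1 for question in JOEL_QUESTIONS if answers.get(question, False))


def has_serious_problems(score):
    """Rule of thumb from the post: 10 or lower out of 12 suggests serious problems."""
    return score <= 10


# Hypothetical team that does everything except daily builds and hallway testing.
answers = {question: True for question in JOEL_QUESTIONS}
answers["daily builds"] = False
answers["hallway usability testing"] = False

score = joel_score(answers)
print(score, has_serious_problems(score))
```

Even a team missing only two items lands on the wrong side of the threshold, which shows how strict the test was meant to be.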

Some of the items on the test show how far programming has come in the last quarter century. Back then, not every team used source control, regular builds, or bug databases. A few other things are still not as common as they should be – not every company asks developers to write code during an interview.

What would I put on a modern Joel test? I would add active monitoring of production; early and regular review of software by product owners; and documented onboarding processes for new hires.

Even though we’re in the early days of learning about GenAI, I think it’s already essential for teams and companies to provide training and hands-on experience with the new tools. Whether these produce a 10% or a 10x increase in developer output, they will become essential.

What other things are essential for a successful software development team?

weeknotes

Weeknotes: 2025-30/29

I drafted some notes last week, but didn’t press publish, so these notes are two weeks’ worth.
A client colleague prompted me to make more use of Copilot in Teams. It’s hugely useful, but there’s a gap between reading and writing in all these tools – it’s too easy to copy and paste the application summary rather than edit it (particularly if you have another meeting to get to, since the context disappears when you move away). It’s going to be interesting to see how helpful this proves in the long run.
I wonder if remote working is increasing the number of meetings as it is so easy to book them – and cameras off means that there are people multi-tasking, rather than looking bored in the room. There’s no feedback to prompt people to push back against the calls.
I’ve been playing with Amazon Q. The UX is an atrocity, but the tool itself is impressive and compelling. There are, however, a lot of subtleties about how this would work as a development workflow, and how it will scale up to use in large organisations. I’m using the Nilenso piece on AI-coding as a guideline. I made a post about my initial response to Q and another one about my second week.

Links

I’ve been catching up on Sean Goedecke’s excellent writing. In Do Not Yell at the Language Model he talks about how berating a language model for mistakes might create a negative context, producing worse results.
Peter Hilton describes an amazing lightning talk, where Chris Oldwood told programming jokes for 5 minutes. Hilton goes on to imagine a book of 97 Jokes Every Programmer Should Know, suggesting that such jokes are a good way to learn some aspects of programming. “There are 10 kinds of programmers: those who understand binary, those who don’t, and those who weren’t expecting a base 3 joke.”
Charity Majors wrote an interesting piece, On How Long it Takes to Know if a Job is Right for You or Not, in which she talks about the need for alignment between a manager’s values and the company they work for.
The striking thing about Bo Frese’s The 13 Ways We Kill High-Performing Agile Teams was how often these occur, despite going against well-known best practice. Also interesting to see that the Scrum Guide had removed ‘the three questions’ as a stand-up practice.
Good retros are hard, and Who Needs Action Items by Daniel Cooper is a good piece on this. “Eventually, people stop bringing anything that actually matters and it’ll all be fluff. No one wants to accidentally become the owner of ‘improve emotional tone in retros (Q3 OKR)’.”

Books

I completed a re-read of Kent Beck’s Extreme Programming Explained, which I last read back around 2001. I have a lot of notes to reflect on, but the biggest surprise was how little empirical evidence Beck had for his theories. Which is not to say I think Beck is wrong per se, rather that his insights are based on a particular set of experiences. There were also some provocative thoughts about documentation which go against what I think, and are worth interrogating.

GenAI

Summer of Q: Week 2

My overall impression, after more time working with Amazon Q, is that it will take some work for a coding agent to make me faster and more effective. Q definitely removes some of the boring bits of coding (it’s great at Maven dependencies) but it’s more wayward on complicated tasks. There’s a lot to learn here.

At the end of last weekend, I’d settled on a method: writing a specification for an area of my application, having Q produce a BDD feature file outlining the behaviour, and then getting Q to fill in the testing code and, after that, the implementation. This soon ran into problems as I’d still set Q too wide a brief, and the code produced quickly sprawled. There were many minor issues, such as Q producing unfocussed Cucumber step files. Along with the pages of code, some chunks of functionality were left out to ‘fill in later’.

It’s tricky to find a regular working pattern with good DevEx. I didn’t want to put Q into ‘trust’ mode, choosing rather to review each change as it was prepared. I did this so I could interrupt Q when it went off the rails, and also to reduce the amount of generated code I needed to review. This meant a lot of time waiting while Q was ‘thinking’. One colleague talked about their passion for writing code and how reviewing generated things is not the same. In their current form, these tools don’t have the responsiveness of working directly with code.

The production of the code also produced a strange effect around ownership. Hand-writing code (or whatever we call the ‘old’ ways of programming) meant taking care with each method. It was a good way to get inside the code, producing ‘mechanical sympathy’. Here, I started with a simple outline of my application in 275 words. Q produced over 10,000 words of feature files (including some useful functionality that was not asked for, such as sanitising inputs). This is a lot of reading! Assuming a reading rate of 400 words per minute, that is 25 minutes’ work – setting aside the deeper understanding needed here, and any editing required.
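The arithmetic behind that estimate is simple enough to sketch in a few lines of Python. The function name is hypothetical, and the 10,000 figure is a lower bound, since Q produced "over" that many words.

```python
def reading_minutes(word_count, words_per_minute=400):
    """Estimate plain reading time, ignoring any re-reading or editing."""
    return word_count / words_per_minute


spec_words = 275         # my outline of the application
feature_words = 10_000   # lower bound on the feature files Q generated

print(reading_minutes(feature_words))        # 25.0
print(round(feature_words / spec_words, 1))  # 36.4
```

The second figure is the one that stays with me: every word I wrote came back as roughly 36 words to review.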

Q also proved to be better at some things than others. When asked to generate some test data, Q created a program to populate the DB on start-up. I had to suggest using Liquibase. Being able to get the best out of this tool requires the operator to have a clear idea of what they would expect.

I’m still convinced that these tools will be part of a regular toolkit, but I don’t think they will offer the sort of incredible gains some have suggested – although they will be essential for prototyping. Cal Newport produced a great summary of the competing claims about productivity. My prediction is that, in the long run, we’ll see significant gains, but we won’t be relying solely on the agents.

GenAI

First Impressions of AmazonQ

Post author By admin
Post date July 20, 2025

My employer has organised a ‘Summer of Q’, where a number of us have signed up to play with Amazon Q. This weekend was the first time I could work with Q in depth. The main result – I ‘built’ a quiz application in 30 minutes (while also doing some chores) and it looked and worked better than what I’d have produced solo. But there are a lot of subtleties and caveats to add to this.

A major argument against GenAI putting developers out of work is how poor the tooling and signup flows for Q are. The signup is terrible and confuses a lot of people. Q failed to help, and kept hallucinating links to help pages that didn’t exist. The IntelliJ plugin is awful and locks the IDE, so I’ve had to use the command-line version instead.
Q is great at producing code. Producing the quiz example was a trivial task, so I’m now working on a much more complicated example. Straight away, I can see Q making me more effective. Personal tools I’ve wanted to make, that I decided against investing time in, now look easy.
The quiz app that Q produced looked and played better than what I could have produced by myself. I’m very impressed by this.
The model’s reasoning is clever and spooky – it makes mistakes sometimes, but then works to fix those. Interesting behaviour – although I expect there to be fewer mistakes in the generated code over time.
One of the challenges of coding agents is getting used to the new workflow. There’s a fair bit of waiting involved while Q thinks about each file that needs creating. It’s very different to using a GenAI coding assistant, and I need to figure out the best new workflow.
An ongoing problem with GenAI is that it involves a lot more reading than writing. I figure almost no-one is reading Copilot meeting summaries, and I worry that not everyone will closely read the impressive amount of code that Q generates.
At present, I’m reviewing each action Q takes, rather than trusting it for the session. It’s going to be interesting to see how other people are working. There’s a lot of boring waiting this way, but a lot less reading to do in one go.
Being able to produce decent (albeit not perfect) code so quickly will change the nature of programming. The coding part is going to get much easier. The development part – making sure the right thing is produced – will become more important, and maybe more difficult. I’m currently using feature tests as a way of validating what is being made.
Something I’ve noticed with GenAI in a number of areas is the importance of taste. The tools produce things (image/text/code) incredibly fast, and require an operator with strong opinions about this output.
Q responded to my initial, naive prompts by producing ornate additional features. For example, I asked it to generate some BDD feature files and it added some complicated accessibility tests. I’m looking forward to watching it try to fill those out! I also spotted some subtle divergences from the spec that I need to edit. The quiz code I initially generated also included a lot of useful but unasked-for features. They were improvements, for sure, but it was definitely not an MVP. It will be interesting to see how easy it is to work with Q on my more complicated application.

weeknotes

Weeknotes: 2025-28

I’ve been working this week on MongoDB replica sets and I’m very impressed with their resilience, particularly the use of an intelligent client in the driver to handle failover etc.
As part of an initiative at work, I started playing with Amazon Q, initially asking it to generate some basic arcade games. I was impressed by the simple examples it produced, while being aware of the challenge in getting precise results from a coding agent. Something I need to spend more time on.

Links

An excellent post from Sean Goedecke, AI Interpretability is further along than I thought, talks about internals of language models – it was a useful reminder of why telling a chatbot that it’s an expert works.
AI-assisted coding for teams that can’t get away with vibes (via Simon Willison) was a useful primer on large-scale coding with GenAI. A useful rule here was ‘what helps the human helps the AI’, including linting, CI/CD, documentation and clearly defined features. Some good examples around prompting, and how AIs are used to build the prompts to code from. The most interesting bit, and something I’d like to go back to, is the claim that the DRY principle is less useful when working with LLMs. This is a living document being maintained by nilenso, which I will have to keep an eye on.
Could HTTP 402 be the Future of the Web was a good speculative article about the need for micropayments and how charging AI crawlers could lead to that.
Some excellent words of wisdom from Everything is Prioritization: “If you’re remote and still feel frazzled, you’re not doing remote wrong. You’re just prioritizing availability over impact.” The article talks about the need to avoid tempting distractions: “The best teams aren’t full of geniuses. They’re full of people who keep their focus and say ‘no’ without having a breakdown”.
I’ve long disliked the cargo cult metaphor, and this is deconstructed in The origin of the cargo cult metaphor, which points out a lot of the errors and miscomprehension in the popular understanding of actual cargo cults. “The cargo cult metaphor is best avoided”.
Simon Willison’s Identify, solve, verify is a short piece on the role of the programmer in the era of GenAI. “The more time I spend using LLMs for code, the less I worry about my career”.
The Elegance Question: What Makes Some Systems Just Work? set out some simple principles for building ‘elegant’ systems. This was thought-provoking, particularly around the question of why so many systems go against these principles.

Books

No time for reading this week – and I’ve been distracted by a non-tech book.
