Categories
books programming

A review of Dave Farley’s Modern Software Engineering

My colleague Luke Punnett recently recommended Dave Farley’s book ‘Modern Software Engineering’. While it’s not quite a classic, it’s a superb summary of the state of the art in software development. Anyone writing enterprise software should read this, and ideally follow the book’s advice.

Farley attempts to build a foundation for software engineering as a discipline using the scientific method. Writing code is formalised as a series of experiments, set within the process of ‘characterise, hypothesise, predict and experiment’: “Software engineering is the application of an empirical, scientific approach to finding efficient, economic solutions to practical problems in software.

What was most valuable about this book for me was getting a glimpse of how a very good and experienced software engineer approaches his work (Farley mentions a couple of times that he was involved in the LMAX exchange project).

There are several topics on which the book is particularly strong. Farley approaches agile from a fresh angle, renewing my faith in it – a faith that had been ground down by SaFE and ritualised agile processes. Farley is also excellent on test-driven development, arguing that TDD is not about producing code coverage, but a method of working that produces “a pressure … to write code that is more testable“. Farley then argues that testable code has the same attributes as code that is easy to maintain. There is also some excellent discussion of the pros and cons of microservices, arguing that their main strength is in allowing smaller, more focussed teams.

Robert C Martin’s Clean Code feels like it has reached the end of its life. People are less comfortable with what some of the guidelines that it proposes. Farley’s book is, I think, an excellent replacement. It is short and well-argued, and sets out a clear case for its recommendations. Some of the things Farley proposes – TDD, microservices, test automation etc – are still controversial in some companies. Hopefully this book will help towards their wider adoption.

Categories
programming

What Do We Mean By Technical Debt?

The cat is a metaphor for technical debt

Recently, I’ve been thinking a lot about technical debt. It’s a great metaphor for how time is lost in programming projects but, like all metaphors, you have to be aware of where it stops working.

For example, when people compare national debts to household debts, this ignores how national banks control the supply of money, and that countries run into different problems to households when they’re over-extended.

Technical debt is, basically, time saved now which costs more later – taking tactical short cuts. It’s usefully compared to any other corporate debt, providing leverage for growth that would be impossible otherwise. The problem is that people don’t seem to track how much technical debt they’ve accrued and need to ‘pay back’. I’ve worked at a number of places where dealing with inefficient processes that were set up in a hurry took up a substantial amount of day-to-day work.

Having seem companies struggling with debt, I wonder if technical debt should be treated more like personal or household debt. Advice to people who are struggling with debt generally focusses on three points:

  1. Reduce outgoings to focus on paying off the debt
  2. Pay off the highest-interest debts first
  3. Do not save or invest anything until any costly debts are dealt with.

The software equivalent of expensive credit card debt is any inefficiencies or manual steps in repeated processes. If your regular work is taking longer than it should, there’s less time available for new things. So, like a household in debt, don’t try to invest in new things before sorting these existing inefficiencies. Fix the automated tests so no-one needs to reduce the manual checks on a release. Automate the entire release process. Improve testing to prevent disruptive production bugs.

Technical debt is a good thing when it’s used to provide leverage for growth. But, if you’re not tracking this debt accurately then maybe you should treat technical debt more like household debt, and prioritise reducing it to a manageable level. There’s no point investing in new features when you’re paying high interest on broken processes.

Categories
programming

The Seven States of a Boolean Object

Everyone understands Boolean variables – they’re either true or they’re false. Right? Except, in languages, where the Boolean variable might be null, which makes comparing two Boolean values a little little trickier. And the DailyWTF once gave an example of a Boolean ENUM that could be true, false or file-not-found.

Long ago, when I worked for Tom Hume at Future Platforms, one of the developers suggested something more ambitious. After all, a three-state Boolean leaves too much room for ambiguity. By the time he was done, Thom Hopper had suggested 6 or 7 states for a Boolean variable. Tom Hume and I recently tried to remember as many as we could, but only got 5 or 6.

We tweeted Thom Hopper to see if he remembered, but the states of this variable are lost in time, like tears in rain. But, for the sake of posterity, I want to record some of the values that a proper Boolean type might hold:

• true
• false (obviously)
• null
• uninitialised (similar null – but this means a true/false has never been set)
• unknown (we’re currently not certain of the value)
• indeterminate (there’s no way of knowing this value)

I’ve love to be able to declare that this is stupid, but I don’t know. I mean, if you accept null as a Boolean state, where do you stop? Maybe Java needs a ‘true’ boolean that can take all of these states?

Categories
infrastructure programming

Living Without Pre-Production Environments

Would we be better off without test environments?

I try not to recommend too many talks, but I loved this one from Nicky Wrightson of Skyscanner about living without pre-production environments. It provides an interesting solution to a lot of problems with performance environments I’ve been thinking through. Trying to accurately reproduce live environments seems like a wasteful, quixotic endeavour and I kept wondering about was just not using them. This video is an encouragement to that thinking.

Talks are often very different to the reality in companies, but there are some great questions for anyone about what test environments are for. “Historically, we had lower environments that were like production so that we could test releases that we couldn’t confidently reason about the effect of the changes.”

The only performance test that actually matters is how the live environment. ”At the end of the day, just make it a lot easier to track issues in production with really well instrumented code rather than try to replicate those issues in lower environments”.

There will always be inherent differences between environments, which limits the ability to draw conclusions from them. Yes, H2 is great for integration testing, but it has subtle differences with Oracle and MySQL which need managing. In performance tests, replicating load is hard – data and traffic shapes are as important as infrastructure and code. And then there are all the environment specific configs to manage, and the bugs from that.

Obviously, doing away with testing environments is the sort of decision that gets people fired. But! Just imagine how much things would improve if it worked!

Categories
infrastructure programming

What’s So Special About Cloud Software?

One of the more interesting interview question I’ve been asked in the past few years was about the differences between cloud development and monoliths. I don’t think I gave the expected answer when I said that they’re not all that different

Yes, cloud environments are complicated but good cloud development relies on the sort of fundamentals that sometimes go wrong in monolith development.

Co-ordinating subscription renewals, credit card billing and confirmation emails is an example of working with distributed systems on a monolith. Do this in the wrong way and you billed a customer multiple times, or spammed their email. Trade-offs, failures and retries needed to be carefully considered.

A lot of applications take external systems for granted, rather than considering that they do sometimes go wrong. One place I worked coupled their login process to Salesforce. When Salesforce had an outage, their monolith shared it. When local filesystems have issues, applications will fail dramatically.

There are obvious differences between cloud and monolith development (not least the operational complexity), but both require an attention to the principles of software development. You don’t need a large system to be hit by the fallacies of distributed computing.

Categories
programming

My Most Useful Technical Interview Question

One interview question that I’ve been using for about ten years seems to filter out more candidates than any other. It’s not a trick, and I still don’t understand how come it catches so many people. Sometimes I worry that there is something wrong with what I’m asking.

The question is this – using a text editor and not an IDE, write a simple method to take an integer as an argument and return the factorial. I make sure to explain what a factorial is and wait.

The people I’m interviewing are rarely novices. I’ve asked this from people with years of banking experience. Some of them had exciting CVs, with successful projects and all the skills I was looking for; they could talk fluently about complex technology. Yet they did not seem that familiar with code. I’ve senior developers struggle with writing a simple loop.

I ask a lot of other things in interviews. I try to be open-minded, searching for strengths rather than weaknesses. I don’t bother with tricky algorithm questions that people rehearse for interviews and forget once they get an offer. With everything else I ask, if a candidate doesn’t know the initial answer, I follow-up to find what they do know.

But I expect anyone going for a technical position to be comfortable writing a simple piece of code, to be familiar with what code looks like. Can you write a loop and check it? I try to account for the fact that the candidate might feel nervous, and might find the lack of an IDE challenging. Sometimes, I tell them not to worry too much about syntax, to use pseudocode if they like.

I’ve been interviewing developers for years, and that question is essential. The piece of code I ask for is trivial. I’ve heard of interviewers getting the same results with FizzBuzz. The example that I use is listed across the web as an interview cliche, something a prepared candidate would expect. A good candidate disposes of this quickly and moves to the next question; but some people struggle. The question shouldn’t work, but it does.

Categories
programming

Some things learned in 20 Years of Software Development

I’ve now been a software developer for over 20 years. I started out thinking this would be a temporary diversion, but it’s grown to be something I love. I’ve been lucky enough have a wide experience of the industry, from mobile to microservices, and from three-person companies to multi-nationals. So, I decided to compress some of what I’ve learned into some short points:

  1. If a bug isn’t getting fixed this month, then you might as well not track it as you’ll never touch it. (or, to put it more positively – use a zero-bug strategy!)
  2. TDD is never going to take off. Everyone has automated tests, which is great, but I’ve never worked anywhere that used the proper TDD cycle in practise.
  3. Good project management is more important than methodology. Projects are just as messed up under agile as they were under waterfall, but we now have more meetings.
  4. The DRY principle is overrated. Too many people go for this ease of change rather than ease of reading. This is especially problematic in test code, which is mostly write-only.
  5. Focus on the data – I’ve always considered any computer system as a data-store with some code attached, and this works pretty well. If you get the transactions right, everything else will follow.
  6. The best code is simple. If it can’t be followed by junior colleagues, it’s too complicated.
  7. Projects rarely fail for technical reasons. Unless you’re doing something cutting-edge, the failure is due to something within your control. Software development is one of the least important parts of being a developer.
  8. Performance testing is hard.
  9. New technologies get less exciting as you compare them to things you already know. Like, gRPC is just fancy SOAP.
Categories
infrastructure programming

Why Use Couchbase?

Back in the Noughties, when I first started web programming, data storage choices were straightforward. Your options were limited to RDBMS systems (Oracle if there was a budget, MySQL otherwise); if you to store binary data, then you could use file systems; and, in some cases, where the data was read-only maybe, you’d use a CSV file. Life was simple.

Back then, when I first heard the term ‘NoSQL’, I dismissed it. It’s never good to define something as what it isn’t, and the lack of a structure query language didn’t sound that compelling. But, over the years, NoSQL datastores have become essential, with some of them not being promoted as databases as such. The first one I used extensively was Lucene, which I didn’t really think of as a datastore. (Arguably, it isn’t strictly a datastore, but that’s another discussion).

Now there is a wide range of choices, each with their own specific use cases. I was recently tasked with looking into Couchbase, and the first question to answer is, why use Couchbase at all?

For an overview of Couchbase we can turn to a Linked in blog post on Couchbase’s evolution:

Couchbase is a highly scalable, distributed data store that plays a critical role in LinkedIn’s caching systems. Couchbase was first adopted at LinkedIn in 2012, and it now handles over 10 million queries per second with over 300 clusters in our production, staging, and corporate environments. Couchbase’s replication mechanisms and high performance have enabled us to use Couchbase for a number of mission-critical use cases at LinkedIn.

Wikipedia provides a good summary in their Comparison of Structured Storage software. We can see that Couchbase is a document storage solution, similar to MongoDB, but adding high availability functionality.

Couchbase started as a memcached replacement, adding in features like persistence, replicas, and cluster resizability. Its use as a backend to LinkedIn has demonstrated its potential in large deployments, with LinkedIn having, at one point, “over 2,000 hosts running Couchbase in production with over 300 unique clusters”. Or there were the 100 hosts used for Draw Something – 2 billion drawings were stored, at a rate of up to 3000 per second.

One of the interesting problems with learning a lot of modern technologies is that their potential only really comes out at scale. Speaking as a developer, I would be hard pushed to find a reason to use Couchbase above Mongo unless the use case involved master/client on mobile, or a website I expected to scale massively. But it is easy to get started with a basic Couchbase site thanks to JHipster and docker.

There are clear instructions online for getting going with JHipster, and having a working Couchbase application could be managed within about an hour, even with no JHipster experience. The basic steps are:

  • Install JHipster
  • Start JHipster and run through the basic application creation options. It’s easiest to work with a monolithic application if you’re new to JHipster. Make sure to pick Couchbase as the database, but otherwise the defaults will work well enough.
  • In the newly created application folders, go to the src/main/docker folder, and type the command ‘docker-compose -f couchbase.yml up’
  • In the main JHipster folder, use the command ‘./gradlew’ to build and run the Spring Boot application.
  • The application can then be viewed at http://localhost:8080. I had to use Chrome to get this working successfully.

The basic JHipster application, with no customisation includes a basic usermanagement system. The couchbase docker instance can be accessed at http://127.0.0.1:8091/, username ‘Adminstrator’, password ‘password’.

Clicking through to the ‘Buckets’ option on the left-handside menu shows the different data partitions available. Clicking on the ‘Documents’ link for the partition we have created shows the basic user data that has been added.

This is not much of an application, but by following the JHipster instructions for creating new entities, we get CRUD options for new pieces of data. While this produces a relatively simple application, JHipster has produced an entire stack, including Spring Data Couchbase. The work so far could be customised to provide a full application, or used as a working example of how to integrate Couchbase into a Spring Boot application. (One advantage of JHipster is that the application produced can be subsequently developed without reference or use of JHipster.)

Categories
programming

lifelong learning in software development

I always wanted to have a Brighton Java talk on lifelong learning. The techniques, tools and fashions in software development are constantly changing – how do you keep on top of this churn while doing an often-stressful job and maintaining the family and social life you have a right to? I never found anyone to give this talk, and now I don’t need to, as a fantastic talk by Trisha Gee called Becoming a Fully Buzzword Compliant Developer has been released on InfoQ.

In the talk, Gee sets out a simple step-by-step approach to learning and career development, without ignoring how expectations around this impact on often-excluded groups in IT. (Remember: asking for interview candidate’s github profile makes a lot of assumptions about your potential employees and their free time).

Gee also looks at how quickly new ideas are absorbed into the mainstream of development; how to discover and assess new buzzwords; the importance of real-world meetups (like Brighton Java!); and the importance of “Enough knowledge to blag your way through a conversation in the pub.”

Sometimes, when discussing training and development outside work, developers point out that there is no requirement for, say, HR staff or office managers to do their job as part of a hobby. But look at it another way: modern careers are a half-century or so. You want to find a way to engage with software development and lifelong learning that is so exciting that you want to do it. Even maintaining the simplest hobby site can help you to keep up with the latest exciting new things.

There is also a flip-side to Gee’s talk which is less often discussed – and that’s how little learning Java enterprise developers can get away with by using the slow pace of change to avoid any self-development. Mobile and JS devs tend to engage much more because their careers depend on it (an Android dev who does not use the latest APIs will soon be unemployable). New ideas and tools can take a long time to filter through to the enterprise. The downside of this is that it can take even longer to get them right – TDD, agile and microservices suffer in enterprises due to the lack of an engaged, interested and playful attitude among devs and managers.

Categories
programming

The TDD Lie

The requirements for pretty-much every developer job I see these days includes Test Driven Development. Which is exciting – everyone recognises the importance of unit tests.

In my previous post I talked about how often companies claim to do agile, but don’t succeed in practise. And I see the same thing happening with test-driven development.

It’s easy to tell which companies want to use TDD but don’t in practise: you ask what percentage coverage they have for their code.

In its purest sense, TDD demands that tests are written before any new code is written. If you’re coding in this way (as suggested in  Kent Beck’s Test Driven Development book) your coverage should be close to 100%. Of course, there are other ways one might approach TDD, such as writing business tests before the code. This is definitely better than adding a few tests at the end of a project.

But, if you’re tracking the coverage and have chosen a figure less than 100%, then there is a choice being made about which code is not covered. I’ve never seen a standard coverage threshold working out well. The obvious indication  of this is how often the coverage of projects is just over the required threshold.

I’m a great believer in high coverage (while acknowledging the limitations of that). And if you don’t grow your code with the  tests then you tend to end up with code where it’s difficult to add code at the end.

If you’re listing  TDD in job requirements and not practising it, then it’s worth asking why not. A lot of times it’s a fundamental limitation to the company – we know we should have tests but we’re rushing. It’s the same problem that many companies have with agile – these things are obviously a good idea, but difficult to do in practise.

As Bob Martin pointed out,”You know what you believe by observing yourself in a crisis. If in a crisis you follow your disciplines, then you truly believe in those disciplines. On the other hand, if you change your behaviour in a crisis, then you don’t truly believe in your normal behaviour” (from Chapter 11 in the Clean Coder).

If you know you should be using TDD but aren’t, then you need to ask why not. And, if you can’t fix that, stop lying that you’re doing TDD – admit what you’re actually doing and focus on improving that.