Categories
infrastructure

Java Infrastructure Part 4 – The Build System

It’s about time we added a build tool to the project. It’s possible to create jars by hand, but that soon becomes time-consuming and error prone. Having a repeatable build process launched with a single command is pretty much essential to doing anything interesting with software.

Over the years I’ve used make, ant, maven and gradle. The one of these I like least is ant. It seems to produce massive, thousand-line monstrosities that are unreadable and inscrutable. And while ivy is fairly similar to maven’s dependency management, it doesn’t seem as natural to me. Having said that, maven can also get unwieldy, with simple builds that get out of hand.

I’ve not used Gradle a great deal, but it seems an obvious choice. A significant reason is its success – Gradle is the standard tool for Android Studio and Spring. Popularity is often under-rated as a reason for choosing tools or frameworks, but it means examples and expertise are easier to find. There may be many good reasons for choosing lesser-used frameworks, but knowing there is a vibrant community around a platform is a major plus.

However, I’m still cautious about Gradle. I’ve found some of the plugins I’ve used unhelpful, with the missing options harder to find than they were with maven. I also find the documentation focuses too much on how to do certain tasks rather than explaining the underlying concepts and assumptions. On top of that is a growing suspicion that Groovy may result in scripts that are write-only, impossible to read back later on, just like Perl scripts used to be.

(There’s an example in the documentation of the power of dynamically-generated tasks and their potential for chaos. The script

4.times { counter ->
    task "task$counter" << {
        println "I'm task number $counter"
    }
}

creates four tasks, which can then be called as

> gradle -q task1
I'm task number 1

I can see some powerful uses for this, but I can also see myself struggling to work out where on earth a failing task comes from.)

Despite some teething problems with the Artifactory plugin at work, I’ve enjoyed using Gradle so far. I love groovy for its concision and charm and there’s an optimism to using a new tool, particularly when the documentation explains how much better it is. It may turn out that maven would be a better choice but, because we’re working on infrastructure rather than code, we should have a lot more freedom to change things later.

Gradle uses the same concept of convention over configuration as maven. Past experience tells me that it’s easier to work with the grain of such tools than to fight them, so we will move our source directories from src/ to src/main/java/ in line with this.

Because we’ve used the standard directory layout, the initial build script is extremely simple. In fact, it’s just a single line in our initial build.gradle file:

apply plugin: 'java'

Running the command ‘gradle build’ results in the jar file being built. Nice and straightforward – but I feel a slight sense of nervousness that so much happens with a single command. For example, if we had not moved the source directories, gradle would still happily produce a jar file, just one with nothing in it.
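One way to ease that nervousness is to check what actually ended up in the jar. The following is only a rough sketch: the class name JarSanityCheck and its method are made up for illustration and are not part of the project. It counts the compiled classes in a jar and warns when there are none.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper, not part of the java-infrastructure repository.
class JarSanityCheck {

    /** Counts the entries in the jar whose names end in ".class". */
    static long countClassFiles(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                      .filter(entry -> entry.getName().endsWith(".class"))
                      .count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java JarSanityCheck <path-to-jar>");
            return;
        }
        long classes = countClassFiles(args[0]);
        System.out.println(args[0] + " contains " + classes + " class file(s)");
        if (classes == 0) {
            // This is the case described above: a jar with nothing useful in it.
            System.out.println("WARNING: the jar has no compiled classes in it");
        }
    }
}
```

Pointed at the jar that the build writes under build/libs, this would flag the empty-jar case straight away. It is a stop-gap rather than a substitute for proper tests.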

Introducing a new tool means something else to track. As well as noting the current version in the readme and todo files, we can use a mechanism Gradle offers for reducing the risk of different versions being used – the Gradle wrapper. This is a script that checks whether the required version of Gradle is available on the local machine. If not, that version is downloaded and stored locally. This requires us to add a new wrapper task to the build script, then run the gradle wrapper command.

task wrapper(type: Wrapper) {
    gradleVersion = '2.11'
}

The wrapper adds several new files – gradlew and gradlew.bat scripts, as well as a jar file and configuration in the gradle/wrapper folder. These are intended to be committed to git, so that anyone building the project in future can use the correct version of gradle via the gradlew command. That version is downloaded once and stored centrally on the machine, so that it can be reused by other gradlew scripts as needed.

However, this convenience introduces a new issue, one we will face again when we introduce dependency management: how do we make sure that the code we download is safe? There’s an interesting discussion of risk in a post called How to Take over the computer of any Java developer. Basically, we need to make sure that the code we download has not been tampered with.

A basic level of security is provided by the distributionSha256Sum property, which can be added to gradle-wrapper.properties and checks that the zip file downloaded from http://services.gradle.org/distributions/gradle-2.11-bin.zip is the one expected. Of course, this in itself requires finding “the SHA-256 hash of a known Gradle distribution”. We’d probably be OK in trusting the (HTTP) download, but this isn’t really good enough. It’s going to be added to the TODO list, and dealt with after we’ve looked at dependency management.
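To make the idea of a checksum concrete, here is a minimal sketch of computing a file’s SHA-256 hash using only the standard library. The class name Sha256Check is hypothetical, and this is not how the wrapper itself performs the check; it only illustrates the kind of comparison involved.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only.
class Sha256Check {

    /** Returns the SHA-256 digest of the file as a lowercase hex string. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            // Feed the file to the digest in chunks so large zips need little memory.
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: java Sha256Check <file> <expected-sha256>");
            return;
        }
        String actual = sha256Hex(Paths.get(args[0]));
        boolean matches = actual.equalsIgnoreCase(args[1]);
        System.out.println(matches ? "OK" : "MISMATCH, actual hash is " + actual);
    }
}
```

The hard part remains the one described above: the expected hash has to come from somewhere we trust, ideally over a different channel from the download itself.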

The latest git commit is cd8e97a. In the next part we’ll look at adding a continuous integration server.

Java Infrastructure Part 3 – A problem with compilation

In theory, compiling our Java class is straightforward: drop into the command line, use the javac command, then test by running the main method.

[Screenshot: compiling the class with javac and running it from the command line]

This is fine on my laptop – I now know I can compile the code and run it. But problems can arise as the code in question becomes more complex, or if it needs to run on other machines. The latter is a certainty – putting aside failure of this laptop, I want to run this code on a server at some point. (That is, unless I decide to develop directly on the production machine. That seems such an appalling idea that I find myself wondering whether there is some bizarre case to be made for it).

Using java -showversion reveals that I am using 1.8.0_72. The latest version at the moment is Version 8 Update 73, which was released on February 5th 2016 – I’m writing this on the 21st. There are two problems here.

  1. How do I make sure that this code is always handled with a consistent version of the Java SDK? I don’t want to risk inconsistent behaviour between different machines.
  2. How do I make sure that I am running the latest version of the SDK? Looking at the release notes for version 73 to see the differences between it and my version, I notice that there are some security patches that I’m not taking advantage of.

This problem will occur with every tool that is used. A similar problem will occur when we start adding some dependencies to the software, but we will deal with that separately.
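As a stop-gap for the first question, the code could at least report when it is running on an unexpected version. This is a sketch under assumptions: the class name JavaVersionCheck is invented, and the expected version is simply the one quoted above, hard-coded for illustration.

```java
// Hypothetical check, not part of the project as it stands.
class JavaVersionCheck {

    // Assumption: the version we have agreed to build and run with.
    static final String EXPECTED_VERSION = "1.8.0_72";

    /** True when the running version is exactly the expected one. */
    static boolean matches(String expected, String actual) {
        return expected.equals(actual);
    }

    public static void main(String[] args) {
        // "java.version" is a standard system property set by the JVM.
        String actual = System.getProperty("java.version");
        if (matches(EXPECTED_VERSION, actual)) {
            System.out.println("Java version OK: " + actual);
        } else {
            System.out.println("WARNING: expected Java " + EXPECTED_VERSION
                    + " but running on " + actual);
        }
    }
}
```

An exact match is deliberately strict. It does nothing for the second question, since pinning a version and keeping up with security updates pull in opposite directions.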

There is also a certain amount of configuration that remains implicit when I am running on a single machine. Right now this doesn’t matter much, but these sorts of problems become a nightmare as the software grows – what are my environment variables? What is the underlying OS? It would only take a few minutes for a developer to set up a new machine to run this code now, but as we add databases, continuous integration etc, that becomes more difficult.
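Some of that implicit configuration can be made visible with very little code. The sketch below prints a handful of standard JVM system properties plus the JAVA_HOME environment variable; the class name EnvironmentReport and the choice of which values to list are my own assumptions about what is worth recording.

```java
// Hypothetical helper for recording the environment a build ran in.
class EnvironmentReport {

    // Standard system properties that every JVM provides.
    static final String[] PROPERTIES = {
        "java.version", "java.vendor", "os.name", "os.version", "os.arch"
    };

    /** Builds a plain-text report, one "key = value" pair per line. */
    static String describe() {
        StringBuilder report = new StringBuilder();
        for (String key : PROPERTIES) {
            report.append(key).append(" = ")
                  .append(System.getProperty(key))
                  .append(System.lineSeparator());
        }
        // JAVA_HOME may not be set, in which case this line shows "null".
        report.append("JAVA_HOME = ")
              .append(System.getenv("JAVA_HOME"))
              .append(System.lineSeparator());
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

The output is only a snapshot of one machine, but comparing two such snapshots is a quick way to spot where a development machine and a server differ.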

Consistency sounds like an obscure problem (and is low-risk for the Java SDK), but when it does arise, it’s vicious. You don’t want a bug on the server that can’t be easily spotted on development machines. If the development and production environments are the same then every bit of work carried out confirms that the code works as it should.

What are some options for dealing with these issues?

  • Document a target environment fully and allow people to follow that as closely as they want/need to. There are still problems when doing this, but it’s more than a lot of companies bother with.
  • Use a bespoke local machine build image – the question then becomes how to keep existing machines in sync with this. Over time, the machines diverge from the original image, or the image itself needs updating. This can be complicated by machines needing special builds for testing etc.
  • Find a way to develop cleanly using docker/vagrant or similar. The code is executed and possibly compiled within VMs or containers. These can be rebuilt every time.

Build images and documentation are both useful first steps, but ultimately, the VM-based solutions feel right. At this point, I am going to put in some TODOs to cover this. It’s unsatisfactory, as there are now 5 of them (compared to just 11 lines of code, 5 of which are blank or single characters). This issue needs to be dealt with soon, but I want to put in a bit more structure to make this easier. However, as a stop-gap we should also note our current build environment in the README file.

The latest git revision is eb62957. In the next part, I’ll be adding an automated build tool.

Java Infrastructure Part 2 – A ‘simple’ Java class

Even though we’re focusing on infrastructure, we should have some code to play with. We’ll use a simple Java class, one that merely says hello, and build everything else around it. We will put this onto a server, with (eventually!) a pipeline to deploy it, and a user management system so it can say hello to specific users; but we’re not going to add any functionality beyond greetings until that is all working.

Our first version of the class is this:

package com.riddlefox.greeting;

public class Greeter {
   public static void main (String[] args) {
       System.out.println(greet("world"));
   }

   public static String greet(String name){
       return "Hello " + name;
   }
}

We create this in the src/com/riddlefox/greeting folder. It’s a simple class that would work in almost any version of Java. But it already makes a lot of assumptions. These aren’t necessarily problems, as long as we’re aware of them. What can we say about this first class and what it implies about the project?

  1. We have placed this into a fairly uninformative package/folder structure. At this point, there isn’t any need for sub-modules, particularly with no other files to distinguish it from. A src folder in the project root and a package name of com.riddlefox.greeting are probably good enough for now.
  2. I haven’t added any Javadoc. Arguably, the class is too simple to need it yet, and I want to avoid the sort of Javadoc that simply repeats the method definition. We’ll add a TODO about adding in Javadoc later on when the project is a little larger, and the method might be used without access to the source.
  3. I have misgivings around the use of the static keyword. It’s mainly there to make the main method concise. In terms of a single class this probably isn’t the end of the world, but if we add much more code such issues of style will become important.
  4. I also have misgivings around the name. As the joke goes, there are only two hard problems in computer programming, and I’m avoiding one of these hard problems. Again, as a project grows naming becomes more important.
  5. String management is an issue here. Changing the strings requires recompiling the class. There is also no means of internationalisation. These can both be added to the TODO list.

The issue of String management is a difficult one. It’s good to be able to update the strings on an application without redeploying, particularly when you have a monolith that takes time to deploy. However, it also adds a level of obfuscation to the code. If the application is easy to update, then redeploying it might not be a problem.
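To show what the trade-off looks like, here is a rough sketch of a greeter whose text lives outside the class. Everything in it is hypothetical: the class name ExternalisedGreeter, the "greeting" key, and the in-memory string standing in for a properties file that would normally sit on disk or on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Properties;

// Hypothetical variant of Greeter; the real class still hard-codes its text.
class ExternalisedGreeter {

    private final Properties messages;

    ExternalisedGreeter(Properties messages) {
        this.messages = messages;
    }

    /** Looks up the "greeting" pattern, falling back to the original English text. */
    String greet(String name) {
        String pattern = messages.getProperty("greeting", "Hello {0}");
        return MessageFormat.format(pattern, name);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a per-language properties file loaded at startup.
        Properties french = new Properties();
        french.load(new StringReader("greeting=Bonjour {0}\n"));
        System.out.println(new ExternalisedGreeter(french).greet("world"));
    }
}
```

The obfuscation mentioned above is already visible: to find out what the program says, a reader now has to look in two places rather than one.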

I’ve spent so long talking about this class that I am not going to actually compile it yet – that can wait until the next part, and we’re going to spend that entire post on compilation alone. In the meantime, the latest version of the repository is on github.

PS – If you don’t know the joke, there are said to be two hard problems in computer science: cache invalidation, naming things, and off-by-one errors.

Java Infrastructure Part 1 – Version Control

If your code isn’t in version control, then it doesn’t really exist. It’s too easy to lose the code on a single computer – a hard drive failure, or maybe a mistaken rm command. And it’s also easy to make a change that breaks something and not be able to get back to a working version. At those times a stored copy is a lifesaver. The basic requirement of a Version Control System is to keep code safe. But, over the years it’s become much more than that.

I’ve used a lot of different VCS – SourceSafe, RCS, CVS, SVN and git. At my first job, we used a series of network folders, with a Lotus Notes DB to keep track of who was working on each file. Different versions of the folders identified the different environments, from development through to live. Promoting the site would involve copying code from one set of folders to the next.

At first, version control is about making sure the software is safe and providing a history. This in itself makes it invaluable. But tools like SVN and git make collaboration easier. Of all the tools I’ve used, git is the first one that I’ve loved.

A lot of the tutorials on git treat it like a distributed SVN. This may be helpful in getting started, but soon leads to confusion. My favourite tutorial is Git from the Bottom Up, which discusses git in terms of the objects it uses internally. That makes it much easier to understand: git is a time machine, allowing you to open up alternative timestreams, recombine them, and do it all without opening up paradoxes. As long as you understand what you’re doing.

Git adds a lot of great features. git bisect is great for finding where bugs were introduced. git stash is great for when you need to change what you’re working on. Git detects moved files better than svn does. But the best thing about git is the branching model. Rather than branching being something difficult, as it can be in SVN, git treats branching as something that should be commonplace.

The big problem with version control systems is how you fit them into the company’s working methods. Git enables people to collaborate effectively but also provides challenges. This is a topic that deserves a whole post of its own. The Death of Continuous Integration is an excellent talk on the topic by Steve Smith.

So, before we write any code for our new project, we need to set up a repository. Github is a convenient place to host these archives, and that’s where I’ll be putting the java-infrastructure code.

Our repository is created with a Java gitignore file and a readme. To that I am going to add a single file, a TODO. The Readme provides a quick overview of what this project is for. The TODO is a simple reference to track things that need to be added to this project. The first item in this file is a note to add a better issue tracking system.

It’s not much of a project yet, but at least we know anything we add is safe in a repository. The current state can be found as commit 94d34c6.

Next up: writing some Java code.

Java Infrastructure Part 0 – The Long Way

I’ve been programming for a long time and worked with a lot of different companies. I’ve seen a range of architectures, organisations and processes. I started coding before the Agile Manifesto was signed, so I’m old enough to remember that projects were sometimes still successful under waterfall – but that’s another story.

Writing classes and putting applications online are easy enough. Most companies are working on well-understood problems. Despite this, two issues tend to emerge. The first is maintenance. A lot is written about refactoring and managing software, but it rarely works in practice. No matter how clever the devs, code tends to end up more complicated than it needs to be and change becomes difficult.

The second issue is linked to the first, and that is infrastructure. It’s easy enough to write a new piece of code and put it live. It’s so easy that a lot of people focus on writing features for a new application. Deployment tends to be figured out in the closing weeks of the project. After all, the first deployment is relatively straightforward. The problem comes as things grow more complicated.

Once a piece of software is live and has users, it’s hard to switch deployment strategies. The Internet is now sufficiently established that it’s not appropriate to shut down the system every time you need to make a change. The first few deployments are simple and quick. As the system grows, it takes longer and longer to redeploy, by which time there are a lot of other things competing for attention.

Adding infrastructure to a large project is a challenge. One doesn’t want to risk breaking those obscure sections of config files, placed there to handle one specific situation. Obsolete sections are left in the config because nobody is quite sure if a line does something or not. In the end, only one or two long-established developers are able to change the infrastructure. After they leave, things become even more difficult.

What I want to do with this series – both on my blog and on github – is to build up a generic piece of software with simple Java code, but to build a rich infrastructure around it. I think there is a lot to learn from this – and at the end I’ll have a good base to work from with my own future projects. I hope to learn about making infrastructure flexible, which, as I’ve said above, is a rare thing.

Java is easy, but being a professional developer requires much more: version control, continuous delivery/deployment, build management, monitoring, IDEs, logging frameworks, email management etc, etc. This is what I’m going to focus on.

Some years back, I studied deconstruction for my MA. At the start of the course, the Professor read a short Kafka story, The Next Village:

“My grandfather used to say: Life is astoundingly short. To me, looking back over it, life seems so foreshortened that I scarcely understand, for instance, how a young man can decide to ride over to the next village without being afraid that -not to mention accidents- even the span of a normal happy life may fall far short of the time needed for such a journey.”

After reading those sixty-six words, the Professor sighed. “We could spend all ten weeks on that piece”. I’m not planning to be quite that meticulous, but this is going to be quite detailed. Based on the notes I’ve made so far, no code gets compiled until Part 3.