A Visit to the Buildomat

The latest episode in our continuing saga of slow browser builds and attempts to fix them was a visit from Ivan Moore, a jack-of-all-trades coder whose many specialities include continuous integration, and the originator of a clever idea for solving our problem – an idea I had totally failed to explain properly to the team.

Over some more delicious lunch, Ivan explained what his CI server, build-o-matic, does: when a build covering, say, three checkins fails, build-o-matic reruns earlier revisions using binary search until it figures out exactly which checkin was the first to cause a failure. A feature like this – or at least the TeamCity feature that lets you manually rerun the build for an earlier revision – would help us solve at least one of our problems, namely who should look at a failed build.
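For the curious, here is a minimal sketch of that binary-search idea – not build-o-matic’s actual code; the revision list and the BuildRunner interface are placeholders. It assumes the first revision in the list is known green and the last is known red, and homes in on the checkin that introduced the failure.

// Sketch only: finds the first failing revision by binary search.
import java.util.List;

public class FirstBadRevisionFinder {

    public interface BuildRunner {
        boolean buildPasses(String revision);  // check out this revision and run the build
    }

    public String findFirstBadRevision(List<String> revisions, BuildRunner runner) {
        int lastGood = 0;                      // first entry: known green
        int firstBad = revisions.size() - 1;   // last entry: known red
        while (firstBad - lastGood > 1) {
            int mid = (lastGood + firstBad) / 2;
            if (runner.buildPasses(revisions.get(mid))) {
                lastGood = mid;                // the breakage came later
            } else {
                firstBad = mid;                // the breakage is here or earlier
            }
        }
        return revisions.get(firstBad);
    }
}

With n checkins between the last green build and the failure, this needs roughly log2(n) rebuilds rather than n – which is exactly why a slow build makes it painful.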

But when Ivan explained this feature on the plane to CITCON Amsterdam, I’d objected that our slow build would make a binary search impractical – we’d know what had happened, but only after waiting a day or so! Build-o-matic is smart enough to use idle agents to do the binary search, and to give recent checkins priority over the search, but even so it didn’t seem workable for a build as slow as ours.

Ivan suggested (and I failed to explain properly when I got home) that we break our tests into many small, independent build projects, each of which is short enough to make a binary search practical – and maybe even unnecessary, if it runs fast enough to allow one build per checkin. We would hope that most of these small projects would pass most of the time, and those that did fail would run fast enough to give us speedy feedback – of course this would require substantially more computing resources (whether virtual or physical) to keep all these builds running along quickly!

We have some resistance to the idea of running a huge build farm – we already have 18 servers and doubling or tripling this number starts moving us into real data centre territory, with whispering attendants caring for rows of gleaming machines, and we’re not quite sure if we’re ready for that much hardware management – at least while our very clever IT guy is still at college part of each week!

Some alternatives that also came up in our discussion (in addition to the ones we talked about before):

  • Put together a “smoke test” – a group of tests that cover most of the application and (we think) are most likely to fail whenever anyone breaks a basic feature. Run this (shorter) suite of tests on a fast loop, figuring that this will find the majority of problems.
  • Use annotations to label tests by functional area. This might help us split tests into meaningful functional groups. Something like NUnit categories might help here – anyone know if there is a JUnit equivalent? (One possibility is sketched just after this list.)
  • Use personal builds from Team City or similar features in other CI tools. These builds run through all the same tests, but without actually committing. If you have enough kit (there’s the gleaming data centre again!) then each developer can run a personal build for each checkin, and should be able to fix problems before merging them into source control.
  • Distributed version control systems like git should let you run builds on each branch, if your CI system is smart enough. Again, this should let developers get feedback on their builds before committing to a common repository.
  • Finally, of course, the modern answer to the data centre is the cloud. Seems like someone is working on this for the Bamboo CI server, but I don’t know if anyone has actually tried it in anger. Wouldn’t work so well for us as our customers are awfully security-conscious, but would be fun to see running somewhere!
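On the annotations question above: newer versions of JUnit (4.8 and later) have categories, via the @Category annotation and the Categories runner, which play much the same role as NUnit categories. A minimal sketch, with placeholder test and category names:

import org.junit.Test;
import org.junit.experimental.categories.Categories;
import org.junit.experimental.categories.Categories.IncludeCategory;
import org.junit.experimental.categories.Category;
import org.junit.runner.RunWith;
import org.junit.runners.Suite.SuiteClasses;

public class CategoryExample {

    // Marker interfaces act as the category labels.
    public interface SmokeTests {}
    public interface SlowBrowserTests {}

    public static class CheckoutFlowTest {
        @Test
        @Category(SmokeTests.class)
        public void customerCanCompleteCheckout() { /* fast, critical-path check */ }

        @Test
        @Category(SlowBrowserTests.class)
        public void everyProfileFieldCanBeEdited() { /* slower, lower-value check */ }
    }

    // A suite that runs only the smoke-test category - one way to build the
    // "smoke test" loop mentioned above.
    @RunWith(Categories.class)
    @IncludeCategory(SmokeTests.class)
    @SuiteClasses(CheckoutFlowTest.class)
    public static class SmokeSuite {}
}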

Many thanks to Ivan for visiting us. We now have lots of ideas to chew on and try out.

Edited to include the “smoke test” idea and to better summarise Ivan’s many skills.

Recording VNC Session

Sometimes it’s handy to record a VNC session. For example when the remote machine is running an automated browser test and you want to see what exactly happened when a failure occurred.

pyvnc2swf can record a VNC session and save it as a Flash file.

1. Install pyvnc2swf

Fedora:

yum install pyvnc2swf

Ubuntu:

apt-get install pyvnc2swf

2. You can use pyvnc2swf via its GUI or on the command line. Using the GUI is pretty straightforward.
The remaining steps describe using it from the command line.
3. Start pyvnc2swf on the command line with the following command:

pyvnc2swf -n -t shape -o <Name of file to save recording> <VNC server name>

For example:

pyvnc2swf -n -t shape -o myrecording.swf vncserver.com

Each option means:

-n  Run pyvnc2swf in console mode.
-t  The encoding type. The possible types are shape (.swf), video (.flv), and vnc (VNCLog).
-o  The name of the file to save the recording to.

4. Press Ctrl+c to stop recording.
5. Open myrecording.html in a web browser to view the recording.

I like Hudson

Recently I had to add some new tests to our continuous integration test system. I was dreading it, given my previous experience with our CruiseControl setup. However, we have been trialling Hudson, so this gave me an opportunity to try it out.

And the verdict… I like Hudson (kinda gave that away already). It was really easy to modify an existing Hudson project to include my tests. Two Hudson features particularly impressed me.

1. Hudson told me right away when I entered an incorrect path to my artifact file. No need to wait until after a build to realise I had made a typo.

[Screenshot: Hudson – artifact path validation]

Hudson immediately validates that the path is correct.

2. It is really easy to see the output from the build. I remember bashing my head against a wall for ages just trying to get the same output from CruiseControl.

[Screenshot: Hudson – console output]

Hudson puts the build output right at your fingertips.

PS – The magic incantation to see the build output, as CruiseControl runs the build, is to add uselogger="false" to the <ant> tag in your CruiseControl config file.

Code Dojo VI

We held our 6th code dojo a few weeks ago, and this time we decided to break the pattern and try something different. There are two documented types of code dojo: the Prepared Kata, where a presenter takes the participants through a worked solution to a problem (we’ve never tried this one), and the Randori Kata, where random pairs of programmers work for five minutes each at solving a “simple” problem, with advice provided by the non-active participants. Until our last dojo, the Randori Kata is what we’d been doing.

A general theme that came out of our Randori Kata dojos was that it got a little frustrating for those not currently coding, and that swapping out the coders every five minutes was too distracting. We found ourselves not getting anywhere near a solution even after two hours, which was starting to take the fun out of things. Also, because we were practising TDD we found ourselves testing things like IO, which is difficult to test and which wasn’t really part of the problem.

So at code dojo VI we tried something completely different. We wrote a small framework called JTanks, similar to JRobots, which provides a simple API for writing a TankDriver that operates a virtual Tank in a virtual arena. Tanks are able to move around at variable speeds, fire missiles, lay mines, and scan the local area for enemies (and missiles and mines). The tanks operate in real time on a simple, granular game clock. We also wrote a small graphical UI for tracking the positions of the tanks, so that we could watch them in action.
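To give a flavour of what the teams wrote, here is a hedged sketch of the sort of API involved – every name and signature below is a guess reconstructed from the description above, not the actual JTanks interfaces.

// Illustrative only: the real JTanks API differs.
interface TankDriver {
    // Called once per tick of the granular game clock.
    void drive(Tank tank);
}

interface Tank {
    void move(double heading, double speed);   // variable-speed movement
    void fireMissile(double heading);
    void layMine();
    Scan scanLocalArea();                      // nearby enemies, missiles and mines
}

interface Scan {
    boolean enemyInRange();
    double bearingToNearestEnemy();
}

// A trivial driver in the spirit of the eventual winner: sit still and shoot.
class SittingDuckDriver implements TankDriver {
    @Override
    public void drive(Tank tank) {
        Scan scan = tank.scanLocalArea();
        if (scan.enemyInRange()) {
            tank.fireMissile(scan.bearingToNearestEnemy());
        }
    }
}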

We then split up into four teams of two, each with the goal of pair programming a TankDriver. After an hour we re-grouped for the first tournament. We used simple death-match rules: the last tank standing in each game got the most points, and a tournament consisted of three games. After that, the teams returned to their desks to continue building drivers, this time based on the strategies they’d seen others employ.

Most teams employed TDD (which I would have thought very difficult in such a short time-frame); one team did not. Interestingly, the team not using TDD actually performed the worst, owing to a silly bug, even though it definitely had the most complex tank. The most successful tank was TDD’d but didn’t move – it would just shoot at the closest tank to itself and try to dodge incoming missiles.

The evening was very enjoyable. Developers didn’t have to worry about a complicated API, and because everyone started with a basic working tank they got to see the fruits of their labours in action immediately. This was our first dojo to introduce a competitive element, and that worked very well to spur the developers on to improve their strategies. Testing the tank drivers proved a little complicated, and the newness of the JTanks API meant that a few bugs crept in on the night. However, this didn’t detract from the success of the evening, and it could easily be improved (and is being) for future dojos.

There are plenty of other competitive programming problems that could be used (efficient elevators in a building, poker or chess players, etc.) and when combined with a goal for development improvement (in this case pair programming) it makes for a fun and interesting dojo.

House Call From the Build Doctor

The Build Doctor came and had some pizza with us the other day to discuss our bleeding browser builds – they run and run and run some more, and the feedback is so slow that by the time they break we’ve no idea which of ten checkins is to blame.

Interestingly, our disease is not uncommon – the Doctor has seen many patients with similar ills. There is no cure, but there are a number of therapies that can ease the pain:

Fix your flickers. Flickering builds, as Ivan says, are a curse. Like Heisenbugs, they come and go at random, and the temptation to just run the build again to make them go away is tremendous. But if your build has flickeritis, it will be very hard to implement the other suggestions below successfully. One solution we’re trying is to remove any flickering test temporarily and add a sticky note to the kanban board so we’re sure to fix it and return it to the suite.

Prioritise tests. This has a few flavours:

  • At the beginning of your test run, execute the tests or suites that failed recently. The theory is that these are most likely to fail again – this is a little like the principle of locality. (A minimal sketch of this flavour follows the list.)
  • At the beginning of your test run, execute the tests that were most recently written. Again, the theory is that these are most likely to fail (since you’ve probably been changing the code that they test.)
  • At the beginning of your test run, execute the tests whose tested code was changed since the last successful checkin. No theory needed here – if your tests are really independent, and your code has few dependencies, by definition the only tests that can fail are the ones that touch changed code. JTestMe, TestNG, and Clover claim to do this, though I think they are aimed at unit not functional tests.
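To make the first flavour concrete, here is a minimal sketch – not taken from JTestMe, TestNG, or Clover – that simply sorts recently failed test classes to the front before handing the lot to JUnit. The failure record and class names are placeholders.

import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RecentFailuresFirst {

    public static void main(String[] args) {
        // In real life this would be read from the last build's results.
        Set<String> failedLastTime = Set.of("com.example.CheckoutFlowTest");

        // The full functional suite would be listed (or discovered) here.
        List<Class<?>> allTests = new ArrayList<>();
        // allTests.add(com.example.CheckoutFlowTest.class);
        // allTests.add(com.example.SearchTest.class);

        // Tests that failed last time sort to the front of the run.
        allTests.sort((a, b) -> Boolean.compare(
                failedLastTime.contains(b.getName()),
                failedLastTime.contains(a.getName())));

        Result result = JUnitCore.runClasses(allTests.toArray(new Class<?>[0]));
        System.out.println("Failures this run: " + result.getFailureCount());
    }
}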

With any of these options, you want to stop the build the moment a test fails, or at least alert developers that it has failed. That’s where we’re starting – the plan is to get the build to fail fast first, then implement one of the options above.

Run functional, non-visual tests without a browser. Without the overhead of starting and displaying a browser, you can get a lot of your workflow tested, and (we think) the tests run much faster. WebDriver does this, and it will soon be available as part of Selenium. Something like HtmlUnit might also be useful. Unfortunately we don’t seem to have a lot of pure workflow in our applications, so this seems less relevant for us.
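For a flavour of what that looks like, here is a minimal sketch using WebDriver’s HtmlUnit-backed driver; the URL, field names, and expected page title are placeholders for your own application.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

public class BrowserlessWorkflowCheck {

    public static void main(String[] args) {
        WebDriver driver = new HtmlUnitDriver();  // no real browser is started
        try {
            driver.get("http://localhost:8080/app/login");
            driver.findElement(By.name("username")).sendKeys("testuser");
            driver.findElement(By.name("password")).sendKeys("secret");
            driver.findElement(By.name("login")).submit();

            if (!driver.getTitle().contains("Dashboard")) {
                throw new AssertionError("Login workflow broke; page title was: " + driver.getTitle());
            }
        } finally {
            driver.quit();
        }
    }
}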

Use builders to seed data. This is a variation on the previous item. If you’re creating test data through tests that run in the browser, stop now! It’s the slowest way to do it – just think of all the totally unnecessary retesting your data-entry screens are getting when you do this. At present, we have a canned database that we load in when the tests start, which at least avoids the need to seed during test runs. However, the Doctor suggested we go even further, and let each test create and destroy its own data really quickly, without the browser at all. The Builder pattern is likely to be helpful here. Joel is trying out a simplified version of this, where he populates a load of data at once through a single browser request – though ideally you should do it all on the server, with no browser requests at all.
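Here is a minimal sketch of the test data builder idea – CustomerBuilder and its defaults are hypothetical, not our actual code. The point is that a test asks for exactly the data it needs, with sensible defaults for everything else, and the builder writes it directly to the persistence layer rather than driving the data-entry screens.

public class CustomerBuilder {

    private String name = "Default Customer";
    private String country = "UK";
    private boolean active = true;

    public CustomerBuilder named(String name) { this.name = name; return this; }
    public CustomerBuilder inCountry(String country) { this.country = country; return this; }
    public CustomerBuilder inactive() { this.active = false; return this; }

    // In a real builder this would save straight to the database or service layer,
    // bypassing the browser entirely.
    public Customer build() {
        return new Customer(name, country, active);
    }

    public static class Customer {
        public final String name;
        public final String country;
        public final boolean active;

        Customer(String name, String country, boolean active) {
            this.name = name;
            this.country = country;
            this.active = active;
        }
    }
}

// Usage in a test:
//   Customer fred = new CustomerBuilder().named("Fred").inCountry("NZ").build();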

Parallelise, parallelise, parallelise! We already run one test type through three simultaneously running slaves. But we could certainly do more – add slaves, add servers so we have less competition among projects, and more. JUnit is doing more with parallelism these days, though Selenium Grid may be more directly relevant. We’re clearing the decks on a server so we can put on some virtual machines and try this out.
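On the JUnit side, here is a minimal sketch using JUnit 4’s ParallelComputer, which runs test classes (and methods within a class) concurrently; the test classes here are placeholders. Selenium Grid tackles the same problem at a different level, farming whole browser sessions out to a pool of machines.

import org.junit.Test;
import org.junit.experimental.ParallelComputer;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

public class ParallelSuiteRunner {

    public static void main(String[] args) {
        // (true, true) = run test classes in parallel and methods within a class in parallel.
        Result result = JUnitCore.runClasses(
                new ParallelComputer(true, true),
                CheckoutFlowTest.class,
                SearchTest.class);
        System.out.println("Failures: " + result.getFailureCount());
    }

    // Placeholder test classes standing in for the real functional suites.
    public static class CheckoutFlowTest {
        @Test public void customerCanCheckOut() { /* ... */ }
    }

    public static class SearchTest {
        @Test public void searchReturnsResults() { /* ... */ }
    }
}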

A few other ideas also came up during our discussion. Windowlicker is an alternative for functional testing (though I can’t imagine actually saying the words “we are using Windowlicker” without breaking down laughing). You can take movies of your functional tests running to help you debug them (and to entertain you if you’re really bored); vnc2swf may be useful for this. And Cargo may be useful if you want to swap containers (e.g. run Jetty in test, Tomcat in production).

Expect an update soon on the ongoing saga; we have some more advice to absorb and we need to get moving on some of the tasks above to see if they work for us.