House Call From the Build Doctor

The Build Doctor came and had some pizza with us the other day to discuss our bleeding browser builds – they run and run and run some more, and the feedback is so slow that by the time they break we’ve no idea which of ten checkins is to blame.

Interestingly, our disease is not uncommon – the Doctor has seen many patients with similar ills. There is no cure, but there are a number of therapies that can ease the pain:

Fix your flickers. Flickering builds, as Ivan says, are a curse. Like Heisenbugs, they come and go at random, and the temptation to just run the build again to make them go away is tremendous. But if your build has flickeritis, it will be very hard to implement the other suggestions below successfully. One solution we’re trying is to remove any flickering test temporarily and add a sticky note to the kanban board so we’re sure to fix it and return it to the suite.
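One way to keep a quarantine honest is to make the build itself report what’s been pulled out. Here’s a hypothetical sketch (all names made up; tests modelled as plain Runnables rather than a real JUnit suite) of a runner that skips quarantined tests but flags them in the results, so the sticky note always has a current list:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: skip quarantined (flickering) tests, but record
// that we skipped them so they can't be quietly forgotten.
public class Quarantine {
    // Tests pulled from the suite until their flickering is fixed.
    private final Set<String> quarantined;

    public Quarantine(Set<String> quarantined) {
        this.quarantined = quarantined;
    }

    /** Runs each test unless it is quarantined; returns name -> outcome. */
    public Map<String, String> run(Map<String, Runnable> tests) {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> e : tests.entrySet()) {
            if (quarantined.contains(e.getKey())) {
                results.put(e.getKey(), "SKIPPED (quarantined)");
                continue;
            }
            try {
                e.getValue().run();
                results.put(e.getKey(), "PASSED");
            } catch (AssertionError | RuntimeException ex) {
                results.put(e.getKey(), "FAILED");
            }
        }
        return results;
    }
}
```

The point of the "SKIPPED (quarantined)" marker is that every build report reminds you the test still needs fixing, which is rather harder to ignore than a sticky note.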

Prioritise tests. This has a few flavours:

  • At the beginning of your test run, execute the tests or suites that failed recently. The theory is that these are most likely to fail again – this is a little like the principle of locality.
  • At the beginning of your test run, execute the tests that were most recently written. Again, the theory is that these are most likely to fail (since you’ve probably been changing the code that they test).
  • At the beginning of your test run, execute the tests whose tested code was changed since the last successful checkin. No theory needed here – if your tests are really independent, and your code has few dependencies, by definition the only tests that can fail are the ones that touch changed code. JTestMe, TestNG, and Clover claim to do this, though I think they are aimed at unit tests rather than functional tests.
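The first two flavours can be combined into a single ordering. Here’s a hypothetical sketch (names and data shapes are made up, not any real tool’s API): sort so that recently-failed tests come first, and break ties by how recently the test was written:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of test prioritisation: recently-failed first,
// then most-recently-written, so the likeliest failures run early.
public class TestPrioritiser {
    /**
     * Orders tests so the most recently failed come first; among tests
     * with the same last-failure time (e.g. never failed, recorded as
     * 0), the most recently written come first.
     */
    public static List<String> prioritise(List<String> tests,
                                          Map<String, Long> lastFailed,    // name -> epoch millis
                                          Map<String, Long> lastWritten) { // name -> epoch millis
        List<String> ordered = new ArrayList<>(tests);
        ordered.sort(Comparator
                .comparingLong((String t) -> lastFailed.getOrDefault(t, 0L)).reversed()
                .thenComparing(Comparator
                        .comparingLong((String t) -> lastWritten.getOrDefault(t, 0L)).reversed()));
        return ordered;
    }
}
```

The timestamps would come from wherever your build records results and your version control records commits; the sort itself is the easy part.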

With any of these options, you want to stop the build the moment a test fails, or at least alert developers that it has failed. That’s where we’re starting – the plan is to get the build to fail fast first, then implement one of the options above.
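The fail-fast part is simple once the tests are ordered. A hypothetical sketch (again, names made up and tests modelled as Runnables) of running until the first failure:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of failing fast: run the (already prioritised)
// tests in order and stop at the first failure, so the feedback
// arrives in minutes rather than after the whole run.
public class FailFastRunner {
    /** Returns the tests that passed before the first failure stopped the run. */
    public static List<String> runUntilFailure(Map<String, Runnable> tests) {
        List<String> passed = new ArrayList<>();
        for (Map.Entry<String, Runnable> e : tests.entrySet()) {
            try {
                e.getValue().run();
                passed.add(e.getKey());
            } catch (AssertionError | RuntimeException ex) {
                System.out.println("FAILED: " + e.getKey() + " – stopping the build");
                break; // fail fast: no point burning an hour on the rest
            }
        }
        return passed;
    }
}
```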

Run functional, non-visual tests without a browser. Without the overhead of starting and displaying a browser, you can test a lot of your workflow and, we think, run much faster. WebDriver does this, and it will soon be available as part of Selenium. Something like HTMLUnit might also be useful. Unfortunately we don’t seem to have a lot of pure workflow in our applications, so this seems less relevant for us.

Use builders to seed data. This is a variation on the previous item. If you’re creating test data through tests that run in the browser, stop now! It’s the slowest way to do it – just think of all the totally unnecessary retesting your data-entry screens are getting when you do this. At present, we have a canned database that we load in when the tests start, which at least avoids the need to seed during test runs. However, the Doctor suggested we go even further, and let each test create and destroy its own data really quickly, without the browser at all. The Builder pattern is likely to be helpful here. Joel is trying out a simplified version of this, where he populates a load of data at once through a single browser request – though ideally you should do it all on the server, with no browser requests at all.
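To make the Builder idea concrete, here’s a hypothetical sketch – the Customer type and its fields are invented for illustration, and in a real suite build() would insert a row on the server rather than just construct an object. The value is that defaults mean each test only states the one thing it actually cares about:

```java
// Hypothetical sketch of the Builder pattern for seeding test data on
// the server side, with no browser involved. Sensible defaults keep
// each test focused on the single property it is exercising.
public class CustomerBuilder {
    private String name = "Default Name";
    private String country = "UK";
    private boolean active = true;

    public CustomerBuilder named(String name) { this.name = name; return this; }
    public CustomerBuilder in(String country) { this.country = country; return this; }
    public CustomerBuilder inactive() { this.active = false; return this; }

    /** In a real suite this would write to the database; here it just builds. */
    public Customer build() { return new Customer(name, country, active); }

    public static class Customer {
        public final String name;
        public final String country;
        public final boolean active;
        Customer(String name, String country, boolean active) {
            this.name = name; this.country = country; this.active = active;
        }
    }
}
```

A test that only cares about the name then reads as `new CustomerBuilder().named("Ivan").build()`, and teardown can delete exactly what was built – no canned database, no data-entry screens retested by accident.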

Parallelise, parallelise, parallelise! We already run one test type through three simultaneously running slaves. But we could certainly do more – add more slaves, add more servers so projects compete less for hardware, and so on. JUnit is doing more with parallelism these days, though Selenium Grid may be more directly relevant. We’re clearing the decks on a server so we can put on some virtual machines and try this out.
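The shape of the idea – independent suites fanned out across a fixed number of workers – can be sketched with a plain thread pool. This is a hypothetical illustration (names made up, suites modelled as Runnables), not how Selenium Grid actually distributes work:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run independent suites on a fixed pool of
// workers, the same shape as spreading them across build slaves.
public class ParallelRunner {
    /** Runs each suite on the pool; returns suite name -> passed. */
    public static Map<String, Boolean> run(Map<String, Runnable> suites, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit everything up front so the suites overlap.
            Map<String, Future<Boolean>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Runnable> e : suites.entrySet()) {
                futures.put(e.getKey(), pool.submit(() -> {
                    try { e.getValue().run(); return true; }
                    catch (AssertionError | RuntimeException ex) { return false; }
                }));
            }
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Boolean>> f : futures.entrySet()) {
                try {
                    results.put(f.getKey(), f.getValue().get());
                } catch (InterruptedException | ExecutionException ex) {
                    results.put(f.getKey(), false);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The catch, of course, is the first item above: suites only parallelise cleanly if they’re genuinely independent – no shared data, no flickers.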

A few other ideas also came up during our discussion. Windowlicker is an alternative for functional testing (though I can’t imagine actually saying the words “we are using Windowlicker” without breaking down laughing). You can take movies of your functional tests running to help you debug them (and to entertain you if you’re really bored); vnc2swf may be useful for this. And Cargo may be useful if you want to swap containers (e.g. run Jetty in test, Tomcat in production).

Expect an update soon on the ongoing saga; we have some more advice to absorb and we need to get moving on some of the tasks above to see if they work for us.