Scenario testing for infrastructures

Recent advancements have allowed us to provision an entire environment with a single command.

The next major challenge facing us is how to perform updates to environments. This gives rise to an additional set of challenges (and constraints) for our automated provisioning system.

We’re working towards a provisioning system that is able to upgrade the systems serving an application by rebuilding a completely fresh environment from the ground up, then seamlessly redirecting the live traffic into the new environment. We plan to eventually do this in a fully automated fashion during business hours – continually as we make changes and improvements to the infrastructure.

One approach to performing updates on the running system, whilst maintaining the “single build” provisioning model presented in the last post, would be to implement Blue-Green deployments for our applications at the environment level.

Thinking about things in this mindset gives us a set of problems to solve:

  • How can we have complete confidence that our newly provisioned environment is fully operational and that no regressions have been introduced?
  • How do we know that we can exactly duplicate an environment build process in a way that we have complete confidence in? (E.g. is the QA environment really like production?)
  • Do we know the health monitoring of components and the service is functional? (Can we trust the system to monitor the right things as servers are rebuilt dynamically).
  • Are load balancing and redundancy / high availability requirements met under the conditions of unexpected hardware failures? (Not only do the application environments have to be highly available, the provisioning system itself has to cope with failure gracefully)


Let’s start by talking about confidence, and how our MVP for automated confidence is insufficient. Our initial thought process went something like this:

  • We have monitoring for every machine comprising the environment – right?
  • Our monitoring should be able to tell us whether the machines we just provisioned are working (i.e. give us confidence in the state of the environment) – awesome.
  • If we automatically verify that our per host monitoring is “all green”?
  • We’re confident everything is working, and we can put the environment into production – right?
  • So we need an automated API to access environment-specific monitoring, so that we can assert that all the checks for this specific application are correctly provisioned and healthy.

It was a relatively simple job to extend the mcollective nrpe agent so that, from our provisioning model, we have a rake task which remotely executes our NRPE checks on the hosts in an environment.

However, whilst a great first step, this is obviously not sufficient because:

  • It does not cover the end-to-end case (we cut-out nagios)
  • It is difficult to model test targets of services rather than individual machines (e.g. tests for a virtual IP on a load balancer)
  • How do we know that our monitoring continues to work through subsequent iterations of model or puppet code?
  • How do we know that our monitoring is not just saying ‘OK’ despite the state of the things it’s trying to monitor?
  • How do we know that we have not caused regressions in our HA setup which would stop it functioning in a crisis?

… (etc)

Just like in application development, we can’t just hope that provisioning an entire infrastructure still works! We need automated tests that will give us the confidence we need to bring newly provisioned environments online.

Scenario testing

Our current thinking is that we need to be able to reason about the behaviour of the entire system, both under normal and failure conditions – from a high level.

This feels like a good fit for structuring the tests in the classic Given, When, Then BDD format.

To this end, we wanted to give some examples of test scenarios that we would be interested in writing to actually have confidence, and show the support we will need to realise them.

Here is a simple example of a scenario we might want to test:

Given – the Service has finished being provisioned
And – all monitoring related to the service is passing
When – we destroy a single member of the service
Then – we expect all monitoring at the service level to be passing
And – we expect all monitoring at the single machine level to be failing

Even with this simple example, we can drive our thinking into the upstream APIs and services we’ll need to achieve these goals at each of the steps.

>> Given – the Service has finished being provisioned

We can do this! We can launch a bunch of virtual machines to fulfil a service.

>> And – all monitoring related to the service is passing

We can’t do this. We can check that nrpe commands on machines that are part of a service are working (and we do), we can also execute some one off sanity checks from the provisioning machines. But what we really want to do is to ask our monitoring system (nagios in our case).

Now “the service” actually consists of a number of machines. Each machine has checks, and the service itself has checks. The service will have some internal checks, such as whether all the instances are healthy in the load balancer; it will also have some external checks run from somewhere on the Internet (e.g. pingdom), which will cover most connectivity issues.

So how can we confidently answer this question? I believe we need to be able to query the monitoring system like this:

    'The Service', :transitive => true
  ).should be_all_green

In this case I want to know about every participating machine and all checks relating to the service. This is the classic “is everything ok for the thing we just provisioned” question.

>> When – we destroy a single member of the service

This is easy. We just issue a destroy command for a VM. For other scenarios we might want more specific ways of disabling particular functionality of a machine rather than complete destruction.

>> Then – we expect all monitoring at the service level to be passing.

We need to be able to do something like this:

    'The Service', :transitive => false
  ).should be_all_green

Note that the change of transitivity is the important clue here!

Sometimes I want to ask the question: “Are all participants in this service ok?” But sometimes I just want to know if the service is still functioning.

>> And – we expect all monitoring at the single machine level to be failing

  'Service App Machine 0', :transitive => false
).should be_all_red
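
To make the transitivity distinction concrete, here is a minimal Ruby sketch of the kind of query walked through above. It is not the API of nagios or of any real monitoring tool: `Check`, `Monitored`, `all_green?`, `all_red?` and `destroy!` are hypothetical names, and the statuses live in memory rather than being fetched from a monitoring system.

```ruby
# Hypothetical in-memory model; none of these names come from a real monitoring API.
Check = Struct.new(:name, :status)

class Monitored
  attr_reader :name, :checks, :members

  def initialize(name, checks: [], members: [])
    @name = name
    @checks = checks
    @members = members
  end

  # Statuses of this object's own checks, plus those of every member
  # (recursively) when transitive is true.
  def statuses(transitive:)
    own = checks.map(&:status)
    return own unless transitive

    own + members.flat_map { |member| member.statuses(transitive: true) }
  end

  def all_green?(transitive:)
    statuses(transitive: transitive).all? { |status| status == :ok }
  end

  def all_red?(transitive:)
    found = statuses(transitive: transitive)
    !found.empty? && found.none? { |status| status == :ok }
  end

  # Simulate destroying a machine: every check on it starts failing.
  def destroy!
    checks.each { |check| check.status = :critical }
  end
end

def build_example_service
  machines = [0, 1].map do |i|
    Monitored.new("Service App Machine #{i}", checks: [Check.new('http', :ok)])
  end
  Monitored.new('The Service',
                checks: [Check.new('virtual IP responds', :ok)],
                members: machines)
end

service = build_example_service
puts service.all_green?(transitive: true)    # Given/And: everything passing
service.members.first.destroy!               # When: one member is destroyed
puts service.all_green?(transitive: false)   # Then: service-level checks only
puts service.members.first.all_red?(transitive: false) # And: that machine fails
```

In this sketch the service-level check simply keeps its last status. In a real setup the monitoring system would re-evaluate it after the machine was destroyed, and that re-evaluation is exactly what the scenario is meant to exercise.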


The thinking we’re doing about extending the modeling of our infrastructure from provisioning to testing is also applicable to running the systems in production. The scenario testing described above is predicated on the ability to sample an environment and build a model of the infrastructure state.

Whilst we’re still a long way from solving all of the problems, the model used for solving the testing problems outlined above can be used to reason about the production system! There are wins to be had every step along the path – in the short term, our model can power much better service/environment (rather than machine) level monitoring.

In the longer term, we’ll have better testing in place to give us confidence about newly provisioned environments. This will allow us to have a trustworthy blue/green infrastructure deployment and update system, as it can be built as a simple state machine where transitions resolve differences between a ‘desired state’ and the ‘current state’. This is exactly the same concept as behind Orc, our continuous deployment tool – wherein a model and state machine driven approach allows us to safely upgrade production applications, even in the face of machine failure.
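
As a rough illustration of “transitions resolve differences between a desired state and the current state”, the sketch below plans and applies transitions over plain hashes. It is an assumption-laden toy, not how Orc or our provisioning tools are written: the function names, the `:launch` / `:rebuild` / `:destroy` actions and the hash layout are all hypothetical.

```ruby
# Hypothetical sketch: a state is a hash of machine name => spec.

# Work out which transitions would turn `current` into `desired`.
def plan_transitions(desired, current)
  launches = (desired.keys - current.keys).map { |name| [:launch, name] }
  rebuilds = (desired.keys & current.keys)
             .reject { |name| desired[name] == current[name] }
             .map { |name| [:rebuild, name] }
  destroys = (current.keys - desired.keys).map { |name| [:destroy, name] }
  launches + rebuilds + destroys
end

# Apply one transition and return the new state (the input is not mutated).
def apply_transition(current, desired, transition)
  action, name = transition
  case action
  when :launch, :rebuild then current.merge(name => desired[name])
  when :destroy then current.reject { |key, _spec| key == name }
  else raise ArgumentError, "unknown action #{action}"
  end
end

# Keep applying planned transitions until nothing differs.
def converge(desired, current)
  plan_transitions(desired, current).reduce(current) do |state, transition|
    apply_transition(state, desired, transition)
  end
end

desired = { 'app-0' => { version: 2 }, 'app-1' => { version: 2 } }
current = { 'app-0' => { version: 1 }, 'old-lb' => { version: 1 } }
plan_transitions(desired, current).each { |action, name| puts "#{action} #{name}" }
```

Because the plan is computed purely from the two states, re-running it after a failure part-way through yields only the remaining work, which is what makes this style tolerant of machine failure.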

We hope this post has made you think about the problems in the same way we have. Even if you don’t agree with the strategy we’re pursuing, you’ll hopefully agree that being able to reason about infrastructures at a higher conceptual level than individual machines is an important and powerful concept.

Exported Resources Considered Harmful

Our infrastructure automation is driven by Puppet, so this post is mainly going to talk about Puppet – however the key problem we have (and issues we’re solving) is equally relevant for most other current configuration management tools (such as Chef). One of the key challenges for configuration management systems is determinism – i.e. being able to rebuild the same system in the same way.

In a ‘traditional’ world view, the ‘system’ means an individual machine – however, in the real world, there are very few cases where a new production system can be brought on-line with only one component. For resiliency (if not scalability) purposes you probably want to have more than one machine able to fulfil a particular role so that a single hardware failure won’t cause a system wide outage.

Therefore your system consists of more than one machine – whilst the current crop of configuration management tools can be deterministic for a single machine, they’re much less deterministic when you have inter-machine dependencies.

Beyond individual per-application clusters of servers, you want the monitoring of your entire infrastructure to be coupled with the systems in place inside that infrastructure. I.e. you shouldn’t have to replicate any effort; when you add a new web server to the pool serving an application, you expect the monitoring of that server to adjust so that the new service becomes monitored automatically.

In the puppet ecosystem, the traditional solution to this is exported resources. In this model, each host running puppet ‘exports’ a set of data about the system under management (for instance nagios checks), and then other hosts can ‘collect’ these resources when they run puppet.

Traditionally this was not very scalable, although this has largely been addressed with the introduction of PuppetDB. It was also difficult to arrange things such that you could get the set of resources you wanted onto the host you wanted – with the newer versions of PuppetDB this issue is ameliorated with the introduction of a more flexible query interface.

All of these advancements have been great progress, and kudos to puppetlabs for doing much-needed work in this area. However, pulling back from the actual problems, I (and my team) have come to consider exported resources the wrong solution for the problems they’re commonly used to solve.

Exported resources introduce puppet run-order dependencies, i.e. in order to reach the correct state, puppet must run on some machines before it runs on others. The implication is that this “management method” is a Convergent[1] system as the system could end up in its final state by more than one route. Any system which relies on convergence is complicated, as it’s very hard to know if you’ve converged to the end state (or if you will ever converge to the end state).

The key issue is, of course, determinism: if host A exports resources to host B, then the order in which you build host A and host B matters, making them co-dependent and non-deterministic. If you’re rolling out an entirely new environment then this likely means that you have to run puppet again and again across the machines until things appear to stop changing – and this is just _apparent_ convergence, rather than proven convergence.

We can go some way to solve this issue, by forcing the order that machines are provisioned (or that puppet is run on those machines). We wrote puppet roll which executes puppet on hosts in order according to a dependency graph. But this is the wrong problem to be solving. Eliminate provisioning order dependencies and we eliminate a difficult and brittle problem.
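
To show what forcing a run order involves, here is a small sketch that orders hosts by a dependency graph using Ruby's standard `TSort` module. It is not the implementation of puppet roll; the `HostGraph` class and the host names are made up for illustration.

```ruby
require 'tsort'

# Hypothetical dependency graph: each host maps to the hosts that must be
# configured before it.
class HostGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # TSort needs to know how to walk every node...
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  # ...and how to walk the dependencies of a single node.
  def tsort_each_child(host, &block)
    @dependencies.fetch(host, []).each(&block)
  end

  # Hosts in an order where every dependency comes before its dependants.
  # Raises TSort::Cyclic if two hosts depend on each other.
  def run_order
    tsort
  end
end

graph = HostGraph.new(
  'lb-1'  => ['app-1', 'app-2'],
  'app-1' => ['db-1'],
  'app-2' => ['db-1'],
  'db-1'  => []
)
puts graph.run_order.join(' -> ')
```

Note the `TSort::Cyclic` case: two hosts that export resources to each other have no valid order at all, which is one reason ordering is a brittle thing to depend on.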

In recent work, we have rejected the traditional “exported resources anti-pattern” and instead have created a model of ‘our system’ entirely outside puppet. This means that we can build a model of the entire system, which contains no mutable state. We wire this model up to puppet to generate ENC (external node classifier) data for each machine. All data needed for this machine’s state is supplied by the ENC, meaning that machines can be built in any order, and all in exactly one pass.

This entirely removes all the key problems with determinism, convergence, multiple puppet runs etc. In our experience, it also in many cases radically simplifies things. Whereas previously we would have bent things to fit into the model offered by exported resources, we can now instead write our business specific logic in our own model layer – meaning that we can represent things as they should naturally be modelled.


The thing we like most about puppet for individual systems and services is its declarative, model driven nature – thus we’ve tried to replicate something with a similar ‘feel’ at the whole system level.

Given this (somewhat simplified) description of a service:

stack 'example' do
  virtual_appserver 'exampleapp', :instances => 2
  loadbalancer :instances => 2
end

env 'ci', :location => 'dc1' do
  instantiate_stack 'example'
end

env 'production', :location => 'dc2' do
  instantiate_stack 'example'
end

The (again somewhat simplified) ENC generated for the 2 application servers looks like this:

role: http_app_server
  environment: ci
  application: 'exampleapp'

The ENC for the 2 load balancers looks like this:

      type: http

This sort of configuration eliminates the need for a defined puppet run-order (as each host has all the details it needs to configure itself from the model – without any data being needed from other hosts), and goes a long way towards achieving the goal of complete determinism.
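
The idea can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: the `Environment` struct, the host naming scheme, the class names `role::http_app_server` and `role::loadbalancer`, and the `realservers` key are all hypothetical. The only thing taken from Puppet itself is that an ENC returns YAML with a top-level `classes` key.

```ruby
require 'yaml'

# Hypothetical model of one environment; class and key names are illustrative.
Environment = Struct.new(:name, :application, :app_instances, :lb_instances) do
  def app_hosts
    (1..app_instances).map { |i| format('%s-%s-%03d', name, application, i) }
  end

  def lb_hosts
    (1..lb_instances).map { |i| format('%s-lb-%03d', name, i) }
  end

  # ENC data for every host, derived from the model alone. The load balancers
  # learn about the app servers from the model, not from exported resources.
  def enc
    data = {}
    app_hosts.each do |host|
      data[host] = { 'classes' => { 'role::http_app_server' => {
        'environment' => name, 'application' => application
      } } }
    end
    lb_hosts.each do |host|
      data[host] = { 'classes' => { 'role::loadbalancer' => {
        'environment' => name, 'realservers' => app_hosts
      } } }
    end
    data
  end
end

ci = Environment.new('ci', 'exampleapp', 2, 2)
puts ci.enc.fetch('ci-lb-001').to_yaml
```

Since `enc` is a pure function of the model, asking for a load balancer's data gives the same answer whether or not any application server has been built yet.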

The example shows a traditional load-balancer-to-web-server dependency; however, the same technique can be (and is) applied in our code wherever we have clusters of servers of the same type that need to inter-communicate, e.g. RabbitMQ, Ehcache and Elasticsearch clusters.

If you haven’t guessed yet, as well as being theoretically correct, this approach is vastly powerful:

  • We’re able to smoke test our puppet code in a real integration environment.
  • We can provision entire clusters of servers for QA or development purposes with 1 line of code and a < 10 minute wait.
  • We’ve used this system to build our newest production applications.
  • We can rebuild an entire app environment during scheduled maintenance.
  • We can add servers or resources to an application cluster with a 1 line code change and 1 command.

We’ve got a lot of work still to do on this system (and on our internal applications and puppet code before it’ll all fit into this system), however it’s already quite obviously a completely different (and superior) model to traditional convergence for how to think of (and use) our configuration management tools across many servers.


  1. Why Order Matters: Turing Equivalence in Automated Systems Administration (USENIX 2002)

Mocking the proud

A mocking framework such as JMock provides a way of specifying and checking the interactions between an object under test and its collaborators, without having a full implementation of those collaborators to hand. However, some prideful classes don’t like being mocked!

Initially, when all we have is an interface specifying the contract that some collaborator must fulfil, the mock objects created by the framework act as stubs for implementation code that has yet to be written. Later on, however, we use mock objects as replacements for actually existing implementations, so that we can unit test an object that drives other parts of a larger system without having to come up with a complex test configuration for the system itself.

In both cases, mocking allows us to pretend that the object under test has a fully functioning environment to interact with, when in fact it is in the position of a “brain in a vat” being fed artificial stimuli by the testing code.


In the usual case, a mock object fills out some entirely abstract interface: by default, JMock will complain if you try to mock a concrete class. Although with a bit of persuasion JMock is able (thanks to some cunning bytecode manipulation) to create mock implementations even of final classes, I’d argue that whenever you find yourself having to do this you have a prideful class and should consider it as a potential code smell.

Prideful classes are a particular problem when they show up at the public boundary of a system (e.g. as part of its API). Objects under unit test should ideally interact with the wider system through collaborators on that boundary, rather than “reaching in” to work with internal system objects, in violation of the Law of Demeter.

So when you find you have to mock out an interface that a) belongs on the external boundary of some part of the system, and b) can’t be mocked without bytecode manipulation, because it’s tied to a specific implementation, alarm bells should go off in your head – separate modules within the system are in danger of becoming tightly and inflexibly coupled.

In this situation, consider humbling the proud by refactoring prideful classes to extract an abstract interface, which can be mocked freely.

Telling Stories Around The Codebase

This is a story that has been about a year in the making. It has now reached a point where I think the story needs to be told to a wider audience.

At last year’s CITCON in Paris there was a presentation by Andy Palmer and Antony Marcano. They showed off a way of writing tests in Fitnesse that read much more nicely than those you normally encounter. The underlying system, which they called JNarrate, allowed writing tests in Java in a similarly readable fashion.

At youDevise we had been trying to figure out what framework we should be using to try to make things easier to understand and nicer to write. We had taken a look at Cucumber as well as a few other frameworks but none had really made us take the plunge yet. During the conference Steve Freeman mentioned that you don’t need to find a framework. You can instead evolve one by constantly refactoring, removing duplication, and making the code expressive.

Back at work I decided to try to give it a go. Guided by the small bit of JNarrate that I had seen at the conference I started trying to figure out what it would look like. The first passes at it (which got very quickly superseded) tried to do away with some of the things in JNarrate that didn’t seem very necessary. It turned out that many of them were. Not for the code to run, but for the code to be readable.

Eventually it turned into something that looked good.



    .should(be(money(300, usd)));

We worked with this for a while and had very few changes to the basic structure of:

Given.the( <actor> ).wasAbleTo( <action> );
When.the( <actor> ).attemptsTo( <action> );
Then.the( <actor> ).expectsThat( <thing> )
    .should( <hamcrest matcher> );

The framework itself is really just a few interfaces (Action, Actor, and Extractor) with the Given, When and Then classes being the only part of the framework that actually does anything. The rest of what is written is entirely part of the implementing application.

This style of writing non-unit tests has come to pervade large portions of our tests. It immediately helped immensely in communicating with business people and helping developers to understand the domain better. Since it was in Java we had the full support of the IDE for writing the tests and for refactoring them as our understanding of the scenarios improved. Once you get over the initial hurdle of defining the vocabulary, writing up new scenarios becomes so easy that we have started to sometimes go a little overboard with them 🙂

The only change that has occurred recently is that we dropped the standard Java camel-casing of identifiers and replaced them with underscores. We reached this decision after discovering that most of the pain of reading some of our more complex scenarios was in trying to parse the identifiers into sentences. SomeTimesItJustGetsALittleTooHardToFigureOutWhereTheIndividualWordsAre.

So a recent example is:

Given.the( company_admin)
.was_able_to( modify_the_portfolio( the_portfolio)
    .to_add( author_in_other_company_who_published));

Given.the( author_in_other_company_who_published)
.was_able_to( publish_an_idea()
    .with( a_long_recommendation_for( any_stock()))
    .to( a_portfolio_named( the_portfolio)));

Given.the( company_admin)
.was_able_to( modify_the_portfolio( the_portfolio)
    .to_remove( author_in_other_company_who_published));

When.the( company_admin).attempts_to(view_the_available_authors());

Then.the( company_admin)
.should( have( author_in_other_company_who_published));

Who will test the tests themselves?

A short while ago, a colleague and I were developing some JUnit tests using the jMock library, and came across some troubles while trying to start with a failing test. If you’re unfamiliar with jMock, the basic structure of a test looks something like this:

@Test
public void theCollaboratorIsToldToPerformATask() {
  // set up your mock object (context is the test's jMock Mockery)
  final Collaborator collaborator = context.mock(Collaborator.class);

  // define your expectations
  context.checking(new Expectations() {{
    oneOf(collaborator).performATask(); // the method 'performATask' should be invoked once
  }});

  // set up your object under test, injecting the mock collaborator
  MyObject underTest = new MyObject(collaborator);

  // execute your object under test, which should at some point invoke collaborator.performATask()
  underTest.doSomething(); // doSomething() is a placeholder name

  // check that the collaborator has been called as expected
  context.assertIsSatisfied();
}

(For an excellent background on developing software with this technique, I highly recommend reading Growing Object-Oriented Software.)

So back to our problem. We couldn’t work out why our unit test, despite the functionality not existing yet, was passing. It didn’t take long for someone with a bit more experience using jMock to point out our error: we were not verifying that the mock was called as expected. In the code above, this translates to: we were missing the call to “context.assertIsSatisfied()”. Since the mock object wasn’t asked if it had received the message, it didn’t have a chance to complain that, no, it hadn’t.
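
The failure mode is not specific to jMock. The sketch below reproduces it in plain Ruby with a hand-rolled mock; `MockCollaborator`, `assert_satisfied` and the two test methods are hypothetical names chosen for illustration, not part of any mocking library.

```ruby
# Hand-rolled stand-in for a mock object; this is not jMock.
class MockCollaborator
  def initialize
    @calls = 0
  end

  def perform_a_task
    @calls += 1
  end

  # The verification step: complain if the expected call never came.
  def assert_satisfied
    raise "expected perform_a_task once, got #{@calls} call(s)" unless @calls == 1
  end
end

# Object under test whose behaviour has not been written yet.
class MyObject
  def initialize(collaborator)
    @collaborator = collaborator
  end

  def run
    # Not implemented yet, so the collaborator is never called.
  end
end

# Passes even though nothing works: nobody asks the mock what it saw.
def test_without_verification
  collaborator = MockCollaborator.new
  MyObject.new(collaborator).run
  :passed
end

# Fails, as a test written before the code exists should.
def test_with_verification
  collaborator = MockCollaborator.new
  MyObject.new(collaborator).run
  collaborator.assert_satisfied
  :passed
end

puts test_without_verification
```

Calling `test_with_verification` raises, which is the failing-for-the-right-reason starting point that test-first development relies on.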

Granted, my pairing partner and I were not too familiar with jMock, but it seemed like an easy mistake to make, and it got me thinking.

  • How many other developers didn’t realise the necessity to verify the interaction?
  • How many tests had been written which did not start out failing for the right reason, and thus, were now passing under false pretences?
  • How could we check our existing tests for this bug, and ensure that new tests didn’t fall prey to the same lack of understanding?

In short, who will test the tests themselves?

A satisfactory answer for this case, I found, is FindBugs.

FindBugs is a static analysis tool for Java, which detects likely programming errors in your code. The standalone FindBugs distribution can detect around 300 different types of programming errors, from boolean “typos” (e.g. using & instead of &&) to misuse of common APIs (e.g. calling myString.substring() and ignoring the result). Obviously the FindBugs tool can’t anticipate everything, and the jMock error was obscure enough that I had no expectation of it being included. Fortunately, a handy feature of FindBugs is that, if you have a rough idea of a bug you’d like to discover, and a couple of hours to spare, you can write your own plugin to detect it.

With a bit of effort I had whipped together a simple detector which would find this type of problem across the host of tests we continuously run at youDevise. Out of approximately 4000 unit tests, this error appeared around 80 times. Not too many, but enough to be concerned about. Fortunately most of the time, when the call to context.assertIsSatisfied() was included (or the @RunWith(JMock.class) annotation added to the class), the tests still passed. That they “fortunately” still passed, was the problem, since that depended on luck. Occasionally the problem test cases didn’t pass after being fixed, and it either meant a test was outdated and the interaction deliberately didn’t happen anymore, or the interaction had never occurred in any version of the code. Fortunately (again, more by luck than judgment) the tests didn’t actually highlight faulty code. Granted, we also have suites of higher level tests, so the unit tests were not the last line of defense, but still, it is important that they do their job: providing fast (and accurate) feedback about changes, and communicating intent.

The FindBugs plugin, by testing the tests, helped to discover when they weren’t doing their job. Since we run FindBugs, now with said plugin, as part of the continuous build, it has prevented (and will continue to prevent) new test code from exhibiting the same fault. Although no bugs in production code were revealed as part of correcting the tests, knowing that we won’t need the same flavour of luck again increases confidence. This in turn leads to all manner of warm and fuzzy feelings.

Since the plugin has been open-sourced, if you or your team uses jMock, you can detect and prevent those errors too (though I don’t guarantee you’ll feel as warm and fuzzy as I did).

The FindBugs plugin is available for use from the youDevise github repository. Instructions and the JAR to download are included. FindBugs is also open-source, is free to use, mature, and widely used (by the likes of those reputable sorts at Google) and is available for download.

So if you ever find yourself asking, “Who will test the tests themselves?” maybe FindBugs, with a custom plugin, is your answer.

Adventures in Scala-based functional testing

We are in the process of creating a financial calculation library in Scala for one of our applications, and if there is one thing that is “really easy” for calculations, it is testing them… </sarcasm>

Everybody likes to demonstrate simple examples of testing mathematical formulas using tools like JCheck/ScalaCheck or other testing frameworks. The test code always looks pretty, but unfortunately, it is never so simple in practice. Our tests have dozens, if not hundreds of numbers going in and a few distilled numbers coming out.

As a “simple” example, if you want to calculate the Jensen’s Alpha of your USD position in British Petroleum, then we are going to need prices for BP, fx rates for GBp to USD, risk free rate, beta for the stock, index prices (for your market return)… and that is for only one position. Jensen’s Alpha is a much more useful metric over a whole portfolio…
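
For reference, the final step of that calculation is small; the effort goes into deriving its inputs from the prices, fx rates and index data listed above. The sketch below uses the standard definition of Jensen’s Alpha (the realised return minus the return CAPM predicts for the given beta). The function name and the figures are made up for illustration and are not taken from our library.

```ruby
# Jensen's alpha: the return earned above what CAPM predicts for the given beta.
# All arguments are returns over the same period, expressed as fractions.
def jensens_alpha(portfolio_return:, risk_free_rate:, beta:, market_return:)
  expected_return = risk_free_rate + beta * (market_return - risk_free_rate)
  portfolio_return - expected_return
end

# Illustrative numbers only.
alpha = jensens_alpha(portfolio_return: 0.12, risk_free_rate: 0.03,
                      beta: 0.85, market_return: 0.10)
puts format('%.4f', alpha)
```

Each of the four arguments hides a pile of input data (for example, `portfolio_return` in USD needs both the GBp prices and the fx rates), which is why the test scripts end up carrying so many numbers.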

We have yet to find a truly convenient way to model all of this information using any of the popular functional testing tools like Fitnesse or Cucumber. They just seem to end up with more work for us to do (in creating fixtures or step definitions), usually with less readable results in the end. Consequently, we decided to create a test script format that made our BAs, QA folk (and developers) happy, and to go from there.

As a group, we decided on a JSON format that presented the data and its hierarchies in a form that all could easily read and understand, something like:

{
  "positions": [ ...position objects here... ],
  "period": { "start": "2009-01-01", "end": "2009-03-31" },
  "prices": [ ...price objects... ],
  "fxRates": [ ...fx rate objects... ],
  "verification": {
    "averageDuration" : 42,
    "portfolioBeta" : 0.85
  }
}

For JSON in Scala, you don’t have to look any further than the scala.util.parsing.json package built into Scala itself. The simple parser gives back a List or Map (of List/Map/Double/String…) which mirrors the JSON itself. Unfortunately, that output forced us to write brittle (and casting-filled) transformation code from the Map[String, Any] and List[Any] into the traits needed for our calculations. Small changes to the format, like turning an object into an array, would cause painful run-time errors.

After some frantic googling, we stumbled across the lift-json project, a library extracted from the original Lift project. Lift-json has the very pleasant notion of parsing JSON into a series of case classes, allowing us to create case classes implementing the traits for our calculation library. Our implementation also leveraged the excellent ScalaTest library and its ShouldMatchers syntax to map our final verifications into a Map[String, Double] where the String was the field name on the calculation result and the Double was the result value.

Excellent! Boilerplate collection parsing and casting begone!

This structure worked well until we ran into a problem with our case-class-mapped JSON schema. In our verification section, we wanted to assert calculation results that were more complex, so a simple Map[String, Double] would not do. For example:

{ ...
  "verification": {
    "averageDuration" : 42,
    "portfolioBeta" : 0.85,
    "totalReturn" : [ { "amount": 1000, "currency": "USD" },
                      { "amount": 800, "currency": "EUR" } ]
  }
}

This meant that we either needed to complicate our JSON schema a lot (by adding new verification sections or much more complex verification types) or try again. We were very committed to making this test script concept work, but we didn’t want to be hindered again by another trip-up like this. So one of the guys suggested a Scala DSL instead, and he ran with creating a Fluent Interface for a test case.

In the end, he produced something like this:

class TestCase1 extends CalcTestCase {
  period from "2009-01-01 00:00:00" to "2009-03-31 23:59:59"
  position("openDate" -> "2009-01-15 10:45:00",
           /* ... more position details... */)
  // more positions, prices, fxRates, etc...

  verify("averageDuration" -> 42,
         "portfolioBeta"   -> 0.85,
         "totalReturn"     -> is(1000 in "USD", 800 in "EUR") )
}

Using Scala as our test script language gave us huge wins. We could choose to write with a more fluent style where it suited: period from "2009-01-01 00:00:00" to "2009-03-31 23:59:59" or 1000 in "USD". Also, we could still fall back to String value maps for our verifications, allowing us to easily make failing tests (but not tests that fail to compile…). Using Scala as the test language (with our DSL on top) seems like the choice that we should have made all along.

Testing tools (that we heard about at CITCON Europe 2008) you might find useful …

Squirrel, Paulo and I recently attended CITCON Europe 2008. Apart from meeting, discussing and solving problems with many experts in the field of continuous integration and testing, we also learnt about many testing tools. It was the first time I’d heard of some of these tools, so I thought I’d share them with others:

Web application testing tools:

  • Selenium (we use this at youDevise) – there are various Selenium projects that aid testing of web applications. Selenium provides an IDE which allows users to record and play back tests. Tests can be stored in HTML files or in a few programming languages (Java, Ruby, Python, Perl, PHP or .Net).
  • Watir is an open-source library for automating web browsers.
  • Allors. Does something called Immersive Testing.
  • WinRunner is functional testing software for enterprise IT applications.
  • WebDriver is an open-source tool for automating web browsers.
  • Rational Robot (from IBM) is a test automation tool for functional testing of client/server applications.

Performance testing tools:

  • JMeter from Apache is a 100% pure Java desktop application designed to load-test functional behaviour and measure performance. It was originally designed for testing Web Applications but has since expanded to other test functions.
  • Grinder is a Java load testing framework that makes it easy to run a distributed test using many load-injector machines. It is freely available under a BSD-style open-source license.
  • LoadRunner is a performance and load testing product by Hewlett-Packard for examining system behaviour and performance, while generating actual load.

Web Service Testing:

  • soapUI is a tool for Web Service Testing.


  • Fit is a framework that allows customers to provide examples of how a program should behave. Developers then build the program and Fit verifies the program works by comparing the examples against the program.
  • FitNesse is a tool that wraps Fit, making it easier to define tests.

Extreme Ironing in Visual Basic

Extreme ironing looks like fun: take an ordinary activity and do it in strange places or under strange conditions or both.

I felt a bit like an extreme ironist this week as I wrote a plugin for Excel in Visual Basic, surely one of the more unit-test-unfriendly languages out there. I’ve sworn never to write code without a unit test again, but how the heck am I supposed to write test-first when the language is designed for accidental programmers who don’t know what an object is, much less a unit test?

Yes, there is VBAUnit but it looks abandoned. You can forget about mocking or refactoring, of course.

I did find it was possible to write working unit tests for each of my non-interactive procedures once I embraced the notion of using Excel itself as the interface. The outputs go into cells and Excel formulas check whether they match the expected values. The plugin generates random strings so there’s a simple randomness check for a roughly even distribution – hardly the Diehard tests but it works.
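
The evenness check itself is simple enough to sketch. The version below is in Ruby rather than VBA or Excel formulas, and the function name, the 20% tolerance and the lowercase alphabet are assumptions made for the example, not details of the actual plugin.

```ruby
# Returns true when every symbol in the alphabet appears within `tolerance`
# (a fraction, e.g. 0.2 for 20%) of the count expected from an even spread.
def roughly_even?(sample, alphabet, tolerance: 0.2)
  return false if sample.empty?

  expected = sample.length.to_f / alphabet.length
  counts = Hash.new(0)
  sample.each_char { |char| counts[char] += 1 }
  alphabet.all? { |char| (counts[char] - expected).abs <= tolerance * expected }
end

alphabet = ('a'..'z').to_a
rng = Random.new(1) # fixed seed so the example is repeatable
sample = Array.new(26_000) { alphabet[rng.rand(alphabet.length)] }.join
puts roughly_even?(sample, alphabet)
```

A check like this only catches gross bias, such as a character that never appears. It says nothing about ordering or correlation, which is where suites like Diehard come in.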

There’s one procedure (I guess it’s called a SUB procedure in the VBA world) that I can’t test from within Excel, because it’s the one that actually opens forms and interacts with the user. If this were a commercial product, I’d use WinRunner to simulate a user clicking and typing data, or try to find an open-source alternative, but since we’re just using the plugin internally I’m happy with a manual test carefully documented in one of the Excel sheets.

I don’t think I’m going to be adopting VBA programming (or extreme ironing!) as a hobby anytime soon, but it was a fun experiment – a little like our last code dojo, where we imposed some severe restrictions to see how far we could push ourselves.

Happy ironing!

Crazy guy dangles from rope over canyon while happily ironing
