Test Data Buildering

This blog post accompanies one of our weekly lightning talks, embedded below. Read the text, watch the video, or heck, do both!

[video:youtube:hUgp4svllcA]

Over the course of learning how to specify and write tests for our code on the HIP, we have gone through many different styles of dealing with setup. There was a stage when no setup was done and therefore very little real testing of behaviour. Then there was the time of mocking absolutely everything (including numbers). That time still haunts us, but we are slowly putting it behind us. After that we got mock-shy and just started using the real objects whenever possible.

This stage in our evolution caused tests to be written. The tests checked meaningful behaviour, were fast, and were somewhat maintainable. The problem was that they were not always the easiest to read, and that came down to how we built the objects that we wanted to use.

    FundOfFund fohf = new FundOfFund( [8 parameters, only two of which we care about] );

We tried to solve this problem by having a whole load of pre-built test objects stored as statics.

    public class TestFundOfFund {
        public static final FundOfFund USD_FOHF = new FundOfFund(...);
    }

This unfortunately led to us having a lot of hidden knowledge in our tests. “Oh, that USD_FOHF? It also happens to have an initial price of $5.” This caused us to start backing away from that approach pretty quickly.

Our next step was to try out mocking again. If we needed a FundOfFund, then we would mock out the parts of a FundOfFund that should be used in the test.

    final FundOfFund fohf = context.mock(FundOfFund.class);
    context.checking(new Expectations() {{
        allowing(fohf).getName(); will(returnValue("blah"));
        allowing(fohf).getCurrency(); will(returnValue(usd));
    }});

This worked. It expressed what was needed for the test. But it is noisy and annoying to type, and if a large amount of data is needed (which we sometimes need), then it gets hard to see what has been set up. It also caused noise in tests when some data needed to be available for the code under test to work but its value had no bearing on what we were trying to specify (name in all of my code snippets here is a prime example). There was also no way to make sure that the object had sensible defaults.

The next thing we are trying out is Test Data Builders. The standard way of doing it is fine and gets it done. What I don’t like about it is the mass of boilerplate code that is needed: all of those with*() methods written out individually? Ick!
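
To give a feel for the boilerplate, a hand-rolled builder tends to look something like this (a minimal sketch; the FundOfFundBuilder class, its defaults, and java.util.Currency standing in for our own currency type are all illustrative, not our real code):

    import java.util.Currency;

    public class FundOfFundBuilder {
        // Every constructor parameter needs a field with a default and its own with*() method.
        private String name = "a default name";
        private Currency currency = Currency.getInstance("USD");
        // ...a field per constructor parameter...

        public FundOfFundBuilder withName(String name) {
            this.name = name;
            return this;
        }

        public FundOfFundBuilder withCurrency(Currency currency) {
            this.currency = currency;
            return this;
        }

        // ...and so on for every other parameter...

        public FundOfFund build() {
            return new FundOfFund(name, currency /* , ...defaults for the rest... */);
        }
    }

Usage is pleasant enough (new FundOfFundBuilder().withName("bar").withCurrency(usd).build()), but every new field means yet another with*() method. The make-it-easy framework is a bit better, but has problems as well.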

    FundOfFund fohf = make(a(FundOfFund, with(name, "bar"), with(currency, usd)));

To write it this way you need to import “make”, “a”, “name”, and “currency” into the namespace. Pollution! Collision! Confusion ensues. But the basic idea is good. So what I am trying out now is using the framework for its basic elements and wrapping it up a little.

    FundOfFund fohf = new FundOfFundMaker() {{
        with(name, "bar");
        with(currency, usd);
    }}.build();

So far this has worked out pretty well (if you can get over the instance initializer syntax). It doesn’t pollute the namespace, and it seems to provide a nice middle ground between the boilerplate of the fully custom builders and the more composable nature of make-it-easy. For instance:

    FundOfFund fohf = new FundOfFundMaker() {{
        withYearEndPrice(4, december(30, 2010));
    }}.build();
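
Under the hood, the maker can be a thin wrapper around make-it-easy’s Property, Instantiator, and Maker types. Something like the following sketch (the property names, the defaults, the constructor arguments, and java.util.Currency standing in for our own currency type are illustrative, not our production code):

    import java.util.Currency;

    import com.natpryce.makeiteasy.Instantiator;
    import com.natpryce.makeiteasy.MakeItEasy;
    import com.natpryce.makeiteasy.Maker;
    import com.natpryce.makeiteasy.Property;
    import com.natpryce.makeiteasy.PropertyLookup;

    import static com.natpryce.makeiteasy.MakeItEasy.a;
    import static com.natpryce.makeiteasy.MakeItEasy.make;

    public class FundOfFundMaker {
        // Properties are protected fields so subclass initializers can say with(name, ...) unqualified.
        protected static final Property<FundOfFund, String> name = Property.newProperty();
        protected static final Property<FundOfFund, Currency> currency = Property.newProperty();

        // Sensible defaults live in one place: the instantiator.
        private static final Instantiator<FundOfFund> fundOfFund = new Instantiator<FundOfFund>() {
            public FundOfFund instantiate(PropertyLookup<FundOfFund> lookup) {
                return new FundOfFund(
                        lookup.valueOf(name, "a default name"),
                        lookup.valueOf(currency, Currency.getInstance("USD"))
                        /* , ...defaults for the remaining parameters... */);
            }
        };

        private Maker<FundOfFund> maker = a(fundOfFund);

        // Each with() call layers an override on top of the defaults.
        protected <V> void with(Property<FundOfFund, V> property, V value) {
            maker = maker.but(MakeItEasy.with(property, value));
        }

        public FundOfFund build() {
            return make(maker);
        }
    }

Convenience methods like withYearEndPrice() are then just small helpers on the maker that call with() for the relevant properties.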

Adventures in Scala-based functional testing

We are in the process of creating a financial calculation library in Scala for one of our applications, and if there is one thing that is “really easy” for calculations, it is testing them… </sarcasm>

Everybody likes to demonstrate simple examples of testing mathematical formulas using tools like JCheck/ScalaCheck or other testing frameworks. The test code always looks pretty, but unfortunately, it is never so simple in practice. Our tests have dozens, if not hundreds, of numbers going in and a few distilled numbers coming out.

As a “simple” example, if you want to calculate the Jensen’s Alpha of your USD position in British Petroleum, then you are going to need prices for BP, FX rates for GBp to USD, the risk-free rate, the beta of the stock, and index prices (for your market return)… and that is for only one position. Jensen’s Alpha is a much more useful metric over a whole portfolio…
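
For reference, Jensen’s Alpha is the standard CAPM excess-return measure; the textbook formula (nothing specific to our library) is:

    \alpha_p = R_p - [ R_f + \beta_p (R_m - R_f) ]

where R_p is the realised return of the position or portfolio, R_f the risk-free rate, \beta_p its beta, and R_m the market return over the same period. Every term on the right comes from one of the inputs listed above, which is why even the “simple” single-position case needs so much data.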

We have yet to find a truly convenient way to model all of this information using any of the popular functional testing tools like FitNesse or Cucumber. They just seem to end up as more work for us to do (in creating fixtures or step definitions), usually with less readable results in the end. Consequently, we decided to create a test script format that made our BAs, QA folk (and developers) happy, and to go from there.

As a group, we decided on a JSON format that presented the data and its hierarchies in a form that all could easily read and understand, something like:

{ 
  "positions": [ ...position objects here... ],
  "period": { "start": "2009-01-01", "end": "2009-03-31" }
  "prices": [ ...price objects... ],
  "fxRates": [ ...fx rate objects... ],
  "verification": {
    "averageDuration" : 42,
    "portfolioBeta" : 0.85
  }
}

For JSON in Scala, you don’t have to look any further than the scala.util.parsing.json package built into Scala itself. The simple parser gives back a List or Map (of List/Map/Double/String…) which mirrors the JSON itself. Unfortunately, that output forced us to write brittle (and casting-filled) transformation code from the Map[String, Any] and List[Any] into the traits needed for our calculations. Small changes to the format, like turning an object into an array, would cause painful run-time errors.

After some frantic googling, we stumbled across the lift-json project, an extracted library from the original Lift project. Lift-json has the very pleasant notion of parsing JSON into a series of case classes, allowing us to create case classes implementing the traits for our calculation library. Our implementation also leveraged the excellent ScalaTest library and its ShouldMatchers syntax to map our final verifications into a Map[String, Double], where the String was the field name on the calculation result and the Double was the result value.

Excellent! Boilerplate collection parsing and casting begone!

This structure worked well until we ran into a problem with our case-class-mapped JSON schema. In our verification section, we wanted to assert calculation results that were more complex, so a simple Map[String, Double] would not do. For example:

{ ...
  "verification": {
    "averageDuration" : 42,
    "portfolioBeta" : 0.85,
    "totalReturn" : [ { amount: 1000, currency: "USD"}, 
                      { amount: 800, currency: "EUR" } ]
  }
}

This meant that we either needed to complicate our JSON schema a lot (by adding new verification sections or much more complex verification types) or try again. We were very committed to making this test script concept work, but we didn’t want to be hindered again by another trip-up like this. So, one of the guys suggested a Scala DSL instead, and he ran with creating a Fluent Interface for a test case.

In the end, he produced something like this:

class TestCase1 extends CalcTestCase {
  period from "2009-01-01 00:00:00" to "2009-03-31 23:59:59"
  
  position("openDate" -> "2009-01-15 10:45:00",
           /* ... more position details... */)
  position(...)
  // more positions, prices, fxRates, etc...

  verify("averageDuration" -> 42,
         "portfolioBeta"   -> 0.85,
         "totalReturn"     -> is(1000 in "USD", 800 in "EUR") )
}

Using Scala as our test script language gave us huge wins. We could choose to write in a more fluent style where it suited: period from "2009-01-01 00:00:00" to "2009-03-31 23:59:59" or 1000 in "USD". Also, we could still fall back to String value maps for our verifications, allowing us to easily write failing tests (but not tests that fail to compile…). Using Scala as the test language (with our DSL on top) seems like the choice that we should have made all along.

Pattern Language Problems

I’ve been talking to various people about pattern languages lately and trying to get developers here at youDevise to give lightning talks about patterns, to help spread knowledge about them a little. During all of this I’ve noticed a recurring theme: people usually think that a pattern is a description of an implementation. This means that people start using pattern language to talk about specific implementations that solve problems. I think this is wrong and leads to misunderstanding, with no greater insight into the problem that is being solved.

Various communities have railed against patterns because of this focus on implementation, since many of the implementations are nonsensical in their environment (see Design Patterns in Dynamic Programming for one example). This is a perfectly reasonable reaction to a language that is being used solely to express a particular kind of solution to a general problem.

Instead of patterns being about solutions or implementations, I think that they should instead be seen as identification of problems. When you say you plan to use the Factory pattern, what you are really saying is that you have encountered a particular problem: you need to create instances of an object, but the concrete class of the object being created may change.
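
In code, the problem (rather than any prescribed implementation) looks something like this sketch, where all of the types are hypothetical:

    // ReportingJob needs fresh Report instances, but the concrete class it gets may change.
    public interface Report { void render(); }

    public interface ReportFactory { Report createReport(); }

    public class ReportingJob {
        private final ReportFactory factory;

        public ReportingJob(ReportFactory factory) {
            this.factory = factory;
        }

        public void run() {
            Report report = factory.createReport(); // a PdfReport today, an HtmlReport tomorrow
            report.render();
        }
    }

Seen this way, the pattern name is shorthand for the problem ReportingJob has, not for any particular factory implementation.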

Once the pattern language has been turned around like this, then suddenly many more things start to fall into place. Patterns beg other patterns (the identification of one problem brings to light other problems that will need to be solved). Patterns have synonyms (Template Method and Strategy). The list goes on.

So try to change your thinking about pattern languages: instead of using them to express a concrete solution, make them a way of identifying the problem you are trying to solve. From there you can start choosing the best implementation based on your context.

Acceptance Tests at GOOSGaggle

Steve Freeman has been advising youDevise from our early beginnings. He and Nat Pryce have downloaded their ideas about test-driven development, mocking, and software design into a super book called Growing Object-Oriented Software, Guided by Tests, or GOOS for short.

Recently Steve, Nat, and Brian Marick gathered some very smart London folks to discuss the ideas in GOOS (and they let me in too). I joined the acceptance-test OpenSpaces group, which discussed these questions:

  1. Should acceptance tests use the user interface, or should they drive the domain objects directly?
  2. What practices do others use when writing and running acceptance tests?
  3. What problems do we encounter when using acceptance tests?

Here’s what we found in each area.

User Interface or Not?

The problem with UI-based testing, as we know at youDevise, is that UI tests can be slow and unstable, and asynchronous browser activity like AJAX behaviour can be annoying to capture and test. These problems can be overcome but the investment can be high. On the other hand, UI tests are often most understandable to customers, and can be really engaging – you can get viewers immediately oohing and aahing by showing a pointer moving around the screen to perform actions, as one UI test tool does during test playback. If you can then make a live change and rerun the test with new behaviour you’re likely to get lots of immediate useful feedback. See Resource-oriented testing below for a suggestion that can help provide both benefits.

Acceptance-Test Practices

Goal orientation

BDD done right gives you a way to describe the user’s goals, not how the user achieves those goals – you can hide the how in the implementation of your test. Traditional tests (and acceptance tests done badly) are very specific about the actions (“click here”, “enter that”) rather than the goals (“choose a product”, “supply credit card details”).
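
For instance, a goal-oriented test might read like this sketch (JUnit-style; the customer helper and its methods are hypothetical, and all of the clicking and typing is hidden inside them):

    import org.junit.Test;

    public class BuyingAProductTest {
        private final CustomerDriver customer = new CustomerDriver(); // hypothetical test helper

        @Test
        public void customerCanBuyAProduct() {
            customer.choosesProduct("Growing Object-Oriented Software");
            customer.suppliesCreditCardDetails(CustomerDriver.VALID_TEST_CARD);
            customer.shouldSeeAnOrderConfirmation();
        }
    }

Whether those helper methods drive a browser or call domain objects directly becomes an implementation detail the test no longer cares about.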

Scientific method

After discussion of specific versus generic data (see below) Antony Marcano crystallised our thinking by describing the process of building a software product as an application of the scientific method. At first, you examine lots of specific examples of the behaviour you are interested in. Next you construct a theory (i.e. initial acceptance tests) that captures and describes some of this behaviour in a more generic way. Then you check this theory with reality through experiments – constructing more specific examples that you can validate with your product owner in working code. This will lead to changes to your theory, i.e. new acceptance tests, and you iterate until your understanding of the domain has developed into a mature and useful theory (i.e. a sufficiently complete set of acceptance tests).

Resource-oriented testing

Matt Savage described a method he’s used with success in his current team at Sky, which we decided to call resource-oriented acceptance testing. The first step is to make sure your application can be used in a RESTful style (Matt’s isn’t built to be used this way, so they use a clever shim tool called a “restifier” that converts REST-style requests to expected application actions). Next, provide two implementations of each RESTful action: one that uses direct HTTP (say, adding a new user by sending PUT to http://rest.example.com/user/new) and another that uses a tool like Selenium to perform the same action visibly (navigate to http://www.example.com/user, enter field values, and click [Save]). Now (assuming you are using the standard Given/When/Then style), implement the Givens and Whens of your acceptance tests in terms of the resource-altering verbs PUT, POST, and DELETE, and implement the Thens using GETs. Finally, plug in the direct HTTP version of the actions when you want speedy, non-visual tests (which should be most of the time), and use the visible client-based tests when you want to see the results or test in a browser (e.g. for demonstrations or browser compatibility tests).
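
As a rough sketch of the two-implementations idea (the CreateUser interface, the field names, and the page details are illustrative; the URLs are the ones from the example above):

    import java.net.HttpURLConnection;
    import java.net.URL;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;

    // One RESTful action, two interchangeable implementations.
    public interface CreateUser {
        void createUser(String username) throws Exception;
    }

    // Fast path: hit the application (or its "restifier") directly over HTTP.
    public class HttpCreateUser implements CreateUser {
        public void createUser(String username) throws Exception {
            HttpURLConnection connection = (HttpURLConnection)
                    new URL("http://rest.example.com/user/new").openConnection();
            connection.setRequestMethod("PUT");
            connection.setDoOutput(true);
            connection.getOutputStream().write(("username=" + username).getBytes("UTF-8"));
            if (connection.getResponseCode() >= 400) {
                throw new AssertionError("creating " + username + " failed: " + connection.getResponseCode());
            }
        }
    }

    // Visible path: the same action through the browser, for demos and compatibility tests.
    public class SeleniumCreateUser implements CreateUser {
        private final WebDriver driver;

        public SeleniumCreateUser(WebDriver driver) {
            this.driver = driver;
        }

        public void createUser(String username) {
            driver.get("http://www.example.com/user");
            driver.findElement(By.name("username")).sendKeys(username);
            driver.findElement(By.name("save")).click();
        }
    }

The Givens and Whens of an acceptance test are written against CreateUser and its sibling actions; which implementation gets plugged in decides whether a run is fast and headless or slow and visible.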

Matt finds this style provides the best of both worlds: speedy developer tests and comprehensible customer demonstrations. Further, it can force positive changes in the product – for example, when testing one feature, Matt found there was no way to get certain information about search results without actually loading the page in a browser. Making the missing information available as a resource allowed the feature to be tested and provided a better user experience to real browser users as well when it was incorporated into the page.

Problems in Using Acceptance Tests

Antony points out something that can get lost in BDD tests – particularly the resource-based ones Matt described: the role the user has. You can focus on your resources and actions and lose sight of how the activities should be organised into groups and assigned to types of users.

We had some debate about the use of data in tests (see also the Scientific method section above). I find it very useful to have realistic examples (“Given a customer phone number of 01555 555 2372”) because they are most meaningful to users and so help us converse about what they need, as well as helping future maintainers learn about the domain quickly. However, Matt finds that once developers understand the domain better, they can ditch the examples and describe the rules for generating them (“Given a fixed-line customer phone number on the Manchester exchange”, which translates in the implementation into “a number beginning with 01555 that is not in the list of mobiles”). This allows them to stress their system with random valid data and find more edge cases quickly (though you have to have really good diagnostics and the ability to replay a test with specific input to make debugging the failures possible).
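
A sketch of what that rule-based generation can look like (the prefix, the helper, and the list of known mobiles are all illustrative):

    import java.util.Random;
    import java.util.Set;

    public class TestPhoneNumbers {
        // "A fixed-line customer phone number on the Manchester exchange":
        // a number beginning with 01555 that is not in the list of known mobiles.
        public static String aFixedLineNumber(Set<String> knownMobiles, Random random) {
            while (true) {
                String candidate = String.format("01555 555 %04d", random.nextInt(10000));
                if (!knownMobiles.contains(candidate)) {
                    return candidate;
                }
            }
        }
    }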

Thanks to all the participants in the group, notably Priya Viseskul, Matt, and Antony.