MDI: Monitoring Driven Infrastructure?

Adam and I attended the London Infracoders Meetup last night, which featured a demo of serverspec and Beaker. When I asked Adam what he thought, he wasn’t impressed*. “I don’t see the point of this if you’re using Puppet unless you’re worried about typos or you don’t control your production monitoring. It duplicates what you already have with Puppet’s DSL, which is why we stopped using rspec-puppet in the first place.”

I realized that Adam was correct: the sort of automated tests I’m used to as a developer are functionally equivalent to the monitoring checks system administrators already use heavily. (An easy leap for me to make since Michael Bolton already convinced me that automated tests are checks, not testing.) Over time I’ve seen testing migrate from unit tests on the developer desktop to more tests further and further down the deployment pipeline. This led me to wonder how far monitoring could make the same march but in reverse.

We already use our production monitoring in our test environments, but we don’t tend to use it in our local Puppet development environments. But why? Couldn’t we use our monitoring in TDD fashion: write a monitoring check, see it fail, then write the Puppet code to make it pass? (This is the same motivation as using your tests as monitoring, such as with Cucumber-Nagios, but working in the other direction.)
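
As a rough illustration of that red/green loop, here is a minimal sketch of a check written in Ruby. It assumes the Nagios plugin exit-code convention (0 for OK, 2 for CRITICAL); the `check_tcp` name and the idea of probing a TCP port are hypothetical stand-ins, not something we actually run.

```ruby
require 'socket'

# Nagios plugin exit-code convention: 0 = OK, 2 = CRITICAL.
OK = 0
CRITICAL = 2

# Hypothetical check: can we open a TCP connection to host:port?
# Returns [status_code, message] so a wrapper can print and exit.
def check_tcp(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { |_sock| }
  [OK, "OK - #{host}:#{port} is accepting connections"]
rescue SystemCallError, SocketError => e
  # Refused, timed out, unresolvable host, etc. all count as a failed check.
  [CRITICAL, "CRITICAL - #{host}:#{port} unreachable (#{e.class})"]
end
```

Pointed at a fresh development VM, a check like this would report CRITICAL until the Puppet code that installs and starts the service has been applied. A thin wrapper that prints the message and exits with the status code would let the same check run unchanged under Nagios or NRPE.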

We haven’t tried this experiment yet, but I’m curious whether this is a pattern others have attempted. If so, how did it work for you?

* Adam did think serverspec might be a plausible replacement for our existing NRPE checks: a way to structure our monitoring uniformly and to make independent development of the checks easier.

3 thoughts on “MDI: Monitoring Driven Infrastructure?”

  1. In my opinion, specs serve a different purpose than monitoring checks. Specs are about describing a system from a static viewpoint; monitoring is about keeping things up and running.

    To me, serverspec is the test-driven side of configuration management. Tools such as rspec-puppet work on the same abstraction level as the Puppet code itself, so they do help to test your Puppet code, but not that much.

    serverspec has the big advantage of being human-readable, whereas monitoring-checks may not be, especially when a check calls another script (which calls a script, …)

    Combining monitoring and spec? yes, for example: http://de.slideshare.net/m_richardson/serverspec-and-sensu-testing-and-monitoring-collide

  2. I feel like this is a fundamental misunderstanding of how you’d use Beaker to test modules and why you’d test modules. This kind of testing isn’t designed to catch typo problems or situations where you don’t monitor production. These tests are designed to act as “acceptance tests” when your infrastructure is changing.

    Let’s say Debian 8 comes out and you want to make sure your modules still work as they should, that MySQL is still set up and running and your databases are available.

    In the old world you now have to go set up a bunch of virtual machines as a testing environment, all running Debian 8, and run all your existing roles against them. If you’ve got code in any of your modules that isn’t exercised by those existing roles (for future work, or stuff you’re just not currently using) you then have to make sure you add those classes and parameters too.

    Or you run your acceptance tests against a Debian 8 nodeset within Beaker and have it actively test all the classes you’re likely to use with a variety of inputs to make sure you’re covering all of the code.

    Sure, you can do this with production monitoring, but that’s going to get you results AFTER you run it against a live machine. Beaker and serverspec are part of the development workflow, not the production piece of the equation. You run them before you release modules into production in the first place.

  3. I’ve been experimenting with a spec-like language for creating monitoring checks with Sensu. The idea is that something like

    describe 'nginx' do
      describe 'processes' do
        it 'must have 4 nginx worker processes'
      end
    end

    would turn into a check that looks for nginx worker processes. A command is used to run all the checks on the system in “testing” mode and those checks are then used directly by Sensu.

    Still a work in progress, but I love the idea of MDD
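
To make the idea in the third comment concrete, here is a minimal sketch of how nested `describe`/`it` blocks could be collected into named checks. This is not the commenter's implementation and has nothing to do with Sensu's actual API; the `CheckSuite` class, its methods, and the `worker_count` lambda are all hypothetical.

```ruby
# Hypothetical mini-DSL: nested describe blocks build up a check name,
# and each `it` registers one check under that name.
class CheckSuite
  Check = Struct.new(:name, :block)
  attr_reader :checks

  def initialize
    @context = []
    @checks = []
  end

  # Push a label for the duration of the block, then pop it again.
  def describe(label)
    @context.push(label)
    yield
  ensure
    @context.pop
  end

  # Register a check; one without a block is recorded but always fails.
  def it(description, &block)
    @checks << Check.new((@context + [description]).join(' '), block)
  end

  # Run every check; a check passes when its block returns a truthy value.
  def run
    @checks.map { |c| [c.name, c.block ? !!c.block.call : false] }
  end
end

# Stand-in for a real process count (assumption: a real check would
# inspect the process table instead).
worker_count = -> { 4 }

suite = CheckSuite.new
suite.instance_eval do
  describe 'nginx' do
    describe 'processes' do
      it('must have 4 nginx worker processes') { worker_count.call == 4 }
    end
  end
end

results = suite.run
```

Each entry in `results` pairs a check's full name with a pass/fail boolean, which a "testing" mode could print directly and a monitoring agent could map onto its own status codes.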
