Using your values to choose

At the London Action Science Meetup in January we discussed the article Emotional Agility (HBR, Nov 2013), and in particular how to apply the four steps described in the article:

  1. recognize your patterns;
  2. label your thoughts and emotions;
  3. accept them; and
  4. act on your values

The importance of using your values is that they provide a stable point of reference, not subject to day-to-day fluctuations: “The mind’s thought stream flows endlessly, and emotions change like the weather, but values can be called on at any time, in any situation.”

For me personally, one consequence is choosing between alternatives for positive reasons, not negative ones. This was useful for a recent decision at TIM Group and I used the occasion of our weekly Lightning Talks to share the experience more widely.

Human Error and Just Culture

Sidney Dekker’s Just Culture made me thankful I don’t work in an occupation with a high risk of impacting public safety (those described in the book include aviation, health-care, and policing). In our society we believe that practitioners should be accountable for their actions, and that without legal consequences after a tragedy there would be no justice. The dilemma is that tragic outcomes are more likely to be the result of systemic issues than of bad actors, and the legal system is fundamentally unsuitable for dealing with issues of systemic safety. Worse, the risk of legal consequences stifles learning, and so our search for justice makes tragic outcomes more likely, rather than less.

Reading Just Culture after Charles Perrow’s Normal Accidents was a serendipitous pairing. Normal Accidents illustrates very convincingly that safety is an issue that largely transcends our traditional idea of human error. It makes the case that some accidents are normal and expected because of the properties of the system, and that the easy finger pointing at the practitioners misses the real story. As we should already know from Deming and manufacturing, quality is a property of the system, not the people in the system.

Picking up from there, Just Culture shows how the concept of accident doesn’t exist in law. There is always someone who was negligent, either willfully or not, and that someone shall be held responsible. The law isn’t interested in the learning of the system. It isn’t really interested in the truth as most of us would understand it. It is really about blame and about punishment.

How does your organization respond to a system outage? Are blame and finger-pointing the order of the day? We may not be subject to the criminalization of error described in Just Culture, but the organizational reflex can all too easily be to blame the developers, the testers, the system administrators, or others, when the focus should be on organizational learning, on fixing the system.

The idea of Blameless PostMortems is not new to TIM Group. We’ve done our best to use our RCAs as a tool for improving the system for several years now. Just Culture served as a reminder that we are fighting a cultural bias, and we need vigilance to avoid outdated ideas of human error creeping back into our organization. The pressure to do so is both pervasive and subtle. It would be easy to detect and fight if it were a case of managers asking “who screwed up?” It is harder when it seems like a virtue, when it is an engineer who is quick to assume responsibility for a mistake. It is a valuable trait when each individual is willing to be self-critical. The challenge is being able to look beyond the individual to the contribution of the larger system.

This is the balance we are trying to strike, between individuals who feel enough safety that they are willing to acknowledge their own contribution to the problem, and a system that doesn’t accept “human error” as a reason to avoid learning. We believe this is the path to a high-performing, and just, culture.

MDI: Monitoring Driven Infrastructure?

Adam and I attended the London Infracoders Meetup last night, which featured a demo of serverspec and Beaker. When I asked Adam what he thought, he wasn’t impressed*. “I don’t see the point of this if you’re using Puppet unless you’re worried about typos or you don’t control your production monitoring. It duplicates what you already have with Puppet’s DSL, which is why we stopped using rspec-puppet in the first place.”

I realized that Adam was correct: the sort of automated tests I’m used to as a developer are the functional equivalent of the monitoring checks system administrators are already using heavily. (An easy leap for me to make, since Michael Bolton already convinced me that automated tests are checks, not testing.) Over time I’ve seen testing migrate from unit tests on the developer desktop to more tests further and further down the deployment pipeline. This led me to wonder how far monitoring could make the same march but in reverse.

We already use our production monitoring in our test environments, but we don’t tend to use it in our local Puppet development environments. But why? Couldn’t we use our monitoring in TDD fashion? Write a monitoring check, see it fail, then write the Puppet code to make it pass? (This is the same motivation as using your tests as monitoring, such as with Cucumber-Nagios, but working in the other direction.)
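To make the idea concrete, here is a minimal sketch of the kind of check that could drive that red/green loop. It assumes a Nagios-style plugin written in Python; the function name check_tcp_port and the host and port it is pointed at are hypothetical illustrations, not one of our actual checks. The only convention borrowed from Nagios is the exit code: 0 for OK, 2 for CRITICAL.

```python
import socket

# Nagios plugin exit codes (standard convention): 0 = OK, 2 = CRITICAL.
OK, CRITICAL = 0, 2


def check_tcp_port(host, port, timeout=2.0):
    """Hypothetical check: is anything accepting TCP connections on host:port?

    Returns an (exit_code, message) pair in the Nagios plugin style.
    """
    try:
        # create_connection raises OSError if the connection is refused
        # or times out, which is the "red" state before the Puppet code exists.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} is accepting connections"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
```

In a local development environment the loop would be: run the check against the freshly provisioned machine and watch it return CRITICAL, write the Puppet code that installs and starts the service, apply it, and run the same check again until it returns OK. The same check could then ship unchanged to production monitoring.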

We haven’t tried this experiment yet but I’m curious if this is a pattern others have attempted, and if so, how did it work for you?

* Adam did think serverspec might be a plausible replacement for our existing NRPE checks, a way to uniformly structure our monitoring and to allow easier independent development of the checks.

Happy Holidays Jenkins!

At TIM Group we are proud to say that we support open source projects. And indeed we’ve spent time on many projects, submitting patches and posting on forums and mailing lists, and we have a number of projects of our own up on our GitHub page. Today we’ve taken the additional step of putting our money where our mouth is. Because we use Jenkins as a major part of our CI infrastructure, we were happy to respond to the Jenkins holiday appeal and donate $1000 US.

Happy Holidays Jenkins!