Human Error and Just Culture

Sidney Dekker’s Just Culture made me thankful I don’t work in an occupation with a high risk of impacting public safety (those described in the book include aviation, health-care, and policing). In our society we believe that practitioners should be accountable for their actions, that without legal consequences after a tragedy there would be no justice. The dilemma is that tragic outcomes are more likely to be the result of systemic issues than of bad actors, and the legal system is fundamentally unsuitable for dealing with issues of systemic safety. Worse, the risk of legal consequences stifles learning, and so our search for justice makes tragic outcomes more likely, rather than less.

Reading Just Culture after Charles Perrow’s Normal Accidents was a serendipitous pairing. Normal Accidents illustrates very convincingly that safety is an issue that largely transcends our traditional idea of human error. It makes the case that some accidents are normal and expected because of the properties of the system, and that the easy finger pointing at the practitioners misses the real story. As we should already know from Deming and manufacturing, quality is a property of the system, not the people in the system.

Picking up from there, Just Culture shows how the concept of accident doesn’t exist in law. There is always someone who was negligent, either willfully or not, and that someone shall be held responsible. The law isn’t interested in the learning of the system. It isn’t really interested in the truth as most of us would understand it. It is really about blame and about punishment.

How does your organization respond to a system outage? Are blame and finger-pointing the order of the day? We may not be subject to the criminalization of error described in Just Culture, but the organizational reflex can all too easily be to blame the developers, the testers, the system administrators, or others, when the focus should be on organizational learning, on fixing the system.

The idea of Blameless PostMortems is not new to TIM Group. We’ve done our best to use our RCAs as a tool for improving the system for several years now. Just Culture served as a reminder that we are fighting a cultural bias, and we need vigilance to avoid outdated ideas of human error creeping back into our organization. The pressure to do so is both pervasive and subtle. It would be easy to detect and fight if it were a case of managers asking “who screwed up?” It is harder when it seems like a virtue, when it is an engineer who is quick to assume responsibility for a mistake. It is a valuable trait when each individual is willing to be self-critical. The challenge is being able to look beyond the individual to the contribution of the larger system.

This is the balance we are trying to strike, between individuals who feel enough safety that they are willing to acknowledge their own contribution to the problem, and a system that doesn’t accept “human error” as a reason to avoid learning. We believe this is the path to a high-performing, and just, culture.

Facilitating Agility with Transparency

Part of the agile coaching work I do at TIM Group involves running a large number of Retrospectives and the (hopefully only) occasional Root Cause Analysis. Both of these generate actions designed to improve (and/or shore up) our processes so that we are constantly improving. These actions are supposed to be discrete, and done within a week of their assignment.

Over the last year or so, TIM Group has been moving to a more ‘start-up’ style organizational model. Previously, we had a stabilized two-week release cycle, and our development teams were quite static. This has changed now, and while some of the teams here still run retros on a two-week cycle, others are on a one-week cycle, and still others on a more ad-hoc basis. More importantly, the teams are a lot more fluid, with developers not just moving from one development team to another but also to our infrastructure team and back.

In a perfect world, this would not be a problem because actions are all done within a week.

Well, despite the ‘within a week’ expectation, actions had been piling up. Retro after retro would pass, and the ‘actions’ column would get clogged with outstanding items. In addition to this, the RCA actions were also not getting done. While it wouldn’t be fair to say this was a new problem, the new organization was aggravating the existing problem.

I was in the habit of reminding each team about their retro first thing in the morning of their meeting. This got more discussion topics raised before the meeting started, which made the meeting go faster and more smoothly, but it wasn’t proving to be enough to actually get the actions done.

So I started sending out more and more specific reminders, looking at each board and naming the individuals who had outstanding actions.

As this activity took more and more of my time each retro day, I decided to build myself some help. Luckily, I had previously done some work with the APIs of our on-line Kanban tool. It was fairly simple to make a new version of the code that, instead of working with our taskboards, worked with the boards we used for our RCAs and retrospectives.

My initial idea was to simply find a way to generate (or at least partially generate) some of those reminders I was sending out to the teams. But once I had gotten the team notifications done, a pattern emerged — many people had actions across multiple teams. This was when it struck me. I was facilitating the teams, but the *individuals* were the ones who needed to get their actions done. I needed to make their lives easier if I wanted them to get the actions done.

The clear next step was to give each person their own ‘actions report’. Now, at the start of each week, instead of having to look in a bunch of different locations and trying to check if there were things they needed to do, each person who has any outstanding actions gets an e-mail. It clearly states which actions need to be done, including the action title and description, with a URL linking back to the exact card in question on the taskboard. *This* was getting somewhere. I got a lot of positive feedback from people. In fact, I got a number of people asking to put their own smaller-task or special project taskboards on the system so that they could get even more of their actions in one place.

People asking for more was a big indicator that I’d done something right!
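The per-person report described above boils down to grouping outstanding cards by assignee rather than by board. Here is a minimal sketch of that idea, not the code actually used at TIM Group: the `Action` fields, the function names and the example URLs are all assumptions made for illustration, and fetching cards from the Kanban tool's API and sending the e-mail are left out entirely.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One outstanding card. Every field name here is hypothetical."""
    assignee: str
    board: str
    title: str
    description: str
    url: str


def group_by_assignee(actions):
    """Collect outstanding actions per person, across all boards."""
    grouped = defaultdict(list)
    for action in actions:
        grouped[action.assignee].append(action)
    return dict(grouped)


def render_report(assignee, actions):
    """Build the plain-text body of one person's weekly e-mail."""
    lines = [f"Outstanding actions for {assignee}:", ""]
    # Sort by board, then title, so cards from the same board sit together.
    for action in sorted(actions, key=lambda a: (a.board, a.title)):
        lines.append(f"- [{action.board}] {action.title}")
        lines.append(f"  {action.description}")
        lines.append(f"  {action.url}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data; real cards would come from the Kanban tool.
    sample = [
        Action("alice", "Team A retro", "Fix flaky build",
               "Investigate the intermittent CI failure",
               "https://kanban.example/cards/1"),
        Action("alice", "RCA board", "Add disk alert",
               "Alert before the log volume fills up",
               "https://kanban.example/cards/2"),
        Action("bob", "Team B retro", "Book a room",
               "Find a bigger room for the next retro",
               "https://kanban.example/cards/3"),
    ]
    for person, owned in group_by_assignee(sample).items():
        print(render_report(person, owned))
        print()
```

Keying on the person rather than the board is the whole point: someone with actions on three boards still gets a single message.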

Of course, once I had a person-by-person action tally, it was a doddle to implement a simple gamification, a leader-board, posted weekly, listing everyone who has yet to complete their actions with the ‘top’ person having the most outstanding actions. A top position which has been, incidentally, occupied since inception by our very own CTO.
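With a per-person tally, the leader-board is little more than a count and a sort. The sketch below is an illustration under assumptions, not the original implementation: the function names are hypothetical, the input is simply one name per outstanding action, and breaking ties alphabetically is my own choice.

```python
from collections import Counter


def leaderboard(assignees):
    """Rank people by number of outstanding actions, most first.

    `assignees` holds one entry per outstanding action (the owner's name).
    Ties are broken alphabetically so the weekly ordering is stable.
    """
    counts = Counter(assignees)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def render_leaderboard(ranking):
    """Format the ranking as numbered lines for the weekly post."""
    return "\n".join(
        f"{position}. {name}: {count}"
        for position, (name, count) in enumerate(ranking, start=1)
    )


if __name__ == "__main__":
    # Made-up names: one entry per outstanding action.
    owners = ["cto", "dev", "cto", "ops", "dev", "cto"]
    print(render_leaderboard(leaderboard(owners)))
```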

Next up? Implementing markdown in the action reports, to increase readability. Team status pages, to show what cards we have as ‘monitor’ cards, so we know what current issues we are monitoring.

Devopsdays London

This is a blog post that was written in 2013, but somehow was forgotten about. So here is a bit of history!

— Andrew Parker


Most of our Infrastructure team and a couple of developers we had seconded to the team all attended the Devopsdays London conference a couple of weeks ago.

There are loads of reviews and notes about the conference online already; however, we also made a set.

I think everyone attending found the conference valuable, although for varying reasons (depending upon which sessions they had attended). Personally I found that the 2nd day was more valuable, with better talks and more interesting openspace sessions (that I attended). As I had expected (from my previous attendance at Devopsdays New York), I found the most value in networking and comparing the state of the art with what others are doing in automation / monitoring / etc.

I was very pleased that TIM Group is actually among the leading companies to have implemented devops practices. I’m well aware that what we’re doing is a long way away from perfect (as I deal with it 5 days a week), however it’s refreshing to find out that our practices are among the leaders, and that the issues we’re currently struggling with are relevant to many other people and teams.

I particularly enjoyed the discussion in the openspaces part of the conference about estimating and planning Infrastructure and Operations projects – at the time we were at the end of a large project, in which we’d tried a new planning process for the first time (and we had a number of reservations). The thoughts and ideas from the group helped us to shape our thinking about the problems we were trying to solve (both within the team, and by broadcasting progress information to the wider company).

Afterwards (in the last week) we have taken the time to step back and re-engineer our planning and estimation process. We’ve subsequently set off work on another couple of projects, with the modified planning and estimation process, and the initial feeling from the team is much more positive. Once we’ve completed the current projects and we’ve had a retrospective (and made more changes) I’ll be writing up the challenges that we’ve faced in estimating and how we’ve overcome them, as being able to deliver accurate and consistent estimates in the face of unplanned work (e.g. outages, hardware failures, etc.) is even more challenging for operations projects than in an agile development organisation.

The Seven Pillars of Agile – Self-Improvement

One in a series on the Seven Pillars of Agile retrospective.

Self-Improvement

To what extent do you agree that each statement below is true of your team?

  • Continuous Improvement

    • We improve ourselves as a team over time

    • Our team supports individual self-improvement

    • We are always open to any new technology or idea that may help us achieve our goals

  • Intentional Practice

    • We read books that inform our work

    • We discuss how we work and consider opportunities for improvement

    • We participate in code katas / code dojos to improve our skills

    • We attend conferences and events where we can learn from experts in our field

    • We contribute to Open Source projects

  • Introspection

    • After any problem, we consider how we could do better next time

    • We hold regular Team Retrospectives, and use them to improve how the team works

  • Balance

    • I maintain motivation in my work

    • I have a sustainable Work/Life balance

    • We have regular slack time, rather than sustaining 100% effort continuously

 

The Seven Pillars of Agile – Technical Excellence

One in a series on the Seven Pillars of Agile retrospective.

Technical Excellence

To what extent do you agree that each statement below is true of your team?

  • Considered Design

    • We consider multiple possible ways to satisfy each business need, and make informed choices to achieve the best functionality, usability, and long-term sustainability

    • Our designs allow us to move forwards at a steady pace, without unpleasant surprises or panics

    • We keep technical debt manageable, and keep our design under control without requiring excessive quantities of rework

    • We consider our users’ performance requirements for each feature, and strive to meet or exceed them

  • High Quality Implementation

    • Our code is clear and simple – “Perfection is achieved not when there is nothing left to add, but when there is nothing left to take away”

    • Don’t Repeat Yourself – we refactor when we spot repetition. Are your ‘C’ and ‘V’ keys showing more wear than the rest?

    • Cohesive, Single Responsibility classes – we don’t have big, bad classes; and if we encounter such beasts, we trim them rather than allowing ourselves to add to them.

    • We don’t abuse inheritance – our subclasses are genuinely substitutable

    • Our collaborators Tell each other what to do, rather than Asking for information

    • Our tests are fit for purpose – expressive to read, flexible to change, and reliable to run

     

 

The Seven Pillars of Agile – Confidence

One in a series on the Seven Pillars of Agile retrospective.

Confidence

To what extent do you agree that each statement below is true of your team?

  • Code Confidence

    • Confident that the code actually meets the user requirement

    • Before passing a card on for testing, we are confident that the feature works properly under all likely use cases – not just “done”, but “Done Done”

    • There is no ‘actually, we also need to…’ or ‘other 80% of the work’

    • Zero Bugs process – don’t just fix the bug, fix the process to eliminate the source of the bug

  • Process Confidence

    • We use version control effectively to ensure that we know what we deliver

    • Our Continuous Integration system gives us clear, timely feedback on our code

    • Our progress is transparent to all members of the team and to outside stakeholders

    • We maintain a shared team rhythm, e.g. all being present for the morning standup

 

The Seven Pillars of Agile – Supportive Culture

One in a series on the Seven Pillars of Agile retrospective.

Supportive Culture

To what extent do you agree that each statement below is true of your team?

  • Learning

    • Everything we do, we treat as an opportunity to learn

    • We embrace the fact that when a task is completed, we’ll see more clearly what we should have done – and use this to help us see what to do next

  • Space to Learn

    • We have a ‘No Blame’ culture

    • An appropriate proportion of slack time – to stop, think and experiment

    • We celebrate failure as an opportunity to learn

    • We take appropriate risks

  • Managing Conflicts

    • We avoid allowing conflict to damage the team

      • wasted energy

      • disappointment

      • individuals feeling disengaged from the team effort

    • We extract value from a diversity of opinion

  • Respect

    • I trust my colleagues

    • I feel trusted by my colleagues

    • Our team is trusted by the rest of the business

    • Individuals and the team feel empowered to tackle challenges themselves, without waiting for senior assistance or permission

  • Commitment

    • Each team member is always nudging for improvement

    • We are willing to ask for help, and to give help when asked

    • Whole Team Attitude (we all own the whole product)

    • Permanent Team Attitude (stability, shared history, commitment to a shared future)

 

 

The Seven Pillars of Agile and the Spiderweb Retrospective

As part of our efforts to continuously improve our team’s working process, we hold Agile Retrospectives every couple of weeks. A feeling arose in the team that our existing retros were getting a bit stale, so as the facilitator, I was tasked with running the next one ‘completely differently’.

I discovered Brian Marick’s Pillar Spiderweb Retrospectives; the spiderweb is nicely visual, and by asking participants to focus on specific areas, should help bring up possible areas for improvement that might otherwise escape consideration.

Probably the most crucial part of making these discussions fruitful is the initial description to the group of what each specific ‘pillar’ means. In order to elicit a comparable set of ratings and a productive discussion, it is critical to establish a shared understanding first.

Helpfully, Brian also wrote up sets of notes for three of the seven pillars on his blog, describing and giving examples of what they mean in practice:

To support discussion of the other four pillars, I’ve assembled some notes, based closely upon the descriptions in the Agile Skills Project Wiki:

Since I only sat down to write these notes after exhaustive Googling appeared to show that no-one else had done it for me, I thought they’d be worth sharing here, in the hope that they help others try out the Agile Pillars Retrospective.

Our Pillar Spiderweb retro seemed to succeed in enabling discussion of points that hadn’t otherwise been considered, and at the end there was a strong team consensus that it had been worthwhile. The team agreed that they want to repeat this format of Retro at three-monthly intervals.

See also:

Brian Marick’s original blog post: http://www.exampler.com/blog/2009/06/10/the-seven-pillars-of-an-agile-team-introduction/

The Agile Skills Project Wiki: http://sites.google.com/site/agileskillsprojectwiki/

The Mind Map of Agile Skills: http://www.mindmeister.com/35781546/seven-pillars