What's wrong with how we use metrics?
Organizations looking at metrics from a management-by-numbers perspective follow a process that looks
like this:
- Management come up with a goal and work out a measure
- Management establish a target over a long period (3-6 months up to a year) for the people doing the work
- Management communicate only the target (in terms of the agreed metric)
- People doing the work do everything in their power to meet the target number
This process encourages overloading a metric with the following purposes:
- Metrics as a target – Numerical metrics make it particularly easy for people to use them as the only means of communicating a goal. It’s often much easier to tell people a scale and a number than to explain a much more complicated goal. The target is frequently an arbitrary number, and some organizations even spend excessive amounts of time determining what that number should be.
- Metrics as a measure of performance – With an established number in place instead of a well-articulated goal, it is now easy for managers to use that same measure to track how quickly the people doing the work move towards the goal. Many organizations link these numbers to individual performance targets.
- Metrics as best practice – Using metrics both as a target and as a measure of performance has an unintended side effect: it implies that this metric is the best method of working towards the goal. When an independent party measures someone else against a numerical target, it puts more pressure on the person doing the work to simply meet an established number. Since they are measured only on their performance against this metric, they do all they can to achieve that particular metric, as if no other method could better achieve the end goal.
Overloading a single metric with multiple purposes causes many problems, particularly when dealing with knowledge work such as software development. Metrics are simplifications of much more complex attributes. Simplifying that complexity comes at the cost of losing sight of the real end goal, and ends in a suboptimal result.
Let’s look at an example:
A test manager, let’s call her Mary, holds a weekly meeting with the development lead, Dan. “Where are
we at with our bug counts?” she asked at their most recent one. Dan answered, “We cleared our
three priority one bugs, fixed four priority two bugs and cleared out a record twelve priority three
bugs. A pretty good week, right?”
Looking at the development lead, slightly shaking her head, Mary responded, “Unfortunately our
customer reported five priority one bugs, six priority two bugs and fifteen priority three bugs. You’ll
need to work harder next week.” Exasperated and feeling overwhelmed at missing his target, Dan left
the meeting thinking about asking his team to work yet another weekend.
In this very simple story, the chosen metric has one benefit: it makes the meeting move very quickly. Both people quickly understand progress once Dan reports his results and Mary responds. Unfortunately, the implied goal of delivering useful software is missed, and Dan leaves the meeting with a solution more likely to cause further software issues and drag down software quality.
The way Mary states her objective puts pressure on Dan to reduce the number of bugs. While reducing the number of bugs seems an admirable goal, it also leads to a very reactive solution: Dan leaves the meeting thinking about how much harder his team must work. Mary's question neglects the broader goal, and she fails to ask the crucial question that guides Dan and his team towards fixing the underlying reason the bugs exist. Without resolving this root cause, Dan and his team are destined to fix bugs for life.
Dan is experiencing single-loop learning.
Single-loop learning is the repeated attempt at the same problem, with no variation of method and without ever questioning the goal. If Dan ever hopes to break out of this vicious bug cycle, he needs to do something differently. The inappropriate use of metrics leads Dan away from the end goal of delivering useful software and improving overall software quality. The definition of insanity often attributed to Einstein seems to fit well here: “doing the same thing over and over again and expecting different results.”
Be careful what you measure
Organizations love metrics because they make setting targets easier and discourage people from questioning the goal behind the target. This lulls managers into a false sense of organizational efficiency. Strong incentives tied to strong metrics force people to concentrate on just one part of the work, neglecting other contributing factors that might make a goal more successful. Organizations must be wary of this actively destructive focus that leads people to neglect other important factors.
Even agile techniques do not protect teams from the undesirable behaviors driven by measuring and tracking the wrong number. For example, agile teams often use story cards for development work. Teams often visualize these small increments of work on a board as they move through their organization’s software lifecycle, with stories ideally flowing from left to right through each stage of the process.
Management and product management often ask the question, “How soon will that feature be complete?”
Teams often choose to interpret this as when coding finishes, succumbing to the idea that testing and
the path to production are trivial and inconsequential parts of the software process. Project management
reinforces this perception by asking the question, “How many stories did we finish coding this
week?” instead of the better question, “How many stories are we happy to release to end users?” or
better yet, “How many stories did we release to end users?” An even better question is, “How much value
have our users found from our recent releases?”
Teams want to do the right thing, and these questions and metrics consequently drive developers to focus on getting stories to Development Complete. Let’s look at the consequences of focusing overly on this suboptimal goal alone:
Malcolm, a marketing representative, took such a keen interest in what the developers built for him that he dropped by the team as often as he could. He often talked to Dan, the developer, asking when his features would be complete. Dan, not wanting to disappoint Malcolm, worked hard to finish whatever Malcolm asked for, knowing Malcolm would soon be back to ask about progress. He'd often think to himself, “This feature must be really important.” Tim, the team's newest tester, often needed to approach a developer, like Dan, to understand how to trigger the newly developed features.
Tim approaches Dan one day, “Hi Dan! I really need your help to understand how to test this feature
you completed last week.” Dan, under pressure to deliver, snaps, “Can't you do anything by
yourself? I need to get this feature complete so Malcolm gets off my back.” Shocked at Dan's
response, Tim returns to his desk, and waits. He thinks to himself, “I can't get anything done
until Dan helps me out.”
Each week this happens, and over time, the stack of stories waiting to be tested grows and grows.
Eventually Malcolm, concerned that he has yet to see in production the feature he asked for two months ago, calls a meeting with the team. Surprised, Dan says he completed it over a month ago. Tim bashfully responds,
“I couldn't test that story because I needed some help from Dan and he's been so busy with other
work. I didn't want to interrupt him.”
What can we learn from this story? Firstly, what matters to Malcolm is that the flow of work gets done. Even though Malcolm asks when something will be completed, what he really wants is to be able to use it in production. Tim didn’t have the knowledge necessary to complete his work, and the pressure on Dan to complete more work prevented Tim from acquiring that knowledge. The end result was a vicious cycle: work built up in testing, never got released, and Malcolm was left puzzled about why he hadn’t received the feature he’d asked for. This is why methods like Kanban Software Development encourage explicit Work in Progress (WIP) limits. These limits force people to help others when bottlenecks appear. WIP limits work to overcome the undesirable behaviors that emerge when people are measured by the wrong metric, their individual productivity, instead of by the overall value delivered. The book Lean Software Development, by Mary and Tom Poppendieck, stresses the importance of measuring the end-to-end result instead of simply a small part of the process, calling out a principle they call ‘Optimize the Whole’. Optimizing the whole means ensuring the metrics in use do not drive behavior that works against the real goal of delivering useful software.
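To make the WIP-limit mechanism concrete, here is a minimal Python sketch of board columns with explicit limits. The `Column` class, the `pull` function and the limits in the usage note are hypothetical illustrations, not the API of any particular Kanban tool: a story moves downstream only when the next column has room, so a full Testing column stops Development handing over more work and signals that someone should help clear it.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """One stage on a hypothetical Kanban board with an explicit WIP limit."""
    name: str
    wip_limit: int
    stories: list[str] = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.stories) < self.wip_limit


def pull(story: str, source: Column, target: Column) -> bool:
    """Move a story downstream only if the target column is under its limit.

    Returning False signals a bottleneck: rather than starting new work,
    people upstream should help finish the work already in the target column.
    """
    if story not in source.stories:
        raise ValueError(f"{story!r} is not in {source.name}")
    if not target.has_capacity():
        return False
    source.stories.remove(story)
    target.stories.append(story)
    return True
```

For example, with `Column("Testing", 1, ["S0"])` already full, `pull("S1", development, testing)` returns `False`. In the story above, that refusal is the point at which Dan would help Tim test S0 instead of starting another feature for Malcolm.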
Guidelines for a more appropriate use of metrics
Given the undesirable behaviors that emerge due to the inappropriate use of metrics, does this mean there
is no place for them? Of course there is a place for metrics. What is needed is a different method. Use
the following guidelines to lead you to a more appropriate use of metrics:
- Explicitly link metrics to goals
- Favor tracking trends over absolute numbers
- Use shorter tracking periods
- Change metrics when they stop driving change
We’ll use the following sections to explore what these mean.
Explicitly link metrics to goals
In the traditional style, management decides what the best measure for a particular goal is. Management then sets a target in terms of that measure and communicates only this target, often as a bare number, to the people doing the work. The lines between the measure chosen to monitor progress towards the goal and the actual goal itself blur. Over time, the reason behind the measure is lost and people focus on meeting the target even if that metric is no longer relevant. A more appropriate use of metrics keeps the chosen measure of progress, the metric, distinct from, yet explicitly related to, its purpose: the goal.
For example, in a software development context, you might see metrics defined like this:
Methods must be less than 15 lines. You must not have more than 4 parameters to a method. Method
cyclomatic complexity must not exceed 20.
With an appropriate use of metrics, every single measure is clearly linked to its original purpose. The current mechanism for tracking and monitoring must be decoupled from its goal, and that goal made explicit, to help people better understand the metric’s intent. A metric presented with the reason for its existence guides people towards more appropriate, pragmatic and ultimately useful decisions. Without a stated purpose, people tend to find creative ways to game the system, ultimately detracting from the real goal. Here's what that looks like:
We would like our code to be less complex and easier to change. Therefore we should aim to write
short methods (less than 15 lines) with a low cyclomatic complexity (less than 20 is good). We
should also aim to have a small handful of parameters (up to four) so that methods remain as focused
as possible.
Explicitly linking the metrics to the goal allows people to challenge their relevance, to find other ways of satisfying the need, and to understand the intent behind the numbers. Without this articulated purpose, people may unintentionally find ways of working against the implicit goal. For example, some techniques reduce method length but, if applied without the right intent, make the code harder to read and increase its overall complexity.
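One way to keep a threshold and its purpose together is to store them side by side, so that every report repeats why the number exists. The sketch below assumes Python source code and uses the standard `ast` module; the `Rule` type, the `check` function and the complexity approximation (one plus the number of branching constructs, not a full cyclomatic complexity calculation) are illustrative assumptions. The limits mirror the example above, read as at most 14 lines, at most four parameters and complexity below 20.

```python
import ast
from dataclasses import dataclass
from typing import Callable

GOAL = "code that is less complex and easier to change"


def length(fn: ast.AST) -> int:
    """Lines spanned by the function definition (Python 3.8+ sets end_lineno)."""
    return fn.end_lineno - fn.lineno + 1


def parameters(fn: ast.AST) -> int:
    """Named parameters; *args and **kwargs are not counted in this sketch."""
    a = fn.args
    return len(a.posonlyargs) + len(a.args) + len(a.kwonlyargs)


def complexity(fn: ast.AST) -> int:
    """Rough approximation: 1 + number of branching constructs."""
    score = 1
    for node in ast.walk(fn):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each extra `and`/`or` adds a path
    return score


@dataclass(frozen=True)
class Rule:
    name: str
    maximum: int   # largest acceptable value
    purpose: str   # why the number exists, kept next to the number
    measure: Callable[[ast.AST], int]


RULES = [
    Rule("method length (lines)", 14, "short methods are easier to read and change", length),
    Rule("parameters", 4, "few parameters keep a method focused", parameters),
    Rule("cyclomatic complexity", 19, "fewer paths through a method make it less complex", complexity),
]


def check(source: str) -> list[str]:
    """Report every breach together with the purpose and goal it serves."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for rule in RULES:
                value = rule.measure(node)
                if value > rule.maximum:
                    findings.append(
                        f"{node.name}: {rule.name} is {value} (max {rule.maximum}); "
                        f"purpose: {rule.purpose}; goal: {GOAL}"
                    )
    return findings
```

Because each finding carries its purpose and the overall goal, a developer who sees a breach can ask whether splitting the method actually serves that goal, instead of treating the number as the objective.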
The nature of software development means most work is knowledge work, and is therefore hard to observe. It is easy to monitor activity (how much time people spend at their computers), yet it is hard to observe the value they produce (useful software that meets a real need). The further people are from the code, the harder it is for them to appreciate the complexities involved. This implies that it is very difficult, if not impossible, for the people furthest from the work to know the best measure to monitor for progress towards the goal.
A shift towards a more appropriate use of metrics means management cannot come up with measures in isolation. Managers must stop deluding themselves that they know the best method for monitoring progress, and stop enforcing a measure that may or may not be relevant to the goal. Instead, management is responsible for ensuring the end goal is always kept in sight, working with the people who have the most knowledge of the system to come up with the measures that make the most sense for monitoring progress.
Favor tracking trends over absolute numbers
Management find metrics too hard to resist because they distil organizational complexity down into something everyone can understand: a number. It’s easy to see whether one number is bigger or smaller than another, or how far apart two numbers are. It’s much harder to see whether that number is still relevant. The traditional approach to management favors these metrics because they make it easy to communicate when a target is met: “Just reach this number and we’ll be fine.”
When you turn a qualitative and highly interpretive issue (think of productivity, quality and usability) into a number, any figure is relative and arbitrary. There may be a significant difference between code coverage of 5% and 95%, but is there really a significant difference between 94% and 95%? Choosing 95% as a target helps people understand when to stop, but if getting that last 1% requires an order of magnitude more effort, is it really worth it? That is something people can only work out subjectively in their own organizational context.
Looking at trends provides more interesting information than whether or not a target is met. Working out whether a target is met is easy. The difficult work, which management must do together with the people who have the necessary skills, is looking at trends to see whether they are moving in the desired direction at a fast enough rate. Trends provide leading indicators of the performance that emerges from organizational complexity. It is pointless to focus on the gap in a number when the trend moves further and further away from the desired state.
Focusing on trends is important because it provides feedback, based on real data, on any change implemented, and creates more options for organizations to react. For instance, if the team is trending away from a desired state, they can ask themselves what is causing them to move away from their goal and what they can do about it. This prompts action much earlier than simply doing as much as they can before checking a number. If a team finds itself trending towards a desired state, the team can ask what is helping it move towards its goal and what else can be done to accelerate that rate. Measuring trends encourages people to experiment much more: tweak one thing and observe its effect on the trend, monitoring where you are relative to the desired state and knowing when to stop.
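Fitting a straight line through each period's value is one simple way to read a trend. The sketch below is a plain least-squares slope in Python; the function names, the `tolerance` parameter and the example numbers are assumptions for illustration, not a prescribed method. A team running an experiment can compare the slope before and after the change.

```python
def slope(values: list[float]) -> float:
    """Least-squares slope of observations taken at equally spaced periods."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to see a trend")
    mean_x = (n - 1) / 2              # periods are numbered 0, 1, ..., n - 1
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def describe_trend(values: list[float], lower_is_better: bool = True,
                   tolerance: float = 0.0) -> str:
    """Say which way the metric is heading instead of how far it is from a target."""
    s = slope(values)
    if abs(s) <= tolerance:
        return f"flat ({s:+.2f} per period)"
    improving = (s < 0) == lower_is_better
    direction = "towards" if improving else "away from"
    return f"moving {direction} the desired state ({s:+.2f} per period)"
```

With made-up weekly counts of reported bugs such as `[26, 24, 21, 19]`, the slope is -2.4 bugs per week: the team is moving in the right direction even if an arbitrary target has not yet been met, and that is the conversation worth having.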
Arbitrary absolute numbers also create helplessness, especially when progress towards a goal is slow and dependencies on other departments or corporate policies outside a group’s control prevent further progress. Trends help focus people’s efforts on moving in the right direction rather than being paralyzed by a gap that looks impossible to close.
A more appropriate use of metrics requires more management involvement in reporting and recording
movements in trends because the ecosystem that surrounds a team is management’s responsibility. This
ecosystem includes the organization’s policies, the way work is scheduled or planned and the way
that teams and people are organized. This ecosystem often has much more influence on the trend than
the efforts expended by individuals. Management should be interested in trends to observe the
effects of changes to this ecosystem.
An appropriate use of metrics finds trends much more useful than absolute numbers. Arbitrary targets don’t have much meaning without the right trend, and better questions emerge from thinking about what affects a trend and what else can be done to influence it, rather than from pointing at the gap between an arbitrary number and reality.
Use shorter tracking periods
Many organizations use metrics to set targets for very long periods, typically 3-6 months, and even as long as a year or beyond. Managers establish this target, and the responsibility lies with the people doing the work to do whatever they can to meet it. Management revisits the target at the end of the period to evaluate the people doing the work. In this system, the relationship between management and the workforce can at best be described as confrontational. The workforce does whatever it can to meet the target, with an implicit assumption that management bears no responsibility for the outcome.
A consequence of revisiting metrics after long periods is that the failure to meet management’s
arbitrary target becomes more and more unacceptable. I’ve heard managers say things like, “You
had a whole year to meet your target and you missed it.” The risk and cost of failure increases
the longer the tracking period is.
Agile methods prefer shorter periods for review because any performance gap is less costly. Failing
to make enough progress in a week is much less significant than failing to make enough progress over
a whole year. Reviewing progress after each week generates many more options than reviewing progress
after a year, simply because there are more opportunities to react and change. After a short period
such as a week, you also have much more data about what actually happened instead of what was
planned, and this should be used to influence the outcome by using it to drive change.
Organizations benefit from shorter tracking periods because they create more opportunities for re-planning to maximize value.
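Shorter periods are mostly a matter of how observations are grouped. Here is a minimal Python sketch, using only the standard library, that turns dated observations (for example, the dates bugs were reported) into weekly data points. The function names and the choice of Monday-starting weeks are assumptions for illustration; empty weeks are kept as zeros so that a quiet week still counts as a data point.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def weekly_counts(events: list[date]) -> list[tuple[date, int]]:
    """Count events per week, including empty weeks, oldest first.

    A yearly total becomes roughly 52 data points, so a trend (and the
    effect of any change made along the way) shows up within weeks
    instead of at an annual review.
    """
    if not events:
        return []
    counts = Counter(week_start(d) for d in events)
    week, last = min(counts), max(counts)
    series = []
    while week <= last:
        series.append((week, counts.get(week, 0)))
        week += timedelta(weeks=1)
    return series
```

Each weekly point is an opportunity to ask what changed and why, which a single end-of-year figure cannot offer.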
I worked with a team that released software into production every two weeks. The business liked
regular releases because they could use the software almost immediately. After using the software from the latest release, the business discovered it already had enough features to do almost everything it needed for a new marketing initiative. It was only a fraction of what they had originally asked for.
Instead of the development team writing features that would probably never be used, the business
picked a small subset of the leftover stories and started work on the next initiative.
An appropriate use of metrics tracks progress in smaller cycles because it gives much more
information about where a project may end up further in the future. Tracking smaller periods helps
identify trends and the pause gives organizations a more informed position to influence the
environment and the rate/direction of a trend.
Tracking smaller periods also enables more collaboration because it provides more opportunity for
management to be involved. Rather than simply evaluating people at the end of a larger period,
tracking smaller periods provides more data about what is
actually happening that influences the trends.
Change metrics when they stop driving change
If organizations reached goals easily, they would never need metrics. The organization could shift
direction and they would reach their goal immediately. Unfortunately, this doesn’t happen in reality,
which is why measures exist. Achieving a goal often takes much longer. The first guideline to an
appropriate use of metrics separates the real goal from the measure selected to monitor progress
towards that goal. The real goal must always be made explicit.
Guidelines #2 and #3, monitoring trends and doing so over shorter periods, are about helping organizations realize their goals faster. That isn’t achieved through the single-loop learning described earlier in the chapter. What organizations require is the double-loop learning that Chris Argyris writes about. An appropriate use of metrics drives people to question the goal and to implement change, based on real data, to get there.
Here’s what double-loop learning looks like:
Frustrated by fixing bugs every week, Dan the developer considers why he is constantly fixing bugs. Over the last three weeks, Malcolm has reported many issues about things not working as he expected. Dan steps back to think about what is really going on, less concerned about the bug count he is always asked about and more about why the bugs exist to begin with.
When Dan picks up a story, he often has lots of questions for Malcolm about how it should work. Dan
knows Malcolm has his other marketing activities keeping him busy and understands Malcolm cannot sit
with him to answer his questions. Dan is under enormous pressure to deliver something, so he makes
several assumptions to ensure he can deliver something instead of nothing.
Looking at the bugs, Dan realizes that many of the bugs reported are based on those small
assumptions he keeps making. The pressures to deliver something mean that Dan never builds the right
thing the first time around.
When Dan explains this to Malcolm, they agree to sit down at the start of each new story to make
sure all of Dan’s questions are answered before he starts coding. They try this the next week and
the overall number of bugs reported that week decreases.
Double loop learning requires more data about what is actually going on. Shorter periods create more
data points, making it easier to see any trends. The trends offer insight into the current
performance of the system and should be used to trigger thinking and problem solving about the
deeper underlying forces at play in the system, not simply for measuring performance.
Implementing real change helps accelerate organizations towards their current goal.
Changing the system that people work in often has a much greater impact than focusing on the
individual’s efforts to work harder or faster. In our story, Dan could have spent more time each
week trying to fix bugs, but by adjusting the flow of information and the working relationship
between Malcolm and Dan, they changed the system to be much more effective.
Project post-mortems look back after a project has finished, seeking lessons learned in the hope of applying them to future projects or spreading them across an organization. Conducting post-mortems at the end of the project offers no chance to apply these lessons to the project itself. Agile Retrospectives differ in intent by seeking change while a project is in flight, where actions have more impact than they would at the end. These meetings give teams a chance to look for opportunities for change, though they still rely on the people and the organization to commit to those changes.
When an organization reaches its goals, it’s time to retire the metrics used to achieve them. Remember that if organizations accomplished their goals instantaneously, there would never be a need for a metric. Defining, tracking, monitoring and interpreting metrics takes time and resources that could be better spent on new goals. Organizations need to drop metrics that are no longer relevant, instead of holding on to all the metrics they are used to collecting. With an appropriate use of metrics, deciding which metrics to retire is easy, because those metrics have been explicitly linked to the goal, and constant monitoring of trends over short periods encourages a continuous review of the state of the end goal.
You can look for symptoms of potentially outdated metrics by asking people, “Why do we need to collect this number?” A terrible response might include, “That’s the way we’ve always done it,” or, worse yet, “It’s our policy.” This question does not necessarily discriminate between poorly explained goals and outdated metrics, so it probably takes a little more digging. It is management’s responsibility to ensure that an organization’s time is not spent gathering and maintaining unnecessary metrics.
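Because the first guideline already links each metric to a goal, a simple metric registry can surface retirement candidates mechanically, leaving the final decision to people. A minimal Python sketch, assuming a hypothetical `Metric` record that stores its goal and whether that goal has been reached; a metric with no recorded goal is exactly the one that prompts the question “Why do we need to collect this number?”

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metric:
    name: str
    goal: Optional[str]     # None when nobody can say why the number is collected
    goal_met: bool = False


def retirement_candidates(metrics: list[Metric]) -> list[tuple[str, str]]:
    """Flag metrics that have lost their link to a goal or have already served it."""
    candidates = []
    for m in metrics:
        if not m.goal:
            candidates.append((m.name, "no stated goal: ask why we collect this number"))
        elif m.goal_met:
            candidates.append((m.name, f"goal reached ({m.goal}): consider retiring it"))
    return candidates
```

The output is a list of questions to discuss, not an automatic deletion: a missing goal may turn out to be a poorly explained one rather than an outdated metric.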