A Metrics Framework for Continuous Delivery

Here’s a framework I like to start with when discussing what types of metrics can help or harm you on your journey to continuous delivery:

Lagging Indicators

At the organizational level, unit tests are usually part of a bigger set of goals, often related to continuous delivery. If that's the case for you, I highly recommend using the four common DevOps metrics:

• Deployment Frequency—How often an organization successfully releases to production

• Lead Time for Changes—The amount of time it takes a feature request to get into production

NOTE: Many places incorrectly publish this as: the amount of time it takes a commit to get into production, which is only part of the journey a feature goes through from an organizational standpoint. If you're measuring from commit time, you're closer to measuring the "cycle time" of a feature from commit up to a specific point. Lead time is made up of multiple cycle times.

• Change Failure Rate—The number of failures found in production per release/deployment/time period, or the percentage of deployments causing a failure in production

• Time to Restore Service—How long it takes an organization to recover from a failure in production

These four are what we'd call "Lagging Indicators", and they are very hard to fake (although pretty easy to measure in most places). They are great for making sure we do not lie to ourselves about the results of our experiments.
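As a concrete illustration, here is a minimal sketch of how these four metrics might be computed from deployment records. The `Deployment` record, its field names, and the `dora_metrics` function are assumptions made up for this example, not a standard schema or tool; in practice the raw data would come from your deployment and incident-tracking systems.

```python
# A minimal, hypothetical sketch of computing the four DevOps metrics.
# The Deployment record and its fields are illustrative assumptions only.
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Deployment:
    deployed_at: datetime
    feature_requested_at: datetime   # when the feature request was opened (lead time starts here)
    caused_failure: bool             # did this deployment cause a failure in production?
    restored_at: Optional[datetime]  # when service was restored, if it failed


def dora_metrics(deployments: List[Deployment], now: datetime, window_days: int = 30) -> dict:
    cutoff = now - timedelta(days=window_days)
    recent = [d for d in deployments if d.deployed_at >= cutoff]
    if not recent:
        return {}

    # Deployment Frequency: successful releases to production per day over the window
    frequency = len(recent) / window_days

    # Lead Time for Changes: feature request -> production (not just commit -> production)
    lead_times_days = [(d.deployed_at - d.feature_requested_at).days for d in recent]

    # Change Failure Rate: percentage of deployments causing a failure in production
    failures = [d for d in recent if d.caused_failure]
    failure_rate = len(failures) / len(recent)

    # Time to Restore Service: how long it took to recover from each production failure
    restore_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at
    ]

    return {
        "deployment_frequency_per_day": frequency,
        "avg_lead_time_days": mean(lead_times_days),
        "change_failure_rate_pct": failure_rate * 100,
        "avg_time_to_restore_hours": mean(restore_hours) if restore_hours else None,
    }
```

Note that lead time in this sketch starts at the feature request, matching the definition above, rather than at the first commit.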

Leading Indicators

However, many times we'd like a faster feedback loop telling us that we're going the right way; that's where "leading indicators" come in.

Leading indicators are things we can control on a day-to-day basis: code coverage, number of tests, build run time, and more. They are easier to "fake", but combined with lagging indicators, they can often provide early signs that we might be going the right way.

We can measure leading indicators at the team and middle-management levels, but we have to make sure we always have the lagging indicators as well, so we do not lie to ourselves that we're doing well when we're not.
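To make that pairing concrete, here is a small sketch of a weekly, team-level snapshot that keeps leading indicators next to a lagging one, so the two are always read together. The record and its field names are hypothetical, picked for illustration only.

```python
# A hypothetical weekly snapshot pairing a team's leading indicators
# with a lagging indicator, so we never read one without the other.
from dataclasses import dataclass
from datetime import date


@dataclass
class WeeklySnapshot:
    team: str
    week: date
    # Leading indicators: things the team controls day to day (easier to "fake")
    code_coverage_pct: float
    test_count: int
    build_time_minutes: float
    # Lagging indicator: hard to fake, keeps us honest
    escaped_bugs_in_production: int


# Illustrative data for one team over two weeks
history = [
    WeeklySnapshot("payments", date(2024, 5, 6), 61.0, 340, 12.5, 4),
    WeeklySnapshot("payments", date(2024, 5, 13), 63.5, 355, 11.8, 3),
]
```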

Metric groups and categories

I usually break up the leading indicators into two groups:

• Team Level (metrics an individual team can control)

• Engineering Management Level (metrics that require cross-team collaboration, or aggregate metrics across multiple teams)

I also like to categorize them based on what they are used to solve:

• Progress – used to provide visibility and support decision making on the plan

• Bottlenecks and feedback – as the name implies

• Quality – leading indicators connected indirectly to the quality lagging indicator (escaped bugs in production)

• Skills – track that we are slowly removing knowledge barriers within teams or across teams

• Learning – are we acting like a learning organization?

 

The metrics are mostly quantitative (i.e., they are numbers that can be measured), but a few are qualitative, in that you ask people how they feel or think about something. The ones I use are:

• From 1 to 5, how much confidence do you have in the tests (that they can and will find bugs in the code if they arise)? (Take the average across the team or across teams.)

• The same question, but for the code: that it does what it is supposed to do.

These are just surveys you can run at each retrospective meeting, and they take 5 minutes to answer.
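If it helps, here is a tiny sketch of aggregating those two answers, assuming each team member gives one 1-5 answer per question at the retrospective; the function name and the example data are made up for illustration.

```python
# Illustrative aggregation of the two retrospective confidence questions.
from statistics import mean


def confidence_scores(test_confidence: list[int], code_confidence: list[int]) -> dict:
    """Each list holds one 1-5 answer per team member."""
    return {
        "test_confidence_avg": mean(test_confidence),  # confidence the tests will catch bugs
        "code_confidence_avg": mean(code_confidence),  # confidence the code does what it should
    }


# Example: a five-person team answering at the end of a retrospective
print(confidence_scores([4, 3, 5, 4, 3], [3, 3, 4, 4, 2]))
# {'test_confidence_avg': 3.8, 'code_confidence_avg': 3.2}
```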

Trend lines are your friends

For all leading indicators and lagging indicators, you want to see trend lines, not just snapshots of numbers. Lines over time are how you see whether you're getting better or worse.

Don’t fall into the trap of a nice dashboard with large numbers on it if your goal is improvement over time. Trend lines tell you whether you're better this week than you were last week. Numbers alone do not.

Numbers without context are neither good nor bad; they only become meaningful in relation to last week/month/release, etc. Remember, we're about change here, so trend lines are your friends.
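As a tiny illustration of the difference, assume five weekly code-coverage snapshots (the numbers are made up):

```python
# The same metric read as a snapshot number vs. as a trend (made-up data).
coverage_by_week = [72.0, 70.5, 69.0, 68.2, 67.5]  # last five weekly values

print(f"Dashboard number: {coverage_by_week[-1]}%")  # 67.5% looks fine on its own

# Week-over-week change tells the real story: coverage dropped every single week.
deltas = [round(later - earlier, 1) for earlier, later in zip(coverage_by_week, coverage_by_week[1:])]
print("Week-over-week change:", deltas)  # [-1.5, -1.5, -0.8, -0.7] -> getting worse
```

The single dashboard number looks acceptable on its own; the trend shows the team getting worse week after week.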