Red Pipelines and Build Whisperers: Getting to a Trustworthy, Noise-Free Test Pipeline

People wonder why it takes such a long time to transform the way a large organization works. Here’s just a small nugget, a small tip of a small iceberg, in a small corner of a large building in a large state, filled with IT folks trying to get their jobs done.

And they have a nightly test pipeline.

And nobody cares about it.

I already explained why this might be an anti-pattern, but another thing that usually happens is that the nightly pipeline is not trustworthy:

  • There is a high noise ratio. In many places, the “nightly” is almost always red and it takes a “specialist” (usually from the QA department) to “decrypt” the status of the build and tell developers if everything is actually OK or not. I like to call these pipelines “Red unless Green”, to signify that they are red by default, and green is almost by accident if it exists at all.

  • Because there is a high noise ratio, almost everyone disregards the results of the build, so even when the build fails and finds real issues, they are ignored and dismissed.

  • Many of the tests in the pipeline are fragile. They fail sometimes, and sometimes they don’t - but almost every build has some failing tests. Which leads to all the points above.
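To put a number on that noise, here is a minimal sketch that classifies tests from a few builds' worth of results. The input shape (one dict of test name to pass/fail per build), the function names and the test names are all assumptions for illustration; a real pipeline would pull these from its CI server's reports.

```python
from collections import defaultdict


def classify_tests(runs):
    """Label each test by how it behaved across several builds.

    `runs` is a list of dicts mapping test name -> True (passed) or
    False (failed), one dict per build. This shape is hypothetical.
    """
    outcomes = defaultdict(list)
    for run in runs:
        for name, passed in run.items():
            outcomes[name].append(passed)

    report = {}
    for name, results in outcomes.items():
        if all(results):
            report[name] = "stable-green"
        elif not any(results):
            report[name] = "always-red"
        else:
            # Passed in some builds and failed in others: fragile.
            report[name] = "flaky"
    return report


def red_build_ratio(runs):
    """Fraction of builds that had at least one failing test."""
    if not runs:
        return 0.0
    red = sum(1 for run in runs if not all(run.values()))
    return red / len(runs)


# Made-up history of three nightly builds.
history = [
    {"test_login": True, "test_checkout": False, "test_search": True},
    {"test_login": True, "test_checkout": False, "test_search": False},
    {"test_login": True, "test_checkout": False, "test_search": True},
]
report = classify_tests(history)
ratio = red_build_ratio(history)
print(report)
print(f"red builds: {ratio:.0%}")
```

In this made-up history every build is red, even though only one test is consistently failing and one is flaky. That is the “Red unless Green” situation in miniature.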

When we have that kind of situation there are several things we might want to consider moving towards:

  1. Getting the pipeline to become trustworthy, a.k.a. “Green unless Red”. Right now it is at “Red unless Green”.

  2. Getting the pipeline to run fast enough so that we can execute it on each commit instead of at night.

  3. Merging the nightly test pipeline into the Delivery per commit pipeline.

First, I want to make sure we understand how many risks are involved in “Red Unless Green” (RUG) pipelines:

RUG (Red Unless Green) pipeline risks:

  1. Tests may indicate a real issue but everyone dismisses the results because the build is always red.

  2. Green tests are also ignored - so any confidence benefits we might get to help us run faster are dismissed. This often leads to manual testing since we cannot trust the nightly build. Wasting even more time.

  3. Test code goes unmaintained since nobody cares about the tests.

  4. New tests are not added, or added as “lip service” since nobody cares about the feedback the build provides: both wasting time and providing no value.

Let’s add Build Whisperers to those risks.

The Build Whisperer

If you do have a person who looks at the tests and can decipher their output to tell if they are actually in a good state, we call that person a “Build Whisperer”, for no one else can understand the build the way that person does.

That person is also a human bottleneck, acting as a manual dashboard for the team. This presents several risks as well:

  • This person can take a while to decipher the results of the build before communicating status.

  • The person can also make mistakes trying to understand the results.

  • If the person is not available or working on other tasks - status is delayed, sometimes by days or even weeks.

  • If it is one person’s job to understand the build, then it’s nobody else’s job (“not my job” syndrome). This allows everyone else to dismiss responsibility for the build status.

  • If only one person watches the build, only one person cares about it. This might seem too much like the last point but let me put it this way: If many people watch the build, many people start slowly caring about it more and more.

To get over these risks, we decide, as a team, that the build will provide the final say on whether the product is deliverable or not.

It then becomes our job to make the build robust enough and trustworthy enough that we want to listen to it and deliver based on its results, and not based on a human making a decision about manual test cases.

What can we do?

  1. Configure the nightly build to trigger as many times a day as possible (if it takes 5 hours to run, execute it 4 times a day). Don’t put it on the developer’s wall yet. This is just to increase the feedback loop speed as we fix the build.

  2. DELETE or IGNORE all red tests.

  3. Run the build if it is not running yet.

  4. If there are red tests, repeat steps 1-3.

  5. If the build has been green for three days we have achieved a BASELINE GREEN BUILD.

  6. At this point we can show the build on the Team’s Wall.

  7. At this point we can treat red tests as actual work: fixing the tests or the bugs they discover. We can finally feel worried about red tests, and gain more confidence if the tests are green.

  8. We can slowly add back ignored tests that we deem necessary, adding them one by one, and fixing them in the process.

  9. Continue adding new and old tests to the build incrementally to gain more confidence (not just E2E tests. Use a test recipe.)

  10. Make the build run faster (parallelize items, break up test suites, increase the number of agents and environments) so you can ultimately merge the nightly build into the per-commit delivery build.
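As one possible take on step 10, here is a rough sketch of breaking a suite up across agents. It uses a simple greedy "longest test first" split, which is my choice for illustration and not the only way to do it. The test names and durations are invented; in practice the timings would come from previous builds.

```python
import heapq


def shard_tests(durations, agents):
    """Split tests across `agents` shards, longest tests first.

    `durations` maps test name -> seconds from an earlier run
    (hypothetical data). Each test goes to the shard that currently
    has the least total work.
    """
    if agents < 1:
        raise ValueError("agents must be at least 1")

    # Min-heap of (current load in seconds, shard index).
    heap = [(0.0, i) for i in range(agents)]
    heapq.heapify(heap)
    shards = [[] for _ in range(agents)]

    # Sort by duration descending, then by name for a stable order.
    ordered = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in ordered:
        load, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (load + seconds, index))

    loads = [sum(durations[name] for name in shard) for shard in shards]
    return shards, loads


# Invented timings, in seconds.
durations = {
    "e2e_checkout": 120,
    "e2e_login": 90,
    "api_orders": 60,
    "api_users": 45,
    "unit_core": 15,
}
shards, loads = shard_tests(durations, agents=2)
print(shards)
print(loads)
```

Run serially, this toy suite takes 330 seconds. Split over two agents, the slowest shard takes 165, and the wall-clock time of the build is whatever the slowest shard takes.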

If point #2 scares you, remember that those tests are not really bringing you value today. You are delivering your product even though they are failing.
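Steps 1, 2 and 5 of the list are mechanical enough to sketch in a few lines. Treat this as an illustration under assumptions: the function names, the `(finished_at, is_green)` tuples and the dates are all made up, and your CI server will have its own way of exposing build history and skipping tests.

```python
from datetime import datetime, timedelta

BASELINE_WINDOW = timedelta(days=3)  # "green for three days" (step 5)


def max_runs_per_day(duration_hours):
    """Step 1: how many back-to-back runs fit into 24 hours."""
    return int(24 // duration_hours)


def quarantine(last_run):
    """Step 2: names of the tests to ignore, i.e. those that failed.

    `last_run` maps test name -> True (passed) or False (failed).
    """
    return sorted(name for name, passed in last_run.items() if not passed)


def has_green_baseline(builds, now, window=BASELINE_WINDOW):
    """Step 5: has the current green streak lasted at least `window`?

    `builds` is a list of (finished_at, is_green) tuples, oldest first.
    """
    streak_start = None
    for finished_at, is_green in builds:
        if is_green:
            if streak_start is None:
                streak_start = finished_at
        else:
            streak_start = None  # any red build resets the streak
    return streak_start is not None and now - streak_start >= window


runs = max_runs_per_day(5)
ignored = quarantine({"test_a": True, "test_b": False, "test_c": False})

# A build every 6 hours; only the third one (index 2) is red.
start = datetime(2024, 1, 1)
builds = [(start + timedelta(hours=6 * i), i != 2) for i in range(16)]
too_early = has_green_baseline(builds[:14], now=builds[13][0])
baseline = has_green_baseline(builds, now=builds[-1][0])
print(runs, ignored, too_early, baseline)
```

The point of the streak reset is that a single red build sends you back to step 2. The baseline only counts once the build has stayed green for the whole window.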