Jake Worth

Cold Water for Your Code Coverage

Published: March 28, 2023 • Updated: March 29, 2023 • 3 min read

  • testing

Code coverage, the percentage of your code covered by automated tests, is a metric associated with quality. In this post, I’d like to pour some cold water on this association.
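As a refresher, the line-coverage number itself is simple arithmetic: lines executed by the test suite, divided by executable lines. A minimal sketch (the function name and the empty-file convention are mine, not from any particular tool):

```python
def line_coverage(executed: int, executable: int) -> float:
    """Percentage of executable lines hit at least once by the test suite."""
    if executable == 0:
        return 100.0  # assumed convention: nothing to cover counts as covered
    return 100.0 * executed / executable

# 450 of 600 executable lines touched by tests:
print(line_coverage(450, 600))  # 75.0
```

Everything this post questions flows from treating that single ratio as a proxy for quality.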

The higher-is-better metric I’m discussing could be summarized as:

  • Coverage percentage up: good job!
  • Coverage percentage down: let’s fix that.
  • Function ‘covered’ (more on this later) by a test? Good.
  • Function not covered by a test? Bad.

This worldview is not a straw man; it’s real! And here are the issues I see with it:

  • Tests are tradeoffs
  • Coverage can measure the wrong thing
  • Coverage is not the same as usage
  • Coverage does not mean correct

Flaw #1: Tests Are Tradeoffs

Code coverage rewards writing tests, which ignores that every test is a tradeoff.

Tests aren’t free. Each test adds runtime to the suite. While those tests are running locally and on CI, developers are waiting. Tests that are written poorly can make a suite unacceptably slow. Even well-written tests add dependencies and additional maintenance to a codebase.

Kent C. Dodds once said that we should want “some tests, not too many, mostly integration.” Rewarding every increase in code coverage runs counter to this idea by rewarding a test for every line of code, no matter how trivial. Let’s release the idea that an automated test applied to any line of code is automatically good.

Flaw #2: Code Coverage Can Measure the Wrong Thing

Code coverage is one of many things we could measure, but does it deserve to be measured?

There’s a saying in manufacturing: “that which gets measured improves.” If you publicly measure accidents on the job, accidents should decrease. Measure quality, and defects should decrease.

The programming analogy is lines of code, which was once considered a useful measure of program scale and programmer productivity. Today this metric has been abandoned; it’s understood that a bad program can contain tens of thousands of lines of code, while a useful program can be small. Lines of code proved to be a noisy signal, and we stopped measuring it.

I think we should hold code coverage to the same scrutiny. Instead of code coverage, here are some other things we could measure:

  • Customer satisfaction (how happy are customers with our software?)
  • Conversion (how often do customers complete an order?)
  • Revenue (how much are customers spending?)
  • Bugs (what glitches are preventing ordering?)

Are these questions more important than code coverage? Measurement attention is finite, so we must choose. Code coverage often seems exempt from this choice.

Flaw #3: Coverage Is Not Usage

Code coverage increases when you add a test, even when the code is never used.

You can test an untested function and raise your code coverage percentage, with zero guarantee that the function is ever called by the application. I’d rather see an effort to raise the coverage percentage by removing dead code.
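To make that concrete, here’s a hypothetical sketch (all names invented): a function with no remaining production caller, “covered” by a passing test.

```python
# Hypothetical example: after a pricing rewrite, nothing in the application
# calls `legacy_discount` anymore -- it is dead code.
def legacy_discount(price: float) -> float:
    return price * 0.5

def test_legacy_discount():
    # This passes and bumps the coverage percentage, yet no customer
    # ever executes this path.
    assert legacy_discount(100.0) == 50.0

test_legacy_discount()
```

Deleting `legacy_discount` would raise the coverage percentage too, and leave the codebase smaller.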

Flaw #4: Coverage Does Not Mean Correct

Code coverage increases when you add a test, even when that test is describing incorrect behavior.

Pinning down existing legacy behavior with tests is a legitimate technique; see my talk at RailsConf 2017 for an example. But generally, I’m more interested in whether the behavior is correct than whether it is tested.
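A hypothetical sketch of the problem (names and numbers invented): a buggy function whose test pins the bug, so the line is fully covered, the suite is green, and the behavior is still wrong.

```python
# Hypothetical example: a typo shipped a 0.8 tax rate instead of 0.08.
def sales_tax(subtotal: float) -> float:
    return subtotal * 0.8  # bug: should be 0.08

def test_sales_tax():
    # This asserts the incorrect behavior, so coverage tools report the
    # function as covered while customers are overcharged.
    assert sales_tax(10.0) == 8.0

test_sales_tax()
```

Coverage counts lines executed, not assertions that match reality.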

An Alternative Proposal: Test Bugs and New Code

So, what am I proposing instead? Given a legacy codebase with some tests, the path that I see is:

  • Existing code: when you find a bug worth fixing, fix and test it
  • New or extended code: test it

Testing existing bugs guarantees you’re focusing your testing on code that’s affecting customers. Testing new or extended code provides confidence in future work.
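As a sketch of the first path, imagine a bug report that an empty cart crashed checkout; the fix ships together with a regression test (all names hypothetical):

```python
def cart_total(prices: list[float]) -> float:
    # The fix: handle the empty-cart case instead of failing downstream.
    if not prices:
        return 0.0
    return sum(prices)

def test_empty_cart_regression():
    # Written alongside the fix, so testing effort follows real customer impact.
    assert cart_total([]) == 0.0
    assert cart_total([10.0, 5.0]) == 15.0

test_empty_cart_regression()
```

Coverage rises here as a side effect of fixing something customers hit, not as a goal in itself.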

Okay, but what about code that is untested and isn’t demonstrating bugs? Should we write tests for that? Maybe not! We got lucky; that code is either unused, unreachable, or it just works. I’d recommend focusing your time elsewhere.

When Code Coverage is Useful

I’ve used code coverage tools on a project I’m building alone, to keep myself honest about what I’m testing. Even then, I knew where the gaps were. On a team larger than one person, that kind of accountability is challenging to maintain.

I want to focus my attention on code that is used, is important, and needs to be tested. Test coverage is almost certainly the Pareto Principle at work: 20% of your tests cover 80% of your application’s business value.

How do you measure code quality?

What are your thoughts on this? Let me know!
