Published: March 28, 2023 • Updated: March 29, 2023 • 3 min read
Code coverage, the percentage of your code covered by automated tests, is a metric associated with quality. In this post, I'd like to pour some cold water on that association.
The higher-is-better worldview I'm discussing could be summarized as: more code coverage is always better, and 100% is the ideal.
This worldview is not a straw man; it's real! Here are the issues I see with it:
Code coverage rewards writing tests, which ignores that every test is a tradeoff.
Tests aren’t free. Each test adds runtime to the suite. While those tests are running locally and on CI, developers are waiting. Tests that are written poorly can make a suite unacceptably slow. Even well-written tests add dependencies and additional maintenance to a codebase.
Kent C. Dodds once said about tests that we should want "some tests, not too many, mostly integration." Rewarding increases in code coverage runs counter to this idea by rewarding a test for every line of code, no matter how trivial. Let's release the idea that an automated test applied to any line of code is inherently good.
Code coverage is one of many things we could measure, but does it deserve to be measured?
There’s a saying in manufacturing: “that which gets measured improves.” If you publicly measure accidents on the job, accidents should decrease. Measure quality, and defects should decrease.
The programming analogy is lines of code, which was once considered a useful measure of program scale and programmer productivity. Today that metric has been abandoned; it's understood that a bad program can contain tens of thousands of lines of code, while a useful program can be small. Lines of code proved noisy, and we stopped measuring it.
I think we should hold code coverage to the same scrutiny. Instead of code coverage, here are some other things we could measure:
Are these questions more important than code coverage? Our attention is finite, so we must choose. Code coverage often seems exempt from that choice.
Code coverage increases when you add a test, even when the code is never used.
You can test an untested function and raise your coverage percentage with no guarantee that the function is ever called by the rest of the code. I'd rather see the coverage percentage raised by removing dead code.
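To make the problem concrete, here's a minimal Ruby sketch (the method name and figures are invented for illustration): a test can fully cover a function that nothing else in the application ever calls.

```ruby
# Hypothetical helper that nothing in the application actually calls.
def legacy_discount(price_cents)
  price_cents * 9 / 10
end

# This "test" executes the dead method, so line coverage goes up,
# but no customer-facing behavior is any safer.
raise "expected 900" unless legacy_discount(1000) == 900
```

A coverage tool sees the method as 100% covered; it has no way to know the method is dead weight.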
Code coverage increases when you add a test, even when that test is describing incorrect behavior.
Describing existing legacy behavior is a legitimate technique; see my talk at RailsConf 2017 for an example. But generally, I'm more interested in whether the behavior is correct than in whether it is tested.
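For instance, a test can lock in a bug and still count toward coverage. A minimal sketch, with an invented method and an invented tax-rate bug:

```ruby
# Hypothetical total with a bug: the tax rate should be 5/100, not 5/10.
def total_with_tax(subtotal_cents)
  subtotal_cents + subtotal_cents * 5 / 10
end

# This test describes the incorrect behavior, yet a coverage report
# shows the method as fully tested.
raise "expected 1500" unless total_with_tax(1000) == 1500
```

The suite is green and coverage is up, but customers are still being charged 50% tax.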
So, what am I proposing instead? Given a legacy codebase with some tests, the path that I see is:

1. When a bug is reported, write a failing test that reproduces it, then fix the code.
2. When adding new code or extending existing code, write tests alongside that work.
Testing existing bugs guarantees you’re focusing your testing on code that’s affecting customers. Testing new or extended code provides confidence in future work.
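Testing an existing bug usually means writing a test that reproduces the report, then fixing the code until it passes. A minimal Ruby sketch, with invented names and numbers:

```ruby
# Bug report (hypothetical): negative quantities were subtracted from
# cart totals instead of being rejected. The fix: ignore non-positive
# quantities when summing.
def cart_total(quantities, unit_price_cents)
  quantities.select(&:positive?).sum * unit_price_cents
end

# The regression test written alongside the fix documents the desired
# behavior and guards the code customers actually exercised.
raise "expected 995" unless cart_total([2, -1, 3], 199) == 995
```

The coverage gained here is a side effect; the point is that the test targets behavior a customer actually hit.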
Okay, but what about code that is untested and isn’t demonstrating bugs? Should we write tests for that? Maybe not! We got lucky; that code is either unused, unreachable, or it just works. I’d recommend focusing your time elsewhere.
I’ve used code coverage tools on a project I’m building alone to keep myself honest about what I’m testing. However, I knew where the gaps were. On a team larger than one person, that kind of accountability is challenging to maintain.
I want to focus my attention on code that is used, is important, and needs to be tested. Test coverage is almost certainly the Pareto Principle at work: 20% of your tests cover 80% of your application’s business value.
How do you measure code quality?
What are your thoughts on this? Let me know!