Cold Water for Your Code Coverage
Code coverage, the percentage of your code covered by automated tests, is a metric associated with quality. In this post, I’d like to investigate this association. And pour some cold water on it.
The higher-is-better metric I’m discussing could be summarized as:
- ✅ Coverage percentage up: good job!
- ❌ Coverage percentage down: let’s write more tests.
- ✅ Function covered by a test? Good.
- ❌ Function not covered by a test? Bad.
This perspective is not a Straw Man argument; it’s real! And here are the issues I see with it:
- Tests are tradeoffs
- Coverage can measure the wrong thing
- Coverage is not the same as usage
- Coverage does not mean correct
Flaw #1: Tests Are Tradeoffs
Code coverage rewards writing tests, which ignores the reality that every test is a tradeoff.
Tests aren’t free. Each test adds runtime to the suite. While those tests are running locally and on CI, developers are waiting. Tests that are written poorly can make a suite unacceptably slow. Even well-written tests add dependencies and maintenance to a codebase.
Kent C. Dodds once said that we should want “some tests, not too many, mostly integration.” Rewarding a rise in code coverage works against this idea by encouraging a test for every line of code, no matter how trivial. Let’s let go of the idea that an automated test applied to any line of code is automatically good, and instead see each test as a tradeoff, often a necessary one.
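To make the tradeoff concrete, here’s a hypothetical sketch (all names invented): a test that covers a trivial attribute reader. It raises the coverage number, but it exercises no real logic, and it still has to run on every CI build and be maintained forever.

```ruby
require "minitest/autorun"

# A hypothetical, trivial value object.
class Order
  attr_reader :total

  def initialize(total)
    @total = total
  end
end

class OrderTest < Minitest::Test
  # This test bumps coverage, but it exercises no real logic.
  # It adds runtime and maintenance without telling us much.
  def test_total_returns_the_total
    assert_equal 100, Order.new(100).total
  end
end
```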
Flaw #2: Code Coverage Can Measure the Wrong Thing
Code coverage is one of many things we could measure. Does it deserve to be measured?
There’s a saying in manufacturing: “That which gets measured improves.” If you publicly measure accidents on the job, accidents should decrease. Measure quality, and defects should decrease.
The programming cautionary tale is lines of code (LOC), which was once considered a useful measurement of programmer productivity. Today this metric has been abandoned. It’s well understood that a great programmer can contribute a relatively small number of lines in a given time period. Lines of code proved noisy and we stopped measuring it.
I think we should hold code coverage to the same scrutiny. Instead of code coverage, what if we measured conversion: how often do customers complete an order? Or bug reports? Are these metrics more important than code coverage? We must choose.
Flaw #3: Coverage Is Not Usage
Code coverage increases when you add a test, even when the code is never used.
You can write a test for an untested function and raise your code coverage percentage, with zero guarantee that the function is ever called anywhere in the application. If that’s the game, I’d rather see an effort to raise the coverage percentage by removing unused code.
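Here’s a hedged sketch of what that looks like (the function and names are invented): the helper below is never called anywhere in the application, yet adding a test for it raises the coverage percentage.

```ruby
require "minitest/autorun"

# Hypothetical helper that, as it turns out, nothing in the app calls anymore.
def legacy_discount(total)
  total * 0.9
end

class LegacyDiscountTest < Minitest::Test
  # Coverage goes up when this test is added, but no customer ever
  # runs legacy_discount. Deleting the dead code would raise the
  # coverage percentage too, and leave less to maintain.
  def test_applies_a_ten_percent_discount
    assert_in_delta 90.0, legacy_discount(100), 0.001
  end
end
```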
Flaw #4: Coverage Does Not Mean Correct
Code coverage increases when you add a test, even when that test is describing incorrect behavior.
Describing existing legacy behavior is a technique; see my talk at RailsConf 2017 for an example. But generally, I’m more interested in whether the behavior is correct than whether it is tested.
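A small, invented example of the distinction: the function below is fully covered and its test passes, but the test asserts the buggy behavior rather than the correct behavior.

```ruby
require "minitest/autorun"

# Hypothetical helper with a bug: it truncates instead of rounding.
def round_to_dollar(amount)
  amount.to_i # bug: should be amount.round
end

class RoundToDollarTest < Minitest::Test
  # Covered, and green, but it locks in the wrong behavior:
  # 2.70 should round to 3, not 2.
  def test_rounds_to_the_nearest_dollar
    assert_equal 2, round_to_dollar(2.7)
  end
end
```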
An Alternative Proposal: Test Bugs and New Code
So, what am I proposing instead? Given a legacy codebase with some tests, the path that I see is:
- Existing code: when you find a bug worth fixing, fix and test it
- New or extended code: test it
Testing existing production bugs guarantees you’re testing code that’s affecting customers. Testing new or extended code provides all the benefits of coverage to future, presumably important work.
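For the “fix and test it” step, a regression-test sketch (names invented) might look like this: reproduce the bug a customer hit, fix the code, and keep the test as proof.

```ruby
require "minitest/autorun"

# Hypothetical fix for a reported bug: coupon codes submitted with
# surrounding whitespace or mixed case were rejected at checkout.
def normalize_coupon_code(code)
  code.to_s.strip.upcase
end

class NormalizeCouponCodeTest < Minitest::Test
  # Regression test written alongside the fix. It covers code we
  # know customers exercise, because a customer reported the bug.
  def test_trims_whitespace_and_upcases
    assert_equal "SAVE10", normalize_coupon_code("  save10 ")
  end
end
```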
Okay, but what about code that is untested and isn’t demonstrating bugs? Should we write tests for that? Maybe not! Maybe we got lucky; that code is either unused, unreachable, or it works as is. Focus elsewhere.
When Code Coverage is Useful
I’ve used code coverage tools on a project I’m building alone to keep myself honest about how many tests I’m writing, though even there I already knew where the gaps were. On a team larger than one person, that kind of accountability is challenging to maintain.
If you do use code coverage, I think it works best as a high-level technical leadership metric. Are we generally moving in a tested direction, or not? When you share it with teammates, add technical context for them. “We’re 80% covered… that’s pretty good considering we had no tests a year ago.” Or: “Our coverage dropped by 10% this year because we added [new product from consultants]. Let’s add some tests there next year.” It’s data, not an answer all by itself.
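If the number is going to be shared as data rather than enforced as a gate, the setup can reflect that. A minimal sketch for a Rails test suite, assuming the simplecov gem: start coverage tracking and report the percentage, but set no hard threshold that fails the build.

```ruby
# test/test_helper.rb (a sketch, assuming the simplecov gem is installed)
require "simplecov"

# Track coverage and report it after the suite runs. No minimum_coverage
# threshold is set, so a dip starts a conversation instead of breaking
# the build.
SimpleCov.start "rails"

require_relative "../config/environment"
require "rails/test_help"
```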
I want to focus my attention on code that is used by users and has business value. Test coverage as a metric is almost certainly the Pareto Principle at work: 20% of your tests cover 80% of your application’s business value.
How do you measure code quality?