Why Your PRs Aren’t Improving Quality

It’s simple: you’re missing the forest for the trees

Charles Chen · Published in Better Programming · Apr 23, 2022 · 10 min read


The ubiquity of Git in software engineering means that the tooling and processes around it have coalesced into a handful of common patterns for developer workflows and software releases.

At the core of each of these patterns is the pull request, or PR, and many a team attempts to enforce quality (of both code and product) at this layer.

Guess what? It rarely works.

Fixing What’s Already Broken

It should be obvious why this approach rarely works without much more rigor and discipline: at the point of the PR, the code is already written and all who participate are subject to the sunk cost fallacy.

In other words, at the point of the PR, it is unlikely that a team lead or peer will point out that an approach is sub-optimal, suggest it be scrapped, or request major changes. Not just because of the sunk cost fallacy, but also because the PR review itself is likely a big context switch for the reviewer.

Instead, most PRs will end up focusing on:

  • Obvious, isolated logic errors
  • Bad practices and style (but lint rules can be used for this)
  • Unsafe practices (again, lint rules, static analysis tools, or tools like CodeQL can be used instead; see the example config after this list)
  • Violations of DRY when a reviewer knows that such a piece of logic may exist elsewhere in the system (better solved earlier in the process through design reviews)
  • Or — the worst — just routine motions.
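
For instance, much of the second and third category can be pushed onto tooling. A minimal sketch, assuming an ESLint flat config (the specific rules chosen here are arbitrary illustrations), might look like this:

```js
// eslint.config.mjs: a sketch of pushing style and safety checks onto tooling
// so they never consume reviewer attention (rule choices are illustrative).
export default [
  {
    files: ["src/**/*.js"],
    rules: {
      eqeqeq: "error",          // unsafe loose equality
      "no-eval": "error",       // an obviously unsafe practice
      "no-unused-vars": "warn", // dead code and style noise
      complexity: ["warn", 10], // nudge toward simpler functions
    },
  },
];
```

Anything a machine can flag this reliably is reviewer attention wasted in a PR.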

Such exercises ultimately feel like a chore and rarely lead to the kind of code or product quality improvements that make a difference: lower defect rates, faster iteration, better architecture, and increased developer productivity.

More often than not, it feels as if such exercises are ceremony and miss the forest for the trees.

If this feels all too familiar, read on!

PRs Are Not Code Reviews

There is often a mistaken equivalency drawn between PRs and code reviews. While you do indeed review code in the context of a PR, the PR itself is not a code review.

For starters, it is often useless to review changes in code outside of the context of the larger flow of the application and a deeper review of the product, feature, or business requirements. In other words, deciding how the code should be written requires knowing the business or product context in which the code was written.

But even at a very fundamental level, it is often impossible to tell whether a one-line diff, viewed in isolation, is correct, misses edge cases, or requires more test cases without understanding the larger context of the code around it and how data flows through that code.
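
To make that concrete, here is a hypothetical one-line change (all names are invented for illustration) that reads in the diff as a simple off-by-one fix, yet can only be judged with the surrounding code and the specification in view:

```typescript
// A hypothetical one-line change under review (all names invented for illustration):
//
//   - if (attempts > MAX_ATTEMPTS) throw err;
//   + if (attempts >= MAX_ATTEMPTS) throw err;
//
// In isolation, either comparison looks defensible. Which one is correct depends
// on context the diff never shows: is `attempts` incremented before or after the
// check, and does the spec count the initial call as an "attempt"?
const MAX_ATTEMPTS = 3;

async function withRetry<T>(op: () => Promise<T>): Promise<T> {
  let attempts = 0;
  while (true) {
    try {
      return await op();
    } catch (err) {
      attempts += 1; // incremented *before* the check, which changes what >= means
      if (attempts >= MAX_ATTEMPTS) throw err; // the "one-line" change in question
    }
  }
}
```

Seen as a diff alone, the change is unreviewable in any meaningful sense; seen with the loop and the requirement, it is a thirty-second conversation.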

I posit that no amount of trying to make the PR easier for the reviewer with smaller PRs, PR templates, more frequent PRs, draft PRs, and so on will meaningfully impact product and code quality.

The reason is simple: code is dense, and many features and bug fixes are complex, requiring a deep understanding of the flow of logic and the business requirements. It is likely impossible for a peer working on some other sub-system, or even the same sub-system, to know whether your code is correct, complete, and high quality.

The Crux of the Problem

And here, we get to the real core of the question: how exactly do we determine code and product quality?

To answer this question, it’s instructive to consider how nearly every other industry achieves quality. Almost universally, two levers are used:

  • Specifications
  • Testing

It’s as simple as asking the question: did the output meet the specified product design? Whether it’s a food manufacturing line, a machine assembly line, pharmaceutical manufacturing, construction, a seamstress sewing a wedding dress, or software engineering, it is clear that meaningful quality can only be achieved through specifications and testing.

These are two sides of the same coin. Without specifications, testing is meaningless. Without testing, specifications cannot be verified. The specifications determine the testing strategy and the testing verifies that the specifications have been correctly implemented.

Likewise, code reviews without specifications are empty exercises because it is nearly impossible to determine the correctness or suitability of an implementation without the business context in which it was designed and developed (if it was designed at all!). Without the specifications, code reviews focus on the wrong things.

Meaningfully Improving Code and Product Quality

So how can we meaningfully improve code and product quality? How do we have more meaningful code reviews rather than PRs that just go through the motions?

Fix the Specification Process

In the age of agile, no one wants to hear this, but the root of many code and product quality issues comes from a broken (or non-existent) product specification process. Often, this is a result of a lazy or inexperienced product team that lacks the rigor to fully explore and document a feature or a product.

This often manifests as “we’re agile so let’s build a prototype and we’ll review it and iterate”. More often than not, the sunk cost fallacy creeps in once again and the prototype becomes the shipped product.

Without sufficient specifications, deciding on the most suitable architecture or defining the test procedure is an impossible task because “edge cases” will abound. Without a strong specification to start from, it is often impossible to determine the correct design pattern to apply to solve the problem at hand because the scope of the problem has not been fully revealed to the implementation team.

Personally, I am a proponent of Basecamp’s Shape Up model. I’ve used it. I know it works.

This model requires that the product function within an organization accept more responsibility and rigor up front to fully encapsulate the parameters of work so that an engineering team can work from a full context. Conversely, it also requires the engineering function to carefully review the specifications and make determinations of feasibility, tradeoffs, and timelines.

Having specifications can make code reviews much more meaningful because, rather than reviewing the code without context, the objective of the review becomes determining whether the implementation meets the specifications.

Require and Review Technical Designs

Rather than reviewing code only once it has already been written, it would seem the software engineering discipline could learn a lot from how other industries manage matters of output quality: review the design.

Imagine building a house or skyscraper without a blueprint. Imagine building a Tesla without a CAD model of every component. Imagine building a SpaceX rocket without rigorous design and specification of each part that has to withstand incredible forces. It would seem an impossibility. Even landscapers will provide drawings of proposed designs to iterate and review before starting to dig and plant. It’s not that the iteration stops there, but that iterating earlier in the process is significantly cheaper than iterating later in the process.

Yet teams often treat software as if it’s a given that technical design and architecture simply don’t matter, as if they are a sacrifice to be made at the altar of iteration speed and the gospel of “agile”.

Agile in the software engineering discipline is often applied to the code, but rarely to the design. Wouldn’t it be much cheaper to iterate on the design before spending the time to fix non-conforming, misbehaving code later?

The rigor of the design is specific to the domain, of course. Fast Company had a great article, “They Write the Right Stuff,” about software engineering within NASA:

But how much work the software does is not what makes it remarkable. What makes it remarkable is how well the software works. This software never crashes. It never needs to be re-booted. This software is bug-free. It is perfect, as perfect as human beings have achieved.

When the code is part of a one-shot, once-in-a-generation mission, such systems must strive for near perfection.

For most teams, this is obviously way too much rigor, but that does not mean “no design”. Instead, the question each team should ponder is how to make the process more efficient at de-risking product and technical design.

Google’s Design Sprint is a great example of how to incorporate fast, iterative design and discovery with stakeholders early in the process to reduce risk and cost later in the process.

A 90-second intro to a powerful technique to de-risk your product development process.

Write Better Code

Ironically, the answer to improving code quality is almost certainly “write better code.” The best way to do that isn’t PRs or code reviews, because at that point the code is already written; this is the fundamental logical flaw in relying on code reviews for code quality. Teams will be hesitant to say “let’s refactor this into cleaner code and a better design” because by then the pressure to ship is too high. In fact, that code will never be rewritten (every product has that pile of code that no one dares touch).

This is no easy feat and requires more diligence in the hiring process and in the structuring of teams. Rather than a free-for-all division of work, it often means that core framework bits and APIs should be handled by a smaller group of minds with more experience. Fred Brooks coined a term for this in his seminal collection of essays, The Mythical Man-Month: “conceptual integrity”.

To be fair, no team ever gets it right on the first cut; that’s not the point. The point is to build a system in such a way that its core is pliable and malleable to the shifting business requirements. Martin Fowler’s writing on “You Aren’t Gonna Need It” or YAGNI seems fitting:

Yagni only applies to capabilities built into the software to support a presumptive feature, it does not apply to effort to make the software easier to modify. Yagni is only a viable strategy if the code is easy to change, so expending effort on refactoring isn’t a violation of yagni because refactoring makes the code more malleable.

In other words, good code often has the quality that it is easily adaptable to changing requirements; the exact opposite of spaghetti code.
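
As a small, hypothetical illustration of that malleability (the domain and names are invented), compare an implementation that must be edited for every new requirement with one where a new requirement is a one-line addition:

```typescript
// Hypothetical example: customer pricing tiers that change often.

// Rigid: every new tier means editing (and re-reviewing) this function.
function discountRigid(tier: string): number {
  if (tier === "standard") return 0;
  if (tier === "gold") return 0.1;
  if (tier === "platinum") return 0.2;
  throw new Error(`Unknown tier: ${tier}`);
}

// Malleable: a new tier is a single entry in a table; the logic never changes.
const DISCOUNTS: Record<string, number> = {
  standard: 0,
  gold: 0.1,
  platinum: 0.2,
};

function discount(tier: string): number {
  const rate = DISCOUNTS[tier];
  if (rate === undefined) throw new Error(`Unknown tier: ${tier}`);
  return rate;
}
```

The second version is closer to what Fowler means by code that is easy to change: the next requirement lands as data rather than as another branch through the logic.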

More Team-Oriented Development

Developers working together on a bug fix or a feature will have a better grasp of the larger context: the business requirements and the code flow in that area of the system. They are therefore better equipped to provide meaningful feedback to each other than a reviewer pulled in with no contextual understanding of the specific business problem a piece of code is solving.

Organizing development work into small teams is a way to automatically put more eyes on each unit of work and ensure that there are multiple individuals who understand the larger context of the feature being implemented.

In fact, this is precisely what Google prescribes:

If your team practices pair programming, then the code has already been reviewed by a second person.

Interactive Code Reviews

Rather than looking at diffs in isolation, an interactive code review, in which the submitter walks through the code in the larger context of the product or feature and discusses the tradeoffs that were made, can lead to much more meaningful feedback from a reviewer.

Often there are nuances to the decision making process that are not visible when looking at a diff or even an entire change set because the code is interacting within the flow of a larger system or state machine.

Having a developer or team walk through their technical design decisions and challenges they accounted for during their implementation can be far more productive than just reviewing a PR. The PR is not the code review, but a central part of a more holistic code review process.

Once again, Google’s solution is pragmatic:

[M]oving to synchronous code review, or at least ensuring that developers prioritize code review, helps to ensure that changes don’t have to wait hours, or even days, to get merged into trunk.

Focus on Testing in Code Reviews

In addition to looking at the details of the implementation, it is perhaps even more important to review the testing strategy to verify that it provides sufficient coverage of the requirements rather than coverage of the code.

No amount of code review will account for the edge cases because, by their nature, these are the cases that were not in a specification. Similarly, evaluating whether the testing (be it automated unit tests or automated end-to-end tests) is sufficient in a code review requires a sufficiently detailed functional specification from which to determine the test coverage.

Code reviews should focus on whether the developer has designed and implemented the proper test cases to account for the specifications. CI/CD should focus on ensuring those test cases are executed and passing. But of course, this requires that such specifications exist in the first place from which to determine the functional coverage (not just statistical coverage) of such test cases.
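
As a hedged sketch of what that can look like in practice (the requirement, module name, and Vitest/Jest-style test API below are assumptions for illustration), test cases can be named after the clauses of the specification so a reviewer checks coverage of the requirements rather than of the lines:

```typescript
import { describe, it, expect } from "vitest"; // Vitest assumed; Jest reads the same way

// Hypothetical spec clause: "Orders of $100 or more ship free; otherwise shipping
// is $5.99; international orders never ship free."
import { shippingCost } from "./shipping"; // hypothetical module under test

describe("shipping cost (hypothetical spec clause)", () => {
  it("ships free at exactly $100 (the boundary called out in the spec)", () => {
    expect(shippingCost({ subtotal: 100, international: false })).toBe(0);
  });

  it("charges $5.99 just below the threshold", () => {
    expect(shippingCost({ subtotal: 99.99, international: false })).toBe(5.99);
  });

  it("never ships free for international orders, even above the threshold", () => {
    expect(shippingCost({ subtotal: 250, international: true })).toBe(5.99);
  });
});
```

The framework is beside the point; what matters is that each test traces back to a sentence in the specification, which is exactly what the reviewer should be verifying.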

Ultimately, the PR is one tool that can be used to improve code and product quality. If it is a team’s only tool, then it is easy to see why teams continue to struggle to ship high quality, low defect software quickly.

Building quality software ultimately still requires discipline (and experience helps!) to build a process from start to finish that focuses on clearly defining the specifications. For it is only from the specifications that the definition of “correct” can be derived.
