Better Programming

Why We Quit Unit Testing Classes to Focus On a Behavioral Approach

Jonas Tulstrup
Published in Better Programming · 10 min read · Dec 8, 2021
Testing behavior of the entire system rather than each class in isolation. Image Source: Author.

I recently wrote a post explaining how we removed 80% of our code by avoiding premature software abstractions and how that greatly improved development efficiency and reduced errors.

One point that I largely left out of that post was our decision to completely stop writing unit tests for individual classes. I am not saying that you have to pick only one type of test for your system, but simply that tests focusing on individual classes in isolation have multiple issues, which made us avoid them completely. In this post, I will cover our reasoning behind this choice, as well as our alternative approach, based on the following main issues with class-level testing.

  1. Class tests make changes painful
  2. Class tests don’t validate actual behavior
  3. Class tests are hard to understand

Let’s dive into each of these in more detail.

1. Class Tests Make Changes Painful

Unit testing on a class level locks down every class in our codebase to work in a specific way and to use other classes in a specific way. Even though class-level unit tests should not test the implementation of the class, the class itself, including its methods and interface, acts as an implementation detail from the perspective of the system as a whole.

This becomes an issue when making changes to our code as every small modification will break tests. Since typical changes to a codebase affect multiple classes, we will most often have to update tests for every single class that we touched. Not only that, we might additionally need to update other tests that are mocking any of the changed classes. This becomes tedious and adds an extra barrier to changing even the slightest detail.

Typical changes cause multiple tests to break. Image Source: Author.

Even when we are only modifying internal implementation details our tests will break and need updating. Say we want to refactor one class into two so we can reuse part of the logic elsewhere. This will immediately break the tests, requiring us to remove and update test cases for the original class as well as create a new set of test cases for the added class. And we didn’t even change any external behavior of the system.

Instead, we would prefer tests to break only when external behavior changes. This would make us free to do any internal refactorings of our codebase without a single change to our tests. Additionally, we would prefer changes to multiple classes within the same flow — such as an endpoint of a microservice — to only require updating a single set of tests, instead of one for every touched class.

2. Class Tests Don’t Validate Actual Behavior

Class-level testing focuses on individual classes in isolation. As a result, we are testing implementation details rather than the behavior of our code as a whole. A major downside is that whenever a test fails, it doesn’t tell us anything about whether the external behavior of our code has changed or not, as the test might have simply failed due to a changed implementation detail.

It is even a problem when all tests continue to be green after a change, which developers typically take to mean that everything is good and safe to deploy. However, this is often not the case for this type of test, as we rely on mocking other classes. Every time a class is mocked, an assumption is made about how that class works. That assumption quickly becomes out of date when the class itself changes and we forget to update the mocks.

For example, say that class A handles the case where class B returns result X, and that this is tested by mocking B to act this way. If we later change class B to also return result Y in some scenarios, then even though A does not handle this new case, all of its tests would still be green, as every mock of B is still set to always return X. Unless developers remember class A when changing B, we will have broken code with all-green tests.

Class tests rely on assumptions that quickly get outdated. Image source: Author.
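The A and B scenario can be sketched in a few lines. This is only an illustration in Python, not code from our codebase: the classes A and B, the method names, and the "special" key are all made up for the example. The class-level test of A keeps passing because its mock still encodes the old assumption about B, while the same call against the real B fails.

```python
from unittest.mock import Mock


class B:
    """Hypothetical collaborator. It originally returned only "X"."""

    def fetch(self, key: str) -> str:
        # Later change: B now also returns "Y" in some scenarios.
        return "Y" if key == "special" else "X"


class A:
    """Hypothetical class under test. It only knows how to handle "X"."""

    def __init__(self, b):
        self._b = b

    def run(self, key: str) -> str:
        result = self._b.fetch(key)
        if result == "X":
            return "handled X"
        raise ValueError(f"unhandled result: {result}")


def class_level_test_of_a() -> bool:
    # The mock freezes the old assumption: B always returns "X".
    b_mock = Mock(spec=B)
    b_mock.fetch.return_value = "X"
    return A(b_mock).run("special") == "handled X"


def behavior_with_real_b() -> str:
    # The same call wired to the real B exposes the unhandled case.
    try:
        return A(B()).run("special")
    except ValueError as error:
        return f"failed: {error}"
```

Here class_level_test_of_a() stays green even though behavior_with_real_b() reports a failure, which is exactly the gap described above.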

This means that even when class tests continue being green after a change, we cannot be sure that the code as a whole actually behaves correctly. Instead of this, we would prefer tests passing or failing to be tied only to the external behavior of our codebase. If tests fail, behavior has changed, and if they pass, then the code behaves the same.

3. Class Tests Are Hard To Understand

We just covered that testing classes in isolation does not say much about the external behavior of our code. As a result, to really know whether a flow is working after making changes, we need to understand each and every class involved in the flow and whether their corresponding tests cover all the required cases and all possible outcomes of the classes that they mock.

We then have to piece this together in our minds to conclude whether the individual classes will jointly result in the correct external behavior of the flow. This is both hard and error-prone, especially when the change is made by somebody who does not know every corner of the codebase by heart.

It is further complicated by the fact that class testing, due to its focus on implementation details, results in many tests breaking often, and for many different reasons. This entails that developers constantly need to update tests, each time requiring a full understanding of the classes involved in the flow, their tests, and how they might affect each other differently after the change.

Let’s have a look at the example below showing four classes using each other as illustrated by the arrows.

If class D changes, not only would we have to understand and update the tests for D, we would also need to understand its impact on all classes depending on D and their corresponding tests.

In this example, that would be classes B and C. Additionally, as B might behave differently due to D changing, we also need to understand A and the tests of that class.

Inferring behavior from class tests is complex. Image source: Author.

Even though understanding a single class test might not be difficult, it becomes quite complex when we need to infer any kind of external behavior from these tests.

Instead, we would prefer that a single test case alone would be enough to infer some part of the actual external behavior of our codebase.

The Alternative To Unit Testing Classes

We made a choice to avoid class-level unit testing entirely, and because of that, we naturally needed an alternative approach to automated testing. We are primarily applying these concepts for testing individual microservices, but they can also be applied to many other types of systems, such as native and web apps or even libraries.

Concept

What we ended up with relies on the basic concept of treating our running system as a black box and focusing only on external behavior. This means that we, as part of our tests, start up the system and execute each test against it while it is running. We aim to treat it as a black box as much as possible, since this automatically makes the tests independent of implementation details and focused on behavior. Essentially, instead of unit testing classes, we are treating our entire system, a microservice for example, as the unit or system under test.

This has become our most granular type of test and is still only focused on the behavior of a single system or codebase. Additional types of tests might be required to ensure correct behavior end-to-end across multiple systems. However, I will not cover these in this post.
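To make the black box idea concrete, here is a minimal sketch in Python. It is not our actual setup: the tiny WSGI service, its /names endpoint, and the create_app and call helpers are all invented for illustration. The test starts a fresh instance of the system and only talks to it through its public request interface, never through its internal state. Requests are driven in-process instead of over a real socket, which is an assumption made to keep the example fast and self-contained.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def create_app():
    """Hypothetical system under test: a tiny WSGI service that stores names."""
    names = []  # internal state, never touched directly by the tests

    def app(environ, start_response):
        method, path = environ["REQUEST_METHOD"], environ["PATH_INFO"]
        if method == "POST" and path == "/names":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(length))
            names.append(payload["name"])
            status, body = "201 Created", {"count": len(names)}
        elif method == "GET" and path == "/names":
            status, body = "200 OK", {"names": names}
        else:
            status, body = "404 Not Found", {"error": "not found"}
        start_response(status, [("Content-Type", "application/json")])
        return [json.dumps(body).encode("utf-8")]

    return app


def call(app, method, path, payload=None):
    """Send one request through the public interface; return (status code, JSON body)."""
    raw = b"" if payload is None else json.dumps(payload).encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    })
    captured = {}

    def start_response(status, headers):
        captured["status"] = int(status.split()[0])

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


def test_added_name_is_listed():
    app = create_app()  # start a fresh system for this test
    status, _ = call(app, "POST", "/names", {"name": "Ada"})
    assert status == 201
    status, body = call(app, "GET", "/names")
    assert (status, body) == (200, {"names": ["Ada"]})
```

Because the test only knows about requests and responses, the internals of create_app could be split into any number of classes without the test changing.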

Mocking external dependencies

When treating our system as a black box, we no longer want to mock, stub, double, or fake any internal parts of the codebase, as the tests should not be concerned with those.

We do want to mock external dependencies though, as this allows us to test our system in isolation. The definition of what is and isn’t external will vary from project to project according to what makes sense.

For example, when testing our microservices, databases are treated as internal and are thus not mocked. Time is treated as an external component and is mocked, and so is HTTP communication between the system and other external systems. In cases where our microservice uses a message queue, it can be both.

Messages that are both published and consumed by the microservice itself will not be mocked. However, messages published to or received from other systems will be mocked and treated as external. This is visualized in the illustration below.

Internal vs External. Image Source: Author.
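The split between internal and external dependencies could look roughly like the sketch below. Everything in it is hypothetical: the TokenService, the FakeClock, and the token table exist only for this example, and the service methods stand in for what would be endpoints in a real microservice. The database is a real SQLite engine (internal, not mocked), while time is replaced by a test double (external).

```python
import sqlite3
from datetime import datetime, timedelta, timezone


class FakeClock:
    """Test double for time, which is treated as an external dependency."""

    def __init__(self, now: datetime):
        self.now = now

    def __call__(self) -> datetime:
        return self.now

    def advance(self, delta: timedelta) -> None:
        self.now += delta


class TokenService:
    """Hypothetical system under test. Its database is internal and real."""

    def __init__(self, clock, db: sqlite3.Connection):
        self._clock = clock
        self._db = db
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS tokens (value TEXT PRIMARY KEY, expires_at TEXT)"
        )

    def issue(self, value: str, ttl: timedelta) -> None:
        expires_at = (self._clock() + ttl).isoformat()
        self._db.execute("INSERT INTO tokens VALUES (?, ?)", (value, expires_at))

    def is_valid(self, value: str) -> bool:
        row = self._db.execute(
            "SELECT expires_at FROM tokens WHERE value = ?", (value,)
        ).fetchone()
        return row is not None and self._clock() < datetime.fromisoformat(row[0])


def test_token_expires_after_ttl():
    clock = FakeClock(datetime(2021, 12, 8, 12, 0, tzinfo=timezone.utc))
    service = TokenService(clock, sqlite3.connect(":memory:"))
    service.issue("abc", ttl=timedelta(minutes=5))
    assert service.is_valid("abc")
    clock.advance(timedelta(minutes=6))  # move external time forward
    assert not service.is_valid("abc")
```

Controlling the clock from the test makes expiry deterministic, while the queries still run against a real database engine.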

Arranging test data

When treating the running system as a black box, we want to arrange test data and provide input exactly as it would happen when running in a real environment.

For a microservice, this could be through invoking the endpoints that it exposes, or publishing messages on an external queue that the service consumes.

For a frontend, this could be by actually pressing buttons and navigating the user interface similar to what a user would do.

Using this approach we ensure that all tests are based on real application states exactly as they would appear in production. Additionally, as everything is invoked through allowed input to the system, we will never be spending time testing cases that cannot happen in reality, which is an added benefit.

Arranging test data by invoking externally exposed endpoints. Image Source: Author.

To make tests easy to write and maintain, creating reusable methods for arranging commonly used test data is often well worth it. An example could be arranging users in a database. Instead of making the HTTP request for creating users in every single test, move it to a reusable method that each test case can invoke.
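A reusable arrange method might look something like this. The UserApi class below is a made-up, in-memory stand-in for the HTTP endpoints of a service, and arrange_user is a hypothetical helper name. The point is only that the helper creates data through the same public entry point a real client would use, and that each test calls the helper instead of repeating the request.

```python
class UserApi:
    """Hypothetical stand-in for the public HTTP endpoints of a service."""

    def __init__(self):
        self._users = {}
        self._next_id = 1

    def post_user(self, payload: dict) -> dict:
        user = {"id": self._next_id, **payload}
        self._users[user["id"]] = user
        self._next_id += 1
        return {"status": 201, "body": user}

    def get_user(self, user_id: int) -> dict:
        user = self._users.get(user_id)
        if user is None:
            return {"status": 404, "body": None}
        return {"status": 200, "body": user}


def arrange_user(api: UserApi, name: str = "Test User",
                 email: str = "test@example.com") -> int:
    """Create a user through the public endpoint and return its id."""
    response = api.post_user({"name": name, "email": email})
    assert response["status"] == 201, "arranging test data failed"
    return response["body"]["id"]


def test_get_returns_created_user():
    api = UserApi()
    user_id = arrange_user(api, name="Ada")  # arrange
    response = api.get_user(user_id)         # act
    assert response["status"] == 200         # assert
    assert response["body"]["name"] == "Ada"
```

Default arguments keep the call sites short, so a test only spells out the fields it actually cares about.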

Asserting the outcome

With test data arranged, we are ready to execute an action on our system and assert the outcome. As we are still treating our system as a black box, we aim to assert only the external outcomes that our action caused.

Examples of external outcomes could be the HTTP response in case our action was an HTTP request. Additionally, external outcomes could also be outgoing HTTP calls made by the system and messages published on an external message queue.

Asserting external outcomes of the action. Checkmarks represent points of assertion. Image Source: Author.

When setting up mocks for asserting the correct external behavior of our system, consider using mocks in strict mode. They should be strict in the sense that they cause tests to fail whenever invoked with any input they were not specifically set up to handle. This will ensure not only that our system does the right things, but also that no unexpected behavior is happening. We do not want to make HTTP calls and send messages to other systems when we shouldn’t.
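As a rough illustration of strict behavior, the sketch below uses a hand-written stub rather than any particular mocking library. The StrictHttpStub class, the notify_payment function, and the example URLs are all hypothetical. The stub fails the test as soon as the system makes an outgoing call that was not explicitly set up, and it can also verify that every expected call was actually made.

```python
class StrictHttpStub:
    """Hand-written strict stand-in for outgoing HTTP calls (hypothetical)."""

    def __init__(self):
        self._expected = {}
        self.calls = []

    def expect(self, method: str, url: str, response: str) -> None:
        self._expected[(method, url)] = response

    def send(self, method: str, url: str) -> str:
        # Strict mode: anything not set up in advance fails the test.
        if (method, url) not in self._expected:
            raise AssertionError(f"unexpected outgoing call: {method} {url}")
        self.calls.append((method, url))
        return self._expected[(method, url)]

    def verify_all_called(self) -> None:
        missing = set(self._expected) - set(self.calls)
        assert not missing, f"expected calls never made: {sorted(missing)}"


def notify_payment(http, payment_id: str, notify_audit: bool = False) -> str:
    """Hypothetical system behavior: report a payment to an external ledger."""
    reply = http.send("POST", f"https://ledger.example/payments/{payment_id}")
    if notify_audit:
        http.send("POST", "https://audit.example/events")
    return reply


def test_payment_is_reported_to_ledger_only():
    http = StrictHttpStub()
    http.expect("POST", "https://ledger.example/payments/42", "ok")
    assert notify_payment(http, "42") == "ok"
    http.verify_all_called()
```

If notify_payment started calling the audit endpoint in this scenario, the stub would raise and the test would fail, even though the ledger call itself is still correct.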

Execution time

When replacing classical class-level unit tests with behavioral system tests, we need to ensure that our tests still execute quickly. Speed is one of the benefits of class-level unit tests: they do not depend on anything except the class itself, which usually makes them extremely fast. When tests suddenly involve both HTTP calls and database queries, this becomes harder.

The solution for getting around this will vary depending on the technology you are using. In our concrete case, using the ASP.NET TestServer was enough to make HTTP calls fast, and replacing our database with an in-memory variant while running tests locally helped with query speeds.

Before a pull request can be merged though, all tests are run with a real database as the in-memory variant will never behave exactly the same. It is important to find a good balance between execution time and using as-real-as-possible internal dependencies.

Conclusion

That’s it, we have now gotten rid of class-level unit tests altogether. Our new approach focuses entirely on behavior, and as a result of this, when tests are passing, we know that the same cases will also work for an actual user of the system.

We no longer need to update tests every time implementation details change, making us free to do any internal refactorings seamlessly.

Lastly, there is no longer any need for having to piece together multiple different classes and tests inside the minds of developers to infer behavior — reading a single isolated test case is now enough to understand and verify how the system behaves.

I would love to hear about your experiences with behavioral tests in the comments section! What types of testing are you favoring, and why?

Written by Jonas Tulstrup

Tech and Team Lead at MobilePay | Join my email list for helpful insights https://jonastulstrup.medium.com/subscribe
