Rethinking Testing Through Declarative Programming
Write readable and maintainable tests
This article is about writing declarative, readable, and maintainable tests. I hope it can be a starting point for questioning: Should we have more evolved test frameworks that encourage and make it easier to write declarative tests, instead of the current generic, imperative, step-based ones?
I found that the current frameworks' abstractions, like describe, it, and asserts, only tackle test structure superficially; there's still plenty of room to explore what we actually do when testing. The Arrange, Act, Assert (AAA) model is a valid metaphor, but it has the same problem. The consequence is an increased maintenance demand: starting every test from a blank mindset means future problems.
So I'll make this concrete by presenting a couple of testing patterns I've been using in JS over the last few years.
I now write almost all of my tests with some of these patterns. In a large codebase, it becomes very important!
These patterns have a very positive effect on tests: they improve readability and maintainability and allow for better collaboration in code reviews. They let us extract more value from testing.
I'd like to see these ideas evolve into best practices, or even into a testing framework.
What we’ll see next:
#1.1 Tablelike Declarative Tests
#1.2 Custom Describes
#1.3 The 'doTest' Declarative Method
#1.4 Expressing Tests With DSLs
#2. Special Note About Code Reuse
#3. Benefits of the Declarative Approach
#4. Summary and Conclusion
Tablelike Declarative Tests
Having tests be just normal function calls, like describe and it, makes it very easy to refactor tests that have duplicated parts by dynamically calling it().
Let's take the following example: the isWholeBlockSelected(content, selection): Boolean function returns true if the full text of a block is selected.
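As a rough sketch of the duplicated, imperative style, it might look like the following. (createState and isWholeBlockSelected are reimplemented minimally here, and it/expect are stand-ins for the real test runner, just so the sketch runs standalone; the real implementations differ.)

```javascript
// Minimal stand-in for createState: "[> ... >]" marks the selected range.
const createState = (text) => {
  const start = text.indexOf('[>');
  const end = text.indexOf('>]') - 2; // selection end within the plain content
  const content = text.replace('[>', '').replace('>]', '');
  return { content, selection: { start, end } };
};

// Minimal stand-in for the function under test.
const isWholeBlockSelected = ({ content, selection }) =>
  selection.start === 0 && selection.end === content.length;

// Stand-ins for the runner's it/expect so the sketch is self-contained.
const it = (name, fn) => fn();
const expect = (actual) => ({
  toBe: (expected) => {
    if (actual !== expected) throw new Error(`expected ${expected}, got ${actual}`);
  },
});

it('returns true when the whole block is selected', () => {
  expect(isWholeBlockSelected(createState('[>hello world>]'))).toBe(true);
});
it('returns false when only the start of the block is selected', () => {
  expect(isWholeBlockSelected(createState('[>hello>] world'))).toBe(false);
});
it('returns false when the selection is in the middle of the block', () => {
  expect(isWholeBlockSelected(createState('he[>llo wor>]ld'))).toBe(false);
});
```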
createState already has some magic that uses the notation [> ... >] to declare where the selection is, but we'll talk about that later (in Expressing Tests With DSLs).
The point here is that once we understand the first it, all of the others are pretty much the same. A lot of the content is just duplicated noise, and the mind filters it out while reading.
Even the it descriptions are mostly skipped by the eye. But we can do better, since it is just a regular call. Then we can do:
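A sketch of that table-like version (again with minimal stand-ins for createState, isWholeBlockSelected, and the runner's it, since the article's real code isn't reproduced here):

```javascript
// Minimal stand-ins so the sketch runs standalone.
const createState = (text) => {
  const start = text.indexOf('[>');
  const end = text.indexOf('>]') - 2;
  const content = text.replace('[>', '').replace('>]', '');
  return { content, selection: { start, end } };
};
const isWholeBlockSelected = ({ content, selection }) =>
  selection.start === 0 && selection.end === content.length;
const it = (name, fn) => fn(); // test-runner stand-in

// The tests are now pure data: scenario/input plus expected output.
const cases = [
  { state: '[>hello world>]', expected: true },
  { state: '[>hello>] world', expected: false },
  { state: 'hello [>world>]', expected: false },
];

// The mechanics live in one place; each case becomes its own it().
cases.forEach(({ state, expected }) => {
  it(`isWholeBlockSelected(${state}) === ${expected}`, () => {
    const result = isWholeBlockSelected(createState(state));
    if (result !== expected) throw new Error(`failed for: ${state}`);
  });
});
```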
You can dismiss the complex forEach part and just focus on analysing all of the cases in terms of scenario/input plus expected output. The test title becomes useless, actually harder to read than the case itself. Running them, each case shows up as its own test in the report.
We could even create a generic testFunction higher-order function to test any function like this.
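One possible shape for that helper (the name testFunction comes from the article; this particular signature and the clamp example are assumptions):

```javascript
const it = (name, fn) => fn(); // test-runner stand-in

// Generic table-driven tester: one it() per [args, expected] pair.
const testFunction = (fn, cases) =>
  cases.forEach(([args, expected]) => {
    it(`${fn.name}(${JSON.stringify(args)}) === ${JSON.stringify(expected)}`, () => {
      const result = fn(...args);
      if (JSON.stringify(result) !== JSON.stringify(expected))
        throw new Error(`${fn.name} failed for ${JSON.stringify(args)}`);
    });
  });

// Usage with any pure function:
const clamp = (x, min, max) => Math.min(Math.max(x, min), max);
testFunction(clamp, [
  [[5, 0, 10], 5],
  [[-3, 0, 10], 0],
  [[42, 0, 10], 10],
]);
```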
Tests written this way become a set of cases and expectations. This isn't new: FitNesse proposed a similar approach, although within a specific tool and environment and within the idea of acceptance tests, focusing on different stakeholders and a higher-level business perspective. I think this idea can be applied to any unit test without any special tool, even when the audience is only devs.
What we just did is declarative programming. We’ve split the information part of the test (the what) from the mechanics and behaviour needed to create and run the tests (the how). This is the underlying idea of the whole article. We’ll explore it further.
I called them tablelike tests because given a framework that models this explicitly, we could show and edit tests as tables.
Custom Describes
Another way to reuse code, especially setup code, is to create your own describe blocks. I usually do this to create different types of tests, mostly related to an architectural part. For example, we have describeMongoDB to write back-end tests that need a Mongo mock set up.
This kind of describe provides useful functions to the body that uses it to access the context; in this case, the configured db instance and mongoServer. Here's a sample test using describeMongoDB.
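A hypothetical sketch of such a test (all names are assumptions; the real setup would boot a Mongo mock such as mongodb-memory-server, while here a tiny in-memory fake stands in for the db so the sketch runs standalone):

```javascript
// In-memory fake standing in for the real Mongo mock.
const makeFakeDb = () => {
  const collections = {};
  return {
    collection(name) {
      const docs = (collections[name] = collections[name] || []);
      return {
        insertOne: async (doc) => { docs.push(doc); },
        findOne: async (query) =>
          docs.find((d) => Object.keys(query).every((k) => d[k] === query[k])) || null,
      };
    },
  };
};

// Stand-in for the custom describe: the real one boots and tears down the
// mock and exposes accessors for db and mongoServer to the body.
const describeMongoDB = (name, body) => {
  const db = makeFakeDb();
  body({ getDb: () => db });
};
const it = (name, fn) => { fn(); }; // runner stand-in

describeMongoDB('users collection', ({ getDb }) => {
  it('stores and retrieves a user', async () => {
    const db = getDb();
    await db.collection('users').insertOne({ name: 'Ada' });
    const found = await db.collection('users').findOne({ name: 'Ada' });
    if (!found || found.name !== 'Ada') throw new Error('lookup failed');
  });
});
```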
In my current project, we have a few of these custom describes for both the front and back end. We use this especially for tests that somehow integrate different parts of the architecture, like a front end making server calls or back-end endpoints that need the DB.
So we have describeDB for DB-model tests, describeGQL for testing GraphQL queries/mutations, describeAction for front-end redux actions, etc. Encapsulating the setup code is good for test maintenance. Here's the higher-order function to create custom describes:
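A sketch of what such a higher-order function can look like (makeCustomDescribe, the setup/teardown options, and the context getter are all assumptions, not the article's actual code; the runner globals are replaced with minimal synchronous stand-ins so the sketch runs standalone):

```javascript
// Stand-ins for the runner's globals. The real runner defers beforeAll until
// just before the its and runs afterAll at the end; here we simplify.
const describe = (name, fn) => fn();
const beforeAll = (fn) => fn();
const afterAll = (fn) => {}; // teardown elided in this sketch
const it = (name, fn) => fn();

// The higher-order function: wrap describe with shared setup/teardown and
// hand the body a getter for the created context.
const makeCustomDescribe = ({ setup, teardown }) => (name, body) =>
  describe(name, () => {
    let context;
    beforeAll(() => { context = setup(); });
    afterAll(() => teardown(context));
    body(() => context); // a getter, since context is created by the hook
  });

// A describeMongoDB-style describe built from it; the fake db object stands
// in for booting a real Mongo mock.
const describeFakeDB = makeCustomDescribe({
  setup: () => ({ db: { records: [] } }),
  teardown: (ctx) => { ctx.db.records.length = 0; },
});

describeFakeDB('projects collection', (getContext) => {
  it('starts empty', () => {
    if (getContext().db.records.length !== 0) throw new Error('setup failed');
  });
});
```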
The 'doTest' Declarative Method
This is what we’re aiming to do for every test lately. It’s a similar approach to using tables but with inverted control.
Any piece of code we're testing always involves more than one case, so why should we treat every it as a start-from-scratch piece of code? It seems natural, but it has to do with the mental flow we follow when testing (especially when writing tests after the code):
- Make a test for an initial case: We think about the steps, the AAA, as a procedure: First, I’ll create a clock. Then, I’ll set an alarm. Then, blah, blah. Oh, I need this other thing in the context. Let’s add that step, etc.
- Then, think about new cases: We forget about the previous it and start from scratch for the second test because this one is different. Or worse, we start to copy and paste parts of the first test.
We shouldn’t just judge that copy and paste. There’s an underlying cause for doing that, even if we know copying and pasting is bad.
What happens is that we're trying to quickly focus our attention on the differences in order to express these new cases. We don't want to lose time on the common parts, because that's what testing is about: thinking about input variations and expected outputs. Writing too much procedural code for the steps deviates our attention, and many times we end up missing some cases.
So use this other flow:
- Make a first test.
- When about to start the second test, go back to the first one and think about the differences. What parts do they share? Which particular app concepts appear in each part? (Decompose the app semantics of each AAA phase.)
- With that in mind, extract the first test's body into a doTest function that the first test now uses. Model it as given inputs and expected outputs. Make it green.
- Then, go back to the second case you were about to build, and write it using doTest.
Let's see an example as a refactor. We have a set of tests for a pure function, a redux reducer called project. We're testing its behaviour for a particular action, receiveChangeSet.
A ChangeSet is like a commit operation on data objects. This is part of a distributed-state architecture: a front end receives changes done by another user and updates its local copy of those objects. So we have changeSets involving adding new objects, deleting objects, and updating them.
If we write tests procedurally one by one, we’ll end up with many tests like this:
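As a hypothetical sketch (the reducer, the action shape, and the object shapes here are assumptions, reimplemented minimally so the example runs), the procedural one-by-one style looks like this:

```javascript
// Minimal stand-in for the project reducer and its action creator.
const project = (state = { objects: {} }, action) => {
  if (action.type !== 'RECEIVE_CHANGE_SET') return state;
  const objects = { ...state.objects };
  action.changeSet.forEach(({ op, id, data }) => {
    if (op === 'add' || op === 'update') objects[id] = { ...objects[id], ...data };
    if (op === 'delete') delete objects[id];
  });
  return { ...state, objects };
};
const receiveChangeSet = (changeSet) => ({ type: 'RECEIVE_CHANGE_SET', changeSet });

const it = (name, fn) => fn(); // runner stand-in

// Each test repeats the same arrange/act/assert mechanics by hand.
it('adds a new object from a changeSet', () => {
  const state = project(
    undefined,
    receiveChangeSet([{ op: 'add', id: 'a', data: { name: 'A' } }])
  );
  if (state.objects.a.name !== 'A') throw new Error('add failed');
});

it('deletes an existing object from a changeSet', () => {
  const initial = { objects: { a: { name: 'A' } } };
  const state = project(initial, receiveChangeSet([{ op: 'delete', id: 'a' }]));
  if (state.objects.a !== undefined) throw new Error('delete failed');
});
```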
Filtering the content down to the useful information is what the eye ends up doing, with effort, and when doing so, it sometimes misses small differences.
Note: I’ll talk about assertions like the following in the last section.
In the end, what changes on every test is:
- What the initial objects in the store are (context)
- Which changes we’re processing (input)
- What the expected output states of objects in the system are after applying those changes (outputs)
So we can think about a doTest function for these particular cases. Here's a refactored version:
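A sketch of that doTest style (the reducer and shapes are assumed and reimplemented minimally so this runs; the point is that context, input, and expected output become plain data while the mechanics live once inside doTest):

```javascript
// Minimal stand-in for the reducer under test.
const project = (state = { objects: {} }, action) => {
  if (action.type !== 'RECEIVE_CHANGE_SET') return state;
  const objects = { ...state.objects };
  action.changeSet.forEach(({ op, id, data }) => {
    if (op === 'add' || op === 'update') objects[id] = { ...objects[id], ...data };
    if (op === 'delete') delete objects[id];
  });
  return { ...state, objects };
};
const receiveChangeSet = (changeSet) => ({ type: 'RECEIVE_CHANGE_SET', changeSet });
const it = (name, fn) => fn(); // runner stand-in

// All the mechanics in one place; every case is context + input + output.
const doTest = ({ name, initialObjects, changeSet, expectedObjects }) =>
  it(name, () => {
    const state = project({ objects: initialObjects }, receiveChangeSet(changeSet));
    if (JSON.stringify(state.objects) !== JSON.stringify(expectedObjects))
      throw new Error(`${name} failed`);
  });

doTest({
  name: 'adding a new object',
  initialObjects: {},
  changeSet: [{ op: 'add', id: 'a', data: { name: 'A' } }],
  expectedObjects: { a: { name: 'A' } },
});

doTest({
  name: 'updating an existing object',
  initialObjects: { a: { name: 'A' } },
  changeSet: [{ op: 'update', id: 'a', data: { name: 'A2' } }],
  expectedObjects: { a: { name: 'A2' } },
});

doTest({
  name: 'deleting an object',
  initialObjects: { a: { name: 'A' } },
  changeSet: [{ op: 'delete', id: 'a' }],
  expectedObjects: {},
});
```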
Refactoring one of those tests into using doTest is pretty straightforward!
Expressing Tests With DSLs
Let's say the main idea up to this point was to make tests declarative by expressing them in terms of data instead of procedural code. We declare context, inputs, and expected outputs as data. This applies to both the tablelike and doTest styles.
Given that, we've found that in some domains, expressing those inputs or outputs can be very hard. It might involve a lot of code, which hurts test readability. (The whole point of being declarative is to improve the tests' readability as much as possible, as in Edward Tufte's Data-Ink ratio applied to coding.)
Let’s see some real examples.
Example 1: A rich text editor selection and entities DSL
This first example tests the isWithinEntityMatch function, which, given DraftJS text editor content, tells us whether the current selection is within a given entity.
DraftJS is a framework for building rich text editors in React. We've used it for game dialogues. In this case, an entity is what we call a markup, like an inlined note within a dialogue, delimited by curly braces: {MOVE_CAMERA ... }. In the UI, it shows up as an inlined note within the dialogue text.
The problem is this text is actually a pretty complex DraftJS model object, involving ImmutableJS- and DraftJS-specific models. So we need to improve the way we create those inputs or contexts to also make them declarative and readable. We could use util functions, factories, builders, and, at the end of the spectrum, create a domain-specific language (DSL) just for our specific concern.
In our case, we came up with this tablelike test:
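A runnable miniature of that tablelike test. The string notation and the expectIsWithin name come from the article, but the parser and matcher below are heavily simplified stand-ins (single {entity}, collapsed [|] cursor only), not the real DraftJS-backed implementation:

```javascript
// Parse a DSL string: [|] marks the cursor, {...} marks the entity span.
const parseCase = (text) => {
  const cursor = text.indexOf('[|]');
  const plain = text.replace('[|]', '');
  return { cursor, entity: { start: plain.indexOf('{'), end: plain.indexOf('}') } };
};

// Simplified matcher: is the cursor inside the entity's braces?
const isWithinEntityMatch = ({ cursor, entity }) =>
  entity.start !== -1 && cursor > entity.start && cursor <= entity.end;

const it = (name, fn) => fn(); // runner stand-in

const expectIsWithin = (text, expected) =>
  it(`${text} -> ${expected}`, () => {
    if (isWithinEntityMatch(parseCase(text)) !== expected)
      throw new Error(`failed for: ${text}`);
  });

// The table: each line is one scenario expressed in the DSL.
expectIsWithin('say {MOVE_CAMERA [|]left} now', true);
expectIsWithin('say [|]{MOVE_CAMERA left} now', false);
expectIsWithin('say {MOVE_CAMERA left} now[|]', false);
```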
The text is actually a very small internal DSL using just regular strings and conventions through symbols:
- ... {something} ...: Curly braces mark entities (the same as the user types)
- [|]: The user cursor is at that specific position (a collapsed selection)
- [> ... >]: Declares an expanded selection. The content within it is currently selected from left to right, and the cursor, by definition, is on the right side.
- [< ... <]: The same as above, but the selection goes from right to left, and the cursor is on the left side. (The selection direction is pretty important in text editors.)
The impl of expectIsWithin can be found here, along with the DSL parser. I'm not inlining it because it's a lot of code for just an impl. But take a look, and imagine if each test case needed that much code to create its input. It'd be really difficult to read and maintain!
Example 2: Undo/redo logic
Another real example: a function to compute the undo/redo stack of an application. It's a pure function, in this case a reselect selector; that is, a function that derives some data from a (redux) app state.
In this case, the state has a list of changes that were done. A change could be:
- A regular change: For example, A (we assign names for the sake of test readability)
- An undo: Reverts a change. We use the notation U(A).
- A redo: Redoes a change. We use R(A).
Same as with the DraftJS example, there's a long distance between these concepts when thinking about a test (like "Let's test changes A, B, C; undo C and B; and see where the cursor is") and what we actually need to code to create that scenario.
Expressing the inputs/context would involve a lot of code. So we, again, create a small, internal, string-based DSL to be able to express cases with a very compact syntax.
We actually just ended up transcribing the same notation we used on a whiteboard to investigate the problem and come up with a solution.
It reads like this:
// [latest_change ← first_change] => expected_stack
[ R(B), U(B), U(C), C, B, A ] => [C, (B), A]
The user did a change, A, followed by changes B and C. Then they undid C and B, but right after this, they redid B. So where are we?
[C, (B), A]
This means that A is applied, C isn't (it's been undone), and we're currently at B (also applied). From this point, we can undo A or redo C.
Both inputs, as well as the expected outputs, are expressed as a string DSL, but the underlying model consists of complex data structures that would’ve made the test difficult to read.
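A runnable miniature of that test style. It's heavily simplified: the real selector, parseInput, and notation handle more cases, and every name here is an assumption. Both sides of each case are strings in the DSL, with the mechanics hidden behind a doTest:

```javascript
// Parse "[ R(B), U(B), U(C), C, B, A ]" into changes, oldest first.
const parseInput = (dsl) =>
  dsl.replace(/[\[\]\s]/g, '').split(',').filter(Boolean).reverse()
    .map((token) => {
      const undo = token.match(/^U\((\w+)\)$/);
      const redo = token.match(/^R\((\w+)\)$/);
      if (undo) return { type: 'undo', name: undo[1] };
      if (redo) return { type: 'redo', name: redo[1] };
      return { type: 'change', name: token };
    });

// Simplified stack computation: cursor counts the currently applied changes.
const computeUndoRedoStack = (changes) => {
  const stack = [];
  let cursor = 0;
  changes.forEach((c) => {
    if (c.type === 'change') { stack.splice(cursor); stack.push(c.name); cursor = stack.length; }
    if (c.type === 'undo') cursor -= 1;
    if (c.type === 'redo') cursor += 1;
  });
  return { stack, cursor };
};

// Format the result back into the DSL: newest first, parens mark the cursor.
const formatStack = ({ stack, cursor }) =>
  '[' + stack.map((name, i) => (i === cursor - 1 ? `(${name})` : name)).reverse().join(', ') + ']';

const it = (name, fn) => fn(); // runner stand-in

const doTest = (input, expected) =>
  it(`${input} => ${expected}`, () => {
    const result = formatStack(computeUndoRedoStack(parseInput(input)));
    if (result !== expected) throw new Error(`${input}: got ${result}`);
  });

doTest('[ B, A ]', '[(B), A]');
doTest('[ U(C), C, B, A ]', '[C, (B), A]');
doTest('[ R(B), U(B), U(C), C, B, A ]', '[C, (B), A]');
```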
Here is the impl of parseInput, which, in turn, has its own tests. So to make better tests, we had to create a language, a parser (although a pretty simple one), and tests for that parser. Imagine if this were easier to do out of the box with testing tools!
The DSL allows us to get rid of a lot of boilerplate code and just distill the meaningful part for the test. This makes it easier to think about missing cases, redundant cases, etc. — especially by others — in code reviews.
Special Note About Code Reuse
We can think of what we did in all of these examples as reusing code between tests. But reusing code alone isn’t a good rule of thumb — it should be constrained by how the test experience ends up.
Sometimes reusing code has the drawback of decreasing the test’s code readability. We don’t want that in our tests.
I detected two cases where this happens:
Assertions
As we saw in the receiveChangeSet example, the asserts were trying to avoid code duplication by reusing data from the context and input.
Here's the example, already shortened by using doTest:
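A hypothetical sketch of that assertion style (shapes and names are assumptions, and the reducer result is stubbed so the sketch runs). The expectation is derived from the context and input instead of spelled out, so the reader has to resolve each reference mentally:

```javascript
const initialObjects = { a: { name: 'A' } };
const changeSet = [{ op: 'update', id: 'a', data: { name: 'A2' } }];

// Stubbed reducer output, standing in for the real project() result.
const result = { objects: { a: { name: 'A2' } } };

// The assert reuses the input data rather than stating the expectation:
const change = changeSet[0];
const derived = { ...initialObjects[change.id], ...change.data };
if (JSON.stringify(result.objects[change.id]) !== JSON.stringify(derived))
  throw new Error('update was not applied');
```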
This is perfectly fine, and it avoids duplicating code. But it still sacrifices test readability. In the end, I find it better to accept the duplication: it makes the test easier to read for people other than the author. And if we really want to avoid duplication, we could do it by making it easier to reuse well-known inputs instead of deriving expectations inside the test. Like this:
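A sketch of that alternative (names are hypothetical): well-known inputs become named fixtures, but the expectations stay literal, so nothing has to be resolved mentally while reading:

```javascript
// Well-known, named fixtures reused across tests.
const objectA = { name: 'A' };

const initialObjects = { a: objectA };
const changeSet = [{ op: 'update', id: 'a', data: { name: 'A2' } }];

// Stubbed reducer output, standing in for the real project() result.
const result = { objects: { a: { name: 'A2' } } };

// The expectation is literal, duplicated on purpose:
if (result.objects.a.name !== 'A2') throw new Error('expected a.name to be A2');
```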
But we should be careful when extracting information from the test to avoid the next problem.
Reusing inputs and outputs
Another antipattern I’ve detected is to reuse test inputs and domain objects in such a way that it becomes very difficult to read the code later.
Take this example: a test that first declares a lot of objects that are reused between test cases. The domain isn't so important; it's also a selector's test.
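A hypothetical sketch of the antipattern (every name below is invented for illustration): the domain objects are declared far from the it that uses them, so the it alone carries almost no information.

```javascript
// A pile of shared objects at the top of the file...
const projectA = { id: 'pA', tasks: ['t1', 't2'] };
const projectB = { id: 'pB', tasks: [] };
const taskT1 = { id: 't1', done: true };
const taskT2 = { id: 't2', done: false };
const state = {
  projects: { pA: projectA, pB: projectB },
  tasks: { t1: taskT1, t2: taskT2 },
};

// Selector under test (assumed for the sketch).
const selectPendingTasks = (s, projectId) =>
  s.projects[projectId].tasks.filter((id) => !s.tasks[id].done);

const it = (name, fn) => fn(); // runner stand-in

// ...and far below, an it that says nothing by itself: to know why 't2' is
// the answer, the eye has to jump back up to the object declarations.
it('returns the pending tasks', () => {
  const result = selectPendingTasks(state, 'pA');
  if (result.length !== 1 || result[0] !== 't2') throw new Error('failed');
});
```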
If I only read the it, I really wouldn't be able to understand what it's supposed to be testing.
To understand it, the eye needs to jump frequently between the it and the objects at the top of the file. Also, reading all of the objects at once might not make any sense if some objects are only used by some tests but not by others. The test misses the most important information fragments.
So this is the opposite of what we've seen in previous sections: these tests duplicate the imperative code while extracting the declarative data.
The conclusion is that the parts we should reuse between tests are the imperative ones (the how of the test), not the input/output definitions (the what). Otherwise, the tests lose their explicit meaning.
Benefits of the Declarative Approach
Designing declarative expressive tests:
- Gives higher readability
- Improves maintenance
- Provides a better process for ordering your thinking. (First, write the imperative test. Then generalise a function for the second case in terms of inputs, what they do, and your expectations.) It also makes you spend some time thinking about different scenarios and how to express them in this test's language.
- Exploits the full value of code reviews: Reviewing a big set of tests with many lines of imperative and/or duplicated code takes extra mental effort beyond thinking about test coverage (in terms of possible scenarios, not LOC). This approach removes that effort, allowing the reviewer to focus on missing cases and on whether the current scenarios and expectations make sense from the domain point of view.
Summary and Conclusion
This is not new: good tests require effort, the same effort we spend on core logic. Using the AAA testing model alone isn't enough to write good tests. We should explore what we, as devs, do within those phases, identifying patterns and their consequences in tests and in our overall SW development process.
As with many other problems, declarativity is the most powerful metapattern we have to tackle programming problems like this.
Test maintainability depends on a few practices you can follow when writing tests:
- Write declarative tests.
- Don't omit declarations by leaving them out of the test (whether in other files or elsewhere in the same file; avoid making the eye jump while reading). Instead, search for the shortest, most expressive way to declare them. No more, no less. This is pretty much Edward Tufte's Data-Ink ratio applied to tests.
- Create a language (in the broadest sense) for each set of tests. It could be a table, a doTest function, or even a DSL. Separate the what from the how.
When you write tests, you're not writing them for the code you're testing right now. You should be writing them for the reader. Tests must be empathetic.
I’m looking forward to new tools and frameworks that’ll embrace these and many other ideas at their core. Testing shouldn’t be so hard — we can do better. There’s a lot to explore!