Rethinking Testing Through Declarative Programming

Write readable and maintainable tests

Javier Fernandes
Better Programming



This article is about writing declarative, readable, and maintainable tests. I hope it can be a starting point for questioning: should we have more evolved test frameworks that encourage and make it easier to write declarative tests, instead of the current generic, imperative, step-based ones?

I found that the current frameworks’ abstractions, like describe, it, and asserts, only tackle test structure superficially, but there’s still plenty of room to explore what we actually do when testing. The Arrange, Act, Assert (AAA) model is a valid metaphor, but it has the same problem. The consequence is an increased maintenance burden: starting every test with a blank mindset means future problems.

So I’ll make it concrete by presenting examples of a couple of testing patterns I’ve been using in JS over the last couple of years.

I now write almost all of my tests with some of these patterns. In a large codebase, it becomes very important!

These patterns have a very positive effect on tests: they improve readability and maintainability and allow for better collaboration in code reviews. They let us extract more value from testing.

I’d like to see these ideas evolve into best practices for testing frameworks.

What we’ll see next:

#1.1 Tablelike Declarative Tests
#1.2 Custom Describes
#1.3 The 'doTest' Declarative Method
#1.4 Expressing Tests With DSLs
#2. Special Note About Code Reuse
#3. Benefits of the Declarative Approach
#4. Summary and Conclusion

Tablelike Declarative Tests

Having tests be just normal function calls, like describe and it, makes it very easy to refactor tests with duplicated parts by calling it() dynamically.

Let’s take the following example: the isWholeBlockSelected(content, selection): Boolean function returns true if the full text of a block is selected.

createState already has some magic that uses the notation [> ... >] to declare where the selection is, but we’ll talk about that later (in Expressing Tests With DSLs).
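
To make it concrete, here’s a rough sketch (assuming Jest) of the kind of imperative tests being described; the exact shape returned by createState is an assumption:

// A sketch of the original, imperative style. createState and
// isWholeBlockSelected come from the article; assuming createState
// returns { content, selection }.
describe('isWholeBlockSelected', () => {
  it('returns true when the whole block is selected', () => {
    const { content, selection } = createState('[>hello world>]')
    expect(isWholeBlockSelected(content, selection)).toBe(true)
  })

  it('returns false when only part of the block is selected', () => {
    const { content, selection } = createState('hello [>world>]')
    expect(isWholeBlockSelected(content, selection)).toBe(false)
  })

  it('returns false when the selection starts at the beginning but stops early', () => {
    const { content, selection } = createState('[>hello>] world')
    expect(isWholeBlockSelected(content, selection)).toBe(false)
  })
})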

The point here is that once we understand the first it, all of the others are pretty much the same. A lot of the content is just duplicated noise. Here’s a filtered view to show how the mind reads it:

And even the it descriptions are kind of skipped by the eye. But since it is just a regular function call, we can do better. Then we can do:
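
A sketch of that refactor, under the same assumptions as above: the cases become plain data, and it() is called dynamically for each one:

// Each case is just input text plus expected output.
const cases = [
  { text: '[>hello world>]', expected: true },
  { text: 'hello [>world>]', expected: false },
  { text: '[>hello>] world', expected: false },
]

describe('isWholeBlockSelected', () => {
  cases.forEach(({ text, expected }) =>
    it(`${text} -> ${expected}`, () => {
      const { content, selection } = createState(text)
      expect(isWholeBlockSelected(content, selection)).toBe(expected)
    })
  )
})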

You can ignore the forEach plumbing and just focus on analysing all of the cases in terms of scenario/input plus expected output. The test title becomes redundant and actually harder to read than the case itself. Running them gives the following:

We could even create a generic testFunction higher-order function to test any function this way.
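
Something along these lines, as a sketch (the exact signature is an assumption):

// Runs a table of [argsArray, expectedResult] cases against any pure function.
const testFunction = (fn, cases) =>
  describe(fn.name, () => {
    cases.forEach(([args, expected]) =>
      it(`${JSON.stringify(args)} -> ${JSON.stringify(expected)}`, () =>
        expect(fn(...args)).toEqual(expected))
    )
  })

// Usage sketch, with a hypothetical sum function:
// testFunction(sum, [
//   [[1, 2], 3],
//   [[0, 0], 0],
// ])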

Tests in this way become a set of cases and expectations. This isn’t new — FitNesse proposed a similar approach, although within a specific tool and environment and within the idea of acceptance tests, putting the focus on different stakeholders and on a higher business perspective. I think this idea can be applied to any unit test without any special tool, even when the audience is only devs.

What we just did is declarative programming. We’ve split the information part of the test (the what) from the mechanics and behaviour needed to create and run the tests (the how). This is the underlying idea of the whole article. We’ll explore it further.

I called them tablelike tests because given a framework that models this explicitly, we could show and edit tests as tables.

Rethinking multiple imperative ‘it’s into a tablelike cases-based test

Custom Describes

Another way to reuse code, especially setup code, is to create your own describe blocks. I usually do this to create different types of tests, mostly related to an architectural part. For example, we have describeMongoDB to write back-end tests that need to have a Mongo mock setup.

This kind of describe provides useful functions to the body so it can access the context, in this case the configured db instance and mongoServer. Here’s a sample test using describeMongoDB.
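
A sketch of what such a test could look like; the accessor functions passed to the body (db, mongoServer) are assumptions based on the description above:

// The body receives functions to access the context set up by describeMongoDB.
describeMongoDB('users collection', ({ db }) => {
  it('stores and retrieves a user', async () => {
    const users = db().collection('users')
    await users.insertOne({ name: 'Marty' })
    const found = await users.findOne({ name: 'Marty' })
    expect(found.name).toEqual('Marty')
  })
})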

In my current project, we have a few of these custom describes for both the front and back end. We use this especially for tests that somehow integrate different parts of the architecture, like a front end making server calls or back-end endpoints that need the DB, etc.

So we have describeDB for DB-model tests, describeGQL for testing GraphQL queries/mutations, describeAction for front-end redux actions, etc. Encapsulating the setup code is good for test maintenance. Here’s the higher-order function to create custom describes:

We use this piece of code to create custom 'describe's keeping the '.only' and '.skip' functionality.
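
One possible shape for that higher-order function, as a sketch; the convention of a setup callback that registers its own hooks and returns context accessors is an assumption:

// Builds a custom describe from a setup callback, preserving .only and .skip.
const createCustomDescribe = (setupContext) => {
  const build = (describeFn) => (title, body) =>
    describeFn(title, () => body(setupContext()))

  const customDescribe = build(describe)
  customDescribe.only = build(describe.only)
  customDescribe.skip = build(describe.skip)
  return customDescribe
}

// e.g. a describeMongoDB could be built roughly like this:
// const describeMongoDB = createCustomDescribe(() => {
//   let db, mongoServer
//   beforeAll(async () => { /* start an in-memory Mongo, connect, assign db and mongoServer */ })
//   afterAll(async () => { /* close the connection, stop the server */ })
//   return { db: () => db, mongoServer: () => mongoServer }
// })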

The 'doTest' Declarative Method

This is what we’re aiming to do for every test lately. It’s a similar approach to using tables but with inverted control.

Any piece of code we’re testing always involves more than one case, so why should we treat every it as a start-from-scratch piece of code? It sounds natural, but it has to do with the mental flow we follow when testing (especially when writing tests after the code):

  • Make a test for an initial case: We think about the steps, the AAA, as a procedure: First, I’ll create a clock. Then, I’ll set an alarm. Then, blah, blah. Oh, I need this other thing in the context. Let’s add that step, etc.
  • Then, think about new cases: We forget about the previous it and start from scratch for the second test because it’s different. Or worse, we start to copy and paste parts of the first test.

We shouldn’t just judge that copy and paste. There’s an underlying cause for doing that, even if we know copying and pasting is bad.

What happens is that we’re trying to quickly focus our attention on the differences in order to express these new cases. We don’t want to lose time on the common parts — because that’s what testing is about: thinking about input variations and expected outputs. Writing too much procedural code for the steps diverts our attention, and many times we end up missing some cases.

So use this other flow:

  1. Make a first test.
  2. When about to start the second test, go back to the first one and think about the differences. What parts do they share? Which particular app concepts do they share in each part? (Decompose the app semantics of each AAA phase.)
  3. With that in mind, extract the first test body into a doTest function that will now be used for the first test. Model it as given inputs and expected outputs. Make it green.
  4. Then, get back to the second case you were about to build, and write it using the doTest.
On the left, some tests duplicate imperative code (in black) while mixing up test-specific information. On the right, the tests are using a 'doTest' function to reuse code and clean up each case.

Let’s see an example as a refactor. We have a set of tests for a pure function and a redux reducer called project. We’re testing its behaviour for a particular action, receiveChangeSet.

A ChangeSet is like a commit operation on data objects. This is part of a distributed-state architecture — a front end receives changes done by another user and updates its local copy of those objects. So we have changeSets involving adding new objects, deleting objects, and updating them.

If we write tests procedurally one by one, we’ll end up with many tests like this:
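
For instance, one of them might look roughly like this (the state and changeSet shapes are assumptions based on the description above):

it('adds the new object from the changeSet to the store', () => {
  const state = {
    project: {
      masterBranch: {
        objects: { EXISTING: { id: 'EXISTING', name: 'Old object' } },
      },
    },
  }
  const changeSet = {
    changes: [{ added: { id: 'NEW', name: 'New object' } }],
  }

  const newState = project(state.project, receiveChangeSet(changeSet))

  expect(newState.masterBranch.objects).toEqual({
    ...state.project.masterBranch.objects,
    NEW: changeSet.changes[0].added,
  })
})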

If we filter the content down to the useful information, we’ll see the following. This is what the eye ends up doing with effort, and when doing so, it sometimes misses small differences.

Filtering the content by useful information.

Note: I’ll talk about assertions like the following in the last section.

Code that reads: objects: { …state.project.masterBranch.objects, NEW: changeSet.changes[0].added

In the end, what changes on every test is:

  • What the initial objects in the store are (context)
  • Which changes we’re processing (input)
  • What the expected output states of objects in the system are after applying those changes (outputs)

So we can think about a doTest function for these particular cases. Here’s a refactored version:
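
A sketch of what that refactor could look like; doTest’s parameters and the reducer call are assumptions, the point is that each case becomes context + input + expected output:

// Encapsulates the imperative mechanics once; each case is pure data.
const doTest = ({ description, initialObjects, changes, expectedObjects }) =>
  it(description, () => {
    const state = { masterBranch: { objects: initialObjects } }
    const newState = project(state, receiveChangeSet({ changes }))
    expect(newState.masterBranch.objects).toEqual(expectedObjects)
  })

describe('project reducer: receiveChangeSet', () => {
  doTest({
    description: 'adds a new object',
    initialObjects: { A: { id: 'A' } },
    changes: [{ added: { id: 'NEW' } }],
    expectedObjects: { A: { id: 'A' }, NEW: { id: 'NEW' } },
  })

  doTest({
    description: 'deletes an object',
    initialObjects: { A: { id: 'A' }, B: { id: 'B' } },
    changes: [{ deleted: 'B' }],
    expectedObjects: { A: { id: 'A' } },
  })
})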

Below is a little screencast showing how to refactor one of those into using doTest. Pretty straightforward!

Refactoring a test case to use a 'doTest' function

Expressing Tests With DSLs

Let’s say the main idea up to this point was to make tests declarative by expressing them in terms of data instead of procedural code. We declare context, inputs, and expected outputs as data. This applies to both tablelike tests and doTest styles.

Given that, we’ve found that in some domains, expressing those inputs or outputs can be very hard. It might involve a lot of code, which hurts test readability. (The whole point of being declarative is to improve the tests’ readability as much as possible, as in Edward Tufte’s data-ink ratio applied to coding.)

Let’s see some real examples.

Example 1: A rich text editor selection and entities DSL

This first example tests the isWithinEntityMatch function, which, given DraftJS text editor content, tells us whether the current selection is within a given entity.

DraftJS is a framework for building rich text editors in React. We’ve used it for game dialogues. In this case, an entity is what we call a markup, like an inlined note within a dialogue, delimited by curly braces: {MOVE_CAMERA ... }. Here’s how it looks from the UI:

DOC: Marty! {MOVE_CAMERA to: DOC} You’re not thinking fourth-dimensionally!

The problem is this text is actually a pretty complex DraftJS model object, involving ImmutableJS- and DraftJS-specific models. So we need to improve the way we create those inputs or contexts to also make them declarative and readable. We could use util functions, factories, builders, and, at the end of the spectrum, create a domain-specific language (DSL) just for our specific concern.

In our case, we came up with a tablelike test, sketched below after the notation is explained:

The text is actually a very small internal DSL using just regular strings and conventions through symbols:

  • ... {something} …: Curly braces for entities (same as the user types)
  • [|]: Means the user cursor is at that specific position (the collapsed selection)
  • [> ... >]: Declares an expanded selection. That is, the content within it is currently selected from left to right. The cursor, by definition, is on the right side.
  • [<<]: The same as above, but the selection is from right to left, and the cursor is on the left side. (The selection direction is pretty important in text editors.)
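
With that notation, the cases can read roughly like this (a sketch; expectIsWithin is the helper mentioned just below, and its signature as well as the expected values are assumptions):

// Each case is a piece of dialogue text with the selection notation embedded.
const cases = [
  ['DOC: Marty! {MOVE_CAMERA to: DOC[|]}', true],   // cursor inside the entity
  ['DOC: Marty! [|]{MOVE_CAMERA to: DOC}', false],  // cursor just before it
  ['DOC: Marty! {MOVE_[>CAMERA to: DOC>]}', true],  // expanded selection inside it
  ['DOC: [>Marty! {MOVE_CAMERA>] to: DOC}', false], // selection crosses the boundary
]

describe('isWithinEntityMatch', () => {
  cases.forEach(([text, expected]) =>
    it(`${text} -> ${expected}`, () => expectIsWithin(text, expected))
  )
})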

The impl of expectIsWithin can be found here with the DSL parser. I’m not inlining it because it’s a lot of code for just an impl. But take a look, and imagine if each test case needed that much code to create the input. It’d be really difficult to read and maintain!

Example 2: Undo/redo logic

Another real example: a function to compute the undo/redo stack of an application. It’s a pure function, in this case a reselect selector: a function that derives some data from a (redux) app state.

In this case, the state has a list of changes that were done. A change could be:

  • A regular change: For example, A (we assign names for the sake of test readability)
  • An undo: Reverts a change. We use the notation U(A).
  • A redo: Redoes a change. We use R(A).

Same as with the DraftJS example, there’s a long distance between how we think about a test — like let’s test changes A, B, and C; undo C and B; and see where the cursor is — and what we actually need to code to create that scenario.

Expressing the inputs/context would involve a lot of code. So we, again, create a small, internal, string-based DSL to be able to express cases with a very compact syntax.

Test tables and DSL for building complex input objects. Every string in the array is a test case expressed in a small, simple DSL as a notation.

We actually just ended up transcribing the same notation we used on a whiteboard to investigate the problem and come up with a solution.

It reads like this:

// [latest_change    ←   first_change]   =>   expected_stack
[ R(B), U(B), U(C), C, B, A ]            =>   [C, (B), A]

The user did a change, A, followed by changes B and C. Then they undid C and B, but right after that, they redid B. So where are we?

[C, (B), A]

This means that A is applied, C isn’t (it’s been undone), and we’re currently at B (also applied). From this point, we can undo B (going back to A) or redo C.

Both inputs, as well as the expected outputs, are expressed as a string DSL, but the underlying model consists of complex data structures that would’ve made the test difficult to read.

Here is the impl of parseInput, which, in turn, has its own tests. So to make better tests, we had to create a language, a parser (although a pretty simple one), and tests for that parser. Imagine if this were easier to do out of the box with testing tools?
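
The linked implementation is the real one; just to give an idea, a minimal parser for that changes notation could look something like this (the output shape is an assumption):

// Turns '[ R(B), U(B), U(C), C, B, A ]' into a list of change descriptors.
const parseInput = (text) =>
  text
    .replace(/[[\]]/g, '')            // drop the surrounding brackets
    .split(',')
    .map((token) => token.trim())
    .filter(Boolean)
    .map((token) => {
      const match = token.match(/^([UR])\((\w+)\)$/)
      return match
        ? { type: match[1] === 'U' ? 'undo' : 'redo', target: match[2] }
        : { type: 'change', name: token }
    })

// parseInput('[ U(C), C, B, A ]')
// => [ { type: 'undo', target: 'C' }, { type: 'change', name: 'C' }, ... ]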

The DSL allows us to get rid of a lot of boilerplate code and just distill the meaningful part for the test. This makes it easier to think about missing cases, redundant cases, etc. — especially by others — in code reviews.

Special Note About Code Reuse

We can think of what we did in all of these examples as reusing code between tests. But reusing code alone isn’t a good rule of thumb — it should be constrained by how the resulting test reads.

Sometimes reusing code has the drawback of decreasing the test’s code readability. We don’t want that in our tests.

I detected two cases where this happens:

Assertions

As we saw in the receiveChangeSet example, the asserts were trying to avoid code duplications by reusing data from the context and input.

Here’s the example — already shortened by using doTest:

This is perfectly fine, and it avoids duplicating code. But it still sacrifices test readability. In the end, I find it better to accept the duplication: it makes the test easier to read for people other than the author. And if we really want to avoid duplication, we can do it by making it easier to reuse well-known inputs instead of parts of the test itself. Like this:
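
For example, a sketch along these lines (the names and the doTest shape from the earlier example are assumptions):

// Well-known inputs declared once, referenced by name in each case.
const NEW_OBJECT = { id: 'NEW', name: 'New object' }
const EXISTING_OBJECTS = { A: { id: 'A', name: 'Already there' } }

doTest({
  description: 'adds a new object',
  initialObjects: EXISTING_OBJECTS,
  changes: [{ added: NEW_OBJECT }],
  expectedObjects: { ...EXISTING_OBJECTS, NEW: NEW_OBJECT },
})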

But we should be careful when extracting information from the test to avoid the next problem.

Reusing inputs and outputs

Another antipattern I’ve detected is to reuse test inputs and domain objects in such a way that it becomes very difficult to read the code later.

Take this example: a test that first declares a lot of objects that are reused between test cases. The domain isn’t so important; it’s also a selector’s test.


If I only read the it, I really wouldn’t be able to understand what it’s supposed to be testing.

To understand it, the eye needs to jump frequently between the it and the objects on the top of the file. Also, reading all of the objects at once might not make any sense if some objects are only used by some tests but not by others. The test misses the most important information fragments.

Reusing code while losing test semantics increases the cognitive load on readers. The eye needs to jump back and forth between the test code and the reused code.

So this is the opposite of what we’ve seen in previous sections: these tests duplicate the imperative code while extracting the declarative data.

The conclusion here is that the most important parts to reuse between tests are the imperative ones (the how of the test), not the input/output definitions (the what). Otherwise, the tests lose their explicit meaning.

Benefits of the Declarative Approach

Designing declarative expressive tests:

  • Gives higher readability
  • Improves maintenance
  • Provides a better process for ordering your thinking. (First, write the imperative test. Then, generalise a function for the second case in terms of inputs, what they do, and your expectations.) Also spend some time thinking about different scenarios and how to express them in this test’s language.
  • Exploits the full value of code reviews: Reviewing a big set of tests with many lines of imperative and/or duplicated code takes extra mental effort beyond thinking about test coverage (in terms of possible scenarios, not LOC). This approach greatly reduces that effort, letting the reviewer focus on missing cases and on whether the current scenarios and expectations make sense from the domain point of view.

Summary and Conclusion

This is not new: Good tests require effort — the same effort we spend on core logic. Using the AAA testing model alone isn’t enough to write good tests. We should explore what we, as devs, do within those phases, identifying patterns and their consequences in tests and in our overall SW development process.

As with many other problems, declarativity is the most powerful metapattern we have to tackle programming problems like this.

Test maintenance depends on a few practices you can follow when writing tests:

  • Write declarative tests
  • Don’t omit declarations by leaving them out of the test (whether in other files or elsewhere in the same file — avoid the need for the eye to jump while reading). Instead, search for the shortest, most expressive way to declare them. No more, no less. This is pretty much Edward Tufte’s data-ink ratio applied to tests.
  • Create a language (in the broadest sense) for each set of tests. It could be a table, a doTest function, or even a DSL. Separate the what from the how.

When you write tests, you’re not writing them for the code you’re testing right now. You should be writing tests for the reader. Tests must be empathetic.

I’m looking forward to new tools and frameworks that’ll embrace these and many other ideas at their core. Testing shouldn’t be so hard — we can do better. There’s a lot to explore!

--

Software Engineer@SCVSoft. Professor@UNQ (Public National University, Argentina). Creator of Wollok Programming Language. Always learning & creating things :)