Why Are Phantom Types Useful?

Using the Elm programming language

Stefan Wullems
Better Programming


In this post, we’re going to slowly unravel both what phantom types are and why they might be useful.

We’ll be using Elm as the example programming language, but if you know a bit of another programming language that supports type parameters, you’ll probably be able to follow.

First, let’s start with a definition:

“Phantom types are custom types that have one or more unused type parameters.”

Here is an example of a phantom type.

type Length unit = Length Int

To understand what this is and why it might be useful, we’re going to begin with understanding a problem that they can solve.

Let’s start out with this piece of Elm code.

meter : Int
meter = 1

twoMeters : Int
twoMeters = meter + meter

kilometer : Int
kilometer = 1

twoKilometers : Int
twoKilometers = kilometer + kilometer

This works completely fine. However, there’s a mistake waiting to be made in this code. Take a look at this:

test : Int
test = meter + kilometer

> test
2 : Int

That’s not right. It doesn’t make sense to add a meter to a kilometer without converting it first. What happened?

The mistake is that we added two things that don’t have the same meaning: two incompatible concepts. We mixed the units “meters” and “kilometers”, and the result is meaningless.

A question that might arise is: “If it’s meaningless to add a kilometer to a meter without converting it first, how come it’s possible to do so?”

The mistake is possible because in writing this program, we’ve abstracted both the concept of “meters” and the concept of “kilometers” to the concept of “integers”.

Abstraction is the process of “forgetting” or “omitting” details. In this case we take a concept, “length”, which has a “unit” and an “amount” component, and we abstract it to an “integer”, a concept that can only capture the “amount” component. In this process of abstraction, we “omit” an important detail, namely the “unit” component: the component that dictates whether addition makes sense for two given lengths. This is why the computer does not stop you from adding “meters” and “kilometers”. It doesn’t know about “meters” or “kilometers”; they’re the “units” we’ve abstracted away.

OK, now we understand why the mistake is possible: in representing the concept “length”, we perform an abstraction step that “omits” the “unit” component. As a result, lengths with different units can no longer be told apart, and it becomes possible to add lengths that implicitly have different units.

Let’s try to come up with a way to prevent this mistake from happening.

Remember, the concept of “length” has a “unit” and an “amount” component. Integers already capture the “amount” component, so to make sure we’ll never add lengths with different units we just need to figure out a way to tell the computer:

  • What the “unit” of a specific “length” is.
  • That it should prevent addition of two “lengths” that don’t have the same “unit”.

How do we go about this?

One way to do it is to store both the “unit” and the “amount” components in a Length data structure. Then before we combine two lengths, we first compare their “units” to make sure the operation makes sense.

type Unit = Meters | Kilometers

type Length = Length Unit Int

add : Length -> Length -> Maybe Length
add (Length unitA a) (Length unitB b) =
    if unitA == unitB then
        Just (Length unitA (a + b))
    else
        Nothing

If we make a mistake now, at least we don’t get a meaningless result. This approach does have some tradeoffs though.

  • When we have made the mistake, it only becomes apparent at runtime. Although less subtle than what we had before, it would probably still be a bug.
  • We have to come up with an ad hoc way to handle the case where the units are not the same. In this case I chose to return a `Maybe Length`, but I might as well have chosen to return `Result UnitError Length` or even something more custom.
  • Every caller now has to handle the error case, even if they’re sure they haven’t made any mistakes.
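
To make the last tradeoff concrete, here is a minimal sketch of what a caller of the runtime-checked add has to write. The module name and the totalOrZero value are hypothetical, added only for illustration, and the fallback in the Nothing branch is an arbitrary choice.

```elm
module RuntimeUnits exposing (Length(..), Unit(..), add, totalOrZero)

-- Same definitions as in the post.


type Unit
    = Meters
    | Kilometers


type Length
    = Length Unit Int


add : Length -> Length -> Maybe Length
add (Length unitA a) (Length unitB b) =
    if unitA == unitB then
        Just (Length unitA (a + b))

    else
        Nothing


-- Hypothetical caller: both arguments are clearly meters, yet the
-- Nothing branch still has to be handled with some made-up fallback.
totalOrZero : Length
totalOrZero =
    case add (Length Meters 1) (Length Meters 2) of
        Just total ->
            total

        Nothing ->
            Length Meters 0
```

The Nothing branch can never run here, but the type of add forces us to write it anyway.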

I think we can do better.

Perhaps we can encode the “unit” component in the type system. That way we can prevent the error from occurring at all during runtime and so we don’t have to handle any error cases.

Let’s try to do this by defining a custom type for each supported “unit”.

type Meters = Meters Int
type Kilometers = Kilometers Int

addMeters : Meters -> Meters -> Meters
addMeters (Meters a) (Meters b) =
    Meters (a + b)

addKilometers : Kilometers -> Kilometers -> Kilometers
addKilometers (Kilometers a) (Kilometers b) =
    Kilometers (a + b)

OK, that works: it’s no longer possible to accidentally add meters to kilometers. But it’s still suboptimal.

  • Every function we want to support on “length” needs to be re-implemented for every supported “unit” (e.g. what happens if we want to support subtraction?). All of these functions follow exactly the same structure. Not very DRY.
  • Whenever we want to support a new “unit”, we need to re-implement all supported “length” functions for that “unit”. Again, not very DRY.
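
As a rough illustration of that duplication, this is what adding subtraction might look like with one type per unit. The subtractMeters and subtractKilometers functions and the module name are hypothetical; they only mirror the structure of addMeters and addKilometers above.

```elm
module PerUnitTypes exposing (Kilometers(..), Meters(..), subtractKilometers, subtractMeters)


type Meters
    = Meters Int


type Kilometers
    = Kilometers Int


-- Hypothetical subtraction for meters.
subtractMeters : Meters -> Meters -> Meters
subtractMeters (Meters a) (Meters b) =
    Meters (a - b)


-- The same body again; only the type and constructor names differ.
subtractKilometers : Kilometers -> Kilometers -> Kilometers
subtractKilometers (Kilometers a) (Kilometers b) =
    Kilometers (a - b)
```

Every new operation and every new unit multiplies the number of near-identical functions.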

We’ve got the runtime properties we want, but the cost of re-implementing the same functions over and over is not something I’m ready to settle for. Perhaps there is a way to generalise this; to implement all of these functions only once, and be able to reuse them for all “units”.

Let’s look again at the types we’ve defined above.

type Meters = Meters Int 
type Kilometers = Kilometers Int

We can understand these definitions as representations of the concept “length” with the “unit” component pre-filled. The Meters type has the meaning “a ‘length’ where the ‘unit’ component is always meters”, and similarly for Kilometers. Is there a way to “pull out” that pre-filled ‘unit’? Can we define something that has the meaning “a ‘length’ where the ‘unit’ component is X”?

This is the point where phantom types start to become interesting. They’re exactly what we’re looking for: they generalise the previous approach.

Let’s look again at the definition:

“Phantom types are custom types that have one or more unused type parameters.”

Now that we understand the problem they can solve, we can also describe them in a bit of a different way.

“Phantom types can be used to attribute different meanings to identical data structures.”

Let’s look at them in action:

type Length unit = Length Int -- (1)

type Meters = Meters -- (2)
type Kilometers = Kilometers -- (2)

add : Length unit -> Length unit -> Length unit -- (3)
add (Length a) (Length b) =
    Length (a + b)

  • (1) This is a phantom type. We define a custom type Length with an unused type parameter unit. Its meaning is exactly “a ‘length’ where the ‘unit’ component is X”. The type parameter `unit` serves as the X.
  • (2) We define the custom types Meters and Kilometers which represent the units “meters” and “kilometers” respectively. As an example, Length Meters derives the meaning “a ‘length’ where the ‘unit’ component is ‘meters’”. Critically, the data structures for Length Meters and Length Kilometers are exactly the same (both Length Int), so we can write reusable functions that work for both, but the type checker will see them as different types. Mixing them will result in a type error!
  • (3) The type definition of add reads as: “You can add any two lengths, as long as they are of the same unit”. Also note that because both Length Meters and Length Kilometers are compatible with Length unit, we can reuse this algorithm for both types.

Cool, this has none of the downsides of the previous approaches:

  • Units don’t exist at runtime, so less overhead and no runtime error handling!
  • We can write functions to work for any unit. No re-implementing every function for every unit!
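
To see the type checker at work, here is a minimal sketch of how the phantom Length type might be used. The constructor functions meters and kilometers, the conversion kilometersToMeters, the two example values, and the module name are all assumptions added for illustration.

```elm
module Length exposing
    ( Kilometers
    , Length
    , Meters
    , add
    , kilometers
    , kilometersToMeters
    , meters
    , mixed
    , threeMeters
    )


type Length unit
    = Length Int


type Meters
    = Meters


type Kilometers
    = Kilometers


-- Hypothetical constructors: the type annotation alone decides the unit tag.
meters : Int -> Length Meters
meters amount =
    Length amount


kilometers : Int -> Length Kilometers
kilometers amount =
    Length amount


add : Length unit -> Length unit -> Length unit
add (Length a) (Length b) =
    Length (a + b)


-- Hypothetical explicit conversion between the two tags.
kilometersToMeters : Length Kilometers -> Length Meters
kilometersToMeters (Length amount) =
    Length (amount * 1000)


-- Type checks: both arguments are Length Meters.
threeMeters : Length Meters
threeMeters =
    add (meters 1) (meters 2)


-- Type checks once the kilometer value has been converted.
mixed : Length Meters
mixed =
    add (meters 1) (kilometersToMeters (kilometers 1))


-- Rejected by the compiler if uncommented, because Meters and
-- Kilometers cannot both be substituted for the same `unit`:
-- broken = add (meters 1) (kilometers 1)
```

In this sketch the module exposes the Length type without its constructor, so code outside the module can only obtain a tagged length through meters or kilometers. That is one possible design choice, not something the approach requires.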

In order to get a better picture of how phantom types are actually used, I will list two common real world use cases:

  1. Measures

The example we’ve followed throughout this post is an example of parameterising the “unit” component of a length, but this technique can be used for any measure.

A Quantity type can be used to represent any quantity while still guarding against the mixing of quantities.

We see this technique used extensively in the package elm-units, which is a pillar of elm-3d-scene.
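
A simplified sketch of the idea might look like the following. It is an illustration in the spirit of elm-units, not that package’s actual source; the module name, the Meters and Seconds tags, and the meters, seconds, and plus functions are assumptions made for this example.

```elm
module QuantitySketch exposing (Meters, Quantity, Seconds, meters, plus, seconds)

-- `units` is the phantom parameter: it is never used on the right-hand side.


type Quantity number units
    = Quantity number


type Meters
    = Meters


type Seconds
    = Seconds


meters : Float -> Quantity Float Meters
meters amount =
    Quantity amount


seconds : Float -> Quantity Float Seconds
seconds amount =
    Quantity amount


-- Works for any unit, as long as both arguments carry the same one.
plus : Quantity number units -> Quantity number units -> Quantity number units
plus (Quantity a) (Quantity b) =
    Quantity (a + b)
```

With these definitions, plus (meters 1) (meters 2) type checks, while plus (meters 1) (seconds 2) is rejected at compile time.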

  2. Ids

Database entities often have ids that are of the same type (e.g. String or Int), but there is a difference in meaning between a userId and a bookId. We can use phantom types to allow us to separate these concepts while sharing functions that should work for any id.
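
As a minimal sketch, assuming ids are stored as strings, this could look like the following. The Id type, the User and Book tags, and the fromString, toString, and userLabel functions are hypothetical names chosen for this example.

```elm
module Id exposing (Book, Id, User, fromString, toString, userLabel)

-- `entity` is the phantom parameter that says what the id refers to.


type Id entity
    = Id String


type User
    = User


type Book
    = Book


-- The caller's type annotation decides which entity the id belongs to.
fromString : String -> Id entity
fromString raw =
    Id raw


-- Shared by every kind of id.
toString : Id entity -> String
toString (Id raw) =
    raw


-- Hypothetical function that only accepts user ids.
userLabel : Id User -> String
userLabel userId =
    "user:" ++ toString userId
```

Passing a value annotated as Id Book to userLabel is a compile-time error, while toString keeps working for both kinds of id.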

Besides these common use cases, there are probably a lot more cases where using phantom types makes sense.

If you’re interested, you can take a look at a more advanced version of this technique called the phantom builder pattern, by Jeroen Engels.
