How to Write a RegEx Worthy of Passing Code Review
Let’s make RegExes cool again

How would you react to seeing a RegEx like this in a code review?
/^([0-9]{5}([\s]+|,)+)*([0-9]{5})$/
Is it clear what this expression is trying to do, what it doesn’t do, or how to test it? Sadly, a typical RegEx is either so basic that it’s unhelpful, or so complicated that only the engineer who wrote it can understand it.
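One way to answer those questions is to probe the expression with a handful of inputs. Here’s a quick sketch, assuming Python’s built-in re module (the article doesn’t commit to a language) and some made-up sample strings. The pattern itself is copied unchanged.

```python
import re

# The expression from the code review, unchanged.
ZIP_LIST = re.compile(r"^([0-9]{5}([\s]+|,)+)*([0-9]{5})$")

# Hypothetical sample inputs, chosen to poke at the edges of the pattern.
samples = [
    "12345",
    "12345 67890",
    "12345, 67890",
    "12345,,67890",
    "12345,",
    "12345-6789",
    "1234",
    "",
]

for s in samples:
    # fullmatch requires the whole string to match, mirroring the ^...$ anchors
    print(f"{s!r:>16} -> {bool(ZIP_LIST.fullmatch(s))}")
```

Running it shows that a doubled comma is accepted as a separator, while a trailing comma and a ZIP+4 code like 12345-6789 are both rejected. None of that is obvious from reading the pattern itself.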
It doesn’t have to be this way! You can write effective regular expressions with just some basic knowledge, and keep them readable for other engineers, even the opinionated ones who vow never to use them.
Let the following be a practical guide for understanding, utilizing, and maintaining RegExes.
Quick background
Regular expressions (RegEx) provide a concise, language-agnostic format for matching and parsing strings. In theoretical computer science, it can be shown that any finite state machine has an equivalent regular expression, and vice versa. Practically speaking, this means a single, compact RegEx can encode an entire string-recognition algorithm: any procedure that a finite state machine can carry out over its input.
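To make that equivalence concrete, here’s a small sketch, assuming Python and borrowing a "exactly five digits" pattern as the example. The hand-written state machine below (five_digit_fsm is a hypothetical name, not from the article) tracks how many digits it has seen and accepts exactly the strings that the one-line RegEx accepts.

```python
import re

def five_digit_fsm(s: str) -> bool:
    """Hypothetical hand-written finite state machine that accepts exactly
    the strings matched by ^[0-9]{5}$ (ASCII digits only)."""
    state = 0  # state k means "k digits read so far"; state 5 is accepting
    for ch in s:
        if ch in "0123456789" and state < 5:
            state += 1      # consume one more digit
        else:
            return False    # a non-digit or a sixth character: dead state
    return state == 5

FIVE_DIGITS = re.compile(r"[0-9]{5}")

for s in ["02139", "2139", "021390", "02a39", ""]:
    # The machine and the RegEx should agree on every input.
    assert five_digit_fsm(s) == bool(FIVE_DIGITS.fullmatch(s))
    print(repr(s), five_digit_fsm(s))
```

The machine needs six explicit states plus a dead state; the RegEx expresses the same logic in a single short string.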
Hence, a RegEx should be treated like any other code. It should be written cleanly and readably, so that anyone else can come back and maintain it with minimal effort. Read on to see how we can apply programming best practices to RegExes (with examples).
Example 1: zip code
Here’s a RegEx that matches any string consisting of exactly 5 digits and nothing else, such as a US postal code (ZIP code):
^[0-9]{5}$
Here’s what all those symbols mean:
^ - matches the beginning of a string
[0-9] - matches any digit (specifically, a character in the set {0,1,2,3,4,5,6,7,8,9})
{5} - applies [0-9] exactly 5 times
$ - matches the end of a string
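Each piece of that breakdown rules out a different kind of bad input, which makes it easy to check one token at a time. A minimal sketch, assuming Python’s built-in re module and made-up sample values:

```python
import re

ZIP = re.compile(r"^[0-9]{5}$")

# Each piece of the breakdown rejects a different kind of input.
assert ZIP.search("90210")          # exactly five digits: match
assert not ZIP.search("9021")       # {5}: too few digits
assert not ZIP.search("902101")     # $: nothing may follow the fifth digit
assert not ZIP.search("x90210")     # ^: nothing may precede the first digit
assert not ZIP.search("9O210")      # [0-9]: the letter O is not a digit

# Python-specific caveat: $ also matches just before a trailing newline,
# so search() accepts "90210\n". fullmatch() (or \Z) does not.
assert ZIP.search("90210\n")
assert not ZIP.fullmatch("90210\n")
print("all checks passed")
```

The trailing-newline behavior is a quirk of Python’s re module; other engines treat $ differently, so it’s worth checking your own language’s documentation before relying on it.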
Altogether, our RegEx matches any string of exactly 5 digits, with nothing before and nothing after. Let’s run it and compare its complexity to a homemade string-parsing solution: