ReactTDD.com

Testing regular expressions: a kata

Regular expressions are a flexible mechanism for matching string formats. Test-driving them is a fun exercise. Here’s a regular expression kata you can try out.

As an example, let’s build a matcher for phone numbers. Here’s a first test.

it("matches 1234567890", () => {
  expect(isPhoneNumber("1234567890")).toBeTruthy();
});

Alright, we can write the simplest thing to make that pass:

const isPhoneNumber = () => true;

We need to triangulate to move to a real implementation. Let’s choose an example which will help us do that.

it("does not match qwertyuiop", () => {
  expect(isPhoneNumber("qwertyuiop")).toBeFalsy();
});

Okay, so if we were being strict, the simplest thing to make this pass would be this:

const isPhoneNumber = (number) => number === "1234567890";

But let’s just jump to a regular expression solution:

const isPhoneNumber = (number) => number.match(/[0-9]/);

Now, is that the simplest thing that could possibly work? Assuming that we’re ignoring the hardcoded solution, then yes it probably is: the examples we’ve chosen mean that this regular expression solution is probably the shortest piece of code that we could have written.

And in fact, you’ll note that it’s actually shorter than the hardcoded version.

(Conversely, you could argue that an equality comparison is simpler than a method invocation.)
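One detail about the match-based implementation is worth spelling out: String.prototype.match returns a match array or null rather than a boolean, which is why the tests use toBeTruthy and toBeFalsy. Here is a minimal sketch of the difference, with RegExp.prototype.test as a boolean alternative. The names withMatch and withTest are hypothetical, used only for illustration.

```javascript
// The same pattern, checked two ways. Both helper names are illustrative.
const withMatch = (number) => number.match(/[0-9]/);
const withTest = (number) => /[0-9]/.test(number);

// match returns null when nothing matches, and an array when something does.
console.log(withMatch("qwertyuiop")); // null, which is falsy
console.log(Array.isArray(withMatch("1234567890"))); // true: an array, which is truthy

// test always returns a real boolean.
console.log(withTest("1234567890")); // true
console.log(withTest("qwertyuiop")); // false
```

Either form works with the tests in this kata. The boolean form would also let you use stricter matchers such as toBe(true), if you prefer them.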

What about this test?

it("does not match letter prefix", () => {
  expect(isPhoneNumber("a1")).toBeFalsy();
});

This will fail. Let’s make it pass, using the ^ symbol to match the start of the string:

const isPhoneNumber = (number) => number.match(/^[0-9]/);
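To see why the anchor makes the difference, here is a small sketch comparing the two patterns. The variable names are hypothetical. Without ^, the pattern is satisfied by a digit anywhere in the string; with it, the digit has to be the very first character.

```javascript
// Illustrative names; the patterns are the two versions from the steps above.
const unanchored = /[0-9]/;
const anchoredStart = /^[0-9]/;

console.log(unanchored.test("a1")); // true: the "1" is found after the "a"
console.log(anchoredStart.test("a1")); // false: the string starts with "a"

// Only the first character is constrained so far.
console.log(anchoredStart.test("1a")); // true
```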

Marvellous. Now for the postfix.

it("does not match letter postfix", () => {
  expect(isPhoneNumber("1a")).toBeFalsy();
});

Now we need the $ symbol to match the end of the string:

const isPhoneNumber = (number) => number.match(/^[0-9]$/);

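Oh! But now we have our first test failing, because ^[0-9]$ only matches a string that is exactly one digit long. We also need to add the + quantifier, which means “one or more of the preceding item”, so that digit strings of any length are allowed: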

const isPhoneNumber = (number) => number.match(/^[0-9]+$/);
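The effect of the quantifier can be sketched by running a few strings through each variant. The variable names are hypothetical, and the * variant is not part of the kata: it is included only to show how it differs from +.

```javascript
// Illustrative names for three variants of the anchored pattern.
const singleDigit = /^[0-9]$/; // exactly one digit between start and end
const oneOrMore = /^[0-9]+$/; // one or more digits
const zeroOrMore = /^[0-9]*$/; // zero or more digits, so "" matches too

const candidates = ["1", "1234567890", "", "1a"];

// Build one row of results per candidate string.
const results = candidates.map((candidate) => ({
  candidate,
  singleDigit: singleDigit.test(candidate),
  oneOrMore: oneOrMore.test(candidate),
  zeroOrMore: zeroOrMore.test(candidate),
}));

results.forEach((row) => console.log(JSON.stringify(row)));
```

Only the + variant accepts "1234567890" while still rejecting the empty string.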

What’s the next test?

it("matches 12345 67890", () => {
  expect(isPhoneNumber("12345 67890")).toBeTruthy();
});

Alright, we want to allow spaces. Now we just add a single character into the regular expression:

const isPhoneNumber = (number) => number.match(/^[0-9 ]+$/);

Again, the simplest thing that can possibly work. Just a single character change! Now, what about hyphens?

To keep some notion of test independence, this one has no spaces:

it("matches 12345-67890", () => {
  expect(isPhoneNumber("12345-67890")).toBeTruthy();
});

Alright, we just add in the hyphen. Note that because - is a special character inside a character class (it’s used to form the range 0-9), we escape it:

const isPhoneNumber = (number) => number.match(/^[0-9 \-]+$/);
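Escaping is one way to get a literal hyphen into a character class. Another is to put the hyphen last, where it cannot form a range. The sketch below compares the two and shows what happens when an unescaped hyphen lands between two other characters. The variable names are hypothetical, and the accidentalRange pattern is a deliberately broken example rather than anything from the kata.

```javascript
// Two equivalent ways to include a literal hyphen (illustrative names).
const escaped = /^[0-9 \-]+$/;
const trailing = /^[0-9 -]+$/; // a hyphen at the end of the class is literal

console.log(escaped.test("12345-67890")); // true
console.log(trailing.test("12345-67890")); // true

// An unescaped hyphen between two characters forms a range instead.
// [ -0] covers every character from space (U+0020) to "0" (U+0030).
const accidentalRange = /^[ -0]+$/;

console.log(accidentalRange.test("+")); // true: "+" falls inside that range
console.log(accidentalRange.test("5")); // false: "5" comes after "0"
```

Escaping the hyphen, as the kata does, keeps the pattern safe if more characters are later added to the end of the class.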

Next, brackets (parens!):

it("matches (123)4567890", () => {
  expect(isPhoneNumber("(123)4567890")).toBeTruthy();
});

It’s getting easy now:

const isPhoneNumber = (number) => number.match(/^[0-9 \-()]+$/);

That’s it, the finished implementation!
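If you want to keep the kata going, it can help to probe what the pattern accepts beyond the examples tested so far. The sketch below is not part of the original exercise: the extra candidate strings are hypothetical, chosen only to show that a character class checks which characters appear, not how they are arranged.

```javascript
// The final implementation from the kata.
const isPhoneNumber = (number) => number.match(/^[0-9 \-()]+$/);

// Hypothetical extra candidates, for illustration only.
const probes = ["((((", "----", " ", "", "+1234567890"];

// Convert each match result to a boolean so the output is easy to read.
const accepted = probes.map((candidate) => Boolean(isPhoneNumber(candidate)));

probes.forEach((candidate, i) =>
  console.log(JSON.stringify(candidate), accepted[i])
);
```

The first three probes are accepted because every character belongs to the class, while the empty string and the leading + are rejected. Whether any of these deserves a test of its own depends on what you decide a phone number should look like.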

As a final flourish, if you wanted you could refactor your test suite into two lists representing valid and invalid numbers:

const validPhoneNumbers = [
 "1234567890",
 "12345 67890",
 "12345-67890",
 "(123)4567890",
];

const invalidPhoneNumbers = [
 "qwertyuiop",
 "a1",
 "1a",
];

validPhoneNumbers.forEach(candidate => {
 it(`matches ${candidate}`, () => {
   expect(isPhoneNumber(candidate)).toBeTruthy();
 });
});

invalidPhoneNumbers.forEach(candidate => {
 it(`does not match ${candidate}`, () => {
   expect(isPhoneNumber(candidate)).toBeFalsy();
 });
});

But your mileage may vary. You may prefer to keep the initial form of your tests over this. It’s arguable which version is simpler.

One last point: this isn’t the only way to do this exercise. You could have arrived at the solution in many different ways.

Notice in particular that the first test broke when we introduced the $ symbol. Can you figure out a strategy for re-writing these tests so that none break as you go through?

— Written by Daniel Irvine on August 25, 2022.