Flow Your Tests Like Your Code

My co-worker Belinda Liu turned to me and said, “I don’t like these tests at all; they’re hard to follow, and I’m not sure what they’re testing.”

I looked at the tests that I had spent much of yesterday afternoon working on. She was right: they were hard to follow (even for me, who had written some of them!).

How had we gotten here? Our code was straightforward, but our tests were byzantine (excessively complicated). We identified two problems:

the tests didn’t line up with the code
the tests were deeply nested, but the code wasn’t.

1. Lining Up the Tests with the Code

Belinda and I work on Cloud Foundry’s command line interface (CLI), a Golang-based utility which allows end users to interact with Cloud Foundry (e.g. to push an application). We write our tests with Ginkgo, a BDD-style Golang testing framework.

In this post we’ll explore a typical subcommand, stage, and its corresponding unit tests to determine how well the code lines up with our tests (hint: it doesn’t).

In the screenshot below, the code for the stage subcommand is on the left. On the right is a minimap of the subcommand’s Ginkgo test code. (we use a minimap rather than the complete Ginkgo code because, at 300+ lines, the tests aren’t easy to digest). The pink arrows show where a particular line of code is tested. We’re off to a good start (the first error condition is the first test), but we rapidly spiral into chaos. There seems to be no rhyme or reason why a particular test appears in a particular spot.

This disorganization exacts a cost: the tests are hard to follow, and coverage is difficult to determine.

The code is to the left, and the tests are to the right. Note that the tests are much too large (>300 lines) to fit in the image, so we use a minimap. The pink arrows indicate where in the test file that particular line of code is tested.

After refactoring the test, the pink arrows are organized (see image below).

Keen-eyed viewers may notice that our code has grown (it has; we added a feature, stage should not require a package). Our test refactor was opportunistic, a drive-by: we were in the codebase adding a feature, and we took an extra quarter-hour to refactor/reorganize our tests.

Lack of code coverage is much easier to identify when the tests are patterned after the code. In fact, while writing this post, I realized that we had no test coverage for the GetNewestReadyPackageForApplication() error path. Prior to the refactor, it would have gone unnoticed.

2. Un-Nesting the Tests

The tests were nested more deeply than the code, as we found out by running egrep "When\\(|It\\(|Describe\\(" command/v7/stage_command_test.go | sed 's/$/})/'.

Below is the layout of our test code before the refactor. Our code was only one-level deep—our flow control statements, which were limited to if, weren’t nested. But our tests were as many as three levels deep:

	When("checking target fails", func() {})
		It("displays the experimental warning", func() {})
		It("returns an error", func() {})
	When("the user is logged in", func() {})
		When("the logging does not error", func() {})
			When("the staging is successful", func() {})
				It("outputs the droplet GUID", func() {})
				It("stages the package", func() {})
				It("displays staging logs and their warnings", func() {})
			When("the staging returns an error", func() {})
				It("returns the error and displays warnings", func() {})
		When("the logging stream has errors", func() {})
			It("displays the errors and continues staging", func() {})
		When("the logging returns an error due to an API error", func() {})
			It("returns the error and displays warnings", func() {})

After the refactor, we see that our test code is now two levels deep. “Why not one level deep?” you may ask. The answer is that the feature we were implementing—the reason we were modifying our code—introduced a new level in our code: it now had a nested if block. Our code now had two levels, which matches the two levels of our test.

Our newly-refactored tests mirror our code, so much so that by looking at them we get an intuitive sense of how our code was written:

	When("checking target fails", func() {})
		It("displays the experimental warning", func() {})
		It("returns an error", func() {})
	When("the package's GUID is not passed in", func() {})
		It("grabs the most recent version", func() {})
		When("It can't get the application's information", func() {})
			It("returns an error", func() {})
	When("the logging stream has errors", func() {})
		It("displays the errors and continues staging", func() {})
	When("the logging returns an error due to an API error", func() {})
		It("returns the error and displays warnings", func() {})
	When("the staging returns an error", func() {})
		It("returns the error and displays warnings", func() {})
	It("outputs the droplet GUID", func() {})
	It("stages the package", func() {})
	It("displays staging logs and their warnings", func() {})

3. In Practice: Every “if” Should Have a “When”

Every if err != nil { should have a corresponding When("XXX returns an error", func() {.

Below is an example from the Cloud Foundry CLI bind-security-group:

Code:

securityGroup, warnings, err := cmd.Actor.GetSecurityGroup(cmd.RequiredArgs.SecurityGroupName)
cmd.UI.DisplayWarnings(warnings)
if err != nil {
	return err
}

Unit Test:

It("gets the security group information", func() {
	Expect(fakeActor.GetSecurityGroupCallCount).To(Equal(1))
	securityGroupName := fakeActor.GetSecurityGroupArgsForCall(0)
	Expect(securityGroupName).To(Equal(cmd.RequiredArgs.SecurityGroupName))
})

It("displays the warnings", func() {
	Expect(testUI.Err).To(Say(getSecurityGroupWarning[0]))
})

When("an error is encountered getting the provided security group", func() {
	var expectedErr error

	BeforeEach(func() {
		expectedErr = errors.New("get security group error")
		fakeActor.GetSecurityGroupReturns(
			resources.SecurityGroup{},
			v7action.Warnings{"get security group warning"},
			expectedErr)
	})

	It("returns the error and displays all warnings", func() {
		Expect(executeErr).To(MatchError(expectedErr))
	})
})

Takeaways

It’s easier to navigate the tests when they’re patterned after the code. Much easier.
Refactoring the unit tests can be done opportunistically. Sure, one can choose to do a grand refactor of the entire suite of unit tests, but that’s not always an option, and the ability to do these refactors piecemeal, when you’re modifying the code in question, is a definite advantage.
Refactoring a test is not time consuming, and can be done in as few as fifteen minutes.

References

Code: Align the happy path to the left edge, Mat Ryer, August, 2016

Practical Go: Real world advice for writing maintainable Go programs, Section 4.3 Return early rather than nesting deeply, Dave Cheney, April, 2019

The title, Flow Your Code Like Your Tests, is an homage to Philip K. Dick’s novel, Flow My Tears, the Policeman Said. The title is one of the few examples in the English language where the verb “flow” is used in the imperative mood.

Acknowledgements

Belinda Liu: inspiration, suggestions & edits; Mona Mohebbi: code samples.

Corrections & Updates

2020-04-02

We include a snippet of code with its corresponding test.

1. Lining Up the Tests with the Code#

2. Un-Nesting the Tests#

3. In Practice: Every “if” Should Have a “When”#

Takeaways#

References#

Acknowledgements#

Corrections & Updates#