How Bob used implementation details in tests

Is it okay to use implementation details in tests? And if not, why not?

Have you met Bob? Probably yes, if you have read "What Real Madrid can teach us about leadership" or my recent article about using "silent grouping" in estimation.
If you haven't – don't worry, I'll introduce him now.

Here is Bob:

One interesting fact about Bob is that he is really into testing:

But something really odd happened to him recently. He wrote a new feature, covered its logic with tests and sent it to code review.

Can you imagine how dumbfounded he was when Jane, his colleague from the picture above, rejected the pull request with this comment:

And it was just the beginning of a long conversation.

Let's challenge best practices

Bob quickly responded to Jane's comment and started refreshing the page, waiting for an answer.

"You know that GitHub will notify you about all new comments, right?" Jane saw what he was doing and laughed.
"Do you have time now?"

Bob nodded: it was his last task, and he had all the time in the world.

Jane moved her chair closer to Bob's desk and looked at the screen once again. "Any real reason..." she said thoughtfully. "I don't know, it's just good practice, isn't it? People wouldn't do it if it was bad."

"People recommended bloodletting not too long ago," Bob replied with a smile. "And CSS-in-JS".

Jane laughed, "CSS-in-JS is cool, come on. But you have a point, let's find the reason".

She took the keyboard, opened Google and started searching.

Implementation details in tests

She pressed enter and opened several links from the results.

The majority of the pages gave the same advice – don't touch anything private in tests. But as Bob had suspected, they could not find any clear rationale for it.

"Well...", Jane started. "These people all ask the same question – what is the point of testing private stuff if you can test the public interface? Public methods will call private ones anyway, and eventually, everything will be tested".

"The reason is the same as for writing any unit-tests – to test small pieces of code in isolation," Bob replied. "If a public method calls five private functions, I'd like to have separate tests for those functions, to see which of them are failing."

"But why?"

"Because otherwise, I'm gonna have a red test for the public method, with no clue what went wrong. And I'll need to debug it to locate the exact part which caused the problem. But we get more granular data if we have tests for the private functions. If one of them fails, you'll see exactly where it broke".

"You're probably right..." said Jane, her mouth tightening.
What Bob said made sense, but she wasn't entirely convinced. "I will approve your pull request".

Her colleague saw the scepticism in her eyes, which wasn't the outcome he had hoped for. "Do you still disagree?"

"Not really... your point sounds legit, but why do all these people keep advocating against using private methods in tests? I mean, someone like you could just come and tell everybody why we need it... Unless they have some strong arguments".

Bob's first reaction was to shrug his shoulders and merge the pull request. But then a good idea came to his mind. "Do you want to ask Lucas?"

"Oh, yes!" she exclaimed. "Weird that we didn't think about it earlier".

And they went to talk to their team lead.

Let's ask the team lead

"We just want to know what you think about using private stuff in tests," Jane started when the three of them entered the meeting room.

"What is private stuff?" Lucas didn't seem to understand the question.

"Private is something which is not public," Bob declared with a philosophical expression.

Jane continued, "Like private methods, for example. Many people warn against testing them, but Bob has a point why it can be useful".

"But how would you do it, when these methods are private, and you don't have access to them?"

"Well..." Jane tried to explain. "There are still many ways, right?
Some languages have workarounds to reach private elements: in Java, for example, you can relax a member's visibility and mark it with the @VisibleForTesting annotation. Languages like Python and JavaScript may not have real privacy at all".
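
What Jane means about JavaScript fits into a few lines. The Counter class below is hypothetical and has nothing to do with the team's code; it only illustrates that an underscore prefix is a naming hint rather than an access restriction. (Modern JavaScript also has #private class fields, which the language does enforce, but the code in this story uses the underscore style.)

```javascript
// Hypothetical class, used only to illustrate underscore-style "privacy".
class Counter {
  constructor() {
    // The underscore says "internal, please don't touch". Nothing enforces it.
    this._count = 0;
  }

  increment() {
    this._count += 1;
    return this._count;
  }
}

const counter = new Counter();
counter.increment();

// Outside code (a test, for instance) can still read and overwrite the field.
counter._count = 41;
console.log(counter.increment()); // 42
```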

"Privacy by convention," Bob added.

"Okay, I got it," said Lucas with the look as if he got it. "The short answer – it depends".

Bob rolled his eyes. He had recently noticed a correlation – the more experienced people are, the more frequently they use the phrase 'it depends'. He didn't like it, but at least he knew how to look more knowledgeable.
"Depends on what?" he asked.

"On the system you're building," Lucas said and took the marker. "Unit tests are just code, and all code has its cost."
He went to the whiteboard and started writing:

"Cow?" Bob laughed and put two fingers to his head, imitating horns.

Lucas gave him a disappointed look and rewrote the formula:

"It's the cost of writing. If you develop a prototype which is going to be thrown away as soon as it's demonstrated, you don't really care about future maintenance, you aim to produce correct code as fast as possible.
On the contrary, if you're working on something with a potentially long lifespan, you may want to invest more time in writing the code if that can save future costs".

Jane and Bob nodded synchronously.

Lucas continued, "Your decision whether to use private implementation details in tests or not depends on what you're trying to optimize for. Why do we need unit-tests at all?"

Bob felt that it was a rhetorical question, but Lucas seemed to wait for an answer. Jane started first.

"To not break anything if we change code in the future," she replied.

"One," Lucas said and underlined the phrase 'Cost of Maintenance' in his formula. "What else?"

"I do it to not test my changes manually all the time," Bob decided to try it too.

"And here is your cow, man," Lucas winked and drew the line under 'Cost of Writing'.
"If we don't need to run it manually after every change, writing code will take much less time," he continued. "There are many other reasons for writing tests, but let's not overcomplicate for now."

"So, how is it related to private methods in tests?" Jane asked, confused.

Lucas didn't answer but started drawing something else on the board:

"The problem is not only about methods," he finally said. "There are many ways you can use implementation details in tests, and we can group them into two large categories. Does it make sense?"

Jane and Bob looked at each other. "Hm... sort of..." Jane said.

"Okay, okay," Lucas realized that he might have made it too complicated. He put back the marker and opened the laptop. "It's gonna be better now – I have a perfect example".

Let's find some old code

The "perfect example", as Lucas put it, was written by him three years ago. It took a while to find it in the repository's history because the code had been changed quite a bit since then.

"Don't laugh, it's quite old," he hid unrelated functions from the file and turned the screen towards his colleagues.

https://gist.github.com/elergy/...

"What was it for?" Jane asked.

"It was supposed to keep track of available books in the library," Lucas answered. "Physical books, not for Kindle. A library keeps information inside the internal property _books and exposes two public methods: addBook and takeBook. Makes sense so far?"

Jane and Bob didn't have anything to ask.
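
The gist is not reproduced here, so below is a minimal sketch of what such a class might look like, based only on Lucas's description. The record shape ({ author, title, available }), the method signatures, and the simplification of one copy per title are assumptions, not the original code.

```javascript
// A sketch of the library described above. Everything except the names
// _books, addBook and takeBook is an assumption made for illustration.
class Library {
  constructor() {
    // Internal state, "private" by naming convention only.
    this._books = [];
  }

  addBook(author, title) {
    this._books.push({ author, title, available: true });
  }

  takeBook(author, title) {
    const book = this._books.find(
      (b) => b.author === author && b.title === title
    );
    // Unknown book, or a known book that is already taken.
    if (!book || !book.available) {
      return null;
    }
    book.available = false;
    return book;
  }
}

const library = new Library();
library.addBook("George Orwell", "1984");
console.log(library.takeBook("George Orwell", "1984")); // the book record
console.log(library.takeBook("George Orwell", "1984")); // null: already taken
console.log(library.takeBook("Jane Austen", "Emma")); // null: unknown book
```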

Lucas continued, "Time to test it. Which tests would we need to write to cover the method takeBook? I see at least two: one where we return null, and one where we find a book and return it".

He picked up the marker again and wrote these two cases:

"Sounds valid?" he asked.

"Kinda..." Bob replied uncertainly and examined the code again:

gist.github.com/elergy/...

"We can make it more specific," he said. "We return null if the book does not exist at all, right? At the same time, we return null if we request a valid book, but there are no available copies in the library."

"Oh, yes!" Lucas agreed. "We need many cases! In fact, we do have many cases, check it out."
He opened a large file with tests and randomly picked one test-case in the middle:

gist.github.com/elergy/...

"Here, we test that we cannot take a book if it's already taken.
Firstly, we add some books into the library: two books written by Stephen King, one – by George Orwell.
Then we take one book from the library and check that we can't do it once again. Sounds reasonable?"

"Yep," Bob said impatiently. He was used to watching YouTube at 2x speed, and now he regretted that he couldn't speed up Lucas.

Lucas went on, "But I didn't really like the way we were writing it. The preparation phase looked cumbersome: this case takes five steps, and others might take even more. So I decided to try another approach".

He took a red marker and circled the word "Mocking" on his first drawing:

"Instead of calling several methods to achieve the state we needed, we decided to mock the final state – just override the property _books."
He returned to the laptop and opened a newer revision in the repository. The test looked slightly different:

gist.github.com/elergy/...

"Now, the preparation phase has only two steps instead of five: create a new library and change library._books to whatever we want. And it's always gonna be two steps, no matter which situation we need to replicate. We wrote many tests following the same approach until we saw that it wasn't as good as we thought..."

Bob looked incredulous. The test seemed perfectly decent. "Why not good? What do you mean?"

Lucas spread his hands, "They quickly became unreliable. The tests were easy to write, even for some very complex examples.
But we kept changing the code as well, and eventually these mocks went out of sync. This is how the function addBook looked some time later, do you see the difference?"

gist.github.com/elergy/...

Jane found it first, "Yes, the fields author and title aren't at the top level anymore – now they're inside info".

"Which makes our old mock invalid," Lucas confirmed.
He opened the mocked state again:

gist.github.com/elergy/...

Bob laughed, "So, you're saying that the test would still pass because library.takeBook cannot find the requested book and returns null?"

Lucas smiled, "Exactly! And we have a weird situation – the test is green, but it doesn't check the scenario it was supposed to verify. And if we break the code, the tests will not warn us".
He returned to the code and removed the check for !book.available:

gist.github.com/elergy/...

"This code will return any known book, regardless of whether it's available or not. We can take the same book one million times, and our tests will be silent.
But look at the first version of the tests again:"

gist.github.com/elergy/...

"It only uses the publicly available methods.
If we change private implementation details, this test will continue working just as before, giving us enough confidence".

Let's talk about static typing

"Wait a minute," Bob understood the problem but didn't fully agree with the conclusion. "Isn't it a problem of JavaScript? If you used Java or even TypeScript, you wouldn't miss this problem – the language itself would flag that the mock has an incorrect type."

"Actually, that's a good point," Lucas agreed. "How could TypeScript protect us?"

"It just wouldn't allow us to run these tests until we fix outdated mocks."

Lucas skimmed through the file with tests, "We have sixty-two test cases. You're saying that we need to update mocks in all of them, right?"

"Of course!" Bob exclaimed. "How else?"

Lucas went to the board and highlighted Cost of Maintenance:

"There is no doubt that this situation is much better than having unreliable tests. But let me say it aloud, so you can hear how ridiculous it sounds – we changed two lines of private code, and now we need to fix sixty-two test cases.
It would never happen if we used only public API in tests, without mocking private details."

Jane sighed deeply, "You know, I was in exactly this situation a week ago. I slightly changed the shape of the Redux state, and then spent the whole day updating tests, which had outdated mocks..."

Lucas looked interested, "Oh, Redux is a huge topic! Maybe you can check our code and think about how to avoid this problem in the future? You can even write a blog post about it later."

Let's take a break and summarize

"That was about mocking," Lucas concluded and pointed at the board, where the word was still circled.

"You see what happens when you mock private implementation details in tests, right?"

"Yep," Bob said. "When we change code, even slightly, we may have to spend too much time fixing tests which didn't have to be broken."

"Only if you are lucky," Lucas reminded him. "These tests are much less reliable, because we may not notice that mocks are out of sync. You think that everything is tested, while, in fact, it's not."

"What about asserting?" Jane asked. "Is it okay to call private methods to verify that they work correctly?"

Lucas checked his watch, "Have to run to another meeting, can we continue later?"

"Can you just tell us if it's okay or not?"

Lucas looked at the board as if trying to collect his thoughts and come up with a brilliant answer. Then he shifted his gaze to Bob and sighed wearily.

Credit for the camera on the cover picture goes to Benedikt Geyer on Unsplash
