Unit tests are small tests that each exercise a unit in such a way that
together they allow a statement about the functional correctness of the
unit under test.
What is a unit? In [1] Michael C. Feathers defines
a unit as: "In procedural code, the units are often functions. In
object-oriented code, the units are classes".
A more paradigm-agnostic approach would be to define a unit as "the
smallest functional unit in your code that can be tested on its own,
outside of any context".
The second definition has a certain appeal, because it can have a positive
effect on the way you design classes, mainly because it fosters a culture
that adheres to the KISS (Keep It Small and Simple) principle: if you treat
individual methods of a class as units, you need to think about how you can
unit-test them. That in turn might lead to a different view of which methods
actually belong in which class and, in consequence, to a finer granularity
of classes, which in turn means better testability.
Why write unit tests at all?
Because they make things so much easier if you or someone else has to refactor
the code later, or has to change it in some way. As long as the unit tests
still come up green, you know you didn't break anything they cover.
Because unit tests are excellent documentation. They tell everybody
in a very concise way "if I use this unit in this way, this is
what is supposed to happen". Do you always explain the behaviour
of a class or a function in some lines of comment preceding its code?
Exactly.
Because it makes your life easier when a regression test fails. With
unit tests in place you can approach the bug with the assumption
"assuming the code covered by the unit tests is ok, what else could have
gone wrong?". The alternative would be long sessions with a debugger,
stepping through the sources, checking return values of methods...
basically the same sort of work that unit tests do but are far better
equipped to do.
Another reason (and one already mentioned in the section above) is that,
once you start securing your code with unit tests, it affects the way
you code. You start to design your architecture and your classes with
testing in mind (well, you have to) and that will usually change the way
you develop code for the better. In [2] Herb Sutter
and Andrei Alexandrescu put it in a nutshell: "Good design is testable,
and design that isn't is bad".
Test-driven development describes the practice of first writing a test,
which fails, and then writing the code that makes it pass. Once the test
passes, you know you are finished writing the code. Kent Beck describes
the technique in detail in [4]. Test-driven
development is certainly a very good way to arrive at clear, easy-to-maintain
code, but especially with quick prototyping it can sometimes be a bit of a
hassle. And anyway, the technique can only be applied when writing
new code.
But of course you can also write unit tests retrospectively, even if that
is sometimes frowned upon, mainly because in practice it then tends not to
get done at all. Still, it is better to write unit tests retrospectively, and
so at least get some unit tests from time to time, than to have no unit tests
at all.
There is actually even an argument for writing unit tests
retrospectively: it can then be done by somebody else, which somewhat
minimizes the risk of making the same wrong assumption twice, first
while coding the method under test and second when writing the test
(finding the right number for the calendar week of January 1st is
a very good example for this sort of problem). Which brings us to the topic
of validation.
One of the things to keep in mind when writing unit-tests is that the best
test is only as good as the data used for input and for validation.
When writing unit-tests retrospectively you might be tempted to look at what
the unit does with the arguments received and then model the tests on that
data. However, this approach inherently assumes that the unit to be put
under test is working correctly at the time you are writing the unit-test.
What if it isn't? Then the defect is not only not detected, it
will also be harder to find: everybody assumes that the unit test knows
what it is doing, so that unit will be the last place anyone looks
when searching for the cause of a bug.
In that sense, even using an existing version of the software as a reference
while doing a complete rewrite can be wrong. You might not look at the source,
but you look at the results and you inherently treat them as correct although
they might not be (this happened to me once: I was doing a complete rewrite
of a module displaying stability data of some sort and used the old module as
a reference for some tests. When I finally presented the finished module
to the person responsible for the feature, he looked at it and said:
"that bit of data is wrong". I pointed out that it was the same as
in the old version, only to learn that then it was wrong in the old version
as well and the old version therefore buggy. So much for references.)
A better approach is to get validation data independently from any existing
code through thorough understanding of the domain, i.e. by reading and
understanding the specification (assuming one exists) and by understanding
the underlying knowledge domain, be that a technological domain or an economic
one (think of a method that calculates interest, for example).
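To make the interest example a little more concrete, here is a minimal sketch. The function name CompoundedBalance, its signature and the helper NearlyEqual are my own assumptions for illustration. The point is only that the expected values come from working the formula by hand (1000 * 1.05 * 1.05 = 1102.50), not from running the code and copying its output.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical unit under test: balance after yearly compounding.
double CompoundedBalance(double principal, double ratePerYear, int years) {
    return principal * std::pow(1.0 + ratePerYear, years);
}

// Floating-point results are compared with a small tolerance.
bool NearlyEqual(double a, double b, double tolerance = 1e-6) {
    return std::fabs(a - b) <= tolerance;
}

void TestCompoundedBalance() {
    // Expected value derived by hand: 1000 * 1.05 * 1.05 = 1102.50
    assert(NearlyEqual(CompoundedBalance(1000.0, 0.05, 2), 1102.50));
    // After zero years nothing has been added to the principal.
    assert(NearlyEqual(CompoundedBalance(1000.0, 0.05, 0), 1000.0));
}
```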
There is, of course, an exception to the rule that you shouldn't use the
existing code as a base to write tests. If other people are dependent on the
way your system behaves, then you have to make sure that, when adding or
changing parts of it, you maintain the current behaviour. Even if that
means preserving wrong behaviour. The reason for this is simple: people
might actually depend on errors, be it that they have developed workarounds
for them or that they are actually exploiting bugs to their advantage. If you
fix these bugs, thus altering the behaviour of the system, it might break other
people's functionality or, worse still, silently lead to wrong data. Rumour
has it that some Windows(tm) versions were backward-compatible to such a
degree that wrong behaviour was preserved just because some widespread
games relied on that wrong behaviour. For a more detailed discussion of this
see [1] (p. 186 ff).
Another problem you might encounter when writing unit-tests retrospectively
is that the code you are writing the tests for does not seem to be easily
testable. You might need to instantiate classes and fill them with meaningful
values because the unit you want to test expects these objects as arguments;
you might not be able to run a unit test offline because the method under test
retrieves data from a server. In other words: when the code was originally
written, it was not written with unit-testing in mind.
There are two concepts that might help you write code that is easily
testable.
The first is Programming To An Interface. Instead of using actual classes in
declarations, for each class you define an interface and you use these
interfaces to declare member fields or arguments. This enables you to develop
so-called "mock objects" that you can exchange with the real ones: production
code uses the real classes, unit tests use simple mock objects wherever the
real class for some reason or another can't be instantiated or used, for
instance because it requires access to a specific server that can't be reached
from where you are testing.
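A minimal sketch of the idea in C++, where an interface is an abstract class with pure virtual methods. All names here (IPriceSource, MockPriceSource, Invoice) are made up for illustration; the real implementation that would talk to a server is deliberately left out.

```cpp
#include <cassert>
#include <string>

// The interface: callers depend on this, not on a concrete class.
class IPriceSource {
public:
    virtual ~IPriceSource() = default;
    virtual double PriceOf(const std::string& article) const = 0;
};

// Mock object for unit tests: returns a canned price, needs no server.
class MockPriceSource : public IPriceSource {
public:
    explicit MockPriceSource(double price) : price_(price) {}
    double PriceOf(const std::string&) const override { return price_; }
private:
    double price_;
};

// Unit under test: it only knows the interface.
class Invoice {
public:
    explicit Invoice(const IPriceSource& prices) : prices_(prices) {}
    double Total(const std::string& article, int quantity) const {
        return prices_.PriceOf(article) * quantity;
    }
private:
    const IPriceSource& prices_;
};

void TestInvoiceTotal() {
    MockPriceSource prices(2.5);
    Invoice invoice(prices);
    assert(invoice.Total("widget", 4) == 10.0);  // 2.5 * 4 is exact in binary floating point
    assert(invoice.Total("widget", 0) == 0.0);
}
```

Production code would pass an object of the real class to Invoice instead of the mock; Invoice itself does not change.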
The second concept is that of KISS - Keep It Small and Simple. If a method
does only one job; if a class offers all the
functionality really needed to do what it promises but nothing more
(delegating tasks to other classes where appropriate), then your tests will
be easy to write. Or seen the other way round: if your tests are hard to
write, this is a good indicator that your code could do with some refactoring
to make it simpler and to break dependencies. [1] deals with this in depth.
Code coverage is a metric for the amount of code covered by unit tests. Or to
put it in simpler terms: the percentage of instructions in your code that are
actually executed by unit tests.
Your aim, of course, should be 100%. Strictly speaking, 100% code coverage in
this sense only means that every instruction is executed by at least one test;
the more demanding goal is that every single execution path in your program is
covered by at least one test.
If you follow the test-driven development approach, this will be pretty easy
to achieve. It actually comes as a by-product of test-driven development.
If you write your unit tests retrospectively, achieving 100% code coverage
might be much harder, especially if the code was not written with unit
testing in mind. In large units full of convoluted code it might not be easy
to establish every single execution path and how to put that execution path
under test. If that is the case though, it is a strong indicator that the
unit could really do with some refactoring towards code that better adheres
to the KISS principle.
Even if 100% code coverage for the whole application seems somewhat
ambitious, not to say outright utopian, you should at least try to achieve
100% code coverage for each individual unit under test. The reason for this
is simple: if less than 100% of your unit is under test, you might get false
positives, i.e. the tests pass even though the unit is buggy, because
the bug resides in the untested bits.
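A small sketch of how that happens. The function DiscountedPriceInCents and its discount rule are invented for this example: orders of 100 items or more are supposed to get 10% off, but the implementation has a bug in exactly that branch.

```cpp
#include <cassert>

// Hypothetical unit with a bug in one branch: large orders should get
// a 10% discount, but the code subtracts 10 cents instead.
int DiscountedPriceInCents(int priceInCents, int quantity) {
    if (quantity >= 100) {
        return priceInCents * quantity - 10;  // BUG: not a 10% discount
    }
    return priceInCents * quantity;
}

// This test only executes the branch for small orders. It passes,
// and the buggy branch is never run.
void TestSmallOrder() {
    assert(DiscountedPriceInCents(200, 3) == 600);
}

// A check that covers the remaining branch. The expected value is
// derived from the rule: 200 * 100 = 20000, minus 10% = 18000.
bool LargeOrderIsCorrect() {
    return DiscountedPriceInCents(200, 100) == 18000;
}
```

With only TestSmallOrder in place the suite is green although the unit is wrong; LargeOrderIsCorrect() returns false and would expose the bug once it is turned into a test.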
How many unit tests you should write for a single function depends on what
possible errors you want or need to catch. If you assume that the caller
of the function always keeps their part of the contract and only calls the
function with valid arguments, then the number of unit tests necessary to
ensure correctness only depends on the algorithm the function performs on
the argument(s).
Let's assume the function under test is bool IsLeapYear( int year ). Then four
tests, with the years 1996, 2000, 2002 and 2100, are sufficient (the first
two years are leap years, the last two are not).
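As a sketch, the four tests could look like this. The implementation shown is just one straightforward way to write the Gregorian rule, and I use plain assert() instead of a test framework to keep the example self-contained.

```cpp
#include <cassert>

// Gregorian rule: divisible by 4, except for century years,
// which must also be divisible by 400.
bool IsLeapYear(int year) {
    if (year % 400 == 0) return true;
    if (year % 100 == 0) return false;
    return year % 4 == 0;
}

// One test per branch of the rule.
void TestIsLeapYear() {
    assert(IsLeapYear(1996));   // divisible by 4, not a century year
    assert(IsLeapYear(2000));   // century year divisible by 400
    assert(!IsLeapYear(2002));  // not divisible by 4
    assert(!IsLeapYear(2100));  // century year not divisible by 400
}
```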
But if you want to be absolutely sure that your code works correctly
even if the contract is broken (which might not even be intended by the
caller), then more tests are necessary. For an integer argument I would
always test for zero as well, because an attempt to divide by zero
will typically crash your program (unless caught). Other prime candidates
for tests are type boundaries and values close to them. For a 16-bit integer
that would be 0, 65535 (the upper boundary of an unsigned 16-bit integer),
65536 (exceeds the maximum capacity of a 16-bit integer) and -32768 (the lower
boundary of a signed 16-bit integer) in
addition to the four tests already mentioned above (the boundaries of an
integer are always interesting because, at least in C++, integer types
can be signed or unsigned, and if you mix the two by mistake you might end
up with rather strange results).
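The "rather strange results" are easy to demonstrate. The sketch below assumes a platform that provides the fixed-width types from <cstdint>; the helper AsUnsigned16 is hypothetical and only mimics what happens when a value ends up in an unsigned 16-bit parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Boundary values taken from the standard library rather than typed in by hand.
constexpr std::int32_t kUnsignedMax = std::numeric_limits<std::uint16_t>::max();
constexpr std::int32_t kSignedMin = std::numeric_limits<std::int16_t>::min();

// Hypothetical helper: the value a callee with an unsigned 16-bit
// parameter would receive. The conversion is well defined and wraps
// modulo 65536.
constexpr std::uint16_t AsUnsigned16(std::int32_t value) {
    return static_cast<std::uint16_t>(value);
}

void TestSixteenBitBoundaries() {
    assert(kUnsignedMax == 65535);
    assert(kSignedMin == -32768);
    assert(AsUnsigned16(65535) == 65535);   // still fits
    assert(AsUnsigned16(65536) == 0);       // one past the maximum wraps to zero
    assert(AsUnsigned16(-32768) == 32768);  // signed minimum turns positive
    assert(AsUnsigned16(-1) == 65535);      // -1 becomes the unsigned maximum
}
```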
For strings you should at least test for the empty string and, at least
in an environment where mixed C/C++ code is used, some really, really long
string to catch buffer overflows with sprintf() and related
functions.
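A sketch of both string tests. FormatGreeting is a made-up function under test, and I have it use snprintf() with a fixed 32-byte buffer so that the long-string test has a well-defined outcome; with sprintf() the same test would write past the end of the buffer.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical function under test: formats a greeting into a
// fixed-size buffer. snprintf() writes at most sizeof(buffer) bytes
// including the terminating null, so long names are truncated.
std::string FormatGreeting(const std::string& name) {
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "Hello, %s!", name.c_str());
    return std::string(buffer);
}

void TestFormatGreeting() {
    assert(FormatGreeting("Ada") == "Hello, Ada!");

    // The empty string is a valid, if unusual, argument.
    assert(FormatGreeting("") == "Hello, !");

    // A really, really long string must be truncated, not overflow.
    const std::string veryLong(10000, 'x');
    const std::string result = FormatGreeting(veryLong);
    assert(result.size() == 31);  // 32-byte buffer minus the terminating null
    assert(result.compare(0, 7, "Hello, ") == 0);
}
```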
That leads us directly to another thing to think about when writing unit
tests: how many arguments may a function under test have?
In [3], Robert C. Martin looks at the problem as
a matter of code readability; he gives no arguments as the optimum, followed
by functions with one and two arguments. Triadic functions (functions with
three arguments) should be avoided if possible and more than three arguments
should not be used at all.
From the perspective of unit tests, the same is true but for a different
reason. And that is: the number of unit tests you will have to write grows
exponentially with the number of arguments.
Think of the example in the paragraph above. For
bool IsLeapYear( int year ) we arrived at eight unit tests for
full coverage of all risks. Now assume a function that takes two arguments,
and for each argument we have eight distinct values that need to be included
in our tests. We would have to write 8 x 8 = 64 tests to cover all possible
combinations! Anyone for a third argument? No? Thought so.
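The arithmetic can be sketched in a few lines. The helper names AllPairs and CombinationCount are my own; the eight values are the ones listed earlier for a single integer argument.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// The eight values discussed above for a single integer argument.
const std::vector<int> kInterestingValues = {
    1996, 2000, 2002, 2100, 0, 65535, 65536, -32768};

// Builds every pairing of values for a function with two arguments.
std::vector<std::pair<int, int>> AllPairs(const std::vector<int>& values) {
    std::vector<std::pair<int, int>> pairs;
    for (int first : values) {
        for (int second : values) {
            pairs.emplace_back(first, second);
        }
    }
    return pairs;
}

// Number of test cases needed when each of argumentCount arguments
// has valuesPerArgument interesting values.
std::size_t CombinationCount(std::size_t valuesPerArgument,
                             unsigned argumentCount) {
    std::size_t count = 1;
    for (unsigned i = 0; i < argumentCount; ++i) {
        count *= valuesPerArgument;
    }
    return count;
}
```

For two arguments AllPairs(kInterestingValues) yields 64 pairs, and CombinationCount(8, 3) already gives 512 for a third argument.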
I once heard of a company that solved the problem by having a script generate
the unit tests automatically. A valid approach, but then a) the script
needs to be able to derive the expected results automatically as well, and b)
your build will take rather more time than it takes to make a coffee. That
defeats the idea of unit tests, because unit tests should be run
at every build, which should be reason enough to keep the number
of tests to the necessary minimum.
There is more to this, of course. You might come up with the idea of
encapsulating data in a struct or value class. Unfortunately that doesn't
count as reducing the number of arguments. In this context, for
"argument" read: every single variable that is used inside the
function. This includes variables that are kept together in structs or value
classes; it also includes any global variables or member fields used by the
function.
Which brings us to a nice conclusion: if you start relying on unit tests,
it will fundamentally alter the way you design your code. You will start
to avoid monster methods with plenty of arguments just so that you don't
have to write so many unit tests. That will result in cleaner, more
concise code.
Finally, some words on what a unit test does and what it does not do. And that
means stating the naked, ugly truth: no number of unit tests can guarantee the
correctness of the unit under test. All a unit test does is compare data
generated by some function against data provided by you, the author of the
unit test.
There are three possible ways to arrive at unit tests that pass even if the
method under test is not working correctly:
First, your knowledge of the domain might be insufficient. This is especially
bad because you will write the unit tests under the same faulty assumptions
as the code under test.
Take leap year calculation, for example. A year is a leap year if it is
divisible by four without remainder, unless it is also divisible by 100; in
that case it is a leap year only if it is also divisible by 400.
The year 2000 is a leap year, 2100 is not. However, if you only
know about the rule of the division by four, you will write both the code and
the test accordingly, and
assertEquals(true, calendar.isLeapYear(2100)) will pass,
although the result is wrong.
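The same situation as a C++ sketch. IsLeapYearNaive stands for code written with the incomplete rule; both function names and the use of plain assert() are assumptions made for this example.

```cpp
#include <cassert>

// Implementation written knowing only the "divisible by four" rule.
bool IsLeapYearNaive(int year) {
    return year % 4 == 0;
}

// Test written under the same misconception: it passes,
// although 2100 is not a leap year.
void TestWrittenWithSameMisconception() {
    assert(IsLeapYearNaive(1996));
    assert(IsLeapYearNaive(2100));  // green, but the expectation is wrong
}

// Check derived from the complete Gregorian rule: 2000 is a leap
// year, 2100 is not. The naive implementation fails it.
bool PassesIndependentCheck() {
    return IsLeapYearNaive(2000) && !IsLeapYearNaive(2100);
}
```

Only the check whose expected values come from an independent source reveals the problem: PassesIndependentCheck() returns false.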
Another reason for getting the (wrong) message that everything is fine is
the sheer bad luck of repeating a typo (using cut & paste between your
code and your tests greatly increases the chance of this happening). Consider
the following (very bad) implementation of isLeapYear(): if year in
(1988, 1992, 1998, 2000, 2004, 2008, 2012) return true else return
false.
With the typo copied into the test, assertEquals(true, calendar.isLeapYear(1998))
will pass although 1998 is definitely not a leap year.
The third possibility of getting false positives arises from test-driven
development, especially if you follow the practice to the letter. In
[4], the seminal book on the subject, Kent Beck
writes: "Make the test work quickly, committing whatever sins necessary
in the process" (a few pages later he gives a concrete example).
Translated into our little leap year method, the following implementation
would make the unit test
assertEquals(true, calendar.isLeapYear(2000)) pass:
boolean isLeapYear(int year) { return true; }
Why is this bad? After all, Kent Beck goes on to define the next step as
"Refactor - Eliminate all of the duplication created in merely getting
the test to work". Because there is a small timeframe in which the test
results in a false positive. And in that timeframe you might get interrupted.
Say you are implementing a class with a number of methods, you have written a
unit test for each method and you have supplied an initial (fake)
implementation to make the tests pass. In comes your boss: "Red alert!
Our latest release is throwing segmentation faults like hell! Drop whatever
you are doing and try to find the cause and fix it!" Chances are that, when
you finally go back to the code you were developing at the time of the
interruption, you will miss a fake implementation or two. And the unit
tests won't tell you that you did.
How can you avoid this trap? Apart from omitting the false positive
and never allowing the test to go green unless the functionality has
been fully implemented, there are some practices that can be of help
here. One is to only work on one method at any given time. Never have more
than one unit test go green with a fake implementation. And if your boss
interrupts you, at least take the time to leave some artefact in the code
that will break the build. That way, even if a week has passed before you
can work again on the code, the compiler will remind you of the last method
you were working on.
Another practice is not to supply just one initial unit test, but two.
You still get one unit test to pass, telling you everything else has
been set in place, but the second test will (hopefully) fail and tell
you that the implementation of the functionality is still faulty.
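A sketch of the two-test practice, again with plain C++ and invented helper names. The fake implementation returns a constant, so it can satisfy only one of two tests that expect opposite results.

```cpp
#include <cassert>

// Fake first implementation, in the spirit of "make the test work quickly".
bool IsLeapYear(int /*year*/) {
    return true;
}

// First initial test: goes green with the fake and shows that
// everything else (build, test runner, wiring) is in place.
bool FirstTestPasses() {
    return IsLeapYear(2000) == true;
}

// Second initial test: expects the opposite result, so a constant
// return value cannot make both tests pass.
bool SecondTestPasses() {
    return IsLeapYear(2002) == false;
}
```

As long as the fake is in place, SecondTestPasses() returns false and keeps the suite red, interruption or not.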
[1] "Working Effectively With Legacy Code", p. 12. Michael C. Feathers, Prentice Hall, 2010
[2] "C++ Coding Standards", p. 139. Herb Sutter, Andrei Alexandrescu, Addison-Wesley, 2010
[3] "Clean Code", p. 40 ff. Robert C. Martin, Prentice Hall, 2009
[4] "Test-Driven Development By Example", p X. Kent Beck, Addison-Wesley, 2006