Building trust on property-based testing

This is the third blog post on property-based testing. For background, be sure to read the first and second posts.

Some of the common concerns from new users of property-based testing are:

How can we trust a process that uses randomly generated tests?
How can we ensure that both the typical cases and the edge cases that we usually test explicitly using traditional unit tests will be among the randomly generated tests?
How can be verify that the properties that we define are themselves bug free?

Ensuring that edge cases are tested

Edge cases are quite common and often obvious when we look to predicate input arguments. If an argument is a list, is an empty list properly handled? If another argument is an integer, does the predicate behaves correctly when passed a zero? What about a negative integer? If a float argument must be in a given range, are both the minimum and maximum values accepted?

The arbitrary library category defines default edge cases for common types, which we can query via the type object (the category complements the object, adding new predicates to it). For example:

| ?- type::edge_case(atom, Value).
Value = ''
yes

| ?- type::edge_case(byte, Value).
Value = 0 ? ;
Value = 255
yes

| ?- type::edge_case(list, Value).
Value = []
yes

Edge cases are tested before the QuickCheck implementation switches to generating random tests. Their use is controlled by the ec(Boolean) option for the quick_check/2-3 predicates and test dialects. Its default value is true. It can be set to false in those rares cases where for some reason we want to disable the use of edge cases.

When the default edge cases don’t provide the necessary coverage, e.g. when defining new types and arbitrary value generators for those types, we can add new edge cases by defining clauses for the edge_case/2 multifile predicate. For example, assuming we are defining a new age type with range [0, 121]:

:- multifile(arbitrary::edge_case/2).
arbitrary::edge_case(age, 0).
arbitrary::edge_case(age, 121).

Although this is a simple example, the construction of edge cases may be non-trivial computations. In any case, we can always QuickCheck the edge_case/2 definitions:

| ?- lgtunit::quick_check(type::edge_case({age}, -between(integer,0,121))).
% 100 random tests passed, 0 discarded
% starting seed: seed(3172,9814,20125)
yes

The {age} notation tells QuickCheck that the argument have a fixed value. The second argument uses the default between/3 type to check that the returned value is an integer in the valid range.

Labelling generated tests

Now that we know how to ensure that edge cases are tested, what about the randomly generated tests? When writing traditional unit tests, is common practice to cover typical cases. But can we get the same assurance using property-based testing? The quick_check/2-3 predicates and test dialects support a l(Closure) option that allows us to label the generated tests. For example, assume the following predicate definition:

label(List, _, Label) :-
    length(List, Length),
    (   Length mod 2 =:= 0 ->
        Label = even
    ;   Label = odd
    ).

In the previous blog post, we used as example an every_other/2 predicate that returns every other element in the list. The original buggy version wrongly assumed an input list with an even number of elements and was diagnosed and fixed using QuickCheck. The fixed version is:

every_other([], []).
every_other([H| T], L) :-
    every_other(T, H, L).

every_other([], X, [X]).
every_other([_| T], X, [X| L]) :-
    every_other(T, L).

To ensure that QuickCheck is indeed generating tests using lists with both an even and an odd number of elements, we can use the label/3 predicate by passing the option l(label). The label closure will be called, for each generated test, by appending the two predicate arguments used by the test call to the every_other/2 predicate plus the labels argument (the labelling predicate can return a single test label or a list of test labels):

| ?- lgtunit::quick_check(
         every_other(+list(integer), -list(integer)),
         [l(label), n(10000)]
     ).
% 10000 random tests passed, 0 discarded
% starting seed: seed(3172,9814,20125)
% even: 5112/10000 (51.120000%)
% odd: 4888/10000 (48.880000%)
yes

As the printed stats show, there’s a reassuring balanced split between lists with even and odd lengths. But is this enough? Note that we didn’t check e.g. that all elements in the output list are present in the input list or that the size of the output list is half the size of the input list. See our previous post for a full example on defining properties for the predicates being tested.

Restricting generated tests

Assume that we want to restrict the randomly generated tests to those where the input list have an odd number of elements, which triggered the bug in the original definition of the every_other/2 predicate. We can use the pc(Closure) option for this purpose. The clousure will work as a pre-condition and will be called, for each generated test, by appending the two predicate arguments that would be used by the test. If the call fails, the test is discarded and a new candidate test is generated. Thus, with the following predicate definition:

condition(List, _) :-
    length(List, Length),
    Length mod 2 =:= 1.

we can call:

| ?- lgtunit::quick_check(
         every_other(+list(integer), -list(integer)),
         [pc(condition), l(label), n(10000)]
     ).
% 10000 random tests passed, 10590 discarded
% starting seed: seed(9170,22060,20847)
% odd: 10000/10000 (100.000000%)
yes

Note that the number of discarded tests is close to the number of tests that were run given that, as the previous query demonstrated, QuickCheck generates a balanced split between lists of even and odd lengths.

We can also use similar pre-conditions to verify that our label/3 predicate is correctly labelling tests:

odd_list(List, _, _) :-
    length(List, Length),
    Length mod 2 =:= 1.

even_list(List, _, _) :-
    length(List, Length),
    Length mod 2 =:= 0.

| ?- lgtunit::quick_check(label(+list, {[]}, {odd}), [pc(odd_list)]).
% 100 random tests passed, 109 discarded
% starting seed: seed(17567,8226,21714)
yes

?- lgtunit::quick_check(label(+list, {[]}, {even}), [pc(even_list)]).
% 100 random tests passed, 102 discarded
% starting seed: seed(22040,19096,13145)
true.

The pc(Closure) option is also useful when constraining or enforcing a relation between the generated arguments and can sometimes be used as an alternative to define custom types. See the lgtunit tool documentation for further details and examples.

In actual cases, we can and often use QuickCheck itself and the solutions illustrated in this blog post to verify that the properties that we write to test predicates, plus the pre-conditions and labels that we use to filter and classify tests, are correct.

Note: The ec/1, l/1, and pc/1 options require Logtalk 3.38.0 or later.

PREVIOUSEvolving from manually written tests