Daikon internals (The Daikon Invariant Detector Developer Manual)

4 Daikon internals

This chapter describes some of the techniques used in Daikon to make it efficient in terms of time and space needed. These techniques can be enabled or disabled at the Daikon command line, as described in Running Daikon in Daikon User Manual.

4.1 Avoiding work for redundant invariants

Daikon reduces run time and memory by avoiding performing work for redundant invariants that provide no useful information to the user. There are three basic types of optimization that can be performed for uninteresting invariants: non-instantiation, suppression, and non-printing.

Non-instantiation prevents the creation of an invariant because the invariant’s truth value is statically obvious (from the semantics of the programming language), no matter what values may be seen at run time. Two examples are ‘A[i] is an element of A[]’ and ‘size(A[]) >= 0’. Non-instantiation is implemented by the by the isObviousStatically method. With the equality sets optimization (see Equality optimization), non-instantiation can only happen if all equality permutations are statically obvious. Note that isObviousStatically should be used only for invariants that are known to be true. Other code presumes that any statically obvious invariants are true and can be safely presumed when determining if other invariants are redundant.

An invariant can be suppressed if it is logically implied by some set of other invariants (referred to as suppressors). A suppressed invariant is not instantiated or checked as long as its suppressors hold. For example ‘x > y’ implies ‘x >= y’. Suppression has some limitations. It cannot use as suppressors or suppress sample dependent invariants (invariants that adapt themselves to the samples they see and whose equation thus involves a constant such as ‘x > 42’). Suppression also cannot use relationships between variables. For example, it cannot suppress ‘x[i] = y[j]’ by ‘(x[] = y[]) ^ (i = j)’. Suppressor invariants can only use variables that are also in the invariant that is being suppressed. In this example, only invariants using the variables ‘x[i]’ and ‘y[i]’ can be used as a suppressors. See New suppressors for more information.

Non-printing is a post-pass that throws out any invariants that are implied by other true invariants. It is similar to suppression, but has none of the limitations of suppression. But since it is only run as a post pass, it cannot optimize run time and memory use as suppression can. Non-printing should be used only in cases where suppression cannot. Non-printing is implemented by ObviousFilter, which calls the isObviousDynamically method on invariants. The isObviousStatically method is also used by the non-printing checks; it can be called at the end without reference to equality sets.

More detail can be found in the paper “Efficient incremental algorithms for dynamic detection of likely invariants” by Jeff H. Perkins and Michael D. Ernst, published in Foundations of Software Engineering in 2004; the paper is available from http://plse.cs.washington.edu/daikon/pubs/invariants-incremental-fse2004-abstract.html.

4.2 Dataflow hierarchy

The dataflow hierarchy expresses relationships between variables at different program points. It is used to save time and space during invariant generation and to prevent invariants from being printed multiple times.

Suppose there are two program points X and its parent Y. Then every sample seen at X is also seen at Y, and every invariant that is true at Y is also true at X.

Variable z in program point X is related to variable z’ in another program point Y by a flow relation if every sample seen of z at X is also seen of z’ at Y. Y is called a parent program point of X.

For example, all the field variables in the :::ENTER program point of a method in class C relate to the field variables in the :::CLASS program point of C. This is because the state of C, when in context at the entry :::ENTER program point, is also in context at the :::CLASS program point. Any invariant that holds true on a parent program point must hold on the child program point. Daikon saves time and space by only computing invariants at the highest parent at which they apply.

Program point declarations describe how program points are declared in a Daikon input file. Here we will describe how the parent records are typically used to connect program points into a dataflow hierarchy.

Daikon uses three primary relationship types (PARENT, ENTER-EXIT and EXIT-EXITNN) to connect the program points into an acyclic dataflow hierarchy.

A program point that represents the ENTRY or EXIT of a static method will have a parent record that points to the CLASS program point for the containing class.
A program point that represents the ENTRY or EXIT of a non-static (instance) method will have a parent record that points to the OBJECT program point for the containing object.
An ENTER-EXIT edge connects each method’s ENTER program point with its corresponding EXCEPTION and EXIT program points.
An EXIT-EXITNN edge connects each method’s EXIT program point with each of its corresponding EXIT<id> program points.
A program point that represents a CLASS will usually not have a parent record.
A program point that represents a OBJECT will have a parent record that points to the CLASS program point for the object if the object has static data members.

When using Daikon, the relations used to describe the dataflow hierarchy may result in some true invariants that are not reported at some program points. However, the invariant will be present in some parent program point. The dataflow hierarchy is used by default, but can be disabled by the --nohierarchy flag. When dataflow is enabled, the only samples that are examined by Daikon are the :::EXIT program points (plus ‘orig’ variables) since these contain a complete view of the data, from which invariants at all other locations can be inferred. For example, Daikon does not need to examine data at :::ENTER or :::OBJECT program points which are parents of :::EXIT in the dataflow hierarchy.

4.3 Equality optimization

When N variables are equal within a program point there will be N(N-1)/2 pairwise invariants to represent the equality within the equal variables, and N copies of every other invariant. For example, if a, b, and c are equal, then ‘a == b’, ‘a == c’, ‘b == c’ will be reported as pairwise invariants, and ‘odd(a)’, ‘odd(b)’ and ‘odd(c)’ will be reported. If the variables will always be equal, then reporting N times the invariants is wasteful. Daikon thus treats equality specially.

Each group of variables that are equal from the start of inferencing are placed in equality sets. An equality set can hold an arbitrary number of variables, and replaces the O(N^2) pairwise equality invariants. Every equality set has a leader or canonical representation by a variable in the set. Non-equality invariants are only instantiated and checked on the leader. When printing invariants, Daikon reports only invariants on the leader. The user can easily determine that ‘odd(a)’ and ‘a == b’ imply ‘odd(b)’. Equality optimization can be turned off at the command line with the --noequality flag.

• Avoiding work for redundant invariants
• Dataflow hierarchy
• Equality optimization