Next: , Previous: Debugging Daikon, Up: Top   [Contents][Index]

4 Daikon internals

This chapter describes some of the techniques used in Daikon to make it efficient in terms of time and space needed. These techniques can be enabled or disabled at the Daikon command line, as described in Running Daikon in Daikon User Manual.


Next: , Up: Daikon internals   [Contents][Index]

4.1 Avoiding work for redundant invariants

Daikon reduces runtime and memory by avoiding performing work for redundant invariants that provide no useful information to the user. There are three basic types of optimization that can be performed for uninteresting invariants: non-instantiation, suppression, and non-printing.

Non-instantiation prevents the creation of an invariant because the invariant’s truth value is statically obvious (from the semantics of the programming language), no matter what values may be seen at run time. Two examples are ‘A[i] is an element of A[]’ and ‘size(A[]) >= 0’. Non-instantiation is implemented by the by the isObviousStatically method. With the equality sets optimization (see Equality optimization), non-instantiation can only happen if all equality permutations are statically obvious. Note that isObviousStatically should be used only for invariants that are known to be true. Other code presumes that any statically obvious invariants are true and can be safely presumed when determining if other invariants are redundant.

An invariant can be suppressed if it is logically implied by some set of other invariants (referred to as suppressors). A suppressed invariant is not instantiated or checked as long as its suppressors hold. For example ‘x > y’ implies ‘x >= y’. Suppression has some limitations. It cannot use as suppressors or suppress sample dependent invariants (invariants that adapt themselves to the samples they see and whose equation thus involves a constant such as ‘x > 42’). Suppression also cannot use relationships between variables. For example, it cannot suppress ‘x[i] = y[j]’ by ‘(x[] = y[]) ^ (i = j)’. Suppressor invariants can only use variables that are also in the invariant that is being suppressed. In this example, only invariants using the variables ‘x[i]’ and ‘y[i]’ can be used as a suppressors. See New suppressors for more information.

Non-printing is a post-pass that throws out any invariants that are implied by other true invariants. It is similar to suppression, but has none of the limitations of suppression. But since it is only run as a post pass, it cannot optimize runtime and memory use as suppression can. Non-printing should be used only in cases where suppression cannot. Non-printing is implemented by ObviousFilter, which calls the isObviousDynamically method on invariants. The isObviousStatically method is also used by the non-printing checks; it can be called at the end without reference to equality sets.

More detail can be found in the paper “Efficient incremental algorithms for dynamic detection of likely invariants” by Jeff H. Perkins and Michael D. Ernst, published in Foundations of Software Engineering in 2004; the paper is available from http://plse.cs.washington.edu/daikon/pubs/invariants-incremental-fse2004-abstract.html.


Next: , Previous: Avoiding work for redundant invariants, Up: Daikon internals   [Contents][Index]

4.2 Dataflow hierarchy

Dataflow hierarchy is a means to relate variables in different program points in a partial ordering. Variables in program point X are related to variables in another program point Y by a flow relation if every sample seen of X’s variables is also meant to be seen at Y. Y is called a parent program point of X. For example, all the field variables in the :::ENTER program point of a method in class C relate to the field variables in the :::CLASS program point of C. This is because the state of C, when in context at the entry :::ENTER program point, is also in context at the :::CLASS program point. Any invariant that holds true on a parent program point must hold on the child program point. The purpose of dataflow hierarchy is to reduce the presence of redundant invariants by only keeping invariants at the highest parent at which they apply. This saves both time and space.

Daikon provides four ways that program points can be connected. First, :::CLASS program points are parents of all their method program points. Second, between two classes that are related by inheritance, corresponding program points relate — for example, java.util.Vector:::CLASS is a child of java.util.List:::CLASS. Third, when a program point contains variables of a type whose :::CLASS program point is also available to Daikon, the former program point’s variables relate to the latter program point’s :::CLASS method. For example, if X.y is of type Y, and Y contains fields a and b, X.y, X.y.a and X.y.b relate to Y.this, Y.b and Y.a. Fourth, variables at :::ENTER program points are related to the ‘orig’ versions at :::EXIT program points.

When using Daikon, the above four ways of relations in the dataflow hierarchy will result in some true invariants that are not reported at some program points. However, the invariant will be present in some parent program point. Dataflow hierarchy is enabled by default, but can be disabled by the --nohierarchy flag. When dataflow is enabled, the only samples that are examined by Daikon are the :::EXIT program points (plus ‘orig’ variables) since these contain a complete view of the data.


Previous: Dataflow hierarchy, Up: Daikon internals   [Contents][Index]

4.3 Equality optimization

When N variables are equal within a program point there will be N(N-1)/2 pairwise invariants to represent the equality within the equal variables, and N copies of every other invariant. For example, if a, b, and c are equal, then ‘a == b’, ‘a == c’, ‘b == c’ will be reported as pairwise invariants, and ‘odd(a)’, ‘odd(b)’ and ‘odd(c)’ will be reported. If the variables will always be equal, then reporting N times the invariants is wasteful. Daikon thus treats equality specially.

Each group of variables that are equal from the start of inferencing are placed in equality sets. An equality set can hold an arbitrary number of variables, and replaces the O(N^2) pairwise equality invariants. Every equality set has a leader or canonical representation by a variable in the set. Non-equality invariants are only instantiated and checked on the leader. When printing invariants, Daikon reports only invariants on the leader. The user can easily determine that ‘odd(a)’ and ‘a == b’ imply ‘odd(b)’. Equality optimization can be turned off at the command line with the --noequality flag.


Previous: Dataflow hierarchy, Up: Daikon internals   [Contents][Index]