Next: Testing, Previous: Debugging Daikon, Up: Top [Contents][Index]
• Avoiding work for redundant invariants: | ||
• Dataflow hierarchy: | ||
• Equality optimization: |
This chapter describes some of the techniques used in Daikon to make it efficient in terms of time and space needed. These techniques can be enabled or disabled at the Daikon command line, as described in Running Daikon in Daikon User Manual.
Next: Dataflow hierarchy, Up: Daikon internals [Contents][Index]
Daikon reduces runtime and memory by avoiding performing work for redundant invariants that provide no useful information to the user. There are three basic types of optimization that can be performed for uninteresting invariants: non-instantiation, suppression, and non-printing.
Non-instantiation prevents the creation of an invariant because
the invariant’s truth value is statically obvious (from the semantics
of the programming language), no matter what values may be seen at run
time. Two examples are ‘A[i] is an element of A[]’ and ‘size(A[])
>= 0’. Non-instantiation is implemented by the by the
isObviousStatically
method.
With the equality sets optimization (see Equality optimization),
non-instantiation can only happen if all equality permutations are
statically obvious. Note that isObviousStatically
should
be used only for invariants that are known to be true. Other code
presumes that any statically obvious invariants are true and can
be safely presumed when determining if other invariants are redundant.
An invariant can be suppressed if it is logically implied by some set of other invariants (referred to as suppressors). A suppressed invariant is not instantiated or checked as long as its suppressors hold. For example ‘x > y’ implies ‘x >= y’. Suppression has some limitations. It cannot use as suppressors or suppress sample dependent invariants (invariants that adapt themselves to the samples they see and whose equation thus involves a constant such as ‘x > 42’). Suppression also cannot use relationships between variables. For example, it cannot suppress ‘x[i] = y[j]’ by ‘(x[] = y[]) ^ (i = j)’. Suppressor invariants can only use variables that are also in the invariant that is being suppressed. In this example, only invariants using the variables ‘x[i]’ and ‘y[i]’ can be used as a suppressors. See New suppressors for more information.
Non-printing is a post-pass that throws out any invariants that
are implied by other true invariants. It is similar to suppression, but
has none of the limitations of suppression. But since it is only run as
a post pass, it cannot optimize runtime and memory use as suppression can.
Non-printing should be used only in cases where suppression cannot.
Non-printing is implemented by ObviousFilter
, which calls the
isObviousDynamically
method on invariants. The
isObviousStatically
method is also used by the non-printing
checks; it can be called at the end without reference to equality sets.
More detail can be found in the paper “Efficient incremental algorithms for dynamic detection of likely invariants” by Jeff H. Perkins and Michael D. Ernst, published in Foundations of Software Engineering in 2004; the paper is available from http://plse.cs.washington.edu/daikon/pubs/invariants-incremental-fse2004-abstract.html.
Next: Equality optimization, Previous: Avoiding work for redundant invariants, Up: Daikon internals [Contents][Index]
Dataflow hierarchy is a means to relate variables in different program
points in a partial ordering. Variables in program point X are
related to variables in another program point Y by a flow relation
if every sample seen of X’s variables is also meant to be seen at Y.
Y is called a parent program point of X. For example, all the field
variables in the :::ENTER
program point of a method in class C relate to
the field variables in the :::CLASS
program point of C. This is because
the state of C, when in context at the entry :::ENTER
program point, is
also in context at the :::CLASS
program point. Any invariant that holds
true on a parent program point must hold on the child program point.
The purpose of dataflow hierarchy is to reduce the presence of
redundant invariants by only keeping invariants at the highest parent
at which they apply. This saves both time and space.
Daikon provides four ways that program points can be connected.
First, :::CLASS
program points are parents of
all their method program points. Second, between two classes that are
related by inheritance, corresponding program points relate — for
example, java.util.Vector:::CLASS
is a child of
java.util.List:::CLASS
. Third, when a program point contains
variables of a type whose :::CLASS
program point is also available to
Daikon, the former program point’s variables relate to the latter
program point’s :::CLASS
method. For example, if X.y is of type Y, and
Y contains fields a and b, X.y, X.y.a and X.y.b relate to
Y.this, Y.b and Y.a. Fourth, variables at :::ENTER
program
points are related to the ‘orig’ versions at :::EXIT
program points.
When using Daikon, the above four ways of relations in the dataflow
hierarchy will result in some true invariants that are not reported at
some program points. However, the invariant will be present in some
parent program point. Dataflow hierarchy is enabled by default, but
can be disabled by the --nohierarchy flag. When dataflow is enabled,
the only samples that are examined by Daikon are the :::EXIT
program
points (plus ‘orig’ variables) since these contain a complete view of
the data.
Previous: Dataflow hierarchy, Up: Daikon internals [Contents][Index]
When N variables are equal within a program point there will be N(N-1)/2 pairwise invariants to represent the equality within the equal variables, and N copies of every other invariant. For example, if a, b, and c are equal, then ‘a == b’, ‘a == c’, ‘b == c’ will be reported as pairwise invariants, and ‘odd(a)’, ‘odd(b)’ and ‘odd(c)’ will be reported. If the variables will always be equal, then reporting N times the invariants is wasteful. Daikon thus treats equality specially.
Each group of variables that are equal from the start of inferencing are placed in equality sets. An equality set can hold an arbitrary number of variables, and replaces the O(N^2) pairwise equality invariants. Every equality set has a leader or canonical representation by a variable in the set. Non-equality invariants are only instantiated and checked on the leader. When printing invariants, Daikon reports only invariants on the leader. The user can easily determine that ‘odd(a)’ and ‘a == b’ imply ‘odd(b)’. Equality optimization can be turned off at the command line with the --noequality flag.
Previous: Dataflow hierarchy, Up: Daikon internals [Contents][Index]