Next: Testing, Previous: Debugging Daikon, Up: Top [Contents][Index]
• Avoiding work for redundant invariants | ||
• Dataflow hierarchy | ||
• Equality optimization |
This chapter describes some of the techniques used in Daikon to make it efficient in terms of time and space needed. These techniques can be enabled or disabled at the Daikon command line, as described in Running Daikon in Daikon User Manual.
Next: Dataflow hierarchy, Up: Daikon internals [Contents][Index]
Daikon reduces run time and memory by avoiding performing work for redundant invariants that provide no useful information to the user. There are three basic types of optimization that can be performed for uninteresting invariants: non-instantiation, suppression, and non-printing.
Non-instantiation prevents the creation of an invariant because
the invariant’s truth value is statically obvious (from the semantics
of the programming language), no matter what values may be seen at run
time. Two examples are ‘A[i] is an element of A[]’ and ‘size(A[])
>= 0’. Non-instantiation is implemented by the by the
isObviousStatically
method.
With the equality sets optimization (see Equality optimization),
non-instantiation can only happen if all equality permutations are
statically obvious. Note that isObviousStatically
should
be used only for invariants that are known to be true. Other code
presumes that any statically obvious invariants are true and can
be safely presumed when determining if other invariants are redundant.
An invariant can be suppressed if it is logically implied by some set of other invariants (referred to as suppressors). A suppressed invariant is not instantiated or checked as long as its suppressors hold. For example ‘x > y’ implies ‘x >= y’. Suppression has some limitations. It cannot use as suppressors or suppress sample dependent invariants (invariants that adapt themselves to the samples they see and whose equation thus involves a constant such as ‘x > 42’). Suppression also cannot use relationships between variables. For example, it cannot suppress ‘x[i] = y[j]’ by ‘(x[] = y[]) ^ (i = j)’. Suppressor invariants can only use variables that are also in the invariant that is being suppressed. In this example, only invariants using the variables ‘x[i]’ and ‘y[i]’ can be used as a suppressors. See New suppressors for more information.
Non-printing is a post-pass that throws out any invariants that
are implied by other true invariants. It is similar to suppression, but
has none of the limitations of suppression. But since it is only run as
a post pass, it cannot optimize run time and memory use as suppression can.
Non-printing should be used only in cases where suppression cannot.
Non-printing is implemented by ObviousFilter
, which calls the
isObviousDynamically
method on invariants. The
isObviousStatically
method is also used by the non-printing
checks; it can be called at the end without reference to equality sets.
More detail can be found in the paper “Efficient incremental algorithms for dynamic detection of likely invariants” by Jeff H. Perkins and Michael D. Ernst, published in Foundations of Software Engineering in 2004; the paper is available from http://plse.cs.washington.edu/daikon/pubs/invariants-incremental-fse2004-abstract.html.
Next: Equality optimization, Previous: Avoiding work for redundant invariants, Up: Daikon internals [Contents][Index]
The dataflow hierarchy expresses relationships between variables at different program points. It is used to save time and space during invariant generation and to prevent invariants from being printed multiple times.
Suppose there are two program points X and its parent Y. Then every sample seen at X is also seen at Y, and every invariant that is true at Y is also true at X.
Variable z in program point X is related to variable z’ in another program point Y by a flow relation if every sample seen of z at X is also seen of z’ at Y. Y is called a parent program point of X.
For example, all the field
variables in the :::ENTER
program point of a method in class C relate to
the field variables in the :::CLASS
program point of C. This is because
the state of C, when in context at the entry :::ENTER
program point, is
also in context at the :::CLASS
program point. Any invariant that holds
true on a parent program point must hold on the child program point.
Daikon saves time and space by only computing invariants at the highest parent
at which they apply.
Program point declarations describe how program points are
declared in a Daikon input file. Here we will describe how the
parent
records are typically used to connect program points into
a dataflow hierarchy.
Daikon uses three primary relationship types
(PARENT
, ENTER-EXIT
and EXIT-EXITNN
) to connect the program points
into an acyclic dataflow hierarchy.
ENTRY
or EXIT
of a
static method will have a parent
record that points to the
CLASS
program point for the containing class.
ENTRY
or EXIT
of a
non-static (instance) method will have a parent
record that points to the
OBJECT
program point for the containing object.
ENTER-EXIT
edge connects each method’s ENTER
program
point with its corresponding EXCEPTION
and EXIT
program points.
EXIT-EXITNN
edge connects each method’s EXIT
program
point with each of its corresponding EXIT<id>
program points.
A program point that represents a CLASS
will usually not have a parent
record.
OBJECT
will have a parent
record that points to the CLASS
program point for the object if the object has static data members.
When using Daikon, the relations used to describe the dataflow
hierarchy may result in some true invariants that are not reported at
some program points. However, the invariant will be present in some
parent program point. The dataflow hierarchy is used by default, but
can be disabled by the --nohierarchy flag. When dataflow is enabled,
the only samples that are examined by Daikon are the :::EXIT
program
points (plus ‘orig’ variables) since these contain a complete view of
the data, from which invariants at all other locations can be inferred.
For example, Daikon does not need to examine data at :::ENTER
or
:::OBJECT
program points which are parents of :::EXIT
in the
dataflow hierarchy.
Previous: Dataflow hierarchy, Up: Daikon internals [Contents][Index]
When N variables are equal within a program point there will be N(N-1)/2 pairwise invariants to represent the equality within the equal variables, and N copies of every other invariant. For example, if a, b, and c are equal, then ‘a == b’, ‘a == c’, ‘b == c’ will be reported as pairwise invariants, and ‘odd(a)’, ‘odd(b)’ and ‘odd(c)’ will be reported. If the variables will always be equal, then reporting N times the invariants is wasteful. Daikon thus treats equality specially.
Each group of variables that are equal from the start of inferencing are placed in equality sets. An equality set can hold an arbitrary number of variables, and replaces the O(N^2) pairwise equality invariants. Every equality set has a leader or canonical representation by a variable in the set. Non-equality invariants are only instantiated and checked on the leader. When printing invariants, Daikon reports only invariants on the leader. The user can easily determine that ‘odd(a)’ and ‘a == b’ imply ‘odd(b)’. Equality optimization can be turned off at the command line with the --noequality flag.
Previous: Dataflow hierarchy, Up: Daikon internals [Contents][Index]