This is the developer manual for the Daikon invariant detector. It describes Daikon version 4.6.4, released June 23, 2010.
For information about using Daikon, see its user manual (see Overview). This manual is intended for those who are already familiar with the use of Daikon, but wish to customize or extend it.
Additional information can be found in technical papers available from http://pag.csail.mit.edu/daikon/pubs/.
This chapter describes how to customize or modify Daikon.
To compile Daikon, type ‘make’ in daikon/java/ or any of its subdirectories. The distribution includes compiled .class files, so you do not need to compile them yourself unless you make changes.
When you compile Daikon, environment variables DAIKONDIR (or INV, whose effect is the same) and JAVA_HOME should be set. This is already done if you source the daikon.bashrc or daikon.cshrc file, as recommended in the installation instructions (see Complete installation). In addition, environment variable DAIKONCLASS_SOURCES should be set (to any value) before your startup file sources daikon.bashrc or daikon.cshrc. Thus, a complete .bashrc or .bash_profile shell setup file that would enable you to compile Daikon might look like the following.
    export INV=$HOME/invariants
    export JAVA_HOME=/usr/java/jdk1.6.0_20
    export DAIKONCLASS_SOURCES=1
    source $INV/scripts/daikon.bashrc
    export CLASSPATH=.:$CLASSPATH
Daikon is written in Java 1.5 (also known as Java 5). In order to compile Daikon, you need a Java 1.5 compiler such as javac on your path. To override the default Java compiler (javac), create a Makefile.user file in the daikon/ directory and add a line like the following.
JAVAC ?= jikes -g +E +F
In order to compile Daikon, you need the C preprocessor cpp, which is used to convert each .jpp file in the distribution into multiple .java files, which are then compiled. If you have a C compiler, you almost certainly have cpp. If you do not have cpp (or gcc, which can emulate cpp via ‘gcc -E’), you may run ‘make avoid-jpp’, in which case changes to .jpp files will not be reflected in the .java files or the compiled .class files. (The purpose of the .jpp files is to avoid code duplication by placing common code in a single file, then generating other files that need to include that common code.)
To make the documentation (via ‘make -C $inv/doc’), you will need a recent version of makeinfo. Makeinfo version 4.7 is known to work, but makeinfo version 4.1 is known to fail.
For more information about compiling Daikon, see the comments in the Makefiles.
To compile Daikon on Windows, the best approach is to install the Cygwin toolset (available at http://sources.redhat.com/cygwin/), which contains everything you need to compile and run Unix programs under Windows. You can install Cygwin by simply running the program found at http://sources.redhat.com/cygwin/setup.exe.
When setting up environment pathname variables under Windows/Cygwin (such as JAVA_HOME or CLASSPATH), make sure that the pathname is specified in Unix format (e.g., ‘/cygdrive/c/daikon’ rather than ‘C:\daikon’). Cygwin expects Unix-style pathnames, and the Makefile will convert them to Windows pathnames when necessary (for example, when invoking Windows programs such as Java). The CLASSPATH environment variable should use colons (:) rather than semicolons (;) as a separator. Using Windows pathnames or separators is a common source of errors that results in odd error messages and build failures.
Compiling Daikon on MacOSX is relatively straightforward. In addition to the standard settings, add an environment variable that specifies the location of the Java ‘classes.jar’ file (‘classes.jar’ performs a similar function to the more standard ‘rt.jar’). Normally the file is found in ‘/System/Library/Frameworks’ under the appropriate Java version. The following example is for the standard install of Java 1.5 on MacOSX:
export ORIG_RT=/System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Classes/classes.jar
The layout of the Daikon CVS repository differs slightly from that of the distribution. For example, the top-level directory is named invariants/ instead of daikon/, and the subdirectory with helper programs is named scripts/ instead of bin/.
For information about obtaining Daikon via CVS, see http://groups.csail.mit.edu/pag/daikon/mit/.
Here is one way to use Eclipse to edit Daikon.
First, make sure that Daikon builds cleanly from the command line.
File > Import > General > Existing Projects into Workspace
Choose the “java” directory of your Daikon checkout
Project > properties > Java build path: libraries : “add external jars” everything in the lib/ directory, plus also the tools.jar file in the lib/ directory of your JDK. (I'm not sure why, but “add jars” doesn't show all .jar files in the directory.)
Source: add “Daikon”, remove “Daikon/src”. Default output folder: change from “Daikon/bin” to “Daikon”.
You can easily write your own invariants and have Daikon check them, in addition to all the other invariants that are already part of Daikon. Adding a new invariant to Daikon requires writing one Java class, adding a line to another file to inform Daikon of the new class, and recompiling Daikon.
The file java/daikon/inv/unary/scalar/Positive.java in the Daikon distribution contains a sample invariant. This invariant is true if the variable is always positive (greater than zero). This invariant is subsumed by other invariants in the system; it is provided only as a pedagogical example. To enable the invariant, uncomment the appropriate line in Daikon.setup_proto_invs(), then recompile Daikon.
A Java class defining an invariant is a concrete subclass of one of the following abstract classes:
SingleScalar
TwoScalar
ThreeScalar
SingleSequence
TwoSequence
SequenceScalar
Daikon's invariants are first instantiated, then are presented samples (tuples of values for all the variables of interest to the invariant; this might be a 1-tuple, a 2-tuple, or a 3-tuple) in turn. If any sample falsifies the invariant, the invariant destroys itself. All remaining invariants at the end of the program run can be reported as likely to be true.
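The life cycle described above can be sketched with a small stand-alone model. The class and the candidate properties below are illustrative assumptions, not Daikon's Invariant classes: each candidate is a predicate over a single integer variable, and a candidate is discarded as soon as a sample falsifies it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Toy model of "instantiate, present samples, destroy when falsified".
// Hypothetical stand-in; it does not use any Daikon class.
class LifecycleSketch {

  /** Returns the descriptions of the candidates that survive all samples. */
  static List<String> survivors(int[] samples) {
    // Step 1: instantiate every candidate before any sample is seen.
    Map<String, IntPredicate> live = new LinkedHashMap<>();
    live.put("x > 0", x -> x > 0);
    live.put("x is even", x -> x % 2 == 0);
    live.put("x != 0", x -> x != 0);

    // Step 2: present each sample in turn; falsified candidates are removed.
    for (int sample : samples) {
      live.values().removeIf(candidate -> !candidate.test(sample));
    }

    // Step 3: whatever remains can be reported as likely to be true.
    return new ArrayList<>(live.keySet());
  }
}
```

For the samples 2, 4, 7, the candidate “x is even” is falsified by 7, so only “x > 0” and “x != 0” remain.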
The key methods of the new invariant class InvName are:

protected InvName(PptSlice ppt)
    The constructor for class InvName. It is called from instantiate_dyn (and from get_proto, to create the prototype). Its typical implementation is
        super(ppt);

protected Invariant instantiate_dyn (PptSlice slice)
    Creates an instance of the invariant over the given slice. Its typical implementation is
        return new InvName(slice);

public static InvName get_proto()
    Returns the prototype instance of the invariant, creating it on first use. Its typical implementation is
        if (proto == null) proto = new InvName (null);
        return (proto);

public boolean enabled()
    Returns whether the invariant is enabled. Its typical implementation is
        return dkconfig_enabled;

public boolean instantiate_ok (VarInfo[] vis)
    Returns whether it is acceptable to instantiate the invariant over the variables vis.

public InvariantStatus check_modified (..., int count)
    The count argument indicates how many samples have this value. For example, three calls to check_modified with a count parameter of 1 are equivalent to one call to check_modified with a count parameter of 3. Returns whether or not the sample is consistent with the invariant. Does not change the state of the invariant.

public InvariantStatus add_modified (..., int count)
    Similar to check_modified, except that it can change the state of the invariant if necessary. If the invariant doesn't have any state, then it simply calls check_modified.

protected double computeConfidence ()
    Returns the confidence that the invariant did not arise by chance alone: either a probability or one of the following constants defined in the Invariant class (see the class for documentation): CONFIDENCE_JUSTIFIED, CONFIDENCE_UNJUSTIFIED, CONFIDENCE_NEVER. For example, suppose your new invariant has a 50% chance of being true by chance for each sample. (“x is even” is an example of such an invariant.) Then a reasonable body for computeConfidence would be
        return 1 - Math.pow(.5, ppt.num_samples());
    If 5 values had been seen, then this implementation would return 31/32, which is the likelihood that all 5 values seen so far were even not purely by chance. An invariant is printed only if its probability of not occurring by chance alone is large enough (by default, greater than .99; see Daikon's ‘--conf_limit’ command-line option).

public String format ()
public String repr ()
public String format_using (OutputFormat format)
    These methods return printed representations of the invariant. format produces “normal” output, while the ‘repr’ formatting routine produces low-level, detailed output for debugging. When first writing an invariant, you can make repr and format_using simply call format, then fix up the implementations for the different output formats later as needed. See also New formatting for invariants.
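The computeConfidence arithmetic described above can be checked with a stand-alone sketch. The class and method names are assumptions made for illustration; only the formula 1 - 0.5^n and the default limit of .99 come from the text.

```java
// Stand-alone check of the "x is even" confidence formula.
// ConfidenceSketch and its methods are hypothetical names, not Daikon API.
class ConfidenceSketch {

  /** Confidence after numSamples samples, each with a 50% chance of passing by luck. */
  static double confidence(int numSamples) {
    return 1 - Math.pow(.5, numSamples);
  }

  /** Smallest number of samples whose confidence is strictly greater than limit. */
  static int samplesNeeded(double limit) {
    int n = 0;
    while (confidence(n) <= limit) {
      n++;
    }
    return n;
  }
}
```

With 5 samples the confidence is 31/32 = 0.96875, which is below the default limit of .99, so the invariant would not yet be printed; 7 samples (127/128) are the first to exceed the limit.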
After the invariant is written, add a call to its get_proto method in the Daikon.setup_proto_invs method.
A derived variable is an expression that does not appear in the source code as a variable, but that Daikon treats as a variable for purposes of invariant detection. For instance, if there exists an array ‘a’ and an integer ‘i’, then Daikon introduces the derived variable ‘a[i]’. This permits detection of invariants over this quantity.
(Describing how to create new variety of derived variable is still to be written. For now, see the derived variables that appear in the Java files in directory $DAIKONDIR/java/daikon/derive/.)
Daikon can print invariants in multiple formats (see Invariant syntax).
To support a new output format, you need to do two things:
In daikon.inv.Invariant.OutputFormat, add a new static final field and also update the get method.
In each subclass of Invariant, edit the format_using method to handle the new OutputFormat.
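The two steps can be sketched with toy stand-ins. Nothing below is Daikon's real code: the nested OutputFormat class only imitates the "static final field plus get method" shape described above, and the SEXPR format and its output syntax are hypothetical.

```java
// Toy stand-ins: these are NOT daikon.inv.Invariant.OutputFormat or Invariant.
class FormatSketch {

  static final class OutputFormat {
    static final OutputFormat DAIKON = new OutputFormat("Daikon");
    static final OutputFormat JAVA = new OutputFormat("Java");
    // Step 1a: the new format is a new static final field (hypothetical name).
    static final OutputFormat SEXPR = new OutputFormat("Sexpr");

    final String name;

    private OutputFormat(String name) {
      this.name = name;
    }

    // Step 1b: get must recognize the name of the new format.
    static OutputFormat get(String name) {
      if (name.equalsIgnoreCase(DAIKON.name)) return DAIKON;
      if (name.equalsIgnoreCase(JAVA.name)) return JAVA;
      if (name.equalsIgnoreCase(SEXPR.name)) return SEXPR;
      throw new IllegalArgumentException("Unknown output format: " + name);
    }
  }

  /** Toy "variable is positive" invariant. */
  static final class Positive {
    private final String var;

    Positive(String var) {
      this.var = var;
    }

    // Step 2: every invariant class handles the new format in format_using.
    String format_using(OutputFormat format) {
      if (format == OutputFormat.DAIKON || format == OutputFormat.JAVA) {
        return var + " > 0";
      }
      if (format == OutputFormat.SEXPR) {
        return "(> " + var + " 0)";
      }
      return "unsupported format: " + format.name;
    }
  }
}
```

Forgetting step 2 in one invariant class shows up only when that invariant is printed in the new format, which is why the fallback branch returns a visible marker rather than an empty string.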
A front end for Daikon converts data into a form Daikon can process, producing files in Daikon's input format — data trace declarations and records. For more information about these files, see File formats.
The data traces can be obtained from any source. For instance, front ends have been built for stock data, weather forecasts, truck weight data, and spreadsheet data (see convertcsv.pl), among others. More often, users apply a programming language front end (also called an “instrumenter”) to a program, causing executions of the program to write files in Daikon's format. (For information about existing front ends, see Front ends (instrumentation).) When a general front end is not available, it is possible to manually instrument a specific program so that it writes files in Daikon's format. The resulting instrumented program is very similar to what an instrumenter would have created, so this section is relevant to both approaches.
Conceptually, instrumentation is very simple. For each program point (say, a line of code or the entry or exit from a procedure) at which you wish to detect invariants, the front end must arrange to create a declaration (see Declarations) that lists the variables in scope at that program point, and the front end must arrange that execution creates a data trace record (see Data trace records) for each execution of that program point. Conventionally, the way to create a data trace record is to insert a printf (or similar) statement that outputs the current values of the variables of interest.
This section gives an example of how an instrumenter for Java might work; other languages are analogous. Suppose we wish to instrument file Example.java.
    class Example {
      // Return either the square of x or the square of (x+1).
      int squar(int x, boolean b) {
        if (b)
          x++;
        return x*x;
      }
    }
The .decls file might look like the following.
    DECLARE
    Example.squar:::ENTER
    x
    int
    int
    1
    b
    boolean
    int
    2

    DECLARE
    Example.squar:::EXIT
    x
    int
    int
    1
    b
    boolean
    int
    2
    return
    int
    int
    1
The instrumented .java file might look like the following. This example does not compute the “modified bits”, but simply sets them all to 1, which is a safe default.
    class Example {
      static {
        daikon.Runtime.setDtraceMaybe("daikon-output/StackAr.dtrace");
      }

      // Return either the square of x or the square of (x+1).
      int squar(int x, boolean b) {
        synchronized (daikon.Runtime.dtrace) {
          daikon.Runtime.dtrace.println();
          daikon.Runtime.dtrace.println("Example.squar:::ENTER");
          daikon.Runtime.dtrace.println("x");
          daikon.Runtime.dtrace.println(x);
          daikon.Runtime.dtrace.println(1);  // modified bit
          daikon.Runtime.dtrace.println("b");
          daikon.Runtime.dtrace.println(b ? 1 : 0);
          daikon.Runtime.dtrace.println(1);  // modified bit
        }

        if (b)
          x++;

        int daikon_return_value = x*x;

        synchronized (daikon.Runtime.dtrace) {
          daikon.Runtime.dtrace.println();
          daikon.Runtime.dtrace.println("Example.squar:::EXIT");
          daikon.Runtime.dtrace.println("x");
          daikon.Runtime.dtrace.println(x);
          daikon.Runtime.dtrace.println(1);  // modified bit
          daikon.Runtime.dtrace.println("b");
          daikon.Runtime.dtrace.println(b ? 1 : 0);
          daikon.Runtime.dtrace.println(1);  // modified bit
          daikon.Runtime.dtrace.println("return");
          daikon.Runtime.dtrace.println(daikon_return_value);
          daikon.Runtime.dtrace.println(1);  // modified bit
        }

        return daikon_return_value;
      }
    }
(Daikon's Java front end, Chicory, does not actually insert instrumentation into the Java source code of your program. Rather, it instruments the bytecode as it is loaded into the JVM. This is more efficient, and it avoids making any changes to your .java or .class files. We have shown an example of Java source code instrumentation because that is simpler to explain and understand than the bytecode instrumentation.)
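When instrumenting by hand, the repeated println calls in the source-level example can be factored into a helper. The following is a minimal sketch under stated assumptions: DtraceRecordSketch and writeRecord are hypothetical names (they are not part of daikon.Runtime), values are restricted to integers, and every modified bit is set to 1 as in the example above.

```java
import java.io.PrintWriter;

// Hypothetical helper for hand instrumentation; not part of Daikon.
class DtraceRecordSketch {

  /**
   * Writes one data trace record: a blank separator line, the program point
   * name, and then three lines (name, value, modified bit) per variable.
   */
  static void writeRecord(PrintWriter out, String pptName, String[] names, long[] values) {
    // One lock per record keeps records from different threads from interleaving.
    synchronized (out) {
      out.println();
      out.println(pptName);
      for (int i = 0; i < names.length; i++) {
        out.println(names[i]);
        out.println(values[i]);
        out.println(1); // modified bit: always 1, the safe default
      }
      out.flush();
    }
  }
}
```

A call such as writeRecord(out, "Example.squar:::ENTER", new String[] {"x", "b"}, new long[] {x, b ? 1 : 0}) would replace the first synchronized block of the instrumented squar method.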
Daikon comes with two front ends for the C language: Kvasir (see Kvasir) and Mangel-Wurzel (see Mangel-Wurzel). Each has its limitations. Kvasir only works under the Linux operating system, and it works only on “x86” (Intel 386-compatible) processors. Mangel-Wurzel lacks some tracing features related to arrays and nested structs, and requires the user to purchase Purify.
You may wish to infer invariants over C programs running on other platforms; for instance, you want a robust C front end that works under Microsoft Windows. This section will help you to either write such a front end or to hand-instrument your program to produce output that Daikon can process.
We welcome additions and corrections to this part of the manual. And, if you write a C instrumenter that might be of use to others, please contribute it back to the Daikon project.
A front end for C (or any other language) performs two tasks. It determines the names of all variables that are in scope at a particular program point, and it prints the values of those variables each time the program point executes.
Determining the names of the variables is straightforward. It requires either parsing source code or parsing a compiled executable. In the latter case, the variables can be determined from debugging information that the compiler places in the executable.
The challenge for C programs is determining the values of variables at execution time: for each variable, the front end must determine whether the variable's value is valid, and how big the value is.
A front end should print only variables that have valid values. Examples of invalid values are variables that have not yet been initialized and pointers whose content has been deallocated. (A pointer dereference, such as ‘*p’ or ‘p->field’, can itself be to uninitialized and/or deallocated memory.) Invalid values should be printed as “nonsensical” (see Data trace records).
It is desirable to print “nonsensical” rather than an invalid value, for two reasons. First, outputting nonsense values can degrade invariant detection; patterns in the valid data may be masked by noise from invalid values. Second, an attempt to access an invalid value can cause the instrumented program to crash! For instance, suppose that pointer ‘p’ is not yet initialized — the pointer value refers to some arbitrary location in memory, possibly even an address that the operating system has not allocated to the program. An attempt to print the value of ‘*p’ or ‘p->field’ will result in a segmentation fault when ‘*p’ is accessed. (If you choose never to dereference a pointer while performing instrumentation, then you do not need to worry about invalid references. However, you will be unable to output any fields of a pointer to a struct or class, making your front end less useful. You will still be able to output fields of a regular variable to a struct or class, but most interesting uses of structs and classes in C and C++ are through pointers.)
C relies on the programmer to remember which variables are valid, and the programmer must take care never to access invalid variables. Unfortunately, there is no simple automatic way to determine variable validity for an arbitrary C program. (Languages with automatic memory management, such as Java, do not pose these problems. All variables always have an initial value, so there is no danger of printing uninitialized memory, though the initial value may not be particularly meaningful. Because pointed-to memory is never deallocated, all non-null pointers are always valid, so there is no danger of a segmentation fault.)
An instrumenter needs information about validity of variable values. This could be obtained from the programmer (which requires work on the part of the user of Daikon), or obtained automatically by creating a new run-time system that tracks the information (which requires a more sophisticated front end).
In addition to determining which variables are uninitialized and which pointers are to allocated memory, there are additional problems for a C front end. For example, given a char pointer ‘*c’, does it point to a single character, or to an array of characters? If it points to an array of characters, how big is that array? And for each element of the array, is that element initialized or not?
The problem of tracking C memory may seem daunting, but it is not insurmountable. There exist many tools for detecting or debugging memory errors in C, and they need to perform exactly the same memory tracking as a Daikon front end must perform. Therefore, a Daikon front end can use the same well-known techniques, and possibly can even be built on top of such a tool. For instance, one C front end, named Kvasir, is built on top of the Valgrind tool (http://valgrind.org/), greatly reducing the implementation effort. Valgrind only works under Linux, but a C front end for another platform could build on a similar tool; many other such tools exist.
There are two basic approaches to instrumenting a C program (or a program in any other language): instrument the source code, or instrument a compiled binary representation of the program. In each case, additional code that tracks all memory allocations, deallocations, writes, and reads must be executed at run time. Which approach is most appropriate for you depends on what tools you use when building your C instrumentation system.
In some cases, it may not be necessary to build a fully general C instrumentation system. You may be able to craft a smaller, simpler extension to an existing program — enabling that program (only) to produce files for Daikon to analyze.
For instance, many programs use specialized memory allocation routines (customized versions of malloc and free), in order to prevent or detect memory errors. The information that such libraries collect is often sufficient to determine which variable values should be printed, and which should be suppressed in favor of printing “nonsensical” instead.
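The bookkeeping that such an allocation library provides can be modeled as a table of live blocks. The sketch below is written in Java only to match the other examples in this manual; a real front end would keep the table in C inside its malloc and free wrappers. All names are hypothetical, addresses are modeled as plain long values, and the table tracks allocation only, not initialization.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongSupplier;

// Toy model of an allocation table; hypothetical, not taken from any front end.
class AllocationTableSketch {
  // Maps the base address of each live block to its size in bytes.
  private final TreeMap<Long, Long> blocks = new TreeMap<>();

  void onMalloc(long base, long size) {
    blocks.put(base, size);
  }

  void onFree(long base) {
    blocks.remove(base);
  }

  /** True if the width bytes starting at addr lie inside one live block. */
  boolean isAllocated(long addr, long width) {
    Map.Entry<Long, Long> block = blocks.floorEntry(addr);
    return block != null && addr + width <= block.getKey() + block.getValue();
  }

  /** Performs the read only when it is safe; otherwise reports "nonsensical". */
  String valueOrNonsensical(long addr, long width, LongSupplier read) {
    return isAllocated(addr, width) ? Long.toString(read.getAsLong()) : "nonsensical";
  }
}
```

The read is passed as a callback so that the dereference never happens for an address outside the table, which is the property that prevents the instrumented program from crashing.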
The presence of memory errors — even in a program that appears to run correctly — makes it much harder to create Daikon's output. Therefore, as a prerequisite to instrumenting a C program, it is usually a good idea to run a memory checker on that program and to eliminate any memory errors.
As mentioned in Daikon internals, one way to make Daikon more efficient, and to reduce clutter in output to the user, is to reduce the number of redundant invariants of various kinds. This section describes how to add a new suppressor relation, such that if invariant A implies B, B is not instantiated or checked as long as A holds, saving time and space. Suppression implications use some terminology. A suppressor (defined in the class NISuppressor) is one of a set of invariants (NISuppression) that imply and suppress a suppressee invariant (NISuppressee). The set of all of the suppressions that suppress a particular suppressee is stored in the class NISuppressionSet.
Adding a new suppression is straightforward when the invariants involved do not have any state. Define the suppressee and each of the suppressions that suppress it using the corresponding constructors. Add the method get_ni_suppressions to the class of the invariant being suppressed and return the appropriate suppression set. Make sure that get_ni_suppressions always returns the same suppression set (i.e., that storage to store the suppressions is only allocated once). Normally this is done by defining a static variable to hold the suppression sets and initializing this variable the first time that get_ni_suppressions is called.
The following example defines suppressions for “x == y” implies “x >= y” and “x > y” implies “x >= y”.
    private static NISuppressionSet suppressions = null;

    public NISuppressionSet get_ni_suppressions() {
      if (suppressions == null) {
        NISuppressee suppressee = new NISuppressee (IntGreaterEqual.class);

        // "x == y" and "x > y" each imply "x >= y".
        NISuppressor v1_eq_v2 = new NISuppressor (0, 1, IntEqual.class);
        NISuppressor v1_gt_v2 = new NISuppressor (0, 1, IntGreaterThan.class);

        suppressions = new NISuppressionSet (new NISuppression[] {
          new NISuppression (v1_eq_v2, suppressee),
          new NISuppression (v1_gt_v2, suppressee),
        });
      }
      return (suppressions);
    }
For suppressions depending on the state of a particular invariant, each Invariant has an isObviousDynamically(VarInfo[] vis) method that is called once the state of other invariants has already been determined. This method returns a non-null value if this invariant is implied by a fact that can be derived from the given VarInfos.
For example, suppose division was not defined for divisors smaller than 1. The following example defines an obvious check for “x <= c” (where c < 1 is a constant) implies “y % x == 0”, written in the Divides class.
    public DiscardInfo isObviousDynamically(VarInfo[] vis) {
      DiscardInfo di = super.isObviousDynamically(vis);
      if (di != null) {
        return di;
      }

      // Look for an upper bound on the divisor (the first variable).
      VarInfo var1 = vis[0];
      PptSlice1 ppt_over1 = ppt.parent.findSlice(var1);
      if (ppt_over1 == null) {
        return null;
      }

      for (Invariant inv : ppt_over1.invs) {
        if (inv instanceof UpperBound) {
          if (((UpperBound) inv).max() < 1) {
            return new DiscardInfo(this, DiscardCode.obvious,
                                   "Divides is obvious when divisor less than one");
          }
        }
      }

      return null;
    }
The Daikon codebase does not call System.exit(), except in a dummy main method that catches TerminationMessage, which is the standard way that a component of Daikon requests the JVM to shut down. The reason for this is that calling System.exit() is usually a bad idea. It makes the class unusable as a subroutine, because it might kill the calling program. It can cause deadlock. And it can leave data in an inconsistent state (for example, if the program was in the middle of writing a file, still held non-Java locks, etc.), because the program has no good way of completing any actions that it was in the middle of. Therefore, it is better to throw an exception and let the program handle it appropriately. (This is true of instrumentation code as well.)
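The pattern can be sketched as follows. The names are assumptions made for illustration: TerminationRequest is a hypothetical stand-in for Daikon's TerminationMessage, and mainHelper is a placeholder for the real work of a tool.

```java
// Sketch of "throw instead of exit"; hypothetical names, not Daikon's code.
class ExitSketch {

  /** Hypothetical stand-in for Daikon's TerminationMessage. */
  static class TerminationRequest extends RuntimeException {
    TerminationRequest(String message) {
      super(message);
    }
  }

  /** The real entry point: safe to call as a subroutine, never exits the JVM. */
  static int mainHelper(String[] args) {
    if (args.length == 0) {
      throw new TerminationRequest("no input files specified");
    }
    return args.length; // placeholder for the real processing
  }

  /** Thin wrapper: the only place where the JVM is shut down. */
  public static void main(String[] args) {
    try {
      mainHelper(args);
    } catch (TerminationRequest e) {
      System.err.println(e.getMessage());
      System.exit(1);
    }
  }
}
```

A caller that embeds the tool invokes mainHelper directly and decides for itself how to react to the exception, for example by closing files it has open.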
This chapter describes some techniques that can be used for debugging Daikon. Because Daikon processes large amounts of data, using a debugger can be difficult. The following logging techniques provide alternatives to using a debugger.
Daikon's logging routines are based on the java.util.logging utilities (built into Java 1.4 and later).
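As a reminder of how java.util.logging is typically used, here is a minimal sketch. The logger name and the expensiveSummary method are hypothetical; only the Logger and Level calls are standard library API.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Minimal java.util.logging usage; the logger name is a made-up example.
class LoggingSketch {
  static final Logger debug = Logger.getLogger("example.daikon.sketch");

  // Counts how often the costly message was actually built.
  static int expensiveCalls = 0;

  static String expensiveSummary() {
    expensiveCalls++;
    return "summary of a large data structure";
  }

  static void step() {
    // Guarding with isLoggable avoids building the message when logging is off.
    if (debug.isLoggable(Level.FINE)) {
      debug.fine("state: " + expensiveSummary());
    }
  }
}
```

Calling LoggingSketch.debug.setLevel(Level.FINE) enables the message; with the level set to Level.OFF, step() does not even compute the summary.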
Often it is desirable to print information only about one or more specific invariants. This is distinct from general logging because it concentrates on specific invariant objects rather than a particular class or portion of Daikon. This is referred to as Track logging because it tracks particular values across Daikon.
The --track class|class|...<var,var,var>@ppt option to Daikon (see Daikon debugging options) enables track logging. The argument to the --track option supplies three pieces of information:
The class name of the invariant (e.g., IntEqual). Multiple class arguments can be specified, separated by pipe symbols (‘|’).
The variables used in the invariant (e.g., return, size(this.s[])). The variables are specified in angle brackets (‘<>’).
The program point of interest (e.g., DataStructures.StackAr.makeEmpty()V:::ENTER). The program point is preceded by an at sign (‘@’).
Each item is optional. For example:
    IntEqual<x,y>@makeEmpty()
    LessThan|GreaterThan<return,orig(y)>@EXIT99
Multiple --track switches can be specified. The class, program point, and each of the variables must match one of the specifications in order for information concerning the invariant to be printed.
Matching is a simple substring comparison. The specified item must be a substring of the actual item. For instance, LessThan matches both IntLessThan and FloatLessThan.
Program points and variables are specified exactly as they are seen in normal Daikon invariant output. Specifically, Ppt.name and VarInfo.name.name() are used to generate the names for comparisons.
Invariants are not the only classes that can be tracked. Any class name is a valid entry. Thus, for example, to print information about derived sequence variables from sequence this.theArray[] and scalar x at program point DisjSets.find(int):::EXIT, the tracking argument would be:
SequenceScalarSubscriptFactory<x,this.theArray[]>@DisjSets.find(int):::EXIT
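The shape of a --track argument and the substring matching rule can be illustrated with a small parser. This is a sketch written from the description in this section, not a copy of the code in daikon.Debug; the class name is hypothetical, and applying substring matching to variables as well as to classes and program points is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative parser for "class|class<var,var>@ppt"; not Daikon's own code.
class TrackSpecSketch {
  final List<String> classes = new ArrayList<>(); // empty means "any class"
  final List<String> vars = new ArrayList<>();    // empty means "any variables"
  String ppt = null;                              // null means "any program point"

  static TrackSpecSketch parse(String spec) {
    TrackSpecSketch t = new TrackSpecSketch();
    // The program point follows the at sign.
    int at = spec.indexOf('@');
    if (at >= 0) {
      t.ppt = spec.substring(at + 1);
      spec = spec.substring(0, at);
    }
    // The variables sit between angle brackets, separated by commas.
    int open = spec.indexOf('<');
    int close = spec.lastIndexOf('>');
    if (open >= 0 && close > open) {
      t.vars.addAll(Arrays.asList(spec.substring(open + 1, close).split(",")));
      spec = spec.substring(0, open);
    }
    // Whatever remains is the pipe-separated list of class names.
    if (!spec.isEmpty()) {
      t.classes.addAll(Arrays.asList(spec.split("\\|")));
    }
    return t;
  }

  /** True if the class, the program point, and each variable match the spec. */
  boolean matches(String className, String pptName, String... varNames) {
    if (!classes.isEmpty() && !containsAny(className, classes)) return false;
    if (ppt != null && !pptName.contains(ppt)) return false;
    if (!vars.isEmpty()) {
      for (String v : varNames) {
        if (!containsAny(v, vars)) return false;
      }
    }
    return true;
  }

  private static boolean containsAny(String actual, List<String> specified) {
    for (String s : specified) {
      if (actual.contains(s)) return true;
    }
    return false;
  }
}
```

For instance, the specification LessThan|GreaterThan<return,orig(y)>@EXIT99 accepts class IntLessThan at a program point whose name contains EXIT99, but rejects class IntEqual at the same point.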
There are two configuration options that can customize the output. The option daikon.Debug.showTraceback will output a stack trace on each log statement. The option daikon.Debug.logDetail will cause more detailed (and often voluminous) output to be printed. For more information, see Configuration options.
Note that not all interesting information is necessarily logged at present. It will often be necessary to add new logging statements for the specific information of interest (see Adding track logging).
More detailed information can be found in the Javadoc for daikon.Debug and daikon.inv.Invariant.
When you add a new invariant, derived variable, or other component to Daikon, you should ensure that it supports track logging in the same way that existing components do. This section describes how to do so.
Track logging is based around the class name, program point name, and variables of interest. Track logging methods accept these parameters and a string to be printed. Debug.java implements the following basic log methods:
    log (String)
    log (Class, Ppt, String)
    log (Class, Ppt, VarInfo[], String)
The first uses the cached version of the Class, Ppt, and VarInfo that was provided in the constructor. The second uses the specified variables and the VarInfo information from Ppt. The third specifies each variable explicitly.
When logging is not enabled, calling the logging functions can take a significant amount of time (because the parameters need to be evaluated and passed). To minimize this, a function logOn() is provided to see if logging is enabled. It is recommended that code of the following form be used for efficiency:
    if (Debug.logOn()) {
      Debug.log (getClass(), ppt, "Entering routine foo");
    }
Track logging also can work with other loggers. Each of the logging methods has an alternative version that also accepts a logger as the first argument. In this case, normal track logging is performed if the class, ppt, and vars match. If they don't match, the same information is logged via the specified logger. For example:
    if (Debug.logOn() || logger.isLoggable (Level.FINE)) {
      Debug.log (logger, getClass(), ppt, "Entering routine foo");
    }
The above will print if either the tracking information matches or if the specified logger is enabled.
Convenience methods are available for track logging invariants. In this case the class name, ppt, and variable information are all taken from the invariant. The available methods are:
    logOn()
    logDetail()
    log (String)
    log (Logger, String)
These correspond to the Debug methods described above. They are the recommended way to log information concerning invariants.
Track logging also provides one additional level of detail. The function logDetail() returns whether or not more detailed information should be printed. This should be used for information which is not normally interesting or especially voluminous output. Often statements using logDetail() should be commented out when not in active use.
Each call to a track log method will produce output in the same basic format. Space for three variables is always maintained for consistency:
daikon.Debug: <class>: <ppt>: <var1>: <var2>: <var3>: <msg>
If showTraceback is enabled, the traceback will follow each line of debug output.
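Because every track log line has the same seven ‘: ’-separated fields, a line can be split into columns mechanically. The sketch below is an assumption-laden illustration: the class name is hypothetical, the sample line in the usage note is invented, and how an unused variable slot appears in real output (shown here as an empty field) is an assumption.

```java
// Splits a track log line into its seven columns; hypothetical helper.
class LogLineSketch {
  // prefix, class, ppt, var1, var2, var3, message
  static final int FIELD_COUNT = 7;

  static String[] columns(String line) {
    // The limit keeps any ": " inside the message from creating extra columns.
    return line.split(": ", FIELD_COUNT);
  }
}
```

For an invented line such as "daikon.Debug: IntEqual: Foo.bar():::ENTER: x: y: : falsified: x=1, y=2", the last column is the whole message "falsified: x=1, y=2".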
Unfortunately, in ASCII text, the above can be a little difficult to read because it normally doesn't line up very well. A simple translator to HTML exists that can be used to provide HTML formatted output. This tool is not completely tested, but seems to work reasonably well in most situations. The following instructions only apply to MIT, but the tool is shipped in the scripts directory and can easily be set up elsewhere as well.
Use the URL http://pag.csail.mit.edu/daikon/mit/log2html.php to access log2html. It will ask you for a file of daikon output. One good way to create this file is to use the tee command. For example:
daikon [daikon args] | tee ~/daikon.out
Then specify that file to log2html. Note that when supplying a filename to log2html, you must expand ‘~’ yourself since the webserver doesn't know who you are. The result will contain tables with the log output in them (all other output is unchanged). Table columns are based on the ‘: ’ separator in the ASCII output. If traceback is enabled, another column is added showing where the log method was called. For example, the traceback column might contain:
+PptSlice1.addInvariant
If you put your cursor over PptSlice1.addInvariant it will show the exact line number in the source file where the log method was called as part of the href. If you click on the traceback it will create an output file of type ‘application/emacs’ that contains an emacsclient command to edit the related source file. Most browsers can be set up to execute a command to process these files (in Mozilla this is done in the “Navigator/Helper Applications” section of preferences). The script $inv/scripts/browser_emacs will correctly handle files of this type and bring up the appropriate file in Emacs. This could easily be changed to support other editors. Note that as currently implemented this creates a possible security hole (malicious non-editor commands could be executed) as no checking is done on the validity of the command.
Clicking on the leading plus of the traceback information will show the entire traceback. For example:
    -PptSlice1.addInvariant
     PptSlice.flow_and_remove_falsified
     PptSlice1.add
     PptTopLevel.add
     PptTopLevel.add_and_flow
     FileIO.process_sample
     FileIO.read_data_trace_file
     FileIO.read_data_trace_files
     Daikon.process_data
     Daikon.main
The same capabilities (showing the line number, bringing up the buffer in emacs) exist on each of the frames in the traceback. The detailed traceback can be hidden by clicking on the leading ‘-’ on the first frame. Note that the current state of what tracebacks are expanded is kept in a file named the same as your Daikon output file with .state appended. For example, the state file for ~/daikon.out is ~/daikon.out.state. This file must be world writable for log2html to work correctly.
This chapter describes some of the techniques used in Daikon to make it efficient in terms of time and space needed. These techniques can be enabled or disabled at the Daikon command line, as described in Running Daikon.
Daikon reduces runtime and memory by avoiding performing work for redundant invariants that provide no useful information to the user. There are three basic types of optimization that can be performed for uninteresting invariants: non-instantiation, suppression, and non-printing.
Non-instantiation prevents the creation of an invariant because
the invariant's truth value is statically obvious (from the semantics
of the programming language), no matter what values may be seen at run
time. Two examples are “A[i] is an element of A[]” and “size(A[])
>= 0”. Non-instantiation is implemented by the
isObviousStatically
method.
With the equality sets optimization (see Equality optimization),
non-instantiation can only happen if all equality permutations are
statically obvious. Note that isObviousStatically should
be used only for invariants that are known to be true. Other code
presumes that any statically obvious invariants are true and can
be safely presumed when determining if other invariants are redundant.
An invariant can be suppressed if it is logically implied by some set of other invariants (referred to as “suppressors”). A suppressed invariant is not instantiated or checked as long as its suppressors hold. For example, “x > y” implies “x >= y”. Suppression has some limitations. It can neither suppress nor use as suppressors sample-dependent invariants (invariants that adapt themselves to the samples they see and whose equation thus involves a constant, such as “x > 42”). Suppression also cannot use relationships between variables. For example, it cannot suppress “x[i] = y[j]” by “(x[] = y[]) ^ (i = j)”. Suppressor invariants can only use variables that are also in the invariant that is being suppressed. In this example, only invariants using the variables “x[i]” and “y[j]” can be used as suppressors. See New suppressors for more information.
Non-printing is a post-pass that throws out any invariants that
are implied by other true invariants. It is similar to suppression, but
has none of the limitations of suppression. But since it is only run as
a post-pass, it cannot optimize runtime and memory use as suppression can.
Non-printing should be used only in cases where suppression cannot be used.
Non-printing is implemented by ObviousFilter, which calls the
isObviousDynamically method on invariants. The isObviousStatically
method is also used by the non-printing checks; it can be called at the
end without reference to equality sets.
More detail can be found in the paper “Efficient incremental algorithms for dynamic detection of likely invariants” by Jeff H. Perkins and Michael D. Ernst, published in FSE 2004; the paper is available from http://pag.csail.mit.edu/pubs/invariants-incremental-fse2004-abstract.html.
Dataflow hierarchy is a means to relate variables in different program
points in a partial ordering. Variables in program point X are
related to variables in another program point Y by a “flow” relation
if every sample seen of X's variables is also meant to be seen at Y.
Y is called a parent program point of X. For example, all the field
variables in the :::ENTER program point of a method in class C relate to
the field variables in the :::CLASS program point of C. This is because
the state of C, when in context at the entry :::ENTER program point, is
also in context at the :::CLASS program point. Any invariant that holds
true on a parent program point must hold on the child program point.
The purpose of dataflow hierarchy is to reduce the presence of
redundant invariants by only keeping invariants at the highest parent
at which they apply. This saves both time and space.
There are many ways that program points can be connected. Daikon
provides four. First, :::CLASS program points are parents of
all their method program points. Second, between two classes that are
related by inheritance, corresponding program points relate; for
example, java.util.Vector:::CLASS is a child of
java.util.List:::CLASS. Third, when a program point contains
variables of a type whose :::CLASS program point is also available to
Daikon, the former program point's variables relate to the variables of
that :::CLASS program point. For example, if X.y is of type Y, and
Y contains fields a and b, then X.y, X.y.a, and X.y.b relate to
Y.this, Y.a, and Y.b, respectively. Fourth, variables at :::ENTER program
points are related to the “orig” versions at :::EXIT program points.
When using Daikon, the above four ways of relations in the dataflow
hierarchy will result in some true invariants that are not reported at
some program points. However, the invariant will be present in some
parent program point. Dataflow hierarchy is enabled by default, but
can be disabled by the --nohierarchy flag. When dataflow is enabled,
the only samples that are examined by Daikon are the :::EXIT program
points (plus “orig” variables) since these contain a complete view of
the data.
When N variables are equal within a program point there will be N(N-1)/2 pairwise invariants to represent the equality within the equal variables, and N copies of every other invariant. For example, if a, b, and c are equal, then “a == b”, “a == c”, “b == c” will be reported as pairwise invariants, and “odd(a)”, “odd(b)” and “odd(c)” will be reported. If the variables will always be equal, then reporting N times the invariants is wasteful. Daikon thus treats equality specially.
Each group of variables that are equal from the start of inferencing is placed in an equality set. An equality set can hold an arbitrary number of variables, and replaces the O(N^2) pairwise equality invariants. Every equality set has a leader, a variable in the set that serves as its canonical representative. Non-equality invariants are only instantiated and checked on the leader. When printing invariants, Daikon reports only invariants on the leader. The user can easily determine that “odd(a)” and “a == b” imply “odd(b)”. Equality optimization can be turned off at the command line with the --noequality flag.
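The saving can be made concrete with a small sketch. The class below is illustrative only: the names EqualitySetSketch, pairwiseCount, and leader are hypothetical and are not Daikon's own classes, and the choice of the first variable as leader is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not Daikon's actual equality-set implementation. */
final class EqualitySetSketch {
  private final List<String> vars;

  EqualitySetSketch(String... vars) {
    if (vars.length == 0) {
      throw new IllegalArgumentException("an equality set needs at least one variable");
    }
    this.vars = Arrays.asList(vars);
  }

  /** Number of pairwise equality invariants needed for n equal variables: n(n-1)/2. */
  static int pairwiseCount(int n) {
    return n * (n - 1) / 2;
  }

  /** Assumption for this sketch: the leader is simply the first variable in the set. */
  String leader() {
    return vars.get(0);
  }

  int size() {
    return vars.size();
  }

  public static void main(String[] args) {
    EqualitySetSketch set = new EqualitySetSketch("a", "b", "c");
    // Without equality sets: one equality invariant per pair of variables.
    System.out.println("pairwise equalities: " + pairwiseCount(set.size()));
    // With equality sets: other invariants such as odd(...) are checked once, on the leader.
    System.out.println("non-equality invariants checked on: " + set.leader());
  }
}
```

With three equal variables the pairwise count is three, matching the “a == b”, “a == c”, “b == c” example above; the count grows quadratically, which is why a single set with one leader is cheaper.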
Daikon has two sets of tests: unit tests (see Unit testing) and regression tests (see Regression tests). If there are any differences between the expected results and the ones you get, don't check in your changes until you understand which is the desired behavior and possibly update the goals.
The Daikon distribution contains unit tests, but not regression tests (which would make the distribution much larger). The regression tests appear in Daikon's CVS repository.
The unit tests are found in invariants/java/daikon/test/; they use the JUnit unit testing framework. They take a few seconds to run. They are automatically run each time you compile Daikon (by running ‘make’ in $inv/java or any of its subdirectories). You can also run them explicitly via ‘make unit’. When you write new code or modify old code, please try to add unit tests.
This tests the formatting of invariants with specified input. The tests are configured in the file InvariantFormatTest.commands under daikon/test/. Make sure the InvariantFormatTest.commands file is in the classpath when this tester is run or the tester will not work. (It will just tell you that the file is not in the classpath.)
The file is formatted as follows:
  <fully qualified class name> [<instantiate args>]
  <type string>
  <goal string>+   <- 1 or more goal strings
  <sample>*        <- 0 or more samples
The file format should be the same regardless of blank or commented lines except in the samples area. No blank lines or comments should appear after the goal string before the first sample or between parts of samples (these lines are used currently to determine where samples lists end). This will be remedied in a future version of the tester.
Instantiate args
boolean true int 37 boolean false
Type string:
Goal string:
Example:

  Type string, Goals
   |             |
  \|/            |
  int           \|/
  Goal (daikon): a >= -6
  Goal (java): a >= -6
  Goal (esc): a >= -6
  Goal (ioa): a >= -6
  Goal (jml): a >= -6
  Goal (simplify): (>= |a| -6)
Note that the spacing on the goal lines is exact, that is, no extra spaces are allowed and no spaces are allowed to be missing. So the exact format is again:
Goal<1 space>(<format name>):<1 space><goal text>
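The exact-spacing rule can be expressed as a regular expression. The following is a minimal sketch, not the tester's actual parsing code: the class name GoalLineSketch is hypothetical, and it assumes that format names contain no spaces or parentheses and that the goal text does not begin with a space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the code used by InvariantFormatTester. */
final class GoalLineSketch {
  // "Goal", exactly one space, "(<format name>)", ":", exactly one space, goal text.
  private static final Pattern GOAL =
      Pattern.compile("Goal \\(([^() ]+)\\): (\\S.*)");

  /** Returns {formatName, goalText}, or null if the line is not in the exact format. */
  static String[] parse(String line) {
    Matcher m = GOAL.matcher(line);
    if (!m.matches()) {
      return null;
    }
    return new String[] {m.group(1), m.group(2)};
  }

  public static void main(String[] args) {
    String[] ok = parse("Goal (simplify): (>= |a| -6)");
    System.out.println(ok[0] + " -> " + ok[1]);
    // An extra space after "Goal" violates the exact format, so parse returns null.
    System.out.println(parse("Goal  (java): a >= -6") == null);
  }
}
```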
Samples:
Arrays and strings must be formatted according to the Daikon dtrace file convention (for a full description, see File formats). This states that arrays must be surrounded by brackets (start with ‘[’, end with ‘]’), and entries must be separated by a space. Strings must be enclosed in quotes (‘"’). Quotes within a string can be represented by the sequence ‘\"’.
For example:
  [1 2 3 4 5]    - an array with the elements 1, 2, 3, 4, 5
  "aString"      - a string
  "a string"     - also legal as a string
  "\""           - the string with the value "
  ["a" "b" "c"]  - an array of strings

  int int        <- type string
  Goal: a < b    <- goal string, no comment/blank lines after this
  1              <- or before this
  2
  2   <-|__ Pair of values (a = 2 , b = 3)
  3   <-|
Other examples are in the existing test file (InvariantFormatTest.commands).
The output of a test run can be converted into goals by using the --generate_goals switch to the tester as follows:
java daikon.test.InvariantFormatTester --generate_goals
Note that this test is included in the set of tests performed by the master tester, and so it is not necessary to separately run this test except to generate goal files.
Furthermore, this framework cannot parse complex types from files
unless they contain a public (Object) valueOf(String s)
function. Otherwise the program has no way of knowing how to create
such an object from a string. All primitives and the String type are
already recognized.
Sample testing tests various components of Daikon as samples are being processed. A file (normally daikon/test/SampleTester.commands) specifies a decls file to use, the samples for each ppt/var, and assertions about Daikon's state (such as whether or not a particular invariant exists).
Each line of the file specifies exactly one command. Blank lines and leading blanks are ignored. Comments begin with the number sign (‘#’) and extend to the end of the line. The type of command is specified as the first token on the line followed by a colon. The supported commands are:
decl:
This command specifies the declaration file to use. This is a normal decls file that should follow the format defined in the user manual.
ppt:
This command specifies the program point that will be used with following vars, data, and assert commands. The program point should be specified exactly as it appears in the decls file.
vars:
Specifies the variables that will be used on following data lines. Each variable must match exactly a variable in the ppt. Other variables will be treated as missing.
data:
Specifies the values for each of the previously specified variables. The values must match the type of the variables. A single dash (-) indicates that a variable is missing.
assert:
Specifies an assertion that should be true at this point (see Assertions). The negation of an assertion can be specified by adding an exclamation point before the assertion (for example: !inv("x > y", x, y)).
Assertions are formatted like function calls: <name>(arg1, arg2, ...). The valid assertions for the assert: command are:
The inv assertion asserts that the specified invariant exists in the current ppt. The format argument is the result of calling format() on the invariant. This is how the invariant is recognized. The remaining arguments are the variables that make up the invariant's slice. These must exactly match variables in the ppt. The inv assertion returns true iff the slice exists and an invariant is found within that slice that matches format. Optionally, format can be replaced by the fully qualified class name of the invariant. In this case, it is only necessary for the class to match.
More assertions can easily be added to SampleTester.java as required.
The following is a simple example of sample testing.
  decl: daikon/test/SampleTesters.decls
  ppt: foo.f():::EXIT35
  vars: x y z
  data: 1 1 0
  data: 2 1 0
  assert: inv("x >= y", x, y)
  assert: inv(daikon.inv.binary.twoScalar.IntGreaterEqual,x,y)
  assert: !inv("x <= y", x, y)
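The line-level rules for the commands file (one command per line, leading blanks ignored, ‘#’ comments, command name before the first colon) can be sketched as follows. This is an illustration rather than the code in SampleTester.java; the class SampleCommandSketch is hypothetical, and it makes the simplifying assumption that ‘#’ never appears inside a quoted assertion string.

```java
/** Illustrative sketch only; not the parser in SampleTester.java. */
final class SampleCommandSketch {
  final String name; // for example "ppt" or "data"
  final String args; // everything after the first colon, trimmed

  private SampleCommandSketch(String name, String args) {
    this.name = name;
    this.args = args;
  }

  /** Returns the parsed command, or null for blank and comment-only lines. */
  static SampleCommandSketch parse(String line) {
    // Assumption: '#' always starts a comment, even inside quotes.
    int hash = line.indexOf('#');
    if (hash >= 0) {
      line = line.substring(0, hash);
    }
    line = line.trim();
    if (line.isEmpty()) {
      return null;
    }
    // Split at the FIRST colon only, so "ppt: foo.f():::EXIT35" keeps its ":::".
    int colon = line.indexOf(':');
    if (colon < 0) {
      throw new IllegalArgumentException("no command on line: " + line);
    }
    return new SampleCommandSketch(
        line.substring(0, colon).trim(), line.substring(colon + 1).trim());
  }

  public static void main(String[] args) {
    SampleCommandSketch c = parse("  ppt: foo.f():::EXIT35   # program point");
    System.out.println(c.name + " | " + c.args);
  }
}
```

Splitting at the first colon matters because program point names themselves contain colons.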
The regression tests run Daikon on many different inputs and compare Daikon's output to expected output. They take about an hour to run.
The regression tests appear in the $inv/tests/ directory. Type ‘make’ in that directory to see a list of makefile targets. The most common target is ‘make diffs’; if any resulting file has non-zero size, the tests fail. You do not generally need to do ‘make clean’, which forces re-instrumentation (a possibly slow process) the next time you run the tests.
As when you install or compile Daikon, when you run the tests environment variable DAIKONDIR (or INV, whose effect is the same) should be set. Additionally, environment variable JAVA_HOME should be the directory containing the Java JDK.
You should generally run the regression tests before checking in a change (especially any non-trivial change). If any of the regression test diffs has a non-zero size, then your edits have changed Daikon's output and you should not check in without carefully determining that the changes are intentional and desirable (and you should update the goal output files, so that the diffs are again zero).
There are several subdirectories under $inv/tests/, testing different components of the Daikon distribution (such as Kvasir, see Kvasir). Tests of the invariant detection engine itself appear in $inv/tests/daikon-tests/.
Each Makefile under $inv/tests/ includes $inv/tests/Makefile.common, which contains the logic for all of the tests. Makefile.common is somewhat complicated, if only because it controls so many types of tests.
Note on Kvasir tests: The Kvasir (Daikon C front-end) tests appear in the $inv/tests/kvasir-tests directory. These tests run Daikon to ensure that the Kvasir output is valid Daikon input. To run them, go to $inv/tests/kvasir-tests or any test sub-directory within it and run ‘make summary-w-daikon’. If any tests return ‘FAILED’, then you should look at the appropriate .diff file. If you believe that the failure results from your Daikon changes and that the new output is in fact correct, then run ‘make update-inv-goals’ to update the Daikon invs.goal file.
Most Daikon regression tests appear in $inv/tests/daikon-tests. Each test is placed in a separate directory. That directory contains a simple makefile and the goal files for the tests. The source files for the test are stored in $inv/tests/sources. For example, the StackAr directory contains the following files in CVS:
  Makefile
  Stackar.spinfo-static.goal
  StackAr.txt-daikon.goal
  StackAr.txt-esc.goal
  StackAr.txt-jml.goal
  StackAr.txt-merge-esc.goal
  StackAr.txt-merge-jml.goal
The Makefile must contain the following entries.
instrument-files-revise:
	echo "DataStructures/StackAr.java" >| ${INST_LIST_FILE}
The goal files are the expected results of running Daikon and its associated tools. The easiest way to create them is to simply create empty versions of each. Then execute ‘make diffs’ to run the test and produce results. When the results are as expected, execute ‘make update-goals’ to copy the results into the goal files. Release the test by committing the goal files, Makefile, and source files to the CVS repository.
The common makefile contains a number of other useful targets. A brief explanation of each can be found by executing ‘make’ (without a target).
The test can be added to the standard tests (either ‘everything’ or ‘quick’) by adding the test to the appropriate list in $inv/tests/daikon-tests/Makefile.
This section is intended primarily for researchers who are analyzing historical versions of Daikon. A number of researchers (for example, in the testing community) use Daikon because it contains both a CVS repository and a set of tests. (The CVS repository can also be useful to those making non-trivial changes to the Daikon code base, because the CVS repository includes regression tests that are more extensive than the unit tests that are included in the Daikon distribution. However, note that the Daikon distribution contains full source code, and we are always happy to receive bug fixes and patches against the source code.)
If you wish access to the Daikon CVS repository, send mail to daikon-developers@lists.csail.mit.edu. We appreciate it if you let us know why you need it and what you want to use it for. Also, we request that you keep us apprised of any problems that you encounter or discoveries that you make, and that you let us know of any publications so that we can publicize them at http://pag.csail.mit.edu/daikon/pubs/#daikon-testsubject. Also, please do not redistribute the repository without prior permission from us.
We typically give you a copy of the CVS repository (about a 1GB download) rather than remote access to the master CVS repository. This protects you from the possibility that our server is down, or that we someday cut you off from access to the repository; you are guaranteed to be able to reproduce your results. It is also less hassle for us (we don't have to create an account for you), and it is less load on our servers (since researchers may wish to perform many CVS operations).
This section points out some pitfalls for such researchers. Although these problems are easy to avoid, some previous published work has made these mistakes; don't let that happen to you!
Recall that Daikon contains two sets of tests (see Testing); you should include both in any analysis of Daikon's tests. (Or, if you can analyze only one of the two sets of tests, then clearly explain that the regression tests are the main tests.) The regression tests use Makefiles to avoid re-doing unnecessary work, so any description of the time taken to run Daikon's tests should be a measurement of re-running the tests after they have been run once, not running them from a clean checkout or after a ‘make clean’ command.
Daikon intentionally does not contain tests for third-party libraries that are included (sometimes in source form) in the Daikon distribution. As one example, the java/jtb/ directory contains an external library. Therefore, any measurement of Daikon's code coverage should not include such libraries (or other libraries, some of which are distributed as .jar files).
Be sure to see file doc/www/mit/index.html in the repository for information about how group members use Daikon. This file changes from time to time — for instance, it changed when a CVS branch was created and later when development on it ceased (see Branches).
The Daikon CVS repository contains two branches: a main trunk and a branch (named ‘ENGINE_V2_PATCHES’) for version 2 of Daikon.
The CVS manual (see section “Branching and merging” of the manual CVS — Concurrent Versions System) describes CVS branches:
CVS allows you to isolate changes onto a separate line of development, known as a “branch”. When you change files on a branch, those changes do not appear on the main trunk or other branches. Later you can move changes from one branch to another branch (or the main trunk) by “merging”. Merging involves first running ‘cvs update -j’, to merge the changes into the working directory. You can then commit that revision, and thus effectively copy the changes onto another branch.
In early January 2002 (or perhaps in late 2001), we created the ‘ENGINE_V2_PATCHES’ branch at the invariants/java/daikon level of the Daikon CVS repository. Primary development continued along the CVS branch ‘ENGINE_V2_PATCHES’, which we called “Daikon version 2”. We called the CVS trunk “Daikon version 3”; it was experimental, and very few people ran its code or performed development on it. Periodically, all changes made to the branch would be merged into the trunk, as one large checkin on the trunk. Later, development on version 3 became more common, some changes were merged from the trunk to the branch, and version 2 was finally retired (and no more changes were made to the branch) in December 2003.
A regular ‘cvs checkout’ gets the trunk. The -r flag specifies a branch. For example, to get the branch as of June 9, 2003, one could do
  cvs -d $pag/projects/invariants/.CVS co -r ENGINE_V2_PATCHES \
      -D 2003/06/09 invariants/java/daikon
Some warnings about analyzing historical versions of Daikon:
This chapter contains information about the file format of Daikon's input files. It is of most interest to those who wish to write a front end, also known as an instrumenter (see Front ends (instrumentation)). A new front end enables Daikon to detect invariants in another programming language.
Daikon's input is conventionally one or more .dtrace data trace files. (Another, optional type of input file for Daikon is a splitter info file; see Splitter info file.) A trace file is a text file that consists of newline-separated records. There are two basic types of records that can appear in Daikon's input: program point declarations, and trace records. The declarations describe the structure of the trace records. The trace records contain the data on which Daikon operates — the run-time values of variables in your program.
Each declaration names an instrumented program point and lists the variables at that program point. A program point is a location in the program, such as a specific line number, or a specific procedure's entry or exit. An instrumented program point is a place where the instrumenter may emit a trace record. A program point declaration may be repeated, so long as the declarations match exactly (any declarations after the first one have no effect).
A data trace record (also known as a “sample”) represents one execution of a program point. The record specifies the program point and gives the runtime values of each variable. The list of variables in the data trace record must be identical to that in the corresponding declaration. For a given program point, the declaration must precede the first data trace record for the program point. It is not required that all the program point declarations appear before any of the data trace records.
There exist some other declaration-related records; see Declaration-related records.
Instead of placing both declarations and data trace records in a single file, it is permitted to place the declarations in one or more .decls “declaration files” while leaving the data trace records in the .dtrace file. This can be convenient for tools that perform a separate instrumentation step, such as dfepl (see dfepl) and Mangel-Wurzel (see Mangel-Wurzel). Such a tool takes as input a target program to be analyzed, and produces two outputs: a .decls file and an instrumented program. Executing the instrumented program produces a .dtrace file containing data trace records for all the program points that appear in the .decls file. This approach works fine and is easier to implement in certain situations, but has a few disadvantages. It requires the user to perform at least two steps (instrumentation and execution), and the existence of two versions of the program (instrumented and uninstrumented) can lead to confusion or extra work. It is also more convenient to have a single file that contains all information about a program, rather than multiple .decls files that must be associated with the .dtrace file.
Daikon files are textual, to permit easier viewing and editing by humans. Each record is separated by one or more blank lines. To permit easier parsing by programs, each piece of information in a record appears on a separate line.
Outside a record, any line starting with a pound sign (#) or double slashes (//) is ignored as a comment. Comments are not permitted inside a record.
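These conventions suggest a simple first pass for a reader of trace files. The sketch below is not Daikon's FileIO code; the class RecordSplitterSketch is hypothetical, and it assumes that trimming each line is acceptable (which holds for declarations, where indentation is ignored, but is a simplification for data values).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not Daikon's actual trace file reader. */
final class RecordSplitterSketch {
  /** Splits file contents into records, each a list of non-blank lines. */
  static List<List<String>> split(String text) {
    List<List<String>> records = new ArrayList<>();
    List<String> current = new ArrayList<>();
    for (String raw : text.split("\n", -1)) {
      String line = raw.trim(); // assumption: surrounding whitespace is insignificant
      if (line.isEmpty()) {
        // One or more blank lines end the current record, if there is one.
        if (!current.isEmpty()) {
          records.add(current);
          current = new ArrayList<>();
        }
        continue;
      }
      boolean outsideRecord = current.isEmpty();
      if (outsideRecord && (line.startsWith("#") || line.startsWith("//"))) {
        continue; // comments are ignored only outside a record
      }
      current.add(line);
    }
    if (!current.isEmpty()) {
      records.add(current);
    }
    return records;
  }

  public static void main(String[] args) {
    String text = "# a comment\ndecl-version 2.0\n\nppt foo():::ENTER\n  ppt-type enter\n";
    for (List<String> record : split(text)) {
      System.out.println(record);
    }
  }
}
```

Because comments are not permitted inside a record, the sketch keeps any such line as ordinary record content and leaves it to a later stage to reject it.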
The trace file (or declaration file) first states the declaration file format version number (see Declaration version). It may also specify some other information about the file (see Declaration-related records). Then, it defines each program point and its variables.
Indentation is ignored, so it may be used to aid readability. Fields with defaults can be omitted.
As a rule, each line of the declaration file is of the form <field-name> <field-value>.
The declaration version record must be the first record in the file.
The declaration version record is as follows:
decl-version <version>
The current version is 2.0.
Previous versions (see Version 1 Declarations) did not include a version field and are identified by the lack of this field.
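A reader could tell the two formats apart along the following lines. This is only a sketch under stated assumptions: the class DeclVersionSketch and its version method are hypothetical, and the caller is assumed to pass the first line of the first record in the file.

```java
/** Illustrative sketch only; not Daikon's actual version detection. */
final class DeclVersionSketch {
  /**
   * Returns the declared version (for example "2.0") if the line is a
   * decl-version record, or null if it is not, which by the rule above
   * indicates a version 1 file.
   */
  static String version(String firstRecordLine) {
    String[] tokens = firstRecordLine.trim().split("\\s+");
    if (tokens.length == 2 && tokens[0].equals("decl-version")) {
      return tokens[1];
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(version("decl-version 2.0"));
    // Any other first record means the version field is absent.
    System.out.println(version("ppt foo():::ENTER"));
  }
}
```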
You can specify the language in which the program was written with a record of the form
input-language <language>
The language string is arbitrary and is not currently used.
The variable comparability record indicates how the comparability field of a variable declaration should be interpreted.
Its format is:
var-comparability <comparability-type>
The possible values for comparability-type are implicit and none.
“implicit” means ordinary comparability as described in
Program point declarations. (The name implicit is retained
for historical reasons.)
This record is optional. The implicit type is the default.
This declaration indicates classes that implement the
java.util.List interface and that should be treated as sequences
for the purposes of invariant detection. The syntax is as follows:
ListImplementors <classname1> <classname2> ...
Each classname is in Java format (for example, “java.util.LinkedList”).
The ‘--list_type’ command-line option to Daikon can also be used to specify classes that implement lists; see Options to control invariant detection.
The format of a program point declaration is:
  ppt <ppt-name>
  <ppt-info>
  <ppt-info>
  ...
  <variable-declaration>
  <variable-declaration>
  ...
The program point name can include any character. In the declaration
file, blanks must be replaced by \_, and backslashes must be escaped as \\.
Program point names must be distinct.
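The escaping rule for names can be sketched as a pair of helper functions. These are hypothetical helpers written for illustration (the class PptNameEscapeSketch is not part of Daikon), and they assume that “blank” means the space character only.

```java
/** Illustrative sketch only; not Daikon's own escaping code. */
final class PptNameEscapeSketch {
  /** Replaces each blank with \_ and each backslash with \\. */
  static String escape(String name) {
    StringBuilder sb = new StringBuilder();
    for (char c : name.toCharArray()) {
      if (c == '\\') {
        sb.append("\\\\");
      } else if (c == ' ') {
        sb.append("\\_");
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Inverse of escape; scans left to right so that \\_ is read as backslash, underscore. */
  static String unescape(String escaped) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < escaped.length(); i++) {
      char c = escaped.charAt(i);
      if (c == '\\' && i + 1 < escaped.length()) {
        char next = escaped.charAt(++i);
        if (next == '_') {
          sb.append(' ');
        } else if (next == '\\') {
          sb.append('\\');
        } else {
          // Assumption: an unrecognized escape is kept unchanged.
          sb.append(c).append(next);
        }
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String name = "Foo.bar(int, int):::ENTER";
    String escaped = escape(name);
    System.out.println(escaped);
    System.out.println(unescape(escaped).equals(name));
  }
}
```

Decoding in a single left-to-right scan, rather than with two successive global replacements, avoids misreading an escaped backslash that happens to be followed by an underscore.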
The following information about the program point (ppt-info) can be specified:
ppt-type <type>
Specifies the type of the program point. Possible program point
types are point, class, object, enter, exit, subexit. Except for point,
all of these types are related to the program point hierarchy (see Dataflow hierarchy).
A point program point is one that is not involved in a
program point hierarchy. This is normally used when the input is not
from a programming language or when there is no dataflow hierarchy.
flags <flags>
Specifies one or more flags for this ppt. The possible flags are:
static, enter, exit, private, return.
parent <relation-type> <parent-ppt-name> <relation-id>
Specifies the program point hierarchy (see Dataflow hierarchy).
In particular, each parent
field names one parent of this program point. A parent program point
is a point whose samples should include all of the samples at this
program point. For example, an object program point is a parent of
each of the method program points in that object.
The relation-type is the type of parent-child relationship in
the hierarchy. Possible relationship types are parent and user.
A parent relationship is one where the program points themselves
are explicitly related, such as an enter and an exit point. All of the
variables at one of the points exist at the other. A user
relation is one where a class is used at another point, such as at an
enter point. For example, if a reference to class A were passed to
routine r1, the values found at enter and exit of r1 could be applied to
the class/object program point for A. By default, user relations
are not used because they can be recursive.
The relation-id is a unique integer that identifies this parent relation. It is used when defining the specific parent relations for variables.
Multiple parent fields can be specified.
The format of a variable declaration is:
  variable <name>
  <variable-info>
  <variable-info>
  ...
The variable name is arbitrary, but for clarity, it should match what is
used in the programming language. All characters are legal in a name,
but blanks must be represented as \_ and backslashes as \\.
If the variable is an array, ‘..’ marks the location of
array indices within the variable name. Some examples of names are:
  this.theArray
  this.theArray[..]
  this.stack.getClass()
The following information about the variable (variable-info) can be specified:
var-kind <kind> [<relative-name>]
Specifies the variable kind. Possible values are: field, function,
array, variable, return. If field or function
is specified, the relative name of the field or function must be
specified. For example, if the variable is this.theArray, the
relative name is theArray. Pointers to arrays are of type
field. The arrays themselves (a sequence of values) are of
type array. A var-kind entry is required in each variable block.
enclosing-var <enclosing-var-name>
The variable that contains this variable. Required for fields and arrays, and optional for functions. If specified for functions, the function is an instance method. If not specified the function is static. A variable is specified by its name. The enclosing-var must be defined. If a variable is omitted (e.g., by the omit-var switch), any variable for which it is the enclosing variable must be omitted as well.
For example, if the variable is this.theArray, the
enclosing variable is this.
reference-type pointer|offset
Specifies the kind of reference for variables which are structures or
classes. The possible values are pointer or offset. In
C, pointer is used if the variable is a pointer; offset
is used when the structure is placed inline. pointer would be used
for all references to Java objects. Defaults to pointer.
array <dim>
The number of array dimensions inherited or declared by this variable. The valid values are 0 or 1. This should be specified for any variable that has multiple values. If not specified it defaults to 0. Future versions of Daikon may support more levels of arrays.
dec-type <language-declaration>
This is what the programmer used in the declaration of the variable.
Names for standard types should use Java's names (e.g., int,
boolean, etc.), but names for user-defined or language-specific
types can be arbitrary strings. A dec-type entry is required in each
variable block.
rep-type <daikon-type>
This describes what will appear in the data
trace file. For instance, the declared type might be char[] but
the representation type might be java.lang.String. Or, the declared
type might be Object but the representation type might be
hashcode, if the address of the object is written to the data trace
file. A rep-type entry is required in each
variable block.
The representation type should be one of boolean, int,
hashcode, double, or java.lang.String; or an
array of one of those (indicated by a [] suffix, as in Java).
hashcode is intended for unique object identifiers like memory
addresses (pointers) or the return value of Java's
Object.hashCode method. hashcode is treated like
int, except that the hashcode values are considered uninteresting
for the purposes of output. For example, Daikon will print
‘var has only one value’ instead of ‘var == 0x38E8A’.
flags <flags>
One or more flags may optionally be specified. Possible values are:
is_param
Indicates that a given variable is a parameter to a procedure. Some
procedures reassign parameters, essentially using them as local
variables. Such uses are not relevant to the procedure's external
specification. The is_param flag causes Daikon not to print
certain invariants, if the variable has been reassigned.
p
in its post-state form are not
printed.
p
(such as p.x
)
are printed only if p
has not changed.
p
is changed, but then, p
would no longer be interesting.)
no_dups
Indicates that a collection cannot contain duplicates. In that case, Daikon does not check for some invariants that only have meaning for collections that can contain duplicate elements.
not_ordered
Indicates that the order of a collection does not have meaning. In this case, Daikon does not check for element-wise comparisons between it and other collections.
synthetic
Indicates that the variable was added by the front end and is not manifest in the input program.
classname
Indicates that the variable holds the class name of its enclosing variable.
to_string
Indicates that the variable is the string representation of its enclosing variable.
non_null
Indicates that the variable can't take on a null value. In this case, Daikon will not check for the NonZero invariant.
comparability <comparability-key>
The comparability-key indicates which other variables are comparable to this one. The information specified here might have been obtained dynamically, via type-inference based analysis, or in some other manner.
A comparability for a non-array type is a signed integer. Two variables at the same program point are considered comparable if both integers are the same, or if either integer is negative (that is, a negative number means “comparable to every other variable”). A comparability for an array type must contain an integer for each index and for the contents; for instance, ‘5[22][17]’ for a two-dimensional array. An array comparison succeeds if comparisons over each component succeed.
Variables at different program points are never compared to one another. Use of the same number at different program points does not indicate any relationship between the variables, and a given variable may have a different comparability integer at different program points.
As an example, in the following code:
int sum(int len, int[] a) {
  int sum = 0;
  for (int i = 0; i < len; i++)
    sum += a[i];
  return sum;
}
variables i and len are comparable to one another (and to indices of array a). Furthermore, the result is comparable to the elements of array a. The comparability keys for these variables might look like
len - comparability 5
a - comparability 8[5]
return - comparability 8
A comparability entry is required in each variable block.
parent <parent-ppt> <rel-id> [<parent-variable>]
Optionally specifies the parent variable of this variable in the program point/variable hierarchy. The parent-ppt is the name of the parent program point. The rel-id must be one of the relationship ids specified for this program point. The parent-variable is the name of this variable's parent in the parent program point. If the names are the same, it can be omitted.
constant <value>
Optionally specifies a constant value for this variable. If the variable has a compile-time constant value, it must be omitted from the data trace records.
function-args <arg1> <arg2> ...
Optionally specifies the arguments to a function (if any), given by the external names of the argument variables. Multiple arguments are separated by blanks. For example,
function-args a.b this.f1
specifies that the function takes two arguments, a.b and this.f1. As with enclosing variables, each of the arguments must be defined as a variable.
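The keyword-plus-operands layout of the fields described above can be illustrated with a small reader. This is only a sketch: the class VariableBlockSketch and its method are hypothetical names, not part of Daikon, and it makes simplifying assumptions (each keyword appears at most once per variable block, and operands never contain blanks).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Daikon): collects the fields of one variable block. */
final class VariableBlockSketch {
  /** The flag names listed in this section. */
  static final List<String> KNOWN_FLAGS = Arrays.asList(
      "is_param", "no_dups", "not_ordered", "synthetic",
      "classname", "to_string", "non_null");

  /** Maps each field keyword (such as "flags") to its blank-separated operands. */
  static Map<String, List<String>> parseFields(List<String> lines) {
    Map<String, List<String>> fields = new LinkedHashMap<>();
    for (String line : lines) {
      String[] tokens = line.trim().split("\\s+");
      if (tokens[0].isEmpty()) {
        continue; // skip blank lines
      }
      List<String> operands =
          new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length));
      if (tokens[0].equals("flags")) {
        for (String flag : operands) {
          if (!KNOWN_FLAGS.contains(flag)) {
            throw new IllegalArgumentException("unknown flag: " + flag);
          }
        }
      }
      // Assumption: a later line with the same keyword replaces an earlier one.
      fields.put(tokens[0], operands);
    }
    return fields;
  }
}
```

For instance, the line "function-args a.b this.f1" yields the keyword function-args with the two operands a.b and this.f1.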
A data trace record (also known as a “sample”) contains run-time value information. Its format is:
<program-point-name>
this_invocation_nonce
<nonce-string>
<varname-1>
<var-value-1>
<var-modified-1>
<varname-2>
<var-value-2>
<var-modified-2>
...
In other words, the sample record contains:
The name of the program point.
Optionally, a nonce used to match procedure entries (whose names conventionally end with :::ENTER) with procedure exits (whose names conventionally end with :::EXIT). This is necessary in concurrent systems because there may be several invocations of a procedure active at once, and they do not necessarily follow a stack discipline (that is, they need not be exited in the reverse order of entry). For non-concurrent systems, this nonce is not necessary, and both the line this_invocation_nonce and the nonce value may be omitted.
For each variable: its name, its value, and its modified field.
A null reference is written as null.
An array is written as an open bracket ([), elements separated by spaces, and a close bracket (]). (Also, the array name should end in ‘[..]’; use ‘a[..]’ for array contents, but ‘a’ for the identity of the array itself.)
The value may also be the string nonsensical; see Nonsensical values.
A string or array value is never null. A reference to a string or array may be null, in which case the string or array is nonsensical.
The special value 2 for the modified field should be used only (and always) when the value field is nonsensical.
The variables should appear in the same order as they did in the declaration of the program point, without omissions or additions.
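As an illustration of the record layout just described, the following sketch reads one sample record from a list of lines. It is not Daikon's own parser: SampleRecordSketch is a hypothetical name, the nonce keyword is taken to be this_invocation_nonce (the spelling used in the StackArTester.dtrace excerpt), and values are kept as uninterpreted strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for one data trace record; not Daikon's own parser. */
final class SampleRecordSketch {
  /** One name/value/modified triple. */
  static final class VarSample {
    final String name;
    final String value;
    final int modified;

    VarSample(String name, String value, int modified) {
      this.name = name;
      this.value = value;
      this.modified = modified;
    }
  }

  final String pptName;
  final String nonce; // null when the record carries no nonce
  final List<VarSample> vars;

  private SampleRecordSketch(String pptName, String nonce, List<VarSample> vars) {
    this.pptName = pptName;
    this.nonce = nonce;
    this.vars = vars;
  }

  /** Parses the lines of a single record (no blank lines included). */
  static SampleRecordSketch parse(List<String> lines) {
    if (lines.isEmpty()) {
      throw new IllegalArgumentException("empty record");
    }
    int next = 1;
    String nonce = null;
    if (lines.size() >= 3 && lines.get(1).equals("this_invocation_nonce")) {
      nonce = lines.get(2);
      next = 3;
    }
    if ((lines.size() - next) % 3 != 0) {
      throw new IllegalArgumentException("incomplete variable triple");
    }
    List<VarSample> vars = new ArrayList<>();
    for (int i = next; i < lines.size(); i += 3) {
      String value = lines.get(i + 1);
      int modified = Integer.parseInt(lines.get(i + 2).trim());
      // The modified field is 2 exactly when the value is nonsensical.
      if (value.equals("nonsensical") != (modified == 2)) {
        throw new IllegalArgumentException("bad modified field for " + lines.get(i));
      }
      vars.add(new VarSample(lines.get(i), value, modified));
    }
    return new SampleRecordSketch(lines.get(0), nonce, vars);
  }
}
```

A reader built this way would still need to check that the variables match the program point's declaration, in the same order.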
Some trace variables and derived variables may not have a value because the expression that computes them cannot be evaluated. In such a circumstance, the value is said to be nonsensical: it is written in the trace file as nonsensical, and its modified field must be 2.
Examples include
x when x is uninitialized or deallocated,
x.y when x is null (or uninitialized or deallocated),
a[i] when i is outside the bounds of a (or uninitialized or deallocated, or a is null, uninitialized, or deallocated).
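A front end might guard each expression before writing it, along the following lines. This is a sketch under stated assumptions: NonsensicalSketch and its methods are hypothetical, only the null and out-of-bounds cases are covered (Java has no uninitialized or deallocated values to detect), and the modified field for sensible values is supplied by the caller.

```java
/** Hypothetical writer helpers; not part of any Daikon front end. */
final class NonsensicalSketch {
  static final String NONSENSICAL = "nonsensical";
  static final int MODIFIED_NONSENSICAL = 2;

  /** Joins name, value, and modified field, one per line. */
  static String triple(String name, String value, int modified) {
    return name + "\n" + value + "\n" + modified + "\n";
  }

  /** Writes a[i]; the value is nonsensical if a is null or i is out of bounds. */
  static String elementTriple(String name, int[] a, int i, int modified) {
    if (a == null || i < 0 || i >= a.length) {
      return triple(name, NONSENSICAL, MODIFIED_NONSENSICAL);
    }
    return triple(name, Integer.toString(a[i]), modified);
  }

  /** Writes the contents of a as "[e1 e2 ...]"; nonsensical if a is null. */
  static String arrayTriple(String name, int[] a, int modified) {
    if (a == null) {
      return triple(name, NONSENSICAL, MODIFIED_NONSENSICAL);
    }
    StringBuilder sb = new StringBuilder("[");
    for (int k = 0; k < a.length; k++) {
      if (k > 0) {
        sb.append(' ');
      }
      sb.append(a[k]);
    }
    sb.append(']');
    return triple(name, sb.toString(), modified);
  }
}
```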
A trace record should contain exactly the same variables as in the corresponding declaration. There is one exception: for efficiency, compile-time constants (e.g., static final variables in Java) are omitted from the trace record, since they would have the same value every time.
Neither the declarations nor the trace records contain derived variables (see Variable names).
Here are portions of two files StackArTester.decls and StackArTester.dtrace, for a Java class that implements a stack of integers using an array as the underlying data structure. You can see many more examples by simply running an existing front end on some Java, C, or Perl programs and viewing the resulting files.
This is part of the file StackArTester.decls, a declaration file for the StackAr.java program (see StackAr example).
ppt DataStructures.StackAr.push(java.lang.Object):::ENTER
ppt-type enter
parent parent DataStructures.StackAr:::OBJECT 1
variable this
  var-kind variable
  dec-type DataStructures.StackAr
  rep-type hashcode
  flags is_param
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray
  var-kind field theArray
  enclosing-var this
  dec-type java.lang.Object[]
  rep-type hashcode
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray.getClass()
  var-kind function getClass()
  enclosing-var this.theArray
  dec-type java.lang.Class
  rep-type java.lang.String
  flags synthetic classname
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray[..]
  var-kind array
  enclosing-var this.theArray
  array 1
  dec-type java.lang.Object[]
  rep-type hashcode[]
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray[..].getClass()
  var-kind function getClass()
  enclosing-var this.theArray[..]
  array 1
  dec-type java.lang.Class[]
  rep-type java.lang.String[]
  flags synthetic classname
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.topOfStack
  var-kind field topOfStack
  enclosing-var this
  dec-type int
  rep-type int
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable x
  var-kind variable
  dec-type java.lang.Object
  rep-type hashcode
  flags is_param
  comparability 22
variable x.getClass()
  var-kind function getClass()
  enclosing-var x
  dec-type java.lang.Class
  rep-type java.lang.String
  flags synthetic classname
  comparability 22

ppt DataStructures.StackAr.push(java.lang.Object):::EXIT99
ppt-type subexit
parent parent DataStructures.StackAr:::OBJECT 1
variable this
  var-kind variable
  dec-type DataStructures.StackAr
  rep-type hashcode
  flags is_param
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray
  var-kind field theArray
  enclosing-var this
  dec-type java.lang.Object[]
  rep-type hashcode
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray.getClass()
  var-kind function getClass()
  enclosing-var this.theArray
  dec-type java.lang.Class
  rep-type java.lang.String
  flags synthetic classname
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray[..]
  var-kind array
  enclosing-var this.theArray
  array 1
  dec-type java.lang.Object[]
  rep-type hashcode[]
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray[..].getClass()
  var-kind function getClass()
  enclosing-var this.theArray[..]
  array 1
  dec-type java.lang.Class[]
  rep-type java.lang.String[]
  flags synthetic classname
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.topOfStack
  var-kind field topOfStack
  enclosing-var this
  dec-type int
  rep-type int
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable x
  var-kind variable
  dec-type java.lang.Object
  rep-type hashcode
  flags is_param
  comparability 22
variable x.getClass()
  var-kind function getClass()
  enclosing-var x
  dec-type java.lang.Class
  rep-type java.lang.String
  flags synthetic classname
  comparability 22

ppt DataStructures.StackAr:::OBJECT
ppt-type object
variable this
  var-kind variable
  dec-type DataStructures.StackAr
  rep-type hashcode
  flags is_param
  comparability 22
variable this.theArray
  var-kind field theArray
  enclosing-var this
  dec-type java.lang.Object[]
  rep-type hashcode
  comparability 22
variable this.theArray.getClass()
  var-kind function getClass()
  enclosing-var this.theArray
  dec-type java.lang.Class
  rep-type java.lang.String
  flags synthetic classname
  comparability 22
variable this.theArray[..]
  var-kind array
  enclosing-var this.theArray
  array 1
  dec-type java.lang.Object[]
  rep-type hashcode[]
  comparability 22
variable this.theArray[..].getClass()
  var-kind function getClass()
  enclosing-var this.theArray[..]
  array 1
  dec-type java.lang.Class[]
  rep-type java.lang.String[]
  flags synthetic classname
  comparability 22
variable this.topOfStack
  var-kind field topOfStack
  enclosing-var this
  dec-type int
  rep-type int
  comparability 22
This is part of file StackArTester.dtrace, which you can create by running the instrumented StackAr.java program (see StackAr example). This excerpt contains only the first two calls to push and the first return from push, along with the associated object program point records; omitted records are indicated by ellipses.
...

StackAr.push(java.lang.Object):::ENTER
this_invocation_nonce
55
x
1217030
1
x.getClass()
"DataStructures.MyInteger"
1
this.theArray
3852104
1
this.theArray.getClass()
"java.lang.Object[]"
1
this.theArray[]
[null]
1
this.theArray[].getClass()
[null]
1
this.topOfStack
-1
1

StackAr:::OBJECT
this.theArray
3852104
1
this.theArray.getClass()
"java.lang.Object[]"
1
this.theArray[]
[null]
1
this.theArray[].getClass()
[null]
1
this.topOfStack
-1
1

...

StackAr.push(java.lang.Object):::EXIT96
this_invocation_nonce
55
x
1217030
1
x.getClass()
"DataStructures.MyInteger"
1
this.theArray
3852104
1
this.theArray.getClass()
"java.lang.Object[]"
1
this.theArray[]
[1217030]
1
this.theArray[].getClass()
["DataStructures.MyInteger"]
1
this.topOfStack
0
1

StackAr:::OBJECT
this.theArray
3852104
1
this.theArray.getClass()
"java.lang.Object[]"
1
this.theArray[]
[1217030]
1
this.theArray[].getClass()
["DataStructures.MyInteger"]
1
this.topOfStack
0
1

...

StackAr.push(java.lang.Object):::ENTER
this_invocation_nonce
94
x
1482257
1
x.getClass()
"DataStructures.StackAr"
1
this.theArray
350965
1
this.theArray.getClass()
"java.lang.Object[]"
1
this.theArray[]
[null]
1
this.theArray[].getClass()
[null]
1
this.topOfStack
-1
1

StackAr:::OBJECT
this.theArray
350965
1
this.theArray.getClass()
"java.lang.Object[]"
1
this.theArray[]
[null]
1
this.theArray[].getClass()
[null]
1
this.topOfStack
-1
1

...
This section describes the original version (1.0) of declaration records. These are now obsolete and should not be used.
A declarations file can contain program point declarations, VarComparability declarations, and ListImplementors declarations.
The format of a program point declaration is:
DECLARE
program-point-name
varname1
declared-type1 [# auxiliary-information1]
representation-type1 [= constant-value1]
comparable1
varname2
declared-type2 [# auxiliary-information2]
representation-type2 [= constant-value2]
comparable2
...
Program point information includes:
The declared type of each variable. Standard types have conventional names (int, boolean, etc.), but names for user-defined or language-specific types can be arbitrary strings.
hasDuplicates
hasOrder
hasNull
nullTerminated
isParam
First, invariants over a parameter p in its post-state form are not printed. Second, invariants that use fields of p (such as p.x) are printed only if p has not changed. Lastly, some immutable characteristics, such as the size of arrays and data types, are not printed (both can be changed if p is changed, but then p would no longer be interesting).
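The representation type of each variable, which may differ from the declared type. For instance, the declared type might be char[] but the representation type might be java.lang.String. Or, the declared type might be Object but the representation type might be hashcode, if the address of the object is written to the data trace file.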
The representation type should be one of boolean, int, hashcode, double, or java.lang.String; or an array of one of those (indicated by a [] suffix, as in Java). Hashcodes are treated like integers, except that their actual values are considered uninteresting for the purposes of output; they are intended for unique object identifiers like memory addresses or the return value of Java's Object.hashCode method.
The representation type may optionally be followed by an equals sign and a value; in that case, the variable is known to have a compile-time constant value and should be omitted from the data trace file.
The point of comparability is that Daikon should not compare unrelated quantities. For example, each person's height in centimeters may always be less than their birth year, but it is not helpful for Daikon to output ‘height < birthyear’, because they are measuring incomparable quantities. (In this case, the variables use different units of measurement.)
Variable comparability information helps Daikon to avoid computing information over unrelated variables. This saves time and (more importantly) improves the quality of Daikon's output. For more details, see the paper “Quickly detecting relevant program invariants”.
Variable comparability information may be obtained dynamically (see Dynamic abstract type inference (DynComp)), via type-inference based analysis, or in some other manner. In any event, Daikon reads it from the variable declarations.
A comparability for a non-array type is a signed integer. Two variables at the same program point are considered comparable if both integers are the same, or if either integer is negative. A comparability for an array type must contain an integer for each index and for the contents; for instance, ‘5[22][17]’ for a two-dimensional array. Comparisons succeed if comparisons over each component succeed.
Regardless of comparability, variables at different program points are never compared to one another. Use of the same comparability integer at different program points does not indicate any relationship between the variables, and a given variable may have a different comparability integer at different program points.
As an example, in the following code:
int sum(int len, int[] a) {
  int sum = 0;
  for (int i = 0; i < len; i++)
    sum += a[i];
  return sum;
}
variables i and len are comparable to one another (and to indices of array a). Furthermore, the result is comparable to the elements of array a. A declaration file for these variables might look like
len
int
int
5
a
int[]
int[]
8[5]
return
int
int
8
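The comparability rule can be stated compactly in code. The sketch below is illustrative only: ComparabilitySketch is a hypothetical class, not Daikon's implementation, and it assumes that keys with different numbers of components (for example, a scalar and an array) are not comparable, a case the text above does not address.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of comparability keys; not Daikon's implementation. */
final class ComparabilitySketch {
  /** Splits a key such as "8[5]" into [8, 5]: contents first, then one integer per index. */
  static List<Integer> parseKey(String key) {
    List<Integer> parts = new ArrayList<>();
    int bracket = key.indexOf('[');
    String base = bracket < 0 ? key : key.substring(0, bracket);
    parts.add(Integer.parseInt(base.trim()));
    int pos = bracket;
    while (pos >= 0 && pos < key.length()) {
      int close = key.indexOf(']', pos);
      if (key.charAt(pos) != '[' || close < 0) {
        throw new IllegalArgumentException("malformed key: " + key);
      }
      parts.add(Integer.parseInt(key.substring(pos + 1, close).trim()));
      pos = close + 1;
    }
    return parts;
  }

  /** Two integers are comparable if they are equal or if either is negative. */
  static boolean scalarsComparable(int a, int b) {
    return a == b || a < 0 || b < 0;
  }

  /** Keys are comparable if every corresponding pair of components is comparable. */
  static boolean comparable(String key1, String key2) {
    List<Integer> p1 = parseKey(key1);
    List<Integer> p2 = parseKey(key2);
    if (p1.size() != p2.size()) {
      return false; // assumption: differing dimensionality is never comparable
    }
    for (int i = 0; i < p1.size(); i++) {
      if (!scalarsComparable(p1.get(i), p2.get(i))) {
        return false;
      }
    }
    return true;
  }
}
```

With the keys from the example, the contents component of ‘8[5]’ is 8, which matches the key of return, and its index component is 5, which matches the key of len.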
Instrumenting code creates a .decls file that contains program point names such as:
DataStructures.StackAr.push(java.lang.Object):::ENTER
DataStructures.StackAr.push(java.lang.Object):::EXIT99
PolyCalc.RatNum.RatNum(int, int):::ENTER
PolyCalc.RatNum.RatNum(int, int):::EXIT55
PolyCalc.RatNum.RatNum(int, int):::EXIT67
This section describes the format of these program point names. Someone writing an instrumenter for a new language must be sure to follow this format specification.
A program point name is a string with no tabs or newlines in it. The basic format is ‘topLevel.bottomLevel:::pptInfo’.
For the first example given above, the top level of the hierarchy would be DataStructures.StackAr, the bottom level would be push(java.lang.Object), and the program point information would be ENTER.
topLevel may contain any number of periods (‘.’). bottomLevel and pptInfo may not contain any periods. The string ‘:::’ may only appear once.
topLevel and pptInfo are required (i.e., they must be non-empty), as are the period to the right of topLevel and the colons to the left of pptInfo. However, bottomLevel is optional.
By convention, for Java topLevel consists of the class name, and bottomLevel consists of the method name and method signature.
For C, topLevel consists of a filename (or a single period for global functions), and bottomLevel could consist of a function name and signature. More precisely, names of C program points follow these conventions:
For IOA, topLevel consists of an Automaton name and bottomLevel consists of information for a transition state.
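An instrumenter author might check generated names with a splitter like the one below. This is a sketch, not Daikon's parser: PptNameSketch is a hypothetical class, and the rule it uses to separate topLevel from bottomLevel (split at the last period before the first open parenthesis, and treat a name without parentheses as having an empty bottomLevel) is an assumption chosen to fit the Java examples above.

```java
/** Hypothetical splitter for program point names; not Daikon's parser. */
final class PptNameSketch {
  final String topLevel;
  final String bottomLevel; // empty when absent, as in StackAr:::OBJECT
  final String pptInfo;

  private PptNameSketch(String topLevel, String bottomLevel, String pptInfo) {
    this.topLevel = topLevel;
    this.bottomLevel = bottomLevel;
    this.pptInfo = pptInfo;
  }

  static PptNameSketch parse(String name) {
    if (name.indexOf('\t') >= 0 || name.indexOf('\n') >= 0) {
      throw new IllegalArgumentException("tabs and newlines are not allowed");
    }
    int sep = name.indexOf(":::");
    if (sep < 0 || sep != name.lastIndexOf(":::")) {
      throw new IllegalArgumentException("':::' must appear exactly once");
    }
    String left = name.substring(0, sep);
    String info = name.substring(sep + 3);
    if (info.isEmpty()) {
      throw new IllegalArgumentException("pptInfo is required");
    }
    String top;
    String bottom;
    int paren = left.indexOf('(');
    if (paren < 0) {
      // Assumption: no signature means there is no bottomLevel.
      top = left;
      bottom = "";
    } else {
      // Assumption: bottomLevel starts after the last period before the signature.
      int dot = left.lastIndexOf('.', paren);
      if (dot < 0) {
        throw new IllegalArgumentException("missing period after topLevel");
      }
      top = left.substring(0, dot);
      bottom = left.substring(dot + 1);
    }
    if (top.isEmpty()) {
      throw new IllegalArgumentException("topLevel is required");
    }
    return new PptNameSketch(top, bottom, info);
  }
}
```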
By convention, the entry and exit points for a function have names of a special form so that they can be associated with one another. (Currently, those names end with :::ENTER and :::EXIT.) This convention permits Daikon to generate pre-state variables (see Variable names) automatically at procedure exit points, so front ends need not output them explicitly. When there are multiple exit points, each one should be suffixed by a number (such as a line number; for example, foo:::EXIT22). Daikon produces the main (non-numbered) :::EXIT point automatically. All the numbered exits should contain the same set of variables; in general, this means that local variables are not included at exit points.
Daikon currently requires that declarations for :::ENTER program points appear before any declarations for matching :::EXIT program points.
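The ordering requirement is easy to check before handing a declarations file to Daikon. The following sketch assumes the program point names have already been collected in file order; EnterExitOrderSketch and its methods are hypothetical names, not part of Daikon.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical check that each :::EXIT declaration follows its :::ENTER. */
final class EnterExitOrderSketch {
  private static final String EXIT = ":::EXIT";

  /** Returns the matching :::ENTER name, or null if pptName is not an exit point. */
  static String matchingEnter(String pptName) {
    int sep = pptName.indexOf(EXIT);
    if (sep < 0) {
      return null;
    }
    // An exit name ends with :::EXIT optionally followed by digits.
    String suffix = pptName.substring(sep + EXIT.length());
    for (char c : suffix.toCharArray()) {
      if (!Character.isDigit(c)) {
        return null;
      }
    }
    return pptName.substring(0, sep) + ":::ENTER";
  }

  /** Returns the first exit point declared before its entry, or null if none. */
  static String firstMisorderedExit(List<String> declaredNames) {
    Set<String> seen = new HashSet<>();
    for (String name : declaredNames) {
      String enter = matchingEnter(name);
      if (enter != null && !seen.contains(enter)) {
        return name;
      }
      seen.add(name);
    }
    return null;
  }
}
```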
Another convention is to have another program point whose bottomLevel is empty and whose pptInfo is OBJECT: for example, StackAr:::OBJECT. This contains the representation invariant (sometimes called the object invariant) of a class. This program point is created automatically by Daikon; it need not appear in a trace file.
There is a special VarComparability declaration that controls how the comparability field in program point declarations is interpreted. The default VarComparability is implicit, which means ordinary comparability as described in Program point declarations. (The name implicit is retained for historical reasons.) You can override it as follows:
VarComparability none
As with all records in Daikon input files, a blank line is required between this record and the next one.
This declaration indicates classes that implement the java.util.List interface, and should be treated as sequences for the purposes of invariant detection. The syntax is as follows:
ListImplementors <classname1> <classname2> ...
Each classname is in Java format (for example, “java.util.LinkedList”).
The ‘--list_type’ command-line option to Daikon can also be used to specify classes that implement lists; see Options to control invariant detection.