Next: Debugging Daikon, Previous: Introduction, Up: Top [Contents][Index]
This chapter describes how to customize or modify Daikon.
To compile Daikon, type make
in $DAIKONDIR/java/ or any of its
subdirectories. Alternately, type make -C $DAIKONDIR compile
.
To create the daikon.jar file, type make -C $DAIKONDIR daikon.jar
.
The distribution includes daikon.jar and compiled .class files, so you do not need to compile them yourself unless you make changes.
For more information about compiling Daikon, see the comments in the Makefiles.
• Requirements for compiling Daikon |
Up: Compiling Daikon [Contents][Index]
Before compiling Daikon, you need to install some dependencies (that is, software used to build Daikon). For Rocky Linux and Ubuntu, you can find the commands to install these dependencies in file $DAIKONDIR/scripts/Dockerfile-OSNAME-jdkN-plus. For other operating systems, use similar commands.
Note that Kvasir, the Daikon front end for the C language (see Kvasir in Daikon User Manual), does not work on Mac OS.
The Daikon build process uses the C preprocessor
cpp
to convert each .jpp file in the
distribution into multiple .java files, which are then compiled.
If you have a C compiler, you almost certainly have cpp
.
Next: Using Eclipse, Previous: Compiling Daikon, Up: Extending Daikon [Contents][Index]
The Daikon git repository is located on GitHub; see https://github.com/codespecs/daikon/.
After making a local clone, see Compiling Daikon, for instructions on how to compile Daikon.
Next: New invariants, Previous: Source code (version control repository), Up: Extending Daikon [Contents][Index]
[To be improved.]
Here is one way to use Eclipse to edit Daikon.
First, make sure that Daikon builds cleanly from the command line.
File > Import > General > Existing Projects into Workspace
Choose the java directory of your Daikon checkout
Project > properties > Java build path:
libraries : add external jars
everything in the lib/ directory,
plus also the tools.jar file in the lib/ directory of your JDK.
(I’m not sure why, but add jars
doesn’t show all .jar
files in the directory.)
Source: add Daikon, remove Daikon/src. Default output folder: change from Daikon/bin to Daikon.
Next: New derived variables, Previous: Using Eclipse, Up: Extending Daikon [Contents][Index]
You can easily write your own invariants and have Daikon check them, in addition to all the other invariants that are already part of Daikon. Adding a new invariant to Daikon requires writing one Java class, as explained below.
Daikon.setup_proto_invs
to
call the new invariant’s get_proto
method and recompile Daikon.
The file java/daikon/inv/unary/scalar/Positive.java in the
Daikon distribution contains a sample invariant. This invariant is
true if the variable is always positive (greater than zero). This
invariant is subsumed by other invariants in the system; it is provided
only as a pedagogical example. To enable the invariant, comment out the
appropriate line in Daikon.setup_proto_invs()
, then recompile
Daikon.
A Java class defining an invariant is a concrete subclass of one of the direct subclasses of
UnaryInvariant
,
BinaryInvariant
,
or
TernaryInvariant
.
A complete list of invariants appears in the body of
Daikon.setup_proto_invs()
.
Daikon’s invariants are first instantiated, then are presented samples (tuples of values for all the variables of interest to the invariant; this might be a 1-tuple, a 2-tuple, or a 3-tuple) in turn. If any sample falsifies the invariant, the invariant destroys itself. All remaining invariants at the end of the program run can be queried for their statistical confidence, then reported as likely to be true.
You need to implement the abstract methods of Invariant
that are not defined in one of the subclasses listed above. You also need to define a constructor and a static method:
protected InvName(PptSlice ppt)
Constructor for class InvName. Should only be called from
instantiate_dyn
.
Its typical implementation is
super(ppt);
public static InvName get_proto()
Returns the prototype invariant used to create other invariants. Its typical implementation is
if (proto == null) proto = new InvName(null); return (proto);
Methods that need to be overridden that are defined in a subclass of ‘Invariant’ include:
public InvariantStatus check_modified(..., int count)
public InvariantStatus add_modified(..., int count)
Determines whether the invariant is true for a sample (a tuple of values).
You will eventually want to override one or more of these methods (see New formatting for invariants):
public String format()
public String repr()
public String format_using(OutputFormat format)
Returns a high-level printed representation of the invariant, for user output.
Next: New formatting for invariants, Previous: New invariants, Up: Extending Daikon [Contents][Index]
A derived variable is an expression that does not appear in the source code as a variable, but that Daikon treats as a variable for purposes of invariant detection. For instance, if there exists an array ‘a’ and an integer ‘i’, then Daikon introduces the derived variable ‘a[i]’. This permits detection of invariants over this quantity.
(Describing how to create new variety of derived variable is still to be written. For now, see the derived variables that appear in the Java files in directory $DAIKONDIR/java/daikon/derive/.)
Next: New front ends, Previous: New derived variables, Up: Extending Daikon [Contents][Index]
Daikon can print invariants in multiple formats (see Invariant syntax in Daikon User Manual).
To support a new output format, you need to do two things:
daikon.inv.Invariant.OutputFormat
, add a new static final
field and also update the get
method.
Invariant
, edit the
format_using
method to handle the new OutputFormat
.
Next: New suppressors, Previous: New formatting for invariants, Up: Extending Daikon [Contents][Index]
A front end for Daikon converts data into a form Daikon can process, producing files in Daikon’s input format — data trace declarations and records. For more information about these files, see File formats.
The data traces can be obtained from any source. For instance, front ends have been built for stock data, weather forecasts, truck weight data, and spreadsheet data (see convertcsv.pl in Daikon User Manual), among others. More often, users apply a programming language front end (also called an instrumenter) to a program, causing executions of the program to write files in Daikon’s format. (For information about existing front ends, see Front ends and instrumentation in Daikon User Manual.) When a general front end is not available, it is possible to manually instrument a specific program so that it writes files in Daikon’s format. The resulting instrumented program is very similar to what an instrumenter would have created, so this section is relevant to both approaches.
Conceptually, a front end instrumenter has two tasks. Suppose you want to
infer invariants at a program point (say, a line of code or the entry or
exit from a procedure). The front end must create a
declaration (see Declarations) that
lists the variables in scope at that program point. Every time that
program point is executed, the program must output a data trace record
(see Data trace records). A front end can make the program output a
data trace record by inserting a printf
(or similar) statement that
outputs the current
values of the variables of interest.
• Example instrumented Java program | ||
• Instrumenting C programs |
Next: Instrumenting C programs, Up: New front ends [Contents][Index]
This section gives an example of how an instrumenter for Java might work; other languages are analogous. Suppose we wish to instrument file Example.java.
class Example { // Return either the square of x or the square of (x+1). int squar(int x, boolean b) { if (b) x++; return x*x; } }
The .decls file might look like the following.
DECLARE Example.squar:::ENTER x int int 1 b boolean int 2 DECLARE Example.squar:::EXIT x int int 1 b boolean int 2 return int int 1
The instrumented .java file might look like the following. This example does not compute the “modified bits”, but simply sets them all to 1, which is a safe default.
class Example { static { daikon.chicory.Runtime.setDtraceMaybe("daikon-output/StackAr.dtrace"); } // Return either the square of x or the square of (x+1). int squar(int x, boolean b) { synchronized (daikon.chicory.Runtime.dtrace) { daikon.chicory.Runtime.dtrace.println(); daikon.chicory.Runtime.dtrace.println("Example.squar:::ENTER"); daikon.chicory.Runtime.dtrace.println("x"); daikon.chicory.Runtime.dtrace.println(x); daikon.chicory.Runtime.dtrace.println(1); // modified bit daikon.chicory.Runtime.dtrace.println("b"); daikon.chicory.Runtime.dtrace.println(b ? 1 : 0); daikon.chicory.Runtime.dtrace.println(1); // modified bit } if (b) x++; int daikon_return_value = x*x; synchronized (daikon.chicory.Runtime.dtrace) { daikon.chicory.Runtime.dtrace.println(); daikon.chicory.Runtime.dtrace.println("Example.squar:::EXIT"); daikon.chicory.Runtime.dtrace.println("x"); daikon.chicory.Runtime.dtrace.println(x); daikon.chicory.Runtime.dtrace.println(1); // modified bit daikon.chicory.Runtime.dtrace.println("b"); daikon.chicory.Runtime.dtrace.println(b ? 1 : 0); daikon.chicory.Runtime.dtrace.println(1); // modified bit daikon.chicory.Runtime.dtrace.println("return"); daikon.chicory.Runtime.dtrace.println(daikon_return_value); daikon.chicory.Runtime.dtrace.println(1); // modified bit } return daikon_return_value; } }
(Daikon’s Java front end, Chicory, does not actually insert instrumentation into the Java source code of your program. Rather, it instruments the bytecode as it is loaded into the JVM. This is more efficient, and it avoids making any changes to your .java or .class files. We have shown an example of Java source code instrumentation because that is simpler to explain and understand than the bytecode instrumentation.)
Previous: Example instrumented Java program, Up: New front ends [Contents][Index]
Daikon comes with a front end for the C language: Kvasir (see Kvasir in Daikon User Manual). Kvasir only works under the Linux operating system, and it works only on “x86” (Intel 386, 486, 586, 686 compatible) and “x86-64” (AMD64, EM64T compatible) processors.
You may wish to infer invariants over C programs running on other platforms; for instance, you want a robust C front end that works under Microsoft Windows. This section will help you to either write such a front end or to hand-instrument your program to produce output that Daikon can process.
We welcome additions and corrections to this part of the manual. And, if you write a C instrumenter that might be of use to others, please contribute it back to the Daikon project.
A front end for C (or any other language) performs two tasks. It determines the names of all variables that are in scope at a particular program point, and it prints the values of those variables each time the program point executes.
Determining the names of the variables is straightforward. It requires either parsing source code or parsing a compiled executable. In the latter case, the variables can be determined from debugging information that the compiler places in the executable.
The challenge for C programs is determining the values of variables at execution time: for each variable, the front end must determine whether the variable’s value is valid, and how big the value is.
A front end should print only variables that have valid values. Examples of invalid values are variables that have not yet been initialized and pointers whose content has been deallocated. (A pointer dereference, such as ‘*p’ or ‘p->field’, can itself be to uninitialized and/or deallocated memory.) Invalid values should be printed as ‘nonsensical’ (see Data trace records).
It is desirable to print ‘nonsensical’ rather than an invalid value, for two reasons. First, outputting nonsense values can degrade invariant detection; patterns in the valid data may be masked by noise from invalid values. Second, an attempt to access an invalid value can cause the instrumented program to crash! For instance, suppose that pointer ‘p’ is not yet initialized — the pointer value refers to some arbitrary location in memory, possibly even an address that the operating system has not allocated to the program. An attempt to print the value of ‘*p’ or ‘p->field’ will result in a segmentation fault when ‘*p’ is accessed. (If you choose never to dereference a pointer while performing instrumentation, then you do not need to worry about invalid references. However, you will be unable to output any fields of a pointer to a struct or class, making your front end less useful. You will still be able to output fields of a regular variable to a struct or class, but most interesting uses of structs and classes in C and C++ are through pointers.)
C relies on the programmer to remember which variables are valid, and the programmer must take care never to access invalid variables. Unfortunately, there is no simple automatic way to determine variable validity for an arbitrary C program. (Languages with automatic memory management, such as Java, do not pose these problems. All variables always have an initial value, so there is no danger of printing uninitialized memory, though the initial value may not be particularly meaningful. Because pointed-to memory is never deallocated, all non-null pointers are always valid, so there is no danger of a segmentation fault.)
An instrumenter needs information about validity of variable values. This could be obtained from the programmer (which requires work on the part of the user of Daikon), or obtained automatically by creating a new run-time system that tracks the information (which requires a more sophisticated front end).
In addition to determining which variables are uninitialized and which pointers are to allocated memory, there are additional problems for a C front end. For example, given a char pointer ‘*c’, does it point to a single character, or to an array of characters? If it points to an array of characters, how big is that array? And for each element of the array, is that element initialized or not?
The problem of tracking C memory may seem daunting, but it is not insurmountable. There exist many tools for detecting or debugging memory errors in C, and they need to perform exactly the same memory tracking as a Daikon front end must perform. Therefore, a Daikon front end can use the same well-known techniques, and possibly can even be built on top of such a tool. For instance, one C front end, named Kvasir, is built on top of the Valgrind tool (https://valgrind.org/), greatly reducing the implementation effort. Valgrind only works under Linux, but a C front end for another platform could build on a similar tool; many other such tools exist.
There are two basic approaches to instrumenting a C program (or a program in any other language): instrument the source code, or instrument a compiled binary representation of the program. In each case, additional code that tracks all memory allocations, deallocations, writes, and reads must be executed at run time. Which approach is most appropriate for you depends on what tools you use when building your C instrumentation system.
In some cases, it may not be necessary to build a fully general C instrumentation system. You may be able to craft a smaller, simpler extension to an existing program — enabling that program (only) to produce files for Daikon to analyze.
For instance, many programs use specialized memory allocation routines
(customized versions of malloc
and free
), in order to
prevent or detect memory errors. The information that such libraries
collect is often sufficient to determine which variable values should be
printed, and which should be suppressed in favor of printing
‘nonsensical’ instead.
The presence of memory errors — even in a program that appears to run correctly — makes it much harder to create Daikon’s output. Therefore, as a prerequisite to instrumenting a C program, it is usually a good idea to run a memory checker on that program and to eliminate any memory errors.
Next: Reading dtrace files, Previous: New front ends, Up: Extending Daikon [Contents][Index]
As mentioned in Daikon internals, one way to make Daikon more
efficient, and to reduce clutter in the output to the user, is to reduce the
number of redundant invariants of various kinds. This section describes
how to add a new suppressor relation, such that if invariant A implies
B, B is not instantiated or checked as long as A holds, saving time and
space. Suppression implications use some terminology. A
suppressor (defined in the class NISuppressor
) is one of a
set of invariants (NISuppression
) that imply and suppress a
suppressee invariant (NISuppressee
). The set of all of
the suppressions that suppress a particular suppressee is stored in the
class NISuppressionSet
.
Adding a new suppression is straightforward when the invariants involved
do not have any state. Define the suppressee and
each of the suppressions that suppress it using the corresponding
constructors. Add the method get_ni_suppressions
to the class
of the invariant being suppressed and return the appropriate
suppression set. Make sure that get_ni_suppressions
always
returns the same suppression set (i.e., that storage to store
the suppressions is only allocated once). Normally this is
done by defining a static variable to hold the suppression sets
and initializing this variable the first time that get_ni_suppressions
is called.
The following example defines suppressions for ‘x == y’ implies ‘x >= y’ and ‘x > y’ implies ‘x >= y’.
private static NISuppressionSet suppressions = null; public NISuppressionSet get_ni_suppressions() { if (suppressions == null) { NISuppressee = new NISuppressee(IntGreaterEqual); NISuppressor v1_eq_v2 = new NISuppressor(0, 1, IntEqual.class); NISuppressor v1_lt_v2 = new NISuppressor(0, 1, IntLessThan.class); suppressions = new NISuppressionSet(new NISuppression[] { new NISuppression(v1_eq_v2, suppressee), new NISuppression(v1_lt_v2, suppressee), }); } return suppressions; }
For suppressions depending on the state of a particular invariant,
each Invariant
has an isObviousDynamically(VarInfo[] vis)
method that is called once the state of other invariants has already
been determined. This method returns a non-null value if this invariant
is implied by a fact that can be derived from the given VarInfo
s.
For example, suppose division was not defined for divisors smaller than 1. The following example defines an obvious check for ‘x <= c’ (where c < 1 is a constant) implies ‘y % x == 0’, written in the Divides class.
public DiscardInfo isObviousDynamically(VarInfo[] vis) { DiscardInfo di = super.isObviousDynamically(vis); if(di != null) { return di; } VarInfo var1 = vis[0]; PptSlice1 ppt_over1 = ppt.parent.findSlice(var1); if(ppt_over1 == null) { return null; } for(Invariant inv : ppt_over1.invs) { if(inv instanceof UpperBound) { if(((UpperBound) inv).max() < 1) { return new DiscardInfo(this, DiscardCode.obvious, ‘Divides is obvious when divisor less than one’); } } } return null; }
Next: System.exit, Previous: New suppressors, Up: Extending Daikon [Contents][Index]
If you wish to write a program that manipulates a .dtrace file, you can use Daikon’s built-in mechanisms for parsing .dtrace files. (This is easier and less error-prone than writing your own parser.)
You will write a subclass of FileIO.Processor
, then pass an instance
of that class to FileIO.read_data_trace_files
. Daikon will parse
each record in the trace files that you indicate, then will pass the parsed
version to methods in your processor.
For a simple example of how to use FileIO.Processor
, see the file
daikon/java/daikon/tools/ReadTrace.java.
Previous: Reading dtrace files, Up: Extending Daikon [Contents][Index]
The Daikon codebase does not call System.exit()
, except in a
dummy main method that catches TerminationMessage
, which is the
standard way that a component of Daikon requests the JVM to shut down.
The reason for this is that calling System.exit()
is usually a
bad idea. It makes the class unusable as a subroutine, because it
might kill the calling program. It can cause deadlock. And it can
leave data in an inconsistent state (for example, if the program was in
the middle of writing a file, still held non-Java locks, etc.), because
the program has no good way of completing any actions that it was in the
middle of. Therefore, it is better to throw an exception and let the
program handle it appropriately. (This is true of instrumentation code
as well.)
To see the stack trace for a TerminationMessage
, pass
--config_option daikon.Debug.show_stack_trace=true on the command
line.
Previous: Reading dtrace files, Up: Extending Daikon [Contents][Index]