Next: , Previous: Introduction, Up: Top   [Contents][Index]

2 Extending Daikon

This chapter describes how to customize or modify Daikon.


Next: , Up: Extending Daikon   [Contents][Index]

2.1 Compiling Daikon

To compile Daikon, type make in $DAIKONDIR/java/ or any of its subdirectories. Alternately, type make -C $DAIKONDIR compile. To create the daikon.jar file, type make -C $DAIKONDIR daikon.jar.

The distribution includes daikon.jar and compiled .class files, so you do not need to compile them yourself unless you make changes.

For more information about compiling Daikon, see the comments in the Makefiles.

When you compile Daikon, environment variables DAIKONDIR and JAVA_HOME should be set. This is already done if you source the daikon.bashrc, as recommended in the installation instructions (see Complete installation in Daikon User Manual). Thus, a complete .bashrc or .bash_profile shell setup file that would enable you to compile Daikon might look like the following:

export DAIKONDIR=$HOME/daikon
export JAVA_HOME=/usr/java/current
source $DAIKONDIR/scripts/daikon.bashrc
export CLASSPATH=.:$CLASSPATH

When setting up environment pathname variables under Windows/Cygwin (such as JAVA_HOME) make sure that the pathname is specified in Linux format (e.g., /cygdrive/c/daikon rather than C:\daikon). However, the CLASSPATH environment variable is a special case; as java is a Windows program, it expects CLASSPATH to be in Windows format. This is the only exception to Cygwin looking like Linux. Thus CLASSPATH must always be in Windows format; i.e., it should use semicolons (;) rather than colons (:) as a path separator. Using the wrong filename format or the wrong pathname separator is a common source of errors that will result in odd error messages and build failures.

Please note this applies to a -CLASSPATH argument to java as well: it must have Windows pathnames. One easy way to deal with this is to use the cygpath utility:

java -cp "`cygpath -wp linux/path/a:linux/path/b`" <other arguments>

Note both the back quotes (‘`’) to force cygpath to be evaluated before calling java and the double quotes (‘"’) to protect against special characters and spaces in Windows paths.


Up: Compiling Daikon   [Contents][Index]

2.1.1 Requirements for compiling Daikon

Daikon is written in Java. In order to compile Daikon, you need a Java compiler such as ‘javac’ on your path. To override the default Java compiler (‘javac’), create a Makefile.user file in the daikon/ directory and add a line like the following.

JAVAC ?= jikes -g +E +F

In order to compile Daikon, you need the C preprocessor cpp, which is used to convert each .jpp file in the distribution into multiple .java files, which are then compiled. If you have a C compiler, you almost certainly have cpp. If you do not have cpp (or gcc, which can emulate cpp via gcc -E), you may run make avoid-jpp, in which case changes to .jpp files will not be reflected in the .java files or the compiled .class files. (The purpose of the .jpp files is to avoid code duplication by placing common code in a single file, then generating other files that need to include that common code.)

To make the documentation (via make -C $DAIKONDIR/doc), you need makeinfo version 4.7 or newer.

To compile Daikon on Windows, the best approach is to install the Cygwin toolset (available at http://cygwin.com/), which contains everything you need to compile and run Linux programs under Windows. You can install Cygwin by running one of the setup programs (based on your machine type) found at http://cygwin.com/install.html.


Next: , Previous: Compiling Daikon, Up: Extending Daikon   [Contents][Index]

2.2 Version control repository

The Daikon git repository is located on GitHub; see https://github.com/codespecs/daikon/.

See Compiling Daikon, for instructions for compiling Daikon.


Next: , Previous: Version control repository, Up: Extending Daikon   [Contents][Index]

2.3 Using Eclipse

[To be improved.]

Here is one way to use Eclipse to edit Daikon.

First, make sure that Daikon builds cleanly from the command line.

File > Import > General > Existing Projects into Workspace

Choose the java directory of your Daikon checkout

Project > properties > Java build path: libraries : add external jars everything in the lib/ directory, plus also the tools.jar file in the lib/ directory of your JDK. (I’m not sure why, but add jars doesn’t show all .jar files in the directory.)

Source: add Daikon, remove Daikon/src. Default output folder: change from Daikon/bin to Daikon.


Next: , Previous: Using Eclipse, Up: Extending Daikon   [Contents][Index]

2.4 New invariants

You can easily write your own invariants and have Daikon check them, in addition to all the other invariants that are already part of Daikon. Adding a new invariant to Daikon requires writing one Java class, adding a line to another file to inform Daikon of the new class, and recompiling Daikon.

The file java/daikon/inv/unary/scalar/Positive.java in the Daikon distribution contains a sample invariant. This invariant is true if the variable is always positive (greater than zero). This invariant is subsumed by other invariants in the system; it is provided only as a pedagogical example. To enable the invariant, comment out the appropriate line in Daikon.setup_proto_invs(), then recompile Daikon.

A Java class defining an invariant is a concrete subclass of one of the following abstract classes:

SingleScalar

Invariants over a single numeric (scalar) variable, such as ‘x != 0’.

TwoScalar

Invariants over two numeric variables, such as ‘y = abs(x)’.

ThreeScalar

Invariants over three numeric variables, such as ‘z = ax + by + c’.

SingleSequence

Invariants over one sequence (array) variable, such as ‘a[] contains no duplicates’.

TwoSequence

Invariants over two sequences, such as ‘a[] is a subsequence of b[]’.

SequenceScalar

Invariants over a scalar and a sequence, such as ‘x is a member of a[]’.

A complete list of invariants appears in Daikon.setup_proto_invs().

Daikon’s invariants are first instantiated, then are presented samples (tuples of values for all the variables of interest to the invariant; this might be a 1-tuple, a 2-tuple, or a 3-tuple) in turn. If any sample falsifies the invariant, the invariant destroys itself. All remaining invariants at the end of the program run can be reported as likely to be true.

The key methods of the new invariant class InvName are

protected InvName(PptSlice ppt)

Constructor for class InvName. Should only be called from instantiate_dyn. Its typical implementation is

super(ppt);
protected Invariant instantiate_dyn (PptSlice slice)

Returns an invariant of this class on the specified slice. Its implementation is almost always

return new InvName(slice);
public static InvName get_proto()

Returns the prototype invariant used to create other invariants. Its typical implementation is

if (proto == null)
  proto = new InvName (null);
return (proto);
public boolean enabled()

Returns whether or not this invariant is enabled. Its implementation is almost always

return dkconfig_enabled;
public boolean instantiate_ok (VarInfo[] vis)

Returns whether or not it makes sense to instantiate this invariant over the specified variables. If not present, the invariant is created over all comparable variables of the correct types.

public InvariantStatus check_modified (..., int count)

Presents a tuple of values to the invariant; these are the first N arguments, which have appropriate types for the particular invariant. The count argument indicates how many samples have this value. For example, three calls to check_modified with a count parameter of 1 is equivalent to one call to check_modified with a count parameter of 3. Returns whether or not the sample is consistent with the invariant. Does not change the state of the invariant.

public InvariantStatus add_modified (..., int count)

Similar to check_modified except that it can change the state of the invariant if necessary. If the invariant doesn’t have any state, then simply calls check_modified.

protected double computeConfidence ()

Returns the probability that the observed data could not have happened by chance alone. The result usually falls between 0 and 1, where 0 means the values seen so far certainly happened by chance and 1 means they could never have happened by chance. The method may also return one of the following constants in the Invariant class (see the class for documentation): CONFIDENCE_JUSTIFIED, CONFIDENCE_UNJUSTIFIED, CONFIDENCE_NEVER.

For example, suppose your new invariant has a 50% chance of being true by chance for each sample. (‘x is even’ is an example of such an invariant.) Then a reasonable body for computeConfidence would be

return 1 - Math.pow(.5, ppt.num_samples());

If 5 values had been seen, then this implementation would return 31/32, which is the likelihood that all 5 values seen so far were even not purely by chance. An invariant is printed only if its probability of not occurring by chance alone is large enough (by default, greater than .99; see Daikon’s --conf_limit command-line option.

public String format ()
public String repr ()
public String format_using (OutputFormat format)

Returns a high-level printed representation of the invariant, for user output. format produces normal output, while the repr formatting routine produces low-level, detailed output for debugging. When first writing an invariant, you can make repr and format_using simply call format, then fix up the implementations for the different output formats later as needed. See also New formatting for invariants.

After the invariant is written, add a call to its get_proto method in the Daikon.setup_proto_invs method.


Next: , Previous: New invariants, Up: Extending Daikon   [Contents][Index]

2.5 New derived variables

A derived variable is an expression that does not appear in the source code as a variable, but that Daikon treats as a variable for purposes of invariant detection. For instance, if there exists an array ‘a’ and an integer ‘i’, then Daikon introduces the derived variable ‘a[i]’. This permits detection of invariants over this quantity.

(Describing how to create new variety of derived variable is still to be written. For now, see the derived variables that appear in the Java files in directory $DAIKONDIR/java/daikon/derive/.)


Next: , Previous: New derived variables, Up: Extending Daikon   [Contents][Index]

2.6 New formatting for invariants

Daikon can print invariants in multiple formats (see Invariant syntax in Daikon User Manual).

To support a new output format, you need to do two things:


Next: , Previous: New formatting for invariants, Up: Extending Daikon   [Contents][Index]

2.7 New front ends

A front end for Daikon converts data into a form Daikon can process, producing files in Daikon’s input format — data trace declarations and records. For more information about these files, see File formats.

The data traces can be obtained from any source. For instance, front ends have been built for stock data, weather forecasts, truck weight data, and spreadsheet data (see convertcsv.pl in Daikon User Manual), among others. More often, users apply a programming language front end (also called an instrumenter) to a program, causing executions of the program to write files in Daikon’s format. (For information about existing front ends, see Front ends (instrumentation) in Daikon User Manual.) When a general front end is not available, it is possible to manually instrument a specific program so that it writes files in Daikon’s format. The resulting instrumented program is very similar to what an instrumenter would have created, so this section is relevant to both approaches.

Conceptually, instrumentation is very simple. For each program point (say, a line of code or the entry or exit from a procedure) at which you wish to detect invariants, the front end must arrange to create a declaration (see Declarations) that lists the variables in scope at that program point, and the front end must arrange that execution creates a data trace record (see Data trace records) for each execution of that program point. Conventionally, the way to create a data trace record is to insert a printf (or similar) statement that outputs the current values of the variables of interest.


Next: , Up: New front ends   [Contents][Index]

2.7.1 Example instrumented Java program

This section gives an example of how an instrumenter for Java might work; other languages are analogous. Suppose we wish to instrument file Example.java.

class Example {
  // Return either the square of x or the square of (x+1).
  int squar(int x, boolean b) {
    if (b)
      x++;
    return x*x;
  }
}

The .decls file might look like the following.

DECLARE
Example.squar:::ENTER
x
int
int
1
b
boolean
int
2

DECLARE
Example.squar:::EXIT
x
int
int
1
b
boolean
int
2
return
int
int
1

The instrumented .java file might look like the following. This example does not compute the “modified bits”, but simply sets them all to 1, which is a safe default.

class Example {
  static {
    daikon.Runtime.setDtraceMaybe("daikon-output/StackAr.dtrace");
  }

  // Return either the square of x or the square of (x+1).
  int squar(int x, boolean b) {
    synchronized (daikon.Runtime.dtrace) {
      daikon.Runtime.dtrace.println();
      daikon.Runtime.dtrace.println("Example.squar:::ENTER");
      daikon.Runtime.dtrace.println("x");
      daikon.Runtime.dtrace.println(x);
      daikon.Runtime.dtrace.println(1);  // modified bit
      daikon.Runtime.dtrace.println("b");
      daikon.Runtime.dtrace.println(b ? 1 : 0);
      daikon.Runtime.dtrace.println(1);  // modified bit
    }

    if (b)
      x++;

    int daikon_return_value = x*x;
    synchronized (daikon.Runtime.dtrace) {
      daikon.Runtime.dtrace.println();
      daikon.Runtime.dtrace.println("Example.squar:::EXIT");
      daikon.Runtime.dtrace.println("x");
      daikon.Runtime.dtrace.println(x);
      daikon.Runtime.dtrace.println(1);  // modified bit
      daikon.Runtime.dtrace.println("b");
      daikon.Runtime.dtrace.println(b ? 1 : 0);
      daikon.Runtime.dtrace.println(1);  // modified bit
      daikon.Runtime.dtrace.println("return");
      daikon.Runtime.dtrace.println(daikon_return_value);
      daikon.Runtime.dtrace.println(1);  // modified bit
    }

    return daikon_return_value;
  }
}

(Daikon’s Java front end, Chicory, does not actually insert instrumentation into the Java source code of your program. Rather, it instruments the bytecode as it is loaded into the JVM. This is more efficient, and it avoids making any changes to your .java or .class files. We have shown an example of Java source code instrumentation because that is simpler to explain and understand than the bytecode instrumentation.)


Previous: Example instrumented Java program, Up: New front ends   [Contents][Index]

2.7.2 Instrumenting C programs

Daikon comes with a front end for the C language: Kvasir (see Kvasir in Daikon User Manual). Kvasir only works under the Linux operating system, and it works only on “x86” (Intel 386, 486, 586, 686 compatible) and “x86-64” (AMD64, EM64T compatible) processors.

You may wish to infer invariants over C programs running on other platforms; for instance, you want a robust C front end that works under Microsoft Windows. This section will help you to either write such a front end or to hand-instrument your program to produce output that Daikon can process.

We welcome additions and corrections to this part of the manual. And, if you write a C instrumenter that might be of use to others, please contribute it back to the Daikon project.

A front end for C (or any other language) performs two tasks. It determines the names of all variables that are in scope at a particular program point, and it prints the values of those variables each time the program point executes.

Determining the names of the variables is straightforward. It requires either parsing source code or parsing a compiled executable. In the latter case, the variables can be determined from debugging information that the compiler places in the executable.

The challenge for C programs is determining the values of variables at execution time: for each variable, the front end must determine whether the variable’s value is valid, and how big the value is.

A front end should print only variables that have valid values. Examples of invalid values are variables that have not yet been initialized and pointers whose content has been deallocated. (A pointer dereference, such as ‘*p’ or ‘p->field’, can itself be to uninitialized and/or deallocated memory.) Invalid values should be printed as ‘nonsensical’ (see Data trace records).

It is desirable to print ‘nonsensical’ rather than an invalid value, for two reasons. First, outputting nonsense values can degrade invariant detection; patterns in the valid data may be masked by noise from invalid values. Second, an attempt to access an invalid value can cause the instrumented program to crash! For instance, suppose that pointer ‘p’ is not yet initialized — the pointer value refers to some arbitrary location in memory, possibly even an address that the operating system has not allocated to the program. An attempt to print the value of ‘*p’ or ‘p->field’ will result in a segmentation fault when ‘*p’ is accessed. (If you choose never to dereference a pointer while performing instrumentation, then you do not need to worry about invalid references. However, you will be unable to output any fields of a pointer to a struct or class, making your front end less useful. You will still be able to output fields of a regular variable to a struct or class, but most interesting uses of structs and classes in C and C++ are through pointers.)

C relies on the programmer to remember which variables are valid, and the programmer must take care never to access invalid variables. Unfortunately, there is no simple automatic way to determine variable validity for an arbitrary C program. (Languages with automatic memory management, such as Java, do not pose these problems. All variables always have an initial value, so there is no danger of printing uninitialized memory, though the initial value may not be particularly meaningful. Because pointed-to memory is never deallocated, all non-null pointers are always valid, so there is no danger of a segmentation fault.)

An instrumenter needs information about validity of variable values. This could be obtained from the programmer (which requires work on the part of the user of Daikon), or obtained automatically by creating a new run-time system that tracks the information (which requires a more sophisticated front end).

In addition to determining which variables are uninitialized and which pointers are to allocated memory, there are additional problems for a C front end. For example, given a char pointer ‘*c’, does it point to a single character, or to an array of characters? If it points to an array of characters, how big is that array? And for each element of the array, is that element initialized or not?

The problem of tracking C memory may seem daunting, but it is not insurmountable. There exist many tools for detecting or debugging memory errors in C, and they need to perform exactly the same memory tracking as a Daikon front end must perform. Therefore, a Daikon front end can use the same well-known techniques, and possibly can even be built on top of such a tool. For instance, one C front end, named Kvasir, is built on top of the Valgrind tool (http://valgrind.org/), greatly reducing the implementation effort. Valgrind only works under Linux, but a C front end for another platform could build on a similar tool; many other such tools exist.

There are two basic approaches to instrumenting a C program (or a program in any other language): instrument the source code, or instrument a compiled binary representation of the program. In each case, additional code that tracks all memory allocations, deallocations, writes, and reads must be executed at run time. Which approach is most appropriate for you depends on what tools you use when building your C instrumentation system.

In some cases, it may not be necessary to build a fully general C instrumentation system. You may be able to craft a smaller, simpler extension to an existing program — enabling that program (only) to produce files for Daikon to analyze.

For instance, many programs use specialized memory allocation routines (customized versions of malloc and free), in order to prevent or detect memory errors. The information that such libraries collect is often sufficient to determine which variable values should be printed, and which should be suppressed in favor of printing ‘nonsensical’ instead.

The presence of memory errors — even in a program that appears to run correctly — makes it much harder to create Daikon’s output. Therefore, as a prerequisite to instrumenting a C program, it is usually a good idea to run a memory checker on that program and to eliminate any memory errors.


Next: , Previous: New front ends, Up: Extending Daikon   [Contents][Index]

2.8 New suppressors

As mentioned in Daikon internals, one way to make Daikon more efficient, and to reduce clutter in the output to the user, is to reduce the number of redundant invariants of various kinds. This section describes how to add a new suppressor relation, such that if invariant A implies B, B is not instantiated or checked as long as A holds, saving time and space. Suppression implications use some terminology. A suppressor (defined in the class NISuppressor) is one of a set of invariants (NISuppression) that imply and suppress a suppressee invariant (NISuppressee). The set of all of the suppressions that suppress a particular suppressee is stored in the class NISuppressionSet.

Adding a new suppression is straightforward when the invariants involved do not have any state. Define the suppressee and each of the suppressions that suppress it using the corresponding constructors. Add the method get_ni_suppressions to the class of the invariant being suppressed and return the appropriate suppression set. Make sure that get_ni_suppressions always returns the same suppression set (i.e., that storage to store the suppressions is only allocated once). Normally this is done by defining a static variable to hold the suppression sets and initializing this variable the first time that get_ni_suppressions is called.

The following example defines suppressions for ‘x == y’ implies ‘x >= y’ and ‘x > y’ implies ‘x >= y’.

private static NISuppressionSet suppressions = null;

public NISuppressionSet get_ni_suppressions() {
  if (suppressions == null) {
    NISuppressee = new NISuppressee (IntGreaterEqual);

    NISuppressor v1_eq_v2 = new NISuppressor (0, 1, IntEqual.class);
    NISuppressor v1_lt_v2 = new NISuppressor (0, 1, IntLessThan.class);

    suppressions = new NISuppressionSet (new NISuppression[] {
      new NISuppression (v1_eq_v2, suppressee),
      new NISuppression (v1_lt_v2, suppressee),
    });
  }
  return (suppressions);
}

For suppressions depending on the state of a particular invariant, each Invariant has an isObviousDynamically(VarInfo[] vis) method that is called once the state of other invariants has already been determined. This method returns a non-null value if this invariant is implied by a fact that can be derived from the given VarInfos.

For example, suppose division was not defined for divisors smaller than 1. The following example defines an obvious check for ‘x <= c’ (where c < 1 is a constant) implies ‘y % x == 0’, written in the Divides class.

public DiscardInfo isObviousDynamically(VarInfo[] vis) {
  DiscardInfo di = super.isObviousDynamically(vis);
  if(di != null) {
    return di;
  }

  VarInfo var1 = vis[0];

  PptSlice1 ppt_over1 = ppt.parent.findSlice(var1);

  if(ppt_over1 == null) {
    return null;
  }

  for(Invariant inv : ppt_over1.invs) {            
    if(inv instanceof UpperBound) {
      if(((UpperBound) inv).max() < 1) {
        return new DiscardInfo(this, DiscardCode.obvious,
                       ‘Divides is obvious when divisor less than one’);
      }
    }
  }

  return null;
}

Next: , Previous: New suppressors, Up: Extending Daikon   [Contents][Index]

2.9 Reading dtrace files

If you wish to write a program that manipulates a .dtrace file, you can use Daikon’s built-in mechanisms for parsing .dtrace files. (This is easier and less error-prone than write a program that directly reads the file.)

You will write a subclass of FileIO.Processor, then pass an instance of that class to FileIO.read_data_trace_files. Daikon will parse each record in the trace files that you indicate, then will pass the parsed version to methods in your processor.

For a simple example of how to use FileIO.Processor, see the file
daikon/java/daikon/tools/ReadTrace.java.


Previous: Reading dtrace files, Up: Extending Daikon   [Contents][Index]

2.10 System.exit

The Daikon codebase does not call System.exit(), except in a dummy main method that catches TerminationMessage, which is the standard way that a component of Daikon requests the JVM to shut down.

The reason for this is that calling System.exit() is usually a bad idea. It makes the class unusable as a subroutine, because it might kill the calling program. It can cause deadlock. And it can leave data in an inconsistent state (for example, if the program was in the middle of writing a file, still held non-Java locks, etc.), because the program has no good way of completing any actions that it was in the middle of. Therefore, it is better to throw an exception and let the program handle it appropriately. (This is true of instrumentation code as well.)


Previous: Reading dtrace files, Up: Extending Daikon   [Contents][Index]