Extending Daikon (The Daikon Invariant Detector Developer Manual)

To compile Daikon, type make in $DAIKONDIR/java/ or any of its subdirectories. Alternately, type make -C $DAIKONDIR compile. To create the daikon.jar file, type make -C $DAIKONDIR daikon.jar.

The distribution includes daikon.jar and compiled .class files, so you do not need to compile them yourself unless you make changes.

2.1.1 Requirements for compiling Daikon

Before compiling Daikon, you need to install some dependencies (that is, software used to build Daikon). For Rocky Linux and Ubuntu, you can find the commands to install these dependencies in file $DAIKONDIR/scripts/Dockerfile-OSNAME-jdkN-plus. For other operating systems, use similar commands.

Note that Kvasir, the Daikon front end for the C language (see Kvasir in Daikon User Manual), does not work on Mac OS.

The Daikon build process uses the C preprocessor cpp to convert each .jpp file in the distribution into multiple .java files, which are then compiled. If you have a C compiler, you almost certainly have cpp.

2.2 Source code (version control repository)

After making a local clone, see Compiling Daikon, for instructions on how to compile Daikon.

2.3 Using Eclipse

Project > properties > Java build path: libraries : add external jars everything in the lib/ directory, plus also the tools.jar file in the lib/ directory of your JDK. (I’m not sure why, but add jars doesn’t show all .jar files in the directory.)

Source: add Daikon, remove Daikon/src. Default output folder: change from Daikon/bin to Daikon.

2.4 New invariants

You can easily write your own invariants and have Daikon check them, in addition to all the other invariants that are already part of Daikon. Adding a new invariant to Daikon requires writing one Java class, as explained below.

The file java/daikon/inv/unary/scalar/Positive.java in the Daikon distribution contains a sample invariant. This invariant is true if the variable is always positive (greater than zero). This invariant is subsumed by other invariants in the system; it is provided only as a pedagogical example. To enable the invariant, comment out the appropriate line in Daikon.setup_proto_invs(), then recompile Daikon.

A Java class defining an invariant is a concrete subclass of one of the direct subclasses of UnaryInvariant, BinaryInvariant, or TernaryInvariant. A complete list of invariants appears in the body of Daikon.setup_proto_invs().

Daikon’s invariants are first instantiated, then are presented samples (tuples of values for all the variables of interest to the invariant; this might be a 1-tuple, a 2-tuple, or a 3-tuple) in turn. If any sample falsifies the invariant, the invariant destroys itself. All remaining invariants at the end of the program run can be queried for their statistical confidence, then reported as likely to be true.

You need to implement the abstract methods of Invariant that are not defined in one of the subclasses listed above. You also need to define a constructor and a static method:

Methods that need to be overridden that are defined in a subclass of ‘Invariant’ include:

2.5 New derived variables

A derived variable is an expression that does not appear in the source code as a variable, but that Daikon treats as a variable for purposes of invariant detection. For instance, if there exists an array ‘a’ and an integer ‘i’, then Daikon introduces the derived variable ‘a[i]’. This permits detection of invariants over this quantity.

(Describing how to create new variety of derived variable is still to be written. For now, see the derived variables that appear in the Java files in directory $DAIKONDIR/java/daikon/derive/.)

2.6 New formatting for invariants

Daikon can print invariants in multiple formats (see Invariant syntax in Daikon User Manual).

2.7 New front ends

A front end for Daikon converts data into a form Daikon can process, producing files in Daikon’s input format — data trace declarations and records. For more information about these files, see File formats.

The data traces can be obtained from any source. For instance, front ends have been built for stock data, weather forecasts, truck weight data, and spreadsheet data (see convertcsv.pl in Daikon User Manual), among others. More often, users apply a programming language front end (also called an instrumenter) to a program, causing executions of the program to write files in Daikon’s format. (For information about existing front ends, see Front ends and instrumentation in Daikon User Manual.) When a general front end is not available, it is possible to manually instrument a specific program so that it writes files in Daikon’s format. The resulting instrumented program is very similar to what an instrumenter would have created, so this section is relevant to both approaches.

Conceptually, a front end instrumenter has two tasks. Suppose you want to infer invariants at a program point (say, a line of code or the entry or exit from a procedure). The front end must create a declaration (see Declarations) that lists the variables in scope at that program point. Every time that program point is executed, the program must output a data trace record (see Data trace records). A front end can make the program output a data trace record by inserting a printf (or similar) statement that outputs the current values of the variables of interest.

2.7.1 Example instrumented Java program

This section gives an example of how an instrumenter for Java might work; other languages are analogous. Suppose we wish to instrument file Example.java.

class Example {
  // Return either the square of x or the square of (x+1).
  int squar(int x, boolean b) {
    if (b)
      x++;
    return x*x;
  }
}

DECLARE
Example.squar:::ENTER
x
int
int
1
b
boolean
int
2

DECLARE
Example.squar:::EXIT
x
int
int
1
b
boolean
int
2
return
int
int
1

The instrumented .java file might look like the following. This example does not compute the “modified bits”, but simply sets them all to 1, which is a safe default.

class Example {
  static {
    daikon.chicory.Runtime.setDtraceMaybe("daikon-output/StackAr.dtrace");
  }

  // Return either the square of x or the square of (x+1).
  int squar(int x, boolean b) {
    synchronized (daikon.chicory.Runtime.dtrace) {
      daikon.chicory.Runtime.dtrace.println();
      daikon.chicory.Runtime.dtrace.println("Example.squar:::ENTER");
      daikon.chicory.Runtime.dtrace.println("x");
      daikon.chicory.Runtime.dtrace.println(x);
      daikon.chicory.Runtime.dtrace.println(1);  // modified bit
      daikon.chicory.Runtime.dtrace.println("b");
      daikon.chicory.Runtime.dtrace.println(b ? 1 : 0);
      daikon.chicory.Runtime.dtrace.println(1);  // modified bit
    }

    if (b)
      x++;

    int daikon_return_value = x*x;
    synchronized (daikon.chicory.Runtime.dtrace) {
      daikon.chicory.Runtime.dtrace.println();
      daikon.chicory.Runtime.dtrace.println("Example.squar:::EXIT");
      daikon.chicory.Runtime.dtrace.println("x");
      daikon.chicory.Runtime.dtrace.println(x);
      daikon.chicory.Runtime.dtrace.println(1);  // modified bit
      daikon.chicory.Runtime.dtrace.println("b");
      daikon.chicory.Runtime.dtrace.println(b ? 1 : 0);
      daikon.chicory.Runtime.dtrace.println(1);  // modified bit
      daikon.chicory.Runtime.dtrace.println("return");
      daikon.chicory.Runtime.dtrace.println(daikon_return_value);
      daikon.chicory.Runtime.dtrace.println(1);  // modified bit
    }

    return daikon_return_value;
  }
}

(Daikon’s Java front end, Chicory, does not actually insert instrumentation into the Java source code of your program. Rather, it instruments the bytecode as it is loaded into the JVM. This is more efficient, and it avoids making any changes to your .java or .class files. We have shown an example of Java source code instrumentation because that is simpler to explain and understand than the bytecode instrumentation.)

2.7.2 Instrumenting C programs

Daikon comes with a front end for the C language: Kvasir (see Kvasir in Daikon User Manual). Kvasir only works under the Linux operating system, and it works only on “x86” (Intel 386, 486, 586, 686 compatible) and “x86-64” (AMD64, EM64T compatible) processors.

You may wish to infer invariants over C programs running on other platforms; for instance, you want a robust C front end that works under Microsoft Windows. This section will help you to either write such a front end or to hand-instrument your program to produce output that Daikon can process.

We welcome additions and corrections to this part of the manual. And, if you write a C instrumenter that might be of use to others, please contribute it back to the Daikon project.

A front end for C (or any other language) performs two tasks. It determines the names of all variables that are in scope at a particular program point, and it prints the values of those variables each time the program point executes.

Determining the names of the variables is straightforward. It requires either parsing source code or parsing a compiled executable. In the latter case, the variables can be determined from debugging information that the compiler places in the executable.

The challenge for C programs is determining the values of variables at execution time: for each variable, the front end must determine whether the variable’s value is valid, and how big the value is.

A front end should print only variables that have valid values. Examples of invalid values are variables that have not yet been initialized and pointers whose content has been deallocated. (A pointer dereference, such as ‘*p’ or ‘p->field’, can itself be to uninitialized and/or deallocated memory.) Invalid values should be printed as ‘nonsensical’ (see Data trace records).

It is desirable to print ‘nonsensical’ rather than an invalid value, for two reasons. First, outputting nonsense values can degrade invariant detection; patterns in the valid data may be masked by noise from invalid values. Second, an attempt to access an invalid value can cause the instrumented program to crash! For instance, suppose that pointer ‘p’ is not yet initialized — the pointer value refers to some arbitrary location in memory, possibly even an address that the operating system has not allocated to the program. An attempt to print the value of ‘*p’ or ‘p->field’ will result in a segmentation fault when ‘*p’ is accessed. (If you choose never to dereference a pointer while performing instrumentation, then you do not need to worry about invalid references. However, you will be unable to output any fields of a pointer to a struct or class, making your front end less useful. You will still be able to output fields of a regular variable to a struct or class, but most interesting uses of structs and classes in C and C++ are through pointers.)

C relies on the programmer to remember which variables are valid, and the programmer must take care never to access invalid variables. Unfortunately, there is no simple automatic way to determine variable validity for an arbitrary C program. (Languages with automatic memory management, such as Java, do not pose these problems. All variables always have an initial value, so there is no danger of printing uninitialized memory, though the initial value may not be particularly meaningful. Because pointed-to memory is never deallocated, all non-null pointers are always valid, so there is no danger of a segmentation fault.)

An instrumenter needs information about validity of variable values. This could be obtained from the programmer (which requires work on the part of the user of Daikon), or obtained automatically by creating a new run-time system that tracks the information (which requires a more sophisticated front end).

In addition to determining which variables are uninitialized and which pointers are to allocated memory, there are additional problems for a C front end. For example, given a char pointer ‘*c’, does it point to a single character, or to an array of characters? If it points to an array of characters, how big is that array? And for each element of the array, is that element initialized or not?

The problem of tracking C memory may seem daunting, but it is not insurmountable. There exist many tools for detecting or debugging memory errors in C, and they need to perform exactly the same memory tracking as a Daikon front end must perform. Therefore, a Daikon front end can use the same well-known techniques, and possibly can even be built on top of such a tool. For instance, one C front end, named Kvasir, is built on top of the Valgrind tool (https://valgrind.org/), greatly reducing the implementation effort. Valgrind only works under Linux, but a C front end for another platform could build on a similar tool; many other such tools exist.

There are two basic approaches to instrumenting a C program (or a program in any other language): instrument the source code, or instrument a compiled binary representation of the program. In each case, additional code that tracks all memory allocations, deallocations, writes, and reads must be executed at run time. Which approach is most appropriate for you depends on what tools you use when building your C instrumentation system.

In some cases, it may not be necessary to build a fully general C instrumentation system. You may be able to craft a smaller, simpler extension to an existing program — enabling that program (only) to produce files for Daikon to analyze.

For instance, many programs use specialized memory allocation routines (customized versions of malloc and free), in order to prevent or detect memory errors. The information that such libraries collect is often sufficient to determine which variable values should be printed, and which should be suppressed in favor of printing ‘nonsensical’ instead.

The presence of memory errors — even in a program that appears to run correctly — makes it much harder to create Daikon’s output. Therefore, as a prerequisite to instrumenting a C program, it is usually a good idea to run a memory checker on that program and to eliminate any memory errors.

2.8 New suppressors

As mentioned in Daikon internals, one way to make Daikon more efficient, and to reduce clutter in the output to the user, is to reduce the number of redundant invariants of various kinds. This section describes how to add a new suppressor relation, such that if invariant A implies B, B is not instantiated or checked as long as A holds, saving time and space. Suppression implications use some terminology. A suppressor (defined in the class NISuppressor) is one of a set of invariants (NISuppression) that imply and suppress a suppressee invariant (NISuppressee). The set of all of the suppressions that suppress a particular suppressee is stored in the class NISuppressionSet.

Adding a new suppression is straightforward when the invariants involved do not have any state. Define the suppressee and each of the suppressions that suppress it using the corresponding constructors. Add the method get_ni_suppressions to the class of the invariant being suppressed and return the appropriate suppression set. Make sure that get_ni_suppressions always returns the same suppression set (i.e., that storage to store the suppressions is only allocated once). Normally this is done by defining a static variable to hold the suppression sets and initializing this variable the first time that get_ni_suppressions is called.

The following example defines suppressions for ‘x == y’ implies ‘x >= y’ and ‘x > y’ implies ‘x >= y’.

private static NISuppressionSet suppressions = null;

public NISuppressionSet get_ni_suppressions() {
  if (suppressions == null) {
    NISuppressee = new NISuppressee(IntGreaterEqual);

    NISuppressor v1_eq_v2 = new NISuppressor(0, 1, IntEqual.class);
    NISuppressor v1_lt_v2 = new NISuppressor(0, 1, IntLessThan.class);

    suppressions = new NISuppressionSet(new NISuppression[] {
      new NISuppression(v1_eq_v2, suppressee),
      new NISuppression(v1_lt_v2, suppressee),
    });
  }
  return suppressions;
}

For suppressions depending on the state of a particular invariant, each Invariant has an isObviousDynamically(VarInfo[] vis) method that is called once the state of other invariants has already been determined. This method returns a non-null value if this invariant is implied by a fact that can be derived from the given VarInfos.

For example, suppose division was not defined for divisors smaller than 1. The following example defines an obvious check for ‘x <= c’ (where c < 1 is a constant) implies ‘y % x == 0’, written in the Divides class.

public DiscardInfo isObviousDynamically(VarInfo[] vis) {
  DiscardInfo di = super.isObviousDynamically(vis);
  if(di != null) {
    return di;
  }

  VarInfo var1 = vis[0];

  PptSlice1 ppt_over1 = ppt.parent.findSlice(var1);

  if(ppt_over1 == null) {
    return null;
  }

  for(Invariant inv : ppt_over1.invs) {
    if(inv instanceof UpperBound) {
      if(((UpperBound) inv).max() < 1) {
        return new DiscardInfo(this, DiscardCode.obvious,
                       ‘Divides is obvious when divisor less than one’);
      }
    }
  }

  return null;
}

2.9 Reading dtrace files

If you wish to write a program that manipulates a .dtrace file, you can use Daikon’s built-in mechanisms for parsing .dtrace files. (This is easier and less error-prone than writing your own parser.)

You will write a subclass of FileIO.Processor, then pass an instance of that class to FileIO.read_data_trace_files. Daikon will parse each record in the trace files that you indicate, then will pass the parsed version to methods in your processor.

For a simple example of how to use FileIO.Processor, see the file
daikon/java/daikon/tools/ReadTrace.java.

2.10 System.exit

The Daikon codebase does not call System.exit(), except in a dummy main method that catches TerminationMessage, which is the standard way that a component of Daikon requests the JVM to shut down.

The reason for this is that calling System.exit() is usually a bad idea. It makes the class unusable as a subroutine, because it might kill the calling program. It can cause deadlock. And it can leave data in an inconsistent state (for example, if the program was in the middle of writing a file, still held non-Java locks, etc.), because the program has no good way of completing any actions that it was in the middle of. Therefore, it is better to throw an exception and let the program handle it appropriately. (This is true of instrumentation code as well.)

To see the stack trace for a TerminationMessage, pass --config_option daikon.Debug.show_stack_trace=true on the command line.

• Compiling Daikon
• Source code (version control repository)
• Using Eclipse
• New invariants
• New derived variables
• New formatting for invariants
• New front ends
• New suppressors
• Reading dtrace files
• System.exit

2 Extending Daikon

2.1 Compiling Daikon