Next: , Previous: , Up: Top   [Contents][Index]

Appendix A File formats

This chapter contains information about the file format of Daikon’s input files. It is of most information to those who wish to write a front end, also known as an instrumenter (see Front ends and instrumentation in Daikon User Manual). A new front end enables Daikon to detect invariants in another programming language.

Daikon’s input is conventionally one or more .dtrace data trace files. (Another, optional type of input file for Daikon is a splitter info file; see Splitter info file format in Daikon User Manual.) A trace file is a text file that consists of newline-separated records. There are two basic types of records that can appear in Daikon’s input: program point declarations, and trace records. The declarations describe the structure of the trace records. The trace records contain the data on which Daikon operates — the run-time values of variables in your program.

Each declaration names an instrumented program point and lists the variables at that program point. A program point is a location in the program, such as a specific line number, or a specific procedure’s entry or exit. An instrumented program point is a place where the instrumenter may emit a trace record. A program point declaration may be repeated, so long as the declarations match exactly (any declarations after the first one have no effect).

A data trace record (also known as a sample) represents one execution of a program point. The record specifies the program point and gives the run-time values of each variable. The list of variables in the data trace record must be identical to that in the corresponding declaration. For a given program point, the declaration must precede the first data trace record for the program point. It is not required that all the program point declarations appear before any of the data trace records.

There exist some other declaration-related records; see Declaration-related records.


Next: , Up: File formats   [Contents][Index]

A.1 Declarations in a separate file

Instead of placing both declarations and data trace records in a single file, it is permitted to place the declarations in one or more .decls declaration files while leaving the data trace records in the .dtrace file. This can be convenient for tools that perform a separate instrumentation step, such as dfepl (see dfepl in Daikon User Manual). Such a tool takes as input a target program to be analyzed, and produces two outputs: a .decls file and an instrumented program. Executing the instrumented program produces a .dtrace file containing data trace records for all the program points that appear in the .decls file. This approach works fine and is easier to implement in certain situations, but has a few disadvantages. It requires the user to perform at least two steps — instrumentation and execution — and the existence of two versions of the program (instrumented and uninstrumented) can lead to confusion or extra work. It is also more convenient to have a single file that contains all information about a program, rather than multiple .decls files that must be associated with the .dtrace file.

It is also permitted for a declaration to appear more than once in Daikon’s input. The declaration must be identical in all occurrences. This is useful when running Daikon with multiple .dtrace files, each of which contains its own declarations.


Next: , Previous: , Up: File formats   [Contents][Index]

A.2 File format conventions

Daikon files are textual, to permit easier viewing and editing by humans. Each record is separated by one or more blank lines. To permit easier parsing by programs, each piece of information in a record appears on a separate line.

Outside a record, any line starting with a pound sign (#) or double slashes (//) is ignored as a comment. Comments are not permitted inside a record.


Next: , Previous: , Up: File formats   [Contents][Index]

A.3 Declarations

The trace file (or declaration file) first states the declaration file format version number (see Declaration version). It may also specify some other information about the file (see Declaration-related records). Then, it defines each program point and its variables.

Indentation is ignored, so it may be used to aid readability. Fields with defaults can be omitted.

As a rule, each line of the declaration file is of the form <field-name> <field-value>.

Some additional details about the declaration file format appear in file daikon/doc/decl_format.txt in the Daikon source code.


Next: , Up: Declarations   [Contents][Index]

A.3.1 Declaration-related records


Next: , Up: Declaration-related records   [Contents][Index]

A.3.1.1 Declaration version

The declaration version record must be the first record in the file.

The declaration version record is as follows:

decl-version <version>

The current version is 2.0.

Previous versions (see Version 1 Declarations) did not include a version field and are identified by the lack of this field.


Next: , Previous: , Up: Declaration-related records   [Contents][Index]

A.3.1.2 Input-language declaration

You can specify the language in which the program was written with a record of the form

input-language <language>

The language string is arbitrary and is not used.


Next: , Previous: , Up: Declaration-related records   [Contents][Index]

A.3.1.3 Variable comparability

The Variable comparability record indicates how the comparability field of a variable declaration should be interpreted.

Its format is:

var-comparability <comparability-type>

The possible values for comparability-type are implicit and none.

implicit means ordinary comparability as described in Variable declarations. (The name implicit is retained for historical reasons.)

This record is optional. The implicit type is the default.


Previous: , Up: Declaration-related records   [Contents][Index]

A.3.1.4 ListImplementors declaration

This declaration indicates classes that implement the java.util.List interface, and should be treated as sequences for the purposes of invariant detection. The syntax is as follows:

ListImplementors
<classname1>
<classname2>
...

Each classname is in Java format (for example, java.util.LinkedList).

The --list_type command-line option to Daikon can also be used to specify classes that implement lists; see Options to control invariant detection in Daikon User Manual.


Next: , Previous: , Up: Declarations   [Contents][Index]

A.3.2 Program point declarations

The format of a program point declaration is:

ppt <ppt-name>
<ppt-info>
<ppt-info>
...
<variable-declaration>
<variable-declaration>
...

The program point name can include any character. In the declaration file, blanks must be replaced by \_, and backslashes must be escaped as \\. Program point names must be distinct.

While Daikon does not infer program point relationships from ppt-names, it does require these names to conform to a set syntax. The following patterns are for the enter, subexit, class and object ppt-types, respectively.

<fully qualified class name>.<method/function name>(<argument types>):::ENTER
<fully qualified class name>.<method/function name>(<argument types>):::EXIT<id>
<fully qualified class name>:::CLASS
<fully qualified class name>:::OBJECT

Since in most languages a method or function may have multiple exit points, the ppt-name for a subexit ppt-type must be appended with a unique id. Typically, this is the corresponding source line number.

The following information about the program point (ppt-info) can be specified:


Previous: , Up: Declarations   [Contents][Index]

A.3.3 Variable declarations

The format of a variable declaration is:

variable <name>
  <variable-info>
  <variable-info>
  ...

The variable name is arbitrary, but for clarity, it should match what is used in the programming language. All characters are legal in a name, but blanks must be represented as \_ and backslashes as \\.

If the variable is an array, .. marks the location of array indices within the variable name. Some examples of names are:

this.theArray
this.theArray[..]
this.stack.getClass()

The following information about the variable (variable-info) can be specified:


Next: , Previous: , Up: File formats   [Contents][Index]

A.4 Data trace records

A data trace record (also known as a sample) contains run-time value information. Its format is:

<program-point-name>
this_invocation_nonce
<nonce-string>
<varname-1>
<var-value-1>
<var-modified-1>
<varname2>
<var-value-2>
<var-modified-2>
...

In other words, the sample record contains:


Next: , Up: Data trace records   [Contents][Index]

A.4.1 Nonsensical values for variables

Some trace variables and derived variables may not have a value because the expression that computes it cannot be evaluated. In such a circumstance, the value is said to be nonsensical, it is written in the trace file as nonsensical, and its modified field must be 2. Examples include

For trace variables, it is the responsibility of the front end to perform a check at run time whenever a variable’s value is about to be output to the trace, and to output the value ‘nonsensical’ (see Nonsensical values) rather than crashing the program or outputting an uninitialized or meaningless value. (Determining when an expression’s value is meaningless is the most challenging part of writing an instrumenter for a language like C, since it requires tracking memory allocation and initialization.) For derived variables created by Daikon, Daikon does the same thing, setting values to ‘nonsensical’ when appropriate. For controlling Daikon’s output in the presence of nonsensical values, see the daikon.Daikon.guardNulls configuration option (see General configuration options in Daikon User Manual).


Previous: , Up: Data trace records   [Contents][Index]

A.4.2 Variables that do not appear in trace records

A trace record should contain exactly the same variables as in the corresponding declaration. There is one exception: for efficiency, compile-time constants (e.g., static final variables in Java) are omitted from the trace record, since they would have the same value every time.

Furthermore, neither the declarations nor the trace records contain derived variables (see Variable names in Daikon User Manual).


Next: , Previous: , Up: File formats   [Contents][Index]

A.5 Example files

Here are portions of two files StackArTester.decls and StackArTester.dtrace, for a Java class that implements a stack of integers using an array as the underlying data structure. You can see many more examples by simply running an existing front end on some Java, C, or Perl programs and viewing the resulting files.


Next: , Up: Example files   [Contents][Index]

A.5.1 Example declaration file

This is part of the file StackArTester.decls, a declaration file for the StackAr.java program (see StackAr example in Daikon User Manual).

ppt DataStructures.StackAr.push(java.lang.Object):::ENTER
ppt-type enter
parent parent DataStructures.StackAr:::OBJECT 1
variable this
  var-kind variable
  dec-type DataStructures.StackAr
  rep-type hashcode
  flags is_param nomod
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray
  var-kind field theArray
  enclosing-var this
  dec-type java.lang.Object[]
  rep-type hashcode
  flags nomod
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray.getClass().getName()
  var-kind function getClass().getName()
  enclosing-var this.theArray
  dec-type java.lang.Class
  rep-type java.lang.String
  function-args this.theArray
  flags nomod synthetic classname
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray[..]
  var-kind array
  enclosing-var this.theArray
  array 1
  dec-type java.lang.Object[]
  rep-type hashcode[]
  flags nomod
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray[..].getClass().getName()
  var-kind function getClass().getName()
  enclosing-var this.theArray[..]
  array 1
  dec-type java.lang.Class[]
  rep-type java.lang.String[]
  function-args this.theArray[]
  flags nomod synthetic classname
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.topOfStack
  var-kind field topOfStack
  enclosing-var this
  dec-type int
  rep-type int
  flags nomod
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable DataStructures.StackAr.DEFAULT_CAPACITY
  var-kind variable
  dec-type int
  rep-type int
  constant 10
  flags nomod
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable x
  var-kind variable
  dec-type java.lang.Object
  rep-type hashcode
  flags is_param nomod
  comparability 22
variable x.getClass().getName()
  var-kind function getClass().getName()
  enclosing-var x
  dec-type java.lang.Class
  rep-type java.lang.String
  function-args x
  flags nomod synthetic classname
  comparability 22

ppt DataStructures.StackAr.push(java.lang.Object):::EXIT103
ppt-type subexit
parent parent DataStructures.StackAr:::OBJECT 1
variable this
  var-kind variable
  dec-type DataStructures.StackAr
  rep-type hashcode
  flags is_param nomod
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray
  var-kind field theArray
  enclosing-var this
  dec-type java.lang.Object[]
  rep-type hashcode
  flags nomod
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray.getClass().getName()
  var-kind function getClass().getName()
  enclosing-var this.theArray
  dec-type java.lang.Class
  rep-type java.lang.String
  function-args this.theArray
  flags nomod synthetic classname
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray[..]
  var-kind array
  enclosing-var this.theArray
  array 1
  dec-type java.lang.Object[]
  rep-type hashcode[]
  flags nomod
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.theArray[..].getClass().getName()
  var-kind function getClass().getName()
  enclosing-var this.theArray[..]
  array 1
  dec-type java.lang.Class[]
  rep-type java.lang.String[]
  function-args this.theArray[]
  flags nomod synthetic classname
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable this.topOfStack
  var-kind field topOfStack
  enclosing-var this
  dec-type int
  rep-type int
  flags nomod
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable DataStructures.StackAr.DEFAULT_CAPACITY
  var-kind variable
  dec-type int
  rep-type int
  constant 10
  flags nomod
  comparability 22
  parent DataStructures.StackAr:::OBJECT 1
variable x
  var-kind variable
  dec-type java.lang.Object
  rep-type hashcode
  flags is_param nomod
  comparability 22
variable x.getClass().getName()
  var-kind function getClass().getName()
  enclosing-var x
  dec-type java.lang.Class
  rep-type java.lang.String
  function-args x
  flags nomod synthetic classname
  comparability 22

ppt DataStructures.StackAr:::CLASS
ppt-type class
variable DataStructures.StackAr.DEFAULT_CAPACITY
  var-kind variable
  dec-type int
  rep-type int
  constant 10
  flags nomod
  comparability 22

ppt DataStructures.StackAr:::OBJECT
ppt-type object
parent parent DataStructures.StackAr:::CLASS 1
variable this
  var-kind variable
  dec-type DataStructures.StackAr
  rep-type hashcode
  flags is_param nomod
  comparability 22
variable this.theArray
  var-kind field theArray
  enclosing-var this
  dec-type java.lang.Object[]
  rep-type hashcode
  flags nomod
  comparability 22
variable this.theArray.getClass().getName()
  var-kind function getClass().getName()
  enclosing-var this.theArray
  dec-type java.lang.Class
  rep-type java.lang.String
  function-args this.theArray
  flags nomod synthetic classname
  comparability 22
variable this.theArray[..]
  var-kind array
  enclosing-var this.theArray
  array 1
  dec-type java.lang.Object[]
  rep-type hashcode[]
  flags nomod
  comparability 22
variable this.theArray[..].getClass().getName()
  var-kind function getClass().getName()
  enclosing-var this.theArray[..]
  array 1
  dec-type java.lang.Class[]
  rep-type java.lang.String[]
  function-args this.theArray[]
  flags nomod synthetic classname
  comparability 22
variable this.topOfStack
  var-kind field topOfStack
  enclosing-var this
  dec-type int
  rep-type int
  flags nomod
  comparability 22
variable DataStructures.StackAr.DEFAULT_CAPACITY
  var-kind variable
  dec-type int
  rep-type int
  constant 10
  flags nomod
  comparability 22
  parent DataStructures.StackAr:::CLASS 1

Previous: , Up: Example files   [Contents][Index]

A.5.2 Example data trace file

This is part of file StackArTester.dtrace, which you can create by running the instrumented StackAr.java program (see StackAr example in Daikon User Manual). This excerpt contains only the first two calls to push and the first return from push, along with the associated object program point records; omitted records are indicated by ellipses.


...

DataStructures.StackAr.push(java.lang.Object):::ENTER
this_invocation_nonce
104
this
812272602
1
this.theArray
312077835
1
this.theArray.getClass().getName()
"java.lang.Object[]"
1
this.theArray[..]
[null]
1
this.theArray[..].getClass().getName()
[null]
1
this.topOfStack
-1
1
x
1367164551
1
x.getClass().getName()
"DataStructures.MyInteger"
1

...

DataStructures.StackAr.push(java.lang.Object):::EXIT103
this_invocation_nonce
104
this
812272602
1
this.theArray
312077835
1
this.theArray.getClass().getName()
"java.lang.Object[]"
1
this.theArray[..]
[1367164551]
1
this.theArray[..].getClass().getName()
["DataStructures.MyInteger"]
1
this.topOfStack
0
1
x
1367164551
1
x.getClass().getName()
"DataStructures.MyInteger"
1

...

DataStructures.StackAr.push(java.lang.Object):::ENTER
this_invocation_nonce
159
this
2007069404
1
this.theArray
142345952
1
this.theArray.getClass().getName()
"java.lang.Object[]"
1
this.theArray[..]
[null]
1
this.theArray[..].getClass().getName()
[null]
1
this.topOfStack
-1
1
x
111632506
1
x.getClass().getName()
"DataStructures.MyInteger"
1

...

DataStructures.StackAr.push(java.lang.Object):::EXIT103
this_invocation_nonce
159
this
2007069404
1
this.theArray
142345952
1
this.theArray.getClass().getName()
"java.lang.Object[]"
1
this.theArray[..]
[111632506]
1
this.theArray[..].getClass().getName()
["DataStructures.MyInteger"]
1
this.topOfStack
0
1
x
111632506
1
x.getClass().getName()
"DataStructures.MyInteger"
1

...


Previous: , Up: File formats   [Contents][Index]

A.6 Version 1 Declarations

This section describes the original version (1.0) of declaration records. These are now obsolete and should not be used.

A declarations file can contain program point declarations, VarComparability declarations, and ListImplementors declarations.


Next: , Up: Version 1 Declarations   [Contents][Index]

A.6.1 V1 Program point declarations

The format of a program point declaration is:

DECLARE
program-point-name
varname1
declared-type1 [# auxiliary-information1]
representation-type1 [= constant-value1]
comparable1
varname2
declared-type2 [# auxiliary-information2]
representation-type2 [= constant-value2]
comparable2
...

Program point information includes:


Next: , Previous: , Up: Version 1 Declarations   [Contents][Index]

A.6.2 Program point name format specification

Instrumenting code creates a .decls file that contains program point names such as:

DataStructures.StackAr.push(java.lang.Object):::ENTER
DataStructures.StackAr.push(java.lang.Object):::EXIT99
PolyCalc.RatNum.RatNum(int, int):::ENTER
PolyCalc.RatNum.RatNum(int, int):::EXIT55
PolyCalc.RatNum.RatNum(int, int):::EXIT67

This section describes the format of these program point names. Someone writing an instrumenter for a new language must be sure to follow this format specification.

A program point name is a string with no tabs or newlines in it. The basic format is ‘topLevel.bottomLevel:::pptInfo’. For the first example given above, the top level of the hierarchy would be DataStructures.StackAr, the bottom level would be push(java.lang.Object), and the program point information would be ENTER.

topLevel may contain any number of periods (‘.’). bottomLevel and pptInfo may not contain any periods. The string ‘:::’ may only appear once.

topLevel and pptInfo are required (i.e., they must be non-empty), as are the period to the right of topLevel and the colons to the left of pptInfo. However, bottomLevel is optional.

By convention, for Java topLevel consists of the class name, and bottomLevel consists of the method name and method signature.

For C, topLevel consists of a filename (or a single period for global functions), and bottomLevel could consist of a function name and signature. More precisely, names of C program points follow these conventions:

For an Input Output Automaton, topLevel consists of an Automaton name and bottomLevel consists of information for a transition state.

By convention, the entry and exit points for a function have names of a special form so that they can be associated with one another. (Currently, those names end with :::ENTER and :::EXIT.) This convention permits Daikon to generate pre-state variables (see Variable names in Daikon User Manual) automatically at procedure exit points, so front ends need not output them explicitly. When there are multiple exit points, then each one should be suffixed by a number (such as a line number, for example, foo::EXIT22). Daikon produces the main (non-numbered) :::EXIT point automatically. All the numbered exits should contain the same set of variables; in general, this means that local variables are not included at exit points. Daikon requires that declarations for :::ENTER program points appear before any declarations for matching :::EXIT program points.

Another convention is to have another program point whose bottomLevel is empty and whose pptInfo is OBJECT: for example, StackAr:::OBJECT. This contains the representation invariant (sometimes called the object invariant) of a class. This program point is created automatically by Daikon; it need not appear in a trace file.


Next: , Previous: , Up: Version 1 Declarations   [Contents][Index]

A.6.3 V1 VarComparability declaration

There is a special VarComparability declaration that controls how the comparability field in program point declarations is interpreted. The default VarComparability is implicit, which means ordinary comparability as described in Program point declarations. (The name implicit is retained for historical reasons.) You can override it as follows:

VarComparability
none

As with all records in Daikon input files, a blank line is required between this record and the next one.


Previous: , Up: Version 1 Declarations   [Contents][Index]

A.6.4 V1 ListImplementors declaration

This declaration indicates classes that implement the java.util.List interface, and should be treated as sequences for the purposes of invariant detection. The syntax is as follows:

ListImplementors
<classname1>
<classname2>
...

Each classname is in Java format (for example, java.util.LinkedList).

The --list_type command-line option to Daikon can also be used to specify classes that implement lists; see Options to control invariant detection in Daikon User Manual.


Previous: , Up: Version 1 Declarations   [Contents][Index]