4.2 Scalar Numeric Types

There are four scalar numeric data types defined in the Ptolemy kernel: complex, fixed-point, double precision floating-point, and integer. All four types can be read from and written to portholes as described in "Reading inputs and writing outputs" on page 2-17. The floating-point and integer data types are based on the standard C++ double and int types, and need no further explanation. To support the other two types, the Ptolemy kernel contains a Complex class and a Fix class, which are described in the rest of this section.

The Fix class

This class supports a two's complement representation of a finite-precision number. In fixed-point notation, the partition between the integer part and the fractional part (the binary point) lies at a fixed position in the bit pattern. Its position represents a trade-off between precision and range. If the binary point lies to the right of all bits, then there is no fractional part.
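The trade-off between precision and range can be made concrete with a small calculation. The following is only a sketch under stated assumptions: fixRange and FixRange are hypothetical names that do not exist in the Ptolemy kernel, and the code uses plain doubles rather than the Fix class.

```cpp
#include <cmath>

// Hypothetical helper, not part of the Ptolemy kernel: describes the values
// representable by a two's complement word with intBits bits to the left of
// the binary point (sign bit included) and fracBits bits to the right.
struct FixRange {
    double min;   // most negative representable value
    double max;   // most positive representable value
    double step;  // spacing between adjacent representable values
};

FixRange fixRange(int intBits, int fracBits) {
    double step  = std::ldexp(1.0, -fracBits);     // 2^(-fracBits)
    double limit = std::ldexp(1.0, intBits - 1);   // 2^(intBits - 1)
    return FixRange{ -limit, limit - step, step };
}
```

For a 16-bit word, moving the binary point from "2.14" to "8.8" widens the range from about [-2, 2) to [-128, 128) while coarsening the step from 2^-14 to 2^-8.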

Constructing Fixed-point variables

Variables of type Fix are defined by specifying the word length and the position of the binary point. At the user-interface level, precision is specified either by setting a fixed-point parameter to a "(value, precision)" pair, or by setting a precision parameter. The former gives the value and precision of some fixed-point value, while the latter is typically used to specify the internal precision of computations in a star.

In either case, the syntax of the precision is either "x.y" or "m/n", where x is the number of integer bits (including the sign bit), y and m are the number of fractional bits, and n is the total number of bits. Thus, the total number of bits in the fixed-point number (also called its length) is x+y or n. For example, a fixed-point number with precision "3.5" has a total length of 8 bits, with 3 bits to the left and 5 bits to the right of the binary point.
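To illustrate the two precision syntaxes, here is a minimal sketch of a parser that maps either form to a total length and a number of integer bits. The names Precision and parsePrecision are hypothetical and are not part of the Ptolemy kernel; the Fix class performs its own parsing, which may differ in detail. The limit of 64 bits is the FIX_MAX_LENGTH value quoted in this section.

```cpp
#include <cstdio>

// Hypothetical result type: total word length and integer bits (sign included).
struct Precision {
    int  length;
    int  intBits;
    bool valid;
};

// Hypothetical parser for "x.y" (integer bits . fractional bits) and
// "m/n" (fractional bits / total bits). Not the kernel's implementation.
Precision parsePrecision(const char* text) {
    const int kMaxLength = 64;           // FIX_MAX_LENGTH as stated in the text
    Precision p = { 0, 0, false };
    int first = 0, second = 0;
    char sep = 0, extra = 0;
    // Exactly three fields must match: number, separator, number.
    if (std::sscanf(text, "%d%c%d%c", &first, &sep, &second, &extra) != 3)
        return p;
    if (sep == '.') {                    // "x.y"
        p.intBits = first;
        p.length  = first + second;
    } else if (sep == '/') {             // "m/n"
        p.length  = second;
        p.intBits = second - first;
    } else {
        return p;
    }
    // At least one bit (the sign) must lie to the left of the binary point.
    p.valid = first >= 0 && second >= 0 &&
              p.intBits >= 1 && p.length <= kMaxLength;
    return p;
}
```

Under these assumptions, "3.5" and "5/8" describe the same format: 8 bits in total, 3 of them to the left of the binary point.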

At the source code level, methods working on Fix objects either have the precision passed as an "x.y" or "m/n" string, or as two C++ integers that specify the total number of bits and the number of integer bits including the sign bit (that is, n and x). For example, suppose you have a star with a precision parameter named precision. Consider the following code:

Fix x = Fix(((const char *) precision));
if (x.invalid())
    Error::abortRun(*this, "Invalid precision");

The "precision" parameter is cast to a string and passed as a constructor argument to the Fix class. The error check verifies that the precision was valid.

There is a maximum value for the total length of a Fix object which is specified by the constant FIX_MAX_LENGTH in the file $PTOLEMY/src/kernel/Fix.h. The current value is 64 bits. Numbers in the Fix class are represented using two's complement notation, with the sign bit stored in the bits to the left of the binary point. There must always be at least one bit to the left of the binary point to store the sign.

In addition to its value, each Fix object contains information about its precision and error codes indicating overflow, divide-by-zero, or bad format parameters. The error codes are set when errors occur in constructors or arithmetic operators. There are also fields to specify:

a. whether rounding or truncation should take place when other Fix values are assigned to it (truncation is the default);
b. the response to an overflow or underflow on assignment (the default is saturation; see page 4-6).

Warning

The Fix type is still experimental.

Fixed-point states

State variables can be declared as Fix or FixArray. The precision is specified by an associated precision state using either of two syntaxes:

Specifying just a value in the dialog box creates a fixed-point number with the default length of 24 bits and with the position of the binary point set as required to store the integer part of the value. For example, the value 1.0 creates a fixed-point object with precision 2.22, and the value 0.5 creates one with precision 1.23.

Specifying a (value, precision) pair creates a fixed-point number with the specified precision. For example, the value (2.546, 3.5) creates a fixed-point object by casting the double 2.546 to a Fix with precision 3.5.

Fixed-point inputs and outputs

Fix types are available in Ptolemy as a type of Particle. The conversion from an int or a double to a Fix takes place using the Fix::Fix(double) constructor, which makes a Fix object with the default word length of 24 bits and the number of integer bits required by the value. For instance, the double 10.3 will be converted to a Fix with precision 5.19, since 5 is the minimum number of bits needed to represent the integer part, 10, including its sign bit.
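The choice of default precision can be sketched as follows. This is an illustration only: defaultIntBits is a hypothetical function, not a kernel routine, and it has been checked only against the non-negative examples given in this section (10.3, 1.0, and 0.5). How the kernel treats negative values is not specified here, so the use of the magnitude below is an assumption.

```cpp
#include <cmath>

// Hypothetical helper: number of bits to the left of the binary point
// (sign bit included) needed to hold the integer part of value.
int defaultIntBits(double value) {
    double whole = std::floor(std::fabs(value));  // assumption: use magnitude
    int bits = 1;                                 // the sign bit
    while (whole >= 1.0) {                        // one bit per binary digit
        whole = std::floor(whole / 2.0);
        ++bits;
    }
    return bits;
}
```

With the default word length of 24 bits, the remaining 24 - defaultIntBits(value) bits fall to the right of the binary point, which gives 5.19 for 10.3, 2.22 for 1.0, and 1.23 for 0.5.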

To use the Fix type in a star, the type of the portholes must be declared as "fix". Stars that receive or transmit fixed-point data have parameters that specify the precision of the input and output in bits, as well as the overflow behavior. Here is a simplified version of the SDFAddFix star, configured for two inputs:

defstar {
    name { AddFix }
    domain { SDF }
    derivedFrom { SDFFix }
    input {
        name { input1 }
        type { fix }
    }
    input {
        name { input2 }
        type { fix }
    }
    output {
        name { output }
        type { fix }
    }
    defstate {
        name { OutputPrecision }
        type { precision }
        default { 2.14 }
        desc {
Precision of the output in bits and precision of the accumulation.
When the value of the accumulation extends outside of the precision,
the OverflowHandler will be called.
        }
    }

(Note that the real AddFix star supports any number of inputs.) By default, the precision used by this star during the addition will have 2 bits to the left of the binary point and 14 bits to the right. Not shown here is the state OverflowHandler, which is inherited from the SDFFix star and which defaults to saturate; that is, if the addition overflows, then the result saturates, pegging it to either the largest positive or negative number representable. The result value, sum, is initialized by the following code:

    protected {
        Fix sum;
    }
    begin {
        SDFFix::begin();

        // give sum the precision requested by the OutputPrecision state
        sum = Fix( ((const char *) OutputPrecision) );
        if ( sum.invalid() )
            Error::abortRun(*this, "Invalid OutputPrecision");
        // install the overflow handler named by the OverflowHandler state
        sum.set_ovflow( ((const char*) OverflowHandler) );
        if ( sum.invalid() )
            Error::abortRun(*this, "Invalid OverflowHandler");
    }

The begin method checks the specified precision and overflow handler for correctness. Then, in the go method, we use sum to calculate the result value, thus guaranteeing that the desired precision and overflow handling are enforced. For example,

    go {
        // accumulate in sum so that its precision and overflow
        // handler apply to every partial result
        sum.setToZero();
        sum += Fix(input1%0);
        checkOverflow(sum);
        sum += Fix(input2%0);
        checkOverflow(sum);
        output%0 << sum;
    }

(The checkOverflow method is inherited from SDFFix.) The protected member sum is an uninitialized Fix object until the begin method runs. In the begin method, it is given the precision specified by OutputPrecision. The go method initializes it to zero. If the go method had instead assigned it a value specified by another Fix object, then it would acquire the precision of that other object; at that point, it would be initialized.

Assignment and overflow handling

Once a Fix object has been initialized, its precision does not change as long as the object exists. The assignment operator is overloaded so that it checks whether the value of the object to the right of the assignment fits into the precision of the left object. If it does not, the appropriate overflow response is taken and the overflow error bit is set.

If a Fix object is created using the constructor that takes no arguments, as in the protected declaration above, then that object is an uninitialized Fix; it can accept any assignment, acquiring not only its value, but also its precision and overflow handler.

The behavior of a Fix object on overflow is determined by the object itself. Each object has a private data field that is initialized by the constructor; when there is an overflow, the overflow_handler looks at this field and uses the specified method to handle the overflow. This data field is set to saturate by default, and can be set explicitly to any other desired overflow handling method using a function called set_ovflow(<keyword>). The keywords for overflow handling methods are: saturate (default), zero_saturate, wrapped, and warning. With saturate, the original value is replaced by the maximum (for overflow) or minimum (for underflow) value representable given the precision of the Fix object. With zero_saturate, the value is set to zero.
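The two overflow responses described above can be sketched with ordinary doubles. This is an illustration under stated assumptions, not the kernel's code: Overflow and fitToPrecision are hypothetical names, only the saturate and zero_saturate responses are modeled (the text does not describe wrapped and warning in detail), and the fractional part is left unquantized so that the range check stands out.

```cpp
#include <cmath>

// Hypothetical stand-in for two of the overflow keywords.
enum class Overflow { Saturate, ZeroSaturate };

// Hypothetical helper: force v into the range of a two's complement format
// with intBits integer bits (sign included) and fracBits fractional bits.
// overflowed reports whether v was out of range, like an overflow error bit.
double fitToPrecision(double v, int intBits, int fracBits,
                      Overflow mode, bool& overflowed) {
    double step  = std::ldexp(1.0, -fracBits);
    double limit = std::ldexp(1.0, intBits - 1);
    double lo = -limit;          // minimum representable value
    double hi = limit - step;    // maximum representable value
    overflowed = (v < lo || v > hi);
    if (!overflowed)
        return v;
    if (mode == Overflow::ZeroSaturate)
        return 0.0;              // zero_saturate: replace the value by zero
    return v > hi ? hi : lo;     // saturate: peg to the nearest extreme
}
```

With precision 2.14, for example, the value 3.0 saturates to 2 - 2^-14 and the value -3.0 saturates to -2.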

Explicitly casting inputs

In the above example, the go method accumulated each input into the protected member sum, which has the side effect of quantizing each partial result to the precision of sum. We could have alternatively written the go method as follows:

    go {
        sum = Fix(input1%0) + Fix(input2%0);
        output%0 << sum;
    }

The behavior here is significantly different: the inputs are added using their own native precision, and only the result is quantized to the precision of sum.
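The difference between the two go methods can be shown numerically. The sketch below uses doubles in place of Fix objects; truncateTo, accumulateQuantized, and addThenQuantize are hypothetical names, and truncation is modeled as discarding low-order bits (rounding toward negative infinity), which is an assumption about the default truncation mode.

```cpp
#include <cmath>

// Hypothetical helper: keep only fracBits fractional bits, discarding the rest.
double truncateTo(double v, int fracBits) {
    double scale = std::ldexp(1.0, fracBits);
    return std::floor(v * scale) / scale;
}

// Mirrors the first go method: every partial sum is stored in a variable
// of limited precision, so it is quantized after each addition.
double accumulateQuantized(double a, double b, int fracBits) {
    double sum = 0.0;
    sum = truncateTo(sum + a, fracBits);
    sum = truncateTo(sum + b, fracBits);
    return sum;
}

// Mirrors the second go method: the inputs are added at full precision
// and only the final result is quantized.
double addThenQuantize(double a, double b, int fracBits) {
    return truncateTo(a + b, fracBits);
}
```

With two inputs of 0.125 and a result precision of 2 fractional bits, the first form yields 0.0, because each partial sum is truncated to zero, while the second yields 0.25.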

Some stars allow the user to select between these two behaviors with a parameter called ArrivingPrecision. If it is set to YES, the input particles are not explicitly cast and are used as they are. If it is set to NO, the input particles are cast to an internal precision, which is usually specified by another parameter.

Here is the (abbreviated) source of the SDFGainFix star, which demonstrates this point:

defstar {
    name { GainFix }
    domain { SDF }
    derivedFrom { SDFFix }
    desc {
This is an amplifier; the fixed-point output is the fixed-point input
multiplied by the "gain" (default 1.0). The precision of "gain", the
input, and the output can be specified in bits.
    }
    input {
        name { input }
        type { fix }
    }
    output {
        name { output }
        type { fix }
    }
    defstate {
        name { gain }
        type { fix }
        default { 1.0 }
        desc { Gain of the star. }
    }
    defstate {
        name { ArrivingPrecision }
        type { int }
        default { "YES" }
        desc {
Flag indicating whether or not to use the arriving particles as they
are: YES keeps the same precision, and NO casts them to the precision
specified by the parameter "InputPrecision".
        }
    }
    defstate {
        name { InputPrecision }
        type { precision }
        default { 2.14 }
        desc {
Precision of the input in bits. The input particles are only cast
to this precision if the parameter "ArrivingPrecision" is set to NO.
        }
    }
    defstate {
        name { OutputPrecision }
        type { precision }
        default { 2.14 }
        desc {
Precision of the output in bits.
This is the precision that will hold the result of the arithmetic
operation on the inputs.
When the value of the product extends outside of the precision,
the OverflowHandler will be called.
        }
    }
    protected {
        Fix fixIn, out;
    }
    begin {
        SDFFix::begin();

        // fixIn is only needed when the inputs are to be cast
        if ( ! int(ArrivingPrecision) ) {
            fixIn = Fix( ((const char *) InputPrecision) );
            if ( fixIn.invalid() )
                Error::abortRun( *this, "Invalid InputPrecision" );
        }

        out = Fix( ((const char *) OutputPrecision) );
        if ( out.invalid() )
            Error::abortRun( *this, "Invalid OutputPrecision" );
        out.set_ovflow( ((const char *) OverflowHandler) );
        if ( out.invalid() )
            Error::abortRun( *this, "Invalid OverflowHandler" );
    }
    go {
        // all computations should be performed with out since
        // that is the Fix variable with the desired overflow
        // handler
        out = Fix(gain);
        if ( int(ArrivingPrecision) ) {
            out *= Fix(input%0);
        }
        else {
            fixIn = Fix(input%0);
            out *= fixIn;
        }
        checkOverflow(out);
        output%0 << out;
    }
    // a wrap-up method is inherited from SDFFix
    // if you define your own, you should call SDFFix::wrapup()
}
Note that the SDFGainFix star and many of the other Fix stars are derived from the star SDFFix. SDFFix implements commonly used methods and defines two states: OverflowHandler, which selects one of four overflow handlers to be called each time an overflow occurs; and ReportOverflow, which, if true, causes the number and percentage of overflows that occurred for that star during a simulation run to be reported in the wrapup method.

Constructors:

Fix()

Create a Fix number with unspecified precision and value zero.

Fix(int length, int intbits)

Create a Fix number with total word length of length bits and intbits bits to the left of the binary point. The value is set to zero. If the precision parameters are not valid, then an error bit is internally set so that the invalid method will return TRUE.

Fix(const char* precisionString)

Create a Fix number whose precision is determined by precisionString, which has the syntax "leftbits.rightbits", where leftbits is the number of bits to the left of the binary point and rightbits is the number of bits to the right of the binary point, or "rightbits/totalbits", where totalbits is the total number of bits. The value is set to zero. If the precisionString is not in the proper format, an error bit is internally set so that the invalid method will return TRUE.

Fix(double value)

Create a Fix with the default precision of 24 total bits for the word length and set the number of integer bits to the minimum needed to represent the integer part of the number value. If the value given needs more than 24 bits to represent, the value will be clipped and the number stored will be the largest possible under the default precision (i.e. saturation occurs). In this case an internal error bit is set so that the ovf_occurred method will return TRUE.

Fix(int length, int intbits, double value)

Create a Fix with the specified precision and set its value to the given value. The number is rounded to the closest representable number given the precision. If the precision parameters are not valid, then an error bit is internally set so that the invalid method will return TRUE.

Fix(const char* precisionString, double value)

Same as the previous constructor except that the precision is specified by the given precisionString instead of as two integer arguments. If the precision parameters are not valid, then an error bit is internally set so that the invalid method will return TRUE.

Fix(const char* precisionString, uint16* bits)

Create a Fix with the specified precision and set the bits precisely to the ones in the given bits. The first word pointed to by bits contains the most significant 16 bits of the representation. Only as many words as are necessary to fetch the bits will be referenced from the bits argument. For example: Fix("2.14",bits) will only reference bits[0].

This constructor gets very close to the representation and is meant mainly for debugging. It may be removed in the future.
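As an illustration of how a single 16-bit word maps to a value in a format such as "2.14", the following sketch interprets the word as a two's complement integer and scales it by the position of the binary point. decodeWord is a hypothetical function written for this example; it is not a kernel routine and covers only the case where the whole number fits in one 16-bit word.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: value of a 16-bit two's complement word that has
// fracBits bits to the right of the binary point.
double decodeWord(std::uint16_t word, int fracBits) {
    int value = word;                 // 0 .. 65535
    if (value >= 0x8000)              // sign bit set: negative number
        value -= 0x10000;
    return std::ldexp(static_cast<double>(value), -fracBits);
}
```

Under this interpretation, the word 0x4000 with 14 fractional bits represents 1.0, and 0xC000 represents -1.0.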

Fix(const Fix& arg)

Copy constructor. Produces an exact duplicate of arg.

Fix(int length, int intbits, const Fix& arg)

Read the value from the Fix argument and convert it to the new precision. If the precision parameters are not valid, then an error bit is internally set so that the invalid method will return TRUE. If the value from the source will not fit, an error bit is set so that the ovf_occurred method will return TRUE.

Functions to set or display information about the Fix number:

int len() const

Return the total word length of the Fix number.

int intb() const

Return the number of bits to the left of the binary point.

int precision() const

Return the number of bits to the right of the binary point.

int overflow() const

Return the code of the type of overflow response for the Fix number. The possible codes are:
0 - ovf_saturate,
1 - ovf_zero_saturate,
2 - ovf_wrapped,
3 - ovf_warning,
4 - ovf_n_types (the number of overflow types, not itself an overflow response).

int roundMode() const

Return the rounding mode: 1 for rounding, 0 for truncation.

int signBit() const

Return TRUE if the value of the Fix number is negative, FALSE if it is positive or zero.

int is_zero()

Return TRUE if the value of the Fix number is zero.

double max()

Return the maximum value representable using the current precision.

double min()

Return the minimum value representable using the current precision.

double value()

The value of the Fix number as a double.

void setToZero()

Set the value of the Fix number to zero.

void set_overflow(int value)

Set the overflow type.

void set_rounding(int value)

Set the rounding type: TRUE for rounding, FALSE for truncation.

void initialize()

Discard the current precision format and set the Fix number to zero.

There are a few functions for backward compatibility:

void set_ovflow(const char*)

Set the overflow using a name.

void Set_MASK(int value)

Set the rounding type. Same functionality as set_rounding().

Comparison function:

int compare (const Fix& a, const Fix& b)

Compare two Fix numbers. Return -1 if a < b, 0 if a = b, 1 if a > b.

The following functions are for use with the error condition fields:

int ovf_occurred()

Return TRUE if an overflow has occurred as the result of some operation like addition or assignment.

int invalid()

Return TRUE if the current value of the Fix number is invalid due to it having an improper precision format, or if some operation caused a divide by zero.

int dbz()

Return TRUE if a divide by zero error occurred.

void clear_errors()

Reset all error bit fields to zero.

Operators:

Fix& operator = (const Fix& arg)

Assignment operator. If *this does not have its precision format set (i.e. it is uninitialized), the source Fix is copied. Otherwise, the source Fix value is converted to the existing precision. Either truncation or rounding takes place, based on the value of the rounding bit of the current object. Overflow results in saturation, "zero saturation" (replacing the result with zero), or a warning error message, depending on the overflow field of the object. In these cases, ovf_occurred will return TRUE on the result.

Fix& operator = (double arg)

Assignment operator. The double value is first converted to a default precision Fix number and then assigned to *this.

The function of these arithmetic operators should be self-explanatory:

Fix& operator += (const Fix&)

Fix& operator -= (const Fix&)

Fix& operator *= (const Fix&)

Fix& operator *= (int)

Fix& operator /= (const Fix&)

Fix operator + (const Fix&, const Fix&)

Fix operator - (const Fix&, const Fix&)

Fix operator * (const Fix&, const Fix&)

Fix operator * (const Fix&, int)

Fix operator * (int, const Fix&)

Fix operator / (const Fix&, const Fix&)

Fix operator - (const Fix&) // unary minus

int operator == (const Fix& a, const Fix& b)

int operator != (const Fix& a, const Fix& b)

int operator >= (const Fix& a, const Fix& b)

int operator <= (const Fix& a, const Fix& b)

int operator > (const Fix& a, const Fix& b)

int operator < (const Fix& a, const Fix& b)

Note:

These operators are designed so that overflow does not, as a rule, occur (the return value has a wider format than that of its arguments). The exception is when the result cannot be represented in a Fix with all 64 bits before the binary point.

The output of any operation will have error codes that are the logical OR of those of the arguments to the operation, plus any additional errors that occurred during the operation (like divide by zero).

The division operation is currently a cheat: it converts to double and computes the result, converting back to Fix.

The relational operators ==, !=, >=, <=, >, < are all written in terms of the function

int compare(const Fix& a, const Fix& b)

This function returns -1 if a < b, 0 if a = b, and 1 if a > b. The comparison is exact (every bit is checked) if the two values have the same precision format. If the precisions are different, the arguments are converted to doubles and compared. Since double values only have an accuracy of about 53 bits on most machines, this may cause false equality reports for Fix values with many bits.
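The loss of accuracy in a comparison through doubles can be reproduced without the Fix class. The sketch below assumes IEEE 754 double precision (a 53-bit significand) and uses 64-bit integers to stand in for wide fixed-point values; equalAsDoubles is a hypothetical name introduced for this example.

```cpp
#include <cstdint>

// Hypothetical helper: compare two 64-bit integers after converting each
// to double, as a comparison between mixed-precision values would.
bool equalAsDoubles(std::int64_t a, std::int64_t b) {
    return static_cast<double>(a) == static_cast<double>(b);
}
```

The integers 2^60 and 2^60 + 1 differ, yet both convert to the same double, so equalAsDoubles reports them as equal. Values that fit within 53 bits, such as 2^52 and 2^52 + 1, are still distinguished.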

Conversions:

operator int() const

Return the value of the Fix number as an integer, truncating towards zero.

operator float() const

operator double() const

Convert to a float or a double, creating an exact result when possible.

void complement()

Replace the current value by its complement.

Fix overflow, rounding, and errors

The Fix class defines the following enumerated values for overflow handling:

Fix::ovf_saturate

Fix::ovf_zero_saturate

Fix::ovf_wrapped

Fix::ovf_warning

They may be used as arguments to the set_overflow method, as in the following example:

out.set_overflow(Fix::ovf_saturate);

The member function

int overflow() const;

returns the overflow type. This returned result can be compared against the above enumerated values. Overflow types may also be specified as strings, using the method

void set_ovflow(const char* overflow_type);

The overflow_type argument may be one of saturate, zero_saturate, wrapped, or warning.

The rounding behavior of a Fix value may be set by calling

void set_rounding(int value);

If the argument is false, or has the value Fix::mask_truncate, truncation will occur. If the argument is nonzero (for example, if it has the value Fix::mask_truncate_round), rounding will occur. The older name Set_MASK is a synonym for set_rounding.
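The effect of the two rounding modes can be sketched with doubles. The function quantize below is hypothetical and is not the kernel's implementation; it assumes that truncation discards the low-order bits of the two's complement pattern (rounding toward negative infinity) and that rounding adds half of the least significant bit before discarding them.

```cpp
#include <cmath>

// Hypothetical helper: reduce v to fracBits fractional bits.
// round == false models truncation; round == true models rounding.
double quantize(double v, int fracBits, bool round) {
    double scaled = std::ldexp(v, fracBits);   // shift binary point right
    double kept = round ? std::floor(scaled + 0.5) : std::floor(scaled);
    return std::ldexp(kept, -fracBits);        // shift binary point back
}
```

With 2 fractional bits, the value 0.6875 becomes 0.5 under truncation and 0.75 under rounding.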

The following functions access the error bits of a Fix result:

int ovf_occurred() const;
int invalid() const;
int dbz() const;

The first function returns TRUE if there have been any overflows in computing the value. The second returns TRUE if the value is invalid, because of invalid precision parameters or a divide by zero. The third returns TRUE only for divide by zero.
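One way to picture the error bits and their propagation through an operation is as a small set of flags combined with a bitwise OR. This is a sketch only: the names and bit values below are hypothetical, and the actual layout of the error fields inside a Fix object is not described in this section.

```cpp
// Hypothetical flag values; the kernel's internal encoding may differ.
enum ErrorBits : unsigned {
    kOverflow  = 1u,
    kDivByZero = 2u,
    kBadFormat = 4u
};

// Errors of a result: those of both arguments plus any raised by the operation.
unsigned combineErrors(unsigned lhs, unsigned rhs, unsigned fromOperation) {
    return lhs | rhs | fromOperation;
}

// Counterparts of ovf_occurred, invalid, and dbz for this flag model.
bool hasOverflow(unsigned e)  { return (e & kOverflow) != 0; }
bool isInvalid(unsigned e)    { return (e & (kBadFormat | kDivByZero)) != 0; }
bool hasDivByZero(unsigned e) { return (e & kDivByZero) != 0; }
```

In this model, a result computed from an argument that had already overflowed still reports the overflow, and a divide by zero makes the result both invalid and flagged as a divide-by-zero error.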