Mescal Architecture Description Language — 1.0

2003.9 — 2004.7





Contents

   Introduction
   Define Section
   Manager Section
   Machine Section
   Function Section
   Operation Section
   Action Ordering
   Data Types
   Basic Operators
   Modifiers
   OSM Actions
   Annotation Syntax



Introduction

The Mescal Architecture Description Language (MADL) specifies the operation state machine model (OSM) for microprocessor modeling purposes. It supplies processor information to software tools including the instruction set simulator, the microarchitecture simulator, the assembler, the disassembler and various compiler optimizers.

MADL is composed of two parts: the core language and the annotation language. The core language describes the operation state machines of the OSM model, which have concrete executable semantics. The annotation language describes tool-dependent information. For any tool that utilizes MADL, an annotation description scheme can be created based on the generic annotation syntax. The annotation description supplements the core description with implementation-dependent or tool-dependent information, e.g. hints for the tool to analyze the core description. This document mainly describes the syntax of the core language. The generic syntax of the annotation language is described in Annotation Syntax.

Note that the hardware layer of the OSM model is currently not part of MADL. The execution model of the hardware units, including the token managers, is implemented in the general-purpose programming language C++, which is the target language into which MADL descriptions are translated for execution. MADL only declares the names and the types of the token managers. Description of the hardware layer is expected to be included in future versions of MADL.

MADL utilizes a hierarchical description structure called the and-or graph to minimize redundancy in descriptions. To integrate the OSM model with the and-or graph, MADL uses a dynamic version of the OSM model. The distinguishing feature of the dynamic model is that the actions and computations are dynamically bound to the edges of the state diagram. A well-defined dynamic model can be transformed back to a static model. Besides token managers, the entities in the dynamic OSM model include the skeleton and the syntax operation. A skeleton refers to the state diagram and the internal state variables associated with it. A syntax operation refers to a set of actions and computations, as well as assembly syntax and binary encoding. The syntax operations form an and-or graph. A skeleton and all syntax operations in an expansion of the and-or graph constitute the model of one operation in the instruction set. These syntax operations are dynamically bound to the skeleton during execution.
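
To make the notion of an expansion concrete, the following Python sketch enumerates the expansions of a small and-or graph. The dictionary layout, the function name "expansions" and the sample graph are assumptions made for illustration only; they are not part of MADL.

```python
from itertools import product

# Hypothetical and-or graph: each operation (and-node) maps to its or-node
# variables, and each or-node lists the operations it may resolve to.
GRAPH = {
    "dpi": {"oper": ["mov", "mvn"]},
    "mov": {},
    "mvn": {},
}


def expansions(op, graph):
    """Return every expansion rooted at `op` as a tuple of operation names."""
    or_nodes = graph[op]
    if not or_nodes:
        return [(op,)]
    # For each or-node, collect the expansions of every alternative.
    per_node = []
    for alternatives in or_nodes.values():
        choices = []
        for alt in alternatives:
            choices.extend(expansions(alt, graph))
        per_node.append(choices)
    # An and-node takes exactly one choice from each of its or-nodes.
    result = []
    for combo in product(*per_node):
        flat = (op,)
        for part in combo:
            flat += part
        result.append(flat)
    return result
```

Under these assumptions, an operation "dpi" with a single or-node offering "mov" and "mvn" has two expansions. Each expansion, paired with a skeleton, would model one operation of the instruction set.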

An MADL file may contain any number of the following sections:

  1. Define Section — declaration of global variables/functions.
  2. Manager Section — declaration of token managers.
  3. Machine Section — definition of a skeleton.
  4. Function Section — definition of a function.
  5. Operation Section — definition of a syntax operation.


An MADL file may also contain the following commands.

  1. Import command — includes other MADL files in the same description. Its syntax is shown below.
    import_command ::= "IMPORT" '"'identifier'"' ';'
    
  2. Using command — associates a skeleton with syntax operations. A using command states that all operation sections from the command until the next using command or the end of the file, whichever comes first, are based on the skeleton with the given name. Its syntax is shown below.
    using_command  ::= "USING" identifier ';'
    


Except for the using command, all sections in MADL are order-independent. For instance, a function section may appear anywhere in an MADL description, either before or after its caller(s). Comments can be placed anywhere in an MADL description. Two types of comments are allowed: single-line comments and block comments. A single-line comment starts with a '#' and lasts until the end of the line. A block comment starts with a "##" and lasts until another "##".
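
The two comment forms can be illustrated with a short Python sketch that removes them from a description. The function name "strip_comments" is hypothetical, and the sketch assumes that '#' never occurs inside a string literal, which a real MADL front end would have to handle.

```python
def strip_comments(text):
    """Remove '#' single-line comments and '##' ... '##' block comments.

    Assumption: '#' does not appear inside string literals.
    """
    out = []
    i = 0
    n = len(text)
    while i < n:
        if text.startswith("##", i):
            # Block comment: skip up to and including the closing "##".
            end = text.find("##", i + 2)
            i = n if end == -1 else end + 2
        elif text[i] == "#":
            # Single-line comment: skip to the end of the line, keep the newline.
            end = text.find("\n", i)
            i = n if end == -1 else end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

The "##" test comes first so that a block comment is not mistaken for a single-line comment.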



Define Section

The define section declares a list of global constant variables and function prototypes. These variables/functions are in the global scope and can be accessed throughout an MADL description. The general syntax of a variable/function declaration is:

define_section  ::= "DEFINE" def_clause+
def_clause      ::= identifier ':' data_type '=' data_value ';'
def_clause      ::= identifier ':' data_type ';'
def_clause      ::= identifier ':' func_type ';'


An example define section is as follows.

DEFINE

 reg_names : string[16] = {"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
                           "r8", "r9", "sl", "fp", "ip", "sp", "lr", "pc"};

 pred_table : uint<16>[4] = {0xf0f0, 0x0f0f, 0xcccc, 0x3333};

 epsilon : double;

 func1  : (string, uint<32>);

 func2  : (uint<32>*, uint<32>);


The above define section defines an array of string literals named "reg_names", an array of 16-bit unsigned integer constants "pred_table", and a double-precision constant "epsilon" whose value is not given. Additionally, it declares two functions "func1" and "func2". Function arguments in MADL are passed by reference. Writable arguments are denoted by a "*" after the argument type, e.g. the first argument of "func2". The value of a writable argument may be changed by the function. The variables without values and the functions should be defined in external C++ files for simulation purposes. These C++ files should be linked with the MADL-generated C++ files in simulators.

For the syntax of data and function types, refer to the Data Types section of the document. The restriction is that no void or tuple data types can be used in define sections.



Manager Section

In the OSM model, data or structural resources are modeled as tokens and are managed by token managers. A state machine transacts tokens with the token managers during its execution. To get a token, it typically presents an index to the token manager as a token identifier. The manager then returns a token if one is available. The state machine may also read values from and write values to the tokens that it can access. For a list of possible token transactions (also called actions), refer to the OSM Actions section of the document.

A manager section may contain a CLASS subsection and an INSTANCE subsection. The former declares token manager class names and their types, while the latter declares token manager instances. A type here is a tuple of the token index type and the token value type. All data types except array can be used as index or value type. The syntax of the section is shown below followed by one example.

manager_section      ::= "MANAGER" class_subsection instance_subsection
class_subsection     ::= "CLASS" class_clause+
class_clause         ::= identifier ':' data_type "->" data_type ';'
instance_subsection  ::= "INSTANCE" instance_clause+
instance_clause      ::= identifier ':' identifier ';'

MANAGER

    CLASS
        fetch_manager : void -> (uint<32>,uint<32>);
        simple_manager: void -> void;

    INSTANCE
        mIF : fetch_manager;
        mEX : simple_manager;



This example declares a token manager class named "fetch_manager" with a void index type (in this case there is no need for token identifier since this manager has only one token), and a tuple value type. The example also declares a "simple_manager" class with a void index type and a void value type (it is simply a structural resource and has no value). Two token manager instances are later declared based on these two classes in the INSTANCE subsection.

An MADL description may contain one or more manager sections. All manager classes and instances declared in these sections are visible to the global scope.
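
As an illustration of what a manager such as "simple_manager" might do, here is a minimal Python sketch of a manager with a void index and a void value, i.e. a single structural token. The real token managers are implemented in C++ and their interface is not given in this document, so the class name and every method below are assumptions.

```python
class SimpleManager:
    """Hypothetical manager with a void index: it owns exactly one token."""

    def __init__(self, name):
        self.name = name
        self.owner = None  # state machine currently holding the token

    def can_allocate(self, machine):
        # The single token is available only while nobody holds it.
        return self.owner is None

    def allocate(self, machine):
        if not self.can_allocate(machine):
            return False
        self.owner = machine
        return True

    def release(self, machine):
        # Only the current holder may give the token back.
        if self.owner != machine:
            return False
        self.owner = None
        return True
```

In this sketch a second state machine cannot obtain the token until the first one releases it, which is how a structural resource such as a pipeline stage could be modeled.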



Machine Section

A machine section describes a skeleton, which contains the state diagram and the variables visible to all syntax operations associated with it. A special type of variable is the token buffer. It is used to store allocated tokens for the convenience of reference.

A machine section may contain the following subsections:

  • INITIAL — the initial (dormant) state of the OSM.
  • STATE — the regular states of the OSM.
  • EDGE — the edges connecting the states.
  • BUFFER — the token buffers.
  • VAR — the variables.


There must be one and only one INITIAL state defined in each machine section. There can exist any number of regular states as long as there is no naming conflict. The syntax of the machine section is shown below.

machine_section      ::= "MACHINE" identifier initial_subsection
                         (state_subsection | edge_subsection)+
                         buffer_subsection var_subsection

initial_subsection  ::= "INITIAL" identifier ';'

state_subsection    ::= "STATE" identifier_list ';'
identifier_list     ::= (identifier ',')* identifier

edge_subsection     ::= "EDGE" edge_clause+
edge_clause         ::= identifier ':' identifier "->" identifier ';'

buffer_subsection   ::= "BUFFER" buffer_clause+
buffer_clause       ::= identifier ':' identifier ';'

var_subsection      ::= "VAR" var_clause+
var_clause          ::= identifier ':' basic_type ';'

The STATE subsection contains a list of state names separated by commas. The EDGE subsection contains a list of edge clauses. Each clause contains the edge name, followed by a ':', the source state name, '->' and the destination state name. The BUFFER subsection contains a list of token buffer clauses, each of which contains a buffer name, followed by ':' and the name of a token manager class. A buffer can only be used to temporarily store tokens obtained from managers of that class. The VAR subsection contains a list of variable declarations, each of which contains a variable name followed by ':' and a type. See the Data Types section for details about variable types. An example machine section named "normal" is shown below.

MACHINE normal

    INITIAL S_INIT;

    STATE S_IF, S_EX;

    EDGE  e_in_if : S_INIT -> S_IF;
          e_if_ex : S_IF -> S_EX;
          e_ex_in : S_EX -> S_INIT;

    BUFFER if_buffer : fetch_manager;
           ex_buffer : simple_manager;

    VAR   iw : uint<32>;
          pc : uint<32>;



The states and edges form the state diagram of the skeleton. The state diagram must be a strongly connected directed graph.
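
The strong-connectivity requirement can be checked mechanically. The Python sketch below tests it for the "normal" skeleton by checking reachability from one state in both the graph and its reverse. The function names and the list-of-pairs edge representation are assumptions for illustration.

```python
def reachable(start, adjacency):
    """Return the set of nodes reachable from `start` (depth-first search)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strongly_connected(states, edges):
    """states: list of names; edges: list of (source, destination) pairs."""
    if not states:
        return True
    forward, backward = {}, {}
    for src, dst in edges:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)
    root = states[0]
    everything = set(states)
    # Strongly connected: root reaches every state and every state reaches root.
    return (reachable(root, forward) >= everything
            and reachable(root, backward) >= everything)


STATES = ["S_INIT", "S_IF", "S_EX"]
EDGES = [("S_INIT", "S_IF"), ("S_IF", "S_EX"), ("S_EX", "S_INIT")]
```

With the three edges of the example the check succeeds; dropping "e_ex_in" would leave no path back to the initial state and the check would fail.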



Function Section

A function section defines an internal MADL function. This is different from the external functions declared in the define section: the body of an internal function is part of the MADL description, while the bodies of external functions are in external C++ source files. A function section contains a function name, a list of arguments, an optional variable (VAR) subsection and an evaluation (EVAL) subsection. The variable subsection defines the local variables. Its syntax is the same as that of the VAR subsection of the machine section. The evaluation subsection contains a sequential list of statements. See the Basic Operators section for information about the statements. The statements may access the arguments, the local variables and the global constant variables from define sections.

Unlike C functions, MADL functions do not have a return value. The result of a computation can be returned through writable arguments. See the Data Types section for more information about writable arguments. The syntax of the function section is shown below.

function_section      ::= "FUNCTION" identifier '(' arg_list ')'
                          var_subsection? eval_subsection

arg_list              ::= (arg ',')* arg
arg                   ::= identifier ':' basic_type '*'?

eval_subsection       ::= "EVAL" eval_clause+
eval_clause           ::= statement ';'

An example function section is given below. The "result" argument is writable and is used to return the value of computation.

FUNCTION eval_pred(result:uint<1>*, cond:uint<2>, flags:uint<4>)

   VAR temp : uint<4>;

   EVAL
       temp = pred_table[cond] >> flags;
       result = (uint<1>)temp;


Similar to external functions, internal functions are visible to the global name scope. A function can be called throughout an MADL description, regardless of the location of the caller. Recursion is allowed.



Operation Section

An operation section defines a syntax operation. It must be defined based on a skeleton. The skeleton for a syntax operation is specified by the "USING" command. The subsections in an operation section may access the local variables declared in the skeleton and the global constant variables in the define sections.

An operation section contains a name and the following subsections.

  1. VAR — Local variable declaration.

    This subsection is optional. It defines local variables of the data types specified in the Data Types section of the document. The syntax is the same as that of the VAR subsection of the machine section, with one addition: the subsection may also contain a special type of variable called an or-node variable. An or-node variable corresponds to an or-node in the and-or graph. The syntax of the subsection is shown below.

    var_subsection  ::= "VAR" (var_clause | var_clause_or)+ 
    
    var_clause_or   ::=  identifier ':' '{' identifier_list '}' ';'
                       | identifier ':' '{' identifier_list '}' '(' identifier ')' ';'
    


    The "identifier_list" contains a list of syntax operation names. In the second form above, the identifier in parentheses names a default syntax operation. Conceptually, the or-node variable is similar to a union-type variable in C. The variable may be resolved to point to any operation in the list or to the default operation. The actual operation is resolved at run time by decoding: the encodings (specified by the CODING subsection) of the operations in the identifier list are pattern-matched against a given binary value, and the matching one is chosen. If none matches and a default operation is given, the default operation is chosen. If none matches and no default operation is provided, a run-time error is reported. If more than one operation matches, the closest match is chosen. A valid or-node variable requires that all operations in the name list have the same encoding width; this width is taken as the encoding width of the or-node variable. Decoding is triggered by the decode statement or the activate statement, both of which are described under the EVAL subsection.

    A predefined variable "coding" can be used throughout the operation section if it contains a CODING subsection. The variable has an unsigned integer type of the same width as that of the encoding of the operation (the sum of the data widths of all elements in the CODING).

  2. SYNTAX — Assembly syntax of the operation.

    The subsection contains a list of syntax elements separated by blank spaces or carets. When two elements are separated by a blank space, there will be a space in between in the assembly output. Otherwise, the two will be joined together.

    A syntax element can be any of the following:

    • String literal, e.g. "ldw".
    • A variable, e.g. v1. The variable can be a local variable or one declared in the machine section or in the define section. Modifiers can be used here to specify the output format when converting arithmetic data values to string.
    • Table lookup, e.g. array[v1]. The table should be defined in the define section. Modifiers can also be used here.

    The syntax of this subsection is shown below

    syntax_subsection  ::= "SYNTAX" (syntax_clause '^'?)* syntax_clause ';'
    syntax_clause      ::=  '"' string '"'
                          | identifier modifier?
                          | identifier '[' identifier ']' modifier?
    
  3. CODING — Binary encoding of the operation.

    The subsection contains a list of coding elements separated by blank spaces. A coding element can be any of the following.

    • Boolean literal — string of 0,1,- such as 00--11-.
    • A variable, e.g. v1.
    • OR-ed boolean literals, e.g. (0011 | 1100).

    The syntax of this subsection is shown below

    coding_subsection  ::= "CODING" coding_clause+  ';'
    coding_clause      ::=  boolean_literal
                          | identifier
                          | '(' (boolean_literal '|')+ boolean_literal ')'
    boolean_literal    ::= ('0'|'1'|'-')+
    


  4. EVAL — Initialization actions.

    This subsection contains the actions to be performed at the moment when the syntax operation is bound to the skeleton at run time. Similar to the EVAL subsection of the function section, this subsection contains a sequential list of statements. In addition to the statements defined in the Basic Operators section of the document, two other types of statements are supported here.

    1. Decode Statement.
      The syntax of a decode statement is
      statement_decode    ::=  '+' identifier '=' identifier
                             | '+' identifier
      
    2. Activate Statement.
      The syntax of an activate statement is
      statement_activate  ::=  '@' identifier '=' identifier
                             | '@' identifier
      


    For both statements, the first identifier must be an or-node variable. The optional second identifier must have identical encoding width to that of the or-node variable. The second identifier specifies the actual binary value that is used to decode the or-node variable. It should be omitted if the or-node variable appears in the coding section of the operation. In this case MADL will extract the corresponding binary field from coding and use it to decode.

    Both statements will trigger a decoding procedure to resolve the actual syntax operation. After decoding, the decode statement will evaluate the EVAL subsection of the resolved operation (the closest match in the list of the or-node variable) and annotate its actions and computations (defined in the TRANS subsection) on the current skeleton. In contrast, the activate statement will spawn another state machine and then let the resolved operation evaluate its EVAL subsection and annotate its actions and computations onto the spawned skeleton.



  5. TRANS — Actions and computations.

    The TRANS subsection describes the actions and computation statements associated with the OSM edges. The syntax of the subsection is shown below.

    trans_subsection     ::=  "TRANS" trans_clause+
    trans_clause         ::=  identifier ':' '{' action_list '}' statement_list ';'
                            | identifier ':' statement_list ';'
    action_list          ::=  nil
                            | (action ',')* action
    statement_list       ::=  (statement ';')*
    

    Both the "action_list" and the "statement_list" are optional. If both are omitted (and no other syntax operation annotates the edge), state transfer can occur along this edge unconditionally and without any side effects. For information on actions, see the OSM Actions section. The statement list syntax is the same as that of the EVAL subsection.

    The "trans_clause" associates the actions and the statements with the edge. Multiple syntax operations may annotate their actions and statements onto the same edge. The actual firing order of these actions and statements is described in the Action Ordering section. If an edge is not annotated by any syntax operation (not even with a "trans_clause" with an empty action list and statement list), the edge is disabled and state transition cannot occur along the edge.
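
The or-node decoding procedure described under the VAR subsection can be sketched in Python as follows. The document does not define "closest match"; this sketch assumes it means the matching pattern with the most fixed (non '-') bits. The function names, the 8-bit patterns and the operation name "mv_any" are hypothetical.

```python
def matches(pattern, bits):
    """True if the bit string fits a pattern over '0', '1' and '-' (don't care)."""
    return (len(pattern) == len(bits)
            and all(p in ("-", b) for p, b in zip(pattern, bits)))


def decode(or_node, bits, default=None):
    """Resolve an or-node variable.

    or_node -- dict mapping operation name to its encoding pattern
    bits    -- binary value to decode, as a string of '0' and '1'
    default -- name of the default operation, or None if there is none
    """
    candidates = [(name, pat) for name, pat in or_node.items()
                  if matches(pat, bits)]
    if not candidates:
        if default is None:
            raise ValueError("no operation matches " + bits)
        return default
    # Assumption: the closest match is the pattern with the most fixed bits.
    return max(candidates, key=lambda c: sum(ch != "-" for ch in c[1]))[0]


# Hypothetical 8-bit encodings, all of the same width as the text requires.
OPER = {"mov": "10110---", "mvn": "10111---", "mv_any": "1011----"}
```

Here the value "10111010" fits both "mvn" and "mv_any"; under the stated assumption "mvn" is chosen because it fixes more bits.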

An operation example named "mvn" is shown below.

OPERATION mvn

    VAR v_rs : uint<32>;
        v_rd : uint<32>;

    SYNTAX "mvn" reg_names[rd] "," reg_names[rs];

    CODING 10111 rd rs ----;

    TRANS
    e_id_ex:    {v_rs = *mRF[rs], ex_buf = mEX[], !id_buf, *mRF[rs] = v_rd};
                 v_rd = -v_rs;

    e_ex_bf:    {bf_buf = mBF[], !ex_buf};
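
The spacing rule of the SYNTAX subsection (a blank between elements yields a space, a caret joins them) can be sketched in Python. The function "render_syntax" and the pair representation are assumptions; the register values 1 and 2 are arbitrary sample operands.

```python
REG_NAMES = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
             "r8", "r9", "sl", "fp", "ip", "sp", "lr", "pc"]


def render_syntax(elements):
    """Join syntax elements into assembly text.

    elements -- list of (text, joined) pairs; `joined` is True when the
                element was preceded by a caret in the SYNTAX subsection.
    """
    out = ""
    for position, (text, joined) in enumerate(elements):
        if position > 0 and not joined:
            out += " "  # blank-separated elements get one space in between
        out += text
    return out


rd, rs = 1, 2
spaced = render_syntax([("mvn", False), (REG_NAMES[rd], False),
                        (",", False), (REG_NAMES[rs], False)])
tight = render_syntax([("mvn", False), (REG_NAMES[rd], False),
                       (",", True), (REG_NAMES[rs], False)])
```

Read literally, the SYNTAX line of "mvn" places a space before the comma ("spaced"); writing a caret before the "," element would attach it to the register name ("tight").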





Action Ordering

An OSM is formed by one skeleton and one or more syntax operations. The skeleton mainly specifies the state diagram while the syntax operations specify the actions and computations occurring on the edges. It is possible that more than one syntax operation annotates its actions and statements onto the same edge of the skeleton.

By OSM rules, when an edge is evaluated, the OSM will first test if all actions on the edge can be fired. If and only if all actions are firable, the OSM will fire the actions and evaluate the computation statements.
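
The test-then-fire rule can be pictured with the following Python sketch. The function "evaluate_edge" and the use of plain callables are assumptions; the list of effects is taken to be already arranged in firing order.

```python
def evaluate_edge(tests, effects):
    """Fire an edge only if every action on it is firable.

    tests   -- callables, one per action, returning True when it can fire
    effects -- callables performing the actions and statements, assumed to
               be in firing order already
    """
    if not all(test() for test in tests):
        return False  # at least one action is not firable: nothing is fired
    for effect in effects:
        effect()
    return True
```

An edge with no tests always passes, which corresponds to an edge annotated with an empty action list.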

When all the actions are firable, the actions and the statements are fired in a fixed order: category 1 OSM actions are evaluated first, followed by category 3 actions, then the statements, then category 4 actions, and finally category 2 actions. The general rule for action ordering is allocate/inquire first, read second, write third and release/discard last. This order enables data flow between token managers to occur within a single control step.

According to these rules, the actions associated with the edge "e_id_ex" in the above example follow the order:

  1. v_rs = *mRF[rs], ex_buf = mEX[];
  2. v_rd = -v_rs;
  3. !id_buf, *mRF[rs] = v_rd;


All these actions occur within one control step. One value is read from token manager mRF, then negated and written back to token manager mRF.
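
The category-based ordering can be sketched in Python as a stable sort. The rank table follows the order stated above; the function name and the (text, kind) representation are assumptions. The sketch orders items individually by category, which is finer-grained than the three grouped steps listed above but consistent with them.

```python
# Firing rank: category 1, category 3, statements, category 4, category 2.
FIRING_RANK = {1: 0, 3: 1, "stmt": 2, 4: 3, 2: 4}


def firing_order(items):
    """items: list of (text, kind), kind being an action category or "stmt".

    sorted() is stable, so items of the same kind keep their given order.
    """
    ranked = sorted(items, key=lambda item: FIRING_RANK[item[1]])
    return [text for text, _ in ranked]


# Actions and statement of edge "e_id_ex" in the "mvn" example, as written.
E_ID_EX = [
    ("v_rs = *mRF[rs]", 3),   # read + inquire
    ("ex_buf = mEX[]", 1),    # allocate
    ("!id_buf", 2),           # release
    ("*mRF[rs] = v_rd", 4),   # write + allocate'
    ("v_rd = -v_rs", "stmt"), # computation statement
]
```

Applied to "e_id_ex", the allocation comes first, the read second, then the negation, the write, and finally the release.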

Note that there should be no explicit control dependency among the actions on one edge. The reason is that the firing of the actions depends on the outcome of the condition tests: only when all conditions test true can the actions be fired. If the firing condition of one action depended on the firing result of another action, there would be a cyclic dependency between the test and the firing. The code below shows examples of such control dependencies. The first three edges are illegal since the second action depends on the first one in each case.

   edge1: {ind = *m1[], *m2[ind]};   #illegal
   edge2: {buf1 = m3[], !buf1};      #illegal
   edge3: {v1 = *m4[],  v1>10};      #illegal
   edge4: {v2 = *m5[],  *m6[] = v2}; #legal, data dependency is fine
   edge5: {buf2 = m6[], !!buf1};     #legal, since discard is unconditional


Also note that an edge may contain actions annotated by multiple syntax operations at a time. The ordering rule and the control dependency rule apply to all actions across operation boundaries. The category-based ordering rule guarantees that data flow is preserved regardless of which syntax operation an action comes from.

The statements from different syntax operations are fired according to the binding order of the statements. Recall that binding occurs at decoding time. So for the example operation below, if its decoding statement on edge "e_if_id" resolves to an "mvn" operation as shown in previous examples, the "mvn" will annotate its actions on the skeleton. Obviously the annotation occurs later than that of its parent "dpi". So when edge "e_id_ex" is evaluated, the statement "foo=10" will precede "v_rd = -v_rs".

OPERATION dpi

    VAR oper: {mov, mvn};
        iw  : uint<32>;
        foo : uint<32>;

    TRANS
    e_if_id:    {iw = *mIF[]}
                 +oper = iw;

    e_id_ex:    foo = 10;




Data Types

MADL supports the following basic types:

  • void
  • int<n>—n is the bit width
  • uint<n>—n is the bit width
  • float—IEEE-754 single precision
  • double—IEEE-754 double precision
  • string


MADL supports the following complex types:

  • array—type[n].
    One-dimensional arrays of int, uint, float, double and string types are supported. Array types can be used only in global constant variable declarations in define sections.
  • n-tuple—(type1, type2, ...).
    A tuple element can be any of the basic types except void. Tuple types can be used in manager sections as index types or value types. Functions also have tuple types, either in the function sections where they are defined or in the define sections where they are declared as external functions. An element of a function tuple type can be followed by a '*', indicating that this is a writable argument, i.e. the argument is a reference (as in C++) and may be modified by the function body. Elements without a '*' are read-only arguments, similar to const references in C++. Note that, apart from function call arguments, tuple-typed values can appear only in actions.


The syntax of data types is shown below.

data_type    ::=  "void" | basic_type | complex_type
basic_type   ::=  "int"  '<' integer '>'
                | "uint" '<' integer '>'
                | "float"
                | "double"
                | "string"

complex_type ::=  basic_type '[' integer ']'
                | '(' (basic_type ',')* basic_type ')'

func_type    ::= '(' (basic_type '*'? ',')* basic_type '*'? ')' 

Implicit conversion between types is supported by MADL. The following implicit conversions are valid:

  • int<n> to int<m> or uint<m>, when n<=m.
  • uint<n> to int<m> or uint<m>, when n<=m.
  • int<n> or uint<n> to float.
  • int<n> or uint<n> to double.
  • float to double.
  • (t1,t2,...) to (T1,T2,...) when all ti can be implicitly converted to Ti.
  • int<n> or uint<n> to string.
  • float or double to string.
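
The conversion rules listed above can be written as a small Python predicate. Types are represented here as strings such as "uint<16>" and tuple types as Python lists; this representation, the function names, and the treatment of identical types as trivially convertible are assumptions of the sketch.

```python
import re


def _int_width(type_name):
    """Return the bit width of an int<n>/uint<n> type name, else None."""
    if not isinstance(type_name, str):
        return None
    found = re.fullmatch(r"u?int<(\d+)>", type_name)
    return int(found.group(1)) if found else None


def can_convert(src, dst):
    """True if `src` converts implicitly to `dst` under the listed rules."""
    # Tuple rule: element-wise conversion between tuples of equal length.
    if isinstance(src, list) or isinstance(dst, list):
        return (isinstance(src, list) and isinstance(dst, list)
                and len(src) == len(dst)
                and all(can_convert(s, d) for s, d in zip(src, dst)))
    if src == dst:
        return True  # assumption: identical types need no conversion
    src_width, dst_width = _int_width(src), _int_width(dst)
    if src_width is not None:
        if dst_width is not None:
            return src_width <= dst_width  # widening only, any signedness
        return dst in ("float", "double", "string")
    if src == "float":
        return dst in ("double", "string")
    if src == "double":
        return dst == "string"
    return False
```

Narrowing conversions, such as uint<32> to uint<16> or double to float, are rejected and would need an explicit cast.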




Basic Operators

The basic operators are grouped according to their precedence levels listed in the table below. Highest precedence operators appear first.

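Name                       Operator      Associativity
-------------------------  ------------  -------------
function call              ( )           none
subscripting               [ ]           left
bit extraction             [ : ]         left
modifier                   .             left
-------------------------  ------------  -------------
cast                       (type)expr    right
-------------------------  ------------  -------------
1's complement             ~             right
negation                   -             right
-------------------------  ------------  -------------
bit concatenation          ::            left
-------------------------  ------------  -------------
multiplication             *             left
division                   /             left
modulo                     %             left
-------------------------  ------------  -------------
addition                   +             left
subtraction                -             left
-------------------------  ------------  -------------
right shift                >>            left
left shift                 <<            left
-------------------------  ------------  -------------
greater than or equal to   >=            left
greater than               >             left
less than or equal to      <=            left
less than                  <             left
-------------------------  ------------  -------------
equal                      ==            left
not equal                  !=            left
-------------------------  ------------  -------------
bitwise and                &             left
-------------------------  ------------  -------------
bitwise xor                ^             left
-------------------------  ------------  -------------
bitwise or                 |             left
-------------------------  ------------  -------------
conditional                ?:            left
-------------------------  ------------  -------------
assignment                 =             none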

Operator precedence here is similar to that of ANSI C operators. Parentheses '(' and ')' can be used for grouping and have the highest precedence. An MADL statement is either an assignment operation or a function call. Arithmetic and comparison operators can be applied to numerical types, including integer and floating-point types. Logical and bit operators can be applied to integer types only. Addition (meaning concatenation) and comparison of string-typed operands are supported.

For details about the modifier operators, please see the section below.



Modifiers

Modifier  Expression type    Result type
--------  -----------------  -----------------
cod       operation          uint<w>
syn       operation          string
hex       int/uint           string
oct       int/uint           string
bin       int/uint           string
dec       int/uint           string
sci       float/double       string
fix       float/double       string
flt       uint<32>/uint<64>  float/double
bit       float/double       uint<32>/uint<64>

Modifiers can be used to refer to the syntax and encoding of any or-node variable. For assembly syntax, use "var_name.syn". For encoding, use "var_name.cod". The result of ".cod" has the same width as the encoding width of the variable.

Modifiers can also be used to convert numerical variables or expressions to string type. An integer variable/expression can be appended with ".hex", ".dec", ".oct" or ".bin" (hexadecimal, decimal, octal, binary) modifiers so that it is converted to a formatted string. Similarly, floating-point variables/expressions can be appended with ".sci" or ".fix" (scientific, fixed) modifiers for the same purpose. Finally, modifiers can be used to convert (literally) between integer and floating point values. ".flt" converts 32-bit or 64-bit integer to float or double typed values. ".bit" does the reverse. Note that such conversion is different from a normal arithmetic conversion. This is a literal conversion. All bit values remain the same after such a conversion.
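
The difference between a literal conversion and an arithmetic conversion can be seen in the following Python sketch, which uses the standard "struct" module to reinterpret bits. The function names mirror the ".bit" and ".flt" modifiers but are otherwise hypothetical; the sketch only illustrates the idea and says nothing about how MADL implements the modifiers.

```python
import struct


def bit32(value):
    """Like ".bit" on a float: the IEEE-754 single-precision bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def flt32(bits):
    """Like ".flt" on a uint<32>: reinterpret the bits as a float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bit64(value):
    """Like ".bit" on a double: the IEEE-754 double-precision bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def flt64(bits):
    """Like ".flt" on a uint<64>: reinterpret the bits as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

For example, the literal conversion of 1.0 gives 0x3F800000 as a 32-bit pattern, whereas the arithmetic conversion of 1.0 to an integer gives 1.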



OSM Actions

Transaction        Syntax                            Category
-----------------  --------------------------------  --------
allocate           buffer = manager[index];          1
inquire            *manager[index];                  1
release            !buffer;                          2
discard            !!buffer;                         2
allocate'          manager[index];                   1
read + inquire     var = *manager[index];            3
read               var = *buffer;                    3
write + allocate'  *manager[index] = var/constant;   4
write              *buffer = var/constant;           4
comparison         var op var/constant;              1

Allocate' in the above table means temporary allocate. It is equivalent to an allocate followed by a discard in one cycle, and is syntactic sugar for the convenience of model specification. The comparison operators are the same as the C comparison operators.

Note that, except for assignment, basic operators are not supported in OSM action specifications. Computation can always be moved into the statements. Implicit type conversion is allowed in OSM actions, for both indexes and values.

It is valid to combine read and write in ways such as "*manager1[index1] = *manager2[index2];".



Annotation Syntax

Annotations appear as paragraphs in an MADL description. Below is the syntax of an annotation paragraph in Backus-Naur Form.

annot_paragraph  ::=  claus*
                    | ':' identifier ':' claus*       //with namespace

claus            ::=  decl | stmt

decl             ::=  "var" identifier ':' typ ';'    //variable
                    | "define" identifier val ';'     //macro

stmt             ::=  identifier '(' arg_list ')' ';' //command
                    | val op val                      //relationship
arg              ::=  identifier '=' val
val              ::=  identifier | number | string
                    | '(' (val ',')+ val ')'       // tuple 
                    | '{' (val ',')* val '}'       // set       

typ              ::=  "int"   '<' integer '>'
                    | "uint" '<' integer '>'
                    | "string"
                    | '(' (typ ',')+ typ ')'      // tuple type
                    | '{' (typ ',')* typ '}'      // set type

An annotation paragraph contains an optional namespace label and a list of declarations and statements. The label specifies the tool-scope of the paragraph and can be used to filter irrelevant annotations. Paragraphs without a label belong to the global namespace.

In an MADL description, an annotation paragraph can be either in a single-line format or in a block format. The former is preceded by a "$" and runs through the end of the line, while the latter is enclosed within a pair of "$$"s. An annotation paragraph can be attached to any command, newly defined skeleton name, state, edge, variable, buffer, manager class, manager instance, syntax operation name, function name, statement, action, SYNTAX subsection, CODING subsection, and edge name reference in a TRANS subsection.
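
A tool that filters annotations by namespace might first extract the paragraphs with something like the Python sketch below. The function name, the sample namespace "sim" and the sample command "latency" are hypothetical, and the sketch assumes that "$" does not occur in string literals or comments.

```python
import re

# A "$$ ... $$" block is tried first, then a "$" paragraph to end of line.
_PARAGRAPH = re.compile(r"\$\$(.*?)\$\$|\$([^\n]*)", re.S)
# Optional namespace label of the form ":name:" at the start of a paragraph.
_LABEL = re.compile(r":\s*(\w+)\s*:(.*)", re.S)


def extract_annotations(text):
    """Return (namespace, body) pairs; namespace is None when unlabeled."""
    paragraphs = []
    for found in _PARAGRAPH.finditer(text):
        raw = found.group(1) if found.group(1) is not None else found.group(2)
        body = raw.strip()
        label = _LABEL.match(body)
        if label:
            paragraphs.append((label.group(1), label.group(2).strip()))
        else:
            paragraphs.append((None, body))
    return paragraphs


SAMPLE = ("STATE S_IF; $ :sim: latency(cycles=1);\n"
          "$$ var x : uint<8>;\nx == 3 $$")
```

A tool would then keep the paragraphs whose namespace is its own or None (the global namespace) and ignore the rest.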