Introduction to OSM

The Mescal Architecture Description Language (MADL) specifies operation state machines (OSMs) for modeling microprocessors. It is intended to assist the development of software tools such as the instruction set simulator, the microarchitecture simulator, the disassembler and various compiler optimizers. These tools are necessary components of the software development tool chain for a microprocessor. Synthesizing them quickly enables fast processor prototyping and design space exploration at early stages of the processor design process.

The OSM model is based on two abstractions. The first is the use of a finite state machine to model the execution of an operation (equivalent to an instruction in the RISC sense). The states of the state machine represent the execution status of the operation, and the edges represent the possible execution paths. Each edge is associated with a state transition condition, which represents the readiness of the operation to progress along the edge. This readiness is expressed as the availability of execution resources, including structural resources, data resources and artificial resources. The second abstraction is the notion of a token for resource modeling. Tokens are controlled by token managers, each of which allocates tokens to the state machines according to its own allocation policy.

Overall, the OSM model views a microprocessor in two layers: the operation layer and the hardware layer. The operation layer contains a number of finite state machines modeling operations. These state machines execute concurrently and are coordinated by a scheduler that ensures they act in a deterministic and deadlock-free manner. The hardware layer contains a number of function units, whose execution is scheduled by a discrete-event kernel. Many of the function units contain a token manager.
A token manager controls a set of tokens of the same nature and implements its resource allocation policy. It communicates with the state machines through a common token transaction protocol. Token transactions are the only interaction between the operation layer and the hardware layer. The conditions on the edges of the state machines, mentioned above, consist of token transaction requests. The token managers control the execution progress of the state machines through their responses to these requests. We defined four types of control-related token transaction requests: allocation, inquiry, release and discard. These requests control the execution progress of the operation state machines. We also defined two data-related token transaction requests: token read and token write. These requests enable the state machines to exchange data values with the hardware layer. For a detailed description of the OSM model, please refer to the references.

Figure 1 shows an example finite state machine. It models an "add r10, r1, r2" instruction (r10 = r1 + r2) in a 4-stage pipelined scalar processor. The hardware layer of the model is not shown here. It simply contains five token managers: mIF for the instruction fetch stage, mID for the decode stage, mEX for the execution stage, mWB for the write-back stage and mDM as the data dependency manager. Each of the mIF, mID, mEX and mWB managers controls one token, which indicates the presence of an operation in the corresponding pipeline stage. Since the token can be allocated to only one state machine at a time, only one operation can occupy the pipeline stage at a time. This prevents structural hazards in the scalar processor. The data dependency manager mDM controls an array of tokens, each corresponding to a register in the register file. Allocating a register token to a state machine means that the corresponding operation is using that register as its destination operand.
The conditions associated with the edges of the state machine are listed below as lists of token transaction requests. A condition is considered true when all of its requests are satisfied.
The following explains the execution of the state machine in detail. Initially, the state machine (M below) is in state I, which corresponds to the moment before the operation is fetched. In the first clock cycle, M tries to advance to state F by sending a token allocation request to mIF. If the mIF token is available (the fetch stage is empty), M obtains the token and advances to F. In the next cycle, M tries to advance to state D by allocating the mID token and releasing the mIF token. If the mID token is available and the mIF token can be released (here the fetch manager allows the mIF token to be released once the instruction memory access is done), M enters state D. Meanwhile, because the mIF token is released, another state machine can obtain it and enter its F state. In the following cycle, M inquires of mDM about its source operands to test their availability. It also tries to allocate the token corresponding to its destination operand. This prevents later operations from inquiring about that register (using it as a source operand) until M has written back its computation result and released the token. In this way, data dependencies in the pipeline are preserved. If all these requests succeed, M enters E. In the following two cycles, it first goes through state W, and then releases the destination operand token and returns to I. The state machine thus finishes its life cycle modeling the add operation.

In a pipelined microprocessor, more than one such state machine is active at the same time, each modeling one operation. They attempt to advance their states at every clock cycle. Together with the hardware layer, they model the processor cycle-accurately.

The advantages of the OSM model are mainly threefold. First, the finite state machine is flexible enough to model a wide range of processors, including scalar, superscalar and VLIW ones.
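To make the walkthrough concrete, here is a minimal C++ sketch of several add state machines sharing the five managers. All names are hypothetical, and the sketch rests on several assumptions that the text does not spell out:

- machines are evaluated oldest first in each cycle;
- each machine takes at most one edge per cycle;
- the D to E edge also allocates mEX and releases mID, and the E to W edge allocates mWB and releases mEX, by analogy with mIF and mID;
- a machine that returns to I is marked Done so the run terminates.

```cpp
#include <array>
#include <vector>

// Minimal sketch of the Figure 1 state machine (all names hypothetical).
// Assumed: machines are evaluated oldest first, each takes at most one edge
// per cycle, and mEX/mWB are allocated and released like mIF/mID.
enum class State { I, F, D, E, W, Done };

struct Stage {                 // one-token manager: mIF, mID, mEX or mWB
    int owner = -1;            // -1 means the token is free
    bool available() const { return owner < 0; }
};

struct DepManager {            // mDM: one token per register
    std::array<int, 16> owner;
    DepManager() { owner.fill(-1); }
    bool available(int r) const { return owner[r] < 0; }
};

struct Op {                    // one "add dst, src1, src2" operation
    int id, dst, src1, src2;
    State s = State::I;
    int enteredE = 0, finished = 0;  // cycle numbers, for inspection
};

struct Model {
    Stage mIF, mID, mEX, mWB;
    DepManager mDM;

    // Every request on the outgoing edge is tested first; all fire or none do.
    void step(Op& op, int cycle) {
        switch (op.s) {
        case State::I:  // allocate mIF
            if (mIF.available()) { mIF.owner = op.id; op.s = State::F; }
            break;
        case State::F:  // allocate mID, release mIF
            if (mID.available()) { mID.owner = op.id; mIF.owner = -1; op.s = State::D; }
            break;
        case State::D:  // inquire sources, allocate dst and mEX, release mID
            if (mEX.available() && mDM.available(op.src1) &&
                mDM.available(op.src2) && mDM.available(op.dst)) {
                mEX.owner = op.id; mDM.owner[op.dst] = op.id; mID.owner = -1;
                op.s = State::E; op.enteredE = cycle;
            }
            break;
        case State::E:  // allocate mWB, release mEX
            if (mWB.available()) { mWB.owner = op.id; mEX.owner = -1; op.s = State::W; }
            break;
        case State::W:  // release mWB and dst; back to I (marked Done here)
            mWB.owner = -1; mDM.owner[op.dst] = -1;
            op.s = State::Done; op.finished = cycle;
            break;
        case State::Done:
            break;
        }
    }
};

// Advances all machines once per cycle until every operation has finished.
int run(std::vector<Op>& ops) {
    Model m;
    int cycle = 0;
    auto allDone = [&ops] {
        for (const Op& op : ops)
            if (op.s != State::Done) return false;
        return true;
    };
    while (!allDone()) {
        ++cycle;
        for (Op& op : ops) m.step(op, cycle);
    }
    return cycle;
}
```

Consider "add r10, r1, r2" followed by "add r3, r10, r4". The second machine inquires about r10 while the first still holds that token, so it waits in D. It enters E in cycle 5 instead of cycle 4, which is the data-dependency stall described above.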
Other models, such as the pipeline diagram, do not offer such flexibility. Second, it is relatively easy to extract operation properties from the state machines, since the control and data semantics are exposed in the token transaction requests. Therefore it is easy to analyze processor properties for model verification and compiler synthesis purposes. Finally, compared to other formal models such as the discrete event model or the abstract state machine model, the OSM model provides a higher level of abstraction for microprocessor modeling. It greatly simplifies the modeling of control paths by distributing control policies into operation state machines and token managers. It therefore allows the user to focus on high-level architectural trade-offs rather than low-level implementation details.

It should be noted that the OSM model is different from the Petri Net, although both use the notion of a token to represent resources. A Petri Net is a concurrent system, while a finite state machine is sequential. The concurrency of the OSM model comes from the use of multiple such sequential state machines. It may be possible to convert the operation layer of the OSM model into a Timed Colored Petri Net (this remains to be proved). However, we believe that specifying the operation layer as separate state machines is a much more straightforward task for designers than specifying one colossal Petri Net.

MADL overview

This document is written for two purposes. First, it is intended to give new users a basic idea of the language. With a general idea in mind, new users are encouraged to start from an existing MADL example and modify it. Second, for experienced users who would like to explore features not exemplified in the existing examples, the document serves as a reference manual. MADL is composed of two parts: the core language and the annotation. The core language describes the finite state machines of the OSM model, which have concrete execution semantics.
Note that the hardware layer is currently not part of MADL; we expect it to be included in the next version of MADL. The annotation part describes tool-dependent information. Any tool that uses MADL can create its own annotation description scheme based on a generic annotation syntax. The tool-dependent information may include supplemental information such as pointers to the implementation of the token managers, or hints that help the tools analyze the core description. This document mainly focuses on the syntax of the core language. The Backus-Naur Form (BNF) of the annotation can be found in the Annotation Syntax section of this document.

MADL defines the operation layer of the processor, i.e. the OSMs themselves, and the communication between the operation layer and the hardware layer. MADL descriptions only declare the names and the types of the token managers. In the current implementation, the token managers and the rest of the hardware layer are written in C++, which is the target language into which MADL descriptions are translated.

MADL supports a hierarchical description structure for syntax operations, called the and-or graph. A similar structure has been used by several other architecture description languages. With such a structure, common properties of operations are factored out and merged into higher-level nodes, while their differences are kept at lower-level nodes. This scheme greatly reduces redundancy in MADL descriptions and keeps them compact. To use the and-or graph to describe finite state machines, MADL adopts the notions of machine skeleton and syntax operation. A machine skeleton and one or more associated syntax operations form one finite state machine. A machine skeleton includes the state diagram, the token buffers and the variables accessible by all the syntax operations associated with it. The syntax operations specify the token transactions of the state machine.
These token transactions are bound to the edges of the machine skeleton to form the transition conditions. The syntax operations also contain information such as assembly syntax and binary encoding. The reasoning behind the use of the machine skeleton and the syntax operations, as well as its implications for the OSM model, can be found in the LCTES'04 reference. An MADL file may contain any number of the following sections:
In addition, an MADL file may contain the following commands:
Comments can be placed anywhere in an MADL program. Two types of comments are allowed: single-line comments and block comments. A single-line comment starts with a '#' and lasts until the end of the line. A block comment starts with a '##' and lasts until the next '##'.
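The two comment forms can be illustrated with a small C++ routine that removes them from source text. This is a hypothetical helper, not part of any MADL tool. It assumes that a '#' inside a double-quoted string literal is not a comment, and that string literals contain no escape sequences.

```cpp
#include <cstddef>
#include <string>

// Removes MADL comments (hypothetical helper). '#' starts a comment that runs
// to the end of the line; '##' starts a block comment ending at the next '##'.
// A '#' inside a double-quoted string is kept; escapes are assumed absent.
std::string stripComments(const std::string& src) {
    std::string out;
    bool inString = false;
    for (std::size_t i = 0; i < src.size(); ++i) {
        char c = src[i];
        if (inString) {
            out += c;
            if (c == '"') inString = false;
        } else if (c == '"') {
            inString = true;
            out += c;
        } else if (c == '#' && i + 1 < src.size() && src[i + 1] == '#') {
            std::size_t end = src.find("##", i + 2);  // block comment
            if (end == std::string::npos) break;       // unterminated: drop the rest
            i = end + 1;                               // loop ++ moves past "##"
        } else if (c == '#') {
            std::size_t end = src.find('\n', i);       // single-line comment
            if (end == std::string::npos) break;
            i = end - 1;                               // keep the newline itself
        } else {
            out += c;
        }
    }
    return out;
}
```

For example, `stripComments("a # x\nb")` yields `"a \nb"`. A block comment may span several lines.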
Define Section

The define section declares global constant variables and function prototypes. These variables and functions are in the global naming scope and can be accessed throughout an MADL program. The following types of definition are supported. The general syntax of a variable or function declaration is:

    name : type = value(s);

or

    name : type;

An example define section is as follows.

    DEFINE
      reg_names : string[16] = {"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "sl", "fp", "ip", "sp", "lr", "pc"};
      pred_table : uint<16>[4] = {0xf0f0, 0x0f0f, 0xcccc, 0x3333};
      epsilon : double;
      func1 : (string, uint<32>);
      func2 : (uint<32>*, uint<32>);

The above define section defines an array of string literals named "reg_names", an array of 16-bit unsigned integer constants named "pred_table", and a double-precision constant "epsilon" whose value is not given. It also declares two functions, "func1" and "func2". Function arguments in MADL are passed by reference. Writable arguments are denoted by a "*" after the argument type, e.g. the first argument of "func2". The value of a writable argument may be changed by the function. For simulation purposes, the variables without values and the functions should be defined in external C++ files. These C++ files should be linked with the MADL-generated C++ files to form simulators. For the syntax of types, refer to the Data Types section of this document.
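The following C++ sketch shows what the external definitions for this example might look like. The mapping it uses is an assumption; the exact names and signatures expected by MADL-generated code are not specified in this document:

- uint<32> maps to std::uint32_t;
- string maps to std::string;
- read-only arguments become const references;
- writable "*" arguments become non-const references.

The function bodies and the value of epsilon are arbitrary placeholders, and last_message is a hypothetical variable added only for illustration.

```cpp
#include <cstdint>
#include <string>

// Assumed mapping: uint<32> -> std::uint32_t, string -> std::string,
// read-only argument -> const reference, writable "*" argument -> reference.
extern const double epsilon;     // declaration gives external linkage
const double epsilon = 1e-9;     // placeholder value for the sketch

std::string last_message;        // hypothetical state touched by func1

// func1 : (string, uint<32>);   both arguments are read-only
void func1(const std::string& msg, const std::uint32_t& code) {
    last_message = msg + ":" + std::to_string(code);  // placeholder body
}

// func2 : (uint<32>*, uint<32>);   the first argument is writable
void func2(std::uint32_t& out, const std::uint32_t& in) {
    out = in + 1;                // placeholder body: writes through the reference
}
```

Because the writable argument is a non-const reference, the caller observes the new value after the call, which matches the rule that a writable argument may be changed by the function.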
Manager Section

In the OSM model, data and structural resources are modeled as tokens and are managed by token managers. A state machine transacts tokens with the token managers during its execution. To obtain a token, it typically presents an index to the token manager as the token identifier. The manager then returns the token if it is available. The state machine may also read and write the tokens that it can access. For a list of possible token transactions, refer to the OSM Transactions section of this document. MADL descriptions do not contain the detailed implementation of the token managers, only their types and instance names. A token manager type is a tuple of the token index type and the token value type, and a token manager class statement declares such a tuple type. A manager section may contain a CLASS subsection and an INSTANCE subsection. The former declares token manager classes and their types, while the latter declares token manager instances. The syntax of the subsections is illustrated by the example below.

    MANAGER
    CLASS
      fetch_manager : void -> (uint<32>,uint<32>);
      simple_resource : void -> void;
    INSTANCE
      mIF : fetch_manager;
      mEX : simple_resource;

This example declares a token manager class named "fetch_manager" with a void index type (no token identifier is needed since there is only one token) and a tuple value type. It also declares a "simple_resource" manager class with a void index type and a void value type (it is simply a structural resource and has no value). Two token manager instances, mIF and mEX, are then declared based on these two classes in the INSTANCE subsection. An MADL program may contain one or more manager sections. All manager classes and instances declared in these sections are visible in the global scope.
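As an illustration of the "index type -> value type" view of manager classes, here is a hypothetical C++ class template. It is not the interface of the actual MADL runtime. MADL's void is modeled with std::monostate, which has a single, ordered value and can therefore serve as the key of the only token. The member functions are simplified stand-ins for the allocation, inquiry, release, read and write transactions.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <variant>

// Hypothetical C++ view of a MADL manager class "index_type -> value_type".
template <typename Index, typename Value>
class TokenManager {
public:
    // Allocation request from machine `who`: succeeds only if the token is free.
    bool allocate(const Index& idx, int who) { return owners_.try_emplace(idx, who).second; }
    // Inquiry, in the sense used for mDM in the introduction: is the token free?
    bool inquire(const Index& idx) const { return owners_.count(idx) == 0; }
    void release(const Index& idx) { owners_.erase(idx); }
    Value read(const Index& idx) const {
        auto it = values_.find(idx);
        return it == values_.end() ? Value{} : it->second;
    }
    void write(const Index& idx, const Value& v) { values_[idx] = v; }

private:
    std::map<Index, int> owners_;    // token -> id of the machine holding it
    std::map<Index, Value> values_;  // token values
};

using Void = std::monostate;  // stands in for MADL's void

// fetch_manager : void -> (uint<32>,uint<32>);
using fetch_manager = TokenManager<Void, std::tuple<std::uint32_t, std::uint32_t>>;
// simple_resource : void -> void;
using simple_resource = TokenManager<Void, Void>;
```

With this view, `fetch_manager mIF;` grants its single token to one machine at a time, and a `simple_resource` carries no data beyond whether its token is held.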
Machine Section

A machine section describes a machine skeleton, which contains the state diagram, the token buffers and the variables visible to all syntax operations associated with it. A machine section may contain the following subsections:
There must be exactly one INITIAL state defined in each machine section. There can be an arbitrary number of regular states as long as there is no naming conflict. An example machine section named "normal" is shown below:

    MACHINE normal
    INITIAL S_INIT;
    STATE S_IF, S_EX;
    EDGE
      e_in_if : S_INIT -> S_IF;
      e_if_ex : S_IF -> S_EX;
      e_ex_in : S_EX -> S_INIT;
    BUFFER
      if_buffer : fetch_manager;
      ex_buffer : simple_resource;
    VAR
      iw : uint<32>;
      pc : uint<32>;

The STATE subsection simply contains a list of state names. The EDGE subsection contains a list of edge clauses. Each clause contains the edge name, followed by ':', the source state name, '->' and the destination state name. The BUFFER subsection contains a list of token buffer clauses, each of which contains a buffer name, followed by ':' and the class name of a manager. A buffer can only be used to temporarily store tokens obtained from managers of that class. The VAR subsection contains a list of variable declarations, each of which contains a variable name followed by ':' and a type. See the Data Types section for details about variable types. The states and edges form the state diagram of the machine skeleton. The state diagram must be a strongly connected directed graph.
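The strong-connectivity requirement can be checked mechanically. This hypothetical C++ helper (not part of any MADL tool) uses the standard test. A directed graph is strongly connected if and only if every state is reachable from one chosen state both along the edges and along the reversed edges. An edge that names an undeclared state also makes the check fail.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;  // source -> destination
using Adjacency = std::map<std::string, std::vector<std::string>>;

// Depth-first search: all states reachable from `from`.
static std::set<std::string> reach(const std::string& from, const Adjacency& adj) {
    std::set<std::string> seen{from};
    std::vector<std::string> stack{from};
    while (!stack.empty()) {
        std::string s = stack.back();
        stack.pop_back();
        auto it = adj.find(s);
        if (it == adj.end()) continue;
        for (const std::string& t : it->second)
            if (seen.insert(t).second) stack.push_back(t);
    }
    return seen;
}

// True if every state reaches, and is reached from, the first state.
bool stronglyConnected(const std::vector<std::string>& states, const std::vector<Edge>& edges) {
    if (states.empty()) return false;
    Adjacency fwd, rev;
    for (const auto& [src, dst] : edges) {
        fwd[src].push_back(dst);
        rev[dst].push_back(src);
    }
    const std::set<std::string> all(states.begin(), states.end());
    const std::string& root = states.front();
    return reach(root, fwd) == all && reach(root, rev) == all;
}
```

The "normal" skeleton above passes. Without e_ex_in, S_INIT can no longer be reached from S_EX, and the check fails.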
Function Section

A function section defines one internal MADL function. This differs from the external functions in the DEFINE section: the bodies of internal functions are part of the MADL program, while the bodies of external functions are in external C++ source files. A function section contains a function name, an optional variable (VAR) subsection and an evaluation (EVAL) subsection. The variable subsection defines the local variables; its syntax is the same as that of the VAR subsection of the MACHINE section. The evaluation subsection defines the computation as a sequential list of statements. See the Basic Operators section for information about statements. The statements may access the function arguments, the local variables and the global constant variables from define sections. Unlike C functions, MADL functions do not have a return value. The computation result of a function can be returned through writable arguments. See the Data Types section for more information about writable arguments. An example function section is given below. The "result" argument is writable and is used to return the value of the computation.

    FUNCTION eval_pred(result:uint<1>*, cond:uint<2>, flags:uint<4>)
    VAR
      temp : uint<4>;
    EVAL
      temp = pred_table[cond] >> flags;
      result = (uint<1>)temp;

Like external functions, internal functions are visible in the global name scope. A function can be called anywhere in an MADL program, regardless of the locations of the caller and the function. Recursion is allowed.
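For comparison, a hand-written C++ equivalent of eval_pred might look like the following. It assumes two things: assigning to a narrower unsigned type keeps the low-order bits, and so does the (uint<1>) cast, as unsigned conversions do in C. Under those assumptions, result is simply bit number flags of pred_table[cond]. The C++ types standing in for uint<1>, uint<2> and uint<4> are likewise assumptions.

```cpp
#include <cstdint>

// pred_table copied from the DEFINE example (uint<16> -> std::uint16_t).
const std::uint16_t pred_table[4] = {0xf0f0, 0x0f0f, 0xcccc, 0x3333};

// result is writable (uint<1>*), so it becomes a non-const reference.
// Narrow MADL widths are emulated with masks (an assumption of this sketch).
void eval_pred(std::uint8_t& result, std::uint8_t cond, std::uint8_t flags) {
    cond &= 0x3;   // uint<2>
    flags &= 0xF;  // uint<4>
    // temp = pred_table[cond] >> flags;  the assignment truncates to uint<4>
    std::uint8_t temp = static_cast<std::uint8_t>((pred_table[cond] >> flags) & 0xF);
    // result = (uint<1>)temp;  the cast keeps bit 0
    result = static_cast<std::uint8_t>(temp & 0x1);
}
```

For instance, cond = 0 selects 0xf0f0, so flags = 4 yields 1 and flags = 0 yields 0.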
Operation Section

An operation section defines a syntax operation, which must be based on a machine skeleton. The machine skeleton of a syntax operation is specified by the "USING" command. The subsections of an operation section may access the local variables declared in the machine section and the global constant variables in the define sections. An operation section contains a name and the following subsections:
An example operation named "mvn" is shown below.

    OPERATION mvn
    VAR
      v_rs : uint<32>;
      v_rd : uint<32>;
    SYNTAX "mvn" reg_names[rd] "," reg_names[rs];
    CODING 10111 rd rs ----;
    TRANS
      e_id_ex: {v_rs = *mRF[rs], ex_buf = mEX[], !id_buf, *mRF[rd] = v_rd};
               v_rd = -v_rs;
      e_ex_bf: {bf_buf = mBF[], !ex_buf};
Action Ordering

An OSM is formed by one machine skeleton and one or more syntax operations. The machine skeleton mainly specifies the state diagram, while the syntax operations specify the actions occurring on the edges. More than one syntax operation may annotate its actions, including transactions and statements, onto the same edge of the machine skeleton. There are two types of actions associated with the edges of the machine skeleton: transactions and statements. By OSM rules, when an edge is evaluated, the OSM first tests whether all transactions on the edge can be fired. If and only if all transactions are firable, the OSM fires the transactions and evaluates the statements. Otherwise nothing happens. When all the transactions are firable, the transactions and statements are fired in a fixed order. Category 1 and 3 OSM transactions are evaluated first, then the statements, and finally category 2 OSM transactions. The general rule for transaction ordering is allocation/inquiry/read first and release/discard/write last. This order enables data to flow between token managers within a single control step. According to these rules, the actions associated with the edge "e_id_ex" in the above example follow this order:
All these actions occur within one control step. One value is read from token manager mRF, then negated and written back to token manager mRF. Note that there should be no explicit control dependency among the transactions on one edge. The reason is that firing the transactions depends on the outcome of the condition tests: only when all conditions test true can the transactions be fired. If the firing condition of one transaction depended on the firing result of another transaction, there would be a cyclic dependency between the test and the firing. The code below shows examples of such control dependency. The first three edges are illegal since their second transaction depends on their first one.

    edge1: {ind = *m1[], *m2[ind]};    #illegal
    edge2: {buf1 = m3[], !buf1};       #illegal
    edge3: {v1 = *m4[], v1>10};        #illegal
    edge4: {v2 = *m5[], *m6[] = v2};   #legal, data dependency is fine
    edge5: {buf2 = m6[], !!buf1};      #legal, since discard is unconditional

Note that an edge may contain actions annotated by multiple syntax operations at the same time. The ordering rule and the control dependency rule apply to all transactions across operation boundaries. The category-based ordering rule guarantees that data flow is preserved regardless of which syntax operation a transaction comes from. Statements from different syntax operations are fired in the order in which they were bound. Recall that binding occurs at decoding time. Consider the example operation below. If its decoding statement on edge "e_if_id" resolves to the "mvn" operation shown in the previous examples, "mvn" annotates its transactions onto the machine skeleton. This annotation necessarily occurs later than that of its parent "dpi". So when edge "e_id_ex" is evaluated, the statement "foo = 10" precedes "v_rd = -v_rs".

    OPERATION dpi
    VAR
      oper: {mov, mvn};
      iw : uint<32>;
      foo : uint<32>;
    EVAL
      e_if_id: {iw = *mIF[]} +oper = iw;
      e_id_ex: foo = 10;
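The edge-evaluation rule can be sketched in C++. This is a hypothetical model, not MADL-generated code:

- each transaction is reduced to a test function and a fire function;
- "early" transactions stand for allocate/inquire/read, and "late" transactions for release/discard/write;
- demo() rebuilds edge e_id_ex after dpi and then mvn have been bound;
- an int array stands in for mRF and a bool for the mEX token, and the result is written to register rd, an assumption based on the mvn syntax.

The returned trace records the firing order.

```cpp
#include <functional>
#include <string>
#include <vector>

// Early: allocate/inquire/read. Late: release/discard/write.
enum class Phase { Early, Late };

struct Transaction {
    Phase phase;
    std::function<bool()> test;  // would the transaction succeed now?
    std::function<void()> fire;  // perform it
};

struct Edge {
    std::vector<Transaction> trans;            // from every bound syntax operation
    std::vector<std::function<void()>> stmts;  // kept in binding order
};

// Takes the edge only if every transaction can fire; returns whether it was taken.
bool evaluate(Edge& e) {
    for (auto& t : e.trans)
        if (!t.test()) return false;  // nothing happens unless all can fire
    for (auto& t : e.trans) if (t.phase == Phase::Early) t.fire();
    for (auto& s : e.stmts) s();
    for (auto& t : e.trans) if (t.phase == Phase::Late) t.fire();
    return true;
}

// e_id_ex after dpi, then mvn, were bound (hypothetical stand-ins for managers).
std::vector<std::string> demo(int (&rf)[16], bool& exFree, int rd, int rs) {
    std::vector<std::string> log;
    int foo = 0, v_rs = 0, v_rd = 0;
    Edge e;
    e.stmts.push_back([&] { foo = 10; log.push_back("foo = 10"); });  // from dpi
    // mvn's transactions, in their source order
    e.trans.push_back({Phase::Early, [] { return true; },
                       [&] { v_rs = rf[rs]; log.push_back("read mRF"); }});
    e.trans.push_back({Phase::Early, [&] { return exFree; },
                       [&] { exFree = false; log.push_back("allocate mEX"); }});
    e.trans.push_back({Phase::Late, [] { return true; },
                       [&] { log.push_back("release id_buf"); }});
    e.trans.push_back({Phase::Late, [] { return true; },
                       [&] { rf[rd] = v_rd; log.push_back("write mRF"); }});
    e.stmts.push_back([&] { v_rd = -v_rs; log.push_back("v_rd = -v_rs"); });  // from mvn
    if (!evaluate(e)) log.push_back("stalled");
    return log;
}
```

The write to mRF is listed before the statement in mvn's source, yet it fires after it, so the negated value reaches mRF within one step. If mEX is already held, the whole edge stalls and mRF is left unchanged.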
Data Types

MADL supports the following basic types:
MADL supports the following complex types:
Implicit conversion between types is supported by MADL. The following implicit conversions are valid:
Basic Operators

The basic operators are grouped by precedence level in the table below, with the highest-precedence operators listed first.
Operator precedence here is similar to that of ANSI C operators. Parentheses '(' and ')' have the highest precedence and can be used to group subexpressions. An MADL statement is either an assignment or a function call. Arithmetic and comparison operators can be applied to numerical types, including integer and floating-point types. Logical and bitwise operators can be applied to integer types only. String-typed operands support addition (meaning concatenation) and comparison. For details about the modifier operators, please see the section below.
Modifiers
Modifiers can be used to refer to the syntax and encoding of any or-node variable. For the syntax, use "var_name.syn"; for the encoding, use "var_name.cod". The result type has the same width as the variable. Modifiers can also be used to convert numerical variables or expressions to string type. An integer variable or expression can be appended with the ".hex", ".dec", ".oct" or ".bin" (hexadecimal, decimal, octal, binary) modifiers to convert it to a formatted string. Similarly, floating-point variables or expressions can be appended with the ".sci" or ".fix" (scientific, fixed) modifiers for the same purpose. Finally, modifiers can be used to convert literally between integer and floating-point values. ".flt" converts a 32-bit or 64-bit integer to a float or double value, and ".bit" does the reverse. Note that such a conversion differs from a normal arithmetic conversion: it is a literal conversion, and all bit values remain the same.

OSM Transactions
Allocate' in the above table means temporary allocation. It is equivalent to an allocate followed by a discard in the same cycle, and is syntactic sugar for modeling convenience. The comparison operators are the same as the C comparison operators. Note that, except for assignment, basic operators are not supported in OSM transaction specifications; computation can always be moved into the statements. Implicit type conversion is allowed in OSM transactions, for both indexes and values. It is valid to combine a read and a write, as in "*manager1[index1] = *manager2[index2];".
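The difference between the literal .flt/.bit conversions described under Modifiers and an ordinary arithmetic conversion can be shown in C++. This sketch assumes IEEE-754 float and double; flt and bit are hypothetical names chosen to mirror the modifiers. The bits are copied with std::memcpy, so no value is reinterpreted arithmetically.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4 && sizeof(double) == 8, "expects 32/64-bit floats");

// ".flt": reuse the bit pattern of an integer as a floating-point value.
float flt(std::uint32_t bits) { float f; std::memcpy(&f, &bits, sizeof f); return f; }
double flt(std::uint64_t bits) { double d; std::memcpy(&d, &bits, sizeof d); return d; }

// ".bit": the reverse, returning the raw bit pattern.
std::uint32_t bit(float f) { std::uint32_t b; std::memcpy(&b, &f, sizeof b); return b; }
std::uint64_t bit(double d) { std::uint64_t b; std::memcpy(&b, &d, sizeof b); return b; }
```

For example, bit(1.0f) is 0x3f800000 and flt(0x3f800000u) is 1.0f. An arithmetic conversion of the same integer, static_cast<float>(0x3f800000u), yields 1065353216.0f instead.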
Annotation Syntax

Annotations appear as paragraphs in an MADL description. Below is the syntax of an annotation paragraph in Backus-Naur Form.

    annot_paragraph ::= claus_list
                      | :id: claus_list      //with namespace
    claus ::= decl | stmt
    decl  ::= var id:type                    //variable
            | define id value                //macro
    stmt  ::= id (arg_list)                  //command
            | val op val                     //relationship
    arg   ::= id = value
    val   ::= id
            | number
            | string
            | (val_list)                     // tuple
            | {val_list}                     // set
    type  ::= int<width>
            | uint<width>
            | string
            | (type_list)                    // tuple type
            | {type}                         // set type

An annotation paragraph contains an optional namespace label and a list of declarations and statements. The label specifies the tool scope of the paragraph and can be used to filter out irrelevant annotations. Paragraphs without a label belong to the global namespace. In an MADL program, an annotation paragraph can be written in either a single-line format or a block format. The former is preceded by a '$' and runs to the end of the line, while the latter is enclosed within a pair of '$$'s.
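The two paragraph formats and the namespace label lend themselves to simple filtering. The hypothetical C++ helpers below are not part of any MADL tool. They collect '$' and '$$' paragraphs and keep those that are global or belong to a given tool. They assume '$' never appears inside string literals or comments.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Collects annotation paragraphs: "$$ ... $$" blocks and "$ ..." lines.
std::vector<std::string> extractAnnotations(const std::string& src) {
    std::vector<std::string> paras;
    std::size_t i = 0;
    while ((i = src.find('$', i)) != std::string::npos) {
        bool block = i + 1 < src.size() && src[i + 1] == '$';
        std::size_t start = i + (block ? 2 : 1);
        std::size_t end = block ? src.find("$$", start) : src.find('\n', start);
        if (end == std::string::npos) end = src.size();  // unterminated: take the rest
        paras.push_back(src.substr(start, end - start));
        i = block ? std::min(end + 2, src.size()) : end;
    }
    return paras;
}

// Returns the label of ":id: claus_list", or "" for the global namespace.
std::string namespaceOf(const std::string& para) {
    std::size_t p = para.find_first_not_of(" \t\r\n");
    if (p == std::string::npos || para[p] != ':') return "";
    std::size_t q = para.find(':', p + 1);
    return q == std::string::npos ? "" : para.substr(p + 1, q - p - 1);
}

// A tool keeps global paragraphs and paragraphs labeled with its own name.
bool relevantTo(const std::string& para, const std::string& tool) {
    std::string ns = namespaceOf(para);
    return ns.empty() || ns == tool;
}
```

Only a leading ':' is treated as a label, so the ':' inside a declaration such as "var n:int<32>" does not create a namespace.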
Reference

The main reference for the OSM model is:
A few related architecture description works are:
Last Update: 2004/06/08 21:32:18