Monthly R&D Status Report Date: August 15, 1995 Title: "SYSTEM-LEVEL DESIGN METHODOLOGY FOR EMBEDDED SIGNAL PROCESSORS" Contract Number: F33615-93-C-1317 Principal Investigator: Edward A. Lee Organization: University of California at Berkeley 1. Tasks Performed We presented a full-day tutorial and gave numerous demos on the exhibit floor at the RASSP conference in Arlington, Virginia. In addition, we have made considerable progress towards a definitive dynamic dataflow scheduler and the supporting theory. Finally, we have prepared and submitted a number of conference papers. 2. Significant Accomplishments 2.1 The RASSP Conference In conjunction with Dave Wilson of Berkeley Design Technology, Mike Williamson, Brian Evans, and Edward Lee led a full-day tutorial on Ptolemy at the RASSP conference in Arlington Virginia. Approximately 25 people attended. The handouts for the tutorial are available in compressed postscript form on the Ptolemy home page at http://ptolemy.eecs.berkeley.edu. In addition, we staffed an exhibit booth on the exhibit floor, and gave numerous demos. Ptolemy also played a role in five other exhibits on the exhibit floor: (1) Sanders wrote their own front-end to Ptolemy that allows a user to sketch a target parallel architecture and quickly map an SDF graph to the processors in the sketched architecture. (2) Sanders wrote a new code generation domain for FPGAs that uses the DE domain to automatically insert registers to compensate for pipelining. They apply perl scripts to the resulting ptcl code to generate the FPGA layout in a Xilinx format. (3) DQDT derived a new VHDL domain to serve as a front end specification and VHDL code generation environment for behavior modeling and synthesis of Application-Specific Integrated Circuits. They applied the same approach using Mentor Graphics DSP Station as a front end. (4) Berkeley Design Technology wrote a layer on top of the Ptolemy kernel called Ptolemy HSIM (Heterogeneous Simulation) which serves as a simulation backplane that allows Cadence's Signal Processing Workstation, Cadence's Bones and Precedence's SimMatrix tools to cooperate during a simulation. SimMatrix is a synchronization mechanism for connecting 30 different VHDL and Verilog simulators together. (5) Boston University demonstrated a signal-reprocessing environment called IPUS that has been integrated into Ptolemy. Ptolemy serves as the organizing framework, and will eventually provide (through its other domains) a computational engine. Signal reprocessing is a family of knowledge-based techniques for iteratively refining a computation by dynamically selecting the algorithms to be applied to the data on the basis of results provided by previous algorithms. 2.2. Dynamic Dataflow Models Tom Parks, Prof. Soonhoi Ha (of Seoul National University), and Edward Lee have continued development of what appears to be a considerable breakthrough on the dynamic dataflow model of computation, apparently solving a long standing open problem. This solution appears to define a robust and simple scheduler that can be used in the DDF and process network (PN) domains in Ptolemy, as well as in commercial simulators like COSSAP (from Synopsys), DDSIM (from Mentor Graphics), and SPW (from Cadence). The problem can be stated simply, and is repeated here from the previous monthly report, for convenience. A dynamic dataflow scheduler should satisfy two requirements: R1: If the dataflow graph does not contain a deadlock condition, the scheduler should not halt. R2: If the dataflow graph can be executed forever in bounded memory, then the scheduler should be able to execute it forever in bounded memory. In general, given a dataflow graph, it is undecidable whether the graph will deadlock (the halting problem). It is also undecidable whether the graph can be executed in bounded memory. It is easy to define a scheduling algorithm that satisfies R1 or R2, but no scheduling algorithm can always, in finite time, guarantee both R1 and R2. This problem has plagued us for years, and has also appeared in much of the dataflow architecture work in various forms. We have identified a third condition that is extremely useful in practice, although less fundamental: R3: The scheduler should execute a graph in a sequence of well-defined and determinate "iterations", where an iteration is set of actor firings. The reason for this third requirement is that users frequently wish to execute a graph for a finite period despite the property that the graph can be executed forever. A well-defined and determinate notion of an iteration permits a user to specify the length of the desired execution in iterations. The notion of an iteration also plays a role in heterogeneous systems, where other models of computation are mixed with dataflow. In such a system, a dynamic dataflow scheduler may be invoked and told to run through "one iteration." In such cases, the designer of the dataflow subsystem may need to control what this means so that the quantum of computation is an intuitive one. To support all of these ideas, we have implemented two new dynamic dataflow schedulers, and have outlined proofs of their correctness. Both schedulers define a default "basic iteration" to consist of no more than one firing of any dataflow actor. The schedulers differ, however, in how many stars will be fired in one iteration. Both also support a "user iteration", which consists of however many "basic iterations" are needed to fire one or more specified stars a specified number of times. To permit a user to annotate a dataflow graph with the number of firings of a star that constitute a "user iteration", we implemented an extension to the GUI and the Target object to support "pragmas" attached to blocks. A given Target (such as the DDFTarget) understands only certain pragmas. In the DDF domain, the DDFTarget understands a pragma called "firingsPerIteration". Thus, when a user specifies a value of this pragma for a particular star, a "user iteration" has been defined. if no such value is specified, then a "user iteration" equals a "basic iteration". 2.3. Mixing Control with Dataflow Our research on control/dataflow mixture continues, focusing on fundamental semantics issues. We have examined the semantics of several specialized computational models for control, including the twenty-one variants of Statecharts surveyed by M. von der Beeck of Aachen University of Technology. We have identified three orthogonal semantic properties of Statecharts: FSM, hierarchy, and concurrency. If we take away from Statecharts the transitions that cross hierarchy boundaries, we get a simpler model in which the FSM semantics can be cleanly separated from the concurrency semantics. This means that the basic FSM model can be mixed with the various Ptolemy domains' concurrency models to get many models that are only slightly weaker than Statecharts. They lack the hierarchy-crossing transitions, but those are considered by many to violate the information hiding principle of hierarchical design. We are refining this observation and testing it out on some examples. 2.4. Mixed Signal Systems Joel King continues to make progress on building a mixed-signal design environment into Ptolemy. He is learning from prior work in the CAD group at Berkeley on a system called Splice3 that combines Spice with switch level and gate level simulators. Many of the problems of integrating time-based discrete-event simulators with circuit simulators have been addressed in Splice3. The program is a variation of Spice that uses Linear Relaxation techniques to solve the input circuits, resulting in a substantial savings of time for some circuits. After evaluating the interfacing mechanisms, Joel plans to build a wormhole interface, focusing on the semantics of interaction with the SDF (synchronous dataflow) and DE (discrete-event) domains. Interfaces with Thor and VHDL domains will follow. 2.5. Technical Improvements to Ptolemy In a major breakthrough, Christopher Hylands and Jose Pino built a pigiRpc executable with g++ that uses shared libraries. The image size is about 225K, versus 43Mb a purelinked ptinyRpc (191x). For the interpreter version of Ptolemy, ptcl, the shared image is about 279K vs. 11Mb for the purelinked ptcl(39x). Link time and execution speed on machines with limited memory is dramatically improved. The shared pigiRpc will incrementally link in stars, resulting in full debugging capability for dynamically linked stars. Moreover, since libg++ is now shared, we avoid a certain class of bugs that where a star wants libg++ functions that were not brought into the binary at link time. The 0.5.2 User's manual and Programmer's manual are now available via the Ptolemy home page. Both manuals have a keyword queryable index. Soonhoi Ha and Wan-Teh Chang have made a number of improvements to the code generation domains to support code generation for dynamic dataflow graphs. Wan-Teh Chang and Brian Evans have fixed the compile-SDF target to work for universes with Matlab and Tcl/Tk stars. This target generates standalone C++ programs from applications specified in the synchronous dataflow (SDF) domain. They also fixed another problem in that the compile-SDF target did not call the begin methods of the component blocks. Christopher Hylands has converted our environment variable ARCH to PTARCH, to avoid conflicts with other programs. Kang Ngee Chia has used the new Target pragma mechanism (see above) to permit a user to identify parameters of blocks in the CGC domain (code generation in C) that permit these parameters to be set on the command line when invoking the generated program. Matt Tavis has rewritten the Gantt chart widget using Tk and better integrated it into Ptolemy. 3. Schedule Reconciliation No changes. 4. Publications The following papers have been accepted: [1] Jose Luis Pino, Shuvra S. Bhattacharyya and Edward A. Lee, "A Hierarchical Multiprocessor Scheduling System for DSP Applications," to be presented at the IEEE Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, October 1995. (http://ptolemy.eecs.berkeley.edu/~pino/papers/asilomar-95) [2] Thomas M. Parks, Jose Luis Pino, and Edward A. Lee, "A Comparison of Synchronous and Cyclo-Static Dataflow" to be presented at the IEEE Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, October 1995. The following papers have been submitted: [3] Praveen K. Murthy, Edward A. Lee, "Extension of Multidimensional Synchronous Dataflow to Handle Arbitrary Sampling Lattices," submitted to ICASSP '96. [4] Jose Luis Pino, Michael C. Williamson and Edward A. Lee, "Interface Synthesis in Heterogeneous System-Level DSP Design Tools," submitted to ICASSP '96. [5] Guner Arslan, Brian L. Evans, F. Ayhan Sakarya, Onur Tacki and Jose Luis Pino "Performance Evaluation and Real-Time Implementation of Subspace, Adaptive, and DFT Algorithms for Multi-Tone Detection," submitted to ICASSP '96. The following two thesis drafts have been prepared and will be finalized in the next couple of months: [5] S. Sriram, "Minimizing Communication and Synchronization Overhead in Multiprocessors for Digital Signal Processing," thesis draft, July, 1995. [6] Asawaree Kalavade, "System-level Codesign of Mixed Hardware-Software Systems," thesis draft, July 1995. All Ptolemy publications are listed chronologically and can be searched by author, topic, etc., by accessing our World Wide Web page at URL http://ptolemy.eecs.berkeley.edu/ under "Publications".