Distributed Modeling and Simulation with Backtracking


Researchers: Thomas H. Feng
Advisor: Edward A. Lee

Distributed modeling and simulation is an active research area, and it has received increasing attention as high-speed networks and cluster computer systems have become widely available. There are many good reasons for using such systems in large-scale modeling and simulation, some of which are:

Ptolemy II is a modeling, design, and simulation framework well suited to addressing such requirements. Bringing high-speed distributed computation into Ptolemy II raises a number of interesting problems that call for careful study.

Some of these problems are technically hard. One example is the choice of a strategy that hides network latency while still maximizing simulation speed. Time Warp [1] is one such strategy; others are still under research.
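To make the optimistic idea behind Time Warp concrete, here is a minimal Python sketch of a single logical process under simplifying assumptions. The names `Event`, `LogicalProcess`, `receive`, and `rollback` are hypothetical; the state is just a trace of processed payloads, and a full copy of the state is saved before every event. The sketch omits the anti-messages that full Time Warp sends to cancel outputs of rolled-back events, as well as global virtual time and fossil collection, and it is not the Ptolemy II implementation.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    timestamp: float
    payload: object = field(compare=False)  # ordering uses timestamp only


class LogicalProcess:
    """Hypothetical optimistic (Time Warp style) logical process.

    The state is a tuple recording payloads in processing order, so an
    out-of-order execution would be visible. A full copy of the state is
    saved before each event so that a straggler can trigger a rollback.
    """

    def __init__(self):
        self.lvt = 0.0       # local virtual time
        self.state = ()      # application state: trace of processed payloads
        self.pending = []    # min-heap of unprocessed events
        self.processed = []  # processed events, in timestamp order
        self.saved = []      # state snapshot taken before each processed event
        self.rollbacks = 0

    def receive(self, event):
        # A straggler is an event whose timestamp lies in our local past.
        if event.timestamp < self.lvt:
            self.rollback(event.timestamp)
        heapq.heappush(self.pending, event)

    def rollback(self, t):
        self.rollbacks += 1
        # Undo, newest first, every processed event with timestamp >= t.
        while self.processed and self.processed[-1].timestamp >= t:
            event = self.processed.pop()
            self.state = self.saved.pop()
            heapq.heappush(self.pending, event)  # re-execute it later
        self.lvt = self.processed[-1].timestamp if self.processed else 0.0

    def step(self):
        # Optimistically process the earliest pending event.
        event = heapq.heappop(self.pending)
        self.saved.append(self.state)
        self.state = self.state + (event.payload,)
        self.lvt = event.timestamp
        self.processed.append(event)

    def run(self):
        while self.pending:
            self.step()


lp = LogicalProcess()
for t, p in [(1.0, "a"), (3.0, "c"), (4.0, "d")]:
    lp.receive(Event(t, p))
lp.run()
lp.receive(Event(2.0, "b"))  # straggler: undoes "c" and "d"
lp.run()
print(lp.state, lp.rollbacks)  # ('a', 'b', 'c', 'd') 1
```

The process never waits for a guarantee that no earlier event will arrive; it pays for that optimism only when a straggler actually shows up, which is why an efficient backtracking mechanism matters.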

Other problems are more tractable, and partially satisfactory solutions have been found. One example is the backtracking that the above-mentioned strategy requires. I have prototyped a mechanism that uses aspect-oriented programming to transparently insert rollback code into Ptolemy II models with relatively small overhead. I am now studying refinements, improvements, and the semantic implications of this mechanism.
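The prototype's internals are not described here, so the following is only a Python analogue of the idea, under stated assumptions. Instead of aspect-oriented advice woven into Ptolemy II code, a hypothetical mixin `Rollbackable` intercepts attribute assignment through `__setattr__` and journals each overwritten value, so model code such as the hypothetical `Counter.fire` contains no rollback logic of its own. It tracks only attribute rebinding, not in-place mutation of lists or dicts.

```python
_MISSING = object()  # marks "attribute did not exist before this write"


class Rollbackable:
    """Hypothetical mixin: journals attribute writes so they can be undone.

    A rough Python stand-in for aspect-oriented advice on field assignment;
    subclasses need no rollback code of their own.
    """

    def __setattr__(self, name, value):
        # Bookkeeping lives directly in __dict__, so it is never journaled.
        journal = self.__dict__.setdefault("_journal", [])
        journal.append((name, self.__dict__.get(name, _MISSING)))
        object.__setattr__(self, name, value)

    def checkpoint(self):
        """Record the current journal position and return a checkpoint id."""
        marks = self.__dict__.setdefault("_marks", [])
        marks.append(len(self.__dict__.setdefault("_journal", [])))
        return len(marks) - 1

    def rollback(self, cp):
        """Undo every write made since checkpoint cp, newest first."""
        marks = self.__dict__["_marks"]
        journal = self.__dict__["_journal"]
        while len(journal) > marks[cp]:
            name, old = journal.pop()
            if old is _MISSING:
                object.__delattr__(self, name)
            else:
                object.__setattr__(self, name, old)
        del marks[cp:]  # cp and any later checkpoints are now invalid


class Counter(Rollbackable):
    """Ordinary model code: no explicit state saving anywhere."""

    def __init__(self):
        self.count = 0

    def fire(self):
        self.count += 1


c = Counter()
cp = c.checkpoint()
c.fire()
c.fire()
print(c.count)  # 2
c.rollback(cp)
print(c.count)  # 0
```

Journaling only the fields that actually change is a form of incremental state saving: compared with copying the entire state at every checkpoint, it makes checkpoints cheap but makes the cost of a rollback proportional to the number of writes being undone.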

In this project, I will take a formal approach to describing the global semantics of highly autonomous but still mutually dependent components in a distributed system. This work will probably give rise to a formalism that, like PN (process networks), regards components as processes and abstracts the physical connections between them as data channels with FIFO queues. Unlike PN or DE (discrete events) systems, however, this formalism cannot abstract away the issues inherent in distributed systems. Components are highly flexible, and their activities are not restricted by blocking reads. Connections between them might be established, destroyed, or re-established dynamically. Components might transparently migrate from one physical location to another. Latency is observable, and in the worst case, messages might be lost. No single manager can arbitrate global notions such as a global startup signal or global time. At the cost of increased complexity, this formalism should significantly improve flexibility and map more directly to implementations.
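The following hypothetical Python sketch illustrates how such a channel differs from a PN channel: reads are non-blocking (`poll` returns `None` when nothing has arrived), delivery is delayed by an observable latency, a message may be lost with probability `loss_prob`, and the connection can be torn down and re-established at run time. The class name, its parameters, and the loss model are assumptions for illustration only. For simplicity, the time arguments are readings of a single clock used only to model transit delay; a faithful model of the formalism would have no such shared clock.

```python
import random
from collections import deque


class LossyChannel:
    """Hypothetical FIFO channel with observable latency and possible loss."""

    def __init__(self, latency, loss_prob, rng):
        self.latency = latency      # fixed delay keeps FIFO order for non-decreasing send times
        self.loss_prob = loss_prob  # probability that a sent message is dropped
        self.rng = rng
        self.in_flight = deque()    # (arrival_time, message), oldest first
        self.connected = True

    def send(self, now, msg):
        """Return False if there is no connection. A True result does not
        guarantee delivery, because the message may still be lost."""
        if not self.connected:
            return False
        if self.rng.random() >= self.loss_prob:
            self.in_flight.append((now + self.latency, msg))
        return True

    def poll(self, now):
        """Non-blocking read: the next arrived message, or None."""
        if self.in_flight and self.in_flight[0][0] <= now:
            return self.in_flight.popleft()[1]
        return None

    def disconnect(self):
        self.connected = False
        self.in_flight.clear()  # messages in transit are lost

    def reconnect(self):
        self.connected = True


ch = LossyChannel(latency=2.0, loss_prob=0.0, rng=random.Random(0))
ch.send(0.0, "hello")
print(ch.poll(1.0))  # None: still in transit, so the caller can do other work
print(ch.poll(2.0))  # hello
```

A component built on such a channel polls its inputs and continues with other work when nothing has arrived, instead of blocking as a PN process would; coping with loss, for example by retransmission, becomes the component's responsibility.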

References:

[1] D. Jefferson, B. Beckman, et al., "Distributed Simulation and the Time Warp Operating System," UCLA Computer Science Department, Report 870042, 1987.

Last updated 11/01/04