Efficient development of real-time fault-tolerant controllers
Claudio Pinello, University of California at Berkeley
Luigi Palopoli, Scuola Superiore S.Anna Pisa, Italy
Alberto Sangiovanni-Vincentelli, University of California at Berkeley
This research plan deals with the synthesis of real-time embedded controllers,
taking into account constraints and design goals pertaining both to the domain of the
control application and to that of its implementation. A particular focus is on
real-time scheduling and fault tolerance, two critical requirements of an
emerging class of embedded applications (e.g. "by-wire" automotive
controllers). We believe that systematic approaches in this discipline would
enable shorter development cycles and more efficient designs. Designers should
concentrate on high-level decisions and rely on extensive support from synthesis and
analysis tools for detailed tuning and validation. Most of the
detailed decisions should be derived automatically as solutions to optimization
problems or should result from synthesis based on rich libraries of
debugged components. Both flows should give the designer sufficient insight
and should allow for her guidance.
The envisioned methodology: an overview
A pictorial representation of the methodological approach we advocate is shown in
Figure 1, where ovals represent activities, boxes represent artifacts or components, and round-cornered boxes represent design goals. An
arrow drawn from an activity into a box denotes that the content of the box is a result of the activity. Conversely, arrows
drawn from boxes denote inputs to activities. The graph also includes boxes that are not the result of any activity; these represent inputs from the designer.
At first glance, it is possible to recognize that the envisioned approach is inspired by a common principle in modern embedded systems design:
behavioral and architectural design should be, as far as possible, orthogonal activities.
By leveraging this principle, it is possible to transform a confused sequence of trial-and-error iterations
into a well-defined engineering process, opening up unprecedented opportunities in the search for optimal performance/cost
3. Research tasks
The following tasks stem from this research activity:
- Control synthesis with computation constraints.
- Synthesis of distributed fault-tolerant schedules.
3.1 Control synthesis with computation constraints
In the context of our methodology, the behavioral design is expected to
accomplish two distinct (and to some extent conflicting) goals. On the one hand,
we want to select or devise a set of models and algorithms that ensure a good
level of orthogonalization between the design of the behavior and that of the
architecture. On the other hand, we want the information contained in the artifacts
circulating between the different activities to be rich enough to ensure that
the design constraints are actually respected and the design
goals fulfilled. We believe that classical design flows, based on a rigid
separation between the work of the control engineer and that of the computer
engineer, fail to achieve the latter goal. Since limitations related to the
implementation platform are not taken into account during the early phases of the
design, they manifest themselves only during the prototyping of the system,
compelling the developers to undertake expensive trial-and-error iterations. More
specifically, we want to investigate the following problems:
I. Closed-loop robustness under real-time schedulability constraints.
We conjecture that delays introduced by computation (e.g. in a time-triggered
model of computation) may reduce system robustness with respect to
unmodeled plant uncertainties and disturbances. When multiple systems share
computation resources, schedulability issues impose a bound on the loop rates
attainable for the different closed-loop systems. In our view, this can be
phrased as an optimization problem where loop rates and gains are the decision
variables and robustness appears in the cost function.
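
To make the shape of the optimization in problem I concrete, the following is a minimal sketch under strong simplifying assumptions that are not part of the plan: the loops share a single processor scheduled with EDF (so the schedulability bound is that the total utilization sum(C_i / T_i) stays at or below 1), the gains are held fixed, and the loss of robustness of loop i is approximated by a hypothetical linear penalty w_i * T_i on its period. The names `optimal_periods`, `exec_times` and `weights` are illustrative only. Under these assumptions the problem is convex and the Lagrange conditions give the periods in closed form.

```python
import math

def optimal_periods(exec_times, weights, utilization_bound=1.0):
    """Minimize sum(w_i * T_i) subject to sum(C_i / T_i) <= utilization_bound.

    exec_times: worst-case execution times C_i of the control tasks.
    weights: hypothetical robustness penalties w_i per unit of period.
    The stationarity condition w_i = lam * C_i / T_i**2 together with an
    active utilization constraint yields
    T_i = sqrt(C_i / w_i) * sum_j sqrt(w_j * C_j) / utilization_bound.
    """
    s = sum(math.sqrt(w * c) for w, c in zip(weights, exec_times))
    return [math.sqrt(c / w) * s / utilization_bound
            for w, c in zip(weights, exec_times)]

def utilization(exec_times, periods):
    """Total processor utilization of a set of periodic tasks."""
    return sum(c / t for c, t in zip(exec_times, periods))

# Two loops with execution times 1 and 4 (arbitrary time units), equal weights.
periods = optimal_periods([1.0, 4.0], [1.0, 1.0])
load = utilization([1.0, 4.0], periods)
```

A realistic formulation would replace the linear penalty with an actual robustness measure of the closed loop and would include the controller gains among the decision variables; the sketch only illustrates how the schedulability bound couples the loop rates.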
II. Combined effects of computation/bitrate constraints on control quality.
Bitrate constraints are commonplace in distributed control systems. The
problem of stabilization of bandwidth constrained systems is addressed in ,
while the state estimation problem for linear systems under bit-rate constraints
is analyzed in . A comprehensive framework where observability,
stabilizability and controllability have been addressed is in . We want to
extend these results to account for the delays due to
computation and to the scheduling of multiple activities hosted on the
same processor. In our view, this could be the first step toward a unifying
theory of control under general resource constraints.
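
As background for problem II, the sketch below computes the classical data-rate bound known from the literature on control under communication constraints: for a discrete-time linear plant, the feedback channel must carry more than the sum of log2|lambda_i| bits per sample, taken over the unstable open-loop eigenvalues lambda_i. The bound is stated here as a known result and is not a contribution of this plan; the function names are illustrative, and the sketch deliberately ignores the computation and scheduling delays that the plan proposes to add to the picture.

```python
import math

def min_bits_per_sample(eigenvalues):
    """Lower bound on the bits per sample needed for stabilization.

    eigenvalues: open-loop eigenvalues of the discrete-time plant
    (real or complex). Only eigenvalues outside the unit circle contribute.
    """
    return sum(math.log2(abs(lam)) for lam in eigenvalues if abs(lam) > 1.0)

def min_bits_per_second(eigenvalues, sampling_period):
    """Same bound expressed as a bit rate, for a given sampling period."""
    return min_bits_per_sample(eigenvalues) / sampling_period

# A plant with eigenvalues 2, 0.5 and 4: only 2 and 4 are unstable.
bits = min_bits_per_sample([2.0, 0.5, 4.0])
rate = min_bits_per_second([2.0, 0.5, 4.0], sampling_period=0.01)
```

Any delay introduced by computation or scheduling can only make the requirement harder to meet, which is why the bound is a natural starting point for the extension discussed above.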
3.2 Synthesis of distributed fault-tolerant schedules
Some applications are so critical that they need to be resilient to faults in
the computing architecture. Typically this is achieved by redundantly scheduling
the application on the architecture. Starting from a network of processes, some or
all of the processes and the data they exchange are replicated . Additional
processes may be needed to vote on the results of different replicas of the
same process and establish a common result: the consensus problem. Then an
assignment and schedule of the augmented network onto the distributed
architecture must be devised.
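
The voting step mentioned above can be illustrated with a minimal sketch of a majority voter. It assumes that the outputs of all replicas have already been collected in one place and can be compared for exact equality; the general consensus problem in a distributed system, where the voters themselves may fail or disagree, is considerably harder and is not addressed by this sketch. The function name `majority_vote` is illustrative.

```python
from collections import Counter

def majority_vote(replica_outputs):
    """Return the value produced by a strict majority of replicas.

    replica_outputs: list of hashable results, one per replica.
    Returns None when no value is backed by more than half of the
    replicas (including the case of an empty list).
    """
    if not replica_outputs:
        return None
    value, count = Counter(replica_outputs).most_common(1)[0]
    if 2 * count > len(replica_outputs):
        return value
    return None

# Three replicas, one of which delivers a faulty result.
agreed = majority_vote([42, 42, 7])
```

With three replicas the voter masks a single faulty output; masking f arbitrary faulty outputs in this way requires at least 2f + 1 replicas.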
It seems profitable to relieve the designer of the burden of devising a
fault-tolerant distributed schedule, and to opt for an approach based on
In order to use resources efficiently, we want to allow a
flexible use of passive replicas (replicas of a process that run only when the
main replica undergoes a fault). Preliminary results have shown the usefulness
of this technique in achieving higher schedulability by "reclaiming" resources
from non-running replicas . A further avenue for improvement may arise in
the context of gracefully degrading applications, where replicas are not
exact copies of the original process. Rather, they may be simpler versions with
reduced functionality and/or accuracy, and likely lower resource requirements .
This exposes an opportunity to achieve higher schedulability, by requiring
strong fault resilience only of the light-weight versions.
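
The resource reclaiming enabled by passive replicas can be illustrated with a small utilization calculation. The sketch below is a toy model under assumptions that are not taken from the plan or from the cited results: each process has one primary and one backup replica on different processors, at most one processor fails, and load is measured as plain utilization. With active replication a processor must reserve capacity for every backup it hosts; with passive replication it only needs capacity for the backups activated by the worst single processor failure. The function name and the task tuple layout are hypothetical.

```python
def worst_case_load(proc, tasks, passive):
    """Worst-case utilization of processor `proc` under one processor fault.

    tasks: list of (utilization, primary_proc, backup_proc) tuples.
    passive: if True, backups run only when their primary's processor fails;
             if False, backups always run (active replication).
    """
    primary_load = sum(u for u, p, b in tasks if p == proc)
    hosted_backups = [(u, p) for u, p, b in tasks if b == proc]
    if not passive:
        return primary_load + sum(u for u, _ in hosted_backups)
    # Group the hosted backups by the processor whose failure activates them.
    per_failure = {}
    for u, p in hosted_backups:
        per_failure[p] = per_failure.get(p, 0.0) + u
    return primary_load + max(per_failure.values(), default=0.0)

# Three processors; processor 2 hosts the backups of the tasks on 0 and 1.
tasks = [(0.25, 0, 2), (0.25, 1, 2), (0.25, 2, 0)]
active_load = worst_case_load(2, tasks, passive=False)
passive_load = worst_case_load(2, tasks, passive=True)
```

In this example the capacity that processor 2 must reserve drops because the two backups it hosts can never be active at the same time under the single-fault assumption. A light-weight degraded replica would enter the same calculation simply with a smaller utilization for the backup.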
Moreover, we want to allow general architectures, removing the strict
boundaries of the modules and busses found in the Time-Triggered Architecture (TTA). This enables more general
fault models, including for the communication subsystem. The resulting architecture is
a full-fledged distributed multiprocessor system, where each node can be used
in its own right and not as a mere duplicate of another one. All the parallelism in the
hardware can then be exploited to speed up the execution of parallel processes
of the application without affecting the degree of fault tolerance. We
note that most of the results cited above have been derived under very
restrictive assumptions on the fault model. We believe some of their founding
principles can be rephrased in a more general framework. The expected outcome of
this research is a systematization of a set of design techniques that could
allow for an easy exploration of design alternatives arising from different
References
Luigi Palopoli, Claudio Pinello, Alberto Sangiovanni-Vincentelli, Laurent
Elghaoui, Antonio Bicchi, "Synthesis of robust control systems under
resource constraints", to appear in Lecture Notes in Computer Science,
Proceedings of Hybrid Systems: Computation and Control, March 2002.
F. Balarin and others, "Hardware-Software Co-Design of Embedded
Systems: The POLIS Approach", Kluwer Academic Publishers, 1997.
W.S. Wong, R. Brockett, "Systems with finite bandwidth constraints -
part II: Stabilization with limited information feedback", IEEE Trans. on
Automatic Control, 1999, Vol 44, N.5.
W.S. Wong, R. Brockett, "Systems with finite bandwidth constraints -
part I: State estimation problems", IEEE Trans. on Automatic Control, 1997,
Vol 42, N.9.
G. N. Nair, R. J. Evans, "State estimation under bit rate
constraints", Proc. of the 37th IEEE Conference on Decision and Control,
S. Tatikonda, S. Mitter, "Control Under Communication
Constraints", MIT PhD Thesis, August 2000.
M. Barborak, M. Malek, A. Dahbura, "The consensus problem in
fault-tolerant computing", ACM Computing Surveys, 1993, Vol 25, N.2.
KapDae Ahn, Jong Kim, SungJe Hong, "Fault-tolerant real-time
scheduling using passive replicas", Proc. of the Pacific Rim
International Symposium on Fault-Tolerant Systems, 1997.
M. Caccamo, G. Buttazzo, "Optimal scheduling for fault-tolerant and
firm real-time systems", Proc. IEEE Conference on Real-Time Computing
Systems and Applications, Hiroshima, Japan, 1998.
M. Caccamo, G. Buttazzo, L. Sha, "Capacity sharing for overrun
control", Proc. IEEE Real-Time Systems Symposium, Orlando, FL, 2000.
C. Dima, A. Girault, C. Lavarenne, Y. Sorel, "Off-line
real-time fault-tolerant scheduling", Proc. of the Ninth Euromicro Workshop
on Parallel and Distributed Processing, Mantova, Italy, 2001.
J. Aguilar, M. Hernandez, "Fault tolerance protocols for
parallel programs based on tasks replication", Proc. of MASCOTS, San
Francisco, CA, 2000.
More details can be found in the research plan.
For questions or comments, contact pinello at eecs.berkeley.edu.