Efficient development of real-time fault-tolerant controllers

Claudio Pinello, University of California at Berkeley
Luigi Palopoli, Scuola Superiore S.Anna Pisa, Italy
Alberto Sangiovanni-Vincentelli, University of California at Berkeley

This research plan deals with the synthesis of real-time embedded controllers, taking into account constraints and design goals pertaining both to the domain of the control application and to that of its implementation. Particular attention is given to real-time scheduling and fault tolerance, two critical requirements of an emerging class of embedded applications (e.g. "by-wire" automotive controllers). We believe that systematic approaches in this discipline would enable shorter development cycles and more efficient designs. Designers should concentrate on high-level decisions and rely on extensive support from synthesis and analysis tools to take care of detailed tuning and validation. Most of the detailed decisions should be derived automatically as solutions to optimization problems, or should result from synthesis based on rich libraries of debugged components. Both flows should give sufficient insight to the designer and should allow for her guidance.

The envisioned methodology: an overview

A pictorial representation of the methodological approach we advocate is shown in Figure 1, where ovals represent activities, boxes represent artifacts or components, and round-cornered boxes represent design goals. An arrow drawn from an activity into a box denotes that the content of the box is a result of the activity. Conversely, arrows drawn from boxes denote inputs to activities. The graph also includes boxes that are not the result of any activity; these represent inputs from the designer. At first glance, it is possible to recognize that the envisioned approach is inspired by a common conception in modern embedded systems design: behavioral and architectural design should be, as much as possible, orthogonal activities.
By leveraging this principle, it is possible to transform a confused sequence of trial-and-error iterations into a well-defined engineering process, opening up new opportunities in the search for optimal performance/cost trade-offs [1].

Figure 1

3. Research tasks

The following tasks stem from this research activity:

- Control synthesis with computation constraints
- Synthesis of distributed fault-tolerant schedules

3.1 Control synthesis with computation constraints

In the context of our methodology, the behavioral design is expected to accomplish two distinct (and to some extent conflicting) goals. On the one hand, we want to select or devise a set of models and algorithms that ensure a good level of orthogonalization between the design of the behavior and that of the architecture. On the other hand, we want the information contained in the artifacts circulating between the different activities to be sufficiently rich to ensure that the design constraints are actually respected and the design goals fulfilled. We believe that classical design flows, based on a rigid separation between the work of the control engineer and that of the computer engineer, fail to achieve the latter goal. Since limitations related to the implementation platform are not taken into account during the early phases of the design, they manifest themselves only during the prototyping of the system, compelling the developers to undertake expensive trial-and-error iterations. More specifically, we want to investigate the following problems:

I. Closed-loop robustness under real-time schedulability constraints.

We conjecture that delays introduced by computation (e.g. in a time-triggered model of computation) may reduce system robustness with respect to unmodeled plant uncertainties and disturbances. When multiple systems share computation resources, schedulability issues impose a bound on the loop rates attainable for the different closed-loop systems. In our view, this can be phrased as an optimization problem where loop rates and gains are decision variables and robustness appears in the cost function.
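
As an illustration only, the optimization view above can be sketched in Python under strong simplifying assumptions that are not part of this research plan: each loop i has a worst-case execution time C_i, its robustness cost is taken to grow linearly with its period (w_i * T_i, with hypothetical weights w_i), controller gains are left out, and schedulability is checked with the classical EDF utilization bound for one processor (the sum of C_i / T_i must not exceed 1). Under those assumptions the Lagrange conditions give the periods in closed form.

```python
import math


def optimal_periods(wcets, weights, utilization_bound=1.0):
    """Periods minimizing sum(w_i * T_i) subject to sum(C_i / T_i) == bound.

    Hypothetical cost model: the linear cost w_i * T_i is an assumption made
    for this sketch, not a robustness measure taken from the research plan.
    Stationarity gives T_i = sqrt(C_i / w_i) * S / bound,
    with S = sum_j sqrt(C_j * w_j).
    """
    if not wcets or len(wcets) != len(weights):
        raise ValueError("wcets and weights must be non-empty and equally long")
    if any(c <= 0 for c in wcets) or any(w <= 0 for w in weights):
        raise ValueError("execution times and weights must be positive")
    s = sum(math.sqrt(c * w) for c, w in zip(wcets, weights))
    return [math.sqrt(c / w) * s / utilization_bound
            for c, w in zip(wcets, weights)]


def utilization(wcets, periods):
    """Processor utilization of periodic tasks with the given periods."""
    return sum(c / t for c, t in zip(wcets, periods))


if __name__ == "__main__":
    wcets = [1.0, 2.0]     # worst-case execution times (ms), made-up values
    weights = [1.0, 1.0]   # hypothetical sensitivity of each loop to its period
    periods = optimal_periods(wcets, weights)
    print(periods, utilization(wcets, periods))
```

A realistic formulation would replace the linear cost with a robustness margin computed from the plant and controller models, and would generally need a numerical solver rather than a closed form.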

II. Combined effects of computation/bit-rate constraints on control quality.

Bit-rate constraints are commonplace in distributed control systems. The problem of stabilizing bandwidth-constrained systems is addressed in [2], while the state estimation problem for linear systems under bit-rate constraints is analyzed in [3][4]. A comprehensive framework in which observability, stabilizability and controllability are addressed is given in [5]. We want to extend these results by taking into account the effects of delays due to computation and to the scheduling of multiple activities hosted on the same processor. In our view, this could be the first step toward a unifying theory of control under general resource constraints.
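
To make the notion of a bit-rate constraint concrete, the following Python sketch simulates a textbook-style scheme for a scalar plant x[k+1] = a*x[k] + u[k], where the controller receives only a fixed number of bits per step. The plant, the uniform quantizer and the function name are assumptions chosen for illustration; they are not the constructions of [2]-[5]. In this particular scheme the interval known to contain the state is multiplied by |a| / 2^bits at every step, so it shrinks only when 2^bits exceeds |a|.

```python
def simulate_quantized_loop(a, bits, x0, bound, steps):
    """Simulate x[k+1] = a*x[k] + u[k] with `bits` bits of feedback per step.

    Assumes |x0| <= bound. Returns the list of |x[k]| for k = 1..steps.
    """
    if abs(x0) > bound:
        raise ValueError("initial state must lie inside [-bound, bound]")
    levels = 2 ** bits
    x = x0
    history = []
    for _ in range(steps):
        # Encoder: split [-bound, bound] into `levels` equal cells and
        # transmit the index of the cell that contains x.
        width = 2.0 * bound / levels
        index = min(int((x + bound) / width), levels - 1)
        center = -bound + (index + 0.5) * width
        # Controller: cancel the predicted effect of the cell center.
        u = -a * center
        x = a * x + u                   # equals a * (x - center)
        # The state is now known to lie within |a| * width / 2 of zero.
        bound = abs(a) * width / 2.0
        history.append(abs(x))
    return history


if __name__ == "__main__":
    # Unstable plant (a = 3) with 2 bits per step: 3 / 4 < 1, so |x| decays.
    print(simulate_quantized_loop(a=3.0, bits=2, x0=0.9, bound=1.0, steps=10))
```

The sketch ignores computation and scheduling delays altogether, which are exactly the effects this task proposes to add to the picture.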

3.2 Synthesis of distributed fault-tolerant schedules

Some applications are so critical that they need to be resilient to faults in the computing architecture. Typically this is achieved by redundantly scheduling the application on the architecture. Starting from a network of processes, some or all of the processes and the data they exchange are replicated. Additional processes may be needed to vote on the results of different replicas of the same process in order to establish a common result: the consensus problem [6]. Then an assignment and schedule of the augmented network onto the distributed architecture must be devised.
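
The voting step mentioned above can be sketched, in its simplest form, as a strict-majority voter over the outputs of the replicas. This Python fragment is only an illustration: the function name is hypothetical, outputs are assumed to be directly comparable values, and the voter itself is assumed fault-free, whereas the general consensus problem surveyed in [6] has to deal with faulty or disagreeing participants.

```python
from collections import Counter


def majority_vote(replica_outputs):
    """Return the value produced by a strict majority of replicas, else None."""
    if not replica_outputs:
        return None
    value, votes = Counter(replica_outputs).most_common(1)[0]
    # A strict majority is required, so one faulty replica out of three
    # cannot impose its result.
    if 2 * votes > len(replica_outputs):
        return value
    return None


if __name__ == "__main__":
    print(majority_vote([5, 5, 7]))   # one replica disagrees
    print(majority_vote([1, 2, 3]))   # no majority
```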

It seems profitable to relieve the designer of the burden of devising a fault-tolerant distributed schedule, and to opt instead for an approach based on synthesis.

In order to obtain efficient utilization of resources, we want to allow a flexible use of passive replicas (replicas of a process that run only when the main replica undergoes a fault). Preliminary results have shown the usefulness of this technique in achieving higher schedulability by "reclaiming" resources from non-running replicas [7][8]. A further avenue of improvement may arise in the context of gracefully degrading applications, where replicas are not an exact copy of the original process. Rather, they may be simpler versions with reduced functionality and/or accuracy, and likely lower resource requirements [9]. This exposes an opportunity to achieve higher schedulability by requiring strong fault resilience only of the lightweight versions.
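
The benefit of reclaiming resources from non-running replicas can be illustrated with a small utilization calculation. The Python sketch below rests on assumptions that are not taken from [7][8]: at most one processor fails, each task has one primary and one backup on different processors, a backup needs the same utilization as its primary, and per-processor schedulability is judged only by the EDF utilization bound (load at most 1), ignoring fault-detection latency and deadlines after recovery. With active replication every backup always runs; with passive replication a processor only has to budget for the backups triggered by the worst single failure.

```python
def worst_case_load(tasks, processors, passive):
    """Worst-case utilization per processor under a single-processor-fault model.

    tasks: list of (utilization, primary_processor, backup_processor).
    passive=False: every backup runs all the time (active replication).
    passive=True:  a backup runs only if its primary's processor has failed.
    """
    load = {}
    for p in processors:
        own = sum(u for u, prim, _ in tasks if prim == p)
        if passive:
            # Only one processor q can fail, so the backups hosted on p that
            # protect different processors never run together.
            backup = max(
                (sum(u for u, prim, back in tasks if back == p and prim == q)
                 for q in processors if q != p),
                default=0.0,
            )
        else:
            backup = sum(u for u, _, back in tasks if back == p)
        load[p] = own + backup
    return load


if __name__ == "__main__":
    processors = ["A", "B", "C"]
    # Made-up task set: (utilization, primary, backup)
    tasks = [(0.4, "A", "C"), (0.4, "B", "C"), (0.4, "C", "A")]
    print(worst_case_load(tasks, processors, passive=False))
    print(worst_case_load(tasks, processors, passive=True))
```

In this made-up example processor C is overloaded with active replicas but stays within the bound with passive ones, because the two backups it hosts protect different processors.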

Moreover, we want to allow general architectures, removing the strict boundaries of the modules and buses found in the Time-Triggered Architecture (TTA). This also enables more general fault models for the communication subsystem. The resulting architecture is a full-fledged distributed multiprocessor system, where each node can be used in its own right and not as a mere duplicate of another one. All the parallelism in the hardware can then be exploited to speed up the execution of parallel processes of the application [10][11] without affecting the degree of fault tolerance. We note that most of the results cited above have been derived under very restrictive assumptions on the fault model. We believe some of their founding principles can be rephrased in a more general framework. The expected outcome of this research is a systematization of a set of design techniques that allows easy exploration of the design alternatives arising from different fault models.

Publications:

[PPSEB] Luigi Palopoli, Claudio Pinello, Alberto Sangiovanni-Vincentelli, Laurent Elghaoui, Antonio Bicchi, "Synthesis of robust control systems under resource constraints", to appear in Lecture Notes in Computer Science, proceedings of Hybrid Systems: Computation and Control, March 2002.

Bibliography:

[1] F. Balarin and others, "Hardware-Software Co-Design of Embedded Systems: The POLIS Approach", Kluwer Academic Publishers, 1997.

[2] W.S. Wong, R. Brockett, "Systems with finite bandwidth constraints - part II: Stabilization with limited information feedback", IEEE Trans. on Automatic Control, 1999, Vol 44, N.5.

[3] W.S. Wong, R. Brockett, "Systems with finite bandwidth constraints - part I: State estimation problems", IEEE Trans. on Automatic Control, 1997, Vol 42, N.9.

[4] G. N. Nair, R. J. Evans, "State estimation under bit rate constraints", Proc. of the 37th IEEE Conference on Decision and Control, 1998.

[5] S. Tatikonda, S. Mitter, "Control Under Communication Constraints", MIT PhD Thesis, August 2000.

[6] M. Barborak, M. Malek, A. Dahbura, "The consensus problem in fault-tolerant computing", ACM Computing Surveys, 1993, Vol 25, N.2.

[7] KapDae Ahn, Jong Kim, SungJe Hong, "Fault-tolerant real-time scheduling using passive replicas", Proceedings of Pacific Rim International Symposium on Fault-Tolerant Systems, 1997.

[8] M. Caccamo, G. Buttazzo, "Optimal scheduling for fault-tolerant and firm real-time systems", Proc. IEEE Conference on Real-Time Computing Systems and Applications, Hiroshima, Japan, 1998.

[9] M. Caccamo, G. Buttazzo, L. Sha, "Capacity sharing for overrun control", Proc. IEEE Real-Time Systems Symposium, Orlando FL, 2000.

[10] C. Dima, A. Girault, C. Lavarenne, and Y. Sorel, "Off-line real-time fault-tolerant scheduling", Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing, Mantova, Italy, 2001.

[11] J. Aguilar and M. Hernandez, "Fault tolerance protocols for parallel programs based on tasks replication", Proceedings of MASCOTS, San Francisco, CA, 2000.

More details can be found in the research plan.

For questions or comments, contact pinello at eecs.berkeley.edu.
