

16 PN Domain

Author: Mudit Goel

16.1 Introduction

The process networks (PN) domain in Ptolemy II models a system as a network of sequential processes, implemented as Java threads [65], that communicate by passing messages through unidirectional first-in-first-out (FIFO) channels. A process blocks when trying to read from an empty channel until a message becomes available on it. This model of computation is deterministic in the sense that if the processes are deterministic and communicate only via the channels, then the sequence of values communicated on the channels is completely determined by the model.

PN is a natural model for describing signal processing systems where infinite streams of data samples are incrementally transformed by a collection of processes executing in parallel. Embedded signal processing systems are typically designed to operate indefinitely with limited resources. Thus, we want to execute process network programs forever with bounded buffering on the communication channels whenever possible.

PN can also be used to model the concurrency in the various hardware components of an embedded system. The original process networks model of computation can model the functional behavior of these systems and test them for their functional correctness, but it cannot directly model their real-time behavior. To address this, the PN domain extends the model by introducing a notion of time.

In addition, some systems might display adaptive behavior, such as migrating code, agents, and the arrival and departure of processes. To support this, we provide a mutation mechanism that allows processes and channels to be added, deleted, and changed during execution. With untimed PN, mutations might introduce non-determinism, while with timed PN they are deterministic.

The PN model of computation is a superset of the synchronous dataflow model of computation (see the SDF Domain chapter). Thus any SDF actor can be used in the PN domain, as can any domain-polymorphic actor. A separate process is created for each of these actors.

The software architecture for PN is described in section 16.3 and the finer technical details are explained in section 16.4.

16.2 Process Network Semantics

16.2.1 Asynchronous Communication

Kahn and MacQueen [40][41] describe a model of computation where processes are connected by communication channels to form a network. Processes produce data elements or tokens and send them along a unidirectional communication channel where they are stored in a FIFO queue until the destination process consumes them. This is a form of asynchronous communication between processes. Communication channels are the only method processes may use to exchange information. A set of processes that communicate through a network of FIFO queues defines a program.

Kahn and MacQueen require that execution of a process be suspended when it attempts to get data from an empty input channel (blocking reads). A process may not poll the channel for presence or absence of data. At any given point, a process is either doing some computation (enabled) or it is blocked waiting for data (read blocked) on exactly one of its input channels; it cannot wait for data from more than one channel simultaneously. Systems that obey this model are determinate; the history of tokens produced on the communication channels does not depend on the execution order. Therefore, the results produced by executing a program are not affected by the scheduling of the various processes.

If all the processes in a model are blocked while trying to read from some channel, then we have a real deadlock: none of the processes can make any further progress. The determinacy of the program also guarantees that a real deadlock is a property of the program state and does not depend on the schedule of the various processes in the model.

16.2.2 Bounded Memory Execution

The high level of concurrency in process networks makes it an ideal match for embedded system software and for modeling hardware implementations. But the Kahn-MacQueen semantics do not guarantee bounded memory execution of process networks even if it is possible for the application to execute in bounded memory. Most real-time embedded applications and hardware processes are intended to run indefinitely with a limited amount of memory. Thus bounded memory execution of process networks becomes crucial for its usefulness for hardware and embedded software.

Parks [68] addresses this aspect of process networks and provides an algorithm to make a process network application execute in bounded memory whenever possible. He provides an implementation of the Kahn-MacQueen semantics using blocking writes that assigns a fixed capacity to each FIFO channel and forces processes to block temporarily if a channel is full. To avoid introducing deadlock, these capacities are increased if absolutely necessary. A process may not poll the channel for room. Thus a process has three states now: running (executing), read blocked, or write blocked.

Deadlocks can now occur when all processes are blocked either on a read or on a write to a channel. If all the processes in a model are blocked and at least one process is blocked on a write, then we have an artificial deadlock. On detection of an artificial deadlock, Parks chooses the channel with the smallest capacity among the channels on which processes are blocked on a write and increases its capacity to break the deadlock.

16.2.3 Time

The process networks model of computation lacks a notion of time. In many real-time systems and embedded applications, however, the real-time behavior of a system is as important as its functional correctness. Developers could use process networks to test applications for functional correctness and some other timed model of computation, such as DE, to test their timing behavior. Introducing a notion of time into the process networks model of computation is therefore a natural extension, and this is done in the PN domain in Ptolemy II.

In the PN domain, time is global. That is, all processes in a model share the same time, referred to as the current time or model time. A process can explicitly wait for time to advance. It can choose to delay itself for some period from the current time. When a process delays itself for some length of time from the current time, it is suspended until time has sufficiently advanced, at which stage it wakes up and continues. If the process delays itself for zero time, this will have no effect and the process will continue executing.

For a process, time cannot change during its normal execution (i.e., when the process is enabled). Time for a process may advance in only one of the following two states:

  1. The process is delayed and is explicitly waiting for time to advance (delay block).

  2. The process is waiting for data to arrive on one of its input channels (read block).

Time is advanced when all the processes are blocked on either a delay or a read from a channel, with at least one process delayed. This state of the program is called a timed deadlock. In case of a timed deadlock, the current time is advanced just enough to wake up at least one process.

A process can be aware of the global time, but it cannot influence the current time except by delaying itself. This model of time is influenced by Pamela [27], a run-time library that is used to model parallel programs.
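To make the timed-deadlock rule concrete, the following self-contained sketch shows how a director might advance time when such a deadlock is detected. This is an illustration, not actual Ptolemy II code; all names here are invented for the example.

import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative sketch of timed-deadlock resolution (not actual
// Ptolemy II code). When every process is blocked on a read or a
// delay, and at least one is delayed, model time advances to the
// earliest pending wakeup time, which resumes at least one process.
class TimedDeadlockSketch {
    private double _currentTime = 0.0;
    private final SortedSet<Double> _wakeupTimes = new TreeSet<Double>();

    // Called by a process that delays itself for 'delay' units of time.
    synchronized void registerDelay(double delay) {
        _wakeupTimes.add(_currentTime + delay);
    }

    // Called when a timed deadlock is detected.
    synchronized void resolveTimedDeadlock() {
        // Advance current time just enough to wake the earliest
        // delayed process, then remove its pending wakeup time.
        _currentTime = _wakeupTimes.first();
        _wakeupTimes.remove(_currentTime);
        // The director would now notify the processes whose wakeup
        // time has arrived.
    }
}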

16.2.4 Mutations

The PN domain tolerates mutations, which are run-time changes in the model structure. Normally, mutations are realized as change requests queued with the director or the manager. In timed PN, requests for mutations are not processed until there is a timed deadlock. Because the occurrence of a timed deadlock is determinate, mutations in timed PN are determinate.

In untimed PN, there is no convenient determinate point at which mutations can occur; the only determinate point is a real deadlock. Performing mutations only at that point is unlikely to be useful, because a real deadlock might never occur: a model with even one non-terminating source never experiences a real deadlock. Thus in untimed PN, mutations are performed as soon as they are requested (if they are queued with the director) or when a real deadlock occurs (if they are queued with the manager). Since different schedules result in different states of the model when the mutations are performed, the former introduces non-determinism into untimed PN. The details of the implementation are presented in section 16.3.

16.3 The PN Software Architecture

16.3.1 PN Domain

Ptolemy II is modular and is divided into packages, each of which provides a separate piece of functionality. The abstract syntax is separated from the mechanisms that attach semantics. In PN, the package that attaches the process networks semantics is ptolemy.domains.pn.kernel. A UML static structure diagram is shown in figure 16.1 (see appendix A of chapter 1). This section discusses the implementation in detail.

16.3.2 The Execution Sequence

Director

In process networks, each node of the graph is a separate process. In the PN domain in Ptolemy II, this is achieved by giving each actor its own thread of execution. These threads are based on native Java threads [65][43] and are instances of ptolemy.actors.ProcessThread.

BasePNDirector:

This is the base class for directors that govern the execution of a CompositeActor with Kahn-MacQueen process networks (PN) semantics. This director does not support mutations or a notion of model time. It provides blocking reads and, using blocking writes, bounded memory execution whenever possible. Thus it is capable of handling both real and artificial deadlocks.

The first step in the execution is the call to the initialize() method of the director. This method creates the receivers in the input ports of the actors for all the channels and creates a thread for each actor. It initializes all the actors in the graph. It also sets the count of active actors in the model, which is required for detection of deadlocks and termination, to the number of actors in the composite actor.

The next stage consists of iterations. An iteration starts with a call to the prefire() method of the director. This method starts all the threads that were created for the actors in the initialize() method of the director. In PN, this method always returns true.

The fire() method of the director is called next. In PN, the fire() method is responsible for handling deadlocks. This director resolves artificial deadlocks as soon as they arise according to Parks's algorithm as explained in section 16.2.2. On detection of a real deadlock, the method returns.

The last stage of the iteration cycle of the director is the call to the postfire() method. This method returns false if the composite actor containing the director has no input ports. Otherwise it returns true. Returning true implies that if some new data is provided to the composite actor on the input ports, then the execution can resume. If it returns false, then this composite actor will not be fired again. In such a case the executive director or the manager will call the wrapup() method of the top-level composite actor. This in turn calls the wrapup() method of the director. The director then terminates the execution of the composite actor. Details of termination are discussed in section 16.3.4.
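As a concrete illustration, the following sketch builds and runs a two-actor PN model. It assumes the current Ptolemy II class names and actor library (TypedCompositeActor, Manager, Ramp with its firingCountLimit parameter, and Display); these details may differ in other releases.

import ptolemy.actor.Manager;
import ptolemy.actor.TypedCompositeActor;
import ptolemy.actor.lib.Ramp;
import ptolemy.actor.lib.gui.Display;
import ptolemy.domains.pn.kernel.PNDirector;

public class PNExample {
    public static void main(String[] args) throws Exception {
        TypedCompositeActor top = new TypedCompositeActor();
        top.setName("top");
        // Attaching a PN director gives the composite PN semantics.
        new PNDirector(top, "director");

        // Each of these actors executes in its own thread.
        Ramp ramp = new Ramp(top, "ramp");
        ramp.firingCountLimit.setExpression("10"); // stop after 10 firings
        Display display = new Display(top, "display");
        top.connect(ramp.output, display.input);

        // The manager drives the execution sequence described above:
        // initialize(), then repeated prefire()/fire()/postfire().
        Manager manager = new Manager(top.workspace(), "manager");
        top.setManager(manager);
        manager.execute();
    }
}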

PNDirector:

PNDirector is the same as BasePNDirector with one additional capability: it supports mutations of a graph. The mutations are processed as soon as they are requested, so the point at which they are processed depends on the schedule of the threads in the model. Thus these mutations might introduce non-determinism into the model.

TimedPNDirector:

TimedPNDirector has two capabilities that distinguish it from BasePNDirector: it introduces a notion of global time to the model, and it allows deterministic mutations. Mutations are performed at the earliest timed deadlock that occurs after they are queued. Since the occurrence of a timed deadlock is completely deterministic, performing mutations at this point makes them deterministic.

Execution of Actors

As mentioned earlier, a separate thread is responsible for the execution of each actor in PN. This thread is started in the prefire() method of the director. Once started, the thread repeatedly calls the prefire(), fire(), and postfire() methods of the actor. This sequence continues until the prefire() or postfire() method returns false. Thus the only way for an actor to terminate gracefully in PN is to return from its fire() method and then return false from its postfire() or prefire() method. When an actor finishes execution in this way, the thread calls the wrapup() method of the actor. Once this method returns, the thread informs the director of the actor's termination and finishes its own execution. The actor is not fired again unless the director creates and starts a new thread for it. Note also that if an actor returns false from its prefire() method the first time it is called, the actor is never fired in PN.
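This loop can be summarized by the following self-contained sketch. It is an approximation of ptolemy.actors.ProcessThread, not its actual source; the real thread also handles pausing, exceptions, and notification of the director, and SimpleActor here is merely a stand-in for the Ptolemy II actor interface.

// A stand-in for the actor interface used by the sketch below.
interface SimpleActor {
    boolean prefire();
    void fire();
    boolean postfire();
    void wrapup();
}

// Approximation of the per-actor thread loop described above.
class ActorThread extends Thread {
    private final SimpleActor _actor;

    ActorThread(SimpleActor actor) {
        _actor = actor;
    }

    public void run() {
        // Iterate until prefire() or postfire() returns false.
        boolean iterate = true;
        while (iterate) {
            if (_actor.prefire()) {
                _actor.fire();
                iterate = _actor.postfire();
            } else {
                iterate = false;
            }
        }
        // Graceful termination: wrap up the actor; the real thread
        // would then inform the director that this actor is done.
        _actor.wrapup();
    }
}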

Message Passing

Recall that in Ptolemy II, data transfer between entities is achieved using ports and the receivers embedded in the input ports. Each receiver in an input port is capable of receiving messages from a distinct channel.

An actor calls the send() or broadcast() method on its output port to transmit a token to a remote actor. The port obtains a reference to a remote receiver (via the relation connecting them) and calls the put() method of the receiver, passing it the token. The destination actor retrieves the token by calling the get() method of its input port, which in turn calls the get() method of the designated receiver.

Both the get() and send() methods of the port take an integer index as an argument, which the actor uses to distinguish between the different channels its port is connected to. This index specifies the channel to which data is sent or from which it is received. If the port is connected to a single channel, then the index is 0. But if the port is connected to more than one channel (a multiport), say N channels, then the index ranges from 0 to N-1. The broadcast() method of the port does not require an index, as it transmits the token to all the channels the port is connected to.
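For example, an actor's fire() method might use these calls as in the following sketch (assuming the TypedIOPort API, with input a two-channel multiport and output an output port of the actor):

public void fire() throws IllegalActionException {
    // Blocking read from channel 0; in PN the calling thread
    // suspends until a token is available.
    Token first = input.get(0);
    // Blocking read from channel 1 of the same multiport.
    Token second = input.get(1);
    // Send on channel 0 of the output port; in PN this may block
    // if the destination FIFO queue is full to capacity.
    output.send(0, first);
    // Transmit to every channel the output port is connected to.
    output.broadcast(second);
}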

In the PN domain, these receivers are instances of ptolemy.domains.pn.kernel.PNQueueReceiver. Each receiver contains a FIFO queue that provides the functionality of a FIFO channel in a process networks graph. These receivers are also responsible for implementing the blocking reads and blocking writes, which they do in their get() and put() methods. These methods are shown in figures 16.2 and 16.3.

The get() method checks whether the FIFO queue contains any tokens. If not, it increments the director's count of the number of actors blocked on a read and sets its _readpending flag to true. The calling thread is then suspended until some actor puts a token into the FIFO queue and sets the _readpending flag of this receiver to false. (This is done in the put() method, as described below.) On resuming, the thread reads and removes the first token from the FIFO queue. If some process is blocked on a write to this receiver (because the FIFO queue is full to capacity), the get() method unblocks that process, notifies it, and returns. This method also handles the termination of the simulation, as explained in section 16.3.4.

The put() method of the receiver is responsible for implementing the blocking writes. This method checks whether the FIFO queue is full to capacity. If it is, it sets its _writepending flag to true and informs the director that a process is blocked on a write. It then suspends the calling process until some other thread wakes it up after setting the _writepending flag to false. After this, it puts the token into the FIFO queue and checks whether some process is blocked on a read from this receiver. If so, it unblocks that process and informs it that a new token is available to read. Then the method returns.
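The essence of these two methods is captured by the following self-contained sketch. It illustrates only the blocking discipline; the real PNQueueReceiver additionally informs the director of each block and unblock (so that deadlocks can be detected) and handles termination, and the names here do not match the actual fields.

import java.util.ArrayDeque;
import java.util.Queue;

// Simplified illustration of a bounded FIFO channel with blocking
// reads and blocking writes (not the actual PNQueueReceiver source).
class BlockingFifo<T> {
    private final Queue<T> _queue = new ArrayDeque<T>();
    private int _capacity;

    BlockingFifo(int capacity) {
        _capacity = capacity;
    }

    // Blocking read: suspend while the queue is empty.
    public synchronized T get() throws InterruptedException {
        while (_queue.isEmpty()) {
            wait(); // read-blocked
        }
        T token = _queue.remove();
        notifyAll(); // wake any write-blocked producer
        return token;
    }

    // Blocking write: suspend while the queue is full to capacity.
    public synchronized void put(T token) throws InterruptedException {
        while (_queue.size() >= _capacity) {
            wait(); // write-blocked
        }
        _queue.add(token);
        notifyAll(); // wake any read-blocked consumer
    }

    // Called by the director to break an artificial deadlock.
    public synchronized void grow() {
        _capacity++;
        notifyAll();
    }
}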

16.3.3 Detecting Deadlocks

The mechanism for detecting deadlocks in the Ptolemy II implementation of PN is based on the mechanism suggested in [42]. This mechanism requires keeping count of the number of threads currently active, paused, and blocked in the simulation. The count of threads currently active in the graph is incremented by a call to the _increaseActiveCount() method of the director. This method is called whenever a new thread corresponding to an actor is created in the simulation. The corresponding method for decreasing the count of active actors (on termination of a process) is _decreaseActiveCount().

Whenever an actor blocks on a read from a channel, the count of actors blocked on a read is incremented by calling the _informOfReadBlock() method of the director. Similarly, the number of actors blocked on a write is incremented by a call to the _informOfWriteBlock() method of the director. The corresponding methods for decreasing the counts of actors blocked on a read or a write are _informOfReadUnblock() and _informOfWriteUnblock(), respectively. These methods are called from instances of the PNQueueReceiver class when an actor tries to read from or write to a channel. Similarly, when a process queues a mutation, it informs the director by a call to the _informOfMutationBlock() method.

Every time an actor blocks, the director checks for a deadlock. If the total number of actors blocked or paused equals the total number of actors active in the simulation, a deadlock is detected. On detection of a deadlock, if one or more actors are blocked on a write, then this is an artificial deadlock. The channel with the smallest capacity among all the channels with actors blocked on a write is chosen and its capacity is incremented by 1. This implements the bounded memory execution as suggested by [68]. If a real deadlock is detected at the top-level composite actor, then the manager terminates the simulation.
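Schematically, the check performed each time an actor blocks looks like the following sketch; the counter and helper names here are illustrative rather than the director's actual ones.

// Illustrative sketch of the deadlock check described above (not the
// actual director source).
private synchronized void _checkForDeadlock() {
    int blocked = _readBlockedCount + _writeBlockedCount + _pausedCount;
    if (blocked >= _activeCount) {
        if (_writeBlockedCount > 0) {
            // Artificial deadlock: increment the capacity of the
            // smallest full queue among those with write-blocked
            // actors (Parks's algorithm).
            _incrementSmallestWriteBlockedQueue();
        } else {
            // Real deadlock: cause fire() to return; at the top
            // level, the manager then terminates the simulation.
            _handleRealDeadlock();
        }
    }
}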

16.3.4 Terminating the Model

A simulation can be ended (on detection of a real deadlock) by calling the wrapup() method on either the top-level composite actor or the corresponding director. This method is normally called by the manager on the top-level composite actor. In PN, this method traverses the topology of the graph and calls the setFinish() method of the receivers in the input ports of all the actors. Since this method is called only when a real deadlock is detected, one can be sure that all the active actors in the simulation are currently blocked on a read from a channel and are waiting in the call to the get() method of a receiver. This fact is used to wrap up the simulation. The setFinish() method of the receiver sets the termination flag to true and wakes up all the threads currently waiting in the get() method of the receiver (figure 16.4). This is implemented using the wait()/notifyAll() mechanism of Java [65][43]. Once these threads wake up, they see that the termination flag is set, which causes the get() method of the receivers to throw a TerminateProcessException (a runtime exception in Ptolemy II). This exception is never caught in any of the actor methods and eventually propagates to the process thread. The thread catches this runtime exception, calls the wrapup() method of the actor, and finishes its execution. After all threads catch this exception and finish executing, the simulation ends.

16.3.5 Mutations of a Graph

The PN domain in Ptolemy II allows graphs to mutate during execution. This means that old processes and channels can disappear from the graph and new processes and channels can be created during the simulation.

Though other domains, like SDF, also support mutations in their graphs, there is an important difference. In domains like SDF, mutations can occur only between iterations. This keeps the simulation determinate, as changes to the topology occur only at a fixed point in the execution cycle. In PN, the execution of a graph is not centralized, and hence the notion of an iteration is quite difficult to define. Thus, in PN, we let mutations happen as soon as they are requested, if they are queued with the director rather than the manager. This is the behavior of PNDirector. (TimedPNDirector performs mutations only when there is a timed deadlock; mutations in this form are deterministic.) The point in the execution at which mutations occur normally depends on the schedule of the underlying Java threads. Under certain conditions, where the application can guarantee a fixed point in the execution cycle for mutations or where the mutations are localized, the results can still be determinate.

With TimedPNDirector, all mutations are deterministic, as requests to perform mutations are not processed until there is a timed deadlock. Since the occurrence of a timed deadlock does not depend on the schedule of the underlying threads, the mutations are completely deterministic.

An actor can request a mutation by creating an instance of a class derived from ptolemy.kernel.event.ChangeRequest. It should override the execute() method to include the commands that perform the mutation.
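A mutation request therefore has roughly the following shape. This is a sketch only: the constructor arguments of ChangeRequest and the name of the method used to queue the request with the director vary between releases and are assumptions here.

// Sketch of a mutation request; the constructor arguments and the
// queueing method name below are assumptions, not the exact API.
ChangeRequest request = new ChangeRequest(this, "add a new actor") {
    public void execute() throws Exception {
        // Commands that perform the mutation go here, for example
        // creating a new actor in the container and connecting it:
        //     MyActor actor = new MyActor(_container, "newActor");
        //     _container.connect(someOutputPort, actor.input);
    }
};
_director.queueChangeRequest(request); // illustrative queueing call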

16.4 Technical Details

There are two main issues that a developer should be aware of while extending PN. The first one is to get the mutual exclusion right and the second is to avoid undetected deadlocks.

16.4.1 Mutual Exclusion using Monitors

In PN, threads interact in various ways, for message passing, deadlock detection, and so on. This requires various threads to access the same data structures. Concurrency can easily lead to inconsistent states, as one thread could access a data structure while it is being modified by another thread. This can result in race conditions and undesired deadlocks [4]. To prevent this, Java provides a low-level mechanism called a monitor to enforce mutual exclusion. Monitors are invoked in Java using the synchronized keyword. A block of code can be synchronized on a monitor lock as follows:

synchronized (obj) {
    ... // Part of the code that requires an exclusive lock on obj.
}

This implies that if a thread wants to execute the synchronized part of the code, then it has to grab an exclusive lock on the monitor object, obj. Also, while this thread holds the lock on the monitor, no other thread can access any code that is synchronized on the same monitor.

There are many actions (like mutations) that could affect the consistency of more than one object, such as the director and receivers. Java does not provide a mechanism to acquire multiple locks simultaneously. Acquiring locks sequentially is not good enough as this could lead to deadlocks. For example, consider a thread trying to acquire locks on objects a and b in that order. Another thread might try to obtain locks on the same objects in the opposite order. The first thread acquires a lock on a and stalls to acquire a lock on b, while the second thread acquires a lock on b and waits to grab a lock on a. Both threads stall indefinitely and the application is deadlocked.
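This scenario is easy to reproduce. The following self-contained program acquires the two monitors in opposite orders from two threads and will usually hang:

// Demonstration of the lock-ordering deadlock described above.
public class LockOrderDeadlock {
    static final Object a = new Object();
    static final Object b = new Object();

    public static void main(String[] args) {
        Thread first = new Thread(() -> {
            synchronized (a) {
                sleep(100); // make the bad interleaving likely
                synchronized (b) { } // stalls: the other thread holds b
            }
        });
        Thread second = new Thread(() -> {
            synchronized (b) {
                sleep(100);
                synchronized (a) { } // stalls: the other thread holds a
            }
        });
        first.start();
        second.start();
    }

    static void sleep(long milliseconds) {
        try {
            Thread.sleep(milliseconds);
        } catch (InterruptedException e) {
            // Ignored in this demonstration.
        }
    }
}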

The main problem in the above example is that different threads try to acquire locks in conflicting orders. One possible solution is to define an order or hierarchy of locks and require all threads to grab the locks in the same top-down order [43]. In the above example, we could force all the threads to acquire locks in a strict order, say a followed by b. If all the code that requires synchronization respects this order, then this strategy can work, with some additional constraints such as making the order on locks immutable. Although this strategy can work, it might not be very efficient and can make the code much less readable. Also, Java does not offer an easy and straightforward way of implementing it.

We follow a similar but simpler strategy in the PN domain of Ptolemy II. We define a three-level strict hierarchy of locks, with the lowest level being the director, the middle level being the various receivers, and the highest level being the workspace. The rule that all threads must respect after acquiring their first lock is never to try to acquire a lock at a higher level than, or at the same level as, their first lock. Specifically, a block of code synchronized on the director should not try to access a block of code that requires a lock on either the workspace or any of the receivers. Likewise, a block of code synchronized on a receiver should not try to call a block of code synchronized on either the workspace or any other receiver.

Some discussion about these locks in PN is presented in the following section.

16.4.2 Hierarchy of Locks

The highest level in the hierarchy of locks is the Workspace, a class defined specifically for this purpose. This level of synchronization, though, is quite different from the other two. It is modeled explicitly in Ptolemy II as a layer of abstraction over the Java synchronization mechanism. The principle behind it is that if a thread wants to read the topology, it should read it only in a consistent state, and if a thread is modifying the topology, no other thread should try to read the topology, as it might be in an inconsistent state. To enforce this, we use a reader-writer mechanism to access the workspace (see the Kernel chapter). Any thread that wants to read the topology, but not modify it, requests read access on the workspace. If the thread already has read or write access on the workspace, it gets another read access immediately. Otherwise, if no thread is currently modifying the topology and no thread has requested write access on the workspace, the thread gets read access. If the thread cannot currently get read access, it stalls until it can. Similarly, if a thread requests write access on the workspace, it stalls until all other threads give up their read and write accesses to the workspace. Thus, though a thread never holds an exclusive lock on the workspace, this mechanism provides mutual exclusion between the activities of reading the topology and modifying it. This way of synchronizing on the workspace is distinctly different from possessing an exclusive lock on it.
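In code, this discipline looks like the following sketch. It assumes the Workspace methods getReadAccess()/doneReading() and getWriteAccess()/doneWriting(), which exist in Ptolemy II; the try/finally pattern ensures that the access is released even if an exception is thrown.

// Reading the topology: stalls while a writer is active or waiting.
try {
    _workspace.getReadAccess();
    // ... read the topology; it is guaranteed consistent here ...
} finally {
    _workspace.doneReading();
}

// Modifying the topology: stalls until all readers and writers are done.
try {
    _workspace.getWriteAccess();
    // ... modify the topology ...
} finally {
    _workspace.doneWriting();
}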

Once a thread has read or write access on the workspace, it can call methods or blocks of code that are synchronized on a single receiver or on the director. The receivers form the next level in the hierarchy of locks. Receivers are accessed by different threads (the reader and the writer of the queue) and need to be synchronized. For example, a writer thread might try to write to a receiver while another token is being read from it, which could leave the receiver in an inconsistent state. The state of a receiver might include the number of tokens in the queue, information about any process blocked on a read or a write to the receiver, and other information. The methods or blocks of code that access and modify the state of a receiver are therefore forced to get an exclusive lock on the receiver. These blocks might call methods that require a lock on the director, but they do not call methods that require a lock on any other receiver.

The lowest lock in the hierarchy is the PNDirector object. The director has some internal state variables, such as the number of processes blocked on a read, that are accessed and modified by different threads, and no more than one thread may access these variables at the same time. Since access to these variables is limited to the methods of the director, the blocks of code that modify these state variables obtain an exclusive lock on the director itself. These blocks should not try to access any block of code that requires an exclusive lock on the receivers or read or write access on the workspace.

16.4.3 Undetected Deadlocks

Undetected deadlocks must be avoided when extending the PN domain in Ptolemy II. Here we discuss a significant but subtle issue that a developer should be aware of: the release of locks by a suspended thread.

In Java, when a thread holding exclusive locks on multiple objects suspends by calling wait() on one of them, it releases the lock only on that object and keeps all its other locks. For example, consider a thread that holds locks on two objects, say a and b, and calls wait() on b; it releases the lock on b alone. If another thread requires a lock on a to perform whatever action the first thread is waiting for, then deadlock ensues: the second thread cannot get the lock on a until the first thread releases it, and the first thread cannot continue until the second thread acquires the lock on a and performs the action the first thread is waiting for.

This sort of scenario is currently avoided in PN by following some simple rules. The first is that a method or block synchronized on the director never calls wait() on any object; thus, once a thread grabs the lock on the director, it is guaranteed to release it. The second is that a block of code holding an exclusive lock on a receiver does not call the wait() method on the workspace. (Note that code should never synchronize directly on the workspace object and should always use the read and write access mechanism.) The third rule is that a thread should give up all its read permissions on the workspace before calling the wait() method on a receiver object. In the case of the workspace, we require this because of the explicit modeling of mutual exclusion between the read and write activities on the workspace: if a thread suspends without releasing its read permissions while a second thread requires write access on the workspace to perform the action the first thread is waiting for, a deadlock results. Also, to remain in a consistent state with respect to the number of read accesses on the workspace, the thread should regain those read accesses after returning from the call to wait(). For this, a wait(Object obj) method is provided in the class Workspace that releases all the read accesses to the workspace, calls wait() on the argument obj, and regains all the read accesses on waking up.
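Following the third rule, code inside a receiver typically uses the pattern in this sketch (the blocking condition shown is illustrative):

// Sketch of waiting on a receiver without holding workspace read
// accesses; _queue.isEmpty() stands for whatever blocking condition
// applies.
synchronized (this) { // 'this' is the receiver
    while (_queue.isEmpty()) {
        // Not this.wait(): that would retain any read accesses this
        // thread holds on the workspace and could cause an undetected
        // deadlock against a thread waiting for write access.
        _workspace.wait(this); // releases read accesses, waits on
                               // 'this', then reacquires them
    }
}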

The above rules guarantee that a deadlock does not occur in the PN domain because of contention for various locks.



