15.11 Declustering Scheduler

Declustering scheduler is the most elaborate scheduler developed by Sih . This algorithm only applies to the homogeneous multiprocessor targets and it does not support wormholes nor parallel stars. Since it takes into account the global information of the graph, it may overcome the weaknesses of list schedulers which consider only local information at each scheduling step. It turns out that this scheduling algorithm is very costly since it involves recursive executions of the list scheduler with various assignment, and choosing the best scheduling result. The scheduling routine was originally written in LISP for the Gabriel system and was translated into C++. Since the algorithm itself is very complicated, the reader of the code is highly encouraged to read Sih's paper on the scheduling algorithm.

Class DeclustScheduler is derived from class ParScheduler. It has a constructor with two arguments as the ParScheduler class. The subclass of ParProcessors used in DeclustScheduler is DCParProcs. The DeclustScheduler maintains two kinds of DCParProcs instances, one to save the best scheduling result so far, and the other is for retrying list scheduling whose results will be compared with the best result so far. These two DCParProcs are created in

void setUpProcs(int num);

They are deleted in the destructor.

StringList displaySchedule();

Displays the best scheduling result obtained so far.

The overall procedure of the declustering algorithm is:

(1) Make elementary clusters of nodes. To make elementary clusters, we examine the output arcs of all branch nodes (a branch node is a node with more than one output arc) and the input arcs of all merge nodes (a merge node is a node with more than one input arc). Those arcs are candidates to be cut to make clusters. An arc is cut if the introduced communication overhead can be compensated by exploiting parallelism.

(2) We make a hierarchy of clusters starting from elementary clusters up to one cluster which includes all nodes in the APEG. Clusters with the smallest work-load will be placed at the bottom level of the hierarchy.

The above two steps are performed in the following protected method:

int preSchedule();

Before making clusters, it first checks whether the APEG has wormholes or parallel stars. If it finds any, it returns FALSE.

(3) We decompose the cluster hierarchy. We examine the hierarchy from the top. We assign two processors to each branch (son cluster) of the top node at the next level. Then, we execute list scheduling and save the scheduling result. We choose a son cluster with a larger work-load. Then, we introduce another processor to schedule two branches of the son cluster. Execute a different list scheduling to compare the previously best scheduling result, then save the better result. Repeat this procedure until all processors are consumed. It is likely to stop traversing the cluster hierarchy since all processors are consumed. At this stage, we compare the loads of processors and try to balance the loads within a certain ratio by shifting some elementary clusters from the most heavily loaded processor onto a lightly loaded processor.

(4) In some cases, we can not achieve our load-balancing goals by shifting clusters. We try to breakdown some elementary clusters in heavily loaded processors to lightly loaded clusters. This is cluster "breakdown".

(5) In each step of (3) and (4), we execute list scheduling to compare the previously best scheduling result. This is the reason why the declustering algorithm is computationally expensive. Finally, we get the best scheduling result. Based on that scheduling result, we make a final version of the APEG including all communication nodes (in the finalizeGalaxy method of DCParProcs class). Note that we do not call the list scheduling algorithm in the ParProcessors class after we find out the best scheduling result since we already executed that routine for that result.

Steps (3), (4), and (5) are performed in the following public method:

int scheduleIt();

Many details of the scheduling procedure are hidden with private methods. The remaining section will describe the classes used for Declustering scheduling one by one.

15.11.1 Class DCNode

Class DCNode, derived from class ParNode, is an APEG node for the declustering scheduler. It has the same constructors with the ParNode class. This class does not have any protected members.

int amIMerge();
int amIBranch();

These methods return TRUE if this node is a merge node or a branch node.

DCCluster* cluster;
DCCluster* elemDCCluster;

These pointers point to the highest-level cluster and the current elementary cluster owning this node.

void saveInfo(); 
int getBestStart(); 
int getBestFinish();

The first method saves the scheduling information of this node with the best scheduling result, which includes the processor assignment, the scheduled time, and the completion time. The next two methods return the scheduled time and the completion time of the current node.

int getSamples(DCNode* destN);

Returns the number of samples transferred from the current node to the destination node destN. If no sample is passed, it returns 0 with an error message.

DCNode* adjacentNode(DCNodeList& nlist, int direction);

DCNodeList is derived from class EGNodeList just to perform type casting. This method returns an adjacent node of the current node in the given node list. If direction is 1, look at the ancestors, if -1, look at the descendants.

StringList print();

Prints the master star name and the invocation number of the node.

There are three iterators defined for DCNode class: DCNodeListIter, DCAncestorIter, and DCDescendantIter. As names suggest, they return the DCNode in the list, in the ancestors of a node, and in the descendants of a node.

15.11.2 Classes DCArc and DCArcList

Class DCArc represents an candidate arc in the APEG to be cut in making elementary clusters. It has a constructor with five arguments.

DCArc(DCNode* src, DCNode* sink, int first, int second,int third);

The first two arguments indicate the source and destination nodes of the arc. The remaining three arguments define a triplet of information used to help find the arcs to be cut. We call these arcs "cut-arcs". We consider a pair of a branch node and a merge node and two paths between them to determine if a pair of cut-arcs to parallelize these two paths is beneficial. The first argument is the sum of execution times of nodes preceding this arc, starting from the branch node. The second argument is the communication overhead for this arc. The third argument is the sum of execution times of nodes following from this arc to the merge node.

The five arguments given to the constructor can be retrieved by the following methods:

DCNode* getSrc(); 
DCNode* getSink();
int getF();
int getS(); 
int getT();

They can be printed by

StringList print();

The sink and the source nodes can be reversed, and can be copied from an argument DCArc by the following methods:

void reverse(); 
int operator==(DCArc& arc);

There are other public methods as follow:

DCArcList* parentList();

A DCArc will be inserted to a list of DCArcs, call DCArcList. This method returns the pointer to the list structure.

int betweenSameStarInvoc();

Returns TRUE or FALSE, based on whether this arc is between invocations of the same star.

Class DCArcList is derived from class SequentialList to make a list of DCArcs. It has a constructor with no argument and a copy constructor. The destructor deletes all DCArcs in the list.

void insert(DCArc* arc);
void append(DCArc* arc);

These methods to put arc at the front and the back of the list, respectively.

DCArc* head();

Returns the DCArc at the front of the list.

int remove(DCArc* arc);

Removes arc from the list.

int member(DCArc* arc);

Returns TRUE if the given DCArc is a member of the list.

int mySize();

Returns the number of DCArcs in the list.

StringList print();

It prints a list of DCArcs in the list.

There is a iterator for DCArcList, called DCArcIter, which returns a DCArc.

15.11.3 Class DCGraph

Class DCGraph, derived from class ParGraph, is an input APEG for DeclustScheduler. It has no explicit constructor.

EGNode* newNode(DataFlowStar*, int);

Creates a DCNode as an APEG node for DCGraph.

DCNodeList BranchNodes; 
DCNodeList MergeNodes;

These lists store the branch nodes and merge nodes.

int initializeGraph();

This protected method initializes the DCGraph. It sets up the lists of branch nodes and merge nodes (BranchNodes, MergeNodes), and the list of initially runnable nodes. We sort these lists by the static levels of the nodes: the branch nodes are sorted by smallest static level first while the merge nodes are sorted by largest static level first. In this method, we also initialize the DCNodes, which includes the detection of the merge nodes that are reachable from the node and the branch nodes reachable to the node.

The remaining methods are all public.

const char* genDCClustName(int type);

Generate a name for the cluster. If type = 0, we prefix with "ElemDCClust" to represent an elementary cluster. Otherwise, we prefix with "MacroDCClust".

StringList display();

Displays the APEG with the lists of initially runnable nodes, the branch nodes, and the merge nodes.

DCNode* intersectNode(DCNode* d1, DCNode* d2, int direction);

This method returns a merge node with the smallest static level, reachable from both d1 and d2 if direction = 1. If direction = 0, it returns a branch node with the smallest static level, that can reach both d1 and d2 nodes.

DCArcList* traceArcPath(DCNode* branch, DCNode* src, DCNode* dest, int direction);

This method makes a list of candidate cut-arcs between branch and dest nodes, and returns the pointer to the list. The second argument, src, is an immediate descendant of the branch node on the path to the dest node. If direction = 1, we reverse all arcs and find cut-arcs from dest to branch nodes.

void addCutArc(DCArc* arc);

This method adds a DCArc to a list of cut-arcs in DCGraph.

void formElemDCClusters(DCClusterList& EClusts);

In this method, we remove all cut-arcs in the APEG and make each connected component an elementary cluster. The argument EClusts is the list of those elementary clusters. We connect these clusters at the end.

void computeScore();

In scheduling stage (3) of the DeclustScheduler, we may want to shift clusters from heavily loaded processors to lightly loaded processors. To prepare this step, we compute the score of top-level clusters in that scheduling phase. The score of a cluster is the number of samples passed to other processors minus the number of samples passed inside the same processor along the cut-arcs within that cluster. The score indicates the cost of shifting a cluster due to communication.

void commProcs(DCCluster* clust, int* procs);

This method finds processors that clust communicates with. We set the component of the second argument array to 1 if that processor communicates with the cluster.

void copyInfo();

Used for saving the scheduling information if the most recent scheduling result is better than the previous ones.

15.11.4 Class DCCluster

Class DCCluster represents a cluster of nodes in the declustering algorithm. There is no protected member in this class. It consists of two DCClusters, called component clusters, to make a hierarchy of clusters. An elementary cluster has NULL component clusters. It is constructed by a one-argument constructor.

DCCluster(DCNodeList* node-list);

Makes the cluster contain all nodes from the list.

To make a macro cluster, we use the following constructor:

DCCluster(DCCluster* clust1, DCCluster* clust2);

The argument clusters become the component clusters of this higher level cluster. The cluster-arcs are established from the cluster-arcs of two component clusters by calling the following method:

void fixArcs(DCCluster* clust1, DCCluster* clust2);

In this method, arcs put inside this cluster are removed from the arcs of two argument clusters.

In both constructors, we compute the sum of execution times of all nodes in the cluster, which can be obtained by

int getExecTime(); 
DCCluster* getComp1();
DCCluster* getComp2();

The last two methods above return two component clusters.

void setName(const char* name); 
const char* readName();

The above methods set and get the name of the cluster.

void addArc(DCCluster* adj, int numSample);

This method adds a cluster-arc that is adjacent to the first argument cluster with sample rate numSample.

void setDCCluster(DCCluster* clust);

Sets the cluster pointer of the nodes in this cluster to the argument cluster.

void assignP(int procNum); 
int getProc();

The first method assigns all nodes in the cluster to a processor. The second returns the processor that this cluster is assigned to.

void switchWith(DCCluster* clust);

Switches the processor assignment of this cluster with the argument cluster.

DCCluster* pullWhich();

Returns the cluster with the smaller execution time between two component clusters, and pull it out.

DCCluster* findCombiner();

This method returns the best cluster, in terms of cluster-arc communication cost, to be combined. We break ties by returning the cluster with smallest execution time.

void broken(); 
int getIntact();

These methods indicate whether the cluster or its subclusters were broken into its components in the scheduling stage (3) of DeclustScheduler. The first method indicates that it happens. The second method queries whether it happens or not.

int getScore(); 
int setScore(int score);
void resetMember();

These methods get and set the score of the cluster. Refer to the computeScore method of the DCGraph class to see what the score of a cluster is. The last method resets the score to 0.

StringList print();

Prints the name of this cluster and the names of component clusters.

~DCCluster();

The destructor deletes the nodes in the cluster and cluster-arcs if it is an elementary cluster.

15.11.5 Class DCClusterList

Class DCClusterList, derived from class DoubleLinkList, keeps a list of clusters. It has no protected members. It has a default constructor and a copy constructor.

void insert(DCCluster* clust); 
void append(DCCluster* clust);
void insertSorted(DCCluster* clust);

These methods put clust at the head and the back of the list. The last method inserts the cluster in order of increasing execution time.

DCClusterLink* firstLink(); 
DCCluster* firstDCClust(); 
DCCluster* popHead();

The above methods return the DCClusterLink and DCCluster at the head of the list. The last method removes and returns the cluster from the list. Class DCClusterLink is derived from class DoubleLink as a container of DCCluster in the DCClusterList. It has a public method to access the cluster called

DCCluster* getDCClustp(); 
DCClusterLink* createLink(DCCluster* clust); 
void removeDCClusters();

Create a DCClusterLink and removes all clusters in the list, respectively.

void resetList(); 
void resetScore();
void setDCClusters();

The first two methods reset the scores of all clusters to 0. The first method also declares that each cluster is not broken. The third method resets the cluster pointer of the nodes of the clusters in the list.

int member(DCCluster* clust);

Returns TRUE or FALSE, based on whether the argument cluster is in the list or not.

void findDCClusts(DCNodeList& nlist);

Add to the list clusters that own the nodes of the argument list. If the number of clusters is 1, we break the cluster into two component clusters and put them into the list.

int listSize();

Returns the number of clusters in the list.

StringList print();

Prints the list of clusters.

There is an iterator associated with the DCClusterList called DCClusterListIter that returns a DCCluster. It can return a DCClusterLink by the nextLink method.

15.11.6 Class DCClustArc and class DCClustArcList

Class DCClustArc represents a cluster-arc. It has a constructor with two arguments:

DCClustArc(DCCluster* neighbor, int nsamples);

The first argument is the pointer to the neighboring cluster while the second argument sets the sample rate of the connection.

DCCluster* getNeighbor(); 
void changeNeighbor(DCCluster* clust);

These methods return the neighbor cluster and change to it.

void changeSamples(int newsamps); 
void addSamples(int delta);
int getSamples();

The above methods modify, increment, and return the sample rate of the current arc.

StringList print();

Prints the name of the neighbor cluster and the sample rate.

Class DCClustArcList is derived from class SequentialList to make a list of cluster-arcs. It has four public methods.

DCClustArc* contain(DCCluster* clust);

Returns the DCClustArc that is adjacent to the argument cluster. If no cluster-arc is found in the list, return 0.

void changeArc(DCCluster* oldC, DCCluster* newC);

This method changes the pointer of neighbor cluster, oldC, in all cluster-arcs in the list to newC.

void removeArcs();

Deletes all cluster-arcs in the list.

StringList print();

Prints the list of DCClustArcs.

There is an iterator associated with the DCClustArcList, called DCClustArcListIter, which returns a DCClustArc.

15.11.7 Class DCParProcs

Class DCParProcs is derived from class ParProcessors. It has the same constructor and destructor with the ParProcessors class.

There is one protected method:

ParNode* createCommNode(int i);

Creates a DCNode to represent a communication code. The argument indicates the type of the node.

The other methods are all public, and support the main scheduling procedure described in the DeclustScheduler class .

int commAmount();

Returns the communication overhead of the current schedule.

void saveBestResult(DCGraph* graph);

This method saves the current scheduling information of the nodes as the best scheduling result.

void finalizeGalaxy(DCGraph* graph);

After all scheduling is completed, we make a final version of the APEG including all communication loads based on the best scheduling result obtained.

void categorizeLoads(int procs);

This method categorizes each processor as either heavily or lightly loaded. It sets an integer array, nprocs, 1 for heavy and -1 for light processors. The initial threshold is 50 processors are heavily loaded if all processors are loaded beyond a 75 maximum load. We regard at most one idle processor as lightly loaded.

int findSLP(DCNodeList* nlist);

This method finds the progression of nodes (regular or communication) in the schedule which prevents the makespan from being any shorter. We call this set of nodes and schedule limiting progression: SLP (refer to Sih's paper). The SLP can span several processors and can't contain idle times. If there are several SLPs it will return just one of them.