Class DeclustScheduler is derived from class ParScheduler. It has a constructor with two arguments as the ParScheduler class. The subclass of ParProcessors used in DeclustScheduler is DCParProcs. The DeclustScheduler maintains two kinds of DCParProcs instances, one to save the best scheduling result so far, and the other is for retrying list scheduling whose results will be compared with the best result so far. These two DCParProcs are created in
void setUpProcs(int num);They are deleted in the destructor.
StringList displaySchedule();Displays the best scheduling result obtained so far.
The overall procedure of the declustering algorithm is:
(1) Make elementary clusters of nodes. To make elementary clusters, we examine the output arcs of all branch nodes (a branch node is a node with more than one output arc) and the input arcs of all merge nodes (a merge node is a node with more than one input arc). Those arcs are candidates to be cut to make clusters. An arc is cut if the introduced communication overhead can be compensated by exploiting parallelism.
(2) We make a hierarchy of clusters starting from elementary clusters up to one cluster which includes all nodes in the APEG. Clusters with the smallest work-load will be placed at the bottom level of the hierarchy.
The above two steps are performed in the following protected method:
int preSchedule();Before making clusters, it first checks whether the APEG has wormholes or parallel stars. If it finds any, it returns
FALSE
.(3) We decompose the cluster hierarchy. We examine the hierarchy from the top. We assign two processors to each branch (son cluster) of the top node at the next level. Then, we execute list scheduling and save the scheduling result. We choose a son cluster with a larger work-load. Then, we introduce another processor to schedule two branches of the son cluster. Execute a different list scheduling to compare the previously best scheduling result, then save the better result. Repeat this procedure until all processors are consumed. It is likely to stop traversing the cluster hierarchy since all processors are consumed. At this stage, we compare the loads of processors and try to balance the loads within a certain ratio by shifting some elementary clusters from the most heavily loaded processor onto a lightly loaded processor.
(4) In some cases, we can not achieve our load-balancing goals by shifting clusters. We try to breakdown some elementary clusters in heavily loaded processors to lightly loaded clusters. This is cluster "breakdown".
(5) In each step of (3) and (4), we execute list scheduling to compare the previously best scheduling result. This is the reason why the declustering algorithm is computationally expensive. Finally, we get the best scheduling result. Based on that scheduling result, we make a final version of the APEG including all communication nodes (in the
finalizeGalaxy
method of DCParProcs class). Note that we do not call the list scheduling algorithm in the ParProcessors class after we find out the best scheduling result since we already executed that routine for that result.Steps (3), (4), and (5) are performed in the following public method:
int scheduleIt();Many details of the scheduling procedure are hidden with private methods. The remaining section will describe the classes used for Declustering scheduling one by one.
int amIMerge();These methods return
int amIBranch();
TRUE
if this node is a merge node or a branch node.
DCCluster* cluster;These pointers point to the highest-level cluster and the current elementary cluster owning this node.
DCCluster* elemDCCluster;
void saveInfo();The first method saves the scheduling information of this node with the best scheduling result, which includes the processor assignment, the scheduled time, and the completion time. The next two methods return the scheduled time and the completion time of the current node.
int getBestStart();
int getBestFinish();
int getSamples(DCNode* destN);Returns the number of samples transferred from the current node to the destination node
destN.
If no sample is passed, it returns 0 with an error message.
DCNode* adjacentNode(DCNodeList& nlist, int direction);DCNodeList is derived from class EGNodeList just to perform type casting. This method returns an adjacent node of the current node in the given node list. If
direction
is 1, look at the ancestors, if -1, look at the descendants.
StringList print();Prints the master star name and the invocation number of the node.
There are three iterators defined for DCNode class:
DCNodeListIter,
DCAncestorIter,
and DCDescendantIter.
As names suggest, they return the DCNode in the list, in the ancestors of a node, and in the descendants of a node.
DCArc(DCNode* src, DCNode* sink, int first, int second,int third);The first two arguments indicate the source and destination nodes of the arc. The remaining three arguments define a triplet of information used to help find the arcs to be cut. We call these arcs "cut-arcs". We consider a pair of a branch node and a merge node and two paths between them to determine if a pair of cut-arcs to parallelize these two paths is beneficial. The
first
argument is the sum of execution times of nodes preceding this arc, starting from the branch node. The second
argument is the communication overhead for this arc. The third
argument is the sum of execution times of nodes following from this arc to the merge node.The five arguments given to the constructor can be retrieved by the following methods:
DCNode* getSrc();They can be printed by
DCNode* getSink();
int getF();
int getS();
int getT();
StringList print();The sink and the source nodes can be reversed, and can be copied from an argument DCArc by the following methods:
void reverse();There are other public methods as follow:
int operator==(DCArc& arc);
DCArcList* parentList();A DCArc will be inserted to a list of DCArcs, call DCArcList. This method returns the pointer to the list structure.
int betweenSameStarInvoc();Returns
TRUE
or FALSE
, based on whether this arc is between invocations of the same star.Class DCArcList is derived from class SequentialList to make a list of DCArcs. It has a constructor with no argument and a copy constructor. The destructor deletes all DCArcs in the list.
void insert(DCArc* arc);These methods to put
void append(DCArc* arc);
arc
at the front and the back of the list, respectively.
DCArc* head();Returns the DCArc at the front of the list.
int remove(DCArc* arc);Removes
arc
from the list.
int member(DCArc* arc);Returns
TRUE
if the given DCArc is a member of the list.
int mySize();Returns the number of DCArcs in the list.
StringList print();It prints a list of DCArcs in the list.
There is a iterator for DCArcList, called DCArcIter, which returns a DCArc.
EGNode* newNode(DataFlowStar*, int);Creates a DCNode as an APEG node for DCGraph.
DCNodeList BranchNodes;These lists store the branch nodes and merge nodes.
DCNodeList MergeNodes;
int initializeGraph();This protected method initializes the DCGraph. It sets up the lists of branch nodes and merge nodes (
BranchNodes,
MergeNodes),
and the list of initially runnable nodes. We sort these lists by the static levels of the nodes: the branch nodes are sorted by smallest static level first while the merge nodes are sorted by largest static level first. In this method, we also initialize the DCNodes, which includes the detection of the merge nodes that are reachable from the node and the branch nodes reachable to the node.The remaining methods are all public.
const char* genDCClustName(int type);Generate a name for the cluster. If
type
= 0, we prefix with "ElemDCClust" to represent an elementary cluster. Otherwise, we prefix with "MacroDCClust".
StringList display();Displays the APEG with the lists of initially runnable nodes, the branch nodes, and the merge nodes.
DCNode* intersectNode(DCNode* d1, DCNode* d2, int direction);This method returns a merge node with the smallest static level, reachable from both
d1
and d2
if direction
= 1. If direction
= 0, it returns a branch node with the smallest static level, that can reach both d1
and d2
nodes.
DCArcList* traceArcPath(DCNode* branch, DCNode* src, DCNode* dest, int direction);This method makes a list of candidate cut-arcs between
branch
and dest
nodes, and returns the pointer to the list. The second argument, src,
is an immediate descendant of the branch
node on the path to the dest
node. If direction
= 1, we reverse all arcs and find cut-arcs from dest
to branch
nodes.
void addCutArc(DCArc* arc);This method adds a DCArc to a list of cut-arcs in DCGraph.
void formElemDCClusters(DCClusterList& EClusts);In this method, we remove all cut-arcs in the APEG and make each connected component an elementary cluster. The argument
EClusts
is the list of those elementary clusters. We connect these clusters at the end.
void computeScore();In scheduling stage (3) of the DeclustScheduler, we may want to shift clusters from heavily loaded processors to lightly loaded processors. To prepare this step, we compute the score of top-level clusters in that scheduling phase. The score of a cluster is the number of samples passed to other processors minus the number of samples passed inside the same processor along the cut-arcs within that cluster. The score indicates the cost of shifting a cluster due to communication.
void commProcs(DCCluster* clust, int* procs);This method finds processors that
clust
communicates with. We set the component of the second argument array to 1 if that processor communicates with the cluster.
void copyInfo();Used for saving the scheduling information if the most recent scheduling result is better than the previous ones.
NULL
component clusters. It is constructed by a one-argument constructor.
DCCluster(DCNodeList* node-list);Makes the cluster contain all nodes from the list.
To make a macro cluster, we use the following constructor:
DCCluster(DCCluster* clust1, DCCluster* clust2);The argument clusters become the component clusters of this higher level cluster. The cluster-arcs are established from the cluster-arcs of two component clusters by calling the following method:
void fixArcs(DCCluster* clust1, DCCluster* clust2);In this method, arcs put inside this cluster are removed from the arcs of two argument clusters.
In both constructors, we compute the sum of execution times of all nodes in the cluster, which can be obtained by
int getExecTime();The last two methods above return two component clusters.
DCCluster* getComp1();
DCCluster* getComp2();
void setName(const char* name);The above methods set and get the name of the cluster.
const char* readName();
void addArc(DCCluster* adj, int numSample);This method adds a cluster-arc that is adjacent to the first argument cluster with sample rate
numSample.
void setDCCluster(DCCluster* clust);Sets the
cluster
pointer of the nodes in this cluster to the argument cluster.
void assignP(int procNum);The first method assigns all nodes in the cluster to a processor. The second returns the processor that this cluster is assigned to.
int getProc();
void switchWith(DCCluster* clust);Switches the processor assignment of this cluster with the argument cluster.
DCCluster* pullWhich();Returns the cluster with the smaller execution time between two component clusters, and pull it out.
DCCluster* findCombiner();This method returns the best cluster, in terms of cluster-arc communication cost, to be combined. We break ties by returning the cluster with smallest execution time.
void broken();These methods indicate whether the cluster or its subclusters were broken into its components in the scheduling stage (3) of DeclustScheduler. The first method indicates that it happens. The second method queries whether it happens or not.
int getIntact();
int getScore();These methods get and set the score of the cluster. Refer to the
int setScore(int score);
void resetMember();
computeScore
method of the DCGraph class to see what the score of a cluster is. The last method resets the score to 0.
StringList print();Prints the name of this cluster and the names of component clusters.
~DCCluster();The destructor deletes the nodes in the cluster and cluster-arcs if it is an elementary cluster.
void insert(DCCluster* clust);These methods put
void append(DCCluster* clust);
void insertSorted(DCCluster* clust);
clust
at the head and the back of the list. The last method inserts the cluster in order of increasing execution time.
DCClusterLink* firstLink();The above methods return the DCClusterLink and DCCluster at the head of the list. The last method removes and returns the cluster from the list. Class DCClusterLink is derived from class DoubleLink as a container of DCCluster in the DCClusterList. It has a public method to access the cluster called
DCCluster* firstDCClust();
DCCluster* popHead();
DCCluster* getDCClustp();Create a DCClusterLink and removes all clusters in the list, respectively.
DCClusterLink* createLink(DCCluster* clust);
void removeDCClusters();
void resetList();The first two methods reset the scores of all clusters to 0. The first method also declares that each cluster is not broken. The third method resets the cluster pointer of the nodes of the clusters in the list.
void resetScore();
void setDCClusters();
int member(DCCluster* clust);Returns
TRUE
or FALSE
, based on whether the argument cluster is in the list or not.
void findDCClusts(DCNodeList& nlist);Add to the list clusters that own the nodes of the argument list. If the number of clusters is 1, we break the cluster into two component clusters and put them into the list.
int listSize();Returns the number of clusters in the list.
StringList print();Prints the list of clusters.
There is an iterator associated with the DCClusterList called DCClusterListIter that returns a DCCluster. It can return a DCClusterLink by the
nextLink
method.
DCClustArc(DCCluster* neighbor, int nsamples);The first argument is the pointer to the neighboring cluster while the second argument sets the sample rate of the connection.
DCCluster* getNeighbor();These methods return the neighbor cluster and change to it.
void changeNeighbor(DCCluster* clust);
void changeSamples(int newsamps);The above methods modify, increment, and return the sample rate of the current arc.
void addSamples(int delta);
int getSamples();
StringList print();Prints the name of the neighbor cluster and the sample rate.
Class DCClustArcList is derived from class SequentialList to make a list of cluster-arcs. It has four public methods.
DCClustArc* contain(DCCluster* clust);Returns the DCClustArc that is adjacent to the argument cluster. If no cluster-arc is found in the list, return 0.
void changeArc(DCCluster* oldC, DCCluster* newC);This method changes the pointer of neighbor cluster,
oldC,
in all cluster-arcs in the list to newC.
void removeArcs();Deletes all cluster-arcs in the list.
StringList print();Prints the list of DCClustArcs.
There is an iterator associated with the DCClustArcList, called DCClustArcListIter, which returns a DCClustArc.
There is one protected method:
ParNode* createCommNode(int i);Creates a DCNode to represent a communication code. The argument indicates the type of the node.
The other methods are all public, and support the main scheduling procedure described in the DeclustScheduler class .
int commAmount();Returns the communication overhead of the current schedule.
void saveBestResult(DCGraph* graph);This method saves the current scheduling information of the nodes as the best scheduling result.
void finalizeGalaxy(DCGraph* graph);After all scheduling is completed, we make a final version of the APEG including all communication loads based on the best scheduling result obtained.
void categorizeLoads(int procs);This method categorizes each processor as either heavily or lightly loaded. It sets an integer array,
nprocs,
1 for heavy and -1 for light processors. The initial threshold is 50 processors are heavily loaded if all processors are loaded beyond a 75 maximum load. We regard at most one idle processor as lightly loaded.
int findSLP(DCNodeList* nlist);This method finds the progression of nodes (regular or communication) in the schedule which prevents the makespan from being any shorter. We call this set of nodes and
schedule limiting progression: SLP
(refer to Sih's paper). The SLP can span several processors and can't contain idle times. If there are several SLPs it will return just one of them.