Indu K

MS Scholar, IIT Madras

Modern languages such as Cilk, Chapel and X10 implement a dynamic, lightweight task-parallel execution model. This model helps programmers express the ideal parallelism in a program. It is then the responsibility of the compiler and the runtime system to extract useful parallelism from the program to obtain better performance. The runtime system schedules activities onto software threads with the help of work-stealing queues, but this may lead to irregular work distribution and hence load imbalance. It is therefore important to chunk the loop at compile time itself, dividing the iterations (the work) among the chunks so as to achieve good load balancing.
Division of work among the chunks.
There are different ways to divide the iterations among the chunks. Block chunking: contiguous iterations are assigned to one chunk. The efficiency of such chunking depends on the workload of each iteration and the size of the chunk. Cyclic chunking: iterations can also be assigned to chunks in a cyclic manner. Here, iterations separated by a regular interval become part of the same chunk. However, in most benchmarks the amount of work done by each iteration may be irregular. Therefore, we may not be able to determine at compile time whether to chunk as block or cyclic. For instance, the load of each iteration may be input dependent. In such a case, the challenge is to assign iterations to chunks optimally.
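
The difference between the two schemes can be illustrated with a minimal Python sketch. It is not taken from Cilk, Chapel or X10; the function names `block_chunks`, `cyclic_chunks` and `chunk_loads`, and the per-iteration cost function, are hypothetical and chosen only for illustration.

```python
def block_chunks(n_iters, n_chunks):
    """Block chunking: each chunk receives a contiguous range of iterations."""
    base, extra = divmod(n_iters, n_chunks)
    chunks, start = [], 0
    for c in range(n_chunks):
        # The first `extra` chunks take one additional iteration.
        size = base + (1 if c < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def cyclic_chunks(n_iters, n_chunks):
    """Cyclic chunking: chunk c receives iterations c, c + n_chunks, ..."""
    return [list(range(c, n_iters, n_chunks)) for c in range(n_chunks)]


def chunk_loads(chunks, cost):
    """Total work per chunk under an assumed per-iteration cost function."""
    return [sum(cost(i) for i in chunk) for chunk in chunks]


if __name__ == "__main__":
    # Assumed workload: iteration i costs i + 1 units (work grows with i).
    cost = lambda i: i + 1
    print(chunk_loads(block_chunks(8, 2), cost))   # [10, 26]
    print(chunk_loads(cyclic_chunks(8, 2), cost))  # [16, 20]
```

Under this assumed, steadily growing workload, cyclic chunking gives more even chunk loads than block chunking. With a different cost function the comparison can reverse, which is why the choice is hard to make when the load is input dependent.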
