%\pagebreak \section{Task Dependences} \label{sec:task_depend} \index{dependences!task dependences} \subsection{Flow Dependence} \label{subsec:task_flow_depend} \index{task dependences!flow dependence} \index{task construct@\kcode{task} construct!depend clause@\kcode{depend} clause} \index{task construct@\kcode{task} construct} \index{constructs!task@\kcode{task}} \index{depend clause@\kcode{depend} clause} \index{clauses!depend@\kcode{depend}} This example shows a simple flow dependence using a \kcode{depend} clause on the \kcode{task} construct. \cexample[4.0]{task_dep}{1} \ffreeexample[4.0]{task_dep}{1} The program will always print \pout{x = 2}, because the \kcode{depend} clauses enforce the ordering of the tasks. If the \kcode{depend} clauses had been omitted, then the tasks could execute in any order and the program and the program would have a race condition. \subsection{Anti-dependence} \label{subsec:task_anti_depend} \index{task dependences!anti dependence} This example shows an anti-dependence using the \kcode{depend} clause on the \kcode{task} construct. \cexample[4.0]{task_dep}{2} \ffreeexample[4.0]{task_dep}{2} The program will always print \pout{x = 1}, because the \kcode{depend} clauses enforce the ordering of the tasks. If the \kcode{depend} clauses had been omitted, then the tasks could execute in any order and the program would have a race condition. \subsection{Output Dependence} \label{subsec:task_out_depend} \index{task dependences!output dependence} This example shows an output dependence using the \kcode{depend} clause on the \kcode{task} construct. \cexample[4.0]{task_dep}{3} \ffreeexample[4.0]{task_dep}{3} The program will always print \pout{x = 2}, because the \kcode{depend} clauses enforce the ordering of the tasks. If the \kcode{depend} clauses had been omitted, then the tasks could execute in any order and the program would have a race condition. \pagebreak \subsection{Concurrent Execution with Dependences} \label{subsec:task_concurrent_depend} \index{task dependences!concurrent execution with} In this example we show potentially concurrent execution of tasks using multiple flow dependences expressed using the \kcode{depend} clause on the \kcode{task} construct. The last two tasks are dependent on the first task. However, there is no dependence between the last two tasks, which may execute in any order (or concurrently if more than one thread is available). Thus, the possible outputs are \pout{x + 1 = 3. x + 2 = 4.} and \pout{x + 2 = 4. x + 1 = 3.}. If the \kcode{depend} clauses had been omitted, then all of the tasks could execute in any order and the program would have a race condition. \cexample[4.0]{task_dep}{4} \ffreeexample[4.0]{task_dep}{4} The following example illustrates the semantic difference between \kcode{inout} and \kcode{inoutset} dependence types. In Case 1, tasks generated at T1 inside the loop have dependences among themselves due to the \kcode{inout} dependence type and with task T2. As a result, these tasks are executed sequentially before the print statement from task T2. In Case 2, tasks generated at T3 inside the loop have no dependences among themselves from the \kcode{inoutset} dependence type, but have dependences with task T4. As a result, these tasks are executed concurrently before the print statement from task T4. \cexample[5.1]{task_dep}{4b} \ffreeexample[5.1]{task_dep}{4b} \subsection{Matrix multiplication} \label{subsec:task_matrix_mult} \index{task dependences!matrix multiplication} This example shows a task-based blocked matrix multiplication. Matrices are of \ucode{N}x\ucode{N} elements, and the multiplication is implemented using blocks of \ucode{BS}x\ucode{BS} elements. \cexample[4.0]{task_dep}{5} \ffreeexample[4.0]{task_dep}{5} \subsection{\kcode{taskwait} with Dependences} \label{subsec:taskwait_depend} \index{task dependences!taskwait construct with@\kcode{taskwait} construct with} \index{taskwait construct@\kcode{taskwait} construct} \index{constructs!taskwait@\kcode{taskwait}} \index{taskwait construct@\kcode{taskwait} construct!depend clause@\kcode{depend} clause} \index{depend clause@\kcode{depend} clause} \index{clauses!depend@\kcode{depend}} In this subsection three examples illustrate how the \kcode{depend} clause can be applied to a \kcode{taskwait} construct to make the generating task wait for specific child tasks to complete. This is an OpenMP 5.0 feature. In the same manner that dependences can order executions among child tasks with \kcode{depend} clauses on \kcode{task} constructs, the generating task can be scheduled to wait on child tasks at a \kcode{taskwait} before it can proceed. Note: Since the \kcode{depend} clause on a \kcode{taskwait} construct relaxes the default synchronization behavior (waiting for all children to finish), it is important to realize that child tasks that are not predecessor tasks, as determined by the \kcode{depend} clause of the \kcode{taskwait} construct, may be running concurrently while the generating task is executing after the taskwait. In the first example the generating task waits at the \kcode{taskwait} construct for the completion of the first child task because a dependence on the first task is produced by \ucode{x} with an \kcode{in} dependence type within the \kcode{depend} clause of the \kcode{taskwait} construct. Immediately after the first \kcode{taskwait} construct it is safe to access the \ucode{x} variable by the generating task, as shown in the print statement. There is no completion restraint on the second child task. Hence, immediately after the first \kcode{taskwait} it is unsafe to access the \ucode{y} variable since the second child task may still be executing. The second \kcode{taskwait} ensures that the second child task has completed; hence it is safe to access the \ucode{y} variable in the following print statement. \cexample[5.0]{task_dep}{6} \ffreeexample[5.0]{task_dep}{6} In this example the first two tasks are serialized, because a dependence on the first child is produced by \ucode{x} with the \kcode{in} dependence type in the \kcode{depend} clause of the second task. However, the generating task at the first \kcode{taskwait} waits only on the first child task to complete, because a dependence on only the first child task is produced by \ucode{x} with an \kcode{in} dependence type within the \kcode{depend} clause of the \kcode{taskwait} construct. The second \kcode{taskwait} (without a \kcode{depend} clause) is included to guarantee completion of the second task before \ucode{y} is accessed. (While unnecessary, the \kcode{depend(inout: \ucode{y})} clause on the 2nd child task is included to illustrate how the child task dependences can be completely annotated in a data-flow model.) \cexample[5.0]{task_dep}{7} \ffreeexample[5.0]{task_dep}{7} \clearpage This example is similar to the previous one, except the generating task is directed to also wait for completion of the second task. The \kcode{depend} clause of the \kcode{taskwait} construct now includes an \kcode{in} dependence type for \ucode{y}. Hence the generating task must now wait on completion of any child task having \ucode{y} with an \kcode{out} (here \kcode{inout}) dependence type in its \kcode{depend} clause. So, the \kcode{depend} clause of the \kcode{taskwait} construct now constrains the second task to complete at the taskwait, too. %--both tasks must now complete execution at the \code{taskwait}. (This change makes the second taskwait of the previous example unnecessary-- it has been removed in this example.) Note: While a \kcode{taskwait} construct ensures that all child tasks have completed; a depend clause on a \kcode{taskwait} construct only waits for specific child tasks (prescribed by the dependence type and list items in the \kcode{taskwait}'s \kcode{depend} clause). This and the previous example illustrate the need to carefully determine the dependence type of variables in the \kcode{depend} clause of the \kcode{taskwait} construct. when selecting child tasks that the generating task must wait on, so that its execution after the taskwait does not produce race conditions on variables accessed by non-completed child tasks. \cexample[5.0]{task_dep}{8} \ffreeexample[5.0]{task_dep}{8} \pagebreak \subsection{Mutually Exclusive Execution with Dependences} \label{subsec:task_dep_mutexinoutset} \index{task dependences!mutually exclusive execution} In this example we show a series of tasks, including mutually exclusive tasks, expressing dependences using the \kcode{depend} clause on the \kcode{task} construct. The program will always print \pout{6}. Tasks T1, T2 and T3 will be scheduled first, in any order. Task T4 will be scheduled after tasks T1 and T2 are completed. T5 will be scheduled after tasks T1 and T3 are completed. Due to the \kcode{mutexinoutset} dependence type on \ucode{c}, T4 and T5 may be scheduled in any order with respect to each other, but not at the same time. Tasks T6 will be scheduled after both T4 and T5 are completed. \cexample[5.0]{task_dep}{9} \ffreeexample[5.0]{task_dep}{9} The following example demonstrates a situation where the \kcode{mutexinoutset} dependence type is advantageous. If \ucode{shortTaskB} completes before \ucode{longTaskA}, the runtime can take advantage of this by scheduling \ucode{longTaskBC} before \ucode{shortTaskAC}. \cexample[5.0]{task_dep}{10} \ffreeexample[5.0]{task_dep}{10} \subsection{Multidependences Using Iterators} \label{subsec:depend_iterator} \index{task dependences!using iterators} \index{depend clause@\kcode{depend} clause!iterator modifier@\kcode{iterator} modifier} \index{iterator modifier@\kcode{iterator} modifier} The following example uses an iterator to define a dynamic number of dependences. In the \kcode{single} construct of a parallel region a loop generates \ucode{n} tasks and each task has an \kcode{out} dependence specified through an element of the \ucode{v} array. This is followed by a single task that defines an \kcode{in} dependence on each element of the array. This is accomplished by using the \kcode{iterator} modifier in the \kcode{depend} clause, supporting a dynamic number of dependences (\ucode{n} here). The task for the \ucode{print_all_elements} procedure is not executed until all dependences prescribed (or registered) by the iterator are fulfilled; that is, after all the tasks generated by the loop have completed. Note, one cannot simply use an array section in the \kcode{depend} clause of the second task construct because this would violate the \kcode{depend} clause restriction: ``List items used in \kcode{depend} clauses of the same task or sibling tasks must indicate identical storage locations or disjoint storage locations''. In this case each of the loop tasks use a single disjoint (different storage) element in their \kcode{depend} clause; however, the array-section storage area prescribed in the commented directive is neither identical nor disjoint to the storage prescribed by the elements of the loop tasks. The iterator overcomes this restriction by effectively creating \ucode{n} disjoint storage areas. \cexample[5.0]{task_dep}{11} \ffreeexample[5.0]{task_dep}{11} \subsection{Dependence for Undeferred Tasks} \label{subsec:depend_undefer_task} \index{task dependences!undeferred tasks} In the following example, we show that even if a task is undeferred as specified by an \kcode{if} clause that evaluates to \vcode{false}, task dependences are still honored. The \kcode{depend} clauses of the first and second explicit tasks specify that the first task is completed before the second task. The second explicit task has an \kcode{if} clause that evaluates to \vcode{false}. This means that the execution of the generating task (the implicit task of the \kcode{single} region) must be suspended until the second explicit task is completed. But, because of the dependence, the first explicit task must complete first, then the second explicit task can execute and complete, and only then the generating task can resume to the print statement. Thus, the program will always print \pout{x = 2}. \cexample[4.0]{task_dep}{12} \clearpage \ffreeexample[4.0]{task_dep}{12} In OpenMP 5.1 the \kcode{omp_all_memory} \plc{reserved locator} was introduced to specify storage of all objects in memory. In the following example, it is used in Task 4 as a convenient way to specify that the locator (list item) denotes the storage of all objects (locations) in memory, and will therefore match the \ucode{a} and \ucode{d} locators of Task 2, Task 3 and Task 6. The dependences guarantee the ordered execution of Tasks 2 and 3 before 4, and Task 4 before Task 6. Since there are no dependences imposed on Task 1 and Task 5, they can be scheduled to execute at any time, with no ordering. \cexample[5.1]{task_dep}{13} \ffreeexample[5.1]{task_dep}{13} \subsection{Transparent Task Dependences} \label{subsec:depend_trans_task} \index{task dependences!transparent tasks} In the following example, each iteration of the \ucode{h}-loop updates all elements of array \ucode{M} and task dependences are used to synchronize updates across different iterations of the loop. The code uses two levels of dependent tasks and assumes that \ucode{N_ROWS} is evenly divisible by \ucode{ROWS_PER_TASK}. The \ucode{h}-loop generates the first level of tasks, with the \kcode{depend} clause serializing their execution and each task calling \ucode{my_func}. A second level of tasks are generated by the \ucode{i}-loop in \ucode{my_func}. However, the dependences for this second level of tasks are between tasks from different calls to \ucode{my_func}. In order to enforce these dependences, the first-level tasks are specified as transparent tasks with the \kcode{transparent(omp_impex)} clause. The \kcode{omp_impex} argument (which is the default if not explicitly specified) indicates that the task is both an exporting and importing task. For the purposes of dependence matching, an exporting task is one that makes its child tasks visible to its successors and an importing task is one that makes its preceding tasks (such as earlier sibling tasks) visible to its child tasks. As a result of the exposed dependences, the task generated in the $i^{th}$ iteration of the \ucode{h}=$h_0$ instance of \ucode{my_func} is guaranteed to be ordered before the task generated in the $i^{th}$ iteration of the \ucode{h}=$h_1$ instance of \ucode{my_func}, where $h_0 < h_1$. \cexample[6.0]{task_dep}{14} \ffreeexample[6.0]{task_dep}{14}