\pagebreak \section{Task Dependences} \label{sec:task_depend} \subsection{Flow Dependence} \label{subsec:task_flow_depend} This example shows a simple flow dependence using a \code{depend} clause on the \code{task} construct. \cexample[4.0]{task_dep}{1} \ffreeexample[4.0]{task_dep}{1} The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend} clauses enforce the ordering of the tasks. If the \code{depend} clauses had been omitted, then the tasks could execute in any order and the program and the program would have a race condition. \subsection{Anti-dependence} \label{subsec:task_anti_depend} This example shows an anti-dependence using the \code{depend} clause on the \code{task} construct. \cexample[4.0]{task_dep}{2} \ffreeexample[4.0]{task_dep}{2} The program will always print \texttt{"}x = 1\texttt{"}, because the \code{depend} clauses enforce the ordering of the tasks. If the \code{depend} clauses had been omitted, then the tasks could execute in any order and the program would have a race condition. \subsection{Output Dependence} \label{subsec:task_out_depend} This example shows an output dependence using the \code{depend} clause on the \code{task} construct. \cexample[4.0]{task_dep}{3} \ffreeexample[4.0]{task_dep}{3} The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend} clauses enforce the ordering of the tasks. If the \code{depend} clauses had been omitted, then the tasks could execute in any order and the program would have a race condition. \pagebreak \subsection{Concurrent Execution with Dependences} \label{subsec:task_concurrent_depend} In this example we show potentially concurrent execution of tasks using multiple flow dependences expressed using the \code{depend} clause on the \code{task} construct. \cexample[4.0]{task_dep}{4} \ffreeexample[4.0]{task_dep}{4} The last two tasks are dependent on the first task. However there is no dependence between the last two tasks, which may execute in any order (or concurrently if more than one thread is available). Thus, the possible outputs are \texttt{"}x + 1 = 3. x + 2 = 4. \texttt{"} and \texttt{"}x + 2 = 4. x + 1 = 3. \texttt{"}. If the \code{depend} clauses had been omitted, then all of the tasks could execute in any order and the program would have a race condition. \subsection{Matrix multiplication} \label{subsec:task_matrix_mult} This example shows a task-based blocked matrix multiplication. Matrices are of NxN elements, and the multiplication is implemented using blocks of BSxBS elements. \cexample[4.0]{task_dep}{5} \ffreeexample[4.0]{task_dep}{5} \subsection{\code{taskwait} with Dependences} \label{subsec:taskwait_depend} In this subsection three examples illustrate how the \code{depend} clause can be applied to a \code{taskwait} construct to make the generating task wait for specific child tasks to complete. This is an OpenMP 5.0 feature. In the same manner that dependences can order executions among child tasks with \code{depend} clauses on \code{task} constructs, the generating task can be scheduled to wait on child tasks at a \code{taskwait} before it can proceed. Note: Since the \code{depend} clause on a \code{taskwait} construct relaxes the default synchronization behavior (waiting for all children to finish), it is important to realize that child tasks that are not predecessor tasks, as determined by the \code{depend} clause of the \code{taskwait} construct, may be running concurrently while the generating task is executing after the taskwait. In the first example the generating task waits at the \code{taskwait} construct for the completion of the first child task because a dependence on the first task is produced by \plc{x} with an \code{in} dependence type within the \code{depend} clause of the \code{taskwait} construct. Immediately after the first \code{taskwait} construct it is safe to access the \plc{x} variable by the generating task, as shown in the print statement. There is no completion restraint on the second child task. Hence, immediately after the first \code{taskwait} it is unsafe to access the \plc{y} variable since the second child task may still be executing. The second \code{taskwait} ensures that the second child task has completed; hence it is safe to access the \plc{y} variable in the following print statement. \cexample[5.0]{task_dep}{6} \ffreeexample[5.0]{task_dep}{6} In this example the first two tasks are serialized, because a dependence on the first child is produced by \plc{x} with the \code{in} dependence type in the \code{depend} clause of the second task. However, the generating task at the first \code{taskwait} waits only on the first child task to complete, because a dependence on only the first child task is produced by \plc{x} with an \code{in} dependence type within the \code{depend} clause of the \code{taskwait} construct. The second \code{taskwait} (without a \code{depend} clause) is included to guarantee completion of the second task before \plc{y} is accessed. (While unnecessary, the \code{depend(inout:} \code{y)} clause on the 2nd child task is included to illustrate how the child task dependences can be completely annotated in a data-flow model.) \cexample[5.0]{task_dep}{7} \ffreeexample[5.0]{task_dep}{7} This example is similar to the previous one, except the generating task is directed to also wait for completion of the second task. The \code{depend} clause of the \code{taskwait} construct now includes an \code{in} dependence type for \plc{y}. Hence the generating task must now wait on completion of any child task having \plc{y} with an \code{out} (here \code{inout}) dependence type in its \code{depend} clause. So, the \code{depend} clause of the \code{taskwait} construct now constrains the second task to complete at the \code{taskwait}, too. %--both tasks must now complete execution at the \code{taskwait}. (This change makes the second \code{taskwait} of the previous example unnecessary-- it has been removed in this example.) Note: While a taskwait construct ensures that all child tasks have completed; a depend clause on a taskwait construct only waits for specific child tasks (prescribed by the dependence type and list items in the \code{taskwait}'s \code{depend} clause). This and the previous example illustrate the need to carefully determine the dependence type of variables in the \code{taskwait} \code{depend} clause when selecting child tasks that the generating task must wait on, so that its execution after the taskwait does not produce race conditions on variables accessed by non-completed child tasks. \cexample[5.0]{task_dep}{8} \ffreeexample[5.0]{task_dep}{8} \pagebreak \subsection{Mutually Exclusive Execution with Dependences} \label{subsec:task_dep_mutexinoutset} In this example we show a series of tasks, including mutually exclusive tasks, expressing dependences using the \code{depend} clause on the \code{task} construct. The program will always print~6. Tasks T1, T2 and T3 will be scheduled first, in any order. Task T4 will be scheduled after tasks T1 and T2 are completed. T5 will be scheduled after tasks T1 and T3 are completed. Due to the \code{mutexinoutset} dependence type on \code{c}, T4 and T5 may be scheduled in any order with respect to each other, but not at the same time. Tasks T6 will be scheduled after both T4 and T5 are completed. \cexample[5.0]{task_dep}{9} \ffreeexample[5.0]{task_dep}{9} The following example demonstrates a situation where the \code{mutexinoutset} dependence type is advantageous. If \code{shortTaskB} completes before \code{longTaskA}, the runtime can take advantage of this by scheduling \code{longTaskBC} before \code{shortTaskAC}. \cexample[5.0]{task_dep}{10} \ffreeexample[5.0]{task_dep}{10} \subsection{Multidependences Using Iterators} \label{subsec:depend_iterator} The following example uses an iterator to define a dynamic number of dependences. In the \code{single} construct of a parallel region a loop generates n tasks and each task has an \code{out} dependence specified through an element of the \plc{v} array. This is followed by a single task that defines an \code{in} dependence on each element of the array. This is accomplished by using the \code{iterator} modifier in the \code{depend} clause, supporting a dynamic number of dependences (\plc{n} here). The task for the \plc{print\_all\_elements} function is not executed until all dependences prescribed (or registered) by the iterator are fulfilled; that is, after all the tasks generated by the loop have completed. Note, one cannot simply use an array section in the \code{depend} clause of the second task construct because this would violate the \code{depend} clause restriction: "List items used in \code{depend} clauses of the same task or sibling tasks must indicate identical storage locations or disjoint storage locations". In this case each of the loop tasks use a single disjoint (different storage) element in their \code{depend} clause; however, the array-section storage area prescribed in the commented directive is neither identical nor disjoint to the storage prescibed by the elements of the loop tasks. The iterator overcomes this restriction by effectively creating n disjoint storage areas. \cexample[5.0]{task_dep}{11} \ffreeexample[5.0]{task_dep}{11} \subsection{Dependence for Undeferred Tasks} \label{subsec:depend_undefer_task} In the following example, we show that even if a task is undeferred as specified by an \code{if} clause that evaluates to \plc{false}, task dependences are still honored. The \code{depend} clauses of the first and second explicit tasks specify that the first task is completed before the second task. The second explicit task has an \code{if} clause that evaluates to \plc{false}. This means that the execution of the generating task (the implicit task of the \code{single} region) must be suspended until the second explict task is completed. But, because of the dependence, the first explicit task must complete first, then the second explicit task can execute and complete, and only then the generating task can resume to the print statement. Thus, the program will always print "\texttt{x = 2}". \cexample[4.0]{task_dep}{12} \clearpage \ffreeexample[4.0]{task_dep}{12}