OpenMP-Examples/Examples_task_dep.tex

\pagebreak
\section{Task Dependences}
\label{sec:task_depend}

\subsection{Flow Dependence}
\label{subsec:task_flow_depend}

This example shows a simple flow dependence using a \code{depend}
clause on the \code{task} construct.

\cexample[4.0]{task_dep}{1}

\ffreeexample[4.0]{task_dep}{1}

The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program and the program
would have a race condition.

\subsection{Anti-dependence}
\label{subsec:task_anti_depend}

This example shows an anti-dependence using the \code{depend}
clause on the \code{task} construct.

\cexample[4.0]{task_dep}{2}

\ffreeexample[4.0]{task_dep}{2}

The program will always print \texttt{"}x = 1\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.

\subsection{Output Dependence}
\label{subsec:task_out_depend}

This example shows an output dependence using the \code{depend}
clause on the \code{task} construct.

\cexample[4.0]{task_dep}{3}

\ffreeexample[4.0]{task_dep}{3}

The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.

\pagebreak
\subsection{Concurrent Execution with Dependences}
\label{subsec:task_concurrent_depend}

In this example we show potentially concurrent execution of tasks using multiple
flow dependences expressed using the \code{depend} clause on the \code{task}
construct.

\cexample[4.0]{task_dep}{4}

\ffreeexample[4.0]{task_dep}{4}

The last two tasks are dependent on the first task. However there is no dependence
between the last two tasks, which may execute in any order (or concurrently if
more than one thread is available). Thus, the possible outputs are \texttt{"}x
+ 1 = 3. x + 2 = 4. \texttt{"} and \texttt{"}x + 2 = 4. x + 1 = 3. \texttt{"}.
If the \code{depend} clauses had been omitted, then all of the tasks could execute
in any order and the program would have a race condition.

\subsection{Matrix multiplication}
\label{subsec:task_matrix_mult}

This example shows a task-based blocked matrix multiplication. Matrices are of
NxN elements, and the multiplication is implemented using blocks of BSxBS elements.

\cexample[4.0]{task_dep}{5}

\ffreeexample[4.0]{task_dep}{5}

\subsection{\code{taskwait} with Dependences}
\label{subsec:taskwait_depend}

In this subsection three examples illustrate how the
\code{depend} clause can be applied to a \code{taskwait} construct to make the
generating task wait for specific child tasks to complete. This is an OpenMP 5.0 feature.
 In the same manner that
dependences can order executions among child tasks with \code{depend} clauses on
\code{task} constructs, the generating task can be scheduled to wait on child tasks
at a \code{taskwait} before it can proceed.

Note: Since the \code{depend} clause on a \code{taskwait} construct relaxes the
default synchronization behavior (waiting for all children to finish), it is important to
realize that child tasks that are not predecessor tasks, as determined by the \code{depend}
clause of the \code{taskwait} construct, may be running concurrently while the
generating task is executing after the taskwait.

In the first example the generating task waits at the \code{taskwait} construct
for the completion of the first child task because a dependence on the first task
is produced by \plc{x} with an \code{in} dependence type within the \code{depend}
clause of the \code{taskwait} construct.
Immediately after the first \code{taskwait} construct it is safe to access the
\plc{x} variable by the generating task, as shown in the print statement.
There is no completion restraint on the second child task.
Hence, immediately after the first \code{taskwait} it is unsafe to access the
\plc{y} variable since the second child task may still be executing.
The second \code{taskwait} ensures that the second child task has completed; hence
it is safe to access the \plc{y} variable in the following print statement.

\cexample[5.0]{task_dep}{6}

\ffreeexample[5.0]{task_dep}{6}

In this example the first two tasks are serialized, because a dependence on
the first child is produced by \plc{x} with the \code{in} dependence type
in the \code{depend} clause of the second task.
However, the generating task at the first \code{taskwait} waits only on the
first child task to complete, because a dependence on only the first child task
is produced by \plc{x} with an \code{in} dependence type within the
\code{depend} clause of the \code{taskwait} construct.
The second \code{taskwait} (without a \code{depend} clause) is included
to guarantee completion of the second task before \plc{y} is accessed.
(While unnecessary, the \code{depend(inout:} \code{y)} clause on the  2nd child task is
included to illustrate how the child task dependences can be completely annotated
in a data-flow model.)


\cexample[5.0]{task_dep}{7}

\ffreeexample[5.0]{task_dep}{7}


This example is similar to the previous one, except the generating task is
directed to also wait for completion of the second task.

The \code{depend} clause of the \code{taskwait} construct now includes an
\code{in} dependence type for \plc{y}.  Hence the generating task must now
wait on completion of any child task having \plc{y} with an \code{out}
(here \code{inout}) dependence type in its \code{depend} clause.
So, the \code{depend} clause of the \code{taskwait} construct now constrains
the second task to complete at the \code{taskwait}, too.
%--both tasks must now complete execution at the \code{taskwait}.
(This change makes the second \code{taskwait} of the previous example unnecessary--
it has been removed in this example.)

Note: While a taskwait construct ensures that all child tasks have completed; a depend clause on a taskwait
construct only waits for specific child tasks (prescribed by the dependence type and list
items in the \code{taskwait}'s \code{depend} clause).
This and the previous example illustrate the need to carefully determine
the dependence type of variables in the \code{taskwait} \code{depend} clause
when selecting child tasks that the generating task must wait on, so that its execution after the
taskwait does not produce race conditions on variables accessed by non-completed child tasks.

\cexample[5.0]{task_dep}{8}

\ffreeexample[5.0]{task_dep}{8}

\pagebreak
\subsection{Mutually Exclusive Execution with Dependences}
\label{subsec:task_dep_mutexinoutset}

In this example we show a series of tasks, including mutually exclusive
tasks, expressing dependences using the \code{depend} clause on the
\code{task} construct.

The program will always print~6. Tasks T1, T2 and T3 will be scheduled first,
in any order. Task T4 will be scheduled after tasks T1 and T2 are
completed. T5 will be scheduled after tasks T1 and T3 are completed. Due
to the \code{mutexinoutset} dependence type on \code{c}, T4 and T5 may be
scheduled in any order with respect to each other, but not at the same
time. Tasks T6 will be scheduled after both T4 and T5 are completed.

\cexample[5.0]{task_dep}{9}

\ffreeexample[5.0]{task_dep}{9}

The following example demonstrates a situation where the \code{mutexinoutset}
dependence type is advantageous. If \code{shortTaskB} completes
before \code{longTaskA}, the runtime can take advantage of this by
scheduling \code{longTaskBC} before \code{shortTaskAC}.

\cexample[5.0]{task_dep}{10}

\ffreeexample[5.0]{task_dep}{10}

\subsection{Multidependences Using Iterators}
\label{subsec:depend_iterator}

The following example uses an iterator to define a dynamic number of
dependences.

In the \code{single} construct of a parallel region a loop generates n tasks
and each task has an \code{out} dependence specified through an element of
the \plc{v} array.  This is followed by a single task that defines an \code{in}
dependence on each element of the array.  This is accomplished by
using the \code{iterator} modifier in the \code{depend} clause, supporting a dynamic number
of dependences (\plc{n} here).

The task for the \plc{print\_all\_elements} function is not executed until all dependences
prescribed (or registered) by the iterator are fulfilled; that is,
after all the tasks generated by the loop have completed.

Note, one cannot simply use an array section in the \code{depend} clause
of the second task construct because this would violate the \code{depend} clause restriction:

"List items used in \code{depend} clauses of the same task or sibling tasks
must indicate identical storage locations or disjoint storage locations".

In this case each of the loop tasks use a single disjoint (different storage)
element in their \code{depend} clause; however,
the array-section storage area prescribed in the commented directive is neither
identical nor disjoint to the storage prescibed by the elements of the
loop tasks.  The iterator overcomes this restriction by effectively
creating n disjoint storage areas.

\cexample[5.0]{task_dep}{11}

\ffreeexample[5.0]{task_dep}{11}

\subsection{Dependence for Undeferred Tasks}
\label{subsec:depend_undefer_task}

In the following example, we show that even if a task is undeferred as specified
by an \code{if} clause that evaluates to \plc{false}, task dependences are
still honored.

The \code{depend} clauses of the first and second explicit tasks specify that
the first task is completed before the second task.

The second explicit task has an \code{if} clause that evaluates to \plc{false}.
This means that the execution of the generating task (the implicit task of
the \code{single} region) must be suspended until the second explict task
is completed.
But, because of the dependence, the first explicit task must complete first,
then the second explicit task can execute and complete, and only then
the generating task can resume to the print statement.
Thus, the program will always print "\texttt{x = 2}".

\cexample[4.0]{task_dep}{12}
\clearpage

\ffreeexample[4.0]{task_dep}{12}