\pagebreak
\section{Task Dependences}
\label{sec:task_depend}

\subsection{Flow Dependence}
\label{subsec:task_flow_depend}

This example shows a simple flow dependence using a \code{depend}
clause on the \code{task} construct.

\cexample[4.0]{task_dep}{1}

\ffreeexample[4.0]{task_dep}{1}

The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program
would have a race condition.
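The ordering argument above can be sketched in a few lines of C. This is a minimal
illustration under stated assumptions, not the example source included above: the
function name \code{flow\_dependence} and the variable \code{seen} are hypothetical,
and a compiler with OpenMP 4.0 support is assumed.

```c
/* Hypothetical helper: returns the value of x observed by the reading task. */
int flow_dependence(void)
{
    int x = 1;
    int seen = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Writer task: registers an out dependence on x. */
        #pragma omp task shared(x) depend(out: x)
        x = 2;

        /* Reader task: the in dependence on x orders it after the writer. */
        #pragma omp task shared(x, seen) depend(in: x)
        seen = x;
    }   /* both tasks have completed by the end of the parallel region */

    return seen;
}
```

With the \code{depend} clauses in place the function should return 2. If the code is
compiled without OpenMP enabled, the pragmas are ignored and the statements run in
program order, which gives the same result.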
\subsection{Anti-dependence}
\label{subsec:task_anti_depend}

This example shows an anti-dependence using the \code{depend}
clause on the \code{task} construct.

\cexample[4.0]{task_dep}{2}

\ffreeexample[4.0]{task_dep}{2}

The program will always print \texttt{"}x = 1\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.
\subsection{Output Dependence}
\label{subsec:task_out_depend}

This example shows an output dependence using the \code{depend}
clause on the \code{task} construct.

\cexample[4.0]{task_dep}{3}

\ffreeexample[4.0]{task_dep}{3}

The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.
\pagebreak
\subsection{Concurrent Execution with Dependences}
\label{subsec:task_concurrent_depend}

In this example we show potentially concurrent execution of tasks using multiple
flow dependences expressed using the \code{depend} clause on the \code{task}
construct.

\cexample[4.0]{task_dep}{4}

\ffreeexample[4.0]{task_dep}{4}

The last two tasks are dependent on the first task. However, there is no dependence
between the last two tasks, which may execute in any order (or concurrently if
more than one thread is available). Thus, the possible outputs are \texttt{"}x
+ 1 = 3. x + 2 = 4. \texttt{"} and \texttt{"}x + 2 = 4. x + 1 = 3. \texttt{"}.
If the \code{depend} clauses had been omitted, then all of the tasks could execute
in any order and the program would have a race condition.
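A minimal sketch of this pattern follows, assuming OpenMP 4.0 support. The function
name \code{concurrent\_readers} and the result variables are hypothetical; the results
are stored instead of printed so that the outcome does not depend on which reader
runs first.

```c
/* Hypothetical helper: one writer task followed by two independent readers. */
void concurrent_readers(int *r1, int *r2)
{
    int x = 1;
    int a = 0, b = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* The writer must finish before either reader starts. */
        #pragma omp task shared(x) depend(out: x)
        x = 2;

        /* The two readers have no dependence on each other,
           so they may run in either order or concurrently. */
        #pragma omp task shared(x, a) depend(in: x)
        a = x + 1;

        #pragma omp task shared(x, b) depend(in: x)
        b = x + 2;
    }

    *r1 = a;
    *r2 = b;
}
```

Whatever the relative order of the two reader tasks, \code{*r1} should be 3 and
\code{*r2} should be 4 on return.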
\subsection{Matrix Multiplication}
\label{subsec:task_matrix_mult}

This example shows a task-based blocked matrix multiplication. The matrices have
NxN elements, and the multiplication is implemented using blocks of BSxBS elements.
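The blocking scheme can be sketched as below. This is an illustration under several
assumptions: the sizes \code{N} and \code{BS} are fixed small constants, \code{BS}
divides \code{N}, the function name \code{matmul\_blocked} is hypothetical, and the
first element of each block is used in the \code{depend} clauses as a stand-in for
the whole block. That last choice is a simplification made for this sketch.

```c
#define N  4   /* matrix order (assumed small for illustration) */
#define BS 2   /* block size (assumed to divide N evenly) */

/* Hypothetical helper: computes C += A * B, one task per block product. */
void matmul_blocked(float A[N][N], float B[N][N], float C[N][N])
{
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < N; i += BS)
            for (int j = 0; j < N; j += BS)
                for (int k = 0; k < N; k += BS) {
                    /* Tasks that update the same C block share the inout
                       dependence on C[i][j] and therefore run one at a time;
                       tasks for different C blocks may run concurrently. */
                    #pragma omp task firstprivate(i, j, k) \
                            depend(in: A[i][k], B[k][j]) \
                            depend(inout: C[i][j])
                    for (int ii = i; ii < i + BS; ii++)
                        for (int jj = j; jj < j + BS; jj++)
                            for (int kk = k; kk < k + BS; kk++)
                                C[ii][jj] += A[ii][kk] * B[kk][jj];
                }
    }
}
```

The caller is expected to initialize \code{C} (for example to zero) before the call,
since each task accumulates into its block.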
\cexample[4.0]{task_dep}{5}

\ffreeexample[4.0]{task_dep}{5}
\subsection{\code{taskwait} with Dependences}
\label{subsec:taskwait_depend}

In this subsection three examples illustrate how the
\code{depend} clause can be applied to a \code{taskwait} construct to make the
generating task wait for specific child tasks to complete. This is an OpenMP 5.0 feature.
In the same manner that
dependences can order executions among child tasks with \code{depend} clauses on
\code{task} constructs, the generating task can be scheduled to wait on child tasks
at a \code{taskwait} before it can proceed.

Note: Since the \code{depend} clause on a \code{taskwait} construct relaxes the
default synchronization behavior (waiting for all children to finish), it is important to
realize that child tasks that are not predecessor tasks, as determined by the \code{depend}
clause of the \code{taskwait} construct, may be running concurrently while the
generating task is executing after the \code{taskwait}.

In the first example the generating task waits at the \code{taskwait} construct
for the completion of the first child task because a dependence on the first task
is produced by \plc{x} with an \code{in} dependence type within the \code{depend}
clause of the \code{taskwait} construct.
Immediately after the first \code{taskwait} construct it is safe for the
generating task to access the \plc{x} variable, as shown in the print statement.
There is no completion constraint on the second child task.
Hence, immediately after the first \code{taskwait} it is unsafe to access the
\plc{y} variable since the second child task may still be executing.
The second \code{taskwait} ensures that the second child task has completed; hence
it is safe to access the \plc{y} variable in the following print statement.
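The behavior described for the first example can be sketched as follows, assuming a
compiler that supports the OpenMP 5.0 \code{depend} clause on \code{taskwait}. The
function name \code{taskwait\_depend\_x}, the initial values, and the task bodies are
assumptions made for illustration.

```c
/* Hypothetical helper: waits selectively on the first child, then on all. */
void taskwait_depend_x(int *x_out, int *y_out)
{
    int x = 0, y = 2;

    #pragma omp parallel
    #pragma omp single
    {
        /* First child task: has a dependence on x. */
        #pragma omp task shared(x) depend(inout: x)
        x++;

        /* Second child task: no depend clause. */
        #pragma omp task shared(y)
        y--;

        /* Waits only for the first child, through the dependence on x. */
        #pragma omp taskwait depend(in: x)
        *x_out = x;   /* safe: the first child has completed */

        /* Reading y here would be a race; wait for all children first. */
        #pragma omp taskwait
        *y_out = y;   /* safe: the second child has completed */
    }
}
```

On return, both \code{*x\_out} and \code{*y\_out} should be 1.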
\cexample[5.0]{task_dep}{6}

\ffreeexample[5.0]{task_dep}{6}

In the second example the first two tasks are serialized, because a dependence on
the first child is produced by \plc{x} with the \code{in} dependence type
in the \code{depend} clause of the second task.
However, the generating task at the first \code{taskwait} waits only on the
first child task to complete, because a dependence on only the first child task
is produced by \plc{x} with an \code{in} dependence type within the
\code{depend} clause of the \code{taskwait} construct.
The second \code{taskwait} (without a \code{depend} clause) is included
to guarantee completion of the second task before \plc{y} is accessed.
(While unnecessary, the \code{depend(inout:} \code{y)} clause on the second child task is
included to illustrate how the child task dependences can be completely annotated
in a data-flow model.)
\cexample[5.0]{task_dep}{7}

\ffreeexample[5.0]{task_dep}{7}

The third example is similar to the second one, except the generating task is
directed to also wait for completion of the second task.

The \code{depend} clause of the \code{taskwait} construct now includes an
\code{in} dependence type for \plc{y}. Hence the generating task must now
wait on completion of any child task having \plc{y} with an \code{out}
(here \code{inout}) dependence type in its \code{depend} clause.
So, the \code{depend} clause of the \code{taskwait} construct now constrains
the second task to complete at the \code{taskwait}, too.
%--both tasks must now complete execution at the \code{taskwait}.
(This change makes the second \code{taskwait} of the previous example unnecessary,
so it has been removed in this example.)

Note: While a \code{taskwait} construct without a \code{depend} clause ensures that
all child tasks have completed, a \code{taskwait} construct with a \code{depend}
clause waits only for specific child tasks (prescribed by the dependence type and list
items in the \code{taskwait}'s \code{depend} clause).
This and the previous example illustrate the need to carefully determine
the dependence type of variables in the \code{taskwait} \code{depend} clause
when selecting child tasks that the generating task must wait on, so that its execution after the
\code{taskwait} does not produce race conditions on variables accessed by non-completed child tasks.
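A sketch of the case in which a single \code{taskwait} covers both child tasks is
given below, assuming OpenMP 5.0 support. The function name
\code{taskwait\_depend\_xy}, the initial values, and the task bodies are hypothetical.

```c
/* Hypothetical helper: one taskwait that names both x and y. */
void taskwait_depend_xy(int *x_out, int *y_out)
{
    int x = 0, y = 2;

    #pragma omp parallel
    #pragma omp single
    {
        /* First child task: updates x. */
        #pragma omp task shared(x) depend(inout: x)
        x++;

        /* Second child task: ordered after the first through x,
           and registers an inout dependence on y. */
        #pragma omp task shared(x, y) depend(in: x) depend(inout: y)
        y -= x;

        /* The in dependences on x and y make both children predecessors,
           so no further taskwait is needed before reading x and y. */
        #pragma omp taskwait depend(in: x, y)
        *x_out = x;
        *y_out = y;
    }
}
```

On return, \code{*x\_out} should be 1 and \code{*y\_out} should be 1. Dropping
\plc{y} from the \code{taskwait} list would leave the read of \code{y} unordered
with respect to the second child task.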
\cexample[5.0]{task_dep}{8}

\ffreeexample[5.0]{task_dep}{8}

\pagebreak
\subsection{Mutually Exclusive Execution with Dependences}
\label{subsec:task_dep_mutexinoutset}

In this example we show a series of tasks, including mutually exclusive
tasks, expressing dependences using the \code{depend} clause on the
\code{task} construct.

The program will always print~6. Tasks T1, T2 and T3 will be scheduled first,
in any order. Task T4 will be scheduled after tasks T1 and T2 are
completed. T5 will be scheduled after tasks T1 and T3 are completed. Due
to the \code{mutexinoutset} dependence type on \code{c}, T4 and T5 may be
scheduled in any order with respect to each other, but not at the same
time. Task T6 will be scheduled after both T4 and T5 are completed.
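The schedule described above can be sketched as follows, assuming OpenMP 5.0 support
for the \code{mutexinoutset} dependence type. The labels T1 to T6 follow the text;
the function name \code{mutexinoutset\_sum}, the variables \code{a}, \code{b} and
\code{d}, and the assigned values are assumptions chosen so that the result is 6.

```c
/* Hypothetical helper: T4 and T5 update c in either order, never together. */
int mutexinoutset_sum(void)
{
    int a = 0, b = 0, c = 0, d = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(c) depend(out: c)                 /* T1 */
        c = 1;
        #pragma omp task shared(a) depend(out: a)                 /* T2 */
        a = 2;
        #pragma omp task shared(b) depend(out: b)                 /* T3 */
        b = 3;

        /* T4 follows T1 and T2; T5 follows T1 and T3. The mutexinoutset
           type on c leaves their relative order free but makes them
           mutually exclusive. */
        #pragma omp task shared(a, c) depend(in: a) depend(mutexinoutset: c)  /* T4 */
        c += a;
        #pragma omp task shared(b, c) depend(in: b) depend(mutexinoutset: c)  /* T5 */
        c += b;

        /* T6 follows both T4 and T5. */
        #pragma omp task shared(c, d) depend(in: c)               /* T6 */
        d = c;
    }

    return d;
}
```

Because addition is commutative, the result should be 6 whichever of T4 and T5 runs
first.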
\cexample[5.0]{task_dep}{9}

\ffreeexample[5.0]{task_dep}{9}

The following example demonstrates a situation where the \code{mutexinoutset}
dependence type is advantageous. If \code{shortTaskB} completes
before \code{longTaskA}, the runtime can take advantage of this by
scheduling \code{longTaskBC} before \code{shortTaskAC}.

\cexample[5.0]{task_dep}{10}

\ffreeexample[5.0]{task_dep}{10}
\subsection{Multidependences Using Iterators}
\label{subsec:depend_iterator}

The following example uses an iterator to define a dynamic number of
dependences.

In the \code{single} construct of a parallel region a loop generates \plc{n} tasks
and each task has an \code{out} dependence specified through an element of
the \plc{v} array. This is followed by a single task that defines an \code{in}
dependence on each element of the array. This is accomplished by
using the \code{iterator} modifier in the \code{depend} clause, supporting a dynamic number
of dependences (\plc{n} here).

The task for the \plc{print\_all\_elements} function is not executed until all dependences
prescribed (or registered) by the iterator are fulfilled; that is,
after all the tasks generated by the loop have completed.

Note that one cannot simply use an array section in the \code{depend} clause
of the second task construct because this would violate the \code{depend} clause restriction:

``List items used in \code{depend} clauses of the same task or sibling tasks
must indicate identical storage locations or disjoint storage locations''.

In this case each of the loop tasks uses a single disjoint (different storage)
element in its \code{depend} clause; however,
the array-section storage area prescribed in the commented directive is neither
identical to nor disjoint from the storage prescribed by the elements of the
loop tasks. The iterator overcomes this restriction by effectively
creating \plc{n} disjoint storage areas.
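The iterator pattern can be sketched as below, assuming a compiler that supports the
OpenMP 5.0 \code{iterator} modifier. The function name
\code{sum\_after\_all\_writes} is hypothetical, and the final task sums the elements
instead of printing them so that the result can be checked; \plc{n} is assumed to be
positive.

```c
/* Hypothetical helper: n writer tasks, then one task that depends on all. */
int sum_after_all_writes(int n)
{
    int v[n];      /* n is assumed to be greater than zero */
    int sum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Each loop task registers an out dependence on one element. */
        for (int i = 0; i < n; ++i) {
            #pragma omp task firstprivate(i) depend(out: v[i])
            v[i] = i;
        }

        /* The iterator expands to in dependences on v[0] ... v[n-1],
           so this task starts only after every loop task has completed. */
        #pragma omp task shared(sum) depend(iterator(it = 0:n), in: v[it])
        {
            for (int i = 0; i < n; ++i)
                sum += v[i];
        }
    }

    return sum;
}
```

For example, \code{sum\_after\_all\_writes(5)} should return 10, the sum of 0
through 4.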
\cexample[5.0]{task_dep}{11}

\ffreeexample[5.0]{task_dep}{11}

\subsection{Dependence for Undeferred Tasks}
\label{subsec:depend_undefer_task}

In the following example, we show that even if a task is undeferred as specified
by an \code{if} clause that evaluates to \plc{false}, task dependences are
still honored.

The \code{depend} clauses of the first and second explicit tasks specify that
the first task must complete before the second task executes.

The second explicit task has an \code{if} clause that evaluates to \plc{false}.
This means that the execution of the generating task (the implicit task of
the \code{single} region) must be suspended until the second explicit task
is completed.
But, because of the dependence, the first explicit task must complete first,
then the second explicit task can execute and complete, and only then
can the generating task resume to the print statement.
Thus, the program will always print "\texttt{x = 2}".
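The reasoning above can be sketched as follows, assuming OpenMP 4.0 support. The
function name \code{undeferred\_dependence} and the variable \code{seen} are
hypothetical; the value is returned instead of printed so that it can be checked.

```c
/* Hypothetical helper: returns x as seen by the generating task. */
int undeferred_dependence(void)
{
    int x = 0;
    int seen = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* First explicit task: may be deferred. */
        #pragma omp task shared(x) depend(out: x)
        x = 1;

        /* Second explicit task: undeferred because of if(0), yet it
           still waits for the first task through the dependence on x. */
        #pragma omp task shared(x) depend(inout: x) if(0)
        x = 2;

        /* The generating task resumes only after the undeferred task
           has completed, so no taskwait is needed before this read. */
        seen = x;
    }

    return seen;
}
```

The function should return 2, matching the output stated above.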
\cexample[4.0]{task_dep}{12}
\clearpage

\ffreeexample[4.0]{task_dep}{12}