mirror of
https://github.com/OpenMP/Examples.git
synced 2025-04-04 05:41:33 +01:00
327 lines
14 KiB
TeX
327 lines
14 KiB
TeX
%\pagebreak
|
|
\section{Task Dependences}
|
|
\label{sec:task_depend}
|
|
\index{dependences!task dependences}
|
|
|
|
\subsection{Flow Dependence}
|
|
\label{subsec:task_flow_depend}
|
|
\index{task dependences!flow dependence}
|
|
\index{task construct@\kcode{task} construct!depend clause@\kcode{depend} clause}
|
|
\index{task construct@\kcode{task} construct}
|
|
\index{constructs!task@\kcode{task}}
|
|
\index{depend clause@\kcode{depend} clause}
|
|
\index{clauses!depend@\kcode{depend}}
|
|
|
|
This example shows a simple flow dependence using a \kcode{depend}
|
|
clause on the \kcode{task} construct.
|
|
|
|
\cexample[4.0]{task_dep}{1}
|
|
|
|
\ffreeexample[4.0]{task_dep}{1}
|
|
|
|
The program will always print \pout{x = 2}, because the \kcode{depend}
|
|
clauses enforce the ordering of the tasks. If the \kcode{depend} clauses had been
|
|
omitted, then the tasks could execute in any order and the program and the program
|
|
would have a race condition.
|
|
|
|
\subsection{Anti-dependence}
|
|
\label{subsec:task_anti_depend}
|
|
\index{task dependences!anti dependence}
|
|
|
|
This example shows an anti-dependence using the \kcode{depend}
|
|
clause on the \kcode{task} construct.
|
|
|
|
\cexample[4.0]{task_dep}{2}
|
|
|
|
\ffreeexample[4.0]{task_dep}{2}
|
|
|
|
The program will always print \pout{x = 1}, because the \kcode{depend}
|
|
clauses enforce the ordering of the tasks. If the \kcode{depend} clauses had been
|
|
omitted, then the tasks could execute in any order and the program would have a
|
|
race condition.
|
|
|
|
\subsection{Output Dependence}
|
|
\label{subsec:task_out_depend}
|
|
\index{task dependences!output dependence}
|
|
|
|
This example shows an output dependence using the \kcode{depend}
|
|
clause on the \kcode{task} construct.
|
|
|
|
\cexample[4.0]{task_dep}{3}
|
|
|
|
\ffreeexample[4.0]{task_dep}{3}
|
|
|
|
The program will always print \pout{x = 2}, because the \kcode{depend}
|
|
clauses enforce the ordering of the tasks. If the \kcode{depend} clauses had been
|
|
omitted, then the tasks could execute in any order and the program would have a
|
|
race condition.
|
|
|
|
\pagebreak
|
|
\subsection{Concurrent Execution with Dependences}
|
|
\label{subsec:task_concurrent_depend}
|
|
\index{task dependences!concurrent execution with}
|
|
|
|
In this example we show potentially concurrent execution of tasks using multiple
|
|
flow dependences expressed using the \kcode{depend} clause on the \kcode{task}
|
|
construct.
|
|
|
|
The last two tasks are dependent on the first task. However, there is no dependence
|
|
between the last two tasks, which may execute in any order (or concurrently if
|
|
more than one thread is available). Thus, the possible outputs are
|
|
\pout{x + 1 = 3. x + 2 = 4.} and \pout{x + 2 = 4. x + 1 = 3.}.
|
|
If the \kcode{depend} clauses had been omitted, then all of the tasks could execute
|
|
in any order and the program would have a race condition.
|
|
|
|
\cexample[4.0]{task_dep}{4}
|
|
|
|
\ffreeexample[4.0]{task_dep}{4}
|
|
|
|
The following example illustrates the semantic difference between \kcode{inout}
|
|
and \kcode{inoutset} dependence types. In Case 1, tasks generated at T1
|
|
inside the loop have dependences among themselves due to
|
|
the \kcode{inout} dependence type and with task T2.
|
|
As a result, these tasks are executed sequentially before the print
|
|
statement from task T2.
|
|
In Case 2, tasks generated at T3 inside the loop have no dependences
|
|
among themselves from the \kcode{inoutset} dependence type, but have
|
|
dependences with task T4.
|
|
As a result, these tasks are executed concurrently before the print
|
|
statement from task T4.
|
|
|
|
\cexample[5.1]{task_dep}{4b}
|
|
|
|
\ffreeexample[5.1]{task_dep}{4b}
|
|
|
|
\subsection{Matrix multiplication}
|
|
\label{subsec:task_matrix_mult}
|
|
\index{task dependences!matrix multiplication}
|
|
|
|
This example shows a task-based blocked matrix multiplication. Matrices are of
|
|
\ucode{N}x\ucode{N} elements, and the multiplication is implemented using blocks
|
|
of \ucode{BS}x\ucode{BS} elements.
|
|
|
|
\cexample[4.0]{task_dep}{5}
|
|
|
|
\ffreeexample[4.0]{task_dep}{5}
|
|
|
|
\subsection{\kcode{taskwait} with Dependences}
|
|
\label{subsec:taskwait_depend}
|
|
\index{task dependences!taskwait construct with@\kcode{taskwait} construct with}
|
|
\index{taskwait construct@\kcode{taskwait} construct}
|
|
\index{constructs!taskwait@\kcode{taskwait}}
|
|
\index{taskwait construct@\kcode{taskwait} construct!depend clause@\kcode{depend} clause}
|
|
\index{depend clause@\kcode{depend} clause}
|
|
\index{clauses!depend@\kcode{depend}}
|
|
|
|
In this subsection three examples illustrate how the
|
|
\kcode{depend} clause can be applied to a \kcode{taskwait} construct to make the
|
|
generating task wait for specific child tasks to complete. This is an OpenMP 5.0 feature.
|
|
In the same manner that
|
|
dependences can order executions among child tasks with \kcode{depend} clauses on
|
|
\kcode{task} constructs, the generating task can be scheduled to wait on child tasks
|
|
at a \kcode{taskwait} before it can proceed.
|
|
|
|
Note: Since the \kcode{depend} clause on a \kcode{taskwait} construct relaxes the
|
|
default synchronization behavior (waiting for all children to finish), it is important to
|
|
realize that child tasks that are not predecessor tasks, as determined by the \kcode{depend}
|
|
clause of the \kcode{taskwait} construct, may be running concurrently while the
|
|
generating task is executing after the taskwait.
|
|
|
|
In the first example the generating task waits at the \kcode{taskwait} construct
|
|
for the completion of the first child task because a dependence on the first task
|
|
is produced by \ucode{x} with an \kcode{in} dependence type within the \kcode{depend}
|
|
clause of the \kcode{taskwait} construct.
|
|
Immediately after the first \kcode{taskwait} construct it is safe to access the
|
|
\ucode{x} variable by the generating task, as shown in the print statement.
|
|
There is no completion restraint on the second child task.
|
|
Hence, immediately after the first \kcode{taskwait} it is unsafe to access the
|
|
\ucode{y} variable since the second child task may still be executing.
|
|
The second \kcode{taskwait} ensures that the second child task has completed; hence
|
|
it is safe to access the \ucode{y} variable in the following print statement.
|
|
|
|
\cexample[5.0]{task_dep}{6}
|
|
|
|
\ffreeexample[5.0]{task_dep}{6}
|
|
|
|
In this example the first two tasks are serialized, because a dependence on
|
|
the first child is produced by \ucode{x} with the \kcode{in} dependence type
|
|
in the \kcode{depend} clause of the second task.
|
|
However, the generating task at the first \kcode{taskwait} waits only on the
|
|
first child task to complete, because a dependence on only the first child task
|
|
is produced by \ucode{x} with an \kcode{in} dependence type within the
|
|
\kcode{depend} clause of the \kcode{taskwait} construct.
|
|
The second \kcode{taskwait} (without a \kcode{depend} clause) is included
|
|
to guarantee completion of the second task before \ucode{y} is accessed.
|
|
(While unnecessary, the \kcode{depend(inout: \ucode{y})} clause on the 2nd child task is
|
|
included to illustrate how the child task dependences can be completely annotated
|
|
in a data-flow model.)
|
|
|
|
|
|
\cexample[5.0]{task_dep}{7}
|
|
|
|
\ffreeexample[5.0]{task_dep}{7}
|
|
\clearpage
|
|
|
|
|
|
This example is similar to the previous one, except the generating task is
|
|
directed to also wait for completion of the second task.
|
|
|
|
The \kcode{depend} clause of the \kcode{taskwait} construct now includes an
|
|
\kcode{in} dependence type for \ucode{y}. Hence the generating task must now
|
|
wait on completion of any child task having \ucode{y} with an \kcode{out}
|
|
(here \kcode{inout}) dependence type in its \kcode{depend} clause.
|
|
So, the \kcode{depend} clause of the \kcode{taskwait} construct now constrains
|
|
the second task to complete at the taskwait, too.
|
|
%--both tasks must now complete execution at the \code{taskwait}.
|
|
(This change makes the second taskwait of the previous example unnecessary--
|
|
it has been removed in this example.)
|
|
|
|
Note: While a \kcode{taskwait} construct ensures that all child tasks have completed; a depend clause on a \kcode{taskwait}
|
|
construct only waits for specific child tasks (prescribed by the dependence type and list
|
|
items in the \kcode{taskwait}'s \kcode{depend} clause).
|
|
This and the previous example illustrate the need to carefully determine
|
|
the dependence type of variables in the \kcode{depend} clause of the \kcode{taskwait} construct.
|
|
when selecting child tasks that the generating task must wait on, so that its execution after the
|
|
taskwait does not produce race conditions on variables accessed by non-completed child tasks.
|
|
|
|
\cexample[5.0]{task_dep}{8}
|
|
|
|
\ffreeexample[5.0]{task_dep}{8}
|
|
|
|
\pagebreak
|
|
\subsection{Mutually Exclusive Execution with Dependences}
|
|
\label{subsec:task_dep_mutexinoutset}
|
|
\index{task dependences!mutually exclusive execution}
|
|
|
|
In this example we show a series of tasks, including mutually exclusive
|
|
tasks, expressing dependences using the \kcode{depend} clause on the
|
|
\kcode{task} construct.
|
|
|
|
The program will always print \pout{6}. Tasks T1, T2 and T3 will be scheduled first,
|
|
in any order. Task T4 will be scheduled after tasks T1 and T2 are
|
|
completed. T5 will be scheduled after tasks T1 and T3 are completed. Due
|
|
to the \kcode{mutexinoutset} dependence type on \ucode{c}, T4 and T5 may be
|
|
scheduled in any order with respect to each other, but not at the same
|
|
time. Tasks T6 will be scheduled after both T4 and T5 are completed.
|
|
|
|
\cexample[5.0]{task_dep}{9}
|
|
|
|
\ffreeexample[5.0]{task_dep}{9}
|
|
|
|
The following example demonstrates a situation where the \kcode{mutexinoutset}
|
|
dependence type is advantageous. If \ucode{shortTaskB} completes
|
|
before \ucode{longTaskA}, the runtime can take advantage of this by
|
|
scheduling \ucode{longTaskBC} before \ucode{shortTaskAC}.
|
|
|
|
\cexample[5.0]{task_dep}{10}
|
|
|
|
\ffreeexample[5.0]{task_dep}{10}
|
|
|
|
\subsection{Multidependences Using Iterators}
|
|
\label{subsec:depend_iterator}
|
|
\index{task dependences!using iterators}
|
|
\index{depend clause@\kcode{depend} clause!iterator modifier@\kcode{iterator} modifier}
|
|
\index{iterator modifier@\kcode{iterator} modifier}
|
|
|
|
The following example uses an iterator to define a dynamic number of
|
|
dependences.
|
|
|
|
In the \kcode{single} construct of a parallel region a loop generates \ucode{n} tasks
|
|
and each task has an \kcode{out} dependence specified through an element of
|
|
the \ucode{v} array. This is followed by a single task that defines an \kcode{in}
|
|
dependence on each element of the array. This is accomplished by
|
|
using the \kcode{iterator} modifier in the \kcode{depend} clause, supporting a dynamic number
|
|
of dependences (\ucode{n} here).
|
|
|
|
The task for the \ucode{print_all_elements} procedure is not executed until all dependences
|
|
prescribed (or registered) by the iterator are fulfilled; that is,
|
|
after all the tasks generated by the loop have completed.
|
|
|
|
Note, one cannot simply use an array section in the \kcode{depend} clause
|
|
of the second task construct because this would violate the \kcode{depend} clause restriction:
|
|
|
|
``List items used in \kcode{depend} clauses of the same task or sibling tasks
|
|
must indicate identical storage locations or disjoint storage locations''.
|
|
|
|
In this case each of the loop tasks use a single disjoint (different storage)
|
|
element in their \kcode{depend} clause; however,
|
|
the array-section storage area prescribed in the commented directive is neither
|
|
identical nor disjoint to the storage prescribed by the elements of the
|
|
loop tasks. The iterator overcomes this restriction by effectively
|
|
creating \ucode{n} disjoint storage areas.
|
|
|
|
\cexample[5.0]{task_dep}{11}
|
|
|
|
\ffreeexample[5.0]{task_dep}{11}
|
|
|
|
\subsection{Dependence for Undeferred Tasks}
|
|
\label{subsec:depend_undefer_task}
|
|
\index{task dependences!undeferred tasks}
|
|
|
|
In the following example, we show that even if a task is undeferred as specified
|
|
by an \kcode{if} clause that evaluates to \vcode{false}, task dependences are
|
|
still honored.
|
|
|
|
The \kcode{depend} clauses of the first and second explicit tasks specify that
|
|
the first task is completed before the second task.
|
|
|
|
The second explicit task has an \kcode{if} clause that evaluates to \vcode{false}.
|
|
This means that the execution of the generating task (the implicit task of
|
|
the \kcode{single} region) must be suspended until the second explicit task
|
|
is completed.
|
|
But, because of the dependence, the first explicit task must complete first,
|
|
then the second explicit task can execute and complete, and only then
|
|
the generating task can resume to the print statement.
|
|
Thus, the program will always print \pout{x = 2}.
|
|
|
|
\cexample[4.0]{task_dep}{12}
|
|
\clearpage
|
|
|
|
\ffreeexample[4.0]{task_dep}{12}
|
|
|
|
|
|
In OpenMP 5.1 the \kcode{omp_all_memory} \plc{reserved locator} was introduced
|
|
to specify storage of all objects in memory. In the following example,
|
|
it is used in Task 4 as a convenient way to specify that the locator
|
|
(list item) denotes the storage of all objects (locations) in memory, and
|
|
will therefore match the \ucode{a} and \ucode{d} locators of Task 2, Task 3 and Task 6.
|
|
The dependences guarantee the ordered execution of Tasks 2 and 3 before 4, and
|
|
Task 4 before Task 6.
|
|
Since there are no dependences imposed on Task 1 and Task 5, they can be
|
|
scheduled to execute at any time, with no ordering.
|
|
|
|
\cexample[5.1]{task_dep}{13}
|
|
\ffreeexample[5.1]{task_dep}{13}
|
|
|
|
\subsection{Transparent Task Dependences}
|
|
\label{subsec:depend_trans_task}
|
|
\index{task dependences!transparent tasks}
|
|
|
|
|
|
In the following example, each iteration of the \ucode{h}-loop updates all
|
|
elements of array \ucode{M} and task dependences are used to synchronize
|
|
updates across different iterations of the loop. The code uses two levels of
|
|
dependent tasks and assumes that
|
|
\ucode{N_ROWS} is evenly divisible by \ucode{ROWS_PER_TASK}.
|
|
The \ucode{h}-loop generates the first level of tasks, with
|
|
the \kcode{depend} clause serializing their execution and each task calling
|
|
\ucode{my_func}. A second level of tasks are generated by the \ucode{i}-loop in
|
|
\ucode{my_func}.
|
|
|
|
However, the dependences for this second level of tasks are between tasks from
|
|
different calls to \ucode{my_func}. In order to enforce these dependences, the
|
|
first-level tasks are specified as transparent tasks with the
|
|
\kcode{transparent(omp_impex)} clause. The \kcode{omp_impex} argument (which
|
|
is the default if not explicitly specified) indicates that the task is both an
|
|
exporting and importing task. For the purposes of dependence matching, an
|
|
exporting task is one that makes its child tasks visible to its successors and
|
|
an importing task is one that makes its preceding tasks (such as earlier
|
|
sibling tasks) visible to its child tasks. As a result of the exposed
|
|
dependences, the task generated in the $i^{th}$ iteration of the
|
|
\ucode{h}=$h_0$ instance of \ucode{my_func} is guaranteed to be ordered before
|
|
the task generated in the $i^{th}$ iteration of the \ucode{h}=$h_1$ instance of
|
|
\ucode{my_func}, where $h_0 < h_1$.
|
|
|
|
\cexample[6.0]{task_dep}{14}
|
|
\ffreeexample[6.0]{task_dep}{14}
|