\pagebreak
\section{Reduction}
\label{sec:reduction}
This section covers ways to perform reductions in parallel, task, taskloop, and SIMD regions.
\subsection{The \code{reduction} Clause}
\label{subsec:reduction}
The following example demonstrates the \code{reduction} clause; note that some
reductions can be expressed in the loop in several ways, as shown for the \code{max}
and \code{min} reductions below:
\cexample{reduction}{1}
\pagebreak
\ffreeexample{reduction}{1}
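As a condensed sketch of the same pattern, the following C function applies three
\code{reduction} clauses to one loop. The \plc{stats\_t} type and the \plc{reduce\_stats}
name are hypothetical and are not taken from the example sources. The \code{min} reduction
is written as a conditional assignment and the \code{max} reduction as a conditional
expression, to illustrate that either form can be used.

```c
#include <assert.h>

/* Hypothetical result type and function name, used only for this sketch. */
typedef struct { long sum; int min; float max; } stats_t;

stats_t reduce_stats(const int *y, const float *x, int n)
{
    assert(n > 0);              /* y[0] and x[0] seed the min and max results */

    long  sum = 0;
    int   lo  = y[0];
    float hi  = x[0];

    /* Each thread works on private copies of sum, lo and hi; the copies are
       combined with the original variables at the end of the loop region. */
    #pragma omp parallel for reduction(+:sum) reduction(min:lo) reduction(max:hi)
    for (int i = 0; i < n; i++) {
        sum += y[i];
        if (lo > y[i]) lo = y[i];           /* min as a conditional assignment */
        hi = (hi > x[i]) ? hi : x[i];       /* max as a conditional expression */
    }

    stats_t r = { sum, lo, hi };
    return r;
}
```

Integer and exactly representable values keep the result independent of the order in
which the partial results are combined.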
A common implementation of the preceding example is to treat it as if it had been
written as follows:
\cexample{reduction}{2}
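The manual form can be sketched for a single sum as follows. This is only one possible
lowering, not necessarily what a given implementation does; the \plc{manual\_sum} and
\plc{sum\_p} names are hypothetical. Each thread accumulates into a private partial
result that starts at the identity value of the operator, and the partial results are
then combined into the shared variable inside a \code{critical} region.

```c
#include <assert.h>

/* Hypothetical function name; sums y[0..n-1] without a reduction clause. */
long manual_sum(const int *y, int n)
{
    assert(n >= 0);
    long sum = 0;                   /* shared original variable */

    #pragma omp parallel shared(sum)
    {
        long sum_p = 0;             /* private partial result, identity of + */

        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            sum_p += y[i];

        /* Combine the partial results one thread at a time. */
        #pragma omp critical
        sum += sum_p;
    }
    return sum;
}
```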
\fortranspecificstart
\ffreenexample{reduction}{2}
The following program is non-conforming because the reduction is on the
\emph{intrinsic procedure name} \code{MAX} but that name has been redefined to be the variable
named \code{MAX}.
\ffreenexample{reduction}{3}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
The following conforming program performs the reduction using the
\emph{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed
to \code{REN}.
\ffreenexample{reduction}{4}
The following conforming program performs the reduction using the
\emph{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed
to \code{MIN}.
\ffreenexample{reduction}{5}
\fortranspecificend
\pagebreak
The following example is non-conforming because the initialization (\code{a =
0}) of the original list item \code{a} is not synchronized with the update of
\code{a} as a result of the reduction computation in the \code{for} loop. Therefore,
the example may print an incorrect value for \code{a}.
To avoid this problem, the initialization of the original list item \code{a}
should complete before any update of \code{a} as a result of the \code{reduction}
clause. This can be achieved by adding an explicit barrier after the assignment
\code{a = 0}, or by enclosing the assignment \code{a = 0} in a \code{single}
directive (which has an implied barrier), or by initializing \code{a} before
the start of the \code{parallel} region.
\cexample{reduction}{6}
\fexample{reduction}{6}
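One of the fixes listed above, enclosing the assignment in a \code{single} construct,
might look like the following sketch. The \plc{sum\_below} function name is hypothetical.
The implied barrier at the end of the \code{single} region ensures that the initialization
completes before any thread starts the reduction loop.

```c
#include <assert.h>

/* Hypothetical function name; returns 0 + 1 + ... + (n-1). */
int sum_below(int n)
{
    assert(n >= 0);
    int a;

    #pragma omp parallel shared(a)
    {
        /* One thread initializes a; the implied barrier at the end of the
           single region orders this write before the reduction below. */
        #pragma omp single
        a = 0;

        #pragma omp for reduction(+:a)
        for (int i = 0; i < n; i++)
            a += i;
    }
    return a;
}
```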
The following example demonstrates the reduction of array \plc{a}.
In C/C++ this is illustrated by the explicit use of an array section \plc{a[0:N]}
in the \code{reduction} clause.
The corresponding Fortran example uses array syntax supported in the base language.
As of the OpenMP 4.5 specification, the explicit use of an array section in the
\code{reduction} clause is not permitted in Fortran; this oversight will be fixed
in the next release of the specification.
\cexample{reduction}{7}
\ffreeexample{reduction}{7}
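A small C sketch of an array-section reduction is given below, assuming a fixed column
count \plc{N} and a hypothetical \plc{column\_sums} function. Every thread receives a
private copy of the section \plc{a[0:N]}, and the copies are added element by element
into the original array at the end of the loop region.

```c
#include <assert.h>

#define N 4   /* number of columns; chosen only for this sketch */

/* Hypothetical function name; a[j] receives the sum of column j of b. */
void column_sums(int rows, int b[][N], int a[N])
{
    assert(rows >= 0);

    for (int j = 0; j < N; j++)
        a[j] = 0;

    /* The array section a[0:N] names the N elements that are reduced. */
    #pragma omp parallel for reduction(+:a[0:N])
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++)
            a[j] += b[i][j];
}
```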
\subsection{Task Reduction}
\label{subsec:task_reduction}
The following C/C++ and Fortran examples show how to implement
a task reduction over a linked list.
Task reductions are supported by the \code{task\_reduction} clause, which can only be
applied to the \code{taskgroup} directive, and an \code{in\_reduction} clause,
which can be applied to the \code{task} construct among others.
The \code{task\_reduction} clause on the \code{taskgroup} construct is used to
define the scope of a new reduction, and after the \code{taskgroup}
region the original variable will contain the final value of the reduction.
In the task-generating \code{while} loop the \code{in\_reduction} clause of the \code{task}
construct is used to specify that the task participates ``in'' the reduction.
Note: The \plc{res} variable is private in the \plc{linked\_list\_sum} routine
and is not required to be shared (as in the case of a \code{parallel} construct
reduction).
\cexample{task_reduction}{1}
\ffreeexample{task_reduction}{1}
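The structure described above can be sketched in C as follows. The \plc{node\_t} type,
its \plc{val} and \plc{next} fields, and the \plc{list\_total} wrapper are assumptions made
for this sketch. The \code{taskgroup} construct opens the reduction scope and each
generated task joins it through \code{in\_reduction}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node. */
typedef struct node { int val; struct node *next; } node_t;

int linked_list_sum(node_t *p)
{
    assert(p == NULL || p->next != p);   /* reject a trivial self-loop */
    int res = 0;                         /* private to the calling thread */

    /* The taskgroup defines the scope of a new reduction on res. */
    #pragma omp taskgroup task_reduction(+:res)
    {
        node_t *aux = p;
        while (aux != NULL) {
            /* Each task participates in the reduction; aux is firstprivate. */
            #pragma omp task in_reduction(+:res)
            res += aux->val;

            aux = aux->next;
        }
    }
    /* After the taskgroup region res holds the final value. */
    return res;
}

/* Hypothetical wrapper: one thread generates the tasks, the team runs them. */
int list_total(node_t *head)
{
    int total = 0;
    #pragma omp parallel
    #pragma omp single
    total = linked_list_sum(head);
    return total;
}
```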
\subsection{Taskloop Reduction}
\label{subsec:taskloop_reduction}
In the OpenMP 5.0 Specification the \code{taskloop} construct
was extended to support reductions.
The following two examples show how to implement a reduction over an array
using taskloop reduction in two different ways.
In the first
example the \code{reduction} clause is applied to the \code{taskloop} construct. As
explained above in the task reduction examples, a reduction over tasks is
divided into two components: the scope of the reduction, which is defined by a
\code{taskgroup} region, and the tasks that participate in the reduction. In this
example, the \code{reduction} clause defines both semantics. First, it specifies that
the implicit \code{taskgroup} region associated with the \code{taskloop} construct is the scope of the
reduction, and second, it defines all tasks created by the \code{taskloop} construct as
participants of the reduction. Regarding the first property, it is important to note
that if the \code{nogroup} clause is added to the \code{taskloop} construct the code will be
non-conforming, because the tasks would then participate in a
reduction that has not been defined.
\cexample{taskloop_reduction}{1}
\ffreeexample{taskloop_reduction}{1}
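A minimal C sketch of this first form is shown below, using integer data so that the
result does not depend on the order of combination. The \plc{array\_total} wrapper is
hypothetical. The single \code{reduction} clause both opens the reduction scope, through
the implicit \code{taskgroup} region, and enrolls every generated task in it.

```c
#include <assert.h>

int array_sum(int n, const int *v)
{
    assert(n >= 0);
    int res = 0;

    /* Scope and participation are both defined by this one clause. */
    #pragma omp taskloop reduction(+:res)
    for (int i = 0; i < n; i++)
        res += v[i];

    return res;
}

/* Hypothetical wrapper: one thread creates the taskloop tasks. */
int array_total(int n, const int *v)
{
    int total = 0;
    #pragma omp parallel
    #pragma omp single
    total = array_sum(n, v);
    return total;
}
```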
%In the second example, we are computing exactly the same
%value but we do it in a very different way. The first thing that we do in the
%\plc{array\_sum} function is to create a \code{taskgroup} region that defines the scope of a
%new reduction using the \code{task\_reduction} clause.
%After that, we specify that a task and also the tasks generated
%by a taskloop will participate in that reduction using the \code{in\_reduction} clause
%on the \code{task} and \code{taskloop} constructs, respectively. Note that
%we also added the \code{nogroup} clause to the \code{taskloop} construct. This is allowed
%because what we are expressing with the \code{in\_reduction} clause is different
%from what we were expressing with the \code{reduction} clause. In one case we specify
%that the generated tasks will participate in a previously declared reduction
%(\code{in\_reduction} clause) whereas in the other case we specify that we want to
%create a new reduction and also that all tasks generated by the taskloop will
%participate on it.
The second example computes exactly the same value as the preceding \plc{taskloop\_reduction.1} code section,
but in a very different way.
First, in the \plc{array\_sum} function a \code{taskgroup} region is created
that defines the scope of a new reduction using the \code{task\_reduction} clause.
After that, a task and also the tasks generated by a taskloop participate in
that reduction by using the \code{in\_reduction} clause on the \code{task}
and \code{taskloop} constructs, respectively.
Note that the \code{nogroup} clause was added to the \code{taskloop} construct.
This is allowed because what is expressed with the \code{in\_reduction} clause
is different from what is expressed with the \code{reduction} clause.
In one case the generated tasks are specified to participate in a previously
declared reduction (\code{in\_reduction} clause), whereas in the other case
a new reduction is created and all tasks generated
by the taskloop participate in it.
\cexample{taskloop_reduction}{2}
\ffreeexample{taskloop_reduction}{2}
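The second form might be sketched in C as follows. The \plc{array\_sum\_grouped} and
\plc{array\_total\_grouped} names are hypothetical. An explicit \code{taskgroup} region
declares the reduction, and both an ordinary task and the taskloop tasks join it with
\code{in\_reduction}; \code{nogroup} is acceptable here because the taskloop does not
need a reduction scope of its own.

```c
#include <assert.h>

/* Hypothetical function name; sums v[0..n-1]. */
int array_sum_grouped(int n, const int *v)
{
    assert(n >= 0);
    int res = 0;

    /* Explicit reduction scope. */
    #pragma omp taskgroup task_reduction(+:res)
    {
        if (n > 0) {
            /* A single task contributes the first element. */
            #pragma omp task in_reduction(+:res)
            res = res + v[0];

            /* The taskloop tasks contribute the remaining elements. */
            #pragma omp taskloop in_reduction(+:res) nogroup
            for (int i = 1; i < n; i++)
                res += v[i];
        }
    }
    return res;
}

/* Hypothetical wrapper: one thread generates all tasks. */
int array_total_grouped(int n, const int *v)
{
    int total = 0;
    #pragma omp parallel
    #pragma omp single
    total = array_sum_grouped(n, v);
    return total;
}
```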
In the OpenMP 5.0 Specification, \code{reduction} clauses for the
\code{taskloop}~\code{simd} construct were also added.
The examples below compare reductions for the \code{taskloop} and the \code{taskloop}~\code{simd} constructs.
These examples illustrate the use of \code{reduction} clauses within
``stand-alone'' \code{taskloop} constructs, and the use of \code{in\_reduction} clauses for tasks of taskloops to participate
in other reductions within the scope of a parallel region.
\textbf{taskloop reductions:}
In the \plc{taskloop reductions} section of the example below,
\plc{taskloop 1} uses the \code{reduction} clause
in a \code{taskloop} construct for a sum reduction, accumulated in \plc{asum}.
The behavior is as though a \code{taskgroup} construct encloses the
taskloop region with a \code{task\_reduction} clause, and each taskloop
task has an \code{in\_reduction} clause with the specifications
of the \code{reduction} clause.
At the end of the taskloop region \plc{asum} contains the result of the reduction.
The next taskloop, \plc{taskloop 2}, illustrates the use of the
\code{in\_reduction} clause to participate in a previously defined
reduction scope of a \code{parallel} construct.
The task reductions of \plc{task 2} and \plc{taskloop 2} are combined
across the \code{taskloop} construct and the single \code{task} construct, as specified
in the \code{reduction(task,}~\code{+:asum)} clause of the \code{parallel} construct.
At the end of the parallel region \plc{asum} contains the combined result of all reductions.
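The second pattern described above, in which a task and a taskloop join a reduction
declared on the \code{parallel} construct, can be sketched in C as follows. The
\plc{split\_sum} name and the split of the array into two halves are assumptions of
this sketch, and a \code{single} construct is used here to select the thread that
generates the tasks.

```c
#include <assert.h>

/* Hypothetical function name; sums a[0..n-1] in two task-based parts. */
int split_sum(int n, const int *a)
{
    assert(n >= 0);
    int asum = 0;
    int half = n / 2;

    /* The task modifier lets explicit tasks take part in this reduction. */
    #pragma omp parallel reduction(task, +:asum)
    {
        #pragma omp single
        {
            /* One task sums the first half. */
            #pragma omp task in_reduction(+:asum)
            for (int i = 0; i < half; i++)
                asum += a[i];

            /* The taskloop tasks sum the second half. */
            #pragma omp taskloop in_reduction(+:asum)
            for (int i = half; i < n; i++)
                asum += a[i];
        }
    }
    /* After the parallel region asum holds the combined result. */
    return asum;
}
```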
\textbf{taskloop simd reductions:}
Reductions for the \code{taskloop}~\code{simd} construct are shown in the second half of the code.
The \code{taskloop}~\code{simd} construct is a composite construct, and each of its
component constructs, \code{taskloop} and \code{simd}, can accept a reduction-type clause.
The specific application of the reduction clause to the components is defined
within the \code{taskloop}~\code{simd} construct section of the OpenMP 5.0 Specification.
The code below illustrates use cases for these reductions.
In the \plc{taskloop simd reduction} section of the example below,
\plc{taskloop simd 3} uses the \code{reduction} clause
in a \code{taskloop}~\code{simd} construct for a sum reduction within a loop.
For this case a \code{reduction} clause is used, as one would use
for a \code{simd} construct.
The SIMD reductions of each task are combined, and the results of these tasks are further
combined just as in the \code{taskloop} construct with the \code{reduction} clause for \plc{taskloop 1}.
At the end of the taskloop region \plc{asum} contains the combined result of all reductions.
If a \code{taskloop}~\code{simd} construct is to participate in a previously defined
reduction scope, the reduction participation should be specified with
an \code{in\_reduction} clause, as shown in the \code{parallel} region enclosing
the \plc{task 4} and \plc{taskloop simd 4} code sections.
Here the \code{taskloop}~\code{simd} construct's
\code{in\_reduction} clause specifies participation of the construct's tasks as
a task reduction within the scope of the parallel region.
That is, the results of each task of the \code{taskloop} construct component
contribute to the reduction at a broader level, just as in the \plc{parallel reduction a} code section above.
Also, each \code{simd}-component construct
behaves as if it had a \code{reduction} clause, and the
SIMD results of each task are combined as though to form a single result for
each task (that participates in the \code{in\_reduction} clause).
At the end of the parallel region \plc{asum} contains the combined result of all reductions.
%Just as in \plc{parallel reduction a} the
%\code{taskloop simd} construct reduction results are combined
%with the \code{task} construct reduction results
%as specified by the \code{in\_reduction} clause of the \code{task} construct
%and the \plc{task} reduction-modifier of the \code{reduction} clause of
%the \code{parallel} construct.
%At the end of the parallel region \plc{asum} contains the combined result of all reductions.
\cexample{taskloop_simd_reduction}{1}
\ffreeexample{taskloop_simd_reduction}{1}
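Both uses of the \code{taskloop}~\code{simd} construct can be sketched in C as follows.
The \plc{simd\_array\_sum} and \plc{simd\_split\_sum} names are hypothetical. The first
function uses a stand-alone \code{reduction} clause; the second joins a reduction
declared on the enclosing \code{parallel} construct through \code{in\_reduction}.

```c
#include <assert.h>

/* Hypothetical name: stand-alone reduction on a taskloop simd construct. */
int simd_array_sum(int n, const int *a)
{
    assert(n >= 0);
    int asum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* SIMD partial results are combined per task, then across tasks. */
        #pragma omp taskloop simd reduction(+:asum)
        for (int i = 0; i < n; i++)
            asum += a[i];
    }
    return asum;
}

/* Hypothetical name: taskloop simd joining the parallel region's reduction. */
int simd_split_sum(int n, const int *a)
{
    assert(n >= 0);
    int asum = 0;
    int half = n / 2;

    #pragma omp parallel reduction(task, +:asum)
    {
        #pragma omp single
        {
            /* One task sums the first half. */
            #pragma omp task in_reduction(+:asum)
            for (int i = 0; i < half; i++)
                asum += a[i];

            /* The taskloop simd tasks sum the second half. */
            #pragma omp taskloop simd in_reduction(+:asum)
            for (int i = half; i < n; i++)
                asum += a[i];
        }
    }
    return asum;
}
```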
% All other reductions