\cchapter{Parallel Execution}{parallel_execution}
\label{chap:parallel_execution}

A single thread, the \plc{initial thread}, begins sequential execution of
an OpenMP-enabled program, as if the whole program were in an implicit parallel
region consisting of an implicit task executed by the \plc{initial thread}.

A \kcode{parallel} construct encloses code,
forming a parallel region. An \plc{initial thread} encountering a \kcode{parallel}
construct forks (creates) a team of threads at the beginning of the
\kcode{parallel} region, and joins them (removes them from execution) at the
end of the region. The initial thread becomes the primary thread of the team in a
\kcode{parallel} region, with a \plc{thread} number equal to zero; the other
threads are numbered from 1 to the number of threads minus 1.
A team may consist of just a single thread.

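The fork-join behavior and thread numbering described above can be sketched in C as follows. This is a minimal illustration, not one of the official examples: the function name \kcode{record_thread_numbers}, the \kcode{MAX_TEAM} cap, and the request for four threads are assumptions made for this sketch. The serial fallbacks are there only so that the code also builds when OpenMP support is disabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for builds without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_TEAM 64  /* hypothetical upper bound on the team size */

/* Sets seen[t] = 1 for every thread number t in the team and
   returns the team size reported inside the parallel region. */
int record_thread_numbers(int seen[MAX_TEAM])
{
    int team_size = 0;
    for (int i = 0; i < MAX_TEAM; i++)
        seen[i] = 0;

    /* The encountering thread forks a team of at most 4 threads here. */
    #pragma omp parallel num_threads(4)
    {
        int tid = omp_get_thread_num();   /* 0 for the primary thread */
        assert(tid >= 0 && tid < MAX_TEAM);
        seen[tid] = 1;                    /* each thread writes its own slot */
        if (tid == 0)
            team_size = omp_get_num_threads();
    }   /* the team joins here; only the primary thread continues */

    return team_size;
}
```

On return, entries 0 through the team size minus 1 of \kcode{seen} are set, which reflects that thread numbers are contiguous and start at zero. The runtime may provide fewer than the four threads requested.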

Each \plc{thread} of a team is assigned an implicit task consisting of code within the
\kcode{parallel} region. The task that creates a \kcode{parallel} region is suspended while the
tasks of the team are executed. A thread is tied to its task; that is,
only the thread assigned to the task can execute that task. After completion
of the \kcode{parallel} region, the primary thread resumes execution of the generating task.

%After the \code{parallel} region the primary thread becomes the initial
%thread again, and continues to execute the \plc{sequential part}.

Any task within a \kcode{parallel} region is allowed to encounter another
\kcode{parallel} construct to form a nested \kcode{parallel} region. The
parallelism of a nested \kcode{parallel} region (whether it forks additional
threads, or is executed serially by the encountering task) can be controlled by the
\kcode{OMP_NESTED} environment variable or the \kcode{omp_set_nested()}
API routine with arguments indicating true or false.

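A nested \kcode{parallel} region can be sketched in C as follows. Because \kcode{OMP_NESTED} and \kcode{omp_set_nested()} were deprecated in OpenMP 5.0, this sketch uses the \kcode{omp_set_max_active_levels()} routine instead; that substitution, the function name \kcode{count_inner_tasks}, and the serial fallback are assumptions of this illustration rather than part of the text above.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (an assumption of this sketch) for builds without OpenMP. */
static void omp_set_max_active_levels(int levels) { (void)levels; }
#endif

/* Counts the implicit tasks that execute the inner parallel region. */
int count_inner_tasks(int outer_threads, int inner_threads)
{
    assert(outer_threads > 0 && inner_threads > 0);
    int count = 0;

    /* Allow two levels of active parallelism. This changes a global
       setting of the runtime and stays in effect after the call. */
    omp_set_max_active_levels(2);

    #pragma omp parallel num_threads(outer_threads)
    {
        /* Each thread of the outer team encounters this construct
           and becomes the primary thread of its own inner team. */
        #pragma omp parallel num_threads(inner_threads)
        {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

With two active levels and enough resources, \kcode{count_inner_tasks(2, 2)} would typically return 4. If the runtime serializes the inner region or provides fewer threads, the result is smaller, so callers should treat the product of the two arguments only as an upper bound.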

The number of threads of a \kcode{parallel} region can be set by the \kcode{OMP_NUM_THREADS}
environment variable, the \kcode{omp_set_num_threads()} routine, or on the \kcode{parallel}
directive with the \kcode{num_threads}
clause. The routine overrides the environment variable, and the clause overrides both.
Use the \kcode{OMP_DYNAMIC} environment variable
or the \kcode{omp_set_dynamic()} routine to specify whether the OpenMP
implementation may dynamically adjust the number of threads for
\kcode{parallel} regions. The default setting for dynamic adjustment is implementation
defined. When dynamic adjustment is on and the number of threads is specified,
that number becomes an upper limit on the number of threads
provided by the OpenMP runtime.

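The precedence between the routine and the clause can be sketched in C as follows. The function name \kcode{observe_team_sizes}, the thread counts 3 and 2, and the serial fallbacks are assumptions chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for builds without OpenMP. */
static void omp_set_dynamic(int flag)     { (void)flag; }
static void omp_set_num_threads(int n)    { (void)n; }
static int  omp_get_num_threads(void)     { return 1; }
#endif

/* Reports the team size obtained from the routine setting alone and
   the team size obtained when a num_threads clause is also present. */
void observe_team_sizes(int *from_routine, int *from_clause)
{
    assert(from_routine != NULL && from_clause != NULL);

    omp_set_dynamic(0);       /* ask the runtime not to adjust team sizes */
    omp_set_num_threads(3);   /* overrides OMP_NUM_THREADS for later regions */

    #pragma omp parallel
    {
        #pragma omp single
        *from_routine = omp_get_num_threads();
    }

    /* The clause takes precedence over the routine for this region only. */
    #pragma omp parallel num_threads(2)
    {
        #pragma omp single
        *from_clause = omp_get_num_threads();
    }
}
```

With dynamic adjustment off, the two results would usually be 3 and 2. An implementation can still deliver fewer threads, for example when a thread limit applies, so the requested values are best read as upper bounds. Both routine calls change runtime settings that remain in effect after the function returns.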

%\pagebreak
\bigskip
WORKSHARING CONSTRUCTS

A worksharing construct distributes the execution of the associated region
among the members of the team that encounter it. There is an
implied barrier at the end of the worksharing region, unless a
\kcode{nowait} clause is specified
(there is no barrier at the beginning).

\newpage
The worksharing constructs are:

\begin{compactitem}

\item loop constructs: {\kcode{for} and \kcode{do} }
\item \kcode{sections}
\item \kcode{single}
\item \kcode{workshare}

\end{compactitem}

The \kcode{for} and \kcode{do} constructs (loop constructs) create a region
consisting of a loop. A loop controlled by a loop construct is called
an \plc{associated} loop. Nested loops can form a single region when the
\kcode{collapse} clause (with an integer argument) designates the number of
\plc{associated} loops to be executed in parallel, by forming a
``single iteration space'' for the specified number of nested loops.
The \kcode{ordered} clause can also control multiple associated loops.

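The effect of the \kcode{collapse} clause on two nested loops can be sketched in C as follows. The function name \kcode{fill_table} and the row-major layout of the array are assumptions of this sketch, and the combined \kcode{parallel for} directive is used only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Fills a rows x cols table, stored row by row, so that each
   element holds its own linear index. */
void fill_table(int rows, int cols, int *table)
{
    assert(rows >= 0 && cols >= 0 && table != NULL);

    /* collapse(2) merges the i and j loops into one iteration space
       of rows * cols iterations, which is then divided among the team. */
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            table[i * cols + j] = i * cols + j;
}
```

Collapsing is most useful when the outer loop alone has too few iterations to keep every thread busy. Both loops here are perfectly nested and their bounds do not change during execution, so the total iteration count can be computed before the outer loop starts.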

An associated loop must adhere to a ``canonical form'' (specified in the
\docref{Canonical Loop Form} of the OpenMP Specifications document), which allows the
iteration count (of all associated loops) to be computed before the
(outermost) loop is executed. %[58:27-29].
Most common loops comply with the canonical form, including loops over C++ iterators.

A \kcode{single} construct forms a region in which only one thread (any one
of the team) executes the region.
The other threads wait at the implied
barrier at the end, unless the \kcode{nowait} clause is specified.

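The behavior of a \kcode{single} region and its implied barrier can be sketched in C as follows. The function name \kcode{run_single_once} and its counters are hypothetical and serve only to make the behavior observable.

```c
#include <assert.h>
#include <stddef.h>

/* Stores value from inside a single region and returns how many times
   that region was executed. *mismatches receives the number of threads
   that did not observe the stored value after the region. */
int run_single_once(int value, int *mismatches)
{
    assert(mismatches != NULL);
    int shared_value = 0;
    int single_runs = 0;
    int bad = 0;

    #pragma omp parallel
    {
        #pragma omp single
        {
            shared_value = value;
            single_runs++;     /* no atomic needed: one thread executes this */
        }
        /* Implied barrier: every thread waits here until the single
           region is complete, so the read below is safe. */
        if (shared_value != value) {
            #pragma omp atomic
            bad++;
        }
    }

    *mismatches = bad;
    return single_runs;
}
```

If a \kcode{nowait} clause were added to the \kcode{single} directive, the read of \kcode{shared_value} would no longer be ordered after the write, and the check could fail.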

The \kcode{sections} construct forms a region that contains one or more
structured blocks. Each block of a \kcode{sections} construct is
introduced by a \kcode{section} directive, and executed once by
one of the threads (any one) in the team. (If only one block is
formed in the region, the \kcode{section} directive, which is used to
separate blocks, is not required.)
The other threads wait at the implied
barrier at the end, unless the \kcode{nowait} clause is specified.

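A \kcode{sections} region with two independent blocks can be sketched in C as follows. The function name \kcode{sum_and_max} and the choice of the two computations are assumptions of this sketch; the combined \kcode{parallel sections} directive is used for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Computes the sum and the maximum of data[0..n-1]; each result is
   produced by its own section, possibly on a different thread. */
void sum_and_max(const int *data, int n, long *sum, int *max)
{
    assert(data != NULL && n > 0 && sum != NULL && max != NULL);

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            long s = 0;
            for (int i = 0; i < n; i++)
                s += data[i];
            *sum = s;
        }
        #pragma omp section
        {
            int m = data[0];
            for (int i = 1; i < n; i++)
                if (data[i] > m)
                    m = data[i];
            *max = m;
        }
    }   /* implied barrier: both results are complete here */
}
```

The two blocks only read the shared input and write to distinct outputs, so no further synchronization is needed inside the region.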


The \kcode{workshare} construct is a Fortran feature that consists of a
region with a single structured block (section of code). Statements in the
\kcode{workshare} region are divided into units of work, and executed (once)
by threads of the team.

\bigskip
MASKED CONSTRUCT

The \kcode{masked} construct is not a worksharing construct. The \kcode{masked} region is
executed only by the primary thread, unless a \kcode{filter} clause selects another thread.
There is no implicit barrier (or flush)
at the end of the \kcode{masked} region; hence the other threads of the team continue
execution with the statements following the \kcode{masked} region.
The \kcode{master} construct, which has been deprecated in OpenMP 5.1, has identical semantics
to the \kcode{masked} construct with no \kcode{filter} clause.

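A \kcode{masked} region without a \kcode{filter} clause can be sketched in C as follows. The function name \kcode{run_masked_region} is hypothetical. As an assumption of this sketch, the code falls back to the \kcode{master} directive when the compiler does not report OpenMP 5.1 support through the \kcode{_OPENMP} macro, which relies on the identical semantics stated above.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (an assumption of this sketch) for builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Executes a masked region inside a parallel region. Returns the number
   of the thread that executed it; *executions receives how many times
   the region ran. */
int run_masked_region(int *executions)
{
    assert(executions != NULL);
    int runs = 0;
    int who = -1;

    #pragma omp parallel
    {
#if defined(_OPENMP) && _OPENMP >= 202011
        #pragma omp masked
#else
        #pragma omp master
#endif
        {
            runs++;                      /* only the primary thread is here */
            who = omp_get_thread_num();
        }
        /* No barrier follows the region: the other threads may already
           be past this point, so they must not read runs or who here. */
    }   /* the barrier at the end of the parallel region orders the reads below */

    *executions = runs;
    return who;
}
```

If the other threads needed the values written in the \kcode{masked} region, an explicit \kcode{barrier} directive after the region would be required.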


%===== Examples Sections =====
\input{parallel_execution/ploop}
\input{parallel_execution/parallel}
\input{parallel_execution/host_teams}
\input{parallel_execution/nthrs_nesting}
\input{parallel_execution/nthrs_dynamic}
\input{parallel_execution/fort_do}
\input{parallel_execution/nowait}
\input{parallel_execution/collapse}
\input{parallel_execution/linear_in_loop}
\input{parallel_execution/psections}
\input{parallel_execution/fpriv_sections}
\input{parallel_execution/single}
\input{parallel_execution/workshare}
\input{parallel_execution/masked}
\input{parallel_execution/loop}
\input{parallel_execution/pra_iterator}
\input{parallel_execution/set_dynamic_nthrs}
\input{parallel_execution/get_nthrs}