\cchapter{Parallel Execution}{parallel_execution}
\label{chap:parallel_execution}

A single thread, the \plc{initial thread}, begins sequential execution of 
an OpenMP-enabled program, as if the whole program were in an implicit parallel
region consisting of an implicit task executed by the \plc{initial thread}.

A \kcode{parallel} construct encloses code, 
forming a parallel region.  An \plc{initial thread} encountering a \kcode{parallel} 
region forks (creates) a team of threads at the beginning of the 
\kcode{parallel} region, and joins them (removes from execution) at the 
end of the region.  The initial thread becomes the primary thread of the team in a 
\kcode{parallel} region with a \plc{thread} number equal to zero; the other 
threads are numbered from 1 to the number of threads minus 1. 
A team may consist of just a single thread.
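
The fork, thread numbering, and join described above can be sketched in C as follows.
This is a minimal illustration, not one of the official examples: the helper function,
its return convention, and the serial fallback macros (used only when the code is
compiled without OpenMP support) are assumptions of the sketch.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for builds without OpenMP. */
#define omp_get_max_threads() 1
#define omp_get_num_threads() 1
#define omp_get_thread_num()  0
#endif

/* Hypothetical helper: returns the team size if thread numbers
 * 0 .. size-1 were each observed, and -1 otherwise. */
int count_team_members(void)
{
    int max_threads = omp_get_max_threads();  /* upper bound on the team size */
    int *seen = calloc((size_t)max_threads, sizeof *seen);
    int team_size = 0;
    int distinct = 0;

    if (seen == NULL) {
        return -1;
    }

    /* Fork: the encountering thread becomes the primary thread (number 0). */
    #pragma omp parallel shared(seen, team_size)
    {
        int tid = omp_get_thread_num();
        seen[tid] = 1;                 /* each thread writes only its own slot */
        if (tid == 0) {
            team_size = omp_get_num_threads();
        }
    }
    /* Join: only the encountering thread continues past this point. */

    for (int i = 0; i < team_size; i++) {
        distinct += seen[i];
    }
    free(seen);
    return (distinct == team_size) ? team_size : -1;
}
```

The result is at least 1 in every case, since a team may consist of a single thread.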

Each \plc{thread} of a team is assigned an implicit task consisting of code within the 
\kcode{parallel} region. The task that creates a \kcode{parallel} region is suspended while the
tasks of the team are executed.  A thread is tied to its task; that is,
only the thread assigned to the task can execute that task.  After completion 
of the \kcode{parallel} region, the primary thread resumes execution of the generating task.  

%After the \code{parallel} region the primary thread becomes the initial 
%thread again, and continues to execute the \plc{sequential part}.  

Any task within a \kcode{parallel} region is allowed to encounter another
\kcode{parallel} region to form a nested \kcode{parallel} region. The 
parallelism of a nested \kcode{parallel} region (whether it forks additional 
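threads, or is executed serially by the encountering task) can be controlled by the
\kcode{OMP_NESTED} environment variable or the \kcode{omp_set_nested()} 
API routine with arguments indicating true or false.
Both have been deprecated since OpenMP 5.0 in favor of the
\kcode{OMP_MAX_ACTIVE_LEVELS} environment variable and the
\kcode{omp_set_max_active_levels()} routine, which limit the number of
nested active \kcode{parallel} regions.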
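
A nested \kcode{parallel} region might be sketched as below. The sketch controls
nesting with \kcode{omp_set_max_active_levels()} instead of \kcode{omp_set_nested()},
because the latter is deprecated as of OpenMP 5.0; that substitution, the helper
function, and the fallback macro are assumptions of this illustration. The
\kcode{atomic} construct is used only to make the shared counter update safe.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (an assumption of this sketch) for builds without OpenMP. */
#define omp_set_max_active_levels(n) ((void)(n))
#endif

/* Hypothetical helper: counts how many implicit tasks execute the inner region.
 * Note that it changes the max-active-levels setting as a side effect. */
int count_inner_tasks(int max_active_levels)
{
    int inner_tasks = 0;

    /* 1 allows only the outer region to be active; 2 also allows the inner one. */
    omp_set_max_active_levels(max_active_levels);

    #pragma omp parallel num_threads(2) shared(inner_tasks)
    {
        /* Each outer thread encounters its own nested parallel region. */
        #pragma omp parallel num_threads(2) shared(inner_tasks)
        {
            #pragma omp atomic
            inner_tasks += 1;
        }
    }
    return inner_tasks;
}
```

With one active level the inner region is executed serially by each encountering
thread, so the count cannot exceed 2; with two active levels it can reach 4. The
runtime may still provide fewer threads than requested.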

The number of threads of a \kcode{parallel} region can be set by the \kcode{OMP_NUM_THREADS}
environment variable, the \kcode{omp_set_num_threads()} routine, or on the \kcode{parallel} 
directive with the \kcode{num_threads}
clause. The routine overrides the environment variable, and the clause overrides all. 
Use the \kcode{OMP_DYNAMIC} environment variable
or the \kcode{omp_set_dynamic()} routine to specify whether the OpenMP
implementation may dynamically adjust the number of threads for
\kcode{parallel} regions. The default setting for dynamic adjustment is implementation
defined. When dynamic adjustment is on and the number of threads is specified,
the number of threads becomes an upper limit for the number of threads to be
provided by the OpenMP runtime.
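
The precedence between the routine and the clause can be sketched as follows.
The two helper functions and the fallback macros are assumptions of this
illustration; the OpenMP routines and the \kcode{num_threads} clause are the ones
named above. Dynamic adjustment is switched off so that the request is not
reduced for that reason, although the runtime may still provide fewer threads
when resources are limited.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for builds without OpenMP. */
#define omp_set_dynamic(flag)   ((void)(flag))
#define omp_set_num_threads(n)  ((void)(n))
#define omp_get_num_threads()   1
#define omp_get_thread_num()    0
#endif

/* Hypothetical helper: team size when only the routine sets the request. */
int team_size_from_routine(int requested)
{
    int size = 0;

    omp_set_dynamic(0);              /* turn dynamic adjustment off */
    omp_set_num_threads(requested);  /* overrides OMP_NUM_THREADS */

    #pragma omp parallel shared(size)
    {
        if (omp_get_thread_num() == 0) {
            size = omp_get_num_threads();
        }
    }
    return size;
}

/* Hypothetical helper: team size when the clause competes with the routine. */
int team_size_from_clause(int routine_value, int clause_value)
{
    int size = 0;

    omp_set_dynamic(0);
    omp_set_num_threads(routine_value);

    /* The num_threads clause overrides the value set by the routine. */
    #pragma omp parallel num_threads(clause_value) shared(size)
    {
        if (omp_get_thread_num() == 0) {
            size = omp_get_num_threads();
        }
    }
    return size;
}
```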

%\pagebreak
\bigskip
WORKSHARING CONSTRUCTS

A worksharing construct distributes the execution of the associated region
among the members of the team that encounters it.  There is an
implied barrier at the end of the worksharing region, unless a
\kcode{nowait} clause is specified (there is no barrier at the beginning). 

\newpage
The worksharing constructs are:

\begin{compactitem}

\item loop constructs: {\kcode{for} and \kcode{do} }
\item \kcode{sections}
\item \kcode{single}
\item \kcode{workshare}

\end{compactitem}
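
The role of the implied barrier at the end of a worksharing region can be
sketched with two consecutive \kcode{for} constructs. The helper function, the
array size, and the data pattern are assumptions of this illustration. The second
loop reads elements that other threads may have written in the first loop, which
is safe only because of the barrier between them.

```c
#define N 1000   /* hypothetical problem size for this sketch */

/* Hypothetical helper: returns 1 if the reversed squares are correct. */
int reverse_squares_ok(void)
{
    int a[N];
    int b[N];
    int ok = 1;

    #pragma omp parallel shared(a, b)
    {
        /* Iterations are divided among the threads of the team. */
        #pragma omp for
        for (int i = 0; i < N; i++) {
            a[i] = i * i;
        }
        /* Implied barrier: every element of a is written before any thread proceeds. */

        #pragma omp for
        for (int i = 0; i < N; i++) {
            b[i] = a[N - 1 - i];   /* may read an element written by another thread */
        }
    }

    /* Serial check after the team has joined. */
    for (int i = 0; i < N; i++) {
        int j = N - 1 - i;
        if (b[i] != j * j) {
            ok = 0;
        }
    }
    return ok;
}
```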

The \kcode{for} and \kcode{do} constructs (loop constructs) create a region 
consisting of a loop.  A loop controlled by a loop construct is called 
an \plc{associated} loop.  Nested loops can form a single region when the 
\kcode{collapse} clause (with an integer argument) designates the number of 
\plc{associated} loops to be executed in parallel, by forming a 
``single iteration space'' for the specified number of nested loops.
The \kcode{ordered} clause can also control multiple associated loops.
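
A minimal sketch of the \kcode{collapse} clause follows; the helper function, the
loop bounds, and the stored values are assumptions of this illustration. The outer
loop alone has only three iterations, so collapsing the two associated loops gives
the runtime a single iteration space of 120 iterations to distribute.

```c
#define ROWS 3    /* hypothetical bounds for this sketch */
#define COLS 40

/* Hypothetical helper: returns 1 if every element was written exactly as expected. */
int collapsed_fill_ok(void)
{
    int grid[ROWS][COLS] = {{0}};
    int ok = 1;

    #pragma omp parallel shared(grid)
    {
        /* Both loops are associated with the construct and must be perfectly nested. */
        #pragma omp for collapse(2)
        for (int i = 0; i < ROWS; i++) {
            for (int j = 0; j < COLS; j++) {
                grid[i][j] = i * COLS + j + 1;   /* nonzero marks a written element */
            }
        }
    }

    /* Serial check after the team has joined. */
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            if (grid[i][j] != i * COLS + j + 1) {
                ok = 0;
            }
        }
    }
    return ok;
}
```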

An associated loop must adhere to a ``canonical form'' (specified in the
\docref{Canonical Loop Form} of the OpenMP Specifications document) which allows the 
iteration count (of all associated loops) to be computed before the 
(outermost) loop is executed. %[58:27-29].  
Most common loops comply with the canonical form, including loops that use C++ random access iterators.

A \kcode{single} construct forms a region in which only one thread (any one 
of the team) executes the region. 
The other threads wait at the implied 
barrier at the end, unless the \kcode{nowait} clause is specified.
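
The behavior of the \kcode{single} construct can be sketched as below; the helper
function and the counter are assumptions of this illustration. Whatever the team
size, the structured block runs once.

```c
/* Hypothetical helper: returns how many times the single region executed. */
int single_executions(void)
{
    int executions = 0;

    #pragma omp parallel shared(executions)
    {
        /* Exactly one thread of the team, not necessarily the primary
         * thread, executes this block. */
        #pragma omp single
        {
            executions += 1;
        }
        /* Implied barrier: the other threads wait here because nowait is absent. */
    }
    return executions;
}
```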

The \kcode{sections} construct forms a region that contains one or more 
structured blocks.  Each block of a \kcode{sections} directive is 
constructed with a \kcode{section} construct, and executed once by 
one of the threads (any one) in the team.  (If only one block is 
formed in the region, the \kcode{section} construct, which is used to
separate blocks, is not required.)
The other threads wait at the implied 
barrier at the end, unless the \kcode{nowait} clause is specified.
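
A \kcode{sections} region with two blocks might be sketched as follows. The helper
function and its output parameters are assumptions of this illustration. Each
block is executed once, possibly by different threads, so the two counters can be
updated without synchronization.

```c
/* Hypothetical helper: reports how often each section block executed. */
void run_two_sections(int *first_runs, int *second_runs)
{
    int first = 0;
    int second = 0;

    #pragma omp parallel shared(first, second)
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                first += 1;    /* executed once by one thread of the team */
            }
            #pragma omp section
            {
                second += 1;   /* executed once, possibly by a different thread */
            }
        }
        /* Implied barrier at the end of the sections region. */
    }

    *first_runs = first;
    *second_runs = second;
}
```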


The \kcode{workshare} construct is a Fortran feature that consists of a
region with a single structured block (section of code). Statements in the
\kcode{workshare} region are divided into units of work, and executed (once)
by threads of the team.  

\bigskip
MASKED CONSTRUCT

The \kcode{masked} construct is not a worksharing construct.  By default, the \kcode{masked} region is
executed only by the primary thread; a \kcode{filter} clause can select a different thread of the team.
There is no implicit barrier (and flush) 
at the end of the \kcode{masked} region; hence the other threads of the team continue
execution of the code statements beyond the \kcode{masked} region.
The \kcode{master} construct, which has been deprecated in OpenMP 5.1, has identical semantics
to the \kcode{masked} construct with no \kcode{filter} clause.
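
The \kcode{masked} construct without a \kcode{filter} clause can be sketched as
below. The helper function is an assumption of this illustration, and so is the
version guard: it falls back to the equivalent \kcode{master} construct when the
compiler does not report OpenMP 5.1 support through the \kcode{_OPENMP} macro.

```c
/* Hypothetical helper: returns how many times the masked region executed. */
int masked_executions(void)
{
    int executions = 0;

    #pragma omp parallel shared(executions)
    {
        /* Only the primary thread (thread number 0) executes the block. */
#if defined(_OPENMP) && _OPENMP >= 202011
        #pragma omp masked
#else
        #pragma omp master
#endif
        {
            executions += 1;
        }
        /* No barrier here: the other threads continue immediately. */
    }
    /* The join at the end of the parallel region makes the update visible. */
    return executions;
}
```

Because there is no barrier after the region, code that reads the result inside
the same \kcode{parallel} region would need explicit synchronization; the sketch
avoids this by reading the counter only after the region ends.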


%===== Examples Sections =====
\input{parallel_execution/ploop}
\input{parallel_execution/parallel}
\input{parallel_execution/host_teams}
\input{parallel_execution/nthrs_nesting}
\input{parallel_execution/nthrs_dynamic}
\input{parallel_execution/fort_do}
\input{parallel_execution/nowait}
\input{parallel_execution/collapse}
\input{parallel_execution/linear_in_loop}
\input{parallel_execution/psections}
\input{parallel_execution/fpriv_sections}
\input{parallel_execution/single}
\input{parallel_execution/workshare}
\input{parallel_execution/masked}
\input{parallel_execution/loop}
\input{parallel_execution/pra_iterator}
\input{parallel_execution/set_dynamic_nthrs}
\input{parallel_execution/get_nthrs}