\cchapter{Parallel Execution}{parallel_execution}
\label{chap:parallel_execution}

A single thread, the \plc{initial thread}, begins sequential execution of an OpenMP-enabled program, as if the whole program were in an implicit parallel region consisting of an implicit task executed by the \plc{initial thread}.

A \kcode{parallel} construct encloses code, forming a parallel region. An \plc{initial thread} encountering a \kcode{parallel} region forks (creates) a team of threads at the beginning of the \kcode{parallel} region, and joins them (removes them from execution) at the end of the region. The initial thread becomes the primary thread of the team, with a \plc{thread} number of zero; the other threads are numbered from 1 to the number of threads minus 1. A team may consist of just a single thread.

Each \plc{thread} of a team is assigned an implicit task consisting of the code within the \kcode{parallel} region. The task that creates a \kcode{parallel} region is suspended while the tasks of the team are executed. An implicit task is tied to its thread; that is, only the thread assigned to the task can execute that task. After completion of the \kcode{parallel} region, the primary thread resumes execution of the generating task.

%After the \code{parallel} region the primary thread becomes the initial
%thread again, and continues to execute the \plc{sequential part}.

A task within a \kcode{parallel} region may encounter another \kcode{parallel} region, forming a nested \kcode{parallel} region. The parallelism of a nested \kcode{parallel} region (whether it forks additional threads, or is executed serially by the encountering task) can be controlled by the \kcode{OMP_MAX_ACTIVE_LEVELS} environment variable or the \kcode{omp_set_max_active_levels()} API routine, which set the maximum number of nested active \kcode{parallel} regions. The \kcode{OMP_NESTED} environment variable and the \kcode{omp_set_nested()} routine, with arguments indicating true or false, serve the same purpose but have been deprecated since OpenMP 5.0.
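The fork-join model, the thread numbering, and nested regions described above can be illustrated with a minimal C sketch. The functions \texttt{fork\_join\_demo} and \texttt{nested\_task\_count}, the constant \texttt{MAX\_THREADS}, and the serial stub routines used when \kcode{_OPENMP} is not defined are hypothetical names introduced only for this sketch; they are not part of the examples collection. The first function records which thread numbers appear in a team requested with \kcode{num_threads}; the second counts the implicit tasks of a two-level nested region after allowing two active levels with \kcode{omp_set_max_active_levels()}.

\begin{verbatim}
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial stubs so the sketch also builds without OpenMP;
   the pragmas below are then ignored and everything runs on one thread. */
static int  omp_get_thread_num(void)           { return 0; }
static int  omp_get_num_threads(void)          { return 1; }
static void omp_set_max_active_levels(int max) { (void)max; }
#endif

#define MAX_THREADS 64

/* Fork a team of (at most) `requested` threads; each implicit task marks
   its thread number in seen[], and the primary thread (thread 0) reports
   the team size, which the caller reads after the join. */
int fork_join_demo(int requested, int seen[MAX_THREADS])
{
    int team_size = 0;
    assert(requested > 0);            /* num_threads needs a positive value */
    for (int i = 0; i < MAX_THREADS; i++)
        seen[i] = 0;

    #pragma omp parallel num_threads(requested)
    {
        int id = omp_get_thread_num();   /* 0 .. team size - 1 */
        if (id < MAX_THREADS)
            seen[id] = 1;                /* each thread writes its own slot */
        if (id == 0)
            team_size = omp_get_num_threads();
    }   /* join: implicit barrier, then only the primary thread continues */

    return team_size;
}

/* Count the implicit tasks of a two-level nested parallel region. */
int nested_task_count(void)
{
    int count = 0;
    omp_set_max_active_levels(2);    /* allow both levels to fork threads */

    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            #pragma omp atomic
            count++;
        }
    }
    return count;   /* at most 4; 1 if every region runs serially */
}
\end{verbatim}

For instance, a \texttt{main} function compiled with OpenMP enabled (for example with \texttt{-fopenmp}) can call \texttt{fork\_join\_demo(4, seen)}; the result is at most 4 (fewer if the runtime provides fewer threads), and \texttt{seen[0]} through \texttt{seen[n-1]} are all set, where \texttt{n} is the returned team size. Without OpenMP the pragmas are ignored and both functions return 1.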
The number of threads of a \kcode{parallel} region can be set by the \kcode{OMP_NUM_THREADS} environment variable, the \kcode{omp_set_num_threads()} routine, or the \kcode{num_threads} clause on the \kcode{parallel} directive. The routine overrides the environment variable, and the clause overrides both.

Use the \kcode{OMP_DYNAMIC} environment variable or the \kcode{omp_set_dynamic()} routine to specify whether the OpenMP implementation may dynamically adjust the number of threads for \kcode{parallel} regions. The default setting for dynamic adjustment is implementation defined. When dynamic adjustment is enabled and the number of threads is specified, that number becomes an upper limit on the number of threads the OpenMP runtime provides.

%\pagebreak
\bigskip
WORKSHARING CONSTRUCTS

A worksharing construct distributes the execution of the associated region among the members of the team that encounter it. There is an implied barrier at the end of the worksharing region, unless a \kcode{nowait} clause is specified; there is no barrier at the beginning.

\newpage
The worksharing constructs are:
\begin{compactitem}
\item loop constructs: \kcode{for} and \kcode{do}
\item \kcode{sections}
\item \kcode{single}
\item \kcode{workshare}
\end{compactitem}

The \kcode{for} and \kcode{do} constructs (loop constructs) create a region consisting of a loop. A loop controlled by a loop construct is called an \plc{associated} loop. Nested loops can form a single region when the \kcode{collapse} clause (with an integer argument) designates the number of \plc{associated} loops to be executed in parallel, by forming a ``single iteration space'' for the specified number of nested loops. The \kcode{ordered} clause (with an integer argument) can also control multiple associated loops.

An associated loop must adhere to a ``canonical form'' (specified in the \docref{Canonical Loop Form} section of the OpenMP Specifications document), which allows the iteration count (of all associated loops) to be computed before the (outermost) loop is executed.
%[58:27-29].
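The precedence of the \kcode{num_threads} clause over the \kcode{omp_set_num_threads()} routine, and a worksharing loop with \kcode{collapse}, can be sketched in C as follows. The function names \texttt{clause\_overrides\_routine} and \texttt{sum\_matrix}, the sizes \texttt{ROWS} and \texttt{COLS}, and the serial stub routines are assumptions made for illustration. Dynamic adjustment is disabled first so that the team size reflects the request rather than a runtime choice; the loop adds a \kcode{reduction} clause so the result does not depend on how the \texttt{ROWS*COLS} collapsed iterations are divided among the threads.

\begin{verbatim}
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial stubs so the sketch also builds without OpenMP. */
static void omp_set_dynamic(int flag)  { (void)flag; }
static void omp_set_num_threads(int n) { (void)n; }
static int  omp_get_num_threads(void)  { return 1; }
#endif

#define ROWS 100
#define COLS 50

/* The routine asks for 8 threads, but the num_threads clause on the
   directive takes precedence, so the team has at most 2 threads. */
int clause_overrides_routine(void)
{
    int team = 0;
    omp_set_dynamic(0);         /* disable dynamic adjustment */
    omp_set_num_threads(8);     /* default for later parallel regions */

    #pragma omp parallel num_threads(2)
    {
        #pragma omp single      /* one thread records the team size */
        team = omp_get_num_threads();
    }

    assert(team >= 1 && team <= 2);   /* the clause is an upper bound */
    return team;
}

/* collapse(2) merges both loops into one iteration space of ROWS*COLS
   iterations that is divided among the threads of the team; the
   reduction makes the result independent of that division. */
long sum_matrix(int a[ROWS][COLS])
{
    long sum = 0;

    #pragma omp parallel for collapse(2) reduction(+:sum)
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += a[i][j];
    /* implied barrier at the end of the loop region */

    return sum;
}
\end{verbatim}

Filling \texttt{a[i][j]} with \texttt{i + j} makes \texttt{sum\_matrix} return 370000 for the sizes shown, with or without OpenMP; \texttt{clause\_overrides\_routine} typically returns 2 when OpenMP is enabled and 1 otherwise.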
Most common loops comply with the canonical form, including C++ loops over random access iterators.

A \kcode{single} construct forms a region in which only one thread (any one of the team) executes the region. The other threads wait at the implied barrier at the end, unless the \kcode{nowait} clause is specified.

The \kcode{sections} construct forms a region that contains one or more structured blocks. Each block of a \kcode{sections} construct is introduced by a \kcode{section} directive and is executed once by one of the threads (any one) in the team. (The \kcode{section} directive is optional before the first block, so it is not required when the region contains only one block.) The other threads wait at the implied barrier at the end, unless the \kcode{nowait} clause is specified.

The \kcode{workshare} construct is a Fortran feature that consists of a region with a single structured block (section of code). Statements in the \kcode{workshare} region are divided into units of work, and each unit is executed exactly once by one thread of the team.

\bigskip
MASKED CONSTRUCT

The \kcode{masked} construct is not a worksharing construct. Without a \kcode{filter} clause, the \kcode{masked} region is executed only by the primary thread; the \kcode{filter} clause selects a different thread by its thread number. There is no implied barrier (and no implied flush) on entry to or exit from the \kcode{masked} region; hence the other threads of the team continue executing the statements that follow the \kcode{masked} region without waiting. The \kcode{master} construct, which has been deprecated in OpenMP 5.1, has semantics identical to the \kcode{masked} construct with no \kcode{filter} clause.
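The \kcode{single}, \kcode{sections}, and \kcode{masked} constructs can be combined in one \kcode{parallel} region, as in the following minimal C sketch. The function \texttt{worksharing\_demo}, the layout of its \texttt{out} array, and the serial stub for \kcode{omp_get_thread_num()} are hypothetical; the \kcode{masked} directive assumes a compiler that supports OpenMP 5.1 or later. The sections read \texttt{out[0]} safely because the implied barrier at the end of the \kcode{single} region orders the write before the reads, whereas the result of the \kcode{masked} region is only relied upon after the enclosing \kcode{parallel} region has ended.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial stub so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* out[0] is set by a single region, out[1] and out[2] by two sections,
   and out[3] receives the thread number that executed the masked region. */
void worksharing_demo(int out[4])
{
    assert(out != NULL);

    #pragma omp parallel
    {
        /* Exactly one thread (any one) executes this statement; the others
           wait at the implied barrier at the end of the single region. */
        #pragma omp single
        out[0] = 42;

        /* Each section is executed once, by some thread of the team. */
        #pragma omp sections
        {
            #pragma omp section
            out[1] = out[0] + 1;    /* safe: the single's barrier has passed */

            #pragma omp section
            out[2] = out[0] * 2;
        }   /* implied barrier at the end of the sections region */

        /* Only the primary thread (thread 0) executes this statement; there
           is no implied barrier, so the other threads do not wait here. */
        #pragma omp masked
        out[3] = omp_get_thread_num();
    }   /* the barrier at the end of the parallel region completes out[3] */
}
\end{verbatim}

After the call, \texttt{out} holds 42, 43, 84, and 0, regardless of the team size.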
%===== Examples Sections =====
\input{parallel_execution/ploop}
\input{parallel_execution/parallel}
\input{parallel_execution/host_teams}
\input{parallel_execution/nthrs_nesting}
\input{parallel_execution/nthrs_dynamic}
\input{parallel_execution/fort_do}
\input{parallel_execution/nowait}
\input{parallel_execution/collapse}
\input{parallel_execution/linear_in_loop}
\input{parallel_execution/psections}
\input{parallel_execution/fpriv_sections}
\input{parallel_execution/single}
\input{parallel_execution/workshare}
\input{parallel_execution/masked}
\input{parallel_execution/loop}
\input{parallel_execution/pra_iterator}
\input{parallel_execution/set_dynamic_nthrs}
\input{parallel_execution/get_nthrs}