% 2024-11-13 11:07:08 -08:00
%\pagebreak
\section{\kcode{teams} Construct and Related Combined Constructs}
\label{sec:teams}
\subsection{\kcode{target} and \kcode{teams} Constructs with \kcode{omp_get_num_teams}\\
and \kcode{omp_get_team_num} Routines}
\label{subsec:teams_api}
\index{constructs!target@\kcode{target}}
\index{target construct@\kcode{target} construct}
\index{constructs!teams@\kcode{teams}}
\index{teams construct@\kcode{teams} construct}
\index{combined constructs!target teams@\kcode{target teams}}
\index{teams construct@\kcode{teams} construct!num_teams clause@\kcode{num_teams} clause}
\index{clauses!num_teams@\kcode{num_teams}}
\index{num_teams clause@\kcode{num_teams} clause}
\index{routines!omp_get_num_teams@\kcode{omp_get_num_teams}}
\index{routines!omp_get_team_num@\kcode{omp_get_team_num}}
\index{omp_get_num_teams routine@\kcode{omp_get_num_teams} routine}
\index{omp_get_team_num routine@\kcode{omp_get_team_num} routine}
The following example shows how the \kcode{target} and \kcode{teams} constructs
are used to create a \plc{league} of thread teams that execute a region. The \kcode{teams}
construct creates a league of at most two teams where the primary thread of each
team executes the \kcode{teams} region.
The \kcode{omp_get_num_teams} routine returns the number of teams executing in a \kcode{teams}
region. The \kcode{omp_get_team_num} routine returns the team number, which is an integer
between 0 and one less than the value returned by \kcode{omp_get_num_teams}. The
example distributes the iterations of a loop manually across the two teams.
\cexample[4.0]{teams}{1}
\ffreeexample[4.0]{teams}{1}
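As a complementary sketch (not one of the numbered examples), the manual distribution
described above can be written so that it stays correct when the league ends up with
fewer than two teams, which the \kcode{num_teams} clause permits. The function name
\ucode{dotprod_manual}, the macro \ucode{MAX_TEAMS}, and the serial fallback
definitions are assumptions made for illustration only.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP support. */
static int omp_get_num_teams(void) { return 1; }
static int omp_get_team_num(void)  { return 0; }
#endif

#define MAX_TEAMS 2   /* assumed upper bound on the league size */

/* Hypothetical helper: dot product with the loop split by hand per team. */
float dotprod_manual(const float *B, const float *C, int N)
{
   float partial[MAX_TEAMS] = {0.0f, 0.0f};

   #pragma omp target map(to: B[0:N], C[0:N]) map(tofrom: partial)
   #pragma omp teams num_teams(MAX_TEAMS)
   {
      int nteams = omp_get_num_teams();   /* may be fewer than MAX_TEAMS */
      int team   = omp_get_team_num();    /* 0 .. nteams-1 */

      /* Contiguous block of iterations owned by this team. */
      int lo = (int)((long)N * team / nteams);
      int hi = (int)((long)N * (team + 1) / nteams);

      float s = 0.0f;
      for (int i = lo; i < hi; i++)
         s += B[i] * C[i];
      partial[team] = s;                  /* each team writes its own slot */
   }
   return partial[0] + partial[1];
}
```

Only the primary thread of each team executes the loop here, so the sketch
exposes parallelism across teams but not within a team.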
\subsection{\kcode{target}, \kcode{teams}, and \kcode{distribute} Constructs}
\label{subsec:teams_distribute}
\index{constructs!distribute@\kcode{distribute}}
\index{distribute construct@\kcode{distribute} construct}
The following example shows how the \kcode{target}, \kcode{teams}, and \kcode{distribute}
constructs are used to execute a loop nest in a \kcode{target} region. The \kcode{teams}
construct creates a league and the primary thread of each team executes the \kcode{teams}
region. The \kcode{distribute} construct schedules the subsequent loop iterations
across the primary threads of the teams.
The number of teams in the league is less than or equal to the variable \ucode{num_blocks}.
Each team in the league has a number of threads less than or equal to the variable
\ucode{block_threads}. The iterations in the outer loop are distributed among the primary
threads of the teams.
When a team's primary thread encounters the parallel loop construct before the inner
loop, the other threads in its team are activated. The team executes the \kcode{parallel}
region and then workshares the execution of the loop.
\index{reduction clause@\kcode{reduction} clause!on teams construct@on \kcode{teams} construct}
Each primary thread executing the \kcode{teams} region has a private copy of the
variable \ucode{sum} that is created by the \kcode{reduction} clause on the \kcode{teams} construct.
The primary thread and all other threads in its team each have a private copy of the variable
\ucode{sum} that is created by the \kcode{reduction} clause on the parallel loop construct.
These thread-private copies are reduced into the primary thread's private copy of \ucode{sum}
created by the \kcode{teams} construct. At the end of the \kcode{teams} region,
each primary thread's private copy of \ucode{sum} is reduced into the final \ucode{sum} that is
implicitly mapped into the \kcode{target} region.
\cexample[4.0]{teams}{2}
%\clearpage
\ffreeexample[4.0]{teams}{2}
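The two reduction levels described above can also be seen in the following minimal
sketch, which sums an array in blocks. It is an illustration under stated assumptions,
not one of the numbered examples: the name \ucode{blocked_sum} and its parameters are
hypothetical, \ucode{block_size} is assumed to be positive, and \ucode{sum} is mapped
explicitly rather than implicitly.

```c
/* Hypothetical helper: sums n floats using a two-level reduction. */
float blocked_sum(const float *a, int n, int block_size,
                  int max_teams, int max_threads)
{
   float sum = 0.0f;

   #pragma omp target map(to: a[0:n]) map(tofrom: sum)
   #pragma omp teams num_teams(max_teams) thread_limit(max_threads) \
                     reduction(+:sum)
   #pragma omp distribute
   for (int i0 = 0; i0 < n; i0 += block_size) {
      /* Last block may be shorter than block_size. */
      int end = (i0 + block_size < n) ? i0 + block_size : n;

      /* Inner reduction: thread-private copies -> this team's copy. */
      #pragma omp parallel for reduction(+:sum)
      for (int i = i0; i < end; i++)
         sum += a[i];
   }
   /* Outer reduction (end of teams region): per-team copies -> mapped sum. */
   return sum;
}
```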
\subsection{\kcode{target teams} and Distribute Parallel Loop Constructs}
\label{subsec:teams_distribute_parallel}
The following example shows how the \kcode{target teams} and distribute
parallel loop constructs are used to execute a \kcode{target} region.
The \kcode{target teams} construct creates a league of teams where the primary thread of each
team executes the \kcode{teams} region.
The distribute parallel loop construct schedules the loop iterations across the
primary threads of the teams and then across the threads of each team.
\cexample[4.5]{teams}{3}
\ffreeexample[4.5]{teams}{3}
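For a loop without a reduction, the same pair of constructs can be sketched as
follows. This is a minimal illustration, not one of the numbered examples; the
function name \ucode{vec_mult} and its argument names are hypothetical.

```c
/* Hypothetical helper: element-wise product a[i] = b[i] * c[i]. */
void vec_mult(float *a, const float *b, const float *c, int n)
{
   /* target teams: create the league on the device and map the arrays. */
   #pragma omp target teams map(to: b[0:n], c[0:n]) map(from: a[0:n])
   /* Iterations are split across teams, then across each team's threads. */
   #pragma omp distribute parallel for
   for (int i = 0; i < n; i++)
      a[i] = b[i] * c[i];
}
```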
\subsection{\kcode{target teams} and Distribute Parallel Loop
Constructs with Scheduling Clauses}
\label{subsec:teams_distribute_parallel_schedule}
\index{distribute construct@\kcode{distribute} construct!dist_schedule clause@\kcode{dist_schedule} clause}
\index{clauses!dist_schedule@\kcode{dist_schedule}}
\index{dist_schedule clause@\kcode{dist_schedule} clause}
\index{worksharing-loop constructs!schedule clause@\kcode{schedule} clause}
\index{clauses!schedule@\kcode{schedule}}
\index{schedule clause@\kcode{schedule} clause}
The following example shows how the \kcode{target teams} and distribute parallel loop
constructs are used to execute a \kcode{target} region. The \kcode{teams}
construct creates a league of at most eight teams where the primary thread of each
team executes the \kcode{teams} region. The number of threads in each team is
less than or equal to \ucode{16}.
The distribute parallel loop construct schedules the subsequent loop iterations
across the primary threads of the teams and then across the threads of each team.
The \kcode{dist_schedule} clause on the distribute parallel loop construct indicates
that loop iterations are distributed to the primary thread of each team in chunks
of \ucode{1024} iterations.
The \kcode{schedule} clause indicates that the 1024 iterations distributed to
a primary thread are then assigned to the threads in its associated team in chunks
of \ucode{64} iterations.
\cexample[4.0]{teams}{4}
\ffreeexample[4.0]{teams}{4}
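The chunk sizes described above can be worked through with plain arithmetic. The
following sketch is ordinary C, not OpenMP code, and rests on several assumptions:
the league has exactly \ucode{nteams} teams of \ucode{nthreads} threads each (the
clauses only give upper bounds), static chunks are handed out round-robin in order
of team number and thread number, and the 64-iteration chunks are counted from the
start of each 1024-iteration chunk. The names \ucode{iteration_owner} and
\ucode{owner_t} are hypothetical.

```c
/* Chunk sizes taken from the dist_schedule and schedule clauses. */
enum { DIST_CHUNK = 1024, SCHED_CHUNK = 64 };

typedef struct { int team; int thread; } owner_t;

/* Hypothetical helper: which team and thread would run iteration i
   under the assumptions stated in the surrounding text. */
owner_t iteration_owner(long i, int nteams, int nthreads)
{
   owner_t o;
   long dist_index = i / DIST_CHUNK;   /* index of the 1024-iteration chunk */
   long offset     = i % DIST_CHUNK;   /* position inside that chunk */

   o.team   = (int)(dist_index % nteams);                 /* round-robin over teams */
   o.thread = (int)((offset / SCHED_CHUNK) % nthreads);   /* round-robin over threads */
   return o;
}
```

Under these assumptions, with 8 teams of 16 threads, iteration 1023 belongs to
team 0, thread 15, and iteration 1024 belongs to team 1, thread 0.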
\subsection{\kcode{target teams} and \kcode{distribute simd} Constructs}
\label{subsec:teams_distribute_simd}
The following example shows how the \kcode{target teams} and \kcode{distribute simd} constructs are used to execute a loop in a \kcode{target} region.
The \kcode{target teams} construct creates a league of teams where the
primary thread of each team executes the \kcode{teams} region.
The \kcode{distribute simd} construct schedules the loop iterations across
the primary thread of each team and then uses SIMD parallelism to execute the iterations.
\cexample[4.0]{teams}{5}
\ffreeexample[4.0]{teams}{5}
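A further minimal sketch of the same construct pair, applied to an in-place update,
is given below. It is an illustration rather than one of the numbered examples; the
function name \ucode{saxpy} and its argument names are hypothetical.

```c
/* Hypothetical helper: y[i] = alpha * x[i] + y[i]. */
void saxpy(float alpha, const float *x, float *y, int n)
{
   /* y is both read and written, so it is mapped tofrom. */
   #pragma omp target teams map(to: x[0:n]) map(tofrom: y[0:n])
   /* Iterations are split across teams; each primary thread then
      executes its share with SIMD instructions. */
   #pragma omp distribute simd
   for (int i = 0; i < n; i++)
      y[i] = alpha * x[i] + y[i];
}
```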
\subsection{\kcode{target teams} and Distribute Parallel Loop SIMD Constructs}
\label{subsec:teams_distribute_parallel_simd}
The following example shows how the \kcode{target teams} and the distribute
parallel loop SIMD constructs are used to execute a loop in a \kcode{target teams}
region. The \kcode{target teams} construct creates a league of teams
where the primary thread of each team executes the \kcode{teams} region.
The distribute parallel loop SIMD construct schedules the loop iterations across
the primary thread of each team and then across the threads of each team where each
thread uses SIMD parallelism.
\cexample[4.0]{teams}{6}
\ffreeexample[4.0]{teams}{6}
\subsection{Evaluation of \kcode{num_teams} Clause that Appears inside \kcode{target} Region}
\label{subsec:target_teams_num_teams}
The following example shows the evaluation of the \kcode{num_teams} clause when the
\kcode{teams} construct is closely nested inside a \kcode{target} construct.
The code is non-conforming since the value of \ucode{x} in the clause may differ
between the host and the target device.
As of OpenMP 6.0, it is the user's responsibility to ensure that the clause expression
has identical values on the host and the target device, both when the \kcode{teams}
construct is nested inside a \kcode{target} construct and when the two are combined.
This permits implementations to evaluate the \kcode{num_teams} argument on the host
rather than on the target device. For the program to be conforming, it must
update the host value so that \ucode{x} has the same value when evaluated on the
host or the target device.
\cexample[6.0]{teams}{7}
\ffreeexample[6.0]{teams}{7}
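One way to keep the host and device values in agreement is sketched below, assuming
that \ucode{x} was modified only in its device copy and is brought back with a
\kcode{target update} directive before the \kcode{num_teams} clause is evaluated.
This is a minimal illustration, not the numbered example itself; the function name
\ucode{league_size_after_sync}, the variable \ucode{nteams}, and the serial
fallback definitions are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP support. */
static int omp_get_num_teams(void) { return 1; }
static int omp_get_team_num(void)  { return 0; }
#endif

/* Hypothetical helper: returns the league size obtained with num_teams(x). */
int league_size_after_sync(void)
{
   int x = 2;
   int nteams = 0;

   /* Create a device copy of x with the value 2. */
   #pragma omp target enter data map(to: x)

   /* x is already present, so this map performs no transfer: only the
      device copy becomes 4 when host and device memory are distinct. */
   #pragma omp target map(tofrom: x)
   {
      x = 4;
   }

   /* Bring the host copy in line with the device copy before x is
      used in the num_teams clause. */
   #pragma omp target update from(x)

   #pragma omp target map(to: x) map(from: nteams)
   #pragma omp teams num_teams(x)
   {
      if (omp_get_team_num() == 0)
         nteams = omp_get_num_teams();   /* at most 4 */
   }

   #pragma omp target exit data map(delete: x)
   return nteams;
}
```

Without the \kcode{target update} directive, the host copy of \ucode{x} could
still hold 2 while the device copy holds 4, which is the mismatch the text above
describes.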