%\pagebreak
\section{\kcode{teams} Construct and Related Combined Constructs}
\label{sec:teams}

\subsection{\kcode{target} and \kcode{teams} Constructs with \kcode{omp_get_num_teams}\\
and \kcode{omp_get_team_num} Routines}
\label{subsec:teams_api}
\index{constructs!target@\kcode{target}}
\index{target construct@\kcode{target} construct}
\index{constructs!teams@\kcode{teams}}
\index{teams construct@\kcode{teams} construct}
\index{combined constructs!target teams@\kcode{target teams}}
\index{teams construct@\kcode{teams} construct!num_teams clause@\kcode{num_teams} clause}
\index{clauses!num_teams@\kcode{num_teams}}
\index{num_teams clause@\kcode{num_teams} clause}
\index{routines!omp_get_num_teams@\kcode{omp_get_num_teams}}
\index{routines!omp_get_team_num@\kcode{omp_get_team_num}}
\index{omp_get_num_teams routine@\kcode{omp_get_num_teams} routine}
\index{omp_get_team_num routine@\kcode{omp_get_team_num} routine}

The following example shows how the \kcode{target} and \kcode{teams} constructs
are used to create a \plc{league} of thread teams that execute a region. The \kcode{teams}
construct creates a league of at most two teams where the primary thread of each
team executes the \kcode{teams} region.

The \kcode{omp_get_num_teams} routine returns the number of teams executing in a \kcode{teams}
region. The \kcode{omp_get_team_num} routine returns the team number, which is an integer
between 0 and one less than the value returned by \kcode{omp_get_num_teams}, inclusive. The
example manually distributes a loop across two teams.
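
The manual distribution described above can be sketched as follows. This is a minimal illustration, not the example file itself: the function name \ucode{dotprod_by_team}, the \ucode{partial} array, and the serial fallback stubs (used only when the code is compiled without OpenMP support) are assumptions made for this sketch. Each team derives its slice of the loop from its team number, so the code also works if the league ends up with a single team.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumption for this sketch: without OpenMP, act as a league of one team. */
static int omp_get_num_teams(void) { return 1; }
static int omp_get_team_num(void)  { return 0; }
#endif

/* Hypothetical helper: dot product with the loop split by team number. */
float dotprod_by_team(const float *b, const float *c, int n)
{
    float partial[2] = { 0.0f, 0.0f };   /* one slot per possible team */

    #pragma omp target map(to: b[0:n], c[0:n]) map(tofrom: partial[0:2])
    #pragma omp teams num_teams(2)
    {
        /* Only the primary thread of each team executes this block. */
        int nteams = omp_get_num_teams();   /* 1 or 2 */
        int team   = omp_get_team_num();    /* 0 .. nteams-1 */

        /* Contiguous slice of the iteration space owned by this team. */
        int lo = (int)((long)n * team / nteams);
        int hi = (int)((long)n * (team + 1) / nteams);

        for (int i = lo; i < hi; i++)
            partial[team] += b[i] * c[i];
    }

    /* A slot that no team used still holds 0. */
    return partial[0] + partial[1];
}
```

Because each team writes only its own element of \ucode{partial}, no synchronization between teams is needed in this sketch.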

\cexample[4.0]{teams}{1}

\ffreeexample[4.0]{teams}{1}

\subsection{\kcode{target}, \kcode{teams}, and \kcode{distribute} Constructs}
\label{subsec:teams_distribute}
\index{constructs!distribute@\kcode{distribute}}
\index{distribute construct@\kcode{distribute} construct}

The following example shows how the \kcode{target}, \kcode{teams}, and \kcode{distribute}
constructs are used to execute a loop nest in a \kcode{target} region. The \kcode{teams}
construct creates a league and the primary thread of each team executes the \kcode{teams}
region. The \kcode{distribute} construct schedules the subsequent loop iterations
across the primary threads of the teams.

The number of teams in the league is less than or equal to the value of the variable \ucode{num_blocks}.
Each team in the league has a number of threads less than or equal to the value of the variable
\ucode{block_threads}. The iterations in the outer loop are distributed among the primary
threads of the teams.

When a team's primary thread encounters the parallel loop construct before the inner
loop, the other threads in its team are activated. The team executes the \kcode{parallel}
region and workshares the execution of the inner loop.

\index{reduction clause@\kcode{reduction} clause!on teams construct@on \kcode{teams} construct}

Each primary thread executing the \kcode{teams} region has a private copy of the
variable \ucode{sum} that is created by the \kcode{reduction} clause on the \kcode{teams} construct.
The primary thread and all threads in its team have a private copy of the variable
\ucode{sum} that is created by the \kcode{reduction} clause on the parallel loop construct.
These second private copies of \ucode{sum} are reduced into the primary thread's private copy of \ucode{sum}
created by the \kcode{teams} construct. At the end of the \kcode{teams} region,
each primary thread's private copy of \ucode{sum} is reduced into the final \ucode{sum} that is
implicitly mapped into the \kcode{target} region.
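
The two-level reduction described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the example file: the function name \ucode{dotprod_blocked} and the parameter \ucode{block_size} are hypothetical, \ucode{block_size} is assumed to be positive, and \ucode{sum} is mapped explicitly here for clarity. When compiled without OpenMP support the directives are ignored and the loop nest runs serially with the same result.

```c
/* Hypothetical helper: blocked dot product with a reduction at the
 * teams level and a second reduction at the parallel-loop level. */
float dotprod_blocked(const float *b, const float *c, int n,
                      int block_size, int num_blocks, int block_threads)
{
    float sum = 0.0f;

    #pragma omp target map(to: b[0:n], c[0:n]) map(tofrom: sum)
    #pragma omp teams num_teams(num_blocks) thread_limit(block_threads) \
            reduction(+:sum)
    #pragma omp distribute
    for (int i0 = 0; i0 < n; i0 += block_size) {
        /* End of the current block, clipped to the array length. */
        int end = (i0 + block_size < n) ? i0 + block_size : n;

        /* The team's threads share the block; their private copies of
         * sum are combined into the team-level private copy. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = i0; i < end; i++)
            sum += b[i] * c[i];
    }

    /* Team-level copies have been combined into the mapped sum. */
    return sum;
}
```

Blocks of the outer loop go to the primary threads of the teams, and the inner loop of each block is workshared within the team that owns it.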

\cexample[4.0]{teams}{2}
%\clearpage

\ffreeexample[4.0]{teams}{2}

\subsection{\kcode{target teams} and Distribute Parallel Loop Constructs}
\label{subsec:teams_distribute_parallel}

The following example shows how the \kcode{target teams} and distribute
parallel loop constructs are used to execute a \kcode{target} region.
The \kcode{target teams} construct creates a league of teams where the primary thread of each
team executes the \kcode{teams} region.

The distribute parallel loop construct schedules the loop iterations across the
primary threads of the teams and then across the threads of each team.

\cexample[4.5]{teams}{3}

\ffreeexample[4.5]{teams}{3}

\subsection{\kcode{target teams} and Distribute Parallel Loop
Constructs with Scheduling Clauses}
\label{subsec:teams_distribute_parallel_schedule}
\index{distribute construct@\kcode{distribute} construct!dist_schedule clause@\kcode{dist_schedule} clause}
\index{clauses!dist_schedule@\kcode{dist_schedule}}
\index{dist_schedule clause@\kcode{dist_schedule} clause}
\index{worksharing-loop constructs!schedule clause@\kcode{schedule} clause}
\index{clauses!schedule@\kcode{schedule}}
\index{schedule clause@\kcode{schedule} clause}

The following example shows how the \kcode{target teams} and distribute parallel loop
constructs are used to execute a \kcode{target} region. The \kcode{teams}
construct creates a league of at most eight teams where the primary thread of each
team executes the \kcode{teams} region. The number of threads in each team is
less than or equal to \ucode{16}.

The distribute parallel loop construct schedules the subsequent loop iterations
across the primary threads of the teams and then across the threads of each team.

The \kcode{dist_schedule} clause on the distribute parallel loop construct indicates
that loop iterations are distributed to the primary thread of each team in chunks
of \ucode{1024} iterations.

The \kcode{schedule} clause indicates that the \ucode{1024} iterations distributed to
a primary thread are then assigned to the threads in its associated team in chunks
of \ucode{64} iterations.
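
The two scheduling levels described above can be sketched as follows. This is a minimal sketch, not the example file: the function name \ucode{dotprod_scheduled} and the array-length parameter \ucode{n} are assumptions. With these chunk sizes, each 1024-iteration chunk handed to a team is split into $1024/64 = 16$ chunks for that team's threads. Without OpenMP support the directives are ignored and the loop runs serially.

```c
/* Hypothetical helper: dot product with explicit chunk sizes at the
 * team level (dist_schedule) and the thread level (schedule). */
float dotprod_scheduled(const float *b, const float *c, int n)
{
    float sum = 0.0f;

    /* League of at most 8 teams, at most 16 threads per team. */
    #pragma omp target teams num_teams(8) thread_limit(16) \
            map(to: b[0:n], c[0:n]) map(tofrom: sum) reduction(+:sum)
    /* Teams receive chunks of 1024 iterations; within a team, threads
     * receive chunks of 64 iterations from the team's chunk. */
    #pragma omp distribute parallel for reduction(+:sum) \
            dist_schedule(static, 1024) schedule(static, 64)
    for (int i = 0; i < n; i++)
        sum += b[i] * c[i];

    return sum;
}
```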

\cexample[4.0]{teams}{4}

\ffreeexample[4.0]{teams}{4}

\subsection{\kcode{target teams} and \kcode{distribute simd} Constructs}
\label{subsec:teams_distribute_simd}

The following example shows how the \kcode{target teams} and \kcode{distribute simd} constructs are used to execute a loop in a \kcode{target} region.
The \kcode{target teams} construct creates a league of teams where the
primary thread of each team executes the \kcode{teams} region.

The \kcode{distribute simd} construct schedules the loop iterations across
the primary threads of the teams and then uses SIMD parallelism to execute the iterations.
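
The pattern described above can be sketched with an element-wise vector product. This is a minimal sketch under assumed names (\ucode{vec_mult}, \ucode{p}, \ucode{v1}, \ucode{v2}, \ucode{n}) rather than the example file. Without OpenMP support the directives are ignored and the loop runs serially.

```c
/* Hypothetical helper: p[i] = v1[i] * v2[i] for 0 <= i < n. */
void vec_mult(float *p, const float *v1, const float *v2, int n)
{
    /* One league of teams on the target device. */
    #pragma omp target teams map(to: v1[0:n], v2[0:n]) map(from: p[0:n])
    /* Iterations are divided among the primary threads of the teams,
     * and each primary thread executes its share with SIMD lanes. */
    #pragma omp distribute simd
    for (int i = 0; i < n; i++)
        p[i] = v1[i] * v2[i];
}
```

No \kcode{parallel} region is created here, so only one thread per team is active and the parallelism inside a team comes from SIMD execution alone.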

\cexample[4.0]{teams}{5}

\ffreeexample[4.0]{teams}{5}

\subsection{\kcode{target teams} and Distribute Parallel Loop SIMD Constructs}
\label{subsec:teams_distribute_parallel_simd}

The following example shows how the \kcode{target teams} and the distribute
parallel loop SIMD constructs are used to execute a loop in a \kcode{target teams}
region. The \kcode{target teams} construct creates a league of teams
where the primary thread of each team executes the \kcode{teams} region.

The distribute parallel loop SIMD construct schedules the loop iterations across
the primary threads of the teams and then across the threads of each team, where each
thread uses SIMD parallelism.

\cexample[4.0]{teams}{6}

\ffreeexample[4.0]{teams}{6}

\subsection{Evaluation of \kcode{num_teams} Clause that Appears inside \kcode{target} Region}
\label{subsec:target_teams_num_teams}

The following example shows the evaluation of the \kcode{num_teams} clause when the
\kcode{teams} construct is closely nested inside a \kcode{target} construct. The code is
non-conforming since the value of \ucode{x} used in the clause may differ between devices.
As of OpenMP 6.0, it is the user's responsibility to ensure that the clause expression
evaluates to the same value on the host and on the target device, for both the nested and
the combined forms of the \kcode{target} and \kcode{teams} constructs. This permits
implementations to evaluate the \kcode{num_teams} argument on the host rather than on the
target device. For the program to be conforming, the program must update the host value so
that \ucode{x} will have the same value when evaluated on the host or the target device.
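
One way to keep the host and device values in agreement is sketched below. This is not the example file. It assumes that \ucode{x} is a \kcode{declare target} variable whose device copy is changed inside a \kcode{target} region, and that a \kcode{target update} directive is an acceptable way to refresh the host copy. The function name \ucode{launch_with_synced_num_teams}, its parameters, and the serial fallback stubs are hypothetical, and \ucode{requested} is assumed to be positive.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumption for this sketch: without OpenMP, act as a league of one team. */
static int omp_get_num_teams(void) { return 1; }
static int omp_get_team_num(void)  { return 0; }
#endif

int x = 1;                        /* hypothetical league-size variable */
#pragma omp declare target(x)     /* x also has a device copy */

/* Hypothetical helper: returns the host value of x after the update and
 * stores the observed league size in *league_size. */
int launch_with_synced_num_teams(int requested, int *league_size)
{
    /* The device copy of x changes; the host copy is now stale. */
    #pragma omp target
    {
        x = requested;
    }

    /* Refresh the host copy so host and device agree on x. */
    #pragma omp target update from(x)

    int nteams = 0;
    /* num_teams(x) now yields the same value wherever it is evaluated. */
    #pragma omp target map(from: nteams)
    #pragma omp teams num_teams(x)
    {
        if (omp_get_team_num() == 0)
            nteams = omp_get_num_teams();
    }

    *league_size = nteams;
    return x;
}
```

If the \kcode{target update} directive were omitted, the host copy of \ucode{x} would still hold its old value, which is the kind of mismatch the text above identifies as non-conforming.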

\cexample[6.0]{teams}{7}

\ffreeexample[6.0]{teams}{7}