mirror of
https://github.com/OpenMP/Examples.git
synced 2025-04-16 03:08:05 +01:00
147 lines
6.9 KiB
TeX
147 lines
6.9 KiB
TeX
%\pagebreak
|
|
\section{\kcode{teams} Construct and Related Combined Constructs}
|
|
\label{sec:teams}
|
|
|
|
\subsection{\kcode{target} and \kcode{teams} Constructs with \kcode{omp_get_num_teams}\\
|
|
and \kcode{omp_get_team_num} Routines}
|
|
\label{subsec:teams_api}
|
|
\index{constructs!target@\kcode{target}}
|
|
\index{target construct@\kcode{target} construct}
|
|
\index{constructs!teams@\kcode{teams}}
|
|
\index{teams construct@\kcode{teams} construct}
|
|
\index{combined constructs!target teams@\kcode{target teams}}
|
|
\index{teams construct@\kcode{teams} construct!num_teams clause@\kcode{num_teams} clause}
|
|
\index{clauses!num_teams@\kcode{num_teams}}
|
|
\index{num_teams clause@\kcode{num_teams} clause}
|
|
\index{routines!omp_get_num_teams@\kcode{omp_get_num_teams}}
|
|
\index{routines!omp_get_team_num@\kcode{omp_get_team_num}}
|
|
\index{omp_get_num_teams routine@\kcode{omp_get_num_teams} routine}
|
|
\index{omp_get_team_num routine@\kcode{omp_get_team_num} routine}
|
|
|
|
The following example shows how the \kcode{target} and \kcode{teams} constructs
|
|
are used to create a \plc{league} of thread teams that execute a region. The \kcode{teams}
|
|
construct creates a league of at most two teams where the primary thread of each
|
|
team executes the \kcode{teams} region.
|
|
|
|
The \kcode{omp_get_num_teams} routine returns the number of teams executing in a \kcode{teams}
|
|
region. The \kcode{omp_get_team_num} routine returns the team number, which is an integer
|
|
between 0 and one less than the value returned by \kcode{omp_get_num_teams}. The following
|
|
example manually distributes a loop across two teams.
|
|
|
|
\cexample[4.0]{teams}{1}
|
|
|
|
\ffreeexample[4.0]{teams}{1}
|
|
|
|
\subsection{\kcode{target}, \kcode{teams}, and \kcode{distribute} Constructs}
|
|
\label{subsec:teams_distribute}
|
|
\index{constructs!distribute@\kcode{distribute}}
|
|
\index{distribute construct@\kcode{distribute} construct}
|
|
|
|
The following example shows how the \kcode{target}, \kcode{teams}, and \kcode{distribute}
|
|
constructs are used to execute a loop nest in a \kcode{target} region. The \kcode{teams}
|
|
construct creates a league and the primary thread of each team executes the \kcode{teams}
|
|
region. The \kcode{distribute} construct schedules the subsequent loop iterations
|
|
across the primary threads of each team.
|
|
|
|
The number of teams in the league is less than or equal to the variable \ucode{num_blocks}.
|
|
Each team in the league has a number of threads less than or equal to the variable
|
|
\ucode{block_threads}. The iterations in the outer loop are distributed among the primary
|
|
threads of each team.
|
|
|
|
When a team's primary thread encounters the parallel loop construct before the inner
|
|
loop, the other threads in its team are activated. The team executes the \kcode{parallel}
|
|
region and then workshares the execution of the loop.
|
|
|
|
\index{reduction clause@\kcode{reduction} clause!on teams construct@on \kcode{teams} construct}
|
|
|
|
Each primary thread executing the \kcode{teams} region has a private copy of the
|
|
variable \ucode{sum} that is created by the \kcode{reduction} clause on the \kcode{teams} construct.
|
|
The primary thread and all threads in its team have a private copy of the variable
|
|
\ucode{sum} that is created by the \kcode{reduction} clause on the parallel loop construct.
|
|
The second private \ucode{sum} is reduced into the primary thread's private copy of \ucode{sum}
|
|
created by the \kcode{teams} construct. At the end of the \kcode{teams} region,
|
|
each primary thread's private copy of \ucode{sum} is reduced into the final \ucode{sum} that is
|
|
implicitly mapped into the \kcode{target} region.
|
|
|
|
\cexample[4.0]{teams}{2}
|
|
\clearpage
|
|
|
|
\ffreeexample[4.0]{teams}{2}
|
|
|
|
\subsection{\kcode{target teams}, and Distribute Parallel Loop Constructs}
|
|
\label{subsec:teams_distribute_parallel}
|
|
|
|
The following example shows how the \kcode{target teams} and distribute
|
|
parallel loop constructs are used to execute a \kcode{target} region.
|
|
The \kcode{target teams} construct creates a league of teams where the primary thread of each
|
|
team executes the \kcode{teams} region.
|
|
|
|
The distribute parallel loop construct schedules the loop iterations across the
|
|
primary threads of each team and then across the threads of each team.
|
|
|
|
\cexample[4.5]{teams}{3}
|
|
|
|
\ffreeexample[4.5]{teams}{3}
|
|
|
|
\subsection{\kcode{target teams} and Distribute Parallel Loop
|
|
Constructs with Scheduling Clauses}
|
|
\label{subsec:teams_distribute_parallel_schedule}
|
|
\index{distribute construct@\kcode{distribute} construct!dist_schedule clause@\kcode{dist_schedule} clause}
|
|
\index{clauses!dist_schedule@\kcode{dist_schedule}}
|
|
\index{dist_schedule clause@\kcode{dist_schedule} clause}
|
|
\index{worksharing-loop constructs!schedule clause@\kcode{schedule} clause}
|
|
\index{clauses!schedule@\kcode{schedule}}
|
|
\index{schedule clause@\kcode{schedule} clause}
|
|
|
|
The following example shows how the \kcode{target teams} and \kcode{distribute parallel}
|
|
constructs are used to execute a \kcode{target} region. The \kcode{teams}
|
|
construct creates a league of at most eight teams where the primary thread of each
|
|
team executes the \kcode{teams} region. The number of threads in each team is
|
|
less than or equal to \ucode{16}.
|
|
|
|
The \kcode{distribute} parallel loop construct schedules the subsequent loop iterations
|
|
across the primary threads of each team and then across the threads of each team.
|
|
|
|
The \kcode{dist_schedule} clause on the distribute parallel loop construct indicates
|
|
that loop iterations are distributed to the primary thread of each team in chunks
|
|
of \ucode{1024} iterations.
|
|
|
|
The \kcode{schedule} clause indicates that the 1024 iterations distributed to
|
|
a primary thread are then assigned to the threads in its associated team in chunks
|
|
of \ucode{64} iterations.
|
|
|
|
\cexample[4.0]{teams}{4}
|
|
|
|
\ffreeexample[4.0]{teams}{4}
|
|
|
|
\subsection{\kcode{target teams} and \kcode{distribute simd} Constructs}
|
|
\label{subsec:teams_distribute_simd}
|
|
|
|
The following example shows how the \kcode{target teams} and \kcode{distribute simd} constructs are used to execute a loop in a \kcode{target} region.
|
|
The \kcode{target teams} construct creates a league of teams where the
|
|
primary thread of each team executes the \kcode{teams} region.
|
|
|
|
The \kcode{distribute simd} construct schedules the loop iterations across
|
|
the primary thread of each team and then uses SIMD parallelism to execute the iterations.
|
|
|
|
\cexample[4.0]{teams}{5}
|
|
|
|
\ffreeexample[4.0]{teams}{5}
|
|
|
|
\subsection{\kcode{target teams} and Distribute Parallel Loop SIMD Constructs}
|
|
\label{subsec:teams_distribute_parallel_simd}
|
|
|
|
The following example shows how the \kcode{target teams} and the distribute
|
|
parallel loop SIMD constructs are used to execute a loop in a \kcode{target teams}
|
|
region. The \kcode{target teams} construct creates a league of teams
|
|
where the primary thread of each team executes the \kcode{teams} region.
|
|
|
|
The distribute parallel loop SIMD construct schedules the loop iterations across
|
|
the primary thread of each team and then across the threads of each team where each
|
|
thread uses SIMD parallelism.
|
|
|
|
\cexample[4.0]{teams}{6}
|
|
|
|
\ffreeexample[4.0]{teams}{6}
|
|
|