\cchapter{OpenMP Affinity}{affinity}
\label{chap:openmp_affinity}

OpenMP Affinity consists of a \kcode{proc_bind} policy (thread affinity policy) and a specification of
places (``location units'' or \plc{processors} that may be cores, hardware
threads, sockets, etc.).
OpenMP Affinity enables users to bind computations to specific places.
The placement will hold for the duration of the parallel region.
However, the runtime is free to migrate the OpenMP threads
to different cores (hardware threads, sockets, etc.) prescribed within a given place,
if two or more cores (hardware threads, sockets, etc.) have been assigned to a given place.

Often the binding can be managed without resorting to explicitly setting places.
Without the specification of places in the \kcode{OMP_PLACES} variable,
the OpenMP runtime will distribute and bind threads using the entire range of processors for
the OpenMP program, according to the \kcode{OMP_PROC_BIND} environment variable
or the \kcode{proc_bind} clause. When places are specified, the OpenMP runtime
binds threads to the places according to a default distribution policy, or
the policy specified in the \kcode{OMP_PROC_BIND} environment variable or the
\kcode{proc_bind} clause.

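
As a minimal sketch of the two cases above, the following shell settings assume a
hypothetical system whose processors are numbered 0 through 7. The processor numbers
and the chosen policies are illustrative assumptions only.

\begin{verbatim}
# Case 1: no places specified. The runtime distributes and binds
# threads over all processors available to the program.
unset OMP_PLACES
export OMP_PROC_BIND=spread

# Case 2: an explicit list of four places, two processors each.
# The interval form "{0:2}:4:2" describes the same four places.
export OMP_PLACES="{0,1},{2,3},{4,5},{6,7}"
export OMP_PROC_BIND=close
\end{verbatim}
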
In the OpenMP Specifications document, a processor refers to an execution unit that
is enabled for an OpenMP thread to use. A processor is a core when there is
no SMT (Simultaneous Multi-Threading) support or SMT is disabled. When
SMT is enabled, a processor is a hardware thread (HW-thread). (This is the
usual case; strictly, the execution unit is implementation defined.) Processors
are numbered sequentially from 0 to the number of cores less one (without SMT), or
0 to the number of HW-threads less one (with SMT). OpenMP places use the processor number to designate
binding locations (unless an ``abstract name'' is used).

The processors available to a process may be a subset of the system's
processors. This restriction may be the result of a
wrapper process controlling the execution (such as \plc{numactl} on Linux systems),
compiler options, library-specific environment variables, or default
kernel settings. For instance, the execution of multiple MPI processes,
launched on a single compute node, will each have a subset of processors as
determined by the MPI launcher or set by MPI affinity environment
variables for the MPI library. %Forked threads within an MPI process
%(for a hybrid execution of MPI and OpenMP code) inherit the valid
%processor set for execution from the parent process (the initial task region)
%when a parallel region forks threads. The binding policy set in
%\code{OMP\_PROC\_BIND} or the \code{proc\_bind} clause will be applied to
%the subset of processors available to \plc{the particular} MPI process.

%Also, setting an explicit list of processor numbers in the \code{OMP\_PLACES}
%variable before an MPI launch (which involves more than one MPI process) will
%result in unspecified behavior (and doesn't make sense) because the set of
%processors in the places list must not contain processors outside the subset
%of processors for an MPI process. A separate \code{OMP\_PLACES} variable must
%be set for each MPI process, and is usually accomplished by launching a script
%which sets \code{OMP\_PLACES} specifically for the MPI process.

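
For illustration only, a wrapper such as \plc{numactl} might restrict a run to a subset
of processors as sketched below. The processor range, the thread count, and the program
name \plc{./a.out} are assumptions made for this sketch, not taken from a particular system.

\begin{verbatim}
# Assumed: a Linux node with at least four processors, numbered from 0.
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=close

# Restrict the process, and hence its OpenMP threads, to processors 0-3.
numactl --physcpubind=0-3 ./a.out
\end{verbatim}
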
Threads of a team are positioned onto places in a compact manner, a
scattered distribution, or onto the primary thread's place, by setting the
\kcode{OMP_PROC_BIND} environment variable or the \kcode{proc_bind} clause to
\kcode{close}, \kcode{spread}, or \kcode{primary} (\kcode{master} has been deprecated), respectively. When
\kcode{OMP_PROC_BIND} is set to FALSE, no binding is enforced;
when the value is TRUE, the binding is implementation defined to
a set of places in the \kcode{OMP_PLACES} variable or to places
defined by the implementation if the \kcode{OMP_PLACES} variable
is not set.

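
The policies described above might be selected from the shell as in the following sketch.
Only one of the settings would be used for a given run; which one is appropriate depends
on the application and is not prescribed here.

\begin{verbatim}
# Compact: position the threads of a team on neighboring places.
export OMP_PROC_BIND=close

# Scattered: distribute the threads of a team across the places.
export OMP_PROC_BIND=spread

# Position all threads of a team on the primary thread's place.
export OMP_PROC_BIND=primary

# Do not enforce any binding.
export OMP_PROC_BIND=false
\end{verbatim}
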
The \kcode{OMP_PLACES} variable can also be set to an abstract name
(\kcode{threads}, \kcode{cores}, \kcode{sockets}) to specify that a place is
either a single hardware thread, a core, or a socket, respectively.
This form of \kcode{OMP_PLACES} is most useful when the
number of threads is equal to the number of hardware threads, cores,
or sockets. It can also be used with a \kcode{close} or \kcode{spread}
distribution policy when the equality doesn't hold.

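
A minimal sketch of the abstract-name form follows. It assumes a hypothetical node with
1 socket, 4 cores, and 2 hardware threads per core, so that the number of threads equals
the number of cores; the thread count and the policy are likewise assumptions.

\begin{verbatim}
# Assumed node: 1 socket x 4 cores x 2 HW-threads per core.
export OMP_NUM_THREADS=4
export OMP_PLACES=cores      # 4 places, each holding the 2 HW-threads of a core
export OMP_PROC_BIND=close   # threads occupy consecutive places, one per core
\end{verbatim}
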
% We need an example of using sockets, cores and threads:

% case 1 threads:

% Hyper-Threads on (2 hardware threads per core)
% 1 socket x 4 cores x 2 HW-threads
%
% export OMP_NUM_THREADS=4
% export OMP_PLACES=threads
%
% core #        0     1     2     3
% processor #   0,1   2,3   4,5   6,7
% thread # 0    * _   _ _   _ _   _ _     #mask for thread 0
% thread # 1    _ _   * _   _ _   _ _     #mask for thread 1
% thread # 2    _ _   _ _   * _   _ _     #mask for thread 2
% thread # 3    _ _   _ _   _ _   * _     #mask for thread 3

% case 2 cores:
%
% Hyper-Threads on (2 hardware threads per core)
% 1 socket x 4 cores x 2 HW-threads
%
% export OMP_NUM_THREADS=4
% export OMP_PLACES=cores
%
% core #        0     1     2     3
% processor #   0,1   2,3   4,5   6,7
% thread # 0    * *   _ _   _ _   _ _     #mask for thread 0
% thread # 1    _ _   * *   _ _   _ _     #mask for thread 1
% thread # 2    _ _   _ _   * *   _ _     #mask for thread 2
% thread # 3    _ _   _ _   _ _   * *     #mask for thread 3

% case 3 sockets:
%
% No Hyper-Threads
% 3 sockets x 4 cores
%
% export OMP_NUM_THREADS=3
% export OMP_PLACES=sockets
%
% socket #      0         1         2
% processor #   0,1,2,3   4,5,6,7   8,9,10,11
% thread # 0    * * * *   _ _ _ _   _ _ _ _     #mask for thread 0
% thread # 1    _ _ _ _   * * * *   _ _ _ _     #mask for thread 1
% thread # 2    _ _ _ _   _ _ _ _   * * * *     #mask for thread 2

%===== Examples Sections =====
\input{affinity/affinity}
\input{affinity/task_affinity}
\input{affinity/affinity_display}
\input{affinity/affinity_query}