mirror of
https://github.com/OpenMP/Examples.git
synced 2025-04-04 05:41:33 +01:00
127 lines
6.8 KiB
TeX
127 lines
6.8 KiB
TeX
\section{Affinity Display}
|
|
\label{sec:affinity_display}
|
|
\index{affinity display!OMP_DISPLAY_AFFINITY@\kcode{OMP_DISPLAY_AFFINITY}}
|
|
\index{environment variables!OMP_DISPLAY_AFFINITY@\kcode{OMP_DISPLAY_AFFINITY}}
|
|
\index{OMP_DISPLAY_AFFINITY@\kcode{OMP_DISPLAY_AFFINITY}}
|
|
\index{affinity display!OMP_AFFINITY_FORMAT@\kcode{OMP_AFFINITY_FORMAT}}
|
|
\index{environment variables!OMP_AFFINITY_FORMAT@\kcode{OMP_AFFINITY_FORMAT}}
|
|
\index{OMP_AFFINITY_FORMAT@\kcode{OMP_AFFINITY_FORMAT}}
|
|
\index{affinity display!omp_display_affinity routine@\kcode{omp_display_affinity} routine}
|
|
\index{routines!omp_display_affinity@\kcode{omp_display_affinity}}
|
|
\index{omp_display_affinity routine@\kcode{omp_display_affinity} routine}
|
|
|
|
The following examples illustrate ways to display thread affinity.
|
|
Automatic display of affinity can be invoked by setting
|
|
the \kcode{OMP_DISPLAY_AFFINITY} environment variable to \vcode{TRUE}.
|
|
The format of the output can be customized by setting the
|
|
\kcode{OMP_AFFINITY_FORMAT} environment variable to an appropriate string.
|
|
Also, there are API calls for the user to display thread affinity
|
|
at selected locations within code.
|
|
|
|
For the first example the environment variable \kcode{OMP_DISPLAY_AFFINITY} has been
|
|
set to \vcode{TRUE}, and execution occurs on an 8-core system with \kcode{OMP_NUM_THREADS} set to 8.
|
|
|
|
The affinity for the primary thread is reported through a call to the API
|
|
\kcode{omp_display_affinity()} routine. For default affinity settings
|
|
the report shows that the primary thread can execute on any of the cores.
|
|
In the following parallel region the affinity for each of the team threads is reported
|
|
automatically since the \kcode{OMP_DISPLAY_AFFINITY} environment variable has been set
|
|
to \vcode{TRUE}.
|
|
|
|
These two reports are often useful (as in hybrid codes using both MPI and OpenMP)
|
|
to observe the affinity (for an MPI task) before the parallel region,
|
|
and during an OpenMP parallel region. Note: the next parallel region uses the
|
|
same number of threads as in the previous parallel region and affinities are
|
|
not changed, so affinity is NOT reported.
|
|
|
|
In the last parallel region, the thread affinities are reported
|
|
because the thread affinity has changed.
|
|
|
|
\cexample[5.0]{affinity_display}{1}[1]
|
|
|
|
\ffreeexample[5.0]{affinity_display}{1}[1]
|
|
|
|
|
|
In the following example 2 threads are forked, and each executes on a socket. Next,
|
|
a nested parallel region runs half of the available threads on each socket.
|
|
|
|
These OpenMP environment variables have been set:
|
|
|
|
\begin{boxeducode}
|
|
\kcode{export OMP_PROC_BIND=}"TRUE"
|
|
\kcode{export OMP_NUM_THREADS=}"2,4"
|
|
\kcode{export OMP_PLACES=}"{0,2,4,6},{1,3,5,7}"
|
|
\kcode{export OMP_AFFINITY_FORMAT=}"nest_level= %L, parent_thrd_num= %a,
|
|
thrd_num= %n, thrd_affinity= %A"
|
|
\end{boxeducode}
|
|
|
|
where the numbers correspond to core ids for the system. Note, \kcode{OMP_DISPLAY_AFFINITY} is not
|
|
set and is \vcode{FALSE} by default. This example shows how to use API routines to
|
|
perform affinity display operations.
|
|
|
|
\index{environment variables!OMP_PLACES@\kcode{OMP_PLACES}}
|
|
\index{OMP_PLACES@\kcode{OMP_PLACES}}
|
|
For each of the two first-level threads the \kcode{OMP_PLACES} variable specifies
|
|
a place with all the core-ids of the socket (\{0,2,4,6\} for one thread and \{1,3,5,7\} for the other).
|
|
(As is sometimes the case in 2-socket systems, one socket may consist
|
|
of the even id numbers, while the other may have the odd id numbers.) The affinities
|
|
are printed according to the \kcode{OMP_AFFINITY_FORMAT} format: providing
|
|
the parallel nesting level (\ucode{\%L}), the ancestor thread number (\ucode{\%a}), the thread number (\ucode{\%n})
|
|
and the thread affinity (\ucode{\%A}). In the nested parallel region within the \ucode{socket_work} routine
|
|
the affinities for the threads on each socket are printed according to this format.
|
|
|
|
\cexample[5.0]{affinity_display}{2}[3]
|
|
|
|
\ffreeexample[5.0]{affinity_display}{2}[3]
|
|
|
|
%\newpage
|
|
\index{affinity display!omp_get_affinity_format routine@\kcode{omp_get_affinity_format} routine}
|
|
\index{routines!omp_get_affinity_format@\kcode{omp_get_affinity_format}}
|
|
\index{omp_get_affinity_format routine@\kcode{omp_get_affinity_format} routine}
|
|
\index{affinity display!omp_set_affinity_format routine@\kcode{omp_set_affinity_format} routine}
|
|
\index{routines!omp_set_affinity_format@\kcode{omp_set_affinity_format}}
|
|
\index{omp_set_affinity_format routine@\kcode{omp_set_affinity_format} routine}
|
|
The next example illustrates more details about affinity formatting.
|
|
First, the \kcode{omp_get_affinity_format()} API routine is used to
|
|
obtain the default format. The code checks to make sure the storage
|
|
provides enough space to hold the format.
|
|
Next, the \kcode{omp_set_affinity_format()} API routine sets a user-defined
|
|
format: \ucode{host=\%20H~thrd_num=\%0.4n~binds_to=\%A}.
|
|
|
|
The host, thread number and affinity fields are specified by \ucode{\%20H},
|
|
\ucode{\%0.4n} and \ucode{\%A}: \ucode{H}, \ucode{n} and \ucode{A} are single character ``short names''
|
|
for the host, thread\_num and thread\_affinity data to be printed,
|
|
with format sizes of \ucode{20}, \ucode{4}, and ``size as needed''.
|
|
The period (.) indicates that the field is displayed right-justified (default is left-justified)
|
|
and the ``0'' indicates that any unused space is to be prefixed with zeros
|
|
(e.g. instead of ``1'', ``0001'' is displayed for the field size of 4).
|
|
|
|
%The period (.) indicates that the field is displayed left-justified and the ``0'' indicates
|
|
%that leading zeros are to be added so that the total length for the display of this “n” (thread_num) field is 4.
|
|
|
|
%The period (\plc{.}) indicates right justified and \plc{0} leading zeros.
|
|
%All other text in the format is just user narrative.
|
|
|
|
\index{affinity display!omp_capture_affinity routine@\kcode{omp_capture_affinity} routine}
|
|
\index{routines!omp_capture_affinity@\kcode{omp_capture_affinity}}
|
|
\index{omp_capture_affinity routine@\kcode{omp_capture_affinity} routine}
|
|
Within the parallel region the affinity for each thread is captured by
|
|
\kcode{omp_capture_affinity()} into a buffer array with elements indexed
|
|
by the thread number (\ucode{thrd_num}).
|
|
After the parallel region, the thread affinities are printed in thread-number order.
|
|
|
|
If the storage area in buffer is inadequate for holding the affinity
|
|
data, the stored affinity data is truncated.
|
|
%The \plc{max} reduction on the required storage, returned by
|
|
%\code{omp\_capture\_affinity} in \plc{nchars}, is used to report
|
|
%possible truncation (if \plc{max\_req\_store} > \plc{buffer\_store}).
|
|
The maximum value for the number of characters (\ucode{nchars}) returned by
|
|
\kcode{omp_capture_affinity} is captured by the \kcode{reduction(max: \ucode{max_req_store})}
|
|
clause and the \ucode{if(nchars >= max_req_store) max_req_store=nchars} statement.
|
|
It is used to report possible truncation (if \ucode{max_req_store} > \ucode{buffer_store}).
|
|
|
|
\cexample[5.0]{affinity_display}{3}
|
|
|
|
\ffreeexample[5.0]{affinity_display}{3}
|
|
|