OpenMP-Examples/affinity/affinity_display.tex
2021-08-17 09:11:55 -07:00

105 lines
5.3 KiB
TeX

\section{Affinity Display}
\label{sec:affinity_display}
The following examples illustrate ways to display thread affinity.
Automatic display of affinity can be invoked by setting
the \code{OMP\_DISPLAY\_AFFINITY} environment variable to \code{TRUE}.
The format of the output can be customized by setting the
\code{OMP\_AFFINITY\_FORMAT} environment variable to an appropriate string.
Also, there are API calls for the user to display thread affinity
at selected locations within code.
For the first example the environment variable \code{OMP\_DISPLAY\_AFFINITY} has been
set to \code{TRUE}, and execution occurs on an 8-core system with \code{OMP\_NUM\_THREADS} set to 8.
The affinity for the primary thread is reported through a call to the API
\code{omp\_display\_affinity()} routine. For default affinity settings
the report shows that the primary thread can execute on any of the cores.
In the following parallel region the affinity for each of the team threads is reported
automatically since the \code{OMP\_DISPLAY\_AFFINITY} environment variable has been set
to \code{TRUE}.
These two reports are often useful (as in hybrid codes using both MPI and OpenMP)
to observe the affinity (for an MPI task) before the parallel region,
and during an OpenMP parallel region. Note: the next parallel region uses the
same number of threads as in the previous parallel region and affinities are
not changed, so affinity is NOT reported.
In the last parallel region, the thread affinities are reported
because the thread affinity has changed.
\cexample[5.0]{affinity_display}{1}
\ffreeexample[5.0]{affinity_display}{1}
In the following example 2 threads are forked, and each executes on a socket. Next,
a nested parallel region runs half of the available threads on each socket.
These OpenMP environment variables have been set:
\begin{compactitem}
\item \code{OMP\_PROC\_BIND}="TRUE"
\item \code{OMP\_NUM\_THREADS}="2,4"
\item \code{OMP\_PLACES}="\{0,2,4,6\},\{1,3,5,7\}"
\item \code{OMP\_AFFINITY\_FORMAT}="nest\_level= \%L, parent\_thrd\_num= \%a, thrd\_num= \%n, thrd\_affinity= \%A"
\end{compactitem}
where the numbers correspond to core ids for the system. Note, \code{OMP\_DISPLAY\_AFFINITY} is not
set and is \code{FALSE} by default. This example shows how to use API routines to
perform affinity display operations.
For each of the two first-level threads the \code{OMP\_PLACES} variable specifies
a place with all the core-ids of the socket (\{0,2,4,6\} for one thread and \{1,3,5,7\} for the other).
(As is sometimes the case in 2-socket systems, one socket may consist
of the even id numbers, while the other may have the odd id numbers.) The affinities
are printed according to the \code{OMP\_AFFINITY\_FORMAT} format: providing
the parallel nesting level (\%L), the ancestor thread number (\%a), the thread number (\%n)
and the thread affinity (\%A). In the nested parallel region within the \plc{socket\_work} routine
the affinities for the threads on each socket are printed according to this format.
\cexample[5.0]{affinity_display}{2}
\ffreeexample[5.0]{affinity_display}{2}
The next example illustrates more details about affinity formatting.
First, the \code{omp\_get\_affininity\_format()} API routine is used to
obtain the default format. The code checks to make sure the storage
provides enough space to hold the format.
Next, the \code{omp\_set\_affinity\_format()} API routine sets a user-defined
format: \plc{host=\%20H thrd\_num=\%0.4n binds\_to=\%A}.
The host, thread number and affinity fields are specified by \plc{\%20H},
\plc{\%0.4n} and \plc{\%A}: \plc{H}, \plc{n} and \plc{A} are single character "short names"
for the host, thread\_num and thread\_affinity data to be printed,
with format sizes of \plc{20}, \plc{4}, and "size as needed".
The period (.) indicates that the field is displayed right-justified (default is left-justified)
and the "0" indicates that any unused space is to be prefixed with zeros
(e.g. instead of "1", "0001" is displayed for the field size of 4).
%The period (.) indicates that the field is displayed left-justified and the "0" indicates
%that leading zeros are to be added so that the total length for the display of this “n” (thread_num) field is 4.
%The period (\plc{.}) indicates right justified and \plc{0} leading zeros.
%All other text in the format is just user narrative.
Within the parallel region the affinity for each thread is captured by
\code{omp\_capture\_affinity()} into a buffer array with elements indexed
by the thread number (\plc{thrd\_num}).
After the parallel region, the thread affinities are printed in thread-number order.
If the storage area in buffer is inadequate for holding the affinity
data, the stored affinity data is truncated.
%The \plc{max} reduction on the required storage, returned by
%\code{omp\_capture\_affinity} in \plc{nchars}, is used to report
%possible truncation (if \plc{max\_req\_store} > \plc{buffer\_store}).
The maximum value for the number of characters (\plc{nchars}) returned by
\code{omp\_capture\_affinity} is captured by the \code{reduction(max:max\_req\_store)}
clause and the \plc{if(nchars >= max\_req\_store) max\_req\_store=nchars} statement.
It is used to report possible truncation (if \plc{max\_req\_store} > \plc{buffer\_store}).
\cexample[5.0]{affinity_display}{3}
\ffreeexample[5.0]{affinity_display}{3}