\section{Affinity Display} \label{sec:affinity_display} The following examples illustrate ways to display thread affinity. Automatic display of affinity can be invoked by setting the \code{OMP\_DISPLAY\_AFFINITY} environment variable to \code{TRUE}. The format of the output can be customized by setting the \code{OMP\_AFFINITY\_FORMAT} environment variable to an appropriate string. Also, there are API calls for the user to display thread affinity at selected locations within code. For the first example the environment variable \code{OMP\_DISPLAY\_AFFINITY} has been set to \code{TRUE}, and execution occurs on an 8-core system with \code{OMP\_NUM\_THREADS} set to 8. The affinity for the master thread is reported through a call to the API \code{omp\_display\_affinity()} routine. For default affinity settings the report shows that the master thread can execute on any of the cores. In the following parallel region the affinity for each of the team threads is reported automatically since the \code{OMP\_DISPLAY\_AFFINITY} environment variable has been set to \code{TRUE}. These two reports are often useful (as in hybrid codes using both MPI and OpenMP) to observe the affinity (for an MPI task) before the parallel region, and during an OpenMP parallel region. Note: the next parallel region uses the same number of threads as in the previous parallel region and affinities are not changed, so affinity is NOT reported. In the last parallel region, the thread affinities are reported because the thread affinity has changed. \cexample[5.0]{affinity_display}{1} \ffreeexample[5.0]{affinity_display}{1} In the following example 2 threads are forked, and each executes on a socket. Next, a nested parallel region runs half of the available threads on each socket. These OpenMP environment variables have been set: \begin{compactitem} \item \code{OMP\_PROC\_BIND}="TRUE" \item \code{OMP\_NUM\_THREADS}="2,4" \item \code{OMP\_PLACES}="\{0,2,4,6\},\{1,3,5,7\}" \item \code{OMP\_AFFINITY\_FORMAT}="nest\_level= \%L, parent\_thrd\_num= \%a, thrd\_num= \%n, thrd\_affinity= \%A" \end{compactitem} where the numbers correspond to core ids for the system. Note, \code{OMP\_DISPLAY\_AFFINITY} is not set and is \code{FALSE} by default. This example shows how to use API routines to perform affinity display operations. For each of the two first-level threads the \code{OMP\_PLACES} variable specifies a place with all the core-ids of the socket (\{0,2,4,6\} for one thread and \{1,3,5,7\} for the other). (As is sometimes the case in 2-socket systems, one socket may consist of the even id numbers, while the other may have the odd id numbers.) The affinities are printed according to the \code{OMP\_AFFINITY\_FORMAT} format: providing the parallel nesting level (\%L), the ancestor thread number (\%a), the thread number (\%n) and the thread affinity (\%A). In the nested parallel region within the \plc{socket\_work} routine the affinities for the threads on each socket are printed according to this format. \cexample[5.0]{affinity_display}{2} \ffreeexample[5.0]{affinity_display}{2} The next example illustrates more details about affinity formatting. First, the \code{omp\_get\_affininity\_format()} API routine is used to obtain the default format. The code checks to make sure the storage provides enough space to hold the format. Next, the \code{omp\_set\_affinity\_format()} API routine sets a user-defined format: \plc{host=\%20H thrd\_num=\%0.4n binds\_to=\%A}. The host, thread number and affinity fields are specified by \plc{\%20H}, \plc{\%0.4n} and \plc{\%A}: \plc{H}, \plc{n} and \plc{A} are single character "short names" for the host, thread\_num and thread\_affinity data to be printed, with format sizes of \plc{20}, \plc{4}, and "size as needed". The period (.) indicates that the field is displayed right-justified (default is left-justified) and the "0" indicates that any unused space is to be prefixed with zeros (e.g. instead of "1", "0001" is displayed for the field size of 4). %The period (.) indicates that the field is displayed left-justified and the "0" indicates %that leading zeros are to be added so that the total length for the display of this ā€œnā€ (thread_num) field is 4. %The period (\plc{.}) indicates right justified and \plc{0} leading zeros. %All other text in the format is just user narrative. Within the parallel region the affinity for each thread is captured by \code{omp\_capture\_affinity()} into a buffer array with elements indexed by the thread number (\plc{thrd\_num}). After the parallel region, the thread affinities are printed in thread-number order. If the storage area in buffer is inadequate for holding the affinity data, the stored affinity data is truncated. %The \plc{max} reduction on the required storage, returned by %\code{omp\_capture\_affinity} in \plc{nchars}, is used to report %possible truncation (if \plc{max\_req\_store} > \plc{buffer\_store}). The maximum value for the number of characters (\plc{nchars}) returned by \code{omp\_capture\_affinity} is captured by the \code{reduction(max:max\_req\_store)} clause and the \plc{if(nchars >= max\_req\_store) max\_req\_store=nchars} statement. It is used to report possible truncation (if \plc{max\_req\_store} > \plc{buffer\_store}). \cexample[5.0]{affinity_display}{3} \ffreeexample[5.0]{affinity_display}{3}