\pagebreak \section{\code{declare}~\code{variant} Directive} \label{sec:declare_variant} %A \code{declare variant} directive specifies that the following function is an alternate function, %a \plc{function variant}, to be used in place of the specified \plc{base function} %when the trait within the \code{match} clause has a valid context. A \code{declare}~\code{variant} directive specifies an alternate function, \plc{function variant}, to be used in place of the \plc{base function} %when the trait within the \code{match} clause has a valid context. when the trait within the \code{match} clause matches the OpenMP context at a given call site. The base function follows the directive in the C and C++ languages. In Fortran, either a subroutine or function may be used as the \plc{base function}, and the \code{declare}~\code{variant} directive must be in the specification part of a subroutine or function (unless a \plc{base-proc-name} modifier is used, as in the case of a procedure declaration statement). See the OpenMP 5.0 Specification for details on the modifier. When multiple \code{declare}~\code{variant} directives are used a function variant becomes a candidate for replacing the base function if the %base function call context matches the traits of all selectors in the \code{match} clause. context at the base function call matches the traits of all selectors in the \code{match} clause. If there are multiple candidates, a score is assigned with rules for each of the selector traits. The scoring algorithm can be found in the OpenMP 5.0 Specification. In the first example the \plc{vxv()} function is called within a \code{parallel} region, a \code{target} region, and in a sequential part of the program. Two function variants, \plc{p\_vxv()} and \plc{t\_vxv()}, are defined for the first two regions by using \plc{parallel} and \plc{target} selectors (within the \plc{construct} trait set) in a \code{match} clause. The \plc{p\_vxv()} function variant includes a \code{for} construct (\code{do} construct for Fortran) for the \code{parallel} region, while \plc{t\_vxv()} includes a \code{distribute}~\code{simd} construct for the \code{target} region. The \plc{t\_vxv()} function is explicitly compiled for the device using a \code{declare}~\code{target} directive. Since the two \code{declare}~\code{variant} directives have no selectors that match traits for the context of the base function call in the sequential part of the program, the base \plc{vxv()} function is used there, as expected. (The vectors in the \plc{p\_vxv} and \plc{t\_vxv} functions have been multiplied by 3 and 2, respectively, for checking the validity of the replacement. Normally the purpose of a function variant is to produce the same results by a different method.) %Note: a \code{target teams} construct is used to direct execution onto a device, with a %\code{distribute simd} construct in the function variant. As of the OpenMP 5.0 implementation %no intervening code is allowed between a \code{target} and \code{teams} construct. So %using a \code{target} construct to direct execution onto a device, and including %\code{teams distribute simd} in the variant function would produce non conforming code. %\pagebreak \cexample{declare_variant}{1} \ffreeexample{declare_variant}{1} %\pagebreak In this example, traits from the \plc{device} set are used to select a function variant. In the \code{declare}~\code{variant} directive, an \plc{isa} selector specifies that if the implementation of the ``\plc{core-avx512}'' instruction set is detected at compile time the \plc{avx512\_saxpy()} variant function is used for the call to \plc{base\_saxpy()}. A compilation of \plc{avx512\_saxpy()} is aware of the AVX-512 instruction set that supports 512-bit vector extensions (for Xeon or Xeon Phi architectures). Within \plc{avx512\_saxpy()}, the \code{parallel}~\code{for}~\code{simd} construct performs parallel execution, and takes advantage of 64-byte data alignment. When the \plc{avx512\_saxpy()} function variant is not selected, the base \plc{base\_saxpy()} function variant containing only a basic \code{parallel}~\code{for} construct is used for the call to \plc{base\_saxpy()}. %Note: %An allocator is used to set the alignment to 64 bytes when an OpenMP compilation is performed. %Details about allocator variable declarations and functions %can be found in the allocator example of the Memory Management Chapter. %\pagebreak \cexample{declare_variant}{2} \ffreeexample{declare_variant}{2}