OpenMP-Examples/program_control/metadirective.tex
2022-11-04 09:35:42 -07:00

155 lines
8.1 KiB
TeX

\pagebreak
\section{Metadirectives}
\label{sec:metadirective}
\index{directives!metadirective@\code{metadirective}}
\index{metadirective directive@\code{metadirective} directive}
\index{metadirective directive@\code{metadirective} directive!when clause@\code{when} clause}
\index{metadirective directive@\code{metadirective} directive!otherwise clause@\code{otherwise} clause}
\index{clauses!when@\code{when}}
\index{when clause@\code{when} clause}
\index{clauses!otherwise@\code{otherwise}}
\index{otherwise clause@\code{otherwise} clause}
A \code{metadirective} directive provides a mechanism to select a directive in
a \code{when} clause to be used, depending upon one or more contexts:
implementation, available devices and the present enclosing construct.
The directive in an \code{otherwise} clause is used when a directive of the
\code{when} clause is not selected.
\index{context selector!construct@\plc{construct}}
In the \code{when} clause the \plc{context selector} (or just \plc{selector}) defines traits that are
evaluated for selection of the directive that follows the selector.
This ``selectables'' directive is called a \plc{directive variant}.
Traits are grouped by \plc{construct}, \plc{implementation} and
\plc{device} \plc{sets} to be used by a selector of the same name.
\index{context selector!device@\plc{device}}
In the first example the architecture trait \plc{arch} of the
\plc{device} selector set specifies that if an \plc{nvptx} architecture is
active in the OpenMP context, then the \code{teams}~\code{loop}
\plc{directive variant} is selected as the directive; otherwise, the \code{parallel}~\code{loop}
\plc{directive variant} of the \code{otherwise} clause is selected as the directive.
That is, if a \plc{device} of \plc{nvptx} architecture is supported by the implementation within
the enclosing \code{target} construct, its \plc{directive variant} is selected.
The architecture names, such as \plc{nvptx}, are implementation defined.
Also, note that \plc{device} as used in a \code{target} construct specifies
a device number, while \plc{device}, as used in the \code{metadirective}
directive as selector set, has traits of \plc{kind}, \plc{isa} and \plc{arch}.
\cexample[5.2]{metadirective}{1}
\ffreeexample[5.2]{metadirective}{1}
%\pagebreak
\index{context selector!implementation@\plc{implementation}}
In the second example, the \plc{implementation} selector set is specified
in the \code{when} clause to distinguish between platforms.
Additionally, specific architectures are specified with the \plc{device}
selector set.
In the code, different \code{teams} constructs are employed as determined
by the \code{metadirective} directive.
The number of teams is restricted by a \code{num\_teams} clause
and a thread limit is also set by a \code{thread\_limit} clause for
\plc{vendor} platforms and specific architecture
traits. Otherwise, just the \code{teams} construct is used without
any clauses, as prescribed by the \code{otherwise} clause.
\cexample[5.2]{metadirective}{2}
\ffreeexample[5.2]{metadirective}{2}
\clearpage
\index{context selector!construct@\plc{construct}}
\index{directives!declare target@\code{declare}~\code{target}}
\index{declare target directive@\code{declare}~\code{target} directive}
\index{directives!begin declare target@\code{begin}~\code{declare}~\code{target}}
\index{begin declare target directive@\code{begin}~\code{declare}~\code{target} directive}
In the third example, a \plc{construct} selector set is specified in the \code{when} clause.
Here, a \code{metadirective} directive is used within a function that is also
compiled as a function for a target device as directed by a declare target directive.
The \plc{target} directive name of the \code{construct} selector ensures that the
\code{distribute}~\code{parallel}~\code{for/do} construct is employed for the target compilation.
Otherwise, for the host-compiled version the \code{parallel}~\code{for/do}~\code{simd} construct is used.
In the first call to the \plc{exp\_pi\_diff()} routine the context is a
\code{target}~\code{teams} construct and the \code{distribute}~\code{parallel}~\code{for/do}
construct version of the function is invoked,
while in the second call the \code{parallel}~\code{for/do}~\code{simd} construct version is used.
%%%%%%%%
This case illustrates an important point for users that may want to hoist the
\code{target} directive out of a function that contains the usual
\code{target}~\code{teams}~\code{distribute}~\code{parallel}~\code{for/do} construct
(for providing alternate constructs through the \code{metadirective} directive as here).
While this combined construct can be decomposed into a \code{target} and
\code{teams distribute parallel for/do} constructs, the OpenMP 5.0 specification has the restriction:
``If a \code{teams} construct is nested within a \code{target} construct, that \code{target} construct must
contain no statements, declarations or directives outside of the \code{teams} construct''.
So, the \code{teams} construct must immediately follow the \code{target} construct without any intervening
code statements (which includes function calls).
Since the \code{target} construct alone cannot be hoisted out of a function,
the \code{target}~\code{teams} construct has been hoisted out of the function, and the
\code{distribute}~\code{parallel}~\code{for/do} construct is used
as the \plc{variant} directive of the \code{metadirective} directive within the function.
%%%%%%%%
\cexample[5.2]{metadirective}{3}
\ffreeexample[5.2]{metadirective}{3}
\index{context selector!user@\plc{user}}
\index{context selector!condition selector@\code{condition} selector}
The \code{user} selector set can be used in a metadirective
to select directives at execution time when the
\code{condition(}~\plc{boolean-expr}~\code{)} selector expression is not a constant expression.
In this case it is a \plc{dynamic} trait set, and the selection is made at run time, rather
than at compile time.
In the following example the \plc{foo} function employs the \code{condition}
selector to choose a device for execution at run time.
In the \plc{bar} routine metadirectives are nested.
At the outer level a selection between serial and parallel execution in performed
at run time, followed by another run time selection on the schedule kind in the inner
level when the active \plc{construct} trait is \code{parallel}.
(Note, the variable \plc{b} in two of the ``selected'' constructs is declared private for the sole purpose
of detecting and reporting that the construct is used. Since the variable is private, its value
is unchanged outside of the construct region, whereas it is changed if the ``unselected'' construct
is used.)
%(Note: The value of \plc{b} after the \code{parallel} region remains 0 for the
%\code{guided} scheduling case, because its \code{parallel} construct also contains
%the \code{private(}~\plc{b}~\code{)} clause.
%The variable \plc{b} is employed for the sole purpose of distinguishing which
%\code{parallel} construct is selected-- for testing.)
%While there might be other ways to make these decisions at run time, such as using
%an \code{if} clause on a \code{parallel} construct, this mechanism is much more general.
%For instance, an input ``gpu\_type'' string could be used and tested in boolean expressions
%to select from one of several possible \code{target} constructs.
%Also, setting the scheduling variable (\plc{unbalanced}) within the execution through a
%``work balance'' function might be a more practical approach for setting the schedule kind.
\cexample[5.2]{metadirective}{4}
\ffreeexample[5.2]{metadirective}{4}
Metadirectives can be used in conjunction with templates as shown in the C++ code below.
Here the template definition generates two versions of the Fibonacci function.
The \splc{tasking} boolean is used in the \scode{condition} selector to enable tasking.
The true form implements a parallel version with \scode{task} and \scode{taskwait}
constructs as in the \splc{tasking.4.c} code in Section~\ref{sec:task_taskwait}.
The false form implements a serial version without any tasking constructs.
Note that the serial version is used in the parallel function for optimally
processing numbers less than 8.
\cppexample[5.0]{metadirective}{5}