2024-11-13 11:07:08 -08:00

\pagebreak
\section{\kcode{apply} Clause}
\label{sec:apply_clause}
\index{unroll construct@\kcode{unroll} construct!apply clause@\kcode{apply} clause}
\index{tile construct@\kcode{tile} construct!apply clause@\kcode{apply} clause}
\index{apply clause@\kcode{apply} clause}
\index{clauses!apply@\kcode{apply}}
A loop transformation construct can be applied to another nested
loop transformation construct, but the application of the ``outer'' transformation
is limited to the outermost generated loop of the ``inner'' transformation.
The \kcode{apply} clause on a loop transformation construct can specify additional
loop transformation directives that apply to generated loops other than the outermost one.
Clause modifiers are used to specify which generated loop to target.
Also, an applied directive within a clause may specify another \kcode{apply} clause.
%The \code{apply} clause on a loop transformation construct can specify (other)
%loop transformation directives to be applied to its transformation.
%Clause modifiers can be used to target specific generated loops, providing a mechanism
%to overcome the restriction of applying a transformation immediately to the next loop
%transformation construct. Also, an applied directive within a clause may be another
%\code{apply} clause.
Any nested loop transformation constructs, including any constructs that
result from \kcode{apply} clauses of nested constructs, are replaced before any enclosing
loop transformation construct. This ordering is referred to here as the
\plc{innermost-first order}.
\subsection{Syntax and Effect}
In the example below, the \ucode{construct_unroll} and \ucode{apply_unroll} functions
illustrate the syntax for two equivalent means of applying the \kcode{unroll} loop transformation
directive to the outermost generated (grid) loop of the \kcode{tile} construct transformation.
In function \ucode{construct_unroll}, the tile transformation creates the generated (tiled) loops
and then the \kcode{unroll} construct is applied to the outermost loop of the replacement.
In the \ucode{apply_unroll} function, the \kcode{apply} clause on the \kcode{tile} construct
is used to apply an \kcode{unroll} transformation on the \plc{grid} loop (the outermost loop
of the tile transformation) as specified by the \kcode{grid} modifier.
\cexample[6.0]{apply_syntax}{1}
\ffreeexample[6.0]{apply_syntax}{1}
For the two functions in the previous example,
the \ucode{equivalent} function in the next example shows equivalent
code that a user could have written without using the \kcode{tile} construct
or \kcode{apply} clause.
\cexample[5.1]{apply_syntax_equivalent}{1}
\ffreeexample[5.1]{apply_syntax_equivalent}{1}
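As a further illustration of the terms \plc{grid} loop and \plc{intra-tile} loop, the following
minimal sketch writes out by hand a one-dimensional tiling whose grid loop is partially unrolled.
It is not one of the document's example files: the tile size of 4, the unroll factor of 2, and the names
\ucode{scale_reference}, \ucode{scale_tiled_unrolled} and \ucode{tile_unroll_matches} are assumptions
chosen for illustration, and a compiler is free to generate different code.

```c
#include <stddef.h>

enum { TILE = 4, MAX_N = 64 };  /* hypothetical sizes, for illustration only */

/* Reference: the untransformed loop. */
static void scale_reference(double *a, size_t n) {
    for (size_t i = 0; i < n; ++i)
        a[i] *= 2.0;
}

/* Hand-written stand-in for a tiling with tile size TILE whose
 * grid loop is partially unrolled by a factor of 2 (assumed values). */
static void scale_tiled_unrolled(double *a, size_t n) {
    size_t i1 = 0;

    /* Unrolled grid loop: each trip executes two complete tiles. */
    for (; i1 + 2 * TILE <= n; i1 += 2 * TILE) {
        for (size_t i2 = i1; i2 < i1 + TILE; ++i2)             /* intra-tile loop, tile 1 */
            a[i2] *= 2.0;
        for (size_t i2 = i1 + TILE; i2 < i1 + 2 * TILE; ++i2)  /* intra-tile loop, tile 2 */
            a[i2] *= 2.0;
    }

    /* Remainder of the grid loop: leftover tiles, the last possibly partial. */
    for (; i1 < n; i1 += TILE) {
        size_t end = (i1 + TILE < n) ? i1 + TILE : n;
        for (size_t i2 = i1; i2 < end; ++i2)
            a[i2] *= 2.0;
    }
}

/* Returns 1 if both versions produce the same array for n <= MAX_N. */
int tile_unroll_matches(size_t n) {
    double ref[MAX_N], out[MAX_N];
    if (n > MAX_N)
        return 0;
    for (size_t i = 0; i < n; ++i)
        ref[i] = out[i] = (double)i;
    scale_reference(ref, n);
    scale_tiled_unrolled(out, n);
    for (size_t i = 0; i < n; ++i)
        if (ref[i] != out[i])
            return 0;
    return 1;
}
```

Calling \ucode{tile_unroll_matches} with a trip count that is not a multiple of the tile size, such as 10,
exercises the remainder loop and the partial last tile.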
The following example shows how multiple loop transformation directives
can be applied to different generated loops resulting from a loop transformation.
For the 4x4 \kcode{tile} construct there will be two (outer) \plc{grid} loops and two (inner) \plc{intra-tile} loops.
The first \kcode{apply} clause specifies that an \kcode{interchange} directive and a \kcode{nothing} directive
(just a placeholder to indicate no directive application) are applied to the two \plc{grid} (outermost) loops.
Directives, read from left to right, are applied to the \plc{grid} loops from outermost to innermost, respectively.
The second \kcode{apply} clause specifies that \kcode{nothing} and \kcode{interchange} directives are applied to the
two \plc{intra-tile} (innermost) loops, respectively.
Note that the \ucode{A} array dimensions are \ucode{A[100][100][3]} and \ucode{A(0:2,0:99,0:99)}
in the C/C++ and Fortran codes, respectively, to illustrate equivalent sequential memory access for the
\ucode{i}, \ucode{j} and \ucode{k} loops.
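The equivalence of the two array declarations can be checked with a small sketch of the linear
element offsets. It assumes that the C/C++ code indexes the array as \ucode{A[i][j][k]} and the
Fortran code as \ucode{A(k,j,i)}; the function names \ucode{c_offset}, \ucode{fortran_offset} and
\ucode{c_array_offset} are hypothetical helpers and not part of the example files.

```c
#include <stddef.h>

enum { NI = 100, NJ = 100, NK = 3 };  /* extents of the i, j and k dimensions */

/* C/C++ is row-major: in A[i][j][k] the last subscript varies fastest. */
size_t c_offset(size_t i, size_t j, size_t k) {
    return (i * NJ + j) * NK + k;
}

/* Fortran is column-major: in A(k,j,i) the first subscript varies fastest.
 * The lower bounds are 0 (A(0:2,0:99,0:99)), so no bound adjustment is needed. */
size_t fortran_offset(size_t k, size_t j, size_t i) {
    return k + NK * (j + NJ * i);
}

/* Cross-check of c_offset against an actual C array. */
static double A[NI][NJ][NK];

size_t c_array_offset(size_t i, size_t j, size_t k) {
    return (size_t)((const char *)&A[i][j][k] - (const char *)A) / sizeof(double);
}
```

Under these assumptions, both formulas give the same offset for the same \ucode{i}, \ucode{j} and \ucode{k},
so a loop nest with \ucode{k} innermost walks memory contiguously in both languages.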
\index{interchange directive@\kcode{interchange} directive}
\index{directives!interchange@\kcode{interchange}}
\index{nothing directive@\kcode{nothing} directive}
\index{directives!nothing@\kcode{nothing}}
\cexample[6.0]{apply_syntax}{2}
\pagebreak
\ffreeexample[6.0]{apply_syntax}{2}
For the function in the previous example,
the \ucode{equivalent} function in the next example shows a possible
equivalent tile replacement code (\kcode{tile} generated loops) and the
appropriately positioned \kcode{interchange} and \kcode{nothing} directives.
\cexample[6.0]{apply_syntax_equivalent}{2}
\pagebreak
\ffreeexample[6.0]{apply_syntax_equivalent}{2}
\index{tile construct@\kcode{tile} construct!apply clause@\kcode{apply} clause}
\index{grid modifier@\kcode{grid} modifier}
\index{intratile modifier@\kcode{intratile} modifier}
The following example illustrates the use of \kcode{apply} clause
modifiers with an argument. The argument is the index of the targeted generated loop,
so an applied directive can be directed at a specific loop without relying on its
position in the directive list.
The \kcode{grid(1)} modifier indicates the first grid loop
generated by the \kcode{tile} directive
and the \kcode{intratile(2)} modifier indicates the second intra-tile loop
generated by the \kcode{tile} directive.
\cexample[6.0]{apply_syntax}{3}
\pagebreak
\ffreeexample[6.0]{apply_syntax}{3}
Without the index arguments, the \kcode{nothing} directive would
be needed as a placeholder, as illustrated by the following equivalent code
for the above example.
\cexample[6.0]{apply_syntax_equivalent}{3}
\pagebreak
\ffreeexample[6.0]{apply_syntax_equivalent}{3}
\subsection{Spanning Loop Associations}
It is possible for a loop transformation directive to be applied to multiple generated loops,
and for multiple directives to be applied to the same generated loop.
The latter is illustrated in this example.
\cexample[6.0]{apply_span}{1}
\ffreeexample[6.0]{apply_span}{1}
In the following example, the functions show, as equivalent user-written code,
the successive steps in applying the loop transformations of the previous example.
First, the tiling is applied in the \ucode{step1} function.
Next, loop transformations in the generated loop nest are replaced according to the innermost-first order rule.
Applying the innermost transformation, loop reversal, results in the loop nest in \ucode{step2}.
After that, the inner tile directive is applied in the \ucode{step3} function.
\index{reverse directive@\kcode{reverse} directive}
\index{directives!reverse@\kcode{reverse}}
\cexample[6.0]{apply_span_equivalent}{1}
\ffreeexample[6.0]{apply_span_equivalent}{1}
\subsection{Nested apply}
The following example illustrates how multiple loop transformations can be chained by nesting \kcode{apply} clauses.
In the \ucode{nested_apply} function, a loop is first tiled, then the intra-tile
loop is unrolled, and finally the iteration order of the unrolled loop is reversed.
For C/C++ codes, reversing a loop whose index has an unsigned type may require the compiler
to ensure that underflow (wraparound) of the index is handled correctly.
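The underflow concern can be seen when a loop with an unsigned index is reversed by hand.
The following is a minimal sketch and not one of the document's example files; the function names
\ucode{visit_forward}, \ucode{visit_reversed} and \ucode{reversal_mirrors_forward} are hypothetical.

```c
#include <stddef.h>

/* Forward loop over an unsigned index; records the visit order. */
size_t visit_forward(unsigned n, unsigned *order) {
    size_t count = 0;
    for (unsigned i = 0; i < n; ++i)
        order[count++] = i;
    return count;
}

/* Reversed loop over the same iterations.
 * A naive reversal, for (unsigned i = n - 1; i >= 0; --i), never terminates:
 * i >= 0 is always true for an unsigned type, and decrementing 0 wraps
 * around to UINT_MAX. Testing before the decrement avoids the problem
 * and also handles n == 0. */
size_t visit_reversed(unsigned n, unsigned *order) {
    size_t count = 0;
    for (unsigned i = n; i-- > 0; )
        order[count++] = i;
    return count;
}

/* Returns 1 if the reversed order is the mirror image of the forward order
 * (for n <= 16, the size of the local buffers). */
int reversal_mirrors_forward(unsigned n) {
    unsigned fwd[16], rev[16];
    if (n > 16)
        return 0;
    if (visit_forward(n, fwd) != n || visit_reversed(n, rev) != n)
        return 0;
    for (unsigned i = 0; i < n; ++i)
        if (rev[i] != fwd[n - 1 - i])
            return 0;
    return 1;
}
```

The form \ucode{i-- > 0} is only one way to write a safe reversed loop; an implementation may instead
compute the reversed index from a forward-running logical iteration number.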
\cexample[6.0]{apply_nested}{1}
\ffreeexample[6.0]{apply_nested}{1}
In the following example, the \ucode{step1}, \ucode{step2} and \ucode{step3}
functions are all equivalent to the \ucode{nested_apply} function, and illustrate
a possible chain of transformations as a user might perform them manually.
\cexample[6.0]{apply_nested_equivalent}{1}
\ffreeexample[6.0]{apply_nested_equivalent}{1}