mirror of
https://github.com/OpenMP/Examples.git
synced 2025-04-08 07:32:12 +01:00
151 lines
6.6 KiB
TeX
151 lines
6.6 KiB
TeX
%\pagebreak
|
|
\section{\kcode{simd} and \kcode{declare simd} Directives}
|
|
\label{sec:SIMD}
|
|
|
|
\index{constructs!simd@\kcode{simd}}
|
|
\index{simd construct@\kcode{simd} construct}
|
|
The following example illustrates the basic use of the \kcode{simd} construct
|
|
to assure the compiler that the loop can be vectorized.
|
|
|
|
\cexample[4.0]{SIMD}{1}
|
|
|
|
\ffreeexample[4.0]{SIMD}{1}
|
|
|
|
|
|
\index{directives!declare simd@\kcode{declare simd}}
|
|
\index{declare simd directive@\kcode{declare simd} directive}
|
|
\index{clauses!uniform@\kcode{uniform}}
|
|
\index{uniform clause@\kcode{uniform} clause}
|
|
\index{clauses!linear@\kcode{linear}}
|
|
\index{linear clause@\kcode{linear} clause}
|
|
When a function can be inlined within a loop the compiler has an opportunity to
|
|
vectorize the loop. By guaranteeing SIMD behavior of a function's operations,
|
|
characterizing the arguments of the function and privatizing temporary
|
|
variables of the loop, the compiler can often create faster, vector code for
|
|
the loop. In the examples below the \kcode{declare simd} directive is
|
|
used on the \ucode{add1} and \ucode{add2} functions to enable creation of their
|
|
corresponding SIMD function versions for execution within the associated SIMD
|
|
loop. The functions characterize two different approaches of accessing data
|
|
within the function: by a single variable and as an element in a data array,
|
|
respectively. The \ucode{add3} C function uses dereferencing.
|
|
|
|
The \kcode{declare simd} directives also illustrate the use of
|
|
\kcode{uniform} and \kcode{linear} clauses. The \kcode{uniform(\ucode{fact})} clause
|
|
indicates that the variable \ucode{fact} is invariant across the SIMD lanes. In
|
|
the \ucode{add2} function \ucode{a} and \ucode{b} are included in the \kcode{uniform}
|
|
list because the C pointer and the Fortran array references are constant. The
|
|
\ucode{i} index used in the \ucode{add2} function is included in a \kcode{linear}
|
|
clause with a constant-linear-step of 1, to guarantee a unity increment of the
|
|
associated loop. In the \kcode{declare simd} directive for the \ucode{add3}
|
|
C function the \kcode{linear(\ucode{a,b:1})} clause instructs the compiler to generate
|
|
unit-stride loads across the SIMD lanes; otherwise, costly \emph{gather}
|
|
instructions would be generated for the unknown sequence of access of the
|
|
pointer dereferences.
|
|
|
|
In the \kcode{simd} constructs for the loops the \kcode{private(\ucode{tmp})} clause is
|
|
necessary to assure that each vector operation has its own \ucode{tmp}
|
|
variable.
|
|
|
|
\cexample[4.0]{SIMD}{2}
|
|
|
|
\ffreeexample[4.0]{SIMD}{2}
|
|
|
|
%\pagebreak
|
|
\index{clauses!private@\kcode{private}}
|
|
\index{private clause@\kcode{private} clause}
|
|
\index{clauses!reduction@\kcode{reduction}}
|
|
\index{reduction clause@\kcode{reduction} clause}
|
|
\index{reductions!reduction clause@\kcode{reduction} clause}
|
|
A thread that encounters a SIMD construct executes a vectorized code of the
|
|
iterations. Similar to the concerns of a worksharing loop a loop vectorized
|
|
with a SIMD construct must assure that temporary and reduction variables are
|
|
privatized and declared as reductions with clauses. The example below
|
|
illustrates the use of \kcode{private} and \kcode{reduction} clauses in a SIMD
|
|
construct.
|
|
|
|
\cexample[4.0]{SIMD}{3}
|
|
|
|
\ffreeexample[4.0]{SIMD}{3}
|
|
|
|
|
|
%\pagebreak
|
|
\index{clauses!safelen@\kcode{safelen}}
|
|
\index{safelen clause@\kcode{safelen} clause}
|
|
A \kcode{safelen(\ucode{N})} clause in a \kcode{simd} construct assures the compiler that
|
|
there are no loop-carried dependences for vectors of size \ucode{N} or below. If
|
|
the \kcode{safelen} clause is not specified, then the default safelen value is
|
|
the number of loop iterations.
|
|
|
|
The \kcode{safelen(\ucode{16})} clause in the example below guarantees that the vector
|
|
code is safe for vectors up to and including size 16. In the loop, \ucode{m} can
|
|
be 16 or greater, for correct code execution. If the value of \ucode{m} is less
|
|
than 16, the behavior is undefined.
|
|
|
|
\cexample[4.0]{SIMD}{4}
|
|
|
|
\ffreeexample[4.0]{SIMD}{4}
|
|
|
|
%\pagebreak
|
|
\index{clauses!collapse@\kcode{collapse}}
|
|
\index{collapse clause@\kcode{collapse} clause}
|
|
The following SIMD construct instructs the compiler to collapse the \ucode{i} and
|
|
\ucode{j} loops into a single SIMD loop in which SIMD chunks are executed by
|
|
threads of the team. Within the workshared loop chunks of a thread, the SIMD
|
|
chunks are executed in the lanes of the vector units.
|
|
|
|
\cexample[4.0]{SIMD}{5}
|
|
|
|
\ffreeexample[4.0]{SIMD}{5}
|
|
|
|
|
|
%%% section
|
|
\section{\kcode{inbranch} and \kcode{notinbranch} Clauses}
|
|
\label{sec:SIMD_branch}
|
|
\index{clauses!inbranch@\kcode{inbranch}}
|
|
\index{inbranch clause@\kcode{inbranch} clause}
|
|
\index{clauses!notinbranch@\kcode{notinbranch}}
|
|
\index{notinbranch clause@\kcode{notinbranch} clause}
|
|
|
|
The following examples illustrate the use of the \kcode{declare simd}
|
|
directive with the \kcode{inbranch} and \kcode{notinbranch} clauses. The
|
|
\kcode{notinbranch} clause informs the compiler that the function \ucode{foo} is
|
|
never called conditionally in the SIMD loop of the function \ucode{myaddint}. On
|
|
the other hand, the \kcode{inbranch} clause for the function goo indicates that
|
|
the function is always called conditionally in the SIMD loop inside
|
|
the function \ucode{myaddfloat}.
|
|
|
|
\cexample[4.0]{SIMD}{6}
|
|
|
|
\ffreeexample[4.0]{SIMD}{6}
|
|
|
|
|
|
In the code below, the function \ucode{fib()} is called in the main program and
|
|
also recursively called in the function \ucode{fib()} within an \bcode{if}
|
|
condition. The compiler creates a masked vector version and a non-masked vector
|
|
version for the function \ucode{fib()} while retaining the original scalar
|
|
version of the \ucode{fib()} function.
|
|
|
|
\cexample[4.0]{SIMD}{7}
|
|
|
|
\ffreeexample[4.0]{SIMD}{7}
|
|
|
|
|
|
|
|
%%% section
|
|
%\pagebreak
|
|
\section{Loop-Carried Lexical Forward Dependence}
|
|
\label{sec:SIMD_forward_dep}
|
|
\index{dependences!loop-carried lexical forward}
|
|
|
|
|
|
The following example tests the restriction on an SIMD loop with the loop-carried lexical forward-dependence. This dependence must be preserved for the correct execution of SIMD loops.
|
|
|
|
A loop can be vectorized even though the iterations are not completely independent when it has loop-carried dependences that are forward lexical dependences, indicated in the code below by the read of \ucode{A[j+1]} and the write to \ucode{A[j]} in C/C++ code (or \ucode{A(j+1)} and \ucode{A(j)} in Fortran). That is, the read of \ucode{A[j+1]} (or \ucode{A(j+1)} in Fortran) before the write to \ucode{A[j]} (or \ucode{A(j)} in Fortran) ordering must be preserved for each iteration in \ucode{j} for valid SIMD code generation.
|
|
|
|
This test assures that the compiler preserves the loop-carried lexical forward-dependence for generating a correct SIMD code.
|
|
|
|
\cexample[4.0]{SIMD}{8}
|
|
|
|
\ffreeexample[4.0]{SIMD}{8}
|
|
|