mirror of
https://github.com/OpenMP/Examples.git
synced 2025-04-04 05:41:33 +01:00
53 lines
2.7 KiB
TeX
53 lines
2.7 KiB
TeX
\cchapter{SIMD}{SIMD}
|
|
\label{chap:simd}
|
|
|
|
Single instruction, multiple data (SIMD) is a form of parallel execution
|
|
in which the same operation is performed on multiple data elements
|
|
independently in hardware vector processing units (VPU), also called SIMD units.
|
|
The addition of two vectors to form a third vector is a SIMD operation.
|
|
Many processors have SIMD (vector) units that can perform simultaneously
|
|
2, 4, 8 or more executions of the same operation (by a single SIMD unit).
|
|
|
|
Loops without loop-carried backward dependency (or with dependency preserved using
|
|
ordered simd) are candidates for vectorization by the compiler for
|
|
execution with SIMD units. In addition, with state-of-the-art vectorization
|
|
technology and \code{declare simd} directive extensions for function vectorization
|
|
in the OpenMP 4.5 specification, loops with function calls can be vectorized as well.
|
|
The basic idea is that a scalar function call in a loop can be replaced by a vector version
|
|
of the function, and the loop can be vectorized simultaneously by combining a loop
|
|
vectorization (\code{simd} directive on the loop) and a function
|
|
vectorization (\code{declare simd} directive on the function).
|
|
|
|
A \code{simd} construct states that SIMD operations be performed on the
|
|
data within the loop. A number of clauses are available to provide
|
|
data-sharing attributes (\code{private}, \code{linear}, \code{reduction} and
|
|
\code{lastprivate}). Other clauses provide vector length preference/restrictions
|
|
(\code{simdlen} / \code{safelen}), loop fusion (\code{collapse}), and data
|
|
alignment (\code{aligned}).
|
|
|
|
The \code{declare simd} directive designates
|
|
that a vector version of the function should also be constructed for
|
|
execution within loops that contain the function and have a \code{simd}
|
|
directive. Clauses provide argument specifications (\code{linear},
|
|
\code{uniform}, and \code{aligned}), a requested vector length
|
|
(\code{simdlen}), and designate whether the function is always/never
|
|
called conditionally in a loop (\code{branch}/\code{inbranch}).
|
|
The latter is for optimizing performance.
|
|
|
|
Also, the \code{simd} construct has been combined with the worksharing loop
|
|
constructs (\code{for simd} and \code{do simd}) to enable simultaneous thread
|
|
execution in different SIMD units.
|
|
%Hence, the \code{simd} construct can be
|
|
%used alone on a loop to direct vectorization (SIMD execution), or in
|
|
%combination with a parallel loop construct to include thread parallelism
|
|
%(a parallel loop sequentially followed by a \code{simd} construct,
|
|
%or a combined construct such as \code{parallel do simd} or
|
|
%\code{parallel for simd}).
|
|
|
|
|
|
%===== Examples Sections =====
|
|
\input{SIMD/SIMD}
|
|
\input{SIMD/linear_modifier}
|
|
|
|
|