Mirror of https://github.com/OpenMP/Examples.git (synced 2025-04-03 13:21:33 +01:00)
v5.0.1 release
Commit 3052c10566 (parent eaec9ede64)
@ -48,15 +48,14 @@ document.
|
||||
In this chapter, examples illustrate how race conditions may arise for accesses
|
||||
to variables with a \plc{shared} data-sharing attribute when flush operations
|
||||
are not properly employed. A race condition can exist when two or more threads
|
||||
are involved in accessing a variable in which not all of the accesses are
|
||||
reads; that is, a WaR, RaW or WaW condition exists (R=read, a=after, W=write).
|
||||
A RaR does not produce a race condition. In particular, a data race will arise
|
||||
when conflicting accesses do not have a well-defined \emph{completion order}.
|
||||
The existence of data races in OpenMP programs results in undefined behavior,
|
||||
and so they should generally be avoided for programs to be correct. The
|
||||
completion order of accesses to a shared variable is guaranteed in OpenMP
|
||||
through a set of memory consistency rules that are described in the \plc{OpenMP
|
||||
Memory Consistency} section of the OpenMP Specifications document.
|
||||
are involved in accessing a variable and at least one of the accesses modifies
|
||||
the variable. In particular, a data race will arise when conflicting accesses
|
||||
do not have a well-defined \emph{completion order}. The existence of data
|
||||
races in OpenMP programs results in undefined behavior, and so they should
|
||||
generally be avoided for programs to be correct. The completion order of
|
||||
accesses to a shared variable is guaranteed in OpenMP through a set of memory
|
||||
consistency rules that are described in the \plc{OpenMP Memory Consistency}
|
||||
section of the OpenMP Specifications document.
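
As a minimal sketch of the definition above (not part of the Examples sources; the function name \plc{sum\_reduction} is hypothetical), the following C code avoids a data race on a shared accumulator by using a \code{reduction} clause. If the clause were removed, several threads would update the shared variable \plc{sum} without a well-defined completion order.

```c
/* Hypothetical helper: sums the integers 0..n-1.
 * Without the reduction clause, concurrent updates of the shared
 * variable `sum` would form WaW/RaW conflicts, i.e. a data race. */
long sum_reduction(long n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += i;   /* each thread updates its own private copy */
    return sum;     /* private copies are combined at the end of the loop */
}
```

The pragma is ignored when OpenMP is not enabled at compile time, in which case the loop simply runs sequentially and gives the same result.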
|
||||
|
||||
%This chapter also includes examples that exhibit non-sequentially consistent
|
||||
%(\emph{non-SC}) behavior. Sequential consistency (\emph{SC}) is the desirable
|
||||
|
@ -40,8 +40,9 @@ for executing the task to completion, even though it may leave the
|
||||
execution at a scheduling point and return later. The thread is tied
|
||||
to the task. Scheduling points can be introduced with the \code{taskyield}
|
||||
construct. With an \code{untied} clause any other thread is allowed to continue
|
||||
the task. An \code{if} clause with a \plc{true} expression allows the
|
||||
generating thread to immediately execute the task as an undeferred task.
|
||||
the task. An \code{if} clause with an expression that evaluates to \plc{false}
|
||||
results in an \emph{undeferred} task, which instructs the runtime to suspend
|
||||
the generating task until the undeferred task completes its execution.
|
||||
By including the data environment of the generating task into the generated task with the
|
||||
\code{mergeable} and \code{final} clauses, task generation overhead can be reduced.
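
The effect of the \code{if} clause can be sketched as follows. This is an illustrative example only, not one of the Examples sources; the cutoff value \plc{TASK\_CUTOFF} and the function name \plc{fib} are assumptions chosen for the sketch. For small \plc{n} the \code{if} expression evaluates to \plc{false}, so the tasks are undeferred and the generating task is suspended until each one completes.

```c
/* Hypothetical cutoff: at or below this value tasks are undeferred. */
#define TASK_CUTOFF 20

long fib(int n)
{
    long x, y;
    if (n < 2)
        return n;

    /* When n <= TASK_CUTOFF the if expression is false, so the task is
     * undeferred: the generating task waits until it has completed. */
    #pragma omp task shared(x) if(n > TASK_CUTOFF)
    x = fib(n - 1);
    #pragma omp task shared(y) if(n > TASK_CUTOFF)
    y = fib(n - 2);

    /* Both child tasks must be complete before x and y are read. */
    #pragma omp taskwait
    return x + y;
}
```

To obtain parallel execution, \plc{fib} would typically be called from a \code{single} region inside a \code{parallel} region; called elsewhere, it still computes the same value.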
|
||||
|
||||
|
@ -19,3 +19,8 @@ source form. \plc{ext} is one of the following:
|
||||
\item \plc{f90} -- Fortran code in free form.
|
||||
\end{compactitem}
|
||||
|
||||
Some of the example labels may include version information
|
||||
(\code{\small{}omp\_\plc{verno}}) to indicate features that are illustrated
|
||||
by an example for a specific OpenMP version, such as ``\plc{scan.1.c}
|
||||
\;(\code{\small{}omp\_5.0}).''
|
||||
|
||||
|
@ -5,9 +5,9 @@
|
||||
The following example illustrates the basic use of the \code{simd} construct
|
||||
to assure the compiler that the loop can be vectorized.
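
A minimal sketch of this basic usage, assuming a hypothetical function \plc{scaled\_add} that is not taken from the Examples sources:

```c
/* Hypothetical example: a[i] = a[i] + s * b[i] for i in [0, n). */
void scaled_add(float *a, const float *b, float s, int n)
{
    /* The simd construct asserts that the iterations may be executed
     * concurrently in SIMD lanes; a and b are assumed not to overlap. */
    #pragma omp simd
    for (int i = 0; i < n; i++)
        a[i] += s * b[i];
}
```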
|
||||
|
||||
\cexample{SIMD}{1}
|
||||
\cexample[4.0]{SIMD}{1}
|
||||
|
||||
\ffreeexample{SIMD}{1}
|
||||
\ffreeexample[4.0]{SIMD}{1}
|
||||
|
||||
\clearpage
|
||||
|
||||
@ -40,9 +40,9 @@ In the \code{simd} constructs for the loops the \code{private(tmp)} clause is
|
||||
necessary to ensure that each vector operation has its own \plc{tmp}
|
||||
variable.
|
||||
|
||||
\cexample{SIMD}{2}
|
||||
\cexample[4.0]{SIMD}{2}
|
||||
|
||||
\ffreeexample{SIMD}{2}
|
||||
\ffreeexample[4.0]{SIMD}{2}
|
||||
|
||||
\pagebreak
|
||||
A thread that encounters a SIMD construct executes a vectorized code of the
|
||||
@ -52,9 +52,9 @@ privatized and declared as reductions with clauses. The example below
|
||||
illustrates the use of \code{private} and \code{reduction} clauses in a SIMD
|
||||
construct.
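
The combination of the two clauses can be sketched as below. The function name \plc{dot\_product} is hypothetical and the code is only an illustration of the clauses, not the referenced example file.

```c
double dot_product(const double *a, const double *b, int n)
{
    double sum = 0.0;
    double tmp;
    #pragma omp simd private(tmp) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        tmp = a[i] * b[i];   /* each SIMD lane has its own tmp */
        sum += tmp;          /* lane-wise partial sums are combined */
    }
    return sum;
}
```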
|
||||
|
||||
\cexample{SIMD}{3}
|
||||
\cexample[4.0]{SIMD}{3}
|
||||
|
||||
\ffreeexample{SIMD}{3}
|
||||
\ffreeexample[4.0]{SIMD}{3}
|
||||
|
||||
|
||||
\pagebreak
|
||||
@ -68,9 +68,9 @@ code is safe for vectors up to and including size 16. In the loop, \plc{m} can
|
||||
be 16 or greater, for correct code execution. If the value of \plc{m} is less
|
||||
than 16, the behavior is undefined.
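
A sketch of such a loop is shown below, under the assumption of a hypothetical function \plc{shift\_update}; the caller is responsible for passing a value of \plc{m} that is 16 or greater.

```c
void shift_update(float *b, int n, int m)
{
    /* safelen(16) states that the loop is safe for vectors of up to 16
     * iterations. The dependence of b[i] on b[i - m] is therefore only
     * respected when m >= 16; smaller values give undefined behavior. */
    #pragma omp simd safelen(16)
    for (int i = m; i < n; i++)
        b[i] = b[i - m] - 1.0f;
}
```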
|
||||
|
||||
\cexample{SIMD}{4}
|
||||
\cexample[4.0]{SIMD}{4}
|
||||
|
||||
\ffreeexample{SIMD}{4}
|
||||
\ffreeexample[4.0]{SIMD}{4}
|
||||
|
||||
\pagebreak
|
||||
The following SIMD construct instructs the compiler to collapse the \plc{i} and
|
||||
@ -78,9 +78,9 @@ The following SIMD construct instructs the compiler to collapse the \plc{i} and
|
||||
threads of the team. Within the workshared loop chunks of a thread, the SIMD
|
||||
chunks are executed in the lanes of the vector units.
|
||||
|
||||
\cexample{SIMD}{5}
|
||||
\cexample[4.0]{SIMD}{5}
|
||||
|
||||
\ffreeexample{SIMD}{5}
|
||||
\ffreeexample[4.0]{SIMD}{5}
|
||||
|
||||
|
||||
%%% section
|
||||
@ -95,9 +95,9 @@ the other hand, the \code{inbranch} clause for the function goo indicates that
|
||||
the function is always called conditionally in the SIMD loop inside
|
||||
the function \plc{myaddfloat}.
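
The two clauses can be illustrated with the following sketch. The function names \plc{add\_pair}, \plc{halve} and \plc{sum\_positive\_halves} are hypothetical and do not appear in the Examples sources.

```c
/* notinbranch: the SIMD version is never called from a conditional. */
#pragma omp declare simd notinbranch
float add_pair(float x, float y)
{
    return x + y;
}

/* inbranch: the SIMD version is always called from a conditional,
 * so a compiler may generate only a masked vector version. */
#pragma omp declare simd inbranch
float halve(float x)
{
    return 0.5f * x;
}

float sum_positive_halves(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++) {
        float t = add_pair(a[i], b[i]);   /* unconditional call */
        if (t > 0.0f)
            sum += halve(t);              /* conditional call */
    }
    return sum;
}
```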
|
||||
|
||||
\cexample{SIMD}{6}
|
||||
\cexample[4.0]{SIMD}{6}
|
||||
|
||||
\ffreeexample{SIMD}{6}
|
||||
\ffreeexample[4.0]{SIMD}{6}
|
||||
|
||||
|
||||
In the code below, the function \plc{fib()} is called in the main program and
|
||||
@ -106,9 +106,9 @@ condition. The compiler creates a masked vector version and a non-masked vector
|
||||
version for the function \plc{fib()} while retaining the original scalar
|
||||
version of the \plc{fib()} function.
|
||||
|
||||
\cexample{SIMD}{7}
|
||||
\cexample[4.0]{SIMD}{7}
|
||||
|
||||
\ffreeexample{SIMD}{7}
|
||||
\ffreeexample[4.0]{SIMD}{7}
|
||||
|
||||
|
||||
|
||||
@ -124,7 +124,7 @@ A loop can be vectorized even though the iterations are not completely independe
|
||||
|
||||
This test ensures that the compiler preserves the loop-carried lexical forward dependence when generating correct SIMD code.
|
||||
|
||||
\cexample{SIMD}{8}
|
||||
\cexample[4.0]{SIMD}{8}
|
||||
|
||||
\ffreeexample{SIMD}{8}
|
||||
\ffreeexample[4.0]{SIMD}{8}
|
||||
|
||||
|
@ -67,8 +67,8 @@ by thread 0 and the read from \plc{x} by thread 1, and so thread 1 must see that
|
||||
\plc{x} equals 10.
|
||||
|
||||
\pagebreak
|
||||
\cexample{acquire_release}{1}
|
||||
\ffreeexample{acquire_release}{1}
|
||||
\cexample[5.0]{acquire_release}{1}
|
||||
\ffreeexample[5.0]{acquire_release}{1}
|
||||
|
||||
In the second example, the \code{critical} constructs are exchanged with
|
||||
\code{atomic} constructs that have \textit{explicit} memory ordering specified. When the
|
||||
@ -77,8 +77,8 @@ results in a release/acquire synchronization that in turn implies that the
|
||||
assignment to \plc{x} on thread 0 happens before the read of \plc{x} on thread
|
||||
1. Therefore, thread 1 will print ``x = 10''.
|
||||
|
||||
\cexample{acquire_release}{2}
|
||||
\ffreeexample{acquire_release}{2}
|
||||
\cexample[5.0]{acquire_release}{2}
|
||||
\ffreeexample[5.0]{acquire_release}{2}
|
||||
|
||||
\pagebreak
|
||||
In the third example, \code{atomic} constructs that specify relaxed atomic
|
||||
@ -105,8 +105,8 @@ construct used in Example 2 for thread 1.
|
||||
%}
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%3
|
||||
|
||||
\cexample{acquire_release}{3}
|
||||
\ffreeexample{acquire_release}{3}
|
||||
\cexample[5.0]{acquire_release}{3}
|
||||
\ffreeexample[5.0]{acquire_release}{3}
|
||||
|
||||
Example 4 will fail to order the write to \plc{x} on thread 0 before the read
|
||||
from \plc{x} on thread 1. Importantly, the implicit release flush on exit from
|
||||
@ -137,5 +137,5 @@ modifies \plc{y} and provides release semantics must be specified.
|
||||
%by thread 0.
|
||||
%}
|
||||
|
||||
\cexample{acquire_release_broke}{4}
|
||||
\ffreeexample{acquire_release_broke}{4}
|
||||
\cexample[5.0]{acquire_release_broke}{4}
|
||||
\ffreeexample[5.0]{acquire_release_broke}{4}
|
||||
|
@ -34,9 +34,9 @@ the partition list when the number of threads is less than or equal to the numbe
|
||||
of places in the parent's place partition, for the machine architecture depicted
|
||||
above. Note that the threads are bound to the first place of each subpartition.
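
A minimal sketch of requesting the \code{spread} policy is given below. The function name \plc{count\_spread\_threads} and the team size of 4 are assumptions made for the illustration; the actual placement depends on the place list of the machine and on the implementation.

```c
int count_spread_threads(void)
{
    int count = 0;
    /* Request 4 threads spread over the places of the parent's place
     * partition; the implementation may provide fewer threads. */
    #pragma omp parallel num_threads(4) proc_bind(spread)
    {
        #pragma omp atomic update
        count++;
    }
    return count;   /* number of threads that executed the region */
}
```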
|
||||
|
||||
\cexample{affinity}{1}
|
||||
\cexample[4.0]{affinity}{1}
|
||||
|
||||
\fexample{affinity}{1}
|
||||
\fexample[4.0]{affinity}{1}
|
||||
|
||||
It is unspecified on which place the master thread is initially started. If the
|
||||
master thread is initially started on p0, the following placement of threads will
|
||||
@ -75,9 +75,9 @@ parent's place partition. The first \plc{T/P} threads of the team (including the
|
||||
thread) execute on the parent's place. The next \plc{T/P} threads execute on the next
|
||||
place in the place partition, and so on, with wrap around.
|
||||
|
||||
\cexample{affinity}{2}
|
||||
\cexample[4.0]{affinity}{2}
|
||||
|
||||
\ffreeexample{affinity}{2}
|
||||
\ffreeexample[4.0]{affinity}{2}
|
||||
|
||||
It is unspecified on which place the master thread is initially started. If the
|
||||
master thread is initially started on p0, the following placement of threads will
|
||||
@ -130,9 +130,9 @@ the partition list when the number of threads is less than or equal to the numbe
|
||||
of places in parent's place partition, for the machine architecture depicted above.
|
||||
The place partition is not changed by the \code{close} policy.
|
||||
|
||||
\cexample{affinity}{3}
|
||||
\cexample[4.0]{affinity}{3}
|
||||
|
||||
\fexample{affinity}{3}
|
||||
\fexample[4.0]{affinity}{3}
|
||||
|
||||
It is unspecified on which place the master thread is initially started. If the
|
||||
master thread is initially started on p0, the following placement of threads will
|
||||
@ -171,9 +171,9 @@ thread) execute on the parent's place. The next \plc{T/P} threads execute on the
|
||||
place in the place partition, and so on, with wrap around. The place partition
|
||||
is not changed by the \code{close} policy.
|
||||
|
||||
\cexample{affinity}{4}
|
||||
\cexample[4.0]{affinity}{4}
|
||||
|
||||
\ffreeexample{affinity}{4}
|
||||
\ffreeexample[4.0]{affinity}{4}
|
||||
|
||||
It is unspecified on which place the master thread is initially started. If the
|
||||
master thread is initially running on p0, the following placement of threads will
|
||||
@ -225,9 +225,9 @@ The following example shows the result of the \code{master} affinity policy on
|
||||
the partition list for the machine architecture depicted above. The place partition
|
||||
is not changed by the master policy.
|
||||
|
||||
\cexample{affinity}{5}
|
||||
\cexample[4.0]{affinity}{5}
|
||||
|
||||
\fexample{affinity}{5}
|
||||
\fexample[4.0]{affinity}{5}
|
||||
|
||||
It is unspecified on which place the master thread is initially started. If the
|
||||
master thread is initially running on p0, the following placement of threads will
|
||||
|
@ -28,9 +28,9 @@ not changed, so affinity is NOT reported.
|
||||
In the last parallel region, the thread affinities are reported
|
||||
because the thread affinity has changed.
|
||||
|
||||
\cexample{affinity_display}{1}
|
||||
\cexample[5.0]{affinity_display}{1}
|
||||
|
||||
\ffreeexample{affinity_display}{1}
|
||||
\ffreeexample[5.0]{affinity_display}{1}
|
||||
|
||||
|
||||
In the following example 2 threads are forked, and each executes on a socket. Next,
|
||||
@ -58,9 +58,9 @@ the parallel nesting level (\%L), the ancestor thread number (\%a), the thread n
|
||||
and the thread affinity (\%A). In the nested parallel region within the \plc{socket\_work} routine
|
||||
the affinities for the threads on each socket are printed according to this format.
|
||||
|
||||
\cexample{affinity_display}{2}
|
||||
\cexample[5.0]{affinity_display}{2}
|
||||
|
||||
\ffreeexample{affinity_display}{2}
|
||||
\ffreeexample[5.0]{affinity_display}{2}
|
||||
|
||||
The next example illustrates more details about affinity formatting.
|
||||
First, the \code{omp\_get\_affinity\_format()} API routine is used to
|
||||
@ -98,7 +98,7 @@ The maximum value for the number of characters (\plc{nchars}) returned by
|
||||
clause and the \plc{if(nchars >= max\_req\_store) max\_req\_store=nchars} statement.
|
||||
It is used to report possible truncation (if \plc{max\_req\_store} > \plc{buffer\_store}).
|
||||
|
||||
\cexample{affinity_display}{3}
|
||||
\cexample[5.0]{affinity_display}{3}
|
||||
|
||||
\ffreeexample{affinity_display}{3}
|
||||
\ffreeexample[5.0]{affinity_display}{3}
|
||||
|
||||
|
@ -37,7 +37,7 @@ On some systems there are utilities, files or user guides that provide configura
|
||||
information. For instance, the socket number and proc\_id's for a socket
|
||||
can be found in the /proc/cpuinfo text file on Linux systems.
|
||||
|
||||
\cexample{affinity_query}{1}
|
||||
\cexample[4.5]{affinity_query}{1}
|
||||
|
||||
\ffreeexample{affinity_query}{1}
|
||||
\ffreeexample[4.5]{affinity_query}{1}
|
||||
|
||||
|
@ -57,7 +57,7 @@ and the set of all variables used in the allocate statement is specified in the
|
||||
|
||||
%\pagebreak
|
||||
|
||||
\cexample{allocators}{1}
|
||||
\ffreeexample{allocators}{1}
|
||||
\cexample[5.0]{allocators}{1}
|
||||
\ffreeexample[5.0]{allocators}{1}
|
||||
|
||||
|
||||
|
@ -8,31 +8,31 @@ on \code{target} and \code{target} \code{data} constructs.
|
||||
This example shows the invalid usage of two separate sections of the same array
|
||||
inside of a \code{target} construct.
|
||||
|
||||
\cexample{array_sections}{1}
|
||||
\cexample[4.0]{array_sections}{1}
|
||||
|
||||
\ffreeexample{array_sections}{1}
|
||||
\ffreeexample[4.0]{array_sections}{1}
|
||||
|
||||
\pagebreak
|
||||
This example shows the invalid usage of two separate sections of the same array
|
||||
inside of a \code{target} construct.
|
||||
|
||||
\cexample{array_sections}{2}
|
||||
\cexample[4.0]{array_sections}{2}
|
||||
|
||||
\ffreeexample{array_sections}{2}
|
||||
\ffreeexample[4.0]{array_sections}{2}
|
||||
|
||||
\pagebreak
|
||||
This example shows the valid usage of two separate sections of the same array inside
|
||||
of a \code{target} construct.
|
||||
|
||||
\cexample{array_sections}{3}
|
||||
\cexample[4.0]{array_sections}{3}
|
||||
|
||||
\ffreeexample{array_sections}{3}
|
||||
\ffreeexample[4.0]{array_sections}{3}
|
||||
|
||||
\pagebreak
|
||||
This example shows the valid usage of a wholly contained array section of an already
|
||||
mapped array section inside of a \code{target} construct.
|
||||
|
||||
\cexample{array_sections}{4}
|
||||
\cexample[4.0]{array_sections}{4}
|
||||
|
||||
\ffreeexample{array_sections}{4}
|
||||
\ffreeexample[4.0]{array_sections}{4}
|
||||
|
||||
|
@ -23,5 +23,13 @@ Note the use of additional parentheses
|
||||
around the shape-operator and $a$ to ensure the correct precedence
|
||||
over array-section operations.
|
||||
|
||||
\cnexample{array_shaping}{1}
|
||||
\cnexample[5.0]{array_shaping}{1}
|
||||
\ccppspecificend
|
||||
|
||||
The shape operator is not defined for Fortran. Explicit array shaping
|
||||
of procedure arguments can be used instead to achieve a similar goal.
|
||||
Below is the Fortran-equivalent of the above example that illustrates
|
||||
the support of transferring two rows of noncontiguous boundary
|
||||
data in the \code{target}~\code{update} directive.
|
||||
|
||||
\ffreeexample[5.0]{array_shaping}{1}
|
||||
|
@ -11,13 +11,13 @@ name \plc{b} is associated with the shared variable \plc{a}. With the predetermi
|
||||
attribute rule, the associate name \plc{b} is not allowed to be specified on the \code{private}
|
||||
clause.
|
||||
|
||||
\fnexample{associate}{1}
|
||||
\fnexample[4.0]{associate}{1}
|
||||
|
||||
In the next example, within the \code{parallel} construct, the association name \plc{thread\_id}
|
||||
is associated with the private copy of \plc{i}. The print statement should output the
|
||||
unique thread number.
|
||||
|
||||
\fnexample{associate}{2}
|
||||
\fnexample[4.0]{associate}{2}
|
||||
|
||||
The following example illustrates the effect of specifying a selector name on a data-sharing
|
||||
attribute clause. The associate name \plc{u} is associated with \plc{v} and the variable \plc{v}
|
||||
@ -27,6 +27,6 @@ The association between \plc{u} and the original \plc{v} is retained (see the Da
|
||||
Attribute Rules section in the OpenMP 4.0 API Specifications). Inside the \code{parallel}
|
||||
region, \plc{v} has the value of -1 and \plc{u} has the value of the original \plc{v}.
|
||||
|
||||
\ffreenexample{associate}{3}
|
||||
\ffreenexample[4.0]{associate}{3}
|
||||
\fortranspecificend
|
||||
|
||||
|
@ -26,6 +26,6 @@ one thread is being used for offload generation. In the situation where
|
||||
little time is spent by the \plc{target task} in setting
|
||||
up and tearing down the target execution, \code{static} scheduling may be desired.
|
||||
|
||||
\cexample{async_target}{3}
|
||||
\cexample[4.5]{async_target}{3}
|
||||
|
||||
\ffreeexample{async_target}{3}
|
||||
\ffreeexample[4.5]{async_target}{3}
|
||||
|
@ -11,8 +11,8 @@ The last dependence is produced by array \plc{p} with the \code{out} dependence
|
||||
|
||||
The \code{nowait} clause on the \code{target} construct creates a deferrable \plc{target task}, allowing the encountering task to continue execution without waiting for the completion of the \plc{target task}.
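
The behavior of \code{nowait} on a \code{target} construct can be sketched as follows. The function name \plc{vec\_add\_async} is hypothetical, and the sketch uses a \code{taskwait} rather than the task dependences of the referenced example. On a system without an available target device the region is expected to execute on the host.

```c
void vec_add_async(const double *a, const double *b, double *c, int n)
{
    /* nowait makes the target task deferrable, so the encountering task
     * can continue past the construct while the region executes. */
    #pragma omp target map(to: a[0:n], b[0:n]) map(from: c[0:n]) nowait
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];

    /* ... independent host work could be placed here ... */

    /* Wait for the target task before c is used on the host. */
    #pragma omp taskwait
}
```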
|
||||
|
||||
\cexample{async_target}{4}
|
||||
\cexample[4.5]{async_target}{4}
|
||||
|
||||
\ffreeexample{async_target}{4}
|
||||
\ffreeexample[4.5]{async_target}{4}
|
||||
|
||||
%end
|
||||
|
@ -9,20 +9,20 @@ scheduling point while waiting for the execution of the \code{target} region
|
||||
to complete, allowing the thread to switch back to the execution of the encountering
|
||||
task or one of the previously generated explicit tasks.
|
||||
|
||||
\cexample{async_target}{1}
|
||||
\cexample[4.0]{async_target}{1}
|
||||
|
||||
\pagebreak
|
||||
The Fortran version has an interface block that contains the \code{declare} \code{target}.
|
||||
An identical statement exists in the function declaration (not shown here).
|
||||
|
||||
\ffreeexample{async_target}{1}
|
||||
\ffreeexample[4.0]{async_target}{1}
|
||||
|
||||
The following example shows how the \code{task} and \code{target} constructs
|
||||
are used to execute multiple \code{target} regions asynchronously. The task dependence
|
||||
ensures that the storage is allocated and initialized on the device before it is
|
||||
accessed.
|
||||
|
||||
\cexample{async_target}{2}
|
||||
\cexample[4.0]{async_target}{2}
|
||||
|
||||
The Fortran example below is similar to the C version above. Instead of pointers, though, it uses
|
||||
the convenience of Fortran allocatable arrays on the device. In order to preserve the arrays
|
||||
@ -52,4 +52,4 @@ section of the specification.)
|
||||
However, the intention is to relax the restrictions on mapping of allocatable variables in the next release
|
||||
of the specification so that the example will be compliant.
|
||||
|
||||
\ffreeexample{async_target}{2}
|
||||
\ffreeexample[4.0]{async_target}{2}
|
||||
|
@ -14,9 +14,9 @@ Note that the \code{atomic} directive applies only to the statement immediately
|
||||
following it. As a result, elements of \plc{y} are not updated atomically in
|
||||
this example.
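
The point about the scope of the \code{atomic} directive can be sketched as below. The function name \plc{accumulate} and the constant increments are assumptions made for this illustration and differ from the referenced example file.

```c
void accumulate(float *x, float *y, const int *index, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* Different iterations may map to the same element of x,
         * so this update must be atomic. */
        #pragma omp atomic update
        x[index[i]] += 1.0f;

        /* Not covered by the atomic directive above; it is safe here
         * only because each iteration touches a distinct y[i]. */
        y[i] += 2.0f;
    }
}
```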
|
||||
|
||||
\cexample{atomic}{1}
|
||||
\cexample[3.1]{atomic}{1}
|
||||
|
||||
\fexample{atomic}{1}
|
||||
\fexample[3.1]{atomic}{1}
|
||||
|
||||
The following example illustrates the \code{read} and \code{write} clauses
|
||||
for the \code{atomic} directive. These clauses ensure that the given variable
|
||||
@ -26,9 +26,9 @@ another part of the variable. Note that most hardware provides atomic reads and
|
||||
writes for some set of properly aligned variables of specific sizes, but not necessarily
|
||||
for all the variable types supported by the OpenMP API.
|
||||
|
||||
\cexample{atomic}{2}
|
||||
\cexample[3.1]{atomic}{2}
|
||||
|
||||
\fexample{atomic}{2}
|
||||
\fexample[3.1]{atomic}{2}
|
||||
|
||||
The following example illustrates the \code{capture} clause for the \code{atomic}
|
||||
directive. In this case the value of a variable is captured, and then the variable
|
||||
@ -37,8 +37,8 @@ be implemented using the fetch-and-add instruction available on many kinds of ha
|
||||
The example also shows a way to implement a spin lock using the \code{capture}
|
||||
and \code{read} clauses.
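
The fetch-and-add pattern described above can be sketched with the \code{capture} clause as follows; the spin lock built on top of it is omitted from this sketch.

```c
int fetch_and_add(int *p)
{
    int old;
    /* Atomically capture the old value of *p and then increment *p. */
    #pragma omp atomic capture
    { old = *p; (*p)++; }
    return old;   /* value of *p before the increment */
}
```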
|
||||
|
||||
\cexample{atomic}{3}
|
||||
\cexample[3.1]{atomic}{3}
|
||||
|
||||
\fexample{atomic}{3}
|
||||
\fexample[3.1]{atomic}{3}
|
||||
|
||||
|
||||
|
@ -5,21 +5,21 @@
|
||||
The following non-conforming examples illustrate the restrictions on the \code{atomic}
|
||||
construct.
|
||||
|
||||
\cexample{atomic_restrict}{1}
|
||||
\cexample[3.1]{atomic_restrict}{1}
|
||||
|
||||
\fexample{atomic_restrict}{1}
|
||||
\fexample[3.1]{atomic_restrict}{1}
|
||||
|
||||
\cexample{atomic_restrict}{2}
|
||||
\cexample[3.1]{atomic_restrict}{2}
|
||||
|
||||
\fortranspecificstart
|
||||
The following example is non-conforming because \code{I} and \code{R} reference
|
||||
the same location but have different types.
|
||||
|
||||
\fnexample{atomic_restrict}{2}
|
||||
\fnexample[3.1]{atomic_restrict}{2}
|
||||
|
||||
Although the following example might work on some implementations, this is also
|
||||
non-conforming:
|
||||
|
||||
\fnexample{atomic_restrict}{3}
|
||||
\fnexample[3.1]{atomic_restrict}{3}
|
||||
\fortranspecificend
|
||||
|
||||
|
@ -11,7 +11,7 @@ exception is properly handled in the sequential part. If cancellation of the \co
|
||||
region has been requested, some threads might have executed \code{phase\_1()}.
|
||||
However, it is guaranteed that none of the threads executed \code{phase\_2()}.
|
||||
|
||||
\cppexample{cancellation}{1}
|
||||
\cppexample[4.0]{cancellation}{1}
|
||||
|
||||
|
||||
The following example illustrates the use of the \code{cancel} construct in error
|
||||
@ -20,7 +20,7 @@ the cancellation is activated. The encountering thread sets the shared variable
|
||||
\code{err} and other threads of the binding thread set proceed to the end of
|
||||
the worksharing construct after the cancellation has been activated.
|
||||
|
||||
\ffreeexample{cancellation}{1}
|
||||
\ffreeexample[4.0]{cancellation}{1}
|
||||
|
||||
\clearpage
|
||||
|
||||
@ -34,11 +34,11 @@ task group to control the effect of the \code{cancel taskgroup} directive. The
|
||||
\plc{level} argument is used to create undeferred tasks after the first ten
|
||||
levels of the tree.
|
||||
|
||||
\cexample{cancellation}{2}
|
||||
\cexample[4.0]{cancellation}{2}
|
||||
|
||||
|
||||
The following is the equivalent parallel search example in Fortran.
|
||||
|
||||
\ffreeexample{cancellation}{2}
|
||||
\ffreeexample[4.0]{cancellation}{2}
|
||||
|
||||
|
||||
|
@ -16,9 +16,9 @@ The variable \code{j} can be omitted from the \code{private} clause when the
|
||||
from the \code{private} clause. In either case, \code{k} is implicitly private
|
||||
and could be omitted from the \code{private} clause.
|
||||
|
||||
\cexample{collapse}{1}
|
||||
\cexample[3.0]{collapse}{1}
|
||||
|
||||
\fexample{collapse}{1}
|
||||
\fexample[3.0]{collapse}{1}
|
||||
|
||||
In the next example, the \code{k} and \code{j} loops are associated with the
|
||||
loop construct. So the iterations of the \code{k} and \code{j} loops are collapsed
|
||||
@ -33,9 +33,9 @@ will have the value \code{2} and \code{j} will have the value \code{3}. Since
|
||||
by the sequentially last iteration of the collapsed \code{k} and \code{j} loop.
|
||||
This example prints: \code{2 3}.
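
A compact sketch of the same behavior is shown below. The function name \plc{last\_indices} and its output parameters are hypothetical; they replace the print statement so that the values can be checked by the caller.

```c
void last_indices(int *klast_out, int *jlast_out)
{
    int j, k;
    int jlast = 0, klast = 0;
    /* The k and j loops are collapsed into one iteration space.
     * lastprivate copies back the values assigned by the sequentially
     * last iteration of the collapsed loop (k = 2, j = 3). */
    #pragma omp parallel for collapse(2) lastprivate(jlast, klast)
    for (k = 1; k <= 2; k++)
        for (j = 1; j <= 3; j++) {
            jlast = j;
            klast = k;
        }
    *klast_out = klast;
    *jlast_out = jlast;
}
```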
|
||||
|
||||
\cexample{collapse}{2}
|
||||
\cexample[3.0]{collapse}{2}
|
||||
|
||||
\fexample{collapse}{2}
|
||||
\fexample[3.0]{collapse}{2}
|
||||
|
||||
The next example illustrates the interaction of the \code{collapse} and \code{ordered}
|
||||
clauses.
|
||||
@ -71,8 +71,8 @@ The code prints
|
||||
\\
|
||||
\code{1 3 2}
|
||||
|
||||
\cexample{collapse}{3}
|
||||
\cexample[3.0]{collapse}{3}
|
||||
|
||||
\fexample{collapse}{3}
|
||||
\fexample[3.0]{collapse}{3}
|
||||
|
||||
|
||||
|
@ -10,5 +10,5 @@ The following example shows the use of reference types in data-sharing clauses i
|
||||
Additionally it shows how the data-sharing of formal arguments with a C++ reference type on an orphaned task generating construct is determined implicitly. (See the Data-sharing Attribute Rules for Variables Referenced in a Construct Section of the 4.5 OpenMP specification.)
|
||||
|
||||
|
||||
\cppnexample{cpp_reference}{1}
|
||||
\cppnexample[4.5]{cpp_reference}{1}
|
||||
\cppspecificend
|
||||
|
@ -17,4 +17,4 @@ The following example extends the previous example by adding the \code{hint} cla
|
||||
|
||||
\cexample{critical}{2}
|
||||
|
||||
\fexample{critical}{2}
|
||||
\fexample[4.5]{critical}{2}
|
||||
|
@ -16,7 +16,7 @@ the \code{target} region (thus \code{fib}) will execute on the host device.
|
||||
For C/C++ codes the declaration of the function \code{fib} appears between the \code{declare}
|
||||
\code{target} and \code{end} \code{declare} \code{target} directives.
|
||||
|
||||
\cexample{declare_target}{1}
|
||||
\cexample[4.0]{declare_target}{1}
|
||||
|
||||
The Fortran \code{fib} subroutine contains a \code{declare} \code{target} declaration
|
||||
to indicate to the compiler to create a device-executable version of the procedure.
|
||||
@ -27,7 +27,7 @@ The program uses the \code{module\_fib} module, which presents an explicit inter
|
||||
the compiler with the \code{declare} \code{target} declarations for processing
|
||||
the \code{fib} call.
|
||||
|
||||
\ffreeexample{declare_target}{1}
|
||||
\ffreeexample[4.0]{declare_target}{1}
|
||||
|
||||
The next Fortran example shows the use of an external subroutine. Without an explicit
|
||||
interface (through module use or an interface block) the \code{declare} \code{target}
|
||||
@ -35,7 +35,7 @@ declarations within a external subroutine are unknown to the main program unit;
|
||||
therefore, a \code{declare} \code{target} must be provided within the program
|
||||
scope for the compiler to determine that a target binary should be available.
|
||||
|
||||
\ffreeexample{declare_target}{2}
|
||||
\ffreeexample[4.0]{declare_target}{2}
|
||||
|
||||
\subsection{\code{declare} \code{target} Construct for Class Type}
|
||||
\label{subsec:declare_target_class}
|
||||
@ -47,7 +47,7 @@ of a variable \plc{varY} with a class type \code{typeY}. The member function \co
|
||||
be accessed on a target device because its declaration did not appear between \code{declare}
|
||||
\code{target} and \code{end} \code{declare} \code{target} directives.
|
||||
|
||||
\cppnexample{declare_target}{2}
|
||||
\cppnexample[4.0]{declare_target}{2}
|
||||
\cppspecificend
|
||||
|
||||
\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for Variables}
|
||||
@ -65,13 +65,13 @@ is then used to manage the consistency of the variables \plc{p}, \plc{v1}, and \
|
||||
data environment of the encountering host device task and the implicit device data
|
||||
environment of the default target device.
|
||||
|
||||
\cexample{declare_target}{3}
|
||||
\cexample[4.0]{declare_target}{3}
|
||||
|
||||
The Fortran version of the above C code uses a different syntax. Fortran modules
|
||||
use a list syntax on the \code{declare} \code{target} directive to declare
|
||||
mapped variables.
|
||||
|
||||
\ffreeexample{declare_target}{3}
|
||||
\ffreeexample[4.0]{declare_target}{3}
|
||||
|
||||
The following example also indicates that the function \code{Pfun()} is available on the
|
||||
target device, as well as the variable \plc{Q}, which is mapped to the implicit device
|
||||
@ -84,7 +84,7 @@ In the following example, the function and variable declarations appear between
|
||||
the \code{declare} \code{target} and \code{end} \code{declare} \code{target}
|
||||
directives.
|
||||
|
||||
\cexample{declare_target}{4}
|
||||
\cexample[4.0]{declare_target}{4}
|
||||
|
||||
The Fortran version of the above C code uses a different syntax. In Fortran modules
|
||||
a list syntax on the \code{declare} \code{target} directive is used to declare
|
||||
@ -93,7 +93,7 @@ separated list. When the \code{declare} \code{target} directive is used to
|
||||
declare just the procedure, the procedure name need not be listed -- it is implicitly
|
||||
assumed, as illustrated in the \code{Pfun()} function.
|
||||
|
||||
\ffreeexample{declare_target}{4}
|
||||
\ffreeexample[4.0]{declare_target}{4}
|
||||
|
||||
\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} with \code{declare} \code{simd}}
|
||||
\label{subsec:declare_target_simd}
|
||||
@ -104,7 +104,7 @@ is available on a target device. The \code{declare} \code{simd} directive indica
|
||||
that there is a SIMD version of the function \code{P()} that is available on the target
|
||||
device as well as one that is available on the host device.
|
||||
|
||||
\cexample{declare_target}{5}
|
||||
\cexample[4.0]{declare_target}{5}
|
||||
|
||||
The Fortran version of the above C code uses a different syntax. Fortran modules
|
||||
use a list syntax of the \code{declare} \code{target} declaration for the mapping.
|
||||
@ -113,7 +113,7 @@ The function declaration does not use a list and implicitly assumes the function
|
||||
name. In this Fortran example row and column indices are reversed relative to the
|
||||
C/C++ example, as is usual for codes optimized for memory access.
|
||||
|
||||
\ffreeexample{declare_target}{5}
|
||||
\ffreeexample[4.0]{declare_target}{5}
|
||||
|
||||
|
||||
\subsection{\code{declare}~\code{target} Directive with \code{link} Clause}
|
||||
@ -137,6 +137,6 @@ globally on the device for part of the program execution. The single precision d
|
||||
are allocated and persist only for the first \code{target} region. Similarly, the
|
||||
double precision data are in scope on the device only for the second \code{target} region.
|
||||
|
||||
\cexample{declare_target}{6}
|
||||
\ffreeexample{declare_target}{6}
|
||||
\cexample[4.5]{declare_target}{6}
|
||||
\ffreeexample[4.5]{declare_target}{6}
|
||||
|
||||
|
@ -44,6 +44,6 @@ effectively destroying the depend object.
|
||||
After an object has been uninitialized it can be initialized again
|
||||
with a new dependence type \emph{and} a new variable.
|
||||
|
||||
\cexample{depobj}{1}
|
||||
\cexample[5.0]{depobj}{1}
|
||||
|
||||
\ffreeexample{depobj}{1}
|
||||
\ffreeexample[5.0]{depobj}{1}
|
||||
|
@ -10,9 +10,9 @@ can be used to query if a code is executing on the initial host device or on a
|
||||
target device. The example then sets the number of threads in the \code{parallel}
|
||||
region based on where the code is executing.
|
||||
|
||||
\cexample{device}{1}
|
||||
\cexample[4.0]{device}{1}
|
||||
|
||||
\ffreeexample{device}{1}
|
||||
\ffreeexample[4.0]{device}{1}
|
||||
|
||||
\subsection{\code{omp\_get\_num\_devices} Routine}
|
||||
\label{subsec:device_num_devices}
|
||||
@ -20,9 +20,9 @@ region based on where the code is executing.
|
||||
The following example shows how the \code{omp\_get\_num\_devices} runtime library routine
|
||||
can be used to determine the number of devices.
|
||||
|
||||
\cexample{device}{2}
|
||||
\cexample[4.0]{device}{2}
|
||||
|
||||
\ffreeexample{device}{2}
|
||||
\ffreeexample[4.0]{device}{2}
|
||||
|
||||
\subsection{\code{omp\_set\_default\_device} and \\
|
||||
\code{omp\_get\_default\_device} Routines}
|
||||
@ -32,9 +32,9 @@ The following example shows how the \code{omp\_set\_default\_device} and \code{o
|
||||
runtime library routines can be used to set the default device and determine the
|
||||
default device respectively.
|
||||
|
||||
\cexample{device}{3}
|
||||
\cexample[4.0]{device}{3}
|
||||
|
||||
\ffreeexample{device}{3}
|
||||
\ffreeexample[4.0]{device}{3}
|
||||
|
||||
|
||||
\subsection{Target Memory and Device Pointers Routines}
|
||||
@ -53,5 +53,5 @@ in a \code{target} region by exposing the device pointer in an \code{is\_device\
|
||||
The example creates an array of cosine values on the default device, to be used
|
||||
on the host device. The function fails if a default device is not available.
|
||||
|
||||
\cexample{device}{4}
|
||||
\cexample[4.5]{device}{4}
|
||||
|
||||
|
@ -19,9 +19,9 @@ with an \code{ordered} clause without a parameter, on the loop directive,
|
||||
and a single \code{ordered} directive without the \code{depend} clause
|
||||
specified for the statement executing the \plc{bar} function.
|
||||
|
||||
\cexample{doacross}{1}
|
||||
\cexample[4.5]{doacross}{1}
|
||||
|
||||
\ffreeexample{doacross}{1}
|
||||
\ffreeexample[4.5]{doacross}{1}
|
||||
|
||||
The following code is similar to the previous example but with
|
||||
\plc{doacross loop nest} extended to two nested loops, \plc{i} and \plc{j},
|
||||
@ -37,9 +37,9 @@ Likewise, the \code{depend(sink:j-1,i)} and \code{depend(sink:j,i-1)} clauses
|
||||
in the Fortran code define cross-iteration dependences from iterations
|
||||
(\plc{j-1, i}) and (\plc{j, i-1}) to iteration (\plc{j, i}).
|
||||
|
||||
\cexample{doacross}{2}
|
||||
\cexample[4.5]{doacross}{2}
|
||||
|
||||
\ffreeexample{doacross}{2}
|
||||
\ffreeexample[4.5]{doacross}{2}
|
||||
|
||||
|
||||
The following example shows the incorrect use of the \code{ordered}
|
||||
@ -51,9 +51,9 @@ clauses define dependences on lexicographically later
|
||||
source iterations (\plc{i+1, j}) and (\plc{i, j+1}), which could cause
|
||||
a deadlock as well since they may not start to execute until the current iteration completes.
|
||||
|
||||
\cexample{doacross}{3}
|
||||
\cexample[4.5]{doacross}{3}
|
||||
|
||||
\ffreeexample{doacross}{3}
|
||||
\ffreeexample[4.5]{doacross}{3}
|
||||
|
||||
|
||||
The following example illustrates the use of the \code{collapse} clause for
|
||||
@ -63,6 +63,6 @@ The example also shows a compliant usage of the dependence source
|
||||
directive placed before the corresponding sink directive.
|
||||
Checking the completion of computation from previous iterations at the sink point can occur after the source statement.
|
||||
|
||||
\cexample{doacross}{4}
|
||||
\cexample[4.5]{doacross}{4}
|
||||
|
||||
\ffreeexample{doacross}{4}
|
||||
\ffreeexample[4.5]{doacross}{4}
|
||||
|
@ -22,7 +22,7 @@ to execute double precision code. Two teams are required, and
|
||||
the thread limit for each team is set to 1/2 of the number of
|
||||
available processors.
|
||||
|
||||
\cexample{host_teams}{1}
|
||||
\cexample[5.0]{host_teams}{1}
|
||||
|
||||
\ffreeexample{host_teams}{1}
|
||||
\ffreeexample[5.0]{host_teams}{1}
|
||||
|
||||
|
@ -5,6 +5,6 @@
|
||||
The following example demonstrates how to initialize an array of locks in a \code{parallel} region by using \code{omp\_init\_lock\_with\_hint}.
|
||||
Note, hints are combined with an \code{|} or \code{+} operator in C/C++ and a \code{+} operator in Fortran.
|
||||
|
||||
\cppexample{init_lock_with_hint}{1}
|
||||
\cppexample[4.5]{init_lock_with_hint}{1}
|
||||
|
||||
\fexample{init_lock_with_hint}{1}
|
||||
\fexample[4.5]{init_lock_with_hint}{1}
|
||||
|
@ -11,4 +11,14 @@ sequentially.
|
||||
|
||||
\fexample{lastprivate}{1}
|
||||
|
||||
\clearpage
|
||||
The next example illustrates the use of the \code{conditional} modifier in
|
||||
a \code{lastprivate} clause to return the last value when it may not come from
|
||||
the last iteration of a loop.
|
||||
That is, users can preserve the serial equivalence semantics of the loop.
|
||||
The conditional lastprivate ensures the final value of the variable after the loop
|
||||
is as if the loop iterations were executed in a sequential order.
|
||||
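As a minimal sketch of the \code{conditional} modifier (the function \code{last\_negative\_index} and its arguments are hypothetical and are not taken from the example sources), the loop below records the index of the last negative element. The sketch assumes that at least one element is negative, so that \code{idx} is assigned in some iteration.

```c
/* Hypothetical helper, not part of the OpenMP Examples sources. */
int last_negative_index(const int *a, int n)
{
    int idx = -1;   /* assumed to be overwritten by at least one iteration */

    /* idx is assigned only in some iterations; the conditional modifier
       asks for the value written by the sequentially last such iteration. */
    #pragma omp parallel for lastprivate(conditional: idx)
    for (int i = 0; i < n; i++) {
        if (a[i] < 0)
            idx = i;
    }
    return idx;
}
```

Compiled without OpenMP support the pragma is ignored and the loop runs serially, which should give the same result.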
|
||||
\cexample[5.0]{lastprivate}{2}
|
||||
|
||||
\ffreeexample[5.0]{lastprivate}{2}
|
||||
|
@ -7,7 +7,7 @@ an induction variable (\plc{j}). At the end of the execution of
|
||||
the loop construct, the original variable \plc{j} is updated with
|
||||
the value \plc{N/2} from the last iteration of the loop.
|
||||
|
||||
\cexample{linear_in_loop}{1}
|
||||
\cexample[4.5]{linear_in_loop}{1}
|
||||
|
||||
\ffreeexample{linear_in_loop}{1}
|
||||
\ffreeexample[4.5]{linear_in_loop}{1}
|
||||
|
||||
|
76
Examples_linear_modifier.tex
Normal file
@ -0,0 +1,76 @@
|
||||
%%% section
|
||||
\section{\code{ref}, \code{val}, \code{uval} Modifiers for \code{linear} Clause}
|
||||
\label{sec:linear_modifier}
|
||||
|
||||
When generating vector functions from \code{declare}~\code{simd} directives, it is important for a compiler to know the proper types of function arguments in
|
||||
order to generate efficient code.
|
||||
This is especially true for C++ reference types and Fortran arguments.
|
||||
|
||||
In the following example, the function \plc{add\_one2} has a C++ reference
|
||||
parameter (or Fortran argument) \plc{p}. Variable \plc{p} gets incremented by 1 in the function.
|
||||
The caller loop \plc{i} in the main program passes
|
||||
a variable \plc{k} as a reference to the function \plc{add\_one2} call.
|
||||
The \code{ref} modifier for the \code{linear} clause on the
|
||||
\code{declare}~\code{simd} directive is used to annotate the
|
||||
reference-type parameter \plc{p} to match the property of the variable
|
||||
\plc{k} in the loop.
|
||||
This use of reference type is equivalent to the second call to
|
||||
\plc{add\_one2} with a direct passing of the array element \plc{a[i]}.
|
||||
In the example, the preferred vector
|
||||
length 8 is specified for both the caller loop and the callee function.
|
||||
|
||||
When \code{linear(ref(p))} is applied to an argument passed by reference,
|
||||
it tells the compiler that the addresses in its vector argument are consecutive,
|
||||
and so the compiler can generate a single vector load or store instead of
|
||||
a gather or scatter. This allows more efficient SIMD code to be generated with
|
||||
fewer source changes.
|
||||
|
||||
\cppexample[4.5]{linear_modifier}{1}
|
||||
\ffreeexample[4.5]{linear_modifier}{1}
|
||||
\clearpage
|
||||
|
||||
|
||||
The following example is a variant of the above example. The function \plc{add\_one2} in the C++ code includes an additional C++ reference parameter \plc{i}.
|
||||
The loop index \plc{i} of the caller loop \plc{i} in the main program
|
||||
is passed as a reference to the function \plc{add\_one2} call.
|
||||
The loop index \plc{i} has a uniform address with
|
||||
linear value of step 1 across SIMD lanes.
|
||||
Thus, the \code{uval} modifier is used for the \code{linear} clause
|
||||
to annotate the C++ reference-type parameter \plc{i} to match
|
||||
the property of loop index \plc{i}.
|
||||
|
||||
In the corresponding Fortran code, the arguments \plc{p} and
|
||||
\plc{i} in the routine \plc{add\_one2} are passed by reference.
|
||||
Similar modifiers are used for these variables in the \code{linear} clauses
|
||||
to match with the property at the caller loop in the main program.
|
||||
|
||||
When \code{linear(uval(i))} is applied to an argument passed by reference, it
|
||||
tells the compiler that its addresses in the vector argument are uniform
|
||||
so that the compiler can generate a scalar load or scalar store and create
|
||||
linear values. This allows more efficient SIMD code to be generated with
|
||||
fewer source changes.
|
||||
|
||||
\cppexample[4.5]{linear_modifier}{2}
|
||||
\ffreeexample[4.5]{linear_modifier}{2}
|
||||
|
||||
In the following example, the function \plc{func} takes arrays \plc{x} and \plc{y} as arguments, and accesses the array elements referenced by
|
||||
the index \plc{i}.
|
||||
The caller loop \plc{i} in the main program passes a linear copy of
|
||||
the variable \plc{k} to the function \plc{func}.
|
||||
The \code{val} modifier is used for the \code{linear} clause
|
||||
in the \code{declare}~\code{simd} directive for the function
|
||||
\plc{func} to annotate argument \plc{i} to match the property of
|
||||
the actual argument \plc{k} passed in the SIMD loop.
|
||||
Arrays \plc{x} and \plc{y} have uniform addresses across SIMD lanes.
|
||||
|
||||
When \code{linear(val(i):1)} is applied to an argument,
|
||||
it tells the compiler that its addresses in the vector argument may not be
|
||||
consecutive; however, their values are linear (with stride 1 here). When the value of \plc{i} is used
|
||||
in subscripts of array references (e.g., \plc{x[i]}), the compiler can generate
|
||||
a vector load or store instead of a gather or scatter. This allows more
|
||||
efficient SIMD code to be generated with fewer source changes.
|
||||
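The \code{val} case can be sketched in C as follows. This is only an illustration under stated assumptions: the names \code{add\_pair} and \code{add\_arrays} are hypothetical, and the \code{linear(val(i):1)} spelling follows the OpenMP 4.5 syntax used in this section.

```c
/* Hypothetical sketch, not the linear_modifier.3 example itself. */

/* x and y have the same address in every SIMD lane (uniform);
   the value of i advances by 1 from lane to lane (linear, val). */
#pragma omp declare simd uniform(x, y) linear(val(i):1)
double add_pair(const double *x, const double *y, int i)
{
    return x[i] + y[i];
}

void add_arrays(const double *x, const double *y, double *out, int n)
{
    int k = 0;

    /* k is a linear variable of the caller loop with step 1, which
       matches the linear(val(i):1) property declared on add_pair. */
    #pragma omp simd linear(k:1)
    for (int i = 0; i < n; i++) {
        out[i] = add_pair(x, y, k);
        k++;
    }
}
```

Whether a vector load is actually generated for \code{x[i]} and \code{y[i]} depends on the compiler and target.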
|
||||
\cexample[4.5]{linear_modifier}{3}
|
||||
\ffreeexample[4.5]{linear_modifier}{3}
|
||||
|
||||
|
@ -9,5 +9,5 @@ of the loop are free of data dependencies and may be executed concurrently.
|
||||
It allows the compiler to use heuristics to select the parallelization scheme
|
||||
and compiler-level optimizations for the concurrency.
|
||||
|
||||
\cexample{loop}{1}
|
||||
\ffreeexample{loop}{1}
|
||||
\cexample[5.0]{loop}{1}
|
||||
\ffreeexample[5.0]{loop}{1}
|
||||
|
@ -3,39 +3,52 @@
|
||||
\section{The OpenMP Memory Model}
|
||||
\label{sec:mem_model}
|
||||
|
||||
In the following example, at Print 1, the value of \plc{x} could be either 2
|
||||
or 5, depending on the timing of the threads, and the implementation of the assignment
|
||||
to \plc{x}. There are two reasons that the value at Print 1 might not be 5.
|
||||
First, Print 1 might be executed before the assignment to \plc{x} is executed.
|
||||
Second, even if Print 1 is executed after the assignment, the value 5 is not guaranteed
|
||||
to be seen by thread 1 because a flush may not have been executed by thread 0 since
|
||||
the assignment.
|
||||
The following examples illustrate two major concerns for concurrent thread
|
||||
execution: ordering of thread execution and memory accesses that may or may not
|
||||
lead to race conditions.
|
||||
|
||||
The barrier after Print 1 contains implicit flushes on all threads, as well as
|
||||
a thread synchronization, so the programmer is guaranteed that the value 5 will
|
||||
be printed by both Print 2 and Print 3.
|
||||
In the following example, at Print 1, the value of \code{xval} could be either 2
|
||||
or 5, depending on the timing of the threads. The \code{atomic} directives are
|
||||
necessary for the accesses to \code{x} by threads 0 and 1 to avoid a data race.
|
||||
If the atomic write completes before the atomic read, thread 1 is guaranteed to
|
||||
see 5 in \code{xval}. Otherwise, thread 1 is guaranteed to see 2 in \code{xval}.
|
||||
|
||||
\cexample{mem_model}{1}
|
||||
The barrier after Print 1 contains implicit flushes on all threads, as well as
|
||||
a thread synchronization, so the programmer is guaranteed that the value 5 will
|
||||
be printed by both Print 2 and Print 3. Since neither Print 2 nor Print 3 modifies
|
||||
\code{x}, they may concurrently access \code{x} without requiring \code{atomic}
|
||||
directives to avoid a data race.
|
||||
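The pattern described above (atomic accesses while a write may be concurrent, plain reads once a barrier has ordered the write) might be sketched as follows. The function \code{observe\_x} is hypothetical, and it uses \code{sections} rather than thread numbers so that it does not depend on runtime library calls.

```c
/* Hypothetical sketch. Returns how many post-barrier reads did not see 5;
   *early_out receives the value read concurrently with the write. */
int observe_x(int *early_out)
{
    int x = 2;
    int early = 0;
    int stale = 0;

    #pragma omp parallel num_threads(2) shared(x, early) reduction(+: stale)
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                /* may run concurrently with the read below */
                #pragma omp atomic write
                x = 5;
            }
            #pragma omp section
            {
                /* sees either 2 or 5, depending on timing */
                #pragma omp atomic read
                early = x;
            }
        }   /* implicit barrier, which includes flushes */

        /* No thread modifies x any more, so a plain read is race-free. */
        if (x != 5)
            stale++;
    }

    *early_out = early;
    return stale;
}
```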
|
||||
\ffreeexample{mem_model}{1}
|
||||
\cexample[3.1]{mem_model}{1}
|
||||
|
||||
\ffreeexample[3.1]{mem_model}{1}
|
||||
|
||||
\pagebreak
|
||||
The following example demonstrates why synchronization is difficult to perform
|
||||
correctly through variables. The value of flag is undefined in both prints on thread
|
||||
1 and the value of data is only well-defined in the second print.
|
||||
The following example demonstrates why synchronization is difficult to perform
|
||||
correctly through variables. The write to \code{flag} on thread 0 and the read
|
||||
from \code{flag} in the loop on thread 1 must be atomic to avoid a data race.
|
||||
When thread 1 breaks out of the loop, \code{flag} will have the value of 1.
|
||||
However, \code{data} will still be undefined at the first print statement. Only
|
||||
after the flush of both \code{flag} and \code{data} after the first print
|
||||
statement will \code{data} have the well-defined value of 42.
|
||||
|
||||
\cexample{mem_model}{2}
|
||||
\cexample[3.1]{mem_model}{2}
|
||||
|
||||
\fexample{mem_model}{2}
|
||||
\fexample[3.1]{mem_model}{2}
|
||||
|
||||
\pagebreak
|
||||
The next example demonstrates why synchronization is difficult to perform correctly
|
||||
through variables. Because the \plc{write}(1)-\plc{flush}(1)-\plc{flush}(2)-\plc{read}(2)
|
||||
sequence cannot be guaranteed in the example, the statements on thread 0 and thread
|
||||
1 may execute in either order.
|
||||
The next example demonstrates why synchronization is difficult to perform
|
||||
correctly through variables. As in the preceding example, the updates to
|
||||
\code{flag} and the reading of \code{flag} in the loops on threads 1 and 2 are
|
||||
performed atomically to avoid data races on \code{flag}. However, the code still
|
||||
contains a data race due to the incorrect use of ``flush with a list'' after the
|
||||
assignment to \code{data1} on thread 1. By not including \code{flag} in the
|
||||
flush-set of that \code{flush} directive, the assignment can be reordered with
|
||||
respect to the subsequent atomic update to \code{flag}. Consequently,
|
||||
\code{data1} is undefined at the print statement on thread 2.
|
||||
|
||||
\cexample{mem_model}{3}
|
||||
\cexample[3.1]{mem_model}{3}
|
||||
|
||||
\fexample{mem_model}{3}
|
||||
\fexample[3.1]{mem_model}{3}
|
||||
|
||||
|
||||
|
@ -28,9 +28,9 @@ directive as selector set, has traits of \plc{kind}, \plc{isa} and \plc{arch}.
|
||||
|
||||
|
||||
|
||||
\cexample{metadirective}{1}
|
||||
\cexample[5.0]{metadirective}{1}
|
||||
|
||||
\ffreeexample{metadirective}{1}
|
||||
\ffreeexample[5.0]{metadirective}{1}
|
||||
|
||||
%\pagebreak
|
||||
In the second example, the \plc{implementation} selector set is specified
|
||||
@ -47,9 +47,9 @@ traits. Otherwise, just the \code{teams} construct is used without
|
||||
any clauses, as prescribed by the \code{default} clause.
|
||||
|
||||
|
||||
\cexample{metadirective}{2}
|
||||
\cexample[5.0]{metadirective}{2}
|
||||
|
||||
\ffreeexample{metadirective}{2}
|
||||
\ffreeexample[5.0]{metadirective}{2}
|
||||
|
||||
\clearpage
|
||||
|
||||
@ -83,6 +83,6 @@ the \code{target}~\code{teams} construct has been hoisted out of the function, a
|
||||
as the \plc{variant} directive of the \code{metadirective} directive within the function.
|
||||
%%%%%%%%
|
||||
|
||||
\cexample{metadirective}{3}
|
||||
\cexample[5.0]{metadirective}{3}
|
||||
|
||||
\ffreeexample{metadirective}{3}
|
||||
\ffreeexample[5.0]{metadirective}{3}
|
||||
|
@ -28,6 +28,6 @@ with appropriate restrictions. The combination of the \code{parallel}~\code{mast
|
||||
with the \code{taskloop} or \code{taskloop}~\code{simd} construct produces no additional
|
||||
restrictions.
|
||||
|
||||
\cexample{parallel_master_taskloop}{1}
|
||||
\cexample[5.0]{parallel_master_taskloop}{1}
|
||||
|
||||
\ffreeexample{parallel_master_taskloop}{1}
|
||||
\ffreeexample[5.0]{parallel_master_taskloop}{1}
|
||||
|
@ -5,7 +5,7 @@
|
||||
|
||||
The following example shows a parallel random access iterator loop.
|
||||
|
||||
\cppnexample{pra_iterator}{1}
|
||||
\cppnexample[3.0]{pra_iterator}{1}
|
||||
\cppspecificend
|
||||
|
||||
|
||||
|
@ -12,7 +12,7 @@ The following example demonstrates the \code{reduction} clause; note that some
|
||||
reductions can be expressed in the loop in several ways, as shown for the \code{max}
|
||||
and \code{min} reductions below:
|
||||
|
||||
\cexample{reduction}{1}
|
||||
\cexample[3.1]{reduction}{1}
|
||||
|
||||
\pagebreak
|
||||
|
||||
@ -66,38 +66,148 @@ the start of the \code{parallel} region.
|
||||
|
||||
\fexample{reduction}{6}
|
||||
|
||||
The following example demonstrates the reduction of array \plc{a}. In C/C++ this is illustrated by the explicit use of an array section \plc{a[0:N]} in the \code{reduction} clause. The corresponding Fortran example uses array syntax supported in the base language. As of the OpenMP 4.5 specification the explicit use of array section in the \code{reduction} clause in Fortran is not permitted. But this oversight will be fixed in the next release of the specification.
|
||||
The following example demonstrates the reduction of array \plc{a}. In C/C++ this is illustrated by the explicit use of an array section \plc{a[0:N]} in the \code{reduction} clause. The corresponding Fortran example uses array syntax supported in the base language. As of the OpenMP 4.5 specification, the explicit use of an array section in the \code{reduction} clause in Fortran is not permitted, but this oversight has been fixed in the OpenMP 5.0 specification.
|
||||
|
||||
|
||||
\cexample{reduction}{7}
|
||||
\cexample[4.5]{reduction}{7}
|
||||
|
||||
\ffreeexample{reduction}{7}
|
||||
|
||||
|
||||
\subsection{Task Reduction}
|
||||
\label{subsec:task_reduction}
|
||||
|
||||
The following C/C++ and Fortran examples show how to implement
|
||||
a task reduction over a linked list.
|
||||
In OpenMP 5.0 the \code{task\_reduction} clause was created for the \code{taskgroup} construct,
|
||||
to allow reductions among explicit tasks that have an \code{in\_reduction} clause.
|
||||
|
||||
Task reductions are supported by the \code{task\_reduction} clause which can only be
|
||||
applied to the \code{taskgroup} directive, and an \code{in\_reduction} clause
|
||||
which can be applied to the \code{task} construct among others.
|
||||
|
||||
The \code{task\_reduction} clause on the \code{taskgroup} construct is used to
|
||||
define the scope of a new reduction, and after the \code{taskgroup}
|
||||
region the original variable will contain the final value of the reduction.
|
||||
In the task-generating while loop the \code{in\_reduction} clause of the \code{task}
|
||||
construct is used to specify that the task participates ``in'' the reduction.
|
||||
In the \plc{task\_reduction.1} example below a reduction is performed as the algorithm
|
||||
traverses a linked list. The reduction statement is assigned to be an explicit task using
|
||||
a \code{task} construct and is specified to be a reduction participant with
|
||||
the \code{in\_reduction} clause.
|
||||
A \code{taskgroup} construct encloses the tasks participating in the reduction, and
|
||||
specifies, with the \code{task\_reduction} clause, that the taskgroup has tasks participating
|
||||
in a reduction. After the \code{taskgroup} region the original variable will contain
|
||||
the final value of the reduction.
|
||||
|
||||
Note: The \plc{res} variable is private in the \plc{linked\_list\_sum} routine
|
||||
and is not required to be shared (as in the case of a \code{parallel} construct
|
||||
reduction).
|
||||
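A minimal sketch of this pattern is given below. It is written under assumptions: the \code{node\_t} type and the driver \code{parallel\_list\_sum} are hypothetical, and the code is not the \plc{task\_reduction.1} source itself.

```c
#include <stddef.h>

/* Hypothetical list node type, introduced for illustration only. */
typedef struct node {
    int val;
    struct node *next;
} node_t;

int linked_list_sum(node_t *p)
{
    int res = 0;

    /* The taskgroup defines the scope of the reduction on res. */
    #pragma omp taskgroup task_reduction(+: res)
    {
        for (node_t *aux = p; aux != NULL; aux = aux->next) {
            /* Each explicit task participates in that reduction. */
            #pragma omp task in_reduction(+: res) firstprivate(aux)
            res += aux->val;
        }
    }   /* res holds the final value after the taskgroup region */

    return res;
}

/* Hypothetical driver: one thread generates the tasks, the team runs them. */
int parallel_list_sum(node_t *head)
{
    int total = 0;

    #pragma omp parallel
    #pragma omp single
    total = linked_list_sum(head);

    return total;
}
```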
|
||||
|
||||
\cexample{task_reduction}{1}
|
||||
\cexample[5.0]{task_reduction}{1}
|
||||
|
||||
\ffreeexample{task_reduction}{1}
|
||||
\ffreeexample[5.0]{task_reduction}{1}
|
||||
|
||||
In OpenMP 5.0 the \code{task} \plc{reduction-modifier} for the \code{reduction} clause was
|
||||
introduced to provide a means of performing reductions among implicit and explicit tasks.
|
||||
|
||||
The \code{reduction} clause of a \code{parallel} or worksharing construct may
|
||||
specify the \code{task} \plc{reduction-modifier} to include explicit task reductions
|
||||
within their region, provided the reduction operators (\plc{reduction-identifiers})
|
||||
and variables (\plc{list items}) of the participating tasks match those of the
|
||||
implicit tasks.
|
||||
|
||||
There are 2 reduction use cases (identified by USE CASE \#) in the \plc{task\_reduction.2} example below.
|
||||
|
||||
In USE CASE 1 a \code{task} modifier in the \code{reduction} clause
|
||||
of the \code{parallel} construct is used to include the reductions of any
|
||||
participating tasks, those with an \code{in\_reduction} clause and matching
|
||||
\plc{reduction-identifiers} (\code{+}) and list items (\code{x}).
|
||||
|
||||
Note, a \code{taskgroup} construct (with a \code{task\_reduction} clause) is not
|
||||
necessary to scope the explicit task reduction (as seen in the example above).
|
||||
Hence, even without the implicit task reduction statement (without the C \code{x++\;}
|
||||
and Fortran \code{x=x+1} statements), the \code{task} \plc{reduction-modifier}
|
||||
in a \code{reduction} clause of the \code{parallel} construct
|
||||
can be used to avoid having to create a \code{taskgroup} construct
|
||||
(and its \code{task\_reduction} clause) around the task generating structure.
|
||||
|
||||
In USE CASE 2 tasks participating in the reduction are within a
|
||||
worksharing region (a parallel worksharing-loop construct).
|
||||
Here, too, no \code{taskgroup} is required, and the \plc{reduction-identifier} (\code{+})
|
||||
and list item (variable \code{x}) match as required.
|
||||
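The first use case, without the implicit task reduction statement, could look roughly like the sketch below. The function \code{count\_tasks} is a hypothetical name; only the explicit tasks contribute, so the result does not depend on the number of threads in the team.

```c
/* Hypothetical sketch of a task reduction scoped by the parallel construct. */
int count_tasks(int n)
{
    int x = 0;

    /* The task modifier lets explicit tasks join this reduction,
       so no enclosing taskgroup with task_reduction is needed. */
    #pragma omp parallel reduction(task, +: x)
    {
        #pragma omp single
        for (int i = 0; i < n; i++) {
            /* same reduction-identifier (+) and list item (x) */
            #pragma omp task in_reduction(+: x)
            x++;
        }
    }

    return x;
}
```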
|
||||
|
||||
\cexample[5.0]{task_reduction}{2}
|
||||
|
||||
\ffreeexample[5.0]{task_reduction}{2}
|
||||
|
||||
|
||||
\subsection{Reduction on Combined Target Constructs}
|
||||
\label{subsec:target_reduction}
|
||||
|
||||
When a \code{reduction} clause appears on a combined construct that combines
|
||||
a \code{target} construct with another construct, there is an implicit map
|
||||
of the list items with a \code{tofrom} map type for the \code{target} construct.
|
||||
Otherwise, the list items (if they are scalar variables) would be
|
||||
treated as firstprivate by default in the \code{target} construct, which
|
||||
is unlikely to provide the intended behavior since the result of the
|
||||
reduction that is in the firstprivate variable would be discarded
|
||||
at the end of the \code{target} region.
|
||||
|
||||
In the following example, the use of the \code{reduction} clause on \code{sum1}
|
||||
or \code{sum2} should, by default, result in an implicit \code{tofrom} map for
|
||||
that variable. So long as neither \code{sum1} nor \code{sum2} were already
|
||||
present on the device, the mapping behavior ensures the value for
|
||||
\code{sum1} computed in the first \code{target} construct is used in the
|
||||
second \code{target} construct.
|
||||
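The implicit \code{tofrom} mapping can be illustrated with a single combined construct. This is a sketch only: \code{sum\_on\_device} is a hypothetical function, and an implementation that predates OpenMP 5.0 may need an explicit \code{map(tofrom: sum)}.

```c
/* Hypothetical sketch: reduction on a combined target construct. */
double sum_on_device(const double *a, int n)
{
    double sum = 0.0;

    /* sum is not listed in a map clause; the reduction clause on the
       combined construct is expected to imply map(tofrom: sum). */
    #pragma omp target teams distribute parallel for \
            map(to: a[0:n]) reduction(+: sum)
    for (int i = 0; i < n; i++)
        sum += a[i];

    return sum;   /* the reduced value is available on the host */
}
```

If no device is available, the region is expected to fall back to host execution with the same result.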
|
||||
\cexample[5.0]{target_reduction}{1}
|
||||
|
||||
\ffreeexample[5.0]{target_reduction}{1}
|
||||
\clearpage
|
||||
|
||||
In the next example, the variables \code{sum1} and \code{sum2} remain on the
|
||||
device for the duration of the \code{target}~\code{data} region so that it is
|
||||
their device copies that are updated by the reductions. Note the significance
|
||||
of mapping \code{sum1} on the second \code{target} construct; otherwise, it
|
||||
would be treated by default as firstprivate and the result computed for
|
||||
\code{sum1} in the prior \code{target} region may not be used. Alternatively, a
|
||||
\code{target}~\code{update} construct could be used between the two
|
||||
\code{target} constructs to update the host version of \code{sum1} with the
|
||||
value that is in the corresponding device version after the completion of the
|
||||
first construct.
|
||||
|
||||
\cexample[5.0]{target_reduction}{2}
|
||||
|
||||
\ffreeexample[5.0]{target_reduction}{2}
|
||||
|
||||
|
||||
\subsection{Task Reduction with Target Constructs}
|
||||
\label{subsec:target_task_reduction}
|
||||
|
||||
The following examples illustrate how task reductions can apply to target tasks
|
||||
that result from a \code{target} construct with the \code{in\_reduction}
|
||||
clause. Here, the \code{in\_reduction} clause specifies that the target task
|
||||
participates in the task reduction defined in the scope of the enclosing
|
||||
\code{taskgroup} construct. Partial results from all tasks participating in the
|
||||
task reduction will be combined (in some order) into the original variable
|
||||
listed in the \code{task\_reduction} clause before exiting the \code{taskgroup}
|
||||
region.
|
||||
|
||||
\cexample[5.0]{target_task_reduction}{1}
|
||||
|
||||
\ffreeexample[5.0]{target_task_reduction}{1}
|
||||
|
||||
In the next pair of examples, the task reduction is defined by a
|
||||
\code{reduction} clause with the \code{task} modifier, rather than a
|
||||
\code{task\_reduction} clause on a \code{taskgroup} construct. Again, the
|
||||
partial results from the participating tasks will be combined in some order
|
||||
into the original reduction variable, \code{sum}.
|
||||
|
||||
\cexample[5.0]{target_task_reduction}{2a}
|
||||
|
||||
\ffreeexample[5.0]{target_task_reduction}{2a}
|
||||
|
||||
Next, the \code{task} modifier is again used to define a task reduction over
|
||||
participating tasks. This time, the participating tasks are a target task
|
||||
resulting from a \code{target} construct with the \code{in\_reduction} clause,
|
||||
and the implicit task (executing on the master thread) that calls
|
||||
\code{host\_compute}. As before, the partial results from these participating
|
||||
tasks are combined in some order into the original reduction variable.
|
||||
|
||||
\cexample[5.0]{target_task_reduction}{2b}
|
||||
|
||||
\ffreeexample[5.0]{target_task_reduction}{2b}
|
||||
|
||||
|
||||
\subsection{Taskloop Reduction}
|
||||
@ -121,8 +231,8 @@ that if we add the \code{nogroup} clause to the \code{taskloop} construct the co
|
||||
nonconforming, basically because we have a set of tasks that participate in a
|
||||
reduction that has not been defined.
|
||||
|
||||
\cexample{taskloop_reduction}{1}
|
||||
\ffreeexample{taskloop_reduction}{1}
|
||||
\cexample[5.0]{taskloop_reduction}{1}
|
||||
\ffreeexample[5.0]{taskloop_reduction}{1}
|
||||
|
||||
%In the second example, we are computing exactly the same
|
||||
%value but we do it in a very different way. The first thing that we do in the
|
||||
@ -154,8 +264,9 @@ declared reduction (\code{in\_reduction} clause) whereas in the other case
|
||||
creation of a new reduction is specified and also that all tasks generated
|
||||
by the taskloop will participate in it.
|
||||
|
||||
\cexample{taskloop_reduction}{2}
|
||||
\ffreeexample{taskloop_reduction}{2}
|
||||
\cexample[5.0]{taskloop_reduction}{2}
|
||||
\ffreeexample[5.0]{taskloop_reduction}{2}
|
||||
\clearpage
|
||||
|
||||
In the OpenMP 5.0 Specification, \code{reduction} clauses for the
|
||||
\code{taskloop}~\code{simd} construct were also added.
|
||||
@ -228,10 +339,8 @@ At the end of the parallel region \plc{asum} contains the combined result of all
|
||||
%At the end of the parallel region \plc{asum} contains the combined result of all reductions.
|
||||
|
||||
|
||||
\cexample{taskloop_simd_reduction}{1}
|
||||
\cexample[5.0]{taskloop_simd_reduction}{1}
|
||||
|
||||
\ffreeexample{taskloop_simd_reduction}{1}
|
||||
\ffreeexample[5.0]{taskloop_simd_reduction}{1}
|
||||
|
||||
|
||||
|
||||
% All other reductions
|
||||
|
@ -26,6 +26,6 @@ not updated on the host.
|
||||
|
||||
%\pagebreak
|
||||
|
||||
\cppexample{requires}{1}
|
||||
\cppexample[5.0]{requires}{1}
|
||||
|
||||
\ffreeexample{requires}{1}
|
||||
\ffreeexample[5.0]{requires}{1}
|
||||
|
38
Examples_scan.tex
Normal file
@ -0,0 +1,38 @@
|
||||
\pagebreak
|
||||
\section{The \code{scan} Directive}
|
||||
\label{sec:scan}
|
||||
|
||||
The following examples illustrate how to parallelize a loop that saves
|
||||
the \emph{prefix sum} of a reduction. This is accomplished by using
|
||||
the \code{inscan} modifier in the \code{reduction} clause for the input
|
||||
variable of the scan, and specifying with a \code{scan} directive whether
|
||||
the storage statement includes or excludes the scan input of the present
|
||||
iteration (\texttt{k}).
|
||||
|
||||
Basically, the \code{inscan} modifier connects a loop and/or SIMD reduction to
|
||||
the scan operation, and a \code{scan} construct with an \code{inclusive} or
|
||||
\code{exclusive} clause specifies whether the ``scan phase'' (lexical block
|
||||
before and after the directive, respectively) is to use an \plc{inclusive} or
|
||||
\plc{exclusive} scan value for the list item (\texttt{x}).
|
||||
|
||||
The first example uses the \plc{inclusive} scan operation on a composite
|
||||
loop-SIMD construct. The \code{scan} directive separates the reduction
|
||||
statement on variable \texttt{x} from the use of \texttt{x} (saving to array \texttt{b}).
|
||||
The order of the statements in this example indicates that
|
||||
value \texttt{a[k]} (\texttt{a(k)} in Fortran) is included in the computation of
|
||||
the prefix sum \texttt{b[k]} (\texttt{b(k)} in Fortran) for iteration \texttt{k}.
|
||||
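A compact sketch of the inclusive form follows. The function \code{inclusive\_prefix\_sum} and its integer arrays are assumptions made for illustration and differ from the example source.

```c
/* Hypothetical sketch: b[k] = a[0] + ... + a[k]. */
void inclusive_prefix_sum(const int *a, int *b, int n)
{
    int x = 0;

    #pragma omp parallel for simd reduction(inscan, +: x)
    for (int k = 0; k < n; k++) {
        x += a[k];                  /* reduction statement (input phase) */
        #pragma omp scan inclusive(x)
        b[k] = x;                   /* use of x, which includes a[k] */
    }
}
```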
|
||||
\cexample[5.0]{scan}{1}
|
||||
|
||||
\ffreeexample[5.0]{scan}{1}
|
||||
|
||||
The second example uses the \plc{exclusive} scan operation on a composite
|
||||
loop-SIMD construct. The \code{scan} directive separates the use of \texttt{x}
|
||||
(saving to array \texttt{b}) from the reduction statement on variable \texttt{x}.
|
||||
The order of the statements in this example indicates that
|
||||
value \texttt{a[k]} (\texttt{a(k)} in Fortran) is excluded from the computation
|
||||
of the prefix sum \texttt{b[k]} (\texttt{b(k)} in Fortran) for iteration \texttt{k}.
|
||||
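For comparison, the exclusive form might be sketched as below. The function \code{exclusive\_prefix\_sum} is a hypothetical name; note that the use of \code{x} now precedes the \code{scan} directive.

```c
/* Hypothetical sketch: b[k] = a[0] + ... + a[k-1], with b[0] = 0. */
void exclusive_prefix_sum(const int *a, int *b, int n)
{
    int x = 0;

    #pragma omp parallel for simd reduction(inscan, +: x)
    for (int k = 0; k < n; k++) {
        b[k] = x;                   /* use of x, which excludes a[k] */
        #pragma omp scan exclusive(x)
        x += a[k];                  /* reduction statement (input phase) */
    }
}
```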
|
||||
\cexample[5.0]{scan}{2}
|
||||
|
||||
\ffreeexample[5.0]{scan}{2}
|
@ -7,7 +7,7 @@ The following example is non-conforming, because the \code{flush}, \code{barrier
|
||||
\code{taskwait}, and \code{taskyield} directives are stand-alone directives
|
||||
and cannot be the immediate substatement of an \code{if} statement.
|
||||
|
||||
\cexample{standalone}{1}
|
||||
\cexample[3.1]{standalone}{1}
|
||||
|
||||
\pagebreak
|
||||
The following example is non-conforming, because the \code{flush}, \code{barrier},
|
||||
@ -15,19 +15,19 @@ The following example is non-conforming, because the \code{flush}, \code{barrier
|
||||
and cannot be the action statement of an \code{if} statement or a labeled branch
|
||||
target.
|
||||
|
||||
\ffreeexample{standalone}{1}
|
||||
\ffreeexample[3.1]{standalone}{1}
|
||||
|
||||
The following version of the above example is conforming because the \code{flush},
|
||||
\code{barrier}, \code{taskwait}, and \code{taskyield} directives are enclosed
|
||||
in a compound statement.
|
||||
|
||||
\cexample{standalone}{2}
|
||||
\cexample[3.1]{standalone}{2}
|
||||
|
||||
\pagebreak
|
||||
The following example is conforming because the \code{flush}, \code{barrier},
|
||||
\code{taskwait}, and \code{taskyield} directives are enclosed in an \code{if}
|
||||
construct or follow the labeled branch target.
|
||||
|
||||
\ffreeexample{standalone}{2}
|
||||
\ffreeexample[3.1]{standalone}{2}
|
||||
|
||||
|
||||
|
@ -9,9 +9,9 @@ This following example shows how the \code{target} construct offloads a code
|
||||
region to a target device. The variables \plc{p}, \plc{v1}, \plc{v2}, and \plc{N} are implicitly mapped
|
||||
to the target device.
|
||||
|
||||
\cexample{target}{1}
|
||||
\cexample[4.0]{target}{1}
|
||||
|
||||
\ffreeexample{target}{1}
|
||||
\ffreeexample[4.0]{target}{1}
|
||||
|
||||
\subsection{\code{target} Construct with \code{map} Clause}
|
||||
\label{subsec:target_map}
|
||||
@ -21,9 +21,9 @@ region to a target device. The variables \plc{p}, \plc{v1} and \plc{v2} are expl
|
||||
target device using the \code{map} clause. The variable \plc{N} is implicitly mapped to
|
||||
the target device.
|
||||
|
||||
\cexample{target}{2}
|
||||
\cexample[4.0]{target}{2}
|
||||
|
||||
\ffreeexample{target}{2}
|
||||
\ffreeexample[4.0]{target}{2}
|
||||
|
||||
\subsection{\code{map} Clause with \code{to}/\code{from} map-types}
|
||||
\label{subsec:target_map_tofrom}
|
||||
@ -46,14 +46,14 @@ the variable \plc{p} is not initialized with the value of the corresponding vari
|
||||
on the host device, and at the end of the \code{target} region the variable \plc{p}
|
||||
is assigned to the corresponding variable on the host device.
|
||||
|
||||
\cexample{target}{3}
|
||||
\cexample[4.0]{target}{3}
|
||||
|
||||
The \code{to} and \code{from} map-types allow programmers to optimize data
|
||||
motion. Since data for the \plc{v} arrays are not returned, and data for the \plc{p} array
|
||||
are not transferred to the device, only one-half of the data is moved, compared
|
||||
to the default behavior of an implicit mapping.
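The data-motion saving can be sketched as follows. This is an illustrative fragment, not the text of the numbered example; the function name \plc{vec\_mult\_tofrom} and the parameter \plc{n} are assumptions.

```c
/* Hypothetical name; sketches the to/from map-types described above. */
void vec_mult_tofrom(double *p, const double *v1, const double *v2, int n)
{
    /* v1 and v2 are only read in the region: copy them in, never back.
       p is only written in the region: no copy in, copy back at exit. */
    #pragma omp target map(to: v1[0:n], v2[0:n]) map(from: p[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        p[i] = v1[i] * v2[i];
}
```

With an implicit \code{tofrom} mapping all three arrays would be transferred in both directions, whereas here each array is moved once.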
|
||||
|
||||
\ffreeexample{target}{3}
|
||||
\ffreeexample[4.0]{target}{3}
|
||||
|
||||
\subsection{\code{map} Clause with Array Sections}
|
||||
\label{subsec:target_array_section}
|
||||
@ -64,14 +64,15 @@ the mapping of variables to the target device. Because variables \plc{p}, \plc{v
|
||||
pointers, array section notation must be used to map the arrays. The notation \code{:N}
|
||||
is equivalent to \code{0:N}.
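A short sketch of the array section notation with an explicit lower bound is given here. The function \plc{scale\_section} and its parameters are hypothetical; only the elements \plc{v[lo]} through \plc{v[lo+len-1]} are mapped and accessed.

```c
/* Hypothetical helper: maps only the section v[lo:len] of a pointed-to array. */
void scale_section(double *v, int lo, int len, double factor)
{
    /* The section is written as lower-bound:length. Omitting the lower
       bound, as in v[:len], would start the section at element 0. */
    #pragma omp target map(tofrom: v[lo:len])
    for (int i = lo; i < lo + len; i++)
        v[i] *= factor;
}
```

Elements outside the section are neither transferred nor touched, so the caller sees them unchanged.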
|
||||
|
||||
\cexample{target}{4}
|
||||
\cexample[4.0]{target}{4}
|
||||
\clearpage
|
||||
|
||||
In C, the length of the pointed-to array must be specified. In Fortran the extent
|
||||
of the array is known and the length need not be specified. A section of the array
|
||||
can be specified with the usual Fortran syntax, as shown in the following example.
|
||||
The value 1 is assumed for the lower bound for array section \plc{v2(:N)}.
|
||||
|
||||
\ffreeexample{target}{4}
|
||||
\ffreeexample[4.0]{target}{4}
|
||||
|
||||
A more realistic situation in which an assumed-size array is passed to \code{vec\_mult}
|
||||
requires that the length of the arrays be specified, because the compiler does
|
||||
@ -79,7 +80,7 @@ not know the size of the storage. A section of the array must be specified with
|
||||
the usual Fortran syntax, as shown in the following example. The value 1 is assumed
|
||||
for the lower bound for array section \plc{v2(:N)}.
|
||||
|
||||
\ffreeexample{target}{4b}
|
||||
\ffreeexample[4.0]{target}{4b}
|
||||
|
||||
\subsection{\code{target} Construct with \code{if} Clause}
|
||||
\label{subsec:target_if}
|
||||
@ -95,9 +96,9 @@ The \code{if} clause on the \code{parallel} construct indicates that if the
|
||||
variable \plc{N} is smaller than a second threshold then the \code{parallel} region
|
||||
is inactive.
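The two thresholds can be sketched as below. The macro values and the function name \plc{vec\_mult\_if} are assumptions chosen for illustration and are not taken from the numbered example.

```c
/* Assumed threshold values, for illustration only. */
#define THRESHOLD1 1000000
#define THRESHOLD2 1000

/* Hypothetical name; offloads only for large n, parallelizes only above
   the second threshold. */
void vec_mult_if(double *p, const double *v1, const double *v2, int n)
{
    /* if(false) on target: the region executes on the host device. */
    #pragma omp target if(n > THRESHOLD1) \
                       map(to: v1[0:n], v2[0:n]) map(from: p[0:n])
    /* if(false) on parallel: the region is inactive (a team of one thread). */
    #pragma omp parallel for if(n > THRESHOLD2)
    for (int i = 0; i < n; i++)
        p[i] = v1[i] * v2[i];
}
```

The result is the same whichever branch is taken; the clauses only select where and with how many threads the loop runs.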
|
||||
|
||||
\cexample{target}{5}
|
||||
\cexample[4.0]{target}{5}
|
||||
|
||||
\ffreeexample{target}{5}
|
||||
\ffreeexample[4.0]{target}{5}
|
||||
|
||||
The following example is a modification of the above \plc{target.5} code to show the combined \code{target}
|
||||
and parallel loop directives. It uses the \plc{directive-name} modifier in multiple \code{if}
|
||||
@ -107,9 +108,9 @@ The \code{if} clause with the \code{target} modifier applies to the \code{target
|
||||
combined directive, and the \code{if} clause with the \code{parallel} modifier applies
|
||||
to the \code{parallel} component of the combined directive.
|
||||
|
||||
\cexample{target}{6}
|
||||
\cexample[4.5]{target}{6}
|
||||
|
||||
\ffreeexample{target}{6}
|
||||
\ffreeexample[4.5]{target}{6}
|
||||
|
||||
\subsection{target Reverse Offload}
|
||||
\label{subsec:target_reverse_offload}
|
||||
@ -139,6 +140,6 @@ function. This feature may be necessary if the function
|
||||
exists in another compile unit.
|
||||
|
||||
|
||||
\cexample{target_reverse_offload}{7}
|
||||
\cexample[5.0]{target_reverse_offload}{7}
|
||||
|
||||
\ffreeexample{target_reverse_offload}{7}
|
||||
\ffreeexample[5.0]{target_reverse_offload}{7}
|
||||
|
@ -14,14 +14,14 @@ variables \plc{v1}, \plc{v2}, and \plc{p} from the enclosing device data environ
|
||||
\plc{N} is mapped into the new device data environment from the encountering task's data
|
||||
environment.
|
||||
|
||||
\cexample{target_data}{1}
|
||||
\cexample[4.0]{target_data}{1}
|
||||
|
||||
\pagebreak
|
||||
The Fortran code passes a reference and specifies the extent of the arrays in the
|
||||
declaration. No length information is necessary in the map clause, as is required
|
||||
with C/C++ pointers.
|
||||
|
||||
\ffreeexample{target_data}{1}
|
||||
\ffreeexample[4.0]{target_data}{1}
|
||||
|
||||
\subsection{\code{target} \code{data} Region Enclosing Multiple \code{target} Regions}
|
||||
\label{subsec:target_data_multiregion}
|
||||
@ -39,7 +39,7 @@ In the following example the variables \plc{v1} and \plc{v2} are mapped at each
|
||||
construct. Instead of mapping the variable \plc{p} twice, once at each \code{target}
|
||||
construct, \plc{p} is mapped once by the \code{target} \code{data} construct.
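A compact sketch of this pattern follows. The function name \plc{vec\_mult\_two\_phase} is hypothetical, and the host loop that increments the inputs stands in for whatever re-initialization a real code would perform between the two regions.

```c
/* Hypothetical name; p is mapped once, v1 and v2 at each target construct. */
void vec_mult_two_phase(double *p, double *v1, double *v2, int n)
{
    #pragma omp target data map(from: p[0:n])
    {
        #pragma omp target map(to: v1[0:n], v2[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            p[i] = v1[i] * v2[i];

        /* Host code changes the inputs between the two target regions. */
        for (int i = 0; i < n; i++) {
            v1[i] += 1.0;
            v2[i] += 1.0;
        }

        /* Mapping v1 and v2 again transfers the new host values. */
        #pragma omp target map(to: v1[0:n], v2[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            p[i] = p[i] + v1[i] * v2[i];
    }   /* p[0:n] is copied back to the host once, here */
}
```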
|
||||
|
||||
\cexample{target_data}{2}
|
||||
\cexample[4.0]{target_data}{2}
|
||||
|
||||
|
||||
The Fortran code uses reference and specifies the extent of the \plc{p}, \plc{v1} and \plc{v2} arrays.
|
||||
@ -48,7 +48,7 @@ C/C++ pointers. The arrays \plc{v1} and \plc{v2} are mapped at each \code{target
|
||||
Instead of mapping the array \plc{p} twice, once at each target construct, \plc{p} is mapped
|
||||
once by the \code{target} \code{data} construct.
|
||||
|
||||
\ffreeexample{target_data}{2}
|
||||
\ffreeexample[4.0]{target_data}{2}
|
||||
|
||||
In the following example, the array \plc{Q} is mapped once at the enclosing
|
||||
\code{target}~\code{data} region instead of at each \code{target} construct.
|
||||
@ -58,9 +58,9 @@ the \code{tofrom} map-type at the first \code{target} construct in order to retu
|
||||
its reduced value from the parallel loop construct to the host.
|
||||
The variable defaults to firstprivate at the second \code{target} construct.
|
||||
|
||||
\cexample{target_data}{3}
|
||||
\cexample[4.0]{target_data}{3}
|
||||
|
||||
\ffreeexample{target_data}{3}
|
||||
\ffreeexample[4.0]{target_data}{3}
|
||||
|
||||
\subsection{\code{target} \code{data} Construct with Orphaned Call}
|
||||
|
||||
@ -87,7 +87,7 @@ of the storage location associated with their corresponding array sections. Note
|
||||
that the following pairs of array section storage locations are equivalent (\plc{p0[:N]},
|
||||
\plc{p1[:N]}), (\plc{v1[:N]},\plc{v3[:N]}), and (\plc{v2[:N]},\plc{v4[:N]}).
|
||||
|
||||
\cexample{target_data}{4}
|
||||
\cexample[4.0]{target_data}{4}
|
||||
|
||||
The Fortran code maps the pointers and storage in an identical manner (same extent,
|
||||
but uses indices from 1 to \plc{N}).
|
||||
@ -103,7 +103,7 @@ assigned the address of the storage location associated with their corresponding
|
||||
array sections. Note that the following pairs of array storage locations are equivalent
|
||||
(\plc{p0},\plc{p1}), (\plc{v1},\plc{v3}), and (\plc{v2},\plc{v4}).
|
||||
|
||||
\ffreeexample{target_data}{4}
|
||||
\ffreeexample[4.0]{target_data}{4}
|
||||
|
||||
|
||||
In the following example, the variables \plc{p1}, \plc{v3}, and \plc{v4} are references to the pointer
|
||||
@ -112,7 +112,7 @@ environment inherits the pointer variables \plc{p0}, \plc{v1}, and \plc{v2} from
|
||||
\code{data} construct's device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already
|
||||
present in the device data environment.
|
||||
|
||||
\cppexample{target_data}{5}
|
||||
\cppexample[4.0]{target_data}{5}
|
||||
|
||||
In the following example, the usual Fortran approach is used for dynamic memory.
|
||||
The \plc{p0}, \plc{v1}, and \plc{v2} arrays are allocated in the main program and passed as references
|
||||
@ -122,7 +122,7 @@ environment inherits the arrays \plc{p0}, \plc{v1}, and \plc{v2} from the enclos
|
||||
device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already present in the device
|
||||
data environment.
|
||||
|
||||
\ffreeexample{target_data}{5}
|
||||
\ffreeexample[4.0]{target_data}{5}
|
||||
|
||||
\subsection{\code{target} \code{data} Construct with \code{if} Clause}
|
||||
\label{subsec:target_data_if}
|
||||
@ -140,7 +140,7 @@ variable \plc{p} is implicitly mapped with a map-type of \code{tofrom}, but the
|
||||
location for the array section \plc{p[0:N]} will not be mapped in the device data environments
|
||||
of the \code{target} constructs.
|
||||
|
||||
\cexample{target_data}{6}
|
||||
\cexample[4.0]{target_data}{6}
|
||||
|
||||
\pagebreak
|
||||
The \code{if} clauses work the same way for the following Fortran code. The \code{target}
|
||||
@ -149,7 +149,7 @@ an \code{if} clause with the same condition, so that the \code{target} \code{dat
|
||||
region and the \code{target} region are either both created for the device, or
|
||||
are both ignored.
|
||||
|
||||
\ffreeexample{target_data}{6}
|
||||
\ffreeexample[4.0]{target_data}{6}
|
||||
|
||||
\pagebreak
|
||||
In the following example, when the \code{if} clause conditional expression on
|
||||
@ -161,7 +161,7 @@ region the array section \plc{p[0:N]} will be assigned from the device data envi
|
||||
to the corresponding variable in the data environment of the task that encountered
|
||||
the \code{target} \code{data} construct, resulting in undefined values in \plc{p[0:N]}.
|
||||
|
||||
\cexample{target_data}{7}
|
||||
\cexample[4.0]{target_data}{7}
|
||||
|
||||
\pagebreak
|
||||
The \code{if} clauses work the same way for the following Fortran code. When
|
||||
@ -174,5 +174,5 @@ region the \plc{p} array will be assigned from the device data environment to th
|
||||
variable in the data environment of the task that encountered the \code{target}
|
||||
\code{data} construct, resulting in undefined values in \plc{p}.
|
||||
|
||||
\ffreeexample{target_data}{7}
|
||||
\ffreeexample[4.0]{target_data}{7}
|
||||
|
||||
|
56
Examples_target_defaultmap.tex
Normal file
@ -0,0 +1,56 @@
|
||||
\pagebreak
|
||||
\section{\code{defaultmap} Clause}
|
||||
\label{sec:defaultmap}
|
||||
|
||||
The implicitly-determined, data-mapping and data-sharing attribute
|
||||
rules of variables referenced in a \code{target} construct can be
|
||||
changed by the \code{defaultmap} clause introduced in OpenMP 5.0.
|
||||
The implicit behavior is specified as
|
||||
\code{alloc}, \code{to}, \code{from}, \code{tofrom},
|
||||
\code{firstprivate}, \code{none}, \code{default} or \code{present},
|
||||
and is applied to a variable-category, where
|
||||
\code{scalar}, \code{aggregate}, \code{allocatable},
|
||||
and \code{pointer} are the variable categories.
|
||||
|
||||
In OpenMP, a ``category'' has a common data-mapping and data-sharing
|
||||
behavior for variable types within the category.
|
||||
In C/C++, \code{scalar} refers to base-language scalar variables, except pointers.
|
||||
In Fortran it refers to a scalar variable, as defined by the base language,
|
||||
with intrinsic type, and excludes character type.
|
||||
|
||||
Also, \code{aggregate} refers to arrays and structures (C/C++) and
|
||||
derived types (Fortran). Fortran has the additional category of \code{allocatable}.
|
||||
|
||||
%In the example below, the first \code{target} construct uses \code{defaultmap}
|
||||
%clauses to explicitly set data-mapping attributes that reproduce
|
||||
%the default implicit mapping (data-mapping and data-sharing attributes). That is,
|
||||
%if the \code{defaultmap} clauses were removed, the results would be identical.
|
||||
In the example below, the first \code{target} construct uses \code{defaultmap}
|
||||
clauses to set data-mapping and possibly data-sharing attributes that reproduce
|
||||
the default implicit mapping (data-mapping and data-sharing attributes). That is,
|
||||
if the \code{defaultmap} clauses were removed, the results would be identical.
|
||||
|
||||
In the second \code{target} construct all implicit behavior is removed
|
||||
by specifying the \code{none} implicit behavior in the \code{defaultmap} clause.
|
||||
Hence, all variables must be explicitly mapped.
|
||||
In the C/C++ code a scalar (\texttt{s}), an array (\texttt{A}) and a structure
|
||||
(\texttt{S}) are explicitly mapped \code{tofrom}.
|
||||
The Fortran code uses a derived type (\texttt{D}) in lieu of structure.
|
||||
|
||||
The third \code{target} construct shows another common use case for the \code{defaultmap} clause.
|
||||
The default mapping for (non-pointer) scalar variables is specified as \code{tofrom}.
|
||||
Here, the default implicit mapping for \texttt{s3} is \code{tofrom} as specified
|
||||
in the \code{defaultmap} clause, and \texttt{s1} and \texttt{s2} are explicitly
|
||||
mapped with the \code{firstprivate} data-sharing attribute.
|
||||
|
||||
In the fourth \code{target} construct all arrays, structures (C/C++) and derived
|
||||
types (Fortran) are mapped with \code{firstprivate} data-sharing behavior by a
|
||||
\code{defaultmap} clause with an \code{aggregate} variable category.
|
||||
For the \texttt{H} allocated array in the Fortran code, the \code{allocatable}
|
||||
category must be used in a separate \code{defaultmap} clause to acquire
|
||||
\code{firstprivate} data-sharing behavior (\texttt{H} has the Fortran allocatable attribute).
|
||||
% (Common use cases for C/C++ heap storage can be found in \specref{sec:pointer_mapping}.)
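The case of the third \code{target} construct can be sketched in a few lines. This is an illustrative fragment rather than the numbered example; the function name \plc{defaultmap\_scalar\_sum} is an assumption, while the roles of \plc{s1}, \plc{s2} and \plc{s3} follow the description above.

```c
/* Hypothetical name; s1 and s2 are explicitly firstprivate, while s3 is
   implicitly mapped tofrom because of the defaultmap clause. */
int defaultmap_scalar_sum(int s1, int s2)
{
    int s3 = 0;

    #pragma omp target defaultmap(tofrom: scalar) firstprivate(s1, s2)
    {
        /* The assignment reaches the original s3 because s3 is mapped
           tofrom rather than made firstprivate. */
        s3 = s1 + s2;
    }
    return s3;
}
```

Without the \code{defaultmap} clause, \plc{s3} would default to firstprivate in the \code{target} region and the assignment would not be visible after the region.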
|
||||
|
||||
\cexample[5.0]{target_defaultmap}{1}
|
||||
|
||||
\ffreeexample[5.0]{target_defaultmap}{1}
|
@ -26,11 +26,11 @@ full structure, plus the dynamic storage of the \plc{data} element.
|
||||
%The associated Fortran allocatable \plc{data} array is automatically mapped with the derived
|
||||
%type, it does not require an array section as in the C/C++ example.
|
||||
|
||||
\cexample{target_mapper}{1}
|
||||
\cexample[5.0]{target_mapper}{1}
|
||||
|
||||
\ffreeexample{target_mapper}{1}
|
||||
\ffreeexample[5.0]{target_mapper}{1}
|
||||
|
||||
\pagebreak
|
||||
%\pagebreak
|
||||
The next example illustrates the use of the \plc{mapper-identifier} and deep copy within a structure.
|
||||
The structure, \plc{dzmat\_t}, represents a complex matrix,
|
||||
with separate real (\plc{r\_m}) and imaginary (\plc{i\_m}) elements.
|
||||
@ -56,11 +56,11 @@ Note, the \plc{is} and \plc{ie} scalars are firstprivate
|
||||
by default for a target region, but are declared firstprivate anyway
|
||||
to remind the user of important firstprivate data-sharing properties required here.
|
||||
|
||||
\cexample{target_mapper}{2}
|
||||
\cexample[5.0]{target_mapper}{2}
|
||||
|
||||
\ffreeexample{target_mapper}{2}
|
||||
\ffreeexample[5.0]{target_mapper}{2}
|
||||
|
||||
\pagebreak
|
||||
%\pagebreak
|
||||
In the third example \plc{myvec} structures are
|
||||
nested within a \plc{mypoints} structure. The \plc{myvec\_t} type is mapped
|
||||
as in the first example. Following the \plc{mypoints} structure declaration,
|
||||
@ -80,7 +80,7 @@ type structure.
|
||||
%Note, in the main program \plc{P} is an array of \plc{mypoints\_t} type structures,
|
||||
%and hence every element of the array is mapped with the mapper prescription.
|
||||
|
||||
\cexample{target_mapper}{3}
|
||||
\cexample[5.0]{target_mapper}{3}
|
||||
|
||||
\ffreeexample{target_mapper}{3}
|
||||
\ffreeexample[5.0]{target_mapper}{3}
|
||||
|
||||
|
@ -34,10 +34,10 @@ when the \code{OMP\_DISPLAY\_ENV}
|
||||
environment variable is set to \code{TRUE} or \code{VERBOSE}.
|
||||
|
||||
%\pagebreak
|
||||
\cexample{target_offload_control}{1}
|
||||
\cexample[5.0]{target_offload_control}{1}
|
||||
|
||||
%\pagebreak
|
||||
\ffreeexample{target_offload_control}{1}
|
||||
\ffreeexample[5.0]{target_offload_control}{1}
|
||||
|
||||
|
||||
% OMP 4.5 target offload 15:9-11
|
||||
|
@ -2,6 +2,23 @@
|
||||
\section{Pointer mapping}
|
||||
\label{sec:pointer_mapping}
|
||||
|
||||
Pointers that contain host addresses require that those addresses are translated to device addresses for them to be useful in the context of a device data environment. Broadly speaking, there are two scenarios where this is important.
|
||||
|
||||
The first scenario is where the pointer is mapped to the device data environment, such that references to the pointer inside a \code{target} region are to the corresponding pointer. Pointer attachment ensures that the corresponding pointer will contain a device address when all of the following conditions are true:
|
||||
\begin{itemize}
|
||||
\item the pointer is mapped by directive $A$ to a device;
|
||||
\item a list item that uses the pointer as its base pointer (call it the \emph{pointee}) is mapped, to the same device, by directive $B$, which may be the same as $A$;
|
||||
\item the effect of directive $B$ is to create either the corresponding pointer or pointee in the device data environment of the device.
|
||||
\end{itemize}
|
||||
|
||||
Given the above conditions, pointer attachment is initiated as a result of directive $B$ and subsequent references to the pointee list item in a target region that use the pointer will access the corresponding pointee. The corresponding pointer remains in this \emph{attached} state until it is removed from the device data environment.
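One way these conditions can be met is sketched below, under the assumption that directive $A$ is a \code{declare}~\code{target} directive and directive $B$ is a \code{target} construct. The names \plc{gp} and \plc{fill\_via\_attached\_pointer} are hypothetical.

```c
/* Directive A: declare target gives the global pointer a corresponding
   pointer in the device data environment. */
#pragma omp declare target
int *gp;   /* hypothetical name */
#pragma omp end declare target

void fill_via_attached_pointer(int *a, int n)
{
    gp = a;   /* host pointer value; the device copy is set by attachment */

    /* Directive B: mapping the pointee gp[0:n] creates it in the device
       data environment and attaches the corresponding gp to it. */
    #pragma omp target map(from: gp[0:n])
    for (int i = 0; i < n; i++)
        gp[i] = 10 * i;
}
```

After the call the host array passed as \plc{a} holds the values written through \plc{gp} in the \code{target} region.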
|
||||
|
||||
The second scenario, which is only applicable for C/C++, is where the pointer is implicitly privatized inside a \code{target} construct when it appears as the base pointer to a list item on the construct and does not appear explicitly as a list item in a \code{map} clause, \code{is\_device\_ptr} clause, or data-sharing attribute clause. This scenario can be further split into two cases: the list item is a zero-length array section (e.g., \plc{p[:0]}) or it is not.
|
||||
|
||||
If it is a zero-length array section, this will trigger a runtime check on entry to the \code{target} region for a previously mapped list item where the value of the pointer falls within the range of its base address and ending address. If such a match is found the private pointer is initialized to the device address corresponding to the value of the original pointer, and otherwise it is initialized to NULL (or retains its original value if the \code{unified\_address} requirement is specified for that compilation unit).
|
||||
|
||||
If the list item (again, call it the \emph{pointee}) is not a zero-length array section, the private pointer will be initialized such that references in the \code{target} region to the pointee list item that use the pointer will access the corresponding pointee.
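The zero-length case can be sketched as follows. The function name \plc{increment\_mapped} and the local pointer \plc{q} are hypothetical; the enclosing \code{target}~\code{data} region is what makes the runtime lookup succeed in this sketch.

```c
/* Hypothetical helper: q is privatized in the target region and translated
   through a zero-length array section. */
void increment_mapped(int *a, int n)
{
    #pragma omp target data map(tofrom: a[0:n])
    {
        int *q = a;   /* host address inside storage that is already mapped */

        /* q[:0] triggers the runtime check: the private q is initialized
           to the device address that corresponds to the host value of q. */
        #pragma omp target map(q[:0])
        for (int i = 0; i < n; i++)
            q[i] += 1;
    }   /* a[0:n] is copied back to the host here */
}
```

Had \plc{a[0:n]} not been mapped beforehand, the check would fail and, as described above, the private pointer would not refer to usable device storage.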
|
||||
|
||||
The following example shows the basics of mapping pointers with and without
|
||||
associated storage on the host.
|
||||
|
||||
@ -24,18 +41,23 @@ data at the end of the \code{target} region.
|
||||
As a comparison, note that the \plc{aray} array is automatically mapped,
|
||||
since the compiler knows the extent of the array.
|
||||
|
||||
The pointer \plc{ptr3} is used in the \code{target} region and has
|
||||
a data-sharing attribute of firstprivate.
|
||||
The pointer is implicitly mapped to a zero-length array section.
|
||||
Neither the pointer address nor any
|
||||
of its locally assigned data on the device is returned
|
||||
to the host.
|
||||
The pointer \plc{ptr3} is used inside the \code{target} construct, but it does
|
||||
not appear in a data-mapping or data-sharing clause. Nor is there a
|
||||
\code{defaultmap} clause on the construct to indicate what its implicit
|
||||
data-mapping or data-sharing attribute should be. For such a case, \plc{ptr3}
|
||||
will be implicitly privatized within the construct and there will be a runtime
|
||||
check to see if the host memory to which it is pointing has corresponding memory
|
||||
in the device data environment. If this runtime check passes, the private
|
||||
\plc{ptr3} would be initialized to point to the corresponding memory. But in
|
||||
this case the check does not pass and so it is initialized to null.
|
||||
Since \plc{ptr3} is private, the value to which it is assigned in the
|
||||
\code{target} region is not returned into the original \plc{ptr3} on the host.
|
||||
|
||||
\cexample{target_ptr_map}{1}
|
||||
\cexample[5.0]{target_ptr_map}{1}
|
||||
|
||||
In the following example the global pointer \plc{p} appears in a
|
||||
\code{declare}~\code{target} directive. Hence, the pointer \plc{p} will
|
||||
persist on the device throughout executions in all target regions.
|
||||
persist on the device throughout executions in all \code{target} regions.
|
||||
|
||||
The pointer is also used in an array section of a \code{map} clause on
|
||||
a \code{target} construct. When storage associated with
|
||||
@ -50,4 +72,49 @@ pointer on the device is \emph{attached}.)
|
||||
% For globals with declare target is there such a things a
|
||||
% original and corresponding?
|
||||
|
||||
\cexample{target_ptr_map}{2}
|
||||
\cexample[5.0]{target_ptr_map}{2}
|
||||
|
||||
The following two examples illustrate subtle differences in pointer attachment
|
||||
to device address because of the order of data mapping.
|
||||
|
||||
In example \plc{target\_ptr\_map.3a}
|
||||
the global pointer \plc{p1} points to array \plc{x} and \plc{p2} points to
|
||||
array \plc{y} on the host.
|
||||
The array section \plc{x[:N]} is mapped by the \code{target}~\code{enter}~\code{data} directive while array \plc{y} is mapped
|
||||
on the \code{target} construct.
|
||||
Since the \code{declare}~\code{target} directive is applied to the declaration
|
||||
of \plc{p1}, \plc{p1} is treated like a mapped variable on the \code{target}
|
||||
construct and references to \plc{p1} inside the construct will be to the
|
||||
corresponding \plc{p1} that exists on the device. However, the corresponding
|
||||
\plc{p1} will be undefined since there is no pointer attachment for it. Pointer
|
||||
attachment for \plc{p1} would require that (1) \plc{p1} (or an lvalue
|
||||
expression that refers to the same storage as \plc{p1}) appears as a base
|
||||
pointer to a list item in a \code{map} clause, and (2) the construct that has
|
||||
the \code{map} clause causes the list item to transition from \emph{not mapped}
|
||||
to \emph{mapped}. The conditions are clearly not satisfied for this example.
|
||||
|
||||
The problem for \plc{p2} in this example is also subtle. It will be privatized
|
||||
inside the \code{target} construct, with a runtime check for whether the memory
|
||||
to which it is pointing has corresponding memory that is accessible on the
|
||||
device. If this check is successful then the \plc{p2} inside the construct
|
||||
would be appropriately initialized to point to that corresponding memory.
|
||||
Unfortunately, despite there being an implicit map of the array \plc{y} (to
|
||||
which \plc{p2} is pointing) on the construct, the order of this map relative to
|
||||
the initialization of \plc{p2} is unspecified. Therefore, the initial value of
|
||||
\plc{p2} will also be undefined.
|
||||
|
||||
Thus, referencing values via either \plc{p1} or \plc{p2} inside
|
||||
the \code{target} region would be invalid.
|
||||
|
||||
\cexample[5.0]{target_ptr_map}{3a}
|
||||
|
||||
In example \plc{target\_ptr\_map.3b} the mapping orders for arrays \plc{x}
|
||||
and \plc{y} were rearranged to allow proper pointer attachments.
|
||||
On the \code{target} construct, the \code{map(x)} clause triggers pointer
|
||||
attachment for \plc{p1} to the device address of \plc{x}.
|
||||
Pointer \plc{p2} is assigned the device address of the previously mapped
|
||||
array \plc{y}.
|
||||
Referencing values via either \plc{p1} or \plc{p2} inside the \code{target} region is now valid.
|
||||
|
||||
\cexample[5.0]{target_ptr_map}{3b}
|
||||
|
||||
|
@ -19,7 +19,7 @@ Note: The buffer arrays and the \plc{x} variable have been grouped together, so
|
||||
the components that will reside on the device are all together (without gaps).
|
||||
This allows the runtime to optimize the transfer and the storage footprint on the device.
|
||||
|
||||
\cexample{target_struct_map}{1}
|
||||
\cexample[5.0]{target_struct_map}{1}
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
@ -28,7 +28,7 @@ a C++ class. In the member function \plc{SAXPY::driver}
|
||||
the array section \plc{p[:N]} is \emph{attached} to the pointer member \plc{p}
|
||||
on the device.
|
||||
|
||||
\cppexample{target_struct_map}{2}
|
||||
\cppexample[5.0]{target_struct_map}{2}
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
|
||||
|
@ -22,7 +22,7 @@ back to the host. Note, the stand-alone \code{target}~\code{enter}~\code{data} o
|
||||
after the host vector is created, and the \code{target}~\code{exit}~\code{data}
|
||||
construct occurs before the host data is deleted.
|
||||
|
||||
\cppexample{target_unstructured_data}{1}
|
||||
\cppexample[4.5]{target_unstructured_data}{1}
|
||||
|
||||
\pagebreak
|
||||
The following C code allocates and frees the data member of a Matrix structure.
|
||||
@ -33,7 +33,7 @@ and then frees the memory on the host. Note, the stand-alone
|
||||
\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the
|
||||
\code{target}~\code{exit}~\code{data} construct occurs before the host data is freed.
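The ordering constraint can be sketched with a pair of helper functions. The names \plc{device\_buffer\_create} and \plc{device\_buffer\_destroy} are hypothetical, and a plain pointer is used in place of the Matrix structure to keep the sketch short.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate on the host first, then create the
   corresponding device storage. */
double *device_buffer_create(int n)
{
    double *a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return NULL;

    #pragma omp target enter data map(alloc: a[0:n])
    return a;
}

/* Hypothetical helper: remove the device storage first, then free the
   host memory. The pointer a must come from device_buffer_create. */
void device_buffer_destroy(double *a, int n)
{
    #pragma omp target exit data map(delete: a[0:n])
    free(a);
}
```

Reversing either order would leave the mapping referring to host memory that does not exist yet or no longer exists.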
|
||||
|
||||
\cexample{target_unstructured_data}{1}
|
||||
\cexample[4.5]{target_unstructured_data}{1}
|
||||
|
||||
\pagebreak
|
||||
The following Fortran code allocates and deallocates a module array. The
|
||||
@ -44,6 +44,6 @@ then deallocates the array on the host. Note, the stand-alone
|
||||
\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the
|
||||
\code{target}~\code{exit}~\code{data} construct occurs before the host data is deallocated.
|
||||
|
||||
\ffreeexample{target_unstructured_data}{1}
|
||||
\ffreeexample[4.5]{target_unstructured_data}{1}
|
||||
%end
|
||||
|
||||
|
@ -27,9 +27,9 @@ region and waits for the completion of the region.
|
||||
|
||||
The second \code{target} region uses the updated values of \plc{v1[:N]} and \plc{v2[:N]}.
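A minimal sketch of this sequence is shown here. The function name \plc{vec\_mult\_update} is hypothetical, and the host loop that increments the inputs is a stand-in for the re-initialization step.

```c
/* Hypothetical name; v1 and v2 stay mapped and are refreshed with
   target update between the two target regions. */
void vec_mult_update(double *p, double *v1, double *v2, int n)
{
    #pragma omp target data map(to: v1[0:n], v2[0:n]) map(from: p[0:n])
    {
        #pragma omp target
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            p[i] = v1[i] * v2[i];

        /* The host changes v1 and v2; the device copies are now stale. */
        for (int i = 0; i < n; i++) {
            v1[i] += 1.0;
            v2[i] += 1.0;
        }

        /* Copy the new host values into the existing device storage. */
        #pragma omp target update to(v1[0:n], v2[0:n])

        #pragma omp target
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            p[i] = p[i] + v1[i] * v2[i];
    }
}
```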
|
||||
|
||||
\cexample{target_update}{1}
|
||||
\cexample[4.0]{target_update}{1}
|
||||
|
||||
\ffreeexample{target_update}{1}
|
||||
\ffreeexample[4.0]{target_update}{1}
|
||||
|
||||
\subsection{\code{target} \code{update} Construct with \code{if} Clause}
|
||||
\label{subsec:target_update_if}
|
||||
@ -49,7 +49,7 @@ assigns the new values of \plc{v1} and \plc{v2} from the task's data environment
|
||||
mapped array sections in the \code{target} \code{data} construct's device data
|
||||
environment.
|
||||
|
||||
\cexample{target_update}{2}
|
||||
\cexample[4.0]{target_update}{2}
|
||||
|
||||
\ffreeexample{target_update}{2}
|
||||
\ffreeexample[4.0]{target_update}{2}
|
||||
|
||||
|
@ -26,7 +26,7 @@ The use of the \plc{A} array is sufficient for this case, because one
|
||||
would expect the storage for \plc{A} and \plc{B} would be physically "close"
|
||||
(as provided by the hint in the first task).
|
||||
|
||||
\cexample{affinity}{6}
|
||||
\cexample[5.0]{affinity}{6}
|
||||
|
||||
\ffreeexample{affinity}{6}
|
||||
\ffreeexample[5.0]{affinity}{6}
|
||||
|
||||
|
@ -8,9 +8,9 @@
|
||||
This example shows a simple flow dependence using a \code{depend}
|
||||
clause on the \code{task} construct.
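A flow dependence of this kind can be sketched as below. The function name \plc{flow\_dependence} and the variable \plc{seen} are hypothetical; the value is returned rather than printed so that it can be checked by a caller.

```c
/* Hypothetical name; the second task reads x only after the first task
   has written it. */
int flow_dependence(void)
{
    int x = 1;
    int seen = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(x) depend(out: x)
        x = 2;

        /* depend(in: x) orders this task after the depend(out: x) task. */
        #pragma omp task shared(x, seen) depend(in: x)
        seen = x;
    }   /* all tasks have completed at the end of the parallel region */

    return seen;
}
```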
|
||||
|
||||
\cexample{task_dep}{1}
|
||||
\cexample[4.0]{task_dep}{1}
|
||||
|
||||
\ffreeexample{task_dep}{1}
|
||||
\ffreeexample[4.0]{task_dep}{1}
|
||||
|
||||
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
|
||||
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
|
||||
@ -23,9 +23,9 @@ would have a race condition.
|
||||
This example shows an anti-dependence using the \code{depend}
|
||||
clause on the \code{task} construct.
|
||||
|
||||
\cexample{task_dep}{2}
|
||||
\cexample[4.0]{task_dep}{2}
|
||||
|
||||
\ffreeexample{task_dep}{2}
|
||||
\ffreeexample[4.0]{task_dep}{2}
|
||||
|
||||
The program will always print \texttt{"}x = 1\texttt{"}, because the \code{depend}
|
||||
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
|
||||
@ -38,9 +38,9 @@ race condition.
|
||||
This example shows an output dependence using the \code{depend}
|
||||
clause on the \code{task} construct.
|
||||
|
||||
\cexample{task_dep}{3}
|
||||
\cexample[4.0]{task_dep}{3}
|
||||
|
||||
\ffreeexample{task_dep}{3}
|
||||
\ffreeexample[4.0]{task_dep}{3}
|
||||
|
||||
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
|
||||
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
|
||||
@ -55,9 +55,9 @@ In this example we show potentially concurrent execution of tasks using multiple
|
||||
flow dependences expressed using the \code{depend} clause on the \code{task}
|
||||
construct.
|
||||
|
||||
\cexample{task_dep}{4}
|
||||
\cexample[4.0]{task_dep}{4}
|
||||
|
||||
\ffreeexample{task_dep}{4}
|
||||
\ffreeexample[4.0]{task_dep}{4}
|
||||
|
||||
The last two tasks are dependent on the first task. However there is no dependence
|
||||
between the last two tasks, which may execute in any order (or concurrently if
|
||||
@ -72,9 +72,9 @@ in any order and the program would have a race condition.
|
||||
This example shows a task-based blocked matrix multiplication. Matrices are of
|
||||
NxN elements, and the multiplication is implemented using blocks of BSxBS elements.
|
||||
|
||||
\cexample{task_dep}{5}
|
||||
\cexample[4.0]{task_dep}{5}
|
||||
|
||||
\ffreeexample{task_dep}{5}
|
||||
\ffreeexample[4.0]{task_dep}{5}
|
||||
|
||||
\subsection{\code{taskwait} with Dependences}
|
||||
\label{subsec:taskwait_depend}
|
||||
@ -105,9 +105,9 @@ Hence, immediately after the first \code{taskwait} it is unsafe to access the
|
||||
The second \code{taskwait} ensures that the second child task has completed; hence
|
||||
it is safe to access the \plc{y} variable in the following print statement.
|
||||
|
||||
\cexample{task_dep}{6}
|
||||
\cexample[5.0]{task_dep}{6}
|
||||
|
||||
\ffreeexample{task_dep}{6}
|
||||
\ffreeexample[5.0]{task_dep}{6}
|
||||
|
||||
In this example the first two tasks are serialized, because a dependence on
|
||||
the first child is produced by \plc{x} with the \code{in} dependence type
|
||||
@ -123,9 +123,9 @@ included to illustrate how the child task dependences can be completely annotate
|
||||
in a data-flow model.)
|
||||
|
||||
|
||||
\cexample{task_dep}{7}
|
||||
\cexample[5.0]{task_dep}{7}
|
||||
|
||||
\ffreeexample{task_dep}{7}
|
||||
\ffreeexample[5.0]{task_dep}{7}
|
||||
|
||||
|
||||
This example is similar to the previous one, except the generating task is
|
||||
@ -149,9 +149,9 @@ the dependence type of variables in the \code{taskwait} \code{depend} clause
|
||||
when selecting child tasks that the generating task must wait on, so that its execution after the
|
||||
taskwait does not produce race conditions on variables accessed by non-completed child tasks.
|
||||
|
||||
\cexample{task_dep}{8}
|
||||
\cexample[5.0]{task_dep}{8}
|
||||
|
||||
\ffreeexample{task_dep}{8}
|
||||
\ffreeexample[5.0]{task_dep}{8}
|
||||
|
||||
\pagebreak
|
||||
\subsection{Mutually Exclusive Execution with Dependences}
|
||||
@ -168,18 +168,18 @@ to the \code{mutexinoutset} dependence type on \code{c}, T4 and T5 may be
|
||||
scheduled in any order with respect to each other, but not at the same
|
||||
time. Task T6 will be scheduled after both T4 and T5 are completed.
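The mutual exclusion can be sketched with two updating tasks and one reading task. The function name \plc{mutex\_sum} and the variable \plc{total} are hypothetical, and the sketch assumes a compiler that supports the OpenMP 5.0 \code{mutexinoutset} dependence type.

```c
/* Hypothetical name; the two updates of c may run in either order but
   never at the same time. */
int mutex_sum(int a, int b)
{
    int c = 0;
    int total = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(c) depend(mutexinoutset: c)
        c += a;

        #pragma omp task shared(c) depend(mutexinoutset: c)
        c += b;

        /* Runs only after both mutually exclusive updates have completed. */
        #pragma omp task shared(c, total) depend(in: c)
        total = c;
    }

    return total;
}
```

Because addition is commutative the result does not depend on which of the two updating tasks is scheduled first.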
|
||||
|
||||
\cexample{task_dep}{9}
|
||||
\cexample[5.0]{task_dep}{9}
|
||||
|
||||
\ffreeexample{task_dep}{9}
|
||||
\ffreeexample[5.0]{task_dep}{9}
|
||||
|
||||
The following example demonstrates a situation where the \code{mutexinoutset}
|
||||
dependence type is advantageous. If \code{shortTaskB} completes
|
||||
before \code{longTaskA}, the runtime can take advantage of this by
|
||||
scheduling \code{longTaskBC} before \code{shortTaskAC}.
|
||||
|
||||
\cexample{task_dep}{10}
|
||||
\cexample[5.0]{task_dep}{10}
|
||||
|
||||
\ffreeexample{task_dep}{10}
|
||||
\ffreeexample[5.0]{task_dep}{10}
|
||||
|
||||
\subsection{Multidependences Using Iterators}
|
||||
\label{subsec:depend_iterator}
|
||||
@ -211,6 +211,31 @@ identical nor disjoint to the storage prescibed by the elements of the
|
||||
loop tasks. The iterator overcomes this restriction by effectively
|
||||
creating n disjoint storage areas.
|
||||
|
||||
\cexample{task_dep}{11}
|
||||
\cexample[5.0]{task_dep}{11}
|
||||
|
||||
\ffreeexample[5.0]{task_dep}{11}
|
||||
|
||||
\subsection{Dependence for Undeferred Tasks}
|
||||
\label{subsec:depend_undefer_task}
|
||||
|
||||
In the following example, we show that even if a task is undeferred as specified
by an \code{if} clause that evaluates to \plc{false}, task dependences are
still honored.

The \code{depend} clauses of the first and second explicit tasks specify that
the first task is completed before the second task.

The second explicit task has an \code{if} clause that evaluates to \plc{false}.
This means that the execution of the generating task (the implicit task of
the \code{single} region) must be suspended until the second explicit task
is completed.
But, because of the dependence, the first explicit task must complete first,
then the second explicit task can execute and complete, and only then
can the generating task resume and reach the print statement.
Thus, the program will always print "\texttt{x = 2}".
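A minimal C sketch of this behavior is given below. It is not the \code{task_dep.12} source itself: the function name, the variable \plc{seen} and the particular dependence types (\code{out} and \code{inout}) are assumptions chosen to match the description.

```c
#include <assert.h>

/* Hypothetical sketch: the second task is undeferred (if(0)), yet it still
   waits for the first task because of the dependence on x. */
int undeferred_dependence_sketch(void)
{
    int x = 0;
    int seen = -1;   /* value of x observed by the generating task */

    #pragma omp parallel
    #pragma omp single
    {
        /* first explicit task */
        #pragma omp task shared(x) depend(out: x)
        x = 1;

        /* second explicit task: undeferred, so the generating task is
           suspended here until this task has completed */
        #pragma omp task shared(x) depend(inout: x) if(0)
        x = 2;

        /* executed by the generating task only after both tasks finished */
        seen = x;
    }

    assert(seen == 2);
    return seen;
}
```

Without the \code{depend} clauses the two tasks would be unordered, and the generating task could observe either value once the undeferred task returns.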
|
||||
|
||||
\cexample[4.0]{task_dep}{12}
|
||||
\clearpage
|
||||
|
||||
\ffreeexample[4.0]{task_dep}{12}
|
||||
|
||||
\ffreeexample{task_dep}{11}
|
||||
|
@ -16,7 +16,7 @@ The creation of tasks occurs in ascending order (according to the iteration spac
|
||||
the loop) but a hint, by means of the \code{priority} clause, is provided to reverse
|
||||
the execution order.
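As a rough illustration of the \code{priority} hint, the sketch below creates one task per matrix row in ascending order and gives later rows a higher priority. The helper \plc{fill\_row} and the function name are hypothetical; the \code{priority} clause is only a hint, so no particular execution order is guaranteed.

```c
#include <assert.h>

/* Hypothetical per-row work: row i gets the values i, i+1, ..., i+n-1. */
static void fill_row(float *row, int n, int i)
{
    for (int j = 0; j < n; j++)
        row[j] = (float)(i + j);
}

void fill_matrix_prioritized(float *A, int n)
{
    assert(n > 0);

    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n; i++) {
            /* Tasks are created for i = 0, 1, ..., n-1; a larger priority
               value recommends earlier execution, i.e. the reverse order.
               i is firstprivate in each task by default. */
            #pragma omp task priority(i)
            fill_row(&A[i * n], n, i);
        }
    }
}
```

The result does not depend on the order in which the tasks run, which is what makes a scheduling hint safe to use here.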
|
||||
|
||||
\cexample{task_priority}{1}
|
||||
\cexample[4.5]{task_priority}{1}
|
||||
|
||||
\ffreeexample{task_priority}{1}
|
||||
\ffreeexample[4.5]{task_priority}{1}
|
||||
|
||||
|
@ -14,7 +14,7 @@ does not participate in the synchronization, and is left free to execute in para
|
||||
This is opposed to the behavior of the \code{taskwait} construct, which would
|
||||
include the background tasks in the synchronization.
|
||||
|
||||
\cexample{taskgroup}{1}
|
||||
\cexample[4.0]{taskgroup}{1}
|
||||
|
||||
\ffreeexample{taskgroup}{1}
|
||||
\ffreeexample[4.0]{taskgroup}{1}
|
||||
|
||||
|
@ -9,17 +9,17 @@ note that the tasks will be executed in no specified order because there are no
|
||||
synchronization directives. Thus, assuming that the traversal will be done in post
|
||||
order, as in the sequential code, is wrong.
|
||||
|
||||
\cexample{tasking}{1}
|
||||
\cexample[3.0]{tasking}{1}
|
||||
|
||||
\ffreeexample{tasking}{1}
|
||||
\ffreeexample[3.0]{tasking}{1}
|
||||
|
||||
In the next example, we force a postorder traversal of the tree by adding a \code{taskwait}
|
||||
directive. Now, we can safely assume that the left and right sons have been executed
|
||||
before we process the current node.
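The effect of the \code{taskwait} directive can be sketched as follows. The \plc{node} layout, the \plc{visit\_order} field and the global counter are assumptions made for illustration; they record when each node is processed so that the postorder property can be checked.

```c
#include <assert.h>

struct node {
    struct node *left, *right;
    int visit_order;            /* hypothetical: set when the node is processed */
};

static int next_stamp = 0;      /* hypothetical global visit counter */

static void process(struct node *p)
{
    int stamp;
    #pragma omp atomic capture
    stamp = ++next_stamp;
    p->visit_order = stamp;
}

static void postorder_traverse(struct node *p)
{
    if (p->left) {
        #pragma omp task        /* p is firstprivate by default */
        postorder_traverse(p->left);
    }
    if (p->right) {
        #pragma omp task
        postorder_traverse(p->right);
    }
    /* Wait for both child tasks; each child waits for its own children
       in turn, so the whole subtree is done before p is processed. */
    #pragma omp taskwait
    process(p);
}

void traverse_tree(struct node *root)
{
    assert(root);
    #pragma omp parallel
    #pragma omp single
    postorder_traverse(root);
}
```

The left and right subtrees may still be processed in either order relative to each other; the \code{taskwait} only orders a node after its descendants.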
|
||||
|
||||
\cexample{tasking}{2}
|
||||
\cexample[3.0]{tasking}{2}
|
||||
|
||||
\ffreeexample{tasking}{2}
|
||||
\ffreeexample[3.0]{tasking}{2}
|
||||
|
||||
The following example demonstrates how to use the \code{task} construct to process
|
||||
elements of a linked list in parallel. The thread executing the \code{single}
|
||||
@ -28,18 +28,19 @@ in the current team. The pointer \plc{p} is \code{firstprivate} by default
|
||||
on the \code{task} construct so it is not necessary to specify it in a \code{firstprivate}
|
||||
clause.
|
||||
|
||||
\cexample{tasking}{3}
|
||||
\cexample[3.0]{tasking}{3}
|
||||
|
||||
\ffreeexample{tasking}{3}
|
||||
\ffreeexample[3.0]{tasking}{3}
|
||||
|
||||
The \code{fib()} function should be called from within a \code{parallel} region
|
||||
for the different specified tasks to be executed in parallel. Also, only one thread
|
||||
of the \code{parallel} region should call \code{fib()} unless multiple concurrent
|
||||
Fibonacci computations are desired.
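The calling convention described above can be sketched like this. The wrapper name \plc{fib\_parallel} is hypothetical; it shows one way to call \code{fib()} from a single thread inside a \code{parallel} region.

```c
#include <assert.h>

int fib(int n)
{
    int i, j;
    if (n < 2)
        return n;

    #pragma omp task shared(i)      /* n is firstprivate by default */
    i = fib(n - 1);
    #pragma omp task shared(j)
    j = fib(n - 2);

    /* i and j are only valid once both child tasks have completed */
    #pragma omp taskwait
    return i + j;
}

int fib_parallel(int n)
{
    int result = 0;
    assert(n >= 0);

    #pragma omp parallel
    #pragma omp single              /* one thread starts the recursion */
    result = fib(n);

    return result;
}
```

If every thread of the team called \code{fib()} instead, each would start its own independent computation of the same value.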
|
||||
|
||||
\cexample{tasking}{4}
|
||||
\cexample[3.0]{tasking}{4}
|
||||
|
||||
\fexample{tasking}{4}
|
||||
\fexample[3.0]{tasking}{4}
|
||||
\clearpage
|
||||
|
||||
Note: There are more efficient algorithms for computing Fibonacci numbers. This
|
||||
classic recursion algorithm is for illustrative purposes.
|
||||
@ -52,9 +53,9 @@ loop to suspend its task at the task scheduling point in the \code{task} directi
|
||||
and start executing unassigned tasks. Once the number of unassigned tasks is sufficiently
|
||||
low, the thread may resume execution of the task generating loop.
|
||||
|
||||
\cexample{tasking}{5}
|
||||
\cexample[3.0]{tasking}{5}
|
||||
|
||||
\fexample{tasking}{5}
|
||||
\fexample[3.0]{tasking}{5}
|
||||
|
||||
The following example is the same as the previous one, except that the tasks are
|
||||
generated in an untied task. While generating the tasks, the implementation may
|
||||
@ -69,9 +70,9 @@ to resume the task generating loop. In the previous examples, the other threads
|
||||
would be forced to idle until the generating thread finishes its long task, since
|
||||
the task generating loop was in a tied task.
|
||||
|
||||
\cexample{tasking}{6}
|
||||
\cexample[3.0]{tasking}{6}
|
||||
|
||||
\fexample{tasking}{6}
|
||||
\fexample[3.0]{tasking}{6}
|
||||
|
||||
The following two examples demonstrate how the scheduling rules illustrated in
|
||||
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of
|
||||
@ -86,20 +87,20 @@ both of the task regions that modify \code{tp}. The parts of these task regions
|
||||
in which \code{tp} is modified may be executed in any order so the resulting
|
||||
value of \code{var} can be either 1 or 2.
|
||||
|
||||
\cexample{tasking}{7}
|
||||
\cexample[3.0]{tasking}{7}
|
||||
|
||||
|
||||
\fexample{tasking}{7}
|
||||
\fexample[3.0]{tasking}{7}
|
||||
|
||||
In this example, scheduling constraints prohibit a thread in the team from executing
|
||||
a new task that modifies \code{tp} while another such task region tied to the
|
||||
same thread is suspended. Therefore, the value written will persist across the
|
||||
task scheduling point.
|
||||
|
||||
\cexample{tasking}{8}
|
||||
\cexample[3.0]{tasking}{8}
|
||||
|
||||
|
||||
\fexample{tasking}{8}
|
||||
\fexample[3.0]{tasking}{8}
|
||||
|
||||
The following two examples demonstrate how the scheduling rules illustrated in
|
||||
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of locks
|
||||
@ -112,20 +113,20 @@ it encounters the task scheduling point at task 3, it could suspend task 1 and
|
||||
begin task 2 which will result in a deadlock when it tries to enter critical region
|
||||
1.
|
||||
|
||||
\cexample{tasking}{9}
|
||||
\cexample[3.0]{tasking}{9}
|
||||
|
||||
|
||||
\fexample{tasking}{9}
|
||||
\fexample[3.0]{tasking}{9}
|
||||
|
||||
In the following example, \code{lock} is held across a task scheduling point.
|
||||
However, according to the scheduling restrictions, the executing thread can't
|
||||
begin executing one of the non-descendant tasks that also acquires \code{lock} before
|
||||
the task region is complete. Therefore, no deadlock is possible.
|
||||
|
||||
\cexample{tasking}{10}
|
||||
\cexample[3.0]{tasking}{10}
|
||||
|
||||
|
||||
\ffreeexample{tasking}{10}
|
||||
\ffreeexample[3.0]{tasking}{10}
|
||||
\clearpage
|
||||
|
||||
The following examples illustrate the use of the \code{mergeable} clause in the
|
||||
\code{task} construct. In this first example, the \code{task} construct has
|
||||
@ -139,9 +140,9 @@ outcome does not depend on whether or not the task is merged (that is, the task
|
||||
will always increment the same variable and will always compute the same value
|
||||
for \code{x}).
|
||||
|
||||
\cexample{tasking}{11}
|
||||
\cexample[3.1]{tasking}{11}
|
||||
|
||||
\ffreeexample{tasking}{11}
|
||||
\ffreeexample[3.1]{tasking}{11}
|
||||
|
||||
This second example shows an incorrect use of the \code{mergeable} clause. In
|
||||
this example, the created task will access different instances of the variable
|
||||
@ -150,9 +151,9 @@ it will access the same variable \code{x} if the task is merged. As a result,
|
||||
the behavior of the program is unspecified and it can print two different values
|
||||
for \code{x} depending on the decisions taken by the implementation.
|
||||
|
||||
\cexample{tasking}{12}
|
||||
\cexample[3.1]{tasking}{12}
|
||||
|
||||
\ffreeexample{tasking}{12}
|
||||
\ffreeexample[3.1]{tasking}{12}
|
||||
|
||||
The following example shows the use of the \code{final} clause and the \code{omp\_in\_final}
|
||||
API call in a recursive binary search program. To reduce overhead, once a certain
|
||||
@ -170,9 +171,9 @@ in the stack could also be avoided but it would make this example less clear. Th
|
||||
clause since all tasks created in a \code{final} task region are included tasks
|
||||
that can be merged if the \code{mergeable} clause is present.
|
||||
|
||||
\cexample{tasking}{13}
|
||||
\cexample[3.1]{tasking}{13}
|
||||
|
||||
\ffreeexample{tasking}{13}
|
||||
\ffreeexample[3.1]{tasking}{13}
|
||||
|
||||
The following example illustrates the difference between the \code{if} and the
|
||||
\code{final} clauses. The \code{if} clause has a local effect. In the first
|
||||
@ -184,7 +185,7 @@ task itself. In the second nest of tasks, the nested tasks will be created as in
|
||||
tasks. Note also that the conditions for the \code{if} and \code{final} clauses
|
||||
are usually the opposite.
|
||||
|
||||
\cexample{tasking}{14}
|
||||
\cexample[3.1]{tasking}{14}
|
||||
|
||||
\ffreeexample{tasking}{14}
|
||||
\ffreeexample[3.1]{tasking}{14}
|
||||
|
||||
|
@ -9,9 +9,9 @@ The \code{grainsize} clause specifies that each task is to execute at least 500
|
||||
|
||||
The \code{nogroup} clause removes the implicit taskgroup of the \code{taskloop} construct; the explicit \code{taskgroup} construct in the example ensures that the function is not exited before the long-running task and the loops have finished execution.
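The interplay of \code{nogroup} and the explicit \code{taskgroup} might look like the sketch below. The function names, the \plc{done} flag and the loop body are assumptions for illustration and are not taken from the \code{taskloop.1} source.

```c
#include <assert.h>

/* Hypothetical stand-in for a long-running task. */
static void long_running_task(int *done)
{
    *done = 1;
}

void parallel_work(float *a, int n, int *done)
{
    assert(n >= 0);

    #pragma omp taskgroup
    {
        #pragma omp task
        long_running_task(done);    /* may run concurrently with the loop */

        /* nogroup: no implicit taskgroup around the loop tasks, so the
           encountering task does not wait here for them. */
        #pragma omp taskloop grainsize(500) nogroup
        for (int i = 0; i < n; i++)
            a[i] = 2.0f * (float)i;
    }
    /* End of the explicit taskgroup: the long-running task and all
       loop tasks have completed before the function returns. */
}
```

Without the enclosing \code{taskgroup}, the function could return while loop tasks were still writing to \plc{a}.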
|
||||
|
||||
\cexample{taskloop}{1}
|
||||
\cexample[4.5]{taskloop}{1}
|
||||
|
||||
\ffreeexample{taskloop}{1}
|
||||
\ffreeexample[4.5]{taskloop}{1}
|
||||
|
||||
%\clearpage
|
||||
|
||||
@ -34,6 +34,6 @@ tasks. This is the common use case for the \code{taskloop} construct.)
|
||||
In the example, the code thus prints \code{x1 = 16384} (\plc{T}*\plc{N}) and
|
||||
\code{x2 = 1024} (\plc{N}).
|
||||
|
||||
\cexample{taskloop}{2}
|
||||
\cexample[4.5]{taskloop}{2}
|
||||
|
||||
\ffreeexample{taskloop}{2}
|
||||
\ffreeexample[4.5]{taskloop}{2}
|
||||
|
@ -8,7 +8,7 @@ that must be done in a critical region. By using \code{taskyield} when a task
|
||||
cannot get access to the \code{critical} region the implementation can suspend
|
||||
the current task and schedule some other task that can do something useful.
|
||||
|
||||
\cexample{taskyield}{1}
|
||||
\cexample[3.1]{taskyield}{1}
|
||||
|
||||
\ffreeexample{taskyield}{1}
|
||||
\ffreeexample[3.1]{taskyield}{1}
|
||||
|
||||
|
@ -16,9 +16,9 @@ region. The \code{omp\_get\_team\_num} routine returns the team number, which is
|
||||
between 0 and one less than the value returned by \code{omp\_get\_num\_teams}. The following
|
||||
example manually distributes a loop across two teams.
|
||||
|
||||
\cexample{teams}{1}
|
||||
\cexample[4.0]{teams}{1}
|
||||
|
||||
\ffreeexample{teams}{1}
|
||||
\ffreeexample[4.0]{teams}{1}
|
||||
|
||||
\subsection{\code{target}, \code{teams}, and \code{distribute} Constructs}
|
||||
\label{subsec:teams_distribute}
|
||||
@ -47,9 +47,9 @@ created by the \code{teams} construct. At the end of the \code{teams} region,
|
||||
each master thread's private copy of \plc{sum} is reduced into the final \plc{sum} that is
|
||||
implicitly mapped into the \code{target} region.
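A simplified sketch of a reduction across teams is shown below. It is an assumption-laden variant rather than the \code{teams.2} source: the function name is hypothetical, the blocked loop is collapsed into a single combined construct, and \plc{sum} is mapped explicitly with \code{map(tofrom: sum)} so that the result is returned to the host under OpenMP 4.5 and later rules as well.

```c
#include <assert.h>

float dotprod_teams(const float *B, const float *C, int n)
{
    float sum = 0.0f;
    assert(n >= 0);

    /* Each team reduces into its own private copy of sum; the copies are
       combined into the mapped sum at the end of the teams region. */
    #pragma omp target teams map(to: B[0:n], C[0:n]) map(tofrom: sum) \
            reduction(+: sum)
    #pragma omp distribute parallel for reduction(+: sum)
    for (int i = 0; i < n; i++)
        sum += B[i] * C[i];

    return sum;
}
```

When no target device is available, the region falls back to host execution and produces the same value, up to floating-point summation order.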
|
||||
|
||||
\cexample{teams}{2}
|
||||
\cexample[4.0]{teams}{2}
|
||||
|
||||
\ffreeexample{teams}{2}
|
||||
\ffreeexample[4.0]{teams}{2}
|
||||
|
||||
\subsection{\code{target} \code{teams}, and Distribute Parallel Loop Constructs}
|
||||
\label{subsec:teams_distribute_parallel}
|
||||
@ -62,9 +62,9 @@ team executes the \code{teams} region.
|
||||
The distribute parallel loop construct schedules the loop iterations across the
|
||||
master threads of each team and then across the threads of each team.
|
||||
|
||||
\cexample{teams}{3}
|
||||
\cexample[4.5]{teams}{3}
|
||||
|
||||
\ffreeexample{teams}{3}
|
||||
\ffreeexample[4.5]{teams}{3}
|
||||
|
||||
\subsection{\code{target} \code{teams} and Distribute Parallel Loop
|
||||
Constructs with Scheduling Clauses}
|
||||
@ -87,9 +87,9 @@ The \code{schedule} clause indicates that the 1024 iterations distributed to
|
||||
a master thread are then assigned to the threads in its associated team in chunks
|
||||
of 64 iterations.
|
||||
|
||||
\cexample{teams}{4}
|
||||
\cexample[4.0]{teams}{4}
|
||||
|
||||
\ffreeexample{teams}{4}
|
||||
\ffreeexample[4.0]{teams}{4}
|
||||
|
||||
\subsection{\code{target} \code{teams} and \code{distribute} \code{simd} Constructs}
|
||||
\label{subsec:teams_distribute_simd}
|
||||
@ -102,9 +102,9 @@ master thread of each team executes the \code{teams} region.
|
||||
The \code{distribute} \code{simd} construct schedules the loop iterations across
|
||||
the master thread of each team and then uses SIMD parallelism to execute the iterations.
|
||||
|
||||
\cexample{teams}{5}
|
||||
\cexample[4.0]{teams}{5}
|
||||
|
||||
\ffreeexample{teams}{5}
|
||||
\ffreeexample[4.0]{teams}{5}
|
||||
|
||||
\subsection{\code{target} \code{teams} and Distribute Parallel Loop SIMD Constructs}
|
||||
\label{subsec:teams_distribute_parallel_simd}
|
||||
@ -118,7 +118,7 @@ The distribute parallel loop SIMD construct schedules the loop iterations across
|
||||
the master thread of each team and then across the threads of each team where each
|
||||
thread uses SIMD parallelism.
|
||||
|
||||
\cexample{teams}{6}
|
||||
\cexample[4.0]{teams}{6}
|
||||
|
||||
\ffreeexample{teams}{6}
|
||||
\ffreeexample[4.0]{teams}{6}
|
||||
|
||||
|
@ -26,35 +26,36 @@ The initializer of the \code{declare}~\code{reduction} directive specifies
|
||||
the initial value for the private variable of each implicit task.
|
||||
The \code{omp\_priv} identifier is used to denote the private variable.
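The role of the initializer and of \code{omp\_priv} can be sketched as follows. The code is modeled on the description above rather than copied from \code{udr.1}: the exact signatures of \plc{minproc} and \plc{maxproc}, the local names \plc{lo} and \plc{hi}, and the use of \code{INT\_MIN} as the initial value for the maximum are assumptions.

```c
#include <assert.h>
#include <limits.h>

struct point { int x; int y; };

/* Combiner helpers: fold *in into *out component-wise. */
static void minproc(struct point *out, const struct point *in)
{
    if (in->x < out->x) out->x = in->x;
    if (in->y < out->y) out->y = in->y;
}

static void maxproc(struct point *out, const struct point *in)
{
    if (in->x > out->x) out->x = in->x;
    if (in->y > out->y) out->y = in->y;
}

/* omp_priv names the private copy that each implicit task starts from. */
#pragma omp declare reduction(min : struct point : \
        minproc(&omp_out, &omp_in)) \
        initializer(omp_priv = { INT_MAX, INT_MAX })

#pragma omp declare reduction(max : struct point : \
        maxproc(&omp_out, &omp_in)) \
        initializer(omp_priv = { INT_MIN, INT_MIN })

void find_enclosing_rectangle(int n, const struct point points[],
                              struct point *minp, struct point *maxp)
{
    assert(n > 0);
    struct point lo = points[0];
    struct point hi = points[0];

    #pragma omp parallel for reduction(min: lo) reduction(max: hi)
    for (int i = 1; i < n; i++) {
        minproc(&lo, &points[i]);
        maxproc(&hi, &points[i]);
    }

    *minp = lo;
    *maxp = hi;
}
```

The initial values are the identities of the respective operations, so a private copy that never sees a point leaves the combined result unchanged.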
|
||||
|
||||
\cexample{udr}{1}
|
||||
\cexample[4.0]{udr}{1}
|
||||
\clearpage
|
||||
|
||||
The following example shows the corresponding code in Fortran.
|
||||
The \code{declare}~\code{reduction} directives are specified as part of
|
||||
the declaration in subroutine \plc{find\_enclosing\_rectangle} and
|
||||
the procedures that perform the min and max operations are specified as subprograms.
|
||||
|
||||
\ffreeexample{udr}{1}
|
||||
\ffreeexample[4.0]{udr}{1}
|
||||
|
||||
|
||||
The following example shows the same computation as \plc{udr.1} but it illustrates that you can craft complex expressions in the user-defined reduction declaration. In this case, instead of calling the \plc{minproc} and \plc{maxproc} functions we inline the code in a single expression.
|
||||
|
||||
\cexample{udr}{2}
|
||||
\cexample[4.0]{udr}{2}
|
||||
|
||||
The corresponding code of the same example in Fortran is very similar
|
||||
except that the assignment expression in the \code{declare}~\code{reduction}
|
||||
directive can only be used for a single variable, in this case through
|
||||
a type structure constructor \plc{point($\ldots$)}.
|
||||
|
||||
\ffreeexample{udr}{2}
|
||||
\ffreeexample[4.0]{udr}{2}
|
||||
|
||||
|
||||
The following example shows the use of special variables in arguments for combiner (\code{omp\_in} and \code{omp\_out}) and initializer (\code{omp\_priv} and \code{omp\_orig}) routines. This example returns the maximum value of an array and the corresponding index value. The \code{declare}~\code{reduction} directive specifies a user-defined reduction operation \plc{maxloc} for data type \plc{struct} \plc{mx\_s}. The function \plc{mx\_combine} is the combiner and the function \plc{mx\_init} is the initializer.
|
||||
|
||||
\cexample{udr}{3}
|
||||
\cexample[4.0]{udr}{3}
|
||||
|
||||
Below is the corresponding Fortran version of the above example. The \code{declare}~\code{reduction} directive specifies the user-defined operation \plc{maxloc} for user-derived type \plc{mx\_s}. The combiner \plc{mx\_combine} and the initializer \plc{mx\_init} are specified as subprograms.
|
||||
|
||||
\ffreeexample{udr}{3}
|
||||
\ffreeexample[4.0]{udr}{3}
|
||||
|
||||
|
||||
The following example explains a few details of the user-defined reduction
|
||||
@ -74,16 +75,16 @@ has the \code{initializer} clause, the subroutine specified on the clause
|
||||
must be accessible in the current scoping unit. In this case,
|
||||
the subroutine \plc{dt\_init} is accessible by use association.
|
||||
|
||||
\ffreeexample{udr}{4}
|
||||
\ffreeexample[4.0]{udr}{4}
|
||||
|
||||
|
||||
The following example uses user-defined reductions to declare a plus (+) reduction for a C++ class. As the \code{declare}~\code{reduction} directive is inside the context of the \plc{V} class, the expressions in the \code{declare}~\code{reduction} directive are resolved in the context of the class. Also, note that the \code{initializer} clause uses a copy constructor to initialize the private variables of the reduction and passes the original variable to it as a parameter through the special variable \code{omp\_orig}.
|
||||
|
||||
\cppexample{udr}{5}
|
||||
\cppexample[4.0]{udr}{5}
|
||||
|
||||
The following example shows how user-defined reductions can be defined for some STL containers. The first \code{declare}~\code{reduction} defines the plus (+) operation for \plc{std::vector<int>} by making use of the \plc{std::transform} algorithm. The second and third define the merge (or concatenation) operation for \plc{std::vector<int>} and \plc{std::list<int>}.
|
||||
%It shows how the same user-defined reduction operation can be defined to be done differently depending on the specified data type.
|
||||
It shows how the user-defined reduction operation can be applied to specific data types of the STL.
|
||||
|
||||
\cppexample{udr}{6}
|
||||
\cppexample[4.0]{udr}{6}
|
||||
|
||||
|
@ -46,9 +46,9 @@ the purpose of a function variant is to produce the same results by a different
|
||||
%\code{teams distribute simd} in the variant function would produce non conforming code.
|
||||
|
||||
%\pagebreak
|
||||
\cexample{declare_variant}{1}
|
||||
\cexample[5.0]{declare_variant}{1}
|
||||
|
||||
\ffreeexample{declare_variant}{1}
|
||||
\ffreeexample[5.0]{declare_variant}{1}
|
||||
|
||||
|
||||
%\pagebreak
|
||||
@ -72,6 +72,6 @@ containing only a basic \code{parallel}~\code{for} construct is used for the cal
|
||||
%can be found in the allocator example of the Memory Management Chapter.
|
||||
|
||||
%\pagebreak
|
||||
\cexample{declare_variant}{2}
|
||||
\cexample[5.0]{declare_variant}{2}
|
||||
|
||||
\ffreeexample{declare_variant}{2}
|
||||
\ffreeexample[5.0]{declare_variant}{2}
|
||||
|
@ -5,11 +5,11 @@
|
||||
|
||||
The OpenMP Examples document has been updated with new features
|
||||
found in the OpenMP 5.0 Specification. The additional examples and updates
|
||||
are referenced in the Document Revision History of the Appendix, \specref{sec:history_45_to_50}.
|
||||
are referenced in the Document Revision History of the Appendix on page~\pageref{chap:history}.
|
||||
|
||||
Text describing an example with a 5.0 feature specifically states
|
||||
that the feature support begins in the OpenMP 5.0 Specification. Also,
|
||||
an \plc{omp\_5.0} keyword has been added to metadata in the source code.
|
||||
an \code{\small omp\_5.0} keyword has been added to metadata in the source code.
|
||||
These distinctions are presented to remind readers that a 5.0 compliant
|
||||
OpenMP implementation is necessary to use these features in codes.
|
||||
|
||||
|
48
History.tex
@ -1,6 +1,46 @@
|
||||
\chapter{Document Revision History}
|
||||
\label{chap:history}
|
||||
|
||||
\section{Changes from 5.0.0 to 5.0.1}
|
||||
\label{sec:history_50_to_501}
|
||||
|
||||
\begin{itemize}
|
||||
\item Added version tags (\code{\small{}omp\_}\plc{x.y}) in example labels
|
||||
and the corresponding source codes for all examples that feature
|
||||
OpenMP 3.0 and later.
|
||||
|
||||
\item Included additional examples for the 5.0 features:
|
||||
|
||||
\begin{itemize}
|
||||
\item Extension to the \code{defaultmap} clause
|
||||
(\specref{sec:defaultmap})
|
||||
\item Transferring noncontiguous data with the \code{target}~\code{update} directive in Fortran (\specref{sec:array-shaping})
|
||||
\item \code{conditional} modifier for the \code{lastprivate} clause (\specref{sec:lastprivate})
|
||||
\item \code{task} modifier for the \code{reduction} clause (\specref{subsec:task_reduction})
|
||||
\item Reduction on combined target constructs (\specref{subsec:target_reduction})
|
||||
\item Task reduction with target constructs
|
||||
(\specref{subsec:target_task_reduction})
|
||||
\item \code{scan} directive for returning the \emph{prefix sum} of a reduction (\specref{sec:scan})
|
||||
|
||||
\end{itemize}
|
||||
|
||||
\item Included additional examples for the 4.x features:
|
||||
|
||||
\begin{itemize}
|
||||
\item Dependence for undeferred tasks
|
||||
(\specref{subsec:depend_undefer_task})
|
||||
\item \code{ref}, \code{val}, \code{uval} modifiers for \code{linear} clause (\specref{sec:linear_modifier})
|
||||
|
||||
\end{itemize}
|
||||
|
||||
\item Clarified the description of pointer mapping and pointer attachment in
|
||||
\specref{sec:pointer_mapping}.
|
||||
\item Clarified the description of memory model examples
|
||||
in \specref{sec:mem_model}.
|
||||
|
||||
\end{itemize}
|
||||
|
||||
|
||||
\section{Changes from 4.5.0 to 5.0.0}
|
||||
\label{sec:history_45_to_50}
|
||||
|
||||
@ -21,6 +61,8 @@
|
||||
\item Combined constructs: \code{parallel}~\code{master}~\code{taskloop} and \code{parallel}~\code{master}~\code{taskloop}~\code{simd}
|
||||
(\specref{sec:parallel_master_taskloop})
|
||||
\item Reverse Offload through \plc{ancestor} modifier of \code{device} clause. (\specref{subsec:target_reverse_offload})
|
||||
\item Pointer Mapping - behavior of mapped pointers (\specref{sec:pointer_mapping}) %Example_target_ptr_map*
|
||||
\item Structure Mapping - behavior of mapped structures (\specref{sec:structure_mapping}) %Examples_target_structure_mapping.tex target_struct_map*
|
||||
\item Array Shaping with the \plc{shape-operator} (\specref{sec:array-shaping})
|
||||
\item The \code{declare}~\code{mapper} construct (\specref{sec:declare_mapper})
|
||||
\item Acquire and Release Semantics Synchronization: Memory ordering
|
||||
@ -36,12 +78,16 @@
|
||||
\item \code{requires} directive specifies required features of implementation (\specref{sec:requires})
|
||||
\item \code{declare}~\code{variant} directive - for function variants (\specref{sec:declare_variant})
|
||||
\item \code{metadirective} directive - for directive variants (\specref{sec:metadirective})
|
||||
\item \code{OMP\_TARGET\_OFFLOAD} Environment Variable - controls offload behavior (\specref{sec:target_offload})
|
||||
\end{itemize}
|
||||
|
||||
\item Included the following additional examples for the 4.x features:
|
||||
\begin{itemize}
|
||||
\item more taskloop examples (\specref{sec:taskloop})
|
||||
\item user-defined reduction (UDR) (\specref{subsec:UDR})
|
||||
%NEW 5.0
|
||||
%\item \code{target} \code{enter} and \code{exit} \code{data} unstructured data constructs (\specref{sec:target_enter_exit_data}) %Example_target_unstructured_data.* ?
|
||||
|
||||
\end{itemize}
|
||||
\end{itemize}
|
||||
|
||||
@ -101,12 +147,12 @@ Added the following new examples:
|
||||
\begin{itemize}
|
||||
\item task dependences (\specref{sec:task_depend})
|
||||
\item \code{target} construct (\specref{sec:target})
|
||||
\item array sections in device constructs (\specref{sec:array_sections})
|
||||
\item \code{target}~\code{data} construct (\specref{sec:target_data})
|
||||
\item \code{target}~\code{update} construct (\specref{sec:target_update})
|
||||
\item \code{declare}~\code{target} construct (\specref{sec:declare_target})
|
||||
\item \code{teams} constructs (\specref{sec:teams})
|
||||
\item asynchronous execution of a \code{target} region using tasks (\specref{subsec:async_target_with_tasks})
|
||||
\item array sections in device constructs (\specref{sec:array_sections})
|
||||
\item device runtime routines (\specref{sec:device})
|
||||
\item Fortran ASSOCIATE construct (\specref{sec:associate})
|
||||
\item cancellation constructs (\specref{sec:cancellation})
|
||||
|
48
Makefile
@ -1,8 +1,9 @@
|
||||
# Makefile for the OpenMP Examples document in LaTeX format.
|
||||
# For more information, see the master document, openmp-examples.tex.
|
||||
|
||||
version=5.0.0
|
||||
version=5.0.1
|
||||
default: openmp-examples.pdf
|
||||
diff: openmp-diff-abridged.pdf
|
||||
|
||||
|
||||
CHAPTERS=Title_Page.tex \
|
||||
@ -25,6 +26,9 @@ INTERMEDIATE_FILES=openmp-examples.pdf \
|
||||
openmp-examples.out \
|
||||
openmp-examples.log
|
||||
|
||||
# check for branch names with "name_XXX"
|
||||
DIFF_TICKET_ID=$(shell git rev-parse --abbrev-ref HEAD)
|
||||
|
||||
openmp-examples.pdf: $(CHAPTERS) $(SOURCES) openmp.sty openmp-examples.tex openmp-logo.png
|
||||
rm -f $(INTERMEDIATE_FILES)
|
||||
pdflatex -interaction=batchmode -file-line-error openmp-examples.tex
|
||||
@ -34,4 +38,46 @@ openmp-examples.pdf: $(CHAPTERS) $(SOURCES) openmp.sty openmp-examples.tex openm
|
||||
|
||||
clean:
|
||||
rm -f $(INTERMEDIATE_FILES)
|
||||
rm -f openmp-diff-full.pdf openmp-diff-abridged.pdf
|
||||
rm -rf *.tmpdir
|
||||
|
||||
ifdef DIFF_TO
|
||||
VC_DIFF_TO := -r ${DIFF_TO}
|
||||
else
|
||||
VC_DIFF_TO :=
|
||||
endif
|
||||
ifdef DIFF_FROM
|
||||
VC_DIFF_FROM := -r ${DIFF_FROM}
|
||||
else
|
||||
VC_DIFF_FROM := -r master
|
||||
endif
|
||||
|
||||
DIFF_TO:=HEAD
|
||||
DIFF_FROM:=master
|
||||
DIFF_TYPE:=UNDERLINE
|
||||
|
||||
COMMON_DIFF_OPTS:=--math-markup=whole \
|
||||
--append-safecmd=plc,code,hcode,scode,pcode,splc \
|
||||
--append-textcmd=subsubsubsection
|
||||
|
||||
VC_DIFF_OPTS:=${COMMON_DIFF_OPTS} --force -c latexdiff.cfg --flatten --type="${DIFF_TYPE}" --git --pdf ${VC_DIFF_FROM} ${VC_DIFF_TO} --subtype=ZLABEL --graphics-markup=none
|
||||
|
||||
VC_DIFF_MINIMAL_OPTS:= --only-changes --force
|
||||
|
||||
%.tmpdir: $(wildcard *.sty) $(wildcard *.png) $(wildcard *.aux) openmp-examples.pdf
|
||||
mkdir -p $@/sources
|
||||
mkdir -p $@/figs
|
||||
cp -f $^ "$@/"
|
||||
cp -f sources/* "$@/sources"
|
||||
cp -f figs/* "$@/figs"
|
||||
|
||||
openmp-diff-abridged.pdf: diff-fast-minimal.tmpdir openmp-examples.pdf
|
||||
env PATH="$(shell pwd)/util/latexdiff:$(PATH)" latexdiff-vc ${VC_DIFF_MINIMAL_OPTS} --fast -d $< ${VC_DIFF_OPTS} openmp-examples.tex
|
||||
cp $</openmp-examples.pdf $@
|
||||
if [ "x$(DIFF_TICKET_ID)" != "x" ]; then cp $@ ${@:.pdf=-$(DIFF_TICKET_ID).pdf}; fi
|
||||
|
||||
# Slow but portable diffs
|
||||
openmp-diff-minimal.pdf: diffs-slow-minimal.tmpdir
|
||||
env PATH="$(shell pwd)/util/latexdiff:$(PATH)" latexdiff-vc ${VC_DIFF_MINIMAL_OPTS} -d $< ${VC_DIFF_OPTS} openmp-examples.tex
|
||||
cp $</openmp-examples.pdf $@
|
||||
if [ "x$(DIFF_TICKET_ID)" != "x" ]; then cp $@ ${@:.pdf=-$(DIFF_TICKET_ID).pdf}; fi
|
||||
|
24
README
@ -32,6 +32,7 @@ For copyright information, please see omp_copyright.txt.
|
||||
@@compilable: yes|no|maybe
|
||||
@@linkable: yes|no|maybe
|
||||
@@expect: success|failure|nothing|rt-error
|
||||
@@version: omp_<verno>
|
||||
|
||||
"name" is the name of an example
|
||||
"type" is the source code type, which can be translated into or from
|
||||
@ -43,20 +44,27 @@ For copyright information, please see omp_copyright.txt.
|
||||
"rt-error" is for a case where compilation may be successful,
|
||||
but the code contains potential runtime issues (such as race condition).
|
||||
An alternative would be to just use "conforming" or "non-conforming".
|
||||
"version" indicates features for a specific OpenMP version, such as "omp_5.0"
|
||||
|
||||
3) LaTeX macros for examples
|
||||
|
||||
- Source code with language h-rules
|
||||
\cexample{<ename>}{<seq-no>} % for C/C++ examples
|
||||
\cppexample{<ename>}{<seq-no>} % for C++ examples
|
||||
\fexample{<ename>}{<seq-no>} % for fixed-form Fortran examples
|
||||
\ffreeexample{<ename>}{<seq-no>} % for free-form Fortran examples
|
||||
\cexample[<verno>]{<ename>}{<seq-no>} % for C/C++ examples
|
||||
\cppexample[<verno>]{<ename>}{<seq-no>} % for C++ examples
|
||||
\fexample[<verno>]{<ename>}{<seq-no>} % for fixed-form Fortran examples
|
||||
\ffreeexample[<verno>]{<ename>}{<seq-no>} % for free-form Fortran examples
|
||||
|
||||
- Source code without language h-rules
|
||||
\cnexample{<ename>}{<seq-no>}
|
||||
\cppnexample{<ename>}{<seq-no>}
|
||||
\fnexample{<ename>}{<seq-no>}
|
||||
\ffreenexample{<ename>}{<seq-no>}
|
||||
\cnexample[<verno>]{<ename>}{<seq-no>}
|
||||
\cppnexample[<verno>]{<ename>}{<seq-no>}
|
||||
\fnexample[<verno>]{<ename>}{<seq-no>}
|
||||
\ffreenexample[<verno>]{<ename>}{<seq-no>}
|
||||
|
||||
An optional <verno> can be supplied in a macro to include a specific OpenMP
version in the example header. This option also implies that an additional
tag line (@@version) is included in the corresponding source code.
If this is not the case (i.e., there is no @@version tag line), prefix
<verno> with an underscore '_' symbol in the macro.
|
||||
|
||||
- Language h-rules
|
||||
\cspecificstart, \cspecificend
|
||||
|
@ -27,7 +27,7 @@ Source codes for OpenMP \PVER{} Examples can be downloaded from
|
||||
\href{https://github.com/OpenMP/Examples/tree/v\VER}{github}.\\
|
||||
|
||||
\begin{adjustwidth}{0pt}{1em}\setlength{\parskip}{0.25\baselineskip}%
|
||||
Copyright © 1997-2019 OpenMP Architecture Review Board.\\
|
||||
Copyright © 1997-2020 OpenMP Architecture Review Board.\\
|
||||
Permission to copy without fee all or part of this material is granted,
|
||||
provided the OpenMP Architecture Review Board copyright notice and
|
||||
the title of this document appear. Notice is given that copying is by
|
||||
|
5
latexdiff.cfg
Normal file
@ -0,0 +1,5 @@
|
||||
PICTUREENV=(?:picture|DIFnomarkup)[\w\d*@]*
|
||||
VERBATIMLINEENV=(?:boxedcode|omptCallback|omptRecord|omptInquiry|omptEnum|omptOther|ompcPragma|ompfPragma|ompfSyntax|ompfFunction|ompfSubroutine|ompcEnum|ompcFunction|ompfEnum|ompEnv|ompSyntax|indentedcodelist|codepar)
|
||||
COUNTERCMD=subsubsubsection
|
||||
CUSTOMDIFCMD=(?:binding|comments|constraints|crossreferences|descr|argdesc|effect|format|restrictions|summary|syntax|events|tools|record|glossaryterm)
|
||||
FLOATENV=(?:note|(?:c|cpp|ccpp|c90|c99|fortran)specific)
|
@ -48,9 +48,9 @@
|
||||
\documentclass[10pt,letterpaper,twoside,makeidx,hidelinks]{scrreprt}
|
||||
|
||||
% Text to appear in the footer on even-numbered pages:
|
||||
\newcommand{\VER}{5.0.0}
|
||||
\newcommand{\VER}{5.0.1}
|
||||
\newcommand{\PVER}{\VER{}p1}
|
||||
\newcommand{\VERDATE}{February 2018}
|
||||
\newcommand{\VERDATE}{May 2020}
|
||||
\newcommand{\footerText}{OpenMP Examples Version \PVER{} - \VERDATE}
|
||||
|
||||
% Unified style sheet for OpenMP documents:
|
||||
|
@ -49,9 +49,9 @@
|
||||
\documentclass[10pt,letterpaper,twoside,makeidx,hidelinks]{scrreprt}
|
||||
|
||||
% Text to appear in the footer on even-numbered pages:
|
||||
\newcommand{\VER}{5.0.0}
|
||||
\newcommand{\VER}{5.0.1}
|
||||
\newcommand{\PVER}{\VER{}}
|
||||
\newcommand{\VERDATE}{November 2019}
|
||||
\newcommand{\VERDATE}{June 2020}
|
||||
\newcommand{\footerText}{OpenMP Examples Version \PVER{} - \VERDATE}
|
||||
|
||||
% Unified style sheet for OpenMP documents:
|
||||
@ -119,6 +119,7 @@
|
||||
|
||||
\input{Chap_devices}
|
||||
\input{Examples_target}
|
||||
\input{Examples_target_defaultmap}
|
||||
\input{Examples_target_pointer_mapping}
|
||||
\input{Examples_target_structure_mapping}
|
||||
\input{Examples_array_sections}
|
||||
@ -146,6 +147,8 @@
|
||||
% Forward Depend 370
|
||||
% simdlen 476
|
||||
% simd linear modifier 480
|
||||
\input{Examples_linear_modifier}
|
||||
|
||||
|
||||
\input{Chap_synchronization}
|
||||
\input{Examples_critical}
|
||||
@ -182,6 +185,7 @@
|
||||
\input{Examples_reduction}
|
||||
% User UDR 287
|
||||
\input{Examples_udr}
|
||||
\input{Examples_scan}
|
||||
\input{Examples_copyin}
|
||||
\input{Examples_copyprivate}
|
||||
\input{Examples_cpp_reference}
|
||||
|
59
openmp.sty
@ -417,10 +417,10 @@
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
% Code example formatting for the Examples document
|
||||
% This defines:
|
||||
% /cexample formats blue markers, caption, and code for C examples
|
||||
% /cppexample formats blue markers, caption, and code for C++ examples
|
||||
% /fexample formats blue markers, caption, and code for Fortran (fixed) examples
|
||||
% /ffreeexample formats blue markers, caption, and code for Fortran90 (free) examples
|
||||
% \cexample formats blue markers, caption, and code for C examples
|
||||
% \cppexample formats blue markers, caption, and code for C++ examples
|
||||
% \fexample formats blue markers, caption, and code for Fortran (fixed) examples
|
||||
% \ffreeexample formats blue markers, caption, and code for Fortran90 (free) examples
|
||||
% Thanks to Jin, Haoqiang H. for the original definitions of the following:
|
||||
|
||||
\usepackage{color,fancyvrb} % for \VerbatimInput
|
||||
@ -437,7 +437,7 @@
|
||||
|
||||
\newcommand{\escstr}[1]{\myreplace{_}{\_}{#1}}
|
||||
|
||||
\def\exampleheader#1#2#3#4{%
|
||||
\def\exampleheader#1#2#3#4#5{%
|
||||
\ifthenelse{ \equal{#1}{} }{
|
||||
\def\cname{#2}
|
||||
\def\ename\cname
|
||||
@ -448,52 +448,61 @@
|
||||
% Use following for mnemonics
|
||||
\def\ename{\escstr{#1}.#2.#3}
|
||||
}
|
||||
\newcount\cnt
|
||||
\cnt=#4
|
||||
\ifthenelse{ \equal{#5}{} }{
|
||||
\def\vername{}
|
||||
}{
|
||||
\def\myver##1{\toolboxSplitAt{##1}{_}\lefttext\righttext
|
||||
\lefttext\toolboxIfElse{\ifx\righttext\undefined}%
|
||||
{\global\advance\cnt by 1}{\expandafter{\righttext}}}
|
||||
\def\vername{\;\;(\code{\small{}omp\_\myver{#5}})}
|
||||
}
|
||||
\noindent
|
||||
\textit{Example \ename}
|
||||
\textit{Example \ename}\vername
|
||||
\def\fcnt{\the\cnt}
|
||||
%\vspace*{-3mm}
|
||||
\code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=#4,fontsize=\small]%
|
||||
%\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=#4,fontsize=\small]%
|
||||
%\code{\VerbatimInput[firstline=#4,fontsize=\small]%
|
||||
\code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=\fcnt,fontsize=\small]%
|
||||
{sources/Example_\cname}}
|
||||
}
|
||||
|
||||
\def\cnexample#1#2{%
|
||||
\exampleheader{#1}{#2}{c}{8}
|
||||
\newcommand\cnexample[3][]{%
|
||||
\exampleheader{#2}{#3}{c}{8}{#1}
|
||||
}
|
||||
|
||||
\def\cppnexample#1#2{%
|
||||
\exampleheader{#1}{#2}{cpp}{8}
|
||||
\newcommand\cppnexample[3][]{%
|
||||
\exampleheader{#2}{#3}{cpp}{8}{#1}
|
||||
}
|
||||
|
||||
\def\fnexample#1#2{%
|
||||
\exampleheader{#1}{#2}{f}{6}
|
||||
\newcommand\fnexample[3][]{%
|
||||
\exampleheader{#2}{#3}{f}{6}{#1}
|
||||
}
|
||||
|
||||
\def\ffreenexample#1#2{%
|
||||
\exampleheader{#1}{#2}{f90}{6}
|
||||
\newcommand\ffreenexample[3][]{%
|
||||
\exampleheader{#2}{#3}{f90}{6}{#1}
|
||||
}
|
||||
|
||||
\newcommand\cexample[2]{%
|
||||
\newcommand\cexample[3][]{%
|
||||
\needspace{5\baselineskip}\ccppspecificstart
|
||||
\cnexample{#1}{#2}
|
||||
\cnexample[#1]{#2}{#3}
|
||||
\ccppspecificend
|
||||
}
|
||||
|
||||
\newcommand\cppexample[2]{%
|
||||
\newcommand\cppexample[3][]{%
|
||||
\needspace{5\baselineskip}\cppspecificstart
|
||||
\cppnexample{#1}{#2}
|
||||
\cppnexample[#1]{#2}{#3}
|
||||
\cppspecificend
|
||||
}
|
||||
|
||||
\newcommand\fexample[2]{%
|
||||
\newcommand\fexample[3][]{%
|
||||
\needspace{5\baselineskip}\fortranspecificstart
|
||||
\fnexample{#1}{#2}
|
||||
\fnexample[#1]{#2}{#3}
|
||||
\fortranspecificend
|
||||
}
|
||||
|
||||
\newcommand\ffreeexample[2]{%
|
||||
\newcommand\ffreeexample[3][]{%
|
||||
\needspace{5\baselineskip}\fortranspecificstart
|
||||
\ffreenexample{#1}{#2}
|
||||
\ffreenexample[#1]{#2}{#3}
|
||||
\fortranspecificend
|
||||
}
|
||||
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
void star( double *a, double *b, double *c, int n, int *ioff )
|
||||
{
|
||||
|
@ -3,6 +3,7 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
subroutine star(a,b,c,n,ioff_ptr)
|
||||
implicit none
|
||||
double precision :: a(*),b(*),c(*)
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
#include <stdio.h>
|
||||
|
||||
|
@ -3,6 +3,7 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
program main
|
||||
implicit none
|
||||
integer, parameter :: N=32
|
||||
@ -19,15 +20,15 @@ program main
|
||||
end program
|
||||
|
||||
function add1(a,b,fact) result(c)
|
||||
!$omp declare simd(add1) uniform(fact)
|
||||
implicit none
|
||||
!$omp declare simd(add1) uniform(fact)
|
||||
double precision :: a,b,fact, c
|
||||
c = a + b + fact
|
||||
end function
|
||||
|
||||
function add2(a,b,i, fact) result(c)
|
||||
!$omp declare simd(add2) uniform(a,b,fact) linear(i:1)
|
||||
implicit none
|
||||
!$omp declare simd(add2) uniform(a,b,fact) linear(i:1)
|
||||
integer :: i
|
||||
double precision :: a(*),b(*),fact, c
|
||||
c = a(i) + b(i) + fact
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
double work( double *a, double *b, int n )
|
||||
{
|
||||
|
@ -3,6 +3,7 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
subroutine work( a, b, n, sum )
|
||||
implicit none
|
||||
integer :: i, n
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
void work( float *b, int n, int m )
|
||||
{
|
||||
|
@ -3,6 +3,7 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
subroutine work( b, n, m )
|
||||
implicit none
|
||||
real :: b(n)
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
void work( double **a, double **b, double **c, int n )
|
||||
{
|
||||
|
@ -3,6 +3,7 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
subroutine work( a, b, c, n )
|
||||
implicit none
|
||||
integer :: i,j,n
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
#pragma omp declare simd linear(p:1) notinbranch
|
||||
int foo(int *p){
|
||||
|
@ -3,9 +3,10 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
function foo(p) result(r)
|
||||
!$omp declare simd(foo) notinbranch
|
||||
implicit none
|
||||
!$omp declare simd(foo) notinbranch
|
||||
integer :: p, r
|
||||
p = p + 10
|
||||
r = p
|
||||
@ -26,8 +27,8 @@ function myaddint(a, b, n) result(r)
|
||||
end function myaddint
|
||||
|
||||
function goo(p) result(r)
|
||||
!$omp declare simd(goo) inbranch
|
||||
implicit none
|
||||
!$omp declare simd(goo) inbranch
|
||||
real :: p, r
|
||||
p = p + 18.5
|
||||
r = p
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
|
@ -3,6 +3,7 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
program fibonacci
|
||||
implicit none
|
||||
integer,parameter :: N=45
|
||||
@ -25,8 +26,8 @@ program fibonacci
|
||||
end program
|
||||
|
||||
recursive function fib(n) result(r)
|
||||
!$omp declare simd(fib) inbranch
|
||||
implicit none
|
||||
!$omp declare simd(fib) inbranch
|
||||
integer :: n, r
|
||||
|
||||
if (n <= 1) then
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
#include <stdio.h>
|
||||
#include <math.h>
|
||||
|
@ -3,6 +3,7 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
module work
|
||||
|
||||
integer :: P(1000)
|
||||
|
@ -1,9 +1,10 @@
|
||||
/*
|
||||
* @@name: acquire_release.1.c
|
||||
* @@type: C
|
||||
* @@compilable: yes, omp_5.0
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_5.0
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
|
@ -1,8 +1,9 @@
|
||||
! @@name: acquire_release.1.f90
|
||||
! @@type: F-free
|
||||
! @@compilable: yes, omp_5.0
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_5.0
|
||||
|
||||
program rel_acq_ex1
|
||||
use omp_lib
|
||||
|
@ -1,9 +1,10 @@
|
||||
/*
|
||||
* @@name: acquire_release.2.c
|
||||
* @@type: C
|
||||
* @@compilable: yes, omp_5.0
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_5.0
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
|
@ -1,8 +1,9 @@
|
||||
! @@name: acquire_release.2.f90
|
||||
! @@type: F-free
|
||||
! @@compilable: yes, omp_5.0
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_5.0
|
||||
|
||||
program rel_acq_ex2
|
||||
use omp_lib
|
||||
|
@ -1,9 +1,10 @@
|
||||
/*
|
||||
* @@name: acquire_release.3.c
|
||||
* @@type: C
|
||||
* @@compilable: yes, omp_5.0
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_5.0
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
|
@ -1,8 +1,9 @@
|
||||
! @@name: acquire_release.3.f90
|
||||
! @@type: F-free
|
||||
! @@compilable: yes, omp_5.0
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_5.0
|
||||
|
||||
program rel_acq_ex3
|
||||
use omp_lib
|
||||
|
@ -1,9 +1,10 @@
|
||||
/*
|
||||
* @@name: acquire_release.4.c
|
||||
* @@type: C
|
||||
* @@compilable: yes, omp_5.0
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_5.0
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
|
@ -1,8 +1,9 @@
|
||||
! @@name: acquire_release.4.f90
|
||||
! @@type: F-free
|
||||
! @@compilable: yes, omp_5.0
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_5.0
|
||||
|
||||
program rel_acq_ex4
|
||||
use omp_lib
|
||||
@ -13,7 +14,7 @@ program rel_acq_ex4
|
||||
!! !!! THIS CODE WILL FAIL TO PRODUCE CONSISTENT RESULTS !!!!!!!
|
||||
!! !!! DO NOT PROGRAM SYNCHRONIZATION THIS WAY !!!!!!!
|
||||
|
||||
!$omp parallel num_threads private(thrd) private(tmp)
|
||||
!$omp parallel num_threads(2) private(thrd) private(tmp)
|
||||
thrd = omp_get_thread_num()
|
||||
if (thrd == 0) then
|
||||
!$omp critical
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
|
||||
void work();
|
||||
|
@ -3,6 +3,7 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
PROGRAM EXAMPLE
|
||||
!$OMP PARALLEL PROC_BIND(SPREAD) NUM_THREADS(4)
|
||||
CALL WORK()
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
void work();
|
||||
void foo()
|
||||
|
@ -3,6 +3,7 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
subroutine foo
|
||||
!$omp parallel num_threads(16) proc_bind(spread)
|
||||
call work()
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
void work();
|
||||
int main()
|
||||
|
@ -3,6 +3,7 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
PROGRAM EXAMPLE
|
||||
!$OMP PARALLEL PROC_BIND(CLOSE) NUM_THREADS(4)
|
||||
CALL WORK()
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
void work();
|
||||
void foo()
|
||||
|
@ -3,6 +3,7 @@
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
! @@expect: success
|
||||
! @@version: omp_4.0
|
||||
subroutine foo
|
||||
!$omp parallel num_threads(16) proc_bind(close)
|
||||
call work()
|
||||
|
@ -4,6 +4,7 @@
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_4.0
|
||||
*/
|
||||
void work();
|
||||
int main()
|
||||
|
Some files were not shown because too many files have changed in this diff