mirror of
https://github.com/OpenMP/Examples.git
synced 2025-04-04 05:41:33 +01:00
synced with the 4.5.0 implementation of the examples-internal repo
This commit is contained in:
parent
c65fe47427
commit
156a12ca09
@ -1,3 +1,9 @@
|
||||
[20-May-2016] Version 4.5.0
|
||||
Changes from 4.0.2ltx
|
||||
|
||||
1. Reorganization into topic chapters
|
||||
2. Change file suffixes (f/f90 => Fixed/Free format) C++ => cpp
|
||||
|
||||
[2-Feb-2015] Version 4.0.2
|
||||
Changes from 4.0.1ltx
|
||||
|
||||
|
48
Chap_SIMD.tex
Normal file
@ -0,0 +1,48 @@
|
||||
\pagebreak
|
||||
\chapter{SIMD}
|
||||
\label{chap:simd}
|
||||
|
||||
Single instruction, multiple data (SIMD) is a form of parallel execution
|
||||
in which the same operation is performed on multiple data elements
|
||||
independently in hardware vector processing units (VPU), also called SIMD units.
|
||||
The addition of two vectors to form a third vector is a SIMD operation.
|
||||
Many processors have SIMD (vector) units that can perform simultaneously
|
||||
2, 4, 8 or more executions of the same operation (by a single SIMD unit).
|
||||
|
||||
Loops without loop-carried backward dependency (or with dependency preserved using
|
||||
ordered simd) are candidates for vectorization by the compiler for
|
||||
execution with SIMD units. In addition, with state-of-the-art vectorization
|
||||
technology and \code{declare simd} construct extensions for function vectorization
|
||||
in the OpenMP 4.5 specification, loops with function calls can be vectorized as well.
|
||||
The basic idea is that a scalar function call in a loop can be replaced by a vector version
|
||||
of the function, and the loop can be vectorized simultaneously by combining a loop
|
||||
vectorization (\code{simd} directive on the loop) and a function
|
||||
vectorization (\code{declare simd} directive on the function).
|
||||
|
||||
A \code{simd} construct states that SIMD operations be performed on the
|
||||
data within the loop. A number of clauses are available to provide
|
||||
data-sharing attributes (\code{private}, \code{linear}, \code{reduction} and
|
||||
\code{lastprivate}). Other clauses provide vector length preference/restrictions
|
||||
(\code{simdlen} / \code{safelen}), loop fusion (\code{collapse}), and data
|
||||
alignment (\code{aligned}).
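As a minimal sketch of the data-sharing clauses above, a sum reduction can be marked for SIMD execution as follows. The function name \code{dot} and its arguments are illustrative and are not taken from the examples in this document. When OpenMP is not enabled the pragma is ignored and the loop runs as ordinary scalar code.

```c
/* Dot product of two float arrays of length n.
 * The simd construct asks the compiler to vectorize the loop.
 * reduction(+:sum) gives each SIMD lane its own partial sum;
 * the partial sums are combined when the loop finishes. */
float dot(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```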
|
||||
|
||||
The \code{declare simd} directive designates
|
||||
that a vector version of the function should also be constructed for
|
||||
execution within loops that contain the function and have a \code{simd}
|
||||
directive. Clauses provide argument specifications (\code{linear},
|
||||
\code{uniform}, and \code{aligned}), a requested vector length
|
||||
(\code{simdlen}), and designate whether the function is always/never
|
||||
called conditionally in a loop (\code{inbranch}/\code{notinbranch}).
|
||||
The latter is for optimizing performance.
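The combination of function vectorization and loop vectorization might look like the sketch below. The names \code{scale\_add} and \code{saxpy} are hypothetical, chosen only for illustration. The \code{uniform} clause states that \code{scale} has the same value in every SIMD lane, and \code{notinbranch} states that the function is never called from inside a conditional in the loop.

```c
/* declare simd requests vector variants of this function in addition
 * to the scalar one.
 * uniform(scale): the argument has the same value in every SIMD lane.
 * notinbranch: the function is never called conditionally in the
 * SIMD loop, so no masked variant is required. */
#pragma omp declare simd uniform(scale) notinbranch
float scale_add(float x, float y, float scale)
{
    return scale * x + y;
}

/* The simd loop can call a vector variant of scale_add. */
void saxpy(float *out, const float *x, const float *y, float scale, int n)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        out[i] = scale_add(x[i], y[i], scale);
    }
}
```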
|
||||
|
||||
Also, the \code{simd} construct has been combined with the worksharing loop
|
||||
constructs (\code{for simd} and \code{do simd}) to enable simultaneous thread
|
||||
execution in different SIMD units.
|
||||
%Hence, the \code{simd} construct can be
|
||||
%used alone on a loop to direct vectorization (SIMD execution), or in
|
||||
%combination with a parallel loop construct to include thread parallelism
|
||||
%(a parallel loop sequentially followed by a \code{simd} construct,
|
||||
%or a combined construct such as \code{parallel do simd} or
|
||||
%\code{parallel for simd}).
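A small sketch of the combined form, assuming a hypothetical routine \code{vec\_add}: the iterations are first divided among the threads of the team, and each thread's chunk is then executed with SIMD instructions.

```c
/* Element-wise sum c = a + b.
 * parallel for simd: chunks of iterations are distributed to threads,
 * and each chunk is vectorized. */
void vec_add(double *c, const double *a, const double *b, int n)
{
    #pragma omp parallel for simd
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```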
|
||||
|
||||
|
118
Chap_affinity.tex
Normal file
@ -0,0 +1,118 @@
|
||||
\pagebreak
|
||||
\chapter{OpenMP Affinity}
|
||||
\label{chap:openmp_affinity}
|
||||
|
||||
OpenMP Affinity consists of a \code{proc\_bind} policy (thread affinity policy) and a specification of
|
||||
places (\texttt{"}location units\texttt{"} or \plc{processors} that may be cores, hardware
|
||||
threads, sockets, etc.).
|
||||
OpenMP Affinity enables users to bind computations to specific places.
|
||||
The placement will hold for the duration of the parallel region.
|
||||
However, the runtime is free to migrate the OpenMP threads
|
||||
to different cores (hardware threads, sockets, etc.) prescribed within a given place,
|
||||
if two or more cores (hardware threads, sockets, etc.) have been assigned to a given place.
|
||||
|
||||
Often the binding can be managed without resorting to explicitly setting places.
|
||||
Without the specification of places in the \code{OMP\_PLACES} variable,
|
||||
the OpenMP runtime will distribute and bind threads using the entire range of processors for
|
||||
the OpenMP program, according to the \code{OMP\_PROC\_BIND} environment variable
|
||||
or the \code{proc\_bind} clause. When places are specified, the OpenMP runtime
|
||||
binds threads to the places according to a default distribution policy, or
|
||||
the policy specified in the \code{OMP\_PROC\_BIND} environment variable or the
|
||||
\code{proc\_bind} clause.
|
||||
|
||||
In the OpenMP Specifications document a processor refers to an execution unit that
|
||||
is enabled for an OpenMP thread to use. A processor is a core when there is
|
||||
no SMT (Simultaneous Multi-Threading) support or SMT is disabled. When
|
||||
SMT is enabled, a processor is a hardware thread (HW-thread). (This is the
|
||||
usual case; but actually, the execution unit is implementation defined.) Processor
|
||||
numbers are assigned sequentially from 0 to the number of cores less one (without SMT), or
|
||||
0 to the number of HW-threads less one (with SMT). OpenMP places use the processor number to designate
|
||||
binding locations (unless an \texttt{"}abstract name\texttt{"} is used).
|
||||
|
||||
|
||||
The processors available to a process may be a subset of the system's
|
||||
processors. This restriction may be the result of a
|
||||
wrapper process controlling the execution (such as \code{numactl} on Linux systems),
|
||||
compiler options, library-specific environment variables, or default
|
||||
kernel settings. For instance, the execution of multiple MPI processes,
|
||||
launched on a single compute node, will each have a subset of processors as
|
||||
determined by the MPI launcher or set by MPI affinity environment
|
||||
variables for the MPI library. %Forked threads within an MPI process
|
||||
%(for a hybrid execution of MPI and OpenMP code) inherit the valid
|
||||
%processor set for execution from the parent process (the initial task region)
|
||||
%when a parallel region forks threads. The binding policy set in
|
||||
%\code{OMP\_PROC\_BIND} or the \code{proc\_bind} clause will be applied to
|
||||
%the subset of processors available to \plc{the particular} MPI process.
|
||||
|
||||
%Also, setting an explicit list of processor numbers in the \code{OMP\_PLACES}
|
||||
%variable before an MPI launch (which involves more than one MPI process) will
|
||||
%result in unspecified behavior (and doesn't make sense) because the set of
|
||||
%processors in the places list must not contain processors outside the subset
|
||||
%of processors for an MPI process. A separate \code{OMP\_PLACES} variable must
|
||||
%be set for each MPI process, and is usually accomplished by launching a script
|
||||
%which sets \code{OMP\_PLACES} specifically for the MPI process.
|
||||
|
||||
Threads of a team are positioned onto places in a compact manner, a
|
||||
scattered distribution, or onto the master's place, by setting the
|
||||
\code{OMP\_PROC\_BIND} environment variable or the \code{proc\_bind} clause to
|
||||
\plc{close}, \plc{spread}, or \plc{master}, respectively. When
|
||||
\code{OMP\_PROC\_BIND} is set to FALSE no binding is enforced; and
|
||||
when the value is TRUE, the binding is implementation defined to
|
||||
a set of places in the \code{OMP\_PLACES} variable or to places
|
||||
defined by the implementation if the \code{OMP\_PLACES} variable
|
||||
is not set.
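The following sketch shows one way the \code{proc\_bind} clause might be used together with the place query routine \code{omp\_get\_place\_num()} (available from OpenMP 4.5). The function name \code{record\_places} and its arguments are assumptions made for illustration. The places themselves still come from \code{OMP\_PLACES} or the implementation default, so the recorded numbers depend on the system.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Runs a parallel region whose threads are spread over the available
 * places and records the place number of each thread.
 * place_of must have room for max_threads entries (max_threads >= 1).
 * An entry stays -1 when the place is unknown or the thread is unbound.
 * Returns the number of threads in the team. */
int record_places(int *place_of, int max_threads)
{
    int team_size = 1;   /* value used when OpenMP is disabled */
    for (int t = 0; t < max_threads; t++) {
        place_of[t] = -1;
    }

    #pragma omp parallel proc_bind(spread) num_threads(max_threads)
    {
#ifdef _OPENMP
        #pragma omp single
        team_size = omp_get_num_threads();
#if _OPENMP >= 201511   /* omp_get_place_num() was added in OpenMP 4.5 */
        place_of[omp_get_thread_num()] = omp_get_place_num();
#endif
#endif
    }
    return team_size;
}
```

Replacing \code{spread} with \code{close} or \code{master} in the clause selects the other two policies described above.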
|
||||
|
||||
The \code{OMP\_PLACES} variable can also be set to an abstract name
|
||||
(\plc{threads}, \plc{cores}, \plc{sockets}) to specify that a place is
|
||||
either a single hardware thread, a core, or a socket, respectively.
|
||||
This description of the \code{OMP\_PLACES} is most useful when the
|
||||
number of threads is equal to the number of hardware threads, cores
|
||||
or sockets. It can also be used with a \plc{close} or \plc{spread}
|
||||
distribution policy when the equality doesn't hold.
|
||||
|
||||
|
||||
% We need an example of using sockets, cores and threads:
|
||||
|
||||
% case 1 threads:
|
||||
|
||||
% Hyper-Threads on (2 hardware threads per core)
|
||||
% 1 socket x 4 cores x 2 HW-threads
|
||||
%
|
||||
% export OMP_NUM_THREADS=4
|
||||
% export OMP_PLACES=threads
|
||||
%
|
||||
% core # 0 1 2 3
|
||||
% processor # 0,1 2,3 4,5 6,7
|
||||
% thread # 0 * _ _ _ _ _ _ _ #mask for thread 0
|
||||
% thread # 1 _ _ * _ _ _ _ _ #mask for thread 1
|
||||
% thread # 2 _ _ _ _ * _ _ _ #mask for thread 2
|
||||
% thread # 3 _ _ _ _ _ _ * _ #mask for thread 3
|
||||
|
||||
% case 2 cores:
|
||||
%
|
||||
% Hyper-Threads on (2 hardware threads per core)
|
||||
% 1 socket x 4 cores x 2 HW-threads
|
||||
%
|
||||
% export OMP_NUM_THREADS=4
|
||||
% export OMP_PLACES=cores
|
||||
%
|
||||
% core # 0 1 2 3
|
||||
% processor # 0,1 2,3 4,5 6,7
|
||||
% thread # 0 * * _ _ _ _ _ _ #mask for thread 0
|
||||
% thread # 1 _ _ * * _ _ _ _ #mask for thread 1
|
||||
% thread # 2 _ _ _ _ * * _ _ #mask for thread 2
|
||||
% thread # 3 _ _ _ _ _ _ * * #mask for thread 3
|
||||
|
||||
% case 3 sockets:
|
||||
%
|
||||
% No Hyper-Threads
|
||||
% 3 sockets x 4 cores
|
||||
%
|
||||
% export OMP_NUM_THREADS=3
|
||||
% export OMP_PLACES=sockets
|
||||
%
|
||||
% socket # 0 1 2
|
||||
% processor # 0,1,2,3 4,5,6,7 8,9,10,11
|
||||
% thread # 0 * * * * _ _ _ _ _ _ _ _ #mask for thread 0
|
||||
% thread # 1 _ _ _ _ * * * * _ _ _ _ #mask for thread 1
|
||||
% thread # 2 _ _ _ _ _ _ _ _ * * * * #mask for thread 2
|
75
Chap_data_environment.tex
Normal file
@ -0,0 +1,75 @@
|
||||
\pagebreak
|
||||
\chapter{Data Environment}
|
||||
\label{chap:data_environment}
|
||||
The OpenMP \plc{data environment} contains data attributes of variables and
|
||||
objects. Many constructs (such as \code{parallel}, \code{simd}, \code{task})
|
||||
accept clauses to control \plc{data-sharing} attributes
|
||||
of referenced variables in the construct, where \plc{data-sharing} applies to
|
||||
whether the attribute of the variable is \plc{shared},
|
||||
is \plc{private} storage, or has special operational characteristics
|
||||
(as found in the \code{firstprivate}, \code{lastprivate}, \code{linear}, or \code{reduction} clause).
|
||||
|
||||
The data environment for a device (distinguished as a \plc{device data environment})
|
||||
is controlled on the host by \plc{data-mapping} attributes, which determine the
|
||||
relationship of the data on the host, the \plc{original} data, and the data on the
|
||||
device, the \plc{corresponding} data.
|
||||
|
||||
\bigskip
|
||||
DATA-SHARING ATTRIBUTES
|
||||
|
||||
Data-sharing attributes of variables can be classified as being \plc{predetermined},
|
||||
\plc{explicitly determined} or \plc{implicitly determined}.
|
||||
|
||||
Certain variables and objects have predetermined attributes.
|
||||
A commonly found case is the loop iteration variable in associated loops
|
||||
of a \code{for} or \code{do} construct. It has a private data-sharing attribute.
|
||||
Variables with predetermined data-sharing attributes cannot be listed in a data-sharing clause, with some
|
||||
exceptions (mainly concerning loop iteration variables).
|
||||
|
||||
Variables with explicitly determined data-sharing attributes are those that are
|
||||
referenced in a given construct and are listed in a data-sharing attribute
|
||||
clause on the construct. Some of the common data-sharing clauses are:
|
||||
\code{shared}, \code{private}, \code{firstprivate}, \code{lastprivate},
|
||||
\code{linear}, and \code{reduction}. % Are these all of them?
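A brief sketch of explicitly determined attributes, assuming a hypothetical function \code{sum\_with\_offset}: each listed variable gets the behavior of its clause, while the loop iteration variable \code{i} is private by the predetermined rule.

```c
/* Sums (offset + i) for i in [0, n) and reports the last term.
 * Assumes n > 0 so that a final iteration exists for lastprivate. */
long sum_with_offset(int n, long offset, long *last_term)
{
    long sum = 0;
    long term = 0;
    long last = 0;

    #pragma omp parallel for private(term) firstprivate(offset) \
            lastprivate(last) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        term = offset + i;  /* private: uninitialized copy, set before use */
        last = term;        /* lastprivate: value of iteration n-1 is copied out */
        sum += term;        /* reduction: per-thread partial sums are combined */
    }
    *last_term = last;
    return sum;
}
```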
|
||||
|
||||
Variables with implicitly determined data-sharing attributes are those
|
||||
that are referenced in a given construct, do not have predetermined
|
||||
data-sharing attributes, and are not listed in a data-sharing
|
||||
attribute clause of an enclosing construct.
|
||||
For a complete list of variables and objects with predetermined and
|
||||
implicitly determined attributes, please refer to the
|
||||
\plc{Data-sharing Attribute Rules for Variables Referenced in a Construct}
|
||||
subsection of the OpenMP Specifications document.
|
||||
|
||||
\bigskip
|
||||
DATA-MAPPING ATTRIBUTES
|
||||
|
||||
The \code{map} clause on a device construct explicitly specifies how the list items in
|
||||
the clause are mapped from the encountering task's data environment (on the host)
|
||||
to the corresponding item in the device data environment (on the device).
|
||||
The common \plc{list items} are arrays, array sections, scalars, pointers, and
|
||||
structure elements (members).
|
||||
|
||||
Procedures and global variables have predetermined data mapping if they appear
|
||||
within the list or block of a \code{declare target} directive. Also, a C/C++ pointer
|
||||
is mapped as a zero-length array section, as is a C++ variable that is a reference to a pointer.
|
||||
% Waiting for response from Eric on this.
|
||||
|
||||
Without explicit mapping, non-scalar and non-pointer variables within the scope of the \code{target}
|
||||
construct are implicitly mapped with a \plc{map-type} of \code{tofrom}.
|
||||
Without explicit mapping, scalar variables within the scope of the \code{target}
|
||||
construct are not mapped, but have an implicit firstprivate data-sharing
|
||||
attribute. (That is, the value of the original variable is given to a private
|
||||
variable of the same name on the device.) This behavior can be changed with
|
||||
the \code{defaultmap} clause.
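The effect of these rules can be sketched with a hypothetical routine \code{double\_and\_sum}. Because \code{v} is a pointer, the array section it points to is mapped explicitly; because \code{total} is a scalar, it would be firstprivate under the OpenMP 4.5 rules unless it is mapped \code{tofrom}. If no device is available the region executes on the host with the same result.

```c
/* Doubles every element of v inside a target region and returns the sum.
 * map(tofrom: v[0:n]): the pointed-to data is copied to the device on
 *                      entry and back to the host on exit.
 * map(tofrom: total):  without this clause the scalar would be
 *                      firstprivate in the target region (OpenMP 4.5)
 *                      and the accumulated value would be lost. */
int double_and_sum(int *v, int n)
{
    int total = 0;
    #pragma omp target map(tofrom: v[0:n]) map(tofrom: total)
    for (int i = 0; i < n; i++) {
        v[i] *= 2;
        total += v[i];
    }
    return total;
}
```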
|
||||
|
||||
The \code{map} clause can appear on \code{target}, \code{target data} and
|
||||
\code{target enter/exit data} constructs. The operations of creation and
|
||||
removal of device storage as well as assignment of the original list item
|
||||
values to the corresponding list items may be complicated when the list
|
||||
item appears on multiple constructs or when the host and device storage
|
||||
is shared. In these cases the item's reference count, the number of times
|
||||
it has been referenced (+1 on entry and -1 on exit) in nested (structured)
|
||||
map regions and/or accumulative (unstructured) mappings, determines the operation.
|
||||
Details of the \code{map} clause and reference count operation are specified
|
||||
in the \plc{map Clause} subsection of the OpenMP Specifications document.
|
53
Chap_devices.tex
Normal file
@ -0,0 +1,53 @@
|
||||
\pagebreak
|
||||
\chapter{Devices}
|
||||
\label{chap:devices}
|
||||
|
||||
The \code{target} construct consists of a \code{target} directive
|
||||
and an execution region. The \code{target} region is executed on
|
||||
the default device or the device specified in the \code{device}
|
||||
clause.
|
||||
|
||||
In OpenMP version 4.0, by default, all variables within the lexical
|
||||
scope of the construct are copied \plc{to} and \plc{from} the
|
||||
device, unless the device is the host, or the data exists on the
|
||||
device from a previously executed data-type construct that
|
||||
has created space on the device and possibly copied host
|
||||
data to the device storage.
|
||||
|
||||
The constructs that explicitly
|
||||
create storage, transfer data, and free storage on the device
|
||||
are categorized as structured and unstructured. The
|
||||
\code{target} \code{data} construct is structured. It creates
|
||||
a data region around \code{target} constructs, and is
|
||||
convenient for providing persistent data throughout multiple
|
||||
\code{target} regions. The \code{target} \code{enter} \code{data} and
|
||||
\code{target} \code{exit} \code{data} constructs are unstructured, because
|
||||
they can occur anywhere and do not support a "structure"
|
||||
(a region) for enclosing \code{target} constructs, as does the
|
||||
\code{target} \code{data} construct.
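The difference between the two kinds of constructs can be sketched as follows. All function names here (\code{scale\_twice}, \code{device\_alloc}, \code{add\_one\_on\_device}, \code{device\_release}) are hypothetical. In the structured case the data region encloses the \code{target} constructs; in the unstructured case the mapping is started and ended in separate routines.

```c
/* Structured: x stays on the device for the whole target data region,
 * so it is transferred once in each direction. The map clauses on the
 * inner target constructs find x already present and move no data. */
void scale_twice(double *x, int n)
{
    #pragma omp target data map(tofrom: x[0:n])
    {
        #pragma omp target map(tofrom: x[0:n])
        for (int i = 0; i < n; i++) x[i] *= 2.0;

        #pragma omp target map(tofrom: x[0:n])
        for (int i = 0; i < n; i++) x[i] += 1.0;
    }
}

/* Unstructured: the mapping begins here ... */
void device_alloc(double *x, int n)
{
    #pragma omp target enter data map(to: x[0:n])
}

/* ... is used by target regions encountered in between ... */
void add_one_on_device(double *x, int n)
{
    #pragma omp target map(tofrom: x[0:n])
    for (int i = 0; i < n; i++) x[i] += 1.0;
}

/* ... and ends here, copying the data back to the host. */
void device_release(double *x, int n)
{
    #pragma omp target exit data map(from: x[0:n])
}
```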
|
||||
|
||||
The \code{map} clause is used on \code{target}
|
||||
constructs and the data-type constructs to map host data. It
|
||||
specifies the device storage and data movement \code{to} and \code{from}
|
||||
the device, and controls the storage duration.
|
||||
|
||||
There is an important change in the OpenMP 4.5 specification
|
||||
that alters the data model for scalar variables and C/C++ pointer variables.
|
||||
The default behavior for scalar variables and C/C++ pointer variables
|
||||
in 4.5-compliant code is \code{firstprivate}. Example
|
||||
codes that have been updated to reflect this new behavior are
|
||||
annotated with a description of the changes required
|
||||
for correct execution. Often it is a simple matter of mapping
|
||||
the variable as \code{tofrom} to obtain the intended 4.0 behavior.
|
||||
|
||||
In OpenMP version 4.5 the mechanism for target
|
||||
execution is specified as occurring through a \plc{target task}.
|
||||
When the \code{target} construct is encountered a new
|
||||
\plc{target task} is generated. The \plc{target task}
|
||||
completes after the \code{target} region has executed and all data
|
||||
transfers have finished.
|
||||
|
||||
This new specification does not affect the execution of
|
||||
pre-4.5 code; it is a necessary element for asynchronous
|
||||
execution of the \code{target} region when using the new \code{nowait}
|
||||
clause introduced in OpenMP 4.5.
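A minimal sketch of asynchronous target execution, assuming a hypothetical routine \code{async\_square}: the \code{nowait} clause allows the encountering task to continue while the target task runs, and a \code{taskwait} later waits for the target task to complete.

```c
/* Squares every element of v in a target region.
 * nowait: the target task may be deferred, so the host code after the
 *         construct can run before the region has finished.
 * taskwait: waits for the target task (a child of the current task),
 *           including its data transfers back to the host. */
void async_square(int *v, int n)
{
    #pragma omp target map(tofrom: v[0:n]) nowait
    for (int i = 0; i < n; i++) v[i] *= v[i];

    /* Unrelated host work that does not touch v could be placed here. */

    #pragma omp taskwait
}
```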
|
105
Chap_memory_model.tex
Normal file
@ -0,0 +1,105 @@
|
||||
\pagebreak
|
||||
\chapter{Memory Model}
|
||||
\label{chap:memory_model}
|
||||
|
||||
In this chapter, examples illustrate race conditions on access to variables with
|
||||
shared data-sharing attributes. A race condition can exist when two
|
||||
or more threads are involved in accessing a variable in which not all
|
||||
of the accesses are reads; that is, a WaR, RaW or WaW condition
|
||||
exists (R=read, a=after, W=write). A RaR does not produce a race condition.
|
||||
Ensuring thread execution order at
|
||||
the processor level is not enough to avoid race conditions, because the
|
||||
local storage at the processor level (registers, caches, etc.)
|
||||
must be synchronized so that a consistent view of the variable in the
|
||||
memory hierarchy can be seen by the threads accessing the variable.
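As an illustration of such a conflict and one way to remove it, consider the sketch below. The function \code{count\_even} is hypothetical. Every thread updates the shared variable \code{count}; without protection the concurrent updates form a race, and the \code{atomic} construct makes each update indivisible.

```c
/* Counts the even values in v.
 * count is shared, so unprotected increments from several threads
 * would conflict (write after write, read after write).
 * The atomic construct protects each individual update. */
int count_even(const int *v, int n)
{
    int count = 0;
    #pragma omp parallel for shared(count)
    for (int i = 0; i < n; i++) {
        if (v[i] % 2 == 0) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

A \code{reduction(+:count)} clause on the loop would be an alternative design that avoids the shared update altogether.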
|
||||
|
||||
OpenMP provides a shared-memory model which allows all threads access
|
||||
to \plc{memory} (shared data). Each thread also has exclusive
|
||||
access to \plc{threadprivate memory} (private data). A private
|
||||
variable referenced in an OpenMP directive's structured block is a
|
||||
new version of the original variable (with the same name) for each
|
||||
task (or SIMD lane) within the code block. A private variable is
|
||||
initially undefined (except for variables in \code{firstprivate}
|
||||
and \code{linear} clauses), and the original variable value is
|
||||
unaltered by assignments to the private variable (except for
|
||||
\code{reduction}, \code{lastprivate} and \code{linear} clauses).
|
||||
|
||||
Private variables in an outer \code{parallel} region can be
|
||||
shared by implicit tasks of an inner \code{parallel} region
|
||||
(with a \code{shared} clause on the inner \code{parallel} directive).
|
||||
Likewise, a private variable may be shared in the region of an
|
||||
explicit \code{task} (through a \code{shared} clause).
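The second case can be sketched with a hypothetical function \code{check\_task\_sharing}. The variable \code{local} is private to each implicit task of the \code{parallel} region; the \code{shared} clause lets the explicit task write to that same copy instead of receiving its own.

```c
/* Returns the number of threads that did NOT observe the value written
 * by their child task; the expected result is 0. */
int check_task_sharing(void)
{
    int mismatches = 0;
    #pragma omp parallel reduction(+:mismatches)
    {
        int local = 0;          /* private to each implicit task */

        #pragma omp task shared(local)
        local = 5;              /* writes the generating task's copy */

        #pragma omp taskwait    /* the child task has completed here */
        if (local != 5) {
            mismatches++;
        }
    }
    return mismatches;
}
```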
|
||||
|
||||
|
||||
The \code{flush} directive forces a consistent view of local variables
|
||||
of the thread executing the \code{flush}.
|
||||
When a list is supplied on the directive, only the items (variables)
|
||||
in the list are guaranteed to be flushed.
|
||||
|
||||
Implied flushes exist at prescribed locations of certain constructs.
|
||||
For the complete list of these locations and associated constructs,
|
||||
please refer to the \plc{flush Construct} section of the OpenMP
|
||||
Specifications document.
|
||||
|
||||
% The following table lists construct in which implied flushes exist, and the
|
||||
% location of their execution.
|
||||
%
|
||||
% %\begin{table}[hb]
|
||||
% \begin{center}
|
||||
% %\caption {Execution Location for Implicit Flushes. }
|
||||
% \begin{tabular}{ | p{0.6\linewidth} | l | }
|
||||
% \hline
|
||||
% \code{CONSTRUCT} & \makecell{\code{EXECUTION} \\ \code{LOCATION}} \\
|
||||
% \hline
|
||||
% \code{parallel} & upon entry and exit \\
|
||||
% \hline
|
||||
% \makecell[l]{worksharing \\ \hspace{1.5em}\code{for}, \code{do}
|
||||
% \\ \hspace{1.5em}\code{sections}
|
||||
% \\ \hspace{1.5em}\code{single}
|
||||
% \\ \hspace{1.5em}\code{workshare} }
|
||||
% & upon exit \\
|
||||
% \hline
|
||||
% \code{critical} & upon entry and exit \\
|
||||
% \hline
|
||||
% \code{target} & upon entry and exit \\
|
||||
% \hline
|
||||
% \code{barrier} & during \\
|
||||
% \hline
|
||||
% \code{atomic} operation with \plc{seq\_cst} clause & upon entry and exit \\
|
||||
% \hline
|
||||
% \code{ordered}* & upon entry and exit \\
|
||||
% \hline
|
||||
% \code{cancel}** and \code{cancellation point}** & during \\
|
||||
% \hline
|
||||
% \code{target data} & upon entry and exit \\
|
||||
% \hline
|
||||
% \code{target update} + \code{to} clause,
|
||||
% \code{target enter data} & on entry \\
|
||||
% \hline
|
||||
% \code{target update} + \code{from} clause,
|
||||
% \code{target exit data} & on exit \\
|
||||
% \hline
|
||||
% \code{omp\_set\_lock} & during \\
|
||||
% \hline
|
||||
% \makecell[l]{ \code{omp\_set/unset\_lock}, \code{omp\_test\_lock}***
|
||||
% \\ \code{omp\_set/unset/test\_nest\_lock}*** }
|
||||
% & during \\
|
||||
% \hline
|
||||
% task scheduling point & \makecell[l]{immediately \\ before and after} \\
|
||||
% \hline
|
||||
% \end{tabular}
|
||||
% %\caption {Execution Location for Implicit Flushes. }
|
||||
%
|
||||
% \end{center}
|
||||
% %\end{table}
|
||||
%
|
||||
% * without clauses and with \code{threads} or \code{depend} clauses \newline
|
||||
% ** when \plc{cancel-var} ICV is \plc{true} (cancellation is turned on) and cancellation is activated \newline
|
||||
% *** if the region causes the lock to be set or unset
|
||||
%
|
||||
% A flush with a list is implied for non-sequentially consistent \code{atomic} operations
|
||||
% (\code{atomic} directive without a \code{seq\_cst} clause), where the list item is the
|
||||
% specific storage location accessed atomically (specified as the \plc{x} variable
|
||||
% in \plc{atomic Construct} subsection of the OpenMP Specifications document).
|
||||
|
||||
Examples 1-3 show the difficulty of synchronizing threads through \code{flush} and \code{atomic} directives.
|
104
Chap_parallel_execution.tex
Normal file
@ -0,0 +1,104 @@
|
||||
\pagebreak
|
||||
\chapter{Parallel Execution}
|
||||
\label{chap:parallel_execution}
|
||||
|
||||
A single thread, the \plc{initial thread}, begins sequential execution of
|
||||
an OpenMP enabled program, as if the whole program is in an implicit parallel
|
||||
region consisting of an implicit task executed by the \plc{initial thread}.
|
||||
|
||||
A \code{parallel} construct encloses code,
|
||||
forming a parallel region. An \plc{initial thread} encountering a \code{parallel}
|
||||
region forks (creates) a team of threads at the beginning of the
|
||||
\code{parallel} region, and joins them (removes from execution) at the
|
||||
end of the region. The initial thread becomes the master thread of the team in a
|
||||
\code{parallel} region with a \plc{thread} number equal to zero; the other
|
||||
threads are numbered from 1 to the number of threads minus 1.
|
||||
A team may be comprised of just a single thread.
|
||||
|
||||
Each thread of a team is assigned an implicit task consisting of code within the
|
||||
parallel region. The task that creates a parallel region is suspended while the
|
||||
tasks of the team are executed. A thread is tied to its task; that is,
|
||||
only the thread assigned to the task can execute that task. After completion
|
||||
of the \code{parallel} region, the master thread resumes execution of the generating task.
|
||||
|
||||
%After the \code{parallel} region the master thread becomes the initial
|
||||
%thread again, and continues to execute the \plc{sequential part}.
|
||||
|
||||
Any task within a \code{parallel} region is allowed to encounter another
|
||||
\code{parallel} region to form a nested \code{parallel} region. The
|
||||
parallelism of a nested \code{parallel} region (whether it forks additional
|
||||
threads, or is executed serially by the encountering task) can be controlled by the
|
||||
\code{OMP\_NESTED} environment variable or the \code{omp\_set\_nested()}
|
||||
API routine with arguments indicating true or false.
|
||||
|
||||
The number of threads of a \code{parallel} region can be set by the \code{OMP\_NUM\_THREADS}
|
||||
environment variable, the \code{omp\_set\_num\_threads()} routine, or on the \code{parallel}
|
||||
directive with the \code{num\_threads}
|
||||
clause. The routine overrides the environment variable, and the clause overrides all.
|
||||
Use the \code{OMP\_DYNAMIC} environment variable
|
||||
or the \code{omp\_set\_dynamic()} function to specify that the OpenMP
|
||||
implementation dynamically adjust the number of threads for
|
||||
\code{parallel} regions. The default setting for dynamic adjustment is implementation
|
||||
defined. When dynamic adjustment is on and the number of threads is specified,
|
||||
the number of threads becomes an upper limit for the number of threads to be
|
||||
provided by the OpenMP runtime.
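A short sketch of requesting a team size with the \code{num\_threads} clause, assuming a hypothetical function \code{team\_size}. The runtime may provide fewer threads than requested, so the result is only bounded above by the request.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Requests a team of `requested` threads (requested >= 1) and returns
 * the number of threads actually provided by the runtime. */
int team_size(int requested)
{
    int size = 1;   /* value used when OpenMP is disabled */
    #pragma omp parallel num_threads(requested)
    {
#ifdef _OPENMP
        #pragma omp master
        size = omp_get_num_threads();   /* written by thread 0 only */
#endif
    }
    return size;
}
```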
|
||||
|
||||
\pagebreak
|
||||
WORKSHARING CONSTRUCTS
|
||||
|
||||
A worksharing construct distributes the execution of the associated region
|
||||
among the members of the team that encounter it. There is an
|
||||
implied barrier at the end of the worksharing region
|
||||
(there is no barrier at the beginning). The worksharing
|
||||
constructs are:
|
||||
|
||||
\begin{compactitem}
|
||||
|
||||
\item loop constructs: {\code{for} and \code{do} }
|
||||
\item \code{sections}
|
||||
\item \code{single}
|
||||
\item \code{workshare}
|
||||
|
||||
\end{compactitem}
|
||||
|
||||
The \code{for} and \code{do} constructs (loop constructs) create a region
|
||||
consisting of a loop. A loop controlled by a loop construct is called
|
||||
an \plc{associated} loop. Nested loops can form a single region when the
|
||||
\code{collapse} clause (with an integer argument) designates the number of
|
||||
\plc{associated} loops to be executed in parallel, by forming a
|
||||
"single iteration space" for the specified number of nested loops.
|
||||
The \code{ordered} clause can also control multiple associated loops.
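For illustration, the sketch below collapses two nested loops into a single iteration space of \code{ROWS * COLS} iterations. The names \code{fill}, \code{ROWS} and \code{COLS} are assumptions made for this example.

```c
#define ROWS 4
#define COLS 6

/* Fills m with its row-major element index.
 * collapse(2): both loops are associated with the construct, and their
 * ROWS * COLS iterations are distributed among the threads together. */
void fill(int m[ROWS][COLS])
{
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            m[i][j] = i * COLS + j;
        }
    }
}
```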
|
||||
|
||||
An associated loop must adhere to a "canonical form" (specified in the
|
||||
\plc{Canonical Loop Form} section of the OpenMP Specifications document) which allows the
|
||||
iteration count (of all associated loops) to be computed before the
|
||||
(outermost) loop is executed. %[58:27-29].
|
||||
Most common loops comply with the canonical form, including C++ iterators.
|
||||
|
||||
A \code{single} construct forms a region in which only one thread (any one
|
||||
of the team) executes the region.
|
||||
The other threads wait at the implied
|
||||
barrier at the end, unless the \code{nowait} clause is specified.
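A minimal sketch, with the hypothetical function name \code{single\_count}: however many threads are in the team, the increment is executed once.

```c
/* Returns how many times the single region was executed (always 1). */
int single_count(void)
{
    int count = 0;
    #pragma omp parallel
    {
        #pragma omp single
        count++;    /* executed by exactly one thread of the team */
        /* implied barrier: the other threads wait here */
    }
    return count;
}
```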
|
||||
|
||||
The \code{sections} construct forms a region that contains one or more
|
||||
structured blocks. Each block of a \code{sections} directive is
|
||||
constructed with a \code{section} construct, and executed once by
|
||||
one of the threads (any one) in the team. (If only one block is
|
||||
formed in the region, the \code{section} construct, which is used to
|
||||
separate blocks, is not required.)
|
||||
The other threads wait at the implied
|
||||
barrier at the end, unless the \code{nowait} clause is specified.
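The following sketch, using a hypothetical function \code{two\_sums}, gives two independent blocks of work to the team. Each block writes a different variable, so no synchronization is needed between them.

```c
/* Sums the elements at even and at odd indices of v in two sections. */
void two_sums(const int *v, int n, long *even_sum, long *odd_sum)
{
    long even = 0, odd = 0;
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 0; i < n; i += 2) even += v[i];
        }
        #pragma omp section
        {
            for (int i = 1; i < n; i += 2) odd += v[i];
        }
    }
    *even_sum = even;
    *odd_sum = odd;
}
```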
|
||||
|
||||
|
||||
The \code{workshare} construct is a Fortran feature that consists of a
|
||||
region with a single structured block (section of code). Statements in the
|
||||
\code{workshare} region are divided into units of work, and executed (once)
|
||||
by threads of the team.
|
||||
|
||||
\bigskip
|
||||
MASTER CONSTRUCT
|
||||
|
||||
The \code{master} construct is not a worksharing construct. The master region is
|
||||
executed only by the master thread. There is no implicit barrier (and flush)
|
||||
at the end of the \code{master} region; hence the other threads of the team continue
|
||||
execution with the statements that follow the \code{master} region.
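Because of the missing barrier, other threads that need a value produced in the \code{master} region have to synchronize explicitly. A sketch, assuming a hypothetical function \code{master\_then\_barrier}:

```c
/* Returns the number of threads that read `ready` before it was set;
 * with the explicit barrier the expected result is 0. */
int master_then_barrier(void)
{
    int ready = 0;
    int late = 0;
    #pragma omp parallel reduction(+:late)
    {
        #pragma omp master
        ready = 1;      /* only the master thread executes this */

        /* master has no implied barrier or flush, so an explicit
         * barrier is placed before the other threads read `ready`. */
        #pragma omp barrier

        if (!ready) {
            late++;
        }
    }
    return late;
}
```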
|
85
Chap_program_control.tex
Normal file
@ -0,0 +1,85 @@
|
||||
\pagebreak
|
||||
\chapter{Program Control}
|
||||
\label{sec:program_control}
|
||||
|
||||
Some specific and elementary concepts of controlling program execution are
|
||||
illustrated in the examples of this chapter. Control can be directly
|
||||
managed with conditional control code (ifdef's with the \code{\_OPENMP}
|
||||
macro, and the Fortran sentinel (\code{!\$})
|
||||
for conditionally compiling). The \code{if} clause on some constructs
|
||||
can direct the runtime to ignore or alter the behavior of the construct.
|
||||
Of course, the base-language \code{if} statements can be used to control the "execution"
|
||||
of stand-alone directives (such as \code{flush}, \code{barrier}, \code{taskwait},
|
||||
and \code{taskyield}).
|
||||
However, the directives must appear in a block structure, and not as a substatement as shown in examples 1 and 2 of this chapter.
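Both forms of control can be sketched in C as follows. The function names \code{openmp\_enabled} and \code{wait\_if} are hypothetical. The first function uses the \code{\_OPENMP} macro for conditional compilation; the second places a stand-alone directive inside a compound statement so that the base-language \code{if} controls it in a conforming way.

```c
/* Returns 1 when the file was compiled with OpenMP enabled, else 0. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* A stand-alone directive must not be the immediate substatement of
 * an if statement, as in
 *     if (do_wait)
 *         #pragma omp taskwait
 * Enclosing it in braces gives the required block structure. */
int wait_if(int do_wait)
{
    int waited = 0;
    if (do_wait) {
        #pragma omp taskwait
        waited = 1;
    }
    return waited;
}
```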
|
||||
|
||||
\bigskip
|
||||
CANCELLATION
|
||||
|
||||
Cancellation (termination) of the normal sequence of execution for the threads in an OpenMP region can
|
||||
be accomplished with the \code{cancel} construct. The construct uses a
|
||||
\plc{construct-type-clause} to set the region-type to activate for the cancellation.
|
||||
That is, inclusion of one of the \plc{construct-type-clause} names \code{parallel}, \code{for},
|
||||
\code{do}, \code{sections} or \code{taskgroup} on the directive line
|
||||
activates the corresponding region.
|
||||
The \code{cancel} construct is activated by the first encountering thread, and it
|
||||
continues execution at the end of the named region.
|
||||
The \code{cancel} construct is also a cancellation point for any other thread of the team
|
||||
to also continue execution at the end of the named region.
|
||||
|
||||
Also, once the specified region has been activated for cancellation any thread that encounters
|
||||
a \code{cancellation point} construct with the same named region (\plc{construct-type-clause}),
|
||||
continues execution at the end of the region.
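A search loop is a common use of cancellation and might be sketched as below. The function \code{find\_index} is hypothetical, and the sketch assumes the key occurs at most once so that the result does not depend on which thread finds it. Cancellation only takes effect when it has been enabled (for example with the \code{OMP\_CANCELLATION} environment variable set to \code{true}); otherwise the loop runs to completion and returns the same result.

```c
/* Returns the index of key in v, or -1 if key is absent.
 * Assumption: key occurs at most once in v. */
int find_index(const int *v, int n, int key)
{
    int found = -1;
    #pragma omp parallel for shared(found)
    for (int i = 0; i < n; i++) {
        if (v[i] == key) {
            #pragma omp atomic write
            found = i;
            /* Activate cancellation of the enclosing loop region. */
            #pragma omp cancel for
        }
        /* Other threads stop here once cancellation is active. */
        #pragma omp cancellation point for
    }
    return found;
}
```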
|
||||
|
||||
For an activated \code{cancel taskgroup} construct, the tasks that
|
||||
belong to the taskgroup set of the innermost enclosing taskgroup region will be canceled.
|
||||
|
||||
A task that encounters the cancel taskgroup construct continues execution at the end of its
|
||||
task region. Any task of the taskgroup that has already begun execution will run to completion,
|
||||
unless it encounters a \code{cancellation point}; tasks that have not begun execution "may" be
|
||||
discarded as completed tasks.
|
||||
|
||||
\bigskip
|
||||
CONTROL VARIABLES
|
||||
|
||||
Internal control variables (ICV) are used by implementations to hold values which control the execution
|
||||
of OpenMP regions. Control (and hence the ICVs) may be set as implementation defaults,
|
||||
or set and adjusted through environment variables, clauses, and API functions. Many of the ICV control
|
||||
values are accessible through API function calls. Also, initial ICV values are reported by the runtime
|
||||
if the \code{OMP\_DISPLAY\_ENV} environment variable has been set to \code{TRUE}.
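As a hedged sketch of how a clause interacts with an ICV, the function below requests a team size through the \code{num\_threads} clause and reports the size that was actually obtained. The function name \plc{observed\_team\_size} is hypothetical. The implementation may supply fewer threads than requested, so the sketch only relies on the result lying between 1 and the requested value. The \code{\_OPENMP} guards let the same source compile without OpenMP support.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the team size observed inside a parallel region that requests
 * `requested` threads through the num_threads clause. */
int observed_team_size(int requested)
{
    int observed = 1;   /* value kept when compiled without OpenMP */
    assert(requested > 0);

    #pragma omp parallel num_threads(requested) shared(observed)
    {
        #pragma omp single
        {
#ifdef _OPENMP
            observed = omp_get_num_threads();
#endif
        }
    }
    return observed;
}
```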
|
||||
|
||||
%As an example, the \plc{nthreads-var} is the ICV that holds the number of threads
|
||||
%to be used in a \code{parallel} region. It can be set with the \code{OMP\_NUM\_THREADS} environment variable,
|
||||
%the \code{omp\_set\_num\_threads()} API function, or the \code{num\_threads} clause. The default \plc{nthreads-var}
|
||||
%value is implementation defined. All of the ICVs are presented in the \plc{Internal Control Variables} section
|
||||
%of the \plc{Directives} chapter of the OpenMP Specifications document. Within the same document section, override
|
||||
%relationships and scoping information can be found for applying user specifications and understanding the
|
||||
%extent of the control.
|
||||
|
||||
\bigskip
|
||||
NESTED CONSTRUCTS
|
||||
|
||||
Certain combinations of nested constructs are permitted, giving rise to a \plc{combined} construct
|
||||
consisting of two or more constructs. These can be used when the two (or several) constructs would be used
|
||||
immediately in succession (closely nested). A combined construct can use the clauses of the component
|
||||
constructs without restrictions.
|
||||
A \plc{composite} construct is a combined construct which has one or more clauses with (an often obviously)
|
||||
modified or restricted meaning, relative to when the constructs are uncombined. %%[appear separately (singly).
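To make the notion of a combined construct concrete, the following sketch writes the same dot product twice: once with a \code{for} construct closely nested in a \code{parallel} construct, and once with the combined \code{parallel for} construct. The function names are illustrative assumptions; both versions are expected to return the same value.

```c
#include <assert.h>

/* Separate constructs: a for region closely nested in a parallel region. */
double dot_separate(int n, const double *x, const double *y)
{
    double sum = 0.0;
    assert(n >= 0);
    #pragma omp parallel
    {
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += x[i] * y[i];
    }
    return sum;
}

/* Combined construct: the clauses of both constructs appear on one line. */
double dot_combined(int n, const double *x, const double *y)
{
    double sum = 0.0;
    assert(n >= 0);
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```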
|
||||
|
||||
%The combined \code{parallel do} and \code{parallel for} constructs are formed by combining the \code{parallel}
|
||||
%construct with one of the loops constructs \code{do} or \code{for}. The
|
||||
%\code{parallel do SIMD} and \code{parallel for SIMD} constructs are composite constructs (composed from
|
||||
%the parallel loop constructs and the \code{SIMD} construct), because the \code{collapse} clause must
|
||||
%explicitly address the ordering of loop chunking \plc{and} SIMD "combined" execution.
|
||||
|
||||
Certain nestings are forbidden, and often the reasoning is obvious. Worksharing constructs cannot be nested, and
|
||||
the \code{barrier} construct cannot be nested inside a worksharing construct, or a \code{critical} construct.
|
||||
Also, \code{target} constructs cannot be nested.
|
||||
|
||||
The \code{parallel} construct can be nested, as well as the \code{task} construct. The parallel
|
||||
execution in the nested \code{parallel} construct(s) is controlled by the \code{OMP\_NESTED} and
|
||||
\code{OMP\_MAX\_ACTIVE\_LEVELS} environment variables, and the \code{omp\_set\_nested()} and
|
||||
\code{omp\_set\_max\_active\_levels()} functions.
|
||||
|
||||
More details on nesting can be found in the \plc{Nesting of Regions} of the \plc{Directives}
|
||||
chapter in the OpenMP Specifications document.
|
69
Chap_synchronization.tex
Normal file
@ -0,0 +1,69 @@
|
||||
\pagebreak
|
||||
\chapter{Synchronization}
|
||||
\label{chap:synchronization}
|
||||
|
||||
The \code{barrier} construct is a stand-alone directive that requires all threads
|
||||
of a team (within a contention group) to execute the barrier and complete
|
||||
execution of all tasks within the region, before continuing past the barrier.
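A minimal sketch of an explicit barrier between two phases is given below; the function and array names are assumptions for illustration. The first loop uses \code{nowait}, so the \code{barrier} construct is what guarantees that all of \plc{tmp} has been written before any thread reads a neighboring element in the second loop.

```c
#include <assert.h>

/* tmp[i] = 2*in[i]; out[i] = tmp[i] + tmp[(i+1) mod n]. */
void double_then_pair_sum(int n, const double *in, double *tmp, double *out)
{
    assert(n > 0);
    #pragma omp parallel
    {
        /* Phase 1: nowait removes the implied barrier of this loop. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            tmp[i] = 2.0 * in[i];

        /* All threads of the team must arrive here before any continues. */
        #pragma omp barrier

        /* Phase 2: reads elements that other threads may have written. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            out[i] = tmp[i] + tmp[(i + 1) % n];
    }
}
```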
|
||||
|
||||
The \code{critical} construct is a directive that contains a structured block.
|
||||
The construct allows only a single thread at a time to execute the structured block (region).
|
||||
Multiple critical regions may exist in a parallel region, and may
|
||||
act cooperatively (only one thread at a time in all \code{critical} regions),
|
||||
or separately (only one thread at a time in each \code{critical} region when
|
||||
a unique name is supplied on each \code{critical} construct).
|
||||
An optional (lock) \code{hint} clause may be specified on a named \code{critical}
|
||||
construct to provide the OpenMP runtime guidance in selecting a locking
|
||||
mechanism.
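The difference between cooperating and separately named \code{critical} regions can be sketched as follows. The names \plc{neg\_count} and \plc{pos\_count}, and the function itself, are hypothetical. Because the two regions carry different names, a thread updating one counter does not exclude a thread updating the other.

```c
#include <assert.h>

/* Counts the negative and positive entries of a[0..n-1]. */
void count_signs(const int *a, int n, int *n_neg, int *n_pos)
{
    int neg = 0, pos = 0;
    assert(n >= 0);

    #pragma omp parallel for shared(neg, pos)
    for (int i = 0; i < n; i++) {
        if (a[i] < 0) {
            /* Only one thread at a time in regions named neg_count. */
            #pragma omp critical (neg_count)
            neg++;
        } else if (a[i] > 0) {
            /* Independent of neg_count: a different name, a different lock. */
            #pragma omp critical (pos_count)
            pos++;
        }
    }
    *n_neg = neg;
    *n_pos = pos;
}
```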
|
||||
|
||||
On a finer scale the \code{atomic} construct allows only a single thread at
|
||||
a time to have atomic access to a storage location involving a single read,
|
||||
write, update or capture statement, and a limited number of combinations
|
||||
when specifying the \code{capture} \plc{atomic-clause} clause. The \plc{atomic-clause} clause
|
||||
is required for some expression statements, but is not required for
|
||||
\code{update} statements. Please see the details in the \plc{atomic Construct}
|
||||
subsection of the \plc{Directives} chapter in the OpenMP Specifications document.
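As an illustrative sketch (function names are assumptions), the first routine below uses an atomic \code{update} to increment histogram bins, and the second uses an atomic \code{capture} so that each iteration both reads and increments a shared counter in one indivisible step.

```c
#include <assert.h>

/* hist must hold nbins zero-initialized counters;
 * every bin[i] must lie in [0, nbins). */
void histogram(int n, const int *bin, int nbins, int *hist)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        assert(bin[i] >= 0 && bin[i] < nbins);
        #pragma omp atomic update
        hist[bin[i]]++;
    }
}

/* Gives each iteration a distinct ticket in [0, n);
 * returns the number of tickets issued. */
int issue_tickets(int n, int *ticket)
{
    int next = 0;
    #pragma omp parallel for shared(next)
    for (int i = 0; i < n; i++) {
        int t;
        /* Read the old value and increment it atomically. */
        #pragma omp atomic capture
        t = next++;
        ticket[i] = t;
    }
    return next;
}
```

Which iteration receives which ticket depends on the execution order, but no ticket is issued twice.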
|
||||
|
||||
% The following three sentences were stolen from the spec.
|
||||
The \code{ordered} construct specifies a structured block in a loop,
|
||||
simd, or loop SIMD region that will be executed in the order of the loop
|
||||
iterations. The ordered construct sequentializes and orders the execution
|
||||
of ordered regions while allowing code outside the region to run in parallel.
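The following sketch, with assumed names, shows an \code{ordered} region inside a dynamically scheduled loop. The squaring may be carried out concurrently, while the append to \plc{out} is executed in the order of the loop iterations.

```c
#include <assert.h>

/* Writes in[i]*in[i] to out in iteration order; returns the count written. */
int squares_in_order(int n, const int *in, int *out)
{
    int pos = 0;
    assert(n >= 0);

    #pragma omp parallel for ordered schedule(dynamic) shared(pos)
    for (int i = 0; i < n; i++) {
        int sq = in[i] * in[i];     /* may run concurrently */

        #pragma omp ordered
        {
            out[pos] = sq;          /* executed in iteration order */
            pos++;
        }
    }
    return pos;
}
```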
|
||||
|
||||
Since OpenMP 4.5 the \code{ordered} construct can also be a stand-alone
|
||||
directive that specifies cross-iteration dependences in a doacross loop nest.
|
||||
The \code{depend} clause uses a \code{sink} \plc{dependence-type}, along with an
|
||||
iteration vector argument (vec) to indicate the iteration that satisfies the
|
||||
dependence. The \code{depend} clause with a \code{source}
|
||||
\plc{dependence-type} specifies dependence satisfaction.
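A doacross loop in the OpenMP 4.5 syntax can be sketched as below; the function name and the recurrence are assumptions chosen for illustration. Each iteration does independent work first, then waits on iteration \plc{i-1} with \code{depend(sink: i-1)} and signals its own completion with \code{depend(source)}. A sink iteration outside the iteration space (here for \plc{i} equal to 1) is ignored.

```c
#include <assert.h>

/* b[0] = a[0]*a[0];  b[i] = b[i-1] + a[i]*a[i] for i >= 1. */
void running_sum_of_squares(int n, const double *a, double *b)
{
    assert(n > 0);
    b[0] = a[0] * a[0];

    #pragma omp parallel for ordered(1)
    for (int i = 1; i < n; i++) {
        double sq = a[i] * a[i];    /* independent work */

        /* Wait until iteration i-1 has signaled completion. */
        #pragma omp ordered depend(sink: i-1)
        b[i] = b[i - 1] + sq;
        /* Signal that the result of this iteration is available. */
        #pragma omp ordered depend(source)
    }
}
```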
|
||||
|
||||
The \code{flush} directive is a stand-alone construct that forces a thread's
|
||||
temporary local storage (view) of a variable to memory where a consistent view
|
||||
of the variable storage can be accessed. When the construct is used without
|
||||
a variable list, all the locally thread-visible data as defined by the
|
||||
base language are flushed. A construct with a list applies the flush
|
||||
operation only to the items in the list. The \code{flush} construct also
|
||||
effectively ensures that no memory (load or store) operation for
|
||||
the variable set (list items, or default set) may be reordered across
|
||||
the \code{flush} directive.
|
||||
|
||||
General-purpose routines provide mutual exclusion semantics through locks,
|
||||
represented by lock variables.
|
||||
The semantics allows a task to \plc{set}, and hence
|
||||
\plc{own} a lock, until it is \plc{unset} by the task that set it. A
|
||||
\plc{nestable} lock can be set multiple times by a task, and is used
|
||||
when code requires nested control of locks. A \plc{simple lock} can
|
||||
only be set once by the owning task. There are specific calls for the two
|
||||
types of locks, and the variable of a specific lock type cannot be used by the
|
||||
other lock type.
|
||||
|
||||
Any explicit task will observe the synchronization prescribed in a
|
||||
\code{barrier} construct and an implied barrier. Also, additional synchronizations
|
||||
are available for tasks. A task that encounters a \code{taskwait} construct waits
|
||||
for its child tasks to complete. A \code{taskgroup} construct creates a region in which the
|
||||
current task is suspended at the end of the region until all child tasks generated in the region,
|
||||
and their descendants, have completed.
|
||||
Scheduling constraints on task execution can be prescribed by the \code{depend}
|
||||
clause to enforce dependence on previously generated tasks.
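A small sketch of task ordering through the \code{depend} clause follows; the function name and the arithmetic are illustrative assumptions. The second task lists \plc{a} with the \code{in} dependence type, so it is not scheduled until the first task, which lists \plc{a} with \code{out}, has completed. The \code{taskwait} makes the generating task wait for both child tasks.

```c
#include <assert.h>

/* Returns 2*(seed+1), computed by a producer task and a consumer task. */
int produce_then_consume(int seed)
{
    int a = 0, b = 0;

    #pragma omp parallel shared(a, b)
    #pragma omp single
    {
        /* Producer: writes a. */
        #pragma omp task depend(out: a) shared(a)
        a = seed + 1;

        /* Consumer: may not start before the producer has finished. */
        #pragma omp task depend(in: a) depend(out: b) shared(a, b)
        b = 2 * a;

        /* Wait for the child tasks generated above. */
        #pragma omp taskwait
        assert(a == seed + 1);
    }
    return b;
}
```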
|
||||
More details on controlling task executions can be found in the \plc{Tasking} Chapter
|
||||
in the OpenMP Specifications document. %(DO REF. RIGHT.)
|
51
Chap_tasking.tex
Normal file
@ -0,0 +1,51 @@
|
||||
\pagebreak
|
||||
\chapter{Tasking}
|
||||
\label{chap:tasking}
|
||||
|
||||
Tasking constructs provide units of work to a thread for execution.
|
||||
Worksharing constructs do this, too (e.g. \code{for}, \code{do},
|
||||
\code{sections}, and \code{single} constructs);
|
||||
but the work units are tightly controlled by an iteration limit and limited
|
||||
scheduling, or a limited number of \code{sections} or \code{single} regions.
|
||||
Worksharing was designed
|
||||
with \texttt{"}data parallel\texttt{"} computing in mind. Tasking was designed for
|
||||
\texttt{"}task parallel\texttt{"} computing and often involves non-locality or irregularity
|
||||
in memory access.
|
||||
|
||||
The \code{task} construct can be used to execute work chunks: in a while loop;
|
||||
while traversing nodes in a list; at nodes in a tree graph;
|
||||
or in a normal loop (with a \code{taskloop} construct).
|
||||
Unlike the statically scheduled loop iterations of worksharing, a task is
|
||||
often enqueued, and then dequeued for execution by any of the threads of the
|
||||
team within a parallel region. The generation of tasks can be from a single
|
||||
generating thread (creating sibling tasks), or from multiple generators
|
||||
in recursive graph tree traversals.
|
||||
%(creating a parent-descendents hierarchy of tasks, see example 4 and 7 below).
|
||||
A \code{taskloop} construct
|
||||
bundles iterations of an associated loop into tasks, and provides
|
||||
similar controls found in the \code{task} construct.
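As a hedged sketch of the \code{taskloop} construct, the routine below lets one thread generate tasks that each cover a bundle of loop iterations. The function name and the \code{grainsize} value of 64 are arbitrary choices for illustration.

```c
#include <assert.h>

/* Multiplies x[0..n-1] by s, using tasks that each cover a bundle of iterations. */
void scale_with_taskloop(int n, double s, double *x)
{
    assert(n >= 0);
    #pragma omp parallel
    #pragma omp single
    {
        /* grainsize(64) asks for at least 64 iterations per task where possible. */
        #pragma omp taskloop grainsize(64) shared(x)
        for (int i = 0; i < n; i++)
            x[i] *= s;
    }
}
```

Without a \code{nogroup} clause the \code{taskloop} region behaves as if enclosed in a \code{taskgroup}, so all generated tasks have completed when the construct ends.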
|
||||
|
||||
Sibling tasks are synchronized by the \code{taskwait} construct, and tasks
|
||||
and their descendent tasks can be synchronized by containing them in
|
||||
a \code{taskgroup} region. Ordered execution is accomplished by specifying
|
||||
dependences with a \code{depend} clause. Also, priorities can be
|
||||
specified as hints to the scheduler through a \code{priority} clause.
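The classic recursive Fibonacci computation gives a compact sketch of task generation from multiple generators together with \code{taskwait} synchronization. The function names are assumptions, and the code is meant to show the constructs rather than to be an efficient way to compute Fibonacci numbers.

```c
#include <assert.h>

/* Each call generates two child tasks and waits for them. */
static long fib_task(int n)
{
    long x, y;
    if (n < 2)
        return n;

    #pragma omp task shared(x)
    x = fib_task(n - 1);

    #pragma omp task shared(y)
    y = fib_task(n - 2);

    /* Wait for the two child tasks before reading x and y. */
    #pragma omp taskwait
    return x + y;
}

/* One thread starts the recursion; the whole team may execute the tasks. */
long fib(int n)
{
    long result = 0;
    assert(n >= 0);

    #pragma omp parallel shared(result)
    #pragma omp single
    result = fib_task(n);

    return result;
}
```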
|
||||
|
||||
Various clauses can be used to manage and optimize task generation,
|
||||
as well as reduce the overhead of execution and to relinquish
|
||||
control of threads for work balance and forward progress.
|
||||
|
||||
Once a thread starts executing a task, it is the designated thread
|
||||
for executing the task to completion, even though it may leave the
|
||||
execution at a scheduling point and return later. The thread is tied
|
||||
to the task. Scheduling points can be introduced with the \code{taskyield}
|
||||
construct. With an \code{untied} clause any other thread is allowed to continue
|
||||
the task. An \code{if} clause with a \plc{false} expression causes the
|
||||
generating thread to immediately execute the task as an undeferred task.
|
||||
By including the data environment of the generating task into the generated task with the
|
||||
\code{mergeable} and \code{final} clauses, task generation overhead can be reduced.
|
||||
|
||||
A complete list of the tasking constructs and details of their clauses
|
||||
can be found in the \plc{Tasking Constructs} chapter of the OpenMP Specifications,
|
||||
in the \plc{OpenMP Application Programming Interface} section.
|
||||
|
@ -1,9 +1,21 @@
|
||||
|
||||
\chapter*{Examples}
|
||||
\label{chap:examples}
|
||||
\addcontentsline{toc}{chapter}{\protect\numberline{}Examples}
|
||||
The following are examples of the OpenMP API directives, constructs, and routines.
|
||||
\ccppspecificstart
|
||||
A statement following a directive is compound only when necessary, and a
|
||||
non-compound statement is indented with respect to a directive preceding it.
|
||||
\ccppspecificend
|
||||
|
||||
Each example is labeled as \plc{ename.seqno.ext}, where \plc{ename} is
|
||||
the example name, \plc{seqno} is the sequence number in a section, and
|
||||
\plc{ext} is the source file extension to indicate the code type and
|
||||
source form. \plc{ext} is one of the following:
|
||||
\begin{compactitem}
|
||||
\item \plc{c} -- C code,
|
||||
\item \plc{cpp} -- C++ code,
|
||||
\item \plc{f} -- Fortran code in fixed form, and
|
||||
\item \plc{f90} -- Fortran code in free form.
|
||||
\end{compactitem}
|
||||
|
||||
|
@ -1,17 +1,13 @@
|
||||
\pagebreak
|
||||
\chapter{SIMD Constructs}
|
||||
\label{chap:SIMD}
|
||||
%\pagebreak
|
||||
\section{\code{simd} and \code{declare} \code{simd} Constructs}
|
||||
\label{sec:SIMD}
|
||||
|
||||
The following examples illustrate the use of SIMD constructs for vectorization.
|
||||
The following example illustrates the basic use of the \code{simd} construct
|
||||
to assure the compiler that the loop can be vectorized.
|
||||
|
||||
Compilers may not vectorize loops when they are complex or possibly have
|
||||
dependencies, even though the programmer is certain the loop will execute
|
||||
correctly as a vectorized loop. The \code{simd} construct assures the compiler
|
||||
that the loop can be vectorized.
|
||||
\cexample{SIMD}{1}
|
||||
|
||||
\cexample{SIMD}{1c}
|
||||
|
||||
\fexample{SIMD}{1f}
|
||||
\ffreeexample{SIMD}{1}
|
||||
|
||||
|
||||
When a function can be inlined within a loop the compiler has an opportunity to
|
||||
@ -42,9 +38,9 @@ In the \code{simd} constructs for the loops the \code{private(tmp)} clause is
|
||||
necessary to assure that each vector operation has its own \plc{tmp}
|
||||
variable.
|
||||
|
||||
\cexample{SIMD}{2c}
|
||||
\cexample{SIMD}{2}
|
||||
|
||||
\fexample{SIMD}{2f}
|
||||
\ffreeexample{SIMD}{2}
|
||||
|
||||
|
||||
A thread that encounters a SIMD construct executes a vectorized code of the
|
||||
@ -54,9 +50,9 @@ privatized and declared as reductions with clauses. The example below
|
||||
illustrates the use of \code{private} and \code{reduction} clauses in a SIMD
|
||||
construct.
|
||||
|
||||
\cexample{SIMD}{3c}
|
||||
\cexample{SIMD}{3}
|
||||
|
||||
\fexample{SIMD}{3f}
|
||||
\ffreeexample{SIMD}{3}
|
||||
|
||||
|
||||
A \code{safelen(N)} clause in a \code{simd} construct assures the compiler that
|
||||
@ -69,9 +65,9 @@ code is safe for vectors up to and including size 16. In the loop, \plc{m} can
|
||||
be 16 or greater, for correct code execution. If the value of \plc{m} is less
|
||||
than 16, the behavior is undefined.
|
||||
|
||||
\cexample{SIMD}{4c}
|
||||
\cexample{SIMD}{4}
|
||||
|
||||
\fexample{SIMD}{4f}
|
||||
\ffreeexample{SIMD}{4}
|
||||
|
||||
|
||||
The following SIMD construct instructs the compiler to collapse the \plc{i} and
|
||||
@ -79,11 +75,15 @@ The following SIMD construct instructs the compiler to collapse the \plc{i} and
|
||||
threads of the team. Within the workshared loop chunks of a thread, the SIMD
|
||||
chunks are executed in the lanes of the vector units.
|
||||
|
||||
\cexample{SIMD}{5c}
|
||||
\cexample{SIMD}{5}
|
||||
|
||||
\fexample{SIMD}{5f}
|
||||
\ffreeexample{SIMD}{5}
|
||||
|
||||
|
||||
%%% section
|
||||
\section{\code{inbranch} and \code{notinbranch} Clauses}
|
||||
\label{sec:SIMD_branch}
|
||||
|
||||
The following examples illustrate the use of the \code{declare} \code{simd}
|
||||
construct with the \code{inbranch} and \code{notinbranch} clauses. The
|
||||
\code{notinbranch} clause informs the compiler that the function \plc{foo} is
|
||||
@ -92,9 +92,9 @@ the other hand, the \code{inbranch} clause for the function goo indicates that
|
||||
the function is always called conditionally in the SIMD loop inside
|
||||
the function \plc{myaddfloat}.
|
||||
|
||||
\cexample{SIMD}{6c}
|
||||
\cexample{SIMD}{6}
|
||||
|
||||
\fexample{SIMD}{6f}
|
||||
\ffreeexample{SIMD}{6}
|
||||
|
||||
|
||||
In the code below, the function \plc{fib()} is called in the main program and
|
||||
@ -103,7 +103,24 @@ condition. The compiler creates a masked vector version and a non-masked vector
|
||||
version for the function \plc{fib()} while retaining the original scalar
|
||||
version of the \plc{fib()} function.
|
||||
|
||||
\cexample{SIMD}{7c}
|
||||
\cexample{SIMD}{7}
|
||||
|
||||
\fexample{SIMD}{7f}
|
||||
\ffreeexample{SIMD}{7}
|
||||
|
||||
|
||||
|
||||
%%% section
|
||||
\section{Loop-Carried Lexical Forward Dependence}
|
||||
\label{sec:SIMD_forward_dep}
|
||||
|
||||
|
||||
The following example tests the restriction on a SIMD loop with a loop-carried lexical forward dependence. This dependence must be preserved for the correct execution of SIMD loops.
|
||||
|
||||
A loop can be vectorized even though the iterations are not completely independent when it has loop-carried dependences that are forward lexical dependences, indicated in the code below by the read of \plc{A[j+1]} and the write to \plc{A[j]} in C/C++ code (or \plc{A(j+1)} and \plc{A(j)} in Fortran). That is, the ordering of the read of \plc{A[j+1]} (or \plc{A(j+1)} in Fortran) before the write to \plc{A[j]} (or \plc{A(j)} in Fortran) must be preserved for each iteration in \plc{j} for valid SIMD code generation.
|
||||
|
||||
This test verifies that the compiler preserves the loop-carried lexical forward dependence when generating SIMD code.
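A reduced sketch of such a loop is shown here; it is not the numbered example, and the function name is an assumption. Within each iteration the read of \plc{A[j+1]} comes lexically before the write to \plc{A[j]}, and a later iteration overwrites the element that was read.

```c
#include <assert.h>

/* A[j] = A[j+1] + B[j] for j = 0 .. n-2; A[n-1] is left unchanged. */
void shift_left_add(int n, float *A, const float *B)
{
    assert(n >= 1);
    #pragma omp simd
    for (int j = 0; j < n - 1; j++) {
        float next = A[j + 1];  /* read first: iteration j+1 overwrites A[j+1] */
        A[j] = next + B[j];     /* write second */
    }
}
```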
|
||||
|
||||
\cexample{SIMD}{8}
|
||||
|
||||
\ffreeexample{SIMD}{8}
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{proc\_bind} Clause}
|
||||
\label{chap:affinity}
|
||||
\section{The \code{proc\_bind} Clause}
|
||||
\label{sec:affinity}
|
||||
|
||||
The following examples demonstrate how to use the \code{proc\_bind} clause to
|
||||
control the thread binding for a team of threads in a \code{parallel} region.
|
||||
@ -25,16 +25,18 @@ or
|
||||
|
||||
\code{OMP\_PLACES=\texttt{"}\{0:2\}:8:2\texttt{"}}
|
||||
|
||||
\section{Spread Affinity Policy}
|
||||
\subsection{Spread Affinity Policy}
|
||||
\label{subsec:affinity_spread}
|
||||
|
||||
|
||||
The following example shows the result of the \code{spread} affinity policy on
|
||||
the partition list when the number of threads is less than or equal to the number
|
||||
of places in the parent's place partition, for the machine architecture depicted
|
||||
above. Note that the threads are bound to the first place of each subpartition.
|
||||
|
||||
\cexample{affinity}{1c}
|
||||
\cexample{affinity}{1}
|
||||
|
||||
\fexample{affinity}{1f}
|
||||
\fexample{affinity}{1}
|
||||
|
||||
It is unspecified on which place the master thread is initially started. If the
|
||||
master thread is initially started on p0, the following placement of threads will
|
||||
@ -73,9 +75,9 @@ parent's place partition. The first \plc{T/P} threads of the team (including the
|
||||
thread) execute on the parent's place. The next \plc{T/P} threads execute on the next
|
||||
place in the place partition, and so on, with wrap around.
|
||||
|
||||
\cexample{affinity}{2c}
|
||||
\cexample{affinity}{2}
|
||||
|
||||
\fexample{affinity}{2f}
|
||||
\ffreeexample{affinity}{2}
|
||||
|
||||
It is unspecified on which place the master thread is initially started. If the
|
||||
master thread is initially started on p0, the following placement of threads will
|
||||
@ -120,16 +122,17 @@ and distribution of the place partition would be as follows:
|
||||
\item threads 14,15 execute on p1 with the place partition p1
|
||||
\end{compactitem}
|
||||
|
||||
\section{Close Affinity Policy}
|
||||
\subsection{Close Affinity Policy}
|
||||
\label{subsec:affinity_close}
|
||||
|
||||
The following example shows the result of the \code{close} affinity policy on
|
||||
the partition list when the number of threads is less than or equal to the number
|
||||
of places in parent's place partition, for the machine architecture depicted above.
|
||||
The place partition is not changed by the \code{close} policy.
|
||||
|
||||
\cexample{affinity}{3c}
|
||||
\cexample{affinity}{3}
|
||||
|
||||
\fexample{affinity}{3f}
|
||||
\fexample{affinity}{3}
|
||||
|
||||
It is unspecified on which place the master thread is initially started. If the
|
||||
master thread is initially started on p0, the following placement of threads will
|
||||
@ -168,9 +171,9 @@ thread) execute on the parent's place. The next \plc{T/P} threads execute on the
|
||||
place in the place partition, and so on, with wrap around. The place partition
|
||||
is not changed by the \code{close} policy.
|
||||
|
||||
\cexample{affinity}{4c}
|
||||
\cexample{affinity}{4}
|
||||
|
||||
\fexample{affinity}{4f}
|
||||
\ffreeexample{affinity}{4}
|
||||
|
||||
It is unspecified on which place the master thread is initially started. If the
|
||||
master thread is initially running on p0, the following placement of threads will
|
||||
@ -215,15 +218,16 @@ and distribution of the place partition would be as follows:
|
||||
\item threads 14,15 execute on p1 with the place partition p0-p7
|
||||
\end{compactitem}
|
||||
|
||||
\section{Master Affinity Policy}
|
||||
\subsection{Master Affinity Policy}
|
||||
\label{subsec:affinity_master}
|
||||
|
||||
The following example shows the result of the \code{master} affinity policy on
|
||||
the partition list for the machine architecture depicted above. The place partition
|
||||
is not changed by the master policy.
|
||||
|
||||
\cexample{affinity}{5c}
|
||||
\cexample{affinity}{5}
|
||||
|
||||
\fexample{affinity}{5f}
|
||||
\fexample{affinity}{5}
|
||||
|
||||
It is unspecified on which place the master thread is initially started. If the
|
||||
master thread is initially running on p0, the following placement of threads will
|
||||
|
43
Examples_affinity_query.tex
Normal file
@ -0,0 +1,43 @@
|
||||
\section{Affinity Query Functions}
|
||||
\label{sec: affinity_query}
|
||||
|
||||
In the example below a team of threads is generated on each socket of
|
||||
the system, using nested parallelism. Several query functions are used
|
||||
to gather information to support the creation of the teams and to obtain
|
||||
socket and thread numbers.
|
||||
|
||||
For proper execution of the code, the user must create a place partition, such that
|
||||
each place is a listing of the core numbers for a socket. For example,
|
||||
in a 2 socket system with 8 cores in each socket, and sequential numbering
|
||||
in the socket for the core numbers, the \code{OMP\_PLACES} variable would be set
|
||||
to "\{0:8\},\{8:8\}", using the place syntax \{\plc{lower\_bound}:\plc{length}:\plc{stride}\},
|
||||
and the default stride of 1.
|
||||
|
||||
The code determines the number of sockets (\plc{n\_sockets})
|
||||
using the \code{omp\_get\_num\_places()} query function.
|
||||
In this example each place is constructed with a list of
|
||||
each socket's core numbers, hence the number of places is equal
|
||||
to the number of sockets.
|
||||
|
||||
The outer parallel region forms a team of threads, and each thread
|
||||
executes on a socket (place) because the \code{proc\_bind} clause uses
|
||||
\code{spread} in the outer \code{parallel} construct.
|
||||
Next, in the \plc{socket\_init} function, an inner parallel region creates a team
|
||||
of threads equal to the number of elements (core numbers) from the place
|
||||
of the parent thread. Because the outer \code{parallel} construct uses
|
||||
a \code{spread} affinity policy, each of its threads inherits a subpartition of
|
||||
the original partition. Hence, the \code{omp\_get\_place\_num\_procs} query function
|
||||
returns the number of elements (here procs = cores) in the subpartition of the thread.
|
||||
After each parent thread creates its nested parallel region on the socket,
|
||||
the socket number and thread number are reported.
|
||||
|
||||
Note: Portable tools like hwloc (Portable HardWare LOCality package), which support
|
||||
many common operating systems, can be used to determine the configuration of a system.
|
||||
On some systems there are utilities, files or user guides that provide configuration
|
||||
information. For instance, the socket number and proc\_id's for a socket
|
||||
can be found in the /proc/cpuinfo text file on Linux systems.
|
||||
|
||||
\cexample{affinity}{6}
|
||||
|
||||
\ffreeexample{affinity}{6}
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{Array Sections in Device Constructs}
|
||||
\label{chap:array_sections}
|
||||
\section{Array Sections in Device Constructs}
|
||||
\label{sec:array_sections}
|
||||
|
||||
The following examples show the usage of array sections in \code{map} clauses
|
||||
on \code{target} and \code{target} \code{data} constructs.
|
||||
@ -8,28 +8,28 @@ on \code{target} and \code{target} \code{data} constructs.
|
||||
This example shows the invalid usage of two separate sections of the same array
|
||||
inside of a \code{target} construct.
|
||||
|
||||
\cexample{array_sections}{1c}
|
||||
\cexample{array_sections}{1}
|
||||
|
||||
\fexample{array_sections}{1f}
|
||||
\ffreeexample{array_sections}{1}
|
||||
|
||||
This example shows the invalid usage of two separate sections of the same array
|
||||
inside of a \code{target} construct.
|
||||
|
||||
\cexample{array_sections}{2c}
|
||||
\cexample{array_sections}{2}
|
||||
|
||||
\fexample{array_sections}{2f}
|
||||
\ffreeexample{array_sections}{2}
|
||||
|
||||
This example shows the valid usage of two separate sections of the same array inside
|
||||
of a \code{target} construct.
|
||||
|
||||
\cexample{array_sections}{3c}
|
||||
\cexample{array_sections}{3}
|
||||
|
||||
\fexample{array_sections}{3f}
|
||||
\ffreeexample{array_sections}{3}
|
||||
|
||||
This example shows the valid usage of a wholly contained array section of an already
|
||||
mapped array section inside of a \code{target} construct.
|
||||
|
||||
\cexample{array_sections}{4c}
|
||||
\cexample{array_sections}{4}
|
||||
|
||||
\fexample{array_sections}{4f}
|
||||
\ffreeexample{array_sections}{4}
|
||||
|
||||
|
@ -1,7 +1,7 @@
|
||||
\pagebreak
|
||||
\chapter{Fortran \code{ASSOCIATE} Construct}
|
||||
\section{Fortran \code{ASSOCIATE} Construct}
|
||||
\fortranspecificstart
|
||||
\label{chap:associate}
|
||||
\label{sec:associate}
|
||||
|
||||
The following is an invalid example of specifying an associate name on a data-sharing attribute
|
||||
clause. The constraint in the Data Sharing Attribute Rules section in the OpenMP
|
||||
@ -11,13 +11,13 @@ name \plc{b} is associated with the shared variable \plc{a}. With the predetermi
|
||||
attribute rule, the associate name \plc{b} is not allowed to be specified on the \code{private}
|
||||
clause.
|
||||
|
||||
\fnexample{associate}{1f}
|
||||
\fnexample{associate}{1}
|
||||
|
||||
In the next example, within the \code{parallel} construct, the association name \plc{thread\_id}
|
||||
is associated with the private copy of \plc{i}. The print statement should output the
|
||||
unique thread number.
|
||||
|
||||
\fnexample{associate}{2f}
|
||||
\fnexample{associate}{2}
|
||||
|
||||
The following example illustrates the effect of specifying a selector name on a data-sharing
|
||||
attribute clause. The associate name \plc{u} is associated with \plc{v} and the variable \plc{v}
|
||||
@ -27,6 +27,6 @@ The association between \plc{u} and the original \plc{v} is retained (see the Da
|
||||
Attribute Rules section in the OpenMP 4.0 API Specifications). Inside the \code{parallel}
|
||||
region, \plc{v} has the value of -1 and \plc{u} has the value of the original \plc{v}.
|
||||
|
||||
\fnexample{associate}{3f}
|
||||
\ffreenexample{associate}{3}
|
||||
\fortranspecificend
|
||||
|
||||
|
15
Examples_async_target_depend.tex
Normal file
@ -0,0 +1,15 @@
|
||||
\pagebreak
|
||||
\section{Asynchronous \code{target} Execution and Dependences}
|
||||
\label{sec:async_target_exec_depend}
|
||||
|
||||
Asynchronous execution of a \code{target} region can be accomplished
|
||||
by creating an explicit task around the \code{target} region. Examples
|
||||
with explicit tasks are shown at the beginning of this section.
|
||||
|
||||
As of OpenMP 4.5 and beyond the \code{nowait} clause can be used on the
|
||||
\code{target} directive for asynchronous execution. Examples with
|
||||
\code{nowait} clauses follow the explicit \code{task} examples.
|
||||
|
||||
This section also shows the use of \code{depend} clauses to order
|
||||
executions through dependences.
|
||||
|
31
Examples_async_target_nowait.tex
Normal file
@ -0,0 +1,31 @@
|
||||
\subsection{\code{nowait} Clause on \code{target} Construct}
|
||||
\label{subsec:target_nowait_clause}
|
||||
|
||||
The following example shows how to execute code asynchronously on a
|
||||
device without an explicit task. The \code{nowait} clause on a \code{target}
|
||||
construct allows the thread of the \plc{target task} to perform other
|
||||
work while waiting for the \code{target} region execution to complete.
|
||||
Hence, the \code{target} region can execute asynchronously on the
|
||||
device (without requiring a host thread to idle while waiting for
|
||||
the \plc{target task} execution to complete).
|
||||
|
||||
In this example the product of two vectors (arrays), \plc{v1}
|
||||
and \plc{v2}, is formed. One half of the operations is performed
|
||||
on the device, and the last half on the host, concurrently.
|
||||
|
||||
After a team of threads is formed the master thread generates
|
||||
the \plc{target task} while the other threads can continue on, without a barrier,
|
||||
to the execution of the host portion of the vector product.
|
||||
The completion of the \plc{target task} (asynchronous target execution) is
|
||||
guaranteed by the synchronization in the implicit barrier at the end of the
|
||||
host vector-product worksharing loop region. See the \code{barrier}
|
||||
glossary entry in the OpenMP specification for details.
|
||||
|
||||
The host loop scheduling is \code{dynamic}, to balance the host thread executions, since
|
||||
one thread is being used for offload generation. In the situation where
|
||||
little time is spent by the \plc{target task} in setting
|
||||
up and tearing down the target execution, \code{static} scheduling may be desired.
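The structure described above can be sketched roughly as follows. The function name, the split at \plc{n/2} into a local variable \plc{half}, and the chunk size of 64 are assumptions for this sketch. When no device is available the \code{target} region falls back to host execution, and the result is the same.

```c
#include <assert.h>

/* p[i] = v1[i]*v2[i]; first half offloaded, second half computed on the host. */
void vec_mult_split(int n, const float *v1, const float *v2, float *p)
{
    int half = n / 2;
    assert(n >= 2);

    #pragma omp parallel
    {
        /* The master thread generates the deferred target task and moves on. */
        #pragma omp master
        #pragma omp target teams distribute parallel for nowait \
                map(to: v1[0:half], v2[0:half]) map(from: p[0:half])
        for (int i = 0; i < half; i++)
            p[i] = v1[i] * v2[i];

        /* Host threads share the second half. The implicit barrier at the
         * end of this loop also guarantees completion of the target task. */
        #pragma omp for schedule(dynamic, 64)
        for (int i = half; i < n; i++)
            p[i] = v1[i] * v2[i];
    }
}
```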
|
||||
|
||||
\cexample{async_target}{3}
|
||||
|
||||
\ffreeexample{async_target}{3}
|
18
Examples_async_target_nowait_depend.tex
Normal file
@ -0,0 +1,18 @@
|
||||
%begin
|
||||
\subsection{Asynchronous \code{target} with \code{nowait} and \code{depend} Clauses}
|
||||
\label{subsec:async_target_nowait_depend}
|
||||
|
||||
More details on dependences can be found in \specref{sec:task_depend}, Task
|
||||
Dependences. In this example, there are three flow dependences. In the first two dependences the
|
||||
target task does not execute until the preceding explicit tasks have finished. These
|
||||
dependences are produced by arrays \plc{v1} and \plc{v2} with the \code{out} dependence type in the first two tasks, and the \code{in} dependence type in the target task.
|
||||
|
||||
The last dependence is produced by array \plc{p} with the \code{out} dependence type in the target task, and the \code{in} dependence type in the last task. The last task does not execute until the target task finishes.
|
||||
|
||||
The \code{nowait} clause on the \code{target} construct creates a deferrable \plc{target task}, allowing the encountering task to continue execution without waiting for the completion of the \plc{target task}.
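The three flow dependences can be sketched as below. This is an assumption-laden reduction of the idea, not the numbered example: the function name, the initialization values, the use of array sections on pointer arguments in the \code{depend} clauses, and the summation in the last task are all choices made for illustration.

```c
#include <assert.h>

/* Fills v1 and v2 in two tasks, multiplies them in a deferred target task,
 * and sums the products in a final task. Returns the sum. */
float vec_mult_sum(int n, float *v1, float *v2, float *p)
{
    float sum = 0.0f;
    assert(n > 0);

    #pragma omp parallel shared(sum)
    #pragma omp single
    {
        #pragma omp task depend(out: v1[0:n])
        for (int i = 0; i < n; i++)
            v1[i] = (float)(i + 1);

        #pragma omp task depend(out: v2[0:n])
        for (int i = 0; i < n; i++)
            v2[i] = 2.0f;

        /* Deferred target task: waits for both initialization tasks. */
        #pragma omp target nowait depend(in: v1[0:n], v2[0:n]) depend(out: p[0:n]) \
                map(to: v1[0:n], v2[0:n]) map(from: p[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            p[i] = v1[i] * v2[i];

        /* Last task: waits for the target task through the dependence on p. */
        #pragma omp task depend(in: p[0:n]) shared(sum)
        for (int i = 0; i < n; i++)
            sum += p[i];
    }
    /* The implicit barrier of the single region has completed all tasks. */
    return sum;
}
```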
|
||||
|
||||
\cexample{async_target}{4}
|
||||
|
||||
\ffreeexample{async_target}{4}
|
||||
|
||||
%end
|
@ -1,6 +1,5 @@
|
||||
\pagebreak
|
||||
\chapter{Asynchronous Execution of a \code{target} Region Using Tasks}
|
||||
\label{chap:async_target}
|
||||
\subsection{Asynchronous \code{target} with Tasks}
|
||||
\label{subsec:async_target_with_tasks}
|
||||
|
||||
The following example shows how the \code{task} and \code{target} constructs
|
||||
are used to execute multiple \code{target} regions asynchronously. The task that
|
||||
@ -10,45 +9,46 @@ scheduling point while waiting for the execution of the \code{target} region
|
||||
to complete, allowing the thread to switch back to the execution of the encountering
|
||||
task or one of the previously generated explicit tasks.
|
||||
|
||||
\cexample{async_target}{1c}
|
||||
\cexample{async_target}{1}
|
||||
|
||||
The Fortran version has an interface block that contains the \code{declare} \code{target}.
|
||||
An identical statement exists in the function declaration (not shown here).
|
||||
|
||||
\fexample{async_target}{1f}
|
||||
\ffreeexample{async_target}{1}
|
||||
|
||||
The following example shows how the \code{task} and \code{target} constructs
|
||||
are used to execute multiple \code{target} regions asynchronously. The task dependence
|
||||
ensures that the storage is allocated and initialized on the device before it is
|
||||
accessed.
|
||||
|
||||
\cexample{async_target}{2c}
|
||||
\cexample{async_target}{2}
|
||||
|
||||
The Fortran example below is similar to the C version above. Instead of pointers, though, it uses
|
||||
the convenience of Fortran allocatable arrays on the device. An allocatable array has the
|
||||
same behavior in a \code{map} clause as a C pointer, in this case.
|
||||
the convenience of Fortran allocatable arrays on the device. In order to preserve the arrays
|
||||
allocated on the device across multiple \code{target} regions, a \code{target}~\code{data} region
|
||||
is used in this case.
|
||||
|
||||
If there is no shape specified for an allocatable array in a \code{map} clause, only the array descriptor
|
||||
(also called a dope vector) is mapped. That is, device space is created for the descriptor, and it
|
||||
is initially populated with host values. In this case, the \plc{v1} and \plc{v2} arrays will be in a
|
||||
non-associated state on the device. When space for \plc{v1} and \plc{v2} is allocated on the device
|
||||
the addresses to the space will be included in their descriptors.
|
||||
in the first \code{target} region the addresses to the space will be included in their descriptors.
|
||||
|
||||
At the end of the first \code{target} region, the descriptor (of an unshaped specification of an allocatable
|
||||
array in a \code{map} clause) is returned with the raw device address of the allocated space.
|
||||
The content of the array is not returned. In the example the data in arrays \plc{v1} and \plc{v2}
|
||||
are not returned. In the second \code{target} directive, the \plc{v1} and \plc{v2} descriptors are
|
||||
re-created on the device with the descriptive information; and references to the
|
||||
vectors point to the correct local storage, of the space that was not freed in the first \code{target}
|
||||
directive. At the end of the second \code{target} region, the data in array \plc{p} is copied back
|
||||
to the host since \plc{p} is not an allocatable array.
|
||||
At the end of the first \code{target} region, the arrays \plc{v1} and \plc{v2} are preserved on the device
|
||||
for access in the second \code{target} region. At the end of the second \code{target} region, the data
|
||||
in array \plc{p} is copied back; the arrays \plc{v1} and \plc{v2} are not.
|
||||
|
||||
A \code{depend} clause is used in the \code{task} directive to provide a wait at the beginning of the second
|
||||
\code{target} region, to ensure that there is no race condition with \plc{v1} and \plc{v2} in the two tasks.
|
||||
It would be noncompliant to use \plc{v1} and/or \plc{v2} in lieu of \plc{N} in the \code{depend} clauses,
|
||||
because the use of non-allocated allocatable arrays as list items in the first \code{depend} clause would
|
||||
because the use of non-allocated allocatable arrays as list items in a \code{depend} clause would
|
||||
lead to unspecified behavior.
|
||||
|
||||
\fexample{async_target}{2f}
|
||||
|
||||
\noteheader{--} This example is not strictly compliant with the OpenMP 4.5 specification since the allocation status
|
||||
of allocatable arrays \plc{v1} and \plc{v2} is changed inside the \code{target} region, which is not allowed.
|
||||
(See the restrictions for the \code{map} clause in the \plc{Data-mapping Attribute Rules and Clauses}
|
||||
section of the specification.)
|
||||
However, the intention is to relax the restrictions on mapping of allocatable variables in the next release
|
||||
of the specification so that the example will be compliant.
|
||||
|
||||
\ffreeexample{async_target}{2}
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{atomic} Construct}
|
||||
\label{chap:atomic}
|
||||
\section{The \code{atomic} Construct}
|
||||
\label{sec:atomic}
|
||||
|
||||
The following example avoids race conditions (simultaneous updates of an element
|
||||
of \plc{x} by multiple threads) by using the \code{atomic} construct.
|
||||
@ -14,9 +14,9 @@ Note that the \code{atomic} directive applies only to the statement immediately
|
||||
following it. As a result, elements of \plc{y} are not updated atomically in
|
||||
this example.
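
As a minimal sketch of the same idea, the following hypothetical \code{histogram} function (not one of the example files) updates a shared array through an index array. Several iterations may target the same element, so the update is protected by \code{atomic}; only the single statement after the directive is covered.

```c
#include <assert.h>

/* Hypothetical helper: counts how often each bin number occurs in index[].
   Different iterations may hit the same bin, hence the atomic update. */
void histogram(const int *index, int n, int *bins, int nbins)
{
    for (int b = 0; b < nbins; b++) bins[b] = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        assert(index[i] >= 0 && index[i] < nbins);
        /* The directive applies to the next statement only. */
        #pragma omp atomic update
        bins[index[i]] += 1;
    }
}
```

An array that is indexed directly by the loop variable would not need this protection, because each iteration then touches a distinct element.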
|
||||
|
||||
\cexample{atomic}{1c}
|
||||
\cexample{atomic}{1}
|
||||
|
||||
\fexample{atomic}{1f}
|
||||
\fexample{atomic}{1}
|
||||
|
||||
The following example illustrates the \code{read} and \code{write} clauses
|
||||
for the \code{atomic} directive. These clauses ensure that the given variable
|
||||
@ -26,9 +26,9 @@ another part of the variable. Note that most hardware provides atomic reads and
|
||||
writes for some set of properly aligned variables of specific sizes, but not necessarily
|
||||
for all the variable types supported by the OpenMP API.
|
||||
|
||||
\cexample{atomic}{2c}
|
||||
\cexample{atomic}{2}
|
||||
|
||||
\fexample{atomic}{2f}
|
||||
\fexample{atomic}{2}
|
||||
|
||||
The following example illustrates the \code{capture} clause for the \code{atomic}
|
||||
directive. In this case the value of a variable is captured, and then the variable
|
||||
@ -37,8 +37,8 @@ be implemented using the fetch-and-add instruction available on many kinds of ha
|
||||
The example also shows a way to implement a spin lock using the \code{capture}
|
||||
and \code{read} clauses.
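
The fetch-and-add use of \code{capture} can be sketched as follows. The names \code{fetch\_and\_add\_one} and \code{tickets\_are\_unique} are hypothetical, and the fixed bound of 1024 tickets is an assumption chosen to keep the sketch short.

```c
#include <assert.h>

/* Hypothetical helper: returns the old value of *counter and increments
   it, as one indivisible operation. */
int fetch_and_add_one(int *counter)
{
    int old;
    #pragma omp atomic capture
    { old = *counter; *counter += 1; }
    return old;
}

/* Hands out n tickets in parallel and checks that every ticket
   0..n-1 was issued exactly once. */
int tickets_are_unique(int n)
{
    assert(n > 0 && n <= 1024);
    int counter = 0;
    int seen[1024] = {0};

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int t = fetch_and_add_one(&counter);
        seen[t] = 1;   /* tickets are distinct, so slots do not collide */
    }

    int all = (counter == n);
    for (int i = 0; i < n; i++) all = all && seen[i];
    return all;
}
```

If the read and the increment were two separate statements, two threads could read the same old value and receive the same ticket.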
|
||||
|
||||
\cexample{atomic}{3c}
|
||||
\cexample{atomic}{3}
|
||||
|
||||
\fexample{atomic}{3f}
|
||||
\fexample{atomic}{3}
|
||||
|
||||
|
||||
|
@ -1,25 +1,25 @@
|
||||
\pagebreak
|
||||
\chapter{Restrictions on the \code{atomic} Construct}
|
||||
\label{chap:atomic_restrict}
|
||||
\section{Restrictions on the \code{atomic} Construct}
|
||||
\label{sec:atomic_restrict}
|
||||
|
||||
The following non-conforming examples illustrate the restrictions on the \code{atomic}
|
||||
construct.
|
||||
|
||||
\cexample{atomic_restrict}{1c}
|
||||
\cexample{atomic_restrict}{1}
|
||||
|
||||
\fexample{atomic_restrict}{1f}
|
||||
\fexample{atomic_restrict}{1}
|
||||
|
||||
\cexample{atomic_restrict}{2c}
|
||||
\cexample{atomic_restrict}{2}
|
||||
|
||||
\fortranspecificstart
|
||||
The following example is non-conforming because \code{I} and \code{R} reference
|
||||
the same location but have different types.
|
||||
|
||||
\fnexample{atomic_restrict}{2f}
|
||||
\fnexample{atomic_restrict}{2}
|
||||
|
||||
Although the following example might work on some implementations, this is also
|
||||
non-conforming:
|
||||
|
||||
\fnexample{atomic_restrict}{3f}
|
||||
\fnexample{atomic_restrict}{3}
|
||||
\fortranspecificend
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{Binding of \code{barrier} Regions}
|
||||
\label{chap:barrier_regions}
|
||||
\section{Binding of \code{barrier} Regions}
|
||||
\label{sec:barrier_regions}
|
||||
|
||||
The binding rules call for a \code{barrier} region to bind to the closest enclosing
|
||||
\code{parallel} region.
|
||||
@ -17,8 +17,8 @@ part. Also note that the \code{barrier} region in \plc{sub3} when called from
|
||||
\plc{sub2} only synchronizes the team of threads in the enclosing \code{parallel}
|
||||
region and not all the threads created in \plc{sub1}.
|
||||
|
||||
\cexample{barrier_regions}{1c}
|
||||
\cexample{barrier_regions}{1}
|
||||
|
||||
\fexample{barrier_regions}{1f}
|
||||
\fexample{barrier_regions}{1}
|
||||
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{Cancellation Constructs}
|
||||
\label{chap:cancellation}
|
||||
\section{Cancellation Constructs}
|
||||
\label{sec:cancellation}
|
||||
|
||||
The following example shows how the \code{cancel} directive can be used to terminate
|
||||
an OpenMP region. Although the \code{cancel} construct terminates the OpenMP
|
||||
@ -11,7 +11,7 @@ exception is properly handled in the sequential part. If cancellation of the \co
|
||||
region has been requested, some threads might have executed \code{phase\_1()}.
|
||||
However, it is guaranteed that none of the threads executed \code{phase\_2()}.
|
||||
|
||||
\cexample{cancellation}{1c}
|
||||
\cppexample{cancellation}{1}
|
||||
|
||||
|
||||
The following example illustrates the use of the \code{cancel} construct in error
|
||||
@ -20,7 +20,7 @@ the cancellation is activated. The encountering thread sets the shared variable
|
||||
\code{err} and other threads of the binding thread set proceed to the end of
|
||||
the worksharing construct after the cancellation has been activated.
|
||||
|
||||
\fexample{cancellation}{1f}
|
||||
\ffreeexample{cancellation}{1}
|
||||
|
||||
The following example shows how to cancel a parallel search on a binary tree as
|
||||
soon as the search value has been detected. The code creates a task to descend
|
||||
@ -32,11 +32,11 @@ task group to control the effect of the \code{cancel taskgroup} directive. The
|
||||
\plc{level} argument is used to create undeferred tasks after the first ten
|
||||
levels of the tree.
|
||||
|
||||
\cexample{cancellation}{2c}
|
||||
\cexample{cancellation}{2}
|
||||
|
||||
|
||||
The following is the equivalent parallel search example in Fortran.
|
||||
|
||||
\fexample{cancellation}{2f}
|
||||
\ffreeexample{cancellation}{2}
|
||||
|
||||
|
||||
|
@ -1,7 +1,7 @@
|
||||
\pagebreak
|
||||
\chapter{C/C++ Arrays in a \code{firstprivate} Clause}
|
||||
\section{C/C++ Arrays in a \code{firstprivate} Clause}
|
||||
\ccppspecificstart
|
||||
\label{chap:carrays_fpriv}
|
||||
\label{sec:carrays_fpriv}
|
||||
|
||||
The following example illustrates the size and value of list items of array or
|
||||
pointer type in a \code{firstprivate} clause. The size of new list items is
|
||||
@ -31,7 +31,7 @@ The new items of array type are initialized as if each integer element of the or
|
||||
array is assigned to the corresponding element of the new array. Those of pointer
|
||||
type are initialized as if by assignment from the original item to the new item.
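
A minimal sketch of these rules, using a hypothetical function \code{firstprivate\_array\_check} that is not part of the example files: the private array keeps the full size and contents of the original, while the private pointer is only a copy of the pointer value and still refers to the original storage.

```c
/* Hypothetical helper: returns 1 when every thread observes the sizes
   and initial values described above. The region only reads data. */
int firstprivate_array_check(void)
{
    int a[4] = {1, 2, 3, 4};
    int *p = a;
    int ok = 1;

    #pragma omp parallel firstprivate(a, p) reduction(&&: ok)
    {
        /* Array type: the new item is a whole array, copied element-wise. */
        ok = ok && (sizeof(a) == 4 * sizeof(int));
        ok = ok && (a[0] == 1 && a[3] == 4);

        /* Pointer type: the new item is a pointer, copied by assignment. */
        ok = ok && (sizeof(p) == sizeof(int *));
        ok = ok && (p[2] == 3);
    }
    return ok;
}
```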
|
||||
|
||||
\cnexample{carrays_fpriv}{1c}
|
||||
\cnexample{carrays_fpriv}{1}
|
||||
\ccppspecificend
|
||||
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{collapse} Clause}
|
||||
\label{chap:collapse}
|
||||
\section{The \code{collapse} Clause}
|
||||
\label{sec:collapse}
|
||||
|
||||
In the following example, the \code{k} and \code{j} loops are associated with
|
||||
the loop construct. So the iterations of the \code{k} and \code{j} loops are
|
||||
@ -16,9 +16,9 @@ The variable \code{j} can be omitted from the \code{private} clause when the
|
||||
from the \code{private} clause. In either case, \code{k} is implicitly private
|
||||
and could be omitted from the \code{private} clause.
|
||||
|
||||
\cexample{collapse}{1c}
|
||||
\cexample{collapse}{1}
|
||||
|
||||
\fexample{collapse}{1f}
|
||||
\fexample{collapse}{1}
|
||||
|
||||
In the next example, the \code{k} and \code{j} loops are associated with the
|
||||
loop construct. So the iterations of the \code{k} and \code{j} loops are collapsed
|
||||
@ -33,9 +33,9 @@ will have the value \code{2} and \code{j} will have the value \code{3}. Since
|
||||
by the sequentially last iteration of the collapsed \code{k} and \code{j} loop.
|
||||
This example prints: \code{2 3}.
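
The behavior can be reproduced with a short sketch. The function \code{collapse\_last\_values} is hypothetical; it encodes the final pair as $10k + j$ so that the result can be checked as a single number.

```c
/* Hypothetical helper: returns 10*klast + jlast after a collapsed loop
   nest whose sequentially last iteration is k = 2, j = 3. */
int collapse_last_values(void)
{
    int j, k, jlast = 0, klast = 0;

    #pragma omp parallel for collapse(2) lastprivate(jlast, klast)
    for (k = 1; k <= 2; k++)
        for (j = 1; j <= 3; j++) {
            jlast = j;
            klast = k;
        }

    /* lastprivate copies back the values from the sequentially last
       iteration of the collapsed iteration space. */
    return 10 * klast + jlast;
}
```

The result does not depend on which thread executes the last iteration or on the order in which iterations actually run.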
|
||||
|
||||
\cexample{collapse}{2c}
|
||||
\cexample{collapse}{2}
|
||||
|
||||
\fexample{collapse}{2f}
|
||||
\fexample{collapse}{2}
|
||||
|
||||
The next example illustrates the interaction of the \code{collapse} and \code{ordered}
|
||||
clauses.
|
||||
@ -71,8 +71,8 @@ The code prints
|
||||
\\
|
||||
\code{1 3 2}
|
||||
|
||||
\cexample{collapse}{3c}
|
||||
\cexample{collapse}{3}
|
||||
|
||||
\fexample{collapse}{3f}
|
||||
\fexample{collapse}{3}
|
||||
|
||||
|
||||
|
@ -1,13 +1,13 @@
|
||||
\pagebreak
|
||||
\chapter{Conditional Compilation}
|
||||
\label{chap:cond_comp}
|
||||
\section{Conditional Compilation}
|
||||
\label{sec:cond_comp}
|
||||
|
||||
\ccppspecificstart
|
||||
The following example illustrates the use of conditional compilation using the
|
||||
OpenMP macro \code{\_OPENMP}. With OpenMP compilation, the \code{\_OPENMP}
|
||||
macro becomes defined.
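
A minimal sketch of this technique follows; the function names are hypothetical. The same source compiles with or without OpenMP support, and the macro selects the branch at compile time.

```c
#include <stdio.h>

/* Hypothetical helper: 1 when compiled with OpenMP enabled, 0 otherwise. */
int built_with_openmp(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Prints which kind of build this is. With OpenMP, the macro expands
   to an integer of the form yyyymm. */
void report_build(void)
{
#ifdef _OPENMP
    printf("Compiled with OpenMP support (_OPENMP = %d).\n", _OPENMP);
#else
    printf("Compiled without OpenMP support.\n");
#endif
}
```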
|
||||
|
||||
\cnexample{cond_comp}{1c}
|
||||
\cnexample{cond_comp}{1}
|
||||
\ccppspecificend
|
||||
|
||||
\fortranspecificstart
|
||||
@ -16,6 +16,6 @@ With OpenMP compilation, the conditional compilation sentinel \code{!\$} is reco
|
||||
and treated as two spaces. In fixed form source, statements guarded by the sentinel
|
||||
must start after column 6.
|
||||
|
||||
\fnexample{cond_comp}{1f}
|
||||
\fnexample{cond_comp}{1}
|
||||
\fortranspecificend
|
||||
|
||||
|
@ -1,13 +1,13 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{copyin} Clause}
|
||||
\label{chap:copyin}
|
||||
\section{The \code{copyin} Clause}
|
||||
\label{sec:copyin}
|
||||
|
||||
The \code{copyin} clause is used to initialize threadprivate data upon entry
|
||||
to a \code{parallel} region. The value of the threadprivate variable in the master
|
||||
thread is copied to the threadprivate variable of each other team member.
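
A minimal sketch, assuming a hypothetical file-scope variable \code{counter} and a hypothetical checking function \code{copyin\_check}:

```c
static int counter = 0;
#pragma omp threadprivate(counter)   /* each thread gets its own copy */

/* Hypothetical helper: returns 1 if every thread in the team starts the
   parallel region with the master thread's value of counter. */
int copyin_check(int seed)
{
    int matches = 0, threads = 0;

    counter = seed;                  /* set in the master thread's copy */

    #pragma omp parallel copyin(counter) reduction(+: matches, threads)
    {
        threads += 1;
        if (counter == seed) matches += 1;
    }
    return matches == threads;
}
```

Without the \code{copyin} clause, the copies belonging to the other threads would keep whatever value they had before the region, which need not equal \code{seed}.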
|
||||
|
||||
\cexample{copyin}{1c}
|
||||
\cexample{copyin}{1}
|
||||
|
||||
\fexample{copyin}{1f}
|
||||
\fexample{copyin}{1}
|
||||
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{copyprivate} Clause}
|
||||
\label{chap:copyprivate}
|
||||
\section{The \code{copyprivate} Clause}
|
||||
\label{sec:copyprivate}
|
||||
|
||||
The \code{copyprivate} clause can be used to broadcast values acquired by a single
|
||||
thread directly to all instances of the private variables in the other threads.
|
||||
@ -16,28 +16,28 @@ The thread that executes the structured block associated with the \code{single}
|
||||
of the other implicit tasks in the thread team. The broadcast completes before
|
||||
any of the threads have left the barrier at the end of the construct.
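
The broadcast can be sketched as follows. The function \code{copyprivate\_broadcast} is hypothetical and simply verifies that every thread sees the value after the \code{single} construct.

```c
/* Hypothetical helper: one thread stores value in its private x, and
   copyprivate broadcasts it to the private x of every other thread.
   Returns 1 if all threads observe the broadcast value. */
int copyprivate_broadcast(int value)
{
    int agree = 1;

    #pragma omp parallel reduction(&&: agree)
    {
        int x = value - 1;           /* private, deliberately different */

        #pragma omp single copyprivate(x)
        {
            x = value;               /* executed by one thread only */
        }
        /* The broadcast is complete once a thread passes the barrier
           at the end of the single construct. */
        agree = agree && (x == value);
    }
    return agree;
}
```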
|
||||
|
||||
\cexample{copyprivate}{1c}
|
||||
\cexample{copyprivate}{1}
|
||||
|
||||
\fexample{copyprivate}{1f}
|
||||
\fexample{copyprivate}{1}
|
||||
|
||||
In this example, assume that the input must be performed by the master thread.
|
||||
Since the \code{master} construct does not support the \code{copyprivate} clause,
|
||||
it cannot broadcast the input value that is read. However, \code{copyprivate}
|
||||
is used to broadcast an address where the input value is stored.
|
||||
|
||||
\cexample{copyprivate}{2c}
|
||||
\cexample{copyprivate}{2}
|
||||
|
||||
\fexample{copyprivate}{2f}
|
||||
\fexample{copyprivate}{2}
|
||||
|
||||
Suppose that the number of lock variables required within a \code{parallel} region
|
||||
cannot easily be determined prior to entering it. The \code{copyprivate} clause
|
||||
can be used to provide access to shared lock variables that are allocated within
|
||||
that \code{parallel} region.
|
||||
|
||||
\cexample{copyprivate}{3c}
|
||||
\cexample{copyprivate}{3}
|
||||
|
||||
\fortranspecificstart
|
||||
\fnexample{copyprivate}{3f}
|
||||
\fnexample{copyprivate}{3}
|
||||
|
||||
Note that the effect of the \code{copyprivate} clause on a variable with the
|
||||
\code{allocatable} attribute is different than on a variable with the \code{pointer}
|
||||
@ -45,7 +45,7 @@ attribute. The value of \code{A} is copied (as if by intrinsic assignment) and
|
||||
the pointer \code{B} is copied (as if by pointer assignment) to the corresponding
|
||||
list items in the other implicit tasks belonging to the \code{parallel} region.
|
||||
|
||||
\fnexample{copyprivate}{4f}
|
||||
\fnexample{copyprivate}{4}
|
||||
\fortranspecificend
|
||||
|
||||
|
||||
|
14
Examples_cpp_reference.tex
Normal file
@ -0,0 +1,14 @@
|
||||
\section{C++ Reference in Data-Sharing Clauses}
|
||||
\cppspecificstart
|
||||
\label{sec:cpp_reference}
|
||||
|
||||
C++ reference types are allowed in data-sharing attribute clauses as of OpenMP 4.5, except
|
||||
for the \code{threadprivate}, \code{copyin} and \code{copyprivate} clauses.
|
||||
(See the Data-Sharing Attribute Clauses Section of the 4.5 OpenMP specification.)
|
||||
When a variable with C++ reference type is privatized, the object the reference refers to is privatized in addition to the reference itself.
|
||||
The following example shows the use of reference types in data-sharing clauses in the usual way.
|
||||
Additionally, it shows how the data-sharing of formal arguments with a C++ reference type on an orphaned task generating construct is determined implicitly. (See the Data-sharing Attribute Rules for Variables Referenced in a Construct Section of the 4.5 OpenMP specification.)
|
||||
|
||||
|
||||
\cppnexample{cpp_reference}{1}
|
||||
\cppspecificend
|
@ -1,16 +1,20 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{critical} Construct}
|
||||
\label{chap:critical}
|
||||
\section{The \code{critical} Construct}
|
||||
\label{sec:critical}
|
||||
|
||||
The following example includes several \code{critical} constructs . The example
|
||||
The following example includes several \code{critical} constructs. The example
|
||||
illustrates a queuing model in which a task is dequeued and worked on. To guard
|
||||
against multiple threads dequeuing the same task, the dequeuing operation must
|
||||
be in a \code{critical} region. Because the two queues in this example are independent,
|
||||
they are protected by \code{critical} constructs with different names, \plc{xaxis}
|
||||
and \plc{yaxis}.
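
A minimal sketch of the queuing model, under the assumption that each queue is reduced to a counter of remaining items; the names \code{xqueue}, \code{yqueue}, \code{dequeue} and \code{drain\_queues} are hypothetical stand-ins for the real queue code.

```c
/* Hypothetical stand-in for a work queue: a count of remaining items. */
static int xqueue, yqueue;

/* Removes one item; returns its (nonzero) number, or 0 if the queue is empty. */
static int dequeue(int *q)
{
    return (*q > 0) ? (*q)-- : 0;
}

/* Drains both queues in parallel and returns the number of items processed. */
int drain_queues(int nx, int ny)
{
    int done = 0;
    xqueue = nx;
    yqueue = ny;

    #pragma omp parallel reduction(+: done)
    {
        int item;
        do {
            /* Only threads dequeuing from the x queue exclude each other. */
            #pragma omp critical(xaxis)
            item = dequeue(&xqueue);
            if (item) done++;        /* the work itself runs unprotected */
        } while (item);

        do {
            /* A different name: does not contend with critical(xaxis). */
            #pragma omp critical(yaxis)
            item = dequeue(&yqueue);
            if (item) done++;
        } while (item);
    }
    return done;
}
```

Using one unnamed \code{critical} construct for both queues would also be correct, but threads working on one queue would then wait needlessly for threads working on the other.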
|
||||
|
||||
\cexample{critical}{1c}
|
||||
\cexample{critical}{1}
|
||||
|
||||
\fexample{critical}{1f}
|
||||
\fexample{critical}{1}
|
||||
|
||||
The following example extends the previous example by adding the \code{hint} clause to the \code{critical} constructs.
|
||||
|
||||
\cexample{critical}{2}
|
||||
|
||||
\fexample{critical}{2}
|
||||
|
@ -1,8 +1,9 @@
|
||||
\pagebreak
|
||||
\chapter{\code{declare} \code{target} Construct}
|
||||
\label{chap:declare_target}
|
||||
\section{\code{declare} \code{target} Construct}
|
||||
\label{sec:declare_target}
|
||||
|
||||
\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for a Function}
|
||||
\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for a Function}
|
||||
\label{subsec:declare_target_function}
|
||||
|
||||
The following example shows how the \code{declare} \code{target} directive
|
||||
is used to indicate that the corresponding call inside a \code{target} region
|
||||
@ -15,7 +16,7 @@ the \code{target} region (thus \code{fib}) will execute on the host device.
|
||||
For C/C++ codes the declaration of the function \code{fib} appears between the \code{declare}
|
||||
\code{target} and \code{end} \code{declare} \code{target} directives.
|
||||
|
||||
\cexample{declare_target}{1c}
|
||||
\cexample{declare_target}{1}
|
||||
|
||||
The Fortran \code{fib} subroutine contains a \code{declare} \code{target} declaration
|
||||
to indicate to the compiler to create a device-executable version of the procedure.
|
||||
@ -26,7 +27,7 @@ The program uses the \code{module\_fib} module, which presents an explicit inter
|
||||
the compiler with the \code{declare} \code{target} declarations for processing
|
||||
the \code{fib} call.
|
||||
|
||||
\fexample{declare_target}{1f}
|
||||
\ffreeexample{declare_target}{1}
|
||||
|
||||
The next Fortran example shows the use of an external subroutine. Without an explicit
|
||||
interface (through module use or an interface block) the \code{declare} \code{target}
|
||||
@ -34,9 +35,10 @@ declarations within an external subroutine are unknown to the main program unit;
|
||||
therefore, a \code{declare} \code{target} must be provided within the program
|
||||
scope for the compiler to determine that a target binary should be available.
|
||||
|
||||
\fexample{declare_target}{2f}
|
||||
\ffreeexample{declare_target}{2}
|
||||
|
||||
\section{\code{declare} \code{target} Construct for Class Type}
|
||||
\subsection{\code{declare} \code{target} Construct for Class Type}
|
||||
\label{subsec:declare_target_class}
|
||||
|
||||
\cppspecificstart
|
||||
The following example shows how the \code{declare} \code{target} and \code{end}
|
||||
@ -45,10 +47,11 @@ of a variable \plc{varY} with a class type \code{typeY}. The member function \co
|
||||
be accessed on a target device because its declaration did not appear between \code{declare}
|
||||
\code{target} and \code{end} \code{declare} \code{target} directives.
|
||||
|
||||
\cnexample{declare_target}{2c}
|
||||
\cppnexample{declare_target}{2}
|
||||
\cppspecificend
|
||||
|
||||
\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for Variables}
|
||||
\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for Variables}
|
||||
\label{subsec:declare_target_variables}
|
||||
|
||||
The following examples show how the \code{declare} \code{target} and \code{end}
|
||||
\code{declare} \code{target} directives are used to indicate that global variables
|
||||
@ -62,13 +65,13 @@ is then used to manage the consistency of the variables \plc{p}, \plc{v1}, and \
|
||||
data environment of the encountering host device task and the implicit device data
|
||||
environment of the default target device.
|
||||
|
||||
\cexample{declare_target}{3c}
|
||||
\cexample{declare_target}{3}
|
||||
|
||||
The Fortran version of the above C code uses a different syntax. Fortran modules
|
||||
use a list syntax on the \code{declare} \code{target} directive to declare
|
||||
mapped variables.
|
||||
|
||||
\fexample{declare_target}{3f}
|
||||
\ffreeexample{declare_target}{3}
|
||||
|
||||
The following example also indicates that the function \code{Pfun()} is available on the
|
||||
target device, as well as the variable \plc{Q}, which is mapped to the implicit device
|
||||
@ -81,7 +84,7 @@ In the following example, the function and variable declarations appear between
|
||||
the \code{declare} \code{target} and \code{end} \code{declare} \code{target}
|
||||
directives.
|
||||
|
||||
\cexample{declare_target}{4c}
|
||||
\cexample{declare_target}{4}
|
||||
|
||||
The Fortran version of the above C code uses a different syntax. In Fortran modules
|
||||
a list syntax on the \code{declare} \code{target} directive is used to declare
|
||||
@ -90,9 +93,10 @@ separated list. When the \code{declare} \code{target} directive is used to
|
||||
declare just the procedure, the procedure name need not be listed -- it is implicitly
|
||||
assumed, as illustrated in the \code{Pfun()} function.
|
||||
|
||||
\fexample{declare_target}{4f}
|
||||
\ffreeexample{declare_target}{4}
|
||||
|
||||
\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} with \code{declare} \code{simd}}
|
||||
\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} with \code{declare} \code{simd}}
|
||||
\label{subsec:declare_target_simd}
|
||||
|
||||
The following example shows how the \code{declare} \code{target} and \code{end}
|
||||
\code{declare} \code{target} directives are used to indicate that a function
|
||||
@ -100,7 +104,7 @@ is available on a target device. The \code{declare} \code{simd} directive indica
|
||||
that there is a SIMD version of the function \code{P()} that is available on the target
|
||||
device as well as one that is available on the host device.
|
||||
|
||||
\cexample{declare_target}{5c}
|
||||
\cexample{declare_target}{5}
|
||||
|
||||
The Fortran version of the above C code uses a different syntax. Fortran modules
|
||||
use a list syntax of the \code{declare} \code{target} declaration for the mapping.
|
||||
@ -109,5 +113,30 @@ The function declaration does not use a list and implicitly assumes the function
|
||||
name. In this Fortran example row and column indices are reversed relative to the
|
||||
C/C++ example, as is usual for codes optimized for memory access.
|
||||
|
||||
\fexample{declare_target}{5f}
|
||||
\ffreeexample{declare_target}{5}
|
||||
|
||||
|
||||
\subsection{\code{declare}~\code{target} Directive with \code{link} Clause}
|
||||
\label{subsec:declare_target_link}
|
||||
|
||||
In the OpenMP 4.5 standard the \code{declare}~\code{target} directive was extended to allow static
|
||||
data to be mapped, \emph{when needed}, through a \code{link} clause.
|
||||
|
||||
Data storage for items listed in the \code{link} clause becomes available on the device
|
||||
when it is mapped implicitly or explicitly in a \code{map} clause, and it persists for the scope of
|
||||
the mapping (as specified by a \code{target} construct,
|
||||
a \code{target}~\code{data} construct, or
|
||||
\code{target}~\code{enter/exit}~\code{data} constructs).
|
||||
|
||||
Tip: When the global data items will not all fit on a device and are not needed
|
||||
simultaneously, use the \code{link} clause and map the data only when it is needed.
|
||||
|
||||
The following C and Fortran examples show two sets of data (single precision and double precision)
|
||||
that are global on the host for the entire execution on the host, but are only used
|
||||
globally on the device for part of the program execution. The single precision data
|
||||
are allocated and persist only for the first \code{target} region. Similarly, the
|
||||
double precision data are in scope on the device only for the second \code{target} region.
|
||||
|
||||
\cexample{declare_target}{6}
|
||||
\ffreeexample{declare_target}{6}
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{default(none)} Clause}
|
||||
\label{chap:default_none}
|
||||
\section{The \code{default(none)} Clause}
|
||||
\label{sec:default_none}
|
||||
|
||||
The following example distinguishes the variables that are affected by the \code{default(none)}
|
||||
clause from those that are not.
|
||||
@ -11,9 +11,9 @@ are no longer predetermined shared. Thus, these variables (variable \plc{c} in
|
||||
need to be explicitly listed
|
||||
in data-sharing attribute clauses when the \code{default(none)} clause is specified.
|
||||
|
||||
\cnexample{default_none}{1c}
|
||||
\cnexample{default_none}{1}
|
||||
\ccppspecificend
|
||||
|
||||
\fexample{default_none}{1f}
|
||||
\fexample{default_none}{1}
|
||||
|
||||
|
||||
|
@ -1,35 +1,57 @@
|
||||
\pagebreak
|
||||
\chapter{Device Routines}
|
||||
\label{chap:device}
|
||||
\section{Device Routines}
|
||||
\label{sec:device}
|
||||
|
||||
\section{\code{omp\_is\_initial\_device} Routine}
|
||||
\subsection{\code{omp\_is\_initial\_device} Routine}
|
||||
\label{subsec:device_is_initial}
|
||||
|
||||
The following example shows how the \code{omp\_is\_initial\_device} runtime library routine
|
||||
can be used to query if a code is executing on the initial host device or on a
|
||||
target device. The example then sets the number of threads in the \code{parallel}
|
||||
region based on where the code is executing.
|
||||
|
||||
\cexample{device}{1c}
|
||||
\cexample{device}{1}
|
||||
|
||||
\fexample{device}{1f}
|
||||
\ffreeexample{device}{1}
|
||||
|
||||
\section{\code{omp\_get\_num\_devices} Routine}
|
||||
\subsection{\code{omp\_get\_num\_devices} Routine}
|
||||
\label{subsec:device_num_devices}
|
||||
|
||||
The following example shows how the \code{omp\_get\_num\_devices} runtime library routine
|
||||
can be used to determine the number of devices.
|
||||
|
||||
\cexample{device}{2c}
|
||||
\cexample{device}{2}
|
||||
|
||||
\fexample{device}{2f}
|
||||
\ffreeexample{device}{2}
|
||||
|
||||
\section{\code{omp\_set\_default\_device} and \\
|
||||
\subsection{\code{omp\_set\_default\_device} and \\
|
||||
\code{omp\_get\_default\_device} Routines}
|
||||
\label{subsec:device_is_set_get_default}
|
||||
|
||||
The following example shows how the \code{omp\_set\_default\_device} and \code{omp\_get\_default\_device}
|
||||
runtime library routines can be used to set the default device and determine the
|
||||
default device, respectively.
|
||||
|
||||
\cexample{device}{3c}
|
||||
\cexample{device}{3}
|
||||
|
||||
\fexample{device}{3f}
|
||||
\ffreeexample{device}{3}
|
||||
|
||||
|
||||
\subsection{Target Memory and Device Pointers Routines}
|
||||
\label{subsec:target_mem_and_device_ptrs}
|
||||
|
||||
The following example shows how to create space on a device, transfer data
|
||||
to and from that space, and free the space, using API calls. The API calls
|
||||
directly execute allocation, copy and free operations on the device, without invoking
|
||||
any mapping through a \code{target} directive. The \code{omp\_target\_alloc} routine allocates space
|
||||
and returns a device pointer for referencing the space in the \code{omp\_target\_memcpy}
|
||||
API routine on the host. The \code{omp\_target\_free} routine frees the space on the device.
|
||||
|
||||
The example also illustrates how to access that space
|
||||
in a \code{target} region by exposing the device pointer in an \code{is\_device\_ptr} clause.
|
||||
|
||||
The example creates an array of cosine values on the default device, to be used
|
||||
on the host device. The function fails if a default device is not available.
|
||||
|
||||
\cexample{device}{4}
|
||||
|
||||
|
68
Examples_doacross.tex
Normal file
@ -0,0 +1,68 @@
|
||||
\pagebreak
|
||||
\section{Doacross Loop Nest}
|
||||
\label{sec:doacross}
|
||||
|
||||
An \code{ordered} clause can be used on a loop construct with an integer
|
||||
parameter argument to define the number of associated loops within
|
||||
a \plc{doacross loop nest} where cross-iteration dependences exist.
|
||||
A \code{depend} clause on an \code{ordered} construct within an ordered
|
||||
loop describes the dependences of the \plc{doacross} loops.
|
||||
|
||||
In the code below, the \code{depend(sink:i-1)} clause defines an \plc{i-1}
|
||||
to \plc{i} cross-iteration dependence that specifies a wait point for
|
||||
the completion of computation from iteration \plc{i-1} before proceeding
|
||||
to the subsequent statements. The \code{depend(source)} clause indicates
|
||||
the completion of computation from the current iteration (\plc{i})
|
||||
to satisfy the cross-iteration dependence that arises from the iteration.
|
||||
For this example the same sequential ordering could have been achieved
|
||||
with an \code{ordered} clause without a parameter, on the loop directive,
|
||||
and a single \code{ordered} directive without the \code{depend} clause
|
||||
specified for the statement executing the \plc{bar} function.
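
The sink and source points can be sketched with a running sum, where iteration \plc{i} needs the result of iteration \plc{i-1}. The function \code{doacross\_prefix\_sum} and the fixed array bound of 256 are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper: a[i] = a[i-1] + 2*i, computed as a doacross loop.
   Returns the last element, which equals n*(n-1). */
int doacross_prefix_sum(int n)
{
    assert(n > 0 && n <= 256);
    int a[256];
    a[0] = 0;

    #pragma omp parallel for ordered(1)
    for (int i = 1; i < n; i++) {
        int local = 2 * i;               /* independent work, may overlap */

        /* Wait point: iteration i-1 must have reached its source point. */
        #pragma omp ordered depend(sink: i-1)
        a[i] = a[i-1] + local;

        /* Signal that this iteration's result is now available. */
        #pragma omp ordered depend(source)
    }
    return a[n-1];
}
```

Only the statement between the sink and source points is serialized; the computation of \code{local} before the sink point can proceed concurrently across iterations.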
|
||||
|
||||
\cexample{doacross}{1}
|
||||
|
||||
\ffreeexample{doacross}{1}
|
||||
|
||||
The following code is similar to the previous example but with
|
||||
\plc{doacross loop nest} extended to two nested loops, \plc{i} and \plc{j},
|
||||
as specified by the \code{ordered(2)} clause on the loop directive.
|
||||
In the C/C++ code, the \plc{i} and \plc{j} loops are the first and
|
||||
second associated loops, respectively, whereas
|
||||
in the Fortran code, the \plc{j} and \plc{i} loops are the first and
|
||||
second associated loops, respectively.
|
||||
The \code{depend(sink:i-1,j)} and \code{depend(sink:i,j-1)} clauses in
|
||||
the C/C++ code define cross-iteration dependences in two dimensions from
|
||||
iterations (\plc{i-1, j}) and (\plc{i, j-1}) to iteration (\plc{i, j}).
|
||||
Likewise, the \code{depend(sink:j-1,i)} and \code{depend(sink:j,i-1)} clauses
|
||||
in the Fortran code define cross-iteration dependences from iterations
|
||||
(\plc{j-1, i}) and (\plc{j, i-1}) to iteration (\plc{j, i}).
|
||||
|
||||
\cexample{doacross}{2}
|
||||
|
||||
\ffreeexample{doacross}{2}
|
||||
|
||||
|
||||
The following example shows the incorrect use of the \code{ordered}
|
||||
directive with a \code{depend} clause. There are two issues with the code.
|
||||
The first issue is a missing \code{ordered}~\code{depend(source)} directive,
|
||||
which could cause a deadlock.
|
||||
The second issue is that the \code{depend(sink:i+1,j)} and \code{depend(sink:i,j+1)}
|
||||
clauses define dependences on lexicographically later
|
||||
source iterations (\plc{i+1, j}) and (\plc{i, j+1}), which could cause
|
||||
a deadlock as well since they may not start to execute until the current iteration completes.
|
||||
|
||||
\cexample{doacross}{3}
|
||||
|
||||
\ffreeexample{doacross}{3}
|
||||
|
||||
|
||||
The following example illustrates the use of the \code{collapse} clause for
|
||||
a \plc{doacross loop nest}. The \plc{i} and \plc{j} loops are the associated
|
||||
loops for the collapsed loop as well as for the \plc{doacross loop nest}.
|
||||
The example also shows a compliant usage of the dependence source
|
||||
directive placed before the corresponding sink directive.
|
||||
Checking the completion of computation from previous iterations at the sink point can occur after the source statement.
|
||||
|
||||
\cexample{doacross}{4}
|
||||
|
||||
\ffreeexample{doacross}{4}
|
@ -1,12 +1,12 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{flush} Construct without a List}
|
||||
\label{chap:flush_nolist}
|
||||
\section{The \code{flush} Construct without a List}
|
||||
\label{sec:flush_nolist}
|
||||
|
||||
The following example distinguishes the shared variables affected by a \code{flush}
|
||||
construct with no list from the shared objects that are not affected:
|
||||
|
||||
\cexample{flush_nolist}{1c}
|
||||
\cexample{flush_nolist}{1}
|
||||
|
||||
\fexample{flush_nolist}{1f}
|
||||
\fexample{flush_nolist}{1}
|
||||
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{Fortran Restrictions on the \code{do} Construct}
|
||||
\label{chap:fort_do}
|
||||
\section{Fortran Restrictions on the \code{do} Construct}
|
||||
\label{sec:fort_do}
|
||||
\fortranspecificstart
|
||||
|
||||
If an \code{end do} directive follows a \plc{do-construct} in which several
|
||||
@ -8,12 +8,12 @@ If an \code{end do} directive follows a \plc{do-construct} in which several
|
||||
directive can only be specified for the outermost of these \code{DO} statements.
|
||||
The following example contains correct usages of loop constructs:
|
||||
|
||||
\fnexample{fort_do}{1f}
|
||||
\fnexample{fort_do}{1}
|
||||
|
||||
The following example is non-conforming because the matching \code{do} directive
|
||||
for the \code{end do} does not precede the outermost loop:
|
||||
|
||||
\fnexample{fort_do}{2f}
|
||||
\fnexample{fort_do}{2}
|
||||
\fortranspecificend
|
||||
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{Fortran Private Loop Iteration Variables}
|
||||
\label{chap:fort_loopvar}
|
||||
\section{Fortran Private Loop Iteration Variables}
|
||||
\label{sec:fort_loopvar}
|
||||
\fortranspecificstart
|
||||
|
||||
In general, loop iteration variables will be private when used in the \plc{do-loop}
|
||||
@ -10,12 +10,12 @@ the OpenMP 4.0 specification). In the following example of a sequential
|
||||
loop in a \code{parallel} construct the loop iteration variable \plc{I} will
|
||||
be private.
|
||||
|
||||
\fnexample{fort_loopvar}{1f}
|
||||
\ffreenexample{fort_loopvar}{1}
|
||||
|
||||
In exceptional cases, loop iteration variables can be made shared, as in the following
|
||||
example:
|
||||
|
||||
\fnexample{fort_loopvar}{2f}
|
||||
\ffreenexample{fort_loopvar}{2}
|
||||
|
||||
Note, however, that the use of shared loop iteration variables can easily lead to
|
||||
race conditions.
|
||||
|
@ -1,7 +1,7 @@
|
||||
\pagebreak
|
||||
\chapter{Race Conditions Caused by Implied Copies of Shared Variables in Fortran}
|
||||
\section{Race Conditions Caused by Implied Copies of Shared Variables in Fortran}
|
||||
\fortranspecificstart
|
||||
\label{chap:fort_race}
|
||||
\label{sec:fort_race}
|
||||
|
||||
The following example contains a race condition, because the shared variable, which
|
||||
is an array section, is passed as an actual argument to a routine that has an assumed-size
|
||||
@ -10,7 +10,7 @@ may cause the compiler to copy the argument into a temporary location prior to
|
||||
the call and copy from the temporary location into the original variable when the
|
||||
subroutine returns. This copying would cause races in the \code{parallel} region.
|
||||
|
||||
\fnexample{fort_race}{1f}
|
||||
\ffreenexample{fort_race}{1}
|
||||
\fortranspecificend
|
||||
|
||||
|
||||
|
@ -1,23 +1,23 @@
|
||||
\pagebreak
|
||||
\chapter{Fortran Restrictions on Storage Association with the \code{private} Clause}
|
||||
\section{Fortran Restrictions on Storage Association with the \code{private} Clause}
|
||||
\fortranspecificstart
|
||||
\label{chap:fort_sa_private}
|
||||
\label{sec:fort_sa_private}
|
||||
|
||||
The following non-conforming examples illustrate the implications of the \code{private}
|
||||
clause rules with regard to storage association.
|
||||
|
||||
\fnexample{fort_sa_private}{1f}
|
||||
\fnexample{fort_sa_private}{1}
|
||||
|
||||
\fnexample{fort_sa_private}{2f}
|
||||
\fnexample{fort_sa_private}{2}
|
||||
|
||||
\fnexample{fort_sa_private}{3}
|
||||
% blue line floater at top of this page for "Fortran, cont."
|
||||
\begin{figure}[t!]
|
||||
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
|
||||
\end{figure}
|
||||
|
||||
\fnexample{fort_sa_private}{3f}
|
||||
\fnexample{fort_sa_private}{4}
|
||||
|
||||
\fnexample{fort_sa_private}{4f}
|
||||
|
||||
\fnexample{fort_sa_private}{5f}
|
||||
\fnexample{fort_sa_private}{5}
|
||||
\fortranspecificend
|
||||
|
||||
|
@ -1,7 +1,7 @@
|
||||
\pagebreak
|
||||
\chapter{Fortran Restrictions on \code{shared} and \code{private} Clauses with Common Blocks}
|
||||
\section{Fortran Restrictions on \code{shared} and \code{private} Clauses with Common Blocks}
|
||||
\fortranspecificstart
|
||||
\label{chap:fort_sp_common}
|
||||
\label{sec:fort_sp_common}
|
||||
|
||||
When a named common block is specified in a \code{private}, \code{firstprivate},
|
||||
or \code{lastprivate} clause of a construct, none of its members may be declared
|
||||
@ -10,11 +10,11 @@ illustrate this point.
|
||||
|
||||
The following example is conforming:
|
||||
|
||||
\fnexample{fort_sp_common}{1f}
|
||||
\fnexample{fort_sp_common}{1}
|
||||
|
||||
The following example is also conforming:
|
||||
|
||||
\fnexample{fort_sp_common}{2f}
|
||||
\fnexample{fort_sp_common}{2}
|
||||
% blue line floater at top of this page for "Fortran, cont."
|
||||
\begin{figure}[t!]
|
||||
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
|
||||
@ -22,17 +22,17 @@ The following example is also conforming:
|
||||
|
||||
The following example is conforming:
|
||||
|
||||
\fnexample{fort_sp_common}{3f}
|
||||
\fnexample{fort_sp_common}{3}
|
||||
|
||||
The following example is non-conforming because \code{x} is a constituent element
|
||||
of \code{c}:
|
||||
|
||||
\fnexample{fort_sp_common}{4f}
|
||||
\fnexample{fort_sp_common}{4}
|
||||
|
||||
The following example is non-conforming because a common block may not be declared
|
||||
both shared and private:
|
||||
|
||||
\fnexample{fort_sp_common}{5f}
|
||||
\fnexample{fort_sp_common}{5}
|
||||
\fortranspecificend
|
||||
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{firstprivate} Clause and the \code{sections} Construct}
|
||||
\label{chap:fpriv_sections}
|
||||
\section{The \code{firstprivate} Clause and the \code{sections} Construct}
|
||||
\label{sec:fpriv_sections}
|
||||
|
||||
In the following example of the \code{sections} construct the \code{firstprivate}
|
||||
clause is used to initialize the private copy of \code{section\_count} of each
|
||||
@ -11,8 +11,8 @@ thread executes the two sections, one section will print the value 1 and the oth
|
||||
will print the value 2. Since the order of execution of the two sections in this
|
||||
case is unspecified, it is unspecified which section prints which value.
|
||||
|
||||
\cexample{fpriv_sections}{1c}
|
||||
\cexample{fpriv_sections}{1}
|
||||
|
||||
\fexample{fpriv_sections}{1f}
|
||||
\ffreeexample{fpriv_sections}{1}
|
||||
|
||||
|
||||
|
@ -1,21 +1,21 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{omp\_get\_num\_threads} Routine}
|
||||
\label{chap:get_nthrs}
|
||||
\section{The \code{omp\_get\_num\_threads} Routine}
|
||||
\label{sec:get_nthrs}
|
||||
|
||||
In the following example, the \code{omp\_get\_num\_threads} call returns 1 in
|
||||
the sequential part of the code, so \code{np} will always be equal to 1. To determine
|
||||
the number of threads that will be deployed for the \code{parallel} region, the
|
||||
call should be inside the \code{parallel} region.
|
||||
|
||||
\cexample{get_nthrs}{1c}
|
||||
\cexample{get_nthrs}{1}
|
||||
|
||||
\fexample{get_nthrs}{1f}
|
||||
\fexample{get_nthrs}{1}
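The difference between the two call sites can be sketched as follows. The function names are hypothetical, and the stub in the \code{\#else} branch is an assumption added only so that the sketch also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stub (assumption for this sketch): serial build, one thread. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Called from sequential code: the team consists of one thread. */
int team_size_outside_parallel(void)
{
    return omp_get_num_threads();
}

/* Called inside a parallel region: reports the actual team size. */
int team_size_inside_parallel(void)
{
    int np = 0;

    #pragma omp parallel
    {
        /* One thread records the team size; the implicit barrier at the
         * end of single makes the value visible after the region. */
        #pragma omp single
        np = omp_get_num_threads();
    }
    return np;
}
```

The first function always returns 1. The value returned by the second depends on the implementation and the runtime settings.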
|
||||
|
||||
The following example shows how to rewrite this program without including a query
|
||||
for the number of threads:
|
||||
|
||||
\cexample{get_nthrs}{2c}
|
||||
\cexample{get_nthrs}{2}
|
||||
|
||||
\fexample{get_nthrs}{2f}
|
||||
\fexample{get_nthrs}{2}
|
||||
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{Internal Control Variables (ICVs)}
|
||||
\label{chap:icv}
|
||||
\section{Internal Control Variables (ICVs)}
|
||||
\label{sec:icv}
|
||||
|
||||
According to Section 2.3 of the OpenMP 4.0 specification, an OpenMP implementation must act as if there are ICVs that control
|
||||
the behavior of the program. This example illustrates two ICVs, \plc{nthreads-var}
|
||||
@ -50,7 +50,7 @@ one of the threads in the team. Since we have a total of two inner \code{paralle
|
||||
regions, the print statement will be executed twice -- once per inner \code{parallel}
|
||||
region.
|
||||
|
||||
\cexample{icv}{1c}
|
||||
\cexample{icv}{1}
|
||||
|
||||
\fexample{icv}{1f}
|
||||
\fexample{icv}{1}
|
||||
|
||||
|
@ -1,11 +1,10 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{omp\_init\_lock} Routine}
|
||||
\label{chap:init_lock}
|
||||
\subsection{The \code{omp\_init\_lock} Routine}
|
||||
\label{subsec:init_lock}
|
||||
|
||||
The following example demonstrates how to initialize an array of locks in a \code{parallel}
|
||||
region by using \code{omp\_init\_lock}.
|
||||
|
||||
\cexample{init_lock}{1c}
|
||||
\cppexample{init_lock}{1}
|
||||
|
||||
\fexample{init_lock}{1f}
|
||||
\fexample{init_lock}{1}
|
||||
|
||||
|
10
Examples_init_lock_with_hint.tex
Normal file
@ -0,0 +1,10 @@
|
||||
%\pagebreak
|
||||
\subsection{The \code{omp\_init\_lock\_with\_hint} Routine}
|
||||
\label{subsec:init_lock_with_hint}
|
||||
|
||||
The following example demonstrates how to initialize an array of locks in a \code{parallel} region by using \code{omp\_init\_lock\_with\_hint}.
|
||||
Note that hints are combined with the \code{|} or \code{+} operator in C/C++ and the \code{+} operator in Fortran.
|
||||
|
||||
\cppexample{init_lock_with_hint}{1}
|
||||
|
||||
\fexample{init_lock_with_hint}{1}
|
@ -1,14 +1,14 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{lastprivate} Clause}
|
||||
\label{chap:lastprivate}
|
||||
\section{The \code{lastprivate} Clause}
|
||||
\label{sec:lastprivate}
|
||||
|
||||
Correct execution sometimes depends on the value that the last iteration of a loop
|
||||
assigns to a variable. Such programs must list all such variables in a \code{lastprivate}
|
||||
clause so that the values of the variables are the same as when the loop is executed
|
||||
sequentially.
|
||||
|
||||
\cexample{lastprivate}{1c}
|
||||
\cexample{lastprivate}{1}
|
||||
|
||||
\fexample{lastprivate}{1f}
|
||||
\fexample{lastprivate}{1}
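A minimal sketch of this behavior, assuming a hypothetical function \code{last\_square} and at least one loop iteration (\code{n >= 1}):

```c
/* Hypothetical helper: returns the value assigned to x by the
 * sequentially last iteration, i.e. (n-1)*(n-1). Requires n >= 1. */
int last_square(int n)
{
    int i;
    int x = -1;

    #pragma omp parallel for lastprivate(x)
    for (i = 0; i < n; i++)
        x = i * i;

    /* x holds the value from iteration i == n-1, as in a serial run. */
    return x;
}
```

Without the \code{lastprivate} clause, \code{x} would be shared and the final value would depend on which thread wrote last.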
|
||||
|
||||
|
||||
|
13
Examples_linear_in_loop.tex
Normal file
@ -0,0 +1,13 @@
|
||||
\section{\code{linear} Clause in Loop Constructs}
|
||||
\label{sec:linear_in_loop}
|
||||
|
||||
The following example shows the use of the \code{linear} clause in a loop
|
||||
construct to allow the proper parallelization of a loop that contains
|
||||
an induction variable (\plc{j}). At the end of the execution of
|
||||
the loop construct, the original variable \plc{j} is updated with
|
||||
the value \plc{N/2} from the last iteration of the loop.
|
||||
|
||||
\cexample{linear_in_loop}{1}
|
||||
|
||||
\ffreeexample{linear_in_loop}{1}
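The pattern described above can be sketched as follows. The function name \code{pack\_every\_other} is hypothetical, and the array sizes are fixed by the macro \code{N} purely for illustration.

```c
#define N 100

/* Hypothetical helper: stores 2*a[i] for every second element of a
 * into consecutive slots of b and returns the final value of j. */
int pack_every_other(const float *a, float *b)
{
    int i;
    int j = 0;

    #pragma omp parallel
    {
        /* j advances by 1 per logical iteration, so each iteration
         * can compute its own starting value of j. */
        #pragma omp for linear(j:1)
        for (i = 0; i < N; i += 2) {
            b[j] = a[i] * 2.0f;
            j++;
        }
    }
    return j;   /* N/2 after the loop construct */
}
```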
|
||||
|
@ -1,6 +1,5 @@
|
||||
\pagebreak
|
||||
\chapter{Ownership of Locks}
|
||||
\label{chap:lock_owner}
|
||||
\subsection{Ownership of Locks}
|
||||
\label{subsec:lock_owner}
|
||||
|
||||
Ownership of locks has changed since OpenMP 2.5. In OpenMP 2.5, locks are owned
|
||||
by threads; so a lock released by the \code{omp\_unset\_lock} routine must be
|
||||
@ -16,8 +15,8 @@ the same). However, it is not conforming beginning with OpenMP 3.0, because the
|
||||
region that releases the lock \code{lck} is different from the task region that
|
||||
acquires the lock.
|
||||
|
||||
\cexample{lock_owner}{1c}
|
||||
\cexample{lock_owner}{1}
|
||||
|
||||
\fexample{lock_owner}{1f}
|
||||
\fexample{lock_owner}{1}
|
||||
|
||||
|
||||
|
5
Examples_locks.tex
Normal file
@ -0,0 +1,5 @@
|
||||
\pagebreak
|
||||
\section{Lock Routines}
|
||||
\label{sec:locks}
|
||||
|
||||
This section is about the use of lock routines for synchronization.
|
@ -1,13 +1,13 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{master} Construct}
|
||||
\label{chap:master}
|
||||
\section{The \code{master} Construct}
|
||||
\label{sec:master}
|
||||
|
||||
The following example demonstrates the \code{master} construct. In the example, the master thread
|
||||
keeps track of how many iterations have been executed and prints out a progress
|
||||
report. The other threads skip the master region without waiting.
|
||||
|
||||
\cexample{master}{1c}
|
||||
\cexample{master}{1}
|
||||
|
||||
\fexample{master}{1f}
|
||||
\fexample{master}{1}
|
||||
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{The OpenMP Memory Model}
|
||||
\label{chap:mem_model}
|
||||
\section{The OpenMP Memory Model}
|
||||
\label{sec:mem_model}
|
||||
|
||||
In the following example, at Print 1, the value of \plc{x} could be either 2
|
||||
or 5, depending on the timing of the threads, and the implementation of the assignment
|
||||
@ -14,25 +14,25 @@ The barrier after Print 1 contains implicit flushes on all threads, as well as
|
||||
a thread synchronization, so the programmer is guaranteed that the value 5 will
|
||||
be printed by both Print 2 and Print 3.
|
||||
|
||||
\cexample{mem_model}{1c}
|
||||
\cexample{mem_model}{1}
|
||||
|
||||
\fexample{mem_model}{1f}
|
||||
\ffreeexample{mem_model}{1}
|
||||
|
||||
The following example demonstrates why synchronization is difficult to perform
|
||||
correctly through variables. The value of flag is undefined in both prints on thread
|
||||
1 and the value of data is only well-defined in the second print.
|
||||
|
||||
\cexample{mem_model}{2c}
|
||||
\cexample{mem_model}{2}
|
||||
|
||||
\fexample{mem_model}{2f}
|
||||
\fexample{mem_model}{2}
|
||||
|
||||
The next example demonstrates why synchronization is difficult to perform correctly
|
||||
through variables. Because the \plc{write}(1)-\plc{flush}(1)-\plc{flush}(2)-\plc{read}(2)
|
||||
sequence cannot be guaranteed in the example, the statements on thread 0 and thread
|
||||
1 may execute in either order.
|
||||
|
||||
\cexample{mem_model}{3c}
|
||||
\cexample{mem_model}{3}
|
||||
|
||||
\fexample{mem_model}{3f}
|
||||
\fexample{mem_model}{3}
|
||||
|
||||
|
||||
|
@ -1,11 +1,10 @@
|
||||
\pagebreak
|
||||
\chapter{Nestable Lock Routines}
|
||||
\label{chap:nestable_lock}
|
||||
\subsection{Nestable Lock Routines}
|
||||
\label{subsec:nestable_lock}
|
||||
|
||||
The following example demonstrates how a nestable lock can be used to synchronize
|
||||
updates both to a whole structure and to one of its members.
|
||||
|
||||
\cexample{nestable_lock}{1c}
|
||||
\cexample{nestable_lock}{1}
|
||||
|
||||
\fexample{nestable_lock}{1f}
|
||||
\fexample{nestable_lock}{1}
|
||||
|
||||
|
@ -1,18 +1,18 @@
|
||||
\pagebreak
|
||||
\chapter{Nested Loop Constructs}
|
||||
\label{chap:nested_loop}
|
||||
\section{Nested Loop Constructs}
|
||||
\label{sec:nested_loop}
|
||||
|
||||
The following example of loop construct nesting is conforming because the inner
|
||||
and outer loop regions bind to different \code{parallel} regions:
|
||||
|
||||
\cexample{nested_loop}{1c}
|
||||
\cexample{nested_loop}{1}
|
||||
|
||||
\fexample{nested_loop}{1f}
|
||||
\fexample{nested_loop}{1}
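A conforming nesting of this kind might be sketched as below. The function name \code{fill\_grid} and the row-major \code{grid} array are assumptions. The inner loop construct binds to the inner \code{parallel} region, not to the outer one.

```c
/* Hypothetical helper: grid is n x m, row-major; grid[i][j] = i + j. */
void fill_grid(int n, int m, int *grid)
{
    int i, j;

    #pragma omp parallel default(shared)
    {
        #pragma omp for
        for (i = 0; i < n; i++) {
            /* A new parallel region: the loop construct below binds
             * to this region, so the two loop regions are not
             * closely nested. */
            #pragma omp parallel shared(i, n, m, grid)
            {
                #pragma omp for
                for (j = 0; j < m; j++)
                    grid[i * m + j] = i + j;
            }
        }
    }
}
```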
|
||||
|
||||
The following variation of the preceding example is also conforming:
|
||||
|
||||
\cexample{nested_loop}{2c}
|
||||
\cexample{nested_loop}{2}
|
||||
|
||||
\fexample{nested_loop}{2f}
|
||||
\fexample{nested_loop}{2}
|
||||
|
||||
|
||||
|
@ -1,52 +1,52 @@
|
||||
\pagebreak
|
||||
\chapter{Restrictions on Nesting of Regions}
|
||||
\label{chap:nesting_restrict}
|
||||
\section{Restrictions on Nesting of Regions}
|
||||
\label{sec:nesting_restrict}
|
||||
|
||||
The examples in this section illustrate the region nesting rules.
|
||||
|
||||
The following example is non-conforming because the inner and outer loop regions
|
||||
are closely nested:
|
||||
|
||||
\cexample{nesting_restrict}{1c}
|
||||
\cexample{nesting_restrict}{1}
|
||||
|
||||
\fexample{nesting_restrict}{1f}
|
||||
\fexample{nesting_restrict}{1}
|
||||
|
||||
The following orphaned version of the preceding example is also non-conforming:
|
||||
|
||||
\cexample{nesting_restrict}{2c}
|
||||
\cexample{nesting_restrict}{2}
|
||||
|
||||
\fexample{nesting_restrict}{2f}
|
||||
\fexample{nesting_restrict}{2}
|
||||
|
||||
The following example is non-conforming because the loop and \code{single} regions
|
||||
are closely nested:
|
||||
|
||||
\cexample{nesting_restrict}{3c}
|
||||
\cexample{nesting_restrict}{3}
|
||||
|
||||
\fexample{nesting_restrict}{3f}
|
||||
\fexample{nesting_restrict}{3}
|
||||
|
||||
The following example is non-conforming because a \code{barrier} region cannot
|
||||
be closely nested inside a loop region:
|
||||
|
||||
\cexample{nesting_restrict}{4c}
|
||||
\cexample{nesting_restrict}{4}
|
||||
|
||||
\fexample{nesting_restrict}{4f}
|
||||
\fexample{nesting_restrict}{4}
|
||||
|
||||
The following example is non-conforming because the \code{barrier} region cannot
|
||||
be closely nested inside the \code{critical} region. If this were permitted,
|
||||
it would result in deadlock due to the fact that only one thread at a time can
|
||||
enter the \code{critical} region:
|
||||
|
||||
\cexample{nesting_restrict}{5c}
|
||||
\cexample{nesting_restrict}{5}
|
||||
|
||||
\fexample{nesting_restrict}{5f}
|
||||
\fexample{nesting_restrict}{5}
|
||||
|
||||
The following example is non-conforming because the \code{barrier} region cannot
|
||||
be closely nested inside the \code{single} region. If this were permitted, it
|
||||
would result in deadlock due to the fact that only one thread executes the \code{single}
|
||||
region:
|
||||
|
||||
\cexample{nesting_restrict}{6c}
|
||||
\cexample{nesting_restrict}{6}
|
||||
|
||||
\fexample{nesting_restrict}{6f}
|
||||
\fexample{nesting_restrict}{6}
|
||||
|
||||
|
||||
|
@ -1,14 +1,14 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{nowait} Clause}
|
||||
\label{chap:nowait}
|
||||
\section{The \code{nowait} Clause}
|
||||
\label{sec:nowait}
|
||||
|
||||
If there are multiple independent loops within a \code{parallel} region, you
|
||||
can use the \code{nowait} clause to avoid the implied barrier at the end of the
|
||||
loop construct, as follows:
|
||||
|
||||
\cexample{nowait}{1c}
|
||||
\cexample{nowait}{1}
|
||||
|
||||
\fexample{nowait}{1f}
|
||||
\fexample{nowait}{1}
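As a minimal sketch with hypothetical names, the two loops below touch disjoint data, so a thread that finishes its share of the first loop may start the second without waiting.

```c
/* Hypothetical helper: two independent loops in one parallel region. */
void two_independent_loops(int n, int m, const float *a, float *b,
                           float *y, const float *z)
{
    int i;

    #pragma omp parallel
    {
        /* No barrier here: the second loop does not read b. */
        #pragma omp for nowait
        for (i = 1; i < n; i++)
            b[i] = (a[i] + a[i - 1]) / 2.0f;

        #pragma omp for nowait
        for (i = 0; i < m; i++)
            y[i] = z[i] * z[i];
    }   /* implicit barrier at the end of the parallel region */
}
```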
|
||||
|
||||
In the following example, static scheduling distributes the same logical iteration
|
||||
numbers to the threads that execute the three loop regions. This allows the \code{nowait}
|
||||
@ -22,7 +22,7 @@ to \code{n-1} (from \code{1} to \code{N} in the Fortran version), while the
|
||||
iteration space of the last loop is from \code{1} to \code{n} (\code{2} to
|
||||
\code{N+1} in the Fortran version).
|
||||
|
||||
\cexample{nowait}{2c}
|
||||
\cexample{nowait}{2}
|
||||
|
||||
\fexample{nowait}{2f}
|
||||
\ffreeexample{nowait}{2}
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{Interaction Between the \code{num\_threads} Clause and \code{omp\_set\_dynamic}}
|
||||
\label{chap:nthrs_dynamic}
|
||||
\section{Interaction Between the \code{num\_threads} Clause and \code{omp\_set\_dynamic}}
|
||||
\label{sec:nthrs_dynamic}
|
||||
|
||||
The following example demonstrates the \code{num\_threads} clause and the effect
|
||||
of the \\
|
||||
@ -12,17 +12,17 @@ of threads in OpenMP implementations that support it. In this case, 10 threads
|
||||
are provided. Note that in case of an error the OpenMP implementation is free to
|
||||
abort the program or to supply any number of threads available.
|
||||
|
||||
\cexample{nthrs_dynamic}{1c}
|
||||
\cexample{nthrs_dynamic}{1}
|
||||
|
||||
\fexample{nthrs_dynamic}{1f}
|
||||
\fexample{nthrs_dynamic}{1}
|
||||
|
||||
The call to the \code{omp\_set\_dynamic} routine with a non-zero argument in
|
||||
C/C++, or \code{.TRUE.} in Fortran, allows the OpenMP implementation to choose
|
||||
any number of threads between 1 and 10.
|
||||
|
||||
\cexample{nthrs_dynamic}{2c}
|
||||
\cexample{nthrs_dynamic}{2}
|
||||
|
||||
\fexample{nthrs_dynamic}{2f}
|
||||
\fexample{nthrs_dynamic}{2}
|
||||
|
||||
It is good practice to set the \plc{dyn-var} ICV explicitly by calling the \code{omp\_set\_dynamic}
|
||||
routine, as its default setting is implementation defined.
|
||||
|
@ -1,12 +1,12 @@
|
||||
\pagebreak
|
||||
\chapter{Controlling the Number of Threads on Multiple Nesting Levels}
|
||||
\label{chap:nthrs_nesting}
|
||||
\section{Controlling the Number of Threads on Multiple Nesting Levels}
|
||||
\label{sec:nthrs_nesting}
|
||||
|
||||
The following examples demonstrate how to use the \code{OMP\_NUM\_THREADS} environment
|
||||
variable to control the number of threads on multiple nesting levels:
|
||||
|
||||
\cexample{nthrs_nesting}{1c}
|
||||
\cexample{nthrs_nesting}{1}
|
||||
|
||||
\fexample{nthrs_nesting}{1f}
|
||||
\fexample{nthrs_nesting}{1}
|
||||
|
||||
|
||||
|
@ -1,28 +1,28 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{ordered} Clause and the \code{ordered} Construct}
|
||||
\label{chap:ordered}
|
||||
\section{The \code{ordered} Clause and the \code{ordered} Construct}
|
||||
\label{sec:ordered}
|
||||
|
||||
Ordered constructs are useful for sequentially ordering the output from work that
|
||||
is done in parallel. The following program prints out the indices in sequential
|
||||
order:
|
||||
|
||||
\cexample{ordered}{1c}
|
||||
\cexample{ordered}{1}
|
||||
|
||||
\fexample{ordered}{1f}
|
||||
\fexample{ordered}{1}
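The ordering guarantee can be illustrated with the following sketch, which records indices in an array instead of printing them. The function name \code{record\_in\_order} and the \code{order} buffer are hypothetical.

```c
/* Hypothetical helper: appends each loop index to order[] inside an
 * ordered region and returns how many indices were recorded. */
int record_in_order(int lb, int ub, int stride, int *order)
{
    int i;
    int count = 0;

    #pragma omp parallel for ordered schedule(dynamic)
    for (i = lb; i < ub; i += stride) {
        /* Ordered regions execute in sequential iteration order,
         * one at a time, so the shared counter is updated safely. */
        #pragma omp ordered
        {
            order[count] = i;
            count++;
        }
    }
    return count;
}
```

Even with \code{schedule(dynamic)}, the recorded sequence matches that of a serial loop.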
|
||||
|
||||
It is possible to have multiple \code{ordered} constructs within a loop region
|
||||
with the \code{ordered} clause specified. The first example is non-conforming
|
||||
because all iterations execute two \code{ordered} regions. An iteration of a
|
||||
loop must not execute more than one \code{ordered} region:
|
||||
|
||||
\cexample{ordered}{2c}
|
||||
\cexample{ordered}{2}
|
||||
|
||||
\fexample{ordered}{2f}
|
||||
\fexample{ordered}{2}
|
||||
|
||||
The following is a conforming example with more than one \code{ordered} construct.
|
||||
Each iteration will execute only one \code{ordered} region:
|
||||
|
||||
\cexample{ordered}{3c}
|
||||
\cexample{ordered}{3}
|
||||
|
||||
\fexample{ordered}{3f}
|
||||
\fexample{ordered}{3}
|
||||
|
||||
|
@ -1,12 +1,12 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{parallel} Construct}
|
||||
\label{chap:parallel}
|
||||
\section{The \code{parallel} Construct}
|
||||
\label{sec:parallel}
|
||||
|
||||
The \code{parallel} construct can be used in coarse-grain parallel programs.
|
||||
In the following example, each thread in the \code{parallel} region decides what
|
||||
part of the global array \plc{x} to work on, based on the thread number:
|
||||
|
||||
\cexample{parallel}{1c}
|
||||
\cexample{parallel}{1}
|
||||
|
||||
\fexample{parallel}{1f}
|
||||
\fexample{parallel}{1}
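A coarse-grain decomposition of this kind might be sketched as follows. The names \code{scale\_chunk} and \code{scale\_array} are hypothetical, the remainder handling is a design choice of this sketch, and the stubs in the \code{\#else} branch are assumptions so that it also builds serially.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stubs (assumption for this sketch): serial build. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Doubles count elements of x starting at index start. */
static void scale_chunk(float *x, int start, int count)
{
    int i;
    for (i = 0; i < count; i++)
        x[start + i] *= 2.0f;
}

/* Each thread picks its own contiguous block of x by thread number. */
void scale_array(float *x, int npoints)
{
    #pragma omp parallel shared(x, npoints)
    {
        int iam = omp_get_thread_num();
        int nt = omp_get_num_threads();
        int chunk = npoints / nt;
        int start = iam * chunk;

        /* The last thread also takes any leftover elements. */
        if (iam == nt - 1)
            chunk = npoints - start;

        scale_chunk(x, start, chunk);
    }
}
```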
|
||||
|
||||
|
@ -1,11 +1,12 @@
|
||||
\chapter{A Simple Parallel Loop}
|
||||
\label{chap:ploop}
|
||||
\pagebreak
|
||||
\section{A Simple Parallel Loop}
|
||||
\label{sec:ploop}
|
||||
|
||||
The following example demonstrates how to parallelize a simple loop using the parallel
|
||||
loop construct. The loop iteration variable is private by default, so it is not
|
||||
necessary to specify it explicitly in a \code{private} clause.
|
||||
|
||||
\cexample{ploop}{1c}
|
||||
\cexample{ploop}{1}
|
||||
|
||||
\fexample{ploop}{1f}
|
||||
\fexample{ploop}{1}
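A minimal sketch of such a loop, using a hypothetical function \code{neighbor\_average}. The iteration variable \code{i} is private by default and needs no clause.

```c
/* Hypothetical helper: b[i] is the mean of a[i] and a[i-1]. */
void neighbor_average(int n, const float *a, float *b)
{
    int i;

    #pragma omp parallel for
    for (i = 1; i < n; i++)     /* i is private by default */
        b[i] = (a[i] + a[i - 1]) / 2.0f;
}
```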
|
||||
|
||||
|
@ -1,11 +1,11 @@
|
||||
\pagebreak
|
||||
\chapter{Parallel Random Access Iterator Loop}
|
||||
\section{Parallel Random Access Iterator Loop}
|
||||
\cppspecificstart
|
||||
\label{chap:pra_iterator}
|
||||
\label{sec:pra_iterator}
|
||||
|
||||
The following example shows a parallel random access iterator loop.
|
||||
|
||||
\cnexample{pra_iterator}{1c}
|
||||
\cppnexample{pra_iterator}{1}
|
||||
\cppspecificend
|
||||
|
||||
|
||||
|
@ -1,31 +1,31 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{private} Clause}
|
||||
\label{chap:private}
|
||||
\section{The \code{private} Clause}
|
||||
\label{sec:private}
|
||||
|
||||
In the following example, the values of original list items \plc{i} and \plc{j}
|
||||
are retained on exit from the \code{parallel} region, while the private list
|
||||
items \plc{i} and \plc{j} are modified within the \code{parallel} construct.
|
||||
|
||||
\cexample{private}{1c}
|
||||
\cexample{private}{1}
|
||||
|
||||
\fexample{private}{1f}
|
||||
\fexample{private}{1}
|
||||
|
||||
In the following example, all uses of the variable \plc{a} within the loop construct
|
||||
in the routine \plc{f} refer to a private list item \plc{a}, while it is
|
||||
unspecified whether references to \plc{a} in the routine \plc{g} are to a
|
||||
private list item or the original list item.
|
||||
|
||||
\cexample{private}{2c}
|
||||
\cexample{private}{2}
|
||||
|
||||
\fexample{private}{2f}
|
||||
\fexample{private}{2}
|
||||
|
||||
The following example demonstrates that a list item that appears in a \code{private}
|
||||
clause in a \code{parallel} construct may also appear in a \code{private}
|
||||
clause in an enclosed worksharing construct, which results in an additional private
|
||||
copy.
|
||||
|
||||
\cexample{private}{3c}
|
||||
\cexample{private}{3}
|
||||
|
||||
\fexample{private}{3f}
|
||||
\fexample{private}{3}
|
||||
|
||||
|
||||
|
@ -1,13 +1,13 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{parallel} \code{sections} Construct}
|
||||
\label{chap:psections}
|
||||
\section{The \code{parallel} \code{sections} Construct}
|
||||
\label{sec:psections}
|
||||
|
||||
In the following example routines \code{XAXIS}, \code{YAXIS}, and \code{ZAXIS} can
|
||||
be executed concurrently. The first \code{section} directive is optional. Note
|
||||
that all \code{section} directives need to appear in the \code{parallel sections}
|
||||
construct.
|
||||
|
||||
\cexample{psections}{1c}
|
||||
\cexample{psections}{1}
|
||||
|
||||
\fexample{psections}{1f}
|
||||
\fexample{psections}{1}
|
||||
|
||||
|
@ -1,44 +1,44 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{reduction} Clause}
|
||||
\label{chap:reduction}
|
||||
\section{The \code{reduction} Clause}
|
||||
\label{sec:reduction}
|
||||
|
||||
The following example demonstrates the \code{reduction} clause; note that some
|
||||
reductions can be expressed in the loop in several ways, as shown for the \code{max}
|
||||
and \code{min} reductions below:
|
||||
|
||||
\cexample{reduction}{1c}
|
||||
\cexample{reduction}{1}
|
||||
|
||||
\fexample{reduction}{1f}
|
||||
\ffreeexample{reduction}{1}
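The sketch below illustrates the same idea with integer data. The function name \code{sum\_and\_extremes} is hypothetical, and the array is assumed to hold at least one element. The \code{max} update is written as an \code{if} statement and the \code{min} update as a conditional expression, to show two equivalent forms.

```c
/* Hypothetical helper: computes the sum, maximum and minimum of a[0..n-1].
 * Requires n >= 1. */
void sum_and_extremes(const int *a, int n, int *sum, int *amax, int *amin)
{
    int i;
    int s = 0;
    int mx = a[0];
    int mn = a[0];

    #pragma omp parallel for reduction(+:s) reduction(max:mx) reduction(min:mn)
    for (i = 0; i < n; i++) {
        s += a[i];
        if (a[i] > mx)                  /* max as an if statement */
            mx = a[i];
        mn = (a[i] < mn) ? a[i] : mn;   /* min as a conditional expression */
    }

    *sum = s;
    *amax = mx;
    *amin = mn;
}
```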
|
||||
|
||||
A common implementation of the preceding example is to treat it as if it had been
|
||||
written as follows:
|
||||
|
||||
\cexample{reduction}{2c}
|
||||
\cexample{reduction}{2}
|
||||
|
||||
\fortranspecificstart
|
||||
\fnexample{reduction}{2f}
|
||||
\ffreenexample{reduction}{2}
|
||||
|
||||
The following program is non-conforming because the reduction is on the
|
||||
\emph{intrinsic procedure name} \code{MAX} but that name has been redefined to be the variable
|
||||
named \code{MAX}.
|
||||
|
||||
\ffreenexample{reduction}{3}
|
||||
% blue line floater at top of this page for "Fortran, cont."
|
||||
\begin{figure}[t!]
|
||||
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
|
||||
\end{figure}
|
||||
|
||||
\fnexample{reduction}{3f}
|
||||
|
||||
The following conforming program performs the reduction using the
|
||||
\emph{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed
|
||||
to \code{REN}.
|
||||
|
||||
\fnexample{reduction}{4f}
|
||||
\ffreenexample{reduction}{4}
|
||||
|
||||
The following conforming program performs the reduction using
|
||||
\emph{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed
|
||||
to \code{MIN}.
|
||||
|
||||
\fnexample{reduction}{5f}
|
||||
\ffreenexample{reduction}{5}
|
||||
\fortranspecificend
|
||||
|
||||
The following example is non-conforming because the initialization (\code{a =
|
||||
@ -53,8 +53,13 @@ clause. This can be achieved by adding an explicit barrier after the assignment
|
||||
directive (which has an implied barrier), or by initializing \code{a} before
|
||||
the start of the \code{parallel} region.
|
||||
|
||||
\cexample{reduction}{3c}
|
||||
\cexample{reduction}{6}
|
||||
|
||||
\fexample{reduction}{6f}
|
||||
\fexample{reduction}{6}
|
||||
|
||||
The following example demonstrates the reduction of array \plc{a}. In C/C++ this is illustrated by the explicit use of an array section \plc{a[0:N]} in the \code{reduction} clause. The corresponding Fortran example uses array syntax supported in the base language. As of the OpenMP 4.5 specification, the explicit use of an array section in the \code{reduction} clause is not permitted in Fortran, but this oversight will be fixed in the next release of the specification.
|
||||
|
||||
|
||||
\cexample{reduction}{7}
|
||||
|
||||
\ffreeexample{reduction}{7}
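An array-section reduction in C might be sketched as follows. The function name \code{column\_sums} and the fixed sizes \code{ROWS} and \code{COLS} are assumptions for illustration, and the sketch relies on a compiler that supports OpenMP 4.5 array sections in the \code{reduction} clause.

```c
#define ROWS 4
#define COLS 3

/* Hypothetical helper: a[j] receives the sum of column j of b. */
void column_sums(int b[ROWS][COLS], int a[COLS])
{
    int i, j;
    int acc[COLS] = {0};

    /* Each thread accumulates into a private copy of acc[0:COLS];
     * the copies are combined element-wise at the end of the loop. */
    #pragma omp parallel for reduction(+:acc[0:COLS]) private(j)
    for (i = 0; i < ROWS; i++) {
        for (j = 0; j < COLS; j++)
            acc[j] += b[i][j];
    }

    for (j = 0; j < COLS; j++)
        a[j] = acc[j];
}
```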
|
||||
|
@ -1,7 +1,7 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{omp\_set\_dynamic} and \\
|
||||
\section{The \code{omp\_set\_dynamic} and \\
|
||||
\code{omp\_set\_num\_threads} Routines}
|
||||
\label{chap:set_dynamic_nthrs}
|
||||
\label{sec:set_dynamic_nthrs}
|
||||
|
||||
Some programs rely on a fixed, prespecified number of threads to execute correctly.
|
||||
Because the default setting for the dynamic adjustment of the number of threads
|
||||
@ -17,8 +17,8 @@ dynamic threads setting. The dynamic threads mechanism determines the number of
|
||||
threads to use at the start of the \code{parallel} region and keeps it constant
|
||||
for the duration of the region.
|
||||
|
||||
\cexample{set_dynamic_nthrs}{1c}
|
||||
\cexample{set_dynamic_nthrs}{1}
|
||||
|
||||
\fexample{set_dynamic_nthrs}{1f}
|
||||
\fexample{set_dynamic_nthrs}{1}
|
||||
|
||||
|
||||
|
@ -1,6 +1,5 @@
|
||||
\pagebreak
|
||||
\chapter{Simple Lock Routines}
|
||||
\label{chap:simple_lock}
|
||||
\subsection{Simple Lock Routines}
|
||||
\label{subsec:simple_lock}
|
||||
|
||||
In the following example, the lock routines cause the threads to be idle while
|
||||
waiting for entry to the first critical section, but to do other work while waiting
|
||||
@ -10,10 +9,10 @@ function does not, allowing the work in \code{skip} to be done.
|
||||
Note that the argument to the lock routines should have type \code{omp\_lock\_t},
|
||||
and that there is no need to flush it.
|
||||
|
||||
\cexample{simple_lock}{1c}
|
||||
\cexample{simple_lock}{1}
|
||||
|
||||
Note that there is no need to flush the lock variable.
|
||||
|
||||
\fexample{simple_lock}{1f}
|
||||
\fexample{simple_lock}{1}
|
||||
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{single} Construct}
|
||||
\label{chap:single}
|
||||
\section{The \code{single} Construct}
|
||||
\label{sec:single}
|
||||
|
||||
The following example demonstrates the \code{single} construct. In the example,
|
||||
only one thread prints each of the progress messages. All other threads will skip
|
||||
@ -11,8 +11,8 @@ a \code{nowait} clause can be specified, as is done in the third \code{single}
|
||||
construct in this example. The user must not make any assumptions as to which thread
|
||||
will execute a \code{single} region.
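The pattern described above can be sketched as follows. This is an illustrative variant, not the example file: the hypothetical function \code{count\_progress\_messages} counts the messages instead of printing them, so that the effect of each \code{single} region is easy to check.

```c
#include <assert.h>

/* Returns the number of progress messages recorded by the whole team. */
int count_progress_messages(void)
{
    int messages = 0;

    #pragma omp parallel shared(messages)
    {
        #pragma omp single
        messages += 1;      /* stands in for "Beginning work1" */

        /* work1 would run here, on every thread */

        #pragma omp single
        messages += 1;      /* stands in for "Finishing work1" */

        #pragma omp single nowait
        messages += 1;      /* stands in for "Finished work1, beginning work2" */

        /* work2 would run here; no thread waits at the third single */
    }

    /* One increment per single region, regardless of the team size. */
    assert(messages == 3);
    return messages;
}
```

The first two \code{single} regions end with an implied barrier, so the increments cannot overlap; the third uses \code{nowait}, and the barrier at the end of the \code{parallel} region completes its update before \code{messages} is read.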
|
||||
|
||||
\cexample{single}{1c}
|
||||
\cexample{single}{1}
|
||||
|
||||
\fexample{single}{1f}
|
||||
\fexample{single}{1}
|
||||
|
||||
|
||||
|
@ -1,31 +1,31 @@
|
||||
\pagebreak
|
||||
\chapter{Placement of \code{flush}, \code{barrier}, \code{taskwait}
|
||||
\section{Placement of \code{flush}, \code{barrier}, \code{taskwait}
|
||||
and \code{taskyield} Directives}
|
||||
\label{chap:standalone}
|
||||
\label{sec:standalone}
|
||||
|
||||
The following example is non-conforming, because the \code{flush}, \code{barrier},
|
||||
\code{taskwait}, and \code{taskyield} directives are stand-alone directives
|
||||
and cannot be the immediate substatement of an \code{if} statement.
|
||||
|
||||
\cexample{standalone}{1c}
|
||||
\cexample{standalone}{1}
|
||||
|
||||
The following example is non-conforming, because the \code{flush}, \code{barrier},
|
||||
\code{taskwait}, and \code{taskyield} directives are stand-alone directives
|
||||
and cannot be the action statement of an \code{if} statement or a labeled branch
|
||||
target.
|
||||
|
||||
\fexample{standalone}{1f}
|
||||
\ffreeexample{standalone}{1}
|
||||
|
||||
The following version of the above example is conforming because the \code{flush},
|
||||
\code{barrier}, \code{taskwait}, and \code{taskyield} directives are enclosed
|
||||
in a compound statement.
|
||||
|
||||
\cexample{standalone}{2c}
|
||||
\cexample{standalone}{2}
|
||||
|
||||
The following example is conforming because the \code{flush}, \code{barrier},
|
||||
\code{taskwait}, and \code{taskyield} directives are enclosed in an \code{if}
|
||||
construct or follow the labeled branch target.
|
||||
|
||||
\fexample{standalone}{2f}
|
||||
\ffreeexample{standalone}{2}
|
||||
|
||||
|
||||
|
@ -1,29 +1,32 @@
|
||||
\pagebreak
|
||||
\chapter{\code{target} Construct}
|
||||
\label{chap:target}
|
||||
\section{\code{target} Construct}
|
||||
\label{sec:target}
|
||||
|
||||
\section{\code{target} Construct on \code{parallel} Construct}
|
||||
\subsection{\code{target} Construct on \code{parallel} Construct}
|
||||
\label{subsec:target_parallel}
|
||||
|
||||
The following example shows how the \code{target} construct offloads a code
|
||||
region to a target device. The variables \plc{p}, \plc{v1}, \plc{v2}, and \plc{N} are implicitly mapped
|
||||
to the target device.
|
||||
|
||||
\cexample{target}{1c}
|
||||
\cexample{target}{1}
|
||||
|
||||
\fexample{target}{1f}
|
||||
\ffreeexample{target}{1}
|
||||
|
||||
\section{\code{target} Construct with \code{map} Clause}
|
||||
\subsection{\code{target} Construct with \code{map} Clause}
|
||||
\label{subsec:target_map}
|
||||
|
||||
The following example shows how the \code{target} construct offloads a code
|
||||
region to a target device. The variables \plc{p}, \plc{v1} and \plc{v2} are explicitly mapped to the
|
||||
target device using the \code{map} clause. The variable \plc{N} is implicitly mapped to
|
||||
the target device.
|
||||
|
||||
\cexample{target}{2c}
|
||||
\cexample{target}{2}
|
||||
|
||||
\fexample{target}{2f}
|
||||
\ffreeexample{target}{2}
|
||||
|
||||
\section{\code{map} Clause with \code{to}/\code{from} map-types}
|
||||
\subsection{\code{map} Clause with \code{to}/\code{from} map-types}
|
||||
\label{subsec:target_map_tofrom}
|
||||
|
||||
The following example shows how the \code{target} construct offloads a code region
|
||||
to a target device. In the \code{map} clause, the \code{to} and \code{from}
|
||||
@ -43,16 +46,17 @@ the variable \plc{p} is not initialized with the value of the corresponding vari
|
||||
on the host device, and at the end of the \code{target} region the variable \plc{p}
|
||||
is assigned to the corresponding variable on the host device.
|
||||
|
||||
\cexample{target}{3c}
|
||||
\cexample{target}{3}
|
||||
|
||||
The \code{to} and \code{from} map-types allow programmers to optimize data
|
||||
motion. Since data for the \plc{v} arrays are not returned, and data for the \plc{p} array
|
||||
are not transferred to the device, only one-half of the data is moved, compared
|
||||
to the default behavior of an implicit mapping.
|
||||
|
||||
\fexample{target}{3f}
|
||||
\ffreeexample{target}{3}
|
||||
|
||||
\section{\code{map} Clause with Array Sections}
|
||||
\subsection{\code{map} Clause with Array Sections}
|
||||
\label{subsec:target_array_section}
|
||||
|
||||
The following example shows how the \code{target} construct offloads a code region
|
||||
to a target device. In the \code{map} clause, map-types are used to optimize
|
||||
@ -60,14 +64,14 @@ the mapping of variables to the target device. Because variables \plc{p}, \plc{v
|
||||
pointers, array section notation must be used to map the arrays. The notation \code{:N}
|
||||
is equivalent to \code{0:N}.
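A minimal sketch of this mapping, assuming pointer arguments and a length parameter named \code{n} (the signature is an assumption made for illustration), might look as follows. Both spellings of the array section, \code{0:n} and \code{:n}, are used to show that they are interchangeable.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise product p[i] = v1[i] * v2[i] for i in [0, n). */
void vec_mult(float *p, const float *v1, const float *v2, int n)
{
    int i;

    assert(p != NULL && v1 != NULL && v2 != NULL && n >= 0);

    /* v1 and v2 are only read on the device and p is only written, so the
       inputs are copied to the device and only the result is copied back.
       The lengths must be given because the arguments are pointers. */
    #pragma omp target map(to: v1[0:n], v2[:n]) map(from: p[0:n])
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        p[i] = v1[i] * v2[i];
    }
}
```

If no target device is available, the region is expected to run on the host and produce the same result.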
|
||||
|
||||
\cexample{target}{4c}
|
||||
\cexample{target}{4}
|
||||
|
||||
In C, the length of the pointed-to array must be specified. In Fortran the extent
|
||||
of the array is known and the length need not be specified. A section of the array
|
||||
can be specified with the usual Fortran syntax, as shown in the following example.
|
||||
The value 1 is assumed for the lower bound for array section \plc{v2(:N)}.
|
||||
|
||||
\fexample{target}{4f}
|
||||
\ffreeexample{target}{4}
|
||||
|
||||
A more realistic situation in which an assumed-size array is passed to \code{vec\_mult}
|
||||
requires that the length of the arrays be specified, because the compiler does
|
||||
@ -75,9 +79,10 @@ not know the size of the storage. A section of the array must be specified with
|
||||
the usual Fortran syntax, as shown in the following example. The value 1 is assumed
|
||||
for the lower bound for array section \plc{v2(:N)}.
|
||||
|
||||
\fexample{target}{4bf}
|
||||
\ffreeexample{target}{4b}
|
||||
|
||||
\section{\code{target} Construct with \code{if} Clause}
|
||||
\subsection{\code{target} Construct with \code{if} Clause}
|
||||
\label{subsec:target_if}
|
||||
|
||||
The following example shows how the \code{target} construct offloads a code region
|
||||
to a target device.
|
||||
@ -90,7 +95,18 @@ The \code{if} clause on the \code{parallel} construct indicates that if the
|
||||
variable \plc{N} is smaller than a second threshold then the \code{parallel} region
|
||||
is inactive.
|
||||
|
||||
\cexample{target}{5c}
|
||||
\cexample{target}{5}
|
||||
|
||||
\fexample{target}{5f}
|
||||
\ffreeexample{target}{5}
|
||||
|
||||
The following example is a modification of the above \plc{target.5} code to show the combined \code{target}
|
||||
and parallel loop directives. It uses the \plc{directive-name} modifier in multiple \code{if}
|
||||
clauses to specify the component directive to which it applies.
|
||||
|
||||
The \code{if} clause with the \code{target} modifier applies to the \code{target} component of the
|
||||
combined directive, and the \code{if} clause with the \code{parallel} modifier applies
|
||||
to the \code{parallel} component of the combined directive.
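The use of the \plc{directive-name} modifier can be sketched as below. The function name \code{vec\_mult\_if} and the two threshold values are hypothetical; only the form of the \code{if} clauses is the point of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define THRESHOLD1 1000000  /* hypothetical size above which offloading pays off */
#define THRESHOLD2 1000     /* hypothetical size above which threading pays off */

void vec_mult_if(float *p, const float *v1, const float *v2, int n)
{
    int i;

    assert(p != NULL && v1 != NULL && v2 != NULL && n >= 0);

    /* if(target: ...) decides whether the region is offloaded;
       if(parallel: ...) decides whether the parallel region is active. */
    #pragma omp target parallel for \
            if(target: n > THRESHOLD1) if(parallel: n > THRESHOLD2) \
            map(to: v1[0:n], v2[:n]) map(from: p[0:n])
    for (i = 0; i < n; i++) {
        p[i] = v1[i] * v2[i];
    }
}
```

For small \code{n} both conditions are false, so the loop runs on the host with a single thread; the result is the same in every case.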
|
||||
|
||||
\cexample{target}{6}
|
||||
|
||||
\ffreeexample{target}{6}
|
||||
|
@ -1,8 +1,9 @@
|
||||
\pagebreak
|
||||
\chapter{\code{target} \code{data} Construct}
|
||||
\label{chap:target_data}
|
||||
\section{\code{target} \code{data} Construct}
|
||||
\label{sec:target_data}
|
||||
|
||||
\section{Simple \code{target} \code{data} Construct}
|
||||
\subsection{Simple \code{target} \code{data} Construct}
|
||||
\label{subsec:target_data_simple}
|
||||
|
||||
This example shows how the \code{target} \code{data} construct maps variables
|
||||
to a device data environment. The \code{target} \code{data} construct creates
|
||||
@ -13,15 +14,16 @@ variables \plc{v1}, \plc{v2}, and \plc{p} from the enclosing device data environ
|
||||
\plc{N} is mapped into the new device data environment from the encountering task's data
|
||||
environment.
|
||||
|
||||
\cexample{target_data}{1c}
|
||||
\cexample{target_data}{1}
|
||||
|
||||
The Fortran code passes a reference and specifies the extent of the arrays in the
|
||||
declaration. No length information is necessary in the map clause, as is required
|
||||
with C/C++ pointers.
|
||||
|
||||
\fexample{target_data}{1f}
|
||||
\ffreeexample{target_data}{1}
|
||||
|
||||
\section{\code{target} \code{data} Region Enclosing Multiple \code{target} Regions}
|
||||
\subsection{\code{target} \code{data} Region Enclosing Multiple \code{target} Regions}
|
||||
\label{subsec:target_data_multiregion}
|
||||
|
||||
The following examples show how the \code{target} \code{data} construct maps
|
||||
variables to a device data environment of a \code{target} region. The \code{target}
|
||||
@ -36,7 +38,7 @@ In the following example the variables \plc{v1} and \plc{v2} are mapped at each
|
||||
construct. Instead of mapping the variable \plc{p} twice, once at each \code{target}
|
||||
construct, \plc{p} is mapped once by the \code{target} \code{data} construct.
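A sketch of this structure is given below, under the assumption that the host changes the inputs between the two \code{target} regions; here it simply adds 1 to each element of \plc{v1}. The function name \code{accumulate\_products} and this update step are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Computes p[i] = v1[i]*v2[i] + (v1[i]+1)*v2[i]; v1 is modified on the host. */
void accumulate_products(float *p, float *v1, const float *v2, int n)
{
    int i;

    assert(p != NULL && v1 != NULL && v2 != NULL && n >= 0);

    /* p lives on the device for the whole region and is copied back once. */
    #pragma omp target data map(from: p[0:n])
    {
        /* v1 and v2 are mapped at each target construct. */
        #pragma omp target map(to: v1[:n], v2[:n])
        #pragma omp parallel for
        for (i = 0; i < n; i++) {
            p[i] = v1[i] * v2[i];
        }

        /* Host-side change of the inputs between the two regions. */
        for (i = 0; i < n; i++) {
            v1[i] += 1.0f;
        }

        /* Mapping v1 again transfers the new host values. */
        #pragma omp target map(to: v1[:n], v2[:n])
        #pragma omp parallel for
        for (i = 0; i < n; i++) {
            p[i] = p[i] + v1[i] * v2[i];
        }
    }
}
```

Because \plc{p} stays in the device data environment between the two regions, the second region can read the values written by the first without a round trip through the host.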
|
||||
|
||||
\cexample{target_data}{2c}
|
||||
\cexample{target_data}{2}
|
||||
|
||||
|
||||
The Fortran code passes references and specifies the extent of the \plc{p}, \plc{v1} and \plc{v2} arrays.
|
||||
@ -45,14 +47,14 @@ C/C++ pointers. The arrays \plc{v1} and \plc{v2} are mapped at each \code{target
|
||||
Instead of mapping the array \plc{p} twice, once at each target construct, \plc{p} is mapped
|
||||
once by the \code{target} \code{data} construct.
|
||||
|
||||
\fexample{target_data}{2f}
|
||||
\ffreeexample{target_data}{2}
|
||||
|
||||
In the following example, the variable \plc{tmp} defaults to \code{tofrom} map-type
|
||||
and is mapped at each \code{target} construct. The array \plc{Q} is mapped once at
|
||||
the enclosing \code{target} \code{data} region instead of at each \code{target}
|
||||
construct.
|
||||
|
||||
\cexample{target_data}{3c}
|
||||
\cexample{target_data}{3}
|
||||
|
||||
In the following example the arrays \plc{v1} and \plc{v2} are mapped at each \code{target}
|
||||
construct. Instead of mapping the array \plc{Q} twice, once at each \code{target} construct,
|
||||
@ -61,9 +63,9 @@ variable is implicitly remapped for each \code{target} region, mapping the value
|
||||
from the device to the host at the end of the first \code{target} region, and
|
||||
from the host to the device for the second \code{target} region.
|
||||
|
||||
\fexample{target_data}{3f}
|
||||
\ffreeexample{target_data}{3}
|
||||
|
||||
\section{\code{target} \code{data} Construct with Orphaned Call}
|
||||
\subsection{\code{target} \code{data} Construct with Orphaned Call}
|
||||
|
||||
The following two examples show how the \code{target} \code{data} construct
|
||||
maps variables to a device data environment. The \code{target} \code{data}
|
||||
@ -88,7 +90,7 @@ of the storage location associated with their corresponding array sections. Note
|
||||
that the following pairs of array section storage locations are equivalent (\plc{p0[:N]},
|
||||
\plc{p1[:N]}), (\plc{v1[:N]},\plc{v3[:N]}), and (\plc{v2[:N]},\plc{v4[:N]}).
|
||||
|
||||
\cexample{target_data}{4c}
|
||||
\cexample{target_data}{4}
|
||||
|
||||
The Fortran code maps the pointers and storage in an identical manner (same extent,
|
||||
but uses indices from 1 to \plc{N}).
|
||||
@ -104,7 +106,7 @@ assigned the address of the storage location associated with their corresponding
|
||||
array sections. Note that the following pairs of array storage locations are equivalent
|
||||
(\plc{p0},\plc{p1}), (\plc{v1},\plc{v3}), and (\plc{v2},\plc{v4}).
|
||||
|
||||
\fexample{target_data}{4f}
|
||||
\ffreeexample{target_data}{4}
|
||||
|
||||
|
||||
In the following example, the variables \plc{p1}, \plc{v3}, and \plc{v4} are references to the pointer
|
||||
@ -113,7 +115,7 @@ environment inherits the pointer variables \plc{p0}, \plc{v1}, and \plc{v2} from
|
||||
\code{data} construct's device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already
|
||||
present in the device data environment.
|
||||
|
||||
\cexample{target_data}{5c}
|
||||
\cppexample{target_data}{5}
|
||||
|
||||
In the following example, the usual Fortran approach is used for dynamic memory.
|
||||
The \plc{p0}, \plc{v1}, and \plc{v2} arrays are allocated in the main program and passed as references
|
||||
@ -123,9 +125,10 @@ environment inherits the arrays \plc{p0}, \plc{v1}, and \plc{v2} from the enclos
|
||||
device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already present in the device
|
||||
data environment.
|
||||
|
||||
\fexample{target_data}{5f}
|
||||
\ffreeexample{target_data}{5}
|
||||
|
||||
\section{\code{target} \code{data} Construct with \code{if} Clause}
|
||||
\subsection{\code{target} \code{data} Construct with \code{if} Clause}
|
||||
\label{subsec:target_data_if}
|
||||
|
||||
The following two examples show how the \code{target} \code{data} construct
|
||||
maps variables to a device data environment.
|
||||
@ -140,7 +143,7 @@ variable \plc{p} is implicitly mapped with a map-type of \code{tofrom}, but the
|
||||
location for the array section \plc{p[0:N]} will not be mapped in the device data environments
|
||||
of the \code{target} constructs.
|
||||
|
||||
\cexample{target_data}{6c}
|
||||
\cexample{target_data}{6}
|
||||
|
||||
The \code{if} clauses work the same way for the following Fortran code. The \code{target}
|
||||
constructs enclosed in the \code{target} \code{data} region should also use
|
||||
@ -148,7 +151,7 @@ an \code{if} clause with the same condition, so that the \code{target} \code{dat
|
||||
region and the \code{target} region are either both created for the device, or
|
||||
are both ignored.
|
||||
|
||||
\fexample{target_data}{6f}
|
||||
\ffreeexample{target_data}{6}
|
||||
|
||||
In the following example, when the \code{if} clause conditional expression on
|
||||
the \code{target} construct evaluates to \plc{false}, the target region will
|
||||
@ -159,7 +162,7 @@ region the array section \plc{p[0:N]} will be assigned from the device data envi
|
||||
to the corresponding variable in the data environment of the task that encountered
|
||||
the \code{target} \code{data} construct, resulting in undefined values in \plc{p[0:N]}.
|
||||
|
||||
\cexample{target_data}{7c}
|
||||
\cexample{target_data}{7}
|
||||
|
||||
The \code{if} clauses work the same way for the following Fortran code. When
|
||||
the \code{if} clause conditional expression on the \code{target} construct
|
||||
@ -171,5 +174,5 @@ region the \plc{p} array will be assigned from the device data environment to th
|
||||
variable in the data environment of the task that encountered the \code{target}
|
||||
\code{data} construct, resulting in undefined values in \plc{p}.
|
||||
|
||||
\fexample{target_data}{7f}
|
||||
\ffreeexample{target_data}{7}
|
||||
|
||||
|
47
Examples_target_unstructured_data.tex
Normal file
@ -0,0 +1,47 @@
|
||||
%begin
|
||||
\pagebreak
|
||||
\section{\code{target} \code{enter} \code{data} and \code{target} \code{exit} \code{data} Constructs}
|
||||
\label{sec:target_enter_exit_data}
|
||||
%\section{Simple target enter data and target exit data Constructs}
|
||||
|
||||
The structured data construct (\code{target}~\code{data}) provides persistent data on a
|
||||
device for subsequent \code{target} constructs as shown in the
|
||||
\code{target}~\code{data} examples above. This is accomplished by creating a single
|
||||
\code{target}~\code{data} region containing \code{target} constructs.
|
||||
|
||||
The unstructured data constructs allow the creation and deletion of data on
|
||||
the device at any appropriate point within the host code, as shown below
|
||||
with the \code{target}~\code{enter}~\code{data} and \code{target}~\code{exit}~\code{data} constructs.
|
||||
|
||||
The following C++ code creates/deletes a vector in a constructor/destructor
|
||||
of a class. The constructor creates a vector with \code{target}~\code{enter}~\code{data}
|
||||
and uses the \code{alloc} map-type in the \code{map} clause to avoid copying values
|
||||
to the device. The destructor deletes the data (\code{target}~\code{exit}~\code{data})
|
||||
and uses the \code{delete} map-type in the \code{map} clause to avoid copying data
|
||||
back to the host. Note, the stand-alone \code{target}~\code{enter}~\code{data} occurs
|
||||
after the host vector is created, and the \code{target}~\code{exit}~\code{data}
|
||||
construct occurs before the host data is deleted.
|
||||
|
||||
\cppexample{target_unstructured_data}{1}
|
||||
|
||||
The following C code allocates and frees the data member of a Matrix structure.
|
||||
The \code{init\_matrix} function allocates the memory used in the structure and
|
||||
uses the \code{target}~\code{enter}~\code{data} directive to map it to the target device. The
|
||||
\code{free\_matrix} function removes the mapped array from the target device
|
||||
and then frees the memory on the host. Note, the stand-alone
|
||||
\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the
|
||||
\code{target}~\code{exit}~\code{data} construct occurs before the host data is freed.
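The ordering described above can be sketched as follows. The layout of the \code{Matrix} structure (a data pointer \code{A} and a length \code{N}) and the use of a local pointer in the \code{map} clauses are assumptions made for this sketch; the example file may differ in detail.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    double *A;  /* host storage; a device copy exists while it is mapped */
    int     N;  /* number of elements in A */
} Matrix;

void init_matrix(Matrix *mat, int n)
{
    double *a;

    assert(mat != NULL && n > 0);
    a = (double *)malloc((size_t)n * sizeof(double));
    assert(a != NULL);
    mat->A = a;
    mat->N = n;

    /* The host memory exists now; create uninitialized device storage for it. */
    #pragma omp target enter data map(alloc: a[0:n])
}

void free_matrix(Matrix *mat)
{
    double *a;
    int n;

    assert(mat != NULL);
    a = mat->A;
    n = mat->N;

    /* Remove the device copy first, without copying it back ... */
    #pragma omp target exit data map(delete: a[0:n])

    /* ... and only then release the host memory. */
    free(a);
    mat->A = NULL;
    mat->N = 0;
}
```

Between the two calls, any \code{target} region that maps the same array section finds it already present on the device.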
|
||||
|
||||
\cexample{target_unstructured_data}{1}
|
||||
|
||||
The following Fortran code allocates and deallocates a module array. The
|
||||
\code{initialize} subroutine allocates the module array and uses the
|
||||
\code{target}~\code{enter}~\code{data} directive to map it to the target device. The
|
||||
\code{finalize} subroutine removes the mapped array from the target device and
|
||||
then deallocates the array on the host. Note, the stand-alone
|
||||
\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the
|
||||
\code{target}~\code{exit}~\code{data} construct occurs before the host data is deallocated.
|
||||
|
||||
\ffreeexample{target_unstructured_data}{1}
|
||||
%end
|
||||
|
@ -1,8 +1,9 @@
|
||||
\pagebreak
|
||||
\chapter{\code{target} \code{update} Construct}
|
||||
\label{chap:target_update}
|
||||
\section{\code{target} \code{update} Construct}
|
||||
\label{sec:target_update}
|
||||
|
||||
\section{Simple \code{target} \code{data} and \code{target} \code{update} Constructs}
|
||||
\subsection{Simple \code{target} \code{data} and \code{target} \code{update} Constructs}
|
||||
\label{subsec:target_data_and_update}
|
||||
|
||||
The following example shows how the \code{target} \code{update} construct updates
|
||||
variables in a device data environment.
|
||||
@ -26,11 +27,12 @@ region and waits for the completion of the region.
|
||||
|
||||
The second \code{target} region uses the updated values of \plc{v1[:N]} and \plc{v2[:N]}.
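A minimal sketch of this sequence is shown below. The function name \code{two\_pass\_products} and the host-side change of \plc{v1} (adding 1 to each element) are hypothetical stand-ins for the re-initialization between the two regions.

```c
#include <assert.h>
#include <stddef.h>

/* Computes p[i] = v1[i]*v2[i] + (v1[i]+1)*v2[i]; v1 is modified on the host. */
void two_pass_products(float *p, float *v1, float *v2, int n)
{
    int i;

    assert(p != NULL && v1 != NULL && v2 != NULL && n >= 0);

    /* All three arrays stay on the device for the whole region. */
    #pragma omp target data map(to: v1[:n], v2[:n]) map(from: p[0:n])
    {
        #pragma omp target
        #pragma omp parallel for
        for (i = 0; i < n; i++) {
            p[i] = v1[i] * v2[i];
        }

        /* The host changes its copy of v1; the device copy is now stale. */
        for (i = 0; i < n; i++) {
            v1[i] += 1.0f;
        }

        /* Refresh the device copies from the host values. */
        #pragma omp target update to(v1[:n], v2[:n])

        #pragma omp target
        #pragma omp parallel for
        for (i = 0; i < n; i++) {
            p[i] = p[i] + v1[i] * v2[i];
        }
    }
}
```

Without the \code{target} \code{update} directive, the second region would still see the original contents of \plc{v1} on a device with separate memory.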
|
||||
|
||||
\cexample{target_update}{1c}
|
||||
\cexample{target_update}{1}
|
||||
|
||||
\fexample{target_update}{1f}
|
||||
\ffreeexample{target_update}{1}
|
||||
|
||||
\section{\code{target} \code{update} Construct with \code{if} Clause}
|
||||
\subsection{\code{target} \code{update} Construct with \code{if} Clause}
|
||||
\label{subsec:target_update_if}
|
||||
|
||||
The following example shows how the \code{target} \code{update} construct updates
|
||||
variables in a device data environment.
|
||||
@ -47,7 +49,7 @@ assigns the new values of \plc{v1} and \plc{v2} from the task's data environment
|
||||
mapped array sections in the \code{target} \code{data} construct's device data
|
||||
environment.
|
||||
|
||||
\cexample{target_update}{2c}
|
||||
\cexample{target_update}{2}
|
||||
|
||||
\fexample{target_update}{2f}
|
||||
\ffreeexample{target_update}{2}
|
||||
|
||||
|
@ -1,58 +1,62 @@
|
||||
\pagebreak
|
||||
\chapter{Task Dependences}
|
||||
\label{chap:task_dep}
|
||||
\section{Task Dependences}
|
||||
\label{sec:task_depend}
|
||||
|
||||
\section{Flow Dependence}
|
||||
\subsection{Flow Dependence}
|
||||
\label{subsec:task_flow_depend}
|
||||
|
||||
In this example we show a simple flow dependence expressed using the \code{depend}
|
||||
clause on the \code{task} construct.
|
||||
|
||||
\cexample{task_dep}{1c}
|
||||
\cexample{task_dep}{1}
|
||||
|
||||
\fexample{task_dep}{1f}
|
||||
\ffreeexample{task_dep}{1}
|
||||
|
||||
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
|
||||
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
|
||||
omitted, then the tasks could execute in any order and the program
|
||||
would have a race condition.
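A flow dependence of this kind can be sketched as follows. The hypothetical function \code{flow\_dependence} returns the value read by the second task instead of printing it, so that the ordering can be checked.

```c
#include <assert.h>

/* Returns the value of x observed by the reading task. */
int flow_dependence(void)
{
    int x = 1;
    int seen = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Writer: produces the new value of x. */
        #pragma omp task shared(x) depend(out: x)
        x = 2;

        /* Reader: cannot start until the writer has completed. */
        #pragma omp task shared(x, seen) depend(in: x)
        seen = x;
    }

    /* Both tasks have completed at the end of the parallel region. */
    assert(seen == 2);
    return seen;
}
```

The \code{depend(out: x)} and \code{depend(in: x)} clauses name the same variable, which is what orders the two sibling tasks.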
|
||||
|
||||
\section{Anti-dependence}
|
||||
\subsection{Anti-dependence}
|
||||
\label{subsec:task_anti_depend}
|
||||
|
||||
In this example we show an anti-dependence expressed using the \code{depend}
|
||||
clause on the \code{task} construct.
|
||||
|
||||
\cexample{task_dep}{2c}
|
||||
\cexample{task_dep}{2}
|
||||
|
||||
\fexample{task_dep}{2f}
|
||||
\ffreeexample{task_dep}{2}
|
||||
|
||||
The program will always print \texttt{"}x = 1\texttt{"}, because the \code{depend}
|
||||
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
|
||||
omitted, then the tasks could execute in any order and the program would have a
|
||||
race condition.
|
||||
|
||||
\section{Output Dependence}
|
||||
\subsection{Output Dependence}
|
||||
\label{subsec:task_out_depend}
|
||||
|
||||
In this example we show an output dependence expressed using the \code{depend}
|
||||
clause on the \code{task} construct.
|
||||
|
||||
\cexample{task_dep}{3c}
|
||||
\cexample{task_dep}{3}
|
||||
|
||||
\fexample{task_dep}{3f}
|
||||
\ffreeexample{task_dep}{3}
|
||||
|
||||
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
|
||||
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
|
||||
omitted, then the tasks could execute in any order and the program would have a
|
||||
race condition.
|
||||
|
||||
\section{Concurrent Execution with Dependences}
|
||||
\subsection{Concurrent Execution with Dependences}
|
||||
\label{subsec:task_concurrent_depend}
|
||||
|
||||
In this example we show potentially concurrent execution of tasks using multiple
|
||||
flow dependences expressed using the \code{depend} clause on the \code{task}
|
||||
construct.
|
||||
|
||||
\cexample{task_dep}{4c}
|
||||
\cexample{task_dep}{4}
|
||||
|
||||
\fexample{task_dep}{4f}
|
||||
\ffreeexample{task_dep}{4}
|
||||
|
||||
The last two tasks are dependent on the first task. However, there is no dependence
|
||||
between the last two tasks, which may execute in any order (or concurrently if
|
||||
@ -61,12 +65,13 @@ more than one thread is available). Thus, the possible outputs are \texttt{"}x
|
||||
If the \code{depend} clauses had been omitted, then all of the tasks could execute
|
||||
in any order and the program would have a race condition.
|
||||
|
||||
\section{Matrix multiplication}
|
||||
\subsection{Matrix multiplication}
|
||||
\label{subsec:task_matrix_mult}
|
||||
|
||||
This example shows a task-based blocked matrix multiplication. Matrices are of
|
||||
NxN elements, and the multiplication is implemented using blocks of BSxBS elements.
|
||||
|
||||
\cexample{task_dep}{5c}
|
||||
\cexample{task_dep}{5}
|
||||
|
||||
\fexample{task_dep}{5f}
|
||||
\ffreeexample{task_dep}{5}
|
||||
|
||||
|
22
Examples_task_priority.tex
Normal file
@ -0,0 +1,22 @@
|
||||
\pagebreak
|
||||
\section{Task Priority}
|
||||
\label{sec:task_priority}
|
||||
|
||||
|
||||
|
||||
%\subsection{Task Priority}
|
||||
%\label{subsec:task_priority}
|
||||
|
||||
In this example we compute arrays in a matrix through a \plc{compute\_array} routine.
|
||||
Each task has a priority value equal to the value of the loop variable \plc{i} at the
|
||||
moment of its creation. A higher priority on a task means that a task is a candidate
|
||||
to run sooner.
|
||||
|
||||
The creation of tasks occurs in ascending order (according to the iteration space of
|
||||
the loop) but a hint, by means of the \code{priority} clause, is provided to reverse
|
||||
the execution order.
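The structure described above can be sketched as below. The body of \plc{compute\_array} is a hypothetical stand-in (it doubles each element of one row), and the parameter names are assumptions; only the placement of the \code{priority} clause is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real per-row work: doubles every element of one row. */
static void compute_array(float *node, int m)
{
    int j;
    for (j = 0; j < m; j++) {
        node[j] *= 2.0f;
    }
}

/* Processes an n-by-m matrix stored row by row, one task per row. */
void compute_matrix(float *array, int n, int m)
{
    int i;

    assert(array != NULL && n >= 0 && m >= 0);

    #pragma omp parallel private(i)
    #pragma omp single
    {
        for (i = 0; i < n; i++) {
            /* Rows created later get larger priority values, which hints
               that they should run before rows created earlier. */
            #pragma omp task priority(i)
            compute_array(&array[i * m], m);
        }
    }
}
```

Since the priority is only a hint, the result must not depend on the execution order; here each task touches a different row, so any order gives the same matrix.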
|
||||
|
||||
\cexample{task_priority}{1}
|
||||
|
||||
\ffreeexample{task_priority}{1}
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{taskgroup} Construct}
|
||||
\label{chap:taskgroup}
|
||||
\section{The \code{taskgroup} Construct}
|
||||
\label{sec:taskgroup}
|
||||
|
||||
In this example, tasks are grouped and synchronized using the \code{taskgroup}
|
||||
construct.
|
||||
@ -14,7 +14,7 @@ does not participate in the synchronization, and is left free to execute in para
|
||||
This is opposed to the behaviour of the \code{taskwait} construct, which would
|
||||
include the background tasks in the synchronization.
|
||||
|
||||
\cexample{taskgroup}{1c}
|
||||
\cexample{taskgroup}{1}
|
||||
|
||||
\fexample{taskgroup}{1f}
|
||||
\ffreeexample{taskgroup}{1}
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{task} and \code{taskwait} Constructs}
|
||||
\label{chap:tasking}
|
||||
\section{The \code{task} and \code{taskwait} Constructs}
|
||||
\label{sec:task_taskwait}
|
||||
|
||||
The following example shows how to traverse a tree-like structure using explicit
|
||||
tasks. Note that the \code{traverse} function should be called from within a
|
||||
@ -9,17 +9,17 @@ note that the tasks will be executed in no specified order because there are no
|
||||
synchronization directives. Thus, assuming that the traversal will be done in post
|
||||
order, as in the sequential code, is wrong.
|
||||
|
||||
\cexample{tasking}{1c}
|
||||
\cexample{tasking}{1}
|
||||
|
||||
\fexample{tasking}{1f}
|
||||
\ffreeexample{tasking}{1}
|
||||
|
||||
In the next example, we force a postorder traversal of the tree by adding a \code{taskwait}
|
||||
directive. Now, we can safely assume that the left and right sons have been executed
|
||||
before we process the current node.
|
||||
|
||||
\cexample{tasking}{2c}
|
||||
\cexample{tasking}{2}
|
||||
|
||||
\fexample{tasking}{2f}
|
||||
\ffreeexample{tasking}{2}
|
||||
|
||||
The following example demonstrates how to use the \code{task} construct to process
|
||||
elements of a linked list in parallel. The thread executing the \code{single}
|
||||
@ -28,18 +28,18 @@ in the current team. The pointer \plc{p} is \code{firstprivate} by default
|
||||
on the \code{task} construct so it is not necessary to specify it in a \code{firstprivate}
|
||||
clause.
|
||||
|
||||
\cexample{tasking}{3c}
|
||||
\cexample{tasking}{3}
|
||||
|
||||
\fexample{tasking}{3f}
|
||||
\ffreeexample{tasking}{3}
|
||||
|
||||
The \code{fib()} function should be called from within a \code{parallel} region
|
||||
for the different specified tasks to be executed in parallel. Also, only one thread
|
||||
of the \code{parallel} region should call \code{fib()} unless multiple concurrent
|
||||
Fibonacci computations are desired.
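The calling convention described above can be sketched as follows. The wrapper \code{fib\_parallel} is a hypothetical name; it shows one way to make a single thread of a \code{parallel} region start the recursion.

```c
#include <assert.h>

/* Recursive Fibonacci in which each subproblem is an explicit task. */
int fib(int n)
{
    int i, j;

    if (n < 2) {
        return n;
    }

    #pragma omp task shared(i)
    i = fib(n - 1);

    #pragma omp task shared(j)
    j = fib(n - 2);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return i + j;
}

/* Starts the recursion from exactly one thread of a parallel region. */
int fib_parallel(int n)
{
    int result = 0;

    assert(n >= 0);

    #pragma omp parallel
    #pragma omp single
    result = fib(n);

    return result;
}
```

The other threads of the team wait at the barrier that ends the \code{single} region and can execute the generated tasks there.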
|
||||
|
||||
\cexample{tasking}{4c}
|
||||
\cexample{tasking}{4}
|
||||
|
||||
\fexample{tasking}{4f}
|
||||
\fexample{tasking}{4}
|
||||
|
||||
Note: There are more efficient algorithms for computing Fibonacci numbers. This
|
||||
classic recursion algorithm is for illustrative purposes.
|
||||
@ -52,9 +52,9 @@ loop to suspend its task at the task scheduling point in the \code{task} directi
|
||||
and start executing unassigned tasks. Once the number of unassigned tasks is sufficiently
|
||||
low, the thread may resume execution of the task generating loop.
|
||||
|
||||
\cexample{tasking}{5c}
|
||||
\cexample{tasking}{5}
|
||||
\pagebreak
|
||||
\fexample{tasking}{5f}
|
||||
\fexample{tasking}{5}
|
||||
|
||||
The following example is the same as the previous one, except that the tasks are
|
||||
generated in an untied task. While generating the tasks, the implementation may
|
||||
@ -69,9 +69,9 @@ to resume the task generating loop. In the previous examples, the other threads
|
||||
would be forced to idle until the generating thread finishes its long task, since
|
||||
the task generating loop was in a tied task.
|
||||
|
||||
\cexample{tasking}{6c}
|
||||
\cexample{tasking}{6}
|
||||
|
||||
\fexample{tasking}{6f}
|
||||
\fexample{tasking}{6}
|
||||
|
||||
The following two examples demonstrate how the scheduling rules illustrated in
|
||||
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of
|
||||
@ -86,20 +86,20 @@ both of the task regions that modify \code{tp}. The parts of these task regions
|
||||
in which \code{tp} is modified may be executed in any order so the resulting
|
||||
value of \code{var} can be either 1 or 2.
|
||||
|
||||
\cexample{tasking}{7c}
|
||||
\cexample{tasking}{7}
|
||||
|
||||
|
||||
\fexample{tasking}{7f}
|
||||
\fexample{tasking}{7}
|
||||
|
||||
In this example, scheduling constraints prohibit a thread in the team from executing
|
||||
a new task that modifies \code{tp} while another such task region tied to the
|
||||
same thread is suspended. Therefore, the value written will persist across the
|
||||
task scheduling point.
|
||||
|
||||
\cexample{tasking}{8c}
|
||||
\cexample{tasking}{8}
|
||||
|
||||
|
||||
\fexample{tasking}{8f}
|
||||
\fexample{tasking}{8}
|
||||
|
||||
The following two examples demonstrate how the scheduling rules illustrated in
|
||||
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of locks
|
||||
@ -112,20 +112,20 @@ it encounters the task scheduling point at task 3, it could suspend task 1 and
|
||||
begin task 2 which will result in a deadlock when it tries to enter critical region
|
||||
1.
|
||||
|
||||
\cexample{tasking}{9c}
|
||||
\cexample{tasking}{9}
|
||||
|
||||
|
||||
\fexample{tasking}{9f}
|
||||
\fexample{tasking}{9}
|
||||
|
||||
In the following example, \code{lock} is held across a task scheduling point.
|
||||
However, according to the scheduling restrictions, the executing thread can't
|
||||
begin executing one of the non-descendant tasks that also acquires \code{lock} before
|
||||
the task region is complete. Therefore, no deadlock is possible.
|
||||
|
||||
\cexample{tasking}{10c}
|
||||
\cexample{tasking}{10}
|
||||
|
||||
|
||||
\fexample{tasking}{10f}
|
||||
\ffreeexample{tasking}{10}
|
||||
|
||||
The following examples illustrate the use of the \code{mergeable} clause in the
|
||||
\code{task} construct. In this first example, the \code{task} construct has
|
||||
@ -139,9 +139,9 @@ outcome does not depend on whether or not the task is merged (that is, the task
|
||||
will always increment the same variable and will always compute the same value
|
||||
for \code{x}).
|
||||
|
||||
\cexample{tasking}{11c}
|
||||
\cexample{tasking}{11}
|
||||
|
||||
\fexample{tasking}{11f}
|
||||
\ffreeexample{tasking}{11}
|
||||
|
||||
This second example shows an incorrect use of the \code{mergeable} clause. In
|
||||
this example, the created task will access different instances of the variable
|
||||
@ -150,9 +150,9 @@ it will access the same variable \code{x} if the task is merged. As a result,
|
||||
the behavior of the program is unspecified and it can print two different values
|
||||
for \code{x} depending on the decisions taken by the implementation.
|
||||
|
||||
\cexample{tasking}{12c}
|
||||
\cexample{tasking}{12}
|
||||
|
||||
\fexample{tasking}{12f}
|
||||
\ffreeexample{tasking}{12}
|
||||
|
||||
The following example shows the use of the \code{final} clause and the \code{omp\_in\_final}
|
||||
API call in a recursive binary search program. To reduce overhead, once a certain
|
||||
@ -170,9 +170,9 @@ in the stack could also be avoided but it would make this example less clear. Th
|
||||
clause since all tasks created in a \code{final} task region are included tasks
|
||||
that can be merged if the \code{mergeable} clause is present.
|
||||
|
||||
\cexample{tasking}{13c}
|
||||
\cexample{tasking}{13}
|
||||
|
||||
\fexample{tasking}{13f}
|
||||
\ffreeexample{tasking}{13}
|
||||
|
||||
The following example illustrates the difference between the \code{if} and the
|
||||
\code{final} clauses. The \code{if} clause has a local effect. In the first
|
||||
@ -184,7 +184,7 @@ task itself. In the second nest of tasks, the nested tasks will be created as in
|
||||
tasks. Note also that the conditions for the \code{if} and \code{final} clauses
|
||||
are usually the opposite.
|
||||
|
||||
\cexample{tasking}{14c}
|
||||
\cexample{tasking}{14}
|
||||
|
||||
\fexample{tasking}{14f}
|
||||
\ffreeexample{tasking}{14}
|
||||
|
||||
|
14
Examples_taskloop.tex
Normal file
@ -0,0 +1,14 @@
|
||||
\pagebreak
|
||||
\section{The \code{taskloop} Construct}
|
||||
\label{sec:taskloop}
|
||||
|
||||
The following example illustrates how to execute a long-running task concurrently with tasks created
|
||||
with a \code{taskloop} directive for a loop having unbalanced amounts of work for its iterations.
|
||||
|
||||
The \code{grainsize} clause specifies that each task is to execute at least 500 iterations of the loop.
|
||||
|
||||
The \code{nogroup} clause removes the implicit taskgroup of the \code{taskloop} construct; the explicit \code{taskgroup} construct in the example ensures that the function is not exited before the long-running task and the loops have finished execution.
|
||||
|
||||
\cexample{taskloop}{1}
|
||||
|
||||
\ffreeexample{taskloop}{1}
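The combination of \code{taskgroup}, \code{grainsize}, and \code{nogroup} described above can be sketched as follows. This is only an illustration under stated assumptions: the names \plc{long\_running\_task} and \plc{taskloop\_sketch}, and the work performed, are hypothetical and do not reproduce the example source file.

```c
/* Hypothetical stand-in for a long-running computation. */
static long long_running_task(void)
{
    long s = 0;
    for (long k = 0; k < 100000; k++)
        s += k % 7;
    return s;
}

/* Doubles every element of data[0..n-1] with taskloop tasks while a
   separate task runs concurrently; returns the sum of the updated array.
   The result of the long-running task is stored in *side. */
long taskloop_sketch(long *data, int n, long *side)
{
    long total = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskgroup
        {
            /* Long-running task, generated before the loop tasks. */
            #pragma omp task
            *side = long_running_task();

            /* nogroup removes the implicit taskgroup of taskloop; the
               enclosing taskgroup waits for both kinds of tasks. */
            #pragma omp taskloop grainsize(500) nogroup
            for (int i = 0; i < n; i++)
                data[i] *= 2;
        }
        /* All tasks generated in the taskgroup have completed here. */
    }

    for (int i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

Without the explicit \code{taskgroup}, the \code{nogroup} clause would allow execution to continue past the loop before its tasks had finished.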
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{taskyield} Construct}
|
||||
\label{chap:taskyield}
|
||||
\section{The \code{taskyield} Construct}
|
||||
\label{sec:taskyield}
|
||||
|
||||
The following example illustrates the use of the \code{taskyield} directive.
|
||||
The tasks in the example compute something useful and then do some computation
|
||||
@ -8,7 +8,7 @@ that must be done in a critical region. By using \code{taskyield} when a task
|
||||
cannot get access to the \code{critical} region the implementation can suspend
|
||||
the current task and schedule some other task that can do something useful.
|
||||
|
||||
\cexample{taskyield}{1c}
|
||||
\cexample{taskyield}{1}
|
||||
|
||||
\fexample{taskyield}{1f}
|
||||
\ffreeexample{taskyield}{1}
|
||||
|
||||
|
@ -1,9 +1,10 @@
|
||||
\pagebreak
|
||||
\chapter{\code{teams} Constructs}
|
||||
\label{chap:teams}
|
||||
\section{\code{teams} Constructs}
|
||||
\label{sec:teams}
|
||||
|
||||
\section{\code{target} and \code{teams} Constructs with \code{omp\_get\_num\_teams}\\
|
||||
\subsection{\code{target} and \code{teams} Constructs with \code{omp\_get\_num\_teams}\\
|
||||
and \code{omp\_get\_team\_num} Routines}
|
||||
\label{subsec:teams_api}
|
||||
|
||||
The following example shows how the \code{target} and \code{teams} constructs
|
||||
are used to create a league of thread teams that execute a region. The \code{teams}
|
||||
@ -15,11 +16,12 @@ region. The \code{omp\_get\_team\_num} routine returns the team number, which is
|
||||
between 0 and one less than the value returned by \code{omp\_get\_num\_teams}. The following
|
||||
example manually distributes a loop across two teams.
|
||||
|
||||
\cexample{teams}{1c}
|
||||
\cexample{teams}{1}
|
||||
|
||||
\fexample{teams}{1f}
|
||||
\ffreeexample{teams}{1}
|
||||
|
||||
\section{\code{target}, \code{teams}, and \code{distribute} Constructs}
|
||||
\subsection{\code{target}, \code{teams}, and \code{distribute} Constructs}
|
||||
\label{subsec:teams_distribute}
|
||||
|
||||
The following example shows how the \code{target}, \code{teams}, and \code{distribute}
|
||||
constructs are used to execute a loop nest in a \code{target} region. The \code{teams}
|
||||
@ -45,11 +47,12 @@ created by the \code{teams} construct. At the end of the \code{teams} region,
|
||||
each master thread's private copy of \plc{sum} is reduced into the final \plc{sum} that is
|
||||
implicitly mapped into the \code{target} region.
|
||||
|
||||
\cexample{teams}{2c}
|
||||
\cexample{teams}{2}
|
||||
|
||||
\fexample{teams}{2f}
|
||||
\ffreeexample{teams}{2}
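The nesting of \code{target}, \code{teams}, and \code{distribute} with a reduction can be sketched as below. The function name \plc{dotprod\_blocks}, the fixed team count, and the block size parameter are assumptions chosen for illustration; the scalar \plc{sum} is mapped explicitly with \code{map(tofrom: sum)} because scalars are \code{firstprivate} by default in a \code{target} region in the 4.5 specification.

```c
/* Hypothetical helper: blocked dot product of B and C (length N).
   The team count (4) and block_size are illustrative choices. */
float dotprod_blocks(const float *B, const float *C, int N, int block_size)
{
    float sum = 0.0f;

    #pragma omp target map(to: B[0:N], C[0:N]) map(tofrom: sum)
    #pragma omp teams num_teams(4) reduction(+:sum)
    #pragma omp distribute
    for (int i0 = 0; i0 < N; i0 += block_size) {
        /* Each block of iterations goes to the master thread of a team. */
        int end = (i0 + block_size < N) ? i0 + block_size : N;

        /* The threads of that team share the work inside the block. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = i0; i < end; i++)
            sum += B[i] * C[i];
    }
    return sum;
}
```

If no target device is available, an implementation is expected to execute the region on the host, so the result should not depend on offloading.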
|
||||
|
||||
\section{\code{target} \code{teams}, and Distribute Parallel Loop Constructs}
|
||||
\subsection{\code{target} \code{teams}, and Distribute Parallel Loop Constructs}
|
||||
\label{subsec:teams_distribute_parallel}
|
||||
|
||||
The following example shows how the \code{target} \code{teams} and distribute
|
||||
parallel loop constructs are used to execute a \code{target} region. The \code{target}
|
||||
@ -59,12 +62,13 @@ team executes the \code{teams} region.
|
||||
The distribute parallel loop construct schedules the loop iterations across the
|
||||
master threads of each team and then across the threads of each team.
|
||||
|
||||
\cexample{teams}{3c}
|
||||
\cexample{teams}{3}
|
||||
|
||||
\fexample{teams}{3f}
|
||||
\ffreeexample{teams}{3}
|
||||
|
||||
\section{\code{target} \code{teams} and Distribute Parallel Loop
|
||||
\subsection{\code{target} \code{teams} and Distribute Parallel Loop
|
||||
Constructs with Scheduling Clauses}
|
||||
\label{subsec:teams_distribute_parallel_schedule}
|
||||
|
||||
The following example shows how the \code{target} \code{teams} and distribute
|
||||
parallel loop constructs are used to execute a \code{target} region. The \code{teams}
|
||||
@ -83,11 +87,12 @@ The \code{schedule} clause indicates that the 1024 iterations distributed to
|
||||
a master thread are then assigned to the threads in its associated team in chunks
|
||||
of 64 iterations.
|
||||
|
||||
\cexample{teams}{4c}
|
||||
\cexample{teams}{4}
|
||||
|
||||
\fexample{teams}{4f}
|
||||
\ffreeexample{teams}{4}
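A minimal sketch of the two scheduling levels follows. It assumes a \code{dist\_schedule(static, 1024)} clause, consistent with the 1024 iterations per master thread mentioned above, together with \code{schedule(static, 64)}; the function name \plc{vec\_mult\_sched} is hypothetical.

```c
/* Hypothetical helper: element-wise product p = v1 * v2 (length N). */
void vec_mult_sched(float *p, const float *v1, const float *v2, int N)
{
    /* dist_schedule: chunks of 1024 iterations go to each team's master
       thread; schedule: each chunk is split among the team's threads
       in pieces of 64 iterations. */
    #pragma omp target map(to: v1[0:N], v2[0:N]) map(from: p[0:N])
    #pragma omp teams distribute parallel for \
            dist_schedule(static, 1024) schedule(static, 64)
    for (int i = 0; i < N; i++)
        p[i] = v1[i] * v2[i];
}
```

The chunk sizes affect only how iterations are assigned; the values written to \plc{p} are the same for any choice.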
|
||||
|
||||
\section{\code{target} \code{teams} and \code{distribute} \code{simd} Constructs}
|
||||
\subsection{\code{target} \code{teams} and \code{distribute} \code{simd} Constructs}
|
||||
\label{subsec:teams_distribute_simd}
|
||||
|
||||
The following example shows how the \code{target} \code{teams} and \code{distribute}
|
||||
\code{simd} constructs are used to execute a loop in a \code{target} region.
|
||||
@ -97,11 +102,12 @@ master thread of each team executes the \code{teams} region.
|
||||
The \code{distribute} \code{simd} construct schedules the loop iterations across
|
||||
the master thread of each team and then uses SIMD parallelism to execute the iterations.
|
||||
|
||||
\cexample{teams}{5c}
|
||||
\cexample{teams}{5}
|
||||
|
||||
\fexample{teams}{5f}
|
||||
\ffreeexample{teams}{5}
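The same loop can be sketched with \code{distribute} \code{simd}, where iterations are distributed across the master threads of the teams and each master thread executes its iterations with SIMD instructions. The function name \plc{vec\_mult\_simd} is an assumption made for illustration.

```c
/* Hypothetical helper: element-wise product p = v1 * v2 (length N). */
void vec_mult_simd(float *p, const float *v1, const float *v2, int N)
{
    #pragma omp target teams map(to: v1[0:N], v2[0:N]) map(from: p[0:N])
    #pragma omp distribute simd
    for (int i = 0; i < N; i++)
        p[i] = v1[i] * v2[i];   /* independent iterations: safe to vectorize */
}
```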
|
||||
|
||||
\section{\code{target} \code{teams} and Distribute Parallel Loop SIMD Constructs}
|
||||
\subsection{\code{target} \code{teams} and Distribute Parallel Loop SIMD Constructs}
|
||||
\label{subsec:teams_distribute_parallel_simd}
|
||||
|
||||
The following example shows how the \code{target} \code{teams} and the distribute
|
||||
parallel loop SIMD constructs are used to execute a loop in a \code{target} \code{teams}
|
||||
@ -112,7 +118,7 @@ The distribute parallel loop SIMD construct schedules the loop iterations across
|
||||
the master thread of each team and then across the threads of each team where each
|
||||
thread uses SIMD parallelism.
|
||||
|
||||
\cexample{teams}{6c}
|
||||
\cexample{teams}{6}
|
||||
|
||||
\fexample{teams}{6f}
|
||||
\ffreeexample{teams}{6}
|
||||
|
||||
|
@ -1,18 +1,18 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{threadprivate} Directive}
|
||||
\label{chap:threadprivate}
|
||||
\section{The \code{threadprivate} Directive}
|
||||
\label{sec:threadprivate}
|
||||
|
||||
The following examples demonstrate how to use the \code{threadprivate} directive
|
||||
to give each thread a separate counter.
|
||||
|
||||
\cexample{threadprivate}{1c}
|
||||
\cexample{threadprivate}{1}
|
||||
|
||||
\fexample{threadprivate}{1f}
|
||||
\fexample{threadprivate}{1}
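In C, the per-thread counter pattern can be sketched as follows. The identifiers are chosen for illustration; each thread that calls \plc{increment\_counter} updates its own copy of \plc{counter}.

```c
/* File-scope variable; the directive gives every thread its own copy. */
static int counter = 0;
#pragma omp threadprivate(counter)

/* Increments and returns the calling thread's private counter. */
int increment_counter(void)
{
    counter++;
    return counter;
}
```

Successive calls from the same thread return 1, 2, and so on, independently of calls made by other threads.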
|
||||
|
||||
\ccppspecificstart
|
||||
The following example uses \code{threadprivate} on a static variable:
|
||||
|
||||
\cnexample{threadprivate}{2c}
|
||||
\cnexample{threadprivate}{2}
|
||||
|
||||
The following example demonstrates unspecified behavior for the initialization
|
||||
of a \code{threadprivate} variable. A \code{threadprivate} variable is initialized
|
||||
@ -22,7 +22,7 @@ constructed using the value of \code{x} (which is modified by the statement
|
||||
region could be either 1 or 2. This problem is avoided for \code{b}, which uses
|
||||
an auxiliary \code{const} variable and a copy-constructor.
|
||||
|
||||
\cnexample{threadprivate}{3c}
|
||||
\cppnexample{threadprivate}{3}
|
||||
\ccppspecificend
|
||||
|
||||
The following examples show non-conforming uses and correct uses of the \code{threadprivate}
|
||||
@ -32,29 +32,25 @@ directive.
|
||||
The following example is non-conforming because the common block is not declared
|
||||
local to the subroutine that refers to it:
|
||||
|
||||
\fnexample{threadprivate}{2f}
|
||||
\fnexample{threadprivate}{2}
|
||||
|
||||
The following example is also non-conforming because the common block is not declared
|
||||
local to the subroutine that refers to it:
|
||||
|
||||
\fnexample{threadprivate}{3f}
|
||||
\fnexample{threadprivate}{3}
|
||||
|
||||
The following example is a correct rewrite of the previous example:
|
||||
% blue line floater at top of this page for "Fortran, cont."
|
||||
\begin{figure}[t!]
|
||||
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
|
||||
\end{figure}
|
||||
|
||||
\fnexample{threadprivate}{4f}
|
||||
\fnexample{threadprivate}{4}
|
||||
|
||||
The following is an example of the use of \code{threadprivate} for local variables:
|
||||
|
||||
\fnexample{threadprivate}{5f}
|
||||
% blue line floater at top of this page for "Fortran, cont."
|
||||
\begin{figure}[t!]
|
||||
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
|
||||
\end{figure}
|
||||
|
||||
\fnexample{threadprivate}{5}
|
||||
|
||||
The above program, if executed by two threads, will print one of the following
|
||||
two sets of output:
|
||||
|
||||
@ -85,8 +81,12 @@ or
|
||||
\code{i = 5}
|
||||
|
||||
The following is an example of the use of \code{threadprivate} for module variables:
|
||||
% blue line floater at top of this page for "Fortran, cont."
|
||||
\begin{figure}[t!]
|
||||
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
|
||||
\end{figure}
|
||||
|
||||
\fnexample{threadprivate}{6f}
|
||||
\fnexample{threadprivate}{6}
|
||||
\fortranspecificend
|
||||
|
||||
\cppspecificstart
|
||||
@ -95,12 +95,12 @@ for class-type \code{T}. \code{t1} is default constructed, \code{t2} is construc
|
||||
taking a constructor accepting one argument of integer type, \code{t3} is copy
|
||||
constructed with argument \code{f()}:
|
||||
|
||||
\cnexample{threadprivate}{4c}
|
||||
\cppnexample{threadprivate}{4}
|
||||
|
||||
The following example illustrates the use of \code{threadprivate} for static
|
||||
class members. The \code{threadprivate} directive for a static class member must
|
||||
be placed inside the class definition.
|
||||
|
||||
\cnexample{threadprivate}{5c}
|
||||
\cppnexample{threadprivate}{5}
|
||||
\cppspecificend
|
||||
|
||||
|
@ -1,7 +1,7 @@
|
||||
\pagebreak
|
||||
\chapter{The \code{workshare} Construct}
|
||||
\section{The \code{workshare} Construct}
|
||||
\fortranspecificstart
|
||||
\label{chap:workshare}
|
||||
\label{sec:workshare}
|
||||
|
||||
The following are examples of the \code{workshare} construct.
|
||||
|
||||
@ -10,14 +10,14 @@ the \code{parallel} region, and there is a barrier after the last statement.
|
||||
Implementations must enforce Fortran execution rules inside of the \code{workshare}
|
||||
block.
|
||||
|
||||
\fnexample{workshare}{1f}
|
||||
\fnexample{workshare}{1}
|
||||
|
||||
In the following example, the barrier at the end of the first \code{workshare}
|
||||
region is eliminated with a \code{nowait} clause. Threads doing \code{CC =
|
||||
DD} immediately begin work on \code{EE = FF} when they are done with \code{CC
|
||||
= DD}.
|
||||
|
||||
\fnexample{workshare}{2f}
|
||||
\fnexample{workshare}{2}
|
||||
% blue line floater at top of this page for "Fortran, cont."
|
||||
\begin{figure}[t!]
|
||||
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
|
||||
@ -27,7 +27,7 @@ The following example shows the use of an \code{atomic} directive inside a \code
|
||||
construct. The computation of \code{SUM(AA)} is workshared, but the update to
|
||||
\code{R} is atomic.
|
||||
|
||||
\fnexample{workshare}{3f}
|
||||
\fnexample{workshare}{3}
|
||||
|
||||
Fortran \code{WHERE} and \code{FORALL} statements are \emph{compound statements},
|
||||
made up of a \emph{control} part and a \emph{statement} part. When \code{workshare}
|
||||
@ -47,7 +47,7 @@ Each task gets worked on in order by the threads:
|
||||
\\
|
||||
\code{GG = HH}
|
||||
|
||||
\fnexample{workshare}{4f}
|
||||
\fnexample{workshare}{4}
|
||||
% blue line floater at top of this page for "Fortran, cont."
|
||||
\begin{figure}[t!]
|
||||
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
|
||||
@ -56,21 +56,21 @@ Each task gets worked on in order by the threads:
|
||||
In the following example, an assignment to a shared scalar variable is performed
|
||||
by one thread in a \code{workshare} while all other threads in the team wait.
|
||||
|
||||
\fnexample{workshare}{5f}
|
||||
\fnexample{workshare}{5}
|
||||
|
||||
The following example contains an assignment to a private scalar variable, which
|
||||
is performed by one thread in a \code{workshare} while all other threads wait.
|
||||
It is non-conforming because the private scalar variable is undefined after the
|
||||
assignment statement.
|
||||
|
||||
\fnexample{workshare}{6f}
|
||||
\fnexample{workshare}{6}
|
||||
|
||||
Fortran execution rules must be enforced inside a \code{workshare} construct.
|
||||
The same result is produced by the following program
|
||||
fragment regardless of whether the code is executed sequentially or inside an OpenMP
|
||||
program with multiple threads:
|
||||
|
||||
\fnexample{workshare}{7f}
|
||||
\fnexample{workshare}{7}
|
||||
\fortranspecificend
|
||||
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
\pagebreak
|
||||
\chapter{Worksharing Constructs Inside a \code{critical} Construct}
|
||||
\label{chap:worksharing_critical}
|
||||
\section{Worksharing Constructs Inside a \code{critical} Construct}
|
||||
\label{sec:worksharing_critical}
|
||||
|
||||
The following example demonstrates using a worksharing construct inside a \code{critical}
|
||||
construct. This example is conforming because the worksharing \code{single}
|
||||
@ -11,8 +11,8 @@ region, creates a new team of threads, and becomes the master of the new team.
|
||||
One of the threads in the new team enters the \code{single} region and increments
|
||||
\code{i} by \code{1}. At the end of this example \code{i} is equal to \code{2}.
|
||||
|
||||
\cexample{worksharing_critical}{1c}
|
||||
\cexample{worksharing_critical}{1}
|
||||
|
||||
\fexample{worksharing_critical}{1f}
|
||||
\fexample{worksharing_critical}{1}
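The nesting described above can be sketched as a function that returns the final value of \plc{i}. The function name and the return value are additions for illustration; the structure follows the description in the text (a \code{single} region inside a new \code{parallel} region, itself inside a \code{critical} region).

```c
/* Sketch of a worksharing construct nested inside a critical region.
   Returns the final value of i. */
int critical_work(void)
{
    int i = 1;

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            #pragma omp critical (name)
            {
                /* The thread in the critical region starts a new team. */
                #pragma omp parallel
                {
                    /* single binds to the new, inner parallel region,
                       so it is not closely nested in the critical region. */
                    #pragma omp single
                    {
                        i++;
                    }
                }
            }
        }
    }
    return i;
}
```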
|
||||
|
||||
|
||||
|
56
History.tex
@ -1,11 +1,39 @@
|
||||
\chapter{Document Revision History}
|
||||
\label{chap:history}
|
||||
|
||||
\section{Changes from 4.0.2 to 4.5.0}
|
||||
\begin{itemize}
|
||||
\item Reorganized into chapters of major topics
|
||||
\item Included file extensions in example labels to indicate source type
|
||||
\item Applied the explicit \code{map(tofrom)} for scalar variables
|
||||
in a number of examples to comply with
|
||||
the change of the default behavior for scalar variables from
|
||||
\code{map(tofrom)} to \code{firstprivate} in the 4.5 specification
|
||||
\item Added the following new examples:
|
||||
\begin{itemize}
|
||||
\item \code{linear} clause in loop constructs (\specref{sec:linear_in_loop})
|
||||
\item task priority (\specref{sec:task_priority})
|
||||
\item \code{taskloop} construct (\specref{sec:taskloop})
|
||||
\item \plc{directive-name} modifier in multiple \code{if} clauses on
|
||||
a combined construct (\specref{subsec:target_if})
|
||||
\item unstructured data mapping (\specref{sec:target_enter_exit_data})
|
||||
\item \code{link} clause for \code{declare}~\code{target} directive
|
||||
(\specref{subsec:declare_target_link})
|
||||
\item asynchronous target execution with \code{nowait} clause (\specref{sec:async_target_exec_depend})
|
||||
\item device memory routines and device pointers
|
||||
(\specref{subsec:target_mem_and_device_ptrs})
|
||||
\item doacross loop nest (\specref{sec:doacross})
|
||||
\item locks with hints (\specref{sec:locks})
|
||||
\item C/C++ array reduction (\specref{sec:reduction})
|
||||
\item C++ reference types in data sharing clauses (\specref{sec:cpp_reference})
|
||||
\end{itemize}
|
||||
\end{itemize}
|
||||
|
||||
\section{Changes from 4.0.1 to 4.0.2}
|
||||
|
||||
\begin{itemize}
|
||||
\item Names of examples were changed from numbers to mnemonics
|
||||
\item Added SIMD examples (\specref{chap:SIMD})
|
||||
\item Added SIMD examples (\specref{sec:SIMD})
|
||||
\item Applied miscellaneous fixes in several source codes
|
||||
\item Added the revision history
|
||||
\end{itemize}
|
||||
@ -14,8 +42,8 @@
|
||||
|
||||
Added the following new examples:
|
||||
\begin{itemize}
|
||||
\item the \code{proc\_bind} clause (\specref{chap:affinity})
|
||||
\item the \code{taskgroup} construct (\specref{chap:taskgroup})
|
||||
\item the \code{proc\_bind} clause (\specref{sec:affinity})
|
||||
\item the \code{taskgroup} construct (\specref{sec:taskgroup})
|
||||
\end{itemize}
|
||||
|
||||
\section{Changes from 3.1 to 4.0}
|
||||
@ -25,16 +53,16 @@ from the specification document.
|
||||
|
||||
Version 4.0 added the following new examples:
|
||||
\begin{itemize}
|
||||
\item task dependences (\specref{chap:task_dep})
|
||||
\item cancellation constructs (\specref{chap:cancellation})
|
||||
\item \code{target} construct (\specref{chap:target})
|
||||
\item \code{target} \code{data} construct (\specref{chap:target_data})
|
||||
\item \code{target} \code{update} construct (\specref{chap:target_update})
|
||||
\item \code{declare} \code{target} construct (\specref{chap:declare_target})
|
||||
\item \code{teams} constructs (\specref{chap:teams})
|
||||
\item task dependences (\specref{sec:task_depend})
|
||||
\item \code{target} construct (\specref{sec:target})
|
||||
\item \code{target} \code{data} construct (\specref{sec:target_data})
|
||||
\item \code{target} \code{update} construct (\specref{sec:target_update})
|
||||
\item \code{declare} \code{target} construct (\specref{sec:declare_target})
|
||||
\item \code{teams} constructs (\specref{sec:teams})
|
||||
\item asynchronous execution of a \code{target} region using tasks
|
||||
(\specref{chap:async_target})
|
||||
\item array sections in device constructs (\specref{chap:array_sections})
|
||||
\item device runtime routines (\specref{chap:device})
|
||||
\item Fortran ASSOCIATE construct (\specref{chap:associate})
|
||||
(\specref{subsec:async_target_with_tasks})
|
||||
\item array sections in device constructs (\specref{sec:array_sections})
|
||||
\item device runtime routines (\specref{sec:device})
|
||||
\item Fortran ASSOCIATE construct (\specref{sec:associate})
|
||||
\item cancellation constructs (\specref{sec:cancellation})
|
||||
\end{itemize}
|
||||
|
@ -34,13 +34,14 @@
|
||||
|
||||
\chapter*{Introduction}
|
||||
\label{chap:introduction}
|
||||
\addcontentsline{toc}{chapter}{\protect\numberline{}Introduction}
|
||||
This collection of programming examples supplements the OpenMP API for Shared
|
||||
Memory Parallelization specifications, and is not part of the formal specifications. It
|
||||
assumes familiarity with the OpenMP specifications, and shares the typographical
|
||||
conventions used in that document.
|
||||
|
||||
\notestart
|
||||
\noteheader – This first release of the OpenMP Examples reflects the OpenMP Version 4.0
|
||||
\noteheader – This first release of the OpenMP Examples reflects the OpenMP Version 4.5
|
||||
specifications. Additional examples are being developed and will be published in future
|
||||
releases of this document.
|
||||
\noteend
|
||||
|
71
Makefile
@ -1,75 +1,20 @@
|
||||
# Makefile for the OpenMP Examples document in LaTeX format.
|
||||
# For more information, see the master document, openmp-examples.tex.
|
||||
|
||||
version=4.0.2
|
||||
version=4.5.0
|
||||
default: openmp-examples.pdf
|
||||
|
||||
|
||||
CHAPTERS=Title_Page.tex \
|
||||
Introduction_Chapt.tex \
|
||||
Examples_Chapt.tex \
|
||||
Examples_ploop.tex \
|
||||
Examples_mem_model.tex \
|
||||
Examples_cond_comp.tex \
|
||||
Examples_icv.tex \
|
||||
Examples_parallel.tex \
|
||||
Examples_nthrs_nesting.tex \
|
||||
Examples_nthrs_dynamic.tex \
|
||||
Examples_affinity.tex \
|
||||
Examples_fort_do.tex \
|
||||
Examples_fort_loopvar.tex \
|
||||
Examples_nowait.tex \
|
||||
Examples_collapse.tex \
|
||||
Examples_psections.tex \
|
||||
Examples_fpriv_sections.tex \
|
||||
Examples_single.tex \
|
||||
Examples_tasking.tex \
|
||||
Examples_task_dep.tex \
|
||||
Examples_taskgroup.tex \
|
||||
Examples_taskyield.tex \
|
||||
Examples_workshare.tex \
|
||||
Examples_master.tex \
|
||||
Examples_critical.tex \
|
||||
Examples_worksharing_critical.tex \
|
||||
Examples_barrier_regions.tex \
|
||||
Examples_atomic.tex \
|
||||
Examples_atomic_restrict.tex \
|
||||
Examples_flush_nolist.tex \
|
||||
Examples_standalone.tex \
|
||||
Examples_ordered.tex \
|
||||
Examples_cancellation.tex \
|
||||
Examples_threadprivate.tex \
|
||||
Examples_pra_iterator.tex \
|
||||
Examples_fort_sp_common.tex \
|
||||
Examples_default_none.tex \
|
||||
Examples_fort_race.tex \
|
||||
Examples_private.tex \
|
||||
Examples_fort_sa_private.tex \
|
||||
Examples_carrays_fpriv.tex \
|
||||
Examples_lastprivate.tex \
|
||||
Examples_reduction.tex \
|
||||
Examples_copyin.tex \
|
||||
Examples_copyprivate.tex \
|
||||
Examples_nested_loop.tex \
|
||||
Examples_nesting_restrict.tex \
|
||||
Examples_set_dynamic_nthrs.tex \
|
||||
Examples_get_nthrs.tex \
|
||||
Examples_init_lock.tex \
|
||||
Examples_lock_owner.tex \
|
||||
Examples_simple_lock.tex \
|
||||
Examples_nestable_lock.tex \
|
||||
Examples_SIMD.tex \
|
||||
Examples_target.tex \
|
||||
Examples_target_data.tex \
|
||||
Examples_target_update.tex \
|
||||
Examples_declare_target.tex \
|
||||
Examples_teams.tex \
|
||||
Examples_async_target.tex \
|
||||
Examples_array_sections.tex \
|
||||
Examples_device.tex \
|
||||
Examples_associate.tex \
|
||||
Examples_*.tex \
|
||||
History.tex
|
||||
|
||||
SOURCES=sources/*.c \
|
||||
sources/*.cpp \
|
||||
sources/*.f90 \
|
||||
sources/*.f
|
||||
|
||||
INTERMEDIATE_FILES=openmp-examples.pdf \
|
||||
openmp-examples.toc \
|
||||
openmp-examples.idx \
|
||||
@ -79,7 +24,7 @@ INTERMEDIATE_FILES=openmp-examples.pdf \
|
||||
openmp-examples.out \
|
||||
openmp-examples.log
|
||||
|
||||
openmp-examples.pdf: $(CHAPTERS) openmp.sty openmp-examples.tex openmp-logo.png
|
||||
openmp-examples.pdf: $(CHAPTERS) $(SOURCES) openmp.sty openmp-examples.tex openmp-logo.png
|
||||
rm -f $(INTERMEDIATE_FILES)
|
||||
pdflatex -interaction=batchmode -file-line-error openmp-examples.tex
|
||||
pdflatex -interaction=batchmode -file-line-error openmp-examples.tex
|
||||
|
@ -27,7 +27,7 @@ Source codes for OpenMP \VER{} Examples can be downloaded from
|
||||
\href{https://github.com/OpenMP/Examples/tree/v\VER}{github}.\\
|
||||
|
||||
\begin{adjustwidth}{0pt}{1em}\setlength{\parskip}{0.25\baselineskip}%
|
||||
Copyright © 1997-2015 OpenMP Architecture Review Board.\\
|
||||
Copyright © 1997-2016 OpenMP Architecture Review Board.\\
|
||||
Permission to copy without fee all or part of this material is granted,
|
||||
provided the OpenMP Architecture Review Board copyright notice and
|
||||
the title of this document appear. Notice is given that copying is by
|
||||
|
@ -1,4 +1,4 @@
|
||||
Copyright (c) 1997-2015 OpenMP Architecture Review Board.
|
||||
Copyright (c) 1997-2016 OpenMP Architecture Review Board.
|
||||
All rights reserved.
|
||||
|
||||
Permission to redistribute and use without fee all or part of the source
|
||||
|
11
openmp-examples.tcp
Normal file
@ -0,0 +1,11 @@
|
||||
[FormatInfo]
|
||||
Type=TeXnicCenterProjectInformation
|
||||
Version=4
|
||||
|
||||
[ProjectInfo]
|
||||
MainFile=ClassicThesis.tex
|
||||
UseBibTeX=1
|
||||
UseMakeIndex=0
|
||||
ActiveProfile=LaTeX ⇨ PDF
|
||||
ProjectLanguage=en
|
||||
ProjectDialect=US
|
@ -48,8 +48,8 @@
|
||||
\documentclass[10pt,letterpaper,twoside,makeidx,hidelinks]{scrreprt}
|
||||
|
||||
% Text to appear in the footer on even-numbered pages:
|
||||
\newcommand{\VER}{4.0.2}
|
||||
\newcommand{\VERDATE}{March 2015}
|
||||
\newcommand{\VER}{4.5.0}
|
||||
\newcommand{\VERDATE}{November 2016}
|
||||
\newcommand{\footerText}{OpenMP Examples Version \VER{} - \VERDATE}
|
||||
|
||||
% Unified style sheet for OpenMP documents:
|
||||
@ -77,71 +77,120 @@
|
||||
|
||||
\setcounter{chapter}{0} % start chapter numbering here
|
||||
|
||||
\input{Examples_ploop}
|
||||
\input{Examples_mem_model}
|
||||
\input{Examples_cond_comp}
|
||||
\input{Examples_icv}
|
||||
\input{Examples_parallel}
|
||||
\input{Examples_nthrs_nesting}
|
||||
\input{Examples_nthrs_dynamic}
|
||||
\input{Examples_affinity}
|
||||
\input{Examples_fort_do}
|
||||
\input{Examples_fort_loopvar}
|
||||
\input{Examples_nowait}
|
||||
\input{Examples_collapse}
|
||||
\input{Examples_psections}
|
||||
\input{Examples_fpriv_sections}
|
||||
\input{Examples_single}
|
||||
\input{Examples_tasking}
|
||||
\input{Examples_task_dep}
|
||||
\input{Examples_taskgroup}
|
||||
\input{Examples_taskyield}
|
||||
\input{Examples_workshare}
|
||||
\input{Examples_master}
|
||||
\input{Examples_critical}
|
||||
\input{Examples_worksharing_critical}
|
||||
\input{Examples_barrier_regions}
|
||||
\input{Examples_atomic}
|
||||
\input{Examples_atomic_restrict}
|
||||
\input{Examples_flush_nolist}
|
||||
\input{Examples_standalone}
|
||||
\input{Examples_ordered}
|
||||
\input{Examples_cancellation}
|
||||
\input{Examples_threadprivate}
|
||||
\input{Examples_pra_iterator}
|
||||
\input{Examples_fort_sp_common}
|
||||
\input{Examples_default_none}
|
||||
\input{Examples_fort_race}
|
||||
\input{Examples_private}
|
||||
\input{Examples_fort_sa_private}
|
||||
\input{Examples_carrays_fpriv}
|
||||
\input{Examples_lastprivate}
|
||||
\input{Examples_reduction}
|
||||
\input{Examples_copyin}
|
||||
\input{Examples_copyprivate}
|
||||
\input{Examples_nested_loop}
|
||||
\input{Examples_nesting_restrict}
|
||||
\input{Examples_set_dynamic_nthrs}
|
||||
\input{Examples_get_nthrs}
|
||||
\input{Examples_init_lock}
|
||||
\input{Examples_lock_owner}
|
||||
\input{Examples_simple_lock}
|
||||
\input{Examples_nestable_lock}
|
||||
\input{Examples_SIMD}
|
||||
\input{Examples_target}
|
||||
\input{Examples_target_data}
|
||||
\input{Examples_target_update}
|
||||
\input{Examples_declare_target}
|
||||
\input{Examples_teams}
|
||||
\input{Examples_async_target}
|
||||
\input{Examples_array_sections}
|
||||
\input{Examples_device}
|
||||
\input{Examples_associate}
|
||||
\input{Chap_parallel_execution}
|
||||
\input{Examples_ploop}
|
||||
\input{Examples_parallel}
|
||||
\input{Examples_nthrs_nesting}
|
||||
\input{Examples_nthrs_dynamic}
|
||||
\input{Examples_fort_do}
|
||||
\input{Examples_nowait}
|
||||
\input{Examples_collapse}
|
||||
% linear Clause 475
|
||||
\input{Examples_linear_in_loop}
|
||||
\input{Examples_psections}
|
||||
\input{Examples_fpriv_sections}
|
||||
\input{Examples_single}
|
||||
\input{Examples_workshare}
|
||||
\input{Examples_master}
|
||||
\input{Examples_pra_iterator}
|
||||
\input{Examples_set_dynamic_nthrs}
|
||||
\input{Examples_get_nthrs}
|
||||
|
||||
\input{Chap_affinity}
|
||||
\input{Examples_affinity}
|
||||
\input{Examples_affinity_query}
|
||||
|
||||
\input{Chap_tasking}
|
||||
\input{Examples_tasking}
|
||||
\input{Examples_task_priority}
|
||||
\input{Examples_task_dep}
|
||||
\input{Examples_taskgroup}
|
||||
\input{Examples_taskyield}
|
||||
\input{Examples_taskloop}
|
||||
|
||||
\input{Chap_devices}
|
||||
\input{Examples_target}
|
||||
\input{Examples_target_data}
|
||||
\input{Examples_target_unstructured_data}
|
||||
\input{Examples_target_update}
|
||||
\input{Examples_declare_target}
|
||||
% Link clause 474
|
||||
\input{Examples_teams}
|
||||
\input{Examples_async_target_depend}
|
||||
\input{Examples_async_target_with_tasks}
|
||||
%Title change of 57.1 and 57.2
|
||||
%New subsection
|
||||
\input{Examples_async_target_nowait}
|
||||
\input{Examples_async_target_nowait_depend}
|
||||
\input{Examples_array_sections}
|
||||
% Structure Element in map 487
|
||||
\input{Examples_device}
|
||||
% MemoryRoutine and Device ptr 473
|
||||
|
||||
\input{Chap_SIMD}
|
||||
\input{Examples_SIMD}
|
||||
% Forward Depend 370
|
||||
% simdlen 476
|
||||
% simd linear modifier 480
|
||||
|
||||
\input{Chap_synchronization}
|
||||
\input{Examples_critical}
|
||||
\input{Examples_worksharing_critical}
|
||||
\input{Examples_barrier_regions}
|
||||
\input{Examples_atomic}
|
||||
\input{Examples_atomic_restrict}
|
||||
\input{Examples_flush_nolist}
|
||||
\input{Examples_ordered}
|
||||
% Doacross loop 405
|
||||
\input{Examples_doacross}
|
||||
\input{Examples_locks}
|
||||
\input{Examples_init_lock}
|
||||
\input{Examples_init_lock_with_hint}
|
||||
\input{Examples_lock_owner}
|
||||
\input{Examples_simple_lock}
|
||||
\input{Examples_nestable_lock}
|
||||
% % LOCK with Hints 478
|
||||
% % Hint Clause xxxxxx (included after init_lock)
|
||||
% % Lock routines with hint
|
||||
|
||||
|
||||
\input{Chap_data_environment}
|
||||
\input{Examples_threadprivate}
|
||||
\input{Examples_default_none}
|
||||
\input{Examples_private}
|
||||
\input{Examples_fort_loopvar}
|
||||
\input{Examples_fort_sp_common}
|
||||
\input{Examples_fort_sa_private}
|
||||
\input{Examples_carrays_fpriv}
|
||||
\input{Examples_lastprivate}
|
||||
\input{Examples_reduction}
|
||||
% User UDR 287
|
||||
% C array reduction 377
|
||||
\input{Examples_copyin}
|
||||
\input{Examples_copyprivate}
|
||||
\input{Examples_cpp_reference}
|
||||
% Fortran 2003 features 482
|
||||
\input{Examples_associate} %section--> subsection
|
||||
|
||||
\input{Chap_memory_model}
|
||||
\input{Examples_mem_model}
|
||||
\input{Examples_fort_race}
|
||||
|
||||
\input{Chap_program_control}
|
||||
\input{Examples_cond_comp}
|
||||
\input{Examples_icv}
|
||||
% If multi-ifs 471
|
||||
\input{Examples_standalone}
|
||||
\input{Examples_cancellation}
|
||||
% New Section Nested Regions
|
||||
\input{Examples_nested_loop}
|
||||
\input{Examples_nesting_restrict}
|
||||
|
||||
|
||||
\setcounter{chapter}{0} % restart chapter numbering with "letter A"
|
||||
\renewcommand{\thechapter}{\Alph{chapter}}%
|
||||
\appendix
|
||||
|
||||
\input{History}
|
||||
|
||||
\end{document}
|
||||
|
||||
|
47
openmp.sty
@ -78,6 +78,7 @@
|
||||
|
||||
\usepackage{comment} % allow use of \begin{comment}
|
||||
\usepackage{ifpdf,ifthen} % allow conditional tests in LaTeX definitions
|
||||
\usepackage{makecell} % Allows common formatting in cells with \thead & \makecell
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
@@ -416,8 +417,10 @@
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 % Code example formatting for the Examples document
 % This defines:
-% /cexample formats blue markers, caption, and code for C/C++ examples
-% /fexample formats blue markers, caption, and code for Fortran examples
+% /cexample formats blue markers, caption, and code for C examples
+% /cppexample formats blue markers, caption, and code for C++ examples
+% /fexample formats blue markers, caption, and code for Fortran (fixed) examples
+% /ffreeexample formats blue markers, caption, and code for Fortran90 (free) examples
 % Thanks to Jin, Haoqiang H. for the original definitions of the following:
 
 \usepackage{color,fancyvrb} % for \VerbatimInput
@@ -434,36 +437,40 @@
 
 \newcommand{\escstr}[1]{\myreplace{_}{\_}{#1}}
 
-\def\exampleheader#1#2{%
+\def\exampleheader#1#2#3#4{%
 \ifthenelse{ \equal{#1}{} }{
 \def\cname{#2}
 \def\ename\cname
 }{
-\def\cname{#1.#2}
+\def\cname{#1.#2.#3}
 % Use following line for old numbering
-% \def\ename{\thechapter.#2}
+% \def\ename{\thechapter.#2.#3}
 % Use following for mneumonics
-\def\ename{\escstr{#1}.#2}
+\def\ename{\escstr{#1}.#2.#3}
 }
 \noindent
 \textit{Example \ename}
 %\vspace*{-3mm}
+\code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=#4,fontsize=\small]%
+%\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=#4,fontsize=\small]%
+%\code{\VerbatimInput[firstline=#4,fontsize=\small]%
+{sources/Example_\cname}}
 }
 
 \def\cnexample#1#2{%
-\exampleheader{#1}{#2}
-\code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=8,fontsize=\small]%
-%\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=8,fontsize=\small]%
-%\code{\VerbatimInput[firstline=8,fontsize=\small]%
-{sources/Example_\cname.c}}
+\exampleheader{#1}{#2}{c}{8}
+}
+
+\def\cppnexample#1#2{%
+\exampleheader{#1}{#2}{cpp}{8}
 }
 
 \def\fnexample#1#2{%
-\exampleheader{#1}{#2}
-\code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=6,fontsize=\small]%
-%\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=6,fontsize=\small]%
-%\code{\VerbatimInput[firstline=6,fontsize=\small]%
-{sources/Example_\cname.f}}
+\exampleheader{#1}{#2}{f}{6}
+}
+
+\def\ffreenexample#1#2{%
+\exampleheader{#1}{#2}{f90}{6}
 }
 
 \newcommand\cexample[2]{%
@@ -474,7 +481,7 @@
 
 \newcommand\cppexample[2]{%
 \needspace{5\baselineskip}\cppspecificstart
-\cnexample{#1}{#2}
+\cppnexample{#1}{#2}
 \cppspecificend
 }
 
@@ -484,6 +491,12 @@
 \fortranspecificend
 }
 
+\newcommand\ffreeexample[2]{%
+\needspace{5\baselineskip}\fortranspecificstart
+\ffreenexample{#1}{#2}
+\fortranspecificend
+}
+
 
 % Set default fonts:
 \rmfamily\mdseries\upshape\normalsize
Some files were not shown because too many files have changed in this diff.