synced with the 4.5.0 implementation of the examples-internal repo

This commit is contained in:
Henry Jin 2016-11-10 16:11:12 -08:00
parent c65fe47427
commit 156a12ca09
447 changed files with 3166 additions and 788 deletions

View File

@ -1,3 +1,9 @@
[20-May-2016] Version 4.5.0
Changes from 4.0.2ltx
1. Reorganization into topic chapters
2. Change file suffixes (f/f90 => Fixed/Free format) C++ => cpp
[2-Feb-2015] Version 4.0.2
Changes from 4.0.1ltx

48
Chap_SIMD.tex Normal file
View File

@ -0,0 +1,48 @@
\pagebreak
\chapter{SIMD}
\label{chap:simd}
Single instruction, multiple data (SIMD) is a form of parallel execution
in which the same operation is performed on multiple data elements
independently in hardware vector processing units (VPU), also called SIMD units.
The addition of two vectors to form a third vector is a SIMD operation.
Many processors have SIMD (vector) units that can simultaneously perform
2, 4, 8 or more executions of the same operation (within a single SIMD unit).
Loops without loop-carried backward dependency (or with dependency preserved using
ordered simd) are candidates for vectorization by the compiler for
execution with SIMD units. In addition, with state-of-the-art vectorization
technology and \code{declare simd} construct extensions for function vectorization
in the OpenMP 4.5 specification, loops with function calls can be vectorized as well.
The basic idea is that a scalar function call in a loop can be replaced by a vector version
of the function, and the loop can be vectorized simultaneously by combining a loop
vectorization (\code{simd} directive on the loop) and a function
vectorization (\code{declare simd} directive on the function).
A \code{simd} construct states that SIMD operations be performed on the
data within the loop. A number of clauses are available to provide
data-sharing attributes (\code{private}, \code{linear}, \code{reduction} and
\code{lastprivate}). Other clauses provide vector length preferences/restrictions
(\code{simdlen} / \code{safelen}), loop collapsing (\code{collapse}), and data
alignment (\code{aligned}).
The \code{declare simd} directive designates
that a vector version of the function should also be constructed for
execution within loops that contain the function and have a \code{simd}
directive. Clauses provide argument specifications (\code{linear},
\code{uniform}, and \code{aligned}), a requested vector length
(\code{simdlen}), and designate whether the function is always or never
called conditionally in a loop (\code{inbranch} / \code{notinbranch}).
The latter clauses are used for performance optimization.
Also, the \code{simd} construct has been combined with the worksharing loop
constructs (\code{for simd} and \code{do simd}) to enable simultaneous thread
execution in different SIMD units.
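For illustration only (not one of the numbered example files), the following minimal C sketch combines the two forms: a \code{declare simd} directive requests a vector version of a function, and a \code{simd} directive vectorizes the loop that calls it. The function name \plc{add1} and the array names are assumed purely for the sketch.
\begin{verbatim}
#include <stdio.h>
#define N 64

#pragma omp declare simd      /* also generate a SIMD (vector) version */
float add1(float a, float b)
{
   return a + b;
}

int main(void)
{
   float x[N], y[N], z[N];
   for (int i = 0; i < N; i++) { x[i] = i; y[i] = 2.0f * i; }

#pragma omp simd   /* vectorize the loop; calls may use the vector add1 */
   for (int i = 0; i < N; i++)
      z[i] = add1(x[i], y[i]);

   printf("z[1] = %f\n", z[1]);
   return 0;
}
\end{verbatim}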
%Hence, the \code{simd} construct can be
%used alone on a loop to direct vectorization (SIMD execution), or in
%combination with a parallel loop construct to include thread parallelism
%(a parallel loop sequentially followed by a \code{simd} construct,
%or a combined construct such as \code{parallel do simd} or
%\code{parallel for simd}).

118
Chap_affinity.tex Normal file
View File

@ -0,0 +1,118 @@
\pagebreak
\chapter{OpenMP Affinity}
\label{chap:openmp_affinity}
OpenMP Affinity consists of a \code{proc\_bind} policy (thread affinity policy) and a specification of
places (\texttt{"}location units\texttt{"} or \plc{processors} that may be cores, hardware
threads, sockets, etc.).
OpenMP Affinity enables users to bind computations to specific places.
The placement will hold for the duration of the parallel region.
However, the runtime is free to migrate the OpenMP threads
to different cores (hardware threads, sockets, etc.) prescribed within a given place,
if two or more cores (hardware threads, sockets, etc.) have been assigned to a given place.
Often the binding can be managed without resorting to explicitly setting places.
Without the specification of places in the \code{OMP\_PLACES} variable,
the OpenMP runtime will distribute and bind threads using the entire range of processors for
the OpenMP program, according to the \code{OMP\_PROC\_BIND} environment variable
or the \code{proc\_bind} clause. When places are specified, the OpenMP runtime
binds threads to the places according to a default distribution policy, or
according to the policy specified in the \code{OMP\_PROC\_BIND} environment variable or the
\code{proc\_bind} clause.
In the OpenMP Specifications document a processor refers to an execution unit that
is enabled for an OpenMP thread to use. A processor is a core when there is
no SMT (Simultaneous Multi-Threading) support or SMT is disabled. When
SMT is enabled, a processor is a hardware thread (HW-thread). (This is the
usual case; but actually, the execution unit is implementation defined.) Processors
are numbered sequentially from 0 to the number of cores minus one (without SMT), or
from 0 to the number of HW-threads minus one (with SMT). OpenMP places use the processor number to designate
binding locations (unless an \texttt{"}abstract name\texttt{"} is used.)
The processors available to a process may be a subset of the system's
processors. This restriction may be the result of a
wrapper process controlling the execution (such as \code{numactl} on Linux systems),
compiler options, library-specific environment variables, or default
kernel settings. For instance, the execution of multiple MPI processes,
launched on a single compute node, will each have a subset of processors as
determined by the MPI launcher or set by MPI affinity environment
variables for the MPI library. %Forked threads within an MPI process
%(for a hybrid execution of MPI and OpenMP code) inherit the valid
%processor set for execution from the parent process (the initial task region)
%when a parallel region forks threads. The binding policy set in
%\code{OMP\_PROC\_BIND} or the \code{proc\_bind} clause will be applied to
%the subset of processors available to \plc{the particular} MPI process.
%Also, setting an explicit list of processor numbers in the \code{OMP\_PLACES}
%variable before an MPI launch (which involves more than one MPI process) will
%result in unspecified behavior (and doesn't make sense) because the set of
%processors in the places list must not contain processors outside the subset
%of processors for an MPI process. A separate \code{OMP\_PLACES} variable must
%be set for each MPI process, and is usually accomplished by launching a script
%which sets \code{OMP\_PLACES} specifically for the MPI process.
Threads of a team are positioned onto places in a compact manner, a
scattered distribution, or onto the master's place, by setting the
\code{OMP\_PROC\_BIND} environment variable or the \code{proc\_bind} clause to
\plc{close}, \plc{spread}, or \plc{master}, respectively. When
\code{OMP\_PROC\_BIND} is set to FALSE, no binding is enforced;
when the value is TRUE, threads are bound in an implementation-defined manner
to a set of places in the \code{OMP\_PLACES} variable, or to places
defined by the implementation if the \code{OMP\_PLACES} variable
is not set.
The \code{OMP\_PLACES} variable can also be set to an abstract name
(\plc{threads}, \plc{cores}, \plc{sockets}) to specify that a place is
either a single hardware thread, a core, or a socket, respectively.
This form of the \code{OMP\_PLACES} variable is most useful when the
number of threads is equal to the number of hardware threads, cores
or sockets. It can also be used with a \plc{close} or \plc{spread}
distribution policy when the equality does not hold.
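As a minimal sketch (assuming, for illustration, a run with \code{OMP\_PLACES=cores} and four threads), the following C code binds threads with a \code{spread} policy and reports each thread's place using OpenMP 4.5 place query routines.
\begin{verbatim}
#include <stdio.h>
#include <omp.h>

/* Suggested environment for the sketch (an assumption, not a requirement):
 *   export OMP_PLACES=cores
 *   export OMP_NUM_THREADS=4
 */
int main(void)
{
#pragma omp parallel proc_bind(spread)
   {
      printf("thread %d runs on place %d of %d\n",
             omp_get_thread_num(),
             omp_get_place_num(), omp_get_num_places());
   }
   return 0;
}
\end{verbatim}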
% We need an example of using sockets, cores and threads:
% case 1 cores:
% Hyper-Threads on (2 hardware threads per core)
% 1 socket x 4 cores x 2 HW-threads
%
% export OMP_NUM_THREADS=4
% export OMP_PLACES=threads
%
% core # 0 1 2 3
% processor # 0,1 2,3 4,5 6,7
% thread # 0 * _ _ _ _ _ _ _ #mask for thread 0
% thread # 1 _ _ * _ _ _ _ _ #mask for thread 1
% thread # 2 _ _ _ _ * _ _ _ #mask for thread 2
% thread # 3 _ _ _ _ _ _ * _ #mask for thread 3
% case 2 threads:
%
% Hyper-Threads on (2 hardware threads per core)
% 1 socket x 4 cores x 2 HW-threads
%
% export OMP_NUM_THREADS=4
% export OMP_PLACES=cores
%
% core # 0 1 2 3
% processor # 0,1 2,3 4,5 6,7
% thread # 0 * * _ _ _ _ _ _ #mask for thread 0
% thread # 1 _ _ * * _ _ _ _ #mask for thread 1
% thread # 2 _ _ _ _ * * _ _ #mask for thread 2
% thread # 3 _ _ _ _ _ _ * * #mask for thread 3
% case 3 sockets:
%
% No Hyper-Threads
% 3 socket x 4 cores
%
% export OMP_NUM_THREADS=3
% export OMP_PLACES=sockets
%
% socket # 0 1 2
% processor # 0,1,2,3 4,5,6,7 8,9,10,11
% thread # 0 * * * * _ _ _ _ _ _ _ _ #mask for thread 0
% thread # 0 _ _ _ _ * * * * _ _ _ _ #mask for thread 1
% thread # 0 _ _ _ _ _ _ _ _ * * * * #mask for thread 2

75
Chap_data_environment.tex Normal file
View File

@ -0,0 +1,75 @@
\pagebreak
\chapter{Data Environment}
\label{chap:data_environment}
The OpenMP \plc{data environment} contains data attributes of variables and
objects. Many constructs (such as \code{parallel}, \code{simd}, \code{task})
accept clauses to control \plc{data-sharing} attributes
of referenced variables in the construct, where \plc{data-sharing} applies to
whether the attribute of the variable is \plc{shared},
is \plc{private} storage, or has special operational characteristics
(as found in the \code{firstprivate}, \code{lastprivate}, \code{linear}, or \code{reduction} clause).
The data environment for a device (distinguished as a \plc{device data environment})
is controlled on the host by \plc{data-mapping} attributes, which determine the
relationship of the data on the host, the \plc{original} data, and the data on the
device, the \plc{corresponding} data.
\bigskip
DATA-SHARING ATTRIBUTES
Data-sharing attributes of variables can be classified as being \plc{predetermined},
\plc{explicitly determined} or \plc{implicitly determined}.
Certain variables and objects have predetermined attributes.
A commonly found case is the loop iteration variable in associated loops
of a \code{for} or \code{do} construct. It has a private data-sharing attribute.
Variables with predetermined data-sharing attributes cannot be listed in a data-sharing clause, with some
exceptions (mainly concerning loop iteration variables).
Variables with explicitly determined data-sharing attributes are those that are
referenced in a given construct and are listed in a data-sharing attribute
clause on the construct. Some of the common data-sharing clauses are:
\code{shared}, \code{private}, \code{firstprivate}, \code{lastprivate},
\code{linear}, and \code{reduction}. % Are these all of them?
Variables with implicitly determined data-sharing attributes are those
that are referenced in a given construct, do not have predetermined
data-sharing attributes, and are not listed in a data-sharing
attribute clause of an enclosing construct.
For a complete list of variables and objects with predetermined and
implicitly determined attributes, please refer to the
\plc{Data-sharing Attribute Rules for Variables Referenced in a Construct}
subsection of the OpenMP Specifications document.
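For illustration only, the following C sketch shows several of these clauses together; the variable names are assumed for the sketch. The loop iteration variable \plc{i} has a predetermined private attribute, while the other attributes are explicitly determined by the clauses.
\begin{verbatim}
#include <stdio.h>

int main(void)
{
   int n = 8, offset = 10, sum = 0, last = 0;

#pragma omp parallel for firstprivate(offset) lastprivate(last) \
                         reduction(+:sum) shared(n)
   for (int i = 0; i < n; i++)
   {
      int v = i + offset;  /* v is private (declared inside the region)   */
      sum  += v;           /* partial sums combined across threads        */
      last  = v;           /* value from the sequentially last iteration  */
   }

   printf("sum = %d, last = %d\n", sum, last);  /* sum = 108, last = 17 */
   return 0;
}
\end{verbatim}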
\bigskip
DATA-MAPPING ATTRIBUTES
The \code{map} clause on a device construct explicitly specifies how the list items in
the clause are mapped from the encountering task's data environment (on the host)
to the corresponding item in the device data environment (on the device).
The common \plc{list items} are arrays, array sections, scalars, pointers, and
structure elements (members).
Procedures and global variables have predetermined data mapping if they appear
within the list or block of a \code{declare target} directive. Also, a C/C++ pointer
is mapped as a zero-length array section, as is a C++ variable that is a reference to a pointer.
% Waiting for response from Eric on this.
Without explicit mapping, non-scalar and non-pointer variables within the scope of the \code{target}
construct are implicitly mapped with a \plc{map-type} of \code{tofrom}.
Without explicit mapping, scalar variables within the scope of the \code{target}
construct are not mapped, but have an implicit firstprivate data-sharing
attribute. (That is, the value of the original variable is given to a private
variable of the same name on the device.) This behavior can be changed with
the \code{defaultmap} clause.
The \code{map} clause can appear on \code{target}, \code{target data} and
\code{target enter/exit data} constructs. The operations of creation and
removal of device storage as well as assignment of the original list item
values to the corresponding list items may be complicated when the list
item appears on multiple constructs or when the host and device storage
is shared. In these cases the item's reference count, the number of times
it has been referenced (+1 on entry and -1 on exit) in nested (structured)
map regions and/or accumulative (unstructured) mappings, determines the operation.
Details of the \code{map} clause and reference count operation are specified
in the \plc{map Clause} subsection of the OpenMP Specifications document.
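As a brief illustrative sketch (variable names assumed for the sketch), the following C code maps one array \plc{to} the device, maps another \plc{from} the device, and relies on the implicit \code{firstprivate} behavior for the scalar.
\begin{verbatim}
#include <stdio.h>
#define N 100

int main(void)
{
   double a[N], b[N];
   double scale = 2.0;          /* scalar: implicitly firstprivate on target */
   for (int i = 0; i < N; i++) a[i] = i;

#pragma omp target map(to: a) map(from: b)
   for (int i = 0; i < N; i++)
      b[i] = scale * a[i];

   printf("b[2] = %f\n", b[2]);   /* 4.0 */
   return 0;
}
\end{verbatim}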

53
Chap_devices.tex Normal file
View File

@ -0,0 +1,53 @@
\pagebreak
\chapter{Devices}
\label{chap:devices}
The \code{target} construct consists of a \code{target} directive
and an execution region. The \code{target} region is executed on
the default device or the device specified in the \code{device}
clause.
In OpenMP version 4.0, by default, all variables within the lexical
scope of the construct are copied \plc{to} and \plc{from} the
device, unless the device is the host, or the data exists on the
device from a previously executed device data construct that
has created space on the device and possibly copied host
data to the device storage.
The constructs that explicitly
create storage, transfer data, and free storage on the device
are categorized as structured and unstructured. The
\code{target} \code{data} construct is structured. It creates
a data region around \code{target} constructs, and is
convenient for providing persistent data throughout multiple
\code{target} regions. The \code{target} \code{enter} \code{data} and
\code{target} \code{exit} \code{data} constructs are unstructured, because
they can occur anywhere and do not support a "structure"
(a region) for enclosing \code{target} constructs, as does the
\code{target} \code{data} construct.
The \code{map} clause is used on \code{target}
constructs and the device data constructs to map host data. It
specifies the device storage and data movement \code{to} and \code{from}
the device, and controls the storage duration.
There is an important change in the OpenMP 4.5 specification
that alters the data model for scalar variables and C/C++ pointer variables.
The default behavior for scalar variables and C/C++ pointer variables
in a 4.5-compliant code is \code{firstprivate}. Example
codes that have been updated to reflect this new behavior are
annotated with a description of the changes required
for correct execution. Often it is a simple matter of mapping
the variable as \code{tofrom} to obtain the intended 4.0 behavior.
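A minimal sketch of this point (the variable name \plc{counter} is assumed for illustration): under the 4.5 default the scalar would be \code{firstprivate} and the device update would be lost, so the scalar is mapped \code{tofrom} to recover the 4.0 behavior.
\begin{verbatim}
#include <stdio.h>

int main(void)
{
   int counter = 0;

   /* With the OpenMP 4.5 default, counter would be firstprivate on the
    * device and the update would not be returned to the host.
    * Mapping it tofrom restores the intended (4.0) behavior.          */
#pragma omp target map(tofrom: counter)
   counter += 1;

   printf("counter = %d\n", counter);   /* 1 */
   return 0;
}
\end{verbatim}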
In OpenMP version 4.5 the mechanism for target
execution is specified as occurring through a \plc{target task}.
When the \code{target} construct is encountered, a new
\plc{target task} is generated. The \plc{target task}
completes after the \code{target} region has executed and all data
transfers have finished.
This new specification does not affect the execution of
pre-4.5 code; it is a necessary element for asynchronous
execution of the \code{target} region when using the new \code{nowait}
clause introduced in OpenMP 4.5.

105
Chap_memory_model.tex Normal file
View File

@ -0,0 +1,105 @@
\pagebreak
\chapter{Memory Model}
\label{chap:memory_model}
In this chapter, examples illustrate race conditions on access to variables with
shared data-sharing attributes. A race condition can exist when two
or more threads are involved in accessing a variable in which not all
of the accesses are reads; that is, a WaR, RaW or WaW condition
exists (R=read, a=after, W=write). A RaR does not produce a race condition.
Ensuring thread execution order at
the processor level is not enough to avoid race conditions, because the
local storage at the processor level (registers, caches, etc.)
must be synchronized so that a consistent view of the variable in the
memory hierarchy can be seen by the threads accessing the variable.
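For illustration only (not one of the example files), the following C sketch shows a RaW/WaW race on a shared variable and its repair with an \code{atomic} update; the variable name \plc{count} is assumed for the sketch.
\begin{verbatim}
#include <stdio.h>

int main(void)
{
   int count = 0;

#pragma omp parallel num_threads(4)
   {
      /* count++;  -- unsynchronized read-modify-write: a race (RaW/WaW) */

#pragma omp atomic      /* the update becomes indivisible */
      count++;
   }

   printf("count = %d\n", count);   /* always 4 with the atomic update */
   return 0;
}
\end{verbatim}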
OpenMP provides a shared-memory model which allows all threads access
to \plc{memory} (shared data). Each thread also has exclusive
access to \plc{threadprivate memory} (private data). A private
variable referenced in an OpenMP directive's structured block is a
new version of the original variable (with the same name) for each
task (or SIMD lane) within the code block. A private variable is
initially undefined (except for variables in \code{firstprivate}
and \code{linear} clauses), and the original variable value is
unaltered by assignments to the private variable (except for
\code{reduction}, \code{lastprivate} and \code{linear} clauses).
Private variables in an outer \code{parallel} region can be
shared by implicit tasks of an inner \code{parallel} region
(with a \code{shared} clause on the inner \code{parallel} directive).
Likewise, a private variable may be shared in the region of an
explicit \code{task} (through a \code{shared} clause).
The \code{flush} directive forces a consistent view of local variables
of the thread executing the \code{flush}.
When a list is supplied on the directive, only the items (variables)
in the list are guaranteed to be flushed.
Implied flushes exist at prescribed locations of certain constructs.
For the complete list of these locations and associated constructs,
please refer to the \plc{flush Construct} section of the OpenMP
Specifications document.
% The following table lists construct in which implied flushes exist, and the
% location of their execution.
%
% %\begin{table}[hb]
% \begin{center}
% %\caption {Execution Location for Implicit Flushes. }
% \begin{tabular}{ | p{0.6\linewidth} | l | }
% \hline
% \code{CONSTRUCT} & \makecell{\code{EXECUTION} \\ \code{LOCATION}} \\
% \hline
% \code{parallel} & upon entry and exit \\
% \hline
% \makecell[l]{worksharing \\ \hspace{1.5em}\code{for}, \code{do}
% \\ \hspace{1.5em}\code{sections}
% \\ \hspace{1.5em}\code{single}
% \\ \hspace{1.5em}\code{workshare} }
% & upon exit \\
% \hline
% \code{critical} & upon entry and exit \\
% \hline
% \code{target} & upon entry and exit \\
% \hline
% \code{barrier} & during \\
% \hline
% \code{atomic} operation with \plc{seq\_cst} clause & upon entry and exit \\
% \hline
% \code{ordered}* & upon entry and exit \\
% \hline
% \code{cancel}** and \code{cancellation point}** & during \\
% \hline
% \code{target data} & upon entry and exit \\
% \hline
% \code{target update} + \code{to} clause,
% \code{target enter data} & on entry \\
% \hline
% \code{target update} + \code{from} clause,
% \code{target exit data} & on exit \\
% \hline
% \code{omp\_set\_lock} & during \\
% \hline
% \makecell[l]{ \code{omp\_set/unset\_lock}, \code{omp\_test\_lock}***
% \\ \code{omp\_set/unset/test\_nest\_lock}*** }
% & during \\
% \hline
% task scheduling point & \makecell[l]{immediately \\ before and after} \\
% \hline
% \end{tabular}
% %\caption {Execution Location for Implicit Flushes. }
%
% \end{center}
% %\end{table}
%
% * without clauses and with \code{threads} or \code{depend} clauses \newline
% ** when \plc{cancel-var} ICV is \plc{true} (cancellation is turned on) and cancellation is activated \newline
% *** if the region causes the lock to be set or unset
%
% A flush with a list is implied for non-sequentially consistent \code{atomic} operations
% (\code{atomic} directive without a \code{seq\_cst} clause), where the list item is the
% specific storage location accessed atomically (specified as the \plc{x} variable
% in \plc{atomic Construct} subsection of the OpenMP Specifications document).
Examples 1-3 show the difficulty of synchronizing threads through \code{flush} and \code{atomic} directives.

104
Chap_parallel_execution.tex Normal file
View File

@ -0,0 +1,104 @@
\pagebreak
\chapter{Parallel Execution}
\label{chap:parallel_execution}
A single thread, the \plc{initial thread}, begins sequential execution of
an OpenMP enabled program, as if the whole program is in an implicit parallel
region consisting of an implicit task executed by the \plc{initial thread}.
A \code{parallel} construct encloses code,
forming a parallel region. An \plc{initial thread} encountering a \code{parallel}
region forks (creates) a team of threads at the beginning of the
\code{parallel} region, and joins them (removes from execution) at the
end of the region. The initial thread becomes the master thread of the team in a
\code{parallel} region, with a \plc{thread} number equal to zero; the other
threads are numbered from 1 to the number of threads minus 1.
A team may consist of just a single thread.
Each thread of a team is assigned an implicit task consisting of code within the
parallel region. The task that creates a parallel region is suspended while the
tasks of the team are executed. A thread is tied to its task; that is,
only the thread assigned to the task can execute that task. After completion
of the \code{parallel} region, the master thread resumes execution of the generating task.
%After the \code{parallel} region the master thread becomes the initial
%thread again, and continues to execute the \plc{sequential part}.
Any task within a \code{parallel} region is allowed to encounter another
\code{parallel} region to form a nested \code{parallel} region. The
parallelism of a nested \code{parallel} region (whether it forks additional
threads, or is executed serially by the encountering task) can be controlled by the
\code{OMP\_NESTED} environment variable or the \code{omp\_set\_nested()}
API routine with arguments indicating true or false.
The number of threads of a \code{parallel} region can be set by the \code{OMP\_NUM\_THREADS}
environment variable, the \code{omp\_set\_num\_threads()} routine, or on the \code{parallel}
directive with the \code{num\_threads}
clause. The routine overrides the environment variable, and the clause overrides all.
Use the \code{OMP\_DYNAMIC} environment variable
or the \code{omp\_set\_dynamic()} function to specify that the OpenMP
implementation dynamically adjust the number of threads for
\code{parallel} regions. The default setting for dynamic adjustment is implementation
defined. When dynamic adjustment is on and the number of threads is specified,
the number of threads becomes an upper limit for the number of threads to be
provided by the OpenMP runtime.
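For illustration (a sketch, with the thread counts assumed arbitrarily), the following C code shows the override order: the \code{num\_threads} clause takes precedence over the \code{omp\_set\_num\_threads()} routine, which takes precedence over the \code{OMP\_NUM\_THREADS} environment variable.
\begin{verbatim}
#include <stdio.h>
#include <omp.h>

int main(void)
{
   omp_set_dynamic(0);       /* request exactly the number of threads asked for */
   omp_set_num_threads(4);   /* routine overrides the environment variable      */

#pragma omp parallel num_threads(2)   /* clause overrides the routine */
   {
#pragma omp single
      printf("team size = %d\n", omp_get_num_threads());   /* 2 */
   }
   return 0;
}
\end{verbatim}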
\pagebreak
WORKSHARING CONSTRUCTS
A worksharing construct distributes the execution of the associated region
among the members of the team that encounter it. There is an
implied barrier at the end of the worksharing region
(there is no barrier at the beginning). The worksharing
constructs are:
\begin{compactitem}
\item loop constructs: {\code{for} and \code{do} }
\item \code{sections}
\item \code{single}
\item \code{workshare}
\end{compactitem}
The \code{for} and \code{do} constructs (loop constructs) create a region
consisting of a loop. A loop controlled by a loop construct is called
an \plc{associated} loop. Nested loops can form a single region when the
\code{collapse} clause (with an integer argument) designates the number of
\plc{associated} loops to be executed in parallel, by forming a
"single iteration space" for the specified number of nested loops.
The \code{ordered} clause can also control multiple associated loops.
An associated loop must adhere to a "canonical form" (specified in the
\plc{Canonical Loop Form} of the OpenMP Specifications document) which allows the
iteration count (of all associated loops) to be computed before the
(outermost) loop is executed. %[58:27-29].
Most common loops comply with the canonical form, including C++ iterators.
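As a small sketch (array and bound names assumed for illustration), the \code{collapse(2)} clause below forms a single iteration space from the two associated loops, which is then divided among the threads of the team.
\begin{verbatim}
#include <stdio.h>
#define N 4
#define M 8

int main(void)
{
   double a[N][M];

#pragma omp parallel for collapse(2)   /* one N*M iteration space */
   for (int i = 0; i < N; i++)
      for (int j = 0; j < M; j++)
         a[i][j] = i * M + j;

   printf("a[3][7] = %f\n", a[3][7]);  /* 31.0 */
   return 0;
}
\end{verbatim}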
A \code{single} construct forms a region in which only one thread (any one
of the team) executes the region.
The other threads wait at the implied
barrier at the end, unless the \code{nowait} clause is specified.
The \code{sections} construct forms a region that contains one or more
structured blocks. Each block of a \code{sections} directive is
constructed with a \code{section} construct, and executed once by
one of the threads (any one) in the team. (If only one block is
formed in the region, the \code{section} construct, which is used to
separate blocks, is not required.)
The other threads wait at the implied
barrier at the end, unless the \code{nowait} clause is specified.
The \code{workshare} construct is a Fortran feature that consists of a
region with a single structured block (section of code). Statements in the
\code{workshare} region are divided into units of work, and executed (once)
by threads of the team.
\bigskip
MASTER CONSTRUCT
The \code{master} construct is not a worksharing construct. The master region
is executed only by the master thread. There is no implicit barrier (and flush)
at the end of the \code{master} region; hence the other threads of the team continue
execution of the code beyond the \code{master} region.

85
Chap_program_control.tex Normal file
View File

@ -0,0 +1,85 @@
\pagebreak
\chapter{Program Control}
\label{sec:program_control}
Some specific and elementary concepts of controlling program execution are
illustrated in the examples of this chapter. Control can be directly
managed with conditional compilation (\code{ifdef}'s with the \code{\_OPENMP}
macro, and the Fortran sentinel \code{!\$}
for conditional compilation). The \code{if} clause on some constructs
can direct the runtime to ignore or alter the behavior of the construct.
Of course, the base-language \code{if} statements can be used to control the "execution"
of stand-alone directives (such as \code{flush}, \code{barrier}, \code{taskwait},
and \code{taskyield}).
However, the directives must appear in a block structure, and not as a substatement, as shown in examples 1 and 2 of this chapter.
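A minimal sketch of conditional compilation with the \code{\_OPENMP} macro (output strings assumed for illustration):
\begin{verbatim}
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void)
{
#ifdef _OPENMP
   /* compiled only when OpenMP compilation is enabled */
   printf("_OPENMP = %d, max threads = %d\n", _OPENMP, omp_get_max_threads());
#else
   printf("compiled without OpenMP support\n");
#endif
   return 0;
}
\end{verbatim}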
\bigskip
CANCELLATION
Cancellation (termination) of the normal sequence of execution for the threads in an OpenMP region can
be accomplished with the \code{cancel} construct. The construct uses a
\plc{construct-type-clause} to set the region-type to activate for the cancellation.
That is, inclusion of one of the \plc{construct-type-clause} names \code{parallel}, \code{for},
\code{do}, \code{sections} or \code{taskgroup} on the directive line
activates the corresponding region.
The \code{cancel} construct is activated by the first encountering thread, and it
continues execution at the end of the named region.
The \code{cancel} construct is also a cancellation point for any other thread of the team
to also continue execution at the end of the named region.
Also, once the specified region has been activated for cancellation, any thread that encounters
a \code{cancellation point} construct with the same named region (\plc{construct-type-clause}),
continues execution at the end of the region.
For an activated \code{cancel taskgroup} construct, the tasks that
belong to the taskgroup set of the innermost enclosing taskgroup region will be canceled.
A task that encounters the cancel taskgroup construct continues execution at the end of its
task region. Any task of the taskgroup that has already begun execution will run to completion,
unless it encounters a \code{cancellation point}; tasks that have not begun execution "may" be
discarded as completed tasks.
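For illustration only (a sketch; the search value and array contents are assumed), the following C code cancels a worksharing loop once a value is found. Note that cancellation takes effect only when the \plc{cancel-var} ICV is true (e.g., \code{OMP\_CANCELLATION=true}); otherwise the loop simply runs to completion with the same result.
\begin{verbatim}
#include <stdio.h>
#define N 10000

int main(void)
{
   int data[N], found = -1;
   for (int i = 0; i < N; i++) data[i] = 7 * i;

#pragma omp parallel for
   for (int i = 0; i < N; i++)
   {
      if (data[i] == 4900)              /* 7 * 700 */
      {
#pragma omp critical
         found = i;
#pragma omp cancel for                  /* request cancellation of the loop */
      }
#pragma omp cancellation point for      /* other threads may exit here      */
   }

   printf("found index %d\n", found);   /* 700 */
   return 0;
}
\end{verbatim}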
\bigskip
CONTROL VARIABLES
Internal control variables (ICVs) are used by implementations to hold values which control the execution
of OpenMP regions. Control (and hence the ICVs) may be set as implementation defaults,
or set and adjusted through environment variables, clauses, and API functions. Many of the ICV control
values are accessible through API function calls. Also, initial ICV values are reported by the runtime
if the \code{OMP\_DISPLAY\_ENV} environment variable has been set to \code{TRUE}.
%As an example, the \plc{nthreads-var} is the ICV that holds the number of threads
%to be used in a \code{parallel} region. It can be set with the \code{OMP\_NUM\_THREADS} environment variable,
%the \code{omp\_set\_num\_threads()} API function, or the \code{num\_threads} clause. The default \plc{nthreads-var}
%value is implementation defined. All of the ICVs are presented in the \plc{Internal Control Variables} section
%of the \plc{Directives} chapter of the OpenMP Specifications document. Within the same document section, override
%relationships and scoping information can be found for applying user specifications and understanding the
%extent of the control.
\bigskip
NESTED CONSTRUCTS
Certain combinations of nested constructs are permitted, giving rise to a \plc{combined} construct
consisting of two or more constructs. These can be used when the two (or several) constructs would be used
immediately in succession (closely nested). A combined construct can use the clauses of the component
constructs without restrictions.
A \plc{composite} construct is a combined construct which has one or more clauses with (often obviously)
modified or restricted meanings, relative to when the constructs are uncombined. %%[appear separately (singly).
%The combined \code{parallel do} and \code{parallel for} constructs are formed by combining the \code{parallel}
%construct with one of the loops constructs \code{do} or \code{for}. The
%\code{parallel do SIMD} and \code{parallel for SIMD} constructs are composite constructs (composed from
%the parallel loop constructs and the \code{SIMD} construct), because the \code{collapse} clause must
%explicitly address the ordering of loop chunking \plc{and} SIMD "combined" execution.
Certain nestings are forbidden, and often the reasoning is obvious. Worksharing constructs cannot be nested, and
the \code{barrier} construct cannot be nested inside a worksharing construct, or a \code{critical} construct.
Also, \code{target} constructs cannot be nested.
The \code{parallel} construct can be nested, as well as the \code{task} construct. The parallel
execution in the nested \code{parallel} construct(s) is controlled by the \code{OMP\_NESTED} and
\code{OMP\_MAX\_ACTIVE\_LEVELS} environment variables, and the \code{omp\_set\_nested()} and
\code{omp\_set\_max\_active\_levels()} functions.
More details on nesting can be found in the \plc{Nesting of Regions} section of the \plc{Directives}
chapter in the OpenMP Specifications document.

69
Chap_synchronization.tex Normal file
View File

@ -0,0 +1,69 @@
\pagebreak
\chapter{Synchronization}
\label{chap:synchronization}
The \code{barrier} construct is a stand-alone directive that requires all threads
of a team (within a contention group) to execute the barrier and complete
execution of all tasks within the region, before continuing past the barrier.
The \code{critical} construct is a directive that contains a structured block.
The construct allows only a single thread at a time to execute the structured block (region).
Multiple critical regions may exist in a parallel region, and may
act cooperatively (only one thread at a time in all \code{critical} regions),
or separately (only one thread at a time in each \code{critical} region when
a unique name is supplied on each \code{critical} construct).
An optional (lock) \code{hint} clause may be specified on a named \code{critical}
construct to provide the OpenMP runtime with guidance in selecting a locking
mechanism.
On a finer scale the \code{atomic} construct allows only a single thread at
a time to have atomic access to a storage location involving a single read,
write, update or capture statement, and a limited number of combinations
when specifying the \code{capture} \plc{atomic-clause}. The \plc{atomic-clause}
is required for some expression statements, but is not required for
\code{update} statements. Please see the details in the \plc{atomic Construct}
subsection of the \plc{Directives} chapter in the OpenMP Specifications document.
% The following three sentences were stolen from the spec.
The \code{ordered} construct specifies a structured block in a loop,
simd, or loop SIMD region that will be executed in the order of the loop
iterations. The \code{ordered} construct sequentializes and orders the execution
of ordered regions while allowing code outside the region to run in parallel.
Since OpenMP 4.5 the \code{ordered} construct can also be a stand-alone
directive that specifies cross-iteration dependences in a doacross loop nest.
The \code{depend} clause uses a \code{sink} \plc{dependence-type}, along with an
iteration vector argument (vec) to indicate the iteration that satisfies the
dependence. The \code{depend} clause with a \code{source}
\plc{dependence-type} specifies dependence satisfaction.
The \code{flush} directive is a stand-alone construct that forces a thread's
temporary local storage (view) of a variable to memory, where a consistent view
of the variable storage can be accessed. When the construct is used without
a variable list, all the locally thread-visible data as defined by the
base language are flushed. A construct with a list applies the flush
operation only to the items in the list. The \code{flush} construct also
effectively ensures that no memory (load or store) operation for
the variable set (list items, or default set) may be reordered across
the \code{flush} directive.
General-purpose routines provide mutual exclusion semantics through locks,
represented by lock variables.
The semantics allows a task to \plc{set}, and hence
\plc{own} a lock, until it is \plc{unset} by the task that set it. A
\plc{nestable} lock can be set multiple times by a task, and is used
when code requires nested control of locks. A \plc{simple lock} can
only be set once by the owning task. There are specific calls for the two
types of locks, and the variable of a specific lock type cannot be used by the
other lock type.
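For illustration (variable names assumed), the following C sketch protects an update with a simple lock through the lock API routines:
\begin{verbatim}
#include <stdio.h>
#include <omp.h>

int main(void)
{
   omp_lock_t lock;
   int total = 0;

   omp_init_lock(&lock);

#pragma omp parallel num_threads(4)
   {
      omp_set_lock(&lock);     /* the task owns the lock until it unsets it */
      total += 1;              /* protected update                          */
      omp_unset_lock(&lock);
   }

   omp_destroy_lock(&lock);
   printf("total = %d\n", total);   /* 4 */
   return 0;
}
\end{verbatim}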
Any explicit task will observe the synchronization prescribed by a
\code{barrier} construct and by implied barriers. Also, additional synchronizations
are available for tasks. A task that encounters a \code{taskwait} construct waits for
its child tasks to complete. A \code{taskgroup} construct creates a region in which the
current task is suspended at the end of the region until all child tasks generated
in the region, and their descendants, have completed.
Scheduling constraints on task execution can be prescribed by the \code{depend}
clause to enforce dependence on previously generated tasks.
More details on controlling task executions can be found in the \plc{Tasking} Chapter
in the OpenMP Specifications document. %(DO REF. RIGHT.)

51
Chap_tasking.tex Normal file
View File

@ -0,0 +1,51 @@
\pagebreak
\chapter{Tasking}
\label{chap:tasking}
Tasking constructs provide units of work to a thread for execution.
Worksharing constructs do this, too (e.g. \code{for}, \code{do},
\code{sections}, and \code{single} constructs);
but the work units are tightly controlled by an iteration limit and limited
scheduling, or a limited number of \code{sections} or \code{single} regions.
Worksharing was designed
with \texttt{"}data parallel\texttt{"} computing in mind. Tasking was designed for
\texttt{"}task parallel\texttt{"} computing and often involves non-locality or irregularity
in memory access.
The \code{task} construct can be used to execute work chunks: in a while loop;
while traversing nodes in a list; at nodes in a tree graph;
or in a normal loop (with a \code{taskloop} construct).
Unlike the statically scheduled loop iterations of worksharing, a task is
often enqueued, and then dequeued for execution by any of the threads of the
team within a parallel region. The generation of tasks can be from a single
generating thread (creating sibling tasks), or from multiple generators
in recursive graph (tree) traversals.
%(creating a parent-descendents hierarchy of tasks, see example 4 and 7 below).
A \code{taskloop} construct
bundles iterations of an associated loop into tasks, and provides
similar controls found in the \code{task} construct.
Sibling tasks are synchronized by the \code{taskwait} construct, and tasks
and their descendent tasks can be synchronized by containing them in
a \code{taskgroup} region. Ordered execution is accomplished by specifying
dependences with a \code{depend} clause. Also, priorities can be
specified as hints to the scheduler through a \code{priority} clause.
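For illustration only (the variable \plc{x} is assumed for the sketch), the following C code orders two sibling tasks with \code{depend} clauses and then waits for them with \code{taskwait}:
\begin{verbatim}
#include <stdio.h>

int main(void)
{
   int x = 0;

#pragma omp parallel num_threads(2)
#pragma omp single
   {
#pragma omp task depend(out: x)   /* first task writes x */
      x = 1;

#pragma omp task depend(in: x)    /* runs only after the first task */
      printf("x = %d\n", x);      /* always prints 1 */

#pragma omp taskwait              /* wait for both child tasks */
   }
   return 0;
}
\end{verbatim}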
Various clauses can be used to manage and optimize task generation,
as well as to reduce the overhead of execution and to relinquish
control of threads for work balance and forward progress.
Once a thread starts executing a task, it is the designated thread
for executing the task to completion, even though it may leave the
execution at a scheduling point and return later. The thread is tied
to the task. Scheduling points can be introduced with the \code{taskyield}
construct. With an \code{untied} clause any other thread is allowed to continue
the task. An \code{if} clause with a \plc{false} expression allows the
generating thread to immediately execute the task as an undeferred task.
By including the data environment of the generating task into the generated task with the
\code{mergeable} and \code{final} clauses, task generation overhead can be reduced.
A complete list of the tasking constructs and details of their clauses
can be found in the \plc{Tasking Constructs} chapter of the OpenMP Specifications,
in the \plc{OpenMP Application Programming Interface} section.

View File

@ -1,9 +1,21 @@
\chapter*{Examples}
\label{chap:examples}
\addcontentsline{toc}{chapter}{\protect\numberline{}Examples}
The following are examples of the OpenMP API directives, constructs, and routines.
\ccppspecificstart
A statement following a directive is compound only when necessary, and a
non-compound statement is indented with respect to a directive preceding it.
\ccppspecificend
Each example is labeled as \plc{ename.seqno.ext}, where \plc{ename} is
the example name, \plc{seqno} is the sequence number in a section, and
\plc{ext} is the source file extension to indicate the code type and
source form. \plc{ext} is one of the following:
\begin{compactitem}
\item \plc{c} -- C code,
\item \plc{cpp} -- C++ code,
\item \plc{f} -- Fortran code in fixed form, and
\item \plc{f90} -- Fortran code in free form.
\end{compactitem}

View File

@ -1,17 +1,13 @@
\pagebreak
\chapter{SIMD Constructs}
\label{chap:SIMD}
%\pagebreak
\section{\code{simd} and \code{declare} \code{simd} Constructs}
\label{sec:SIMD}
The following examples illustrate the use of SIMD constructs for vectorization.
The following example illustrates the basic use of the \code{simd} construct
to assure the compiler that the loop can be vectorized.
Compilers may not vectorize loops when they are complex or possibly have
dependencies, even though the programmer is certain the loop will execute
correctly as a vectorized loop. The \code{simd} construct assures the compiler
that the loop can be vectorized.
\cexample{SIMD}{1}
\cexample{SIMD}{1c}
\fexample{SIMD}{1f}
\ffreeexample{SIMD}{1}
When a function can be inlined within a loop the compiler has an opportunity to
@ -42,9 +38,9 @@ In the \code{simd} constructs for the loops the \code{private(tmp)} clause is
necessary to assure that each vector operation has its own \plc{tmp}
variable.
\cexample{SIMD}{2c}
\cexample{SIMD}{2}
\fexample{SIMD}{2f}
\ffreeexample{SIMD}{2}
A thread that encounters a SIMD construct executes a vectorized code of the
@ -54,9 +50,9 @@ privatized and declared as reductions with clauses. The example below
illustrates the use of \code{private} and \code{reduction} clauses in a SIMD
construct.
\cexample{SIMD}{3c}
\cexample{SIMD}{3}
\fexample{SIMD}{3f}
\ffreeexample{SIMD}{3}
A \code{safelen(N)} clause in a \code{simd} construct assures the compiler that
@ -69,9 +65,9 @@ code is safe for vectors up to and including size 16. In the loop, \plc{m} can
be 16 or greater, for correct code execution. If the value of \plc{m} is less
than 16, the behavior is undefined.
\cexample{SIMD}{4c}
\cexample{SIMD}{4}
\fexample{SIMD}{4f}
\ffreeexample{SIMD}{4}
The following SIMD construct instructs the compiler to collapse the \plc{i} and
@ -79,11 +75,15 @@ The following SIMD construct instructs the compiler to collapse the \plc{i} and
threads of the team. Within the workshared loop chunks of a thread, the SIMD
chunks are executed in the lanes of the vector units.
\cexample{SIMD}{5c}
\cexample{SIMD}{5}
\fexample{SIMD}{5f}
\ffreeexample{SIMD}{5}
%%% section
\section{\code{inbranch} and \code{notinbranch} Clauses}
\label{sec:SIMD_branch}
The following examples illustrate the use of the \code{declare} \code{simd}
construct with the \code{inbranch} and \code{notinbranch} clauses. The
\code{notinbranch} clause informs the compiler that the function \plc{foo} is
@ -92,9 +92,9 @@ the other hand, the \code{inbranch} clause for the function goo indicates that
the function is always called conditionally in the SIMD loop inside
the function \plc{myaddfloat}.
\cexample{SIMD}{6c}
\cexample{SIMD}{6}
\fexample{SIMD}{6f}
\ffreeexample{SIMD}{6}
In the code below, the function \plc{fib()} is called in the main program and
@ -103,7 +103,24 @@ condition. The compiler creates a masked vector version and a non-masked vector
version for the function \plc{fib()} while retaining the original scalar
version of the \plc{fib()} function.
\cexample{SIMD}{7c}
\cexample{SIMD}{7}
\fexample{SIMD}{7f}
\ffreeexample{SIMD}{7}
%%% section
\section{Loop-Carried Lexical Forward Dependence}
\label{sec:SIMD_forward_dep}
The following example tests the restriction on a SIMD loop with a loop-carried lexical forward dependence. This dependence must be preserved for the correct execution of SIMD loops.
A loop can be vectorized even though the iterations are not completely independent when it has loop-carried dependences that are forward lexical dependences, indicated in the code below by the read of \plc{A[j+1]} and the write to \plc{A[j]} in C/C++ code (or \plc{A(j+1)} and \plc{A(j)} in Fortran). That is, the ordering of the read of \plc{A[j+1]} (or \plc{A(j+1)} in Fortran) before the write to \plc{A[j]} (or \plc{A(j)} in Fortran) must be preserved for each iteration in \plc{j} for valid SIMD code generation.
This test assures that the compiler preserves the loop-carried lexical forward dependence for generating correct SIMD code.
\cexample{SIMD}{8}
\ffreeexample{SIMD}{8}

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{The \code{proc\_bind} Clause}
\label{chap:affinity}
\section{The \code{proc\_bind} Clause}
\label{sec:affinity}
The following examples demonstrate how to use the \code{proc\_bind} clause to
control the thread binding for a team of threads in a \code{parallel} region.
@ -25,16 +25,18 @@ or
\code{OMP\_PLACES=\texttt{"}\{0:2\}:8:2\texttt{"}}
\section{Spread Affinity Policy}
\subsection{Spread Affinity Policy}
\label{subsec:affinity_spread}
The following example shows the result of the \code{spread} affinity policy on
the partition list when the number of threads is less than or equal to the number
of places in the parent's place partition, for the machine architecture depicted
above. Note that the threads are bound to the first place of each subpartition.
\cexample{affinity}{1c}
\cexample{affinity}{1}
\fexample{affinity}{1f}
\fexample{affinity}{1}
It is unspecified on which place the master thread is initially started. If the
master thread is initially started on p0, the following placement of threads will
@ -73,9 +75,9 @@ parent's place partition. The first \plc{T/P} threads of the team (including the
thread) execute on the parent's place. The next \plc{T/P} threads execute on the next
place in the place partition, and so on, with wrap around.
\cexample{affinity}{2c}
\cexample{affinity}{2}
\fexample{affinity}{2f}
\ffreeexample{affinity}{2}
It is unspecified on which place the master thread is initially started. If the
master thread is initially started on p0, the following placement of threads will
@ -120,16 +122,17 @@ and distribution of the place partition would be as follows:
\item threads 14,15 execute on p1 with the place partition p1
\end{compactitem}
\section{Close Affinity Policy}
\subsection{Close Affinity Policy}
\label{subsec:affinity_close}
The following example shows the result of the \code{close} affinity policy on
the partition list when the number of threads is less than or equal to the number
of places in the parent's place partition, for the machine architecture depicted above.
The place partition is not changed by the \code{close} policy.
\cexample{affinity}{3c}
\cexample{affinity}{3}
\fexample{affinity}{3f}
\fexample{affinity}{3}
It is unspecified on which place the master thread is initially started. If the
master thread is initially started on p0, the following placement of threads will
@ -168,9 +171,9 @@ thread) execute on the parent's place. The next \plc{T/P} threads execute on the
place in the place partition, and so on, with wrap around. The place partition
is not changed by the \code{close} policy.
\cexample{affinity}{4c}
\cexample{affinity}{4}
\fexample{affinity}{4f}
\ffreeexample{affinity}{4}
It is unspecified on which place the master thread is initially started. If the
master thread is initially running on p0, the following placement of threads will
@ -215,15 +218,16 @@ and distribution of the place partition would be as follows:
\item threads 14,15 execute on p1 with the place partition p0-p7
\end{compactitem}
\section{Master Affinity Policy}
\subsection{Master Affinity Policy}
\label{subsec:affinity_master}
The following example shows the result of the \code{master} affinity policy on
the partition list for the machine architecture depicted above. The place partition
is not changed by the master policy.
\cexample{affinity}{5c}
\cexample{affinity}{5}
\fexample{affinity}{5f}
\fexample{affinity}{5}
It is unspecified on which place the master thread is initially started. If the
master thread is initially running on p0, the following placement of threads will

View File

@ -0,0 +1,43 @@
\section{Affinity Query Functions}
\label{sec: affinity_query}
In the example below a team of threads is generated on each socket of
the system, using nested parallelism. Several query functions are used
to gather information to support the creation of the teams and to obtain
socket and thread numbers.
For proper execution of the code, the user must create a place partition, such that
each place is a listing of the core numbers for a socket. For example,
in a 2 socket system with 8 cores in each socket, and sequential numbering
in the socket for the core numbers, the \code{OMP\_PLACES} variable would be set
to "\{0:8\},\{8:8\}", using the place syntax \{\plc{lower\_bound}:\plc{length}:\plc{stride}\},
and the default stride of 1.
The code determines the number of sockets (\plc{n\_sockets})
using the \code{omp\_get\_num\_places()} query function.
In this example each place is constructed with a list of
each socket's core numbers, hence the number of places is equal
to the number of sockets.
The outer parallel region forms a team of threads, and each thread
executes on a socket (place) because the \code{proc\_bind} clause uses
\code{spread} in the outer \code{parallel} construct.
Next, in the \plc{socket\_init} function, an inner parallel region creates a team
of threads equal to the number of elements (core numbers) from the place
of the parent thread. Because the outer \code{parallel} construct uses
a \code{spread} affinity policy, each of its threads inherits a subpartition of
the original partition. Hence, the \code{omp\_get\_place\_num\_procs} query function
returns the number of elements (here procs = cores) in the subpartition of the thread.
After each parent thread creates its nested parallel region on its socket,
the socket number and thread number are reported.
Note: Portable tools like hwloc (Portable HardWare LOCality package), which support
many common operating systems, can be used to determine the configuration of a system.
On some systems there are utilities, files or user guides that provide configuration
information. For instance, the socket number and proc\_id's for a socket
can be found in the /proc/cpuinfo text file on Linux systems.
\cexample{affinity}{6}
\ffreeexample{affinity}{6}

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{Array Sections in Device Constructs}
\label{chap:array_sections}
\section{Array Sections in Device Constructs}
\label{sec:array_sections}
The following examples show the usage of array sections in \code{map} clauses
on \code{target} and \code{target} \code{data} constructs.
@ -8,28 +8,28 @@ on \code{target} and \code{target} \code{data} constructs.
This example shows the invalid usage of two separate sections of the same array
inside of a \code{target} construct.
\cexample{array_sections}{1c}
\cexample{array_sections}{1}
\fexample{array_sections}{1f}
\ffreeexample{array_sections}{1}
This example shows the invalid usage of two separate sections of the same array
inside of a \code{target} construct.
\cexample{array_sections}{2c}
\cexample{array_sections}{2}
\fexample{array_sections}{2f}
\ffreeexample{array_sections}{2}
This example shows the valid usage of two separate sections of the same array inside
of a \code{target} construct.
\cexample{array_sections}{3c}
\cexample{array_sections}{3}
\fexample{array_sections}{3f}
\ffreeexample{array_sections}{3}
This example shows the valid usage of a wholly contained array section of an already
mapped array section inside of a \code{target} construct.
\cexample{array_sections}{4c}
\cexample{array_sections}{4}
\fexample{array_sections}{4f}
\ffreeexample{array_sections}{4}

View File

@ -1,7 +1,7 @@
\pagebreak
\chapter{Fortran \code{ASSOCIATE} Construct}
\section{Fortran \code{ASSOCIATE} Construct}
\fortranspecificstart
\label{chap:associate}
\label{sec:associate}
The following is an invalid example of specifying an associate name on a data-sharing attribute
clause. The constraint in the Data Sharing Attribute Rules section in the OpenMP
@ -11,13 +11,13 @@ name \plc{b} is associated with the shared variable \plc{a}. With the predetermi
attribute rule, the associate name \plc{b} is not allowed to be specified on the \code{private}
clause.
\fnexample{associate}{1f}
\fnexample{associate}{1}
In the next example, within the \code{parallel} construct, the association name \plc{thread\_id}
is associated with the private copy of \plc{i}. The print statement should output the
unique thread number.
\fnexample{associate}{2f}
\fnexample{associate}{2}
The following example illustrates the effect of specifying a selector name on a data-sharing
attribute clause. The associate name \plc{u} is associated with \plc{v} and the variable \plc{v}
@ -27,6 +27,6 @@ The association between \plc{u} and the original \plc{v} is retained (see the Da
Attribute Rules section in the OpenMP 4.0 API Specifications). Inside the \code{parallel}
region, \plc{v} has the value of -1 and \plc{u} has the value of the original \plc{v}.
\fnexample{associate}{3f}
\ffreenexample{associate}{3}
\fortranspecificend

View File

@ -0,0 +1,15 @@
\pagebreak
\section{Asynchronous \code{target} Execution and Dependences}
\label{sec:async_target_exec_depend}
Asynchronous execution of a \code{target} region can be accomplished
by creating an explicit task around the \code{target} region. Examples
with explicit tasks are shown at the beginning of this section.
As of OpenMP 4.5, the \code{nowait} clause can be used on the
\code{target} directive for asynchronous execution. Examples with
\code{nowait} clauses follow the explicit \code{task} examples.
This section also shows the use of \code{depend} clauses to order
executions through dependences.

View File

@ -0,0 +1,31 @@
\subsection{\code{nowait} Clause on \code{target} Construct}
\label{subsec:target_nowait_clause}
The following example shows how to execute code asynchronously on a
device without an explicit task. The \code{nowait} clause on a \code{target}
construct allows the thread of the \plc{target task} to perform other
work while waiting for the \code{target} region execution to complete.
Hence, the \code{target} region can execute asynchronously on the
device (without requiring a host thread to idle while waiting for
the \plc{target task} execution to complete).
In this example the product of two vectors (arrays), \plc{v1}
and \plc{v2}, is formed. One half of the operations is performed
on the device, and the other half on the host, concurrently.
After a team of threads is formed the master thread generates
the \plc{target task} while the other threads can continue on, without a barrier,
to the execution of the host portion of the vector product.
The completion of the \plc{target task} (asynchronous target execution) is
guaranteed by the synchronization in the implicit barrier at the end of the
host vector-product worksharing loop region. See the \code{barrier}
glossary entry in the OpenMP specification for details.
The host loop scheduling is \code{dynamic}, to balance the host thread executions, since
one thread is being used for offload generation. In the situation where
little time is spent by the \plc{target task} in setting
up and tearing down the target execution, \code{static} scheduling may be desired.
\cexample{async_target}{3}
\ffreeexample{async_target}{3}

View File

@ -0,0 +1,18 @@
%begin
\subsection{Asynchronous \code{target} with \code{nowait} and \code{depend} Clauses}
\label{subsec:async_target_nowait_depend}
More details on dependences can be found in \specref{sec:task_depend}, Task
Dependences. In this example, there are three flow dependences. In the first two dependences the
target task does not execute until the preceding explicit tasks have finished. These
dependences are produced by arrays \plc{v1} and \plc{v2} with the \code{out} dependence type in the first two tasks, and the \code{in} dependence type in the target task.
The last dependence is produced by array \plc{p} with the \code{out} dependence type in the target task, and the \code{in} dependence type in the last task. The last task does not execute until the target task finishes.
The \code{nowait} clause on the \code{target} construct creates a deferrable \plc{target task}, allowing the encountering task to continue execution without waiting for the completion of the \plc{target task}.
\cexample{async_target}{4}
\ffreeexample{async_target}{4}
%end

View File

@ -1,6 +1,5 @@
\pagebreak
\chapter{Asynchronous Execution of a \code{target} Region Using Tasks}
\label{chap:async_target}
\subsection{Asynchronous \code{target} with Tasks}
\label{subsec:async_target_with_tasks}
The following example shows how the \code{task} and \code{target} constructs
are used to execute multiple \code{target} regions asynchronously. The task that
@ -10,45 +9,46 @@ scheduling point while waiting for the execution of the \code{target} region
to complete, allowing the thread to switch back to the execution of the encountering
task or one of the previously generated explicit tasks.
\cexample{async_target}{1c}
\cexample{async_target}{1}
The Fortran version has an interface block that contains the \code{declare} \code{target}.
An identical statement exists in the function declaration (not shown here).
\fexample{async_target}{1f}
\ffreeexample{async_target}{1}
The following example shows how the \code{task} and \code{target} constructs
are used to execute multiple \code{target} regions asynchronously. The task dependence
ensures that the storage is allocated and initialized on the device before it is
accessed.
\cexample{async_target}{2c}
\cexample{async_target}{2}
The Fortran example below is similar to the C version above. Instead of pointers, though, it uses
the convenience of Fortran allocatable arrays on the device. An allocatable array has the
same behavior in a \code{map} clause as a C pointer, in this case.
the convenience of Fortran allocatable arrays on the device. In order to preserve the arrays
allocated on the device across multiple \code{target} regions, a \code{target}~\code{data} region
is used in this case.
If there is no shape specified for an allocatable array in a \code{map} clause, only the array descriptor
(also called a dope vector) is mapped. That is, device space is created for the descriptor, and it
is initially populated with host values. In this case, the \plc{v1} and \plc{v2} arrays will be in a
non-associated state on the device. When space for \plc{v1} and \plc{v2} is allocated on the device
the addresses to the space will be included in their descriptors.
in the first \code{target} region the addresses to the space will be included in their descriptors.
At the end of the first \code{target} region, the descriptor (of an unshaped specification of an allocatable
array in a \code{map} clause) is returned with the raw device address of the allocated space.
The content of the array is not returned. In the example the data in arrays \plc{v1} and \plc{v2}
are not returned. In the second \code{target} directive, the \plc{v1} and \plc{v2} descriptors are
re-created on the device with the descriptive information; and references to the
vectors point to the correct local storage, of the space that was not freed in the first \code{target}
directive. At the end of the second \code{target} region, the data in array \plc{p} is copied back
to the host since \plc{p} is not an allocatable array.
At the end of the first \code{target} region, the arrays \plc{v1} and \plc{v2} are preserved on the device
for access in the second \code{target} region. At the end of the second \code{target} region, the data
in array \plc{p} is copied back, the arrays \plc{v1} and \plc{v2} are not.
A \code{depend} clause is used in the \code{task} directive to provide a wait at the beginning of the second
\code{target} region, to ensure that there is no race condition with \plc{v1} and \plc{v2} in the two tasks.
It would be noncompliant to use \plc{v1} and/or \plc{v2} in lieu of \plc{N} in the \code{depend} clauses,
because the use of non-allocated allocatable arrays as list items in the first \code{depend} clause would
because the use of non-allocated allocatable arrays as list items in a \code{depend} clause would
lead to unspecified behavior.
\fexample{async_target}{2f}
\noteheader{--} This example is not strictly compliant with the OpenMP 4.5 specification since the allocation status
of allocatable arrays \plc{v1} and \plc{v2} is changed inside the \code{target} region, which is not allowed.
(See the restrictions for the \code{map} clause in the \plc{Data-mapping Attribute Rules and Clauses}
section of the specification.)
However, the intention is to relax the restrictions on mapping of allocatable variables in the next release
of the specification so that the example will be compliant.
\ffreeexample{async_target}{2}

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{The \code{atomic} Construct}
\label{chap:atomic}
\section{The \code{atomic} Construct}
\label{sec:atomic}
The following example avoids race conditions (simultaneous updates of an element
of \plc{x} by multiple threads) by using the \code{atomic} construct.
@ -14,9 +14,9 @@ Note that the \code{atomic} directive applies only to the statement immediately
following it. As a result, elements of \plc{y} are not updated atomically in
this example.
\cexample{atomic}{1c}
\cexample{atomic}{1}
\fexample{atomic}{1f}
\fexample{atomic}{1}
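The gist of the example can be sketched in C as follows (a sketch only, with illustrative names; it is not the bundled example source):
\begin{verbatim}
void atomic_update_sketch(int n, const int *index, float *x, float *y)
{
   #pragma omp parallel for
   for (int i = 0; i < n; i++) {
      #pragma omp atomic update
      x[index[i]] += 1.0f;   /* protected: several i may hit the same element */

      y[i] += 2.0f;          /* NOT protected: the atomic directive applies only
                                to the statement immediately following it      */
   }
}
\end{verbatim}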
The following example illustrates the \code{read} and \code{write} clauses
for the \code{atomic} directive. These clauses ensure that the given variable
@ -26,9 +26,9 @@ another part of the variable. Note that most hardware provides atomic reads and
writes for some set of properly aligned variables of specific sizes, but not necessarily
for all the variable types supported by the OpenMP API.
\cexample{atomic}{2c}
\cexample{atomic}{2}
\fexample{atomic}{2f}
\fexample{atomic}{2}
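A minimal C sketch of the two clauses (illustrative only, not the bundled example source):
\begin{verbatim}
double shared_value;          /* shared between threads */

void publish(double v)
{
   #pragma omp atomic write
   shared_value = v;          /* the store itself is performed atomically */
}

double observe(void)
{
   double v;
   #pragma omp atomic read
   v = shared_value;          /* never reads a half-written value */
   return v;
}
\end{verbatim}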
The following example illustrates the \code{capture} clause for the \code{atomic}
directive. In this case the value of a variable is captured, and then the variable
@ -37,8 +37,8 @@ be implemented using the fetch-and-add instruction available on many kinds of ha
The example also shows a way to implement a spin lock using the \code{capture}
and \code{read} clauses.
\cexample{atomic}{3c}
\cexample{atomic}{3}
\fexample{atomic}{3f}
\fexample{atomic}{3}
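For instance, a fetch-and-add can be sketched with the \code{capture} clause as follows (illustrative only):
\begin{verbatim}
int fetch_and_add(int *counter)
{
   int old;
   #pragma omp atomic capture
   { old = *counter; *counter += 1; }   /* capture the old value, then update */
   return old;
}
\end{verbatim}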

View File

@ -1,25 +1,25 @@
\pagebreak
\chapter{Restrictions on the \code{atomic} Construct}
\label{chap:atomic_restrict}
\section{Restrictions on the \code{atomic} Construct}
\label{sec:atomic_restrict}
The following non-conforming examples illustrate the restrictions on the \code{atomic}
construct.
\cexample{atomic_restrict}{1c}
\cexample{atomic_restrict}{1}
\fexample{atomic_restrict}{1f}
\fexample{atomic_restrict}{1}
\cexample{atomic_restrict}{2c}
\cexample{atomic_restrict}{2}
\fortranspecificstart
The following example is non-conforming because \code{I} and \code{R} reference
the same location but have different types.
\fnexample{atomic_restrict}{2f}
\fnexample{atomic_restrict}{2}
Although the following example might work on some implementations, this is also
non-conforming:
\fnexample{atomic_restrict}{3f}
\fnexample{atomic_restrict}{3}
\fortranspecificend

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{Binding of \code{barrier} Regions}
\label{chap:barrier_regions}
\section{Binding of \code{barrier} Regions}
\label{sec:barrier_regions}
The binding rules call for a \code{barrier} region to bind to the closest enclosing
\code{parallel} region.
@ -17,8 +17,8 @@ part. Also note that the \code{barrier} region in \plc{sub3} when called from
\plc{sub2} only synchronizes the team of threads in the enclosing \code{parallel}
region and not all the threads created in \plc{sub1}.
\cexample{barrier_regions}{1c}
\cexample{barrier_regions}{1}
\fexample{barrier_regions}{1f}
\fexample{barrier_regions}{1}

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{Cancellation Constructs}
\label{chap:cancellation}
\section{Cancellation Constructs}
\label{sec:cancellation}
The following example shows how the \code{cancel} directive can be used to terminate
an OpenMP region. Although the \code{cancel} construct terminates the OpenMP
@ -11,7 +11,7 @@ exception is properly handled in the sequential part. If cancellation of the \co
region has been requested, some threads might have executed \code{phase\_1()}.
However, it is guaranteed that none of the threads executed \code{phase\_2()}.
\cexample{cancellation}{1c}
\cppexample{cancellation}{1}
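The control flow can be sketched in C as follows. This is only a sketch: the exception handling of the C++ version is replaced by a simple error-code check, \code{phase\_1} and \code{phase\_2} are placeholder routines, and cancellation must be activated (for example by setting \code{OMP\_CANCELLATION} to \plc{true}) for the \code{cancel} directive to take effect.
\begin{verbatim}
static int  phase_1(void) { return 0; }  /* placeholder; nonzero signals an error */
static void phase_2(void) { }            /* placeholder work                      */

void run(void)
{
   #pragma omp parallel
   {
      if (phase_1() != 0) {
         #pragma omp cancel parallel            /* request cancellation        */
      }
      #pragma omp cancellation point parallel   /* threads observe the request */
      phase_2();                                /* skipped after cancellation  */
   }
}
\end{verbatim}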
The following example illustrates the use of the \code{cancel} construct in error
@ -20,7 +20,7 @@ the cancellation is activated. The encountering thread sets the shared variable
\code{err} and other threads of the binding thread set proceed to the end of
the worksharing construct after the cancellation has been activated.
\fexample{cancellation}{1f}
\ffreeexample{cancellation}{1}
The following example shows how to cancel a parallel search on a binary tree as
soon as the search value has been detected. The code creates a task to descend
@ -32,11 +32,11 @@ task group to control the effect of the \code{cancel taskgroup} directive. The
\plc{level} argument is used to create undeferred tasks after the first ten
levels of the tree.
\cexample{cancellation}{2c}
\cexample{cancellation}{2}
The following is the equivalent parallel search example in Fortran.
\fexample{cancellation}{2f}
\ffreeexample{cancellation}{2}

View File

@ -1,7 +1,7 @@
\pagebreak
\chapter{C/C++ Arrays in a \code{firstprivate} Clause}
\section{C/C++ Arrays in a \code{firstprivate} Clause}
\ccppspecificstart
\label{chap:carrays_fpriv}
\label{sec:carrays_fpriv}
The following example illustrates the size and value of list items of array or
pointer type in a \code{firstprivate} clause . The size of new list items is
@ -31,7 +31,7 @@ The new items of array type are initialized as if each integer element of the or
array is assigned to the corresponding element of the new array. Those of pointer
type are initialized as if by assignment from the original item to the new item.
\cnexample{carrays_fpriv}{1c}
\cnexample{carrays_fpriv}{1}
\ccppspecificend

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{The \code{collapse} Clause}
\label{chap:collapse}
\section{The \code{collapse} Clause}
\label{sec:collapse}
In the following example, the \code{k} and \code{j} loops are associated with
the loop construct. So the iterations of the \code{k} and \code{j} loops are
@ -16,9 +16,9 @@ The variable \code{j} can be omitted from the \code{private} clause when the
from the \code{private} clause. In either case, \code{k} is implicitly private
and could be omitted from the \code{private} clause.
\cexample{collapse}{1c}
\cexample{collapse}{1}
\fexample{collapse}{1f}
\fexample{collapse}{1}
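The structure can be sketched in C as follows (loop bounds and the array are illustrative; this is not the bundled example source):
\begin{verbatim}
void fill(float *a, int n, int m)
{
   /* the k and j loops are collapsed into a single iteration space;
      as associated loop iteration variables, k and j are implicitly private */
   #pragma omp parallel for collapse(2)
   for (int k = 0; k < n; k++)
      for (int j = 0; j < m; j++)
         a[k*m + j] = (float)k + 0.1f*(float)j;
}
\end{verbatim}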
In the next example, the \code{k} and \code{j} loops are associated with the
loop construct. So the iterations of the \code{k} and \code{j} loops are collapsed
@ -33,9 +33,9 @@ will have the value \code{2} and \code{j} will have the value \code{3}. Since
by the sequentially last iteration of the collapsed \code{k} and \code{j} loop.
This example prints: \code{2 3}.
\cexample{collapse}{2c}
\cexample{collapse}{2}
\fexample{collapse}{2f}
\fexample{collapse}{2}
The next example illustrates the interaction of the \code{collapse} and \code{ordered}
clauses.
@ -71,8 +71,8 @@ The code prints
\\
\code{1 3 2}
\cexample{collapse}{3c}
\cexample{collapse}{3}
\fexample{collapse}{3f}
\fexample{collapse}{3}

View File

@ -1,13 +1,13 @@
\pagebreak
\chapter{Conditional Compilation}
\label{chap:cond_comp}
\section{Conditional Compilation}
\label{sec:cond_comp}
\ccppspecificstart
The following example illustrates the use of conditional compilation using the
OpenMP macro \code{\_OPENMP}. With OpenMP compilation, the \code{\_OPENMP}
macro becomes defined.
\cnexample{cond_comp}{1c}
\cnexample{cond_comp}{1}
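A self-contained C illustration of the same idea (not the bundled example source):
\begin{verbatim}
#include <stdio.h>

int main(void)
{
#ifdef _OPENMP
   printf("Compiled by an OpenMP implementation, _OPENMP = %d\n",
          _OPENMP);            /* decimal yyyymm of the supported spec */
#else
   printf("Compiled without OpenMP support.\n");
#endif
   return 0;
}
\end{verbatim}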
\ccppspecificend
\fortranspecificstart
@ -16,6 +16,6 @@ With OpenMP compilation, the conditional compilation sentinel \code{!\$} is reco
and treated as two spaces. In fixed form source, statements guarded by the sentinel
must start after column 6.
\fnexample{cond_comp}{1f}
\fnexample{cond_comp}{1}
\fortranspecificend

View File

@ -1,13 +1,13 @@
\pagebreak
\chapter{The \code{copyin} Clause}
\label{chap:copyin}
\section{The \code{copyin} Clause}
\label{sec:copyin}
The \code{copyin} clause is used to initialize threadprivate data upon entry
to a \code{parallel} region. The value of the threadprivate variable in the master
thread is copied to the threadprivate variable of each other team member.
\cexample{copyin}{1c}
\cexample{copyin}{1}
\fexample{copyin}{1f}
\fexample{copyin}{1}

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{The \code{copyprivate} Clause}
\label{chap:copyprivate}
\section{The \code{copyprivate} Clause}
\label{sec:copyprivate}
The \code{copyprivate} clause can be used to broadcast values acquired by a single
thread directly to all instances of the private variables in the other threads.
@ -16,28 +16,28 @@ The thread that executes the structured block associated with the \code{single}
of the other implicit tasks in the thread team. The broadcast completes before
any of the threads have left the barrier at the end of the construct.
\cexample{copyprivate}{1c}
\cexample{copyprivate}{1}
\fexample{copyprivate}{1f}
\fexample{copyprivate}{1}
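The mechanism can be sketched in C as follows (a sketch with an assumed routine name and input source; it is not the bundled example source):
\begin{verbatim}
#include <stdio.h>

/* called from inside a parallel region; every thread returns the same value */
float read_tolerance(void)
{
   float tol;                     /* private to each thread (local variable) */
   #pragma omp single copyprivate(tol)
   {
      if (scanf("%f", &tol) != 1) /* one thread acquires the value ...       */
         tol = 1.0e-6f;
   }                              /* ... and it is broadcast to the private
                                     tol of every other thread before the
                                     threads leave the implicit barrier      */
   return tol;
}
\end{verbatim}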
In this example, assume that the input must be performed by the master thread.
Since the \code{master} construct does not support the \code{copyprivate} clause,
it cannot broadcast the input value that is read. However, \code{copyprivate}
is used to broadcast an address where the input value is stored.
\cexample{copyprivate}{2c}
\cexample{copyprivate}{2}
\fexample{copyprivate}{2f}
\fexample{copyprivate}{2}
Suppose that the number of lock variables required within a \code{parallel} region
cannot easily be determined prior to entering it. The \code{copyprivate} clause
can be used to provide access to shared lock variables that are allocated within
that \code{parallel} region.
\cexample{copyprivate}{3c}
\cexample{copyprivate}{3}
\fortranspecificstart
\fnexample{copyprivate}{3f}
\fnexample{copyprivate}{3}
Note that the effect of the \code{copyprivate} clause on a variable with the
\code{allocatable} attribute is different from that on a variable with the \code{pointer}
@ -45,7 +45,7 @@ attribute. The value of \code{A} is copied (as if by intrinsic assignment) and
the pointer \code{B} is copied (as if by pointer assignment) to the corresponding
list items in the other implicit tasks belonging to the \code{parallel} region.
\fnexample{copyprivate}{4f}
\fnexample{copyprivate}{4}
\fortranspecificend

View File

@ -0,0 +1,14 @@
\section{C++ Reference in Data-Sharing Clauses}
\cppspecificstart
\label{sec:cpp_reference}
C++ reference types are allowed in data-sharing attribute clauses as of OpenMP 4.5, except
for the \code{threadprivate}, \code{copyin} and \code{copyprivate} clauses.
(See the Data-Sharing Attribute Clauses Section of the 4.5 OpenMP specification.)
When a variable with C++ reference type is privatized, the object the reference refers to is privatized in addition to the reference itself.
The following example shows the use of reference types in data-sharing clauses in the usual way.
Additionally, it shows how the data-sharing attributes of formal arguments with a C++ reference type on an orphaned task-generating construct are determined implicitly. (See the Data-sharing Attribute Rules for Variables Referenced in a Construct Section of the 4.5 OpenMP specification.)
\cppnexample{cpp_reference}{1}
\cppspecificend

View File

@ -1,16 +1,20 @@
\pagebreak
\chapter{The \code{critical} Construct}
\label{chap:critical}
\section{The \code{critical} Construct}
\label{sec:critical}
The following example includes several \code{critical} constructs . The example
The following example includes several \code{critical} constructs. The example
illustrates a queuing model in which a task is dequeued and worked on. To guard
against multiple threads dequeuing the same task, the dequeuing operation must
be in a \code{critical} region. Because the two queues in this example are independent,
they are protected by \code{critical} constructs with different names, \plc{xaxis}
and \plc{yaxis}.
\cexample{critical}{1c}
\cexample{critical}{1}
\fexample{critical}{1f}
\fexample{critical}{1}
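In outline, the pattern looks like the following C sketch (the queue and work routines are placeholders, not the bundled example source):
\begin{verbatim}
static int  dequeue(int *q) { return (*q)--; }  /* placeholder dequeue      */
static void work(int item)  { (void)item; }     /* placeholder work routine */

void process(int *x_next, int *y_next)
{
   int ix, iy;
   #pragma omp parallel private(ix, iy) shared(x_next, y_next)
   {
      #pragma omp critical (xaxis)   /* guards only the x queue */
      ix = dequeue(x_next);
      work(ix);

      #pragma omp critical (yaxis)   /* independent queue, so a different name */
      iy = dequeue(y_next);
      work(iy);
   }
}
\end{verbatim}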
The following example extends the previous example by adding the \code{hint} clause to the \code{critical} constructs.
\cexample{critical}{2}
\fexample{critical}{2}

View File

@ -1,8 +1,9 @@
\pagebreak
\chapter{\code{declare} \code{target} Construct}
\label{chap:declare_target}
\section{\code{declare} \code{target} Construct}
\label{sec:declare_target}
\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for a Function}
\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for a Function}
\label{subsec:declare_target_function}
The following example shows how the \code{declare} \code{target} directive
is used to indicate that the corresponding call inside a \code{target} region
@ -15,7 +16,7 @@ the \code{target} region (thus \code{fib}) will execute on the host device.
For C/C++ codes the declaration of the function \code{fib} appears between the \code{declare}
\code{target} and \code{end} \code{declare} \code{target} directives.
\cexample{declare_target}{1c}
\cexample{declare_target}{1}
The Fortran \code{fib} subroutine contains a \code{declare} \code{target} declaration
to indicate to the compiler that it should create a device-executable version of the procedure.
@ -26,7 +27,7 @@ The program uses the \code{module\_fib} module, which presents an explicit inter
the compiler with the \code{declare} \code{target} declarations for processing
the \code{fib} call.
\fexample{declare_target}{1f}
\ffreeexample{declare_target}{1}
The next Fortran example shows the use of an external subroutine. Without an explicit
interface (through module use or an interface block) the \code{declare} \code{target}
@ -34,9 +35,10 @@ declarations within a external subroutine are unknown to the main program unit;
therefore, a \code{declare} \code{target} must be provided within the program
scope for the compiler to determine that a target binary should be available.
\fexample{declare_target}{2f}
\ffreeexample{declare_target}{2}
\section{\code{declare} \code{target} Construct for Class Type}
\subsection{\code{declare} \code{target} Construct for Class Type}
\label{subsec:declare_target_class}
\cppspecificstart
The following example shows how the \code{declare} \code{target} and \code{end}
@ -45,10 +47,11 @@ of a variable \plc{varY} with a class type \code{typeY}. The member function \co
be accessed on a target device because its declaration did not appear between \code{declare}
\code{target} and \code{end} \code{declare} \code{target} directives.
\cnexample{declare_target}{2c}
\cppnexample{declare_target}{2}
\cppspecificend
\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for Variables}
\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for Variables}
\label{subsec:declare_target_variables}
The following examples show how the \code{declare} \code{target} and \code{end}
\code{declare} \code{target} directives are used to indicate that global variables
@ -62,13 +65,13 @@ is then used to manage the consistency of the variables \plc{p}, \plc{v1}, and \
data environment of the encountering host device task and the implicit device data
environment of the default target device.
\cexample{declare_target}{3c}
\cexample{declare_target}{3}
The Fortran version of the above C code uses a different syntax. Fortran modules
use a list syntax on the \code{declare} \code{target} directive to declare
mapped variables.
\fexample{declare_target}{3f}
\ffreeexample{declare_target}{3}
The following example also indicates that the function \code{Pfun()} is available on the
target device, as well as the variable \plc{Q}, which is mapped to the implicit device
@ -81,7 +84,7 @@ In the following example, the function and variable declarations appear between
the \code{declare} \code{target} and \code{end} \code{declare} \code{target}
directives.
\cexample{declare_target}{4c}
\cexample{declare_target}{4}
The Fortran version of the above C code uses a different syntax. In Fortran modules
a list syntax on the \code{declare} \code{target} directive is used to declare
@ -90,9 +93,10 @@ separated list. When the \code{declare} \code{target} directive is used to
declare just the procedure, the procedure name need not be listed -- it is implicitly
assumed, as illustrated in the \code{Pfun()} function.
\fexample{declare_target}{4f}
\ffreeexample{declare_target}{4}
\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} with \code{declare} \code{simd}}
\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} with \code{declare} \code{simd}}
\label{subsec:declare_target_simd}
The following example shows how the \code{declare} \code{target} and \code{end}
\code{declare} \code{target} directives are used to indicate that a function
@ -100,7 +104,7 @@ is available on a target device. The \code{declare} \code{simd} directive indica
that there is a SIMD version of the function \code{P()} that is available on the target
device as well as one that is available on the host device.
\cexample{declare_target}{5c}
\cexample{declare_target}{5}
The Fortran version of the above C code uses a different syntax. Fortran modules
use a list syntax of the \code{declare} \code{target} declaration for the mapping.
@ -109,5 +113,30 @@ The function declaration does not use a list and implicitly assumes the function
name. In this Fortran example row and column indices are reversed relative to the
C/C++ example, as is usual for codes optimized for memory access.
\fexample{declare_target}{5f}
\ffreeexample{declare_target}{5}
\subsection{\code{declare}~\code{target} Directive with \code{link} Clause}
\label{subsec:declare_target_link}
In the OpenMP 4.5 standard the \code{declare}~\code{target} directive was extended to allow static
data to be mapped, \emph{when needed}, through a \code{link} clause.
Data storage for items listed in the \code{link} clause becomes available on the device
when it is mapped implicitly or explicitly in a \code{map} clause, and it persists for the scope of
the mapping (as specified by a \code{target} construct,
a \code{target}~\code{data} construct, or
\code{target}~\code{enter/exit}~\code{data} constructs).
Tip: When the global data items cannot all fit on a device and are not needed
simultaneously, use the \code{link} clause and map the data only when it is needed.
The following C and Fortran examples show two sets of data (single precision and double precision)
that are global on the host for the entire execution on the host; but are only used
globally on the device for part of the program execution. The single precision data
are allocated and persist only for the first \code{target} region. Similarly, the
double precision data are in scope on the device only for the second \code{target} region.
\cexample{declare_target}{6}
\ffreeexample{declare_target}{6}
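A reduced C sketch of the single-precision half of this scheme (the array name and size are illustrative assumptions):
\begin{verbatim}
#define N 100000
float sp_data[N];                          /* global on the host throughout   */
#pragma omp declare target link(sp_data)   /* device storage only when mapped */

void single_precision_pass(void)
{
   #pragma omp target map(tofrom: sp_data) /* sp_data exists on the device
                                              only for this target region    */
   for (int i = 0; i < N; i++)
      sp_data[i] = 2.0f * (float)i;
}
\end{verbatim}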

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{The \code{default(none)} Clause}
\label{chap:default_none}
\section{The \code{default(none)} Clause}
\label{sec:default_none}
The following example distinguishes the variables that are affected by the \code{default(none)}
clause from those that are not.
@ -11,9 +11,9 @@ are no longer predetermined shared. Thus, these variables (variable \plc{c} in
need to be explicitly listed
in data-sharing attribute clauses when the \code{default(none)} clause is specified.
\cnexample{default_none}{1c}
\cnexample{default_none}{1}
\ccppspecificend
\fexample{default_none}{1f}
\fexample{default_none}{1}

View File

@ -1,35 +1,57 @@
\pagebreak
\chapter{Device Routines}
\label{chap:device}
\section{Device Routines}
\label{sec:device}
\section{\code{omp\_is\_initial\_device} Routine}
\subsection{\code{omp\_is\_initial\_device} Routine}
\label{subsec:device_is_initial}
The following example shows how the \code{omp\_is\_initial\_device} runtime library routine
can be used to query if a code is executing on the initial host device or on a
target device. The example then sets the number of threads in the \code{parallel}
region based on where the code is executing.
\cexample{device}{1c}
\cexample{device}{1}
\fexample{device}{1f}
\ffreeexample{device}{1}
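A compact C sketch of such a query (not the bundled example; the routine name and return convention are assumptions):
\begin{verbatim}
#include <omp.h>

/* returns 1 if the target region actually ran on a target device */
int ran_on_device(void)
{
   int on_host = 1;
   #pragma omp target map(tofrom: on_host)
   {
      on_host = omp_is_initial_device();   /* 0 when executing on a device */
   }
   return !on_host;
}
\end{verbatim}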
\section{\code{omp\_get\_num\_devices} Routine}
\subsection{\code{omp\_get\_num\_devices} Routine}
\label{subsec:device_num_devices}
The following example shows how the \code{omp\_get\_num\_devices} runtime library routine
can be used to determine the number of devices.
\cexample{device}{2c}
\cexample{device}{2}
\fexample{device}{2f}
\ffreeexample{device}{2}
\section{\code{omp\_set\_default\_device} and \\
\subsection{\code{omp\_set\_default\_device} and \\
\code{omp\_get\_default\_device} Routines}
\label{subsec:device_is_set_get_default}
The following example shows how the \code{omp\_set\_default\_device} and \code{omp\_get\_default\_device}
runtime library routines can be used to set the default device and determine the
default device respectively.
\cexample{device}{3c}
\cexample{device}{3}
\fexample{device}{3f}
\ffreeexample{device}{3}
\subsection{Target Memory and Device Pointers Routines}
\label{subsec:target_mem_and_device_ptrs}
The following example shows how to create space on a device, transfer data
to and from that space, and free the space, using API calls. The API calls
directly execute allocation, copy and free operations on the device, without invoking
any mapping through a \code{target} directive. The \code{omp\_target\_alloc} routine allocates space
and returns a device pointer for referencing the space in the \code{omp\_target\_memcpy}
API routine on the host. The \code{omp\_target\_free} routine frees the space on the device.
The example also illustrates how to access that space
in a \code{target} region by exposing the device pointer in an \code{is\_device\_ptr} clause.
The example creates an array of cosine values on the default device, to be used
on the host device. The function fails if a default device is not available.
\cexample{device}{4}
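The sequence of calls can be sketched in C as follows (a sketch only; the function name, error convention, and the cosine computation on the device are illustrative assumptions):
\begin{verbatim}
#include <stdlib.h>
#include <math.h>
#include <omp.h>

/* fills h_cos[0..n-1] with values computed on the default device;
   returns nonzero if no device space could be allocated            */
int make_cosines(double *h_cos, int n)
{
   int dev  = omp_get_default_device();
   int host = omp_get_initial_device();

   double *d_buf = (double *)omp_target_alloc(n*sizeof(double), dev);
   if (d_buf == NULL) return 1;

   #pragma omp target is_device_ptr(d_buf) device(dev)
   for (int i = 0; i < n; i++) d_buf[i] = cos((double)i);

   /* dst, src, length, dst_offset, src_offset, dst_device, src_device */
   omp_target_memcpy(h_cos, d_buf, n*sizeof(double), 0, 0, host, dev);

   omp_target_free(d_buf, dev);
   return 0;
}
\end{verbatim}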

68
Examples_doacross.tex Normal file
View File

@ -0,0 +1,68 @@
\pagebreak
\section{Doacross Loop Nest}
\label{sec:doacross}
An \code{ordered} clause can be used on a loop construct with an integer
parameter argument to define the number of associated loops within
a \plc{doacross loop nest} where cross-iteration dependences exist.
A \code{depend} clause on an \code{ordered} construct within an ordered
loop describes the dependences of the \plc{doacross} loops.
In the code below, the \code{depend(sink:i-1)} clause defines an \plc{i-1}
to \plc{i} cross-iteration dependence that specifies a wait point for
the completion of computation from iteration \plc{i-1} before proceeding
to the subsequent statements. The \code{depend(source)} clause indicates
the completion of computation from the current iteration (\plc{i})
to satisfy the cross-iteration dependence that arises from the iteration.
For this example the same sequential ordering could have been achieved
with an \code{ordered} clause without a parameter, on the loop directive,
and a single \code{ordered} directive without the \code{depend} clause
specified for the statement executing the \plc{bar} function.
\cexample{doacross}{1}
\ffreeexample{doacross}{1}
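In outline, the pattern looks like the following C sketch (the computation is illustrative; it is not the bundled example source):
\begin{verbatim}
void doacross_sketch(int n, float *a)
{
   a[0] = 0.0f;
   #pragma omp parallel for ordered(1)
   for (int i = 1; i < n; i++) {
      a[i] = 0.5f * (float)i;             /* independent work               */
      #pragma omp ordered depend(sink: i-1)
      a[i] += a[i-1];                     /* needs iteration i-1 completed  */
      #pragma omp ordered depend(source)  /* this iteration's result ready  */
   }
}
\end{verbatim}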
The following code is similar to the previous example but with
\plc{doacross loop nest} extended to two nested loops, \plc{i} and \plc{j},
as specified by the \code{ordered(2)} clause on the loop directive.
In the C/C++ code, the \plc{i} and \plc{j} loops are the first and
second associated loops, respectively, whereas
in the Fortran code, the \plc{j} and \plc{i} loops are the first and
second associated loops, respectively.
The \code{depend(sink:i-1,j)} and \code{depend(sink:i,j-1)} clauses in
the C/C++ code define cross-iteration dependences in two dimensions from
iterations (\plc{i-1, j}) and (\plc{i, j-1}) to iteration (\plc{i, j}).
Likewise, the \code{depend(sink:j-1,i)} and \code{depend(sink:j,i-1)} clauses
in the Fortran code define cross-iteration dependences from iterations
(\plc{j-1, i}) and (\plc{j, i-1}) to iteration (\plc{j, i}).
\cexample{doacross}{2}
\ffreeexample{doacross}{2}
The following example shows the incorrect use of the \code{ordered}
directive with a \code{depend} clause. There are two issues with the code.
The first issue is a missing \code{ordered}~\code{depend(source)} directive,
which could cause a deadlock.
The second issue is that the \code{depend(sink:i+1,j)} and \code{depend(sink:i,j+1)}
clauses define dependences on lexicographically later
source iterations (\plc{i+1, j}) and (\plc{i, j+1}), which could also cause
a deadlock, since those iterations may not start to execute until the current iteration completes.
\cexample{doacross}{3}
\ffreeexample{doacross}{3}
The following example illustrates the use of the \code{collapse} clause for
a \plc{doacross loop nest}. The \plc{i} and \plc{j} loops are the associated
loops for the collapsed loop as well as for the \plc{doacross loop nest}.
The example also shows a compliant usage of the dependence source
directive placed before the corresponding sink directive.
Checking the completion of computation from previous iterations at the sink point can occur after the source statement.
\cexample{doacross}{4}
\ffreeexample{doacross}{4}

View File

@ -1,12 +1,12 @@
\pagebreak
\chapter{The \code{flush} Construct without a List}
\label{chap:flush_nolist}
\section{The \code{flush} Construct without a List}
\label{sec:flush_nolist}
The following example distinguishes the shared variables affected by a \code{flush}
construct with no list from the shared objects that are not affected:
\cexample{flush_nolist}{1c}
\cexample{flush_nolist}{1}
\fexample{flush_nolist}{1f}
\fexample{flush_nolist}{1}

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{Fortran Restrictions on the \code{do} Construct}
\label{chap:fort_do}
\section{Fortran Restrictions on the \code{do} Construct}
\label{sec:fort_do}
\fortranspecificstart
If an \code{end do} directive follows a \plc{do-construct} in which several
@ -8,12 +8,12 @@ If an \code{end do} directive follows a \plc{do-construct} in which several
directive can only be specified for the outermost of these \code{DO} statements.
The following example contains correct usages of loop constructs:
\fnexample{fort_do}{1f}
\fnexample{fort_do}{1}
The following example is non-conforming because the matching \code{do} directive
for the \code{end do} does not precede the outermost loop:
\fnexample{fort_do}{2f}
\fnexample{fort_do}{2}
\fortranspecificend

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{Fortran Private Loop Iteration Variables}
\label{chap:fort_loopvar}
\section{Fortran Private Loop Iteration Variables}
\label{sec:fort_loopvar}
\fortranspecificstart
In general, loop iteration variables will be private when used in the \plc{do-loop}
@ -10,12 +10,12 @@ the OpenMP 4.0 specification). In the following example of a sequential
loop in a \code{parallel} construct the loop iteration variable \plc{I} will
be private.
\fnexample{fort_loopvar}{1f}
\ffreenexample{fort_loopvar}{1}
In exceptional cases, loop iteration variables can be made shared, as in the following
example:
\fnexample{fort_loopvar}{2f}
\ffreenexample{fort_loopvar}{2}
Note however that the use of shared loop iteration variables can easily lead to
race conditions.

View File

@ -1,7 +1,7 @@
\pagebreak
\chapter{Race Conditions Caused by Implied Copies of Shared Variables in Fortran}
\section{Race Conditions Caused by Implied Copies of Shared Variables in Fortran}
\fortranspecificstart
\label{chap:fort_race}
\label{sec:fort_race}
The following example contains a race condition, because the shared variable, which
is an array section, is passed as an actual argument to a routine that has an assumed-size
@ -10,7 +10,7 @@ may cause the compiler to copy the argument into a temporary location prior to
the call and copy from the temporary location into the original variable when the
subroutine returns. This copying would cause races in the \code{parallel} region.
\fnexample{fort_race}{1f}
\ffreenexample{fort_race}{1}
\fortranspecificend

View File

@ -1,23 +1,23 @@
\pagebreak
\chapter{Fortran Restrictions on Storage Association with the \code{private} Clause}
\section{Fortran Restrictions on Storage Association with the \code{private} Clause}
\fortranspecificstart
\label{chap:fort_sa_private}
\label{sec:fort_sa_private}
The following non-conforming examples illustrate the implications of the \code{private}
clause rules with regard to storage association.
\fnexample{fort_sa_private}{1f}
\fnexample{fort_sa_private}{1}
\fnexample{fort_sa_private}{2f}
\fnexample{fort_sa_private}{2}
\fnexample{fort_sa_private}{3}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
\fnexample{fort_sa_private}{3f}
\fnexample{fort_sa_private}{4}
\fnexample{fort_sa_private}{4f}
\fnexample{fort_sa_private}{5f}
\fnexample{fort_sa_private}{5}
\fortranspecificend

View File

@ -1,7 +1,7 @@
\pagebreak
\chapter{Fortran Restrictions on \code{shared} and \code{private} Clauses with Common Blocks}
\section{Fortran Restrictions on \code{shared} and \code{private} Clauses with Common Blocks}
\fortranspecificstart
\label{chap:fort_sp_common}
\label{sec:fort_sp_common}
When a named common block is specified in a \code{private}, \code{firstprivate},
or \code{lastprivate} clause of a construct, none of its members may be declared
@ -10,11 +10,11 @@ illustrate this point.
The following example is conforming:
\fnexample{fort_sp_common}{1f}
\fnexample{fort_sp_common}{1}
The following example is also conforming:
\fnexample{fort_sp_common}{2f}
\fnexample{fort_sp_common}{2}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
@ -22,17 +22,17 @@ The following example is also conforming:
The following example is conforming:
\fnexample{fort_sp_common}{3f}
\fnexample{fort_sp_common}{3}
The following example is non-conforming because \code{x} is a constituent element
of \code{c}:
\fnexample{fort_sp_common}{4f}
\fnexample{fort_sp_common}{4}
The following example is non-conforming because a common block may not be declared
both shared and private:
\fnexample{fort_sp_common}{5f}
\fnexample{fort_sp_common}{5}
\fortranspecificend

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{The \code{firstprivate} Clause and the \code{sections} Construct}
\label{chap:fpriv_sections}
\section{The \code{firstprivate} Clause and the \code{sections} Construct}
\label{sec:fpriv_sections}
In the following example of the \code{sections} construct the \code{firstprivate}
clause is used to initialize the private copy of \code{section\_count} of each
@ -11,8 +11,8 @@ thread executes the two sections, one section will print the value 1 and the oth
will print the value 2. Since the order of execution of the two sections in this
case is unspecified, it is unspecified which section prints which value.
\cexample{fpriv_sections}{1c}
\cexample{fpriv_sections}{1}
\fexample{fpriv_sections}{1f}
\ffreeexample{fpriv_sections}{1}

View File

@ -1,21 +1,21 @@
\pagebreak
\chapter{The \code{omp\_get\_num\_threads} Routine}
\label{chap:get_nthrs}
\section{The \code{omp\_get\_num\_threads} Routine}
\label{sec:get_nthrs}
In the following example, the \code{omp\_get\_num\_threads} call returns 1 in
the sequential part of the code, so \code{np} will always be equal to 1. To determine
the number of threads that will be deployed for the \code{parallel} region, the
call should be inside the \code{parallel} region.
\cexample{get_nthrs}{1c}
\cexample{get_nthrs}{1}
\fexample{get_nthrs}{1f}
\fexample{get_nthrs}{1}
The following example shows how to rewrite this program without including a query
for the number of threads:
\cexample{get_nthrs}{2c}
\cexample{get_nthrs}{2}
\fexample{get_nthrs}{2f}
\fexample{get_nthrs}{2}

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{Internal Control Variables (ICVs)}
\label{chap:icv}
\section{Internal Control Variables (ICVs)}
\label{sec:icv}
According to Section 2.3 of the OpenMP 4.0 specification, an OpenMP implementation must act as if there are ICVs that control
the behavior of the program. This example illustrates two ICVs, \plc{nthreads-var}
@ -50,7 +50,7 @@ one of the threads in the team. Since we have a total of two inner \code{paralle
regions, the print statement will be executed twice -- once per inner \code{parallel}
region.
\cexample{icv}{1c}
\cexample{icv}{1}
\fexample{icv}{1f}
\fexample{icv}{1}

View File

@ -1,11 +1,10 @@
\pagebreak
\chapter{The \code{omp\_init\_lock} Routine}
\label{chap:init_lock}
\subsection{The \code{omp\_init\_lock} Routine}
\label{subsec:init_lock}
The following example demonstrates how to initialize an array of locks in a \code{parallel}
region by using \code{omp\_init\_lock}.
\cexample{init_lock}{1c}
\cppexample{init_lock}{1}
\fexample{init_lock}{1f}
\fexample{init_lock}{1}

View File

@ -0,0 +1,10 @@
%\pagebreak
\subsection{The \code{omp\_init\_lock\_with\_hint} Routine}
\label{subsec:init_lock_with_hint}
The following example demonstrates how to initialize an array of locks in a \code{parallel} region by using \code{omp\_init\_lock\_with\_hint}.
Note that hints are combined with an \code{|} or \code{+} operator in C/C++ and a \code{+} operator in Fortran.
\cppexample{init_lock_with_hint}{1}
\fexample{init_lock_with_hint}{1}
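A C sketch of the initialization (the lock count and the particular hint combination are illustrative):
\begin{verbatim}
#include <stdlib.h>
#include <omp.h>

omp_lock_t *new_locks(int n)
{
   omp_lock_t *lck = (omp_lock_t *)malloc(n * sizeof(omp_lock_t));

   #pragma omp parallel for
   for (int i = 0; i < n; i++)
      omp_init_lock_with_hint(&lck[i],
            omp_lock_hint_contended | omp_lock_hint_speculative);
   return lck;
}
\end{verbatim}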

View File

@ -1,14 +1,14 @@
\pagebreak
\chapter{The \code{lastprivate} Clause}
\label{chap:lastprivate}
\section{The \code{lastprivate} Clause}
\label{sec:lastprivate}
Correct execution sometimes depends on the value that the last iteration of a loop
assigns to a variable. Such programs must list all such variables in a \code{lastprivate}
clause so that the values of the variables are the same as when the loop is executed
sequentially.
\cexample{lastprivate}{1c}
\cexample{lastprivate}{1}
\fexample{lastprivate}{1f}
\fexample{lastprivate}{1}

View File

@ -0,0 +1,13 @@
\section{\code{linear} Clause in Loop Constructs}
\label{sec:linear_in_loop}
The following example shows the use of the \code{linear} clause in a loop
construct to allow the proper parallelization of a loop that contains
an induction variable (\plc{j}). At the end of the execution of
the loop construct, the original variable \plc{j} is updated with
the value \plc{N/2} from the last iteration of the loop.
\cexample{linear_in_loop}{1}
\ffreeexample{linear_in_loop}{1}
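The essence of the example can be sketched in C as follows (the array contents and function name are illustrative):
\begin{verbatim}
#define N 100

void compact(const float *a, float *b)   /* b has N/2 elements */
{
   int j = 0;
   #pragma omp parallel for linear(j:1)  /* j increases by 1 per iteration */
   for (int i = 0; i < N; i += 2) {
      b[j] = a[i] * 2.0f;
      j++;                               /* induction variable             */
   }
   /* here the original j holds N/2, the value from the last iteration */
}
\end{verbatim}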

View File

@ -1,6 +1,5 @@
\pagebreak
\chapter{Ownership of Locks}
\label{chap:lock_owner}
\subsection{Ownership of Locks}
\label{subsec:lock_owner}
Ownership of locks has changed since OpenMP 2.5. In OpenMP 2.5, locks are owned
by threads; so a lock released by the \code{omp\_unset\_lock} routine must be
@ -16,8 +15,8 @@ the same). However, it is not conforming beginning with OpenMP 3.0, because the
region that releases the lock \code{lck} is different from the task region that
acquires the lock.
\cexample{lock_owner}{1c}
\cexample{lock_owner}{1}
\fexample{lock_owner}{1f}
\fexample{lock_owner}{1}

5
Examples_locks.tex Normal file
View File

@ -0,0 +1,5 @@
\pagebreak
\section{Lock Routines}
\label{sec:locks}
This section is about the use of lock routines for synchronization.

View File

@ -1,13 +1,13 @@
\pagebreak
\chapter{The \code{master} Construct}
\label{chap:master}
\section{The \code{master} Construct}
\label{sec:master}
The following example demonstrates the \code{master} construct. In the example, the master thread
keeps track of how many iterations have been executed and prints out a progress
report. The other threads skip the master region without waiting.
\cexample{master}{1c}
\cexample{master}{1}
\fexample{master}{1f}
\fexample{master}{1}

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{The OpenMP Memory Model}
\label{chap:mem_model}
\section{The OpenMP Memory Model}
\label{sec:mem_model}
In the following example, at Print 1, the value of \plc{x} could be either 2
or 5, depending on the timing of the threads, and the implementation of the assignment
@ -14,25 +14,25 @@ The barrier after Print 1 contains implicit flushes on all threads, as well as
a thread synchronization, so the programmer is guaranteed that the value 5 will
be printed by both Print 2 and Print 3.
\cexample{mem_model}{1c}
\cexample{mem_model}{1}
\fexample{mem_model}{1f}
\ffreeexample{mem_model}{1}
The following example demonstrates why synchronization is difficult to perform
correctly through variables. The value of flag is undefined in both prints on thread
1 and the value of data is only well-defined in the second print.
\cexample{mem_model}{2c}
\cexample{mem_model}{2}
\fexample{mem_model}{2f}
\fexample{mem_model}{2}
The next example demonstrates why synchronization is difficult to perform correctly
through variables. Because the \plc{write}(1)-\plc{flush}(1)-\plc{flush}(2)-\plc{read}(2)
sequence cannot be guaranteed in the example, the statements on thread 0 and thread
1 may execute in either order.
\cexample{mem_model}{3c}
\cexample{mem_model}{3}
\fexample{mem_model}{3f}
\fexample{mem_model}{3}

View File

@ -1,11 +1,10 @@
\pagebreak
\chapter{Nestable Lock Routines}
\label{chap:nestable_lock}
\subsection{Nestable Lock Routines}
\label{subsec:nestable_lock}
The following example demonstrates how a nestable lock can be used to synchronize
updates both to a whole structure and to one of its members.
\cexample{nestable_lock}{1c}
\cexample{nestable_lock}{1}
\fexample{nestable_lock}{1f}
\fexample{nestable_lock}{1}

View File

@ -1,18 +1,18 @@
\pagebreak
\chapter{Nested Loop Constructs}
\label{chap:nested_loop}
\section{Nested Loop Constructs}
\label{sec:nested_loop}
The following example of loop construct nesting is conforming because the inner
and outer loop regions bind to different \code{parallel} regions:
\cexample{nested_loop}{1c}
\cexample{nested_loop}{1}
\fexample{nested_loop}{1f}
\fexample{nested_loop}{1}
The following variation of the preceding example is also conforming:
\cexample{nested_loop}{2c}
\cexample{nested_loop}{2}
\fexample{nested_loop}{2f}
\fexample{nested_loop}{2}

View File

@ -1,52 +1,52 @@
\pagebreak
\chapter{Restrictions on Nesting of Regions}
\label{chap:nesting_restrict}
\section{Restrictions on Nesting of Regions}
\label{sec:nesting_restrict}
The examples in this section illustrate the region nesting rules.
The following example is non-conforming because the inner and outer loop regions
are closely nested:
\cexample{nesting_restrict}{1c}
\cexample{nesting_restrict}{1}
\fexample{nesting_restrict}{1f}
\fexample{nesting_restrict}{1}
The following orphaned version of the preceding example is also non-conforming:
\cexample{nesting_restrict}{2c}
\cexample{nesting_restrict}{2}
\fexample{nesting_restrict}{2f}
\fexample{nesting_restrict}{2}
The following example is non-conforming because the loop and \code{single} regions
are closely nested:
\cexample{nesting_restrict}{3c}
\cexample{nesting_restrict}{3}
\fexample{nesting_restrict}{3f}
\fexample{nesting_restrict}{3}
The following example is non-conforming because a \code{barrier} region cannot
be closely nested inside a loop region:
\cexample{nesting_restrict}{4c}
\cexample{nesting_restrict}{4}
\fexample{nesting_restrict}{4f}
\fexample{nesting_restrict}{4}
The following example is non-conforming because the \code{barrier} region cannot
be closely nested inside the \code{critical} region. If this were permitted,
it would result in deadlock due to the fact that only one thread at a time can
enter the \code{critical} region:
\cexample{nesting_restrict}{5c}
\cexample{nesting_restrict}{5}
\fexample{nesting_restrict}{5f}
\fexample{nesting_restrict}{5}
The following example is non-conforming because the \code{barrier} region cannot
be closely nested inside the \code{single} region. If this were permitted, it
would result in deadlock due to the fact that only one thread executes the \code{single}
region:
\cexample{nesting_restrict}{6c}
\cexample{nesting_restrict}{6}
\fexample{nesting_restrict}{6f}
\fexample{nesting_restrict}{6}

View File

@ -1,14 +1,14 @@
\pagebreak
\chapter{The \code{nowait} Clause}
\label{chap:nowait}
\section{The \code{nowait} Clause}
\label{sec:nowait}
If there are multiple independent loops within a \code{parallel} region, you
can use the \code{nowait} clause to avoid the implied barrier at the end of the
loop construct, as follows:
\cexample{nowait}{1c}
\cexample{nowait}{1}
\fexample{nowait}{1f}
\fexample{nowait}{1}
In the following example, static scheduling distributes the same logical iteration
numbers to the threads that execute the three loop regions. This allows the \code{nowait}
@ -22,7 +22,7 @@ to \code{n-1} (from \code{1} to \code{N} in the Fortran version), while the
iteration space of the last loop is from \code{1} to \code{n} (\code{2} to
\code{N+1} in the Fortran version).
\cexample{nowait}{2c}
\cexample{nowait}{2}
\fexample{nowait}{2f}
\ffreeexample{nowait}{2}

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{Interaction Between the \code{num\_threads} Clause and \code{omp\_set\_dynamic}}
\label{chap:nthrs_dynamic}
\section{Interaction Between the \code{num\_threads} Clause and \code{omp\_set\_dynamic}}
\label{sec:nthrs_dynamic}
The following example demonstrates the \code{num\_threads} clause and the effect
of the \\
@ -12,17 +12,17 @@ of threads in OpenMP implementations that support it. In this case, 10 threads
are provided. Note that in case of an error the OpenMP implementation is free to
abort the program or to supply any number of threads available.
\cexample{nthrs_dynamic}{1c}
\cexample{nthrs_dynamic}{1}
\fexample{nthrs_dynamic}{1f}
\fexample{nthrs_dynamic}{1}
The call to the \code{omp\_set\_dynamic} routine with a non-zero argument in
C/C++, or \code{.TRUE.} in Fortran, allows the OpenMP implementation to choose
any number of threads between 1 and 10.
\cexample{nthrs_dynamic}{2c}
\cexample{nthrs_dynamic}{2}
\fexample{nthrs_dynamic}{2f}
\fexample{nthrs_dynamic}{2}
It is good practice to set the \plc{dyn-var} ICV explicitly by calling the \code{omp\_set\_dynamic}
routine, as its default setting is implementation defined.

View File

@ -1,12 +1,12 @@
\pagebreak
\chapter{Controlling the Number of Threads on Multiple Nesting Levels}
\label{chap:nthrs_nesting}
\section{Controlling the Number of Threads on Multiple Nesting Levels}
\label{sec:nthrs_nesting}
The following examples demonstrate how to use the \code{OMP\_NUM\_THREADS} environment
variable to control the number of threads on multiple nesting levels:
\cexample{nthrs_nesting}{1c}
\cexample{nthrs_nesting}{1}
\fexample{nthrs_nesting}{1f}
\fexample{nthrs_nesting}{1}

View File

@ -1,28 +1,28 @@
\pagebreak
\chapter{The \code{ordered} Clause and the \code{ordered} Construct}
\label{chap:ordered}
\section{The \code{ordered} Clause and the \code{ordered} Construct}
\label{sec:ordered}
Ordered constructs are useful for sequentially ordering the output from work that
is done in parallel. The following program prints out the indices in sequential
order:
\cexample{ordered}{1c}
\cexample{ordered}{1}
\fexample{ordered}{1f}
\fexample{ordered}{1}
It is possible to have multiple \code{ordered} constructs within a loop region
with the \code{ordered} clause specified. The first example is non-conforming
because all iterations execute two \code{ordered} regions. An iteration of a
loop must not execute more than one \code{ordered} region:
\cexample{ordered}{2c}
\cexample{ordered}{2}
\fexample{ordered}{2f}
\fexample{ordered}{2}
The following is a conforming example with more than one \code{ordered} construct.
Each iteration will execute only one \code{ordered} region:
\cexample{ordered}{3c}
\cexample{ordered}{3}
\fexample{ordered}{3f}
\fexample{ordered}{3}

View File

@ -1,12 +1,12 @@
\pagebreak
\chapter{The \code{parallel} Construct}
\label{chap:parallel}
\section{The \code{parallel} Construct}
\label{sec:parallel}
The \code{parallel} construct can be used in coarse-grain parallel programs.
In the following example, each thread in the \code{parallel} region decides what
part of the global array \plc{x} to work on, based on the thread number:
\cexample{parallel}{1c}
\cexample{parallel}{1}
\fexample{parallel}{1f}
\fexample{parallel}{1}

View File

@ -1,11 +1,12 @@
\chapter{A Simple Parallel Loop}
\label{chap:ploop}
\pagebreak
\section{A Simple Parallel Loop}
\label{sec:ploop}
The following example demonstrates how to parallelize a simple loop using the parallel
loop construct. The loop iteration variable is private by default, so it is not
necessary to specify it explicitly in a \code{private} clause.
\cexample{ploop}{1c}
\cexample{ploop}{1}
\fexample{ploop}{1f}
\fexample{ploop}{1}

View File

@ -1,11 +1,11 @@
\pagebreak
\chapter{Parallel Random Access Iterator Loop}
\section{Parallel Random Access Iterator Loop}
\cppspecificstart
\label{chap:pra_iterator}
\label{sec:pra_iterator}
The following example shows a parallel random access iterator loop.
\cnexample{pra_iterator}{1c}
\cppnexample{pra_iterator}{1}
\cppspecificend

View File

@ -1,31 +1,31 @@
\pagebreak
\chapter{The \code{private} Clause}
\label{chap:private}
\section{The \code{private} Clause}
\label{sec:private}
In the following example, the values of original list items \plc{i} and \plc{j}
are retained on exit from the \code{parallel} region, while the private list
items \plc{i} and \plc{j} are modified within the \code{parallel} construct.
\cexample{private}{1c}
\cexample{private}{1}
\fexample{private}{1f}
\fexample{private}{1}
In the following example, all uses of the variable \plc{a} within the loop construct
in the routine \plc{f} refer to a private list item \plc{a}, while it is
unspecified whether references to \plc{a} in the routine \plc{g} are to a
private list item or the original list item.
\cexample{private}{2c}
\cexample{private}{2}
\fexample{private}{2f}
\fexample{private}{2}
The following example demonstrates that a list item that appears in a \code{private}
clause in a \code{parallel} construct may also appear in a \code{private}
clause in an enclosed worksharing construct, which results in an additional private
copy.
\cexample{private}{3c}
\cexample{private}{3}
\fexample{private}{3f}
\fexample{private}{3}

View File

@ -1,13 +1,13 @@
\pagebreak
\chapter{The \code{parallel} \code{sections} Construct}
\label{chap:psections}
\section{The \code{parallel} \code{sections} Construct}
\label{sec:psections}
In the following example, routines \code{XAXIS}, \code{YAXIS}, and \code{ZAXIS} can
be executed concurrently. The first \code{section} directive is optional. Note
that all \code{section} directives need to appear in the \code{parallel sections}
construct.
\cexample{psections}{1c}
\cexample{psections}{1}
\fexample{psections}{1f}
\fexample{psections}{1}

View File

@ -1,44 +1,44 @@
\pagebreak
\chapter{The \code{reduction} Clause}
\label{chap:reduction}
\section{The \code{reduction} Clause}
\label{sec:reduction}
The following example demonstrates the \code{reduction} clause; note that some
reductions can be expressed in the loop in several ways, as shown for the \code{max}
and \code{min} reductions below:
\cexample{reduction}{1c}
\cexample{reduction}{1}
\fexample{reduction}{1f}
\ffreeexample{reduction}{1}
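The same reductions can be written out in a compact C sketch (illustrative only; \code{fmaxf} is just one of several equivalent ways to express the \code{max} update):
\begin{verbatim}
#include <math.h>

void reduce(const float *a, int n, float *sum, float *biggest)
{
   float xsum = 0.0f, xmax = a[0];

   #pragma omp parallel for reduction(+:xsum) reduction(max:xmax)
   for (int i = 0; i < n; i++) {
      xsum += a[i];
      xmax  = fmaxf(xmax, a[i]);   /* if (a[i] > xmax) xmax = a[i]; also works */
   }
   *sum = xsum;
   *biggest = xmax;
}
\end{verbatim}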
A common implementation of the preceding example is to treat it as if it had been
written as follows:
\cexample{reduction}{2c}
\cexample{reduction}{2}
\fortranspecificstart
\fnexample{reduction}{2f}
\ffreenexample{reduction}{2}
The following program is non-conforming because the reduction is on the
\emph{intrinsic procedure name} \code{MAX} but that name has been redefined to be the variable
named \code{MAX}.
\ffreenexample{reduction}{3}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
\fnexample{reduction}{3f}
The following conforming program performs the reduction using the
\emph{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed
to \code{REN}.
\fnexample{reduction}{4f}
\ffreenexample{reduction}{4}
The following conforming program performs the reduction using
\plc{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed
to \code{MIN}.
\fnexample{reduction}{5f}
\ffreenexample{reduction}{5}
\fortranspecificend
The following example is non-conforming because the initialization (\code{a =
@ -53,8 +53,13 @@ clause. This can be achieved by adding an explicit barrier after the assignment
directive (which has an implied barrier), or by initializing \code{a} before
the start of the \code{parallel} region.
\cexample{reduction}{3c}
\cexample{reduction}{6}
\fexample{reduction}{6f}
\fexample{reduction}{6}
The following example demonstrates the reduction of array \plc{a}. In C/C++ this is illustrated by the explicit use of an array section \plc{a[0:N]} in the \code{reduction} clause. The corresponding Fortran example uses array syntax supported in the base language. As of the OpenMP 4.5 specification, the explicit use of an array section in the \code{reduction} clause in Fortran is not permitted. However, this oversight is expected to be fixed in the next release of the specification.
\cexample{reduction}{7}
\ffreeexample{reduction}{7}
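The C/C++ form can be sketched as follows (the section length of 3 is illustrative; this is not the bundled example source):
\begin{verbatim}
void column_sums(float (*b)[3], int n, float a[3])
{
   a[0] = a[1] = a[2] = 0.0f;

   /* each thread gets a private copy of the whole section a[0:3];
      the copies are combined elementwise with + at the end        */
   #pragma omp parallel for reduction(+: a[0:3])
   for (int i = 0; i < n; i++)
      for (int j = 0; j < 3; j++)
         a[j] += b[i][j];
}
\end{verbatim}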

View File

@ -1,7 +1,7 @@
\pagebreak
\chapter{The \code{omp\_set\_dynamic} and \\
\section{The \code{omp\_set\_dynamic} and \\
\code{omp\_set\_num\_threads} Routines}
\label{chap:set_dynamic_nthrs}
\label{sec:set_dynamic_nthrs}
Some programs rely on a fixed, prespecified number of threads to execute correctly.
Because the default setting for the dynamic adjustment of the number of threads
@ -17,8 +17,8 @@ dynamic threads setting. The dynamic threads mechanism determines the number of
threads to use at the start of the \code{parallel} region and keeps it constant
for the duration of the region.
\cexample{set_dynamic_nthrs}{1c}
\cexample{set_dynamic_nthrs}{1}
\fexample{set_dynamic_nthrs}{1f}
\fexample{set_dynamic_nthrs}{1}

View File

@ -1,6 +1,5 @@
\pagebreak
\chapter{Simple Lock Routines}
\label{chap:simple_lock}
\subsection{Simple Lock Routines}
\label{subsec:simple_lock}
In the following example, the lock routines cause the threads to be idle while
waiting for entry to the first critical section, but to do other work while waiting
@ -10,10 +9,10 @@ function does not, allowing the work in \code{skip} to be done.
Note that the argument to the lock routines should have type \code{omp\_lock\_t},
and that there is no need to flush it.
\cexample{simple_lock}{1c}
\cexample{simple_lock}{1}
Note that there is no need to flush the lock variable.
\fexample{simple_lock}{1f}
\fexample{simple_lock}{1}

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{The \code{single} Construct}
\label{chap:single}
\section{The \code{single} Construct}
\label{sec:single}
The following example demonstrates the \code{single} construct. In the example,
only one thread prints each of the progress messages. All other threads will skip
@ -11,8 +11,8 @@ a \code{nowait} clause can be specified, as is done in the third \code{single}
construct in this example. The user must not make any assumptions as to which thread
will execute a \code{single} region.
\cexample{single}{1c}
\cexample{single}{1}
\fexample{single}{1f}
\fexample{single}{1}

View File

@ -1,31 +1,31 @@
\pagebreak
\chapter{Placement of \code{flush}, \code{barrier}, \code{taskwait}
\section{Placement of \code{flush}, \code{barrier}, \code{taskwait}
and \code{taskyield} Directives}
\label{chap:standalone}
\label{sec:standalone}
The following example is non-conforming, because the \code{flush}, \code{barrier},
\code{taskwait}, and \code{taskyield} directives are stand-alone directives
and cannot be the immediate substatement of an \code{if} statement.
\cexample{standalone}{1c}
\cexample{standalone}{1}
The following example is non-conforming, because the \code{flush}, \code{barrier},
\code{taskwait}, and \code{taskyield} directives are stand-alone directives
and cannot be the action statement of an \code{if} statement or a labeled branch
target.
\fexample{standalone}{1f}
\ffreeexample{standalone}{1}
The following version of the above example is conforming because the \code{flush},
\code{barrier}, \code{taskwait}, and \code{taskyield} directives are enclosed
in a compound statement.
\cexample{standalone}{2c}
\cexample{standalone}{2}
The following example is conforming because the \code{flush}, \code{barrier},
\code{taskwait}, and \code{taskyield} directives are enclosed in an \code{if}
construct or follow the labeled branch target.
\fexample{standalone}{2f}
\ffreeexample{standalone}{2}

View File

@ -1,29 +1,32 @@
\pagebreak
\chapter{\code{target} Construct}
\label{chap:target}
\section{\code{target} Construct}
\label{sec:target}
\section{\code{target} Construct on \code{parallel} Construct}
\subsection{\code{target} Construct on \code{parallel} Construct}
\label{subsec:target_parallel}
The following example shows how the \code{target} construct offloads a code
region to a target device. The variables \plc{p}, \plc{v1}, \plc{v2}, and \plc{N} are implicitly mapped
to the target device.
\cexample{target}{1c}
\cexample{target}{1}
\fexample{target}{1f}
\ffreeexample{target}{1}
\section{\code{target} Construct with \code{map} Clause}
\subsection{\code{target} Construct with \code{map} Clause}
\label{subsec:target_map}
The following example shows how the \code{target} construct offloads a code
region to a target device. The variables \plc{p}, \plc{v1} and \plc{v2} are explicitly mapped to the
target device using the \code{map} clause. The variable \plc{N} is implicitly mapped to
the target device.
\cexample{target}{2c}
\cexample{target}{2}
\fexample{target}{2f}
\ffreeexample{target}{2}
\section{\code{map} Clause with \code{to}/\code{from} map-types}
\subsection{\code{map} Clause with \code{to}/\code{from} map-types}
\label{subsec:target_map_tofrom}
The following example shows how the \code{target} construct offloads a code region
to a target device. In the \code{map} clause, the \code{to} and \code{from}
@ -43,16 +46,17 @@ the variable \plc{p} is not initialized with the value of the corresponding vari
on the host device, and at the end of the \code{target} region the variable \plc{p}
is assigned to the corresponding variable on the host device.
\cexample{target}{3c}
\cexample{target}{3}
The \code{to} and \code{from} map-types allow programmers to optimize data
motion. Since data for the \plc{v} arrays are not returned, and data for the \plc{p} array
are not transferred to the device, only one-half of the data is moved, compared
to the default behavior of an implicit mapping.
\fexample{target}{3f}
\ffreeexample{target}{3}
\section{\code{map} Clause with Array Sections}
\subsection{\code{map} Clause with Array Sections}
\label{subsec:target_array_section}
The following example shows how the \code{target} construct offloads a code region
to a target device. In the \code{map} clause, map-types are used to optimize
@ -60,14 +64,14 @@ the mapping of variables to the target device. Because variables \plc{p}, \plc{v
pointers, array section notation must be used to map the arrays. The notation \code{:N}
is equivalent to \code{0:N}.
\cexample{target}{4c}
\cexample{target}{4}
In C, the length of the pointed-to array must be specified. In Fortran the extent
of the array is known and the length need not be specified. A section of the array
can be specified with the usual Fortran syntax, as shown in the following example.
The value 1 is assumed for the lower bound for array section \plc{v2(:N)}.
\fexample{target}{4f}
\ffreeexample{target}{4}
A more realistic situation in which an assumed-size array is passed to \code{vec\_mult}
requires that the length of the arrays be specified, because the compiler does
@ -75,9 +79,10 @@ not know the size of the storage. A section of the array must be specified with
the usual Fortran syntax, as shown in the following example. The value 1 is assumed
for the lower bound for array section \plc{v2(:N)}.
\fexample{target}{4bf}
\ffreeexample{target}{4b}
\section{\code{target} Construct with \code{if} Clause}
\subsection{\code{target} Construct with \code{if} Clause}
\label{subsec:target_if}
The following example shows how the \code{target} construct offloads a code region
to a target device.
@ -90,7 +95,18 @@ The \code{if} clause on the \code{parallel} construct indicates that if the
variable \plc{N} is smaller than a second threshold then the \code{parallel} region
is inactive.
\cexample{target}{5c}
\cexample{target}{5}
\fexample{target}{5f}
\ffreeexample{target}{5}
The following example is a modification of the above \plc{target.5} code to show the combined \code{target}
and parallel loop directives. It uses the \plc{directive-name} modifier in multiple \code{if}
clauses to specify the component directive to which it applies.
The \code{if} clause with the \code{target} modifier applies to the \code{target} component of the
combined directive, and the \code{if} clause with the \code{parallel} modifier applies
to the \code{parallel} component of the combined directive.
\cexample{target}{6}
\ffreeexample{target}{6}
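A C sketch of a combined construct with \plc{directive-name} modifiers on the
\code{if} clauses is shown below; the threshold macros and helper routines are
illustrative assumptions, not the numbered example source.
\begin{verbatim}
#define THRESHOLD1 1000000   /* assumed offload threshold      */
#define THRESHOLD2 1000      /* assumed parallelism threshold  */

extern void init(float *v1, float *v2, int N);
extern void output(float *p, int N);

void vec_mult(float *p, float *v1, float *v2, int N)
{
   int i;
   init(v1, v2, N);
   #pragma omp target parallel for \
               if(target:   N > THRESHOLD1) \
               if(parallel: N > THRESHOLD2) \
               map(to: v1[0:N], v2[0:N]) map(from: p[0:N])
   for (i = 0; i < N; i++)
      p[i] = v1[i] * v2[i];
   output(p, N);
}
\end{verbatim}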

View File

@ -1,8 +1,9 @@
\pagebreak
\chapter{\code{target} \code{data} Construct}
\label{chap:target_data}
\section{\code{target} \code{data} Construct}
\label{sec:target_data}
\section{Simple \code{target} \code{data} Construct}
\subsection{Simple \code{target} \code{data} Construct}
\label{subsec:target_data_simple}
This example shows how the \code{target} \code{data} construct maps variables
to a device data environment. The \code{target} \code{data} construct creates
@ -13,15 +14,16 @@ variables \plc{v1}, \plc{v2}, and \plc{p} from the enclosing device data environ
\plc{N} is mapped into the new device data environment from the encountering task's data
environment.
\cexample{target_data}{1c}
\cexample{target_data}{1}
The Fortran code passes a reference and specifies the extent of the arrays in the
declaration. No length information is necessary in the \code{map} clause, as is required
with C/C++ pointers.
\fexample{target_data}{1f}
\ffreeexample{target_data}{1}
\section{\code{target} \code{data} Region Enclosing Multiple \code{target} Regions}
\subsection{\code{target} \code{data} Region Enclosing Multiple \code{target} Regions}
\label{subsec:target_data_multiregion}
The following examples show how the \code{target} \code{data} construct maps
variables to a device data environment of a \code{target} region. The \code{target}
@ -36,7 +38,7 @@ In the following example the variables \plc{v1} and \plc{v2} are mapped at each
construct. Instead of mapping the variable \plc{p} twice, once at each \code{target}
construct, \plc{p} is mapped once by the \code{target} \code{data} construct.
\cexample{target_data}{2c}
\cexample{target_data}{2}
The Fortran code uses reference and specifies the extent of the \plc{p}, \plc{v1} and \plc{v2} arrays.
@ -45,14 +47,14 @@ C/C++ pointers. The arrays \plc{v1} and \plc{v2} are mapped at each \code{target
Instead of mapping the array \plc{p} twice, once at each target construct, \plc{p} is mapped
once by the \code{target} \code{data} construct.
\fexample{target_data}{2f}
\ffreeexample{target_data}{2}
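A C sketch of this enclosing \code{target} \code{data} pattern (array names follow
the text; the \code{init\_again} helper is an assumption) is:
\begin{verbatim}
extern void init(float *v1, float *v2, int N);
extern void init_again(float *v1, float *v2, int N);
extern void output(float *p, int N);

void vec_mult(float *p, float *v1, float *v2, int N)
{
   int i;
   init(v1, v2, N);
   #pragma omp target data map(from: p[0:N])   /* p mapped once */
   {
      #pragma omp target map(to: v1[:N], v2[:N])
      #pragma omp parallel for
      for (i = 0; i < N; i++)
         p[i] = v1[i] * v2[i];

      init_again(v1, v2, N);

      #pragma omp target map(to: v1[:N], v2[:N])
      #pragma omp parallel for
      for (i = 0; i < N; i++)
         p[i] = p[i] + (v1[i] * v2[i]);
   }
   output(p, N);
}
\end{verbatim}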
In the following example, the variable \plc{tmp} defaults to the \code{tofrom} map-type
and is mapped at each \code{target} construct. The array \plc{Q} is mapped once at
the enclosing \code{target} \code{data} region instead of at each \code{target}
construct.
\cexample{target_data}{3c}
\cexample{target_data}{3}
In the following example the arrays \plc{v1} and \plc{v2} are mapped at each \code{target}
construct. Instead of mapping the array \plc{Q} twice at each \code{target} construct,
@ -61,9 +63,9 @@ variable is implicitly remapped for each \code{target} region, mapping the value
from the device to the host at the end of the first \code{target} region, and
from the host to the device for the second \code{target} region.
\fexample{target_data}{3f}
\ffreeexample{target_data}{3}
\section{\code{target} \code{data} Construct with Orphaned Call}
\subsection{\code{target} \code{data} Construct with Orphaned Call}
The following two examples show how the \code{target} \code{data} construct
maps variables to a device data environment. The \code{target} \code{data}
@ -88,7 +90,7 @@ of the storage location associated with their corresponding array sections. Note
that the following pairs of array section storage locations are equivalent (\plc{p0[:N]},
\plc{p1[:N]}), (\plc{v1[:N]},\plc{v3[:N]}), and (\plc{v2[:N]},\plc{v4[:N]}).
\cexample{target_data}{4c}
\cexample{target_data}{4}
The Fortran code maps the pointers and storage in an identical manner (same extent,
but uses indices from 1 to \plc{N}).
@ -104,7 +106,7 @@ assigned the address of the storage location associated with their corresponding
array sections. Note that the following pair of array storage locations are equivalent
(\plc{p0},\plc{p1}), (\plc{v1},\plc{v3}), and (\plc{v2},\plc{v4}).
\fexample{target_data}{4f}
\ffreeexample{target_data}{4}
In the following example, the variables \plc{p1}, \plc{v3}, and \plc{v4} are references to the pointer
@ -113,7 +115,7 @@ environment inherits the pointer variables \plc{p0}, \plc{v1}, and \plc{v2} from
\code{data} construct's device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already
present in the device data environment.
\cexample{target_data}{5c}
\cppexample{target_data}{5}
In the following example, the usual Fortran approach is used for dynamic memory.
The \plc{p0}, \plc{v1}, and \plc{v2} arrays are allocated in the main program and passed as references
@ -123,9 +125,10 @@ environment inherits the arrays \plc{p0}, \plc{v1}, and \plc{v2} from the enclos
device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already present in the device
data environment.
\fexample{target_data}{5f}
\ffreeexample{target_data}{5}
\section{\code{target} \code{data} Construct with \code{if} Clause}
\subsection{\code{target} \code{data} Construct with \code{if} Clause}
\label{subsec:target_data_if}
The following two examples show how the \code{target} \code{data} construct
maps variables to a device data environment.
@ -140,7 +143,7 @@ variable \plc{p} is implicitly mapped with a map-type of \code{tofrom}, but the
location for the array section \plc{p[0:N]} will not be mapped in the device data environments
of the \code{target} constructs.
\cexample{target_data}{6c}
\cexample{target_data}{6}
The \code{if} clauses work the same way for the following Fortran code. The \code{target}
constructs enclosed in the \code{target} \code{data} region should also use
@ -148,7 +151,7 @@ an \code{if} clause with the same condition, so that the \code{target} \code{dat
region and the \code{target} region are either both created for the device, or
are both ignored.
\fexample{target_data}{6f}
\ffreeexample{target_data}{6}
In the following example, when the \code{if} clause conditional expression on
the \code{target} construct evaluates to \plc{false}, the target region will
@ -159,7 +162,7 @@ region the array section \plc{p[0:N]} will be assigned from the device data envi
to the corresponding variable in the data environment of the task that encountered
the \code{target} \code{data} construct, resulting in undefined values in \plc{p[0:N]}.
\cexample{target_data}{7c}
\cexample{target_data}{7}
The \code{if} clauses work the same way for the following Fortran code. When
the \code{if} clause conditional expression on the \code{target} construct
@ -171,5 +174,5 @@ region the \plc{p} array will be assigned from the device data environment to th
variable in the data environment of the task that encountered the \code{target}
\code{data} construct, resulting in undefined values in \plc{p}.
\fexample{target_data}{7f}
\ffreeexample{target_data}{7}

View File

@ -0,0 +1,47 @@
%begin
\pagebreak
\section{\code{target} \code{enter} \code{data} and \code{target} \code{exit} \code{data} Constructs}
\label{sec:target_enter_exit_data}
%\section{Simple target enter data and target exit data Constructs}
The structured data construct (\code{target}~\code{data}) provides persistent data on a
device for subsequent \code{target} constructs as shown in the
\code{target}~\code{data} examples above. This is accomplished by creating a single
\code{target}~\code{data} region containing \code{target} constructs.
The unstructured data constructs allow the creation and deletion of data on
the device at any appropriate point within the host code, as shown below
with the \code{target}~\code{enter}~\code{data} and \code{target}~\code{exit}~\code{data} constructs.
The following C++ code creates/deletes a vector in a constructor/destructor
of a class. The constructor creates a vector with \code{target}~\code{enter}~\code{data}
and uses an \code{alloc} modifier in the \code{map} clause to avoid copying values
to the device. The destructor deletes the data (\code{target}~\code{exit}~\code{data})
and uses the \code{delete} modifier in the \code{map} clause to avoid copying data
back to the host. Note that the stand-alone \code{target}~\code{enter}~\code{data} occurs
after the host vector is created, and the \code{target}~\code{exit}~\code{data}
construct occurs before the host data is deleted.
\cppexample{target_unstructured_data}{1}
The following C code allocates and frees the data member of a Matrix structure.
The \code{init\_matrix} function allocates the memory used in the structure and
uses the \code{target}~\code{enter}~\code{data} directive to map it to the target device. The
\code{free\_matrix} function removes the mapped array from the target device
and then frees the memory on the host. Note that the stand-alone
\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the
\code{target}~\code{exit}~\code{data} construct occurs before the host data is freed.
\cexample{target_unstructured_data}{1}
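A C sketch of this allocate-then-map and unmap-then-free pattern is shown below;
the \code{Matrix} structure and field names are hypothetical and may differ from
the numbered example source.
\begin{verbatim}
#include <stdlib.h>

typedef struct {
   double *data;
   int     rows, cols;
} Matrix;                          /* hypothetical structure           */

void init_matrix(Matrix *m, int rows, int cols)
{
   double *d = (double *) malloc(sizeof(double) * rows * cols);
   m->data = d;  m->rows = rows;  m->cols = cols;
   /* create the array on the device; alloc avoids copying host values */
   #pragma omp target enter data map(alloc: d[0:rows*cols])
}

void free_matrix(Matrix *m)
{
   double *d = m->data;
   int     n = m->rows * m->cols;
   /* remove the array from the device; delete avoids copying it back  */
   #pragma omp target exit data map(delete: d[0:n])
   free(d);
}
\end{verbatim}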
The following Fortran code allocates and deallocates a module array. The
\code{initialize} subroutine allocates the module array and uses the
\code{target}~\code{enter}~\code{data} directive to map it to the target device. The
\code{finalize} subroutine removes the mapped array from the target device and
then deallocates the array on the host. Note that the stand-alone
\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the
\code{target}~\code{exit}~\code{data} construct occurs before the host data is deallocated.
\ffreeexample{target_unstructured_data}{1}
%end

View File

@ -1,8 +1,9 @@
\pagebreak
\chapter{\code{target} \code{update} Construct}
\label{chap:target_update}
\section{\code{target} \code{update} Construct}
\label{sec:target_update}
\section{Simple \code{target} \code{data} and \code{target} \code{update} Constructs}
\subsection{Simple \code{target} \code{data} and \code{target} \code{update} Constructs}
\label{subsec:target_data_and_update}
The following example shows how the \code{target} \code{update} construct updates
variables in a device data environment.
@ -26,11 +27,12 @@ region and waits for the completion of the region.
The second \code{target} region uses the updated values of \plc{v1[:N]} and \plc{v2[:N]}.
\cexample{target_update}{1c}
\cexample{target_update}{1}
\fexample{target_update}{1f}
\ffreeexample{target_update}{1}
\section{\code{target} \code{update} Construct with \code{if} Clause}
\subsection{\code{target} \code{update} Construct with \code{if} Clause}
\label{subsec:target_update_if}
The following example shows how the \code{target} \code{update} construct updates
variables in a device data environment.
@ -47,7 +49,7 @@ assigns the new values of \plc{v1} and \plc{v2} from the task's data environment
mapped array sections in the \code{target} \code{data} construct's device data
environment.
\cexample{target_update}{2c}
\cexample{target_update}{2}
\fexample{target_update}{2f}
\ffreeexample{target_update}{2}

View File

@ -1,58 +1,62 @@
\pagebreak
\chapter{Task Dependences}
\label{chap:task_dep}
\section{Task Dependences}
\label{sec:task_depend}
\section{Flow Dependence}
\subsection{Flow Dependence}
\label{subsec:task_flow_depend}
In this example we show a simple flow dependence expressed using the \code{depend}
clause on the \code{task} construct.
\cexample{task_dep}{1c}
\cexample{task_dep}{1}
\fexample{task_dep}{1f}
\ffreeexample{task_dep}{1}
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program
would have a race condition.
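A minimal C sketch of such a flow dependence (not necessarily identical to the
numbered example source) is:
\begin{verbatim}
#include <stdio.h>

int main(void)
{
   int x = 1;
   #pragma omp parallel
   #pragma omp single
   {
      #pragma omp task shared(x) depend(out: x)  /* writer runs first */
         x = 2;
      #pragma omp task shared(x) depend(in: x)   /* reader runs after */
         printf("x = %d\n", x);
   }
   return 0;
}
\end{verbatim}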
\section{Anti-dependence}
\subsection{Anti-dependence}
\label{subsec:task_anti_depend}
In this example we show an anti-dependence expressed using the \code{depend}
clause on the \code{task} construct.
\cexample{task_dep}{2c}
\cexample{task_dep}{2}
\fexample{task_dep}{2f}
\ffreeexample{task_dep}{2}
The program will always print \texttt{"}x = 1\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.
\section{Output Dependence}
\subsection{Output Dependence}
\label{subsec:task_out_depend}
In this example we show an output dependence expressed using the \code{depend}
clause on the \code{task} construct.
\cexample{task_dep}{3c}
\cexample{task_dep}{3}
\fexample{task_dep}{3f}
\ffreeexample{task_dep}{3}
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.
\section{Concurrent Execution with Dependences}
\subsection{Concurrent Execution with Dependences}
\label{subsec:task_concurrent_depend}
In this example we show potentially concurrent execution of tasks using multiple
flow dependences expressed using the \code{depend} clause on the \code{task}
construct.
\cexample{task_dep}{4c}
\cexample{task_dep}{4}
\fexample{task_dep}{4f}
\ffreeexample{task_dep}{4}
The last two tasks are dependent on the first task. However, there is no dependence
between the last two tasks, which may execute in any order (or concurrently if
@ -61,12 +65,13 @@ more than one thread is available). Thus, the possible outputs are \texttt{"}x
If the \code{depend} clauses had been omitted, then all of the tasks could execute
in any order and the program would have a race condition.
\section{Matrix multiplication}
\subsection{Matrix multiplication}
\label{subsec:task_matrix_mult}
This example shows a task-based blocked matrix multiplication. Matrices are of
NxN elements, and the multiplication is implemented using blocks of BSxBS elements.
\cexample{task_dep}{5c}
\cexample{task_dep}{5}
\fexample{task_dep}{5f}
\ffreeexample{task_dep}{5}

View File

@ -0,0 +1,22 @@
\pagebreak
\section{Task Priority}
\label{sec:task_priority}
%\subsection{Task Priority}
%\label{subsec:task_priority}
In this example we compute arrays in a matrix through a \plc{compute\_array} routine.
Each task has a priority value equal to the value of the loop variable \plc{i} at the
moment of its creation. A higher priority value is a hint that the task is a candidate
to run sooner.
The creation of tasks occurs in ascending order (according to the iteration space of
the loop) but a hint, by means of the \code{priority} clause, is provided to reverse
the execution order.
\cexample{task_priority}{1}
\ffreeexample{task_priority}{1}
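A C sketch of this pattern is shown below; the \code{compute\_array} routine and the
flattened matrix layout are assumptions.
\begin{verbatim}
extern void compute_array(float *row, int M);   /* assumed work routine */

void compute_rows(float *matrix, int N, int M)
{
   int i;
   #pragma omp parallel private(i)
   #pragma omp single
   {
      for (i = 0; i < N; i++) {
         /* hint: tasks created later should be scheduled sooner */
         #pragma omp task priority(i)
         compute_array(&matrix[i * M], M);
      }
   }
}
\end{verbatim}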

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{The \code{taskgroup} Construct}
\label{chap:taskgroup}
\section{The \code{taskgroup} Construct}
\label{sec:taskgroup}
In this example, tasks are grouped and synchronized using the \code{taskgroup}
construct.
@ -14,7 +14,7 @@ does not participate in the synchronization, and is left free to execute in para
This contrasts with the behavior of the \code{taskwait} construct, which would
include the background tasks in the synchronization.
\cexample{taskgroup}{1c}
\cexample{taskgroup}{1}
\fexample{taskgroup}{1f}
\ffreeexample{taskgroup}{1}

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{The \code{task} and \code{taskwait} Constructs}
\label{chap:tasking}
\section{The \code{task} and \code{taskwait} Constructs}
\label{sec:task_taskwait}
The following example shows how to traverse a tree-like structure using explicit
tasks. Note that the \code{traverse} function should be called from within a
@ -9,17 +9,17 @@ note that the tasks will be executed in no specified order because there are no
synchronization directives. Thus, assuming that the traversal will be done in post
order, as in the sequential code, is wrong.
\cexample{tasking}{1c}
\cexample{tasking}{1}
\fexample{tasking}{1f}
\ffreeexample{tasking}{1}
In the next example, we force a postorder traversal of the tree by adding a \code{taskwait}
directive. Now, we can safely assume that the left and right sons have been executed
before we process the current node.
\cexample{tasking}{2c}
\cexample{tasking}{2}
\fexample{tasking}{2f}
\ffreeexample{tasking}{2}
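A C sketch of the postorder traversal with \code{taskwait} (the node layout and the
\code{process} routine are assumptions) is:
\begin{verbatim}
struct node {
   struct node *left, *right;
};
extern void process(struct node *p);

/* call traverse() from within a parallel region, e.g., inside single */
void traverse(struct node *p)
{
   if (p->left)
      #pragma omp task             /* p is firstprivate by default */
      traverse(p->left);
   if (p->right)
      #pragma omp task
      traverse(p->right);
   #pragma omp taskwait            /* children finish before this node */
   process(p);
}
\end{verbatim}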
The following example demonstrates how to use the \code{task} construct to process
elements of a linked list in parallel. The thread executing the \code{single}
@ -28,18 +28,18 @@ in the current team. The pointer \plc{p} is \code{firstprivate} by default
on the \code{task} construct so it is not necessary to specify it in a \code{firstprivate}
clause.
\cexample{tasking}{3c}
\cexample{tasking}{3}
\fexample{tasking}{3f}
\ffreeexample{tasking}{3}
The \code{fib()} function should be called from within a \code{parallel} region
for the different specified tasks to be executed in parallel. Also, only one thread
of the \code{parallel} region should call \code{fib()} unless multiple concurrent
Fibonacci computations are desired.
\cexample{tasking}{4c}
\cexample{tasking}{4}
\fexample{tasking}{4f}
\fexample{tasking}{4}
Note: There are more efficient algorithms for computing Fibonacci numbers. This
classic recursion algorithm is for illustrative purposes.
@ -52,9 +52,9 @@ loop to suspend its task at the task scheduling point in the \code{task} directi
and start executing unassigned tasks. Once the number of unassigned tasks is sufficiently
low, the thread may resume execution of the task generating loop.
\cexample{tasking}{5c}
\cexample{tasking}{5}
\pagebreak
\fexample{tasking}{5f}
\fexample{tasking}{5}
The following example is the same as the previous one, except that the tasks are
generated in an untied task. While generating the tasks, the implementation may
@ -69,9 +69,9 @@ to resume the task generating loop. In the previous examples, the other threads
would be forced to idle until the generating thread finishes its long task, since
the task generating loop was in a tied task.
\cexample{tasking}{6c}
\cexample{tasking}{6}
\fexample{tasking}{6f}
\fexample{tasking}{6}
The following two examples demonstrate how the scheduling rules illustrated in
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of
@ -86,20 +86,20 @@ both of the task regions that modify \code{tp}. The parts of these task regions
in which \code{tp} is modified may be executed in any order so the resulting
value of \code{var} can be either 1 or 2.
\cexample{tasking}{7c}
\cexample{tasking}{7}
\fexample{tasking}{7f}
\fexample{tasking}{7}
In this example, scheduling constraints prohibit a thread in the team from executing
a new task that modifies \code{tp} while another such task region tied to the
same thread is suspended. Therefore, the value written will persist across the
task scheduling point.
\cexample{tasking}{8c}
\cexample{tasking}{8}
\fexample{tasking}{8f}
\fexample{tasking}{8}
The following two examples demonstrate how the scheduling rules illustrated in
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of locks
@ -112,20 +112,20 @@ it encounters the task scheduling point at task 3, it could suspend task 1 and
begin task 2 which will result in a deadlock when it tries to enter critical region
1.
\cexample{tasking}{9c}
\cexample{tasking}{9}
\fexample{tasking}{9f}
\fexample{tasking}{9}
In the following example, \code{lock} is held across a task scheduling point.
However, according to the scheduling restrictions, the executing thread cannot
begin executing one of the non-descendant tasks that also acquires \code{lock} before
the task region is complete. Therefore, no deadlock is possible.
\cexample{tasking}{10c}
\cexample{tasking}{10}
\fexample{tasking}{10f}
\ffreeexample{tasking}{10}
The following examples illustrate the use of the \code{mergeable} clause in the
\code{task} construct. In this first example, the \code{task} construct has
@ -139,9 +139,9 @@ outcome does not depend on whether or not the task is merged (that is, the task
will always increment the same variable and will always compute the same value
for \code{x}).
\cexample{tasking}{11c}
\cexample{tasking}{11}
\fexample{tasking}{11f}
\ffreeexample{tasking}{11}
This second example shows an incorrect use of the \code{mergeable} clause. In
this example, the created task will access different instances of the variable
@ -150,9 +150,9 @@ it will access the same variable \code{x} if the task is merged. As a result,
the behavior of the program is unspecified and it can print two different values
for \code{x} depending on the decisions taken by the implementation.
\cexample{tasking}{12c}
\cexample{tasking}{12}
\fexample{tasking}{12f}
\ffreeexample{tasking}{12}
The following example shows the use of the \code{final} clause and the \code{omp\_in\_final}
API call in a recursive binary search program. To reduce overhead, once a certain
@ -170,9 +170,9 @@ in the stack could also be avoided but it would make this example less clear. Th
clause since all tasks created in a \code{final} task region are included tasks
that can be merged if the \code{mergeable} clause is present.
\cexample{tasking}{13c}
\cexample{tasking}{13}
\fexample{tasking}{13f}
\ffreeexample{tasking}{13}
The following example illustrates the difference between the \code{if} and the
\code{final} clauses. The \code{if} clause has a local effect. In the first
@ -184,7 +184,7 @@ task itself. In the second nest of tasks, the nested tasks will be created as in
tasks. Note also that the conditions for the \code{if} and \code{final} clauses
are usually the opposite.
\cexample{tasking}{14c}
\cexample{tasking}{14}
\fexample{tasking}{14f}
\ffreeexample{tasking}{14}

14
Examples_taskloop.tex Normal file
View File

@ -0,0 +1,14 @@
\pagebreak
\section{The \code{taskloop} Construct}
\label{sec:taskloop}
The following example illustrates how to execute a long-running task concurrently with tasks created
with a \code{taskloop} directive for a loop having unbalanced amounts of work for its iterations.
The \code{grainsize} clause specifies that each task is to execute at least 500 iterations of the loop.
The \code{nogroup} clause removes the implicit taskgroup of the \code{taskloop} construct; the explicit \code{taskgroup} construct in the example ensures that the function is not exited before the long-running task and the loops have finished execution.
\cexample{taskloop}{1}
\ffreeexample{taskloop}{1}
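A C sketch of this pattern (helper routines assumed; the numbered example source may
differ in details such as the loop bounds) is:
\begin{verbatim}
extern void long_running_task(void);
extern void loop_body(int i, int j);

void parallel_work(int N, int M)
{
   int i, j;
   #pragma omp taskgroup
   {
      #pragma omp task
      long_running_task();               /* can execute concurrently       */

      #pragma omp taskloop private(j) grainsize(500) nogroup
      for (i = 0; i < N; i++)            /* at least 500 iterations / task */
         for (j = 0; j < M; j++)
            loop_body(i, j);
   }
}
\end{verbatim}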

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{The \code{taskyield} Construct}
\label{chap:taskyield}
\section{The \code{taskyield} Construct}
\label{sec:taskyield}
The following example illustrates the use of the \code{taskyield} directive.
The tasks in the example compute something useful and then do some computation
@ -8,7 +8,7 @@ that must be done in a critical region. By using \code{taskyield} when a task
cannot get access to the \code{critical} region, the implementation can suspend
the current task and schedule some other task that can do something useful.
\cexample{taskyield}{1c}
\cexample{taskyield}{1}
\fexample{taskyield}{1f}
\ffreeexample{taskyield}{1}
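One common shape of this pattern, sketched in C with a lock guarding the critical
work (helper routines assumed; the numbered example source may differ), is:
\begin{verbatim}
#include <omp.h>

extern void something_useful(void);
extern void something_critical(void);

void foo(omp_lock_t *lock, int n)
{
   int i;
   for (i = 0; i < n; i++)
      #pragma omp task
      {
         something_useful();
         while (!omp_test_lock(lock)) {
            #pragma omp taskyield    /* let another task run meanwhile */
         }
         something_critical();
         omp_unset_lock(lock);
      }
}
\end{verbatim}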

View File

@ -1,9 +1,10 @@
\pagebreak
\chapter{\code{teams} Constructs}
\label{chap:teams}
\section{\code{teams} Constructs}
\label{sec:teams}
\section{\code{target} and \code{teams} Constructs with \code{omp\_get\_num\_teams}\\
\subsection{\code{target} and \code{teams} Constructs with \code{omp\_get\_num\_teams}\\
and \code{omp\_get\_team\_num} Routines}
\label{subsec:teams_api}
The following example shows how the \code{target} and \code{teams} constructs
are used to create a league of thread teams that execute a region. The \code{teams}
@ -15,11 +16,12 @@ region. The \code{omp\_get\_team\_num} routine returns the team number, which is
between 0 and one less than the value returned by \code{omp\_get\_num\_teams}. The following
example manually distributes a loop across two teams.
\cexample{teams}{1c}
\cexample{teams}{1}
\fexample{teams}{1f}
\ffreeexample{teams}{1}
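A C sketch of a manual two-team split using \code{omp\_get\_team\_num} is shown
below; the numbered example may differ (for instance, it may also verify the team
count with \code{omp\_get\_num\_teams}).
\begin{verbatim}
#include <omp.h>

float dotprod(float B[], float C[], int N)
{
   float sum0 = 0.0, sum1 = 0.0;
   #pragma omp target map(to: B[0:N], C[0:N]) map(tofrom: sum0, sum1)
   #pragma omp teams num_teams(2)
   {
      int i;
      if (omp_get_team_num() == 0) {          /* team 0: first half  */
         #pragma omp parallel for reduction(+: sum0)
         for (i = 0; i < N/2; i++)
            sum0 += B[i] * C[i];
      } else if (omp_get_team_num() == 1) {   /* team 1: second half */
         #pragma omp parallel for reduction(+: sum1)
         for (i = N/2; i < N; i++)
            sum1 += B[i] * C[i];
      }
   }
   return sum0 + sum1;
}
\end{verbatim}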
\section{\code{target}, \code{teams}, and \code{distribute} Constructs}
\subsection{\code{target}, \code{teams}, and \code{distribute} Constructs}
\label{subsec:teams_distribute}
The following example shows how the \code{target}, \code{teams}, and \code{distribute}
constructs are used to execute a loop nest in a \code{target} region. The \code{teams}
@ -45,11 +47,12 @@ created by the \code{teams} construct. At the end of the \code{teams} region,
each master thread's private copy of \plc{sum} is reduced into the final \plc{sum} that is
implicitly mapped into the \code{target} region.
\cexample{teams}{2c}
\cexample{teams}{2}
\fexample{teams}{2f}
\ffreeexample{teams}{2}
\section{\code{target} \code{teams}, and Distribute Parallel Loop Constructs}
\subsection{\code{target} \code{teams}, and Distribute Parallel Loop Constructs}
\label{subsec:teams_distribute_parallel}
The following example shows how the \code{target} \code{teams} and distribute
parallel loop constructs are used to execute a \code{target} region. The \code{target}
@ -59,12 +62,13 @@ team executes the \code{teams} region.
The distribute parallel loop construct schedules the loop iterations across the
master threads of each team and then across the threads of each team.
\cexample{teams}{3c}
\cexample{teams}{3}
\fexample{teams}{3f}
\ffreeexample{teams}{3}
\section{\code{target} \code{teams} and Distribute Parallel Loop
\subsection{\code{target} \code{teams} and Distribute Parallel Loop
Constructs with Scheduling Clauses}
\label{subsec:teams_distribute_parallel_schedule}
The following example shows how the \code{target} \code{teams} and distribute
parallel loop constructs are used to execute a \code{target} region. The \code{teams}
@ -83,11 +87,12 @@ The \code{schedule} clause indicates that the 1024 iterations distributed to
a master thread are then assigned to the threads in its associated team in chunks
of 64 iterations.
\cexample{teams}{4c}
\cexample{teams}{4}
\fexample{teams}{4f}
\ffreeexample{teams}{4}
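A C sketch of the scheduling clauses described above is shown below; the
\code{num\_teams} and \code{thread\_limit} values are illustrative assumptions.
\begin{verbatim}
#define N (1024*1024)

float dotprod(float B[], float C[])
{
   float sum = 0.0;
   int   i;
   #pragma omp target map(to: B[0:N], C[0:N]) map(tofrom: sum)
   #pragma omp teams num_teams(8) thread_limit(16) reduction(+: sum)
   #pragma omp distribute parallel for reduction(+: sum) \
               dist_schedule(static, 1024) schedule(static, 64)
   for (i = 0; i < N; i++)      /* 1024 iterations per team,         */
      sum += B[i] * C[i];       /* handed out in chunks of 64        */
   return sum;
}
\end{verbatim}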
\section{\code{target} \code{teams} and \code{distribute} \code{simd} Constructs}
\subsection{\code{target} \code{teams} and \code{distribute} \code{simd} Constructs}
\label{subsec:teams_distribute_simd}
The following example shows how the \code{target} \code{teams} and \code{distribute}
\code{simd} constructs are used to execute a loop in a \code{target} region.
@ -97,11 +102,12 @@ master thread of each team executes the \code{teams} region.
The \code{distribute} \code{simd} construct schedules the loop iterations across
the master thread of each team and then uses SIMD parallelism to execute the iterations.
\cexample{teams}{5c}
\cexample{teams}{5}
\fexample{teams}{5f}
\ffreeexample{teams}{5}
\section{\code{target} \code{teams} and Distribute Parallel Loop SIMD Constructs}
\subsection{\code{target} \code{teams} and Distribute Parallel Loop SIMD Constructs}
\label{subsec:teams_distribute_parallel_simd}
The following example shows how the \code{target} \code{teams} and the distribute
parallel loop SIMD constructs are used to execute a loop in a \code{target} \code{teams}
@ -112,7 +118,7 @@ The distribute parallel loop SIMD construct schedules the loop iterations across
the master thread of each team and then across the threads of each team where each
thread uses SIMD parallelism.
\cexample{teams}{6c}
\cexample{teams}{6}
\fexample{teams}{6f}
\ffreeexample{teams}{6}

View File

@ -1,18 +1,18 @@
\pagebreak
\chapter{The \code{threadprivate} Directive}
\label{chap:threadprivate}
\section{The \code{threadprivate} Directive}
\label{sec:threadprivate}
The following examples demonstrate how to use the \code{threadprivate} directive
to give each thread a separate counter.
\cexample{threadprivate}{1c}
\cexample{threadprivate}{1}
\fexample{threadprivate}{1f}
\fexample{threadprivate}{1}
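A minimal C sketch of a \code{threadprivate} counter (not the numbered example
source) is:
\begin{verbatim}
int counter = 0;
#pragma omp threadprivate(counter)

int increment_counter(void)
{
   counter++;          /* each thread updates its own private copy */
   return counter;
}
\end{verbatim}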
\ccppspecificstart
The following example uses \code{threadprivate} on a static variable:
\cnexample{threadprivate}{2c}
\cnexample{threadprivate}{2}
The following example demonstrates unspecified behavior for the initialization
of a \code{threadprivate} variable. A \code{threadprivate} variable is initialized
@ -22,7 +22,7 @@ constructed using the value of \code{x} (which is modified by the statement
region could be either 1 or 2. This problem is avoided for \code{b}, which uses
an auxiliary \code{const} variable and a copy-constructor.
\cnexample{threadprivate}{3c}
\cppnexample{threadprivate}{3}
\ccppspecificend
The following examples show non-conforming uses and correct uses of the \code{threadprivate}
@ -32,29 +32,25 @@ directive.
The following example is non-conforming because the common block is not declared
local to the subroutine that refers to it:
\fnexample{threadprivate}{2f}
\fnexample{threadprivate}{2}
The following example is also non-conforming because the common block is not declared
local to the subroutine that refers to it:
\fnexample{threadprivate}{3f}
\fnexample{threadprivate}{3}
The following example is a correct rewrite of the previous example:
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
\fnexample{threadprivate}{4f}
\fnexample{threadprivate}{4}
The following is an example of the use of \code{threadprivate} for local variables:
\fnexample{threadprivate}{5f}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
\fnexample{threadprivate}{5}
The above program, if executed by two threads, will print one of the following
two sets of output:
@ -85,8 +81,12 @@ or
\code{i = 5}
The following is an example of the use of \code{threadprivate} for module variables:
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
\fnexample{threadprivate}{6f}
\fnexample{threadprivate}{6}
\fortranspecificend
\cppspecificstart
@ -95,12 +95,12 @@ for class-type \code{T}. \code{t1} is default constructed, \code{t2} is construc
taking a constructor accepting one argument of integer type, \code{t3} is copy
constructed with argument \code{f()}:
\cnexample{threadprivate}{4c}
\cppnexample{threadprivate}{4}
The following example illustrates the use of \code{threadprivate} for static
class members. The \code{threadprivate} directive for a static class member must
be placed inside the class definition.
\cnexample{threadprivate}{5c}
\cppnexample{threadprivate}{5}
\cppspecificend

View File

@ -1,7 +1,7 @@
\pagebreak
\chapter{The \code{workshare} Construct}
\section{The \code{workshare} Construct}
\fortranspecificstart
\label{chap:workshare}
\label{sec:workshare}
The following are examples of the \code{workshare} construct.
@ -10,14 +10,14 @@ the \code{parallel} region, and there is a barrier after the last statement.
Implementations must enforce Fortran execution rules inside of the \code{workshare}
block.
\fnexample{workshare}{1f}
\fnexample{workshare}{1}
In the following example, the barrier at the end of the first \code{workshare}
region is eliminated with a \code{nowait} clause. Threads doing \code{CC =
DD} immediately begin work on \code{EE = FF} when they are done with \code{CC
= DD}.
\fnexample{workshare}{2f}
\fnexample{workshare}{2}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
@ -27,7 +27,7 @@ The following example shows the use of an \code{atomic} directive inside a \code
construct. The computation of \code{SUM(AA)} is workshared, but the update to
\code{R} is atomic.
\fnexample{workshare}{3f}
\fnexample{workshare}{3}
Fortran \code{WHERE} and \code{FORALL} statements are \emph{compound statements},
made up of a \emph{control} part and a \emph{statement} part. When \code{workshare}
@ -47,7 +47,7 @@ Each task gets worked on in order by the threads:
\\
\code{GG = HH}
\fnexample{workshare}{4f}
\fnexample{workshare}{4}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
@ -56,21 +56,21 @@ Each task gets worked on in order by the threads:
In the following example, an assignment to a shared scalar variable is performed
by one thread in a \code{workshare} while all other threads in the team wait.
\fnexample{workshare}{5f}
\fnexample{workshare}{5}
The following example contains an assignment to a private scalar variable, which
is performed by one thread in a \code{workshare} while all other threads wait.
It is non-conforming because the private scalar variable is undefined after the
assignment statement.
\fnexample{workshare}{6f}
\fnexample{workshare}{6}
Fortran execution rules must be enforced inside a \code{workshare} construct.
In the following program fragment, the same result is produced regardless of whether
the code is executed sequentially or inside an OpenMP program with multiple
threads:
\fnexample{workshare}{7f}
\fnexample{workshare}{7}
\fortranspecificend

View File

@ -1,6 +1,6 @@
\pagebreak
\chapter{Worksharing Constructs Inside a \code{critical} Construct}
\label{chap:worksharing_critical}
\section{Worksharing Constructs Inside a \code{critical} Construct}
\label{sec:worksharing_critical}
The following example demonstrates using a worksharing construct inside a \code{critical}
construct. This example is conforming because the worksharing \code{single}
@ -11,8 +11,8 @@ region, creates a new team of threads, and becomes the master of the new team.
One of the threads in the new team enters the \code{single} region and increments
\code{i} by \code{1}. At the end of this example \code{i} is equal to \code{2}.
\cexample{worksharing_critical}{1c}
\cexample{worksharing_critical}{1}
\fexample{worksharing_critical}{1f}
\fexample{worksharing_critical}{1}

View File

@ -1,11 +1,39 @@
\chapter{Document Revision History}
\label{chap:history}
\section{Changes from 4.0.2 to 4.5.0}
\begin{itemize}
\item Reorganized into chapters of major topics
\item Included file extensions in example labels to indicate source type
\item Applied the explicit \code{map(tofrom)} for scalar variables
in a number of examples to comply with
the change of the default behavior for scalar variables from
\code{map(tofrom)} to \code{firstprivate} in the 4.5 specification
\item Added the following new examples:
\begin{itemize}
\item \code{linear} clause in loop constructs (\specref{sec:linear_in_loop})
\item task priority (\specref{sec:task_priority})
\item \code{taskloop} construct (\specref{sec:taskloop})
\item \plc{directive-name} modifier in multiple \code{if} clauses on
a combined construct (\specref{subsec:target_if})
\item unstructured data mapping (\specref{sec:target_enter_exit_data})
\item \code{link} clause for \code{declare}~\code{target} directive
(\specref{subsec:declare_target_link})
\item asynchronous target execution with \code{nowait} clause (\specref{sec:async_target_exec_depend})
\item device memory routines and device pointers
(\specref{subsec:target_mem_and_device_ptrs})
\item doacross loop nest (\specref{sec:doacross})
\item locks with hints (\specref{sec:locks})
\item C/C++ array reduction (\specref{sec:reduction})
\item C++ reference types in data sharing clauses (\specref{sec:cpp_reference})
\end{itemize}
\end{itemize}
\section{Changes from 4.0.1 to 4.0.2}
\begin{itemize}
\item Names of examples were changed from numbers to mnemonics
\item Added SIMD examples (\specref{chap:SIMD})
\item Added SIMD examples (\specref{sec:SIMD})
\item Applied miscellaneous fixes in several source codes
\item Added the revision history
\end{itemize}
@ -14,8 +42,8 @@
Added the following new examples:
\begin{itemize}
\item the \code{proc\_bind} clause (\specref{chap:affinity})
\item the \code{taskgroup} construct (\specref{chap:taskgroup})
\item the \code{proc\_bind} clause (\specref{sec:affinity})
\item the \code{taskgroup} construct (\specref{sec:taskgroup})
\end{itemize}
\section{Changes from 3.1 to 4.0}
@ -25,16 +53,16 @@ from the specification document.
Version 4.0 added the following new examples:
\begin{itemize}
\item task dependences (\specref{chap:task_dep})
\item cancellation constructs (\specref{chap:cancellation})
\item \code{target} construct (\specref{chap:target})
\item \code{target} \code{data} construct (\specref{chap:target_data})
\item \code{target} \code{update} construct (\specref{chap:target_update})
\item \code{declare} \code{target} construct (\specref{chap:declare_target})
\item \code{teams} constructs (\specref{chap:teams})
\item task dependences (\specref{sec:task_depend})
\item \code{target} construct (\specref{sec:target})
\item \code{target} \code{data} construct (\specref{sec:target_data})
\item \code{target} \code{update} construct (\specref{sec:target_update})
\item \code{declare} \code{target} construct (\specref{sec:declare_target})
\item \code{teams} constructs (\specref{sec:teams})
\item asynchronous execution of a \code{target} region using tasks
(\specref{chap:async_target})
\item array sections in device constructs (\specref{chap:array_sections})
\item device runtime routines (\specref{chap:device})
\item Fortran ASSOCIATE construct (\specref{chap:associate})
(\specref{subsec:async_target_with_tasks})
\item array sections in device constructs (\specref{sec:array_sections})
\item device runtime routines (\specref{sec:device})
\item Fortran ASSOCIATE construct (\specref{sec:associate})
\item cancellation constructs (\specref{sec:cancellation})
\end{itemize}

View File

@ -34,13 +34,14 @@
\chapter*{Introduction}
\label{chap:introduction}
\addcontentsline{toc}{chapter}{\protect\numberline{}Introduction}
This collection of programming examples supplements the OpenMP API for Shared
Memory Parallelization specifications, and is not part of the formal specifications. It
assumes familiarity with the OpenMP specifications, and shares the typographical
conventions used in that document.
\notestart
\noteheader This first release of the OpenMP Examples reflects the OpenMP Version 4.0
\noteheader This first release of the OpenMP Examples reflects the OpenMP Version 4.5
specifications. Additional examples are being developed and will be published in future
releases of this document.
\noteend

View File

@ -1,75 +1,20 @@
# Makefile for the OpenMP Examples document in LaTex format.
# For more information, see the master document, openmp-examples.tex.
version=4.0.2
version=4.5.0
default: openmp-examples.pdf
CHAPTERS=Title_Page.tex \
Introduction_Chapt.tex \
Examples_Chapt.tex \
Examples_ploop.tex \
Examples_mem_model.tex \
Examples_cond_comp.tex \
Examples_icv.tex \
Examples_parallel.tex \
Examples_nthrs_nesting.tex \
Examples_nthrs_dynamic.tex \
Examples_affinity.tex \
Examples_fort_do.tex \
Examples_fort_loopvar.tex \
Examples_nowait.tex \
Examples_collapse.tex \
Examples_psections.tex \
Examples_fpriv_sections.tex \
Examples_single.tex \
Examples_tasking.tex \
Examples_task_dep.tex \
Examples_taskgroup.tex \
Examples_taskyield.tex \
Examples_workshare.tex \
Examples_master.tex \
Examples_critical.tex \
Examples_worksharing_critical.tex \
Examples_barrier_regions.tex \
Examples_atomic.tex \
Examples_atomic_restrict.tex \
Examples_flush_nolist.tex \
Examples_standalone.tex \
Examples_ordered.tex \
Examples_cancellation.tex \
Examples_threadprivate.tex \
Examples_pra_iterator.tex \
Examples_fort_sp_common.tex \
Examples_default_none.tex \
Examples_fort_race.tex \
Examples_private.tex \
Examples_fort_sa_private.tex \
Examples_carrays_fpriv.tex \
Examples_lastprivate.tex \
Examples_reduction.tex \
Examples_copyin.tex \
Examples_copyprivate.tex \
Examples_nested_loop.tex \
Examples_nesting_restrict.tex \
Examples_set_dynamic_nthrs.tex \
Examples_get_nthrs.tex \
Examples_init_lock.tex \
Examples_lock_owner.tex \
Examples_simple_lock.tex \
Examples_nestable_lock.tex \
Examples_SIMD.tex \
Examples_target.tex \
Examples_target_data.tex \
Examples_target_update.tex \
Examples_declare_target.tex \
Examples_teams.tex \
Examples_async_target.tex \
Examples_array_sections.tex \
Examples_device.tex \
Examples_associate.tex \
Examples_*.tex \
History.tex
SOURCES=sources/*.c \
sources/*.cpp \
sources/*.f90 \
sources/*.f
INTERMEDIATE_FILES=openmp-examples.pdf \
openmp-examples.toc \
openmp-examples.idx \
@ -79,7 +24,7 @@ INTERMEDIATE_FILES=openmp-examples.pdf \
openmp-examples.out \
openmp-examples.log
openmp-examples.pdf: $(CHAPTERS) openmp.sty openmp-examples.tex openmp-logo.png
openmp-examples.pdf: $(CHAPTERS) $(SOURCES) openmp.sty openmp-examples.tex openmp-logo.png
rm -f $(INTERMEDIATE_FILES)
pdflatex -interaction=batchmode -file-line-error openmp-examples.tex
pdflatex -interaction=batchmode -file-line-error openmp-examples.tex

View File

@ -27,7 +27,7 @@ Source codes for OpenMP \VER{} Examples can be downloaded from
\href{https://github.com/OpenMP/Examples/tree/v\VER}{github}.\\
\begin{adjustwidth}{0pt}{1em}\setlength{\parskip}{0.25\baselineskip}%
Copyright © 1997-2015 OpenMP Architecture Review Board.\\
Copyright © 1997-2016 OpenMP Architecture Review Board.\\
Permission to copy without fee all or part of this material is granted,
provided the OpenMP Architecture Review Board copyright notice and
the title of this document appear. Notice is given that copying is by

View File

@ -1,4 +1,4 @@
Copyright (c) 1997-2015 OpenMP Architecture Review Board.
Copyright (c) 1997-2016 OpenMP Architecture Review Board.
All rights reserved.
Permission to redistribute and use without fee all or part of the source

11
openmp-examples.tcp Normal file
View File

@ -0,0 +1,11 @@
[FormatInfo]
Type=TeXnicCenterProjectInformation
Version=4
[ProjectInfo]
MainFile=ClassicThesis.tex
UseBibTeX=1
UseMakeIndex=0
ActiveProfile=LaTeX ⇨ PDF
ProjectLanguage=en
ProjectDialect=US

View File

@ -48,8 +48,8 @@
\documentclass[10pt,letterpaper,twoside,makeidx,hidelinks]{scrreprt}
% Text to appear in the footer on even-numbered pages:
\newcommand{\VER}{4.0.2}
\newcommand{\VERDATE}{March 2015}
\newcommand{\VER}{4.5.0}
\newcommand{\VERDATE}{November 2016}
\newcommand{\footerText}{OpenMP Examples Version \VER{} - \VERDATE}
% Unified style sheet for OpenMP documents:
@ -77,71 +77,120 @@
\setcounter{chapter}{0} % start chapter numbering here
\input{Examples_ploop}
\input{Examples_mem_model}
\input{Examples_cond_comp}
\input{Examples_icv}
\input{Examples_parallel}
\input{Examples_nthrs_nesting}
\input{Examples_nthrs_dynamic}
\input{Examples_affinity}
\input{Examples_fort_do}
\input{Examples_fort_loopvar}
\input{Examples_nowait}
\input{Examples_collapse}
\input{Examples_psections}
\input{Examples_fpriv_sections}
\input{Examples_single}
\input{Examples_tasking}
\input{Examples_task_dep}
\input{Examples_taskgroup}
\input{Examples_taskyield}
\input{Examples_workshare}
\input{Examples_master}
\input{Examples_critical}
\input{Examples_worksharing_critical}
\input{Examples_barrier_regions}
\input{Examples_atomic}
\input{Examples_atomic_restrict}
\input{Examples_flush_nolist}
\input{Examples_standalone}
\input{Examples_ordered}
\input{Examples_cancellation}
\input{Examples_threadprivate}
\input{Examples_pra_iterator}
\input{Examples_fort_sp_common}
\input{Examples_default_none}
\input{Examples_fort_race}
\input{Examples_private}
\input{Examples_fort_sa_private}
\input{Examples_carrays_fpriv}
\input{Examples_lastprivate}
\input{Examples_reduction}
\input{Examples_copyin}
\input{Examples_copyprivate}
\input{Examples_nested_loop}
\input{Examples_nesting_restrict}
\input{Examples_set_dynamic_nthrs}
\input{Examples_get_nthrs}
\input{Examples_init_lock}
\input{Examples_lock_owner}
\input{Examples_simple_lock}
\input{Examples_nestable_lock}
\input{Examples_SIMD}
\input{Examples_target}
\input{Examples_target_data}
\input{Examples_target_update}
\input{Examples_declare_target}
\input{Examples_teams}
\input{Examples_async_target}
\input{Examples_array_sections}
\input{Examples_device}
\input{Examples_associate}
\input{Chap_parallel_execution}
\input{Examples_ploop}
\input{Examples_parallel}
\input{Examples_nthrs_nesting}
\input{Examples_nthrs_dynamic}
\input{Examples_fort_do}
\input{Examples_nowait}
\input{Examples_collapse}
% linear Clause 475
\input{Examples_linear_in_loop}
\input{Examples_psections}
\input{Examples_fpriv_sections}
\input{Examples_single}
\input{Examples_workshare}
\input{Examples_master}
\input{Examples_pra_iterator}
\input{Examples_set_dynamic_nthrs}
\input{Examples_get_nthrs}
\input{Chap_affinity}
\input{Examples_affinity}
\input{Examples_affinity_query}
\input{Chap_tasking}
\input{Examples_tasking}
\input{Examples_task_priority}
\input{Examples_task_dep}
\input{Examples_taskgroup}
\input{Examples_taskyield}
\input{Examples_taskloop}
\input{Chap_devices}
\input{Examples_target}
\input{Examples_target_data}
\input{Examples_target_unstructured_data}
\input{Examples_target_update}
\input{Examples_declare_target}
% Link clause 474
\input{Examples_teams}
\input{Examples_async_target_depend}
\input{Examples_async_target_with_tasks}
%Title change of 57.1 and 57.2
%New subsection
\input{Examples_async_target_nowait}
\input{Examples_async_target_nowait_depend}
\input{Examples_array_sections}
% Structure Element in map 487
\input{Examples_device}
% MemoryRoutine and Device ptr 473
\input{Chap_SIMD}
\input{Examples_SIMD}
% Forward Depend 370
% simdlen 476
% simd linear modifier 480
\input{Chap_synchronization}
\input{Examples_critical}
\input{Examples_worksharing_critical}
\input{Examples_barrier_regions}
\input{Examples_atomic}
\input{Examples_atomic_restrict}
\input{Examples_flush_nolist}
\input{Examples_ordered}
% Doacross loop 405
\input{Examples_doacross}
\input{Examples_locks}
\input{Examples_init_lock}
\input{Examples_init_lock_with_hint}
\input{Examples_lock_owner}
\input{Examples_simple_lock}
\input{Examples_nestable_lock}
% % LOCK with Hints 478
% % Hint Clause xxxxxx (included after init_lock)
% % Lock routines with hint
\input{Chap_data_environment}
\input{Examples_threadprivate}
\input{Examples_default_none}
\input{Examples_private}
\input{Examples_fort_loopvar}
\input{Examples_fort_sp_common}
\input{Examples_fort_sa_private}
\input{Examples_carrays_fpriv}
\input{Examples_lastprivate}
\input{Examples_reduction}
% User UDR 287
% C array reduction 377
\input{Examples_copyin}
\input{Examples_copyprivate}
\input{Examples_cpp_reference}
% Fortran 2003 features 482
\input{Examples_associate} %section--> subsection
\input{Chap_memory_model}
\input{Examples_mem_model}
\input{Examples_fort_race}
\input{Chap_program_control}
\input{Examples_cond_comp}
\input{Examples_icv}
% If multi-ifs 471
\input{Examples_standalone}
\input{Examples_cancellation}
% New Section Nested Regions
\input{Examples_nested_loop}
\input{Examples_nesting_restrict}
\setcounter{chapter}{0} % restart chapter numbering with "letter A"
\renewcommand{\thechapter}{\Alph{chapter}}%
\appendix
\input{History}
\end{document}

View File

@ -78,6 +78,7 @@
\usepackage{comment} % allow use of \begin{comment}
\usepackage{ifpdf,ifthen} % allow conditional tests in LaTeX definitions
\usepackage{makecell} % Allows common formatting in cells with \thread & \makecell
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -416,8 +417,10 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Code example formatting for the Examples document
% This defines:
% /cexample formats blue markers, caption, and code for C/C++ examples
% /fexample formats blue markers, caption, and code for Fortran examples
% /cexample formats blue markers, caption, and code for C examples
% /cppexample formats blue markers, caption, and code for C++ examples
% /fexample formats blue markers, caption, and code for Fortran (fixed) examples
% /ffreeexample formats blue markers, caption, and code for Fortran90 (free) examples
% Thanks to Jin, Haoqiang H. for the original definitions of the following:
\usepackage{color,fancyvrb} % for \VerbatimInput
@ -434,36 +437,40 @@
\newcommand{\escstr}[1]{\myreplace{_}{\_}{#1}}
\def\exampleheader#1#2{%
\def\exampleheader#1#2#3#4{%
\ifthenelse{ \equal{#1}{} }{
\def\cname{#2}
\def\ename\cname
}{
\def\cname{#1.#2}
\def\cname{#1.#2.#3}
% Use following line for old numbering
% \def\ename{\thechapter.#2}
% \def\ename{\thechapter.#2.#3}
% Use following for mnemonics
\def\ename{\escstr{#1}.#2}
\def\ename{\escstr{#1}.#2.#3}
}
\noindent
\textit{Example \ename}
%\vspace*{-3mm}
\code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=#4,fontsize=\small]%
%\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=#4,fontsize=\small]%
%\code{\VerbatimInput[firstline=#4,fontsize=\small]%
{sources/Example_\cname}}
}
\def\cnexample#1#2{%
\exampleheader{#1}{#2}
\code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=8,fontsize=\small]%
%\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=8,fontsize=\small]%
%\code{\VerbatimInput[firstline=8,fontsize=\small]%
{sources/Example_\cname.c}}
\exampleheader{#1}{#2}{c}{8}
}
\def\cppnexample#1#2{%
\exampleheader{#1}{#2}{cpp}{8}
}
\def\fnexample#1#2{%
\exampleheader{#1}{#2}
\code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=6,fontsize=\small]%
%\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=6,fontsize=\small]%
%\code{\VerbatimInput[firstline=6,fontsize=\small]%
{sources/Example_\cname.f}}
\exampleheader{#1}{#2}{f}{6}
}
\def\ffreenexample#1#2{%
\exampleheader{#1}{#2}{f90}{6}
}
\newcommand\cexample[2]{%
@ -474,7 +481,7 @@
\newcommand\cppexample[2]{%
\needspace{5\baselineskip}\cppspecificstart
\cnexample{#1}{#2}
\cppnexample{#1}{#2}
\cppspecificend
}
@ -484,6 +491,12 @@
\fortranspecificend
}
\newcommand\ffreeexample[2]{%
\needspace{5\baselineskip}\fortranspecificstart
\ffreenexample{#1}{#2}
\fortranspecificend
}
% Set default fonts:
\rmfamily\mdseries\upshape\normalsize

Some files were not shown because too many files have changed in this diff.