mirror of
https://github.com/OpenMP/Examples.git
synced 2025-04-03 13:21:33 +01:00
v5.2 release
parent
fb0edc81e7
commit
a5e3d8b3f2
@ -31,7 +31,7 @@ execution within loops that contain the function and have a \code{simd}
 directive. Clauses provide argument specifications (\code{linear},
 \code{uniform}, and \code{aligned}), a requested vector length
 (\code{simdlen}), and designate whether the function is always/never
-called conditionally in a loop (\code{branch}/\code{inbranch}).
+called conditionally in a loop (\code{notinbranch}/\code{inbranch}).
 The latter is for optimizing performance.

 Also, the \code{simd} construct has been combined with the worksharing loop
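As a rough illustration of the `notinbranch`/`inbranch` distinction in the hunk above, the following C sketch declares one SIMD function that is called unconditionally in a `simd` loop and one that is called only under a condition. The function names (`scale_value`, `flip_sign`, `transform`) are hypothetical and are not taken from the Examples sources. When OpenMP is not enabled, the pragmas are ignored and the loop simply runs serially.

```c
#include <stddef.h>

/* Hypothetical helper: always called unconditionally in the loop below,
 * so the notinbranch clause is appropriate. */
#pragma omp declare simd uniform(scale) notinbranch
float scale_value(float v, float scale)
{
    return v * scale;
}

/* Hypothetical helper: only called inside an if statement in the loop
 * below, so the inbranch clause is appropriate. */
#pragma omp declare simd inbranch
float flip_sign(float v)
{
    return -v;
}

/* Scales every element, then makes negative results positive. */
void transform(float *a, size_t n, float scale)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        a[i] = scale_value(a[i], scale);   /* unconditional call */
        if (a[i] < 0.0f)
            a[i] = flip_sign(a[i]);        /* conditional call */
    }
}
```

Omitting both clauses is also valid; in that case an implementation may generate both a masked and an unmasked vector variant of the function.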
@ -22,7 +22,7 @@ Data-sharing attributes of variables can be classified as being \plc{predetermin
 Certain variables and objects have predetermined attributes.
 A commonly found case is the loop iteration variable in associated loops
 of a \code{for} or \code{do} construct. It has a private data-sharing attribute.
-Variables with predetermined data-sharing attributes can not be listed in a data-sharing clause; but there are some
+Variables with predetermined data-sharing attributes cannot be listed in a data-sharing clause; but there are some
 exceptions (mainly concerning loop iteration variables).

 Variables with explicitly determined data-sharing attributes are those that are
@ -50,7 +50,7 @@ The common \plc{list items} are arrays, array sections, scalars, pointers, and
 structure elements (members).

 Procedures and global variables have predetermined data mapping if they appear
-within the list or block of a \code{declare target} directive. Also, a C/C++ pointer
+within the list or block of a \code{declare}~\code{target} directive. Also, a C/C++ pointer
 is mapped as a zero-length array section, as is a C++ variable that is a reference to a pointer.
 % Waiting for response from Eric on this.

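The predetermined private attribute of a loop iteration variable, described in the data-environment hunks above, can be sketched in C as follows. This is a minimal illustration, not one of the Examples sources; the function name `sum_of_squares` is hypothetical. The variable `i` is declared outside the construct, yet each thread works on its own private copy without any clause naming it, while `sum` receives an explicitly determined attribute through the `reduction` clause.

```c
/* Hypothetical example: i is predetermined private in the worksharing
 * loop even though it is declared outside the construct. */
long sum_of_squares(int n)
{
    int i;          /* loop iteration variable, predetermined private */
    long sum = 0;   /* explicitly determined through reduction(+:sum) */

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += (long)i * i;

    return sum;
}
```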
@ -15,7 +15,7 @@ data to the device storage.

 The constructs that explicitly
 create storage, transfer data, and free storage on the device
-are catagorized as structured and unstructured. The
+are categorized as structured and unstructured. The
 \code{target} \code{data} construct is structured. It creates
 a data region around \code{target} constructs, and is
 convenient for providing persistent data throughout multiple
@ -33,14 +33,14 @@ the device, and controls on the storage duration.
 There is an important change in the OpenMP 4.5 specification
 that alters the data model for scalar variables and C/C++ pointer variables.
 The default behavior for scalar variables and C/C++ pointer variables
-in an 4.5 compliant code is \code{firstprivate}. Example
+in a 4.5 compliant code is \code{firstprivate}. Example
 codes that have been updated to reflect this new behavior are
 annotated with a description that describes changes required
 for correct execution. Often it is a simple matter of mapping
 the variable as \code{tofrom} to obtain the intended 4.0 behavior.

 In OpenMP version 4.5 the mechanism for target
-execution is specified as occuring through a \plc{target task}.
+execution is specified as occurring through a \plc{target task}.
 When the \code{target} construct is encountered a new
 \plc{target task} is generated. The \plc{target task}
 completes after the \code{target} region has executed and all data
@ -59,13 +59,14 @@ clause introduced in OpenMP 4.5.
|
||||
\input{devices/target_structure_mapping}
|
||||
\input{devices/target_fort_allocatable_array_mapping}
|
||||
\input{devices/array_sections}
|
||||
\input{devices/C++_virtual_functions}
|
||||
\input{devices/array_shaping}
|
||||
\input{devices/target_mapper}
|
||||
\input{devices/target_data}
|
||||
\input{devices/target_unstructured_data}
|
||||
\input{devices/target_update}
|
||||
\input{devices/target_associate_ptr}
|
||||
\input{devices/declare_target}
|
||||
\input{devices/lambda_expressions}
|
||||
\input{devices/teams}
|
||||
\input{devices/async_target_depend}
|
||||
\input{devices/async_target_with_tasks}
|
||||
|
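The OpenMP 4.5 change to the default treatment of scalars, noted in the device-chapter hunk above, can be sketched with two small C functions. Both names (`add_ten_default`, `add_ten_mapped`) are hypothetical and chosen only for illustration. The first relies on the implicit rule, the second maps the scalar as `tofrom` to recover the 4.0-style behavior.

```c
/* Hypothetical helper relying on the OpenMP 4.5 default: a scalar that
 * is referenced in a target region without a map clause is firstprivate.
 * The update is applied to a private copy, so with OpenMP 4.5 or later
 * enabled the caller does not observe it. If the pragma is ignored,
 * ordinary C semantics apply and the update is visible. */
int add_ten_default(int x)
{
    #pragma omp target
    {
        x += 10;
    }
    return x;
}

/* Hypothetical helper using an explicit map: the scalar is copied to
 * the device and back, so the update is visible after the region. */
int add_ten_mapped(int x)
{
    #pragma omp target map(tofrom: x)
    {
        x += 10;
    }
    return x;
}
```

Only `add_ten_mapped` yields the same result whether or not OpenMP is enabled, which is why adding the explicit `map(tofrom: ...)` clause is often the simplest update for codes written against 4.0.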
@ -1,11 +1,12 @@
|
||||
\cchapter{OpenMP Directive Syntax}{directives}
|
||||
\label{chap:directive_syntax}
|
||||
\index{directive syntax}
|
||||
|
||||
OpenMP \emph{directives} use base-language mechanisms to specify OpenMP program behavior.
|
||||
In C code, the directives are formed exclusively with pragmas, whereas in C++
|
||||
code, directives are formed from either pragmas or attributes.
|
||||
Fortran directives are formed with comments in free form and fixed form sources (codes).
|
||||
All of these mechanism allow the compilation to ignore the OpenMP directives if
|
||||
All of these mechanisms allow the compilation to ignore the OpenMP directives if
|
||||
OpenMP is not supported or enabled.
|
||||
|
||||
|
||||
@ -35,6 +36,18 @@ Fortran comments
|
||||
|
||||
where \code{c\$omp} and \code{*\$omp} may be used in Fortran fixed form sources.
|
||||
|
||||
Most OpenMP directives accept clauses that alter the semantics of the directive in some way,
|
||||
and some directives also accept parenthesized arguments that follow the directive name.
|
||||
A clause may just be a keyword (e.g., \scode{untied}) or it may also accept argument lists
|
||||
(e.g., \scode{shared(x,y,z)}) and/or optional modifiers (e.g., \scode{tofrom} in
|
||||
\scode{map(tofrom:}~\scode{x,y,z)}).
|
||||
Clause modifiers may be ``simple'' or ``complex'' -- a complex modifier consists of a
|
||||
keyword followed by one or more parameters, bracketed by parentheses, while a simple
|
||||
modifier does not. An example of a complex modifier is the \scode{iterator} modifier,
|
||||
as in \scode{map(iterator(i=0:n),}~\scode{tofrom:}~\scode{p[i])}, or the \scode{step} modifier, as in
|
||||
\scode{linear(x:}~\scode{ref,}~\scode{step(4))}.
|
||||
In the preceding examples, \scode{tofrom} and \scode{ref} are simple modifiers.
|
||||
|
||||
|
||||
%===== Examples Sections =====
|
||||
\input{directives/pragmas}
|
||||
|
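The clause forms described in the directive-syntax hunk above (a bare keyword, an argument list, and a simple modifier followed by a list) can be seen together in a short C sketch. The function name `clause_forms` is hypothetical, and the example deliberately sticks to clauses that predate OpenMP 5.2 so that it is accepted by older implementations; the complex modifiers `iterator(...)` and `step(...)` mentioned in the text are not exercised here.

```c
/* Hypothetical example showing three clause forms:
 *   untied            - a clause that is just a keyword
 *   shared(x, y, z)   - a clause with an argument list
 *   map(tofrom: ...)  - a clause with a simple modifier and a list */
int clause_forms(void)
{
    int x = 1, y = 2, z = 3;
    int result = 0;

    #pragma omp parallel shared(x, y, z, result)
    #pragma omp single
    {
        #pragma omp task untied shared(x, y, z, result)
        result = x + y + z;
    }   /* the barrier at the end of single waits for the task */

    #pragma omp target map(tofrom: x, y, z)
    {
        x *= 2;
        y *= 2;
        z *= 2;
    }

    return result + x + y + z;
}
```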
73
Chap_introduction.tex
Normal file
@ -0,0 +1,73 @@
|
||||
% This is the introduction for the OpenMP Examples document.
|
||||
% This is an included file. See the main file (openmp-examples.tex) for more information.
|
||||
%
|
||||
% When editing this file:
|
||||
%
|
||||
% 1. To change formatting, appearance, or style, please edit openmp.sty.
|
||||
%
|
||||
% 2. Custom commands and macros are defined in openmp.sty.
|
||||
%
|
||||
% 3. Be kind to other editors -- keep a consistent style by copying-and-pasting to
|
||||
% create new content.
|
||||
%
|
||||
% 4. We use semantic markup, e.g. (see openmp.sty for a full list):
|
||||
% \code{} % for bold monospace keywords, code, operators, etc.
|
||||
% \plc{} % for italic placeholder names, grammar, etc.
|
||||
%
|
||||
% 5. Other recommendations:
|
||||
% Use the convenience macros defined in openmp.sty for the minor headers
|
||||
% such as Comments, Syntax, etc.
|
||||
%
|
||||
% To keep items together on the same page, prefer the use of
|
||||
% \begin{samepage}.... Avoid \parbox for text blocks as it interrupts line numbering.
|
||||
% When possible, avoid \filbreak, \pagebreak, \newpage, \clearpage unless that's
|
||||
% what you mean. Use \needspace{} cautiously for troublesome paragraphs.
|
||||
%
|
||||
% Avoid absolute lengths and measures in this file; use relative units when possible.
|
||||
% Vertical space can be relative to \baselineskip or ex units. Horizontal space
|
||||
% can be relative to \linewidth or em units.
|
||||
%
|
||||
% Prefer \emph{} to italicize terminology, e.g.:
|
||||
% This is a \emph{definition}, not a placeholder.
|
||||
% This is a \plc{var-name}.
|
||||
%
|
||||
|
||||
\cchapter{Introduction}{introduction}
|
||||
\label{chap:introduction}
|
||||
|
||||
This collection of programming examples supplements the OpenMP API for Shared
|
||||
Memory Parallelization specifications, and is not part of the formal specifications. It
|
||||
assumes familiarity with the OpenMP specifications, and shares the typographical
|
||||
conventions used in that document.
|
||||
|
||||
The OpenMP API specification provides a model for parallel programming that is
|
||||
portable across shared memory architectures from different vendors. Compilers from
|
||||
numerous vendors support the OpenMP API.
|
||||
|
||||
The directives, library routines, and environment variables demonstrated in this
|
||||
document allow users to create and manage parallel programs while permitting
|
||||
portability. The directives extend the C, C++ and Fortran base languages with single
|
||||
program multiple data (SPMD) constructs, tasking constructs, device constructs,
|
||||
worksharing constructs, and synchronization constructs, and they provide support for
|
||||
sharing and privatizing data. The functionality to control the runtime environment is
|
||||
provided by library routines and environment variables. Compilers that support the
|
||||
OpenMP API often include a command line option to the compiler that activates and
|
||||
allows interpretation of all OpenMP directives.
|
||||
|
||||
The documents and source codes for OpenMP Examples can be downloaded from
|
||||
\href{https://github.com/OpenMP/Examples}{https://github.com/OpenMP/Examples}.
|
||||
Each directory holds the contents of a chapter and has a \splc{sources} subdirectory of its codes.
|
||||
The codes for this OpenMP \VER{} Examples document have the tag
|
||||
\href{https://github.com/OpenMP/Examples/tree/v\VER}{\plc{v\PVER}}.
|
||||
|
||||
Complete information about the OpenMP API and a list of the compilers that support
|
||||
the OpenMP API can be found at the OpenMP.org web site
|
||||
|
||||
\code{https://www.openmp.org}
|
||||
|
||||
\clearpage
|
||||
|
||||
\input{introduction/Examples}
|
||||
|
||||
% This is the end of introduction.tex of the OpenMP Examples document.
|
||||
|
@ -22,4 +22,5 @@ whereby specific hot spots can be affected by transformation directives.
|
||||
%===== Examples Sections =====
|
||||
\input{loop_transformations/tile}
|
||||
\input{loop_transformations/unroll}
|
||||
\input{loop_transformations/partial_tile}
|
||||
|
||||
|
@ -25,7 +25,7 @@ flush operation is characterized by its flush properties -- some combination of
 flushes, a \emph{flush-set}.

 A \emph{strong} flush will force consistency between the temporary view and the
-memory for all variables in its \emph{flush-set}. Furthermore all strong flushes in a
+memory for all variables in its \emph{flush-set}. Furthermore, all strong flushes in a
 program that have intersecting flush-sets will execute in some total order, and
 within a thread strong flushes may not be reordered with respect to other
 memory operations on variables in its flush-set. \emph{Release} and
@ -53,7 +53,7 @@ do not have a well-defined \emph{completion order}. The existence of data
 races in OpenMP programs results in undefined behavior, and so they should
 generally be avoided for programs to be correct. The completion order of
 accesses to a shared variable is guaranteed in OpenMP through a set of memory
-consistency rules that are described in the \plc{OpenMP Memory Consitency}
+consistency rules that are described in the \plc{OpenMP Memory Consistency}
 section of the OpenMP Specifications document.

 %This chapter also includes examples that exhibit non-sequentially consistent
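One common way to avoid the data races mentioned in the memory-model hunk above is to make each conflicting update atomic. The following C sketch is an illustration under that assumption and is not taken from the Examples sources; the function name `count_even` is hypothetical. Without the `atomic` directive, concurrent increments of `count` would race.

```c
/* Hypothetical example: counts even values in v[0..n-1]. The shared
 * counter is only ever updated through an atomic operation, so the
 * concurrent increments do not form a data race. */
int count_even(const int *v, int n)
{
    int count = 0;

    #pragma omp parallel for shared(count)
    for (int i = 0; i < n; i++) {
        if (v[i] % 2 == 0) {
            #pragma omp atomic update
            count++;
        }
    }
    return count;
}
```

A `reduction(+:count)` clause would be an equally valid and usually faster alternative; `atomic` is used here only because it makes the protected access explicit.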
@ -102,8 +102,8 @@ The \code{masked} construct is not a worksharing construct. The \code{masked} r
 executed only by the primary thread. There is no implicit barrier (and flush)
 at the end of the \code{masked} region; hence the other threads of the team continue
 execution of code statements beyond the \code{masked} region.
-The \code{master} contruct, which has been deprecated in OpenMP 5.1, has identical semantics
-to the \code{masked} contruct with no \code{filter} clause.
+The \code{master} construct, which has been deprecated in OpenMP 5.1, has identical semantics
+to the \code{masked} construct with no \code{filter} clause.


 %===== Examples Sections =====
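The absence of an implicit barrier after a `masked` region, described in the hunk above, means that other threads need explicit synchronization before they read what the primary thread wrote. The C sketch below illustrates this; the function name `threads_seeing_setup` is hypothetical. Selecting between `masked` and `master` with the `_OPENMP` version macro is an assumption made here so that the sketch also compiles with implementations that predate OpenMP 5.1.

```c
/* Hypothetical example: the primary thread sets a flag, and every
 * thread reads it after an explicit barrier. Returns how many threads
 * observed the flag, which equals the team size. */
int threads_seeing_setup(void)
{
    int ready = 0;
    int seen = 0;

    #pragma omp parallel shared(ready) reduction(+:seen)
    {
        /* Assumption: 202011 is the _OPENMP value for OpenMP 5.1. */
#if defined(_OPENMP) && _OPENMP >= 202011
        #pragma omp masked
#else
        #pragma omp master
#endif
        {
            ready = 1;
        }

        /* Neither masked nor master implies a barrier, so one is
         * needed before the other threads read ready. */
        #pragma omp barrier

        seen += ready;
    }
    return seen;
}
```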
@ -108,6 +108,7 @@ chapter in the OpenMP Specifications document.
|
||||
\input{program_control/nested_loop}
|
||||
\input{program_control/nesting_restrict}
|
||||
\input{program_control/target_offload}
|
||||
\input{program_control/reproducible}
|
||||
\input{program_control/interop}
|
||||
\input{program_control/utilities}
|
||||
|
||||
|
@ -36,7 +36,7 @@ of ordered regions while allowing code outside the region to run in parallel.

 Since OpenMP 4.5 the \code{ordered} construct can also be a stand-alone
 directive that specifies cross-iteration dependences in a doacross loop nest.
-The \code{depend} clause uses a \code{sink} \plc{dependence-type}, along with a
+The \code{depend} clause uses a \code{sink} \plc{dependence-type}, along with an
 iteration vector argument (vec) to indicate the iteration that satisfies the
 dependence. The \code{depend} clause with a \code{source}
 \plc{dependence-type} specifies dependence satisfaction.
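The stand-alone `ordered` directive with `depend(sink: vec)` and `depend(source)`, described in the hunk above, can be sketched with a one-dimensional doacross loop. This is a minimal C illustration using the OpenMP 4.5 syntax; the function name `prefix_sum` is hypothetical. Elsewhere in this commit the `depend` clause on `ordered` is listed as deprecated in 5.2 in favor of the `doacross` clause, so this sketch shows the earlier form.

```c
/* Hypothetical example: in-place inclusive prefix sum. Iteration i
 * reads a[i-1], which iteration i-1 writes, so each iteration waits
 * for its predecessor to signal completion. */
void prefix_sum(int *a, int n)
{
    #pragma omp parallel for ordered(1)
    for (int i = 1; i < n; i++) {
        /* Wait until iteration i-1 has executed depend(source). A sink
         * that falls outside the iteration space (i-1 == 0) is ignored. */
        #pragma omp ordered depend(sink: i-1)
        a[i] += a[i-1];

        /* Signal that this iteration's result is available. */
        #pragma omp ordered depend(source)
    }
}
```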
@ -54,25 +54,28 @@ For a brief revision history, see `Changes.log` in the repo.
|
||||
* Insert the code in the sources directory for each chapter, and include the following metadata:
|
||||
* Metadata Tags for example sources:
|
||||
```
|
||||
@@name: <ename>.<seq-no>[c|cpp|f|f90]
|
||||
@@name: <ename>.<seq-no>
|
||||
@@type: C|C++|F-fixed|F-free
|
||||
@@requires: preprocessing
|
||||
@@compilable: yes|no|maybe
|
||||
@@linkable: yes|no|maybe
|
||||
@@expect: success|failure|nothing|rt-error
|
||||
@@expect: success|compile-time-error|runtime-error|undefined-behavior
|
||||
@@version: omp_<verno>
|
||||
```
|
||||
* **name**
|
||||
is the name of an example
|
||||
* **type**
|
||||
is the source code type, which can be translated into or from proper file extension (c,cpp,f,f90)
|
||||
is the source code type, which can be translated into or from proper file extension (C:c,C++:cpp,F-fixed:f,F-free:f90)
|
||||
* **requires**
|
||||
any additional requirements, currently `preprocessing` for requiring preprocessing
|
||||
* **compilable**
|
||||
indicates whether the source code is compilable
|
||||
* **linkable**
|
||||
indicates whether the source code is linkable
|
||||
* **expect**
|
||||
indicates some expected result for testing purpose "`success|failure|nothing`" applies
|
||||
to the result of code compilation "`rt-error`" is for a case where compilation may be
|
||||
successful, but the code contains potential runtime issues (such as race condition).
|
||||
indicates the expected result for testing purposes. "`success|compile-time-error|ct-error`" applies
to the result of code compilation; "`runtime-error|rt-error`" is for a case where compilation may
succeed but the code contains potential runtime issues (such as a race condition); `undefined-behavior` could result from non-conforming code.
An alternative would be to just use "`conforming`" or "`non-conforming`".
|
||||
* **version**
|
||||
indicates features for a specific OpenMP version, such as "`omp_5.0`"
|
||||
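To make the tag list above concrete, here is a sketch of what a C example source carrying these metadata tags might look like. The example name `saxpy_sketch.1`, the chosen tag values, and the placement of the tags in a leading block comment are assumptions made for illustration; consult existing files in a chapter's sources directory for the layout actually used.

```c
/*
 * @@name: saxpy_sketch.1
 * @@type: C
 * @@compilable: yes
 * @@linkable: no
 * @@expect: success
 * @@version: omp_4.5
 */

/* Hypothetical example body: y = a*x + y over n elements. The file has
 * no main function, hence the "linkable: no" value assumed above. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```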
@ -94,23 +97,30 @@ For a brief revision history, see `Changes.log` in the repo.
|
||||
|
||||
|
||||
|
||||
# LaTeX macros for examples
|
||||
## LaTeX macros for examples
|
||||
|
||||
The following describes LaTeX macros defined specifically for examples.
|
||||
* Source code with language h-rules
|
||||
* Source code without language h-rules
|
||||
* Language h-rules
|
||||
* Other macros
|
||||
* See `openmp.sty` for more information
|
||||
|
||||
### Source code with language h-rules
|
||||
```
|
||||
\cexample[<verno>]{<ename>}{<seq-no>} % for C/C++ examples
|
||||
\cppexample[<verno>]{<ename>}{<seq-no>} % for C++ examples
|
||||
\fexample[<verno>]{<ename>}{<seq-no>} % for fixed-form Fortran examples
|
||||
\ffreeexample[<verno>]{<ename>}{<seq-no>} % for free-form Fortran examples
|
||||
\cexample[<verno>]{<ename>}{<seq-no>}[<s>] % for C/C++ examples
|
||||
\cppexample[<verno>]{<ename>}{<seq-no>}[<s>] % for C++ examples
|
||||
\fexample[<verno>]{<ename>}{<seq-no>}[<s>] % for fixed-form Fortran examples
|
||||
\ffreeexample[<verno>]{<ename>}{<seq-no>}[<s>] % for free-form Fortran examples
|
||||
```
|
||||
|
||||
* Source code without language h-rules
|
||||
### Source code without language h-rules
|
||||
```
|
||||
\cnexample[<verno>]{<ename>}{<seq-no>}
|
||||
\cppnexample[<verno>]{<ename>}{<seq-no>}
|
||||
\fnexample[<verno>]{<ename>}{<seq-no>}
|
||||
\ffreenexample[<verno>]{<ename>}{<seq-no>}
|
||||
\srcnexample[<verno>]{<ename>}{<seq-no>}{<ext>}
|
||||
\cnexample[<verno>]{<ename>}{<seq-no>}[<s>]
|
||||
\cppnexample[<verno>]{<ename>}{<seq-no>}[<s>]
|
||||
\fnexample[<verno>]{<ename>}{<seq-no>}[<s>]
|
||||
\ffreenexample[<verno>]{<ename>}{<seq-no>}[<s>]
|
||||
\srcnexample[<verno>]{<ename>}{<seq-no>}{<ext>}[<s>]
|
||||
```
|
||||
|
||||
Optional `<verno>` can be supplied in a macro to include a specific OpenMP
|
||||
@ -123,7 +133,11 @@ For a brief revision history, see `Changes.log` in the repo.
|
||||
source code should not contain any `@@` metadata tags. The `ext` argument
|
||||
to this macro is the file extension (such as `h`, `hpp`, `inc`).
|
||||
|
||||
* Language h-rules
|
||||
The `<s>` option to each macro allows finer control over any additional lines
|
||||
to be skipped due to addition of new `@@` tags, such as `@@requires`.
|
||||
The default value for `<s>` is 0.
|
||||
|
||||
### Language h-rules
|
||||
```
|
||||
\cspecificstart, \cspecificend
|
||||
\cppspecificstart, \cppspecificend
|
||||
@ -131,9 +145,11 @@ For a brief revision history, see `Changes.log` in the repo.
|
||||
\fortranspecificstart, \fortranspecificend
|
||||
```
|
||||
|
||||
* Chapter and section macros
|
||||
### Other macros
|
||||
```
|
||||
\cchapter{<Chapter Name>}{<chap_directory>}
|
||||
\hexentry[ext1]{<example_name>}[ext2]{<earlier_tag>}
|
||||
\hexmentry[ext1]{<example_name>}[ext2]{<earlier_tag>}{<prior_name>}
|
||||
```
|
||||
|
||||
The `\cchapter` macro is used for starting a chapter with proper page spacing.
|
||||
@ -146,8 +162,15 @@ A previously-defined macro `\sinput{<section_file>}` to import a section
|
||||
file from `<chap_directory>` is no longer supported. Please use
|
||||
`\input{<chap_directory>/<section_file>}` explicitly.
|
||||
|
||||
* See `openmp.sty` for more information
|
||||
The two macros `\hexentry` and `\hexmentry` are defined for simplifying
|
||||
entries in the feature deprecation and update tables. Option `[ext1]` is
|
||||
the file extension with a default value of `c` and option `[ext2]` is
|
||||
the file extension for the associated second file if present.
|
||||
`<earlier_tag>` is the version tag of the corresponding example
|
||||
in the earlier version. `\hexentry` assumes no name change for an example
|
||||
in different versions; `\hexmentry` can be used to specify a prior name
|
||||
if it is different.
|
||||
|
||||
### License
|
||||
## License
|
||||
|
||||
For copyright information, please see `omp_copyright.txt`.
|
||||
|
281
Deprecated_Features.tex
Normal file
@ -0,0 +1,281 @@
|
||||
\cchapter{Feature Deprecations and Updates in Examples}{deprecated_features}
|
||||
\label{chap:deprecated_features}
|
||||
\label{sec:deprecated_features}
|
||||
\index{deprecated features}
|
||||
|
||||
Deprecation of features began in OpenMP 5.0.
|
||||
Examples that use a deprecated feature have been updated with an equivalent
|
||||
replacement feature.
|
||||
|
||||
Table~\ref{tab:Deprecated Features} summarizes deprecated features and
|
||||
their replacements in each version. Affected examples are updated
|
||||
accordingly and listed in Section~\ref{sec:Updated Examples}.
|
||||
|
||||
\nolinenumbers
|
||||
\renewcommand{\arraystretch}{1.4}
|
||||
\tablefirsthead{%
|
||||
\hline
|
||||
\textbf{Version} & \textbf{Deprecated Feature} & \textbf{Replacement}\\
|
||||
\hline\\[-3.5ex]
|
||||
}
|
||||
\tablehead{%
|
||||
\multicolumn{2}{l}{\small\slshape table continued from previous page}\\
|
||||
\hline
|
||||
\textbf{Version} & \textbf{Deprecated Feature} & \textbf{Replacement}\\
|
||||
\hline\\[-3ex]
|
||||
}
|
||||
\tabletail{%
|
||||
\hline\\[-4ex]
|
||||
\multicolumn{2}{l}{\small\slshape table continued on next page}\\
|
||||
}
|
||||
\tablelasttail{\hline\\[-2ex]}
|
||||
\tablecaption{Deprecated Features and Their Replacements\label{tab:Deprecated Features}}
|
||||
\begin{supertabular}{p{0.4in} p{2.3in} p{2.2in}}
|
||||
5.2 & \scode{default} clause on metadirectives
|
||||
& \scode{otherwise} clause \\
|
||||
5.2 & delimited \scode{declare}~\scode{target} directive for C/C++
|
||||
& \scode{begin}~\scode{declare}~\scode{target} directive \\
|
||||
5.2 & \scode{to} clause on \scode{declare}~\scode{target} directive
|
||||
& \scode{enter} clause \\
|
||||
5.2 & non-argument \scode{destroy} clause on \scode{depobj} construct
|
||||
& \scode{destroy(}\plc{argument}\code{)} \\
|
||||
5.2 & \scode{allocate} construct for Fortran \scode{ALLOCATE} statements
|
||||
& \scode{allocators} construct \\
|
||||
5.2 & \scode{depend} clause on \scode{ordered} construct
|
||||
& \scode{doacross} clause \\
|
||||
5.2 & \scode{linear(}\plc{modifier(list): linear-step}\code{)} clause
|
||||
& \scode{linear(}\plc{list:}~\scode{step(}\plc{linear-step}\scode{)}\plc{, modifier}\scode{)} clause \\
|
||||
\hline
|
||||
5.1 & \scode{master} construct
|
||||
& \scode{masked} construct \\
|
||||
5.1 & \scode{master} affinity policy
|
||||
& \scode{primary} affinity policy \\
|
||||
\hline
|
||||
5.0 & \scode{omp_lock_hint_*} constants
|
||||
& \scode{omp_sync_hint_*} constants \\[2pt]
|
||||
\end{supertabular}
|
||||
|
||||
\linenumbers
|
||||
These replacements appear in examples that otherwise illustrate earlier features.
For a compiler that is compliant only with a version prior to
the indicated version, the earlier form of each example is
listed as a reference.
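As one illustration of the replacements in the table above, the following C sketch uses the `enter` clause on `declare target` when the implementation reports OpenMP 5.2, and falls back to the deprecated `to` clause otherwise. The function names (`twice`, `twice_on_device`) are hypothetical, and selecting the form through the `_OPENMP` macro is an assumption made for this sketch rather than a technique used by the Examples sources.

```c
/* Hypothetical device function, declared before the directive names it. */
int twice(int v);

/* Assumption: 202111 is the _OPENMP value for OpenMP 5.2. */
#if defined(_OPENMP) && _OPENMP >= 202111
#pragma omp declare target enter(twice)   /* 5.2 replacement */
#else
#pragma omp declare target to(twice)      /* form deprecated in 5.2 */
#endif

int twice(int v)
{
    return 2 * v;
}

/* Calls twice() inside a target region; runs on the host if no device
 * is available or if OpenMP is not enabled. */
int twice_on_device(int v)
{
    int r = 0;

    #pragma omp target map(to: v) map(from: r)
    r = twice(v);

    return r;
}
```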
|
||||
|
||||
\newpage
|
||||
\section{Updated Examples for Different Versions}
|
||||
\label{sec:Updated Examples}
|
||||
|
||||
The following tables list the updated examples for different versions as
|
||||
a result of feature deprecation. The \emph{Earlier Version} column of
|
||||
the tables shows the version tag of the earlier version. It also shows
|
||||
the prior name of an example when it has been renamed.
|
||||
|
||||
Table~\ref{tab:Updated Examples 5.2} lists the updated examples for OpenMP 5.2
|
||||
in the Examples Document Version
|
||||
\href{https://github.com/OpenMP/Examples/tree/v5.2}{5.2}.
|
||||
The \emph{Earlier Version} column of the table lists the earlier version
|
||||
tags of the examples that can be found in
|
||||
the Examples Document Version
|
||||
\href{https://github.com/OpenMP/Examples/tree/v5.1}{5.1}.
|
||||
|
||||
\index{clauses!default@\code{default}}
|
||||
\index{clauses!otherwise@\code{otherwise}}
|
||||
\index{clauses!to@\code{to}}
|
||||
\index{clauses!enter@\code{enter}}
|
||||
\index{clauses!depend@\code{depend}}
|
||||
\index{clauses!doacross@\code{doacross}}
|
||||
\index{clauses!linear@\code{linear}}
|
||||
\index{clauses!destroy@\code{destroy}}
|
||||
\index{default clause@\code{default} clause}
|
||||
\index{otherwise clause@\code{otherwise} clause}
|
||||
\index{to clause@\code{to} clause}
|
||||
\index{enter clause@\code{enter} clause}
|
||||
\index{depend clause@\code{depend} clause}
|
||||
\index{doacross clause@\code{doacross} clause}
|
||||
\index{linear clause@\code{linear} clause}
|
||||
\index{destroy clause@\code{destroy} clause}
|
||||
\index{directives!begin declare target@\code{begin}~\code{declare}~\code{target}}
|
||||
\index{begin declare target directive@\code{begin}~\code{declare}~\code{target} directive}
|
||||
\index{allocate construct@\code{allocate} construct}
|
||||
\index{allocators construct@\code{allocators} construct}
|
||||
|
||||
\nolinenumbers
|
||||
\renewcommand{\arraystretch}{1.0}
|
||||
\tablefirsthead{%
|
||||
\hline\\[-2ex]
|
||||
\textbf{Example Name} & \textbf{Earlier Version} & \textbf{Feature Updated}
|
||||
\\[2pt]
|
||||
\hline\\[-2ex]
|
||||
}
|
||||
\tablehead{%
|
||||
\multicolumn{2}{l}{\small\slshape table continued from previous page}\\[2pt]
|
||||
\hline\\[-2ex]
|
||||
\textbf{Example Name} & \textbf{Earlier Version} & \textbf{Feature Updated}\\[2pt]
|
||||
\hline\\[-2ex]
|
||||
}
|
||||
\tabletail{%
|
||||
\hline\\[-2.5ex]
|
||||
\multicolumn{2}{l}{\small\slshape table continued on next page}\\
|
||||
}
|
||||
\tablelasttail{\hline\\[-1ex]}
|
||||
\tablecaption{Updated Examples for Version 5.2\label{tab:Updated Examples 5.2}}
|
||||
\begin{supertabular}{p{1.7in} p{1.2in} p{2.1in}}
|
||||
\hexentry{error.1}[f90]{5.1} &
|
||||
\scode{default} clause on metadirectives \\
|
||||
\hexentry{metadirective.1}[f90]{5.0} &
|
||||
replaced with \scode{otherwise} clause \\
|
||||
\hexentry{metadirective.2}[f90]{5.0} & \\
|
||||
\hexentry{metadirective.3}[f90]{5.0} & \\
|
||||
\hexentry{metadirective.4}[f90]{5.1} & \\
|
||||
\hexentry{target_ptr_map.4}{5.1} & \\
|
||||
\hexentry{target_ptr_map.5}[f90]{5.1} & \\[2pt]
|
||||
\hline\\[-2ex]
|
||||
\hexentry[f90]{array_shaping.1}{5.0} &
|
||||
\scode{to} clause on \scode{declare} \scode{target} \\
|
||||
\hexentry{target_reverse_offload.7}{5.0} &
|
||||
directive replaced with \scode{enter} clause \\
|
||||
\hexentry{target_task_reduction.1}[f90]{5.1} & \\
|
||||
\hexentry{target_task_reduction.2a}[f90]{5.0} & \\
|
||||
\hexentry{target_task_reduction.2b}[f90]{5.1} &\\[2pt]
|
||||
\hline\\[-2ex]
|
||||
\hexentry{array_shaping.1}{5.0} &
|
||||
delimited \scode{declare}~\scode{target} \\
|
||||
\hexentry{async_target.1}{4.0} &
|
||||
directive replaced with \\
|
||||
\hexentry{async_target.2}{4.0} &
|
||||
\scode{begin}~\scode{declare}~\scode{target} \\
|
||||
\hexentry{declare_target.1}{4.0} &
|
||||
directive for C/C++ \\
|
||||
\hexentry[cpp]{declare_target.2c}{4.0} & \\
|
||||
\hexentry{declare_target.3}{4.0} & \\
|
||||
\hexentry{declare_target.4}{4.0} & \\
|
||||
\hexentry{declare_target.5}{4.0} & \\
|
||||
\hexentry{declare_target.6}{4.0} & \\
|
||||
\hexentry{declare_variant.1}{5.0} & \\
|
||||
\hexentry{device.1}{4.0} & \\
|
||||
\hexentry{metadirective.3}{5.0} & \\
|
||||
\hexentry{target_ptr_map.2}{5.0} & \\
|
||||
\hexentry{target_ptr_map.3a}{5.0} & \\
|
||||
\hexentry{target_ptr_map.3b}{5.0} & \\
|
||||
\hexentry{target_struct_map.1}{5.0} & \\
|
||||
\hexentry[cpp]{target_struct_map.2}{5.0} & \\
|
||||
\hexentry{target_struct_map.3}{5.0} & \\
|
||||
\hexentry{target_struct_map.4}{5.0} & \\[2pt]
|
||||
\hline\\[-2ex]
|
||||
\hexentry{doacross.1}[f90]{4.5} &
|
||||
\scode{depend} clause on \scode{ordered} \\
|
||||
\hexentry{doacross.2}[f90]{4.5} &
|
||||
construct replaced with \scode{doacross} \\
|
||||
\hexentry{doacross.3}[f90]{4.5} &
|
||||
clause \\
|
||||
\hexentry{doacross.4}[f90]{4.5} & \\[2pt]
|
||||
\hline\\[-2ex]
|
||||
\hexentry[cpp]{linear_modifier.1}[f90]{4.5} &
|
||||
modifier syntax change for \scode{linear} \\
|
||||
\hexentry[cpp]{linear_modifier.2}[f90]{4.5} &
|
||||
clause on \scode{declare}~\scode{simd} directive \\
|
||||
\hexentry{linear_modifier.3}[f90]{4.5} & \\[2pt]
|
||||
\hline\\[-2ex]
|
||||
\hexentry[f90]{allocators.1}{5.0} &
|
||||
\scode{allocate} construct replaced with \scode{allocators} construct
|
||||
for Fortran allocate statements \\[2pt]
|
||||
\hline\\[-2ex]
|
||||
\hexentry{depobj.1}[f90]{5.0} &
|
||||
argument added to \scode{destroy} clause on \scode{depobj}
|
||||
construct \\[2pt]
|
||||
\end{supertabular}
|
||||
|
||||
\linenumbers
|
||||
Table~\ref{tab:Updated Examples 5.1} lists the updated examples for OpenMP 5.1
|
||||
in the Examples Document Version
|
||||
\href{https://github.com/OpenMP/Examples/tree/v5.1}{5.1}.
|
||||
The \emph{Earlier Version} column of the table lists the earlier version
|
||||
tags and prior names of the examples that can be found in
|
||||
the Examples Document Version
|
||||
\href{https://github.com/OpenMP/Examples/tree/v5.0.1}{5.0.1}.
|
||||
|
||||
\index{affinity!master policy@\code{master} policy}
|
||||
\index{affinity!primary policy@\code{primary} policy}
|
||||
\index{constructs!master@\code{master}}
|
||||
\index{constructs!masked@\code{masked}}
|
||||
\index{master construct@\code{master} construct}
|
||||
\index{masked construct@\code{masked} construct}
|
||||
|
||||
\nolinenumbers
|
||||
\renewcommand{\arraystretch}{1.0}
|
||||
\tablefirsthead{%
|
||||
\hline\\[-2ex]
|
||||
\textbf{Example Name} & \textbf{Earlier Version} & \textbf{Feature Updated}
|
||||
\\[2pt]
|
||||
\hline\\[-2ex]
|
||||
}
|
||||
\tablehead{%
|
||||
\multicolumn{2}{l}{\small\slshape table continued from previous page}\\[2pt]
|
||||
\hline\\[-2ex]
|
||||
\textbf{Example Name} & \textbf{Earlier Version} & \textbf{Feature Updated}\\[2pt]
|
||||
\hline\\[-2ex]
|
||||
}
|
||||
\tabletail{%
|
||||
\hline\\[-2.5ex]
|
||||
\multicolumn{2}{l}{\small\slshape table continued on next page}\\
|
||||
}
|
||||
\tablelasttail{\hline\\[-1ex]}
|
||||
\tablecaption{Updated Examples for Version 5.1\label{tab:Updated Examples 5.1}}
|
||||
\begin{supertabular}{p{1.8in} p{1.4in} p{1.8in}}
|
||||
\hexentry{affinity.5}[f]{4.0} &
|
||||
\scode{master} affinity policy replaced with \scode{primary} policy \\[2pt]
|
||||
\hline\\[-2ex]
|
||||
\hexentry{async_target.3}[f90]{5.0} &
|
||||
\scode{master} construct replaced \\
|
||||
\hexentry{cancellation.2}[f90]{4.0} &
|
||||
with \scode{masked} construct \\
|
||||
\hexentry{copyprivate.2}[f]{3.0} & \\
|
||||
\hexentry[f]{fort_sa_private.5}{3.0} & \\
|
||||
\hexentry{lock_owner.1}[f]{3.0} & \\
|
||||
\hexmentry{masked.1}[f]{3.0}{master.1} & \\
|
||||
\hexmentry{parallel_masked_taskloop.1}[f90]{5.0}{parallel_master_taskloop.1} &\\
|
||||
\hexentry{reduction.6}[f]{3.0} & \\
|
||||
\hexentry{target_task_reduction.1}[f90]{5.0} & \\
|
||||
\hexentry{target_task_reduction.2b}[f90]{5.0} & \\
|
||||
\hexentry{taskloop_simd_reduction.1}[f90]{5.0} & \\
|
||||
\hexentry{task_detach.1}[f90]{5.0} & \\[2pt]
|
||||
\end{supertabular}
|
||||
|
||||
\linenumbers
|
||||
Table~\ref{tab:Updated Examples 5.0} lists the updated examples for OpenMP 5.0
|
||||
in the Examples Document Version
|
||||
\href{https://github.com/OpenMP/Examples/tree/v5.1}{5.1}.
|
||||
The \emph{Earlier Version} column of the table lists the earlier version
|
||||
tags of the examples that can be found in
|
||||
the Examples Document Version
|
||||
\href{https://github.com/OpenMP/Examples/tree/v5.0.1}{5.0.1}.
|
||||
|
||||
\nolinenumbers
|
||||
\renewcommand{\arraystretch}{1.0}
|
||||
\tablefirsthead{%
|
||||
\hline\\[-2ex]
|
||||
\textbf{Example Name} & \textbf{Earlier Version} & \textbf{Feature Updated}
|
||||
\\[2pt]
|
||||
\hline\\[-2ex]
|
||||
}
|
||||
\tablehead{%
|
||||
\multicolumn{2}{l}{\small\slshape table continued from previous page}\\[2pt]
|
||||
\hline\\[-2ex]
|
||||
\textbf{Example Name} & \textbf{Earlier Version} & \textbf{Feature Updated}\\[2pt]
|
||||
\hline\\[-2ex]
|
||||
}
|
||||
\tabletail{%
|
||||
\hline\\[-2.5ex]
|
||||
\multicolumn{2}{l}{\small\slshape table continued on next page}\\
|
||||
}
|
||||
\tablelasttail{\hline\\[-1ex]}
|
||||
\tablecaption{Updated Examples for Version 5.0\label{tab:Updated Examples 5.0}}
|
||||
\begin{supertabular}{p{1.6in} p{1.3in} p{2.1in}}
|
||||
\hexentry{critical.2}[f]{4.5} &
|
||||
\scode{omp_lock_hint_*} constants \\
|
||||
\hexentry[cpp]{init_lock_with_hint.1}[f]{4.5} &
|
||||
replaced with \scode{omp_sync_hint_*} constants \\[2pt]
|
||||
\end{supertabular}
|
||||
|
||||
\linenumbers
|
||||
|
@ -1,23 +1,33 @@
|
||||
\bchapter{Foreword}
|
||||
\chapter*{Foreword}
|
||||
\label{chap:foreword}
|
||||
|
||||
The OpenMP Examples document has been updated with new features
|
||||
found in the OpenMP 5.1 Specification. The additional examples and updates
|
||||
found in the OpenMP \VER\ Specification. The additional examples and updates
|
||||
are referenced in the Document Revision History of the Appendix on page~\pageref{chap:history}.
|
||||
|
||||
Text describing an example with a 5.1 feature specifically states
|
||||
that the feature support begins in the OpenMP 5.1 Specification. Also,
|
||||
an \code{\small omp\_5.1} keyword is included in the metadata of the source code.
|
||||
These distinctions are presented to remind readers that a 5.1 compliant
|
||||
Text describing an example with a \VER\ feature specifically states
|
||||
that the feature support begins in the OpenMP \VER\ Specification. Also,
|
||||
an \code{\small omp\_\VER} keyword is included in the metadata of the source code.
|
||||
These distinctions are presented to remind readers that a \VER\ compliant
|
||||
OpenMP implementation is necessary to use these features in codes.
|
||||
|
||||
Examples for most of the 5.1 features are included in this document,
|
||||
Examples for most of the \VER\ features are included in this document,
|
||||
and incremental releases will become available as more feature examples
|
||||
and updates are submitted, and approved by the OpenMP Examples Subcommittee.
|
||||
and updates are submitted and approved by the OpenMP Examples Subcommittee.
|
||||
|
||||
Examples are accepted for this document after discussions, revisions and reviews
|
||||
in the Examples Subcommittee, and two reviews/discussions and two votes
|
||||
in the OpenMP Language Committee.
|
||||
Draft examples are often derived from case studies for new features in the language,
|
||||
and are revised to illustrate the basic application of the features with code comments,
|
||||
and a text description. We are grateful to the numerous members of the Language Committee
|
||||
who took the time to prepare codes and descriptions, and shepherd them through
|
||||
the acceptance process. We sincerely appreciate the Examples Subcommittee members, who
|
||||
actively participated and contributed in weekly meetings over the years.
|
||||
|
||||
\bigskip
|
||||
Examples Subcommitee Co-chairs: \smallskip\linebreak
|
||||
Examples Subcommittee Co-chairs: \smallskip\linebreak
|
||||
Henry Jin (\textsc{NASA} Ames Research Center) \linebreak
|
||||
Kent Milfeld (\textsc{TACC}, Texas Advanced Research Center)
|
||||
Kent Milfeld (\textsc{TACC}, Texas Advanced Computing Center)
|
||||
|
||||
|
||||
|
66
History.tex
@ -1,6 +1,72 @@
|
||||
\cchapter{Document Revision History}{history}
|
||||
\label{chap:history}
|
||||
|
||||
%=====================================
|
||||
\section{Changes from 5.1 to 5.2}
|
||||
\label{sec:history_51_to_52}
|
||||
|
||||
\begin{itemize}
|
||||
\item General changes:
|
||||
\begin{itemize}
|
||||
\item Included a description of the semantics for OpenMP directive syntax
|
||||
(see \specref{chap:directive_syntax})
|
||||
\item Reorganized the Introduction Chapter and moved the Feature
|
||||
Deprecation Chapter to Appendix~\ref{chap:deprecated_features}
|
||||
\item Included a list of examples that were updated for feature deprecation
|
||||
and replacement in each version (see Appendix~\ref{sec:Updated Examples})
|
||||
\item Added Index entries
|
||||
\end{itemize}
|
||||
|
||||
\item Updated the examples for feature deprecation and replacement in OpenMP 5.2.
|
||||
See Table~\ref{tab:Deprecated Features} and
|
||||
Table~\ref{tab:Updated Examples 5.2} for details.
|
||||
|
||||
\item Added the following examples for the 5.2 features:
|
||||
\begin{itemize}
|
||||
\item Mapping class objects with virtual functions
|
||||
(\specref{sec:virtual_functions})
|
||||
\item \scode{allocators} construct for Fortran \code{allocate} statement
|
||||
(\specref{sec:allocators})
|
||||
\item Behavior of reallocation of variables through OpenMP allocator in
|
||||
Fortran (\specref{sec:allocators})
|
||||
\end{itemize}
|
||||
|
||||
\item Added the following examples for the 5.1 features:
|
||||
\begin{itemize}
|
||||
\item Clarification of optional \code{end} directive for strictly structured
|
||||
block in Fortran (\specref{sec:fortran_free_format_comments})
|
||||
\item \scode{filter} clause on \scode{masked} construct (\specref{sec:masked})
|
||||
\item \scode{omp_all_memory} reserved locator for specifying task dependences
|
||||
(\specref{subsec:depend_undefer_task})
|
||||
\item Behavior of Fortran allocatable variables in \code{target} regions
|
||||
(\specref{sec:fort_allocatable_array_mapping})
|
||||
\item Device memory routines in Fortran
|
||||
(\specref{subsec:target_mem_and_device_ptrs})
|
||||
\item Partial tiles from \scode{tile} construct
|
||||
(\specref{sec:incomplete_tiles})
|
||||
\item Fortran associate names and selectors in \code{target} region
|
||||
(\specref{sec:associate_target})
|
||||
\item \scode{allocate} directive for variable declarations and
|
||||
\scode{allocate} clause on \scode{task} constructs
|
||||
(\specref{sec:allocators})
|
||||
\item Controlling concurrency and reproducibility with \code{order} clause
|
||||
(\specref{sec:reproducible_modifier})
|
||||
\end{itemize}
|
||||
|
||||
\item Added other examples:
|
||||
\begin{itemize}
|
||||
\item Using lambda expressions with \scode{target} constructs
|
||||
(\specref{sec:lambda_expressions})
|
||||
\item Target memory and device pointer routines
|
||||
(\specref{subsec:target_mem_and_device_ptrs})
|
||||
\item Examples to illustrate the ordering properties of
|
||||
the \plc{flush} operation (\specref{sec:mem_model})
|
||||
\item User selector in the \code{metadirective} directive
|
||||
(\specref{sec:metadirective})
|
||||
\end{itemize}
|
||||
|
||||
\end{itemize}
|
||||
|
||||
%=====================================
|
||||
\section{Changes from 5.0.1 to 5.1}
|
||||
\label{sec:history_501_to_51}
|
||||
|
39
Makefile
@ -1,17 +1,18 @@
|
||||
# Makefile for the OpenMP Examples document in LaTeX format.
|
||||
# For more information, see the main document, openmp-examples.tex.
|
||||
|
||||
version=5.1
|
||||
version=5.2
|
||||
default: openmp-examples.pdf
|
||||
diff: openmp-diff-abridged.pdf
|
||||
|
||||
book: BOOK_BUILD="\\\\def\\\\bookbuild{1}"
|
||||
book: clean openmp-examples.pdf
|
||||
cp openmp-examples-${version}.pdf openmp-examples-${version}-book.pdf
|
||||
|
||||
CHAPTERS=Title_Page.tex \
|
||||
Foreword_Chapt.tex \
|
||||
Introduction_Chapt.tex \
|
||||
Examples_Chapt.tex \
|
||||
Deprecated_Features_Chapt.tex \
|
||||
Chap_*.tex \
|
||||
Deprecated_Features.tex \
|
||||
History.tex \
|
||||
*/*.tex
|
||||
|
||||
@ -22,6 +23,8 @@ SOURCES=*/sources/*.c \
|
||||
|
||||
INTERMEDIATE_FILES=openmp-examples.pdf \
|
||||
openmp-examples.toc \
|
||||
openmp-examples.lof \
|
||||
openmp-examples.lot \
|
||||
openmp-examples.idx \
|
||||
openmp-examples.aux \
|
||||
openmp-examples.ilg \
|
||||
@ -29,20 +32,30 @@ INTERMEDIATE_FILES=openmp-examples.pdf \
|
||||
openmp-examples.out \
|
||||
openmp-examples.log
|
||||
|
||||
LATEXCMD=pdflatex -interaction=batchmode -file-line-error
|
||||
LATEXDCMD=$(LATEXCMD) -draftmode
|
||||
|
||||
# check for branch names with "name_XXX"
|
||||
DIFF_TICKET_ID=$(shell git rev-parse --abbrev-ref HEAD)
|
||||
|
||||
openmp-examples.pdf: $(CHAPTERS) $(SOURCES) openmp.sty openmp-examples.tex openmp-logo.png
|
||||
openmp-examples.pdf: $(CHAPTERS) $(SOURCES) openmp.sty openmp-examples.tex openmp-logo.png generated-include.tex
|
||||
rm -f $(INTERMEDIATE_FILES)
|
||||
pdflatex -interaction=batchmode -file-line-error openmp-examples.tex
|
||||
pdflatex -interaction=batchmode -file-line-error openmp-examples.tex
|
||||
pdflatex -interaction=batchmode -file-line-error openmp-examples.tex
|
||||
touch generated-include.tex
|
||||
$(LATEXDCMD) openmp-examples.tex
|
||||
makeindex -s openmp-index.ist openmp-examples.idx
|
||||
$(LATEXDCMD) openmp-examples.tex
|
||||
$(LATEXCMD) openmp-examples.tex
|
||||
cp openmp-examples.pdf openmp-examples-${version}.pdf
|
||||
|
||||
clean:
|
||||
rm -f $(INTERMEDIATE_FILES)
|
||||
rm -f generated-include.tex
|
||||
rm -f openmp-diff-full.pdf openmp-diff-abridged.pdf
|
||||
rm -rf *.tmpdir
|
||||
cd util; make clean
|
||||
|
||||
realclean: clean
|
||||
rm -f openmp-examples-${version}.pdf openmp-examples-${version}-book.pdf
|
||||
|
||||
ifdef DIFF_TO
|
||||
VC_DIFF_TO := -r ${DIFF_TO}
|
||||
@ -52,11 +65,11 @@ endif
|
||||
ifdef DIFF_FROM
|
||||
VC_DIFF_FROM := -r ${DIFF_FROM}
|
||||
else
|
||||
VC_DIFF_FROM := -r work_5.1
|
||||
VC_DIFF_FROM := -r work_5.2
|
||||
endif
|
||||
|
||||
DIFF_TO:=HEAD
|
||||
DIFF_FROM:=work_5.1
|
||||
DIFF_FROM:=work_5.2
|
||||
DIFF_TYPE:=UNDERLINE
|
||||
|
||||
COMMON_DIFF_OPTS:=--math-markup=whole \
|
||||
@ -67,6 +80,10 @@ VC_DIFF_OPTS:=${COMMON_DIFF_OPTS} --force -c latexdiff.cfg --flatten --type="${D
|
||||
|
||||
VC_DIFF_MINIMAL_OPTS:= --only-changes --force
|
||||
|
||||
generated-include.tex:
|
||||
echo "$(BOOK_BUILD)"
|
||||
echo "$(BOOK_BUILD)" > $@
|
||||
|
||||
%.tmpdir: $(wildcard *.sty) $(wildcard *.png) $(wildcard *.aux) openmp-examples.pdf
|
||||
mkdir -p $@/sources
|
||||
for i in affinity devices loop_transformations parallel_execution SIMD tasking \
|
||||
@ -88,3 +105,5 @@ openmp-diff-minimal.pdf: diffs-slow-minimal.tmpdir
|
||||
env PATH="$(shell pwd)/util/latexdiff:$(PATH)" latexdiff-vc ${VC_DIFF_MINIMAL_OPTS} -d $< ${VC_DIFF_OPTS} openmp-examples.tex
|
||||
cp $</openmp-examples.pdf $@
|
||||
if [ "x$(DIFF_TICKET_ID)" != "x" ]; then cp $@ ${@:.pdf=-$(DIFF_TICKET_ID).pdf}; fi
|
||||
|
||||
.PHONY: diff default book clean realclean
|
||||
|
@ -2,6 +2,8 @@
|
||||
\section{\code{simd} and \code{declare} \code{simd} Directives}
|
||||
\label{sec:SIMD}
|
||||
|
||||
\index{constructs!simd@\code{simd}}
|
||||
\index{simd construct@\code{simd} construct}
|
||||
The following example illustrates the basic use of the \code{simd} construct
|
||||
to assure the compiler that the loop can be vectorized.
|
||||
|
||||
@ -10,6 +12,12 @@ to assure the compiler that the loop can be vectorized.
|
||||
\ffreeexample[4.0]{SIMD}{1}
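As a minimal sketch of the basic \code{simd} usage described above (the routine name \plc{scale\_add} and its arguments are illustrative assumptions and are not taken from the SIMD.1 sources), the directive below asserts that the iterations of the loop are independent, which holds only if \plc{x} and \plc{y} do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the Examples sources):
 * computes y[i] += s * x[i] for i = 0 .. n-1.
 * The caller must guarantee that x and y do not overlap. */
void scale_add(int n, double s, const double *x, double *y)
{
    assert(x != NULL && y != NULL);

    /* The simd construct tells the compiler that the iterations
     * may be executed concurrently in SIMD lanes. */
    #pragma omp simd
    for (int i = 0; i < n; i++)
        y[i] += s * x[i];
}
```

Without OpenMP support enabled, the pragma is ignored and the loop runs as ordinary serial code with the same result.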
|
||||
|
||||
|
||||
\index{directives!declare simd@\code{declare}~\code{simd}}
|
||||
\index{declare simd directive@\code{declare}~\code{simd} directive}
|
||||
\index{clauses!uniform@\code{uniform}}
|
||||
\index{uniform clause@\code{uniform} clause}
|
||||
\index{clauses!linear@\code{linear}}
|
||||
\index{linear clause@\code{linear} clause}
|
||||
When a function can be inlined within a loop, the compiler has an opportunity to
|
||||
vectorize the loop. By guaranteeing SIMD behavior of a function's operations,
|
||||
characterizing the arguments of the function and privatizing temporary
|
||||
@ -43,6 +51,11 @@ variable.
|
||||
\ffreeexample[4.0]{SIMD}{2}
|
||||
|
||||
%\pagebreak
|
||||
\index{clauses!private@\code{private}}
|
||||
\index{private clause@\code{private} clause}
|
||||
\index{clauses!reduction@\code{reduction}}
|
||||
\index{reduction clause@\code{reduction} clause}
|
||||
\index{reductions!reduction clause@\code{reduction} clause}
|
||||
A thread that encounters a SIMD construct executes a vectorized code of the
|
||||
iterations. Similar to the concerns of a worksharing loop, a loop vectorized
|
||||
with a SIMD construct must assure that temporary and reduction variables are
|
||||
@ -56,6 +69,8 @@ construct.
|
||||
|
||||
|
||||
%\pagebreak
|
||||
\index{clauses!safelen@\code{safelen}}
|
||||
\index{safelen clause@\code{safelen} clause}
|
||||
A \code{safelen(N)} clause in a \code{simd} construct assures the compiler that
|
||||
there are no loop-carried dependencies for vectors of size \plc{N} or below. If
|
||||
the \code{safelen} clause is not specified, then the default safelen value is
|
||||
@ -71,6 +86,8 @@ than 16, the behavior is undefined.
|
||||
\ffreeexample[4.0]{SIMD}{4}
|
||||
|
||||
%\pagebreak
|
||||
\index{clauses!collapse@\code{collapse}}
|
||||
\index{collapse clause@\code{collapse} clause}
|
||||
The following SIMD construct instructs the compiler to collapse the \plc{i} and
|
||||
\plc{j} loops into a single SIMD loop in which SIMD chunks are executed by
|
||||
threads of the team. Within the workshared loop chunks of a thread, the SIMD
|
||||
@ -84,6 +101,10 @@ chunks are executed in the lanes of the vector units.
|
||||
%%% section
|
||||
\section{\code{inbranch} and \code{notinbranch} Clauses}
|
||||
\label{sec:SIMD_branch}
|
||||
\index{clauses!inbranch@\code{inbranch}}
|
||||
\index{inbranch clause@\code{inbranch} clause}
|
||||
\index{clauses!notinbranch@\code{notinbranch}}
|
||||
\index{notinbranch clause@\code{notinbranch} clause}
|
||||
|
||||
The following examples illustrate the use of the \code{declare} \code{simd}
|
||||
directive with the \code{inbranch} and \code{notinbranch} clauses. The
|
||||
@ -114,6 +135,7 @@ version of the \plc{fib()} function.
|
||||
\pagebreak
|
||||
\section{Loop-Carried Lexical Forward Dependence}
|
||||
\label{sec:SIMD_forward_dep}
|
||||
\index{dependences!loop-carried lexical forward}
|
||||
|
||||
|
||||
The following example tests the restriction on a SIMD loop with a loop-carried lexical forward dependence. This dependence must be preserved for the correct execution of SIMD loops.
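To make the term concrete, the following is a small sketch of one such dependence. It is an illustration written for this description, not the SIMD.8 example itself, and the routine name \plc{shift\_update} is hypothetical. Statement S1 in iteration \plc{i} reads \plc{a[i+1]}, and statement S2 in iteration \plc{i+1} overwrites that element. Because S1 precedes S2 lexically, the dependence is lexically forward, and a SIMD execution must perform the S1 reads of a chunk before the corresponding S2 writes.

```c
#include <assert.h>

/* Hypothetical routine (not from the Examples sources).
 * Arrays a and b must not overlap; a must hold at least n elements
 * and b at least n-1 elements. */
void shift_update(int n, float *a, float *b)
{
    assert(n >= 1);

    #pragma omp simd
    for (int i = 0; i < n - 1; i++) {
        b[i] = a[i + 1];      /* S1: reads a[i+1]                        */
        a[i] = 2.0f * b[i];   /* S2: writes a[i]; in iteration i+1 this  */
                              /*     overwrites the element S1 just read */
    }
}
```

Executing all S1 reads of a SIMD chunk before its S2 writes keeps the serial result, which is the ordering the restriction requires.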
|
||||
|
@ -1,8 +1,14 @@
|
||||
%%% section
|
||||
\section{\code{ref}, \code{val}, \code{uval} Modifiers for \code{linear} Clause}
|
||||
\label{sec:linear_modifier}
|
||||
\index{modifiers, linear@modifiers, \code{linear}!ref@\code{ref}}
|
||||
\index{modifiers, linear@modifiers, \code{linear}!val@\code{val}}
|
||||
\index{modifiers, linear@modifiers, \code{linear}!uval@\code{uval}}
|
||||
\index{clauses!linear@\code{linear}}
|
||||
\index{linear clause@\code{linear} clause}
|
||||
|
||||
When generating vector functions from \code{declare}~\code{simd} directives, it is important for a compiler to know the proper types of function arguments in
|
||||
When generating vector functions from \code{declare}~\code{simd} directives,
|
||||
it is important for a compiler to know the proper types of function arguments in
|
||||
order to generate efficient codes.
|
||||
This is especially true for C++ reference types and Fortran arguments.
|
||||
|
||||
@ -11,66 +17,67 @@ parameter (or Fortran argument) \plc{p}. Variable \plc{p} gets incremented by 1
|
||||
The caller loop \plc{i} in the main program passes
|
||||
a variable \plc{k} as a reference to the function \plc{add\_one2} call.
|
||||
The \code{ref} modifier for the \code{linear} clause on the
|
||||
\code{declare}~\code{simd} directive is used to annotate the
|
||||
reference-type parameter \plc{p} to match the property of the variable
|
||||
\code{declare}~\code{simd} directive specifies that the
|
||||
reference-type parameter \plc{p} is to match the property of the variable
|
||||
\plc{k} in the loop.
|
||||
This use of reference type is equivalent to the second call to
|
||||
\plc{add\_one2} with a direct passing of the array element \plc{a[i]}.
|
||||
In the example, the preferred vector
|
||||
length 8 is specified for both the caller loop and the callee function.
|
||||
|
||||
When \code{linear(ref(p))} is applied to an argument passed by reference,
|
||||
When \code{linear(p:~ref)} is applied to an argument passed by reference,
|
||||
it tells the compiler that the addresses in its vector argument are consecutive,
|
||||
and so the compiler can generate a single vector load or store instead of
|
||||
a gather or scatter. This allows more efficient SIMD code to be generated with
|
||||
fewer source changes.
|
||||
|
||||
\cppexample[4.5]{linear_modifier}{1}
|
||||
\ffreeexample[4.5]{linear_modifier}{1}
|
||||
\cppexample[5.2]{linear_modifier}{1}
|
||||
\ffreeexample[5.2]{linear_modifier}{1}
|
||||
\clearpage
|
||||
|
||||
|
||||
The following example is a variant of the above example. The function \plc{add\_one2} in the C++ code includes an additional C++ reference parameter \plc{i}.
|
||||
The following example is a variant of the above example. The function \plc{add\_one2}
|
||||
in the C++ code includes an additional C++ reference parameter \plc{i}.
|
||||
The loop index \plc{i} of the caller loop \plc{i} in the main program
|
||||
is passed as a reference to the function \plc{add\_one2} call.
|
||||
The loop index \plc{i} has a uniform address with
|
||||
linear value of step 1 across SIMD lanes.
|
||||
Thus, the \code{uval} modifier is used for the \code{linear} clause
|
||||
to annotate the C++ reference-type parameter \plc{i} to match
|
||||
to specify that the C++ reference-type parameter \plc{i} is to match
|
||||
the property of loop index \plc{i}.
|
||||
|
||||
In the correponding Fortran code the arguments \plc{p} and
|
||||
In the corresponding Fortran code the arguments \plc{p} and
|
||||
\plc{i} in the routine \plc{add\_one2} are passed by reference.
|
||||
Similar modifiers are used for these variables in the \code{linear} clauses
|
||||
to match with the property at the caller loop in the main program.
|
||||
|
||||
When \code{linear(uval(i))} is applied to an argument passed by reference, it
|
||||
When \code{linear(i:~uval)} is applied to an argument passed by reference, it
|
||||
tells the compiler that its addresses in the vector argument are uniform
|
||||
so that the compiler can generate a scalar load or scalar store and create
|
||||
linear values. This allows more efficient SIMD code to be generated with
|
||||
fewer source changes.
|
||||
|
||||
\cppexample[4.5]{linear_modifier}{2}
|
||||
\ffreeexample[4.5]{linear_modifier}{2}
|
||||
\cppexample[5.2]{linear_modifier}{2}
|
||||
\ffreeexample[5.2]{linear_modifier}{2}
|
||||
|
||||
In the following example, the function \plc{func} takes arrays \plc{x} and \plc{y} as arguments, and accesses the array elements referenced by
|
||||
the index \plc{i}.
|
||||
In the following example, the function \plc{func} takes arrays \plc{x} and \plc{y}
|
||||
as arguments, and accesses the array elements referenced by the index \plc{i}.
|
||||
The caller loop \plc{i} in the main program passes a linear copy of
|
||||
the variable \plc{k} to the function \plc{func}.
|
||||
The \code{val} modifier is used for the \code{linear} clause
|
||||
in the \code{declare}~\code{simd} directive for the function
|
||||
\plc{func} to annotate argument \plc{i} to match the property of
|
||||
\plc{func} to specify that the argument \plc{i} is to match the property of
|
||||
the actual argument \plc{k} passed in the SIMD loop.
|
||||
Arrays \plc{x} and \plc{y} have uniform addresses across SIMD lanes.
|
||||
|
||||
When \code{linear(val(i):1)} is applied to an argument,
|
||||
When \code{linear(i:~val,step(1))} is applied to an argument,
|
||||
it tells the compiler that its addresses in the vector argument may not be
|
||||
consecutive; however, their values are linear (with stride 1 here). When the value of \plc{i} is used
|
||||
in the subscripts of array references (e.g., \plc{x[i]}), the compiler can generate
|
||||
a vector load or store instead of a gather or scatter. This allows more
|
||||
efficient SIMD code to be generated with fewer source changes.
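The caller/callee pairing described above can be sketched as follows. The function \plc{func} mirrors the declaration shown in the example source; the wrapper \plc{add\_all} and its parameters are assumptions introduced only for illustration. The OpenMP 5.2 modifier syntax requires a 5.2-compliant compiler when OpenMP is enabled.

```c
#include <assert.h>

/* x and y have the same address in every SIMD lane (uniform);
 * i is passed by value and advances by 1 from lane to lane. */
#pragma omp declare simd simdlen(4) uniform(x, y) linear(i: val, step(1))
double func(double x[], double y[], int i)
{
    return x[i] + y[i];
}

/* Hypothetical caller (not from the Examples sources): k is a linear
 * variable in the SIMD loop, matching the linear argument i of func. */
void add_all(double a[], double x[], double y[], int n)
{
    assert(n >= 0);

    int k = 0;
    #pragma omp simd linear(k)
    for (int i = 0; i < n; i++) {
        a[i] = func(x, y, k);
        k++;
    }
}
```

Since the values of \plc{k} passed in consecutive lanes differ by exactly 1, the accesses \plc{x[i]} and \plc{y[i]} inside \plc{func} touch consecutive elements, which is what allows a vector load in place of a gather.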
|
||||
|
||||
\cexample[4.5]{linear_modifier}{3}
|
||||
\ffreeexample[4.5]{linear_modifier}{3}
|
||||
\cexample[5.2]{linear_modifier}{3}
|
||||
\ffreeexample[5.2]{linear_modifier}{3}
|
||||
|
||||
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: SIMD.1c
|
||||
* @@name: SIMD.1
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: SIMD.1f
|
||||
! @@name: SIMD.1
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: SIMD.2c
|
||||
* @@name: SIMD.2
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: SIMD.2f
|
||||
! @@name: SIMD.2
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: SIMD.3c
|
||||
* @@name: SIMD.3
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: SIMD.3f
|
||||
! @@name: SIMD.3
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: SIMD.4c
|
||||
* @@name: SIMD.4
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: SIMD.4f
|
||||
! @@name: SIMD.4
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: SIMD.5c
|
||||
* @@name: SIMD.5
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: SIMD.5f
|
||||
! @@name: SIMD.5
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: SIMD.6c
|
||||
* @@name: SIMD.6
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: SIMD.6f
|
||||
! @@name: SIMD.6
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: SIMD.7c
|
||||
* @@name: SIMD.7
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: SIMD.7f
|
||||
! @@name: SIMD.7
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: SIMD.8c
|
||||
* @@name: SIMD.8
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: SIMD.8f
|
||||
! @@name: SIMD.8
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
|
@ -1,17 +1,17 @@
|
||||
/*
|
||||
* @@name: linear_modifier.1cpp
|
||||
* @@name: linear_modifier.1
|
||||
* @@type: C++
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_4.5
|
||||
* @@version: omp_5.2
|
||||
*/
|
||||
#include <stdio.h>
|
||||
|
||||
#define NN 1023
|
||||
int a[NN];
|
||||
|
||||
#pragma omp declare simd linear(ref(p)) simdlen(8)
|
||||
#pragma omp declare simd linear(p: ref) simdlen(8)
|
||||
void add_one2(int& p)
|
||||
{
|
||||
p += 1;
|
||||
|
@ -1,16 +1,16 @@
|
||||
! @@name: linear_modifier.1.f90
|
||||
! @@name: linear_modifier.1
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_4.5
|
||||
! @@version: omp_5.2
|
||||
module m
|
||||
integer, parameter :: NN = 1023
|
||||
integer :: a(NN)
|
||||
|
||||
contains
|
||||
subroutine add_one2(p)
|
||||
!$omp declare simd(add_one2) linear(ref(p)) simdlen(8)
|
||||
!$omp declare simd(add_one2) linear(p: ref) simdlen(8)
|
||||
implicit none
|
||||
integer :: p
|
||||
|
||||
|
@ -1,17 +1,17 @@
|
||||
/*
|
||||
* @@name: linear_modifier.2cpp
|
||||
* @@name: linear_modifier.2
|
||||
* @@type: C++
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_4.5
|
||||
* @@version: omp_5.2
|
||||
*/
|
||||
#include <stdio.h>
|
||||
|
||||
#define NN 1023
|
||||
int a[NN];
|
||||
|
||||
#pragma omp declare simd linear(ref(p)) linear(uval(i))
|
||||
#pragma omp declare simd linear(p: ref) linear(i: uval)
|
||||
void add_one2(int& p, const int& i)
|
||||
{
|
||||
p += i;
|
||||
|
@ -1,16 +1,16 @@
|
||||
! @@name: linear_modifier.2f90
|
||||
! @@name: linear_modifier.2
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_4.5
|
||||
! @@version: omp_5.2
|
||||
module m
|
||||
integer, parameter :: NN = 1023
|
||||
integer :: a(NN)
|
||||
|
||||
contains
|
||||
subroutine add_one2(p, i)
|
||||
!$omp declare simd(add_one2) linear(ref(p)) linear(uval(i))
|
||||
!$omp declare simd(add_one2) linear(p: ref) linear(i: uval)
|
||||
implicit none
|
||||
integer :: p
|
||||
integer, intent(in) :: i
|
||||
|
@ -1,16 +1,16 @@
|
||||
/*
|
||||
* @@name: linear_modifier.3c
|
||||
* @@name: linear_modifier.3
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
* @@expect: success
|
||||
* @@version: omp_4.5
|
||||
* @@version: omp_5.2
|
||||
*/
|
||||
#include <stdio.h>
|
||||
|
||||
#define N 128
|
||||
|
||||
#pragma omp declare simd simdlen(4) uniform(x, y) linear(val(i):1)
|
||||
#pragma omp declare simd simdlen(4) uniform(x, y) linear(i:val,step(1))
|
||||
double func(double x[], double y[], int i)
|
||||
{
|
||||
return (x[i] + y[i]);
|
||||
|
@ -1,13 +1,13 @@
|
||||
! @@name: linear_modifier.3f
|
||||
! @@name: linear_modifier.3
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
! @@expect: success
|
||||
! @@version: omp_4.5
|
||||
! @@version: omp_5.2
|
||||
module func_mod
|
||||
contains
|
||||
real(8) function func(x, y, i)
|
||||
!$omp declare simd(func) simdlen(4) uniform(x, y) linear(val(i):1)
|
||||
!$omp declare simd(func) simdlen(4) uniform(x, y) linear(i:val,step(1))
|
||||
implicit none
|
||||
real(8), intent(in) :: x(*), y(*)
|
||||
integer, intent(in) :: i
|
||||
|
@ -23,11 +23,12 @@
|
||||
|
||||
\vspace{2.3in} %was 3.0
|
||||
|
||||
Source codes for OpenMP \PVER{} Examples can be downloaded from
|
||||
\href{https://github.com/OpenMP/Examples/tree/v\VER}{github}.\\
|
||||
Source codes for OpenMP \VER{} Examples are available at
|
||||
\href{https://github.com/OpenMP/Examples/tree/v\VER}%
|
||||
{github (https://github.com/OpenMP/Examples/tree/v\VER)}.\\
|
||||
|
||||
\begin{adjustwidth}{0pt}{1em}\setlength{\parskip}{0.25\baselineskip}%
|
||||
Copyright \copyright{} 1997-2021 OpenMP Architecture Review Board.\\
|
||||
Copyright \copyright{} 1997-2022 OpenMP Architecture Review Board.\\
|
||||
Permission to copy without fee all or part of this material is granted,
|
||||
provided the OpenMP Architecture Review Board copyright notice and
|
||||
the title of this document appear. Notice is given that copying is by
|
||||
@ -37,7 +38,7 @@ permission of OpenMP Architecture Review Board.\end{adjustwidth}
|
||||
|
||||
% Blank page
|
||||
|
||||
\cleardoublepage
|
||||
%\cleardoublepage
|
||||
|
||||
%For final version, uncomment the line above, comment out the lines below
|
||||
%This working version enacted the following tickets: 287, 519, 550, 593,
|
||||
|
@ -1,19 +1,24 @@
|
||||
\pagebreak
|
||||
\section{\code{proc\_bind} Clause}
|
||||
\label{sec:affinity}
|
||||
\index{affinity!proc_bind clause@\scode{proc_bind} clause}
|
||||
\index{clauses!proc_bind@\scode{proc_bind}}
|
||||
\index{proc_bind clause@\scode{proc_bind} clause}
|
||||
|
||||
The following examples demonstrate how to use the \code{proc\_bind} clause to
|
||||
control the thread binding for a team of threads in a \code{parallel} region.
|
||||
The machine architecture is depicted in the figure below. It consists of two sockets,
|
||||
The machine architecture is depicted in Figure~\ref{fig:mach_arch}. It consists of two sockets,
|
||||
each equipped with a quad-core processor and configured to execute two hardware
|
||||
threads simultaneously on each core. These examples assume a contiguous core numbering
|
||||
starting from 0, such that the hardware threads 0,1 form the first physical core.
|
||||
|
||||
\ifpdf
|
||||
%\begin{figure}[htbp]
|
||||
\centerline{\includegraphics[width=3.8in,keepaspectratio=true]%
|
||||
\begin{figure}[htb]
|
||||
\centerline{\includegraphics[width=3.0in,keepaspectratio=true]%
|
||||
{figs/proc_bind_fig.pdf}}
|
||||
%\end{figure}
|
||||
\caption{A machine architecture with two quad-core processors}
|
||||
\label{fig:mach_arch}
|
||||
\end{figure}
|
||||
\fi
|
||||
|
||||
The following equivalent place list declarations consist of eight places (which
|
||||
@ -27,6 +32,8 @@ or
|
||||
|
||||
\subsection{Spread Affinity Policy}
|
||||
\label{subsec:affinity_spread}
|
||||
\index{affinity!spread policy@\code{spread} policy}
|
||||
\index{spread policy@\code{spread} policy}
|
||||
|
||||
|
||||
The following example shows the result of the \code{spread} affinity policy on
|
||||
@ -124,6 +131,8 @@ and distribution of the place partition would be as follows:
|
||||
|
||||
\subsection{Close Affinity Policy}
|
||||
\label{subsec:affinity_close}
|
||||
\index{affinity!close policy@\code{close} policy}
|
||||
\index{close policy@\code{close} policy}
|
||||
|
||||
The following example shows the result of the \code{close} affinity policy on
|
||||
the partition list when the number of threads is less than or equal to the number
|
||||
@ -220,6 +229,8 @@ and distribution of the place partition would be as follows:
|
||||
|
||||
\subsection{Primary Affinity Policy}
|
||||
\label{subsec:affinity_primary}
|
||||
\index{affinity!primary policy@\code{primary} policy}
|
||||
\index{primary policy@\code{primary} policy}
|
||||
|
||||
The following example shows the result of the \code{primary} affinity policy on
|
||||
the partition list for the machine architecture depicted above. The place partition
|
||||
@ -227,7 +238,7 @@ is not changed by the primary policy.
|
||||
|
||||
\cexample[4.0]{affinity}{5}
|
||||
|
||||
\fexample[4.0]{affinity}{5}[1]
|
||||
\fexample[4.0]{affinity}{5}
|
||||
\clearpage
|
||||
|
||||
It is unspecified on which place the primary thread is initially started. If the
|
||||
|
@ -1,5 +1,14 @@
|
||||
\section{Affinity Display}
|
||||
\label{sec:affinity_display}
|
||||
\index{affinity display!OMP_DISPLAY_AFFINITY@\scode{OMP_DISPLAY_AFFINITY}}
|
||||
\index{environment variables!OMP_DISPLAY_AFFINITY@\scode{OMP_DISPLAY_AFFINITY}}
|
||||
\index{OMP_DISPLAY_AFFINITY@\scode{OMP_DISPLAY_AFFINITY}}
|
||||
\index{affinity display!OMP_AFFINITY_FORMAT@\scode{OMP_AFFINITY_FORMAT}}
|
||||
\index{environment variables!OMP_AFFINITY_FORMAT@\scode{OMP_AFFINITY_FORMAT}}
|
||||
\index{OMP_AFFINITY_FORMAT@\scode{OMP_AFFINITY_FORMAT}}
|
||||
\index{affinity display!omp_display_affinity routine@\scode{omp_display_affinity} routine}
|
||||
\index{routines!omp_display_affinity@\scode{omp_display_affinity}}
|
||||
\index{omp_display_affinity routine@\scode{omp_display_affinity} routine}
|
||||
|
||||
The following examples illustrate ways to display thread affinity.
|
||||
Automatic display of affinity can be invoked by setting
|
||||
@ -49,6 +58,8 @@ where the numbers correspond to core ids for the system. Note, \code{OMP\_DISPLA
|
||||
set and is \code{FALSE} by default. This example shows how to use API routines to
|
||||
perform affinity display operations.
|
||||
|
||||
\index{environment variables!OMP_PLACES@\scode{OMP_PLACES}}
|
||||
\index{OMP_PLACES@\scode{OMP_PLACES}}
|
||||
For each of the two first-level threads the \code{OMP\_PLACES} variable specifies
|
||||
a place with all the core-ids of the socket (\{0,2,4,6\} for one thread and \{1,3,5,7\} for the other).
|
||||
(As is sometimes the case in 2-socket systems, one socket may consist
|
||||
@ -62,8 +73,14 @@ the affinities for the threads on each socket are printed according to this form
|
||||
|
||||
\ffreeexample[5.0]{affinity_display}{2}
|
||||
|
||||
\index{affinity display!omp_get_affinity_format routine@\scode{omp_get_affinity_format} routine}
|
||||
\index{routines!omp_get_affinity_format@\scode{omp_get_affinity_format}}
|
||||
\index{omp_get_affinity_format routine@\scode{omp_get_affinity_format} routine}
|
||||
\index{affinity display!omp_set_affinity_format routine@\scode{omp_set_affinity_format} routine}
|
||||
\index{routines!omp_set_affinity_format@\scode{omp_set_affinity_format}}
|
||||
\index{omp_set_affinity_format routine@\scode{omp_set_affinity_format} routine}
|
||||
The next example illustrates more details about affinity formatting.
|
||||
First, the \code{omp\_get\_affininity\_format()} API routine is used to
|
||||
First, the \code{omp\_get\_affinity\_format()} API routine is used to
|
||||
obtain the default format. The code checks to make sure the storage
|
||||
provides enough space to hold the format.
|
||||
Next, the \code{omp\_set\_affinity\_format()} API routine sets a user-defined
|
||||
@ -83,6 +100,9 @@ and the "0" indicates that any unused space is to be prefixed with zeros
|
||||
%The period (\plc{.}) indicates right justified and \plc{0} leading zeros.
|
||||
%All other text in the format is just user narrative.
|
||||
|
||||
\index{affinity display!omp_capture_affinity routine@\scode{omp_capture_affinity} routine}
|
||||
\index{routines!omp_capture_affinity@\scode{omp_capture_affinity}}
|
||||
\index{omp_capture_affinity routine@\scode{omp_capture_affinity} routine}
|
||||
Within the parallel region the affinity for each thread is captured by
|
||||
\code{omp\_capture\_affinity()} into a buffer array with elements indexed
|
||||
by the thread number (\plc{thrd\_num}).
|
||||
@ -98,6 +118,7 @@ The maximum value for the number of characters (\plc{nchars}) returned by
|
||||
clause and the \plc{if(nchars >= max\_req\_store) max\_req\_store=nchars} statement.
|
||||
It is used to report possible truncation (if \plc{max\_req\_store} > \plc{buffer\_store}).
|
||||
|
||||
\newpage
|
||||
\cexample[5.0]{affinity_display}{3}
|
||||
|
||||
\ffreeexample[5.0]{affinity_display}{3}
|
||||
|
@ -1,5 +1,18 @@
|
||||
\section{Affinity Query Functions}
|
||||
\label{sec: affinity_query}
|
||||
\index{affinity query!omp_get_num_places routine@\scode{omp_get_num_places} routine}
|
||||
\index{routines!omp_get_num_places@\scode{omp_get_num_places}}
|
||||
\index{omp_get_num_places routine@\scode{omp_get_num_places} routine}
|
||||
\index{affinity query!omp_get_place_num routine@\scode{omp_get_place_num} routine}
|
||||
\index{routines!omp_get_place_num@\scode{omp_get_place_num}}
|
||||
\index{omp_get_place_num routine@\scode{omp_get_place_num} routine}
|
||||
\index{affinity query!omp_get_place_num_procs routine@\scode{omp_get_place_num_procs} routine}
|
||||
\index{routines!omp_get_place_num_procs@\scode{omp_get_place_num_procs}}
|
||||
\index{omp_get_place_num_procs routine@\scode{omp_get_place_num_procs} routine}
|
||||
\index{affinity!spread policy@\code{spread} policy}
|
||||
\index{spread policy@\code{spread} policy}
|
||||
\index{environment variables!OMP_PLACES@\scode{OMP_PLACES}}
|
||||
\index{OMP_PLACES@\scode{OMP_PLACES}}
|
||||
|
||||
In the example below a team of threads is generated on each socket of
|
||||
the system, using nested parallelism. Several query functions are used
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: affinity.1c
|
||||
* @@name: affinity.1
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: affinity.1f
|
||||
! @@name: affinity.1
|
||||
! @@type: F-fixed
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: affinity.2c
|
||||
* @@name: affinity.2
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: affinity.2f
|
||||
! @@name: affinity.2
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: affinity.3c
|
||||
* @@name: affinity.3
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: affinity.3f
|
||||
! @@name: affinity.3
|
||||
! @@type: F-fixed
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: affinity.4c
|
||||
* @@name: affinity.4
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: affinity.4f
|
||||
! @@name: affinity.4
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,15 +1,11 @@
|
||||
/*
|
||||
* @@name: affinity.5c
|
||||
* @@name: affinity.5
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
* @@expect: success
|
||||
* @@version: omp_5.1
|
||||
*/
|
||||
#if _OPENMP < 202011
|
||||
#define primary master
|
||||
#endif
|
||||
|
||||
void work();
|
||||
int main()
|
||||
{
|
||||
|
@ -1,14 +1,9 @@
|
||||
! @@name: affinity.5f
|
||||
! @@name: affinity.5
|
||||
! @@type: F-fixed
|
||||
! @@compilable: yes
|
||||
! @@requires: preprocessing
|
||||
! @@linkable: no
|
||||
! @@expect: success
|
||||
! @@version: omp_5.1
|
||||
#if _OPENMP < 202011
|
||||
#define primary master
|
||||
#endif
|
||||
|
||||
PROGRAM EXAMPLE
|
||||
!$OMP PARALLEL PROC_BIND(primary) NUM_THREADS(4)
|
||||
CALL WORK()
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: affinity.1.c
|
||||
* @@name: affinity.6
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: affinity.6f
|
||||
! @@name: affinity.6
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: affinity_display.1.c
|
||||
* @@name: affinity_display.1
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
@ -9,52 +9,53 @@
|
||||
#include <stdio.h>
|
||||
#include <omp.h>
|
||||
|
||||
int main(void){ //MAX threads = 8, single socket system
|
||||
int main(void){ //MAX threads = 8, single socket system
|
||||
|
||||
omp_display_affinity(NULL); //API call-- Displays Affinity of Primary Thread
|
||||
//API call-- Displays Affinity of Primary Thread
|
||||
omp_display_affinity(NULL);
|
||||
|
||||
// API CALL OUTPUT (default format):
|
||||
//team_num= 0, nesting_level= 0, thread_num= 0, thread_affinity= 0,1,2,3,4,5,6,7
|
||||
// API CALL OUTPUT (default format):
|
||||
// team_num= 0, nesting_level= 0, thread_num= 0,
|
||||
// thread_affinity= 0,1,2,3,4,5,6,7
|
||||
|
||||
|
||||
// OMP_DISPLAY_AFFINITY=TRUE, OMP_NUM_THREADS=8
|
||||
// OMP_DISPLAY_AFFINITY=TRUE, OMP_NUM_THREADS=8
|
||||
#pragma omp parallel num_threads(omp_get_num_procs())
|
||||
{
|
||||
if(omp_get_thread_num()==0)
|
||||
if(omp_get_thread_num()==0)
|
||||
printf("1st Parallel Region -- Affinity Reported \n");
|
||||
|
||||
// DISPLAY OUTPUT (default format) has been sorted:
|
||||
// team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0
|
||||
// team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 1
|
||||
// ...
|
||||
// team_num= 0, nesting_level= 1, thread_num= 7, thread_affinity= 7
|
||||
// DISPLAY OUTPUT (default format) has been sorted:
|
||||
// team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0
|
||||
// team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 1
|
||||
// ...
|
||||
// team_num= 0, nesting_level= 1, thread_num= 7, thread_affinity= 7
|
||||
|
||||
// doing work here
|
||||
// doing work here
|
||||
}
|
||||
|
||||
#pragma omp parallel num_threads( omp_get_num_procs() )
|
||||
{
|
||||
if(omp_get_thread_num()==0)
|
||||
printf("%s%s\n","Same Affinity as in Previous Parallel Region",
|
||||
" -- no Affinity Reported\n");
|
||||
if(omp_get_thread_num()==0)
|
||||
printf("%s%s\n","Same Affinity as in Previous Parallel Region",
|
||||
" -- no Affinity Reported\n");
|
||||
|
||||
// NO AFFINITY OUTPUT:
|
||||
//(output in 1st parallel region only for OMP_DISPLAY_AFFINITY=TRUE)
|
||||
|
||||
// doing more work here
|
||||
// NO AFFINITY OUTPUT:
|
||||
//(output in 1st parallel region only for OMP_DISPLAY_AFFINITY=TRUE)
|
||||
|
||||
// doing more work here
|
||||
}
|
||||
|
||||
// Report Affinity for 1/2 number of threads
|
||||
// Report Affinity for 1/2 number of threads
|
||||
#pragma omp parallel num_threads( omp_get_num_procs()/2 )
|
||||
{
|
||||
if(omp_get_thread_num()==0)
|
||||
if(omp_get_thread_num()==0)
|
||||
printf("Report Affinity for using 1/2 of max threads.\n");
|
||||
|
||||
// DISPLAY OUTPUT (default format) has been sorted:
|
||||
// team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0,1
|
||||
// team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 2,3
|
||||
// team_num= 0, nesting_level= 1, thread_num= 2, thread_affinity= 4,5
|
||||
// team_num= 0, nesting_level= 1, thread_num= 3, thread_affinity= 6,7
|
||||
|
||||
// DISPLAY OUTPUT (default format) has been sorted:
|
||||
// team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0,1
|
||||
// team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 2,3
|
||||
// team_num= 0, nesting_level= 1, thread_num= 2, thread_affinity= 4,5
|
||||
// team_num= 0, nesting_level= 1, thread_num= 3, thread_affinity= 6,7
|
||||
|
||||
// do work
|
||||
}
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: affinity_display.1.f90
|
||||
! @@name: affinity_display.1
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
@ -10,13 +10,15 @@ program affinity_display ! MAX threads = 8, single socket system
|
||||
implicit none
|
||||
character(len=0) :: null
|
||||
|
||||
call omp_display_affinity(null) !API call- Displays Affinity of Primary Thrd
|
||||
! API call - Displays Affinity of Primary Thread
|
||||
call omp_display_affinity(null)
|
||||
|
||||
! API CALL OUTPUT (default format):
|
||||
!team_num= 0, nesting_level= 0, thread_num= 0, thread_affinity= 0,1,2,3,4,5,6,7
|
||||
! API CALL OUTPUT (default format):
|
||||
! team_num= 0, nesting_level= 0, thread_num= 0, &
|
||||
! thread_affinity= 0,1,2,3,4,5,6,7
|
||||
|
||||
|
||||
! OMP_DISPLAY_AFFINITY=TRUE, OMP_NUM_THREADS=8
|
||||
! OMP_DISPLAY_AFFINITY=TRUE, OMP_NUM_THREADS=8
|
||||
|
||||
!$omp parallel num_threads(omp_get_num_procs())
|
||||
|
||||
@ -24,11 +26,11 @@ program affinity_display ! MAX threads = 8, single socket system
|
||||
print*, "1st Parallel Region -- Affinity Reported"
|
||||
endif
|
||||
|
||||
! DISPLAY OUTPUT (default format) has been sorted:
|
||||
! team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0
|
||||
! team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 1
|
||||
! ...
|
||||
! team_num= 0, nesting_level= 1, thread_num= 7, thread_affinity= 7
|
||||
! DISPLAY OUTPUT (default format) has been sorted:
|
||||
! team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0
|
||||
! team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 1
|
||||
! ...
|
||||
! team_num= 0, nesting_level= 1, thread_num= 7, thread_affinity= 7
|
||||
|
||||
! doing work here
|
||||
|
||||
@ -40,25 +42,30 @@ program affinity_display ! MAX threads = 8, single socket system
|
||||
print*, "Same Affinity in Parallel Region -- no Affinity Reported"
|
||||
endif
|
||||
|
||||
! NO AFFINITY OUTPUT:
|
||||
!(output in 1st parallel region only for OMP_DISPLAY_AFFINITY=TRUE)
|
||||
! NO AFFINITY OUTPUT:
|
||||
! (output in 1st parallel region only for
|
||||
! OMP_DISPLAY_AFFINITY=TRUE)
|
||||
|
||||
! doing more work here
|
||||
|
||||
!$omp end parallel
|
||||
|
||||
! Report Affinity for 1/2 number of threads
|
||||
! Report Affinity for 1/2 number of threads
|
||||
!$omp parallel num_threads( omp_get_num_procs()/2 )
|
||||
|
||||
if(omp_get_thread_num()==0) then
|
||||
print*, "Different Affinity in Parallel Region -- Affinity Reported"
|
||||
print*, "Altered Affinity in Parallel Region -- Affinity Reported"
|
||||
endif
|
||||
|
||||
! DISPLAY OUTPUT (default format) has been sorted:
|
||||
! team_num= 0, nesting_level= 1, thread_num= 0, thread_affinity= 0,1
|
||||
! team_num= 0, nesting_level= 1, thread_num= 1, thread_affinity= 2,3
|
||||
! team_num= 0, nesting_level= 1, thread_num= 2, thread_affinity= 4,5
|
||||
! team_num= 0, nesting_level= 1, thread_num= 3, thread_affinity= 6,7
|
||||
! DISPLAY OUTPUT (default format) has been sorted:
|
||||
! team_num= 0, nesting_level= 1, thread_num= 0, &
|
||||
! thread_affinity= 0,1
|
||||
! team_num= 0, nesting_level= 1, thread_num= 1, &
|
||||
! thread_affinity= 2,3
|
||||
! team_num= 0, nesting_level= 1, thread_num= 2, &
|
||||
! thread_affinity= 4,5
|
||||
! team_num= 0, nesting_level= 1, thread_num= 3, &
|
||||
! thread_affinity= 6,7
|
||||
|
||||
! do work
|
||||
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: affinity_display.2c
|
||||
* @@name: affinity_display.2
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
@ -14,62 +14,65 @@ void socket_work(int socket_num, int n_thrds);
|
||||
|
||||
int main(void)
|
||||
{
|
||||
int n_sockets, socket_num, n_thrds_on_socket;
|
||||
int n_sockets, socket_num, n_thrds_on_socket;
|
||||
|
||||
omp_set_nested(1); // or env var= OMP_NESTED=true
|
||||
omp_set_max_active_levels(2); // or env var= OMP_MAX_ACTIVE_LEVELS=2
|
||||
omp_set_nested(1); // or env var= OMP_NESTED=true
|
||||
omp_set_max_active_levels(2); // or env var= OMP_MAX_ACTIVE_LEVELS=2
|
||||
|
||||
n_sockets = omp_get_num_places();
|
||||
n_thrds_on_socket = omp_get_place_num_procs(0);
|
||||
n_sockets = omp_get_num_places();
|
||||
n_thrds_on_socket = omp_get_place_num_procs(0);
|
||||
|
||||
// OMP_NUM_THREADS=2,4
|
||||
// OMP_PLACES="{0,2,4,6},{1,3,5,7}" #2 sockets; even/odd proc-ids
|
||||
// OMP_AFFINITY_FORMAT=\
|
||||
// "nest_level= %L, parent_thrd_num= %a, thrd_num= %n, thrd_affinity= %A"
|
||||
|
||||
#pragma omp parallel num_threads(n_sockets) private(socket_num)
|
||||
{
|
||||
socket_num = omp_get_place_num();
|
||||
// OMP_NUM_THREADS=2,4
|
||||
// OMP_PLACES="{0,2,4,6},{1,3,5,7}" #2 sockets; even/odd proc-ids
|
||||
// OMP_AFFINITY_FORMAT=\
|
||||
// "nest_level= %L, parent_thrd_num= %a, thrd_num= %n, thrd_affinity= %A"
|
||||
|
||||
if(socket_num==0)
|
||||
printf(" LEVEL 1 AFFINITIES 1 thread/socket, %d sockets:\n\n", n_sockets);
|
||||
#pragma omp parallel num_threads(n_sockets) private(socket_num)
|
||||
{
|
||||
socket_num = omp_get_place_num();
|
||||
|
||||
omp_display_affinity(NULL); // not needed if OMP_DISPLAY_AFFINITY=TRUE
|
||||
if(socket_num==0)
|
||||
printf(" LEVEL 1 AFFINITIES 1 thread/socket, %d sockets:\n\n",
|
||||
n_sockets);
|
||||
|
||||
// OUTPUT:
|
||||
// LEVEL 1 AFFINITIES 1 thread/socket, 2 sockets:
|
||||
// nest_level= 1, parent_thrd_num= 0, thrd_num= 0, thrd_affinity= 0,2,4,6
|
||||
// nest_level= 1, parent_thrd_num= 0, thrd_num= 1, thrd_affinity= 1,3,5,7
|
||||
// not needed if OMP_DISPLAY_AFFINITY=TRUE
|
||||
omp_display_affinity(NULL);
|
||||
|
||||
socket_work(socket_num, n_thrds_on_socket);
|
||||
}
|
||||
|
||||
return 0;
|
||||
// OUTPUT:
|
||||
// LEVEL 1 AFFINITIES 1 thread/socket, 2 sockets:
|
||||
// nest_level= 1, parent_thrd_num= 0, thrd_num= 0, thrd_affinity= 0,2,4,6
|
||||
// nest_level= 1, parent_thrd_num= 0, thrd_num= 1, thrd_affinity= 1,3,5,7
|
||||
|
||||
socket_work(socket_num, n_thrds_on_socket);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
void socket_work(int socket_num, int n_thrds)
|
||||
{
|
||||
#pragma omp parallel num_threads(n_thrds)
|
||||
{
|
||||
if(omp_get_thread_num()==0)
|
||||
printf(" LEVEL 2 AFFINITIES, %d threads on socket %d\n",n_thrds, socket_num);
|
||||
|
||||
omp_display_affinity(NULL); // not needed if OMP_DISPLAY_AFFINITY=TRUE
|
||||
|
||||
// OUTPUT:
|
||||
// LEVEL 2 AFFINITIES, 4 threads on socket 0
|
||||
// nest_level= 2, parent_thrd_num= 0, thrd_num= 0, thrd_affinity= 0
|
||||
// nest_level= 2, parent_thrd_num= 0, thrd_num= 1, thrd_affinity= 2
|
||||
// nest_level= 2, parent_thrd_num= 0, thrd_num= 2, thrd_affinity= 4
|
||||
// nest_level= 2, parent_thrd_num= 0, thrd_num= 3, thrd_affinity= 6
|
||||
{
|
||||
#pragma omp parallel num_threads(n_thrds)
|
||||
{
|
||||
if(omp_get_thread_num()==0)
|
||||
printf(" LEVEL 2 AFFINITIES, %d threads on socket %d\n",
|
||||
n_thrds, socket_num);
|
||||
|
||||
// not needed if OMP_DISPLAY_AFFINITY=TRUE
|
||||
omp_display_affinity(NULL);
|
||||
|
||||
// OUTPUT:
|
||||
// LEVEL 2 AFFINITIES, 4 threads on socket 0
|
||||
// nest_level= 2, parent_thrd_num= 0, thrd_num= 0, thrd_affinity= 0
|
||||
// nest_level= 2, parent_thrd_num= 0, thrd_num= 1, thrd_affinity= 2
|
||||
// nest_level= 2, parent_thrd_num= 0, thrd_num= 2, thrd_affinity= 4
|
||||
// nest_level= 2, parent_thrd_num= 0, thrd_num= 3, thrd_affinity= 6
|
||||
|
||||
// LEVEL 2 AFFINITIES, 4 threads on socket 1
|
||||
// nest_level= 2, parent_thrd_num= 1, thrd_num= 0, thrd_affinity= 1
|
||||
// nest_level= 2, parent_thrd_num= 1, thrd_num= 1, thrd_affinity= 3
|
||||
// nest_level= 2, parent_thrd_num= 1, thrd_num= 2, thrd_affinity= 5
|
||||
// nest_level= 2, parent_thrd_num= 1, thrd_num= 3, thrd_affinity= 7
|
||||
|
||||
// LEVEL 2 AFFINITIES, 4 threads on socket 1
|
||||
// nest_level= 2, parent_thrd_num= 1, thrd_num= 0, thrd_affinity= 1
|
||||
// nest_level= 2, parent_thrd_num= 1, thrd_num= 1, thrd_affinity= 3
|
||||
// nest_level= 2, parent_thrd_num= 1, thrd_num= 2, thrd_affinity= 5
|
||||
// nest_level= 2, parent_thrd_num= 1, thrd_num= 3, thrd_affinity= 7
|
||||
|
||||
// ... Do Some work on Socket
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: affinity_display.2.f90
|
||||
! @@name: affinity_display.2
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
@ -20,22 +20,26 @@ program affinity_display
|
||||
! OMP_NUM_THREADS=2,4
|
||||
! OMP_PLACES="{0,2,4,6},{1,3,5,7}" #2 sockets; even/odd proc-ids
|
||||
! OMP_AFFINITY_FORMAT=\
|
||||
! "nest_level= %L, parent_thrd_num= %a, thrd_num= %n, thrd_affinity= %A"
|
||||
!"nest_level= %L, parent_thrd_num= %a, thrd_num= %n, thrd_affinity= %A"
|
||||
|
||||
!$omp parallel num_threads(n_sockets) private(socket_num)
|
||||
|
||||
socket_num = omp_get_place_num()
|
||||
|
||||
if(socket_num==0) then
|
||||
write(*,'("LEVEL 1 AFFINITIES 1 thread/socket ",i0," sockets")')n_sockets
|
||||
write(*,'("LEVEL 1 AFFINITIES 1 thread/socket ",i0," sockets")') &
|
||||
n_sockets
|
||||
endif
|
||||
|
||||
call omp_display_affinity(null) !not needed if OMP_DISPLAY_AFFINITY=TRUE
|
||||
call omp_display_affinity(null) ! not needed
|
||||
! if OMP_DISPLAY_AFFINITY=TRUE
|
||||
|
||||
! OUTPUT:
|
||||
! LEVEL 1 AFFINITIES 1 thread/socket, 2 sockets:
|
||||
! nest_level= 1, parent_thrd_num= 0, thrd_num= 0, thrd_affinity= 0,2,4,6
|
||||
! nest_level= 1, parent_thrd_num= 0, thrd_num= 1, thrd_affinity= 1,3,5,7
|
||||
! nest_level= 1, parent_thrd_num= 0, thrd_num= 0, &
|
||||
! thrd_affinity= 0,2,4,6
|
||||
! nest_level= 1, parent_thrd_num= 0, thrd_num= 1, &
|
||||
! thrd_affinity= 1,3,5,7
|
||||
|
||||
call socket_work(socket_num, n_thrds_on_socket)
|
||||
|
||||
@ -56,7 +60,8 @@ subroutine socket_work(socket_num, n_thrds)
|
||||
n_thrds,socket_num
|
||||
endif
|
||||
|
||||
call omp_display_affinity(null); !not needed if OMP_DISPLAY_AFFINITY=TRUE
|
||||
call omp_display_affinity(null) ! not needed
|
||||
! if OMP_DISPLAY_AFFINITY=TRUE
|
||||
|
||||
! OUTPUT:
|
||||
! LEVEL 2 AFFINITIES, 4 threads on socket 0
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: affinity_display.3.c
|
||||
* @@name: affinity_display.3
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
@ -25,9 +25,9 @@ int main(void){
|
||||
char **buffer;
|
||||
|
||||
|
||||
// CODE SEGMENT 1 AFFINITY FORMAT
|
||||
// CODE SEGMENT 1 AFFINITY FORMAT
|
||||
|
||||
// Get and Display Default Affinity Format
|
||||
// Get and Display Default Affinity Format
|
||||
|
||||
nchars = omp_get_affinity_format(default_format,(size_t)FORMAT_STORE);
|
||||
printf("Default Affinity Format is: %s\n",default_format);
|
||||
@ -37,44 +37,49 @@ int main(void){
|
||||
printf(" FORMAT_STORE to %d.\n", nchars+1);
|
||||
}
|
||||
|
||||
// Set Affinity Format
|
||||
// Set Affinity Format
|
||||
|
||||
omp_set_affinity_format(my_format);
|
||||
printf("Affinity Format set to: %s\n",my_format);
|
||||
|
||||
|
||||
// CODE SEGMENT 2 CAPTURE AFFINITY
|
||||
// CODE SEGMENT 2 CAPTURE AFFINITY
|
||||
|
||||
// Set up buffer for affinity of n threads
|
||||
// Set up buffer for affinity of n threads
|
||||
|
||||
n = omp_get_num_procs();
|
||||
buffer = (char **)malloc( sizeof(char *) * n );
|
||||
for(i=0;i<n;i++){ buffer[i]=(char *)malloc( sizeof(char) * BUFFER_STORE); }
|
||||
for(i=0;i<n;i++){
|
||||
buffer[i]=(char *)malloc( sizeof(char) * BUFFER_STORE);
|
||||
}
|
||||
|
||||
// Capture Affinity using Affinity Format set above.
|
||||
// Use max reduction to check size of buffer areas
|
||||
// Capture Affinity using Affinity Format set above.
|
||||
// Use max reduction to check size of buffer areas
|
||||
max_req_store = 0;
|
||||
#pragma omp parallel private(thrd_num,nchars) reduction(max:max_req_store)
|
||||
#pragma omp parallel private(thrd_num,nchars) \
|
||||
reduction(max:max_req_store)
|
||||
{
|
||||
if(omp_get_num_threads()>n) exit(1); //safety: don't exceed # of buffers
|
||||
//safety: don't exceed # of buffers
|
||||
if(omp_get_num_threads()>n) exit(1);
|
||||
|
||||
thrd_num=omp_get_thread_num();
|
||||
nchars=omp_capture_affinity(buffer[thrd_num],(size_t)BUFFER_STORE,NULL);
|
||||
nchars=omp_capture_affinity(buffer[thrd_num],
|
||||
(size_t)BUFFER_STORE,NULL);
|
||||
if(nchars > max_req_store) max_req_store=nchars;
|
||||
|
||||
// ...
|
||||
}
|
||||
|
||||
for(i=0;i<n;i++){
|
||||
printf("thrd_num= %d, affinity: %s\n", i,buffer[i]);
|
||||
for(i=0;i<n;i++){
|
||||
printf("thrd_num= %d, affinity: %s\n", i,buffer[i]);
|
||||
}
|
||||
// For 4 threads with OMP_PLACES='{0,1},{2,3},{4,5},{6,7}'
|
||||
// Format host=%20H thrd_num=%0.4n binds_to=%A
|
||||
// For 4 threads with OMP_PLACES='{0,1},{2,3},{4,5},{6,7}'
|
||||
// Format host=%20H thrd_num=%0.4n binds_to=%A
|
||||
|
||||
// affinity: host=hpc.cn567 thrd_num=0000 binds_to=0,1
|
||||
// affinity: host=hpc.cn567 thrd_num=0001 binds_to=2,3
|
||||
// affinity: host=hpc.cn567 thrd_num=0002 binds_to=4,5
|
||||
// affinity: host=hpc.cn567 thrd_num=0003 binds_to=6,7
|
||||
// affinity: host=hpc.cn567 thrd_num=0000 binds_to=0,1
|
||||
// affinity: host=hpc.cn567 thrd_num=0001 binds_to=2,3
|
||||
// affinity: host=hpc.cn567 thrd_num=0002 binds_to=4,5
|
||||
// affinity: host=hpc.cn567 thrd_num=0003 binds_to=6,7
|
||||
|
||||
|
||||
if(max_req_store>=BUFFER_STORE){
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: affinity_display.3.f90
|
||||
! @@name: affinity_display.3
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: affinity_query.1c
|
||||
* @@name: affinity_query.1
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: affinity_query.1f
|
||||
! @@name: affinity_query.1
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,9 @@
|
||||
\section{Task Affinity}
|
||||
\label{sec: task_affinity}
|
||||
\index{affinity!task affinity}
|
||||
\index{affinity!affinity clause@\code{affinity} clause}
|
||||
\index{clauses!affinity@\code{affinity}}
|
||||
\index{affinity clause@\code{affinity} clause}
|
||||
|
||||
The next example illustrates the use of the \code{affinity}
|
||||
clause with a \code{task} construct.
|
||||
|
@ -2,6 +2,7 @@
|
||||
\section{Fortran \code{ASSOCIATE} Construct}
|
||||
\fortranspecificstart
|
||||
\label{sec:associate}
|
||||
\index{ASSOCIATE construct, Fortran@\code{ASSOCIATE} construct, Fortran}
|
||||
|
||||
The following is an invalid example of specifying an associate name on a data-sharing attribute
|
||||
clause. The constraint in the Data Sharing Attribute Rules section in the OpenMP
|
||||
@ -29,5 +30,40 @@ region, \plc{v} has the value of -1 and \plc{u} has the value of the original \p
|
||||
|
||||
\pagebreak
|
||||
\ffreenexample[4.0]{associate}{3}
|
||||
|
||||
% blue line floater at top of this page for "Fortran, cont."
|
||||
\begin{figure}[t!]
|
||||
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
|
||||
\end{figure}
|
||||
\label{sec:associate_target}
|
||||
|
||||
\bigskip
|
||||
The following example illustrates mapping behavior for a Fortran
|
||||
associate name and its selector for a \scode{target} construct.
|
||||
|
||||
For the first 3 \scode{target} constructs the associate name \splc{a_aray} is
|
||||
associated with the selector \splc{aray}, an array.
|
||||
For the \scode{target} construct of code block TARGET 1 just the selector
|
||||
\splc{aray} is used and is implicitly mapped,
|
||||
likewise for the associate name \splc{a_aray} in the TARGET 2 block.
|
||||
However, mapping an associate name and its selector is not valid for the same
|
||||
\scode{target} construct. Hence the TARGET 3 block is non-conforming.
|
||||
|
||||
|
||||
In TARGET 4, the \splc{scalr} selector used in the \scode{target} region
|
||||
has an implicit data-sharing attribute of firstprivate since it is a scalar.
|
||||
Hence, the assigned value is not returned.
|
||||
In TARGET 5, the associate name \splc{a_scalr} is implicitly mapped and the
|
||||
assigned value is returned to the host (default \scode{tofrom} mapping behavior).
|
||||
In TARGET 6, the use of the associate name and its selector in the \scode{target}
|
||||
region is conforming because the scalar firstprivate behavior of the selector
|
||||
and the implicit mapping of the associate name are allowed.
|
||||
At the end of the \scode{target} region only the
|
||||
associate name's value is returned to the host.
|
||||
In TARGET 7, the selector and associate name appear in
|
||||
an explicit mapping for the same \scode{target} construct,
|
||||
hence the code block is non-conforming.
|
||||
|
||||
\ffreenexample[5.1]{associate}{4}
|
||||
\fortranspecificend
|
||||
|
||||
|
@ -2,6 +2,8 @@
|
||||
\section{C/C++ Arrays in a \code{firstprivate} Clause}
|
||||
\ccppspecificstart
|
||||
\label{sec:carrays_fpriv}
|
||||
\index{clauses!firstprivate@\code{firstprivate}}
|
||||
\index{firstprivate clause@\code{firstprivate} clause!C/C++ arrays in}
|
||||
|
||||
The following example illustrates the size and value of list items of array or
|
||||
pointer type in a \code{firstprivate} clause. The size of new list items is
|
||||
|
@ -1,6 +1,10 @@
|
||||
\pagebreak
|
||||
\section{\code{copyin} Clause}
|
||||
\label{sec:copyin}
|
||||
\index{clauses!copyin@\code{copyin}}
|
||||
\index{copyin clause@\code{copyin} clause}
|
||||
\index{directives!threadprivate@\code{threadprivate}}
|
||||
\index{threadprivate directive@\code{threadprivate} directive}
|
||||
|
||||
The \code{copyin} clause is used to initialize threadprivate data upon entry
|
||||
to a \code{parallel} region. The value of the threadprivate variable in the primary
|
||||
|
@ -1,6 +1,8 @@
|
||||
\pagebreak
|
||||
\section{\code{copyprivate} Clause}
|
||||
\label{sec:copyprivate}
|
||||
\index{clauses!copyprivate@\code{copyprivate}}
|
||||
\index{copyprivate clause@\code{copyprivate} clause}
|
||||
|
||||
The \code{copyprivate} clause can be used to broadcast values acquired by a single
|
||||
thread directly to all instances of the private variables in the other threads.
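
As a minimal C sketch of this broadcast (it is not one of the numbered examples, and the function name \code{broadcast\_check} is hypothetical), one thread assigns a private variable inside a \code{single} construct and \code{copyprivate} copies that value into the private copies of the other threads. The function returns 1 when every thread of the team observed the broadcast value.

```c
/* Hypothetical helper: returns 1 if every thread in the team ends up
   with the value assigned by the single thread, 0 otherwise. */
int broadcast_check(int value)
{
    int seen = 0;      /* threads whose private x equals value */
    int nthreads = 0;  /* threads that executed the region     */

    #pragma omp parallel reduction(+:seen,nthreads)
    {
        int x = -1;    /* declared in the region, hence private */

        /* One thread assigns x; copyprivate copies it into the private
           x of every other thread before any thread leaves the
           barrier at the end of the single construct. */
        #pragma omp single copyprivate(x)
        x = value;

        seen     += (x == value);
        nthreads += 1;
    }
    return seen == nthreads;
}
```

The sketch uses only directives, so it also compiles and returns 1 when OpenMP is disabled. Note that a \code{value} of -1 would make the check trivially true.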
|
||||
@ -9,6 +11,8 @@ is not affected by the presence of the directives. If it is called from a \code{
|
||||
region, then the actual arguments with which \code{a} and \code{b} are associated
|
||||
must be private.
|
||||
|
||||
\index{constructs!single@\code{single}}
|
||||
\index{single construct@\code{single} construct}
|
||||
The thread that executes the structured block associated with the \code{single}
|
||||
construct broadcasts the values of the private variables \code{a}, \code{b},
|
||||
\code{x}, and
|
||||
@ -20,6 +24,8 @@ any of the threads have left the barrier at the end of the construct.
|
||||
|
||||
\fexample{copyprivate}{1}
|
||||
|
||||
\index{constructs!masked@\code{masked}}
|
||||
\index{masked construct@\code{masked} construct}
|
||||
In this example, assume that the input must be performed by the primary thread.
|
||||
Since the \code{masked} construct does not support the \code{copyprivate} clause,
|
||||
it cannot broadcast the input value that is read. However, \code{copyprivate}
|
||||
@ -27,7 +33,7 @@ is used to broadcast an address where the input value is stored.
|
||||
|
||||
\cexample[5.1]{copyprivate}{2}
|
||||
|
||||
\fexample[5.1]{copyprivate}{2}[1]
|
||||
\fexample[5.1]{copyprivate}{2}
|
||||
|
||||
Suppose that the number of lock variables required within a \code{parallel} region
|
||||
cannot easily be determined prior to entering it. The \code{copyprivate} clause
|
||||
|
@ -1,6 +1,8 @@
|
||||
\section{C++ Reference in Data-Sharing Clauses}
|
||||
\cppspecificstart
|
||||
\label{sec:cpp_reference}
|
||||
\index{clauses!data-sharing, C++ reference in}
|
||||
\index{data-sharing clauses, C++ reference in}
|
||||
|
||||
C++ reference types are allowed in data-sharing attribute clauses as of OpenMP 4.5, except
|
||||
for the \code{threadprivate}, \code{copyin} and \code{copyprivate} clauses.
|
||||
|
@ -1,6 +1,8 @@
|
||||
\pagebreak
|
||||
\section{\code{default(none)} Clause}
|
||||
\label{sec:default_none}
|
||||
\index{clauses!default(none)@\code{default(none)}}
|
||||
\index{default(none) clause@\code{default(none)} clause}
|
||||
|
||||
The following example distinguishes the variables that are affected by the \code{default(none)}
|
||||
clause from those that are not.
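
A small C sketch of the same distinction, separate from the numbered example (the function name \code{sum\_scaled} is hypothetical): variables referenced in the construct that have no predetermined attribute must appear in a clause when \code{default(none)} is present, while the loop iteration variable and variables declared inside the construct need not be listed.

```c
/* Hypothetical helper: returns scale * (0 + 1 + ... + (n-1)). */
int sum_scaled(int n, int scale)
{
    int sum = 0;

    /* Affected by default(none): n, scale, and sum are referenced in
       the construct, so each must be listed in some clause.
       Not affected: i (loop iteration variable, predetermined private)
       and term (declared inside the construct, private). */
    #pragma omp parallel for default(none) shared(n, scale) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        int term = scale * i;
        sum += term;
    }
    return sum;
}
```

Removing \code{scale} from the \code{shared} clause would make the program non-conforming, and compilers typically report this as an error.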
|
||||
|
@ -2,6 +2,7 @@
|
||||
\section{Fortran Private Loop Iteration Variables}
|
||||
\label{sec:fort_loopvar}
|
||||
\fortranspecificstart
|
||||
\index{loop variables, Fortran}
|
||||
|
||||
In general, loop iteration variables will be private when used in the \plc{do-loop}
|
||||
of a \code{do} and \code{parallel do} construct or in sequential loops in a
|
||||
|
@ -2,6 +2,8 @@
|
||||
\section{Fortran Restrictions on Storage Association with the \code{private} Clause}
|
||||
\fortranspecificstart
|
||||
\label{sec:fort_sa_private}
|
||||
\index{clauses!private@\code{private}}
|
||||
\index{private clause@\code{private} clause!storage association, Fortran}
|
||||
|
||||
The following non-conforming examples illustrate the implications of the \code{private}
|
||||
clause rules with regard to storage association.
|
||||
|
@ -2,6 +2,10 @@
|
||||
\section{Fortran Restrictions on \code{shared} and \code{private} Clauses with Common Blocks}
|
||||
\fortranspecificstart
|
||||
\label{sec:fort_sp_common}
|
||||
\index{clauses!private@\code{private}}
|
||||
\index{clauses!shared@\code{shared}}
|
||||
\index{private clause@\code{private} clause!common blocks, Fortran}
|
||||
\index{shared clause@\code{shared} clause!common blocks, Fortran}
|
||||
|
||||
When a named common block is specified in a \code{private}, \code{firstprivate},
|
||||
or \code{lastprivate} clause of a construct, none of its members may be declared
|
||||
|
@ -1,6 +1,8 @@
|
||||
\pagebreak
|
||||
\section{\code{lastprivate} Clause}
|
||||
\label{sec:lastprivate}
|
||||
\index{clauses!lastprivate@\code{lastprivate}}
|
||||
\index{lastprivate clause@\code{lastprivate} clause}
|
||||
|
||||
Correct execution sometimes depends on the value that the last iteration of a loop
|
||||
assigns to a variable. Such programs must list all such variables in a \code{lastprivate}
|
||||
@ -12,6 +14,8 @@ sequentially.
|
||||
\fexample{lastprivate}{1}
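
The idea can also be condensed into a few lines of C. This is a sketch under assumptions, not a replacement for the example above: the function name \code{last\_square} is hypothetical, and the caller is assumed to pass \code{n} of at least 1 so that a last iteration exists.

```c
/* Hypothetical helper: returns the value assigned to `last` by the
   sequentially last iteration, i.e. (n-1)*(n-1). Assumes n >= 1. */
int last_square(int n)
{
    int last = -1;

    /* Each thread works on a private copy of `last`; on exit, the copy
       from the sequentially last iteration (i == n-1) is written back
       to the original variable. */
    #pragma omp parallel for lastprivate(last)
    for (int i = 0; i < n; i++)
        last = i * i;

    return last;
}
```

Without the \code{lastprivate} clause, \code{last} would be shared and the result would depend on which thread wrote it last.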
|
||||
|
||||
\clearpage
|
||||
\index{lastprivate clause@\code{lastprivate} clause!conditional modifier@\code{conditional} modifier}
|
||||
\index{conditional modifier@\code{conditional} modifier}
|
||||
The next example illustrates the use of the \code{conditional} modifier in
|
||||
a \code{lastprivate} clause to return the last value when it may not come from
|
||||
the last iteration of a loop.
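
A compact C sketch of the \code{conditional} modifier follows. It assumes a compiler that supports OpenMP 5.0; the function name \code{last\_multiple} is hypothetical, and \code{k} is assumed positive with \code{n} at least 1, so that iteration 0 always performs an assignment.

```c
/* Hypothetical helper: returns the largest i in [0, n) that is a
   multiple of k. Assumes n >= 1 and k > 0. */
int last_multiple(int n, int k)
{
    int last = -1;

    /* The assignment is not executed in every iteration. With the
       conditional modifier, the original `last` receives the value
       from the sequentially last iteration that actually assigned it,
       which need not be iteration n-1. */
    #pragma omp parallel for lastprivate(conditional: last)
    for (int i = 0; i < n; i++) {
        if (i % k == 0)
            last = i;
    }
    return last;
}
```

For \code{n} of 10 and \code{k} of 4 the assigning iterations are 0, 4, and 8, so the sequential result is 8.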
|
||||
|
@ -1,6 +1,8 @@
|
||||
\pagebreak
|
||||
\section{\code{private} Clause}
|
||||
\label{sec:private}
|
||||
\index{clauses!private@\code{private}}
|
||||
\index{private clause@\code{private} clause}
|
||||
|
||||
In the following example, the values of original list items \plc{i} and \plc{j}
|
||||
are retained on exit from the \code{parallel} region, while the private list
|
||||
|
@ -7,6 +7,9 @@ This section covers ways to perform reductions in parallel, task, taskloop, and
|
||||
|
||||
\subsection{\code{reduction} Clause}
|
||||
\label{subsec:reduction}
|
||||
\index{clauses!reduction@\code{reduction}}
|
||||
\index{reduction clause@\code{reduction} clause}
|
||||
\index{reductions!reduction clause@\code{reduction} clause}
|
||||
|
||||
The following example demonstrates the \code{reduction} clause; note that some
|
||||
reductions can be expressed in the loop in several ways, as shown for the \code{max}
|
||||
@ -64,7 +67,7 @@ the start of the \code{parallel} region.
|
||||
|
||||
\cexample[5.1]{reduction}{6}
|
||||
|
||||
\fexample[5.1]{reduction}{6}[1]
|
||||
\fexample[5.1]{reduction}{6}
|
||||
|
||||
The following example demonstrates the reduction of array \plc{a}. In C/C++ this is illustrated by the explicit use of an array section \plc{a[0:N]} in the \code{reduction} clause. The corresponding Fortran example uses array syntax supported in the base language. In the OpenMP 4.5 specification, the explicit use of an array section in the \code{reduction} clause was not permitted in Fortran; this oversight was fixed in the OpenMP 5.0 specification.
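
A minimal C sketch of an array-section reduction of this kind is given here for orientation. The function \code{column\_sums}, the matrix \code{m} and the size \code{N} are illustrative assumptions and do not reproduce the referenced example.

```c
#include <assert.h>

#define N 8   /* illustrative size, an assumption of this sketch */

/* Hypothetical helper: a[j] becomes the sum of column j of m. */
void column_sums(int rows, int m[][N], int a[N])
{
    assert(rows >= 0);
    for (int j = 0; j < N; j++)
        a[j] = 0;

    /* Each thread accumulates into a private copy of a[0:N]; the private
       copies are combined element-wise into a at the end of the loop. */
    #pragma omp parallel for reduction(+: a[0:N])
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++)
            a[j] += m[i][j];
}
```

The array section is needed here because \code{a} is a pointer inside the function, so its extent must be stated explicitly in the clause.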
|
||||
|
||||
@ -75,6 +78,12 @@ The following example demonstrates the reduction of array \plc{a}. In C/C++ thi
|
||||
|
||||
\subsection{Task Reduction}
|
||||
\label{subsec:task_reduction}
|
||||
\index{clauses!task_reduction@\scode{task_reduction}}
|
||||
\index{task_reduction clause@\scode{task_reduction} clause}
|
||||
\index{reductions!task_reduction clause@\scode{task_reduction} clause}
|
||||
\index{clauses!in_reduction@\scode{in_reduction}}
|
||||
\index{in_reduction clause@\scode{in_reduction} clause}
|
||||
\index{reductions!in_reduction clause@\scode{in_reduction} clause}
|
||||
|
||||
In OpenMP 5.0 the \code{task\_reduction} clause was created for the \code{taskgroup} construct,
|
||||
to allow reductions among explicit tasks that have an \code{in\_reduction} clause.
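
The pairing of the two clauses can be sketched in C as follows. This is only an outline under assumed names (\code{array\_sum\_tasks}, \code{a}, \code{n}), not the code of the example referenced in this subsection.

```c
#include <assert.h>

/* Hypothetical helper: sums a[0..n-1] using one explicit task per element. */
int array_sum_tasks(const int *a, int n)
{
    assert(n >= 0);
    int sum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* The taskgroup defines the scope of the reduction on sum. */
        #pragma omp taskgroup task_reduction(+: sum)
        {
            for (int i = 0; i < n; i++) {
                /* Each explicit task participates in that reduction. */
                #pragma omp task in_reduction(+: sum) firstprivate(i)
                sum += a[i];
            }
        }
        /* On exit from the taskgroup, sum holds the combined result. */
    }
    return sum;
}
```

One task per element is far too fine-grained for real use; it is chosen here only to keep the sketch short.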
|
||||
@ -97,6 +106,8 @@ reduction).
|
||||
|
||||
\ffreeexample[5.0]{task_reduction}{1}
|
||||
|
||||
\index{reduction clause@\code{reduction} clause!task modifier@\code{task} modifier}
|
||||
\index{task modifier@\code{task} modifier}
|
||||
In OpenMP 5.0 the \code{task} \plc{reduction-modifier} for the \code{reduction} clause was
|
||||
introduced to provide a means of performing reductions among implicit and explicit tasks.
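
A small C sketch of this form of reduction follows. The function name \code{sum\_implicit\_and\_explicit} is hypothetical, and the code is an assumption-laden outline rather than the example from the Examples sources.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 + (0 + 1 + ... + (n-1)). */
int sum_implicit_and_explicit(int n)
{
    assert(n >= 0);
    int x = 0;

    /* The task modifier lets explicit tasks join the reduction that is
       defined on the parallel construct. */
    #pragma omp parallel reduction(task, +: x)
    {
        #pragma omp single
        {
            x += 1;   /* contribution of the implicit task running single */

            for (int i = 0; i < n; i++) {
                /* contribution of an explicit task */
                #pragma omp task in_reduction(+: x)
                x += i;
            }
        }
    }
    return x;
}
```

The reduction operator and list item in the \code{in\_reduction} clause must match those of the enclosing \code{reduction} clause.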
|
||||
|
||||
@ -134,6 +145,9 @@ and list item (variable \code{x}) match as required.
|
||||
|
||||
\subsection{Reduction on Combined Target Constructs}
|
||||
\label{subsec:target_reduction}
|
||||
\index{reduction clause@\code{reduction} clause!on target construct@on \code{target} construct}
|
||||
\index{constructs!target@\code{target}}
|
||||
\index{target construct@\code{target} construct}
|
||||
|
||||
When a \code{reduction} clause appears on a combined construct that combines
|
||||
a \code{target} construct with another construct, there is an implicit map
|
||||
@ -174,6 +188,12 @@ first construct.
|
||||
|
||||
\subsection{Task Reduction with Target Constructs}
|
||||
\label{subsec:target_task_reduction}
|
||||
\index{in_reduction clause@\scode{in_reduction} clause}
|
||||
\index{constructs!target@\code{target}}
|
||||
\index{target construct@\code{target} construct}
|
||||
|
||||
\index{clauses!enter@\code{enter}}
|
||||
\index{enter clause@\code{enter} clause}
|
||||
|
||||
The following examples illustrate how task reductions can apply to target tasks
|
||||
that result from a \code{target} construct with the \code{in\_reduction}
|
||||
@ -184,34 +204,43 @@ task reduction will be combined (in some order) into the original variable
|
||||
listed in the \code{task\_reduction} clause before exiting the \code{taskgroup}
|
||||
region.
|
||||
|
||||
\cexample[5.1]{target_task_reduction}{1}
|
||||
\cexample[5.2]{target_task_reduction}{1}
|
||||
|
||||
\ffreeexample[5.1]{target_task_reduction}{1}[1]
|
||||
\ffreeexample[5.2]{target_task_reduction}{1}
|
||||
\clearpage
|
||||
|
||||
\index{reduction clause@\code{reduction} clause!task modifier@\code{task} modifier}
|
||||
\index{task modifier@\code{task} modifier}
|
||||
In the next pair of examples, the task reduction is defined by a
|
||||
\code{reduction} clause with the \code{task} modifier, rather than a
|
||||
\code{task\_reduction} clause on a \code{taskgroup} construct. Again, the
|
||||
partial results from the participating tasks will be combined in some order
|
||||
into the original reduction variable, \code{sum}.
|
||||
|
||||
\cexample[5.0]{target_task_reduction}{2a}
|
||||
\cexample[5.2]{target_task_reduction}{2a}
|
||||
|
||||
\ffreeexample[5.0]{target_task_reduction}{2a}
|
||||
\ffreeexample[5.2]{target_task_reduction}{2a}
|
||||
|
||||
\index{in_reduction clause@\scode{in_reduction} clause!with target construct@with \code{target} construct}
|
||||
\index{constructs!target@\code{target}}
|
||||
\index{target construct@\code{target} construct}
|
||||
Next, the \code{task} modifier is again used to define a task reduction over
|
||||
participating tasks. This time, the participating tasks are a target task
|
||||
resulting from a \code{target} construct with the \code{in\_reduction} clause,
|
||||
and the implicit task (executing on the primary thread) that calls
|
||||
\code{host\_compute}. As before, the partial results from these paricipating
|
||||
\code{host\_compute}. As before, the partial results from these participating
|
||||
tasks are combined in some order into the original reduction variable.
|
||||
|
||||
\cexample[5.1]{target_task_reduction}{2b}
|
||||
\cexample[5.2]{target_task_reduction}{2b}
|
||||
|
||||
\ffreeexample[5.1]{target_task_reduction}{2b}[1]
|
||||
\ffreeexample[5.2]{target_task_reduction}{2b}
|
||||
|
||||
|
||||
\subsection{Taskloop Reduction}
|
||||
\label{subsec:taskloop_reduction}
|
||||
\index{reduction clause@\code{reduction} clause!on taskloop construct@on \code{taskloop} construct}
|
||||
\index{constructs!taskloop@\code{taskloop}}
|
||||
\index{taskloop construct@\code{taskloop} construct}
|
||||
|
||||
In the OpenMP 5.0 Specification the \code{taskloop} construct
|
||||
was extended to support reductions.
|
||||
@ -249,7 +278,7 @@ reduction that has not been defined.
|
||||
%create a new reduction and also that all tasks generated by the taskloop will
|
||||
%participate on it.
|
||||
|
||||
The second example computes exactly the same value as in the preceding\plc{taskloop\_reduction.1} code section,
|
||||
The second example computes exactly the same value as in the preceding \plc{taskloop\_reduction.1} code section,
|
||||
but in a very different way.
|
||||
First, in the \plc{array\_sum} function a \code{taskgroup} region is created
|
||||
that defines the scope of a new reduction using the \code{task\_reduction} clause.
|
||||
@ -261,7 +290,7 @@ This is allowed because what is expressed with the \code{in\_reduction} clause
|
||||
is different from what is expressed with the \code{reduction} clause.
|
||||
In one case the generated tasks are specified to participate in a previously
|
||||
declared reduction (\code{in\_reduction} clause) whereas in the other case
|
||||
creation of a new reduction is specified and also that all tasks generated
|
||||
creation of a new reduction is specified and also all tasks generated
|
||||
by the taskloop will participate in it.
|
||||
|
||||
\cexample[5.0]{taskloop_reduction}{2}
|
||||
@ -271,6 +300,9 @@ by the taskloop will participate on it.
|
||||
In the OpenMP 5.0 Specification, \code{reduction} clauses for the
|
||||
\code{taskloop}~\code{simd} construct were also added.
|
||||
|
||||
\index{reduction clause@\code{reduction} clause!on taskloop simd construct@on \code{taskloop}~\code{simd} construct}
|
||||
\index{combined constructs!taskloop simd@\code{taskloop}~\code{simd}}
|
||||
\index{taskloop simd construct@\code{taskloop}~\code{simd} construct}
|
||||
The examples below compare reductions for the \code{taskloop} and the \code{taskloop}~\code{simd} constructs.
|
||||
These examples illustrate the use of \code{reduction} clauses within
|
||||
``stand-alone'' \code{taskloop} constructs, and the use of \code{in\_reduction} clauses for tasks of taskloops to participate
|
||||
@ -341,11 +373,14 @@ At the end of the parallel region \plc{asum} contains the combined result of all
|
||||
|
||||
\cexample[5.1]{taskloop_simd_reduction}{1}
|
||||
|
||||
\ffreeexample[5.1]{taskloop_simd_reduction}{1}[1]
|
||||
\ffreeexample[5.1]{taskloop_simd_reduction}{1}
|
||||
|
||||
|
||||
\subsection{Reduction with the \code{scope} Construct}
|
||||
\label{subsec:reduction_scope}
|
||||
\index{reduction clause@\code{reduction} clause!on scope construct@on \code{scope} construct}
|
||||
\index{constructs!scope@\code{scope}}
|
||||
\index{scope construct@\code{scope} construct}
|
||||
|
||||
The following example illustrates the use of the \code{scope} construct
|
||||
to perform a reduction in a \code{parallel} region. The case is useful for
|
||||
|
@ -1,6 +1,10 @@
|
||||
\pagebreak
|
||||
\section{\code{scan} Directive}
|
||||
\label{sec:scan}
|
||||
\index{directives!scan@\code{scan}}
|
||||
\index{scan directive@\code{scan} directive}
|
||||
\index{reduction clause@\code{reduction} clause!inscan modifier@\code{inscan} modifier}
|
||||
\index{inscan modifier@\code{inscan} modifier}
|
||||
|
||||
The following examples illustrate how to parallelize a loop that saves
|
||||
the \emph{prefix sum} of a reduction. This is accomplished by using
|
||||
@ -9,6 +13,12 @@ variable of the scan, and specifying with a \code{scan} directive whether
|
||||
the storage statement includes or excludes the scan input of the present
|
||||
iteration (\texttt{k}).
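
For orientation, an inclusive prefix sum along these lines might look as follows in C. The function \code{prefix\_sum\_inclusive} and its parameters are hypothetical names chosen for this sketch, and a compiler supporting OpenMP 5.0 or later is assumed.

```c
#include <assert.h>

/* Hypothetical helper: b[k] = a[0] + ... + a[k] for k = 0..n-1. */
void prefix_sum_inclusive(const int *a, int *b, int n)
{
    assert(n >= 0);
    int x = 0;

    /* inscan marks x as the reduction variable of the scan. */
    #pragma omp parallel for reduction(inscan, +: x)
    for (int k = 0; k < n; k++) {
        x += a[k];                      /* input phase: update x        */
        #pragma omp scan inclusive(x)   /* separates the two phases     */
        b[k] = x;                       /* scan phase: x includes a[k]  */
    }
}
```

For an exclusive scan, the clause would be \code{exclusive(x)} and the statement that stores \code{x} would precede the \code{scan} directive, with the update of \code{x} following it.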
|
||||
|
||||
\index{scan directive@\code{scan} directive!inclusive clause@\code{inclusive} clause}
|
||||
\index{scan directive@\code{scan} directive!exclusive clause@\code{exclusive} clause}
|
||||
\index{clauses!inclusive@\code{inclusive}}
|
||||
\index{inclusive clause@\code{inclusive} clause}
|
||||
\index{clauses!exclusive@\code{exclusive}}
|
||||
\index{exclusive clause@\code{exclusive} clause}
|
||||
Basically, the \code{inscan} modifier connects a loop and/or SIMD reduction to
|
||||
the scan operation, and a \code{scan} construct with an \code{inclusive} or
|
||||
\code{exclusive} clause specifies whether the ``scan phase'' (lexical block
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: associate.1f
|
||||
! @@name: associate.1
|
||||
! @@type: F-fixed
|
||||
! @@compilable: no
|
||||
! @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: associate.2f
|
||||
! @@name: associate.2
|
||||
! @@type: F-fixed
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: associate.3f
|
||||
! @@name: associate.3
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
|
58
data_environment/sources/associate.4.f90
Normal file
@ -0,0 +1,58 @@
|
||||
! @@name: associate.4
! @@type: F-free
! @@compilable: yes
! @@linkable: yes
! @@expect: success
! @@version: omp_5.1
program main
  integer :: scalr, aray(3)
  scalr = -1 ; aray = -1

  associate(a_scalr=>scalr, a_aray=>aray)

  !$omp target            !! TARGET 1
    aray = [1,2,3]
  !$omp end target
  print *, a_aray, aray   !! 1 2 3   1 2 3

  !$omp target            !! TARGET 2
    a_aray = [4,5,6]
  !$omp end target
  print *, a_aray, aray   !! 4 5 6   4 5 6

  !!!$omp target          !! TARGET 3
  !!                      !! mapping, in this case implicit,
  !!                      !! of aray AND a_aray NOT ALLOWED
  !!  aray = [4,5,6]
  !!  a_aray = [1,2,3]
  !!!$omp end target


  !$omp target            !! TARGET 4
    scalr = 1             !! scalr is firstprivate
  !$omp end target
  print *, a_scalr, scalr !! -1 -1

  !$omp target            !! TARGET 5
    a_scalr = 2           !! a_scalr implicitly mapped
  !$omp end target
  print *, a_scalr, scalr !! 2 2

  !$omp target            !! TARGET 6
    scalr = 3             !! scalr is firstprivate
    print *, a_scalr, scalr !! 2 3
    a_scalr = 4           !! a_scalr implicitly mapped
    print *, a_scalr, scalr !! 4 3
  !$omp end target
  print *, a_scalr, scalr !! 4 4

  !!!$omp target map(a_scalr,scalr) !! TARGET 7
  !! mapping, in this case explicit,
  !! of scalr AND a_scalr NOT ALLOWED
  !!  scalr = 5
  !!  a_scalr = 5
  !!!$omp end target

  end associate

end program
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: carrays_fpriv.1c
|
||||
* @@name: carrays_fpriv.1
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: yes
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: copyin.1c
|
||||
* @@name: copyin.1
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: copyin.1f
|
||||
! @@name: copyin.1
|
||||
! @@type: F-fixed
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: copyprivate.1c
|
||||
* @@name: copyprivate.1
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: copyprivate.1f
|
||||
! @@name: copyprivate.1
|
||||
! @@type: F-fixed
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,15 +1,11 @@
|
||||
/*
|
||||
* @@name: copyprivate.2c
|
||||
* @@name: copyprivate.2
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
* @@expect: success
|
||||
* @@version: omp_5.1
|
||||
*/
|
||||
#if _OPENMP < 202011
|
||||
#define masked master
|
||||
#endif
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
|
||||
|
@ -1,14 +1,9 @@
|
||||
! @@name: copyprivate.2f
|
||||
! @@name: copyprivate.2
|
||||
! @@type: F-fixed
|
||||
! @@compilable: yes
|
||||
! @@requires: preprocessing
|
||||
! @@linkable: no
|
||||
! @@expect: success
|
||||
! @@version: omp_5.1
|
||||
#if _OPENMP < 202011
|
||||
#define MASKED MASTER
|
||||
#endif
|
||||
|
||||
REAL FUNCTION READ_NEXT()
|
||||
REAL, POINTER :: TMP
|
||||
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: copyprivate.3c
|
||||
* @@name: copyprivate.3
|
||||
* @@type: C
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: copyprivate.3f
|
||||
! @@name: copyprivate.3
|
||||
! @@type: F-fixed
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: copyprivate.4f
|
||||
! @@name: copyprivate.4
|
||||
! @@type: F-fixed
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: cpp_reference.1c
|
||||
* @@name: cpp_reference.1
|
||||
* @@type: C++
|
||||
* @@compilable: yes
|
||||
* @@linkable: no
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* @@name: default_none.1c
|
||||
* @@name: default_none.1
|
||||
* @@type: C
|
||||
* @@compilable: no
|
||||
* @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: default_none.1f
|
||||
! @@name: default_none.1
|
||||
! @@type: F-fixed
|
||||
! @@compilable: no
|
||||
! @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: fort_loopvar.1f
|
||||
! @@name: fort_loopvar.1
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: fort_loopvar.2f
|
||||
! @@name: fort_loopvar.2
|
||||
! @@type: F-free
|
||||
! @@compilable: yes
|
||||
! @@linkable: no
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: fort_sa_private.1f
|
||||
! @@name: fort_sa_private.1
|
||||
! @@type: F-fixed
|
||||
! @@compilable: yes
|
||||
! @@linkable: yes
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: fort_sa_private.2f
|
||||
! @@name: fort_sa_private.2
|
||||
! @@type: F-fixed
|
||||
! @@compilable: maybe
|
||||
! @@linkable: maybe
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: fort_sa_private.3f
|
||||
! @@name: fort_sa_private.3
|
||||
! @@type: F-fixed
|
||||
! @@compilable: maybe
|
||||
! @@linkable: maybe
|
||||
|
@ -1,4 +1,4 @@
|
||||
! @@name: fort_sa_private.4f
|
||||
! @@name: fort_sa_private.4
|
||||
! @@type: F-fixed
|
||||
! @@compilable: maybe
|
||||
! @@linkable: maybe
|
||||
|
Some files were not shown because too many files have changed in this diff.