v6.0 release

This commit is contained in:
Henry Jin 2024-11-13 11:07:08 -08:00
parent 11f2efcccf
commit 3346a30ce2
138 changed files with 3009 additions and 339 deletions

View File

@ -86,6 +86,7 @@ in the \docref{\kcode{map} Clause} subsection of the OpenMP Specifications docum
\input{data_environment/lastprivate}
\input{data_environment/reduction}
\input{data_environment/udr}
\input{data_environment/induction}
\input{data_environment/scan}
\input{data_environment/copyin}
\input{data_environment/copyprivate}

View File

@ -73,5 +73,7 @@ clause introduced in OpenMP 4.5.
\input{devices/async_target_with_tasks}
\input{devices/async_target_nowait}
\input{devices/async_target_nowait_depend}
\input{devices/async_target_nowait_arg}
\input{devices/device}
\input{devices/device_env_traits}

View File

@ -3,8 +3,8 @@
\index{directive syntax}
OpenMP \plc{directives} use base-language mechanisms to specify OpenMP program behavior.
In C code, the directives are formed exclusively with pragmas, whereas in C++
code, directives are formed from either pragmas or attributes.
In C/C++ code, the directives are formed with
either pragmas or attributes.
Fortran directives are formed with comments in free form and fixed form sources (codes).
All of these mechanisms allow the compilation to ignore the OpenMP directives if
OpenMP is not supported or enabled.
@ -20,18 +20,27 @@ The formats for combining a base-language mechanism and a \plc{directive-specifi
C/C++ pragmas
\begin{indentedcodelist}
\kcode{\#pragma omp} \plc{directive-specification}
#pragma omp \plc{directive-specification}
\end{indentedcodelist}
C++ attributes
C/C++ attribute specifiers
\begin{indentedcodelist}
\kcode{[[omp :: directive( \plc{directive-specification} )]]}
\kcode{[[using omp : directive( \plc{directive-specification} )]]}
[[omp :: directive( \plc{directive-specification} )]]
[[omp :: decl( \plc{directive-specification} )]]
\end{indentedcodelist}
C++ attribute specifiers
\begin{indentedcodelist}
[[using omp : directive( \plc{directive-specification} )]]
[[using omp : decl( \plc{directive-specification} )]]
\end{indentedcodelist}
where the \kcode{decl} attribute may be used for declarative
directives alternatively.
Fortran comments
\begin{indentedcodelist}
\scode{!$omp} \plc{directive-specification}
!$omp \plc{directive-specification}
\end{indentedcodelist}
where \scode{c$omp} and \scode{*$omp} may be used in Fortran fixed form sources.

View File

@ -21,6 +21,7 @@ whereby specific hot spots can be affected by transformation directives.
%===== Examples Sections =====
\input{loop_transformations/tile}
\input{loop_transformations/unroll}
\input{loop_transformations/partial_tile}
\input{loop_transformations/unroll}
\input{loop_transformations/apply}

View File

@ -105,6 +105,7 @@ chapter in the OpenMP Specifications document.
\input{program_control/cancellation}
\input{program_control/requires}
\input{program_control/context_based_variants}
\input{program_control/dispatch}
\input{program_control/nested_loop}
\input{program_control/nesting_restrict}
\input{program_control/target_offload}

View File

@ -59,4 +59,5 @@ can be found in the \docref{Tasking Constructs} chapter of the OpenMP Specificat
\input{tasking/taskyield}
\input{tasking/taskloop}
\input{tasking/parallel_masked_taskloop}
\input{tasking/taskloop_dep}

View File

@ -158,9 +158,19 @@ The following describes LaTeX macros defined specifically for examples.
\cppspecificstart, \cppspecificend
\ccppspecificstart, \ccppspecificend
\fortranspecificstart, \fortranspecificend
\begin{cspecific}[s] ... \end{cspecific}
\begin{cppspecific}[s] ... \end{cppspecific}
\begin{ccppspecific}[s] ... \end{ccppspecific}
\begin{fortranspecific}[s] ... \end{fortranspecific}
\topmarker{Lang}
```
Use of the structured `\begin{} .. \end{}` environments is the preferred
way of specifying language-dependent text over the unstructured approach
of using `\*specificstart` and `\*specificend`.
The option `[s]` to each of the environments can specify a vertical shift
for the beginning rule, such as when followed by a section header.
The macro `\topmarker` puts a dashed blue line floater at top of a page for
"Lang (cont.)" where `Lang` can be `C/C++`, `C++`, `Fortran`.

View File

@ -58,6 +58,9 @@ accordingly and listed in Section~\ref{sec:Updated Examples}.
\tablelasttail{\hline\\[-2ex]}
\tablecaption{Deprecated Features and Their Replacements\label{tab:Deprecated Features}}
\begin{supertabular}{p{0.4in} p{2.3in} p{2.2in}}
6.0 & \kcode{declare reduction(}\plc{reduction-id}: \plc{typename-list}: \plc{combiner}\kcode{)}
& \kcode{declare reduction(}\plc{reduction-id}: \plc{typename-list}\kcode{)} \kcode{combiner(\plc{combiner-exp})} \\
\hline
5.2 & \kcode{default} clause on metadirectives
& \kcode{otherwise} clause \\
5.2 & delimited \kcode{declare target} directive for C/C++
@ -98,6 +101,32 @@ the tables shows the version tag of the earlier version. It also shows
the prior name of an example when it has been renamed.
Table~\ref{tab:Updated Examples 6.0} lists the updated examples for
features deprecated in OpenMP 6.0
in the Examples Document Version
\href{https://github.com/OpenMP/Examples/tree/v6.0}{6.0}.
The \emph{Earlier Version} column of the table lists the earlier version
tags of the examples that can be found in
the Examples Document Version
\href{https://github.com/OpenMP/Examples/tree/v5.2}{5.2}.
\index{clauses!combiner@\kcode{combiner}}
\index{combiner clause@\kcode{combiner} clause}
\nolinenumbers
\dpftable{6.0}
\begin{supertabular}{p{1.7in} p{1.1in} p{2.2in}}
\hexentry{udr.1}[f90]{4.0} &
\plc{combiner} expression in \kcode{declare} \\
\hexentry{udr.2}[f90]{4.0} &
\kcode{reduction} directive changed to use \\
\hexentry{udr.3}[f90]{4.0} & \kcode{combiner} clause \\
\hexentry[f90]{udr.4}{4.0} & \\
\hexentry[cpp]{udr.5}{4.0} & \\
\hexentry[cpp]{udr.6}{4.0} & \\[2pt]
\end{supertabular}
\linenumbers
Table~\ref{tab:Updated Examples 5.2} lists the updated examples for
features deprecated in OpenMP 5.2
in the Examples Document Version \examplesref{5.2}.
@ -195,6 +224,7 @@ the Examples Document Version \examplesref{5.1}.
\end{supertabular}
\linenumbers
\newpage
Table~\ref{tab:Updated Examples 5.1} lists the updated examples for
features deprecated in OpenMP 5.1
in the Examples Document Version \examplesref{5.1}.

View File

@ -2,20 +2,27 @@
\label{chap:foreword}
The OpenMP Examples document has been updated with new features
found in the OpenMP \PVER\ Specification. The additional examples and updates
are referenced in the Document Revision History of the Appendix on page~\pageref{chap:history}.
found in the OpenMP \SVER\ Specification.
In order to provide users with new feature examples concurrently
with the release of the OpenMP 6.0 Specification,
the 6.0 Examples document is being released early
with a caveat that some of the 6.0 features
(such as \kcode{workdistribute} construct, \kcode{taskgraph} construct,
\kcode{threadset} clause and free-agent threads) will be covered
in the next release of the document.
For a list of the new examples and updates in this release,
please refer to the Document Revision History of the Appendix on page~\pageref{chap:history}.
Text describing an example with a \PVER\ feature specifically states
that the feature support begins in the OpenMP \PVER\ Specification. Also,
an \kcode{\small{}omp_\PVER} keyword is included in the metadata of the source code.
These distinctions are presented to remind readers that a \PVER\ compliant
Text describing an example with a \SVER\ feature specifically states
that the feature support begins in the OpenMP \SVER\ Specification. Also,
an \kcode{\small{}omp_\SVER} keyword is included in the metadata of the source code.
These distinctions are presented to remind readers that a \SVER\ compliant
OpenMP implementation is necessary to use these features in codes.
Examples for most of the \PVER\ features are included in this document,
and incremental releases will become available as more feature examples
%Examples for most of the \SVER\ features are included in this document,
%and
Incremental releases will become available as more feature examples
and updates are submitted and approved by the OpenMP Examples Subcommittee.
Examples are accepted for this document after discussions, revisions and reviews
in the Examples Subcommittee, and two reviews/discussions and two votes
in the OpenMP Language Committee.

View File

@ -1,6 +1,74 @@
\cchapter{Document Revision History}{history}
\label{chap:history}
%=====================================
\section{Changes from 5.2.2 to 6.0}
\label{sec:history_522_to_60}
\begin{itemize}
\item General changes:
\begin{itemize}
\item Added a set of structured LaTeX environments for specifying
language-dependent text. This allows extracting language-specific
content of the Examples document. Refer to the content of
\examplesblob{v6.0/Contributions.md} for details.
\end{itemize}
\item Added the following examples for the 6.0 features:
\begin{itemize}
\item \kcode{omp::decl} attribute for declarative directives in C/C++
(\specref{sec:attributes})
\item \kcode{transparent} clause on the \kcode{task} construct to enable dependences
between non-sibling tasks (\specref{subsec:depend_trans_task})
\item Task dependences for \kcode{taskloop} construct
(\specref{sec:taskloop_depend})
\item \kcode{num_threads} clause that appears inside \kcode{target} region
(\specref{subsec:target_teams_num_teams})
\item \kcode{nowait} clause with argument on the \kcode{target} construct to control deferment
of target task (\specref{subsec:async_target_nowait_arg})
\item Traits for specifying devices (\specref{sec:device_env_traits})
\item \kcode{apply} clause with modifier argument to
support selective loop transformations
(\specref{sec:apply_clause})
\item Reduction on private variables in a \kcode{parallel} region
(\specref{subsec:priv_reduction})
\item \kcode{induction} clause (\specref{subsec:induction})
and user-defined induction (\specref{subsec:user-defined-induction})
\item \kcode{init_complete} clause for \kcode{scan} directive to
support initialization phase in scan operation
(\specref{sec:scan})
\item \kcode{assume} construct with \kcode{no_openmp} and \kcode{no_parallelism} clauses (\specref{sec:assumption})
\item \kcode{num_threads} clause with a list
(\specref{subsec:icv_nthreads})
\item \kcode{dispatch} construct to control variant substitution
for a procedure call (\specref{sec:dispatch})
\end{itemize}
\item Other changes:
\begin{itemize}
\item Changed attribute specifier as a directive form from C++ only to C/C++
(\specref{chap:directive_syntax})
\item Added missing \bcode{include <omp.h>} in Example \example{atomic.4.c}
and \bcode{use omp_lib} in Example \example{atomic.4.f90}
(\specref{sec:atomic_hint})
\item Fixed the function declaration order for variant functions in
Examples \example{selector_scoring.[12].c} and Fortran pointer
initialization in Example \example{selector_scoring.2.f90}
(\specref{subsec:context_selector_scoring})
\item Replaced the deprecated use of \plc{combiner-exp}
in \kcode{declare reduction} directive with \kcode{combiner} clause
(\specref{subsec:UDR} and \specref{sec:Updated Examples})
\item Fixed the initialization of Fortran pointers
in Example \example{cancellation.2.f90} and changed to
use \kcode{atomic write} for performing atomic writes
(\specref{sec:cancellation})
\item Added missing \kcode{declare target} directive for external procedure
called inside \kcode{target} region in Example
\example{requires.1.f90} (\specref{sec:requires})
\end{itemize}
\end{itemize}
%=====================================
\section{Changes from 5.2.1 to 5.2.2}
\label{sec:history_521_to_522}

View File

@ -4,15 +4,21 @@
include versioninfo
default: openmp-examples.pdf
diff: openmp-diff-abridged.pdf
diff: clean openmp-diff-abridged.pdf
book: BOOK_BUILD="\\\\def\\\\bookbuild{1}"
book: VERSIONSTR="$(version_date)"
book: clean openmp-examples.pdf
mv openmp-examples-${version}.pdf openmp-examples-${version}-book.pdf
release: VERSIONSTR="$(version_date)"
release: clean openmp-examples.pdf
book: BOOK_BUILD="\\\\def\\\\bookbuild{1}"
book: clean release
mv openmp-examples-${version}.pdf openmp-examples-${version}-book.pdf
ccpp-only: LANG_OPT="\\\\ccpptrue\\\\fortranfalse"
ccpp-only: clean release
fortran-only: LANG_OPT="\\\\ccppfalse\\\\fortrantrue"
fortran-only: clean release
CHAPTERS=Title_Page.tex \
Foreword_Chapt.tex \
Chap_*.tex \
@ -41,8 +47,9 @@ LATEXDCMD=$(LATEXCMD) -draftmode
# check for branches names with "name_XXX"
DIFF_TICKET_ID=$(shell git rev-parse --abbrev-ref HEAD)
GITREV=$(shell git rev-parse --short HEAD)
GITREV=$(shell git rev-parse --short HEAD || echo "??")
VERSIONSTR="GIT rev $(GITREV)"
LANG_OPT="\\\\ccpptrue\\\\fortrantrue"
openmp-examples.pdf: $(CHAPTERS) $(SOURCES) openmp.sty openmp-examples.tex openmp-logo.png generated-include.tex
rm -f $(INTERMEDIATE_FILES)
@ -75,15 +82,15 @@ endif
ifdef DIFF_FROM
VC_DIFF_FROM := -r ${DIFF_FROM}
else
VC_DIFF_FROM := -r main
VC_DIFF_FROM := -r work_6.0
endif
DIFF_TO:=HEAD
DIFF_FROM:=main
DIFF_FROM:=work_6.0
DIFF_TYPE:=UNDERLINE
COMMON_DIFF_OPTS:=--math-markup=whole \
--append-safecmd=plc,code,hcode,scode,pcode,splc \
--append-safecmd=plc,code,kcode,scode,ucode,vcode,splc,bcode,pvar,pout,example \
--append-textcmd=subsubsubsection
VC_DIFF_OPTS:=${COMMON_DIFF_OPTS} --force -c latexdiff.cfg --flatten --type="${DIFF_TYPE}" --git --pdf ${VC_DIFF_FROM} ${VC_DIFF_TO} --subtype=ZLABEL --graphics-markup=none
@ -94,8 +101,10 @@ generated-include.tex:
echo "$(BOOK_BUILD)"
echo "$(BOOK_BUILD)" > $@
echo "\def\VER{${version}}" >> $@
echo "\def\PVER{${version_spec}}" >> $@
echo "\def\SVER{${version_spec}}" >> $@
echo "\def\VERDATE{${VERSIONSTR}}" >> $@
echo "\\\\newif\ifccpp\\\\newif\iffortran" >> $@
echo "$(LANG_OPT)" >> $@
util/list_tags -vtag */sources/* >> $@
%.tmpdir: $(wildcard *.sty) $(wildcard *.png) $(wildcard *.aux) openmp-examples.pdf

46
STYLE_GUIDE.md Normal file
View File

@ -0,0 +1,46 @@
### OpenMP Examples Coding Style Guide
Must Dos:
- Indents and Braces
- Code: Follow common base language practices.
- Where indents are normally used, use 2 spaces instead of tabs.
- Comments: Follow the indent of the base language for which the comment applies.
- OpenMP directives should be indented as if it's base language code where possible.
- Braces `{}` around structured blocks following directives must be on a new line and must follow base language indent.
- For C/C++ examples, for code blocks with multiple lines, the if-else statements must follow the following format:
```
if {
} else {
}
```
- All Section and sub-section headings must be in Title case. For example: " This is a Useful Example of X Directive ".
- Comments
- Comments go on a new line before the relevant code/code block.
- Expected results may go on the same line.
- Keep comments terse; detailed explanations go in the text.
- Output
- If there is a deterministic output, provide it.
- It can be done in one of the following ways:
- Specify the correct value in a comment.
- Code prints out "expected" and "run" values.
- Test for the correctness of a value in a conditional.
- If the test is expected to execute, return values must be used to indicate success or failure.
- For tests that produce incorrect results, use:
- `return(1)` for C/C++
- `stop 1` for Fortran (do not exit)
- For tests that need to discontinue execution, use:
- `exit(1)` for C/C++
- `error stop` for Fortran
- Validation messages such as "Pass" / "Fail" are not mandatory.
- A single "pass" or "fail" is sufficient for a multi-case test.
- To Verify Metadata:
- A tool in the repository at the top level, "make check", scans all sources for version tags and ensures line length is 75 characters max.
- Inside `utils`, there is `chk_tags` (see different options) that can accept 1 file and scan for all specified values.
Donts:
- Unless required by the feature, use free-format Fortran for new examples.
- Do not use all-caps for emphasis in the document.

View File

@ -12,7 +12,7 @@
\textsf{OpenMP\\Application Programming\\Interface}
% An optional subtitle can go here:
\vspace{0.5in}\textsf{Examples}\vspace{-0.7in}
\vspace{0.5in}\textsf{\langselect Examples}\vspace{-0.7in}
\normalsize
\vspace{1.0in}

View File

@ -23,12 +23,14 @@ starting from 0, such that the hardware threads 0,1 form the first physical core
The following equivalent place list declarations consist of eight places (which
we designate as p0 to p7):
\kcode{OMP_PLACES}=\verb+"{0,1},{2,3},{4,5},{6,7},{8,9},{10,11},{12,13},{14,15}"+
\begin{boxeducode}
\kcode{export OMP_PLACES=}"{0,1},{2,3},{4,5},{6,7},{8,9},{10,11},{12,13},
{14,15}"
\end{boxeducode}
or
\kcode{OMP_PLACES}=\verb+"{0:2}:8:2"+
\begin{boxeducode}
\kcode{export OMP_PLACES=}"{0:2}:8:2"
\end{boxeducode}
\subsection{Spread Affinity Policy}
\label{subsec:affinity_spread}

View File

@ -47,12 +47,13 @@ a nested parallel region runs half of the available threads on each socket.
These OpenMP environment variables have been set:
\begin{compactitem}
\item \kcode{OMP_PROC_BIND}=\verb+"TRUE"+
\item \kcode{OMP_NUM_THREADS}=\verb+"2,4"+
\item \kcode{OMP_PLACES}=\verb+"{0,2,4,6},{1,3,5,7}"+
\item \kcode{OMP_AFFINITY_FORMAT}=\verb+"nest_level= %L, parent_thrd_num= %a,+ \verb+thrd_num= %n, thrd_affinity= %A"+
\end{compactitem}
\begin{boxeducode}
\kcode{export OMP_PROC_BIND=}"TRUE"
\kcode{export OMP_NUM_THREADS=}"2,4"
\kcode{export OMP_PLACES=}"{0,2,4,6},{1,3,5,7}"
\kcode{export OMP_AFFINITY_FORMAT=}"nest_level= %L, parent_thrd_num= %a,
thrd_num= %n, thrd_affinity= %A"
\end{boxeducode}
where the numbers correspond to core ids for the system. Note, \kcode{OMP_DISPLAY_AFFINITY} is not
set and is \vcode{FALSE} by default. This example shows how to use API routines to

View File

@ -1,6 +1,6 @@
%\pagebreak
\begin{fortranspecific}[4ex]
\section{Fortran \bcode{ASSOCIATE} Construct}
\fortranspecificstart
\label{sec:associate}
\index{ASSOCIATE construct, Fortran@\bcode{ASSOCIATE} construct, Fortran}
@ -12,14 +12,13 @@ name \ucode{b} is associated with the shared variable \ucode{a}. With the predet
attribute rule, the associate name \ucode{b} is not allowed to be specified on the \kcode{private}
clause.
\pagebreak
%\pagebreak
\fnexample[4.0]{associate}{1}
In next example, within the \kcode{parallel} construct, the association name \ucode{thread_id}
is associated with the private copy of \ucode{i}. The print statement should output the
unique thread number.
\topmarker{Fortran}
\fnexample[4.0]{associate}{2}
The following example illustrates the effect of specifying a selector name on a data-sharing
@ -30,9 +29,10 @@ The association between \ucode{u} and the original \ucode{v} is retained (see th
Attribute Rules} section in the OpenMP 4.0 API Specification). Inside the \kcode{parallel}
region, \ucode{v} has the value of -1 and \ucode{u} has the value of the original \ucode{v}.
\topmarker{Fortran}
\ffreenexample[4.0]{associate}{3}
\topmarker{Fortran}
%\topmarker{Fortran}
\label{sec:associate_target}
\bigskip
@ -63,5 +63,5 @@ an explicit mapping for the same \kcode{target} construct,
hence the code block is non-conforming.
\ffreenexample[5.1]{associate}{4}
\fortranspecificend
\end{fortranspecific}

View File

@ -1,6 +1,6 @@
%\pagebreak
\begin{ccppspecific}[4ex]
\section{C/C++ Arrays in a \kcode{firstprivate} Clause}
\ccppspecificstart
\label{sec:carrays_fpriv}
\index{clauses!firstprivate@\kcode{firstprivate}}
\index{firstprivate clause@\kcode{firstprivate} clause!C/C++ arrays in}
@ -34,6 +34,6 @@ array is assigned to the corresponding element of the new array. Those of pointe
type are initialized as if by assignment from the original item to the new item.
\cnexample{carrays_fpriv}{1}
\ccppspecificend
\end{ccppspecific}

View File

@ -42,7 +42,7 @@ that \kcode{parallel} region.
\cexample{copyprivate}{3}
\fortranspecificstart
\begin{fortranspecific}
\fnexample{copyprivate}{3}
Note that the effect of the \kcode{copyprivate} clause on a variable with the
@ -52,6 +52,6 @@ the pointer \ucode{B} is copied (as if by pointer assignment) to the correspondi
list items in the other implicit tasks belonging to the \kcode{parallel} region.
\fnexample{copyprivate}{4}
\fortranspecificend
\end{fortranspecific}

View File

@ -1,5 +1,5 @@
\begin{cppspecific}[4ex]
\section{C++ Reference in Data-Sharing Clauses}
\cppspecificstart
\label{sec:cpp_reference}
\index{clauses!data-sharing, C++ reference in}
\index{data-sharing clauses, C++ reference in}
@ -13,4 +13,4 @@ Additionally it shows how the data-sharing of formal arguments with a C++ refere
\cppnexample[4.5]{cpp_reference}{1}
\cppspecificend
\end{cppspecific}

View File

@ -7,14 +7,14 @@
The following example distinguishes the variables that are affected by the \kcode{default(none)}
clause from those that are not.
\ccppspecificstart
\begin{ccppspecific}
Beginning with OpenMP 4.0, variables with \bcode{const}-qualified type and no mutable member
are no longer predetermined shared. Thus, these variables (variable \ucode{c} in the example)
need to be explicitly listed
in data-sharing attribute clauses when the \kcode{default(none)} clause is specified.
\cnexample{default_none}{1}
\ccppspecificend
\end{ccppspecific}
\fexample{default_none}{1}

View File

@ -1,7 +1,7 @@
%\pagebreak
\begin{fortranspecific}[4ex]
\section{Fortran Private Loop Iteration Variables}
\label{sec:fort_loopvar}
\fortranspecificstart
\index{loop variables, Fortran}
In general loop iteration variables will be private, when used in the \plc{do-loop}
@ -21,5 +21,5 @@ example:
Note however that the use of shared loop iteration variables can easily lead to
race conditions.
\fortranspecificend
\end{fortranspecific}

View File

@ -1,4 +1,5 @@
%\pagebreak
\begin{fortranspecific}[4ex]
\section{Fortran Restrictions on Storage Association with the \kcode{private} Clause}
\label{sec:fort_sa_private}
\index{clauses!private@\kcode{private}}
@ -8,9 +9,9 @@ The following non-conforming examples illustrate the implications of the \kcode{
clause rules with regard to storage association.
\pagebreak
\fortranspecificstart
\fnexample{fort_sa_private}{1}
\topmarker{Fortran}
\fnexample{fort_sa_private}{2}
\fnexample{fort_sa_private}{3}
@ -19,5 +20,5 @@ clause rules with regard to storage association.
\topmarker{Fortran}
\fnexample[5.1]{fort_sa_private}{5}
\fortranspecificend
\end{fortranspecific}

View File

@ -1,6 +1,6 @@
%\pagebreak
\begin{fortranspecific}[4ex]
\section{Passing Shared Variable to Procedure in Fortran}
\fortranspecificstart
\label{sec:fort_shared_var}
\index{clauses!shared@\kcode{shared}}
\index{shared clause@\kcode{shared} clause!storage association, Fortran}
@ -41,5 +41,5 @@ not well defined.
\topmarker{Fortran}
\ffreenexample{fort_shared_var}{1}
\fortranspecificend
\end{fortranspecific}

View File

@ -1,6 +1,6 @@
%\pagebreak
\begin{fortranspecific}[4ex]
\section{Fortran Restrictions on \kcode{shared} and \kcode{private} Clauses with Common Blocks}
\fortranspecificstart
\label{sec:fort_sp_common}
\index{clauses!private@\kcode{private}}
\index{clauses!shared@\kcode{shared}}
@ -35,6 +35,6 @@ The following example is non-conforming because a common block may not be declar
both shared and private:
\fnexample{fort_sp_common}{5}
\fortranspecificend
\end{fortranspecific}

View File

@ -0,0 +1,67 @@
%\pagebreak
\section{Induction}
\label{sec:induction}
This section covers ways to perform inductions in \kcode{distribute}, worksharing-loop, \kcode{taskloop}, and SIMD regions.
\subsection{\kcode{induction} Clause}
\label{subsec:induction}
\index{clauses!induction@\kcode{induction}}
\index{induction clause@\kcode{induction} clause}
\index{inductions!induction clause@\kcode{induction} clause}
\index{inductions!closed form}
The following example demonstrates the basic use of the \kcode{induction} clause
in Case 1 for variable \ucode{xi} in a loop in routine \ucode{comp_poly} to
evaluate the polynomial of variable \ucode{x}.
For this case, the induction operation is
with the inductor `\scode{*}' and induction step \ucode{x}.
The intermediate value of \ucode{xi} is used in producing
the reduction sum \ucode{result}.
The last value of \ucode{xi} is well defined after the loop and
is printed out together with the final value of \ucode{result}.
An alternative approach is to use an \plc{inscan} reduction
as illustrated in Case 2, but this may not be as optimal as Case 1.
An equivalent code without the \kcode{induction} clause is given in Case 3
where a non-recursive closed form of the induction operation is used to
compute the intermediate value of \ucode{xi}.
The last value of \ucode{xi} is returned with the \kcode{lastprivate} clause
for this case.
\cexample[6.0]{induction}{1}
\ffreeexample[6.0]{induction}{1}
\subsection{User-defined Induction}
\label{subsec:user-defined-induction}
\index{directives!declare induction@\kcode{declare induction}}
\index{declare induction directive@\kcode{declare induction} directive}
\index{inductions!declare induction directive@\kcode{declare induction} directive}
\index{inductions!inductor clause@\kcode{inductor} clause}
\index{inductions!collector clause@\kcode{collector} clause}
\index{inductions!user-defined}
\index{OpenMP variable identifiers!omp_var@\kcode{omp_var}}
\index{OpenMP variable identifiers!omp_step@\kcode{omp_step}}
\index{OpenMP variable identifiers!omp_idx@\kcode{omp_idx}}
The following is a user-defined induction example that uses the
\kcode{declare induction} directive and the \kcode{induction} clause.
The example processes in parallel $N$ points along a line of a given slope
starting from a given point, and where adjacent points are separated by
a fixed distance.
The induction variable \ucode{P} represents a point, and
the step expression is the distance. The induction identifier \ucode{next}
is defined in the \kcode{declare induction} directive with an
appropriate \plc{inductor} via the \kcode{inductor} clause and
\plc{collector} via the \kcode{collector} clause.
This identifier together with the \kcode{step(\ucode{Separation})}
modifier is specified in the \kcode{induction} clause
for the \kcode{parallel for}/\kcode{do} construct
in routine \ucode{processPointsInLine}.
\cppexample[6.0]{induction}{2}
\ffreeexample[6.0]{induction}{2}

View File

@ -26,7 +26,7 @@ written as follows:
\cexample{reduction}{2}
\fortranspecificstart
\begin{fortranspecific}
\ffreenexample{reduction}{2}
The following program is non-conforming because the reduction is on the
@ -47,7 +47,7 @@ The following conforming program performs the reduction using
to \ucode{MIN}.
\ffreenexample{reduction}{5}
\fortranspecificend
\end{fortranspecific}
%\pagebreak
The following example is non-conforming because the initialization (\ucode{a =
@ -395,3 +395,34 @@ without using a worksharing-loop construct.
\ffreeexample[5.1]{scope_reduction}{1}
\subsection{Reduction on Private Variables in a \kcode{parallel} Region}
\label{subsec:priv_reduction}
\index{reduction clause@\kcode{reduction} clause!on private variables}
\index{reduction clause@\kcode{reduction} clause!original(private) modifier@\kcode{original(private)} modifier}
The following example shows reduction on a private variable (\ucode{sum_v})
for an orphaned worksharing loop in routine \ucode{do_red},
which is called in a \kcode{parallel} region.
At the end of the loop, private variable of each thread should have the same combined value.
\cexample[6.0]{priv_reduction}{1}
\ffreeexample[6.0]{priv_reduction}{1}
The following example is slightly modified from the previous example
where the \kcode{original(private)} modifier is explicitly specified
for variable \ucode{sum_v} in the \kcode{reduction} clause.
This modifier indicates that variable \ucode{sum_v} is private
for reduction as opposed to shared by default for a variable
passed as a procedure argument.
\cppexample[6.0]{priv_reduction}{2}
\ffreeexample[6.0]{priv_reduction}{2}
The following example shows the effect of nested \kcode{reduction} constructs.
For the \kcode{parallel} construct, the reduction is on the shared variable
\ucode{x}. For the worksharing loop nested inside the \kcode{parallel}
region, the reduction is performed on the private copy of \ucode{x}
for each thread.
With 4 threads assigned for the \kcode{parallel} region
(enforced by the \kcode{strict} modifier in the \kcode{num_threads} clause),
the code should print 40 at the end.
\cexample[6.0]{priv_reduction}{3}
\ffreeexample[6.0]{priv_reduction}{3}

View File

@ -46,3 +46,20 @@ of the prefix sum \ucode{b[k]} (\ucode{b(k)} in Fortran) for iteration \ucode{k}
\cexample[5.0]{scan}{2}
\ffreeexample[5.0]{scan}{2}
In OpenMP 6.0, the \kcode{scan} directive was extended to support
the concept of an \plc{initialization} phase where a private variable
can be set for later use in the \plc{input} phase of
an \plc{exclusive} scan operation.
The following example is a rewrite of the previous exclusive scan
example, which uses the \kcode{scan init_complete} directive to separate
the initialization phase from the other phases of the scan operation.
The private variable \ucode{tmp} is set in the initialization phase
and used later in the input phase to update the prefix sum stored
in variable \ucode{x}.
This case allows the same array \ucode{c} to be used for
both input and output of the scan results.
\cexample[6.0]{scan}{3}
\ffreeexample[6.0]{scan}{3}

View File

@ -0,0 +1,48 @@
/*
* @@name: induction.1
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_6.0
*/
#include <stdio.h>
#include <math.h>
void comp_poly(int N, double x, double c[]) {
// x: input: value of x for which to eval the polynomial
// c[N]: input: the coefficients
double x0 = 1.0; // initial value x^0 == 1
double xi; // x^i
double result; // accumulator for the result
// Case 1: induction clause
xi = x0;
result = 0.0;
#pragma omp parallel for reduction(+: result) induction(step(x),*: xi)
for (int i = 0; i < N; i++) {
result += c[i] * xi;
xi *= x;
}
printf("C1: result = %f, xn = %f\n", result, xi);
// Case 2: inscan reduction
xi = x0;
result = 0.0;
#pragma omp parallel for reduction(+: result) reduction(inscan,*: xi)
for (int i = 0; i < N; i++) {
result += c[i] * xi;
#pragma omp scan exclusive(xi)
xi *= x;
}
printf("C2: result = %f, xn = %f\n", result, xi);
// Case 3: closed form
result = 0.0;
#pragma omp parallel for reduction(+: result) lastprivate(xi)
for (int i = 0; i < N; i++) {
xi = x0 * pow(x, i); // induction operation in closed form
result += c[i] * xi;
xi *= x;
}
printf("C3: result = %f, xn = %f\n", result, xi);
}

View File

@ -0,0 +1,48 @@
! @@name: induction.1
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_6.0
subroutine comp_poly(N, x, c)
implicit none
! x: input: value of x for which to eval the polynomial
! c(N): input: the coefficients
integer :: N
double precision :: x, c(*)
double precision :: x0 = 1.0 ! initial value x^0 == 1
double precision :: xi ! x^i
double precision :: result ! accumulator for the result
integer :: i
!! Case 1: induction clause
xi = x0
result = 0.0
!$omp parallel do reduction(+: result) induction(step(x),*: xi)
do i = 1, N
result = result + c(i) * xi
xi = xi * x
end do
print *, 'C1: result =', result, ', xn =', xi
!! Case 2: inscan reduction
xi = x0
result = 0.0
!$omp parallel do reduction(+: result) reduction(inscan,*: xi)
do i = 1, N
result = result + c(i) * xi
!$omp scan exclusive(xi)
xi = xi * x
end do
print *, 'C2: result =', result, ', xn =', xi
!! Case 3: closed form
result = 0.0
!$omp parallel do reduction(+: result) lastprivate(xi)
do i = 1, N
xi = x0 * (x ** (i-1)) ! induction operation in closed form
result = result + c(i) * xi
xi = xi * x
end do
print *, 'C3: result =', result, ', xn =', xi
end subroutine

View File

@ -0,0 +1,47 @@
/*
* @@name: induction.2
* @@type: C++
* @@operation: compile
* @@expect: success
* @@version: omp_6.0
*/
#include <cmath>
class Point {
float x, y, m;
char color;
public:
Point(float x, float y, float m) : x(x), y(y), m(m) {
color = (int)(x+y) % 256;
}
Point nextPoint(float distance) {
// return a Point that is 'distance' away along slope m
// in the x direction
float deltaX = distance/(sqrtf(1.0f + m * m));
float deltaY = m * deltaX;
Point NewPoint(x+deltaX, y+deltaY, m);
return NewPoint;
}
};
#pragma omp declare induction(next : (Point, float)) \
inductor (omp_var = omp_var.nextPoint(omp_step)) \
collector(omp_step * omp_idx)
extern void process(Point P);
void processPointsInLine(Point Start, int NumberOfPoints,
float Separation) {
Point P = Start;
#pragma omp parallel for induction(step(Separation), next : P)
for (int i = 0; i < NumberOfPoints; ++i) {
process(P);
P = P.nextPoint(Separation);
}
}
int main() {
Point Start(1.0f, -2.0f, 0.5f);
processPointsInLine(Start, 100, 0.25f);
return 0;
}

View File

@ -0,0 +1,66 @@
! @@name: induction.2
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_6.0
module udi
integer, parameter :: I2 = selected_int_kind(3) ! enough for 256
type Point
real x, y, m
integer(I2) color
contains
procedure initPoint, nextPoint
end type
!$omp declare induction(next : (Point, real)) &
!$omp& inductor (omp_var = omp_var%nextPoint(omp_step)) &
!$omp& collector(omp_step * omp_idx)
contains
subroutine initPoint(this, x1, y1, m1)
implicit none
class(Point) this
real x1, y1, m1
this%x = x1; this%y = y1; this%m = m1
this%color = mod(int(x1+y1), 256)
end subroutine
function nextPoint(this, distance) result(NewPoint)
! return a Point that is 'distance' away along slope m in the x direction
implicit none
class(Point) this
real distance
type(Point) NewPoint
real deltaX, deltaY
deltaX = distance/(sqrt(1.0 + this%m * this%m))
deltaY = this%m * deltaX
call NewPoint%initPoint(this%x+deltaX, this%y+deltaY, this%m)
end function
end module
subroutine processPointsInLine(Start, NumberOfPoints, Separation)
use udi
implicit none
type(Point) Start
integer NumberOfPoints
real Separation
type(Point) P
integer i
P = Start
!$omp parallel do induction(step(Separation), next : P)
do i = 1, NumberOfPoints
call process(P)
P = P%nextPoint(Separation)
end do
end subroutine
program main
use udi
implicit none
type(Point) Start
call Start%initPoint(1.0, -2.0, 0.5)
call processPointsInLine(Start, 100, 0.25)
end program

View File

@ -0,0 +1,35 @@
/*
* @@name: priv_reduction.1
* @@type: C
* @@operation: run
* @@expect: success
* @@version: omp_6.0
*/
#include <stdio.h>
#include <omp.h>
#define N 100
int do_red(int n, int *v)
{
int sum_v = 0; // sum_v is private
#pragma omp for reduction(+: sum_v)
for (int i = 0; i < n; i++) {
sum_v += v[i];
}
return sum_v;
}
int main(void)
{
int v[N];
for (int i = 0; i < N; i++)
v[i] = i;
#pragma omp parallel
{
int s_v = do_red(N, v);
printf("myid %d: sum of v = %d\n", omp_get_thread_num(), s_v);
}
return 0;
}

View File

@ -0,0 +1,35 @@
! @@name: priv_reduction.1
! @@type: F-free
! @@operation: run
! @@expect: success
! @@version: omp_6.0
function do_red(n, v) result(sum_v)
implicit none
integer :: n, v(*)
integer :: sum_v ! sum_v is private
integer :: i
sum_v = 0
!$omp do reduction(+: sum_v)
do i = 1, n
sum_v = sum_v + v(i)
end do
end function
program priv_red
use :: omp_lib, only : omp_get_thread_num
implicit none
integer, parameter :: N = 100
integer :: i, v(N), s_v
integer, external :: do_red
do i = 1, N
v(i) = i - 1
end do
!$omp parallel private(s_v)
s_v = do_red(N, v)
print 10, omp_get_thread_num(), s_v
10 format("myid ", i0, ": sum of v = ", i0)
!$omp end parallel
end program

View File

@ -0,0 +1,34 @@
/*
* @@name: priv_reduction.2
* @@type: C++
* @@operation: run
* @@expect: success
* @@version: omp_6.0
*/
#include <stdio.h>
#include <omp.h>
#define N 100
void do_red(int n, int *v, int &sum_v)
{
sum_v = 0; // sum_v is private
#pragma omp for reduction(original(private),+: sum_v)
for (int i = 0; i < n; i++) {
sum_v += v[i];
}
}
int main(void)
{
int v[N];
for (int i = 0; i < N; i++)
v[i] = i;
#pragma omp parallel
{
int s_v; // s_v is private
do_red(N, v, s_v);
printf("myid %d: sum of v = %d\n", omp_get_thread_num(), s_v);
}
return 0;
}

View File

@ -0,0 +1,34 @@
! @@name: priv_reduction.2
! @@type: F-free
! @@operation: run
! @@expect: success
! @@version: omp_6.0
subroutine do_red(n, v, sum_v)
implicit none
integer :: n, v(*)
integer :: sum_v
integer :: i
sum_v = 0 ! sum_v is private
!$omp do reduction(original(private),+: sum_v)
do i = 1, n
sum_v = sum_v + v(i)
end do
end subroutine
program priv_red
use :: omp_lib, only : omp_get_thread_num
implicit none
integer, parameter :: N = 100
integer :: i, v(N), s_v
do i = 1, N
v(i) = i - 1
end do
!$omp parallel private(s_v)
call do_red(N, v, s_v)
print 10, omp_get_thread_num(), s_v
10 format("myid ", i0, ": sum of v = ", i0)
!$omp end parallel
end program

View File

@ -0,0 +1,24 @@
/*
* @@name: priv_reduction.3
* @@type: C
* @@operation: run
* @@expect: success
* @@version: omp_6.0
*/
#include <stdio.h>
int main(void)
{
int x;
x = 0;
// parallel reduction on shared x
#pragma omp parallel reduction(+: x) num_threads(strict: 4)
{
#pragma omp for reduction(+: x) // reduction on private x
for (int i = 0; i < 10; i++)
x++;
}
printf("x = %d\n", x); // should print 40, with 4 threads
return 0;
}

View File

@ -0,0 +1,20 @@
! @@name: priv_reduction.3
! @@type: F-free
! @@operation: run
! @@expect: success
! @@version: omp_6.0
program nest_red
implicit none
integer :: x
x = 0
! parallel reduction on shared x
!$omp parallel reduction(+: x) num_threads(strict: 4)
!$omp do reduction(+: x) ! reduction on private x
do i = 1, 10
x = x + 1
end do
!$omp end do
!$omp end parallel
print *, "x =", x ! should print 40, with 4 threads
end program

View File

@ -0,0 +1,40 @@
/*
* @@name: scan.3
* @@type: C
* @@operation: run
* @@expect: success
* @@version: omp_6.0
*/
#include <stdio.h>
#define N 100
int main(void)
{
int c[N], tmp;
int x = 0;
// initialization
for (int k = 0; k < N; k++)
c[k] = k + 1;
// c[k] is used for both input and output of scan results
#pragma omp parallel for simd reduction(inscan,+: x) private(tmp)
for (int k = 0; k < N; k++) {
// initialization phase
tmp = c[k];
#pragma omp scan init_complete
// scan (output) phase - cannot use tmp here
c[k] = x;
#pragma omp scan exclusive(x)
// input phase - can use tmp here
x += tmp;
}
printf("x = %d, c[0:3] = %d %d %d\n", x, c[0], c[1], c[2]);
// 5050, 0 1 3
return 0;
}

View File

@ -0,0 +1,37 @@
! @@name: scan.3
! @@type: F-free
! @@operation: run
! @@expect: success
! @@version: omp_6.0
program inclusive_scan
implicit none
integer, parameter :: n = 100
integer c(n), tmp
integer x, k
! initialization
x = 0
do k = 1, n
c(k) = k
end do
! c(k) is used for both input and output of scan results
!$omp parallel do simd reduction(inscan,+: x) private(tmp)
do k = 1, n
! initialization phase
tmp = c(k)
!$omp scan init_complete
! scan (output) phase - cannot use tmp here
c(k) = x
!$omp scan exclusive(x)
! input phase - can use tmp here
x = x + tmp
end do
print *,'x =', x, ', c(1:3) =', c(1:3)
! 5050, 0 1 3
end program

View File

@ -3,7 +3,7 @@
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_4.0
* @@version: omp_6.0
*/
#include <stdio.h>
#include <limits.h>
@ -25,12 +25,12 @@ void maxproc ( struct point *out, struct point *in )
if ( in->y > out->y ) out->y = in->y;
}
#pragma omp declare reduction(min : struct point : \
minproc(&omp_out, &omp_in)) \
#pragma omp declare reduction(min : struct point) \
combiner( minproc(&omp_out, &omp_in) ) \
initializer( omp_priv = { INT_MAX, INT_MAX } )
#pragma omp declare reduction(max : struct point : \
maxproc(&omp_out, &omp_in)) \
#pragma omp declare reduction(max : struct point) \
combiner( maxproc(&omp_out, &omp_in) ) \
initializer( omp_priv = { 0, 0 } )
void find_enclosing_rectangle ( int n, struct point points[] )

View File

@ -2,7 +2,7 @@
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_4.0
! @@version: omp_6.0
module data_type
type :: point
@ -18,10 +18,12 @@ subroutine find_enclosing_rectangle ( n, points )
integer :: n
type(point) :: points(*)
!$omp declare reduction(min : point : minproc(omp_out, omp_in)) &
!$omp declare reduction(min : point) &
!$omp& combiner( minproc(omp_out, omp_in) ) &
!$omp& initializer( omp_priv = point( HUGE(0), HUGE(0) ) )
!$omp declare reduction(max : point : maxproc(omp_out, omp_in)) &
!$omp declare reduction(max : point) &
!$omp& combiner( maxproc(omp_out, omp_in) ) &
!$omp& initializer( omp_priv = point( 0, 0 ) )
type(point) :: minp = point( HUGE(0), HUGE(0) ), maxp = point( 0, 0 )

View File

@ -3,7 +3,7 @@
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_4.0
* @@version: omp_6.0
*/
#include <stdio.h>
#include <limits.h>
@ -13,15 +13,15 @@ struct point {
int y;
};
#pragma omp declare reduction(min : struct point : \
omp_out.x = omp_in.x > omp_out.x ? omp_out.x : omp_in.x, \
omp_out.y = omp_in.y > omp_out.y ? omp_out.y : omp_in.y ) \
initializer( omp_priv = { INT_MAX, INT_MAX } )
#pragma omp declare reduction(min : struct point) \
combiner( omp_out.x = omp_in.x > omp_out.x ? omp_out.x : omp_in.x, \
omp_out.y = omp_in.y > omp_out.y ? omp_out.y : omp_in.y ) \
initializer( omp_priv = { INT_MAX, INT_MAX } )
#pragma omp declare reduction(max : struct point : \
omp_out.x = omp_in.x < omp_out.x ? omp_out.x : omp_in.x, \
omp_out.y = omp_in.y < omp_out.y ? omp_out.y : omp_in.y ) \
initializer( omp_priv = { 0, 0 } )
#pragma omp declare reduction(max : struct point) \
combiner( omp_out.x = omp_in.x < omp_out.x ? omp_out.x : omp_in.x, \
omp_out.y = omp_in.y < omp_out.y ? omp_out.y : omp_in.y ) \
initializer( omp_priv = { 0, 0 } )
void find_enclosing_rectangle ( int n, struct point points[] )
{

View File

@ -2,7 +2,7 @@
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_4.0
! @@version: omp_6.0
module data_type
type :: point
@ -18,14 +18,14 @@ subroutine find_enclosing_rectangle ( n, points )
integer :: n
type(point) :: points(*)
!$omp declare reduction( min : point : &
!$omp& omp_out = point(min( omp_out%x, omp_in%x ), &
!$omp& min( omp_out%y, omp_in%y )) ) &
!$omp declare reduction( min : point ) &
!$omp& combiner( omp_out = point(min( omp_out%x, omp_in%x ), &
!$omp& min( omp_out%y, omp_in%y )) ) &
!$omp& initializer( omp_priv = point( HUGE(0), HUGE(0) ) )
!$omp declare reduction( max : point : &
!$omp& omp_out = point(max( omp_out%x, omp_in%x ), &
!$omp& max( omp_out%y, omp_in%y )) ) &
!$omp declare reduction( max : point ) &
!$omp& combiner( omp_out = point(max( omp_out%x, omp_in%x ), &
!$omp& max( omp_out%y, omp_in%y )) ) &
!$omp& initializer( omp_priv = point( 0, 0 ) )
type(point) :: minp = point( HUGE(0), HUGE(0) ), maxp = point( 0, 0 )

View File

@ -3,7 +3,7 @@
* @@type: C
* @@operation: run
* @@expect: success
* @@version: omp_4.0
* @@version: omp_6.0
*/
#include <stdio.h>
#define N 100
@ -18,9 +18,9 @@ struct mx_s {
void mx_combine(struct mx_s *out, struct mx_s *in);
void mx_init(struct mx_s *priv, struct mx_s *orig);
#pragma omp declare reduction(maxloc: struct mx_s: \
mx_combine(&omp_out, &omp_in)) \
initializer(mx_init(&omp_priv, &omp_orig))
#pragma omp declare reduction(maxloc: struct mx_s) \
combiner( mx_combine(&omp_out, &omp_in) ) \
initializer( mx_init(&omp_priv, &omp_orig) )
void mx_combine(struct mx_s *out, struct mx_s *in)
{

View File

@ -2,7 +2,7 @@
! @@type: F-free
! @@operation: run
! @@expect: success
! @@version: omp_4.0
! @@version: omp_6.0
program max_loc
implicit none
type :: mx_s
@ -10,9 +10,9 @@ program max_loc
integer index
end type
!$omp declare reduction(maxloc: mx_s: &
!$omp& mx_combine(omp_out, omp_in)) &
!$omp& initializer(mx_init(omp_priv, omp_orig))
!$omp declare reduction(maxloc: mx_s) &
!$omp& combiner( mx_combine(omp_out, omp_in) ) &
!$omp& initializer( mx_init(omp_priv, omp_orig) )
integer, parameter :: N = 100
type(mx_s) :: mx

View File

@ -2,7 +2,7 @@
! @@type: F-free
! @@operation: run
! @@expect: success
! @@version: omp_4.0
! @@version: omp_6.0
module data_red
! Declare data type.
type dt
@ -16,8 +16,9 @@ module data_red
end interface
! Declare the user-defined reduction operator .add.
!$omp declare reduction(.add.:dt:omp_out=omp_out.add.omp_in) &
!$omp& initializer(dt_init(omp_priv))
!$omp declare reduction(.add. : dt) &
!$omp& combiner( omp_out=omp_out.add.omp_in ) &
!$omp& initializer( dt_init(omp_priv) )
contains
! Declare the initialization routine.

View File

@ -3,7 +3,7 @@
* @@type: C++
* @@operation: compile
* @@expect: success
* @@version: omp_4.0
* @@version: omp_6.0
*/
class V {
float *p;
@ -16,6 +16,6 @@ public:
V& operator+= ( const V& );
#pragma omp declare reduction( + : V : omp_out += omp_in ) \
#pragma omp declare reduction( + : V ) combiner( omp_out += omp_in ) \
initializer(omp_priv(omp_orig))
};

View File

@ -3,18 +3,19 @@
* @@type: C++
* @@operation: view
* @@expect: unspecified
* @@version: omp_4.0
* @@version: omp_6.0
*/
#include <algorithm>
#include <list>
#include <vector>
#pragma omp declare reduction( + : std::vector<int> : \
std::transform (omp_out.begin(), omp_out.end(), \
omp_in.begin(), omp_in.end(),std::plus<int>()))
#pragma omp declare reduction( + : std::vector<int> ) \
combiner( std::transform (omp_out.begin(), omp_out.end(), \
omp_in.begin(), omp_in.end(),std::plus<int>()) )
#pragma omp declare reduction( merge : std::vector<int> : \
omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))
#pragma omp declare reduction( merge : std::vector<int> ) \
combiner( omp_out.insert(omp_out.end(), omp_in.begin(), \
omp_in.end()) )
#pragma omp declare reduction( merge : std::list<int> : \
omp_out.merge(omp_in))
#pragma omp declare reduction( merge : std::list<int> ) \
combiner( omp_out.merge(omp_in) )

View File

@ -12,7 +12,7 @@ The following examples demonstrate how to use the \kcode{threadprivate} directiv
\fexample{threadprivate}{1}
\pagebreak
\ccppspecificstart
\begin{ccppspecific}
The following example uses \kcode{threadprivate} on a static variable:
\cnexample{threadprivate}{2}
@ -26,12 +26,12 @@ region could be either 1 or 2. This problem is avoided for \ucode{b}, which uses
an auxiliary \bcode{const} variable and a copy-constructor.
\cppnexample{threadprivate}{3}
\ccppspecificend
\end{ccppspecific}
The following examples show non-conforming uses and correct uses of the \kcode{threadprivate}
directive.
\fortranspecificstart
\begin{fortranspecific}
The following example is non-conforming because the common block is not declared
local to the subroutine that refers to it:
@ -84,9 +84,9 @@ The following is an example of the use of \kcode{threadprivate} for module varia
\topmarker{Fortran}
\fnexample{threadprivate}{6}
\fortranspecificend
\end{fortranspecific}
\cppspecificstart
\begin{cppspecific}
The following example illustrates initialization of \kcode{threadprivate} variables
for class-type \ucode{T}. \ucode{t1} is default constructed, \ucode{t2} is constructed
taking a constructor accepting one argument of integer type, \ucode{t3} is copy
@ -99,5 +99,5 @@ class members. The \kcode{threadprivate} directive for a static class member mus
be placed inside the class definition.
\cppnexample{threadprivate}{5}
\cppspecificend
\end{cppspecific}

View File

@ -28,8 +28,10 @@ the rectangle that encloses a set of 2-D points.
Each \kcode{declare reduction} directive defines new reduction identifiers,
\ucode{min} and \ucode{max}, to be used in a \kcode{reduction} clause. The next item in the
declaration list is the data type (\ucode{struct point}) used in the reduction,
followed by the combiner, here the functions \ucode{minproc} and \ucode{maxproc} perform
declaration list is the data type (\ucode{struct point}) used in the
reduction.
The \kcode{combiner} clause specifies the functions \ucode{minproc} and
\ucode{maxproc} to perform
the min and max operations, respectively, on the user data (of type \ucode{struct point}).
In the function argument list are two special OpenMP variable identifiers, \kcode{omp_in} and \kcode{omp_out},
that denote the two values to be combined in the ``real'' function;
@ -39,7 +41,7 @@ The initializer of the \kcode{declare reduction} directive specifies
the initial value for the private variable of each implicit task.
The \kcode{omp_priv} identifier is used to denote the private variable.
\cexample[4.0]{udr}{1}
\cexample[6.0]{udr}{1}
%\clearpage
The following example shows the corresponding code in Fortran.
@ -47,49 +49,53 @@ The \kcode{declare reduction} directives are specified as part of
the declaration in subroutine \ucode{find_enclosing_rectangle} and
the procedures that perform the min and max operations are specified as subprograms.
\ffreeexample[4.0]{udr}{1}
\ffreeexample[6.0]{udr}{1}
The following example shows the same computation as \example{udr.1} but it illustrates that you can craft complex expressions in the user-defined
reduction declaration. In this case, instead of calling the \ucode{minproc}
and \ucode{maxproc} functions we inline the code in a single expression.
\cexample[4.0]{udr}{2}
\cexample[6.0]{udr}{2}
The corresponding code of the same example in Fortran is very similar
except that the assignment expression in the \kcode{declare reduction}
except that the assignment expression in the \kcode{combiner} clause for
the \kcode{declare reduction}
directive can only be used for a single variable, in this case through
a type structure constructor \ucode{point($\ldots$)}.
\ffreeexample[4.0]{udr}{2}
\ffreeexample[6.0]{udr}{2}
\index{OpenMP variable identifiers!omp_orig@\kcode{omp_orig}}
The following example shows the use of special variables in arguments for
combiner (\kcode{omp_in} and \kcode{omp_out}) and initializer (\kcode{omp_priv}
and \kcode{omp_orig}) routines. This example returns the maximum value of an
array and the corresponding index value. The \kcode{declare reduction}
directive specifies a user-defined reduction operation \ucode{maxloc} for
data type \ucode{struct mx_s}. The function \ucode{mx_combine} is the combiner
and the function \ucode{mx_init} is the initializer.
combiner (\kcode{omp_in} and \kcode{omp_out}) and initializer
(\kcode{omp_priv} and \kcode{omp_orig}) routines. This example returns
the maximum value of an array and the corresponding index value. The
\kcode{declare reduction} directive specifies a user-defined
reduction operation \ucode{maxloc} for data type \ucode{struct mx_s}.
The function \ucode{mx_combine} is the combiner and the function \ucode{mx_init}
is the initializer.
\cexample[4.0]{udr}{3}
\cexample[6.0]{udr}{3}
Below is the corresponding Fortran version of the above example. The
\kcode{declare reduction} directive specifies the user-defined operation
\ucode{maxloc} for user-derived type \ucode{mx_s}. The combiner
\ucode{mx_combine} and the initializer \ucode{mx_init} are specified as
subprograms.
Below is the corresponding Fortran version of the above example.
The \kcode{declare reduction} directive specifies the user-defined
operation \ucode{maxloc} for user-derived type \ucode{mx_s}.
The combiner \ucode{mx_combine} and the initializer \ucode{mx_init} are
specified as subprograms.
\ffreeexample[4.0]{udr}{3}
\ffreeexample[6.0]{udr}{3}
The following example explains a few details of the user-defined reduction
in Fortran through modules. The \kcode{declare reduction} directive is declared in a module (\ucode{data_red}).
in Fortran through modules. The \kcode{declare reduction} directive
is declared in a module (\ucode{data_red}).
The reduction-identifier \ucode{.add.} is a user-defined operator that is
to allow accessibility in the scope that performs the reduction
operation.
The user-defined operator \ucode{.add.} and the subroutine \ucode{dt_init} specified in the \kcode{initializer} clause are defined in the same subprogram.
The user-defined operator \ucode{.add.} and the subroutine \ucode{dt_init}
specified in the \kcode{initializer} clause are defined in the same subprogram.
The reduction operation (that is, the \kcode{reduction} clause) is in the main program.
The reduction identifier \ucode{.add.} is accessible by use association.
@ -101,28 +107,27 @@ has the \kcode{initializer} clause, the subroutine specified on the clause
must be accessible in the current scoping unit. In this case,
the subroutine \ucode{dt_init} is accessible by use association.
\ffreeexample[4.0]{udr}{4}
\ffreeexample[6.0]{udr}{4}
The following example uses user-defined reductions to declare a plus (\kcode{+})
reduction for a C++ class. As the \kcode{declare reduction} directive is inside
the context of the \ucode{V} class the expressions in the \kcode{declare
reduction} directive are resolved in the context of the class. Also, note that
the \kcode{initializer} clause uses a copy constructor to initialize the
private variables of the reduction and it uses as parameter to its original
variable by using the special variable \kcode{omp_orig}.
reduction for a C++ class. As the \kcode{declare reduction} directive
is inside the context of the \ucode{V} class the expressions in the
\kcode{declare reduction} directive are resolved in the context of
the class. Also, note that the \kcode{initializer} clause uses a copy
constructor to initialize the private variables of the reduction and it uses
as parameter to its original variable by using the special variable
\kcode{omp_orig}.
\cppexample[4.0]{udr}{5}
\cppexample[6.0]{udr}{5}
The following examples shows how user-defined reductions can be defined for
some STL containers. The first \kcode{declare reduction} defines the plus
(\kcode{+})
operation for \ucode{std::vector<int>} by making use of the
\ucode{std::transform} algorithm. The second and third define the merge
(or concatenation) operation for \ucode{std::vector<int>} and
\ucode{std::list<int>}.
%It shows how the same user-defined reduction operation can be defined to be done differently depending on the specified data type.
It shows how the user-defined reduction operation can be applied to specific data types of an STL.
some STL containers. The first \kcode{declare reduction} defines the
plus (\kcode{+}) operation for \ucode{std::vector<int>} by making use of the
\ucode{std::transform} algorithm. The second and third define the merge (or
concatenation) operation for \ucode{std::vector<int>} and \ucode{std::list<int>}.
It shows how the user-defined reduction operation can be applied to specific
data types of an STL.
\cppexample[4.0]{udr}{6}
\cppexample[6.0]{udr}{6}

View File

@ -1,4 +1,5 @@
%\pagebreak
\begin{cppspecific}[4ex]
\section{C++ Virtual Functions}
\label{sec:virtual_functions}
@ -31,7 +32,8 @@ That is, the behavior of the implicit map of \ucode{ar}
is non-conforming -- its static type \ucode{A} doesn't match its dynamic type \ucode{D}.
Hence the behavior of the access to the virtual functions is unspecified.
\cppexample[5.2]{virtual_functions}{1}
\topmarker{C++}
\cppnexample[5.2]{virtual_functions}{1}
The second example illustrates the restriction:
@ -47,4 +49,6 @@ In the second case, the object \ucode{ap} is instantiated on the host; access of
the next \kcode{target} region is permitted. (Unified Shared Memory is
used here to minimize mapping concerns.)
\cppexample[5.2]{virtual_functions}{2}
\topmarker{C++}
\cppnexample[5.2]{virtual_functions}{2}
\end{cppspecific}

View File

@ -11,7 +11,7 @@
\index{directives!begin declare target@\kcode{begin declare target}}
\index{begin declare target directive@\kcode{begin declare target} directive}
\ccppspecificstart
\begin{ccppspecific}
A pointer variable can be shaped to a multi-dimensional array to facilitate
data access. This is achieved by a \plc{shape-operator} casted in front of
a pointer (lvalue expression):
@ -35,13 +35,15 @@ around the shape-operator and \ucode{a} to ensure the correct precedence
over array-section operations.
\cnexample[5.1]{array_shaping}{1}
\ccppspecificend
\end{ccppspecific}
%\clearpage
\begin{fortranspecific}
The shape operator is not defined for Fortran. Explicit array shaping
of procedure arguments can be used instead to achieve a similar goal.
Below is the Fortran equivalent of the above example that illustrates
the support of transferring two rows of noncontiguous boundary
data in the \kcode{target update} directive.
\ffreeexample[5.2]{array_shaping}{1}
\ffreenexample[5.2]{array_shaping}{1}
\end{fortranspecific}

View File

@ -0,0 +1,11 @@
\subsection{Conditionally Asynchronous \kcode{target} Using the \kcode{nowait} Clause}
\label{subsec:async_target_nowait_arg}
\index{target construct@\kcode{target} construct!nowait clause@\kcode{nowait} clause}
\index{nowait clause@\kcode{nowait} clause}
\index{clauses!nowait@\kcode{nowait}}
In OpenMP 6.0, \kcode{nowait} takes an OpenMP logical type argument to specify if the generated \plc{task} is an included task or a deferred task. In the following example, the \kcode{nowait} clause is used with an argument on the \kcode{target} directive. In a practical situation, the value of \ucode{is_deferred} can be chosen based on the time taken for some work on host or device that can be performed asynchronously after the target task is scheduled. If the target task is deferred, it must be synchronized by a \kcode{taskwait} before the value of \ucode{x} is used. Prior to 6.0, the same effect would require the use of a \plc{metadirective} or an \bcode{if-else} statement that duplicates the \kcode{target} construct.
\cexample[6.0]{async_target}{5}
\ffreeexample[6.0]{async_target}{5}

View File

@ -95,7 +95,7 @@ end of the affected declarations, as introduced in OpenMP 5.1.
The \kcode{begin declare target} directive was defined
to symmetrically complement the terminating (``end'') directive.
\cppspecificstart
\begin{cppspecific}
The example also shows 3 different ways to use a \kcode{declare target} directive for a
class and an external member-function definition (for the \ucode{XOR1}, \ucode{XOR2},
@ -138,10 +138,10 @@ separately and linking them, will create appropriate executable device functions
\smallskip
\cppnexample[5.1]{declare_target}{2b_main}[1]
%\cppspecificend
%\end{cppspecific}
\topmarker{C++}
%\cppspecificstart
%\begin{cppspecific}
The following example shows how the \kcode{begin declare target} and \kcode{end declare target} directives are used to enclose the declaration
of a variable \ucode{varY} with a class type \ucode{typeY}.
%Prior to OpenMP 5.0, the member function \code{typeY::foo()} cannot
@ -157,7 +157,7 @@ and will successfully execute the function on the device. See previous examples
%as if it were included in list or block of a declare target directive,
\cppnexample[5.1]{declare_target}{2c}
\cppspecificend
\end{cppspecific}
\subsection{Declare Target Directive for Variables}
\label{subsec:declare_target_variables}

View File

@ -0,0 +1,65 @@
\pagebreak
\section{Traits for Specifying Devices}
\label{sec:device_env_traits}
\index{environment variables!OMP_AVAILABLE_DEVICES@\kcode{OMP_AVAILABLE_DEVICES}}
\index{OMP_AVAILABLE_DEVICES@\kcode{OMP_AVAILABLE_DEVICES}}
\index{environment variables!OMP_DEFAULT_DEVICE@\kcode{OMP_DEFAULT_DEVICE}}
\index{OMP_DEFAULT_DEVICE@\kcode{OMP_DEFAULT_DEVICE}}
Environment variables \kcode{OMP_AVAILABLE_DEVICES} and
\kcode{OMP_DEFAULT_DEVICE} can take traits to specify the available
devices and the default device, respectively.
In addition, \kcode{OMP_DEFAULT_DEVICE} can also take an integer
as a device number to specify the default device.
The following examples show how traits are used to specify devices
for these environment variables.
Only GPU non-host devices are available to program:
\begin{boxedcode}
export OMP_AVAILABLE_DEVICES=\ucode{"kind(gpu)"}
\end{boxedcode}
Order of available devices would be all vendor \ucode{A} GPUs, then
the rest of the non-host devices as specified by "\ucode{*}":
\begin{boxedcode}
export OMP_AVAILABLE_DEVICES=\ucode{"kind(gpu)&&vendor(A),*"}
\end{boxedcode}
Available devices would be all non-gpu devices from vendor \ucode{A}:
\begin{boxedcode}
export OMP_AVAILABLE_DEVICES=\ucode{"!kind(gpu)&&vendor(A)"}
\end{boxedcode}
Available devices start with 1 vendor \ucode{A} GPU device, then
2 vendor \ucode{B} GPU devices, and then the rest of the non-host devices:
\begin{boxedcode}
export OMP_AVAILABLE_DEVICES=\ucode{"(kind(gpu)&&vendor(A))[0],}
\ucode{(kind(gpu)&&vendor(B))[0:2],*"}
\end{boxedcode}
The device number range is specified by the C/C++ array section syntax
\ucode{[0:2]} where "\ucode{0}" is the first index and "\ucode{2}"
is the length.
Three available devices are re-ordered with "\ucode{uid-gpu3}" corresponding
to device 0, "\ucode{uid-gpu2}" to device 1 and "\ucode{uid-gpu1}"
to device 2:
\begin{boxedcode}
export OMP_AVAILABLE_DEVICES=\ucode{"uid(uid-gpu3),uid(uid-gpu2),}
\ucode{uid(uid-gpu1)"}
\end{boxedcode}
The default device will be some visible vendor \ucode{A} GPU device.
If not available, then set to initial device:
\begin{boxedcode}
export OMP_DEFAULT_DEVICE=\ucode{"kind(gpu)&&vendor(A),initial"}
\end{boxedcode}
The default device will be some visible vendor \ucode{A} GPU device.
If not available, then set to invalid device so that upon first use of default
device the program will error out:
\begin{boxedcode}
export OMP_DEFAULT_DEVICE=\ucode{"kind(gpu)&&vendor(A),invalid"}
\end{boxedcode}

View File

@ -1,11 +1,11 @@
%\pagebreak
\begin{cppspecific}[4ex]
\section{Lambda Expressions}
\label{sec:lambda_expressions}
\index{lambda expressions}
\cppspecificstart
The following example illustrates the usage of lambda expressions and their
corresponding closure objects within a \kcode{target} region.
@ -48,5 +48,6 @@ results from the \kcode{declare target} directive. The \kcode{always}
modifier is used on the \kcode{map} clause to transfer the updated values for
the structure back to the host device.
\topmarker{C++}
\cppnexample[5.0]{lambda_expressions}{1}
\cppspecificend
\end{cppspecific}

View File

@ -0,0 +1,42 @@
/*
* @@name: async_target.5
* @@type: C
* @@operation: run
* @@expect: success
* @@version: omp_6.0
*/
#include<stdio.h>
#include<stdlib.h>
#include<time.h>
#pragma omp begin declare target
void update(int* num) {
*num = (*num) * 3;
}
#pragma omp end declare target
int main(int argc, char*argv[]){
int x = 2 ;
int is_deferred = time(NULL) % 2;
#pragma omp target nowait(is_deferred) map(tofrom: x)
{
update(&x);
}
// Perform other tasks in parallel while the
// target region is executing
if(is_deferred){
#pragma omp taskwait
}
if( x == 6){
printf("Passed\n");
return 0;
} else {
printf("Failed\n");
return 1;
}
}

View File

@ -0,0 +1,41 @@
! @@name: async_target.5
! @@type: F-free
! @@operation: run
! @@expect: success
! @@version: omp_6.0
program async_target_nowait_arg
implicit none
integer :: x
logical :: is_deferred
real :: rand_no
x = 2
! Determine if computation is deferred
call random_number(rand_no)
is_deferred=mod(int(rand_no*10), 2) == 1
!$omp target map(tofrom: x) nowait(is_deferred)
call update(x)
!$omp end target
! Perform other tasks in parallel while the target region is executing
if (is_deferred) then
!$omp taskwait
endif
if (x == 6) then
stop "Passed"
else
error stop "Failed"
endif
contains
subroutine update(num)
integer, intent(inout) :: num
!$omp declare target
num = num * 3
end subroutine update
end program async_target_nowait_arg

27
devices/sources/teams.7.c Normal file
View File

@ -0,0 +1,27 @@
/*
* @@name: teams.7
* @@type: C
* @@operation: run
* @@expect: success
* @@version: omp_6.0
*/
#include<stdio.h>
#include<omp.h>
int x;
#pragma omp declare target local(x)
int main() {
x = 128;
#pragma omp target
x = 256;
#pragma omp target
#pragma omp teams num_teams(x) // Undefined behavior due to value of "x"
if (omp_get_team_num() == 0){
printf("%d\n", omp_get_num_teams());
}
return 0;
}

View File

@ -0,0 +1,25 @@
! @@name: teams.7
! @@type: F-free
! @@operation: run
! @@expect: success
! @@version: omp_6.0
PROGRAM main
USE omp_lib
INTEGER :: x
!$OMP DECLARE TARGET LOCAL(x)
x = 128
!$OMP TARGET
x = 256
!$OMP END TARGET
!$OMP TARGET
!$OMP TEAMS NUM_TEAMS(x) ! Undefined behavior due to value of 'x'
IF (omp_get_team_num() == 0) THEN
PRINT *, omp_get_num_teams()
END IF
!$OMP END TEAMS
!$OMP END TARGET
END PROGRAM main

View File

@ -1,4 +1,4 @@
! @@name: usm_scalar_ptr_ref_ax.1
! @@name: usm_scalar_ptr_ref_asc.1
! @@type: F-free
! @@operation: compile
! @@expect: success

View File

@ -1,4 +1,5 @@
%\pagebreak
\begin{fortranspecific}[4ex]
\section{Fortran Allocatable Array Mapping}
\label{sec:fort_allocatable_array_mapping}
\index{mapping!allocatable array, Fortran}
@ -36,8 +37,8 @@ or disassociated status, and associated storage can be mapped and attached as ne
For allocatable variables, the update of the allocation status to allocated (allowing
reference to allocated storage) on the device, is similar to pointer attachment.
\ffreeexample[5.1]{target_fort_allocatable_map}{1}
\topmarker{Fortran}
\ffreenexample[5.1]{target_fort_allocatable_map}{1}
Once an allocatable variable has been allocated on the host,
its allocation status may not be changed in a \kcode{target} region, either
@ -48,7 +49,8 @@ and allocation) in a \kcode{target} region is not conforming.
Also, an initial intrinsic assignment of an allocatable variable
requires deallocation before the \kcode{target} region ends.
\ffreeexample[5.1]{target_fort_allocatable_map}{2}
\topmarker{Fortran}
\ffreenexample[5.1]{target_fort_allocatable_map}{2}
\newpage
The next example illustrates a corner case of this restriction (allocatable status
@ -62,4 +64,5 @@ the compiler will deallocate the associated actual argument when the subroutine
(However, the allocation on procedure entry can be avoided by specifying the intent
as \bcode{intent(inout)}, making the intended use conforming.)
\ffreeexample[5.1]{target_fort_allocatable_map}{3}
\ffreenexample[5.1]{target_fort_allocatable_map}{3}
\end{fortranspecific}

View File

@ -64,7 +64,7 @@ each primary thread's private copy of \ucode{sum} is reduced into the final \uco
implicitly mapped into the \kcode{target} region.
\cexample[4.0]{teams}{2}
\clearpage
%\clearpage
\ffreeexample[4.0]{teams}{2}
@ -144,3 +144,13 @@ thread uses SIMD parallelism.
\ffreeexample[4.0]{teams}{6}
\subsection{Evaluation of \kcode{num_teams} Clause that Appears inside \kcode{target} Region}
\label{subsec:target_teams_num_teams}
The following example shows the evaluation of the \kcode{num_teams} clause when the \kcode{teams} construct is closely nested inside \kcode{target} construct. The code is non-conforming since value of \ucode{x} for the clause may be different from different devices. As of OpenMP 6.0, it is the user's responsibility to ensure identical values for the clause expression for nested as well as combined directive cases for \kcode{target} and \kcode{teams} constructs. This permits implementations to evaluate the \kcode{num_teams} argument on the host rather than the target device. For the program to be conforming, the program must update the host value so that \ucode{x} will have the same value when evaluated on the host or target device.
\cexample[6.0]{teams}{7}
\ffreeexample[6.0]{teams}{7}

View File

@ -14,6 +14,7 @@ unified shared memory (USM) is required throughout the scope of the program by t
\kcode{unified_shared_memory} clause in a \kcode{requires} directive.
USM assumes a unified address space.
\begin{cppspecific}
In the C++ code of the first example, a scalar (\ucode{x}), a pointer
(\ucode{ptr}), and a reference (\ucode{ref}) are used in a \kcode{target} construct in Cases 1, 2 and 3, respectively.
For the scalar variable \ucode{x}, the predetermined data-sharing attribute is still
@ -24,6 +25,10 @@ the \kcode{target} construct, as seen in Case 2.
For the reference \ucode{ref}, the object to which it refers is mapped for
the \kcode{target} construct, as seen in Case 3.
\cppnexample[5.2]{usm_scalar_ptr_ref_asc}{1}
\end{cppspecific}
\begin{fortranspecific}
In Case 1 of the Fortran example, the scalar \ucode{x} is firstprivate under the USM requirement
in the \kcode{target} construct, and modification of the local variable on the device is
never updated to the host data environment.
@ -35,5 +40,5 @@ but implicitly mapped. Hence, updates to the value of \ucode{y} appear in the h
%Hence, updates to \ucode{y} in the \kcode{target} construct appear in the data environment of the host.
%\pagebreak
\cppexample[5.2]{usm_scalar_ptr_ref_asc}{1}
\ffreeexample[5.2]{usm_scalar_ptr_ref_asc}{1}
\ffreenexample[5.2]{usm_scalar_ptr_ref_asc}{1}
\end{fortranspecific}

View File

@ -1,22 +1,23 @@
\section{C++ Attributes}
\begin{ccppspecific}[4ex]
\section{C/C++ Attributes}
\label{sec:attributes}
\index{directive syntax!attribute, C++}
\index{attribute syntax, C++}
\index{directive syntax!attribute, C/C++}
\index{attribute syntax, C/C++}
OpenMP directives for C++ can also be specified with
%the implementation-defined
the \kcode{directive} extension for the C++11 standard \plc{attributes}.
OpenMP directives for C/C++ can also be specified with
the \kcode{directive} extension for the C23 and C++11 standard \plc{attributes}.
%https://en.cppreference.com/w/cpp/language/attributes
The C++ example below shows two ways to parallelize a \bcode{for} loop using the \kcode{\#pragma} syntax.
The example below shows two ways to parallelize a \bcode{for} loop using the \kcode{\#pragma} syntax.
The first pragma uses the combined \kcode{parallel for} directive, and the second
applies the uncombined closely nested directives, \kcode{parallel} and \kcode{for}, directly to the same statement.
These are labeled PRAG 1-3.
Using the attribute syntax, the same construct in PRAG 1
is applied two different ways in attribute form, as shown in the ATTR 1 and ATTR 2 sections.
is applied in two different ways in attribute form, as shown in the ATTR 1 and ATTR 2 sections.
In ATTR 1 the attribute syntax is used with the \kcode{omp ::} namespace form.
In ATTR 2 the attribute syntax is used with the \kcode{using omp :} namespace form.
In ATTR 2 the attribute syntax is used with the \kcode{using omp :} namespace
form available for C++ only.
Next, parallelization is attempted by applying directives using two different syntaxes.
For ATTR 3 and PRAG 4, the loop parallelization will fail to compile because multiple directives that
@ -53,4 +54,33 @@ form of the \kcode{simd} directive is used for loops calling the \ucode{Q} funct
in combination with the attribute form of the \kcode{declare simd}
directives declaring the variants for \ucode{Q}.
\cppexample[5.1]{directive_syntax_attribute}{1}
\topmarker{C/C++}
\cppnexample[6.0]{directive_syntax_attribute}{1}
\topmarker{C/C++}
The following code snippets show how to use the \kcode{omp::decl} attribute
as an alternative way for specifying declarative directives.
The \kcode{omp::decl} attribute can be embedded in the base
language declarations as shown for variables in Cases 1 and 2,
for function in Case 3, and for C++ template in Case 4.
The variable and function name lists are implied from where
the attributes are specified.
In Case 1, the prefix attribute applies
to all variables (\ucode{u} and \ucode{v}) in the declaration;
in Case 2, the postfix attribute applies to the associated variable
(\ucode{a} as the directive argument for the \kcode{declare_target} directive,
and \ucode{b} as the clause argument for the \kcode{link} clause
on \kcode{declare_target});
in Case 3, the prefix attribute applies to the function (\ucode{f}).
The comma to separate directive name (\kcode{declare_target}) and
clause name (\kcode{link}) in
the \kcode{omp::decl} attribute specifier in Case 2 is optional.
Case 4 shows the use of \kcode{omp::decl(declare_target)} for
a C++ template function definition
and its equivalent using the delimited
\kcode{begin}/\kcode{end declare_target} pragma form.
\cppnexample[6.0]{directive_syntax_attribute}{2}
\end{ccppspecific}

View File

@ -1,4 +1,5 @@
%\pagebreak
\pagebreak
\begin{fortranspecific}[4ex]
\section{Fortran Comments (Fixed Source Form)}
\label{sec:fortran_fixed_format_comments}
\index{directive syntax!fixed form, Fortran}
@ -16,5 +17,6 @@ two separate directives.
Here, an \kcode{end} directive (\kcode{end parallel}) must be specified to demarcate the range (region)
of the \kcode{parallel} directive.
\fexample{directive_syntax_F_fixed_comment}{1}
\fnexample{directive_syntax_F_fixed_comment}{1}
\end{fortranspecific}
\clearpage

View File

@ -1,4 +1,5 @@
%\pagebreak
\begin{fortranspecific}[4ex]
\section{Fortran Comments (Free Source Form)}
\label{sec:fortran_free_format_comments}
\index{directive syntax!free form, Fortran}
@ -18,7 +19,7 @@ two separate directives.
Here, an \kcode{end} directive (\kcode{end parallel}) must be specified to demarcate the range (region)
of the \kcode{parallel} directive.
\ffreeexample{directive_syntax_F_free_comment}{1}
\ffreenexample{directive_syntax_F_free_comment}{1}
\clearpage
As of OpenMP 5.1, \bcode{block} and \bcode{end block} statements can be used to designate
@ -28,7 +29,8 @@ block structure and are hence private.
It was necessary to explicitly declare the \ucode{i} variable, due to the \bcode{implicit none} statement;
it could have also been declared outside the structured block.
\ffreeexample[5.1]{directive_syntax_F_block}{1}
\topmarker{Fortran}
\ffreenexample[5.1]{directive_syntax_F_block}{1}
A Fortran \bcode{BLOCK} construct may eliminate the need for a paired \kcode{end} directive for an OpenMP construct,
as illustrated in the following example.
@ -48,4 +50,6 @@ a strictly structured block of an OpenMP construct is treated as the terminating
of that construct.
The next \kcode{end parallel} directive is required to terminate the outer \kcode{parallel} construct.
\ffreeexample[5.1]{directive_syntax_F_block}{2}
\topmarker{Fortran}
\ffreenexample[5.1]{directive_syntax_F_block}{2}
\end{fortranspecific}

View File

@ -1,4 +1,5 @@
%\pagebreak
\begin{ccppspecific}[4ex]
\section{C/C++ Pragmas}
\label{sec:pragmas}
\index{directive syntax!pragma, C/C++}
@ -20,4 +21,5 @@ two separate directives. The executable directives above all apply to the next
statement. The \kcode{parallel} directive can be applied to a \plc{structured block}
as shown in PRAG 5.
\cexample{directive_syntax_pragma}{1}
\cnexample{directive_syntax_pragma}{1}
\end{ccppspecific}

View File

@ -3,7 +3,7 @@
* @@type: C++
* @@operation: run
* @@expect: success
* @@version: omp_5.1
* @@version: omp_6.0
*/
#include <stdio.h>
#include <omp.h>
@ -77,5 +77,5 @@ int main() {
// OUTPUT: thrd no 2
// OUTPUT: thrd no 3
// repeated 3 time:
// repeated 3 times:
// OUTPUT: 656700.000000

View File

@ -0,0 +1,36 @@
/*
* @@name: directive_syntax_attribute.2
* @@type: C++
* @@operation: view
* @@expect: none
* @@version: omp_6.0
*/
// Case 1
[[ omp::decl(threadprivate) ]] int u, v;
// equivalent to
int u ,v;
#pragma omp threadprivate(u, v)
// Case 2
int a[100] [[ omp::decl(declare_target) ]],
b[100] [[ omp::decl(declare_target, link) ]];
// equivalent to
int a[100], b[100];
#pragma omp declare_target(a)
#pragma omp declare_target link(b)
// Case 3
[[ omp::decl(declare_target) ]] void f( int c );
// equivalent to
void f( int c );
#pragma omp declare_target(f)
// Case 4
template<typename T>
[[ omp::decl(declare_target) ]]
void foo(T);
// equivalent to
#pragma omp begin declare_target
template<typename T>
void foo(T);
#pragma omp end declare_target

View File

@ -29,15 +29,18 @@ prior to OpenMP version 3.0, such as
Language markers may be used to indicate text or codes that are specific
to a particular base language.
\ccppspecificstart
\begin{ccppspecific}
This is C/C++ specific:
A statement following a directive is compound only when necessary, and a
non-compound statement is indented with respect to a directive preceding it.
\ccppspecificend
\fortranspecificstart
\end{ccppspecific}
\begin{fortranspecific}
This is Fortran specific...
\fortranspecificend
\end{fortranspecific}
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
This marks the continuation of language specific page.
\medskip
Throughout the examples document we assume that the number of threads
used for a \kcode{parallel} region is the same as
the number of threads requested, unless explicitly specified otherwise.

View File

@ -0,0 +1,150 @@
\pagebreak
\section{\kcode{apply} Clause}
\label{sec:apply_clause}
\index{unroll construct@\kcode{unroll} construct!apply clause@\kcode{apply} clause}
\index{tile construct@\kcode{tile} construct!apply clause@\kcode{apply} clause}
\index{apply clause@\kcode{apply} clause}
\index{clauses!apply@\kcode{apply}}
A loop transformation construct can be applied to another nested
loop transformation construct, but the application of the ``outer'' transformation
is limited to the outermost generated loop of the ``inner'' transformation.
The \kcode{apply} clause on a loop transformation construct can specify additional
loop transformation directives that apply to generated loops other than the outermost one.
Clause modifiers are used to specify which generated loop to target.
Also, an applied directive within a clause may specify another \kcode{apply} clause.
%The \code{apply} clause on a loop transformation construct can specify (other)
%loop transformation directives to be applied to its transformation.
%Clause modifiers can be used to target specific generated loops, providing a mechanism
%to overcome the restriction of applying a transformation immediately to the next loop
%transformation construct. Also, an applied directive within a clause may be another
%\code{apply} clause.
Any nested loop transformation constructs including any constructs that
result from \kcode{apply} clauses of nested constructs are replaced before any enclosing
loop transformation construct. This is referred to as the \plc{innermost-first order}
here.
\subsection{Syntax and Effect}
In the example below, the \ucode{construct_unroll} and \ucode{apply_unroll} functions
illustrate the syntax for two equivalent means of applying the \kcode{unroll} loop transformation
directive to the outermost generated (grid) loop of the \kcode{tile} construct transformation.
In function \ucode{construct_unroll}, the tile transformation creates the generated (tiled) loops
and then the \kcode{unroll} construct is applied to outermost loop of the replacement.
In the \ucode{apply_unroll} function, the \kcode{apply} clause on the \kcode{tile} construct
is used to apply an \kcode{unroll} transformation on the \plc{grid} loop (the outermost loop
of the tile transformation) as specified by the \kcode{grid} modifier.
\cexample[6.0]{apply_syntax}{1}
\ffreeexample[6.0]{apply_syntax}{1}
For the two functions in the previous example,
the \ucode{equivalent} function in the next example shows an equivalent
code that a user could have written without using the \kcode{tile} construct
or \kcode{apply} clause.
\cexample[5.1]{apply_syntax_equivalent}{1}
\ffreeexample[5.1]{apply_syntax_equivalent}{1}
The following example shows how multiple loop transformation directives
can be applied to different generated loops resulting from a loop transformation.
For the 4x4 \kcode{tile} construct there will be two (outer) \plc{grid} loops and two (inner) \plc{intra-tile} loops.
The first \kcode{apply} clause specifies that the two \plc{grid} loops are to have an \kcode{interchange} directive and a \kcode{nothing} directive
(just a placeholder to indicate no directive application) applied to the grid (two outermost) loops.
Directives, read from left to right, are applied to the \plc{grid} loops, from outermost to innermost, respectively.
The second \kcode{apply} clause specifies that the two \plc{intratile} loops are to have \kcode{nothing} and \kcode{interchange} directives applied to the
last two \plc{tile} loops, respectively.
Note that the \ucode{A} array dimensions are \ucode{A[100][100][3]} and \ucode{A(0:2,0:99,0:99)}
in the C/C++ and Fortran codes to illustrate equivalent sequential memory access for the
\ucode{i}, \ucode{j} and \ucode{k} loops.
\index{interchange directive@\kcode{interchange} directive}
\index{directives!interchange@\kcode{interchange}}
\index{nothing directive@\kcode{nothing} directive}
\index{directives!nothing@\kcode{nothing}}
\cexample[6.0]{apply_syntax}{2}
\pagebreak
\ffreeexample[6.0]{apply_syntax}{2}
For the function in the previous example,
the \ucode{equivalent} function in the next example shows a possible
equivalent tile replacement code (\kcode{tile} generated loops) and the
appropriately positioned \kcode{interchange} and \kcode{nothing} directives.
\cexample[6.0]{apply_syntax_equivalent}{2}
\pagebreak
\ffreeexample[6.0]{apply_syntax_equivalent}{2}
\index{tile construct@\kcode{tile} construct!apply clause@\kcode{apply} clause}
\index{grid modifier@\kcode{grid} modifier}
\index{intratile modifier@\kcode{intratile} modifier}
The following example illustrates the use of \kcode{apply} clause
modifiers with argument. The index of the generated loop instead of
a positional location can be used for the applied-directive.
The \kcode{grid(1)} modifier indicates the first grid loop
generated by the \kcode{tile} directive
and the \kcode{intratile(2)} modifier indicates the second tile loop
generated by the \kcode{tile} directive.
\cexample[6.0]{apply_syntax}{3}
\pagebreak
\ffreeexample[6.0]{apply_syntax}{3}
Without the index arguments, the \kcode{nothing} argument would
be needed as a placeholder, as illustrated by the equivalent codes
of the above example as follows.
\cexample[6.0]{apply_syntax_equivalent}{3}
\pagebreak
\ffreeexample[6.0]{apply_syntax_equivalent}{3}
\subsection{Spanning Loop Associations}
It is possible for a loop transformation directive to be applied to multiple generated loops,
and multiple directives applied to the same generated loop.
The latter is illustrated in the this example.
\cexample[6.0]{apply_span}{1}
\ffreeexample[6.0]{apply_span}{1}
In this example, the functions show successive steps in the application of
the previous loop transformation example as equivalent user-written code.
First, the tiling is applied in the \ucode{step1} function.
Next, loop transformations in the generated loop nest are replaced according to the innermost-first order rule.
Applying the innermost transformation, loop reversal, results in the loop nest in \ucode{step2}.
After that, the inner tile directive is applied in the \ucode{step3} function.
\index{reverse directive@\kcode{reverse} directive}
\index{directives!reverse@\kcode{reverse}}
\cexample[6.0]{apply_span_equivalent}{1}
\ffreeexample[6.0]{apply_span_equivalent}{1}
\subsection{Nested apply}
The following example illustrates how multiple loop transformations can be chained by nesting \kcode{apply} clauses.
In the \ucode{nested_apply} function, a loop is first tiled, then the intra-tile
loop is unrolled, and finally the iteration order of the unrolled loop is reversed.
For C/C++ codes, reversing a loop with an unsigned type index may cause the compiler
to ensure that underflow is handled correctly.
\cexample[6.0]{apply_nested}{1}
\ffreeexample[6.0]{apply_nested}{1}
In this example the \ucode{step1}, \ucode{step2} and \ucode{step3}
functions are all equivalent to the \ucode{nested_apply} function, but illustrate
a possible chain of transformations but done manually by a user.
\cexample[6.0]{apply_nested_equivalent}{1}
\ffreeexample[6.0]{apply_nested_equivalent}{1}

View File

@ -0,0 +1,14 @@
/*
* @@name: apply_nested.1
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_6.0
*/
void nested_apply(double A[100])
{
#pragma omp tile sizes(10) \
apply(intratile: unroll partial(2) apply(reverse))
for (int i = 0; i < 100; ++i)
A[i] = A[i] + 1;
}

View File

@ -0,0 +1,15 @@
! @@name: apply_nested.1
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_6.0
subroutine nested_apply(A)
implicit none
double precision :: A(0:99)
integer :: i
!$omp tile sizes(10) apply(intratile: unroll partial(2) apply(reverse))
do i = 0, 99
A(i) = A(i) + 1
enddo
end subroutine

View File

@ -0,0 +1,39 @@
/*
* @@name: apply_nested_equivalent.1
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_6.0
*/
void step1(double A[100])
{
for (int i1 = 0; i1 < 10; ++i1)
#pragma omp unroll partial(2) apply(reverse)
for (int i2 = 0; i2 < 10; ++i2) {
int i = i1 * 10 + i2;
A[i] = A[i] + 1;
}
}
void step2(double A[100])
{
for (int i1 = 0; i1 < 10; ++i1)
#pragma omp reverse
for (int i2 = 0; i2 < 5; ++i2) {
int i = i1 * 10 + i2 * 2;
A[i] = A[i] + 1;
++i;
A[i] = A[i] + 1;
}
}
void step3(double A[100])
{
for (int i1 = 0; i1 < 10; ++i1)
for (int i2 = 4; i2 >= 0; --i2) {
int i = i1 * 10 + i2 * 2;
A[i] = A[i] + 1;
++i;
A[i] = A[i] + 1;
}
}

View File

@ -0,0 +1,46 @@
! @@name: apply_nested_equivalent.1
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_6.0
subroutine step1(A)
implicit none
double precision :: A(0:99)
integer :: i,i1,i2
do i1 = 0, 9
!$omp unroll partial(2) apply(reverse)
do i2 = 0, 9
i = i1 * 10 + i2
A(i) = A(i) + 1
enddo; enddo
end subroutine
subroutine step2(A)
implicit none
double precision :: A(0:99)
integer :: i,i1,i2
do i1 = 0, 9
!$omp reverse
do i2 = 0, 4
i = i1 * 10 + i2 * 2
A(i) = A(i) + 1
i = i + 1
A(i) = A(i) + 1
enddo; enddo
end subroutine
subroutine step3(A)
implicit none
double precision :: A(0:99)
integer :: i,i1,i2
do i1 = 0, 9
do i2 = 4, 0, -1
i = i1 * 10 + i2 * 2
A(i) = A(i) + 1
i = i + 1
A(i) = A(i) + 1
enddo; enddo
end subroutine

View File

@ -0,0 +1,16 @@
/*
* @@name: apply_span.1
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_6.0
*/
void span_apply(double A[128][128])
{
#pragma omp for collapse(2)
#pragma omp tile sizes(16,16) \
apply(grid: interchange,reverse)
for (int i = 0; i < 128; ++i)
for (int j = 0; j < 128; ++j)
A[i][j] = A[i][j] + 1;
}

View File

@ -0,0 +1,18 @@
! @@name: apply_span.1
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_6.0
subroutine span_apply( A )
implicit none
double precision :: A(0:127,0:127)
integer :: i , j
!$omp for collapse(2)
!$omp tile sizes(16,16) apply(grid: interchange,reverse)
do i = 0, 127
do j = 0, 127
A(j,i) = A(j,i) + 1
enddo; enddo
end subroutine

View File

@ -0,0 +1,52 @@
/*
* @@name: apply_span_equivalent.1
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_6.0
*/
void step1(double A[128][128])
{
#pragma omp for collapse(2)
#pragma omp interchange
for (int i1 = 0; i1 < 8; ++i1)
#pragma omp reverse
for (int j1 = 0; j1 < 8; ++j1)
for (int i2 = 0; i2 < 16; ++i2)
for (int j2 = 0; j2 < 16; ++j2) {
int i = i1 * 16 + i2;
int j = j1 * 16 + j2;
A[i][j] = A[i][j] + 1;
}
}
void step2(double A[128][128])
{
#pragma omp for collapse(2)
#pragma omp interchange
for (int i1 = 0; i1 < 8; ++i1)
for (int j1 = 7; j1 >= 0; --j1)
for (int i2 = 0; i2 < 16; ++i2)
for (int j2 = 0; j2 < 16; ++j2) {
int i = i1 * 16 + i2;
int j = j1 * 16 + j2;
A[i][j] = A[i][j] + 1;
}
}
void step3(double A[128][128])
{
#pragma omp for collapse(2)
for (int j1 = 7; j1 >= 0; --j1)
for (int i1 = 0; i1 < 8; ++i1)
for (int i2 = 0; i2 < 16; ++i2)
for (int j2 = 0; j2 < 16; ++j2) {
int i = i1 * 16 + i2;
int j = j1 * 16 + j2;
A[i][j] = A[i][j] + 1;
}
}

View File

@ -0,0 +1,64 @@
! @@name: apply_span_equivalent.1
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_6.0
subroutine step1(A)
implicit none
double precision :: A(0:127, 0:127)
integer :: i,i1,i2, j,j1,j2
!$omp do collapse(2)
!$omp interchange
do i1 = 0, 7
!$omp reverse
do j1 = 0, 7
do i2 = 0, 15
do j2 = 0, 15
i = i1 * 16 + i2
j = j1 * 16 + j2
A(j,i) = A(j,i) + 1
enddo; enddo
enddo; enddo
end subroutine
subroutine step2(A)
implicit none
double precision :: A(0:127, 0:127)
integer :: i,i1,i2, j,j1,j2
!$omp do collapse(2)
!$omp interchange
do i1 = 0, 7
do j1 = 7, 0, -1
do i2 = 0, 15
do j2 = 0, 15
i = i1 * 16 + i2
j = j1 * 16 + j2
A(j,i) = A(j,i) + 1
enddo; enddo
enddo; enddo
end subroutine
subroutine step3(A)
implicit none
double precision :: A(0:127, 0:127)
integer :: i,i1,i2, j,j1,j2
!$omp do collapse(2)
do j1 = 7, 0, -1
do i1 = 0, 7
do i2 = 0, 15
do j2 = 0, 15
i = i1 * 16 + i2
j = j1 * 16 + j2
A(j,i) = A(j,i) + 1
enddo; enddo
enddo; enddo
end subroutine

View File

@ -0,0 +1,21 @@
/*
* @@name: apply_syntax.1
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_6.0
*/
void construct_unroll(double A[100])
{
#pragma omp unroll
#pragma omp tile sizes(4)
for (int i = 0; i < 100; ++i)
A[i] = A[i] + 1;
}
void apply_unroll(double A[100])
{
#pragma omp tile sizes(4) apply(grid: unroll)
for (int i = 0; i < 100; ++i)
A[i] = A[i] + 1;
}

View File

@ -0,0 +1,27 @@
! @@name: apply_syntax.1
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_6.0
subroutine construct_unroll(A)
implicit none
integer :: i
double precision :: A(0:99)
!$omp unroll
!$omp tile sizes(4)
do i = 0, 99
A(i) = A(i) + 1
end do
end subroutine
subroutine apply_unroll(A)
implicit none
integer :: i
double precision :: A(0:99)
!$omp tile sizes(4) apply(grid: unroll)
do i = 0, 99
A(i) = A(i) + 1
end do
end subroutine

View File

@ -0,0 +1,19 @@
/*
* @@name: apply_syntax.2
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_6.0
*/
void apply_assoc(double A[100][100][3])
{
#pragma omp tile sizes(4,4) \
apply( grid: interchange,nothing) \
apply(intratile: nothing,interchange)
for (int i = 0; i < 100; ++i)
for (int j = 0; j < 100; ++j)
// k loop not associated with tile, but with interchange
for (int k = 0; k < 3; ++k)
A[i][j][k] = A[i][j][k] + 1;
}

View File

@ -0,0 +1,21 @@
! @@name: apply_syntax.2
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_6.0
subroutine apply_assoc(A)
implicit none
double precision :: A(0:2, 0:99, 0:99)
integer :: k, j, i
!$omp tile sizes(4,4) &
!$omp& apply( grid: interchange, nothing) &
!$omp& apply(intratile: nothing, interchange)
do i = 0, 99
do j = 0, 99
do k = 0, 2 !! k loop not associated with tile, but w. interchange
A(k,j,i) = A(k,j,i) + 1
enddo
enddo; enddo
end subroutine

View File

@ -0,0 +1,16 @@
/*
* @@name: apply_syntax.3
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_6.0
*/
void apply_complexarg(double A[100*100])
{
#pragma omp tile sizes(4,5) \
apply(grid(1): reverse) \
apply(intratile(2): unroll)
for (int i = 0; i < 100; ++i)
for (int j = 0; j < 100; ++j)
A[i*100+j] += 1;
}

View File

@ -0,0 +1,19 @@
! @@name: apply_syntax.3
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_6.0
subroutine apply_complexarg(A)
implicit none
double precision :: A(100,100)
integer :: i, j
!$omp tile sizes(4,5) &
!$omp& apply(grid(1): reverse) &
!$omp& apply(intratile(2): unroll)
do i = 1, 100
do j = 1, 100
A(j,i) = A(j,i) + 1
end do
end do
end subroutine

View File

@ -0,0 +1,16 @@
/*
* @@name: apply_syntax_equivalent.1
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_5.1
*/
void equivalent(double A[100])
{
#pragma omp unroll
for (int i1 = 0; i1 < 25; ++i1)
for (int i2 = 0; i2 < 4; ++i2) {
int i = i1 * 4 + i2;
A[i] = A[i] + 1;
}
}

View File

@ -0,0 +1,18 @@
! @@name: apply_syntax_equivalent.1
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_5.1
subroutine equivalent(A)
implicit none
double precision :: A(0:99)
integer :: i1,i2, i
!$omp unroll
do i1=0,24
do i2=0, 3
i = i1 * 4 + i2
A(i) = A(i) + 1
enddo; enddo
end subroutine

View File

@ -0,0 +1,25 @@
/*
* @@name: apply_syntax_equivalent.2
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_6.0
*/
void equivalent(double A[100][100][3])
{
#pragma omp interchange
for (int i1 = 0; i1 < 25; ++i1)
#pragma omp nothing
for (int j1 = 0; j1 < 25; ++j1)
#pragma omp nothing
for (int i2 = 0; i2 < 4; ++i2)
#pragma omp interchange
for (int j2 = 0; j2 < 4; ++j2)
for (int k = 0; k < 3; ++k) {
int i = i1 * 4 + i2;
int j = j1 * 4 + j2;
A[i][j][k] = A[i][j][k] + 1;
}
}

View File

@ -0,0 +1,29 @@
! @@name: apply_syntax_equivalent.2
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_6.0
subroutine equivalent(A)
implicit none
double precision :: A(0:2, 0:99, 0:99)
integer :: k, j1,j2, i1,i2
!$omp interchange !! grid modifier
do i1 = 0, 24
!$omp nothing !! grid modifier
do j1 = 0, 24
!$omp nothing !! intratile modifier
do i2 = 0, 3
!$omp interchange !! intratile modifier
do j2 = 0, 3
do k = 0, 2
i = i1 * 4 + i2
j = j1 * 4 + j2
A(k,j,i) = A(k,j,i) + 1
enddo
enddo; enddo
enddo; enddo
end subroutine

View File

@ -0,0 +1,27 @@
/*
* @@name: apply_syntax_equivalent.3
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_6.0
*/
void apply_complexarg_equivalent1(double A[100*100])
{
#pragma omp tile sizes(4,5) \
apply(grid: reverse,nothing) \
apply(intratile: nothing,unroll)
for (int i = 0; i < 100; ++i)
for (int j = 0; j < 100; ++j)
A[i*100+j] += 1;
}
void apply_complexarg_equivalent2(double A[100*100])
{
#pragma omp reverse
for (int i1 = 0; i1 < 100; i1+=4) // grid loop 1
for (int j1 = 0; j1 < 100; j1+=5) // grid loop 2
for (int i = i1; i < i1+4; i+=1) // tile loop 1
#pragma omp unroll
for (int j = j1; j < j1+5; j+=1) // tile loop 2
A[i*100+j] += 1;
}

View File

@ -0,0 +1,37 @@
! @@name: apply_syntax_equivalent.3
! @@type: F-free
! @@operation: compile
! @@expect: success
! @@version: omp_6.0
subroutine apply_complexarg_equivalent1(A)
implicit none
double precision :: A(100,100)
integer :: i, j
!$omp tile sizes(4,5) &
!$omp& apply(grid: reverse,nothing) &
!$omp& apply(intratile: nothing,unroll)
do i = 1, 100
do j = 1, 100
A(j,i) = A(j,i) + 1
end do
end do
end subroutine
subroutine apply_complexarg_equivalent2(A)
implicit none
double precision :: A(100,100)
integer :: i, j, i1, j1
!$omp reverse
do i1 = 1, 100, 4 ! grid loop 1
do j1 = 1, 100, 5 ! grid loop 2
do i = i1, i1+3 ! tile loop 1
!$omp unroll
do j = j1, j1+4 ! tile loop 2
A(j,i) = A(j,i) + 1
end do
end do
end do
end do
end subroutine

View File

@ -1,6 +1,6 @@
%\pagebreak
\pagebreak
\begin{fortranspecific}[4ex]
\section{Race Conditions Caused by Implied Copies of Shared Variables in Fortran}
\fortranspecificstart
\label{sec:fort_race}
\index{shared variables!race conditions}
@ -12,6 +12,6 @@ the call and copy from the temporary location into the original variable when th
subroutine returns. This copying would cause races in the \kcode{parallel} region.
\ffreenexample{fort_race}{1}
\fortranspecificend
\end{fortranspecific}

View File

@ -50,7 +50,7 @@
\input{generated-include}
% Text to appear in the footer on even-numbered pages:
\newcommand{\footerText}{OpenMP Examples Version \VER{} - \VERDATE}
\newcommand{\footerText}{OpenMP \langselect Examples Version \VER{} -- \VERDATE}
% Unified style sheet for OpenMP documents:
\input{openmp.sty}

View File

@ -52,20 +52,20 @@
%
% \specref{} % formats the cross-reference "Section X on page Y"
%
% \notestart % black horizontal rule for Notes
% \noteend
% \begin{note} % black horizontal rule for Notes
% \end{note}
%
% \cspecificstart % blue horizontal rule for C-specific text
% \cspecificend
% \begin{cspecific} % blue horizontal rule for C-specific text
% \end{cspecific}
%
% \cppspecificstart % blue horizontal rule for C++ -specific text
% \cppspecificend
% \begin{cppspecific} % blue horizontal rule for C++ -specific text
% \end{cppspecific}
%
% \ccppspecificstart % blue horizontal rule for C / C++ -specific text
% \ccppspecificend
% \begin{ccppspecific} % blue horizontal rule for C/C++ -specific text
% \end{ccppspecific}
%
% \fortranspecificstart % blue horizontal rule for Fortran-specific text
% \fortranspecificend
% \begin{fortranspecific} % blue horizontal rule for Fortran-specific text
% \end{fortranspecific}
%
% \glossaryterm % for use in formatting glossary entries
% \glossarydefstart
@ -302,7 +302,7 @@
% Enable \alltt{} for formatting blocks of code:
\usepackage{alltt}
\usepackage{toolbox} % for \toolboxMakeSplit
\usepackage{toolbox} % for \toolboxReplace
% This sets the default \code{} font to tt (monospace) and bold:
\newcommand\code[1]{\texttt{\textbf{#1}}}
@ -315,20 +315,16 @@
% This is an updated set of macros for code style work
% kcode - keywords, vcode - value, bcode - base language,
% pvar - variables, pout - program outputs
\toolboxMakeSplit*{ }{DoSplitS}\toolboxMakeSplit*{_}{DoSplitU}
\protected\def\DoReplaceU#1{\DoSplitU{#1}\leftutext\rightutext
\leftutext%
\ifthenelse{\isundefined{\rightutext}}{}%
{\_\expandafter\DoReplaceU\expandafter{\rightutext}}}
\protected\def\DoReplaceS#1{\DoSplitS{#1}\leftstext\rightstext
\expandafter\DoReplaceU\expandafter{\leftstext}%
\ifthenelse{\isundefined{\rightstext}}{}%
{\textrm{~}\expandafter\DoReplaceS\expandafter{\rightstext}}}
\newcommand{\myreplacedmt}[1]{\protect\DoReplaceS{#1}}
\newcommand\kcode[1]{\texttt{\bfseries\upshape\myreplacedmt{#1}}}
\newcommand\bcode[1]{\texttt{\mdseries\upshape\myreplacedmt{#1}}}
\newcommand\vcode[1]{\bcode{#1}}
\newcommand\ucode[1]{\texttt{\mdseries\slshape\myreplacedmt{#1}}}
\protected\def\DoReplaceU#1{\def\utexttmp{#1}%
\toolboxReplace{_}{\_}\utexttmp\utexttmp}
\protected\def\myreplacedmt#1#2{\def\stexttmp{#1}%
\toolboxReplace{_}{\_}\stexttmp%
\toolboxReplace{ }{\rmfamily{ }\ttfamily#2}\stexttmp%
{\ttfamily#2\stexttmp}}
\newcommand\kcode[1]{\myreplacedmt{#1}{\bfseries\upshape}}
\newcommand\vcode[1]{\myreplacedmt{#1}{\mdseries\upshape}}
\newcommand\bcode[1]{\kcode{#1}}
\newcommand\ucode[1]{\myreplacedmt{#1}{\mdseries\slshape}}
\newcommand\pvar[1]{\ucode{#1}}
\newcommand\pout[1]{\vcode{#1}}
\newcommand\docref[1]{\textrm{\mdseries\itshape{#1}}}
@ -340,14 +336,24 @@
\newcommand\examplesblob[1]{\href{\examplesrepo/blob/#1}{#1}}
% Environment for a paragraph of literal code, single-spaced, no outline, no indenting:
\newenvironment{codepar}[1]
{\begin{alltt}\bfseries #1}
{\end{alltt}}
\usepackage{listings}
\lstnewenvironment{codepar}{%
}{}
%\newenvironment{codepar}[1]
%{\begin{alltt}\bfseries #1}
%{\end{alltt}}
% For blocks of code inside a box frame:
\newenvironment{boxedcode}[1]
{\vspace{0.25em plus 5em minus 0.25em}\begin{framed}\begin{minipage}[t]{\textwidth}\begin{alltt}\bfseries #1}
{\end{alltt}\end{minipage}\end{framed}\vspace{0.25em plus 5em minus 0.25em}}
\lstnewenvironment{boxedcode}{%
\lstset{framesep=1.2ex,frame=l,framerule=3pt,
backgroundcolor=\color{white!90!black}}}{}
\lstnewenvironment{boxeducode}{%
\lstset{framesep=1.2ex,frame=l,framerule=3pt,
basicstyle=\ttfamily\mdseries\slshape,
backgroundcolor=\color{white!90!black}}}{}
%\newenvironment{boxedcode}[1]
%{\vspace{0.25em plus 5em minus %0.25em}\begin{framed}\begin{minipage}[t]{\textwidth}\begin{alltt}\bfseries #1}
%{\end{alltt}\end{minipage}\end{framed}\vspace{0.25em plus 5em minus 0.25em}}
% This sets the margins in the framed box:
\setlength{\FrameSep}{0.6em}
@ -355,9 +361,39 @@
% For indented lists of verbatim code at a relaxed line spacing,
% e.g., for use after "where clause is one of the following:"
\usepackage{setspace}
\newenvironment{indentedcodelist}{%
\begin{adjustwidth}{0.25in}{}\begin{spacing}{1.5}\begin{alltt}\bfseries}
{\end{alltt}\end{spacing}\vspace{-0.25\baselineskip}\end{adjustwidth}}
\lstnewenvironment{indentedcodelist}{%
\lstset{xleftmargin=0.25in}}{}
%\newenvironment{indentedcodelist}{%
%\begin{adjustwidth}{0.25in}{}\vspace{-0.2\baselineskip}\begin{spacing}{1.2}\beg%in{alltt}\bfseries}
% {\end{alltt}\end{spacing}\vspace{-0.2\baselineskip}\end{adjustwidth}}
\lstdefinestyle{openmp}{
showstringspaces=false,
basicstyle=\ttfamily\bfseries,
linewidth=.99\linewidth,
xleftmargin=0.01\linewidth,
columns=fullflexible,
keepspaces=true,
escapechar=@,
belowskip=\smallskipamount,
aboveskip=\smallskipamount,
morecomment=[l][\color{red}\sout]{\%DIF\ <}, % deleted empty lines
morecomment=[l][\color{blue}\uwave]{\%DIF\ >}, % added empty lines
moredelim=[il][\color{red}\sout]{\%DIF\ <\ }, % deleted lines
moredelim=[il][\color{blue}\uwave]{\%DIF\ >\ }, % added lines
moredelim=**[is][\rmfamily\mdseries\itshape]{\\plc\{}{\}},
moredelim=**[is][\textsubscript]{\\textsubscript\{}{\}},
moredelim=**[is][]{\\textnormal\{}{\}},
moredelim=**[is][\rmfamily\mdseries\itshape]{\\textsl\{}{\}},
moredelim=**[is][\ttfamily\mdseries\slshape]{\\ucode\{}{\}},
moredelim=**[is][\ttfamily\bfseries\upshape]{\\kcode\{}{\}},
moredelim=**[is][]{\\code\{}{\}},
moredelim=**[is][]{\\scode\{}{\}},
moredelim=*[is][\color{red}\sout]{*!----}{----!*},
moredelim=*[is][\color{blue}\uwave]{*!++++}{++++!*},
moredelim=**[is][\mdseries\rmfamily]{\\text\{}{\}},
}
\lstset{style=openmp}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -413,31 +449,61 @@
\newcommand{\VSPb}{\vspace{0.5ex plus 5ex minus 0.25ex}}
\newcommand{\VSPa}{\vspace{0.25ex plus 5ex minus 0.25ex}}
% remove language marker definition if either ccpp or fortran is undefined
\ifthenelse{\boolean{ccpp}\and\boolean{fortran}}{}%
{\renewcommand{\linewitharrows}[4]{\par}}
\newcommand{\langselect}{}
\ifccpp\else\renewcommand{\langselect}{Fortran~}\fi
\iffortran\else\renewcommand{\langselect}{C/C++~}\fi
% C
\ifccpp
\newenvironment{cspecific}[1][0ex]{\vspace{#1}\cspecificstart\vspace{-#1}}{\cspecificend}
\else
\excludecomment{cspecific}
\fi
\newcommand{\cspecificstart}{\needspace{\sbns}\linewitharrows{-1}{solid}{C}{3em}}
\newcommand{\cspecificend}{\linewitharrows{1}{solid}{C}{3em}\bigskip}
% C/C++
\ifccpp
\newenvironment{ccppspecific}[1][0ex]{\vspace{#1}\ccppspecificstart\vspace{-#1}}{\ccppspecificend}
\else
\excludecomment{ccppspecific}
\fi
\newcommand{\ccppspecificstart}{\VSPb\linewitharrows{-1}{solid}{C / C++}{6em}\VSPa}
\newcommand{\ccppspecificend}{\VSPb\linewitharrows{1}{solid}{C / C++}{6em}\VSPa}
% C++
\ifccpp
\newenvironment{cppspecific}[1][0ex]{\vspace{#1}\cppspecificstart\vspace{-#1}}{\cppspecificend}
\else
\excludecomment{cppspecific}
\fi
\newcommand{\cppspecificstart}{\needspace{\sbns}\linewitharrows{-1}{solid}{C++}{6em}}
\newcommand{\cppspecificend}{\linewitharrows{1}{solid}{C++}{6em}\bigskip}
% C90
\newenvironment{cNinetyspecific}{\cNinetyspecificstart}{\cNinetyspecificend}
\newcommand{\cNinetyspecificstart}{\needspace{\sbns}\linewitharrows{-1}{solid}{C90}{4em}}
\newcommand{\cNinetyspecificend}{\linewitharrows{1}{solid}{C90}{4em}\bigskip}
% C99
\newenvironment{cNinetyNinespecific}{\cNinetyNinespecificstart}{\cNinetyNinespecificend}
\newcommand{\cNinetyNinespecificstart}{\needspace{\sbns}\linewitharrows{-1}{solid}{C99}{4em}}
\newcommand{\cNinetyNinespecificend}{\linewitharrows{1}{solid}{C99}{4em}\bigskip}
% Fortran
\iffortran
\newenvironment{fortranspecific}[1][0ex]{\vspace{#1}\fortranspecificstart\vspace{-#1}}{\fortranspecificend}
\else
\excludecomment{fortranspecific}
\fi
\newcommand{\fortranspecificstart}{\VSPb\linewitharrows{-1}{solid}{Fortran}{6em}\VSPa}
\newcommand{\fortranspecificend}{\VSPb\linewitharrows{1}{solid}{Fortran}{6em}\VSPa}
% Note
\newenvironment{note}{\notestart}{\noteend}
\newcommand{\notestart}{\VSPb\notelinewitharrows{-1}{solid}\VSPa}
\newcommand{\noteend}{\VSPb\notelinewitharrows{1}{solid}\VSPa}
@ -486,7 +552,8 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Formats a cross reference label as "Section X on page Y".
\newcommand{\specref}[1]{Section~\ref{#1} on page~\pageref{#1}}
\newcommand{\nspecref}[2]{#1~\ref{#2} on page~\pageref{#2}}
\newcommand{\specref}[1]{\nspecref{Section}{#1}}
% For caption for supertabular and figure, by yanyh15
\captionsetup[table]{labelfont={sf,sc,bf},textfont=normalfont,singlelinecheck=off,labelformat=simple,labelsep=colon,aboveskip=00pt,belowskip=10pt}
@ -501,7 +568,6 @@
% \cppexample formats blue markers, caption, and code for C++ examples
% \fexample formats blue markers, caption, and code for Fortran (fixed) examples
% \ffreeexample formats blue markers, caption, and code for Fortran90 (free) examples
% Thanks to Jin, Haoqiang H. for the original definitions of the following:
\usepackage{color,fancyvrb} % for \VerbatimInput
\usepackage{xargs} % for optional args
@ -542,7 +608,7 @@
\def\fcnt{\the\cnt}
% \def\fcnt{\stagcnt}
\noindent
\hypertarget{ex:\cname}{\textit{Example \ename}}\vername
\underline{\hypertarget{ex:\cname}{\textit{Example \ename}}\vername}
%\vspace*{-3mm}
\code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=\fcnt,fontsize=\small]%
{\chapdirname/sources/\cname}}
@ -569,27 +635,37 @@
}
\newcommandx*\cexample[4][1=,4=0]{%
\needspace{5\baselineskip}\ccppspecificstart
\ifccpp
\needspace{5\baselineskip}\begin{ccppspecific}
\cnexample[#1]{#2}{#3}[#4]
\ccppspecificend
\end{ccppspecific}
\fi
}
\newcommandx*\cppexample[4][1=,4=0]{%
\needspace{5\baselineskip}\cppspecificstart
\ifccpp
\needspace{5\baselineskip}\begin{cppspecific}
\cppnexample[#1]{#2}{#3}[#4]
\cppspecificend
\end{cppspecific}
\fi
}
\newcommandx*\fexample[4][1=,4=0]{%
\needspace{5\baselineskip}\fortranspecificstart
\iffortran
\needspace{5\baselineskip}
\begin{fortranspecific}
\fnexample[#1]{#2}{#3}[#4]
\fortranspecificend
\end{fortranspecific}
\fi
}
\newcommandx*\ffreeexample[4][1=,4=0]{%
\needspace{5\baselineskip}\fortranspecificstart
\iffortran
\needspace{5\baselineskip}
\begin{fortranspecific}
\ffreenexample[#1]{#2}{#3}[#4]
\fortranspecificend
\end{fortranspecific}
\fi
}
\newcommandx*\hexentry[4][1=c,3=]{%

View File

@ -1,9 +1,9 @@
\pagebreak
\begin{fortranspecific}[4ex]
\section{Fortran Restrictions on the \kcode{do} Construct}
\label{sec:fort_do}
\index{constructs!do@\kcode{do}}
\index{do construct@\kcode{do} construct}
\fortranspecificstart
If an \kcode{end do} directive follows a \plc{do-construct} in which several
\bcode{DO} statements share a \bcode{DO} termination statement, then a \kcode{do}
@ -17,6 +17,6 @@ The following example is non-conforming because the matching \kcode{do} directiv
for the \kcode{end do} does not precede the outermost loop:
\fnexample{fort_do}{2}
\fortranspecificend
\end{fortranspecific}

View File

@ -21,10 +21,8 @@ from where the function \ucode{foo} is called. Binding to \kcode{teams} allows t
parallelism to be available for the second \kcode{loop} construct.
The loop iterations can be executed concurrently,
thus allowing implementations to perform various loop nest optimizations including
reordering of the \ucode{i} and \ucode{j} loops. The \kcode{loop} construct can be
implemented using any parallelism-generating mechanism, which allows better use
of hardware resources while also allowing sequential optimizations, reordering,
tiling etc.
reordering of the \ucode{i} and \ucode{j} loops. The \kcode{loop} construct can be implemented
with the use of additional threads or some other concurrency mechanism, which allows better use of hardware resources while also allowing sequential optimizations, reordering, tiling etc.
For example, the first \kcode{loop} construct could be implemented as if it was specified as
\kcode{distribute parallel for} and the second \kcode{loop} construct as if it was specified as

View File

@ -1,12 +1,12 @@
%\pagebreak
\begin{cppspecific}[4ex]
\section{Parallel Random Access Iterator Loop}
\cppspecificstart
\label{sec:pra_iterator}
\index{random access iterator, C++}
The following example shows a parallel random access iterator loop.
\cppnexample[3.0]{pra_iterator}{1}
\cppspecificend
\end{cppspecific}

View File

@ -1,6 +1,6 @@
%\pagebreak
\begin{fortranspecific}[4ex]
\section{\kcode{workshare} Construct}
\fortranspecificstart
\label{sec:workshare}
\index{constructs!workshare@\kcode{workshare}}
\index{workshare construct@\kcode{workshare} construct}
@ -68,6 +68,6 @@ fragment regardless of whether the code is executed sequentially or inside an Op
program with multiple threads:
\fnexample{workshare}{7}
\fortranspecificend
\end{fortranspecific}

View File

@ -18,7 +18,9 @@
Assumption directives provide additional information about the expected properties of
the program that may be used by an implementation for optimization.
Ignoring this information should not alter the behavior of the program. The C/C++ example
Ignoring this information should not alter the behavior of the program.
The C/C++ example
shows the use of delimited scope (Case 1) and block-associated (Case 2) assumption directives.
A similar effect is shown for Fortran where the \kcode{assumes} directive is used in the module (Case 1)
and the block-associated directive uses an \kcode{end assume} termination (Case 2).
@ -32,3 +34,19 @@ could eliminate additional checks.
\cexample[5.1]{assumption}{1}
\ffreeexample[5.1]{assumption}{1}
\pagebreak
In the following example the \kcode{no_openmp} and \kcode{no_parallelism} assumption clauses are used.
The \kcode{no_openmp} clause is shorthand for the \kcode{no_openmp_contructs} and \kcode{no_openmp_routines} clauses.
In Case 1 the \kcode{assume} directive with the \kcode{no_openmp} clause is applied to an external function call \ucode{init}.
Independent of the compiler's ability to derive necessary information about \ucode{init}, the \kcode{assume} directive guarantees
the absence of OpenMP constructs or OpenMP runtime calls so that the compiler may manage hardware and the runtime in a more optimal manner.
In Case 2, the \kcode{assume} directive with \kcode{no_parallelism} is nested inside the \kcode{target teams loop} directive. By providing the information
that no other OpenMP parallelism generating constructs are going to be encountered in the function,
the implementation of \ucode{element_transform} may have an opportunity to optimize the code in the \kcode{loop} construct,
which may now be implemented using all additional threads available or via some other concurrency mechanism.
\cexample[6.0]{assumption}{2}
\ffreeexample[6.0]{assumption}{2}

Some files were not shown because too many files have changed in this diff Show More