Compare commits


11 Commits
v5.0.0 ... main

Author     SHA1        Message                                   Date
Henry Jin  415024c369  Makefile update                           2025-01-05 14:22:19 -08:00
Henry Jin  00bdf88b63  fix transparent task example              2024-11-14 08:08:37 -08:00
Henry Jin  3346a30ce2  v6.0 release                              2024-11-13 11:07:08 -08:00
Henry Jin  11f2efcccf  v5.2.2 release                            2024-04-16 08:59:23 -07:00
Henry Jin  075683d574  more uses_allocators update               2022-11-09 13:11:02 -08:00
Henry Jin  08859e6029  declare target with device_type(nohost)  2022-11-07 10:59:22 -08:00
Henry Jin  03b9a00df9  release 5.2.1                             2022-11-04 09:35:42 -07:00
Henry Jin  a5e3d8b3f2  v5.2 release                              2022-04-18 15:02:25 -07:00
Henry Jin  fb0edc81e7  merge with examples-internal/v5.1         2021-08-17 09:11:55 -07:00
Henry Jin  60e8ece384  cleanup files                             2020-06-26 08:01:16 -07:00
Henry Jin  3052c10566  v5.0.1 release                            2020-06-26 07:54:45 -07:00
889 changed files with 34233 additions and 7954 deletions


@ -1,5 +1,4 @@
\pagebreak
\chapter{SIMD}
\cchapter{SIMD}{SIMD}
\label{chap:simd}
Single instruction, multiple data (SIMD) is a form of parallel execution
@ -9,34 +8,34 @@ The addition of two vectors to form a third vector is a SIMD operation.
Many processors have SIMD (vector) units that can simultaneously perform
2, 4, 8 or more executions of the same operation (by a single SIMD unit).
Loops without loop-carried backward dependency (or with dependency preserved using
ordered simd) are candidates for vectorization by the compiler for
Loops without loop-carried backward dependences (or with dependences preserved using
\kcode{ordered simd}) are candidates for vectorization by the compiler for
execution with SIMD units. In addition, with state-of-the-art vectorization
technology and \code{declare simd} construct extensions for function vectorization
technology and \kcode{declare simd} directive extensions for function vectorization
in the OpenMP 4.5 specification, loops with function calls can be vectorized as well.
The basic idea is that a scalar function call in a loop can be replaced by a vector version
of the function, and the loop can be vectorized simultaneously by combining a loop
vectorization (\code{simd} directive on the loop) and a function
vectorization (\code{declare simd} directive on the function).
vectorization (\kcode{simd} directive on the loop) and a function
vectorization (\kcode{declare simd} directive on the function).
A \code{simd} construct states that SIMD operations be performed on the
A \kcode{simd} construct states that SIMD operations be performed on the
data within the loop. A number of clauses are available to provide
data-sharing attributes (\code{private}, \code{linear}, \code{reduction} and
\code{lastprivate}). Other clauses provide vector length preference/restrictions
(\code{simdlen} / \code{safelen}), loop fusion (\code{collapse}), and data
alignment (\code{aligned}).
data-sharing attributes (\kcode{private}, \kcode{linear}, \kcode{reduction} and
\kcode{lastprivate}). Other clauses provide vector length preference/restrictions
(\kcode{simdlen} / \kcode{safelen}), loop fusion (\kcode{collapse}), and data
alignment (\kcode{aligned}).
The \code{declare simd} directive designates
The \kcode{declare simd} directive designates
that a vector version of the function should also be constructed for
execution within loops that contain the function and have a \code{simd}
directive. Clauses provide argument specifications (\code{linear},
\code{uniform}, and \code{aligned}), a requested vector length
(\code{simdlen}), and designate whether the function is always/never
called conditionally in a loop (\code{branch}/\code{inbranch}).
execution within loops that contain the function and have a \kcode{simd}
directive. Clauses provide argument specifications (\kcode{linear},
\kcode{uniform}, and \kcode{aligned}), a requested vector length
(\kcode{simdlen}), and designate whether the function is always/never
called conditionally in a loop (\kcode{notinbranch}/\kcode{inbranch}).
The latter is for optimizing performance.
Also, the \code{simd} construct has been combined with the worksharing loop
constructs (\code{for simd} and \code{do simd}) to enable simultaneous thread
Also, the \kcode{simd} construct has been combined with the worksharing loop
constructs (\kcode{for simd} and \kcode{do simd}) to enable simultaneous thread
execution in different SIMD units.
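A minimal sketch of combining loop vectorization with function vectorization (the function and array names are invented for illustration and are not taken from the examples sources):

   #include <stdio.h>

   #pragma omp declare simd notinbranch
   float add1(float a, float b)      /* a SIMD (vector) version is also generated */
   {
      return a + b;
   }

   int main(void)
   {
      float x[128], y[128];
      for (int i = 0; i < 128; i++) { x[i] = i; y[i] = 2.0f * i; }

      #pragma omp simd               /* loop vectorization; calls the vector version of add1 */
      for (int i = 0; i < 128; i++)
         y[i] = add1(x[i], y[i]);

      printf("y[1] = %f\n", y[1]);
      return 0;
   }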
%Hence, the \code{simd} construct can be
%used alone on a loop to direct vectorization (SIMD execution), or in
@ -46,3 +45,8 @@ execution in different SIMD units.
%\code{parallel for simd}).
%===== Examples Sections =====
\input{SIMD/SIMD}
\input{SIMD/linear_modifier}


@ -1,9 +1,8 @@
\pagebreak
\chapter{OpenMP Affinity}
\cchapter{OpenMP Affinity}{affinity}
\label{chap:openmp_affinity}
OpenMP Affinity consists of a \code{proc\_bind} policy (thread affinity policy) and a specification of
places (\texttt{"}location units\texttt{"} or \plc{processors} that may be cores, hardware
OpenMP Affinity consists of a \kcode{proc_bind} policy (thread affinity policy) and a specification of
places (``location units'' or \plc{processors} that may be cores, hardware
threads, sockets, etc.).
OpenMP Affinity enables users to bind computations to specific places.
The placement will hold for the duration of the parallel region.
@ -12,13 +11,13 @@ to different cores (hardware threads, sockets, etc.) prescribed within a given p
if two or more cores (hardware threads, sockets, etc.) have been assigned to a given place.
Often the binding can be managed without resorting to explicitly setting places.
Without the specification of places in the \code{OMP\_PLACES} variable,
Without the specification of places in the \kcode{OMP_PLACES} variable,
the OpenMP runtime will distribute and bind threads using the entire range of processors for
the OpenMP program, according to the \code{OMP\_PROC\_BIND} environment variable
or the \code{proc\_bind} clause. When places are specified, the OMP runtime
the OpenMP program, according to the \kcode{OMP_PROC_BIND} environment variable
or the \kcode{proc_bind} clause. When places are specified, the OMP runtime
binds threads to the places according to a default distribution policy, or
those specified in the \code{OMP\_PROC\_BIND} environment variable or the
\code{proc\_bind} clause.
those specified in the \kcode{OMP_PROC_BIND} environment variable or the
\kcode{proc_bind} clause.
In the OpenMP Specifications document a processor refers to an execution unit that
is enabled for an OpenMP thread to use. A processor is a core when there is
@ -27,12 +26,12 @@ SMT is enabled, a processor is a hardware thread (HW-thread). (This is the
usual case; but actually, the execution unit is implementation defined.) Processor
numbers are numbered sequentially from 0 to the number of cores less one (without SMT), or
0 to the number of HW-threads less one (with SMT). OpenMP places use the processor number to designate
binding locations (unless an \texttt{"}abstract name\texttt{"} is used.)
binding locations (unless an ``abstract name'' is used.)
The processors available to a process may be a subset of the system's
processors. This restriction may be the result of a
wrapper process controlling the execution (such as \code{numactl} on Linux systems),
wrapper process controlling the execution (such as \plc{numactl} on Linux systems),
compiler options, library-specific environment variables, or default
kernel settings. For instance, the execution of multiple MPI processes,
launched on a single compute node, will each have a subset of processors as
@ -53,21 +52,21 @@ variables for the MPI library. %Forked threads within an MPI process
%which sets \code{OMP\_PLACES} specifically for the MPI process.
Threads of a team are positioned onto places in a compact manner, a
scattered distribution, or onto the master's place, by setting the
\code{OMP\_PROC\_BIND} environment variable or the \code{proc\_bind} clause to
\plc{close}, \plc{spread}, or \plc{master}, respectively. When
\code{OMP\_PROC\_BIND} is set to FALSE no binding is enforced; and
scattered distribution, or onto the primary thread's place, by setting the
\kcode{OMP_PROC_BIND} environment variable or the \kcode{proc_bind} clause to
\kcode{close}, \kcode{spread}, or \kcode{primary} (\kcode{master} has been deprecated), respectively. When
\kcode{OMP_PROC_BIND} is set to FALSE no binding is enforced; and
when the value is TRUE, the binding is implementation defined to
a set of places in the \code{OMP\_PLACES} variable or to places
defined by the implementation if the \code{OMP\_PLACES} variable
is not set.
a set of places in the \kcode{OMP_PLACES} variable or to places
defined by the implementation if the \kcode{OMP_PLACES} variable
is not set.
The \code{OMP\_PLACES} variable can also be set to an abstract name
(\plc{threads}, \plc{cores}, \plc{sockets}) to specify that a place is
The \kcode{OMP_PLACES} variable can also be set to an abstract name
(\kcode{threads}, \kcode{cores}, \kcode{sockets}) to specify that a place is
either a single hardware thread, a core, or a socket, respectively.
This description of the \code{OMP\_PLACES} is most useful when the
This description of the \kcode{OMP_PLACES} is most useful when the
number of threads is equal to the number of hardware threads, cores
or sockets. It can also be used with a \plc{close} or \plc{spread}
or sockets. It can also be used with a \kcode{close} or \kcode{spread}
distribution policy when the equality doesn't hold.
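For illustration only (place numbering and availability are implementation and system dependent), a team bound with a spread policy can report its placement with the place-query routines:

   #include <stdio.h>
   #include <omp.h>

   int main(void)
   {
      /* e.g., run with:  OMP_PLACES=cores ./a.out  */
      #pragma omp parallel proc_bind(spread) num_threads(4)
      printf("thread %d runs on place %d of %d\n",
             omp_get_thread_num(), omp_get_place_num(), omp_get_num_places());
      return 0;
   }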
@ -116,3 +115,11 @@ distribution policy when the equality doesn't hold.
% thread # 0 * * * * _ _ _ _ _ _ _ _ #mask for thread 0
% thread # 0 _ _ _ _ * * * * _ _ _ _ #mask for thread 1
% thread # 0 _ _ _ _ _ _ _ _ * * * * #mask for thread 2
%===== Examples Sections =====
\input{affinity/affinity}
\input{affinity/task_affinity}
\input{affinity/affinity_display}
\input{affinity/affinity_query}


@ -1,13 +1,12 @@
\pagebreak
\chapter{Data Environment}
\cchapter{Data Environment}{data_environment}
\label{chap:data_environment}
The OpenMP \plc{data environment} contains data attributes of variables and
objects. Many constructs (such as \code{parallel}, \code{simd}, \code{task})
objects. Many constructs (such as \kcode{parallel}, \kcode{simd}, \kcode{task})
accept clauses to control \plc{data-sharing} attributes
of referenced variables in the construct, where \plc{data-sharing} applies to
whether the attribute of the variable is \plc{shared},
is \plc{private} storage, or has special operational characteristics
(as found in the \code{firstprivate}, \code{lastprivate}, \code{linear}, or \code{reduction} clause).
(as found in the \kcode{firstprivate}, \kcode{lastprivate}, \kcode{linear}, or \kcode{reduction} clause).
The data environment for a device (distinguished as a \plc{device data environment})
is controlled on the host by \plc{data-mapping} attributes, which determine the
@ -22,15 +21,15 @@ Data-sharing attributes of variables can be classified as being \plc{predetermin
Certain variables and objects have predetermined attributes.
A commonly found case is the loop iteration variable in associated loops
of a \code{for} or \code{do} construct. It has a private data-sharing attribute.
Variables with predetermined data-sharing attributes can not be listed in a data-sharing clause; but there are some
of a \kcode{for} or \kcode{do} construct. It has a private data-sharing attribute.
Variables with predetermined data-sharing attributes cannot be listed in a data-sharing clause; but there are some
exceptions (mainly concerning loop iteration variables).
Variables with explicitly determined data-sharing attributes are those that are
referenced in a given construct and are listed in a data-sharing attribute
clause on the construct. Some of the common data-sharing clauses are:
\code{shared}, \code{private}, \code{firstprivate}, \code{lastprivate},
\code{linear}, and \code{reduction}. % Are these all of them?
\kcode{shared}, \kcode{private}, \kcode{firstprivate}, \kcode{lastprivate},
\kcode{linear}, and \kcode{reduction}. % Are these all of them?
Variables with implicitly determined data-sharing attributes are those
that are referenced in a given construct, do not have predetermined
@ -38,38 +37,59 @@ data-sharing attributes, and are not listed in a data-sharing
attribute clause of an enclosing construct.
For a complete list of variables and objects with predetermined and
implicitly determined attributes, please refer to the
\plc{Data-sharing Attribute Rules for Variables Referenced in a Construct}
\docref{Data-sharing Attribute Rules for Variables Referenced in a Construct}
subsection of the OpenMP Specifications document.
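A brief sketch of explicitly determined data-sharing attributes on a worksharing loop; the variable names and values are invented for illustration:

   #include <stdio.h>

   int main(void)
   {
      int n = 100, offset = 5, last = -1;
      double sum = 0.0;

      /* offset: firstprivate (each thread starts with a copy initialized to 5)
         last:   lastprivate  (value from the sequentially last iteration)
         sum:    reduction    (per-thread partial sums combined at the end)   */
      #pragma omp parallel for firstprivate(offset) lastprivate(last) reduction(+:sum)
      for (int i = 0; i < n; i++) {
         sum += i + offset;
         last = i;
      }

      printf("sum = %g, last = %d\n", sum, last);
      return 0;
   }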
\bigskip
DATA-MAPPING ATTRIBUTES
The \code{map} clause on a device construct explicitly specifies how the list items in
The \kcode{map} clause on a device construct explicitly specifies how the list items in
the clause are mapped from the encountering task's data environment (on the host)
to the corresponding item in the device data environment (on the device).
The common \plc{list items} are arrays, array sections, scalars, pointers, and
structure elements (members).
Procedures and global variables have predetermined data mapping if they appear
within the list or block of a \code{declare target} directive. Also, a C/C++ pointer
within the list or block of a \kcode{declare target} directive. Also, a C/C++ pointer
is mapped as a zero-length array section, as is a C++ variable that is a reference to a pointer.
% Waiting for response from Eric on this.
Without explicit mapping, non-scalar and non-pointer variables within the scope of the \code{target}
construct are implicitly mapped with a \plc{map-type} of \code{tofrom}.
Without explicit mapping, scalar variables within the scope of the \code{target}
Without explicit mapping, non-scalar and non-pointer variables within the scope of the \kcode{target}
construct are implicitly mapped with a \plc{map-type} of \kcode{tofrom}.
Without explicit mapping, scalar variables within the scope of the \kcode{target}
construct are not mapped, but have an implicit firstprivate data-sharing
attribute. (That is, the value of the original variable is given to a private
variable of the same name on the device.) This behavior can be changed with
the \code{defaultmap} clause.
the \kcode{defaultmap} clause.
The \code{map} clause can appear on \code{target}, \code{target data} and
\code{target enter/exit data} constructs. The operations of creation and
The \kcode{map} clause can appear on \kcode{target}, \kcode{target data} and
\kcode{target enter/exit data} constructs. The operations of creation and
removal of device storage as well as assignment of the original list item
values to the corresponding list items may be complicated when the list
item appears on multiple constructs or when the host and device storage
is shared. In these cases the item's reference count, the number of times
it has been referenced (+1 on entry and -1 on exited) in nested (structured)
it has been referenced (increment by 1 on entry and decrement by 1 on exit) in nested (structured)
map regions and/or accumulative (unstructured) mappings, determines the operation.
Details of the \code{map} clause and reference count operation are specified
in the \plc{map Clause} subsection of the OpenMP Specifications document.
Details of the \kcode{map} clause and reference count operation are specified
in the \docref{\kcode{map} Clause} subsection of the OpenMP Specifications document.
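A minimal sketch of explicit mapping on a target construct; the array and scalar names are invented, and host fallback is assumed when no device is available:

   #include <stdio.h>
   #define N 1000

   int main(void)
   {
      double a[N], b[N];
      double scale = 2.0;                /* scalar: implicitly firstprivate on the target */
      for (int i = 0; i < N; i++) { a[i] = i; b[i] = 0.0; }

      /* a is only read on the device (map-type to);
         b is written there and needed back on the host (map-type tofrom) */
      #pragma omp target map(to: a[0:N]) map(tofrom: b[0:N])
      for (int i = 0; i < N; i++)
         b[i] = scale * a[i];

      printf("b[N-1] = %g\n", b[N-1]);
      return 0;
   }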
%===== Examples Sections =====
\input{data_environment/threadprivate}
\input{data_environment/default_none}
\input{data_environment/private}
\input{data_environment/fort_loopvar}
\input{data_environment/fort_sp_common}
\input{data_environment/fort_sa_private}
\input{data_environment/fort_shared_var}
\input{data_environment/carrays_fpriv}
\input{data_environment/lastprivate}
\input{data_environment/reduction}
\input{data_environment/udr}
\input{data_environment/induction}
\input{data_environment/scan}
\input{data_environment/copyin}
\input{data_environment/copyprivate}
\input{data_environment/cpp_reference}
\input{data_environment/associate}


@ -1,10 +1,9 @@
\pagebreak
\chapter{Devices}
\cchapter{Devices}{devices}
\label{chap:devices}
The \code{target} construct consists of a \code{target} directive
and an execution region. The \code{target} region is executed on
the default device or the device specified in the \code{device}
The \kcode{target} construct consists of a \kcode{target} directive
and an execution region. The \kcode{target} region is executed on
the default device or the device specified in the \kcode{device}
clause.
In OpenMP version 4.0, by default, all variables within the lexical
@ -16,38 +15,65 @@ data to the device storage.
The constructs that explicitly
create storage, transfer data, and free storage on the device
are catagorized as structured and unstructured. The
\code{target} \code{data} construct is structured. It creates
a data region around \code{target} constructs, and is
are categorized as structured and unstructured. The
\kcode{target data} construct is structured. It creates
a data region around \kcode{target} constructs, and is
convenient for providing persistent data throughout multiple
\code{target} regions. The \code{target} \code{enter} \code{data} and
\code{target} \code{exit} \code{data} constructs are unstructured, because
they can occur anywhere and do not support a "structure"
(a region) for enclosing \code{target} constructs, as does the
\code{target} \code{data} construct.
\kcode{target} regions. The \kcode{target enter data} and
\kcode{target exit data} constructs are unstructured, because
they can occur anywhere and do not support a ``structure''
(a region) for enclosing \kcode{target} constructs, as does the
\kcode{target data} construct.
The \code{map} clause is used on \code{target}
The \kcode{map} clause is used on \kcode{target}
constructs and the data-type constructs to map host data. It
specifies the device storage and data movement \code{to} and \code{from}
specifies the device storage and data movement \plc{to} and \plc{from}
the device, and controls the storage duration.
There is an important change in the OpenMP 4.5 specification
that alters the data model for scalar variables and C/C++ pointer variables.
The default behavior for scalar variables and C/C++ pointer variables
in an 4.5 compliant code is \code{firstprivate}. Example
in a 4.5 compliant code is \kcode{firstprivate}. Example
codes that have been updated to reflect this new behavior are
annotated with a description that describes changes required
for correct execution. Often it is a simple matter of mapping
the variable as \code{tofrom} to obtain the intended 4.0 behavior.
the variable as \kcode{tofrom} to obtain the intended 4.0 behavior.
In OpenMP version 4.5 the mechanism for target
execution is specified as occuring through a \plc{target task}.
When the \code{target} construct is encountered a new
\plc{target task} is generated. The \plc{target task}
completes after the \code{target} region has executed and all data
execution is specified as occurring through a \plc{target task}.
When the \kcode{target} construct is encountered a new
target task is generated. The target task
completes after the \kcode{target} region has executed and all data
transfers have finished.
This new specification does not affect the execution of
pre-4.5 code; it is a necessary element for asynchronous
execution of the \code{target} region when using the new \code{nowait}
execution of the \kcode{target} region when using the new \kcode{nowait}
clause introduced in OpenMP 4.5.
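A hedged sketch of asynchronous target execution with the nowait clause, in which task dependences order a host consumer task after the deferred target task (array names are invented; an OpenMP 4.5 or later implementation is assumed):

   #include <stdio.h>
   #define N 1000

   int main(void)
   {
      float a[N], b[N];
      for (int i = 0; i < N; i++) { a[i] = i; b[i] = 0.0f; }

      #pragma omp parallel
      #pragma omp single
      {
         /* deferred target task; depend clauses order the consumer after it */
         #pragma omp target map(to: a[0:N]) map(from: b[0:N]) nowait depend(out: b)
         for (int i = 0; i < N; i++)
            b[i] = 2.0f * a[i];

         #pragma omp task depend(in: b)      /* host task that consumes b */
         printf("b[0] = %f, b[N-1] = %f\n", b[0], b[N-1]);
      }                                      /* all tasks complete at the implicit barrier */
      return 0;
   }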
%===== Examples Sections =====
\input{devices/target}
\input{devices/target_defaultmap}
\input{devices/target_pointer_mapping}
\input{devices/target_structure_mapping}
\input{devices/target_fort_allocatable_array_mapping}
\input{devices/array_sections}
\input{devices/usm}
\input{devices/C++_virtual_functions}
\input{devices/array_shaping}
\input{devices/target_mapper}
\input{devices/target_data}
\input{devices/target_unstructured_data}
\input{devices/target_update}
\input{devices/declare_target}
\input{devices/lambda_expressions}
\input{devices/teams}
\input{devices/async_target_depend}
\input{devices/async_target_with_tasks}
\input{devices/async_target_nowait}
\input{devices/async_target_nowait_depend}
\input{devices/async_target_nowait_arg}
\input{devices/device}
\input{devices/device_env_traits}

Chap_directives.tex (new file, 71 lines)

@ -0,0 +1,71 @@
\cchapter{OpenMP Directive Syntax}{directives}
\label{chap:directive_syntax}
\index{directive syntax}
OpenMP \plc{directives} use base-language mechanisms to specify OpenMP program behavior.
In C/C++ code, the directives are formed with
either pragmas or attributes.
Fortran directives are formed with comments in free form and fixed form source code.
All of these mechanisms allow the compilation to ignore the OpenMP directives if
OpenMP is not supported or enabled.
The OpenMP directive is a combination of the base-language mechanism and a \plc{directive-specification},
as shown below. The \plc{directive-specification} consists
of the \plc{directive-name}, which rarely has arguments,
followed by optional \plc{clauses}. Full details of the syntax can be found in the OpenMP Specification.
Illustrations of the syntax are given in the examples.
The formats for combining a base-language mechanism and a \plc{directive-specification} are:
C/C++ pragmas
\begin{indentedcodelist}
#pragma omp \plc{directive-specification}
\end{indentedcodelist}
C/C++ attribute specifiers
\begin{indentedcodelist}
[[omp :: directive( \plc{directive-specification} )]]
[[omp :: decl( \plc{directive-specification} )]]
\end{indentedcodelist}
C++ attribute specifiers
\begin{indentedcodelist}
[[using omp : directive( \plc{directive-specification} )]]
[[using omp : decl( \plc{directive-specification} )]]
\end{indentedcodelist}
where the \kcode{decl} attribute may be used as an alternative
for declarative directives.
Fortran comments
\begin{indentedcodelist}
!$omp \plc{directive-specification}
\end{indentedcodelist}
where \scode{c$omp} and \scode{*$omp} may be used in Fortran fixed form sources.
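For illustration, the same worksharing-loop directive written with the pragma form and with the attribute form; the attribute spelling assumes a compiler that accepts the OpenMP 5.1 (or later) attribute syntax, and the function names are invented:

   /* pragma form: accepted by C and C++ OpenMP compilers */
   void scale_pragma(int n, double *x, double s)
   {
      #pragma omp parallel for
      for (int i = 0; i < n; i++)
         x[i] *= s;
   }

   #ifdef __cplusplus
   /* attribute form of the same directive-specification (OpenMP 5.1 attribute syntax) */
   void scale_attribute(int n, double *x, double s)
   {
      [[omp::directive(parallel for)]]
      for (int i = 0; i < n; i++)
         x[i] *= s;
   }
   #endif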
Most OpenMP directives accept clauses that alter the semantics of the directive in some way,
and some directives also accept parenthesized arguments that follow the directive name.
A clause may just be a keyword (e.g., \kcode{untied}) or it may also accept argument lists
(e.g., \kcode{shared(\ucode{x,y,z})}) and/or optional modifiers (e.g., \kcode{tofrom} in
\kcode{map(tofrom: \ucode{x,y,z})}).
Clause modifiers may be ``simple'' or ``complex'' -- a complex modifier consists of a
keyword followed by one or more parameters, bracketed by parentheses, while a simple
modifier does not. An example of a complex modifier is the \kcode{iterator} modifier,
as in \kcode{map(iterator(\ucode{i=0:n}), tofrom: \ucode{p[i]})}, or the \kcode{step} modifier, as in
\kcode{linear(\ucode{x}: ref, step(\ucode{4}))}.
In the preceding examples, \kcode{tofrom} and \kcode{ref} are simple modifiers.
For Fortran, a declarative directive (such as \kcode{declare reduction})
must appear after any \bcode{USE}, \bcode{IMPORT}, and \bcode{IMPLICIT} statements
in the specification part.
%===== Examples Sections =====
\input{directives/pragmas}
\input{directives/attributes}
\input{directives/fixed_format_comments}
\input{directives/free_format_comments}


@ -1,5 +1,5 @@
% This is the introduction for the OpenMP Examples document.
% This is an included file. See the master file (openmp-examples.tex) for more information.
% This is an included file. See the main file (openmp-examples.tex) for more information.
%
% When editing this file:
%
@ -32,9 +32,9 @@
% This is a \plc{var-name}.
%
\chapter*{Introduction}
\cchapter{Introduction}{introduction}
\label{chap:introduction}
\addcontentsline{toc}{chapter}{\protect\numberline{}Introduction}
This collection of programming examples supplements the OpenMP API for Shared
Memory Parallelization specifications, and is not part of the formal specifications. It
assumes familiarity with the OpenMP specifications, and shares the typographical
@ -46,26 +46,28 @@ numerous vendors support the OpenMP API.
The directives, library routines, and environment variables demonstrated in this
document allow users to create and manage parallel programs while permitting
portability. The directives extend the C, C++ and Fortran base languages with single
program multiple data (SPMD) constructs, tasking constructs, device constructs,
worksharing constructs, and synchronization constructs, and they provide support for
portability. The directives extend the C, C++ and Fortran base languages with \plc{single
program multiple data} (SPMD) constructs, \plc{tasking} constructs, \plc{device} constructs,
\plc{worksharing} constructs, and \plc{synchronization} constructs, and they provide support for
sharing and privatizing data. The functionality to control the runtime environment is
provided by library routines and environment variables. Compilers that support the
OpenMP API often include a command line option to the compiler that activates and
allows interpretation of all OpenMP directives.
The latest source codes for OpenMP Examples can be downloaded from the \code{sources}
directory at
\href{https://github.com/OpenMP/Examples}{https://github.com/OpenMP/Examples}.
The codes for this OpenMP \VER{} Examples document have the tag \plc{v\VER}.
%\href{https://github.com/OpenMP/Examples/tree/master/sources}{https://github.com/OpenMP/Examples/sources}.
The documents and source codes for OpenMP Examples can be downloaded from
\href{\examplesrepo}{\examplesrepo}.
Each directory holds the contents of a chapter and has a \plc{sources} subdirectory containing its codes.
This OpenMP Examples \VER{} document and its codes are tagged as
\examplestree{\VER}{\plc{v\VER}}.
Complete information about the OpenMP API and a list of the compilers that support
the OpenMP API can be found at the OpenMP.org web site
\code{http://www.openmp.org}
\scode{https://www.openmp.org}
\clearpage
\input{introduction/Examples}
% This is the end of introduction.tex of the OpenMP Examples document.


@ -0,0 +1,27 @@
\cchapter{Loop Transformations}{loop_transformations}
\label{chap:loop_transformations}
To obtain better performance on a platform, code may need to be restructured
relative to the way it is written (which is often for best readability).
User-directed loop transformations accomplish this goal by providing a means
to separate code semantics and its optimization.
A loop transformation construct states that a transformation operation is to be
performed on a set of nested loops. This directive approach can target specific loops
for transformation, rather than applying more time-consuming, general compiler
heuristic methods with compiler options that may not be able to discover
optimal transformations.
Loop transformations can be augmented by preprocessor support or OpenMP \kcode{metadirective}
directives, to select optimal dimension and size parameters for specific platforms,
facilitating a single code base for multiple platforms.
Moreover, directive-based transformations make experimentation easier:
specific hot spots can be targeted directly with transformation directives.
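A minimal tiling sketch, assuming a compiler that supports the OpenMP 5.1 tile construct; the array names and tile sizes are chosen only for illustration:

   #include <stdio.h>
   #define N 256

   int main(void)
   {
      static double a[N][N], b[N][N];
      for (int i = 0; i < N; i++)
         for (int j = 0; j < N; j++)
            a[i][j] = i + j;

      /* tile the loop nest into 32x32 blocks to improve cache reuse */
      #pragma omp tile sizes(32, 32)
      for (int i = 0; i < N; i++)
         for (int j = 0; j < N; j++)
            b[j][i] = a[i][j];

      printf("b[1][2] = %g\n", b[1][2]);
      return 0;
   }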
%===== Examples Sections =====
\input{loop_transformations/tile}
\input{loop_transformations/partial_tile}
\input{loop_transformations/unroll}
\input{loop_transformations/apply}


@ -1,14 +1,13 @@
\pagebreak
\chapter{Memory Model}
\cchapter{Memory Model}{memory_model}
\label{chap:memory_model}
OpenMP provides a shared-memory model that allows all threads on a given
device shared access to \emph{memory}. For a given OpenMP region that may be
executed by more than one thread or SIMD lane, variables in memory may be
\emph{shared} or \emph{private} with respect to those threads or SIMD lanes. A
\plc{shared} or \plc{private} with respect to those threads or SIMD lanes. A
variable's data-sharing attribute indicates whether it is shared (the
\emph{shared} attribute) or private (the \emph{private}, \emph{firstprivate},
\emph{lastprivate}, \emph{linear}, and \emph{reduction} attributes) in the data
\plc{shared} attribute) or private (the \plc{private}, \plc{firstprivate},
\plc{lastprivate}, \plc{linear}, and \plc{reduction} attributes) in the data
environment of an OpenMP region. While private variables in an OpenMP region
are new copies of the original variable (with same name) that may then be
concurrently accessed or modified by their respective threads or SIMD lanes, a
@ -22,41 +21,40 @@ a given variable in their respective temporary views. Threads may employ flush
operations for the purposes of making their temporary view of a variable
consistent with the value of the variable in memory. The effect of a given
flush operation is characterized by its flush properties -- some combination of
\emph{strong}, \emph{release}, and \emph{acquire} -- and, for \emph{strong}
flushes, a \emph{flush-set}.
\plc{strong}, \plc{release}, and \plc{acquire} -- and, for \plc{strong}
flushes, a \plc{flush-set}.
A \emph{strong} flush will force consistency between the temporary view and the
memory for all variables in its \emph{flush-set}. Furthermore all strong flushes in a
A \plc{strong} flush will force consistency between the temporary view and the
memory for all variables in its \plc{flush-set}. Furthermore, all strong flushes in a
program that have intersecting flush-sets will execute in some total order, and
within a thread strong flushes may not be reordered with respect to other
memory operations on variables in its flush-set. \emph{Release} and
\emph{acquire} flushes operate in pairs. A release flush may ``synchronize''
memory operations on variables in its flush-set. \plc{Release} and
\plc{acquire} flushes operate in pairs. A release flush may ``synchronize''
with an acquire flush, and when it does so the local memory operations that
precede the release flush will appear to have been completed before the local
memory operations on the same variables that follow the acquire flush.
Flush operations arise from explicit \code{flush} directives, implicit
\code{flush} directives, and also from the execution of \code{atomic}
constructs. The \code{flush} directive forces a consistent view of local
variables of the thread executing the \code{flush}. When a list is supplied on
Flush operations arise from explicit \kcode{flush} directives, implicit
\kcode{flush} directives, and also from the execution of \kcode{atomic}
constructs. The \kcode{flush} directive forces a consistent view of local
variables of the thread executing the \kcode{flush}. When a list is supplied on
the directive, only the items (variables) in the list are guaranteed to be
flushed. Implied flushes exist at prescribed locations of certain constructs.
For the complete list of these locations and associated constructs, please
refer to the \plc{flush Construct} section of the OpenMP Specifications
refer to the \docref{\kcode{flush} Construct} section of the OpenMP Specifications
document.
In this chapter, examples illustrate how race conditions may arise for accesses
to variables with a \plc{shared} data-sharing attribute when flush operations
are not properly employed. A race condition can exist when two or more threads
are involved in accessing a variable in which not all of the accesses are
reads; that is, a WaR, RaW or WaW condition exists (R=read, a=after, W=write).
A RaR does not produce a race condition. In particular, a data race will arise
when conflicting accesses do not have a well-defined \emph{completion order}.
The existence of data races in OpenMP programs result in undefined behavior,
and so they should generally be avoided for programs to be correct. The
completion order of accesses to a shared variable is guaranteed in OpenMP
through a set of memory consistency rules that are described in the \plc{OpenMP
Memory Consitency} section of the OpenMP Specifications document.
are involved in accessing a variable and at least one of the accesses modifies
the variable. In particular, a data race will arise when conflicting accesses
do not have a well-defined \emph{completion order}. The existence of data
races in OpenMP programs results in undefined behavior, and so they should
generally be avoided for programs to be correct. The completion order of
accesses to a shared variable is guaranteed in OpenMP through a set of memory
consistency rules that are described in the \docref{OpenMP Memory Consistency}
section of the OpenMP Specifications document.
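A hedged sketch of a release/acquire pair expressed with atomic constructs (OpenMP 5.0 memory-order clauses are assumed); the flag-based handshake is invented to illustrate the synchronization described above:

   #include <stdio.h>
   #include <omp.h>

   int main(void)
   {
      int data = 0, flag = 0;

      #pragma omp parallel num_threads(2)
      {
         if (omp_get_thread_num() == 0) {       /* producer */
            data = 42;
            #pragma omp atomic write release    /* release flush: publishes data */
            flag = 1;
         } else {                               /* consumer */
            int f = 0;
            while (!f) {
               #pragma omp atomic read acquire  /* acquire flush: pairs with the release */
               f = flag;
            }
            printf("data = %d\n", data);        /* guaranteed to read 42 */
         }
      }
      return 0;
   }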
%This chapter also includes examples that exhibit non-sequentially consistent
%(\emph{non-SC}) behavior. Sequential consistency (\emph{SC}) is the desirable
@ -130,3 +128,10 @@ Memory Consitency} section of the OpenMP Specifications document.
% in \plc{atomic Construct} subsection of the OpenMP Specifications document).
% Examples 1-3 show the difficulty of synchronizing threads through \code{flush} and \code{atomic} directives.
%===== Examples Sections =====
\input{memory_model/mem_model}
\input{memory_model/allocators}
\input{memory_model/fort_race}

Chap_ompt_interface.tex (new file, 19 lines)

@ -0,0 +1,19 @@
\cchapter{OMPT Interface}{ompt_interface}
\label{chap:ompt_interface}
OMPT defines mechanisms and an API for interfacing tools with an OpenMP program.
The OMPT API provides functionality to:
\begin{itemize}
\addtolength{\itemindent}{1cm}
\item examine the state associated with an OpenMP thread
\item interpret the call stack of an OpenMP thread
\item receive notification about OpenMP events
\item trace activity on OpenMP target devices
\item assess implementation-dependent details
\item control a tool from an OpenMP application
\end{itemize}
The following sections will illustrate basic mechanisms and operations of the OMPT API.
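A minimal sketch of how a tool announces itself to the runtime, assuming an OpenMP 5.0 (or later) implementation that provides the omp-tools.h header; a real tool would use the lookup argument of the initializer to register event callbacks:

   #include <stdio.h>
   #include <omp-tools.h>

   static int my_initialize(ompt_function_lookup_t lookup,
                            int initial_device_num, ompt_data_t *tool_data)
   {
      printf("OMPT tool initialized\n");
      return 1;                           /* nonzero: keep the tool active */
   }

   static void my_finalize(ompt_data_t *tool_data)
   {
      printf("OMPT tool finalized\n");
   }

   /* the OpenMP runtime looks up this symbol during initialization */
   ompt_start_tool_result_t *ompt_start_tool(unsigned int omp_version,
                                             const char *runtime_version)
   {
      static ompt_start_tool_result_t result = { &my_initialize, &my_finalize, {0} };
      return &result;                     /* returning NULL declines the OMPT interface */
   }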
\input{ompt_interface/ompt_start}


@ -1,104 +1,130 @@
\pagebreak
\chapter{Parallel Execution}
\cchapter{Parallel Execution}{parallel_execution}
\label{chap:parallel_execution}
A single thread, the \plc{initial thread}, begins sequential execution of
an OpenMP enabled program, as if the whole program is in an implicit parallel
region consisting of an implicit task executed by the \plc{initial thread}.
A \code{parallel} construct encloses code,
forming a parallel region. An \plc{initial thread} encountering a \code{parallel}
A \kcode{parallel} construct encloses code,
forming a parallel region. An \plc{initial thread} encountering a \kcode{parallel}
region forks (creates) a team of threads at the beginning of the
\code{parallel} region, and joins them (removes from execution) at the
end of the region. The initial thread becomes the master thread of the team in a
\code{parallel} region with a \plc{thread} number equal to zero, the other
\kcode{parallel} region, and joins them (removes from execution) at the
end of the region. The initial thread becomes the primary thread of the team in a
\kcode{parallel} region with a \plc{thread} number equal to zero, the other
threads are numbered from 1 to the number of threads minus 1.
A team may be comprised of just a single thread.
Each thread of a team is assigned an implicit task consisting of code within the
parallel region. The task that creates a parallel region is suspended while the
Each \plc{thread} of a team is assigned an implicit task consisting of code within the
\kcode{parallel} region. The task that creates a \kcode{parallel} region is suspended while the
tasks of the team are executed. A thread is tied to its task; that is,
only the thread assigned to the task can execute that task. After completion
of the \code{parallel} region, the master thread resumes execution of the generating task.
of the \kcode{parallel} region, the primary thread resumes execution of the generating task.
%After the \code{parallel} region the master thread becomes the initial
%After the \code{parallel} region the primary thread becomes the initial
%thread again, and continues to execute the \plc{sequential part}.
Any task within a \code{parallel} region is allowed to encounter another
\code{parallel} region to form a nested \code{parallel} region. The
parallelism of a nested \code{parallel} region (whether it forks additional
Any task within a \kcode{parallel} region is allowed to encounter another
\kcode{parallel} region to form a nested \kcode{parallel} region. The
parallelism of a nested \kcode{parallel} region (whether it forks additional
threads, or is executed serially by the encountering task) can be controlled by the
\code{OMP\_NESTED} environment variable or the \code{omp\_set\_nested()}
\kcode{OMP_NESTED} environment variable or the \kcode{omp_set_nested()}
API routine with arguments indicating true or false.
The number of threads of a \code{parallel} region can be set by the \code{OMP\_NUM\_THREADS}
environment variable, the \code{omp\_set\_num\_threads()} routine, or on the \code{parallel}
directive with the \code{num\_threads}
The number of threads of a \kcode{parallel} region can be set by the \kcode{OMP_NUM_THREADS}
environment variable, the \kcode{omp_set_num_threads()} routine, or on the \kcode{parallel}
directive with the \kcode{num_threads}
clause. The routine overrides the environment variable, and the clause overrides all.
Use the \code{OMP\_DYNAMIC}
or the \code{omp\_set\_dynamic()} function to specify that the OpenMP
Use the \kcode{OMP_DYNAMIC}
or the \kcode{omp_set_dynamic()} function to specify that the OpenMP
implementation dynamically adjust the number of threads for
\code{parallel} regions. The default setting for dynamic adjustment is implementation
\kcode{parallel} regions. The default setting for dynamic adjustment is implementation
defined. When dynamic adjustment is on and the number of threads is specified,
the number of threads becomes an upper limit for the number of threads to be
provided by the OpenMP runtime.
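A minimal sketch of thread-number control with the routines and clause described above (the thread count is arbitrary):

   #include <stdio.h>
   #include <omp.h>

   int main(void)
   {
      omp_set_dynamic(0);          /* disable dynamic adjustment of team sizes */

      /* the num_threads clause overrides OMP_NUM_THREADS and omp_set_num_threads() */
      #pragma omp parallel num_threads(4)
      printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
      return 0;
   }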
\pagebreak
%\pagebreak
\bigskip
WORKSHARING CONSTRUCTS
A worksharing construct distributes the execution of the associated region
among the members of the team that encounter it. There is an
implied barrier at the end of the worksharing region
(there is no barrier at the beginning). The worksharing
constructs are:
(there is no barrier at the beginning).
\newpage
The worksharing constructs are:
\begin{compactitem}
\item loop constructs: {\code{for} and \code{do} }
\item \code{sections}
\item \code{single}
\item \code{workshare}
\item loop constructs: {\kcode{for} and \kcode{do} }
\item \kcode{sections}
\item \kcode{single}
\item \kcode{workshare}
\end{compactitem}
The \code{for} and \code{do} constructs (loop constructs) create a region
The \kcode{for} and \kcode{do} constructs (loop constructs) create a region
consisting of a loop. A loop controlled by a loop construct is called
an \plc{associated} loop. Nested loops can form a single region when the
\code{collapse} clause (with an integer argument) designates the number of
\kcode{collapse} clause (with an integer argument) designates the number of
\plc{associated} loops to be executed in parallel, by forming a
"single iteration space" for the specified number of nested loops.
The \code{ordered} clause can also control multiple associated loops.
``single iteration space'' for the specified number of nested loops.
The \kcode{ordered} clause can also control multiple associated loops.
An associated loop must adhere to a "canonical form" (specified in the
\plc{Canonical Loop Form} of the OpenMP Specifications document) which allows the
An associated loop must adhere to a ``canonical form'' (specified in the
\docref{Canonical Loop Form} of the OpenMP Specifications document) which allows the
iteration count (of all associated loops) to be computed before the
(outermost) loop is executed. %[58:27-29].
Most common loops comply with the canonical form, including C++ iterators.
A \code{single} construct forms a region in which only one thread (any one
A \kcode{single} construct forms a region in which only one thread (any one
of the team) executes the region.
The other threads wait at the implied
barrier at the end, unless the \code{nowait} clause is specified.
barrier at the end, unless the \kcode{nowait} clause is specified.
The \code{sections} construct forms a region that contains one or more
structured blocks. Each block of a \code{sections} directive is
constructed with a \code{section} construct, and executed once by
The \kcode{sections} construct forms a region that contains one or more
structured blocks. Each block of a \kcode{sections} directive is
constructed with a \kcode{section} construct, and executed once by
one of the threads (any one) in the team. (If only one block is
formed in the region, the \code{section} construct, which is used to
formed in the region, the \kcode{section} construct, which is used to
separate blocks, is not required.)
The other threads wait at the implied
barrier at the end, unless the \code{nowait} clause is specified.
barrier at the end, unless the \kcode{nowait} clause is specified.
The \code{workshare} construct is a Fortran feature that consists of a
The \kcode{workshare} construct is a Fortran feature that consists of a
region with a single structure block (section of code). Statements in the
\code{workshare} region are divided into units of work, and executed (once)
\kcode{workshare} region are divided into units of work, and executed (once)
by threads of the team.
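A compact sketch that places the worksharing constructs described above inside one parallel region; the arrays and the work are invented for illustration:

   #include <stdio.h>
   #define N 8

   int main(void)
   {
      double x[N], y[N];

      #pragma omp parallel
      {
         #pragma omp single               /* executed by one thread; others wait */
         printf("starting the worksharing region\n");

         #pragma omp for                  /* iterations divided among the threads */
         for (int i = 0; i < N; i++)
            x[i] = i;

         #pragma omp sections             /* each block executed once, by some thread */
         {
            #pragma omp section
            for (int i = 0; i < N; i++) y[i] = 2.0 * x[i];
            #pragma omp section
            printf("second section\n");
         }
      }
      printf("y[N-1] = %g\n", y[N-1]);
      return 0;
   }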
\bigskip
MASTER CONSTRUCT
MASKED CONSTRUCT
The \kcode{masked} construct is not a worksharing construct. The \kcode{masked} region is
executed only by the primary thread. There is no implicit barrier (and flush)
at the end of the \kcode{masked} region; hence the other threads of the team continue
execution of code statements beyond the \kcode{masked} region.
The \kcode{master} construct, which has been deprecated in OpenMP 5.1, has identical semantics
to the \kcode{masked} construct with no \kcode{filter} clause.
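A minimal masked sketch, assuming an OpenMP 5.1 (or later) compiler; the explicit barrier is shown only to emphasize that the masked construct itself implies none:

   #include <stdio.h>
   #include <omp.h>

   int main(void)
   {
      #pragma omp parallel num_threads(4)
      {
         #pragma omp masked            /* only the primary thread executes this; no implied barrier */
         printf("primary thread %d does the bookkeeping\n", omp_get_thread_num());

         #pragma omp barrier           /* explicit barrier, if the other threads must wait */
         printf("thread %d continues\n", omp_get_thread_num());
      }
      return 0;
   }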
%===== Examples Sections =====
\input{parallel_execution/ploop}
\input{parallel_execution/parallel}
\input{parallel_execution/host_teams}
\input{parallel_execution/nthrs_nesting}
\input{parallel_execution/nthrs_dynamic}
\input{parallel_execution/fort_do}
\input{parallel_execution/nowait}
\input{parallel_execution/collapse}
\input{parallel_execution/linear_in_loop}
\input{parallel_execution/psections}
\input{parallel_execution/fpriv_sections}
\input{parallel_execution/single}
\input{parallel_execution/workshare}
\input{parallel_execution/masked}
\input{parallel_execution/loop}
\input{parallel_execution/pra_iterator}
\input{parallel_execution/set_dynamic_nthrs}
\input{parallel_execution/get_nthrs}
The \code{master} construct is not a worksharing construct. The master region
is executed only by the master thread. There is no implicit barrier (and flush)
at the end of the \code{master} region; hence the other threads of the team continue
execution of code statements beyond the \code{master} region.


@ -1,52 +1,62 @@
\pagebreak
\chapter{Program Control}
\label{sec:program_control}
\cchapter{Program Control}{program_control}
\label{chap:program_control}
Some specific and elementary concepts of controlling program execution are
illustrated in the examples of this chapter. Control can be directly
managed with conditional control code (ifdef's with the \code{\_OPENMP}
macro, and the Fortran sentinel (\code{!\$})
for conditionally compiling). The \code{if} clause on some constructs
Basic concepts and mechanisms for directing and controlling a program compilation and execution
are provided in this introduction and illustrated in subsequent examples.
\bigskip
CONDITIONAL COMPILATION and EXECUTION
Conditional compilation can be performed with conventional \bcode{\#ifdef} directives
in C, C++, and Fortran, and additionally with OpenMP sentinel (\scode{!$}) in Fortran.
The \kcode{if} clause on some directives
can direct the runtime to ignore or alter the behavior of the construct.
Of course, the base-language \code{if} statements can be used to control the "execution"
of stand-alone directives (such as \code{flush}, \code{barrier}, \code{taskwait},
and \code{taskyield}).
However, the directives must appear in a block structure, and not as a substatement as shown in examples 1 and 2 of this chapter.
Of course, the base-language \bcode{if} statements can be used to control the execution
of stand-alone directives (such as \kcode{flush}, \kcode{barrier}, \kcode{taskwait},
and \kcode{taskyield}).
However, the directives must appear in a block structure, and not as a substatement.
The \kcode{metadirective} and \kcode{declare variant} directives provide conditional
selection of directives and routines for compilation (and use), respectively.
The \kcode{assume} and \kcode{requires} directives provide invariants
for optimizing compilation, and essential features for compilation
and correct execution, respectively.
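A small sketch of conditional compilation with the _OPENMP macro together with an if clause on a construct (the threshold value is arbitrary):

   #include <stdio.h>
   #ifdef _OPENMP
   #include <omp.h>
   #endif

   int main(void)
   {
      int n = 50;

   #ifdef _OPENMP
      printf("compiled with OpenMP %d\n", _OPENMP);   /* yyyymm version macro */
   #endif

      /* the parallel region is executed serially when n is small */
      #pragma omp parallel if(n > 100)
      {
   #ifdef _OPENMP
         printf("running with %d thread(s)\n", omp_get_num_threads());
   #endif
      }
      return 0;
   }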
\bigskip
CANCELLATION
Cancellation (termination) of the normal sequence of execution for the threads in an OpenMP region can
be accomplished with the \code{cancel} construct. The construct uses a
be accomplished with the \kcode{cancel} construct. The construct uses a
\plc{construct-type-clause} to set the region-type to activate for the cancellation.
That is, inclusion of one of the \plc{construct-type-clause} names \code{parallel}, \code{for},
\code{do}, \code{sections} or \code{taskgroup} on the directive line
That is, inclusion of one of the \plc{construct-type-clause} names \kcode{parallel}, \kcode{for},
\kcode{do}, \kcode{sections} or \kcode{taskgroup} on the directive line
activates the corresponding region.
The \code{cancel} construct is activated by the first encountering thread, and it
The \kcode{cancel} construct is activated by the first encountering thread, and it
continues execution at the end of the named region.
The \code{cancel} construct is also a cancellation point for any other thread of the team
The \kcode{cancel} construct is also a cancellation point for any other thread of the team
to also continue execution at the end of the named region.
Also, once the specified region has been activated for cancellation any thread that encounnters
a \code{cancellation point} construct with the same named region (\plc{construct-type-clause}),
Also, once the specified region has been activated for cancellation any thread that encounters
a \kcode{cancellation point} construct with the same named region (\plc{construct-type-clause}),
continues execution at the end of the region.
For an activated \code{cancel taskgroup} construct, the tasks that
For an activated \kcode{cancel taskgroup} construct, the tasks that
belong to the taskgroup set of the innermost enclosing taskgroup region will be canceled.
A task that encounters the cancel taskgroup construct continues execution at the end of its
A task that encounters a \kcode{cancel taskgroup} construct continues execution at the end of its
task region. Any task of the taskgroup that has already begun execution will run to completion,
unless it encounters a \code{cancellation point}; tasks that have not begun execution "may" be
unless it encounters a \kcode{cancellation point}; tasks that have not begun execution may be
discarded as completed tasks.
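A hedged cancellation sketch; cancellation must be activated at run time (for example by setting the OMP_CANCELLATION environment variable to true), and the search data are invented for illustration:

   #include <stdio.h>
   #define N 10000

   int main(void)
   {
      int data[N], found_at = -1, key = 7777;
      for (int i = 0; i < N; i++) data[i] = i;

      #pragma omp parallel for
      for (int i = 0; i < N; i++) {
         if (data[i] == key) {
            #pragma omp atomic write
            found_at = i;
            #pragma omp cancel for          /* request cancellation of the loop region */
         }
         #pragma omp cancellation point for /* other threads observe the request here */
      }

      printf("found key at index %d\n", found_at);
      return 0;
   }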
\bigskip
\pagebreak
CONTROL VARIABLES
Internal control variables (ICV) are used by implementations to hold values which control the execution
of OpenMP regions. Control (and hence the ICVs) may be set as implementation defaults,
or set and adjusted through environment variables, clauses, and API functions. Many of the ICV control
values are accessible through API function calls. Also, initial ICV values are reported by the runtime
if the \code{OMP\_DISPLAY\_ENV} environment variable has been set to \code{TRUE}.
or set and adjusted through environment variables, clauses, and API functions.
%Many of the ICV control values are accessible through API function calls.
Initial ICV values are reported by the runtime
if the \kcode{OMP_DISPLAY_ENV} environment variable has been set to \vcode{TRUE} or \vcode{VERBOSE}.
%As an example, the \plc{nthreads-var} is the ICV that holds the number of threads
%to be used in a \code{parallel} region. It can be set with the \code{OMP\_NUM\_THREADS} environment variable,
@ -59,8 +69,8 @@ CONTROL VARIABLES
\bigskip
NESTED CONSTRUCTS
Certain combinations of nested constructs are permitted, giving rise to a \plc{combined} construct
consisting of two or more constructs. These can be used when the two (or several) constructs would be used
Certain combinations of nested constructs are permitted, giving rise to \plc{combined} constructs
consisting of two or more directives. These can be used when the two (or several) constructs would be used
immediately in succession (closely nested). A combined construct can use the clauses of the component
constructs without restrictions.
A \plc{composite} construct is a combined construct which has one or more clauses with (an often obviously)
@ -70,16 +80,37 @@ modified or restricted meaning, relative to when the constructs are uncombined.
%construct with one of the loops constructs \code{do} or \code{for}. The
%\code{parallel do SIMD} and \code{parallel for SIMD} constructs are composite constructs (composed from
%the parallel loop constructs and the \code{SIMD} construct), because the \code{collapse} clause must
%explicitly address the ordering of loop chunking \plc{and} SIMD "combined" execution.
%explicitly address the ordering of loop chunking \plc{and} SIMD ``combined'' execution.
Certain nestings are forbidden, and often the reasoning is obvious. Worksharing constructs cannot be nested, and
the \code{barrier} construct cannot be nested inside a worksharing construct, or a \code{critical} construct.
Also, \code{target} constructs cannot be nested.
Certain nestings are forbidden, and often the reasoning is obvious. For example, worksharing constructs cannot be nested, and
the \kcode{barrier} construct cannot be nested inside a worksharing construct, or a \kcode{critical} construct.
Also, \kcode{target} constructs cannot be nested, unless the nested target is a reverse offload.
The \code{parallel} construct can be nested, as well as the \code{task} construct. The parallel
execution in the nested \code{parallel} construct(s) is control by the \code{OMP\_NESTED} and
\code{OMP\_MAX\_ACTIVE\_LEVELS} environment variables, and the \code{omp\_set\_nested()} and
\code{omp\_set\_max\_active\_levels()} functions.
The \kcode{parallel} construct can be nested, as well as the \kcode{task} construct.
The parallel execution in the nested \kcode{parallel} construct(s) is controlled by the
\kcode{OMP_MAX_ACTIVE_LEVELS} environment variable, and the \kcode{omp_set_max_active_levels} routine.
Use the \kcode{omp_get_max_active_levels} routine to determine the maximum levels provided by an implementation.
As of OpenMP 5.0, use of the \kcode{OMP_NESTED} environment variable and the \kcode{omp_set_nested} routine
has been deprecated.
More details on nesting can be found in the \plc{Nesting of Regions} of the \plc{Directives}
More details on nesting can be found in the \docref{Nesting of Regions} section of the \docref{Directives}
chapter in the OpenMP Specifications document.
%===== Examples Sections =====
\input{program_control/assumption}
\input{program_control/cond_comp}
\input{program_control/icv}
\input{program_control/standalone}
\input{program_control/cancellation}
\input{program_control/requires}
\input{program_control/context_based_variants}
\input{program_control/dispatch}
\input{program_control/nested_loop}
\input{program_control/nesting_restrict}
\input{program_control/target_offload}
\input{program_control/pause_resource}
\input{program_control/reproducible}
\input{program_control/interop}
\input{program_control/utilities}


@ -1,48 +1,47 @@
\pagebreak
\chapter{Synchronization}
\cchapter{Synchronization}{synchronization}
\label{chap:synchronization}
The \code{barrier} construct is a stand-alone directive that requires all threads
The \kcode{barrier} construct is a stand-alone directive that requires all threads
of a team (within a contention group) to execute the barrier and complete
execution of all tasks within the region, before continuing past the barrier.
The \code{critical} construct is a directive that contains a structured block.
The \kcode{critical} construct is a directive that contains a structured block.
The construct allows only a single thread at a time to execute the structured block (region).
Multiple critical regions may exist in a parallel region, and may
act cooperatively (only one thread at a time in all \code{critical} regions),
or separately (only one thread at a time in each \code{critical} regions when
a unique name is supplied on each \code{critical} construct).
An optional (lock) \code{hint} clause may be specified on a named \code{critical}
Multiple \kcode{critical} regions may exist in a parallel region, and may
act cooperatively (only one thread at a time in all \kcode{critical} regions),
or separately (only one thread at a time in each \kcode{critical} region when
a unique name is supplied on each \kcode{critical} construct).
An optional (lock) \kcode{hint} clause may be specified on a named \kcode{critical}
construct to provide the OpenMP runtime guidance in selecting a locking
mechanism.
On a finer scale the \code{atomic} construct allows only a single thread at
On a finer scale the \kcode{atomic} construct allows only a single thread at
a time to have atomic access to a storage location involving a single read,
write, update or capture statement, and a limited number of combinations
when specifying the \code{capture} \plc{atomic-clause} clause. The
when specifying the \kcode{capture} \plc{atomic-clause} clause. The
\plc{atomic-clause} clause is required for some expression statements, but is
not required for \code{update} statements. The \plc{memory-order} clause can be
used to specify the degree of memory ordering enforced by an \code{atomic}
construct. From weakest to strongest, they are \code{relaxed} (the default),
acquire and/or release clauses (specified with \code{acquire}, \code{release},
or \code{acq\_rel}), and \code{seq\_cst}. Please see the details in the
\plc{atomic Construct} subsection of the \plc{Directives} chapter in the OpenMP
not required for \kcode{update} statements. The \plc{memory-order} clause can be
used to specify the degree of memory ordering enforced by an \kcode{atomic}
construct. From weakest to strongest, they are \kcode{relaxed} (the default),
\plc{acquire} and/or \plc{release} clauses (specified with \kcode{acquire}, \kcode{release},
or \kcode{acq_rel}), and \kcode{seq_cst}. Please see the details in the
\docref{atomic Construct} subsection of the \docref{Directives} chapter in the OpenMP
Specifications document.
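A brief sketch contrasting the atomic and critical constructs; the computed values and the critical region name are invented for illustration:

   #include <stdio.h>
   #define N 1000

   int main(void)
   {
      int count = 0;
      double maxval = 0.0;

      #pragma omp parallel for
      for (int i = 0; i < N; i++) {
         double v = (i * 37) % 91;

         #pragma omp atomic               /* atomic update of a single storage location */
         count++;

         #pragma omp critical(maxblock)   /* one thread at a time executes this block */
         {
            if (v > maxval)
               maxval = v;
         }
      }
      printf("count = %d, max = %g\n", count, maxval);
      return 0;
   }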
% The following three sentences were stolen from the spec.
The \code{ordered} construct either specifies a structured block in a loop,
The \kcode{ordered} construct either specifies a structured block in a loop,
simd, or loop SIMD region that will be executed in the order of the loop
iterations. The ordered construct sequentializes and orders the execution
of ordered regions while allowing code outside the region to run in parallel.
iterations. The \kcode{ordered} construct sequentializes and orders the execution
of \kcode{ordered} regions while allowing code outside the region to run in parallel.
Since OpenMP 4.5 the \code{ordered} construct can also be a stand-alone
directive that specifies cross-iteration dependences in a doacross loop nest.
The \code{depend} clause uses a \code{sink} \plc{dependence-type}, along with a
iteration vector argument (vec) to indicate the iteration that satisfies the
dependence. The \code{depend} clause with a \code{source}
Since OpenMP 4.5 the \kcode{ordered} construct can also be a stand-alone
directive that specifies cross-iteration dependences in a \plc{doacross} loop nest.
The \kcode{depend} clause uses a \kcode{sink} \plc{dependence-type}, along with an
iteration vector argument (\plc{vec}) to indicate the iteration that satisfies the
dependence. The \kcode{depend} clause with a \kcode{source}
\plc{dependence-type} specifies dependence satisfaction.
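A minimal doacross sketch using the OpenMP 4.5 depend(sink)/depend(source) form described above; the recurrence is invented for illustration:

   #include <stdio.h>
   #define N 100

   int main(void)
   {
      double a[N];
      a[0] = 1.0;

      /* ordered(1): one level of cross-iteration (doacross) dependences */
      #pragma omp parallel for ordered(1)
      for (int i = 1; i < N; i++) {
         #pragma omp ordered depend(sink: i-1)   /* wait for iteration i-1 */
         a[i] = 2.0 * a[i-1];
         #pragma omp ordered depend(source)      /* iteration i is now complete */
      }

      printf("a[N-1] = %g\n", a[N-1]);
      return 0;
   }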
The \code{flush} directive is a stand-alone construct for enforcing consistency
The \kcode{flush} directive is a stand-alone construct for enforcing consistency
between a thread's view of memory and the view of memory for other threads (see
the Memory Model chapter of this document for more details). When the construct
is used with an explicit variable list, a \plc{strong flush} that forces a
@ -56,7 +55,7 @@ semantics. When an explicit variable list is not present and a
release memory ordering semantics according to the \plc{memory-order} clause,
but no strong flush is performed. A resulting strong flush that applies to a
set of variables effectively ensures that no memory (load or store)
operation for the affected variables may be reordered across the \kcode{flush}
directive.
General-purpose routines provide mutual exclusion semantics through locks,
@ -70,12 +69,33 @@ types of locks, and the variable of a specific lock type cannot be used by the
other lock type.
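As a brief sketch of the simple-lock routines (the hinted and nestable variants
appear in the examples at the end of this chapter):
\begin{verbatim}
#include <omp.h>

void lock_sketch(void)
{
   omp_lock_t lck;
   omp_init_lock(&lck);

   #pragma omp parallel
   {
      omp_set_lock(&lck);      /* only one thread at a time passes here */
      /* ... exclusive work ... */
      omp_unset_lock(&lck);
   }

   omp_destroy_lock(&lck);
}
\end{verbatim}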
Any explicit task will observe the synchronization prescribed in a
\kcode{barrier} construct and an implied barrier. Also, additional synchronizations
are available for tasks. All children of a task will wait at a \kcode{taskwait} (for
their siblings to complete). A \kcode{taskgroup} construct creates a region in which the
current task is suspended at the end of the region until all sibling tasks,
and their descendants, have completed.
Scheduling constraints on task execution can be prescribed by the \kcode{depend}
clause to enforce dependence on previously generated tasks.
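For instance, a producer/consumer ordering between two tasks might be sketched
as follows (the variable and the printed output are illustrative):
\begin{verbatim}
#include <stdio.h>

void task_dep_sketch(void)
{
   int x = 0;
   #pragma omp parallel
   #pragma omp single
   {
      #pragma omp task depend(out: x)   /* producer */
      x = 1;

      #pragma omp task depend(in: x)    /* consumer: runs after the producer */
      printf("x = %d\n", x);

      #pragma omp taskwait              /* wait for both child tasks */
   }
}
\end{verbatim}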
More details on controlling task executions can be found in the \docref{Tasking} Chapter
in the OpenMP Specifications document. %(DO REF. RIGHT.)
%===== Examples Sections =====
\input{synchronization/critical}
\input{synchronization/worksharing_critical}
\input{synchronization/barrier_regions}
\input{synchronization/atomic}
\input{synchronization/atomic_cas}
\input{synchronization/atomic_restrict}
\input{synchronization/atomic_hint}
\input{synchronization/acquire_release}
\input{synchronization/ordered}
\input{synchronization/depobj}
\input{synchronization/doacross}
\input{synchronization/locks}
\input{synchronization/init_lock}
\input{synchronization/init_lock_with_hint}
\input{synchronization/lock_owner}
\input{synchronization/simple_lock}
\input{synchronization/nestable_lock}


@ -1,35 +1,34 @@
\cchapter{Tasking}{tasking}
\label{chap:tasking}
Tasking constructs provide units of work to a thread for execution.
Worksharing constructs do this, too (e.g. \kcode{for}, \kcode{do},
\kcode{sections}, and \kcode{single} constructs);
but the work units are tightly controlled by an iteration limit and limited
scheduling, or a limited number of \kcode{sections} or \kcode{single} regions.
Worksharing was designed
with ``data parallel'' computing in mind. Tasking was designed for
``task parallel'' computing and often involves non-locality or irregularity
in memory access.
The \kcode{task} construct can be used to execute work chunks: in a while loop;
while traversing nodes in a list; at nodes in a tree graph;
or in a normal loop (with a \kcode{taskloop} construct).
Unlike the statically scheduled loop iterations of worksharing, a task is
often enqueued, and then dequeued for execution by any of the threads of the
team within a parallel region. The generation of tasks can be from a single
generating thread (creating sibling tasks), or from multiple generators
in a recursive tree-graph traversal.
%(creating a parent-descendents hierarchy of tasks, see example 4 and 7 below).
A \kcode{taskloop} construct
bundles iterations of an associated loop into tasks, and provides
similar controls found in the \kcode{task} construct.
Sibling tasks are synchronized by the \kcode{taskwait} construct, and tasks
and their descendent tasks can be synchronized by containing them in
a \kcode{taskgroup} region. Ordered execution is accomplished by specifying
dependences with a \kcode{depend} clause. Also, priorities can be
specified as hints to the scheduler through a \kcode{priority} clause.
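A condensed sketch of how these constructs combine is shown below (the work in
the high-priority task and the loop body are illustrative):
\begin{verbatim}
#include <stdio.h>

void taskloop_sketch(double *a, int n)
{
   #pragma omp parallel
   #pragma omp single
   {
      #pragma omp taskgroup
      {
         #pragma omp task priority(1)   /* hint: schedule this task early */
         printf("bookkeeping task\n");

         #pragma omp taskloop           /* loop iterations bundled into tasks */
         for (int i = 0; i < n; i++)
            a[i] = 2.0 * a[i];
      }  /* taskgroup waits for the tasks above and their descendants */
   }
}
\end{verbatim}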
Various clauses can be used to manage and optimize task generation,
as well as reduce the overhead of execution and to relinquish
@ -37,15 +36,28 @@ control of threads for work balance and forward progress.
Once a thread starts executing a task, it is the designated thread
for executing the task to completion, even though it may leave the
execution at a scheduling point and return later. The thread is \plc{tied}
to the task. Scheduling points can be introduced with the \kcode{taskyield}
construct. With an \kcode{untied} clause any other thread is allowed to continue
the task. An \kcode{if} clause with an expression that evaluates to \plc{false}
results in an \plc{undeferred} task, which instructs the runtime to suspend
the generating task until the undeferred task completes its execution.
By including the data environment of the generating task into the generated task with the
\kcode{mergeable} and \kcode{final} clauses, task generation overhead can be reduced.
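A hedged sketch of how these clauses might be combined follows (the cutoff value
of 100 and the work routine are illustrative only):
\begin{verbatim}
#include <stdio.h>

static void work(int i) { printf("item %d\n", i); }

void task_overhead_sketch(int n)
{
   #pragma omp parallel
   #pragma omp single
   for (int i = 0; i < n; i++) {
      /* for small n the tasks are undeferred (if clause) and
         final/mergeable, which reduces task-generation overhead */
      #pragma omp task if(n > 100) final(n <= 100) mergeable
      work(i);
   }
}
\end{verbatim}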
A complete list of the tasking constructs and details of their clauses
can be found in the \docref{Tasking Constructs} chapter of the OpenMP Specifications.
%in the \docref{OpenMP Application Programming Interface} section.
%===== Examples Sections =====
\input{tasking/tasking}
\input{tasking/task_priority}
\input{tasking/task_dep}
\input{tasking/task_detach}
\input{tasking/taskgroup}
\input{tasking/taskyield}
\input{tasking/taskloop}
\input{tasking/parallel_masked_taskloop}
\input{tasking/taskloop_dep}

Contributions.md Normal file

@ -0,0 +1,234 @@
# Contributing
The usual process for adding new examples, making changes or adding corrections
is to submit an issue for discussion and initial evaluation of changes or example additions.
When there is a consensus at a meeting about the contribution,
the issue will be brought forward for voting at the OpenMP Language
Committee meetings and you will be asked to submit a pull request.
Of course, if your contribution is an obvious correction, clarification, or note, you
may want to submit a pull request directly.
-----------------------------------------------------------
## The OpenMP Examples document
The OpenMP Examples document is in LaTeX format.
Please see the main LaTeX file, `openmp-examples.tex`, for more information.
## Maintainer
[OpenMP Examples Subcommittee](http://twiki.openmp.org/twiki/bin/view/OpenMPLang/OpenMPExamplesSubCommittee)
For a brief revision history, see `Changes.log` in the repo.
## Git procedure
* Fork your own branch of the OpenMP [examples-internal repo](https://github.com/OpenMP/examples-internal)
* Clone your fork locally
* If you are working on generic or old-version updates, create a branch off main.
* If you are working on an example for a release candidate for version #.#, create a branch off work_#.#.
1) `git clone --branch <main|work_#.#> https://github.com/<my_account>/examples-internal`
2) `git checkout -b <branch_name>`
3) ... `add`, `commit`
4) `git push -u origin <branch_name>`
5) `make` or `make diff` will create a full-document pdf or just a pdf with differences (do this at any point).
* `git status` and `git branch -a` are your friends
* Submit an issue for your work (usually with a diff pdf), and then you will be asked to submit a pull request
* Create an issue by selecting the [issue tab](https://github.com/OpenMP/examples-internal/issues) and clicking on `new issue`.
* Use this Markdown cheatsheet for [issue formatting](https://wordpress.com/support/markdown-quick-reference/)
* More Markdown details are available [here](https://markdown-it.github.io)
* You can cut and paste Markdown-formatted text into a [reader](https://dillinger.io) to see formatting effects.
* Forced spaces are available in Markdown. On a Mac it is "option+space".
* Polling is available. Go to [gh-poll](https://app.gh-polls.com/). Type an option on each line, then click `copy markdown`, and paste the contents into the issue. (Use preview to check your poll, and then submit it.)
* Create a pull request
## Processing source code
* Prepare source code (C/C++ and Fortran) and a text description (use similar styles found in recent examples)
* Determine the *example* name `<ename>`, *sequence* identifier `<seq-id>` and *compiler* suffix `<csuffix>` for the example
* The syntax is: `<ename>.<seq-id>.<csuffix>` (e.g. `affinity_display.1.f90`)
* The example name may be a Section name (e.g. affinity), or a Subsection name (affinity_display)
* If you are creating a new Chapter, it may be the chapter name.
* New examples are usually added at the end of a Section or Subsection. Number it as the next number in the sequence numbers for examples in that Section or Subsection.
* The compiler suffix `<csuffix>` is `c`, `cpp`, `f`, and `f90` for C, C++ and Fortran (fixed/free form) codes.
* Insert the code in the sources directory for each chapter, and include the following metadata:
* Metadata Tags for example sources:
```
@@name: <ename>.<seq-no>
@@type: C|C++|F-fixed|F-free
@@operation: view|compile|link|run
@@expect: success|ct-error|rt-error|unspecified
@@version: [pre_]omp_<verno>
@@env: <environment_variables>
@@depend: <source_code_name>
```
* **name**
- is the name of an example
* **type**
- is the source code type, which can be translated into or from proper file extension (C:c,C++:cpp,F-fixed:f,F-free:f90)
* **operation**
- indicates how the source code is treated. Possible values are:
- `view` - code for illustration only, not compilable;
- `compile` - incomplete program, such as function or subroutine;
- `link` - complete program, but no verification value;
- `run` - complete program with verification value.
* **expect**
- indicates some expected result for testing purpose.
- `success` means no issue;
- `ct-error` applies to the result of code compilation;
- `rt-error` is for a case where compilation may be successful, but the code
contains potential runtime issues (including race condition);
- `unspecified` could result from a non-conforming code or is for code
that is viewable only.
* **version**
- indicates that the example uses features in a specific OpenMP version, such as "`omp_5.0`".
The prefix `pre_` indicates that the example uses features prior to a specific version, such as "`pre_omp_3.0`".
* **env**
- specifies any environment variables needed to run the code.
This tag is optional and can be repeated.
* **depend**
- specifies a source code file on which the current code depends.
This tag is optional and can be repeated.
* For **env** and **depend**, make sure to specify
a proper skipping number `<s>` in the LaTeX macros described below
to match with the number of `env` and `depend` tags.
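As a purely illustrative sketch (the exact comment styling used in existing sources may differ), the metadata block at the top of a hypothetical C example could look like:

```
/*
 * @@name:       my_example.1
 * @@type:       C
 * @@operation:  run
 * @@expect:     success
 * @@version:    omp_5.0
 */
```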
## Process for text
* Create or update the description text in a Section/Subsection file under each chapter directory, usually `<chap_directory>/<ename>.tex`
* If adding a new Subsection, just include it in the appropriate subsection file (`<subsection>.tex`)
* If adding a new Section, create an `<section>.tex` file and add an entry in the corresponding chapter file, such as `Chap_affinity.tex`
* If adding a new Chapter, create a `Chap_<chap_name>.tex` file with introductory text, and add a new `<section>.tex` file with text and links to the code. Update `Makefile` and `openmp-examples.tex` to include the new chapter file.
* Commit your changes into your fork of examples-internal
* Submit your issue at [OpenMP Examples internal repo](https://github.com/openmp/examples-internal/issues), and include a PDF when ready.
* Examples subcommittee members can view [meeting schedule and notes](http://twiki.openmp.org/twiki/bin/view/OpenMPLang/ExamplesSchedules)
* Shepherd your issue to acceptance (discussed at weekly Examples meeting and in issue comments)
* When it is in a ready state, you should then submit a pull request.
* It will be reviewed and voted on, and changes will be requested.
* Once the last changes are made, it will be verified and merged into an appropriate branch (either the `main` branch or a working branch).
## LaTeX macros for examples
The following describes LaTeX macros defined specifically for examples.
* Source code with language h-rules
* Source code without language h-rules
* Language h-rules
* Macros for keywords in text description
* Other macros
* See `openmp.sty` for more information
### Source code with language h-rules
```
\cexample[<verno>]{<ename>}{<seq-no>}[<s>] % for C/C++ examples
\cppexample[<verno>]{<ename>}{<seq-no>}[<s>] % for C++ examples
\fexample[<verno>]{<ename>}{<seq-no>}[<s>] % for fixed-form Fortran examples
\ffreeexample[<verno>]{<ename>}{<seq-no>}[<s>] % for free-form Fortran examples
```
### Source code without language h-rules
```
\cnexample[<verno>]{<ename>}{<seq-no>}[<s>]
\cppnexample[<verno>]{<ename>}{<seq-no>}[<s>]
\fnexample[<verno>]{<ename>}{<seq-no>}[<s>]
\ffreenexample[<verno>]{<ename>}{<seq-no>}[<s>]
\srcnexample[<verno>]{<ename>}{<seq-no>}{<ext>}[<s>]
```
Optional `<verno>` can be supplied in a macro to include a specific OpenMP
version in the example header. This option also suggests one additional
tag (`@@version`) line is included in the corresponding source code.
If this is not the case (i.e., no `@@version` tag line), one needs to
prefix `<verno>` with an underscore '\_' symbol in the macro.
The exception is macro `\srcnexample`, for which the corresponding
source code might not contain any `@@` metadata tags. The `ext` argument
to this macro is the file extension (such as `h`, `hpp`, `inc`).
The `<s>` option to each macro allows finer-control of any additional lines
to be skipped due to addition of new `@@` tags, such as `@@env`.
The default value for `<s>` is 0.
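For instance, a section file might pull in a C example and its free-form Fortran counterpart with something like the following (the example name, version, and skip count are illustrative):

```
\cexample[5.0]{my_example}{1}         % includes sources/my_example.1.c
\ffreeexample[5.0]{my_example}{1}[1]  % includes sources/my_example.1.f90, skipping one extra @@ tag line
```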
### Language h-rules
```
\cspecificstart, \cspecificend
\cppspecificstart, \cppspecificend
\ccppspecificstart, \ccppspecificend
\fortranspecificstart, \fortranspecificend
\begin{cspecific}[s] ... \end{cspecific}
\begin{cppspecific}[s] ... \end{cppspecific}
\begin{ccppspecific}[s] ... \end{ccppspecific}
\begin{fortranspecific}[s] ... \end{fortranspecific}
\topmarker{Lang}
```
Use of the structured `\begin{} .. \end{}` environments is the preferred
way of specifying language-dependent text over the unstructured approach
of using `\*specificstart` and `\*specificend`.
The option `[s]` to each of the environments can specify a vertical shift
for the beginning rule, such as when followed by a section header.
The macro `\topmarker` puts a dashed blue line floater at top of a page for
"Lang (cont.)" where `Lang` can be `C/C++`, `C++`, `Fortran`.
### Macros for keywords in text description
A partial list:
- `\kcode{}` - for OpenMP keywords, such as directives, clauses, environment variables, API routines. Support direct use of '_' (underscore) and ' ' (space)
- `\scode{}` - OpenMP specifier with special chars, such as '`$`' in "`!$omp`"
- `\bcode{}` - base language keywords (such as `ASSOCIATE` in Fortran)
- `\vcode{}` - values of a keyword, such as `TRUE`, `FALSE`, `VERBOSE`
- `\plc{}` - OpenMP concept, such as ICV names; `\splc{}` - escape '_' (underscore)
- `\example{}` - example names, such as `\example{taskloop_reduction.1}`
- `\docref{}` - chapter or section name of a document, such as the spec
- `\ucode{}` - program variables, procedure names, or expression in examples codes. Support direct use of '_' (underscore) and ' ' (space).
- `\pout{}` - program outputs
Examples:
- `\kcode{declare reduction}` for **declare reduction**
- `\scode{!$omp}` sentinel, however, `\kcode{\#pragma omp}`
- `\kcode{map(iterator(\ucode{i=0:n}), tofrom: \ucode{p[i]})}` for **map(iterator(**_i=0:n_**), tofrom:** _p[i]_**)**
- Fortran `\bcode{IMPLICIT NONE}` statement
- The `\vcode{VERBOSE}` value for `\kcode{OMP_DISPLAY_ENV}`
- OpenMP `\plc{directives}`, the `\plc{num-threads}` ICV
- This is an example name `\example{taskloop_reduction.1}`
- `(\ucode{x,y,z})` argument for procedure `\ucode{a_proc_name}`
- structure constructor `\ucode{point($\ldots$)}`
- This is a code output `"\pout{x = 1}"`
### Other macros
```
\cchapter{<Chapter Name>}{<chap_directory>}
\hexentry[ext1]{<example_name>}[ext2]{<earlier_tag>}
\hexmentry[ext1]{<example_name>}[ext2]{<earlier_tag>}{<prior_name>}
\examplesref{<verno>}
\examplesblob{<verno/file>}
```
The `\cchapter` macro is used for starting a chapter with proper page spacing.
`<Chapter Name>` is the name of a chapter and `<chap_directory>` is the name
of the chapter directory. All section and subsection files for the chapter
should be placed under `<chap_directory>`. The corresponding example sources
should be placed under the `sources` directory inside `<chap_directory>`.
A previously-defined macro `\sinput{<section_file>}` to import a section
file from `<chap_directory>` is no longer supported. Please use
`\input{<chap_directory>/<section_file>}` explicitly.
The two macros `\hexentry` and `\hexmentry` are defined for simplifying
entries in the feature deprecation and update tables. Option `[ext1]` is
the file extension with a default value of `c` and option `[ext2]` is
the file extension for the associated second file if present.
`<earlier_tag>` is the version tag of the corresponding example
in the earlier version. `\hexentry` assumes no name change for an example
in different versions; `\hexmentry` can be used to specify a prior name
if it is different.
The two macros `\examplesref` and `\examplesblob` are for referencing
a specific version of or a file in the github Examples repository.
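Hypothetical uses of these macros (the names, versions, and path are illustrative):

```
\hexentry{my_example.1}[f90]{4.5}               % C and free-form Fortran files, previously tagged 4.5
\hexmentry{my_example.1}[f90]{4.5}{old_name.1}  % same, but renamed from old_name.1
\examplesref{5.2}                               % link to the version 5.2 Examples document
\examplesblob{v5.2/Makefile}                    % link to a file in the repository
```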
## License
For copyright information, please see [omp_copyright.txt](omp_copyright.txt).

Deprecated_Features.tex Normal file

@ -0,0 +1,282 @@
\cchapter{Feature Deprecations and Updates in Examples}{deprecated_features}
\label{chap:deprecated_features}
\label{sec:deprecated_features}
\index{deprecated features}
\newcommand\tabpcont[1]{\multicolumn{2}{l}{\small\slshape table continued #1 page}}
\newcommand\tabpheader{\textbf{Version} & \textbf{Deprecated Feature} &
\textbf{Replacement}}
\newcommand\tabuheader{\textbf{Example Name} & \textbf{Earlier Version} &
\textbf{Feature Updated}}
\newcommand\dpftable[1]{
\renewcommand{\arraystretch}{1.0}
\tablefirsthead{%
\hline\\[-2ex]
\tabuheader\\[2pt]
\hline\\[-2ex]
}
\tablehead{%
\tabpcont{from previous}\\[2pt]
\hline\\[-2ex]
\tabuheader\\[2pt]
\hline\\[-2ex]
}
\tabletail{%
\hline\\[-2.5ex]
\tabpcont{on next}\\
}
\tablelasttail{\hline\\[-1ex]}
\tablecaption{Updated Examples for Features Deprecated in Version #1\label{tab:Updated Examples #1}}
}
Deprecation of features began in OpenMP 5.0.
Examples that use a deprecated feature have been updated with an equivalent
replacement feature.
Table~\ref{tab:Deprecated Features} summarizes deprecated features and
their replacements in each version. Affected examples are updated
accordingly and listed in Section~\ref{sec:Updated Examples}.
\nolinenumbers
\renewcommand{\arraystretch}{1.4}
\tablefirsthead{%
\hline
\tabpheader\\
\hline\\[-3.5ex]
}
\tablehead{%
\tabpcont{from previous}\\
\hline
\tabpheader\\
\hline\\[-3ex]
}
\tabletail{%
\hline\\[-4ex]
\tabpcont{on next}\\
}
\tablelasttail{\hline\\[-2ex]}
\tablecaption{Deprecated Features and Their Replacements\label{tab:Deprecated Features}}
\begin{supertabular}{p{0.4in} p{2.3in} p{2.2in}}
6.0 & \kcode{declare reduction(}\plc{reduction-id}: \plc{typename-list}: \plc{combiner}\kcode{)}
& \kcode{declare reduction(}\plc{reduction-id}: \plc{typename-list}\kcode{)} \kcode{combiner(\plc{combiner-exp})} \\
\hline
5.2 & \kcode{default} clause on metadirectives
& \kcode{otherwise} clause \\
5.2 & delimited \kcode{declare target} directive for C/C++
& \kcode{begin declare target} directive \\
5.2 & \kcode{to} clause on \kcode{declare target} directive
& \kcode{enter} clause \\
5.2 & non-argument \kcode{destroy} clause on \kcode{depobj} construct
& \kcode{destroy(\plc{argument})} \\
5.2 & \kcode{allocate} directive for Fortran \bcode{ALLOCATE} statements
& \kcode{allocators} directive \\
5.2 & \kcode{depend} clause on \kcode{ordered} construct
& \kcode{doacross} clause \\
5.2 & \kcode{linear(\plc{modifier(list): linear-step})} clause
& \kcode{linear(\plc{list:} step(\plc{linear-step})\plc{, modifier})} clause \\
\hline
5.1 & \kcode{master} construct
& \kcode{masked} construct \\
5.1 & \kcode{master} affinity policy
& \kcode{primary} affinity policy \\
\hline
5.0 & \kcode{omp_lock_hint_*} constants
& \kcode{omp_sync_hint_*} constants \\[2pt]
\end{supertabular}
\linenumbers
These replacements appear in examples that otherwise illustrate earlier features.
When using a compiler that is compliant with a version prior to
the indicated version, the earlier form of an example for a previous
version is listed as a reference.
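As an illustrative sketch of the first row of the table (the reduction identifier
and the type are arbitrary), the two forms of a user-defined reduction in C are:
\begin{verbatim}
/* deprecated form: combiner embedded in the directive arguments */
#pragma omp declare reduction(merge : int : omp_out += omp_in)

/* replacement form: combiner supplied through the combiner clause */
#pragma omp declare reduction(merge : int) combiner(omp_out += omp_in)
\end{verbatim}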
\newpage
\section{Updated Examples for Different Versions}
\label{sec:Updated Examples}
The following tables list the updated examples for different versions as
a result of feature deprecation. The \emph{Earlier Version} column of
the tables shows the version tag of the earlier version. It also shows
the prior name of an example when it has been renamed.
Table~\ref{tab:Updated Examples 6.0} lists the updated examples for
features deprecated in OpenMP 6.0
in the Examples Document Version
\href{https://github.com/OpenMP/Examples/tree/v6.0}{6.0}.
The \emph{Earlier Version} column of the table lists the earlier version
tags of the examples that can be found in
the Examples Document Version
\href{https://github.com/OpenMP/Examples/tree/v5.2}{5.2}.
\index{clauses!combiner@\kcode{combiner}}
\index{combiner clause@\kcode{combiner} clause}
\nolinenumbers
\dpftable{6.0}
\begin{supertabular}{p{1.7in} p{1.1in} p{2.2in}}
\hexentry{udr.1}[f90]{4.0} &
\plc{combiner} expression in \kcode{declare} \\
\hexentry{udr.2}[f90]{4.0} &
\kcode{reduction} directive changed to use \\
\hexentry{udr.3}[f90]{4.0} & \kcode{combiner} clause \\
\hexentry[f90]{udr.4}{4.0} & \\
\hexentry[cpp]{udr.5}{4.0} & \\
\hexentry[cpp]{udr.6}{4.0} & \\[2pt]
\end{supertabular}
\linenumbers
Table~\ref{tab:Updated Examples 5.2} lists the updated examples for
features deprecated in OpenMP 5.2
in the Examples Document Version \examplesref{5.2}.
The \emph{Earlier Version} column of the table lists the earlier version
tags of the examples that can be found in
the Examples Document Version \examplesref{5.1}.
\index{clauses!default@\kcode{default}}
\index{clauses!otherwise@\kcode{otherwise}}
\index{clauses!to@\kcode{to}}
\index{clauses!enter@\kcode{enter}}
\index{clauses!depend@\kcode{depend}}
\index{clauses!doacross@\kcode{doacross}}
\index{clauses!linear@\kcode{linear}}
\index{clauses!destroy@\kcode{destroy}}
\index{default clause@\kcode{default} clause}
\index{otherwise clause@\kcode{otherwise} clause}
\index{to clause@\kcode{to} clause}
\index{enter clause@\kcode{enter} clause}
\index{depend clause@\kcode{depend} clause}
\index{doacross clause@\kcode{doacross} clause}
\index{linear clause@\kcode{linear} clause}
\index{destroy clause@\kcode{destroy} clause}
\index{directives!begin declare target@\kcode{begin declare target}}
\index{begin declare target directive@\kcode{begin declare target} directive}
\index{allocate directive@\kcode{allocate} directive}
\index{allocators directive@\kcode{allocators} directive}
\nolinenumbers
\dpftable{5.2}
\begin{supertabular}{p{1.7in} p{1.2in} p{2.1in}}
\hexentry{error.1}[f90]{5.1} &
\kcode{default} clause on metadirectives \\
\hexentry{metadirective.1}[f90]{5.0} &
replaced with \kcode{otherwise} clause \\
\hexentry{metadirective.2}[f90]{5.0} & \\
\hexentry{metadirective.3}[f90]{5.0} & \\
\hexentry{metadirective.4}[f90]{5.1} & \\
\hexentry{target_ptr_map.4}{5.1} & \\
\hexentry{target_ptr_map.5}[f90]{5.1} & \\[2pt]
\hline\\[-2ex]
\hexentry[f90]{array_shaping.1}{5.0} &
\kcode{to} clause on \kcode{declare target} \\
\hexentry{target_reverse_offload.7}{5.0} &
directive replaced with \kcode{enter} clause \\
\hexentry{target_task_reduction.1}[f90]{5.1} & \\
\hexentry{target_task_reduction.2a}[f90]{5.0} & \\
\hexentry{target_task_reduction.2b}[f90]{5.1} &\\[2pt]
\hline\\[-2ex]
\hexentry{array_shaping.1}{5.0} &
delimited \kcode{declare target} \\
\hexentry{async_target.1}{4.0} &
directive replaced with \\
\hexentry{async_target.2}{4.0} &
\kcode{begin declare target} \\
\hexentry{declare_target.1}{4.0} &
directive for C/C++ \\
\hexentry[cpp]{declare_target.2c}{4.0} & \\
\hexentry{declare_target.3}{4.0} & \\
\hexentry{declare_target.4}{4.0} & \\
\hexentry{declare_target.5}{4.0} & \\
\hexentry{declare_target.6}{4.0} & \\
\hexentry{declare_variant.1}{5.0} & \\
\hexentry{device.1}{4.0} & \\
\hexentry{metadirective.3}{5.0} & \\
\hexentry{target_ptr_map.2}{5.0} & \\
\hexentry{target_ptr_map.3a}{5.0} & \\
\hexentry{target_ptr_map.3b}{5.0} & \\
\hexentry{target_struct_map.1}{5.0} & \\
\hexentry[cpp]{target_struct_map.2}{5.0} & \\
\hexentry{target_struct_map.3}{5.0} & \\
\hexentry{target_struct_map.4}{5.0} & \\[2pt]
\hline\\[-2ex]
\hexentry{doacross.1}[f90]{4.5} &
\kcode{depend} clause on \kcode{ordered} \\
\hexentry{doacross.2}[f90]{4.5} &
construct replaced with \kcode{doacross} \\
\hexentry{doacross.3}[f90]{4.5} &
clause \\
\hexentry{doacross.4}[f90]{4.5} & \\[2pt]
\hline\\[-2ex]
\hexentry[cpp]{linear_modifier.1}[f90]{4.5} &
modifier syntax change for \kcode{linear} \\
\hexentry[cpp]{linear_modifier.2}[f90]{4.5} &
clause on \kcode{declare simd} directive \\
\hexentry{linear_modifier.3}[f90]{4.5} & \\[2pt]
\hline\\[-2ex]
\hexentry[f90]{allocators.1}{5.0} &
\kcode{allocate} directive replaced with \kcode{allocators} directive
for Fortran \bcode{allocate} statements \\[2pt]
\hline\\[-2ex]
\hexentry{depobj.1}[f90]{5.0} &
argument added to \kcode{destroy} clause on \kcode{depobj}
construct \\[2pt]
\end{supertabular}
\linenumbers
\newpage
Table~\ref{tab:Updated Examples 5.1} lists the updated examples for
features deprecated in OpenMP 5.1
in the Examples Document Version \examplesref{5.1}.
The \emph{Earlier Version} column of the table lists the earlier version
tags and prior names of the examples that can be found in
the Examples Document Version \examplesref{5.0.1}.
\index{affinity!master policy@\kcode{master} policy}
\index{affinity!primary policy@\kcode{primary} policy}
\index{constructs!master@\kcode{master}}
\index{constructs!masked@\kcode{masked}}
\index{master construct@\kcode{master} construct}
\index{masked construct@\kcode{masked} construct}
\nolinenumbers
\dpftable{5.1}
\begin{supertabular}{p{1.8in} p{1.4in} p{1.8in}}
\hexentry{affinity.5}[f]{4.0} &
\kcode{master} affinity policy replaced with \kcode{primary} policy \\[2pt]
\hline\\[-2ex]
\hexentry{async_target.3}[f90]{5.0} &
\kcode{master} construct replaced \\
\hexentry{cancellation.2}[f90]{4.0} &
with \kcode{masked} construct \\
\hexentry{copyprivate.2}[f]{3.0} & \\
\hexentry[f]{fort_sa_private.5}{3.0} & \\
\hexentry{lock_owner.1}[f]{3.0} & \\
\hexmentry{masked.1}[f]{3.0}{master.1} & \\
\hexmentry{parallel_masked_taskloop.1}[f90]{5.0}{parallel_master_taskloop.1} &\\
\hexentry{reduction.6}[f]{3.0} & \\
\hexentry{target_task_reduction.1}[f90]{5.0} & \\
\hexentry{target_task_reduction.2b}[f90]{5.0} & \\
\hexentry{taskloop_simd_reduction.1}[f90]{5.0} & \\
\hexentry{task_detach.1}[f90]{5.0} & \\[2pt]
\end{supertabular}
\linenumbers
Table~\ref{tab:Updated Examples 5.0} lists the updated examples for
features deprecated in OpenMP 5.0
in the Examples Document Version \examplesref{5.1}.
The \emph{Earlier Version} column of the table lists the earlier version
tags of the examples that can be found in
the Examples Document Version \examplesref{5.0.1}.
\nolinenumbers
\dpftable{5.0}
\begin{supertabular}{p{1.6in} p{1.3in} p{2.1in}}
\hexentry{critical.2}[f]{4.5} &
\kcode{omp_lock_hint_*} constants \\
\hexentry[cpp]{init_lock_with_hint.1}[f]{4.5} &
replaced with \kcode{omp_sync_hint_*} constants \\[2pt]
\end{supertabular}
\linenumbers


@ -1,21 +0,0 @@
\chapter*{Examples}
\label{chap:examples}
\addcontentsline{toc}{chapter}{\protect\numberline{}Examples}
The following are examples of the OpenMP API directives, constructs, and routines.
\ccppspecificstart
A statement following a directive is compound only when necessary, and a
non-compound statement is indented with respect to a directive preceding it.
\ccppspecificend
Each example is labeled as \plc{ename.seqno.ext}, where \plc{ename} is
the example name, \plc{seqno} is the sequence number in a section, and
\plc{ext} is the source file extension to indicate the code type and
source form. \plc{ext} is one of the following:
\begin{compactitem}
\item \plc{c} -- C code,
\item \plc{cpp} -- C++ code,
\item \plc{f} -- Fortran code in fixed form, and
\item \plc{f90} -- Fortran code in free form.
\end{compactitem}


@ -1,130 +0,0 @@
%\pagebreak
\section{\code{simd} and \code{declare} \code{simd} Constructs}
\label{sec:SIMD}
The following example illustrates the basic use of the \code{simd} construct
to assure the compiler that the loop can be vectorized.
\cexample{SIMD}{1}
\ffreeexample{SIMD}{1}
\clearpage
When a function can be inlined within a loop the compiler has an opportunity to
vectorize the loop. By guaranteeing SIMD behavior of a function's operations,
characterizing the arguments of the function and privatizing temporary
variables of the loop, the compiler can often create faster, vector code for
the loop. In the examples below the \code{declare} \code{simd} construct is
used on the \plc{add1} and \plc{add2} functions to enable creation of their
corresponding SIMD function versions for execution within the associated SIMD
loop. The functions characterize two different approaches of accessing data
within the function: by a single variable and as an element in a data array,
respectively. The \plc{add3} C function uses dereferencing.
The \code{declare} \code{simd} constructs also illustrate the use of
\code{uniform} and \code{linear} clauses. The \code{uniform(fact)} clause
indicates that the variable \plc{fact} is invariant across the SIMD lanes. In
the \plc{add2} function \plc{a} and \plc{b} are included in the \code{uniform}
list because the C pointer and the Fortran array references are constant. The
\plc{i} index used in the \plc{add2} function is included in a \code{linear}
clause with a constant-linear-step of 1, to guarantee a unity increment of the
associated loop. In the \code{declare} \code{simd} construct for the \plc{add3}
C function the \code{linear(a,b:1)} clause instructs the compiler to generate
unit-stride loads across the SIMD lanes; otherwise, costly \emph{gather}
instructions would be generated for the unknown sequence of access of the
pointer dereferences.
In the \code{simd} constructs for the loops the \code{private(tmp)} clause is
necessary to assure that each vector operation has its own \plc{tmp}
variable.
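The following condensed sketch is modeled on the description above (the actual
sources are included below and may differ in detail):
\begin{verbatim}
#pragma omp declare simd uniform(fact)
double add1(double a, double b, double fact)
{ return a + b + fact; }

#pragma omp declare simd uniform(a, b, fact) linear(i:1)
double add2(double *a, double *b, int i, double fact)
{ return a[i] + b[i] + fact; }

void work(double *a, double *b, double *c, int n)
{
   double tmp;
   #pragma omp simd private(tmp)
   for (int i = 0; i < n; i++) {
      tmp  = add1(a[i], b[i], 1.0);
      c[i] = add2(a, b, i, 1.0) + tmp;
   }
}
\end{verbatim}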
\cexample{SIMD}{2}
\ffreeexample{SIMD}{2}
\pagebreak
A thread that encounters a SIMD construct executes vectorized code for the
iterations. Similar to the concerns of a worksharing loop, a loop vectorized
with a SIMD construct must assure that temporary and reduction variables are
privatized and declared as reductions with clauses. The example below
illustrates the use of \code{private} and \code{reduction} clauses in a SIMD
construct.
\cexample{SIMD}{3}
\ffreeexample{SIMD}{3}
\pagebreak
A \code{safelen(N)} clause in a \code{simd} construct assures the compiler that
there are no loop-carried dependencies for vectors of size \plc{N} or below. If
the \code{safelen} clause is not specified, then the default safelen value is
the number of loop iterations.
The \code{safelen(16)} clause in the example below guarantees that the vector
code is safe for vectors up to and including size 16. In the loop, \plc{m} can
be 16 or greater, for correct code execution. If the value of \plc{m} is less
than 16, the behavior is undefined.
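A condensed sketch of such a loop follows (modeled on the description above;
the array and bounds are illustrative):
\begin{verbatim}
void safelen_sketch(float *b, int n, int m)
{
   /* correct only when m >= 16: vectors up to length 16 are safe */
   #pragma omp simd safelen(16)
   for (int i = m; i < n; i++)
      b[i] = b[i-m] - 1.0f;
}
\end{verbatim}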
\cexample{SIMD}{4}
\ffreeexample{SIMD}{4}
\pagebreak
The following SIMD construct instructs the compiler to collapse the \plc{i} and
\plc{j} loops into a single SIMD loop in which SIMD chunks are executed by
threads of the team. Within the workshared loop chunks of a thread, the SIMD
chunks are executed in the lanes of the vector units.
\cexample{SIMD}{5}
\ffreeexample{SIMD}{5}
%%% section
\section{\code{inbranch} and \code{notinbranch} Clauses}
\label{sec:SIMD_branch}
The following examples illustrate the use of the \code{declare} \code{simd}
construct with the \code{inbranch} and \code{notinbranch} clauses. The
\code{notinbranch} clause informs the compiler that the function \plc{foo} is
never called conditionally in the SIMD loop of the function \plc{myaddint}. On
the other hand, the \code{inbranch} clause for the function \plc{goo} indicates that
the function is always called conditionally in the SIMD loop inside
the function \plc{myaddfloat}.
\cexample{SIMD}{6}
\ffreeexample{SIMD}{6}
In the code below, the function \plc{fib()} is called in the main program and
also recursively called in the function \plc{fib()} within an \code{if}
condition. The compiler creates a masked vector version and a non-masked vector
version for the function \plc{fib()} while retaining the original scalar
version of the \plc{fib()} function.
\cexample{SIMD}{7}
\ffreeexample{SIMD}{7}
%%% section
\pagebreak
\section{Loop-Carried Lexical Forward Dependence}
\label{sec:SIMD_forward_dep}
The following example tests the restriction on a SIMD loop with a loop-carried lexical forward dependence. This dependence must be preserved for the correct execution of SIMD loops.
A loop can be vectorized even though the iterations are not completely independent when it has loop-carried dependences that are forward lexical dependences, indicated in the code below by the read of \plc{A[j+1]} and the write to \plc{A[j]} in C/C++ code (or \plc{A(j+1)} and \plc{A(j)} in Fortran). That is, the read of \plc{A[j+1]} (or \plc{A(j+1)} in Fortran) before the write to \plc{A[j]} (or \plc{A(j)} in Fortran) ordering must be preserved for each iteration in \plc{j} for valid SIMD code generation.
This test assures that the compiler preserves the loop-carried lexical forward dependence when generating correct SIMD code.
\cexample{SIMD}{8}
\ffreeexample{SIMD}{8}


@ -1,141 +0,0 @@
\pagebreak
\section{Synchronization Based on Acquire/Release Semantics}
\label{sec:acquire_and_release_semantics}
%OpenMP 5.0 introduced ``release/acquire'' memory ordering semantics to the
%specification. The memory ordering behavior of OpenMP constructs and routines
%that permit two threads to synchronize with each other are defined in terms of
%\textit{release flushes} and \textit{acquire flushes}, where a release flush
%must occur at the source of the synchronization and an acquire flush must occur
%at the sink of the synchronization. Flushes resulting from a \code{flush}
%directive without a list may function as a release flush, an acquire flush, or
%both a release and acquire flush. Flushes implied on entry to or exit from an
%atomic operation (specified by an \code{atomic} construct) may also function as
%a release flush or an acquire flush, depending on if a memory ordering clause
%appears on a construct. Flushes implied by other OpenMP constructs or routines
%also function as either a release flush or an acquire flush, according to the
%synchronization semantics of the construct.
%%%%%%%%%%%%%%%%%%
As explained in the Memory Model chapter of this document, a flush operation
may be an \emph{acquire flush} and/or a \emph{release flush}, and OpenMP 5.0
defines acquire/release semantics in terms of these fundamental flush
operations. For any synchronization between two threads that is specified by
OpenMP, a release flush logically occurs at the source of the synchronization
and an acquire flush logically occurs at the sink of the synchronization.
OpenMP 5.0 added memory ordering clauses -- \code{acquire}, \code{release}, and
\code{acq\_rel} -- to the \code{flush} and \code{atomic} constructs for
explicitly requesting acquire/release semantics. Furthermore, implicit flushes
for all OpenMP constructs and runtime routines that synchronize OpenMP threads
in some manner were redefined in terms of synchronizing release and acquire
flushes to avoid the requirement of strong memory fences (see the \plc{Flush
Synchronization and Happens Before} and \plc{Implicit Flushes} sections of the
OpenMP Specifications document).
The examples that follow in this section illustrate how acquire and release
flushes may be employed, implicitly or explicitly, for synchronizing threads. A
\code{flush} directive without a list and without any memory ordering clause
can also function as both an acquire and release flush for facilitating thread
synchronization. Flushes implied on entry to, or exit from, an atomic
operation (specified by an \code{atomic} construct) may function as an acquire
flush or a release flush if a memory ordering clause appears on the construct.
On entry to and exit from a \code{critical} construct there is now an implicit
acquire flush and release flush, respectively.
%%%%%%%%%%%%%%%%%%
The first example illustrates how the release and acquire flushes implied by a
\code{critical} region guarantee a value written by the first thread is visible
to a read of the value on the second thread. Thread 0 writes to \plc{x} and
then executes a \code{critical} region in which it writes to \plc{y}; the write
to \plc{x} happens before the execution of the \code{critical} region,
consistent with the program order of the thread. Meanwhile, thread 1 executes a
\code{critical} region in a loop until it reads a non-zero value from
\plc{y} in the \code{critical} region, after which it prints the value of
\plc{x}; again, the executions of the \code{critical} regions happen before the
read from \plc{x} based on the program order of the thread. The \code{critical}
regions executed by the two threads execute in a serial manner, with a
pair-wise synchronization from the exit of one \code{critical} region to the
entry to the next \code{critical} region. These pair-wise synchronizations
result from the implicit release flushes that occur on exit from
\code{critical} regions and the implicit acquire flushes that occur on entry to
\code{critical} regions; hence, the execution of each \code{critical} region in
the sequence happens before the execution of the next \code{critical} region.
A ``happens before'' order is therefore established between the assignment to \plc{x}
by thread 0 and the read from \plc{x} by thread 1, and so thread 1 must see that
\plc{x} equals 10.
\pagebreak
\cexample{acquire_release}{1}
\ffreeexample{acquire_release}{1}
In the second example, the \code{critical} constructs are exchanged with
\code{atomic} constructs that have \textit{explicit} memory ordering specified. When the
atomic read operation on thread 1 reads a non-zero value from \plc{y}, this
results in a release/acquire synchronization that in turn implies that the
assignment to \plc{x} on thread 0 happens before the read of \plc{x} on thread
1. Therefore, thread 1 will print ``x = 10''.
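A condensed sketch of this pattern follows (modeled on the description above;
the complete example sources are included below):
\begin{verbatim}
#include <stdio.h>
#include <omp.h>

int main(void)
{
   int x = 0, y = 0;
   #pragma omp parallel num_threads(2)
   {
      int tmp = 0;
      if (omp_get_thread_num() == 0) {
         x = 10;
         #pragma omp atomic write release    /* release flush */
         y = 1;
      } else {
         while (tmp == 0) {
            #pragma omp atomic read acquire  /* acquire flush */
            tmp = y;
         }
         printf("x = %d\n", x);              /* must print x = 10 */
      }
   }
   return 0;
}
\end{verbatim}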
\cexample{acquire_release}{2}
\ffreeexample{acquire_release}{2}
\pagebreak
In the third example, \code{atomic} constructs that specify relaxed atomic
operations are used with explicit \code{flush} directives to enforce memory
ordering between the two threads. The explicit \code{flush} directive on thread
0 must specify a release flush and the explicit \code{flush} directive on
thread 1 must specify an acquire flush to establish a release/acquire
synchronization between the two threads. The \code{flush} and \code{atomic}
constructs encountered by thread 0 can be replaced by the \code{atomic} construct used in
Example 2 for thread 0, and similarly the \code{flush} and \code{atomic}
constructs encountered by thread 1 can be replaced by the \code{atomic}
construct used in Example 2 for thread 1.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%3
%{\color{violet}
%For this example, the implicit release flush of the \code{flush} directive for thread 0 creates
%a source synchronization with release memory ordering, while the implicit release flush of the
%\code{flush} directive for thread 1 creates a sink synchronization with acquire memory ordering.
%The code performs the same thread synchronization of the previous example, with only a slight
%coding change.
%The explicit \code{release} and \code{acquire} clauses of the atomic construct has been
%replaced with implicit release and aquire flushes of explicit \code{flush} constructs.
%(Here, the \code{atomic} constructs have \plc{relaxed} operations.)
%}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%3
\cexample{acquire_release}{3}
\ffreeexample{acquire_release}{3}
Example 4 will fail to order the write to \plc{x} on thread 0 before the read
from \plc{x} on thread 1. Importantly, the implicit release flush on exit from
the \code{critical} region will not synchronize with the acquire flush that
occurs on the atomic read operation performed by thread 1. This is because
implicit release flushes that occur on a given construct may only synchronize
with implicit acquire flushes on a compatible construct (and vice-versa) that
internally makes use of the same synchronization variable. For a
\code{critical} construct, this might correspond to a \plc{lock} object that
is used by a given implementation (for the synchronization semantics of other
constructs due to implicit release and acquire flushes, refer to the \plc{Implicit
Flushes} section of the OpenMP Specifications document). Either an explicit \code{flush}
directive that provides a release flush (i.e., a flush without a list that does
not have the \code{acquire} clause) must be specified between the
\code{critical} construct and the atomic write, or an atomic operation that
modifies \plc{y} and provides release semantics must be specified.
%{\color{violet}
%In the following example synchronization between the acquire flush of the atomic read
%of \plc{y} by thread 1 is not synchronized with the relaxed atomic construct that
%assigns a value to \plc{y} by thread 0.
%While there is a \code{critical} construct and implicit release flush
%for the \plc{x} assignment of thread 0,
%a release flush association with the \plc{y} assignment of
%thread 0 is not formed. A \code{release} or \code{acq-rel} clause on the
%\code{atomic write} construct or a \code{flush} directive after the assignment to \plc{y}
%will form a synchronization and will guarantee memory ordering of the x and y assignments
%by thread 0.
%}
\cexample{acquire_release_broke}{4}
\ffreeexample{acquire_release_broke}{4}


@ -1,104 +0,0 @@
\section{Affinity Display}
\label{sec:affinity_display}
The following examples illustrate ways to display thread affinity.
Automatic display of affinity can be invoked by setting
the \code{OMP\_DISPLAY\_AFFINITY} environment variable to \code{TRUE}.
The format of the output can be customized by setting the
\code{OMP\_AFFINITY\_FORMAT} environment variable to an appropriate string.
Also, there are API calls for the user to display thread affinity
at selected locations within code.
For the first example the environment variable \code{OMP\_DISPLAY\_AFFINITY} has been
set to \code{TRUE}, and execution occurs on an 8-core system with \code{OMP\_NUM\_THREADS} set to 8.
The affinity for the master thread is reported through a call to the API
\code{omp\_display\_affinity()} routine. For default affinity settings
the report shows that the master thread can execute on any of the cores.
In the following parallel region the affinity for each of the team threads is reported
automatically since the \code{OMP\_DISPLAY\_AFFINITY} environment variable has been set
to \code{TRUE}.
These two reports are often useful (as in hybrid codes using both MPI and OpenMP)
to observe the affinity (for an MPI task) before the parallel region,
and during an OpenMP parallel region. Note: the next parallel region uses the
same number of threads as in the previous parallel region and affinities are
not changed, so affinity is NOT reported.
In the last parallel region, the thread affinities are reported
because the thread affinity has changed.
\cexample{affinity_display}{1}
\ffreeexample{affinity_display}{1}
In the following example 2 threads are forked, and each executes on a socket. Next,
a nested parallel region runs half of the available threads on each socket.
These OpenMP environment variables have been set:
\begin{compactitem}
\item \code{OMP\_PROC\_BIND}="TRUE"
\item \code{OMP\_NUM\_THREADS}="2,4"
\item \code{OMP\_PLACES}="\{0,2,4,6\},\{1,3,5,7\}"
\item \code{OMP\_AFFINITY\_FORMAT}="nest\_level= \%L, parent\_thrd\_num= \%a, thrd\_num= \%n, thrd\_affinity= \%A"
\end{compactitem}
where the numbers correspond to core ids for the system. Note, \code{OMP\_DISPLAY\_AFFINITY} is not
set and is \code{FALSE} by default. This example shows how to use API routines to
perform affinity display operations.
For each of the two first-level threads the \code{OMP\_PLACES} variable specifies
a place with all the core-ids of the socket (\{0,2,4,6\} for one thread and \{1,3,5,7\} for the other).
(As is sometimes the case in 2-socket systems, one socket may consist
of the even id numbers, while the other may have the odd id numbers.) The affinities
are printed according to the \code{OMP\_AFFINITY\_FORMAT} format: providing
the parallel nesting level (\%L), the ancestor thread number (\%a), the thread number (\%n)
and the thread affinity (\%A). In the nested parallel region within the \plc{socket\_work} routine
the affinities for the threads on each socket are printed according to this format.
\cexample{affinity_display}{2}
\ffreeexample{affinity_display}{2}
The next example illustrates more details about affinity formatting.
First, the \code{omp\_get\_affinity\_format()} API routine is used to
obtain the default format. The code checks to make sure the storage
provides enough space to hold the format.
Next, the \code{omp\_set\_affinity\_format()} API routine sets a user-defined
format: \plc{host=\%20H thrd\_num=\%0.4n binds\_to=\%A}.
The host, thread number and affinity fields are specified by \plc{\%20H},
\plc{\%0.4n} and \plc{\%A}: \plc{H}, \plc{n} and \plc{A} are single character "short names"
for the host, thread\_num and thread\_affinity data to be printed,
with format sizes of \plc{20}, \plc{4}, and "size as needed".
The period (.) indicates that the field is displayed right-justified (default is left-justified)
and the "0" indicates that any unused space is to be prefixed with zeros
(e.g. instead of "1", "0001" is displayed for the field size of 4).
%The period (.) indicates that the field is displayed left-justified and the "0" indicates
%that leading zeros are to be added so that the total length for the display of this “n” (thread_num) field is 4.
%The period (\plc{.}) indicates right justified and \plc{0} leading zeros.
%All other text in the format is just user narrative.
Within the parallel region the affinity for each thread is captured by
\code{omp\_capture\_affinity()} into a buffer array with elements indexed
by the thread number (\plc{thrd\_num}).
After the parallel region, the thread affinities are printed in thread-number order.
If the storage area in buffer is inadequate for holding the affinity
data, the stored affinity data is truncated.
%The \plc{max} reduction on the required storage, returned by
%\code{omp\_capture\_affinity} in \plc{nchars}, is used to report
%possible truncation (if \plc{max\_req\_store} > \plc{buffer\_store}).
The maximum value for the number of characters (\plc{nchars}) returned by
\code{omp\_capture\_affinity} is captured by the \code{reduction(max:max\_req\_store)}
clause and the \plc{if(nchars >= max\_req\_store) max\_req\_store=nchars} statement.
It is used to report possible truncation (if \plc{max\_req\_store} > \plc{buffer\_store}).
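A condensed sketch of the API calls described above follows (buffer sizes are
illustrative, and the affinities are printed directly rather than stored and
printed in thread-number order as in the included example):
\begin{verbatim}
#include <stdio.h>
#include <omp.h>

#define FORMAT_STORE 80
#define BUFFER_STORE 80

int main(void)
{
   char default_format[FORMAT_STORE];
   char my_format[] = "host=%20H thrd_num=%0.4n binds_to=%A";

   /* retrieve and display the default affinity format */
   size_t nchars = omp_get_affinity_format(default_format, (size_t)FORMAT_STORE);
   printf("Default affinity format: %s\n", default_format);
   if (nchars >= FORMAT_STORE)
      printf("Caution: default format was truncated.\n");

   omp_set_affinity_format(my_format);   /* install the user-defined format */

   #pragma omp parallel
   {
      char buffer[BUFFER_STORE];
      omp_capture_affinity(buffer, (size_t)BUFFER_STORE, NULL);
      #pragma omp critical
      printf("thread %d: %s\n", omp_get_thread_num(), buffer);
   }
   return 0;
}
\end{verbatim}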
\cexample{affinity_display}{3}
\ffreeexample{affinity_display}{3}


@ -1,43 +0,0 @@
\section{Affinity Query Functions}
\label{sec: affinity_query}
In the example below a team of threads is generated on each socket of
the system, using nested parallelism. Several query functions are used
to gather information to support the creation of the teams and to obtain
socket and thread numbers.
For proper execution of the code, the user must create a place partition, such that
each place is a listing of the core numbers for a socket. For example,
in a 2 socket system with 8 cores in each socket, and sequential numbering
in the socket for the core numbers, the \code{OMP\_PLACES} variable would be set
to "\{0:8\},\{8:8\}", using the place syntax \{\plc{lower\_bound}:\plc{length}:\plc{stride}\},
and the default stride of 1.
The code determines the number of sockets (\plc{n\_sockets})
using the \code{omp\_get\_num\_places()} query function.
In this example each place is constructed with a list of
each socket's core numbers, hence the number of places is equal
to the number of sockets.
The outer parallel region forms a team of threads, and each thread
executes on a socket (place) because the \code{proc\_bind} clause uses
\code{spread} in the outer \code{parallel} construct.
Next, in the \plc{socket\_init} function, an inner parallel region creates a team
of threads equal to the number of elements (core numbers) from the place
of the parent thread. Because the outer \code{parallel} construct uses
a \code{spread} affinity policy, each of its threads inherits a subpartition of
the original partition. Hence, the \code{omp\_get\_place\_num\_procs} query function
returns the number of elements (here procs = cores) in the subpartition of the thread.
After each parent thread creates its nested parallel region on its socket,
the socket number and thread number are reported.
Note: Portable tools like hwloc (Portable HardWare LOCality package), which support
many common operating systems, can be used to determine the configuration of a system.
On some systems there are utilities, files or user guides that provide configuration
information. For instance, the socket number and proc\_id's for a socket
can be found in the /proc/cpuinfo text file on Linux systems.
\cexample{affinity_query}{1}
\ffreeexample{affinity_query}{1}


@ -1,63 +0,0 @@
\pagebreak
\section{Memory Allocators}
\label{sec:allocators}
OpenMP memory allocators can be used to allocate memory with
specific allocator traits. In the following example an OpenMP allocator is used to
specify an alignment for arrays \plc{x} and \plc{y}. The
general approach for attributing traits to variables allocated by
OpenMP is to create or specify a pre-defined \plc{memory space}, create an
array of \plc{traits}, and then form an \plc{allocator} from the
memory space and trait.
The allocator is then specified
in an OpenMP allocation (using an API \plc{omp\_alloc()} function
for C/C++ code and an \code{allocate} directive for Fortran code
in the allocators.1 example).
In the example below the \plc{xy\_memspace} variable is declared
and assigned the default memory space (\plc{omp\_default\_mem\_space}).
Next, an array for \plc{traits} is created. Since only one
trait will be used, the array size is \plc{1}.
A trait is a structure in C/C++ and a derived type in Fortran,
containing 2 components: a key and a corresponding value (key-value pair).
The trait key used here is \plc{omp\_atk\_alignment} (an enum for C/C++
and a parameter for Fortran)
and the trait value of 64 is specified in the \plc{xy\_traits} declaration.
These declarations are followed by a call to the
\plc{omp\_init\_allocator()} function to combine the memory
space (\plc{xy\_memspace}) and the traits (\plc{xy\_traits})
to form an allocator (\plc{xy\_alloc}).
%In the C/C++ code the API \plc{omp\_allocate()} function is used
%to allocate space, similar to \plc{malloc}, except that the allocator
%is specified as the second argument.
%In Fortran an API allocation function is not available.
%An \code{allocate} construct is used (with \plc{x} and \plc{y}
%listed as the variables to be allocated), along
%with an \code{allocator} clause (specifying the \plc{xy\_alloc} as the allocator)
%for the following Fortran \plc{allocate} statement.
In the C/C++ code the API \plc{omp\_alloc()} function is used
to allocate space, similar to \plc{malloc}, except that the allocator
is specified as the second argument.
In Fortran an \code{allocate} directive is used to specify an allocator
for a following Fortran \plc{allocate} statement.
A variable list may be supplied if the allocator
is to be applied to a subset of variables in the Fortran allocate
statement. Specifying the complete list is optional.
Here, the \plc{xy\_alloc} allocator is specified
in the \code{allocator} clause,
and the set of all variables used in the allocate statement is specified in the list.
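A condensed C sketch of the flow described above follows (the array length is
illustrative; the Fortran form with the \code{allocate} directive appears in the
included example):
\begin{verbatim}
#include <omp.h>

int main(void)
{
   const int N = 1000;
   omp_memspace_handle_t  xy_memspace = omp_default_mem_space;
   omp_alloctrait_t       xy_traits[1] = { {omp_atk_alignment, 64} };
   omp_allocator_handle_t xy_alloc;

   /* combine the memory space and the trait list into an allocator */
   xy_alloc = omp_init_allocator(xy_memspace, 1, xy_traits);

   double *x = (double *) omp_alloc(N * sizeof(double), xy_alloc);
   double *y = (double *) omp_alloc(N * sizeof(double), xy_alloc);

   /* ... work with the 64-byte aligned arrays x and y ... */

   omp_free(x, xy_alloc);
   omp_free(y, xy_alloc);
   omp_destroy_allocator(xy_alloc);
   return 0;
}
\end{verbatim}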
%"for a following Fortran allocation statement" (no using "immediately" here)
% it looks like if you have a list, the allocation statement does not need
% to follow immediately.(?)
% spec5.0 157:19-20 The allocate directive must appear in the same scope as
% the declarations of each of its list items and must follow all such declarations.
%\pagebreak
\cexample{allocators}{1}
\ffreeexample{allocators}{1}


@ -1,38 +0,0 @@
\pagebreak
\section{Array Sections in Device Constructs}
\label{sec:array_sections}
The following examples show the usage of array sections in \code{map} clauses
on \code{target} and \code{target} \code{data} constructs.
This example shows the invalid usage of two separate sections of the same array
inside of a \code{target} construct.
\cexample{array_sections}{1}
\ffreeexample{array_sections}{1}
\pagebreak
This example shows the invalid usage of two separate sections of the same array
inside of a \code{target} construct.
\cexample{array_sections}{2}
\ffreeexample{array_sections}{2}
\pagebreak
This example shows the valid usage of two separate sections of the same array inside
of a \code{target} construct.
\cexample{array_sections}{3}
\ffreeexample{array_sections}{3}
\pagebreak
This example shows the valid usage of a wholly contained array section of an already
mapped array section inside of a \code{target} construct.
\cexample{array_sections}{4}
\ffreeexample{array_sections}{4}


@ -1,27 +0,0 @@
\section{Array Shaping}
\label{sec:array-shaping}
\ccppspecificstart
A pointer variable can be shaped to a multi-dimensional array to facilitate
data access. This is achieved by a \plc{shape-operator} casted in front of
a pointer (lvalue expression):
\begin{description}
\item[]\hspace*{5mm}\code{([$s_1$][$s_2$]...[$s_n$])}\plc{pointer}
\end{description}
where each $s_i$ is an integral-type expression of positive value.
The shape-operator can appear in either the \plc{motion-clause}
of the \code{target}~\code{update} directive or the \code{depend} clause.
The following example shows the use of the shape-operator in the
\code{target}~\code{update} directive. The shape-operator \code{([nx][ny+2])}
casts pointer variable $a$ to a 2-dimensional array of size
\plc{nx}$\times$\plc{(ny+2)}. The resulting array is then accessed as
array sections (such as \code{[0:nx][1]} and \code{[0:nx][ny]})
in the \code{from} or \code{to} clause for transferring two columns of
noncontiguous boundary data from or to the device.
Note the use of additional parentheses
around the shape-operator and $a$ to ensure the correct precedence
over array-section operations.
\cnexample{array_shaping}{1}
\ccppspecificend


@ -1,32 +0,0 @@
\pagebreak
\section{Fortran \code{ASSOCIATE} Construct}
\fortranspecificstart
\label{sec:associate}
The following is an invalid example of specifying an associate name on a data-sharing attribute
clause. The constraint in the Data Sharing Attribute Rules section in the OpenMP
4.0 API Specifications states that an associate name preserves the association
with the selector established at the \code{ASSOCIATE} statement. The associate
name \plc{b} is associated with the shared variable \plc{a}. With the predetermined data-sharing
attribute rule, the associate name \plc{b} is not allowed to be specified on the \code{private}
clause.
\fnexample{associate}{1}
In the next example, within the \code{parallel} construct, the associate name \plc{thread\_id}
is associated with the private copy of \plc{i}. The print statement should output the
unique thread number.
\fnexample{associate}{2}
The following example illustrates the effect of specifying a selector name on a data-sharing
attribute clause. The associate name \plc{u} is associated with \plc{v} and the variable \plc{v}
is specified on the \code{private} clause of the \code{parallel} construct.
The construct association is established prior to the \code{parallel} region.
The association between \plc{u} and the original \plc{v} is retained (see the Data Sharing
Attribute Rules section in the OpenMP 4.0 API Specifications). Inside the \code{parallel}
region, \plc{v} has the value of -1 and \plc{u} has the value of the original \plc{v}.
\ffreenexample{associate}{3}
\fortranspecificend

View File

@ -1,14 +0,0 @@
\pagebreak
\section{Asynchronous \code{target} Execution and Dependences}
\label{sec:async_target_exec_depend}
Asynchronous execution of a \code{target} region can be accomplished
by creating an explicit task around the \code{target} region. Examples
with explicit tasks are shown at the beginning of this section.
As of OpenMP 4.5 and beyond the \code{nowait} clause can be used on the
\code{target} directive for asynchronous execution. Examples with
\code{nowait} clauses follow the explicit \code{task} examples.
This section also shows the use of \code{depend} clauses to order
executions through dependences.

View File

@ -1,31 +0,0 @@
\subsection{\code{nowait} Clause on \code{target} Construct}
\label{subsec:target_nowait_clause}
The following example shows how to execute code asynchronously on a
device without an explicit task. The \code{nowait} clause on a \code{target}
construct allows the thread of the \plc{target task} to perform other
work while waiting for the \code{target} region execution to complete.
Hence, the \code{target} region can execute asynchronously on the
device (without requiring a host thread to idle while waiting for
the \plc{target task} execution to complete).
In this example the product of two vectors (arrays), \plc{v1}
and \plc{v2}, is formed. One half of the operations is performed
on the device, and the last half on the host, concurrently.
After a team of threads is formed, the master thread generates
the \plc{target task} while the other threads can continue on, without a barrier,
to the execution of the host portion of the vector product.
The completion of the \plc{target task} (asynchronous target execution) is
guaranteed by the synchronization in the implicit barrier at the end of the
host vector-product worksharing loop region. See the \code{barrier}
glossary entry in the OpenMP specification for details.
The host loop scheduling is \code{dynamic}, to balance the host thread executions, since
one thread is being used for offload generation. In the situation where
little time is spent by the \plc{target task} in setting
up and tearing down the target execution, \code{static} scheduling may be desired.
\cexample{async_target}{3}
\ffreeexample{async_target}{3}
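A condensed C sketch of this division of work follows; the array names, chunk
size, and combined device construct are illustrative assumptions rather than
the bundled example source.
\begin{verbatim}
void vec_mult(double *p, double *v1, double *v2, int n)
{
   #pragma omp parallel
   {
      #pragma omp master
      #pragma omp target teams distribute parallel for nowait \
                  map(to: v1[0:n/2], v2[0:n/2]) map(from: p[0:n/2])
      for (int i = 0; i < n/2; i++)
         p[i] = v1[i] * v2[i];              /* first half on the device */

      #pragma omp for schedule(dynamic,100)
      for (int i = n/2; i < n; i++)
         p[i] = v1[i] * v2[i];              /* second half on the host  */
      /* the implicit barrier at the end of the worksharing loop above
         also guarantees completion of the deferred target task */
   }
}
\end{verbatim}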

View File

@ -1,18 +0,0 @@
%begin
\subsection{Asynchronous \code{target} with \code{nowait} and \code{depend} Clauses}
\label{subsec:async_target_nowait_depend}
More details on dependences can be found in \specref{sec:task_depend}, Task
Dependences. In this example, there are three flow dependences. In the first two dependences the
target task does not execute until the preceding explicit tasks have finished. These
dependences are produced by arrays \plc{v1} and \plc{v2} with the \code{out} dependence type in the first two tasks, and the \code{in} dependence type in the target task.
The last dependence is produced by array \plc{p} with the \code{out} dependence type in the target task, and the \code{in} dependence type in the last task. The last task does not execute until the target task finishes.
The \code{nowait} clause on the \code{target} construct creates a deferrable \plc{target task}, allowing the encountering task to continue execution without waiting for the completion of the \plc{target task}.
\cexample{async_target}{4}
\ffreeexample{async_target}{4}
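A condensed C sketch of the dependence chain described above; the \plc{init}
and \plc{output} helpers are hypothetical placeholders.
\begin{verbatim}
void init(double *v, int n);        /* hypothetical helpers */
void output(double *p, int n);

void vec_mult(double *p, double *v1, double *v2, int n)
{
   #pragma omp task depend(out: v1)
   init(v1, n);

   #pragma omp task depend(out: v2)
   init(v2, n);

   #pragma omp target nowait depend(in: v1, v2) depend(out: p) \
               map(to: v1[0:n], v2[0:n]) map(from: p[0:n])
   #pragma omp parallel for
   for (int i = 0; i < n; i++)
      p[i] = v1[i] * v2[i];

   #pragma omp task depend(in: p)
   output(p, n);

   #pragma omp taskwait              /* wait for all four tasks */
}
\end{verbatim}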
%end

View File

@ -1,55 +0,0 @@
\subsection{Asynchronous \code{target} with Tasks}
\label{subsec:async_target_with_tasks}
The following example shows how the \code{task} and \code{target} constructs
are used to execute multiple \code{target} regions asynchronously. The task that
encounters the \code{task} construct generates an explicit task that contains
a \code{target} region. The thread executing the explicit task encounters a task
scheduling point while waiting for the execution of the \code{target} region
to complete, allowing the thread to switch back to the execution of the encountering
task or one of the previously generated explicit tasks.
\cexample{async_target}{1}
\pagebreak
The Fortran version has an interface block that contains the \code{declare} \code{target} directive.
An identical statement exists in the function declaration (not shown here).
\ffreeexample{async_target}{1}
The following example shows how the \code{task} and \code{target} constructs
are used to execute multiple \code{target} regions asynchronously. The task dependence
ensures that the storage is allocated and initialized on the device before it is
accessed.
\cexample{async_target}{2}
The Fortran example below is similar to the C version above. Instead of pointers, though, it uses
the convenience of Fortran allocatable arrays on the device. In order to preserve the arrays
allocated on the device across multiple \code{target} regions, a \code{target}~\code{data} region
is used in this case.
If there is no shape specified for an allocatable array in a \code{map} clause, only the array descriptor
(also called a dope vector) is mapped. That is, device space is created for the descriptor, and it
is initially populated with host values. In this case, the \plc{v1} and \plc{v2} arrays will be in a
non-associated state on the device. When space for \plc{v1} and \plc{v2} is allocated on the device
in the first \code{target} region the addresses to the space will be included in their descriptors.
At the end of the first \code{target} region, the arrays \plc{v1} and \plc{v2} are preserved on the device
for access in the second \code{target} region. At the end of the second \code{target} region, the data
in array \plc{p} is copied back, but the arrays \plc{v1} and \plc{v2} are not.
A \code{depend} clause is used in the \code{task} directive to provide a wait at the beginning of the second
\code{target} region, to ensure that there is no race condition with \plc{v1} and \plc{v2} in the two tasks.
It would be noncompliant to use \plc{v1} and/or \plc{v2} in lieu of \plc{N} in the \code{depend} clauses,
because the use of non-allocated allocatable arrays as list items in a \code{depend} clause would
lead to unspecified behavior.
\noteheader{--} This example is not strictly compliant with the OpenMP 4.5 specification since the allocation status
of allocatable arrays \plc{v1} and \plc{v2} is changed inside the \code{target} region, which is not allowed.
(See the restrictions for the \code{map} clause in the \plc{Data-mapping Attribute Rules and Clauses}
section of the specification.)
However, the intention is to relax the restrictions on mapping of allocatable variables in the next release
of the specification so that the example will be compliant.
\ffreeexample{async_target}{2}

View File

@ -1,44 +0,0 @@
\pagebreak
\section{The \code{atomic} Construct}
\label{sec:atomic}
The following example avoids race conditions (simultaneous updates of an element
of \plc{x} by multiple threads) by using the \code{atomic} construct.
The advantage of using the \code{atomic} construct in this example is that it
allows updates of two different elements of \plc{x} to occur in parallel. If
a \code{critical} construct were used instead, then all updates to elements of
\plc{x} would be executed serially (though not in any guaranteed order).
Note that the \code{atomic} directive applies only to the statement immediately
following it. As a result, elements of \plc{y} are not updated atomically in
this example.
\cexample{atomic}{1}
\fexample{atomic}{1}
The following example illustrates the \code{read} and \code{write} clauses
for the \code{atomic} directive. These clauses ensure that the given variable
is read or written, respectively, as a whole. Otherwise, some other thread might
read or write part of the variable while the current thread was reading or writing
another part of the variable. Note that most hardware provides atomic reads and
writes for some set of properly aligned variables of specific sizes, but not necessarily
for all the variable types supported by the OpenMP API.
\cexample{atomic}{2}
\fexample{atomic}{2}
The following example illustrates the \code{capture} clause for the \code{atomic}
directive. In this case the value of a variable is captured, and then the variable
is incremented. These operations occur atomically. This particular example could
be implemented using the fetch-and-add instruction available on many kinds of hardware.
The example also shows a way to implement a spin lock using the \code{capture}
and \code{read} clauses.
\cexample{atomic}{3}
\fexample{atomic}{3}
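As a minimal sketch of the \code{capture} form described above (the counter
variable is illustrative):
\begin{verbatim}
int fetch_and_add(int *counter)
{
   int old;
   /* atomically read the old value and increment the shared counter */
   #pragma omp atomic capture
   { old = *counter; *counter += 1; }
   return old;
}
\end{verbatim}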

View File

@ -1,25 +0,0 @@
\pagebreak
\section{Restrictions on the \code{atomic} Construct}
\label{sec:atomic_restrict}
The following non-conforming examples illustrate the restrictions on the \code{atomic}
construct.
\cexample{atomic_restrict}{1}
\fexample{atomic_restrict}{1}
\cexample{atomic_restrict}{2}
\fortranspecificstart
The following example is non-conforming because \code{I} and \code{R} reference
the same location but have different types.
\fnexample{atomic_restrict}{2}
Although the following example might work on some implementations, this is also
non-conforming:
\fnexample{atomic_restrict}{3}
\fortranspecificend

View File

@ -1,24 +0,0 @@
\pagebreak
\section{Binding of \code{barrier} Regions}
\label{sec:barrier_regions}
The binding rules call for a \code{barrier} region to bind to the closest enclosing
\code{parallel} region.
In the following example, the call from the main program to \plc{sub2} is conforming
because the \code{barrier} region (in \plc{sub3}) binds to the \code{parallel}
region in \plc{sub2}. The call from the main program to \plc{sub1} is conforming
because the \code{barrier} region binds to the \code{parallel} region in subroutine
\plc{sub2}.
The call from the main program to \plc{sub3} is conforming because the \code{barrier}
region binds to the implicit inactive \code{parallel} region enclosing the sequential
part. Also note that the \code{barrier} region in \plc{sub3} when called from
\plc{sub2} only synchronizes the team of threads in the enclosing \code{parallel}
region and not all the threads created in \plc{sub1}.
\cexample{barrier_regions}{1}
\fexample{barrier_regions}{1}
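A skeleton of the calling structure described above; the \plc{work} routine
and the loop bounds are placeholders.
\begin{verbatim}
void work(int n);                 /* placeholder */

void sub3(int n)
{
   work(n);
   #pragma omp barrier            /* binds to the closest enclosing parallel region */
   work(n);
}

void sub2(int k)
{
   #pragma omp parallel shared(k)
   sub3(k);
}

void sub1(int n)
{
   int i;
   #pragma omp parallel private(i) shared(n)
   {
      #pragma omp for
      for (i = 0; i < n; i++)
         sub2(i);
   }
}

int main(void)
{
   sub1(2);   /* barrier binds to the parallel region in sub2 */
   sub2(2);   /* barrier binds to the parallel region in sub2 */
   sub3(2);   /* barrier binds to the implicit inactive parallel region */
   return 0;
}
\end{verbatim}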

View File

@ -1,44 +0,0 @@
\pagebreak
\section{Cancellation Constructs}
\label{sec:cancellation}
The following example shows how the \code{cancel} directive can be used to terminate
an OpenMP region. Although the \code{cancel} construct terminates the OpenMP
worksharing region, programmers must still track the exception through the pointer
\plc{ex} and issue a cancellation for the \code{parallel} region if an exception has
been raised. The master thread checks the exception pointer to make sure that the
exception is properly handled in the sequential part. If cancellation of the \code{parallel}
region has been requested, some threads might have executed \code{phase\_1()}.
However, it is guaranteed that none of the threads executed \code{phase\_2()}.
\cppexample{cancellation}{1}
The following example illustrates the use of the \code{cancel} construct in error
handling. If there is an error condition from the \code{allocate} statement,
the cancellation is activated. The encountering thread sets the shared variable
\code{err} and other threads of the binding thread set proceed to the end of
the worksharing construct after the cancellation has been activated.
\ffreeexample{cancellation}{1}
\clearpage
The following example shows how to cancel a parallel search on a binary tree as
soon as the search value has been detected. The code creates a task to descend
into the child nodes of the current tree node. If the search value has been found,
the code remembers the tree node with the found value through an \code{atomic}
write to the result variable and then cancels execution of all search tasks. The
function \code{search\_tree\_parallel} groups all search tasks into a single
task group to control the effect of the \code{cancel taskgroup} directive. The
\plc{level} argument is used to create undeferred tasks after the first ten
levels of the tree.
\cexample{cancellation}{2}
The following is the equivalent parallel search example in Fortran.
\ffreeexample{cancellation}{2}

View File

@ -1,37 +0,0 @@
\pagebreak
\section{C/C++ Arrays in a \code{firstprivate} Clause}
\ccppspecificstart
\label{sec:carrays_fpriv}
The following example illustrates the size and value of list items of array or
pointer type in a \code{firstprivate} clause. The size of new list items is
based on the type of the corresponding original list item, as determined by the
base language.
In this example:
\begin{compactitem}
\item The type of \code{A} is array of two arrays of two ints.
\item The type of \code{B} is adjusted to pointer to array of \code{n}
ints, because it is a function parameter.
\item The type of \code{C} is adjusted to pointer to int, because
it is a function parameter.
\item The type of \code{D} is array of two arrays of two ints.
\item The type of \code{E} is array of \code{n} arrays of \code{n}
ints.
\end{compactitem}
Note that \code{B} and \code{E} involve variable length array types.
The new items of array type are initialized as if each integer element of the original
array is assigned to the corresponding element of the new array. Those of pointer
type are initialized as if by assignment from the original item to the new item.
\cnexample{carrays_fpriv}{1}
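The declarations under discussion have roughly this shape; the initial values
and the parallel region body are illustrative (\plc{n} is assumed to be at least 2).
\begin{verbatim}
int A[2][2] = { {1, 2}, {3, 4} };    /* array of two arrays of two ints */

void f(int n, int B[n][n], int C[])  /* B and C are adjusted to pointer types */
{
   int D[2][2] = { {1, 2}, {3, 4} }; /* array of two arrays of two ints */
   int E[n][n];                      /* variable length array type      */

   E[1][1] = 4;
   #pragma omp parallel firstprivate(A, B, C, D, E)
   {
      /* each thread starts from initialized private copies */
   }
}
\end{verbatim}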
\ccppspecificend

View File

@ -1,78 +0,0 @@
\pagebreak
\section{The \code{collapse} Clause}
\label{sec:collapse}
In the following example, the \code{k} and \code{j} loops are associated with
the loop construct. So the iterations of the \code{k} and \code{j} loops are
collapsed into one loop with a larger iteration space, and that loop is then divided
among the threads in the current team. Since the \code{i} loop is not associated
with the loop construct, it is not collapsed, and the \code{i} loop is executed
sequentially in its entirety in every iteration of the collapsed \code{k} and
\code{j} loop.
The variable \code{j} can be omitted from the \code{private} clause when the
\code{collapse} clause is used since it is implicitly private. However, if the
\code{collapse} clause is omitted then \code{j} will be shared if it is omitted
from the \code{private} clause. In either case, \code{k} is implicitly private
and could be omitted from the \code{private} clause.
\cexample{collapse}{1}
\fexample{collapse}{1}
In the next example, the \code{k} and \code{j} loops are associated with the
loop construct. So the iterations of the \code{k} and \code{j} loops are collapsed
into one loop with a larger iteration space, and that loop is then divided among
the threads in the current team.
The sequential execution of the iterations in the \code{k} and \code{j} loops
determines the order of the iterations in the collapsed iteration space. This implies
that in the sequentially last iteration of the collapsed iteration space, \code{k}
will have the value \code{2} and \code{j} will have the value \code{3}. Since
\code{klast} and \code{jlast} are \code{lastprivate}, their values are assigned
by the sequentially last iteration of the collapsed \code{k} and \code{j} loop.
This example prints: \code{2 3}.
\cexample{collapse}{2}
\fexample{collapse}{2}
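A compact sketch of the collapsed loop just described; the loop bounds follow
the text, and the enclosing function is an assumption.
\begin{verbatim}
#include <stdio.h>

void print_last(void)
{
   int j, k, jlast, klast;
   #pragma omp parallel
   {
      #pragma omp for collapse(2) lastprivate(jlast, klast)
      for (k = 1; k <= 2; k++)
         for (j = 1; j <= 3; j++)
         {
            jlast = j;
            klast = k;
         }
      #pragma omp single
      printf("%d %d\n", klast, jlast);   /* prints: 2 3 */
   }
}
\end{verbatim}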
The next example illustrates the interaction of the \code{collapse} and \code{ordered}
clauses.
In the example, the loop construct has both a \code{collapse} clause and an \code{ordered}
clause. The \code{collapse} clause causes the iterations of the \code{k} and
\code{j} loops to be collapsed into one loop with a larger iteration space, and
that loop is divided among the threads in the current team. An \code{ordered}
clause is added to the loop construct, because an ordered region binds to the loop
region arising from the loop construct.
According to Section 2.12.8 of the OpenMP 4.0 specification,
a thread must not execute more than one ordered region that binds
to the same loop region. So the \code{collapse} clause is required for the example
to be conforming. With the \code{collapse} clause, the iterations of the \code{k}
and \code{j} loops are collapsed into one loop, and therefore only one ordered
region will bind to the collapsed \code{k} and \code{j} loop. Without the \code{collapse}
clause, there would be two ordered regions that bind to each iteration of the \code{k}
loop (one arising from the first iteration of the \code{j} loop, and the other
arising from the second iteration of the \code{j} loop).
The code prints
\code{0 1 1}
\\
\code{0 1 2}
\\
\code{0 2 1}
\\
\code{1 2 2}
\\
\code{1 3 1}
\\
\code{1 3 2}
\cexample{collapse}{3}
\fexample{collapse}{3}

View File

@ -1,13 +0,0 @@
\pagebreak
\section{The \code{copyin} Clause}
\label{sec:copyin}
The \code{copyin} clause is used to initialize threadprivate data upon entry
to a \code{parallel} region. The value of the threadprivate variable in the master
thread is copied to the threadprivate variable of each other team member.
\cexample{copyin}{1}
\fexample{copyin}{1}
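A minimal sketch of the pattern (the counter name and values are illustrative):
\begin{verbatim}
#include <omp.h>

int counter = 0;
#pragma omp threadprivate(counter)

void start_workers(void)
{
   counter = 10;                       /* set in the master thread */
   #pragma omp parallel copyin(counter)
   {
      /* every thread's threadprivate copy starts at 10 */
      counter += omp_get_thread_num();
   }
}
\end{verbatim}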

View File

@ -1,51 +0,0 @@
\pagebreak
\section{The \code{copyprivate} Clause}
\label{sec:copyprivate}
The \code{copyprivate} clause can be used to broadcast values acquired by a single
thread directly to all instances of the private variables in the other threads.
In this example, if the routine is called from the sequential part, its behavior
is not affected by the presence of the directives. If it is called from a \code{parallel}
region, then the actual arguments with which \code{a} and \code{b} are associated
must be private.
The thread that executes the structured block associated with the \code{single}
construct broadcasts the values of the private variables \code{a}, \code{b},
\code{x}, and
\code{y} from its implicit task's data environment to the data environments
of the other implicit tasks in the thread team. The broadcast completes before
any of the threads have left the barrier at the end of the construct.
\cexample{copyprivate}{1}
\fexample{copyprivate}{1}
In this example, assume that the input must be performed by the master thread.
Since the \code{master} construct does not support the \code{copyprivate} clause,
it cannot broadcast the input value that is read. However, \code{copyprivate}
is used to broadcast an address where the input value is stored.
\cexample{copyprivate}{2}
\fexample{copyprivate}{2}
Suppose that the number of lock variables required within a \code{parallel} region
cannot easily be determined prior to entering it. The \code{copyprivate} clause
can be used to provide access to shared lock variables that are allocated within
that \code{parallel} region.
\cexample{copyprivate}{3}
\fortranspecificstart
\fnexample{copyprivate}{3}
Note that the effect of the \code{copyprivate} clause on a variable with the
\code{allocatable} attribute is different than on a variable with the \code{pointer}
attribute. The value of \code{A} is copied (as if by intrinsic assignment) and
the pointer \code{B} is copied (as if by pointer assignment) to the corresponding
list items in the other implicit tasks belonging to the \code{parallel} region.
\fnexample{copyprivate}{4}
\fortranspecificend

View File

@ -1,20 +0,0 @@
\pagebreak
\section{The \code{critical} Construct}
\label{sec:critical}
The following example includes several \code{critical} constructs. The example
illustrates a queuing model in which a task is dequeued and worked on. To guard
against multiple threads dequeuing the same task, the dequeuing operation must
be in a \code{critical} region. Because the two queues in this example are independent,
they are protected by \code{critical} constructs with different names, \plc{xaxis}
and \plc{yaxis}.
\cexample{critical}{1}
\fexample{critical}{1}
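The guarded dequeue operations described above follow this shape; the queue
helpers are placeholders.
\begin{verbatim}
int  dequeue(float *q);               /* placeholder helpers */
void work(int i, float *q);

void process(float *x_queue, float *y_queue)
{
   int ix_next, iy_next;
   #pragma omp parallel private(ix_next, iy_next)
   {
      #pragma omp critical (xaxis)
      ix_next = dequeue(x_queue);     /* only the dequeue is serialized */
      work(ix_next, x_queue);

      #pragma omp critical (yaxis)
      iy_next = dequeue(y_queue);     /* independent queue, different name */
      work(iy_next, y_queue);
   }
}
\end{verbatim}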
The following example extends the previous example by adding the \code{hint} clause to the \code{critical} constructs.
\cexample{critical}{2}
\fexample{critical}{2}

View File

@ -1,142 +0,0 @@
\pagebreak
\section{\code{declare} \code{target} Construct}
\label{sec:declare_target}
\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for a Function}
\label{subsec:declare_target_function}
The following example shows how the \code{declare} \code{target} directive
is used to indicate that the corresponding call inside a \code{target} region
is to a \code{fib} function that can execute on the default target device.
A version of the function is also available on the host device. When the \code{if}
clause conditional expression on the \code{target} construct evaluates to \plc{false},
the \code{target} region (thus \code{fib}) will execute on the host device.
For C/C++ codes the declaration of the function \code{fib} appears between the \code{declare}
\code{target} and \code{end} \code{declare} \code{target} directives.
\cexample{declare_target}{1}
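In outline, the C/C++ pattern is the following; the threshold constant and the
wrapper function are illustrative assumptions.
\begin{verbatim}
#define THRESHOLD 1000000

#pragma omp declare target
int fib(int n);            /* a device version of fib will be generated */
#pragma omp end declare target

int fib_wrapper(int n)
{
   int result;
   #pragma omp target map(from: result) if(n > THRESHOLD)
   result = fib(n);        /* runs on the host when the if clause is false */
   return result;
}
\end{verbatim}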
The Fortran \code{fib} subroutine contains a \code{declare} \code{target} declaration
to indicate to the compiler to create a device executable version of the procedure.
The subroutine name has not been included on the \code{declare} \code{target}
directive and is, therefore, implicitly assumed.
The program uses the \code{module\_fib} module, which presents an explicit interface to
the compiler with the \code{declare} \code{target} declarations for processing
the \code{fib} call.
\ffreeexample{declare_target}{1}
The next Fortran example shows the use of an external subroutine. Without an explicit
interface (through module use or an interface block) the \code{declare} \code{target}
declarations within a external subroutine are unknown to the main program unit;
therefore, a \code{declare} \code{target} must be provided within the program
scope for the compiler to determine that a target binary should be available.
\ffreeexample{declare_target}{2}
\subsection{\code{declare} \code{target} Construct for Class Type}
\label{subsec:declare_target_class}
\cppspecificstart
The following example shows how the \code{declare} \code{target} and \code{end}
\code{declare} \code{target} directives are used to enclose the declaration
of a variable \plc{varY} with a class type \code{typeY}. The member function \code{typeY::foo()} cannot
be accessed on a target device because its declaration did not appear between \code{declare}
\code{target} and \code{end} \code{declare} \code{target} directives.
\cppnexample{declare_target}{2}
\cppspecificend
\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for Variables}
\label{subsec:declare_target_variables}
The following examples show how the \code{declare} \code{target} and \code{end}
\code{declare} \code{target} directives are used to indicate that global variables
are mapped to the implicit device data environment of each target device.
In the following example, the declarations of the variables \plc{p}, \plc{v1}, and \plc{v2} appear
between \code{declare} \code{target} and \code{end} \code{declare} \code{target}
directives indicating that the variables are mapped to the implicit device data
environment of each target device. The \code{target} \code{update} directive
is then used to manage the consistency of the variables \plc{p}, \plc{v1}, and \plc{v2} between the
data environment of the encountering host device task and the implicit device data
environment of the default target device.
\cexample{declare_target}{3}
The Fortran version of the above C code uses a different syntax. Fortran modules
use a list syntax on the \code{declare} \code{target} directive to declare
mapped variables.
\ffreeexample{declare_target}{3}
The following example also indicates that the function \code{Pfun()} is available on the
target device, as well as the variable \plc{Q}, which is mapped to the implicit device
data environment of each target device. The \code{target} \code{update} directive
is then used to manage the consistency of the variable \plc{Q} between the data environment
of the encountering host device task and the implicit device data environment of
the default target device.
In the following example, the function and variable declarations appear between
the \code{declare} \code{target} and \code{end} \code{declare} \code{target}
directives.
\cexample{declare_target}{4}
The Fortran version of the above C code uses a different syntax. In Fortran modules
a list syntax on the \code{declare} \code{target} directive is used to declare
mapped variables and procedures. The \plc{N} and \plc{Q} variables are declared as a comma
separated list. When the \code{declare} \code{target} directive is used to
declare just the procedure, the procedure name need not be listed -- it is implicitly
assumed, as illustrated in the \code{Pfun()} function.
\ffreeexample{declare_target}{4}
\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} with \code{declare} \code{simd}}
\label{subsec:declare_target_simd}
The following example shows how the \code{declare} \code{target} and \code{end}
\code{declare} \code{target} directives are used to indicate that a function
is available on a target device. The \code{declare} \code{simd} directive indicates
that there is a SIMD version of the function \code{P()} that is available on the target
device as well as one that is available on the host device.
\cexample{declare_target}{5}
The Fortran version of the above C code uses a different syntax. Fortran modules
use a list syntax of the \code{declare} \code{target} declaration for the mapping.
Here the \plc{N} and \plc{Q} variables are declared in the list form as a comma separated list.
The function declaration does not use a list and implicitly assumes the function
name. In this Fortran example row and column indices are reversed relative to the
C/C++ example, as is usual for codes optimized for memory access.
\ffreeexample{declare_target}{5}
\subsection{\code{declare}~\code{target} Directive with \code{link} Clause}
\label{subsec:declare_target_link}
In the OpenMP 4.5 standard the \code{declare}~\code{target} directive was extended to allow static
data to be mapped, \emph{when needed}, through a \code{link} clause.
Data storage for items listed in the \code{link} clause becomes available on the device
when it is mapped implicitly or explicitly in a \code{map} clause, and it persists for the scope of
the mapping (as specified by a \code{target} construct,
a \code{target}~\code{data} construct, or
\code{target}~\code{enter/exit}~\code{data} constructs).
Tip: When not all of the global data items will fit on a device and they are not needed
simultaneously, use the \code{link} clause and map the data only when it is needed.
The following C and Fortran examples show two sets of data (single precision and double precision)
that are global on the host for the entire execution on the host; but are only used
globally on the device for part of the program execution. The single precision data
are allocated and persist only for the first \code{target} region. Similarly, the
double precision data are in scope on the device only for the second \code{target} region.
\cexample{declare_target}{6}
\ffreeexample{declare_target}{6}
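The essential C pattern for the \code{link} clause looks roughly like this;
the array names, sizes, and loop bodies are illustrative.
\begin{verbatim}
#define N 1000000          /* stand-in for data too large to keep resident */

float  sp_data[N];         /* single precision set */
double dp_data[N];         /* double precision set */
#pragma omp declare target link(sp_data, dp_data)

void compute_sp(void)
{
   /* device storage for sp_data exists only for the extent of this mapping */
   #pragma omp target map(tofrom: sp_data[0:N])
   for (int i = 0; i < N; i++) sp_data[i] *= 2.0f;
}

void compute_dp(void)
{
   #pragma omp target map(tofrom: dp_data[0:N])
   for (int i = 0; i < N; i++) dp_data[i] *= 2.0;
}
\end{verbatim}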

View File

@ -1,19 +0,0 @@
\pagebreak
\section{The \code{default(none)} Clause}
\label{sec:default_none}
The following example distinguishes the variables that are affected by the \code{default(none)}
clause from those that are not.
\ccppspecificstart
Beginning with OpenMP 4.0, variables with \code{const}-qualified type and no mutable member
are no longer predetermined shared. Thus, these variables (variable \plc{c} in the example)
need to be explicitly listed
in data-sharing attribute clauses when the \code{default(none)} clause is specified.
\cnexample{default_none}{1}
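A small sketch of the point being made (variable names and values are illustrative):
\begin{verbatim}
const int c = 1;           /* const-qualified, no mutable member */
int n = 10;

void g(void)
{
   int a = 0;
   /* c and n must be listed explicitly under default(none) */
   #pragma omp parallel for default(none) shared(c, n) private(a)
   for (int i = 0; i < n; i++)
      a = c + i;
}
\end{verbatim}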
\ccppspecificend
\fexample{default_none}{1}

View File

@ -1,49 +0,0 @@
\pagebreak
\section{The \code{depobj} Construct}
\label{sec:depobj}
The stand-alone \code{depobj} construct provides a mechanism
to create a \plc{depend object} that expresses a dependence to be
used subsequently in the \code{depend} clause of another construct.
The dependence is created from a dependence type and a storage location
within a \code{depend} clause of a \code{depobj} construct,
%just as one would find directly on a \code{task} construct.
and is stored in the depend object.
The depend object is represented by a variable of type \code{omp\_depend\_t}
in C/C++ (by a scalar variable of integer kind \code{omp\_depend\_kind} in Fortran).
In the example below the stand-alone \code{depobj} construct uses the
\code{depend}, \code{update} and \code{destroy} clauses to
\plc{initialize}, \plc{update} and \plc{uninitialize}
a depend object (\code{obj}).
The first \code{depobj} construct initializes the \code{obj}
depend object with
an \code{inout} dependence type with a storage
location defined by variable \code{a}.
This dependence is passed into the \plc{driver}
routine via the \code{obj} depend object.
In the first \plc{driver} routine call, \emph{Task 1} uses
the dependence of the object (\code{inout}),
while \emph{Task 2} uses an \code{in} dependence specified
directly in a \code{depend} clause.
For these task dependences \emph{Task 1} must execute and
complete before \emph{Task 2} begins.
Before the second call to \plc{driver}, \code{obj} is updated
using the \code{depobj} construct to represent an \code{in} dependence.
Hence, in the second call to \plc{driver}, \emph{Task 1}
will have an \code{in} dependence; and \emph{Task 1} and
\emph{Task 2} can execute simultaneously. Note that in an \code{update}
clause only the dependence type is updated.
The third \code{depobj} construct uses the \code{destroy} clause.
It frees resources as it puts the depend object in an uninitialized state--
effectively destroying the depend object.
After an object has been uninitialized it can be initialized again
with a new dependence type \emph{and} a new variable.
\cexample{depobj}{1}
\ffreeexample{depobj}{1}
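A condensed C sketch of this life cycle; the \plc{driver} body is reduced to
the two tasks, and the result variables are illustrative.
\begin{verbatim}
#include <omp.h>

void driver(float *a, float *r1, float *r2, omp_depend_t *obj)
{
   #pragma omp task depend(depobj: *obj)  /* Task 1: dependence taken from obj */
   *r1 = *a + 1.0f;

   #pragma omp task depend(in: *a)        /* Task 2: plain in dependence */
   *r2 = *a + 2.0f;
}

void example(void)
{
   float a = 1.0f, r1, r2;
   omp_depend_t obj;

   #pragma omp depobj(obj) depend(inout: a)  /* initialize */
   driver(&a, &r1, &r2, &obj);   /* Task 1 completes before Task 2 starts  */
   #pragma omp taskwait

   #pragma omp depobj(obj) update(in)        /* change the dependence type */
   driver(&a, &r1, &r2, &obj);   /* Task 1 and Task 2 may run concurrently */
   #pragma omp taskwait

   #pragma omp depobj(obj) destroy           /* uninitialize the depend object */
}
\end{verbatim}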

View File

@ -1,57 +0,0 @@
\pagebreak
\section{Device Routines}
\label{sec:device}
\subsection{\code{omp\_is\_initial\_device} Routine}
\label{subsec:device_is_initial}
The following example shows how the \code{omp\_is\_initial\_device} runtime library routine
can be used to query if a code is executing on the initial host device or on a
target device. The example then sets the number of threads in the \code{parallel}
region based on where the code is executing.
\cexample{device}{1}
\ffreeexample{device}{1}
\subsection{\code{omp\_get\_num\_devices} Routine}
\label{subsec:device_num_devices}
The following example shows how the \code{omp\_get\_num\_devices} runtime library routine
can be used to determine the number of devices.
\cexample{device}{2}
\ffreeexample{device}{2}
\subsection{\code{omp\_set\_default\_device} and \\
\code{omp\_get\_default\_device} Routines}
\label{subsec:device_is_set_get_default}
The following example shows how the \code{omp\_set\_default\_device} and \code{omp\_get\_default\_device}
runtime library routines can be used to set the default device and determine the
default device respectively.
\cexample{device}{3}
\ffreeexample{device}{3}
\subsection{Target Memory and Device Pointers Routines}
\label{subsec:target_mem_and_device_ptrs}
The following example shows how to create space on a device, transfer data
to and from that space, and free the space, using API calls. The API calls
directly execute allocation, copy and free operations on the device, without invoking
any mapping through a \code{target} directive. The \code{omp\_target\_alloc} routine allocates space
and returns a device pointer for referencing the space in the \code{omp\_target\_memcpy}
API routine on the host. The \code{omp\_target\_free} routine frees the space on the device.
The example also illustrates how to access that space
in a \code{target} region by exposing the device pointer in an \code{is\_device\_ptr} clause.
The example creates an array of cosine values on the default device, to be used
on the host device. The function fails if a default device is not available.
\cexample{device}{4}
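In outline, the API sequence is the following; the buffer names and the simple
fill computation are stand-ins for the cosine calculation in the bundled example.
\begin{verbatim}
#include <stdlib.h>
#include <omp.h>

int fill_from_device(double *host_buf, int n)
{
   int dev  = omp_get_default_device();
   int host = omp_get_initial_device();

   double *dev_buf = (double *) omp_target_alloc(n * sizeof(double), dev);
   if (dev_buf == NULL) return 1;      /* no device space available */

   #pragma omp target is_device_ptr(dev_buf) device(dev)
   for (int i = 0; i < n; i++)
      dev_buf[i] = 0.5 * i;            /* fill the device buffer */

   /* copy the device buffer back into the host buffer */
   omp_target_memcpy(host_buf, dev_buf, n * sizeof(double),
                     0, 0, host, dev);

   omp_target_free(dev_buf, dev);
   return 0;
}
\end{verbatim}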

View File

@ -1,68 +0,0 @@
\pagebreak
\section{Doacross Loop Nest}
\label{sec:doacross}
An \code{ordered} clause can be used on a loop construct with an integer
parameter argument to define the number of associated loops within
a \plc{doacross loop nest} where cross-iteration dependences exist.
A \code{depend} clause on an \code{ordered} construct within an ordered
loop describes the dependences of the \plc{doacross} loops.
In the code below, the \code{depend(sink:i-1)} clause defines an \plc{i-1}
to \plc{i} cross-iteration dependence that specifies a wait point for
the completion of computation from iteration \plc{i-1} before proceeding
to the subsequent statements. The \code{depend(source)} clause indicates
the completion of computation from the current iteration (\plc{i})
to satisfy the cross-iteration dependence that arises from the iteration.
For this example the same sequential ordering could have been achieved
with an \code{ordered} clause without a parameter, on the loop directive,
and a single \code{ordered} directive without the \code{depend} clause
specified for the statement executing the \plc{bar} function.
\cexample{doacross}{1}
\ffreeexample{doacross}{1}
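The pattern described above has this shape in C; the \plc{foo}, \plc{bar} and
\plc{baz} calls are placeholders.
\begin{verbatim}
float foo(int i);                      /* placeholders */
float bar(float a, float b);
float baz(float b);

void doacross_1d(float *A, float *B, float *C, int n)
{
   #pragma omp parallel for ordered(1)
   for (int i = 1; i < n; i++)
   {
      A[i] = foo(i);

      #pragma omp ordered depend(sink: i-1)  /* wait for iteration i-1 */
      B[i] = bar(A[i], B[i-1]);
      #pragma omp ordered depend(source)     /* post completion of iteration i */

      C[i] = baz(B[i]);
   }
}
\end{verbatim}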
The following code is similar to the previous example but with
\plc{doacross loop nest} extended to two nested loops, \plc{i} and \plc{j},
as specified by the \code{ordered(2)} clause on the loop directive.
In the C/C++ code, the \plc{i} and \plc{j} loops are the first and
second associated loops, respectively, whereas
in the Fortran code, the \plc{j} and \plc{i} loops are the first and
second associated loops, respectively.
The \code{depend(sink:i-1,j)} and \code{depend(sink:i,j-1)} clauses in
the C/C++ code define cross-iteration dependences in two dimensions from
iterations (\plc{i-1, j}) and (\plc{i, j-1}) to iteration (\plc{i, j}).
Likewise, the \code{depend(sink:j-1,i)} and \code{depend(sink:j,i-1)} clauses
in the Fortran code define cross-iteration dependences from iterations
(\plc{j-1, i}) and (\plc{j, i-1}) to iteration (\plc{j, i}).
\cexample{doacross}{2}
\ffreeexample{doacross}{2}
The following example shows the incorrect use of the \code{ordered}
directive with a \code{depend} clause. There are two issues with the code.
The first issue is a missing \code{ordered}~\code{depend(source)} directive,
which could cause a deadlock.
The second issue is that the \code{depend(sink:i+1,j)} and \code{depend(sink:i,j+1)}
clauses define dependences on lexicographically later
source iterations (\plc{i+1, j}) and (\plc{i, j+1}), which could cause
a deadlock as well since they may not start to execute until the current iteration completes.
\cexample{doacross}{3}
\ffreeexample{doacross}{3}
The following example illustrates the use of the \code{collapse} clause for
a \plc{doacross loop nest}. The \plc{i} and \plc{j} loops are the associated
loops for the collapsed loop as well as for the \plc{doacross loop nest}.
The example also shows a compliant usage of the dependence source
directive placed before the corresponding sink directive.
Checking the completion of computation from previous iterations at the sink point can occur after the source statement.
\cexample{doacross}{4}
\ffreeexample{doacross}{4}

View File

@ -1,12 +0,0 @@
\pagebreak
\section{The \code{flush} Construct without a List}
\label{sec:flush_nolist}
The following example distinguishes the shared variables affected by a \code{flush}
construct with no list from the shared objects that are not affected:
\cexample{flush_nolist}{1}
\fexample{flush_nolist}{1}

View File

@ -1,19 +0,0 @@
\pagebreak
\section{Fortran Restrictions on the \code{do} Construct}
\label{sec:fort_do}
\fortranspecificstart
If an \code{end do} directive follows a \plc{do-construct} in which several
\code{DO} statements share a \code{DO} termination statement, then a \code{do}
directive can only be specified for the outermost of these \code{DO} statements.
The following example contains correct usages of loop constructs:
\fnexample{fort_do}{1}
The following example is non-conforming because the matching \code{do} directive
for the \code{end do} does not precede the outermost loop:
\fnexample{fort_do}{2}
\fortranspecificend

View File

@ -1,23 +0,0 @@
\pagebreak
\section{Fortran Restrictions on Storage Association with the \code{private} Clause}
\fortranspecificstart
\label{sec:fort_sa_private}
The following non-conforming examples illustrate the implications of the \code{private}
clause rules with regard to storage association.
\fnexample{fort_sa_private}{1}
\fnexample{fort_sa_private}{2}
\fnexample{fort_sa_private}{3}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
\fnexample{fort_sa_private}{4}
\fnexample{fort_sa_private}{5}
\fortranspecificend

View File

@ -1,38 +0,0 @@
\pagebreak
\section{Fortran Restrictions on \code{shared} and \code{private} Clauses with Common Blocks}
\fortranspecificstart
\label{sec:fort_sp_common}
When a named common block is specified in a \code{private}, \code{firstprivate},
or \code{lastprivate} clause of a construct, none of its members may be declared
in another data-sharing attribute clause on that construct. The following examples
illustrate this point.
The following example is conforming:
\fnexample{fort_sp_common}{1}
The following example is also conforming:
\fnexample{fort_sp_common}{2}
% blue line floater at top of this page for "Fortran, cont."
%\begin{figure}[t!]
%\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
%\end{figure}
The following example is conforming:
\fnexample{fort_sp_common}{3}
The following example is non-conforming because \code{x} is a constituent element
of \code{c}:
\fnexample{fort_sp_common}{4}
The following example is non-conforming because a common block may not be declared
both shared and private:
\fnexample{fort_sp_common}{5}
\fortranspecificend

View File

@ -1,18 +0,0 @@
\pagebreak
\section{The \code{firstprivate} Clause and the \code{sections} Construct}
\label{sec:fpriv_sections}
In the following example of the \code{sections} construct the \code{firstprivate}
clause is used to initialize the private copy of \code{section\_count} of each
thread. The problem is that the \code{section} constructs modify \code{section\_count},
which breaks the independence of the \code{section} constructs. When different
threads execute each section, both sections will print the value 1. When the same
thread executes the two sections, one section will print the value 1 and the other
will print the value 2. Since the order of execution of the two sections in this
case is unspecified, it is unspecified which section prints which value.
\cexample{fpriv_sections}{1}
\ffreeexample{fpriv_sections}{1}

View File

@ -1,22 +0,0 @@
\pagebreak
\section{The \code{omp\_get\_num\_threads} Routine}
\label{sec:get_nthrs}
In the following example, the \code{omp\_get\_num\_threads} call returns 1 in
the sequential part of the code, so \code{np} will always be equal to 1. To determine
the number of threads that will be deployed for the \code{parallel} region, the
call should be inside the \code{parallel} region.
\cexample{get_nthrs}{1}
\fexample{get_nthrs}{1}
\pagebreak
The following example shows how to rewrite this program without including a query
for the number of threads:
\cexample{get_nthrs}{2}
\fexample{get_nthrs}{2}

View File

@ -1,28 +0,0 @@
\pagebreak
\section{\code{teams} Construct on Host}
\label{sec:host_teams}
%{\color{blue} ... } {\color{violet} ... }
Originally the \code{teams} construct was created for devices (such as GPUs)
for independent executions of a structured block by teams within a league (on SMs).
It was only available through offloading with the \code{target} construct,
and the execution of a \code{teams} region could only be directed to host
execution by various means such as \code{if} and \code{device} clauses,
and the \code{OMP\_TARGET\_OFFLOAD} environment variable.
In OpenMP 5.0 the \code{teams} construct was extended to enable the host
to execute a \code{teams} region (without an associated \code{target} construct),
with anticipation of further affinity and threading controls in future OpenMP releases.
%With additional affinity controls, a team could be
%assigned to execute on a socket or use only a specified number of threads.
In the example below the \code{teams} construct is used to create two
teams, one to execute single precision code, and the other
to execute double precision code. Two teams are required, and
the thread limit for each team is set to 1/2 of the number of
available processors.
\cexample{host_teams}{1}
\ffreeexample{host_teams}{1}
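A sketch of the branching on team number; the work routines are placeholders.
\begin{verbatim}
#include <omp.h>

void do_single_precision(void);        /* placeholders */
void do_double_precision(void);

void compute(void)
{
   int nprocs = omp_get_num_procs();

   #pragma omp teams num_teams(2) thread_limit(nprocs/2)
   {
      if (omp_get_team_num() == 0)
         do_single_precision();
      else
         do_double_precision();
   }
}
\end{verbatim}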

View File

@ -1,57 +0,0 @@
\pagebreak
\section{Internal Control Variables (ICVs)}
\label{sec:icv}
According to Section 2.3 of the OpenMP 4.0 specification, an OpenMP implementation must act as if there are ICVs that control
the behavior of the program. This example illustrates two ICVs, \plc{nthreads-var}
and \plc{max-active-levels-var}. The \plc{nthreads-var} ICV controls the
number of threads requested for encountered parallel regions; there is one copy
of this ICV per task. The \plc{max-active-levels-var} ICV controls the maximum
number of nested active parallel regions; there is one copy of this ICV for the
whole program.
In the following example, the \plc{nest-var}, \plc{max-active-levels-var},
\plc{dyn-var}, and \plc{nthreads-var} ICVs are modified through calls to
the runtime library routines \code{omp\_set\_nested},\\ \code{omp\_set\_max\_active\_levels},
\code{omp\_set\_dynamic}, and \code{omp\_set\_num\_threads}, respectively. These ICVs
affect the operation of \code{parallel} regions. Each implicit task generated
by a \code{parallel} region has its own copy of the \plc{nest-var, dyn-var},
and \plc{nthreads-var} ICVs.
In the following example, the new value of \plc{nthreads-var} applies only to
the implicit tasks that execute the call to \code{omp\_set\_num\_threads}. There
is one copy of the \plc{max-active-levels-var} ICV for the whole program and
its value is the same for all tasks. This example assumes that nested parallelism
is supported.
The outer \code{parallel} region creates a team of two threads; each of the threads
will execute one of the two implicit tasks generated by the outer \code{parallel}
region.
Each implicit task generated by the outer \code{parallel} region calls \code{omp\_set\_num\_threads(3)},
assigning the value 3 to its respective copy of \plc{nthreads-var}. Then each
implicit task encounters an inner \code{parallel} region that creates a team
of three threads; each of the threads will execute one of the three implicit tasks
generated by that inner \code{parallel} region.
Since the outer \code{parallel} region is executed by 2 threads, and the inner
by 3, there will be a total of 6 implicit tasks generated by the two inner \code{parallel}
regions.
Each implicit task generated by an inner \code{parallel} region will execute
the call to\\ \code{omp\_set\_num\_threads(4)}, assigning the value 4 to its respective
copy of \plc{nthreads-var}.
The print statement in the outer \code{parallel} region is executed by only one
of the threads in the team. So it will be executed only once.
The print statement in an inner \code{parallel} region is also executed by only
one of the threads in the team. Since we have a total of two inner \code{parallel}
regions, the print statement will be executed twice -- once per inner \code{parallel}
region.
\pagebreak
\cexample{icv}{1}
\fexample{icv}{1}
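Condensed, the call sequence walked through above is as follows (the printed
text is illustrative; the bundled source may differ).
\begin{verbatim}
#include <stdio.h>
#include <omp.h>

int main(void)
{
   omp_set_nested(1);
   omp_set_max_active_levels(8);
   omp_set_dynamic(0);
   omp_set_num_threads(2);

   #pragma omp parallel
   {
      omp_set_num_threads(3);

      #pragma omp parallel
      {
         omp_set_num_threads(4);
         #pragma omp single
         printf("Inner: num_thds = %d\n", omp_get_num_threads()); /* twice: 3 */
      }

      #pragma omp barrier
      #pragma omp single
      printf("Outer: num_thds = %d\n", omp_get_num_threads());    /* once: 2  */
   }
   return 0;
}
\end{verbatim}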

View File

@ -1,10 +0,0 @@
\subsection{The \code{omp\_init\_lock} Routine}
\label{subsec:init_lock}
The following example demonstrates how to initialize an array of locks in a \code{parallel}
region by using \code{omp\_init\_lock}.
\cppexample{init_lock}{1}
\fexample{init_lock}{1}

View File

@ -1,10 +0,0 @@
%\pagebreak
\subsection{The \code{omp\_init\_lock\_with\_hint} Routine}
\label{subsec:init_lock_with_hint}
The following example demonstrates how to initialize an array of locks in a \code{parallel} region by using \code{omp\_init\_lock\_with\_hint}.
Note that hints are combined with an \code{|} or \code{+} operator in C/C++ and a \code{+} operator in Fortran.
\cppexample{init_lock_with_hint}{1}
\fexample{init_lock_with_hint}{1}

View File

@ -1,14 +0,0 @@
\pagebreak
\section{The \code{lastprivate} Clause}
\label{sec:lastprivate}
Correct execution sometimes depends on the value that the last iteration of a loop
assigns to a variable. Such programs must list all such variables in a \code{lastprivate}
clause so that the values of the variables are the same as when the loop is executed
sequentially.
\cexample{lastprivate}{1}
\fexample{lastprivate}{1}

View File

@ -1,13 +0,0 @@
\section{\code{linear} Clause in Loop Constructs}
\label{sec:linear_in_loop}
The following example shows the use of the \code{linear} clause in a loop
construct to allow the proper parallelization of a loop that contains
an induction variable (\plc{j}). At the end of the execution of
the loop construct, the original variable \plc{j} is updated with
the value \plc{N/2} from the last iteration of the loop.
\cexample{linear_in_loop}{1}
\ffreeexample{linear_in_loop}{1}
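A small sketch of the induction pattern; the array names are illustrative and
\plc{n} is assumed even.
\begin{verbatim}
void pack_even(float *a, float *b, int n)
{
   int j = 0;
   #pragma omp parallel for linear(j:1)
   for (int i = 0; i < n; i += 2)
   {
      b[j] = a[i];   /* j advances by 1 per iteration, as declared by linear(j:1) */
      j++;
   }
   /* here j == n/2, as if the loop had executed sequentially */
}
\end{verbatim}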

View File

@ -1,22 +0,0 @@
\subsection{Ownership of Locks}
\label{subsec:lock_owner}
Ownership of locks has changed since OpenMP 2.5. In OpenMP 2.5, locks are owned
by threads; so a lock released by the \code{omp\_unset\_lock} routine must be
owned by the same thread executing the routine. Beginning with OpenMP 3.0, locks are owned
by task regions; so a lock released by the \code{omp\_unset\_lock} routine in
a task region must be owned by the same task region.
This change in ownership requires extra care when using locks. The following program
is conforming in OpenMP 2.5 because the thread that releases the lock \code{lck}
in the parallel region is the same thread that acquired the lock in the sequential
part of the program (master thread of parallel region and the initial thread are
the same). However, it is not conforming beginning with OpenMP 3.0, because the task
region that releases the lock \code{lck} is different from the task region that
acquires the lock.
\cexample{lock_owner}{1}
\fexample{lock_owner}{1}

View File

@ -1,13 +0,0 @@
\pagebreak
\section{The \code{loop} Construct}
\label{sec:loop}
The following example illustrates the use of the OpenMP 5.0 \code{loop}
construct for the execution of a loop.
The \code{loop} construct asserts to the compiler that the iterations
of the loop are free of data dependencies and may be executed concurrently.
It allows the compiler to use heuristics to select the parallelization scheme
and compiler-level optimizations for the concurrency.
\cexample{loop}{1}
\ffreeexample{loop}{1}

View File

@ -1,13 +0,0 @@
\pagebreak
\section{The \code{master} Construct}
\label{sec:master}
The following example demonstrates the \code{master} construct. In the example, the master thread
keeps track of how many iterations have been executed and prints out a progress
report. The other threads skip the master region without waiting.
\cexample{master}{1}
\fexample{master}{1}

View File

@ -1,41 +0,0 @@
\pagebreak
\section{The OpenMP Memory Model}
\label{sec:mem_model}
In the following example, at Print 1, the value of \plc{x} could be either 2
or 5, depending on the timing of the threads, and the implementation of the assignment
to \plc{x}. There are two reasons that the value at Print 1 might not be 5.
First, Print 1 might be executed before the assignment to \plc{x} is executed.
Second, even if Print 1 is executed after the assignment, the value 5 is not guaranteed
to be seen by thread 1 because a flush may not have been executed by thread 0 since
the assignment.
The barrier after Print 1 contains implicit flushes on all threads, as well as
a thread synchronization, so the programmer is guaranteed that the value 5 will
be printed by both Print 2 and Print 3.
\cexample{mem_model}{1}
\ffreeexample{mem_model}{1}
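The first case has this shape in C; the initial value 2 and the two-thread
team follow the text above.
\begin{verbatim}
#include <stdio.h>
#include <omp.h>

int main(void)
{
   int x = 2;
   #pragma omp parallel num_threads(2) shared(x)
   {
      if (omp_get_thread_num() == 0)
         x = 5;
      else
         printf("1: x = %d\n", x);    /* Print 1: may show 2 or 5 */

      #pragma omp barrier             /* implied flushes + synchronization */

      if (omp_get_thread_num() == 0)
         printf("2: x = %d\n", x);    /* Print 2: always 5 */
      else
         printf("3: x = %d\n", x);    /* Print 3: always 5 */
   }
   return 0;
}
\end{verbatim}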
\pagebreak
The following example demonstrates why synchronization is difficult to perform
correctly through variables. The value of \plc{flag} is undefined in both prints on thread
1, and the value of \plc{data} is well defined only in the second print.
\cexample{mem_model}{2}
\fexample{mem_model}{2}
\pagebreak
The next example demonstrates why synchronization is difficult to perform correctly
through variables. Because the \plc{write}(1)-\plc{flush}(1)-\plc{flush}(2)-\plc{read}(2)
sequence cannot be guaranteed in the example, the statements on thread 0 and thread
1 may execute in either order.
\cexample{mem_model}{3}
\fexample{mem_model}{3}

View File

@ -1,88 +0,0 @@
\pagebreak
\section{Metadirective Directive}
\label{sec:metadirective}
A \code{metadirective} directive provides a mechanism to select a directive in
a \code{when} clause to be used, depending upon one or more contexts:
implementation, available devices and the present enclosing construct.
The directive in a \code{default} clause is used when a directive of the
\code{when} clause is not selected.
In the \code{when} clause the \plc{context selector} (or just \plc{selector}) defines traits that are
evaluated for selection of the directive that follows the selector.
This "selectable" directive is called a \plc{directive variant}.
Traits are grouped by \plc{construct}, \plc{implementation} and
\plc{device} \plc{sets} to be used by a selector of the same name.
In the first example the architecture trait \plc{arch} of the
\plc{device} selector set specifies that if an \plc{nvptx} (NVIDIA) architecture is
active in the OpenMP context, then the \code{teams}~\code{loop}
\plc{directive variant} is selected as the directive; otherwise, the \code{parallel}~\code{loop}
\plc{directive variant} of the \code{default} clause is selected as the directive.
That is, if a \plc{device} of \plc{nvptx} architecture is supported by the implementation within
the enclosing \code{target} construct, its \plc{directive variant} is selected.
The architecture names, such as \plc{nvptx}, are implementation defined.
Also, note that \plc{device} as used in a \code{target} construct specifies
a device number, while \plc{device}, as used in the \code{metadirective}
directive as selector set, has traits of \plc{kind}, \plc{isa} and \plc{arch}.
\cexample{metadirective}{1}
\ffreeexample{metadirective}{1}
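The selection described above looks roughly like this in C; the array size and
loop body are illustrative, and the \plc{nvptx} string is, as noted,
implementation defined.
\begin{verbatim}
#define N 1000

void mult(double *v1, double *v2, double *v3)
{
   #pragma omp target map(to: v1[0:N], v2[0:N]) map(from: v3[0:N])
   #pragma omp metadirective \
               when( device={arch("nvptx")}: teams loop ) \
               default(                      parallel loop )
   for (int i = 0; i < N; i++)
      v3[i] = v1[i] * v2[i];
}
\end{verbatim}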
%\pagebreak
In the second example, the \plc{implementation} selector set is specified
in the \code{when} clause to distinguish between AMD and NVIDIA platforms.
Additionally, specific architectures are specified with the \plc{device}
selector set.
In the code, different \code{teams} constructs are employed as determined
by the \code{metadirective} directive.
The number of teams is restricted by a \code{num\_teams} clause
and a thread limit is also set by a \code{thread\_limit} clause for
\plc{vendor} AMD and NVIDIA platforms and specific architecture
traits. Otherwise, just the \code{teams} construct is used without
any clauses, as prescribed by the \code{default} clause.
\cexample{metadirective}{2}
\ffreeexample{metadirective}{2}
\clearpage
%\pagebreak
In the third example, a \plc{construct} selector set is specified in the \code{when} clause.
Here, a \code{metadirective} directive is used within a function that is also
compiled as a function for a target device as directed by the \code{declare}~\code{target} directive.
The \plc{target} directive name of the \code{construct} selector ensures that the
\code{distribute}~\code{parallel}~\code{for/do} construct is employed for the target compilation.
Otherwise, for the host-compiled version the \code{parallel}~\code{for/do}~\code{simd} construct is used.
In the first call to the \plc{exp\_pi\_diff()} routine the context is a
\code{target}~\code{teams} construct and the \code{distribute}~\code{parallel}~\code{for/do}
construct version of the function is invoked,
while in the second call the \code{parallel}~\code{for/do}~\code{simd} construct version is used.
%%%%%%%%
This case illustrates an important point for users that may want to hoist the
\code{target} directive out of a function that contains the usual
\code{target}~\code{teams}~\code{distribute}~\code{parallel}~\code{for/do} construct
(for providing alternate constructs through the \code{metadirective} directive as here).
While this combined construct can be decomposed into a \code{target} and
\code{teams distribute parallel for/do} constructs, the OpenMP 5.0 specification has the restriction:
``If a \code{teams} construct is nested within a \code{target} construct, that \code{target} construct must
contain no statements, declarations or directives outside of the \code{teams} construct''.
So, the \code{teams} construct must immediately follow the \code{target} construct without any intervening
code statements (which includes function calls).
Since the \code{target} construct alone cannot be hoisted out of a function,
the \code{target}~\code{teams} construct has been hoisted out of the function, and the
\code{distribute}~\code{parallel}~\code{for/do} construct is used
as the \plc{variant} directive of the \code{metadirective} directive within the function.
%%%%%%%%
\cexample{metadirective}{3}
\ffreeexample{metadirective}{3}

View File

@ -1,28 +0,0 @@
\pagebreak
\section{The \code{nowait} Clause}
\label{sec:nowait}
If there are multiple independent loops within a \code{parallel} region, you
can use the \code{nowait} clause to avoid the implied barrier at the end of the
loop construct, as follows:
\cexample{nowait}{1}
\fexample{nowait}{1}
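In outline (array and function names are illustrative):
\begin{verbatim}
void work(int n, float *a, float *b, float *y, float *z)
{
   #pragma omp parallel
   {
      #pragma omp for nowait
      for (int i = 1; i < n; i++)
         b[i] = (a[i] + a[i-1]) / 2.0f;

      #pragma omp for nowait              /* independent of the first loop */
      for (int i = 0; i < n; i++)
         y[i] = z[i] * z[i];
   }   /* single barrier at the end of the parallel region */
}
\end{verbatim}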
In the following example, static scheduling distributes the same logical iteration
numbers to the threads that execute the three loop regions. This allows the \code{nowait}
clause to be used, even though there is a data dependence between the loops. The
dependence is satisfied as long as the same thread executes the same logical iteration
numbers in each loop.
Note that the iteration count of the loops must be the same. The example satisfies
this requirement, since the iteration space of the first two loops is from \code{0}
to \code{n-1} (from \code{1} to \code{N} in the Fortran version), while the
iteration space of the last loop is from \code{1} to \code{n} (\code{2} to
\code{N+1} in the Fortran version).
\cexample{nowait}{2}
\ffreeexample{nowait}{2}

View File

@ -1,31 +0,0 @@
\pagebreak
\section{Interaction Between the \code{num\_threads} Clause and \code{omp\_set\_dynamic}}
\label{sec:nthrs_dynamic}
The following example demonstrates the \code{num\_threads} clause and the effect
of the \\
\code{omp\_set\_dynamic} routine on it.
The call to the \code{omp\_set\_dynamic} routine with argument \code{0} in
C/C++, or \code{.FALSE.} in Fortran, disables the dynamic adjustment of the number
of threads in OpenMP implementations that support it. In this case, 10 threads
are provided. Note that in case of an error the OpenMP implementation is free to
abort the program or to supply any number of threads available.
\cexample{nthrs_dynamic}{1}
\fexample{nthrs_dynamic}{1}
\pagebreak
The call to the \code{omp\_set\_dynamic} routine with a non-zero argument in
C/C++, or \code{.TRUE.} in Fortran, allows the OpenMP implementation to choose
any number of threads between 1 and 10.
\cexample{nthrs_dynamic}{2}
\fexample{nthrs_dynamic}{2}
It is good practice to set the \plc{dyn-var} ICV explicitly by calling the \code{omp\_set\_dynamic}
routine, as its default setting is implementation defined.

View File

@ -1,12 +0,0 @@
\pagebreak
\section{Controlling the Number of Threads on Multiple Nesting Levels}
\label{sec:nthrs_nesting}
The following examples demonstrate how to use the \code{OMP\_NUM\_THREADS} environment
variable to control the number of threads on multiple nesting levels:
\cexample{nthrs_nesting}{1}
\fexample{nthrs_nesting}{1}
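A minimal sketch of the idea, assuming the environment variable is set as shown in the comment
and that the implementation supports two active levels of parallelism:
\begin{verbatim}
#include <stdio.h>
#include <omp.h>

/* Run with, for example:  OMP_NUM_THREADS=3,2 ./a.out
   (first value: outer level, second value: inner level) */
int main(void)
{
   omp_set_max_active_levels(2);     /* allow two levels of parallelism */
   #pragma omp parallel
   {
      #pragma omp single
      printf("outer team: %d threads\n", omp_get_num_threads());

      #pragma omp parallel
      {
         #pragma omp single
         printf("inner team: %d threads\n", omp_get_num_threads());
      }
   }
   return 0;
}
\end{verbatim}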

View File

@ -1,28 +0,0 @@
\pagebreak
\section{The \code{ordered} Clause and the \code{ordered} Construct}
\label{sec:ordered}
Ordered constructs are useful for sequentially ordering the output from work that
is done in parallel. The following program prints out the indices in sequential
order:
\cexample{ordered}{1}
\fexample{ordered}{1}
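A minimal sketch of printing loop indices in sequential order with the \code{ordered} clause
and construct (hypothetical names):
\begin{verbatim}
#include <stdio.h>

void print_in_order(int lb, int ub, int stride)
{
   #pragma omp parallel for ordered schedule(dynamic)
   for (int i = lb; i < ub; i += stride) {
      /* work that may complete out of order could go here */
      #pragma omp ordered
      printf("%d\n", i);       /* output appears in iteration order */
   }
}
\end{verbatim}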
It is possible to have multiple \code{ordered} constructs within a loop region
with the \code{ordered} clause specified. The following example is non-conforming
because all iterations execute two \code{ordered} regions. An iteration of a
loop must not execute more than one \code{ordered} region:
\cexample{ordered}{2}
\fexample{ordered}{2}
The following is a conforming example with more than one \code{ordered} construct.
Each iteration will execute only one \code{ordered} region:
\cexample{ordered}{3}
\fexample{ordered}{3}

View File

@ -1,12 +0,0 @@
\pagebreak
\section{The \code{parallel} Construct}
\label{sec:parallel}
The \code{parallel} construct can be used in coarse-grain parallel programs.
In the following example, each thread in the \code{parallel} region decides what
part of the global array \plc{x} to work on, based on the thread number:
\cexample{parallel}{1}
\fexample{parallel}{1}
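A minimal sketch of this coarse-grain pattern (hypothetical names):
\begin{verbatim}
#include <omp.h>

void subdomain(float *x, int istart, int ipoints)
{
   for (int i = 0; i < ipoints; i++)
      x[istart + i] += 1.0f;
}

void process(float *x, int npoints)
{
   #pragma omp parallel
   {
      int iam     = omp_get_thread_num();
      int nt      = omp_get_num_threads();
      int ipoints = npoints / nt;           /* partition size          */
      int istart  = iam * ipoints;          /* this thread's start     */
      if (iam == nt - 1)                    /* last thread takes the   */
         ipoints = npoints - istart;        /* remainder               */
      subdomain(x, istart, ipoints);
   }
}
\end{verbatim}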

View File

@ -1,33 +0,0 @@
\pagebreak
\section{The \code{parallel master taskloop} Construct}
\label{sec:parallel_master_taskloop}
In the OpenMP 5.0 Specification several combined constructs containing
the \code{taskloop} construct were added.
Just as the \code{for} and \code{do} constructs have been combined
with the \code{parallel} construct for convenience, so too, the combined
\code{parallel}~\code{master}~\code{taskloop} and
\code{parallel}~\code{master}~\code{taskloop}~\code{simd}
constructs have been created for convenience.
In the following example the first \code{taskloop} construct is enclosed
by the usual \code{parallel} and \code{master} constructs to form
a team of threads, and a single task generator (master thread) for
the \code{taskloop} construct.
The second taskloop accomplishes the same OpenMP operations as the first, but
with the combined \code{parallel}~\code{master}~\code{taskloop}
construct.
The third taskloop uses the combined \code{parallel}~\code{master}~\code{taskloop}~\code{simd}
construct to accomplish the same behavior as closely nested \code{parallel master}
and \code{taskloop simd} constructs.
As with any combined construct the clauses of the components may be used
with appropriate restrictions. The combination of the \code{parallel}~\code{master} construct
with the \code{taskloop} or \code{taskloop}~\code{simd} construct produces no additional
restrictions.
\cexample{parallel_master_taskloop}{1}
\ffreeexample{parallel_master_taskloop}{1}
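A condensed sketch of the separate and combined forms (hypothetical names):
\begin{verbatim}
void scale(int n, double *a)
{
   /* separate constructs */
   #pragma omp parallel
   #pragma omp master
   #pragma omp taskloop
   for (int i = 0; i < n; i++)
      a[i] *= 2.0;

   /* equivalent combined construct */
   #pragma omp parallel master taskloop
   for (int i = 0; i < n; i++)
      a[i] *= 2.0;
}
\end{verbatim}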

View File

@ -1,12 +0,0 @@
\pagebreak
\section{A Simple Parallel Loop}
\label{sec:ploop}
The following example demonstrates how to parallelize a simple loop using the parallel
loop construct. The loop iteration variable is private by default, so it is not
necessary to specify it explicitly in a \code{private} clause.
\cexample{ploop}{1}
\fexample{ploop}{1}
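A minimal sketch (hypothetical names):
\begin{verbatim}
void vadd(int n, const float *a, const float *b, float *c)
{
   /* the loop iteration variable i is private by default */
   #pragma omp parallel for
   for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];
}
\end{verbatim}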

View File

@ -1,31 +0,0 @@
\pagebreak
\section{The \code{private} Clause}
\label{sec:private}
In the following example, the values of original list items \plc{i} and \plc{j}
are retained on exit from the \code{parallel} region, while the private list
items \plc{i} and \plc{j} are modified within the \code{parallel} construct.
\cexample{private}{1}
\fexample{private}{1}
In the following example, all uses of the variable \plc{a} within the loop construct
in the routine \plc{f} refer to a private list item \plc{a}, while it is
unspecified whether references to \plc{a} in the routine \plc{g} are to a
private list item or the original list item.
\cexample{private}{2}
\fexample{private}{2}
The following example demonstrates that a list item that appears in a \code{private}
clause in a \code{parallel} construct may also appear in a \code{private}
clause in an enclosed worksharing construct, which results in an additional private
copy.
\cexample{private}{3}
\fexample{private}{3}

View File

@ -1,13 +0,0 @@
\pagebreak
\section{The \code{parallel} \code{sections} Construct}
\label{sec:psections}
In the following example routines \code{XAXIS}, \code{YAXIS}, and \code{ZAXIS} can
be executed concurrently. The first \code{section} directive is optional. Note
that all \code{section} directives need to appear in the \code{parallel sections}
construct.
\cexample{psections}{1}
\fexample{psections}{1}
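A minimal sketch, with the \code{XAXIS}, \code{YAXIS} and \code{ZAXIS} routines standing in for
independent work:
\begin{verbatim}
void XAXIS(void);
void YAXIS(void);
void ZAXIS(void);

void do_axes(void)
{
   #pragma omp parallel sections
   {
      #pragma omp section      /* optional for the first section */
      XAXIS();
      #pragma omp section
      YAXIS();
      #pragma omp section
      ZAXIS();
   }
}
\end{verbatim}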

View File

@ -1,237 +0,0 @@
\pagebreak
\section{Reduction}
\label{sec:reduction}
This section covers ways to perform reductions in parallel, task, taskloop, and SIMD regions.
\subsection{The \code{reduction} Clause}
\label{subsec:reduction}
The following example demonstrates the \code{reduction} clause; note that some
reductions can be expressed in the loop in several ways, as shown for the \code{max}
and \code{min} reductions below:
\cexample{reduction}{1}
\pagebreak
\ffreeexample{reduction}{1}
A common implementation of the preceding example is to treat it as if it had been
written as follows:
\cexample{reduction}{2}
\fortranspecificstart
\ffreenexample{reduction}{2}
The following program is non-conforming because the reduction is on the
\emph{intrinsic procedure name} \code{MAX} but that name has been redefined to be the variable
named \code{MAX}.
\ffreenexample{reduction}{3}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
The following conforming program performs the reduction using the
\emph{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed
to \code{REN}.
\ffreenexample{reduction}{4}
The following conforming program performs the reduction using the
\emph{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed
to \code{MIN}.
\ffreenexample{reduction}{5}
\fortranspecificend
\pagebreak
The following example is non-conforming because the initialization (\code{a =
0}) of the original list item \code{a} is not synchronized with the update of
\code{a} as a result of the reduction computation in the \code{for} loop. Therefore,
the example may print an incorrect value for \code{a}.
To avoid this problem, the initialization of the original list item \code{a}
should complete before any update of \code{a} as a result of the \code{reduction}
clause. This can be achieved by adding an explicit barrier after the assignment
\code{a = 0}, or by enclosing the assignment \code{a = 0} in a \code{single}
directive (which has an implied barrier), or by initializing \code{a} before
the start of the \code{parallel} region.
\cexample{reduction}{6}
\fexample{reduction}{6}
The following example demonstrates the reduction of array \plc{a}. In C/C++ this is illustrated by the explicit use of an array section \plc{a[0:N]} in the \code{reduction} clause. The corresponding Fortran example uses array syntax supported in the base language. As of the OpenMP 4.5 specification the explicit use of array sections in the \code{reduction} clause in Fortran is not permitted, but this oversight will be fixed in the next release of the specification.
\cexample{reduction}{7}
\ffreeexample{reduction}{7}
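As a compact reminder of the clause syntax itself, a minimal C sketch with hypothetical names
(combining a sum and a max reduction in one loop):
\begin{verbatim}
#include <math.h>

void stats(int n, const float *x, float *sum, float *big)
{
   float s = 0.0f, b = x[0];
   #pragma omp parallel for reduction(+:s) reduction(max:b)
   for (int i = 0; i < n; i++) {
      s += x[i];
      b  = fmaxf(b, x[i]);   /* one way of expressing the max reduction */
   }
   *sum = s;
   *big = b;
}
\end{verbatim}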
\subsection{Task Reduction}
\label{subsec:task_reduction}
The following C/C++ and Fortran examples show how to implement
a task reduction over a linked list.
Task reductions are supported by the \code{task\_reduction} clause, which can only be
applied to the \code{taskgroup} directive, and an \code{in\_reduction} clause,
which can be applied to the \code{task} construct among others.
The \code{task\_reduction} clause on the \code{taskgroup} construct is used to
define the scope of a new reduction, and after the \code{taskgroup}
region the original variable will contain the final value of the reduction.
In the task-generating while loop the \code{in\_reduction} clause of the \code{task}
construct is used to specify that the task participates "in" the reduction.
Note: The \plc{res} variable is private in the \plc{linked\_list\_sum} routine
and is not required to be shared (as in the case of a \code{parallel} construct
reduction).
\cexample{task_reduction}{1}
\ffreeexample{task_reduction}{1}
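A condensed sketch of the same mechanism applied to an array instead of a linked list (hypothetical names):
\begin{verbatim}
int array_sum(int n, const int *v)
{
   int res = 0;
   #pragma omp parallel
   #pragma omp single
   {
      #pragma omp taskgroup task_reduction(+:res)
      {
         for (int i = 0; i < n; i++) {
            #pragma omp task in_reduction(+:res)
            res += v[i];
         }
      }                       /* res holds the final value here */
   }
   return res;
}
\end{verbatim}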
\subsection{Taskloop Reduction}
\label{subsec:taskloop_reduction}
In the OpenMP 5.0 Specification the \code{taskloop} construct
was extended to include reductions.
The following two examples show how to implement a reduction over an array
using taskloop reduction in two different ways.
In the first
example we apply the \code{reduction} clause to the \code{taskloop} construct. As
explained above in the task reduction examples, a reduction over tasks is
divided into two components: the scope of the reduction, which is defined by a
\code{taskgroup} region, and the tasks that participate in the reduction. In this
example, the \code{reduction} clause defines both semantics. First, it specifies that
the implicit \code{taskgroup} region associated with the \code{taskloop} construct is the scope of the
reduction, and second, it defines all tasks created by the \code{taskloop} construct as
participants of the reduction. Regarding the first property, it is important to note
that adding the \code{nogroup} clause to the \code{taskloop} construct would make the code
nonconforming, because there would then be a set of tasks that participate in a
reduction whose scope has not been defined.
\cexample{taskloop_reduction}{1}
\ffreeexample{taskloop_reduction}{1}
%In the second example, we are computing exactly the same
%value but we do it in a very different way. The first thing that we do in the
%\plc{array\_sum} function is to create a \code{taskgroup} region that defines the scope of a
%new reduction using the \code{task\_reduction} clause.
%After that, we specify that a task and also the tasks generated
%by a taskloop will participate in that reduction using the \code{in\_reduction} clause
%on the \code{task} and \code{taskloop} constructs, respectively. Note that
%we also added the \code{nogroup} clause to the \code{taskloop} construct. This is allowed
%because what we are expressing with the \code{in\_reduction} clause is different
%from what we were expressing with the \code{reduction} clause. In one case we specify
%that the generated tasks will participate in a previously declared reduction
%(\code{in\_reduction} clause) whereas in the other case we specify that we want to
%create a new reduction and also that all tasks generated by the taskloop will
%participate on it.
The second example computes exactly the same value as in the preceding \plc{taskloop\_reduction.1} code section,
but in a very different way.
First, in the \plc{array\_sum} function a \code{taskgroup} region is created
that defines the scope of a new reduction using the \code{task\_reduction} clause.
After that, a task and also the tasks generated by a taskloop participate in
that reduction by using the \code{in\_reduction} clause on the \code{task}
and \code{taskloop} constructs, respectively.
Note that the \code{nogroup} clause was added to the \code{taskloop} construct.
This is allowed because what is expressed with the \code{in\_reduction} clause
is different from what is expressed with the \code{reduction} clause.
In one case the generated tasks are specified to participate in a previously
declared reduction (\code{in\_reduction} clause), whereas in the other case
a new reduction is created and all tasks generated
by the taskloop participate in it.
\cexample{taskloop_reduction}{2}
\ffreeexample{taskloop_reduction}{2}
In the OpenMP 5.0 Specification, \code{reduction} clauses for the
\code{taskloop}~\code{simd} construct were also added.
The examples below compare reductions for the \code{taskloop} and the \code{taskloop}~\code{simd} constructs.
These examples illustrate the use of \code{reduction} clauses within
"stand-alone" \code{taskloop} constructs, and the use of \code{in\_reduction} clauses for tasks of taskloops to participate
with other reductions within the scope of a parallel region.
\textbf{taskloop reductions:}
In the \plc{taskloop reductions} section of the example below,
\plc{taskloop 1} uses the \code{reduction} clause
in a \code{taskloop} construct for a sum reduction, accumulated in \plc{asum}.
The behavior is as though a \code{taskgroup} construct encloses the
taskloop region with a \code{task\_reduction} clause, and each taskloop
task has an \code{in\_reduction} clause with the specifications
of the \code{reduction} clause.
At the end of the taskloop region \plc{asum} contains the result of the reduction.
The next taskloop, \plc{taskloop 2}, illustrates the use of the
\code{in\_reduction} clause to participate in a previously defined
reduction scope of a \code{parallel} construct.
The task reductions of \plc{task 2} and \plc{taskloop 2} are combined
across the \code{taskloop} construct and the single \code{task} construct, as specified
in the \code{reduction(task,}~\code{+:asum)} clause of the \code{parallel} construct.
At the end of the parallel region \plc{asum} contains the combined result of all reductions.
\textbf{taskloop simd reductions:}
Reductions for the \code{taskloop}~\code{simd} construct are shown in the second half of the code.
Since each component construct, \code{taskloop} and \code{simd},
can accept a reduction-type clause, the \code{taskloop}~\code{simd} construct
is a composite construct, and the specific application of the reduction clause is defined
within the \code{taskloop}~\code{simd} construct section of the OpenMP 5.0 Specification.
The code below illustrates use cases for these reductions.
In the \plc{taskloop simd reduction} section of the example below,
\plc{taskloop simd 3} uses the \code{reduction} clause
in a \code{taskloop}~\code{simd} construct for a sum reduction within a loop.
For this case a \code{reduction} clause is used, as one would use
for a \code{simd} construct.
The SIMD reductions of each task are combined, and the results of these tasks are further
combined just as in the \code{taskloop} construct with the \code{reduction} clause for \plc{taskloop 1}.
At the end of the taskloop region \plc{asum} contains the combined result of all reductions.
If a \code{taskloop}~\code{simd} construct is to participate in a previously defined
reduction scope, the reduction participation should be specified with
an \code{in\_reduction} clause, as shown in the \code{parallel} region enclosing
\plc{task 4} and \plc{taskloop simd 4} code sections.
Here the \code{taskloop}~\code{simd} construct's
\code{in\_reduction} clause specifies participation of the construct's tasks as
a task reduction within the scope of the parallel region.
That is, the results of each task of the \code{taskloop} construct component
contribute to the reduction at a broader level, just as in the \plc{parallel reduction a} code section above.
Also, each \code{simd}-component construct
occurs as if it has a \code{reduction} clause, and the
SIMD results of each task are combined as though to form a single result for
each task (that participates in the \code{in\_reduction} clause).
At the end of the parallel region \plc{asum} contains the combined result of all reductions.
%Just as in \plc{parallel reduction a} the
%\code{taskloop simd} construct reduction results are combined
%with the \code{task} construct reduction results
%as specified by the \code{in\_reduction} clause of the \code{task} construct
%and the \plc{task} reduction-modifier of the \code{reduction} clause of
%the \code{parallel} construct.
%At the end of the parallel region \plc{asum} contains the combined result of all reductions.
\cexample{taskloop_simd_reduction}{1}
\ffreeexample{taskloop_simd_reduction}{1}
% All other reductions

View File

@ -1,31 +0,0 @@
\pagebreak
\section{The \code{requires} Directive}
\label{sec:requires}
The declarative \code{requires} directive can be used to
specify features that an implementation must provide to compile and
execute correctly.
In the following example the \code{unified\_shared\_memory} clause
of the \code{requires} directive ensures that the host and all
devices accessible through OpenMP provide a \plc{unified address} space
for memory that is shared by all devices.
The example illustrates the use of the \code{requires} directive specifying
\plc{unified shared memory} in file scope, before any device
directives or device routines. No \code{map} clause is needed for
the \plc{p} structure on the device (and its address \plc{\&p}, for the C++ code,
is the same address on the host and device).
However, scalar variables referenced within the \code{target}
construct still have a default data-sharing attribute of firstprivate.
The \plc{q} scalar is incremented on the device, and its change is
not updated on the host.
% will defaultmap(toform:scalar) make q use shared address space?
%Or will it be ignored at this point.
% Does before device routines also mean before prototype?
%\pagebreak
\cppexample{requires}{1}
\ffreeexample{requires}{1}
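A minimal sketch of a file-scope \code{requires} directive followed by a device construct that
relies on it (hypothetical names):
\begin{verbatim}
#pragma omp requires unified_shared_memory

/* With unified shared memory no map clause is needed for p:
   the host storage it points to is accessible on the device. */
void scale(int n, double *p)
{
   #pragma omp target teams distribute parallel for
   for (int i = 0; i < n; i++)
      p[i] *= 2.0;
}
\end{verbatim}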

View File

@ -1,18 +0,0 @@
\subsection{Simple Lock Routines}
\label{subsec:simple_lock}
In the following example, the lock routines cause the threads to be idle while
waiting for entry to the first critical section, but to do other work while waiting
for entry to the second. The \code{omp\_set\_lock} function blocks, but the \code{omp\_test\_lock}
function does not, allowing the work in \code{skip} to be done.
Note that the argument to the lock routines should have type \code{omp\_lock\_t},
and that there is no need to flush it.
\cexample{simple_lock}{1}
Note that there is no need to flush the lock variable.
\fexample{simple_lock}{1}
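A condensed sketch of the two waiting styles (hypothetical names):
\begin{verbatim}
#include <stdio.h>
#include <omp.h>

void lock_demo(void)
{
   omp_lock_t lck;
   omp_init_lock(&lck);
   #pragma omp parallel
   {
      int id = omp_get_thread_num();

      omp_set_lock(&lck);               /* blocks until acquired        */
      printf("thread %d: first critical work\n", id);
      omp_unset_lock(&lck);

      while (!omp_test_lock(&lck)) {    /* does not block               */
         /* do other useful ("skip") work while waiting                 */
      }
      printf("thread %d: second critical work\n", id);
      omp_unset_lock(&lck);
   }
   omp_destroy_lock(&lck);
}
\end{verbatim}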

View File

@ -1,18 +0,0 @@
\pagebreak
\section{The \code{single} Construct}
\label{sec:single}
The following example demonstrates the \code{single} construct. In the example,
only one thread prints each of the progress messages. All other threads will skip
the \code{single} region and stop at the barrier at the end of the \code{single}
construct until all threads in the team have reached the barrier. If other threads
can proceed without waiting for the thread executing the \code{single} region,
a \code{nowait} clause can be specified, as is done in the third \code{single}
construct in this example. The user must not make any assumptions as to which thread
will execute a \code{single} region.
\cexample{single}{1}
\fexample{single}{1}
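A minimal sketch of the construct with and without \code{nowait} (hypothetical names):
\begin{verbatim}
#include <stdio.h>

void work1(void);
void work2(void);

void progress(void)
{
   #pragma omp parallel
   {
      #pragma omp single
      printf("Beginning work1.\n");   /* implicit barrier at end of single */
      work1();

      #pragma omp single nowait       /* other threads proceed to work2()  */
      printf("Finished work1, beginning work2.\n");
      work2();
   }
}
\end{verbatim}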

View File

@ -1,33 +0,0 @@
\pagebreak
\section{Placement of \code{flush}, \code{barrier}, \code{taskwait}
and \code{taskyield} Directives}
\label{sec:standalone}
The following example is non-conforming, because the \code{flush}, \code{barrier},
\code{taskwait}, and \code{taskyield} directives are stand-alone directives
and cannot be the immediate substatement of an \code{if} statement.
\cexample{standalone}{1}
\pagebreak
The following example is non-conforming, because the \code{flush}, \code{barrier},
\code{taskwait}, and \code{taskyield} directives are stand-alone directives
and cannot be the action statement of an \code{if} statement or a labeled branch
target.
\ffreeexample{standalone}{1}
The following version of the above example is conforming because the \code{flush},
\code{barrier}, \code{taskwait}, and \code{taskyield} directives are enclosed
in a compound statement.
\cexample{standalone}{2}
\pagebreak
The following example is conforming because the \code{flush}, \code{barrier},
\code{taskwait}, and \code{taskyield} directives are enclosed in an \code{if}
construct or follow the labeled branch target.
\ffreeexample{standalone}{2}
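A minimal C sketch contrasting the two forms (the non-conforming form is shown in a comment):
\begin{verbatim}
void synchronize(int a)
{
   /* non-conforming: a stand-alone directive may not be the
      immediate substatement of an if statement

   if (a != 0)
      #pragma omp barrier
   */

   /* conforming: enclose the directive in a compound statement */
   if (a != 0) {
      #pragma omp barrier
   }
}
\end{verbatim}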

View File

@ -1,144 +0,0 @@
\pagebreak
\section{\code{target} Construct}
\label{sec:target}
\subsection{\code{target} Construct on \code{parallel} Construct}
\label{subsec:target_parallel}
The following example shows how the \code{target} construct offloads a code
region to a target device. The variables \plc{p}, \plc{v1}, \plc{v2}, and \plc{N} are implicitly mapped
to the target device.
\cexample{target}{1}
\ffreeexample{target}{1}
\subsection{\code{target} Construct with \code{map} Clause}
\label{subsec:target_map}
The following example shows how the \code{target} construct offloads a code
region to a target device. The variables \plc{p}, \plc{v1} and \plc{v2} are explicitly mapped to the
target device using the \code{map} clause. The variable \plc{N} is implicitly mapped to
the target device.
\cexample{target}{2}
\ffreeexample{target}{2}
\subsection{\code{map} Clause with \code{to}/\code{from} map-types}
\label{subsec:target_map_tofrom}
The following example shows how the \code{target} construct offloads a code region
to a target device. In the \code{map} clause, the \code{to} and \code{from}
map-types define the mapping between the original (host) data and the target (device)
data. The \code{to} map-type specifies that the data will only be read on the
device, and the \code{from} map-type specifies that the data will only be written
to on the device. By specifying a guaranteed access on the device, data transfers
can be reduced for the \code{target} region.
The \code{to} map-type indicates that at the start of the \code{target} region
the variables \plc{v1} and \plc{v2} are initialized with the values of the corresponding variables
on the host device, and at the end of the \code{target} region the variables
\plc{v1} and \plc{v2} are not assigned to their corresponding variables on the host device.
The \code{from} map-type indicates that at the start of the \code{target} region
the variable \plc{p} is not initialized with the value of the corresponding variable
on the host device, and at the end of the \code{target} region the variable \plc{p}
is assigned to the corresponding variable on the host device.
\cexample{target}{3}
The \code{to} and \code{from} map-types allow programmers to optimize data
motion. Since data for the \plc{v} arrays are not returned, and data for the \plc{p} array
are not transferred to the device, only one-half of the data is moved, compared
to the default behavior of an implicit mapping.
\ffreeexample{target}{3}
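A minimal sketch of the \code{to} and \code{from} map-types (hypothetical names):
\begin{verbatim}
void vec_mult(int N, float *p, const float *v1, const float *v2)
{
   /* v1 and v2 are only read on the device; p is only written */
   #pragma omp target map(to: v1[0:N], v2[0:N]) map(from: p[0:N])
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
      p[i] = v1[i] * v2[i];
}
\end{verbatim}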
\subsection{\code{map} Clause with Array Sections}
\label{subsec:target_array_section}
The following example shows how the \code{target} construct offloads a code region
to a target device. In the \code{map} clause, map-types are used to optimize
the mapping of variables to the target device. Because variables \plc{p}, \plc{v1} and \plc{v2} are
pointers, array section notation must be used to map the arrays. The notation \code{:N}
is equivalent to \code{0:N}.
\cexample{target}{4}
In C, the length of the pointed-to array must be specified. In Fortran the extent
of the array is known and the length need not be specified. A section of the array
can be specified with the usual Fortran syntax, as shown in the following example.
The value 1 is assumed for the lower bound for array section \plc{v2(:N)}.
\ffreeexample{target}{4}
A more realistic situation in which an assumed-size array is passed to \code{vec\_mult}
requires that the length of the arrays be specified, because the compiler does
not know the size of the storage. A section of the array must be specified with
the usual Fortran syntax, as shown in the following example. The value 1 is assumed
for the lower bound for array section \plc{v2(:N)}.
\ffreeexample{target}{4b}
\subsection{\code{target} Construct with \code{if} Clause}
\label{subsec:target_if}
The following example shows how the \code{target} construct offloads a code region
to a target device.
The \code{if} clause on the \code{target} construct indicates that if the variable
\plc{N} is smaller than a given threshold, then the \code{target} region will be executed
by the host device.
The \code{if} clause on the \code{parallel} construct indicates that if the
variable \plc{N} is smaller than a second threshold then the \code{parallel} region
is inactive.
\cexample{target}{5}
\ffreeexample{target}{5}
The following example is a modification of the above \plc{target.5} code to show the combined \code{target}
and parallel loop directives. It uses the \plc{directive-name} modifier in multiple \code{if}
clauses to specify the component directive to which it applies.
The \code{if} clause with the \code{target} modifier applies to the \code{target} component of the
combined directive, and the \code{if} clause with the \code{parallel} modifier applies
to the \code{parallel} component of the combined directive.
\cexample{target}{6}
\ffreeexample{target}{6}
\subsection{\code{target} Reverse Offload}
\label{subsec:target_reverse_offload}
Beginning with OpenMP 5.0, implementations are allowed to
offload back to the host (reverse offload).
In the example below the \plc{error\_handler} function
is executed back on the host, if an erroneous value is
detected in the \plc{A} array on the device.
This is accomplished by specifying the \code{ancestor} \plc{device-modifier},
along with a device number of \code{1},
to indicate that the execution is to be performed on the
immediate parent (\plc{1st ancestor})-- the host.
The \code{requires} directive (another 5.0 feature)
uses the \code{reverse\_offload} clause to guarantee
that the reverse offload is implemented.
Note that the \code{declare target} directive uses the
\code{device\_type} clause (another 5.0 feature) to specify that
the \plc{error\_handler} function is compiled to
execute on the \plc{host} only. This ensures that no
attempt will be made to create a device version of the
function. This feature may be necessary if the function
exists in another compilation unit.
\cexample{target_reverse_offload}{7}
\ffreeexample{target_reverse_offload}{7}

View File

@ -1,178 +0,0 @@
\pagebreak
\section{\code{target} \code{data} Construct}
\label{sec:target_data}
\subsection{Simple \code{target} \code{data} Construct}
\label{subsec:target_data_simple}
This example shows how the \code{target} \code{data} construct maps variables
to a device data environment. The \code{target} \code{data} construct creates
a new device data environment and maps the variables \plc{v1}, \plc{v2}, and \plc{p} to the new device
data environment. The \code{target} construct enclosed in the \code{target}
\code{data} region creates a new device data environment, which inherits the
variables \plc{v1}, \plc{v2}, and \plc{p} from the enclosing device data environment. The variable
\plc{N} is mapped into the new device data environment from the encountering task's data
environment.
\cexample{target_data}{1}
\pagebreak
The Fortran code passes a reference and specifies the extent of the arrays in the
declaration. No length information is necessary in the map clause, as is required
with C/C++ pointers.
\ffreeexample{target_data}{1}
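A minimal sketch of a \code{target} region inheriting mapped variables from an enclosing
\code{target} \code{data} region (hypothetical names):
\begin{verbatim}
void vec_mult(int N, float *p, const float *v1, const float *v2)
{
   #pragma omp target data map(to: v1[0:N], v2[0:N]) map(from: p[0:N])
   {
      /* v1, v2 and p are inherited from the target data environment */
      #pragma omp target
      #pragma omp parallel for
      for (int i = 0; i < N; i++)
         p[i] = v1[i] * v2[i];
   }
}
\end{verbatim}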
\subsection{\code{target} \code{data} Region Enclosing Multiple \code{target} Regions}
\label{subsec:target_data_multiregion}
The following examples show how the \code{target} \code{data} construct maps
variables to a device data environment of a \code{target} region. The \code{target}
\code{data} construct creates a device data environment and encloses \code{target}
regions, which have their own device data environments. The device data environment
of the \code{target} \code{data} region is inherited by the device data environment
of an enclosed \code{target} region. The \code{target} \code{data} construct
is used to create variables that will persist throughout the \code{target} \code{data}
region.
In the following example the variables \plc{v1} and \plc{v2} are mapped at each \code{target}
construct. Instead of mapping the variable \plc{p} twice, once at each \code{target}
construct, \plc{p} is mapped once by the \code{target} \code{data} construct.
\cexample{target_data}{2}
The Fortran code passes references and specifies the extent of the \plc{p}, \plc{v1} and \plc{v2} arrays.
No length information is necessary in the \code{map} clause, as is required with
C/C++ pointers. The arrays \plc{v1} and \plc{v2} are mapped at each \code{target} construct.
Instead of mapping the array \plc{p} twice, once at each target construct, \plc{p} is mapped
once by the \code{target} \code{data} construct.
\ffreeexample{target_data}{2}
In the following example, the array \plc{Q} is mapped once at the enclosing
\code{target}~\code{data} region instead of at each \code{target} construct.
In OpenMP 4.0, a scalar variable is implicitly mapped with the \code{tofrom} map-type.
But since OpenMP 4.5, a scalar variable, such as the \plc{tmp} variable, has to be explicitly mapped with
the \code{tofrom} map-type at the first \code{target} construct in order to return
its reduced value from the parallel loop construct to the host.
The variable defaults to firstprivate at the second \code{target} construct.
\cexample{target_data}{3}
\ffreeexample{target_data}{3}
\subsection{\code{target} \code{data} Construct with Orphaned Call}
The following two examples show how the \code{target} \code{data} construct
maps variables to a device data environment. The \code{target} \code{data}
construct's device data environment encloses the \code{target} construct's device
data environment in the function \code{vec\_mult()}.
When the type of the variable appearing in an array section is pointer, the pointer
variable and the storage location of the corresponding array section are mapped
to the device data environment. The pointer variable is treated as if it had appeared
in a \code{map} clause with a map-type of \code{alloc}. The array section's
storage location is mapped according to the map-type in the \code{map} clause
(the default map-type is \code{tofrom}).
The \code{target} construct's device data environment inherits the storage locations
of the array sections \plc{v1[0:N]}, \plc{v2[:n]}, and \plc{p0[0:N]} from the enclosing target data
construct's device data environment. Neither initialization nor assignment is performed
for the array sections in the new device data environment.
The pointer variables \plc{p1}, \plc{v3}, and \plc{v4} are mapped into the target construct's device
data environment with an implicit map-type of alloc and they are assigned the address
of the storage location associated with their corresponding array sections. Note
that the following pairs of array section storage locations are equivalent (\plc{p0[:N]},
\plc{p1[:N]}), (\plc{v1[:N]},\plc{v3[:N]}), and (\plc{v2[:N]},\plc{v4[:N]}).
\cexample{target_data}{4}
The Fortran code maps the pointers and storage in an identical manner (same extent,
but uses indices from 1 to \plc{N}).
The \code{target} construct's device data environment inherits the storage locations
of the arrays \plc{v1}, \plc{v2} and \plc{p0} from the enclosing \code{target} \code{data} construct's
device data environment. However, in Fortran the associated data of the pointer
is known, and the shape is not required.
The pointer variables \plc{p1}, \plc{v3}, and \plc{v4} are mapped into the \code{target} construct's
device data environment with an implicit map-type of \code{alloc} and they are
assigned the address of the storage location associated with their corresponding
array sections. Note that the following pairs of array storage locations are equivalent
(\plc{p0},\plc{p1}), (\plc{v1},\plc{v3}), and (\plc{v2},\plc{v4}).
\ffreeexample{target_data}{4}
In the following example, the variables \plc{p1}, \plc{v3}, and \plc{v4} are references to the pointer
variables \plc{p0}, \plc{v1} and \plc{v2} respectively. The \code{target} construct's device data
environment inherits the pointer variables \plc{p0}, \plc{v1}, and \plc{v2} from the enclosing \code{target}
\code{data} construct's device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already
present in the device data environment.
\cppexample{target_data}{5}
In the following example, the usual Fortran approach is used for dynamic memory.
The \plc{p0}, \plc{v1}, and \plc{v2} arrays are allocated in the main program and passed as references
from one routine to another. In \code{vec\_mult}, \plc{p1}, \plc{v3} and \plc{v4} are references to the
\plc{p0}, \plc{v1}, and \plc{v2} arrays, respectively. The \code{target} construct's device data
environment inherits the arrays \plc{p0}, \plc{v1}, and \plc{v2} from the enclosing target data construct's
device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already present in the device
data environment.
\ffreeexample{target_data}{5}
\subsection{\code{target} \code{data} Construct with \code{if} Clause}
\label{subsec:target_data_if}
The following two examples show how the \code{target} \code{data} construct
maps variables to a device data environment.
In the following example, the \code{if} clause on the \code{target} \code{data} construct
indicates that if the variable \plc{N} is smaller than a given threshold, then the \code{target}
\code{data} construct will not create a device data environment.
The \code{target} constructs enclosed in the \code{target} \code{data} region
must also use an \code{if} clause on the same condition, otherwise the pointer
variable \plc{p} is implicitly mapped with a map-type of \code{tofrom}, but the storage
location for the array section \plc{p[0:N]} will not be mapped in the device data environments
of the \code{target} constructs.
\cexample{target_data}{6}
\pagebreak
The \code{if} clauses work the same way for the following Fortran code. The \code{target}
constructs enclosed in the \code{target} \code{data} region should also use
an \code{if} clause with the same condition, so that the \code{target} \code{data}
region and the \code{target} region are either both created for the device, or
are both ignored.
\ffreeexample{target_data}{6}
\pagebreak
In the following example, when the \code{if} clause conditional expression on
the \code{target} construct evaluates to \plc{false}, the target region will
execute on the host device. However, the \code{target} \code{data} construct
created an enclosing device data environment that mapped \plc{p[0:N]} to a device data
environment on the default device. At the end of the \code{target} \code{data}
region the array section \plc{p[0:N]} will be assigned from the device data environment
to the corresponding variable in the data environment of the task that encountered
the \code{target} \code{data} construct, resulting in undefined values in \plc{p[0:N]}.
\cexample{target_data}{7}
\pagebreak
The \code{if} clauses work the same way for the following Fortran code. When
the \code{if} clause conditional expression on the \code{target} construct
evaluates to \plc{false}, the \code{target} region will execute on the host
device. However, the \code{target} \code{data} construct created an enclosing
device data environment that mapped the \plc{p} array (and \plc{v1} and \plc{v2}) to a device data
environment on the default target device. At the end of the \code{target} \code{data}
region the \plc{p} array will be assigned from the device data environment to the corresponding
variable in the data environment of the task that encountered the \code{target}
\code{data} construct, resulting in undefined values in \plc{p}.
\ffreeexample{target_data}{7}

View File

@ -1,86 +0,0 @@
\pagebreak
\section{\code{declare mapper} Construct}
\label{sec:declare_mapper}
The following examples show how to use the \code{declare mapper}
directive to prescribe a map for later use.
It is also quite useful for pre-defining partitioned and nested
structure elements.
In the first example the \code{declare mapper} directive specifies
that any structure of type \plc{myvec\_t} for which implicit data-mapping
rules apply will be mapped according to its \code{map} clause.
The variable \plc{v} is used for referencing the structure and its
elements within the \code{map} clause.
Within the \code{map} clause the \plc{v} variable specifies that all
elements of the structure are to be mapped. Additionally, the
array section \plc{v.data[0:v.len]} specifies that the dynamic
storage for data is to be mapped.
Within the main program the \plc{s} variable is typed as \plc{myvec\_t}.
Since the variable is found within the target region and the type has a mapping prescribed by
a \code{declare mapper} directive, it will be automatically mapped according to its prescription:
full structure, plus the dynamic storage of the \plc{data} element.
%Note: By default the mapping is \code{tofrom}.
%The associated Fortran allocatable \plc{data} array is automatically mapped with the derived
%type, it does not require an array section as in the C/C++ example.
\cexample{target_mapper}{1}
\ffreeexample{target_mapper}{1}
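A condensed C sketch of a mapper declaration and its implicit use (hypothetical names):
\begin{verbatim}
#include <stdlib.h>

typedef struct myvec {
   size_t  len;
   double *data;
} myvec_t;

/* map the structure itself plus the dynamic storage of its data member */
#pragma omp declare mapper(myvec_t v) map(v, v.data[0:v.len])

int main(void)
{
   myvec_t s;
   s.len  = 100;
   s.data = (double *) calloc(s.len, sizeof(double));

   #pragma omp target          /* s is mapped according to the mapper */
   for (size_t i = 0; i < s.len; i++)
      s.data[i] = (double) i;

   free(s.data);
   return 0;
}
\end{verbatim}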
\pagebreak
The next example illustrates the use of the \plc{mapper-identifier} and deep copy within a structure.
The structure, \plc{dzmat\_t}, represents a complex matrix,
with separate real (\plc{r\_m}) and imaginary (\plc{i\_m}) elements.
Two map identifiers are created for partitioning the \plc{dzmat\_t} structure.
For the C/C++ code the first identifier is named \plc{top\_id} and maps the top half of
two matrices of type \plc{dzmat\_t}; while the second identifier, \plc{bottom\_id},
maps the lower half of two matrices.
Each identifier is applied to a different \code{target} construct,
as \code{map(mapper(top\_id), tofrom: a,b)}
and \code{map(mapper(bottom\_id), tofrom: a,b)}.
Each target offload is allowed to execute concurrently on two different devices
(\plc{0} and \plc{1}) through the \code{nowait} clause.
The OpenMP 5.0 \code{parallel master} construct creates a region of two threads
for these \code{target} constructs, with the \plc{master} thread as the single task generator.
The Fortran code uses the \plc{left\_id} and \plc{right\_id} map identifiers in the
\code{map(mapper(left\_id),tofrom: a,b)} and \code{map(mapper(right\_id),tofrom: a,b)} map clauses.
The array sections for these left and right contiguous portions of the matrices
were defined previously in the \code{declare mapper} directive.
Note, the \plc{is} and \plc{ie} scalars are firstprivate
by default for a target region, but are declared firstprivate anyway
to remind the user of important firstprivate data-sharing properties required here.
\cexample{target_mapper}{2}
\ffreeexample{target_mapper}{2}
\pagebreak
In the third example \plc{myvec} structures are
nested within a \plc{mypoints} structure. The \plc{myvec\_t} type is mapped
as in the first example. Following the \plc{mypoints} structure declaration,
the \plc{mypoints\_t} type is mapped by a \code{declare mapper} directive.
For this structure the \plc{hostonly\_data} element will not be mapped;
also the array section of \plc{x} (\plc{v.x[:1]}) and \plc{x} will be mapped; and
\plc{scratch} will be allocated and used as scratch storage on the device.
The default map-type mapping, \code{tofrom}, applies to the \plc{x} array section,
but not to \plc{scratch} which is explicitly mapped with the \code{alloc} map-type.
Note: the variable \plc{v} is not included in the map list (otherwise
the \plc{hostonly\_data} would be mapped)-- just the elements
to be mapped are listed.
The two mappers are combined when a \plc{mypoints\_t} structure type is mapped,
because the mapper \plc{myvec\_t} structure type is used within a \plc{mypoints\_t}
type structure.
%Note, in the main program \plc{P} is an array of \plc{mypoints\_t} type structures,
%and hence every element of the array is mapped with the mapper prescription.
\cexample{target_mapper}{3}
\ffreeexample{target_mapper}{3}

View File

@ -1,53 +0,0 @@
\pagebreak
\section{Pointer mapping}
\label{sec:pointer_mapping}
The following example shows the basics of mapping pointers with and without
associated storage on the host.
Storage for pointers \plc{ptr1} and \plc{ptr2} is created on the host.
To map storage that is associated with a pointer on the host, the data can be
explicitly mapped as an array section so that the compiler knows
the amount of data to be assigned in the device (to the "corresponding" data storage area).
On the \code{target} construct array sections are mapped; however, the pointer \plc{ptr1}
is mapped, while \plc{ptr2} is not. Since \plc{ptr2} is not explicitly mapped, it is
firstprivate. This creates a subtle difference in the way these pointers can be used.
As a firstprivate pointer, \plc{ptr2} can be manipulated on the device;
however, as an explicitly mapped pointer,
\plc{ptr1} becomes an \emph{attached} pointer and cannot be manipulated.
In both cases the host pointer is not updated with the device pointer
address---as one would expect for distributed memory.
The storage data on the host is updated from the corresponding device
data at the end of the \code{target} region.
As a comparison, note that the \plc{aray} array is automatically mapped,
since the compiler knows the extent of the array.
The pointer \plc{ptr3} is used in the \code{target} region and has
a data-sharing attribute of firstprivate.
The pointer is implicitly mapped to a zero-length array section.
Neither the pointer address nor any
of its locally assigned data on the device is returned
to the host.
\cexample{target_ptr_map}{1}
In the following example the global pointer \plc{p} appears in a
\code{declare}~\code{target} directive. Hence, the pointer \plc{p} will
persist on the device throughout executions in all target regions.
The pointer is also used in an array section of a \code{map} clause on
a \code{target} construct. When storage associated with
a \code{declare}~\code{target} pointer
is mapped, as for the array section \plc{p[:N]} in the
\code{target} construct, the array section on the device is \emph{attached}
to the device pointer \plc{p} on entry to the construct, and
the value of the device pointer \plc{p} becomes undefined on exit.
(Of course, storage allocation for
the array section on the device will occur before the
pointer on the device is \emph{attached}.)
% For globals with declare target is there such a things a
% original and corresponding?
\cexample{target_ptr_map}{2}

View File

@ -1,54 +0,0 @@
\pagebreak
\section{Structure mapping}
\label{sec:structure_mapping}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
In the example below, only structure elements \plc{S.a}, \plc{S.b} and \plc{S.p}
of the \plc{S} structure appear in \code{map} clauses of a \code{target} construct.
Only these components have corresponding variables and storage on the device.
Hence, the large arrays, \plc{S.buffera} and \plc{S.bufferb}, and the \plc{S.x} component have no storage
on the device and cannot be accessed.
Also, since the pointer member \plc{S.p} is used in an array section of a
\code{map} clause, the array storage of the array section on the device,
\plc{S.p[:N]}, is \emph{attached} to the pointer member \plc{S.p} on the device.
Explicitly mapping the pointer member \plc{S.p} is optional in this case.
Note: The buffer arrays and the \plc{x} variable have been grouped together, so that
the components that will reside on the device are all together (without gaps).
This allows the runtime to optimize the transfer and the storage footprint on the device.
\cexample{target_struct_map}{1}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The following example is a slight modification of the above example for
a C++ class. In the member function \plc{SAXPY::driver}
the array section \plc{p[:N]} is \emph{attached} to the pointer member \plc{p}
on the device.
\cppexample{target_struct_map}{2}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%In this example a pointer, \plc{p}, is mapped in a
%\code{target}~\code{data} construct (\code{map(p)}) and remains
%persistent throughout the \code{target}~\code{data} region. The address stored
%on the host is not assigned to the device pointer variable, and
%the device value is not copied back to the host at the end of the
%region (for a pointer, it is as though \code{map(alloc:p}) is effectively
%used). The array section, \plc{p[:N]}, is mapped on both \code{target}
%constructs, and the pointer \plc{p} on the device is attached at the
%beginning and detached at the end of the regions to the newly created
%array section on the device.
%
%Also, in the following example the global variable, \plc{a}, becomes
%allocated when it is first used on the device in a \code{target} region,
%and persists on the device for all target regions. The value on the
%device and host may be different, as shown by the print statements.
%The values may be made consistent with the \code{update} construct,
%as shown in the \plc{declare\_target.3.c} and \plc{declare\_target.3.f90}
%examples.
%
%\cexample{target_struct_map}{2}

View File

@ -1,49 +0,0 @@
%begin
\pagebreak
\section{\code{target} \code{enter} \code{data} and \code{target} \code{exit} \code{data} Constructs}
\label{sec:target_enter_exit_data}
%\section{Simple target enter data and target exit data Constructs}
The structured data construct (\code{target}~\code{data}) provides persistent data on a
device for subsequent \code{target} constructs as shown in the
\code{target}~\code{data} examples above. This is accomplished by creating a single
\code{target}~\code{data} region containing \code{target} constructs.
The unstructured data constructs allow the creation and deletion of data on
the device at any appropriate point within the host code, as shown below
with the \code{target}~\code{enter}~\code{data} and \code{target}~\code{exit}~\code{data} constructs.
The following C++ code creates/deletes a vector in a constructor/destructor
of a class. The constructor creates a vector with \code{target}~\code{enter}~\code{data}
and uses an \code{alloc} modifier in the \code{map} clause to avoid copying values
to the device. The destructor deletes the data (\code{target}~\code{exit}~\code{data})
and uses the \code{delete} modifier in the \code{map} clause to avoid copying data
back to the host. Note, the stand-alone \code{target}~\code{enter}~\code{data} occurs
after the host vector is created, and the \code{target}~\code{exit}~\code{data}
construct occurs before the host data is deleted.
\cppexample{target_unstructured_data}{1}
\pagebreak
The following C code allocates and frees the data member of a Matrix structure.
The \code{init\_matrix} function allocates the memory used in the structure and
uses the \code{target}~\code{enter}~\code{data} directive to map it to the target device. The
\code{free\_matrix} function removes the mapped array from the target device
and then frees the memory on the host. Note, the stand-alone
\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the
\code{target}~\code{exit}~\code{data} construct occurs before the host data is freed.
\cexample{target_unstructured_data}{1}
\pagebreak
The following Fortran code allocates and deallocates a module array. The
\code{initialize} subroutine allocates the module array and uses the
\code{target}~\code{enter}~\code{data} directive to map it to the target device. The
\code{finalize} subroutine removes the mapped array from the target device and
then deallocates the array on the host. Note, the stand-alone
\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the
\code{target}~\code{exit}~\code{data} construct occurs before the host data is deallocated.
\ffreeexample{target_unstructured_data}{1}
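A minimal C sketch of the create/delete pattern for a global array (hypothetical names):
\begin{verbatim}
#include <stdlib.h>

double *A;

void initialize(int N)
{
   A = (double *) malloc(N * sizeof(double));
   /* create the device copy without transferring values */
   #pragma omp target enter data map(alloc: A[0:N])
}

void finalize(int N)
{
   /* remove the device copy without transferring it back */
   #pragma omp target exit data map(delete: A[0:N])
   free(A);
}
\end{verbatim}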
%end

View File

@ -1,55 +0,0 @@
\pagebreak
\section{\code{target} \code{update} Construct}
\label{sec:target_update}
\subsection{Simple \code{target} \code{data} and \code{target} \code{update} Constructs}
\label{subsec:target_data_and_update}
The following example shows how the \code{target} \code{update} construct updates
variables in a device data environment.
The \code{target} \code{data} construct maps array sections \plc{v1[:N]} and \plc{v2[:N]}
(arrays \plc{v1} and \plc{v2} in the Fortran code) into a device data environment.
The task executing on the host device encounters the first \code{target} region
and waits for the completion of the region.
After the execution of the first \code{target} region, the task executing on
the host device then assigns new values to \plc{v1[:N]} and \plc{v2[:N]} (\plc{v1} and \plc{v2} arrays
in Fortran code) in the task's data environment by calling the function \code{init\_again()}.
The \code{target} \code{update} construct assigns the new values of \plc{v1} and
\plc{v2} from the task's data environment to the corresponding mapped array sections
in the device data environment of the \code{target} \code{data} construct.
The task executing on the host device then encounters the second \code{target}
region and waits for the completion of the region.
The second \code{target} region uses the updated values of \plc{v1[:N]} and \plc{v2[:N]}.
\cexample{target_update}{1}
\ffreeexample{target_update}{1}
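A condensed sketch of refreshing a device copy between two \code{target} regions
(hypothetical names; the computation is placed inline to keep the sketch self-contained):
\begin{verbatim}
void update_demo(int N, double *v1)
{
   #pragma omp target data map(tofrom: v1[0:N])
   {
      #pragma omp target
      for (int i = 0; i < N; i++)
         v1[i] *= 2.0;                      /* uses the mapped values   */

      for (int i = 0; i < N; i++)
         v1[i] = (double) i;                /* host assigns new values  */

      #pragma omp target update to(v1[0:N]) /* refresh the device copy  */

      #pragma omp target
      for (int i = 0; i < N; i++)
         v1[i] *= 2.0;                      /* uses the updated values  */
   }
}
\end{verbatim}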
\subsection{\code{target} \code{update} Construct with \code{if} Clause}
\label{subsec:target_update_if}
The following example shows how the \code{target} \code{update} construct updates
variables in a device data environment.
The \code{target} \code{data} construct maps array sections \plc{v1[:N]} and \plc{v2[:N]}
(arrays \plc{v1} and \plc{v2} in the Fortran code) into a device data environment. In between
the two \code{target} regions, the task executing on the host device conditionally
assigns new values to \plc{v1} and \plc{v2} in the task's data environment. The function \code{maybe\_init\_again()}
returns \plc{true} if new data is written.
When the conditional expression (the return value of \code{maybe\_init\_again()}) in the
\code{if} clause is \plc{true}, the \code{target} \code{update} construct
assigns the new values of \plc{v1} and \plc{v2} from the task's data environment to the corresponding
mapped array sections in the \code{target} \code{data} construct's device data
environment.
\cexample{target_update}{2}
\ffreeexample{target_update}{2}

View File

@ -1,32 +0,0 @@
\section{Task Affinity}
\label{sec: task_affinity}
The next example illustrates the use of the \code{affinity}
clause with a \code{task} construct.
The variables in the \code{affinity} clause provide a
hint to the runtime that the task should execute
"close" to the physical storage location of the variables. For example,
on a two-socket platform with a local memory component
close to each processor socket, the runtime will attempt to
schedule the task execution on the socket where the storage is located.
Because the C/C++ code employs a pointer, an array section is used in
the \code{affinity} clause.
Fortran code can use an array reference to specify the storage, as
shown here.
Note, in the second task of the C/C++ code the \plc{B} pointer is declared
shared. Otherwise, by default, it would be firstprivate since it is a local
variable, and would probably be saved for the second task before being assigned
a storage address by the first task. Also, one might think it reasonable to use
the \code{affinity} clause \plc{affinity(B[:N])} on the second \code{task} construct.
However, the storage behind \plc{B} is created in the first task, and the
array section reference may not be valid when the second task is generated.
The use of the \plc{A} array is sufficient for this case, because one
would expect the storage for \plc{A} and \plc{B} would be physically "close"
(as provided by the hint in the first task).
\cexample{affinity}{6}
\ffreeexample{affinity}{6}

View File

@ -1,216 +0,0 @@
\pagebreak
\section{Task Dependences}
\label{sec:task_depend}
\subsection{Flow Dependence}
\label{subsec:task_flow_depend}
This example shows a simple flow dependence using a \code{depend}
clause on the \code{task} construct.
\cexample{task_dep}{1}
\ffreeexample{task_dep}{1}
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program
would have a race condition.
\subsection{Anti-dependence}
\label{subsec:task_anti_depend}
This example shows an anti-dependence using the \code{depend}
clause on the \code{task} construct.
\cexample{task_dep}{2}
\ffreeexample{task_dep}{2}
The program will always print \texttt{"}x = 1\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.
\subsection{Output Dependence}
\label{subsec:task_out_depend}
This example shows an output dependence using the \code{depend}
clause on the \code{task} construct.
\cexample{task_dep}{3}
\ffreeexample{task_dep}{3}
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.
\pagebreak
\subsection{Concurrent Execution with Dependences}
\label{subsec:task_concurrent_depend}
In this example we show potentially concurrent execution of tasks using multiple
flow dependences expressed using the \code{depend} clause on the \code{task}
construct.
\cexample{task_dep}{4}
\ffreeexample{task_dep}{4}
The last two tasks are dependent on the first task. However there is no dependence
between the last two tasks, which may execute in any order (or concurrently if
more than one thread is available). Thus, the possible outputs are \texttt{"}x
+ 1 = 3. x + 2 = 4. \texttt{"} and \texttt{"}x + 2 = 4. x + 1 = 3. \texttt{"}.
If the \code{depend} clauses had been omitted, then all of the tasks could execute
in any order and the program would have a race condition.
\subsection{Matrix multiplication}
\label{subsec:task_matrix_mult}
This example shows a task-based blocked matrix multiplication. Matrices are of
NxN elements, and the multiplication is implemented using blocks of BSxBS elements.
\cexample{task_dep}{5}
\ffreeexample{task_dep}{5}
\subsection{\code{taskwait} with Dependences}
\label{subsec:taskwait_depend}
In this subsection three examples illustrate how the
\code{depend} clause can be applied to a \code{taskwait} construct to make the
generating task wait for specific child tasks to complete. This is an OpenMP 5.0 feature.
In the same manner that
dependences can order executions among child tasks with \code{depend} clauses on
\code{task} constructs, the generating task can be scheduled to wait on child tasks
at a \code{taskwait} before it can proceed.
Note: Since the \code{depend} clause on a \code{taskwait} construct relaxes the
default synchronization behavior (waiting for all children to finish), it is important to
realize that child tasks that are not predecessor tasks, as determined by the \code{depend}
clause of the \code{taskwait} construct, may be running concurrently while the
generating task is executing after the taskwait.
In the first example the generating task waits at the \code{taskwait} construct
for the completion of the first child task because a dependence on the first task
is produced by \plc{x} with an \code{in} dependence type within the \code{depend}
clause of the \code{taskwait} construct.
Immediately after the first \code{taskwait} construct it is safe to access the
\plc{x} variable by the generating task, as shown in the print statement.
There is no completion restraint on the second child task.
Hence, immediately after the first \code{taskwait} it is unsafe to access the
\plc{y} variable since the second child task may still be executing.
The second \code{taskwait} ensures that the second child task has completed; hence
it is safe to access the \plc{y} variable in the following print statement.
\cexample{task_dep}{6}
\ffreeexample{task_dep}{6}
In this example the first two tasks are serialized, because a dependence on
the first child is produced by \plc{x} with the \code{in} dependence type
in the \code{depend} clause of the second task.
However, the generating task at the first \code{taskwait} waits only on the
first child task to complete, because a dependence on only the first child task
is produced by \plc{x} with an \code{in} dependence type within the
\code{depend} clause of the \code{taskwait} construct.
The second \code{taskwait} (without a \code{depend} clause) is included
to guarantee completion of the second task before \plc{y} is accessed.
(While unnecessary, the \code{depend(inout:} \code{y)} clause on the 2nd child task is
included to illustrate how the child task dependences can be completely annotated
in a data-flow model.)
\cexample{task_dep}{7}
\ffreeexample{task_dep}{7}
This example is similar to the previous one, except the generating task is
directed to also wait for completion of the second task.
The \code{depend} clause of the \code{taskwait} construct now includes an
\code{in} dependence type for \plc{y}. Hence the generating task must now
wait on completion of any child task having \plc{y} with an \code{out}
(here \code{inout}) dependence type in its \code{depend} clause.
So, the \code{depend} clause of the \code{taskwait} construct now constrains
the second task to complete at the \code{taskwait}, too.
%--both tasks must now complete execution at the \code{taskwait}.
(This change makes the second \code{taskwait} of the previous example unnecessary;
it has been removed in this example.)
Note: While a \code{taskwait} construct without a \code{depend} clause ensures that all child
tasks have completed, a \code{taskwait} construct with a \code{depend} clause waits only for
the specified child tasks (prescribed by the dependence type and list
items in the \code{taskwait}'s \code{depend} clause).
This and the previous example illustrate the need to carefully determine
the dependence type of variables in the \code{taskwait} \code{depend} clause
when selecting child tasks that the generating task must wait on, so that its execution after the
\code{taskwait} does not produce race conditions on variables accessed by non-completed child tasks.
\cexample{task_dep}{8}
\ffreeexample{task_dep}{8}
\pagebreak
\subsection{Mutually Exclusive Execution with Dependences}
\label{subsec:task_dep_mutexinoutset}
In this example we show a series of tasks, including mutually exclusive
tasks, expressing dependences using the \code{depend} clause on the
\code{task} construct.
The program will always print~6. Tasks T1, T2 and T3 will be scheduled first,
in any order. Task T4 will be scheduled after tasks T1 and T2 are
completed. T5 will be scheduled after tasks T1 and T3 are completed. Due
to the \code{mutexinoutset} dependence type on \code{c}, T4 and T5 may be
scheduled in any order with respect to each other, but not at the same
time. Task T6 will be scheduled after both T4 and T5 have completed.
\cexample{task_dep}{9}
\ffreeexample{task_dep}{9}
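The scheduling constraints described above can be seen in the following condensed C sketch
(a minimal variant of the pattern, not one of the numbered examples):

\begin{verbatim}
#include <stdio.h>

int main(void)
{
   int a = 0, b = 0, c = 0, d = 0;
   #pragma omp parallel
   #pragma omp single
   {
      #pragma omp task depend(out: c)                          /* T1 */
      { c = 1; }
      #pragma omp task depend(out: a)                          /* T2 */
      { a = 2; }
      #pragma omp task depend(out: b)                          /* T3 */
      { b = 3; }
      /* T4 and T5 are mutually exclusive with respect to c,
         but may run in either order                           */
      #pragma omp task depend(in: a) depend(mutexinoutset: c)  /* T4 */
      { c += a; }
      #pragma omp task depend(in: b) depend(mutexinoutset: c)  /* T5 */
      { c += b; }
      #pragma omp task depend(in: c)                           /* T6 */
      { d = c; }
   }
   printf("%d\n", d);   /* always prints 6 */
   return 0;
}
\end{verbatim}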
The following example demonstrates a situation where the \code{mutexinoutset}
dependence type is advantageous. If \code{shortTaskB} completes
before \code{longTaskA}, the runtime can take advantage of this by
scheduling \code{longTaskBC} before \code{shortTaskAC}.
\cexample{task_dep}{10}
\ffreeexample{task_dep}{10}
\subsection{Multidependences Using Iterators}
\label{subsec:depend_iterator}
The following example uses an iterator to define a dynamic number of
dependences.
In the \code{single} construct of a parallel region a loop generates n tasks
and each task has an \code{out} dependence specified through an element of
the \plc{v} array. This is followed by a single task that defines an \code{in}
dependence on each element of the array. This is accomplished by
using the \code{iterator} modifier in the \code{depend} clause, supporting a dynamic number
of dependences (\plc{n} here).
The task for the \plc{print\_all\_elements} function is not executed until all dependences
prescribed (or registered) by the iterator are fulfilled; that is,
after all the tasks generated by the loop have completed.
Note, one cannot simply use an array section in the \code{depend} clause
of the second task construct because this would violate the \code{depend} clause restriction:
"List items used in \code{depend} clauses of the same task or sibling tasks
must indicate identical storage locations or disjoint storage locations".
In this case each of the loop tasks uses a single disjoint (different storage)
element in its \code{depend} clause; however,
the array-section storage area prescribed in the commented directive is neither
identical to nor disjoint from the storage prescribed by the elements of the
loop tasks. The iterator overcomes this restriction by effectively
creating \plc{n} disjoint storage areas.
\cexample{task_dep}{11}
\ffreeexample{task_dep}{11}
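A minimal C sketch of the multidependence pattern follows (not one of the numbered
examples; the element values and the final print are illustrative only):

\begin{verbatim}
#include <stdio.h>
#define N 8

int main(void)
{
   int v[N];
   #pragma omp parallel
   #pragma omp single
   {
      for (int i = 0; i < N; i++) {
         #pragma omp task depend(out: v[i])   /* one producer per element */
         { v[i] = i; }
      }

      /* the iterator registers N 'in' dependences, one per
         disjoint element of v, so this task runs last        */
      #pragma omp task depend(iterator(it = 0:N), in: v[it])
      {
         for (int i = 0; i < N; i++) printf("%d ", v[i]);
         printf("\n");
      }
   }
   return 0;
}
\end{verbatim}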
View File
@ -1,22 +0,0 @@
\pagebreak
\section{Task Priority}
\label{sec:task_priority}
%\subsection{Task Priority}
%\label{subsec:task_priority}
In this example we compute arrays in a matrix through a \plc{compute\_array} routine.
Each task has a priority value equal to the value of the loop variable \plc{i} at the
moment of its creation. A higher priority on a task means that the task is a candidate
to run sooner.
The creation of tasks occurs in ascending order (according to the iteration space of
the loop) but a hint, by means of the \code{priority} clause, is provided to reverse
the execution order.
\cexample{task_priority}{1}
\ffreeexample{task_priority}{1}
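A minimal C sketch of this pattern is shown below (the \plc{compute\_array} routine is
hypothetical and stands for the per-row work of the example):

\begin{verbatim}
#define N 128
extern void compute_array(double *row, int n);   /* hypothetical worker */

void compute_matrix(double m[N][N])
{
   #pragma omp parallel
   #pragma omp single
   {
      for (int i = 0; i < N; i++) {
         /* priority is only a hint: tasks with higher values are
            candidates to run sooner, reversing the creation order */
         #pragma omp task priority(i)
         compute_array(m[i], N);
      }
   }
}
\end{verbatim}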
View File
@ -1,20 +0,0 @@
\pagebreak
\section{The \code{taskgroup} Construct}
\label{sec:taskgroup}
In this example, tasks are grouped and synchronized using the \code{taskgroup}
construct.
Initially, one task (the task executing the \code{start\_background\_work()}
call) is created in the \code{parallel} region, and later a parallel tree traversal
is started (the task executing the root of the recursive \code{compute\_tree()}
calls). While synchronizing tasks at the end of each tree traversal, using the
\code{taskgroup} construct ensures that the formerly started background task
does not participate in the synchronization, and is left free to execute in parallel.
In contrast, a \code{taskwait} construct at that point would also
include the background task in the synchronization.
\cexample{taskgroup}{1}
\ffreeexample{taskgroup}{1}
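The structure of the example can be sketched as follows in C (the routines are
hypothetical placeholders for the background work and the recursive traversal):

\begin{verbatim}
struct tree_node { struct tree_node *left, *right; };

extern void start_background_work(void);            /* hypothetical */
extern void compute_tree(struct tree_node *root);   /* creates tasks recursively */

void process(struct tree_node *root)
{
   #pragma omp parallel
   #pragma omp single
   {
      #pragma omp task                /* free-running background task */
      start_background_work();

      /* the taskgroup waits only for the traversal task and its
         descendants; the background task is not included          */
      #pragma omp taskgroup
      {
         #pragma omp task
         compute_tree(root);
      }
      /* traversal complete here; background task may still run */
   }
}
\end{verbatim}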
View File
@ -1,190 +0,0 @@
\pagebreak
\section{The \code{task} and \code{taskwait} Constructs}
\label{sec:task_taskwait}
The following example shows how to traverse a tree-like structure using explicit
tasks. Note that the \code{traverse} function should be called from within a
parallel region for the different specified tasks to be executed in parallel. Also
note that the tasks will be executed in no specified order because there are no
synchronization directives. Thus, assuming that the traversal will be done in postorder,
as in the sequential code, is wrong.
\cexample{tasking}{1}
\ffreeexample{tasking}{1}
In the next example, we force a postorder traversal of the tree by adding a \code{taskwait}
directive. Now, we can safely assume that the tasks for the left and right subtrees have
completed before we process the current node.
\cexample{tasking}{2}
\ffreeexample{tasking}{2}
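The postorder pattern can be sketched as follows in C (a minimal sketch; \plc{process()}
is a hypothetical routine that visits one node):

\begin{verbatim}
struct node { struct node *left, *right; };
extern void process(struct node *p);   /* hypothetical */

void postorder_traverse(struct node *p)
{
   if (p->left) {
      #pragma omp task          /* p is firstprivate by default */
      postorder_traverse(p->left);
   }
   if (p->right) {
      #pragma omp task
      postorder_traverse(p->right);
   }
   #pragma omp taskwait         /* wait for both subtree tasks ...     */
   process(p);                  /* ... then visit the node (postorder) */
}
\end{verbatim}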
The following example demonstrates how to use the \code{task} construct to process
elements of a linked list in parallel. The thread executing the \code{single}
region generates all of the explicit tasks, which are then executed by the threads
in the current team. The pointer \plc{p} is \code{firstprivate} by default
on the \code{task} construct so it is not necessary to specify it in a \code{firstprivate}
clause.
\cexample{tasking}{3}
\ffreeexample{tasking}{3}
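A minimal C sketch of the pattern follows (\plc{work\_on()} is a hypothetical routine
that processes one list element):

\begin{verbatim}
struct item { struct item *next; };
extern void work_on(struct item *p);   /* hypothetical */

void process_list(struct item *head)
{
   #pragma omp parallel
   #pragma omp single
   {
      for (struct item *p = head; p != NULL; p = p->next) {
         /* p is firstprivate by default on the task construct */
         #pragma omp task
         work_on(p);
      }
   }  /* all tasks complete at the implicit barriers of single/parallel */
}
\end{verbatim}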
The \code{fib()} function should be called from within a \code{parallel} region
for the different specified tasks to be executed in parallel. Also, only one thread
of the \code{parallel} region should call \code{fib()} unless multiple concurrent
Fibonacci computations are desired.
\cexample{tasking}{4}
\fexample{tasking}{4}
Note: There are more efficient algorithms for computing Fibonacci numbers. This
classic recursion algorithm is for illustrative purposes.
The following example demonstrates a way to generate a large number of tasks with
one thread and execute them with the threads in the team. While generating these
tasks, the implementation may reach its limit on unassigned tasks. If it does,
the implementation is allowed to cause the thread executing the task generating
loop to suspend its task at the task scheduling point in the \code{task} directive,
and start executing unassigned tasks. Once the number of unassigned tasks is sufficiently
low, the thread may resume execution of the task generating loop.
\cexample{tasking}{5}
\fexample{tasking}{5}
The following example is the same as the previous one, except that the tasks are
generated in an untied task. While generating the tasks, the implementation may
reach its limit on unassigned tasks. If it does, the implementation is allowed
to cause the thread executing the task generating loop to suspend its task at the
task scheduling point in the \code{task} directive, and start executing unassigned
tasks. If that thread begins execution of a task that takes a long time to complete,
the other threads may complete all the other tasks before it is finished.
In this case, since the loop is in an untied task, any other thread is eligible
to resume the task generating loop. In the previous examples, the other threads
would be forced to idle until the generating thread finishes its long task, since
the task generating loop was in a tied task.
\cexample{tasking}{6}
\fexample{tasking}{6}
The following two examples demonstrate how the scheduling rules illustrated in
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of
\code{threadprivate} variables in tasks. A \code{threadprivate}
variable can be modified by another task that is executed by the same thread. Thus,
the value of a \code{threadprivate} variable cannot be assumed to be unchanged
across a task scheduling point. In untied tasks, task scheduling points may be
added in any place by the implementation.
A task switch may occur at a task scheduling point. A single thread may execute
both of the task regions that modify \code{tp}. The parts of these task regions
in which \code{tp} is modified may be executed in any order so the resulting
value of \code{var} can be either 1 or 2.
\cexample{tasking}{7}
\fexample{tasking}{7}
In this example, scheduling constraints prohibit a thread in the team from executing
a new task that modifies \code{tp} while another such task region tied to the
same thread is suspended. Therefore, the value written will persist across the
task scheduling point.
\cexample{tasking}{8}
\fexample{tasking}{8}
The following two examples demonstrate how the scheduling rules illustrated in
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of locks
and critical sections in tasks. If a lock is held
across a task scheduling point, no attempt should be made to acquire the same lock
in any code that may be interleaved. Otherwise, a deadlock is possible.
In the example below, suppose the thread executing task 1 defers task 2. When
it encounters the task scheduling point at task 3, it could suspend task 1 and
begin task 2, which will result in a deadlock when it tries to enter critical region
1.
\cexample{tasking}{9}
\fexample{tasking}{9}
In the following example, \code{lock} is held across a task scheduling point.
However, according to the scheduling restrictions, the executing thread cannot
begin executing one of the non-descendant tasks that also acquires \code{lock} before
the task region is complete. Therefore, no deadlock is possible.
\cexample{tasking}{10}
\ffreeexample{tasking}{10}
The following examples illustrate the use of the \code{mergeable} clause in the
\code{task} construct. In this first example, the \code{task} construct has
been annotated with the \code{mergeable} clause. The addition of this clause
allows the implementation to reuse the data environment (including the ICVs) of
the parent task for the task inside \code{foo} if the task is included or undeferred.
Thus, the result of the execution may differ depending on whether the task is merged
or not. Therefore, the \code{mergeable} clause needs to be used with caution. In this example,
the use of the \code{mergeable} clause is safe. As \code{x} is a shared variable, the
outcome does not depend on whether or not the task is merged (that is, the task
will always increment the same variable and will always compute the same value
for \code{x}).
\cexample{tasking}{11}
\ffreeexample{tasking}{11}
This second example shows an incorrect use of the \code{mergeable} clause. In
this example, the created task will access different instances of the variable
\code{x} if the task is not merged, as \code{x} is \code{firstprivate}, but
it will access the same variable \code{x} if the task is merged. As a result,
the behavior of the program is unspecified and it can print two different values
for \code{x} depending on the decisions taken by the implementation.
\cexample{tasking}{12}
\ffreeexample{tasking}{12}
The following example shows the use of the \code{final} clause and the \code{omp\_in\_final}
API call in a recursive binary search program. To reduce overhead, once a certain
depth of recursion is reached the program uses the \code{final} clause to create
only included tasks, which allow additional optimizations.
The use of the \code{omp\_in\_final} API call allows programmers to optimize
their code by specifying which parts of the program are not necessary when a task
can create only included tasks (that is, the code is inside a \code{final} task).
In this example, the use of a different state variable is not necessary once
the program reaches the finalized part of the computation, so the copying from
the parent state to the new state is eliminated there. The allocation of \code{new\_state}
on the stack could also be avoided, but it would make this example less clear. The
\code{final} clause is most effective when used in conjunction with the \code{mergeable}
clause since all tasks created in a \code{final} task region are included tasks
that can be merged if the \code{mergeable} clause is present.
\cexample{tasking}{13}
\ffreeexample{tasking}{13}
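The essence of the \code{final}/\code{omp\_in\_final} optimization can be sketched as
follows in C (a condensed, single-branch variant of the pattern; \plc{check\_solution()}
is hypothetical, and the routine is assumed to be called from within a
\code{parallel}/\code{single} region):

\begin{verbatim}
#include <omp.h>
#include <string.h>

#define N      32
#define CUTOFF 4   /* deeper than this, tasks are final and their
                      descendants are included                     */

extern void check_solution(int *state);   /* hypothetical */

void bin_search(int pos, int *state)
{
   if (pos == N) { check_solution(state); return; }

   #pragma omp task final(pos > CUTOFF) mergeable
   {
      int new_state[N];
      if (!omp_in_final()) {   /* deferred task: work on a private copy */
         memcpy(new_state, state, pos * sizeof(int));
         state = new_state;
      }                        /* final region: parent's state is used directly */
      state[pos] = 0;
      bin_search(pos + 1, state);
   }
   #pragma omp taskwait
}
\end{verbatim}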
The following example illustrates the difference between the \code{if} and the
\code{final} clauses. The \code{if} clause has a local effect. In the first
nest of tasks, the one that has the \code{if} clause will be undeferred but
the task nested inside that task will not be affected by the \code{if} clause
and will be created as usual. In contrast, the \code{final} clause affects
all \code{task} constructs in the \code{final} task region but not the \code{final}
task itself. In the second nest of tasks, the nested tasks will be created as included
tasks. Note also that the conditions for the \code{if} and \code{final} clauses
are usually the opposite.
\cexample{tasking}{14}
\ffreeexample{tasking}{14}
View File
@ -1,39 +0,0 @@
\pagebreak
\section{The \code{taskloop} Construct}
\label{sec:taskloop}
The following example illustrates how to execute a long running task concurrently with tasks created
with a \code{taskloop} directive for a loop having unbalanced amounts of work for its iterations.
The \code{grainsize} clause specifies that each task is to execute at least 500 iterations of the loop.
The \code{nogroup} clause removes the implicit taskgroup of the \code{taskloop} construct; the explicit \code{taskgroup} construct in the example ensures that the function is not exited before the long-running task and the loops have finished execution.
\cexample{taskloop}{1}
\ffreeexample{taskloop}{1}
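A minimal C sketch of this pattern is shown below (the two routines are hypothetical,
and the function is assumed to be called from within a \code{parallel}/\code{single} region):

\begin{verbatim}
extern void long_running_task(void);   /* hypothetical */
extern void loop_body(long i);         /* hypothetical, unbalanced work */

void parallel_work(long n)
{
   #pragma omp taskgroup
   {
      #pragma omp task
      long_running_task();    /* runs concurrently with the loop tasks */

      /* each generated task executes at least 500 iterations;
         nogroup removes the taskloop's implicit taskgroup, so the
         enclosing explicit taskgroup is the only synchronization   */
      #pragma omp taskloop grainsize(500) nogroup
      for (long i = 0; i < n; i++)
         loop_body(i);
   }  /* waits for the long-running task and all loop tasks */
}
\end{verbatim}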
%\clearpage
Because a \code{taskloop} construct encloses a loop, it is often incorrectly
perceived as a worksharing construct (when it is directly nested in
a \code{parallel} region).
While a worksharing construct distributes the loop iterations across all threads in a team,
the entire loop of a \code{taskloop} construct is executed by every thread of the team.
In the example below the first taskloop occurs closely nested within
a \code{parallel} region and the entire loop is executed by each of the \plc{T} threads;
hence the reduction sum is executed \plc{T}*\plc{N} times.
The loop of the second taskloop is within a \code{single} region and is executed
by a single thread so that only \plc{N} reduction sums occur. (The other
\plc{N}-1 threads of the \code{parallel} region will participate in executing the
tasks. This is the common use case for the \code{taskloop} construct.)
In the example, the code thus prints \code{x1 = 16384} (\plc{T}*\plc{N}) and
\code{x2 = 1024} (\plc{N}).
\cexample{taskloop}{2}
\ffreeexample{taskloop}{2}
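The difference can be sketched as follows in C (a minimal sketch that uses an
\code{atomic} update instead of the \code{reduction} clause of the actual example,
to keep the code short):

\begin{verbatim}
#include <stdio.h>
#define N 1024

int main(void)
{
   int x1 = 0, x2 = 0;

   #pragma omp parallel shared(x1, x2)
   {
      /* every thread of the team encounters this taskloop and
         generates its own set of loop tasks: T*N increments     */
      #pragma omp taskloop
      for (int i = 0; i < N; i++) {
         #pragma omp atomic
         x1++;
      }

      /* only one thread generates loop tasks here: N increments;
         the other threads participate in executing the tasks     */
      #pragma omp single
      {
         #pragma omp taskloop
         for (int i = 0; i < N; i++) {
            #pragma omp atomic
            x2++;
         }
      }
   }
   printf("x1 = %d  x2 = %d\n", x1, x2);   /* x1 = T*N, x2 = N */
   return 0;
}
\end{verbatim}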
View File
@ -1,14 +0,0 @@
\pagebreak
\section{The \code{taskyield} Construct}
\label{sec:taskyield}
The following example illustrates the use of the \code{taskyield} directive.
The tasks in the example compute something useful and then do some computation
that must be done in a critical region. By using \code{taskyield} when a task
cannot get access to the \code{critical} region the implementation can suspend
the current task and schedule some other task that can do something useful.
\cexample{taskyield}{1}
\ffreeexample{taskyield}{1}
View File
@ -1,124 +0,0 @@
\pagebreak
\section{\code{teams} Constructs}
\label{sec:teams}
\subsection{\code{target} and \code{teams} Constructs with \code{omp\_get\_num\_teams}\\
and \code{omp\_get\_team\_num} Routines}
\label{subsec:teams_api}
The following example shows how the \code{target} and \code{teams} constructs
are used to create a league of thread teams that execute a region. The \code{teams}
construct creates a league of at most two teams where the master thread of each
team executes the \code{teams} region.
The \code{omp\_get\_num\_teams} routine returns the number of teams executing in a \code{teams}
region. The \code{omp\_get\_team\_num} routine returns the team number, which is an integer
between 0 and one less than the value returned by \code{omp\_get\_num\_teams}. The following
example manually distributes a loop across two teams.
\cexample{teams}{1}
\ffreeexample{teams}{1}
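A condensed C sketch of this manual distribution follows (two partial sums are used
so that each team updates only its own variable):

\begin{verbatim}
#include <omp.h>
#define N 1000

float dotprod(const float *b, const float *c)
{
   float sum0 = 0.0f, sum1 = 0.0f;
   #pragma omp target map(to: b[0:N], c[0:N]) map(tofrom: sum0, sum1)
   #pragma omp teams num_teams(2)
   {
      /* omp_get_team_num() returns 0 .. omp_get_num_teams()-1 */
      if (omp_get_team_num() == 0) {
         #pragma omp parallel for reduction(+: sum0)
         for (int i = 0; i < N/2; i++) sum0 += b[i] * c[i];
      } else if (omp_get_team_num() == 1) {
         #pragma omp parallel for reduction(+: sum1)
         for (int i = N/2; i < N; i++) sum1 += b[i] * c[i];
      }
   }
   return sum0 + sum1;   /* each team handled half of the iterations */
}
\end{verbatim}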
\subsection{\code{target}, \code{teams}, and \code{distribute} Constructs}
\label{subsec:teams_distribute}
The following example shows how the \code{target}, \code{teams}, and \code{distribute}
constructs are used to execute a loop nest in a \code{target} region. The \code{teams}
construct creates a league and the master thread of each team executes the \code{teams}
region. The \code{distribute} construct schedules the subsequent loop iterations
across the master threads of each team.
The number of teams in the league is less than or equal to the variable \plc{num\_blocks}.
Each team in the league has a number of threads less than or equal to the variable
\plc{block\_threads}. The iterations in the outer loop are distributed among the master
threads of each team.
When a team's master thread encounters the parallel loop construct before the inner
loop, the other threads in its team are activated. The team executes the \code{parallel}
region and then workshares the execution of the loop.
Each master thread executing the \code{teams} region has a private copy of the
variable \plc{sum} that is created by the \code{reduction} clause on the \code{teams} construct.
The master thread and all threads in its team have a private copy of the variable
\plc{sum} that is created by the \code{reduction} clause on the parallel loop construct.
The second private \plc{sum} is reduced into the master thread's private copy of \plc{sum}
created by the \code{teams} construct. At the end of the \code{teams} region,
each master thread's private copy of \plc{sum} is reduced into the final \plc{sum} that is
implicitly mapped into the \code{target} region.
\cexample{teams}{2}
\ffreeexample{teams}{2}
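The nesting of constructs and reductions can be sketched as follows in C
(a condensed sketch of the blocked dot-product pattern; \plc{num\_blocks} and
\plc{block\_threads} correspond to the variables described above, and
\plc{block\_size} is the outer-loop blocking factor):

\begin{verbatim}
#define N 4096
#define min(x, y) (((x) < (y)) ? (x) : (y))

float dotprod(const float *b, const float *c, int block_size,
              int num_blocks, int block_threads)
{
   float sum = 0.0f;
   #pragma omp target map(to: b[0:N], c[0:N]) map(tofrom: sum)
   #pragma omp teams num_teams(num_blocks) thread_limit(block_threads) \
               reduction(+: sum)
   #pragma omp distribute
   for (int i0 = 0; i0 < N; i0 += block_size) {
      /* each team's master thread receives chunks of the outer loop;
         the inner loop is workshared among the team's threads        */
      #pragma omp parallel for reduction(+: sum)
      for (int i = i0; i < min(i0 + block_size, N); i++)
         sum += b[i] * c[i];
   }
   return sum;
}
\end{verbatim}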
\subsection{\code{target} \code{teams} and Distribute Parallel Loop Constructs}
\label{subsec:teams_distribute_parallel}
The following example shows how the \code{target} \code{teams} and distribute
parallel loop constructs are used to execute a \code{target} region. The \code{target}
\code{teams} construct creates a league of teams where the master thread of each
team executes the \code{teams} region.
The distribute parallel loop construct schedules the loop iterations across the
master threads of each team and then across the threads of each team.
\cexample{teams}{3}
\ffreeexample{teams}{3}
\subsection{\code{target} \code{teams} and Distribute Parallel Loop
Constructs with Scheduling Clauses}
\label{subsec:teams_distribute_parallel_schedule}
The following example shows how the \code{target} \code{teams} and distribute
parallel loop constructs are used to execute a \code{target} region. The \code{teams}
construct creates a league of at most eight teams where the master thread of each
team executes the \code{teams} region. The number of threads in each team is
less than or equal to 16.
The \code{distribute} parallel loop construct schedules the subsequent loop iterations
across the master threads of each team and then across the threads of each team.
The \code{dist\_schedule} clause on the distribute parallel loop construct indicates
that loop iterations are distributed to the master thread of each team in chunks
of 1024 iterations.
The \code{schedule} clause indicates that the 1024 iterations distributed to
a master thread are then assigned to the threads in its associated team in chunks
of 64 iterations.
\cexample{teams}{4}
\ffreeexample{teams}{4}
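A minimal C sketch with these scheduling clauses follows (the array names and sizes
are illustrative only):

\begin{verbatim}
#define N (1 << 20)

void vector_add(float *a, const float *b, const float *c)
{
   /* chunks of 1024 iterations are distributed to each team's master
      thread (dist_schedule); each chunk is then handed out to the
      team's threads in sub-chunks of 64 iterations (schedule)        */
   #pragma omp target teams distribute parallel for \
               num_teams(8) thread_limit(16) \
               dist_schedule(static, 1024) schedule(static, 64) \
               map(tofrom: a[0:N]) map(to: b[0:N], c[0:N])
   for (int i = 0; i < N; i++)
      a[i] = b[i] + c[i];
}
\end{verbatim}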
\subsection{\code{target} \code{teams} and \code{distribute} \code{simd} Constructs}
\label{subsec:teams_distribute_simd}
The following example shows how the \code{target} \code{teams} and \code{distribute}
\code{simd} constructs are used to execute a loop in a \code{target} region.
The \code{target} \code{teams} construct creates a league of teams where the
master thread of each team executes the \code{teams} region.
The \code{distribute} \code{simd} construct schedules the loop iterations across
the master threads of each team and then uses SIMD parallelism to execute the iterations.
\cexample{teams}{5}
\ffreeexample{teams}{5}
\subsection{\code{target} \code{teams} and Distribute Parallel Loop SIMD Constructs}
\label{subsec:teams_distribute_parallel_simd}
The following example shows how the \code{target} \code{teams} and the distribute
parallel loop SIMD constructs are used to execute a loop in a \code{target} \code{teams}
region. The \code{target} \code{teams} construct creates a league of teams
where the master thread of each team executes the \code{teams} region.
The distribute parallel loop SIMD construct schedules the loop iterations across
the master thread of each team and then across the threads of each team where each
thread uses SIMD parallelism.
\cexample{teams}{6}
\ffreeexample{teams}{6}
View File
@ -1,106 +0,0 @@
\pagebreak
\section{The \code{threadprivate} Directive}
\label{sec:threadprivate}
The following examples demonstrate how to use the \code{threadprivate} directive
to give each thread a separate counter.
\cexample{threadprivate}{1}
\fexample{threadprivate}{1}
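A minimal C sketch of a thread-private counter follows (not one of the numbered
examples; the driver in \plc{main()} is illustrative only):

\begin{verbatim}
#include <stdio.h>

static int counter = 0;
#pragma omp threadprivate(counter)

/* each thread increments and returns its own copy of counter */
int increment_counter(void)
{
   counter++;
   return counter;
}

int main(void)
{
   #pragma omp parallel
   {
      int c = 0;
      for (int i = 0; i < 3; i++)
         c = increment_counter();
      printf("thread-local counter reached %d\n", c);   /* 3 on every thread */
   }
   return 0;
}
\end{verbatim}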
\ccppspecificstart
The following example uses \code{threadprivate} on a static variable:
\cnexample{threadprivate}{2}
The following example demonstrates unspecified behavior for the initialization
of a \code{threadprivate} variable. A \code{threadprivate} variable is initialized
once at an unspecified point before its first reference. Because \code{a} is
constructed using the value of \code{x} (which is modified by the statement
\code{x++}), the value of \code{a.val} at the start of the \code{parallel}
region could be either 1 or 2. This problem is avoided for \code{b}, which uses
an auxiliary \code{const} variable and a copy-constructor.
\cppnexample{threadprivate}{3}
\ccppspecificend
The following examples show non-conforming uses and correct uses of the \code{threadprivate}
directive.
\fortranspecificstart
The following example is non-conforming because the common block is not declared
local to the subroutine that refers to it:
\fnexample{threadprivate}{2}
The following example is also non-conforming because the common block is not declared
local to the subroutine that refers to it:
\fnexample{threadprivate}{3}
The following example is a correct rewrite of the previous example:
\fnexample{threadprivate}{4}
The following is an example of the use of \code{threadprivate} for local variables:
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
\fnexample{threadprivate}{5}
The above program, if executed by two threads, will print one of the following
two sets of output:
\code{a = 11 12 13}
\\
\code{ptr = 4}
\\
\code{i = 15}
\code{A is not allocated}
\\
\code{ptr = 4}
\\
\code{i = 5}
or
\code{A is not allocated}
\\
\code{ptr = 4}
\\
\code{i = 15}
\code{a = 1 2 3}
\\
\code{ptr = 4}
\\
\code{i = 5}
The following is an example of the use of \code{threadprivate} for module variables:
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
\fnexample{threadprivate}{6}
\fortranspecificend
\cppspecificstart
The following example illustrates initialization of \code{threadprivate} variables
for class-type \code{T}. \code{t1} is default constructed, \code{t2} is constructed
taking a constructor accepting one argument of integer type, \code{t3} is copy
constructed with argument \code{f()}:
\cppnexample{threadprivate}{4}
The following example illustrates the use of \code{threadprivate} for static
class members. The \code{threadprivate} directive for a static class member must
be placed inside the class definition.
\cppnexample{threadprivate}{5}
\cppspecificend
View File
@ -1,89 +0,0 @@
\subsection{User-Defined Reduction}
\label{subsec:UDR}
The \code{declare}~\code{reduction} directive can be used to specify
user-defined reductions (UDR) for user data types.
%The following examples show how user-defined reductions can be used to support user data types in the \code{reduction} clause.
%The following example computes the enclosing rectangle of a set of points. The point data structure (\code{struct}~\code{point}) is not supported by the \code{reduction} clause. Using two \code{declare}~\code{reduction} directives we define how a reduction for the point data structure is done for the \plc{min} and \plc{max} operations. Each \code{declare}~\code{reduction} directive calls the appropriate function that passes the two special variables that can be used in the user-defined reduction expression: \code{omp\_in}, which holds one of the two values to reduce, and \code{omp\_out}, which holds the other value and should hold also the result of the reduction once the expression has been executed. Note, also, that when defining the user-defined reduction for \plc{min} we specify how the private variables of each thread are to be initialized (that is, the neutral value). This is not the case for \plc{max} as the default values (that is, zero filling) are already adequate.
In the following example, \code{declare}~\code{reduction} directives are used to define
\plc{min} and \plc{max} operations for the \plc{point} data structure for computing
the rectangle that encloses a set of 2-D points.
The two \code{declare}~\code{reduction} directives define new reduction identifiers,
\plc{min} and \plc{max}, to be used in a \code{reduction} clause. The next item in the
declaration list is the data type (\plc{struct} \plc{point}) used in the reduction,
followed by the combiner; here the functions \plc{minproc} and \plc{maxproc} perform
the min and max operations, respectively, on the user data (of type \plc{struct} \plc{point}).
In the function argument list are two special OpenMP variable identifiers, \code{omp\_in} and \code{omp\_out},
that denote the two values to be combined in the ``real'' function;
the \code{omp\_out} identifier indicates which one is to hold the result.
The initializer of the \code{declare}~\code{reduction} directive specifies
the initial value for the private variable of each implicit task.
The \code{omp\_priv} identifier is used to denote the private variable.
\cexample{udr}{1}
The following example shows the corresponding code in Fortran.
The \code{declare}~\code{reduction} directives are specified as part of
the declaration in subroutine \plc{find\_enclosing\_rectangle} and
the procedures that perform the min and max operations are specified as subprograms.
\ffreeexample{udr}{1}
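The essential C syntax of such a declaration can be sketched as follows (a condensed,
\plc{min}-only variant of the pattern; \plc{lower\_left()} is a hypothetical driver):

\begin{verbatim}
#include <limits.h>

struct point { int x, y; };

void minproc(struct point *out, struct point *in)
{
   if (in->x < out->x) out->x = in->x;
   if (in->y < out->y) out->y = in->y;
}

/* reduction identifier 'min' for struct point: the combiner calls
   minproc(&omp_out, &omp_in); the initializer sets each private
   copy to the neutral value                                       */
#pragma omp declare reduction(min : struct point : \
        minproc(&omp_out, &omp_in)) \
        initializer(omp_priv = { INT_MAX, INT_MAX })

void lower_left(struct point *pts, int n, struct point *ll)
{
   struct point m = { INT_MAX, INT_MAX };
   #pragma omp parallel for reduction(min : m)
   for (int i = 0; i < n; i++)
      minproc(&m, &pts[i]);
   *ll = m;
}
\end{verbatim}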
The following example shows the same computation as \plc{udr.1} but it illustrates that you can craft complex expressions in the user-defined reduction declaration. In this case, instead of calling the \plc{minproc} and \plc{maxproc} functions we inline the code in a single expression.
\cexample{udr}{2}
The corresponding code of the same example in Fortran is very similar
except that the assignment expression in the \code{declare}~\code{reduction}
directive can only be used for a single variable, in this case through
a type structure constructor \plc{point($\ldots$)}.
\ffreeexample{udr}{2}
The following example shows the use of special variables in arguments for combiner (\code{omp\_in} and \code{omp\_out}) and initializer (\code{omp\_priv} and \code{omp\_orig}) routines. This example returns the maximum value of an array and the corresponding index value. The \code{declare}~\code{reduction} directive specifies a user-defined reduction operation \plc{maxloc} for data type \plc{struct} \plc{mx\_s}. The function \plc{mx\_combine} is the combiner and the function \plc{mx\_init} is the initializer.
\cexample{udr}{3}
Below is the corresponding Fortran version of the above example. The \code{declare}~\code{reduction} directive specifies the user-defined operation \plc{maxloc} for user-derived type \plc{mx\_s}. The combiner \plc{mx\_combine} and the initializer \plc{mx\_init} are specified as subprograms.
\ffreeexample{udr}{3}
The following example explains a few details of the user-defined reduction
in Fortran through modules. The \code{declare}~\code{reduction} directive is declared in a module (\plc{data\_red}).
The reduction identifier \plc{.add.} is a user-defined operator, defined so that it
is accessible in the scope that performs the reduction
operation.
The user-defined operator \plc{.add.} and the subroutine \plc{dt\_init} specified in the \code{initializer} clause are defined in the same subprogram.
The reduction operation (that is, the \code{reduction} clause) is in the main program.
The reduction identifier \plc{.add.} is accessible by use association.
Since \plc{.add.} is a user-defined operator, the explicit interface
should also be accessible by use association in the current
program unit.
Since the \code{declare}~\code{reduction} directive associated with this \code{reduction} clause
has the \code{initializer} clause, the subroutine specified in the clause
must be accessible in the current scoping unit. In this case,
the subroutine \plc{dt\_init} is accessible by use association.
\ffreeexample{udr}{4}
The following example uses user-defined reductions to declare a plus (+) reduction for a C++ class. As the \code{declare}~\code{reduction} directive is inside the context of the \plc{V} class, the expressions in the \code{declare}~\code{reduction} directive are resolved in the context of the class. Also, note that the \code{initializer} clause uses a copy constructor to initialize the private variables of the reduction, and that it passes the original variable to it as a parameter by using the special variable \code{omp\_orig}.
\cppexample{udr}{5}
The following example shows how user-defined reductions can be defined for some STL containers. The first \code{declare}~\code{reduction} defines the plus (+) operation for \plc{std::vector<int>} by making use of the \plc{std::transform} algorithm. The second and third define the merge (or concatenation) operation for \plc{std::vector<int>} and \plc{std::list<int>}.
%It shows how the same user-defined reduction operation can be defined to be done differently depending on the specified data type.
It shows how a user-defined reduction operation can be applied to specific STL data types.
\cppexample{udr}{6}
View File
@ -1,77 +0,0 @@
\pagebreak
\section{\code{declare}~\code{variant} Directive}
\label{sec:declare_variant}
%A \code{declare variant} directive specifies that the following function is an alternate function,
%a \plc{function variant}, to be used in place of the specified \plc{base function}
%when the trait within the \code{match} clause has a valid context.
A \code{declare}~\code{variant} directive specifies an alternate function,
\plc{function variant}, to be used in place of the \plc{base function}
%when the trait within the \code{match} clause has a valid context.
when the trait within the \code{match} clause matches the OpenMP context at a given call site.
The base function follows the directive in the C and C++ languages.
In Fortran, either a subroutine or function may be used as the \plc{base function},
and the \code{declare}~\code{variant} directive must be in the specification
part of a subroutine or function (unless a \plc{base-proc-name}
modifier is used, as in the case of a procedure declaration statement). See
the OpenMP 5.0 Specification for details on the modifier.
When multiple \code{declare}~\code{variant} directives are used,
a function variant becomes a candidate for replacing the base function if the
%base function call context matches the traits of all selectors in the \code{match} clause.
context at the base function call matches the traits of all selectors in the \code{match} clause.
If there are multiple candidates, a score is assigned with rules for each
of the selector traits. The scoring algorithm can be found in the OpenMP 5.0 Specification.
In the first example the \plc{vxv()} function is called within a \code{parallel} region,
a \code{target} region, and in a sequential part of the program. Two function variants, \plc{p\_vxv()} and \plc{t\_vxv()},
are defined for the first two regions by using \plc{parallel} and \plc{target} selectors (within
the \plc{construct} trait set) in a \code{match} clause. The \plc{p\_vxv()} function variant includes
a \code{for} construct (\code{do} construct for Fortran) for the \code{parallel} region,
while \plc{t\_vxv()} includes a \code{distribute}~\code{simd} construct for the \code{target} region.
The \plc{t\_vxv()} function is explicitly compiled for the device using a \code{declare}~\code{target} directive.
Since the two \code{declare}~\code{variant} directives have no selectors that match traits for the context
of the base function call in the sequential part of the program, the base \plc{vxv()} function is used there,
as expected.
(The vectors in the \plc{p\_vxv} and \plc{t\_vxv} functions have been multiplied
by 3 and 2, respectively, for checking the validity of the replacement. Normally
the purpose of a function variant is to produce the same results by a different method.)
%Note: a \code{target teams} construct is used to direct execution onto a device, with a
%\code{distribute simd} construct in the function variant. As of the OpenMP 5.0 implementation
%no intervening code is allowed between a \code{target} and \code{teams} construct. So
%using a \code{target} construct to direct execution onto a device, and including
%\code{teams distribute simd} in the variant function would produce non conforming code.
%\pagebreak
\cexample{declare_variant}{1}
\ffreeexample{declare_variant}{1}
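The mechanics of variant selection can be sketched as follows in C (a condensed,
\code{parallel}-only variant of the pattern; the function variant must be declared
before the \code{declare}~\code{variant} directive that names it):

\begin{verbatim}
#define N 1024

void p_vxv(int *v1, int *v2, int *v3, int n);    /* variant prototype */

#pragma omp declare variant(p_vxv) match(construct={parallel})
void vxv(int *v1, int *v2, int *v3, int n)       /* base function */
{
   for (int i = 0; i < n; i++) v3[i] = v1[i] * v2[i];
}

void p_vxv(int *v1, int *v2, int *v3, int n)     /* used in parallel regions */
{
   #pragma omp for
   for (int i = 0; i < n; i++) v3[i] = v1[i] * v2[i];
}

void compute(int *v1, int *v2, int *v3)
{
   vxv(v1, v2, v3, N);        /* sequential call site: base vxv() is used  */

   #pragma omp parallel
   vxv(v1, v2, v3, N);        /* parallel context: variant p_vxv() is used */
}
\end{verbatim}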
%\pagebreak
In this example, traits from the \plc{device} set are used to select a function variant.
In the \code{declare}~\code{variant} directive, an \plc{isa} selector
specifies that if the implementation of the ``\plc{core-avx512}''
instruction set is detected at compile time, the \plc{avx512\_saxpy()}
variant function is used for the call to \plc{base\_saxpy()}.
A compilation of \plc{avx512\_saxpy()} is aware of
the AVX-512 instruction set that supports 512-bit vector extensions (for Xeon or Xeon Phi architectures).
Within \plc{avx512\_saxpy()}, the \code{parallel}~\code{for}~\code{simd} construct performs parallel execution, and
takes advantage of 64-byte data alignment.
When the \plc{avx512\_saxpy()} function variant is not selected, the base function \plc{base\_saxpy()},
which contains only a basic \code{parallel}~\code{for} construct, is used for the call.
%Note:
%An allocator is used to set the alignment to 64 bytes when an OpenMP compilation is performed.
%Details about allocator variable declarations and functions
%can be found in the allocator example of the Memory Management Chapter.
%\pagebreak
\cexample{declare_variant}{2}
\ffreeexample{declare_variant}{2}
View File
@ -1,76 +0,0 @@
\pagebreak
\section{The \code{workshare} Construct}
\fortranspecificstart
\label{sec:workshare}
The following are examples of the \code{workshare} construct.
In the following example, \code{workshare} spreads work across the threads executing
the \code{parallel} region, and there is a barrier after the last statement.
Implementations must enforce Fortran execution rules inside of the \code{workshare}
block.
\fnexample{workshare}{1}
In the following example, the barrier at the end of the first \code{workshare}
region is eliminated with a \code{nowait} clause. Threads doing \code{CC =
DD} immediately begin work on \code{EE = FF} when they are done with \code{CC
= DD}.
\fnexample{workshare}{2}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
The following example shows the use of an \code{atomic} directive inside a \code{workshare}
construct. The computation of \code{SUM(AA)} is workshared, but the update to
\code{R} is atomic.
\fnexample{workshare}{3}
Fortran \code{WHERE} and \code{FORALL} statements are \emph{compound statements},
made up of a \emph{control} part and a \emph{statement} part. When \code{workshare}
is applied to one of these compound statements, both the control and the statement
parts are workshared. The following example shows the use of a \code{WHERE} statement
in a \code{workshare} construct.
Each task gets worked on in order by the threads:
\code{AA = BB} then
\\
\code{CC = DD} then
\\
\code{EE .ne. 0} then
\\
\code{FF = 1 / EE} then
\\
\code{GG = HH}
\fnexample{workshare}{4}
% blue line floater at top of this page for "Fortran, cont."
\begin{figure}[t!]
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
\end{figure}
In the following example, an assignment to a shared scalar variable is performed
by one thread in a \code{workshare} while all other threads in the team wait.
\fnexample{workshare}{5}
The following example contains an assignment to a private scalar variable, which
is performed by one thread in a \code{workshare} while all other threads wait.
It is non-conforming because the private scalar variable is undefined after the
assignment statement.
\fnexample{workshare}{6}
Fortran execution rules must be enforced inside a \code{workshare} construct.
In the following example, the same result is produced in the following program
fragment regardless of whether the code is executed sequentially or inside an OpenMP
program with multiple threads:
\fnexample{workshare}{7}
\fortranspecificend
View File
@ -1,18 +0,0 @@
\pagebreak
\section{Worksharing Constructs Inside a \code{critical} Construct}
\label{sec:worksharing_critical}
The following example demonstrates using a worksharing construct inside a \code{critical}
construct. This example is conforming because the worksharing \code{single}
region is not closely nested inside the \code{critical} region. A single thread
executes the one and only section in the \code{sections} region, and executes
the \code{critical} region. The same thread encounters the nested \code{parallel}
region, creates a new team of threads, and becomes the master of the new team.
One of the threads in the new team enters the \code{single} region and increments
\code{i} by \code{1}. At the end of this example \code{i} is equal to \code{2}.
\cexample{worksharing_critical}{1}
\fexample{worksharing_critical}{1}
View File
@ -1,23 +1,48 @@
\pagebreak
\chapter*{Foreword}
\label{chap:foreword}
\addcontentsline{toc}{chapter}{\protect\numberline{}Foreword}
The OpenMP Examples document has been updated with new features
found in the OpenMP 5.0 Specification. The additional examples and updates
are referenced in the Document Revision History of the Appendix, \specref{sec:history_45_to_50}.
found in the OpenMP \SVER\ Specification.
In order to provide users with new feature examples concurrently
with the release of the OpenMP 6.0 Specification,
the 6.0 Examples document is being released early
with the caveat that some of the 6.0 features
(such as \kcode{workdistribute} construct, \kcode{taskgraph} construct,
\kcode{threadset} clause and free-agent threads) will be covered
in the next release of the document.
For a list of the new examples and updates in this release,
please refer to the Document Revision History of the Appendix on page~\pageref{chap:history}.
Text describing an example with a 5.0 feature specifically states
that the feature support begins in the OpenMP 5.0 Specification. Also,
an \plc{omp\_5.0} keyword has been added to metadata in the source code.
These distinctions are presented to remind readers that a 5.0 compliant
Text describing an example with a \SVER\ feature specifically states
that the feature support begins in the OpenMP \SVER\ Specification. Also,
an \kcode{\small{}omp_\SVER} keyword is included in the metadata of the source code.
These distinctions are presented to remind readers that a \SVER\ compliant
OpenMP implementation is necessary to use these features in codes.
Examples for most of the 5.0 features are included in this document,
and incremental releases will become available as more feature examples
and updates are submitted, and approved by the OpenMP Examples Subcommittee.
%Examples for most of the \SVER\ features are included in this document,
%and
Incremental releases will become available as more feature examples
and updates are submitted and approved by the OpenMP Examples Subcommittee.
Examples are accepted for this document after discussions, revisions and reviews
in the Examples Subcommittee, and two reviews/discussions and two votes
in the OpenMP Language Committee.
Draft examples are often derived from case studies for new features in the language,
and are revised to illustrate the basic application of the features with code comments,
and a text description. We are grateful to the numerous members of the Language Committee
who took the time to prepare codes and descriptions, and shepherd them through
the acceptance process. We sincerely appreciate the Examples Subcommittee members, who
actively participated and contributed in weekly meetings over the years.
\bigskip
Examples Subcommitee Co-chairs: \smallskip\linebreak
Examples Subcommittee Co-chairs: \smallskip\linebreak
Henry Jin (\textsc{NASA} Ames Research Center) \linebreak
Kent Milfeld (\textsc{TACC}, Texas Advanced Research Center)
Swaroop Pophale (Oak Ridge National Laboratory)
\bigskip
\bigskip
Past Examples Subcommittee Co-chairs:
\begin{itemize}
\item Kent Milfeld (2014 - 2022)
\end{itemize}
View File
@ -1,6 +1,377 @@
\chapter{Document Revision History}
\cchapter{Document Revision History}{history}
\label{chap:history}
%=====================================
\section{Changes from 5.2.2 to 6.0}
\label{sec:history_522_to_60}
\begin{itemize}
\item General changes:
\begin{itemize}
\item Added a set of structured LaTeX environments for specifying
language-dependent text. This allows extracting language-specific
content of the Examples document. Refer to the content of
\examplesblob{v6.0/Contributions.md} for details.
\end{itemize}
\item Added the following examples for the 6.0 features:
\begin{itemize}
\item \kcode{omp::decl} attribute for declarative directives in C/C++
(\specref{sec:attributes})
\item \kcode{transparent} clause on the \kcode{task} construct to enable dependences
between non-sibling tasks (\specref{subsec:depend_trans_task})
\item Task dependences for \kcode{taskloop} construct
(\specref{sec:taskloop_depend})
\item \kcode{num_threads} clause that appears inside \kcode{target} region
(\specref{subsec:target_teams_num_teams})
\item \kcode{nowait} clause with argument on the \kcode{target} construct to control deferment
of target task (\specref{subsec:async_target_nowait_arg})
\item Traits for specifying devices (\specref{sec:device_env_traits})
\item \kcode{apply} clause with modifier argument to
support selective loop transformations
(\specref{sec:apply_clause})
\item Reduction on private variables in a \kcode{parallel} region
(\specref{subsec:priv_reduction})
\item \kcode{induction} clause (\specref{subsec:induction})
and user-defined induction (\specref{subsec:user-defined-induction})
\item \kcode{init_complete} clause for \kcode{scan} directive to
support initialization phase in scan operation
(\specref{sec:scan})
\item \kcode{assume} construct with \kcode{no_openmp} and \kcode{no_parallelism} clauses (\specref{sec:assumption})
\item \kcode{num_threads} clause with a list
(\specref{subsec:icv_nthreads})
\item \kcode{dispatch} construct to control variant substitution
for a procedure call (\specref{sec:dispatch})
\end{itemize}
\item Other changes:
\begin{itemize}
\item Changed attribute specifier as a directive form from C++ only to C/C++
(\specref{chap:directive_syntax})
\item Added missing \bcode{include <omp.h>} in Example \example{atomic.4.c}
and \bcode{use omp_lib} in Example \example{atomic.4.f90}
(\specref{sec:atomic_hint})
\item Fixed the function declaration order for variant functions in
Examples \example{selector_scoring.[12].c} and Fortran pointer
initialization in Example \example{selector_scoring.2.f90}
(\specref{subsec:context_selector_scoring})
\item Replaced the deprecated use of \plc{combiner-exp}
in \kcode{declare reduction} directive with \kcode{combiner} clause
(\specref{subsec:UDR} and \specref{sec:Updated Examples})
\item Fixed the initialization of Fortran pointers
in Example \example{cancellation.2.f90} and changed to
use \kcode{atomic write} for performing atomic writes
(\specref{sec:cancellation})
\item Added missing \kcode{declare target} directive for external procedure
called inside \kcode{target} region in Example
\example{requires.1.f90} (\specref{sec:requires})
\end{itemize}
\end{itemize}
%=====================================
\section{Changes from 5.2.1 to 5.2.2}
\label{sec:history_521_to_522}
\begin{itemize}
\item To improve the style of the document, a set of macros was introduced
and consistently used for language keywords, names, concepts, and user codes
in the text description of the document. Refer to the content of
\examplesblob{v5.2.2/Contributions.md}
for details.
\item Added the following examples:
\begin{itemize}
\item Orphaned and nested \kcode{loop} constructs (\specref{sec:loop})
\item \kcode{all} variable category for the \kcode{defaultmap} clause
(\specref{sec:defaultmap})
\item \kcode{target update} construct using a custom mapper
(\specref{subsec:target_update_mapper})
\item \kcode{indirect} clause for indirect procedure calls in a
\kcode{target} region (\specref{subsec:indirect})
\item \kcode{omp_target_memcpy_async} routine with depend object
(\specref{subsec:target_mem_and_device_ptrs})
\item Synchronization hint for atomic operation (\specref{sec:atomic_hint})
\item Implication of passing shared variable to a procedure
in Fortran (\specref{sec:fort_shared_var})
\item Assumption directives for providing additional information
about program properties (\specref{sec:assumption})
\item Mapping behavior of scalars, pointers, references (C++) and associate names
(Fortran) when unified shared memory is required
(\specref{sec:requires})
\item \kcode{begin declare variant} paired with \kcode{end declare variant}
example to show use of nested declare variant
directives (\specref{subsec:declare_variant})
\item Explicit scoring in context selectors
(\specref{subsec:context_selector_scoring})
\end{itemize}
\item Miscellaneous changes:
\begin{itemize}
\item Included a general statement in Introduction about the number of
threads used throughout the examples document (\specref{sec:examples})
\item Clarified the mapping of virtual functions in \kcode{target} regions
(\specref{sec:virtual_functions})
\item Added missing \kcode{declare target} directive for procedures
called inside \kcode{target} region in \example{Examples}
\example{declare_mapper.1.f90} (\specref{sec:declare_mapper}),
\example{target_reduction.*.f90} (\specref{subsec:target_reduction}),
and \example{target_task_reduction.*.f90}
(\specref{subsec:target_task_reduction})
\item Added missing \kcode{end target} directive in
\example{Example declare_mapper.3.f90}
(\specref{sec:declare_mapper})
\item Removed example for \kcode{flush} without a list from Synchronization
since the example is confusing and the use of \kcode{flush} is already
covered in other examples
(\specref{chap:synchronization})
\item \docref{declare variant Directive} and \docref{Metadirective} sections were moved to
subsections in the new \docref{Context-based Variant Selection} section,
with a section introduction on context selectors.
(\specref{sec:context_based_variants})
\item Fixed a typo (`\kcode{for}' $\rightarrow$ `\kcode{do}') in
\example{Example metadirective.4.f90}
(\specref{subsec:metadirective})
\end{itemize}
\end{itemize}
%=====================================
\section{Changes from 5.2 to 5.2.1}
\label{sec:history_52_to_521}
\begin{itemize}
\item General changes:
\begin{itemize}
\item Updated source metadata tags for all examples to use an improved form
(see \examplesblob{v5.2.1/Contributions.md})
\item Explicitly included the version tag \verlabel[pre\_]{3.0} in those
examples that did not contain a version tag previously
\end{itemize}
\item Added the following examples for the 5.2 features:
\begin{itemize}
\item \kcode{uses_allocators} clause for the use of allocators in
\kcode{target} regions (\specref{sec:allocators})
\end{itemize}
\item Added the following examples for the 5.1 features:
\begin{itemize}
\item The \kcode{inoutset} dependence type (\specref{subsec:task_concurrent_depend})
\item Atomic compare and capture (\specref{sec:cas})
\end{itemize}
\item Added the following examples for the 5.0 features:
\begin{itemize}
\item \kcode{declare target} directive with \kcode{device_type(nohost)}
clause (\specref{subsec:declare_target_device_type})
\item \kcode{omp_pause_resource} and \kcode{omp_pause_resource_all}
routines (\specref{sec:pause_resource})
\end{itemize}
\item Miscellaneous fixes:
\begin{itemize}
\item Cast to implementation-defined enum type \kcode{omp_event_handle_t}
now uses \bcode{uintptr_t} (not \bcode{void *}) in
\example{Example task_detach.2.c}
(\specref{sec:task_detachment})
\item Moved Fortran \kcode{requires} directive into program main (\ucode{rev_off}),
the program unit, in \example{Example target_reverse_offload.7.f90}
(\specref{subsec:target_reverse_offload})
\item Fixed an inconsistent use of mapper in \example{Example target_mapper.3.f90}
(\specref{sec:declare_mapper})
\item Added a missing semicolon at end of \ucode{XOR1} class definition in
\example{Example declare_target.2a.cpp}
(\specref{subsec:declare_target_class})
\item Fixed the placement of \kcode{declare simd} directive in
\example{Examples linear_modifier.*.f90} (\specref{sec:linear_modifier})
and added a general statement about where a Fortran declarative
directive can appear (\specref{chap:directive_syntax})
\item Fixed mismatched argument list in \example{Example fort_sa_private.5.f}
(\specref{sec:fort_sa_private})
\item Moved the placement of \kcode{declare target enter}
directive after function declaration
(\specref{subsec:target_task_reduction})
\item Fixed an incorrect use of \kcode{omp_in_parallel} routine in
\example{Example metadirective.4}
(\specref{subsec:metadirective})
\item Fixed an incorrect value for \kcode{at} clause
(\specref{subsec:error})
\end{itemize}
\end{itemize}
%=====================================
\section{Changes from 5.1 to 5.2}
\label{sec:history_51_to_52}
\begin{itemize}
\item General changes:
\begin{itemize}
\item Included a description of the semantics for OpenMP directive syntax
(see \specref{chap:directive_syntax})
\item Reorganized the Introduction Chapter and moved the Feature
Deprecation Chapter to Appendix~\ref{chap:deprecated_features}
\item Included a list of examples that were updated for feature deprecation
and replacement in each version (see Appendix~\ref{sec:Updated Examples})
\item Added Index entries
\end{itemize}
\item Updated the examples for feature deprecation and replacement in OpenMP 5.2.
See Table~\ref{tab:Deprecated Features} and
Table~\ref{tab:Updated Examples 5.2} for details.
\item Added the following examples for the 5.2 features:
\begin{itemize}
\item Mapping class objects with virtual functions
(\specref{sec:virtual_functions})
\item \kcode{allocators} construct for Fortran \bcode{allocate} statement
(\specref{sec:allocators})
\item Behavior of reallocation of variables through OpenMP allocator in
Fortran (\specref{sec:allocators})
\end{itemize}
\item Added the following examples for the 5.1 features:
\begin{itemize}
\item Clarification of optional \kcode{end} directive for strictly structured
block in Fortran (\specref{sec:fortran_free_format_comments})
\item \kcode{filter} clause on \kcode{masked} construct (\specref{sec:masked})
\item \kcode{omp_all_memory} reserved locator for specifying task dependences
(\specref{subsec:depend_undefer_task})
\item Behavior of Fortran allocatable variables in \kcode{target} regions
(\specref{sec:fort_allocatable_array_mapping})
\item Device memory routines in Fortran
(\specref{subsec:target_mem_and_device_ptrs})
\item Partial tiles from \kcode{tile} construct
(\specref{sec:incomplete_tiles})
\item Fortran associate names and selectors in \kcode{target} region
(\specref{sec:associate_target})
\item \kcode{allocate} directive for variable declarations and
\kcode{allocate} clause on \kcode{task} constructs
(\specref{sec:allocators})
\item Controlling concurrency and reproducibility with \kcode{order} clause
(\specref{sec:reproducible_modifier})
\end{itemize}
\item Added other examples:
\begin{itemize}
\item Using lambda expressions with \kcode{target} constructs
(\specref{sec:lambda_expressions})
\item Target memory and device pointer routines
(\specref{subsec:target_mem_and_device_ptrs})
\item Examples to illustrate the ordering properties of
the \plc{flush} operation (\specref{sec:mem_model})
\item User selector in the \kcode{metadirective} directive
(\specref{subsec:metadirective})
\end{itemize}
\end{itemize}
%=====================================
\section{Changes from 5.0.1 to 5.1}
\label{sec:history_501_to_51}
\begin{itemize}
\item General changes:
\begin{itemize}
\item Replaced \kcode{master} construct example with equivalent \kcode{masked} construct example (\specref{sec:masked})
\item Primary thread is now used to describe thread number 0 in the current team
\item \kcode{primary} thread affinity policy is now used to specify that every
thread in the team is assigned to the same place as the primary thread (\specref{subsec:affinity_primary})
\item The \kcode{omp_lock_hint_*} constants have been renamed \kcode{omp_sync_hint_*} (\specref{sec:critical}, \specref{sec:locks})
\end{itemize}
\item Added the following new chapters:
\begin{itemize}
\item Deprecated Features (on page~\pageref{chap:deprecated_features})
\item Directive Syntax (\specref{chap:directive_syntax})
\item Loop Transformations (\specref{chap:loop_transformations})
\item OMPT Interface (\specref{chap:ompt_interface})
\end{itemize}
\item Added the following examples for the 5.1 features:
\begin{itemize}
\item OpenMP directives in C++ \plc{attribute} specifiers
(\specref{sec:attributes})
\item Directive syntax adjustment to allow Fortran \bcode{BLOCK} ...
\bcode{END BLOCK} as a structured block
(\specref{sec:fortran_free_format_comments})
\item \kcode{omp_target_is_accessible} API routine
(\specref{sec:pointer_mapping})
\item Fortran allocatable array mapping in \kcode{target} regions (\specref{sec:fort_allocatable_array_mapping})
\item \kcode{begin declare target} (with
\kcode{end declare target}) directive
(\specref{subsec:declare_target_class})
\item \kcode{tile} construct (\specref{sec:tile})
\item \kcode{unroll} construct (\specref{sec:unroll})
\item Reduction with the \kcode{scope} construct
(\specref{subsec:reduction_scope})
\item \kcode{metadirective} directive with dynamic \kcode{condition} selector
(\specref{subsec:metadirective})
\item \kcode{interop} construct (\specref{sec:interop})
\item Environment display with the \kcode{omp_display_env} routine
(\specref{subsec:display_env})
\item \kcode{error} directive (\specref{subsec:error})
\end{itemize}
\item Included additional examples for the 5.0 features:
\begin{itemize}
\item \kcode{collapse} clause for non-rectangular loop nest
(\specref{sec:collapse})
\item \kcode{detach} clause for tasks (\specref{sec:task_detachment})
\item Pointer attachment for a structure member (\specref{sec:structure_mapping})
\item Host and device pointer association with the \kcode{omp_target_associate_ptr} routine (\specref{sec:target_associate_ptr})
\item Sample code on activating the tool interface
(\specref{sec:ompt_start})
\end{itemize}
\item Added other examples:
\begin{itemize}
\item The \kcode{omp_get_wtime} routine (\specref{subsec:get_wtime})
\end{itemize}
\end{itemize}
%=====================================
\section{Changes from 5.0.0 to 5.0.1}
\label{sec:history_50_to_501}
\begin{itemize}
\item Added version tags \verlabel{\plc{x.y}} in example labels
and the corresponding source code for all examples that use
features of OpenMP 3.0 or later.
\item Included additional examples for the 5.0 features:
\begin{itemize}
\item Extension to the \kcode{defaultmap} clause
(\specref{sec:defaultmap})
\item Transferring noncontiguous data with the \kcode{target update} directive in Fortran (\specref{sec:array-shaping})
\item \kcode{conditional} modifier for the \kcode{lastprivate} clause (\specref{sec:lastprivate})
\item \kcode{task} modifier for the \kcode{reduction} clause (\specref{subsec:task_reduction})
\item Reduction on combined target constructs (\specref{subsec:target_reduction})
\item Task reduction with \kcode{target} constructs
(\specref{subsec:target_task_reduction})
\item \kcode{scan} directive for returning the \emph{prefix sum} of a reduction (\specref{sec:scan})
\end{itemize}
\item Included additional examples for the 4.x features:
\begin{itemize}
\item Dependence for undeferred tasks
(\specref{subsec:depend_undefer_task})
\item \kcode{ref}, \kcode{val}, \kcode{uval} modifiers for \kcode{linear} clause (\specref{sec:linear_modifier})
\end{itemize}
\item Clarified the description of pointer mapping and pointer attachment in
\specref{sec:pointer_mapping}.
\item Clarified the description of memory model examples
in \specref{sec:mem_model}.
\end{itemize}
\section{Changes from 4.5.0 to 5.0.0}
\label{sec:history_45_to_50}
@@ -8,40 +379,48 @@
\item Added the following examples for the 5.0 features:
\begin{itemize}
\item Extended \kcode{teams} construct for host execution (\specref{sec:host_teams})
\item \kcode{loop} and \kcode{teams loop} constructs specify loop iterations that can execute concurrently
(\specref{sec:loop})
\item Task data affinity is indicated by \kcode{affinity} clause of \kcode{task} construct
(\specref{sec: task_affinity})
\item Display thread affinity with \kcode{OMP_DISPLAY_AFFINITY} environment variable or \kcode{omp_display_affinity()} API routine
(\specref{sec:affinity_display})
\item \kcode{taskwait} with dependences (\specref{subsec:taskwait_depend})
\item \kcode{mutexinoutset} task dependences (\specref{subsec:task_dep_mutexinoutset})
\item Multidependence Iterators (in \kcode{depend} clauses) (\specref{subsec:depend_iterator})
\item Combined constructs: \kcode{parallel master taskloop} and \kcode{parallel master taskloop simd}
(\specref{sec:parallel_masked_taskloop})
\item Reverse Offload through \kcode{ancestor} modifier of \kcode{device} clause. (\specref{subsec:target_reverse_offload})
\item Pointer Mapping - behavior of mapped pointers (\specref{sec:pointer_mapping}) %Example_target_ptr_map*
\item Structure Mapping - behavior of mapped structures (\specref{sec:structure_mapping}) %Examples_target_structure_mapping.tex target_struct_map*
\item Array Shaping with the \plc{shape-operator} (\specref{sec:array-shaping})
\item The \kcode{declare mapper} directive (\specref{sec:declare_mapper})
\item Acquire and Release Semantics Synchronization: Memory ordering
clauses \kcode{acquire}, \kcode{release}, and \kcode{acq_rel} were added
to flush and atomic constructs
(\specref{sec:acquire_and_release_semantics})
\item \kcode{depobj} construct provides dependence objects for subsequent use in \kcode{depend} clauses
(\specref{sec:depobj})
\item \kcode{reduction} clause for \kcode{task} construct (\specref{subsec:task_reduction})
\item \kcode{reduction} clause for \kcode{taskloop} construct (\specref{subsec:taskloop_reduction})
\item \kcode{reduction} clause for \kcode{taskloop simd} construct (\specref{subsec:taskloop_reduction})
\item Memory Allocators for making OpenMP memory requests with traits (\specref{sec:allocators})
\item \kcode{requires} directive specifies required features of implementation (\specref{sec:requires})
\item \kcode{declare variant} directive - for function variants
(\specref{subsec:declare_variant})
\item \kcode{metadirective} directive - for directive variants
(\specref{subsec:metadirective})
\item \kcode{OMP_TARGET_OFFLOAD} Environment Variable - controls offload behavior (\specref{sec:target_offload})
\end{itemize}
\item Included the following additional examples for the 4.x features:
\begin{itemize}
\item more taskloop examples (\specref{sec:taskloop})
\item user-defined reduction (UDR) (\specref{subsec:UDR})
%NEW 5.0
%\item \code{target} \code{enter} and \code{exit} \code{data} unstructured data constructs (\specref{sec:target_enter_exit_data}) %Example_target_unstructured_data.* ?
\end{itemize}
\end{itemize}
@@ -49,22 +428,22 @@
\begin{itemize}
\item Reorganized into chapters of major topics
\item Included file extensions in example labels to indicate source type
\item Applied the explicit \kcode{map(tofrom)} for scalar variables
in a number of examples to comply with
the change of the default behavior for scalar variables from
\kcode{map(tofrom)} to \kcode{firstprivate} in the 4.5 specification
\item Added the following new examples:
\begin{itemize}
\item \kcode{linear} clause in loop constructs (\specref{sec:linear_in_loop})
\item \kcode{priority} clause for \kcode{task} construct (\specref{sec:task_priority})
\item \kcode{taskloop} construct (\specref{sec:taskloop})
\item \plc{directive-name} modifier in multiple \kcode{if} clauses on
a combined construct (\specref{subsec:target_if})
\item unstructured data mapping (\specref{sec:target_enter_exit_data})
\item \kcode{link} clause for \kcode{declare target} directive
(\specref{subsec:declare_target_link})
\item asynchronous target execution with \kcode{nowait} clause (\specref{sec:async_target_exec_depend})
\item device memory routines and device pointers (\specref{subsec:target_mem_and_device_ptrs})
\item doacross loop nest (\specref{sec:doacross})
\item locks with hints (\specref{sec:locks})
@@ -87,8 +466,8 @@ a combined construct (\specref{subsec:target_if})
Added the following new examples:
\begin{itemize}
\item the \kcode{proc_bind} clause (\specref{sec:affinity})
\item the \kcode{taskgroup} construct (\specref{sec:taskgroup})
\end{itemize}
\section{Changes from 3.1 to 4.0}
@@ -100,13 +479,13 @@ Added the following new examples:
\begin{itemize}
\item task dependences (\specref{sec:task_depend})
\item \kcode{target} construct (\specref{sec:target})
\item array sections in device constructs (\specref{sec:array_sections})
\item \kcode{target data} construct (\specref{sec:target_data})
\item \kcode{target update} construct (\specref{sec:target_update})
\item \kcode{declare target} directive (\specref{sec:declare_target})
\item \kcode{teams} constructs (\specref{sec:teams})
\item asynchronous execution of a \kcode{target} region using tasks (\specref{subsec:async_target_with_tasks})
\item device runtime routines (\specref{sec:device})
\item Fortran ASSOCIATE construct (\specref{sec:associate})
\item cancellation constructs (\specref{sec:cancellation})

Makefile

@@ -1,23 +1,41 @@
# Makefile for the OpenMP Examples document in LaTeX format.
# For more information, see the main document, openmp-examples.tex.
SHELL=bash
include versioninfo
default: openmp-examples.pdf
diff: clean openmp-diff-abridged.pdf
release: VERSIONSTR="$(version_date)"
release: clean openmp-examples.pdf
book: BOOK_BUILD="\\def\\bookbuild{1}"
book: clean release
mv openmp-examples-${version}.pdf openmp-examples-${version}-book.pdf
ccpp-only: LANG_OPT="\\ccpptrue\\fortranfalse"
ccpp-only: clean release
fortran-only: LANG_OPT="\\ccppfalse\\fortrantrue"
fortran-only: clean release
CHAPTERS=Title_Page.tex \
Foreword_Chapt.tex \
Introduction_Chapt.tex \
Chap_*.tex \
Deprecated_Features.tex \
History.tex \
*/*.tex
SOURCES=*/sources/*.c \
*/sources/*.cpp \
*/sources/*.f90 \
*/sources/*.f
INTERMEDIATE_FILES=openmp-examples.pdf \
openmp-examples.toc \
openmp-examples.lof \
openmp-examples.lot \
openmp-examples.idx \
openmp-examples.aux \
openmp-examples.ilg \
@@ -25,13 +43,90 @@ INTERMEDIATE_FILES=openmp-examples.pdf \
openmp-examples.out \
openmp-examples.log
LATEXCMD=pdflatex -interaction=batchmode -file-line-error
LATEXDCMD=$(LATEXCMD) -draftmode
# check for branch names with "name_XXX"
DIFF_TICKET_ID=$(shell git rev-parse --abbrev-ref HEAD)
GITREV=$(shell git rev-parse --short HEAD || echo "??")
VERSIONSTR="GIT rev $(GITREV)"
LANG_OPT="\\ccpptrue\\fortrantrue"
openmp-examples.pdf: $(CHAPTERS) $(SOURCES) openmp.sty openmp-examples.tex openmp-logo.png generated-include.tex
rm -f $(INTERMEDIATE_FILES)
touch generated-include.tex
$(LATEXDCMD) openmp-examples.tex
makeindex -s openmp-index.ist openmp-examples.idx
$(LATEXDCMD) openmp-examples.tex
$(LATEXCMD) openmp-examples.tex
cp openmp-examples.pdf openmp-examples-${version}.pdf
check:
sources/check_tags
clean:
rm -f $(INTERMEDIATE_FILES)
rm -f generated-include.tex
rm -f openmp-diff-full.pdf openmp-diff-abridged.pdf
rm -rf *.tmpdir
cd util; make clean
rm -f chk_tags.log sources/*.log
realclean: clean
rm -f openmp-examples-${version}.pdf openmp-examples-${version}-book.pdf
ifdef DIFF_TO
VC_DIFF_TO := -r ${DIFF_TO}
else
VC_DIFF_TO :=
endif
ifdef DIFF_FROM
VC_DIFF_FROM := -r ${DIFF_FROM}
else
VC_DIFF_FROM := -r work_6.0
endif
DIFF_TO:=HEAD
DIFF_FROM:=work_6.0
DIFF_TYPE:=UNDERLINE
COMMON_DIFF_OPTS:=--math-markup=whole \
--append-safecmd=plc,code,kcode,scode,ucode,vcode,splc,bcode,pvar,pout,example \
--append-textcmd=subsubsubsection
VC_DIFF_OPTS:=${COMMON_DIFF_OPTS} --force -c latexdiff.cfg --flatten --type="${DIFF_TYPE}" --git --pdf ${VC_DIFF_FROM} ${VC_DIFF_TO} --subtype=ZLABEL --graphics-markup=none
VC_DIFF_MINIMAL_OPTS:= --only-changes --force
generated-include.tex:
echo "$(BOOK_BUILD)" > $@
echo "\\def\\VER{${version}}" >> $@
echo "\\def\\SVER{${version_spec}}" >> $@
echo "\\def\\VERDATE{${VERSIONSTR}}" >> $@
@echo "\\newif\\ifccpp\\newif\\iffortran" >> $@
echo "$(LANG_OPT)" >> $@
util/list_tags -vtag */sources/* >> $@
%.tmpdir: $(wildcard *.sty) $(wildcard *.png) $(wildcard *.aux) openmp-examples.pdf
mkdir -p $@/sources
for i in affinity devices loop_transformations parallel_execution SIMD tasking \
data_environment memory_model program_control synchronization \
directives ompt_interface; do \
mkdir -p $@/$$i; ln -sf "$$PWD"/$$i/sources $@/$$i/sources; done
mkdir -p $@/figs
cp -f $^ "$@/"
cp -f sources/* "$@/sources"
cp -f figs/* "$@/figs"
openmp-diff-abridged.pdf: diff-fast-minimal.tmpdir openmp-examples.pdf
env PATH="$(shell pwd)/util/latexdiff:$(PATH)" latexdiff-vc ${VC_DIFF_MINIMAL_OPTS} --fast -d $< ${VC_DIFF_OPTS} openmp-examples.tex
cp $</openmp-examples.pdf $@
if [ "x$(DIFF_TICKET_ID)" != "x" ]; then cp $@ ${@:.pdf=-$(DIFF_TICKET_ID).pdf}; fi
# Slow but portable diffs
openmp-diff-minimal.pdf: diffs-slow-minimal.tmpdir
env PATH="$(shell pwd)/util/latexdiff:$(PATH)" latexdiff-vc ${VC_DIFF_MINIMAL_OPTS} -d $< ${VC_DIFF_OPTS} openmp-examples.tex
cp $</openmp-examples.pdf $@
if [ "x$(DIFF_TICKET_ID)" != "x" ]; then cp $@ ${@:.pdf=-$(DIFF_TICKET_ID).pdf}; fi
.PHONY: diff default book clean realclean

README

@@ -1,68 +0,0 @@
This is the OpenMP Examples document in LaTeX format.
Please see the master file, openmp-examples.tex, for more information.
For a brief revision history, please see Changes.log.
For copyright information, please see omp_copyright.txt.
1) Process for adding an example
- Prepare source code and text description
- Give a high level description in a trac ticket
- Determine a name (ename) for the example
- Propose a new name if creating a new chapter
- Use the existing name if adding to an existing chapter
- Number the example within the chapter (seq-no)
- Create files for the source code with proper tags in
sources/Example_<ename>.<seq-no>c.c
sources/Example_<ename>.<seq-no>f.f
- Create or update the description text in the chapter file
Examples_<ename>.tex
- If needed, add the new chapter file name in
Makefile
openmp-examples.tex
- Commit the changes in git and push to the GitHub repo
- Discuss and vote in committee
2) Tags (meta data) for example sources
@@name: <ename>.<seq-no>[c|cpp|f|f90]
@@type: C|C++|F-fixed|F-free
@@compilable: yes|no|maybe
@@linkable: yes|no|maybe
@@expect: success|failure|nothing|rt-error
"name" is the name of an example
"type" is the source code type, which can be translated into or from
proper file extension (c,cpp,f,f90)
"compilable" indicates whether the source code is compilable
"linkable" indicates whether the source code is linkable
"expect" indicates some expected result for testing purpose
"success|failure|nothing" applies to the result of code compilation
"rt-error" is for a case where compilation may be successful,
but the code contains potential runtime issues (such as a race condition).
An alternative would be to just use "conforming" or "non-conforming".
3) LaTeX macros for examples
- Source code with language h-rules
\cexample{<ename>}{<seq-no>} % for C/C++ examples
\cppexample{<ename>}{<seq-no>} % for C++ examples
\fexample{<ename>}{<seq-no>} % for fixed-form Fortran examples
\ffreeexample{<ename>}{<seq-no>} % for free-form Fortran examples
- Source code without language h-rules
\cnexample{<ename>}{<seq-no>}
\cppnexample{<ename>}{<seq-no>}
\fnexample{<ename>}{<seq-no>}
\ffreenexample{<ename>}{<seq-no>}
- Language h-rules
\cspecificstart, \cspecificend
\cppspecificstart, \cppspecificend
\ccppspecificstart, \ccppspecificend
\fortranspecificstart, \fortranspecificend
- See openmp.sty for more information

README.md

@@ -1,2 +1,10 @@
# Examples
LaTeX Examples Document Source
# OpenMP Examples Document
This is the OpenMP Examples document in LaTeX format.
Please see [Contributions.md](Contributions.md) on how to make contributions to adding new examples.
For a brief revision history, please see [Changes.log](Changes.log).
For copyright information, please see [omp_copyright.txt](omp_copyright.txt).

SIMD/SIMD.tex (new file)

@@ -0,0 +1,150 @@
%\pagebreak
\section{\kcode{simd} and \kcode{declare simd} Directives}
\label{sec:SIMD}
\index{constructs!simd@\kcode{simd}}
\index{simd construct@\kcode{simd} construct}
The following example illustrates the basic use of the \kcode{simd} construct
to assure the compiler that the loop can be vectorized.
\cexample[4.0]{SIMD}{1}
\ffreeexample[4.0]{SIMD}{1}
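As a minimal inline sketch (illustrative code only, with assumed names, not the bundled example sources), the loop-level directive simply marks the loop as vectorizable:
\begin{verbatim}
// Sketch only -- names are illustrative.
void star(double *a, double *b, double *c, int n)
{
   #pragma omp simd
   for (int i = 0; i < n; i++)
      a[i] = a[i] * b[i] * c[i];   // each SIMD lane handles one iteration
}
\end{verbatim}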
\index{directives!declare simd@\kcode{declare simd}}
\index{declare simd directive@\kcode{declare simd} directive}
\index{clauses!uniform@\kcode{uniform}}
\index{uniform clause@\kcode{uniform} clause}
\index{clauses!linear@\kcode{linear}}
\index{linear clause@\kcode{linear} clause}
When a function can be inlined within a loop, the compiler has an opportunity to
vectorize the loop. By guaranteeing SIMD behavior of a function's operations,
characterizing the arguments of the function, and privatizing temporary
variables of the loop, the compiler can often create faster vector code for
the loop. In the examples below, the \kcode{declare simd} directive is
used on the \ucode{add1} and \ucode{add2} functions to enable creation of their
corresponding SIMD function versions for execution within the associated SIMD
loop. The functions characterize two different approaches to accessing data
within the function: by a single variable and as an element in a data array,
respectively. The \ucode{add3} C function uses pointer dereferencing.
The \kcode{declare simd} directives also illustrate the use of
\kcode{uniform} and \kcode{linear} clauses. The \kcode{uniform(\ucode{fact})} clause
indicates that the variable \ucode{fact} is invariant across the SIMD lanes. In
the \ucode{add2} function \ucode{a} and \ucode{b} are included in the \kcode{uniform}
list because the C pointer and the Fortran array references are constant. The
\ucode{i} index used in the \ucode{add2} function is included in a \kcode{linear}
clause with a constant-linear-step of 1, to guarantee a unity increment of the
associated loop. In the \kcode{declare simd} directive for the \ucode{add3}
C function, the \kcode{linear(\ucode{a,b:1})} clause instructs the compiler to generate
unit-stride loads across the SIMD lanes; otherwise, costly \emph{gather}
instructions would be generated for the unknown sequence of access of the
pointer dereferences.
In the \kcode{simd} constructs for the loops, the \kcode{private(\ucode{tmp})} clause is
necessary to assure that each vector operation has its own \ucode{tmp}
variable.
\cexample[4.0]{SIMD}{2}
\ffreeexample[4.0]{SIMD}{2}
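The following C++ sketch (an assumed illustration with made-up names, not the included example code) condenses the pattern described above: \kcode{uniform} for the invariant pointers and factor, \kcode{linear} for the index, and \kcode{private} for the loop temporary:
\begin{verbatim}
// Sketch: vectorized function plus SIMD loop (illustrative names).
#pragma omp declare simd uniform(a, b, fact) linear(i : 1)
double add2(const double *a, const double *b, int i, double fact)
{
   return a[i] + b[i] + fact;   // a, b, and fact are identical in every lane
}

void work(double *a, double *b, double *c, int n)
{
   double tmp;
   #pragma omp simd private(tmp)
   for (int i = 0; i < n; i++) {
      tmp  = a[i] + b[i];              // tmp must be private per lane
      c[i] = add2(a, b, i, 1.0) + tmp;
   }
}
\end{verbatim}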
%\pagebreak
\index{clauses!private@\kcode{private}}
\index{private clause@\kcode{private} clause}
\index{clauses!reduction@\kcode{reduction}}
\index{reduction clause@\kcode{reduction} clause}
\index{reductions!reduction clause@\kcode{reduction} clause}
A thread that encounters a SIMD construct executes vectorized code for the
iterations. As with a worksharing loop, a loop vectorized
with a SIMD construct must ensure that temporary variables are privatized and
that reduction variables are declared with \kcode{reduction} clauses. The example below
illustrates the use of \kcode{private} and \kcode{reduction} clauses in a SIMD
construct.
\cexample[4.0]{SIMD}{3}
\ffreeexample[4.0]{SIMD}{3}
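A condensed sketch of that pattern (assumed code, not the example sources above):
\begin{verbatim}
// Sketch: private temporary and a sum reduction on a SIMD loop.
double sum_sq(const double *a, const double *b, int n)
{
   double tmp, sum = 0.0;
   #pragma omp simd private(tmp) reduction(+:sum)
   for (int i = 0; i < n; i++) {
      tmp  = a[i] + b[i];   // per-lane temporary
      sum += tmp * tmp;     // partial sums are combined at the end
   }
   return sum;
}
\end{verbatim}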
%\pagebreak
\index{clauses!safelen@\kcode{safelen}}
\index{safelen clause@\kcode{safelen} clause}
A \kcode{safelen(\ucode{N})} clause in a \kcode{simd} construct assures the compiler that
there are no loop-carried dependences for vectors of size \ucode{N} or below. If
the \kcode{safelen} clause is not specified, then the default safelen value is
the number of loop iterations.
The \kcode{safelen(\ucode{16})} clause in the example below guarantees that the vector
code is safe for vectors up to and including size 16. In the loop, \ucode{m} can
be 16 or greater, for correct code execution. If the value of \ucode{m} is less
than 16, the behavior is undefined.
\cexample[4.0]{SIMD}{4}
\ffreeexample[4.0]{SIMD}{4}
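The shape of such a loop, as an illustrative sketch (not the bundled example source), is:
\begin{verbatim}
// Sketch: the dependence distance is m, so vectors of length <= 16
// are safe only when m >= 16.
void shift(float *b, int n, int m)
{
   #pragma omp simd safelen(16)
   for (int i = m; i < n; i++)
      b[i] = b[i - m] - 1.0f;
}
\end{verbatim}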
%\pagebreak
\index{clauses!collapse@\kcode{collapse}}
\index{collapse clause@\kcode{collapse} clause}
The following SIMD construct instructs the compiler to collapse the \ucode{i} and
\ucode{j} loops into a single SIMD loop in which SIMD chunks are executed by
threads of the team. Within each thread's chunk of the workshared loop, the SIMD
chunks are executed in the lanes of the vector units.
\cexample[4.0]{SIMD}{5}
\ffreeexample[4.0]{SIMD}{5}
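A sketch of the idea (illustrative code; the bundled example may combine the directives differently):
\begin{verbatim}
// Sketch: collapsed i/j iteration space, workshared across threads
// and vectorized within each thread's chunk.
void add2d(double *a, const double *b, int n, int m)
{
   #pragma omp parallel for simd collapse(2)
   for (int i = 0; i < n; i++)
      for (int j = 0; j < m; j++)
         a[i*m + j] += b[i*m + j];
}
\end{verbatim}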
%%% section
\section{\kcode{inbranch} and \kcode{notinbranch} Clauses}
\label{sec:SIMD_branch}
\index{clauses!inbranch@\kcode{inbranch}}
\index{inbranch clause@\kcode{inbranch} clause}
\index{clauses!notinbranch@\kcode{notinbranch}}
\index{notinbranch clause@\kcode{notinbranch} clause}
The following examples illustrate the use of the \kcode{declare simd}
directive with the \kcode{inbranch} and \kcode{notinbranch} clauses. The
\kcode{notinbranch} clause informs the compiler that the function \ucode{foo} is
never called conditionally in the SIMD loop of the function \ucode{myaddint}. On
the other hand, the \kcode{inbranch} clause for the function \ucode{goo} indicates that
the function is always called conditionally in the SIMD loop inside
the function \ucode{myaddfloat}.
\cexample[4.0]{SIMD}{6}
\ffreeexample[4.0]{SIMD}{6}
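A minimal sketch of the two cases (assumed code with illustrative names, not the example sources):
\begin{verbatim}
// Sketch: notinbranch -- foo is never called under a condition;
// inbranch -- goo is only ever called under a condition.
#pragma omp declare simd notinbranch
int foo(int x) { return x + 1; }

#pragma omp declare simd inbranch
float goo(float x) { return x * 0.5f; }

void myadd(int *p, float *q, int n)
{
   #pragma omp simd
   for (int i = 0; i < n; i++) {
      p[i] = foo(p[i]);                   // unconditional call
      if (q[i] > 0.0f) q[i] = goo(q[i]);  // conditional call
   }
}
\end{verbatim}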
In the code below, the function \ucode{fib()} is called in the main program and
also recursively called in the function \ucode{fib()} within an \bcode{if}
condition. The compiler creates a masked vector version and a non-masked vector
version for the function \ucode{fib()} while retaining the original scalar
version of the \ucode{fib()} function.
\cexample[4.0]{SIMD}{7}
\ffreeexample[4.0]{SIMD}{7}
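The structure, sketched in assumed code (the clauses in the actual example source may differ), looks like this: the call site in the loop is unconditional while the recursive call is guarded, so both masked and non-masked vector variants are needed.
\begin{verbatim}
// Sketch only -- illustrative recursion pattern.
#pragma omp declare simd inbranch
int fib(int n)
{
   if (n <= 1) return n;
   return fib(n - 1) + fib(n - 2);   // conditional (recursive) calls
}

void fill(int *a, int n)
{
   #pragma omp simd
   for (int i = 0; i < n; i++)
      a[i] = fib(i);                 // unconditional call
}
\end{verbatim}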
%%% section
%\pagebreak
\section{Loop-Carried Lexical Forward Dependence}
\label{sec:SIMD_forward_dep}
\index{dependences!loop-carried lexical forward}
The following example tests the restriction on a SIMD loop with a loop-carried lexical forward dependence. This dependence must be preserved for the correct execution of SIMD loops.
A loop can be vectorized even though the iterations are not completely independent when it has loop-carried dependences that are lexically forward, indicated in the code below by the read of \ucode{A[j+1]} and the write to \ucode{A[j]} in C/C++ code (or \ucode{A(j+1)} and \ucode{A(j)} in Fortran). That is, for each iteration in \ucode{j}, the ordering in which \ucode{A[j+1]} (or \ucode{A(j+1)} in Fortran) is read before \ucode{A[j]} (or \ucode{A(j)} in Fortran) is written must be preserved for valid SIMD code generation.
This test assures that the compiler preserves the loop-carried lexical forward dependence when generating correct SIMD code.
\cexample[4.0]{SIMD}{8}
\ffreeexample[4.0]{SIMD}{8}
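In sketch form (illustrative code, not the bundled sources), the dependence looks like this:
\begin{verbatim}
// Sketch: A[j+1] is read before A[j] is written in each iteration;
// this lexically forward ordering must survive vectorization.
void update(double *A, const double *B, int n)
{
   #pragma omp simd
   for (int j = 0; j < n - 1; j++)
      A[j] = A[j + 1] + B[j];
}
\end{verbatim}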

SIMD/linear_modifier.tex (new file)

@@ -0,0 +1,83 @@
%%% section
\section{\kcode{ref}, \kcode{val}, \kcode{uval} Modifiers for \kcode{linear} Clause}
\label{sec:linear_modifier}
\index{modifiers, linear@modifiers, \kcode{linear}!ref@\kcode{ref}}
\index{modifiers, linear@modifiers, \kcode{linear}!val@\kcode{val}}
\index{modifiers, linear@modifiers, \kcode{linear}!uval@\kcode{uval}}
\index{clauses!linear@\kcode{linear}}
\index{linear clause@\kcode{linear} clause}
When generating vector functions from \kcode{declare simd} directives,
it is important for a compiler to know the proper types of function arguments in
order to generate efficient code.
This is especially true for C++ reference types and Fortran arguments.
In the following example, the function \ucode{add_one2} has a C++ reference
parameter (or Fortran argument) \ucode{p}. Variable \ucode{p} is incremented by 1 in the function.
The caller loop \ucode{i} in the main program passes
a variable \ucode{k} as a reference to the function \ucode{add_one2} call.
The \kcode{ref} modifier for the \kcode{linear} clause on the
\kcode{declare simd} directive specifies that the
reference-type parameter \ucode{p} is to match the property of the variable
\ucode{k} in the loop.
This use of a reference type is equivalent to the second call to
\ucode{add_one2}, which passes the array element \ucode{a[i]} directly.
In the example, a preferred vector
length of 8 is specified for both the caller loop and the callee function.
When \kcode{linear(\ucode{p}: ref)} is applied to an argument passed by reference,
it tells the compiler that the addresses in its vector argument are consecutive,
and so the compiler can generate a single vector load or store instead of
a gather or scatter. This allows more efficient SIMD code to be generated with
fewer source changes.
\cppexample[5.2]{linear_modifier}{1}
\ffreeexample[5.2]{linear_modifier}{1}
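A compact C++ sketch of the \kcode{ref} case (assumed code with illustrative names, not the example sources):
\begin{verbatim}
// Sketch: consecutive array elements are passed by reference, so
// linear(p: ref) permits unit-stride loads/stores in the vector version.
#pragma omp declare simd linear(p : ref) simdlen(8)
void add_one2(int &p) { p += 1; }

void caller(int *a, int n)
{
   #pragma omp simd simdlen(8)
   for (int i = 0; i < n; i++)
      add_one2(a[i]);   // &a[i] advances by one element per lane
}
\end{verbatim}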
%\clearpage
The following example is a variant of the above example. The function \ucode{add_one2}
in the C++ code includes an additional C++ reference parameter \ucode{i}.
The loop index \ucode{i} of the caller loop \ucode{i} in the main program
is passed as a reference to the function \ucode{add_one2} call.
The loop index \ucode{i} has a uniform address and a
linear value with a step of 1 across SIMD lanes.
Thus, the \kcode{uval} modifier is used for the \kcode{linear} clause
to specify that the C++ reference-type parameter \ucode{i} is to match
the property of loop index \ucode{i}.
In the corresponding Fortran code, the arguments \ucode{p} and
\ucode{i} in the routine \ucode{add_one2} are passed by reference.
Similar modifiers are used for these variables in the \kcode{linear} clauses
to match with the property at the caller loop in the main program.
When \kcode{linear(\ucode{i}: uval)} is applied to an argument passed by reference, it
tells the compiler that the addresses in its vector argument are uniform,
so that the compiler can generate a scalar load or scalar store and create
the linear values. This allows more efficient SIMD code to be generated with
fewer source changes.
\cppexample[5.2]{linear_modifier}{2}
\ffreeexample[5.2]{linear_modifier}{2}
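In sketch form (illustrative C++ only; the actual example's function body and clauses may differ), the \kcode{uval} case pairs a reference whose address is uniform with a value that increases linearly:
\begin{verbatim}
// Sketch: i is a reference with a uniform address but a linearly
// increasing value, hence linear(i: uval).
#pragma omp declare simd linear(p : ref) linear(i : uval) simdlen(8)
void add_idx(int &p, const int &i) { p += i; }

void caller(int *a, int n)
{
   #pragma omp simd simdlen(8)
   for (int i = 0; i < n; i++)
      add_idx(a[i], i);   // same &i in every lane, value differs per lane
}
\end{verbatim}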
In the following example, the function \ucode{func} takes arrays \ucode{x} and \ucode{y}
as arguments, and accesses the array elements referenced by the index \ucode{i}.
The caller loop \ucode{i} in the main program passes a linear copy of
the variable \ucode{k} to the function \ucode{func}.
The \kcode{val} modifier is used for the \kcode{linear} clause
in the \kcode{declare simd} directive for the function
\ucode{func} to specify that the argument \ucode{i} is to match the property of
the actual argument \ucode{k} passed in the SIMD loop.
Arrays \ucode{x} and \ucode{y} have uniform addresses across SIMD lanes.
When \kcode{linear(\ucode{i}: val,step(\ucode{1}))} is applied to an argument,
it tells the compiler that the addresses in its vector argument may not be
consecutive; however, their values are linear (with a stride of 1 here). When the value of \ucode{i} is used
in subscripts of array references (e.g., \ucode{x[i]}), the compiler can generate
a vector load or store instead of a gather or scatter. This allows more
efficient SIMD code to be generated with fewer source changes.
\cexample[5.2]{linear_modifier}{3}
\ffreeexample[5.2]{linear_modifier}{3}
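A sketch of the \kcode{val} case (assumed code with illustrative names, not the example sources):
\begin{verbatim}
// Sketch: i is passed by value; its values are linear with step 1,
// while the array pointers x and y are uniform across lanes.
#pragma omp declare simd linear(i : val, step(1)) uniform(x, y) simdlen(8)
void func(double *x, double *y, int i) { y[i] += x[i]; }

void caller(double *x, double *y, int n)
{
   int k = 0;
   #pragma omp simd linear(k) simdlen(8)
   for (int i = 0; i < n; i++) {
      func(x, y, k);   // k is a linear copy of the loop index
      k++;
   }
}
\end{verbatim}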

Some files were not shown because too many files have changed in this diff.