synced with the 4.5.0 implementation of the examples-internal repo

2025-04-04 05:41:33 +01:00 · 2016-11-10 16:11:12 -08:00 · 2016-11-10 16:11:12 -08:00 · 156a12ca09
commit 156a12ca09
parent c65fe47427
447 changed files with 3166 additions and 788 deletions
--- a/Changes.log
+++ b/Changes.log
@ -1,3 +1,9 @@
+[20-May-2016] Version 4.5.0
+Changes from 4.0.2ltx
+
+1. Reorganization into topic chapters
+2. Changed file suffixes (f/f90 => fixed/free format; C++ => cpp)
+
 [2-Feb-2015] Version 4.0.2
 Changes from 4.0.1ltx

--- a/Chap_SIMD.tex
+++ b/Chap_SIMD.tex
@ -0,0 +1,48 @@
+\pagebreak
+\chapter{SIMD}
+\label{chap:simd}
+
+Single instruction, multiple data (SIMD) is a form of parallel execution 
+in which the same operation is performed on multiple data elements 
+independently in hardware vector processing units (VPU), also called SIMD units.
+The addition of two vectors to form a third vector is a SIMD operation.
+Many processors have SIMD (vector) units that can simultaneously perform
+2, 4, 8 or more executions of the same operation (by a single SIMD unit). 
+
+Loops without loop-carried backward dependences (or with dependences preserved using 
+\code{ordered simd}) are candidates for vectorization by the compiler for 
+execution with SIMD units. In addition, with state-of-the-art vectorization 
+technology and the \code{declare simd} directive (extended in the OpenMP 4.5
+specification) for function vectorization, loops with function calls can be vectorized as well. 
+The basic idea is that a scalar function call in a loop can be replaced by a vector version 
+of the function, and the loop can be vectorized simultaneously by combining a loop 
+vectorization (\code{simd} directive on the loop) and a function 
+vectorization (\code{declare simd} directive on the function).
+
+A \code{simd} construct specifies that SIMD operations be performed on the
+data within the loop.  A number of clauses are available to provide
+data-sharing attributes (\code{private}, \code{linear}, \code{reduction} and 
+\code{lastprivate}).  Other clauses provide vector length preference/restrictions 
+(\code{simdlen} / \code{safelen}), collapsing of nested loops (\code{collapse}), and data 
+alignment (\code{aligned}).
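+A minimal sketch of the \code{simd} construct follows, assuming a C compiler with OpenMP
+enabled (for example, GCC or Clang with \code{-fopenmp}); without it the pragmas are ignored and
+the loops run as scalar code with the same results. The function names \code{dot\_simd} and
+\code{shift\_add4} are hypothetical. The first uses a \code{reduction} clause; the second declares
+a loop-carried dependence of distance 4 with \code{safelen(4)}.

```c
#include <assert.h>

/* Hypothetical helper: dot product vectorized with the simd construct.
   reduction(+:sum) gives each SIMD lane a private partial sum that is
   combined when the loop ends. */
double dot_simd(const double *x, const double *y, int n)
{
    double sum = 0.0;
    assert(n >= 0);
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical helper: a[i] = a[i - 4] + b[i] for i >= 4.
   The loop-carried dependence has distance 4, so safelen(4) allows at most
   4 consecutive iterations to execute concurrently in SIMD lanes, which
   never reads a value that has not been computed yet. */
void shift_add4(float *a, const float *b, int n)
{
    assert(n >= 0);
    #pragma omp simd safelen(4)
    for (int i = 4; i < n; i++)
        a[i] = a[i - 4] + b[i];
}
```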
+
+The \code{declare simd} directive designates
+that a vector version of the function should also be constructed for 
+execution within loops that contain the function and have a \code{simd} 
+directive.  Clauses provide argument specifications (\code{linear},
+\code{uniform}, and \code{aligned}), a requested vector length 
+(\code{simdlen}), and designate whether the function is always or never 
+called conditionally in a SIMD loop (\code{inbranch}/\code{notinbranch}). 
+The latter is for optimizing performance.
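+The sketch below, with hypothetical functions \code{scale\_add}, \code{elem} and \code{saxpy\_simd},
+requests vector variants with \code{declare simd}: \code{uniform} marks an argument that is the same
+in all SIMD lanes, \code{linear(i:1)} an argument that increases by 1 from lane to lane, and
+\code{notinbranch} records that the calls are never made under a condition. It assumes OpenMP is
+enabled at compile time; whether a vector variant is actually used is up to the compiler.

```c
#include <assert.h>

/* Vector variant requested: s is identical in all lanes (uniform),
   and the function is never called conditionally (notinbranch). */
#pragma omp declare simd uniform(s) notinbranch
float scale_add(float x, float y, float s)
{
    return s * x + y;
}

/* p is the same pointer in all lanes; i grows by 1 from lane to lane,
   so a vector variant can use a contiguous load. */
#pragma omp declare simd uniform(p) linear(i:1) notinbranch
static float elem(const float *p, int i)
{
    return p[i];
}

/* y[i] = s * x[i] + y[i]; the simd loop may call the vector variants. */
void saxpy_simd(float *y, const float *x, float s, int n)
{
    assert(n >= 0);
    #pragma omp simd
    for (int i = 0; i < n; i++)
        y[i] = scale_add(elem(x, i), y[i], s);
}
```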
+
+Also, the \code{simd} construct has been combined with the worksharing loop 
+constructs (\code{for simd} and \code{do simd}) so that loop iterations are distributed
+among the threads of a team and each thread executes its iterations with SIMD instructions.  
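+A short sketch of the combined \code{parallel for simd} construct, assuming OpenMP is enabled
+at compile time (the hypothetical \code{sum\_squares} also runs serially if it is not): the
+iterations are divided among the threads, each thread's chunk is executed with SIMD
+instructions, and a \code{reduction} clause combines the partial sums.

```c
#include <assert.h>

/* Hypothetical helper: 0^2 + 1^2 + ... + (n-1)^2.
   for: iterations are shared among the threads of the team;
   simd: each thread's chunk is vectorized;
   reduction: per-thread and per-lane partial sums are combined. */
long long sum_squares(int n)
{
    long long sum = 0;
    assert(n >= 0);
    #pragma omp parallel for simd reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += (long long)i * i;
    return sum;
}
```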
+%Hence, the \code{simd} construct can be 
+%used alone on a loop to direct vectorization (SIMD execution), or in 
+%combination with a parallel loop construct to include thread parallelism 
+%(a parallel loop sequentially followed by a \code{simd} construct,
+%or a combined construct such as \code{parallel do simd} or 
+%\code{parallel for simd}).
+
+
--- a/Chap_affinity.tex
+++ b/Chap_affinity.tex
@ -0,0 +1,118 @@
+\pagebreak
+\chapter{OpenMP Affinity}
+\label{chap:openmp_affinity}
+
+OpenMP Affinity consists of a \code{proc\_bind} policy (thread affinity policy) and a specification of
+places (\texttt{"}location units\texttt{"} or \plc{processors} that may be cores, hardware
+threads, sockets, etc.).  
+OpenMP Affinity enables users to bind computations to specific places.
+The placement holds for the duration of the parallel region. 
+However, if two or more cores (hardware threads, sockets, etc.) have been assigned
+to a given place, the runtime is free to migrate an OpenMP thread among the cores
+(hardware threads, sockets, etc.) within that place.
+
+Often the binding can be managed without resorting to explicitly setting places.
+Without the specification of places in the \code{OMP\_PLACES} variable, 
+the OpenMP runtime will distribute and bind threads using the entire range of processors for 
+the OpenMP program, according to the \code{OMP\_PROC\_BIND} environment variable
+or the \code{proc\_bind} clause.  When places are specified, the OpenMP runtime
+binds threads to the places according to a default distribution policy, or
+those specified in the \code{OMP\_PROC\_BIND} environment variable or the
+\code{proc\_bind} clause.
+
+In the OpenMP Specifications document a processor refers to an execution unit that
+is enabled for an OpenMP thread to use.  A processor is a core when there is
+no SMT (Simultaneous Multi-Threading) support or SMT is disabled.  When 
+SMT is enabled, a processor is a hardware thread (HW-thread). (This is the
+usual case; but actually, the execution unit is implementation defined.) Processors
+are numbered sequentially from 0 to the number of cores minus one (without SMT), or
+from 0 to the number of HW-threads minus one (with SMT). OpenMP places use the processor number to designate
+binding locations (unless an \texttt{"}abstract name\texttt{"} is used). 
+
+
+The processors available to a process may be a subset of the system's
+processors.  This restriction may be the result of a 
+wrapper process controlling the execution (such as \code{numactl} on Linux systems), 
+compiler options, library-specific environment variables, or default
+kernel settings.  For instance, the execution of multiple MPI processes,
+launched on a single compute node, will each have a subset of processors as
+determined by the MPI launcher or set by MPI affinity environment 
+variables for the MPI library.  %Forked threads within an MPI process
+%(for a hybrid execution of MPI and OpenMP code) inherit the valid 
+%processor set for execution from the parent process (the initial task region) 
+%when a parallel region forks threads.  The binding policy set in 
+%\code{OMP\_PROC\_BIND} or the \code{proc\_bind} clause will be applied to 
+%the subset of processors available to \plc{the particular} MPI process.
+
+%Also, setting an explicit list of processor numbers in the \code{OMP\_PLACES} 
+%variable before an MPI launch (which involves more than one MPI process) will
+%result in unspecified behavior (and doesn't make sense) because the set of 
+%processors in the places list must not contain processors outside the subset 
+%of processors for an MPI process. A separate \code{OMP\_PLACES} variable must
+%be set for each MPI process, and is usually accomplished by launching a script 
+%which sets \code{OMP\_PLACES} specifically for the MPI process. 
+
+Threads of a team are positioned onto places in a compact manner, a 
+scattered distribution, or onto the master's place, by setting the 
+\code{OMP\_PROC\_BIND} environment variable or the \code{proc\_bind} clause  to 
+\plc{close}, \plc{spread}, or \plc{master}, respectively.  When 
+\code{OMP\_PROC\_BIND} is set to FALSE, no binding is enforced; 
+when the value is TRUE, threads are bound in an implementation-defined manner
+to the places in the \code{OMP\_PLACES} variable, or to places 
+defined by the implementation if the \code{OMP\_PLACES} variable 
+is not set.
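+The following sketch (the helper \code{record\_places} is hypothetical) requests the \plc{spread}
+policy with the \code{proc\_bind} clause and records the place of each thread with the OpenMP 4.5
+routine \code{omp\_get\_place\_num()}, which returns -1 for an unbound thread. The place numbers
+observed depend on \code{OMP\_PLACES} and on the machine, so none are assumed here; the OpenMP
+calls are guarded by \code{\_OPENMP} so the code also builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: runs a team of at most max_threads threads with the
   spread binding policy and stores in place[t] the place number of thread t
   (-1 if unbound, and always -1 when compiled without OpenMP).
   Returns the size of the team. */
int record_places(int *place, int max_threads)
{
    int team_size = 1;
    assert(max_threads > 0);
    #pragma omp parallel num_threads(max_threads) proc_bind(spread)
    {
        int t = 0, p = -1;
#ifdef _OPENMP
        t = omp_get_thread_num();
        p = omp_get_place_num();          /* OpenMP 4.5 routine */
        if (t == 0)
            team_size = omp_get_num_threads();
#endif
        place[t] = p;
    }
    return team_size;
}
```

+For instance, with \code{OMP\_PLACES=cores} on a machine with at least four cores, the four
+threads are expected to report distinct place numbers.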
+
+The \code{OMP\_PLACES} variable can also be set to an abstract name 
+(\plc{threads}, \plc{cores}, \plc{sockets}) to specify that a place is
+either a single hardware thread, a core, or a socket, respectively. 
+This form of the \code{OMP\_PLACES} variable is most useful when the 
+number of threads is equal to the number of hardware threads, cores
+or sockets.  It can also be used with a \plc{close} or \plc{spread} 
+distribution policy when the equality doesn't hold.
+
+
+% We need an example of using sockets, cores and threads:
+
+% case 1 threads:
+
+%     Hyper-Threads on (2 hardware threads per core)
+%     1 socket x 4 cores x 2 HW-threads
+%   
+%     export OMP_NUM_THREADS=4
+%     export OMP_PLACES=threads
+%     
+%          core #      0    1    2    3
+%     processor #     0,1  2,3  4,5  6,7  
+%     thread #     0  * _  _ _  _ _  _ _   #mask for thread 0
+%     thread #     1  _ _  * _  _ _  _ _   #mask for thread 1
+%     thread #     2  _ _  _ _  * _  _ _   #mask for thread 2
+%     thread #     3  _ _  _ _  _ _  * _   #mask for thread 3
+
+% case 2 cores:
+%   
+%     Hyper-Threads on (2 hardware threads per core)
+%     1 socket x 4 cores x 2 HW-threads
+%    
+%     export OMP_NUM_THREADS=4
+%     export OMP_PLACES=cores
+%     
+%          core #      0    1    2    3
+%     processor #     0,1  2,3  4,5  6,7  
+%     thread #     0  * *  _ _  _ _  _ _   #mask for thread 0
+%     thread #     1  _ _  * *  _ _  _ _   #mask for thread 1
+%     thread #     2  _ _  _ _  * *  _ _   #mask for thread 2
+%     thread #     3  _ _  _ _  _ _  * *   #mask for thread 3
+
+% case 3 sockets:
+%   
+%     No Hyper-Threads
+%     3 sockets x 4 cores 
+%     
+%     export OMP_NUM_THREADS=3
+%     export OMP_PLACES=sockets
+%     
+%        socket #        0         1          2
+%     processor #     0,1,2,3   4,5,6,7   8,9,10,11
+%     thread #     0  * * * *   _ _ _ _   _ _  _  _   #mask for thread 0
+%     thread #     1  _ _ _ _   * * * *   _ _  _  _   #mask for thread 1
+%     thread #     2  _ _ _ _   _ _ _ _   * *  *  *   #mask for thread 2
--- a/Chap_data_environment.tex
+++ b/Chap_data_environment.tex
@ -0,0 +1,75 @@
+\pagebreak
+\chapter{Data Environment}
+\label{chap:data_environment}
+The OpenMP \plc{data environment} contains data attributes of variables and
+objects.  Many constructs (such as \code{parallel}, \code{simd}, \code{task}) 
+accept clauses to control \plc{data-sharing} attributes
+of referenced variables in the construct, where the \plc{data-sharing} attribute specifies
+whether the variable is \plc{shared}, 
+has \plc{private} storage, or has special operational characteristics 
+(as found in the \code{firstprivate}, \code{lastprivate}, \code{linear}, or \code{reduction} clause).
+
+The data environment for a device (distinguished as a \plc{device data environment})
+is controlled on the host by \plc{data-mapping} attributes, which determine the
+relationship of the data on the host, the \plc{original} data, and the data on the
+device, the \plc{corresponding} data.
+
+\bigskip
+DATA-SHARING ATTRIBUTES
+
+Data-sharing attributes of variables can be classified as being \plc{predetermined},
+\plc{explicitly determined} or \plc{implicitly determined}.
+
+Certain variables and objects have predetermined attributes.  
+A commonly found case is the loop iteration variable in associated loops 
+of a \code{for} or \code{do} construct, which has a private data-sharing attribute.
+Variables with predetermined data-sharing attributes cannot be listed in a data-sharing clause; but there are some
+exceptions (mainly concerning loop iteration variables).
+
+Variables with explicitly determined data-sharing attributes are those that are
+referenced in a given construct and are listed in a data-sharing attribute
+clause on the construct. Some of the common data-sharing clauses are:
+\code{shared}, \code{private}, \code{firstprivate}, \code{lastprivate}, 
+\code{linear}, and \code{reduction}. % Are these all of them?
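+As a sketch of explicitly and implicitly determined attributes (the function \code{fill\_with\_offset}
+is hypothetical), the loop below lists \code{offset} as \code{firstprivate} and \code{last} as
+\code{lastprivate}, while the iteration variable \code{i} is predetermined private and the pointer
+\code{a} is implicitly shared. It assumes OpenMP is enabled; otherwise the pragma is ignored and
+the result is the same.

```c
#include <assert.h>

/* Hypothetical helper: a[i] = i + offset; returns the value written in the
   sequentially last iteration.
   - i:      loop iteration variable, predetermined private
   - offset: firstprivate, each thread's copy starts with the original value
   - last:   lastprivate, after the loop it holds the value from i == n - 1
   - a:      not listed, implicitly shared */
int fill_with_offset(int *a, int n, int offset)
{
    int last = -1;
    assert(n > 0);
    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; i++) {
        a[i] = i + offset;
        last = a[i];
    }
    return last;
}
```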
+
+Variables with implicitly determined data-sharing attributes are those
+that are referenced in a given construct, do not have predetermined
+data-sharing attributes, and are not listed in a data-sharing
+attribute clause on the construct.
+For a complete list of variables and objects with predetermined and
+implicitly determined attributes, please refer to the
+\plc{Data-sharing Attribute Rules for Variables Referenced in a Construct}
+subsection of the OpenMP Specifications document.  
+
+\bigskip
+DATA-MAPPING ATTRIBUTES
+
+The \code{map} clause on a device construct explicitly specifies how the list items in
+the clause are mapped from the encountering task's data environment (on the host)
+to the corresponding item in the device data environment (on the device).
+The common \plc{list items} are arrays, array sections, scalars, pointers, and
+structure elements (members). 
+
+Procedures and global variables have predetermined data mapping if they appear
+within the list or block of a \code{declare target} directive. Also, a C/C++ pointer
+is mapped as a zero-length array section, as is a C++ variable that is a reference to a pointer.
+% Waiting for response from Eric on this.
+
+Without explicit mapping, non-scalar and non-pointer variables within the scope of the \code{target}
+construct are implicitly mapped with a \plc{map-type} of \code{tofrom}.
+Without explicit mapping, scalar variables within the scope of the \code{target}
+construct are not mapped, but have an implicit firstprivate data-sharing
+attribute. (That is, the value of the original variable is given to a private
+variable of the same name on the device.) This behavior can be changed with
+the \code{defaultmap} clause.
+
+The \code{map} clause can appear on \code{target}, \code{target data} and 
+\code{target enter/exit data} constructs.  The operations of creation and
+removal of device storage as well as assignment of the original list item 
+values to the corresponding list items may be complicated when the list 
+item appears on multiple constructs or when the host and device storage 
+is shared. In these cases the item's reference count, the number of times
+it has been referenced (+1 on entry and -1 on exit) in nested (structured)
+map regions and/or accumulative (unstructured) mappings, determines the operation.
+Details of the \code{map} clause and reference count operation are specified 
+in the \plc{map Clause} subsection of the OpenMP Specifications document.
--- a/Chap_devices.tex
+++ b/Chap_devices.tex
@ -0,0 +1,53 @@
+\pagebreak
+\chapter{Devices}
+\label{chap:devices}
+
+The \code{target} construct consists of a \code{target} directive 
+and an execution region. The \code{target} region is executed on
+the default device or the device specified in the \code{device} 
+clause. 
+
+In OpenMP version 4.0, by default, all variables within the lexical
+scope of the construct are copied \plc{to} and \plc{from} the
+device, unless the device is the host, or the data exists on the
+device from a previously executed data-type construct that
+has created space on the device and possibly copied host
+data to the device storage.
+
+The constructs that explicitly
+create storage, transfer data, and free storage on the device
+are categorized as structured and unstructured. The
+\code{target} \code{data} construct is structured. It creates
+a data region around \code{target} constructs, and is
+convenient for providing persistent data throughout multiple
+\code{target} regions. The \code{target} \code{enter} \code{data} and 
+\code{target} \code{exit} \code{data} constructs are unstructured, because 
+they can occur anywhere and do not support a ``structure'' 
+(a region) for enclosing \code{target} constructs, as does the
+\code{target} \code{data} construct. 
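+A minimal sketch of both styles follows; \code{scale\_then\_shift} and \code{scale\_unstructured}
+are hypothetical. The \code{target data} region keeps \code{a[0:n]} on the device across two
+\code{target} regions, while the unstructured version creates and releases the storage with
+\code{target enter data} and \code{target exit data}. If no device is available (or OpenMP is
+not enabled), the regions execute on the host and produce the same values.

```c
#include <assert.h>

/* Structured: a[0:n] is mapped once for both target regions and copied
   back at the end of the target data region. The scalars n, s and d are
   implicitly firstprivate in each target region. */
void scale_then_shift(double *a, int n, double s, double d)
{
    assert(n >= 0);
    #pragma omp target data map(tofrom: a[0:n])
    {
        #pragma omp target
        for (int i = 0; i < n; i++)
            a[i] *= s;

        #pragma omp target
        for (int i = 0; i < n; i++)
            a[i] += d;
    }
}

/* Unstructured: storage is created by target enter data and released
   (with a copy back) by target exit data; the two directives could also
   live in different functions. */
void scale_unstructured(double *a, int n, double s)
{
    assert(n >= 0);
    #pragma omp target enter data map(to: a[0:n])
    #pragma omp target
    for (int i = 0; i < n; i++)
        a[i] *= s;
    #pragma omp target exit data map(from: a[0:n])
}
```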
+
+The \code{map} clause is used on \code{target} 
+constructs and the data-type constructs to map host data. It 
+specifies the device storage and data movement \code{to} and \code{from}
+the device, and controls the storage duration.
+
+There is an important change in the OpenMP 4.5 specification
+that alters the data model for scalar variables and C/C++ pointer variables.
+The default behavior for scalar variables and C/C++ pointer variables
+in 4.5-compliant code is \code{firstprivate}. Example
+codes that have been updated to reflect this new behavior are
+annotated with a description of the changes required
+for correct execution. Often it is a simple matter of mapping
+the variable as \code{tofrom} to obtain the intended 4.0 behavior.
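+The sketch below (hypothetical \code{sum\_on\_device}) applies that fix: the scalar \code{sum} is
+updated inside a \code{target} region, so it is mapped \code{tofrom}. Under the 4.5 rules,
+dropping that map would make \code{sum} firstprivate, and the function would return 0.0 when
+compiled with OpenMP 4.5 or later.

```c
#include <assert.h>

/* Hypothetical helper: sums x[0..n-1] inside a target region.
   map(tofrom: sum) copies the updated scalar back to the host; without it,
   OpenMP 4.5 treats sum as firstprivate and the host copy stays 0.0. */
double sum_on_device(const double *x, int n)
{
    double sum = 0.0;
    assert(n >= 0);
    #pragma omp target map(to: x[0:n]) map(tofrom: sum)
    for (int i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```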
+
+In OpenMP version 4.5 the mechanism for target
+execution is specified as occurring through a \plc{target task}. 
+When the \code{target} construct is encountered a new 
+\plc{target task} is generated. The \plc{target task} 
+completes after the \code{target} region has executed and all data 
+transfers have finished.
+
+This new specification does not affect the execution of 
+pre-4.5 code; it is a necessary element for asynchronous 
+execution of the \code{target} region when using the new \code{nowait} 
+clause introduced in OpenMP 4.5.
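+A sketch of asynchronous execution with \code{nowait} follows (\code{two\_async\_fills} is
+hypothetical). Each \code{target} construct generates a deferrable target task, so the
+encountering thread may continue immediately; the \code{taskwait} directive waits until both
+target tasks, including their data transfers, have completed.

```c
#include <assert.h>

/* Hypothetical helper: fills a[] and b[] in two target regions that may
   run concurrently; taskwait waits for both target tasks and for the
   copy back of a[0:n] and b[0:n]. */
void two_async_fills(int *a, int *b, int n)
{
    assert(n >= 0);
    #pragma omp target nowait map(from: a[0:n])
    for (int i = 0; i < n; i++)
        a[i] = i;

    #pragma omp target nowait map(from: b[0:n])
    for (int i = 0; i < n; i++)
        b[i] = 2 * i;

    #pragma omp taskwait
}
```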
--- a/Chap_memory_model.tex
+++ b/Chap_memory_model.tex
@ -0,0 +1,105 @@
+\pagebreak
+\chapter{Memory Model}
+\label{chap:memory_model}
+
+In this chapter, examples illustrate race conditions on access to variables with
+shared data-sharing attributes.  A race condition can exist when two
+or more threads access the same variable without synchronization and not all
+of the accesses are reads; that is, a WaR, RaW or WaW condition
+exists (R = read, a = after, W = write). A RaR does not produce a race condition.
+Ensuring thread execution order at
+the processor level is not enough to avoid race conditions, because the
+local storage at the processor level (registers, caches, etc.)
+must be synchronized so that a consistent view of the variable in the
+memory hierarchy can be seen by the threads accessing the variable.
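+A minimal sketch (hypothetical \code{count\_negative}): every thread may increment the shared
+counter, and the \code{atomic} construct makes each read-modify-write indivisible. Without it,
+two threads could read the same old value and one increment would be lost.

```c
#include <assert.h>

/* Hypothetical helper: counts the negative elements of v[].
   count is shared; atomic update prevents lost increments. */
int count_negative(const int *v, int n)
{
    int count = 0;
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (v[i] < 0) {
            #pragma omp atomic update
            count++;
        }
    }
    return count;
}
```

+A \code{reduction(+:count)} clause would avoid the contention; \code{atomic} is used here only
+to show how the race is removed.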
+
+OpenMP provides a shared-memory model which allows all threads access
+to \plc{memory} (shared data).  Each thread also has exclusive
+access to \plc{threadprivate memory} (private data).  A private
+variable referenced in an OpenMP directive's structured block is a
+new version of the original variable (with the same name) for each
+task (or SIMD lane) within the code block.  A private variable is
+initially undefined (except for variables in \code{firstprivate}
+and \code{linear} clauses), and the original variable value is
+unaltered by assignments to the private variable (except for
+\code{reduction}, \code{lastprivate} and \code{linear} clauses).
+
+Private variables in an outer \code{parallel} region can be
+shared by implicit tasks of an inner \code{parallel} region 
+(with a \code{shared} clause on the inner \code{parallel} directive).
+Likewise, a private variable may be shared in the region of an
+explicit \code{task} (through a \code{shared} clause).
+
+
+The \code{flush} directive makes the executing thread's temporary view of
+memory (for example, values held in registers) consistent with memory.
+When a list is supplied on the directive, only the items (variables)
+in the list are guaranteed to be flushed.
+
+Implied flushes exist at prescribed locations of certain constructs. 
+For the complete list of these locations and associated constructs,
+please refer to the \plc{flush Construct} section of the OpenMP 
+Specifications document.
+
+% The following table lists construct in which implied flushes exist, and the
+% location of their execution.
+% 
+% %\begin{table}[hb]
+% \begin{center}
+% %\caption {Execution Location for Implicit Flushes. } 
+% \begin{tabular}{ | p{0.6\linewidth} | l | } 
+% \hline
+% \code{CONSTRUCT}                                   & \makecell{\code{EXECUTION} \\ \code{LOCATION}} \\
+% \hline
+% \code{parallel}                                    & upon entry and exit \\
+% \hline
+% \makecell[l]{worksharing \\ \hspace{1.5em}\code{for}, \code{do} 
+%                          \\ \hspace{1.5em}\code{sections} 
+%                          \\ \hspace{1.5em}\code{single} 
+%                          \\ \hspace{1.5em}\code{workshare} }  
+%                                                    & upon exit \\ 
+% \hline
+% \code{critical}                                    & upon entry and exit \\
+% \hline
+% \code{target}                                      & upon entry and exit \\
+% \hline
+% \code{barrier}                                     & during \\
+% \hline
+% \code{atomic} operation with \plc{seq\_cst} clause & upon entry and exit \\
+% \hline
+% \code{ordered}*                                    & upon entry and exit \\
+% \hline
+% \code{cancel}** and \code{cancellation point}**    & during \\
+% \hline
+% \code{target data}                                 & upon entry and exit \\
+% \hline
+% \code{target update} + \code{to} clause,   
+% \code{target enter data}                           & on entry \\
+% \hline
+% \code{target update} + \code{from} clause, 
+% \code{target exit data}                            & on exit \\
+% \hline
+% \code{omp\_set\_lock}                              & during \\
+% \hline
+% \makecell[l]{ \code{omp\_set/unset\_lock}, \code{omp\_test\_lock}*** 
+%            \\ \code{omp\_set/unset/test\_nest\_lock}*** }
+%                                                    & during \\
+% \hline
+% task scheduling point                              & \makecell[l]{immediately \\ before and after} \\
+% \hline
+% \end{tabular}
+% %\caption {Execution Location for Implicit Flushes. } 
+% 
+% \end{center}
+% %\end{table}
+% 
+% * without clauses and with \code{threads} or \code{depend} clauses \newline
+% ** when \plc{cancel-var} ICV is \plc{true} (cancellation is turned on) and cancellation is activated \newline
+% *** if the region causes the lock to be set or unset
+% 
+% A flush with a list is implied for non-sequentially consistent \code{atomic} operations
+% (\code{atomic} directive without a \code{seq\_cst} clause), where the list item is the
+% specific storage location accessed atomically (specified as the \plc{x} variable
+% in \plc{atomic Construct} subsection of the OpenMP Specifications document).
+
+Examples 1-3 show the difficulty of synchronizing threads through \code{flush} and \code{atomic} directives.
--- a/Chap_parallel_execution.tex
+++ b/Chap_parallel_execution.tex
@ -0,0 +1,104 @@
+\pagebreak
+\chapter{Parallel Execution}
+\label{chap:parallel_execution}
+
+A single thread, the \plc{initial thread}, begins sequential execution of 
+an OpenMP-enabled program, as if the whole program were in an implicit parallel
+region consisting of an implicit task executed by the \plc{initial thread}.
+
+A \code{parallel} construct encloses code, 
+forming a parallel region.  An \plc{initial thread} encountering a \code{parallel} 
+region forks (creates) a team of threads at the beginning of the 
+\code{parallel} region, and joins them (removes from execution) at the 
+end of the region.  The initial thread becomes the master thread of the team in a 
+\code{parallel} region with a \plc{thread} number equal to zero; the other 
+threads are numbered from 1 to the number of threads minus 1. 
+A team may consist of just a single thread.
+
+Each thread of a team is assigned an implicit task consisting of code within the 
+parallel region. The task that creates a parallel region is suspended while the
+tasks of the team are executed.  A thread is tied to its task; that is,
+only the thread assigned to the task can execute that task.  After completion 
+of the \code{parallel} region, the master thread resumes execution of the generating task.  
+
+%After the \code{parallel} region the master thread becomes the initial 
+%thread again, and continues to execute the \plc{sequential part}.  
+
+Any task within a \code{parallel} region is allowed to encounter another
+\code{parallel} region to form a nested \code{parallel} region. The 
+parallelism of a nested \code{parallel} region (whether it forks additional 
+threads, or is executed serially by the encountering task) can be controlled by the
+\code{OMP\_NESTED} environment variable or the \code{omp\_set\_nested()} 
+API routine with arguments indicating true or false.
+
+The number of threads of a \code{parallel} region can be set by the \code{OMP\_NUM\_THREADS}
+environment variable, the \code{omp\_set\_num\_threads()} routine, or on the \code{parallel} 
+directive with the \code{num\_threads}
+clause. The routine overrides the environment variable, and the clause overrides all. 
+Use the \code{OMP\_DYNAMIC} environment variable
+or the \code{omp\_set\_dynamic()} routine to specify that the OpenMP
+implementation may dynamically adjust the number of threads for
+\code{parallel} regions.  The default setting for dynamic adjustment is implementation
+defined. When dynamic adjustment is on and the number of threads is specified,
+the number of threads becomes an upper limit for the number of threads to be
+provided by the OpenMP runtime.
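+In the sketch below (hypothetical \code{team\_ids}), the \code{num\_threads} clause requests a
+team size; because dynamic adjustment or resource limits may provide fewer threads, the master
+thread (thread number 0) queries the actual size with \code{omp\_get\_num\_threads()}. The OpenMP
+calls are guarded by the \code{\_OPENMP} macro, so the code also builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: each thread stores its thread number in ids[];
   returns the actual team size, which never exceeds `requested`. */
int team_ids(int *ids, int requested)
{
    int size = 1;
    assert(requested > 0);
    #pragma omp parallel num_threads(requested)
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
        if (id == 0)                       /* master thread */
            size = omp_get_num_threads();
#endif
        ids[id] = id;
    }                                      /* implicit barrier, then join */
    return size;
}
```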
+
+\pagebreak
+WORKSHARING CONSTRUCTS
+
+A worksharing construct distributes the execution of the associated region
+among the threads of the team that encounters it.  There is an
+implied barrier at the end of the worksharing region, unless a \code{nowait}
+clause is specified (there is no barrier at the beginning). The worksharing
+constructs are:
+
+\begin{compactitem}
+
+\item loop constructs: {\code{for} and \code{do} }
+\item \code{sections}
+\item \code{single}
+\item \code{workshare}
+
+\end{compactitem}
+
+The \code{for} and \code{do} constructs (loop constructs) create a region 
+consisting of a loop.  A loop controlled by a loop construct is called 
+an \plc{associated} loop.  Nested loops can form a single region when the 
+\code{collapse} clause (with an integer argument) designates the number of 
+\plc{associated} loops to be executed in parallel, by forming a 
+``single iteration space'' for the specified number of nested loops.  
+The \code{ordered} clause can also control multiple associated loops.
+
+An associated loop must adhere to a ``canonical form'' (specified in the 
+\plc{Canonical Loop Form} of the OpenMP Specifications document) which allows the 
+iteration count (of all associated loops) to be computed before the 
+(outermost) loop is executed. %[58:27-29].  
+Most common loops comply with the canonical form, including loops over C++ random-access iterators.
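+A short sketch of a collapsed loop nest (hypothetical \code{add\_matrices}): both loops are in
+canonical form and perfectly nested, so \code{collapse(2)} forms one iteration space of
+\code{rows*cols} iterations that is divided among the threads.

```c
#include <assert.h>

/* Hypothetical helper: c = a + b for row-major rows x cols matrices.
   collapse(2) merges the i and j loops into one iteration space. */
void add_matrices(int rows, int cols, const double *a, const double *b, double *c)
{
    assert(rows >= 0 && cols >= 0);
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            c[i * cols + j] = a[i * cols + j] + b[i * cols + j];
}
```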
+
+A \code{single} construct forms a region in which only one thread (any one 
+of the team) executes the region. 
+The other threads wait at the implied 
+barrier at the end, unless the \code{nowait} clause is specified.
+
+The \code{sections} construct forms a region that contains one or more 
+structured blocks.  Each block of a \code{sections} directive is 
+constructed with a \code{section} construct, and executed once by 
+one of the threads (any one) in the team.  (The \code{section} directive
+is optional for the first block, so it is not required when only one
+block is formed in the region.)
+The other threads wait at the implied 
+barrier at the end, unless the \code{nowait} clause is specified.
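+The sketch below (hypothetical \code{range\_of}) computes a minimum and a maximum in two
+\code{section} blocks, each executed once by some thread, and then lets a single thread compute
+the range. The implied barrier at the end of the \code{sections} region guarantees that both
+results are complete before the \code{single} region reads them.

```c
#include <assert.h>

/* Hypothetical helper: returns max(v) - min(v) for n > 0. */
int range_of(const int *v, int n)
{
    assert(n > 0);
    int lo = v[0], hi = v[0], range = 0;   /* shared in the region below */
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            for (int i = 1; i < n; i++)
                if (v[i] < lo) lo = v[i];

            #pragma omp section
            for (int i = 1; i < n; i++)
                if (v[i] > hi) hi = v[i];
        }                                  /* implied barrier */

        #pragma omp single
        range = hi - lo;                   /* executed by one thread */
    }
    return range;
}
```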
+
+
+The \code{workshare} construct is a Fortran feature that consists of a
+region with a single structured block (section of code). Statements in the
+\code{workshare} region are divided into units of work, and executed (once)
+by threads of the team.  
+
+\bigskip
+MASTER CONSTRUCT
+
+The \code{master} construct is not a worksharing construct.  The master region
+is executed only by the master thread. There is no implicit barrier (and flush) 
+at the end of the \code{master} region; hence the other threads of the team continue
+execution past the \code{master} region without waiting.
--- a/Chap_program_control.tex
+++ b/Chap_program_control.tex
@ -0,0 +1,85 @@
+\pagebreak
+\chapter{Program Control}
+\label{sec:program_control}
+
+Some specific and elementary concepts of controlling program execution are
+illustrated in the examples of this chapter.  Control can be directly
+managed with conditional control code (ifdef's with the \code{\_OPENMP} 
+macro, and the Fortran sentinel (\code{!\$}) 
+for conditionally compiling). The \code{if} clause on some constructs
+can direct the runtime to ignore or alter the behavior of the construct.
+Of course, base-language \code{if} statements can be used to control the ``execution'' 
+of stand-alone directives (such as \code{flush}, \code{barrier}, \code{taskwait}, 
+and \code{taskyield}).
+However, such a directive must appear within a compound statement (block) and not as the sole substatement of the \code{if} statement, as illustrated in examples 1 and 2 of this chapter.
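+A minimal sketch of both forms of control (the function names are hypothetical):
+\code{openmp\_version} uses the \code{\_OPENMP} macro, which holds the \plc{yyyymm} date of the
+supported specification, and \code{maybe\_barrier} places a stand-alone \code{barrier} directive
+inside the braces of an \code{if} statement rather than as its bare substatement.

```c
#include <assert.h>

/* Hypothetical helper: the _OPENMP value (yyyymm) when compiled with
   OpenMP enabled, 0 otherwise. */
long openmp_version(void)
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

/* Hypothetical helper: the barrier is enclosed in braces, as required for
   a stand-alone directive controlled by an if statement. do_sync has the
   same value in every thread, so all threads encounter the barrier or none. */
void maybe_barrier(int do_sync)
{
    #pragma omp parallel
    {
        if (do_sync) {
            #pragma omp barrier
        }
    }
}
```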
+
+\bigskip
+CANCELLATION
+
+Cancellation (termination) of the normal sequence of execution for the threads in an OpenMP region can
+be accomplished with the \code{cancel} construct.  The construct uses a
+\plc{construct-type-clause} to set the region-type to activate for the cancellation. 
+That is, inclusion of one of the \plc{construct-type-clause} names \code{parallel}, \code{for}, 
+\code{do}, \code{sections} or \code{taskgroup} on the directive line 
+activates the corresponding region.  
+The \code{cancel} construct is activated by the first encountering thread, and it
+continues execution at the end of the named region.
+The \code{cancel} construct is also a cancellation point for any other thread of the team 
+to also continue execution at the end of the named region.  
+
+Also, once the specified region has been activated for cancellation, any thread that encounters 
+a \code{cancellation point} construct with the same named region (\plc{construct-type-clause}),
+continues execution at the end of the region.
+
+For an activated \code{cancel taskgroup} construct, the tasks that
+belong to the taskgroup set of the innermost enclosing taskgroup region will be canceled. 
+
+A task that encounters the \code{cancel taskgroup} construct continues execution at the end of its
+task region. Any task of the taskgroup that has already begun execution will run to completion,
+unless it encounters a \code{cancellation point}; tasks that have not begun execution ``may'' be
+discarded as completed tasks.
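+A sketch of loop cancellation (hypothetical \code{find\_index}): the thread that finds the key
+activates cancellation of the \code{for} region, and the other threads leave the loop at their
+next \code{cancellation point}. Cancellation takes effect only when the \plc{cancel-var} ICV is
+true (for example, with \code{OMP\_CANCELLATION=true}); otherwise the loop runs to completion
+and the result is still correct.

```c
#include <assert.h>

/* Hypothetical helper: returns an index i with v[i] == key, or -1.
   If the key occurs more than once, any matching index may be returned. */
int find_index(const int *v, int n, int key)
{
    int found = -1;
    assert(n >= 0);
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++) {
            if (v[i] == key) {
                #pragma omp atomic write
                found = i;
                #pragma omp cancel for          /* activate cancellation */
            }
            #pragma omp cancellation point for  /* others check here */
        }
    }
    return found;
}
```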
+
+\bigskip
+CONTROL VARIABLES 
+
+  Internal control variables (ICV) are used by implementations to hold values which control the execution
+  of OpenMP regions.  Control (and hence the ICVs) may be set as implementation defaults, 
+  or set and adjusted through environment variables, clauses, and API functions.  Many of the ICV control
+  values are accessible through API function calls.  Also, initial ICV values are reported by the runtime
+  if the \code{OMP\_DISPLAY\_ENV} environment variable has been set to \code{TRUE}. 
+
+ %As an example, the \plc{nthreads-var} is the ICV that holds the number of threads
+ %to be used in a \code{parallel} region.  It can be set with the \code{OMP\_NUM\_THREADS} environment variable, 
+ %the \code{omp\_set\_num\_threads()} API function, or the \code{num\_threads} clause.  The default \plc{nthreads-var}
+ %value is implementation defined.  All of the ICVs are presented in the \plc{Internal Control Variables} section
+ %of the \plc{Directives} chapter of the OpenMP Specifications document.  Within the same document section, override 
+ %relationships and scoping information can be found for applying user specifications and understanding the 
+ %extent of the control.
+
+\bigskip
+NESTED CONSTRUCTS
+
+Certain combinations of nested constructs are permitted, giving rise to a \plc{combined} construct
+consisting of two or more constructs.  These can be used when the two (or several) constructs would be used
+immediately in succession (closely nested). A combined construct can use the clauses of the component
+constructs without restrictions.
+A \plc{composite} construct is a combined construct in which one or more clauses have a
+modified or restricted meaning, relative to when the constructs are uncombined. %%[appear separately (singly).
+
+%The combined \code{parallel do} and \code{parallel for} constructs are formed by combining the \code{parallel}
+%construct with one of the loops constructs \code{do} or \code{for}.  The
+%\code{parallel do SIMD} and \code{parallel for SIMD} constructs are composite constructs (composed from
+%the parallel loop constructs and the \code{SIMD} construct), because the \code{collapse} clause must
+%explicitly address the ordering of loop chunking \plc{and} SIMD "combined" execution.
+
+Certain nestings are forbidden, and often the reasoning is obvious.  Worksharing constructs cannot be closely nested, and
+the \code{barrier} construct cannot be nested inside a worksharing construct, or a \code{critical} construct. 
+Also, \code{target} constructs cannot be nested.  
+
+The \code{parallel} construct can be nested, as well as the \code{task} construct.  The parallel
+execution in nested \code{parallel} construct(s) is controlled by the \code{OMP\_NESTED} and 
+\code{OMP\_MAX\_ACTIVE\_LEVELS} environment variables, and the \code{omp\_set\_nested()} and 
+\code{omp\_set\_max\_active\_levels()} functions.
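The following sketch, assuming an OpenMP 4.5 runtime, enables two levels of active parallelism with \code{omp\_set\_nested()} and \code{omp\_set\_max\_active\_levels()} and counts how many threads execute the inner region. The function name is hypothetical. The runtime calls are guarded by \code{\_OPENMP} so that the code also compiles without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Counts the threads that execute the innermost region of a two-level nest. */
int count_nested_threads(void)
{
    int count = 0;                  /* shared by both levels */
#ifdef _OPENMP
    omp_set_nested(1);              /* OpenMP 4.5: nested parallelism off by default */
    omp_set_max_active_levels(2);   /* allow two active levels */
#endif
    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

With nesting enabled and enough resources, the count is 4. If nested parallelism is disabled, it is typically 2, and a serial build returns 1.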
+
+More details on nesting can be found in the \plc{Nesting of Regions} section of the \plc{Directives} 
+chapter in the OpenMP Specifications document.
--- a/Chap_synchronization.tex
+++ b/Chap_synchronization.tex
@ -0,0 +1,69 @@
+\pagebreak
+\chapter{Synchronization}
+\label{chap:synchronization}
+
+The \code{barrier} construct is a stand-alone directive that requires all threads
+of a team (within a contention group) to execute the barrier and complete
+execution of all tasks within the region, before continuing past the barrier.
+
+The \code{critical} construct is a directive that contains a structured block. 
+The construct allows only a single thread at a time to execute the structured block (region).
+Multiple critical regions may exist in a parallel region, and may
+act cooperatively (only one thread at a time in all \code{critical} regions that share a name,
+including all unnamed ones), or separately (only one thread at a time in each \code{critical} region when
+a unique name is supplied on each \code{critical} construct).
+An optional (lock) \code{hint} clause may be specified on a named \code{critical} 
+construct to provide the OpenMP runtime guidance in selecting a locking 
+mechanism.
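A minimal sketch of two named \code{critical} constructs (the function and region names are hypothetical). Because the names differ, a thread updating the positive total does not block a thread updating the negative total. In production code a \code{reduction} clause would usually be preferable. The \code{critical} form is used here only for illustration, and the \code{hint} clause is not shown.

```c
/* Two independent shared totals, each protected by its own named critical region. */
void sum_pos_neg(const int *x, int n, long *pos, long *neg)
{
    long p = 0, q = 0;            /* shared in the parallel region */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (x[i] >= 0) {
            #pragma omp critical (pos_lock)
            p += x[i];
        } else {
            #pragma omp critical (neg_lock)
            q += x[i];
        }
    }
    *pos = p;
    *neg = q;
}
```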
+
+On a finer scale, the \code{atomic} construct allows only a single thread at 
+a time to have atomic access to a storage location involving a single read, 
+write, update or capture statement, and a limited number of combinations 
+when specifying the \code{capture} \plc{atomic-clause} clause.  The \plc{atomic-clause} clause
+is required for some expression statements, but is not required for 
+\code{update} statements. Please see the details in the \plc{atomic Construct} 
+subsection of the \plc{Directives} chapter in the OpenMP Specifications document.
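A sketch of the \code{capture} and \code{update} forms (the names are hypothetical). The captured statement \code{mine = next++;} reads the old value of \plc{next} and increments it as one atomic operation, which is how each thread claims a unique slot. The statement \code{hits++} is a plain atomic update.

```c
/* Each thread claims unique slots with an atomic capture (fetch-and-add). */
int fill_slots(int *slots, int nslots)
{
    int next = 0;   /* shared index of the next free slot */
    int hits = 0;   /* shared counter, updated atomically */
    #pragma omp parallel
    {
        for (;;) {
            int mine;
            #pragma omp atomic capture
            mine = next++;              /* fetch old value, then increment */
            if (mine >= nslots)
                break;
            slots[mine] = mine * mine;  /* slot is owned by this thread only */
            #pragma omp atomic update
            hits++;
        }
    }
    return hits;
}
```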
+
+% The following three sentences were stolen from the spec.
+The \code{ordered} construct can specify a structured block in a loop, 
+\code{simd}, or loop SIMD region that will be executed in the order of the loop 
+iterations.  The ordered construct sequentializes and orders the execution 
+of ordered regions while allowing code outside the region to run in parallel.
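A minimal sketch of the \code{ordered} clause and construct (the function name is hypothetical). The squares are computed in parallel, but the stores into \plc{out} happen inside the \code{ordered} region. They therefore occur in iteration order, even with \code{dynamic} scheduling.

```c
/* Writes results into out[] in loop-iteration order via an ordered region. */
int ordered_squares(int *out, int n)
{
    int pos = 0;   /* shared; advanced only inside the ordered region */
    #pragma omp parallel for ordered schedule(dynamic)
    for (int i = 0; i < n; i++) {
        int sq = i * i;           /* computed in parallel */
        #pragma omp ordered
        {
            out[pos++] = sq;      /* executed in iteration order */
        }
    }
    return pos;
}
```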
+
+Since OpenMP 4.5, the \code{ordered} construct can also be a stand-alone 
+directive that specifies cross-iteration dependences in a doacross loop nest.  
+The \code{depend} clause uses a \code{sink} \plc{dependence-type}, along with an 
+iteration vector argument (vec) to indicate the iteration that satisfies the 
+dependence.  The \code{depend} clause with a \code{source}
+\plc{dependence-type} specifies that the current iteration has satisfied the dependences that later iterations wait on.
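As a sketch of a doacross loop (the prefix-sum computation is illustrative, not taken from the specification), each iteration waits for the previous iteration with \code{depend(sink: i-1)` and then signals completion with \code{depend(source)}. The \code{ordered(1)} clause on the loop construct declares a one-deep doacross nest. For the first iteration, the sink refers to iteration 0, which lies outside the iteration space, so that dependence is ignored.

```c
/* In-place prefix sum with a doacross loop: iteration i waits for iteration i-1. */
void prefix_sum(double *a, int n)
{
    #pragma omp parallel for ordered(1)
    for (int i = 1; i < n; i++) {
        #pragma omp ordered depend(sink: i - 1)
        a[i] += a[i - 1];   /* needs the finished value from iteration i-1 */
        #pragma omp ordered depend(source)
    }
}
```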
+
+The \code{flush} directive is a stand-alone construct that makes a thread's 
+temporary view of a variable consistent with memory, where a consistent view
+of the variable storage can be accessed.  When the construct is used without 
+a variable list, all the locally thread-visible data as defined by the 
+base language are flushed.  A construct with a list applies the flush 
+operation only to the items in the list.  The \code{flush} construct also 
+effectively ensures that no memory (load or store) operation for
+the variable set (list items, or default set) may be reordered across 
+the \code{flush} directive. 
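The following sketch follows the usual producer/consumer pattern (the names are hypothetical). It combines \code{flush} with \code{atomic} \code{write} and \code{read} on a flag. The first flush keeps the write of \plc{data} from being reordered after the write of \plc{flag`. The consumer's final flush keeps the read of \plc{data} after the point where it observes the flag. If the runtime provides only one thread, no consumer runs and the function returns -1.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Thread 0 produces a value; thread 1 spins on a flag, then consumes the value. */
int produce_consume(void)
{
    int data = 0, flag = 0, result = -1;
    #pragma omp parallel num_threads(2) shared(data, flag, result)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        if (tid == 0) {
            data = 42;
            #pragma omp flush          /* data is written before the flag */
            #pragma omp atomic write
            flag = 1;
            #pragma omp flush(flag)
        } else {
            int ready = 0;
            while (!ready) {
                #pragma omp flush(flag)
                #pragma omp atomic read
                ready = flag;
            }
            #pragma omp flush          /* data is read after the flag */
            result = data;
        }
    }
    return result;
}
```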
+
+General-purpose routines provide mutual exclusion semantics through locks, 
+represented by lock variables.  
+The semantics allow a task to \plc{set}, and hence 
+\plc{own}, a lock until it is \plc{unset} by the task that set it. A 
+\plc{nestable} lock can be set multiple times by the owning task, and is used
+when code requires nested acquisition of a lock.  A \plc{simple lock} can
+only be set once by the owning task; it must be unset before it can be set again. There are specific calls for the two
+types of locks, and the variable of a specific lock type cannot be used by the
+other lock type.  
+
+Any explicit task will observe the synchronization prescribed in a 
+\code{barrier} construct and an implied barrier.  Also, additional synchronizations 
+are available for tasks.  A task that encounters a \code{taskwait} construct waits
+until all of its child tasks have completed.  A \code{taskgroup} construct creates a region in which the
+current task is suspended at the end of the region until all child tasks generated in the region, 
+and their descendants, have completed. 
+Scheduling constraints on task execution can be prescribed by the \code{depend}
+clause to enforce dependence on previously generated sibling tasks.
+More details on controlling task executions can be found in the \plc{Tasking} Chapter
+in the OpenMP Specifications document. %(DO REF. RIGHT.)
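A minimal sketch of task synchronization (the function name is hypothetical). The \code{depend} clauses order two sibling tasks that communicate through \plc{x}. The \code{taskwait} construct then makes the generating task wait for both of its child tasks before \plc{y} is read.

```c
/* Orders two sibling tasks with depend clauses, then waits for them with taskwait. */
int task_chain(void)
{
    int x = 0, y = 0;   /* shared in the parallel region */
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(x) depend(out: x)
        x = 10;                    /* producer task */

        #pragma omp task shared(x, y) depend(in: x)
        y = x + 1;                 /* runs only after the producer completes */

        #pragma omp taskwait       /* current task waits for both children */
    }
    return y;
}
```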
--- a/Chap_tasking.tex
+++ b/Chap_tasking.tex
@ -0,0 +1,51 @@
+\pagebreak
+\chapter{Tasking}
+\label{chap:tasking}
+
+Tasking constructs provide units of work to a thread for execution.  
+Worksharing constructs do this, too (e.g., \code{for}, \code{do}, 
+\code{sections}, and \code{single} constructs); 
+but the work units are tightly controlled by an iteration limit and limited 
+scheduling, or a limited number of \code{sections} or \code{single} regions. 
+Worksharing was designed 
+with \texttt{"}data parallel\texttt{"} computing in mind.  Tasking was designed for 
+\texttt{"}task parallel\texttt{"} computing and often involves non-locality or irregularity 
+in memory access.
+
+The \code{task} construct can be used to execute work chunks: in a while loop; 
+while traversing nodes in a list; at nodes in a tree graph; 
+or in a normal loop (with a \code{taskloop} construct).  
+Unlike the statically scheduled loop iterations of worksharing, a task is 
+often enqueued, and then dequeued for execution by any of the threads of the
+team within a parallel region. The generation of tasks can be from a single 
+generating thread (creating sibling tasks), or from multiple generators
+in a recursive tree traversal. 
+%(creating a parent-descendents hierarchy of tasks, see example 4 and 7  below). 
+A \code{taskloop} construct
+bundles iterations of an associated loop into tasks, and provides 
+similar controls found in the \code{task} construct.
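The list-traversal case can be sketched as follows. The \plc{node} type and the \plc{work()} function are hypothetical stand-ins. A single thread walks the list and generates one task per node, and any thread of the team may execute those tasks.

```c
#include <stddef.h>

struct node {                 /* hypothetical list node */
    int value;
    int result;
    struct node *next;
};

static int work(int v) { return v * v; }   /* stand-in for real per-node work */

/* One thread generates a task per node; the team executes the tasks. */
void process_list(struct node *head)
{
    #pragma omp parallel
    #pragma omp single
    {
        for (struct node *p = head; p != NULL; p = p->next) {
            #pragma omp task firstprivate(p)
            p->result = work(p->value);
        }
        #pragma omp taskwait
    }
}
```

The \code{taskwait} is optional here, since the implicit barrier at the end of the \code{single} region also completes the tasks.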
+
+A task waits for the completion of its child tasks (which are siblings of one another) at a \code{taskwait}
+construct, and tasks and their descendant tasks can be synchronized by containing them in
+a \code{taskgroup} region.  Ordered execution of sibling tasks is accomplished by specifying
+dependences with a \code{depend} clause. Also, priorities can be
+specified as hints to the scheduler through a \code{priority} clause.
+
+Various clauses can be used to manage and optimize task generation,
+to reduce the overhead of execution, and to relinquish 
+control of threads for work balance and forward progress. 
+
+By default, once a thread starts executing a task, it is the designated thread 
+for executing the task to completion, even though it may leave the
+execution at a scheduling point and return later; the task is \plc{tied} to the thread.
+Scheduling points can be introduced with the \code{taskyield}
+construct.  With an \code{untied} clause, any other thread is allowed to resume
+the task after it is suspended.  An \code{if} clause whose expression evaluates to \plc{false}
+generates an undeferred task, so the generating task is suspended until the new task completes.
+Task generation overhead can be reduced with the \code{final} clause, which makes all descendant tasks
+\plc{included} tasks that are executed sequentially within the generating task region, and with the \code{mergeable}
+clause, which allows an undeferred or included task to be merged into its generating task and share its data environment.
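A classic sketch of the \code{final} and \code{mergeable} clauses is a recursive Fibonacci computation. The cutoff value and the function names below are hypothetical, and the cutoff would need tuning. Once \plc{n} drops to the cutoff, the generated tasks are final, so all of their descendant tasks are included tasks. This avoids most of the task-creation overhead for small subproblems.

```c
#define CUTOFF 15   /* hypothetical threshold; tune for the target machine */

/* Recursive Fibonacci; below CUTOFF, final() turns descendant tasks into included tasks. */
long fib(int n)
{
    long x, y;
    if (n < 2)
        return n;
    #pragma omp task shared(x) final(n <= CUTOFF) mergeable
    x = fib(n - 1);
    #pragma omp task shared(y) final(n <= CUTOFF) mergeable
    y = fib(n - 2);
    #pragma omp taskwait   /* x and y are complete after this point */
    return x + y;
}

long fib_parallel(int n)
{
    long r;   /* shared; written by the single thread */
    #pragma omp parallel
    #pragma omp single
    r = fib(n);
    return r;
}
```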
+
+A complete list of the tasking constructs and details of their clauses
+can be found in the \plc{Tasking Constructs} chapter of the OpenMP Specifications,
+in the \plc{OpenMP Application Programming Interface} section.
+
--- a/Examples_Chapt.tex
+++ b/Examples_Chapt.tex
@ -1,9 +1,21 @@

 \chapter*{Examples}
 \label{chap:examples}
+\addcontentsline{toc}{chapter}{\protect\numberline{}Examples}
 The following are examples of the OpenMP API directives, constructs, and routines.
 \ccppspecificstart
 A statement following a directive is compound only when necessary, and a 
 non-compound statement is indented with respect to a directive preceding it.
 \ccppspecificend

+Each example is labeled as \plc{ename.seqno.ext}, where \plc{ename} is 
+the example name, \plc{seqno} is the sequence number in a section, and 
+\plc{ext} is the source file extension to indicate the code type and 
+source form.  \plc{ext} is one of the following:
+\begin{compactitem}
+\item \plc{c} -- C code,
+\item \plc{cpp} -- C++ code,
+\item \plc{f} -- Fortran code in fixed form, and
+\item \plc{f90} -- Fortran code in free form.
+\end{compactitem}
+
--- a/Examples_SIMD.tex
+++ b/Examples_SIMD.tex
@ -1,17 +1,13 @@
-\pagebreak
-\chapter{SIMD Constructs}
-\label{chap:SIMD}
+%\pagebreak
+\section{\code{simd} and \code{declare} \code{simd} Constructs}
+\label{sec:SIMD}

-The following examples illustrate the use of SIMD constructs for vectorization.
+The following example illustrates the basic use of the \code{simd} construct 
+to assure the compiler that the loop can be vectorized.

-Compilers may not vectorize loops when they are complex or possibly have 
-dependencies, even though the programmer is certain the loop will execute 
-correctly as a vectorized loop.  The \code{simd} construct assures the compiler 
-that the loop can be vectorized.
+\cexample{SIMD}{1}

-\cexample{SIMD}{1c}
-
-\fexample{SIMD}{1f}
+\ffreeexample{SIMD}{1}
 

 When a function can be inlined within a loop the compiler has an opportunity to 
@ -42,9 +38,9 @@ In the \code{simd} constructs for the loops the \code{private(tmp)} clause is
 necessary to assure that the each vector operation has its own \plc{tmp} 
 variable.

-\cexample{SIMD}{2c}
+\cexample{SIMD}{2}

-\fexample{SIMD}{2f}
+\ffreeexample{SIMD}{2}


 A thread that encounters a SIMD construct executes a vectorized code of the 
@ -54,9 +50,9 @@ privatized and declared as reductions with clauses.  The example below
 illustrates the use of \code{private} and \code{reduction} clauses in a SIMD 
 construct.

-\cexample{SIMD}{3c}
+\cexample{SIMD}{3}

-\fexample{SIMD}{3f}
+\ffreeexample{SIMD}{3}


 A \code{safelen(N)} clause in a \code{simd} construct assures the compiler that 
@ -69,9 +65,9 @@ code is safe for vectors up to and including size 16.  In the loop, \plc{m} can
 be 16 or greater, for correct code execution.  If the value of \plc{m} is less 
 than 16, the behavior is undefined.

-\cexample{SIMD}{4c}
+\cexample{SIMD}{4}

-\fexample{SIMD}{4f}
+\ffreeexample{SIMD}{4}


 The following SIMD construct instructs the compiler to collapse the \plc{i} and 
@ -79,11 +75,15 @@ The following SIMD construct instructs the compiler to collapse the \plc{i} and
 threads of the team. Within the workshared loop chunks of a thread, the SIMD 
 chunks are executed in the lanes of the vector units.

-\cexample{SIMD}{5c}
+\cexample{SIMD}{5}

-\fexample{SIMD}{5f}
+\ffreeexample{SIMD}{5}


+%%% section
+\section{\code{inbranch} and \code{notinbranch} Clauses}
+\label{sec:SIMD_branch}
+
 The following examples illustrate the use of the \code{declare} \code{simd} 
 construct with the \code{inbranch} and \code{notinbranch} clauses. The 
 \code{notinbranch} clause informs the compiler that the function \plc{foo} is 
@ -92,9 +92,9 @@ the other hand, the \code{inbranch} clause for the function goo indicates that
 the function is always called conditionally in the SIMD loop inside 
 the function \plc{myaddfloat}.

-\cexample{SIMD}{6c}
+\cexample{SIMD}{6}

-\fexample{SIMD}{6f}
+\ffreeexample{SIMD}{6}


 In the code below, the function \plc{fib()} is called in the main program and 
@ -103,7 +103,24 @@ condition. The compiler creates a masked vector version and a non-masked vector
 version for the function \plc{fib()} while retaining the original scalar 
 version of the \plc{fib()} function.

-\cexample{SIMD}{7c}
+\cexample{SIMD}{7}

-\fexample{SIMD}{7f}
+\ffreeexample{SIMD}{7}
+
+
+
+%%% section
+\section{Loop-Carried Lexical Forward Dependence}
+\label{sec:SIMD_forward_dep}
+
+
+The following example tests the restriction on a SIMD loop with a loop-carried lexical forward dependence. This dependence must be preserved for the correct execution of SIMD loops.
+
+A loop can be vectorized even though the iterations are not completely independent when it has loop-carried dependences that are forward lexical dependences, indicated in the code below by the read of \plc{A[j+1]} and the write to \plc{A[j]} in C/C++ code (or \plc{A(j+1)} and \plc{A(j)} in Fortran). That is, the read of \plc{A[j+1]} (or \plc{A(j+1)} in Fortran) in iteration \plc{j} must occur before the write to the same element in iteration \plc{j+1}, and this ordering must be preserved for valid SIMD code generation.
+
+This test assures that the compiler preserves the loop-carried lexical forward dependence when generating correct SIMD code.
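The kind of loop described above can be sketched as follows (the function name and the array \plc{B} are hypothetical). Iteration \plc{j} reads \plc{A[j+1]} before iteration \plc{j+1} overwrites it. Because the read precedes the write in program order, a SIMD version that keeps statement order within each vector chunk still reads the old values.

```c
/* Loop-carried lexical forward dependence: A[j+1] is read in iteration j
   before iteration j+1 writes it, so the loop is safe to vectorize. */
void shift_add(float *A, const float *B, int n)
{
    #pragma omp simd
    for (int j = 0; j < n - 1; j++)
        A[j] = A[j + 1] + B[j];
}
```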
+
+\cexample{SIMD}{8}
+
+\ffreeexample{SIMD}{8}

--- a/Examples_affinity.tex
+++ b/Examples_affinity.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{The \code{proc\_bind} Clause}
-\label{chap:affinity}
+\section{The \code{proc\_bind} Clause}
+\label{sec:affinity}

 The following examples demonstrate how to use the \code{proc\_bind} clause to 
 control the thread binding for a team of threads in a \code{parallel} region. 
@ -25,16 +25,18 @@ or

 \code{OMP\_PLACES=\texttt{"}\{0:2\}:8:2\texttt{"}}

-\section{Spread Affinity Policy}
+\subsection{Spread Affinity Policy}
+\label{subsec:affinity_spread}
+

 The following example shows the result of the \code{spread} affinity policy on 
 the partition list when the number of threads is less than or equal to the number 
 of places in the parent's place partition, for the machine architecture depicted 
 above. Note that the threads are bound to the first place of each subpartition.

-\cexample{affinity}{1c}
+\cexample{affinity}{1}

-\fexample{affinity}{1f}
+\fexample{affinity}{1}

 It is unspecified on which place the master thread is initially started. If the 
 master thread is initially started on p0, the following placement of threads will 
@ -73,9 +75,9 @@ parent's place partition. The first \plc{T/P} threads of the team (including the
 thread) execute on the parent's place. The next \plc{T/P} threads execute on the next 
 place in the place partition, and so on, with wrap around. 

-\cexample{affinity}{2c}
+\cexample{affinity}{2}

-\fexample{affinity}{2f}
+\ffreeexample{affinity}{2}

 It is unspecified on which place the master thread is initially started. If the 
 master thread is initially started on p0, the following placement of threads will 
@ -120,16 +122,17 @@ and distribution of the place partition would be as follows:
 \item threads 14,15 execute on p1 with the place partition p1
 \end{compactitem}

-\section{Close Affinity Policy}
+\subsection{Close Affinity Policy}
+\label{subsec:affinity_close}

 The following example shows the result of the \code{close} affinity policy on 
 the partition list when the number of threads is less than or equal to the number 
 of places in parent's place partition, for the machine architecture depicted above. 
 The place partition is not changed by the \code{close} policy.

-\cexample{affinity}{3c}
+\cexample{affinity}{3}

-\fexample{affinity}{3f}
+\fexample{affinity}{3}

 It is unspecified on which place the master thread is initially started. If the 
 master thread is initially started on p0, the following placement of threads will 
@ -168,9 +171,9 @@ thread) execute on the parent's place. The next \plc{T/P} threads execute on the
 place in the place partition, and so on, with wrap around. The place partition 
 is not changed by the \code{close} policy.

-\cexample{affinity}{4c}
+\cexample{affinity}{4}

-\fexample{affinity}{4f}
+\ffreeexample{affinity}{4}

 It is unspecified on which place the master thread is initially started. If the 
 master thread is initially running on p0, the following placement of threads will 
@ -215,15 +218,16 @@ and distribution of the place partition would be as follows:
 \item threads 14,15 execute on p1 with the place partition p0-p7
 \end{compactitem}

-\section{Master Affinity Policy}
+\subsection{Master Affinity Policy}
+\label{subsec:affinity_master}

 The following example shows the result of the \code{master} affinity policy on 
 the partition list for the machine architecture depicted above. The place partition 
 is not changed by the master policy.

-\cexample{affinity}{5c}
+\cexample{affinity}{5}

-\fexample{affinity}{5f}
+\fexample{affinity}{5}

 It is unspecified on which place the master thread is initially started. If the 
 master thread is initially running on p0, the following placement of threads will 
--- a/Examples_affinity_query.tex
+++ b/Examples_affinity_query.tex
@ -0,0 +1,43 @@
+\section{Affinity Query Functions}
+\label{sec: affinity_query}
+
+In the example below a team of threads is generated on each socket of
+the system, using nested parallelism. Several query functions are used
+to gather information to support the creation of the teams and to obtain 
+socket and thread numbers.
+
+For proper execution of the code, the user must create a place partition, such that
+each place is a listing of the core numbers for a socket. For example,
+in a 2-socket system with 8 cores in each socket, and cores numbered sequentially
+within each socket, the \code{OMP\_PLACES} variable would be set
+to \texttt{"}\{0:8\},\{8:8\}\texttt{"}, using the place syntax \{\plc{lower\_bound}:\plc{length}:\plc{stride}\},
+and the default stride of 1.
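To make the place syntax concrete, the hypothetical helper below expands one interval \{\plc{lower\_bound}:\plc{length}:\plc{stride}\} into the processor numbers it denotes. It is not an OpenMP API routine, because the runtime parses \code{OMP\_PLACES} itself. With the setting above, \{8:8\} denotes processors 8 through 15.

```c
/* Hypothetical helper: expand one place interval {lower:length:stride}
   into processor numbers. Writes at most max entries; returns the count. */
int expand_place(int lower, int length, int stride, int *procs, int max)
{
    int count = 0;
    for (int k = 0; k < length && count < max; k++)
        procs[count++] = lower + k * stride;
    return count;
}
```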
+
+The code determines the number of sockets (\plc{n\_sockets})
+using the \code{omp\_get\_num\_places()} query function.
+In this example each place is constructed with a list of 
+each socket's core numbers, hence the number of places is equal
+to the number of sockets. 
+
+The outer parallel region forms a team of threads, and each thread 
+executes on a socket (place) because the \code{proc\_bind} clause uses 
+\code{spread} in the outer \code{parallel} construct.
+Next, in the \plc{socket\_init} function, an inner parallel region creates a team 
+with as many threads as there are elements (core numbers) in the place
+of the parent thread. Because the outer \code{parallel} construct uses 
+a \code{spread} affinity policy, each of its threads inherits a subpartition of 
+the original partition.  Hence, the \code{omp\_get\_place\_num\_procs} query function
+returns the number of elements (here procs = cores) in the subpartition of the thread.  
+After each parent thread creates its nested parallel region on its socket,
+the socket number and thread number are reported.
+
+Note: Portable tools like hwloc (Portable HardWare LOCality package), which support
+many common operating systems, can be used to determine the configuration of a system.  
+On some systems there are utilities, files or user guides that provide configuration
+information.  For instance, the socket number and processor IDs for a socket 
+can be found in the \code{/proc/cpuinfo} text file on Linux systems.
+
+\cexample{affinity}{6}
+
+\ffreeexample{affinity}{6}
+
--- a/Examples_array_sections.tex
+++ b/Examples_array_sections.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{Array Sections in Device Constructs}
-\label{chap:array_sections}
+\section{Array Sections in Device Constructs}
+\label{sec:array_sections}

 The following examples show the usage of array sections in \code{map} clauses 
 on \code{target} and \code{target} \code{data} constructs.
@ -8,28 +8,28 @@ on \code{target} and \code{target} \code{data} constructs.
 This example shows the invalid usage of two seperate sections of the same array 
 inside of a \code{target} construct.

-\cexample{array_sections}{1c}
+\cexample{array_sections}{1}

-\fexample{array_sections}{1f}
+\ffreeexample{array_sections}{1}

 This example shows the invalid usage of two separate sections of the same array 
 inside of a \code{target} construct.

-\cexample{array_sections}{2c}
+\cexample{array_sections}{2}

-\fexample{array_sections}{2f}
+\ffreeexample{array_sections}{2}

 This example shows the valid usage of two separate sections of the same array inside 
 of a \code{target} construct.

-\cexample{array_sections}{3c}
+\cexample{array_sections}{3}

-\fexample{array_sections}{3f}
+\ffreeexample{array_sections}{3}

 This example shows the valid usage of a wholly contained array section of an already 
 mapped array section inside of a \code{target} construct.

-\cexample{array_sections}{4c}
+\cexample{array_sections}{4}

-\fexample{array_sections}{4f}
+\ffreeexample{array_sections}{4}

--- a/Examples_associate.tex
+++ b/Examples_associate.tex
@ -1,7 +1,7 @@
 \pagebreak
-\chapter{Fortran \code{ASSOCIATE} Construct}
+\section{Fortran \code{ASSOCIATE} Construct}
 \fortranspecificstart
-\label{chap:associate}
+\label{sec:associate}

 The following is an invalid example of specifying an associate name on a data-sharing attribute 
 clause. The constraint in the Data Sharing Attribute Rules section in the OpenMP 
@ -11,13 +11,13 @@ name \plc{b} is associated with the shared variable \plc{a}. With the predetermi
 attribute rule, the associate name \plc{b} is not allowed to be specified on the \code{private} 
 clause.

-\fnexample{associate}{1f}
+\fnexample{associate}{1}

 In next example, within the \code{parallel} construct, the association name \plc{thread\_id} 
 is associated with the private copy of \plc{i}. The print statement should output the 
 unique thread number.

-\fnexample{associate}{2f}
+\fnexample{associate}{2}

 The following example illustrates the effect of specifying a selector name on a data-sharing 
 attribute clause. The associate name \plc{u} is associated with \plc{v} and the variable \plc{v} 
@ -27,6 +27,6 @@ The association between \plc{u} and the original \plc{v} is retained (see the Da
 Attribute Rules section in the OpenMP 4.0 API Specifications). Inside the \code{parallel} 
 region, \plc{v} has the value of -1 and \plc{u} has the value of the original \plc{v}.

-\fnexample{associate}{3f}
+\ffreenexample{associate}{3}
 \fortranspecificend

--- a/Examples_async_target_depend.tex
+++ b/Examples_async_target_depend.tex
@ -0,0 +1,15 @@
+\pagebreak
+\section{Asynchronous \code{target} Execution and Dependences}
+\label{sec:async_target_exec_depend}
+
+Asynchronous execution of a \code{target} region can be accomplished
+by creating an explicit task around the \code{target} region. Examples
+with explicit tasks are shown at the beginning of this section. 
+
+As of OpenMP 4.5, the \code{nowait} clause can be used on the
+\code{target} directive for asynchronous execution. Examples with 
+\code{nowait} clauses follow the explicit \code{task} examples.
+
+This section also shows the use of \code{depend} clauses to order 
+executions through dependences.
+
--- a/Examples_async_target_nowait.tex
+++ b/Examples_async_target_nowait.tex
@ -0,0 +1,31 @@
+\subsection{\code{nowait} Clause on \code{target} Construct}
+\label{subsec:target_nowait_clause}
+
+The following example shows how to execute code asynchronously on a 
+device without an explicit task. The \code{nowait} clause on a \code{target} 
+construct allows the thread of the \plc{target task} to perform other
+work while waiting for the \code{target} region execution to complete. 
+Hence, the \code{target} region can execute asynchronously on the 
+device (without requiring a host thread to idle while waiting for 
+the \plc{target task} execution to complete).
+
+In this example the product of two vectors (arrays), \plc{v1}
+and \plc{v2}, is formed. One half of the operations is performed
+on the device, and the other half on the host, concurrently.
+
+After a team of threads is formed, the master thread generates 
+the \plc{target task} while the other threads can continue on, without a barrier,
+to the execution of the host portion of the vector product.
+The completion of the \plc{target task} (asynchronous target execution) is 
+guaranteed by the synchronization in the implicit barrier at the end of the 
+host vector-product worksharing loop region. See the \code{barrier} 
+glossary entry in the OpenMP specification for details.
+
+The host loop scheduling is \code{dynamic}, to balance the host thread executions, since 
+one thread is being used for offload generation. In the situation where 
+little time is spent by the \plc{target task} in setting 
+up and tearing down the target execution, \code{static} scheduling may be desired. 
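A sketch of the pattern described above, assuming an OpenMP 4.5 compiler (the function and parameter names are hypothetical). The master thread generates a deferred \plc{target task} for the first half of the vector product. The remaining threads do not wait at the end of the \code{master} construct, so they start the host half immediately.

```c
/* Half of p = v1*v2 is offloaded with target nowait; the rest runs on the host. */
void vec_mult_async(float *p, float *v1, float *v2, int n, int chunk)
{
    #pragma omp parallel
    {
        #pragma omp master
        #pragma omp target teams distribute parallel for nowait \
                map(to: v1[0:n/2], v2[0:n/2]) map(from: p[0:n/2])
        for (int i = 0; i < n / 2; i++)
            p[i] = v1[i] * v2[i];

        /* No barrier after master: other threads start here at once. */
        #pragma omp for schedule(dynamic, chunk)
        for (int i = n / 2; i < n; i++)
            p[i] = v1[i] * v2[i];
        /* The implicit barrier of the for construct also completes the target task. */
    }
}
```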
+
+\cexample{async_target}{3}
+
+\ffreeexample{async_target}{3}
--- a/Examples_async_target_nowait_depend.tex
+++ b/Examples_async_target_nowait_depend.tex
@ -0,0 +1,18 @@
+%begin 
+\subsection{Asynchronous \code{target} with \code{nowait} and \code{depend} Clauses}
+\label{subsec:async_target_nowait_depend}
+
+More details on dependences can be found in \specref{sec:task_depend}, Task 
+Dependences. In this example, there are three flow dependences. In the first two dependences, the
+target task does not execute until the preceding explicit tasks have finished. These 
+dependences are produced by arrays \plc{v1} and \plc{v2} with the \code{out} dependence type in the first two tasks, and the \code{in} dependence type in the target task.
+
+The last dependence is produced by array \plc{p} with the \code{out} dependence type in the target task, and the \code{in} dependence type in the last task. The last task does not execute until the target task finishes.
+
+The \code{nowait} clause on the \code{target} construct creates a deferrable \plc{target task}, allowing the encountering task to continue execution without waiting for the completion of the \plc{target task}.
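A sketch of the three flow dependences described above (the function name and the initial values are hypothetical). Two producer tasks initialize \plc{v1} and \plc{v2}, the deferred \plc{target task} waits on both, and the final task waits on \plc{p}. Array sections are used as the \code{depend} list items.

```c
/* Producer tasks, a deferred target task, and a consumer task ordered by depend. */
float vec_mult_sum(float *p, float *v1, float *v2, int n)
{
    float sum = 0.0f;   /* shared; written only by the last task */
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: v1[0:n])
        for (int i = 0; i < n; i++) v1[i] = (float)i;

        #pragma omp task depend(out: v2[0:n])
        for (int i = 0; i < n; i++) v2[i] = 2.0f;

        #pragma omp target nowait depend(in: v1[0:n], v2[0:n]) depend(out: p[0:n]) \
                map(to: v1[0:n], v2[0:n]) map(from: p[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++) p[i] = v1[i] * v2[i];

        #pragma omp task depend(in: p[0:n]) shared(sum)
        for (int i = 0; i < n; i++) sum += p[i];
    }   /* implicit barrier: all tasks, including the target task, are complete */
    return sum;
}
```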
+
+\cexample{async_target}{4}
+
+\ffreeexample{async_target}{4}
+
+%end
--- a/Examples_async_target_with_tasks.tex
+++ b/Examples_async_target_with_tasks.tex
@ -1,6 +1,5 @@
-\pagebreak
-\chapter{Asynchronous Execution of a \code{target} Region Using Tasks}
-\label{chap:async_target}
+\subsection{Asynchronous \code{target} with Tasks}
+\label{subsec:async_target_with_tasks}

 The following example shows how the \code{task} and \code{target} constructs 
 are used to execute multiple \code{target} regions asynchronously. The task that 
@ -10,45 +9,46 @@ scheduling point while waiting for the execution of the \code{target} region
 to complete, allowing the thread to switch back to the execution of the encountering 
 task or one of the previously generated explicit tasks.

-\cexample{async_target}{1c}
+\cexample{async_target}{1}

 The Fortran version has an interface block that contains the \code{declare} \code{target}. 
 An identical statement exists in the function declaration (not shown here).

-\fexample{async_target}{1f}
+\ffreeexample{async_target}{1}

 The following example shows how the \code{task} and \code{target} constructs 
 are used to execute multiple \code{target} regions asynchronously. The task dependence 
 ensures that the storage is allocated and initialized on the device before it is 
 accessed.

-\cexample{async_target}{2c}
+\cexample{async_target}{2}

 The Fortran example below is similar to the C version above. Instead of pointers, though, it uses
-the convenience of Fortran allocatable arrays on the device. An allocatable array has the
-same behavior in a \code{map} clause as a C pointer, in this case.
+the convenience of Fortran allocatable arrays on the device. In order to preserve the arrays 
+allocated on the device across multiple \code{target} regions, a \code{target}~\code{data} region 
+is used in this case.

 If there is no shape specified for an allocatable array in a \code{map} clause, only the array descriptor
 (also called a dope vector) is mapped. That is, device space is created for the descriptor, and it
 is initially populated with host values. In this case, the \plc{v1} and \plc{v2} arrays will be in a
 non-associated state on the device. When space for \plc{v1} and \plc{v2} is allocated on the device
-the addresses to the space will be included in their descriptors.
+in the first \code{target} region, the addresses to the space will be included in their descriptors.

-At the end of the first \code{target} region, the descriptor (of an unshaped specification of an allocatable
-array in a \code{map} clause) is returned with the raw device address of the allocated space.
-The content of the array is not returned. In the example the data in arrays \plc{v1} and \plc{v2}
-are not returned. In the second \code{target} directive, the \plc{v1} and \plc{v2} descriptors are
-re-created on the device with the descriptive information; and references to the
-vectors point to the correct local storage, of the space that was not freed in the first \code{target}
-directive.  At the end of the second \code{target} region, the data in array \plc{p} is copied back
-to the host since \plc{p} is not an allocatable array.
+At the end of the first \code{target} region, the arrays \plc{v1} and \plc{v2} are preserved on the device 
+for access in the second \code{target} region. At the end of the second \code{target} region, the data 
+in array \plc{p} is copied back to the host; the arrays \plc{v1} and \plc{v2} are not.

 A \code{depend} clause is used in the \code{task} directive to provide a wait at the beginning of the second 
 \code{target} region, to insure that there is no race condition with \plc{v1} and \plc{v2} in the two tasks.
 It would be noncompliant to use \plc{v1} and/or \plc{v2} in lieu of \plc{N} in the \code{depend} clauses, 
-because the use of non-allocated allocatable arrays as list items in the first \code{depend} clause would 
+because the use of non-allocated allocatable arrays as list items in a \code{depend} clause would 
 lead to unspecified behavior. 

-\fexample{async_target}{2f}
-
+\noteheader{--} This example is not strictly compliant with the OpenMP 4.5 specification since the allocation status
+of allocatable arrays \plc{v1} and \plc{v2} is changed inside the \code{target} region, which is not allowed.
+(See the restrictions for the \code{map} clause in the \plc{Data-mapping Attribute Rules and Clauses} 
+section of the specification.)
+However, the intention is to relax the restrictions on mapping of allocatable variables in the next release
+of the specification so that the example will be compliant.

+\ffreeexample{async_target}{2}
--- a/Examples_atomic.tex
+++ b/Examples_atomic.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{The \code{atomic} Construct}
-\label{chap:atomic}
+\section{The \code{atomic} Construct}
+\label{sec:atomic}

 The following example avoids race conditions (simultaneous updates of an element 
 of \plc{x} by multiple threads) by using the \code{atomic} construct .
@ -14,9 +14,9 @@ Note that the \code{atomic} directive applies only to the statement immediately
 following it. As a result, elements of \plc{y} are not updated atomically in 
 this example.

-\cexample{atomic}{1c}
+\cexample{atomic}{1}

-\fexample{atomic}{1f}
+\fexample{atomic}{1}

 The following example illustrates the \code{read} and \code{write}  clauses 
 for the \code{atomic} directive. These clauses ensure that the given variable 
@ -26,9 +26,9 @@ another part of the variable. Note that most hardware provides atomic reads and
 writes for some set of properly aligned variables of specific sizes, but not necessarily 
 for all the variable types supported by the OpenMP API.

-\cexample{atomic}{2c}
+\cexample{atomic}{2}

-\fexample{atomic}{2f}
+\fexample{atomic}{2}

 The following example illustrates the \code{capture} clause for the \code{atomic} 
 directive. In this case the value of a variable is captured, and then the variable 
@ -37,8 +37,8 @@ be implemented using the fetch-and-add instruction available on many kinds of ha
 The example also shows a way to implement a spin lock using the \code{capture} 
 and \code{read} clauses.

-\cexample{atomic}{3c}
+\cexample{atomic}{3}

-\fexample{atomic}{3f}
+\fexample{atomic}{3}


--- a/Examples_atomic_restrict.tex
+++ b/Examples_atomic_restrict.tex
@ -1,25 +1,25 @@
 \pagebreak
-\chapter{Restrictions on the \code{atomic} Construct}
-\label{chap:atomic_restrict}
+\section{Restrictions on the \code{atomic} Construct}
+\label{sec:atomic_restrict}

 The following non-conforming examples illustrate the restrictions on the \code{atomic} 
 construct. 

-\cexample{atomic_restrict}{1c}
+\cexample{atomic_restrict}{1}

-\fexample{atomic_restrict}{1f}
+\fexample{atomic_restrict}{1}

-\cexample{atomic_restrict}{2c}
+\cexample{atomic_restrict}{2}

 \fortranspecificstart
 The following example is non-conforming because \code{I} and \code{R} reference 
 the same location but have different types.

-\fnexample{atomic_restrict}{2f}
+\fnexample{atomic_restrict}{2}

 Although the following example might work on some implementations, this is also 
 non-conforming:

-\fnexample{atomic_restrict}{3f}
+\fnexample{atomic_restrict}{3}
 \fortranspecificend

--- a/Examples_barrier_regions.tex
+++ b/Examples_barrier_regions.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{Binding of \code{barrier} Regions}
-\label{chap:barrier_regions}
+\section{Binding of \code{barrier} Regions}
+\label{sec:barrier_regions}

 The binding rules call for a \code{barrier} region to bind to the closest enclosing 
 \code{parallel} region. 
@ -17,8 +17,8 @@ part. Also note that the \code{barrier} region in \plc{sub3} when called from
 \plc{sub2} only synchronizes the team of threads in the enclosing \code{parallel} 
 region and not all the threads created in \plc{sub1}.

-\cexample{barrier_regions}{1c}
+\cexample{barrier_regions}{1}

-\fexample{barrier_regions}{1f}
+\fexample{barrier_regions}{1}


--- a/Examples_cancellation.tex
+++ b/Examples_cancellation.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{Cancellation Constructs}
-\label{chap:cancellation}
+\section{Cancellation Constructs}
+\label{sec:cancellation}

 The following example shows how the \code{cancel} directive can be used to terminate 
 an OpenMP region. Although the \code{cancel} construct terminates the OpenMP 
@ -11,7 +11,7 @@ exception is properly handled in the sequential part. If cancellation of the \co
 region has been requested, some threads might have executed \code{phase\_1()}. 
 However, it is guaranteed that none of the threads executed \code{phase\_2()}.

-\cexample{cancellation}{1c}
+\cppexample{cancellation}{1}


 The following example illustrates the use of the \code{cancel} construct in error 
@ -20,7 +20,7 @@ the cancellation is activated. The encountering thread sets the shared variable
 \code{err} and other threads of the binding thread set proceed to the end of 
 the worksharing construct after the cancellation has been activated. 

-\fexample{cancellation}{1f}
+\ffreeexample{cancellation}{1}

 The following example shows how to cancel a parallel search on a binary tree as 
 soon as the search value has been detected. The code creates a task to descend 
@ -32,11 +32,11 @@ task group to control the effect of the \code{cancel taskgroup} directive. The
 \plc{level} argument is used to create undeferred tasks after the first ten 
 levels of the tree.

-\cexample{cancellation}{2c}
+\cexample{cancellation}{2}


 The following is the equivalent parallel search example in Fortran.

-\fexample{cancellation}{2f}
+\ffreeexample{cancellation}{2}


--- a/Examples_carrays_fpriv.tex
+++ b/Examples_carrays_fpriv.tex
@ -1,7 +1,7 @@
 \pagebreak
-\chapter{C/C++ Arrays in a \code{firstprivate} Clause}
+\section{C/C++ Arrays in a \code{firstprivate} Clause}
 \ccppspecificstart
-\label{chap:carrays_fpriv}
+\label{sec:carrays_fpriv}

 The following example illustrates the size and value of list items of array or 
 pointer type in a \code{firstprivate} clause. The size of new list items is 
@ -31,7 +31,7 @@ The new items of array type are initialized as if each integer element of the or
 array is assigned to the corresponding element of the new array. Those of pointer 
 type are initialized as if by assignment from the original item to the new item.

-\cnexample{carrays_fpriv}{1c}
+\cnexample{carrays_fpriv}{1}
 \ccppspecificend


--- a/Examples_collapse.tex
+++ b/Examples_collapse.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{The \code{collapse} Clause}
-\label{chap:collapse}
+\section{The \code{collapse} Clause}
+\label{sec:collapse}

 In the following example, the \code{k} and \code{j} loops are associated with 
 the loop construct. So the iterations of the \code{k} and \code{j} loops are 
@ -16,9 +16,9 @@ The variable \code{j} can be omitted from the \code{private}  clause when the
 from the \code{private} clause. In either case, \code{k} is implicitly private 
 and could be omitted from the \code{private}  clause.

-\cexample{collapse}{1c}
+\cexample{collapse}{1}

-\fexample{collapse}{1f}
+\fexample{collapse}{1}

 In the next example, the \code{k} and \code{j} loops are associated with the 
 loop construct. So the iterations of the \code{k} and \code{j} loops are collapsed 
@ -33,9 +33,9 @@ will have the value \code{2} and \code{j} will have the value \code{3}. Since
 by the sequentially last iteration of the collapsed \code{k} and \code{j} loop. 
 This example prints: \code{2 3}.

-\cexample{collapse}{2c}
+\cexample{collapse}{2}

-\fexample{collapse}{2f}
+\fexample{collapse}{2}
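The following C sketch reproduces the pattern described above. It assumes that \code{klast} and \code{jlast} are listed in a \code{lastprivate} clause on a loop collapsed over \code{k} and \code{j}. The function name \code{last\_of\_collapsed} is hypothetical. The values come from the sequentially last iteration of the collapsed iteration space, so the caller sees \code{2} and \code{3} regardless of the number of threads.

```c
/* Hypothetical function: report the k and j values recorded by the
   sequentially last iteration of a collapsed 2 x 3 loop nest. */
void last_of_collapsed(int *klast_out, int *jlast_out)
{
    int j, k, jlast = 0, klast = 0;

    /* k and j are iteration variables of the collapsed loops and are
       therefore private; jlast/klast are copied out after the loop. */
    #pragma omp parallel for collapse(2) lastprivate(jlast, klast)
    for (k = 1; k <= 2; k++)
        for (j = 1; j <= 3; j++) {
            jlast = j;
            klast = k;
        }

    *klast_out = klast;   /* 2 */
    *jlast_out = jlast;   /* 3 */
}
```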

 The next example illustrates the interaction of the \code{collapse} and \code{ordered} 
 clauses.
@ -71,8 +71,8 @@ The code prints
 \\
 \code{1 3 2}

-\cexample{collapse}{3c}
+\cexample{collapse}{3}

-\fexample{collapse}{3f}
+\fexample{collapse}{3}


--- a/Examples_cond_comp.tex
+++ b/Examples_cond_comp.tex
@ -1,13 +1,13 @@
 \pagebreak
-\chapter{Conditional Compilation}
-\label{chap:cond_comp}
+\section{Conditional Compilation}
+\label{sec:cond_comp}

 \ccppspecificstart
 The following example illustrates the use of conditional compilation using the 
 OpenMP macro \code{\_OPENMP}. With OpenMP compilation, the \code{\_OPENMP} 
 macro becomes defined.

-\cnexample{cond_comp}{1c}
+\cnexample{cond_comp}{1}
 \ccppspecificend

 \fortranspecificstart
@ -16,6 +16,6 @@ With OpenMP compilation, the conditional compilation sentinel \code{!\$} is reco
 and treated as two spaces. In fixed form source, statements guarded by the sentinel 
 must start after column 6.

-\fnexample{cond_comp}{1f}
+\fnexample{cond_comp}{1}
 \fortranspecificend

--- a/Examples_copyin.tex
+++ b/Examples_copyin.tex
@ -1,13 +1,13 @@
 \pagebreak
-\chapter{The \code{copyin} Clause}
-\label{chap:copyin}
+\section{The \code{copyin} Clause}
+\label{sec:copyin}

 The \code{copyin} clause is used to initialize threadprivate data upon entry 
 to a \code{parallel} region. The value of the threadprivate variable in the master 
 thread is copied to the threadprivate variable of each other team member.

-\cexample{copyin}{1c}
+\cexample{copyin}{1}

-\fexample{copyin}{1f}
+\fexample{copyin}{1}
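A minimal C sketch of the behavior described above. It assumes a hypothetical file-scope variable \code{counter} declared \code{threadprivate}. On entry to the \code{parallel} region, \code{copyin} initializes every thread's copy from the master thread's copy, so no team member should observe a different starting value. The hypothetical function \code{copyin\_mismatches} returns the number of threads that saw a mismatch, which is expected to be zero.

```c
/* Hypothetical per-thread variable: threadprivate gives each thread its own copy. */
static int counter = 0;
#pragma omp threadprivate(counter)

/* Returns how many team members did NOT see `start` on entry (expected 0). */
int copyin_mismatches(int start)
{
    int mismatches = 0;
    counter = start;                      /* set the master thread's copy */

    #pragma omp parallel copyin(counter) reduction(+:mismatches)
    {
        if (counter != start)             /* each copy was initialized from the master's */
            mismatches++;
        counter += 1;                     /* updates only this thread's copy */
    }
    return mismatches;
}
```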


--- a/Examples_copyprivate.tex
+++ b/Examples_copyprivate.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{The \code{copyprivate} Clause}
-\label{chap:copyprivate}
+\section{The \code{copyprivate} Clause}
+\label{sec:copyprivate}

 The \code{copyprivate} clause can be used to broadcast values acquired by a single 
 thread directly to all instances of the private variables in the other threads. 
@ -16,28 +16,28 @@ The thread that executes the structured block associated with the \code{single}
 of the other implicit tasks in the thread team. The broadcast completes before 
 any of the threads have left the barrier at the end of the construct.

-\cexample{copyprivate}{1c}
+\cexample{copyprivate}{1}

-\fexample{copyprivate}{1f}
+\fexample{copyprivate}{1}

 In this example, assume that the input must be performed by the master thread. 
 Since the \code{master} construct does not support the \code{copyprivate} clause, 
 it cannot broadcast the input value that is read. However, \code{copyprivate} 
 is used to broadcast an address where the input value is stored.

-\cexample{copyprivate}{2c}
+\cexample{copyprivate}{2}

-\fexample{copyprivate}{2f}
+\fexample{copyprivate}{2}

 Suppose that the number of lock variables required within a \code{parallel} region 
 cannot easily be determined prior to entering it. The \code{copyprivate} clause 
 can be used to provide access to shared lock variables that are allocated within 
 that \code{parallel} region.

-\cexample{copyprivate}{3c}
+\cexample{copyprivate}{3}

 \fortranspecificstart
-\fnexample{copyprivate}{3f}
+\fnexample{copyprivate}{3}

 Note that the effect of the \code{copyprivate} clause on a variable with the 
 \code{allocatable} attribute is different than on a variable with the \code{pointer} 
@ -45,7 +45,7 @@ attribute. The value of \code{A} is copied (as if by intrinsic assignment) and
 the pointer \code{B} is copied (as if by pointer assignment) to the corresponding 
 list items in the other implicit tasks belonging to the \code{parallel} region. 

-\fnexample{copyprivate}{4f}
+\fnexample{copyprivate}{4}
 \fortranspecificend


--- a/Examples_cpp_reference.tex
+++ b/Examples_cpp_reference.tex
@ -0,0 +1,14 @@
+\section{C++ Reference in Data-Sharing Clauses}
+\cppspecificstart
+\label{sec:cpp_reference}
+
+C++ reference types are allowed in data-sharing attribute clauses as of OpenMP 4.5, except
+for the \code{threadprivate}, \code{copyin} and \code{copyprivate} clauses.  
+(See the Data-Sharing Attribute Clauses Section of the 4.5 OpenMP specification.)
+When a variable with C++ reference type is privatized, the object the reference refers to is privatized in addition to the reference itself.
+The following example shows the use of reference types in data-sharing clauses in the usual way.
+Additionally, it shows how the data-sharing attribute of formal arguments with a C++ reference type on an orphaned task-generating construct is determined implicitly. (See the Data-sharing Attribute Rules for Variables Referenced in a Construct Section of the 4.5 OpenMP specification.)
+
+
+\cppnexample{cpp_reference}{1}
+\cppspecificend
--- a/Examples_critical.tex
+++ b/Examples_critical.tex
@ -1,16 +1,20 @@
 \pagebreak
-\chapter{The \code{critical} Construct}
-\label{chap:critical}
+\section{The \code{critical} Construct}
+\label{sec:critical}

-The following example includes several \code{critical} constructs . The example 
+The following example includes several \code{critical} constructs. The example 
 illustrates a queuing model in which a task is dequeued and worked on. To guard 
 against multiple threads dequeuing the same task, the dequeuing operation must 
 be in a \code{critical} region. Because the two queues in this example are independent, 
 they are protected by \code{critical} constructs with different names, \plc{xaxis} 
 and \plc{yaxis}.

-\cexample{critical}{1c}
+\cexample{critical}{1}

-\fexample{critical}{1f}
+\fexample{critical}{1}

+The following example extends the previous example by adding the \code{hint} clause to the \code{critical} constructs.

+\cexample{critical}{2}
+
+\fexample{critical}{2}
--- a/Examples_declare_target.tex
+++ b/Examples_declare_target.tex
@ -1,8 +1,9 @@
 \pagebreak
-\chapter{\code{declare} \code{target} Construct}
-\label{chap:declare_target}
+\section{\code{declare} \code{target} Construct}
+\label{sec:declare_target}

-\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for a Function}
+\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for a Function}
+\label{subsec:declare_target_function}

 The following example shows how the \code{declare} \code{target} directive 
 is used to indicate that the corresponding call inside a \code{target} region 
@ -15,7 +16,7 @@ the \code{target} region (thus \code{fib}) will execute on the host device.
 For C/C++ codes the declaration of the function \code{fib} appears between the \code{declare} 
 \code{target} and \code{end} \code{declare} \code{target} directives.

-\cexample{declare_target}{1c}
+\cexample{declare_target}{1}

 The Fortran \code{fib} subroutine contains a \code{declare} \code{target} declaration 
 to indicate to the compiler to create a device executable version of the procedure. 
@ -26,7 +27,7 @@ The program uses the \code{module\_fib} module, which presents an explicit inter
 the compiler with the \code{declare} \code{target} declarations for processing 
 the \code{fib} call.

-\fexample{declare_target}{1f}
+\ffreeexample{declare_target}{1}

 The next Fortran example shows the use of an external subroutine. Without an explicit 
 interface (through module use or an interface block) the \code{declare} \code{target} 
@ -34,9 +35,10 @@ declarations within an external subroutine are unknown to the main program unit;
 therefore, a \code{declare} \code{target} must be provided within the program 
 scope for the compiler to determine that a target binary should be available.

-\fexample{declare_target}{2f}
+\ffreeexample{declare_target}{2}

-\section{\code{declare} \code{target} Construct for Class Type}
+\subsection{\code{declare} \code{target} Construct for Class Type}
+\label{subsec:declare_target_class}

 \cppspecificstart
 The following example shows how the \code{declare} \code{target} and \code{end} 
@ -45,10 +47,11 @@ of a variable \plc{varY} with a class type \code{typeY}. The member function \co
 be accessed on a target device because its declaration did not appear between \code{declare} 
 \code{target} and \code{end} \code{declare} \code{target} directives.

-\cnexample{declare_target}{2c}
+\cppnexample{declare_target}{2}
 \cppspecificend

-\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for Variables}
+\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for Variables}
+\label{subsec:declare_target_variables}

 The following examples show how the \code{declare} \code{target} and \code{end} 
 \code{declare} \code{target} directives are used to indicate that global variables 
@ -62,13 +65,13 @@ is then used to manage the consistency of the variables \plc{p}, \plc{v1}, and \
 data environment of the encountering host device task and the implicit device data 
 environment of the default target device.

-\cexample{declare_target}{3c}
+\cexample{declare_target}{3}

 The Fortran version of the above C code uses a different syntax. Fortran modules 
 use a list syntax on the \code{declare} \code{target} directive to declare 
 mapped variables.

-\fexample{declare_target}{3f}
+\ffreeexample{declare_target}{3}

 The following example also indicates that the function \code{Pfun()} is available on the 
 target device, as well as the variable \plc{Q}, which is mapped to the implicit device 
@ -81,7 +84,7 @@ In the following example, the function and variable declarations appear between
 the \code{declare} \code{target} and \code{end} \code{declare} \code{target} 
 directives.

-\cexample{declare_target}{4c}
+\cexample{declare_target}{4}

 The Fortran version of the above C code uses a different syntax. In Fortran modules 
 a list syntax on the \code{declare} \code{target} directive is used to declare 
@ -90,9 +93,10 @@ separated list. When the \code{declare} \code{target} directive is used to
 declare just the procedure, the procedure name need not be listed -- it is implicitly 
 assumed, as illustrated in the \code{Pfun()} function.

-\fexample{declare_target}{4f}
+\ffreeexample{declare_target}{4}

-\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} with \code{declare} \code{simd}}
+\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} with \code{declare} \code{simd}}
+\label{subsec:declare_target_simd}

 The following example shows how the \code{declare} \code{target} and \code{end} 
 \code{declare} \code{target} directives are used to indicate that a function 
@ -100,7 +104,7 @@ is available on a target device. The \code{declare} \code{simd} directive indica
 that there is a SIMD version of the function \code{P()} that is available on the target 
 device as well as one that is available on the host device.

-\cexample{declare_target}{5c}
+\cexample{declare_target}{5}

 The Fortran version of the above C code uses a different syntax. Fortran modules 
 use a list syntax of the \code{declare} \code{target} declaration for the mapping. 
@ -109,5 +113,30 @@ The function declaration does not use a list and implicitly assumes the function
 name. In this Fortran example row and column indices are reversed relative to the 
 C/C++ example, as is usual for codes optimized for memory access.

-\fexample{declare_target}{5f}
+\ffreeexample{declare_target}{5}
+
+
+\subsection{\code{declare}~\code{target} Directive with \code{link} Clause}
+\label{subsec:declare_target_link}
+
+In the OpenMP 4.5 standard the \code{declare}~\code{target} directive was extended to allow static
+data to be mapped, \emph{when needed}, through a \code{link} clause.
+
+Data storage for items listed in the \code{link} clause becomes available on the device
+when it is mapped implicitly or explicitly in a \code{map} clause, and it persists for the scope of
+the mapping (as specified by a \code{target} construct, 
+a \code{target}~\code{data} construct, or 
+\code{target}~\code{enter/exit}~\code{data} constructs).
+
+Tip: When all the global data items will not fit on a device and are not needed
+simultaneously, use the \code{link} clause and map the data only when it is needed.
+
+The following C and Fortran examples show two sets of data (single precision and double precision)
+that are global on the host for the entire execution on the host, but are only used
+globally on the device for part of the program execution. The single precision data
+are allocated and persist only for the first \code{target} region. Similarly, the
+double precision data are in scope on the device only for the second \code{target} region.
+
+\cexample{declare_target}{6}
+\ffreeexample{declare_target}{6}

--- a/Examples_default_none.tex
+++ b/Examples_default_none.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{The \code{default(none)} Clause}
-\label{chap:default_none}
+\section{The \code{default(none)} Clause}
+\label{sec:default_none}

 The following example distinguishes the variables that are affected by the \code{default(none)} 
 clause from those that are not. 
@ -11,9 +11,9 @@ are no longer predetermined shared.  Thus, these variables (variable \plc{c} in
 need to be explicitly listed
 in data-sharing attribute clauses when the \code{default(none)} clause is specified.

-\cnexample{default_none}{1c}
+\cnexample{default_none}{1}
 \ccppspecificend

-\fexample{default_none}{1f}
+\fexample{default_none}{1}


--- a/Examples_device.tex
+++ b/Examples_device.tex
@ -1,35 +1,57 @@
 \pagebreak
-\chapter{Device Routines}
-\label{chap:device}
+\section{Device Routines}
+\label{sec:device}

-\section{\code{omp\_is\_initial\_device} Routine}
+\subsection{\code{omp\_is\_initial\_device} Routine}
+\label{subsec:device_is_initial}

 The following example shows how the \code{omp\_is\_initial\_device} runtime library routine 
 can be used to query if a code is executing on the initial host device or on a 
 target device. The example then sets the number of threads in the \code{parallel} 
 region based on where the code is executing.

-\cexample{device}{1c}
+\cexample{device}{1}

-\fexample{device}{1f}
+\ffreeexample{device}{1}

-\section{\code{omp\_get\_num\_devices} Routine}
+\subsection{\code{omp\_get\_num\_devices} Routine}
+\label{subsec:device_num_devices}

 The following example shows how the \code{omp\_get\_num\_devices} runtime library routine 
 can be used to determine the number of devices.

-\cexample{device}{2c}
+\cexample{device}{2}

-\fexample{device}{2f}
+\ffreeexample{device}{2}

-\section{\code{omp\_set\_default\_device} and \\
+\subsection{\code{omp\_set\_default\_device} and \\
 \code{omp\_get\_default\_device} Routines}
+\label{subsec:device_is_set_get_default}

 The following example shows how the \code{omp\_set\_default\_device} and \code{omp\_get\_default\_device} 
 runtime library routines can be used to set the default device and determine the 
 default device respectively.

-\cexample{device}{3c}
+\cexample{device}{3}

-\fexample{device}{3f}
+\ffreeexample{device}{3}
+
+
+\subsection{Target Memory and Device Pointer Routines}
+\label{subsec:target_mem_and_device_ptrs}
+
+The following example shows how to create space on a device, transfer data
+to and from that space, and free the space, using API calls. The API calls
+directly execute allocation, copy and free operations on the device, without invoking
+any mapping through a \code{target} directive. The \code{omp\_target\_alloc} routine allocates space
+and returns a device pointer for referencing the space in the \code{omp\_target\_memcpy}
+API routine on the host. The \code{omp\_target\_free} routine frees the space on the device.
+
+The example also illustrates how to access that space
+in a \code{target} region by exposing the device pointer in an \code{is\_device\_ptr} clause.
+
+The example creates an array of cosine values on the default device, to be used
+on the host device. The function fails if a default device is not available.
+
+\cexample{device}{4}

--- a/Examples_doacross.tex
+++ b/Examples_doacross.tex
@ -0,0 +1,68 @@
+\pagebreak
+\section{Doacross Loop Nest}
+\label{sec:doacross}
+
+An \code{ordered} clause can be used on a loop construct with an integer
+parameter argument to define the number of associated loops within 
+a \plc{doacross loop nest} where cross-iteration dependences exist.
+A \code{depend} clause on an \code{ordered} construct within an ordered 
+loop describes the dependences of the \plc{doacross} loops. 
+
+In the code below, the \code{depend(sink:i-1)} clause defines an \plc{i-1} 
+to \plc{i} cross-iteration dependence that specifies a wait point for 
+the completion of computation from iteration \plc{i-1} before proceeding 
+to the subsequent statements. The \code{depend(source)} clause indicates 
+the completion of computation from the current iteration (\plc{i}) 
+to satisfy the cross-iteration dependence that arises from the iteration.
+For this example the same sequential ordering could have been achieved 
+with an \code{ordered} clause without a parameter, on the loop directive, 
+and a single \code{ordered} directive without the \code{depend} clause
+specified for the statement executing the \plc{bar} function.
+
+\cexample{doacross}{1}
+
+\ffreeexample{doacross}{1}
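A minimal C sketch of the same \code{depend(sink:i-1)}/\code{depend(source)} pattern, applied to a prefix sum. The prefix sum and the function name \code{prefix\_sum} are hypothetical stand-ins for the computation in the official example. The sketch assumes a compiler that supports the OpenMP 4.5 doacross syntax. Iteration \plc{i} waits for iteration \plc{i-1} before reading \code{a[i-1]}, then signals its own completion. For \plc{i = 1}, the sink iteration \plc{0} lies outside the iteration space, so no wait occurs for it.

```c
/* Hypothetical example: a[i] = b[0] + ... + b[i], computed with a
   doacross loop nest of depth 1 (OpenMP 4.5 syntax). */
void prefix_sum(int n, double *a, const double *b)
{
    if (n <= 0)
        return;
    a[0] = b[0];

    #pragma omp parallel for ordered(1)
    for (int i = 1; i < n; i++) {
        /* Wait until iteration i-1 has signaled completion. */
        #pragma omp ordered depend(sink: i-1)
        a[i] = a[i-1] + b[i];
        /* Signal that iteration i is complete. */
        #pragma omp ordered depend(source)
    }
}
```

The loop-carried dependence serializes the additions, so this sketch shows the synchronization pattern rather than a speedup.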
+
+The following code is similar to the previous example but with 
+\plc{doacross loop nest} extended to two nested loops, \plc{i} and \plc{j}, 
+as specified by the \code{ordered(2)} clause on the loop directive. 
+In the C/C++ code, the \plc{i} and \plc{j} loops are the first and
+second associated loops, respectively, whereas
+in the Fortran code, the \plc{j} and \plc{i} loops are the first and
+second associated loops, respectively.
+The \code{depend(sink:i-1,j)} and \code{depend(sink:i,j-1)} clauses in 
+the C/C++ code define cross-iteration dependences in two dimensions from 
+iterations (\plc{i-1, j}) and (\plc{i, j-1}) to iteration (\plc{i, j}).  
+Likewise, the \code{depend(sink:j-1,i)} and \code{depend(sink:j,i-1)} clauses 
+in the Fortran code define cross-iteration dependences from iterations 
+(\plc{j-1, i}) and (\plc{j, i-1}) to iteration (\plc{j, i}).
+
+\cexample{doacross}{2}
+
+\ffreeexample{doacross}{2}
+
+
+The following example shows the incorrect use of the \code{ordered} 
+directive with a \code{depend} clause.  There are two issues with the code.  
+The first issue is a missing \code{ordered}~\code{depend(source)} directive,
+which could cause a deadlock.  
+The second issue is that the \code{depend(sink:i+1,j)} and \code{depend(sink:i,j+1)} 
+clauses define dependences on lexicographically later 
+source iterations (\plc{i+1, j}) and (\plc{i, j+1}), which could cause 
+a deadlock as well, since those iterations may not start to execute until the current iteration completes.
+
+\cexample{doacross}{3}
+
+\ffreeexample{doacross}{3}
+
+
+The following example illustrates the use of the \code{collapse} clause for
+a \plc{doacross loop nest}.  The \plc{i} and \plc{j} loops are the associated
+loops for the collapsed loop as well as for the \plc{doacross loop nest}.
+The example also shows a compliant usage of the dependence source
+directive placed before the corresponding sink directive.
+Checking the completion of computation from previous iterations at the sink point can occur after the source statement.
+
+\cexample{doacross}{4}
+
+\ffreeexample{doacross}{4}
--- a/Examples_flush_nolist.tex
+++ b/Examples_flush_nolist.tex
@ -1,12 +1,12 @@
 \pagebreak
-\chapter{The \code{flush} Construct without a List}
-\label{chap:flush_nolist}
+\section{The \code{flush} Construct without a List}
+\label{sec:flush_nolist}

 The following example distinguishes the shared variables affected by a \code{flush} 
 construct with no list from the shared objects that are not affected:

-\cexample{flush_nolist}{1c}
+\cexample{flush_nolist}{1}

-\fexample{flush_nolist}{1f}
+\fexample{flush_nolist}{1}


--- a/Examples_fort_do.tex
+++ b/Examples_fort_do.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{Fortran Restrictions on the \code{do} Construct}
-\label{chap:fort_do}
+\section{Fortran Restrictions on the \code{do} Construct}
+\label{sec:fort_do}
 \fortranspecificstart

 If an \code{end do} directive follows a \plc{do-construct}  in which several 
@ -8,12 +8,12 @@ If an \code{end do} directive follows a \plc{do-construct}  in which several
 directive can only be specified for the outermost of these \code{DO} statements. 
 The following example contains correct usages of loop constructs:

-\fnexample{fort_do}{1f}
+\fnexample{fort_do}{1}

 The following example is non-conforming because the matching \code{do} directive 
 for the \code{end do} does not precede the outermost loop:

-\fnexample{fort_do}{2f}
+\fnexample{fort_do}{2}
 \fortranspecificend


--- a/Examples_fort_loopvar.tex
+++ b/Examples_fort_loopvar.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{Fortran Private Loop Iteration Variables}
-\label{chap:fort_loopvar}
+\section{Fortran Private Loop Iteration Variables}
+\label{sec:fort_loopvar}
 \fortranspecificstart

 In general, loop iteration variables will be private when used in the \plc{do-loop} 
@ -10,12 +10,12 @@ the OpenMP 4.0 specification). In the following example of a sequential
 loop in a \code{parallel} construct the loop iteration variable \plc{I} will 
 be private.

-\fnexample{fort_loopvar}{1f}
+\ffreenexample{fort_loopvar}{1}

 In exceptional cases, loop iteration variables can be made shared, as in the following 
 example:

-\fnexample{fort_loopvar}{2f}
+\ffreenexample{fort_loopvar}{2}

 Note however that the use of shared loop iteration variables can easily lead to 
 race conditions.
--- a/Examples_fort_race.tex
+++ b/Examples_fort_race.tex
@ -1,7 +1,7 @@
 \pagebreak
-\chapter{Race Conditions Caused by Implied Copies of Shared Variables in Fortran}
+\section{Race Conditions Caused by Implied Copies of Shared Variables in Fortran}
 \fortranspecificstart
-\label{chap:fort_race}
+\label{sec:fort_race}

 The following example contains a race condition, because the shared variable, which 
 is an array section, is passed as an actual argument to a routine that has an assumed-size 
@ -10,7 +10,7 @@ may cause the compiler to copy the argument into a temporary location prior to
 the call and copy from the temporary location into the original variable when the 
 subroutine returns. This copying would cause races in the \code{parallel} region.

-\fnexample{fort_race}{1f}
+\ffreenexample{fort_race}{1}
 \fortranspecificend


--- a/Examples_fort_sa_private.tex
+++ b/Examples_fort_sa_private.tex
@ -1,23 +1,23 @@
 \pagebreak
-\chapter{Fortran Restrictions on Storage Association with the \code{private} Clause}
+\section{Fortran Restrictions on Storage Association with the \code{private} Clause}
 \fortranspecificstart
-\label{chap:fort_sa_private}
+\label{sec:fort_sa_private}

 The following non-conforming examples illustrate the implications of the \code{private} 
 clause rules with regard to storage association. 

-\fnexample{fort_sa_private}{1f}
+\fnexample{fort_sa_private}{1}

-\fnexample{fort_sa_private}{2f}
+\fnexample{fort_sa_private}{2}
+
+\fnexample{fort_sa_private}{3}
 % blue line floater at top of this page for "Fortran, cont."
 \begin{figure}[t!]
 \linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
 \end{figure}

-\fnexample{fort_sa_private}{3f}
+\fnexample{fort_sa_private}{4}

-\fnexample{fort_sa_private}{4f}
-
-\fnexample{fort_sa_private}{5f}
+\fnexample{fort_sa_private}{5}
 \fortranspecificend

--- a/Examples_fort_sp_common.tex
+++ b/Examples_fort_sp_common.tex
@ -1,7 +1,7 @@
 \pagebreak
-\chapter{Fortran Restrictions on \code{shared} and \code{private} Clauses with Common Blocks}
+\section{Fortran Restrictions on \code{shared} and \code{private} Clauses with Common Blocks}
 \fortranspecificstart
-\label{chap:fort_sp_common}
+\label{sec:fort_sp_common}

 When a named common block is specified in a \code{private}, \code{firstprivate}, 
 or \code{lastprivate} clause of a construct, none of its members may be declared 
@ -10,11 +10,11 @@ illustrate this point.

 The following example is conforming:

-\fnexample{fort_sp_common}{1f}
+\fnexample{fort_sp_common}{1}

 The following example is also conforming:

-\fnexample{fort_sp_common}{2f}
+\fnexample{fort_sp_common}{2}
 % blue line floater at top of this page for "Fortran, cont."
 \begin{figure}[t!]
 \linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
@ -22,17 +22,17 @@ The following example is also conforming:

 The following example is conforming:

-\fnexample{fort_sp_common}{3f}
+\fnexample{fort_sp_common}{3}

 The following example is non-conforming because \code{x} is a constituent element 
 of \code{c}:

-\fnexample{fort_sp_common}{4f}
+\fnexample{fort_sp_common}{4}

 The following example is non-conforming because a common block may not be declared 
 both shared and private:

-\fnexample{fort_sp_common}{5f}
+\fnexample{fort_sp_common}{5}
 \fortranspecificend


--- a/Examples_fpriv_sections.tex
+++ b/Examples_fpriv_sections.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{The \code{firstprivate} Clause and the \code{sections} Construct}
-\label{chap:fpriv_sections}
+\section{The \code{firstprivate} Clause and the \code{sections} Construct}
+\label{sec:fpriv_sections}

 In the following example of the \code{sections} construct, the \code{firstprivate} 
 clause is used to initialize the private copy of \code{section\_count} of each 
@ -11,8 +11,8 @@ thread executes the two sections, one section will print the value 1 and the oth
 will print the value 2. Since the order of execution of the two sections in this 
 case is unspecified, it is unspecified which section prints which value. 

-\cexample{fpriv_sections}{1c}
+\cexample{fpriv_sections}{1}

-\fexample{fpriv_sections}{1f}
+\ffreeexample{fpriv_sections}{1}
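A minimal C sketch of the behavior described above. The function \code{count\_sections} and its \code{out} array are hypothetical. Each section increments its thread's private \code{section\_count} and records the result. If two threads execute the sections, both record 1. If one thread executes both, the recorded values are 1 and 2 in unspecified order, so the checks accept either outcome.

```c
/* Hypothetical function: out[0] and out[1] receive the value of the private
   section_count observed by each section after incrementing it. */
void count_sections(int out[2])
{
    int section_count = 0;

    #pragma omp parallel
    #pragma omp sections firstprivate(section_count)
    {
        #pragma omp section
        {
            section_count++;          /* private copy started at 0 */
            out[0] = section_count;
        }
        #pragma omp section
        {
            section_count++;
            out[1] = section_count;
        }
    }
}
```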


--- a/Examples_get_nthrs.tex
+++ b/Examples_get_nthrs.tex
@ -1,21 +1,21 @@
 \pagebreak
-\chapter{The \code{omp\_get\_num\_threads} Routine}
-\label{chap:get_nthrs}
+\section{The \code{omp\_get\_num\_threads} Routine}
+\label{sec:get_nthrs}

 In the following example, the \code{omp\_get\_num\_threads} call returns 1 in 
 the sequential part of the code, so \code{np} will always be equal to 1. To determine 
 the number of threads that will be deployed for the \code{parallel} region, the 
 call should be inside the \code{parallel} region.

-\cexample{get_nthrs}{1c}
+\cexample{get_nthrs}{1}

-\fexample{get_nthrs}{1f}
+\fexample{get_nthrs}{1}

 The following example shows how to rewrite this program without including a query 
 for the number of threads:

-\cexample{get_nthrs}{2c}
+\cexample{get_nthrs}{2}

-\fexample{get_nthrs}{2f}
+\fexample{get_nthrs}{2}


--- a/Examples_icv.tex
+++ b/Examples_icv.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{Internal Control Variables (ICVs)}
-\label{chap:icv}
+\section{Internal Control Variables (ICVs)}
+\label{sec:icv}

 According to Section 2.3 of the OpenMP 4.0 specification, an OpenMP implementation must act as if there are ICVs that control 
 the behavior of the program.  This example illustrates two ICVs, \plc{nthreads-var} 
@ -50,7 +50,7 @@ one of the threads in the team. Since we have a total of two inner \code{paralle
 regions, the print statement will be executed twice -- once per inner \code{parallel} 
 region.

-\cexample{icv}{1c}
+\cexample{icv}{1}

-\fexample{icv}{1f}
+\fexample{icv}{1}

--- a/Examples_init_lock.tex
+++ b/Examples_init_lock.tex
@ -1,11 +1,10 @@
-\pagebreak
-\chapter{The \code{omp\_init\_lock} Routine}
-\label{chap:init_lock}
+\subsection{The \code{omp\_init\_lock} Routine}
+\label{subsec:init_lock}

 The following example demonstrates how to initialize an array of locks in a \code{parallel} 
 region by using \code{omp\_init\_lock}.

-\cexample{init_lock}{1c}
+\cppexample{init_lock}{1}

-\fexample{init_lock}{1f}
+\fexample{init_lock}{1}

--- a/Examples_init_lock_with_hint.tex
+++ b/Examples_init_lock_with_hint.tex
@ -0,0 +1,10 @@
+%\pagebreak
+\subsection{The \code{omp\_init\_lock\_with\_hint} Routine}
+\label{subsec:init_lock_with_hint}
+
+The following example demonstrates how to initialize an array of locks in a \code{parallel} region by using \code{omp\_init\_lock\_with\_hint}.
+Note that hints are combined with the \code{|} or \code{+} operator in C/C++ and with the \code{+} operator in Fortran.
+
+\cppexample{init_lock_with_hint}{1}
+
+\fexample{init_lock_with_hint}{1}
--- a/Examples_lastprivate.tex
+++ b/Examples_lastprivate.tex
@ -1,14 +1,14 @@
 \pagebreak
-\chapter{The \code{lastprivate} Clause}
-\label{chap:lastprivate}
+\section{The \code{lastprivate} Clause}
+\label{sec:lastprivate}

 Correct execution sometimes depends on the value that the last iteration of a loop 
 assigns to a variable. Such programs must list all such variables in a \code{lastprivate} 
 clause  so that the values of the variables are the same as when the loop is executed 
 sequentially.

-\cexample{lastprivate}{1c}
+\cexample{lastprivate}{1}

-\fexample{lastprivate}{1f}
+\fexample{lastprivate}{1}


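A minimal sketch of lastprivate, with a hypothetical last_value function: whichever thread executes the sequentially last iteration (i == n - 1) copies its value of x back to the original variable, so the result matches a serial run. The sketch assumes n >= 1.

```c
/* Hypothetical sketch: after the loop, x holds the value assigned by
   the sequentially last iteration, exactly as in serial execution. */
int last_value(const int *a, int n)
{
    int x = 0;
    int i;  /* loop iteration variable: predetermined private */

    #pragma omp parallel for lastprivate(x)
    for (i = 0; i < n; i++)
        x = 2 * a[i];

    return x;
}
```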
--- a/Examples_linear_in_loop.tex
+++ b/Examples_linear_in_loop.tex
@ -0,0 +1,13 @@
+\section{\code{linear} Clause in Loop Constructs}
+\label{sec:linear_in_loop}
+
+The following example shows the use of the \code{linear} clause in a loop 
+construct to enable correct parallelization of a loop that contains 
+an induction variable (\plc{j}).  At the end of the execution of 
+the loop construct, the original variable \plc{j} is updated with 
+the value \plc{N/2} from the last iteration of the loop.
+
+\cexample{linear_in_loop}{1}
+
+\ffreeexample{linear_in_loop}{1}
+
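A sketch of the linear_in_loop pattern in C, assuming a hypothetical function compact_even: i advances by 2 while the induction variable j advances by 1 per iteration, and linear(j:1) gives each iteration the value of j it would have had in a serial run. After the loop, the original j holds n/2 for even n.

```c
/* Hypothetical sketch: j is an induction variable that grows by 1
   per logical iteration while i grows by 2. */
int compact_even(const float *a, float *b, int n)
{
    int i, j = 0;

    #pragma omp parallel
    #pragma omp for linear(j:1)
    for (i = 0; i < n; i += 2) {
        b[j] = a[i] * 2.0f;  /* iteration k writes b[k] from a[2k] */
        j++;
    }

    return j;  /* value from the last iteration: n/2 for even n */
}
```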
--- a/Examples_lock_owner.tex
+++ b/Examples_lock_owner.tex
@ -1,6 +1,5 @@
-\pagebreak
-\chapter{Ownership of Locks}
-\label{chap:lock_owner}
+\subsection{Ownership of Locks}
+\label{subsec:lock_owner}

 Ownership of locks has changed since OpenMP 2.5. In OpenMP 2.5, locks are owned 
 by threads; so a lock released by the \code{omp\_unset\_lock} routine must be 
@ -16,8 +15,8 @@ the same). However, it is not conforming beginning with OpenMP 3.0, because the
 region that releases the lock \code{lck} is different from the task region that 
 acquires the lock.

-\cexample{lock_owner}{1c}
+\cexample{lock_owner}{1}

-\fexample{lock_owner}{1f}
+\fexample{lock_owner}{1}


--- a/Examples_locks.tex
+++ b/Examples_locks.tex
@ -0,0 +1,5 @@
+\pagebreak
+\section{Lock Routines}
+\label{sec:locks}
+
+This section illustrates the use of the lock routines for synchronization.
--- a/Examples_master.tex
+++ b/Examples_master.tex
@ -1,13 +1,13 @@
 \pagebreak
-\chapter{The \code{master} Construct}
-\label{chap:master}
+\section{The \code{master} Construct}
+\label{sec:master}

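-The following example demonstrates the master construct . In the example, the master 
+The following example demonstrates the \code{master} construct. In the example, the master thread 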
 keeps track of how many iterations have been executed and prints out a progress 
 report. The other threads skip the master region without waiting.

-\cexample{master}{1c}
+\cexample{master}{1}

-\fexample{master}{1f}
+\fexample{master}{1}


--- a/Examples_mem_model.tex
+++ b/Examples_mem_model.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{The OpenMP Memory Model}
-\label{chap:mem_model}
+\section{The OpenMP Memory Model}
+\label{sec:mem_model}

 In the following example, at Print 1, the value of \plc{x} could be either 2 
 or 5, depending on the timing of the threads, and the implementation of the assignment 
@ -14,25 +14,25 @@ The barrier after Print 1 contains implicit flushes on all threads, as well as
 a thread synchronization, so the programmer is guaranteed that the value 5 will 
 be printed by both Print 2 and Print 3.

-\cexample{mem_model}{1c}
+\cexample{mem_model}{1}

-\fexample{mem_model}{1f}
+\ffreeexample{mem_model}{1}

 The following example demonstrates why synchronization is difficult to perform 
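-correctly through variables. The value of flag is undefined in both prints on thread 
-1 and the value of data is only well-defined in the second print.
+correctly through variables. The value of \plc{flag} is undefined in both prints on thread 
+1 and the value of \plc{data} is only well-defined in the second print.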

-\cexample{mem_model}{2c}
+\cexample{mem_model}{2}

-\fexample{mem_model}{2f}
+\fexample{mem_model}{2}

 The next example demonstrates why synchronization is difficult to perform correctly 
 through variables. Because the \plc{write}(1)-\plc{flush}(1)-\plc{flush}(2)-\plc{read}(2) 
 sequence cannot be guaranteed in the example, the statements on thread 0 and thread 
 1 may execute in either order.

-\cexample{mem_model}{3c}
+\cexample{mem_model}{3}

-\fexample{mem_model}{3f}
+\fexample{mem_model}{3}


--- a/Examples_nestable_lock.tex
+++ b/Examples_nestable_lock.tex
@ -1,11 +1,10 @@
-\pagebreak
-\chapter{Nestable Lock Routines}
-\label{chap:nestable_lock}
+\subsection{Nestable Lock Routines}
+\label{subsec:nestable_lock}

 The following example demonstrates how a nestable lock can be used to synchronize 
 updates both to a whole structure and to one of its members.

-\cexample{nestable_lock}{1c}
+\cexample{nestable_lock}{1}

-\fexample{nestable_lock}{1f}
+\fexample{nestable_lock}{1}

--- a/Examples_nested_loop.tex
+++ b/Examples_nested_loop.tex
@ -1,18 +1,18 @@
 \pagebreak
-\chapter{Nested Loop Constructs}
-\label{chap:nested_loop}
+\section{Nested Loop Constructs}
+\label{sec:nested_loop}

 The following example of loop construct nesting is conforming because the inner 
 and outer loop regions bind to different \code{parallel} regions:

-\cexample{nested_loop}{1c}
+\cexample{nested_loop}{1}

-\fexample{nested_loop}{1f}
+\fexample{nested_loop}{1}

 The following variation of the preceding example is also conforming:

-\cexample{nested_loop}{2c}
+\cexample{nested_loop}{2}

-\fexample{nested_loop}{2f}
+\fexample{nested_loop}{2}


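A minimal C sketch of conforming loop nesting, assuming a hypothetical scale_rows over a row-major n by m array: the inner loop construct is enclosed in its own parallel region, so it binds to that region rather than to the outer one. If nested parallelism is disabled, the inner region simply runs with a team of one thread.

```c
/* Hypothetical sketch: each loop construct binds to its own
   enclosing parallel region, so this nesting is conforming. */
void scale_rows(float *a, int n, int m, float s)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++) {
            /* The inner loop construct binds to this inner parallel region. */
            #pragma omp parallel
            {
                #pragma omp for
                for (int j = 0; j < m; j++)
                    a[i * m + j] *= s;
            }
        }
    }
}
```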
--- a/Examples_nesting_restrict.tex
+++ b/Examples_nesting_restrict.tex
@ -1,52 +1,52 @@
 \pagebreak
-\chapter{Restrictions on Nesting of Regions}
-\label{chap:nesting_restrict}
+\section{Restrictions on Nesting of Regions}
+\label{sec:nesting_restrict}

 The examples in this section illustrate the region nesting rules. 

 The following example is non-conforming because the inner and outer loop regions 
 are closely nested:

-\cexample{nesting_restrict}{1c}
+\cexample{nesting_restrict}{1}

-\fexample{nesting_restrict}{1f}
+\fexample{nesting_restrict}{1}

 The following orphaned version of the preceding example is also non-conforming:

-\cexample{nesting_restrict}{2c}
+\cexample{nesting_restrict}{2}

-\fexample{nesting_restrict}{2f}
+\fexample{nesting_restrict}{2}

 The following example is non-conforming because the loop and \code{single} regions 
 are closely nested:

-\cexample{nesting_restrict}{3c}
+\cexample{nesting_restrict}{3}

-\fexample{nesting_restrict}{3f}
+\fexample{nesting_restrict}{3}

 The following example is non-conforming because a \code{barrier} region cannot 
 be closely nested inside a loop region:

-\cexample{nesting_restrict}{4c}
+\cexample{nesting_restrict}{4}

-\fexample{nesting_restrict}{4f}
+\fexample{nesting_restrict}{4}

 The following example is non-conforming because the \code{barrier} region cannot 
 be closely nested inside the \code{critical} region. If this were permitted, 
 it would result in deadlock due to the fact that only one thread at a time can 
 enter the \code{critical} region:

-\cexample{nesting_restrict}{5c}
+\cexample{nesting_restrict}{5}

-\fexample{nesting_restrict}{5f}
+\fexample{nesting_restrict}{5}

 The following example is non-conforming because the \code{barrier} region cannot 
 be closely nested inside the \code{single} region. If this were permitted, it 
 would result in deadlock due to the fact that only one thread executes the \code{single} 
 region:

-\cexample{nesting_restrict}{6c}
+\cexample{nesting_restrict}{6}

-\fexample{nesting_restrict}{6f}
+\fexample{nesting_restrict}{6}


--- a/Examples_nowait.tex
+++ b/Examples_nowait.tex
@ -1,14 +1,14 @@
 \pagebreak
-\chapter{The \code{nowait} Clause}
-\label{chap:nowait}
+\section{The \code{nowait} Clause}
+\label{sec:nowait}

 If there are multiple independent loops within a \code{parallel} region, you 
 can use the \code{nowait} clause to avoid the implied barrier at the end of the 
 loop construct, as follows:

-\cexample{nowait}{1c}
+\cexample{nowait}{1}

-\fexample{nowait}{1f}
+\fexample{nowait}{1}

 In the following example, static scheduling distributes the same logical iteration 
 numbers to the threads that execute the three loop regions. This allows the \code{nowait} 
@ -22,7 +22,7 @@ to \code{n-1} (from \code{1} to \code{N} in the Fortran version), while the
 iteration space of the last loop is from \code{1} to \code{n} (\code{2} to 
 \code{N+1} in the Fortran version).

-\cexample{nowait}{2c}
+\cexample{nowait}{2}

-\fexample{nowait}{2f}
+\ffreeexample{nowait}{2}

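A sketch with two independent loops, assuming a hypothetical independent_loops function: the first loop writes only b and the second writes only z, so no thread needs to wait at the end of the first loop. The nowait on the second loop is redundant here because the end of the parallel region already has an implied barrier.

```c
/* Hypothetical sketch: the loops touch disjoint output arrays,
   so the barrier after the first loop can be removed. */
void independent_loops(int n, const float *a, float *b,
                       int m, const float *y, float *z)
{
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 1; i < n; i++)
            b[i] = (a[i] + a[i - 1]) / 2.0f;

        #pragma omp for nowait  /* region end still synchronizes all threads */
        for (int i = 0; i < m; i++)
            z[i] = 2.0f * y[i];
    }
}
```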
--- a/Examples_nthrs_dynamic.tex
+++ b/Examples_nthrs_dynamic.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{Interaction Between the \code{num\_threads} Clause and \code{omp\_set\_dynamic}}
-\label{chap:nthrs_dynamic}
+\section{Interaction Between the \code{num\_threads} Clause and \code{omp\_set\_dynamic}}
+\label{sec:nthrs_dynamic}

 The following example demonstrates the \code{num\_threads} clause  and the effect 
 of the \\
@ -12,17 +12,17 @@ of threads in OpenMP implementations that support it. In this case, 10 threads
 are provided. Note that in case of an error the OpenMP implementation is free to 
 abort the program or to supply any number of threads available.

-\cexample{nthrs_dynamic}{1c}
+\cexample{nthrs_dynamic}{1}

-\fexample{nthrs_dynamic}{1f}
+\fexample{nthrs_dynamic}{1}

 The call to the \code{omp\_set\_dynamic} routine with a non-zero argument in 
 C/C++, or \code{.TRUE.} in Fortran, allows the OpenMP implementation to choose 
 any number of threads between 1 and 10.

-\cexample{nthrs_dynamic}{2c}
+\cexample{nthrs_dynamic}{2}

-\fexample{nthrs_dynamic}{2f}
+\fexample{nthrs_dynamic}{2}

 It is good practice to set the \plc{dyn-var} ICV explicitly by calling the \code{omp\_set\_dynamic} 
 routine, as its default setting is implementation defined.
--- a/Examples_nthrs_nesting.tex
+++ b/Examples_nthrs_nesting.tex
@ -1,12 +1,12 @@
 \pagebreak
-\chapter{Controlling the Number of Threads on Multiple Nesting Levels}
-\label{chap:nthrs_nesting}
+\section{Controlling the Number of Threads on Multiple Nesting Levels}
+\label{sec:nthrs_nesting}

 The following examples demonstrate how to use the \code{OMP\_NUM\_THREADS} environment 
 variable  to control the number of threads on multiple nesting levels:

-\cexample{nthrs_nesting}{1c}
+\cexample{nthrs_nesting}{1}

-\fexample{nthrs_nesting}{1f}
+\fexample{nthrs_nesting}{1}


--- a/Examples_ordered.tex
+++ b/Examples_ordered.tex
@ -1,28 +1,28 @@
 \pagebreak
-\chapter{The \code{ordered} Clause and the \code{ordered} Construct}
-\label{chap:ordered}
+\section{The \code{ordered} Clause and the \code{ordered} Construct}
+\label{sec:ordered}

 Ordered constructs  are useful for sequentially ordering the output from work that 
 is done in parallel. The following program prints out the indices in sequential 
 order:

-\cexample{ordered}{1c}
+\cexample{ordered}{1}

-\fexample{ordered}{1f}
+\fexample{ordered}{1}

 It is possible to have multiple \code{ordered} constructs within a loop region 
 with the \code{ordered} clause specified. The first example is non-conforming 
 because all iterations execute two \code{ordered} regions. An iteration of a 
 loop must not execute more than one \code{ordered} region:

-\cexample{ordered}{2c}
+\cexample{ordered}{2}

-\fexample{ordered}{2f}
+\fexample{ordered}{2}

 The following is a conforming example with more than one \code{ordered} construct. 
 Each iteration will execute only one \code{ordered} region:

-\cexample{ordered}{3c}
+\cexample{ordered}{3}

-\fexample{ordered}{3f}
+\fexample{ordered}{3}

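A sketch of the ordered pattern in C, with a hypothetical record_in_order in place of printing: the i * i computation may run out of order across threads, but the ordered region executes in iteration order, so seq receives the results sequentially. schedule(dynamic) is an arbitrary choice here, meant only to make out-of-order execution likely.

```c
/* Hypothetical sketch: work runs in parallel, but the ordered region
   stores results in sequential iteration order. */
int record_in_order(int lo, int hi, int *seq)
{
    int count = 0;  /* shared; modified only inside the ordered region */

    #pragma omp parallel for ordered schedule(dynamic)
    for (int i = lo; i < hi; i++) {
        int v = i * i;  /* may execute out of order */
        #pragma omp ordered
        seq[count++] = v;
    }

    return count;
}
```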
--- a/Examples_parallel.tex
+++ b/Examples_parallel.tex
@ -1,12 +1,12 @@
 \pagebreak
-\chapter{The \code{parallel} Construct}
-\label{chap:parallel}
+\section{The \code{parallel} Construct}
+\label{sec:parallel}

 The \code{parallel} construct  can be used in coarse-grain parallel programs. 
 In the following example, each thread in the \code{parallel} region decides what 
 part of the global array \plc{x} to work on, based on the thread number:

-\cexample{parallel}{1c}
+\cexample{parallel}{1}

-\fexample{parallel}{1f}
+\fexample{parallel}{1}

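A sketch of the coarse-grain decomposition described above, with hypothetical sub and subdomain functions: each thread computes its block of x from its thread number and the team size, and the last thread also takes any remainder. The serial fallbacks under #ifdef _OPENMP are added only so the sketch builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical worker: processes x[start .. start+len-1]. */
static void subdomain(float *x, int start, int len)
{
    for (int i = start; i < start + len; i++)
        x[i] = 1.0f;
}

/* Each thread selects a contiguous block of x from its thread number. */
void sub(float *x, int npoints)
{
    #pragma omp parallel default(shared)
    {
        int iam = omp_get_thread_num();
        int nt = omp_get_num_threads();
        int ipoints = npoints / nt;
        int istart = iam * ipoints;

        if (iam == nt - 1)  /* last thread also takes the remainder */
            ipoints = npoints - istart;

        subdomain(x, istart, ipoints);
    }
}
```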
--- a/Examples_ploop.tex
+++ b/Examples_ploop.tex
@ -1,11 +1,12 @@
-\chapter{A Simple Parallel Loop}
-\label{chap:ploop}
+\pagebreak
+\section{A Simple Parallel Loop}
+\label{sec:ploop}

 The following example demonstrates how to parallelize a simple loop using the parallel 
 loop construct. The loop iteration variable is private by default, so it is not 
 necessary to specify it explicitly in a \code{private} clause.

-\cexample{ploop}{1c}
+\cexample{ploop}{1}

-\fexample{ploop}{1f}
+\fexample{ploop}{1}

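A minimal sketch of a simple parallel loop, using a hypothetical saxpy function: i appears in no private clause because the iteration variable of the associated loop is predetermined private.

```c
/* Hypothetical sketch: y = alpha * x + y, computed in parallel. */
void saxpy(int n, float alpha, const float *x, float *y)
{
    int i;  /* no private(i) needed: predetermined private */

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}
```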
--- a/Examples_pra_iterator.tex
+++ b/Examples_pra_iterator.tex
@ -1,11 +1,11 @@
 \pagebreak
-\chapter{Parallel Random Access Iterator Loop}
+\section{Parallel Random Access Iterator Loop}
 \cppspecificstart
-\label{chap:pra_iterator}
+\label{sec:pra_iterator}

 The following example shows a parallel random access iterator loop.

-\cnexample{pra_iterator}{1c}
+\cppnexample{pra_iterator}{1}
 \cppspecificend


--- a/Examples_private.tex
+++ b/Examples_private.tex
@ -1,31 +1,31 @@
 \pagebreak
-\chapter{The \code{private} Clause}
-\label{chap:private}
+\section{The \code{private} Clause}
+\label{sec:private}

 In the following example, the values of original list items \plc{i} and \plc{j} 
 are retained on exit from the \code{parallel} region, while the private list 
 items \plc{i} and \plc{j} are modified within the \code{parallel} construct. 

-\cexample{private}{1c}
+\cexample{private}{1}

-\fexample{private}{1f}
+\fexample{private}{1}

 In the following example, all uses of the variable \plc{a} within the loop construct 
 in the routine \plc{f} refer to a private list item \plc{a}, while it is 
 unspecified whether references to \plc{a} in the routine \plc{g} are to a 
 private list item or the original list item.

-\cexample{private}{2c}
+\cexample{private}{2}

-\fexample{private}{2f}
+\fexample{private}{2}

 The following example demonstrates that a list item that appears in a \code{private} 
 clause in a \code{parallel} construct may also appear in a \code{private} 
 clause in an enclosed worksharing construct, which results in an additional private 
 copy.

-\cexample{private}{3c}
+\cexample{private}{3}

-\fexample{private}{3f}
+\fexample{private}{3}


--- a/Examples_psections.tex
+++ b/Examples_psections.tex
@ -1,13 +1,13 @@
 \pagebreak
-\chapter{The \code{parallel} \code{sections} Construct}
-\label{chap:psections}
+\section{The \code{parallel} \code{sections} Construct}
+\label{sec:psections}

 In the following example routines \code{XAXIS}, \code{YAXIS}, and \code{ZAXIS} can 
 be executed concurrently. The first \code{section} directive is optional. Note 
 that all \code{section} directives need to appear in the \code{parallel sections} 
 construct.

-\cexample{psections}{1c}
+\cexample{psections}{1}

-\fexample{psections}{1f}
+\fexample{psections}{1}

--- a/Examples_reduction.tex
+++ b/Examples_reduction.tex
@ -1,44 +1,44 @@
 \pagebreak
-\chapter{The \code{reduction} Clause}
-\label{chap:reduction}
+\section{The \code{reduction} Clause}
+\label{sec:reduction}

-The following example demonstrates the \code{reduction} clause ; note that some 
+The following example demonstrates the \code{reduction} clause; note that some 
 reductions can be expressed in the loop in several ways, as shown for the \code{max} 
 and \code{min} reductions below:

-\cexample{reduction}{1c}
+\cexample{reduction}{1}

-\fexample{reduction}{1f}
+\ffreeexample{reduction}{1}

 A common implementation of the preceding example is to treat it as if it had been 
 written as follows:

-\cexample{reduction}{2c}
+\cexample{reduction}{2}

 \fortranspecificstart
-\fnexample{reduction}{2f}
+\ffreenexample{reduction}{2}

 The following program is non-conforming because the reduction is on the 
 \emph{intrinsic procedure name} \code{MAX} but that name has been redefined to be the variable 
 named \code{MAX}.
+
+\ffreenexample{reduction}{3}
 % blue line floater at top of this page for "Fortran, cont."
 \begin{figure}[t!]
 \linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
 \end{figure}

-\fnexample{reduction}{3f}
-
 The following conforming program performs the reduction using the 
 \emph{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed 
 to \code{REN}.

-\fnexample{reduction}{4f}
+\ffreenexample{reduction}{4}

 The following conforming program performs the reduction using 
 \plc{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed 
 to \code{MIN}.

-\fnexample{reduction}{5f}
+\ffreenexample{reduction}{5}
 \fortranspecificend

 The following example is non-conforming because the initialization (\code{a = 
@ -53,8 +53,13 @@ clause. This can be achieved by adding an explicit barrier after the assignment
 directive (which has an implied barrier), or by initializing \code{a} before 
 the start of the \code{parallel} region.

-\cexample{reduction}{3c}
+\cexample{reduction}{6}

-\fexample{reduction}{6f}
+\fexample{reduction}{6}
+
+The following example demonstrates the reduction of array \plc{a}. In C/C++ this is illustrated by the explicit use of the array section \plc{a[0:N]} in the \code{reduction} clause. The corresponding Fortran example uses the array syntax supported in the base language. As of the OpenMP 4.5 specification, the explicit use of an array section in the \code{reduction} clause is not permitted in Fortran; this oversight is expected to be corrected in the next release of the specification.


+\cexample{reduction}{7}
+
+\ffreeexample{reduction}{7}
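A compact C sketch of both reduction forms discussed in this file, assuming a hypothetical function reduce_demo: scalar + and max reductions, and an OpenMP 4.5 array-section reduction over hist[0:nbins]. The private copies of each hist element start at 0 and are added into the caller's hist, so the caller is assumed to zero it first. The sketch also assumes that the values in v are non-negative and that n is at least 1.

```c
/* Hypothetical sketch of scalar and array-section reductions. */
void reduce_demo(const int *v, int n, int *sum, int *maxv,
                 int *hist, int nbins)
{
    int s = 0, m = v[0];

    /* Scalar reductions: one private s and m per thread, combined at the end. */
    #pragma omp parallel for reduction(+:s) reduction(max:m)
    for (int i = 0; i < n; i++) {
        s += v[i];
        if (v[i] > m)
            m = v[i];
    }
    *sum = s;
    *maxv = m;

    /* Array-section reduction (OpenMP 4.5, C/C++): each thread gets a
       zero-initialized private copy of hist[0:nbins]. */
    #pragma omp parallel for reduction(+:hist[0:nbins])
    for (int i = 0; i < n; i++)
        hist[v[i] % nbins]++;
}
```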
--- a/Examples_set_dynamic_nthrs.tex
+++ b/Examples_set_dynamic_nthrs.tex
@ -1,7 +1,7 @@
 \pagebreak
-\chapter{The \code{omp\_set\_dynamic} and \\
+\section{The \code{omp\_set\_dynamic} and \\
 \code{omp\_set\_num\_threads} Routines}
-\label{chap:set_dynamic_nthrs}
+\label{sec:set_dynamic_nthrs}

 Some programs rely on a fixed, prespecified number of threads to execute correctly. 
 Because the default setting for the dynamic adjustment of the number of threads 
@ -17,8 +17,8 @@ dynamic threads setting. The dynamic threads mechanism determines the number of
 threads to use at the start of the \code{parallel} region and keeps it constant 
 for the duration of the region.

-\cexample{set_dynamic_nthrs}{1c}
+\cexample{set_dynamic_nthrs}{1}

-\fexample{set_dynamic_nthrs}{1f}
+\fexample{set_dynamic_nthrs}{1}


--- a/Examples_simple_lock.tex
+++ b/Examples_simple_lock.tex
@ -1,6 +1,5 @@
-\pagebreak
-\chapter{Simple Lock Routines}
-\label{chap:simple_lock}
+\subsection{Simple Lock Routines}
+\label{subsec:simple_lock}

 In the following example, the lock routines cause the threads to be idle while 
 waiting for entry to the first critical section, but to do other work while waiting 
@ -10,10 +9,10 @@ function does not, allowing the work in \code{skip} to be done.
 Note that the argument to the lock routines should have type \code{omp\_lock\_t}, 
 and that there is no need to flush it. 

-\cexample{simple_lock}{1c}
+\cexample{simple_lock}{1}

 Note that there is no need to flush the lock variable. 

-\fexample{simple_lock}{1f}
+\fexample{simple_lock}{1}


--- a/Examples_single.tex
+++ b/Examples_single.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{The \code{single} Construct}
-\label{chap:single}
+\section{The \code{single} Construct}
+\label{sec:single}

 The following example demonstrates the \code{single} construct. In the example, 
 only one thread prints each of the progress messages. All other threads will skip 
@ -11,8 +11,8 @@ a \code{nowait} clause can be specified, as is done in the third \code{single}
 construct in this example. The user must not make any assumptions as to which thread 
 will execute a \code{single} region.

-\cexample{single}{1c}
+\cexample{single}{1}

-\fexample{single}{1f}
+\fexample{single}{1}


--- a/Examples_standalone.tex
+++ b/Examples_standalone.tex
@ -1,31 +1,31 @@
 \pagebreak
-\chapter{Placement of \code{flush}, \code{barrier}, \code{taskwait} 
+\section{Placement of \code{flush}, \code{barrier}, \code{taskwait} 
 and \code{taskyield} Directives}
-\label{chap:standalone}
+\label{sec:standalone}

 The following example is non-conforming, because the \code{flush}, \code{barrier}, 
 \code{taskwait}, and \code{taskyield}  directives are stand-alone directives 
 and cannot be the immediate substatement of an \code{if} statement. 

-\cexample{standalone}{1c}
+\cexample{standalone}{1}

 The following example is non-conforming, because the \code{flush}, \code{barrier}, 
 \code{taskwait}, and \code{taskyield}  directives are stand-alone directives 
 and cannot be the action statement of an \code{if} statement or a labeled branch 
 target.

-\fexample{standalone}{1f}
+\ffreeexample{standalone}{1}

 The following version of the above example is conforming because the \code{flush}, 
 \code{barrier}, \code{taskwait}, and \code{taskyield} directives are enclosed 
 in a compound statement. 

-\cexample{standalone}{2c}
+\cexample{standalone}{2}

 The following example is conforming because the \code{flush}, \code{barrier}, 
 \code{taskwait}, and \code{taskyield} directives are enclosed in an \code{if} 
 construct or follow the labeled branch target.

-\fexample{standalone}{2f}
+\ffreeexample{standalone}{2}


--- a/Examples_target.tex
+++ b/Examples_target.tex
@ -1,29 +1,32 @@
 \pagebreak
-\chapter{\code{target} Construct}
-\label{chap:target}
+\section{\code{target} Construct}
+\label{sec:target}

-\section{\code{target} Construct on \code{parallel} Construct}
+\subsection{\code{target} Construct on \code{parallel} Construct}
+\label{subsec:target_parallel}

-This following example shows how the \code{target} construct offloads a code 
+The following example shows how the \code{target} construct offloads a code 
 region to a target device. The variables \plc{p}, \plc{v1}, \plc{v2}, and \plc{N} are implicitly mapped 
 to the target device.

-\cexample{target}{1c}
+\cexample{target}{1}

-\fexample{target}{1f}
+\ffreeexample{target}{1}

-\section{\code{target} Construct with \code{map} Clause}
+\subsection{\code{target} Construct with \code{map} Clause}
+\label{subsec:target_map}

-This following example shows how the \code{target} construct offloads a code 
+The following example shows how the \code{target} construct offloads a code 
 region to a target device. The variables \plc{p}, \plc{v1} and \plc{v2} are explicitly mapped to the 
 target device using the \code{map} clause. The variable \plc{N} is implicitly mapped to 
 the target device.

-\cexample{target}{2c}
+\cexample{target}{2}

-\fexample{target}{2f}
+\ffreeexample{target}{2}

-\section{\code{map} Clause with \code{to}/\code{from} map-types}
+\subsection{\code{map} Clause with \code{to}/\code{from} map-types}
+\label{subsec:target_map_tofrom}

 The following example shows how the \code{target} construct offloads a code region 
 to a target device. In the \code{map} clause, the \code{to} and \code{from} 
@ -43,16 +46,17 @@ the variable \plc{p} is not initialized with the value of the corresponding vari
 on the host device, and at the end of the \code{target} region the variable \plc{p} 
 is assigned to the corresponding variable on the host device.

-\cexample{target}{3c}
+\cexample{target}{3}

 The \code{to} and \code{from} map-types allow programmers to optimize data 
 motion. Since data for the \plc{v} arrays are not returned, and data for the \plc{p} array 
 are not transferred to the device, only one-half of the data is moved, compared 
 to the default behavior of an implicit mapping.

-\fexample{target}{3f}
+\ffreeexample{target}{3}

-\section{\code{map} Clause with Array Sections}
+\subsection{\code{map} Clause with Array Sections}
+\label{subsec:target_array_section}

 The following example shows how the \code{target} construct offloads a code region 
 to a target device. In the \code{map} clause, map-types are used to optimize 
@ -60,14 +64,14 @@ the mapping of variables to the target device. Because variables \plc{p}, \plc{v
 pointers, array section notation must be used to map the arrays. The notation \code{:N} 
 is equivalent to \code{0:N}.

-\cexample{target}{4c}
+\cexample{target}{4}

 In C, the length of the pointed-to array must be specified. In Fortran the extent 
 of the array is known and the length need not be specified. A section of the array 
 can be specified with the usual Fortran syntax, as shown in the following example. 
 The value 1 is assumed for the lower bound for array section \plc{v2(:N)}.

-\fexample{target}{4f}
+\ffreeexample{target}{4}

 A more realistic situation in which an assumed-size array is passed to \code{vec\_mult} 
 requires that the length of the arrays be specified, because the compiler does 
@ -75,9 +79,10 @@ not know the size of the storage. A section of the array must be specified with
 the usual Fortran syntax, as shown in the following example. The value 1 is assumed 
 for the lower bound for array section \plc{v2(:N)}.

-\fexample{target}{4bf}
+\ffreeexample{target}{4b}

-\section{\code{target} Construct with \code{if} Clause}
+\subsection{\code{target} Construct with \code{if} Clause}
+\label{subsec:target_if}

 The following example shows how the \code{target} construct offloads a code region 
 to a target device.
@ -90,7 +95,18 @@ The \code{if} clause on the \code{parallel} construct indicates that if the
 variable \plc{N} is smaller than a second threshold then the \code{parallel} region 
 is inactive.

-\cexample{target}{5c}
+\cexample{target}{5}

-\fexample{target}{5f}
+\ffreeexample{target}{5}

+The following example is a modification of the preceding \plc{target.5} code that shows the combined \code{target}
+and parallel loop directives. It uses the \plc{directive-name} modifier in multiple \code{if}
+clauses to specify the component directive to which each clause applies.
+
+The \code{if} clause with the \code{target} modifier applies to the \code{target} component of the 
+combined directive, and the \code{if} clause with the \code{parallel} modifier applies 
+to the \code{parallel} component of the combined directive.    
+
+\cexample{target}{6}
+
+\ffreeexample{target}{6}
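A sketch of the combined-directive form with directive-name modifiers, assuming a hypothetical vec_mult and illustrative threshold values: if(target: ...) decides whether the region is offloaded, and if(parallel: ...) decides whether the parallel region is active. When the target condition is false or no device is available, the loop runs on the host.

```c
#define THRESHOLD1 1000000  /* hypothetical offload cut-off */
#define THRESHOLD2 1000     /* hypothetical parallelization cut-off */

/* Hypothetical sketch: each if clause applies only to the component
   named by its directive-name modifier. */
void vec_mult(float *p, float *v1, float *v2, int n)
{
    #pragma omp target parallel for \
            if(target: n > THRESHOLD1) if(parallel: n > THRESHOLD2) \
            map(to: v1[0:n], v2[0:n]) map(from: p[0:n])
    for (int i = 0; i < n; i++)
        p[i] = v1[i] * v2[i];
}
```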
--- a/Examples_target_data.tex
+++ b/Examples_target_data.tex
@ -1,8 +1,9 @@
 \pagebreak
-\chapter{\code{target} \code{data} Construct}
-\label{chap:target_data}
+\section{\code{target} \code{data} Construct}
+\label{sec:target_data}

-\section{Simple \code{target} \code{data} Construct}
+\subsection{Simple \code{target} \code{data} Construct}
+\label{subsec:target_data_simple}

 This example shows how the \code{target} \code{data} construct maps variables 
 to a device data environment. The \code{target} \code{data} construct creates 
@ -13,15 +14,16 @@ variables \plc{v1}, \plc{v2}, and \plc{p} from the enclosing device data environ
 \plc{N} is mapped into the new device data environment from the encountering task's data 
 environment.

-\cexample{target_data}{1c}
+\cexample{target_data}{1}

 The Fortran code passes a reference and specifies the extent of the arrays in the 
 declaration. No length information is necessary in the map clause, as is required 
 with C/C++ pointers.

-\fexample{target_data}{1f}
+\ffreeexample{target_data}{1}

-\section{\code{target} \code{data} Region Enclosing Multiple \code{target} Regions}
+\subsection{\code{target} \code{data} Region Enclosing Multiple \code{target} Regions}
+\label{subsec:target_data_multiregion}

 The following examples show how the \code{target} \code{data} construct maps 
 variables to a device data environment of a \code{target} region. The \code{target} 
@ -36,7 +38,7 @@ In the following example the variables \plc{v1} and \plc{v2} are mapped at each
 construct. Instead of mapping the variable \plc{p} twice, once at each \code{target} 
 construct, \plc{p} is mapped once by the \code{target} \code{data} construct.

-\cexample{target_data}{2c}
+\cexample{target_data}{2}


 The Fortran code uses reference and specifies the extent of the \plc{p}, \plc{v1} and \plc{v2} arrays. 
@ -45,14 +47,14 @@ C/C++ pointers. The arrays \plc{v1} and \plc{v2} are mapped at each \code{target
 Instead of mapping the array \plc{p} twice, once at each target construct, \plc{p} is mapped 
 once by the \code{target} \code{data} construct.

-\fexample{target_data}{2f}
+\ffreeexample{target_data}{2}

-In the following example, the variable tmp defaults to \code{tofrom} map-type 
+In the following example, the variable \plc{tmp} defaults to the \code{tofrom} map-type 
 and is mapped at each \code{target} construct. The array \plc{Q} is mapped once at 
 the enclosing \code{target} \code{data} region instead of at each \code{target} 
 construct. 

-\cexample{target_data}{3c}
+\cexample{target_data}{3}

 In the following example the arrays \plc{v1} and \plc{v2} are mapped at each \code{target} 
 construct. Instead of mapping the array \plc{Q} twice at each \code{target} construct, 
@ -61,9 +63,9 @@ variable is implicitly remapped for each \code{target} region, mapping the value
 from the device to the host at the end of the first \code{target} region, and 
 from the host to the device for the second \code{target} region.

-\fexample{target_data}{3f}
+\ffreeexample{target_data}{3}

-\section{\code{target} \code{data} Construct with Orphaned Call}
+\subsection{\code{target} \code{data} Construct with Orphaned Call}

 The following two examples show how the \code{target} \code{data} construct 
 maps variables to a device data environment. The \code{target} \code{data} 
@ -88,7 +90,7 @@ of the storage location associated with their corresponding array sections. Note
 that the following pairs of array section storage locations are equivalent (\plc{p0[:N]}, 
 \plc{p1[:N]}), (\plc{v1[:N]},\plc{v3[:N]}), and (\plc{v2[:N]},\plc{v4[:N]}).

-\cexample{target_data}{4c}
+\cexample{target_data}{4}

 The Fortran code maps the pointers and storage in an identical manner (same extent, 
 but uses indices from 1 to \plc{N}).
@ -104,7 +106,7 @@ assigned the address of the storage location associated with their corresponding
-array sections. Note that the following pair of array storage locations are equivalent 
+array sections. Note that the following pairs of array storage locations are equivalent 
 (\plc{p0},\plc{p1}), (\plc{v1},\plc{v3}), and (\plc{v2},\plc{v4}).

-\fexample{target_data}{4f}
+\ffreeexample{target_data}{4}


 In the following example, the variables \plc{p1}, \plc{v3}, and \plc{v4} are references to the pointer 
@ -113,7 +115,7 @@ environment inherits the pointer variables \plc{p0}, \plc{v1}, and \plc{v2} from
 \code{data} construct's device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already 
 present in the device data environment.

-\cexample{target_data}{5c}
+\cppexample{target_data}{5}

 In the following example, the usual Fortran approach is used for dynamic memory. 
 The \plc{p0}, \plc{v1}, and \plc{v2} arrays are allocated in the main program and passed as references 
@ -123,9 +125,10 @@ environment inherits the arrays \plc{p0}, \plc{v1}, and \plc{v2} from the enclos
 device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already present in the device 
 data environment.

-\fexample{target_data}{5f}
+\ffreeexample{target_data}{5}

-\section{\code{target} \code{data} Construct with \code{if} Clause}
+\subsection{\code{target} \code{data} Construct with \code{if} Clause}
+\label{subsec:target_data_if}

 The following two examples show how the \code{target} \code{data} construct 
 maps variables to a device data environment.
@ -140,7 +143,7 @@ variable \plc{p} is implicitly mapped with a map-type of \code{tofrom}, but the
 location for the array section \plc{p[0:N]} will not be mapped in the device data environments 
 of the \code{target} constructs.

-\cexample{target_data}{6c}
+\cexample{target_data}{6}

 The \code{if} clauses work the same way for the following Fortran code. The \code{target} 
 constructs enclosed in the \code{target} \code{data} region should also use 
@ -148,7 +151,7 @@ an \code{if} clause with the same condition, so that the \code{target} \code{dat
 region and the \code{target} region are either both created for the device, or 
 are both ignored.

-\fexample{target_data}{6f}
+\ffreeexample{target_data}{6}

 In the following example, when the \code{if} clause conditional expression on 
 the \code{target} construct evaluates to \plc{false}, the target region will 
@ -159,7 +162,7 @@ region the array section \plc{p[0:N]} will be assigned from the device data envi
 to the corresponding variable in the data environment of the task that encountered 
 the \code{target} \code{data} construct, resulting in undefined values in \plc{p[0:N]}.

-\cexample{target_data}{7c}
+\cexample{target_data}{7}

 The \code{if} clauses work the same way for the following Fortran code. When 
 the \code{if} clause conditional expression on the \code{target} construct 
@ -171,5 +174,5 @@ region the \plc{p} array will be assigned from the device data environment to th
 variable in the data environment of the task that encountered the \code{target} 
 \code{data} construct, resulting in undefined values in \plc{p}.

-\fexample{target_data}{7f}
+\ffreeexample{target_data}{7}

--- a/Examples_target_unstructured_data.tex
+++ b/Examples_target_unstructured_data.tex
@ -0,0 +1,47 @@
+%begin 
+\pagebreak
+\section{\code{target} \code{enter} \code{data} and \code{target} \code{exit} \code{data} Constructs}
+\label{sec:target_enter_exit_data}
+%\section{Simple target enter data and target exit data Constructs}
+
+The structured data construct (\code{target}~\code{data}) provides persistent data on a
+device for subsequent \code{target} constructs as shown in the 
+\code{target}~\code{data} examples above. This is accomplished by creating a single
+\code{target}~\code{data} region containing \code{target} constructs.
+
+The unstructured data constructs allow the creation and deletion of data on
+the device at any appropriate point within the host code, as shown below 
+with the \code{target}~\code{enter}~\code{data} and \code{target}~\code{exit}~\code{data} constructs.
+
+The following C++ code creates/deletes a vector in a constructor/destructor 
+of a class. The constructor creates a vector with \code{target}~\code{enter}~\code{data}
+and uses the \code{alloc} map type in the \code{map} clause to avoid copying values
+to the device. The destructor deletes the data (\code{target}~\code{exit}~\code{data})
+and uses the \code{delete} map type in the \code{map} clause to avoid copying data
+back to the host. Note that the stand-alone \code{target}~\code{enter}~\code{data} construct occurs 
+after the host vector is created, and the \code{target}~\code{exit}~\code{data}
+construct occurs before the host data is deleted.
+
+\cppexample{target_unstructured_data}{1}
+
+The following C code allocates and frees the data member of a Matrix structure.
+The \code{init\_matrix} function allocates the memory used in the structure and
+uses the \code{target}~\code{enter}~\code{data} directive to map it to the target device. The
+\code{free\_matrix} function removes the mapped array from the target device
+and then frees the memory on the host.  Note that the stand-alone 
+\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the 
+\code{target}~\code{exit}~\code{data} construct occurs before the host data is freed.
+
+\cexample{target_unstructured_data}{1}
+
+The following Fortran code allocates and deallocates a module array.  The
+\code{initialize} subroutine allocates the module array and uses the
+\code{target}~\code{enter}~\code{data} directive to map it to the target device. The
+\code{finalize} subroutine removes the mapped array from the target device and
+then deallocates the array on the host.  Note that the stand-alone 
+\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the 
+\code{target}~\code{exit}~\code{data} construct occurs before the host data is deallocated.
+
+\ffreeexample{target_unstructured_data}{1}
+%end
+
--- a/Examples_target_update.tex
+++ b/Examples_target_update.tex
@ -1,8 +1,9 @@
 \pagebreak
-\chapter{\code{target} \code{update} Construct}
-\label{chap:target_update}
+\section{\code{target} \code{update} Construct}
+\label{sec:target_update}

-\section{Simple \code{target} \code{data} and \code{target} \code{update} Constructs}
+\subsection{Simple \code{target} \code{data} and \code{target} \code{update} Constructs}
+\label{subsec:target_data_and_update}

 The following example shows how the \code{target} \code{update} construct updates 
 variables in a device data environment.
@ -26,11 +27,12 @@ region and waits for the completion of the region.

 The second \code{target} region uses the updated values of \plc{v1[:N]} and \plc{v2[:N]}.

-\cexample{target_update}{1c}
+\cexample{target_update}{1}

-\fexample{target_update}{1f}
+\ffreeexample{target_update}{1}

-\section{\code{target} \code{update} Construct with \code{if} Clause}
+\subsection{\code{target} \code{update} Construct with \code{if} Clause}
+\label{subsec:target_update_if}

 The following example shows how the \code{target} \code{update} construct updates 
 variables in a device data environment.
@ -47,7 +49,7 @@ assigns the new values of \plc{v1} and \plc{v2} from the task's data environment
 mapped array sections in the \code{target} \code{data} construct's device data 
 environment.

-\cexample{target_update}{2c}
+\cexample{target_update}{2}

-\fexample{target_update}{2f}
+\ffreeexample{target_update}{2}

--- a/Examples_task_dep.tex
+++ b/Examples_task_dep.tex
@ -1,58 +1,62 @@
 \pagebreak
-\chapter{Task Dependences}
-\label{chap:task_dep}
+\section{Task Dependences}
+\label{sec:task_depend}

-\section{Flow Dependence}
+\subsection{Flow Dependence}
+\label{subsec:task_flow_depend}

 In this example we show a simple flow dependence expressed using the \code{depend} 
 clause on the \code{task} construct.

-\cexample{task_dep}{1c}
+\cexample{task_dep}{1}

-\fexample{task_dep}{1f}
+\ffreeexample{task_dep}{1}

 The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend} 
 clauses enforce the ordering of the tasks. If the \code{depend} clauses had been 
 omitted, then the tasks could execute in any order and the program
 would have a race condition.

-\section{Anti-dependence}
+\subsection{Anti-dependence}
+\label{subsec:task_anti_depend}

 In this example we show an anti-dependence expressed using the \code{depend} 
 clause on the \code{task} construct.

-\cexample{task_dep}{2c}
+\cexample{task_dep}{2}

-\fexample{task_dep}{2f}
+\ffreeexample{task_dep}{2}

 The program will always print \texttt{"}x = 1\texttt{"}, because the \code{depend} 
 clauses enforce the ordering of the tasks. If the \code{depend} clauses had been 
 omitted, then the tasks could execute in any order and the program would have a 
 race condition.

-\section{Output Dependence}
+\subsection{Output Dependence}
+\label{subsec:task_out_depend}

 In this example we show an output dependence expressed using the \code{depend} 
 clause on the \code{task} construct.

-\cexample{task_dep}{3c}
+\cexample{task_dep}{3}

-\fexample{task_dep}{3f}
+\ffreeexample{task_dep}{3}

 The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend} 
 clauses enforce the ordering of the tasks. If the \code{depend} clauses had been 
 omitted, then the tasks could execute in any order and the program would have a 
 race condition.

-\section{Concurrent Execution with Dependences}
+\subsection{Concurrent Execution with Dependences}
+\label{subsec:task_concurrent_depend}

 In this example we show potentially concurrent execution of tasks using multiple 
 flow dependences expressed using the \code{depend} clause on the \code{task} 
 construct.

-\cexample{task_dep}{4c}
+\cexample{task_dep}{4}

-\fexample{task_dep}{4f}
+\ffreeexample{task_dep}{4}

 The last two tasks depend on the first task. However, there is no dependence 
 between the last two tasks, which may execute in any order (or concurrently if 
@ -61,12 +65,13 @@ more than one thread is available). Thus, the possible outputs are \texttt{"}x
 If the \code{depend} clauses had been omitted, then all of the tasks could execute 
 in any order and the program would have a race condition.

-\section{Matrix multiplication}
+\subsection{Matrix Multiplication}
+\label{subsec:task_matrix_mult}

 This example shows a task-based blocked matrix multiplication. Matrices are of 
 NxN elements, and the multiplication is implemented using blocks of BSxBS elements.

-\cexample{task_dep}{5c}
+\cexample{task_dep}{5}

-\fexample{task_dep}{5f}
+\ffreeexample{task_dep}{5}

--- a/Examples_task_priority.tex
+++ b/Examples_task_priority.tex
@ -0,0 +1,22 @@
+\pagebreak
+\section{Task Priority}
+\label{sec:task_priority}
+
+
+
+%\subsection{Task Priority}
+%\label{subsec:task_priority}
+
+In this example we compute arrays in a matrix through a \plc{compute\_array} routine.
+Each task has a priority value equal to the value of the loop variable \plc{i} at the
+moment of its creation. A higher priority value hints that a task is a candidate
+to run sooner than ready tasks with lower priority values.
+
+The creation of tasks occurs in ascending order (according to the iteration space of
+the loop) but a hint, by means of the \code{priority} clause, is provided to reverse
+the execution order.
+
+\cexample{task_priority}{1}
+
+\ffreeexample{task_priority}{1}
+
--- a/Examples_taskgroup.tex
+++ b/Examples_taskgroup.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{The \code{taskgroup} Construct}
-\label{chap:taskgroup}
+\section{The \code{taskgroup} Construct}
+\label{sec:taskgroup}

 In this example, tasks are grouped and synchronized using the \code{taskgroup} 
 construct.
@ -14,7 +14,7 @@ does not participate in the synchronization, and is left free to execute in para
 This contrasts with the behavior of the \code{taskwait} construct, which would 
 include the background tasks in the synchronization.

-\cexample{taskgroup}{1c}
+\cexample{taskgroup}{1}

-\fexample{taskgroup}{1f}
+\ffreeexample{taskgroup}{1}

--- a/Examples_tasking.tex
+++ b/Examples_tasking.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{The \code{task} and \code{taskwait} Constructs}
-\label{chap:tasking}
+\section{The \code{task} and \code{taskwait} Constructs}
+\label{sec:task_taskwait}

 The following example shows how to traverse a tree-like structure using explicit 
 tasks. Note that the \code{traverse} function should be called from within a 
@ -9,17 +9,17 @@ note that the tasks will be executed in no specified order because there are no
 synchronization directives. Thus, assuming that the traversal will be done in post 
 order, as in the sequential code, is wrong.

-\cexample{tasking}{1c}
+\cexample{tasking}{1}

-\fexample{tasking}{1f}
+\ffreeexample{tasking}{1}

 In the next example, we force a postorder traversal of the tree by adding a \code{taskwait} 
 directive. Now, we can safely assume that the left and right sons have been executed 
 before we process the current node.

-\cexample{tasking}{2c}
+\cexample{tasking}{2}

-\fexample{tasking}{2f}
+\ffreeexample{tasking}{2}

 The following example demonstrates how to use the \code{task} construct to process 
 elements of a linked list in parallel. The thread executing the \code{single} 
@ -28,18 +28,18 @@ in the current team. The pointer \plc{p} is \code{firstprivate} by default
 on the \code{task} construct so it is not necessary to specify it in a \code{firstprivate} 
 clause.

-\cexample{tasking}{3c}
+\cexample{tasking}{3}

-\fexample{tasking}{3f}
+\ffreeexample{tasking}{3}

 The \code{fib()} function should be called from within a \code{parallel}  region 
 for the different specified tasks to be executed in parallel. Also, only one thread 
 of the \code{parallel} region should call \code{fib()} unless multiple concurrent 
 Fibonacci computations are desired. 

-\cexample{tasking}{4c}
+\cexample{tasking}{4}

-\fexample{tasking}{4f}
+\fexample{tasking}{4}

 Note: There are more efficient algorithms for computing Fibonacci numbers. This 
 classic recursion algorithm is for illustrative purposes.
@ -52,9 +52,9 @@ loop to suspend its task at the task scheduling point in the \code{task} directi
 and start executing unassigned tasks.  Once the number of unassigned tasks is sufficiently 
 low, the thread may resume execution of the task generating loop.

-\cexample{tasking}{5c}
+\cexample{tasking}{5}
 \pagebreak
-\fexample{tasking}{5f}
+\fexample{tasking}{5}

 The following example is the same as the previous one, except that the tasks are 
 generated in an untied task. While generating the tasks, the implementation may 
@ -69,9 +69,9 @@ to resume the task generating loop. In the previous examples, the other threads
 would be forced to idle until the generating thread finishes its long task, since 
 the task generating loop was in a tied task.

-\cexample{tasking}{6c}
+\cexample{tasking}{6}

-\fexample{tasking}{6f}
+\fexample{tasking}{6}

 The following two examples demonstrate how the scheduling rules illustrated in 
 Section 2.11.3 of the OpenMP 4.0 specification affect the usage of 
@ -86,20 +86,20 @@ both of the task regions that modify \code{tp}. The parts of these task regions
 in which \code{tp} is modified may be executed in any order so the resulting 
 value of \code{var} can be either 1 or 2.

-\cexample{tasking}{7c}
+\cexample{tasking}{7}


-\fexample{tasking}{7f}
+\fexample{tasking}{7}

 In this example, scheduling constraints prohibit a thread in the team from executing 
 a new task that modifies \code{tp}  while another such task region tied to the 
 same thread is suspended. Therefore, the value written will persist across the 
 task scheduling point.

-\cexample{tasking}{8c}
+\cexample{tasking}{8}


-\fexample{tasking}{8f}
+\fexample{tasking}{8}

 The following two examples demonstrate how the scheduling rules illustrated in 
 Section 2.11.3 of the OpenMP 4.0 specification affect the usage of locks 
@ -112,20 +112,20 @@ it encounters the task scheduling point at task 3, it could suspend task 1 and
 begin task 2 which will result in a deadlock when it tries to enter critical region 
 1.

-\cexample{tasking}{9c}
+\cexample{tasking}{9}


-\fexample{tasking}{9f}
+\fexample{tasking}{9}

 In the following example, \code{lock} is held across a task scheduling point. 
 However, according to the scheduling restrictions, the executing thread cannot 
 begin executing one of the non-descendant tasks that also acquires \code{lock} before 
 the task region is complete.  Therefore, no deadlock is possible.

-\cexample{tasking}{10c}
+\cexample{tasking}{10}


-\fexample{tasking}{10f}
+\ffreeexample{tasking}{10}

 The following examples illustrate the use of the \code{mergeable} clause in the 
 \code{task} construct. In this first example, the \code{task} construct has 
@ -139,9 +139,9 @@ outcome does not depend on whether or not the task is merged (that is, the task
 will always increment the same variable and will always compute the same value 
 for \code{x}).

-\cexample{tasking}{11c}
+\cexample{tasking}{11}

-\fexample{tasking}{11f}
+\ffreeexample{tasking}{11}

 This second example shows an incorrect use of the \code{mergeable} clause. In 
 this example, the created task will access different instances of the variable 
@ -150,9 +150,9 @@ it will access the same variable \code{x} if the task is merged. As a result,
 the behavior of the program is unspecified and it can print two different values 
 for \code{x} depending on the decisions taken by the implementation.

-\cexample{tasking}{12c}
+\cexample{tasking}{12}

-\fexample{tasking}{12f}
+\ffreeexample{tasking}{12}

 The following example shows the use of the \code{final} clause and the \code{omp\_in\_final} 
 API call in a recursive binary search program. To reduce overhead, once a certain 
@ -170,9 +170,9 @@ in the stack could also be avoided but it would make this example less clear. Th
 clause since all tasks created in a \code{final} task region are included tasks 
 that can be merged if the \code{mergeable} clause is present.

-\cexample{tasking}{13c}
+\cexample{tasking}{13}

-\fexample{tasking}{13f}
+\ffreeexample{tasking}{13}

 The following example illustrates the difference between the \code{if}  and the 
 \code{final} clauses. The \code{if} clause has a local effect. In the first 
@ -184,7 +184,7 @@ task itself. In the second nest of tasks, the nested tasks will be created as in
 tasks. Note also that the conditions for the \code{if} and \code{final} clauses 
 are usually the opposite.

-\cexample{tasking}{14c}
+\cexample{tasking}{14}

-\fexample{tasking}{14f}
+\ffreeexample{tasking}{14}

--- a/Examples_taskloop.tex
+++ b/Examples_taskloop.tex
@ -0,0 +1,14 @@
+\pagebreak
+\section{The \code{taskloop} Construct}
+\label{sec:taskloop}
+
+The following example illustrates how to execute a long-running task concurrently with tasks created
+with a \code{taskloop} directive for a loop having unbalanced amounts of work for its iterations.
+
+The \code{grainsize} clause specifies that each task is to execute at least 500 iterations of the loop. 
+
+The \code{nogroup} clause removes the implicit taskgroup of the \code{taskloop} construct; the explicit \code{taskgroup} construct in the example ensures that the function is not exited before the long-running task and the loops have finished execution.
+
+\cexample{taskloop}{1}
+
+\ffreeexample{taskloop}{1}
--- a/Examples_taskyield.tex
+++ b/Examples_taskyield.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{The \code{taskyield} Construct}
-\label{chap:taskyield}
+\section{The \code{taskyield} Construct}
+\label{sec:taskyield}

 The following example illustrates the use of the \code{taskyield}  directive. 
 The tasks in the example compute something useful and then do some computation 
@ -8,7 +8,7 @@ that must be done in a critical region. By using \code{taskyield} when a task
 cannot get access to the \code{critical} region the implementation can suspend 
 the current task and schedule some other task that can do something useful. 

-\cexample{taskyield}{1c}
+\cexample{taskyield}{1}

-\fexample{taskyield}{1f}
+\ffreeexample{taskyield}{1}

--- a/Examples_teams.tex
+++ b/Examples_teams.tex
@ -1,9 +1,10 @@
 \pagebreak
-\chapter{\code{teams} Constructs}
-\label{chap:teams}
+\section{\code{teams} Constructs}
+\label{sec:teams}

-\section{\code{target} and \code{teams} Constructs with \code{omp\_get\_num\_teams}\\
+\subsection{\code{target} and \code{teams} Constructs with \code{omp\_get\_num\_teams}\\
 and \code{omp\_get\_team\_num} Routines}
+\label{subsec:teams_api}

 The following example shows how the \code{target} and \code{teams} constructs 
 are used to create a league of thread teams that execute a region. The \code{teams} 
@ -15,11 +16,12 @@ region. The \code{omp\_get\_team\_num} routine returns the team number, which is
 between 0 and one less than the value returned by \code{omp\_get\_num\_teams}. The following 
 example manually distributes a loop across two teams.

-\cexample{teams}{1c}
+\cexample{teams}{1}

-\fexample{teams}{1f}
+\ffreeexample{teams}{1}

-\section{\code{target}, \code{teams}, and \code{distribute} Constructs}
+\subsection{\code{target}, \code{teams}, and \code{distribute} Constructs}
+\label{subsec:teams_distribute}

 The following example shows how the \code{target}, \code{teams}, and \code{distribute} 
 constructs are used to execute a loop nest in a \code{target} region. The \code{teams} 
@ -45,11 +47,12 @@ created by the \code{teams} construct. At the end of the \code{teams} region,
 each master thread's private copy of \plc{sum} is reduced into the final \plc{sum} that is 
 implicitly mapped into the \code{target} region.

-\cexample{teams}{2c}
+\cexample{teams}{2}

-\fexample{teams}{2f}
+\ffreeexample{teams}{2}

-\section{\code{target} \code{teams}, and Distribute Parallel Loop Constructs}
+\subsection{\code{target} \code{teams} and Distribute Parallel Loop Constructs}
+\label{subsec:teams_distribute_parallel}

 The following example shows how the \code{target} \code{teams} and distribute 
 parallel loop constructs are used to execute a \code{target} region. The \code{target} 
@ -59,12 +62,13 @@ team executes the \code{teams} region.
 The distribute parallel loop construct schedules the loop iterations across the 
 master threads of each team and then across the threads of each team.

-\cexample{teams}{3c}
+\cexample{teams}{3}

-\fexample{teams}{3f}
+\ffreeexample{teams}{3}

-\section{\code{target} \code{teams} and Distribute Parallel Loop 
+\subsection{\code{target} \code{teams} and Distribute Parallel Loop 
 Constructs with Scheduling Clauses}
+\label{subsec:teams_distribute_parallel_schedule}

 The following example shows how the \code{target} \code{teams} and distribute 
 parallel loop constructs are used to execute a \code{target} region. The \code{teams} 
@ -83,11 +87,12 @@ The \code{schedule} clause indicates that the 1024 iterations distributed to
 a master thread are then assigned to the threads in its associated team in chunks 
 of 64 iterations.

-\cexample{teams}{4c}
+\cexample{teams}{4}

-\fexample{teams}{4f}
+\ffreeexample{teams}{4}

-\section{\code{target} \code{teams} and \code{distribute} \code{simd} Constructs}
+\subsection{\code{target} \code{teams} and \code{distribute} \code{simd} Constructs}
+\label{subsec:teams_distribute_simd}

 The following example shows how the \code{target} \code{teams} and \code{distribute} 
 \code{simd} constructs are used to execute a loop in a \code{target} region. 
@ -97,11 +102,12 @@ master thread of each team executes the \code{teams} region.
 The \code{distribute} \code{simd} construct schedules the loop iterations across 
 the master thread of each team and then uses SIMD parallelism to execute the iterations.

-\cexample{teams}{5c}
+\cexample{teams}{5}

-\fexample{teams}{5f}
+\ffreeexample{teams}{5}

-\section{\code{target} \code{teams} and Distribute Parallel Loop SIMD Constructs}
+\subsection{\code{target} \code{teams} and Distribute Parallel Loop SIMD Constructs}
+\label{subsec:teams_distribute_parallel_simd}

 The following example shows how the \code{target} \code{teams} and the distribute 
 parallel loop SIMD constructs are used to execute a loop in a \code{target} \code{teams} 
@ -112,7 +118,7 @@ The distribute parallel loop SIMD construct schedules the loop iterations across
 the master thread of each team and then across the threads of each team where each 
 thread uses SIMD parallelism.

-\cexample{teams}{6c}
+\cexample{teams}{6}

-\fexample{teams}{6f}
+\ffreeexample{teams}{6}

--- a/Examples_threadprivate.tex
+++ b/Examples_threadprivate.tex
@ -1,18 +1,18 @@
 \pagebreak
-\chapter{The \code{threadprivate} Directive}
-\label{chap:threadprivate}
+\section{The \code{threadprivate} Directive}
+\label{sec:threadprivate}

 The following examples demonstrate how to use the \code{threadprivate} directive 
 to give each thread a separate counter.

-\cexample{threadprivate}{1c}
+\cexample{threadprivate}{1}

-\fexample{threadprivate}{1f}
+\fexample{threadprivate}{1}

 \ccppspecificstart
 The following example uses \code{threadprivate} on a static variable:

-\cnexample{threadprivate}{2c}
+\cnexample{threadprivate}{2}

 The following example demonstrates unspecified behavior for the initialization 
 of a \code{threadprivate} variable. A \code{threadprivate}  variable is initialized 
@ -22,7 +22,7 @@ constructed using the value of \code{x}  (which is modified by the statement
 region could be either 1 or 2. This problem is avoided for \code{b}, which uses 
 an auxiliary \code{const} variable and a copy-constructor.

-\cnexample{threadprivate}{3c}
+\cppnexample{threadprivate}{3}
 \ccppspecificend

 The following examples show non-conforming uses and correct uses of the \code{threadprivate} 
@ -32,29 +32,25 @@ directive.
 The following example is non-conforming because the common block is not declared 
 local to the subroutine that refers to it:

-\fnexample{threadprivate}{2f}
+\fnexample{threadprivate}{2}

 The following example is also non-conforming because the common block is not declared 
 local to the subroutine that refers to it:

-\fnexample{threadprivate}{3f}
+\fnexample{threadprivate}{3}

 The following example is a correct rewrite of the previous example:
-% blue line floater at top of this page for "Fortran, cont."
-\begin{figure}[t!]
-\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
-\end{figure}

-\fnexample{threadprivate}{4f}
+\fnexample{threadprivate}{4}

 The following is an example of the use of \code{threadprivate} for local variables:
-
-\fnexample{threadprivate}{5f}
 % blue line floater at top of this page for "Fortran, cont."
 \begin{figure}[t!]
 \linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
 \end{figure}

+\fnexample{threadprivate}{5}
+
 The above program, if executed by two threads, will print one of the following 
 two sets of output: 

@ -85,8 +81,12 @@ or
 \code{i = 5}

 The following is an example of the use of \code{threadprivate} for module variables:
+% blue line floater at top of this page for "Fortran, cont."
+\begin{figure}[t!]
+\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
+\end{figure}

-\fnexample{threadprivate}{6f}
+\fnexample{threadprivate}{6}
 \fortranspecificend

 \cppspecificstart
@ -95,12 +95,12 @@ for class-type \code{T}. \code{t1} is default constructed, \code{t2} is construc
 taking a constructor accepting one argument of integer type, \code{t3} is copy 
 constructed with argument \code{f()}:

-\cnexample{threadprivate}{4c}
+\cppnexample{threadprivate}{4}

 The following example illustrates the use of \code{threadprivate} for static 
 class members. The \code{threadprivate} directive for a static class member must 
 be placed inside the class definition.

-\cnexample{threadprivate}{5c}
+\cppnexample{threadprivate}{5}
 \cppspecificend

--- a/Examples_workshare.tex
+++ b/Examples_workshare.tex
@ -1,7 +1,7 @@
 \pagebreak
-\chapter{The \code{workshare} Construct}
+\section{The \code{workshare} Construct}
 \fortranspecificstart
-\label{chap:workshare}
+\label{sec:workshare}

 The following are examples of the \code{workshare} construct. 

@ -10,14 +10,14 @@ the \code{parallel} region, and there is a barrier after the last statement.
 Implementations must enforce Fortran execution rules inside of the \code{workshare} 
 block.

-\fnexample{workshare}{1f}
+\fnexample{workshare}{1}

 In the following example, the barrier at the end of the first \code{workshare} 
 region is eliminated with a \code{nowait} clause. Threads doing \code{CC = 
 DD} immediately begin work on \code{EE = FF} when they are done with \code{CC 
 = DD}.

-\fnexample{workshare}{2f}
+\fnexample{workshare}{2}
 % blue line floater at top of this page for "Fortran, cont."
 \begin{figure}[t!]
 \linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
@ -27,7 +27,7 @@ The following example shows the use of an \code{atomic} directive inside a \code
 construct. The computation of \code{SUM(AA)} is workshared, but the update to 
 \code{R} is atomic.

-\fnexample{workshare}{3f}
+\fnexample{workshare}{3}

 Fortran \code{WHERE} and \code{FORALL} statements are \emph{compound statements}, 
 made up of a \emph{control} part and a \emph{statement} part. When \code{workshare} 
@ -47,7 +47,7 @@ Each task gets worked on in order by the threads:
 \\
 \code{GG = HH}

-\fnexample{workshare}{4f}
+\fnexample{workshare}{4}
 % blue line floater at top of this page for "Fortran, cont."
 \begin{figure}[t!]
 \linewitharrows{-1}{dashed}{Fortran (cont.)}{8em}
@ -56,21 +56,21 @@ Each task gets worked on in order by the threads:
 In the following example, an assignment to a shared scalar variable is performed 
 by one thread in a \code{workshare} while all other threads in the team wait.

-\fnexample{workshare}{5f}
+\fnexample{workshare}{5}

 The following example contains an assignment to a private scalar variable, which 
 is performed by one thread in a \code{workshare} while all other threads wait. 
 It is non-conforming because the private scalar variable is undefined after the 
 assignment statement. 

-\fnexample{workshare}{6f}
+\fnexample{workshare}{6}

 Fortran execution rules must be enforced inside a \code{workshare} construct. 
 In the following example, the same result is produced regardless of whether the 
 program fragment is executed sequentially or inside an OpenMP program with 
 multiple threads:

-\fnexample{workshare}{7f}
+\fnexample{workshare}{7}
 \fortranspecificend


--- a/Examples_worksharing_critical.tex
+++ b/Examples_worksharing_critical.tex
@ -1,6 +1,6 @@
 \pagebreak
-\chapter{Worksharing Constructs Inside a \code{critical} Construct}
-\label{chap:worksharing_critical}
+\section{Worksharing Constructs Inside a \code{critical} Construct}
+\label{sec:worksharing_critical}

 The following example demonstrates using a worksharing construct inside a \code{critical} 
 construct. This example is conforming because the worksharing \code{single}  
@ -11,8 +11,8 @@ region, creates a new team of threads, and becomes the master of the new team.
 One of the threads in the new team enters the \code{single} region and increments 
 \code{i} by \code{1}. At the end of this example \code{i} is equal to \code{2}.

-\cexample{worksharing_critical}{1c}
+\cexample{worksharing_critical}{1}

-\fexample{worksharing_critical}{1f}
+\fexample{worksharing_critical}{1}


--- a/History.tex
+++ b/History.tex
@ -1,11 +1,39 @@
 \chapter{Document Revision History}
 \label{chap:history}

+\section{Changes from 4.0.2 to 4.5.0}
+\begin{itemize}
+\item Reorganized into chapters of major topics
+\item Included file extensions in example labels to indicate source type
+\item Added explicit \code{map(tofrom)} clauses for scalar variables 
+in a number of examples to comply with 
+the change of the default behavior for scalar variables from 
+\code{map(tofrom)} to \code{firstprivate} in the 4.5 specification
+\item Added the following new examples:
+\begin{itemize}
+\item \code{linear} clause in loop constructs (\specref{sec:linear_in_loop})
+\item task priority (\specref{sec:task_priority})
+\item \code{taskloop} construct (\specref{sec:taskloop})
+\item \plc{directive-name} modifier in multiple \code{if} clauses on
+a combined construct (\specref{subsec:target_if})
+\item unstructured data mapping (\specref{sec:target_enter_exit_data})
+\item \code{link} clause for \code{declare}~\code{target} directive 
+(\specref{subsec:declare_target_link})
+\item asynchronous target execution with \code{nowait} clause (\specref{sec:async_target_exec_depend})
+\item device memory routines and device pointers
+(\specref{subsec:target_mem_and_device_ptrs})
+\item doacross loop nest (\specref{sec:doacross})
+\item locks with hints (\specref{sec:locks})
+\item C/C++ array reduction (\specref{sec:reduction})
+\item C++ reference types in data sharing clauses (\specref{sec:cpp_reference})
+\end{itemize}
+\end{itemize}
+
 \section{Changes from 4.0.1 to 4.0.2}

 \begin{itemize}
 \item Names of examples were changed from numbers to mnemonics
-\item Added SIMD examples (\specref{chap:SIMD})
+\item Added SIMD examples (\specref{sec:SIMD})
 \item Applied miscellaneous fixes in several source codes
 \item Added the revision history
 \end{itemize}
@ -14,8 +42,8 @@

 Added the following new examples:
 \begin{itemize}
-\item the \code{proc\_bind} clause (\specref{chap:affinity})
-\item the \code{taskgroup} construct (\specref{chap:taskgroup})
+\item the \code{proc\_bind} clause (\specref{sec:affinity})
+\item the \code{taskgroup} construct (\specref{sec:taskgroup})
 \end{itemize}

 \section{Changes from 3.1 to 4.0}
@@ -25,16 +53,16 @@ from the specification document.

 Version 4.0 added the following new examples:
 \begin{itemize}
-\item task dependences (\specref{chap:task_dep})
-\item cancellation constructs (\specref{chap:cancellation})
-\item \code{target} construct (\specref{chap:target})
-\item \code{target} \code{data} construct (\specref{chap:target_data})
-\item \code{target} \code{update} construct (\specref{chap:target_update})
-\item \code{declare} \code{target} construct (\specref{chap:declare_target})
-\item \code{teams} constructs (\specref{chap:teams})
+\item task dependences (\specref{sec:task_depend})
+\item \code{target} construct (\specref{sec:target})
+\item \code{target} \code{data} construct (\specref{sec:target_data})
+\item \code{target} \code{update} construct (\specref{sec:target_update})
+\item \code{declare} \code{target} construct (\specref{sec:declare_target})
+\item \code{teams} constructs (\specref{sec:teams})
 \item asynchronous execution of a \code{target} region using tasks
- (\specref{chap:async_target})
-\item array sections in device constructs (\specref{chap:array_sections})
-\item device runtime routines (\specref{chap:device})
-\item Fortran ASSOCIATE construct (\specref{chap:associate})
+ (\specref{subsec:async_target_with_tasks})
+\item array sections in device constructs (\specref{sec:array_sections})
+\item device runtime routines (\specref{sec:device})
+\item Fortran ASSOCIATE construct (\specref{sec:associate})
+\item cancellation constructs (\specref{sec:cancellation})
 \end{itemize}
--- a/Introduction_Chapt.tex
+++ b/Introduction_Chapt.tex
@@ -34,13 +34,14 @@

 \chapter*{Introduction}
 \label{chap:introduction}
+\addcontentsline{toc}{chapter}{\protect\numberline{}Introduction}
 This collection of programming examples supplements the OpenMP API for Shared
 Memory Parallelization specifications, and is not part of the formal specifications. It
 assumes familiarity with the OpenMP specifications, and shares the typographical
 conventions used in that document.

 \notestart
-\noteheader – This first release of the OpenMP Examples reflects the OpenMP Version 4.0
+\noteheader – This release of the OpenMP Examples reflects the OpenMP Version 4.5
 specifications. Additional examples are being developed and will be published in future
 releases of this document.
 \noteend
--- a/Makefile
+++ b/Makefile
@@ -1,75 +1,20 @@
-# Makefile for the OpenMP Examples document in LaTex format. 
+# Makefile for the OpenMP Examples document in LaTeX format.
 # For more information, see the master document, openmp-examples.tex.

-version=4.0.2
+version=4.5.0
 default: openmp-examples.pdf


 CHAPTERS=Title_Page.tex \
 	Introduction_Chapt.tex \
-	Examples_Chapt.tex \
-	Examples_ploop.tex \
-	Examples_mem_model.tex \
-	Examples_cond_comp.tex \
-	Examples_icv.tex \
-	Examples_parallel.tex \
-	Examples_nthrs_nesting.tex \
-	Examples_nthrs_dynamic.tex \
-	Examples_affinity.tex \
-	Examples_fort_do.tex \
-	Examples_fort_loopvar.tex \
-	Examples_nowait.tex \
-	Examples_collapse.tex \
-	Examples_psections.tex \
-	Examples_fpriv_sections.tex \
-	Examples_single.tex \
-	Examples_tasking.tex \
-	Examples_task_dep.tex \
-	Examples_taskgroup.tex \
-	Examples_taskyield.tex \
-	Examples_workshare.tex \
-	Examples_master.tex \
-	Examples_critical.tex \
-	Examples_worksharing_critical.tex \
-	Examples_barrier_regions.tex \
-	Examples_atomic.tex \
-	Examples_atomic_restrict.tex \
-	Examples_flush_nolist.tex \
-	Examples_standalone.tex \
-	Examples_ordered.tex \
-	Examples_cancellation.tex \
-	Examples_threadprivate.tex \
-	Examples_pra_iterator.tex \
-	Examples_fort_sp_common.tex \
-	Examples_default_none.tex \
-	Examples_fort_race.tex \
-	Examples_private.tex \
-	Examples_fort_sa_private.tex \
-	Examples_carrays_fpriv.tex \
-	Examples_lastprivate.tex \
-	Examples_reduction.tex \
-	Examples_copyin.tex \
-	Examples_copyprivate.tex \
-	Examples_nested_loop.tex \
-	Examples_nesting_restrict.tex \
-	Examples_set_dynamic_nthrs.tex \
-	Examples_get_nthrs.tex \
-	Examples_init_lock.tex \
-	Examples_lock_owner.tex \
-	Examples_simple_lock.tex \
-	Examples_nestable_lock.tex \
-	Examples_SIMD.tex \
-	Examples_target.tex \
-	Examples_target_data.tex \
-	Examples_target_update.tex \
-	Examples_declare_target.tex \
-	Examples_teams.tex \
-	Examples_async_target.tex \
-	Examples_array_sections.tex \
-	Examples_device.tex \
-	Examples_associate.tex \
+	Examples_*.tex \
 	History.tex

+SOURCES=sources/*.c \
+	sources/*.cpp \
+	sources/*.f90 \
+	sources/*.f 
+
 INTERMEDIATE_FILES=openmp-examples.pdf \
 		openmp-examples.toc \
 		openmp-examples.idx \
@@ -79,7 +24,7 @@ INTERMEDIATE_FILES=openmp-examples.pdf \
 		openmp-examples.out \
 		openmp-examples.log

-openmp-examples.pdf: $(CHAPTERS) openmp.sty openmp-examples.tex openmp-logo.png
+openmp-examples.pdf: $(CHAPTERS) $(SOURCES) openmp.sty openmp-examples.tex openmp-logo.png
 	rm -f $(INTERMEDIATE_FILES)
 	pdflatex -interaction=batchmode -file-line-error openmp-examples.tex
 	pdflatex -interaction=batchmode -file-line-error openmp-examples.tex
--- a/Title_Page.tex
+++ b/Title_Page.tex
@@ -27,7 +27,7 @@ Source codes for OpenMP \VER{} Examples can be downloaded from
 \href{https://github.com/OpenMP/Examples/tree/v\VER}{github}.\\

 \begin{adjustwidth}{0pt}{1em}\setlength{\parskip}{0.25\baselineskip}%
-Copyright © 1997-2015 OpenMP Architecture Review Board.\\
+Copyright © 1997-2016 OpenMP Architecture Review Board.\\
 Permission to copy without fee all or part of this material is granted,
 provided the OpenMP Architecture Review Board copyright notice and
 the title of this document appear. Notice is given that copying is by
--- a/omp_copyright.txt
+++ b/omp_copyright.txt
@@ -1,4 +1,4 @@
-Copyright (c) 1997-2015 OpenMP Architecture Review Board.
+Copyright (c) 1997-2016 OpenMP Architecture Review Board.
 All rights reserved.

 Permission to redistribute and use without fee all or part of the source
--- a/openmp-examples.tcp
+++ b/openmp-examples.tcp
@@ -0,0 +1,11 @@
+[FormatInfo]
+Type=TeXnicCenterProjectInformation
+Version=4
+
+[ProjectInfo]
+MainFile=ClassicThesis.tex
+UseBibTeX=1
+UseMakeIndex=0
+ActiveProfile=LaTeX ⇨ PDF
+ProjectLanguage=en
+ProjectDialect=US
--- a/openmp-examples.tex
+++ b/openmp-examples.tex
@@ -48,8 +48,8 @@
 \documentclass[10pt,letterpaper,twoside,makeidx,hidelinks]{scrreprt}

 % Text to appear in the footer on even-numbered pages:
-\newcommand{\VER}{4.0.2}
-\newcommand{\VERDATE}{March 2015}
+\newcommand{\VER}{4.5.0}
+\newcommand{\VERDATE}{November 2016}
 \newcommand{\footerText}{OpenMP Examples Version \VER{} - \VERDATE}

 % Unified style sheet for OpenMP documents:
@@ -77,71 +77,120 @@

    \setcounter{chapter}{0}  % start chapter numbering here

-    \input{Examples_ploop}
-    \input{Examples_mem_model}
-    \input{Examples_cond_comp}
-    \input{Examples_icv}
-    \input{Examples_parallel}
-    \input{Examples_nthrs_nesting}
-    \input{Examples_nthrs_dynamic}
-    \input{Examples_affinity}
-    \input{Examples_fort_do}
-    \input{Examples_fort_loopvar}
-    \input{Examples_nowait}
-    \input{Examples_collapse}
-    \input{Examples_psections}
-    \input{Examples_fpriv_sections}
-    \input{Examples_single}
-    \input{Examples_tasking}
-    \input{Examples_task_dep}
-    \input{Examples_taskgroup}
-    \input{Examples_taskyield}
-    \input{Examples_workshare}
-    \input{Examples_master}
-    \input{Examples_critical}
-    \input{Examples_worksharing_critical}
-    \input{Examples_barrier_regions}
-    \input{Examples_atomic}
-    \input{Examples_atomic_restrict}
-    \input{Examples_flush_nolist}
-    \input{Examples_standalone}
-    \input{Examples_ordered}
-    \input{Examples_cancellation}
-    \input{Examples_threadprivate}
-    \input{Examples_pra_iterator}
-    \input{Examples_fort_sp_common}
-    \input{Examples_default_none}
-    \input{Examples_fort_race}
-    \input{Examples_private}
-    \input{Examples_fort_sa_private}
-    \input{Examples_carrays_fpriv}
-    \input{Examples_lastprivate}
-    \input{Examples_reduction}
-    \input{Examples_copyin}
-    \input{Examples_copyprivate}
-    \input{Examples_nested_loop}
-    \input{Examples_nesting_restrict}
-    \input{Examples_set_dynamic_nthrs}
-    \input{Examples_get_nthrs}
-    \input{Examples_init_lock}
-    \input{Examples_lock_owner}
-    \input{Examples_simple_lock}
-    \input{Examples_nestable_lock}
-    \input{Examples_SIMD}
-    \input{Examples_target}
-    \input{Examples_target_data}
-    \input{Examples_target_update}
-    \input{Examples_declare_target}
-    \input{Examples_teams}
-    \input{Examples_async_target}
-    \input{Examples_array_sections}
-    \input{Examples_device}
-    \input{Examples_associate}
+    \input{Chap_parallel_execution}
+       \input{Examples_ploop}
+       \input{Examples_parallel}
+       \input{Examples_nthrs_nesting}
+       \input{Examples_nthrs_dynamic}
+       \input{Examples_fort_do}
+       \input{Examples_nowait}
+       \input{Examples_collapse}
+     % linear Clause 475
+       \input{Examples_linear_in_loop}
+       \input{Examples_psections}
+       \input{Examples_fpriv_sections}
+       \input{Examples_single}
+       \input{Examples_workshare}
+       \input{Examples_master}
+       \input{Examples_pra_iterator}
+       \input{Examples_set_dynamic_nthrs}
+       \input{Examples_get_nthrs}
+
+    \input{Chap_affinity}
+       \input{Examples_affinity}
+       \input{Examples_affinity_query}
+
+    \input{Chap_tasking}
+       \input{Examples_tasking}
+       \input{Examples_task_priority}
+       \input{Examples_task_dep}
+       \input{Examples_taskgroup}
+       \input{Examples_taskyield}
+       \input{Examples_taskloop}
+
+    \input{Chap_devices}
+       \input{Examples_target}
+       \input{Examples_target_data}
+       \input{Examples_target_unstructured_data}
+       \input{Examples_target_update}
+       \input{Examples_declare_target}
+           % Link clause 474
+       \input{Examples_teams}
+       \input{Examples_async_target_depend}
+       \input{Examples_async_target_with_tasks}
+           %Title change of 57.1 and 57.2
+           %New subsection
+       \input{Examples_async_target_nowait}
+       \input{Examples_async_target_nowait_depend}
+       \input{Examples_array_sections}
+     % Structure Element in map 487
+       \input{Examples_device}
+            % MemoryRoutine and Device ptr  473
+
+    \input{Chap_SIMD}
+       \input{Examples_SIMD}
+      % Forward Depend 370
+      % simdlen  476
+      % simd linear modifier 480
+
+    \input{Chap_synchronization}
+       \input{Examples_critical}
+       \input{Examples_worksharing_critical}
+       \input{Examples_barrier_regions}
+       \input{Examples_atomic}
+       \input{Examples_atomic_restrict}
+       \input{Examples_flush_nolist}
+       \input{Examples_ordered}
+     % Doacross loop  405
+       \input{Examples_doacross}
+       \input{Examples_locks}
+            \input{Examples_init_lock}
+            \input{Examples_init_lock_with_hint} 
+            \input{Examples_lock_owner}
+            \input{Examples_simple_lock}
+            \input{Examples_nestable_lock}
+    %  % LOCK with Hints 478
+    %       % Hint Clause  xxxxxx (included after init_lock)
+    %       % Lock routines with hint
+           
+
+    \input{Chap_data_environment}
+       \input{Examples_threadprivate}
+       \input{Examples_default_none}
+       \input{Examples_private}
+       \input{Examples_fort_loopvar}
+       \input{Examples_fort_sp_common}
+       \input{Examples_fort_sa_private}
+       \input{Examples_carrays_fpriv}
+       \input{Examples_lastprivate}
+       \input{Examples_reduction}
+     %  User UDR  287
+     %  C array reduction 377
+       \input{Examples_copyin}
+       \input{Examples_copyprivate}
+       \input{Examples_cpp_reference}
+     %  Fortran 2003 features  482
+            \input{Examples_associate}  %section--> subsection
+
+    \input{Chap_memory_model}
+       \input{Examples_mem_model}
+       \input{Examples_fort_race}
+
+    \input{Chap_program_control}
+       \input{Examples_cond_comp}
+       \input{Examples_icv}
+     % If multi-ifs  471
+       \input{Examples_standalone}
+       \input{Examples_cancellation}
+     % New Section Nested Regions
+           \input{Examples_nested_loop}
+           \input{Examples_nesting_restrict}
+

    \setcounter{chapter}{0}  % restart chapter numbering with "letter A"
    \renewcommand{\thechapter}{\Alph{chapter}}%
    \appendix
-
    \input{History}
+
 \end{document}

--- a/openmp.sty
+++ b/openmp.sty
@@ -78,6 +78,7 @@

 \usepackage{comment}            % allow use of \begin{comment}
 \usepackage{ifpdf,ifthen}       % allow conditional tests in LaTeX definitions
+\usepackage{makecell}           % Allows common formatting in cells with \thead & \makecell


 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -416,8 +417,10 @@
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 % Code example formatting for the Examples document
 % This defines:
-%     /cexample       formats blue markers, caption, and code for C/C++ examples
-%     /fexample       formats blue markers, caption, and code for Fortran examples
+%     /cexample       formats blue markers, caption, and code for C examples
+%     /cppexample     formats blue markers, caption, and code for C++ examples
+%     /fexample       formats blue markers, caption, and code for Fortran (fixed-form) examples
+%     /ffreeexample   formats blue markers, caption, and code for Fortran (free-form) examples
 % Thanks to Jin, Haoqiang H. for the original definitions of the following:

 \usepackage{color,fancyvrb}  % for \VerbatimInput
@@ -434,36 +437,40 @@

 \newcommand{\escstr}[1]{\myreplace{_}{\_}{#1}}

-\def\exampleheader#1#2{%
+\def\exampleheader#1#2#3#4{%
   \ifthenelse{ \equal{#1}{} }{
-      \def\cname{#2}
-      \def\ename\cname
+      \def\cname{#2.#3}
+      \def\ename{\cname}
   }{
-      \def\cname{#1.#2}
+      \def\cname{#1.#2.#3}
 % Use following line for old numbering
-%      \def\ename{\thechapter.#2}
+%      \def\ename{\thechapter.#2.#3}
-% Use following for mneumonics
+% Use following for mnemonics
-      \def\ename{\escstr{#1}.#2}
+      \def\ename{\escstr{#1}.#2.#3}
   }
   \noindent
   \textit{Example \ename}
   %\vspace*{-3mm}
+   \code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=#4,fontsize=\small]%
+   %\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=#4,fontsize=\small]%
+   %\code{\VerbatimInput[firstline=#4,fontsize=\small]%
+      {sources/Example_\cname}} 
 }

 \def\cnexample#1#2{%
-   \exampleheader{#1}{#2}
-   \code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=8,fontsize=\small]%
-   %\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=8,fontsize=\small]%
-   %\code{\VerbatimInput[firstline=8,fontsize=\small]%
-      {sources/Example_\cname.c}} 
+   \exampleheader{#1}{#2}{c}{8}
+}
+
+\def\cppnexample#1#2{%
+   \exampleheader{#1}{#2}{cpp}{8}
 }

 \def\fnexample#1#2{%
-   \exampleheader{#1}{#2}
-   \code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=6,fontsize=\small]%
-   %\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=6,fontsize=\small]%
-   %\code{\VerbatimInput[firstline=6,fontsize=\small]%
-      {sources/Example_\cname.f}}
+   \exampleheader{#1}{#2}{f}{6}
+}
+
+\def\ffreenexample#1#2{%
+   \exampleheader{#1}{#2}{f90}{6}
 }

 \newcommand\cexample[2]{%
@@ -474,7 +481,7 @@

 \newcommand\cppexample[2]{%
 \needspace{5\baselineskip}\cppspecificstart
-\cnexample{#1}{#2}
+\cppnexample{#1}{#2}
 \cppspecificend
 }

@@ -484,6 +491,12 @@
 \fortranspecificend
 }

+\newcommand\ffreeexample[2]{%
+\needspace{5\baselineskip}\fortranspecificstart
+\ffreenexample{#1}{#2}
+\fortranspecificend
+}
+

 % Set default fonts:
 \rmfamily\mdseries\upshape\normalsize
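To make the reworked example macros concrete, the following is a hedged usage sketch of how calls in an `Examples_*.tex` file might resolve to source files. It assumes that `\cexample` and `\fexample` still dispatch to `\cnexample` and `\fnexample` (their full bodies are not shown in this diff, unlike `\cppexample` and `\ffreeexample`), and that the skipped leading lines (7 for C/C++, 5 for Fortran) are a header block at the top of each source file. The SIMD file names are the ones touched by this commit; the `cpp_reference` file name is a hypothetical illustration.

```latex
% Sketch only: hypothetical calls in an Examples_*.tex chapter file.
% \exampleheader{#1}{#2}{ext}{firstline} sets \cname = #1.#2.ext and
% lists sources/Example_\cname, starting at line firstline.
\cexample{SIMD}{1c}            % sources/Example_SIMD.1c.c,   listed from line 8
\ffreeexample{SIMD}{1}         % sources/Example_SIMD.1.f90,  listed from line 6
\cppexample{cpp_reference}{1}  % hypothetical: sources/Example_cpp_reference.1.cpp
```

Because the printed label is built as `\escstr{#1}.#2.#3`, an underscore in the first argument is escaped for typesetting, so the last call would be captioned "Example cpp\_reference.1.cpp" while the file name itself keeps the plain underscore.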
--- a/sources/Example_SIMD.1c.c
+++ b/sources/Example_SIMD.1c.c
--- a/sources/Example_SIMD.1.f90
+++ b/sources/Example_SIMD.1.f90
--- a/sources/Example_SIMD.2c.c
+++ b/sources/Example_SIMD.2c.c
--- a/sources/Example_SIMD.2.f90
+++ b/sources/Example_SIMD.2.f90
--- a/sources/Example_SIMD.3c.c
+++ b/sources/Example_SIMD.3c.c
--- a/sources/Example_SIMD.3.f90
+++ b/sources/Example_SIMD.3.f90
--- a/sources/Example_SIMD.4c.c
+++ b/sources/Example_SIMD.4c.c
--- a/sources/Example_SIMD.4.f90
+++ b/sources/Example_SIMD.4.f90
--- a/sources/Example_SIMD.5c.c
+++ b/sources/Example_SIMD.5c.c