diff --git a/Changes.log b/Changes.log index bb9d813..594cd19 100644 --- a/Changes.log +++ b/Changes.log @@ -1,3 +1,9 @@ +[20-May-2016] Version 4.5.0 +Changes from 4.0.2ltx + +1. Reorganization into topic chapters +2. Change file suffixes (f/f90 => Fixed/Free format) C++ => cpp + [2-Feb-2015] Version 4.0.2 Changes from 4.0.1ltx diff --git a/Chap_SIMD.tex b/Chap_SIMD.tex new file mode 100644 index 0000000..efa6cbe --- /dev/null +++ b/Chap_SIMD.tex @@ -0,0 +1,48 @@ +\pagebreak
+\chapter{SIMD}
+\label{chap:simd}
+
+Single instruction, multiple data (SIMD) is a form of parallel execution
+in which the same operation is performed on multiple data elements
+independently in hardware vector processing units (VPUs), also called SIMD units.
+The addition of two vectors to form a third vector is a SIMD operation.
+Many processors have SIMD (vector) units that can simultaneously perform
+2, 4, 8 or more executions of the same operation (by a single SIMD unit).
+
+Loops without a loop-carried backward dependency (or with the dependency preserved using
+\code{ordered simd}) are candidates for vectorization by the compiler for
+execution with SIMD units. In addition, with state-of-the-art vectorization
+technology and the \code{declare simd} construct extensions for function vectorization
+in the OpenMP 4.5 specification, loops with function calls can be vectorized as well.
+The basic idea is that a scalar function call in a loop can be replaced by a vector version
+of the function, and the loop can then be vectorized by combining a loop
+vectorization (\code{simd} directive on the loop) and a function
+vectorization (\code{declare simd} directive on the function).
+
+A \code{simd} construct specifies that SIMD operations be performed on the
+data within the loop. A number of clauses are available to provide
+data-sharing attributes (\code{private}, \code{linear}, \code{reduction} and
+\code{lastprivate}). Other clauses provide vector length preference/restrictions
+(\code{simdlen} / \code{safelen}), loop collapsing (\code{collapse}), and data
+alignment (\code{aligned}).
+
+The \code{declare simd} directive designates
+that a vector version of the function should also be constructed for
+execution within loops that contain the function and have a \code{simd}
+directive. Clauses provide argument specifications (\code{linear},
+\code{uniform}, and \code{aligned}), a requested vector length
+(\code{simdlen}), and designate whether the function is always/never
+called conditionally in a loop (\code{inbranch}/\code{notinbranch}).
+The latter is for optimizing performance.
+
+Also, the \code{simd} construct has been combined with the worksharing loop
+constructs (\code{for simd} and \code{do simd}) to enable simultaneous thread
+execution in different SIMD units.
+%Hence, the \code{simd} construct can be
+%used alone on a loop to direct vectorization (SIMD execution), or in
+%combination with a parallel loop construct to include thread parallelism
+%(a parallel loop sequentially followed by a \code{simd} construct,
+%or a combined construct such as \code{parallel do simd} or
+%\code{parallel for simd}).
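+
+The following minimal C sketch (not one of the numbered examples in this document)
+combines loop vectorization and function vectorization as described above;
+the function name \plc{add1} and the array size are illustrative assumptions only.
+\begin{verbatim}
+#include <stdio.h>
+#define N 64
+
+/* a vector version of add1() is generated for use in simd loops */
+#pragma omp declare simd notinbranch
+float add1(float x) { return x + 1.0f; }
+
+int main(void)
+{
+   float a[N], b[N];
+   for (int i = 0; i < N; i++) a[i] = (float)i;
+
+   /* no loop-carried backward dependence; the called function
+      has a declare simd version, so the loop can be vectorized */
+   #pragma omp simd
+   for (int i = 0; i < N; i++) b[i] = add1(a[i]);
+
+   printf("%f\n", b[N-1]);
+   return 0;
+}
+\end{verbatim}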
+ + diff --git a/Chap_affinity.tex b/Chap_affinity.tex new file mode 100644 index 0000000..91cb6cd --- /dev/null +++ b/Chap_affinity.tex @@ -0,0 +1,118 @@ +\pagebreak +\chapter{OpenMP Affinity} +\label{chap:openmp_affinity} + +OpenMP Affinity consists of a \code{proc\_bind} policy (thread affinity policy) and a specification of +places (\texttt{"}location units\texttt{"} or \plc{processors} that may be cores, hardware +threads, sockets, etc.). +OpenMP Affinity enables users to bind computations on specific places. +The placement will hold for the duration of the parallel region. +However, the runtime is free to migrate the OpenMP threads +to different cores (hardware threads, sockets, etc.) prescribed within a given place, +if two or more cores (hardware threads, sockets, etc.) have been assigned to a given place. + +Often the binding can be managed without resorting to explicitly setting places. +Without the specification of places in the \code{OMP\_PLACES} variable, +the OpenMP runtime will distribute and bind threads using the entire range of processors for +the OpenMP program, according to the \code{OMP\_PROC\_BIND} environment variable +or the \code{proc\_bind} clause. When places are specified, the OMP runtime +binds threads to the places according to a default distribution policy, or +those specified in the \code{OMP\_PROC\_BIND} environment variable or the +\code{proc\_bind} clause. + +In the OpenMP Specifications document a processor refers to an execution unit that +is enabled for an OpenMP thread to use. A processor is a core when there is +no SMT (Simultaneous Multi-Threading) support or SMT is disabled. When +SMT is enabled, a processor is a hardware thread (HW-thread). (This is the +usual case; but actually, the execution unit is implementation defined.) Processor +numbers are numbered sequentially from 0 to the number of cores less one (without SMT), or +0 to the number HW-threads less one (with SMT). OpenMP places use the processor number to designate +binding locations (unless an \texttt{"}abstract name\texttt{"} is used.) + + +The processors available to a process may be a subset of the system's +processors. This restriction may be the result of a +wrapper process controlling the execution (such as \code{numactl} on Linux systems), +compiler options, library-specific environment variables, or default +kernel settings. For instance, the execution of multiple MPI processes, +launched on a single compute node, will each have a subset of processors as +determined by the MPI launcher or set by MPI affinity environment +variables for the MPI library. %Forked threads within an MPI process +%(for a hybrid execution of MPI and OpenMP code) inherit the valid +%processor set for execution from the parent process (the initial task region) +%when a parallel region forks threads. The binding policy set in +%\code{OMP\_PROC\_BIND} or the \code{proc\_bind} clause will be applied to +%the subset of processors available to \plc{the particular} MPI process. + +%Also, setting an explicit list of processor numbers in the \code{OMP\_PLACES} +%variable before an MPI launch (which involves more than one MPI process) will +%result in unspecified behavior (and doesn't make sense) because the set of +%processors in the places list must not contain processors outside the subset +%of processors for an MPI process. 
A separate \code{OMP\_PLACES} variable must +%be set for each MPI process, and is usually accomplished by launching a script +%which sets \code{OMP\_PLACES} specifically for the MPI process. + +Threads of a team are positioned onto places in a compact manner, a +scattered distribution, or onto the master's place, by setting the +\code{OMP\_PROC\_BIND} environment variable or the \code{proc\_bind} clause to +\plc{close}, \plc{spread}, or \plc{master}, respectively. When +\code{OMP\_PROC\_BIND} is set to FALSE no binding is enforced; and +when the value is TRUE, the binding is implementation defined to +a set of places in the \code{OMP\_PLACES} variable or to places +defined by the implementation if the \code{OMP\_PLACES} variable +is not set. + +The \code{OMP\_PLACES} variable can also be set to an abstract name +(\plc{threads}, \plc{cores}, \plc{sockets}) to specify that a place is +either a single hardware thread, a core, or a socket, respectively. +This description of the \code{OMP\_PLACES} is most useful when the +number of threads is equal to the number of hardware thread, cores +or sockets. It can also be used with a \plc{close} or \plc{spread} +distribution policy when the equality doesn't hold. + + +% We need an example of using sockets, cores and threads: + +% case 1 cores: + +% Hyper-Threads on (2 hardware threads per core) +% 1 socket x 4 cores x 2 HW-threads +% +% export OMP_NUM_THREADS=4 +% export OMP_PLACES=threads +% +% core # 0 1 2 3 +% processor # 0,1 2,3 4,5 6,7 +% thread # 0 * _ _ _ _ _ _ _ #mask for thread 0 +% thread # 1 _ _ * _ _ _ _ _ #mask for thread 1 +% thread # 2 _ _ _ _ * _ _ _ #mask for thread 2 +% thread # 3 _ _ _ _ _ _ * _ #mask for thread 3 + +% case 2 threads: +% +% Hyper-Threads on (2 hardware threads per core) +% 1 socket x 4 cores x 2 HW-threads +% +% export OMP_NUM_THREADS=4 +% export OMP_PLACES=cores +% +% core # 0 1 2 3 +% processor # 0,1 2,3 4,5 6,7 +% thread # 0 * * _ _ _ _ _ _ #mask for thread 0 +% thread # 1 _ _ * * _ _ _ _ #mask for thread 1 +% thread # 2 _ _ _ _ * * _ _ #mask for thread 2 +% thread # 3 _ _ _ _ _ _ * * #mask for thread 3 + +% case 3 sockets: +% +% No Hyper-Threads +% 3 socket x 4 cores +% +% export OMP_NUM_THREADS=3 +% export OMP_PLACES=sockets +% +% socket # 0 1 2 +% processor # 0,1,2,3 4,5,6,7 8,9,10,11 +% thread # 0 * * * * _ _ _ _ _ _ _ _ #mask for thread 0 +% thread # 0 _ _ _ _ * * * * _ _ _ _ #mask for thread 1 +% thread # 0 _ _ _ _ _ _ _ _ * * * * #mask for thread 2 diff --git a/Chap_data_environment.tex b/Chap_data_environment.tex new file mode 100644 index 0000000..fdc20e4 --- /dev/null +++ b/Chap_data_environment.tex @@ -0,0 +1,75 @@ +\pagebreak +\chapter{Data Environment} +\label{chap:data_environment} +The OpenMP \plc{data environment} contains data attributes of variables and +objects. Many constructs (such as \code{parallel}, \code{simd}, \code{task}) +accept clauses to control \plc{data-sharing} attributes +of referenced variables in the construct, where \plc{data-sharing} applies to +whether the attribute of the variable is \plc{shared}, +is \plc{private} storage, or has special operational characteristics +(as found in the \code{firstprivate}, \code{lastprivate}, \code{linear}, or \code{reduction} clause). + +The data environment for a device (distinguished as a \plc{device data environment}) +is controlled on the host by \plc{data-mapping} attributes, which determine the +relationship of the data on the host, the \plc{original} data, and the data on the +device, the \plc{corresponding} data. 
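+
+As a minimal sketch of data-sharing clauses (not one of the numbered examples in this
+document; the variable names are arbitrary), the following C fragment uses
+\code{firstprivate} and \code{reduction} clauses on a \code{parallel} worksharing-loop construct.
+\begin{verbatim}
+#include <stdio.h>
+#define N 100
+
+int main(void)
+{
+   double a[N], sum = 0.0, scale = 2.0;
+
+   /* scale is copied in (firstprivate), i is predetermined private,
+      a is shared, and sum is combined across threads by the reduction */
+   #pragma omp parallel for firstprivate(scale) reduction(+:sum)
+   for (int i = 0; i < N; i++) {
+      a[i] = scale * i;
+      sum += a[i];
+   }
+
+   printf("sum = %f\n", sum);
+   return 0;
+}
+\end{verbatim}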
+
+\bigskip
+DATA-SHARING ATTRIBUTES
+
+Data-sharing attributes of variables can be classified as being \plc{predetermined},
+\plc{explicitly determined} or \plc{implicitly determined}.
+
+Certain variables and objects have predetermined attributes.
+A commonly found case is the loop iteration variable in associated loops
+of a \code{for} or \code{do} construct. It has a private data-sharing attribute.
+Variables with predetermined data-sharing attributes cannot be listed in a data-sharing clause; but there are some
+exceptions (mainly concerning loop iteration variables).
+
+Variables with explicitly determined data-sharing attributes are those that are
+referenced in a given construct and are listed in a data-sharing attribute
+clause on the construct. Some of the common data-sharing clauses are:
+\code{shared}, \code{private}, \code{firstprivate}, \code{lastprivate},
+\code{linear}, and \code{reduction}. % Are these all of them?
+
+Variables with implicitly determined data-sharing attributes are those
+that are referenced in a given construct, do not have predetermined
+data-sharing attributes, and are not listed in a data-sharing
+attribute clause of an enclosing construct.
+For a complete list of variables and objects with predetermined and
+implicitly determined attributes, please refer to the
+\plc{Data-sharing Attribute Rules for Variables Referenced in a Construct}
+subsection of the OpenMP Specifications document.
+
+\bigskip
+DATA-MAPPING ATTRIBUTES
+
+The \code{map} clause on a device construct explicitly specifies how the list items in
+the clause are mapped from the encountering task's data environment (on the host)
+to the corresponding item in the device data environment (on the device).
+The common \plc{list items} are arrays, array sections, scalars, pointers, and
+structure elements (members).
+
+Procedures and global variables have predetermined data mapping if they appear
+within the list or block of a \code{declare target} directive. Also, a C/C++ pointer
+is mapped as a zero-length array section, as is a C++ variable that is a reference to a pointer.
+% Waiting for response from Eric on this.
+
+Without explicit mapping, non-scalar and non-pointer variables within the scope of the \code{target}
+construct are implicitly mapped with a \plc{map-type} of \code{tofrom}.
+Without explicit mapping, scalar variables within the scope of the \code{target}
+construct are not mapped, but have an implicit firstprivate data-sharing
+attribute. (That is, the value of the original variable is given to a private
+variable of the same name on the device.) This behavior can be changed with
+the \code{defaultmap} clause.
+
+The \code{map} clause can appear on \code{target}, \code{target data} and
+\code{target enter/exit data} constructs. The operations of creation and
+removal of device storage as well as assignment of the original list item
+values to the corresponding list items may be complicated when the list
+item appears on multiple constructs or when the host and device storage
+is shared. In these cases the item's reference count, the number of times
+it has been referenced (+1 on entry and -1 on exit) in nested (structured)
+map regions and/or accumulative (unstructured) mappings, determines the operation.
+Details of the \code{map} clause and reference count operation are specified
+in the \plc{map Clause} subsection of the OpenMP Specifications document.
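+
+The following minimal sketch (not a numbered example; the array size and variable
+names are illustrative assumptions) shows an explicit \code{map} clause on a
+\code{target} construct together with the implicit \code{firstprivate} treatment
+of an unmapped scalar.
+\begin{verbatim}
+#include <stdio.h>
+#define N 1000
+
+int main(void)
+{
+   double v[N], sum = 0.0;
+   double scale = 0.5;   /* unmapped scalar: implicitly firstprivate on the device */
+
+   for (int i = 0; i < N; i++) v[i] = i;
+
+   /* v and sum are explicitly mapped tofrom so their updated values
+      are returned to the host at the end of the target region */
+   #pragma omp target map(tofrom: v, sum)
+   for (int i = 0; i < N; i++) {
+      v[i] *= scale;
+      sum  += v[i];
+   }
+
+   printf("sum = %f\n", sum);
+   return 0;
+}
+\end{verbatim}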
diff --git a/Chap_devices.tex b/Chap_devices.tex new file mode 100644 index 0000000..ca7da2e --- /dev/null +++ b/Chap_devices.tex @@ -0,0 +1,53 @@ +\pagebreak
+\chapter{Devices}
+\label{chap:devices}
+
+The \code{target} construct consists of a \code{target} directive
+and an execution region. The \code{target} region is executed on
+the default device or the device specified in the \code{device}
+clause.
+
+In OpenMP version 4.0, by default, all variables within the lexical
+scope of the construct are copied \plc{to} and \plc{from} the
+device, unless the device is the host, or the data exists on the
+device from a previously executed device data construct that
+has created space on the device and possibly copied host
+data to the device storage.
+
+The constructs that explicitly
+create storage, transfer data, and free storage on the device
+are categorized as structured and unstructured. The
+\code{target} \code{data} construct is structured. It creates
+a data region around \code{target} constructs, and is
+convenient for providing persistent data throughout multiple
+\code{target} regions. The \code{target} \code{enter} \code{data} and
+\code{target} \code{exit} \code{data} constructs are unstructured, because
+they can occur anywhere and do not support a "structure"
+(a region) for enclosing \code{target} constructs, as does the
+\code{target} \code{data} construct.
+
+The \code{map} clause is used on \code{target}
+constructs and the device data constructs to map host data. It
+specifies the device storage and data movement \code{to} and \code{from}
+the device, and controls the storage duration.
+
+There is an important change in the OpenMP 4.5 specification
+that alters the data model for scalar variables and C/C++ pointer variables.
+The default behavior for scalar variables and C/C++ pointer variables
+in a 4.5 compliant code is \code{firstprivate}. Example
+codes that have been updated to reflect this new behavior are
+annotated with a description that describes changes required
+for correct execution. Often it is a simple matter of mapping
+the variable as \code{tofrom} to obtain the intended 4.0 behavior.
+
+In OpenMP version 4.5 the mechanism for target
+execution is specified as occurring through a \plc{target task}.
+When the \code{target} construct is encountered a new
+\plc{target task} is generated. The \plc{target task}
+completes after the \code{target} region has executed and all data
+transfers have finished.
+
+This new specification does not affect the execution of
+pre-4.5 code; it is a necessary element for asynchronous
+execution of the \code{target} region when using the new \code{nowait}
+clause introduced in OpenMP 4.5. diff --git a/Chap_memory_model.tex b/Chap_memory_model.tex new file mode 100644 index 0000000..c44d53d --- /dev/null +++ b/Chap_memory_model.tex @@ -0,0 +1,105 @@ +\pagebreak
+\chapter{Memory Model}
+\label{chap:memory_model}
+
+In this chapter, examples illustrate race conditions on access to variables with
+shared data-sharing attributes. A race condition can exist when two
+or more threads are involved in accessing a variable in which not all
+of the accesses are reads; that is, a WaR, RaW or WaW condition
+exists (R=read, a=after, W=write). A RaR does not produce a race condition.
+ Ensuring thread execution order at
+the processor level is not enough to avoid race conditions, because the
+local storage at the processor level (registers, caches, etc.)
+must be synchronized so that a consistent view of the variable in the +memory hierarchy can be seen by the threads accessing the variable. + +OpenMP provides a shared-memory model which allows all threads access +to \plc{memory} (shared data). Each thread also has exclusive +access to \plc{threadprivate memory} (private data). A private +variable referenced in an OpenMP directive's structured block is a +new version of the original variable (with the same name) for each +task (or SIMD lane) within the code block. A private variable is +initially undefined (except for variables in \code{firstprivate} +and \code{linear} clauses), and the original variable value is +unaltered by assignments to the private variable, (except for +\code{reduction}, \code{lastprivate} and \code{linear} clauses). + +Private variables in an outer \code{parallel} region can be +shared by implicit tasks of an inner \code{parallel} region +(with a \code{share} clause on the inner \code{parallel} directive). +Likewise, a private variable may be shared in the region of an +explicit \code{task} (through a \code{shared} clause). + + +The \code{flush} directive forces a consistent view of local variables +of the thread executing the \code{flush}. +When a list is supplied on the directive, only the items (variables) +in the list are guaranteed to be flushed. + +Implied flushes exist at prescribed locations of certain constructs. +For the complete list of these locations and associated constructs, +please refer to the \plc{flush Construct} section of the OpenMP +Specifications document. + +% The following table lists construct in which implied flushes exist, and the +% location of their execution. +% +% %\begin{table}[hb] +% \begin{center} +% %\caption {Execution Location for Implicit Flushes. } +% \begin{tabular}{ | p{0.6\linewidth} | l | } +% \hline +% \code{CONSTRUCT} & \makecell{\code{EXECUTION} \\ \code{LOCATION}} \\ +% \hline +% \code{parallel} & upon entry and exit \\ +% \hline +% \makecell[l]{worksharing \\ \hspace{1.5em}\code{for}, \code{do} +% \\ \hspace{1.5em}\code{sections} +% \\ \hspace{1.5em}\code{single} +% \\ \hspace{1.5em}\code{workshare} } +% & upon exit \\ +% \hline +% \code{critical} & upon entry and exit \\ +% \hline +% \code{target} & upon entry and exit \\ +% \hline +% \code{barrier} & during \\ +% \hline +% \code{atomic} operation with \plc{seq\_cst} clause & upon entry and exit \\ +% \hline +% \code{ordered}* & upon entry and exit \\ +% \hline +% \code{cancel}** and \code{cancellation point}** & during \\ +% \hline +% \code{target data} & upon entry and exit \\ +% \hline +% \code{target update} + \code{to} clause, +% \code{target enter data} & on entry \\ +% \hline +% \code{target update} + \code{from} clause, +% \code{target exit data} & on exit \\ +% \hline +% \code{omp\_set\_lock} & during \\ +% \hline +% \makecell[l]{ \code{omp\_set/unset\_lock}, \code{omp\_test\_lock}*** +% \\ \code{omp\_set/unset/test\_nest\_lock}*** } +% & during \\ +% \hline +% task scheduling point & \makecell[l]{immediately \\ before and after} \\ +% \hline +% \end{tabular} +% %\caption {Execution Location for Implicit Flushes. 
} +% +% \end{center} +% %\end{table} +% +% * without clauses and with \code{threads} or \code{depend} clauses \newline +% ** when \plc{cancel-var} ICV is \plc{true} (cancellation is turned on) and cancellation is activated \newline +% *** if the region causes the lock to be set or unset +% +% A flush with a list is implied for non-sequentially consistent \code{atomic} operations +% (\code{atomic} directive without a \code{seq\_cst} clause), where the list item is the +% specific storage location accessed atomically (specified as the \plc{x} variable +% in \plc{atomic Construct} subsection of the OpenMP Specifications document). + +Examples 1-3 show the difficulty of synchronizing threads through \code{flush} and \code{atomic} directives. diff --git a/Chap_parallel_execution.tex b/Chap_parallel_execution.tex new file mode 100644 index 0000000..db21ca4 --- /dev/null +++ b/Chap_parallel_execution.tex @@ -0,0 +1,104 @@ +\pagebreak +\chapter{Parallel Execution} +\label{chap:parallel_execution} + +A single thread, the \plc{initial thread}, begins sequential execution of +an OpenMP enabled program, as if the whole program is in an implicit parallel +region consisting of an implicit task executed by the \plc{initial thread}. + +A \code{parallel} construct encloses code, +forming a parallel region. An \plc{initial thread} encountering a \code{parallel} +region forks (creates) a team of threads at the beginning of the +\code{parallel} region, and joins them (removes from execution) at the +end of the region. The initial thread becomes the master thread of the team in a +\code{parallel} region with a \plc{thread} number equal to zero, the other +threads are numbered from 1 to number of threads minus 1. +A team may be comprised of just a single thread. + +Each thread of a team is assigned an implicit task consisting of code within the +parallel region. The task that creates a parallel region is suspended while the +tasks of the team are executed. A thread is tied to its task; that is, +only the thread assigned to the task can execute that task. After completion +of the \code{parallel} region, the master thread resumes execution of the generating task. + +%After the \code{parallel} region the master thread becomes the initial +%thread again, and continues to execute the \plc{sequential part}. + +Any task within a \code{parallel} region is allowed to encounter another +\code{parallel} region to form a nested \code{parallel} region. The +parallelism of a nested \code{parallel} region (whether it forks additional +threads, or is executed serially by the encountering task) can be controlled by the +\code{OMP\_NESTED} environment variable or the \code{omp\_set\_nested()} +API routine with arguments indicating true or false. + +The number of threads of a \code{parallel} region can be set by the \code{OMP\_NUM\_THREADS} +environment variable, the \code{omp\_set\_num\_threads()} routine, or on the \code{parallel} +directive with the \code{num\_threads} +clause. The routine overrides the environment variable, and the clause overrides all. +Use the \code{OMP\_DYNAMIC} +or the \code{omp\_set\_dynamic()} function to specify that the OpenMP +implementation dynamically adjust the number of threads for +\code{parallel} regions. The default setting for dynamic adjustment is implementation +defined. When dynamic adjustment is on and the number of threads is specified, +the number of threads becomes an upper limit for the number of threads to be +provided by the OpenMP runtime. 
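+
+A minimal C sketch of these thread-count controls follows (not one of the numbered
+examples in this document); it uses only the standard OpenMP runtime routines
+named above, and the requested team sizes are arbitrary.
+\begin{verbatim}
+#include <stdio.h>
+#include <omp.h>
+
+int main(void)
+{
+   omp_set_dynamic(0);       /* disable dynamic adjustment so the request is not reduced */
+   omp_set_num_threads(2);   /* routine setting; overrides OMP_NUM_THREADS */
+
+   #pragma omp parallel num_threads(4)   /* clause overrides the routine setting */
+   {
+      #pragma omp single
+      printf("team size = %d\n", omp_get_num_threads());
+   }
+   return 0;
+}
+\end{verbatim}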
+
+\pagebreak
+WORKSHARING CONSTRUCTS
+
+A worksharing construct distributes the execution of the associated region
+among the members of the team that encounter it. There is an
+implied barrier at the end of the worksharing region
+(there is no barrier at the beginning). The worksharing
+constructs are:
+
+\begin{compactitem}
+
+\item loop constructs: {\code{for} and \code{do} }
+\item \code{sections}
+\item \code{single}
+\item \code{workshare}
+
+\end{compactitem}
+
+The \code{for} and \code{do} constructs (loop constructs) create a region
+consisting of a loop. A loop controlled by a loop construct is called
+an \plc{associated} loop. Nested loops can form a single region when the
+\code{collapse} clause (with an integer argument) designates the number of
+\plc{associated} loops to be executed in parallel, by forming a
+"single iteration space" for the specified number of nested loops.
+The \code{ordered} clause can also control multiple associated loops.
+
+An associated loop must adhere to a "canonical form" (specified in the
+\plc{Canonical Loop Form} section of the OpenMP Specifications document) which allows the
+iteration count (of all associated loops) to be computed before the
+(outermost) loop is executed. %[58:27-29].
+Most common loops comply with the canonical form, including loops that use C++ iterators.
+
+A \code{single} construct forms a region in which only one thread (any one
+of the team) executes the region.
+The other threads wait at the implied
+barrier at the end, unless the \code{nowait} clause is specified.
+
+The \code{sections} construct forms a region that contains one or more
+structured blocks. Each block of a \code{sections} directive is
+constructed with a \code{section} construct, and executed once by
+one of the threads (any one) in the team. (If only one block is
+formed in the region, the \code{section} construct, which is used to
+separate blocks, is not required.)
+The other threads wait at the implied
+barrier at the end, unless the \code{nowait} clause is specified.
+
+
+The \code{workshare} construct is a Fortran feature that consists of a
+region with a single structured block (section of code). Statements in the
+\code{workshare} region are divided into units of work, and executed (once)
+by threads of the team.
+
+\bigskip
+MASTER CONSTRUCT
+
+The \code{master} construct is not a worksharing construct. The master region
+is executed only by the master thread. There is no implicit barrier (and flush)
+at the end of the \code{master} region; hence the other threads of the team continue
+execution of code statements beyond the \code{master} region. diff --git a/Chap_program_control.tex b/Chap_program_control.tex new file mode 100644 index 0000000..d45259c --- /dev/null +++ b/Chap_program_control.tex @@ -0,0 +1,85 @@ +\pagebreak
+\chapter{Program Control}
+\label{sec:program_control}
+
+Some specific and elementary concepts of controlling program execution are
+illustrated in the examples of this chapter. Control can be directly
+managed with conditional control code (ifdef's with the \code{\_OPENMP}
+macro, and the Fortran sentinel (\code{!\$})
+for conditional compilation). The \code{if} clause on some constructs
+can direct the runtime to ignore or alter the behavior of the construct.
+Of course, the base-language \code{if} statements can be used to control the "execution"
+of stand-alone directives (such as \code{flush}, \code{barrier}, \code{taskwait},
+and \code{taskyield}).
+However, the directives must appear in a block structure, and not as a substatement,
+as shown in examples 1 and 2 of this chapter.
+
+\bigskip
+CANCELLATION
+
+Cancellation (termination) of the normal sequence of execution for the threads in an OpenMP region can
+be accomplished with the \code{cancel} construct. The construct uses a
+\plc{construct-type-clause} to set the region-type to activate for the cancellation.
+That is, inclusion of one of the \plc{construct-type-clause} names \code{parallel}, \code{for},
+\code{do}, \code{sections} or \code{taskgroup} on the directive line
+activates the corresponding region.
+The \code{cancel} construct is activated by the first encountering thread, and it
+continues execution at the end of the named region.
+The \code{cancel} construct is also a cancellation point for any other thread of the team
+to also continue execution at the end of the named region.
+
+Also, once the specified region has been activated for cancellation, any thread that encounters
+a \code{cancellation point} construct with the same named region (\plc{construct-type-clause})
+continues execution at the end of the region.
+
+For an activated \code{cancel taskgroup} construct, the tasks that
+belong to the taskgroup set of the innermost enclosing taskgroup region will be canceled.
+
+A task that encounters the \code{cancel taskgroup} construct continues execution at the end of its
+task region. Any task of the taskgroup that has already begun execution will run to completion,
+unless it encounters a \code{cancellation point}; tasks that have not begun execution "may" be
+discarded as completed tasks.
+
+\bigskip
+CONTROL VARIABLES
+
+ Internal control variables (ICVs) are used by implementations to hold values which control the execution
+ of OpenMP regions. Control (and hence the ICVs) may be set as implementation defaults,
+ or set and adjusted through environment variables, clauses, and API functions. Many of the ICV control
+ values are accessible through API function calls. Also, initial ICV values are reported by the runtime
+ if the \code{OMP\_DISPLAY\_ENV} environment variable has been set to \code{TRUE}.
+
+ %As an example, the \plc{nthreads-var} is the ICV that holds the number of threads
+ %to be used in a \code{parallel} region. It can be set with the \code{OMP\_NUM\_THREADS} environment variable,
+ %the \code{omp\_set\_num\_threads()} API function, or the \code{num\_threads} clause. The default \plc{nthreads-var}
+ %value is implementation defined. All of the ICVs are presented in the \plc{Internal Control Variables} section
+ %of the \plc{Directives} chapter of the OpenMP Specifications document. Within the same document section, override
+ %relationships and scoping information can be found for applying user specifications and understanding the
+ %extent of the control.
+
+\bigskip
+NESTED CONSTRUCTS
+
+Certain combinations of nested constructs are permitted, giving rise to a \plc{combined} construct
+consisting of two or more constructs. These can be used when the two (or several) constructs would be used
+immediately in succession (closely nested). A combined construct can use the clauses of the component
+constructs without restrictions.
+A \plc{composite} construct is a combined construct which has one or more clauses with (often obviously)
+modified or restricted meaning, relative to when the constructs are uncombined. %%[appear separately (singly).
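+
+As a minimal sketch (not one of the numbered examples in this document), the combined
+\code{parallel for} construct below is equivalent to the uncombined, closely nested form
+that follows it; the array and its size are illustrative only.
+\begin{verbatim}
+#include <stdio.h>
+#define N 1000
+
+int main(void)
+{
+   double a[N];
+
+   /* combined construct */
+   #pragma omp parallel for
+   for (int i = 0; i < N; i++) a[i] = i;
+
+   /* the same constructs used immediately in succession (closely nested) */
+   #pragma omp parallel
+   {
+      #pragma omp for
+      for (int i = 0; i < N; i++) a[i] = 2.0 * a[i];
+   }
+
+   printf("%f\n", a[N-1]);
+   return 0;
+}
+\end{verbatim}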
+
+%The combined \code{parallel do} and \code{parallel for} constructs are formed by combining the \code{parallel}
+%construct with one of the loop constructs \code{do} or \code{for}. The
+%\code{parallel do SIMD} and \code{parallel for SIMD} constructs are composite constructs (composed from
+%the parallel loop constructs and the \code{SIMD} construct), because the \code{collapse} clause must
+%explicitly address the ordering of loop chunking \plc{and} SIMD "combined" execution.
+
+Certain nestings are forbidden, and often the reasoning is obvious. Worksharing constructs cannot be nested, and
+the \code{barrier} construct cannot be nested inside a worksharing construct, or a \code{critical} construct.
+Also, \code{target} constructs cannot be nested.
+
+The \code{parallel} construct can be nested, as well as the \code{task} construct. The parallel
+execution in the nested \code{parallel} construct(s) is controlled by the \code{OMP\_NESTED} and
+\code{OMP\_MAX\_ACTIVE\_LEVELS} environment variables, and the \code{omp\_set\_nested()} and
+\code{omp\_set\_max\_active\_levels()} functions.
+
+More details on nesting can be found in the \plc{Nesting of Regions} section of the \plc{Directives}
+chapter in the OpenMP Specifications document. diff --git a/Chap_synchronization.tex b/Chap_synchronization.tex new file mode 100644 index 0000000..3b96062 --- /dev/null +++ b/Chap_synchronization.tex @@ -0,0 +1,69 @@ +\pagebreak
+\chapter{Synchronization}
+\label{chap:synchronization}
+
+The \code{barrier} construct is a stand-alone directive that requires all threads
+of a team (within a contention group) to execute the barrier and complete
+execution of all tasks within the region, before continuing past the barrier.
+
+The \code{critical} construct is a directive that contains a structured block.
+The construct allows only a single thread at a time to execute the structured block (region).
+Multiple critical regions may exist in a parallel region, and may
+act cooperatively (only one thread at a time in all \code{critical} regions),
+or separately (only one thread at a time in each \code{critical} region when
+a unique name is supplied on each \code{critical} construct).
+An optional (lock) \code{hint} clause may be specified on a named \code{critical}
+construct to provide the OpenMP runtime guidance in selecting a locking
+mechanism.
+
+On a finer scale the \code{atomic} construct allows only a single thread at
+a time to have atomic access to a storage location involving a single read,
+write, update or capture statement, and a limited number of combinations
+when specifying the \code{capture} \plc{atomic-clause} clause. The \plc{atomic-clause} clause
+is required for some expression statements, but is not required for
+\code{update} statements. Please see the details in the \plc{atomic Construct}
+subsection of the \plc{Directives} chapter in the OpenMP Specifications document.
+
+% The following three sentences were stolen from the spec.
+The \code{ordered} construct specifies a structured block in a loop,
+simd, or loop SIMD region that will be executed in the order of the loop
+iterations. The ordered construct sequentializes and orders the execution
+of ordered regions while allowing code outside the region to run in parallel.
+
+Since OpenMP 4.5 the \code{ordered} construct can also be a stand-alone
+directive that specifies cross-iteration dependences in a doacross loop nest.
+The \code{depend} clause uses a \code{sink} \plc{dependence-type}, along with an
+iteration vector argument (\plc{vec}), to indicate the iteration that satisfies the
+dependence. The \code{depend} clause with a \code{source}
+\plc{dependence-type} specifies dependence satisfaction.
+
+The \code{flush} directive is a stand-alone construct that forces a thread's
+temporary local storage (view) of a variable to memory, where a consistent view
+of the variable storage can be accessed. When the construct is used without
+a variable list, all the locally thread-visible data as defined by the
+base language are flushed. A construct with a list applies the flush
+operation only to the items in the list. The \code{flush} construct also
+effectively ensures that no memory (load or store) operation for
+the variable set (list items, or default set) may be reordered across
+the \code{flush} directive.
+
+General-purpose routines provide mutual exclusion semantics through locks,
+represented by lock variables.
+The semantics allows a task to \plc{set}, and hence
+\plc{own} a lock, until it is \plc{unset} by the task that set it. A
+\plc{nestable} lock can be set multiple times by a task, and is used
+when code requires nested control of locks. A \plc{simple lock} can
+only be set once by the owning task. There are specific calls for the two
+types of locks, and the variable of a specific lock type cannot be used by the
+other lock type.
+
+Any explicit task will observe the synchronization prescribed in a
+\code{barrier} construct and an implied barrier. Also, additional synchronizations
+are available for tasks. All child tasks generated before a \code{taskwait} construct
+are waited for (by the generating task) at the \code{taskwait}.
+A \code{taskgroup} construct creates a region in which the
+current task is suspended at the end of the region until all child tasks generated
+in the region, and their descendants, have completed.
+Scheduling constraints on task execution can be prescribed by the \code{depend}
+clause to enforce dependence on previously generated tasks.
+More details on controlling task executions can be found in the \plc{Tasking} Chapter
+in the OpenMP Specifications document. %(DO REF. RIGHT.) diff --git a/Chap_tasking.tex b/Chap_tasking.tex new file mode 100644 index 0000000..4cb72c7 --- /dev/null +++ b/Chap_tasking.tex @@ -0,0 +1,51 @@ +\pagebreak
+\chapter{Tasking}
+\label{chap:tasking}
+
+Tasking constructs provide units of work to a thread for execution.
+Worksharing constructs do this, too (e.g. the \code{for}, \code{do},
+\code{sections}, and \code{single} constructs);
+but the work units are tightly controlled by an iteration limit and limited
+scheduling, or a limited number of \code{sections} or \code{single} regions.
+Worksharing was designed
+with \texttt{"}data parallel\texttt{"} computing in mind. Tasking was designed for
+\texttt{"}task parallel\texttt{"} computing and often involves non-locality or irregularity
+in memory access.
+
+The \code{task} construct can be used to execute work chunks: in a while loop;
+while traversing nodes in a list; at nodes in a tree graph;
+or in a normal loop (with a \code{taskloop} construct).
+Unlike the statically scheduled loop iterations of worksharing, a task is
+often enqueued, and then dequeued for execution by any of the threads of the
+team within a parallel region. The generation of tasks can be from a single
+generating thread (creating sibling tasks), or from multiple generators
+in a recursive graph or tree traversal.
+%(creating a parent-descendents hierarchy of tasks, see example 4 and 7 below).
+A \code{taskloop} construct
+bundles iterations of an associated loop into tasks, and provides
+controls similar to those found in the \code{task} construct.
+
+Sibling tasks are synchronized by the \code{taskwait} construct, and tasks
+and their descendent tasks can be synchronized by containing them in
+a \code{taskgroup} region. Ordered execution is accomplished by specifying
+dependences with a \code{depend} clause. Also, priorities can be
+specified as hints to the scheduler through a \code{priority} clause.
+
+Various clauses can be used to manage and optimize task generation,
+as well as reduce the overhead of execution and to relinquish
+control of threads for work balance and forward progress.
+
+Once a thread starts executing a task, it is the designated thread
+for executing the task to completion, even though it may leave the
+execution at a scheduling point and return later. The thread is tied
+to the task. Scheduling points can be introduced with the \code{taskyield}
+construct. With an \code{untied} clause any other thread is allowed to continue
+the task. An \code{if} clause with a \plc{false} expression allows the
+generating thread to immediately execute the task as an undeferred task.
+By including the data environment of the generating task into the generated task with the
+\code{mergeable} and \code{final} clauses, task generation overhead can be reduced.
+
+A complete list of the tasking constructs and details of their clauses
+can be found in the \plc{Tasking Constructs} chapter of the OpenMP Specifications,
+in the \plc{OpenMP Application Programming Interface} section.
+
 diff --git a/Examples_Chapt.tex b/Examples_Chapt.tex index 38b4d2d..ebc689c 100644 --- a/Examples_Chapt.tex +++ b/Examples_Chapt.tex @@ -1,9 +1,21 @@ \chapter*{Examples} \label{chap:examples} +\addcontentsline{toc}{chapter}{\protect\numberline{}Examples} The following are examples of the OpenMP API directives, constructs, and routines. \ccppspecificstart A statement following a directive is compound only when necessary, and a non-compound statement is indented with respect to a directive preceding it. \ccppspecificend +Each example is labeled as \plc{ename.seqno.ext}, where \plc{ename} is
+the example name, \plc{seqno} is the sequence number in a section, and
+\plc{ext} is the source file extension to indicate the code type and
+source form. \plc{ext} is one of the following:
+\begin{compactitem}
+\item \plc{c} -- C code,
+\item \plc{cpp} -- C++ code,
+\item \plc{f} -- Fortran code in fixed form, and
+\item \plc{f90} -- Fortran code in free form.
+\end{compactitem}
+
 diff --git a/Examples_SIMD.tex b/Examples_SIMD.tex index 24a24f5..a6842ba 100644 --- a/Examples_SIMD.tex +++ b/Examples_SIMD.tex @@ -1,17 +1,13 @@ -\pagebreak
-\chapter{SIMD Constructs}
-\label{chap:SIMD}
+%\pagebreak
+\section{\code{simd} and \code{declare} \code{simd} Constructs}
+\label{sec:SIMD}
 
-The following examples illustrate the use of SIMD constructs for vectorization.
+The following example illustrates the basic use of the \code{simd} construct
+to assure the compiler that the loop can be vectorized.
 
-Compilers may not vectorize loops when they are complex or possibly have
-dependencies, even though the programmer is certain the loop will execute
-correctly as a vectorized loop. The \code{simd} construct assures the compiler
-that the loop can be vectorized.
+\cexample{SIMD}{1} -\cexample{SIMD}{1c} - -\fexample{SIMD}{1f} +\ffreeexample{SIMD}{1} When a function can be inlined within a loop the compiler has an opportunity to @@ -42,9 +38,9 @@ In the \code{simd} constructs for the loops the \code{private(tmp)} clause is necessary to assure that the each vector operation has its own \plc{tmp} variable. -\cexample{SIMD}{2c} +\cexample{SIMD}{2} -\fexample{SIMD}{2f} +\ffreeexample{SIMD}{2} A thread that encounters a SIMD construct executes a vectorized code of the @@ -54,9 +50,9 @@ privatized and declared as reductions with clauses. The example below illustrates the use of \code{private} and \code{reduction} clauses in a SIMD construct. -\cexample{SIMD}{3c} +\cexample{SIMD}{3} -\fexample{SIMD}{3f} +\ffreeexample{SIMD}{3} A \code{safelen(N)} clause in a \code{simd} construct assures the compiler that @@ -69,9 +65,9 @@ code is safe for vectors up to and including size 16. In the loop, \plc{m} can be 16 or greater, for correct code execution. If the value of \plc{m} is less than 16, the behavior is undefined. -\cexample{SIMD}{4c} +\cexample{SIMD}{4} -\fexample{SIMD}{4f} +\ffreeexample{SIMD}{4} The following SIMD construct instructs the compiler to collapse the \plc{i} and @@ -79,11 +75,15 @@ The following SIMD construct instructs the compiler to collapse the \plc{i} and threads of the team. Within the workshared loop chunks of a thread, the SIMD chunks are executed in the lanes of the vector units. -\cexample{SIMD}{5c} +\cexample{SIMD}{5} -\fexample{SIMD}{5f} +\ffreeexample{SIMD}{5} +%%% section +\section{\code{inbranch} and \code{notinbranch} Clauses} +\label{sec:SIMD_branch} + The following examples illustrate the use of the \code{declare} \code{simd} construct with the \code{inbranch} and \code{notinbranch} clauses. The \code{notinbranch} clause informs the compiler that the function \plc{foo} is @@ -92,9 +92,9 @@ the other hand, the \code{inbranch} clause for the function goo indicates that the function is always called conditionally in the SIMD loop inside the function \plc{myaddfloat}. -\cexample{SIMD}{6c} +\cexample{SIMD}{6} -\fexample{SIMD}{6f} +\ffreeexample{SIMD}{6} In the code below, the function \plc{fib()} is called in the main program and @@ -103,7 +103,24 @@ condition. The compiler creates a masked vector version and a non-masked vector version for the function \plc{fib()} while retaining the original scalar version of the \plc{fib()} function. -\cexample{SIMD}{7c} +\cexample{SIMD}{7} -\fexample{SIMD}{7f} +\ffreeexample{SIMD}{7} + + + +%%% section +\section{Loop-Carried Lexical Forward Dependence} +\label{sec:SIMD_forward_dep} + + + The following example tests the restriction on an SIMD loop with the loop-carried lexical forward-dependence. This dependence must be preserved for the correct execution of SIMD loops. + +A loop can be vectorized even though the iterations are not completely independent when it has loop-carried dependences that are forward lexical dependences, indicated in the code below by the read of \plc{A[j+1]} and the write to \plc{A[j]} in C/C++ code (or \plc{A(j+1)} and \plc{A(j)} in Fortran). That is, the read of \plc{A[j+1]} (or \plc{A(j+1)} in Fortran) before the write to \plc{A[j]} (or \plc{A(j)} in Fortran) ordering must be preserved for each iteration in \plc{j} for valid SIMD code generation. + +This test assures that the compiler preserves the loop carried lexical forward-dependence for generating a correct SIMD code. 
+ +\cexample{SIMD}{8} + +\ffreeexample{SIMD}{8} diff --git a/Examples_affinity.tex b/Examples_affinity.tex index a5029cb..76050e5 100644 --- a/Examples_affinity.tex +++ b/Examples_affinity.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{The \code{proc\_bind} Clause} -\label{chap:affinity} +\section{The \code{proc\_bind} Clause} +\label{sec:affinity} The following examples demonstrate how to use the \code{proc\_bind} clause to control the thread binding for a team of threads in a \code{parallel} region. @@ -25,16 +25,18 @@ or \code{OMP\_PLACES=\texttt{"}\{0:2\}:8:2\texttt{"}} -\section{Spread Affinity Policy} +\subsection{Spread Affinity Policy} +\label{subsec:affinity_spread} + The following example shows the result of the \code{spread} affinity policy on the partition list when the number of threads is less than or equal to the number of places in the parent's place partition, for the machine architecture depicted above. Note that the threads are bound to the first place of each subpartition. -\cexample{affinity}{1c} +\cexample{affinity}{1} -\fexample{affinity}{1f} +\fexample{affinity}{1} It is unspecified on which place the master thread is initially started. If the master thread is initially started on p0, the following placement of threads will @@ -73,9 +75,9 @@ parent's place partition. The first \plc{T/P} threads of the team (including the thread) execute on the parent's place. The next \plc{T/P} threads execute on the next place in the place partition, and so on, with wrap around. -\cexample{affinity}{2c} +\cexample{affinity}{2} -\fexample{affinity}{2f} +\ffreeexample{affinity}{2} It is unspecified on which place the master thread is initially started. If the master thread is initially started on p0, the following placement of threads will @@ -120,16 +122,17 @@ and distribution of the place partition would be as follows: \item threads 14,15 execute on p1 with the place partition p1 \end{compactitem} -\section{Close Affinity Policy} +\subsection{Close Affinity Policy} +\label{subsec:affinity_close} The following example shows the result of the \code{close} affinity policy on the partition list when the number of threads is less than or equal to the number of places in parent's place partition, for the machine architecture depicted above. The place partition is not changed by the \code{close} policy. -\cexample{affinity}{3c} +\cexample{affinity}{3} -\fexample{affinity}{3f} +\fexample{affinity}{3} It is unspecified on which place the master thread is initially started. If the master thread is initially started on p0, the following placement of threads will @@ -168,9 +171,9 @@ thread) execute on the parent's place. The next \plc{T/P} threads execute on the place in the place partition, and so on, with wrap around. The place partition is not changed by the \code{close} policy. -\cexample{affinity}{4c} +\cexample{affinity}{4} -\fexample{affinity}{4f} +\ffreeexample{affinity}{4} It is unspecified on which place the master thread is initially started. If the master thread is initially running on p0, the following placement of threads will @@ -215,15 +218,16 @@ and distribution of the place partition would be as follows: \item threads 14,15 execute on p1 with the place partition p0-p7 \end{compactitem} -\section{Master Affinity Policy} +\subsection{Master Affinity Policy} +\label{subsec:affinity_master} The following example shows the result of the \code{master} affinity policy on the partition list for the machine architecture depicted above. 
The place partition is not changed by the master policy. -\cexample{affinity}{5c} +\cexample{affinity}{5} -\fexample{affinity}{5f} +\fexample{affinity}{5} It is unspecified on which place the master thread is initially started. If the master thread is initially running on p0, the following placement of threads will diff --git a/Examples_affinity_query.tex b/Examples_affinity_query.tex new file mode 100644 index 0000000..06f56b6 --- /dev/null +++ b/Examples_affinity_query.tex @@ -0,0 +1,43 @@ +\section{Affinity Query Functions} +\label{sec: affinity_query} + +In the example below a team of threads is generated on each socket of +the system, using nested parallelism. Several query functions are used +to gather information to support the creation of the teams and to obtain +socket and thread numbers. + +For proper execution of the code, the user must create a place partition, such that +each place is a listing of the core numbers for a socket. For example, +in a 2 socket system with 8 cores in each socket, and sequential numbering +in the socket for the core numbers, the \code{OMP\_PLACES} variable would be set +to "\{0:8\},\{8:8\}", using the place syntax \{\plc{lower\_bound}:\plc{length}:\plc{stride}\}, +and the default stride of 1. + +The code determines the number of sockets (\plc{n\_sockets}) +using the \code{omp\_get\_num\_places()} query function. +In this example each place is constructed with a list of +each socket's core numbers, hence the number of places is equal +to the number of sockets. + +The outer parallel region forms a team of threads, and each thread +executes on a socket (place) because the \code{proc\_bind} clause uses +\code{spread} in the outer \code{parallel} construct. +Next, in the \plc{socket\_init} function, an inner parallel region creates a team +of threads equal to the number of elements (core numbers) from the place +of the parent thread. Because the outer \code{parallel} construct uses +a \code{spread} affinity policy, each of its threads inherits a subpartition of +the original partition. Hence, the \code{omp\_get\_place\_num\_procs} query function +returns the number of elements (here procs = cores) in the subpartition of the thread. +After each parent thread creates its nested parallel region on the section, +the socket number and thread number are reported. + +Note: Portable tools like hwloc (Portable HardWare LOCality package), which support +many common operating systems, can be used to determine the configuration of a system. +On some systems there are utilities, files or user guides that provide configuration +information. For instance, the socket number and proc\_id's for a socket +can be found in the /proc/cpuinfo text file on Linux systems. + +\cexample{affinity}{6} + +\ffreeexample{affinity}{6} + diff --git a/Examples_array_sections.tex b/Examples_array_sections.tex index f1601ab..41e4980 100644 --- a/Examples_array_sections.tex +++ b/Examples_array_sections.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{Array Sections in Device Constructs} -\label{chap:array_sections} +\section{Array Sections in Device Constructs} +\label{sec:array_sections} The following examples show the usage of array sections in \code{map} clauses on \code{target} and \code{target} \code{data} constructs. @@ -8,28 +8,28 @@ on \code{target} and \code{target} \code{data} constructs. This example shows the invalid usage of two seperate sections of the same array inside of a \code{target} construct. 
-\cexample{array_sections}{1c} +\cexample{array_sections}{1} -\fexample{array_sections}{1f} +\ffreeexample{array_sections}{1} This example shows the invalid usage of two separate sections of the same array inside of a \code{target} construct. -\cexample{array_sections}{2c} +\cexample{array_sections}{2} -\fexample{array_sections}{2f} +\ffreeexample{array_sections}{2} This example shows the valid usage of two separate sections of the same array inside of a \code{target} construct. -\cexample{array_sections}{3c} +\cexample{array_sections}{3} -\fexample{array_sections}{3f} +\ffreeexample{array_sections}{3} This example shows the valid usage of a wholly contained array section of an already mapped array section inside of a \code{target} construct. -\cexample{array_sections}{4c} +\cexample{array_sections}{4} -\fexample{array_sections}{4f} +\ffreeexample{array_sections}{4} diff --git a/Examples_associate.tex b/Examples_associate.tex index 6946e0c..4e8f5a2 100644 --- a/Examples_associate.tex +++ b/Examples_associate.tex @@ -1,7 +1,7 @@ \pagebreak -\chapter{Fortran \code{ASSOCIATE} Construct} +\section{Fortran \code{ASSOCIATE} Construct} \fortranspecificstart -\label{chap:associate} +\label{sec:associate} The following is an invalid example of specifying an associate name on a data-sharing attribute clause. The constraint in the Data Sharing Attribute Rules section in the OpenMP @@ -11,13 +11,13 @@ name \plc{b} is associated with the shared variable \plc{a}. With the predetermi attribute rule, the associate name \plc{b} is not allowed to be specified on the \code{private} clause. -\fnexample{associate}{1f} +\fnexample{associate}{1} In next example, within the \code{parallel} construct, the association name \plc{thread\_id} is associated with the private copy of \plc{i}. The print statement should output the unique thread number. -\fnexample{associate}{2f} +\fnexample{associate}{2} The following example illustrates the effect of specifying a selector name on a data-sharing attribute clause. The associate name \plc{u} is associated with \plc{v} and the variable \plc{v} @@ -27,6 +27,6 @@ The association between \plc{u} and the original \plc{v} is retained (see the Da Attribute Rules section in the OpenMP 4.0 API Specifications). Inside the \code{parallel} region, \plc{v} has the value of -1 and \plc{u} has the value of the original \plc{v}. -\fnexample{associate}{3f} +\ffreenexample{associate}{3} \fortranspecificend diff --git a/Examples_async_target_depend.tex b/Examples_async_target_depend.tex new file mode 100644 index 0000000..14f09b0 --- /dev/null +++ b/Examples_async_target_depend.tex @@ -0,0 +1,15 @@ +\pagebreak +\section{Asynchronous \code{target} Execution and Dependences} +\label{sec:async_target_exec_depend} + +Asynchronous execution of a \code{target} region can be accomplished +by creating an explicit task around the \code{target} region. Examples +with explicit tasks are shown at the beginning of this section. + +As of OpenMP 4.5 and beyond the \code{nowait} clause can be used on the +\code{target} directive for asynchronous execution. Examples with +\code{nowait} clauses follow the explicit \code{task} examples. + +This section also shows the use of \code{depend} clauses to order +executions through dependences. 
+ diff --git a/Examples_async_target_nowait.tex b/Examples_async_target_nowait.tex new file mode 100644 index 0000000..087af00 --- /dev/null +++ b/Examples_async_target_nowait.tex @@ -0,0 +1,31 @@ +\subsection{\code{nowait} Clause on \code{target} Construct} +\label{subsec:target_nowait_clause} + +The following example shows how to execute code asynchronously on a +device without an explicit task. The \code{nowait} clause on a \code{target} +construct allows the thread of the \plc{target task} to perform other +work while waiting for the \code{target} region execution to complete. +Hence, the the \code{target} region can execute asynchronously on the +device (without requiring a host thread to idle while waiting for +the \plc{target task} execution to complete). + +In this example the product of two vectors (arrays), \plc{v1} +and \plc{v2}, is formed. One half of the operations is performed +on the device, and the last half on the host, concurrently. + +After a team of threads is formed the master thread generates +the \plc{target task} while the other threads can continue on, without a barrier, +to the execution of the host portion of the vector product. +The completion of the \plc{target task} (asynchronous target execution) is +guaranteed by the synchronization in the implicit barrier at the end of the +host vector-product worksharing loop region. See the \code{barrier} +glossary entry in the OpenMP specification for details. + +The host loop scheduling is \code{dynamic}, to balance the host thread executions, since +one thread is being used for offload generation. In the situation where +little time is spent by the \plc{target task} in setting +up and tearing down the the target execution, \code{static} scheduling may be desired. + +\cexample{async_target}{3} + +\ffreeexample{async_target}{3} diff --git a/Examples_async_target_nowait_depend.tex b/Examples_async_target_nowait_depend.tex new file mode 100644 index 0000000..1cf260d --- /dev/null +++ b/Examples_async_target_nowait_depend.tex @@ -0,0 +1,18 @@ +%begin +\subsection{Asynchronous \code{target} with \code{nowait} and \code{depend} Clauses} +\label{subsec:async_target_nowait_depend} + +More details on dependences can be found in \specref{sec:task_depend}, Task +Dependences. In this example, there are three flow dependences. In the first two dependences the +target task does not execute until the preceding explicit tasks have finished. These +dependences are produced by arrays \plc{v1} and \plc{v2} with the \code{out} dependence type in the first two tasks, and the \code{in} dependence type in the target task. + +The last dependence is produced by array \plc{p} with the \code{out} dependence type in the target task, and the \code{in} dependence type in the last task. The last task does not execute until the target task finishes. + +The \code{nowait} clause on the \code{target} construct creates a deferrable \plc{target task}, allowing the encountering task to continue execution without waiting for the completion of the \plc{target task}. 
+ +\cexample{async_target}{4} + +\ffreeexample{async_target}{4} + +%end diff --git a/Examples_async_target.tex b/Examples_async_target_with_tasks.tex similarity index 58% rename from Examples_async_target.tex rename to Examples_async_target_with_tasks.tex index e7ae503..5a3126d 100644 --- a/Examples_async_target.tex +++ b/Examples_async_target_with_tasks.tex @@ -1,6 +1,5 @@ -\pagebreak -\chapter{Asynchronous Execution of a \code{target} Region Using Tasks} -\label{chap:async_target} +\subsection{Asynchronous \code{target} with Tasks} +\label{subsec:async_target_with_tasks} The following example shows how the \code{task} and \code{target} constructs are used to execute multiple \code{target} regions asynchronously. The task that @@ -10,45 +9,46 @@ scheduling point while waiting for the execution of the \code{target} region to complete, allowing the thread to switch back to the execution of the encountering task or one of the previously generated explicit tasks. -\cexample{async_target}{1c} +\cexample{async_target}{1} The Fortran version has an interface block that contains the \code{declare} \code{target}. An identical statement exists in the function declaration (not shown here). -\fexample{async_target}{1f} +\ffreeexample{async_target}{1} The following example shows how the \code{task} and \code{target} constructs are used to execute multiple \code{target} regions asynchronously. The task dependence ensures that the storage is allocated and initialized on the device before it is accessed. -\cexample{async_target}{2c} +\cexample{async_target}{2} The Fortran example below is similar to the C version above. Instead of pointers, though, it uses -the convenience of Fortran allocatable arrays on the device. An allocatable array has the -same behavior in a \code{map} clause as a C pointer, in this case. +the convenience of Fortran allocatable arrays on the device. In order to preserve the arrays +allocated on the device across multiple \code{target} regions, a \code{target}~\code{data} region +is used in this case. If there is no shape specified for an allocatable array in a \code{map} clause, only the array descriptor (also called a dope vector) is mapped. That is, device space is created for the descriptor, and it is initially populated with host values. In this case, the \plc{v1} and \plc{v2} arrays will be in a non-associated state on the device. When space for \plc{v1} and \plc{v2} is allocated on the device -the addresses to the space will be included in their descriptors. +in the first \code{target} region the addresses to the space will be included in their descriptors. -At the end of the first \code{target} region, the descriptor (of an unshaped specification of an allocatable -array in a \code{map} clause) is returned with the raw device address of the allocated space. -The content of the array is not returned. In the example the data in arrays \plc{v1} and \plc{v2} -are not returned. In the second \code{target} directive, the \plc{v1} and \plc{v2} descriptors are -re-created on the device with the descriptive information; and references to the -vectors point to the correct local storage, of the space that was not freed in the first \code{target} -directive. At the end of the second \code{target} region, the data in array \plc{p} is copied back -to the host since \plc{p} is not an allocatable array. +At the end of the first \code{target} region, the arrays \plc{v1} and \plc{v2} are preserved on the device +for access in the second \code{target} region. 
At the end of the second \code{target} region, the data +in array \plc{p} is copied back, the arrays \plc{v1} and \plc{v2} are not. A \code{depend} clause is used in the \code{task} directive to provide a wait at the beginning of the second \code{target} region, to insure that there is no race condition with \plc{v1} and \plc{v2} in the two tasks. It would be noncompliant to use \plc{v1} and/or \plc{v2} in lieu of \plc{N} in the \code{depend} clauses, -because the use of non-allocated allocatable arrays as list items in the first \code{depend} clause would +because the use of non-allocated allocatable arrays as list items in a \code{depend} clause would lead to unspecified behavior. -\fexample{async_target}{2f} - +\noteheader{--} This example is not strictly compliant with the OpenMP 4.5 specification since the allocation status +of allocatable arrays \plc{v1} and \plc{v2} is changed inside the \code{target} region, which is not allowed. +(See the restrictions for the \code{map} clause in the \plc{Data-mapping Attribute Rules and Clauses} +section of the specification.) +However, the intention is to relax the restrictions on mapping of allocatable variables in the next release +of the specification so that the example will be compliant. +\ffreeexample{async_target}{2} diff --git a/Examples_atomic.tex b/Examples_atomic.tex index f92bbf3..06f280c 100644 --- a/Examples_atomic.tex +++ b/Examples_atomic.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{The \code{atomic} Construct} -\label{chap:atomic} +\section{The \code{atomic} Construct} +\label{sec:atomic} The following example avoids race conditions (simultaneous updates of an element of \plc{x} by multiple threads) by using the \code{atomic} construct . @@ -14,9 +14,9 @@ Note that the \code{atomic} directive applies only to the statement immediately following it. As a result, elements of \plc{y} are not updated atomically in this example. -\cexample{atomic}{1c} +\cexample{atomic}{1} -\fexample{atomic}{1f} +\fexample{atomic}{1} The following example illustrates the \code{read} and \code{write} clauses for the \code{atomic} directive. These clauses ensure that the given variable @@ -26,9 +26,9 @@ another part of the variable. Note that most hardware provides atomic reads and writes for some set of properly aligned variables of specific sizes, but not necessarily for all the variable types supported by the OpenMP API. -\cexample{atomic}{2c} +\cexample{atomic}{2} -\fexample{atomic}{2f} +\fexample{atomic}{2} The following example illustrates the \code{capture} clause for the \code{atomic} directive. In this case the value of a variable is captured, and then the variable @@ -37,8 +37,8 @@ be implemented using the fetch-and-add instruction available on many kinds of ha The example also shows a way to implement a spin lock using the \code{capture} and \code{read} clauses. -\cexample{atomic}{3c} +\cexample{atomic}{3} -\fexample{atomic}{3f} +\fexample{atomic}{3} diff --git a/Examples_atomic_restrict.tex b/Examples_atomic_restrict.tex index 7394de3..9599e05 100644 --- a/Examples_atomic_restrict.tex +++ b/Examples_atomic_restrict.tex @@ -1,25 +1,25 @@ \pagebreak -\chapter{Restrictions on the \code{atomic} Construct} -\label{chap:atomic_restrict} +\section{Restrictions on the \code{atomic} Construct} +\label{sec:atomic_restrict} The following non-conforming examples illustrate the restrictions on the \code{atomic} construct. 
-\cexample{atomic_restrict}{1c} +\cexample{atomic_restrict}{1} -\fexample{atomic_restrict}{1f} +\fexample{atomic_restrict}{1} -\cexample{atomic_restrict}{2c} +\cexample{atomic_restrict}{2} \fortranspecificstart The following example is non-conforming because \code{I} and \code{R} reference the same location but have different types. -\fnexample{atomic_restrict}{2f} +\fnexample{atomic_restrict}{2} Although the following example might work on some implementations, this is also non-conforming: -\fnexample{atomic_restrict}{3f} +\fnexample{atomic_restrict}{3} \fortranspecificend diff --git a/Examples_barrier_regions.tex b/Examples_barrier_regions.tex index 64b7eef..8cd2b5c 100644 --- a/Examples_barrier_regions.tex +++ b/Examples_barrier_regions.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{Binding of \code{barrier} Regions} -\label{chap:barrier_regions} +\section{Binding of \code{barrier} Regions} +\label{sec:barrier_regions} The binding rules call for a \code{barrier} region to bind to the closest enclosing \code{parallel} region. @@ -17,8 +17,8 @@ part. Also note that the \code{barrier} region in \plc{sub3} when called from \plc{sub2} only synchronizes the team of threads in the enclosing \code{parallel} region and not all the threads created in \plc{sub1}. -\cexample{barrier_regions}{1c} +\cexample{barrier_regions}{1} -\fexample{barrier_regions}{1f} +\fexample{barrier_regions}{1} diff --git a/Examples_cancellation.tex b/Examples_cancellation.tex index 7be91e8..88125be 100644 --- a/Examples_cancellation.tex +++ b/Examples_cancellation.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{Cancellation Constructs} -\label{chap:cancellation} +\section{Cancellation Constructs} +\label{sec:cancellation} The following example shows how the \code{cancel} directive can be used to terminate an OpenMP region. Although the \code{cancel} construct terminates the OpenMP @@ -11,7 +11,7 @@ exception is properly handled in the sequential part. If cancellation of the \co region has been requested, some threads might have executed \code{phase\_1()}. However, it is guaranteed that none of the threads executed \code{phase\_2()}. -\cexample{cancellation}{1c} +\cppexample{cancellation}{1} The following example illustrates the use of the \code{cancel} construct in error @@ -20,7 +20,7 @@ the cancellation is activated. The encountering thread sets the shared variable \code{err} and other threads of the binding thread set proceed to the end of the worksharing construct after the cancellation has been activated. -\fexample{cancellation}{1f} +\ffreeexample{cancellation}{1} The following example shows how to cancel a parallel search on a binary tree as soon as the search value has been detected. The code creates a task to descend @@ -32,11 +32,11 @@ task group to control the effect of the \code{cancel taskgroup} directive. The \plc{level} argument is used to create undeferred tasks after the first ten levels of the tree. -\cexample{cancellation}{2c} +\cexample{cancellation}{2} The following is the equivalent parallel search example in Fortran. 
-\fexample{cancellation}{2f} +\ffreeexample{cancellation}{2} diff --git a/Examples_carrays_fpriv.tex b/Examples_carrays_fpriv.tex index aad4b6d..f9086bb 100644 --- a/Examples_carrays_fpriv.tex +++ b/Examples_carrays_fpriv.tex @@ -1,7 +1,7 @@ \pagebreak -\chapter{C/C++ Arrays in a \code{firstprivate} Clause} +\section{C/C++ Arrays in a \code{firstprivate} Clause} \ccppspecificstart -\label{chap:carrays_fpriv} +\label{sec:carrays_fpriv} The following example illustrates the size and value of list items of array or pointer type in a \code{firstprivate} clause . The size of new list items is @@ -31,7 +31,7 @@ The new items of array type are initialized as if each integer element of the or array is assigned to the corresponding element of the new array. Those of pointer type are initialized as if by assignment from the original item to the new item. -\cnexample{carrays_fpriv}{1c} +\cnexample{carrays_fpriv}{1} \ccppspecificend diff --git a/Examples_collapse.tex b/Examples_collapse.tex index fd88dad..4a59066 100644 --- a/Examples_collapse.tex +++ b/Examples_collapse.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{The \code{collapse} Clause} -\label{chap:collapse} +\section{The \code{collapse} Clause} +\label{sec:collapse} In the following example, the \code{k} and \code{j} loops are associated with the loop construct. So the iterations of the \code{k} and \code{j} loops are @@ -16,9 +16,9 @@ The variable \code{j} can be omitted from the \code{private} clause when the from the \code{private} clause. In either case, \code{k} is implicitly private and could be omitted from the \code{private} clause. -\cexample{collapse}{1c} +\cexample{collapse}{1} -\fexample{collapse}{1f} +\fexample{collapse}{1} In the next example, the \code{k} and \code{j} loops are associated with the loop construct. So the iterations of the \code{k} and \code{j} loops are collapsed @@ -33,9 +33,9 @@ will have the value \code{2} and \code{j} will have the value \code{3}. Since by the sequentially last iteration of the collapsed \code{k} and \code{j} loop. This example prints: \code{2 3}. -\cexample{collapse}{2c} +\cexample{collapse}{2} -\fexample{collapse}{2f} +\fexample{collapse}{2} The next example illustrates the interaction of the \code{collapse} and \code{ordered} clauses. @@ -71,8 +71,8 @@ The code prints \\ \code{1 3 2} -\cexample{collapse}{3c} +\cexample{collapse}{3} -\fexample{collapse}{3f} +\fexample{collapse}{3} diff --git a/Examples_cond_comp.tex b/Examples_cond_comp.tex index 3f501b6..68ad0e3 100644 --- a/Examples_cond_comp.tex +++ b/Examples_cond_comp.tex @@ -1,13 +1,13 @@ \pagebreak -\chapter{Conditional Compilation} -\label{chap:cond_comp} +\section{Conditional Compilation} +\label{sec:cond_comp} \ccppspecificstart The following example illustrates the use of conditional compilation using the OpenMP macro \code{\_OPENMP}. With OpenMP compilation, the \code{\_OPENMP} macro becomes defined. -\cnexample{cond_comp}{1c} +\cnexample{cond_comp}{1} \ccppspecificend \fortranspecificstart @@ -16,6 +16,6 @@ With OpenMP compilation, the conditional compilation sentinel \code{!\$} is reco and treated as two spaces. In fixed form source, statements guarded by the sentinel must start after column 6. 
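As a quick editorial aside on the C/C++ conditional-compilation idiom mentioned above (separate from the \cnexample{cond\_comp}{1} source), a minimal self-contained sketch of guarding code with the \code{\_OPENMP} macro:

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void)
{
#ifdef _OPENMP
   /* Compiled only when OpenMP compilation is enabled. */
   printf("Compiled with OpenMP %d, max threads = %d\n",
          _OPENMP, omp_get_max_threads());
#else
   printf("Compiled without OpenMP\n");
#endif
   return 0;
}
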
-\fnexample{cond_comp}{1f} +\fnexample{cond_comp}{1} \fortranspecificend diff --git a/Examples_copyin.tex b/Examples_copyin.tex index 5eb1496..ada9a5a 100644 --- a/Examples_copyin.tex +++ b/Examples_copyin.tex @@ -1,13 +1,13 @@ \pagebreak -\chapter{The \code{copyin} Clause} -\label{chap:copyin} +\section{The \code{copyin} Clause} +\label{sec:copyin} The \code{copyin} clause is used to initialize threadprivate data upon entry to a \code{parallel} region. The value of the threadprivate variable in the master thread is copied to the threadprivate variable of each other team member. -\cexample{copyin}{1c} +\cexample{copyin}{1} -\fexample{copyin}{1f} +\fexample{copyin}{1} diff --git a/Examples_copyprivate.tex b/Examples_copyprivate.tex index 6d91618..d6ccf66 100644 --- a/Examples_copyprivate.tex +++ b/Examples_copyprivate.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{The \code{copyprivate} Clause} -\label{chap:copyprivate} +\section{The \code{copyprivate} Clause} +\label{sec:copyprivate} The \code{copyprivate} clause can be used to broadcast values acquired by a single thread directly to all instances of the private variables in the other threads. @@ -16,28 +16,28 @@ The thread that executes the structured block associated with the \code{single} of the other implicit tasks in the thread team. The broadcast completes before any of the threads have left the barrier at the end of the construct. -\cexample{copyprivate}{1c} +\cexample{copyprivate}{1} -\fexample{copyprivate}{1f} +\fexample{copyprivate}{1} In this example, assume that the input must be performed by the master thread. Since the \code{master} construct does not support the \code{copyprivate} clause, it cannot broadcast the input value that is read. However, \code{copyprivate} is used to broadcast an address where the input value is stored. -\cexample{copyprivate}{2c} +\cexample{copyprivate}{2} -\fexample{copyprivate}{2f} +\fexample{copyprivate}{2} Suppose that the number of lock variables required within a \code{parallel} region cannot easily be determined prior to entering it. The \code{copyprivate} clause can be used to provide access to shared lock variables that are allocated within that \code{parallel} region. -\cexample{copyprivate}{3c} +\cexample{copyprivate}{3} \fortranspecificstart -\fnexample{copyprivate}{3f} +\fnexample{copyprivate}{3} Note that the effect of the \code{copyprivate} clause on a variable with the \code{allocatable} attribute is different than on a variable with the \code{pointer} @@ -45,7 +45,7 @@ attribute. The value of \code{A} is copied (as if by intrinsic assignment) and the pointer \code{B} is copied (as if by pointer assignment) to the corresponding list items in the other implicit tasks belonging to the \code{parallel} region. -\fnexample{copyprivate}{4f} +\fnexample{copyprivate}{4} \fortranspecificend diff --git a/Examples_cpp_reference.tex b/Examples_cpp_reference.tex new file mode 100644 index 0000000..2ff2ce8 --- /dev/null +++ b/Examples_cpp_reference.tex @@ -0,0 +1,14 @@ +\section{C++ Reference in Data-Sharing Clauses} +\cppspecificstart +\label{sec:cpp_reference} + +C++ reference types are allowed in data-sharing attribute clauses as of OpenMP 4.5, except +for the \code{threadprivate}, \code{copyin} and \code{copyprivate} clauses. +(See the Data-Sharing Attribute Clauses Section of the 4.5 OpenMP specification.) +When a variable with C++ reference type is privatized, the object the reference refers to is privatized in addition to the reference itself. 
+The following example shows the use of reference types in data-sharing clauses in the usual way. +Additionally it shows how the data-sharing of formal arguments with a C++ reference type on an orphaned task generating construct is determined implicitly. (See the Data-sharing Attribute Rules for Variables Referenced in a Construct Section of the 4.5 OpenMP specification.) + + +\cppnexample{cpp_reference}{1} +\cppspecificend diff --git a/Examples_critical.tex b/Examples_critical.tex index 0e84afc..c2ed0ea 100644 --- a/Examples_critical.tex +++ b/Examples_critical.tex @@ -1,16 +1,20 @@ \pagebreak -\chapter{The \code{critical} Construct} -\label{chap:critical} +\section{The \code{critical} Construct} +\label{sec:critical} -The following example includes several \code{critical} constructs . The example +The following example includes several \code{critical} constructs. The example illustrates a queuing model in which a task is dequeued and worked on. To guard against multiple threads dequeuing the same task, the dequeuing operation must be in a \code{critical} region. Because the two queues in this example are independent, they are protected by \code{critical} constructs with different names, \plc{xaxis} and \plc{yaxis}. -\cexample{critical}{1c} +\cexample{critical}{1} -\fexample{critical}{1f} +\fexample{critical}{1} +The following example extends the previous example by adding the \code{hint} clause to the \code{critical} constructs. +\cexample{critical}{2} + +\fexample{critical}{2} diff --git a/Examples_declare_target.tex b/Examples_declare_target.tex index e5c7c4b..e24d214 100644 --- a/Examples_declare_target.tex +++ b/Examples_declare_target.tex @@ -1,8 +1,9 @@ \pagebreak -\chapter{\code{declare} \code{target} Construct} -\label{chap:declare_target} +\section{\code{declare} \code{target} Construct} +\label{sec:declare_target} -\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for a Function} +\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for a Function} +\label{subsec:declare_target_function} The following example shows how the \code{declare} \code{target} directive is used to indicate that the corresponding call inside a \code{target} region @@ -15,7 +16,7 @@ the \code{target} region (thus \code{fib}) will execute on the host device. For C/C++ codes the declaration of the function \code{fib} appears between the \code{declare} \code{target} and \code{end} \code{declare} \code{target} directives. -\cexample{declare_target}{1c} +\cexample{declare_target}{1} The Fortran \code{fib} subroutine contains a \code{declare} \code{target} declaration to indicate to the compiler to create an device executable version of the procedure. @@ -26,7 +27,7 @@ The program uses the \code{module\_fib} module, which presents an explicit inter the compiler with the \code{declare} \code{target} declarations for processing the \code{fib} call. -\fexample{declare_target}{1f} +\ffreeexample{declare_target}{1} The next Fortran example shows the use of an external subroutine. Without an explicit interface (through module use or an interface block) the \code{declare} \code{target} @@ -34,9 +35,10 @@ declarations within a external subroutine are unknown to the main program unit; therefore, a \code{declare} \code{target} must be provided within the program scope for the compiler to determine that a target binary should be available. 
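An illustrative C sketch (not the \cexample{declare\_target}{1} source) of the C/C++ \code{declare target} function pattern described above; the function name \plc{fib\_dev} and the argument value are hypothetical.

#include <stdio.h>

#pragma omp declare target
int fib_dev(int n)   /* a device-callable version of this function is created */
{
   return (n < 2) ? n : fib_dev(n - 1) + fib_dev(n - 2);
}
#pragma omp end declare target

int main(void)
{
   int result = 0;
   #pragma omp target map(from: result)
   result = fib_dev(10);   /* the call executes inside the target region */
   printf("fib(10) = %d\n", result);
   return 0;
}
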
-\fexample{declare_target}{2f} +\ffreeexample{declare_target}{2} -\section{\code{declare} \code{target} Construct for Class Type} +\subsection{\code{declare} \code{target} Construct for Class Type} +\label{subsec:declare_target_class} \cppspecificstart The following example shows how the \code{declare} \code{target} and \code{end} @@ -45,10 +47,11 @@ of a variable \plc{varY} with a class type \code{typeY}. The member function \co be accessed on a target device because its declaration did not appear between \code{declare} \code{target} and \code{end} \code{declare} \code{target} directives. -\cnexample{declare_target}{2c} +\cppnexample{declare_target}{2} \cppspecificend -\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for Variables} +\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} for Variables} +\label{subsec:declare_target_variables} The following examples show how the \code{declare} \code{target} and \code{end} \code{declare} \code{target} directives are used to indicate that global variables @@ -62,13 +65,13 @@ is then used to manage the consistency of the variables \plc{p}, \plc{v1}, and \ data environment of the encountering host device task and the implicit device data environment of the default target device. -\cexample{declare_target}{3c} +\cexample{declare_target}{3} The Fortran version of the above C code uses a different syntax. Fortran modules use a list syntax on the \code{declare} \code{target} directive to declare mapped variables. -\fexample{declare_target}{3f} +\ffreeexample{declare_target}{3} The following example also indicates that the function \code{Pfun()} is available on the target device, as well as the variable \plc{Q}, which is mapped to the implicit device @@ -81,7 +84,7 @@ In the following example, the function and variable declarations appear between the \code{declare} \code{target} and \code{end} \code{declare} \code{target} directives. -\cexample{declare_target}{4c} +\cexample{declare_target}{4} The Fortran version of the above C code uses a different syntax. In Fortran modules a list syntax on the \code{declare} \code{target} directive is used to declare @@ -90,9 +93,10 @@ separated list. When the \code{declare} \code{target} directive is used to declare just the procedure, the procedure name need not be listed -- it is implicitly assumed, as illustrated in the \code{Pfun()} function. -\fexample{declare_target}{4f} +\ffreeexample{declare_target}{4} -\section{\code{declare} \code{target} and \code{end} \code{declare} \code{target} with \code{declare} \code{simd}} +\subsection{\code{declare} \code{target} and \code{end} \code{declare} \code{target} with \code{declare} \code{simd}} +\label{subsec:declare_target_simd} The following example shows how the \code{declare} \code{target} and \code{end} \code{declare} \code{target} directives are used to indicate that a function @@ -100,7 +104,7 @@ is available on a target device. The \code{declare} \code{simd} directive indica that there is a SIMD version of the function \code{P()} that is available on the target device as well as one that is available on the host device. -\cexample{declare_target}{5c} +\cexample{declare_target}{5} The Fortran version of the above C code uses a different syntax. Fortran modules use a list syntax of the \code{declare} \code{target} declaration for the mapping. @@ -109,5 +113,30 @@ The function declaration does not use a list and implicitly assumes the function name. 
In this Fortran example row and column indices are reversed relative to the C/C++ example, as is usual for codes optimized for memory access. -\fexample{declare_target}{5f} +\ffreeexample{declare_target}{5} + + +\subsection{\code{declare}~\code{target} Directive with \code{link} Clause} +\label{subsec:declare_target_link} + +In the OpenMP 4.5 standard the \code{declare}~\code{target} directive was extended to allow static +data to be mapped, \emph{when needed}, through a \code{link} clause. + +Data storage for items listed in the \code{link} clause becomes available on the device +when it is mapped implicitly or explicitly in a \code{map} clause, and it persists for the scope of +the mapping (as specified by a \code{target} construct, +a \code{target}~\code{data} construct, or +\code{target}~\code{enter/exit}~\code{data} constructs). + +Tip: When all the global data items will not fit on a device and are not needed +simultaneously, use the \code{link} clause and map the data only when it is needed. + +The following C and Fortran examples show two sets of data (single precision and double precision) +that are global on the host for the entire execution on the host; but are only used +globally on the device for part of the program execution. The single precision data +are allocated and persist only for the first \code{target} region. Similarly, the +double precision data are in scope on the device only for the second \code{target} region. + +\cexample{declare_target}{6} +\ffreeexample{declare_target}{6} diff --git a/Examples_default_none.tex b/Examples_default_none.tex index c590da5..2b4dd5b 100644 --- a/Examples_default_none.tex +++ b/Examples_default_none.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{The \code{default(none)} Clause} -\label{chap:default_none} +\section{The \code{default(none)} Clause} +\label{sec:default_none} The following example distinguishes the variables that are affected by the \code{default(none)} clause from those that are not. @@ -11,9 +11,9 @@ are no longer predetermined shared. Thus, these variables (variable \plc{c} in need to be explicitly listed in data-sharing attribute clauses when the \code{default(none)} clause is specified. -\cnexample{default_none}{1c} +\cnexample{default_none}{1} \ccppspecificend -\fexample{default_none}{1f} +\fexample{default_none}{1} diff --git a/Examples_device.tex b/Examples_device.tex index c09c7f6..0dd1a87 100644 --- a/Examples_device.tex +++ b/Examples_device.tex @@ -1,35 +1,57 @@ \pagebreak -\chapter{Device Routines} -\label{chap:device} +\section{Device Routines} +\label{sec:device} -\section{\code{omp\_is\_initial\_device} Routine} +\subsection{\code{omp\_is\_initial\_device} Routine} +\label{subsec:device_is_initial} The following example shows how the \code{omp\_is\_initial\_device} runtime library routine can be used to query if a code is executing on the initial host device or on a target device. The example then sets the number of threads in the \code{parallel} region based on where the code is executing. -\cexample{device}{1c} +\cexample{device}{1} -\fexample{device}{1f} +\ffreeexample{device}{1} -\section{\code{omp\_get\_num\_devices} Routine} +\subsection{\code{omp\_get\_num\_devices} Routine} +\label{subsec:device_num_devices} The following example shows how the \code{omp\_get\_num\_devices} runtime library routine can be used to determine the number of devices. 
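For illustration only (not the \cexample{device} sources), a small C sketch combining the two routines just described, \code{omp\_get\_num\_devices} and \code{omp\_is\_initial\_device}:

#include <stdio.h>
#include <omp.h>

int main(void)
{
   printf("Number of non-host devices: %d\n", omp_get_num_devices());

   int on_host = 1;
   #pragma omp target map(from: on_host)
   on_host = omp_is_initial_device();   /* 0 when the region is offloaded */

   printf("Target region ran on the %s device\n",
          on_host ? "host (initial)" : "target");
   return 0;
}
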
-\cexample{device}{2c} +\cexample{device}{2} -\fexample{device}{2f} +\ffreeexample{device}{2} -\section{\code{omp\_set\_default\_device} and \\ +\subsection{\code{omp\_set\_default\_device} and \\ \code{omp\_get\_default\_device} Routines} +\label{subsec:device_is_set_get_default} The following example shows how the \code{omp\_set\_default\_device} and \code{omp\_get\_default\_device} runtime library routines can be used to set the default device and determine the default device respectively. -\cexample{device}{3c} +\cexample{device}{3} -\fexample{device}{3f} +\ffreeexample{device}{3} + + + \subsection{Target Memory and Device Pointers Routines} +\label{subsec:target_mem_and_device_ptrs} + +The following example shows how to create space on a device, transfer data +to and from that space, and free the space, using API calls. The API calls +directly execute allocation, copy and free operations on the device, without invoking +any mapping through a \code{target} directive. The \code{omp\_target\_alloc} routine allocates space +and returns a device pointer for referencing the space in the \code{omp\_target\_memcpy} +API routine on the host. The \code{omp\_target\_free} routine frees the space on the device. + +The example also illustrates how to access that space +in a \code{target} region by exposing the device pointer in an \code{is\_device\_ptr} clause. + +The example creates an array of cosine values on the default device, to be used +on the host device. The function fails if a default device is not available. + +\cexample{device}{4} diff --git a/Examples_doacross.tex b/Examples_doacross.tex new file mode 100644 index 0000000..a23ebab --- /dev/null +++ b/Examples_doacross.tex @@ -0,0 +1,68 @@ +\pagebreak +\section{Doacross Loop Nest} +\label{sec:doacross} + +An \code{ordered} clause can be used on a loop construct with an integer +parameter argument to define the number of associated loops within +a \plc{doacross loop nest} where cross-iteration dependences exist. +A \code{depend} clause on an \code{ordered} construct within an ordered +loop describes the dependences of the \plc{doacross} loops. + +In the code below, the \code{depend(sink:i-1)} clause defines an \plc{i-1} +to \plc{i} cross-iteration dependence that specifies a wait point for +the completion of computation from iteration \plc{i-1} before proceeding +to the subsequent statements. The \code{depend(source)} clause indicates +the completion of computation from the current iteration (\plc{i}) +to satisfy the cross-iteration dependence that arises from the iteration. +For this example the same sequential ordering could have been achieved +with an \code{ordered} clause without a parameter, on the loop directive, +and a single \code{ordered} directive without the \code{depend} clause +specified for the statement executing the \plc{bar} function. + +\cexample{doacross}{1} + +\ffreeexample{doacross}{1} + +The following code is similar to the previous example but with +\plc{doacross loop nest} extended to two nested loops, \plc{i} and \plc{j}, +as specified by the \code{ordered(2)} clause on the loop directive. +In the C/C++ code, the \plc{i} and \plc{j} loops are the first and +second associated loops, respectively, whereas +in the Fortran code, the \plc{j} and \plc{i} loops are the first and +second associated loops, respectively. 
+The \code{depend(sink:i-1,j)} and \code{depend(sink:i,j-1)} clauses in +the C/C++ code define cross-iteration dependences in two dimensions from +iterations (\plc{i-1, j}) and (\plc{i, j-1}) to iteration (\plc{i, j}). +Likewise, the \code{depend(sink:j-1,i)} and \code{depend(sink:j,i-1)} clauses +in the Fortran code define cross-iteration dependences from iterations +(\plc{j-1, i}) and (\plc{j, i-1}) to iteration (\plc{j, i}). + +\cexample{doacross}{2} + +\ffreeexample{doacross}{2} + + +The following example shows the incorrect use of the \code{ordered} +directive with a \code{depend} clause. There are two issues with the code. +The first issue is a missing \code{ordered}~\code{depend(source)} directive, +which could cause a deadlock. +The second issue is that the \code{depend(sink:i+1,j)} and \code{depend(sink:i,j+1)} +clauses define dependences on lexicographically later +source iterations (\plc{i+1, j}) and (\plc{i, j+1}), which could cause +a deadlock as well since those iterations may not start to execute until the current iteration completes. + +\cexample{doacross}{3} + +\ffreeexample{doacross}{3} + + +The following example illustrates the use of the \code{collapse} clause for +a \plc{doacross loop nest}. The \plc{i} and \plc{j} loops are the associated +loops for the collapsed loop as well as for the \plc{doacross loop nest}. +The example also shows a compliant usage of the dependence source +directive placed before the corresponding sink directive. +Checking the completion of computation from previous iterations at the sink point can occur after the source statement. + +\cexample{doacross}{4} + +\ffreeexample{doacross}{4} diff --git a/Examples_flush_nolist.tex b/Examples_flush_nolist.tex index 732da9f..d39e17e 100644 --- a/Examples_flush_nolist.tex +++ b/Examples_flush_nolist.tex @@ -1,12 +1,12 @@ \pagebreak -\chapter{The \code{flush} Construct without a List} -\label{chap:flush_nolist} +\section{The \code{flush} Construct without a List} +\label{sec:flush_nolist} The following example distinguishes the shared variables affected by a \code{flush} construct with no list from the shared objects that are not affected: -\cexample{flush_nolist}{1c} +\cexample{flush_nolist}{1} -\fexample{flush_nolist}{1f} +\fexample{flush_nolist}{1} diff --git a/Examples_fort_do.tex b/Examples_fort_do.tex index d0c5fe9..f8fc15e 100644 --- a/Examples_fort_do.tex +++ b/Examples_fort_do.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{Fortran Restrictions on the \code{do} Construct} -\label{chap:fort_do} +\section{Fortran Restrictions on the \code{do} Construct} +\label{sec:fort_do} \fortranspecificstart If an \code{end do} directive follows a \plc{do-construct} in which several @@ -8,12 +8,12 @@ directive can only be specified for the outermost of these \code{DO} statements.
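Stepping back to the \plc{doacross} discussion above: a minimal C sketch (not one of the \cexample{doacross} sources) of a one-dimensional doacross loop, with a hypothetical array \plc{a}, showing the \code{ordered(1)} clause together with the \code{depend(sink)}/\code{depend(source)} pair.

#include <omp.h>
#define N 100

void doacross_sketch(float *a)
{
   #pragma omp parallel for ordered(1)
   for (int i = 1; i < N; i++)
   {
      a[i] = 0.5f * a[i];                /* independent work */

      #pragma omp ordered depend(sink: i-1)
      a[i] = a[i] + a[i-1];              /* needs the result of iteration i-1 */
      #pragma omp ordered depend(source)
   }
}
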
The following example contains correct usages of loop constructs: -\fnexample{fort_do}{1f} +\fnexample{fort_do}{1} The following example is non-conforming because the matching \code{do} directive for the \code{end do} does not precede the outermost loop: -\fnexample{fort_do}{2f} +\fnexample{fort_do}{2} \fortranspecificend diff --git a/Examples_fort_loopvar.tex b/Examples_fort_loopvar.tex index 14d988a..0781d40 100644 --- a/Examples_fort_loopvar.tex +++ b/Examples_fort_loopvar.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{Fortran Private Loop Iteration Variables} -\label{chap:fort_loopvar} +\section{Fortran Private Loop Iteration Variables} +\label{sec:fort_loopvar} \fortranspecificstart In general loop iteration variables will be private, when used in the \plc{do-loop} @@ -10,12 +10,12 @@ the OpenMP 4.0 specification). In the following example of a sequential loop in a \code{parallel} construct the loop iteration variable \plc{I} will be private. -\fnexample{fort_loopvar}{1f} +\ffreenexample{fort_loopvar}{1} In exceptional cases, loop iteration variables can be made shared, as in the following example: -\fnexample{fort_loopvar}{2f} +\ffreenexample{fort_loopvar}{2} Note however that the use of shared loop iteration variables can easily lead to race conditions. diff --git a/Examples_fort_race.tex b/Examples_fort_race.tex index 25f0a37..cc6617e 100644 --- a/Examples_fort_race.tex +++ b/Examples_fort_race.tex @@ -1,7 +1,7 @@ \pagebreak -\chapter{Race Conditions Caused by Implied Copies of Shared Variables in Fortran} +\section{Race Conditions Caused by Implied Copies of Shared Variables in Fortran} \fortranspecificstart -\label{chap:fort_race} +\label{sec:fort_race} The following example contains a race condition, because the shared variable, which is an array section, is passed as an actual argument to a routine that has an assumed-size @@ -10,7 +10,7 @@ may cause the compiler to copy the argument into a temporary location prior to the call and copy from the temporary location into the original variable when the subroutine returns. This copying would cause races in the \code{parallel} region. -\fnexample{fort_race}{1f} +\ffreenexample{fort_race}{1} \fortranspecificend diff --git a/Examples_fort_sa_private.tex b/Examples_fort_sa_private.tex index fd50c5b..2333b34 100644 --- a/Examples_fort_sa_private.tex +++ b/Examples_fort_sa_private.tex @@ -1,23 +1,23 @@ \pagebreak -\chapter{Fortran Restrictions on Storage Association with the \code{private} Clause} +\section{Fortran Restrictions on Storage Association with the \code{private} Clause} \fortranspecificstart -\label{chap:fort_sa_private} +\label{sec:fort_sa_private} The following non-conforming examples illustrate the implications of the \code{private} clause rules with regard to storage association. -\fnexample{fort_sa_private}{1f} +\fnexample{fort_sa_private}{1} -\fnexample{fort_sa_private}{2f} +\fnexample{fort_sa_private}{2} + +\fnexample{fort_sa_private}{3} % blue line floater at top of this page for "Fortran, cont." \begin{figure}[t!] 
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em} \end{figure} -\fnexample{fort_sa_private}{3f} +\fnexample{fort_sa_private}{4} -\fnexample{fort_sa_private}{4f} - -\fnexample{fort_sa_private}{5f} +\fnexample{fort_sa_private}{5} \fortranspecificend diff --git a/Examples_fort_sp_common.tex b/Examples_fort_sp_common.tex index d5af05d..b404a8c 100644 --- a/Examples_fort_sp_common.tex +++ b/Examples_fort_sp_common.tex @@ -1,7 +1,7 @@ \pagebreak -\chapter{Fortran Restrictions on \code{shared} and \code{private} Clauses with Common Blocks} +\section{Fortran Restrictions on \code{shared} and \code{private} Clauses with Common Blocks} \fortranspecificstart -\label{chap:fort_sp_common} +\label{sec:fort_sp_common} When a named common block is specified in a \code{private}, \code{firstprivate}, or \code{lastprivate} clause of a construct, none of its members may be declared @@ -10,11 +10,11 @@ illustrate this point. The following example is conforming: -\fnexample{fort_sp_common}{1f} +\fnexample{fort_sp_common}{1} The following example is also conforming: -\fnexample{fort_sp_common}{2f} +\fnexample{fort_sp_common}{2} % blue line floater at top of this page for "Fortran, cont." \begin{figure}[t!] \linewitharrows{-1}{dashed}{Fortran (cont.)}{8em} @@ -22,17 +22,17 @@ The following example is also conforming: The following example is conforming: -\fnexample{fort_sp_common}{3f} +\fnexample{fort_sp_common}{3} The following example is non-conforming because \code{x} is a constituent element of \code{c}: -\fnexample{fort_sp_common}{4f} +\fnexample{fort_sp_common}{4} The following example is non-conforming because a common block may not be declared both shared and private: -\fnexample{fort_sp_common}{5f} +\fnexample{fort_sp_common}{5} \fortranspecificend diff --git a/Examples_fpriv_sections.tex b/Examples_fpriv_sections.tex index 2102263..5dd2f17 100644 --- a/Examples_fpriv_sections.tex +++ b/Examples_fpriv_sections.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{The \code{firstprivate} Clause and the \code{sections} Construct} -\label{chap:fpriv_sections} +\section{The \code{firstprivate} Clause and the \code{sections} Construct} +\label{sec:fpriv_sections} In the following example of the \code{sections} construct the \code{firstprivate} clause is used to initialize the private copy of \code{section\_count} of each @@ -11,8 +11,8 @@ thread executes the two sections, one section will print the value 1 and the oth will print the value 2. Since the order of execution of the two sections in this case is unspecified, it is unspecified which section prints which value. -\cexample{fpriv_sections}{1c} +\cexample{fpriv_sections}{1} -\fexample{fpriv_sections}{1f} +\ffreeexample{fpriv_sections}{1} diff --git a/Examples_get_nthrs.tex b/Examples_get_nthrs.tex index 35d0049..971d857 100644 --- a/Examples_get_nthrs.tex +++ b/Examples_get_nthrs.tex @@ -1,21 +1,21 @@ \pagebreak -\chapter{The \code{omp\_get\_num\_threads} Routine} -\label{chap:get_nthrs} +\section{The \code{omp\_get\_num\_threads} Routine} +\label{sec:get_nthrs} In the following example, the \code{omp\_get\_num\_threads} call returns 1 in the sequential part of the code, so \code{np} will always be equal to 1. To determine the number of threads that will be deployed for the \code{parallel} region, the call should be inside the \code{parallel} region. 
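A minimal C sketch of the point just made (illustrative only, not the \cexample{get\_nthrs} sources): the \code{omp\_get\_num\_threads} call must be made inside the \code{parallel} region to observe the team size.

#include <stdio.h>
#include <omp.h>

int main(void)
{
   int np_outside = omp_get_num_threads();   /* always 1 in the sequential part */
   int np_inside  = 1;

   #pragma omp parallel
   {
      #pragma omp single
      np_inside = omp_get_num_threads();     /* actual team size */
   }

   printf("outside: %d, inside: %d\n", np_outside, np_inside);
   return 0;
}
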
-\cexample{get_nthrs}{1c} +\cexample{get_nthrs}{1} -\fexample{get_nthrs}{1f} +\fexample{get_nthrs}{1} The following example shows how to rewrite this program without including a query for the number of threads: -\cexample{get_nthrs}{2c} +\cexample{get_nthrs}{2} -\fexample{get_nthrs}{2f} +\fexample{get_nthrs}{2} diff --git a/Examples_icv.tex b/Examples_icv.tex index 51b7b45..a8f1bdb 100644 --- a/Examples_icv.tex +++ b/Examples_icv.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{Internal Control Variables (ICVs)} -\label{chap:icv} +\section{Internal Control Variables (ICVs)} +\label{sec:icv} According to Section 2.3 of the OpenMP 4.0 specification, an OpenMP implementation must act as if there are ICVs that control the behavior of the program. This example illustrates two ICVs, \plc{nthreads-var} @@ -50,7 +50,7 @@ one of the threads in the team. Since we have a total of two inner \code{paralle regions, the print statement will be executed twice -- once per inner \code{parallel} region. -\cexample{icv}{1c} +\cexample{icv}{1} -\fexample{icv}{1f} +\fexample{icv}{1} diff --git a/Examples_init_lock.tex b/Examples_init_lock.tex index b686992..ba5324e 100644 --- a/Examples_init_lock.tex +++ b/Examples_init_lock.tex @@ -1,11 +1,10 @@ -\pagebreak -\chapter{The \code{omp\_init\_lock} Routine} -\label{chap:init_lock} +\subsection{The \code{omp\_init\_lock} Routine} +\label{subsec:init_lock} The following example demonstrates how to initialize an array of locks in a \code{parallel} region by using \code{omp\_init\_lock}. -\cexample{init_lock}{1c} +\cppexample{init_lock}{1} -\fexample{init_lock}{1f} +\fexample{init_lock}{1} diff --git a/Examples_init_lock_with_hint.tex b/Examples_init_lock_with_hint.tex new file mode 100644 index 0000000..8109eaa --- /dev/null +++ b/Examples_init_lock_with_hint.tex @@ -0,0 +1,10 @@ +%\pagebreak +\subsection{The \code{omp\_init\_lock\_with\_hint} Routine} +\label{subsec:init_lock_with_hint} + +The following example demonstrates how to initialize an array of locks in a \code{parallel} region by using \code{omp\_init\_lock\_with\_hint}. +Note, hints are combined with an \code{|} or \code{+} operator in C/C++ and a \code{+} operator in Fortran. + +\cppexample{init_lock_with_hint}{1} + +\fexample{init_lock_with_hint}{1} diff --git a/Examples_lastprivate.tex b/Examples_lastprivate.tex index b019fe1..f4e772e 100644 --- a/Examples_lastprivate.tex +++ b/Examples_lastprivate.tex @@ -1,14 +1,14 @@ \pagebreak -\chapter{The \code{lastprivate} Clause} -\label{chap:lastprivate} +\section{The \code{lastprivate} Clause} +\label{sec:lastprivate} Correct execution sometimes depends on the value that the last iteration of a loop assigns to a variable. Such programs must list all such variables in a \code{lastprivate} clause so that the values of the variables are the same as when the loop is executed sequentially. -\cexample{lastprivate}{1c} +\cexample{lastprivate}{1} -\fexample{lastprivate}{1f} +\fexample{lastprivate}{1} diff --git a/Examples_linear_in_loop.tex b/Examples_linear_in_loop.tex new file mode 100644 index 0000000..3ef1ad7 --- /dev/null +++ b/Examples_linear_in_loop.tex @@ -0,0 +1,13 @@ +\section{\code{linear} Clause in Loop Constructs} +\label{sec:linear_in_loop} + +The following example shows the use of the \code{linear} clause in a loop +construct to allow the proper parallelization of a loop that contains +an induction variable (\plc{j}). 
At the end of the execution of +the loop construct, the original variable \plc{j} is updated with +the value \plc{N/2} from the last iteration of the loop. + +\cexample{linear_in_loop}{1} + +\ffreeexample{linear_in_loop}{1} + diff --git a/Examples_lock_owner.tex b/Examples_lock_owner.tex index d6c9623..08a274c 100644 --- a/Examples_lock_owner.tex +++ b/Examples_lock_owner.tex @@ -1,6 +1,5 @@ -\pagebreak -\chapter{Ownership of Locks} -\label{chap:lock_owner} +\subsection{Ownership of Locks} +\label{subsec:lock_owner} Ownership of locks has changed since OpenMP 2.5. In OpenMP 2.5, locks are owned by threads; so a lock released by the \code{omp\_unset\_lock} routine must be @@ -16,8 +15,8 @@ the same). However, it is not conforming beginning with OpenMP 3.0, because the region that releases the lock \code{lck} is different from the task region that acquires the lock. -\cexample{lock_owner}{1c} +\cexample{lock_owner}{1} -\fexample{lock_owner}{1f} +\fexample{lock_owner}{1} diff --git a/Examples_locks.tex b/Examples_locks.tex new file mode 100644 index 0000000..a79b58f --- /dev/null +++ b/Examples_locks.tex @@ -0,0 +1,5 @@ +\pagebreak +\section{Lock Routines} +\label{sec:locks} + +This section is about the use of lock routines for synchronization. diff --git a/Examples_master.tex b/Examples_master.tex index 48de993..48e7548 100644 --- a/Examples_master.tex +++ b/Examples_master.tex @@ -1,13 +1,13 @@ \pagebreak -\chapter{The \code{master} Construct} -\label{chap:master} +\section{The \code{master} Construct} +\label{sec:master} The following example demonstrates the master construct . In the example, the master keeps track of how many iterations have been executed and prints out a progress report. The other threads skip the master region without waiting. -\cexample{master}{1c} +\cexample{master}{1} -\fexample{master}{1f} +\fexample{master}{1} diff --git a/Examples_mem_model.tex b/Examples_mem_model.tex index a7625dd..51ab56f 100644 --- a/Examples_mem_model.tex +++ b/Examples_mem_model.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{The OpenMP Memory Model} -\label{chap:mem_model} +\section{The OpenMP Memory Model} +\label{sec:mem_model} In the following example, at Print 1, the value of \plc{x} could be either 2 or 5, depending on the timing of the threads, and the implementation of the assignment @@ -14,25 +14,25 @@ The barrier after Print 1 contains implicit flushes on all threads, as well as a thread synchronization, so the programmer is guaranteed that the value 5 will be printed by both Print 2 and Print 3. -\cexample{mem_model}{1c} +\cexample{mem_model}{1} -\fexample{mem_model}{1f} +\ffreeexample{mem_model}{1} The following example demonstrates why synchronization is difficult to perform correctly through variables. The value of flag is undefined in both prints on thread 1 and the value of data is only well-defined in the second print. -\cexample{mem_model}{2c} +\cexample{mem_model}{2} -\fexample{mem_model}{2f} +\fexample{mem_model}{2} The next example demonstrates why synchronization is difficult to perform correctly through variables. Because the \plc{write}(1)-\plc{flush}(1)-\plc{flush}(2)-\plc{read}(2) sequence cannot be guaranteed in the example, the statements on thread 0 and thread 1 may execute in either order. 
-\cexample{mem_model}{3c} +\cexample{mem_model}{3} -\fexample{mem_model}{3f} +\fexample{mem_model}{3} diff --git a/Examples_nestable_lock.tex b/Examples_nestable_lock.tex index 89b55e7..991786f 100644 --- a/Examples_nestable_lock.tex +++ b/Examples_nestable_lock.tex @@ -1,11 +1,10 @@ -\pagebreak -\chapter{Nestable Lock Routines} -\label{chap:nestable_lock} +\subsection{Nestable Lock Routines} +\label{subsec:nestable_lock} The following example demonstrates how a nestable lock can be used to synchronize updates both to a whole structure and to one of its members. -\cexample{nestable_lock}{1c} +\cexample{nestable_lock}{1} -\fexample{nestable_lock}{1f} +\fexample{nestable_lock}{1} diff --git a/Examples_nested_loop.tex b/Examples_nested_loop.tex index fad0077..0131725 100644 --- a/Examples_nested_loop.tex +++ b/Examples_nested_loop.tex @@ -1,18 +1,18 @@ \pagebreak -\chapter{Nested Loop Constructs} -\label{chap:nested_loop} +\section{Nested Loop Constructs} +\label{sec:nested_loop} The following example of loop construct nesting is conforming because the inner and outer loop regions bind to different \code{parallel} regions: -\cexample{nested_loop}{1c} +\cexample{nested_loop}{1} -\fexample{nested_loop}{1f} +\fexample{nested_loop}{1} The following variation of the preceding example is also conforming: -\cexample{nested_loop}{2c} +\cexample{nested_loop}{2} -\fexample{nested_loop}{2f} +\fexample{nested_loop}{2} diff --git a/Examples_nesting_restrict.tex b/Examples_nesting_restrict.tex index 0eecdea..4502290 100644 --- a/Examples_nesting_restrict.tex +++ b/Examples_nesting_restrict.tex @@ -1,52 +1,52 @@ \pagebreak -\chapter{Restrictions on Nesting of Regions} -\label{chap:nesting_restrict} +\section{Restrictions on Nesting of Regions} +\label{sec:nesting_restrict} The examples in this section illustrate the region nesting rules. The following example is non-conforming because the inner and outer loop regions are closely nested: -\cexample{nesting_restrict}{1c} +\cexample{nesting_restrict}{1} -\fexample{nesting_restrict}{1f} +\fexample{nesting_restrict}{1} The following orphaned version of the preceding example is also non-conforming: -\cexample{nesting_restrict}{2c} +\cexample{nesting_restrict}{2} -\fexample{nesting_restrict}{2f} +\fexample{nesting_restrict}{2} The following example is non-conforming because the loop and \code{single} regions are closely nested: -\cexample{nesting_restrict}{3c} +\cexample{nesting_restrict}{3} -\fexample{nesting_restrict}{3f} +\fexample{nesting_restrict}{3} The following example is non-conforming because a \code{barrier} region cannot be closely nested inside a loop region: -\cexample{nesting_restrict}{4c} +\cexample{nesting_restrict}{4} -\fexample{nesting_restrict}{4f} +\fexample{nesting_restrict}{4} The following example is non-conforming because the \code{barrier} region cannot be closely nested inside the \code{critical} region. If this were permitted, it would result in deadlock due to the fact that only one thread at a time can enter the \code{critical} region: -\cexample{nesting_restrict}{5c} +\cexample{nesting_restrict}{5} -\fexample{nesting_restrict}{5f} +\fexample{nesting_restrict}{5} The following example is non-conforming because the \code{barrier} region cannot be closely nested inside the \code{single} region. 
If this were permitted, it would result in deadlock due to the fact that only one thread executes the \code{single} region: -\cexample{nesting_restrict}{6c} +\cexample{nesting_restrict}{6} -\fexample{nesting_restrict}{6f} +\fexample{nesting_restrict}{6} diff --git a/Examples_nowait.tex b/Examples_nowait.tex index ebdcffd..a71b19d 100644 --- a/Examples_nowait.tex +++ b/Examples_nowait.tex @@ -1,14 +1,14 @@ \pagebreak -\chapter{The \code{nowait} Clause} -\label{chap:nowait} +\section{The \code{nowait} Clause} +\label{sec:nowait} If there are multiple independent loops within a \code{parallel} region, you can use the \code{nowait} clause to avoid the implied barrier at the end of the loop construct, as follows: -\cexample{nowait}{1c} +\cexample{nowait}{1} -\fexample{nowait}{1f} +\fexample{nowait}{1} In the following example, static scheduling distributes the same logical iteration numbers to the threads that execute the three loop regions. This allows the \code{nowait} @@ -22,7 +22,7 @@ to \code{n-1} (from \code{1} to \code{N} in the Fortran version), while the iteration space of the last loop is from \code{1} to \code{n} (\code{2} to \code{N+1} in the Fortran version). -\cexample{nowait}{2c} +\cexample{nowait}{2} -\fexample{nowait}{2f} +\ffreeexample{nowait}{2} diff --git a/Examples_nthrs_dynamic.tex b/Examples_nthrs_dynamic.tex index 961a2f3..3ca3ba4 100644 --- a/Examples_nthrs_dynamic.tex +++ b/Examples_nthrs_dynamic.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{Interaction Between the \code{num\_threads} Clause and \code{omp\_set\_dynamic}} -\label{chap:nthrs_dynamic} +\section{Interaction Between the \code{num\_threads} Clause and \code{omp\_set\_dynamic}} +\label{sec:nthrs_dynamic} The following example demonstrates the \code{num\_threads} clause and the effect of the \\ @@ -12,17 +12,17 @@ of threads in OpenMP implementations that support it. In this case, 10 threads are provided. Note that in case of an error the OpenMP implementation is free to abort the program or to supply any number of threads available. -\cexample{nthrs_dynamic}{1c} +\cexample{nthrs_dynamic}{1} -\fexample{nthrs_dynamic}{1f} +\fexample{nthrs_dynamic}{1} The call to the \code{omp\_set\_dynamic} routine with a non-zero argument in C/C++, or \code{.TRUE.} in Fortran, allows the OpenMP implementation to choose any number of threads between 1 and 10. -\cexample{nthrs_dynamic}{2c} +\cexample{nthrs_dynamic}{2} -\fexample{nthrs_dynamic}{2f} +\fexample{nthrs_dynamic}{2} It is good practice to set the \plc{dyn-var} ICV explicitly by calling the \code{omp\_set\_dynamic} routine, as its default setting is implementation defined. 
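A short C sketch (not the \cexample{nthrs\_dynamic} sources) of the interaction between \code{omp\_set\_dynamic} and the \code{num\_threads} clause just described; the printed team sizes depend on the implementation.

#include <stdio.h>
#include <omp.h>

int main(void)
{
   omp_set_dynamic(0);   /* disable dynamic adjustment: request exactly 10 threads */
   #pragma omp parallel num_threads(10)
   {
      #pragma omp single
      printf("non-dynamic: %d threads\n", omp_get_num_threads());
   }

   omp_set_dynamic(1);   /* implementation may choose between 1 and 10 threads */
   #pragma omp parallel num_threads(10)
   {
      #pragma omp single
      printf("dynamic: %d threads\n", omp_get_num_threads());
   }
   return 0;
}
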
diff --git a/Examples_nthrs_nesting.tex b/Examples_nthrs_nesting.tex index c5a0334..11028cf 100644 --- a/Examples_nthrs_nesting.tex +++ b/Examples_nthrs_nesting.tex @@ -1,12 +1,12 @@ \pagebreak -\chapter{Controlling the Number of Threads on Multiple Nesting Levels} -\label{chap:nthrs_nesting} +\section{Controlling the Number of Threads on Multiple Nesting Levels} +\label{sec:nthrs_nesting} The following examples demonstrate how to use the \code{OMP\_NUM\_THREADS} environment variable to control the number of threads on multiple nesting levels: -\cexample{nthrs_nesting}{1c} +\cexample{nthrs_nesting}{1} -\fexample{nthrs_nesting}{1f} +\fexample{nthrs_nesting}{1} diff --git a/Examples_ordered.tex b/Examples_ordered.tex index d757cfb..256a462 100644 --- a/Examples_ordered.tex +++ b/Examples_ordered.tex @@ -1,28 +1,28 @@ \pagebreak -\chapter{The \code{ordered} Clause and the \code{ordered} Construct} -\label{chap:ordered} +\section{The \code{ordered} Clause and the \code{ordered} Construct} +\label{sec:ordered} Ordered constructs are useful for sequentially ordering the output from work that is done in parallel. The following program prints out the indices in sequential order: -\cexample{ordered}{1c} +\cexample{ordered}{1} -\fexample{ordered}{1f} +\fexample{ordered}{1} It is possible to have multiple \code{ordered} constructs within a loop region with the \code{ordered} clause specified. The first example is non-conforming because all iterations execute two \code{ordered} regions. An iteration of a loop must not execute more than one \code{ordered} region: -\cexample{ordered}{2c} +\cexample{ordered}{2} -\fexample{ordered}{2f} +\fexample{ordered}{2} The following is a conforming example with more than one \code{ordered} construct. Each iteration will execute only one \code{ordered} region: -\cexample{ordered}{3c} +\cexample{ordered}{3} -\fexample{ordered}{3f} +\fexample{ordered}{3} diff --git a/Examples_parallel.tex b/Examples_parallel.tex index 06b56db..142340c 100644 --- a/Examples_parallel.tex +++ b/Examples_parallel.tex @@ -1,12 +1,12 @@ \pagebreak -\chapter{The \code{parallel} Construct} -\label{chap:parallel} +\section{The \code{parallel} Construct} +\label{sec:parallel} The \code{parallel} construct can be used in coarse-grain parallel programs. In the following example, each thread in the \code{parallel} region decides what part of the global array \plc{x} to work on, based on the thread number: -\cexample{parallel}{1c} +\cexample{parallel}{1} -\fexample{parallel}{1f} +\fexample{parallel}{1} diff --git a/Examples_ploop.tex b/Examples_ploop.tex index 40e113c..6bd03d3 100644 --- a/Examples_ploop.tex +++ b/Examples_ploop.tex @@ -1,11 +1,12 @@ -\chapter{A Simple Parallel Loop} -\label{chap:ploop} +\pagebreak +\section{A Simple Parallel Loop} +\label{sec:ploop} The following example demonstrates how to parallelize a simple loop using the parallel loop construct. The loop iteration variable is private by default, so it is not necessary to specify it explicitly in a \code{private} clause. -\cexample{ploop}{1c} +\cexample{ploop}{1} -\fexample{ploop}{1f} +\fexample{ploop}{1} diff --git a/Examples_pra_iterator.tex b/Examples_pra_iterator.tex index 60e5e2b..8008c49 100644 --- a/Examples_pra_iterator.tex +++ b/Examples_pra_iterator.tex @@ -1,11 +1,11 @@ \pagebreak -\chapter{Parallel Random Access Iterator Loop} +\section{Parallel Random Access Iterator Loop} \cppspecificstart -\label{chap:pra_iterator} +\label{sec:pra_iterator} The following example shows a parallel random access iterator loop. 
-\cnexample{pra_iterator}{1c} +\cppnexample{pra_iterator}{1} \cppspecificend diff --git a/Examples_private.tex b/Examples_private.tex index d25d7fc..8a912cf 100644 --- a/Examples_private.tex +++ b/Examples_private.tex @@ -1,31 +1,31 @@ \pagebreak -\chapter{The \code{private} Clause} -\label{chap:private} +\section{The \code{private} Clause} +\label{sec:private} In the following example, the values of original list items \plc{i} and \plc{j} are retained on exit from the \code{parallel} region, while the private list items \plc{i} and \plc{j} are modified within the \code{parallel} construct. -\cexample{private}{1c} +\cexample{private}{1} -\fexample{private}{1f} +\fexample{private}{1} In the following example, all uses of the variable \plc{a} within the loop construct in the routine \plc{f} refer to a private list item \plc{a}, while it is unspecified whether references to \plc{a} in the routine \plc{g} are to a private list item or the original list item. -\cexample{private}{2c} +\cexample{private}{2} -\fexample{private}{2f} +\fexample{private}{2} The following example demonstrates that a list item that appears in a \code{private} clause in a \code{parallel} construct may also appear in a \code{private} clause in an enclosed worksharing construct, which results in an additional private copy. -\cexample{private}{3c} +\cexample{private}{3} -\fexample{private}{3f} +\fexample{private}{3} diff --git a/Examples_psections.tex b/Examples_psections.tex index 3f96e18..08094de 100644 --- a/Examples_psections.tex +++ b/Examples_psections.tex @@ -1,13 +1,13 @@ \pagebreak -\chapter{The \code{parallel} \code{sections} Construct} -\label{chap:psections} +\section{The \code{parallel} \code{sections} Construct} +\label{sec:psections} In the following example routines \code{XAXIS}, \code{YAXIS}, and \code{ZAXIS} can be executed concurrently. The first \code{section} directive is optional. Note that all \code{section} directives need to appear in the \code{parallel sections} construct. -\cexample{psections}{1c} +\cexample{psections}{1} -\fexample{psections}{1f} +\fexample{psections}{1} diff --git a/Examples_reduction.tex b/Examples_reduction.tex index ae582f7..80898ee 100644 --- a/Examples_reduction.tex +++ b/Examples_reduction.tex @@ -1,44 +1,44 @@ \pagebreak -\chapter{The \code{reduction} Clause} -\label{chap:reduction} +\section{The \code{reduction} Clause} +\label{sec:reduction} The following example demonstrates the \code{reduction} clause ; note that some reductions can be expressed in the loop in several ways, as shown for the \code{max} and \code{min} reductions below: -\cexample{reduction}{1c} +\cexample{reduction}{1} -\fexample{reduction}{1f} +\ffreeexample{reduction}{1} A common implementation of the preceding example is to treat it as if it had been written as follows: -\cexample{reduction}{2c} +\cexample{reduction}{2} \fortranspecificstart -\fnexample{reduction}{2f} +\ffreenexample{reduction}{2} The following program is non-conforming because the reduction is on the \emph{intrinsic procedure name} \code{MAX} but that name has been redefined to be the variable named \code{MAX}. + +\ffreenexample{reduction}{3} % blue line floater at top of this page for "Fortran, cont." \begin{figure}[t!] \linewitharrows{-1}{dashed}{Fortran (cont.)}{8em} \end{figure} -\fnexample{reduction}{3f} - The following conforming program performs the reduction using the \emph{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed to \code{REN}. 
-\fnexample{reduction}{4f} +\ffreenexample{reduction}{4} The following conforming program performs the reduction using \plc{intrinsic procedure name} \code{MAX} even though the intrinsic \code{MAX} has been renamed to \code{MIN}. -\fnexample{reduction}{5f} +\ffreenexample{reduction}{5} \fortranspecificend The following example is non-conforming because the initialization (\code{a = @@ -53,8 +53,13 @@ clause. This can be achieved by adding an explicit barrier after the assignment directive (which has an implied barrier), or by initializing \code{a} before the start of the \code{parallel} region. -\cexample{reduction}{3c} +\cexample{reduction}{6} -\fexample{reduction}{6f} +\fexample{reduction}{6} + +The following example demonstrates the reduction of array \plc{a}. In C/C++, this is illustrated by the explicit use of an array section \plc{a[0:N]} in the \code{reduction} clause. The corresponding Fortran example uses array syntax supported in the base language. As of the OpenMP 4.5 specification, the explicit use of array sections in the \code{reduction} clause is not permitted in Fortran; this restriction is expected to be removed in a future release of the specification. +\cexample{reduction}{7} + +\ffreeexample{reduction}{7} diff --git a/Examples_set_dynamic_nthrs.tex b/Examples_set_dynamic_nthrs.tex index 5938e39..e29f788 100644 --- a/Examples_set_dynamic_nthrs.tex +++ b/Examples_set_dynamic_nthrs.tex @@ -1,7 +1,7 @@ \pagebreak -\chapter{The \code{omp\_set\_dynamic} and \\ +\section{The \code{omp\_set\_dynamic} and \\ \code{omp\_set\_num\_threads} Routines} -\label{chap:set_dynamic_nthrs} +\label{sec:set_dynamic_nthrs} Some programs rely on a fixed, prespecified number of threads to execute correctly. Because the default setting for the dynamic adjustment of the number of threads @@ -17,8 +17,8 @@ dynamic threads setting. The dynamic threads mechanism determines the number of threads to use at the start of the \code{parallel} region and keeps it constant for the duration of the region. -\cexample{set_dynamic_nthrs}{1c} +\cexample{set_dynamic_nthrs}{1} -\fexample{set_dynamic_nthrs}{1f} +\fexample{set_dynamic_nthrs}{1} diff --git a/Examples_simple_lock.tex b/Examples_simple_lock.tex index da9b8ca..cb59294 100644 --- a/Examples_simple_lock.tex +++ b/Examples_simple_lock.tex @@ -1,6 +1,5 @@ -\pagebreak -\chapter{Simple Lock Routines} -\label{chap:simple_lock} +\subsection{Simple Lock Routines} +\label{subsec:simple_lock} In the following example, the lock routines cause the threads to be idle while waiting for entry to the first critical section, but to do other work while waiting @@ -10,10 +9,10 @@ function does not, allowing the work in \code{skip} to be done. Note that the argument to the lock routines should have type \code{omp\_lock\_t}, and that there is no need to flush it. -\cexample{simple_lock}{1c} +\cexample{simple_lock}{1} Note that there is no need to flush the lock variable. -\fexample{simple_lock}{1f} +\fexample{simple_lock}{1} diff --git a/Examples_single.tex b/Examples_single.tex index b2502f8..4cabfa5 100644 --- a/Examples_single.tex +++ b/Examples_single.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{The \code{single} Construct} -\label{chap:single} +\section{The \code{single} Construct} +\label{sec:single} The following example demonstrates the \code{single} construct. In the example, only one thread prints each of the progress messages. All other threads will skip @@ -11,8 +11,8 @@ a \code{nowait} clause can be specified, as is done in the third \code{single} construct in this example.
The user must not make any assumptions as to which thread will execute a \code{single} region. -\cexample{single}{1c} +\cexample{single}{1} -\fexample{single}{1f} +\fexample{single}{1} diff --git a/Examples_standalone.tex b/Examples_standalone.tex index acdd208..5e3c54e 100644 --- a/Examples_standalone.tex +++ b/Examples_standalone.tex @@ -1,31 +1,31 @@ \pagebreak -\chapter{Placement of \code{flush}, \code{barrier}, \code{taskwait} +\section{Placement of \code{flush}, \code{barrier}, \code{taskwait} and \code{taskyield} Directives} -\label{chap:standalone} +\label{sec:standalone} The following example is non-conforming, because the \code{flush}, \code{barrier}, \code{taskwait}, and \code{taskyield} directives are stand-alone directives and cannot be the immediate substatement of an \code{if} statement. -\cexample{standalone}{1c} +\cexample{standalone}{1} The following example is non-conforming, because the \code{flush}, \code{barrier}, \code{taskwait}, and \code{taskyield} directives are stand-alone directives and cannot be the action statement of an \code{if} statement or a labeled branch target. -\fexample{standalone}{1f} +\ffreeexample{standalone}{1} The following version of the above example is conforming because the \code{flush}, \code{barrier}, \code{taskwait}, and \code{taskyield} directives are enclosed in a compound statement. -\cexample{standalone}{2c} +\cexample{standalone}{2} The following example is conforming because the \code{flush}, \code{barrier}, \code{taskwait}, and \code{taskyield} directives are enclosed in an \code{if} construct or follow the labeled branch target. -\fexample{standalone}{2f} +\ffreeexample{standalone}{2} diff --git a/Examples_target.tex b/Examples_target.tex index d0a6742..50ec31e 100644 --- a/Examples_target.tex +++ b/Examples_target.tex @@ -1,29 +1,32 @@ \pagebreak -\chapter{\code{target} Construct} -\label{chap:target} +\section{\code{target} Construct} +\label{sec:target} -\section{\code{target} Construct on \code{parallel} Construct} +\subsection{\code{target} Construct on \code{parallel} Construct} +\label{subsec:target_parallel} This following example shows how the \code{target} construct offloads a code region to a target device. The variables \plc{p}, \plc{v1}, \plc{v2}, and \plc{N} are implicitly mapped to the target device. -\cexample{target}{1c} +\cexample{target}{1} -\fexample{target}{1f} +\ffreeexample{target}{1} -\section{\code{target} Construct with \code{map} Clause} +\subsection{\code{target} Construct with \code{map} Clause} +\label{subsec:target_map} This following example shows how the \code{target} construct offloads a code region to a target device. The variables \plc{p}, \plc{v1} and \plc{v2} are explicitly mapped to the target device using the \code{map} clause. The variable \plc{N} is implicitly mapped to the target device. -\cexample{target}{2c} +\cexample{target}{2} -\fexample{target}{2f} +\ffreeexample{target}{2} -\section{\code{map} Clause with \code{to}/\code{from} map-types} +\subsection{\code{map} Clause with \code{to}/\code{from} map-types} +\label{subsec:target_map_tofrom} The following example shows how the \code{target} construct offloads a code region to a target device. In the \code{map} clause, the \code{to} and \code{from} @@ -43,16 +46,17 @@ the variable \plc{p} is not initialized with the value of the corresponding vari on the host device, and at the end of the \code{target} region the variable \plc{p} is assigned to the corresponding variable on the host device. 
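For orientation while reading this diff without the external example sources, a minimal C sketch of the to/from mapping pattern just described is shown below (array names, sizes, and the initialization are illustrative assumptions; the published target.3 source may differ).

#include <stdio.h>
#include <stdlib.h>

void vec_mult(int N)
{
   float *p  = (float *)malloc(N * sizeof(float));
   float *v1 = (float *)malloc(N * sizeof(float));
   float *v2 = (float *)malloc(N * sizeof(float));
   for (int i = 0; i < N; i++) { v1[i] = i; v2[i] = 2.0f * i; }  /* host initialization */

   /* v1 and v2 are only read on the device (to); p is only written
      on the device and copied back to the host (from).            */
   #pragma omp target map(to: v1[0:N], v2[0:N]) map(from: p[0:N])
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
      p[i] = v1[i] * v2[i];

   printf("p[0] = %f, p[N-1] = %f\n", p[0], p[N-1]);
   free(p); free(v1); free(v2);
}

Compared with the default tofrom behavior of an implicit mapping, this transfers only the data each side actually needs, which is the optimization the surrounding text describes.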
-\cexample{target}{3c} +\cexample{target}{3} The \code{to} and \code{from} map-types allow programmers to optimize data motion. Since data for the \plc{v} arrays are not returned, and data for the \plc{p} array are not transferred to the device, only one-half of the data is moved, compared to the default behavior of an implicit mapping. -\fexample{target}{3f} +\ffreeexample{target}{3} -\section{\code{map} Clause with Array Sections} +\subsection{\code{map} Clause with Array Sections} +\label{subsec:target_array_section} The following example shows how the \code{target} construct offloads a code region to a target device. In the \code{map} clause, map-types are used to optimize @@ -60,14 +64,14 @@ the mapping of variables to the target device. Because variables \plc{p}, \plc{v pointers, array section notation must be used to map the arrays. The notation \code{:N} is equivalent to \code{0:N}. -\cexample{target}{4c} +\cexample{target}{4} In C, the length of the pointed-to array must be specified. In Fortran the extent of the array is known and the length need not be specified. A section of the array can be specified with the usual Fortran syntax, as shown in the following example. The value 1 is assumed for the lower bound for array section \plc{v2(:N)}. -\fexample{target}{4f} +\ffreeexample{target}{4} A more realistic situation in which an assumed-size array is passed to \code{vec\_mult} requires that the length of the arrays be specified, because the compiler does @@ -75,9 +79,10 @@ not know the size of the storage. A section of the array must be specified with the usual Fortran syntax, as shown in the following example. The value 1 is assumed for the lower bound for array section \plc{v2(:N)}. -\fexample{target}{4bf} +\ffreeexample{target}{4b} -\section{\code{target} Construct with \code{if} Clause} +\subsection{\code{target} Construct with \code{if} Clause} +\label{subsec:target_if} The following example shows how the \code{target} construct offloads a code region to a target device. @@ -90,7 +95,18 @@ The \code{if} clause on the \code{parallel} construct indicates that if the variable \plc{N} is smaller than a second threshold then the \code{parallel} region is inactive. -\cexample{target}{5c} +\cexample{target}{5} -\fexample{target}{5f} +\ffreeexample{target}{5} +The following example is a modification of the above \plc{target.5} code to show the combined \code{target} +and parallel loop directives. It uses the \plc{directive-name} modifier in multiple \code{if} +clauses to specify the component directive to which it applies. + +The \code{if} clause with the \code{target} modifier applies to the \code{target} component of the +combined directive, and the \code{if} clause with the \code{parallel} modifier applies +to the \code{parallel} component of the combined directive. + +\cexample{target}{6} + +\ffreeexample{target}{6} diff --git a/Examples_target_data.tex b/Examples_target_data.tex index 5626523..b0ec8ad 100644 --- a/Examples_target_data.tex +++ b/Examples_target_data.tex @@ -1,8 +1,9 @@ \pagebreak -\chapter{\code{target} \code{data} Construct} -\label{chap:target_data} +\section{\code{target} \code{data} Construct} +\label{sec:target_data} -\section{Simple \code{target} \code{data} Construct} +\subsection{Simple \code{target} \code{data} Construct} +\label{subsec:target_data_simple} This example shows how the \code{target} \code{data} construct maps variables to a device data environment. 
The \code{target} \code{data} construct creates @@ -13,15 +14,16 @@ variables \plc{v1}, \plc{v2}, and \plc{p} from the enclosing device data environ \plc{N} is mapped into the new device data environment from the encountering task's data environment. -\cexample{target_data}{1c} +\cexample{target_data}{1} The Fortran code passes a reference and specifies the extent of the arrays in the declaration. No length information is necessary in the map clause, as is required with C/C++ pointers. -\fexample{target_data}{1f} +\ffreeexample{target_data}{1} -\section{\code{target} \code{data} Region Enclosing Multiple \code{target} Regions} +\subsection{\code{target} \code{data} Region Enclosing Multiple \code{target} Regions} +\label{subsec:target_data_multiregion} The following examples show how the \code{target} \code{data} construct maps variables to a device data environment of a \code{target} region. The \code{target} @@ -36,7 +38,7 @@ In the following example the variables \plc{v1} and \plc{v2} are mapped at each construct. Instead of mapping the variable \plc{p} twice, once at each \code{target} construct, \plc{p} is mapped once by the \code{target} \code{data} construct. -\cexample{target_data}{2c} +\cexample{target_data}{2} The Fortran code uses reference and specifies the extent of the \plc{p}, \plc{v1} and \plc{v2} arrays. @@ -45,14 +47,14 @@ C/C++ pointers. The arrays \plc{v1} and \plc{v2} are mapped at each \code{target Instead of mapping the array \plc{p} twice, once at each target construct, \plc{p} is mapped once by the \code{target} \code{data} construct. -\fexample{target_data}{2f} +\ffreeexample{target_data}{2} In the following example, the variable tmp defaults to \code{tofrom} map-type and is mapped at each \code{target} construct. The array \plc{Q} is mapped once at the enclosing \code{target} \code{data} region instead of at each \code{target} construct. -\cexample{target_data}{3c} +\cexample{target_data}{3} In the following example the arrays \plc{v1} and \plc{v2} are mapped at each \code{target} construct. Instead of mapping the array \plc{Q} twice at each \code{target} construct, @@ -61,9 +63,9 @@ variable is implicitly remapped for each \code{target} region, mapping the value from the device to the host at the end of the first \code{target} region, and from the host to the device for the second \code{target} region. -\fexample{target_data}{3f} +\ffreeexample{target_data}{3} -\section{\code{target} \code{data} Construct with Orphaned Call} +\subsection{\code{target} \code{data} Construct with Orphaned Call} The following two examples show how the \code{target} \code{data} construct maps variables to a device data environment. The \code{target} \code{data} @@ -88,7 +90,7 @@ of the storage location associated with their corresponding array sections. Note that the following pairs of array section storage locations are equivalent (\plc{p0[:N]}, \plc{p1[:N]}), (\plc{v1[:N]},\plc{v3[:N]}), and (\plc{v2[:N]},\plc{v4[:N]}). -\cexample{target_data}{4c} +\cexample{target_data}{4} The Fortran code maps the pointers and storage in an identical manner (same extent, but uses indices from 1 to \plc{N}). @@ -104,7 +106,7 @@ assigned the address of the storage location associated with their corresponding array sections. Note that the following pair of array storage locations are equivalent (\plc{p0},\plc{p1}), (\plc{v1},\plc{v3}), and (\plc{v2},\plc{v4}). 
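The orphaned-call pattern described in this subsection can be sketched in C as follows; the names vec_mult, p0/p1, v1/v3, and v2/v4 are illustrative assumptions, and the published target_data.4 source may differ.

#include <stdlib.h>
#define N 1024

/* Called from inside a target data region: the map clauses below find the
   array sections already present in the device data environment and reuse them. */
void vec_mult(float *p1, float *v3, float *v4, int n)
{
   #pragma omp target map(p1[0:n], v3[0:n], v4[0:n])
   #pragma omp parallel for
   for (int i = 0; i < n; i++)
      p1[i] = v3[i] * v4[i];
}

int main(void)
{
   float *p0 = (float *)malloc(N * sizeof(float));
   float *v1 = (float *)malloc(N * sizeof(float));
   float *v2 = (float *)malloc(N * sizeof(float));
   for (int i = 0; i < N; i++) { v1[i] = i; v2[i] = 2.0f * i; }

   /* Map the storage once; the orphaned target region in vec_mult refers to
      the same storage through the equivalent sections (p0/p1, v1/v3, v2/v4). */
   #pragma omp target data map(to: v1[0:N], v2[0:N]) map(from: p0[0:N])
   {
      vec_mult(p0, v1, v2, N);
   }

   free(p0); free(v1); free(v2);
   return 0;
}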
-\fexample{target_data}{4f} +\ffreeexample{target_data}{4} In the following example, the variables \plc{p1}, \plc{v3}, and \plc{v4} are references to the pointer @@ -113,7 +115,7 @@ environment inherits the pointer variables \plc{p0}, \plc{v1}, and \plc{v2} from \code{data} construct's device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already present in the device data environment. -\cexample{target_data}{5c} +\cppexample{target_data}{5} In the following example, the usual Fortran approach is used for dynamic memory. The \plc{p0}, \plc{v1}, and \plc{v2} arrays are allocated in the main program and passed as references @@ -123,9 +125,10 @@ environment inherits the arrays \plc{p0}, \plc{v1}, and \plc{v2} from the enclos device data environment. Thus, \plc{p1}, \plc{v3}, and \plc{v4} are already present in the device data environment. -\fexample{target_data}{5f} +\ffreeexample{target_data}{5} -\section{\code{target} \code{data} Construct with \code{if} Clause} +\subsection{\code{target} \code{data} Construct with \code{if} Clause} +\label{subsec:target_data_if} The following two examples show how the \code{target} \code{data} construct maps variables to a device data environment. @@ -140,7 +143,7 @@ variable \plc{p} is implicitly mapped with a map-type of \code{tofrom}, but the location for the array section \plc{p[0:N]} will not be mapped in the device data environments of the \code{target} constructs. -\cexample{target_data}{6c} +\cexample{target_data}{6} The \code{if} clauses work the same way for the following Fortran code. The \code{target} constructs enclosed in the \code{target} \code{data} region should also use @@ -148,7 +151,7 @@ an \code{if} clause with the same condition, so that the \code{target} \code{dat region and the \code{target} region are either both created for the device, or are both ignored. -\fexample{target_data}{6f} +\ffreeexample{target_data}{6} In the following example, when the \code{if} clause conditional expression on the \code{target} construct evaluates to \plc{false}, the target region will @@ -159,7 +162,7 @@ region the array section \plc{p[0:N]} will be assigned from the device data envi to the corresponding variable in the data environment of the task that encountered the \code{target} \code{data} construct, resulting in undefined values in \plc{p[0:N]}. -\cexample{target_data}{7c} +\cexample{target_data}{7} The \code{if} clauses work the same way for the following Fortran code. When the \code{if} clause conditional expression on the \code{target} construct @@ -171,5 +174,5 @@ region the \plc{p} array will be assigned from the device data environment to th variable in the data environment of the task that encountered the \code{target} \code{data} construct, resulting in undefined values in \plc{p}. -\fexample{target_data}{7f} +\ffreeexample{target_data}{7} diff --git a/Examples_target_unstructured_data.tex b/Examples_target_unstructured_data.tex new file mode 100644 index 0000000..82605c2 --- /dev/null +++ b/Examples_target_unstructured_data.tex @@ -0,0 +1,47 @@ +%begin +\pagebreak +\section{\code{target} \code{enter} \code{data} and \code{target} \code{exit} \code{data} Constructs} +\label{sec:target_enter_exit_data} +%\section{Simple target enter data and target exit data Constructs} + +The structured data construct (\code{target}~\code{data}) provides persistent data on a +device for subsequent \code{target} constructs as shown in the +\code{target}~\code{data} examples above. 
This is accomplished by creating a single +\code{target}~\code{data} region containing \code{target} constructs. + +The unstructured data constructs allow the creation and deletion of data on +the device at any appropriate point within the host code, as shown below +with the \code{target}~\code{enter}~\code{data} and \code{target}~\code{exit}~\code{data} constructs. + +The following C++ code creates/deletes a vector in a constructor/destructor +of a class. The constructor creates a vector with \code{target}~\code{enter}~\code{data} +and uses an \code{alloc} modifier in the \code{map} clause to avoid copying values +to the device. The destructor deletes the data (\code{target}~\code{exit}~\code{data}) +and uses the \code{delete} modifier in the \code{map} clause to avoid copying data +back to the host. Note, the stand-alone \code{target}~\code{enter}~\code{data} occurs +after the host vector is created, and the \code{target}~\code{exit}~\code{data} +construct occurs before the host data is deleted. + +\cppexample{target_unstructured_data}{1} + +The following C code allocates and frees the data member of a Matrix structure. +The \code{init\_matrix} function allocates the memory used in the structure and +uses the \code{target}~\code{enter}~\code{data} directive to map it to the target device. The +\code{free\_matrix} function removes the mapped array from the target device +and then frees the memory on the host. Note, the stand-alone +\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the +\code{target}~\code{exit}~\code{data} construct occurs before the host data is freed. + +\cexample{target_unstructured_data}{1} + +The following Fortran code allocates and deallocates a module array. The +\code{initialize} subroutine allocates the module array and uses the +\code{target}~\code{enter}~\code{data} directive to map it to the target device. The +\code{finalize} subroutine removes the mapped array from the target device and +then deallocates the array on the host. Note, the stand-alone +\code{target}~\code{enter}~\code{data} occurs after the host memory is allocated, and the +\code{target}~\code{exit}~\code{data} construct occurs before the host data is deallocated. + +\ffreeexample{target_unstructured_data}{1} +%end + diff --git a/Examples_target_update.tex b/Examples_target_update.tex index a945d82..e2f4ced 100644 --- a/Examples_target_update.tex +++ b/Examples_target_update.tex @@ -1,8 +1,9 @@ \pagebreak -\chapter{\code{target} \code{update} Construct} -\label{chap:target_update} +\section{\code{target} \code{update} Construct} +\label{sec:target_update} -\section{Simple \code{target} \code{data} and \code{target} \code{update} Constructs} +\subsection{Simple \code{target} \code{data} and \code{target} \code{update} Constructs} +\label{subsec:target_data_and_update} The following example shows how the \code{target} \code{update} construct updates variables in a device data environment. @@ -26,11 +27,12 @@ region and waits for the completion of the region. The second \code{target} region uses the updated values of \plc{v1[:N]} and \plc{v2[:N]}. -\cexample{target_update}{1c} +\cexample{target_update}{1} -\fexample{target_update}{1f} +\ffreeexample{target_update}{1} -\section{\code{target} \code{update} Construct with \code{if} Clause} +\subsection{\code{target} \code{update} Construct with \code{if} Clause} +\label{subsec:target_update_if} The following example shows how the \code{target} \code{update} construct updates variables in a device data environment. 
@@ -47,7 +49,7 @@ assigns the new values of \plc{v1} and \plc{v2} from the task's data environment to their corresponding mapped array sections in the \code{target} \code{data} construct's device data environment. -\cexample{target_update}{2c} +\cexample{target_update}{2} -\fexample{target_update}{2f} +\ffreeexample{target_update}{2} diff --git a/Examples_task_dep.tex b/Examples_task_dep.tex index 98089af..ea3f542 100644 --- a/Examples_task_dep.tex +++ b/Examples_task_dep.tex @@ -1,58 +1,62 @@ \pagebreak -\chapter{Task Dependences} -\label{chap:task_dep} +\section{Task Dependences} +\label{sec:task_depend} -\section{Flow Dependence} +\subsection{Flow Dependence} +\label{subsec:task_flow_depend} In this example we show a simple flow dependence expressed using the \code{depend} clause on the \code{task} construct. -\cexample{task_dep}{1c} +\cexample{task_dep}{1} -\fexample{task_dep}{1f} +\ffreeexample{task_dep}{1} The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend} clauses enforce the ordering of the tasks. If the \code{depend} clauses had been omitted, then the tasks could execute in any order and the program would have a race condition. -\section{Anti-dependence} +\subsection{Anti-dependence} +\label{subsec:task_anti_depend} In this example we show an anti-dependence expressed using the \code{depend} clause on the \code{task} construct. -\cexample{task_dep}{2c} +\cexample{task_dep}{2} -\fexample{task_dep}{2f} +\ffreeexample{task_dep}{2} The program will always print \texttt{"}x = 1\texttt{"}, because the \code{depend} clauses enforce the ordering of the tasks. If the \code{depend} clauses had been omitted, then the tasks could execute in any order and the program would have a race condition. -\section{Output Dependence} +\subsection{Output Dependence} +\label{subsec:task_out_depend} In this example we show an output dependence expressed using the \code{depend} clause on the \code{task} construct. -\cexample{task_dep}{3c} +\cexample{task_dep}{3} -\fexample{task_dep}{3f} +\ffreeexample{task_dep}{3} The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend} clauses enforce the ordering of the tasks. If the \code{depend} clauses had been omitted, then the tasks could execute in any order and the program would have a race condition. -\section{Concurrent Execution with Dependences} +\subsection{Concurrent Execution with Dependences} +\label{subsec:task_concurrent_depend} In this example we show potentially concurrent execution of tasks using multiple flow dependences expressed using the \code{depend} clause on the \code{task} construct. -\cexample{task_dep}{4c} +\cexample{task_dep}{4} -\fexample{task_dep}{4f} +\ffreeexample{task_dep}{4} The last two tasks are dependent on the first task. However, there is no dependence between the last two tasks, which may execute in any order (or concurrently if @@ -61,12 +65,13 @@ more than one thread is available). Thus, the possible outputs are \texttt{"}x If the \code{depend} clauses had been omitted, then all of the tasks could execute in any order and the program would have a race condition. -\section{Matrix multiplication} +\subsection{Matrix multiplication} +\label{subsec:task_matrix_mult} This example shows a task-based blocked matrix multiplication. Matrices are of NxN elements, and the multiplication is implemented using blocks of BSxBS elements.
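A compact C sketch of this task-based blocked multiplication is given below; the matrix size, block size, and the enclosing parallel/single region are illustrative assumptions, and the published task_dep.5 source may differ in detail.

#define NMAT 256
#define BS    64   /* assume BS divides NMAT evenly */

void matmul_blocked(float A[NMAT][NMAT], float B[NMAT][NMAT], float C[NMAT][NMAT])
{
   #pragma omp parallel
   #pragma omp single   /* one thread generates the tasks */
   for (int i = 0; i < NMAT; i += BS)
      for (int j = 0; j < NMAT; j += BS)
         for (int k = 0; k < NMAT; k += BS) {
            /* Each task reads an A block and a B block and updates one C block;
               the dependences let independent block updates run concurrently.  */
            #pragma omp task depend(in: A[i:BS][k:BS], B[k:BS][j:BS]) depend(inout: C[i:BS][j:BS])
            for (int ii = i; ii < i + BS; ii++)
               for (int jj = j; jj < j + BS; jj++)
                  for (int kk = k; kk < k + BS; kk++)
                     C[ii][jj] += A[ii][kk] * B[kk][jj];
         }
}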
-\cexample{task_dep}{5c} +\cexample{task_dep}{5} -\fexample{task_dep}{5f} +\ffreeexample{task_dep}{5} diff --git a/Examples_task_priority.tex b/Examples_task_priority.tex new file mode 100644 index 0000000..734efc2 --- /dev/null +++ b/Examples_task_priority.tex @@ -0,0 +1,22 @@ +\pagebreak +\section{Task Priority} +\label{sec:task_priority} + + + +%\subsection{Task Priority} +%\label{subsec:task_priority} + +In this example we compute arrays in a matrix through a \plc{compute\_array} routine. +Each task has a priority value equal to the value of the loop variable \plc{i} at the +moment of its creation. A higher priority on a task means that a task is a candidate +to run sooner. + +The creation of tasks occurs in ascending order (according to the iteration space of +the loop) but a hint, by means of the \code{priority} clause, is provided to reverse +the execution order. + +\cexample{task_priority}{1} + +\ffreeexample{task_priority}{1} + diff --git a/Examples_taskgroup.tex b/Examples_taskgroup.tex index 7913bfb..b098af7 100644 --- a/Examples_taskgroup.tex +++ b/Examples_taskgroup.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{The \code{taskgroup} Construct} -\label{chap:taskgroup} +\section{The \code{taskgroup} Construct} +\label{sec:taskgroup} In this example, tasks are grouped and synchronized using the \code{taskgroup} construct. @@ -14,7 +14,7 @@ does not participate in the synchronization, and is left free to execute in para This is opposed to the behaviour of the \code{taskwait} construct, which would include the background tasks in the synchronization. -\cexample{taskgroup}{1c} +\cexample{taskgroup}{1} -\fexample{taskgroup}{1f} +\ffreeexample{taskgroup}{1} diff --git a/Examples_tasking.tex b/Examples_tasking.tex index 0432493..891be2f 100644 --- a/Examples_tasking.tex +++ b/Examples_tasking.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{The \code{task} and \code{taskwait} Constructs} -\label{chap:tasking} +\section{The \code{task} and \code{taskwait} Constructs} +\label{sec:task_taskwait} The following example shows how to traverse a tree-like structure using explicit tasks. Note that the \code{traverse} function should be called from within a @@ -9,17 +9,17 @@ note that the tasks will be executed in no specified order because there are no synchronization directives. Thus, assuming that the traversal will be done in post order, as in the sequential code, is wrong. -\cexample{tasking}{1c} +\cexample{tasking}{1} -\fexample{tasking}{1f} +\ffreeexample{tasking}{1} In the next example, we force a postorder traversal of the tree by adding a \code{taskwait} directive. Now, we can safely assume that the left and right sons have been executed before we process the current node. -\cexample{tasking}{2c} +\cexample{tasking}{2} -\fexample{tasking}{2f} +\ffreeexample{tasking}{2} The following example demonstrates how to use the \code{task} construct to process elements of a linked list in parallel. The thread executing the \code{single} @@ -28,18 +28,18 @@ in the current team. The pointer \plc{p} is \code{firstprivate} by default on the \code{task} construct so it is not necessary to specify it in a \code{firstprivate} clause. -\cexample{tasking}{3c} +\cexample{tasking}{3} -\fexample{tasking}{3f} +\ffreeexample{tasking}{3} The \code{fib()} function should be called from within a \code{parallel} region for the different specified tasks to be executed in parallel. Also, only one thread of the \code{parallel} region should call \code{fib()} unless multiple concurrent Fibonacci computations are desired. 
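The recursive task pattern the text describes can be sketched in C as follows (a sketch only; the published tasking.4 source may differ, and there are of course faster ways to compute Fibonacci numbers).

#include <stdio.h>

/* Each call creates two child tasks and waits for them; shared(i) and
   shared(j) make the children's results visible after the taskwait.   */
int fib(int n)
{
   int i, j;
   if (n < 2) return n;
   #pragma omp task shared(i) firstprivate(n)
   i = fib(n - 1);
   #pragma omp task shared(j) firstprivate(n)
   j = fib(n - 2);
   #pragma omp taskwait
   return i + j;
}

int main(void)
{
   int n = 30;
   #pragma omp parallel
   #pragma omp single          /* only one thread starts the recursion */
   printf("fib(%d) = %d\n", n, fib(n));
   return 0;
}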
-\cexample{tasking}{4c} +\cexample{tasking}{4} -\fexample{tasking}{4f} +\fexample{tasking}{4} Note: There are more efficient algorithms for computing Fibonacci numbers. This classic recursion algorithm is for illustrative purposes. @@ -52,9 +52,9 @@ loop to suspend its task at the task scheduling point in the \code{task} directi and start executing unassigned tasks. Once the number of unassigned tasks is sufficiently low, the thread may resume execution of the task generating loop. -\cexample{tasking}{5c} +\cexample{tasking}{5} \pagebreak -\fexample{tasking}{5f} +\fexample{tasking}{5} The following example is the same as the previous one, except that the tasks are generated in an untied task. While generating the tasks, the implementation may @@ -69,9 +69,9 @@ to resume the task generating loop. In the previous examples, the other threads would be forced to idle until the generating thread finishes its long task, since the task generating loop was in a tied task. -\cexample{tasking}{6c} +\cexample{tasking}{6} -\fexample{tasking}{6f} +\fexample{tasking}{6} The following two examples demonstrate how the scheduling rules illustrated in Section 2.11.3 of the OpenMP 4.0 specification affect the usage of @@ -86,20 +86,20 @@ both of the task regions that modify \code{tp}. The parts of these task regions in which \code{tp} is modified may be executed in any order so the resulting value of \code{var} can be either 1 or 2. -\cexample{tasking}{7c} +\cexample{tasking}{7} -\fexample{tasking}{7f} +\fexample{tasking}{7} In this example, scheduling constraints prohibit a thread in the team from executing a new task that modifies \code{tp} while another such task region tied to the same thread is suspended. Therefore, the value written will persist across the task scheduling point. -\cexample{tasking}{8c} +\cexample{tasking}{8} -\fexample{tasking}{8f} +\fexample{tasking}{8} The following two examples demonstrate how the scheduling rules illustrated in Section 2.11.3 of the OpenMP 4.0 specification affect the usage of locks @@ -112,20 +112,20 @@ it encounters the task scheduling point at task 3, it could suspend task 1 and begin task 2 which will result in a deadlock when it tries to enter critical region 1. -\cexample{tasking}{9c} +\cexample{tasking}{9} -\fexample{tasking}{9f} +\fexample{tasking}{9} In the following example, \code{lock} is held across a task scheduling point. However, according to the scheduling restrictions, the executing thread can't begin executing one of the non-descendant tasks that also acquires \code{lock} before the task region is complete. Therefore, no deadlock is possible. -\cexample{tasking}{10c} +\cexample{tasking}{10} -\fexample{tasking}{10f} +\ffreeexample{tasking}{10} The following examples illustrate the use of the \code{mergeable} clause in the \code{task} construct. In this first example, the \code{task} construct has @@ -139,9 +139,9 @@ outcome does not depend on whether or not the task is merged (that is, the task will always increment the same variable and will always compute the same value for \code{x}). -\cexample{tasking}{11c} +\cexample{tasking}{11} -\fexample{tasking}{11f} +\ffreeexample{tasking}{11} This second example shows an incorrect use of the \code{mergeable} clause. In this example, the created task will access different instances of the variable @@ -150,9 +150,9 @@ it will access the same variable \code{x} if the task is merged. 
As a result, the behavior of the program is unspecified and it can print two different values for \code{x} depending on the decisions taken by the implementation. -\cexample{tasking}{12c} +\cexample{tasking}{12} -\fexample{tasking}{12f} +\ffreeexample{tasking}{12} The following example shows the use of the \code{final} clause and the \code{omp\_in\_final} API call in a recursive binary search program. To reduce overhead, once a certain @@ -170,9 +170,9 @@ in the stack could also be avoided but it would make this example less clear. Th clause since all tasks created in a \code{final} task region are included tasks that can be merged if the \code{mergeable} clause is present. -\cexample{tasking}{13c} +\cexample{tasking}{13} -\fexample{tasking}{13f} +\ffreeexample{tasking}{13} The following example illustrates the difference between the \code{if} and the \code{final} clauses. The \code{if} clause has a local effect. In the first @@ -184,7 +184,7 @@ task itself. In the second nest of tasks, the nested tasks will be created as in tasks. Note also that the conditions for the \code{if} and \code{final} clauses are usually the opposite. -\cexample{tasking}{14c} +\cexample{tasking}{14} -\fexample{tasking}{14f} +\ffreeexample{tasking}{14} diff --git a/Examples_taskloop.tex b/Examples_taskloop.tex new file mode 100644 index 0000000..04d7aeb --- /dev/null +++ b/Examples_taskloop.tex @@ -0,0 +1,14 @@ +\pagebreak +\section{The \code{taskloop} Construct} +\label{sec:taskloop} + +The following example illustrates how to execute a long running task concurrently with tasks created +with a \code{taskloop} directive for a loop having unbalanced amounts of work for its iterations. + +The \code{grainsize} clause specifies that each task is to execute at least 500 iterations of the loop. + +The \code{nogroup} clause removes the implicit taskgroup of the \code{taskloop} construct; the explicit \code{taskgroup} construct in the example ensures that the function is not exited before the long-running task and the loops have finished execution. + +\cexample{taskloop}{1} + +\ffreeexample{taskloop}{1} diff --git a/Examples_taskyield.tex b/Examples_taskyield.tex index 88a8dfd..4a8eefa 100644 --- a/Examples_taskyield.tex +++ b/Examples_taskyield.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{The \code{taskyield} Construct} -\label{chap:taskyield} +\section{The \code{taskyield} Construct} +\label{sec:taskyield} The following example illustrates the use of the \code{taskyield} directive. The tasks in the example compute something useful and then do some computation @@ -8,7 +8,7 @@ that must be done in a critical region. By using \code{taskyield} when a task cannot get access to the \code{critical} region the implementation can suspend the current task and schedule some other task that can do something useful. 
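A short C sketch of this yielding pattern follows; it uses a lock tested in a loop (rather than a critical construct) so the task can yield while the lock is unavailable. The helper routines and the calling context are illustrative assumptions; the published taskyield.1 source may differ.

#include <omp.h>

static void something_useful(void)   { /* independent work (placeholder) */ }
static void something_critical(void) { /* work needing mutual exclusion (placeholder) */ }

/* Intended to be called from within a parallel/single region,
   with *lock already initialized by omp_init_lock.            */
void process(omp_lock_t *lock, int n)
{
   for (int i = 0; i < n; i++) {
      #pragma omp task
      {
         something_useful();
         while (!omp_test_lock(lock)) {
            #pragma omp taskyield   /* let the thread run another task meanwhile */
         }
         something_critical();
         omp_unset_lock(lock);
      }
   }
}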
-\cexample{taskyield}{1c} +\cexample{taskyield}{1} -\fexample{taskyield}{1f} +\ffreeexample{taskyield}{1} diff --git a/Examples_teams.tex b/Examples_teams.tex index a680d09..2a48344 100644 --- a/Examples_teams.tex +++ b/Examples_teams.tex @@ -1,9 +1,10 @@ \pagebreak -\chapter{\code{teams} Constructs} -\label{chap:teams} +\section{\code{teams} Constructs} +\label{sec:teams} -\section{\code{target} and \code{teams} Constructs with \code{omp\_get\_num\_teams}\\ +\subsection{\code{target} and \code{teams} Constructs with \code{omp\_get\_num\_teams}\\ and \code{omp\_get\_team\_num} Routines} +\label{subsec:teams_api} The following example shows how the \code{target} and \code{teams} constructs are used to create a league of thread teams that execute a region. The \code{teams} @@ -15,11 +16,12 @@ region. The \code{omp\_get\_team\_num} routine returns the team number, which is between 0 and one less than the value returned by \code{omp\_get\_num\_teams}. The following example manually distributes a loop across two teams. -\cexample{teams}{1c} +\cexample{teams}{1} -\fexample{teams}{1f} +\ffreeexample{teams}{1} -\section{\code{target}, \code{teams}, and \code{distribute} Constructs} +\subsection{\code{target}, \code{teams}, and \code{distribute} Constructs} +\label{subsec:teams_distribute} The following example shows how the \code{target}, \code{teams}, and \code{distribute} constructs are used to execute a loop nest in a \code{target} region. The \code{teams} @@ -45,11 +47,12 @@ created by the \code{teams} construct. At the end of the \code{teams} region, each master thread's private copy of \plc{sum} is reduced into the final \plc{sum} that is implicitly mapped into the \code{target} region. -\cexample{teams}{2c} +\cexample{teams}{2} -\fexample{teams}{2f} +\ffreeexample{teams}{2} -\section{\code{target} \code{teams}, and Distribute Parallel Loop Constructs} +\subsection{\code{target} \code{teams}, and Distribute Parallel Loop Constructs} +\label{subsec:teams_distribute_parallel} The following example shows how the \code{target} \code{teams} and distribute parallel loop constructs are used to execute a \code{target} region. The \code{target} @@ -59,12 +62,13 @@ team executes the \code{teams} region. The distribute parallel loop construct schedules the loop iterations across the master threads of each team and then across the threads of each team. -\cexample{teams}{3c} +\cexample{teams}{3} -\fexample{teams}{3f} +\ffreeexample{teams}{3} -\section{\code{target} \code{teams} and Distribute Parallel Loop +\subsection{\code{target} \code{teams} and Distribute Parallel Loop Constructs with Scheduling Clauses} +\label{subsec:teams_distribute_parallel_schedule} The following example shows how the \code{target} \code{teams} and distribute parallel loop constructs are used to execute a \code{target} region. The \code{teams} @@ -83,11 +87,12 @@ The \code{schedule} clause indicates that the 1024 iterations distributed to a master thread are then assigned to the threads in its associated team in chunks of 64 iterations. -\cexample{teams}{4c} +\cexample{teams}{4} -\fexample{teams}{4f} +\ffreeexample{teams}{4} -\section{\code{target} \code{teams} and \code{distribute} \code{simd} Constructs} +\subsection{\code{target} \code{teams} and \code{distribute} \code{simd} Constructs} +\label{subsec:teams_distribute_simd} The following example shows how the \code{target} \code{teams} and \code{distribute} \code{simd} constructs are used to execute a loop in a \code{target} region. 
@@ -97,11 +102,12 @@ master thread of each team executes the \code{teams} region. The \code{distribute} \code{simd} construct schedules the loop iterations across the master thread of each team and then uses SIMD parallelism to execute the iterations. -\cexample{teams}{5c} +\cexample{teams}{5} -\fexample{teams}{5f} +\ffreeexample{teams}{5} -\section{\code{target} \code{teams} and Distribute Parallel Loop SIMD Constructs} +\subsection{\code{target} \code{teams} and Distribute Parallel Loop SIMD Constructs} +\label{subsec:teams_distribute_parallel_simd} The following example shows how the \code{target} \code{teams} and the distribute parallel loop SIMD constructs are used to execute a loop in a \code{target} \code{teams} @@ -112,7 +118,7 @@ The distribute parallel loop SIMD construct schedules the loop iterations across the master thread of each team and then across the threads of each team where each thread uses SIMD parallelism. -\cexample{teams}{6c} +\cexample{teams}{6} -\fexample{teams}{6f} +\ffreeexample{teams}{6} diff --git a/Examples_threadprivate.tex b/Examples_threadprivate.tex index e507d10..21c5157 100644 --- a/Examples_threadprivate.tex +++ b/Examples_threadprivate.tex @@ -1,18 +1,18 @@ \pagebreak -\chapter{The \code{threadprivate} Directive} -\label{chap:threadprivate} +\section{The \code{threadprivate} Directive} +\label{sec:threadprivate} The following examples demonstrate how to use the \code{threadprivate} directive to give each thread a separate counter. -\cexample{threadprivate}{1c} +\cexample{threadprivate}{1} -\fexample{threadprivate}{1f} +\fexample{threadprivate}{1} \ccppspecificstart The following example uses \code{threadprivate} on a static variable: -\cnexample{threadprivate}{2c} +\cnexample{threadprivate}{2} The following example demonstrates unspecified behavior for the initialization of a \code{threadprivate} variable. A \code{threadprivate} variable is initialized @@ -22,7 +22,7 @@ constructed using the value of \code{x} (which is modified by the statement region could be either 1 or 2. This problem is avoided for \code{b}, which uses an auxiliary \code{const} variable and a copy-constructor. -\cnexample{threadprivate}{3c} +\cppnexample{threadprivate}{3} \ccppspecificend The following examples show non-conforming uses and correct uses of the \code{threadprivate} @@ -32,29 +32,25 @@ directive. The following example is non-conforming because the common block is not declared local to the subroutine that refers to it: -\fnexample{threadprivate}{2f} +\fnexample{threadprivate}{2} The following example is also non-conforming because the common block is not declared local to the subroutine that refers to it: -\fnexample{threadprivate}{3f} +\fnexample{threadprivate}{3} The following example is a correct rewrite of the previous example: -% blue line floater at top of this page for "Fortran, cont." -\begin{figure}[t!] -\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em} -\end{figure} -\fnexample{threadprivate}{4f} +\fnexample{threadprivate}{4} The following is an example of the use of \code{threadprivate} for local variables: - -\fnexample{threadprivate}{5f} % blue line floater at top of this page for "Fortran, cont." \begin{figure}[t!] 
\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em} \end{figure} +\fnexample{threadprivate}{5} + The above program, if executed by two threads, will print one of the following two sets of output: @@ -85,8 +81,12 @@ or \code{i = 5} The following is an example of the use of \code{threadprivate} for module variables: +% blue line floater at top of this page for "Fortran, cont." +\begin{figure}[t!] +\linewitharrows{-1}{dashed}{Fortran (cont.)}{8em} +\end{figure} -\fnexample{threadprivate}{6f} +\fnexample{threadprivate}{6} \fortranspecificend \cppspecificstart @@ -95,12 +95,12 @@ for class-type \code{T}. \code{t1} is default constructed, \code{t2} is construc taking a constructor accepting one argument of integer type, \code{t3} is copy constructed with argument \code{f()}: -\cnexample{threadprivate}{4c} +\cppnexample{threadprivate}{4} The following example illustrates the use of \code{threadprivate} for static class members. The \code{threadprivate} directive for a static class member must be placed inside the class definition. -\cnexample{threadprivate}{5c} +\cppnexample{threadprivate}{5} \cppspecificend diff --git a/Examples_workshare.tex b/Examples_workshare.tex index 0870f3d..ea1207f 100644 --- a/Examples_workshare.tex +++ b/Examples_workshare.tex @@ -1,7 +1,7 @@ \pagebreak -\chapter{The \code{workshare} Construct} +\section{The \code{workshare} Construct} \fortranspecificstart -\label{chap:workshare} +\label{sec:workshare} The following are examples of the \code{workshare} construct. @@ -10,14 +10,14 @@ the \code{parallel} region, and there is a barrier after the last statement. Implementations must enforce Fortran execution rules inside of the \code{workshare} block. -\fnexample{workshare}{1f} +\fnexample{workshare}{1} In the following example, the barrier at the end of the first \code{workshare} region is eliminated with a \code{nowait} clause. Threads doing \code{CC = DD} immediately begin work on \code{EE = FF} when they are done with \code{CC = DD}. -\fnexample{workshare}{2f} +\fnexample{workshare}{2} % blue line floater at top of this page for "Fortran, cont." \begin{figure}[t!] \linewitharrows{-1}{dashed}{Fortran (cont.)}{8em} @@ -27,7 +27,7 @@ The following example shows the use of an \code{atomic} directive inside a \code construct. The computation of \code{SUM(AA)} is workshared, but the update to \code{R} is atomic. -\fnexample{workshare}{3f} +\fnexample{workshare}{3} Fortran \code{WHERE} and \code{FORALL} statements are \emph{compound statements}, made up of a \emph{control} part and a \emph{statement} part. When \code{workshare} @@ -47,7 +47,7 @@ Each task gets worked on in order by the threads: \\ \code{GG = HH} -\fnexample{workshare}{4f} +\fnexample{workshare}{4} % blue line floater at top of this page for "Fortran, cont." \begin{figure}[t!] \linewitharrows{-1}{dashed}{Fortran (cont.)}{8em} @@ -56,21 +56,21 @@ Each task gets worked on in order by the threads: In the following example, an assignment to a shared scalar variable is performed by one thread in a \code{workshare} while all other threads in the team wait. -\fnexample{workshare}{5f} +\fnexample{workshare}{5} The following example contains an assignment to a private scalar variable, which is performed by one thread in a \code{workshare} while all other threads wait. It is non-conforming because the private scalar variable is undefined after the assignment statement. -\fnexample{workshare}{6f} +\fnexample{workshare}{6} Fortran execution rules must be enforced inside a \code{workshare} construct. 
In the following example, the same result is produced in the following program fragment regardless of whether the code is executed sequentially or inside an OpenMP program with multiple threads: -\fnexample{workshare}{7f} +\fnexample{workshare}{7} \fortranspecificend diff --git a/Examples_worksharing_critical.tex b/Examples_worksharing_critical.tex index c3e785d..7f7a7fb 100644 --- a/Examples_worksharing_critical.tex +++ b/Examples_worksharing_critical.tex @@ -1,6 +1,6 @@ \pagebreak -\chapter{Worksharing Constructs Inside a \code{critical} Construct} -\label{chap:worksharing_critical} +\section{Worksharing Constructs Inside a \code{critical} Construct} +\label{sec:worksharing_critical} The following example demonstrates using a worksharing construct inside a \code{critical} construct. This example is conforming because the worksharing \code{single} @@ -11,8 +11,8 @@ region, creates a new team of threads, and becomes the master of the new team. One of the threads in the new team enters the \code{single} region and increments \code{i} by \code{1}. At the end of this example \code{i} is equal to \code{2}. -\cexample{worksharing_critical}{1c} +\cexample{worksharing_critical}{1} -\fexample{worksharing_critical}{1f} +\fexample{worksharing_critical}{1} diff --git a/History.tex b/History.tex index 4e577f7..477a551 100644 --- a/History.tex +++ b/History.tex @@ -1,11 +1,39 @@ \chapter{Document Revision History} \label{chap:history} +\section{Changes from 4.0.2 to 4.5.0} +\begin{itemize} +\item Reorganized into chapters of major topics +\item Included file extensions in example labels to indicate source type +\item Applied the explicit \code{map(tofrom)} for scalar variables +in a number of examples to comply with +the change of the default behavior for scalar variables from +\code{map(tofrom)} to \code{firstprivate} in the 4.5 specification +\item Added the following new examples: +\begin{itemize} +\item \code{linear} clause in loop constructs (\specref{sec:linear_in_loop}) +\item task priority (\specref{sec:task_priority}) +\item \code{taskloop} construct (\specref{sec:taskloop}) +\item \plc{directive-name} modifier in multiple \code{if} clauses on +a combined construct (\specref{subsec:target_if}) +\item unstructured data mapping (\specref{sec:target_enter_exit_data}) +\item \code{link} clause for \code{declare}~\code{target} directive +(\specref{subsec:declare_target_link}) +\item asynchronous target execution with \code{nowait} clause (\specref{sec:async_target_exec_depend}) +\item device memory routines and device pointers +(\specref{subsec:target_mem_and_device_ptrs}) +\item doacross loop nest (\specref{sec:doacross}) +\item locks with hints (\specref{sec:locks}) +\item C/C++ array reduction (\specref{sec:reduction}) +\item C++ reference types in data sharing clauses (\specref{sec:cpp_reference}) +\end{itemize} +\end{itemize} + \section{Changes from 4.0.1 to 4.0.2} \begin{itemize} \item Names of examples were changed from numbers to mnemonics -\item Added SIMD examples (\specref{chap:SIMD}) +\item Added SIMD examples (\specref{sec:SIMD}) \item Applied miscellaneous fixes in several source codes \item Added the revision history \end{itemize} @@ -14,8 +42,8 @@ Added the following new examples: \begin{itemize} -\item the \code{proc\_bind} clause (\specref{chap:affinity}) -\item the \code{taskgroup} construct (\specref{chap:taskgroup}) +\item the \code{proc\_bind} clause (\specref{sec:affinity}) +\item the \code{taskgroup} construct (\specref{sec:taskgroup}) \end{itemize} \section{Changes from 3.1 
to 4.0} @@ -25,16 +53,16 @@ from the specification document. Version 4.0 added the following new examples: \begin{itemize} -\item task dependences (\specref{chap:task_dep}) -\item cancellation constructs (\specref{chap:cancellation}) -\item \code{target} construct (\specref{chap:target}) -\item \code{target} \code{data} construct (\specref{chap:target_data}) -\item \code{target} \code{update} construct (\specref{chap:target_update}) -\item \code{declare} \code{target} construct (\specref{chap:declare_target}) -\item \code{teams} constructs (\specref{chap:teams}) +\item task dependences (\specref{sec:task_depend}) +\item \code{target} construct (\specref{sec:target}) +\item \code{target} \code{data} construct (\specref{sec:target_data}) +\item \code{target} \code{update} construct (\specref{sec:target_update}) +\item \code{declare} \code{target} construct (\specref{sec:declare_target}) +\item \code{teams} constructs (\specref{sec:teams}) \item asynchronous execution of a \code{target} region using tasks - (\specref{chap:async_target}) -\item array sections in device constructs (\specref{chap:array_sections}) -\item device runtime routines (\specref{chap:device}) -\item Fortran ASSOCIATE construct (\specref{chap:associate}) + (\specref{subsec:async_target_with_tasks}) +\item array sections in device constructs (\specref{sec:array_sections}) +\item device runtime routines (\specref{sec:device}) +\item Fortran ASSOCIATE construct (\specref{sec:associate}) +\item cancellation constructs (\specref{sec:cancellation}) \end{itemize} diff --git a/Introduction_Chapt.tex b/Introduction_Chapt.tex index 5f1a7b7..2b493df 100644 --- a/Introduction_Chapt.tex +++ b/Introduction_Chapt.tex @@ -34,13 +34,14 @@ \chapter*{Introduction} \label{chap:introduction} +\addcontentsline{toc}{chapter}{\protect\numberline{}Introduction} This collection of programming examples supplements the OpenMP API for Shared Memory Parallelization specifications, and is not part of the formal specifications. It assumes familiarity with the OpenMP specifications, and shares the typographical conventions used in that document. \notestart -\noteheader – This first release of the OpenMP Examples reflects the OpenMP Version 4.0 +\noteheader – This first release of the OpenMP Examples reflects the OpenMP Version 4.5 specifications. Additional examples are being developed and will be published in future releases of this document. \noteend diff --git a/Makefile b/Makefile index 8babd70..3257577 100644 --- a/Makefile +++ b/Makefile @@ -1,75 +1,20 @@ # Makefile for the OpenMP Examples document in LaTex format. # For more information, see the master document, openmp-examples.tex. 
-version=4.0.2 +version=4.5.0 default: openmp-examples.pdf CHAPTERS=Title_Page.tex \ Introduction_Chapt.tex \ - Examples_Chapt.tex \ - Examples_ploop.tex \ - Examples_mem_model.tex \ - Examples_cond_comp.tex \ - Examples_icv.tex \ - Examples_parallel.tex \ - Examples_nthrs_nesting.tex \ - Examples_nthrs_dynamic.tex \ - Examples_affinity.tex \ - Examples_fort_do.tex \ - Examples_fort_loopvar.tex \ - Examples_nowait.tex \ - Examples_collapse.tex \ - Examples_psections.tex \ - Examples_fpriv_sections.tex \ - Examples_single.tex \ - Examples_tasking.tex \ - Examples_task_dep.tex \ - Examples_taskgroup.tex \ - Examples_taskyield.tex \ - Examples_workshare.tex \ - Examples_master.tex \ - Examples_critical.tex \ - Examples_worksharing_critical.tex \ - Examples_barrier_regions.tex \ - Examples_atomic.tex \ - Examples_atomic_restrict.tex \ - Examples_flush_nolist.tex \ - Examples_standalone.tex \ - Examples_ordered.tex \ - Examples_cancellation.tex \ - Examples_threadprivate.tex \ - Examples_pra_iterator.tex \ - Examples_fort_sp_common.tex \ - Examples_default_none.tex \ - Examples_fort_race.tex \ - Examples_private.tex \ - Examples_fort_sa_private.tex \ - Examples_carrays_fpriv.tex \ - Examples_lastprivate.tex \ - Examples_reduction.tex \ - Examples_copyin.tex \ - Examples_copyprivate.tex \ - Examples_nested_loop.tex \ - Examples_nesting_restrict.tex \ - Examples_set_dynamic_nthrs.tex \ - Examples_get_nthrs.tex \ - Examples_init_lock.tex \ - Examples_lock_owner.tex \ - Examples_simple_lock.tex \ - Examples_nestable_lock.tex \ - Examples_SIMD.tex \ - Examples_target.tex \ - Examples_target_data.tex \ - Examples_target_update.tex \ - Examples_declare_target.tex \ - Examples_teams.tex \ - Examples_async_target.tex \ - Examples_array_sections.tex \ - Examples_device.tex \ - Examples_associate.tex \ + Examples_*.tex \ History.tex +SOURCES=sources/*.c \ + sources/*.cpp \ + sources/*.f90 \ + sources/*.f + INTERMEDIATE_FILES=openmp-examples.pdf \ openmp-examples.toc \ openmp-examples.idx \ @@ -79,7 +24,7 @@ INTERMEDIATE_FILES=openmp-examples.pdf \ openmp-examples.out \ openmp-examples.log -openmp-examples.pdf: $(CHAPTERS) openmp.sty openmp-examples.tex openmp-logo.png +openmp-examples.pdf: $(CHAPTERS) $(SOURCES) openmp.sty openmp-examples.tex openmp-logo.png rm -f $(INTERMEDIATE_FILES) pdflatex -interaction=batchmode -file-line-error openmp-examples.tex pdflatex -interaction=batchmode -file-line-error openmp-examples.tex diff --git a/Title_Page.tex b/Title_Page.tex index 215002e..4040e22 100644 --- a/Title_Page.tex +++ b/Title_Page.tex @@ -27,7 +27,7 @@ Source codes for OpenMP \VER{} Examples can be downloaded from \href{https://github.com/OpenMP/Examples/tree/v\VER}{github}.\\ \begin{adjustwidth}{0pt}{1em}\setlength{\parskip}{0.25\baselineskip}% -Copyright © 1997-2015 OpenMP Architecture Review Board.\\ +Copyright © 1997-2016 OpenMP Architecture Review Board.\\ Permission to copy without fee all or part of this material is granted, provided the OpenMP Architecture Review Board copyright notice and the title of this document appear. Notice is given that copying is by diff --git a/omp_copyright.txt b/omp_copyright.txt index 0a4a7de..c81aa8f 100644 --- a/omp_copyright.txt +++ b/omp_copyright.txt @@ -1,4 +1,4 @@ -Copyright (c) 1997-2015 OpenMP Architecture Review Board. +Copyright (c) 1997-2016 OpenMP Architecture Review Board. All rights reserved. 
Permission to redistribute and use without fee all or part of the source diff --git a/openmp-examples.tcp b/openmp-examples.tcp new file mode 100644 index 0000000..cfc56ad --- /dev/null +++ b/openmp-examples.tcp @@ -0,0 +1,11 @@ +[FormatInfo] +Type=TeXnicCenterProjectInformation +Version=4 + +[ProjectInfo] +MainFile=ClassicThesis.tex +UseBibTeX=1 +UseMakeIndex=0 +ActiveProfile=LaTeX ⇨ PDF +ProjectLanguage=en +ProjectDialect=US diff --git a/openmp-examples.tex b/openmp-examples.tex index 6204ec1..7cf79cf 100644 --- a/openmp-examples.tex +++ b/openmp-examples.tex @@ -48,8 +48,8 @@ \documentclass[10pt,letterpaper,twoside,makeidx,hidelinks]{scrreprt} % Text to appear in the footer on even-numbered pages: -\newcommand{\VER}{4.0.2} -\newcommand{\VERDATE}{March 2015} +\newcommand{\VER}{4.5.0} +\newcommand{\VERDATE}{November 2016} \newcommand{\footerText}{OpenMP Examples Version \VER{} - \VERDATE} % Unified style sheet for OpenMP documents: @@ -77,71 +77,120 @@ \setcounter{chapter}{0} % start chapter numbering here - \input{Examples_ploop} - \input{Examples_mem_model} - \input{Examples_cond_comp} - \input{Examples_icv} - \input{Examples_parallel} - \input{Examples_nthrs_nesting} - \input{Examples_nthrs_dynamic} - \input{Examples_affinity} - \input{Examples_fort_do} - \input{Examples_fort_loopvar} - \input{Examples_nowait} - \input{Examples_collapse} - \input{Examples_psections} - \input{Examples_fpriv_sections} - \input{Examples_single} - \input{Examples_tasking} - \input{Examples_task_dep} - \input{Examples_taskgroup} - \input{Examples_taskyield} - \input{Examples_workshare} - \input{Examples_master} - \input{Examples_critical} - \input{Examples_worksharing_critical} - \input{Examples_barrier_regions} - \input{Examples_atomic} - \input{Examples_atomic_restrict} - \input{Examples_flush_nolist} - \input{Examples_standalone} - \input{Examples_ordered} - \input{Examples_cancellation} - \input{Examples_threadprivate} - \input{Examples_pra_iterator} - \input{Examples_fort_sp_common} - \input{Examples_default_none} - \input{Examples_fort_race} - \input{Examples_private} - \input{Examples_fort_sa_private} - \input{Examples_carrays_fpriv} - \input{Examples_lastprivate} - \input{Examples_reduction} - \input{Examples_copyin} - \input{Examples_copyprivate} - \input{Examples_nested_loop} - \input{Examples_nesting_restrict} - \input{Examples_set_dynamic_nthrs} - \input{Examples_get_nthrs} - \input{Examples_init_lock} - \input{Examples_lock_owner} - \input{Examples_simple_lock} - \input{Examples_nestable_lock} - \input{Examples_SIMD} - \input{Examples_target} - \input{Examples_target_data} - \input{Examples_target_update} - \input{Examples_declare_target} - \input{Examples_teams} - \input{Examples_async_target} - \input{Examples_array_sections} - \input{Examples_device} - \input{Examples_associate} + \input{Chap_parallel_execution} + \input{Examples_ploop} + \input{Examples_parallel} + \input{Examples_nthrs_nesting} + \input{Examples_nthrs_dynamic} + \input{Examples_fort_do} + \input{Examples_nowait} + \input{Examples_collapse} + % linear Clause 475 + \input{Examples_linear_in_loop} + \input{Examples_psections} + \input{Examples_fpriv_sections} + \input{Examples_single} + \input{Examples_workshare} + \input{Examples_master} + \input{Examples_pra_iterator} + \input{Examples_set_dynamic_nthrs} + \input{Examples_get_nthrs} + + \input{Chap_affinity} + \input{Examples_affinity} + \input{Examples_affinity_query} + + \input{Chap_tasking} + \input{Examples_tasking} + \input{Examples_task_priority} + 
\input{Examples_task_dep} + \input{Examples_taskgroup} + \input{Examples_taskyield} + \input{Examples_taskloop} + + \input{Chap_devices} + \input{Examples_target} + \input{Examples_target_data} + \input{Examples_target_unstructured_data} + \input{Examples_target_update} + \input{Examples_declare_target} + % Link clause 474 + \input{Examples_teams} + \input{Examples_async_target_depend} + \input{Examples_async_target_with_tasks} + %Title change of 57.1 and 57.2 + %New subsection + \input{Examples_async_target_nowait} + \input{Examples_async_target_nowait_depend} + \input{Examples_array_sections} + % Structure Element in map 487 + \input{Examples_device} + % MemoryRoutine and Device ptr 473 + + \input{Chap_SIMD} + \input{Examples_SIMD} + % Forward Depend 370 + % simdlen 476 + % simd linear modifier 480 + + \input{Chap_synchronization} + \input{Examples_critical} + \input{Examples_worksharing_critical} + \input{Examples_barrier_regions} + \input{Examples_atomic} + \input{Examples_atomic_restrict} + \input{Examples_flush_nolist} + \input{Examples_ordered} + % Doacross loop 405 + \input{Examples_doacross} + \input{Examples_locks} + \input{Examples_init_lock} + \input{Examples_init_lock_with_hint} + \input{Examples_lock_owner} + \input{Examples_simple_lock} + \input{Examples_nestable_lock} + % % LOCK with Hints 478 + % % Hint Clause xxxxxx (included after init_lock) + % % Lock routines with hint + + + \input{Chap_data_environment} + \input{Examples_threadprivate} + \input{Examples_default_none} + \input{Examples_private} + \input{Examples_fort_loopvar} + \input{Examples_fort_sp_common} + \input{Examples_fort_sa_private} + \input{Examples_carrays_fpriv} + \input{Examples_lastprivate} + \input{Examples_reduction} + % User UDR 287 + % C array reduction 377 + \input{Examples_copyin} + \input{Examples_copyprivate} + \input{Examples_cpp_reference} + % Fortran 2003 features 482 + \input{Examples_associate} %section--> subsection + + \input{Chap_memory_model} + \input{Examples_mem_model} + \input{Examples_fort_race} + + \input{Chap_program_control} + \input{Examples_cond_comp} + \input{Examples_icv} + % If multi-ifs 471 + \input{Examples_standalone} + \input{Examples_cancellation} + % New Section Nested Regions + \input{Examples_nested_loop} + \input{Examples_nesting_restrict} + \setcounter{chapter}{0} % restart chapter numbering with "letter A" \renewcommand{\thechapter}{\Alph{chapter}}% \appendix - \input{History} + \end{document} diff --git a/openmp.sty b/openmp.sty index 29327be..e21f365 100644 --- a/openmp.sty +++ b/openmp.sty @@ -78,6 +78,7 @@ \usepackage{comment} % allow use of \begin{comment} \usepackage{ifpdf,ifthen} % allow conditional tests in LaTeX definitions +\usepackage{makecell} % Allows common formatting in cells with \thread & \makecell %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -416,8 +417,10 @@ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Code example formatting for the Examples document % This defines: -% /cexample formats blue markers, caption, and code for C/C++ examples -% /fexample formats blue markers, caption, and code for Fortran examples +% /cexample formats blue markers, caption, and code for C examples +% /cppexample formats blue markers, caption, and code for C++ examples +% /fexample formats blue markers, caption, and code for Fortran (fixed) examples +% /ffreeexample formats blue markers, caption, and code for Fortran90 (free) examples % Thanks to Jin, Haoqiang H. 
for the original definitions of the following: \usepackage{color,fancyvrb} % for \VerbatimInput @@ -434,36 +437,40 @@ \newcommand{\escstr}[1]{\myreplace{_}{\_}{#1}} -\def\exampleheader#1#2{% +\def\exampleheader#1#2#3#4{% \ifthenelse{ \equal{#1}{} }{ \def\cname{#2} \def\ename\cname }{ - \def\cname{#1.#2} + \def\cname{#1.#2.#3} % Use following line for old numbering -% \def\ename{\thechapter.#2} +% \def\ename{\thechapter.#2.#3} % Use following for mneumonics - \def\ename{\escstr{#1}.#2} + \def\ename{\escstr{#1}.#2.#3} } \noindent \textit{Example \ename} %\vspace*{-3mm} + \code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=#4,fontsize=\small]% + %\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=#4,fontsize=\small]% + %\code{\VerbatimInput[firstline=#4,fontsize=\small]% + {sources/Example_\cname}} } \def\cnexample#1#2{% - \exampleheader{#1}{#2} - \code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=8,fontsize=\small]% - %\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=8,fontsize=\small]% - %\code{\VerbatimInput[firstline=8,fontsize=\small]% - {sources/Example_\cname.c}} + \exampleheader{#1}{#2}{c}{8} +} + +\def\cppnexample#1#2{% + \exampleheader{#1}{#2}{cpp}{8} } \def\fnexample#1#2{% - \exampleheader{#1}{#2} - \code{\VerbatimInput[numbers=left,numbersep=5ex,firstnumber=1,firstline=6,fontsize=\small]% - %\code{\VerbatimInput[numbers=left,firstnumber=1,firstline=6,fontsize=\small]% - %\code{\VerbatimInput[firstline=6,fontsize=\small]% - {sources/Example_\cname.f}} + \exampleheader{#1}{#2}{f}{6} +} + +\def\ffreenexample#1#2{% + \exampleheader{#1}{#2}{f90}{6} } \newcommand\cexample[2]{% @@ -474,7 +481,7 @@ \newcommand\cppexample[2]{% \needspace{5\baselineskip}\cppspecificstart -\cnexample{#1}{#2} +\cppnexample{#1}{#2} \cppspecificend } @@ -484,6 +491,12 @@ \fortranspecificend } +\newcommand\ffreeexample[2]{% +\needspace{5\baselineskip}\fortranspecificstart +\ffreenexample{#1}{#2} +\fortranspecificend +} + % Set default fonts: \rmfamily\mdseries\upshape\normalsize diff --git a/sources/Example_SIMD.1c.c b/sources/Example_SIMD.1.c similarity index 100% rename from sources/Example_SIMD.1c.c rename to sources/Example_SIMD.1.c diff --git a/sources/Example_SIMD.1f.f b/sources/Example_SIMD.1.f90 similarity index 100% rename from sources/Example_SIMD.1f.f rename to sources/Example_SIMD.1.f90 diff --git a/sources/Example_SIMD.2c.c b/sources/Example_SIMD.2.c similarity index 100% rename from sources/Example_SIMD.2c.c rename to sources/Example_SIMD.2.c diff --git a/sources/Example_SIMD.2f.f b/sources/Example_SIMD.2.f90 similarity index 100% rename from sources/Example_SIMD.2f.f rename to sources/Example_SIMD.2.f90 diff --git a/sources/Example_SIMD.3c.c b/sources/Example_SIMD.3.c similarity index 100% rename from sources/Example_SIMD.3c.c rename to sources/Example_SIMD.3.c diff --git a/sources/Example_SIMD.3f.f b/sources/Example_SIMD.3.f90 similarity index 100% rename from sources/Example_SIMD.3f.f rename to sources/Example_SIMD.3.f90 diff --git a/sources/Example_SIMD.4c.c b/sources/Example_SIMD.4.c similarity index 100% rename from sources/Example_SIMD.4c.c rename to sources/Example_SIMD.4.c diff --git a/sources/Example_SIMD.4f.f b/sources/Example_SIMD.4.f90 similarity index 100% rename from sources/Example_SIMD.4f.f rename to sources/Example_SIMD.4.f90 diff --git a/sources/Example_SIMD.5c.c b/sources/Example_SIMD.5.c similarity index 100% rename from sources/Example_SIMD.5c.c rename to sources/Example_SIMD.5.c diff --git a/sources/Example_SIMD.5f.f 
b/sources/Example_SIMD.5.f90 similarity index 100% rename from sources/Example_SIMD.5f.f rename to sources/Example_SIMD.5.f90 diff --git a/sources/Example_SIMD.6c.c b/sources/Example_SIMD.6.c similarity index 100% rename from sources/Example_SIMD.6c.c rename to sources/Example_SIMD.6.c diff --git a/sources/Example_SIMD.6f.f b/sources/Example_SIMD.6.f90 similarity index 88% rename from sources/Example_SIMD.6f.f rename to sources/Example_SIMD.6.f90 index 5cc5340..4a4cd05 100644 --- a/sources/Example_SIMD.6f.f +++ b/sources/Example_SIMD.6.f90 @@ -11,7 +11,7 @@ function foo(p) result(r) r = p end function foo -function myaddint(int *a, int *b, int n) result(r) +function myaddint(a, b, n) result(r) implicit none integer :: a(*), b(*), n, r integer :: i @@ -19,7 +19,7 @@ function myaddint(int *a, int *b, int n) result(r) !$omp simd do i=1, n - a(i) = foo(b[i]) ! foo is not called under a condition + a(i) = foo(b(i)) ! foo is not called under a condition end do r = a(n) diff --git a/sources/Example_SIMD.7c.c b/sources/Example_SIMD.7.c similarity index 100% rename from sources/Example_SIMD.7c.c rename to sources/Example_SIMD.7.c diff --git a/sources/Example_SIMD.7f.f b/sources/Example_SIMD.7.f90 similarity index 100% rename from sources/Example_SIMD.7f.f rename to sources/Example_SIMD.7.f90 diff --git a/sources/Example_SIMD.8c.c b/sources/Example_SIMD.8.c similarity index 90% rename from sources/Example_SIMD.8c.c rename to sources/Example_SIMD.8.c index ce00d68..7b93930 100644 --- a/sources/Example_SIMD.8c.c +++ b/sources/Example_SIMD.8.c @@ -14,8 +14,9 @@ float A[1000]; float do_work(float *arr) { float pri; + int i; #pragma omp simd lastprivate(pri) - for (int i = 0; i < 999; ++i) { + for (i = 0; i < 999; ++i) { int j = P[i]; pri = 0.5f; @@ -31,8 +32,9 @@ float do_work(float *arr) int main(void) { float pri, arr[1000]; + int i; - for (int i = 0; i < 1000; ++i) { + for (i = 0; i < 1000; ++i) { P[i] = i; A[i] = i * 1.5f; arr[i] = i * 1.8f; diff --git a/sources/Example_SIMD.8f.f b/sources/Example_SIMD.8.f90 similarity index 100% rename from sources/Example_SIMD.8f.f rename to sources/Example_SIMD.8.f90 diff --git a/sources/Example_affinity.1c.c b/sources/Example_affinity.1.c similarity index 100% rename from sources/Example_affinity.1c.c rename to sources/Example_affinity.1.c diff --git a/sources/Example_affinity.1f.f b/sources/Example_affinity.1.f similarity index 100% rename from sources/Example_affinity.1f.f rename to sources/Example_affinity.1.f diff --git a/sources/Example_affinity.2c.c b/sources/Example_affinity.2.c similarity index 100% rename from sources/Example_affinity.2c.c rename to sources/Example_affinity.2.c diff --git a/sources/Example_affinity.2f.f b/sources/Example_affinity.2.f90 similarity index 100% rename from sources/Example_affinity.2f.f rename to sources/Example_affinity.2.f90 diff --git a/sources/Example_affinity.3c.c b/sources/Example_affinity.3.c similarity index 100% rename from sources/Example_affinity.3c.c rename to sources/Example_affinity.3.c diff --git a/sources/Example_affinity.3f.f b/sources/Example_affinity.3.f similarity index 100% rename from sources/Example_affinity.3f.f rename to sources/Example_affinity.3.f diff --git a/sources/Example_affinity.4c.c b/sources/Example_affinity.4.c similarity index 100% rename from sources/Example_affinity.4c.c rename to sources/Example_affinity.4.c diff --git a/sources/Example_affinity.4f.f b/sources/Example_affinity.4.f90 similarity index 100% rename from sources/Example_affinity.4f.f rename to sources/Example_affinity.4.f90 
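Note: the corrected Example_SIMD.6.f90 above drops the C-style parameter list and uses Fortran array indexing (b(i)); its point is that foo is never called under a condition inside the simd loop. For orientation only, the same pattern in C -- a simd loop calling a declare simd function marked notinbranch because it is called unconditionally -- could be sketched as follows (names and values are illustrative and not copied from the shipped Example_SIMD.6.c):

    #pragma omp declare simd notinbranch
    int foo(int p)
    {
       return p + 10;
    }

    int myaddint(int *a, int *b, int n)
    {
       #pragma omp simd
       for (int i = 0; i < n; i++)
          a[i] = foo(b[i]);   /* foo is not called under a condition */
       return a[n-1];
    }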
diff --git a/sources/Example_affinity.5c.c b/sources/Example_affinity.5.c similarity index 100% rename from sources/Example_affinity.5c.c rename to sources/Example_affinity.5.c diff --git a/sources/Example_affinity.5f.f b/sources/Example_affinity.5.f similarity index 100% rename from sources/Example_affinity.5f.f rename to sources/Example_affinity.5.f diff --git a/sources/Example_affinity.6.c b/sources/Example_affinity.6.c new file mode 100644 index 0000000..4ea9c95 --- /dev/null +++ b/sources/Example_affinity.6.c @@ -0,0 +1,38 @@ +/* +* @@name: affinity.6c +* @@type: C +* @@compilable: yes +* @@linkable: no +* @@expect: success +*/ + +#include +#include + +void socket_init(int socket_num) +{ + int n_procs; + + n_procs = omp_get_place_num_procs(socket_num); + #pragma omp parallel num_threads(n_procs) proc_bind(close) + { + printf("Reporting in from socket num, thread num: %d %d\n", + socket_num,omp_get_thread_num() ); + } +} + +int main() +{ + int n_sockets, socket_num; + + omp_set_nested(1); // or export OMP_NESTED=true + omp_set_max_active_levels(2); // or export OMP_MAX_ACTIVE_LEVELS=2 + + n_sockets = omp_get_num_places(); + #pragma omp parallel num_threads(n_sockets) private(socket_num) \ + proc_bind(spread) + { + socket_num = omp_get_place_num(); + socket_init(socket_num); + } +} diff --git a/sources/Example_affinity.6.f90 b/sources/Example_affinity.6.f90 new file mode 100644 index 0000000..b4cab83 --- /dev/null +++ b/sources/Example_affinity.6.f90 @@ -0,0 +1,34 @@ +! @@name: affinity.6f +! @@type: F-free +! @@compilable: yes +! @@linkable: no +! @@expect: success + +subroutine socket_init(socket_num) + use omp_lib + integer :: socket_num, n_procs + + n_procs = omp_get_place_num_procs(socket_num) + !$omp parallel num_threads(n_procs) proc_bind(close) + + print*,"Reporting in from socket num, thread num: ", & + socket_num,omp_get_thread_num() + !$omp end parallel +end subroutine + +program numa_teams + use omp_lib + integer :: n_sockets, socket_num + + call omp_set_nested(.true.) ! or export OMP_NESTED=true + call omp_set_max_active_levels(2) ! or export OMP_MAX_ACTIVE_LEVELS=2 + + n_sockets = omp_get_num_places() + !$omp parallel num_threads(n_sockets) private(socket_num) & + !$omp& proc_bind(spread) + + socket_num = omp_get_place_num() + call socket_init(socket_num) + + !$omp end parallel +end program diff --git a/sources/Example_array_sections.1c.c b/sources/Example_array_sections.1.c similarity index 100% rename from sources/Example_array_sections.1c.c rename to sources/Example_array_sections.1.c diff --git a/sources/Example_array_sections.1f.f b/sources/Example_array_sections.1.f90 similarity index 92% rename from sources/Example_array_sections.1f.f rename to sources/Example_array_sections.1.f90 index 161cf88..49758b6 100644 --- a/sources/Example_array_sections.1f.f +++ b/sources/Example_array_sections.1.f90 @@ -10,6 +10,6 @@ integer :: A(30) ! 
Cannot map distinct parts of the same array !$omp target map( A(8:27) ) A(3) = 0 - !$omp end target map + !$omp end target !$omp end target data end subroutine diff --git a/sources/Example_array_sections.2c.c b/sources/Example_array_sections.2.c similarity index 100% rename from sources/Example_array_sections.2c.c rename to sources/Example_array_sections.2.c diff --git a/sources/Example_array_sections.2f.f b/sources/Example_array_sections.2.f90 similarity index 100% rename from sources/Example_array_sections.2f.f rename to sources/Example_array_sections.2.f90 diff --git a/sources/Example_array_sections.3c.c b/sources/Example_array_sections.3.c similarity index 100% rename from sources/Example_array_sections.3c.c rename to sources/Example_array_sections.3.c diff --git a/sources/Example_array_sections.3f.f b/sources/Example_array_sections.3.f90 similarity index 92% rename from sources/Example_array_sections.3f.f rename to sources/Example_array_sections.3.f90 index ac48140..c20d7dd 100644 --- a/sources/Example_array_sections.3f.f +++ b/sources/Example_array_sections.3.f90 @@ -11,6 +11,6 @@ integer,pointer :: p(:) !$omp target map( p(8:27) ) A(3) = 0 p(9) = 0 - !$omp end target map + !$omp end target !$omp end target data end subroutine diff --git a/sources/Example_array_sections.4c.c b/sources/Example_array_sections.4.c similarity index 100% rename from sources/Example_array_sections.4c.c rename to sources/Example_array_sections.4.c diff --git a/sources/Example_array_sections.4f.f b/sources/Example_array_sections.4.f90 similarity index 100% rename from sources/Example_array_sections.4f.f rename to sources/Example_array_sections.4.f90 diff --git a/sources/Example_associate.1f.f b/sources/Example_associate.1.f similarity index 100% rename from sources/Example_associate.1f.f rename to sources/Example_associate.1.f diff --git a/sources/Example_associate.2f.f b/sources/Example_associate.2.f similarity index 100% rename from sources/Example_associate.2f.f rename to sources/Example_associate.2.f diff --git a/sources/Example_associate.3f.f b/sources/Example_associate.3.f90 similarity index 100% rename from sources/Example_associate.3f.f rename to sources/Example_associate.3.f90 diff --git a/sources/Example_async_target.1c.c b/sources/Example_async_target.1.c similarity index 93% rename from sources/Example_async_target.1c.c rename to sources/Example_async_target.1.c index 3313742..b445aa5 100644 --- a/sources/Example_async_target.1c.c +++ b/sources/Example_async_target.1.c @@ -18,7 +18,7 @@ void pipedF() init(Z, N); for (C=0; C + +#define N 1000000 //N must be even +void init(int n, float *v1, float *v2); + +int main(){ + int i, n=N; + int chunk=1000; + float v1[N],v2[N],vxv[N]; + + init(n, v1,v2); + + #pragma omp parallel + { + + #pragma omp master + #pragma omp target teams distribute parallel for nowait \ + map(to: v1[0:n/2]) \ + map(to: v2[0:n/2]) \ + map(from: vxv[0:n/2]) + for(i=0; i #include +#include #define N 10000 diff --git a/sources/Example_cancellation.1f.f b/sources/Example_cancellation.1.f90 similarity index 100% rename from sources/Example_cancellation.1f.f rename to sources/Example_cancellation.1.f90 diff --git a/sources/Example_cancellation.2c.c b/sources/Example_cancellation.2.c similarity index 98% rename from sources/Example_cancellation.2c.c rename to sources/Example_cancellation.2.c index bb29c6f..d038029 100644 --- a/sources/Example_cancellation.2c.c +++ b/sources/Example_cancellation.2.c @@ -5,6 +5,8 @@ * @@linkable: no * @@expect: success */ +#include + typedef struct 
binary_tree_s { int value; struct binary_tree_s *left, *right; diff --git a/sources/Example_cancellation.2f.f b/sources/Example_cancellation.2.f90 similarity index 100% rename from sources/Example_cancellation.2f.f rename to sources/Example_cancellation.2.f90 diff --git a/sources/Example_carrays_fpriv.1c.c b/sources/Example_carrays_fpriv.1.c similarity index 100% rename from sources/Example_carrays_fpriv.1c.c rename to sources/Example_carrays_fpriv.1.c diff --git a/sources/Example_collapse.1c.c b/sources/Example_collapse.1.c similarity index 100% rename from sources/Example_collapse.1c.c rename to sources/Example_collapse.1.c diff --git a/sources/Example_collapse.1f.f b/sources/Example_collapse.1.f similarity index 100% rename from sources/Example_collapse.1f.f rename to sources/Example_collapse.1.f diff --git a/sources/Example_collapse.2c.c b/sources/Example_collapse.2.c similarity index 100% rename from sources/Example_collapse.2c.c rename to sources/Example_collapse.2.c diff --git a/sources/Example_collapse.2f.f b/sources/Example_collapse.2.f similarity index 100% rename from sources/Example_collapse.2f.f rename to sources/Example_collapse.2.f diff --git a/sources/Example_collapse.3c.c b/sources/Example_collapse.3.c similarity index 100% rename from sources/Example_collapse.3c.c rename to sources/Example_collapse.3.c diff --git a/sources/Example_collapse.3f.f b/sources/Example_collapse.3.f similarity index 100% rename from sources/Example_collapse.3f.f rename to sources/Example_collapse.3.f diff --git a/sources/Example_cond_comp.1c.c b/sources/Example_cond_comp.1.c similarity index 100% rename from sources/Example_cond_comp.1c.c rename to sources/Example_cond_comp.1.c diff --git a/sources/Example_cond_comp.1f.f b/sources/Example_cond_comp.1.f similarity index 100% rename from sources/Example_cond_comp.1f.f rename to sources/Example_cond_comp.1.f diff --git a/sources/Example_copyin.1c.c b/sources/Example_copyin.1.c similarity index 100% rename from sources/Example_copyin.1c.c rename to sources/Example_copyin.1.c diff --git a/sources/Example_copyin.1f.f b/sources/Example_copyin.1.f similarity index 100% rename from sources/Example_copyin.1f.f rename to sources/Example_copyin.1.f diff --git a/sources/Example_copyprivate.1c.c b/sources/Example_copyprivate.1.c similarity index 100% rename from sources/Example_copyprivate.1c.c rename to sources/Example_copyprivate.1.c diff --git a/sources/Example_copyprivate.1f.f b/sources/Example_copyprivate.1.f similarity index 100% rename from sources/Example_copyprivate.1f.f rename to sources/Example_copyprivate.1.f diff --git a/sources/Example_copyprivate.2c.c b/sources/Example_copyprivate.2.c similarity index 100% rename from sources/Example_copyprivate.2c.c rename to sources/Example_copyprivate.2.c diff --git a/sources/Example_copyprivate.2f.f b/sources/Example_copyprivate.2.f similarity index 100% rename from sources/Example_copyprivate.2f.f rename to sources/Example_copyprivate.2.f diff --git a/sources/Example_copyprivate.3c.c b/sources/Example_copyprivate.3.c similarity index 100% rename from sources/Example_copyprivate.3c.c rename to sources/Example_copyprivate.3.c diff --git a/sources/Example_copyprivate.3f.f b/sources/Example_copyprivate.3.f similarity index 100% rename from sources/Example_copyprivate.3f.f rename to sources/Example_copyprivate.3.f diff --git a/sources/Example_copyprivate.4f.f b/sources/Example_copyprivate.4.f similarity index 100% rename from sources/Example_copyprivate.4f.f rename to sources/Example_copyprivate.4.f diff --git 
a/sources/Example_cpp_reference.1.cpp b/sources/Example_cpp_reference.1.cpp new file mode 100644 index 0000000..6a7fc11 --- /dev/null +++ b/sources/Example_cpp_reference.1.cpp @@ -0,0 +1,25 @@ +/* +* @@name: cpp_reference.1c +* @@type: C++ +* @@compilable: yes +* @@linkable: no +* @@expect: success +*/ + +void task_body (int &); +void gen_task (int &x) { // on orphaned task construct reference argument + #pragma omp task // x is implicitly determined firstprivate(x) + task_body (x); +} +void test (int &y, int &z) { + #pragma omp parallel private(y) + { + y = z + 2; + gen_task (y); // no matter if the argument is determined private + gen_task (z); // or shared in the enclosing context. + + y++; // each thread has its own int object y refers to + gen_task (y); + } +} + diff --git a/sources/Example_critical.1c.c b/sources/Example_critical.1.c similarity index 100% rename from sources/Example_critical.1c.c rename to sources/Example_critical.1.c diff --git a/sources/Example_critical.1f.f b/sources/Example_critical.1.f similarity index 100% rename from sources/Example_critical.1f.f rename to sources/Example_critical.1.f diff --git a/sources/Example_critical.2.c b/sources/Example_critical.2.c new file mode 100644 index 0000000..4e2040c --- /dev/null +++ b/sources/Example_critical.2.c @@ -0,0 +1,28 @@ +/* +* @@name: critical.1c +* @@type: C +* @@compilable: yes +* @@linkable: no +* @@expect: success +*/ +#include + +int dequeue(float *a); +void work(int i, float *a); + +void critical_example(float *x, float *y) +{ + int ix_next, iy_next; + + #pragma omp parallel shared(x, y) private(ix_next, iy_next) + { + #pragma omp critical (xaxis) hint(omp_lock_hint_contended) + ix_next = dequeue(x); + work(ix_next, x); + + #pragma omp critical (yaxis) hint(omp_lock_hint_contended) + iy_next = dequeue(y); + work(iy_next, y); + } + +} diff --git a/sources/Example_critical.2.f b/sources/Example_critical.2.f new file mode 100644 index 0000000..ec19a5e --- /dev/null +++ b/sources/Example_critical.2.f @@ -0,0 +1,26 @@ +! @@name: critical.1f +! @@type: F-fixed +! @@compilable: yes +! @@linkable: no +! @@expect: success + SUBROUTINE CRITICAL_EXAMPLE(X, Y) + USE OMP_LIB ! 
or INCLUDE "omp_lib.h" + + REAL X(*), Y(*) + INTEGER IX_NEXT, IY_NEXT + +!$OMP PARALLEL SHARED(X, Y) PRIVATE(IX_NEXT, IY_NEXT) + +!$OMP CRITICAL(XAXIS) HINT(OMP_LOCK_HINT_CONTENDED) + CALL DEQUEUE(IX_NEXT, X) +!$OMP END CRITICAL(XAXIS) + CALL WORK(IX_NEXT, X) + +!$OMP CRITICAL(YAXIS) HINT(OMP_LOCK_HINT_CONTENDED) + CALL DEQUEUE(IY_NEXT,Y) +!$OMP END CRITICAL(YAXIS) + CALL WORK(IY_NEXT, Y) + +!$OMP END PARALLEL + + END SUBROUTINE CRITICAL_EXAMPLE diff --git a/sources/Example_declare_target.1c.c b/sources/Example_declare_target.1.c similarity index 100% rename from sources/Example_declare_target.1c.c rename to sources/Example_declare_target.1.c diff --git a/sources/Example_declare_target.1f.f b/sources/Example_declare_target.1.f90 similarity index 100% rename from sources/Example_declare_target.1f.f rename to sources/Example_declare_target.1.f90 diff --git a/sources/Example_declare_target.2c.c b/sources/Example_declare_target.2.cpp similarity index 100% rename from sources/Example_declare_target.2c.c rename to sources/Example_declare_target.2.cpp diff --git a/sources/Example_declare_target.2f.f b/sources/Example_declare_target.2.f90 similarity index 100% rename from sources/Example_declare_target.2f.f rename to sources/Example_declare_target.2.f90 diff --git a/sources/Example_declare_target.3c.c b/sources/Example_declare_target.3.c similarity index 100% rename from sources/Example_declare_target.3c.c rename to sources/Example_declare_target.3.c diff --git a/sources/Example_declare_target.3f.f b/sources/Example_declare_target.3.f90 similarity index 100% rename from sources/Example_declare_target.3f.f rename to sources/Example_declare_target.3.f90 diff --git a/sources/Example_declare_target.4c.c b/sources/Example_declare_target.4.c similarity index 70% rename from sources/Example_declare_target.4c.c rename to sources/Example_declare_target.4.c index 53827d9..2a74d6e 100644 --- a/sources/Example_declare_target.4c.c +++ b/sources/Example_declare_target.4.c @@ -15,9 +15,13 @@ float accum(int k) { float tmp = 0.0; #pragma omp target update to(Q) - #pragma omp target + #pragma omp target map(tofrom: tmp) #pragma omp parallel for reduction(+:tmp) for(int i=0; i < N; i++) tmp += Pfun(i,k); return tmp; } + +/* Note: The variable tmp is now mapped with tofrom, for correct + execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro. + */ diff --git a/sources/Example_declare_target.4f.f b/sources/Example_declare_target.4.f90 similarity index 76% rename from sources/Example_declare_target.4f.f rename to sources/Example_declare_target.4.f90 index df35c9b..12ee23e 100644 --- a/sources/Example_declare_target.4f.f +++ b/sources/Example_declare_target.4.f90 @@ -15,15 +15,19 @@ integer,intent(in) :: i,k Pfun=(Q(i,k) * Q(k,i)) end function end module + function accum(k) result(tmp) use my_global_array real :: tmp integer :: i, k tmp = 0.0e0 - !$omp target + !$omp target map(tofrom: tmp) !$omp parallel do reduction(+:tmp) do i=1,N tmp = tmp + Pfun(k,i) end do !$omp end target end function + +! Note: The variable tmp is now mapped with tofrom, for correct +! execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro. 
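Note: the remark above about mapping tmp with tofrom recurs throughout the device examples because OpenMP 4.5 makes scalar variables firstprivate by default in a target region; a scalar that accumulates a reduction must therefore be mapped explicitly, or its value is not copied back to the host. A minimal C sketch of the pattern (illustrative only; sum_of_squares and N are assumptions, not one of the shipped sources):

    #define N 1000

    float sum_of_squares(const float *Q)
    {
       float tmp = 0.0f;
       /* Without map(tofrom: tmp) the 4.5 default would make tmp firstprivate
          on the device, and the reduced value would not return to the host. */
       #pragma omp target map(to: Q[0:N]) map(tofrom: tmp)
       #pragma omp parallel for reduction(+:tmp)
       for (int i = 0; i < N; i++)
          tmp += Q[i] * Q[i];
       return tmp;
    }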
diff --git a/sources/Example_declare_target.5c.c b/sources/Example_declare_target.5.c similarity index 76% rename from sources/Example_declare_target.5c.c rename to sources/Example_declare_target.5.c index 1a43c3b..394fa1f 100644 --- a/sources/Example_declare_target.5c.c +++ b/sources/Example_declare_target.5.c @@ -15,11 +15,12 @@ float P(const int i, const int k) return Q[i][k] * Q[k][i]; } #pragma omp end declare target + float accum(void) { float tmp = 0.0; int i, k; -#pragma omp target +#pragma omp target map(tofrom: tmp) #pragma omp parallel for reduction(+:tmp) for (i=0; i < N; i++) { float tmp1 = 0.0; @@ -31,3 +32,7 @@ float accum(void) } return tmp; } + +/* Note: The variable tmp is now mapped with tofrom, for correct + execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro. + */ diff --git a/sources/Example_declare_target.5f.f b/sources/Example_declare_target.5.f90 similarity index 81% rename from sources/Example_declare_target.5f.f rename to sources/Example_declare_target.5.f90 index 1507e65..0f9809c 100644 --- a/sources/Example_declare_target.5f.f +++ b/sources/Example_declare_target.5.f90 @@ -16,12 +16,13 @@ integer,intent(in) :: k,i P=(Q(k,i) * Q(i,k)) end function end module + function accum() result(tmp) use my_global_array real :: tmp, tmp1 integer :: i tmp = 0.0e0 - !$omp target + !$omp target map(tofrom: tmp) !$omp parallel do private(tmp1) reduction(+:tmp) do i=1,N tmp1 = 0.0e0 @@ -33,3 +34,6 @@ integer :: i end do !$omp end target end function + +! Note: The variable tmp is now mapped with tofrom, for correct +! execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro. diff --git a/sources/Example_declare_target.6.c b/sources/Example_declare_target.6.c new file mode 100644 index 0000000..df832ef --- /dev/null +++ b/sources/Example_declare_target.6.c @@ -0,0 +1,53 @@ +/* +* @@name: declare_target.6.c +* @@type: C +* @@compilable: yes +* @@linkable: no +* @@expect: success +*/ +#define N 100000000 + +#pragma omp declare target link(sp,sv1,sv2) \ + link(dp,dv1,dv2) +float sp[N], sv1[N], sv2[N]; +double dp[N], dv1[N], dv2[N]; + +void s_init(float *, float *, int); +void d_init(double *, double *, int); +void s_output(float *, int); +void d_output(double *, int); + +#pragma omp declare target +void s_vec_mult_accum() +{ + int i; + + #pragma omp parallel for + for (i=0; i int x, y, z[1000]; diff --git a/sources/Example_default_none.1f.f b/sources/Example_default_none.1.f similarity index 96% rename from sources/Example_default_none.1f.f rename to sources/Example_default_none.1.f index 5df13c1..72a82bb 100644 --- a/sources/Example_default_none.1f.f +++ b/sources/Example_default_none.1.f @@ -1,8 +1,8 @@ ! @@name: default_none.1f ! @@type: F-fixed -! @@compilable: yes +! @@compilable: no ! @@linkable: no -! @@expect: success +! @@expect: failure SUBROUTINE DEFAULT_NONE(A) INCLUDE "omp_lib.h" ! 
or USE OMP_LIB diff --git a/sources/Example_device.1c.c b/sources/Example_device.1.c similarity index 93% rename from sources/Example_device.1c.c rename to sources/Example_device.1.c index b719f50..f7a637c 100644 --- a/sources/Example_device.1c.c +++ b/sources/Example_device.1.c @@ -37,7 +37,7 @@ void vec_mult(float *p, float *v1, float *v2, int N) printf("8 threads on initial device\n"); nthreads = 8; } - #pragma omp parallel for private(i) num_threads(nthreads); + #pragma omp parallel for private(i) num_threads(nthreads) for (i=0; i +#include +#include +#include + +void get_dev_cos(double *mem, size_t s) +{ + int h, t, i; + double * mem_dev_cpy; + h = omp_get_initial_device(); + t = omp_get_default_device(); + + if (omp_get_num_devices() < 1 || t < 0){ + printf(" ERROR: No device found.\n"); + exit(1); + } + + mem_dev_cpy = omp_target_alloc( sizeof(double) * s, t); + if(mem_dev_cpy == NULL){ + printf(" ERROR: No space left on device.\n"); + exit(1); + } + + /* dst src */ + omp_target_memcpy(mem_dev_cpy, mem, sizeof(double)*s, + 0, 0, + t, h); + + #pragma omp target is_device_ptr(mem_dev_cpy) device(t) + #pragma omp teams distribute parallel for + for(i=0;i + +omp_lock_t *new_locks() +{ + int i; + omp_lock_t *lock = new omp_lock_t[1000]; + + #pragma omp parallel for private(i) + for (i=0; i<1000; i++) + { + omp_init_lock_with_hint(&lock[i], + omp_lock_hint_contended | omp_lock_hint_speculative); + } + return lock; +} diff --git a/sources/Example_init_lock_with_hint.1.f b/sources/Example_init_lock_with_hint.1.f new file mode 100644 index 0000000..b0f8cfe --- /dev/null +++ b/sources/Example_init_lock_with_hint.1.f @@ -0,0 +1,19 @@ +! @@name: init_lock.1f +! @@type: F-fixed +! @@compilable: yes +! @@linkable: no +! @@expect: success + FUNCTION NEW_LOCKS() + USE OMP_LIB ! or INCLUDE "omp_lib.h" + INTEGER(OMP_LOCK_KIND), DIMENSION(1000) :: NEW_LOCKS + + INTEGER I + +!$OMP PARALLEL DO PRIVATE(I) + DO I=1,1000 + CALL OMP_INIT_LOCK_WITH_HINT(NEW_LOCKS(I), + & OMP_LOCK_HINT_CONTENDED + OMP_LOCK_HINT_SPECULATIVE) + END DO +!$OMP END PARALLEL DO + + END FUNCTION NEW_LOCKS diff --git a/sources/Example_lastprivate.1c.c b/sources/Example_lastprivate.1.c similarity index 100% rename from sources/Example_lastprivate.1c.c rename to sources/Example_lastprivate.1.c diff --git a/sources/Example_lastprivate.1f.f b/sources/Example_lastprivate.1.f similarity index 100% rename from sources/Example_lastprivate.1f.f rename to sources/Example_lastprivate.1.f diff --git a/sources/Example_linear_in_loop.1.c b/sources/Example_linear_in_loop.1.c new file mode 100644 index 0000000..f6deeda --- /dev/null +++ b/sources/Example_linear_in_loop.1.c @@ -0,0 +1,31 @@ +/* +* @@name: linear_in_loop.1c +* @@type: C +* @@compilable: yes +* @@linkable: yes +* @@expect: success +*/ +#include + +#define N 100 +int main(void) +{ + float a[N], b[N/2]; + int i, j; + + for ( i = 0; i < N; i++ ) + a[i] = i + 1; + + j = 0; + #pragma omp parallel + #pragma omp for linear(j:1) + for ( i = 0; i < N; i += 2 ) { + b[j] = a[i] * 2.0f; + j++; + } + + printf( "%d %f %f\n", j, b[0], b[j-1] ); + /* print out: 50 2.0 198.0 */ + + return 0; +} diff --git a/sources/Example_linear_in_loop.1.f90 b/sources/Example_linear_in_loop.1.f90 new file mode 100644 index 0000000..5edc41d --- /dev/null +++ b/sources/Example_linear_in_loop.1.f90 @@ -0,0 +1,28 @@ +! @@name: linear_in_loop.1f +! @@type: F-free +! @@compilable: yes +! @@linkable: yes +! 
@@expect: success +program linear_loop + implicit none + integer, parameter :: N = 100 + real :: a(N), b(N/2) + integer :: i, j + + do i = 1, N + a(i) = i + end do + + j = 0 + !$omp parallel + !$omp do linear(j:1) + do i = 1, N, 2 + j = j + 1 + b(j) = a(i) * 2.0 + end do + !$omp end parallel + + print *, j, b(1), b(j) + ! print out: 50 2.0 198.0 + +end program diff --git a/sources/Example_lock_owner.1c.c b/sources/Example_lock_owner.1.c similarity index 100% rename from sources/Example_lock_owner.1c.c rename to sources/Example_lock_owner.1.c diff --git a/sources/Example_lock_owner.1f.f b/sources/Example_lock_owner.1.f similarity index 100% rename from sources/Example_lock_owner.1f.f rename to sources/Example_lock_owner.1.f diff --git a/sources/Example_master.1c.c b/sources/Example_master.1.c similarity index 100% rename from sources/Example_master.1c.c rename to sources/Example_master.1.c diff --git a/sources/Example_master.1f.f b/sources/Example_master.1.f similarity index 100% rename from sources/Example_master.1f.f rename to sources/Example_master.1.f diff --git a/sources/Example_mem_model.1c.c b/sources/Example_mem_model.1.c similarity index 100% rename from sources/Example_mem_model.1c.c rename to sources/Example_mem_model.1.c diff --git a/sources/Example_mem_model.1f.f b/sources/Example_mem_model.1.f90 similarity index 100% rename from sources/Example_mem_model.1f.f rename to sources/Example_mem_model.1.f90 diff --git a/sources/Example_mem_model.2c.c b/sources/Example_mem_model.2.c similarity index 100% rename from sources/Example_mem_model.2c.c rename to sources/Example_mem_model.2.c diff --git a/sources/Example_mem_model.2f.f b/sources/Example_mem_model.2.f similarity index 100% rename from sources/Example_mem_model.2f.f rename to sources/Example_mem_model.2.f diff --git a/sources/Example_mem_model.3c.c b/sources/Example_mem_model.3.c similarity index 100% rename from sources/Example_mem_model.3c.c rename to sources/Example_mem_model.3.c diff --git a/sources/Example_mem_model.3f.f b/sources/Example_mem_model.3.f similarity index 100% rename from sources/Example_mem_model.3f.f rename to sources/Example_mem_model.3.f diff --git a/sources/Example_nestable_lock.1c.c b/sources/Example_nestable_lock.1.c similarity index 100% rename from sources/Example_nestable_lock.1c.c rename to sources/Example_nestable_lock.1.c diff --git a/sources/Example_nestable_lock.1f.f b/sources/Example_nestable_lock.1.f similarity index 100% rename from sources/Example_nestable_lock.1f.f rename to sources/Example_nestable_lock.1.f diff --git a/sources/Example_nested_loop.1c.c b/sources/Example_nested_loop.1.c similarity index 100% rename from sources/Example_nested_loop.1c.c rename to sources/Example_nested_loop.1.c diff --git a/sources/Example_nested_loop.1f.f b/sources/Example_nested_loop.1.f similarity index 100% rename from sources/Example_nested_loop.1f.f rename to sources/Example_nested_loop.1.f diff --git a/sources/Example_nested_loop.2c.c b/sources/Example_nested_loop.2.c similarity index 100% rename from sources/Example_nested_loop.2c.c rename to sources/Example_nested_loop.2.c diff --git a/sources/Example_nested_loop.2f.f b/sources/Example_nested_loop.2.f similarity index 100% rename from sources/Example_nested_loop.2f.f rename to sources/Example_nested_loop.2.f diff --git a/sources/Example_nesting_restrict.1c.c b/sources/Example_nesting_restrict.1.c similarity index 100% rename from sources/Example_nesting_restrict.1c.c rename to sources/Example_nesting_restrict.1.c diff --git 
a/sources/Example_nesting_restrict.1f.f b/sources/Example_nesting_restrict.1.f similarity index 100% rename from sources/Example_nesting_restrict.1f.f rename to sources/Example_nesting_restrict.1.f diff --git a/sources/Example_nesting_restrict.2c.c b/sources/Example_nesting_restrict.2.c similarity index 100% rename from sources/Example_nesting_restrict.2c.c rename to sources/Example_nesting_restrict.2.c diff --git a/sources/Example_nesting_restrict.2f.f b/sources/Example_nesting_restrict.2.f similarity index 100% rename from sources/Example_nesting_restrict.2f.f rename to sources/Example_nesting_restrict.2.f diff --git a/sources/Example_nesting_restrict.3c.c b/sources/Example_nesting_restrict.3.c similarity index 100% rename from sources/Example_nesting_restrict.3c.c rename to sources/Example_nesting_restrict.3.c diff --git a/sources/Example_nesting_restrict.3f.f b/sources/Example_nesting_restrict.3.f similarity index 100% rename from sources/Example_nesting_restrict.3f.f rename to sources/Example_nesting_restrict.3.f diff --git a/sources/Example_nesting_restrict.4c.c b/sources/Example_nesting_restrict.4.c similarity index 100% rename from sources/Example_nesting_restrict.4c.c rename to sources/Example_nesting_restrict.4.c diff --git a/sources/Example_nesting_restrict.4f.f b/sources/Example_nesting_restrict.4.f similarity index 100% rename from sources/Example_nesting_restrict.4f.f rename to sources/Example_nesting_restrict.4.f diff --git a/sources/Example_nesting_restrict.5c.c b/sources/Example_nesting_restrict.5.c similarity index 100% rename from sources/Example_nesting_restrict.5c.c rename to sources/Example_nesting_restrict.5.c diff --git a/sources/Example_nesting_restrict.5f.f b/sources/Example_nesting_restrict.5.f similarity index 100% rename from sources/Example_nesting_restrict.5f.f rename to sources/Example_nesting_restrict.5.f diff --git a/sources/Example_nesting_restrict.6c.c b/sources/Example_nesting_restrict.6.c similarity index 100% rename from sources/Example_nesting_restrict.6c.c rename to sources/Example_nesting_restrict.6.c diff --git a/sources/Example_nesting_restrict.6f.f b/sources/Example_nesting_restrict.6.f similarity index 100% rename from sources/Example_nesting_restrict.6f.f rename to sources/Example_nesting_restrict.6.f diff --git a/sources/Example_nowait.1c.c b/sources/Example_nowait.1.c similarity index 100% rename from sources/Example_nowait.1c.c rename to sources/Example_nowait.1.c diff --git a/sources/Example_nowait.1f.f b/sources/Example_nowait.1.f similarity index 100% rename from sources/Example_nowait.1f.f rename to sources/Example_nowait.1.f diff --git a/sources/Example_nowait.2c.c b/sources/Example_nowait.2.c similarity index 100% rename from sources/Example_nowait.2c.c rename to sources/Example_nowait.2.c diff --git a/sources/Example_nowait.2f.f b/sources/Example_nowait.2.f90 similarity index 100% rename from sources/Example_nowait.2f.f rename to sources/Example_nowait.2.f90 diff --git a/sources/Example_nthrs_dynamic.1c.c b/sources/Example_nthrs_dynamic.1.c similarity index 100% rename from sources/Example_nthrs_dynamic.1c.c rename to sources/Example_nthrs_dynamic.1.c diff --git a/sources/Example_nthrs_dynamic.1f.f b/sources/Example_nthrs_dynamic.1.f similarity index 100% rename from sources/Example_nthrs_dynamic.1f.f rename to sources/Example_nthrs_dynamic.1.f diff --git a/sources/Example_nthrs_dynamic.2c.c b/sources/Example_nthrs_dynamic.2.c similarity index 100% rename from sources/Example_nthrs_dynamic.2c.c rename to 
sources/Example_nthrs_dynamic.2.c diff --git a/sources/Example_nthrs_dynamic.2f.f b/sources/Example_nthrs_dynamic.2.f similarity index 100% rename from sources/Example_nthrs_dynamic.2f.f rename to sources/Example_nthrs_dynamic.2.f diff --git a/sources/Example_nthrs_nesting.1c.c b/sources/Example_nthrs_nesting.1.c similarity index 100% rename from sources/Example_nthrs_nesting.1c.c rename to sources/Example_nthrs_nesting.1.c diff --git a/sources/Example_nthrs_nesting.1f.f b/sources/Example_nthrs_nesting.1.f similarity index 100% rename from sources/Example_nthrs_nesting.1f.f rename to sources/Example_nthrs_nesting.1.f diff --git a/sources/Example_ordered.1c.c b/sources/Example_ordered.1.c similarity index 100% rename from sources/Example_ordered.1c.c rename to sources/Example_ordered.1.c diff --git a/sources/Example_ordered.1f.f b/sources/Example_ordered.1.f similarity index 100% rename from sources/Example_ordered.1f.f rename to sources/Example_ordered.1.f diff --git a/sources/Example_ordered.2c.c b/sources/Example_ordered.2.c similarity index 100% rename from sources/Example_ordered.2c.c rename to sources/Example_ordered.2.c diff --git a/sources/Example_ordered.2f.f b/sources/Example_ordered.2.f similarity index 100% rename from sources/Example_ordered.2f.f rename to sources/Example_ordered.2.f diff --git a/sources/Example_ordered.3c.c b/sources/Example_ordered.3.c similarity index 100% rename from sources/Example_ordered.3c.c rename to sources/Example_ordered.3.c diff --git a/sources/Example_ordered.3f.f b/sources/Example_ordered.3.f similarity index 100% rename from sources/Example_ordered.3f.f rename to sources/Example_ordered.3.f diff --git a/sources/Example_parallel.1c.c b/sources/Example_parallel.1.c similarity index 100% rename from sources/Example_parallel.1c.c rename to sources/Example_parallel.1.c diff --git a/sources/Example_parallel.1f.f b/sources/Example_parallel.1.f similarity index 100% rename from sources/Example_parallel.1f.f rename to sources/Example_parallel.1.f diff --git a/sources/Example_ploop.1c.c b/sources/Example_ploop.1.c similarity index 100% rename from sources/Example_ploop.1c.c rename to sources/Example_ploop.1.c diff --git a/sources/Example_ploop.1f.f b/sources/Example_ploop.1.f similarity index 100% rename from sources/Example_ploop.1f.f rename to sources/Example_ploop.1.f diff --git a/sources/Example_pra_iterator.1c.c b/sources/Example_pra_iterator.1.cpp similarity index 100% rename from sources/Example_pra_iterator.1c.c rename to sources/Example_pra_iterator.1.cpp diff --git a/sources/Example_private.1c.c b/sources/Example_private.1.c similarity index 100% rename from sources/Example_private.1c.c rename to sources/Example_private.1.c diff --git a/sources/Example_private.1f.f b/sources/Example_private.1.f similarity index 100% rename from sources/Example_private.1f.f rename to sources/Example_private.1.f diff --git a/sources/Example_private.2c.c b/sources/Example_private.2.c similarity index 100% rename from sources/Example_private.2c.c rename to sources/Example_private.2.c diff --git a/sources/Example_private.2f.f b/sources/Example_private.2.f similarity index 100% rename from sources/Example_private.2f.f rename to sources/Example_private.2.f diff --git a/sources/Example_private.3c.c b/sources/Example_private.3.c similarity index 100% rename from sources/Example_private.3c.c rename to sources/Example_private.3.c diff --git a/sources/Example_private.3f.f b/sources/Example_private.3.f similarity index 100% rename from sources/Example_private.3f.f rename to 
sources/Example_private.3.f diff --git a/sources/Example_psections.1c.c b/sources/Example_psections.1.c similarity index 100% rename from sources/Example_psections.1c.c rename to sources/Example_psections.1.c diff --git a/sources/Example_psections.1f.f b/sources/Example_psections.1.f similarity index 100% rename from sources/Example_psections.1f.f rename to sources/Example_psections.1.f diff --git a/sources/Example_reduction.1c.c b/sources/Example_reduction.1.c similarity index 100% rename from sources/Example_reduction.1c.c rename to sources/Example_reduction.1.c diff --git a/sources/Example_reduction.1f.f b/sources/Example_reduction.1.f90 similarity index 100% rename from sources/Example_reduction.1f.f rename to sources/Example_reduction.1.f90 diff --git a/sources/Example_reduction.2c.c b/sources/Example_reduction.2.c similarity index 100% rename from sources/Example_reduction.2c.c rename to sources/Example_reduction.2.c diff --git a/sources/Example_reduction.2f.f b/sources/Example_reduction.2.f90 similarity index 100% rename from sources/Example_reduction.2f.f rename to sources/Example_reduction.2.f90 diff --git a/sources/Example_reduction.3c.c b/sources/Example_reduction.3.c similarity index 100% rename from sources/Example_reduction.3c.c rename to sources/Example_reduction.3.c diff --git a/sources/Example_reduction.3f.f b/sources/Example_reduction.3.f90 similarity index 100% rename from sources/Example_reduction.3f.f rename to sources/Example_reduction.3.f90 diff --git a/sources/Example_reduction.4f.f b/sources/Example_reduction.4.f90 similarity index 100% rename from sources/Example_reduction.4f.f rename to sources/Example_reduction.4.f90 diff --git a/sources/Example_reduction.5f.f b/sources/Example_reduction.5.f90 similarity index 100% rename from sources/Example_reduction.5f.f rename to sources/Example_reduction.5.f90 diff --git a/sources/Example_reduction.6.c b/sources/Example_reduction.6.c new file mode 100644 index 0000000..28a507e --- /dev/null +++ b/sources/Example_reduction.6.c @@ -0,0 +1,30 @@ +/* +* @@name: reduction.6c +* @@type: C +* @@compilable: yes +* @@linkable: yes +* @@expect: rt-error +*/ +#include + +int main (void) +{ + int a, i; + + #pragma omp parallel shared(a) private(i) + { + #pragma omp master + a = 0; + + // To avoid race conditions, add a barrier here. 
+ + #pragma omp for reduction(+:a) + for (i = 0; i < 10; i++) { + a += i; + } + + #pragma omp single + printf ("Sum is %d\n", a); + } + return 0; +} diff --git a/sources/Example_reduction.6f.f b/sources/Example_reduction.6.f similarity index 100% rename from sources/Example_reduction.6f.f rename to sources/Example_reduction.6.f diff --git a/sources/Example_reduction.7.c b/sources/Example_reduction.7.c new file mode 100644 index 0000000..5a94085 --- /dev/null +++ b/sources/Example_reduction.7.c @@ -0,0 +1,31 @@ +/* +* @@name: reduction.7c +* @@type: C +* @@compilable: yes +* @@linkable: no +* @@expect: success +*/ +#include + +#define N 100 +void init(int n, float (*b)[N]); + +int main(){ + + int i,j; + float a[N], b[N][N]; + + init(N,b); + + for(i=0; iTHRESHOLD1) if(parallel: N>THRESHOLD2) \ + map(to: v1[0:N], v2[:N]) map(from: p[0:N]) + for (i=0; iTHRESHHOLD1) if(parallel: N>THRESHOLD2) & + !$omp& map(to: v1, v2 ) map(from: p) + do i=1,N + p(i) = v1(i) * v2(i) + end do + !$omp end target parallel do + call output(p, N) +end subroutine diff --git a/sources/Example_target_data.1c.c b/sources/Example_target_data.1.c similarity index 100% rename from sources/Example_target_data.1c.c rename to sources/Example_target_data.1.c diff --git a/sources/Example_target_data.1f.f b/sources/Example_target_data.1.f90 similarity index 100% rename from sources/Example_target_data.1f.f rename to sources/Example_target_data.1.f90 diff --git a/sources/Example_target_data.2c.c b/sources/Example_target_data.2.c similarity index 100% rename from sources/Example_target_data.2c.c rename to sources/Example_target_data.2.c diff --git a/sources/Example_target_data.2f.f b/sources/Example_target_data.2.f90 similarity index 100% rename from sources/Example_target_data.2f.f rename to sources/Example_target_data.2.f90 diff --git a/sources/Example_target_data.3c.c b/sources/Example_target_data.3.c similarity index 75% rename from sources/Example_target_data.3c.c rename to sources/Example_target_data.3.c index 85d9514..06fc4a0 100644 --- a/sources/Example_target_data.3c.c +++ b/sources/Example_target_data.3.c @@ -14,14 +14,20 @@ void gramSchmidt(float Q[][COLS], const int rows) for(int k=0; k < cols; k++) { double tmp = 0.0; - #pragma omp target + #pragma omp target map(tofrom: tmp) #pragma omp parallel for reduction(+:tmp) for(int i=0; i < rows; i++) tmp += (Q[i][k] * Q[i][k]); + tmp = 1/sqrt(tmp); + #pragma omp target #pragma omp parallel for for(int i=0; i < rows; i++) Q[i][k] *= tmp; } } + +/* Note: The variable tmp is now mapped with tofrom, for correct + execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro. + */ diff --git a/sources/Example_target_data.3f.f b/sources/Example_target_data.3.f90 similarity index 78% rename from sources/Example_target_data.3f.f rename to sources/Example_target_data.3.f90 index d1cea10..026f77e 100644 --- a/sources/Example_target_data.3f.f +++ b/sources/Example_target_data.3.f90 @@ -9,13 +9,15 @@ double precision :: Q(rows,cols), tmp !$omp target data map(Q) do k=1,cols tmp = 0.0d0 - !$omp target + !$omp target map(tofrom: tmp) !$omp parallel do reduction(+:tmp) do i=1,rows tmp = tmp + (Q(i,k) * Q(i,k)) end do !$omp end target + tmp = 1.0d0/sqrt(tmp) + !$omp target !$omp parallel do do i=1,rows @@ -25,3 +27,6 @@ double precision :: Q(rows,cols), tmp end do !$omp end target data end subroutine + +! Note: The variable tmp is now mapped with tofrom, for correct +! execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro. 
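Note: the new Example_reduction.6.c above is deliberately racy (@@expect: rt-error): a worker thread can combine its partial sum into a before the master thread has initialized it. As the inline comment says, a barrier after the master construct removes the race. A hedged sketch of the corrected pattern (for illustration only; the shipped source intentionally keeps the broken form):

    #include <stdio.h>

    int main(void)
    {
       int a, i;

       #pragma omp parallel shared(a) private(i)
       {
          #pragma omp master
          a = 0;

          #pragma omp barrier  /* ensure a is initialized before any reduction update */

          #pragma omp for reduction(+:a)
          for (i = 0; i < 10; i++)
             a += i;

          #pragma omp single
          printf("Sum is %d\n", a);
       }
       return 0;
    }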
diff --git a/sources/Example_target_data.4c.c b/sources/Example_target_data.4.c similarity index 100% rename from sources/Example_target_data.4c.c rename to sources/Example_target_data.4.c diff --git a/sources/Example_target_data.4f.f b/sources/Example_target_data.4.f90 similarity index 96% rename from sources/Example_target_data.4f.f rename to sources/Example_target_data.4.f90 index 7b04fc5..fd99596 100644 --- a/sources/Example_target_data.4f.f +++ b/sources/Example_target_data.4.f90 @@ -11,7 +11,7 @@ integer :: N,i call init(v1, v2, N) !$omp target data map(to: v1, v2) map(from: p0) call vec_mult(p0,v1,v2,N) - !omp end target data + !$omp end target data call output(p0, N) end subroutine subroutine vec_mult(p1,v3,v4,N) diff --git a/sources/Example_target_data.5c.c b/sources/Example_target_data.5.cpp similarity index 100% rename from sources/Example_target_data.5c.c rename to sources/Example_target_data.5.cpp diff --git a/sources/Example_target_data.5f.f b/sources/Example_target_data.5.f90 similarity index 96% rename from sources/Example_target_data.5f.f rename to sources/Example_target_data.5.f90 index 9bbac13..f9e8eaa 100644 --- a/sources/Example_target_data.5f.f +++ b/sources/Example_target_data.5.f90 @@ -11,7 +11,7 @@ integer :: N,i call init(v1, v2, N) !$omp target data map(to: v1, v2) map(from: p0) call vec_mult(p0,v1,v2,N) - !omp end target data + !$omp end target data call output(p0, N) end subroutine subroutine vec_mult(p1,v3,v4,N) diff --git a/sources/Example_target_data.6c.c b/sources/Example_target_data.6.c similarity index 100% rename from sources/Example_target_data.6c.c rename to sources/Example_target_data.6.c diff --git a/sources/Example_target_data.6f.f b/sources/Example_target_data.6.f90 similarity index 100% rename from sources/Example_target_data.6f.f rename to sources/Example_target_data.6.f90 diff --git a/sources/Example_target_data.7c.c b/sources/Example_target_data.7.c similarity index 100% rename from sources/Example_target_data.7c.c rename to sources/Example_target_data.7.c diff --git a/sources/Example_target_data.7f.f b/sources/Example_target_data.7.f90 similarity index 100% rename from sources/Example_target_data.7f.f rename to sources/Example_target_data.7.f90 diff --git a/sources/Example_target_unstructured_data.1.c b/sources/Example_target_unstructured_data.1.c new file mode 100644 index 0000000..190a7bd --- /dev/null +++ b/sources/Example_target_unstructured_data.1.c @@ -0,0 +1,27 @@ +/* + * @@name: target-unstructured-data.1.c + * @@type: C + * @@compilable: yes + * @@linkable: no + * @@expect: success + */ +#include +typedef struct { + double *A; + int N; +} Matrix; + +void init_matrix(Matrix *mat, int n) +{ + mat->A = (double *)malloc(n*sizeof(double)); + mat->N = n; + #pragma omp target enter data map(alloc:mat->A[:n]) +} + +void free_matrix(Matrix *mat) +{ + #pragma omp target exit data map(delete:mat->A[:mat->N]) + mat->N = 0; + free(mat->A); + mat->A = NULL; +} diff --git a/sources/Example_target_unstructured_data.1.cpp b/sources/Example_target_unstructured_data.1.cpp new file mode 100644 index 0000000..c894b17 --- /dev/null +++ b/sources/Example_target_unstructured_data.1.cpp @@ -0,0 +1,29 @@ +/* +* @@name: target-unstructured-data.1.cpp +* @@type: C++ +* @@compilable: yes +* @@linkable: no +* @@expect: success +*/ +class Matrix +{ + + Matrix(int n) { + len = n; + v = new double[len]; + #pragma omp target enter data map(alloc:v[0:len]) + } + + ~Matrix() { + // NOTE: delete map type should be used, since the corresponding + // host data will cease 
to exist after the deconstructor is called. + + #pragma omp target exit data map(delete:v[0:len]) + delete[] v; + } + + private: + double* v; + int len; + +}; diff --git a/sources/Example_target_unstructured_data.1.f90 b/sources/Example_target_unstructured_data.1.f90 new file mode 100644 index 0000000..24eca51 --- /dev/null +++ b/sources/Example_target_unstructured_data.1.f90 @@ -0,0 +1,24 @@ +! @@name: target-unstructured-data.1.f +! @@type: F-free +! @@compilable: yes +! @@linkable: no +! @@expect: success +module example + real(8), allocatable :: A(:) + + contains + subroutine initialize(N) + integer :: N + + allocate(A(N)) + !$omp target enter data map(alloc:A) + + end subroutine initialize + + subroutine finalize() + + !$omp target exit data map(delete:A) + deallocate(A) + + end subroutine finalize +end module example diff --git a/sources/Example_target_update.1c.c b/sources/Example_target_update.1.c similarity index 100% rename from sources/Example_target_update.1c.c rename to sources/Example_target_update.1.c diff --git a/sources/Example_target_update.1f.f b/sources/Example_target_update.1.f90 similarity index 100% rename from sources/Example_target_update.1f.f rename to sources/Example_target_update.1.f90 diff --git a/sources/Example_target_update.2c.c b/sources/Example_target_update.2.c similarity index 100% rename from sources/Example_target_update.2c.c rename to sources/Example_target_update.2.c diff --git a/sources/Example_target_update.2f.f b/sources/Example_target_update.2.f90 similarity index 89% rename from sources/Example_target_update.2f.f rename to sources/Example_target_update.2.f90 index b94476d..07c55a3 100644 --- a/sources/Example_target_update.2f.f +++ b/sources/Example_target_update.2.f90 @@ -22,9 +22,9 @@ subroutine vec_mult(p, v1, v2, N) end do !$omp end target changed = maybe_init_again(v1, N) - !$omp target if(changed) update to(v1(:N)) + !$omp target update if(changed) to(v1(:N)) changed = maybe_init_again(v2, N) - !$omp target if(changed) update to(v2(:N)) + !$omp target update if(changed) to(v2(:N)) !$omp target !$omp parallel do do i=1, N diff --git a/sources/Example_task_dep.1c.c b/sources/Example_task_dep.1.c similarity index 100% rename from sources/Example_task_dep.1c.c rename to sources/Example_task_dep.1.c diff --git a/sources/Example_task_dep.1f.f b/sources/Example_task_dep.1.f90 similarity index 100% rename from sources/Example_task_dep.1f.f rename to sources/Example_task_dep.1.f90 diff --git a/sources/Example_task_dep.2c.c b/sources/Example_task_dep.2.c similarity index 100% rename from sources/Example_task_dep.2c.c rename to sources/Example_task_dep.2.c diff --git a/sources/Example_task_dep.2f.f b/sources/Example_task_dep.2.f90 similarity index 100% rename from sources/Example_task_dep.2f.f rename to sources/Example_task_dep.2.f90 diff --git a/sources/Example_task_dep.3c.c b/sources/Example_task_dep.3.c similarity index 100% rename from sources/Example_task_dep.3c.c rename to sources/Example_task_dep.3.c diff --git a/sources/Example_task_dep.3f.f b/sources/Example_task_dep.3.f90 similarity index 100% rename from sources/Example_task_dep.3f.f rename to sources/Example_task_dep.3.f90 diff --git a/sources/Example_task_dep.4c.c b/sources/Example_task_dep.4.c similarity index 100% rename from sources/Example_task_dep.4c.c rename to sources/Example_task_dep.4.c diff --git a/sources/Example_task_dep.4f.f b/sources/Example_task_dep.4.f90 similarity index 100% rename from sources/Example_task_dep.4f.f rename to sources/Example_task_dep.4.f90 diff --git 
a/sources/Example_task_dep.5c.c b/sources/Example_task_dep.5.c similarity index 100% rename from sources/Example_task_dep.5c.c rename to sources/Example_task_dep.5.c diff --git a/sources/Example_task_dep.5f.f b/sources/Example_task_dep.5.f90 similarity index 100% rename from sources/Example_task_dep.5f.f rename to sources/Example_task_dep.5.f90 diff --git a/sources/Example_task_priority.1.c b/sources/Example_task_priority.1.c new file mode 100644 index 0000000..40c262f --- /dev/null +++ b/sources/Example_task_priority.1.c @@ -0,0 +1,21 @@ +/* +* @@name: task_priority.1c +* @@type: C +* @@compilable: yes +* @@linkable: no +* @@expect: success +*/ +void compute_array (float *node, int M); + +void compute_matrix (float *array, int N, int M) +{ + int i; + #pragma omp parallel private(i) + #pragma omp single + { + for (i=0;i HEAD + DO + !$OMP TASK + ! P is firstprivate by default + CALL PROCESS(P) + !$OMP END TASK + P => P%NEXT + IF ( .NOT. ASSOCIATED (P) ) EXIT + END DO + !$OMP END SINGLE + !$OMP END PARALLEL + END SUBROUTINE + END MODULE diff --git a/sources/Example_tasking.3f.f b/sources/Example_tasking.3f.f deleted file mode 100644 index 0a96a58..0000000 --- a/sources/Example_tasking.3f.f +++ /dev/null @@ -1,33 +0,0 @@ -! @@name: tasking.3f -! @@type: F-fixed -! @@compilable: yes -! @@linkable: no -! @@expect: success - MODULE LIST - TYPE NODE - INTEGER :: PAYLOAD - TYPE (NODE), POINTER :: NEXT - END TYPE NODE - CONTAINS - SUBROUTINE PROCESS(p) - TYPE (NODE), POINTER :: P - ! do work here - END SUBROUTINE - SUBROUTINE INCREMENT_LIST_ITEMS (HEAD) - TYPE (NODE), POINTER :: HEAD - TYPE (NODE), POINTER :: P - !$OMP PARALLEL PRIVATE(P) - !$OMP SINGLE - P => HEAD - DO - !$OMP TASK - ! P is firstprivate by default - CALL PROCESS(P) - !$OMP END TASK - P => P%NEXT - IF ( .NOT. 
ASSOCIATED (P) ) EXIT - END DO - !$OMP END SINGLE - !$OMP END PARALLEL - END SUBROUTINE - END MODULE diff --git a/sources/Example_tasking.4c.c b/sources/Example_tasking.4.c similarity index 100% rename from sources/Example_tasking.4c.c rename to sources/Example_tasking.4.c diff --git a/sources/Example_tasking.4f.f b/sources/Example_tasking.4.f similarity index 100% rename from sources/Example_tasking.4f.f rename to sources/Example_tasking.4.f diff --git a/sources/Example_tasking.5c.c b/sources/Example_tasking.5.c similarity index 100% rename from sources/Example_tasking.5c.c rename to sources/Example_tasking.5.c diff --git a/sources/Example_tasking.5f.f b/sources/Example_tasking.5.f similarity index 100% rename from sources/Example_tasking.5f.f rename to sources/Example_tasking.5.f diff --git a/sources/Example_tasking.6c.c b/sources/Example_tasking.6.c similarity index 100% rename from sources/Example_tasking.6c.c rename to sources/Example_tasking.6.c diff --git a/sources/Example_tasking.6f.f b/sources/Example_tasking.6.f similarity index 100% rename from sources/Example_tasking.6f.f rename to sources/Example_tasking.6.f diff --git a/sources/Example_tasking.7c.c b/sources/Example_tasking.7.c similarity index 100% rename from sources/Example_tasking.7c.c rename to sources/Example_tasking.7.c diff --git a/sources/Example_tasking.7f.f b/sources/Example_tasking.7.f similarity index 100% rename from sources/Example_tasking.7f.f rename to sources/Example_tasking.7.f diff --git a/sources/Example_tasking.8c.c b/sources/Example_tasking.8.c similarity index 100% rename from sources/Example_tasking.8c.c rename to sources/Example_tasking.8.c diff --git a/sources/Example_tasking.8f.f b/sources/Example_tasking.8.f similarity index 100% rename from sources/Example_tasking.8f.f rename to sources/Example_tasking.8.f diff --git a/sources/Example_tasking.9c.c b/sources/Example_tasking.9.c similarity index 100% rename from sources/Example_tasking.9c.c rename to sources/Example_tasking.9.c diff --git a/sources/Example_tasking.9f.f b/sources/Example_tasking.9.f similarity index 100% rename from sources/Example_tasking.9f.f rename to sources/Example_tasking.9.f diff --git a/sources/Example_taskloop.1.c b/sources/Example_taskloop.1.c new file mode 100644 index 0000000..ed76212 --- /dev/null +++ b/sources/Example_taskloop.1.c @@ -0,0 +1,25 @@ +/* +* @@name: taskloop.c +* @@type: C +* @@compilable: yes +* @@linkable: no +* @@expect: success +*/ +void long_running_task(void); +void loop_body(int i, int j); + +void parallel_work(void) { + int i, j; +#pragma omp taskgroup + { +#pragma omp task + long_running_task(); // can execute concurrently + +#pragma omp taskloop private(j) grainsize(500) nogroup + for (i = 0; i < 10000; i++) { // can execute concurrently + for (j = 0; j < i; j++) { + loop_body(i, j); + } + } + } +} diff --git a/sources/Example_taskloop.1.f90 b/sources/Example_taskloop.1.f90 new file mode 100644 index 0000000..5f23b20 --- /dev/null +++ b/sources/Example_taskloop.1.f90 @@ -0,0 +1,24 @@ +! @@name: taskloop.1f +! @@type: F-free +! @@compilable: yes +! @@linkable: no +! 
@@expect: success +subroutine parallel_work + integer i + integer j +!$omp taskgroup + +!$omp task + call long_running_task() +!$omp end task + +!$omp taskloop private(j) grainsize(500) nogroup + do i=1,10000 + do j=1,i + call loop_body(i, j) + end do + end do +!$omp end taskloop + +!$omp end taskgroup +end subroutine diff --git a/sources/Example_taskyield.1c.c b/sources/Example_taskyield.1.c similarity index 100% rename from sources/Example_taskyield.1c.c rename to sources/Example_taskyield.1.c diff --git a/sources/Example_taskyield.1f.f b/sources/Example_taskyield.1.f90 similarity index 100% rename from sources/Example_taskyield.1f.f rename to sources/Example_taskyield.1.f90 diff --git a/sources/Example_teams.1c.c b/sources/Example_teams.1.c similarity index 74% rename from sources/Example_teams.1c.c rename to sources/Example_teams.1.c index a9429f7..bfcee5e 100644 --- a/sources/Example_teams.1c.c +++ b/sources/Example_teams.1.c @@ -11,7 +11,7 @@ float dotprod(float B[], float C[], int N) { float sum0 = 0.0; float sum1 = 0.0; - #pragma omp target map(to: B[:N], C[:N]) + #pragma omp target map(to: B[:N], C[:N]) map(tofrom: sum0, sum1) #pragma omp teams num_teams(2) { int i; @@ -32,3 +32,7 @@ float dotprod(float B[], float C[], int N) } return sum0 + sum1; } + +/* Note: The variables sum0,sum1 are now mapped with tofrom, for correct + execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro. + */ diff --git a/sources/Example_teams.1f.f b/sources/Example_teams.1.f90 similarity index 79% rename from sources/Example_teams.1f.f rename to sources/Example_teams.1.f90 index 33da275..5318263 100644 --- a/sources/Example_teams.1f.f +++ b/sources/Example_teams.1.f90 @@ -9,7 +9,7 @@ use omp_lib, ONLY : omp_get_num_teams, omp_get_team_num integer :: N, i sum0 = 0.0e0 sum1 = 0.0e0 - !$omp target map(to: B, C) + !$omp target map(to: B, C) map(tofrom: sum0, sum1) !$omp teams num_teams(2) if (omp_get_num_teams() /= 2) stop "2 teams required" if (omp_get_team_num() == 0) then @@ -27,3 +27,6 @@ use omp_lib, ONLY : omp_get_num_teams, omp_get_team_num !$omp end target sum = sum0 + sum1 end function + +! Note: The variables sum0,sum1 are now mapped with tofrom, for correct +! execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro. diff --git a/sources/Example_teams.2c.c b/sources/Example_teams.2.c similarity index 64% rename from sources/Example_teams.2c.c rename to sources/Example_teams.2.c index 20c389a..6ec1a45 100644 --- a/sources/Example_teams.2c.c +++ b/sources/Example_teams.2.c @@ -5,12 +5,14 @@ * @@linkable: no * @@expect: success */ +#define min(x, y) (((x) < (y)) ? (x) : (y)) + float dotprod(float B[], float C[], int N, int block_size, int num_teams, int block_threads) { - float sum = 0; + float sum = 0.0; int i, i0; - #pragma omp target map(to: B[0:N], C[0:N]) + #pragma omp target map(to: B[0:N], C[0:N]) map(tofrom: sum) #pragma omp teams num_teams(num_teams) thread_limit(block_threads) \ reduction(+:sum) #pragma omp distribute @@ -20,3 +22,7 @@ float dotprod(float B[], float C[], int N, int block_size, sum += B[i] * C[i]; return sum; } + +/* Note: The variable sum is now mapped with tofrom, for correct + execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro. 
+ */ diff --git a/sources/Example_teams.2f.f b/sources/Example_teams.2.f90 similarity index 77% rename from sources/Example_teams.2f.f rename to sources/Example_teams.2.f90 index 2777d51..ec9eeee 100644 --- a/sources/Example_teams.2f.f +++ b/sources/Example_teams.2.f90 @@ -8,7 +8,7 @@ implicit none real :: B(N), C(N), sum integer :: N, block_size, num_teams, block_threads, i, i0 sum = 0.0e0 - !$omp target map(to: B, C) + !$omp target map(to: B, C) map(tofrom: sum) !$omp teams num_teams(num_teams) thread_limit(block_threads) & !$omp& reduction(+:sum) !$omp distribute @@ -21,3 +21,6 @@ implicit none !$omp end teams !$omp end target end function + +! Note: The variable sum is now mapped with tofrom, for correct +! execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro. diff --git a/sources/Example_teams.3.c b/sources/Example_teams.3.c new file mode 100644 index 0000000..f073020 --- /dev/null +++ b/sources/Example_teams.3.c @@ -0,0 +1,23 @@ +/* +* @@name: teams.3c +* @@type: C +* @@compilable: yes +* @@linkable: no +* @@expect: success +*/ +float dotprod(float B[], float C[], int N) +{ + float sum = 0; + int i; + #pragma omp target teams map(to: B[0:N], C[0:N]) \ + defaultmap(tofrom:scalar) reduction(+:sum) + #pragma omp distribute parallel for reduction(+:sum) + for (i=0; i