mirror of
https://github.com/OpenMP/Examples.git
synced 2025-04-10 08:22:11 +01:00
55 lines
3.1 KiB
TeX
55 lines
3.1 KiB
TeX
\pagebreak
|
|
\chapter{Asynchronous Execution of a \code{target} Region Using Tasks}
|
|
\label{chap:async_target}
|
|
|
|
The following example shows how the \code{task} and \code{target} constructs
|
|
are used to execute multiple \code{target} regions asynchronously. The task that
|
|
encounters the \code{task} construct generates an explicit task that contains
|
|
a \code{target} region. The thread executing the explicit task encounters a task
|
|
scheduling point while waiting for the execution of the \code{target} region
|
|
to complete, allowing the thread to switch back to the execution of the encountering
|
|
task or one of the previously generated explicit tasks.
|
|
|
|
\cexample{async_target}{1c}
|
|
|
|
The Fortran version has an interface block that contains the \code{declare} \code{target}.
|
|
An identical statement exists in the function declaration (not shown here).
|
|
|
|
\fexample{async_target}{1f}
|
|
|
|
The following example shows how the \code{task} and \code{target} constructs
|
|
are used to execute multiple \code{target} regions asynchronously. The task dependence
|
|
ensures that the storage is allocated and initialized on the device before it is
|
|
accessed.
|
|
|
|
\cexample{async_target}{2c}
|
|
|
|
The Fortran example below is similar to the C version above. Instead of pointers, though, it uses
|
|
the convenience of Fortran allocatable arrays on the device. An allocatable array has the
|
|
same behavior in a \code{map} clause as a C pointer, in this case.
|
|
|
|
If there is no shape specified for an allocatable array in a \code{map} clause, only the array descriptor
|
|
(also called a dope vector) is mapped. That is, device space is created for the descriptor, and it
|
|
is initially populated with host values. In this case, the \plc{v1} and \plc{v2} arrays will be in a
|
|
non-associated state on the device. When space for \plc{v1} and \plc{v2} is allocated on the device
|
|
the addresses to the space will be included in their descriptors.
|
|
|
|
At the end of the first \code{target} region, the descriptor (of an unshaped specification of an allocatable
|
|
array in a \code{map} clause) is returned with the raw device address of the allocated space.
|
|
The content of the array is not returned. In the example the data in arrays \plc{v1} and \plc{v2}
|
|
are not returned. In the second \code{target} directive, the \plc{v1} and \plc{v2} descriptors are
|
|
re-created on the device with the descriptive information; and references to the
|
|
vectors point to the correct local storage, of the space that was not freed in the first \code{target}
|
|
directive. At the end of the second \code{target} region, the data in array \plc{p} is copied back
|
|
to the host since \plc{p} is not an allocatable array.
|
|
|
|
A \code{depend} clause is used in the \code{task} directive to provide a wait at the beginning of the second
|
|
\code{target} region, to insure that there is no race condition with \plc{v1} and \plc{v2} in the two tasks.
|
|
It would be noncompliant to use \plc{v1} and/or \plc{v2} in lieu of \plc{N} in the \code{depend} clauses,
|
|
because the use of non-allocated allocatable arrays as list items in the first \code{depend} clause would
|
|
lead to unspecified behavior.
|
|
|
|
\fexample{async_target}{2f}
|
|
|
|
|