OpenMP-Examples/Examples_async_target.tex

49 lines
2.6 KiB
TeX

\pagebreak
\chapter{Asynchronous Execution of a \code{target} Region Using Tasks}
\label{chap:async_target}
The following example shows how the \code{task} and \code{target} constructs
are used to execute multiple \code{target} regions asynchronously. The task that
encounters the \code{task} construct generates an explicit task that contains
a \code{target} region. The thread executing the explicit task encounters a task
scheduling point while waiting for the execution of the \code{target} region
to complete, allowing the thread to switch back to the execution of the encountering
task or one of the previously generated explicit tasks.
\cexample{async_target}{1c}
The Fortran version has an interface block that contains the \code{declare} \code{target}.
An identical statement exists in the function declaration (not shown here).
\fexample{async_target}{1f}
The following example shows how the \code{task} and \code{target} constructs
are used to execute multiple \code{target} regions asynchronously. The task dependence
ensures that the storage is allocated and initialized on the device before it is
accessed.
\cexample{async_target}{2c}
The Fortran example below is similar to the C version above. Instead of pointers, though, it uses
the convenience of Fortran allocatable arrays on the device. An allocatable array has the
same behavior in a \code{map} clause as a C pointer, in this case.
If there is no shape specified for an allocatable array in a \code{map} clause, only the array descriptor
(also called a dope vector) is mapped. That is, device space is created for the descriptor, and it
is initially populated with host values. In this case, the \plc{v1} and \plc{v2} arrays will be in a
non-associated state on the device. When space for \plc{v1} and \plc{v2} is allocated on the device
the addresses to the space will be included in their descriptors.
At the end of the device region, the descriptor (of an unshaped specification of an allocatable
array in a \code{map} clause) is returned with the raw device address of the allocated space.
The content of the array is not returned. In the example the data in arrays \plc{v1} and \plc{v2}
are not returned. In the second target directive, the \plc{v1} and \plc{v2} descriptors are
re-created on the device with the descriptive information; and references to the
vectors point to the correct local storage, of the space that was not freed in the first target
directive. At the end of the \code{target} region the data in array \plc{p} is copied back
to the host, since \plc{p} is not an allocatable array.
\fexample{async_target}{2f}