OpenMP-Examples/devices/async_target_with_tasks.tex
2024-04-16 08:59:23 -07:00

69 lines
3.9 KiB
TeX

\subsection{Asynchronous \kcode{target} with Tasks}
\label{subsec:async_target_with_tasks}
\index{target construct@\kcode{target} construct}
\index{task construct@\kcode{task} construct}
\index{directives!declare target@\kcode{declare target}}
\index{declare target directive@\kcode{declare target} directive}
\index{directives!begin declare target@\kcode{begin declare target}}
\index{begin declare target directive@\kcode{begin declare target} directive}
The following example shows how the \kcode{task} and \kcode{target} constructs
are used to execute multiple \kcode{target} regions asynchronously. The task that
encounters the \kcode{task} construct generates an explicit task that contains
a \kcode{target} region. The thread executing the explicit task encounters a task
scheduling point while waiting for the execution of the \kcode{target} region
to complete, allowing the thread to switch back to the execution of the encountering
task or one of the previously generated explicit tasks.
\cexample[5.1]{async_target}{1}
\pagebreak
\index{directives!declare target@\kcode{declare target}}
\index{declare target directive@\kcode{declare target} directive}
The Fortran version has an interface block that contains the \kcode{declare target}.
An identical statement exists in the function declaration (not shown here).
\ffreeexample[4.0]{async_target}{1}
The following example shows how the \kcode{task} and \kcode{target} constructs
are used to execute multiple \kcode{target} regions asynchronously. The task dependence
ensures that the storage is allocated and initialized on the device before it is
accessed.
\cexample[5.1]{async_target}{2}
The Fortran example below is similar to the C version above. Instead of pointers, though, it uses
the convenience of Fortran allocatable arrays on the device. In order to preserve the arrays
allocated on the device across multiple \kcode{target} regions, a \kcode{target data} region
is used in this case.
If there is no shape specified for an allocatable array in a \kcode{map} clause, only the array descriptor
(also called a dope vector) is mapped. That is, device space is created for the descriptor, and it
is initially populated with host values. In this case, the \ucode{v1} and \ucode{v2} arrays will be in a
non-associated state on the device. When space for \ucode{v1} and \ucode{v2} is allocated on the device
in the first \kcode{target} region the addresses to the space will be included in their descriptors.
At the end of the first \kcode{target} region, the arrays \ucode{v1} and \ucode{v2} are preserved on the device
for access in the second \kcode{target} region. At the end of the second \kcode{target} region, the data
in array \ucode{p} is copied back, the arrays \ucode{v1} and \ucode{v2} are not.
\index{task construct@\kcode{task} construct!depend clause@\kcode{depend} clause}
\index{clauses!depend@\kcode{depend}}
\index{depend clause@\kcode{depend} clause}
A \kcode{depend} clause is used in the \kcode{task} directive to provide a wait at the beginning of the second
\kcode{target} region, to insure that there is no race condition with \ucode{v1} and \ucode{v2} in the two tasks.
It would be noncompliant to use \ucode{v1} and/or \ucode{v2} in lieu of \ucode{N} in the \kcode{depend} clauses,
because the use of non-allocated allocatable arrays as list items in a \kcode{depend} clause would
lead to unspecified behavior.
\noteheader{--} This example is not strictly compliant with the OpenMP 4.5 specification since the allocation status
of allocatable arrays \ucode{v1} and \ucode{v2} is changed inside the \kcode{target} region, which is not allowed.
(See the restrictions for the \kcode{map} clause in the \docref{Data-mapping Attribute Rules and Clauses}
section of the specification.)
However, the intention is to relax the restrictions on mapping of allocatable variables in the next release
of the specification so that the example will be compliant.
\ffreeexample[4.0]{async_target}{2}