OpenMP-Examples/devices/async_target_nowait.tex
2022-04-18 15:02:25 -07:00

35 lines
1.7 KiB
TeX

\subsection{\code{nowait} Clause on \code{target} Construct}
\label{subsec:target_nowait_clause}
\index{target construct@\code{target} construct!nowait clause@\code{nowait} clause}
\index{clauses!nowait@\code{nowait}}
\index{nowait clause@\code{nowait} clause}
The following example shows how to execute code asynchronously on a
device without an explicit task. The \code{nowait} clause on a \code{target}
construct allows the thread of the \plc{target task} to perform other
work while waiting for the \code{target} region execution to complete.
Hence, the \code{target} region can execute asynchronously on the
device (without requiring a host thread to idle while waiting for
the \plc{target task} execution to complete).
In this example the product of two vectors (arrays), \plc{v1}
and \plc{v2}, is formed. One half of the operations is performed
on the device, and the last half on the host, concurrently.
After a team of threads is formed the primary thread generates
the \plc{target task} while the other threads can continue on, without a barrier,
to the execution of the host portion of the vector product.
The completion of the \plc{target task} (asynchronous target execution) is
guaranteed by the synchronization in the implicit barrier at the end of the
host vector-product worksharing loop region. See the \code{barrier}
glossary entry in the OpenMP specification for details.
The host loop scheduling is \code{dynamic}, to balance the host thread executions, since
one thread is being used for offload generation. In the situation where
little time is spent by the \plc{target task} in setting
up and tearing down the target execution, \code{static} scheduling may be desired.
\cexample[5.1]{async_target}{3}
\ffreeexample[5.1]{async_target}{3}