\subsection{\kcode{nowait} Clause on \kcode{target} Construct}
\label{subsec:target_nowait_clause}
\index{target construct@\kcode{target} construct!nowait clause@\kcode{nowait} clause}
\index{clauses!nowait@\kcode{nowait}}
\index{nowait clause@\kcode{nowait} clause}
The following example shows how to execute code asynchronously on a
device without an explicit task. The \kcode{nowait} clause on a \kcode{target}
construct allows the thread that encounters the construct to perform other
work while the \kcode{target} region executes, instead of waiting for
the \plc{target task} to complete.
Hence, the \kcode{target} region can execute asynchronously on the
device, without requiring a host thread to idle until the target task
completes.
In this example, the element-wise product of two vectors (arrays), \ucode{v1}
and \ucode{v2}, is formed. The first half of the operations is performed
on the device and the second half on the host, concurrently.
After a team of threads is formed, the primary thread generates
the target task while the other threads continue on, without a barrier,
to the execution of the host portion of the vector product.
The completion of the target task (asynchronous target execution) is
guaranteed by the synchronization in the implicit barrier at the end of the
host vector-product worksharing loop region. See the \kcode{barrier}
glossary entry in the OpenMP specification for details.
The host loop uses \kcode{dynamic} scheduling to balance the work among the
host threads, since one thread is also used to generate the offload. If the
target task spends little time setting up and tearing down the target
execution, \kcode{static} scheduling may be preferable.
\cexample[5.1]{async_target}{3}
\ffreeexample[5.1]{async_target}{3}
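
The pattern described in this subsection can be sketched in a condensed form as follows.
This is only an illustrative sketch, not the referenced example source: the function
name \ucode{vecmult}, its argument list, and the chunk size of 1000 are assumptions
made for illustration. It assumes a compiler that supports the OpenMP 5.1
\kcode{masked} construct; without a device, or without OpenMP enabled, the code
falls back to host execution and produces the same result.

```c
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions for illustration):
 * computes vxv[i] = v1[i] * v2[i] for i in [0, n).
 * The first half is offloaded asynchronously; the second half runs on the host. */
void vecmult(int n, const float *v1, const float *v2, float *vxv)
{
    int half = n / 2;

    #pragma omp parallel
    {
        /* Only the primary thread generates the target task. With nowait the
         * task is deferrable, so the primary thread does not wait for the
         * device. The masked construct has no implied barrier, so the other
         * threads proceed directly to the host loop. */
        #pragma omp masked
        #pragma omp target teams distribute parallel for nowait \
                map(to: v1[0:half], v2[0:half]) map(from: vxv[0:half])
        for (int i = 0; i < half; i++)
            vxv[i] = v1[i] * v2[i];

        /* Host portion. The chunk size 1000 is an assumed value. */
        #pragma omp for schedule(dynamic, 1000)
        for (int i = half; i < n; i++)
            vxv[i] = v1[i] * v2[i];
        /* The implicit barrier at the end of the worksharing loop ensures
         * that the target task has completed before any thread continues. */
    }
}
```

After \ucode{vecmult} returns, both halves of \ucode{vxv} are complete, because the
parallel region cannot end before the target task has finished.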