%\pagebreak
\section{Device Routines}
\label{sec:device}

\subsection{\kcode{omp_is_initial_device} Routine}
\label{subsec:device_is_initial}
\index{routines!omp_is_initial_device@\kcode{omp_is_initial_device}}
\index{omp_is_initial_device routine@\kcode{omp_is_initial_device} routine}

\index{directives!declare target@\kcode{declare target}}
\index{declare target directive@\kcode{declare target} directive}

\index{directives!begin declare target@\kcode{begin declare target}}
\index{begin declare target directive@\kcode{begin declare target} directive}

The following example shows how the \kcode{omp_is_initial_device} runtime library routine
can be used to query whether code is executing on the initial host device or on a
target device. The example then sets the number of threads in the \kcode{parallel}
region based on where the code is executing.

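As a minimal sketch of the idea described above (it is not the example file itself),
the following C fragment chooses a thread count from the value returned by
\kcode{omp_is_initial_device}. The helper names \ucode{pick_num_threads} and
\ucode{chosen_in_target}, the counts 8 and 64, and the fallback stub used when
the code is built without OpenMP support are all assumptions made for illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed fallback for builds without OpenMP: everything runs on the host. */
static int omp_is_initial_device(void) { return 1; }
#endif

/* Hypothetical helper, callable from host code and from target regions. */
#pragma omp declare target
int pick_num_threads(void)
{
    /* A nonzero result means the caller is executing on the initial (host) device. */
    return omp_is_initial_device() ? 8 : 64;
}
#pragma omp end declare target

/* Hypothetical helper: returns the thread count chosen inside a target region. */
int chosen_in_target(void)
{
    int nthreads = 0;

    #pragma omp target map(from: nthreads)
    {
        nthreads = pick_num_threads();
        #pragma omp parallel num_threads(nthreads)
        {
            /* Work sized for the executing device would go here. */
        }
    }
    return nthreads;
}
```

When no target device is available, the \kcode{target} region is expected to fall back
to the host, so \ucode{chosen_in_target} would then report the host count.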
\cexample[5.1]{device}{1}

\ffreeexample[4.0]{device}{1}

\subsection{\kcode{omp_get_num_devices} Routine}
\label{subsec:device_num_devices}

The following example shows how the \kcode{omp_get_num_devices} runtime library routine
can be used to determine the number of devices.

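One common use of the device count is to decide whether to offload at all.
The C fragment below is a sketch under that assumption: the helper name
\ucode{scale} is hypothetical, and the stub for builds without OpenMP support
is an illustration only.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed fallback for builds without OpenMP: no target devices. */
static int omp_get_num_devices(void) { return 0; }
#endif

/* Hypothetical helper: scales v in place, offloading only if a device exists.
   Returns the number of devices that was reported. */
int scale(float *v, int n, float s)
{
    int ndev = omp_get_num_devices();

    /* When the if clause evaluates to false, the region executes on the host. */
    #pragma omp target if(ndev > 0) map(tofrom: v[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        v[i] *= s;

    return ndev;
}
```

The result in \ucode{v} is the same whether or not the loop is offloaded; only the
place of execution changes.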
\cexample[4.0]{device}{2}

\ffreeexample[4.0]{device}{2}

\subsection{\kcode{omp_set_default_device} and \\
\kcode{omp_get_default_device} Routines}
\label{subsec:device_is_set_get_default}
\index{routines!omp_set_default_device@\kcode{omp_set_default_device}}
\index{omp_set_default_device routine@\kcode{omp_set_default_device} routine}

The following example shows how the \kcode{omp_set_default_device} and \kcode{omp_get_default_device}
runtime library routines can be used to set the default device and determine the
default device, respectively.

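The following C fragment is a minimal sketch of a save, set, and restore pattern
built on these two routines. The helper name \ucode{run_on_default} is hypothetical,
and the stubs used for builds without OpenMP support are assumptions for illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed fallbacks for builds without OpenMP: the setting is held in a static. */
static int default_dev_stub = 0;
static void omp_set_default_device(int d) { default_dev_stub = d; }
static int omp_get_default_device(void) { return default_dev_stub; }
#endif

/* Hypothetical helper: makes dev the default device, runs a target region
   without a device clause (so the default device is used), and then restores
   the previous default. Returns the default device in effect for the region. */
int run_on_default(int dev, int *flag)
{
    int saved = omp_get_default_device();

    omp_set_default_device(dev);
    int in_effect = omp_get_default_device();

    #pragma omp target map(tofrom: flag[0:1])
    flag[0] = 1;

    omp_set_default_device(saved);
    return in_effect;
}
```

Restoring the saved value keeps the change local to the helper, which may matter
when other code relies on the previous default device.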
\cexample[4.0]{device}{3}

\ffreeexample[4.0]{device}{3}


%-------------
\input{devices/target_associate_ptr}
%-------------

\subsection{Target Memory and Device Pointers Routines}
\label{subsec:target_mem_and_device_ptrs}
\index{routines!omp_target_alloc@\kcode{omp_target_alloc}}
\index{omp_target_alloc routine@\kcode{omp_target_alloc} routine}
\index{routines!omp_target_memcpy@\kcode{omp_target_memcpy}}
\index{omp_target_memcpy routine@\kcode{omp_target_memcpy} routine}
\index{routines!omp_target_free@\kcode{omp_target_free}}
\index{omp_target_free routine@\kcode{omp_target_free} routine}

The following example shows how to create space on a device, transfer data
to and from that space, and free the space, using API calls. The API calls
directly execute allocation, copy, and free operations on the device, without invoking
any mapping through a \kcode{target} directive. The \kcode{omp_target_alloc} routine allocates space
and returns a device pointer for referencing the space in the \kcode{omp_target_memcpy}
API routine on the host. The \kcode{omp_target_free} routine frees the space on the device.

\index{target construct@\kcode{target} construct!is_device_ptr clause@\kcode{is_device_ptr} clause}
\index{is_device_ptr clause@\kcode{is_device_ptr} clause}
\index{clauses!is_device_ptr@\kcode{is_device_ptr}}
The example also illustrates how to access that space
in a \kcode{target} region by exposing the device pointer in an \kcode{is_device_ptr} clause.

The example creates an array of cosine values on the default device, to be used
on the host device. The function fails if a default device is not available.

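The allocate, compute, copy, and free sequence can be sketched in C as follows.
This is an illustration under stated assumptions, not the example file: the helper
name \ucode{fill_from_device} is hypothetical, the values \ucode{2*i} stand in for
the cosine values so that no math library is needed, and the stubs used for builds
without OpenMP support are assumptions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed fallbacks for builds without OpenMP: the host is the only device. */
static int omp_get_initial_device(void) { return 0; }
static int omp_get_default_device(void) { return 0; }
static void *omp_target_alloc(size_t size, int dev) { (void)dev; return malloc(size); }
static void omp_target_free(void *p, int dev) { (void)dev; free(p); }
static int omp_target_memcpy(void *dst, const void *src, size_t len,
                             size_t dst_off, size_t src_off,
                             int dst_dev, int src_dev)
{
    (void)dst_dev; (void)src_dev;
    memcpy((char *)dst + dst_off, (const char *)src + src_off, len);
    return 0;
}
#endif

/* Hypothetical helper: computes 2*i on the default device and copies the
   values into host[0:n]. Returns 0 on success, nonzero on failure. */
int fill_from_device(double *host, int n)
{
    int h = omp_get_initial_device();
    int d = omp_get_default_device();
    size_t bytes = (size_t)n * sizeof(double);

    double *dev = (double *)omp_target_alloc(bytes, d);
    if (dev == NULL)
        return 1;   /* no usable storage on the default device */

    /* dev already holds a device address, so it is not mapped. */
    #pragma omp target is_device_ptr(dev) device(d)
    for (int i = 0; i < n; i++)
        dev[i] = 2.0 * i;

    /* Destination: host buffer on device h. Source: device buffer on device d. */
    int rc = omp_target_memcpy(host, dev, bytes, 0, 0, h, d);

    omp_target_free(dev, d);
    return rc;
}
```

A caller should check the return value before reading \ucode{host}, since the
allocation may fail when the default device is not available.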
\cexample[4.5]{device}{4}

\index{routines!omp_target_is_present@\kcode{omp_target_is_present}}
\index{omp_target_is_present routine@\kcode{omp_target_is_present} routine}
\index{routines!omp_target_associate_ptr@\kcode{omp_target_associate_ptr}}
\index{omp_target_associate_ptr routine@\kcode{omp_target_associate_ptr} routine}
The following Fortran example illustrates how to use the \kcode{omp_target_alloc}
and \kcode{omp_target_memcpy} functions to directly allocate device
storage and transfer data to and from a device. It also shows how to check for
the presence of device data with the \kcode{omp_target_is_present} function and
to associate host and device storage with the \kcode{omp_target_associate_ptr} function.

In Section 1 of the code, 40 bytes of storage are allocated on the default device
with the \kcode{omp_target_alloc} function, which returns a value (of type
\bcode{C_PTR}) that contains the device address of the storage.
%A Fortran pointer (\ucode{fp}) is associated by the Fortran \splc{iso_c_binding}
%\bcode{c_f_pointer} routine with the target of the C pointer (\ucode{cp}).
In the subsequent \kcode{target} construct, \ucode{cp} is specified on the
\kcode{is_device_ptr} clause to inform the compiler that \ucode{cp} is
a device pointer.
The device pointer (\ucode{cp}) is then associated with the Fortran pointer
(\ucode{fp}) via the \bcode{c_f_pointer} routine inside the \kcode{target}
construct.
As a result, \ucode{fp} points to the storage on the device that is allocated
by the \kcode{omp_target_alloc} routine.
In the \kcode{target} region, the value 4 is assigned to the storage on the device,
using the Fortran pointer.
A trivial test checks that all values were correctly assigned.
The Fortran pointer (\ucode{fp}) is nullified before the end of the \kcode{target} region.
After the \kcode{target} construct, the space on the device is freed with the
\kcode{omp_target_free} function, using the device pointer \ucode{cp},
which is set to null after the call.

In Section 2, the content of the storage allocated on the host is directly copied
to the OpenMP-allocated storage on the device.
First, storage is allocated for the device and host using \kcode{omp_target_alloc}.
Next, on the host, the device pointer returned by the
\kcode{omp_target_alloc} function is associated with a Fortran pointer, and
values are assigned to the storage. Similarly, values are assigned on the device
to the device storage, after associating a Fortran pointer (\ucode{fp_dst})
with the device's storage pointer (\ucode{cp_dst}).

Next, the \kcode{omp_target_memcpy} function directly copies the host data
to the device storage, specified by the respective host and device pointers.
This copy overwrites the -1 values in the device storage, and is checked in the
next \kcode{target} construct.
Keyword arguments are used here for clarity.
(A positional argument list is used in the next section.)

In Section 3, space is allocated (with a Fortran \bcode{ALLOCATE} statement) and initialized using a
host Fortran pointer (\ucode{h_fp}), and the address of the storage is directly assigned to a
host C pointer (\ucode{h_cp}).
The subsequent \kcode{omp_target_is_present} call returns \ucode{0} (\plc{false}, of \bcode{integer(C_INT)} type)
to indicate that \ucode{h_cp} does not have any corresponding storage on the default device.

Next, the same amount of space is allocated on the default device with
the \kcode{omp_target_alloc} function, which returns a device pointer (\ucode{d_cp}).
The device pointer \ucode{d_cp} and host pointer \ucode{h_cp}
are then associated using the \kcode{omp_target_associate_ptr} function.
The device storage to which \ucode{d_cp} points becomes the corresponding storage of
the host storage to which \ucode{h_cp} points.
The subsequent \kcode{omp_target_is_present} call confirms this by returning
a non-zero value of \bcode{integer(C_INT)} type for true.

After the association, the content of the host storage
is copied to the device using the \kcode{omp_target_memcpy} function.
In the final \kcode{target} construct, an array section of \ucode{h_fp}
is mapped to the device and evaluated for correctness.
The mapping establishes a connection of \ucode{h_fp} with
the corresponding device data in the \kcode{target} construct,
but does not produce an update on the device.
This is because the previous \kcode{omp_target_associate_ptr} call sets the
reference count of the mapped object to infinity, so a mapping
without the \kcode{always} modifier will not
update the device object.

\ffreeexample[5.0]{device}{4}

\index{routines!omp_target_memcpy_async@\kcode{omp_target_memcpy_async}}
\index{omp_target_memcpy_async routine@\kcode{omp_target_memcpy_async} routine}
The following example illustrates the use of the \kcode{omp_target_memcpy_async}
routine to perform asynchronous memory copies.
The routine acts as if it is a deferrable task, so
a \kcode{taskwait} construct can be used to wait for the completion
of the deferrable task.
In the example, the \kcode{omp_target_memcpy_async} routine copies host data
(\ucode{h_buf}) to the device (\ucode{d_buf}).
The Fortran code uses the intrinsic \bcode{c_loc} function to get
the corresponding C pointer (\ucode{c_hbuf}) for
passing to the \kcode{omp_target_memcpy_async} routine.
The last two arguments (\ucode{0} and \ucode{NULL}) to the routine
indicate that there is no specified dependence associated with the call.
The Fortran code omits the unused last argument.

\cexample[5.2]{device}{5}
\ffreeexample[5.2]{device}{5}

\index{directives!depobj@\kcode{depobj}}
\index{depobj directive@\kcode{depobj} directive}
The following is a more complicated example that shows the use of
the \kcode{omp_target_memcpy_async} routine with a depend object \ucode{obj} to
overlap the memory copy with computation performed by \ucode{do_work}.
The depend object \ucode{obj} is created in advance by the \kcode{depobj} directive
and initialized to an \kcode{out} dependence on the data \ucode{d_buf[0:N]}
(or \ucode{d_buf(1:N)} for Fortran).
The \kcode{depend(depobj: \ucode{obj})} (or alternatively
\kcode{depend(in: \ucode{d_buf[0:N]})}) clause on the \kcode{target} construct
ensures the asynchronous memory copy is complete before the data \ucode{d_buf}
can be used in the \kcode{target} region.

\cexample[5.2]{device}{6}
\ffreeexample[5.2]{device}{6}