OpenMP-Examples/devices/target_pointer_mapping.tex
2024-04-16 08:59:23 -07:00

168 lines
11 KiB
TeX

\pagebreak
\section{Pointer Mapping}
\label{sec:pointer_mapping}
\index{mapping!pointer}
\index{mapping!pointer attachment}
\index{pointer attachment}
Pointers that contain host addresses require that those addresses are translated to device addresses for them to be useful in the context of a device data environment. Broadly speaking, there are two scenarios where this is important.
The first scenario is where the pointer is mapped to the device data environment, such that references to the pointer inside a \kcode{target} region are to the corresponding pointer. Pointer \plc{attachment} ensures that the corresponding pointer will contain a device address when all of the following conditions are true:
\begin{itemize}
\item the pointer is mapped by directive $A$ to a device;
\item a list item that uses the pointer as its base pointer (call it the \emph{pointee}) is mapped, to the same device, by directive $B$, which may be the same as $A$;
\item the effect of directive $B$ is to create either the corresponding pointer or pointee in the device data environment of the device.
\end{itemize}
Given the above conditions, pointer attachment is initiated as a result of directive $B$ and subsequent references to the pointee list item in a target region that use the pointer will access the corresponding pointee. The corresponding pointer remains in this \plc{attached} state until it is removed from the device data environment.
The second scenario, which is only applicable for C/C++, is where the pointer is implicitly privatized inside a \kcode{target} construct when it appears as the base pointer to a list item on the construct and does not appear explicitly as a list item in a \kcode{map} clause, \kcode{is_device_ptr} clause, or data-sharing attribute clause. This scenario can be further split into two cases: the list item is a zero-length array section (e.g., \ucode{p[:0]}) or it is not.
If it is a zero-length array section, this will trigger a runtime check on entry to the \kcode{target} region for a previously mapped list item where the value of the pointer falls within the range of its base address and ending address. If such a match is found the private pointer is initialized to the device address corresponding to the value of the original pointer, and otherwise it is initialized to \bcode{NULL} (or retains its original value if the \kcode{unified_address} requirement is specified for that compilation unit).
If the list item (again, call it the \emph{pointee}) is not a zero-length array section, the private pointer will be initialized such that references in the \kcode{target} region to the pointee list item that use the pointer will access the corresponding pointee.
The following example shows the basics of mapping pointers with and without
associated storage on the host.
Storage for pointers \ucode{ptr1} and \ucode{ptr2} is created on the host.
To map storage that is associated with a pointer on the host, the data can be
explicitly mapped as an array section so that the compiler knows
the amount of data to be assigned in the device (to the \plc{corresponding} data storage area).
On the \kcode{target} construct array sections are mapped; however, the pointer \ucode{ptr1}
is mapped, while \ucode{ptr2} is not. Since \ucode{ptr2} is not explicitly mapped, it is
firstprivate. This creates a subtle difference in the way these pointers can be used.
As a firstprivate pointer, \ucode{ptr2} can be manipulated on the device;
however, as an explicitly mapped pointer,
\ucode{ptr1} becomes an \emph{attached} pointer and cannot be manipulated.
In both cases the host pointer is not updated with the device pointer
address---as one would expect for distributed memory.
The storage data on the host is updated from the corresponding device
data at the end of the \kcode{target} region.
As a comparison, note that the \ucode{aray} array is automatically mapped,
since the compiler knows the extent of the array.
The pointer \ucode{ptr3} is used inside the \kcode{target} construct, but it does
not appear in a data-mapping or data-sharing clause. Nor is there a
\kcode{defaultmap} clause on the construct to indicate what its implicit
data-mapping or data-sharing attribute should be. For such a case, \ucode{ptr3}
will be implicitly privatized within the construct and there will be a runtime
check to see if the host memory to which it is pointing has corresponding memory
in the device data environment. If this runtime check passes, the private
\ucode{ptr3} would be initialized to point to the corresponding memory. But in
this case the check does not pass and so it is initialized to null.
Since \ucode{ptr3} is private, the value to which it is assigned in the
\kcode{target} region is not returned into the original \ucode{ptr3} on the host.
\cexample[5.0]{target_ptr_map}{1}
\index{directives!begin declare target@\kcode{begin declare target}}
\index{begin declare target directive@\kcode{begin declare target} directive}
In the following example the global pointer \ucode{p} appears in a
declare target directive. Hence, the pointer \ucode{p} will
persist on the device throughout executions in all \kcode{target} regions.
The pointer is also used in an array section of a \kcode{map} clause on
a \kcode{target} construct. When the pointer of storage associated with
a declare target directive
is mapped, as for the array section \ucode{p[:N]} in the
\kcode{target} construct, the array section on the device is \emph{attached}
to the device pointer \ucode{p} on entry to the construct, and
the value of the device pointer \ucode{p} becomes undefined on exit.
(Of course, storage allocation for
the array section on the device will occur before the
pointer on the device is attached.)
% For globals with declare target is there such a things a
% original and corresponding?
\cexample[5.1]{target_ptr_map}{2}
\index{directives!begin declare target@\kcode{begin declare target}}
\index{begin declare target directive@\kcode{begin declare target} directive}
The following two examples illustrate subtle differences in pointer attachment
to device address because of the order of data mapping.
In example \example{target_ptr_map.3a}
the global pointer \ucode{p1} points to array \ucode{x} and \ucode{p2} points to
array \ucode{y} on the host.
The array section \ucode{x[:N]} is mapped by the \kcode{target enter data} directive while array \ucode{y} is mapped
on the \kcode{target} construct.
Since the \kcode{begin declare target} directive is applied to the declaration
of \ucode{p1}, \ucode{p1} is a treated like a mapped variable on the \kcode{target}
construct and references to \ucode{p1} inside the construct will be to the
corresponding \ucode{p1} that exists on the device. However, the corresponding
\ucode{p1} will be undefined since there is no pointer attachment for it. Pointer
attachment for \ucode{p1} would require that (1) \ucode{p1} (or an lvalue
expression that refers to the same storage as \ucode{p1}) appears as a base
pointer to a list item in a \kcode{map} clause, and (2) the construct that has
the \kcode{map} clause causes the list item to transition from \emph{not mapped}
to \emph{mapped}. The conditions are clearly not satisfied for this example.
The problem for \ucode{p2} in this example is also subtle. It will be privatized
inside the \kcode{target} construct, with a runtime check for whether the memory
to which it is pointing has corresponding memory that is accessible on the
device. If this check is successful, then the \ucode{p2} inside the construct
would be appropriately initialized to point to that corresponding memory.
Unfortunately, despite there being an implicit map of the array \ucode{y} (to
which \ucode{p2} is pointing) on the construct, the order of this map relative to
the initialization of \ucode{p2} is unspecified. Therefore, the initial value of
\ucode{p2} will also be undefined.
Thus, referencing values via either \ucode{p1} or \ucode{p2} inside
the \kcode{target} region would be invalid.
\cexample[5.1]{target_ptr_map}{3a}
In example \example{target_ptr_map.3b} the mapping orders for arrays \ucode{x}
and \ucode{y} were rearranged to allow proper pointer attachments.
On the \kcode{target} construct, the \kcode{map(\ucode{x})} clause triggers pointer
attachment for \ucode{p1} to the device address of \ucode{x}.
Pointer \ucode{p2} is assigned the device address of the previously mapped
array \ucode{y}.
Referencing values via either \ucode{p1} or \ucode{p2} inside the \kcode{target} region is now valid.
\cexample[5.1]{target_ptr_map}{3b}
%\clearpage
\index{routines!omp_target_is_accessible@\kcode{omp_target_is_accessible}}
\index{omp_target_is_accessible routine@\kcode{omp_target_is_accessible} routine}
In the following example, storage allocated on the host is not mapped in a \kcode{target}
region if it is determined that the host memory is accessible from the device.
On platforms that support host memory access from a target device,
it may be more efficient to omit map clauses and avoid the potential memory allocation
and data transfers that may result from the map.
%For discrete memory storage on host and devices, explicit mapping may be required, whereas for
%Unified Shared Memory platforms it may be optimal to avoid using map clauses,
%because re-allocation of the space may occur when map clauses are present.
The \kcode{omp_target_is_accessible} API routine is used to determine if the
host storage of size \ucode{buf_size} is accessible on the device, and a metadirective
is used to select the directive variant (a \kcode{target} with/without a \kcode{map} clause).
The \kcode{omp_target_is_accessible} routine will return true if the storage indicated
by the first and second arguments is accessible on the target device. In this case,
the host pointer \ucode{ptr} may be directly dereferenced in the subsequent
\kcode{target} region to access this storage, rather than mapping an array section based
off the pointer. By explicitly specifying the host pointer in a \kcode{firstprivate}
clause on the construct, its original value will be used directly in the \kcode{target} region.
In OpenMP 5.1, removing the \kcode{firstprivate} clause will result in an implicit presence
check of the storage to which \ucode{ptr} points, and since this storage is not mapped by the
program, \ucode{ptr} will be \bcode{NULL}-initialized in the \kcode{target} region.
In the OpenMP 5.2 Specification, a false presence check without
the \kcode{firstprivate} clause will cause the pointer to retain its original value.
\cexample[5.2]{target_ptr_map}{4}
\index{mapping!deep copy}
Similar to the previous example, the \kcode{omp_target_is_accessible} routine is used to
discover if a deep copy is required for the platform. Here, the \ucode{deep_copy} map,
defined in the \kcode{declare mapper} directive, is used if the host storage referenced by
\ucode{s.ptr} (or \ucode{s\%ptr} in Fortran) is not accessible from the device.
\cexample[5.2]{target_ptr_map}{5}
\ffreeexample[5.2]{target_ptr_map}{5}