threadFenceReduction - threadFenceReduction
Description
This sample shows how to reduce an array of values to a single value in a single kernel by using the thread fence intrinsic (as opposed to the two or more kernel calls shown in the "reduction" CUDA Sample). Single-pass reduction requires global atomic instructions (Compute Capability 2.0 or later) and the __threadfence() intrinsic (CUDA 2.2 or later).
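The single-pass idea described above can be sketched as follows. This is a minimal illustration and not the sample's actual source: the names `blocksDone`, `kThreads`, `reduceSinglePass` and `sumOnDevice`, the fixed grid of 64 blocks, and the omission of CUDA error checking are all assumptions made for brevity. Each block writes a partial sum to global memory, issues `__threadfence()` so that the write is visible to other blocks, and then takes a ticket with `atomicInc`. The block that draws the last ticket knows every partial sum has been published and combines them.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

constexpr int kThreads = 256;  // hypothetical block size; must be a power of two

// Hypothetical counter of blocks that have published a partial sum.
__device__ unsigned int blocksDone = 0;

// Tree reduction in shared memory; the block's sum ends up in sdata[0].
__device__ void blockReduce(float *sdata, unsigned int tid) {
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
}

__global__ void reduceSinglePass(const float *in, volatile float *partial,
                                 float *out, unsigned int n) {
    __shared__ float sdata[kThreads];
    __shared__ bool isLastBlock;
    const unsigned int tid = threadIdx.x;

    // Phase 1: each block sums a grid-strided slice of the input.
    float sum = 0.0f;
    for (unsigned int i = blockIdx.x * blockDim.x + tid; i < n;
         i += blockDim.x * gridDim.x)
        sum += in[i];
    sdata[tid] = sum;
    __syncthreads();
    blockReduce(sdata, tid);

    if (tid == 0) {
        partial[blockIdx.x] = sdata[0];
        // Make the partial sum visible to all blocks before taking a ticket.
        __threadfence();
        const unsigned int ticket = atomicInc(&blocksDone, gridDim.x);
        isLastBlock = (ticket == gridDim.x - 1);
    }
    __syncthreads();

    // Phase 2: only the last block to finish combines the partial sums.
    if (isLastBlock) {
        float total = 0.0f;
        for (unsigned int i = tid; i < gridDim.x; i += blockDim.x)
            total += partial[i];
        sdata[tid] = total;
        __syncthreads();
        blockReduce(sdata, tid);
        if (tid == 0) {
            *out = sdata[0];
            blocksDone = 0;  // reset so the kernel can be launched again
        }
    }
}

// Host helper (hypothetical): copies the data over, launches once, reads the sum.
// Error checking of the CUDA calls is omitted to keep the sketch short.
float sumOnDevice(const std::vector<float> &h_in) {
    assert(!h_in.empty());
    const unsigned int n = static_cast<unsigned int>(h_in.size());
    const int blocks = 64;  // assumed grid size
    float *d_in = nullptr, *d_partial = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    reduceSinglePass<<<blocks, kThreads>>>(d_in, d_partial, d_out, n);
    cudaDeviceSynchronize();

    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);
    cudaFree(d_out);
    return result;
}
```

Without the fence, the last block could observe the incremented counter before another block's partial sum has become visible, and so read a stale value. On a working CUDA device, calling `sumOnDevice(std::vector<float>(1000, 1.0f))` should return 1000.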
Key Concepts
Cooperative Groups, Data-Parallel Algorithms, Performance Strategies
Supported SM Architectures
SM 5.0, SM 5.2, SM 5.3, SM 6.0, SM 6.1, SM 7.0, SM 7.2, SM 7.5, SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0
Supported OSes
Linux, Windows
Supported CPU Architecture
x86_64, armv7l
CUDA APIs involved
CUDA Runtime API
cudaMemcpy, cudaFree, cudaDeviceSynchronize, cudaMalloc, cudaGetDeviceProperties
Prerequisites
Download and install the CUDA Toolkit 12.5 for your corresponding platform.