Mirror of https://github.com/NVIDIA/cuda-samples.git (synced 2025-04-04 07:21:33 +01:00)
shfl_scan - CUDA Parallel Prefix Sum with Shuffle Intrinsics (SHFL_Scan)
Description
This example demonstrates how to use the shuffle intrinsic __shfl_up_sync to perform a scan operation across a thread block.
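The idea can be sketched as follows. This is a minimal illustration, not the sample's actual source: the names `warp_inclusive_scan`, `block_inclusive_scan`, and `inclusive_scan_block` are hypothetical. It assumes a single thread block, an input of 1 to 1024 `int` elements, and a block size rounded up to a multiple of 32 so that the full-warp mask is valid. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Inclusive scan within one warp. Each step pulls the partial sum held by the
// lane `offset` positions lower and adds it if that lane exists.
__device__ int warp_inclusive_scan(int value) {
    const unsigned full_mask = 0xffffffffu;  // assumes all 32 lanes are active
    const int lane = threadIdx.x % warpSize;
    for (int offset = 1; offset < warpSize; offset <<= 1) {
        int neighbor = __shfl_up_sync(full_mask, value, offset);
        if (lane >= offset) value += neighbor;
    }
    return value;
}

// Hypothetical single-block kernel: scan each warp, scan the per-warp totals
// in warp 0, then add the preceding warps' total to every element.
__global__ void block_inclusive_scan(const int *in, int *out, int n) {
    __shared__ int warp_totals[32];  // at most 32 warps per block
    const int tid = threadIdx.x;
    const int lane = tid % warpSize;
    const int warp = tid / warpSize;
    const int num_warps = blockDim.x / warpSize;

    // Out-of-range threads contribute 0 but still take part in the shuffles.
    int value = (tid < n) ? in[tid] : 0;
    value = warp_inclusive_scan(value);

    if (lane == warpSize - 1) warp_totals[warp] = value;
    __syncthreads();

    if (warp == 0) {
        int total = (lane < num_warps) ? warp_totals[lane] : 0;
        warp_totals[lane] = warp_inclusive_scan(total);
    }
    __syncthreads();

    if (warp > 0) value += warp_totals[warp - 1];
    if (tid < n) out[tid] = value;
}

// Hypothetical host wrapper; input.size() is assumed to be in [1, 1024].
std::vector<int> inclusive_scan_block(const std::vector<int> &input) {
    const int n = static_cast<int>(input.size());
    std::vector<int> output(n);
    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, input.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    const int threads = ((n + 31) / 32) * 32;  // round up to whole warps
    block_inclusive_scan<<<1, threads>>>(d_in, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(output.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return output;
}
```

Calling `inclusive_scan_block` on a vector of 256 ones should return the values 1 through 256. Inputs larger than one block would need an additional pass over per-block totals, which this sketch does not cover.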
Key Concepts
Data-Parallel Algorithms, Performance Strategies
Supported SM Architectures
Supported OSes
Linux, Windows
Supported CPU Architecture
x86_64, armv7l, aarch64
CUDA APIs involved
CUDA Runtime API
cudaMemcpy, cudaFree, cudaMallocHost, cudaEventSynchronize, cudaEventRecord, cudaFreeHost, cudaGetDevice, cudaMemset, cudaMalloc, cudaEventElapsedTime, cudaGetDeviceProperties, cudaEventCreate
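As a rough illustration of how the event calls in this list typically fit together, the sketch below times a `cudaMemset` on the default stream. The function name `time_device_memset_ms` is hypothetical and is not taken from the sample. It also calls `cudaEventDestroy`, which is not in the list above, and omits error checking.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: returns the elapsed GPU time, in milliseconds, of a
// cudaMemset over `bytes` bytes, measured with a pair of CUDA events.
float time_device_memset_ms(size_t bytes) {
    void *d_buf = nullptr;
    cudaEvent_t start, stop;
    float ms = 0.0f;

    cudaMalloc(&d_buf, bytes);
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);      // mark the start on the default stream
    cudaMemset(d_buf, 0, bytes);    // the work being timed
    cudaEventRecord(stop, 0);       // mark the end on the same stream
    cudaEventSynchronize(stop);     // wait until the stop event has completed
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    return ms;
}
```

The same record/synchronize/elapsed pattern applies when the timed work is a kernel launch rather than a memset.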
Dependencies needed to build/run
Prerequisites
Download and install the CUDA Toolkit 12.5 for your platform. Make sure the dependencies listed in the Dependencies section above are installed.