reductionMultiBlockCG - Reduction using MultiBlock Cooperative Groups

Description

This sample demonstrates single pass reduction using Multi Block Cooperative Groups. This sample requires devices with compute capability 6.0 or higher having compute preemption.

Key Concepts

Cooperative Groups, MultiBlock Cooperative Groups

Supported SM Architectures

SM 6.0 SM 6.1 SM 7.0 SM 7.2 SM 7.5 SM 8.0 SM 8.6 SM 8.7 SM 8.9 SM 9.0

Supported OSes

Linux, Windows

Supported CPU Architecture

x86_64, aarch64

CUDA APIs involved

CUDA Runtime API

cudaMemcpy, cudaFree, cudaSetDevice, cudaDeviceSynchronize, cudaLaunchCooperativeKernel, cudaMalloc, cudaOccupancyMaxActiveBlocksPerMultiprocessor, cudaGetDeviceProperties, cudaOccupancyMaxPotentialBlockSize

Dependencies needed to build/run

MBCG, CPP11

Prerequisites

Download and install the CUDA Toolkit 12.5 for your corresponding platform. Make sure the dependencies mentioned in Dependencies section above are installed.

References (for more details)