warpAggregatedAtomicsCG - Warp Aggregated Atomics using Cooperative Groups

Description

This sample demonstrates how to use Cooperative Groups (CG) to perform warp-aggregated atomics on single and multiple counters. The technique improves performance when many threads atomically add to the same counter or to a small set of counters, because one thread per warp issues the atomic on behalf of all active threads in that warp.
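As a rough illustration of the single-counter case, the sketch below elects one leader among the currently active threads of a warp, lets it perform a single atomicAdd for the whole group, and broadcasts the result so each thread gets its own slot. The names atomicAggInc, filterPositive, and countPositives are chosen for illustration and are assumptions, not necessarily the identifiers used in the sample source. Error checking is omitted for brevity.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Warp-aggregated increment: returns a unique index for each calling thread.
// (Illustrative helper; name and signature are assumptions.)
__device__ int atomicAggInc(int *counter) {
    // Group of threads in this warp that are active at this point.
    cg::coalesced_group active = cg::coalesced_threads();

    int prev = 0;
    // The leader (rank 0) reserves one slot per active thread with one atomic.
    if (active.thread_rank() == 0) {
        prev = atomicAdd(counter, static_cast<int>(active.size()));
    }
    // Broadcast the leader's base index, then offset by each thread's rank.
    prev = active.shfl(prev, 0);
    return prev + static_cast<int>(active.thread_rank());
}

// Copies the positive elements of src into dst (order not preserved)
// and leaves the number of copied elements in *count.
__global__ void filterPositive(int *dst, int *count, const int *src, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && src[i] > 0) {
        dst[atomicAggInc(count)] = src[i];
    }
}

// Host wrapper (hypothetical): returns how many elements of h_src are positive.
int countPositives(const std::vector<int> &h_src) {
    int n = static_cast<int>(h_src.size());
    if (n == 0) return 0;

    int *d_src = nullptr, *d_dst = nullptr, *d_count = nullptr;
    cudaMalloc(&d_src, n * sizeof(int));
    cudaMalloc(&d_dst, n * sizeof(int));
    cudaMalloc(&d_count, sizeof(int));

    cudaMemcpy(d_src, h_src.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    filterPositive<<<blocks, threads>>>(d_dst, d_count, d_src, n);

    int h_count = 0;
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_src);
    cudaFree(d_dst);
    cudaFree(d_count);
    return h_count;
}
```

Calling countPositives on a host vector from main and comparing the result against a CPU count is a simple way to check the kernel. With up to 32 active threads per warp sharing one atomicAdd, contention on the counter drops accordingly. The actual gain depends on how many threads in each warp pass the filter.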

Key Concepts

Cooperative Groups, Atomic Intrinsics

Supported SM Architectures

Supported OSes

Linux, Windows

Supported CPU Architecture

x86_64, armv7l, aarch64

CUDA APIs involved

CUDA Runtime API

cudaMemcpy, cudaFree, cudaDeviceGetAttribute, cudaMemset, cudaMalloc

Prerequisites

Download and install the CUDA Toolkit 12.5 for your corresponding platform.

References (for more details)