Thread block (CUDA programming)

Since March 2010, when compute capability 2.0 (the Fermi architecture) arrived, a thread block can contain up to 1024 threads.

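Thread blocks are a fundamental unit of CUDA's execution model. A kernel is launched as a grid of blocks, and all threads of one block run on the same streaming multiprocessor, where they can share on-chip shared memory and synchronize with each other. Devices of compute capability 1.x allowed at most 512 threads per block. Compute capability 2.x raised the limit to 1024, and that limit still applies to later GPUs. It covers the total number of threads in the block (the product of blockDim.x, blockDim.y and blockDim.z), and each dimension is also capped separately, at 1024, 1024 and 64 respectively. Larger blocks let more threads cooperate through shared memory and __syncthreads(), for example when a block works on a 32 × 32 tile of a matrix.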
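Because the limit applies both to the total thread count and to each dimension, a block shape can be checked before launch. The sketch below hard-codes the compute capability 2.x+ limits for illustration; the helper names blockShapeIsValid and printDeviceBlockLimits are hypothetical. Real code would rather read the limits from cudaGetDeviceProperties (the maxThreadsPerBlock and maxThreadsDim fields), as the second function does.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Limits for compute capability 2.x and later, hard-coded for illustration.
// Hypothetical constants: query the device at run time in real code.
constexpr unsigned int kMaxThreadsPerBlock = 1024;
constexpr unsigned int kMaxBlockDimX = 1024;
constexpr unsigned int kMaxBlockDimY = 1024;
constexpr unsigned int kMaxBlockDimZ = 64;

// Hypothetical helper: true if the block shape respects every per-dimension
// cap and the total thread count (x * y * z) stays within the block limit.
bool blockShapeIsValid(dim3 block) {
    if (block.x == 0 || block.y == 0 || block.z == 0) return false;
    if (block.x > kMaxBlockDimX || block.y > kMaxBlockDimY || block.z > kMaxBlockDimZ)
        return false;
    // With the caps above the product is at most 2^26, so it cannot overflow.
    return block.x * block.y * block.z <= kMaxThreadsPerBlock;
}

// Hypothetical helper: prints the limits that device 0 actually reports.
void printDeviceBlockLimits() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device available\n");
        return;
    }
    std::printf("%s: maxThreadsPerBlock=%d, maxThreadsDim=(%d, %d, %d)\n",
                prop.name, prop.maxThreadsPerBlock,
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
}
```

For example, dim3(32, 32, 1) and dim3(1024, 1, 1) both pass. dim3(64, 32, 1) fails because its 2048 threads exceed the total, and dim3(1, 1, 128) fails because z exceeds 64 even though only 128 threads are requested.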

Example

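In a tiled matrix multiplication, for example, each thread block can be launched as a 32 × 32 arrangement of threads (1024 in total). Each block loads one tile of each input matrix into shared memory and computes one 32 × 32 tile of the output, with every thread producing one output element. Whether the largest possible block actually runs fastest depends on the kernel's register and shared-memory usage, so block size is usually tuned by measurement.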
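A minimal sketch of such a kernel, assuming square row-major float matrices of size n × n and 32 × 32 tiles so that each block has 1024 threads. The names matmulTiled, matmul and TILE are hypothetical, and this is the textbook shared-memory tiling scheme rather than a tuned implementation.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// 32 x 32 = 1024 threads per block, the maximum on compute capability 2.x+.
constexpr int TILE = 32;

// C = A * B for row-major n x n matrices. Each block computes one
// TILE x TILE tile of C; each thread computes one element of that tile.
__global__ void matmulTiled(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk along the shared dimension one tile at a time.
    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Each thread loads one element of each input tile, zero-padding past the edge.
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // both tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // all reads done before the next iteration overwrites the tiles
    }
    if (row < n && col < n) C[row * n + col] = acc;
}

// Hypothetical host wrapper: copy inputs to the GPU, launch, copy the result back.
std::vector<float> matmul(const std::vector<float>& A, const std::vector<float>& B, int n) {
    assert(A.size() == size_t(n) * n && B.size() == size_t(n) * n);
    size_t bytes = size_t(n) * n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);  // 1024 threads per block
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matmulTiled<<<grid, block>>>(dA, dB, dC, n);

    std::vector<float> C(size_t(n) * n);
    // cudaMemcpy on the default stream also waits for the kernel to finish.
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Error checking on the runtime calls is omitted for brevity. The 32 × 32 shape is a choice, not a requirement: 16 × 16 blocks (256 threads) are also common and can be faster, depending on the device and the kernel's resource usage.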

Knowing the per-block limit matters when choosing launch configurations: a kernel launched with more threads per block than the device supports does not run, and the launch reports an invalid-configuration error.

Related concepts

__syncthreads()

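__syncthreads() is a barrier for all threads within a block: no thread continues past it until every thread of the block has reached it, and global- and shared-memory writes made before it are visible to the block's threads afterward. In conditional code it is only safe when every thread of the block takes the same branch.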
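A block-wide sum in shared memory illustrates why the barrier is needed: each reduction step reads values that other threads wrote in the previous step. This is a minimal sketch assuming a single block of exactly 1024 threads (a power of two); blockSum, sumOnDevice and kBlockSize are hypothetical names, and runtime error checking is omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlockSize = 1024;  // must be a power of two for the tree reduction

// Sums in[0..n) using one block of kBlockSize threads and shared memory.
__global__ void blockSum(const int* in, int n, int* out) {
    __shared__ int partial[kBlockSize];
    int tid = threadIdx.x;

    // Each thread first accumulates a strided slice of the input.
    int sum = 0;
    for (int i = tid; i < n; i += blockDim.x) sum += in[i];
    partial[tid] = sum;
    __syncthreads();  // every partial sum is written before any is read

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        // Outside the if: every thread of the block must reach the barrier.
        __syncthreads();
    }
    if (tid == 0) *out = partial[0];
}

// Hypothetical host wrapper around a single-block launch.
int sumOnDevice(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    int *dIn, *dOut;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    blockSum<<<1, kBlockSize>>>(dIn, n, dOut);
    int result = 0;
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Moving the second __syncthreads() inside the if would be a bug: threads with tid >= stride would skip the barrier, so the block would no longer reach it uniformly.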

Cooperative groups

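Cooperative groups let CUDA code name groups of threads explicitly (a whole block, a fixed-size tile within a block, or the whole grid) and synchronize or exchange data within each group. This allows more flexible synchronization patterns than __syncthreads() alone.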
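The sketch below uses the cooperative_groups header to split a 1024-thread block into 32-thread tiles. Each tile reduces its values with register shuffles (tile.shfl_down), and block.sync() plays the role of __syncthreads() for the whole block. The kernel and wrapper names (cgBlockSum, cgSumOnDevice) are hypothetical, and a single block whose size is a multiple of 32 is assumed.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Block-wide sum: tiles of 32 threads reduce with shuffles, then one thread
// per tile adds the tile's result to a shared accumulator.
__global__ void cgBlockSum(const int* in, int n, int* out) {
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);

    __shared__ int total;
    if (block.thread_rank() == 0) total = 0;
    block.sync();  // accumulator initialized before any tile adds to it

    int rank = static_cast<int>(block.thread_rank());
    int stride = static_cast<int>(block.size());
    int sum = 0;
    for (int i = rank; i < n; i += stride) sum += in[i];

    // After this loop, rank 0 of each tile holds that tile's sum.
    for (int offset = tile.size() / 2; offset > 0; offset /= 2)
        sum += tile.shfl_down(sum, offset);

    if (tile.thread_rank() == 0) atomicAdd(&total, sum);
    block.sync();  // all tiles have contributed
    if (block.thread_rank() == 0) *out = total;
}

// Hypothetical host wrapper: one block of 1024 threads (a multiple of 32).
int cgSumOnDevice(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    int *dIn, *dOut;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cgBlockSum<<<1, 1024>>>(dIn, n, dOut);
    int result = 0;
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

The tile-level shuffles need no shared memory and no block-wide barrier. Only the two block.sync() calls around the shared accumulator synchronize the full block.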

Dynamic random-access memory

DRAM requires periodic refreshing to maintain data integrity

CUDA

CUDA is NVIDIA's platform and programming model for general-purpose parallel computation on its GPUs

Triton

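Triton kernels are written at the block level, with each program instance operating on a whole block of data, while CUDA kernels are written at the thread level, from the perspective of a single thread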
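To make the thread-level style concrete, here is a minimal CUDA vector addition in which the kernel body describes the work of one thread handling one element, and the launch supplies enough blocks to cover the input. The names vectorAdd and addOnDevice and the block size of 256 are illustrative choices; a Triton kernel for the same task would instead describe a whole block of elements per program instance.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Thread-level model: each thread computes exactly one output element.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    if (i < n) c[i] = a[i] + b[i];                  // guard the partial last block
}

// Hypothetical host wrapper; assumes a and b have the same length.
std::vector<float> addOnDevice(const std::vector<float>& a, const std::vector<float>& b) {
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
    vectorAdd<<<blocks, threadsPerBlock>>>(da, db, dc, n);

    std::vector<float> c(n);
    cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return c;
}
```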

atomicAdd()

atomicAdd() atomically adds a value to a location in global or shared memory and returns the value that was there before
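A small sketch of atomicAdd on global memory, assuming a task where many threads increment one shared counter: counting the even values in an array. The names countEven and countEvenOnDevice are hypothetical. Without the atomic, concurrent read-modify-write sequences on the same counter could overwrite each other and lose increments.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread checks one element and atomically bumps a global counter.
__global__ void countEven(const int* in, int n, int* count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] % 2 == 0) atomicAdd(count, 1);
}

// Hypothetical host wrapper; assumes a non-empty input.
int countEvenOnDevice(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    int *dIn, *dCount;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dCount, sizeof(int));
    cudaMemcpy(dIn, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));  // the counter must start at zero

    const int threadsPerBlock = 1024;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    countEven<<<blocks, threadsPerBlock>>>(dIn, n, dCount);

    int result = 0;
    cudaMemcpy(&result, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dCount);
    return result;
}
```

Heavy contention on a single global counter serializes the updates. A common refinement is to accumulate per block in shared memory first and issue one global atomicAdd per block.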