To summarize what we've covered about CUDA up to this slide:
CUDA's SPMD programming model parallelizes work across many threads and groups those threads into blocks. The CPU and GPU have separate address spaces, and within the GPU there are distinct memories: per-thread local memory for thread-private variables, and per-block shared memory for variables shared by the threads of a block.
__syncthreads() is a barrier only for the threads within the same block, not across blocks.
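To make these points concrete, here is a minimal sketch, not taken from the slides: the kernel name reverseChunks and the sizes N and BLOCK_SIZE are made up for illustration. It shows the separate host/device address spaces (explicit cudaMalloc/cudaMemcpy), per-thread local variables, per-block __shared__ memory, and __syncthreads() as a block-local barrier.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256          // total elements (hypothetical size)
#define BLOCK_SIZE 64  // threads per block (hypothetical size)

// Hypothetical kernel: each block reverses its own chunk of the input,
// staging the chunk in per-block shared memory.
__global__ void reverseChunks(const float *in, float *out) {
    // Shared memory: one copy per block, visible to all threads in that block.
    __shared__ float chunk[BLOCK_SIZE];

    // Local variables: every thread has its own private copy of these.
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int tid = threadIdx.x;

    chunk[tid] = in[gid];

    // Barrier for the threads of THIS block only; it does not
    // synchronize threads in other blocks.
    __syncthreads();

    // After the barrier it is safe to read what other threads
    // in the same block wrote into shared memory.
    out[gid] = chunk[blockDim.x - 1 - tid];
}

int main() {
    float h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;

    // The GPU has its own address space, so device buffers are
    // allocated separately and data is copied explicitly.
    float *d_in, *d_out;
    cudaMalloc(&d_in, N * sizeof(float));
    cudaMalloc(&d_out, N * sizeof(float));
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);

    // SPMD launch: N threads all run the same kernel, grouped into blocks.
    reverseChunks<<<N / BLOCK_SIZE, BLOCK_SIZE>>>(d_in, d_out);

    cudaMemcpy(h_out, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %.1f (expected %d)\n", h_out[0], BLOCK_SIZE - 1);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Note that correctness here depends only on the block-local barrier: each block works on its own chunk, so no cross-block synchronization is needed. If blocks had to see each other's results, a separate kernel launch would be required.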