I'm interested in whether I would see false sharing on (for example) an NVIDIA GPU, in (for example) global loads which bypass a non-coherent L1 cache. If so, are there any similar solutions like this? Or is the problem just rare enough because of non-coherence that we don't think about it?