Slide 50 of 94
jaez

In what cases would one set the number of instances to a multiple of the SIMD width rather than equal to the SIMD width? What's the advantage of that?

czh

When the number of instances is a small multiple of the SIMD width, is the goal to exploit ILP or multithreaded interleaving?

leopan

I found this interesting thread of discussion from a previous offering of the course that Kayvon taught at CMU. It seems that whether we can use a gang size that is a multiple of the SIMD width depends solely on the compilation options (which in turn depend on what instruction set the CPU supports). In the example program from that discussion, compiling with a gang size of 8 (with a SIMD width of 4) turns most operations into two four-wide SIMD instructions. The bottom section of this guide might also be helpful. However, I'm not completely sure how these separate sets of instructions are parallelized (e.g., over multiple cores or multiple threads on the same core).
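
To make the "two four-wide SIMD instructions" point concrete, here is a rough Python model (not ISPC, and not what the compiler actually emits). The names `SIMD_WIDTH`, `GANG_SIZE`, `simd_add`, and `gang_add` are made up for illustration, and the widths 4 and 8 are just the ones from the example above. It only shows that one gang-wide operation gets split into `GANG_SIZE / SIMD_WIDTH` independent vector operations.

```python
SIMD_WIDTH = 4  # assumed hardware vector width (lanes per instruction)
GANG_SIZE = 8   # assumed gang size: a 2x multiple of SIMD_WIDTH


def simd_add(a, b):
    """Model a single hardware vector add over exactly SIMD_WIDTH lanes."""
    assert len(a) == len(b) == SIMD_WIDTH
    return [x + y for x, y in zip(a, b)]


def gang_add(a, b):
    """Model one gang-wide add.

    Returns the GANG_SIZE-wide result and the number of
    SIMD_WIDTH-wide vector operations needed to produce it.
    """
    assert len(a) == len(b) == GANG_SIZE
    result = []
    issued = 0
    # Each chunk is independent of the others: no chunk reads
    # another chunk's result.
    for start in range(0, GANG_SIZE, SIMD_WIDTH):
        end = start + SIMD_WIDTH
        result += simd_add(a[start:end], b[start:end])
        issued += 1
    return result, issued


a = list(range(GANG_SIZE))
b = [10 * i for i in range(GANG_SIZE)]
total, issued = gang_add(a, b)
print(total, issued)
```

Since the two chunks do not depend on each other, they look like the kind of independent work a single core could overlap via ILP. As far as I understand, a gang stays within one instruction stream, so this would not by itself spread work across cores or threads, but I would treat that as an assumption to check against the ISPC documentation.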
