Slide 12 of 94
kristinayige

Which comes first, the SIMD parallelism or the threads (i.e., which one dominates the parallelism across tasks)? From the lecture, it seems like the parallelism is "bottlenecked" by the number of ALUs.

gmukobi

@kristinayige I think "first" is hard to pin down, so a more useful approach may be to think about how SIMD parallelism differs from hardware-thread concurrency. You can abstract SIMD away as just a way to run vector operations: an 8-wide group of SIMD ALUs can be thought of as one big ALU that operates on 8-element vectors of data (and indeed that's what the assembly looks like, as you can see in the pseudocode for Instruction stream 0). Hardware-thread concurrency is different: the processor stores multiple execution contexts at the same time (here there are 4) and can advance instruction execution on any one of them in a given clock (or even on several of them in the same clock if it's a superscalar core like this one).
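To make the distinction concrete, here is a toy Python model, not the lecture's code. It assumes a single-issue core that picks one context per clock in round-robin order and never stalls; `simd_mul`, `ExecutionContext`, and `run_core` are hypothetical names made up for this sketch. Each vector instruction touches all 8 lanes at once, while the 4 execution contexts only decide whose instruction gets issued next.

```python
from dataclasses import dataclass, field

SIMD_WIDTH = 8  # lanes in the hypothetical vector ALU


def simd_mul(a, b):
    """One vector instruction: every lane performs the same multiply."""
    assert len(a) == len(b) == SIMD_WIDTH
    return [x * y for x, y in zip(a, b)]


@dataclass
class ExecutionContext:
    """Per-thread state held on the core: operands, program counter, results."""
    work: list  # list of (a, b) vector operand pairs, one per instruction
    pc: int = 0
    results: list = field(default_factory=list)

    def done(self):
        return self.pc >= len(self.work)


def run_core(contexts):
    """Issue one vector instruction per clock, round-robin over unfinished contexts."""
    clocks = 0
    nxt = 0
    while not all(c.done() for c in contexts):
        # Skip contexts that have already run out of instructions.
        while contexts[nxt % len(contexts)].done():
            nxt += 1
        ctx = contexts[nxt % len(contexts)]
        a, b = ctx.work[ctx.pc]
        ctx.results.append(simd_mul(a, b))  # 8 multiplies, 1 instruction
        ctx.pc += 1
        nxt += 1
        clocks += 1
    return clocks


# 4 execution contexts, each with 3 vector instructions (assumed workload).
contexts = [
    ExecutionContext(work=[([float(t)] * SIMD_WIDTH, [2.0] * SIMD_WIDTH)] * 3)
    for t in range(4)
]
clocks = run_core(contexts)
print(clocks)  # 12 clocks for 12 vector instructions (96 scalar multiplies)
```

In this model SIMD multiplies the work done per instruction, whereas the extra contexts do not add ALUs; they only give the core more instruction streams to choose from.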

shivalgo

Having 4 threads on a single core is not going to get us the same speedup as running them in parallel on 4 cores with 1 execution context per core, because the threads on a single core share that core's execution resources.
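A back-of-the-envelope estimate of that gap, as a sketch under strong assumptions: the work is purely compute-bound (no memory stalls, which is exactly the case where extra hardware threads would help), threads are spread evenly over cores, and `issue_per_core` is a made-up parameter for how many instructions a core can issue per clock.

```python
import math


def clocks_needed(ops_per_thread, num_threads, num_cores, issue_per_core=1):
    """Compute-bound estimate: clocks until the busiest core finishes."""
    threads_per_core = math.ceil(num_threads / num_cores)
    ops_on_busiest_core = threads_per_core * ops_per_thread
    return math.ceil(ops_on_busiest_core / issue_per_core)


# 4 threads of 1000 instructions each (assumed workload).
one_core = clocks_needed(1000, num_threads=4, num_cores=1)
four_cores = clocks_needed(1000, num_threads=4, num_cores=4)
one_superscalar_core = clocks_needed(1000, num_threads=4, num_cores=1, issue_per_core=2)

print(one_core, four_cores, one_superscalar_core)  # 4000 1000 2000
```

Under these assumptions the 4-core setup finishes 4x sooner than the single multithreaded core, and a 2-wide superscalar core closes only half of that gap.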
