The notion of ILP gets complex in this context, as we can have SIMD instructions within a single thread and/or independent instructions running in parallel.
@gsamp. Correct! But conceptually, it's still ILP. There are multiple independent instructions in a single thread of control (thread 0 in this case), and the processor identifies that independence on the application's behalf and executes them simultaneously. The fact that one of those instructions is scalar and the other is vector is just a detail. The heterogeneity does add complexity to the superscalar execution logic, though: it's not enough for two instructions to be independent; they must also be a particular mix of independent scalar and vector operations.
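To make this concrete, here is a minimal sketch in C of one thread containing two independent streams of work. The function name `scale_and_hash` and the hash constants are hypothetical, chosen only for illustration. The multiply on `x[i]` is independent across iterations, so a compiler may turn it into vector instructions. The update of `h` is a serial scalar chain that never touches `x`. Whether a given core actually issues the two in the same cycle depends on its execution units, so treat this as an illustration of the idea rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel, not from the lecture: two independent
   dependency chains inside one thread of control. */
uint32_t scale_and_hash(float *x, size_t n, float a, uint32_t seed) {
    uint32_t h = seed;
    for (size_t i = 0; i < n; i++) {
        /* Independent across i: a candidate for vector (SIMD) instructions. */
        x[i] = a * x[i];
        /* Scalar chain: each step depends on the previous h, but not on x. */
        h = h * 1664525u + 1013904223u;
    }
    return h;
}
```

Since neither statement reads the other's result, hardware with both a scalar and a vector unit is free to overlap them. If both statements were vector operations, a core with a single vector unit could not overlap them even though they are independent.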
Calculating performance speedups in such an architecture becomes complex.