noelma
liana
@noelma, you're right that there are 2 execution units, so we will only ever run two threads (instruction streams) at once. However, the 6 fetch/decode blocks tell you that this core supports superscalar execution: if the two threads we have execution contexts for contain enough independent instructions, several of them can run at the same time. That means up to 6 instructions can be running at once, using some combination of the 5 scalar ALUs, the 2 vector mul/add ALUs, and the 1 vector add ALU. A vector add can go to any of those three vector ALUs, while a vector multiply needs one of the two mul/add ALUs, which is where the limits of 3 vector additions and 2 vector multiplications come from. Fully utilizing the ALUs requires the right combination of independent instructions to be ready to run, which might not always occur depending on the use case.
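To make the counting concrete, here is a rough toy model in Python of picking instructions to run in one cycle. It is only a sketch: the unit counts are the ones quoted in this thread, the op names (`scalar`, `vmul`, `vadd`) and the `issue` function are made up for illustration, and real hardware scheduling is far more involved than this greedy loop.

```python
# Assumed resources, taken from the numbers quoted in this thread.
ISSUE_WIDTH = 6  # one instruction per fetch/decode block
UNITS = {"scalar": 5, "vec_muladd": 2, "vec_add": 1}


def issue(ready):
    """Greedily pick which ready, independent ops can run in one cycle.

    `ready` is a list of hypothetical op names: "scalar", "vmul", "vadd".
    Returns the list of ops that were given an ALU.
    """
    free = dict(UNITS)
    issued = []
    # Place multiplies first, since they can only use the mul/add units.
    for op in sorted(ready, key=lambda o: o != "vmul"):
        if len(issued) == ISSUE_WIDTH:
            break
        if op == "scalar" and free["scalar"] > 0:
            free["scalar"] -= 1
        elif op == "vmul" and free["vec_muladd"] > 0:
            free["vec_muladd"] -= 1
        elif op == "vadd" and free["vec_add"] > 0:
            free["vec_add"] -= 1
        elif op == "vadd" and free["vec_muladd"] > 0:
            # An add can also run on a mul/add unit if one is still free.
            free["vec_muladd"] -= 1
        else:
            continue  # no suitable ALU left for this op this cycle
        issued.append(op)
    return issued


print(len(issue(["vmul"] * 4)))  # only the two mul/add units can multiply
print(len(issue(["vadd"] * 4)))  # adds can use all three vector units
```

With four independent vector multiplies ready, only two get an ALU; with four independent vector adds ready, three do. Mixing in scalar ops shows the other limit: five scalar ops plus three vector adds still stops at six, because of the six fetch/decode blocks.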
Copyright 2021 Stanford University
I'm not sure why this setup only permits up to 2 vector multiplications or 3 vector additions. Is it just that the ALUs are specifically designed to handle either multiplication/addition or addition only? Moreover, why wouldn't this be limited to 2 vector additions, given that there are only 2 execution units?