I think the number of Fetch/Decode blocks is a design choice and does not have to match the number of ALUs. In your example, the best case is an instruction stream that keeps all the scalar and vector ALUs busy, and in that case 7 or even more Fetch/Decode blocks would help. But if, for example, all the instructions are scalar, the vector ALUs sit idle regardless. So extra Fetch/Decode blocks do not necessarily improve performance, yet they always take up more silicon area.
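To make that trade-off concrete, here is a toy model I put together (not something from the lecture). It assumes every ready instruction is independent, that scalar ALUs only run scalar instructions and vector ALUs only run vector ones, and that each Fetch/Decode block handles one instruction per cycle. The function name and parameters are made up for illustration.

```python
def issued_per_cycle(ready_scalar, ready_vector,
                     fetch_decode=6, scalar_alus=4, vector_alus=3):
    """Toy estimate of instructions issued in one cycle.

    Hypothetical model: throughput is capped by the ALUs of each
    type and by the number of Fetch/Decode blocks.
    """
    scalar = min(ready_scalar, scalar_alus)   # scalar work limited by scalar ALUs
    vector = min(ready_vector, vector_alus)   # vector work limited by vector ALUs
    return min(scalar + vector, fetch_decode)  # front end caps the total


# Best case: enough of both kinds to fill every ALU.
print(issued_per_cycle(4, 3, fetch_decode=6))   # front end is the bottleneck
print(issued_per_cycle(4, 3, fetch_decode=7))   # all 7 ALUs busy

# All-scalar stream: extra Fetch/Decode blocks buy nothing.
print(issued_per_cycle(10, 0, fetch_decode=6))
print(issued_per_cycle(10, 0, fetch_decode=7))
```

Under these assumptions, the seventh Fetch/Decode block only pays off for the mixed stream that fills every ALU. For the all-scalar stream the 4 scalar ALUs are the limit either way.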
Since there are only 4 scalar ALUs but 6 Fetch/Decode blocks, I imagine there are circumstances when the CPU will use a vector ALU to do a scalar operation. Does this happen? If so, are there drawbacks to using a vector ALU?
In lecture 3, Kayvon answered a question about this. In practice, ALUs can perform a variety of operations, both vector and scalar. For the sake of simplicity, he introduced ALUs as performing either scalar or vector ops.
I am a little confused about the discrepancy between the number of ALUs and the number of Fetch/Decode blocks. If a core can run up to 4 independent scalar instructions and up to 3 vector instructions, wouldn't it require 7 Fetch/Decode blocks in total?