Slide 39 of 94
pizza

To clarify this example, in order to have a throughput of 1 instruction per cycle, would we need a processor that has 4-way superscalar execution?

12345

@pizza I think the answer is no. If you look at the diagram, there is never more than one instruction in the same stage in any given cycle, yet once the pipeline is full, one instruction completes every cycle. You can think of EX as the stage where an instruction runs on an ALU. With multiple ALUs, multiple independent instructions could all be in the EX stage at the same time, which is what superscalar execution provides.
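To make that concrete, here is a minimal sketch of an ideal scalar pipeline, assuming four stages named IF, D, EX, and WB and no stalls or hazards. The stage names and the helper functions `schedule` and `throughput` are my own illustration, not something taken from the slide.

```python
# Assumed stage names for illustration; a real design may differ.
STAGES = ["IF", "D", "EX", "WB"]

def schedule(n_instructions, stages=STAGES):
    """Return {cycle: {stage: [instruction indices]}} for an ideal
    scalar pipeline with no stalls (n_instructions >= 1)."""
    table = {}
    for i in range(n_instructions):
        for s, stage in enumerate(stages):
            cycle = i + s  # instruction i occupies stage s in cycle i + s
            table.setdefault(cycle, {}).setdefault(stage, []).append(i)
    return table

def throughput(n_instructions, stages=STAGES):
    """Instructions completed per cycle over the whole run."""
    total_cycles = len(schedule(n_instructions, stages))
    return n_instructions / total_cycles

if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, "instructions ->", round(throughput(n), 3), "per cycle")
```

Each stage holds at most one instruction per cycle, and the total run takes n + 3 cycles for n instructions, so the throughput approaches 1 instruction per cycle as n grows without any superscalar hardware.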

student1

To further clarify, are the instruction fetch and the decode + register read stages carried out simultaneously? Are they carried out by the corresponding execution unit? Also, to make sure: are ALUs responsible only for computational instructions?

tim

@student1 In this specific diagram, decode + register read is carried out within one clock cycle. This is a design decision that probably varies between CPU designs. (As noted on the slide, the pipeline on the i7, meaning the time from instruction start to instruction done, is 20 clocks.)

ALUs are responsible for integer math computation and boolean logic.

rthomp

What is going on in the Intel Core i7 pipeline to give it (a maximum of) 20 different stages? I guess that's a can of worms we probably won't open in this class but it just makes me more and more fascinated by the complexity of computer architecture.

gklimias

@rthomp Most of those stages are there to allow out-of-order execution (OoO). OoO is a technique that computer architects came up with to hide the latency of slower micro-operations (mostly memory accesses and floating-point operations). In short, a CPU core can make progress on other independent instructions from the same thread while waiting for slower instructions to complete. Therefore, a number of stages exist to reorder instructions and to select which ones can be executed.
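As a rough illustration of the latency-hiding idea, here is a toy model that issues at most one instruction per cycle, either strictly in program order or out of order. It is only a sketch under strong assumptions: the `run` function, the (latency, dependencies) encoding, and the latencies are hypothetical, dependencies must point to earlier instructions, and it ignores register renaming, the reorder buffer, and a finite instruction window, so it is not how the i7 actually does it.

```python
def run(instrs, out_of_order):
    """instrs: non-empty list of (latency, deps), where deps are indices of
    EARLIER instructions. Issues at most one instruction per cycle and
    returns the cycle at which the last result becomes available."""
    done_at = {}                       # index -> cycle its result is ready
    pending = list(range(len(instrs)))
    cycle = 0
    while pending:
        # In-order issue may only consider the oldest pending instruction;
        # out-of-order issue may pick the oldest instruction that is ready.
        candidates = pending if out_of_order else pending[:1]
        for i in candidates:
            latency, deps = instrs[i]
            if all(d in done_at and done_at[d] <= cycle for d in deps):
                done_at[i] = cycle + latency
                pending.remove(i)
                break                  # one issue per cycle
        cycle += 1
    return max(done_at.values())

# A slow load (assumed latency 10), one instruction that depends on it,
# and three independent single-cycle instructions.
program = [(10, []), (1, [0]), (1, []), (1, []), (1, [])]

if __name__ == "__main__":
    print("in-order:    ", run(program, out_of_order=False))  # 14
    print("out-of-order:", run(program, out_of_order=True))   # 11
```

In the in-order run, the three independent instructions sit behind the stalled dependent one. In the out-of-order run they issue while the load is still in flight, so the whole sequence finishes three cycles earlier.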
