mereyxu

Is there any processor that can decode and execute more than two instructions per clock?

gmudel

@mereyxu I think so; while I don't have a specific name, the diagram in section 7.2 here seems to indicate it's very much possible.

crs

Are there twice as many registers in this processor compared to the last? If so, how is this different from just having 2 processors and a separate piece to do the out-of-order logic?

leopan

@crs I think this example has the same number of registers as before; only the number of fetch and execution units is doubled. That means the processor can now execute up to two instructions simultaneously, but it doesn't necessarily need a larger register space (in this specific case), since it is holding pretty much the same set and amount of intermediate results.
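
To make that concrete, here is a rough toy model, not how real hardware works. It assumes each instruction is a made-up `(dest, src1, src2)` tuple, every instruction takes one cycle, and issue is in order. The name `count_cycles` and both sample programs are hypothetical. Both programs use the same eight registers `r0` to `r7`; only the issue width changes.

```python
def count_cycles(program, width):
    """Count cycles for in-order issue of up to `width` instructions per cycle.

    Toy assumption: an instruction cannot issue in the same cycle as an
    earlier instruction whose destination register it reads or overwrites.
    """
    cycles = 0
    i = 0
    while i < len(program):
        written = set()  # registers written by instructions issued this cycle
        issued = 0
        while i < len(program) and issued < width:
            dest, src1, src2 = program[i]
            if src1 in written or src2 in written or dest in written:
                break  # depends on something issued this cycle, so it waits
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


# Four independent instructions: no instruction reads another's result.
independent = [("r0", "r4", "r5"), ("r1", "r4", "r5"),
               ("r2", "r6", "r7"), ("r3", "r6", "r7")]

# A dependency chain: each instruction reads the previous one's result.
chain = [("r0", "r4", "r5"), ("r1", "r0", "r5"),
         ("r2", "r1", "r5"), ("r3", "r2", "r5")]

print(count_cycles(independent, 1))  # 4
print(count_cycles(independent, 2))  # 2
print(count_cycles(chain, 2))        # 4
```

In this sketch the second execution unit only helps when neighboring instructions are independent, and the register count never changes.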

probreather101

One thing I noticed in this discussion of superscalar execution and the number of clocks needed for execution is that pipelining was omitted. On this slide, it is clear that there are multiple Fetch/Decode and Execution units, but it might be worth pointing out that each of these is an individual stage in a pipelined processor. A brief side note on pipelines might help connect superscalar execution with the components the processor has access to, and would also explain how modern computers significantly reduce their clock cycle time by beginning to process the next instruction while the first is still finishing.
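
As a back-of-the-envelope illustration of that last point, here is a small sketch under idealized assumptions: every stage takes the same time, and there are no stalls or hazards. The function names and the numbers are made up for the example.

```python
def unpipelined_time(n_instructions, n_stages, stage_delay):
    # Without pipelining, one clock cycle must cover every stage,
    # so each instruction costs n_stages * stage_delay.
    return n_instructions * n_stages * stage_delay


def pipelined_time(n_instructions, n_stages, stage_delay):
    # With pipelining, the clock cycle shrinks to one stage_delay.
    # The first instruction needs n_stages cycles to drain through;
    # each later instruction finishes one cycle after the previous one.
    return (n_stages + n_instructions - 1) * stage_delay


# 100 instructions, 4 stages, 1 time unit per stage
print(unpipelined_time(100, 4, 1))  # 400
print(pipelined_time(100, 4, 1))    # 103
```

Each individual instruction still takes four stage times from start to finish; the gain comes from overlapping instructions, which is separate from (and combines with) the extra units that superscalar execution adds.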

stao18

Since there are two ALUs in this processor, does it mean it can process two threads at once, or will it still be a single-threaded processor?

albystein

Does a superscalar processor have two or more program counters/instruction pointers?

tanner

@mereyxu It is possible to design a processor that can execute more than two instructions per clock, though after searching around for a while I haven't been able to find an example of a real processor that does this. The following quotations from one of the supplemental reading papers (https://queue.acm.org/detail.cfm?id=1095418) might help explain why. First: "Typical instruction streams have only a limited amount of usable parallelism among instructions, so superscalar processors that can issue more than about four instructions per cycle achieve very little additional benefit on most applications." Second: "Complicating matters further, building superscalar processor cores that can exploit more than a few instructions per cycle becomes very expensive, because the complexity of all the additional logic required to find parallel instructions dynamically is approximately proportional to the square of the number of instructions that can be issued simultaneously."
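
One rough way to see where the "square" in the second quote could come from is to count how many instruction pairs have to be compared within one issue group. This is only a sketch of the pairwise-comparison part, under the assumption that every instruction in the group is checked against every earlier one; real issue logic involves much more than this. The name `dependency_checks` is made up.

```python
from itertools import combinations


def dependency_checks(issue_width):
    # Count the (earlier, later) instruction pairs in one issue group.
    # Each pair needs its own comparison to detect a dependency.
    return sum(1 for _ in combinations(range(issue_width), 2))


for width in (2, 4, 8):
    print(width, dependency_checks(width))
# 2 1
# 4 6
# 8 28
```

The count is `width * (width - 1) / 2`, so doubling the issue width roughly quadruples the number of comparisons.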

ccheng18

Each core can run one thread unless it is a hyperthreaded core, in which case it can run up to two threads, each with its own instruction context.