We can tell that the memory bus is fully utilized because there is no point in time (no vertical slice of the graph) without a blue bar (representing usage of the memory bus): the memory bus is always working.
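The "no vertical slice without a blue bar" check can be sketched in code; the transfer intervals below are made up for illustration, not read off the actual figure:

```python
# Given the (start, end) clock ranges of each memory-bus transfer, the bus
# is fully utilized iff each transfer begins the moment the previous one
# ends. The intervals used below are hypothetical.

def fully_utilized(transfers):
    """transfers: list of (start_clock, end_clock), sorted by start."""
    for (_, prev_end), (next_start, _) in zip(transfers, transfers[1:]):
        if next_start > prev_end:  # a gap: a vertical slice with no blue bar
            return False
    return True

print(fully_utilized([(0, 8), (8, 16), (16, 24)]))  # back-to-back: True
print(fully_utilized([(0, 8), (10, 18)]))           # bus idle clocks 8-10: False
```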
ecb11
I believe we can illustrate latency as the grayed-out gap between each load instruction (shown in green) and the memory bus usage that serves it (shown in blue). An increase in latency would mean a greater number of clocks between the load instruction (where bandwidth doesn't matter) and the memory bus usage (where bandwidth matters).
shaan0924
No. Math instructions don't stall on memory, so increasing the number of math instructions per load instruction covers up the latency between the load instruction and the memory bus usage, as well as the transfer itself. As such, if we have a significant increase in the math/load ratio, the stalls can be covered up completely by the math instructions.
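shaan0924's ratio argument can be made concrete with a small calculation. The 8-clock transfer and the 1-clock-per-math-instruction cost are assumptions for illustration, not numbers from the lecture:

```python
def stall_clocks(transfer_clocks, math_per_load):
    """Clocks the processor stalls per load, assuming each math instruction
    takes 1 clock and math fully overlaps the outstanding load.
    The parameters are illustrative, not taken from the actual figure."""
    return max(0, transfer_clocks - math_per_load)

for ratio in (3, 8, 16):
    # Once math clocks >= transfer clocks, the stall is hidden completely.
    print(ratio, stall_clocks(8, ratio))
```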
brianamb
For the last point, the latency itself wouldn't decrease; the math instructions would simply hide it. But that would be effective in making it seem like there is no latency.
sareyan
3- I think if memory bus bandwidth is increased, blue lines are allowed to overlap, reducing the bandwidth-constrained throughput from 8 to 8/bandwidth and reducing the stall from 5 to max(0, 8/bandwidth - 3)
lindenli
I think the last question depends on how many math instructions there are. As Kayvon went over in the last part of class, the ratio to think about is arithmetic intensity—that is, the ratio of the amount of computation to the amount of communication. If this is higher, then we would be less bandwidth limited (since steady state utilization would be higher) and hence processor stalls would decrease.
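The arithmetic-intensity comparison lindenli describes can be written out as a quick check; the peak-rate numbers below are hypothetical machine parameters, not values from the lecture:

```python
def bound(flops, bytes_moved, peak_flops_per_clock, peak_bytes_per_clock):
    """Compare a program's arithmetic intensity (flops per byte moved)
    against the machine's balance point to decide which resource limits
    throughput. All parameter values used below are made up."""
    intensity = flops / bytes_moved
    balance = peak_flops_per_clock / peak_bytes_per_clock
    return "compute-bound" if intensity >= balance else "bandwidth-bound"

# Low intensity: little math per byte loaded -> limited by the memory bus.
print(bound(flops=1, bytes_moved=8, peak_flops_per_clock=4, peak_bytes_per_clock=8))
# High intensity: lots of math per byte loaded -> limited by the ALUs.
print(bound(flops=16, bytes_moved=8, peak_flops_per_clock=4, peak_bytes_per_clock=8))
```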
ggomezm
One thing that hasn't been brought up regarding the last question: we also need to make sure the added math instructions aren't all dependent on data that still needs to be loaded. If we add a bunch of math operations but they all depend on data coming from memory, we will still get stalls.
apappu
The answer to the second one is unclear to me -- about illustrating higher latency with fixed bandwidth. The length of the blue bar should still be 8 clocks if we are able to move 8 bytes / clock and are fetching 64 bytes.
ghostcow
@sareyan perhaps I'm misunderstanding how the graph works here, but I'm not clear on why blue lines would be allowed to overlap if the memory bus bandwidth were increased. The way I'd visualize it, an increase in the bandwidth would decrease the length of each blue bar, as more data would be able to be fetched per unit time. @apappu, I think this relates to your question as well.
huangda
@ghostcow, they aren't allowed to overlap. They would just be shorter, as we can move more bytes per clock so it would take fewer clocks to move each blue bar.
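huangda's point (shorter bars, not overlapping ones) is quick to check with arithmetic; the 64-byte transfer size is an assumption for illustration:

```python
def blue_bar_clocks(line_bytes, bytes_per_clock):
    """Width of one blue bar: clocks needed to move one transfer over the
    bus. A single bus serves one transfer at a time, so bars never overlap;
    more bandwidth just makes each bar narrower."""
    return line_bytes // bytes_per_clock

# Doubling bandwidth halves the bar width (64-byte transfers assumed).
for bpc in (8, 16, 32):
    print(bpc, blue_bar_clocks(64, bpc))
```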
gsamp
We can tell that the memory bus is fully utilized by looking at the blue rectangle. When one request ends, another starts immediately. There are no stalls between requests.
We can illustrate higher memory latency by increasing the width (horizontally) of the gray rectangles that sit between the green boxes and the blue rectangles. Green boxes = load instruction. Gray rectangles = time it takes for memory to start servicing the load.
Blue rectangles would have reduced width (horizontally).
At some point, no. If there are enough arithmetic instructions, they can cover the stalls (as per @shaan0924's comment)
tcr
1) Blue rectangle always present for every unit time (agree with gsamp!)
2) Latency = time from when load is issued to time when information is received = fixed latency for request to get to memory (gray bar) + amount of time needed for requested data to get back (blue bar).
You could increase any part of the blue or gray bars, I think either by increasing the size of the cache line (if we’re not changing bandwidth; since blue rectangle = cache line size / bandwidth) or increasing the “gray area”.
3) Blue rectangles might be smaller?
4) No, I don’t think so (if truly significant) - no longer bandwidth-bound.
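tcr's decomposition in (2) can be checked with a small calculator; the gray-bar, cache-line, and bandwidth numbers below are assumptions for illustration:

```python
def load_latency(gray_clocks, line_bytes, bytes_per_clock):
    """Total load latency = fixed request latency (gray bar) + transfer
    time (blue bar = cache line size / bandwidth). All values are made up."""
    blue_clocks = line_bytes / bytes_per_clock
    return gray_clocks + blue_clocks

print(load_latency(2, 64, 8))   # baseline: 2 + 8 = 10.0 clocks
print(load_latency(2, 128, 8))  # bigger cache line widens the blue bar
print(load_latency(6, 64, 8))   # more "gray area" widens the gray bar
```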
There are blue rectangles throughout the time which means memory is utilized 100% of the time.
Higher memory latency would appear as a wider gray region between a load and the start of its blue rectangle; the width of the blue rectangle itself is set by bandwidth, not latency. In a bandwidth-bound system that keeps the bus saturated, higher latency does not reduce throughput.
If memory bandwidth increases, the blue rectangle becomes shorter.
If the computation time of the arithmetic instructions covers the time until the outstanding load finishes, there will not be any processor stall.