@potato, I have the same question. I think the "total latency of memory access" should account for the whole blue section as well, since we can only use the values we read from memory for subsequent computations when they have been transferred to our working registers.
I still don't understand why the "total latency of memory access" only covers part of the blue section instead of the full blue section. Is it because the blue section is both "transfer cache line over memory bus" and "transfer value to processor register" (so there should be a split between the two which is where the "total latency of memory access" ends)?
Are many of these metrics measurable in practice from software? It seems low-level enough that it would be hard to diagnose the precise issue without more access to information in hardware than we usually have…
What precisely is "total latency of memory access", and why does it include part of the blue section? I'd imagine that it would only extend to the start of the blue section, since that's when the request is serviced by memory.