abraoliv

So messages don't travel "around" the loop, per se, right? The wires still just switch between HI and LOW really quickly, right? So what is the idea behind the ring and the double connections? Also, the professor mentioned "shortcutting"; could this be expanded upon?

juliob

Following up on the above: does a ring of this configuration mean that data would reach some cores faster than others? I.e., if a core writes x = 3 over an old value of x = 0, would the core adjacent to the writer receive that information first, followed by all the others?

zmelnyk

@juliob What I got from lecture is that messages travel one way around each loop, and that the double-loop structure is set up with one loop running in each direction, so the farthest a message has to travel to reach any other core is half of the loop.
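To make the one-way vs. double-loop difference concrete (and to speak to @juliob's question), here is a minimal Python sketch that only counts ring links. It assumes stops numbered 0..n-1, one hop per link, and that a sender on the double ring always takes the shorter direction. The names `ring_hops` and `arrival_order` are hypothetical, and the model ignores contention, per-hop delay, and when a write actually becomes visible under the coherence protocol.

```python
def ring_hops(src: int, dst: int, n: int, bidirectional: bool) -> int:
    """Number of ring links a message crosses from stop `src` to stop `dst`.

    Assumes n stops numbered 0..n-1 and one hop per link. On the double
    ring, the sender is assumed to pick the shorter direction.
    """
    clockwise = (dst - src) % n
    if not bidirectional:
        return clockwise  # one-way ring: keep going the same way around
    return min(clockwise, n - clockwise)  # take whichever loop is shorter


def arrival_order(writer: int, n: int, bidirectional: bool) -> list[tuple[int, int]]:
    """(core, hops) for every other core, sorted by link distance from the writer."""
    others = [c for c in range(n) if c != writer]
    pairs = [(c, ring_hops(writer, c, n, bidirectional)) for c in others]
    return sorted(pairs, key=lambda pair: pair[1])


for bidir in (False, True):
    label = "double ring" if bidir else "one-way ring"
    print(label, arrival_order(writer=0, n=8, bidirectional=bidir))
```

With 8 stops and core 0 as the writer, the worst case drops from 7 hops on a one-way ring to 4 on the double ring, and in this model the writer's two neighbors (cores 1 and 7) are reached first.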

fdxmw

What are the pros and cons of using a ring topology for the interconnect versus a mesh topology?

leave

Found a very nice video that explains Intel's ring bus: https://www.youtube.com/watch?v=HGToWWne3do. It looks like the ring design helps reduce latency, which favors consumer-facing applications.
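One axis of the ring-vs-mesh trade-off asked about above is how worst-case hop count and link count scale with the number of stops. A minimal sketch, assuming a bidirectional ring and a square k x k 2D mesh with unit-cost hops; the function names are hypothetical. It deliberately ignores per-hop router complexity, wire length, and bandwidth, which is where a ring's simpler stops may buy back latency at modest core counts.

```python
import math


def ring_diameter(n: int) -> int:
    """Worst-case hops between two of n stops on a bidirectional ring."""
    return n // 2


def ring_links(n: int) -> int:
    """Neighbor-to-neighbor connections in a ring of n stops (n >= 3)."""
    return n


def _mesh_side(n: int) -> int:
    """Side length k of a square k x k mesh holding n stops."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("this sketch assumes n is a perfect square")
    return k


def mesh_diameter(n: int) -> int:
    """Worst-case hops on a k x k mesh: corner to opposite corner."""
    k = _mesh_side(n)
    return 2 * (k - 1)


def mesh_links(n: int) -> int:
    """Neighbor-to-neighbor connections: k rows and k columns, each with k - 1 links."""
    k = _mesh_side(n)
    return 2 * k * (k - 1)


for n in (4, 16, 64):
    print(f"n={n:2d}  ring: {ring_diameter(n):2d} hops, {ring_links(n):3d} links"
          f"  mesh: {mesh_diameter(n):2d} hops, {mesh_links(n):3d} links")
```

At 4 stops the two are identical (2 hops, 4 links); at 64 stops the ring's worst case is 32 hops vs. 14 for the mesh, while the mesh needs 112 links to the ring's 64. That is one plausible reason rings suit smaller parts and meshes suit high-core-count designs.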

shivalgo

Are 4 L3 cache slices like having a 4-way set-associative cache?

fromscratch

@shivalgo, I think it isn't about associativity. I believe it's more like the NUMA discussions in terms of locality.
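A minimal sketch of the distinction, with hypothetical parameters: slicing decides which slice a line lives in (exactly one, chosen from the address), while associativity decides how many ways within a set that line may occupy. The XOR-fold `slice_of` hash below is a made-up stand-in; Intel's real slice hash is not publicly documented. Because every line has a single home slice, a lookup from a given core may go to a near or far ring stop, which is why this resembles NUMA-style locality more than associativity.

```python
NUM_SLICES = 4         # the "4 L3 cache slices" from the question
LINE_BYTES = 64        # assumed cache-line size
SETS_PER_SLICE = 2048  # hypothetical
WAYS = 16              # hypothetical associativity *within* each slice


def slice_of(addr: int) -> int:
    """Hypothetical slice-selection hash: XOR together the base-4 digits of the line number.

    Stand-in only; the point is that each address maps to exactly ONE slice.
    """
    line = addr // LINE_BYTES
    h = 0
    while line:
        h ^= line % NUM_SLICES
        line //= NUM_SLICES
    return h


def set_index(addr: int) -> int:
    """Set within the chosen slice (toy indexing by low line-number bits)."""
    return (addr // LINE_BYTES) % SETS_PER_SLICE


def candidate_locations(addr: int) -> list[tuple[int, int, int]]:
    """Every (slice, set, way) where this line could be cached: one slice, one set, WAYS ways."""
    s, idx = slice_of(addr), set_index(addr)
    return [(s, idx, way) for way in range(WAYS)]


# Consecutive lines are spread across slices, but each line has one home slice.
print([slice_of(a) for a in range(0, 8 * LINE_BYTES, LINE_BYTES)])
```

Adding slices spreads lines (and traffic) across ring stops; it does not give any single line more places to live, which is what associativity would do.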

tcr

I hadn't thought much about the hardware implementation of a shared address space! I'm curious about how this works at the OS level (I don't know much about multi-core OSes). CS140 teaches us that a basic kernel has essentially universal access to machine memory. So: to what extent would the details of hardware like this be exposed in an implementation of the kernel address space? Would the kernel still see a straightforward implementation of array-ish memory?

mcu

@tcr The kernel doesn't just see array-ish memory; it also has some degree of access to the cache. For example, in a virtual memory system, changes to the TLB might necessitate a cache flush, depending on how the specific CPU's cache is indexed and tagged (by virtual or by physical addresses). For general use, however, it can indeed pretend that memory is straightforward and array-ish, because most of these implementation details are hidden at the processor level: they exist to support the semantics of the instruction set architecture while allowing a more performant underlying implementation.
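To illustrate the point that caching hides behind plain load/store semantics, here is a toy sketch: a direct-mapped, write-back cache (hypothetical class names and sizes) in front of a flat byte array. Every load through the cache returns what a plain array would, yet the backing array itself can be stale until the cache is flushed. It does not model virtual addressing or the TLB case mentioned above; it only shows why a flush matters when something looks past the cache.

```python
import random


class FlatMemory:
    """What software 'sees': a plain array of bytes."""

    def __init__(self, size):
        self.data = bytearray(size)

    def load(self, addr):
        return self.data[addr]

    def store(self, addr, value):
        self.data[addr] = value


class WriteBackCache:
    """Toy direct-mapped, write-back cache in front of a FlatMemory (hypothetical sizes)."""

    def __init__(self, memory, num_lines=8, line_bytes=16):
        if len(memory.data) % line_bytes:
            raise ValueError("memory size must be a multiple of the line size")
        self.mem, self.num_lines, self.line_bytes = memory, num_lines, line_bytes
        self.tags = [None] * num_lines  # which memory line each slot holds
        self.lines = [bytearray(line_bytes) for _ in range(num_lines)]
        self.dirty = [False] * num_lines

    def _locate(self, addr):
        line_no = addr // self.line_bytes
        return line_no % self.num_lines, line_no, addr % self.line_bytes

    def _evict(self, idx):
        """Write a dirty line back to memory, then invalidate the slot."""
        if self.dirty[idx]:
            base = self.tags[idx] * self.line_bytes
            self.mem.data[base:base + self.line_bytes] = self.lines[idx]
            self.dirty[idx] = False
        self.tags[idx] = None

    def _fill(self, idx, line_no):
        if self.tags[idx] == line_no:
            return  # hit
        self._evict(idx)  # conflict miss: make room first
        base = line_no * self.line_bytes
        self.lines[idx][:] = self.mem.data[base:base + self.line_bytes]
        self.tags[idx] = line_no

    def load(self, addr):
        idx, line_no, off = self._locate(addr)
        self._fill(idx, line_no)
        return self.lines[idx][off]

    def store(self, addr, value):
        idx, line_no, off = self._locate(addr)
        self._fill(idx, line_no)
        self.lines[idx][off] = value
        self.dirty[idx] = True  # memory is now stale for this line

    def flush(self):
        for idx in range(self.num_lines):
            self._evict(idx)


def run_random_trace(steps=10_000, size=256, seed=0):
    """Drive the cache and a plain reference array with the same random trace."""
    rng = random.Random(seed)
    ref, backing = FlatMemory(size), FlatMemory(size)
    cache = WriteBackCache(backing)
    for _ in range(steps):
        addr = rng.randrange(size)
        if rng.random() < 0.5:
            value = rng.randrange(256)
            ref.store(addr, value)
            cache.store(addr, value)
        elif cache.load(addr) != ref.load(addr):
            return False
    cache.flush()  # push dirty lines back before inspecting the backing array directly
    return backing.data == ref.data


print(run_random_trace())
```

Software that only goes through `load`/`store` can treat this as an array; only code that inspects `backing.data` directly needs to care about `flush`.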

ccheng18

How are memory buses implemented? Is the MSI protocol an example of it?