Slide 37 of 81
ghostcow

Most graph algorithms are memory bandwidth bound! This is because memory access patterns are often random, and little work is performed per node or edge accessed, resulting in low arithmetic intensity.

Given that graph algorithms are memory bandwidth bound, the only way to go faster is to read less data from memory. More parallelism (e.g. more threads) or more arithmetic can hide *latency*, but once the memory bus is saturated, adding threads does not help: unless you add enough computation to make the program compute-bound, all of the threads simply end up blocked waiting on memory bandwidth.

pranil

@ghostcow's observation helps me develop intuition for what's going on. I think exploiting spatial and temporal locality will be very helpful for performance in these graph algorithms. One way to do it is to lay the graph's data out in memory intelligently: since many programs access a node's neighbors, neighboring nodes could be stored at nearby memory addresses. And if a program accesses nodes in a predictable temporal pattern, the hardware prefetcher could be extremely helpful too.

tmdalsl

I see that optimizing performance on large graphs is based around these two concepts. I was wondering which of the two is the more important factor when it comes to optimization? Could it be a case-by-case basis? If so, how do we know which one to prioritize?
