Given a set of functions and the set of vertices to apply them to, GraphLab will choose how to parallelize the algorithm and even which data structures to use under the hood. One example from lecture: if GraphLab sees that a program never reads outgoing edges, it will choose a data structure optimized for iterating over incoming edges only.
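A minimal sketch of the idea (the names here are illustrative, not GraphLab's real API): if the vertex program only ever reads in-neighbors, the runtime can get away with storing just in-edge adjacency lists.

```python
# Hypothetical sketch, NOT GraphLab's actual interface: a PageRank-style
# update that reads only incoming edges, so the runtime needs only an
# in-edge representation of the graph.

def make_in_edges(edges, n):
    """Build in-edge adjacency lists only -- all this program requires."""
    in_edges = [[] for _ in range(n)]
    for src, dst in edges:
        in_edges[dst].append(src)
    return in_edges

def pagerank_step(values, in_edges, out_degree, damping=0.85):
    """One synchronous update; each vertex reads only its in-neighbors."""
    n = len(values)
    return [
        (1 - damping) / n
        + damping * sum(values[src] / out_degree[src] for src in in_edges[v])
        for v in range(n)
    ]
```

Because `pagerank_step` never touches outgoing edges, a runtime that inspects the program could skip materializing out-edge lists entirely, which is the kind of optimization the lecture example describes.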
What about programs that operate on only a small subset of the graph? For example, an algorithm that starts a BFS from vertex 1 and might reach / update only 0.1% of the vertices. Though there are mechanisms to mark vertices as "converged" so they are not updated again, it seems like GraphLab's interface only accepts the entire vertex set as an entry point.
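One way systems in this space handle that case is an active set: a vertex program "signals" its neighbors, and the scheduler only ever processes signaled vertices, so untouched vertices cost nothing. A hedged sketch of that pattern (hypothetical code, not GraphLab's API):

```python
# Hypothetical sketch: updates are driven by an active worklist rather
# than a sweep over all vertices, so a BFS that reaches 0.1% of the
# graph does work proportional to that 0.1%.

def bfs_levels(adj, source):
    """Compute BFS levels from `source`, touching only reached vertices."""
    level = {source: 0}        # only reached vertices get any state
    active = {source}          # scheduler worklist ("signaled" vertices)
    while active:
        next_active = set()
        for v in active:
            for w in adj.get(v, []):
                if w not in level:      # first visit: update and signal
                    level[w] = level[v] + 1
                    next_active.add(w)
        active = next_active
    return level
```

Vertices in other components never appear in `active` and are never visited, which is the behavior the question is asking whether GraphLab's entry point can express.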
Is GraphLab still widely used today? What is its relationship to other graph DSLs like Ligra in terms of how programmers use them in practice?
What are the key elements of GraphLab that influenced subsequent, more advanced graph DSLs? Is it just the idea of defining work per-vertex? Overall, it's not clear which of GraphLab's ideas yield actual speedups (as opposed to just easier development) and were therefore carried into later systems.