Previous | Next --- Slide 39 of 55
Back to Lecture Thumbnails
subscalar

The two reasons given in lecture for the change in program 2 is that splitting the input into chunks that can independently be processed permits more parallelism and reduces the memory requirements for each chunk. These benefits naturally apply to the RDD partitioning described in previous slides. Partitioning also facilitates certain functionality needed in distributed systems, such as data replication, checkpoints, and the ability to reschedule operations from failed nodes.

Please log in to leave a comment.