Previous | Next --- Slide 36 of 55
Back to Lecture Thumbnails
potato

How is this fault-tolerant if they're still in-memory? Wouldn't they still be wiped if memory died?

potato

As a follow-up, when, if eer, is the info written back to disk?

ecb11

@potato, my understanding is that we find that fault tolerance can be achieved by tracing back the lineage to retrieve a previous state of the RDD. In other words, upon a crash, we can go back in the lineage of deterministic transformations made to the RDDs until we are able to begin computation once again. This is a memory-efficient fault-tolerance mechanism since log files of transformations are low in memory to maintain.

amagibaba

What is a partition and why are there partitions?

rtan21

Also wondering this, is it because partitions are needed because there may be multiple physical RAM sticks? Why can't this be abstracted away?

jasonalouda

Could we store all the "lines" in one node, all the "lowers" in another node, etc? Or does that make it not fault tolerant anymore?

thepurplecow

Do we need to make copies of lines when storing them, to ensure that if one node fails, the lines are still available somewhere else?

alexder

I don't really understand what the main advantage of using RDDs is? Is it the fact that it allows for the read-only operations to be parallelized across machines or is there more to it? The way I understand it at least is that RDDs allow you to perform parallel computation tasks on datasets over clusters, but I'm assuming work like this had been done before spark.

Please log in to leave a comment.