I am still a little confused about where this code runs. Is it run on a master node and distributed to the workers, or is it run on each worker against its subset of the data?
michzrrr
So multiple RDDs will be created as this program runs. Is that an excessive use of space, or can users choose to delete the ones they won't need?