It might be worth adding that: sometimes you really have to decide if using a cluster of commodity machines is truly worthwhile for your application, considering its arithmetic intensity and the regularity of the access patterns. There's been work in parallel graph systems that basically shows you can get pretty far with a single very high-end server with lots of RAM, and another example is Pixie from Pinterest (https://arxiv.org/pdf/1711.07601.pdf). [I can't remember the reference from graph analytics, but will post if I find it.] Today, EC2 supports up to 12TB per instance (that's already more than 10% of the 100TB figure at Facebook!) and many dozens of CPU cores, and the costs actually tend to scale pretty linearly with the resources you're given. That's not the same as 16,000 cores, but very few applications truly need that scale — so it's always worth running through this analysis before committing a lot of time to a complex distributed solution!
It might be worth adding that: sometimes you really have to decide if using a cluster of commodity machines is truly worthwhile for your application, considering its arithmetic intensity and the regularity of the access patterns. There's been work in parallel graph systems that basically shows you can get pretty far with a single very high-end server with lots of RAM, and another example is Pixie from Pinterest (https://arxiv.org/pdf/1711.07601.pdf). [I can't remember the reference from graph analytics, but will post if I find it.] Today, EC2 supports up to 12TB per instance (that's already more than 10% of the 100TB figure at Facebook!) and many dozens of CPU cores, and the costs actually tend to scale pretty linearly with the resources you're given. That's not the same as 16,000 cores, but very few applications truly need that scale — so it's always worth running through this analysis before committing a lot of time to a complex distributed solution!