To understand these unpredictable runtimes statistically, it seems useful to know the rough distribution (or asymptotics) of the maximum of n random variables of various kinds: if we split the work roughly evenly across n tasks, the wall-clock time is set by the maximum runtime among them.
I'd guess pretty common/reasonable models might be geometric, exponential, binomial, and Gaussian RVs. For these, the expected "extra time" (from taking the max over n variables instead of one) scales roughly as log(n) or sqrt(log(n)). That slow growth reads like a glowing statistical endorsement of parallelism, but I'd guess real-world tasks have fatter tails than these RVs. Some interesting analysis to be done there!
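As a quick sanity check of that scaling claim, here's a small Monte Carlo sketch (my own illustration, assuming i.i.d. Exp(1) and standard normal "runtimes" rather than any measured workload). It estimates the expected maximum over n draws and prints it next to the log(n) and sqrt(2 log n) reference curves:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_max(sampler, n, trials=2000):
    """Monte Carlo estimate of E[max of n i.i.d. draws from sampler]."""
    draws = sampler(size=(trials, n))
    return draws.max(axis=1).mean()

for n in [1, 10, 100, 1000, 10000]:
    exp_max = expected_max(lambda size: rng.exponential(1.0, size), n)
    norm_max = expected_max(lambda size: rng.normal(0.0, 1.0, size), n)
    print(f"n={n:6d}  "
          f"Exp(1) max ~ {exp_max:6.2f}  (log n = {np.log(n):5.2f})  "
          f"N(0,1) max ~ {norm_max:5.2f}  (sqrt(2 log n) = {np.sqrt(2 * np.log(n)):4.2f})")
```

For exponentials the expected maximum is the harmonic number H_n ≈ log(n), and for Gaussians it grows like sqrt(2 log n), so both curves should track the printed references closely as n grows.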