Slide 36 of 83
joshcho

What would be a problem with having sum = reduce_add(x[i]) in the foreach loop?

joshcho

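Ah, I understand. Each instance's partial holds only its own share of the sum (one of 8 partial sums, assuming a gang size of 8), so the partials still have to be combined at the end.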

leo

Why is x[i] a different value for each program instance? The caller provides an array into the second parameter, and then ISPC creates the program instances using the same values across them. I don't see how different values of x[i] could mess the sum up, since they'd get the same value after the for loop.

leo

Oh, since sum is uniform, it has to hold a single value shared by all instances. But x[i] is not the same across instances (each instance gets a different i in the foreach loop), so accumulating x[i] into sum breaks the uniform condition. In that case, I'm assuming it would be possible to have parallelism if sum were not uniform?
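To make the uniform point concrete, here is a minimal Python sketch (not ISPC) of what one gang step of the foreach loop sees. It assumes a gang size of 8 and that instance k handles element base + k; gang_view and is_uniform are hypothetical helper names made up for this illustration.

```python
GANG_SIZE = 8  # assumed gang width; the real value depends on the ISPC target


def gang_view(x, base):
    """Values of x[i] seen by each instance in the gang step starting at base.

    Assumption: instance k works on index base + k.
    """
    return [x[base + k] for k in range(GANG_SIZE) if base + k < len(x)]


def is_uniform(values):
    """True if every instance holds the same value."""
    return all(v == values[0] for v in values)


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
seen = gang_view(x, 0)
print(seen)               # one value per instance
print(is_uniform(seen))   # the instances disagree, so x[i] is varying
```

Because the instances see different values of x[i], a running sum of x[i] cannot be stored in one shared (uniform) variable without some cross-instance step to combine them.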

tmdalsl

As they must be written somewhere in memory, is there a way we can look at each of the partial sums? Or is that information hidden from the user?

gsamp

float partial is declared without uniform, so each program instance in the gang has its own copy, and they all start at 0.0f.

In each iteration of the foreach loop, each instance adds its own x[i] to its own partial.

Then reduce_add adds together all of the partial sums; it has access to all N instance-local float partial values.
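The steps above can be sketched in plain Python as a rough emulation of the gang. This is not ISPC code: the gang size of 8, the mapping of instance k to index base + k, and the function name sum_with_partials are all assumptions for illustration, and reduce_add here is just a stand-in for the ISPC built-in.

```python
GANG_SIZE = 8  # assumed gang width; the real value depends on the ISPC target


def reduce_add(values):
    """Stand-in for ISPC's reduce_add: combine one value per instance."""
    return sum(values)


def sum_with_partials(x):
    # One private accumulator per program instance, all starting at 0.0.
    partial = [0.0] * GANG_SIZE
    # Emulate foreach: in each gang step, instance k handles element base + k.
    for base in range(0, len(x), GANG_SIZE):
        for k in range(GANG_SIZE):
            i = base + k
            if i < len(x):  # instances past the end of the array do nothing
                partial[k] += x[i]
    # A single cross-instance reduction after the loop.
    return partial, reduce_add(partial)


partial, total = sum_with_partials([float(v) for v in range(1, 17)])
print(partial)  # eight per-instance partial sums
print(total)    # the overall sum
```

In this emulation the partial sums are ordinary list entries, so they can be printed; that says nothing about whether ISPC itself exposes them outside the function.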

vgupta22

Since we remove the uniform and now each program instance has to track its own value for sum, will we see a significant decrease in the speedup?

noelma

Kayvon highlighted that reduce_add combines the partials of all the program instances in the gang into one overall sum.

rubensl

@joshcho I think it would be possible to call sum += reduce_add(x[i]) directly in each loop iteration with no errors. It would just be really inefficient: you would lose much of the benefit of vectorization, since every iteration would need a cross-instance reduction (added communication between program instances) instead of a single reduction at the end. More on this can be found in https://ispc.github.io/perfguide.html.
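A rough Python sketch of the trade-off, counting how many cross-instance reductions each variant performs. It is only an emulation under assumptions (gang size of 8, hypothetical function names, Python's sum standing in for reduce_add) and says nothing about actual ISPC timings.

```python
GANG_SIZE = 8  # assumed gang width


def sum_reduce_every_step(x):
    """Emulates sum += reduce_add(x[i]) inside the loop."""
    total = 0.0
    reductions = 0
    for base in range(0, len(x), GANG_SIZE):
        lanes = x[base:base + GANG_SIZE]  # one x[i] per active instance
        total += sum(lanes)               # cross-instance reduction each step
        reductions += 1
    return total, reductions


def sum_reduce_once(x):
    """Emulates per-instance partial sums with one reduce_add at the end."""
    partial = [0.0] * GANG_SIZE
    for base in range(0, len(x), GANG_SIZE):
        for k, v in enumerate(x[base:base + GANG_SIZE]):
            partial[k] += v               # instance-local add, no communication
    return sum(partial), 1                # single reduction after the loop


x = [float(v) for v in range(64)]
print(sum_reduce_every_step(x))  # same total, one reduction per gang step
print(sum_reduce_once(x))        # same total, one reduction overall
```

Both variants produce the same total here; the difference is that the first performs one reduction per gang step while the second performs one in total.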
