mark

Would moving the line float myDiff = 0.f; after the first barrier potentially speed up the code, since instead of having all threads execute two lines before getting to the main portion of the code, they would just have to execute one?

awu

I'd imagine the speedup is minimal, but it does seem like we can move it after the barrier, since myDiff is private to each thread.

jennaruzekowicz

The idea of barriers is extremely interesting. What are some examples of how code uses these in the real world? Has anyone used these before? I would love to hear about applications!

Additionally, what does the implementation under the hood look like for barriers? How do they affect run time and overhead?

fromscratch

@jennaruzekowicz, I think one of the simplest ways to implement a barrier is a spin barrier built on an atomic integer counter (or an integer counter protected by a lock). Each thread increments the counter when it reaches the barrier, then loops ("spins") until the counter reaches the total number of threads.

This would likely generate a lot of traffic on the memory/cache subsystem, so I imagine there are better implementations for use cases where that's undesirable.
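A minimal Python sketch of the counter-based spin barrier described above. Python has no atomic integers, so a `threading.Lock` stands in for the atomic increment; the class name `SpinBarrier` is hypothetical. As written it is single-use: the counter is never reset, which is exactly the problem a barrier inside a loop has to solve.

```python
import threading
import time


class SpinBarrier:
    """Hypothetical one-shot spin barrier: a lock-protected shared counter."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.count = 0
        self.lock = threading.Lock()

    def _arrived(self):
        with self.lock:
            return self.count

    def wait(self):
        with self.lock:          # stands in for an atomic increment
            self.count += 1
        # Spin: keep re-reading the shared counter until everyone has arrived.
        while self._arrived() < self.num_threads:
            time.sleep(0)        # yield so other threads can make progress


def demo(num_threads=4):
    barrier = SpinBarrier(num_threads)
    log, log_lock = [], threading.Lock()

    def worker(tid):
        with log_lock:
            log.append(("before", tid))
        barrier.wait()
        with log_lock:
            log.append(("after", tid))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Every thread logs "before" prior to incrementing the counter, and no thread logs "after" until the counter reaches `num_threads`, so all "before" entries precede all "after" entries. The repeated reads of `count` while spinning are the memory/cache traffic mentioned above.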

xiyan

In lecture, we talked about why the solver uses 3 barriers. The first barrier makes sure no thread clears diff after another thread has already started adding its contribution to diff. The second barrier ensures that we check whether diff is less than TOLERANCE only after all threads have updated diff; removing it may cause us to incorrectly terminate early. The third barrier ensures every thread has checked the convergence condition (that is, read diff) before diff is reset to 0 at the start of the next iteration.
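A sketch of the three-barrier loop in Python, assuming the lecture's structure (clear diff, barrier, accumulate under a lock, barrier, convergence check, barrier). The grid size, thread count, update formula, and the `shared` dict are hypothetical stand-ins chosen to keep the example small; `threading.Barrier` is reusable, so one object serves all three points.

```python
import threading

N = 8             # interior grid dimension (hypothetical, kept small)
NUM_THREADS = 4   # hypothetical thread count; must divide N evenly
TOLERANCE = 1e-3  # hypothetical convergence threshold


def solve():
    # (N+2) x (N+2) grid: boundary fixed at 1.0, interior starts at 0.0.
    A = [[1.0 if i in (0, N + 1) or j in (0, N + 1) else 0.0
          for j in range(N + 2)] for i in range(N + 2)]
    shared = {"diff": 0.0, "done": False, "iters": 0}
    lock = threading.Lock()
    barrier = threading.Barrier(NUM_THREADS)
    rows = N // NUM_THREADS

    def worker(tid):
        start = 1 + tid * rows
        while not shared["done"]:
            my_diff = 0.0          # thread-local; placing it after barrier #1 is also safe
            shared["diff"] = 0.0   # every thread clears the shared sum
            barrier.wait()         # #1: all clears finish before anyone accumulates
            for i in range(start, start + rows):
                for j in range(1, N + 1):
                    prev = A[i][j]
                    A[i][j] = 0.2 * (A[i][j] + A[i][j - 1] + A[i - 1][j]
                                     + A[i][j + 1] + A[i + 1][j])
                    my_diff += abs(A[i][j] - prev)
            with lock:             # shared accumulation needs mutual exclusion
                shared["diff"] += my_diff
            barrier.wait()         # #2: every contribution is in before the check
            if shared["diff"] / (N * N) < TOLERANCE:
                shared["done"] = True  # all threads write the same value
            if tid == 0:
                shared["iters"] += 1
            barrier.wait()         # #3: everyone has read diff before it is cleared again

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["iters"], shared["diff"]
```

Because no thread writes diff between barriers #2 and #3, every thread reaches the same convergence decision, so they all leave the while loop in the same iteration and none is left waiting at a barrier.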

michzrrr

If we were to do this for both the black and red assignments, would we also need an overall barrier to ensure that each step is fully complete before we run the next one (due to dependencies)?
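A Python sketch of red-black updates with a barrier after each half-sweep, assuming red cells are those with even `(i + j)` and that every cell uses the same five-point update as the solver above (grid size, thread count, and the fixed `ITERS` are hypothetical). Each red cell reads only black neighbors and vice versa, so with a barrier between the phases the parallel result should match a sequential red-black sweep exactly.

```python
import threading

N = 8             # interior grid dimension (hypothetical)
NUM_THREADS = 4   # hypothetical thread count; must divide N evenly
ITERS = 10        # fixed iteration count (hypothetical) to keep the sketch short


def make_grid():
    return [[1.0 if i in (0, N + 1) or j in (0, N + 1) else 0.0
             for j in range(N + 2)] for i in range(N + 2)]


def update_color(A, color, rows):
    # Update only cells whose (i + j) parity equals `color` (0 = red, 1 = black).
    for i in rows:
        for j in range(1, N + 1):
            if (i + j) % 2 == color:
                A[i][j] = 0.2 * (A[i][j] + A[i][j - 1] + A[i - 1][j]
                                 + A[i][j + 1] + A[i + 1][j])


def sequential():
    A = make_grid()
    for _ in range(ITERS):
        update_color(A, 0, range(1, N + 1))
        update_color(A, 1, range(1, N + 1))
    return A


def parallel():
    A = make_grid()
    barrier = threading.Barrier(NUM_THREADS)
    rows_per_thread = N // NUM_THREADS

    def worker(tid):
        rows = range(1 + tid * rows_per_thread, 1 + (tid + 1) * rows_per_thread)
        for _ in range(ITERS):
            update_color(A, 0, rows)   # red half-sweep
            barrier.wait()             # all red cells done before black cells read them
            update_color(A, 1, rows)   # black half-sweep
            barrier.wait()             # all black cells done before the next red sweep

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return A
```

The second barrier matters as much as the first: without it, a fast thread could start the next red sweep and read a black neighbor that a slower thread has not updated yet.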

german.enik

To confirm my understanding: we don't need to lock and unlock around where we set done to true, because it's already surrounded by two barriers. This means that after barrier #2, diff is guaranteed to have received all necessary updates from all threads.

shreya_ravi

I am a bit confused as to how a barrier works when inside a while loop. I don't understand how a barrier can distinguish between iterations of a loop (i.e. how do we know that Thread A has reached the barrier after 3 loops and Thread B has also reached the barrier after 3 loops).

Is there some state associated with each barrier (e.g. the number of threads that have hit it so far) that is updated as each thread encounters the barrier (e.g. incrementing the count of threads that have hit it) until all threads have arrived (e.g. thread_hit_barrier counter == num_threads), at which point the threads move on and the state is reset?

This is similar to the implementation that @fromscratch suggested!
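One way to realize the state described above is an arrival count plus a generation number, sketched here in Python with `threading.Condition` (the class name `CountingBarrier` is hypothetical; sense reversal is another common technique). The last thread to arrive resets the count and bumps the generation, and waiters sleep until the generation they arrived in has ended, so a fast thread that loops around and re-enters the barrier cannot be confused with the previous round. Unlike a spin barrier, waiting threads block on the condition variable instead of spinning.

```python
import threading


class CountingBarrier:
    """Hypothetical reusable barrier: arrival count + generation number."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.count = 0
        self.generation = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            my_generation = self.generation
            self.count += 1
            if self.count == self.num_threads:
                self.count = 0          # reset the state for the next use
                self.generation += 1    # end this round
                self.cond.notify_all()  # release everyone who is waiting
            else:
                # Sleep until the round we arrived in is over.
                while self.generation == my_generation:
                    self.cond.wait()


def demo(num_threads=4, rounds=5):
    barrier = CountingBarrier(num_threads)
    log, log_lock = [], threading.Lock()

    def worker():
        for k in range(rounds):
            with log_lock:
                log.append(k)       # record which loop iteration we are in
            barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The log comes out grouped by iteration: a thread can only log iteration k+1 after passing the k-th barrier, which requires every thread to have logged iteration k. The barrier never looks at the loop counter; the threads stay aligned because each one must pass every barrier in order.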

fromscratch

@shreya_ravi Unless explicitly indicated in the barrier API, the barrier doesn't in fact care about the loop iteration.

In fact, some threads might be inside the loop, others outside. They all just need to reach "the next" barrier.

But with a different API, the semantics can be designed so that a "tag" ensures all threads in fact hit the "same" barrier.

parthiv

@shreya_ravi

"i.e. how do we know that Thread A has reached the barrier after 3 loops and Thread B has also reached the barrier after 3 loops"

My understanding is that we know this because, for these two threads to reach this barrier after 3 loops, they must also have hit all the barriers in the previous loops, which would have forced them to re-align.

lee

A question was asked in lecture (one I also wondered about): why use a barrier instead of .join()? The difference is that .join() waits for a single thread to finish, so by the time it returns, that thread has exited. A barrier waits for all threads to reach that point, and none of the threads exit afterward, so a barrier is the right choice for this example.
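A small Python comparison of the two approaches, assuming the work is split into phases that must not overlap (`record` is a hypothetical stand-in for one phase of real work). With `join()`, every phase needs freshly created threads, since a finished `threading.Thread` cannot be started again; with a `threading.Barrier`, the same threads run every phase and are joined once at the end.

```python
import threading


def record(log, lock, phase):
    # Stand-in for one phase of real work (hypothetical helper).
    with lock:
        log.append(phase)


def phases_with_join(num_threads, num_phases):
    log, lock, created = [], threading.Lock(), 0
    for phase in range(num_phases):
        threads = [threading.Thread(target=record, args=(log, lock, phase))
                   for _ in range(num_threads)]
        created += len(threads)
        for t in threads:
            t.start()
        for t in threads:
            t.join()   # returns once t has finished; t cannot be restarted
    return log, created


def phases_with_barrier(num_threads, num_phases):
    log, lock = [], threading.Lock()
    barrier = threading.Barrier(num_threads)

    def worker():
        for phase in range(num_phases):
            record(log, lock, phase)
            barrier.wait()   # wait for all threads, then keep going
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()             # join only once, at the very end
    return log, len(threads)
```

Both versions keep the phases from overlapping, but for 4 threads and 3 phases the join-based version creates 12 threads while the barrier-based version creates 4, and the barrier version lets each thread keep its local state across phases.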