probreather101

How are some of these atomic primitives actually implemented in hardware? It seems like they lie below the level of abstraction of locks, so I was wondering if they use some hardware-specific atomic operations or if they actually utilize locks for their implementation. From PA2, it seems like (from reading C++ docs) that the atomic types, among others, were actually more efficient than their corresponding mutex + int structure that one could do manually.

fdxmw

@probreather101 I also had the same question and found this response here - https://stackoverflow.com/questions/14758088/how-are-atomic-operations-implemented-at-a-hardware-level. The intel documentation cited in the most upvoted response seems to suggest there can be a cache lock that guards the protected data/cache line. I personally found the user response a little misleading though - I don't think cache coherency itself is sufficient enough to implement atomic ops on the hardware level even though the person made it sound like so

jtguibas

@probreather101 CPUs have special operations for atomic ints. "Atomic operations leverage processor support (compare and swap instructions) and don't use locks at all, whereas locks are more OS-dependent and perform differently on, for example, Win and Linux." (https://stackoverflow.com/questions/15056237/which-is-more-efficient-basic-mutex-lock-or-atomic-integer)

jchao01

I'm still somewhat struggling to wrap my head around this. Since old is being updated to the new value at addr in the while loop, there doesn't seem to be anything stopping another thread from changing addr in a way that changes the "output" of an existing call to atomic_min. If so, how is atomic_min useful?

nassosterz

@jchao01 I dont think that old is being updated to the new value at addr in the while loop every time. The value of address gets updated only when its present value and what was stored on atomic_min are in sync. Being in sync means that no other thread updated the value at address at any time and thus we can be sure that atomic_min will yield the correct result for minimum between x and the value at addr.

huangda

I'm not quite clear as to why this is better than just using locks. I get that the CAS is implemented in hardware, but are there no hardware equivalents for locks and condition variables?

czh

@nassosterz I disagree with some parts of your assessment for @jchao01’s question. While atomic_min is running I think there could indeed be other threads changing the value stored at the address. As you said the value of the address will be updated when the present value and what is stored in atomic_min is in sync, but this is only for the thread running atomic_min. Other threads may come in and do an update first - in this case atomic_min updates it’s old variable to the new value stored at addr and goes through another iteration of the loop, until it is the one that successfully updates the value stored at addr. As for @jchao01’s question, I think atomic_min is useful in the sense that it guarantees that it doesn’t ever set the value at addr to a LARGER value. For example, if we have a small program where threads each call atomic_min with a random number, the expected result would be after all threads are done the value at addr is the smallest number of all. Without this implementation of atomic_min, there could race cases where one thread with the number x reads the value at addr as y > x but before it uses a regular min to update the value another thread already updates the value to w < x. This thread with the number x would go on to update the value at addr to x, breaking the correctness of the program.

huangda

Won't the atomicCAS change the value of the address back, if addr != old? In that sense, I don't think that the value at the address can change with regards to one thread running this function

ecb11

@czh, that's a really helpful clarification in that only a single thread will be looping and making those comparisons to ensure that we are working with the "old" variable under examination before an update. In this sense, we can see that it may be a single thread who is running an atomic_min check as well (in addition to the aptly suggested streaming idea of getting the minimum of many values) in response to @jchao01.

apappu

I found the explanation by Kayvon (and Yuhan in the midterm review session) of thinking about using atomicCAS instead of locks as "make my updates locally and then use atomicCAS to make sure the state of the world hasn't been changed by other threads" really helpful for figuring out how to write thread safe functions that only rely on atomicCAS

lordpancake

If locks and critical regions are implemented by CAS (or are they not), then why do we get a performance improvement by using CAS directly? As in the case of assn4. Very confusing

gsamp

@huangda, Won't the atomicCAS change the value of the address back, if addr != old?, yes, atomicCAS will change the value of the address back to what it was in memory (that is, the value that the other thread just set it to).

So the potential modification made by some other thread will be seen by the thread calling atomicMin, and the value at the address will change.