Is this still scalable in the worst case? If many threads each perform a significant amount of work ("T +/- noise") between a read and a write, the processor work lost to restarts adds up as an arithmetic series. For 8 threads to complete the task and respect memory ordering, it would cost 8T + 7T + 6T + ... + T of processor time from the start of the operation until the final commit. Lock/wait constructs would at least let hardware threads stall and complete other operations while the other threads finish, and would still take only 8T.
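To put numbers on the comparison, here is a minimal sketch of the cost model described above. It assumes all threads start together, each attempt costs exactly T (noise ignored), and exactly one thread commits per round while every other pending thread restarts. The function names `optimistic_work` and `locked_work` are hypothetical, chosen only for this illustration.

```python
def optimistic_work(n_threads: int, t: float) -> float:
    """Total processor time under the assumed restart model.

    In each round every still-pending thread spends t on its attempt,
    one of them commits, and the rest must restart.
    """
    return sum(pending * t for pending in range(n_threads, 0, -1))


def locked_work(n_threads: int, t: float) -> float:
    """Total processor time when threads take turns under a lock.

    Each thread performs its work exactly once; waiting threads are
    assumed to stall without burning processor time.
    """
    return n_threads * t


if __name__ == "__main__":
    T = 1.0
    print(optimistic_work(8, T))  # 36.0, i.e. 8T + 7T + ... + T
    print(locked_work(8, T))      # 8.0, i.e. 8T
```

Under these assumptions the restart-based approach spends n(n+1)/2 units of T against n for the lock, so the wasted work grows quadratically with the number of contending threads. Note that this counts processor work only: if every thread has its own core, both approaches finish after roughly 8 rounds of T in wall-clock terms.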