Ordering and Barriers
When dealing with synchronization between multiple processors or with hardware devices, it is sometimes a requirement that memory reads (loads) and memory writes (stores) issue in the order specified in your program code. When talking with hardware, you often need to ensure that a given read occurs before another read or write. Additionally, on symmetric multiprocessing systems, it may be important for writes to appear in the order that your code issues them (usually to ensure subsequent reads see the data in the same order). Complicating these issues is the fact that both the compiler and the processor can reorder reads and writes[6] for performance reasons. Thankfully, all processors that do reorder reads or writes provide machine instructions to enforce ordering requirements. It is also possible to instruct the compiler not to reorder instructions around a given point. These instructions are called barriers.

[6] Intel x86 processors never reorder writes. That is, they do not do out-of-order stores. But other processors do.
Essentially, on some processors the code

a = 1;
b = 2;

may allow the processor to store the new value in b before it stores the new value in a. Both the compiler and the processor see no relation between a and b. The compiler would perform this reordering at compile time; the reordering would be static, and the resulting object code would simply set b before a. The processor, however, could perform the reordering dynamically during execution by fetching and dispatching seemingly unrelated instructions in whatever order it deems best. The vast majority of the time, such reordering is optimal because there is no apparent relation between a and b. Sometimes the programmer knows best, though.

Although the previous example might be reordered, the processor would never reorder writes such as
a = 1;
b = a;

where a and b are global, because there is clearly a data dependency between a and b. Neither the compiler nor the processor, however, knows about code in other contexts. Occasionally, it is important that writes are seen by other code and the outside world in the specific order you intend. This is often the case with hardware devices, but it is also common on multiprocessing machines.

The rmb() method provides a read memory barrier. It ensures that no loads are reordered across the rmb() call. That is, no loads prior to the call will be reordered to after the call, and no loads after the call will be reordered to before the call.

The wmb() method provides a write barrier. It functions in the same manner as rmb(), but with respect to stores instead of loads: it ensures that no stores are reordered across the barrier.

The mb() call provides both a read barrier and a write barrier. No loads or stores will be reordered across a call to mb(). It is provided because a single instruction (often the same instruction used by rmb()) can provide both the load and store barrier.

A variant of rmb(), read_barrier_depends(), provides a read barrier, but only for loads on which subsequent loads depend. All reads prior to the barrier are guaranteed to complete before any reads after the barrier that depend on the reads prior to the barrier. Got it? Basically, it enforces a read barrier, like rmb(), but only for certain reads: those that depend on each other. On some architectures, read_barrier_depends() is much quicker than rmb() because the full read barrier is not needed and it is, thus, a noop.

Let's consider an example using mb() and rmb(). The initial value of a is one, and the initial value of b is two.
| Thread 1 | Thread 2 |
|----------|----------|
| a = 3;   | -        |
| mb();    | -        |
| b = 4;   | c = b;   |
| -        | rmb();   |
| -        | d = a;   |

Without the memory barriers, it would be possible on some processors for c to receive the new value of b while d receives the old value of a; that is, c could equal four while d equals one. The mb() ensures that a and b are written in the intended order, and the rmb() ensures that c and d are read in the intended order.
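The same handoff can be sketched as two kernel-style C functions. This is only an illustrative sketch: the function and variable names are made up, and the mb()/rmb() macros are the interfaces described above (commonly reachable via <asm/barrier.h> on current kernels, though the exact header has varied across kernel versions).

```c
/*
 * Sketch of the mb()/rmb() example above. Names (thread1, thread2,
 * a, b, c, d) are illustrative only.
 */
#include <asm/barrier.h>   /* mb(), rmb(); header location varies by kernel version */

static int a = 1;
static int b = 2;
static int c, d;

static void thread1(void)
{
        a = 3;
        mb();           /* no loads or stores cross this point */
        b = 4;          /* so the new b is never visible before the new a */
}

static void thread2(void)
{
        c = b;
        rmb();          /* no load below is reordered before a load above */
        d = a;          /* if c observed b == 4, d must observe a == 3 */
}
```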
As a second example, this time using read_barrier_depends() instead of rmb(), Thread 2 reads through a pointer stored by Thread 1. The load of *pp depends on the load of p, so only the lighter dependency barrier is needed.

| Thread 1 | Thread 2 |
|----------|----------|
| a = 3;   | -        |
| mb();    | -        |
| p = &a;  | pp = p;  |
| -        | read_barrier_depends(); |
| -        | b = *pp; |

Here, read_barrier_depends() guarantees that the dereference of pp occurs after the load of p. Combined with Thread 1's mb() between the store to a and the store to p, this ensures that if pp received the newly stored pointer, the subsequent dereference sees the new value of a.
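The same pointer-publish pattern can be sketched in C. Again, this is only a sketch under assumptions: the names are invented, the initial value of p is not specified in the example, and the barrier spellings shown are the ones this text describes (later kernels have reworked the data-dependency barrier interfaces).

```c
/*
 * Sketch of the read_barrier_depends() example above. Names are
 * illustrative; real code would also check that p has been published
 * before dereferencing it.
 */
#include <asm/barrier.h>   /* mb(), read_barrier_depends(); header varies by version */

static int a = 1;
static int b = 2;
static int *p;             /* published by thread1 */

static void thread1(void)
{
        a = 3;
        mb();              /* order the store to a before publishing the pointer */
        p = &a;
}

static void thread2(void)
{
        int *pp;

        pp = p;
        read_barrier_depends();  /* order the dependent load below after the load of p */
        b = *pp;           /* if pp == &a, this reads the new value, 3 */
}
```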