.. _Fenced_Data_Transfer:

Fenced Data Transfer
====================


.. container:: section


   .. rubric:: Problem
      :class: sectiontitle

   Write a message to memory and have another processor read it on
   hardware that does not have a sequentially consistent memory model.


.. container:: section


   .. rubric:: Context
      :class: sectiontitle

   The problem normally arises only when unsynchronized threads
   concurrently act on a memory location, or use reads and writes to
   create synchronization. High-level synchronization constructs
   normally include mechanisms that prevent unwanted reordering.


   Modern hardware and compilers can reorder memory operations in a way
   that preserves the order of a thread's operations from its own
   viewpoint, but not as observed by other threads. A common serial
   idiom is to write a message and then mark it as ready to read, as
   shown in the following code:


   ::


      #include <string>

      bool Ready = false;     // Plain flag: provides no ordering guarantees.
      std::string Message;


      void Send( const std::string& src ) {    // Executed by thread 1
         Message = src;
         Ready = true;
      }


      bool Receive( std::string& dst ) {       // Executed by thread 2
         bool result = Ready;
         if( result ) dst = Message;
         return result;                        // Return true if message was received.
      }


   Two key assumptions of the code are:


   #. ``Ready`` does not become true until ``Message`` is written.


   #. ``Message`` is not read until ``Ready`` becomes true.


   These assumptions are trivially true on uniprocessor hardware.
   However, they may break on multiprocessor hardware. Reordering by the
   hardware or compiler can cause the sender's writes to appear out of
   order to the receiver (thus breaking assumption 1) or the receiver's
   reads to appear out of order (thus breaking assumption 2).


.. container:: section


   .. rubric:: Forces
      :class: sectiontitle

   - Creating synchronization via raw reads and writes.


.. container:: section


   .. rubric:: Solution
      :class: sectiontitle

   Change the flag that indicates when the message is ready from
   ``bool`` to ``std::atomic<bool>``. Here is the previous example with
   modifications.


   ::


      #include <atomic>
      #include <string>

      std::atomic<bool> Ready{false};
      std::string Message;


      void Send( const std::string& src ) {    // Executed by thread 1
         Message = src;
         // Release: the write to Message cannot be reordered after this store.
         Ready.store(true, std::memory_order_release);
      }


      bool Receive( std::string& dst ) {       // Executed by thread 2
         // Acquire: the read of Message cannot be reordered before this load.
         bool result = Ready.load(std::memory_order_acquire);
         if( result ) dst = Message;
         return result;                        // Return true if message was received.
      }


   The store to ``Ready`` with ``std::memory_order_release`` has
   *release* semantics: every write the sending thread performed before
   the store is visible to a thread that observes the stored value
   through an acquiring load. The load of ``Ready`` with
   ``std::memory_order_acquire`` has *acquire* semantics: the reads that
   follow it in the receiving thread cannot be moved ahead of the load.
   The implementation of ``std::atomic`` ensures that both the compiler
   and the hardware observe these ordering constraints.


.. container:: section


   .. rubric:: Variations
      :class: sectiontitle

   Higher-level synchronization constructs normally include the
   necessary *acquire* and *release* fences. For example, mutexes are
   normally implemented such that acquisition of a lock has *acquire*
   semantics and release of a lock has *release* semantics. Thus a
   thread that acquires a lock on a mutex always sees any memory writes
   done by another thread before that thread released its lock on the
   same mutex.


.. container:: section


   .. rubric:: Non Solutions
      :class: sectiontitle

   Mistaken solutions are so often proposed that it is worth
   understanding why they are wrong.


   One common mistake is to assume that declaring the flag with the
   ``volatile`` keyword solves the problem. Though the ``volatile``
   keyword forces the compiler to emit the write rather than optimize
   it away, it generally has no effect on the visible ordering of that
   write with respect to other memory operations.


   Another mistake is to assume that conditionally executed code cannot
   happen before the condition is tested. However, the compiler or
   hardware may speculatively hoist the conditional code above the
   condition.


   Similarly, it is a mistake to assume that a processor cannot read the
   target of a pointer before reading the pointer. A modern processor
   does not read individual values from main memory. It reads cache
   lines. The target of a pointer may be in a cache line that has
   already been read before the pointer was read, thus giving the
   appearance that the processor presciently read the pointer target.
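
To see the fenced transfer from the Solution section running on two threads, the following is a minimal sketch, assuming C++11 ``std::thread`` is available. ``Send`` and ``Receive`` follow the pattern above; ``RunTransfer`` is a hypothetical driver introduced only for illustration and is not part of the original pattern.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

std::atomic<bool> Ready{false};
std::string Message;

// Executed by the sending thread: write the payload, then publish the flag.
void Send(const std::string& src) {
    Message = src;
    Ready.store(true, std::memory_order_release);
}

// Executed by the receiving thread: read the payload only after the flag is seen.
bool Receive(std::string& dst) {
    bool result = Ready.load(std::memory_order_acquire);
    if (result) dst = Message;
    return result;  // True if a message was received.
}

// Hypothetical driver (an assumption for this sketch): starts one receiver
// that polls until the message arrives and one sender that publishes it.
std::string RunTransfer(const std::string& text) {
    // Reset the flag before any thread starts; thread creation orders this
    // store before everything the new threads do.
    Ready.store(false, std::memory_order_relaxed);

    std::string received;
    std::thread receiver([&received] {
        while (!Receive(received)) {
            std::this_thread::yield();  // Poll politely until the flag is set.
        }
    });
    std::thread sender([&text] { Send(text); });

    sender.join();
    receiver.join();

    // Sanity check: the receiver only exits its loop after seeing the flag.
    assert(Ready.load(std::memory_order_acquire));
    return received;  // Safe to read here: join() orders the receiver's writes.
}
```

Calling ``RunTransfer("hello")`` should return the same string that was sent, because the receiver copies ``Message`` only after its acquiring load observes the sender's releasing store. Polling with ``yield`` keeps the sketch short; production code would more likely use one of the higher-level constructs described under Variations.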