.. _Fenced_Data_Transfer:

Fenced Data Transfer
====================


.. container:: section


   .. rubric:: Problem
      :class: sectiontitle

   Write a message to memory and have another processor read it on
   hardware that does not have a sequentially consistent memory model.


.. container:: section


   .. rubric:: Context
      :class: sectiontitle

   The problem normally arises only when unsynchronized threads
   concurrently act on a memory location, or when they use reads and
   writes to create synchronization. High-level synchronization
   constructs normally include mechanisms that prevent unwanted
   reordering.


   Modern hardware and compilers can reorder memory operations in a way
   that preserves the order of a thread's operations from its own
   viewpoint, but not as observed by other threads. A common serial
   idiom is to write a message and then mark it as ready to read, as
   shown in the following code:


   ::


      #include <string>

      bool Ready;            // Plain flag: no ordering guarantees
      std::string Message;


      void Send( const std::string& src ) {  // Executed by thread 1
         Message = src;
         Ready = true;
      }


      bool Receive( std::string& dst ) {     // Executed by thread 2
         bool result = Ready;
         if( result ) dst = Message;
         return result;      // Return true if message was received.
      }


   Two key assumptions of the code are:


   #. ``Ready`` does not become true until ``Message`` is written.


   #. ``Message`` is not read until ``Ready`` becomes true.


   These assumptions are trivially true on uniprocessor hardware.
   However, they may break on multiprocessor hardware. Reordering by the
   hardware or compiler can cause the sender's writes to appear out of
   order to the receiver (thus breaking assumption 1) or the receiver's
   reads to appear out of order (thus breaking assumption 2).


.. container:: section


   .. rubric:: Forces
      :class: sectiontitle

   -  Creating synchronization via raw reads and writes.


.. container:: section


   .. rubric:: Solution
      :class: sectiontitle

   Change the type of the flag that indicates when the message is ready
   from ``bool`` to ``std::atomic<bool>``. Here is the previous example
   with the modifications applied:


   ::


      #include <atomic>
      #include <string>

      std::atomic<bool> Ready{false};
      std::string Message;


      void Send( const std::string& src ) {  // Executed by thread 1
         Message = src;
         // Release: the write to Message is ordered before this store.
         Ready.store(true, std::memory_order_release);
      }


      bool Receive( std::string& dst ) {     // Executed by thread 2
         // Acquire: the read of Message cannot move above this load.
         bool result = Ready.load(std::memory_order_acquire);
         if( result ) dst = Message;
         return result;      // Return true if message was received.
      }


   The store with ``std::memory_order_release`` has *release*
   semantics, which means that all writes made by the sending thread
   before the store are visible to a thread that observes the store
   through an acquiring load. The load with
   ``std::memory_order_acquire`` has *acquire* semantics, which means
   that all of the receiving thread's subsequent reads happen after the
   acquiring load. The default ordering for ``std::atomic`` operations,
   ``std::memory_order_seq_cst``, provides both guarantees as well. The
   implementation of ``std::atomic`` ensures that both the compiler and
   the hardware observe these ordering constraints.
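
   The same ordering can also be expressed with standalone fences, which
   may make the title of this pattern more concrete. The following is a
   minimal sketch, not the documented solution: it assumes the flag is
   accessed with relaxed operations and that
   ``std::atomic_thread_fence`` supplies the ordering. ``RoundTrip`` is
   a hypothetical helper added only to exercise the two functions.

   ```cpp
   #include <atomic>
   #include <cassert>
   #include <string>
   #include <thread>

   std::atomic<bool> Ready{false};
   std::string Message;

   void Send(const std::string& src) {        // Executed by the sender
       Message = src;
       // Release fence: the write above is ordered before the store below.
       std::atomic_thread_fence(std::memory_order_release);
       Ready.store(true, std::memory_order_relaxed);
   }

   bool Receive(std::string& dst) {           // Executed by the receiver
       bool result = Ready.load(std::memory_order_relaxed);
       if (result) {
           // Acquire fence: the read below is ordered after the load above.
           std::atomic_thread_fence(std::memory_order_acquire);
           dst = Message;
       }
       return result;
   }

   // Hypothetical helper: sends text on one thread, polls for it on another.
   std::string RoundTrip(const std::string& text) {
       Ready.store(false, std::memory_order_relaxed);
       std::string received;
       std::thread receiver([&] {
           while (!Receive(received)) std::this_thread::yield();
       });
       std::thread sender([&] { Send(text); });
       sender.join();
       receiver.join();
       assert(Ready.load());   // The flag must be set once both have finished.
       return received;
   }
   ```

   Calling ``RoundTrip("hello")`` should return the string that was
   sent. In most code the release store and acquire load shown earlier
   are simpler to read than explicit fences.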


.. container:: section


   .. rubric:: Variations
      :class: sectiontitle

   Higher-level synchronization constructs normally include the
   necessary *acquire* and *release* fences. For example, mutexes are
   normally implemented such that acquisition of a lock has *acquire*
   semantics and release of a lock has *release* semantics. Thus a
   thread that acquires a lock on a mutex always sees any memory writes
   done by another thread before that thread released its lock on the
   same mutex.
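
   As an illustration of the mutex variation, here is a minimal sketch
   that assumes ``std::mutex`` from the C++ standard library. The names
   ``Guard`` and ``RoundTrip`` are hypothetical and exist only for this
   example. Because both the flag and the message are accessed under the
   lock, the flag can be a plain ``bool``.

   ```cpp
   #include <cassert>
   #include <mutex>
   #include <string>
   #include <thread>

   std::mutex Guard;          // Hypothetical name; protects Ready and Message
   bool Ready = false;
   std::string Message;

   void Send(const std::string& src) {
       std::lock_guard<std::mutex> lock(Guard);
       Message = src;
       Ready = true;
   }                          // Unlocking here has release semantics.

   bool Receive(std::string& dst) {
       std::lock_guard<std::mutex> lock(Guard);   // Locking has acquire semantics.
       if (Ready) dst = Message;
       return Ready;
   }

   // Hypothetical helper: sends text on one thread, polls for it on another.
   std::string RoundTrip(const std::string& text) {
       {
           std::lock_guard<std::mutex> lock(Guard);
           Ready = false;
       }
       std::string received;
       std::thread receiver([&] {
           while (!Receive(received)) std::this_thread::yield();
       });
       std::thread sender([&] { Send(text); });
       sender.join();
       receiver.join();
       assert(Ready);         // Both threads have finished, so the flag is set.
       return received;
   }
   ```

   The lock costs more than a single atomic flag, but it also covers
   cases where several variables must change together.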


.. container:: section


   .. rubric:: Non Solutions
      :class: sectiontitle

   Mistaken solutions are so often proposed that it is worth
   understanding why they are wrong.


   One common mistake is to assume that declaring the flag with the
   ``volatile`` keyword solves the problem. Though the ``volatile``
   keyword forces the compiler to perform the write rather than defer
   or elide it, it generally has no effect on the visible ordering of
   that write with respect to other memory operations.


   Another mistake is to assume that conditionally executed code cannot
   happen before the condition is tested. However, the compiler or
   hardware may speculatively hoist the conditional code above the
   condition.


   Similarly, it is a mistake to assume that a processor cannot read the
   target of a pointer before reading the pointer. A modern processor
   does not read individual values from main memory. It reads cache
   lines. The target of a pointer may be in a cache line that has
   already been read before the pointer was read, thus giving the
   appearance that the processor presciently read the pointer target.
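
   For the pointer case, the approach from the Solution section carries
   over: publish the pointer through an atomic variable with release and
   acquire ordering. The following is a minimal sketch under that
   assumption. ``Published``, ``Publish``, ``TryConsume``, and
   ``RoundTrip`` are hypothetical names used only for illustration.

   ```cpp
   #include <atomic>
   #include <cassert>
   #include <string>
   #include <thread>

   std::atomic<const std::string*> Published{nullptr};

   void Publish(const std::string* msg) {
       // Release: writes to *msg are ordered before the pointer is visible.
       Published.store(msg, std::memory_order_release);
   }

   const std::string* TryConsume() {
       // Acquire: reads through the returned pointer happen after this load.
       return Published.load(std::memory_order_acquire);
   }

   // Hypothetical helper: publishes text on one thread, reads it on another.
   std::string RoundTrip(const std::string& text) {
       Published.store(nullptr, std::memory_order_relaxed);
       std::string storage;
       std::string received;
       std::thread consumer([&] {
           const std::string* p = nullptr;
           while ((p = TryConsume()) == nullptr) std::this_thread::yield();
           received = *p;         // Safe: ordered after the acquire load.
       });
       std::thread producer([&] {
           storage = text;        // Write the target first,
           Publish(&storage);     // then publish the pointer.
       });
       producer.join();
       consumer.join();
       assert(Published.load() == &storage);
       Published.store(nullptr);  // Do not leave a pointer to a dead local.
       return received;
   }
   ```

   The consumer dereferences the pointer only after the acquire load
   returns a non-null value, so it cannot observe a stale target.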