1=========================================== 2Control Flow Integrity Design Documentation 3=========================================== 4 5This page documents the design of the :doc:`ControlFlowIntegrity` schemes 6supported by Clang. 7 8Forward-Edge CFI for Virtual Calls 9================================== 10 11This scheme works by allocating, for each static type used to make a virtual 12call, a region of read-only storage in the object file holding a bit vector 13that maps onto to the region of storage used for those virtual tables. Each 14set bit in the bit vector corresponds to the `address point`_ for a virtual 15table compatible with the static type for which the bit vector is being built. 16 17For example, consider the following three C++ classes: 18 19.. code-block:: c++ 20 21 struct A { 22 virtual void f1(); 23 virtual void f2(); 24 virtual void f3(); 25 }; 26 27 struct B : A { 28 virtual void f1(); 29 virtual void f2(); 30 virtual void f3(); 31 }; 32 33 struct C : A { 34 virtual void f1(); 35 virtual void f2(); 36 virtual void f3(); 37 }; 38 39The scheme will cause the virtual tables for A, B and C to be laid out 40consecutively: 41 42.. csv-table:: Virtual Table Layout for A, B, C 43 :header: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 44 45 A::offset-to-top, &A::rtti, &A::f1, &A::f2, &A::f3, B::offset-to-top, &B::rtti, &B::f1, &B::f2, &B::f3, C::offset-to-top, &C::rtti, &C::f1, &C::f2, &C::f3 46 47The bit vector for static types A, B and C will look like this: 48 49.. csv-table:: Bit Vectors for A, B, C 50 :header: Class, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 51 52 A, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0 53 B, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 54 C, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 55 56To emit a virtual call, the compiler will assemble code that checks that 57the object's virtual table pointer is in-bounds and aligned and that the 58relevant bit is set in the bit vector. 59 60For example on x86 a typical virtual call may look like this: 61 62.. code-block:: none 63 64 159a: 48 8b 03 mov (%rbx),%rax 65 159d: 48 8d 15 6c 33 00 00 lea 0x336c(%rip),%rdx 66 15a4: 48 89 c1 mov %rax,%rcx 67 15a7: 48 29 d1 sub %rdx,%rcx 68 15aa: 48 c1 c1 3d rol $0x3d,%rcx 69 15ae: 48 83 f9 51 cmp $0x51,%rcx 70 15b2: 77 3b ja 15ef <main+0xcf> 71 15b4: 48 89 ca mov %rcx,%rdx 72 15b7: 48 c1 ea 05 shr $0x5,%rdx 73 15bb: 48 8d 35 b8 07 00 00 lea 0x7b8(%rip),%rsi 74 15c2: 8b 14 96 mov (%rsi,%rdx,4),%edx 75 15c5: 0f a3 ca bt %ecx,%edx 76 15c8: 73 25 jae 15ef <main+0xcf> 77 15ca: 48 89 df mov %rbx,%rdi 78 15cd: ff 10 callq *(%rax) 79 [...] 80 15ef: 0f 0b ud2 81 82The compiler relies on co-operation from the linker in order to assemble 83the bit vectors for the whole program. It currently does this using LLVM's 84`bit sets`_ mechanism together with link-time optimization. 85 86.. _address point: https://mentorembedded.github.io/cxx-abi/abi.html#vtable-general 87.. _bit sets: http://llvm.org/docs/BitSets.html 88 89Optimizations 90------------- 91 92The scheme as described above is the fully general variant of the scheme. 93Most of the time we are able to apply one or more of the following 94optimizations to improve binary size or performance. 95 96Stripping Leading/Trailing Zeros in Bit Vectors 97~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 98 99If a bit vector contains leading or trailing zeros, we can strip them from 100the vector. The compiler will emit code to check if the pointer is in range 101of the region covered by ones, and perform the bit vector check using a 102truncated version of the bit vector. For example, the bit vectors for our 103example class hierarchy will be emitted like this: 104 105.. csv-table:: Bit Vectors for A, B, C 106 :header: Class, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 107 108 A, , , 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, , 109 B, , , , , , , , 1, , , , , , , 110 C, , , , , , , , , , , , , 1, , 111 112Short Inline Bit Vectors 113~~~~~~~~~~~~~~~~~~~~~~~~ 114 115If the vector is sufficiently short, we can represent it as an inline constant 116on x86. This saves us a few instructions when reading the correct element 117of the bit vector. 118 119If the bit vector fits in 32 bits, the code looks like this: 120 121.. code-block:: none 122 123 dc2: 48 8b 03 mov (%rbx),%rax 124 dc5: 48 8d 15 14 1e 00 00 lea 0x1e14(%rip),%rdx 125 dcc: 48 89 c1 mov %rax,%rcx 126 dcf: 48 29 d1 sub %rdx,%rcx 127 dd2: 48 c1 c1 3d rol $0x3d,%rcx 128 dd6: 48 83 f9 03 cmp $0x3,%rcx 129 dda: 77 2f ja e0b <main+0x9b> 130 ddc: ba 09 00 00 00 mov $0x9,%edx 131 de1: 0f a3 ca bt %ecx,%edx 132 de4: 73 25 jae e0b <main+0x9b> 133 de6: 48 89 df mov %rbx,%rdi 134 de9: ff 10 callq *(%rax) 135 [...] 136 e0b: 0f 0b ud2 137 138Or if the bit vector fits in 64 bits: 139 140.. code-block:: none 141 142 11a6: 48 8b 03 mov (%rbx),%rax 143 11a9: 48 8d 15 d0 28 00 00 lea 0x28d0(%rip),%rdx 144 11b0: 48 89 c1 mov %rax,%rcx 145 11b3: 48 29 d1 sub %rdx,%rcx 146 11b6: 48 c1 c1 3d rol $0x3d,%rcx 147 11ba: 48 83 f9 2a cmp $0x2a,%rcx 148 11be: 77 35 ja 11f5 <main+0xb5> 149 11c0: 48 ba 09 00 00 00 00 movabs $0x40000000009,%rdx 150 11c7: 04 00 00 151 11ca: 48 0f a3 ca bt %rcx,%rdx 152 11ce: 73 25 jae 11f5 <main+0xb5> 153 11d0: 48 89 df mov %rbx,%rdi 154 11d3: ff 10 callq *(%rax) 155 [...] 156 11f5: 0f 0b ud2 157 158If the bit vector consists of a single bit, there is only one possible 159virtual table, and the check can consist of a single equality comparison: 160 161.. code-block:: none 162 163 9a2: 48 8b 03 mov (%rbx),%rax 164 9a5: 48 8d 0d a4 13 00 00 lea 0x13a4(%rip),%rcx 165 9ac: 48 39 c8 cmp %rcx,%rax 166 9af: 75 25 jne 9d6 <main+0x86> 167 9b1: 48 89 df mov %rbx,%rdi 168 9b4: ff 10 callq *(%rax) 169 [...] 170 9d6: 0f 0b ud2 171 172Virtual Table Layout 173~~~~~~~~~~~~~~~~~~~~ 174 175The compiler lays out classes of disjoint hierarchies in separate regions 176of the object file. At worst, bit vectors in disjoint hierarchies only 177need to cover their disjoint hierarchy. But the closer that classes in 178sub-hierarchies are laid out to each other, the smaller the bit vectors for 179those sub-hierarchies need to be (see "Stripping Leading/Trailing Zeros in Bit 180Vectors" above). The `GlobalLayoutBuilder`_ class is responsible for laying 181out the globals efficiently to minimize the sizes of the underlying bitsets. 182 183.. _GlobalLayoutBuilder: http://llvm.org/klaus/llvm/blob/master/include/llvm/Transforms/IPO/LowerBitSets.h 184 185Alignment 186~~~~~~~~~ 187 188If all gaps between address points in a particular bit vector are multiples 189of powers of 2, the compiler can compress the bit vector by strengthening 190the alignment requirements of the virtual table pointer. For example, given 191this class hierarchy: 192 193.. code-block:: c++ 194 195 struct A { 196 virtual void f1(); 197 virtual void f2(); 198 }; 199 200 struct B : A { 201 virtual void f1(); 202 virtual void f2(); 203 virtual void f3(); 204 virtual void f4(); 205 virtual void f5(); 206 virtual void f6(); 207 }; 208 209 struct C : A { 210 virtual void f1(); 211 virtual void f2(); 212 }; 213 214The virtual tables will be laid out like this: 215 216.. csv-table:: Virtual Table Layout for A, B, C 217 :header: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 218 219 A::offset-to-top, &A::rtti, &A::f1, &A::f2, B::offset-to-top, &B::rtti, &B::f1, &B::f2, &B::f3, &B::f4, &B::f5, &B::f6, C::offset-to-top, &C::rtti, &C::f1, &C::f2 220 221Notice that each address point for A is separated by 4 words. This lets us 222emit a compressed bit vector for A that looks like this: 223 224.. csv-table:: 225 :header: 2, 6, 10, 14 226 227 1, 1, 0, 1 228 229At call sites, the compiler will strengthen the alignment requirements by 230using a different rotate count. For example, on a 64-bit machine where the 231address points are 4-word aligned (as in A from our example), the ``rol`` 232instruction may look like this: 233 234.. code-block:: none 235 236 dd2: 48 c1 c1 3b rol $0x3b,%rcx 237