========================
Debugging C++ Coroutines
========================

.. contents::
   :local:

Introduction
============

For performance and other architectural reasons, the C++ Coroutines feature in
the Clang compiler is implemented in two parts of the compiler. Semantic
analysis is performed in Clang, and coroutine construction and optimization
take place in the LLVM middle end.

However, this design limits the debug information the compiler can generate.
Typically, the compiler generates debug information in the Clang frontend, as
debug information is highly language specific. This is not possible for
coroutine frames, because the frames are constructed in the LLVM middle end.

To mitigate this problem, the LLVM middle end attempts to generate some debug
information, which is unfortunately incomplete, since much of the
language-specific information is missing in the middle end.

This document describes how to use this debug information to better debug
coroutines.

Terminology
===========

Due to the recent nature of C++20 Coroutines, the terminology used to describe
the concepts of Coroutines is not settled. This section defines a common,
understandable terminology to be used consistently throughout this document.

coroutine type
--------------

A `coroutine function` is any function that contains any of the coroutine
keywords `co_await`, `co_yield`, or `co_return`. A `coroutine type` is a
possible return type of one of these `coroutine functions`. `Task` and
`Generator` are commonly used coroutine types.

coroutine
---------

By technical definition, a `coroutine` is a suspendable function. However,
programmers typically use `coroutine` to refer to an individual instance.
For example:

.. code-block:: c++

  std::vector<Task> Coros; // Task is a coroutine type.
  for (int i = 0; i < 3; i++)
    Coros.push_back(CoroTask()); // CoroTask is a coroutine function, which
                                 // would return a coroutine type 'Task'.

In practice, we typically say "`Coros` contains 3 coroutines" in the above
example, though this is not strictly correct. More technically, this should
say "`Coros` contains 3 coroutine instances" or "`Coros` contains 3 coroutine
objects."

In this document, we follow the common practice of using `coroutine` to refer
to an individual `coroutine instance`, since the terms `coroutine instance` and
`coroutine object` aren't sufficiently defined in this case.

coroutine frame
---------------

The C++ Standard uses `coroutine state` to describe the allocated storage. In
the compiler, we use `coroutine frame` to describe the generated data structure
that contains the necessary information.

The structure of coroutine frames
=================================

The structure of coroutine frames is defined as:

.. code-block:: c++

  struct {
    void (*__r)(); // function pointer to the `resume` function
    void (*__d)(); // function pointer to the `destroy` function
    promise_type; // the corresponding `promise_type`
    ... // Any other needed information
  }

A debugger can map the address of a function to the function's name, and the
name of the `resume` function is the same as the name of the coroutine
function. Because the pointer to the `resume` function is the first member of
the frame, the name of the coroutine function is obtainable once the address
of the coroutine frame is known.

Print promise_type
==================

Every coroutine has a `promise_type`, which defines the behavior
for the corresponding coroutine. In other words, if two coroutines have the
same `promise_type`, they should behave in the same way.
When stopped at a breakpoint inside a coroutine, the `promise_type` can be
printed with:

.. parsed-literal::

  print __promise

It is also possible to print the `promise_type` of a coroutine from the address
of the coroutine frame. For example, if the address of a coroutine frame is
0x416eb0, and the type of the `promise_type` is `task::promise_type`, printing
the `promise_type` can be done by:

.. parsed-literal::

  print (task::promise_type)*(0x416eb0+0x10)

This is possible because the `promise_type` is guaranteed by the ABI to be at a
16-byte offset (`0x10`) from the start of the coroutine frame.

Note that there is also an ABI independent method:

.. parsed-literal::

  print std::coroutine_handle<task::promise_type>::from_address((void*)0x416eb0).promise()

The functions `from_address(void*)` and `promise()` are often small enough to
be inlined and removed during optimization, so this method may not be
available in optimized builds.

Print coroutine frames
======================

LLVM generates the debug information for the coroutine frame in the LLVM middle
end, which permits printing of the coroutine frame in the debugger. Much like
the `promise_type`, when stopped at a breakpoint inside a coroutine we can
print the coroutine frame by:

.. parsed-literal::

  print __coro_frame


Just as printing the `promise_type` is possible from the coroutine address,
printing the details of the coroutine frame from an address is also possible:

::

  (gdb) # Get the address of coroutine frame
  (gdb) print/x *0x418eb0
  $1 = 0x4019e0
  (gdb) # Get the linkage name for the coroutine
  (gdb) x 0x4019e0
  0x4019e0 <_ZL9coro_taski>:  0xe5894855
  (gdb) # The coroutine frame type is 'linkage_name.coro_frame_ty'
  (gdb) print (_ZL9coro_taski.coro_frame_ty)*(0x418eb0)
  $2 = {__resume_fn = 0x4019e0 <coro_task(int)>, __destroy_fn = 0x402000 <coro_task(int)>, __promise = {...}, ...}

The above is possible because:

(1) The name of the debug type of the coroutine frame is the `linkage_name`,
plus the `.coro_frame_ty` suffix because each coroutine function shares the
same coroutine type.

(2) The coroutine function name is accessible from the address of the coroutine
frame.

The above commands can be simplified by placing them in debug scripts.

Examples to print coroutine frames
----------------------------------

The print examples below use the following definition:

.. code-block:: c++

  #include <coroutine>
  #include <iostream>

  struct task {
    struct promise_type {
      task get_return_object() { return std::coroutine_handle<promise_type>::from_promise(*this); }
      std::suspend_always initial_suspend() { return {}; }
      std::suspend_always final_suspend() noexcept { return {}; }
      void return_void() noexcept {}
      void unhandled_exception() noexcept {}

      int count = 0;
    };

    void resume() noexcept {
      handle.resume();
    }

    task(std::coroutine_handle<promise_type> hdl) : handle(hdl) {}
    ~task() {
      if (handle)
        handle.destroy();
    }

    std::coroutine_handle<> handle;
  };

  class await_counter : public std::suspend_always {
    public:
      template<class PromiseType>
      void await_suspend(std::coroutine_handle<PromiseType> handle) noexcept {
          handle.promise().count++;
      }
  };

  static task coro_task(int v) {
    int a = v;
    co_await await_counter{};
    a++;
    std::cout << a << "\n";
    a++;
    std::cout << a << "\n";
    a++;
    std::cout << a << "\n";
    co_await await_counter{};
    a++;
    std::cout << a << "\n";
    a++;
    std::cout << a << "\n";
  }

  int main() {
    task t = coro_task(43);
    t.resume();
    t.resume();
    t.resume();
    return 0;
  }

In debug mode (`-O0` + `-g`), the printed result is:

.. parsed-literal::

  {__resume_fn = 0x4019e0 <coro_task(int)>, __destroy_fn = 0x402000 <coro_task(int)>, __promise = {count = 1}, v = 43, a = 45, __coro_index = 1 '\001', struct_std__suspend_always_0 = {__int_8 = 0 '\000'},
    class_await_counter_1 = {__int_8 = 0 '\000'}, class_await_counter_2 = {__int_8 = 0 '\000'}, struct_std__suspend_always_3 = {__int_8 = 0 '\000'}}

In the above, the values of `v` and `a` are clearly expressed, as are the
temporary values for `await_counter` (`class_await_counter_1` and
`class_await_counter_2`) and `std::suspend_always` (
`struct_std__suspend_always_0` and `struct_std__suspend_always_3`). The index
of the current suspension point of the coroutine is emitted as `__coro_index`.
Since `__coro_index` is zero indexed, the value `1` in the above example means
that the coroutine stopped at the second suspend point, which is the first
`co_await await_counter{};` in `coro_task`. The first suspend point is the
compiler generated `co_await promise_type::initial_suspend()`.

However, when optimizations are enabled, the printed result changes drastically:

.. parsed-literal::

  {__resume_fn = 0x401280 <coro_task(int)>, __destroy_fn = 0x401390 <coro_task(int)>, __promise = {count = 1}, __int_32_0 = 43, __coro_index = 1 '\001'}

Unused values are optimized out, as well as the name of the local variable `a`.
The only information remaining is the value of a 32-bit integer. In this simple
case, it seems pretty clear that `__int_32_0` represents `a`. However, that is
not true.

An important note with optimization is that the value of a variable may not
properly express the intended value in the source code. For example:

.. code-block:: c++

  static task coro_task(int v) {
    int a = v;
    co_await await_counter{};
    a++; // __int_32_0 is 43 here
    std::cout << a << "\n";
    a++; // __int_32_0 is still 43 here
    std::cout << a << "\n";
    a++; // __int_32_0 is still 43 here!
    std::cout << a << "\n";
    co_await await_counter{};
    a++; // __int_32_0 is still 43 here!!
    std::cout << a << "\n";
    a++; // Why is __int_32_0 still 43 here?
    std::cout << a << "\n";
  }

When debugging step-by-step, the value of `__int_32_0` seemingly does not
change, despite being frequently incremented, and instead is always `43`.
While this might be surprising, this is a result of the optimizer recognizing
that it can eliminate most of the load/store operations. The above code gets
optimized to the equivalent of:

.. code-block:: c++

  static task coro_task(int v) {
    __int_32_0 = v;          // store v to __int_32_0 in the frame
    co_await await_counter{};
    int a = __int_32_0;      // load __int_32_0 from the frame
    std::cout << a+1 << "\n";
    std::cout << a+2 << "\n";
    std::cout << a+3 << "\n";
    co_await await_counter{};
    a = __int_32_0;          // load __int_32_0 from the frame again
    std::cout << a+4 << "\n";
    std::cout << a+5 << "\n";
  }

It should now be obvious why the value of `__int_32_0` remains unchanged
throughout the function. It is important to recognize that `__int_32_0`
does not directly correspond to `a`, but is instead a variable generated
to assist the compiler in code generation. The variables in an optimized
coroutine frame should not be thought of as directly representing the
variables in the C++ source.

Get the suspended points
========================

An important requirement for debugging coroutines is to understand suspended
points, which are where the coroutine is currently suspended and awaiting.

For simple cases like the above, inspecting the value of the `__coro_index`
variable in the coroutine frame works well.

However, this is not sufficient in more complex situations. In these cases,
the coroutine library itself has to record the line number.

For example:

.. code-block:: c++

  #include <source_location>

  // For all the promise_type we want:
  class promise_type {
    ...
    // Added member; 0xffffffff means no suspension has been recorded yet.
    unsigned line_number = 0xffffffff;
  };

  // For all the awaiter types we need:
  class awaiter {
    ...
    template <typename Promise>
    void await_suspend(std::coroutine_handle<Promise> handle,
                       std::source_location sl = std::source_location::current()) {
          ...
          handle.promise().line_number = sl.line();
    }
  };

In this case, we use `std::source_location` to store the line number of the
await inside the `promise_type`. Since we can locate the coroutine function
from the address of the coroutine, we can identify suspended points this way
as well.

The downside is that this adds runtime cost. This is consistent with the C++
philosophy of "Pay for what you use".

Get the asynchronous stack
==========================

Another important requirement to debug a coroutine is to print the asynchronous
stack to identify the asynchronous caller of the coroutine. As many
implementations of coroutine types store `std::coroutine_handle<> continuation`
in the promise type, identifying the caller should be trivial. The
`continuation` is typically the awaiting coroutine for the current coroutine,
that is, the asynchronous parent.

Since the `promise_type` is obtainable from the address of a coroutine and
contains the corresponding continuation (which itself is a coroutine with a
`promise_type`), it should be trivial to print the entire asynchronous stack.

This logic should be quite easily captured in a debugger script.
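
The same walk can also be sketched in C++ rather than in a debugger script.
The following is a minimal sketch, not the implementation of any particular
library. It assumes that every coroutine in the chain uses the same
hypothetical `task::promise_type`, which stores a `name` label and a
`std::coroutine_handle<> continuation`. The names `task`, `park`, `parked`,
`async_stack`, and `run_example` are made up for illustration.

.. code-block:: c++

  #include <cassert>
  #include <coroutine>
  #include <string>
  #include <vector>

  // Hypothetical coroutine type used only for this sketch.
  struct task {
    struct promise_type {
      const char *name;                       // label shown in the stack
      std::coroutine_handle<> continuation{}; // awaiting coroutine, null for the root

      // The coroutine's arguments are passed to the promise constructor.
      explicit promise_type(const char *n) : name(n) {}

      task get_return_object() {
        return task{std::coroutine_handle<promise_type>::from_promise(*this)};
      }
      std::suspend_always initial_suspend() noexcept { return {}; }
      // A real library would resume the continuation here; omitted in this sketch.
      std::suspend_always final_suspend() noexcept { return {}; }
      void return_void() noexcept {}
      void unhandled_exception() noexcept {}
    };

    explicit task(std::coroutine_handle<promise_type> h) : handle(h) {}
    task(const task &) = delete;
    task &operator=(const task &) = delete;
    ~task() { if (handle) handle.destroy(); }

    // Awaiting a task records the awaiting coroutine as the continuation and
    // then transfers control to the awaited coroutine.
    bool await_ready() const noexcept { return false; }
    std::coroutine_handle<> await_suspend(std::coroutine_handle<> parent) noexcept {
      handle.promise().continuation = parent;
      return handle;
    }
    void await_resume() const noexcept {}

    std::coroutine_handle<promise_type> handle;
  };

  // Stands in for "the coroutine address seen in the debugger".
  std::coroutine_handle<> parked;

  // Suspends the current coroutine and remembers its handle in `parked`.
  struct park {
    bool await_ready() const noexcept { return false; }
    void await_suspend(std::coroutine_handle<> h) const noexcept { parked = h; }
    void await_resume() const noexcept {}
  };

  task leaf(const char *name) { co_await park{}; }
  task middle(const char *name) { co_await leaf("leaf"); }
  task root(const char *name) { co_await middle("middle"); }

  // Walks from a suspended coroutine up through its asynchronous parents.
  std::vector<std::string> async_stack(std::coroutine_handle<> h) {
    std::vector<std::string> frames;
    while (h) {
      auto &p =
          std::coroutine_handle<task::promise_type>::from_address(h.address()).promise();
      frames.push_back(p.name);
      h = p.continuation;
    }
    return frames;
  }

  std::vector<std::string> run_example() {
    task t = root("root");
    t.handle.resume();          // runs root, middle, and leaf until leaf parks
    return async_stack(parked); // taken while all three frames are still alive
  }

Calling `run_example()` yields the labels of the innermost coroutine first and
the root last. A debugger script would follow the same loop, reading
`__promise.continuation` from each frame instead of calling `promise()`.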

Get the living coroutines
=========================

Another useful task when debugging coroutines is to enumerate the list of
living coroutines, much as is commonly done with threads. While technically
possible, this task is not recommended in production code as it is costly at
runtime. One such solution is to store the list of currently running coroutines
in a collection:

.. code-block:: c++

  #include <coroutine>
  #include <unordered_set>

  inline std::unordered_set<void*> lived_coroutines;
  // For all promise_type we want to record
  class promise_type {
  public:
      promise_type() {
          // Note: access must be synchronized to avoid data races
          lived_coroutines.insert(std::coroutine_handle<promise_type>::from_promise(*this).address());
      }
      ~promise_type() {
          // Note: access must be synchronized to avoid data races
          lived_coroutines.erase(std::coroutine_handle<promise_type>::from_promise(*this).address());
      }
  };

In the above code snippet, we save the address of every live coroutine in the
`lived_coroutines` `unordered_set`. As before, once we know the address of the
coroutine we can derive the function, `promise_type`, and other members of the
frame. Thus, we could print the list of live coroutines from that collection.

Please note that the above is expensive from a storage perspective, and requires
some level of locking (not pictured) on the collection to prevent data races.
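
The locking mentioned above could look roughly like the following. This is a
minimal sketch under stated assumptions, not a recommended design: the names
`live_registry`, `registry`, `tracked_task`, `tracked_example`, and
`registry_demo` are hypothetical, and a single `std::mutex` is assumed to be
acceptable for a debugging aid.

.. code-block:: c++

  #include <cassert>
  #include <coroutine>
  #include <cstddef>
  #include <mutex>
  #include <unordered_set>

  // Hypothetical registry of live coroutine frame addresses.
  class live_registry {
  public:
    void add(void *frame) {
      std::lock_guard<std::mutex> lock(mutex_);
      frames_.insert(frame);
    }
    void remove(void *frame) {
      std::lock_guard<std::mutex> lock(mutex_);
      frames_.erase(frame);
    }
    bool contains(void *frame) const {
      std::lock_guard<std::mutex> lock(mutex_);
      return frames_.count(frame) != 0;
    }
    std::size_t size() const {
      std::lock_guard<std::mutex> lock(mutex_);
      return frames_.size();
    }

  private:
    mutable std::mutex mutex_; // guards frames_
    std::unordered_set<void *> frames_;
  };

  inline live_registry registry;

  // Hypothetical coroutine type whose promise registers its own frame.
  struct tracked_task {
    struct promise_type {
      promise_type() { registry.add(frame_address()); }
      ~promise_type() { registry.remove(frame_address()); }

      void *frame_address() {
        return std::coroutine_handle<promise_type>::from_promise(*this).address();
      }
      tracked_task get_return_object() {
        return tracked_task{std::coroutine_handle<promise_type>::from_promise(*this)};
      }
      std::suspend_always initial_suspend() noexcept { return {}; }
      std::suspend_always final_suspend() noexcept { return {}; }
      void return_void() noexcept {}
      void unhandled_exception() noexcept {}
    };

    explicit tracked_task(std::coroutine_handle<promise_type> h) : handle(h) {}
    tracked_task(const tracked_task &) = delete;
    tracked_task &operator=(const tracked_task &) = delete;
    ~tracked_task() { if (handle) handle.destroy(); }

    std::coroutine_handle<promise_type> handle;
  };

  tracked_task tracked_example() { co_return; }

  // Checks that frames appear in the registry while alive and vanish afterwards.
  bool registry_demo() {
    if (registry.size() != 0)
      return false;
    {
      tracked_task a = tracked_example();
      tracked_task b = tracked_example();
      if (registry.size() != 2 || !registry.contains(a.handle.address()) ||
          !registry.contains(b.handle.address()))
        return false;
    }
    return registry.size() == 0;
  }

Keeping the mutex inside the registry means that the promise constructor and
destructor cannot forget to take the lock. The cost is one lock acquisition
per coroutine creation and destruction, which is another reason to keep this
out of production builds.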