========================
Debugging C++ Coroutines
========================

.. contents::
   :local:

Introduction
============

For performance and other architectural reasons, the C++ Coroutines feature in
the Clang compiler is implemented in two parts of the compiler.  Semantic
analysis is performed in Clang, and Coroutine construction and optimization
take place in the LLVM middle-end.

However, this design limits the debug information that can be generated for
coroutines. Typically, the compiler generates debug information in the Clang
frontend, as debug information is highly language specific. However, this is
not possible for Coroutine frames because the frames are constructed in the
LLVM middle-end.

To mitigate this problem, the LLVM middle-end attempts to generate some debug
information, which is unfortunately incomplete, since much of the
language-specific information is missing in the middle-end.

This document describes how to use this debug information to better debug
coroutines.

Terminology
===========

Because C++20 Coroutines are relatively new, the terminology used to describe
their concepts is not settled.  This section defines a common,
understandable terminology to be used consistently throughout this document.

coroutine type
--------------

A `coroutine function` is any function that contains any of the Coroutine
Keywords `co_await`, `co_yield`, or `co_return`.  A `coroutine type` is a
possible return type of one of these `coroutine functions`.  `Task` and
`Generator` are common examples of coroutine types.

coroutine
---------

By technical definition, a `coroutine` is a suspendable function. However,
programmers typically use `coroutine` to refer to an individual instance.
For example:

.. code-block:: c++

  std::vector<Task> Coros; // Task is a coroutine type.
  for (int i = 0; i < 3; i++)
    Coros.push_back(CoroTask()); // CoroTask is a coroutine function, which
                                 // returns the coroutine type 'Task'.

In practice, we typically say "`Coros` contains 3 coroutines" in the above
example, though this is not strictly correct.  More technically, this should
say "`Coros` contains 3 coroutine instances" or "`Coros` contains 3 coroutine
objects."

In this document, we follow the common practice of using `coroutine` to refer
to an individual `coroutine instance`, since the terms `coroutine instance` and
`coroutine object` aren't sufficiently defined in this case.

coroutine frame
---------------

The C++ Standard uses `coroutine state` to describe the allocated storage. In
the compiler, we use `coroutine frame` to describe the generated data structure
that contains the necessary information.

The structure of coroutine frames
=================================

The structure of coroutine frames is defined as:

.. code-block:: c++

  struct {
    void (*__r)(); // function pointer to the `resume` function
    void (*__d)(); // function pointer to the `destroy` function
    promise_type; // the corresponding `promise_type`
    ... // Any other needed information
  }

In the debugger, a function's name can be obtained from its address, and the
name of the `resume` function is the same as the name of the coroutine
function. So the name of the coroutine can be obtained once the address of the
coroutine is known.
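
The layout above can also be observed from within a program. The following is
a minimal sketch rather than a supported interface: `frame_header` and
`read_frame_header` are hypothetical names, and the code assumes that the frame
begins with the two function pointers shown above, which the C++ Standard does
not guarantee.

.. code-block:: c++

  #include <coroutine>
  #include <cstring>

  struct task {
    struct promise_type {
      task get_return_object() {
        return task{std::coroutine_handle<promise_type>::from_promise(*this)};
      }
      std::suspend_always initial_suspend() noexcept { return {}; }
      std::suspend_always final_suspend() noexcept { return {}; }
      void return_void() noexcept {}
      void unhandled_exception() noexcept {}
    };

    std::coroutine_handle<promise_type> handle;
  };

  // Assumed (hypothetical) view of the first two fields of a coroutine frame.
  struct frame_header {
    void *resume_fn;  // corresponds to `__r` above
    void *destroy_fn; // corresponds to `__d` above
  };

  // Copy the first two pointers out of the frame that `h` refers to.
  inline frame_header read_frame_header(std::coroutine_handle<> h) {
    frame_header hdr;
    std::memcpy(&hdr, h.address(), sizeof(hdr));
    return hdr;
  }

  // A trivial coroutine function used to obtain a frame to inspect.
  inline task example() { co_return; }

Calling `read_frame_header(t.handle)` on a coroutine that is suspended at its
initial suspend point is expected to yield two non-null pointers. The
`resume_fn` value is the address that a debugger would map back to a function
name.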

Print promise_type
==================

Every coroutine has a `promise_type`, which defines the behavior
for the corresponding coroutine. In other words, if two coroutines have the
same `promise_type`, they should behave in the same way.
When stopped at a breakpoint inside a coroutine, the `promise_type` can be
printed in a debugger by:

.. parsed-literal::

  print __promise

It is also possible to print the `promise_type` of a coroutine from the address
of the coroutine frame. For example, if the address of a coroutine frame is
0x416eb0, and the type of the `promise_type` is `task::promise_type`, printing
the `promise_type` can be done by:

.. parsed-literal::

  print (task::promise_type)*(0x416eb0+0x10)

This is possible because the `promise_type` is guaranteed by the ABI to be at a
16-byte offset from the start of the coroutine frame (the `0x10` above), that
is, immediately after the two function pointers on a 64-bit target.

Note that there is also an ABI-independent method:

.. parsed-literal::

  print std::coroutine_handle<task::promise_type>::from_address((void*)0x416eb0).promise()

The functions `from_address(void*)` and `promise()` are often small enough to
be inlined and removed during optimization, so this method may not always be
available.
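
The offset used above can be cross-checked from C++ by comparing the address
of the promise with the address of the frame. The following is a minimal
sketch under assumptions: `promise_offset` and `example` are hypothetical
names, and the value of `2 * sizeof(void*)` is what the layout described in
this document implies for a promise with ordinary alignment, not something the
C++ Standard promises.

.. code-block:: c++

  #include <coroutine>
  #include <cstddef>

  struct task {
    struct promise_type {
      task get_return_object() {
        return task{std::coroutine_handle<promise_type>::from_promise(*this)};
      }
      std::suspend_always initial_suspend() noexcept { return {}; }
      std::suspend_always final_suspend() noexcept { return {}; }
      void return_void() noexcept {}
      void unhandled_exception() noexcept {}

      int count = 0;
    };

    std::coroutine_handle<promise_type> handle;
  };

  // Distance in bytes from the start of the frame to the promise object.
  template <class Promise>
  std::ptrdiff_t promise_offset(std::coroutine_handle<Promise> h) {
    const char *frame = static_cast<const char *>(h.address());
    const char *promise = reinterpret_cast<const char *>(&h.promise());
    return promise - frame;
  }

  // A trivial coroutine function used to obtain a frame to inspect.
  inline task example() { co_return; }

With the layout assumed here, `promise_offset(t.handle)` should equal
`2 * sizeof(void*)`, which is `0x10` on a 64-bit target. The same handle can be
rebuilt from the raw frame address with
`std::coroutine_handle<task::promise_type>::from_address`, which mirrors the
ABI-independent debugger command.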

Print coroutine frames
======================

LLVM generates the debug information for the coroutine frame in the LLVM
middle-end, which permits printing of the coroutine frame in the debugger. Much
like the `promise_type`, when stopped at a breakpoint inside a coroutine we can
print the coroutine frame by:

.. parsed-literal::

  print __coro_frame

Just as printing the `promise_type` is possible from the coroutine address,
printing the details of the coroutine frame from an address is also possible:

::

  (gdb) # 0x418eb0 is the address of the coroutine frame; read the resume function pointer stored there
  (gdb) print/x *0x418eb0
  $1 = 0x4019e0
  (gdb) # Get the linkage name for the coroutine
  (gdb) x 0x4019e0
  0x4019e0 <_ZL9coro_taski>:  0xe5894855
  (gdb) # The coroutine frame type is 'linkage_name.coro_frame_ty'
  (gdb) print (_ZL9coro_taski.coro_frame_ty)*(0x418eb0)
  $2 = {__resume_fn = 0x4019e0 <coro_task(int)>, __destroy_fn = 0x402000 <coro_task(int)>, __promise = {...}, ...}

The above is possible because:

(1) The name of the debug type of the coroutine frame is the `linkage_name`
plus the `.coro_frame_ty` suffix. The frame type is named after the coroutine
function because several coroutine functions may share the same coroutine type.

(2) The coroutine function name is accessible from the address of the coroutine
frame.

The above commands can be simplified by placing them in debug scripts.

Examples to print coroutine frames
----------------------------------

The print examples below use the following definition:

.. code-block:: c++

  #include <coroutine>
  #include <iostream>

  struct task {
    struct promise_type {
      task get_return_object() { return std::coroutine_handle<promise_type>::from_promise(*this); }
      std::suspend_always initial_suspend() { return {}; }
      std::suspend_always final_suspend() noexcept { return {}; }
      void return_void() noexcept {}
      void unhandled_exception() noexcept {}

      int count = 0;
    };

    void resume() noexcept {
      handle.resume();
    }

    task(std::coroutine_handle<promise_type> hdl) : handle(hdl) {}
    ~task() {
      if (handle)
        handle.destroy();
    }

    std::coroutine_handle<> handle;
  };

  class await_counter : public std::suspend_always {
    public:
      template<class PromiseType>
      void await_suspend(std::coroutine_handle<PromiseType> handle) noexcept {
          handle.promise().count++;
      }
  };

  static task coro_task(int v) {
    int a = v;
    co_await await_counter{};
    a++;
    std::cout << a << "\n";
    a++;
    std::cout << a << "\n";
    a++;
    std::cout << a << "\n";
    co_await await_counter{};
    a++;
    std::cout << a << "\n";
    a++;
    std::cout << a << "\n";
  }

  int main() {
    task t = coro_task(43);
    t.resume();
    t.resume();
    t.resume();
    return 0;
  }

In debug mode (`-O0` with `-g`), the printed result would be:

.. parsed-literal::

  {__resume_fn = 0x4019e0 <coro_task(int)>, __destroy_fn = 0x402000 <coro_task(int)>, __promise = {count = 1}, v = 43, a = 45, __coro_index = 1 '\001', struct_std__suspend_always_0 = {__int_8 = 0 '\000'},
    class_await_counter_1 = {__int_8 = 0 '\000'}, class_await_counter_2 = {__int_8 = 0 '\000'}, struct_std__suspend_always_3 = {__int_8 = 0 '\000'}}

In the above, the values of `v` and `a` are clearly expressed, as are the
temporary values for `await_counter` (`class_await_counter_1` and
`class_await_counter_2`) and `std::suspend_always` (
`struct_std__suspend_always_0` and `struct_std__suspend_always_3`). The index
of the current suspension point of the coroutine is emitted as `__coro_index`.
In the above example, the `__coro_index` value of `1` means the coroutine
stopped at the second suspend point (note that `__coro_index` is zero-indexed),
which is the first `co_await await_counter{};` in `coro_task`. The first
suspend point, with index `0`, is the compiler-generated
`co_await promise_type::initial_suspend()`.

However, when optimizations are enabled, the printed result changes drastically:

.. parsed-literal::

  {__resume_fn = 0x401280 <coro_task(int)>, __destroy_fn = 0x401390 <coro_task(int)>, __promise = {count = 1}, __int_32_0 = 43, __coro_index = 1 '\001'}

Unused values are optimized out, as is the name of the local variable `a`.
The only information that remains is the value of a 32-bit integer. In this
simple case, it seems pretty clear that `__int_32_0` represents `a`. However,
this is not the case.

An important note with optimization is that the value of a variable may not
properly express the intended value in the source code.  For example:

.. code-block:: c++

  static task coro_task(int v) {
    int a = v;
    co_await await_counter{};
    a++; // __int_32_0 is 43 here
    std::cout << a << "\n";
    a++; // __int_32_0 is still 43 here
    std::cout << a << "\n";
    a++; // __int_32_0 is still 43 here!
    std::cout << a << "\n";
    co_await await_counter{};
    a++; // __int_32_0 is still 43 here!!
    std::cout << a << "\n";
    a++; // Why is __int_32_0 still 43 here?
    std::cout << a << "\n";
  }

When debugging step-by-step, the value of `__int_32_0` seemingly does not
change, despite being frequently incremented, and instead is always `43`.
While this might be surprising, this is a result of the optimizer recognizing
that it can eliminate most of the load/store operations. The above code gets
optimized to the equivalent of:

.. code-block:: c++

  static task coro_task(int v) {
    store v to __int_32_0 in the frame
    co_await await_counter{};
    a = load __int_32_0
    std::cout << a+1 << "\n";
    std::cout << a+2 << "\n";
    std::cout << a+3 << "\n";
    co_await await_counter{};
    a = load __int_32_0
    std::cout << a+4 << "\n";
    std::cout << a+5 << "\n";
  }

It should now be obvious why the value of `__int_32_0` remains unchanged
throughout the function. It is important to recognize that `__int_32_0`
does not directly correspond to `a`, but is instead a variable generated
to assist the compiler in code generation. The variables in an optimized
coroutine frame should not be thought of as directly representing the
variables in the C++ source.

Get the suspended points
========================

An important requirement for debugging coroutines is to understand suspension
points, that is, the points at which a coroutine is currently suspended and
awaiting.

For simple cases like the above, inspecting the value of the `__coro_index`
variable in the coroutine frame works well.

However, it is not so simple in more complex situations. In these cases, the
coroutine library itself needs to record the line number.

For example:

.. code-block:: c++

  // For all the promise_type we want:
  class promise_type {
    ...
    unsigned line_number = 0xffffffff; // added: line of the latest suspension
  };

  #include <source_location>

  // For all the awaiter types we need:
  class awaiter {
    ...
    template <typename Promise>
    void await_suspend(std::coroutine_handle<Promise> handle,
                       std::source_location sl = std::source_location::current()) {
          ...
          handle.promise().line_number = sl.line();
    }
  };

In this case, we use `std::source_location` to store the line number of the
await inside the `promise_type`.  Since we can locate the coroutine function
from the address of the coroutine, we can identify suspended points this way
as well.

The downside here is that this comes at the price of additional runtime cost.
This is consistent with the C++ philosophy of "Pay for what you use".
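
The fragments above can be assembled into a small self-contained program. This
is a sketch under assumptions, not a definitive implementation:
`recording_awaiter` and `two_steps` are hypothetical names, the standard
library must provide `<source_location>`, and the recorded value is whatever
`std::source_location::current()` reports for the default argument, which is
expected to be the line of the `co_await` expression.

.. code-block:: c++

  #include <coroutine>
  #include <source_location>

  struct task {
    struct promise_type {
      task get_return_object() {
        return task{std::coroutine_handle<promise_type>::from_promise(*this)};
      }
      std::suspend_always initial_suspend() noexcept { return {}; }
      std::suspend_always final_suspend() noexcept { return {}; }
      void return_void() noexcept {}
      void unhandled_exception() noexcept {}

      // Sentinel value until an awaiter records a real line number.
      unsigned line_number = 0xffffffff;
    };

    std::coroutine_handle<promise_type> handle;
  };

  // Hypothetical awaiter: always suspends and records where it was awaited.
  struct recording_awaiter : std::suspend_always {
    template <typename Promise>
    void await_suspend(
        std::coroutine_handle<Promise> handle,
        std::source_location sl = std::source_location::current()) noexcept {
      handle.promise().line_number = sl.line();
    }
  };

  // A coroutine function with two explicit suspension points.
  inline task two_steps() {
    co_await recording_awaiter{};
    co_await recording_awaiter{};
  }

After the first `resume()`, `line_number` no longer holds the sentinel value,
so printing the promise in the debugger shows where the coroutine is suspended
without relying on `__coro_index`.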

Get the asynchronous stack
==========================

Another important requirement to debug a coroutine is to print the asynchronous
stack to identify the asynchronous caller of the coroutine.  As many
implementations of coroutine types store `std::coroutine_handle<> continuation`
in the promise type, identifying the caller should be trivial.  The
`continuation` is typically the awaiting coroutine for the current coroutine,
that is, the asynchronous parent.

Since the `promise_type` is obtainable from the address of a coroutine and
contains the corresponding continuation (which itself is a coroutine with a
`promise_type`), it should be trivial to print the entire asynchronous stack.

This logic should be quite easily captured in a debugger script.
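
The walk that such a script performs can be approximated in C++. The following
is a minimal sketch under assumptions: `async_stack`, `leaf`, and `root` are
hypothetical names, the `continuation` member is an assumed convention of the
coroutine library, and every coroutine in the chain is assumed to use the same
`task::promise_type`, so that the type-erased continuation can be converted
back with `from_address`.

.. code-block:: c++

  #include <coroutine>
  #include <vector>

  struct task {
    struct promise_type {
      task get_return_object() {
        return task{std::coroutine_handle<promise_type>::from_promise(*this)};
      }
      std::suspend_always initial_suspend() noexcept { return {}; }
      std::suspend_always final_suspend() noexcept { return {}; }
      void return_void() noexcept {}
      void unhandled_exception() noexcept {}

      // Assumed convention: the coroutine awaiting this one (null if none).
      std::coroutine_handle<> continuation{};
    };

    std::coroutine_handle<promise_type> handle;
  };

  // Collect the frame addresses from `h` up through its asynchronous parents.
  inline std::vector<void *>
  async_stack(std::coroutine_handle<task::promise_type> h) {
    std::vector<void *> frames;
    while (h) {
      frames.push_back(h.address());
      // Assumes the parent also uses task::promise_type.
      h = std::coroutine_handle<task::promise_type>::from_address(
          h.promise().continuation.address());
    }
    return frames;
  }

  // Two trivial coroutine functions used to build a chain by hand.
  inline task leaf() { co_return; }
  inline task root() { co_return; }

Setting `child.handle.promise().continuation = parent.handle` by hand and then
calling `async_stack(child.handle)` returns the child's frame address followed
by the parent's. A debugger script would follow the same chain and print the
function name for each frame.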

Get the living coroutines
=========================

Another useful task when debugging coroutines is to enumerate the list of
living coroutines, as is often done for threads.  While technically
possible, this task is not recommended in production code as it is costly at
runtime. One such solution is to store the list of currently live coroutines
in a collection:

.. code-block:: c++

  #include <coroutine>
  #include <unordered_set>

  inline std::unordered_set<void*> lived_coroutines;
  // For all promise_type we want to record
  class promise_type {
  public:
      promise_type() {
          // Note: this insertion must be guarded against data races
          lived_coroutines.insert(std::coroutine_handle<promise_type>::from_promise(*this).address());
      }
      ~promise_type() {
          // Note: this removal must be guarded against data races
          lived_coroutines.erase(std::coroutine_handle<promise_type>::from_promise(*this).address());
      }
  };

In the above code snippet, we save the address of every live coroutine in the
`lived_coroutines` `unordered_set`. As before, once we know the address of the
coroutine we can derive the function, `promise_type`, and other members of the
frame. Thus, we could print the list of live coroutines from that collection.

Please note that the above is expensive from a storage perspective, and requires
some level of locking (not pictured) on the collection to prevent data races.
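
One possible way to add the locking mentioned above is sketched below. It is
an illustration under assumptions rather than a recommended design:
`coroutine_registry`, `live_registry`, and `example` are hypothetical names,
and a single `std::mutex` is used for simplicity.

.. code-block:: c++

  #include <coroutine>
  #include <cstddef>
  #include <mutex>
  #include <unordered_set>

  // Hypothetical helper: a mutex-protected set of live coroutine frames.
  class coroutine_registry {
  public:
    void add(void *frame) {
      std::lock_guard<std::mutex> lock(mutex_);
      frames_.insert(frame);
    }
    void remove(void *frame) {
      std::lock_guard<std::mutex> lock(mutex_);
      frames_.erase(frame);
    }
    std::size_t size() const {
      std::lock_guard<std::mutex> lock(mutex_);
      return frames_.size();
    }

  private:
    mutable std::mutex mutex_;
    std::unordered_set<void *> frames_;
  };

  inline coroutine_registry live_registry;

  struct task {
    struct promise_type {
      // Register on construction and unregister on destruction.
      promise_type() { live_registry.add(frame()); }
      ~promise_type() { live_registry.remove(frame()); }

      task get_return_object() {
        return task{std::coroutine_handle<promise_type>::from_promise(*this)};
      }
      std::suspend_always initial_suspend() noexcept { return {}; }
      std::suspend_always final_suspend() noexcept { return {}; }
      void return_void() noexcept {}
      void unhandled_exception() noexcept {}

    private:
      // Address of the coroutine frame that owns this promise.
      void *frame() {
        return std::coroutine_handle<promise_type>::from_promise(*this)
            .address();
      }
    };

    std::coroutine_handle<promise_type> handle;
  };

  // A trivial coroutine function used to create live coroutines.
  inline task example() { co_return; }

Each call to `example()` adds one entry to `live_registry`, and each
`handle.destroy()` removes it again. Every coroutine creation and destruction
now takes a lock, which is one reason this technique is better suited to debug
builds than to production code.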