==========================
Source-based Code Coverage
==========================

.. contents::
   :local:

Introduction
============

This document explains how to use clang's source-based code coverage feature.
It's called "source-based" because it operates on AST and preprocessor
information directly. This allows it to generate very precise coverage data.

Clang ships two other code coverage implementations:

* :doc:`SanitizerCoverage` - A low-overhead tool meant for use alongside the
  various sanitizers. It can provide up to edge-level coverage.

* gcov - A GCC-compatible coverage implementation which operates on DebugInfo.

From this point onwards, "code coverage" will refer to the source-based kind.

The code coverage workflow
==========================

The code coverage workflow consists of three main steps:

* Compiling with coverage enabled.

* Running the instrumented program.

* Creating coverage reports.

The next few sections work through a complete, copy-and-paste-friendly example
based on this program:

.. code-block:: cpp

    % cat <<EOF > foo.cc
    #define BAR(x) ((x) || (x))
    template <typename T> void foo(T x) {
      for (unsigned I = 0; I < 10; ++I) { BAR(I); }
    }
    int main() {
      foo<int>(0);
      foo<float>(0);
      return 0;
    }
    EOF

Compiling with coverage enabled
===============================

To compile code with coverage enabled, pass ``-fprofile-instr-generate
-fcoverage-mapping`` to the compiler:

.. code-block:: console

    # Step 1: Compile with coverage enabled.
    % clang++ -fprofile-instr-generate -fcoverage-mapping foo.cc -o foo

Note that linking together code with and without coverage instrumentation is
supported: any uninstrumented code simply won't be accounted for.

Running the instrumented program
================================

The next step is to run the instrumented program. When the program exits, it
writes a **raw profile** to the path specified by the ``LLVM_PROFILE_FILE``
environment variable. If that variable is not set, the profile is written
to ``default.profraw`` in the program's current working directory. If
``LLVM_PROFILE_FILE`` contains a path to a non-existent directory, the missing
directory structure will be created. Additionally, the following special
**pattern strings** are rewritten:

* "%p" expands to the process ID.

* "%h" expands to the hostname of the machine running the program.

* "%Nm" expands to the instrumented binary's signature. When this pattern
  is specified, the runtime creates a pool of N raw profiles which are used for
  on-line profile merging. The runtime takes care of selecting a raw profile
  from the pool, locking it, and updating it before the program exits. If N is
  not specified (i.e. the pattern is "%m"), it's assumed that ``N = 1``. N must
  be between 1 and 9. The merge pool specifier can only occur once per filename
  pattern.

.. code-block:: console

    # Step 2: Run the program.
    % LLVM_PROFILE_FILE="foo.profraw" ./foo

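The pattern string rewriting described above can be pictured with a small
sketch. The helper below is purely illustrative: ``expandProfilePattern`` and
its parameters are hypothetical names, it is not the profile runtime's
implementation, and it only handles "%p" and "%h" (the merge pool specifier is
copied through unchanged).

.. code-block:: cpp

    #include <cstddef>
    #include <string>

    // Hypothetical helper, not part of the profile runtime: expands "%p" and
    // "%h" in a filename pattern. All other text, including "%m", is copied
    // verbatim.
    std::string expandProfilePattern(const std::string &Pattern, long Pid,
                                     const std::string &Hostname) {
      std::string Result;
      for (std::size_t I = 0; I < Pattern.size(); ++I) {
        if (Pattern[I] == '%' && I + 1 < Pattern.size()) {
          if (Pattern[I + 1] == 'p') {
            Result += std::to_string(Pid); // "%p" becomes the process ID.
            ++I;
            continue;
          }
          if (Pattern[I + 1] == 'h') {
            Result += Hostname; // "%h" becomes the hostname.
            ++I;
            continue;
          }
        }
        Result += Pattern[I];
      }
      return Result;
    }

Under these assumptions, calling the helper with the pattern
``foo-%p-%h.profraw``, process ID 1234 and hostname ``build01`` returns
``foo-1234-build01.profraw``.
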
Creating coverage reports
=========================

Raw profiles have to be **indexed** before they can be used to generate
coverage reports. This is done using the "merge" tool in ``llvm-profdata``, so
named because it can combine and index profiles at the same time:

.. code-block:: console

    # Step 3(a): Index the raw profile.
    % llvm-profdata merge -sparse foo.profraw -o foo.profdata

There are several ways to render coverage reports. One option is to
generate a line-oriented report:

.. code-block:: console

    # Step 3(b): Create a line-oriented coverage report.
    % llvm-cov show ./foo -instr-profile=foo.profdata

To generate the same report in HTML with demangling turned on, use:

.. code-block:: console

    % llvm-cov show ./foo -instr-profile=foo.profdata -format html -o report.dir -Xdemangler c++filt -Xdemangler -n

This report includes a summary view as well as dedicated sub-views for
templated functions and their instantiations. For our example program, we get
distinct views for ``foo<int>(...)`` and ``foo<float>(...)``. If
``-show-line-counts-or-regions`` is enabled, ``llvm-cov`` displays sub-line
region counts (even in macro expansions):

.. code-block:: none

        1|   20|#define BAR(x) ((x) || (x))
                               ^20     ^2
        2|    2|template <typename T> void foo(T x) {
        3|   22|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
                                       ^22     ^20  ^20^20
        4|    2|}
    ------------------
    | void foo<int>(int):
    |      2|    1|template <typename T> void foo(T x) {
    |      3|   11|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
    |                                     ^11     ^10  ^10^10
    |      4|    1|}
    ------------------
    | void foo<float>(float):
    |      2|    1|template <typename T> void foo(T x) {
    |      3|   11|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
    |                                     ^11     ^10  ^10^10
    |      4|    1|}
    ------------------

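The two region counts reported for ``BAR`` follow from short-circuit
evaluation: the second ``(x)`` is only evaluated when the first is false, which
happens once per instantiation (when ``I`` is 0). The sketch below reproduces
those counts by hand. ``EvalCounts``, ``countBarEvaluations`` and
``countBothInstantiations`` are hypothetical names used only for illustration,
and this manual counting is not how clang instruments code.

.. code-block:: cpp

    // Hypothetical counters standing in for the two regions of BAR(x).
    struct EvalCounts {
      unsigned First = 0;  // Evaluations of the left-hand (x).
      unsigned Second = 0; // Evaluations of the right-hand (x).
    };

    // Mimics the loop in foo<T>: evaluates ((I) || (I)) for I = 0..9 and
    // records how often each operand runs.
    template <typename T> void countBarEvaluations(T, EvalCounts &Counts) {
      for (unsigned I = 0; I < 10; ++I) {
        auto Left = [&] { ++Counts.First; return I != 0; };
        auto Right = [&] { ++Counts.Second; return I != 0; };
        (void)(Left() || Right()); // Right() only runs when Left() is false.
      }
    }

    // Runs both instantiations, as main() does in foo.cc.
    EvalCounts countBothInstantiations() {
      EvalCounts Counts;
      countBarEvaluations<int>(0, Counts);
      countBarEvaluations<float>(0, Counts);
      return Counts;
    }

The result has ``First == 20`` and ``Second == 2``, matching the ``^20`` and
``^2`` annotations on line 1 of the report.
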
It's possible to generate a file-level summary of coverage statistics (instead
of a line-oriented report) with:

.. code-block:: console

    # Step 3(c): Create a coverage summary.
    % llvm-cov report ./foo -instr-profile=foo.profdata
    Filename           Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover
    --------------------------------------------------------------------------------------------------------------------------------------
    /tmp/foo.cc             13                 0   100.00%           3                 0   100.00%          13                 0   100.00%
    --------------------------------------------------------------------------------------------------------------------------------------
    TOTAL                   13                 0   100.00%           3                 0   100.00%          13                 0   100.00%

A few final notes:

* The ``-sparse`` flag is optional but can result in dramatically smaller
  indexed profiles. This option should not be used if the indexed profile will
  be reused for PGO.

* Raw profiles can be discarded after they are indexed. Advanced use of the
  profile runtime library allows an instrumented program to merge profiling
  information directly into an existing raw profile on disk. The details are
  out of scope.

* The ``llvm-profdata`` tool can be used to merge together multiple raw or
  indexed profiles. To combine profiling data from multiple runs of a program,
  use a command such as:

  .. code-block:: console

      % llvm-profdata merge -sparse foo1.profraw foo2.profdata -o foo3.profdata

Exporting coverage data
=======================

Coverage data can be exported to JSON using the ``llvm-cov export``
sub-command. The llvm-cov source code contains a comprehensive reference that
defines the structure of the exported data at a high level.

Interpreting reports
====================

There are four statistics tracked in a coverage summary:

* Function coverage is the percentage of functions which have been executed at
  least once. A function is treated as having been executed if any of its
  instantiations are executed.

* Instantiation coverage is the percentage of function instantiations which
  have been executed at least once. Template functions and static inline
  functions from headers are two kinds of functions which may have multiple
  instantiations.

* Line coverage is the percentage of code lines which have been executed at
  least once. Only executable lines within function bodies are considered to be
  code lines, so e.g. coverage for macro definitions in a header might not be
  included.

* Region coverage is the percentage of code regions which have been executed at
  least once. A code region may span multiple lines (e.g. a large function with
  no control flow). However, it's also possible for a single line to contain
  multiple code regions or even nested code regions (e.g. "return x || y && z").

Of these four statistics, function coverage is usually the least granular while
region coverage is the most granular. The project-wide totals for each
statistic are listed in the summary.

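The difference between function coverage and instantiation coverage can be
made concrete with a short calculation. The types and function names below
(``FunctionRecord``, ``functionCoverage``, ``instantiationCoverage``) are
hypothetical and do not correspond to llvm-cov's internal data structures.
Reporting 0% for empty input is a convention of this sketch only.

.. code-block:: cpp

    #include <cstdint>
    #include <vector>

    // Hypothetical record: one execution count per instantiation of a function.
    struct FunctionRecord {
      std::vector<std::uint64_t> InstantiationCounts;
    };

    // Percentage of functions with at least one executed instantiation.
    double functionCoverage(const std::vector<FunctionRecord> &Functions) {
      if (Functions.empty())
        return 0.0;
      unsigned Covered = 0;
      for (const FunctionRecord &F : Functions) {
        for (std::uint64_t Count : F.InstantiationCounts) {
          if (Count > 0) {
            ++Covered; // One executed instantiation covers the function.
            break;
          }
        }
      }
      return 100.0 * Covered / Functions.size();
    }

    // Percentage of individual instantiations executed at least once.
    double instantiationCoverage(const std::vector<FunctionRecord> &Functions) {
      unsigned Total = 0, Covered = 0;
      for (const FunctionRecord &F : Functions) {
        for (std::uint64_t Count : F.InstantiationCounts) {
          ++Total;
          if (Count > 0)
            ++Covered;
        }
      }
      return Total == 0 ? 0.0 : 100.0 * Covered / Total;
    }

For two functions with instantiation counts ``{1, 0}`` and ``{5}``, this sketch
reports 100% function coverage but only about 66.7% instantiation coverage.
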
Format compatibility guarantees
===============================

* There are no backwards or forwards compatibility guarantees for the raw
  profile format. Raw profiles may be dependent on the specific compiler
  revision used to generate them. It's inadvisable to store raw profiles for
  long periods of time.

* Tools must retain **backwards** compatibility with indexed profile formats.
  These formats are not forwards-compatible: i.e., a tool which uses format
  version X will not be able to understand format version (X+k).

* There is a third format in play: the format of the coverage mappings emitted
  into instrumented binaries. Tools must retain **backwards** compatibility
  with these formats. These formats are not forwards-compatible.

* The JSON coverage export format has a (major, minor, patch) version triple.
  Only a major version increment indicates a backwards-incompatible change. A
  minor version increment is for added functionality, and patch version
  increments are for bugfixes.

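A consumer of exported JSON might act on the version triple roughly as sketched
below. ``ExportVersion``, ``parseExportVersion`` and ``canConsume`` are
hypothetical names. Two things are assumptions of this sketch rather than
statements from this document: that the triple is available as a dotted string
such as ``2.0.1``, and that matching on the major version alone is a
sufficient check.

.. code-block:: cpp

    #include <cstdio>
    #include <string>

    // Hypothetical representation of the (major, minor, patch) triple.
    struct ExportVersion {
      unsigned Major = 0, Minor = 0, Patch = 0;
    };

    // Parses a dotted string such as "2.0.1" (an assumed input form). Returns
    // false when fewer than three numbers are found or extra text follows.
    bool parseExportVersion(const std::string &Text, ExportVersion &Out) {
      char Trailing = 0;
      return std::sscanf(Text.c_str(), "%u.%u.%u%c", &Out.Major, &Out.Minor,
                         &Out.Patch, &Trailing) == 3;
    }

    // Only a major version increment is backwards-incompatible, so this
    // sketch accepts data whenever the major versions match.
    bool canConsume(const ExportVersion &Supported, const ExportVersion &Data) {
      return Supported.Major == Data.Major;
    }
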
Using the profiling runtime without static initializers
=======================================================

By default, the compiler runtime uses a static initializer to determine the
profile output path and to register a writer function. To collect profiles
without using static initializers, do this manually:

* Export an ``int __llvm_profile_runtime`` symbol from each instrumented shared
  library and executable. When the linker finds a definition of this symbol, it
  knows to skip loading the object which contains the profiling runtime's
  static initializer.

* Forward-declare ``void __llvm_profile_initialize_file(void)`` and call it
  once from each instrumented executable. This function parses
  ``LLVM_PROFILE_FILE``, sets the output path, and truncates any existing files
  at that path. To get the same behavior without truncating existing files,
  pass a filename pattern string to
  ``void __llvm_profile_set_filename(char *)``. These calls can be placed
  anywhere so long as they precede all calls to
  ``__llvm_profile_write_file``.

* Forward-declare ``int __llvm_profile_write_file(void)`` and call it to write
  out a profile. This function returns 0 when it succeeds, and a non-zero value
  otherwise. Calling this function multiple times appends profile data to an
  existing on-disk raw profile.

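The steps above might be wired together as in the following sketch. The
wrapper functions ``initCoverageProfile`` and ``writeCoverageProfile`` and the
``COVERAGE_BUILD`` macro are hypothetical: the macro is assumed to be defined
by the build (for example with ``-DCOVERAGE_BUILD``) only when the code is
compiled with ``-fprofile-instr-generate -fcoverage-mapping``, so that the
same source still builds without the profiling runtime.

.. code-block:: cpp

    // COVERAGE_BUILD is a hypothetical build-time switch; see the text above.
    #ifdef COVERAGE_BUILD
    extern "C" {
    // Defining this symbol lets the linker skip the object that contains the
    // profiling runtime's static initializer.
    int __llvm_profile_runtime = 0;
    // Forward declarations of the runtime entry points.
    void __llvm_profile_initialize_file(void);
    int __llvm_profile_write_file(void);
    }
    #endif

    // Parses LLVM_PROFILE_FILE and sets the output path. Call this once,
    // before any profile is written.
    void initCoverageProfile() {
    #ifdef COVERAGE_BUILD
      __llvm_profile_initialize_file();
    #endif
    }

    // Writes the profile. Returns 0 on success, and always 0 when the build
    // is not instrumented.
    int writeCoverageProfile() {
    #ifdef COVERAGE_BUILD
      return __llvm_profile_write_file();
    #else
      return 0;
    #endif
    }

An instrumented executable would call ``initCoverageProfile()`` early in
``main`` and ``writeCoverageProfile()`` at the points where a profile should be
written, checking the return value for a non-zero failure code.
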
Collecting coverage reports for the LLVM project
================================================

To prepare a coverage report for LLVM (and any of its sub-projects), add
``-DLLVM_BUILD_INSTRUMENTED_COVERAGE=On`` to the CMake configuration. Raw
profiles will be written to ``$BUILD_DIR/profiles/``. To prepare an HTML
report, run ``llvm/utils/prepare-code-coverage-artifact.py``.

To specify an alternate directory for raw profiles, use
``-DLLVM_PROFILE_DATA_DIR``. To change the size of the profile merge pool, use
``-DLLVM_PROFILE_MERGE_POOL_SIZE``.

Drawbacks and limitations
=========================

* Code coverage does not precisely handle unpredictable changes in control
  flow or stack unwinding in the presence of exceptions. Consider the following
  function:

  .. code-block:: cpp

      int f() {
        may_throw();
        return 0;
      }

  If the call to ``may_throw()`` propagates an exception into ``f``, the code
  coverage tool may mark the ``return`` statement as executed even though it is
  not. A call to ``longjmp()`` can have similar effects.

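A self-contained variant of this scenario is sketched below. The
``ShouldThrow`` parameter and the ``callF`` wrapper are hypothetical additions
that make the example runnable. They are not part of the function shown above.

.. code-block:: cpp

    #include <stdexcept>

    // Hypothetical stand-in for may_throw(): throws only when asked to.
    void may_throw(bool ShouldThrow) {
      if (ShouldThrow)
        throw std::runtime_error("failure in may_throw");
    }

    // Same shape as f() above. When may_throw() throws, control leaves f()
    // before the return statement runs.
    int f(bool ShouldThrow) {
      may_throw(ShouldThrow);
      return 0;
    }

    // Returns the result of f(), or -1 if an exception skipped its return.
    int callF(bool ShouldThrow) {
      try {
        return f(ShouldThrow);
      } catch (const std::runtime_error &) {
        return -1;
      }
    }

Here ``callF(true)`` returns -1 because the ``return 0;`` in ``f`` never runs.
According to the limitation described above, a coverage report may nevertheless
mark that line as executed.
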