.. role:: raw-html(raw)
   :format: html

=================================
LLVM Code Coverage Mapping Format
=================================

.. contents::
   :local:

Introduction
============

LLVM's code coverage mapping format is used to provide code coverage
analysis using LLVM's and Clang's instrumentation-based profiling
(Clang's ``-fprofile-instr-generate`` option).

This document is aimed at those who would like to know how LLVM's code coverage
mapping works under the hood. Prior knowledge of how Clang's profile-guided
optimization works is useful, but not required. For those interested in using
LLVM to provide code coverage analysis for their own programs, see the `Clang
documentation <https://clang.llvm.org/docs/SourceBasedCodeCoverage.html>`_.

We start by briefly describing LLVM's code coverage mapping format and the
way that Clang and LLVM's code coverage tool work with this format. After
the basics are down, more advanced features of the coverage mapping format
are discussed - such as the data structures, LLVM IR representation and
the binary encoding.

High Level Overview
===================

LLVM's code coverage mapping format is designed to be a self-contained
data format that can be embedded into the LLVM IR and into object files.
It's described in this document as a **mapping** format because its goal is
to store the data that is required for a code coverage tool to map between
the specific source ranges in a file and the execution counts obtained
after running the instrumented version of the program.

The mapping data is used in two places in the code coverage process:

1. When clang compiles a source file with ``-fcoverage-mapping``, it
   generates the mapping information that describes the mapping between the
   source ranges and the profiling instrumentation counters.
   This information gets embedded into the LLVM IR and conveniently
   ends up in the final executable file when the program is linked.

2. It is also used by *llvm-cov* - the mapping information is extracted from an
   object file and is used to associate the execution counts (the values of the
   profile instrumentation counters) with the source ranges in a file.
   After that, the tool is able to generate various code coverage reports
   for the program.

The coverage mapping format aims to be a "universal format" that would be
suitable for usage by any frontend, and not just by Clang. It also aims to
provide the frontend the possibility of generating the minimal coverage mapping
data in order to reduce the size of the IR and object files - for example,
instead of emitting mapping information for each statement in a function, the
frontend is allowed to group the statements with the same execution count into
regions of code, and emit the mapping information only for those regions.

Advanced Concepts
=================

The remainder of this guide is meant to give you insight into the way the
coverage mapping format works.

The coverage mapping format operates on a per-function level as the
profile instrumentation counters are associated with a specific function.
For each function that requires code coverage, the frontend has to create
coverage mapping data that can map between the source code ranges and
the profile instrumentation counters for that function.

Mapping Region
--------------

The function's coverage mapping data contains an array of mapping regions.
A mapping region stores the `source code range`_ that is covered by this region,
the `file id <coverage file id_>`_, the `coverage mapping counter`_ and
the region's kind.
There are several kinds of mapping regions:

* Code regions associate portions of source code and `coverage mapping
  counters`_.
  They make up the majority of the mapping regions. They are used
  by the code coverage tool to compute the execution counts for lines,
  highlight the regions of code that were never executed, and to obtain
  the various code coverage statistics for a function.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:40 to 9:2</span>
  <span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  if (argc > 1) </span><span style='background-color:#85C1F5'>{ </span> <span class='c1'>// Code Region from 3:17 to 5:4</span>
  <span style='background-color:#85C1F5'>    printf("%s\n", argv[1]); </span>
  <span style='background-color:#85C1F5'>  }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{ </span> <span class='c1'>// Code Region from 5:10 to 7:4</span>
  <span style='background-color:#F6D55D'>    printf("\n"); </span>
  <span style='background-color:#F6D55D'>  }</span><span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  return 0; </span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`
* Skipped regions are used to represent source ranges that were skipped
  by Clang's preprocessor. They don't associate with
  `coverage mapping counters`_, as the frontend knows that they are never
  executed. They are used by the code coverage tool to mark the skipped lines
  inside a function as non-code lines that don't have execution counts.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:12 to 6:2</span>
  <span style='background-color:#85C1F5'>#ifdef DEBUG </span> <span class='c1'>// Skipped Region from 2:1 to 4:2</span>
  <span style='background-color:#85C1F5'>  printf("Hello world"); </span>
  <span style='background-color:#85C1F5'>#</span><span style='background-color:#4A789C'>endif </span>
  <span style='background-color:#4A789C'>  return 0; </span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`
* Expansion regions are used to represent Clang's macro expansions. They
  have an additional property - *expanded file id*. This property can be
  used by the code coverage tool to find the mapping regions that are created
  as a result of this macro expansion, by checking if their file id matches the
  expanded file id. They don't associate with `coverage mapping counters`_,
  as the code coverage tool can determine the execution count for this region
  by looking up the execution count of the first region with a corresponding
  file id.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int func(int x) </span><span style='background-color:#4A789C'>{ </span>
  <span style='background-color:#4A789C'>  #define MAX(x,y) </span><span style='background-color:#85C1F5'>((x) > (y)? </span><span style='background-color:#F6D55D'>(x)</span><span style='background-color:#85C1F5'> : </span><span style='background-color:#F4BA70'>(y)</span><span style='background-color:#85C1F5'>)</span><span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  return </span><span style='background-color:#7FCA9F'>MAX</span><span style='background-color:#4A789C'>(x, 42); </span> <span class='c1'>// Expansion Region from 3:10 to 3:13</span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`
* Branch regions associate instrumentable branch conditions in the source code
  with a `coverage mapping counter`_ to track how many times an individual
  condition evaluated to 'true' and another `coverage mapping counter`_ to
  track how many times that condition evaluated to 'false'. Instrumentable
  branch conditions may comprise larger boolean expressions using boolean
  logical operators. The 'true' and 'false' cases reflect unique branch paths
  that can be traced back to the source code.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int func(int x, int y) {</span>
  <span>  if (<span style='background-color:#4A789C'>(x > 1)</span> || <span style='background-color:#4A789C'>(y > 3)</span>) {</span> <span class='c1'>// Branch Region from 3:6 to 3:12</span>
  <span> </span><span class='c1'>// Branch Region from 3:17 to 3:23</span>
  <span>    printf("%d\n", x); </span>
  <span>  } else { </span>
  <span>    printf("\n"); </span>
  <span>  }</span>
  <span>  return 0; </span>
  <span>}</span>
  </pre>`

.. _source code range:

Source Range:
^^^^^^^^^^^^^

The source range record contains the starting and ending location of a certain
mapping region. Both locations include the line and the column numbers.

.. _coverage file id:

File ID:
^^^^^^^^

The file id is an integer value that tells us
in which source file or macro expansion this region is located.
It enables Clang to produce mapping information for the code
defined inside macros, as this example demonstrates:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>void func(const char *str) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:28 to 6:2 with file id 0</span>
<span style='background-color:#4A789C'>  #define PUT </span><span style='background-color:#85C1F5'>printf("%s\n", str)</span><span style='background-color:#4A789C'> </span> <span class='c1'>// 2 Code Regions from 2:15 to 2:34 with file ids 1 and 2</span>
<span style='background-color:#4A789C'>  if(*str) </span>
<span style='background-color:#4A789C'>    </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Expansion Region from 4:5 to 4:8 with file id 0 that expands a macro with file id 1</span>
<span style='background-color:#4A789C'>  </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Expansion Region from 5:3 to 5:6 with file id 0 that expands a macro with file id 2</span>
<span style='background-color:#4A789C'>}</span>
</pre>`

.. _coverage mapping counter:
.. _coverage mapping counters:

Counter:
^^^^^^^^

A coverage mapping counter can represent a reference to the profile
instrumentation counter. The execution count for a region with such a counter
is determined by looking up the value of the corresponding profile
instrumentation counter.

It can also represent a binary arithmetical expression that operates on
coverage mapping counters or other expressions.
The execution count for a region with an expression counter is determined by
evaluating the expression's arguments and then adding them together or
subtracting them from one another.
In the example below, a subtraction expression is used to compute the execution
count for the compound statement that follows the *else* keyword:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Region's counter is a reference to the profile counter #0</span>
<span style='background-color:#4A789C'> </span>
<span style='background-color:#4A789C'>  if (argc > 1) </span><span style='background-color:#85C1F5'>{ </span> <span class='c1'>// Region's counter is a reference to the profile counter #1</span>
<span style='background-color:#85C1F5'>    printf("%s\n", argv[1]); </span><span> </span>
<span style='background-color:#85C1F5'>  }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{ </span> <span class='c1'>// Region's counter is an expression (reference to the profile counter #0 - reference to the profile counter #1)</span>
<span style='background-color:#F6D55D'>    printf("\n"); </span>
<span style='background-color:#F6D55D'>  }</span><span style='background-color:#4A789C'> </span>
<span style='background-color:#4A789C'>  return 0; </span>
<span style='background-color:#4A789C'>}</span>
</pre>`

Finally, a coverage mapping counter can also represent an execution count
of zero.
The zero counter is used to provide coverage mapping for
unreachable statements and expressions, like in the example below:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{ </span>
<span style='background-color:#4A789C'>  return 0; </span>
<span style='background-color:#4A789C'>  </span><span style='background-color:#85C1F5'>printf("Hello world!\n")</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Unreachable region's counter is zero</span>
<span style='background-color:#4A789C'>}</span>
</pre>`

The zero counters allow the code coverage tool to display proper line execution
counts for the unreachable lines and highlight the unreachable code.
Without them, the tool would think that those lines and regions were still
executed, as it doesn't possess the frontend's knowledge.

Note that branch regions are created to track branch conditions in the source
code and refer to two coverage mapping counters, one to track the number of
times the branch condition evaluated to "true", and one to track the number of
times the branch condition evaluated to "false".

LLVM IR Representation
======================

The coverage mapping data is stored in the LLVM IR using a global constant
structure variable called *__llvm_coverage_mapping* with the *IPSK_covmap*
section specifier (i.e. ".lcovmap$M" on Windows and "__llvm_covmap" elsewhere).

For example, let’s consider a C file and how it gets compiled to LLVM:

.. _coverage mapping sample:

.. code-block:: c

  int foo() {
    return 42;
  }
  int bar() {
    return 13;
  }

The coverage mapping variable generated by Clang has 2 fields:

* Coverage mapping header.

* An optionally compressed list of filenames present in the translation unit.

The variable has 8-byte alignment because ld64 cannot always pack symbols from
different object files tightly (the word-level alignment assumption is baked in
too deeply).

.. code-block:: llvm

  @__llvm_coverage_mapping = internal constant { { i32, i32, i32, i32 }, [32 x i8] }
  {
    { i32, i32, i32, i32 } ; Coverage map header
    {
      i32 0,  ; Always 0. In prior versions, the number of affixed function records
      i32 32, ; The length of the string that contains the encoded translation unit filenames
      i32 0,  ; Always 0. In prior versions, the length of the affixed string that contains the encoded coverage mapping data
      i32 3   ; Coverage mapping format version (3 is the encoding of version 4)
    },
    [32 x i8] c"..." ; Encoded data (dissected later)
  }, section "__llvm_covmap", align 8

The current version of the format is version 5. There is one difference from version 4:

* The notion of branch region has been introduced along with a corresponding
  region kind. Branch regions encode two counters, one to track how many
  times a "true" branch condition is taken, and one to track how many times a
  "false" branch condition is taken.

There are two differences between versions 4 and 3:

* Function records are now named symbols, and are marked *linkonce_odr*. This
  allows linkers to merge duplicate function records. Merging of duplicate
  *dummy* records (emitted for functions included-but-not-used in a translation
  unit) reduces size bloat in the coverage mapping data. As part of this
  change, region mapping information for a function is now included within the
  function record, instead of being affixed to the coverage header.

* The filename list for a translation unit may optionally be zlib-compressed.

The only difference between versions 3 and 2 is that a special encoding for
column end locations was introduced to indicate gap regions.

In version 1, the function record for *foo* was defined as follows:

.. code-block:: llvm

  { i8*, i32, i32, i64 } { i8* getelementptr inbounds ([3 x i8]* @__profn_foo, i32 0, i32 0), ; Function's name
    i32 3, ; Function's name length
    i32 9, ; Function's encoded coverage mapping data string length
    i64 0  ; Function's structural hash
  }

In version 2, the function record for *foo* was defined as follows:

.. code-block:: llvm

  { i64, i32, i64 } {
    i64 0x5cf8c24cdb18bdac, ; Function's name MD5
    i32 9, ; Function's encoded coverage mapping data string length
    i64 0  ; Function's structural hash
  }

Coverage Mapping Header:
------------------------

The coverage mapping header has the following fields:

* The number of function records affixed to the coverage header. Always 0, but present for backwards compatibility.

* The length of the string in the second field of *__llvm_coverage_mapping* that contains the encoded translation unit filenames.

* The length of any encoded coverage mapping data affixed to the coverage header. Always 0, but present for backwards compatibility.

* The format version, stored as the version number minus one. The IR sample above was produced with version 4 (encoded as a 3); the current version, 5, is encoded as a 4.

.. _function records:

Function record:
----------------

A function record is a structure of the following type:

.. code-block:: llvm

  { i64, i32, i64, i64, [? x i8] }

It contains the function name's MD5, the length of the encoded mapping data for
that function, the function's structural hash value, the hash of the filenames
in the function's translation unit, and the encoded mapping data.

Dissecting the sample:
^^^^^^^^^^^^^^^^^^^^^^

Here's an overview of the encoded data that was stored in the
IR for the `coverage mapping sample`_ that was shown earlier:

* The IR contains the following string constant that represents the encoded
  filenames for the sample translation unit:

  .. code-block:: llvm

    c"\01\15\1Dx\DA\13\D1\0F-N-*\D6/+\CE\D6/\C9-\D0O\CB\CF\D7K\06\00N+\07]"

* The string contains values that are encoded in the LEB128 format, which is
  used throughout for storing integers. It also contains a compressed payload.

* The first three LEB128-encoded numbers in the sample specify the number of
  filenames, the length of the uncompressed filenames, and the length of the
  compressed payload (or 0 if compression is disabled). In this sample, there
  is 1 filename that is 21 bytes in length (uncompressed), and stored in 29
  bytes (compressed).

* The coverage mapping from the first function record is encoded in this string:

  .. code-block:: llvm

    c"\01\00\00\01\01\01\0C\02\02"

  This string consists of the following bytes:

  +----------+----------------------------------------------------------------------------+
  | ``0x01`` | The number of file ids used by this function. There is only one file id    |
  |          | used by the mapping data in this function.                                 |
  +----------+----------------------------------------------------------------------------+
  | ``0x00`` | An index into the filenames array which corresponds to the file            |
  |          | "/Users/alex/test.c".                                                      |
  +----------+----------------------------------------------------------------------------+
  | ``0x00`` | The number of counter expressions used by this function. This function     |
  |          | doesn't use any expressions.                                               |
  +----------+----------------------------------------------------------------------------+
  | ``0x01`` | The number of mapping regions that are stored in an array for the          |
  |          | function's file id #0.                                                     |
  +----------+----------------------------------------------------------------------------+
  | ``0x01`` | The coverage mapping counter for the first region in this function. The    |
  |          | value of 1 tells us that it's a coverage mapping counter that is a         |
  |          | reference to the profile instrumentation counter with an index of 0.       |
  +----------+----------------------------------------------------------------------------+
  | ``0x01`` | The starting line of the first mapping region in this function.            |
  +----------+----------------------------------------------------------------------------+
  | ``0x0C`` | The starting column of the first mapping region in this function.          |
  +----------+----------------------------------------------------------------------------+
  | ``0x02`` | The number of lines between the starting and ending lines of the first     |
  |          | mapping region (*numLines*), so the region ends on line 3.                 |
  +----------+----------------------------------------------------------------------------+
  | ``0x02`` | The ending column of the first mapping region in this function.            |
  +----------+----------------------------------------------------------------------------+

* The length of the substring that contains the encoded coverage mapping data
  for the second function record is also 9. It's structured like the mapping data
  for the first function record.

* The two trailing bytes are zeroes and are used to pad the coverage mapping
  data to give it the 8-byte alignment.
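
The byte-by-byte reading above can be reproduced with a few lines of Python.
This is only a sketch: the helper names ``read_uleb128`` and ``read_all`` are
made up for illustration and are not part of any LLVM tool, and the byte
strings are copied from the sample. The compressed filename payload is left
untouched.

```python
def read_uleb128(data, pos=0):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte & 0x80 == 0:             # high bit clear: last byte
            return value, pos
        shift += 7


def read_all(data):
    """Decode a buffer that holds nothing but LEB128 values (hypothetical helper)."""
    values, pos = [], 0
    while pos < len(data):
        value, pos = read_uleb128(data, pos)
        values.append(value)
    return values


# Encoded filenames from the sample: three LEB128 header values, then the payload.
filenames = (b"\x01\x15\x1dx\xda\x13\xd1\x0f-N-*\xd6/+\xce\xd6/"
             b"\xc9-\xd0O\xcb\xcf\xd7K\x06\x00N+\x07]")
num_filenames, pos = read_uleb128(filenames)
uncompressed_len, pos = read_uleb128(filenames, pos)
compressed_len, pos = read_uleb128(filenames, pos)
payload = filenames[pos:]

# Mapping data of the first function record from the sample.
foo_mapping = read_all(b"\x01\x00\x00\x01\x01\x01\x0c\x02\x02")

print(num_filenames, uncompressed_len, compressed_len, len(payload))
print(foo_mapping)
```

Every value in this sample is below 128, so each one occupies a single byte;
larger values would simply span several bytes in the same stream.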

Encoding
========

The per-function coverage mapping data is encoded as a stream of bytes
with a simple structure. The structure is built from the encoding
`types <cvmtypes_>`_, such as variable-length unsigned integers, that
are used to encode the `File ID Mapping`_, the `Counter Expressions`_ and
the `Mapping Regions`_.

The format of the structure follows:

  ``[file id mapping, counter expressions, mapping regions]``

The translation unit filenames are encoded using the same encoding
`types <cvmtypes_>`_ as the per-function coverage mapping data, with the
following structure:

  ``[numFilenames : LEB128, filename0 : string, filename1 : string, ...]``

.. _cvmtypes:

Types
-----

This section describes the basic types that are used by the encoding format
and can appear after ``:`` in the ``[foo : type]`` description.

.. _LEB128:

LEB128
^^^^^^

LEB128 is an unsigned integer value that is encoded using DWARF's LEB128
encoding, optimizing for the case where values are small
(1 byte for values less than 128).

.. _CoverageStrings:

Strings
^^^^^^^

``[length : LEB128, characters...]``

String values are encoded with a `LEB value <LEB128_>`_ for the length
of the string and a sequence of bytes for its characters.

.. _file id mapping:

File ID Mapping
---------------

``[numIndices : LEB128, filenameIndex0 : LEB128, filenameIndex1 : LEB128, ...]``

The file id mapping in a function's coverage mapping stream
contains the indices into the translation unit's filenames array.

Counter
-------

``[value : LEB128]``

A `coverage mapping counter`_ is stored in a single `LEB value <LEB128_>`_.
It is composed of two things: the `tag <counter-tag_>`_,
which is stored in the lowest 2 bits, and the `counter data`_, which is stored
in the remaining bits.

.. _counter-tag:

Tag:
^^^^

The counter's tag encodes the counter's kind
and, if the counter is an expression, the expression's kind.
The possible tag values are:

* 0 - The counter is zero.

* 1 - The counter is a reference to the profile instrumentation counter.

* 2 - The counter is a subtraction expression.

* 3 - The counter is an addition expression.

.. _counter data:

Data:
^^^^^

The counter's data is interpreted in the following manner:

* When the counter is a reference to the profile instrumentation counter,
  then the counter's data is the id of the profile counter.
* When the counter is an expression, then the counter's data
  is the index into the array of counter expressions.

.. _Counter Expressions:

Counter Expressions
-------------------

``[numExpressions : LEB128, expr0LHS : LEB128, expr0RHS : LEB128, expr1LHS : LEB128, expr1RHS : LEB128, ...]``

Counter expressions consist of two counters as they
represent binary arithmetic operations.
The expression's kind is determined from the `tag <counter-tag_>`_ of the
counter that references this expression.

.. _Mapping Regions:

Mapping Regions
---------------

``[numRegionArrays : LEB128, regionsForFile0, regionsForFile1, ...]``

The mapping regions are stored in an array of sub-arrays where every
region in a particular sub-array has the same file id.

The file id for a sub-array of regions is the index of that
sub-array in the main array. For example, the first sub-array has the file id
of 0.

Sub-Array of Regions
^^^^^^^^^^^^^^^^^^^^

``[numRegions : LEB128, region0, region1, ...]``

The mapping regions for a specific file id are stored in an array that is
sorted in ascending order by the region's starting location.
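
To make the counter tag, the counter data and the counter expressions array
more concrete, here is a small Python sketch that splits an encoded counter
and evaluates it against profile counts. The names ``split_counter``,
``encode`` and ``evaluate`` and the profile counts are invented for this
illustration; real tools may organize this differently. The sketch treats a
tag of 0 as a plain zero counter and ignores the pseudo-counters that a
region header can carry.

```python
TAG_ZERO, TAG_REFERENCE, TAG_SUBTRACT, TAG_ADD = range(4)


def split_counter(value):
    """Split an encoded counter into (tag, data): tag is the lowest 2 bits."""
    return value & 0b11, value >> 2


def encode(tag, data):
    """Inverse of split_counter (hypothetical helper used to build the example)."""
    return (data << 2) | tag


def evaluate(value, profile_counts, expressions):
    """Compute the execution count that an encoded counter stands for."""
    tag, data = split_counter(value)
    if tag == TAG_ZERO:
        return 0
    if tag == TAG_REFERENCE:
        return profile_counts[data]  # data is the profile counter id
    lhs, rhs = expressions[data]     # data is the expression index
    left = evaluate(lhs, profile_counts, expressions)
    right = evaluate(rhs, profile_counts, expressions)
    return left - right if tag == TAG_SUBTRACT else left + right


# The "else" region from the Counter section: counter #0 minus counter #1.
expressions = [(encode(TAG_REFERENCE, 0), encode(TAG_REFERENCE, 1))]
else_counter = encode(TAG_SUBTRACT, 0)

# Assumed profile data: the function ran 10 times, the "if" body 3 times.
profile_counts = [10, 3]
print(evaluate(else_counter, profile_counts, expressions))  # 7
```

Because an expression operand is itself an encoded counter, the evaluation
recurses naturally when expressions refer to other expressions.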

Mapping Region
^^^^^^^^^^^^^^

``[header, source range]``

The mapping region record contains two sub-records:
the `header`_, which stores the counter and/or the region's kind,
and the `source range`_ that contains the starting and ending
location of this region.

.. _header:

Header
^^^^^^

``[counter]``

or

``[pseudo-counter]``

The header encodes the region's counter and the region's kind. A branch region
will encode two counters.

The value of the counter's tag distinguishes between counters and
pseudo-counters. If the tag is zero, then this header contains a
pseudo-counter, otherwise this header contains an ordinary counter.

Counter:
""""""""

A mapping region whose header has a counter with a non-zero tag is
a code region.

Pseudo-Counter:
"""""""""""""""

``[value : LEB128]``

A pseudo-counter is stored in a single `LEB value <LEB128_>`_, just like
the ordinary counter. It has the following interpretation:

* bits 0-1: tag, which is always 0.

* bit 2: expansionRegionTag. If this bit is set, then this mapping region
  is an expansion region.

* remaining bits: data. If this region is an expansion region, then the data
  contains the expanded file id of that region.

  Otherwise, the data contains the region's kind. The possible region
  kind values are:

  * 0 - This mapping region is a code region with a counter of zero.
  * 2 - This mapping region is a skipped region.
  * 4 - This mapping region is a branch region.

.. _source range:

Source Range
^^^^^^^^^^^^

``[deltaLineStart : LEB128, columnStart : LEB128, numLines : LEB128, columnEnd : LEB128]``

The source range record contains the following fields:

* *deltaLineStart*: The difference between the starting line of the
  current mapping region and the starting line of the previous mapping region.

  If the current mapping region is the first region in the current
  sub-array, then it stores the starting line of that region.

* *columnStart*: The starting column of the mapping region.

* *numLines*: The difference between the ending line and the starting line
  of the current mapping region.

* *columnEnd*: The ending column of the mapping region. If the high bit is set,
  the current mapping region is a gap area. A count for a gap area is only used
  as the line execution count if there are no other regions on a line.
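
The delta encoding of source ranges can be undone with a short loop. The
Python sketch below assumes the four fields of each record have already been
LEB128-decoded, and it assumes that the "high bit" of *columnEnd* is bit 31
of a 32-bit value; the name ``decode_source_ranges`` and the tuple layout are
illustrative only.

```python
GAP_BIT = 1 << 31  # assumption: columnEnd is treated as a 32-bit value


def decode_source_ranges(records):
    """Turn (deltaLineStart, columnStart, numLines, columnEnd) records of one
    sub-array into (lineStart, columnStart, lineEnd, columnEnd, isGap) tuples."""
    ranges = []
    previous_line_start = 0  # the first record stores its starting line directly
    for delta_line_start, column_start, num_lines, column_end in records:
        line_start = previous_line_start + delta_line_start
        is_gap = bool(column_end & GAP_BIT)
        ranges.append((line_start, column_start,
                       line_start + num_lines, column_end & ~GAP_BIT, is_gap))
        previous_line_start = line_start
    return ranges


# The three code regions of the first example in this document:
# 1:40 to 9:2, 3:17 to 5:4 and 5:10 to 7:4.
records = [(1, 40, 8, 2), (2, 17, 2, 4), (2, 10, 2, 4)]
for region in decode_source_ranges(records):
    print(region)
```

Since a sub-array is sorted by starting location, every *deltaLineStart* is
non-negative and fits the unsigned LEB128 encoding.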