.. role:: raw-html(raw)
   :format: html

=================================
LLVM Code Coverage Mapping Format
=================================

.. contents::
   :local:

Introduction
============

LLVM's code coverage mapping format is used to provide code coverage
analysis using LLVM's and Clang's instrumentation-based profiling
(Clang's ``-fprofile-instr-generate`` option).

This document is aimed at those who would like to know how LLVM's code coverage
mapping works under the hood. Prior knowledge of how Clang's profile-guided
optimization works is useful, but not required. For those interested in using
LLVM to provide code coverage analysis for their own programs, see the `Clang
documentation <https://clang.llvm.org/docs/SourceBasedCodeCoverage.html>`_.

We start by briefly describing LLVM's code coverage mapping format and the
way that Clang and LLVM's code coverage tool work with this format. After
the basics are down, more advanced features of the coverage mapping format
are discussed, such as the data structures, LLVM IR representation and
the binary encoding.

High Level Overview
===================

LLVM's code coverage mapping format is designed to be a self-contained
data format that can be embedded into the LLVM IR and into object files.
It's described in this document as a **mapping** format because its goal is
to store the data that is required for a code coverage tool to map between
the specific source ranges in a file and the execution counts obtained
after running the instrumented version of the program.

The mapping data is used in two places in the code coverage process:

1. When Clang compiles a source file with ``-fcoverage-mapping``, it
   generates the mapping information that describes the mapping between the
   source ranges and the profiling instrumentation counters.
   This information gets embedded into the LLVM IR and conveniently
   ends up in the final executable file when the program is linked.

2. It is also used by *llvm-cov*: the mapping information is extracted from an
   object file and is used to associate the execution counts (the values of the
   profile instrumentation counters) with the source ranges in a file.
   After that, the tool is able to generate various code coverage reports
   for the program.

The coverage mapping format aims to be a "universal format" that would be
suitable for usage by any frontend, and not just by Clang. It also aims to
give the frontend the possibility of generating minimal coverage mapping
data in order to reduce the size of the IR and object files. For example,
instead of emitting mapping information for each statement in a function, the
frontend is allowed to group the statements with the same execution count into
regions of code, and emit the mapping information only for those regions.

Advanced Concepts
=================

The remainder of this guide is meant to give you insight into the way the
coverage mapping format works.

The coverage mapping format operates on a per-function level as the
profile instrumentation counters are associated with a specific function.
For each function that requires code coverage, the frontend has to create
coverage mapping data that can map between the source code ranges and
the profile instrumentation counters for that function.

Mapping Region
--------------

The function's coverage mapping data contains an array of mapping regions.
A mapping region stores the `source code range`_ that is covered by this region,
the `file id <coverage file id_>`_, the `coverage mapping counter`_ and
the region's kind.
There are several kinds of mapping regions:

* Code regions associate portions of source code and `coverage mapping
  counters`_.
  They make up the majority of the mapping regions. They are used
  by the code coverage tool to compute the execution counts for lines,
  highlight the regions of code that were never executed, and to obtain
  the various code coverage statistics for a function.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:40 to 9:2</span>
  <span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  if (argc > 1) </span><span style='background-color:#85C1F5'>{ </span> <span class='c1'>// Code Region from 3:17 to 5:4</span>
  <span style='background-color:#85C1F5'>    printf("%s\n", argv[1]); </span>
  <span style='background-color:#85C1F5'>  }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{ </span> <span class='c1'>// Code Region from 5:10 to 7:4</span>
  <span style='background-color:#F6D55D'>    printf("\n"); </span>
  <span style='background-color:#F6D55D'>  }</span><span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  return 0; </span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`
* Skipped regions are used to represent source ranges that were skipped
  by Clang's preprocessor. They don't associate with
  `coverage mapping counters`_, as the frontend knows that they are never
  executed. They are used by the code coverage tool to mark the skipped lines
  inside a function as non-code lines that don't have execution counts.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:12 to 6:2</span>
  <span style='background-color:#85C1F5'>#ifdef DEBUG </span> <span class='c1'>// Skipped Region from 2:1 to 4:2</span>
  <span style='background-color:#85C1F5'>  printf("Hello world"); </span>
  <span style='background-color:#85C1F5'>#</span><span style='background-color:#4A789C'>endif </span>
  <span style='background-color:#4A789C'>  return 0; </span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`
* Expansion regions are used to represent Clang's macro expansions. They
  have an additional property: the *expanded file id*. This property can be
  used by the code coverage tool to find the mapping regions that are created
  as a result of this macro expansion, by checking if their file id matches the
  expanded file id. They don't associate with `coverage mapping counters`_,
  as the code coverage tool can determine the execution count for this region
  by looking up the execution count of the first region with a corresponding
  file id.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int func(int x) </span><span style='background-color:#4A789C'>{ </span>
  <span style='background-color:#4A789C'>  #define MAX(x,y) </span><span style='background-color:#85C1F5'>((x) > (y)? </span><span style='background-color:#F6D55D'>(x)</span><span style='background-color:#85C1F5'> : </span><span style='background-color:#F4BA70'>(y)</span><span style='background-color:#85C1F5'>)</span><span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  return </span><span style='background-color:#7FCA9F'>MAX</span><span style='background-color:#4A789C'>(x, 42); </span> <span class='c1'>// Expansion Region from 3:10 to 3:13</span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`

.. _source code range:

Source Range:
^^^^^^^^^^^^^

The source range record contains the starting and ending location of a certain
mapping region. Both locations include the line and the column numbers.

.. _coverage file id:

File ID:
^^^^^^^^

The file id is an integer value that tells us
in which source file or macro expansion this region is located.
It enables Clang to produce mapping information for the code
defined inside macros, as this example demonstrates:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>void func(const char *str) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:28 to 6:2 with file id 0</span>
<span style='background-color:#4A789C'>  #define PUT </span><span style='background-color:#85C1F5'>printf("%s\n", str)</span><span style='background-color:#4A789C'> </span> <span class='c1'>// 2 Code Regions from 2:15 to 2:34 with file ids 1 and 2</span>
<span style='background-color:#4A789C'>  if(*str) </span>
<span style='background-color:#4A789C'>    </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Expansion Region from 4:5 to 4:8 with file id 0 that expands a macro with file id 1</span>
<span style='background-color:#4A789C'>  </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Expansion Region from 5:3 to 5:6 with file id 0 that expands a macro with file id 2</span>
<span style='background-color:#4A789C'>}</span>
</pre>`

.. _coverage mapping counter:
.. _coverage mapping counters:

Counter:
^^^^^^^^

A coverage mapping counter can represent a reference to the profile
instrumentation counter. The execution count for a region with such a counter
is determined by looking up the value of the corresponding profile
instrumentation counter.

It can also represent a binary arithmetical expression that operates on
coverage mapping counters or other expressions.
The execution count for a region with an expression counter is determined by
evaluating the expression's arguments and then adding them together or
subtracting them from one another.
In the example below, a subtraction expression is used to compute the execution
count for the compound statement that follows the *else* keyword:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Region's counter is a reference to the profile counter #0</span>
<span style='background-color:#4A789C'> </span>
<span style='background-color:#4A789C'>  if (argc > 1) </span><span style='background-color:#85C1F5'>{ </span> <span class='c1'>// Region's counter is a reference to the profile counter #1</span>
<span style='background-color:#85C1F5'>    printf("%s\n", argv[1]); </span><span> </span>
<span style='background-color:#85C1F5'>  }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{ </span> <span class='c1'>// Region's counter is an expression (reference to the profile counter #0 - reference to the profile counter #1)</span>
<span style='background-color:#F6D55D'>    printf("\n"); </span>
<span style='background-color:#F6D55D'>  }</span><span style='background-color:#4A789C'> </span>
<span style='background-color:#4A789C'>  return 0; </span>
<span style='background-color:#4A789C'>}</span>
</pre>`

Finally, a coverage mapping counter can also represent an execution count
of zero. The zero counter is used to provide coverage mapping for
unreachable statements and expressions, as in the example below:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{ </span>
<span style='background-color:#4A789C'>  return 0; </span>
<span style='background-color:#4A789C'>  </span><span style='background-color:#85C1F5'>printf("Hello world!\n")</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Unreachable region's counter is zero</span>
<span style='background-color:#4A789C'>}</span>
</pre>`

The zero counters allow the code coverage tool to display proper line execution
counts for the unreachable lines and highlight the unreachable code.
Without them, the tool would think that those lines and regions were still
executed, as it doesn't possess the frontend's knowledge.

LLVM IR Representation
======================

The coverage mapping data is stored in the LLVM IR using a global constant
structure variable called *__llvm_coverage_mapping* with the *IPSK_covmap*
section specifier (i.e. ".lcovmap$M" on Windows and "__llvm_covmap" elsewhere).

For example, let’s consider a C file and how it gets compiled to LLVM:

.. _coverage mapping sample:

.. code-block:: c

  int foo() {
    return 42;
  }
  int bar() {
    return 13;
  }

The coverage mapping variable generated by Clang has 2 fields:

* Coverage mapping header.

* An optionally compressed list of filenames present in the translation unit.


The variable has 8-byte alignment because ld64 cannot always pack symbols from
different object files tightly (the word-level alignment assumption is baked in
too deeply).

.. code-block:: llvm

  @__llvm_coverage_mapping = internal constant { { i32, i32, i32, i32 }, [32 x i8] }
  {
    { i32, i32, i32, i32 } ; Coverage map header
    {
      i32 0,  ; Always 0. In prior versions, the number of affixed function records
      i32 32, ; The length of the string that contains the encoded translation unit filenames
      i32 0,  ; Always 0. In prior versions, the length of the affixed string that contains the encoded coverage mapping data
      i32 3   ; Coverage mapping format version
    },
    [32 x i8] c"..." ; Encoded data (dissected later)
  }, section "__llvm_covmap", align 8

The current version of the format is version 4. There are two differences from version 3:

* Function records are now named symbols, and are marked *linkonce_odr*. This
  allows linkers to merge duplicate function records. Merging of duplicate
  *dummy* records (emitted for functions included-but-not-used in a translation
  unit) reduces size bloat in the coverage mapping data. As part of this
  change, region mapping information for a function is now included within the
  function record, instead of being affixed to the coverage header.

* The filename list for a translation unit may optionally be zlib-compressed.

The only difference between versions 3 and 2 is that a special encoding for
column end locations was introduced to indicate gap regions.

In version 1, the function record for *foo* was defined as follows:

.. code-block:: llvm

  { i8*, i32, i32, i64 } { i8* getelementptr inbounds ([3 x i8]* @__profn_foo, i32 0, i32 0), ; Function's name
    i32 3, ; Function's name length
    i32 9, ; Function's encoded coverage mapping data string length
    i64 0  ; Function's structural hash
  }

In version 2, the function record for *foo* was defined as follows:

.. code-block:: llvm

  { i64, i32, i64 } {
    i64 0x5cf8c24cdb18bdac, ; Function's name MD5
    i32 9, ; Function's encoded coverage mapping data string length
    i64 0  ; Function's structural hash
  }

Coverage Mapping Header:
------------------------

The coverage mapping header has the following fields:

* The number of function records affixed to the coverage header. Always 0, but present for backwards compatibility.

* The length of the string in the second field of *__llvm_coverage_mapping* that contains the encoded translation unit filenames.

* The length of the encoded coverage mapping data that earlier format versions affixed to the coverage header. Always 0, but present for backwards compatibility.

* The format version. The current version is 4 (encoded as a 3).

.. _function records:

Function record:
----------------

A function record is a structure of the following type:

.. code-block:: llvm

  { i64, i32, i64, i64, [? x i8] }

It contains the function name's MD5, the length of the encoded mapping data for
that function, the function's structural hash value, the hash of the filenames
in the function's translation unit, and the encoded mapping data.

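
As an illustration of the record fields listed above, the following Python
sketch computes a 64-bit name hash and splits a hand-built record into its
fields. The helper names (``name_hash``, ``unpack_function_record``), the
packed little-endian layout and the zero placeholder hashes are assumptions
made for this example; they are not part of any LLVM API. The name hash is
taken here as the first 8 bytes of the MD5 digest read as a little-endian
integer, which reproduces the ``0x5cf8c24cdb18bdac`` value shown for *foo*.

.. code-block:: python

  import hashlib
  import struct

  HEADER = "<QIQQ"  # assumption: packed, little-endian { i64, i32, i64, i64 }


  def name_hash(name: str) -> int:
      """First 8 bytes of the MD5 digest, read as a little-endian integer."""
      return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "little")


  def unpack_function_record(blob: bytes) -> dict:
      """Split a record laid out as { i64, i32, i64, i64, [? x i8] }."""
      name_md5, data_len, structural_hash, filenames_hash = struct.unpack_from(
          HEADER, blob)
      start = struct.calcsize(HEADER)  # 28 bytes of fixed-size fields
      return {
          "name_md5": name_md5,
          "structural_hash": structural_hash,
          "filenames_hash": filenames_hash,
          "mapping_data": blob[start:start + data_len],
      }


  # Build a record for "foo" by hand (placeholder hashes of 0) and read it back.
  mapping = b"\x01\x00\x00\x01\x01\x01\x0c\x02\x02"  # placeholder 9-byte payload
  record = struct.pack(HEADER, name_hash("foo"), len(mapping), 0, 0) + mapping
  fields = unpack_function_record(record)

Storing the length next to the payload lets a reader skip over a record
without decoding its mapping data.
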

Dissecting the sample:
^^^^^^^^^^^^^^^^^^^^^^

Here's an overview of the encoded data that was stored in the
IR for the `coverage mapping sample`_ that was shown earlier:

* The IR contains the following string constant that represents the encoded
  coverage mapping data for the sample translation unit:

  .. code-block:: llvm

    c"\01\15\1Dx\DA\13\D1\0F-N-*\D6/+\CE\D6/\C9-\D0O\CB\CF\D7K\06\00N+\07]"

* The string contains values that are encoded in the LEB128 format, which is
  used throughout for storing integers. It also contains a compressed payload.

* The first three LEB128-encoded numbers in the sample specify the number of
  filenames, the length of the uncompressed filenames, and the length of the
  compressed payload (or 0 if compression is disabled). In this sample, there
  is 1 filename that is 21 bytes in length (uncompressed), and stored in 29
  bytes (compressed).

* The coverage mapping from the first function record is encoded in this string:

  .. code-block:: llvm

    c"\01\00\00\01\01\01\0C\02\02"

  This string consists of the following bytes:

  ========  ========================================
  ``0x01``  The number of file ids used by this function. There is only one file id used by the mapping data in this function.
  ``0x00``  An index into the filenames array which corresponds to the file "/Users/alex/test.c".
  ``0x00``  The number of counter expressions used by this function. This function doesn't use any expressions.
  ``0x01``  The number of mapping regions that are stored in an array for the function's file id #0.
  ``0x01``  The coverage mapping counter for the first region in this function. The value of 1 tells us that it's a coverage mapping counter that is a reference to the profile instrumentation counter with an index of 0.
  ``0x01``  The starting line of the first mapping region in this function.
  ``0x0C``  The starting column of the first mapping region in this function.
  ``0x02``  The difference between the ending line and the starting line of the first mapping region in this function.
  ``0x02``  The ending column of the first mapping region in this function.
  ========  ========================================

* The length of the substring that contains the encoded coverage mapping data
  for the second function record is also 9. It's structured like the mapping data
  for the first function record.

* The two trailing bytes are zeroes and are used to pad the coverage mapping
  data to give it the 8 byte alignment.

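
The three leading numbers of the encoded filenames string can be checked with
a few lines of Python. This is a minimal sketch: ``read_uleb128`` is a
hypothetical helper written for this example, not an LLVM function, and the
byte string is transcribed from the sample above.

.. code-block:: python

  def read_uleb128(data: bytes, pos: int = 0) -> tuple[int, int]:
      """Decode one unsigned LEB128 value; return (value, next position)."""
      value = 0
      shift = 0
      while True:
          byte = data[pos]
          pos += 1
          value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
          if byte & 0x80 == 0:             # a clear high bit ends the value
              return value, pos
          shift += 7


  # The encoded filenames string from the sample translation unit.
  encoded = (b"\x01\x15\x1dx\xda\x13\xd1\x0f-N-*\xd6/+\xce\xd6/\xc9-\xd0O"
             b"\xcb\xcf\xd7K\x06\x00N+\x07]")

  num_filenames, pos = read_uleb128(encoded)
  uncompressed_len, pos = read_uleb128(encoded, pos)
  compressed_len, pos = read_uleb128(encoded, pos)
  payload = encoded[pos:]  # what remains is the compressed filename data

The three values come out as 1, 21 and 29, and the remaining payload is 29
bytes long, which accounts for all 32 bytes of the ``[32 x i8]`` array.
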

Encoding
========

The per-function coverage mapping data is encoded as a stream of bytes,
with a simple structure. The structure consists of the encoding
`types <cvmtypes_>`_, like variable-length unsigned integers, that
are used to encode `File ID Mapping`_, `Counter Expressions`_ and
the `Mapping Regions`_.

The format of the structure follows:

  ``[file id mapping, counter expressions, mapping regions]``

The translation unit filenames are encoded using the same encoding
`types <cvmtypes_>`_ as the per-function coverage mapping data, with the
following structure:

  ``[numFilenames : LEB128, filename0 : string, filename1 : string, ...]``

.. _cvmtypes:

Types
-----

This section describes the basic types that are used by the encoding format
and can appear after ``:`` in the ``[foo : type]`` description.

.. _LEB128:

LEB128
^^^^^^

LEB128 is an unsigned integer value that is encoded using DWARF's LEB128
encoding, optimizing for the case where values are small
(1 byte for values less than 128).

.. _CoverageStrings:

Strings
^^^^^^^

``[length : LEB128, characters...]``

String values are encoded with a `LEB value <LEB128_>`_ for the length
of the string and a sequence of bytes for its characters.

.. _file id mapping:

File ID Mapping
---------------

``[numIndices : LEB128, filenameIndex0 : LEB128, filenameIndex1 : LEB128, ...]``

The file id mapping in a function's coverage mapping stream
contains the indices into the translation unit's filenames array.

Counter
-------

``[value : LEB128]``

A `coverage mapping counter`_ is stored in a single `LEB value <LEB128_>`_.
It is composed of two things --- the `tag <counter-tag_>`_
which is stored in the lowest 2 bits, and the `counter data`_ which is stored
in the remaining bits.

.. _counter-tag:

Tag:
^^^^

The counter's tag encodes the counter's kind
and, if the counter is an expression, the expression's kind.
The possible tag values are:

* 0 - The counter is zero.

* 1 - The counter is a reference to the profile instrumentation counter.

* 2 - The counter is a subtraction expression.

* 3 - The counter is an addition expression.

.. _counter data:

Data:
^^^^^

The counter's data is interpreted in the following manner:

* When the counter is a reference to the profile instrumentation counter,
  then the counter's data is the id of the profile counter.
* When the counter is an expression, then the counter's data
  is the index into the array of counter expressions.

.. _Counter Expressions:

Counter Expressions
-------------------

``[numExpressions : LEB128, expr0LHS : LEB128, expr0RHS : LEB128, expr1LHS : LEB128, expr1RHS : LEB128, ...]``

Each counter expression consists of two counters, as it
represents a binary arithmetic operation.
The expression's kind is determined from the `tag <counter-tag_>`_ of the
counter that references this expression.

.. _Mapping Regions:

Mapping Regions
---------------

``[numRegionArrays : LEB128, regionsForFile0, regionsForFile1, ...]``

The mapping regions are stored in an array of sub-arrays where every
region in a particular sub-array has the same file id.

The file id for a sub-array of regions is the index of that
sub-array in the main array; e.g., the first sub-array has the file id 0.

Sub-Array of Regions
^^^^^^^^^^^^^^^^^^^^

``[numRegions : LEB128, region0, region1, ...]``

The mapping regions for a specific file id are stored in an array that is
sorted in ascending order by the region's starting location.

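
To make the counter encoding and the counter expressions described in the
sections above more concrete, here is a small Python sketch that splits an
encoded counter into its tag and data and evaluates it against a list of
profile counter values. The function names and the sample counts are
hypothetical and chosen only for illustration.

.. code-block:: python

  ZERO, REFERENCE, SUBTRACT, ADD = range(4)  # the four tag values


  def split_counter(value: int) -> tuple[int, int]:
      """Return (tag, data): the lowest 2 bits and the remaining bits."""
      return value & 0b11, value >> 2


  def evaluate(value: int, expressions: list, profile_counts: list) -> int:
      """Compute the execution count for one encoded counter.

      ``expressions`` holds (lhs, rhs) pairs of encoded counters and
      ``profile_counts`` the profile instrumentation counter values.
      """
      tag, data = split_counter(value)
      if tag == ZERO:
          return 0
      if tag == REFERENCE:
          return profile_counts[data]
      lhs, rhs = expressions[data]  # data indexes the expression array
      left = evaluate(lhs, expressions, profile_counts)
      right = evaluate(rhs, expressions, profile_counts)
      return left - right if tag == SUBTRACT else left + right


  # The if/else example: the else branch is (counter #0 - counter #1).
  counter0 = (0 << 2) | REFERENCE      # encoded value 1
  counter1 = (1 << 2) | REFERENCE      # encoded value 5
  expressions = [(counter0, counter1)]
  else_counter = (0 << 2) | SUBTRACT   # expression #0, encoded value 2

With made-up profile counts of 10 for counter #0 and 3 for counter #1,
``evaluate(else_counter, expressions, [10, 3])`` yields 7 executions of the
*else* branch.
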

Mapping Region
^^^^^^^^^^^^^^

``[header, source range]``

The mapping region record contains two sub-records ---
the `header`_, which stores the counter and/or the region's kind,
and the `source range`_ that contains the starting and ending
location of this region.

.. _header:

Header
^^^^^^

``[counter]``

or

``[pseudo-counter]``

The header encodes the region's counter and the region's kind.

The value of the counter's tag distinguishes between the counters and
pseudo-counters --- if the tag is zero, then this header contains a
pseudo-counter, otherwise this header contains an ordinary counter.

Counter:
""""""""

A mapping region whose header has a counter with a non-zero tag is
a code region.

Pseudo-Counter:
"""""""""""""""

``[value : LEB128]``

A pseudo-counter is stored in a single `LEB value <LEB128_>`_, just like
the ordinary counter. It has the following interpretation:

* bits 0-1: tag, which is always 0.

* bit 2: expansionRegionTag. If this bit is set, then this mapping region
  is an expansion region.

* remaining bits: data. If this region is an expansion region, then the data
  contains the expanded file id of that region.

  Otherwise, the data contains the region's kind. The possible region
  kind values are:

  * 0 - This mapping region is a code region with a counter of zero.
  * 2 - This mapping region is a skipped region.

.. _source range:

Source Range
^^^^^^^^^^^^

``[deltaLineStart : LEB128, columnStart : LEB128, numLines : LEB128, columnEnd : LEB128]``

The source range record contains the following fields:

* *deltaLineStart*: The difference between the starting line of the
  current mapping region and the starting line of the previous mapping region.


  If the current mapping region is the first region in the current
  sub-array, then it stores the starting line of that region.

* *columnStart*: The starting column of the mapping region.

* *numLines*: The difference between the ending line and the starting line
  of the current mapping region.

* *columnEnd*: The ending column of the mapping region. If the high bit is set,
  the current mapping region is a gap area. A count for a gap area is only used
  as the line execution count if there are no other regions on a line.
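
Putting the pieces of the encoding together, the Python sketch below decodes
a per-function mapping stream into file indices, counter expressions and
regions. It rests on two labeled assumptions: it reads one sub-array of
regions per file id without a separate array count, which is the layout of
the 9-byte sample string dissected earlier, and it treats the "high bit" of
*columnEnd* as bit 31. All names are hypothetical; this is not LLVM's reader.

.. code-block:: python

  GAP_BIT = 1 << 31  # assumption: the "high bit" of a 32-bit columnEnd value


  def read_uleb128(data, pos):
      """Decode one unsigned LEB128 value; return (value, next position)."""
      value = shift = 0
      while True:
          byte = data[pos]
          pos += 1
          value |= (byte & 0x7F) << shift
          if not byte & 0x80:
              return value, pos
          shift += 7


  def decode_function_mapping(data):
      """Decode [file id mapping, counter expressions, mapping regions]."""
      pos = 0
      count, pos = read_uleb128(data, pos)
      filename_indices = []
      for _ in range(count):
          index, pos = read_uleb128(data, pos)
          filename_indices.append(index)

      count, pos = read_uleb128(data, pos)
      expressions = []
      for _ in range(count):
          lhs, pos = read_uleb128(data, pos)
          rhs, pos = read_uleb128(data, pos)
          expressions.append((lhs, rhs))

      regions = []
      # Assumption: one sub-array per file id, with no separate array count.
      for file_id in range(len(filename_indices)):
          num_regions, pos = read_uleb128(data, pos)
          line = 0
          for _ in range(num_regions):
              header, pos = read_uleb128(data, pos)
              region = {"file_id": file_id}
              if header & 0b11:      # non-zero tag: ordinary counter
                  region.update(kind="code", counter=header)
              elif header & 0b100:   # pseudo-counter with the expansion bit
                  region.update(kind="expansion", expanded_file_id=header >> 3)
              else:                  # pseudo-counter carrying a region kind
                  kinds = {0: "code", 2: "skipped"}
                  region.update(kind=kinds.get(header >> 3, "unknown"))
              delta, pos = read_uleb128(data, pos)
              col_start, pos = read_uleb128(data, pos)
              num_lines, pos = read_uleb128(data, pos)
              col_end, pos = read_uleb128(data, pos)
              line += delta          # relative to the previous region's start
              region["start"] = (line, col_start)
              region["end"] = (line + num_lines, col_end & ~GAP_BIT)
              region["gap"] = bool(col_end & GAP_BIT)
              regions.append(region)
      return filename_indices, expressions, regions


  files, exprs, regions = decode_function_mapping(
      b"\x01\x00\x00\x01\x01\x01\x0c\x02\x02")

For the sample bytes this produces a single code region for file id 0 whose
counter value is 1, starting at line 1, column 12 and ending at line 3,
column 2.
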