# LLVM IR Target

This document describes the mechanisms of producing LLVM IR from MLIR. The
overall flow is two-stage:

1.  **conversion** of the IR to a set of dialects translatable to LLVM IR, for
    example [LLVM Dialect](Dialects/LLVM.md) or one of the hardware-specific
    dialects derived from LLVM IR intrinsics such as [AMX](Dialects/AMX.md),
    [X86Vector](Dialects/X86Vector.md) or [ArmNeon](Dialects/ArmNeon.md);
2.  **translation** of MLIR dialects to LLVM IR.

This flow allows the non-trivial transformation to be performed within MLIR
using MLIR APIs and makes the translation between MLIR and LLVM IR *simple* and
potentially bidirectional. As a corollary, dialect ops translatable to LLVM IR
are expected to closely match the corresponding LLVM IR instructions and
intrinsics. This minimizes the dependency on LLVM IR libraries in MLIR as well
as reduces the churn in case of changes.

SPIR-V to LLVM dialect conversion has a
[dedicated document](SPIRVToLLVMDialectConversion.md).

[TOC]

## Conversion to the LLVM Dialect

Conversion to the LLVM dialect from other dialects is the first step to produce
LLVM IR. All non-trivial IR modifications are expected to happen at this stage
or before. The conversion is *progressive*: most passes convert one dialect to
the LLVM dialect and keep operations from other dialects intact. For example,
the `-convert-memref-to-llvm` pass will only convert operations from the
`memref` dialect but will not convert operations from other dialects even if
they use or produce `memref`-typed values.

The process relies on the [Dialect Conversion](DialectConversion.md)
infrastructure and, in particular, on the
[materialization](DialectConversion.md#type-conversion) hooks of `TypeConverter`
to support progressive lowering by injecting `unrealized_conversion_cast`
operations between converted and unconverted operations. After multiple partial
conversions to the LLVM dialect are performed, the cast operations that have
become no-ops can be removed by the `-reconcile-unrealized-casts` pass. The
latter pass is not specific to the LLVM dialect and can remove any no-op casts.

### Conversion of Built-in Types

Built-in types have a default conversion to LLVM dialect types provided by the
`LLVMTypeConverter` class. Users targeting the LLVM dialect can reuse and extend
this type converter to support other types. Extra care must be taken if the
conversion rules for built-in types are overridden: all conversions must use the
same type converter.

#### LLVM Dialect-compatible Types

The types [compatible](Dialects/LLVM.md#built-in-type-compatibility) with the
LLVM dialect are kept as is.

#### Complex Type

Complex type is converted into an LLVM dialect literal structure type with two
elements:

-   real part;
-   imaginary part.

The elemental type is converted recursively using these rules.

Example:

```mlir
  complex<f32>
  // ->
  !llvm.struct<(f32, f32)>
```

#### Index Type

Index type is converted into an LLVM dialect integer type with the bitwidth
specified by the [data layout](DataLayout.md) of the closest module. For
example, on x86-64 CPUs it converts to i64. This behavior can be overridden by
the type converter configuration, which is often exposed as a pass option by
conversion passes.

Example:

```mlir
  index
  // -> on x86_64
  i64
```

#### Ranked MemRef Types

Ranked memref types are converted into an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *descriptor*. Only memrefs in the
**[strided form](Dialects/Builtin.md/#strided-memref)** can be converted to the
LLVM dialect with the default descriptor format.
Memrefs with other, less trivial layouts should be converted into the strided
form first, e.g., by materializing the non-trivial address remapping due to
layout as `affine.apply` operations.

The default memref descriptor is a struct with the following fields:

1.  The pointer to the data buffer as allocated, referred to as "allocated
    pointer". This is only useful for deallocating the memref.
2.  The pointer to the properly aligned data pointer that the memref indexes,
    referred to as "aligned pointer".
3.  A converted `index`-type integer containing the distance in number of
    elements between the beginning of the (aligned) buffer and the first
    element to be accessed through the memref, referred to as "offset".
4.  An array containing as many converted `index`-type integers as the rank of
    the memref: the array represents the size, in number of elements, of the
    memref along the given dimension.
5.  A second array containing as many converted `index`-type integers as the
    rank of the memref: the second array represents the "stride" (in tensor
    abstraction sense), i.e. the number of consecutive elements of the
    underlying buffer one needs to jump over to get to the next logically
    indexed element.

For constant memref dimensions, the corresponding size entry is a constant whose
runtime value matches the static value. This normalization serves as an ABI for
the memref type to interoperate with externally linked functions. In the
particular case of rank `0` memrefs, the size and stride arrays are omitted,
resulting in a struct containing two pointers + offset.

Examples:

```mlir
// Assuming index is converted to i64.

memref<f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
memref<1 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<? x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                               array<5 x i64>, array<5 x i64>)>
memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                             array<5 x i64>, array<5 x i64>)>

// Memref types can have vectors as element types.
memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr<vector<4 x f32>>,
                                             ptr<vector<4 x f32>>, i64,
                                             array<2 x i64>, array<2 x i64>)>
```

#### Unranked MemRef Types

Unranked memref types are converted to an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *unranked descriptor*. It contains:

1.  a converted `index`-typed integer representing the dynamic rank of the
    memref;
2.  a type-erased pointer (`!llvm.ptr<i8>`) to a ranked memref descriptor with
    the contents listed above.

This descriptor is primarily intended for interfacing with rank-polymorphic
library functions. The pointer to the ranked memref descriptor points to some
*allocated* memory, which may reside on the stack of the current function or in
the heap. Conversion patterns for operations producing unranked memrefs are
expected to manage the allocation. Note that this may lead to stack allocations
(`llvm.alloca`) being performed in a loop and not reclaimed until the end of the
current function.
#### Function Types

Function types are converted to LLVM dialect function types as follows:

-   function argument and result types are converted recursively using these
    rules;
-   if a function type has multiple results, they are wrapped into an LLVM
    dialect literal structure type since LLVM function types must have exactly
    one result;
-   if a function type has no results, the corresponding LLVM dialect function
    type will have one `!llvm.void` result since LLVM function types must have
    a result;
-   function types used in arguments of another function type are wrapped in an
    LLVM dialect pointer type to comply with LLVM IR expectations;
-   the structs corresponding to `memref` types, both ranked and unranked,
    appearing as function arguments are unbundled into individual function
    arguments to allow for specifying metadata such as aliasing information on
    individual pointers;
-   the conversion of `memref`-typed arguments is subject to
    [calling conventions](TargetLLVMIR.md#calling-conventions);
-   if a function type has the boolean attribute `func.varargs` set, the
    converted LLVM function will be variadic.

Examples:

```mlir
// Zero-ary function type with no results:
() -> ()
// is converted to a zero-ary function with `void` result.
!llvm.func<void ()>

// Unary function with one result:
(i32) -> (i64)
// has its argument and result type converted, before creating the LLVM dialect
// function type.
!llvm.func<i64 (i32)>

// Binary function with one result:
(i32, f32) -> (i64)
// has its arguments handled separately.
!llvm.func<i64 (i32, f32)>

// Binary function with two results:
(i32, f32) -> (i64, f64)
// has its results aggregated into a structure type.
!llvm.func<struct<(i64, f64)> (i32, f32)>

// Function-typed arguments or results in higher-order functions:
(() -> ()) -> (() -> ())
// are converted into pointers to functions.
!llvm.func<ptr<func<void ()>> (ptr<func<void ()>>)>

// These rules apply recursively: a function type taking a function that takes
// another function
( ( (i32) -> (i64) ) -> () ) -> ()
// is converted into a function type taking a pointer-to-function that takes
// another pointer-to-function.
!llvm.func<void (ptr<func<void (ptr<func<i64 (i32)>>)>>)>

// A memref descriptor appearing as function argument:
(memref<f32>) -> ()
// gets converted into a list of individual scalar components of a descriptor.
!llvm.func<void (ptr<f32>, ptr<f32>, i64)>

// The list of arguments is linearized and one can freely mix memref and other
// types in this list:
(memref<f32>, f32) -> ()
// which gets converted into a flat list.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, f32)>

// For nD ranked memref descriptors:
(memref<?x?xf32>) -> ()
// the converted signature will contain 2n+1 `index`-typed integer arguments,
// offset, n sizes and n strides, per memref argument type.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, i64, i64, i64, i64)>

// Same rules apply to unranked descriptors:
(memref<*xf32>) -> ()
// which get converted into their components.
!llvm.func<void (i64, ptr<i8>)>

// However, returning a memref from a function is not affected:
() -> (memref<?xf32>)
// gets converted to a function returning a descriptor structure.
!llvm.func<struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)> ()>

// If multiple memref-typed results are returned:
() -> (memref<f32>, memref<f64>)
// their descriptor structures are additionally packed into another structure,
// potentially with other non-memref typed results.
!llvm.func<struct<(struct<(ptr<f32>, ptr<f32>, i64)>,
                   struct<(ptr<f64>, ptr<f64>, i64)>)> ()>

// If the "func.varargs" attribute is set:
(i32) -> () attributes { "func.varargs" = true }
// the corresponding LLVM function will be variadic:
!llvm.func<void (i32, ...)>
```

Conversion patterns are available to convert built-in function operations and
standard call operations targeting those functions using these conversion
rules.

#### Multi-dimensional Vector Types

LLVM IR only supports *one-dimensional* vectors, unlike MLIR where vectors can
be multi-dimensional. Vector types cannot be nested in either IR. In the
one-dimensional case, MLIR vectors are converted to LLVM IR vectors of the same
size with element type converted using these conversion rules. In the
n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types
of one-dimensional vectors.

Examples:

```mlir
vector<4x8 x f32>
// ->
!llvm.array<4 x vector<8 x f32>>

memref<2 x vector<4x8 x f32>>
// ->
!llvm.struct<(ptr<array<4 x vector<8xf32>>>, ptr<array<4 x vector<8xf32>>>,
              i64, array<1 x i64>, array<1 x i64>)>
```

#### Tensor Types

Tensor types cannot be converted to the LLVM dialect. Operations on tensors must
be [bufferized](Bufferization.md) before being converted.

### Calling Conventions

Calling conventions provide a mechanism to customize the conversion of function
and function call operations without changing how individual types are handled
elsewhere. They are implemented simultaneously by the default type converter and
by the conversion patterns for the relevant operations.

#### Function Result Packing

In case of multi-result functions, the returned values are inserted into a
structure-typed value before being returned and extracted from it at the call
site.
This transformation is a part of the conversion and is transparent to the
definitions and uses of the values being returned.

Example:

```mlir
func.func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) {
  return %arg0, %arg1 : i32, i64
}
func.func @bar() {
  %0 = arith.constant 42 : i32
  %1 = arith.constant 17 : i64
  %2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64)
  "use_i32"(%2#0) : (i32) -> ()
  "use_i64"(%2#1) : (i64) -> ()
}

// is transformed into

llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> {
  // insert the values into a structure
  %0 = llvm.mlir.undef : !llvm.struct<(i32, i64)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)>

  // return the structure value
  llvm.return %2 : !llvm.struct<(i32, i64)>
}
llvm.func @bar() {
  %0 = llvm.mlir.constant(42 : i32) : i32
  %1 = llvm.mlir.constant(17 : i64) : i64

  // call and extract the values from the structure
  %2 = llvm.call @foo(%0, %1)
     : (i32, i64) -> !llvm.struct<(i32, i64)>
  %3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)>
  %4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)>

  // use as before
  "use_i32"(%3) : (i32) -> ()
  "use_i64"(%4) : (i64) -> ()
}
```

#### Default Calling Convention for Ranked MemRef

The default calling convention converts `memref`-typed function arguments to
LLVM dialect literal structs
[defined above](TargetLLVMIR.md#ranked-memref-types) before unbundling them into
individual scalar arguments.

This convention is implemented in the conversion of `func.func` and `func.call`
to the LLVM dialect, with the former unpacking the descriptor into a set of
individual values and the latter packing those values back into a descriptor so
as to make it transparently usable by other operations.
Conversions from other dialects should take this convention into account.

This specific convention is motivated by the necessity to specify alignment and
aliasing attributes on the raw pointers underpinning the memref.

Examples:

```mlir
func.func @foo(%arg0: memref<?xf32>) -> () {
  "use"(%arg0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)>

llvm.func @foo(%arg0: !llvm.ptr<f32>,  // Allocated pointer.
               %arg1: !llvm.ptr<f32>,  // Aligned pointer.
               %arg2: i64,             // Offset.
               %arg3: i64,             // Size in dim 0.
               %arg4: i64) {           // Stride in dim 0.
  // Populate memref descriptor structure.
  %0 = llvm.mlir.undef : !llvm.memref_1d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d
  %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d

  // Descriptor is now usable as a single value.
  "use"(%5) : (!llvm.memref_1d) -> ()
  llvm.return
}
```

```mlir
func.func @bar() {
  %0 = "get"() : () -> (memref<?xf32>)
  call @foo(%0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)>

llvm.func @bar() {
  %0 = "get"() : () -> !llvm.memref_1d

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_1d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_1d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_1d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d
  %5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2, %3, %4, %5)
    : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64) -> ()
  llvm.return
}
```

#### Default Calling Convention for Unranked MemRef

For unranked memrefs, the list of function arguments always contains two
elements, same as the unranked memref descriptor: an integer rank, and a
type-erased (`!llvm.ptr<i8>`) pointer to the ranked memref descriptor. Note that
while the *calling convention* does not require allocation, *casting* to
unranked memref does since one cannot take an address of an SSA value containing
the ranked memref, which must be stored in some memory instead. The caller is in
charge of ensuring the thread safety and management of the allocated memory, in
particular the deallocation.

Example:

```mlir
func.func @foo(%arg0: memref<*xf32>) -> () {
  "use"(%arg0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @foo(%arg0: i64,              // Rank.
               %arg1: !llvm.ptr<i8>) {  // Type-erased pointer to descriptor.
  // Pack the unranked memref descriptor.
  %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr<i8>)>

  "use"(%2) : (!llvm.struct<(i64, ptr<i8>)>) -> ()
  llvm.return
}
```

```mlir
func.func @bar() {
  %0 = "get"() : () -> (memref<*xf32>)
  call @foo(%0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @bar() {
  %0 = "get"() : () -> (!llvm.struct<(i64, ptr<i8>)>)

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)>

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2) : (i64, !llvm.ptr<i8>) -> ()
  llvm.return
}
```

**Lifetime.** The second element of the unranked memref descriptor points to
some memory in which the ranked memref descriptor is stored. By convention, this
memory is allocated on stack and has the lifetime of the function. (*Note:* due
to function-length lifetime, creation of multiple unranked memref descriptors,
e.g., in a loop, may lead to stack overflows.) If an unranked descriptor has to
be returned from a function, the ranked descriptor it points to is copied into
dynamically allocated memory, and the pointer in the unranked descriptor is
updated accordingly. The allocation happens immediately before returning. It is
the responsibility of the caller to free the dynamically allocated memory. The
default conversion of `func.call` and `func.call_indirect` copies the ranked
descriptor to newly allocated memory on the caller's stack. Thus, the convention
of the ranked memref descriptor pointed to by an unranked memref descriptor
being stored on stack is respected.

#### Bare Pointer Calling Convention for Ranked MemRef

The "bare pointer" calling convention converts `memref`-typed function arguments
to a *single* pointer to the aligned data. Note that this does *not* apply to
uses of `memref` outside of function signatures: the default descriptor
structures are still used there. This convention further restricts the supported
cases to the following.

-   `memref` types with default layout.
-   `memref` types with all dimensions statically known.
-   `memref` values allocated in such a way that the allocated and aligned
    pointers match. Alternatively, the same function must handle allocation and
    deallocation since only one pointer is passed to any callee.
Examples:

```mlir
func.func @callee(memref<2x4xf32>)

func.func @caller(%0 : memref<2x4xf32>) {
  call @callee(%0) : (memref<2x4xf32>) -> ()
}

// ->

!descriptor = !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                            array<2xi64>, array<2xi64>)>

llvm.func @callee(!llvm.ptr<f32>)

llvm.func @caller(%arg0: !llvm.ptr<f32>) {
  // A descriptor value is defined at the function entry point.
  %0 = llvm.mlir.undef : !descriptor

  // Both the allocated and aligned pointers are set up to the same value.
  %1 = llvm.insertvalue %arg0, %0[0] : !descriptor
  %2 = llvm.insertvalue %arg0, %1[1] : !descriptor

  // The offset is set up to zero.
  %3 = llvm.mlir.constant(0 : index) : i64
  %4 = llvm.insertvalue %3, %2[2] : !descriptor

  // The sizes and strides are derived from the statically known values.
  %5 = llvm.mlir.constant(2 : index) : i64
  %6 = llvm.mlir.constant(4 : index) : i64
  %7 = llvm.insertvalue %5, %4[3, 0] : !descriptor
  %8 = llvm.insertvalue %6, %7[3, 1] : !descriptor
  %9 = llvm.mlir.constant(1 : index) : i64
  %10 = llvm.insertvalue %6, %8[4, 0] : !descriptor
  %11 = llvm.insertvalue %9, %10[4, 1] : !descriptor

  // The function call corresponds to extracting the aligned data pointer.
  %12 = llvm.extractvalue %11[1] : !descriptor
  llvm.call @callee(%12) : (!llvm.ptr<f32>) -> ()
  llvm.return
}
```

#### Bare Pointer Calling Convention for Unranked MemRef

The "bare pointer" calling convention does not support unranked memrefs as their
shape cannot be known at compile time.

### Generic allocation and deallocation functions

When converting the memref dialect, allocations and deallocations are converted
into calls to `malloc` (`aligned_alloc` if aligned allocations are requested)
and `free`. However, it is possible to convert them to more generic functions
which can be implemented by a runtime library, thus allowing custom allocation
strategies or runtime profiling. When the conversion pass is instructed to
perform such a conversion, the names of the callees are `_mlir_alloc`,
`_mlir_aligned_alloc` and `_mlir_free`. Their signatures are the same as those
of `malloc`, `aligned_alloc` and `free`.

### C-compatible wrapper emission

In practical cases, it may be desirable to have externally-facing functions with
a single argument corresponding to a memref argument. When interfacing with LLVM
IR produced from C, the code needs to respect the corresponding calling
convention. The conversion to the LLVM dialect provides an option to generate
wrapper functions that take memref descriptors as pointers-to-struct compatible
with data types produced by Clang when compiling C sources. The generation of
such wrapper functions can additionally be controlled at a function granularity
by setting the `llvm.emit_c_interface` unit attribute.

More specifically, a memref argument is converted into a pointer-to-struct
argument of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function, where
`T` is the converted element type and `N` is the memref rank. This type is
compatible with that produced by Clang for the following C++ structure template
instantiations or their equivalents in C.

```cpp
template<typename T, size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  intptr_t offset;
  intptr_t sizes[N];
  intptr_t strides[N];
};
```

Furthermore, we also rewrite function results to pointer parameters if the
rewritten function result has a struct type. The special result parameter is
added as the first parameter and is of pointer-to-struct type.

If enabled, the option will do the following. For *external* functions declared
in the MLIR module:
1.  Declare a new function `_mlir_ciface_<original name>` where memref arguments
    are converted to pointer-to-struct and the remaining arguments are converted
    as usual. Results are converted to a special argument if they are of struct
    type.
2.  Add a body to the original function (making it non-external) that
    1.  allocates memref descriptors,
    2.  populates them,
    3.  potentially allocates space for the result struct, and
    4.  passes the pointers to these into the newly declared interface function,
        then
    5.  collects the result of the call (potentially from the result struct),
        and
    6.  returns it to the caller.

For (non-external) functions defined in the MLIR module:

1.  Define a new function `_mlir_ciface_<original name>` where memref arguments
    are converted to pointer-to-struct and the remaining arguments are converted
    as usual. Results are converted to a special argument if they are of struct
    type.
2.  Populate the body of the newly defined function with IR that
    1.  loads descriptors from pointers;
    2.  unpacks descriptors into individual non-aggregate values;
    3.  passes these values into the original function;
    4.  collects the results of the call, and
    5.  either copies the results into the result struct or returns them to the
        caller.

Examples:

```mlir
func.func @qux(%arg0: memref<?x?xf32>)

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>

// Function with unpacked arguments.
llvm.func @qux(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  // Populate memref descriptor (as per calling convention).
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d

  // Store the descriptor in a stack-allocated space.
  %8 = llvm.mlir.constant(1 : index) : i64
  %9 = llvm.alloca %8 x !llvm.memref_2d
     : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                  array<2xi64>, array<2xi64>)>>
  llvm.store %7, %9 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                        array<2xi64>, array<2xi64>)>>

  // Call the interface function.
  llvm.call @_mlir_ciface_qux(%9)
     : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                          array<2xi64>, array<2xi64>)>>) -> ()

  // The stored descriptor will be freed on return.
  llvm.return
}

// Interface function.
llvm.func @_mlir_ciface_qux(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                              array<2xi64>, array<2xi64>)>>)
```

```mlir
func.func @foo(%arg0: memref<?x?xf32>) {
  return
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  llvm.return
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr) {
  // Load the descriptor.
  %0 = llvm.load %arg0 : !llvm.memref_2d_ptr

  // Unpack the descriptor as per calling convention.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
    : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64,
       i64, i64) -> ()
  llvm.return
}
```

```mlir
func.func @foo(%arg0: memref<?x?xf32>) -> memref<?x?xf32> {
  return %arg0 : memref<?x?xf32>
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>, %arg2: i64,
               %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64)
    -> !llvm.memref_2d {
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
  llvm.return %7 : !llvm.memref_2d
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr, %arg1: !llvm.memref_2d_ptr) {
  %0 = llvm.load %arg1 : !llvm.memref_2d_ptr
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  %8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
     : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64, i64, i64) -> !llvm.memref_2d
  llvm.store %8, %arg0 : !llvm.memref_2d_ptr
  llvm.return
}
```

Rationale: Introducing auxiliary functions for C-compatible interfaces is
preferred to modifying the calling convention since it will minimize the effect
of C compatibility on intra-module calls or calls between MLIR-generated
functions. In particular, when calling external functions from an MLIR module in
a (parallel) loop, the fact of storing a memref descriptor on stack can lead to
stack exhaustion and/or concurrent access to the same address. The auxiliary
interface function serves as an allocation scope in this case. Furthermore, when
targeting accelerators with separate memory spaces such as GPUs, stack-allocated
descriptors passed by pointer would have to be transferred to the device memory,
which introduces significant overhead. In such situations, auxiliary interface
functions are executed on the host and only pass the values through the device
function invocation mechanism.

Limitation: Right now we cannot generate a C interface for variadic functions,
whether external or not, because C functions cannot forward variadic arguments:

```c
void bar(int, ...);

void foo(int x, ...) {
  // ERROR: no way to forward variadic arguments.
  bar(x, ...);
}
```

### Address Computation

Accesses to a memref element are transformed into an access to an element of the
buffer pointed to by the descriptor. The position of the element in the buffer
is calculated by linearizing memref indices in row-major order (the lexically
first index is the slowest varying, similar to C, but accounting for strides).
The computation of the linear address is emitted as arithmetic operations in the
LLVM IR dialect. Strides are extracted from the memref descriptor.

Examples:

An access to a memref with indices:

```mlir
%0 = memref.load %m[%1, %2, %3, %4] : memref<?x?x4x8xf32, offset: ?>
```

is transformed into the equivalent of the following code:

```mlir
// Compute the linearized index from strides.
// When strides or, in absence of explicit strides, the corresponding sizes are
// dynamic, extract the stride value from the descriptor.
%stride1 = llvm.extractvalue[4, 0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                   array<4xi64>, array<4xi64>)>
%addr1 = arith.muli %stride1, %1 : i64

// When the stride or, in absence of explicit strides, the trailing sizes are
// known statically, this value is used as a constant. The natural value of
// strides is the product of all sizes following the current dimension.
%stride2 = llvm.mlir.constant(32 : index) : i64
%addr2 = arith.muli %stride2, %2 : i64
%addr3 = arith.addi %addr1, %addr2 : i64

%stride3 = llvm.mlir.constant(8 : index) : i64
%addr4 = arith.muli %stride3, %3 : i64
%addr5 = arith.addi %addr3, %addr4 : i64

// Multiplication with the known unit stride can be omitted.
%addr6 = arith.addi %addr5, %4 : i64

// If the linear offset is known to be zero, it can also be omitted. If it is
// dynamic, it is extracted from the descriptor.
%offset = llvm.extractvalue[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                               array<4xi64>, array<4xi64>)>
%addr7 = arith.addi %addr6, %offset : i64

// All accesses are based on the aligned pointer.
%aligned = llvm.extractvalue[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                array<4xi64>, array<4xi64>)>

// Get the address of the data pointer.
%ptr = llvm.getelementptr %aligned[%addr7]
  : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<4xi64>, array<4xi64>)>
  -> !llvm.ptr<f32>

// Perform the actual load.
%0 = llvm.load %ptr : !llvm.ptr<f32>
```

For stores, the address computation code is identical and only the actual store
operation is different.

Note: the conversion does not perform any sort of common subexpression
elimination when emitting memref accesses.

### Utility Classes

Utility classes common to many conversions to the LLVM dialect can be found
under `lib/Conversion/LLVMCommon`. They include the following.

-   `LLVMConversionTarget` specifies all LLVM dialect operations as legal.
-   `LLVMTypeConverter` implements the default type conversion as described
    above.
-   `ConvertOpToLLVMPattern` extends the conversion pattern class with LLVM
    dialect-specific functionality.
-   `VectorConvertOpToLLVMPattern` extends the previous class to automatically
    unroll operations on higher-dimensional vectors into lists of operations on
    one-dimensional vectors.
-   `StructBuilder` provides a convenient API for building IR that creates or
    accesses values of LLVM dialect structure types; it is subclassed by
    `MemRefDescriptor`, `UnrankedMemRefDescriptor` and `ComplexStructBuilder`
    for the built-in types convertible to LLVM dialect structure types.

## Translation to LLVM IR

MLIR modules containing `llvm.func`, `llvm.mlir.global` and `llvm.metadata`
operations can be translated to LLVM IR modules using the following scheme.

- Module-level globals are translated to LLVM IR global values.
- Module-level metadata are translated to LLVM IR metadata, which can be later
  augmented with additional metadata defined on specific ops.
- All functions are declared in the module so that they can be referenced.
- Each function is then translated separately and has access to the complete
  mappings between MLIR and LLVM IR globals, metadata, and functions.
- Within a function, blocks are traversed in topological order and translated
  to LLVM IR basic blocks. In each basic block, PHI nodes are created for each
  of the block arguments, but not yet connected to their source blocks.
- Within each block, operations are translated in order. Each operation has
  access to the same mappings as the function and additionally to the mapping
  of values between MLIR and LLVM IR, including PHI nodes. Operations with
  regions are responsible for translating the regions they contain.
- After the operations in a function are translated, the PHI nodes of blocks in
  this function are connected to their source values, which are now available.

The translation mechanism provides extension hooks for translating custom
operations to LLVM IR via the dialect interface
`LLVMTranslationDialectInterface`:

- `convertOperation` translates an operation that belongs to the current
  dialect to LLVM IR given an `IRBuilderBase` and various mappings;
- `amendOperation` performs additional actions on an operation if it contains a
  dialect attribute that belongs to the current dialect, for example setting up
  instruction-level metadata.

Dialects containing operations or attributes that want to be translated to LLVM
IR must provide an implementation of this interface and register it with the
system.
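
The per-function scheme above (create disconnected PHI nodes first, translate
operations next, connect PHI sources last) can be sketched with a simplified
model. This is illustrative Python, not the actual C++ implementation in
`ModuleTranslation`; all names in it are hypothetical:

```python
def translate_function(blocks):
    """Toy model of per-function MLIR-to-LLVM-IR translation.

    blocks: list of (name, args, ops, successors) in topological order.
    Each op is (result, operands); each successor is (target, forwarded),
    modeling a branch that forwards values to the target block's
    arguments, i.e. the future PHI inputs.
    """
    phis = {}       # (block, arg) -> list of (predecessor, value) inputs
    value_map = {}  # MLIR value name -> modeled "LLVM IR" value

    # Pass 1: create a PHI node for every block argument, not yet
    # connected to its source blocks.
    for name, args, _, _ in blocks:
        for arg in args:
            phis[(name, arg)] = []
            value_map[arg] = "phi:%s.%s" % (name, arg)

    # Pass 2: translate operations block by block, recording PHI inputs
    # at branch sites instead of connecting them immediately.
    for name, _, ops, succs in blocks:
        for result, operands in ops:
            value_map[result] = "op(%s)" % ", ".join(value_map[o] for o in operands)
        for target, forwarded in succs:
            target_args = next(a for n, a, _, _ in blocks if n == target)
            for arg, value in zip(target_args, forwarded):
                phis[(target, arg)].append((name, value))

    # Pass 3: connect the PHI nodes now that every source value exists.
    return {phi: [(pred, value_map[v]) for pred, v in inputs]
            for phi, inputs in phis.items()}
```

The deferred third pass mirrors why the real translation can safely reference
values that are only translated later: by the time PHI nodes are connected, the
mapping between MLIR and LLVM IR values is complete.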
Note that registration may happen without creating the dialect, for example, in
a separate library, to avoid the need for the "main" dialect library to depend
on LLVM IR libraries. The implementations of these methods may use the
[`ModuleTranslation`](https://mlir.llvm.org/doxygen/classmlir_1_1LLVM_1_1ModuleTranslation.html)
object provided to them, which holds the state of the translation and contains
numerous utilities.

Note that this extension mechanism is *intentionally restrictive*. LLVM IR has a
small, relatively stable set of instructions and types that MLIR intends to
model fully. Therefore, the extension mechanism is provided only for the LLVM IR
constructs that are extended more often -- intrinsics and metadata. The primary
goal of the extension mechanism is to support sets of intrinsics, for example
those representing a particular instruction set. The extension mechanism does
not allow for customizing type or block translation, nor does it support custom
module-level operations. Such transformations should be performed within MLIR
and target the corresponding MLIR constructs.

## Translation from LLVM IR

An experimental flow allows one to import a substantially limited subset of LLVM
IR into MLIR, producing LLVM dialect operations.

```
 mlir-translate -import-llvm filename.ll
```
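
As a rough illustration of what the import produces (a hypothetical example;
the exact output depends on the MLIR version), an LLVM IR function such as:

```llvm
define i64 @add(i64 %a, i64 %b) {
  %sum = add i64 %a, %b
  ret i64 %sum
}
```

is imported as LLVM dialect operations along the lines of:

```mlir
llvm.func @add(%arg0: i64, %arg1: i64) -> i64 {
  %0 = llvm.add %arg0, %arg1 : i64
  llvm.return %0 : i64
}
```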