# LLVM IR Target

This document describes the mechanisms of producing LLVM IR from MLIR. The
overall flow is two-stage:

1.  **conversion** of the IR to a set of dialects translatable to LLVM IR, for
    example [LLVM Dialect](Dialects/LLVM.md) or one of the hardware-specific
    dialects derived from LLVM IR intrinsics such as [AMX](Dialects/AMX.md),
    [X86Vector](Dialects/X86Vector.md) or [ArmNeon](Dialects/ArmNeon.md);
2.  **translation** of MLIR dialects to LLVM IR.

This flow allows the non-trivial transformation to be performed within MLIR
using MLIR APIs and makes the translation between MLIR and LLVM IR *simple* and
potentially bidirectional. As a corollary, dialect ops translatable to LLVM IR
are expected to closely match the corresponding LLVM IR instructions and
intrinsics. This minimizes the dependency on LLVM IR libraries in MLIR as well
as reduces the churn in case of changes.

SPIR-V to LLVM dialect conversion has a
[dedicated document](SPIRVToLLVMDialectConversion.md).

[TOC]

## Conversion to the LLVM Dialect

Conversion to the LLVM dialect from other dialects is the first step to produce
LLVM IR. All non-trivial IR modifications are expected to happen at this stage
or before. The conversion is *progressive*: most passes convert one dialect to
the LLVM dialect and keep operations from other dialects intact. For example,
the `-convert-memref-to-llvm` pass will only convert operations from the
`memref` dialect but will not convert operations from other dialects even if
they use or produce `memref`-typed values.

The process relies on the [Dialect Conversion](DialectConversion.md)
infrastructure and, in particular, on the
[materialization](DialectConversion.md#type-conversion) hooks of `TypeConverter`
to support progressive lowering by injecting `unrealized_conversion_cast`
operations between converted and unconverted operations.
After multiple partial
conversions to the LLVM dialect are performed, the cast operations that became
no-ops can be removed by the `-reconcile-unrealized-casts` pass. The latter pass
is not specific to the LLVM dialect and can remove any no-op casts.

### Conversion of Built-in Types

Built-in types have a default conversion to LLVM dialect types provided by the
`LLVMTypeConverter` class. Users targeting the LLVM dialect can reuse and extend
this type converter to support other types. Extra care must be taken if the
conversion rules for built-in types are overridden: all conversions must use the
same type converter.

#### LLVM Dialect-compatible Types

The types [compatible](Dialects/LLVM.md#built-in-type-compatibility) with the
LLVM dialect are kept as is.

#### Complex Type

Complex type is converted into an LLVM dialect literal structure type with two
elements:

-   real part;
-   imaginary part.

The elemental type is converted recursively using these rules.

Example:

```mlir
  complex<f32>
  // ->
  !llvm.struct<(f32, f32)>
```

#### Index Type

Index type is converted into an LLVM dialect integer type with the bitwidth
specified by the [data layout](DataLayout.md) of the closest module. For
example, on x86-64 CPUs it converts to i64. This behavior can be overridden by
the type converter configuration, which is often exposed as a pass option by
conversion passes.

Example:

```mlir
  index
  // -> on x86_64
  i64
```

#### Ranked MemRef Types

Ranked memref types are converted into an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *descriptor*. Only memrefs in the
**[strided form](Dialects/Builtin.md/#strided-memref)** can be converted to the
LLVM dialect with the default descriptor format.
Memrefs with other, less
trivial layouts should be converted into the strided form first, e.g., by
materializing the non-trivial address remapping due to layout as `affine.apply`
operations.

The default memref descriptor is a struct with the following fields:

1.  The pointer to the data buffer as allocated, referred to as "allocated
    pointer". This is only useful for deallocating the memref.
2.  The pointer to the properly aligned data pointer that the memref indexes,
    referred to as "aligned pointer".
3.  A lowered converted `index`-type integer containing the distance in number
    of elements between the beginning of the (aligned) buffer and the first
    element to be accessed through the memref, referred to as "offset".
4.  An array containing as many converted `index`-type integers as the rank of
    the memref: the array represents the size, in number of elements, of the
    memref along the given dimension.
5.  A second array containing as many converted `index`-type integers as the
    rank of the memref: the second array represents the "stride" (in tensor
    abstraction sense), i.e. the number of consecutive elements of the
    underlying buffer one needs to jump over to get to the next logically
    indexed element.

For constant memref dimensions, the corresponding size entry is a constant whose
runtime value matches the static value. This normalization serves as an ABI for
the memref type to interoperate with externally linked functions. In the
particular case of rank `0` memrefs, the size and stride arrays are omitted,
resulting in a struct containing two pointers + offset.

Examples:

```mlir
// Assuming index is converted to i64.

memref<f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
memref<1 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<? x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                               array<5 x i64>, array<5 x i64>)>
memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                             array<5 x i64>, array<5 x i64>)>

// Memref types can have vectors as element types.
memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr<vector<4 x f32>>,
                                             ptr<vector<4 x f32>>, i64,
                                             array<2 x i64>, array<2 x i64>)>
```

#### Unranked MemRef Types

Unranked memref types are converted to an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *unranked descriptor*. It contains:

1.  a converted `index`-typed integer representing the dynamic rank of the
    memref;
2.  a type-erased pointer (`!llvm.ptr<i8>`) to a ranked memref descriptor with
    the contents listed above.

This descriptor is primarily intended for interfacing with rank-polymorphic
library functions. The pointer to the ranked memref descriptor points to some
*allocated* memory, which may reside on the stack of the current function or in
the heap. Conversion patterns for operations producing unranked memrefs are
expected to manage the allocation. Note that this may lead to stack allocations
(`llvm.alloca`) being performed in a loop and not reclaimed until the end of the
current function.
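To visualize the descriptor layouts described above, here is a small Python
sketch. It is a conceptual model only: the class names, field names, and the
`make_contiguous` helper are invented for this illustration and are not part of
any MLIR API. It shows how, for a contiguous row-major buffer, the stride of a
dimension is the product of all trailing sizes.

```python
from dataclasses import dataclass
from typing import List

# Conceptual model of the descriptors described above. The class and field
# names are illustrative only; they are not part of any MLIR API.

@dataclass
class RankedDescriptor:
    allocated: int      # "allocated pointer", modeled here as a raw address
    aligned: int        # "aligned pointer"
    offset: int         # distance, in elements, to the first accessed element
    sizes: List[int]    # one entry per dimension
    strides: List[int]  # one entry per dimension

@dataclass
class UnrankedDescriptor:
    rank: int                 # dynamic rank
    ranked: RankedDescriptor  # stands in for the type-erased pointer

def make_contiguous(aligned: int, sizes: List[int]) -> RankedDescriptor:
    """Build a descriptor for a contiguous row-major buffer: the stride of
    each dimension is the product of all sizes that follow it."""
    strides = []
    running = 1
    for size in reversed(sizes):
        strides.append(running)
        running *= size
    return RankedDescriptor(aligned, aligned, 0, sizes, strides[::-1])

# memref<10x42xf32> has strides [42, 1]; a rank-0 memref has empty arrays,
# leaving only the two pointers and the offset.
d = make_contiguous(0x1000, [10, 42])
u = UnrankedDescriptor(rank=2, ranked=d)
```

Note how the unranked descriptor merely wraps the rank and a reference to a
ranked descriptor, mirroring the two-element structure above.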
#### Function Types

Function types are converted to LLVM dialect function types as follows:

-   function argument and result types are converted recursively using these
    rules;
-   if a function type has multiple results, they are wrapped into an LLVM
    dialect literal structure type since LLVM function types must have exactly
    one result;
-   if a function type has no results, the corresponding LLVM dialect function
    type will have one `!llvm.void` result since LLVM function types must have
    a result;
-   function types used in arguments of another function type are wrapped in an
    LLVM dialect pointer type to comply with LLVM IR expectations;
-   the structs corresponding to `memref` types, both ranked and unranked,
    appearing as function arguments are unbundled into individual function
    arguments to allow for specifying metadata such as aliasing information on
    individual pointers;
-   the conversion of `memref`-typed arguments is subject to
    [calling conventions](TargetLLVMIR.md#calling-conventions).

Examples:

```mlir
// Zero-ary function type with no results:
() -> ()
// is converted to a zero-ary function with `void` result.
!llvm.func<void ()>

// Unary function with one result:
(i32) -> (i64)
// has its argument and result type converted, before creating the LLVM dialect
// function type.
!llvm.func<i64 (i32)>

// Binary function with one result:
(i32, f32) -> (i64)
// has its arguments handled separately.
!llvm.func<i64 (i32, f32)>

// Binary function with two results:
(i32, f32) -> (i64, f64)
// has its results aggregated into a structure type.
!llvm.func<struct<(i64, f64)> (i32, f32)>

// Function-typed arguments or results in higher-order functions:
(() -> ()) -> (() -> ())
// are converted into pointers to functions.
!llvm.func<ptr<func<void ()>> (ptr<func<void ()>>)>

// These rules apply recursively: a function type taking a function that takes
// another function
( ( (i32) -> (i64) ) -> () ) -> ()
// is converted into a function type taking a pointer-to-function that takes
// another pointer-to-function.
!llvm.func<void (ptr<func<void (ptr<func<i64 (i32)>>)>>)>

// A memref descriptor appearing as function argument:
(memref<f32>) -> ()
// gets converted into a list of individual scalar components of a descriptor.
!llvm.func<void (ptr<f32>, ptr<f32>, i64)>

// The list of arguments is linearized and one can freely mix memref and other
// types in this list:
(memref<f32>, f32) -> ()
// which gets converted into a flat list.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, f32)>

// For nD ranked memref descriptors:
(memref<?x?xf32>) -> ()
// the converted signature will contain 2n+1 `index`-typed integer arguments,
// offset, n sizes and n strides, per memref argument type.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, i64, i64, i64, i64)>

// Same rules apply to unranked descriptors:
(memref<*xf32>) -> ()
// which get converted into their components.
!llvm.func<void (i64, ptr<i8>)>

// However, returning a memref from a function is not affected:
() -> (memref<?xf32>)
// gets converted to a function returning a descriptor structure.
!llvm.func<struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)> ()>

// If multiple memref-typed results are returned:
() -> (memref<f32>, memref<f64>)
// their descriptor structures are additionally packed into another structure,
// potentially with other non-memref typed results.
!llvm.func<struct<(struct<(ptr<f32>, ptr<f32>, i64)>,
                   struct<(ptr<f64>, ptr<f64>, i64)>)> ()>
```

Conversion patterns are available to convert built-in function operations and
standard call operations targeting those functions using these conversion rules.

#### Multi-dimensional Vector Types

LLVM IR only supports *one-dimensional* vectors, unlike MLIR where vectors can
be multi-dimensional. Vector types cannot be nested in either IR. In the
one-dimensional case, MLIR vectors are converted to LLVM IR vectors of the same
size with element type converted using these conversion rules. In the
n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types
of one-dimensional vectors.

Examples:

```mlir
vector<4x8 x f32>
// ->
!llvm.array<4 x vector<8 x f32>>

memref<2 x vector<4x8 x f32>>
// ->
!llvm.struct<(ptr<array<4 x vector<8xf32>>>, ptr<array<4 x vector<8xf32>>>,
              i64, array<1 x i64>, array<1 x i64>)>
```

#### Tensor Types

Tensor types cannot be converted to the LLVM dialect. Operations on tensors must
be [bufferized](Bufferization.md) before being converted.

### Calling Conventions

Calling conventions provide a mechanism to customize the conversion of function
and function call operations without changing how individual types are handled
elsewhere. They are implemented simultaneously by the default type converter and
by the conversion patterns for the relevant operations.

#### Function Result Packing

In case of multi-result functions, the returned values are inserted into a
structure-typed value before being returned and extracted from it at the call
site. This transformation is a part of the conversion and is transparent to the
definitions and uses of the values being returned.
Example:

```mlir
func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) {
  return %arg0, %arg1 : i32, i64
}
func @bar() {
  %0 = arith.constant 42 : i32
  %1 = arith.constant 17 : i64
  %2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64)
  "use_i32"(%2#0) : (i32) -> ()
  "use_i64"(%2#1) : (i64) -> ()
}

// is transformed into

llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> {
  // insert the values into a structure
  %0 = llvm.mlir.undef : !llvm.struct<(i32, i64)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)>

  // return the structure value
  llvm.return %2 : !llvm.struct<(i32, i64)>
}
llvm.func @bar() {
  %0 = llvm.mlir.constant(42 : i32) : i32
  %1 = llvm.mlir.constant(17 : i64) : i64

  // call and extract the values from the structure
  %2 = llvm.call @foo(%0, %1)
     : (i32, i64) -> !llvm.struct<(i32, i64)>
  %3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)>
  %4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)>

  // use as before
  "use_i32"(%3) : (i32) -> ()
  "use_i64"(%4) : (i64) -> ()
}
```

#### Default Calling Convention for Ranked MemRef

The default calling convention converts `memref`-typed function arguments to
LLVM dialect literal structs
[defined above](TargetLLVMIR.md#ranked-memref-types) before unbundling them into
individual scalar arguments.

This convention is implemented in the conversion of `std.func` and `std.call` to
the LLVM dialect, with the former unpacking the descriptor into a set of
individual values and the latter packing those values back into a descriptor so
as to make it transparently usable by other operations. Conversions from other
dialects should take this convention into account.
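The unbundling can be made concrete with a short Python sketch. This is
illustrative pseudocode only, not MLIR code; the `unbundle`/`rebundle` helper
names are invented for this example. It models how a rank-n descriptor becomes
a flat list of 2n+3 scalar arguments at the call boundary and is reassembled on
the callee side.

```python
# Illustrative sketch (not MLIR code) of the flattening performed by the
# default calling convention: a rank-n descriptor travels as 2n+3 scalars.

def unbundle(desc):
    """Caller side: descriptor -> flat list of scalar arguments."""
    allocated, aligned, offset, sizes, strides = desc
    return [allocated, aligned, offset, *sizes, *strides]

def rebundle(args, rank):
    """Callee side: flat list of scalar arguments -> descriptor."""
    allocated, aligned, offset = args[:3]
    sizes = args[3:3 + rank]
    strides = args[3 + rank:3 + 2 * rank]
    return (allocated, aligned, offset, sizes, strides)

# A memref<?x?xf32> (rank 2) travels as 2 * 2 + 3 = 7 scalar arguments.
desc = (0x1000, 0x1000, 0, [4, 8], [8, 1])
flat = unbundle(desc)
```

The round trip is lossless, which is what makes the convention transparent to
operations that consume the descriptor inside the callee.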
This specific convention is motivated by the necessity to specify alignment and
aliasing attributes on the raw pointers underpinning the memref.

Examples:

```mlir
func @foo(%arg0: memref<?xf32>) -> () {
  "use"(%arg0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<1xi64>, array<1xi64>)>

llvm.func @foo(%arg0: !llvm.ptr<f32>,  // Allocated pointer.
               %arg1: !llvm.ptr<f32>,  // Aligned pointer.
               %arg2: i64,             // Offset.
               %arg3: i64,             // Size in dim 0.
               %arg4: i64) {           // Stride in dim 0.
  // Populate memref descriptor structure.
  %0 = llvm.mlir.undef : !llvm.memref_1d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d
  %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d

  // Descriptor is now usable as a single value.
  "use"(%5) : (!llvm.memref_1d) -> ()
  llvm.return
}
```

```mlir
func @bar() {
  %0 = "get"() : () -> (memref<?xf32>)
  call @foo(%0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<1xi64>, array<1xi64>)>

llvm.func @bar() {
  %0 = "get"() : () -> !llvm.memref_1d

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_1d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_1d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_1d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d
  %5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2, %3, %4, %5)
      : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64) -> ()
  llvm.return
}
```

#### Default Calling Convention for Unranked MemRef

For unranked memrefs, the list of function arguments always contains two
elements, same as the unranked memref descriptor: an integer rank, and a
type-erased (`!llvm.ptr<i8>`) pointer to the ranked memref descriptor. Note that
while the *calling convention* does not require allocation, *casting* to
unranked memref does since one cannot take an address of an SSA value containing
the ranked memref, which must be stored in some memory instead. The caller is in
charge of ensuring the thread safety and management of the allocated memory, in
particular the deallocation.

Example:

```mlir
func @foo(%arg0: memref<*xf32>) -> () {
  "use"(%arg0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @foo(%arg0: i64,              // Rank.
               %arg1: !llvm.ptr<i8>) {  // Type-erased pointer to descriptor.
  // Pack the unranked memref descriptor.
  %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr<i8>)>

  "use"(%2) : (!llvm.struct<(i64, ptr<i8>)>) -> ()
  llvm.return
}
```

```mlir
func @bar() {
  %0 = "get"() : () -> (memref<*xf32>)
  call @foo(%0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @bar() {
  %0 = "get"() : () -> (!llvm.struct<(i64, ptr<i8>)>)

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)>

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2) : (i64, !llvm.ptr<i8>) -> ()
  llvm.return
}
```

**Lifetime.** The second element of the unranked memref descriptor points to
some memory in which the ranked memref descriptor is stored. By convention, this
memory is allocated on stack and has the lifetime of the function. (*Note:* due
to function-length lifetime, creation of multiple unranked memref descriptors,
e.g., in a loop, may lead to stack overflows.) If an unranked descriptor has to
be returned from a function, the ranked descriptor it points to is copied into
dynamically allocated memory, and the pointer in the unranked descriptor is
updated accordingly. The allocation happens immediately before returning. It is
the responsibility of the caller to free the dynamically allocated memory. The
default conversion of `std.call` and `std.call_indirect` copies the ranked
descriptor to newly allocated memory on the caller's stack. Thus, the convention
of the ranked memref descriptor pointed to by an unranked memref descriptor
being stored on stack is respected.

#### Bare Pointer Calling Convention for Ranked MemRef

The "bare pointer" calling convention converts `memref`-typed function arguments
to a *single* pointer to the aligned data. Note that this does *not* apply to
uses of `memref` outside of function signatures; the default descriptor
structures are still used there. This convention further restricts the supported
cases to the following.

-   `memref` types with default layout.
-   `memref` types with all dimensions statically known.
-   `memref` values allocated in such a way that the allocated and aligned
    pointers match. Alternatively, the same function must handle allocation and
    deallocation since only one pointer is passed to any callee.
Examples:

```mlir
func @callee(memref<2x4xf32>)

func @caller(%0 : memref<2x4xf32>) {
  call @callee(%0) : (memref<2x4xf32>) -> ()
}

// ->

!descriptor = !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                            array<2xi64>, array<2xi64>)>

llvm.func @callee(!llvm.ptr<f32>)

llvm.func @caller(%arg0: !llvm.ptr<f32>) {
  // A descriptor value is defined at the function entry point.
  %0 = llvm.mlir.undef : !descriptor

  // Both the allocated and aligned pointers are set up to the same value.
  %1 = llvm.insertvalue %arg0, %0[0] : !descriptor
  %2 = llvm.insertvalue %arg0, %1[1] : !descriptor

  // The offset is set up to zero.
  %3 = llvm.mlir.constant(0 : index) : i64
  %4 = llvm.insertvalue %3, %2[2] : !descriptor

  // The sizes and strides are derived from the statically known values.
  %5 = llvm.mlir.constant(2 : index) : i64
  %6 = llvm.mlir.constant(4 : index) : i64
  %7 = llvm.insertvalue %5, %4[3, 0] : !descriptor
  %8 = llvm.insertvalue %6, %7[3, 1] : !descriptor
  %9 = llvm.mlir.constant(1 : index) : i64
  %10 = llvm.insertvalue %6, %8[4, 0] : !descriptor
  %11 = llvm.insertvalue %9, %10[4, 1] : !descriptor

  // The function call corresponds to extracting the aligned data pointer.
  %12 = llvm.extractvalue %11[1] : !descriptor
  llvm.call @callee(%12) : (!llvm.ptr<f32>) -> ()
}
```

#### Bare Pointer Calling Convention For Unranked MemRef

The "bare pointer" calling convention does not support unranked memrefs as their
shape cannot be known at compile time.

### C-compatible wrapper emission

In practical cases, it may be desirable to have externally-facing functions with
a single argument corresponding to a memref argument. When interfacing with
LLVM IR produced from C, the code needs to respect the corresponding calling
convention.
The conversion to the LLVM dialect provides an option to generate
wrapper functions that take memref descriptors as pointers-to-struct compatible
with the data types produced by Clang when compiling C sources. The generation
of such wrapper functions can additionally be controlled at a function
granularity by setting the `llvm.emit_c_interface` unit attribute.

More specifically, a memref argument is converted into a pointer-to-struct
argument of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function, where
`T` is the converted element type and `N` is the memref rank. This type is
compatible with that produced by Clang for the following C++ structure template
instantiations or their equivalents in C.

```cpp
template<typename T, size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  intptr_t offset;
  intptr_t sizes[N];
  intptr_t strides[N];
};
```

Furthermore, we also rewrite function results to pointer parameters if the
rewritten function result has a struct type. The special result parameter is
added as the first parameter and is of pointer-to-struct type.

If enabled, the option will do the following. For *external* functions declared
in the MLIR module:

1.  Declare a new function `_mlir_ciface_<original name>` where memref
    arguments are converted to pointer-to-struct and the remaining arguments
    are converted as usual. Results are converted to a special argument if they
    are of struct type.
2.  Add a body to the original function (making it non-external) that
    1.  allocates memref descriptors,
    2.  populates them,
    3.  potentially allocates space for the result struct, and
    4.  passes the pointers to these into the newly declared interface
        function, then
    5.  collects the result of the call (potentially from the result struct),
        and
    6.  returns it to the caller.

For (non-external) functions defined in the MLIR module:
1.  Define a new function `_mlir_ciface_<original name>` where memref
    arguments are converted to pointer-to-struct and the remaining arguments
    are converted as usual. Results are converted to a special argument if they
    are of struct type.
2.  Populate the body of the newly defined function with IR that
    1.  loads descriptors from pointers;
    2.  unpacks descriptors into individual non-aggregate values;
    3.  passes these values into the original function;
    4.  collects the results of the call, and
    5.  either copies the results into the result struct or returns them to
        the caller.

Examples:

```mlir
func @qux(%arg0: memref<?x?xf32>)

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<2xi64>, array<2xi64>)>

// Function with unpacked arguments.
llvm.func @qux(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  // Populate memref descriptor (as per calling convention).
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d

  // Store the descriptor in a stack-allocated space.
  %8 = llvm.mlir.constant(1 : index) : i64
  %9 = llvm.alloca %8 x !llvm.memref_2d
     : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                  array<2xi64>, array<2xi64>)>>
  llvm.store %7, %9 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                        array<2xi64>, array<2xi64>)>>

  // Call the interface function.
  llvm.call @_mlir_ciface_qux(%9)
      : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                           array<2xi64>, array<2xi64>)>>) -> ()

  // The stored descriptor will be freed on return.
  llvm.return
}

// Interface function.
llvm.func @_mlir_ciface_qux(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                              array<2xi64>, array<2xi64>)>>)
```

```mlir
func @foo(%arg0: memref<?x?xf32>) {
  return
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = type !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                             array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  llvm.return
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr) {
  // Load the descriptor.
  %0 = llvm.load %arg0 : !llvm.memref_2d_ptr

  // Unpack the descriptor as per calling convention.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
      : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64,
         i64, i64) -> ()
  llvm.return
}
```

```mlir
func @foo(%arg0: memref<?x?xf32>) -> memref<?x?xf32> {
  return %arg0 : memref<?x?xf32>
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                     array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = type !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                             array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>, %arg2: i64,
               %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64)
    -> !llvm.memref_2d {
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
  llvm.return %7 : !llvm.memref_2d
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr,
                            %arg1: !llvm.memref_2d_ptr) {
  %0 = llvm.load %arg1 : !llvm.memref_2d_ptr
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  %8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
     : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64, i64, i64)
     -> !llvm.memref_2d
  llvm.store %8, %arg0 : !llvm.memref_2d_ptr
  llvm.return
}
```

Rationale: Introducing auxiliary functions for C-compatible interfaces is
preferred to modifying the calling convention since it will minimize the effect
of C compatibility on intra-module calls or calls between MLIR-generated
functions. In particular, when calling external functions from an MLIR module in
a (parallel) loop, the fact of storing a memref descriptor on stack can lead to
stack exhaustion and/or concurrent access to the same address. The auxiliary
interface function serves as an allocation scope in this case. Furthermore, when
targeting accelerators with separate memory spaces such as GPUs, stack-allocated
descriptors passed by pointer would have to be transferred to the device memory,
which introduces significant overhead. In such situations, auxiliary interface
functions are executed on the host and only pass the values through the device
function invocation mechanism.

### Address Computation

Accesses to a memref element are transformed into an access to an element of the
buffer pointed to by the descriptor. The position of the element in the buffer
is calculated by linearizing memref indices in row-major order (lexically first
index is the slowest varying, similar to C, but accounting for strides).
The
computation of the linear address is emitted as arithmetic operations in the
LLVM IR dialect. Strides are extracted from the memref descriptor.

Examples:

An access to a memref with indices:

```mlir
%0 = memref.load %m[%1, %2, %3, %4] : memref<?x?x4x8xf32, offset: ?>
```

is transformed into the equivalent of the following code:

```mlir
// Compute the linearized index from strides.
// When strides or, in absence of explicit strides, the corresponding sizes are
// dynamic, extract the stride value from the descriptor.
%stride1 = llvm.extractvalue[4, 0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                   array<4xi64>, array<4xi64>)>
%addr1 = arith.muli %stride1, %1 : i64

// When the stride or, in absence of explicit strides, the trailing sizes are
// known statically, this value is used as a constant. The natural value of
// strides is the product of all sizes following the current dimension.
%stride2 = llvm.mlir.constant(32 : index) : i64
%addr2 = arith.muli %stride2, %2 : i64
%addr3 = arith.addi %addr1, %addr2 : i64

%stride3 = llvm.mlir.constant(8 : index) : i64
%addr4 = arith.muli %stride3, %3 : i64
%addr5 = arith.addi %addr3, %addr4 : i64

// Multiplication with the known unit stride can be omitted.
%addr6 = arith.addi %addr5, %4 : i64

// If the linear offset is known to be zero, it can also be omitted. If it is
// dynamic, it is extracted from the descriptor.
%offset = llvm.extractvalue[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                               array<4xi64>, array<4xi64>)>
%addr7 = arith.addi %addr6, %offset : i64

// All accesses are based on the aligned pointer.
%aligned = llvm.extractvalue[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                array<4xi64>, array<4xi64>)>

// Get the address of the data pointer.
%ptr = llvm.getelementptr %aligned[%addr7]
    : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<4xi64>, array<4xi64>)>
    -> !llvm.ptr<f32>

// Perform the actual load.
%0 = llvm.load %ptr : !llvm.ptr<f32>
```

For stores, the address computation code is identical and only the actual store
operation is different.

Note: the conversion does not perform any sort of common subexpression
elimination when emitting memref accesses.

### Utility Classes

Utility classes common to many conversions to the LLVM dialect can be found
under `lib/Conversion/LLVMCommon`. They include the following.

- `LLVMConversionTarget` specifies all LLVM dialect operations as legal.
- `LLVMTypeConverter` implements the default type conversion as described
  above.
- `ConvertOpToLLVMPattern` extends the conversion pattern class with LLVM
  dialect-specific functionality.
- `VectorConvertOpToLLVMPattern` extends the previous class to automatically
  unroll operations on higher-dimensional vectors into lists of operations on
  one-dimensional vectors before applying the pattern.
- `StructBuilder` provides a convenient API for building IR that creates or
  accesses values of LLVM dialect structure types; it is subclassed by
  `MemRefDescriptor`, `UnrankedMemRefDescriptor` and `ComplexBuilder` for the
  built-in types convertible to LLVM dialect structure types.

## Translation to LLVM IR

MLIR modules containing `llvm.func`, `llvm.mlir.global` and `llvm.metadata`
operations can be translated to LLVM IR modules using the following scheme.

- Module-level globals are translated to LLVM IR global values.
- Module-level metadata are translated to LLVM IR metadata, which can be later
  augmented with additional metadata defined on specific ops.
- All functions are declared in the module so that they can be referenced.
- Each function is then translated separately and has access to the complete
  mappings between MLIR and LLVM IR globals, metadata, and functions.
- Within a function, blocks are traversed in topological order and translated
  to LLVM IR basic blocks. In each basic block, PHI nodes are created for each
  of the block arguments, but are not yet connected to their source blocks.
- Within each block, operations are translated in order. Each operation has
  access to the same mappings as the function, and additionally to the mapping
  of values between MLIR and LLVM IR, including PHI nodes. Operations with
  regions are responsible for translating the regions they contain.
- After operations in a function are translated, the PHI nodes of blocks in
  this function are connected to their source values, which are now available.

The translation mechanism provides extension hooks for translating custom
operations to LLVM IR via the `LLVMTranslationDialectInterface` dialect
interface:

- `convertOperation` translates an operation that belongs to the current
  dialect to LLVM IR, given an `IRBuilderBase` and various mappings;
- `amendOperation` performs additional actions on an operation if it contains
  a dialect attribute that belongs to the current dialect, for example, sets up
  instruction-level metadata.

Dialects containing operations or attributes that are to be translated to LLVM
IR must provide an implementation of this interface and register it with the
system. Note that registration may happen without creating the dialect, for
example, in a separate library, to avoid the need for the "main" dialect
library to depend on LLVM IR libraries. The implementations of these methods
may use the
[`ModuleTranslation`](https://mlir.llvm.org/doxygen/classmlir_1_1LLVM_1_1ModuleTranslation.html)
object provided to them, which holds the state of the translation and contains
numerous utilities.
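The two-pass PHI handling described above can be modeled with a short
standalone sketch (hypothetical Python, not the actual MLIR implementation):
PHI placeholders are created for all block arguments first, and are wired to
their source values only after every block has been translated.

```python
# Hypothetical model (not the MLIR code base) of two-pass PHI handling
# during function translation.
class Phi:
    def __init__(self, name):
        self.name = name
        self.incoming = []  # (predecessor block, value) pairs, wired in pass 2

def translate_function(block_args, edges):
    """block_args: {block: [argument names]};
    edges: {block: [(predecessor, [value passed per argument])]}."""
    # Pass 1: create one disconnected PHI node per block argument.
    phis = {block: [Phi(arg) for arg in args]
            for block, args in block_args.items()}
    # (Operations inside each block would be translated here; they may
    # already refer to the PHI placeholders.)
    # Pass 2: connect the PHI nodes to their now-available source values.
    for block, incoming in edges.items():
        for pred, values in incoming:
            for phi, value in zip(phis[block], values):
                phi.incoming.append((pred, value))
    return phis

phis = translate_function(
    {"entry": [], "loop": ["iv"]},
    {"loop": [("entry", ["%c0"]), ("loop", ["%next"])]})
print(phis["loop"][0].incoming)  # [('entry', '%c0'), ('loop', '%next')]
```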
Note that this extension mechanism is *intentionally restrictive*. LLVM IR has
a small, relatively stable set of instructions and types that MLIR intends to
model fully. Therefore, the extension mechanism is provided only for the LLVM
IR constructs that are extended more often -- intrinsics and metadata. The
primary goal of the extension mechanism is to support sets of intrinsics, for
example those representing a particular instruction set. The extension
mechanism does not allow for customizing type or block translation, nor does it
support custom module-level operations. Such transformations should be
performed within MLIR and target the corresponding MLIR constructs.

## Translation from LLVM IR

An experimental flow allows one to import a substantially limited subset of
LLVM IR into MLIR, producing LLVM dialect operations.

```
 mlir-translate -import-llvm filename.ll
```