# LLVM IR Target

This document describes the mechanisms of producing LLVM IR from MLIR. The
overall flow is two-stage:

1.  **conversion** of the IR to a set of dialects translatable to LLVM IR, for
    example [LLVM Dialect](Dialects/LLVM.md) or one of the hardware-specific
    dialects derived from LLVM IR intrinsics such as [AMX](Dialects/AMX.md),
    [X86Vector](Dialects/X86Vector.md) or [ArmNeon](Dialects/ArmNeon.md);
2.  **translation** of MLIR dialects to LLVM IR.

This flow allows the non-trivial transformation to be performed within MLIR
using MLIR APIs and makes the translation between MLIR and LLVM IR *simple* and
potentially bidirectional. As a corollary, dialect ops translatable to LLVM IR
are expected to closely match the corresponding LLVM IR instructions and
intrinsics. This minimizes the dependency on LLVM IR libraries in MLIR as well
as reduces the churn in case of changes.

SPIR-V to LLVM dialect conversion has a
[dedicated document](SPIRVToLLVMDialectConversion.md).

[TOC]

## Conversion to the LLVM Dialect

Conversion to the LLVM dialect from other dialects is the first step to produce
LLVM IR. All non-trivial IR modifications are expected to happen at this stage
or before. The conversion is *progressive*: most passes convert one dialect to
the LLVM dialect and keep operations from other dialects intact. For example,
the `-convert-memref-to-llvm` pass will only convert operations from the
`memref` dialect but will not convert operations from other dialects even if
they use or produce `memref`-typed values.

The process relies on the [Dialect Conversion](DialectConversion.md)
infrastructure and, in particular, on the
[materialization](DialectConversion.md#type-conversion) hooks of `TypeConverter`
to support progressive lowering by injecting `unrealized_conversion_cast`
operations between converted and unconverted operations. After multiple partial
conversions to the LLVM dialect are performed, the cast operations that became
noop can be removed by the `-reconcile-unrealized-casts` pass. The latter pass
is not specific to the LLVM dialect and can remove any noop casts.

### Conversion of Built-in Types

Built-in types have a default conversion to LLVM dialect types provided by the
`LLVMTypeConverter` class. Users targeting the LLVM dialect can reuse and extend
this type converter to support other types. Extra care must be taken if the
conversion rules for built-in types are overridden: all conversions must use the
same type converter.

#### LLVM Dialect-compatible Types

The types [compatible](Dialects/LLVM.md#built-in-type-compatibility) with the
LLVM dialect are kept as is.

#### Complex Type

Complex type is converted into an LLVM dialect literal structure type with two
elements:

-   real part;
-   imaginary part.

The elemental type is converted recursively using these rules.

Example:

```mlir
  complex<f32>
  // ->
  !llvm.struct<(f32, f32)>
```

#### Index Type

Index type is converted into an LLVM dialect integer type with the bitwidth
specified by the [data layout](DataLayout.md) of the closest module. For
example, on x86-64 CPUs it converts to i64. This behavior can be overridden by
the type converter configuration, which is often exposed as a pass option by
conversion passes.

Example:

```mlir
  index
  // -> on x86_64
  i64
```

#### Ranked MemRef Types

Ranked memref types are converted into an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *descriptor*. Only memrefs in the
**[strided form](Dialects/Builtin.md/#strided-memref)** can be converted to the
LLVM dialect with the default descriptor format. Memrefs with other, less
trivial layouts should be converted into the strided form first, e.g., by
materializing the non-trivial address remapping due to layout as `affine.apply`
operations.

The default memref descriptor is a struct with the following fields:

1.  The pointer to the data buffer as allocated, referred to as "allocated
    pointer". This is only useful for deallocating the memref.
2.  The pointer to the properly aligned data pointer that the memref indexes,
    referred to as "aligned pointer".
3.  A lowered converted `index`-type integer containing the distance in number
    of elements between the beginning of the (aligned) buffer and the first
    element to be accessed through the memref, referred to as "offset".
4.  An array containing as many converted `index`-type integers as the rank of
    the memref: the array represents the size, in number of elements, of the
    memref along the given dimension.
5.  A second array containing as many converted `index`-type integers as the
    rank of the memref: the second array represents the "stride" (in tensor
    abstraction sense), i.e. the number of consecutive elements of the
    underlying buffer one needs to jump over to get to the next logically
    indexed element.

For constant memref dimensions, the corresponding size entry is a constant whose
runtime value matches the static value. This normalization serves as an ABI for
the memref type to interoperate with externally linked functions. In the
particular case of rank `0` memrefs, the size and stride arrays are omitted,
resulting in a struct containing two pointers + offset.

Examples:

```mlir
// Assuming index is converted to i64.

memref<f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
memref<1 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<? x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                               array<5 x i64>, array<5 x i64>)>
memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                             array<5 x i64>, array<5 x i64>)>

// Memref types can have vectors as element types
memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr<vector<4 x f32>>,
                                             ptr<vector<4 x f32>>, i64,
                                             array<2 x i64>, array<2 x i64>)>
```

#### Unranked MemRef Types

Unranked memref types are converted to an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *unranked descriptor*. It contains:

1.  a converted `index`-typed integer representing the dynamic rank of the
    memref;
2.  a type-erased pointer (`!llvm.ptr<i8>`) to a ranked memref descriptor with
    the contents listed above.

This descriptor is primarily intended for interfacing with rank-polymorphic
library functions. The pointer to the ranked memref descriptor points to some
*allocated* memory, which may reside on the stack of the current function or in
the heap. Conversion patterns for operations producing unranked memrefs are
expected to manage the allocation. Note that this may lead to stack allocations
(`llvm.alloca`) being performed in a loop and not reclaimed until the end of the
current function.

#### Function Types

Function types are converted to LLVM dialect function types as follows:

-   function argument and result types are converted recursively using these
    rules;
-   if a function type has multiple results, they are wrapped into an LLVM
    dialect literal structure type since LLVM function types must have exactly
    one result;
-   if a function type has no results, the corresponding LLVM dialect function
    type will have one `!llvm.void` result since LLVM function types must have a
    result;
-   function types used in arguments of another function type are wrapped in an
    LLVM dialect pointer type to comply with LLVM IR expectations;
-   the structs corresponding to `memref` types, both ranked and unranked,
    appearing as function arguments are unbundled into individual function
    arguments to allow for specifying metadata such as aliasing information on
    individual pointers;
-   the conversion of `memref`-typed arguments is subject to
    [calling conventions](TargetLLVMIR.md#calling-conventions);
-   if a function type has the boolean attribute `func.varargs` set, the
    converted LLVM function will be variadic.

Examples:

```mlir
// Zero-ary function type with no results:
() -> ()
// is converted to a zero-ary function with `void` result.
!llvm.func<void ()>

// Unary function with one result:
(i32) -> (i64)
// has its argument and result type converted, before creating the LLVM dialect
// function type.
!llvm.func<i64 (i32)>

// Binary function with one result:
(i32, f32) -> (i64)
// has its arguments handled separately
!llvm.func<i64 (i32, f32)>

// Binary function with two results:
(i32, f32) -> (i64, f64)
// has its results aggregated into a structure type.
!llvm.func<struct<(i64, f64)> (i32, f32)>

// Function-typed arguments or results in higher-order functions:
(() -> ()) -> (() -> ())
// are converted into pointers to functions.
!llvm.func<ptr<func<void ()>> (ptr<func<void ()>>)>

// These rules apply recursively: a function type taking a function that takes
// another function
( ( (i32) -> (i64) ) -> () ) -> ()
// is converted into a function type taking a pointer-to-function that takes
// another pointer-to-function.
!llvm.func<void (ptr<func<void (ptr<func<i64 (i32)>>)>>)>

// A memref descriptor appearing as function argument:
(memref<f32>) -> ()
// gets converted into a list of individual scalar components of a descriptor.
!llvm.func<void (ptr<f32>, ptr<f32>, i64)>

// The list of arguments is linearized and one can freely mix memref and other
// types in this list:
(memref<f32>, f32) -> ()
// which gets converted into a flat list.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, f32)>

// For nD ranked memref descriptors:
(memref<?x?xf32>) -> ()
// the converted signature will contain 2n+1 `index`-typed integer arguments,
// offset, n sizes and n strides, per memref argument type.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, i64, i64, i64, i64)>

// Same rules apply to unranked descriptors:
(memref<*xf32>) -> ()
// which get converted into their components.
!llvm.func<void (i64, ptr<i8>)>

// However, returning a memref from a function is not affected:
() -> (memref<?xf32>)
// gets converted to a function returning a descriptor structure.
!llvm.func<struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)> ()>

// If multiple memref-typed results are returned:
() -> (memref<f32>, memref<f64>)
// their descriptor structures are additionally packed into another structure,
// potentially with other non-memref typed results.
!llvm.func<struct<(struct<(ptr<f32>, ptr<f32>, i64)>,
                   struct<(ptr<f64>, ptr<f64>, i64)>)> ()>

// If the "func.varargs" attribute is set:
(i32) -> () attributes { "func.varargs" = true }
// the corresponding LLVM function will be variadic:
!llvm.func<void (i32, ...)>
```

Conversion patterns are available to convert built-in function operations and
standard call operations targeting those functions using these conversion rules.

#### Multi-dimensional Vector Types

LLVM IR only supports *one-dimensional* vectors, unlike MLIR where vectors can
be multi-dimensional. Vector types cannot be nested in either IR. In the
one-dimensional case, MLIR vectors are converted to LLVM IR vectors of the same
size with element type converted using these conversion rules. In the
n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types
of one-dimensional vectors.

Examples:

```mlir
vector<4x8 x f32>
// ->
!llvm.array<4 x vector<8 x f32>>

memref<2 x vector<4x8 x f32>>
// ->
!llvm.struct<(ptr<array<4 x vector<8xf32>>>, ptr<array<4 x vector<8xf32>>>,
              i64, array<1 x i64>, array<1 x i64>)>
```

#### Tensor Types

Tensor types cannot be converted to the LLVM dialect. Operations on tensors must
be [bufferized](Bufferization.md) before being converted.

### Calling Conventions

Calling conventions provide a mechanism to customize the conversion of function
and function call operations without changing how individual types are handled
elsewhere. They are implemented simultaneously by the default type converter and
by the conversion patterns for the relevant operations.

#### Function Result Packing

In the case of multi-result functions, the returned values are inserted into a
structure-typed value before being returned and extracted from it at the call
site. This transformation is a part of the conversion and is transparent to the
definitions and uses of the values being returned.

Example:

```mlir
func.func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) {
  return %arg0, %arg1 : i32, i64
}
func.func @bar() {
  %0 = arith.constant 42 : i32
  %1 = arith.constant 17 : i64
  %2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64)
  "use_i32"(%2#0) : (i32) -> ()
  "use_i64"(%2#1) : (i64) -> ()
}

// is transformed into

llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> {
  // insert the values into a structure
  %0 = llvm.mlir.undef : !llvm.struct<(i32, i64)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)>

  // return the structure value
  llvm.return %2 : !llvm.struct<(i32, i64)>
}
llvm.func @bar() {
  %0 = llvm.mlir.constant(42 : i32) : i32
  %1 = llvm.mlir.constant(17 : i64) : i64

  // call and extract the values from the structure
  %2 = llvm.call @foo(%0, %1)
     : (i32, i64) -> !llvm.struct<(i32, i64)>
  %3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)>
  %4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)>

  // use as before
  "use_i32"(%3) : (i32) -> ()
  "use_i64"(%4) : (i64) -> ()
}
```

#### Default Calling Convention for Ranked MemRef

The default calling convention converts `memref`-typed function arguments to
LLVM dialect literal structs
[defined above](TargetLLVMIR.md#ranked-memref-types) before unbundling them into
individual scalar arguments.

This convention is implemented in the conversion of `func.func` and `func.call`
to the LLVM dialect, with the former unpacking the descriptor into a set of
individual values and the latter packing those values back into a descriptor so
as to make it transparently usable by other operations. Conversions from other
dialects should take this convention into account.

This specific convention is motivated by the necessity to specify alignment and
aliasing attributes on the raw pointers underpinning the memref.

Examples:

```mlir
func.func @foo(%arg0: memref<?xf32>) -> () {
  "use"(%arg0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)>

llvm.func @foo(%arg0: !llvm.ptr<f32>,  // Allocated pointer.
               %arg1: !llvm.ptr<f32>,  // Aligned pointer.
               %arg2: i64,             // Offset.
               %arg3: i64,             // Size in dim 0.
               %arg4: i64) {           // Stride in dim 0.
  // Populate memref descriptor structure.
  %0 = llvm.mlir.undef : !llvm.memref_1d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d
  %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d

  // Descriptor is now usable as a single value.
  "use"(%5) : (!llvm.memref_1d) -> ()
  llvm.return
}
```

```mlir
func.func @bar() {
  %0 = "get"() : () -> (memref<?xf32>)
  call @foo(%0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)>

llvm.func @bar() {
  %0 = "get"() : () -> !llvm.memref_1d

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_1d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_1d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_1d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d
  %5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2, %3, %4, %5)
     : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64) -> ()
  llvm.return
}
```

#### Default Calling Convention for Unranked MemRef

For unranked memrefs, the list of function arguments always contains two
elements, same as the unranked memref descriptor: an integer rank, and a
type-erased (`!llvm.ptr<i8>`) pointer to the ranked memref descriptor. Note that
while the *calling convention* does not require allocation, *casting* to an
unranked memref does since one cannot take an address of an SSA value containing
the ranked memref, which must be stored in some memory instead. The caller is in
charge of ensuring thread safety and management of the allocated memory, in
particular the deallocation.

Example:

```mlir
func.func @foo(%arg0: memref<*xf32>) -> () {
  "use"(%arg0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @foo(%arg0: i64,             // Rank.
               %arg1: !llvm.ptr<i8>) { // Type-erased pointer to descriptor.
  // Pack the unranked memref descriptor.
  %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr<i8>)>

  "use"(%2) : (!llvm.struct<(i64, ptr<i8>)>) -> ()
  llvm.return
}
```

```mlir
func.func @bar() {
  %0 = "get"() : () -> (memref<*xf32>)
  call @foo(%0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @bar() {
  %0 = "get"() : () -> (!llvm.struct<(i64, ptr<i8>)>)

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)>

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2) : (i64, !llvm.ptr<i8>) -> ()
  llvm.return
}
```

**Lifetime.** The second element of the unranked memref descriptor points to
some memory in which the ranked memref descriptor is stored. By convention, this
memory is allocated on stack and has the lifetime of the function. (*Note:* due
to function-length lifetime, creation of multiple unranked memref descriptors,
e.g., in a loop, may lead to stack overflows.) If an unranked descriptor has to
be returned from a function, the ranked descriptor it points to is copied into
dynamically allocated memory, and the pointer in the unranked descriptor is
updated accordingly. The allocation happens immediately before returning. It is
the responsibility of the caller to free the dynamically allocated memory. The
default conversion of `func.call` and `func.call_indirect` copies the ranked
descriptor to newly allocated memory on the caller's stack. Thus, the convention
of the ranked memref descriptor pointed to by an unranked memref descriptor
being stored on stack is respected.

#### Bare Pointer Calling Convention for Ranked MemRef

The "bare pointer" calling convention converts `memref`-typed function arguments
to a *single* pointer to the aligned data. Note that this does *not* apply to
uses of `memref` outside of function signatures; the default descriptor
structures are still used. This convention further restricts the supported cases
to the following.

-   `memref` types with default layout.
-   `memref` types with all dimensions statically known.
-   `memref` values allocated in such a way that the allocated and aligned
    pointer match. Alternatively, the same function must handle allocation and
    deallocation since only one pointer is passed to any callee.

Examples:

```mlir
func.func @callee(memref<2x4xf32>)

func.func @caller(%0 : memref<2x4xf32>) {
  call @callee(%0) : (memref<2x4xf32>) -> ()
}

// ->

!descriptor = !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                            array<2xi64>, array<2xi64>)>

llvm.func @callee(!llvm.ptr<f32>)

llvm.func @caller(%arg0: !llvm.ptr<f32>) {
  // A descriptor value is defined at the function entry point.
  %0 = llvm.mlir.undef : !descriptor

  // Both the allocated and aligned pointers are set up to the same value.
  %1 = llvm.insertvalue %arg0, %0[0] : !descriptor
  %2 = llvm.insertvalue %arg0, %1[1] : !descriptor

  // The offset is set up to zero.
  %3 = llvm.mlir.constant(0 : index) : i64
  %4 = llvm.insertvalue %3, %2[2] : !descriptor

  // The sizes and strides are derived from the statically known values.
  %5 = llvm.mlir.constant(2 : index) : i64
  %6 = llvm.mlir.constant(4 : index) : i64
  %7 = llvm.insertvalue %5, %4[3, 0] : !descriptor
  %8 = llvm.insertvalue %6, %7[3, 1] : !descriptor
  %9 = llvm.mlir.constant(1 : index) : i64
  %10 = llvm.insertvalue %6, %8[4, 0] : !descriptor
  %11 = llvm.insertvalue %9, %10[4, 1] : !descriptor

  // The function call corresponds to extracting the aligned data pointer.
  %12 = llvm.extractvalue %11[1] : !descriptor
  llvm.call @callee(%12) : (!llvm.ptr<f32>) -> ()
}
```

#### Bare Pointer Calling Convention For Unranked MemRef

The "bare pointer" calling convention does not support unranked memrefs as their
shape cannot be known at compile time.

### Generic allocation and deallocation functions

When converting the Memref dialect, allocations and deallocations are converted
into calls to `malloc` (`aligned_alloc` if aligned allocations are requested)
and `free`. However, it is possible to convert them to more generic functions
which can be implemented by a runtime library, thus allowing custom allocation
strategies or runtime profiling. When the conversion pass is instructed to
perform such an operation, the names of the callees are `_mlir_alloc`,
`_mlir_aligned_alloc` and `_mlir_free`. Their signatures are the same as those
of `malloc`, `aligned_alloc` and `free`.

### C-compatible wrapper emission

In practical cases, it may be desirable to have externally-facing functions with
a single argument corresponding to a MemRef argument. When interfacing with
LLVM IR produced from C, the code needs to respect the corresponding calling
convention. The conversion to the LLVM dialect provides an option to generate
wrapper functions that take memref descriptors as pointers-to-struct compatible
with data types produced by Clang when compiling C sources. The generation of
such wrapper functions can additionally be controlled at a function granularity
by setting the `llvm.emit_c_interface` unit attribute.

More specifically, a memref argument is converted into a pointer-to-struct
argument of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function, where
`T` is the converted element type and `N` is the memref rank. This type is
compatible with that produced by Clang for the following C++ structure template
instantiations or their equivalents in C.

```cpp
template<typename T, size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  intptr_t offset;
  intptr_t sizes[N];
  intptr_t strides[N];
};
```

Furthermore, we also rewrite function results to pointer parameters if the
rewritten function result has a struct type. The special result parameter is
added as the first parameter and is of pointer-to-struct type.

If enabled, the option will do the following. For *external* functions declared
in the MLIR module:

1.  Declare a new function `_mlir_ciface_<original name>` where memref arguments
    are converted to pointer-to-struct and the remaining arguments are converted
    as usual. Results are converted to a special argument if they are of struct
    type.
2.  Add a body to the original function (making it non-external) that
    1.  allocates memref descriptors,
    2.  populates them,
    3.  potentially allocates space for the result struct, and
    4.  passes the pointers to these into the newly declared interface function,
        then
    5.  collects the result of the call (potentially from the result struct),
        and
    6.  returns it to the caller.

For (non-external) functions defined in the MLIR module:

1.  Define a new function `_mlir_ciface_<original name>` where memref arguments
    are converted to pointer-to-struct and the remaining arguments are converted
    as usual. Results are converted to a special argument if they are of struct
    type.
2.  Populate the body of the newly defined function with IR that
    1.  loads descriptors from pointers;
    2.  unpacks descriptors into individual non-aggregate values;
    3.  passes these values into the original function;
    4.  collects the results of the call, and
    5.  either copies the results into the result struct or returns them to the
        caller.

Examples:

```mlir
func.func @qux(%arg0: memref<?x?xf32>)

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>

// Function with unpacked arguments.
llvm.func @qux(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  // Populate memref descriptor (as per calling convention).
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d

  // Store the descriptor in a stack-allocated space.
  %8 = llvm.mlir.constant(1 : index) : i64
  %9 = llvm.alloca %8 x !llvm.memref_2d
     : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                  array<2xi64>, array<2xi64>)>>
  llvm.store %7, %9 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                        array<2xi64>, array<2xi64>)>>

  // Call the interface function.
  llvm.call @_mlir_ciface_qux(%9)
     : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                          array<2xi64>, array<2xi64>)>>) -> ()

  // The stored descriptor will be freed on return.
  llvm.return
}

// Interface function.
llvm.func @_mlir_ciface_qux(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                              array<2xi64>, array<2xi64>)>>)
```

```mlir
func.func @foo(%arg0: memref<?x?xf32>) {
  return
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  llvm.return
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr) {
  // Load the descriptor.
  %0 = llvm.load %arg0 : !llvm.memref_2d_ptr

  // Unpack the descriptor as per calling convention.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
    : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64,
       i64, i64) -> ()
  llvm.return
}
```

```mlir
func.func @foo(%arg0: memref<?x?xf32>) -> memref<?x?xf32> {
  return %arg0 : memref<?x?xf32>
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>, %arg2: i64,
               %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64)
    -> !llvm.memref_2d {
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
  llvm.return %7 : !llvm.memref_2d
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr, %arg1: !llvm.memref_2d_ptr) {
  %0 = llvm.load %arg1 : !llvm.memref_2d_ptr
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  %8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
    : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64, i64, i64) -> !llvm.memref_2d
  llvm.store %8, %arg0 : !llvm.memref_2d_ptr
  llvm.return
}
```
754
Rationale: Introducing auxiliary functions for C-compatible interfaces is
preferred to modifying the calling convention since it minimizes the effect of
C compatibility on intra-module calls and on calls between MLIR-generated
functions. In particular, when calling external functions from an MLIR module
in a (parallel) loop, storing a memref descriptor on the stack can lead to
stack exhaustion and/or concurrent accesses to the same address. The auxiliary
interface function serves as an allocation scope in this case. Furthermore,
when targeting accelerators with separate memory spaces, such as GPUs,
stack-allocated descriptors passed by pointer would have to be transferred to
device memory, which introduces significant overhead. In such situations,
auxiliary interface functions are executed on the host and only pass the values
through the device function invocation mechanism.

Limitation: Currently, C interfaces cannot be generated for variadic functions,
whether external or not, because a C function has no way to forward its
variadic arguments directly:

```c
void bar(int, ...);

void foo(int x, ...) {
  // ERROR: no way to forward variadic arguments.
  bar(x, ...);
}
```

### Address Computation

Accesses to a memref element are transformed into accesses to an element of the
buffer pointed to by the descriptor. The position of the element in the buffer
is calculated by linearizing memref indices in row-major order (the lexically
first index is the slowest varying, similar to C, but accounting for strides).
The computation of the linear address is emitted as arithmetic operations in
the LLVM IR dialect. Strides are extracted from the memref descriptor.

Examples:

An access to a memref with indices:

```mlir
%0 = memref.load %m[%1, %2, %3, %4] : memref<?x?x4x8xf32, offset: ?>
```

is transformed into the equivalent of the following code:

```mlir
// Compute the linearized index from strides.
// When strides or, in absence of explicit strides, the corresponding sizes are
// dynamic, extract the stride value from the descriptor.
%stride1 = llvm.extractvalue[4, 0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                   array<4xi64>, array<4xi64>)>
%addr1 = arith.muli %stride1, %1 : i64

// When the stride or, in absence of explicit strides, the trailing sizes are
// known statically, this value is used as a constant. The natural value of
// strides is the product of all sizes following the current dimension.
%stride2 = llvm.mlir.constant(32 : index) : i64
%addr2 = arith.muli %stride2, %2 : i64
%addr3 = arith.addi %addr1, %addr2 : i64

%stride3 = llvm.mlir.constant(8 : index) : i64
%addr4 = arith.muli %stride3, %3 : i64
%addr5 = arith.addi %addr3, %addr4 : i64

// Multiplication with the known unit stride can be omitted.
%addr6 = arith.addi %addr5, %4 : i64

// If the linear offset is known to be zero, it can also be omitted. If it is
// dynamic, it is extracted from the descriptor.
%offset = llvm.extractvalue[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                               array<4xi64>, array<4xi64>)>
%addr7 = arith.addi %addr6, %offset : i64

// All accesses are based on the aligned pointer.
%aligned = llvm.extractvalue[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                array<4xi64>, array<4xi64>)>

// Get the address of the accessed element.
%ptr = llvm.getelementptr %aligned[%addr7]
     : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<4xi64>, array<4xi64>)>
     -> !llvm.ptr<f32>

// Perform the actual load.
%0 = llvm.load %ptr : !llvm.ptr<f32>
```

For stores, the address computation code is identical and only the actual store
operation is different.

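For illustration, a `memref.store` to the same memref reuses the address
computation above verbatim; only the terminal operation changes (a sketch, with
`%ptr` computed as above and `%value` denoting the stored operand):

```mlir
// Same address computation as for the load, then:
llvm.store %value, %ptr : !llvm.ptr<f32>
```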
Note: the conversion does not perform any sort of common subexpression
elimination when emitting memref accesses.

### Utility Classes

Utility classes common to many conversions to the LLVM dialect can be found
under `lib/Conversion/LLVMCommon`. They include the following.

-   `LLVMConversionTarget` specifies all LLVM dialect operations as legal.
-   `LLVMTypeConverter` implements the default type conversion as described
    above.
-   `ConvertOpToLLVMPattern` extends the conversion pattern class with LLVM
    dialect-specific functionality.
-   `VectorConvertOpToLLVMPattern` extends the previous class to automatically
    unroll operations on higher-dimensional vectors into lists of operations on
    one-dimensional vectors.
-   `StructBuilder` provides a convenient API for building IR that creates or
    accesses values of LLVM dialect structure types; it is subclassed by
    `MemRefDescriptor`, `UnrankedMemRefDescriptor` and `ComplexStructBuilder`
    for the built-in types convertible to LLVM dialect structure types.

## Translation to LLVM IR

MLIR modules containing `llvm.func`, `llvm.mlir.global` and `llvm.metadata`
operations can be translated to LLVM IR modules using the following scheme.

-   Module-level globals are translated to LLVM IR global values.
-   Module-level metadata are translated to LLVM IR metadata, which can be later
    augmented with additional metadata defined on specific ops.
-   All functions are declared in the module so that they can be referenced.
-   Each function is then translated separately and has access to the complete
    mappings between MLIR and LLVM IR globals, metadata, and functions.
-   Within a function, blocks are traversed in topological order and translated
    to LLVM IR basic blocks. In each basic block, PHI nodes are created for each
    of the block arguments, but are not yet connected to their source blocks.
-   Within each block, operations are translated in order. Each operation
    has access to the same mappings as the function and additionally to the
    mapping of values between MLIR and LLVM IR, including PHI nodes. Operations
    with regions are responsible for translating the regions they contain.
-   After all operations in a function are translated, the PHI nodes of blocks
    in this function are connected to their source values, which are now
    available.

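As an illustration of the PHI-node handling above, consider the following
function (hypothetical, for illustration only). The block argument `%v` is
first created as an unconnected PHI node; it is wired to `%a` and `%b` only
after the whole function body has been translated, producing LLVM IR along the
lines of `%v = phi i32 [ %a, ... ], [ %b, ... ]`:

```mlir
llvm.func @select_value(%cond: i1, %a: i32, %b: i32) -> i32 {
  llvm.cond_br %cond, ^bb1, ^bb2
^bb1:
  llvm.br ^bb3(%a : i32)
^bb2:
  llvm.br ^bb3(%b : i32)
// The block argument %v becomes a PHI node in the translated LLVM IR.
^bb3(%v: i32):
  llvm.return %v : i32
}
```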
The translation mechanism provides extension hooks for translating custom
operations to LLVM IR via the dialect interface
`LLVMTranslationDialectInterface`:

-   `convertOperation` translates an operation that belongs to the current
    dialect to LLVM IR given an `IRBuilderBase` and various mappings;
-   `amendOperation` performs additional actions on an operation if it contains
    a dialect attribute that belongs to the current dialect, for example, sets
    up instruction-level metadata.

Dialects containing operations or attributes that are to be translated to LLVM
IR must provide an implementation of this interface and register it with the
system. Note that registration may happen without creating the dialect, for
example, in a separate library, to avoid the need for the "main" dialect
library to depend on LLVM IR libraries. The implementations of these methods
may use the
[`ModuleTranslation`](https://mlir.llvm.org/doxygen/classmlir_1_1LLVM_1_1ModuleTranslation.html)
object provided to them, which holds the state of the translation and contains
numerous utilities.

Note that this extension mechanism is *intentionally restrictive*. LLVM IR has
a small, relatively stable set of instructions and types that MLIR intends to
model fully. Therefore, the extension mechanism is provided only for the LLVM
IR constructs that are extended most often: intrinsics and metadata. The
primary goal of the extension mechanism is to support sets of intrinsics, for
example those representing a particular instruction set. The extension
mechanism does not allow for customizing type or block translation, nor does it
support custom module-level operations. Such transformations should be
performed within MLIR and target the corresponding MLIR constructs.

## Translation from LLVM IR

An experimental flow allows one to import a substantially limited subset of
LLVM IR into MLIR, producing LLVM dialect operations.

```
  mlir-translate -import-llvm filename.ll
```