# 'llvm' Dialect

This dialect maps [LLVM IR](https://llvm.org/docs/LangRef.html) into MLIR by
defining the corresponding operations and types. LLVM IR metadata is usually
represented as MLIR attributes, which offer additional structure verification.

We use "LLVM IR" to designate the
[intermediate representation of LLVM](https://llvm.org/docs/LangRef.html) and
"LLVM _dialect_" or "LLVM IR _dialect_" to refer to this MLIR dialect.

Unless explicitly stated otherwise, the semantics of the LLVM dialect operations
must correspond to the semantics of LLVM IR instructions and any divergence is
considered a bug. The dialect also contains auxiliary operations that smooth
over the differences in the IR structure, e.g., MLIR does not have `phi`
operations and LLVM IR does not have a `constant` operation. These auxiliary
operations are systematically prefixed with `mlir`, e.g. `llvm.mlir.constant`
where `llvm.` is the dialect namespace prefix.

[TOC]

## Dependency on LLVM IR

The LLVM dialect is not expected to depend on any object that requires an
`LLVMContext`, such as an LLVM IR instruction or type. Instead, MLIR provides
thread-safe alternatives compatible with the rest of the infrastructure. The
dialect is allowed to depend on the LLVM IR objects that don't require a
context, such as data layout and triple description.

## Module Structure

IR modules use the built-in MLIR `ModuleOp` and support all its features. In
particular, modules can be named, nested and are subject to symbol visibility.
Modules can contain any operations, including LLVM functions and globals.

### Data Layout and Triple

An IR module may have an optional data layout and triple information attached
using MLIR attributes `llvm.data_layout` and `llvm.target_triple`, respectively.
Both are string attributes with the
[same syntax](https://llvm.org/docs/LangRef.html#data-layout) as in LLVM IR and
are verified to be correct. They can be defined as follows.

```mlir
module attributes {llvm.data_layout = "e",
                   llvm.target_triple = "aarch64-linux-android"} {
  // module contents
}
```

### Functions

LLVM functions are represented by a special operation, `llvm.func`, that has
syntax similar to that of the built-in function operation but supports
LLVM-related features such as linkage and variadic argument lists. See the
detailed description in the operation list
[below](#llvmfunc-mlirllvmllvmfuncop).
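
As a rough illustration of the features mentioned above, the following sketch
declares a variadic external function and defines a function with `internal`
linkage. The symbol names `@printf` and `@add` are chosen for illustration
only, and the typed-pointer spelling `!llvm.ptr<i8>` assumes the pointer syntax
described later in this document.

```mlir
// External declaration of a variadic function (no body).
llvm.func @printf(!llvm.ptr<i8>, ...) -> i32

// Definition with internal linkage; the linkage keyword precedes the symbol.
llvm.func internal @add(%a: i32, %b: i32) -> i32 {
  %sum = llvm.add %a, %b : i32
  llvm.return %sum : i32
}
```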

### PHI Nodes and Block Arguments

MLIR uses block arguments instead of PHI nodes to communicate values between
blocks. Therefore, the LLVM dialect has no operation directly equivalent to
`phi` in LLVM IR. Instead, all terminators can pass values as successor
operands; these values are forwarded as block arguments when the control flow
is transferred.

For example:

```mlir
^bb1:
  %0 = llvm.add %arg0, %cst : i32
  llvm.br ^bb2(%0 : i32)

// If the control flow comes from ^bb1, %arg1 == %0.
^bb2(%arg1: i32):
  // ...
```

is equivalent to LLVM IR

```llvm
bb1:
  %0 = add i32 %arg0, %cst
  br label %bb2

bb2:
  %arg1 = phi i32 [ %0, %bb1 ] ; other incoming values elided
```

Since there is no need to use the block identifier to differentiate the source
of different values, the LLVM dialect supports terminators that transfer the
control flow to the same block with different arguments. For example:

```mlir
^bb1:
  llvm.cond_br %cond, ^bb2(%0 : i32), ^bb2(%1 : i32)

^bb2(%arg0: i32):
  // ...
```

### Context-Level Values

Some value kinds in LLVM IR, such as constants and undefs, are uniqued in
context and used directly in relevant operations. MLIR does not support such
values for thread-safety and concept parsimony reasons. Instead, regular values
are produced by dedicated operations that have the corresponding semantics:
[`llvm.mlir.constant`](#llvmmlirconstant-mlirllvmconstantop),
[`llvm.mlir.undef`](#llvmmlirundef-mlirllvmundefop),
[`llvm.mlir.null`](#llvmmlirnull-mlirllvmnullop). Note how these operations are
prefixed with `mlir.` to indicate that they don't belong to LLVM IR but are only
necessary to model it in MLIR. The values produced by these operations are
usable just like any other value.

Examples:

```mlir
// Create an undefined value of structure type with a 32-bit integer followed
// by a float.
%0 = llvm.mlir.undef : !llvm.struct<(i32, f32)>

// Null pointer to i8.
%1 = llvm.mlir.null : !llvm.ptr<i8>

// Null pointer to a function with signature void().
%2 = llvm.mlir.null : !llvm.ptr<func<void ()>>

// Constant 42 as i32.
%3 = llvm.mlir.constant(42 : i32) : i32

// Splat dense vector constant.
%4 = llvm.mlir.constant(dense<1.0> : vector<4xf32>) : vector<4xf32>
```

Note that constants list the type twice. This is an artifact of the LLVM dialect
not using built-in types, which are used for typed MLIR attributes. The syntax
will be reevaluated after considering composite constants.

### Globals

Global variables are also defined using a special operation,
[`llvm.mlir.global`](#llvmmlirglobal-mlirllvmglobalop), located at the module
level. Globals are MLIR symbols and are identified by their name.

Since functions need to be isolated-from-above, i.e. values defined outside the
function cannot be directly used inside the function, an additional operation,
[`llvm.mlir.addressof`](#llvmmliraddressof-mlirllvmaddressofop), is provided to
locally define a value containing the _address_ of a global. The actual value
can then be loaded from that pointer, or a new value can be stored into it if
the global is not declared constant. This is similar to LLVM IR where globals
are accessed through name and have a pointer type.
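
The access pattern described above could look roughly as follows. This is a
minimal sketch: the names `@counter` and `@bump_counter` are hypothetical, and
the load and store spellings assume the typed-pointer syntax used in this
document (versions of MLIR that use opaque pointers spell these types
differently).

```mlir
// A mutable module-level global initialized to 0.
llvm.mlir.global internal @counter(0 : i32) : i32

llvm.func @bump_counter() -> i32 {
  // Materialize the address of the global as a local SSA value.
  %addr = llvm.mlir.addressof @counter : !llvm.ptr<i32>
  // Read the current value through the pointer.
  %old = llvm.load %addr : !llvm.ptr<i32>
  %one = llvm.mlir.constant(1 : i32) : i32
  %new = llvm.add %old, %one : i32
  // Storing is allowed because the global is not declared constant.
  llvm.store %new, %addr : !llvm.ptr<i32>
  llvm.return %new : i32
}
```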

### Linkage

Module-level named objects in the LLVM dialect, namely functions and globals,
have an optional _linkage_ attribute derived from LLVM IR
[linkage types](https://llvm.org/docs/LangRef.html#linkage-types). Linkage is
specified by the same keyword as in LLVM IR and is located between the operation
name (`llvm.func` or `llvm.mlir.global`) and the symbol name. If no linkage
keyword is present, `external` linkage is assumed by default. Linkage is
_distinct_ from MLIR symbol visibility.
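
For illustration, the placement of the linkage keyword might look like this.
The symbol names are hypothetical; the keywords `internal` and `private` are
taken from the LLVM IR linkage types.

```mlir
// No keyword: external linkage is assumed.
llvm.func @visible_everywhere()

// `internal` linkage on a function definition.
llvm.func internal @helper() {
  llvm.return
}

// `private` linkage on a constant global.
llvm.mlir.global private constant @answer(42 : i32) : i32
```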

### Attribute Pass-Through

The LLVM dialect provides a mechanism to forward function-level attributes to
LLVM IR using the `passthrough` attribute. This is an array attribute containing
either string attributes or array attributes. In the former case, the value of
the string is interpreted as the name of an LLVM IR function attribute. In the
latter case, the array is expected to contain exactly two string attributes, the
first corresponding to the name of an LLVM IR function attribute, and the second
corresponding to its value. Note that even integer LLVM IR function attributes
have their value represented in the string form.

Example:

```mlir
llvm.func @func() attributes {
  passthrough = ["noinline",           // value-less attribute
                 ["alignstack", "4"],  // integer attribute with value
                 ["other", "attr"]]    // attribute unknown to LLVM
} {
  llvm.return
}
```

If the attribute is not known to LLVM IR, it will be attached as a string
attribute.

## Types

The LLVM dialect uses built-in types whenever possible and defines a set of
complementary types, which correspond to the LLVM IR types that cannot be
directly represented with built-in types. Similarly to other MLIR context-owned
objects, the creation and manipulation of LLVM dialect types is thread-safe.

MLIR does not support module-scoped named type declarations, e.g. `%s = type
{i32, i32}` in LLVM IR. Instead, types must be fully specified at each use,
except for recursive types where only the first reference to a named type needs
to be fully specified. MLIR [type aliases](../LangRef.md/#type-aliases) can be
used to achieve more compact syntax.
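
As a sketch of how a type alias can shorten repeated type spellings, consider
the following. The alias name `!pair` and the function names are hypothetical,
and the exact alias declaration syntax depends on the MLIR version (older
versions spell it `!pair = type ...`).

```mlir
// Alias for a literal struct of two 32-bit integers.
!pair = !llvm.struct<(i32, i32)>

// The alias can be used wherever the full type would be written.
llvm.func @first(!pair) -> i32
llvm.func @swap(!pair) -> !pair
```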

The general syntax of LLVM dialect types is `!llvm.`, followed by a type kind
identifier (e.g., `ptr` for pointer or `struct` for structure) and by an
optional list of type parameters in angle brackets. The dialect follows MLIR
style for types with nested angle brackets and keyword specifiers rather than
using different bracket styles to differentiate types. Types inside the angle
brackets may omit the `!llvm.` prefix for brevity: the parser first attempts to
find a type (starting with `!` or a built-in type) and falls back to accepting a
keyword. For example, `!llvm.ptr<!llvm.ptr<i32>>` and `!llvm.ptr<ptr<i32>>` are
equivalent, with the latter being the canonical form, and denote a pointer to a
pointer to a 32-bit integer.

### Built-in Type Compatibility

The LLVM dialect accepts a subset of built-in types that are referred to as
_LLVM dialect-compatible types_. The following types are compatible:

-   Signless integers - `iN` (`IntegerType`).
-   Floating point types - `bf16`, `f16`, `f32`, `f64`, `f80`, `f128`
    (`FloatType`).
-   1D vectors of signless integers or floating point types - `vector<NxT>`
    (`VectorType`).

Note that only a subset of types that can be represented by a given class is
compatible. For example, signed and unsigned integers are not compatible. The
LLVM dialect provides a function, `bool LLVM::isCompatibleType(Type)`, that can
be used as a compatibility check.

Each LLVM IR type corresponds to *exactly one* MLIR type, either built-in or
LLVM dialect type. For example, because `i32` is LLVM-compatible, there is no
`!llvm.i32` type. However, `!llvm.ptr<T>` is defined in the LLVM dialect as
there is no corresponding built-in type.

### Additional Simple Types

The following non-parametric types derived from the LLVM IR are available in the
LLVM dialect:

-   `!llvm.x86_mmx` (`LLVMX86MMXType`) - value held in an MMX register on x86
    machine.
-   `!llvm.ppc_fp128` (`LLVMPPCFP128Type`) - 128-bit floating-point value (two
    64-bit values).
-   `!llvm.token` (`LLVMTokenType`) - a non-inspectable value associated with an
    operation.
-   `!llvm.metadata` (`LLVMMetadataType`) - LLVM IR metadata, to be used only if
    the metadata cannot be represented as structured MLIR attributes.
-   `!llvm.void` (`LLVMVoidType`) - does not represent any value; can only
    appear in function results.

These types represent a single value (or an absence thereof in case of `void`)
and correspond to their LLVM IR counterparts.

### Additional Parametric Types

These types are parameterized by the types they contain, e.g., the pointee or
the element type, which can be either compatible built-in or LLVM dialect types.

#### Pointer Types

Pointer types specify an address in memory.

Both opaque and type-parameterized pointer types are supported.
[Opaque pointers](https://llvm.org/docs/OpaquePointers.html) do not indicate the
type of the data pointed to, and are intended to simplify LLVM IR by encoding
behavior relevant to the pointee type into operations rather than into types.
Non-opaque pointer types carry the pointee type as a type parameter. Both kinds
of pointers may be additionally parameterized by an address space. The address
space is an integer, but this choice may be reconsidered if MLIR implements
named address spaces. The syntax of pointer types is as follows:

```
  llvm-ptr-type ::= `!llvm.ptr` (`<` integer-literal `>`)?
                  | `!llvm.ptr<` type (`,` integer-literal)? `>`
```

where the former case is the opaque pointer type and the latter case is the
non-opaque pointer type; the optional group containing the integer literal
corresponds to the address space. All cases are represented by
`LLVMPointerType` internally.
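
A few type spellings that follow from the grammar above, listed as an
illustrative sketch; the address space numbers are arbitrary.

```mlir
!llvm.ptr            // opaque pointer in the default address space
!llvm.ptr<3>         // opaque pointer in address space 3
!llvm.ptr<i8>        // pointer to an 8-bit integer
!llvm.ptr<f32, 1>    // pointer to a 32-bit float in address space 1
!llvm.ptr<ptr<i8>>   // pointer to a pointer to an 8-bit integer
```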

#### Array Types

Array types represent sequences of elements in memory. Array elements can be
addressed with a value unknown at compile time. Array types are always
one-dimensional, but they can be nested, i.e. the element type of an array may
itself be an array.

Array types are parameterized by the fixed size and the element type.
Syntactically, their representation is the following:

```
  llvm-array-type ::= `!llvm.array<` integer-literal `x` type `>`
```

and they are internally represented as `LLVMArrayType`.
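
The following spellings, derived from the grammar above, sketch how array types
are written and nested; the sizes are arbitrary.

```mlir
!llvm.array<4 x i32>               // array of 4 32-bit integers
!llvm.array<8 x ptr<i8>>           // array of 8 pointers to 8-bit integers
!llvm.array<2 x array<3 x f32>>    // array of 2 arrays of 3 32-bit floats
```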

#### Function Types

Function types represent the type of a function, i.e. its signature.

Function types are parameterized by the result type, the list of argument types
and by an optional "variadic" flag. Unlike built-in `FunctionType`, LLVM dialect
functions (`LLVMFunctionType`) always have a single result, which may be
`!llvm.void` if the function does not return anything. The syntax is as follows:

```
  llvm-func-type ::= `!llvm.func<` type `(` type-list (`,` `...`)? `)` `>`
```

For example,

```mlir
!llvm.func<void ()>           // a function with no arguments;
!llvm.func<i32 (f32, i32)>    // a function with two arguments and a result;
!llvm.func<void (i32, ...)>   // a variadic function with at least one argument.
```

In the LLVM dialect, functions are not first-class objects and one cannot have a
value of function type. Instead, one can take the address of a function and
operate on pointers to functions.
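
Taking the address of a function and calling through the resulting pointer
might look as follows. This is a sketch under the typed-pointer syntax used in
this document; the names `@callee` and `@caller` are hypothetical, and MLIR
versions that use opaque pointers spell the indirect call differently.

```mlir
llvm.func @callee(i32) -> i32

llvm.func @caller(%arg: i32) -> i32 {
  // The function itself is not a value; its address is.
  %fptr = llvm.mlir.addressof @callee : !llvm.ptr<func<i32 (i32)>>
  // Indirect call through the function pointer.
  %res = llvm.call %fptr(%arg) : (i32) -> i32
  llvm.return %res : i32
}
```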

### Vector Types

Vector types represent sequences of elements, typically when multiple data
elements are processed by a single instruction (SIMD). Vectors are thought of as
stored in registers and therefore vector elements can only be addressed through
constant indices.

Vector types are parameterized by the size, which may be either _fixed_ or a
multiple of some fixed size in case of _scalable_ vectors, and the element type.
Vectors cannot be nested and only 1D vectors are supported. Scalable vectors are
still considered 1D.

The LLVM dialect uses built-in vector types for _fixed_-size vectors of built-in
types, and provides additional types for fixed-size vectors of LLVM dialect
types (`LLVMFixedVectorType`) and scalable vectors of any types
(`LLVMScalableVectorType`). These two additional types share the following
syntax:

```
  llvm-vec-type ::= `!llvm.vec<` (`?` `x`)? integer-literal `x` type `>`
```

Note that the sets of element types supported by built-in and LLVM dialect
vector types are mutually exclusive, e.g., the built-in vector type does not
accept `!llvm.ptr<i32>` and the LLVM dialect fixed-size vector type does not
accept `i32`.

The following functions are provided to operate on any kind of the vector types
compatible with the LLVM dialect:

-   `bool LLVM::isCompatibleVectorType(Type)` - checks whether a type is a
    vector type compatible with the LLVM dialect;
-   `Type LLVM::getVectorElementType(Type)` - returns the element type of any
    vector type compatible with the LLVM dialect;
-   `llvm::ElementCount LLVM::getVectorNumElements(Type)` - returns the number
    of elements in any vector type compatible with the LLVM dialect;
-   `Type LLVM::getFixedVectorType(Type, unsigned)` - gets a fixed vector type
    with the given element type and size; the resulting type is either a
    built-in or an LLVM dialect vector type depending on which one supports the
    given element type.

#### Examples of Compatible Vector Types

```mlir
vector<42 x i32>                   // Vector of 42 32-bit integers.
!llvm.vec<42 x ptr<i32>>           // Vector of 42 pointers to 32-bit integers.
!llvm.vec<? x 4 x i32>             // Scalable vector of 32-bit integers with
                                   // size divisible by 4.
!llvm.array<2 x vector<2 x i32>>   // Array of 2 vectors of 2 32-bit integers.
!llvm.array<2 x vec<2 x ptr<i32>>> // Array of 2 vectors of 2 pointers to 32-bit
                                   // integers.
```

### Structure Types

The structure type is used to represent a collection of data members together in
memory. The elements of a structure may be any type that has a size.

Structure types are represented in a single dedicated class
`mlir::LLVM::LLVMStructType`. Internally, the struct type stores a (potentially
empty) name, a (potentially empty) list of contained types and a bitmask
indicating whether the struct is named, opaque, packed or uninitialized.
Structure types that don't have a name are referred to as _literal_ structs.
Such structures are uniquely identified by their contents. _Identified_ structs
on the other hand are uniquely identified by the name.

#### Identified Structure Types

Identified structure types are uniqued using their name in a given context.
Attempting to construct an identified structure with the same name as a
structure that already exists in the context *will result in the existing
structure being returned*. **MLIR does not auto-rename identified structs in
case of name conflicts** because there is no naming scope equivalent to a module
in LLVM IR since MLIR modules can be arbitrarily nested.

Programmatically, identified structures can be constructed in an _uninitialized_
state. In this case, they are given a name but the body must be set up by a
later call, using MLIR's type mutation mechanism. Such uninitialized types can
be used in type construction, but must be eventually initialized for IR to be
valid. This mechanism allows for constructing _recursive_ or mutually referring
structure types: an uninitialized type can be used in its own initialization.

Once the type is initialized, its body cannot be changed anymore. Any further
attempts to modify the body will fail and return failure to the caller _unless
the type is initialized with the exact same body_. Type initialization is
thread-safe; however, if a concurrent thread initializes the type before the
current thread, the initialization may return failure.

The syntax for identified structure types is as follows.

```
llvm-ident-struct-type ::= `!llvm.struct<` string-literal `,` `opaque` `>`
                         | `!llvm.struct<` string-literal `,` `packed`?
                           `(` type-or-ref-list  `)` `>`
type-or-ref-list ::= <maybe empty comma-separated list of type-or-ref>
type-or-ref ::= <any compatible type with optional !llvm.>
              | `!llvm.`? `struct<` string-literal `>`
```

The body of the identified struct is printed in full unless it is transitively
contained in the same struct. In the latter case, only the identifier is
printed. For example, the structure containing the pointer to itself is
represented as `!llvm.struct<"A", (ptr<struct<"A">>)>`, and the structure `A`
containing two pointers to the structure `B` containing a pointer to the
structure `A` is represented as
`!llvm.struct<"A", (ptr<struct<"B", (ptr<struct<"A">>)>>, ptr<struct<"B", (ptr<struct<"A">>)>>)>`.
Note that the structure `B` is "unrolled" for both elements. _A structure with
the same name but different body is a syntax error._ **The user must ensure
structure name uniqueness across all modules processed in a given MLIR
context.** Structure names are arbitrary string literals and may include, e.g.,
spaces and keywords.

Identified structs may be _opaque_. In this case, the body is unknown but the
structure type is considered _initialized_ and is valid in the IR.

#### Literal Structure Types

Literal structures are uniqued according to the list of elements they contain,
and can optionally be packed. The syntax for such structs is as follows.

```
llvm-literal-struct-type ::= `!llvm.struct<` `packed`? `(` type-list `)` `>`
type-list ::= <maybe empty comma-separated list of types with optional !llvm.>
```

Literal structs cannot be recursive, but can contain other structs. Therefore,
they must be constructed in a single step with the entire list of contained
elements provided.

#### Examples of Structure Types

```mlir
!llvm.struct<>                  // NOT allowed
!llvm.struct<()>                // empty, literal
!llvm.struct<(i32)>             // literal
!llvm.struct<(struct<(i32)>)>   // struct containing a struct
!llvm.struct<packed (i8, i32)>  // packed struct
!llvm.struct<"a">               // recursive reference, only allowed within
                                // another struct, NOT allowed at top level
!llvm.struct<"a", (ptr<struct<"a">>)>  // supported example of recursive reference
!llvm.struct<"a", ()>           // empty, named (necessary to differentiate from
                                // recursive reference)
!llvm.struct<"a", opaque>       // opaque, named
!llvm.struct<"a", (i32)>        // named
!llvm.struct<"a", packed (i8, i32)>  // named, packed
```

### Unsupported Types

The LLVM IR `label` type does not have a counterpart in the LLVM dialect since,
in MLIR, blocks are not values and don't need a type.

## Operations

All operations in the LLVM IR dialect have a custom form in MLIR. The mnemonic
of an operation is that used in LLVM IR prefixed with "`llvm.`".

[include "Dialects/LLVMOps.md"]

## Operations for LLVM IR Intrinsics

MLIR's operation system is open, making it unnecessary to introduce a hard
boundary between "core" operations and "intrinsics". General LLVM IR intrinsics
are modeled as first-class operations in the LLVM dialect. Target-specific LLVM
IR intrinsics, e.g., NVVM or ROCDL, are modeled as separate dialects.
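
To make the distinction concrete, the fragment below sketches one general
intrinsic from the LLVM dialect and one target-specific intrinsic from the NVVM
dialect. It assumes `%x` is an `f32` value defined earlier in the enclosing
block; the general intrinsic is shown in the generic operation form to avoid
relying on a particular custom syntax.

```mlir
// A general LLVM IR intrinsic modeled as an LLVM dialect operation.
%root = "llvm.intr.sqrt"(%x) : (f32) -> f32

// A target-specific intrinsic modeled in the separate NVVM dialect.
%tid = nvvm.read.ptx.sreg.tid.x : i32
```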

[include "Dialects/LLVMIntrinsicOps.md"]