# Data Layout Modeling

Data layout information allows the compiler to answer questions related to how a
value of a particular type is stored in memory, for example, the size of a value
or its address alignment requirements. It enables, among other things, the
generation of various linear memory addressing schemes for containers of
abstract types and deeper reasoning about vectors.

The data layout subsystem is designed to scale to MLIR's open type and operation
system. At the top level, it consists of:

*   attribute interfaces that can be implemented by concrete data layout
    specifications;
*   type interfaces that should be implemented by types subject to data layout;
*   operation interfaces that must be implemented by operations that can serve
    as data layout scopes (e.g., modules);
*   and dialect interfaces for data layout properties unrelated to specific
    types.

Built-in types are handled specially to decrease the overall query cost.
Similarly, built-in `ModuleOp` supports data layouts without going through the
interface.

## Usage

### Scoping

Following MLIR's nested structure, data layout properties are _scoped_ to
regions belonging to either operations that implement the
`DataLayoutOpInterface` or `ModuleOp` operations. Such scoping operations
partially control the data layout properties and may have attributes that affect
them, typically organized in a data layout specification.

Types may have a different data layout in different scopes, including scopes
that are nested in other scopes such as modules contained in other modules. At
the same time, within the given scope excluding any nested scope, a given type
has fixed data layout properties. Types are also expected to have a default,
"natural" data layout in case they are used outside of any operation that
provides data layout scope for them. This ensures that data layout queries
always have a valid result.

### Compatibility and Transformations

The information necessary to compute layout properties can be combined from
nested scopes. For example, an outer scope can define layout properties for a
subset of types while inner scopes define them for a disjoint subset, or scopes
can progressively relax alignment requirements on a type. This mechanism is
supported by the notion of data layout _compatibility_: the layout defined in a
nested scope is expected to be compatible with that of the outer scope. MLIR
does not prescribe what compatibility means for particular ops and types but
provides hooks for them to provide target- and type-specific checks. For
example, one may want to only allow relaxation of alignment constraints (i.e.,
smaller alignment) in nested modules or, alternatively, one may require nested
modules to fully redefine all constraints of the outer scope.

Data layout compatibility is also relevant during IR transformation. Any
transformation that affects the data layout scoping operation is expected to
maintain data layout compatibility. It is the responsibility of the
transformation to ensure that this is indeed the case.
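As an illustration of the relaxation-only policy mentioned above, the following
self-contained sketch models a specification as a map from type names to ABI
alignments in bytes. The names `AlignmentSpec` and `isRelaxationOf` are
hypothetical and exist only for this example; they are not part of MLIR, where
such checks are implemented through the interface hooks.

```c++
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: type name -> ABI alignment in bytes.
using AlignmentSpec = std::map<std::string, unsigned>;

// Returns true if every alignment in `inner` is smaller than or equal to the
// alignment that `outer` defines for the same type. Types that `outer` does
// not mention are treated as unconstrained (an assumption of this sketch).
bool isRelaxationOf(const AlignmentSpec &outer, const AlignmentSpec &inner) {
  for (const auto &entry : inner) {
    assert(entry.second != 0 && "alignment must be positive");
    auto it = outer.find(entry.first);
    if (it != outer.end() && entry.second > it->second)
      return false;
  }
  return true;
}
```

Under this policy, a nested scope may lower the alignment of a type or define
alignments for types the outer scope leaves unspecified, but it may not tighten
an existing constraint.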

### Queries

Data layout property queries are performed on a dedicated `DataLayout` object,
which can be created for a given scoping operation. These objects allow one to
interface with the data layout infrastructure and query properties of given
types in the scope of the object. The signature of the `DataLayout` class is as
follows.

```c++
class DataLayout {
public:
  // Creates a layout object for the given scoping operation.
  explicit DataLayout(DataLayoutOpInterface scope);

  // Queries for the given type in the scope of this object.
  unsigned getTypeSize(Type type) const;
  unsigned getTypeSizeInBits(Type type) const;
  unsigned getTypeABIAlignment(Type type) const;
  unsigned getTypePreferredAlignment(Type type) const;
};
```

The user can construct the `DataLayout` object for the scope of interest. Since
the data layout properties are fixed in the scope, they will be computed only
once upon first request and cached for further use. Therefore,
`DataLayout(op.getParentOfType<DataLayoutOpInterface>()).getTypeSize(type)` is
considered an anti-pattern since it discards the cache after use. Because of
caching, a `DataLayout` object returns valid results as long as the data layout
properties of enclosing scopes remain the same, that is, as long as none of the
ancestor operations are modified in a way that affects data layout. After such a
modification, the user is expected to create a fresh `DataLayout` object. To aid
with this, `DataLayout` asserts that the scope remains identical if MLIR is
compiled with assertions enabled.
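The effect of caching can be illustrated with a small self-contained model.
This is a sketch only: `CachingLayout` and its callback are hypothetical
stand-ins in which types are plain strings, and it does not use or reproduce
the MLIR API.

```c++
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a layout object. The potentially expensive
// computation is an arbitrary callback supplied by the caller.
class CachingLayout {
public:
  using ComputeFn = std::function<unsigned(const std::string &)>;

  explicit CachingLayout(ComputeFn computeFn)
      : computeSize(std::move(computeFn)) {
    assert(computeSize && "a size computation callback is required");
  }

  // Computes the size on the first request for a type and serves all later
  // requests for the same type from the cache.
  unsigned getTypeSize(const std::string &type) {
    auto it = cache.find(type);
    if (it != cache.end())
      return it->second;
    unsigned size = computeSize(type);
    cache.emplace(type, size);
    return size;
  }

private:
  ComputeFn computeSize;
  std::map<std::string, unsigned> cache;
};
```

Keeping one such object alive across many queries computes each result once. A
temporary created for a single query, as in the anti-pattern above, throws its
cache away immediately and recomputes the result on every call.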

## Custom Implementations

Extensibility of the data layout modeling is provided through a set of MLIR
[Interfaces](Interfaces.md).

### Data Layout Specifications

A data layout specification is an [attribute](LangRef.md/#attributes) that is
conceptually a collection of key-value pairs called data layout specification
_entries_. Data layout specification attributes implement the
`DataLayoutSpecInterface`, described below. Each entry is itself an attribute
that implements the `DataLayoutEntryInterface`. Entries have a key, either a
`Type` or a `StringAttr`, and a value. Keys are used to associate entries with
specific types or dialects: when handling a data layout property query, a type
or a dialect can only see the specification entries relevant to it and must go
through the supplied `DataLayout` object for any recursive query. This supports
and enforces better composability because types cannot (and should not)
understand layout details of other types. Entry values are arbitrary attributes,
specific to the type.

For example, a data layout specification may be an actual list of pairs with
simple custom syntax resembling the following:

```mlir
#my_dialect.layout_spec<
  #my_dialect.layout_entry<!my_dialect.type, size=42>,
  #my_dialect.layout_entry<"my_dialect.endianness", "little">,
  #my_dialect.layout_entry<!my_dialect.vector, prefer_large_alignment>>
```

The exact details of the specification and entry attributes, as well as their
syntax, are up to implementations.

We use the notion of _type class_ throughout the data layout subsystem. It
corresponds to the C++ class of the given type, e.g., `IntegerType` for built-in
integers. MLIR does not have a mechanism to represent type classes in the IR.
Instead, data layout entries contain specific _instances_ of a type class, for
example, `IntegerType{signedness=signless, bitwidth=8}` (or `i8` in the IR) or
`IntegerType{signedness=unsigned, bitwidth=32}` (or `ui32` in the IR). When
handling a data layout property query, a type class will be supplied with _all_
entries with keys belonging to this type class. For example, `IntegerType` will
see the entries for `i8`, `si16` and `ui32`, but will _not_ see those for `f32`
or `memref<?xi32>` (neither will `MemRefType` see the entry for `i32`). This
allows for type-specific "interpolation" behavior where a type class can compute
data layout properties of _any_ specific type instance given properties of other
instances. Using integers as an example again, the alignment of an integer type
could be computed by taking that of the closest integer type from above whose
bitwidth is a power of two.
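The integer "interpolation" idea can be sketched as follows. This is a
self-contained model under stated assumptions: entries are reduced to a map
from bitwidth to alignment in bytes, and `IntegerEntries`,
`interpolateAlignment` and the `fallback` parameter are hypothetical names that
do not exist in MLIR.

```c++
#include <cassert>
#include <map>

// Hypothetical entries visible to the integer type class:
// bitwidth -> ABI alignment in bytes.
using IntegerEntries = std::map<unsigned, unsigned>;

// Smallest power of two that is greater than or equal to `v`.
unsigned nextPowerOfTwo(unsigned v) {
  assert(v <= (1u << 31) && "value too large for this sketch");
  unsigned p = 1;
  while (p < v)
    p *= 2;
  return p;
}

// Uses the exact entry when present; otherwise uses the entry for the closest
// power-of-two bitwidth from above; otherwise returns `fallback`.
unsigned interpolateAlignment(const IntegerEntries &entries, unsigned bitwidth,
                              unsigned fallback) {
  assert(bitwidth > 0 && "integer types have a positive bitwidth");
  auto exact = entries.find(bitwidth);
  if (exact != entries.end())
    return exact->second;
  auto rounded = entries.find(nextPowerOfTwo(bitwidth));
  if (rounded != entries.end())
    return rounded->second;
  return fallback;
}
```

With entries for 8-bit and 32-bit integers only, a 24-bit integer would take
the alignment of the 32-bit entry, while a 48-bit integer would fall back to
the default because no 64-bit entry is present.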

[include "Interfaces/DataLayoutAttrInterface.md"]

### Data Layout Scoping Operations

Operations that define a scope for data layout queries, and that can be used to
create a `DataLayout` object, are expected to implement the
`DataLayoutOpInterface`. Such ops must provide at least a way of obtaining the
data layout specification. The specification need not be attached to the
operation as an attribute and may be constructed on the fly; it is only fetched
once per `DataLayout` object and cached. Such ops may also provide custom
handlers for data layout queries that provide results without forwarding the
queries down to specific types, or that post-process the results returned by
types in target- or scope-specific ways. These custom handlers make it possible
for scoping operations to (re)define data layout properties for types without
having to modify the types themselves, e.g., when types are defined in another
dialect.

[include "Interfaces/DataLayoutOpInterface.md"]

### Types with Data Layout

Type classes that intend to handle data layout queries themselves are expected
to implement the `DataLayoutTypeInterface`. This interface provides overridable
hooks for each data layout query. Each of these hooks is supplied with the type
instance, a `DataLayout` object suitable for recursive queries, and a list of
data layout entries relevant for the type class. It is expected to provide a
valid result even if the list of entries is empty. These hooks do not have
access to the operation in the scope of which the query is handled and should
use the supplied entries instead.

[include "Interfaces/DataLayoutTypeInterface.md"]

### Dialects with Data Layout Identifiers

For data layout entries that are not related to a particular type class, the key
of the entry is an identifier (a `StringAttr`) that belongs to some dialect. In
this case, the dialect is expected to implement the
`DataLayoutDialectInterface`. This interface provides hooks for verifying the
validity of the entry value attributes and the compatibility of nested entries.

### Bits and Bytes

Two versions of hooks are provided for sizes: in bits and in bytes. The version
in bytes has a default implementation that derives the size in bytes by rounding
up the result of division of the size in bits by 8. Types exclusively targeting
architectures with different assumptions can override this. Operations can
redefine this for all types, providing scoped versions for cases of byte sizes
other than eight without having to modify types, including built-in types.
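The default derivation of the byte size amounts to a rounding-up division. The
sketch below is a standalone helper, not the MLIR hook itself; the function
name `sizeInBytes` and the `bitsPerByte` parameter are assumptions made for
illustration of how a scope with a different byte size could be modeled.

```c++
#include <cassert>

// Derives the size in bytes from the size in bits by rounding up the
// division. The default of 8 bits per byte matches the default data layout.
unsigned sizeInBytes(unsigned sizeInBits, unsigned bitsPerByte = 8) {
  assert(bitsPerByte > 0 && "a byte must contain at least one bit");
  return (sizeInBits + bitsPerByte - 1) / bitsPerByte;
}
```

For instance, a 17-bit value occupies 3 bytes with 8-bit bytes, but 2 bytes in
a hypothetical scope that uses 16-bit bytes.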

### Query Dispatch

The overall flow of a data layout property query is as follows, where `query`
stands for any concrete query such as `getTypeSize`.

1.  The user constructs a `DataLayout` at the given scope. The constructor
    fetches the data layout specification and combines it with those of
    enclosing scopes (layouts are expected to be compatible).
2.  The user calls `DataLayout::query(Type ty)`.
3.  If `DataLayout` has a cached response, this response is returned
    immediately.
4.  Otherwise, the query is handed down by `DataLayout` to the closest layout
    scoping operation. If it implements `DataLayoutOpInterface`, then the query
    is forwarded to `DataLayoutOpInterface::query(ty, *this, relevantEntries)`
    where the relevant entries are computed as described above. If it does not
    implement `DataLayoutOpInterface`, it must be a `ModuleOp`, and the query is
    forwarded to `DataLayoutTypeInterface::query(dataLayout, relevantEntries)`
    after casting `ty` to the type interface.
5.  Unless the `query` hook is reimplemented by the op interface, the query is
    handed further down to `DataLayoutTypeInterface::query(dataLayout,
    relevantEntries)` after casting `ty` to the type interface. If the type does
    not implement the interface, an unrecoverable fatal error is produced.
6.  The type is expected to always provide the response, which is returned up
    the call stack and cached by the `DataLayout`.
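The control flow of the steps above can be modeled in a few lines. This is a
simplified, self-contained sketch: `TypeModel`, `ScopeModel` and `LayoutModel`
are hypothetical names, entries and enclosing scopes are omitted, and a C++
exception stands in for the unrecoverable fatal error.

```c++
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical type: `sizeHook` is empty when the type does not implement
// the (modeled) type interface.
struct TypeModel {
  std::string name;
  std::function<unsigned()> sizeHook;
};

// Hypothetical scope: `overrideHook` is empty when the scoping operation does
// not reimplement the query. It returns true when it produced a result.
struct ScopeModel {
  std::function<bool(const TypeModel &, unsigned &)> overrideHook;
};

class LayoutModel {
public:
  explicit LayoutModel(ScopeModel scopeModel) : scope(std::move(scopeModel)) {}

  unsigned getTypeSize(const TypeModel &type) {
    assert(!type.name.empty() && "types are keyed by name in this model");
    // Step 3: return the cached response if there is one.
    auto it = cache.find(type.name);
    if (it != cache.end())
      return it->second;
    // Step 4: the scoping operation gets the first chance to answer.
    unsigned result = 0;
    bool handled = scope.overrideHook && scope.overrideHook(type, result);
    // Step 5: otherwise forward to the type, failing if it has no hook.
    if (!handled) {
      if (!type.sizeHook)
        throw std::runtime_error("type does not support data layout queries");
      result = type.sizeHook();
    }
    // Step 6: cache the response and return it.
    cache.emplace(type.name, result);
    return result;
  }

private:
  ScopeModel scope;
  std::map<std::string, unsigned> cache;
};
```

A scope without an override defers to the hook of the type, whereas a scope
with an override can redefine the result without touching the type.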

## Default Implementation

The default implementation of the data layout interfaces directly handles
queries for a subset of built-in types.

### Built-in Modules

Built-in `ModuleOp` allows at most one attribute that implements
`DataLayoutSpecInterface`. It does not implement the entire interface for
efficiency and layering reasons. Instead, `DataLayout` can be constructed for
`ModuleOp` and handles modules transparently alongside other operations that
implement the interface.

### Built-in Types

The following describes the default properties of built-in types.

The size of built-in integers and floats in bytes is computed as
`ceildiv(bitwidth, 8)`. The ABI alignment of integer types with bitwidth below
64 and of the float types is the closest power-of-two number of bytes from
above. The ABI alignment of integer types with bitwidth 64 and above is 4 bytes
(32 bits).

The size of built-in vectors is computed by first rounding their number of
elements in the _innermost_ dimension to the closest power-of-two from above,
then getting the total number of elements, and finally multiplying it with the
element size. For example, `vector<3xi32>` and `vector<4xi32>` have the same
size. So do `vector<2x3xf32>` and `vector<2x4xf32>`, but `vector<3x4xf32>` and
`vector<4x4xf32>` have different sizes. The ABI and preferred alignment of
vector types is computed by taking the innermost dimension of the vector,
rounding it up to the closest power-of-two, taking a product of that with the
element size in bytes, and rounding the result up again to the closest
power-of-two.

Note: these values are selected for consistency with the
[default data layout in LLVM](https://llvm.org/docs/LangRef.html#data-layout),
which MLIR assumed until the introduction of proper data layout modeling, and
with the
[modeling of n-D vectors](https://mlir.llvm.org/docs/Dialects/Vector/#deeperdive).
They **may change** in the future.
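The default rules for integers and vectors described in this section can be
written down as plain arithmetic. The following is a self-contained sketch that
mirrors the text rather than the actual MLIR implementation; all function names
are hypothetical, and vector shapes are passed as a list of dimension sizes
with the innermost dimension last.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

unsigned ceilDiv(unsigned a, unsigned b) {
  assert(b != 0 && "division by zero");
  return (a + b - 1) / b;
}

// Smallest power of two that is greater than or equal to `v`.
unsigned nextPowerOfTwo(unsigned v) {
  assert(v <= (1u << 31) && "value too large for this sketch");
  unsigned p = 1;
  while (p < v)
    p *= 2;
  return p;
}

// Size in bytes of a built-in integer or float with the given bitwidth.
unsigned scalarSizeInBytes(unsigned bitwidth) { return ceilDiv(bitwidth, 8); }

// ABI alignment in bytes of a built-in integer with the given bitwidth.
unsigned integerABIAlignment(unsigned bitwidth) {
  return bitwidth < 64 ? nextPowerOfTwo(ceilDiv(bitwidth, 8)) : 4;
}

// Size in bytes of a vector: only the innermost dimension is rounded up.
unsigned vectorSizeInBytes(const std::vector<unsigned> &shape,
                           unsigned elementSize) {
  assert(!shape.empty() && "vectors have at least one dimension");
  unsigned numElements = nextPowerOfTwo(shape.back());
  for (std::size_t i = 0; i + 1 < shape.size(); ++i)
    numElements *= shape[i];
  return numElements * elementSize;
}

// ABI and preferred alignment in bytes of a vector.
unsigned vectorAlignment(const std::vector<unsigned> &shape,
                         unsigned elementSize) {
  assert(!shape.empty() && "vectors have at least one dimension");
  return nextPowerOfTwo(nextPowerOfTwo(shape.back()) * elementSize);
}
```

With 4-byte elements, `vector<3xi32>` and `vector<4xi32>` both come out at 16
bytes, while `vector<3x4xf32>` and `vector<4x4xf32>` come out at 48 and 64
bytes respectively, in line with the examples in the text.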

#### `index` type

The index type is an integer type used for target-specific size information in,
e.g., `memref` operations. Its data layout is parameterized by a single integer
data layout entry that specifies its bitwidth. For example,

```mlir
module attributes { dlti.dl_spec = #dlti.dl_spec<
  #dlti.dl_entry<index, 32>
>} {}
```

specifies that `index` has 32 bits. All other layout properties of `index` match
those of the integer type with the same bitwidth defined above.

In the absence of the corresponding entry, `index` is assumed to be a 64-bit
integer.

#### `complex` type

By default, the complex type is treated like a two-element structure of its
element type. That is, each of its elements is aligned to its preferred
alignment, the entire complex type is also aligned to this preference, and the
size of the complex type includes the possible padding between elements needed
to enforce alignment.
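The struct-like treatment of complex types can be sketched as follows. This is
an illustrative model only: `ComplexLayout`, `complexLayout` and `alignTo` are
hypothetical names, sizes and alignments are in bytes, and the sketch assumes
that only padding between the two elements is counted (no tail padding), which
the text does not spell out.

```c++
#include <cassert>

// Rounds `value` up to the closest multiple of `alignment`.
unsigned alignTo(unsigned value, unsigned alignment) {
  assert(alignment != 0 && "alignment must be positive");
  return (value + alignment - 1) / alignment * alignment;
}

struct ComplexLayout {
  unsigned size;      // In bytes, including padding between the elements.
  unsigned alignment; // In bytes.
};

// Models a complex type as a structure of two elements of the same type.
ComplexLayout complexLayout(unsigned elementSize,
                            unsigned elementPreferredAlignment) {
  // The second element starts at the first suitably aligned offset that
  // follows the first element.
  unsigned secondOffset = alignTo(elementSize, elementPreferredAlignment);
  return {secondOffset + elementSize, elementPreferredAlignment};
}
```

For a 4-byte element with 4-byte preferred alignment, no padding is needed and
the size is 8 bytes. For a 10-byte element with 16-byte preferred alignment,
the second element starts at offset 16, giving a size of 26 bytes in this
model.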

### Byte Size

The default data layout assumes 8-bit bytes.

### DLTI Dialect

The [DLTI](Dialects/DLTI.md) dialect provides the attributes implementing
`DataLayoutSpecInterface` and `DataLayoutEntryInterface`, as well as a dialect
attribute that can be used to attach the specification to a given operation. The
verifier of this attribute triggers those of the specification and checks the
compatibility of nested specifications.
