# Bufferization

[TOC]

## Overview

Bufferization in MLIR is the process of converting the `tensor` type to the
`memref` type. MLIR provides a composable system that allows dialects to
systematically bufferize a program. This system is a simple application of
MLIR's [dialect conversion](DialectConversion.md) infrastructure. The bulk of
the code related to bufferization is a set of ordinary `ConversionPattern`s
that dialect authors write for converting ops that operate on `tensor`s to ops
that operate on `memref`s. These patterns follow a set of conventions and best
practices that allow them to be run across multiple independent passes (rather
than requiring a single huge atomic conversion pass), which makes compilation
pipelines scalable, robust, and easy to debug.

This document is targeted at people looking to utilize MLIR's bufferization
functionality, along with people who want to extend it to cover their own ops.

<a name="the-talk">**NOTE:**</a> Before reading this document, please watch the
talk "Type Conversions the Not-So-Hard-Way: MLIR's New Bufferization
Infrastructure"
([slides](https://drive.google.com/file/d/1FVbzCXxZzS9LBLuvpPNLWJD-XDkt54ky/view?usp=sharing),
[recording](https://drive.google.com/file/d/1VfVajitgf8ZPnd-HRkJvaJiFLhBsluXN/view?usp=sharing)).
That talk gives a high-level overview of the bufferization infrastructure and
important conceptual details related to using the MLIR dialect conversion
infrastructure.

## Bufferization's place in a compilation pipeline

Bufferization itself does not free any of the buffers that have been allocated,
nor does it do anything particularly intelligent with the placement of buffers
w.r.t. control flow. Thus, a realistic compilation pipeline will usually consist
of:

1.  Bufferization
1.  Buffer optimizations such as `buffer-hoisting`, `buffer-loop-hoisting`, and
    `promote-buffers-to-stack`, which do optimizations that are only exposed
    after bufferization.
1.  Finally, running the [buffer deallocation](BufferDeallocationInternals.md)
    pass.

After buffer deallocation has been completed, the program will be quite
difficult to transform due to the presence of the deallocation ops. Thus, other
optimizations such as linalg fusion on memrefs should be done before that stage.
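
For example, a pipeline following this structure might look like the sketch
below. It is not a verbatim upstream pipeline; the pass-creation helper names
for the buffer optimizations and deallocation are assumed to correspond to the
upstream passes named above and may differ across MLIR versions. The
bufferization passes themselves are shown in the example in the next section.

```c++
// 1. Bufferization (partial passes followed by a finalizing pass; see the
//    example pipeline below).
// ...

// 2. Buffer optimizations that are only exposed after bufferization.
pm.addNestedPass<FuncOp>(createBufferHoistingPass());
pm.addNestedPass<FuncOp>(createBufferLoopHoistingPass());
pm.addNestedPass<FuncOp>(createPromoteBuffersToStackPass());

// Other memref-level transformations (e.g. linalg fusion on memrefs) should
// also run here, before deallocation ops are inserted.

// 3. Buffer deallocation, after which the program becomes hard to transform.
pm.addNestedPass<FuncOp>(createBufferDeallocationPass());
```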

## General structure of the bufferization process

Bufferization consists of running multiple *partial* bufferization passes,
followed by one *finalizing* bufferization pass.

There is typically one partial bufferization pass per dialect (though other
subdivisions are possible). For example, for a dialect `X` there will typically
be a pass `X-bufferize` that knows how to bufferize all the ops in that dialect.
By running pass `X-bufferize` for each dialect `X` in the program, all the ops
in the program are incrementally bufferized.

Partial bufferization passes create programs where only some ops have been
bufferized. These passes will create *materializations* (also sometimes called
"casts") that convert between the `tensor` and `memref` types, which allows
bridging between ops that have been bufferized and ops that have not yet been
bufferized.
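
To make this concrete, the sketch below shows roughly how a `tensor`-to-`memref`
type converter can register such materializations using the dialect conversion
framework's `TypeConverter` hooks. This is an illustrative approximation of what
the upstream `BufferizeTypeConverter` (described later in this document)
provides, not its actual implementation, and the exact op builder signatures for
`bufferization.to_tensor` / `bufferization.to_memref` are assumptions.

```c++
// Sketch: a type converter that maps ranked tensors to memrefs and inserts
// bufferization.to_tensor / bufferization.to_memref ops as materializations.
// Illustrative only; not the upstream BufferizeTypeConverter implementation.
class SketchBufferizeTypeConverter : public TypeConverter {
public:
  SketchBufferizeTypeConverter() {
    // Leave all other types unchanged.
    addConversion([](Type type) { return type; });
    // tensor<...xT> -> memref<...xT>
    addConversion([](RankedTensorType type) -> Type {
      return MemRefType::get(type.getShape(), type.getElementType());
    });
    // memref -> tensor materialization, bridging into not-yet-bufferized ops.
    addSourceMaterialization([](OpBuilder &builder, TensorType type,
                                ValueRange inputs, Location loc) -> Value {
      assert(inputs.size() == 1 && "expected exactly one input");
      return builder.create<bufferization::ToTensorOp>(loc, type, inputs[0]);
    });
    // tensor -> memref materialization, bridging into already-bufferized ops.
    addTargetMaterialization([](OpBuilder &builder, BaseMemRefType type,
                                ValueRange inputs, Location loc) -> Value {
      assert(inputs.size() == 1 && "expected exactly one input");
      return builder.create<bufferization::ToMemrefOp>(loc, type, inputs[0]);
    });
  }
};
```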

Finalizing bufferizations complete the bufferization process and guarantee that
there are no tensors remaining in the program. This involves eliminating the
materializations. The pass `finalizing-bufferize` is a minimal such pass that
only eliminates materializations and issues an error if any unbufferized ops
exist in the program.

However, it is possible for a finalizing bufferization to do more than just
eliminate materializations. By adding patterns (just as a partial bufferization
would), it is possible for a finalizing bufferization pass to simultaneously
bufferize ops and eliminate materializations. This has a number of disadvantages
discussed in the talk and should generally be avoided.

### Example

As a concrete example, we will look at the bufferization pipeline from the
`mlir-npcomp` reference backend
([code](https://github.com/llvm/mlir-npcomp/blob/97d6d04d41216e73d40b89ffd79620973fc14ce3/lib/RefBackend/RefBackend.cpp#L232)).
The code, slightly simplified and annotated, is reproduced here:

```c++
  // Partial bufferization passes.
  pm.addPass(createTensorConstantBufferizePass());
  pm.addNestedPass<FuncOp>(createTCPBufferizePass()); // Bufferizes the downstream `tcp` dialect.
  pm.addNestedPass<FuncOp>(createSCFBufferizePass());
  pm.addNestedPass<FuncOp>(createLinalgBufferizePass());
  pm.addNestedPass<FuncOp>(createStdBufferizePass());
  pm.addNestedPass<FuncOp>(createTensorBufferizePass());
  pm.addPass(createFuncBufferizePass());

  // Finalizing bufferization pass.
  pm.addNestedPass<FuncOp>(createFinalizingBufferizePass());
```

Looking first at the partial bufferization passes, we see that there is a
sequence of `FuncOp` passes (which run in parallel on functions). These function
passes are bracketed by `tensor-constant-bufferize` and `func-bufferize`, which
are module passes (and thus serialize the parallel compilation process). These
two passes must be module passes because they make changes to the top-level
module.

The bulk of the bufferization work is done by the function passes. Most of these
passes are provided as part of the upstream MLIR distribution and bufferize
their respective dialects (e.g. `scf-bufferize` bufferizes the `scf` dialect).
The `tcp-bufferize` pass is an exception -- it is a partial bufferization pass
used to bufferize the downstream `tcp` dialect, and fits in perfectly with all
the other passes provided upstream.

The last pass is the finalizing bufferization pass. The `mlir-npcomp` reference
backend has arranged that all ops are bufferized by partial bufferizations, so
that the upstream `finalizing-bufferize` pass can be used as the finalizing
bufferization pass. This gives excellent diagnostics when something goes wrong
with the bufferization process, such as due to an op that wasn't handled by any
pattern.

## How to write a partial bufferization pass

The contract of a partial bufferization pass is that a subset of ops (or kinds
of ops, customizable by a `ConversionTarget`) get bufferized.

A partial bufferization pass is just a pass that uses the
[dialect conversion](DialectConversion.md) framework to apply
`ConversionPattern`s with a `tensor` to `memref` type conversion.

To describe how to write such a pass, we will walk through an example, the
`tensor-bufferize` pass
([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23),
[test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Tensor/bufferize.mlir#L1))
that bufferizes the `tensor` dialect.

The bulk of the code in the pass will be a set of conversion patterns, with a
simple example being
[BufferizeCastOp](https://github.com/llvm/llvm-project/blob/2bf6e443e54604c7818c4d1a1837f3d091023270/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23):

```c++
class BufferizeCastOp : public OpConversionPattern<tensor::CastOp> {
public:
  using OpConversionPattern::OpConversionPattern;
  LogicalResult
  matchAndRewrite(tensor::CastOp op, OpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    auto resultType = getTypeConverter()->convertType(op.getType());
    rewriter.replaceOpWithNewOp<memref::CastOp>(op, resultType,
                                                adaptor.source());
    return success();
  }
};
```

See [the talk](#the-talk) for more details on how to write these patterns.

The
[pass itself](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L57)
is very small, and follows the basic pattern of any dialect conversion pass.

```c++
void mlir::populateTensorBufferizePatterns(
    BufferizeTypeConverter &typeConverter, RewritePatternSet &patterns) {
  patterns.add<BufferizeCastOp, BufferizeExtractOp>(typeConverter,
                                                    patterns.getContext());
}

struct TensorBufferizePass : public TensorBufferizeBase<TensorBufferizePass> {
  void runOnFunction() override {
    auto *context = &getContext();
    BufferizeTypeConverter typeConverter;
    RewritePatternSet patterns(context);
    ConversionTarget target(*context);

    populateTensorBufferizePatterns(typeConverter, patterns);
    target.addIllegalOp<tensor::CastOp, tensor::ExtractOp>();
    target.addLegalDialect<StandardOpsDialect>();

    if (failed(
            applyPartialConversion(getFunction(), target, std::move(patterns))))
      signalPassFailure();
  }
};
```

The pass has all the hallmarks of a dialect conversion pass that does type
conversions: a `TypeConverter`, a `RewritePatternSet`, a `ConversionTarget`, and
a call to `applyPartialConversion`. Note that the patterns are exposed through a
separate `populateTensorBufferizePatterns` function, so that power users can use
them independently if necessary (such as to combine multiple sets of conversion
patterns into a single conversion call, for performance).
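
For instance, a downstream pipeline that wants to run several pattern sets in
one shot might do something like the following fragment (written as if inside a
pass, as above). `populateMyDialectBufferizePatterns` is a hypothetical
downstream helper used purely for illustration:

```c++
// Sketch: registering several bufferization pattern sets into one
// RewritePatternSet and running a single conversion over all of them.
BufferizeTypeConverter typeConverter;
RewritePatternSet patterns(&getContext());
populateTensorBufferizePatterns(typeConverter, patterns);
populateMyDialectBufferizePatterns(typeConverter, patterns); // hypothetical
// ... configure a ConversionTarget that covers both sets of ops, then call
// applyPartialConversion once over the combined pattern set.
```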

One convenient utility provided by the MLIR bufferization infrastructure is the
`BufferizeTypeConverter`, which comes pre-loaded with the necessary conversions
and materializations between `tensor` and `memref`.

In this case, the `BufferizationOpsDialect` is marked as legal, so the
`bufferization.to_tensor` and `bufferization.to_memref` ops, which are inserted
automatically by the dialect conversion framework as materializations, are
legal. There is a helper `populateBufferizeMaterializationLegality`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L53))
which helps with this in general.
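
Putting these pieces together, a partial bufferization pass for a hypothetical
downstream dialect might look roughly like the sketch below. The names
containing `MyDialect`/`my_dialect` are illustrative assumptions;
`BufferizeTypeConverter` and `populateBufferizeMaterializationLegality` are the
upstream utilities described above.

```c++
// Sketch of a partial bufferization pass for a hypothetical `my_dialect`
// dialect. The overall shape mirrors TensorBufferizePass above.
struct MyDialectBufferizePass
    : public MyDialectBufferizeBase<MyDialectBufferizePass> {
  void runOnFunction() override {
    auto *context = &getContext();
    BufferizeTypeConverter typeConverter;
    RewritePatternSet patterns(context);
    ConversionTarget target(*context);

    // Patterns that rewrite my_dialect ops on tensors into ops on memrefs.
    populateMyDialectBufferizePatterns(typeConverter, patterns);
    // The ops being bufferized become illegal...
    target.addIllegalDialect<my_dialect::MyDialect>();
    // ...while the to_tensor/to_memref materializations stay legal, so this
    // pass can stop at the boundary with not-yet-bufferized ops.
    populateBufferizeMaterializationLegality(target);

    if (failed(
            applyPartialConversion(getFunction(), target, std::move(patterns))))
      signalPassFailure();
  }
};
```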

### Other partial bufferization examples

-   `linalg-bufferize`
    ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L1),
    [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Linalg/bufferize.mlir#L1))

    -   Bufferizes the `linalg` dialect.
    -   This is an example of how to simultaneously bufferize all the ops that
        satisfy a certain OpInterface with a single pattern. Specifically,
        `BufferizeAnyLinalgOp`
        ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L170))
        bufferizes any op that implements the `LinalgOp` interface.

-   `scf-bufferize`
    ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/SCF/Transforms/Bufferize.cpp#L1),
    [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/SCF/bufferize.mlir#L1))

    -   Bufferizes ops from the `scf` dialect.
    -   This is an example of how to bufferize ops that implement
        `RegionBranchOpInterface` (that is, they use regions to represent
        control flow).
    -   The bulk of the work is done by
        `lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp`
        ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp#L1)),
        which is well-commented and covers how to correctly convert ops that
        contain regions.

-   `func-bufferize`
    ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/FuncBufferize.cpp#L1),
    [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/func-bufferize.mlir#L1))

    -   Bufferizes `func`, `call`, and `BranchOpInterface` ops.
    -   This is an example of how to bufferize ops that have multi-block
        regions.
    -   This is an example of a pass that is not split along dialect
        subdivisions.

-   `tensor-constant-bufferize`
    ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/TensorConstantBufferize.cpp#L1),
    [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/tensor-constant-bufferize.mlir#L1))

    -   Bufferizes only `arith.constant` ops of `tensor` type.
    -   This is an example of setting up the legality so that only a subset of
        `arith.constant` ops get bufferized (see the sketch after this list).
    -   This is an example of a pass that is not split along dialect
        subdivisions.
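
The subset legality mentioned for `tensor-constant-bufferize` can be expressed
with a dynamic legality rule on the `ConversionTarget`. The following is a rough
sketch of that idea, not the upstream pass code:

```c++
// Sketch: only arith.constant ops that produce a tensor are marked illegal,
// so the conversion bufferizes just that subset and leaves scalar constants
// untouched. Illustrative only; not the upstream tensor-constant-bufferize
// implementation.
target.addDynamicallyLegalOp<arith::ConstantOp>([](arith::ConstantOp op) {
  // Constants of non-tensor type (index, integer, float, ...) remain legal.
  return !op.getResult().getType().isa<TensorType>();
});
```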

## How to write a finalizing bufferization pass

The contract of a finalizing bufferization pass is that all tensors are gone
from the program.

The easiest way to write a finalizing bufferize pass is to not write one at all!
MLIR provides a pass `finalizing-bufferize` which eliminates the
`bufferization.to_tensor` / `bufferization.to_memref` materialization ops
inserted by partial bufferization passes and emits an error if that is not
sufficient to remove all tensors from the program.

This pass is sufficient when partial bufferization passes have bufferized all
the ops in the program, leaving behind only the materializations. When possible,
it is recommended to structure your pass pipeline this way, as this has the
significant advantage that if an op does not get bufferized (due to a missing
pattern, a bug in the code, etc.), `finalizing-bufferize` will emit a nice clean
error, and the IR seen by `finalizing-bufferize` will contain only the one
unbufferized op.

However, before the current bufferization infrastructure was put in place,
bufferization could only be done as a single finalizing bufferization mega-pass
that used the `populate*BufferizePatterns` functions from multiple dialects to
simultaneously bufferize everything at once. Thus, one might see code in
downstream projects structured this way. This structure is not recommended in
new code. A helper, `populateEliminateBufferizeMaterializationsPatterns`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L58)),
is available for such passes to provide patterns that eliminate
`bufferization.to_tensor` and `bufferization.to_memref`.
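
For reference, a custom finalizing bufferization pass built on that helper might
look roughly like the sketch below. The `MyFinalizingBufferize*` names and
`populateMyRemainingBufferizePatterns` are hypothetical; the dynamic legality
check and `applyFullConversion` call reflect the general finalizing-conversion
recipe of the dialect conversion framework, not the exact upstream
`finalizing-bufferize` implementation.

```c++
// Sketch of a custom finalizing bufferization pass (names with "My" are
// hypothetical). It combines op-bufferizing patterns with the
// materialization-elimination patterns and requires that no tensors remain.
struct MyFinalizingBufferizePass
    : public MyFinalizingBufferizeBase<MyFinalizingBufferizePass> {
  void runOnFunction() override {
    auto *context = &getContext();
    BufferizeTypeConverter typeConverter;
    RewritePatternSet patterns(context);
    ConversionTarget target(*context);

    // Fold away bufferization.to_tensor / bufferization.to_memref pairs.
    populateEliminateBufferizeMaterializationsPatterns(typeConverter, patterns);
    // Any remaining op-bufferizing patterns (the discouraged mega-pass style).
    populateMyRemainingBufferizePatterns(typeConverter, patterns); // hypothetical

    // An op is legal only if it no longer uses tensor types anywhere.
    target.markUnknownOpDynamicallyLegal(
        [&](Operation *op) { return typeConverter.isLegal(op); });

    // Full conversion: any op still operating on tensors is reported as an
    // error.
    if (failed(
            applyFullConversion(getFunction(), target, std::move(patterns))))
      signalPassFailure();
  }
};
```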

## Changes since [the talk](#the-talk)

-   `func-bufferize` was changed to be a partial conversion pass, and there is a
    new `finalizing-bufferize` which serves as a general finalizing
    bufferization pass.