# Bufferization

[TOC]

## Overview

Bufferization in MLIR is the process of converting the `tensor` type to the
`memref` type. MLIR provides a composable system that allows dialects to
systematically bufferize a program. This system is a simple application of
MLIR's [dialect conversion](DialectConversion.md) infrastructure. The bulk of
the code related to bufferization is a set of ordinary `ConversionPattern`s
that dialect authors write for converting ops that operate on `tensor`s to ops
that operate on `memref`s. A set of conventions and best practices allows these
patterns to be run across multiple independent passes (rather than requiring a
single huge atomic conversion pass), which makes the compilation pipelines
scalable, robust, and easy to debug.

This document is targeted at people looking to utilize MLIR's bufferization
functionality, along with people who want to extend it to cover their own ops.

<a name="the-talk">**NOTE:**</a> Before reading this document, please watch the
talk "Type Conversions the Not-So-Hard-Way: MLIR's New Bufferization
Infrastructure"
([slides](https://drive.google.com/file/d/1FVbzCXxZzS9LBLuvpPNLWJD-XDkt54ky/view?usp=sharing),
[recording](https://drive.google.com/file/d/1VfVajitgf8ZPnd-HRkJvaJiFLhBsluXN/view?usp=sharing)).
That talk gives a high-level overview of the bufferization infrastructure and
important conceptual details related to using the MLIR dialect conversion
infrastructure.

## Bufferization's place in a compilation pipeline

Bufferization itself does not free any of the buffers that have been allocated,
nor does it do anything particularly intelligent with the placement of buffers
w.r.t. control flow. Thus, a realistic compilation pipeline will usually consist
of:

1. Bufferization
1. Buffer optimizations such as `buffer-hoisting`, `buffer-loop-hoisting`, and
   `promote-buffers-to-stack`, which perform optimizations that are only
   exposed after bufferization.
1. Finally, running the [buffer deallocation](BufferDeallocationInternals.md)
   pass.

After buffer deallocation has been completed, the program will be quite
difficult to transform due to the presence of the deallocation ops. Thus, other
optimizations such as linalg fusion on memrefs should be done before that stage.

## General structure of the bufferization process

Bufferization consists of running multiple *partial* bufferization passes,
followed by one *finalizing* bufferization pass.

There is typically one partial bufferization pass per dialect (though other
subdivisions are possible). For example, for a dialect `X` there will typically
be a pass `X-bufferize` that knows how to bufferize all the ops in that dialect.
By running pass `X-bufferize` for each dialect `X` in the program, all the ops
in the program are incrementally bufferized.

Partial bufferization passes create programs where only some ops have been
bufferized. These passes will create *materializations* (also sometimes called
"casts") that convert between the `tensor` and `memref` type, which allows
bridging between ops that have been bufferized and ops that have not yet been
bufferized.

Finalizing bufferizations complete the bufferization process, and guarantee that
there are no tensors remaining in the program. This involves eliminating the
materializations. The pass `finalizing-bufferize` provides a minimal pass that
only eliminates materializations and issues an error if any unbufferized ops
exist in the program.

However, it is possible for a finalizing bufferization to do more than just
eliminate materializations.
By adding patterns (just as a partial bufferization
would), it is possible for a finalizing bufferization pass to simultaneously
bufferize ops and eliminate materializations. This has a number of disadvantages
discussed in the talk and should generally be avoided.

### Example

As a concrete example, we will look at the bufferization pipeline from the
`mlir-npcomp` reference backend
([code](https://github.com/llvm/mlir-npcomp/blob/97d6d04d41216e73d40b89ffd79620973fc14ce3/lib/RefBackend/RefBackend.cpp#L232)).
The code, slightly simplified and annotated, is reproduced here:

```c++
  // Partial bufferization passes.
  pm.addPass(createTensorConstantBufferizePass());
  pm.addNestedPass<FuncOp>(createTCPBufferizePass()); // Bufferizes the downstream `tcp` dialect.
  pm.addNestedPass<FuncOp>(createSCFBufferizePass());
  pm.addNestedPass<FuncOp>(createLinalgBufferizePass());
  pm.addNestedPass<FuncOp>(createStdBufferizePass());
  pm.addNestedPass<FuncOp>(createTensorBufferizePass());
  pm.addPass(createFuncBufferizePass());

  // Finalizing bufferization pass.
  pm.addNestedPass<FuncOp>(createFinalizingBufferizePass());
```

Looking first at the partial bufferization passes, we see that there is a
sequence of `FuncOp` passes (which run in parallel on functions). These function
passes are bracketed by `tensor-constant-bufferize` and `func-bufferize`, which
are module passes (and thus serialize the parallel compilation process). These
two passes must be module passes because they make changes to the top-level
module.

The bulk of the bufferization work is done by the function passes. Most of these
passes are provided as part of the upstream MLIR distribution and bufferize
their respective dialects (e.g. `scf-bufferize` bufferizes the `scf` dialect).
The `tcp-bufferize` pass is an exception: it is a partial bufferization pass
used to bufferize the downstream `tcp` dialect, and fits in perfectly with all
the other passes provided upstream.

The last pass is the finalizing bufferization pass. The `mlir-npcomp` reference
backend has arranged that all ops are bufferized by partial bufferizations, so
that the upstream `finalizing-bufferize` pass can be used as the finalizing
bufferization pass. This gives excellent diagnostics when something goes wrong
with the bufferization process, such as an op that wasn't handled by any
pattern.

## How to write a partial bufferization pass

The contract of a partial bufferization pass is that a subset of ops (or kinds
of ops, customizable by a `ConversionTarget`) get bufferized.

A partial bufferization pass is just a pass that uses the
[dialect conversion](DialectConversion.md) framework to apply
`ConversionPattern`s with a `tensor` to `memref` type conversion.

To describe how to write such a pass, we will walk through an example, the
`tensor-bufferize` pass
([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23),
[test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Tensor/bufferize.mlir#L1))
that bufferizes the `tensor` dialect.

The bulk of the code in the pass will be a set of conversion patterns, with a
simple example being
[BufferizeCastOp](https://github.com/llvm/llvm-project/blob/2bf6e443e54604c7818c4d1a1837f3d091023270/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23).

```c++
class BufferizeCastOp : public OpConversionPattern<tensor::CastOp> {
public:
  using OpConversionPattern::OpConversionPattern;
  LogicalResult
  matchAndRewrite(tensor::CastOp op, OpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    // Convert the tensor result type to the corresponding memref type.
    auto resultType = getTypeConverter()->convertType(op.getType());
    // Replace the tensor cast with a memref cast of the converted operand.
    rewriter.replaceOpWithNewOp<MemRefCastOp>(op, resultType, adaptor.source());
    return success();
  }
};
```

See [the talk](#the-talk) for more details on how to write these patterns.

The
[pass itself](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L57)
is very small, and follows the basic pattern of any dialect conversion pass.

```c++
void mlir::populateTensorBufferizePatterns(
    BufferizeTypeConverter &typeConverter, RewritePatternSet &patterns) {
  patterns.add<BufferizeCastOp, BufferizeExtractOp>(typeConverter,
                                                    patterns.getContext());
}

struct TensorBufferizePass : public TensorBufferizeBase<TensorBufferizePass> {
  void runOnFunction() override {
    auto *context = &getContext();
    BufferizeTypeConverter typeConverter;
    RewritePatternSet patterns(context);
    ConversionTarget target(*context);

    populateTensorBufferizePatterns(typeConverter, patterns);
    target.addIllegalOp<tensor::CastOp, tensor::ExtractOp>();
    target.addLegalDialect<StandardOpsDialect>();

    if (failed(
            applyPartialConversion(getFunction(), target, std::move(patterns))))
      signalPassFailure();
  }
};
```

The pass has all the hallmarks of a dialect conversion pass that does type
conversions: a `TypeConverter`, a `RewritePatternSet`, a `ConversionTarget`,
and a call to `applyPartialConversion`.
Note that the function
`populateTensorBufferizePatterns` is kept separate, so that power users can use
the patterns independently, if necessary (such as to combine multiple sets of
conversion patterns into a single conversion call, for performance).

One convenient utility provided by the MLIR bufferization infrastructure is the
`BufferizeTypeConverter`, which comes pre-loaded with the necessary conversions
and materializations between `tensor` and `memref`.

In this case, the `BufferizationOpsDialect` is marked as legal, so the
`bufferization.to_tensor` and `bufferization.to_memref` ops, which are inserted
automatically by the dialect conversion framework as materializations, are
legal. There is a helper `populateBufferizeMaterializationLegality`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L53))
which helps with this in general.

### Other partial bufferization examples

- `linalg-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Linalg/bufferize.mlir#L1))

  - Bufferizes the `linalg` dialect.
  - This is an example of how to simultaneously bufferize all the ops that
    satisfy a certain OpInterface with a single pattern. Specifically,
    `BufferizeAnyLinalgOp`
    ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L170))
    bufferizes any op that implements the `LinalgOp` interface.

- `scf-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/SCF/Transforms/Bufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/SCF/bufferize.mlir#L1))

  - Bufferizes ops from the `scf` dialect.
  - This is an example of how to bufferize ops that implement
    `RegionBranchOpInterface` (that is, they use regions to represent
    control flow).
  - The bulk of the work is done by
    `lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp`
    ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp#L1)),
    which is well-commented and covers how to correctly convert ops that
    contain regions.

- `func-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/FuncBufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/func-bufferize.mlir#L1))

  - Bufferizes `func`, `call`, and `BranchOpInterface` ops.
  - This is an example of how to bufferize ops that have multi-block
    regions.
  - This is an example of a pass that is not split along dialect
    subdivisions.

- `tensor-constant-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/TensorConstantBufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/tensor-constant-bufferize.mlir#L1))

  - Bufferizes only `arith.constant` ops of `tensor` type.
  - This is an example of setting up the legality so that only a subset of
    `arith.constant` ops get bufferized.
  - This is an example of a pass that is not split along dialect
    subdivisions.

## How to write a finalizing bufferization pass

The contract of a finalizing bufferization pass is that all tensors are gone
from the program.

The easiest way to write a finalizing bufferize pass is to not write one at all!
MLIR provides a pass `finalizing-bufferize` which eliminates the
`bufferization.to_tensor` / `bufferization.to_memref` materialization ops
inserted by partial bufferization passes and emits an error if that is not
sufficient to remove all tensors from the program.

This pass is sufficient when partial bufferization passes have bufferized all
the ops in the program, leaving behind only the materializations. When possible,
it is recommended to structure your pass pipeline this way, as this has the
significant advantage that if an op does not get bufferized (due to a missing
pattern, bug in the code, etc.), `finalizing-bufferize` will emit a nice clean
error, and the IR seen by `finalizing-bufferize` will contain only one
unbufferized op.

However, before the current bufferization infrastructure was put in place,
bufferization could only be done as a single finalizing bufferization mega-pass
that used the `populate*BufferizePatterns` functions from multiple dialects to
bufferize everything at once. Thus, one might see code in downstream projects
structured this way. This structure is not recommended in new code. A helper,
`populateEliminateBufferizeMaterializationsPatterns`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L58)),
is available for such passes to provide patterns that eliminate
`bufferization.to_tensor` and `bufferization.to_memref`.
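
To make the contract of a finalizing pass more concrete, here is a minimal,
self-contained sketch of the idea in plain C++. It is a toy model only and does
not use any MLIR API: `ToyOp` and `finalizeToy` are hypothetical names invented
for this illustration, values are plain integer ids, and any value that no op
defines is assumed to be a memref-typed function argument. The sketch folds
`to_memref(to_tensor(%m))` back to `%m`, drops the casts that became dead, and
reports the first op that still touches a tensor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: ToyOp and finalizeToy are hypothetical, not MLIR APIs.
struct ToyOp {
  std::string name;           // e.g. "bufferization.to_tensor"
  std::vector<int> operands;  // ids of the values this op consumes
  int result;                 // id of the value it produces, or -1 for none
  bool resultIsTensor;        // result type; ignored when result == -1
};

// Returns true if no tensor is left after folding the materializations.
// On failure, `offender` names an op that still produces or consumes a tensor.
// Assumes every cast op has exactly one operand.
bool finalizeToy(std::vector<ToyOp> &ops, std::string &offender) {
  const std::string toTensor = "bufferization.to_tensor";
  const std::string toMemref = "bufferization.to_memref";

  // Step 1: record which op defines each value.
  std::map<int, std::size_t> def;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i].result >= 0) def[ops[i].result] = i;

  // Step 2: a to_memref fed by a to_tensor forwards to the original memref.
  std::map<int, int> forwarded;
  for (const ToyOp &op : ops) {
    if (op.name != toMemref) continue;
    auto it = def.find(op.operands[0]);
    if (it != def.end() && ops[it->second].name == toTensor)
      forwarded[op.result] = ops[it->second].operands[0];
  }

  // Step 3: rewrite every use through the forwarding map (chains included).
  for (ToyOp &op : ops)
    for (int &v : op.operands)
      while (forwarded.count(v)) v = forwarded[v];

  // Step 4: erase the folded to_memref ops, then to_tensor ops with no uses.
  ops.erase(std::remove_if(ops.begin(), ops.end(),
                           [&](const ToyOp &op) {
                             return forwarded.count(op.result) != 0;
                           }),
            ops.end());
  std::set<int> used;
  for (const ToyOp &op : ops)
    used.insert(op.operands.begin(), op.operands.end());
  ops.erase(std::remove_if(ops.begin(), ops.end(),
                           [&](const ToyOp &op) {
                             return op.name == toTensor &&
                                    used.count(op.result) == 0;
                           }),
            ops.end());

  // Step 5: anything that still touches a tensor is an error. Prefer to
  // report a non-cast op, since that is the op nobody bufferized.
  std::set<int> tensors;
  for (const ToyOp &op : ops)
    if (op.result >= 0 && op.resultIsTensor) tensors.insert(op.result);
  std::string leftoverCast;
  for (const ToyOp &op : ops) {
    if (op.name == toTensor || op.name == toMemref) {
      leftoverCast = op.name;
      continue;
    }
    bool touches = op.result >= 0 && op.resultIsTensor;
    for (int v : op.operands) touches = touches || tensors.count(v) != 0;
    if (touches) {
      offender = op.name;
      return false;
    }
  }
  offender = leftoverCast;
  return leftoverCast.empty();
}
```

In this model, a program whose ops were all handled by partial bufferization
passes shrinks to ops on memrefs only, while a program that still contains,
say, a `tensor.extract` consuming the result of a `to_tensor` makes
`finalizeToy` fail and name that op. The real `finalizing-bufferize` pass is
built on the dialect conversion framework rather than on a hand-written walk
like this one.
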

## Changes since [the talk](#the-talk)

- `func-bufferize` was changed to be a partial conversion pass, and there is a
  new `finalizing-bufferize` which serves as a general finalizing
  bufferization pass.