# MLIR Python Bindings

Current status: Under development and not enabled by default


## Building

### Pre-requisites

* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to
  be located by CMake.
* A relatively recent Python3 installation

### CMake variables

* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL`

  Enables building the Python bindings. Defaults to `OFF`.

* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL`

  Links the native extension against the Python runtime library, which is
  optional on some platforms. While setting this to `OFF` can yield some greater
  deployment flexibility, linking in this way allows the linker to report
  build-time errors for unresolved symbols on all platforms, which makes for a
  smoother development workflow. Defaults to `ON`.

* **`PYTHON_EXECUTABLE`**`:STRING`

  Specifies the `python` executable used for the LLVM build, including for
  determining header/link flags for the Python bindings. On systems with
  multiple Python implementations, setting this explicitly to the preferred
  `python3` executable is strongly recommended.


## Design

### Use cases

There are likely two primary use cases for the MLIR Python bindings:

1. Support users who expect that an installed version of LLVM/MLIR will yield
   the ability to `import mlir` and use the API in a pure way out of the box.

2. Downstream integrations will likely want to include parts of the API in their
   private namespace or specially built libraries, probably mixing it with other
   Python native bits.


### Composable modules

In order to support use case #2, the Python bindings are organized into
composable modules that downstream integrators can include and re-export into
their own namespace if desired. This forces several design points:

* Separate the construction/populating of a `py::module` from the
  `PYBIND11_MODULE` global constructor.

* Introduce headers for C++-only wrapper classes, because other related C++
  modules will need to interoperate with them.

* Separate any initialization routines that depend on optional components into
  their own module/dependency (currently, things like `registerAllDialects` fall
  into this category).

There are a lot of interrelated issues of shared library linkage, distribution
concerns, etc. that affect such things. Organizing the code into composable
modules (versus a monolithic `cpp` file) allows the flexibility to address many
of these as needed over time. Also, compilation time for all of the template
meta-programming in pybind scales with the number of things you define in a
translation unit. Breaking into multiple translation units can significantly aid
compile times for APIs with a large surface area.

### Submodules

Generally, the C++ codebase namespaces most things into the `mlir` namespace.
However, in order to modularize and make the Python bindings easier to
understand, sub-packages are defined that map roughly to the directory structure
of functional units in MLIR.

Examples:

* `mlir.ir`
* `mlir.passes` (`pass` is a reserved word :( )
* `mlir.dialect`
* `mlir.execution_engine` (aside from namespacing, it is important that
  "bulky"/optional parts like this are isolated)

In addition, initialization functions that imply optional dependencies should
be in underscored (notionally private) modules such as `_init` and linked
separately. This allows downstream integrators to completely customize what is
included "in the box" and covers things like dialect registration,
pass registration, etc.

### Loader

LLVM/MLIR is a non-trivial Python-native project that is likely to co-exist with
other non-trivial native extensions. As such, the native extension (i.e.
the `.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py`
and siblings which loads and re-exports it. This split provides a place to stage
code that needs to prepare the environment *before* the shared library is loaded
into the Python runtime, and also provides a place that one-time initialization
code can be invoked apart from module constructors.

To start with, the `mlir/__init__.py` loader shim can be very simple and grow
with future need:

```python
from _mlir import *
```

### Limited use of globals

For normal operations, parent-child constructor relationships are realized with
constructor methods on a parent class as opposed to requiring
invocation/creation from a global symbol.

For example, consider two code fragments:

```python
op = build_my_op()
region = mlir.Region(op)
```

vs.

```python
op = build_my_op()
region = op.new_region()
```

For tightly coupled data structures like `Operation`, the latter is generally
preferred because:

* It makes it syntactically harder to create something that is going to access
  illegal memory (less error handling in the bindings, less testing, etc.).

* It reduces the global-API surface area for creating related entities. This
  makes it more likely that, when constructing IR based on an `Operation`
  instance of unknown provenance, receiving code can just call methods on it to
  do what it wants versus needing to reach back into the global namespace and
  find the right `Region` class.

* It leaks fewer things that are in place for C++ convenience (e.g. default
  constructors to invalid instances).

### Use the C-API

The Python APIs should seek to layer on top of the C-API to the degree possible.
Especially for the core, dialect-independent parts, such a binding enables
packaging decisions that would be difficult or impossible if spanning a C++ ABI
boundary. In addition, factoring in this way side-steps some very difficult
issues that arise when combining RTTI-based modules (which pybind derived things
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).


## Style

In general, for the core parts of MLIR, the Python bindings should be largely
isomorphic with the underlying C++ structures. However, concessions are made
either for practicality or to give the resulting library an appropriately
"Pythonic" flavor.

### Properties vs `get*()` methods

Generally favor converting trivial methods like `getContext()`, `getName()`,
`isEntryBlock()`, etc. to read-only Python properties (e.g. `context`). It is
primarily a matter of calling `def_property_readonly` vs `def` in binding code,
and makes things feel much nicer on the Python side.

For example, prefer:

```c++
m.def_property_readonly("context", ...)
```

Over:

```c++
m.def("getContext", ...)
```

### `__repr__` methods

Things that have nice printed representations are really great :) If there is a
reasonable printed form, it can be a significant productivity boost to wire that
to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).

### CamelCase vs snake_case

Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
a mechanical concession to Python style, this can go a long way to making the
API feel like it fits in with its peers in the Python landscape.

If in doubt, choose names that will flow properly with other
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).
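
Taken together, the style points so far (read-only properties, a useful
`__repr__`, PEP 8 naming) suggest how the API might feel from the Python side.
The following is only a pure-Python stand-in to illustrate the conventions; it
is not the actual bindings, and the `Context` and `Operation` classes here are
hypothetical mock-ups rather than the real bound types.

```python
class Context:
    """Hypothetical stand-in for a bound MLIR context."""


class Operation:
    """Hypothetical stand-in for a bound operation (class name in CamelCase)."""

    def __init__(self, context, name, registered=True):
        self._context = context
        self._name = name
        self._registered = registered

    @property
    def context(self):
        # Read-only property rather than a getContext() method.
        return self._context

    @property
    def name(self):
        # Read-only property rather than a getName() method.
        return self._name

    @property
    def is_registered(self):
        # snake_case name rather than isRegistered().
        return self._registered

    def __repr__(self):
        # A compact printed form that is easy to check in a doctest.
        return f"<operation {self._name!r}>"


ctx = Context()
op = Operation(ctx, "module")
print(op.name)
print(op.is_registered)
print(repr(op))
```

Because the properties define no setter, an assignment such as
`op.name = "other"` raises `AttributeError`, which mirrors the read-only
behavior that `def_property_readonly` gives on the binding side.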

### Prefer pseudo-containers

Many core IR constructs provide methods directly on the instance to query count
and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.

For example, a direct mapping of blocks within regions could be done this way:

```python
region = ...

for block in region:
    pass
```

However, this way is preferred:

```python
region = ...

for block in region.blocks:
    pass

print(len(region.blocks))
print(region.blocks[0])
print(region.blocks[-1])
```

Instead of leaking STL-derived identifiers (`front`, `back`, etc.), translate
them to appropriate `__dunder__` methods and iterator wrappers in the bindings.

Note that this can be taken too far, so use good judgment. For example, block
arguments may appear container-like but have defined methods for lookup and
mutation that would be hard to model properly without making semantics
complicated. If running into these, just mirror the C/C++ API.

### Provide one-stop helpers for common things

One-stop helpers that aggregate over multiple low level entities can be
incredibly helpful and are encouraged within reason. For example, making
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
construct a `SourceMgr` can be quite nice. One-stop helpers do not have to be
mutually exclusive with a more complete mapping of the backing constructs.

## Testing

Tests should be added in the `test/Bindings/Python` directory and should
typically be `.py` files that have a lit run line.

While lit can run any Python module, prefer to lay tests out according to these
rules:

* For tests of the API surface area, prefer
  [`doctest`](https://docs.python.org/3/library/doctest.html).
* For generative tests (those that produce IR), define a Python module that
  constructs/prints the IR and pipe it through `FileCheck`.
* Parsing should be kept self-contained within the module under test by use of
  raw constants and an appropriate `parse_asm` call.
* Any file I/O code should be staged through a tempfile rather than relying on
  file artifacts/paths outside of the test module.

### Sample Doctest

```python
# RUN: %PYTHON %s

"""
  >>> m = load_test_module()

Test basics:
  >>> m.operation.name
  'module'
  >>> m.operation.is_registered
  True

... etc ...

Verify that repr prints:
  >>> m.operation
  <operation 'module'>
"""

import mlir

TEST_MLIR_ASM = r"""
func @test_operation_correct_regions() {
  // ...
}
"""

# TODO: Move to a test utility class once any of this actually exists.
def load_test_module():
    ctx = mlir.ir.Context()
    ctx.allow_unregistered_dialects = True
    module = ctx.parse_asm(TEST_MLIR_ASM)
    return module


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

### Sample FileCheck test

```python
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s

import mlir

# TODO: Move to a test utility class once any of this actually exists.
def print_module(f):
    # Runs the decorated function immediately and prints the IR it returns.
    m = f()
    print("// -----")
    print("// TEST_FUNCTION:", f.__name__)
    print(m.to_asm())
    return f

# CHECK-LABEL: TEST_FUNCTION: create_my_op
@print_module
def create_my_op():
    m = mlir.ir.Module()
    builder = m.new_op_builder()
    # CHECK: mydialect.my_operation ...
    builder.my_op()
    return m
```
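
To make the shape of the text that reaches `mlir-opt -split-input-file` and
`FileCheck` more concrete, here is a minimal, self-contained sketch of the
`print_module` decorator pattern. It assumes nothing about the real bindings:
`FakeModule` is a hypothetical stand-in for a module object with a `to_asm()`
method, and the IR string is an arbitrary placeholder.

```python
import contextlib
import io


class FakeModule:
    """Hypothetical stand-in for a module object exposing to_asm()."""

    def __init__(self, asm):
        self._asm = asm

    def to_asm(self):
        return self._asm


def print_module(f):
    # Runs the decorated function at definition time and prints its IR,
    # preceded by a split marker and a label that a CHECK-LABEL can anchor on.
    m = f()
    print("// -----")
    print("// TEST_FUNCTION:", f.__name__)
    print(m.to_asm())
    return f


def run_and_capture():
    # Capture stdout so the emitted text can be inspected directly.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):

        @print_module
        def create_my_op():
            # Placeholder IR text; a real test would build this via the API.
            return FakeModule('"mydialect.my_operation"() : () -> ()')

    return buf.getvalue()


output = run_and_capture()
print(output, end="")
```

Each decorated function contributes one `// -----` delimited chunk, so every
test case is processed separately by `-split-input-file`, and the
`TEST_FUNCTION:` line gives each chunk a stable label for `CHECK-LABEL`.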