# MLIR Python Bindings

Current status: Under development and not enabled by default

## Building

### Pre-requisites

* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to
  be located by CMake.
* A relatively recent Python3 installation

### CMake variables

* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL`

  Enables building the Python bindings. Defaults to `OFF`.

* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL`

  Links the native extension against the Python runtime library, which is
  optional on some platforms. While setting this to `OFF` can yield some greater
  deployment flexibility, linking in this way allows the linker to report
  unresolved symbols at build time on all platforms, which makes for a
  smoother development workflow. Defaults to `ON`.

* **`PYTHON_EXECUTABLE`**`:STRING`

  Specifies the `python` executable used for the LLVM build, including for
  determining header/link flags for the Python bindings. On systems with
  multiple Python implementations, setting this explicitly to the preferred
  `python3` executable is strongly recommended.

## Design

### Use cases

There are likely two primary use cases for the MLIR Python bindings:

1. Support users who expect that an installed version of LLVM/MLIR will yield
   the ability to `import mlir` and use the API in a pure way out of the box.

2. Downstream integrations will likely want to include parts of the API in their
   private namespace or specially built libraries, probably mixing it with other
   Python native bits.

### Composable modules

In order to support use case \#2, the Python bindings are organized into
composable modules that downstream integrators can include and re-export into
their own namespace if desired. This forces several design points:

* Separate the construction/populating of a `py::module` from the
  `PYBIND11_MODULE` global constructor.

* Introduce headers for C++-only wrapper classes, as other related C++ modules
  will need to interop with them.

* Separate any initialization routines that depend on optional components into
  their own module/dependency (currently, things like `registerAllDialects` fall
  into this category).

There are a lot of related issues of shared library linkage, distribution
concerns, etc. that affect such things. Organizing the code into composable
modules (versus a monolithic `cpp` file) allows the flexibility to address many
of these as needed over time. Also, compilation time for all of the template
meta-programming in pybind scales with the number of things you define in a
translation unit. Breaking into multiple translation units can significantly aid
compile times for APIs with a large surface area.

### Submodules

Generally, the C++ codebase namespaces most things into the `mlir` namespace.
However, in order to modularize and make the Python bindings easier to
understand, sub-packages are defined that map roughly to the directory structure
of functional units in MLIR.

Examples:

* `mlir.ir`
* `mlir.passes` (`pass` is a reserved word :( )
* `mlir.dialect`
* `mlir.execution_engine` (aside from namespacing, it is important that
  "bulky"/optional parts like this are isolated)

In addition, initialization functions that imply optional dependencies should
be in underscored (notionally private) modules such as `_init` and linked
separately. This allows downstream integrators to completely customize what is
included "in the box" and covers things like dialect registration,
pass registration, etc.

### Loader

LLVM/MLIR is a non-trivial Python-native project that is likely to co-exist with
other non-trivial native extensions. As such, the native extension (i.e.
the `.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py`
and siblings which loads and re-exports it. This split provides a place to stage
code that needs to prepare the environment *before* the shared library is loaded
into the Python runtime, and also provides a place that one-time initialization
code can be invoked apart from module constructors.

To start with, the `mlir/__init__.py` loader shim can be very simple and scale to
future need:

```python
from _mlir import *
```

### Use the C-API

The Python APIs should seek to layer on top of the C-API to the degree possible.
Especially for the core, dialect-independent parts, such a binding enables
packaging decisions that would be difficult or impossible if spanning a C++ ABI
boundary. In addition, factoring in this way side-steps some very difficult
issues that arise when combining RTTI-based modules (which pybind-derived things
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).

### Ownership in the Core IR

There are several top-level types in the core IR that are strongly owned by
their Python-side reference:

* `PyContext` (`mlir.ir.Context`)
* `PyModule` (`mlir.ir.Module`)
* `PyOperation` (`mlir.ir.Operation`) - but with caveats

All other objects are dependent. All objects maintain a back-reference
(keep-alive) to their closest containing top-level object. Further, dependent
objects fall into two categories: a) uniqued (which live for the life-time of
the context) and b) mutable. Mutable objects need additional machinery for
keeping track of when the C++ instance that backs their Python object is no
longer valid (typically due to some specific mutation of the IR, deletion, or
bulk operation).

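The keep-alive and validity rules above can be illustrated in plain Python. This is only a sketch under stated assumptions: the classes below are hypothetical stand-ins written for this example (they are not the real `PyContext`/`PyModule` wrappers, which live in C++), and the `valid` flag is one possible way to model the "additional machinery" for mutable objects.

```python
import gc
import weakref


class Context:
    """Hypothetical stand-in for a top-level, strongly owned object."""


class Module:
    """Hypothetical top-level object that keeps its context alive."""

    def __init__(self, context):
        self.context = context  # back-reference (keep-alive)


class Block:
    """Hypothetical mutable dependent object with validity tracking."""

    def __init__(self, parent):
        self.parent = parent  # back-reference to the closest top-level object
        self.valid = True

    def invalidate(self):
        # Would be called when a mutation destroys the backing instance.
        self.valid = False

    def check_valid(self):
        if not self.valid:
            raise RuntimeError("the backing object is no longer valid")
        return self


def demo():
    ctx = Context()
    ctx_ref = weakref.ref(ctx)
    block = Block(Module(ctx))

    # Dropping the user's context reference does not free the context,
    # because the dependent block still reaches it through its parent.
    del ctx
    gc.collect()
    alive_while_referenced = ctx_ref() is not None

    # After invalidation, any access through the wrapper is rejected.
    block.invalidate()
    try:
        block.check_valid()
        rejected_after_invalidation = False
    except RuntimeError:
        rejected_after_invalidation = True

    # Once the last dependent object goes away, the whole chain is released.
    del block
    gc.collect()
    released_afterwards = ctx_ref() is None
    return alive_while_referenced, rejected_after_invalidation, released_afterwards
```

The point of the sketch is the direction of the references: dependents point up to their container, never the other way around, so the container outlives every wrapper that could still touch it.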

### Optionality and argument ordering in the Core IR

The following types support being bound to the current thread as a context
manager:

* `PyLocation` (`loc: mlir.ir.Location = None`)
* `PyInsertionPoint` (`ip: mlir.ir.InsertionPoint = None`)
* `PyMlirContext` (`context: mlir.ir.Context = None`)

To keep function arguments composable, when these types appear as arguments
they should always come last, in the above order and with the given names
(which is generally the order in which they are expected to need to be
expressed explicitly in special cases). Each should carry a default value of
`py::none()` and use either a manual or automatic conversion for resolving
either with the explicit value or a value from the thread context manager
(i.e. `DefaultingPyMlirContext` or `DefaultingPyLocation`).

The rationale for this is that in Python, trailing keyword arguments to the
*right* are the most composable, enabling a variety of strategies such as kwarg
passthrough, default values, etc. Keeping function signatures composable
increases the chances that interesting DSLs and higher level APIs can be
constructed without a lot of exotic boilerplate.

Used consistently, this enables a style of IR construction that rarely needs to
use explicit contexts, locations, or insertion points but is free to do so when
extra control is needed.

#### Operation hierarchy

As mentioned above, `PyOperation` is special because it can exist in either a
top-level or dependent state. The life-cycle is unidirectional: operations can
be created detached (top-level) and once added to another operation, they are
then dependent for the remainder of their lifetime.
The situation is more
complicated when considering construction scenarios where an operation is added
to a transitive parent that is still detached, necessitating further accounting
at such transition points (i.e. all such added children are initially added to
the IR with a parent of their outer-most detached operation, but then once it is
added to an attached operation, they need to be re-parented to the containing
module).

Due to the validity and parenting accounting needs, `PyOperation` is the owner
for regions and blocks and needs to be a top-level type that we can count on not
aliasing. This lets us do things like selectively invalidating instances when
mutations occur without worrying that there is some alias to the same operation
in the hierarchy. Operations are also the only entity that is allowed to be in
a detached state, and they are interned at the context level so that there is
never more than one Python `mlir.ir.Operation` object for a unique
`MlirOperation`, regardless of how it is obtained.

The C/C++ API allows for Region/Block to also be detached, but it simplifies the
ownership model a lot to eliminate that possibility in this API, allowing the
Region/Block to be completely dependent on its owning operation for accounting.
The aliasing of Python `Region`/`Block` instances to underlying
`MlirRegion`/`MlirBlock` is considered benign and these objects are not interned
in the context (unlike operations).

If we ever want to re-introduce detached regions/blocks, we could do so with a
new "DetachedRegion" class or similar and also avoid the complexity of
accounting. With the way it is now, we can avoid having a global live list for
regions and blocks. We may end up needing an op-local one at some point (TBD),
depending on how hard it is to guarantee how mutations interact with their
Python peer objects. We can cross that bridge easily when we get there.

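The interning of operations described above can be sketched in plain Python. This is a minimal illustration, not the bindings' implementation: `Context.operation_for`, `Context.invalidate`, the integer `handle`, and the choice of a weak-valued table are all assumptions made for this example.

```python
import weakref


class Operation:
    """Hypothetical Python peer of a native operation handle."""

    def __init__(self, handle):
        self.handle = handle
        self.valid = True


class Context:
    """Hypothetical context that interns operations in a live list."""

    def __init__(self):
        # Live list: native handle -> live Python peer. Values are weak
        # (an assumption of this sketch) so the table does not keep
        # otherwise unreferenced peers alive.
        self._live_operations = weakref.WeakValueDictionary()

    def operation_for(self, handle):
        # Return the existing peer if one is live, else create and record it.
        op = self._live_operations.get(handle)
        if op is None:
            op = Operation(handle)
            self._live_operations[handle] = op
        return op

    def invalidate(self, handle):
        # Because peers never alias, flagging the single interned instance
        # is enough to invalidate every Python reference to that operation.
        op = self._live_operations.pop(handle, None)
        if op is not None:
            op.valid = False
```

For example, two lookups of the same handle return the same object, so invalidating it through the context is visible everywhere:

```python
ctx = Context()
a = ctx.operation_for(0x1000)
b = ctx.operation_for(0x1000)
print(a is b)      # same Python object for the same native handle
ctx.invalidate(0x1000)
print(b.valid)     # the shared peer is now flagged as invalid
```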

Module, when used purely from the Python API, can't alias anyway, so we can use
it as a top-level ref type without a live-list for interning. If the API ever
changes such that this cannot be guaranteed (i.e. by letting you marshal a
native-defined Module in), then there would need to be a live table for it too.

## Style

In general, for the core parts of MLIR, the Python bindings should be largely
isomorphic with the underlying C++ structures. However, concessions are made
either for practicality or to give the resulting library an appropriately
"Pythonic" flavor.

### Properties vs get\*() methods

Generally favor converting trivial methods like `getContext()`, `getName()`,
`isEntryBlock()`, etc. to read-only Python properties (i.e. `context`). It is
primarily a matter of calling `def_property_readonly` vs `def` in binding code,
and makes things feel much nicer to the Python side.

For example, prefer:

```c++
m.def_property_readonly("context", ...)
```

Over:

```c++
m.def("getContext", ...)
```

### \_\_repr\_\_ methods

Things that have nice printed representations are really great :) If there is a
reasonable printed form, it can be a significant productivity boost to wire that
to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).

### CamelCase vs snake\_case

Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
a mechanical concession to Python style, this can go a long way to making the
API feel like it fits in with its peers in the Python landscape.

If in doubt, choose names that will flow properly with other
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).

### Prefer pseudo-containers

Many core IR constructs provide methods directly on the instance to query count
and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.

For example, a direct mapping of blocks within regions could be done this way:

```python
region = ...

for block in region:
    pass
```

However, this way is preferred:

```python
region = ...

for block in region.blocks:
    pass

print(len(region.blocks))
print(region.blocks[0])
print(region.blocks[-1])
```

Instead of leaking STL-derived identifiers (`front`, `back`, etc.), translate
them to appropriate `__dunder__` methods and iterator wrappers in the bindings.

Note that this can be taken too far, so use good judgment. For example, block
arguments may appear container-like but have defined methods for lookup and
mutation that would be hard to model properly without making semantics
complicated. If running into these, just mirror the C/C++ API.

### Provide one-stop helpers for common things

One-stop helpers that aggregate over multiple low level entities can be
incredibly helpful and are encouraged within reason. For example, making
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
construct a SourceMgr can be quite nice. One-stop helpers do not have to be
mutually exclusive with a more complete mapping of the backing constructs.

## Testing

Tests should be added in the `test/Bindings/Python` directory and should
typically be `.py` files that have a lit run line.

While lit can run any Python module, prefer to lay tests out according to these
rules:

* For tests of the API surface area, prefer
  [`doctest`](https://docs.python.org/3/library/doctest.html).
* For generative tests (those that produce IR), define a Python module that
  constructs/prints the IR and pipe it through `FileCheck`.
* Parsing should be kept self-contained within the module under test by use of
  raw constants and an appropriate `parse_asm` call.
* Any file I/O code should be staged through a tempfile vs relying on file
  artifacts/paths outside of the test module.

### Sample Doctest

```python
# RUN: %PYTHON %s

"""
  >>> m = load_test_module()

Test basics:
  >>> m.operation.name
  'module'
  >>> m.operation.is_registered
  True
  >>> ... etc ...

Verify that repr prints:
  >>> m.operation
  <operation 'module'>
"""

import mlir

TEST_MLIR_ASM = r"""
func @test_operation_correct_regions() {
  // ...
}
"""

# TODO: Move to a test utility class once any of this actually exists.
def load_test_module():
    ctx = mlir.ir.Context()
    ctx.allow_unregistered_dialects = True
    module = ctx.parse_asm(TEST_MLIR_ASM)
    return module


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

### Sample FileCheck test

```python
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s

import mlir

# TODO: Move to a test utility class once any of this actually exists.
def print_module(f):
    # Build the module, then print it under a label FileCheck can anchor on.
    m = f()
    print("// -----")
    print("// TEST_FUNCTION:", f.__name__)
    print(m.to_asm())
    return f

# CHECK-LABEL: TEST_FUNCTION: create_my_op
@print_module
def create_my_op():
    m = mlir.ir.Module()
    builder = m.new_op_builder()
    # CHECK: mydialect.my_operation ...
    builder.my_op()
    return m
```

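To see what the `print_module` decorator above writes to stdout (and therefore what `FileCheck` would match against), it can be exercised without building the bindings. The sketch below is an illustration only: `FakeModule`, `capture`, and `create_empty_module` are hypothetical helpers invented for this example and are not part of the `mlir` package.

```python
import contextlib
import io


class FakeModule:
    """Hypothetical stand-in for a module object; not part of the bindings."""

    def __init__(self, asm):
        self._asm = asm

    def to_asm(self):
        return self._asm


def print_module(f):
    # Same shape as the decorator in the sample test: build, label, print.
    m = f()
    print("// -----")
    print("// TEST_FUNCTION:", f.__name__)
    print(m.to_asm())
    return f


def capture(builder):
    # Collect everything the decorator prints so it can be inspected.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_module(builder)
    return buffer.getvalue()


def create_empty_module():
    return FakeModule("module {\n}")
```

Calling `capture(create_empty_module)` returns the separator line, the `TEST_FUNCTION` label that a `CHECK-LABEL` directive would anchor on, and then the module text, which is the same stream a real test would pipe to `FileCheck`.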