# MLIR Python Bindings

Current status: Under development and not enabled by default

## Building

### Pre-requisites

* A relatively recent Python3 installation
* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to
  be located by CMake (auto-detected if installed via
  `python -m pip install pybind11`). Note: minimum version required: 2.6.0.

### CMake variables

* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL`

  Enables building the Python bindings. Defaults to `OFF`.

* **`Python3_EXECUTABLE`**`:STRING`

  Specifies the `python` executable used for the LLVM build, including for
  determining header/link flags for the Python bindings. On systems with
  multiple Python implementations, setting this explicitly to the preferred
  `python3` executable is strongly recommended.

* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL`

  Links the native extension against the Python runtime library, which is
  optional on some platforms. While setting this to `OFF` can yield greater
  deployment flexibility, linking in this way allows the linker to report
  build-time errors for unresolved symbols on all platforms, which makes for a
  smoother development workflow. Defaults to `ON`.

### Recommended development practices

It is recommended to use a Python virtual environment. There are many ways to
set one up, but the following is the simplest:

```shell
# Make sure your 'python' is what you expect. Note that on multi-python
# systems, this may have a version suffix, and on many Linuxes and macOS where
# python2 and python3 co-exist, you may also want to use `python3`.
which python
python -m venv ~/.venv/mlirdev
source ~/.venv/mlirdev/bin/activate

# Now the `python` command will resolve to your virtual environment and
# packages will be installed there.
python -m pip install pybind11 numpy

# Now run `cmake`, `ninja`, et al.
```

For interactive use, it is sufficient to add the `python` directory in your
`build/` directory to the `PYTHONPATH`. Typically:

```shell
export PYTHONPATH=$(cd build && pwd)/python
```

## Design

### Use cases

There are likely two primary use cases for the MLIR Python bindings:

1. Support users who expect that an installed version of LLVM/MLIR will yield
   the ability to `import mlir` and use the API in a pure way out of the box.

2. Downstream integrations will likely want to include parts of the API in their
   private namespace or specially built libraries, probably mixing it with other
   Python native bits.

### Composable modules

In order to support use case \#2, the Python bindings are organized into
composable modules that downstream integrators can include and re-export into
their own namespace if desired. This forces several design points:

* Separate the construction/populating of a `py::module` from the
  `PYBIND11_MODULE` global constructor.

* Introduce headers for C++-only wrapper classes, as other related C++ modules
  will need to interop with them.

* Separate any initialization routines that depend on optional components into
  their own module/dependency (currently, things like `registerAllDialects` fall
  into this category).

There are a lot of related issues of shared library linkage, distribution
concerns, etc. that affect such things. Organizing the code into composable
modules (versus a monolithic `cpp` file) allows the flexibility to address many
of these as needed over time. Also, compilation time for all of the template
meta-programming in pybind scales with the number of things you define in a
translation unit. Breaking into multiple translation units can significantly aid
compile times for APIs with a large surface area.

### Submodules

Generally, the C++ codebase namespaces most things into the `mlir` namespace.
However, in order to modularize and make the Python bindings easier to
understand, sub-packages are defined that map roughly to the directory structure
of functional units in MLIR.

Examples:

* `mlir.ir`
* `mlir.passes` (`pass` is a reserved word :( )
* `mlir.dialect`
* `mlir.execution_engine` (aside from namespacing, it is important that
  "bulky"/optional parts like this are isolated)

In addition, initialization functions that imply optional dependencies should
be in underscored (notionally private) modules such as `_init` and linked
separately. This allows downstream integrators to completely customize what is
included "in the box" and covers things like dialect registration,
pass registration, etc.

### Loader

LLVM/MLIR is a non-trivial Python-native project that is likely to co-exist with
other non-trivial native extensions. As such, the native extension (i.e. the
`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py`
and siblings which loads and re-exports it. This split provides a place to stage
code that needs to prepare the environment *before* the shared library is loaded
into the Python runtime, and also provides a place where one-time initialization
code can be invoked apart from module constructors.

To start with, the `mlir/__init__.py` loader shim can be very simple and scale
to future need:

```python
from _mlir import *
```

### Use the C-API

The Python APIs should seek to layer on top of the C-API to the degree possible.
Especially for the core, dialect-independent parts, such a binding enables
packaging decisions that would be difficult or impossible if spanning a C++ ABI
boundary. In addition, factoring in this way side-steps some very difficult
issues that arise when combining RTTI-based modules (which pybind-derived things
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).

### Ownership in the Core IR

There are several top-level types in the core IR that are strongly owned by
their Python-side reference:

* `PyMlirContext` (`mlir.ir.Context`)
* `PyModule` (`mlir.ir.Module`)
* `PyOperation` (`mlir.ir.Operation`) - but with caveats

All other objects are dependent. All objects maintain a back-reference
(keep-alive) to their closest containing top-level object. Further, dependent
objects fall into two categories: a) uniqued (which live for the lifetime of
the context) and b) mutable. Mutable objects need additional machinery for
keeping track of when the C++ instance that backs their Python object is no
longer valid (typically due to some specific mutation of the IR, deletion, or
bulk operation).

### Optionality and argument ordering in the Core IR

The following types support being bound to the current thread as a context
manager:

* `PyLocation` (`loc: mlir.ir.Location = None`)
* `PyInsertionPoint` (`ip: mlir.ir.InsertionPoint = None`)
* `PyMlirContext` (`context: mlir.ir.Context = None`)

In order to support composability of function arguments, when these types appear
as arguments, they should always come last, in the above order and with the
given names (which is generally the order in which they are expected to need to
be expressed explicitly in special cases). Each should carry a default value of
`py::none()` and use either a manual or automatic conversion that resolves to
the explicit value, if given, or otherwise to the value from the thread context
manager (i.e. `DefaultingPyMlirContext` or `DefaultingPyLocation`).
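
The convention above can be sketched in pure Python. This is only an
illustration of the idea, not the bindings' implementation: `ThreadBound`,
`FakeLocation`, `FakeInsertionPoint`, `FakeContext`, `resolve`, and `create_op`
are all hypothetical stand-ins, and the real resolution happens in C++.

```python
import threading

# Per-thread storage for the stacks of currently bound objects.
_tls = threading.local()


class ThreadBound:
    """Hypothetical base class: instances can be bound to the thread via `with`."""

    @classmethod
    def _stack(cls):
        if not hasattr(_tls, "stacks"):
            _tls.stacks = {}
        # One stack per concrete class, per thread.
        return _tls.stacks.setdefault(cls, [])

    @classmethod
    def current(cls):
        stack = cls._stack()
        return stack[-1] if stack else None

    def __enter__(self):
        self._stack().append(self)
        return self

    def __exit__(self, *exc_info):
        self._stack().pop()
        return False


class FakeLocation(ThreadBound):
    def __init__(self, name):
        self.name = name


class FakeInsertionPoint(ThreadBound):
    pass


class FakeContext(ThreadBound):
    pass


def resolve(explicit, cls):
    """Return the explicit value if given, else the one bound to the thread."""
    value = explicit if explicit is not None else cls.current()
    if value is None:
        raise ValueError(f"no {cls.__name__} specified and none bound to the thread")
    return value


def create_op(name, loc=None, ip=None, context=None):
    # Trailing arguments appear in the documented order: loc, ip, context.
    return {
        "name": name,
        "loc": resolve(loc, FakeLocation),
        "ip": resolve(ip, FakeInsertionPoint),
        "context": resolve(context, FakeContext),
    }


ctx = FakeContext()
with ctx, FakeLocation("outer") as outer, FakeInsertionPoint() as point:
    implicit_op = create_op("a")
    explicit_op = create_op("b", loc=FakeLocation("explicit"))
```

Here `implicit_op` picks up everything from the thread-bound objects, while
`explicit_op` overrides only the location by keyword.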

The rationale for this is that in Python, trailing keyword arguments to the
*right* are the most composable, enabling a variety of strategies such as kwarg
passthrough, default values, etc. Keeping function signatures composable
increases the chances that interesting DSLs and higher level APIs can be
constructed without a lot of exotic boilerplate.

Used consistently, this enables a style of IR construction that rarely needs to
use explicit contexts, locations, or insertion points but is free to do so when
extra control is needed.

#### Operation hierarchy

As mentioned above, `PyOperation` is special because it can exist in either a
top-level or dependent state. The life-cycle is unidirectional: operations can
be created detached (top-level) and once added to another operation, they are
then dependent for the remainder of their lifetime. The situation is more
complicated when considering construction scenarios where an operation is added
to a transitive parent that is still detached, necessitating further accounting
at such transition points (i.e. all such added children are initially added to
the IR with a parent of their outer-most detached operation, but then once it is
added to an attached operation, they need to be re-parented to the containing
module).

Due to the validity and parenting accounting needs, `PyOperation` is the owner
for regions and blocks and needs to be a top-level type that we can count on not
aliasing. This lets us do things like selectively invalidating instances when
mutations occur without worrying that there is some alias to the same operation
in the hierarchy. Operations are also the only entity that is allowed to be in
a detached state, and they are interned at the context level so that there is
never more than one Python `mlir.ir.Operation` object for a unique
`MlirOperation`, regardless of how it is obtained.
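
The interning described above can be pictured with a small pure-Python sketch.
It assumes a per-context live table keyed by the native handle; the names
`FakeOperation`, `FakeContext`, `get_operation`, and `invalidate` are
hypothetical and do not correspond to the actual binding code.

```python
import weakref


class FakeOperation:
    """Hypothetical Python peer of a native operation handle."""

    def __init__(self, handle):
        self.handle = handle
        self.valid = True


class FakeContext:
    """Hypothetical context that interns operation peers by native handle."""

    def __init__(self):
        # Weak values: the table alone never keeps a Python peer alive.
        self._live_operations = weakref.WeakValueDictionary()

    def get_operation(self, handle):
        # Return the existing peer if one is live, otherwise create one.
        op = self._live_operations.get(handle)
        if op is None:
            op = FakeOperation(handle)
            self._live_operations[handle] = op
        return op

    def invalidate(self, handle):
        # Called when a mutation destroys the native operation.
        op = self._live_operations.pop(handle, None)
        if op is not None:
            op.valid = False


context = FakeContext()
first = context.get_operation(0x1000)
second = context.get_operation(0x1000)
same_object = first is second

context.invalidate(0x1000)
replacement = context.get_operation(0x1000)
```

Because every lookup goes through the table, there is exactly one peer to mark
invalid when the underlying operation goes away.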

The C/C++ API allows for Region/Block to also be detached, but it simplifies the
ownership model a lot to eliminate that possibility in this API, allowing the
Region/Block to be completely dependent on its owning operation for accounting.
The aliasing of Python `Region`/`Block` instances to underlying
`MlirRegion`/`MlirBlock` is considered benign and these objects are not interned
in the context (unlike operations).

If we ever want to re-introduce detached regions/blocks, we could do so with a
new "DetachedRegion" class or similar and also avoid the complexity of
accounting. With the way it is now, we can avoid having a global live list for
regions and blocks. We may end up needing an op-local one at some point (TBD),
depending on how hard it is to guarantee how mutations interact with their
Python peer objects. We can cross that bridge easily when we get there.

Module, when used purely from the Python API, can't alias anyway, so we can use
it as a top-level ref type without a live-list for interning. If the API ever
changes such that this cannot be guaranteed (e.g. by letting you marshal a
native-defined Module in), then there would need to be a live table for it too.

## Style

In general, for the core parts of MLIR, the Python bindings should be largely
isomorphic with the underlying C++ structures. However, concessions are made
either for practicality or to give the resulting library an appropriately
"Pythonic" flavor.

### Properties vs get\*() methods

Generally favor converting trivial methods like `getContext()`, `getName()`,
`isEntryBlock()`, etc. to read-only Python properties (e.g. `context`). It is
primarily a matter of calling `def_property_readonly` vs `def` in binding code,
and makes things feel much nicer on the Python side.

For example, prefer:

```c++
m.def_property_readonly("context", ...)
```

Over:

```c++
m.def("getContext", ...)
```

### `__repr__` methods

Things that have nice printed representations are really great :) If there is a
reasonable printed form, it can be a significant productivity boost to wire that
to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).

### CamelCase vs snake\_case

Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
a mechanical concession to Python style, this can go a long way to making the
API feel like it fits in with its peers in the Python landscape.

If in doubt, choose names that will flow properly with other
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).

### Prefer pseudo-containers

Many core IR constructs provide methods directly on the instance to query count
and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.

For example, a direct mapping of blocks within regions could be done this way:

```python
region = ...

for block in region:
    pass
```

However, this way is preferred:

```python
region = ...

for block in region.blocks:
    pass

print(len(region.blocks))
print(region.blocks[0])
print(region.blocks[-1])
```

Instead of leaking STL-derived identifiers (`front`, `back`, etc.), translate
them to appropriate `__dunder__` methods and iterator wrappers in the bindings.

Note that this can be taken too far, so use good judgment. For example, block
arguments may appear container-like but have defined methods for lookup and
mutation that would be hard to model properly without making semantics
complicated. If running into these, just mirror the C/C++ API.
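
As a rough illustration of what a pseudo-container provides on the Python side,
the following sketch wraps a "count" and a "get by index" callable in the
standard sequence dunder methods. `BlockList` and the callables passed to it are
hypothetical; in the bindings the equivalent wrapper would be written in C++
with pybind11. Only integer indices are handled here.

```python
class BlockList:
    """Hypothetical pseudo-container exposing blocks via the sequence protocol."""

    def __init__(self, count_fn, get_fn):
        self._count_fn = count_fn  # returns the number of blocks
        self._get_fn = get_fn      # returns the block at a non-negative index

    def __len__(self):
        return self._count_fn()

    def __getitem__(self, index):
        size = len(self)
        if index < 0:
            index += size  # support Python-style negative indexing
        if not 0 <= index < size:
            raise IndexError("block index out of range")
        return self._get_fn(index)

    def __iter__(self):
        for index in range(len(self)):
            yield self._get_fn(index)


# Stand-in for native storage; real blocks would come from the C-API.
native_blocks = ["^bb0", "^bb1", "^bb2"]
blocks = BlockList(lambda: len(native_blocks), lambda i: native_blocks[i])

first_block = blocks[0]
last_block = blocks[-1]
all_blocks = list(blocks)
```

With this shape, `front`/`back` style accessors become `blocks[0]` and
`blocks[-1]`, and iteration and `len()` work as Python users expect.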

### Provide one-stop helpers for common things

One-stop helpers that aggregate over multiple low level entities can be
incredibly helpful and are encouraged within reason. For example, making
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
construct a SourceMgr can be quite nice. One-stop helpers do not have to be
mutually exclusive with a more complete mapping of the backing constructs.

## Testing

Tests should be added in the `test/Bindings/Python` directory and should
typically be `.py` files that have a lit run line.

We use `lit` and `FileCheck` based tests:

* For generative tests (those that produce IR), define a Python module that
  constructs/prints the IR and pipe it through `FileCheck`.
* Parsing should be kept self-contained within the module under test by use of
  raw constants and an appropriate `parse_asm` call.
* Any file I/O code should be staged through a tempfile vs relying on file
  artifacts/paths outside of the test module.
* For convenience, we also test non-generative API interactions with the same
  mechanisms, printing and `CHECK`ing as needed.

### Sample FileCheck test

```python
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s

import mlir

# TODO: Move to a test utility class once any of this actually exists.
def print_module(f):
    # Build the module, then print it under a label that FileCheck can match.
    m = f()
    print("// -----")
    print("// TEST_FUNCTION:", f.__name__)
    print(m.to_asm())
    return f

# CHECK-LABEL: TEST_FUNCTION: create_my_op
@print_module
def create_my_op():
    m = mlir.ir.Module()
    builder = m.new_op_builder()
    # CHECK: mydialect.my_operation ...
    builder.my_op()
    return m
```

## Integration with ODS

The MLIR Python bindings integrate with the tablegen-based ODS system for
providing user-friendly wrappers around MLIR dialects and operations. There
are multiple parts to this integration, outlined below. Most details have
been elided: refer to the build rules and Python sources under `mlir.dialects`
for the canonical way to use this facility.

### Generating `{DIALECT_NAMESPACE}.py` wrapper modules

Each dialect with a mapping to Python requires that an appropriate
`{DIALECT_NAMESPACE}.py` wrapper module is created. This is done by invoking
`mlir-tblgen` on a Python-bindings-specific tablegen wrapper that includes
the boilerplate and the actual dialect-specific `td` file. An example, for the
`StandardOps` (which is assigned the namespace `std` as a special case):

```tablegen
#ifndef PYTHON_BINDINGS_STANDARD_OPS
#define PYTHON_BINDINGS_STANDARD_OPS

include "mlir/Bindings/Python/Attributes.td"
include "mlir/Dialect/StandardOps/IR/Ops.td"

#endif
```

In the main repository, building the wrapper is done via the CMake function
`add_mlir_dialect_python_bindings`, which invokes:

```shell
mlir-tblgen -gen-python-op-bindings -bind-dialect={DIALECT_NAMESPACE} \
    {PYTHON_BINDING_TD_FILE}
```

### Extending the search path for wrapper modules

When the Python bindings need to locate a wrapper module, they consult the
`dialect_search_path` and use it to find an appropriately named module. For
the main repository, this search path is hard-coded to include the
`mlir.dialects` module, which is where wrappers are emitted by the above build
rule. Out-of-tree dialects can add their modules to the search path by calling:

```python
mlir._cext.append_dialect_search_prefix("myproject.mlir.dialects")
```

### Wrapper module code organization

The wrapper module tablegen emitter outputs:

* A `_Dialect` class (extending `mlir.ir.Dialect`) with a `DIALECT_NAMESPACE`
  attribute.
* An `{OpName}` class for each operation (extending `mlir.ir.OpView`).
* Decorators for each of the above to register with the system.

Note: In order to avoid naming conflicts, all internal names used by the wrapper
module are prefixed by `_ods_`.

Each concrete `OpView` subclass further defines several public-intended
attributes:

* `OPERATION_NAME` attribute with the `str` fully qualified operation name
  (e.g. `std.absf`).
* An `__init__` method for the *default builder* if one is defined or inferred
  for the operation.
* `@property` getter for each operand or result (using an auto-generated name
  for each unnamed one).
* `@property` getter, setter and deleter for each declared attribute.

It further emits additional private-intended attributes meant for subclassing
and customization (default cases omit these attributes in favor of the
defaults on `OpView`):

* `_ODS_REGIONS`: A specification on the number and types of regions.
  Currently a tuple of `(min_region_count, has_no_variadic_regions)`. Note that
  the API does some light validation on this but the primary purpose is to
  capture sufficient information to perform other default building and region
  accessor generation.
* `_ODS_OPERAND_SEGMENTS` and `_ODS_RESULT_SEGMENTS`: Black-box value which
  indicates the structure of either the operand or results with respect to
  variadics. Used by `OpView._ods_build_default` to decode operand and result
  lists that contain lists.

#### Builders

Presently, only a single, default builder is mapped to the `__init__` method.
The intent is that this `__init__` method represents the *most specific* of
the builders typically generated for C++; however, currently it is just the
generic form below.

* One argument for each declared result:
  * For single-valued results: Each will accept an `mlir.ir.Type`.
  * For variadic results: Each will accept a `List[mlir.ir.Type]`.
* One argument for each declared operand or attribute:
  * For single-valued operands: Each will accept an `mlir.ir.Value`.
  * For variadic operands: Each will accept a `List[mlir.ir.Value]`.
  * For attributes, it will accept an `mlir.ir.Attribute`.
* Trailing usage-specific, optional keyword arguments:
  * `loc`: An explicit `mlir.ir.Location` to use. Defaults to the location
    bound to the thread (e.g. `with Location.unknown():`) or an error if none
    is bound nor specified.
  * `ip`: An explicit `mlir.ir.InsertionPoint` to use. Defaults to the insertion
    point bound to the thread (e.g. `with InsertionPoint(...):`).

In addition, each `OpView` inherits a `build_generic` method which allows
construction via a (nested in the case of variadic) sequence of `results` and
`operands`. This can be used to get some default construction semantics for
operations that are otherwise unsupported in Python, at the expense of having
a very generic signature.
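
To illustrate what "nested in the case of variadic" means, the sketch below
shows one way such a sequence could be flattened while recording how many
values each declared operand or result contributed. `flatten_segments` is a
hypothetical helper written for this example; it is not the decoding logic used
by `OpView`, and plain strings stand in for `mlir.ir.Value` objects.

```python
def flatten_segments(values):
    """Flatten a sequence whose variadic entries are themselves lists.

    Returns the flat list plus the number of values contributed by each entry.
    """
    flat = []
    segment_sizes = []
    for entry in values:
        if isinstance(entry, (list, tuple)):
            # Variadic entry: contributes zero or more values.
            flat.extend(entry)
            segment_sizes.append(len(entry))
        else:
            # Single-valued entry: contributes exactly one value.
            flat.append(entry)
            segment_sizes.append(1)
    return flat, segment_sizes


# One single-valued operand, one variadic with two values, one empty variadic.
operands, sizes = flatten_segments(["%a", ["%b", "%c"], []])
```

Keeping the per-entry sizes alongside the flat list preserves enough structure
to tell which values belong to which declared operand.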