# MLIR Python Bindings

Current status: Under development and not enabled by default

## Building

### Pre-requisites

* A relatively recent Python3 installation
* Installation of python dependencies as specified in
  `mlir/lib/Bindings/Python/requirements.txt`

### CMake variables

* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL`

  Enables building the Python bindings. Defaults to `OFF`.

* **`Python3_EXECUTABLE`**`:STRING`

  Specifies the `python` executable used for the LLVM build, including for
  determining header/link flags for the Python bindings. On systems with
  multiple Python implementations, setting this explicitly to the preferred
  `python3` executable is strongly recommended.

* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL`

  Links the native extension against the Python runtime library, which is
  optional on some platforms. While setting this to `OFF` can yield some greater
  deployment flexibility, linking in this way allows the linker to report
  link-time errors for unresolved symbols on all platforms, which makes for a
  smoother development workflow. Defaults to `ON`.

### Recommended development practices

It is recommended to use a python virtual environment. Many ways exist for this,
but the following is the simplest:

```shell
# Make sure your 'python' is what you expect. Note that on multi-python
# systems, this may have a version suffix, and on many Linuxes and MacOS where
# python2 and python3 co-exist, you may also want to use `python3`.
which python
python -m venv ~/.venv/mlirdev
source ~/.venv/mlirdev/bin/activate

# Note that many LTS distros will bundle a version of pip itself that is too
# old to download all of the latest binaries for certain platforms.
# The pip version can be obtained with `python -m pip --version`, and for
# Linux specifically, this should be cross checked with minimum versions
# here: https://github.com/pypa/manylinux
# It is recommended to upgrade pip:
python -m pip install --upgrade pip

# Now the `python` command will resolve to your virtual environment and
# packages will be installed there.
python -m pip install -r mlir/lib/Bindings/Python/requirements.txt

# Now run `cmake`, `ninja`, et al.
```

For interactive use, it is sufficient to add the `python` directory in your
`build/` directory to the `PYTHONPATH`. Typically:

```shell
export PYTHONPATH=$(cd build && pwd)/python
```

## Design

### Use cases

There are likely two primary use cases for the MLIR python bindings:

1. Support users who expect that an installed version of LLVM/MLIR will yield
   the ability to `import mlir` and use the API in a pure way out of the box.

2. Downstream integrations will likely want to include parts of the API in their
   private namespace or specially built libraries, probably mixing it with other
   python native bits.

### Composable modules

In order to support use case \#2, the Python bindings are organized into
composable modules that downstream integrators can include and re-export into
their own namespace if desired. This forces several design points:

* Separate the construction/populating of a `py::module` from the
  `PYBIND11_MODULE` global constructor.

* Introduce headers for C++-only wrapper classes as other related C++ modules
  will need to interop with them.

* Separate any initialization routines that depend on optional components into
  their own module/dependency (currently, things like `registerAllDialects` fall
  into this category).

There are a lot of co-related issues of shared library linkage, distribution
concerns, etc that affect such things.
Organizing the code into composable
modules (versus a monolithic `cpp` file) allows the flexibility to address many
of these as needed over time. Also, compilation time for all of the template
meta-programming in pybind scales with the number of things you define in a
translation unit. Breaking into multiple translation units can significantly aid
compile times for APIs with a large surface area.

### Submodules

Generally, the C++ codebase namespaces most things into the `mlir` namespace.
However, in order to modularize and make the Python bindings easier to
understand, sub-packages are defined that map roughly to the directory structure
of functional units in MLIR.

Examples:

* `mlir.ir`
* `mlir.passes` (`pass` is a reserved word :( )
* `mlir.dialect`
* `mlir.execution_engine` (aside from namespacing, it is important that
  "bulky"/optional parts like this are isolated)

In addition, initialization functions that imply optional dependencies should
be in underscored (notionally private) modules such as `_init` and linked
separately. This allows downstream integrators to completely customize what is
included "in the box" and covers things like dialect registration,
pass registration, etc.

### Loader

LLVM/MLIR is a non-trivial python-native project that is likely to co-exist with
other non-trivial native extensions. As such, the native extension (i.e. the
`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
(`_mlir`), while a small set of Python code is provided in
`mlir/_cext_loader.py` and siblings which loads and re-exports it. This
split provides a place to stage code that needs to prepare the environment
*before* the shared library is loaded into the Python runtime, and also
provides a place that one-time initialization code can be invoked apart from
module constructors.

It is recommended to avoid using `__init__.py` files to the extent possible,
until reaching a leaf package that represents a discrete component. The rule
to keep in mind is that the presence of an `__init__.py` file prevents the
ability to split anything at that level or below in the namespace into
different directories, deployment packages, wheels, etc.

See the documentation for more information and advice:
https://packaging.python.org/guides/packaging-namespace-packages/

### Use the C-API

The Python APIs should seek to layer on top of the C-API to the degree possible.
Especially for the core, dialect-independent parts, such a binding enables
packaging decisions that would be difficult or impossible if spanning a C++ ABI
boundary. In addition, factoring in this way side-steps some very difficult
issues that arise when combining RTTI-based modules (which pybind derived things
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).

### Ownership in the Core IR

There are several top-level types in the core IR that are strongly owned by their python-side reference:

* `PyContext` (`mlir.ir.Context`)
* `PyModule` (`mlir.ir.Module`)
* `PyOperation` (`mlir.ir.Operation`) - but with caveats

All other objects are dependent. All objects maintain a back-reference
(keep-alive) to their closest containing top-level object. Further, dependent
objects fall into two categories: a) uniqued (which live for the life-time of
the context) and b) mutable. Mutable objects need additional machinery for
keeping track of when the C++ instance that backs their Python object is no
longer valid (typically due to some specific mutation of the IR, deletion, or
bulk operation).
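
The keep-alive and invalidation ideas above can be illustrated with a small
pure-Python sketch. The `Context` and `Block` classes here are hypothetical
stand-ins rather than the real binding types, and the `invalidate` method is an
assumption about how such bookkeeping might look.

```python
import gc
import weakref


class Context:
    """Stand-in for a top-level, strongly owned object."""


class Block:
    """Stand-in for a mutable dependent object."""

    def __init__(self, owner):
        # The back-reference keeps the containing top-level object alive.
        self._owner = owner
        self._valid = True

    def invalidate(self):
        # Would be called when the backing native instance goes away.
        self._valid = False

    @property
    def owner(self):
        if not self._valid:
            raise RuntimeError("the backing object is no longer valid")
        return self._owner


ctx = Context()
ctx_ref = weakref.ref(ctx)
block = Block(ctx)

# Dropping the user's reference does not free the context: `block` holds it.
del ctx
gc.collect()
kept_alive = ctx_ref() is not None

# After invalidation, access fails loudly instead of touching stale state.
block.invalidate()
try:
    block.owner
    raised = False
except RuntimeError:
    raised = True

print(kept_alive, raised)  # True True
```
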

### Optionality and argument ordering in the Core IR

The following types support being bound to the current thread as a context manager:

* `PyLocation` (`loc: mlir.ir.Location = None`)
* `PyInsertionPoint` (`ip: mlir.ir.InsertionPoint = None`)
* `PyMlirContext` (`context: mlir.ir.Context = None`)

In order to support composability of function arguments, when these types appear
as arguments, they should always be the last and appear in the above order and
with the given names (which is generally the order in which they are expected to
need to be expressed explicitly in special cases) as necessary. Each should
carry a default value of `py::none()` and use either a manual or automatic
conversion for resolving either with the explicit value or a value from the
thread context manager (i.e. `DefaultingPyMlirContext` or
`DefaultingPyLocation`).

The rationale for this is that in Python, trailing keyword arguments to the
*right* are the most composable, enabling a variety of strategies such as kwarg
passthrough, default values, etc. Keeping function signatures composable
increases the chances that interesting DSLs and higher level APIs can be
constructed without a lot of exotic boilerplate.

Used consistently, this enables a style of IR construction that rarely needs to
use explicit contexts, locations, or insertion points but is free to do so when
extra control is needed.

#### Operation hierarchy

As mentioned above, `PyOperation` is special because it can exist in either a
top-level or dependent state. The life-cycle is unidirectional: operations can
be created detached (top-level) and once added to another operation, they are
then dependent for the remainder of their lifetime.
The situation is more
complicated when considering construction scenarios where an operation is added
to a transitive parent that is still detached, necessitating further accounting
at such transition points (i.e. all such added children are initially added to
the IR with a parent of their outer-most detached operation, but then once it is
added to an attached operation, they need to be re-parented to the containing
module).

Due to the validity and parenting accounting needs, `PyOperation` is the owner
for regions and blocks and needs to be a top-level type that we can count on not
aliasing. This lets us do things like selectively invalidating instances when
mutations occur without worrying that there is some alias to the same operation
in the hierarchy. Operations are also the only entities that are allowed to be in
a detached state, and they are interned at the context level so that there is
never more than one Python `mlir.ir.Operation` object for a unique
`MlirOperation`, regardless of how it is obtained.

The C/C++ API allows for Region/Block to also be detached, but it simplifies the
ownership model a lot to eliminate that possibility in this API, allowing the
Region/Block to be completely dependent on its owning operation for accounting.
The aliasing of Python `Region`/`Block` instances to underlying
`MlirRegion`/`MlirBlock` is considered benign and these objects are not interned
in the context (unlike operations).

If we ever want to re-introduce detached regions/blocks, we could do so with a new
"DetachedRegion" class or similar and also avoid the complexity of accounting.
With the way it is now, we can avoid having a global live list for regions and
blocks. We may end up needing an op-local one at some point TBD, depending on
how hard it is to guarantee how mutations interact with their Python peer
objects. We can cross that bridge easily when we get there.
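
The context-level interning of operations described above can be sketched as a
"live list" keyed by the native handle. This is only an illustration in pure
Python: the real bookkeeping lives in the C++ binding code, and the `Context`,
`Operation`, and `operation_for` names, the integer handles, and the use of
`weakref.WeakValueDictionary` are all assumptions of this sketch.

```python
import weakref


class Operation:
    """Stand-in for the Python peer of a native operation handle."""

    def __init__(self, handle):
        self.handle = handle


class Context:
    """Stand-in for a context that interns its operations."""

    def __init__(self):
        # Live list: native handle -> Python peer. Weak values mean the table
        # itself never keeps a peer alive.
        self._live_operations = weakref.WeakValueDictionary()

    def operation_for(self, handle):
        # Return the existing peer if there is one, else create and record it.
        op = self._live_operations.get(handle)
        if op is None:
            op = Operation(handle)
            self._live_operations[handle] = op
        return op


ctx = Context()
a = ctx.operation_for(0x1000)
b = ctx.operation_for(0x1000)
c = ctx.operation_for(0x2000)
print(a is b, a is c)  # True False
```

Because every lookup for the same handle yields the same Python object,
invalidating that one object is enough to invalidate every reference to it.
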

Module, when used purely from the Python API, can't alias anyway, so we can use
it as a top-level ref type without a live-list for interning. If the API ever
changes such that this cannot be guaranteed (i.e. by letting you marshal a
native-defined Module in), then there would need to be a live table for it too.

## Style

In general, for the core parts of MLIR, the Python bindings should be largely
isomorphic with the underlying C++ structures. However, concessions are made
either for practicality or to give the resulting library an appropriately
"Pythonic" flavor.

### Properties vs get\*() methods

Generally favor converting trivial methods like `getContext()`, `getName()`,
`isEntryBlock()`, etc to read-only Python properties (i.e. `context`). It is
primarily a matter of calling `def_property_readonly` vs `def` in binding code,
and makes things feel much nicer to the Python side.

For example, prefer:

```c++
m.def_property_readonly("context", ...)
```

Over:

```c++
m.def("getContext", ...)
```

### `__repr__` methods

Things that have nice printed representations are really great :) If there is a
reasonable printed form, it can be a significant productivity boost to wire that
to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).

### CamelCase vs snake\_case

Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
a mechanical concession to Python style, this can go a long way to making the
API feel like it fits in with its peers in the Python landscape.

If in doubt, choose names that will flow properly with other
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).

### Prefer pseudo-containers

Many core IR constructs provide methods directly on the instance to query count
and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.

For example, a direct mapping of blocks within regions could be done this way:

```python
region = ...

for block in region:
    pass
```

However, this way is preferred:

```python
region = ...

for block in region.blocks:
    pass

print(len(region.blocks))
print(region.blocks[0])
print(region.blocks[-1])
```

Instead of leaking STL-derived identifiers (`front`, `back`, etc), translate
them to appropriate `__dunder__` methods and iterator wrappers in the bindings.

Note that this can be taken too far, so use good judgment. For example, block
arguments may appear container-like but have defined methods for lookup and
mutation that would be hard to model properly without making semantics
complicated. If running into these, just mirror the C/C++ API.

### Provide one stop helpers for common things

One stop helpers that aggregate over multiple low level entities can be
incredibly helpful and are encouraged within reason. For example, making
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
construct a SourceMgr can be quite nice. One stop helpers do not have to be
mutually exclusive with a more complete mapping of the backing constructs.

## Testing

Tests should be added in the `test/Bindings/Python` directory and should
typically be `.py` files that have a lit run line.

We use `lit` and `FileCheck` based tests:

* For generative tests (those that produce IR), define a Python module that
  constructs/prints the IR and pipe it through `FileCheck`.
* Parsing should be kept self-contained within the module under test by use of
  raw constants and an appropriate `parse_asm` call.
* Any file I/O code should be staged through a tempfile vs relying on file
  artifacts/paths outside of the test module.
* For convenience, we also test non-generative API interactions with the same
  mechanisms, printing and `CHECK`ing as needed.

### Sample FileCheck test

```python
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s

import mlir

# TODO: Move to a test utility class once any of this actually exists.
def print_module(f):
    # Build the module, then print it under a label that FileCheck can match.
    m = f()
    print("// -----")
    print("// TEST_FUNCTION:", f.__name__)
    print(m.to_asm())
    return f

# CHECK-LABEL: TEST_FUNCTION: create_my_op
@print_module
def create_my_op():
    m = mlir.ir.Module()
    builder = m.new_op_builder()
    # CHECK: mydialect.my_operation ...
    builder.my_op()
    return m
```

## Integration with ODS

The MLIR Python bindings integrate with the tablegen-based ODS system for
providing user-friendly wrappers around MLIR dialects and operations. There
are multiple parts to this integration, outlined below. Most details have
been elided: refer to the build rules and python sources under `mlir.dialects`
for the canonical way to use this facility.

Users are responsible for providing a `{DIALECT_NAMESPACE}.py` (or an
equivalent directory with `__init__.py` file) as the entrypoint.

### Generating `_{DIALECT_NAMESPACE}_ops_gen.py` wrapper modules

Each dialect with a mapping to python requires that an appropriate
`_{DIALECT_NAMESPACE}_ops_gen.py` wrapper module is created. This is done by
invoking `mlir-tblgen` on a python-bindings specific tablegen wrapper that
includes the boilerplate and actual dialect specific `td` file.
An example, for
the `StandardOps` (which is assigned the namespace `std` as a special case):

```tablegen
#ifndef PYTHON_BINDINGS_STANDARD_OPS
#define PYTHON_BINDINGS_STANDARD_OPS

include "mlir/Bindings/Python/Attributes.td"
include "mlir/Dialect/StandardOps/IR/Ops.td"

#endif
```

In the main repository, building the wrapper is done via the CMake function
`add_mlir_dialect_python_bindings`, which invokes:

```
mlir-tblgen -gen-python-op-bindings -bind-dialect={DIALECT_NAMESPACE} \
    {PYTHON_BINDING_TD_FILE}
```

The generated op classes must be included in the `{DIALECT_NAMESPACE}.py` file
in a similar way that generated headers are included for C++ generated code:

```python
from ._my_dialect_ops_gen import *
```

### Extending the search path for wrapper modules

When the python bindings need to locate a wrapper module, they consult the
`dialect_search_path` and use it to find an appropriately named module. For
the main repository, this search path is hard-coded to include the
`mlir.dialects` module, which is where wrappers are emitted by the above build
rule. Out-of-tree dialects can add their modules to the search path by calling:

```python
mlir._cext.append_dialect_search_prefix("myproject.mlir.dialects")
```

### Wrapper module code organization

The wrapper module tablegen emitter outputs:

* A `_Dialect` class (extending `mlir.ir.Dialect`) with a `DIALECT_NAMESPACE`
  attribute.
* An `{OpName}` class for each operation (extending `mlir.ir.OpView`).
* Decorators for each of the above to register with the system.

Note: In order to avoid naming conflicts, all internal names used by the wrapper
module are prefixed by `_ods_`.

Each concrete `OpView` subclass further defines several public-intended
attributes:

* `OPERATION_NAME` attribute with the `str` fully qualified operation name
  (i.e.
  `std.absf`).
* An `__init__` method for the *default builder* if one is defined or inferred
  for the operation.
* `@property` getter for each operand or result (using an auto-generated name
  for any that are unnamed).
* `@property` getter, setter and deleter for each declared attribute.

It further emits additional private-intended attributes meant for subclassing
and customization (default cases omit these attributes in favor of the
defaults on `OpView`):

* `_ODS_REGIONS`: A specification on the number and types of regions.
  Currently a tuple of `(min_region_count, has_no_variadic_regions)`. Note that
  the API does some light validation on this but the primary purpose is to
  capture sufficient information to perform other default building and region
  accessor generation.
* `_ODS_OPERAND_SEGMENTS` and `_ODS_RESULT_SEGMENTS`: Black-box value which
  indicates the structure of either the operands or results with respect to
  variadics. Used by `OpView._ods_build_default` to decode operand and result
  lists that contain lists.

#### Default Builder

Presently, only a single, default builder is mapped to the `__init__` method.
The intent is that this `__init__` method represents the *most specific* of
the builders typically generated for C++; however currently it is just the
generic form below.

* One argument for each declared result:
  * For single-valued results: Each will accept an `mlir.ir.Type`.
  * For variadic results: Each will accept a `List[mlir.ir.Type]`.
* One argument for each declared operand or attribute:
  * For single-valued operands: Each will accept an `mlir.ir.Value`.
  * For variadic operands: Each will accept a `List[mlir.ir.Value]`.
  * For attributes, it will accept an `mlir.ir.Attribute`.
* Trailing usage-specific, optional keyword arguments:
  * `loc`: An explicit `mlir.ir.Location` to use. Defaults to the location
    bound to the thread (i.e.
    `with Location.unknown():`) or raises an error if none
    is bound or specified.
  * `ip`: An explicit `mlir.ir.InsertionPoint` to use. Defaults to the insertion
    point bound to the thread (i.e. `with InsertionPoint(...):`).

In addition, each `OpView` inherits a `build_generic` method which allows
construction via a (nested in the case of variadic) sequence of `results` and
`operands`. This can be used to get some default construction semantics for
operations that are otherwise unsupported in Python, at the expense of having
a very generic signature.

#### Extending Generated Op Classes

Note that this is a rather complex mechanism and this section errs on the side
of explicitness. Users are encouraged to find an example and duplicate it if
they don't feel the need to understand the subtlety. The `builtin` dialect
provides some relatively simple examples.

As mentioned above, the build system generates Python sources like
`_{DIALECT_NAMESPACE}_ops_gen.py` for each dialect with Python bindings. It
is often desirable to use these generated classes as a starting point for
further customization, so an extension mechanism is provided to make this
easy (you are always free to do ad-hoc patching in your `{DIALECT_NAMESPACE}.py`
file but we prefer a more standard mechanism that is applied uniformly).

To provide extensions, add a `_{DIALECT_NAMESPACE}_ops_ext.py` file to the
`dialects` module (i.e. adjacent to your `{DIALECT_NAMESPACE}.py` top-level
and the `*_ops_gen.py` file). Using the `builtin` dialect and `FuncOp` as an
example, the generated code will include an import like this:

```python
try:
    from . import _builtin_ops_ext as _ods_ext_module
except ImportError:
    _ods_ext_module = None
```

Then for each generated concrete `OpView` subclass, it will apply a decorator
like:

```python
@_ods_cext.register_operation(_Dialect)
@_ods_extend_opview_class(_ods_ext_module)
class FuncOp(_ods_ir.OpView):
    ...  # generated class body not shown
```

See the `_ods_common.py` `extend_opview_class` function for details of the
mechanism. At a high level:

* If the extension module exists, locate an extension class for the op (in
  this example, `FuncOp`):
  * First by looking for an attribute with the exact name in the extension
    module.
  * Falling back to calling a `select_opview_mixin(parent_opview_cls)`
    function defined in the extension module.
* If a mixin class is found, a new subclass is dynamically created that multiply
  inherits from `({_builtin_ops_ext.FuncOp}, _builtin_ops_gen.FuncOp)`.

The mixin class should not inherit from anything (i.e. directly extends
`object` only). The facility is typically used to define custom `__init__`
methods, properties, instance methods and static methods. Due to the
inheritance ordering, the mixin class can act as though it extends the
generated `OpView` subclass in most contexts (i.e.
`issubclass(_builtin_ops_ext.FuncOp, OpView)` will return `False` but usage
generally allows you to treat it as duck typed as an `OpView`).

There are a couple of recommendations, given how the class hierarchy is
defined:

* For static methods that need to instantiate the actual "leaf" op (which
  is dynamically generated and would result in circular dependencies to try
  to reference by name), prefer to use `@classmethod` and the concrete
  subclass will be provided as your first `cls` argument. See
  `_builtin_ops_ext.FuncOp.from_py_func` as an example.
* If seeking to replace the generated `__init__` method entirely, you may
  actually want to invoke the super-super-class `mlir.ir.OpView` constructor
  directly, as it takes an `mlir.ir.Operation`, which is likely what you
  are constructing (i.e. the generated `__init__` method likely adds more
  API constraints than you want to expose in a custom builder).

A pattern that comes up frequently is wanting to provide a sugared `__init__`
method which has optional arguments or type-polymorphic/implicit conversions
but otherwise invokes the default op building logic. For such cases,
it is recommended to use an idiom such as:

```python
def __init__(self, sugar, spice, *, loc=None, ip=None):
    # ... massage into result_type, operands, attributes ...
    OpView.__init__(self, self.build_generic(
        results=[result_type],
        operands=operands,
        attributes=attributes,
        loc=loc,
        ip=ip))
```

Refer to the documentation for `build_generic` for more information.
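
The mixin lookup and dynamic subclassing described in this section can be
sketched in pure Python. This is a simplified stand-in written for illustration,
not the actual `_ods_common.py` implementation: the local `OpView` class, the
`FuncOpMixin` class with its `create` and `describe` methods, and the use of
`types.SimpleNamespace` in place of an `_ops_ext.py` module are all assumptions
of the sketch.

```python
import types


class OpView:
    """Stand-in for `mlir.ir.OpView`."""


def extend_opview_class(ext_module):
    """Simplified stand-in for the decorator factory described above."""

    def decorator(parent_opview_cls):
        if ext_module is None:
            return parent_opview_cls
        # First look for an attribute with the exact class name.
        mixin_cls = getattr(ext_module, parent_opview_cls.__name__, None)
        if mixin_cls is None:
            # Fall back to an optional selector function.
            select = getattr(ext_module, "select_opview_mixin", None)
            if select is not None:
                mixin_cls = select(parent_opview_cls)
        if mixin_cls is None:
            return parent_opview_cls
        # Create a leaf class inheriting from (mixin, generated class).
        return type(parent_opview_cls.__name__, (mixin_cls, parent_opview_cls), {})

    return decorator


class FuncOpMixin:
    """Hypothetical extension class; it extends `object` only."""

    @classmethod
    def create(cls):
        # `cls` is the dynamically created leaf class, not the mixin itself.
        return cls()

    def describe(self):
        return "extended " + self.OPERATION_NAME


# Stand-in for an `_ops_ext.py` module exposing a class named `FuncOp`.
ext_module = types.SimpleNamespace(FuncOp=FuncOpMixin)


@extend_opview_class(ext_module)
class FuncOp(OpView):
    OPERATION_NAME = "builtin.func"


op = FuncOp.create()
print(isinstance(op, OpView), issubclass(FuncOpMixin, OpView), op.describe())
# True False extended builtin.func
```

Because the mixin comes first in the base list, its methods take precedence
over the generated ones, while `@classmethod` helpers still receive the leaf
class and so can construct the right type without naming it.
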