# MLIR Python Bindings

Current status: Under development and not enabled by default

## Building

### Pre-requisites

* A relatively recent Python3 installation
* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to
  be located by CMake (auto-detected if installed via
  `python -m pip install pybind11`). Note: the minimum required version is
  2.6.0.

### CMake variables

* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL`

  Enables building the Python bindings. Defaults to `OFF`.

* **`Python3_EXECUTABLE`**`:STRING`

  Specifies the `python` executable used for the LLVM build, including for
  determining header/link flags for the Python bindings. On systems with
  multiple Python implementations, setting this explicitly to the preferred
  `python3` executable is strongly recommended.

* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL`

  Links the native extension against the Python runtime library, which is
  optional on some platforms. While setting this to `OFF` can yield somewhat
  greater deployment flexibility, linking in this way allows the linker to
  report unresolved symbols as build-time errors on all platforms, which makes
  for a smoother development workflow. Defaults to `ON`.

### Recommended development practices

It is recommended to use a Python virtual environment. Many ways exist for this,
but the following is the simplest:

```shell
# Make sure your 'python' is what you expect. Note that on multi-python
# systems, this may have a version suffix, and on many Linuxes and macOS where
# python2 and python3 co-exist, you may also want to use `python3`.
which python
python -m venv ~/.venv/mlirdev
source ~/.venv/mlirdev/bin/activate

# Now the `python` command will resolve to your virtual environment and
# packages will be installed there.
python -m pip install pybind11 numpy

# Now run `cmake`, `ninja`, et al.
```

For interactive use, it is sufficient to add the `python` directory in your
`build/` directory to the `PYTHONPATH`. Typically:

```shell
export PYTHONPATH=$(cd build && pwd)/python
```

## Design

### Use cases

There are likely two primary use cases for the MLIR Python bindings:

1. Support users who expect that an installed version of LLVM/MLIR will yield
   the ability to `import mlir` and use the API in a pure way out of the box.

2. Downstream integrations will likely want to include parts of the API in their
   private namespace or specially built libraries, probably mixing it with other
   Python native bits.

### Composable modules

In order to support use case \#2, the Python bindings are organized into
composable modules that downstream integrators can include and re-export into
their own namespace if desired. This forces several design points:

* Separate the construction/populating of a `py::module` from the
  `PYBIND11_MODULE` global constructor.

* Introduce headers for C++-only wrapper classes, as other related C++ modules
  will need to interop with them.

* Separate any initialization routines that depend on optional components into
  their own module/dependency (currently, things like `registerAllDialects`
  fall into this category).

There are a lot of related issues of shared library linkage, distribution
concerns, etc. that affect such things. Organizing the code into composable
modules (versus a monolithic `cpp` file) allows the flexibility to address many
of these as needed over time. Also, compilation time for all of the template
meta-programming in pybind scales with the number of things you define in a
translation unit. Breaking into multiple translation units can significantly aid
compile times for APIs with a large surface area.

### Submodules

Generally, the C++ codebase namespaces most things into the `mlir` namespace.
However, in order to modularize and make the Python bindings easier to
understand, sub-packages are defined that map roughly to the directory structure
of functional units in MLIR.

Examples:

* `mlir.ir`
* `mlir.passes` (`pass` is a reserved word :( )
* `mlir.dialect`
* `mlir.execution_engine` (aside from namespacing, it is important that
  "bulky"/optional parts like this are isolated)

In addition, initialization functions that imply optional dependencies should
be in underscored (notionally private) modules such as `_init` and linked
separately. This allows downstream integrators to completely customize what is
included "in the box" and covers things like dialect registration,
pass registration, etc.

### Loader

LLVM/MLIR is a non-trivial Python-native project that is likely to co-exist with
other non-trivial native extensions. As such, the native extension (i.e. the
`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py`
and siblings which loads and re-exports it. This split provides a place to stage
code that needs to prepare the environment *before* the shared library is loaded
into the Python runtime, and also provides a place that one-time initialization
code can be invoked apart from module constructors.

To start with, the `mlir/__init__.py` loader shim can be very simple and scale
to future needs:

```python
from _mlir import *
```

### Use the C-API

The Python APIs should seek to layer on top of the C-API to the degree possible.
Especially for the core, dialect-independent parts, such a binding enables
packaging decisions that would be difficult or impossible if spanning a C++ ABI
boundary. In addition, factoring in this way side-steps some very difficult
issues that arise when combining RTTI-based modules (which pybind derived things
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).

### Ownership in the Core IR

There are several top-level types in the core IR that are strongly owned by
their Python-side reference:

* `PyContext` (`mlir.ir.Context`)
* `PyModule` (`mlir.ir.Module`)
* `PyOperation` (`mlir.ir.Operation`) - but with caveats

All other objects are dependent. All objects maintain a back-reference
(keep-alive) to their closest containing top-level object. Further, dependent
objects fall into two categories: a) uniqued (which live for the life-time of
the context) and b) mutable. Mutable objects need additional machinery for
keeping track of when the C++ instance that backs their Python object is no
longer valid (typically due to some specific mutation of the IR, deletion, or
bulk operation).
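
The ownership rules above can be modelled in a few lines of plain Python. This
is only a toy sketch: the classes `Context`, `UniquedType` and `MutableHandle`
are hypothetical stand-ins rather than the real binding classes, and a simple
validity flag stands in for whatever machinery tracks the backing C++ instance.

```python
class Context:
    """Top-level object: kept alive by whoever holds a Python reference."""

    def __init__(self):
        self._uniqued = {}

    def get_type(self, spelling):
        # Uniqued dependents are created once and live as long as the context.
        if spelling not in self._uniqued:
            self._uniqued[spelling] = UniquedType(self, spelling)
        return self._uniqued[spelling]


class UniquedType:
    """Dependent, uniqued object with a keep-alive back-reference."""

    def __init__(self, context, spelling):
        self.context = context  # back-reference keeps the context alive
        self.spelling = spelling


class MutableHandle:
    """Dependent, mutable object that can become invalid after a mutation."""

    def __init__(self, owner, payload):
        self.owner = owner  # back-reference to the containing top-level object
        self._payload = payload
        self._valid = True

    def invalidate(self):
        # Called when the backing object is erased or otherwise goes away.
        self._valid = False

    @property
    def payload(self):
        if not self._valid:
            raise RuntimeError("the backing object is no longer valid")
        return self._payload
```

Accessing `payload` after `invalidate()` raises instead of touching a dangling
object, which is the behaviour the extra machinery for mutable objects is meant
to guarantee.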

### Optionality and argument ordering in the Core IR

The following types support being bound to the current thread as a context
manager:

* `PyLocation` (`loc: mlir.ir.Location = None`)
* `PyInsertionPoint` (`ip: mlir.ir.InsertionPoint = None`)
* `PyMlirContext` (`context: mlir.ir.Context = None`)

In order to support composability of function arguments, when these types appear
as arguments, they should always come last, in the above order and with the
given names (which is generally the order in which they are expected to need to
be expressed explicitly in special cases). Each should carry a default value of
`py::none()` and use either a manual or automatic conversion that resolves to
the explicit value or, failing that, to the value from the thread context
manager (i.e. `DefaultingPyMlirContext` or `DefaultingPyLocation`).

The rationale for this is that in Python, trailing keyword arguments to the
*right* are the most composable, enabling a variety of strategies such as kwarg
passthrough, default values, etc. Keeping function signatures composable
increases the chances that interesting DSLs and higher level APIs can be
constructed without a lot of exotic boilerplate.

Used consistently, this enables a style of IR construction that rarely needs to
use explicit contexts, locations, or insertion points but is free to do so when
extra control is needed.
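
The defaulting behaviour described above can be sketched in pure Python with a
thread-local stack per type. Everything here is a simplified assumption made
for illustration: `_ThreadBound`, `_resolve` and `create_op` are hypothetical
names, and the real bindings perform the equivalent resolution in C++.

```python
import threading

_tls = threading.local()


class _ThreadBound:
    """Mixin: `with obj:` binds obj as the current default for its type."""

    _slot = None  # name of the per-thread stack, set by subclasses

    def __enter__(self):
        stack = getattr(_tls, self._slot, None)
        if stack is None:
            stack = []
            setattr(_tls, self._slot, stack)
        stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        getattr(_tls, self._slot).pop()
        return False

    @classmethod
    def current(cls):
        stack = getattr(_tls, cls._slot, None)
        return stack[-1] if stack else None


class Context(_ThreadBound):
    _slot = "context"


class Location(_ThreadBound):
    _slot = "loc"

    def __init__(self, name):
        self.name = name


def _resolve(explicit, cls):
    """Return the explicit value, else the thread-bound one, else raise."""
    value = explicit if explicit is not None else cls.current()
    if value is None:
        raise ValueError(f"no {cls.__name__} given and none bound to the thread")
    return value


def create_op(name, *, loc=None, context=None):
    # Trailing keyword arguments, in the documented order (`ip` omitted here).
    return (name, _resolve(loc, Location), _resolve(context, Context))
```

Inside `with Context(), Location("here"):` a call such as `create_op("my.op")`
picks up both defaults, while `create_op("my.op", loc=other)` overrides only the
location.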

#### Operation hierarchy

As mentioned above, `PyOperation` is special because it can exist in either a
top-level or dependent state. The life-cycle is unidirectional: operations can
be created detached (top-level) and once added to another operation, they are
then dependent for the remainder of their lifetime. The situation is more
complicated when considering construction scenarios where an operation is added
to a transitive parent that is still detached, necessitating further accounting
at such transition points (i.e. all such added children are initially added to
the IR with a parent of their outer-most detached operation, but then once it is
added to an attached operation, they need to be re-parented to the containing
module).

Due to the validity and parenting accounting needs, `PyOperation` is the owner
for regions and blocks and needs to be a top-level type that we can count on not
aliasing. This lets us do things like selectively invalidating instances when
mutations occur without worrying that there is some alias to the same operation
in the hierarchy. Operations are also the only entities that are allowed to be
in a detached state, and they are interned at the context level so that there is
never more than one Python `mlir.ir.Operation` object for a unique
`MlirOperation`, regardless of how it is obtained.

The C/C++ API allows for Region/Block to also be detached, but it simplifies the
ownership model a lot to eliminate that possibility in this API, allowing the
Region/Block to be completely dependent on its owning operation for accounting.
The aliasing of Python `Region`/`Block` instances to underlying
`MlirRegion`/`MlirBlock` is considered benign and these objects are not interned
in the context (unlike operations).

If we ever want to re-introduce detached regions/blocks, we could do so with a
new `DetachedRegion` class or similar and also avoid the complexity of
accounting. With the way it is now, we can avoid having a global live list for
regions and blocks. We may end up needing an op-local one at some point (TBD),
depending on how hard it is to guarantee how mutations interact with their
Python peer objects. We can cross that bridge easily when we get there.

Module, when used purely from the Python API, can't alias anyway, so we can use
it as a top-level ref type without a live-list for interning. If the API ever
changes such that this cannot be guaranteed (i.e. by letting you marshal a
native-defined Module in), then there would need to be a live table for it too.
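
A minimal sketch of context-level interning, assuming a weak-valued live table
keyed by the native handle, is shown below. The `Context` and `Operation`
classes and the integer handles are toy stand-ins; the source does not specify
how the real live list is implemented.

```python
import weakref


class Operation:
    """Toy Python peer for a native operation identified by `handle`."""

    def __init__(self, handle):
        self.handle = handle
        self.valid = True


class Context:
    def __init__(self):
        # Live table: native handle -> Python peer. Values are held weakly so
        # the table itself does not keep peers alive.
        self._live_operations = weakref.WeakValueDictionary()

    def operation_for(self, handle):
        """Return the unique Python peer for `handle`, creating it if needed."""
        op = self._live_operations.get(handle)
        if op is None:
            op = Operation(handle)
            self._live_operations[handle] = op
        return op

    def invalidate(self, handle):
        """Mark the peer invalid (e.g. after erasure) and drop it from the table."""
        op = self._live_operations.pop(handle, None)
        if op is not None:
            op.valid = False
```

Because every lookup goes through the table, two requests for the same handle
yield the same object, so invalidating it once is enough.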

## Style

In general, for the core parts of MLIR, the Python bindings should be largely
isomorphic with the underlying C++ structures. However, concessions are made
either for practicality or to give the resulting library an appropriately
"Pythonic" flavor.

### Properties vs get\*() methods

Generally favor converting trivial methods like `getContext()`, `getName()`,
`isEntryBlock()`, etc. to read-only Python properties (e.g. `context`). It is
primarily a matter of calling `def_property_readonly` vs `def` in binding code,
and makes things feel much nicer to the Python side.

For example, prefer:

```c++
m.def_property_readonly("context", ...)
```

Over:

```c++
m.def("getContext", ...)
```
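
From the Python side, the difference looks roughly like the following
pure-Python stand-in. The `Block` class here is a toy written for illustration,
not the bound class; it only mirrors what a read-only property exposes, namely
attributes that can be read but not assigned.

```python
class Block:
    """Toy stand-in showing the property style the bindings aim for."""

    def __init__(self, context, is_entry_block):
        self._context = context
        self._is_entry_block = is_entry_block

    @property
    def context(self):
        # Read-only: no setter is defined, so assignment raises AttributeError.
        return self._context

    @property
    def is_entry_block(self):
        return self._is_entry_block


block = Block(context="ctx", is_entry_block=True)
print(block.context, block.is_entry_block)
```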

### \_\_repr\_\_ methods

Things that have nice printed representations are really great :) If there is a
reasonable printed form, it can be a significant productivity boost to wire that
to the `__repr__` method (and verify it with a
[FileCheck test](#sample-filecheck-test)).
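
As a small illustration, assuming a toy `IntegerType` class that is not part of
the bindings, wiring the printed form into `__repr__` makes values readable
both on their own and inside containers:

```python
class IntegerType:
    """Toy type whose printed form is its textual spelling."""

    def __init__(self, width):
        self.width = width

    def __str__(self):
        return f"i{self.width}"

    def __repr__(self):
        return f"IntegerType({self})"


print(repr(IntegerType(32)))              # IntegerType(i32)
print([IntegerType(1), IntegerType(64)])  # [IntegerType(i1), IntegerType(i64)]
```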

### CamelCase vs snake\_case

Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
a mechanical concession to Python style, this can go a long way to making the
API feel like it fits in with its peers in the Python landscape.

If in doubt, choose names that will flow properly with other
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).

### Prefer pseudo-containers

Many core IR constructs provide methods directly on the instance to query count
and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.

For example, a direct mapping of blocks within regions could be done this way:

```python
region = ...  # a Region obtained elsewhere

for block in region:
  pass
```

However, this way is preferred:

```python
region = ...  # a Region obtained elsewhere

for block in region.blocks:
  pass

print(len(region.blocks))
print(region.blocks[0])
print(region.blocks[-1])
```

Instead of leaking STL-derived identifiers (`front`, `back`, etc.), translate
them to appropriate `__dunder__` methods and iterator wrappers in the bindings.

Note that this can be taken too far, so use good judgment. For example, block
arguments may appear container-like but have defined methods for lookup and
mutation that would be hard to model properly without making semantics
complicated. If running into these, just mirror the C/C++ API.
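
A pseudo-container of this kind needs only a handful of `__dunder__` methods.
The sketch below is a pure-Python model under stated assumptions: `BlockList`
and `Region` are toy classes, and a Python list stands in for the native
count and begin/end iterators.

```python
class BlockList:
    """Pseudo-container exposing blocks via len(), indexing and iteration."""

    def __init__(self, native_blocks):
        self._blocks = native_blocks  # stand-in for native iteration

    def __len__(self):
        return len(self._blocks)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("block indices must be integers")
        if index < 0:
            index += len(self)  # negative indices replace `back`
        if not 0 <= index < len(self):
            raise IndexError("block index out of range")
        return self._blocks[index]

    def __iter__(self):
        return iter(self._blocks)


class Region:
    def __init__(self, native_blocks):
        self._native_blocks = list(native_blocks)

    @property
    def blocks(self):
        return BlockList(self._native_blocks)


region = Region(["entry", "body", "exit"])
print(len(region.blocks), region.blocks[0], region.blocks[-1])
```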

### Provide one-stop helpers for common things

One-stop helpers that aggregate over multiple low level entities can be
incredibly helpful and are encouraged within reason. For example, making
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
construct a SourceMgr can be quite nice. One-stop helpers do not have to be
mutually exclusive with a more complete mapping of the backing constructs.

## Testing

Tests should be added in the `test/Bindings/Python` directory and should
typically be `.py` files that have a lit run line.

We use `lit` and `FileCheck` based tests:

* For generative tests (those that produce IR), define a Python module that
  constructs/prints the IR and pipe it through `FileCheck`.
* Parsing should be kept self-contained within the module under test by use of
  raw constants and an appropriate `parse_asm` call.
* Any file I/O code should be staged through a tempfile vs relying on file
  artifacts/paths outside of the test module.
* For convenience, we also test non-generative API interactions with the same
  mechanisms, printing and `CHECK`ing as needed.
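
For the file I/O guideline, one possible pattern (a sketch using only the
Python standard library; the `round_trip_through_file` helper and the file name
are hypothetical) is to write the raw constant into a temporary directory and
read it back from there:

```python
import os
import tempfile

# Raw constant kept inside the test module itself.
ASM = "module {\n}\n"


def round_trip_through_file(text):
    """Write `text` to a temporary file, read it back, and clean up."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "module.mlir")
        with open(path, "w") as f:
            f.write(text)
        with open(path) as f:
            return f.read()


print(round_trip_through_file(ASM))
```

The temporary directory is removed when the `with` block exits, so the test
leaves nothing behind and does not depend on paths outside the module.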

### Sample FileCheck test

```python
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s

import mlir

# TODO: Move to a test utility class once any of this actually exists.
def print_module(f):
  m = f()
  print("// -----")
  print("// TEST_FUNCTION:", f.__name__)
  print(m.to_asm())
  return f

# CHECK-LABEL: TEST_FUNCTION: create_my_op
@print_module
def create_my_op():
  m = mlir.ir.Module()
  builder = m.new_op_builder()
  # CHECK: mydialect.my_operation ...
  builder.my_op()
  return m
```

## Integration with ODS

The MLIR Python bindings integrate with the tablegen-based ODS system for
providing user-friendly wrappers around MLIR dialects and operations. There
are multiple parts to this integration, outlined below. Most details have
been elided: refer to the build rules and Python sources under `mlir.dialects`
for the canonical way to use this facility.

### Generating `{DIALECT_NAMESPACE}.py` wrapper modules

Each dialect with a mapping to Python requires that an appropriate
`{DIALECT_NAMESPACE}.py` wrapper module is created. This is done by invoking
`mlir-tblgen` on a Python-bindings-specific tablegen wrapper that includes
the boilerplate and the actual dialect-specific `td` file. An example, for the
`StandardOps` (which is assigned the namespace `std` as a special case):

```tablegen
#ifndef PYTHON_BINDINGS_STANDARD_OPS
#define PYTHON_BINDINGS_STANDARD_OPS

include "mlir/Bindings/Python/Attributes.td"
include "mlir/Dialect/StandardOps/IR/Ops.td"

#endif
```

In the main repository, building the wrapper is done via the CMake function
`add_mlir_dialect_python_bindings`, which invokes:

```shell
mlir-tblgen -gen-python-op-bindings -bind-dialect={DIALECT_NAMESPACE} \
    {PYTHON_BINDING_TD_FILE}
```

### Extending the search path for wrapper modules

When the Python bindings need to locate a wrapper module, they consult the
`dialect_search_path` and use it to find an appropriately named module. For
the main repository, this search path is hard-coded to include the
`mlir.dialects` module, which is where wrappers are emitted by the above build
rule. Out-of-tree dialects can add their modules to the search path by calling:

```python
mlir._cext.append_dialect_search_prefix("myproject.mlir.dialects")
```
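
Conceptually, such a lookup might behave like the pure-Python sketch below,
which tries each prefix in order and imports `{prefix}.{namespace}`. This is an
assumption about the behaviour, not the actual implementation:
`find_dialect_module` and the module-level list are hypothetical names, and
only the `append_dialect_search_prefix` name is taken from the call above.

```python
import importlib

# Search prefixes, tried in order; the in-tree prefix comes first.
_dialect_search_prefixes = ["mlir.dialects"]


def append_dialect_search_prefix(prefix):
    """Register an additional package to search for wrapper modules."""
    _dialect_search_prefixes.append(prefix)


def find_dialect_module(namespace):
    """Return the first importable `{prefix}.{namespace}` module, or None."""
    for prefix in _dialect_search_prefixes:
        try:
            return importlib.import_module(f"{prefix}.{namespace}")
        except ImportError:
            continue  # not under this prefix; try the next one
    return None
```

With this model, a project that registers `"myproject.mlir.dialects"` only needs
to ship a module named after its dialect namespace inside that package.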

### Wrapper module code organization

The wrapper module tablegen emitter outputs:

* A `_Dialect` class (extending `mlir.ir.Dialect`) with a `DIALECT_NAMESPACE`
  attribute.
* An `{OpName}` class for each operation (extending `mlir.ir.OpView`).
* Decorators for each of the above to register with the system.

Note: In order to avoid naming conflicts, all internal names used by the wrapper
module are prefixed by `_ods_`.

Each concrete `OpView` subclass further defines several public-intended
attributes:

* `OPERATION_NAME` attribute with the `str` fully qualified operation name
  (e.g. `std.absf`).
* An `__init__` method for the *default builder* if one is defined or inferred
  for the operation.
* `@property` getter for each operand or result (using an auto-generated name
  for unnamed ones).
* `@property` getter, setter and deleter for each declared attribute.

It further emits additional private-intended attributes meant for subclassing
and customization (default cases omit these attributes in favor of the
defaults on `OpView`):

* `_ODS_REGIONS`: A specification of the number and types of regions.
  Currently a tuple of `(min_region_count, has_no_variadic_regions)`. Note that
  the API does some light validation on this but the primary purpose is to
  capture sufficient information to perform other default building and region
  accessor generation.
* `_ODS_OPERAND_SEGMENTS` and `_ODS_RESULT_SEGMENTS`: Black-box values that
  indicate the structure of either the operands or results with respect to
  variadics. Used by `OpView._ods_build_default` to decode operand and result
  lists that contain lists.

#### Builders

Presently, only a single, default builder is mapped to the `__init__` method.
The intent is that this `__init__` method represents the *most specific* of
the builders typically generated for C++; however currently it is just the
generic form below.

* One argument for each declared result:
  * For single-valued results: Each will accept an `mlir.ir.Type`.
  * For variadic results: Each will accept a `List[mlir.ir.Type]`.
* One argument for each declared operand or attribute:
  * For single-valued operands: Each will accept an `mlir.ir.Value`.
  * For variadic operands: Each will accept a `List[mlir.ir.Value]`.
  * For attributes, it will accept an `mlir.ir.Attribute`.
* Trailing usage-specific, optional keyword arguments:
  * `loc`: An explicit `mlir.ir.Location` to use. Defaults to the location
    bound to the thread (e.g. `with Location.unknown():`) or an error if none
    is bound nor specified.
  * `ip`: An explicit `mlir.ir.InsertionPoint` to use. Defaults to the insertion
    point bound to the thread (e.g. `with InsertionPoint(...):`).

In addition, each `OpView` inherits a `build_generic` method which allows
construction via a (nested in the case of variadic) sequence of `results` and
`operands`. This can be used to get some default construction semantics for
operations that are otherwise unsupported in Python, at the expense of having
a very generic signature.
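
The argument layout of the default builder and the nested form accepted by
`build_generic` can be sketched with toy classes. The exact `build_generic`
signature is not given here, so the one below is an assumption, as are the
`MyCallOp` operation, its argument names and the `_init_generic` helper;
strings again stand in for types, values and attributes.

```python
def _flatten(groups):
    """Flatten one level of nesting: variadic groups arrive as lists."""
    flat = []
    for group in groups:
        if isinstance(group, (list, tuple)):
            flat.extend(group)
        else:
            flat.append(group)
    return flat


class OpView:
    """Toy stand-in for `mlir.ir.OpView`."""

    def _init_generic(self, results, operands, attributes, loc, ip):
        self.results = _flatten(results)
        self.operands = _flatten(operands)
        self.attributes = dict(attributes or {})
        self.loc = loc
        self.ip = ip

    @classmethod
    def build_generic(cls, results, operands, attributes=None, loc=None, ip=None):
        op = cls.__new__(cls)  # bypass the op-specific __init__
        op._init_generic(results, operands, attributes, loc, ip)
        return op


class MyCallOp(OpView):
    OPERATION_NAME = "mydialect.my_call"

    # Results first, then operands and attributes, then trailing `loc`/`ip`.
    def __init__(self, outputs, callee, args, tag, *, loc=None, ip=None):
        self._init_generic([outputs], [callee, args], {"tag": tag}, loc, ip)


op = MyCallOp(["i32", "f32"], "%callee", ["%a", "%b"], "fast", loc="here")
generic = MyCallOp.build_generic([["i32"]], ["%callee", []])
```

Both paths end in the same flattened lists; the specific `__init__` simply
gives each result, operand and attribute its own named argument.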