# MLIR Python Bindings

Current status: Under development and not enabled by default


## Building

### Pre-requisites

* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to
  be located by CMake.
* A relatively recent Python3 installation

### CMake variables

* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL`

  Enables building the Python bindings. Defaults to `OFF`.

* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL`

  Links the native extension against the Python runtime library, which is
  optional on some platforms. While setting this to `OFF` can yield somewhat
  greater deployment flexibility, linking in this way allows the linker to
  report unresolved symbols at build time on all platforms, which makes for a
  smoother development workflow. Defaults to `ON`.

* **`PYTHON_EXECUTABLE`**`:STRING`

  Specifies the `python` executable used for the LLVM build, including for
  determining header/link flags for the Python bindings. On systems with
  multiple Python implementations, setting this explicitly to the preferred
  `python3` executable is strongly recommended.
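
As a rough illustration, a configure step that sets these variables could be
assembled from a small Python script. This is only a sketch: the source and
build paths are hypothetical, and the `-G Ninja` generator and
`LLVM_ENABLE_PROJECTS=mlir` setting are assumptions about a typical LLVM build
rather than requirements stated here.

```python
import shlex
import sys


def configure_command(source_dir, build_dir, version_locked=True):
  """Return a CMake configure command that enables the Python bindings."""
  locked = "ON" if version_locked else "OFF"
  return [
      "cmake",
      "-G", "Ninja",  # assumption: any CMake generator would do
      "-S", source_dir,
      "-B", build_dir,
      "-DLLVM_ENABLE_PROJECTS=mlir",  # assumption: typical LLVM setting
      "-DMLIR_BINDINGS_PYTHON_ENABLED=ON",
      f"-DMLIR_PYTHON_BINDINGS_VERSION_LOCKED={locked}",
      # Pin the interpreter explicitly to the one running this script.
      f"-DPYTHON_EXECUTABLE={sys.executable}",
  ]


# Hypothetical checkout and build locations.
command = configure_command("llvm-project/llvm", "build")
print(shlex.join(command))
```

Printing the command instead of running it keeps the sketch free of side
effects; passing the list to `subprocess.run` would perform the configure step.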


## Design

### Use cases

There are likely two primary use cases for the MLIR Python bindings:

1. Support users who expect that an installed version of LLVM/MLIR will yield
   the ability to `import mlir` and use the API in a pure way out of the box.

2. Support downstream integrations that want to include parts of the API in
   their private namespace or specially built libraries, probably mixing them
   with other Python native bits.


### Composable modules

In order to support use case #2, the Python bindings are organized into
composable modules that downstream integrators can include and re-export into
their own namespace if desired. This forces several design points:

* Separate the construction/populating of a `py::module` from the
  `PYBIND11_MODULE` global constructor.

* Introduce headers for C++-only wrapper classes, as other related C++ modules
  will need to interop with them.

* Separate any initialization routines that depend on optional components into
  their own module/dependency (currently, things like `registerAllDialects` fall
  into this category).

There are a lot of interrelated issues of shared library linkage, distribution
concerns, etc. that affect such things. Organizing the code into composable
modules (versus a monolithic `cpp` file) allows the flexibility to address many
of these as needed over time. Also, compilation time for all of the template
meta-programming in pybind scales with the number of things you define in a
translation unit. Breaking into multiple translation units can significantly aid
compile times for APIs with a large surface area.

### Submodules

Generally, the C++ codebase namespaces most things into the `mlir` namespace.
However, in order to modularize and make the Python bindings easier to
understand, sub-packages are defined that map roughly to the directory structure
of functional units in MLIR.

Examples:

* `mlir.ir`
* `mlir.passes` (`pass` is a reserved word :( )
* `mlir.dialect`
* `mlir.execution_engine` (aside from namespacing, it is important that
  "bulky"/optional parts like this are isolated)

In addition, initialization functions that imply optional dependencies should
be in underscored (notionally private) modules such as `_init` and linked
separately. This allows downstream integrators to completely customize what is
included "in the box" and covers things like dialect registration,
pass registration, etc.

### Loader

LLVM/MLIR is a non-trivial Python-native project that is likely to co-exist with
other non-trivial native extensions. As such, the native extension (i.e. the
`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py`
and siblings which loads and re-exports it. This split provides a place to stage
code that needs to prepare the environment *before* the shared library is loaded
into the Python runtime, and also provides a place where one-time initialization
code can be invoked apart from module constructors.

To start with, the `mlir/__init__.py` loader shim can be very simple and scale
to future need:

```python
from _mlir import *
```
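
If the shim later needs to prepare the environment or run one-time
initialization, it might grow along the lines of the sketch below. The
`load_native` and `public_names` helpers and the hook mechanism are
hypothetical and not part of the bindings; the standard library `math` module
stands in for `_mlir` so that the example runs without a build.

```python
import importlib


def load_native(module_name, pre_load_hooks=(), post_load_hooks=()):
  """Import a native module, running hooks before and after the import."""
  for hook in pre_load_hooks:
    hook()  # e.g. adjust library search paths before the library loads
  native = importlib.import_module(module_name)
  for hook in post_load_hooks:
    hook(native)  # one-time initialization, apart from module constructors
  return native


def public_names(module):
  """Return the names that `from module import *` would re-export."""
  names = getattr(module, "__all__", None)
  if names is None:
    names = [name for name in vars(module) if not name.startswith("_")]
  return list(names)


events = []
# `math` is a stand-in for `_mlir`, which may not be installed.
native = load_native(
    "math",
    pre_load_hooks=[lambda: events.append("prepare")],
    post_load_hooks=[lambda module: events.append("init " + module.__name__)],
)
# The mapping a loader shim would publish from its own namespace.
exported = {name: getattr(native, name) for name in public_names(native)}
print(events)
```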

### Limited use of globals

For normal operations, parent-child constructor relationships are realized with
constructor methods on a parent class as opposed to requiring
invocation/creation from a global symbol.

For example, consider two code fragments:

```python
op = build_my_op()
region = mlir.Region(op)
```

vs

```python
op = build_my_op()
region = op.new_region()
```

For tightly coupled data structures like `Operation`, the latter is generally
preferred because:

* It is syntactically harder to create something that is going to access
  illegal memory (less error handling in the bindings, less testing, etc.).

* It reduces the global-API surface area for creating related entities. This
  makes it more likely that, when constructing IR based on an `Operation`
  instance of unknown provenance, receiving code can just call methods on it to
  do what it wants versus needing to reach back into the global namespace and
  find the right `Region` class.

* It leaks fewer things that are in place for C++ convenience (e.g. default
  constructors that produce invalid instances).
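
The points above can be illustrated with pure-Python stand-ins. The `Operation`
and `Region` classes below are toy models written for this example, not the
real bindings; only the `new_region` method name comes from the fragment above.

```python
class Region:
  """Toy stand-in for a region; it always has an owning operation."""

  def __init__(self, owner):
    self.owner = owner


class Operation:
  """Toy stand-in for an operation that creates its own regions."""

  def __init__(self, name):
    self.name = name
    self.regions = []

  def new_region(self):
    # The parent is known here, so a region can never be created detached
    # from its operation or attached to the wrong one.
    region = Region(self)
    self.regions.append(region)
    return region


def add_body(op):
  # Receiving code needs only the instance, not a global `Region` symbol.
  return op.new_region()


op = Operation("my_op")
region = add_body(op)
print(region.owner is op, len(op.regions))
```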

### Use the C-API

The Python APIs should seek to layer on top of the C-API to the degree possible.
Especially for the core, dialect-independent parts, such a binding enables
packaging decisions that would be difficult or impossible if spanning a C++ ABI
boundary. In addition, factoring in this way side-steps some very difficult
issues that arise when combining RTTI-based modules (which pybind derived things
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).


## Style

In general, for the core parts of MLIR, the Python bindings should be largely
isomorphic with the underlying C++ structures. However, concessions are made
either for practicality or to give the resulting library an appropriately
"Pythonic" flavor.

### Properties vs get*() methods

Generally favor converting trivial methods like `getContext()`, `getName()`,
`isEntryBlock()`, etc. to read-only Python properties (e.g. `context`). It is
primarily a matter of calling `def_property_readonly` vs `def` in binding code,
and makes things feel much nicer on the Python side.

For example, prefer:

```c++
m.def_property_readonly("context", ...)
```

Over:

```c++
m.def("getContext", ...)
```
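
From the Python side, the result behaves like an ordinary read-only property.
The sketch below models that with pure-Python stand-ins; the `Context` and
`Operation` classes are toys written for this example, not the real bindings.

```python
class Context:
  """Toy stand-in for a context."""


class Operation:
  """Toy stand-in with read-only properties instead of get*() methods."""

  def __init__(self, context, name):
    self._context = context
    self._name = name

  @property
  def context(self):
    return self._context

  @property
  def name(self):
    return self._name


ctx = Context()
op = Operation(ctx, "module")
print(op.name)

try:
  op.name = "other"  # no setter is defined, so this raises
except AttributeError:
  print("name is read-only")
```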

### `__repr__` methods

Things that have nice printed representations are really great :) If there is a
reasonable printed form, it can be a significant productivity boost to wire that
to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).
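
A minimal sketch of the effect, using a toy `Operation` class instead of the
real bindings and assuming a printed form like the one in the sample doctest:

```python
class Operation:
  """Toy stand-in for an operation with a readable printed form."""

  def __init__(self, name):
    self.name = name

  def __repr__(self):
    return f"<operation {self.name!r}>"


op = Operation("module")
print(repr(op))
# Containers use `__repr__` for their elements, so debugging output and
# interactive sessions become readable as well.
print([op])
```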

### CamelCase vs snake_case

Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
a mechanical concession to Python style, this can go a long way to making the
API feel like it fits in with its peers in the Python landscape.

If in doubt, choose names that will flow properly with other
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).

### Prefer pseudo-containers

Many core IR constructs provide methods directly on the instance to query count
and begin/end iterators. Prefer hoisting these to dedicated pseudo-containers.

For example, a direct mapping of blocks within regions could be done this way:

```python
region = ...

for block in region:
  pass
```

However, this way is preferred:

```python
region = ...

for block in region.blocks:
  pass

print(len(region.blocks))
print(region.blocks[0])
print(region.blocks[-1])
```

Instead of leaking STL-derived identifiers (`front`, `back`, etc.), translate
them to appropriate `__dunder__` methods and iterator wrappers in the bindings.

Note that this can be taken too far, so use good judgment. For example, block
arguments may appear container-like but have defined methods for lookup and
mutation that would be hard to model properly without making semantics
complicated. If you run into such cases, just mirror the C/C++ API.
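
As a rough illustration of the pseudo-container idea, the sketch below builds
a read-only `blocks` view on `collections.abc.Sequence`. The `BlockList` and
`Region` classes are toys written for this example, and plain strings stand in
for blocks; the real bindings would implement the same protocol in C++.

```python
from collections.abc import Sequence


class BlockList(Sequence):
  """Read-only, list-like view over the blocks of a region."""

  def __init__(self, blocks):
    self._blocks = blocks

  def __len__(self):
    return len(self._blocks)

  def __getitem__(self, index):
    # Delegating to the list handles negative indices and raises IndexError
    # when out of range; `Sequence` derives iteration from these two methods.
    return self._blocks[index]


class Region:
  """Toy stand-in for a region that exposes its blocks as a container."""

  def __init__(self, blocks):
    self._blocks = list(blocks)

  @property
  def blocks(self):
    return BlockList(self._blocks)


region = Region(["entry", "loop", "exit"])
for block in region.blocks:
  print(block)
print(len(region.blocks), region.blocks[0], region.blocks[-1])
```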

### Provide one-stop helpers for common things

One-stop helpers that aggregate over multiple low-level entities can be
incredibly helpful and are encouraged within reason. For example, making
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
construct a `SourceMgr` can be quite nice. One-stop helpers do not have to be
mutually exclusive with a more complete mapping of the backing constructs.

## Testing

Tests should be added in the `test/Bindings/Python` directory and should
typically be `.py` files that have a lit run line.

While lit can run any Python module, prefer to lay tests out according to these
rules:

* For tests of the API surface area, prefer
  [`doctest`](https://docs.python.org/3/library/doctest.html).
* For generative tests (those that produce IR), define a Python module that
  constructs/prints the IR and pipe it through `FileCheck`.
* Parsing should be kept self-contained within the module under test by use of
  raw constants and an appropriate `parse_asm` call.
* Any file I/O code should be staged through a tempfile vs relying on file
  artifacts/paths outside of the test module.
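
For the last rule, file I/O can be staged through the standard `tempfile`
module. The sketch below only writes a raw constant and reads it back; the
`round_trip_through_file` helper and the `module.mlir` file name are
hypothetical, and a real test would hand the path to the API under test.

```python
import os
import tempfile

TEST_MLIR_ASM = r"""
func @test_operation_correct_regions() {
  // ...
}
"""


def round_trip_through_file(text):
  """Write `text` to a temporary file and read it back."""
  # The directory and everything in it is deleted when the `with` exits,
  # so the test leaves nothing behind and depends on no external paths.
  with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "module.mlir")
    with open(path, "w") as f:
      f.write(text)
    with open(path) as f:
      return f.read()


assert round_trip_through_file(TEST_MLIR_ASM) == TEST_MLIR_ASM
```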

### Sample Doctest

```python
# RUN: %PYTHON %s

"""
  >>> m = load_test_module()

Test basics:
  >>> m.operation.name
  'module'
  >>> m.operation.is_registered
  True
  >>> ... etc ...

Verify that repr prints:
  >>> m.operation
  <operation 'module'>
"""

import mlir

TEST_MLIR_ASM = r"""
func @test_operation_correct_regions() {
  // ...
}
"""

# TODO: Move to a test utility class once any of this actually exists.
def load_test_module():
  ctx = mlir.ir.Context()
  ctx.allow_unregistered_dialects = True
  module = ctx.parse_asm(TEST_MLIR_ASM)
  return module


if __name__ == "__main__":
  import doctest
  doctest.testmod()
```

### Sample FileCheck test

```python
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s

import mlir

# TODO: Move to a test utility class once any of this actually exists.
def print_module(f):
  m = f()
  print("// -----")
  print("// TEST_FUNCTION:", f.__name__)
  print(m.to_asm())
  return f

# CHECK-LABEL: TEST_FUNCTION: create_my_op
@print_module
def create_my_op():
  m = mlir.ir.Module()
  builder = m.new_op_builder()
  # CHECK: mydialect.my_operation ...
  builder.my_op()
  return m
```