inline.rs - OpenGrok history log for /wasmtime-44.0.1/cranelift/codegen/src/inline.rs

Revision	Date	Author	Comments
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7
# ab78bd82	22-Mar-2026	Ho Kim	fix: correct various typos (#12807) Signed-off-by: Ho Kim
Revision tags: v43.0.0, v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6, v41.0.3, v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1, v40.0.0
# 87ed3b60	15-Dec-2025	Chris Fallin	Cranelift: make all non-tail, non-indirect calls patchable, and rename patchable ABI to `preserve_all`. (#12160) * Cranelift: make all non-tail, non-indirect calls patchable, and rename patchable ABI to `preserve_all`. As discussed in this week's Cranelift meeting, we've discovered a need to generalize the `patchable_call` mechanism and corresponding `patchable` ABI slightly. In particular, we will need patchable `try_call` callsites as well in order to allow breakpoint handlers to throw exceptions (desirable functionality eventually) and have this work in the presence of inlining. Also, it's just a nice generalization to say that patchability is an orthogonal dimension to the call ABI and the other restrictions we initially imposed, and works as long as the basic requirement (no return values) is met. This also renames the `patchable` ABI to `preserve_all`, to make it clear that its purpose is actually orthogonal, and it can be used independently of patchable callsites. It also deletes the `cold` ABI, which never actually did anything and is misleading in the presence of an actual cold-ish (subzero temperature, actually) ABI like `preserve_all`. * Review feedback.
# 17fbd3c6	12-Dec-2025	Chris Fallin	Debug: implement breakpoints and single-stepping. (#12133) * Debug: implement breakpoints and single-stepping. This is a PR that puts together a bunch of earlier pieces (patchable calls in #12061 and #12101, private copies of code in #12051, and all the prior debug event and instrumentation infrastructure) to implement breakpoints in the guest debugger. These are implemented in the way we have planned in #11964: each sequence point (location prior to a Wasm opcode) is now a patchable call instruction, patched out (replaced with NOPs) by default. When patched in, the breakpoint callsite calls a trampoline with the `patchable` ABI which then invokes the `breakpoint` hostcall. That hostcall emits the debug event and nothing else. A few of the interesting bits in this PR include: - Implementations of "unpublish" (switch permissions back to read/write from read/execute) for mmap'd code memory on all our platforms. - Infrastructure in the frame-tables (debug info) metadata producer and parser to record "breakpoint patches". - A tweak to the NOP metadata packaged with the `MachBuffer` to allow multiple NOP sizes. This lets us use one 5-byte NOP on x86-64, for example (did you know x86-64 had these?!) rather than five 1-byte NOPs. This PR also implements single-stepping with a global-per-`Store` flag, because at this point why not; it's a small additional bit of logic to do all patches in all modules registered in the `Store` when that flag is enabled. 
A few realizations for future work: - The need for an introspection API available to a debugger to see the modules within a component is starting to become clear; either that, or the "module and PC" location identifier for a breakpoint switches to a "module or component" sum type. Right now, the tests for this feature use only core modules. Extending to components should not actually be hard at all; we just need to build the API for it. - The interaction between inlining and `patchable_call` is interesting: what happens if we inline a `patchable_call` at a `try_call` callsite? Right now, we do not update the `patchable_call` to a `try_call`, because there is no `patchable_try_call`; this is fine in the Wasmtime embedding in practice because we never (today!) throw exceptions from a breakpoint handler. This does suggest to me that maybe we should make patchability a property of any callsite, and allow try-calls to be patchable too (with the same restriction about no return values as the only restriction); but happy to discuss that one further. * Add missing debug.wat disas test. * Review feedback. * Fix comment on `CodeMemory::text_mut`. * Review feedback. * Review feedback: abort process on failure to re-apply executable permissions. * Implement icache flush for aarch64. This appears to be necessary as we otherwise see a failure in CI on macOS/aarch64 that is consistent with patched-in breakpoint calls still being incorrectly cached after we remove them and republish the code. There is a longstanding issue in #3310 tracking proper icache coherence handling on aarch64. We implemented this for Linux with the `membarrier` syscall but never did so for macOS. Maybe this is the first point at which it matters, because code was always loaded at new addresses (hence did not have coherence issues because nothing would have been cached) previously. prtest:full * Review feedback: use `next_multiple_of`.
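The patch-in/patch-out scheme for sequence points can be illustrated at the byte level. This is a minimal sketch, assuming the patched-in breakpoint is a direct x86-64 `call rel32` (opcode `E8`) to the trampoline and the patched-out form is the single 5-byte NOP `0F 1F 44 00 00`; `encode_call_rel32` and `set_breakpoint` are hypothetical helpers, not Wasmtime code. It deliberately omits the unpublish/republish permission changes and the icache maintenance described above.

```rust
/// x86-64 5-byte NOP (`nopl 0x0(%rax,%rax,1)`), used for the patched-out state.
const NOP5: [u8; 5] = [0x0F, 0x1F, 0x44, 0x00, 0x00];

/// Encodes a 5-byte `call rel32` located at address `site` and targeting `target`.
/// The displacement is relative to the next instruction, i.e. `site + 5`.
/// Returns `None` if the target is out of rel32 range.
fn encode_call_rel32(site: u64, target: u64) -> Option<[u8; 5]> {
    let rel = target as i128 - (site as i128 + 5);
    let rel32 = i32::try_from(rel).ok()?;
    let mut bytes = [0u8; 5];
    bytes[0] = 0xE8;
    bytes[1..].copy_from_slice(&rel32.to_le_bytes());
    Some(bytes)
}

/// Hypothetical helper: patches the 5-byte sequence point at `offset` in a
/// writable copy of the code. `enabled` writes a call to `trampoline`;
/// otherwise the NOP is written. Returns `None` if the site is out of bounds
/// or the trampoline is out of range.
fn set_breakpoint(
    code: &mut [u8],
    offset: usize,
    code_base: u64,
    trampoline: u64,
    enabled: bool,
) -> Option<()> {
    let site = code_base + offset as u64;
    let bytes = if enabled {
        encode_call_rel32(site, trampoline)?
    } else {
        NOP5
    };
    code.get_mut(offset..offset + 5)?.copy_from_slice(&bytes);
    Some(())
}
```

Both encodings are 5 bytes long, which is why a single multi-byte NOP can stand in for the call without shifting any surrounding code.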
Revision tags: v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5, v38.0.3, v38.0.2, v38.0.1, v37.0.2
# a3d6e407	06-Oct-2025	Chris Fallin	Cranelift: add debug tag infrastructure. (#11768) * Cranelift: add debug tag infrastructure. This PR adds debug tags, a kind of metadata that can attach to CLIF instructions and be lowered to VCode instructions and emitted as metadata on the produced compiled code. It also adds opaque descriptor blobs carried with stackslots. Together, these two features allow decorating IR with first-class debug instrumentation that is properly preserved by the compiler, including across optimizations and inlining. (Wasmtime's use of these features will come in followup PRs.) The key idea of a "debug tag" is to allow the Cranelift embedder to express whatever information it needs to, in a format that is opaque to Cranelift itself, except for the parts that need translation during lowering. In particular, the `DebugTag::StackSlot` variant gets translated to a physical offset into the stackframe in the compiled metadata output. So, for example, the embedder can emit a tag referring to a stackslot, and another describing an offset in that stackslot. The debug tags exist as a sequence on any given instruction; the meaning of the sequence is known only to the embedder, except that during inlining, the tags for the inlining call instruction are prepended to the tags of inlined instructions. In this way, a canonical use-case of tags as describing original source-language frames can preserve the source-language view even when multiple functions are inlined into one. The descriptor on a stackslot may look a little odd at first, but its purpose is to allow serializing some description of stackslot-contained runtime user-program data, in a way that is firmly attached to the stackslot. 
In particular, in the face of inlining, this descriptor is copied into the inlining (parent) function from the inlined function when the stackslot entity is copied; no other metadata outside Cranelift needs to track the identity of stackslots and know about that motion. This fits nicely with the ability of tags to refer to stackslots; together, the embedder can annotate instructions as having certain state in stackslots, and describe the format of that state per stackslot. This infrastructure is tested with some compile-tests now; testing of the interpretation of the metadata output will come with end-to-end debug instrumentation tests in a followup PR. * Review feedback: add back sequence points and enforce tags only on sequence points or calls. * Use Vecs for debug metadata in MachBuffer to avoid SmallVec size penalty in not-used case. * Review feedback: switch from inlined stackslot descriptor blobs to u64 keys.
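The tag-prepending rule for inlining can be shown with a small model. In this sketch the `Tag` enum and `tags_after_inlining` are hypothetical stand-ins (Cranelift leaves tag meaning to the embedder), and the renumbering of callee stack slots into the caller is not modelled.

```rust
/// Hypothetical stand-in for an embedder-defined debug tag.
#[derive(Clone, Debug, PartialEq)]
enum Tag {
    /// Opaque embedder data, e.g. an identifier for a source-language frame.
    User(u32),
    /// A reference to a stack slot (renumbering on inlining is not modelled).
    StackSlot(u32),
}

/// Tags an inlined instruction carries in the caller: the tags of the call
/// instruction being inlined, followed by the callee instruction's own tags.
fn tags_after_inlining(call_site_tags: &[Tag], callee_inst_tags: &[Tag]) -> Vec<Tag> {
    let mut tags = Vec::with_capacity(call_site_tags.len() + callee_inst_tags.len());
    tags.extend_from_slice(call_site_tags);
    tags.extend_from_slice(callee_inst_tags);
    tags
}
```

Because every inlining step prepends the call site's tags, inlining `f` into `g` and then `g` into `h` leaves the outermost frame's tag first, so the sequence still reads as a source-level stack.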
Revision tags: v37.0.1, v37.0.0
# 4c01ee2f	05-Sep-2025	Chris Fallin	Cranelift: add get_exception_handler_address. (#11629) * Cranelift: add get_exception_handler_address. This is designed to enable applications such as #11592 that use alternative unwinding mechanisms that may not necessarily want to walk a stack and look up exception tables. The idea is that whenever it would be valid to resume to an exception handler that is active on the stack, we can provide the same PC as a first-class runtime value that would be found in the exception table for the given handler edge. A "custom" resume step can then use this PC as a resume-point as long as it follows the relevant exception ABI (i.e., restore SP, FP, any other saved registers that the exception ABI specifies, and provide appropriate payload value(s)). Handlers are associated with edges out of `try_call`s (or `try_call_indirect`s); and edges specifically, not blocks, because there could be multiple out-edges to one block. The instruction thus takes the block that contains the try-call and an immediate that indexes its exceptional edges. This CLIF instruction required a bit of infrastructure to (i) allow naming raw blocks, not just block calls, as instruction arguments, and (ii) allow getting the MachLabel for any other lowered block during lowering. But given that, the lowerings themselves are straightforward uses of MachBuffer labels to fix up PC-relative address-loading instructions (e.g., `LEA` or `ADR` or `AUIPC`+`ADDI`). * Review feedback. * Review feedback: more tests.
Revision tags: v36.0.2
# e767c56b	25-Aug-2025	Nick Fitzgerald	Cranelift: fix an inliner bug where the last-inlined block was not inserted in the layout (#11513) Fixes #11493
Revision tags: v36.0.1, v36.0.0
# dcedcbf5	19-Aug-2025	Nick Fitzgerald	Do not consider inlined function bodies for more inlining (#11462) Due to our bottom-up, SCC-based approach to inlining, we've already performed inlining on the callee, and do not need to consider any further inlining within it. This also prevents us from inlining recursive callees all the way up to the size threshold; instead, we inline just one layer.
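A toy model of bottom-up inlining that never rescans spliced-in bodies may make the one-layer behaviour concrete. Assumptions: functions are plain instruction lists; `Inst`, `inline_one_layer`, and `inline_bottom_up` are hypothetical names; the caller supplies a callee-before-caller order (as an SCC-based schedule would); and every direct call is inlined, with no size threshold or other heuristic.

```rust
/// Toy instruction: an opaque operation or a direct call to `funcs[i]`.
#[derive(Clone, Debug, PartialEq)]
enum Inst {
    Op(u32),
    Call(usize),
}

/// Replaces every direct call in `funcs[caller]` with a copy of the callee's
/// current body. The copied instructions are not scanned again, so each call
/// site is expanded at most one level deep.
fn inline_one_layer(funcs: &[Vec<Inst>], caller: usize) -> Vec<Inst> {
    let mut out = Vec::new();
    for inst in &funcs[caller] {
        match inst {
            Inst::Call(callee) => out.extend(funcs[*callee].iter().cloned()),
            other => out.push(other.clone()),
        }
    }
    out
}

/// Processes functions in `order`, which should list callees before their
/// callers, so each caller copies bodies that have already been inlined into.
fn inline_bottom_up(mut funcs: Vec<Vec<Inst>>, order: &[usize]) -> Vec<Vec<Inst>> {
    for &f in order {
        let body = inline_one_layer(&funcs, f);
        funcs[f] = body;
    }
    funcs
}
```

In this model a self-recursive function gains exactly one copy of its own body, and the remaining recursive call is left in place rather than being expanded until a size limit is hit.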
# cfc05638	12-Aug-2025	Nick Fitzgerald	Make Wasmtime's `FuncKey` one-to-one with Cranelift's `ir::UserExternalName` (#11415) * Make Wasmtime's `FuncKey` one-to-one with Cranelift's `ir::UserExternalName` `FuncKey`, which used to be called `CompileKey`, is now one-to-one with `cranelift_codegen::ir::UserExternalName`, and is used for not just identifying compilation objects but also relocations and call-graph edges. This allows us to determine the `StaticModuleIndex` and `DefinedFuncIndex` pair for any `cranelift_codegen::ir::FuncRef`, regardless of inlining depth, which fixes some fuzz bugs on OSS-Fuzz. This continues pushing on the idea that Wasmtime's compilation orchestration and linking should be relatively agnostic to the kinds of things it is actually compiling and linking, allowing us to tweak, add, and remove new kinds of `FuncKey`s more easily. Adding a new `FuncKey` should not require modifying relocation resolution, for example, just a little bit of code to run the associated compilation and optionally some code to extract metadata into our final artifacts for querying at runtime. Everything in between should Just Continue Working. We still aren't all the way there yet, but this does bring us a little bit closer. Finally, in Cranelift's inlining pass, this adds a check that a block is inserted in the layout before attempting to remove it from the layout, which would otherwise cause panics. This was triggered by multi-level inlining and now-unreachable blocks in the inner callees. I'll note that this does update basically all of the disas tests, or at least nearly all of them that make function calls. This is because the namespace/index numbering pair changed slightly to align with `FuncKey`, but that should pretty much be the only changes. 
* remove debug info from panic message, it is only available in some `cfg`s * fill out module doc comment * Fix compilation without `component-model` feature * Fix some more cfg compilations * cargo fmt * fix a wrong `&dyn Any` auto coercion; add helpful debug logging and assertions for this kind of thing
# 4cbea5e8	07-Aug-2025	Nick Fitzgerald	Cranelift: Translate `ir::UserExternalNameRef`s into callers when inlining (#11389) This was an entity that we forgot to translate from the callee into the caller. Note that we do not use the `EntityMap` offset approach for these entities because `ir::Function` hash-conses them.
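Why hash-consed entities call for a lookup rather than an index offset can be shown with a toy name table. `NameTable`, `intern`, and `translate_name_ref` are hypothetical; names are modelled as `(namespace, index)` pairs, and the real storage inside `ir::Function` differs.

```rust
use std::collections::HashMap;

/// Hypothetical hash-consed table: each distinct name is stored once and
/// referred to by its dense index.
#[derive(Default)]
struct NameTable {
    names: Vec<(u32, u32)>,            // (namespace, index) pairs
    dedup: HashMap<(u32, u32), usize>, // name -> position in `names`
}

impl NameTable {
    /// Returns the existing reference for `name`, or appends a new entry.
    fn intern(&mut self, name: (u32, u32)) -> usize {
        if let Some(&existing) = self.dedup.get(&name) {
            return existing;
        }
        let new_ref = self.names.len();
        self.names.push(name);
        self.dedup.insert(name, new_ref);
        new_ref
    }
}

/// Translates a callee-local reference into the caller by looking the name
/// up (interning it if needed) instead of adding a fixed offset.
fn translate_name_ref(caller: &mut NameTable, callee: &NameTable, callee_ref: usize) -> usize {
    caller.intern(callee.names[callee_ref])
}
```

With a plain offset (caller length plus callee index), a name the caller had already declared would be appended a second time; presumably that duplication is what the hash-consing in `ir::Function` is meant to rule out.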
# 3ecb338e	29-Jul-2025	Nick Fitzgerald	Wasmtime: Add (optional) bottom-up function inlining to Wasm compilation (#11283) * Wasmtime: Add (optional) bottom-up function inlining to Wasm compilation This commit plumbs together two pieces of recently-added infrastructure: 1. function inlining in Cranelift, and 2. the parallel bottom-up inlining scheduler in Wasmtime. Sprinkle some very simple inlining heuristics on top, and this gives us function inlining in Wasm compilation. The default Wasmtime configuration does not enable inlining, and when we do enable it, we only enable it for cross-component calls by default (since presumably the toolchain that produced a particular core Wasm module, like LLVM, already performed any inlining that was beneficial within that module, but that toolchain couldn't know how that Wasm module would be getting linked together with other modules via component composition, and so it could not have done any cross-component inlining). For what it is worth, there is a config knob to enable intra-module function inlining, but this is primarily for use by our fuzzers, so that they can easily exercise and explore this new inlining functionality. All this plumbing required some changes to the `wasmtime_environ::Compiler` trait, since Winch cannot do inlining but Cranelift can. This is mostly encapsulated in the new `wasmtime_environ::InliningCompiler` trait. Additionally, we take care not to construct the call graph, or any other data structures required only by the inliner and not regular compilation, both when using Winch and when using Cranelift with inlining disabled. 
Finally, we add a `disas` test to verify that we successfully inline a series of calls from a function in one component, to a cross-component adapter function, to a function in another component. Most test coverage is expected to come from our fuzzing, however. * Fix dead code warning when not `cfg(feature = "component-model")` * fix winch trampoline compilation * Move CLI options to codegen * Move parameters into struct * Use an index set for call-graph construction * Smuggle inlining heuristic options through cranelift flags * Remove old CLI flags * set tunables before settings * Only configure inlining options for cranelift in fuzzing
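The default policy described in this entry (cross-component calls only, with an opt-in knob for intra-module inlining) reduces to a small predicate. This sketch is hypothetical throughout: `CallSite`, `InliningConfig`, `should_inline`, and the size cap are assumptions for illustration, and Wasmtime's actual heuristics and option names may differ.

```rust
/// Hypothetical summary of one direct call site.
struct CallSite {
    crosses_component_boundary: bool,
    callee_size: usize,
}

/// Hypothetical knobs; the names are illustrative, not Wasmtime's option names.
struct InliningConfig {
    allow_intra_module: bool, // the fuzzing-oriented knob
    max_callee_size: usize,   // assumed size cap
}

/// Only calls that cross a component boundary are candidates unless
/// intra-module inlining is enabled; in either case the callee must be small.
fn should_inline(cfg: &InliningConfig, site: &CallSite) -> bool {
    (site.crosses_component_boundary || cfg.allow_intra_module)
        && site.callee_size <= cfg.max_callee_size
}
```

The boundary check encodes the rationale given above: the toolchain that produced each core module has presumably already handled intra-module inlining, so only composition-time call edges are new opportunities.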
# 4590076f	26-Jul-2025	Chris Fallin	Cranelift: support dynamic contexts in exception-handler lists. (#11321) In #11285, we realized that Wasm semantics require us to match on dynamic instances of exception tags, rather than static tag types. This fundamentally requires the unwinder to be able to resolve the current Wasm instance for each Wasm frame on the stack that has any handlers, and our frame format does not provide this today. We discussed many options, some of which solve the more general problem (Wasm vmctx for any frame), but ultimately landed on a notion of "dynamic context for evaluating tags", specific to Cranelift's exception-catch metadata; and storing that context and carrying it through to a place that is named in the unwind metadata. The reasoning is fairly straightforward: we cannot afford a more general approach that stores vmctx in every frame (I measured this at 20% overhead for a recursive-Fibonacci benchmark that is call-intensive); and inlining means that we may have multiple contexts at any given program point, each associated with a different slice of the handler tags; so we need a mechanism that, just for a try-call, intersperses contexts with tags (or puts a context on each tag) and stores these somewhere that the exception-unwind ABI doesn't clobber (e.g., on the stack). This PR implements "option 4" from that issue, namely, dynamic exception contexts. The idea is that this is the dual to exception payload: while payload lets the unwinder communicate state to the catching code, context lets the unwinder take state from the catching code that lets it decide whether the tag is a match. Because of inlining, we need to either associate (optional) context with every tag, or intersperse context-updates with handler tags. 
I've opted for the latter for efficiency at the CLIF level (in most cases there will be multiple tags per context), though they are isomorphic. The new tag-matching semantics are: when walking up the stack, upon reaching a `try_call`, evaluate catch-clauses in listed order. A `context` clause sets the current context. A `tagN: block(...)` clause attempts to match the throwing exception against `tagN`, evaluated in the current context, and branches to the named block if it matches. A `default: block(...)` always branches to the named block. Note that this lets us assume less about tags than before, and this particularly manifests in the changes to the inliner. Whereas before, `tagN` is `tagN` and an inner handler for that tag shadows an outer handler (that is, tags always alias if identical indices); and whereas before, `tagN` is not `tagM` and so we can order the tags arbitrarily (that is, tags never alias if non-identical indices); now any two static tag indices may or may not alias depending on the dynamic context of each. Or, even in the same context, two may alias, because we leave the match-predicate as an unspecified (user-chosen) algorithm during unwinding. (This mirrors the reality that, for example, a Wasm instance may import two tags, and dynamically these tags may be equal or different at runtime, even instantiation-to-instantiation.) Cranelift's only job is to faithfully carry the list of contexts and tags through to the compiled-code metadata; and to ensure that they remain in the order they were specified in the CLIF. This PR introduces the Cranelift-level feature, and it will be used in a subsequent PR that introduces Wasm exception handling. Because of that, I've opted not to update the clif-utils runtest "runtime" to read out contexts and do something with them -- we will have plenty of test coverage via a bunch of Wasm tests for corner cases such as the above. 
This PR does include filetests that show that contexts are carried through to spillslots and those appear in the metadata. Fixes #11285.
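The clause-evaluation order described in this entry can be written as a short loop. This is a sketch, assuming a hypothetical `Clause` enum with string block names and a caller-supplied `matches` predicate standing in for the embedder's tag-matching algorithm; treating "no `context` clause seen yet" as `None` is also an assumption.

```rust
/// One entry in a hypothetical `try_call` handler list.
enum Clause {
    /// `context`: sets the dynamic context for the following tag clauses.
    Context(u64),
    /// `tagN: block(...)`: static tag index and target block name.
    Tag(u32, &'static str),
    /// `default: block(...)`: always taken when reached.
    Default(&'static str),
}

/// Evaluates clauses in listed order and returns the first target block that
/// applies. `matches(ctx, tag)` stands in for the embedder's matching
/// algorithm and receives the current context (`None` before any `context`).
fn select_handler<F>(clauses: &[Clause], mut matches: F) -> Option<&'static str>
where
    F: FnMut(Option<u64>, u32) -> bool,
{
    let mut ctx = None;
    for clause in clauses {
        match clause {
            Clause::Context(c) => ctx = Some(*c),
            Clause::Tag(tag, block) => {
                if matches(ctx, *tag) {
                    return Some(*block);
                }
            }
            Clause::Default(block) => return Some(*block),
        }
    }
    None
}
```

With two inlined frames that both list static tag `0` under different contexts, which handler runs depends only on the context in effect, illustrating why the inliner can no longer reorder or shadow clauses by tag index alone.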
# e3a607ea	22-Jul-2025	Nick Fitzgerald	Cranelift: fix a bug with inlining and unreachable callee blocks (#11282) We copy all callee blocks into the caller's layout, but were then only copying the callee instructions in reachable callee blocks into the caller. Therefore, any unreachable blocks would remain empty in the caller, which is invalid CLIF because all blocks must end in a terminator, so this commit adds a quick pass over the inlined blocks to remove any empty blocks from the caller's layout.
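The cleanup pass amounts to dropping inlined blocks that received no instructions. In this sketch the layout is a plain `Vec` of block numbers and `prune_empty_inlined_blocks` is a hypothetical helper, not Cranelift's `Layout` API; skipping blocks that are absent from the layout mirrors the membership check later added in cfc05638.

```rust
use std::collections::HashMap;

/// Removes blocks copied from the callee that ended up with no instructions
/// (e.g. because they were unreachable in the callee). An empty block has no
/// terminator, which is invalid CLIF.
fn prune_empty_inlined_blocks(
    layout: &mut Vec<u32>,
    insts: &HashMap<u32, Vec<&'static str>>,
    inlined_blocks: &[u32],
) {
    for block in inlined_blocks {
        let is_empty = insts.get(block).map_or(true, |v| v.is_empty());
        if !is_empty {
            continue;
        }
        // Only remove blocks that are actually present in the layout.
        if let Some(pos) = layout.iter().position(|b| b == block) {
            layout.remove(pos);
        }
    }
}
```

Here block `11` stands for an unreachable callee block and block `13` for one that never made it into the layout; the first is removed and the second is skipped instead of causing a panic.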
Revision tags: v35.0.0, v24.0.4, v33.0.2, v34.0.2
# 968952ab	10-Jul-2025	Nick Fitzgerald	Cranelift: introduce a function inliner (#11210) * Cranelift: introduce a function inliner This commit adds "inlining as a library" to Cranelift; it does _not_ provide a complete, off-the-shelf inlining solution. Cranelift's compilation context is per-function and does not encompass the full call graph. It does not know which functions are hot and which are cold, which have been marked the equivalent of `#[inline(always)]` versus `#[inline(never)]`, etc. Only the Cranelift user can understand these aspects of the full compilation pipeline, and these things can be very different between (say) Wasmtime and `cg_clif`. Therefore, this infrastructure does not attempt to define heuristics for when inlining a particular call is likely beneficial. This module only provides hooks for the Cranelift user to tell Cranelift whether a given call should be inlined or not, and the mechanics to inline a callee into a particular call site when the user directs Cranelift to do so. This commit also creates a new kind of filetest that will always inline calls to functions that have already been defined in the file. This lets us exercise the inliner in filetests. Fixes https://github.com/bytecodealliance/wasmtime/issues/4127 * Address review feedback * Require callee bodies are pre-legalized
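The division of labour described in this entry (the embedder decides, Cranelift performs the inlining) can be pictured as a policy trait. `InlinePolicy`, `InlineDecision`, and `InlineIfDefined` are hypothetical names chosen for this sketch and do not reproduce the actual interface in `inline.rs`; `InlineIfDefined` mimics the filetest behaviour of inlining only callees already defined in the file.

```rust
use std::collections::HashSet;

/// Hypothetical decision returned to the inliner for each call site.
#[derive(Debug, PartialEq)]
enum InlineDecision {
    Inline,
    KeepCall,
}

/// Hypothetical hook: the embedder owns the policy (hotness, inline hints,
/// size budgets, ...); the inliner only carries out the decision.
trait InlinePolicy {
    fn decide(&mut self, callee: &str) -> InlineDecision;
}

/// Policy resembling the filetest mode: inline a call only when the callee
/// was already defined earlier in the file.
struct InlineIfDefined {
    defined: HashSet<String>,
}

impl InlinePolicy for InlineIfDefined {
    fn decide(&mut self, callee: &str) -> InlineDecision {
        if self.defined.contains(callee) {
            InlineDecision::Inline
        } else {
            InlineDecision::KeepCall
        }
    }
}
```

Keeping the policy behind a trait is one way to let embedders as different as Wasmtime and `cg_clif` plug their own heuristics into the same inlining mechanics.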