|
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7, v43.0.0, v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6, v41.0.3, v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1, v40.0.0 |
|
| #
87ed3b60 |
| 15-Dec-2025 |
Chris Fallin <[email protected]> |
Cranelift: make all non-tail, non-indirect calls patchable, and rename patchable ABI to `preserve_all`. (#12160)
* Cranelift: make all non-tail, non-indirect calls patchable, and rename patchable ABI to `preserve_all`.
As discussed in this week's Cranelift meeting, we've discovered a need to generalize the `patchable_call` mechanism and corresponding `patchable` ABI slightly. In particular, we will need patchable `try_call` callsites as well in order to allow breakpoint handlers to throw exceptions (desirable functionality eventually) and have this work in the presence of inlining. Also, it's just a nice generalization to say that patchability is an orthogonal dimension to the call ABI and the other restrictions we initially imposed, and works as long as the basic requirement (no return values) is met.
This also renames the `patchable` ABI to `preserve_all`, to make it clear that its purpose is actually orthogonal, and it can be used independently of patchable callsites. It also deletes the `cold` ABI, which never actually did anything and is misleading in the presence of an actual cold-ish (subzero temperature, actually) ABI like `preserve_all`.
* Review feedback.
|
| #
c00e9ea2 |
| 02-Dec-2025 |
Chris Fallin <[email protected]> |
Cranelift: add patchable call instructions. (#12101)
* Cranelift: add patchable call instructions.
The new `patchable_call` CLIF instruction pairs with the `patchable` ABI, and emits a callsite with one new key property: the MachBuffer carries metadata that describes exactly which byte range to "NOP out" (overwrite with NOP instructions) to disable that callsite. Doing so is semantically valid and explicitly supported.
This enables patching of code at runtime to dynamically turn on and off features such as instrumentation or debugging hooks. We plan to use this to implement breakpoints in Wasmtime's guest debugging support.
As part of this change, I added a notion of "unit of NOP bytes" to the MachBuffer so that the consumer (e.g., Wasmtime's Cranelift-based code compilation pipeline and metadata-producing logic) can handle patchable callsites without any other special knowledge of the ISA.
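To make the "NOP out" idea concrete, here is a minimal sketch of how a consumer might disable and re-enable a callsite given only a byte range and a NOP unit. The names `PatchableCallSite`, `nop_out`, and `restore` are hypothetical and are not Cranelift or Wasmtime APIs; the sketch also assumes the code buffer is plain writable memory.

```rust
use std::ops::Range;

/// Hypothetical record for one patchable callsite: the byte range that the
/// call sequence occupies in the code buffer. Not a Cranelift type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PatchableCallSite {
    pub range: Range<usize>,
}

/// Overwrite `site.range` in `code` with repeated copies of `nop_unit`.
/// Returns the original bytes so the call can be restored later.
pub fn nop_out(
    code: &mut [u8],
    site: &PatchableCallSite,
    nop_unit: &[u8],
) -> Result<Vec<u8>, String> {
    let range = site.range.clone();
    if nop_unit.is_empty() {
        return Err("NOP unit must be non-empty".to_string());
    }
    if range.start > range.end || range.end > code.len() {
        return Err("callsite range is out of bounds".to_string());
    }
    // The range must be fillable with whole NOP units, otherwise a partial
    // instruction would be left behind.
    if (range.end - range.start) % nop_unit.len() != 0 {
        return Err("callsite length is not a multiple of the NOP unit".to_string());
    }
    let saved = code[range.clone()].to_vec();
    for chunk in code[range].chunks_mut(nop_unit.len()) {
        chunk.copy_from_slice(nop_unit);
    }
    Ok(saved)
}

/// Re-enable a callsite by writing the saved bytes back.
/// Panics if `saved` does not have the same length as the callsite range.
pub fn restore(code: &mut [u8], site: &PatchableCallSite, saved: &[u8]) {
    code[site.range.clone()].copy_from_slice(saved);
}
```

Because the function only needs the range and the NOP unit, the same code would work for a one-byte unit and for a four-byte unit, which is the point of keeping the consumer ISA-agnostic.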
For the "real metal" ISAs there are perfectly well-defined NOPs to use, but for Pulley, where all opcodes are assigned at compile time by macro magic, I explicitly defined NOP as opcode byte 0 by moving `Nop`'s definition to the top of the list and adding a unit test asserting its encoding.
A design note: in principle it would be possible, as an alternative, to treat "patchability" as an orthogonal dimension of all callsites, and emit the metadata describing the instruction-offset range for any callsite with the flag set. The only truly necessary semantic restriction is that there are no return values (because if we turn the callsite off, nothing writes to them); we could support patchability for other ABIs and for the other kinds of call instructions. The `patchable` ABI would then be better described as something like the "no clobbers ABI". I opted not to generalize in this way because it creates some less-tested corners and the generalized form, at least at the MachInst level, is not really much simpler in the end.
A testing note: I opted not to implement actual code patching in the `cranelift-tools` filetest runner and test patching callsites in/out via some actuation (e.g. a magic hostcall, like we do for throws) because (i) that's a lot of new plumbing and (ii) we are going to test this very shortly in Wasmtime anyway and (iii) the correctness (or not) of the location-and-length metadata is easy enough to verify in the disassemblies in the compile-tests.
* Review feedback: remove dependence on (and test for) NOP being the literal byte 0.
|
|
Revision tags: v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5, v38.0.3, v38.0.2, v38.0.1, v37.0.2 |
|
| #
a3d6e407 |
| 06-Oct-2025 |
Chris Fallin <[email protected]> |
Cranelift: add debug tag infrastructure. (#11768)
* Cranelift: add debug tag infrastructure.
This PR adds *debug tags*, a kind of metadata that can attach to CLIF instructions and be lowered to VCode instructions and as metadata on the produced compiled code. It also adds opaque descriptor blobs carried with stackslots. Together, these two features allow decorating IR with first-class debug instrumentation that is properly preserved by the compiler, including across optimizations and inlining. (Wasmtime's use of these features will come in followup PRs.)
The key idea of a "debug tag" is to allow the Cranelift embedder to express whatever information it needs to, in a format that is opaque to Cranelift itself, except for the parts that need translation during lowering. In particular, the `DebugTag::StackSlot` variant gets translated to a physical offset into the stackframe in the compiled metadata output. So, for example, the embedder can emit a tag referring to a stackslot, and another describing an offset in that stackslot.
The debug tags exist as a *sequence* on any given instruction; the meaning of the sequence is known only to the embedder, *except* that during inlining, the tags for the inlining call instruction are prepended to the tags of inlined instructions. In this way, a canonical use-case of tags as describing original source-language frames can preserve the source-language view even when multiple functions are inlined into one.
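The prepending rule can be illustrated with a small sketch. The `Tag` enum and `tags_after_inlining` function below are simplified, hypothetical stand-ins (stackslots are plain integers here) and are not the Cranelift types or the inliner's actual code.

```rust
/// Hypothetical, simplified stand-in for a debug tag. The meaning of
/// `User` values is known only to the embedder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    User(u32),
    StackSlot(u32),
}

/// Compute the tag sequence of an inlined instruction: the tags of the
/// call instruction being inlined come first, followed by the tags the
/// instruction already carried in the callee.
pub fn tags_after_inlining(call_site_tags: &[Tag], callee_inst_tags: &[Tag]) -> Vec<Tag> {
    let mut out = Vec::with_capacity(call_site_tags.len() + callee_inst_tags.len());
    out.extend_from_slice(call_site_tags);
    out.extend_from_slice(callee_inst_tags);
    out
}
```

Applying the function once per inlining level yields a sequence that reads outermost frame first, which is what lets an embedder recover the source-language call stack from a single instruction's tags.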
The descriptor on a stackslot may look a little odd at first, but its purpose is to allow serializing some description of stackslot-contained runtime user-program data, in a way that is firmly attached to the stackslot. In particular, in the face of inlining, this descriptor is copied into the inlining (parent) function from the inlined function when the stackslot entity is copied; no other metadata outside Cranelift needs to track the identity of stackslots and know about that motion. This fits nicely with the ability of tags to refer to stackslots; together, the embedder can annotate instructions as having certain state in stackslots, and describe the format of that state per stackslot.
This infrastructure is tested with some compile-tests now; testing of the interpretation of the metadata output will come with end-to-end debug instrumentation tests in a followup PR.
* Review feedback: add back sequence points and enforce tags only on sequence points or calls.
* Use Vecs for debug metadata in MachBuffer to avoid SmallVec size penalty in not-used case.
* Review feedback: switch from inlined stackslot descriptor blobs to u64 keys.
|
|
Revision tags: v37.0.1, v37.0.0, v36.0.2, v36.0.1, v36.0.0, v35.0.0, v24.0.4, v33.0.2, v34.0.2, v34.0.1, v33.0.1, v24.0.3, v32.0.1, v34.0.0, v33.0.0, v32.0.0 |
|
| #
3932e8f1 |
| 17-Apr-2025 |
bjorn3 <[email protected]> |
Some fixes for try_call (#10593)
* Fix cranelift-frontend handling of try_call
* Implement eliminate_unreachable_code for exception tables
* Ensure try_call is considered a memory fence
* Don't error on try_call in the verifier if no TargetIsa is passed
* Don't clobber all registers for try_call unless the tail call conv is used
This way other consumers of Cranelift don't have to pay the cost of the way Wasmtime will implement unwinding on exceptions.
* Allow SystemV call conv with try_call
|
| #
94ec88ea |
| 08-Apr-2025 |
Chris Fallin <[email protected]> |
Cranelift: initial try_call / try_call_indirect (exception) support. (#10510)
* Cranelift: initial try_call / try_call_indirect (exception) support.
This PR adds `try_call` and `try_call_indirect` instructions, and lowerings on four of five ISAs (x86-64, aarch64, riscv64, pulley; s390x has its own non-shared ABI code that will need separate work).
It extends CLIF to support these instructions as new kinds of branches, and extends block-calls to accept `retN` and `exnN` block-call args that carry the normal return values or exception payloads (respectively) into the appropriate successor blocks.
It wires up the "normal return path" so that it continues to work. It updates the ABI so that unwinding is possible without an initial register state at throw: specifically, as per our RFC, all registers are clobbered. It also includes metadata in the `MachBuffer` that describes exception-catch destinations. However, no unwinder exists to interpret these catch-destinations yet, so they are untested.
* Add try_call_indirect lowering as well.
|
|
Revision tags: v31.0.0 |
|
| #
4d876371 |
| 06-Mar-2025 |
Nick Fitzgerald <[email protected]> |
Add "pure" flag to `ir::MemFlags` (#10340)
* Add "pure" flag to `ir::MemFlags`
This flag represents whether the memory operation's safety (e.g. the validity of its `notrap` and `readonly` claims) is purely a function of its data dependencies.
If this flag is `true`, then it is okay to code motion this instruction to arbitrary locations in the function, including across blocks and conditional branches, so long as data dependencies (and trap ordering, if relevant) are upheld.
If this flag is `false`, then the memory operation's safety potentially relies upon invariants that are not reflected in its data dependencies, and therefore it is not safe to code motion this operation. For example, this operation could be in a block that is dominated by a control-flow bounds check that makes this operation safe, and that invariant is not reflected in its operands. It would be unsafe to code motion such an instruction above its associated bounds check, even if its data dependencies would still be satisfied.
I've added this flag because we were doing exactly that kind of code motion where we moved a `readonly` and `notrap` memory operation past its associated null-check and therefore it was no longer safe to perform and we would get a segfault. This could only be triggered when the Wasm typed-function-references proposal was enabled, which is not a tier-1 proposal, so it is not considered a vulnerability. Nonetheless, it is a pretty scary kind of bug, and other code paths weren't affected due to pretty subtle interactions. And this is the motivation for the new "pure" flag: without needing to explicitly opt into data-dependency-based code motion (i.e. set the "pure" flag), it is too easy to accidentally move loads past their control-flow-based safety guards.
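As a rough illustration of the opt-in, a hoisting pass could gate on the flag as sketched below. The `Flags` struct and `may_hoist_load` function are hypothetical, not the `ir::MemFlags` API, and the rule shown is a deliberately conservative assumption: it also requires `readonly` and `notrap` so that store ordering and trap ordering do not need to be considered at all.

```rust
/// Hypothetical, simplified view of the flags on a memory operation.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Flags {
    /// The access is claimed never to trap.
    pub notrap: bool,
    /// The accessed memory is claimed never to change.
    pub readonly: bool,
    /// The claims above depend only on the operation's data dependencies,
    /// not on a dominating control-flow check.
    pub can_move: bool,
}

/// Conservative rule (an assumption of this sketch): a load may be hoisted
/// above branches only if all three flags are set. Without `can_move`, the
/// `notrap` and `readonly` claims might hold only below a guard such as a
/// null check, so the load has to stay where it is.
pub fn may_hoist_load(flags: Flags) -> bool {
    flags.readonly && flags.notrap && flags.can_move
}
```

With this rule, the load from the bug described above (marked `readonly` and `notrap` but guarded by a null check) would not set `can_move` and would therefore stay below its check.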
* fix load-hoisting test; also test that non-pure loads don't hoist
* Rename `pure` flag to `can_move`
|
|
Revision tags: v30.0.2, v30.0.1, v30.0.0, v29.0.1, v29.0.0, v28.0.1 |
|
| #
a88eb702 |
| 14-Jan-2025 |
Nick Fitzgerald <[email protected]> |
Cranelift: dedupe `trap[n]z` instructions (#10004)
* Cranelift: dedupe `trap[n]z` instructions
This commit extends our existing support for merging idempotently side-effectful instructions that produce exactly one value to those that produce zero or one value, and marks the `trap[n]z` instructions as having idempotent side effects. This cleans up a lot of test cases in our `disas` test suite, particularly those related to explicit bounds checks and GC.
As an aside, it seems like it should be easy to extend this to idempotently side-effectful instructions that produce multiple values as well, but I don't believe we have any such instructions, so I didn't bother.
* Update more disas tests
* review feedback
|
|
Revision tags: v28.0.0, v27.0.0, v26.0.1, v25.0.3, v24.0.2, v26.0.0, v21.0.2, v22.0.1, v23.0.3, v25.0.2, v24.0.1, v25.0.1, v25.0.0 |
|
| #
b81ef46c |
| 22-Aug-2024 |
Nick Fitzgerald <[email protected]> |
Remove reference types (`r32` and `r64`) from Cranelift (#9164)
* Remove reference types (`r32` and `r64`) from Cranelift
* restore fuzz regression test
|
|
Revision tags: v24.0.0, v23.0.2, v23.0.1, v23.0.0 |
|
| #
41eca60b |
| 17-Jul-2024 |
beetrees <[email protected]> |
cranelift: Add `f16const` and `f128const` instructions (#8893)
* cranelift: Add `f16const` and `f128const` instructions
* cranelift: Add constant propagation for `f16` and `f128`
|
|
Revision tags: v22.0.0, v21.0.1, v21.0.0 |
|
| #
b869b66b |
| 13-May-2024 |
Jamey Sharp <[email protected]> |
cranelift: Delete redundant DCE optimization pass (#8227)
The egraph pass and the dead-code elimination pass both remove instructions whose results are unused. If the optimization level is "none", neither pass runs, and if it's anything else both passes run. I don't think we should do this work twice.
Note that the DCE pass is different than the "eliminate unreachable code" pass, which removes entire blocks that are unreachable from the entry block. That pass might still be necessary.
|
|
Revision tags: v20.0.2, v20.0.1, v20.0.0, v17.0.3, v19.0.2, v18.0.4 |
|
| #
1721fe3f |
| 08-Apr-2024 |
Nick Fitzgerald <[email protected]> |
Cranelift: Do not dedupe/GVN bitcasts from reference values (#8317)
* Cranelift: Do not dedupe/GVN bitcasts from reference values
Deduping bitcasts to integers from references can make the references no longer live across safepoints, and instead only the bitcasted integer results would be. Because the reference is no longer live after the safepoint, the safepoint's stack map would not have an entry for the reference, which could result in the collector reclaiming an object too early, which is basically a use-after-free bug. Luckily, we sandbox the GC heap now, so such UAF bugs aren't memory unsafe, but they could potentially result in denial of service attacks. Either way, we don't want those bugs!
On the other hand, it is technically fine to dedupe bitcasts *to* reference types. Doing so extends, rather than shortens, the live range of the GC reference. This potentially adds it to more stack maps than it otherwise would have been in, which means it might unnecessarily survive a GC it otherwise wouldn't have. But that is fine. Shrinking live ranges of GC references, and removing them from stack maps they otherwise should have been in, is the problematic transformation.
* Add additional logging and debug asserts for GC stuff
|
|
Revision tags: v19.0.1, v19.0.0, v18.0.3, v18.0.2, v17.0.2, v18.0.1, v18.0.0, v17.0.1, v17.0.0, v16.0.0, v15.0.1, v15.0.0, v14.0.4, v14.0.3, v14.0.2, v13.0.1, v14.0.1, v14.0.0, minimum-viable-wasi-proxy-serve, v13.0.0, v12.0.2, v11.0.2, v10.0.2, v12.0.1, v12.0.0, v11.0.1, v11.0.0, v10.0.1, v10.0.0, v9.0.4, v9.0.3, v9.0.2, v9.0.1, v9.0.0, v6.0.2, v7.0.1, v8.0.1, v8.0.0 |
|
| #
f684a5fb |
| 11-Apr-2023 |
T0b1-iOS <[email protected]> |
remove `iadd_cout` and `isub_bout` (#6198)
|
|
Revision tags: v7.0.0, v6.0.1, v5.0.1, v4.0.1 |
|
| #
7b8854f8 |
| 02-Mar-2023 |
Chris Fallin <[email protected]> |
egraphs: fix handling of effectful-but-idempotent ops and GVN. (#5800)
* Revert "egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808)"
This reverts commit c7e25718665aa3fd5f28d4a3d0c94580eb040c37.
* egraphs: fix handling of effectful-but-idempotent ops and GVN.
This PR addresses #5796: currently, ops that are effectful, i.e., remain in the side-effecting skeleton (which we keep in the `Layout` while the egraph exists), but are idempotent and thus mergeable by a GVN pass, are not handled properly.
GVN is still possible on effectful but idempotent ops precisely because our GVN does not create partial redundancies: it removes an instruction only when it is dominated by an identical instruction. An instruction will not be "hoisted" to a point where it could execute in the optimized code but not in the original.
However, there are really two parts to the egraph implementation that produce this effect: the deduplication on insertion into the egraph, and the elaboration with a scoped hashmap. The deduplication lets us give a single name (value ID) to all copies of an identical instruction, and then elaboration will re-create duplicates if GVN should not hoist or merge some of them.
Because deduplication need not worry about dominance or scopes, we use a simple (non-scoped) hashmap to dedup/intern ops as "egraph nodes".
When we added support for GVN'ing effectful but idempotent ops (#5594), we kept the use of this simple dedup'ing hashmap, but these ops do not get elaborated; instead they stay in the side-effecting skeleton. Thus, we inadvertently created potential for weird code-motion effects.
The proposal in #5796 would solve this in a clean way by treating these ops as pure again, and keeping them out of the skeleton, instead putting "force" pseudo-ops in the skeleton. However, this is a little more complex than I would like, and I've realized that @jameysharp's earlier suggestion is much simpler: we can keep an actual scoped hashmap separately just for the effectful-but-idempotent ops, and use it to GVN while we build the egraph. In effect, we're fusing a separate GVN pass with the egraph pass (but letting it interact corecursively with egraph rewrites). This is in principle similar to how we keep a separate map for loads and fuse this pass with the egraph rewrite pass as well.
Note that we can use a `ScopedHashMap` here without the "context" (as needed by `CtxHashMap`) because, as noted by @jameysharp, in practice the ops we want to GVN have all their args inline. Equality on the `InstructionData` itself is conservative: two insts whose struct contents compare shallowly equal are definitely identical, but identical insts in a deep-equality sense may not compare shallowly equal, due to list indirection. This is fine for GVN, because it is still sound to skip any given GVN opportunity (and keep the original instructions).
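The dominance-respecting behavior of a scoped map can be sketched as follows. This `ScopedMap` is a toy written for illustration, not Cranelift's `ScopedHashMap`; keys stand in for instruction data, values for instruction or value IDs, and scopes are assumed to be pushed and popped as a traversal enters and leaves blocks in the dominator tree.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Toy scoped map: one hash map per open scope, innermost last.
pub struct ScopedMap<K, V> {
    scopes: Vec<HashMap<K, V>>,
}

impl<K: Hash + Eq, V: Clone> ScopedMap<K, V> {
    pub fn new() -> Self {
        Self { scopes: vec![HashMap::new()] }
    }

    /// Enter a dominated region (for example, a child in the dominator tree).
    pub fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    /// Leave the region; entries inserted inside it become invisible.
    pub fn pop_scope(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    /// Look up `key` in the innermost scope first, then in enclosing scopes.
    pub fn get(&self, key: &K) -> Option<V> {
        self.scopes.iter().rev().find_map(|s| s.get(key).cloned())
    }

    /// Insert into the innermost scope.
    pub fn insert(&mut self, key: K, value: V) {
        self.scopes
            .last_mut()
            .expect("there is always at least one scope")
            .insert(key, value);
    }
}

/// GVN step: reuse a visible (dominating) identical instruction if there is
/// one, otherwise record `fresh` as the canonical instance.
pub fn gvn<K: Hash + Eq>(map: &mut ScopedMap<K, u32>, key: K, fresh: u32) -> u32 {
    match map.get(&key) {
        Some(existing) => existing,
        None => {
            map.insert(key, fresh);
            fresh
        }
    }
}
```

An instruction first seen in one branch is never visible from a sibling branch, so nothing is merged into a position where it would not have executed originally. A failed lookup only costs a missed optimization, which matches the soundness argument about conservative equality above.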
Fixes #5796.
* Add comments from review.
|
|
Revision tags: v6.0.0 |
|
| #
c7e25718 |
| 16-Feb-2023 |
Chris Fallin <[email protected]> |
egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808)
This is a short-term fix to the same bug that #5800 is addressing (#5796), but with less risk: it simply turns off GVN'ing of effectful but idempotent ops. Because we have an upcoming release, and this is a miscompile (albeit to do with trapping behavior), we would like to make the simplest possible fix that avoids the bug, and backport it. I will then rebase #5800 on top of a revert of this followed by the more complete fix.
|
| #
80c147d9 |
| 16-Feb-2023 |
Trevor Elliott <[email protected]> |
Rework br_table to use BlockCall (#5731)
Rework br_table to use BlockCall, allowing us to avoid adding new nodes during ssa construction to hold block arguments. Additionally, many places where we previously matched on InstructionData to extract branch destinations can be replaced with a use of branch_destination or branch_destination_mut.
|
| #
d99783fc |
| 10-Feb-2023 |
Trevor Elliott <[email protected]> |
Move default blocks into jump tables (#5756)
Move the default block off of the br_table instruction, and into the JumpTable that it references.
|
| #
b0b3f67c |
| 08-Feb-2023 |
Trevor Elliott <[email protected]> |
Move jump tables to the DataFlowGraph (#5745)
Move the storage for jump tables off of FunctionStencil and onto DataFlowGraph. This change is in service of #5731, making it easier to access the jump table data in the context of helpers like inst_values.
|
| #
3343cf80 |
| 07-Feb-2023 |
Trevor Elliott <[email protected]> |
Add assertions for matches that used to use analyze_branch (#5733)
Following up from #5730, add debug assertions to ensure that new branch instructions don't slip through matches that used to use analyze_branch.
|
| #
2c842599 |
| 07-Feb-2023 |
Trevor Elliott <[email protected]> |
Refactor matches that used to consume BranchInfo (#5734)
Explicitly borrow the instruction data, and use a mutable borrow to avoid rematch.
|
| #
c8a6adf8 |
| 07-Feb-2023 |
Trevor Elliott <[email protected]> |
Remove analyze_branch and BranchInfo (#5730)
We don't have overlap in behavior for branch instructions anymore, so we can remove analyze_branch and instead match on the InstructionData directly.
Co-authored-by: Jamey Sharp <[email protected]>
|
| #
a5698ced |
| 30-Jan-2023 |
Trevor Elliott <[email protected]> |
cranelift: Remove brz and brnz (#5630)
Remove the brz and brnz instructions, as their behavior is now redundant with brif.
|
| #
b58a197d |
| 24-Jan-2023 |
Trevor Elliott <[email protected]> |
cranelift: Add a conditional branch instruction with two targets (#5446)
Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both.
This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks.
Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.
|
|
Revision tags: v5.0.0 |
|
| #
704f5a57 |
| 19-Jan-2023 |
Chris Fallin <[email protected]> |
Cranelift/egraph mid-end: support merging effectful-but-idempotent ops (#5594)
* Support mergeable-but-side-effectful (idempotent) operations in general in the egraph's GVN.
This mirrors the similar change made in #5534.
* Add tests for egraph case.
|
| #
7cea73a8 |
| 19-Jan-2023 |
Trevor Elliott <[email protected]> |
Refactor BranchInfo::Table to no longer have an optional default branch (#5593)
|
| #
1e6c13d8 |
| 18-Jan-2023 |
Trevor Elliott <[email protected]> |
cranelift: Rework block instructions to use BlockCall (#5464)
Add a new type BlockCall that represents the pair of a block name with arguments to be passed to it. (The mnemonic here is that it looks a bit like a function call.) Rework the implementation of jump, brz, and brnz to use BlockCall instead of storing the block arguments as varargs in the instruction's ValueList.
To ensure that we're processing block arguments from BlockCall values in instructions, three new functions have been introduced on DataFlowGraph that operate on both sets of arguments:
inst_values - returns an iterator that traverses values in the instruction and block arguments
map_inst_values - applies a function to each value in the instruction and block arguments
overwrite_inst_values - overwrites all values in an instruction and block arguments with values from the iterator
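The shape of these helpers can be sketched on a toy instruction type. Everything below is hypothetical and simplified: in Cranelift the functions live on `DataFlowGraph` and values are stored in pooled lists, whereas this sketch uses plain `Vec`s and puts the methods on the instruction itself.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Value(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Block(pub u32);

/// A branch destination together with the arguments passed to it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlockCall {
    pub block: Block,
    pub args: Vec<Value>,
}

/// Toy instruction: ordinary value operands plus zero or more destinations.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Inst {
    pub args: Vec<Value>,
    pub dests: Vec<BlockCall>,
}

impl Inst {
    /// Iterate over the instruction's own operands, then over the block
    /// arguments of each destination in order.
    pub fn inst_values(&self) -> impl Iterator<Item = Value> + '_ {
        self.args
            .iter()
            .copied()
            .chain(self.dests.iter().flat_map(|d| d.args.iter().copied()))
    }

    /// Apply `f` to every operand and every block argument in place.
    pub fn map_inst_values(&mut self, mut f: impl FnMut(Value) -> Value) {
        for v in self.args.iter_mut() {
            *v = f(*v);
        }
        for dest in self.dests.iter_mut() {
            for v in dest.args.iter_mut() {
                *v = f(*v);
            }
        }
    }
}
```

A pass that renames values (for example, resolving aliases) can then call `map_inst_values` once instead of handling operands and branch arguments separately, which is the failure mode the helpers are meant to rule out.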
Co-authored-by: Jamey Sharp <[email protected]>
|