|
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7, v43.0.0 |
|
| #
58633f35 |
| 02-Mar-2026 |
tsudzuki <[email protected]> |
Fix doc typo, the behavior is actually selecting 'x' if the bit in 'c' is 1 (#12695)
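The corrected wording can be checked with a small sketch. This is not Cranelift's implementation; `bitselect` here is an illustrative stand-alone function over `u32`, assuming the usual bitwise-select definition in which each result bit is taken from `x` where the matching bit of `c` is 1 and from `y` otherwise.

```rust
/// Illustrative bitwise select (an assumption-labelled sketch, not Cranelift code):
/// each result bit comes from `x` where the corresponding bit of `c` is 1,
/// and from `y` where it is 0.
pub fn bitselect(c: u32, x: u32, y: u32) -> u32 {
    (c & x) | (!c & y)
}
```

With an all-ones `c` the result is `x`, and with an all-zeros `c` it is `y`, which matches the fixed documentation.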
|
|
Revision tags: v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6, v41.0.3 |
|
| #
7ac4b818 |
| 03-Feb-2026 |
Jimmy Brisson <[email protected]> |
S390x: emit new instructions added in z17 (#12319)
* s390x: Emit instructions from MIE4 & VXRS_EXT3 on z17
This emits & tests a bunch of instructions: * from Miscellaneous-Instruction-Extensions Facility 4: * CLZ, 64bit * CTZ, 64bit * from Vector-Enhancements Facility 3: * 32x4, 64x2 & 128x1 variants of the following: * Divide * Remainder * 64x2 & 128x1 multiply variants * 128x1 variants of: * Compare * CLZ * CTZ * Max * Min * Average * Negation * Evaluate
Co-authored-by: Jimmy Brisson <[email protected]>
* s390x: Emit vector blend on z17
* Rename x86_blendv to blendv
Now that s390x implements blendv as well, we should refer to the instruction without the x86 prefix.
---------
Co-authored-by: Ulrich Weigand <[email protected]>
|
|
Revision tags: v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1, v40.0.0 |
|
| #
87ed3b60 |
| 15-Dec-2025 |
Chris Fallin <[email protected]> |
Cranelift: make all non-tail, non-indirect calls patchable, and rename patchable ABI to `preserve_all`. (#12160)
* Cranelift: make all non-tail, non-indirect calls patchable, and rename patchable ABI to `preserve_all`.
As discussed in this week's Cranelift meeting, we've discovered a need to generalize the `patchable_call` mechanism and corresponding `patchable` ABI slightly. In particular, we will need patchable `try_call` callsites as well in order to allow breakpoint handlers to throw exceptions (desirable functionality eventually) and have this work in the presence of inlining. Also, it's just a nice generalization to say that patchability is an orthogonal dimension to the call ABI and the other restrictions we initially imposed, and works as long as the basic requirement (no return values) is met.
This also renames the `patchable` ABI to `preserve_all`, to make it clear that its purpose is actually orthogonal, and it can be used independently of patchable callsites. It also deletes the `cold` ABI, which never actually did anything and is misleading in the presence of an actual cold-ish (subzero temperature, actually) ABI like `preserve_all`.
* Review feedback.
|
| #
c00e9ea2 |
| 02-Dec-2025 |
Chris Fallin <[email protected]> |
Cranelift: add patchable call instructions. (#12101)
* Cranelift: add patchable call instructions.
The new `patchable_call` CLIF instruction pairs with the `patchable` ABI, and emits a callsite with one new key property: the MachBuffer carries metadata that describes exactly which byte range to "NOP out" (overwrite with NOP instructions) to disable that callsite. Doing so is semantically valid and explicitly supported.
This enables patching of code at runtime to dynamically turn on and off features such as instrumentation or debugging hooks. We plan to use this to implement breakpoints in Wasmtime's guest debugging support.
As part of this change, I added a notion of "unit of NOP bytes" to the MachBuffer so that the consumer (e.g., Wasmtime's Cranelift-based code compilation pipeline and metadata-producing logic) can handle patchable callsites without any other special knowledge of the ISA.
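A minimal sketch of how a consumer might use such metadata, assuming it is given a byte range and the ISA's NOP unit. `PatchableCallSite` and `nop_out` are hypothetical names invented for this illustration; they are not the MachBuffer API.

```rust
/// Hypothetical description of one patchable callsite: the byte range
/// (offset and length) inside the code buffer that may be overwritten.
pub struct PatchableCallSite {
    pub offset: usize,
    pub len: usize,
}

/// Overwrite the callsite with repetitions of `nop_unit`, the ISA's
/// "unit of NOP bytes". Fails if the range does not fit in `code` or if
/// its length is not a whole number of NOP units.
pub fn nop_out(
    code: &mut [u8],
    site: &PatchableCallSite,
    nop_unit: &[u8],
) -> Result<(), &'static str> {
    if nop_unit.is_empty() || site.len % nop_unit.len() != 0 {
        return Err("callsite length is not a multiple of the NOP unit");
    }
    let end = site.offset.checked_add(site.len).ok_or("range overflows")?;
    let range = code.get_mut(site.offset..end).ok_or("callsite out of bounds")?;
    // Fill the range one NOP unit at a time; no ISA-specific knowledge needed.
    for chunk in range.chunks_exact_mut(nop_unit.len()) {
        chunk.copy_from_slice(nop_unit);
    }
    Ok(())
}
```

For example, on x86-64 the unit would be the single byte `0x90`, so a 5-byte `call rel32` becomes five NOPs. Re-enabling the callsite would require the caller to have saved the original bytes, which this sketch does not do.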
For the "real metal" ISAs there are perfectly well-defined NOPs to use, but for Pulley, where all opcodes are assigned at compile time by macro magic, I explicitly defined NOP as opcode byte 0 by moving `Nop`'s definition to the top of the list and adding a unit test asserting its encoding.
A design note: in principle it would be possible, as an alternative, to treat "patchability" as an orthogonal dimension of all callsites, and emit the metadata describing the instruction-offset range for any callsite with the flag set. The only truly necessary semantic restriction is that there are no return values (because if we turn the callsite off, nothing writes to them); we could support patchability for other ABIs and for the other kinds of call instructions. The `patchable` ABI would then be better described as something like the "no clobbers ABI". I opted not to generalize in this way because it creates some less-tested corners and the generalized form, at least at the MachInst level, is not really much simpler in the end.
A testing note: I opted not to implement actual code patching in the `cranelift-tools` filetest runner and test patching callsites in/out via some actuation (e.g. a magic hostcall, like we do for throws) because (i) that's a lot of new plumbing and (ii) we are going to test this very shortly in Wasmtime anyway and (iii) the correctness (or not) of the location-and-length metadata is easy enough to verify in the disassemblies in the compile-tests.
* Review feedback: remove dependence on (and test for) NOP being the literal byte 0.
|
|
Revision tags: v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5, v38.0.3, v38.0.2, v38.0.1, v37.0.2 |
|
| #
a3d6e407 |
| 06-Oct-2025 |
Chris Fallin <[email protected]> |
Cranelift: add debug tag infrastructure. (#11768)
* Cranelift: add debug tag infrastructure.
This PR adds *debug tags*, a kind of metadata that can attach to CLIF instructions and be lowered to VCode instructions and as metadata on the produced compiled code. It also adds opaque descriptor blobs carried with stackslots. Together, these two features allow decorating IR with first-class debug instrumentation that is properly preserved by the compiler, including across optimizations and inlining. (Wasmtime's use of these features will come in followup PRs.)
The key idea of a "debug tag" is to allow the Cranelift embedder to express whatever information it needs to, in a format that is opaque to Cranelift itself, except for the parts that need translation during lowering. In particular, the `DebugTag::StackSlot` variant gets translated to a physical offset into the stackframe in the compiled metadata output. So, for example, the embedder can emit a tag referring to a stackslot, and another describing an offset in that stackslot.
The debug tags exist as a *sequence* on any given instruction; the meaning of the sequence is known only to the embedder, *except* that during inlining, the tags for the inlining call instruction are prepended to the tags of inlined instructions. In this way, a canonical use-case of tags as describing original source-language frames can preserve the source-language view even when multiple functions are inlined into one.
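The prepending rule can be sketched as follows. The `Tag` enum and `tags_after_inlining` are hypothetical stand-ins for illustration only; the real tag type and the inliner's internals are not shown in this log.

```rust
/// Hypothetical, simplified tag type: an opaque embedder value or a
/// reference to a stackslot by index.
#[derive(Clone, Debug, PartialEq)]
pub enum Tag {
    User(u32),
    StackSlot(u32),
}

/// Tags for an instruction after it has been inlined: the tags of the
/// inlining call instruction come first, followed by the instruction's
/// own tags from the callee.
pub fn tags_after_inlining(call_tags: &[Tag], callee_inst_tags: &[Tag]) -> Vec<Tag> {
    let mut out = Vec::with_capacity(call_tags.len() + callee_inst_tags.len());
    out.extend_from_slice(call_tags);
    out.extend_from_slice(callee_inst_tags);
    out
}
```

Applying this once per inlining level yields a sequence ordered from the outermost frame to the innermost, which is what lets an embedder reconstruct source-language frames.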
The descriptor on a stackslot may look a little odd at first, but its purpose is to allow serializing some description of stackslot-contained runtime user-program data, in a way that is firmly attached to the stackslot. In particular, in the face of inlining, this descriptor is copied into the inlining (parent) function from the inlined function when the stackslot entity is copied; no other metadata outside Cranelift needs to track the identity of stackslots and know about that motion. This fits nicely with the ability of tags to refer to stackslots; together, the embedder can annotate instructions as having certain state in stackslots, and describe the format of that state per stackslot.
This infrastructure is tested with some compile-tests now; testing of the interpretation of the metadata output will come with end-to-end debug instrumentation tests in a followup PR.
* Review feedback: add back sequence points and enforce tags only on sequence points or calls.
* Use Vecs for debug metadata in MachBuffer to avoid SmallVec size penalty in not-used case.
* Review feedback: switch from inlined stackslot descriptor blobs to u64 keys.
|
|
Revision tags: v37.0.1, v37.0.0 |
|
| #
4c01ee2f |
| 05-Sep-2025 |
Chris Fallin <[email protected]> |
Cranelift: add get_exception_handler_address. (#11629)
* Cranelift: add get_exception_handler_address.
This is designed to enable applications such as #11592 that use alternative unwinding mechanisms that may not necessarily want to walk a stack and look up exception tables. The idea is that whenever it would be valid to resume to an exception handler that is active on the stack, we can provide the same PC as a first-class runtime value that would be found in the exception table for the given handler edge. A "custom" resume step can then use this PC as a resume-point as long as it follows the relevant exception ABI (i.e.: restore SP, FP, any other saved registers that the exception ABI specifies, and provide appropriate payload value(s)).
Handlers are associated with edges out of `try_call`s (or `try_call_indirect`s); and edges specifically, not blocks, because there could be multiple out-edges to one block. The instruction thus takes the block that contains the try-call and an immediate that indexes its exceptional edges.
This CLIF instruction required a bit of infrastructure to (i) allow naming raw blocks, not just block calls, as instruction arguments, and (ii) allow getting the MachLabel for any other lowered block during lowering. But given that, the lowerings themselves are straightforward uses of MachBuffer labels to fix-up PC-relative address-loading instructions (e.g., `LEA` or `ADR` or `AUIPC`+`ADDI`).
* Review feedback.
* Review feedback: more tests.
|
|
Revision tags: v36.0.2, v36.0.1, v36.0.0, v35.0.0, v24.0.4, v33.0.2, v34.0.2 |
|
| #
8a23cc74 |
| 09-Jul-2025 |
Nick Fitzgerald <[email protected]> |
Cranelift: Make `ir::{Constant,Immediate}` considered entities (#11207)
* Cranelift: Make `ir::{Constant,Immediate}` considered entities
They reference data in out-of-line pools rather than storing their data inline in the instruction, and when an instruction containing them is moved from one `ir::Function` to another, they need their indices updated accordingly. Therefore, they really are entities rather than immediates.
This recategorization means that they will now be properly mapped in `ir::InstructionData::map` calls.
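To see why an index into an out-of-line pool behaves like an entity, consider this simplified sketch. `Function`, `Constant`, and `move_constant` are hypothetical miniatures written for this example and do not reflect the actual `ir::Function` layout or the `map` signature.

```rust
/// Hypothetical miniature of a function that owns an out-of-line constant pool.
pub struct Function {
    pub constants: Vec<Vec<u8>>,
}

/// Hypothetical handle: an index into the owning function's pool,
/// meaningless outside that function.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Constant(pub usize);

/// Move a constant reference from `src` to `dst`: the data is copied into
/// the destination pool and a fresh handle, valid only in `dst`, is returned.
pub fn move_constant(src: &Function, dst: &mut Function, c: Constant) -> Constant {
    let data = src.constants[c.0].clone();
    dst.constants.push(data);
    Constant(dst.constants.len() - 1)
}
```

If the old handle were copied verbatim, as it would be for a true immediate, it would silently point at unrelated data in the destination pool.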
* fix tests
|
|
Revision tags: v34.0.1, v33.0.1, v24.0.3, v32.0.1, v34.0.0, v33.0.0 |
|
| #
90ac295e |
| 19-May-2025 |
Alex Crichton <[email protected]> |
Update Wasmtime to the 2024 Rust Edition (#10806)
* Update Wasmtime to the 2024 Rust Edition
Now that our MSRV supports the 2024 edition it's possible to make this switch. This commit moves Wasmtime to the 2024 Edition to keep up-to-date with Rust idioms and access many of the edition features exclusive to the 2024 edition.
prtest:full
* Reformat with the 2024 edition
|
|
Revision tags: v32.0.0 |
|
| #
94ec88ea |
| 08-Apr-2025 |
Chris Fallin <[email protected]> |
Cranelift: initial try_call / try_call_indirect (exception) support. (#10510)
* Cranelift: initial try_call / try_call_indirect (exception) support.
This PR adds `try_call` and `try_call_indirect` instructions, and lowerings on four of five ISAs (x86-64, aarch64, riscv64, pulley; s390x has its own non-shared ABI code that will need separate work).
It extends CLIF to support these instructions as new kinds of branches, and extends block-calls to accept `retN` and `exnN` block-call args that carry the normal return values or exception payloads (respectively) into the appropriate successor blocks.
It wires up the "normal return path" so that it continues to work. It updates the ABI so that unwinding is possible without an initial register state at throw: specifically, as per our RFC, all registers are clobbered. It also includes metadata in the `MachBuffer` that describes exception-catch destinations. However, no unwinder exists to interpret these catch-destinations yet, so they are untested.
* Add try_call_indirect lowering as well.
|
|
Revision tags: v31.0.0, v30.0.2, v30.0.1, v30.0.0, v29.0.1, v29.0.0, v28.0.1 |
|
| #
a88eb702 |
| 14-Jan-2025 |
Nick Fitzgerald <[email protected]> |
Cranelift: dedupe `trap[n]z` instructions (#10004)
* Cranelift: dedupe `trap[n]z` instructions
This commit extends our existing support for merging idempotently side-effectful instructions that produce exactly one value to those that produce zero or one value, and marks the `trap[n]z` instructions as having idempotent side effects. This cleans up a lot of test cases in our `disas` test suite, particularly those related to explicit bounds checks and GC.
As an aside, it seems like it should be easy to extend this to idempotently side-effectful instructions that produce multiple values as well, but I don't believe we have any such instructions, so I didn't bother.
* Update more disas tests
* review feedback
|
| #
2d1c0abd |
| 29-Dec-2024 |
Julian Eager <[email protected]> |
pulley: Implement vector sqmul_round_sat (#9911)
* pulley: Implement vector sqmul_round_sat
* parenthesize to bring out op. order
|
|
Revision tags: v28.0.0 |
|
| #
45b60bd6 |
| 02-Dec-2024 |
Alex Crichton <[email protected]> |
Start using `#[expect]` instead of `#[allow]` (#9696)
* Start using `#[expect]` instead of `#[allow]`
In Rust 1.81, our new MSRV, a new feature was added to Rust to use `#[expect]` to control lint levels. This new lint annotation will silence a lint but will itself cause a lint if it doesn't actually silence anything. This is quite useful to ensure that annotations don't get stale over time.
Another feature is the ability to use a `reason` directive on the attribute with a string explaining why the attribute is there. This string is then rendered in compiler messages if a warning or error happens.
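A small sketch of the two attributes, assuming Rust 1.81 or newer. The function names and reason strings are made up for illustration.

```rust
/// `#[expect]` silences the lint, and would itself warn if
/// `unused_variables` ever stopped firing inside this function.
#[expect(
    unused_variables,
    reason = "`scratch` is deliberately unused to show a lint that always fires"
)]
pub fn answer() -> u32 {
    let scratch = 0u32; // never read, so the expected lint fires here
    42
}

/// `#[allow]` with a `reason` suits lints that fire only in some
/// configurations; it never complains when the lint does not fire.
#[allow(dead_code, reason = "only called on some targets in this sketch")]
fn maybe_unused() {}
```

Removing the `scratch` binding would make the compiler report the `#[expect]` as unfulfilled, which is how stale annotations get noticed.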
This commit applies a few changes across the workspace:
* Some `#[allow]` are changed to `#[expect]` with a `reason`.
* Some `#[allow]` have a `reason` added if the lint conditionally fires (mostly related to macros).
* Some `#[allow]` are removed since the lint doesn't actually fire.
* The workspace configures `clippy::allow_attributes_without_reason = 'warn'` as a "ratchet" to prevent future regressions.
* Many crates are annotated to allow `allow_attributes_without_reason` during this transitionary period.
The end-state is that all crates should use `#[expect(..., reason = "...")]` for any lint that unconditionally fires but is expected. The `#[allow(..., reason = "...")]` form should be used for conditionally firing lints, primarily in macro-related code. The `allow_attributes_without_reason = 'warn'` level is intended to be permanent, but the transitionary `#[expect(clippy::allow_attributes_without_reason)]` crate annotations are intended to go away over time.
* Fix adapter build
prtest:full
* Fix one-core build of icache coherence
* Use `allow` for missing_docs
Work around rust-lang/rust#130021 which was fixed in Rust 1.83 and isn't fixed for our MSRV at this time.
* More MSRV compat
|
|
Revision tags: v27.0.0, v26.0.1, v25.0.3, v24.0.2, v26.0.0 |
|
| #
3036e795 |
| 14-Oct-2024 |
beetrees <[email protected]> |
Add I128 atomic support to the `x64` backend (#9459)
* Add I128 atomic support to the `x64` backend
* fix typo in cranelift/codegen/src/isa/x64/inst/emit.rs
---------
Co-authored-by: Nick Fitzgerald <[email protected]>
|
|
Revision tags: v21.0.2, v22.0.1, v23.0.3, v25.0.2, v24.0.1, v25.0.1, v25.0.0 |
|
| #
ff987608 |
| 06-Sep-2024 |
Alex Crichton <[email protected]> |
Remove `iadd_cin` and `isub_bin`, split `isub_borrow` and `iadd_carry` (#9199)
* Remove `iadd_cin` and `isub_bin`, split `isub_borrow` and `iadd_carry`
This commit refactors the opcodes that Cranelift supports for add-with-carry and subtract-with-borrow. None of these opcodes are currently in use by the wasm frontend nor supported by any backend. In that sense it's unlikely they have many users and the hope is that refactoring won't cause much impact.
The `iadd_cin` and `isub_bin` opcodes are the equivalent of `*_borrow` and `*_carry` except that they don't return the carry flag, they only return the result of the operation. While theoretically useful I've elected to remove them here in favor of only the borrow-returning operations. They can be added back in in the future though if necessary.
I've split the preexisting operations `isub_borrow` and `iadd_carry` additionally into signed/unsigned portions:
* `isub_borrow` => `usub_borrow` and `ssub_borrow`
* `iadd_carry` => `uadd_carry` and `sadd_carry`
This reflects how the condition needs to differ on the carry flag computation for signed/unsigned inputs. I've additionally fixed the interpreter's implementation of `IsubBorrow` when switching to the signed/unsigned opcodes.
Finally, the documentation for these instructions now explicitly says that the incoming carry/borrow is zero-or-nonzero even though it's typed as `i8`. Additionally, the tests have been refactored to make use of multi-return, which may not have existed when they were first written.
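The difference between the unsigned and signed flag computations can be illustrated on 8-bit values. This is a sketch of the general idea only: the function names are invented here, and the exact CLIF semantics should be taken from the instruction documentation rather than from this example.

```rust
/// Unsigned add with carry-in: the flag is set when the true sum does not
/// fit in `u8`. The carry-in is treated as zero-or-nonzero.
pub fn uadd_with_carry(a: u8, b: u8, carry_in: u8) -> (u8, bool) {
    let cin = (carry_in != 0) as u16;
    let wide = a as u16 + b as u16 + cin;
    (wide as u8, wide > u8::MAX as u16)
}

/// Signed add with carry-in: the flag is set when the true sum falls
/// outside the `i8` range. The carry-in is treated as zero-or-nonzero.
pub fn sadd_with_overflow(a: i8, b: i8, carry_in: u8) -> (i8, bool) {
    let cin = (carry_in != 0) as i16;
    let wide = a as i16 + b as i16 + cin;
    (wide as i8, wide < i8::MIN as i16 || wide > i8::MAX as i16)
}
```

The same bit patterns give different flags: `0xFF + 0 + 1` carries out as unsigned, while `-1 + 0 + 1` does not overflow as signed, so a single untyped `iadd_carry` could not describe both.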
* Rename instructions
* Fix more renames
* Update instruction descriptions
|
| #
b81ef46c |
| 22-Aug-2024 |
Nick Fitzgerald <[email protected]> |
Remove reference types (`r32` and `r64`) from Cranelift (#9164)
* Remove reference types (`r32` and `r64`) from Cranelift
* restore fuzz regression test
|
| #
dbc11c30 |
| 21-Aug-2024 |
Frank Emrich <[email protected]> |
Cranelift: add stack_switch CLIF instruction (#9078)
* stack_switch instruction
* Update cranelift/codegen/src/isa/x64/pcc.rs
Co-authored-by: Nick Fitzgerald <[email protected]>
* cargo fmt
* only lower on linux
* give stack_switch the call() side effect
* add function in filetest doing switching only
* Extend documentation of new instruction
* better comments on how we handle the payloads
* Revert "only lower on linux"
This reverts commit 2af10f944186629de1615aa0ed999b7f49d13132.
* Add StackSwitchModel, use to compile stack_switch
* turn stack_switch_model into partial constructor
---------
Co-authored-by: Nick Fitzgerald <[email protected]>
|
|
Revision tags: v24.0.0, v23.0.2, v23.0.1, v23.0.0 |
|
| #
41eca60b |
| 17-Jul-2024 |
beetrees <[email protected]> |
cranelift: Add `f16const` and `f128const` instructions (#8893)
* cranelift: Add `f16const` and `f128const` instructions
* cranelift: Add constant propagation for `f16` and `f128`
|
|
Revision tags: v22.0.0 |
|
| #
9ffc9e67 |
| 14-Jun-2024 |
Nick Fitzgerald <[email protected]> |
Cranelift: Remove resumable traps (#8809)
These were originally a SpiderMonkey-ism and have been unused ever since. They were introduced for GC integration, where the runtime could do something to make Cranelift code hit a trap and pause for a GC and then resume execution once GC completed. But it is unclear that, as implemented, this is actually a useful mechanism for doing that (compared to, say, loading from some Well Known page and the GC protecting that page and catching signals to interrupt the mutator, or simply branching and doing a libcall). And if someone has that particular use case in the future (Wasmtime and its GC integration don't need exactly this) then we can design something for what is actually needed at that time, instead of carrying this cruft forward forever.
|
|
Revision tags: v21.0.1, v21.0.0, v20.0.2, v20.0.1, v20.0.0, v17.0.3, v19.0.2, v18.0.4, v19.0.1 |
|
| #
f59b3246 |
| 20-Mar-2024 |
Jamey Sharp <[email protected]> |
cranelift: Optimize select_spectre_guard, carefully (#8139)
* cranelift: Optimize select_spectre_guard, carefully
This commit makes two changes to our treatment of `select_spectre_guard`.
First, stop annotating this instruction as having any side effects. We only care that if its value result is used, then it's computed without branching on the condition input. We don't otherwise care when the value is computed, or if it's computed at all.
Second, introduce some carefully selected ISLE egraph rewrites for this instruction. These particular rewrites are those where we can statically determine which SSA value will be the result of the instruction. Since there is no actual choice involved, there's no way to accidentally introduce speculation on the condition input.
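As a hedged illustration of what "statically determine which SSA value will be the result" could mean, the sketch below handles two cases: a constant condition and identical arms. These cases are assumptions chosen for the example; the actual ISLE rules in the commit are not reproduced here, and `Cond`, `ValueId`, and `simplify_guarded_select` are hypothetical names.

```rust
/// Hypothetical view of the condition operand: either a known constant
/// or a value only known at run time.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Cond {
    Const(u64),
    Dynamic,
}

/// Hypothetical SSA value identifier.
pub type ValueId = u32;

/// Returns the value the guarded select must produce when that can be
/// decided at compile time, or `None` when the select has to be kept
/// (and lowered without branching on the condition).
pub fn simplify_guarded_select(cond: Cond, x: ValueId, y: ValueId) -> Option<ValueId> {
    match cond {
        Cond::Const(0) => Some(y), // condition is statically false
        Cond::Const(_) => Some(x), // condition is statically true
        Cond::Dynamic if x == y => Some(x), // both arms are the same value
        Cond::Dynamic => None,
    }
}
```

In none of the simplified cases does the result depend on a run-time condition, so no speculation on the condition can be introduced.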
* Add filetests
|
|
Revision tags: v19.0.0 |
|
| #
c4478334 |
| 14-Mar-2024 |
Jamey Sharp <[email protected]> |
cranelift: Remove support for WebAssembly tables (#8124)
Wasmtime no longer needs any of this infrastructure and neither should anybody else.
This diff is nearly identical to @bjorn3's version of the same change, except I didn't remove Uimm64, which has started being used in other places. I forgot bjorn3 had already tackled this part until after I was already done, but it's reassuring that we both made the same changes.
https://github.com/bjorn3/wasmtime/commit/fb82ccb3948e949641a6d9581aa84472f68f97b8
Fixes #5532
|
|
Revision tags: v18.0.3, v18.0.2, v17.0.2, v18.0.1, v18.0.0, v17.0.1, v17.0.0, v16.0.0, v15.0.1, v15.0.0, v14.0.4, v14.0.3, v14.0.2, v13.0.1, v14.0.1, v14.0.0, minimum-viable-wasi-proxy-serve, v13.0.0, v12.0.2, v11.0.2, v10.0.2 |
|
| #
d8db07fa |
| 09-Sep-2023 |
Afonso Bordado <[email protected]> |
cranelift: Fix `v{all,any}_true` and `vhigh_bits` instructions in the interpreter (#6985)
* cranelift: Implement `vall_true` for floats in the interpreter
* cranelift: Implement `vany_true` for floats in the interpreter
* cranelift: Implement `vhigh_bits` for floats in the interpreter
* cranelift: Forbid vector return types for `vhigh_bits`
This instruction doesn't really make sense with a vector return type. The description also states that it returns a scalar integer so I suspect it wasn't intended to allow vector integers.
* fuzzgen: Enable `v{all,any}_true` and `vhigh_bits`
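The scalar result of `vhigh_bits` can be pictured with the following sketch, which assumes the usual definition: bit `i` of the result is the most significant bit of lane `i`. The two functions are illustrative helpers for fixed lane shapes, not the interpreter's code.

```rust
/// High bits of an i8x16 vector packed into a scalar: bit `i` of the
/// result is the sign bit of lane `i`.
pub fn vhigh_bits_i8x16(lanes: [i8; 16]) -> u16 {
    let mut bits = 0u16;
    for (i, lane) in lanes.iter().enumerate() {
        if *lane < 0 {
            bits |= 1u16 << i;
        }
    }
    bits
}

/// The same idea for float lanes: the high bit of an `f32` is its sign
/// bit, read from the raw bit pattern so that `-0.0` counts as set.
pub fn vhigh_bits_f32x4(lanes: [f32; 4]) -> u8 {
    let mut bits = 0u8;
    for (i, lane) in lanes.iter().enumerate() {
        bits |= ((lane.to_bits() >> 31) as u8) << i;
    }
    bits
}
```

Since the result is a packed bit mask, only a scalar integer return type is meaningful, which is the reasoning behind forbidding vector return types.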
|
| #
62fdafa1 |
| 29-Aug-2023 |
Alex Crichton <[email protected]> |
Remove clippy configuration from repo and crates (#6927)
Wasmtime's CI does not run clippy so there's no enforcement of this configuration. Additionally the configuration per-crate is not uniformly applied across all of the Wasmtime workspace and is only on some historical crates. Because we don't run clippy in CI this commit removes all of the clippy annotations for allow/warn/deny from the source.
|
|
Revision tags: v12.0.1 |
|
| #
4fc053b5 |
| 21-Aug-2023 |
Alex Crichton <[email protected]> |
cranelift: Remove `f{min,max}_pseudo` instructions (#6874)
This commit removes these two instructions and replaces them instead with their equivalents using `fcmp` plus `select` or `bitselect` depending on the type (`bitselect` for vectors, `select` for scalars). The motivation for this commit is that incorrect optimizations for these instructions were removed in #6859 and likely stemmed from the surprising definitions of these instructions. These originally were intended to correspond to operations in the SIMD proposal for WebAssembly but nowadays the functionality of these instructions is replaced with:
* Lowering from wasm to clif uses the `fcmp` plus `select` combo instruction.
* Backends that support optimizing this pattern use ISLE patterns to match the instruction and emit the specialization for the pseudo semantics.
This means that while the instructions are removed here it should be the case that no functionality is lost and the output of Wasmtime/Cranelift should still be the same as it was before. Existing tests using the pseudo instructions were preserved except the riscv64 ones (where the lowering was deleted) and the dynamic AArch64 ones. Both s390x and x64 continue to have specialized patterns for this compare-plus-select.
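The compare-plus-select form can be sketched on scalars. The definitions below assume the WebAssembly SIMD pseudo-min/max semantics (`pmin(a, b) = b < a ? b : a` and `pmax(a, b) = a < b ? b : a`); the function names are chosen for this example.

```rust
/// Pseudo-minimum as a compare plus select: pick `b` only when `b < a`.
/// Any comparison involving NaN is false, so `a` is returned in that case.
pub fn pmin(a: f32, b: f32) -> f32 {
    if b < a { b } else { a }
}

/// Pseudo-maximum as a compare plus select: pick `b` only when `a < b`.
pub fn pmax(a: f32, b: f32) -> f32 {
    if a < b { b } else { a }
}
```

Unlike an IEEE-style minimum, the result is asymmetric in its operands when a NaN is present, which is the kind of surprising behavior the commit message alludes to.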
|
|
Revision tags: v12.0.0, v11.0.1, v11.0.0, v10.0.1, v10.0.0 |
|
| #
7f108b1e |
| 13-Jun-2023 |
Alex Crichton <[email protected]> |
cranelift: Remove the `fcvt_low_from_sint` instruction (#6565)
* cranelift: Remove the `fcvt_low_from_sint` instruction
This commit removes this instruction since it's a combination of `swiden_low` plus `fcvt_from_sint`. This was used by the WebAssembly `f64x2.convert_low_i32x4_s` instruction previously but the corresponding unsigned variant of the instruction, `f64x2.convert_low_i32x4_u`, used a `uwiden_low` plus `fcvt_from_uint` combo. To help simplify Cranelift's instruction set and to make these two instructions mirrors of each other the Cranelift instruction is removed.
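The two-step combination can be modelled on plain arrays. This sketch assumes that the "low" half means lanes 0 and 1 of a four-lane vector; the Rust functions borrow the instruction names for readability but are only a lane-wise model, not Cranelift code.

```rust
/// Sign-extend the two low lanes of an i32x4 to an i64x2.
pub fn swiden_low(v: [i32; 4]) -> [i64; 2] {
    [v[0] as i64, v[1] as i64]
}

/// Convert each signed integer lane to a float lane.
pub fn fcvt_from_sint(v: [i64; 2]) -> [f64; 2] {
    [v[0] as f64, v[1] as f64]
}

/// Zero-extend the two low lanes of a u32x4 to a u64x2.
pub fn uwiden_low(v: [u32; 4]) -> [u64; 2] {
    [v[0] as u64, v[1] as u64]
}

/// Convert each unsigned integer lane to a float lane.
pub fn fcvt_from_uint(v: [u64; 2]) -> [f64; 2] {
    [v[0] as f64, v[1] as f64]
}

/// Signed low conversion expressed as widen-then-convert.
pub fn convert_low_i32x4_s(v: [i32; 4]) -> [f64; 2] {
    fcvt_from_sint(swiden_low(v))
}

/// Unsigned low conversion expressed the same way, mirroring the signed one.
pub fn convert_low_i32x4_u(v: [u32; 4]) -> [f64; 2] {
    fcvt_from_uint(uwiden_low(v))
}
```

Every 32-bit integer is exactly representable as an `f64`, so splitting the operation into a widen and a convert loses no precision.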
The s390x and AArch64 backend lowering rules for this instruction could simply be deleted as the previous combination of the `swiden_low` and `fcvt_from_sint` lowering rules produces the same code. The x64 backend moved its lowering to a special case of the `fcvt_from_sint` lowering.
* Fix cranelift-fuzzgen build
|
|
Revision tags: v9.0.4, v9.0.3, v9.0.2, v9.0.1, v9.0.0 |
|
| #
913efdf2 |
| 27-Apr-2023 |
Nick Fitzgerald <[email protected]> |
wasmtime: Overhaul trampolines (#6262)
This commit splits `VMCallerCheckedFuncRef::func_ptr` into three new function pointers: `VMCallerCheckedFuncRef::{wasm,array,native}_call`. Each one has a dedicated calling convention, so callers just choose the version that works for them. This is as opposed to the previous behavior where we would chain together many trampolines that converted between calling conventions, sometimes up to four on the way into Wasm and four more on the way back out. See [0] for details.
[0] https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#a-review-of-our-existing-trampolines-calling-conventions-and-call-paths
Thanks to @bjorn3 for the initial idea of having multiple function pointers for different calling conventions.
This is generally a nice ~5-10% speed up to our call benchmarks across the board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from Wasm to the host, which have a minor regression. We hypothesize that this is because the old hand-written assembly trampolines did not maintain a call frame and did a tail call, but the new Cranelift-generated trampolines do maintain a call frame and do a regular call. The regression is only a couple of nanoseconds, which seems well explained by these differences, and is ultimately not a big deal.
However, this does lead to a ~5% code size regression for compiled modules. Before, we compiled a trampoline per escaping function's signature and we deduplicated these trampolines by signature. Now we compile two trampolines per escaping function: one for when the host calls via the array calling convention and one for when the host calls via the native calling convention. Additionally, we compile a trampoline for every type in the module, in case there is a native calling convention function from the host that we `call_indirect` of that type. Much of this is in the `.eh_frame` section in the compiled module, because each of our trampolines needs an entry there. Note that the `.eh_frame` section is not required for Wasmtime's correctness, and you can disable its generation to shrink compiled module code size; we just emit it to play nice with external unwinders and profilers. We believe there are code size gains available for follow up work to offset this code size regression in the future.
Backing up a bit: the reason each Wasm module needs to provide these Wasm-to-native trampolines is because `wasmtime::Func::wrap` and friends allow embedders to create functions even when there is no compiler available, so they cannot bring their own trampoline. Instead the Wasm module has to supply it. This in turn means that we need to look up and patch in these Wasm-to-native trampolines during roughly instantiation time. But instantiation is super hot, and we don't want to add more passes over imports or any extra work on this path. So we integrate with `wasmtime::InstancePre` to patch these trampolines in ahead of time.
Co-Authored-By: Jamey Sharp <[email protected]> Co-Authored-By: Alex Crichton <[email protected]>
prtest:full
|