History log of /wasmtime-44.0.1/cranelift/codegen/meta/src/shared/instructions.rs (Results 1 – 25 of 192)
Revision Date Author Comments
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7, v43.0.0
# 58633f35 02-Mar-2026 tsudzuki <[email protected]>

Fix doc typo, the behavior is actually selecting 'x' if the bit in 'c' is 1 (#12695)
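The corrected wording describes a bitwise select. Where a bit of `c` is 1, the result takes that bit from `x`. Where it is 0, the result takes it from `y`. The sketch below models this on `u64` and assumes the documented instruction is CLIF's `bitselect c, x, y`. The function is illustrative only and is not Cranelift code:

```rust
/// Illustrative model of a bitwise select: for each bit position,
/// take the bit from `x` where `c` has a 1, and from `y` where `c` has a 0.
fn bitselect(c: u64, x: u64, y: u64) -> u64 {
    (c & x) | (!c & y)
}
```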


Revision tags: v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6, v41.0.3
# 7ac4b818 03-Feb-2026 Jimmy Brisson <[email protected]>

S390x: emit new instructions added in z17 (#12319)

* s390x: Emit instructions from MIE4 & VXRS_EXT3 on z17

This emits & tests a bunch of instructions:
* from Miscellaneous-Instruction-Extensions Facility 4:
* CLZ, 64bit
* CTZ, 64bit
* from Vector-Enhancements Facility 3:
* 32x4, 64x2 & 128x1 variants of the following:
* Divide
* Remainder
* 64x2 & 128x1 multiply variants
* 128x1 variants of:
* Compare
* CLZ
* CTZ
* Max
* Min
* Average
* Negation
* Evaluate

Co-authored-by: Jimmy Brisson <[email protected]>

* s390x: Emit vector blend on z17

* Rename x86_blendv to blendv

Now that s390x implements blendv as well, we should refer to
the instruction without the x86 prefix.
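A `blendv`-style select works per lane rather than per bit: the top bit of each mask lane chooses the whole lane. The sketch below uses four `i32` lanes. It assumes x86 `BLENDVPS`-like semantics, taking the lane from `x` when the mask lane's sign bit is set and from `y` otherwise. The function name and lane layout are illustrative and are not taken from the Cranelift source:

```rust
/// Illustrative per-lane blend (assumed x86 blendv semantics): if the most
/// significant bit of a mask lane is set, take that lane from `x`, else from `y`.
fn blendv_i32x4(mask: [i32; 4], x: [i32; 4], y: [i32; 4]) -> [i32; 4] {
    // A negative i32 is exactly one whose top (sign) bit is set.
    std::array::from_fn(|i| if mask[i] < 0 { x[i] } else { y[i] })
}
```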

---------

Co-authored-by: Ulrich Weigand <[email protected]>


Revision tags: v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1, v40.0.0
# 87ed3b60 15-Dec-2025 Chris Fallin <[email protected]>

Cranelift: make all non-tail, non-indirect calls patchable, and rename patchable ABI to `preserve_all`. (#12160)

* Cranelift: make all non-tail, non-indirect calls patchable, and rename patchable ABI to `preserve_all`.

As discussed in this week's Cranelift meeting, we've discovered a need
to generalize the `patchable_call` mechanism and corresponding
`patchable` ABI slightly. In particular, we will need patchable
`try_call` callsites as well in order to allow breakpoint handlers to
throw exceptions (desirable functionality eventually) and have this work
in the presence of inlining. Also, it's just a nice generalization to
say that patchability is an orthogonal dimension to the call ABI and the
other restrictions we initially imposed, and works as long as the basic
requirement (no return values) is met.

This also renames the `patchable` ABI to `preserve_all`, to make it
clear that its purpose is actually orthogonal, and it can be used
independently of patchable callsites. It also deletes the `cold` ABI,
which never actually did anything and is misleading in the presence of
an actual cold-ish (subzero temperature, actually) ABI like
`preserve_all`.

* Review feedback.


# c00e9ea2 02-Dec-2025 Chris Fallin <[email protected]>

Cranelift: add patchable call instructions. (#12101)

* Cranelift: add patchable call instructions.

The new `patchable_call` CLIF instruction pairs with the `patchable`
ABI, and emits a callsite with one new key property: the MachBuffer
carries metadata that describes exactly which byte range to "NOP out"
(overwrite with NOP instructions) to disable that callsite. Doing so is
semantically valid and explicitly supported.

This enables patching of code at runtime to dynamically turn on and off
features such as instrumentation or debugging hooks. We plan to use this
to implement breakpoints in Wasmtime's guest debugging support.

As part of this change, I added a notion of "unit of NOP bytes" to the
MachBuffer so that the consumer (e.g., Wasmtime's Cranelift-based code
compilation pipeline and metadata-producing logic) can handle patchable
callsites without any other special knowledge of the ISA.
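A consumer that knows only the patch range and the NOP unit can disable a callsite without any ISA-specific logic. The sketch below assumes the metadata is a byte offset plus a length, and that the length is a whole multiple of the NOP unit. `PatchableCallSite` and `nop_out` are hypothetical names and are not the MachBuffer API:

```rust
/// Hypothetical description of a patchable callsite's byte range.
struct PatchableCallSite {
    offset: usize,
    len: usize,
}

/// Hypothetical helper: overwrite the callsite's bytes with repeated copies
/// of `nop_unit`. It refuses to write a partial NOP or to go out of bounds.
fn nop_out(code: &mut [u8], site: &PatchableCallSite, nop_unit: &[u8]) -> Result<(), String> {
    if nop_unit.is_empty() || site.len % nop_unit.len() != 0 {
        return Err("range is not a whole number of NOP units".to_string());
    }
    let end = site.offset.checked_add(site.len).ok_or("range overflows")?;
    let range = code.get_mut(site.offset..end).ok_or("range out of bounds")?;
    // Each chunk is exactly one NOP unit because the length check passed.
    for chunk in range.chunks_exact_mut(nop_unit.len()) {
        chunk.copy_from_slice(nop_unit);
    }
    Ok(())
}
```

With a one-byte unit such as x86-64's `0x90`, any length works. With a multi-byte unit, the length check rejects ranges that would leave a truncated instruction behind.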

For the "real metal" ISAs there are perfectly well-defined NOPs to use,
but for Pulley, where all opcodes are assigned at compile time by macro
magic, I explicitly defined NOP as opcode byte 0 by moving `Nop`'s
definition to the top of the list and adding a unit test asserting its
encoding.

A design note: in principle it would be possible, as an alternative, to
treat "patchability" as an orthogonal dimension of all callsites, and
emit the metadata describing the instruction-offset range for any
callsite with the flag set. The only truly necessary semantic
restriction is that there are no return values (because if we turn the
callsite off, nothing writes to them); we could support patchability for
other ABIs and for the other kinds of call instructions. The `patchable`
ABI would then be better described as something like the "no clobbers
ABI". I opted not to generalize in this way because it creates some
less-tested corners and the generalized form, at least at the MachInst
level, is not really much simpler in the end.

A testing note: I opted not to implement actual code patching in the
`cranelift-tools` filetest runner and test patching callsites in/out via
some actuation (e.g. a magic hostcall, like we do for throws) because
(i) that's a lot of new plumbing and (ii) we are going to test this very
shortly in Wasmtime anyway and (iii) the correctness (or not) of the
location-and-length metadata is easy enough to verify in the
disassemblies in the compile-tests.

* Review feedback: remove dependence on (and test for) NOP being the literal byte 0.


Revision tags: v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5, v38.0.3, v38.0.2, v38.0.1, v37.0.2
# a3d6e407 06-Oct-2025 Chris Fallin <[email protected]>

Cranelift: add debug tag infrastructure. (#11768)

* Cranelift: add debug tag infrastructure.

This PR adds *debug tags*, a kind of metadata that can attach to CLIF
instructions and be lowered to VCode instructions and as metadata on
the produced compiled code. It also adds opaque descriptor blobs
carried with stackslots. Together, these two features allow decorating
IR with first-class debug instrumentation that is properly preserved
by the compiler, including across optimizations and
inlining. (Wasmtime's use of these features will come in followup
PRs.)

The key idea of a "debug tag" is to allow the Cranelift embedder to
express whatever information it needs to, in a format that is opaque
to Cranelift itself, except for the parts that need translation during
lowering. In particular, the `DebugTag::StackSlot` variant gets
translated to a physical offset into the stackframe in the compiled
metadata output. So, for example, the embedder can emit a tag
referring to a stackslot, and another describing an offset in that
stackslot.

The debug tags exist as a *sequence* on any given instruction; the
meaning of the sequence is known only to the embedder, *except* that
during inlining, the tags for the inlining call instruction are
prepended to the tags of inlined instructions. In this way, a
canonical use-case of tags as describing original source-language
frames can preserve the source-language view even when multiple
functions are inlined into one.
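The inlining rule amounts to sequence concatenation. In the sketch below, a hypothetical `Tag` type stands in for embedder-defined debug tags. It is not Cranelift's `DebugTag`. Nested inlining then yields the outermost frame first:

```rust
/// Hypothetical stand-in for an embedder-defined debug tag,
/// e.g. an identifier for a source-language frame.
#[derive(Clone, Debug, PartialEq)]
enum Tag {
    Frame(u32),
}

/// When a callee instruction is inlined at a call instruction, the call's
/// tags are prepended to the inlined instruction's own tags.
fn inline_tags(call_site_tags: &[Tag], inlined_tags: &[Tag]) -> Vec<Tag> {
    let mut out = Vec::with_capacity(call_site_tags.len() + inlined_tags.len());
    out.extend_from_slice(call_site_tags);
    out.extend_from_slice(inlined_tags);
    out
}
```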

The descriptor on a stackslot may look a little odd at first, but its
purpose is to allow serializing some description of
stackslot-contained runtime user-program data, in a way that is firmly
attached to the stackslot. In particular, in the face of inlining,
this descriptor is copied into the inlining (parent) function from the
inlined function when the stackslot entity is copied; no other
metadata outside Cranelift needs to track the identity of stackslots
and know about that motion. This fits nicely with the ability of tags
to refer to stackslots; together, the embedder can annotate
instructions as having certain state in stackslots, and describe the
format of that state per stackslot.

This infrastructure is tested with some compile-tests now;
testing of the interpretation of the metadata output will come with
end-to-end debug instrumentation tests in a followup PR.

* Review feedback: add back sequence points and enforce tags only on sequence points or calls.

* Use Vecs for debug metadata in MachBuffer to avoid SmallVec size penalty in not-used case.

* Review feedback: switch from inlined stackslot descriptor blobs to u64 keys.


Revision tags: v37.0.1, v37.0.0
# 4c01ee2f 05-Sep-2025 Chris Fallin <[email protected]>

Cranelift: add get_exception_handler_address. (#11629)

* Cranelift: add get_exception_handler_address.

This is designed to enable applications such as #11592 that use
alternative unwinding mechanisms that may not necessarily want to walk a
stack and look up exception tables. The idea is that whenever it would
be valid to resume to an exception handler that is active on the stack,
we can provide the same PC as a first-class runtime value that would be
found in the exception table for the given handler edge. A "custom"
resume step can then use this PC as a resume-point as long as it follows
the relevant exception ABI (i.e.: restore SP, FP, any other saved
registers that the exception ABI specifies, and provide appropriate
payload value(s)).

Handlers are associated with edges out of `try_call`s (or
`try_call_indirect`s); and edges specifically, not blocks, because there
could be multiple out-edges to one block. The instruction thus takes the
block that contains the try-call and an immediate that indexes its
exceptional edges.
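Indexing by edge matters because two exceptional edges can target the same block, so a block alone cannot name a handler. The sketch below uses hypothetical types (`Block`, `Handler`, `TryCallSketch`) to illustrate the point. It is not how Cranelift stores try-call handlers:

```rust
/// Hypothetical block identifier.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Block(u32);

/// Hypothetical exceptional edge: an optional tag plus its target block.
struct Handler {
    tag: Option<u32>,
    target: Block,
}

/// Hypothetical try-call holding an ordered list of exceptional edges.
struct TryCallSketch {
    handlers: Vec<Handler>,
}

impl TryCallSketch {
    /// Look up an exceptional edge by its immediate index.
    fn edge(&self, index: usize) -> Option<&Handler> {
        self.handlers.get(index)
    }

    /// Count the edges that reach `block`. More than one means the block
    /// by itself is ambiguous as a handler identifier.
    fn edges_to(&self, block: Block) -> usize {
        self.handlers.iter().filter(|h| h.target == block).count()
    }
}
```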

This CLIF instruction required a bit of infrastructure to (i) allow
naming raw blocks, not just block calls, as instruction arguments, and
(ii) allow getting the MachLabel for any other lowered block during
lowering. But given that, the lowerings themselves are straightforward
uses of MachBuffer labels to fix-up PC-relative address-loading
instructions (e.g., `LEA` or `ADR` or `AUIPC`+`ADDI`).

* Review feedback.

* Review feedback: more tests.


Revision tags: v36.0.2, v36.0.1, v36.0.0, v35.0.0, v24.0.4, v33.0.2, v34.0.2
# 8a23cc74 09-Jul-2025 Nick Fitzgerald <[email protected]>

Cranelift: Make `ir::{Constant,Immediate}` considered entities (#11207)

* Cranelift: Make `ir::{Constant,Immediate}` considered entities

They reference data in out-of-line pools rather than storing their data inline
in the instruction, and when an instruction containing them is moved from one
`ir::Function` to another, they need their indices updated
accordingly. Therefore, they really are entities rather than immediates.

This recategorization means that they will now be properly mapped in
`ir::InstructionData::map` calls.
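A pool index is meaningful only relative to the pool of the function that owns it, which is why these indices need remapping. The sketch below uses hypothetical `Pool` and `ConstRef` types. It assumes that moving an operand copies the referenced bytes into the destination pool and rewrites the index:

```rust
/// Hypothetical index into a per-function constant pool.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ConstRef(usize);

/// Hypothetical out-of-line pool owned by one function.
#[derive(Default)]
struct Pool {
    data: Vec<Vec<u8>>,
}

impl Pool {
    fn insert(&mut self, bytes: Vec<u8>) -> ConstRef {
        self.data.push(bytes);
        ConstRef(self.data.len() - 1)
    }

    fn get(&self, r: ConstRef) -> &[u8] {
        &self.data[r.0]
    }
}

/// Copying the raw index into `dst` would point at unrelated data (or
/// nothing), so map it to a fresh entry holding the same bytes.
fn map_constant(r: ConstRef, src: &Pool, dst: &mut Pool) -> ConstRef {
    dst.insert(src.get(r).to_vec())
}
```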

* fix tests


Revision tags: v34.0.1, v33.0.1, v24.0.3, v32.0.1, v34.0.0, v33.0.0
# 90ac295e 19-May-2025 Alex Crichton <[email protected]>

Update Wasmtime to the 2024 Rust Edition (#10806)

* Update Wasmtime to the 2024 Rust Edition

Now that our MSRV supports the 2024 edition it's possible to make this
switch. This commit moves Wasmtime to the 2024 Edition to keep
up-to-date with Rust idioms and access many of the edition features
exclusive to the 2024 edition.

prtest:full

* Reformat with the 2024 edition


Revision tags: v32.0.0
# 94ec88ea 08-Apr-2025 Chris Fallin <[email protected]>

Cranelift: initial try_call / try_call_indirect (exception) support. (#10510)

* Cranelift: initial try_call / try_call_indirect (exception) support.

This PR adds `try_call` and `try_call_indirect` instructions, and
lowerings on four of five ISAs (x86-64, aarch64, riscv64, pulley; s390x
has its own non-shared ABI code that will need separate work).

It extends CLIF to support these instructions as new kinds of branches,
and extends block-calls to accept `retN` and `exnN` block-call args that
carry the normal return values or exception payloads (respectively) into
the appropriate successor blocks.
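A block-call argument list can mix ordinary values with `retN`/`exnN` placeholders, which are resolved per edge. The sketch below shows that resolution with a hypothetical `Arg` enum, modeling SSA values as plain numbers. It does not necessarily match Cranelift's own representation:

```rust
/// Hypothetical block-call argument: an ordinary value, the Nth normal
/// return value, or the Nth exception payload.
#[derive(Clone, Copy, Debug)]
enum Arg {
    Value(u64),
    Ret(usize),
    Exn(usize),
}

/// Resolve arguments for one successor edge. `rets` exists only on the
/// normal-return edge and `exns` only on an exceptional edge; asking for
/// a placeholder that the edge does not provide yields `None`.
fn resolve(args: &[Arg], rets: Option<&[u64]>, exns: Option<&[u64]>) -> Option<Vec<u64>> {
    args.iter()
        .map(|a| -> Option<u64> {
            match *a {
                Arg::Value(v) => Some(v),
                Arg::Ret(n) => rets?.get(n).copied(),
                Arg::Exn(n) => exns?.get(n).copied(),
            }
        })
        .collect()
}
```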

It wires up the "normal return path" so that it continues to work.
It updates the ABI so that unwinding is possible without an initial
register state at throw: specifically, as per our RFC, all registers are
clobbered. It also includes metadata in the `MachBuffer` that describes
exception-catch destinations. However, no unwinder exists to interpret
these catch-destinations yet, so they are untested.

* Add try_call_indirect lowering as well.


Revision tags: v31.0.0, v30.0.2, v30.0.1, v30.0.0, v29.0.1, v29.0.0, v28.0.1
# a88eb702 14-Jan-2025 Nick Fitzgerald <[email protected]>

Cranelift: dedupe `trap[n]z` instructions (#10004)

* Cranelift: dedupe `trap[n]z` instructions

This commit extends our existing support for merging idempotently side-effectful
instructions that produce exactly one value to those that produce zero or one
value, and marks the `trap[n]z` instructions as having idempotent side
effects. This cleans up a lot of test cases in our `disas` test suite, particularly
those related to explicit bounds checks and GC.
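Idempotence is what makes the merge sound. If a `trapz` on some value did not trap, an identical later `trapz` on the same value cannot trap either. The sketch below deduplicates within one straight-line block, where every earlier instruction dominates every later one. The `Inst` type and the opcode whitelist are hypothetical, and trap codes are omitted:

```rust
use std::collections::HashSet;

/// Hypothetical instruction: opcode name plus operand value numbers.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Inst {
    opcode: &'static str,
    args: Vec<u32>,
}

/// Hypothetical whitelist of opcodes with idempotent side effects.
fn is_idempotent(inst: &Inst) -> bool {
    matches!(inst.opcode, "trapz" | "trapnz")
}

/// Keep non-idempotent instructions unconditionally; keep an idempotent
/// one only the first time an identical instance appears.
fn dedupe(block: &[Inst]) -> Vec<Inst> {
    let mut seen = HashSet::new();
    let mut kept = Vec::new();
    for inst in block {
        if !is_idempotent(inst) || seen.insert(inst.clone()) {
            kept.push(inst.clone());
        }
    }
    kept
}
```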

As an aside, it seems like it should be easy to extend this to idempotently
side-effectful instructions that produce multiple values as well, but I don't
believe we have any such instructions, so I didn't bother.

* Update more disas tests

* review feedback


# 2d1c0abd 29-Dec-2024 Julian Eager <[email protected]>

pulley: Implement vector sqmul_round_sat (#9911)

* pulley: Implement vector sqmul_round_sat

* parenthesize to bring out op. order
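For reference, here is a scalar model of one lane. It assumes the Q15 semantics of WebAssembly's `i16x8.q15mulr_sat_s`: multiply, add the rounding constant `1 << 14`, arithmetic-shift right by 15, then saturate. The function name is illustrative and this is not Pulley's implementation:

```rust
/// One i16 lane of a rounding, saturating Q15 multiply (assumed semantics).
fn sqmul_round_sat_i16(a: i16, b: i16) -> i16 {
    // Widen so the product cannot overflow, then round and shift.
    let product = (a as i32) * (b as i32);
    let rounded = (product + (1 << 14)) >> 15;
    // Only i16::MIN * i16::MIN leaves the i16 range; saturate it.
    rounded.clamp(i16::MIN as i32, i16::MAX as i32) as i16
}
```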


Revision tags: v28.0.0
# 45b60bd6 02-Dec-2024 Alex Crichton <[email protected]>

Start using `#[expect]` instead of `#[allow]` (#9696)

* Start using `#[expect]` instead of `#[allow]`

In Rust 1.81, our new MSRV, a new feature was added to Rust to use
`#[expect]` to control lint levels. This new lint annotation will
silence a lint but will itself cause a lint if it doesn't actually
silence anything. This is quite useful to ensure that annotations don't
get stale over time.

Another feature is the ability to use a `reason` directive on the
attribute with a string explaining why the attribute is there. This
string is then rendered in compiler messages if a warning or error
happens.

This commit applies a few changes across the workspace:

* Some `#[allow]` are changed to `#[expect]` with a `reason`.
* Some `#[allow]` have a `reason` added if the lint conditionally fires
(mostly related to macros).
* Some `#[allow]` are removed since the lint doesn't actually fire.
* The workspace configures `clippy::allow_attributes_without_reason = 'warn'`
as a "ratchet" to prevent future regressions.
* Many crates are annotated to allow `allow_attributes_without_reason`
during this transitionary period.

The end-state is that all crates should use
`#[expect(..., reason = "...")]` for any lint that unconditionally fires
but is expected. The `#[allow(..., reason = "...")]` lint should be used
for conditionally firing lints, primarily in macro-related code.
The `allow_attributes_without_reason = 'warn'` level is intended to be
permanent, but the transitionary
`#[expect(clippy::allow_attributes_without_reason)]` crate annotations
are intended to go away over time.
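Here is a short illustration of the two forms described above. It assumes Rust 1.81 or newer, where `#[expect]` and `reason = "..."` are stable. `old_helper` and `check_len` are made-up names. The first lint always fires, so `expect` fits it. The second fires only when debug assertions are disabled, so `allow` fits it:

```rust
// Unconditionally fires: nothing calls this function, so `dead_code`
// triggers and the expectation is fulfilled. If a caller is added later,
// the compiler reports an unfulfilled expectation instead of staying silent.
#[expect(dead_code, reason = "kept as a reference implementation")]
fn old_helper() -> u32 {
    1
}

// Conditionally fires: `len` is only read when debug assertions are on,
// so `unused_variables` appears only in release builds.
#[allow(unused_variables, reason = "`len` is only checked with debug assertions")]
fn check_len(len: usize) -> &'static str {
    #[cfg(debug_assertions)]
    assert!(len < 1024, "length too large");
    "checked"
}
```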

* Fix adapter build

prtest:full

* Fix one-core build of icache coherence

* Use `allow` for missing_docs

Work around rust-lang/rust#130021 which was fixed in Rust 1.83 and isn't
fixed for our MSRV at this time.

* More MSRV compat


Revision tags: v27.0.0, v26.0.1, v25.0.3, v24.0.2, v26.0.0
# 3036e795 14-Oct-2024 beetrees <[email protected]>

Add I128 atomic support to the `x64` backend (#9459)

* Add I128 atomic support to the `x64` backend

* fix typo in cranelift/codegen/src/isa/x64/inst/emit.rs

---------

Co-authored-by: Nick Fitzgerald <[email protected]>


Revision tags: v21.0.2, v22.0.1, v23.0.3, v25.0.2, v24.0.1, v25.0.1, v25.0.0
# ff987608 06-Sep-2024 Alex Crichton <[email protected]>

Remove `iadd_cin` and `isub_bin`, split `isub_borrow` and `iadd_carry` (#9199)

* Remove `iadd_cin` and `isub_bin`, split `isub_borrow` and `iadd_carry`

This commit refactors the opcodes that Cranelift supports for
add-with-carry and subtract-with-borrow. None of these opcodes are
currently in use by the wasm frontend nor supported by any backend. In
that sense it's unlikely they have many users and the hope is that
refactoring won't cause much impact.

The `iadd_cin` and `isub_bin` opcodes are the equivalent of `*_borrow`
and `*_carry` except that they don't return the carry flag, they only
return the result of the operation. While theoretically useful I've
elected to remove them here in favor of only the borrow-returning
operations. They can be added back in in the future though if necessary.

I've split the preexisting operations `isub_borrow` and `iadd_carry`
additionally into signed/unsigned portions:

* `isub_borrow` => `usub_borrow` and `ssub_borrow`
* `iadd_carry` => `uadd_carry` and `sadd_carry`

This reflects how the condition needs to differ on the carry flag
computation for signed/unsigned inputs. I've additionally fixed the
interpreter's implementation of `IsubBorrow` when switching to the
signed/unsigned opcodes.

Finally, the documentation for these instructions now explicitly says that
the incoming carry/borrow is zero-or-nonzero even though it's typed as
`i8`. Additionally the tests have been refactored to make use of
multi-return which may not have existed when they were first written.
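Below is a scalar model of the semantics described above on 32-bit operands. A nonzero `i8` carry/borrow input counts as 1. The unsigned and signed variants differ only in how the outgoing flag is computed. The function names follow this commit text, which was itself followed by a "Rename instructions" change, so treat them as illustrative:

```rust
/// Unsigned add with carry-in; the flag is the unsigned carry out.
fn uadd_carry(a: u32, b: u32, c_in: i8) -> (u32, bool) {
    let c = u32::from(c_in != 0);
    let (s1, o1) = a.overflowing_add(b);
    let (s2, o2) = s1.overflowing_add(c);
    (s2, o1 || o2)
}

/// Signed add with carry-in; the flag is signed overflow.
fn sadd_carry(a: i32, b: i32, c_in: i8) -> (i32, bool) {
    let wide = i64::from(a) + i64::from(b) + i64::from(c_in != 0);
    let result = wide as i32; // keep the low 32 bits (wrapping)
    (result, i64::from(result) != wide)
}

/// Unsigned subtract with borrow-in; the flag is the unsigned borrow out.
fn usub_borrow(a: u32, b: u32, b_in: i8) -> (u32, bool) {
    let borrow = u32::from(b_in != 0);
    let (d1, o1) = a.overflowing_sub(b);
    let (d2, o2) = d1.overflowing_sub(borrow);
    (d2, o1 || o2)
}
```

The same bit pattern can yield different flags. Adding 0 with carry-in 1 to `0xFFFF_FFFF` carries out as unsigned, but it is `-1 + 1 = 0` with no overflow as signed.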

* Rename instructions

* Fix more renames

* Update instruction descriptions


# b81ef46c 22-Aug-2024 Nick Fitzgerald <[email protected]>

Remove reference types (`r32` and `r64`) from Cranelift (#9164)

* Remove reference types (`r32` and `r64`) from Cranelift

* restore fuzz regression test


# dbc11c30 21-Aug-2024 Frank Emrich <[email protected]>

Cranelift: add stack_switch CLIF instruction (#9078)

* stack_switch instruction

* Update cranelift/codegen/src/isa/x64/pcc.rs

Co-authored-by: Nick Fitzgerald <[email protected]>

* cargo fmt

* only lower on linux

* give stack_switch the call() side effect

* add function in filetest doing switching only

* Extend documentation of new instruction

* better comments on how we handle the payloads

* Revert "only lower on linux"

This reverts commit 2af10f944186629de1615aa0ed999b7f49d13132.

* Add StackSwitchModel, use to compile stack_switch

* turn stack_switch_model into partial constructor

---------

Co-authored-by: Nick Fitzgerald <[email protected]>


Revision tags: v24.0.0, v23.0.2, v23.0.1, v23.0.0
# 41eca60b 17-Jul-2024 beetrees <[email protected]>

cranelift: Add `f16const` and `f128const` instructions (#8893)

* cranelift: Add `f16const` and `f128const` instructions

* cranelift: Add constant propagation for `f16` and `f128`


Revision tags: v22.0.0
# 9ffc9e67 14-Jun-2024 Nick Fitzgerald <[email protected]>

Cranelift: Remove resumable traps (#8809)

These were originally a SpiderMonkey-ism and have been unused ever
since. They were introduced for GC integration, where the runtime could do
something to make Cranelift code hit a trap and pause for a GC and then resume
execution once GC completed. But it is unclear that, as implemented, this is
actually a useful mechanism for doing that (compared to, say, loading from some
Well Known page and the GC protecting that page and catching signals to
interrupt the mutator, or simply branching and doing a libcall). And if someone
has that particular use case in the future (Wasmtime and its GC integration
doesn't need exactly this) then we can design something for what is actually
needed at that time, instead of carrying this cruft forward forever.


Revision tags: v21.0.1, v21.0.0, v20.0.2, v20.0.1, v20.0.0, v17.0.3, v19.0.2, v18.0.4, v19.0.1
# f59b3246 20-Mar-2024 Jamey Sharp <[email protected]>

cranelift: Optimize select_spectre_guard, carefully (#8139)

* cranelift: Optimize select_spectre_guard, carefully

This commit makes two changes to our treatment of
`select_spectre_guard`.

First, stop annotating this instruction as having any side effects. We
only care that if its value result is used, then it's computed without
branching on the condition input. We don't otherwise care when the value
is computed, or if it's computed at all.

Second, introduce some carefully selected ISLE egraph rewrites for this
instruction. These particular rewrites are those where we can statically
determine which SSA value will be the result of the instruction. Since
there is no actual choice involved, there's no way to accidentally
introduce speculation on the condition input.
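Two rewrites fit this criterion: both arms are the same value, or the condition is a known constant. In either case the result is fixed before run time. The sketch below uses a hypothetical mini-IR. It illustrates the criterion and does not transcribe the ISLE rules this commit added:

```rust
/// Hypothetical SSA value identifier.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Value(u32);

/// What the optimizer knows about the condition operand.
#[derive(Clone, Copy)]
enum Cond {
    Unknown,
    Const(i64),
}

/// Return the value `select_spectre_guard(cond, x, y)` must produce when
/// that is statically known, or `None` if the instruction must stay.
fn simplify_spectre_select(cond: Cond, x: Value, y: Value) -> Option<Value> {
    if x == y {
        // Both arms agree, so nothing depends on `cond`.
        return Some(x);
    }
    match cond {
        // Nonzero selects `x`, zero selects `y`, as with CLIF `select`.
        Cond::Const(c) => Some(if c != 0 { x } else { y }),
        Cond::Unknown => None,
    }
}
```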

* Add filetests


Revision tags: v19.0.0
# c4478334 14-Mar-2024 Jamey Sharp <[email protected]>

cranelift: Remove support for WebAssembly tables (#8124)

Wasmtime no longer needs any of this infrastructure and neither should
anybody else.

This diff is nearly identical to @bjorn3's version of the same change,
except I didn't remove Uimm64, which has started being used in other
places. I forgot bjorn3 had already tackled this part until after I was
already done, but it's reassuring that we both made the same changes.

https://github.com/bjorn3/wasmtime/commit/fb82ccb3948e949641a6d9581aa84472f68f97b8

Fixes #5532


Revision tags: v18.0.3, v18.0.2, v17.0.2, v18.0.1, v18.0.0, v17.0.1, v17.0.0, v16.0.0, v15.0.1, v15.0.0, v14.0.4, v14.0.3, v14.0.2, v13.0.1, v14.0.1, v14.0.0, minimum-viable-wasi-proxy-serve, v13.0.0, v12.0.2, v11.0.2, v10.0.2
# d8db07fa 09-Sep-2023 Afonso Bordado <[email protected]>

cranelift: Fix `v{all,any}_true` and `vhigh_bits` instructions in the interpreter (#6985)

* cranelift: Implement `vall_true` for floats in the interpreter

* cranelift: Implement `vany_true` for floats in the interpreter

* cranelift: Implement `vhigh_bits` for floats in the interpreter

* cranelift: Forbid vector return types for `vhigh_bits`

This instruction doesn't really make sense with a vector return type.
The description also states that it returns a scalar integer so I suspect
it wasn't intended to allow vector integers.

* fuzzgen: Enable `v{all,any}_true` and `vhigh_bits`
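For float lanes, the most significant bit of each lane is the sign bit. `vhigh_bits` on an `f32x4` therefore collects four sign bits into a scalar integer. The sketch below assumes lane `i` maps to bit `i` of the result. That lane order is an assumption and is not stated in the text above:

```rust
/// Collect the most significant (sign) bit of each f32 lane into a scalar,
/// with lane `i` stored in bit `i` of the result (assumed lane order).
fn vhigh_bits_f32x4(lanes: [f32; 4]) -> u8 {
    lanes
        .iter()
        .enumerate()
        .fold(0u8, |acc, (i, lane)| acc | (((lane.to_bits() >> 31) as u8) << i))
}
```

Note that `-0.0` has its sign bit set, so it contributes a 1 even though it compares equal to `0.0`.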


# 62fdafa1 29-Aug-2023 Alex Crichton <[email protected]>

Remove clippy configuration from repo and crates (#6927)

Wasmtime's CI does not run clippy so there's no enforcement of this
configuration. Additionally the configuration per-crate is not uniformly
applied across all of the Wasmtime workspace and is only on some
historical crates. Because we don't run clippy in CI this commit removes
all of the clippy annotations for allow/warn/deny from the source.


Revision tags: v12.0.1
# 4fc053b5 21-Aug-2023 Alex Crichton <[email protected]>

cranelift: Remove `f{min,max}_pseudo` instructions (#6874)

This commit removes these two instructions and replaces them instead
with their equivalents using `fcmp` plus `select` or `bitselect`
depending on the type (`bitselect` for vectors, `select` for scalars).
The motivation for this commit is that incorrect optimizations for these
instructions were removed in #6859 and likely stemmed from the
surprising definitions of these instructions. These originally were
intended to correspond to operations in the SIMD proposal for
WebAssembly but nowadays the functionality of these instructions is
replaced with:

* Lowering from wasm to clif uses the `fcmp` plus `select` combo instruction.
* Backends that support optimizing this pattern use ISLE patterns to
match the instruction and emit the specialization for the pseudo
semantics.
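The pseudo semantics reproduced by the compare-plus-select pattern are WebAssembly's `pmin`/`pmax`: a single `<` comparison plus a select, not IEEE `minimum`/`maximum`. The scalar sketch below follows the WebAssembly definitions, where `pmin(a, b) = b < a ? b : a` and `pmax(a, b) = a < b ? b : a`:

```rust
/// WebAssembly-style pseudo-minimum: one `<` compare plus a select.
fn pmin(a: f32, b: f32) -> f32 {
    if b < a { b } else { a }
}

/// WebAssembly-style pseudo-maximum.
fn pmax(a: f32, b: f32) -> f32 {
    if a < b { b } else { a }
}
```

These operations are asymmetric. Any comparison with NaN is false, so `pmin(NAN, 1.0)` returns the NaN, whereas Rust's `f32::min` would return `1.0`. Similarly, `pmax(-0.0, 0.0)` returns `-0.0`.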

This means that while the instructions are removed here it should be the
case that no functionality is lost and the output of Wasmtime/Cranelift
should still be the same as it was before. Existing tests using the
pseudo instructions were preserved except the riscv64 ones (where the
lowering was deleted) and the dynamic AArch64 ones. Both s390x and x64
continue to have specialized patterns for this compare-plus-select.


Revision tags: v12.0.0, v11.0.1, v11.0.0, v10.0.1, v10.0.0
# 7f108b1e 13-Jun-2023 Alex Crichton <[email protected]>

cranelift: Remove the `fcvt_low_from_sint` instruction (#6565)

* cranelift: Remove the `fcvt_low_from_sint` instruction

This commit removes this instruction since it's a combination of
`swiden_low` plus `fcvt_from_sint`. This was used by the WebAssembly
`f64x2.convert_low_i32x4_s` instruction previously but the corresponding
unsigned variant of the instruction, `f64x2.convert_low_i32x4_u`, used a
`uwiden_low` plus `fcvt_from_uint` combo. To help simplify Cranelift's
instruction set and to make these two instructions mirrors of each other
the Cranelift instruction is removed.
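A scalar model can check the decomposition. Take the low two `i32` lanes, sign-extend them (or zero-extend for the unsigned mirror), then convert to `f64`. The function names below are illustrative:

```rust
/// Model of `swiden_low` followed by `fcvt_from_sint`:
/// sign-extend the low two i32 lanes to i64, then convert to f64.
fn convert_low_s(v: [i32; 4]) -> [f64; 2] {
    let widened: [i64; 2] = [i64::from(v[0]), i64::from(v[1])];
    [widened[0] as f64, widened[1] as f64]
}

/// Model of `uwiden_low` followed by `fcvt_from_uint`:
/// zero-extend the low two u32 lanes to u64, then convert to f64.
fn convert_low_u(v: [u32; 4]) -> [f64; 2] {
    let widened: [u64; 2] = [u64::from(v[0]), u64::from(v[1])];
    [widened[0] as f64, widened[1] as f64]
}
```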

The s390x and AArch64 backend lowering rules for this instruction could
simply be deleted as the previous combination of the `swiden_low` and
`fcvt_from_sint` lowering rules produces the same code. The x64 backend
moved its lowering to a special case of the `fcvt_from_sint` lowering.

* Fix cranelift-fuzzgen build


Revision tags: v9.0.4, v9.0.3, v9.0.2, v9.0.1, v9.0.0
# 913efdf2 27-Apr-2023 Nick Fitzgerald <[email protected]>

wasmtime: Overhaul trampolines (#6262)

This commit splits `VMCallerCheckedFuncRef::func_ptr` into three new function
pointers: `VMCallerCheckedFuncRef::{wasm,array,native}_call`. Each one has a
dedicated calling convention, so callers just choose the version that works for
them. This is as opposed to the previous behavior where we would chain together
many trampolines that converted between calling conventions, sometimes up to
four on the way into Wasm and four more on the way back out. See [0] for
details.

[0] https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#a-review-of-our-existing-trampolines-calling-conventions-and-call-paths
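Here is a rough model of the split. The funcref stores one entry point per calling convention, so each caller picks the one it speaks instead of going through a chain of converting trampolines. The enum, struct, and the shared `fn(i32) -> i32` signature are hypothetical simplifications and do not reflect Wasmtime's actual `VMCallerCheckedFuncRef` layout or signatures:

```rust
/// Hypothetical calling conventions a caller might use.
#[derive(Clone, Copy)]
enum Convention {
    Wasm,
    Array,
    Native,
}

/// Hypothetical funcref with one entry point per calling convention.
/// Real entry points have distinct signatures; a shared one keeps this small.
struct FuncRefSketch {
    wasm_call: fn(i32) -> i32,
    array_call: fn(i32) -> i32,
    native_call: fn(i32) -> i32,
}

impl FuncRefSketch {
    /// The caller selects the entry point matching its own convention
    /// directly, with no intermediate conversion step.
    fn entry(&self, conv: Convention) -> fn(i32) -> i32 {
        match conv {
            Convention::Wasm => self.wasm_call,
            Convention::Array => self.array_call,
            Convention::Native => self.native_call,
        }
    }
}
```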

Thanks to @bjorn3 for the initial idea of having multiple function pointers for
different calling conventions.

This is generally a nice ~5-10% speed up to our call benchmarks across the
board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from
Wasm to the host, which have a minor regression. We hypothesize that this is
because the old hand-written assembly trampolines did not maintain a call frame
and do a tail call, but the new Cranelift-generated trampolines do maintain a
call frame and do a regular call. The regression is only a couple nanoseconds,
which seems well explained by these differences, and ultimately is not a
big deal.

However, this does lead to a ~5% code size regression for compiled modules.
Before, we compiled a trampoline per escaping function's signature and we
deduplicated these trampolines by signature. Now we compile two trampolines per
escaping function: one for if the host calls via the array calling convention
and one for if the host calls via the native calling convention. Additionally,
we compile a trampoline for every type in the module, in case there is a native
calling convention function from the host that we `call_indirect` of that
type. Much of this is in the `.eh_frame` section in the compiled module, because
each of our trampolines needs an entry there. Note that the `.eh_frame` section
is not required for Wasmtime's correctness, and you can disable its generation
to shrink compiled module code size; we just emit it to play nice with external
unwinders and profilers. We believe there are code size gains available for
follow up work to offset this code size regression in the future.

Backing up a bit: the reason each Wasm module needs to provide these
Wasm-to-native trampolines is because `wasmtime::Func::wrap` and friends allow
embedders to create functions even when there is no compiler available, so they
cannot bring their own trampoline. Instead the Wasm module has to supply
it. This in turn means that we need to look up and patch in these Wasm-to-native
trampolines during roughly instantiation time. But instantiation is super hot,
and we don't want to add more passes over imports or any extra work on this
path. So we integrate with `wasmtime::InstancePre` to patch these trampolines in
ahead of time.

Co-Authored-By: Jamey Sharp <[email protected]>
Co-Authored-By: Alex Crichton <[email protected]>

prtest:full

