History log of /wasmtime-44.0.1/cranelift/codegen/meta/src/shared/settings.rs (Results 1 – 25 of 56)
Revision Date Author Comments
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7
# 2f7dbd61 31-Mar-2026 Chris Fallin

PCC: remove proof-carrying code (for now?). (#12800)

In late 2023, we built out an experimental feature called
Proof-Carrying Code (PCC), where we attached "facts" to values in the
CLIF IR and built verification of these facts after lowering to
machine instructions. We also added "memory types" describing layout
of memory and a "checked" flag on memory operations such that we could
verify that any checked memory operation accessed valid memory (as
defined by memory types attached to pointer values via
facts). Wasmtime's Cranelift backend then put appropriate memory types
and facts in its IR such that all accesses to memory (aspirationally)
could be checked, taking the whole mid-end and lowering backend of
Cranelift out of the trusted core that enforces SFI.
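To make the facts-plus-memory-types idea concrete, here is a deliberately simplified sketch. It is not Cranelift's PCC implementation, which used a richer fact language. In this model a "fact" is an inclusive range of offsets a pointer may hold into a region, a "memory type" is just the region's size, and a checked access is accepted only if the worst case the fact permits, plus the access width, stays in bounds. `Fact`, `MemoryType`, `check_access` and `add_const` are hypothetical names.

```rust
/// Hypothetical, simplified fact: the value is an offset into a region,
/// known to lie in the inclusive range `min..=max`.
#[derive(Clone, Copy, Debug)]
pub struct Fact {
    pub min: u64,
    pub max: u64,
}

/// Hypothetical, simplified memory type: a region of `size` valid bytes.
#[derive(Clone, Copy, Debug)]
pub struct MemoryType {
    pub size: u64,
}

/// Accept a checked access of `width` bytes at `pointer + offset` only if
/// the largest offset permitted by the fact still ends inside the region.
pub fn check_access(mem: MemoryType, fact: Fact, offset: u64, width: u64) -> bool {
    // Any overflow while computing the end address means safety is unproven.
    match fact.max.checked_add(offset).and_then(|e| e.checked_add(width)) {
        Some(end) => fact.min <= fact.max && end <= mem.size,
        None => false,
    }
}

/// Facts propagate through arithmetic: adding a constant shifts the range.
pub fn add_const(fact: Fact, k: u64) -> Option<Fact> {
    Some(Fact {
        min: fact.min.checked_add(k)?,
        max: fact.max.checked_add(k)?,
    })
}
```

The point of the design is that the checker only needs the facts and the memory types, not the optimizations that produced the code, which is what would move the mid-end and lowering out of the trusted core.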

This basically worked, at the time, for static memories; but never for
dynamic memories, and then work on the feature lost
prioritization (aka I had to work on other things) and I wasn't able
to complete it and put it in fuzzing/enable it as a production option.

Unfortunately since then it has bit-rotted significantly -- as we add
new backend optimizations and instruction lowerings we haven't kept
the PCC framework up to date.

Inspired by the discussion in #12497 I think it's time to delete
it (hopefully just "for now"?) unless/until we can build it again. And
when we do that, we should probably get it to the point of validating
robust operation on all combinations of memory configurations before
merging. (That implies a big experiment branch rather than a bunch of
eager PRs in-tree, but so it goes.) I still believe it is possible to
build this (and I have ideas on how to do it!) but not right now.


# 2811ee83 24-Mar-2026 Mikhail Katychev

feat(style,doc): added typos-cli workspace configuration (#12827)

* init config values

* more manual changes

* typos write

* revert certain changes

* misused, tightened up hex encoding


Revision tags: v43.0.0, v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6, v41.0.3, v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1, v40.0.0
# 87ed3b60 15-Dec-2025 Chris Fallin

Cranelift: make all non-tail, non-indirect calls patchable, and rename patchable ABI to `preserve_all`. (#12160)

* Cranelift: make all non-tail, non-indirect calls patchable, and rename patchable ABI to `preserve_all`.

As discussed in this week's Cranelift meeting, we've discovered a need
to generalize the `patchable_call` mechanism and corresponding
`patchable` ABI slightly. In particular, we will need patchable
`try_call` callsites as well in order to allow breakpoint handlers to
throw exceptions (desirable functionality eventually) and have this work
in the presence of inlining. Also, it's just a nice generalization to
say that patchability is an orthogonal dimension to the call ABI and the
other restrictions we initially imposed, and works as long as the basic
requirement (no return values) is met.

This also renames the `patchable` ABI to `preserve_all`, to make it
clear that its purpose is actually orthogonal, and it can be used
independently of patchable callsites. It also deletes the `cold` ABI,
which never actually did anything and is misleading in the presence of
an actual cold-ish (subzero temperature, actually) ABI like
`preserve_all`.

* Review feedback.


Revision tags: v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5, v38.0.3, v38.0.2, v38.0.1, v37.0.2
# 2a2e8f62 01-Oct-2025 bjorn3

Couple cleanups to the flags/settings handling in Cranelift (#11744)

* Remove unused shared flags

* Get rid of predicate settings

They were important in the old backend framework, but with the new
backend framework if we need a combination of multiple settings, that
can just be done as a regular extractor doing &&. This simplifies the
settings implementation.
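As a rough illustration of that design choice, a "predicate" derived from several flags reduces to an ordinary boolean conjunction evaluated where it is needed, instead of a separately declared setting. This is neither the generated settings code nor ISLE syntax: the `Flags` struct, its fields and `use_avx2_and_bmi2` are hypothetical.

```rust
/// Hypothetical subset of ISA flags; in Cranelift these come from generated code.
#[derive(Clone, Copy, Debug, Default)]
pub struct Flags {
    pub has_avx2: bool,
    pub has_bmi2: bool,
}

impl Flags {
    /// Instead of declaring a dedicated predicate setting, combine the
    /// underlying flags with `&&` at the point of use, as an extractor would.
    pub fn use_avx2_and_bmi2(&self) -> bool {
        self.has_avx2 && self.has_bmi2
    }
}
```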


Revision tags: v37.0.1, v37.0.0, v36.0.2
# 73de2ee9 25-Aug-2025 Chris Fallin

Pull in new regalloc2 with fastalloc fixes for exceptions, and re-enable and add to testing. (#11533)

* Revert "Cranelift/Wasmtime: disable fastalloc (single-pass) allocator for now. (#10554)"

This reverts commit d52e23b09191185996792b8ef18e5fca2865ca43.

* Upgrade to regalloc2 0.13.1.

Pulls in bytecodealliance/regalloc2#233 to update fastalloc to support
the looser constraints needed by exception-related changes.

* cargo-vet update.


Revision tags: v36.0.1, v36.0.0, v35.0.0, v24.0.4, v33.0.2, v34.0.2, v34.0.1, v33.0.1, v24.0.3, v32.0.1, v34.0.0, v33.0.0, v32.0.0
# d52e23b0 09-Apr-2025 Chris Fallin

Cranelift/Wasmtime: disable fastalloc (single-pass) allocator for now. (#10554)

Unfortunately, as discovered by a recent fuzzbug [1], the single-pass
register allocator is not compatible with the approach to callsite
defs that exception-handling support has forced us to take. In
particular, we needed to move all call return-value defs onto the call
instruction itself, so calls could be terminators; this unbounded
number of defs is made to be a solvable allocation problem by using
`any` constraints, which allow allocation directly into spillslots;
but fastalloc appears to error out if it runs out of registers,
regardless of this constraint.

Long-term, we should fix this, but unfortunately I don't have cycles
to dive into fastalloc's internals at the moment, and it's (I think) a
tier-3 feature. As such, this PR disables its use for now. I've
filed a tracking issue in RA2 [2], and referenced this in the
Cranelift configuration option docs at least.

To keep from shifting all fuzzbugs / fuzzing corpora by altering the
`arbitrary` interpretation, I opted to keep the enum the same in the
fuzzing crate, and remap `SinglePass` to `Backtracking` there. I'm
happy to take the other approach and remove the option (thus
invalidating all fuzzbugs) if we'd prefer that instead.

[1]: https://oss-fuzz.com/testcase-detail/5433312476987392
[2]: https://github.com/bytecodealliance/regalloc2/issues/217
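The remapping described above can be pictured roughly as follows. This sketch uses hypothetical type and method names and is not the actual fuzzing-crate code. The fuzz-input enum keeps both variants, so an existing corpus entry still decodes to the same variant as before, while the conversion to the compiler-facing setting folds `SinglePass` into `Backtracking`.

```rust
/// Hypothetical enum decoded from fuzz input. Keeping both variants means
/// existing corpus entries keep their interpretation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FuzzRegallocAlgorithm {
    Backtracking,
    SinglePass,
}

/// Hypothetical compiler-facing setting while single-pass is disabled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RegallocAlgorithm {
    Backtracking,
}

impl FuzzRegallocAlgorithm {
    /// Decode from a fuzz byte (hypothetical encoding: the low bit picks the variant).
    pub fn from_byte(b: u8) -> Self {
        if b & 1 == 0 {
            Self::Backtracking
        } else {
            Self::SinglePass
        }
    }

    /// Map to the setting actually used; `SinglePass` is remapped for now.
    pub fn to_setting(self) -> RegallocAlgorithm {
        match self {
            Self::Backtracking | Self::SinglePass => RegallocAlgorithm::Backtracking,
        }
    }
}
```

Removing the variant instead would change how every stored input decodes, which is the fuzzbug invalidation the message wants to avoid.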


Revision tags: v31.0.0
# 2af0a1f7 13-Mar-2025 bjorn3

Introduce log2_min_function_alignment flag (#10391)

* Remove function_alignment handling from cranelift-object and cranelift-jit

It is already handled by MachBuffer. The symbol_alignment could also be
removed as no current backend has a symbol alignment bigger than the
function alignment, but keeping it around is a bit safer when new
backends are introduced.

* Introduce log2_min_function_alignment flag

This is required for cg_clif to implement -Zmin-function-alignment.
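A setting expressed in log2 form is typically turned into a byte alignment with a shift, then combined with the alignment the backend already requires by taking the maximum. The helper below is a hypothetical sketch of that arithmetic, not MachBuffer's code. `isa_min_align` stands in for the target's existing requirement.

```rust
/// Hypothetical helper: effective function alignment in bytes.
/// `isa_min_align` must be a power of two; `log2_min` is the flag value.
pub fn function_alignment(isa_min_align: u32, log2_min: u8) -> u32 {
    assert!(isa_min_align.is_power_of_two());
    // Convert log2 to bytes; refuse shifts past the width of u32.
    let requested = 1u32
        .checked_shl(u32::from(log2_min))
        .expect("alignment too large");
    // The flag can only raise the alignment, never lower it.
    isa_min_align.max(requested)
}
```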


Revision tags: v30.0.2, v30.0.1, v30.0.0, v29.0.1, v29.0.0, v28.0.1, v28.0.0, v27.0.0
# 1e3e5fc1 15-Nov-2024 Chris Fallin

Cranelift: add option to use new single-pass register allocator. (#9611)

* Cranelift: add option to use new single-pass register allocator.

In bytecodealliance/regalloc2#181, @d-sonuga added a fast single-pass
algorithm option to regalloc2, in addition to its existing backtracking
allocator. This produces code much more quickly, at the expense of code
quality. Sometimes this tradeoff is desirable (e.g. when performing a
debug build in a fast-iteration development situation, or in an initial
JIT tier).

This PR adds a Cranelift option to select the RA2 algorithm, plumbs it
through to a Wasmtime option, and adds the option to Wasmtime fuzzing as
well.

An initial compile-time measurement in Wasmtime: `spidermonkey.wasm`
builds in 1.383s with backtracking (existing algorithm), and 1.065s with
single-pass. The resulting binary runs a simple Fibonacci benchmark in
2.060s with backtracking vs. 3.455s with single-pass.

Hence, the single-pass algorithm yields a 23% compile-time reduction, at
the cost of a 67% runtime increase.

* cargo-vet audit for allocator-api2 0.2.18 -> 0.2.20.
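The compile-time and runtime percentages quoted for the single-pass allocator can be reproduced from the timings. The helper name is illustrative. The compile-time change works out to about -23.0%, and the runtime change to about +67.7%, which the message truncates to 67%.

```rust
/// Percentage change from `before` to `after`; negative means a reduction.
pub fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```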


Revision tags: v26.0.1, v25.0.3, v24.0.2
# af968461 27-Oct-2024 bjorn3

Gate support for implicit return area pointers behind an option (#9511)

* Remove couple of unused isle helpers

* Gate support for implicit return area pointers behind an option

It implements an incorrect ABI and may be removed in the future due to
complexity reasons. Requiring an option to be enabled before it can be
used makes it harder to accidentally hit the ABI issue and eases
deprecation and eventual removal.

* Fix review comments

* Enable enable_multi_ret_implicit_sret for s390x tests that use i128

* Enable enable_multi_ret_implicit_sret for riscv tests that use vector types

* Enable enable_multi_ret_implicit_sret for more riscv tests

* Enable enable_multi_ret_implicit_sret for windows tests that use i128


Revision tags: v26.0.0, v21.0.2, v22.0.1, v23.0.3, v25.0.2, v24.0.1, v25.0.1, v25.0.0
# dbc11c30 21-Aug-2024 Frank Emrich

Cranelift: add stack_switch CLIF instruction (#9078)

* stack_switch instruction

* Update cranelift/codegen/src/isa/x64/pcc.rs

Co-authored-by: Nick Fitzgerald

* cargo fmt

* only lower on linux

* give stack_switch the call() side effect

* add function in filetest doing switching only

* Extend documentation of new instruction

* better comments on how we handle the payloads

* Revert "only lower on linux"

This reverts commit 2af10f944186629de1615aa0ed999b7f49d13132.

* Add StackSwitchModel, use to compile stack_switch

* turn stack_switch_model into partial constructor

---------

Co-authored-by: Nick Fitzgerald


Revision tags: v24.0.0, v23.0.2, v23.0.1, v23.0.0, v22.0.0, v21.0.1, v21.0.0, v20.0.2, v20.0.1, v20.0.0
# d53d0788 16-Apr-2024 Jamey Sharp

cranelift: Simplify checking whether probestack is needed (#8376)

`Callee::probestack_min_frame` was always set to the guard-page size.
That is the same as the `guard_size` local in `gen_prologue`, which was
also the only place which used `probestack_min_frame`. So don't even
bother storing it, just compute it from the flags as needed.

Also, remove the long-obsolete `probestack_func_adjusts_sp` setting,
which hasn't been used since at least 2021. We may actually want to do
something like what that setting described, but even if we do, it should
be a choice the backend makes rather than a global setting.
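A minimal sketch of computing the check from the flags as needed. It assumes the guard size comes from Cranelift's `probestack_size_log2` shared setting, and that a probe is needed once the frame reaches the guard size. The function name and the exact comparison are illustrative assumptions, not the code in `gen_prologue`.

```rust
/// Hypothetical: decide whether the prologue must probe the stack.
/// Assumes `probestack_size_log2 < 32`.
pub fn needs_probestack(enable_probestack: bool, probestack_size_log2: u8, frame_size: u32) -> bool {
    if !enable_probestack {
        return false;
    }
    // Guard-page size derived directly from the setting instead of being
    // cached in a separate field such as `probestack_min_frame`.
    let guard_size = 1u32 << probestack_size_log2;
    frame_size >= guard_size
}
```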


Revision tags: v17.0.3, v19.0.2, v18.0.4, v19.0.1, v19.0.0, v18.0.3, v18.0.2, v17.0.2, v18.0.1, v18.0.0, v17.0.1, v17.0.0, v16.0.0, v15.0.1, v15.0.0, v14.0.4, v14.0.3, v14.0.2, v13.0.1, v14.0.1, v14.0.0
# 8e00cc20 17-Oct-2023 Chris Fallin

PCC: initial end-to-end integration with Wasmtime's static memories. (#7274)

* PCC: add facts to global values, parse and print them. No verification yet.

Co-authored-by: Nick Fitzgerald

* PCC: propagate facts on GV loads and check them.

Co-authored-by: Nick Fitzgerald

* PCC: support propagating facts on iteratively-elaborated GVs as well.

Co-authored-by: Nick Fitzgerald

* PCC: fix up Wasmtime uses of GVs after refactors to memflags handling.

Co-authored-by: Nick Fitzgerald

* PCC: working end-to-end for static memories!

Co-authored-by: Nick Fitzgerald

* PCC: add toplevel Wasmtime option `-C enable-pcc=y`.

* Fix filetests build.

* Review feedback, and blessed test updates due to GV legalization changes.

---------

Co-authored-by: Nick Fitzgerald


# f466aa26 06-Oct-2023 Chris Fallin

Skeleton and initial support for proof-carrying code. (#7165)

* WIP veriwasm 2.0

Co-Authored-By: Chris Fallin

* PCC: successfully parse some simple facts.

Co-authored-by: Nick Fitzgerald

* PCC: plumb facts through VCode and add framework on LowerBackend to check them.

Co-authored-by: Nick Fitzgerald

* PCC: code is carrying some proofs! Very simple test-case.

Co-authored-by: Nick Fitzgerald

* PCC: add a `safe` flag for checked memory accesses.

* PCC: add pretty-printing of facts to CLIF output.

* PCC: misc. cleanups.

* PCC: lots of cleanup.

* Post-rebase fixups and some misc. fixes.

* Add serde traits to facts.

* PCC: add succeed and fail tests.

* Review feedback: rename `safe` memflag to `checked`.

* Review feedback.

---------

Co-authored-by: Nick Fitzgerald


Revision tags: minimum-viable-wasi-proxy-serve, v13.0.0, v12.0.2, v11.0.2, v10.0.2, v12.0.1, v12.0.0, v11.0.1, v11.0.0
# b25fe4b4 22-Jun-2023 Alex Crichton

cranelift: Remove the `enable_simd` shared setting (#6631)

This commit removes a setting for Cranelift which I've found a bit
confusing historically and I think is no longer necessary. The setting
is currently documented as enabling SIMD instructions, but that only
sort of works for the x64 backend and none of the other backends look at
it. Historically this was used to flag to Cranelift that a higher x64
baseline feature set is required for codegen but as of #6625 that's no
longer necessary.

Otherwise it seems more Cranelift-like nowadays to say that vector
instructions generate SIMD instructions where non-vector instructions
probably don't, but still may, depending on activated CPU features. In
that sense I'm not sure if a dedicated `enable_simd` setting is still
motivated, so this PR removes it.

This renames some features in the x86 backend such as `use_avx_simd` to
`use_avx` since the `_simd` part is no longer part of the computation
now that `enable_simd` is gone.


Revision tags: v10.0.1, v10.0.0, v9.0.4, v9.0.3, v9.0.2, v9.0.1, v9.0.0
# 49dd8fd7 16-May-2023 Alex Crichton

aarch64: Fix Ldr19 relocations being unresolvable (#6384)

* Add a cranelift setting for padding between basic blocks

Various relocations, jumps, and such require special handling in
`MachBuffer` with respect to islands to ensure that everything gets
emitted correctly. This commit adds a setting to synthetically insert
padding at the end of every basic block to help stress this logic with
more minimal test cases. The setting is disabled by default but is
something that we should be able to turn on during fuzzing, for example.

* aarch64: Fix out-of-range `Ldr19` relocations

This commit fixes a bug in the AArch64 backend, and possibly others,
where constants were unconditionally forced to be at the end of the
function when they sometimes couldn't be. For example the `Ldr19`
relocation has a 512k range meaning that if an instruction near the
beginning of a function accesses a constant at the end of a function and
the function is >1M, then the relocation cannot be resolved. This is all
handled internally with `MachBuffer`'s handling of islands but the
problem with constants is that the labels (and the constant values)
weren't defined until the end of the function.

The first attempt at fixing this was to move the calls to
`defer_constant` to the beginning of emission. This would enable the
constants to get deferred as necessary. This was problematic, however,
because it only solved the forwards case (aka your constant was forced
to the end of the function which is too far away). The backwards case,
aka your constant is way too far behind you, was a new problem that
arose.

To fix all of these issues constants are now handled differently inside
of the `MachBuffer`. Previously constants were all pre-assigned a
label-per-constant and all references to the constant would use that
single label. Instead a new heuristic has been added where constants
record their size/alignment at the start of emission and labels are
lazily deferred. When a label for a constant is requested then a label
is lazily allocated or a previously-allocated label for this constant is
returned. When an island is emitted then all emitted constants get
their labels cleared. This aims to balance the previous behavior, where
multiple uses of a constant emitted the constant only once, against fixing
this issue simply. This means that constants may get
emitted multiple times, since each reference to a constant after an
island is generated will be guaranteed to generate a new label, even if
it's in-range to access. This can perhaps be fixed in the future with a
more clever API where the `LabelUse` is passed into the function which
converts a constant to a label, but that's left as a refactoring for a
future date.

This commit also moves an `alignment: u32` field into the
`MachBufferFinalized` itself since that's now a function of whatever
constants actually got emitted. Additionally note that constant
emission in the middle of a function doesn't actually emit anything,
instead recording markers of where constants need to go. Then when a
buffer is finalized the constants are passed in to get access to the
data which fills in everything as it's referenced.

* Fuzz the `bb_padding_log2` setting

This commit hooks up the previously-added setting to Cranelift to
Wasmtime's fuzzing infrastructure. This will automatically configure the
setting based on the fuzz input to add a bit of "chaos" to the emitted
code. This should hopefully help expose the issue fixed previously via
fuzzing which otherwise won't generate massive functions.

* Realign back to an instruction boundary

Otherwise misaligned instructions were getting emitted and tripping
various asserts.

* Fix riscv64 testing

* Rename codegen setting to bb_padding_log2_minus_one

Allow for inserting one byte of padding.

* Doc updates

* Thread through shared flags differently

Don't use `EmitInfo`, instead pass in to vcode emission

* Fix s390x tests

* Combine island calculations during vcode emission

Fixes an off-by-just-a-few error if the two island checks are done
separately after a basic block.
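The lazy constant-label scheme from the `Ldr19` fix can be modeled in a few lines. This is a simplified, hypothetical model: `ConstantPool`, `Label`, `label_for` and `emit_island` are invented names, not `MachBuffer`'s API. A label is allocated on the first request for a constant and reused until the next island. Emitting an island places the pending constants and forgets their labels, so a later reference gets a fresh label and the constant may be emitted a second time.

```rust
use std::collections::HashMap;

/// Hypothetical label handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Label(pub u32);

/// Hypothetical model of lazily allocated per-constant labels.
#[derive(Default)]
pub struct ConstantPool {
    next_label: u32,
    /// Constant id -> label currently in use (cleared at each island).
    live: HashMap<u32, Label>,
    /// Constants (with their labels) to be placed in the next island.
    pending: Vec<(u32, Label)>,
}

impl ConstantPool {
    /// Return the current label for `constant`, allocating one lazily.
    pub fn label_for(&mut self, constant: u32) -> Label {
        if let Some(&l) = self.live.get(&constant) {
            return l;
        }
        let l = Label(self.next_label);
        self.next_label += 1;
        self.live.insert(constant, l);
        self.pending.push((constant, l));
        l
    }

    /// Emit an island: return markers for the pending constants (the data is
    /// filled in at finalization) and forget every label handed out so far.
    pub fn emit_island(&mut self) -> Vec<(u32, Label)> {
        self.live.clear();
        std::mem::take(&mut self.pending)
    }
}
```

Clearing every label at an island is the simple choice the message describes. It can duplicate a constant that was still in range, which the suggested `LabelUse`-aware API would avoid.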


Revision tags: v6.0.2, v7.0.1, v8.0.1, v8.0.0
# 230e2135 06-Apr-2023 Chris Fallin

Cranelift: remove non-egraphs optimization pipeline and `use_egraphs` option. (#6167)

* Cranelift: remove non-egraphs optimization pipeline and `use_egraphs` option.

This PR removes the LICM, GVN, and preopt passes, and associated support
pieces, from `cranelift-codegen`. Not to worry, we still have
optimizations: the egraph framework subsumes all of these, and has been
on by default since #5181.

A few decision points:

- Filetests for the legacy LICM, GVN and simple_preopt were removed too.
As we built optimizations in the egraph framework we wrote new tests
for the equivalent functionality, and many of the old tests were
testing specific behaviors in the old implementations that may not be
relevant anymore. However if folks prefer I could take a different
approach here and try to port over all of the tests.

- The corresponding filetest modes (commands) were deleted too. The
`test alias_analysis` mode remains, but no longer invokes a separate
GVN first (since there is no separate GVN that will not also do alias
analysis) so the tests were tweaked slightly to work with that. The
egraph testsuite also covers alias analysis.

- The `divconst_magic_numbers` module is removed since it's unused
without `simple_preopt`, though this is the one remaining optimization
we still need to build in the egraphs framework, pending #5908. The
magic numbers will live forever in git history so removing this in the
meantime is not a major issue IMHO.

- The `use_egraphs` setting itself was removed at both the Cranelift and
Wasmtime levels. It has been marked deprecated for a few releases now
(Wasmtime 6.0, 7.0, upcoming 8.0, and corresponding Cranelift
versions) so I think this is probably OK. As an alternative if anyone
feels strongly, we could leave the setting and make it a no-op.

* Update test outputs for remaining test differences.


Revision tags: v7.0.0
# 5ae85752 16-Mar-2023 Alex Crichton

x64: Take SIGFPE signals for divide traps (#6026)

* x64: Take SIGFPE signals for divide traps

Prior to this commit Wasmtime would configure `avoid_div_traps=true`
unconditionally for Cranelift. This, for the division-based
instructions, would change emitted code to explicitly trap on trap
conditions instead of letting the `div` x86 instruction trap.

There's no specific reason for Wasmtime, however, to specifically avoid
traps in the `div` instruction. This means that the extra generated
branches on x86 aren't necessary since the `div` and `idiv` instructions
already trap for similar conditions as wasm requires.

This commit instead disables the `avoid_div_traps` setting for
Wasmtime's usage of Cranelift. Subsequently the codegen rules were
updated slightly:

* When `avoid_div_traps=false`, explicit traps are no longer emitted for `div`
instructions.
* The `udiv`/`urem` instructions now list their trap as divide-by-zero
instead of integer overflow.
* The lowering for `sdiv` was updated to still explicitly check for zero
but the integer overflow case is deferred to the instruction itself.
* The lowering of `srem` no longer checks for zero and the listed trap
for the `div` instruction is a divide-by-zero.

This means that the codegen for `udiv` and `urem` no longer have any
branches. The codegen for `sdiv` removes one branch but keeps the
zero-check to differentiate the two kinds of traps. The codegen for
`srem` removes one branch but keeps the -1 check since the semantics of
`srem` mismatch with the semantics of `idiv` with a -1 divisor
(specifically for INT_MIN).

This is unlikely to have really all that much of a speedup but was
something I noticed during #6008 which seemed like it'd be good to clean
up. Plus Wasmtime's signal handling was already set up to catch
`SIGFPE`, it was just never firing.

* Remove the `avoid_div_traps` cranelift setting

With no known users currently, removing this should be possible and helps
simplify the x64 backend.

* x64: GC more support for avoid_div_traps

Remove the `validate_sdiv_divisor*` pseudo-instructions and clean up
some of the ISLE rules now that `div` is allowed to itself trap
unconditionally.

* x64: Store div trap code in instruction itself

* Keep divisors in registers, not in memory

Don't accidentally fold multiple traps together

* Handle EXC_ARITHMETIC on macos

* Update emit tests

* Update winch and tests
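Plain Rust integer semantics show why `srem` keeps its `-1` check. The Wasm/CLIF remainder `INT_MIN % -1` is defined to be 0, but the quotient `INT_MIN / -1` overflows, and x86 `idiv` computes both at once and faults on that overflow. The sketch below models a lowering that lets the hardware trap on a zero divisor while special-casing `-1`. The function name is hypothetical, and `None` stands in for the divide-by-zero trap.

```rust
/// Hypothetical model of an x64-style `srem` lowering.
/// `None` models the divide-by-zero trap raised by `idiv` itself.
pub fn srem_lowered(x: i32, y: i32) -> Option<i32> {
    if y == -1 {
        // Explicit check: `idiv` would fault computing i32::MIN / -1,
        // yet the remainder of any value divided by -1 is 0.
        return Some(0);
    }
    // All other divisors go straight to the "hardware": a zero divisor
    // traps (None), otherwise the remainder is exact.
    x.checked_rem(y)
}
```

`i32::MIN.checked_rem(-1)` itself returns `None` in Rust, mirroring the `idiv` fault that the explicit branch avoids.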


Revision tags: v6.0.1, v5.0.1, v4.0.1, v6.0.0, v5.0.0
# 1faff8c2 19-Jan-2023 Chris Fallin

Enable egraph-based optimization by default. (#5587)

This PR follows up on #5382 and #5391, which rebuilt the egraph-based optimization framework to be more performant, by enabling it by default.

Based on performance results in #5382 (my measurements on SpiderMonkey and bjorn3's independent confirmation with cg_clif), it seems that this is reasonable to enable. Now that we have been fuzzing compiler configurations with egraph opts (#5388) for 6 weeks, having fixed a few fuzzbugs that came up (#5409, #5420, #5438) and subsequently received no further reports from OSS-Fuzz, I believe it is stable enough to rely on.

This PR enables `use_egraphs`, and also normalizes its meaning: previously it forced optimization (it basically meant "turn on the egraph optimization machinery"), now it runs egraph opts if the opt level indicates (it means "use egraphs to optimize if we are going to optimize"). The conditionals in the top-level pass driver are a little subtle, but will get simpler once we can remove the non-egraph path (which we plan to do eventually!).

Fixes #5181.
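The change in what `use_egraphs` means amounts to a different condition in the pass driver. The sketch below contrasts the two readings. `OptLevel` mirrors the values of Cranelift's `opt_level` setting (`none`, `speed`, `speed_and_size`). The function names are hypothetical, and the real driver's conditionals are more involved.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OptLevel {
    None,
    Speed,
    SpeedAndSize,
}

/// Old reading: the flag alone forced the egraph optimizer to run.
pub fn run_egraph_opts_old(use_egraphs: bool, _opt_level: OptLevel) -> bool {
    use_egraphs
}

/// New reading: use egraphs to optimize, *if* the opt level asks for optimization.
pub fn run_egraph_opts_new(use_egraphs: bool, opt_level: OptLevel) -> bool {
    use_egraphs && opt_level != OptLevel::None
}
```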


Revision tags: v4.0.0
# c0b587ac 15-Dec-2022 Nick Fitzgerald

Remove heaps from core Cranelift, push them into `cranelift-wasm` (#5386)

* cranelift-wasm: translate Wasm loads into lower-level CLIF operations

Rather than using `heap_{load,store,addr}`.

* cranelift: Remove the `heap_{addr,load,store}` instructions

These are now legalized in the `cranelift-wasm` frontend.

* cranelift: Remove the `ir::Heap` entity from CLIF

* Port basic memory operation tests to .wat filetests

* Remove test for verifying CLIF heaps

* Remove `heap_addr` from replace_branching_instructions_and_cfg_predecessors.clif test

* Remove `heap_addr` from readonly.clif test

* Remove `heap_addr` from `table_addr.clif` test

* Remove `heap_addr` from the simd-fvpromote_low.clif test

* Remove `heap_addr` from simd-fvdemote.clif test

* Remove `heap_addr` from the load-op-store.clif test

* Remove the CLIF heap runtest

* Remove `heap_addr` from the global_value.clif test

* Remove `heap_addr` from fpromote.clif runtests

* Remove `heap_addr` from fdemote.clif runtests

* Remove `heap_addr` from memory.clif parser test

* Remove `heap_addr` from reject_load_readonly.clif test

* Remove `heap_addr` from reject_load_notrap.clif test

* Remove `heap_addr` from load_readonly_notrap.clif test

* Remove `static-heap-without-guard-pages.clif` test

Will be subsumed when we port `make-heap-load-store-tests.sh` to generating
`.wat` tests.

* Remove `static-heap-with-guard-pages.clif` test

Will be subsumed when we port `make-heap-load-store-tests.sh` over to `.wat`
tests.

* Remove more heap tests

These will be subsumed by porting `make-heap-load-store-tests.sh` over to `.wat`
tests.

* Remove `heap_addr` from `simple-alias.clif` test

* Remove `heap_addr` from partial-redundancy.clif test

* Remove `heap_addr` from multiple-blocks.clif test

* Remove `heap_addr` from fence.clif test

* Remove `heap_addr` from extends.clif test

* Remove runtests that rely on heaps

Heaps are not a thing in CLIF or the interpreter anymore

* Add generated load/store `.wat` tests

* Enable memory-related wasm features in `.wat` tests

* Remove CLIF heap from fcmp-mem-bug.clif test

* Add a mode for compiling `.wat` all the way to assembly in filetests

* Also generate WAT to assembly tests in `make-load-store-tests.sh`

* cargo fmt

* Reinstate `f{de,pro}mote.clif` tests without the heap bits

* Remove undefined doc link

* Remove outdated SVG and dot file from docs

* Add docs about `None` returns for base address computation helpers

* Factor out `env.heap_access_spectre_mitigation()` to a local

* Expand docs for `FuncEnvironment::heaps` trait method

* Restore f{de,pro}mote+load clif runtests with stack memory


Revision tags: v3.0.1
# 83088538 30-Nov-2022 Alex Crichton

Implement inline stack probes for AArch64 (#5353)

* Turn off probestack by default in Cranelift

The probestack feature is not implemented for the aarch64 and s390x
backends and currently the on-by-default status requires the aarch64 and
s390x implementations to be a stub. Turning off probestack by default
allows the s390x and aarch64 backends to panic with an error message to
avoid providing a false sense of security. When the probestack option is
implemented for all backends, however, it may be reasonable to
re-enable.

* aarch64: Improve codegen for AMode fallback

Currently the final fallback for finalizing an `AMode` will generate
both a constant-loading instruction as well as an `add` instruction to
the base register into the same temporary. This commit improves the
codegen by removing the `add` instruction and folding the final add into
the finalized `AMode`. This changes the `extendop` used but both
registers are 64-bit so shouldn't be affected by the extending
operation.

* aarch64: Implement inline stack probes

This commit implements inline stack probes for the aarch64 backend in
Cranelift. The support here is modeled after the x64 support where
unrolled probes are used up to a particular threshold after which a loop
is generated. The instructions here are similar in spirit to x64 except
that unlike x64 the stack pointer isn't modified during the unrolled
loop to avoid needing to re-adjust it back up at the end of the loop.

* Enable inline probestack for AArch64 and Riscv64

This commit enables inline probestacks for the AArch64 and Riscv64
architectures in the same manner as x86_64 currently has it enabled. More
tests were added as well, since on Unix platforms Rust's stack overflow
message should now be guaranteed to print too.

* Enable probestack for aarch64 in cranelift-fuzzgen

* Address review comments

* Remove implicit stack overflow traps from x64 backend

This commit removes implicit `StackOverflow` traps inserted by the x64
backend for stack-based operations. This was historically required when
stack overflow was detected with page faults but Wasmtime no longer
requires that since it's not suitable for wasm modules which call host
functions. Additionally no other backend implements this form of
implicit trap-code additions so this is intended to synchronize the
behavior of all the backends.

This fixes a test previously added for aarch64 so that the process properly
aborts instead of the overflow accidentally being caught by Wasmtime.

* Fix a style issue



Revision tags: v3.0.0, v1.0.2, v2.0.2, v2.0.1
# 51d87342 20-Oct-2022 Afonso Bordado <[email protected]>

fuzzgen: Generate compiler flags (#5020)

* fuzzgen: Test compiler flags

* cranelift: Generate `all()` function for all enum flags

This allows a user to iterate all flags that exist.
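The sketch below shows the general shape such a helper could take for an enum setting, and how a fuzzer might use it to choose a value from input bytes. The `OptLevel` enum here is a hand-written stand-in and `pick` is hypothetical; the code Cranelift actually generates may differ.

```rust
/// Hand-written stand-in for a generated enum setting.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OptLevel {
    None,
    Speed,
    SpeedAndSize,
}

impl OptLevel {
    /// Every variant, in declaration order, so callers can iterate them.
    fn all() -> &'static [OptLevel] {
        &[OptLevel::None, OptLevel::Speed, OptLevel::SpeedAndSize]
    }
}

/// Hypothetical fuzzer helper: map an arbitrary input byte onto a variant.
fn pick(byte: u8) -> OptLevel {
    let all = OptLevel::all();
    all[byte as usize % all.len()]
}
```

Because `all()` lists every variant, a fuzzer built this way automatically covers a newly added variant without further changes.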

* fuzzgen: Minimize regalloc_checker compiles

* fuzzgen: Limit the amount of test case inputs

* fuzzgen: Add egraphs flag

It's finally here!

* cranelift: Add fuzzing comment to settings

* fuzzgen: Add riscv64

* fuzzgen: Unconditionally enable some flags



Revision tags: v2.0.0
# 2be12a51 12-Oct-2022 Chris Fallin <[email protected]>

egraph-based midend: draw the rest of the owl (productionized). (#4953)

* egraph-based midend: draw the rest of the owl.

* Rename `egg` submodule of cranelift-codegen to `egraph`.

* Apply some feedback from @jsharp during code walkthrough.

* Remove recursion from find_best_node by doing a single pass.

Rather than recursively computing the lowest-cost node for a given
eclass and memoizing the answer at each eclass node, we can do a single
forward pass; because every eclass node refers only to earlier nodes,
this is sufficient. The behavior may differ slightly from the earlier
behavior because we cannot short-circuit costs to zero once a node is
elaborated, but in practice this should not matter.
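A minimal sketch of such a single forward pass, assuming a simplified hypothetical representation in which each entry is either an operator node or a union of two earlier e-classes, and every index points at an earlier entry. `ENode` and `best_costs` are illustrative names, not the cranelift-egraph API.

```rust
/// Hypothetical simplified e-graph entry. Every index must refer to an
/// earlier entry; that invariant is what makes one pass sufficient.
enum ENode {
    /// An operator with its own cost and its argument e-classes.
    Op { cost: u64, args: Vec<usize> },
    /// Two earlier e-classes known to be equal; keep the cheaper one.
    Union(usize, usize),
}

/// Computes the best cost of every entry in one forward pass, with no
/// recursion and no memo table beyond the output vector.
/// Indexing `best` panics if an entry refers to a later one.
fn best_costs(nodes: &[ENode]) -> Vec<u64> {
    let mut best: Vec<u64> = Vec::with_capacity(nodes.len());
    for node in nodes {
        let cost = match node {
            ENode::Op { cost: op_cost, args } => args
                .iter()
                .fold(*op_cost, |acc, &arg| acc.saturating_add(best[arg])),
            ENode::Union(a, b) => best[*a].min(best[*b]),
        };
        best.push(cost);
    }
    best
}
```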

* Make elaboration non-recursive.

Use an explicit stack instead (with `ElabStackEntry` entries,
alongside a result stack).
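The explicit-stack approach can be sketched as follows: a work stack of "start" and "finish" entries replaces the call stack, and a separate result stack carries operand values between them. The `Expr` and `Entry` types are hypothetical, loosely modeled on the `ElabStackEntry` idea rather than its actual definition, and memoization of shared subexpressions is omitted for brevity.

```rust
/// Hypothetical expression DAG; operands are indices into the same slice.
enum Expr {
    Const(i64),
    Add(usize, usize),
    Mul(usize, usize),
}

/// Work-stack entries: start evaluating a node, or combine its operands.
enum Entry {
    Start(usize),
    Finish(usize),
}

/// Evaluates `root` without recursion, so depth is limited only by heap memory.
fn eval(exprs: &[Expr], root: usize) -> i64 {
    let mut stack = vec![Entry::Start(root)];
    let mut results: Vec<i64> = Vec::new();
    while let Some(entry) = stack.pop() {
        match entry {
            Entry::Start(id) => match exprs[id] {
                Expr::Const(v) => results.push(v),
                Expr::Add(a, b) | Expr::Mul(a, b) => {
                    // Finish runs after both operands. Pushing `b` before `a`
                    // means `a` is evaluated first and its result ends up
                    // below `b`'s on the result stack.
                    stack.push(Entry::Finish(id));
                    stack.push(Entry::Start(b));
                    stack.push(Entry::Start(a));
                }
            },
            Entry::Finish(id) => {
                let rhs = results.pop().expect("rhs result");
                let lhs = results.pop().expect("lhs result");
                results.push(match exprs[id] {
                    Expr::Add(..) => lhs + rhs,
                    Expr::Mul(..) => lhs * rhs,
                    Expr::Const(_) => unreachable!("constants never schedule a Finish"),
                });
            }
        }
    }
    results.pop().expect("root result")
}
```

For example, with `[Const(2), Const(3), Add(0, 1), Mul(2, 2)]`, evaluating index 3 computes (2 + 3) * (2 + 3).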

* Make elaboration traversal of the domtree non-recursive/stack-safe.
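The same explicit-stack technique applies to walking the dominator tree. A minimal sketch, assuming the tree is given as a hypothetical child-list array indexed by block number; it records both enter and exit events, since a pass that keeps scoped state (for example, a scoped value map) typically needs to know when it leaves a subtree. `Event` and `walk_domtree` are illustrative names.

```rust
/// Events produced by the walk.
#[derive(Debug, PartialEq)]
enum Event {
    Enter(usize),
    Exit(usize),
}

/// Preorder walk of a tree given as child lists, using a heap-allocated
/// stack instead of recursion so deep trees cannot overflow the call stack.
fn walk_domtree(children: &[Vec<usize>], root: usize) -> Vec<Event> {
    let mut events = Vec::new();
    let mut stack = vec![Event::Enter(root)];
    while let Some(item) = stack.pop() {
        match item {
            Event::Enter(block) => {
                events.push(Event::Enter(block));
                // Schedule the exit first, then the children in reverse,
                // so children are visited in their listed order and the
                // exit fires only after the whole subtree is done.
                stack.push(Event::Exit(block));
                for &child in children[block].iter().rev() {
                    stack.push(Event::Enter(child));
                }
            }
            Event::Exit(block) => events.push(Event::Exit(block)),
        }
    }
    events
}
```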

* Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph.

* Apply static recursion limit to rule application.
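A static recursion limit bounds how deeply rule application may recurse on its own outputs, so a rule set that keeps producing rewritable terms still terminates; a later bullet in this log adds a stat for hitting this limit. The sketch below models the idea with integers and a single rule. `REWRITE_DEPTH_LIMIT`, `Rewriter`, and `depth_limit_hits` are hypothetical names, and the limit value is arbitrary.

```rust
/// Arbitrary illustrative limit.
const REWRITE_DEPTH_LIMIT: u32 = 5;

struct Rewriter {
    /// Stat: how many times rewriting stopped because of the limit.
    depth_limit_hits: u64,
}

impl Rewriter {
    /// Records `value`, then recursively rewrites whatever `rule` produces,
    /// stopping once `depth` reaches the static limit.
    fn rewrite(&mut self, value: i64, depth: u32, rule: &dyn Fn(i64) -> Option<i64>, out: &mut Vec<i64>) {
        out.push(value);
        if depth >= REWRITE_DEPTH_LIMIT {
            self.depth_limit_hits += 1;
            return;
        }
        if let Some(next) = rule(value) {
            self.rewrite(next, depth + 1, rule, out);
        }
    }
}
```

A rule that always fires (such as `x => x + 1`) is cut off after the limit and bumps the stat, while a rule that stops on its own never touches it.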

* Fix aarch64 wrt dynamic-vector support -- broken rebase.

* Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to!

* Fix multi-result call testcase.

* Include `cranelift-egraph` in `PUBLISHED_CRATES`.

* Fix atomic_rmw: not really a load.

* Remove now-unnecessary PartialOrd/Ord derivations.

* Address some code-review comments.

* Review feedback.

* Review feedback.

* No overlap in mid-end rules, because we are defining a multi-constructor.

* rustfmt

* Review feedback.

* Review feedback.

* Review feedback.

* Review feedback.

* Remove redundant `mut`.

* Add comment noting what rules can do.

* Review feedback.

* Clarify comment wording.

* Update `has_memory_fence_semantics`.

* Apply @jameysharp's improved loop-level computation.

Co-authored-by: Jamey Sharp <[email protected]>

* Fix suggestion commit.

* Fix off-by-one in new loop-nest analysis.

* Review feedback.

* Review feedback.

* Review feedback.

* Use `Default`, not `std::default::Default`, as per @fitzgen

Co-authored-by: Nick Fitzgerald <[email protected]>

* Apply @fitzgen's comment elaboration to a doc-comment.

Co-authored-by: Nick Fitzgerald <[email protected]>

* Add stat for hitting the rewrite-depth limit.

* Some code motion in split prelude to make the diff a little clearer wrt `main`.

* Take @jameysharp's suggested `try_into()` usage for blockparam indices.

Co-authored-by: Jamey Sharp <[email protected]>

* Take @jameysharp's suggestion to avoid double-match on load op.

Co-authored-by: Jamey Sharp <[email protected]>

* Fix suggestion (add import).

* Review feedback.

* Fix stack_load handling.

* Remove redundant can_store case.

* Take @jameysharp's suggested improvement to FuncEGraph::build() logic

Co-authored-by: Jamey Sharp <[email protected]>

* Tweaks to FuncEGraph::build() on top of suggestion.

* Take @jameysharp's suggested clarified condition

Co-authored-by: Jamey Sharp <[email protected]>

* Clean up after suggestion (unused variable).

* Fix loop analysis.

* loop level asserts

* Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now.

* Take @jameysharp's suggestion re: result_tys

Co-authored-by: Jamey Sharp <[email protected]>

* Fix up after suggestion

* Take @jameysharp's suggestion to use fold rather than reduce

Co-authored-by: Jamey Sharp <[email protected]>

* Fixup after suggestion

* Take @jameysharp's suggestion to remove elaborate_eclass_use's return value.

* Clarifying comment in terminator insts.

Co-authored-by: Jamey Sharp <[email protected]>
Co-authored-by: Nick Fitzgerald <[email protected]>



Revision tags: v1.0.1, v1.0.0
# 08e7a7f1 01-Sep-2022 Afonso Bordado <[email protected]>

cranelift: Add inline stack probing for x64 (#4747)

* cranelift: Add inline stack probe for x64

* cranelift: Clean up comments

Thanks @jameysharp!


Revision tags: v0.40.1, v0.40.0
# 8a9b1a90 12-Aug-2022 Benjamin Bouvier <[email protected]>

Implement an incremental compilation cache for Cranelift (#4551)

This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and a trait object to provide a backend for an all-included experience in Wasmtime.

Following Chris's suggestion, `Function` has been split into two main parts:

- on the one hand, `FunctionStencil` contains all the fields required during compilation and acts as the compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the `FunctionStencil` alone serves as the cache key.
- on the other hand, `FunctionParameters` contains the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.

Most changes are here to accommodate those requirements, in particular that `FunctionStencil` must be `Hash`able so it can be used as a key in the cache:

- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location; instead, it's automatically detected the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
  - `ExternalName` was used as the type for a `Function`'s name; while this allowed `ExternalName::Libcall` in that position, using it there would have been quite confusing. Instead, a new enum `UserFuncName` is introduced for this name, which is either a user-defined function name (the above `UserExternalName`) or a test case name.
  - `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping in the future, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and it is not trivial, as it implies touching ISLE, which I'm less familiar with.
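To illustrate why relative locations help caching, the sketch below captures the base from the first real location and stores every other location as an offset from it, so the same function body placed at two different positions in a module yields identical relative locations. `RelLoc` and `LocTracker` are hypothetical types, with `Option<u32>` standing in for `SourceLoc`; they are not Cranelift's actual definitions.

```rust
/// Hypothetical relative location; `None` means "no location".
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct RelLoc(Option<u32>);

/// Hypothetical per-function tracker holding the base location.
#[derive(Default)]
struct LocTracker {
    base: Option<u32>,
}

impl LocTracker {
    /// Converts an absolute location to a relative one. The first real
    /// location seen becomes the base automatically.
    fn relativize(&mut self, loc: Option<u32>) -> RelLoc {
        let Some(abs) = loc else {
            return RelLoc(None);
        };
        let base = *self.base.get_or_insert(abs);
        // Wrapping arithmetic still round-trips if a later location is
        // smaller than the base.
        RelLoc(Some(abs.wrapping_sub(base)))
    }

    /// Recovers the absolute location when finalizing.
    fn absolute(&self, rel: RelLoc) -> Option<u32> {
        Some(rel.0?.wrapping_add(self.base?))
    }
}
```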

The cache computes a SHA-256 hash of the `FunctionStencil` and uses it as the cache key. No equality check (using `PartialEq`) is performed when the hashes match, in the expectation that the hash alone is sufficient to avoid collisions.
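A minimal sketch of the cache-key idea: hash only the stencil, look it up, and apply the parameters afterwards during finalization. It uses the standard library's `DefaultHasher` as a stand-in for SHA-256 so the example needs no external crates; a 64-bit SipHash key is far more collision-prone than SHA-256, so this is not a substitute for the real key. `Stencil`, `Params`, and `Cache` are hypothetical simplifications, and the "compiled code" is just a string.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stencil: everything that determines the compiled body.
/// Callees are indices into a side table, not names, so renaming a callee
/// does not change the key.
#[derive(Hash)]
struct Stencil {
    insts: Vec<String>,
    callees: Vec<usize>,
}

/// Hypothetical parameters: the side table, consulted only when finalizing.
struct Params {
    callee_names: Vec<String>,
}

struct Cache {
    entries: HashMap<u64, String>,
    hits: u32,
}

impl Cache {
    fn new() -> Self {
        Cache { entries: HashMap::new(), hits: 0 }
    }

    fn compile(&mut self, stencil: &Stencil, params: &Params) -> String {
        // Key on a hash of the stencil alone; no equality check on a match.
        let mut hasher = DefaultHasher::new();
        stencil.hash(&mut hasher);
        let key = hasher.finish();

        let body = match self.entries.get(&key).cloned() {
            Some(cached) => {
                self.hits += 1;
                cached
            }
            None => {
                let compiled = stencil.insts.join("; "); // stand-in for codegen
                self.entries.insert(key, compiled.clone());
                compiled
            }
        };

        // Finalization: resolve callee indices through the parameters.
        let calls: Vec<&str> = stencil
            .callees
            .iter()
            .map(|&i| params.callee_names[i].as_str())
            .collect();
        format!("{body} | calls: {}", calls.join(", "))
    }
}
```

Compiling the same stencil twice with different callee names hits the cache on the second call, yet each result is finalized with its own names.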

A basic fuzz target has been introduced that tries to do the bare minimum:

- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
  - This last check is less efficient and less likely to happen, so probably should be rethought a bit.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.

Fixes #4155.

show more ...


# 43f17652 02-Aug-2022 Chris Fallin <[email protected]>

Cranelift: remove Baldrdash support and related features. (#4571)

* Cranelift: remove Baldrdash support and related features.

As noted in Mozilla's Bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.

In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues and instead communicate very precise
information about the expected frame size and layout, which was then
used to stitch the frame together post facto. This was brittle and caused a lot of
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.

This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425

* Review feedback.

* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.

The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.

* Review feedback.


