| f302ebd6 | 30-Apr-2026 |
wasmtime-publish <[email protected]> |
Release Wasmtime 44.0.1 (#13241)
[automatically-tag-and-release-this-commit]
Co-authored-by: Wasmtime Publish <[email protected]> |
| 39e910be | 09-Apr-2026 |
Alex Crichton <[email protected]> |
[44.0.0] Merged backports for security advisories (#13007)
* fix(environ): repair unsound StringPool::try_clone()
The 43.0 release introduced a soundness bug in StringPool::try_clone(): the cloned map retains &'static str keys pointing into the original pool's strings storage. Once the original Linker is dropped those keys dangle.
Cloning a Linker, then dropping the original one, leaves a linker whose registered imports could no longer be found, causing instantiation to fail with "unknown import".
Signed-off-by: Flavio Castelli <[email protected]>
* Fix pooling allocator predicate to reset VM permissions
This commit fixes a mistake that was introduced in #9583 where the logic to reset a linear memory slot in the pooling allocator used the wrong predicate. Specifically VM permissions must be reset if virtual memory can be relied on at all, and the preexisting predicate of `can_elide_bounds_check` was an inaccurate representation of this. The correct predicate to check is `can_use_virtual_memory`.
* winch: Fix the type of the `table.size` output register
This commit corrects the tagged size of the output of the `table.size` instruction. Previously this was hardcoded as a 32-bit integer rather than consulting the table's index type and using a register sized for that index type.
* winch: Fix a host panic when executing `table.fill`
This commit fixes a possible panic when a Winch-compiled module executes the `table.fill` instruction. Refactoring in #11254 updated Cranelift but forgot to update Winch meaning that Winch's indices were still using the module-level indices instead of the `DefinedTableIndex` space. This adds some tests and updates Winch's translation to use preexisting helpers.
* x64: Fix `f64x2.splat` without SSE3
Don't sink a load into `pshufd` which loads 16 bytes, instead force `put_in_xmm` to ensure only 8 bytes are loaded.
* Properly verify alignment in string transcoding
This commit updates string transcoding between guest modules to properly verify alignment. Previously alignment was only verified on the first allocation, not reallocations, which is not spec-compliant. This additionally fixes a possible host panic when dealing with unaligned pointers.
* Fix type confusion in AArch64 amode RegScaled folding
* winch: Add add_uextend to perform explicit extension when needed.
This commit fixes an out-of-bounds access caused by the lack of zero extension in the code responsible for calculating the heap address for loads/stores.
This issue manifests in aarch64 (unlike x64) given that no automatic extension is performed, resulting in an out-of-bounds access.
An alternative approach is to emit an extend for the index, however this approach is preferred given that it gives the MacroAssembler layer better control of how to lower addition, e.g., in aarch64 we can inline the desired extension in a single instruction.
* winch: Correctly type the result of table.grow
This commit fixes an out-of-bounds access caused by the lack of type narrowing from the `table.grow` builtin. Without explicit narrowing, the type is treated as a 64-bit value, which could cause issues when paired with loads/stores.
* Review comments
* Properly handle table index types
Only narrow when dealing with the 64-bit pointer/32-bit tables
* Fix panic with out-of-bounds flags in `Value`
This commit fixes a panic when a component model `Value` is lifted from a flags value which specifies out-of-bounds bits as 1. This is specified in the component model to ignore the out-of-bounds bits, which `flags!` correctly did (and thus `bindgen!`), but `Value` treated out-of-bounds bits as a panic due to indexing an array.
* Fix bounds checks in FACT's `string_to_compact` method
We need to bounds check the source byte length, not the number of code units.
* Add missing realloc validation in string transcoding
This commit adds a missing validation that a return value of `realloc` is inbounds during string transcoding. This was accidentally missing on the transcoding path from `utf8` to `latin1+utf16` which meant that a nearly-raw pointer could get passed to the host to perform the transcode.
* winch: Refine zero extension heuristic
This commit refines the zero extension heuristic such that it unconditionally emits a zero extension when dealing with 32-bit heaps. This eliminates any ambiguity related to the value of the memory indices across ISAs.
* Fix failure on 32-bit
* Fix miri test
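To illustrate the `Value` flags fix listed above, here is a minimal sketch of lifting a flags bitmask while ignoring out-of-bounds bits. The helper `lift_flags` and its signature are hypothetical and are not Wasmtime's API; the sketch assumes at most 64 flags packed into a `u64`.

```rust
/// Hypothetical helper (not Wasmtime's API): returns the names of the flags
/// set in `bits`, ignoring any set bit that has no corresponding name.
fn lift_flags<'a>(bits: u64, names: &[&'a str]) -> Vec<&'a str> {
    (0..64usize)
        .filter(|&i| (bits & (1u64 << i)) != 0)
        // `get` returns `None` for out-of-bounds indices, so stray bits are
        // skipped instead of panicking the way `names[i]` would.
        .filter_map(|i| names.get(i).copied())
        .collect()
}
```

With two declared flags, `lift_flags(0b10_0001, &["read", "write"])` yields only `"read"`: bit 5 has no name and is dropped rather than indexed.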
---------
Signed-off-by: Flavio Castelli <[email protected]>
Co-authored-by: Flavio Castelli <[email protected]>
Co-authored-by: Shun Kashiwa <[email protected]>
Co-authored-by: Saúl Cabrera <[email protected]>
Co-authored-by: Nick Fitzgerald <[email protected]>
|
| eb4c5279 | 04-Apr-2026 |
Till Schneidereit <[email protected]> |
Fix another panic optimizing vector expressions (#12961)
Leftover from #12957 |
| 1e73c1f1 | 03-Apr-2026 |
Alex Crichton <[email protected]> |
Fix panic optimizing vector expressions (#12957)
This commit fixes an accidental regression from #12926 where `iconst_u` was called with vector types, which caused a panic. The fix is to disallow vector types in these ISLE rules and defer vector optimizations to a future commit.
|
| 0bc447b3 | 03-Apr-2026 |
Chris Fallin <[email protected]> |
Cranelift: riscv64: fix zero-extension in trapif + icmp folding. (#12952)
* Add bad test.
* Fix lowering to properly zero-extend compared values. |
| 7e432118 | 03-Apr-2026 |
Alex Crichton <[email protected]> |
riscv64: Fix `uadd_overflow` for 32-bit integers (#12951)
This commit fixes a mistake from #11583 where the implementation of `uadd_overflow` on riscv64 was not correct for some inputs. This fix generates the same codegen as `uadd_overflow_trap` which is to zero-extend both inputs, perform a 64-bit add, and use the 33rd bit as the overflow flag.
This sequence does notably differ from what LLVM generates. For example this input function
    #[unsafe(no_mangle)]
    pub fn uadd_overflow(a: u64, b: u64) -> (u32, bool) {
        (a as u32).overflowing_add(b as u32)
    }

generates:

    uadd_overflow:
        addw a0, a0, a1
        sext.w a1, a1
        sltu a1, a0, a1
        ret
While this is probably correct I find it tough to reason about how `addw` produces a sign-extended result, `sext.w` sign-extends one of the operands, and then an unsigned comparison is used to generate the overflow flag for an unsigned addition. Overall I felt it was easier to just match the `uadd_overflow_trap` codegen.
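The zero-extend, 64-bit add, 33rd-bit sequence described above can be modeled in plain Rust. This is only a sketch of the arithmetic, not the riscv64 lowering itself, and the function name `uadd_overflow_32` is made up for illustration.

```rust
/// Sketch of the sequence described above: zero-extend both 32-bit inputs,
/// add them as 64-bit values, and read bit 32 (the 33rd bit) as the
/// overflow flag.
fn uadd_overflow_32(a: u32, b: u32) -> (u32, bool) {
    // Two zero-extended 32-bit values can never overflow a 64-bit add.
    let wide = u64::from(a) + u64::from(b);
    let overflow = ((wide >> 32) & 1) != 0;
    // Truncating back to 32 bits gives the wrapped sum.
    (wide as u32, overflow)
}
```

The result should agree with `u32::overflowing_add` for every input pair, which makes it straightforward to cross-check.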
|
| 2a50190f | 03-Apr-2026 |
Alex Crichton <[email protected]> |
x64: Fix possible overflow in `Amode::offset` (#12949)
* x64: Fix possible overflow in `Amode::offset`
This commit fixes an issue in the x64 backend of Cranelift where the `Amode::offset` method contained unchecked arithmetic, meaning that it could possibly overflow. This in turn could lead to a miscompile of loading/storing 128-bit integers where this method is used to generate an `Amode` that is 8 bytes beyond the base address to load the upper bits. This miscompile isn't reachable from WebAssembly but is nonetheless still a good bugfix to have for Cranelift.
The fix here is to switch the `Amode::offset` method to being fallible, returning `None` on overflow. This then propagates up into ISLE where the `amode_offset` helper now has a separate case for when the addition fails, using `lea` to generate a register with an address in it. This then subsequently also needed fixing for various `Atomic128*` operations where instead of storing just a single `SyntheticAmode` they now store two, one for the address of the low bits and one for the address of the high bits.
* Fix tests
Notably package up all the arguments into a boxed structure for the atomic128 ops to avoid making `Inst` too large.
* Fix clippy
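The shape of the fallible `Amode::offset` change can be sketched with a simplified stand-in type. `SimpleAmode` below is hypothetical and far smaller than Cranelift's real `Amode`; it only assumes a base register plus a signed 32-bit displacement.

```rust
/// Hypothetical, simplified stand-in for an x64 addressing mode: a base
/// register number plus a signed 32-bit displacement.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SimpleAmode {
    base: u8,
    disp: i32,
}

impl SimpleAmode {
    /// Returns the addressing mode `extra` bytes further on, or `None` if
    /// the new displacement no longer fits in 32 bits.
    fn offset(self, extra: i32) -> Option<SimpleAmode> {
        let disp = self.disp.checked_add(extra)?;
        Some(SimpleAmode { disp, ..self })
    }
}
```

A caller that wants the upper half of a 128-bit value would ask for `amode.offset(8)` and, on `None`, fall back to computing the address into a register first, which is the role the commit message gives to `lea`.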
|
| 5b5e3573 | 02-Apr-2026 |
Chris Fallin <[email protected]> |
Cranelift: aarch64: fix `preserve-all` to save full vector registers. (#12944)
It turns out that the `preserve-all` ABI was only preserving some, not all (false advertising!): specifically, the aarch64 ABI code was continuing to use low-64-bit loads/stores on vector/float registers, as it does for the ordinary AAPCS (SysV) calling convention. `PreserveAll` specifically indicates that the *entire* vector register should be saved; so now we do that.
|
| 4d4e9033 | 02-Apr-2026 |
Hyunbin Kim <[email protected]> |
[Cranelift] add simplification rules (#12937) |
| 763622c3 | 01-Apr-2026 |
Nick Fitzgerald <[email protected]> |
Preserve `try_call[_indirect]` stack maps during lowering (#12934)
* Preserve `try_call[_indirect]` stack maps during lowering
Branch instructions are skipped in the main lowering loop, which means the stack map forwarding code is never reached for them. The branch lowering path didn't forward stack maps either. This was fine because branch instructions couldn't previously ever be safepoints. However, with the introduction of `try_call` and `try_call_indirect`, we now have instructions that are both safepoints and branches.
This caused GC references live across `try_call[_indirect]` instructions to not be traced during garbage collection, leading to use-after-free within the GC heap sandbox when the collector swept those untraced-but-still-live objects.
The fix adds stack map forwarding after branch lowering, mirroring the existing logic for non-branch instructions.
Fixes bytecodealliance/wasmtime#11753.
* update disas test
|
| bac0e78f | 01-Apr-2026 |
Alex Crichton <[email protected]> |
aarch64: Disable csdb emission by default (#12932)
* aarch64: Disable csdb emission by default
This has a massive performance penalty on macOS, for example, and peer compilers are not emitting this as part of on-by-default mitigations. This commit preserves the option to emit it with an aarch64-specific `use_csdb` flag, but the default is now `false` meaning that this is not emitted by default.
Closes #12789
* Fix tests
* Fix tests & review comments
* Use ISLE rule introduced
|
| 15783254 | 01-Apr-2026 |
Hyunbin Kim <[email protected]> |
[Cranelift] add simplification rules (#12926)
[Cranelift] Add arithmetic simplification rules |
| 33e8b3d9 | 31-Mar-2026 |
Alex Crichton <[email protected]> |
aarch64: Fix miscompile lowering the `extr` instruction (#12907)
This commit fixes a miscompile in the lowering of the `extr` instruction for the aarch64 backend where one of the shift operands is 0. In this edge case the generated `extr` instruction did not match the input CLIF semantics, calculating a different value. The fix here is to only use the `extr` instruction when both immediates are larger than 0.
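To see why a zero shift amount is an edge case, the following sketch models `extr` next to a shift-and-or expression. The exact CLIF pattern matched by the backend rule is an assumption here (a `bor` of an `ishl` and a `ushr` whose amounts sum to 64), and all function names are hypothetical. The sketch relies on CLIF masking shift amounts to the type's bit width.

```rust
/// Model of the A64 `extr` instruction for 64-bit operands: bits
/// `lsb..lsb + 64` of the 128-bit concatenation `hi:lo`, with `lsb < 64`.
fn extr(hi: u64, lo: u64, lsb: u32) -> u64 {
    assert!(lsb < 64);
    let wide = (u128::from(hi) << 64) | u128::from(lo);
    (wide >> lsb) as u64
}

/// Model of the assumed CLIF expression `bor(ishl(x, a), ushr(y, b))`,
/// with shift amounts masked to the 64-bit width.
fn clif_shift_or(x: u64, a: u32, y: u64, b: u32) -> u64 {
    (x << (a % 64)) | (y >> (b % 64))
}

/// Hypothetical guard mirroring the fix: only fuse into `extr` when both
/// immediates are larger than 0 and together cover 64 bits.
fn can_use_extr(a: u32, b: u32) -> bool {
    a > 0 && b > 0 && a < 64 && b < 64 && a + b == 64
}
```

For amounts such as `a = 64, b = 0`, masking turns the CLIF expression into `x | y`, while `extr(x, y, 0)` returns just `y`, so the two disagree and the guard rejects the pair.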
|
| f81160c2 | 31-Mar-2026 |
Alex Crichton <[email protected]> |
aarch64: Fix `splat(ireduce(iconst(...)))` (#12902)
This commit fixes a lowering rule in the aarch64 Cranelift backend. Specifically a combined `splat(ireduce(_))` combo would pass an immediate to the `splat_const` helper which had higher bits set since the `ireduce` wasn't const-propagated. The fix applied here is to delete the `ireduce`-related rule and rely on mid-end optimizations to fold the `ireduce(iconst(...))` appropriately. This ensures that the `u64` value passed into the `splat_const` rule is indeed the exact value that's being splatted.
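The effect of folding `ireduce(iconst(...))` before splatting can be sketched as plain bit masking. Both helpers below are hypothetical models, not Cranelift's `splat_const` or its mid-end rules; they only show why the lane value must already be narrowed.

```rust
/// Folds `ireduce(iconst(value))` to a `bits`-wide type by keeping only the
/// low `bits` bits. Hypothetical helper, not Cranelift's implementation.
fn fold_ireduce(value: u64, bits: u32) -> u64 {
    if bits >= 64 {
        value
    } else {
        value & ((1u64 << bits) - 1)
    }
}

/// Repeats an already-narrowed `lane` value across a 128-bit vector and
/// rejects inputs that still have bits set above the lane width.
fn splat_lane(lane: u64, lane_bits: u32) -> Option<u128> {
    if lane_bits == 0 || lane_bits > 64 || 128 % lane_bits != 0 {
        return None;
    }
    if fold_ireduce(lane, lane_bits) != lane {
        return None; // stray high bits: this is not the value being splatted
    }
    let mut out = 0u128;
    for i in 0..(128 / lane_bits) {
        out |= u128::from(lane) << (i * lane_bits);
    }
    Some(out)
}
```

Passing the unreduced constant `0x1ff` as an 8-bit lane is rejected, while the folded value `0xff` splats to all ones.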
|
| 2f7dbd61 | 31-Mar-2026 |
Chris Fallin <[email protected]> |
PCC: remove proof-carrying code (for now?). (#12800)
In late 2023, we built out an experimental feature called Proof-Carrying Code (PCC), where we attached "facts" to values in the CLIF IR and built verification of these facts after lowering to machine instructions. We also added "memory types" describing layout of memory and a "checked" flag on memory operations such that we could verify that any checked memory operation accessed valid memory (as defined by memory types attached to pointer values via facts). Wasmtime's Cranelift backend then put appropriate memory types and facts in its IR such that all accesses to memory (aspirationally) could be checked, taking the whole mid-end and lowering backend of Cranelift out of the trusted core that enforces SFI.
This basically worked, at the time, for static memories; but never for dynamic memories, and then work on the feature lost prioritization (aka I had to work on other things) and I wasn't able to complete it and put it in fuzzing/enable it as a production option.
Unfortunately since then it has bit-rotted significantly -- as we add new backend optimizations and instruction lowerings we haven't kept the PCC framework up to date.
Inspired by the discussion in #12497 I think it's time to delete it (hopefully just "for now"?) unless/until we can build it again. And when we do that, we should probably get it to the point of validating robust operation on all combinations of memory configurations before merging. (That implies a big experiment branch rather than a bunch of eager PRs in-tree, but so it goes.) I still believe it is possible to build this (and I have ideas on how to do it!) but not right now.
|
| d70a517c | 30-Mar-2026 |
Nick Fitzgerald <[email protected]> |
Print stack maps for `try_call[_indirect]` CLIF instructions (#12891)
* Print stack maps for `try_call[_indirect]` CLIF instructions
The safepoint liveness analysis already correctly records stack maps for these instructions, but the display omission hid this from view.
* cargo fmt
|
| 9d7677c2 | 30-Mar-2026 |
Alex Crichton <[email protected]> |
x64: Refactor `x64_load` helper (#12877)
It's mostly unused nowadays so whittle it down to exactly what's necessary, which is to say:
* Loading 64-bits is "const propagated" to `x64_movq_rm`
* The helper itself is now `x64_load_xmm` and returns a typed `Xmm` register which handles only XMM-related types.
|
| f33f15e6 | 30-Mar-2026 |
Alex Crichton <[email protected]> |
Use `if cfg!(...)` instead of `#[cfg]` (#12889)
Minor style follow-up from #12841 |
| 8da1acec | 30-Mar-2026 |
Alex Crichton <[email protected]> |
x64: Shrink size of `x64_not` in `icmp` peephole optimization (#12876)
This commit fixes a minor issue where a `not` instruction was translated with a 64-bit width when the input was a 32-bit value. This didn't end up actually affecting the semantics of the instruction itself but it's not necessary to have a full 64-bit negation, so this commit updates it to a 32-bit negation which was the original intention here.
|
| c2abcc2a | 27-Mar-2026 |
Shun Kashiwa <[email protected]> |
Fix fmin f16 cprop rule using $F32 instead of $F16 (#12854) |
| baa6b27b | 26-Mar-2026 |
Chris Fallin <[email protected]> |
Cranelift: rework MachBuffer to handle very short-deadline jumps. (#12842)
* Cranelift: rework MachBuffer to handle very short-deadline jumps.
In #12811 it was reported that riscv64 compressed jumps (`c.j` instructions), with a +/- 2048-byte range, could cause panics when combined with queued-up/deferred constants in a constant pool during binary emission.
Our `MachBuffer` handles single-pass machine code emission, resolution of labels, and upgrading of label ranges via "veneers" (jumps that a shorter jump can reach that themselves have a longer range). We track a pending "deadline" of all unresolved branches, and when the deadline is too close (including the max size of all veneers yet to be emitted), we emit an "island" of all veneers to resolve the deadline.
After its initial design, we added support for deferred traps and constants to the `MachBuffer`. These worked by emitting their contents *before* the "island" of veneers, which turns out to be slightly nicer for code layout in some cases.
Unfortunately the full implications of those additions weren't realized against the invariants of the deadline-resolution algorithm. In particular, when a new branch is added with a very short range (e.g., `c.j`), it is possible that there are *already* too many queued-up traps/constants for the range of that just-emitted branch to reach even the first possible veneer site if we start an island right away.
Thus it is strictly necessary to emit the veneers before constants/traps. Unfortunately this requires some alterations to other aspects of label resolution as well: in particular, we can't resolve fixups for label references to constants before we emit those constants, and likewise for traps. Note that we do a fixpoint loop over emitting island(s) at the end of emission, so all constants/traps *will* be emitted and label references to them *will* be resolved eventually; just in the opposite order, now.
No compile test because the particular reduced testcase in #12811 only worked in the `release-36.0.0` branch, and not on `main`, and it was too hard to tweak the test to hit the right case on `main` as well. In lieu of that, I've added a unit test directly to the `MachBuffer` implementation to exercise this case.
Fixes #12811.
* fix filetest with errant comments confusing precise-output check
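The ordering problem described in this commit comes down to simple offset arithmetic, sketched below. This is a toy model and not `MachBuffer`'s implementation: the names, the single-veneer simplification, and the concrete byte counts in the note that follows are all assumptions.

```rust
/// Order in which an island lays out its contents (simplified model).
#[derive(Clone, Copy)]
enum IslandOrder {
    /// Old layout: deferred constants/traps first, then veneers.
    ConstantsThenVeneers,
    /// Layout after the rework: veneers first, then constants/traps.
    VeneersThenConstants,
}

/// Offset of the first veneer when an island starts at `island_start` and
/// `pending_bytes` of deferred constants/traps are queued.
fn first_veneer_offset(island_start: u32, pending_bytes: u32, order: IslandOrder) -> u32 {
    match order {
        IslandOrder::ConstantsThenVeneers => island_start + pending_bytes,
        IslandOrder::VeneersThenConstants => island_start,
    }
}

/// Whether a forward branch at `branch_off` with `range` bytes of reach can
/// still target the first veneer of an island started at `island_start`.
fn veneer_reachable(
    branch_off: u32,
    range: u32,
    island_start: u32,
    pending_bytes: u32,
    order: IslandOrder,
) -> bool {
    let veneer = first_veneer_offset(island_start, pending_bytes, order);
    veneer.saturating_sub(branch_off) <= range
}
```

Take a branch at offset 100 with a 2048-byte range and 4096 bytes of constants already queued. Even if the island starts immediately at offset 102, the old order puts the first veneer at 4198, which is out of reach, whereas the new order puts it at 102.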
|
| fa8cd552 | 25-Mar-2026 |
Chris Fallin <[email protected]> |
Cranelift: perform aggressive splitting of generated ISLE function bodies in debug builds. (#12841)
As investigated in #12821, ISLE-generated Rust code with many locals in one large function body results in unreasonably-large stack frames when rustc's optimizations (and therefore LLVM's mem2reg) are disabled. In `constructor_simplify`, a single function body generated from all mid-end rewrite rules, I was seeing a stack-frame size of `0x44000` bytes (272 KiB) on x86-64 (while in an opt build it is `0x1000` bytes (4 KiB)). With a recursion depth of 5 (the default limit) for rewrites of RHS-created nodes, this has the potential to overrun a 1 MiB stack; and is generally quite wasteful.
In [another branch] I attempted to make `islec` do manual regalloc of a sort just for multi-extractor iterators; but soon realized that the problem has to do with *all* locals (which become `alloca`s in LLVM IR), not just the iterators. We could do the equivalent thing to share all locals but this would become grossly un-idiomatic for e.g. LHSes of destructuring in Rust.
Instead, this PR leverages previous work in #12303 (thanks Bongjun!) that splits function bodies in the generated Rust code from ISLE with a configurable threshold. This PR enables that feature and makes the threshold quite low in debug builds. This has the natural effect of splitting locals among many smaller stack frames, which are dynamically pushed only when matching enters a particular subtree; so stack usage more closely mirrors the actual live-set of values bound during matching.
With this change, the stack frame for the top-level `constructor_simplify` is < 4 KiB on x86-64 in a debug build.
Fixes #12821.
[another branch]: https://github.com/cfallin/wasmtime/tree/isle-iterator-reuse
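The stack-size figures quoted in this commit can be checked with a few lines of arithmetic. The sketch assumes one `constructor_simplify` frame per recursion level and ignores every other frame on the stack, so it is a rough estimate rather than a measurement.

```rust
const KIB: u64 = 1024;
/// Frame sizes and limits quoted in the commit message.
const DEBUG_FRAME_BYTES: u64 = 0x44000;
const OPT_FRAME_BYTES: u64 = 0x1000;
const RECURSION_DEPTH: u64 = 5;
const STACK_BYTES: u64 = 1024 * KIB; // 1 MiB

/// Stack consumed by `depth` nested frames of `frame_bytes` each
/// (assumption: exactly one such frame per recursion level).
fn worst_case_stack(frame_bytes: u64, depth: u64) -> u64 {
    frame_bytes * depth
}

/// Whether the nested frames alone would exceed the stack budget.
fn overruns_stack(frame_bytes: u64, depth: u64, stack_bytes: u64) -> bool {
    worst_case_stack(frame_bytes, depth) > stack_bytes
}
```

Under that assumption, five debug-build frames need 1360 KiB, which is more than 1 MiB, while five optimized frames need only 20 KiB.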
|
| 2811ee83 | 24-Mar-2026 |
Mikhail Katychev <[email protected]> |
feat(style,doc): added typos-cli workspace configuration (#12827)
* init config values
* more manual changes
* typos write
* revert certain changes
* misused, tightened up hex encoding |
| 0e51666b | 24-Mar-2026 |
SSD <[email protected]> |
CI: Add a check for cranelift-codegen (only x86 and aarch64) on no_std targets (#12812)
* CI: Add a check for cranelift-codegen (only x86 and aarch64) on no_std targets
* Change -F to --features
* Fix CI: remove unused imports when building without unwind
* Cargo fmt
|
| ab78bd82 | 22-Mar-2026 |
Ho Kim <[email protected]> |
fix: correct various typos (#12807)
Signed-off-by: Ho Kim <[email protected]> |