| 39e910be | 09-Apr-2026 |
Alex Crichton <[email protected]> |
[44.0.0] Merged backports for security advisories (#13007)
* fix(environ): repair unsound StringPool::try_clone()
The 43.0 release introduced a soundness bug in StringPool::try_clone(): the cloned map retains &'static str keys pointing into the original pool's strings storage. Once the original Linker is dropped those keys dangle.
Cloning a Linker and then dropping the original leaves a linker whose registered imports can no longer be found, causing instantiation to fail with "unknown import".
Signed-off-by: Flavio Castelli <[email protected]>
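A minimal sketch of the general idea behind such a fix, assuming a simplified interner: the clone rebuilds its lookup map from its own storage so that no key refers to the original pool. The `Pool` type and its methods are hypothetical and are not Wasmtime's actual `StringPool`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified string interner (not Wasmtime's `StringPool`).
#[derive(Default)]
struct Pool {
    /// Owned storage; an index into this vector identifies a string.
    strings: Vec<Box<str>>,
    /// Lookup map with owned keys, so nothing borrows from another pool.
    map: HashMap<Box<str>, usize>,
}

impl Pool {
    /// Interns `s`, returning its index (reusing an existing entry if any).
    fn intern(&mut self, s: &str) -> usize {
        if let Some(&idx) = self.map.get(s) {
            return idx;
        }
        let idx = self.strings.len();
        self.strings.push(s.into());
        self.map.insert(s.into(), idx);
        idx
    }

    /// Looks up a previously interned string.
    fn get(&self, s: &str) -> Option<usize> {
        self.map.get(s).copied()
    }

    /// Clones the pool, deriving every map key from the clone's own
    /// storage instead of copying keys that point into `self`.
    fn clone_rebuilding_keys(&self) -> Pool {
        let strings = self.strings.clone();
        let map = strings
            .iter()
            .enumerate()
            .map(|(idx, s)| (s.clone(), idx))
            .collect();
        Pool { strings, map }
    }
}
```

With this shape, dropping the original pool cannot invalidate lookups in the clone, because the clone owns both its storage and its keys.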
* Fix pooling allocator predicate to reset VM permissions
This commit fixes a mistake that was introduced in #9583 where the logic to reset a linear memory slot in the pooling allocator used the wrong predicate. Specifically VM permissions must be reset if virtual memory can be relied on at all, and the preexisting predicate of `can_elide_bounds_check` was an inaccurate representation of this. The correct predicate to check is `can_use_virtual_memory`.
* winch: Fix the type of the `table.size` output register
This commit corrects the tagged size of the output of the `table.size` instruction. Previously this was hardcoded as a 32-bit integer instead of consulting the table's index type to select an index-type-sized register.
* winch: Fix a host panic when executing `table.fill`
This commit fixes a possible panic when a Winch-compiled module executes the `table.fill` instruction. Refactoring in #11254 updated Cranelift but forgot to update Winch, meaning that Winch's indices were still using the module-level indices instead of the `DefinedTableIndex` space. This adds some tests and updates Winch's translation to use preexisting helpers.
* x64: Fix `f64x2.splat` without SSE3
Don't sink a load into `pshufd` which loads 16 bytes, instead force `put_in_xmm` to ensure only 8 bytes are loaded.
* Properly verify alignment in string transcoding
This commit updates string transcoding between guest modules to properly verify alignment. Previously alignment was only verified on the first allocation, not reallocations, which is not spec-compliant. This additionally fixes a possible host panic when dealing with unaligned pointers.
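As a rough illustration, an alignment check of this kind can be applied to the pointer returned by every allocation and reallocation, not only the first. The helper name `check_aligned` and its error type are assumptions for this sketch, not Wasmtime's implementation.

```rust
/// Hypothetical check intended to run on the result of *every*
/// (re)allocation. `align` is assumed to be a power of two.
fn check_aligned(ptr: usize, align: usize) -> Result<usize, String> {
    debug_assert!(align.is_power_of_two());
    // For a power-of-two alignment, the low bits must all be zero.
    if (ptr & (align - 1)) != 0 {
        return Err(format!("pointer {ptr:#x} is not aligned to {align} bytes"));
    }
    Ok(ptr)
}
```

Returning an error rather than indexing or asserting is what keeps an unaligned guest pointer from turning into a host panic.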
* Fix type confusion in AArch64 amode RegScaled folding
* winch: Add add_uextend to perform explicit extension when needed.
This commit fixes an out-of-bounds access caused by the lack of zero extension in the code responsible for calculating the heap address for loads/stores.
This issue manifests in aarch64 (unlike x64) given that no automatic extension is performed, resulting in an out-of-bounds access.
An alternative approach is to emit an extend for the index, however this approach is preferred given that it gives the MacroAssembler layer better control of how to lower addition, e.g., in aarch64 we can inline the desired extension in a single instruction.
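The effect of the missing extension can be modeled in plain Rust, assuming a 64-bit register whose upper 32 bits hold leftover garbage while the low 32 bits hold the Wasm index. Both function names are hypothetical; this models the arithmetic only, not Winch's `add_uextend`.

```rust
/// Adds the full 64-bit register to the heap base (the buggy shape):
/// stale upper bits leak into the computed address.
fn heap_addr_unextended(base: u64, index_reg: u64, offset: u64) -> u64 {
    base.wrapping_add(index_reg).wrapping_add(offset)
}

/// Zero-extends the low 32 bits of the register before adding
/// (the fixed shape): only the 32-bit index contributes.
fn heap_addr_uextend(base: u64, index_reg: u64, offset: u64) -> u64 {
    let index = u64::from(index_reg as u32); // keep the low 32 bits only
    base.wrapping_add(index).wrapping_add(offset)
}
```

For example, with a base of `0x1000_0000`, a register value of `0xdead_beef_0000_0010`, and an offset of 4, the extended form yields `0x1000_0014`, while the unextended form yields an address far outside the heap.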
* winch: Correctly type the result of table.grow
This commit fixes an out-of-bounds access caused by the lack of type narrowing from the `table.grow` builtin. Without explicit narrowing, the type is treated as a 64-bit value, which could cause issues when paired with loads/stores.
* Review comments
* Properly handle table index types
Only narrow when dealing with the 64-bit pointer/32-bit tables
* Fix panic with out-of-bounds flags in `Value`
This commit fixes a panic when a component model `Value` is lifted from a flags value which specifies out-of-bounds bits as 1. This is specified in the component model to ignore the out-of-bounds bits, which `flags!` correctly did (and thus `bindgen!`), but `Value` treated out-of-bounds bits as a panic due to indexing an array.
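One way to get the "ignore out-of-bounds bits" behavior is to iterate over the declared flag names and test each one's bit, rather than iterating over set bits and indexing the names. The sketch below assumes a single 32-bit flags word, and `lift_flags` is a hypothetical helper, not the `Value` implementation.

```rust
/// Returns the names of the flags set in `bits`. Bits at positions
/// beyond `names.len()` are never used as an index, so they are ignored.
fn lift_flags<'a>(names: &[&'a str], bits: u32) -> Vec<&'a str> {
    names
        .iter()
        .enumerate()
        // Only the first 32 names can be represented in one u32 word;
        // checking `i < 32` first also avoids an overflowing shift.
        .filter(|(i, _)| *i < 32 && (bits & (1u32 << *i)) != 0)
        .map(|(_, name)| *name)
        .collect()
}
```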
* Fix bounds checks in FACT's `string_to_compact` method
We need to bounds check the source byte length, not the number of code units.
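To illustrate the distinction, the sketch below assumes a UTF-16 source, where each code unit occupies two bytes; the commit message does not state the encoding, so that is an assumption, as is the helper name.

```rust
/// Returns true if a UTF-16 source of `code_units` units starting at
/// `ptr` fits inside a linear memory of `memory_len` bytes.
fn utf16_source_in_bounds(memory_len: usize, ptr: usize, code_units: usize) -> bool {
    // The byte length, not the code-unit count, is what must fit.
    let Some(byte_len) = code_units.checked_mul(2) else {
        return false;
    };
    match ptr.checked_add(byte_len) {
        Some(end) => end <= memory_len,
        None => false,
    }
}
```

With a 16-byte memory, `ptr = 8`, and 6 code units, a check against the code-unit count (8 + 6 = 14) would pass, while the byte-length check (8 + 12 = 20) correctly fails.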
* Add missing realloc validation in string transcoding
This commit adds a missing validation that a return value of `realloc` is in bounds during string transcoding. This was accidentally missing on the transcoding path from `utf8` to `latin1+utf16` which meant that a nearly-raw pointer could get passed to the host to perform the transcode.
* winch: Refine zero extension heuristic
This commit refines the zero extension heuristic such that it unconditionally emits a zero extension when dealing with 32-bit heaps. This eliminates any ambiguity related to the value of the memory indices across ISAs.
* Fix failure on 32-bit
* Fix miri test
---------
Signed-off-by: Flavio Castelli <[email protected]>
Co-authored-by: Flavio Castelli <[email protected]>
Co-authored-by: Shun Kashiwa <[email protected]>
Co-authored-by: Saúl Cabrera <[email protected]>
Co-authored-by: Nick Fitzgerald <[email protected]>
|
| c00e9ea2 | 02-Dec-2025 |
Chris Fallin <[email protected]> |
Cranelift: add patchable call instructions. (#12101)
* Cranelift: add patchable call instructions.
The new `patchable_call` CLIF instruction pairs with the `patchable` ABI, and emits a callsite with one new key property: the MachBuffer carries metadata that describes exactly which byte range to "NOP out" (overwrite with NOP instructions) to disable that callsite. Doing so is semantically valid and explicitly supported.
This enables patching of code at runtime to dynamically turn on and off features such as instrumentation or debugging hooks. We plan to use this to implement breakpoints in Wasmtime's guest debugging support.
As part of this change, I added a notion of "unit of NOP bytes" to the MachBuffer so that the consumer (e.g., Wasmtime's Cranelift-based code compilation pipeline and metadata-producing logic) can handle patchable callsites without any other special knowledge of the ISA.
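A consumer of this metadata might disable a callsite roughly as follows. This is a sketch under stated assumptions: `nop_out`, its parameters, and its error strings are hypothetical and are not part of the Cranelift or Wasmtime API; only the idea of a byte range plus a "unit of NOP bytes" comes from the description above.

```rust
use std::ops::Range;

/// Overwrites `callsite` within `code` with repeated copies of
/// `nop_unit`, without needing any other knowledge of the ISA.
fn nop_out(
    code: &mut [u8],
    callsite: Range<usize>,
    nop_unit: &[u8],
) -> Result<(), &'static str> {
    if nop_unit.is_empty() || callsite.start > callsite.end || callsite.end > code.len() {
        return Err("invalid patch range or NOP unit");
    }
    let region = &mut code[callsite];
    // The range must be a whole number of NOP units.
    if region.len() % nop_unit.len() != 0 {
        return Err("callsite length is not a multiple of the NOP unit");
    }
    for chunk in region.chunks_exact_mut(nop_unit.len()) {
        chunk.copy_from_slice(nop_unit);
    }
    Ok(())
}
```

On x86-64 the unit would be the single byte `0x90`; on aarch64 it would be the four-byte NOP encoding. Re-enabling the callsite would require the embedder to have saved the original bytes, which this sketch does not do.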
For the "real metal" ISAs there are perfectly well-defined NOPs to use, but for Pulley, where all opcodes are assigned at compile time by macro magic, I explicitly defined NOP as opcode byte 0 by moving `Nop`'s definition to the top of the list and adding a unit test asserting its encoding.
A design note: in principle it would be possible, as an alternative, to treat "patchability" as an orthogonal dimension of all callsites, and emit the metadata describing the instruction-offset range for any callsite with the flag set. The only truly necessary semantic restriction is that there are no return values (because if we turn the callsite off, nothing writes to them); we could support patchability for other ABIs and for the other kinds of call instructions. The `patchable` ABI would then be better described as something like the "no clobbers ABI". I opted not to generalize in this way because it creates some less-tested corners and the generalized form, at least at the MachInst level, is not really much simpler in the end.
A testing note: I opted not to implement actual code patching in the `cranelift-tools` filetest runner and test patching callsites in/out via some actuation (e.g. a magic hostcall, like we do for throws) because (i) that's a lot of new plumbing and (ii) we are going to test this very shortly in Wasmtime anyway and (iii) the correctness (or not) of the location-and-length metadata is easy enough to verify in the disassemblies in the compile-tests.
* Review feedback: remove dependence on (and test for) NOP being the literal byte 0.
|
| a3d6e407 | 06-Oct-2025 |
Chris Fallin <[email protected]> |
Cranelift: add debug tag infrastructure. (#11768)
* Cranelift: add debug tag infrastructure.
This PR adds *debug tags*, a kind of metadata that can attach to CLIF instructions and be lowered to VCode instructions and as metadata on the produced compiled code. It also adds opaque descriptor blobs carried with stackslots. Together, these two features allow decorating IR with first-class debug instrumentation that is properly preserved by the compiler, including across optimizations and inlining. (Wasmtime's use of these features will come in followup PRs.)
The key idea of a "debug tag" is to allow the Cranelift embedder to express whatever information it needs to, in a format that is opaque to Cranelift itself, except for the parts that need translation during lowering. In particular, the `DebugTag::StackSlot` variant gets translated to a physical offset into the stackframe in the compiled metadata output. So, for example, the embedder can emit a tag referring to a stackslot, and another describing an offset in that stackslot.
The debug tags exist as a *sequence* on any given instruction; the meaning of the sequence is known only to the embedder, *except* that during inlining, the tags for the inlining call instruction are prepended to the tags of inlined instructions. In this way, a canonical use-case of tags as describing original source-language frames can preserve the source-language view even when multiple functions are inlined into one.
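The prepending rule can be sketched with a simplified tag type. The `Tag` enum below is a hypothetical stand-in for `DebugTag` (only the `StackSlot` variant name comes from the text above; its payload and the `User` variant are assumptions), and `tags_after_inlining` is an illustrative helper, not Cranelift's inliner.

```rust
/// Hypothetical, simplified stand-in for Cranelift's `DebugTag`.
#[derive(Clone, Debug, PartialEq)]
enum Tag {
    /// Opaque embedder-defined value.
    User(u32),
    /// Reference to a stack slot, by index.
    StackSlot(u32),
}

/// Computes the tag sequence for an instruction after inlining: the
/// tags of the inlining call come first, then the instruction's own.
fn tags_after_inlining(call_tags: &[Tag], inst_tags: &[Tag]) -> Vec<Tag> {
    let mut out = Vec::with_capacity(call_tags.len() + inst_tags.len());
    out.extend_from_slice(call_tags);
    out.extend_from_slice(inst_tags);
    out
}
```

Applied repeatedly through nested inlining, the resulting sequence reads outermost frame first, which is how the source-language view of frames can survive inlining.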
The descriptor on a stackslot may look a little odd at first, but its purpose is to allow serializing some description of stackslot-contained runtime user-program data, in a way that is firmly attached to the stackslot. In particular, in the face of inlining, this descriptor is copied into the inlining (parent) function from the inlined function when the stackslot entity is copied; no other metadata outside Cranelift needs to track the identity of stackslots and know about that motion. This fits nicely with the ability of tags to refer to stackslots; together, the embedder can annotate instructions as having certain state in stackslots, and describe the format of that state per stackslot.
This infrastructure is tested with some compile-tests now; testing of the interpretation of the metadata output will come with end-to-end debug instrumentation tests in a followup PR.
* Review feedback: add back sequence points and enforce tags only on sequence points or calls.
* Use Vecs for debug metadata in MachBuffer to avoid SmallVec size penalty in not-used case.
* Review feedback: switch from inlined stackslot descriptor blobs to u64 keys.
|
| 62dfbd60 | 25-Sep-2025 |
Chris Fallin <[email protected]> |
Cranelift: use SP-offset amodes for `stack_addr`+load/store. (#11727)
We provide `stack_load` / `stack_store` / `stack_addr` instructions in Cranelift to operate on stack slots, and the first two are legalized to a `stack_addr` plus an ordinary load or store instruction.
We currently have lowerings for `stack_addr` that materialize an SP-relative address into a register: for example, `leaq 8(%rsp), %rax` on x86-64 or `add x0, sp, #8` on aarch64.
Taken together, we see sequences like (aarch64 / x86-64)
```
add x0, sp, #8    /   leaq 8(%rsp), %rax
str x1, [x0]      /   movq %rdx, (%rax)
```
when using `stack_store`s. In particular, we do *not* use the direct SP-relative form, which would look like
```
str x1, [sp, #8]  /   movq %rdx, 8(%rsp)
```
and which we can already generate in other cases, e.g. spillslot moves (spills/reloads) and clobber saves/restores.
This inefficiency is undesirable whenever the embedder is using stackslots, but in particular when we expect to have high memory traffic to stack slots (e.g., I am seeing this now when implementing debug instrumentation in Wasmtime, and user stack map instrumentation for GC will also benefit).
This PR adds new lowerings that use the existing synthetic address mode we already use for spillslots to emit loads/stores to stackslots directly when possible. The PR does this for x86-64 and aarch64; others could be updated later.
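The selection logic can be pictured as a pattern match on where the address operand comes from. The types and the function below are hypothetical simplifications for illustration; the real lowerings operate on the backends' own address-mode types and apply conditions ("when possible") that are not modeled here.

```rust
/// Where a load/store address comes from (hypothetical, simplified).
#[derive(Clone, Copy, Debug, PartialEq)]
enum AddrDef {
    /// The address is produced by `stack_addr` at this slot offset.
    StackAddr { slot_offset: i32 },
    /// Any other value, already materialized in a register.
    Other { reg: u8 },
}

/// Simplified address modes (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Amode {
    /// Direct SP-relative access, e.g. `str x1, [sp, #8]`.
    SpOffset(i64),
    /// Access through a base register plus offset.
    RegOffset { base: u8, offset: i64 },
}

/// Folds a `stack_addr` into the access itself instead of first
/// materializing the address into a register.
fn select_amode(addr: AddrDef, access_offset: i32) -> Amode {
    match addr {
        AddrDef::StackAddr { slot_offset } => {
            // Widen before adding so the sum cannot overflow.
            Amode::SpOffset(i64::from(slot_offset) + i64::from(access_offset))
        }
        AddrDef::Other { reg } => Amode::RegOffset {
            base: reg,
            offset: i64::from(access_offset),
        },
    }
}
```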
|