|
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7 |
|
| #
2f7dbd61 |
| 31-Mar-2026 |
Chris Fallin <[email protected]> |
PCC: remove proof-carrying code (for now?). (#12800)
In late 2023, we built out an experimental feature called Proof-Carrying Code (PCC), where we attached "facts" to values in the CLIF IR and built verification of these facts after lowering to machine instructions. We also added "memory types" describing layout of memory and a "checked" flag on memory operations such that we could verify that any checked memory operation accessed valid memory (as defined by memory types attached to pointer values via facts). Wasmtime's Cranelift backend then put appropriate memory types and facts in its IR such that all accesses to memory (aspirationally) could be checked, taking the whole mid-end and lowering backend of Cranelift out of the trusted core that enforces SFI.
This basically worked, at the time, for static memories; but never for dynamic memories, and then work on the feature lost prioritization (aka I had to work on other things) and I wasn't able to complete it and put it in fuzzing/enable it as a production option.
Unfortunately since then it has bit-rotted significantly -- as we add new backend optimizations and instruction lowerings we haven't kept the PCC framework up to date.
Inspired by the discussion in #12497 I think it's time to delete it (hopefully just "for now"?) unless/until we can build it again. And when we do that, we should probably get it to the point of validating robust operation on all combinations of memory configurations before merging. (That implies a big experiment branch rather than a bunch of eager PRs in-tree, but so it goes.) I still believe it is possible to build this (and I have ideas on how to do it!) but not right now.
|
|
Revision tags: v43.0.0, v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6 |
|
| #
b5d2ff5d |
| 18-Feb-2026 |
Chris Fallin <[email protected]> |
Cranelift: update regalloc2 to 0.15.0 to permit more VRegs. (#12611)
* Cranelift: update regalloc2 to 0.15.0 to permit more VRegs.
This pulls in bytecodealliance/regalloc2#257 to permit more VRegs to be used in a single function body, addressing #12229 and our followup discussions about supporting function body sizes up to the Wasm implementation limit standard.
In addition to the RA2 upgrade, this also includes a bit more explicit limit-checking on the Cranelift side: note that we don't directly use `regalloc2::VReg` but instead we further bitpack it into `Reg`, which is logically a sum type of `VReg`, `PReg` and `SpillSlot` (the last one needed to represent stack allocation locations on defs, e.g. on callsites with many returns). `PReg`s are packed into the beginning of the `VReg` index space but `SpillSlot`s are distinguished by stealing the upper bit of a `u32`. This was previously not a problem given the smaller `VReg` index space but now we need to check explicitly; hence `Reg::from_virtual_reg_checked` and its use in the lowering vreg allocator. Because the `VReg` index packs the class into the bottom two bits, and index into the upper 30, but we steal one bit at the top, the true limit for VReg count is thus actually 2^29, or 512M.
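The limit arithmetic described above can be checked with a small sketch. Only the bit layout (two class bits at the bottom, the index above them, and the top bit of the `u32` reserved for spill slots) comes from the text; the constants and the `pack_vreg_checked` helper are hypothetical names for illustration and are not the regalloc2 or Cranelift API.

```rust
/// Low bits of the packed `u32` that hold the register class.
const CLASS_BITS: u32 = 2;
/// Top bit of the `u32`, reserved in this sketch to mark spill slots.
const SPILLSLOT_BIT: u32 = 1 << 31;
/// Index bits that remain usable: 32 total, minus the class bits, minus the stolen bit.
const USABLE_INDEX_BITS: u32 = 32 - CLASS_BITS - 1;
/// Resulting VReg count limit: 2^29.
const MAX_VREGS: u32 = 1 << USABLE_INDEX_BITS;

/// Hypothetical checked packing: returns `None` when the index or class does not fit.
fn pack_vreg_checked(index: u32, class: u32) -> Option<u32> {
    if class >= (1 << CLASS_BITS) || index >= MAX_VREGS {
        return None;
    }
    let bits = (index << CLASS_BITS) | class;
    // An in-range index can never collide with the spill-slot marker bit.
    debug_assert_eq!(bits & SPILLSLOT_BIT, 0);
    Some(bits)
}
```

With 30 nominal index bits and one bit stolen, 29 bits remain, so `MAX_VREGS` is 2^29 = 536,870,912, matching the "512M" figure in the message.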
Fixes #12229.
* Drop `code_too_large` test.
|
|
Revision tags: v41.0.3 |
|
| #
9fe4cc18 |
| 04-Feb-2026 |
Philip Craig <[email protected]> |
cranelift: improve debug value locations around cold blocks (#12484)
If a debug range started or ended on an instruction from a cold block, we omitted the entire range. Fix by skipping the instructions in the cold block to find the next valid offset.
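A minimal sketch of the "skip to the next valid offset" idea follows. It assumes, purely for illustration, that each instruction carries `Some(offset)` when it has a usable offset and `None` when it belongs to a cold block; the `next_valid_offset` helper and this representation are hypothetical and do not reflect the actual Cranelift data structures.

```rust
/// `offsets[i]` is the code offset of instruction `i`, or `None` if the
/// instruction lives in a cold block (a modeling assumption of this sketch).
/// Returns the first usable offset at or after `start`, if any.
fn next_valid_offset(offsets: &[Option<u32>], start: usize) -> Option<u32> {
    // `get(start..)` yields `None` when `start` is past the end of the slice;
    // `flatten` then skips over the cold-block (`None`) entries.
    offsets.get(start..)?.iter().flatten().next().copied()
}
```

Under this model, a range boundary that falls on a cold-block instruction is moved forward instead of causing the whole range to be dropped.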
|
|
Revision tags: v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1 |
|
| #
76911c29 |
| 07-Jan-2026 |
SSD <[email protected]> |
Partial support for no_std in cranelift_codegen (#12222)
* Move most things from std to core and alloc
* Port assembler_x64 to no_std
* before adding prelude to each file
* Most of the files now work with no_std
* update isle to use alloc and core
* some instances shouldn't have been renamed, fixes cargo test
* add cranelift-assembler-x64 (no_std) to CI
* fix codegen_meta, missed one spot with std::slice
* automatically remove prelude with cargo fix
* update isle changes
* update assembler changes
* update assembler changes
* use latest codegen changes + fix FxHash problem
* add imports
* fix floating issues with libm
* remove unused import
* temporarily remove OnceLock
* add no_std arm support and add it into CI
* Move most things from std to core and alloc
* Port assembler_x64 to no_std
* before adding prelude to each file
* Most of the files now work with no_std
* update isle to use alloc and core
* some instances shouldn't have been renamed, fixes cargo test
* add cranelift-assembler-x64 (no_std) to CI
* automatically remove prelude with cargo fix
* update isle changes
* update assembler changes
* update assembler changes
* use latest codegen changes + fix FxHash problem
* add imports
* fix floating issues with libm
* remove unused import
* temporarily remove OnceLock
* add no_std arm support and add it into CI
* Move most things from std to core and alloc
* Port assembler_x64 to no_std
* before adding prelude to each file
* Most of the files now work with no_std
* update isle to use alloc and core
* add cranelift-assembler-x64 (no_std) to CI
* automatically remove prelude with cargo fix
* update isle changes
* update assembler changes
* use latest codegen changes + fix FxHash problem
* add imports
* fix floating issues with libm
* temporarily remove OnceLock
* add no_std arm support and add it into CI
* revert Cargo.toml formatting
* remove prelude and fix cargo.toml
* cargo fmt
* remove empty lines
* bad renames
* macro_use only on no_std
* revert OnceLock change
* only use stable libm features
* update regalloc2
* update comment
* use continue instead
* Update vets
---------
Co-authored-by: Alex Crichton <[email protected]>
|
| #
0889323a |
| 03-Jan-2026 |
SSD <[email protected]> |
cranelift-codegen: rename most uses of std to core and alloc (#12237)
* rename most std uses to core and alloc
* cargo fmt
|
|
Revision tags: v40.0.0, v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5, v38.0.3, v38.0.2, v38.0.1, v37.0.2 |
|
| #
a3d6e407 |
| 06-Oct-2025 |
Chris Fallin <[email protected]> |
Cranelift: add debug tag infrastructure. (#11768)
* Cranelift: add debug tag infrastructure.
This PR adds *debug tags*, a kind of metadata that can attach to CLIF instructions and be lowered to VCode instructions and as metadata on the produced compiled code. It also adds opaque descriptor blobs carried with stackslots. Together, these two features allow decorating IR with first-class debug instrumentation that is properly preserved by the compiler, including across optimizations and inlining. (Wasmtime's use of these features will come in followup PRs.)
The key idea of a "debug tag" is to allow the Cranelift embedder to express whatever information it needs to, in a format that is opaque to Cranelift itself, except for the parts that need translation during lowering. In particular, the `DebugTag::StackSlot` variant gets translated to a physical offset into the stackframe in the compiled metadata output. So, for example, the embedder can emit a tag referring to a stackslot, and another describing an offset in that stackslot.
The debug tags exist as a *sequence* on any given instruction; the meaning of the sequence is known only to the embedder, *except* that during inlining, the tags for the inlining call instruction are prepended to the tags of inlined instructions. In this way, a canonical use-case of tags as describing original source-language frames can preserve the source-language view even when multiple functions are inlined into one.
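The prepending rule for inlining can be sketched as follows. The `Tag` enum and `tags_after_inlining` function are hypothetical stand-ins chosen for this example (the message names a `DebugTag::StackSlot` variant, but the shapes used here are assumptions); only the ordering rule is taken from the text.

```rust
/// Toy stand-in for a debug tag; the payload types are assumptions.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Tag {
    /// Opaque embedder-defined value.
    User(u32),
    /// Reference to a stack slot, by index in this sketch.
    StackSlot(u32),
}

/// Tags of an inlined instruction: the call instruction's tags come first,
/// followed by the tags the instruction already had in the callee.
fn tags_after_inlining(call_site: &[Tag], callee_inst: &[Tag]) -> Vec<Tag> {
    let mut out = Vec::with_capacity(call_site.len() + callee_inst.len());
    out.extend_from_slice(call_site);
    out.extend_from_slice(callee_inst);
    out
}
```

Applying the rule once per inlining step yields an outermost-to-innermost sequence, which is what lets an embedder recover source-level frames after several functions have been merged into one.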
The descriptor on a stackslot may look a little odd at first, but its purpose is to allow serializing some description of stackslot-contained runtime user-program data, in a way that is firmly attached to the stackslot. In particular, in the face of inlining, this descriptor is copied into the inlining (parent) function from the inlined function when the stackslot entity is copied; no other metadata outside Cranelift needs to track the identity of stackslots and know about that motion. This fits nicely with the ability of tags to refer to stackslots; together, the embedder can annotate instructions as having certain state in stackslots, and describe the format of that state per stackslot.
This infrastructure is tested with some compile-tests now; testing of the interpretation of the metadata output will come with end-to-end debug instrumentation tests in a followup PR.
* Review feedback: add back sequence points and enforce tags only on sequence points or calls.
* Use Vecs for debug metadata in MachBuffer to avoid SmallVec size penalty in not-used case.
* Review feedback: switch from inlined stackslot descriptor blobs to u64 keys.
|
|
Revision tags: v37.0.1, v37.0.0 |
|
| #
3b85d838 |
| 03-Sep-2025 |
Paul Nodet <[email protected]> |
feat: add granular tail call detection infrastructure to MachInst (#11599)
* feat: add granular tail call detection infrastructure to machinst
Adds core infrastructure for distinguishing between regular calls and tail calls at the instruction level.
* feat: implement call_type() method for all ISA backends
* refactor: pass around function_calls enum instead of boolean
* feat: add function_calls.update() logic
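The bullets above describe a per-instruction call classification that is folded into a per-function summary. A minimal sketch of that shape is shown below; the variant names and the exact merge rules are assumptions made for illustration, and the real definitions in Cranelift may differ.

```rust
/// Per-instruction classification (variant names are assumed for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CallType {
    None,
    Regular,
    TailCall,
}

/// Per-function summary of which kinds of calls were seen.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FunctionCalls {
    None,
    TailOnly,
    Regular,
}

impl FunctionCalls {
    /// Fold one instruction's call type into the summary.
    /// A regular call dominates: once seen, the summary stays `Regular`.
    fn update(&mut self, call_type: CallType) {
        *self = match (*self, call_type) {
            (current, CallType::None) => current,
            (_, CallType::Regular) => FunctionCalls::Regular,
            (FunctionCalls::Regular, CallType::TailCall) => FunctionCalls::Regular,
            (_, CallType::TailCall) => FunctionCalls::TailOnly,
        };
    }
}

/// Summarize a whole instruction stream.
fn summarize(insts: &[CallType]) -> FunctionCalls {
    let mut calls = FunctionCalls::None;
    for &call_type in insts {
        calls.update(call_type);
    }
    calls
}
```

Compared with a single boolean, a three-state summary lets later stages treat "only tail calls" differently from "at least one regular call".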
|
| #
3fe9c3c7 |
| 03-Sep-2025 |
Paul Nodet <[email protected]> |
fix: accurate leaf detection (#11581)
* feat: add is_call() method to MachInst trait and VCode analysis
Add is_call() method to MachInst trait to enable accurate leaf function detection during register allocation. Update VCode compute_clobbers() to return (clobbers, is_leaf) tuple by analyzing actual call instructions in machine code.
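As a rough illustration of the approach, the sketch below scans a toy instruction list once, collecting clobbered registers and deciding leaf-ness from the presence of call instructions. `ToyInst` and this free-standing `compute_clobbers` are hypothetical simplifications; only the `is_call()` idea and the `(clobbers, is_leaf)` result shape come from the message.

```rust
use std::collections::BTreeSet;

/// Toy machine instruction; registers are plain `u8` indices in this sketch.
enum ToyInst {
    Mov { dst: u8 },
    Call { clobbers: Vec<u8> },
    Ret,
}

impl ToyInst {
    /// True for any instruction that transfers control to another function.
    fn is_call(&self) -> bool {
        matches!(self, ToyInst::Call { .. })
    }
}

/// Returns the set of clobbered registers and whether the function is a leaf,
/// i.e. contains no call instructions in the final machine code.
fn compute_clobbers(insts: &[ToyInst]) -> (BTreeSet<u8>, bool) {
    let mut clobbers = BTreeSet::new();
    let mut is_leaf = true;
    for inst in insts {
        if inst.is_call() {
            is_leaf = false;
        }
        match inst {
            ToyInst::Mov { dst } => {
                clobbers.insert(*dst);
            }
            ToyInst::Call { clobbers: regs } => clobbers.extend(regs.iter().copied()),
            ToyInst::Ret => {}
        }
    }
    (clobbers, is_leaf)
}
```

Deciding leaf-ness from the lowered instructions, rather than from the IR, accounts for calls that only appear during lowering, such as the TLS helper calls listed for x64 and aarch64.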
* feat: implement is_call() method across all architectures
Implement is_call() method for all architecture-specific MachInst implementations:
- x64: Detects CallKnown, CallUnknown, ReturnCall variants, and TLS calls (ElfTlsGetAddr, MachOTlsGetAddr)
- aarch64: Detects Call, CallInd, ReturnCall variants, and TLS calls (ElfTlsGetAddr, MachOTlsGetAddr)
- riscv64: Detects Call, CallInd, ReturnCall variants, and ElfTlsGetAddr
- s390x: Detects CallKnown, CallUnknown, ReturnCall variants
- pulley: Detects Call, CallIndirect, ReturnCall variants
Co-authored-by: bjorn3 <[email protected]>
* feat: improve leaf function detection and pass is_leaf to FrameLayout
* test: add filetests for leaf detection
* test: update expected outputs for accurate leaf function detection
* test(riscv64): update filetests output
---------
Co-authored-by: bjorn3 <[email protected]>
|
|
Revision tags: v36.0.2, v36.0.1, v36.0.0, v35.0.0, v24.0.4, v33.0.2, v34.0.2 |
|
| #
0854775b |
| 08-Jul-2025 |
bjorn3 <[email protected]> |
Couple of optimizations to the Cranelift incremental cache (#11186)
* Fix a couple of comments
* Remove flags.predicate_view()
It is a remnant of the old backend framework.
* Avoid string conversions for hashing the TargetIsa
* Remove func_body_len
It is identical to buffer.data.len()
* Introduce IsaFlagsHashKey
|
|
Revision tags: v34.0.1, v33.0.1, v24.0.3, v32.0.1, v34.0.0, v33.0.0 |
|
| #
90ac295e |
| 19-May-2025 |
Alex Crichton <[email protected]> |
Update Wasmtime to the 2024 Rust Edition (#10806)
* Update Wasmtime to the 2024 Rust Edition
Now that our MSRV supports the 2024 edition it's possible to make this switch. This commit moves Wasmtime to the 2024 Edition to keep up-to-date with Rust idioms and access many of the edition features exclusive to the 2024 edition.
prtest:full
* Reformat with the 2024 edition
|
| #
5ded0f4e |
| 06-May-2025 |
Ulrich Weigand <[email protected]> |
Refactor call ABI implementation (#10722)
This refactors implementation of call ABI handling across architectures with the goal of bringing s390x in line with other platforms.
The main idea is to
- handle main call instruction selection and generation in ISLE (like s390x but unlike other platforms today)
- handle argument setup mostly outside of ISLE (like other platforms but unlike s390x today)
- handle return value processing as part of the call instruction (like all platforms today)
All platforms now emit the main call instruction directly from ISLE, which e.g. handles selection of the correct ISA instruction depending on the call destination. This ISLE code calls out to helper routines to handle argument and return value processing. These helpers are mostly common code and provided by the Callee and/or Lower layers, with some platform-specific additions via ISLE Context routines.
The old CallSite abstraction is no longer needed; most of the differences between call and return_call handling disappear. (There is still a common-code CallInfo vs. a platform-specific ReturnCallInfo. At this point, it should be relatively straightforward to make CallInfo platform-specific as well if desired, but this is not done here.)
Some ISLE infrastructure for iterators / loops, which was only ever used by the s390x argument processing code, has been removed.
s390x now closely matches all other platforms, with only a few special cases (slightly different tail-call ABI requires some differences in stack offset computations; we still need to handle vector lane swaps for cross-ABI calls), which should simplify future maintenance.
|
|
Revision tags: v32.0.0 |
|
| #
5b63c874 |
| 15-Apr-2025 |
SingleAccretion <[email protected]> |
[DI] Fix live range tracking off-by-one confusions (#10570)
* Dump blocks in the VL table
* Add a test
* Work around #10572 in tests
* [DI] Fix live range tracking off-by-one confusions
How things used to work w.r.t. instruction indices (IIs):
1) In lowering:
   - Reversed order: IIs represented "before IP"s.
   - Block args were defined one instruction too late, but this issue was masked due to how RA allocates, at least in simple examples.
   - Execution order: IIs represented "after IP"s.
2) In RA:
   - IIs represented "before IP"s.
   - Notice the mismatch.
3) In emit:
   - RA directions w.r.t. the explicit ProgPoint positions were not respected and always treated as "after".

How things work after this change:
1) In lowering:
   - Reversed order: IIs represent "after IP"s.
   - Execution order: IIs represent "before IP"s.
2) In RA:
   - No change; mismatch fixed.
3) In emit:
   - ProgPoint positions now respected.
This fixes various "silent bad debug info" issues.
|
| #
3da7fc8e |
| 08-Apr-2025 |
SingleAccretion <[email protected]> |
[DI] Dump value label assignments in a table (#10549)
* Dump compilation start/end
* [DI] Log value label ranges in a table
Sample table:
|Inst |IP |VL0 |VL1 |VL3 |VL4 |VL5 |VL7 |VL10 |VL11 |VL4294967294|
|--------|----|--------|---------|---------|--------|--------|--------|---------|--------|------------|
|Inst 0 |53 | | | | | | | | | | | | | | | | | | | |
|Inst 1 |53 | | | | | | | | | | | | | | | | | | | |
|Inst 2 |60 |v194|p2i|v232|p12i| | | | | | | | | | | | |v192|p7i |
|Inst 3 |64 |* |p2i|* |p12i|v231|p13i| | | | | | | | | | |* |p7i |
|Inst 4 |68 |* |p2i|* |p12i|* |p13i| | | | | | | | | | |* |p7i |
|Inst 5 |72 |* |p2i|* |p12i|* |p13i| | | | | | | | | | |* |p7i |
|Inst 6 |76 |* |p2i|* |p12i|* |p13i| | | | | | | | | | |* |p7i |
|Inst 7 |87 |* | |* |p12i|* |p13i| | | | | | | | | | |* |p7i |
|Inst 8 |92 |* | |* |p12i|* |p13i|v227|p0i| | | | | | | | |* |p15i |
|Inst 9 |94 |* | |v204| |v204| |v204| |v204| |v204| |v204| |v204| |* |p15i |
|Inst 10 |100 |* | |* | |* | |* | |* | |* | |* | |* | |* |p15i |
|Inst 11 |105 |* | |* | |* | |* | |v226|p9i|* | |* | |* | |* |p15i |
|Inst 12 |109 |* | |* | |* | |* | |* | |v225|p9i|* | |* | |* |p15i |
|Inst 13 |114 |* | |* | |* | |* | |* | |* | |* | |* | |* |p15i |
|Inst 14 |119 |* | |* | |* | |* | |* | |* | |* | |* | |* |p15i |
|Inst 15 |125 |* | |* | |* | |* | |* | |* | |* | |* | |* |p15i |
|Inst 16 |129 |* | |* | |* | |* | |* | |* | |v223|p11i|* | |* |p15i |
|Inst 17 |134 |* | |* | |* | |* | |* | |* | |* | |* | |* |p15i |
|Inst 18 |134 |* | |* | |* | |* | |* | |* | |* | |* | |* |p15i |
|Inst 19 |139 |* | |* | |* | |* | |* | |* | |* | |v222|p0i|* |p15i |
|Inst 20 |143 |* | |* | |* | |* | |* | |* | |* | |* |p0i|* |p15i |
|Inst 21 |143 |* | |* | |* | |* | |* | |* | |* | |* |p0i|* | |
This will make it much easier to diagnose problems with incomplete/missing live ranges.
|
| #
94ec88ea |
| 08-Apr-2025 |
Chris Fallin <[email protected]> |
Cranelift: initial try_call / try_call_indirect (exception) support. (#10510)
* Cranelift: initial try_call / try_call_indirect (exception) support.
This PR adds `try_call` and `try_call_indirect` instructions, and lowerings on four of five ISAs (x86-64, aarch64, riscv64, pulley; s390x has its own non-shared ABI code that will need separate work).
It extends CLIF to support these instructions as new kinds of branches, and extends block-calls to accept `retN` and `exnN` block-call args that carry the normal return values or exception payloads (respectively) into the appropriate successor blocks.
It wires up the "normal return path" so that it continues to work. It updates the ABI so that unwinding is possible without an initial register state at throw: specifically, as per our RFC, all registers are clobbered. It also includes metadata in the `MachBuffer` that describes exception-catch destinations. However, no unwinder exists to interpret these catch-destinations yet, so they are untested.
* Add try_call_indirect lowering as well.
|
| #
a62b396f |
| 05-Apr-2025 |
Chris Fallin <[email protected]> |
Cranelift: remove return-value instructions after calls at callsites. (#10502)
* Cranelift: remove return-value instructions after calls at callsites.
This PR addresses the issues described in #10488 in a more head-on way: it removes the use of separate "return-value instructions" that load return values from the stack, instead folding these loads into the semantics of the call VCode instruction.
This is a prerequisite for exception-handling: we need calls to be workable as terminators, meaning that we cannot require any other (VCode) instructions after the call to define the return values.
In principle, this PR starts simply enough: the return-locations list on the `CallInfo` that each backend uses to provide regalloc metadata is updated to support a notion of "register or stack address" as the source of each return value, and this list is now used for both kinds of returns, not just returns in registers. Shared code is defined in `machinst::abi` used by all backends to perform the requisite loads.
In order to make this work with more defined values than fit in registers, however, this PR also had to add support for "any"-constrained registers to Cranelift, and handling allocations that may be spillslots. This has always been supported by RA2, but this is the first time that Cranelift uses them directly (previously they were used only internally in RA2 as lowerings from other kinds of constraints like safepoints). This requires encoding a spillslot index in our `Reg` type.
There is a little bit of complexity around handling the loads/defs as well: if we have a return value on-stack, and we need to put it in a spillslot, we cannot do a memory-to-memory move directly, so we need a temporary register. Earlier versions of this PR allocated another temp as a vreg on the call, but this doesn't work with all calling conventions (too many clobbers). For simplicity I picked a particular register that is (i) clobbered by calls and (ii) not used for return values for each architecture (x86-64's tailcall needed to lose one return-in-register slot to make this work).
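The need for a temporary register can be seen in a small sketch of return-value move lowering. The `Loc` and `Move` types and the `lower_retval_move` function are hypothetical and much simpler than the real `machinst::abi` code; the point illustrated is only that a stack-to-stack move must be split into a load and a store through a scratch register.

```rust
/// Where a value lives: a register (by index) or a stack offset.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Loc {
    Reg(u8),
    Stack(i32),
}

/// Toy move instructions; there is deliberately no memory-to-memory form.
#[derive(Debug, PartialEq, Eq)]
enum Move {
    RegToReg { dst: u8, src: u8 },
    Load { dst: u8, from: i32 },
    Store { to: i32, src: u8 },
}

/// Emit the moves needed to bring a return value from `src` to `dst`.
/// `tmp` is a scratch register assumed to be clobbered by the call and
/// not used for any return value.
fn lower_retval_move(src: Loc, dst: Loc, tmp: u8) -> Vec<Move> {
    match (src, dst) {
        (Loc::Reg(s), Loc::Reg(d)) if s == d => vec![],
        (Loc::Reg(s), Loc::Reg(d)) => vec![Move::RegToReg { dst: d, src: s }],
        (Loc::Stack(from), Loc::Reg(d)) => vec![Move::Load { dst: d, from }],
        (Loc::Reg(s), Loc::Stack(to)) => vec![Move::Store { to, src: s }],
        // Stack-resident return value assigned to a spill slot: go through `tmp`.
        (Loc::Stack(from), Loc::Stack(to)) => vec![
            Move::Load { dst: tmp, from },
            Move::Store { to, src: tmp },
        ],
    }
}
```

Fixing `tmp` per architecture, as the message describes, avoids asking the register allocator for an extra vreg on a call that may already clobber almost every register.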
This removes retval insts from the shared ABI infra completely. s390x is different, still, because it handles callsite lowering from ISLE; we will need to address that separately for exception support there.
* Fix is_included_in_clobbers on aarch64: new defs must skip optimization.
* Review feedback: add assert.
* Review feedback: handle retval temp reg via ABI trait method.
* Update is_clobbered_in_inst to affect only clobbers, not all defs.
|
|
Revision tags: v31.0.0 |
|
| #
2af0a1f7 |
| 13-Mar-2025 |
bjorn3 <[email protected]> |
Introduce log2_min_function_alignment flag (#10391)
* Remove function_alignment handling from cranelift-object and cranelift-jit
It is already handled by MachBuffer. The symbol_alignment could also be removed as no current backend has a symbol alignment bigger than the function alignment, but keeping it around is a bit safer when new backends are introduced.
* Introduce log2_min_function_alignment flag
This is required for cg_clif to implement -Zmin-function-alignment.
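A flag expressed as a log2 value is typically turned into a byte alignment with a shift and then combined with the backend's own requirement. The sketch below shows that arithmetic; both function names are hypothetical, and taking the maximum of the two alignments is an assumption about how a "minimum" alignment flag would be applied, not a description of the actual `MachBuffer` code.

```rust
/// Combine the backend's required function alignment with the user-requested
/// minimum, given as a log2 value (assumed to be less than 32 here).
fn effective_function_alignment(isa_align: u32, log2_min_function_alignment: u8) -> u32 {
    let requested = 1u32 << log2_min_function_alignment;
    isa_align.max(requested)
}

/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}
```

For example, with a backend alignment of 4 bytes and a flag value of 4, the effective alignment is 16 bytes, and a function that would otherwise start at offset 17 is placed at offset 32.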
|
|
Revision tags: v30.0.2, v30.0.1, v30.0.0 |
|
| #
392c7a96 |
| 23-Jan-2025 |
Chris Fallin <[email protected]> |
Cranelift/x64 backend: do not use one-way branches. (#10086)
* Cranelift/x64 backend: do not use one-way branches.
In #9980, we saw that code compiled with the single-pass register allocator has incorrect behavior. We eventually narrowed this down to the fact that the single-pass allocator is inserting code meant to be at the end of a block, just before its terminator, *between* two branches that form the terminator sequence. The allocator is correct; the bug is with Cranelift's x64 backend.
When we produce instructions into a VCode container, we maintain basic blocks, and we have the invariant (usual for basic block-based IR) that only the last -- terminator -- instruction is a branch that can leave the block. Even the conditional branches maintain this invariant: though VCode is meant to be "almost machine code", we emit *two-target conditionals* that are semantically like "jcond; jmp". We then are able to optimize this inline during binary emission in the `MachBuffer`: the buffer knows about unconditional and conditional branches and will "chomp" branches off the tail of the buffer whenever they target the fallthrough block. (We designed the system this way because it is simpler to think about BBs that are order-invariant, i.e., not bake the "fallthrough" concept into the IR.) Thus we have a simpler abstraction but produce optimal terminator sequences.
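The effect of "chomping" on a two-target conditional can be sketched as a function from the terminator and the layout to the emitted branch sequence. This is a deliberate simplification: the real `MachBuffer` edits the tail of already-emitted bytes rather than deciding up front, and the `Emitted` type and `emit_cond_br` function are hypothetical names used only here.

```rust
/// Branches that end up in the machine code for one terminator.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Emitted {
    /// Conditional jump; `invert` means the condition was flipped.
    Jcc { invert: bool, target: u32 },
    /// Unconditional jump.
    Jmp { target: u32 },
}

/// Lower a two-target conditional ("jcond taken; jmp not_taken") given the
/// block that is laid out immediately after this one, if any.
fn emit_cond_br(taken: u32, not_taken: u32, fallthrough: Option<u32>) -> Vec<Emitted> {
    if taken == not_taken {
        // Both arms agree: at most a single unconditional jump is needed.
        return if fallthrough == Some(taken) {
            vec![]
        } else {
            vec![Emitted::Jmp { target: taken }]
        };
    }
    if fallthrough == Some(not_taken) {
        // The trailing `jmp` targets the next block and can be dropped.
        vec![Emitted::Jcc { invert: false, target: taken }]
    } else if fallthrough == Some(taken) {
        // Flip the condition so the taken arm becomes the fallthrough.
        vec![Emitted::Jcc { invert: true, target: not_taken }]
    } else {
        vec![
            Emitted::Jcc { invert: false, target: taken },
            Emitted::Jmp { target: not_taken },
        ]
    }
}
```

Because the IR-level terminator always names both targets, block order can change freely and the fallthrough optimization is recovered at emission time.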
Unfortunately, when adding a branch-on-floating-point-compare lowering, we had the need to branch to a target if either of *two* conditions were true, and rather than add a new kind of terminator instruction, we added a "one-armed branch": conditionally branch to label or fall through. We emitted this in sequence right before the actual terminator, so semantically it was almost equivalent.
I write "almost" because the register allocator *is* allowed to insert spills/reloads/moves between any two instructions. Here the distinct pieces of the terminator sequence matter: the allocator might insert something just before the last instruction, assuming the basic-block "single in, single out" invariant means this will always run with the block. With one-armed branches this is no longer true.
The backtracking allocator (our original RA2 algorithm, and still the default today) will never insert code at the end of a block when it has multiple terminators, because it associates such block-start/end insertions with *edges*; so in such conditions it inserts instructions into the tops of successor blocks instead. But the single-pass allocator needs to perform work at the end of every block, so it will trigger this bug.
This PR removes `JmpIf` and converts the br-of-fcmp lowering to use `JmpCondOr` instead, which is a pseudoinstruction that does `jcc1; jcc2; jmp`. This maintains the BB invariant and fixes the bug.
Note that Winch still uses `JmpIf`, so we cannot remove it entirely: this PR renames it to `WinchJmpIf` instead, and adds a mechanism to assert failure if it is ever added to `VCode` (rather than emitted directly, as Winch's macro-assembler does). We could instead write Winch's `jmp_if` assembler function in terms of `JmpCond` with a fallthrough label that is immediately bound, and let the MachBuffer always chomp the jmp; I opted not to regress Winch compiler performance by doing this. If one day we abstract out the assembler further, we can remove `WinchJmpIf`.
This is one of two instances of a "one-armed branch"; the other is s390x's `OneWayCondBr`, used in `br_table` lowerings, which we will address separately. Once we do, that will address #9980 entirely.
* Add test for cascading branch-chomping behavior.
* keep the paperclip happy
|
|
Revision tags: v29.0.1, v29.0.0 |
|
| #
48f4621f |
| 15-Jan-2025 |
Alex Crichton <[email protected]> |
Run the full test suite on 32-bit platforms (#9837)
* Run the full test suite on 32-bit platforms
This commit switches to running the full test suite in its entirety (`./ci/run-tests.sh`) on 32-bit platforms in CI in addition to 64-bit platforms. This notably adds i686 and armv7 as architectures that are tested in CI.
Lots of little fixes here and there were applied to a number of tests. Many tests just don't run on 32-bit platforms or a platform without Cranelift support, and they've been annotated as such where necessary. Other tests were adjusted to run on all platforms, and a few minor bug fixes are here as well.
prtest:full
* Fix clippy warning
* Get wasm code working by default on 32-bit
Don't require the `pulley` feature opt-in on 32-bit platforms to get wasm code running.
* Fix dead code warning
* Fix build on armv7
* Fix test assertion on armv7
* Review comments
* Update how tests are skipped
* Change how Pulley is defaulted
Default to pulley in `build.rs` rather than in `Cargo.toml` to make it easier to write down the condition and comment what's happening. This means that the `pulley-interpreter` crate and pulley support in Cranelift is always compiled in now and cannot be removed. This should hopefully be ok though as the `pulley-interpreter` crate is still conditionally used (meaning it can get GC'd) and the code-size of Cranelift is not as important as the runtime itself.
* pulley: Save/restore callee-save state on traps
* Fewer clippy warnings about casts
* Use wrapping_add in `g32_addr`, fixing arm test
|
|
Revision tags: v28.0.1, v28.0.0, v27.0.0, v26.0.1, v25.0.3, v24.0.2, v26.0.0, v21.0.2, v22.0.1, v23.0.3, v25.0.2, v24.0.1, v25.0.1, v25.0.0 |
|
| #
1854929d |
| 03-Sep-2024 |
Trevor Elliott <[email protected]> |
Update to regalloc2-0.10.0 (#9197)
* Reapply "Upgrade regalloc2 to 0.9.4 (#9191)" (#9193)
This reverts commit 7081b8fc10f9909fd31fcc26da54badc2f00ad7a.
* Upgrade to regalloc-0.10.0
|
| #
7081b8fc |
| 31-Aug-2024 |
Trevor Elliott <[email protected]> |
Revert "Upgrade regalloc2 to 0.9.4 (#9191)" (#9193)
This reverts commit 098430f3c8fd7bb92968402beef0670d08023fba.
|
| #
098430f3 |
| 30-Aug-2024 |
Trevor Elliott <[email protected]> |
Upgrade regalloc2 to 0.9.4 (#9191)
* Upgrade to regalloc-0.9.4
* Update filetests
* Run `cargo vet`
|
| #
b81ef46c |
| 22-Aug-2024 |
Nick Fitzgerald <[email protected]> |
Remove reference types (`r32` and `r64`) from Cranelift (#9164)
* Remove reference types (`r32` and `r64`) from Cranelift
* restore fuzz regression test
|
| #
c0c3a68c |
| 21-Aug-2024 |
Nick Fitzgerald <[email protected]> |
Cranelift: Remove the old stack maps implementation (#9159)
They are superseded by the new user stack maps implementation.
|
|
Revision tags: v24.0.0, v23.0.2 |
|
| #
a0442ea0 |
| 05-Aug-2024 |
Hamir Mahal <[email protected]> |
Enforce `uninlined_format_args` for the workspace (#9065)
* Enforce `uninlined_format_args` for the workspace
* fix: failing `Monolith Checks` job
* fix: formatting
|
|
Revision tags: v23.0.1, v23.0.0 |
|
| #
c510a2b9 |
| 02-Jul-2024 |
bjorn3 <[email protected]> |
Couple of small improvements for debugging Cranelift (#8885)
* Print block params and branch args in vcode
* Implement Debug for JumpTableData and GlobalValueData
|