History log of /wasmtime-44.0.1/cranelift/codegen/src/machinst/mod.rs (Results 1 – 25 of 102)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7
# 2f7dbd61 31-Mar-2026 Chris Fallin <[email protected]>

PCC: remove proof-carrying code (for now?). (#12800)

In late 2023, we built out an experimental feature called
Proof-Carrying Code (PCC), where we attached "facts" to values in the
CLIF IR and built

PCC: remove proof-carrying code (for now?). (#12800)

In late 2023, we built out an experimental feature called
Proof-Carrying Code (PCC), where we attached "facts" to values in the
CLIF IR and built verification of these facts after lowering to
machine instructions. We also added "memory types" describing layout
of memory and a "checked" flag on memory operations such that we could
verify that any checked memory operation accessed valid memory (as
defined by memory types attached to pointer values via
facts). Wasmtime's Cranelift backend then put appropriate memory types
and facts in its IR such that all accesses to memory (aspirationally)
could be checked, taking the whole mid-end and lowering backend of
Cranelift out of the trusted core that enforces SFI.

This basically worked, at the time, for static memories; but never for
dynamic memories, and then work on the feature lost
prioritization (aka I had to work on other things) and I wasn't able
to complete it and put it in fuzzing/enable it as a production option.

Unfortunately since then it has bit-rotted significantly -- as we add
new backend optimizations and instruction lowerings we haven't kept
the PCC framework up to date.

Inspired by the discussion in #12497 I think it's time to delete
it (hopefully just "for now"?) unless/until we can build it again. And
when we do that, we should probably get it to the point of validating
robust operation on all combinations of memory configurations before
merging. (That implies a big experiment branch rather than a bunch of
eager PRs in-tree, but so it goes.) I still believe it is possible to
build this (and I have ideas on how to do it!) but not right now.

show more ...


Revision tags: v43.0.0, v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6, v41.0.3, v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1
# 0889323a 03-Jan-2026 SSD <[email protected]>

cranelift-codegen: rename most uses of std to core and alloc (#12237)

* rename most std uses to core and alloc

* cargo fmt


Revision tags: v40.0.0
# 17fbd3c6 12-Dec-2025 Chris Fallin <[email protected]>

Debug: implement breakpoints and single-stepping. (#12133)

* Debug: implement breakpoints and single-stepping.

This is a PR that puts together a bunch of earlier pieces (patchable
calls in #12061 a

Debug: implement breakpoints and single-stepping. (#12133)

* Debug: implement breakpoints and single-stepping.

This is a PR that puts together a bunch of earlier pieces (patchable
calls in #12061 and #12101, private copies of code in #12051, and all
the prior debug event and instrumentation infrastructure) to implement
breakpoints in the guest debugger.

These are implemented in the way we have planned in #11964: each
sequence point (location prior to a Wasm opcode) is now a patchable call
instruction, patched out (replaced with NOPs) by default. When patched
in, the breakpoint callsite calls a trampoline with the `patchable` ABI
which then invokes the `breakpoint` hostcall. That hostcall emits the
debug event and nothing else.

A few of the interesting bits in this PR include:
- Implementations of "unpublish" (switch permissions back to read/write
from read/execute) for mmap'd code memory on all our platforms.
- Infrastructure in the frame-tables (debug info) metadata producer and
parser to record "breakpoint patches".
- A tweak to the NOP metadata packaged with the `MachBuffer` to allow
multiple NOP sizes. This lets us use one 5-byte NOP on x86-64, for
example (did you know x86-64 had these?!) rather than five 1-byte
NOPs.

This PR also implements single-stepping with a global-per-`Store` flag,
because at this point why not; it's a small additional bit of logic to
do *all* patches in all modules registered in the `Store` when that flag
is enabled.

A few realizations for future work:
- The need for an introspection API available to a debugger to see the
modules within a component is starting to become clear; either that,
or the "module and PC" location identifier for a breakpoint switches
to a "module or component" sum type. Right now, the tests for this
feature use only core modules. Extending to components should not
actually be hard at all, we just need to build the API for it.
- The interaction between inlining and `patchable_call` is interesting:
what happens if we inline a `patchable_call` at a `try_call` callsite?
Right now, we do *not* update the `patchable_call` to a `try_call`,
because there is no `patchable_try_call`; this is fine in the Wasmtime
embedding in practice because we never (today!) throw exceptions from
a breakpoint handler. This does suggest to me that maybe we should
make patchability a property of any callsite, and allow try-calls to
be patchable too (with the same restriction about no return values as
the only restriction); but happy to discuss that one further.

* Add missing debug.wat disas test.

* Review feedback.

* Fix comment on `CodeMemory::text_mut`.

* Review feedback.

* Review feedback: abort process on failure to re-apply executable permissions.

* Implement icache flush for aarch64.

This appears to be necessary as we otherwise see a failure in CI on
macOS/aarch64 that is consistent with patched-in breakpoint calls still
being incorrectly cached after we remove them and republish the code.

There is a longstanding issue in #3310 tracking proper icache coherence
handling on aarch64. We implemented this for Linux with the `membarrier`
syscall but never did so for macOS. Maybe this is the first point at
which it matters, because code was always loaded at new addresses (hence
did not have coherence issues because nothing would have been cached)
previously.

prtest:full

* Review feedback: use `next_multiple_of`.

show more ...


# c00e9ea2 02-Dec-2025 Chris Fallin <[email protected]>

Cranelift: add patchable call instructions. (#12101)

* Cranelift: add patchable call instructions.

The new `patchable_call` CLIF instruction pairs with the `patchable`
ABI, and emits a callsite wit

Cranelift: add patchable call instructions. (#12101)

* Cranelift: add patchable call instructions.

The new `patchable_call` CLIF instruction pairs with the `patchable`
ABI, and emits a callsite with one new key property: the MachBuffer
carries metadata that describes exactly which byte range to "NOP out"
(overwrite with NOP instructions) to disable that callsite. Doing so is
semantically valid and explicitly supported.

This enables patching of code at runtime to dynamically turn on and off
features such as instrumentation or debugging hooks. We plan to use this
to implement breakpoints in Wasmtime's guest debugging support.

As part of this change, I added a notion of "unit of NOP bytes" to the
MachBuffer so that the consumer (e.g., Wasmtime's Cranelift-based code
compilation pipeline and metadata-producing logic) can handle patchable
callsites without any other special knowledge of the ISA.

For the "real metal" ISAs there are perfectly well-defined NOPs to use,
but for Pulley, where all opcodes are assigned at compile time by macro
magic, I explicitly defined NOP as opcode byte 0 by moving `Nop`'s
definition to the top of the list and adding a unit test asserting its
encoding.

A design note: in principle it would be possible, as an alternative, to
treat "patchability" as an orthogonal dimension of all callsites, and
emit the metadata describing the instruction-offset range for any
callsite with the flag set. The only truly necessary semantic
restriction is that there are no return values (because if we turn the
callsite off, nothing writes to them); we could support patchability for
other ABIs and for the other kinds of call instructions. The `patchable`
ABI would then be better described as something like the "no clobbers
ABI". I opted not to generalize in this way because it creates some
less-tested corners and the generalized form, at least at the MachInst
level, is not really much simpler in the end.

A testing note: I opted not to implement actual code patching in the
`cranelift-tools` filetest runner and test patching callsites in/out via
some actuation (e.g. a magic hostcall, like we do for throws) because
(i) that's a lot of new plumbing and (ii) we are going to test this very
shortly in Wasmtime anyway and (iii) the correctness (or not) of the
location-and-length metadata is easy enough to verify in the
disassemblies in the compile-tests.

* Review feedback: remove dependence on (and test for) NOP being the literal byte 0.

show more ...


Revision tags: v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5, v38.0.3, v38.0.2, v38.0.1
# 557cc2d6 10-Oct-2025 Alex Crichton <[email protected]>

Another batch of dependency updates (#11832)

* Another batch of dependency updates

Bringing some deps in `Cargo.toml` up-to-date with their latest versions
along the same lines as #11820 to avoid d

Another batch of dependency updates (#11832)

* Another batch of dependency updates

Bringing some deps in `Cargo.toml` up-to-date with their latest versions
along the same lines as #11820 to avoid deps getting too stale/old.

Code-wise this updates `anyhow` which enables preexisting Clippy
warnings to check more code, so those warnings are fixed here as well.

prtest:full

* Run rustfmt

show more ...


Revision tags: v37.0.2
# a3d6e407 06-Oct-2025 Chris Fallin <[email protected]>

Cranelift: add debug tag infrastructure. (#11768)

* Cranelift: add debug tag infrastructure.

This PR adds *debug tags*, a kind of metadata that can attach to CLIF
instructions and be lowered to VCo

Cranelift: add debug tag infrastructure. (#11768)

* Cranelift: add debug tag infrastructure.

This PR adds *debug tags*, a kind of metadata that can attach to CLIF
instructions and be lowered to VCode instructions and as metadata on
the produced compiled code. It also adds opaque descriptor blobs
carried with stackslots. Together, these two features allow decorating
IR with first-class debug instrumentation that is properly preserved
by the compiler, including across optimizations and
inlining. (Wasmtime's use of these features will come in followup
PRs.)

The key idea of a "debug tag" is to allow the Cranelift embedder to
express whatever information it needs to, in a format that is opaque
to Cranelift itself, except for the parts that need translation during
lowering. In particular, the `DebugTag::StackSlot` variant gets
translated to a physical offset into the stackframe in the compiled
metadata output. So, for example, the embedder can emit a tag
referring to a stackslot, and another describing an offset in that
stackslot.

The debug tags exist as a *sequence* on any given instruction; the
meaning of the sequence is known only to the embedder, *except* that
during inlining, the tags for the inlining call instruction are
prepended to the tags of inlined instructions. In this way, a
canonical use-case of tags as describing original source-language
frames can preserve the source-language view even when multiple
functions are inlined into one.

The descriptor on a stackslot may look a little odd at first, but its
purpose is to allow serializing some description of
stackslot-contained runtime user-program data, in a way that is firmly
attached to the stackslot. In particular, in the face of inlining,
this descriptor is copied into the inlining (parent) function from the
inlined function when the stackslot entity is copied; no other
metadata outside Cranelift needs to track the identity of stackslots
and know about that motion. This fits nicely with the ability of tags
to refer to stackslots; together, the embedder can annotate
instructions as having certain state in stackslots, and describe the
format of that state per stackslot.

This infrastructure is tested with some compile-tests now;
testing of the interpretation of the metadata output will come with
end-to-end debug instrumentation tests in a followup PR.

* Review feedback: add back sequence points and enforce tags only on sequence points or calls.

* Use Vecs for debug metadata in MachBuffer to avoid SmallVec size penalty in not-used case.

* Review feedback: switch from inlined stackslot descriptor blobs to u64 keys.

show more ...


Revision tags: v37.0.1, v37.0.0
# 3b85d838 03-Sep-2025 Paul Nodet <[email protected]>

feat: add granular tail call detection infrastructure to MachInst (#11599)

* feat: add granular tail call detection infrastructure to machinst

Adds core infrastructure for distinguishing between re

feat: add granular tail call detection infrastructure to MachInst (#11599)

* feat: add granular tail call detection infrastructure to machinst

Adds core infrastructure for distinguishing between regular calls and
tail calls at the instruction level.

* feat: implement call_type() method for all ISA backends

* refactor: pass around function_calls enum instead of boolean

* feat: add function_calls.update() logic

show more ...


# 3fe9c3c7 03-Sep-2025 Paul Nodet <[email protected]>

fix: accurate leaf detection (#11581)

* feat: add is_call() method to MachInst trait and VCode analysis

Add is_call() method to MachInst trait to enable accurate leaf function
detection during regi

fix: accurate leaf detection (#11581)

* feat: add is_call() method to MachInst trait and VCode analysis

Add is_call() method to MachInst trait to enable accurate leaf function
detection during register allocation. Update VCode compute_clobbers() to
return (clobbers, is_leaf) tuple by analyzing actual call instructions
in machine code.

* feat: implement is_call() method across all architectures

Implement is_call() method for all architecture-specific MachInst
implementations:

- x64: Detects CallKnown, CallUnknown, ReturnCall variants, and TLS
calls (ElfTlsGetAddr, MachOTlsGetAddr)
- aarch64: Detects Call, CallInd, ReturnCall variants, and TLS calls
(ElfTlsGetAddr, MachOTlsGetAddr)
- riscv64: Detects Call, CallInd, ReturnCall variants, and ElfTlsGetAddr
- s390x: Detects CallKnown, CallUnknown, ReturnCall variants
- pulley: Detects Call, CallIndirect, ReturnCall variants

Co-authored-by: bjorn3 <[email protected]>

* feat: improve leaf function detection and pass is_leaf to FrameLayout

* test: add filetests for leaf detection

* test: update expected outputs for accurate leaf function detection

* test(riscv64): update filetests output

---------

Co-authored-by: bjorn3 <[email protected]>

show more ...


Revision tags: v36.0.2, v36.0.1, v36.0.0, v35.0.0, v24.0.4, v33.0.2, v34.0.2
# 099102d9 07-Jul-2025 Alex Crichton <[email protected]>

Remove `expect(clippy::allow_attributes_without_reason)` from cranelift-codegen (#11182)

* Remove `expect(clippy::allow_attributes_without_reason)` from cranelift-codegen

This commit gets around to

Remove `expect(clippy::allow_attributes_without_reason)` from cranelift-codegen (#11182)

* Remove `expect(clippy::allow_attributes_without_reason)` from cranelift-codegen

This commit gets around to migrating the `cranelift-codegen` crate to
require a reason on lint directives and additionally switch to
`#[expect]` where possible.

prtest:full

* Move x64-only item to x64 backend

show more ...


Revision tags: v34.0.1, v33.0.1, v24.0.3, v32.0.1, v34.0.0, v33.0.0
# 90ac295e 19-May-2025 Alex Crichton <[email protected]>

Update Wasmtime to the 2024 Rust Edition (#10806)

* Update Wasmtime to the 2024 Rust Edition

Now that our MSRV supports the 2024 edition it's possible to make this
switch. This commit moves Wasmtim

Update Wasmtime to the 2024 Rust Edition (#10806)

* Update Wasmtime to the 2024 Rust Edition

Now that our MSRV supports the 2024 edition it's possible to make this
switch. This commit moves Wasmtime to the 2024 Edition to keep
up-to-date with Rust idioms and access many of the edition features
exclusive to the 2024 edition.

prtest:full

* Reformat with the 2024 edition

show more ...


Revision tags: v32.0.0
# 94ec88ea 08-Apr-2025 Chris Fallin <[email protected]>

Cranelift: initial try_call / try_call_indirect (exception) support. (#10510)

* Cranelift: initial try_call / try_call_indirect (exception) support.

This PR adds `try_call` and `try_call_indirect`

Cranelift: initial try_call / try_call_indirect (exception) support. (#10510)

* Cranelift: initial try_call / try_call_indirect (exception) support.

This PR adds `try_call` and `try_call_indirect` instructions, and
lowerings on four of five ISAs (x86-64, aarch64, riscv64, pulley; s390x
has its own non-shared ABI code that will need separate work).

It extends CLIF to support these instructions as new kinds of branches,
and extends block-calls to accept `retN` and `exnN` block-call args that
carry the normal return values or exception payloads (respectively) into
the appropriate successor blocks.

It wires up the "normal return path" so that it continues to work.
It updates the ABI so that unwinding is possible without an initial
register state at throw: specifically, as per our RFC, all registers are
clobbered. It also includes metadata in the `MachBuffer` that describes
exception-catch destinations. However, no unwinder exists to interpret
these catch-destinations yet, so they are untested.

* Add try_call_indirect lowering as well.

show more ...


# a62b396f 05-Apr-2025 Chris Fallin <[email protected]>

Cranelift: remove return-value instructions after calls at callsites. (#10502)

* Cranelift: remove return-value instructions after calls at callsites.

This PR addresses the issues described in #104

Cranelift: remove return-value instructions after calls at callsites. (#10502)

* Cranelift: remove return-value instructions after calls at callsites.

This PR addresses the issues described in #10488 in a more head-on
way: it removes the use of separate "return-value instructions" that
load return values from the stack, instead folding these loads into
the semantics of the call VCode instruction.

This is a prerequisite for exception-handling: we need calls to be
workable as terminators, meaning that we cannot require any
other (VCode) instructions after the call to define the return values.

In principle, this PR starts simply enough: the return-locations list
on the `CallInfo` that each backend uses to provide regalloc metadata
is updated to support a notion of "register or stack address" as the
source of each return value, and this list is now used for both kinds
of returns, not just returns in registers. Shared code is defined in
`machinst::abi` used by all backends to perform the requisite loads.

In order to make this work with more defined values than fit in
registers, however, this PR also had to add support for
"any"-constrained registers to Cranelift, and handling allocations
that may be spillslots. This has always been supported by RA2, but
this is the first time that Cranelift uses them directly (previously
they were used only internally in RA2 as lowerings from other kinds of
constraints like safepoints). This requires encoding a spillslot index
in our `Reg` type.

There is a little bit of complexity around handling the loads/defs as
well: if we have a return value on-stack, and we need to put it in a
spillslot, we cannot do a memory-to-memory move directly, so we need a
temporary register. Earlier versions of this PR allocated another temp
as a vreg on the call, but this doesn't work with all calling
conventions (too many clobbers). For simplicity I picked a particular
register that is (i) clobbered by calls and (ii) not used for return
values for each architecture (x86-64's tailcall needed to lose one
return-in-register slot to make this work).

This removes retval insts from the shared ABI infra completely. s390x
is different, still, because it handles callsite lowering from ISLE;
we will need to address that separately for exception support there.

* Fix is_included_in_clobbers on aarch64: new defs must skip optimization.

* Review feedback: add assert.

* Review feedback: handle retval temp reg via ABI trait method.

* Update is_clobbered_in_inst to affect only clobbers, not all defs.

show more ...


Revision tags: v31.0.0, v30.0.2, v30.0.1, v30.0.0
# 392c7a96 23-Jan-2025 Chris Fallin <[email protected]>

Cranelift/x64 backend: do not use one-way branches. (#10086)

* Cranelift/x64 backend: do not use one-way branches.

In #9980, we saw that code copmiled with the single-pass register
allocator has in

Cranelift/x64 backend: do not use one-way branches. (#10086)

* Cranelift/x64 backend: do not use one-way branches.

In #9980, we saw that code copmiled with the single-pass register
allocator has incorrect behavior. We eventually narrowed this down to
the fact that the single-pass allocator is inserting code meant to be
at the end of a block, just before its terminator, *between* two
branches that form the terminator sequence. The allocator is correct;
the bug is with Cranelift's x64 backend.

When we produce instructions into a VCode container, we maintain basic
blocks, and we have the invariant (usual for basic block-based IR)
that only the last -- terminator -- instruction is a branch that can
leave the block. Even the conditional branches maintain this
invariant: though VCode is meant to be "almost machine code", we
emit *two-target conditionals* that are semantically like "jcond;
jmp". We then are able to optimize this inline during binary emission
in the `MachBuffer`: the buffer knows about unconditional and
conditional branches and will "chomp" branches off the tail of the
buffer whenever they target the fallthrough block. (We designed the
system this way because it is simpler to think about BBs that are
order-invariant, i.e., not bake the "fallthrough" concept into the
IR.) Thus we have a simpler abstraction but produce optimal terminator
sequences.

Unfortunately, when adding a branch-on-floating-point-compare
lowering, we had the need to branch to a target if either of *two*
conditions were true, and rather than add a new kind of terminator
instruction, we added a "one-armed branch": conditionally branch to
label or fall through. We emitted this in sequence right before the
actual terminator, so semantically it was almost equivalent.

I write "almost" because the register allocator *is* allowed to insert
spills/reloads/moves between any two instructions. Here the distinct
pieces of the terminator sequence matter: the allocator might insert
something just before the last instruction, assuming the basic-block
"single in, single out" invariant means this will always run with the
block. With one-armed branches this is no longer true.

The backtracking allocator (our original RA2 algorithm, and still the
default today) will never insert code at the end of a block when it
has multiple terminators, because it associates such block-start/end
insertions with *edges*; so in such conditions it inserts instructions
into the tops of successor blocks instead. But the single-pass
allocator needs to perform work at the end of every block, so it will
trigger this bug.

This PR removes `JmpIf` and converts the br-of-fcmp lowering to use
`JmpCondOr` instead, which is a pseudoinstruction that does `jcc1;
jcc2; jmp`. This maintains the BB invariant and fixes the bug.

Note that Winch still uses `JmpIf`, so we cannot remove it entirely:
this PR renames it to `WinchJmpIf` instead, and adds a mechanism to
assert failure if it is ever added to `VCode` (rather than emitted
directly, as Winch's macro-assembler does). We could instead write
Winch's `jmp_if` assembler function in terms of `JmpCond` with a
fallthrough label that is immediately bound, and let the MachBuffer
always chomp the jmp; I opted not to regress Winch compiler
performance by doing this. If one day we abstract out the assembler
further, we can remove `WinchJmpIf`.

This is one of two instances of a "one-armed branch"; the other is
s390x's `OneWayCondBr`, used in `br_table` lowerings, which we will
address separately. Once we do, that will address #9980 entirely.

* Add test for cascading branch-chomping behavior.

* keep the paperclip happy

show more ...


Revision tags: v29.0.1, v29.0.0, v28.0.1, v28.0.0
# 438fc938 25-Nov-2024 Alex Crichton <[email protected]>

pulley: Implement interpreter-to-host calls (#9665)

* pulley: Implement interpreter-to-host calls

This commit is an initial stab at implementing interpreter-to-host
communication in Pulley. The bas

pulley: Implement interpreter-to-host calls (#9665)

* pulley: Implement interpreter-to-host calls

This commit is an initial stab at implementing interpreter-to-host
communication in Pulley. The basic problem is that Pulley needs the
ability to call back into Wasmtime to implement tasks such as
`memory.grow`, imported functions, etc. For native platforms this is a
simple `call_indirect` operation in Cranelift but the story for Pulley
must be different because it's effectively switching from interpreted
code to native code.

The initial idea for this in #9651 is replaced here and looks mostly
similar but with a few changes. The overall structure of how this works
is:

* A new `call_indirect_host` opcode is added to Pulley.
* Function signatures that can be called from Pulley bytecode are
statically enumerated at build-time.
* This enables the implementation of `call_indirect_host` to take an
immediate of which signature is being used and cast the function
pointer to the right type.
* A new pulley-specific relocation is added to Cranelift for this opcode.
* `RelocDistance::Far` calls to a name trigger the use of
`call_indirect_host`.
* The relocation is filled in by Wasmtime after compilation where the
signature number is inserted.
* A new `NS_*` value for user-function namespaces is reserved in
`wasmtime-cranelift` for this new namespace of functions.
* Code generation for Pulley in `wasmtime-cranelift` now has
Pulley-specific handling of the wasm-to-host transition where all
previous `call_indirect` instructions are replaced with a call to a
"backend intrinsic" which gets lowered to a `call_indirect_host`.

Note that most of this still isn't hooked up everywhere in Wasmtime.
That means that the testing here is pretty light at this time. It'll
require a fair bit more work to get everything fully integrated from
Wasmtime in Pulley. This is expected to be one of the significant
remaining chunks of work and should help unblock future testing (or make
those diffs smaller ideally).

* Review comments

show more ...


Revision tags: v27.0.0, v26.0.1, v25.0.3, v24.0.2, v26.0.0, v21.0.2, v22.0.1, v23.0.3, v25.0.2, v24.0.1, v25.0.1, v25.0.0
# c0c3a68c 21-Aug-2024 Nick Fitzgerald <[email protected]>

Cranelift: Remove the old stack maps implementation (#9159)

They are superseded by the new user stack maps implementation.


Revision tags: v24.0.0, v23.0.2
# a0442ea0 05-Aug-2024 Hamir Mahal <[email protected]>

Enforce `uninlined_format_args` for the workspace (#9065)

* Enforce `uninlined_format_args` for the workspace

* fix: failing `Monolith Checks` job

* fix: formatting


Revision tags: v23.0.1, v23.0.0
# e20b4244 27-Jun-2024 Nick Fitzgerald <[email protected]>

Cranelift: Take user stack maps through lowering and emission (#8876)

* Cranelift: Take user stack maps through lowering and emission

Previously, user stack maps were inserted by the frontend and p

Cranelift: Take user stack maps through lowering and emission (#8876)

* Cranelift: Take user stack maps through lowering and emission

Previously, user stack maps were inserted by the frontend and preserved in the
mid-end. This commit takes them from the mid-end CLIF into the backend vcode,
and then from that vcode into the finalized mach buffer during emission.

During lowering, we compile the `UserStackMapEntry`s into packed
`UserStackMap`s. This is the appropriate moment in time to do that coalescing,
packing, and compiling because the stack map entries are immutable from this
point on.

Additionally, we include user stack maps in the `Debug` and disassembly
implementations for vcode, just after their associated safepoint
instructions. This allows us to see the stack maps we are generating when
debugging, as well as write filetests that check we are generating the expected
stack maps for the correct instructions.

Co-Authored-By: Trevor Elliott <[email protected]>

* uncomment debug assert that was commented out for debugging

* Address review feedback

* remove new method that was actually never needed

---------

Co-authored-by: Trevor Elliott <[email protected]>

show more ...


Revision tags: v22.0.0, v21.0.1, v21.0.0
# e165106b 17-May-2024 Trevor Elliott <[email protected]>

cranelift: Remove nominal-sp (#8643)

* Update the frame layout comment

* Remove more references to nominal SP

* Remove the nominal_sp_offset from backend emit states

* Continue removing reference

cranelift: Remove nominal-sp (#8643)

* Update the frame layout comment

* Remove more references to nominal SP

* Remove the nominal_sp_offset from backend emit states

* Continue removing references to the nominal sp

* Remove nominal-sp from the aarch64 backend

* Remove nominal-sp from the s390x backend

* Remove nominal-sp from the riscv64 backend

* Remove old comment

show more ...


# 54e53cc7 16-May-2024 Trevor Elliott <[email protected]>

cranelift: Remove the virtual sp offset from all backends (#8631)

* gen_nominal_sp_adj now returns a smallvec

* Remove the virtual sp offset from the x64 backend

* Remove the virtual sp offset fro

cranelift: Remove the virtual sp offset from all backends (#8631)

* gen_nominal_sp_adj now returns a smallvec

* Remove the virtual sp offset from the x64 backend

* Remove the virtual sp offset from the aarch64 backend

* Remove the virtual sp offset from the riscv64 backend

* Remove the virtual sp offset from the s390x backend

* Remove gen_nomninal_sp_adj, and argument area management functions

* Remove get_virtual_sp_offset_from_state

* Code review suggestions

show more ...


# 7d703191 13-May-2024 Jamey Sharp <[email protected]>

cranelift: Delete more unused regalloc-related stuff (#8604)

Part of the ongoing saga of #8524, #8566, #8581, and #8592


Revision tags: v20.0.2, v20.0.1
# 688cd8f6 29-Apr-2024 Jamey Sharp <[email protected]>

cranelift: Generalize OperandCollector into a trait (#8499)

This paves the way for more implementations of this OperandVisitor trait
which can do different things with the operands.

Of particular n

cranelift: Generalize OperandCollector into a trait (#8499)

This paves the way for more implementations of this OperandVisitor trait
which can do different things with the operands.

Of particular note, this commit introduces a second implementation which
is used only in the s390x backend and only to implement a debug
assertion. Previously, s390x used an OperandCollector::no_reuse_def
method to implement this assertion, but I didn't want to require that
all implementors of the new trait have to provide that method, so this
captures the same check but keeps it local to where it's needed.

show more ...


# d180b907 26-Apr-2024 Jamey Sharp <[email protected]>

cranelift: Update operand aliases in-place (#8486)

Now all registers passed to the operand collector are mutably borrowed
directly out of their original locations in the Inst, so it is possible
to u

cranelift: Update operand aliases in-place (#8486)

Now all registers passed to the operand collector are mutably borrowed
directly out of their original locations in the Inst, so it is possible
to update them in place.

As an initial demonstration of the utility of this change, the results
of the VReg renamer are applied directly to the instructions during
operand collection, and then all VReg aliases are cleared after operand
collection.

Most of this commit consists of deleting noise from the many
`get_operands` implementations in all the backends: most ampersands and
asterisks, and all uses of the `ref` keyword.

show more ...


Revision tags: v20.0.0, v17.0.3, v19.0.2, v18.0.4, v19.0.1, v19.0.0
# c423a693 13-Mar-2024 Jamey Sharp <[email protected]>

cranelift: Remove srcloc from emit state on all targets (#8122)

* cranelift: Remove srcloc from emit state on all targets

In #2426, @cfallin wrote:

> […] don't emit trap info unless an op can trap

cranelift: Remove srcloc from emit state on all targets (#8122)

* cranelift: Remove srcloc from emit state on all targets

In #2426, @cfallin wrote:

> […] don't emit trap info unless an op can trap.
>
> This end result was previously enacted by carrying a SourceLoc on
> every load/store, which was somewhat cumbersome, and only indirectly
> encoded metadata about a memory reference (can it trap) by its
> presence or absence.

That PR changed both backends that existed at the time to check both the
source location and the memory flags to determine whether a memory
access could trap.

Then in #2685, @cfallin wrote:

> Finally, while working out why trap records were not appearing, I had
> noticed that isa::x64::emit_std_enc_mem() was only emitting heap-OOB
> trap metadata for loads/stores when it had a srcloc. This PR ensures
> that the metadata is emitted even when the srcloc is empty.

However that PR did not apply the same change to other backends. Since
then, the pattern from #2426 has been copied to new backends.

I believe checking the source location has been unnecessary since #2426
and is now just a source of confusion at best, and possibly bugs at
worst. So this PR makes all targets match the behavior of the x64
backend.

In addition, this pattern was the only reason why source locations were
provided to any backend's emit state, so I'm removing that entirely.
The `cur_srcloc` field has been unused on x64 since #2685.

This change is mostly straightforward, but there are two questionable
changes in the riscv64 backend:

- The riscv64 backend had one use of this pattern for a
BadConversionToInteger trap. All other uses on all backends were for
HeapOutOfBounds traps. I suspect that was a copy-paste bug so I've
removed it just like all the others.

- The riscv64 `Inst::Atomic` does not have a MemFlags field, so this
means the HeapOutOfBounds trap metadata is added unconditionally for
such instructions.

* Filetests don't have srclocs so they get traps now

show more ...


Revision tags: v18.0.3, v18.0.2, v17.0.2
# 11609b68 22-Feb-2024 Ulrich Weigand <[email protected]>

s390x: Fix TLS GD relocation order (#7978)

When emitting a call to __tls_get_offset, the instruction needs to
carry two relocations, a R_390_PLT32DBL targeting __tls_get_offset
and a R_390_TLS_GDCAL

s390x: Fix TLS GD relocation order (#7978)

When emitting a call to __tls_get_offset, the instruction needs to
carry two relocations, a R_390_PLT32DBL targeting __tls_get_offset
and a R_390_TLS_GDCALL targeting the TLS symbol. Specifically, the
system linker expects to see these two relocation in that order.

However, the cranelift backend currently emits the relocations in
reverse order - this unfortunately causes the linker to corrupt
the instruction sequence when performing a TLS relaxation.

To fix this in the backend, I need support in machinst common code
to emit a relocation at some offset to the instruction about to be
emitted (e.g. the relocation should target two bytes into the 6-byte
instruction that will be emitted next). I've added a new routine
add_reloc_at_offset to that effect. This also allowed to simplify
some existing code in the backend.

Also, change the disassembler to print multiple relocations on
a single instruction, if present.

show more ...


Revision tags: v18.0.1, v18.0.0, v17.0.1, v17.0.0
# 61807647 02-Jan-2024 Alex Crichton <[email protected]>

Bump MSRV to 1.73.0, use 1.75.0 in CI (#7739)

* Bump MSRV to 1.73.0, use 1.75.0 in CI

Pulling in the Rust update released over the winter holidays.

* Fix more warnings


12345