History log of /wasmtime-44.0.1/cranelift/codegen/src/machinst/buffer.rs (Results 1 – 25 of 109)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7
# bac0e78f 01-Apr-2026 Alex Crichton <[email protected]>

aarch64: Disable csdb emission by default (#12932)

* aarch64: Disable csdb emission by default

This has a massive performance penalty on macOS, for example, and peer
compilers are not emitting this

aarch64: Disable csdb emission by default (#12932)

* aarch64: Disable csdb emission by default

This has a massive performance penalty on macOS, for example, and peer
compilers are not emitting this as part of on-by-default mitigations.
This commit preserves the option to emit it with an aarch64-specific
`use_csdb` flag, but the default is now `false` meaning that this is not
emitted by default.

Closes #12789

* Fix tests

* Fix tests & review comments

* Use ISLE rule introduced

show more ...


# baa6b27b 26-Mar-2026 Chris Fallin <[email protected]>

Cranelift: rework MachBuffer to handle very short-deadline jumps. (#12842)

* Cranelift: rework MachBuffer to handle very short-deadline jumps.

In #12811 it was reported that riscv64 compressed jump

Cranelift: rework MachBuffer to handle very short-deadline jumps. (#12842)

* Cranelift: rework MachBuffer to handle very short-deadline jumps.

In #12811 it was reported that riscv64 compressed jumps (`c.j`
instructions), with a +/- 2048-byte range, could cause panics when
combined with queued-up/deferred constants in a constant pool during
binary emission.

Our `MachBuffer` handles single-pass machine code emission, resolution
of labels, and upgrading of label ranges via "veneers" (jumps that a
shorter jump can reach that themselves have a longer range). We track a
pending "deadline" of all unresolved branches, and when the deadline is
too close (including the max size of all veneers yet to be emitted), we
emit an "island" of all veneers to resolve the deadline.

After its initial design, we added support for deferred traps and
constants to the `MachBuffer`. These worked by emitting their contents
*before* the "island" of veneers, which turns out to be slightly nicer
for code layout in some cases.

Unfortunately the full implications of those additions weren't realized
against the invariants of the deadline-resolution algorithm. In
particular, when a new branch is added with a very short range (e.g.,
`c.j`), it is possible that there are *already* too many queued-up
traps/constants for the range of that just-emitted branch to reach even
the first possible veneer site if we start an island right away.

Thus it is strictly necessary to emit the veneers before
constants/traps. Unfortunately this requires some alterations to other
aspects of label resolution as well: in particular, we can't resolve
fixups for label references to constants before we emit those constants,
and likewise for traps. Note that we do a fixpoint loop over emitting
island(s) at the end of emission, so all constants/traps *will* be
emitted and label references to them *will* be resolved eventually; just
in the opposite order, now.

No compile test because the particular reduced testcase in #12811 only
worked in the `release-36.0.0` branch, and not on `main`, and it was too
hard to tweak the test to hit the right case on `main` as well. In lieu
of that, I've added a unit test directly to the `MachBuffer`
implementation to exercise this case.

Fixes #12811.

* fix filetest with errant comments confusing precise-output check

show more ...


# ab78bd82 22-Mar-2026 Ho Kim <[email protected]>

fix: correct various typos (#12807)

Signed-off-by: Ho Kim <[email protected]>


Revision tags: v43.0.0, v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6, v41.0.3, v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1
# 0889323a 03-Jan-2026 SSD <[email protected]>

cranelift-codegen: rename most uses of std to core and alloc (#12237)

* rename most std uses to core and alloc

* cargo fmt


Revision tags: v40.0.0
# 17fbd3c6 12-Dec-2025 Chris Fallin <[email protected]>

Debug: implement breakpoints and single-stepping. (#12133)

* Debug: implement breakpoints and single-stepping.

This is a PR that puts together a bunch of earlier pieces (patchable
calls in #12061 a

Debug: implement breakpoints and single-stepping. (#12133)

* Debug: implement breakpoints and single-stepping.

This is a PR that puts together a bunch of earlier pieces (patchable
calls in #12061 and #12101, private copies of code in #12051, and all
the prior debug event and instrumentation infrastructure) to implement
breakpoints in the guest debugger.

These are implemented in the way we have planned in #11964: each
sequence point (location prior to a Wasm opcode) is now a patchable call
instruction, patched out (replaced with NOPs) by default. When patched
in, the breakpoint callsite calls a trampoline with the `patchable` ABI
which then invokes the `breakpoint` hostcall. That hostcall emits the
debug event and nothing else.

A few of the interesting bits in this PR include:
- Implementations of "unpublish" (switch permissions back to read/write
from read/execute) for mmap'd code memory on all our platforms.
- Infrastructure in the frame-tables (debug info) metadata producer and
parser to record "breakpoint patches".
- A tweak to the NOP metadata packaged with the `MachBuffer` to allow
multiple NOP sizes. This lets us use one 5-byte NOP on x86-64, for
example (did you know x86-64 had these?!) rather than five 1-byte
NOPs.

This PR also implements single-stepping with a global-per-`Store` flag,
because at this point why not; it's a small additional bit of logic to
do *all* patches in all modules registered in the `Store` when that flag
is enabled.

A few realizations for future work:
- The need for an introspection API available to a debugger to see the
modules within a component is starting to become clear; either that,
or the "module and PC" location identifier for a breakpoint switches
to a "module or component" sum type. Right now, the tests for this
feature use only core modules. Extending to components should not
actually be hard at all, we just need to build the API for it.
- The interaction between inlining and `patchable_call` is interesting:
what happens if we inline a `patchable_call` at a `try_call` callsite?
Right now, we do *not* update the `patchable_call` to a `try_call`,
because there is no `patchable_try_call`; this is fine in the Wasmtime
embedding in practice because we never (today!) throw exceptions from
a breakpoint handler. This does suggest to me that maybe we should
make patchability a property of any callsite, and allow try-calls to
be patchable too (with the same restriction about no return values as
the only restriction); but happy to discuss that one further.

* Add missing debug.wat disas test.

* Review feedback.

* Fix comment on `CodeMemory::text_mut`.

* Review feedback.

* Review feedback: abort process on failure to re-apply executable permissions.

* Implement icache flush for aarch64.

This appears to be necessary as we otherwise see a failure in CI on
macOS/aarch64 that is consistent with patched-in breakpoint calls still
being incorrectly cached after we remove them and republish the code.

There is a longstanding issue in #3310 tracking proper icache coherence
handling on aarch64. We implemented this for Linux with the `membarrier`
syscall but never did so for macOS. Maybe this is the first point at
which it matters, because code was always loaded at new addresses (hence
did not have coherence issues because nothing would have been cached)
previously.

prtest:full

* Review feedback: use `next_multiple_of`.

show more ...


# c00e9ea2 02-Dec-2025 Chris Fallin <[email protected]>

Cranelift: add patchable call instructions. (#12101)

* Cranelift: add patchable call instructions.

The new `patchable_call` CLIF instruction pairs with the `patchable`
ABI, and emits a callsite wit

Cranelift: add patchable call instructions. (#12101)

* Cranelift: add patchable call instructions.

The new `patchable_call` CLIF instruction pairs with the `patchable`
ABI, and emits a callsite with one new key property: the MachBuffer
carries metadata that describes exactly which byte range to "NOP out"
(overwrite with NOP instructions) to disable that callsite. Doing so is
semantically valid and explicitly supported.

This enables patching of code at runtime to dynamically turn on and off
features such as instrumentation or debugging hooks. We plan to use this
to implement breakpoints in Wasmtime's guest debugging support.

As part of this change, I added a notion of "unit of NOP bytes" to the
MachBuffer so that the consumer (e.g., Wasmtime's Cranelift-based code
compilation pipeline and metadata-producing logic) can handle patchable
callsites without any other special knowledge of the ISA.

For the "real metal" ISAs there are perfectly well-defined NOPs to use,
but for Pulley, where all opcodes are assigned at compile time by macro
magic, I explicitly defined NOP as opcode byte 0 by moving `Nop`'s
definition to the top of the list and adding a unit test asserting its
encoding.

A design note: in principle it would be possible, as an alternative, to
treat "patchability" as an orthogonal dimension of all callsites, and
emit the metadata describing the instruction-offset range for any
callsite with the flag set. The only truly necessary semantic
restriction is that there are no return values (because if we turn the
callsite off, nothing writes to them); we could support patchability for
other ABIs and for the other kinds of call instructions. The `patchable`
ABI would then be better described as something like the "no clobbers
ABI". I opted not to generalize in this way because it creates some
less-tested corners and the generalized form, at least at the MachInst
level, is not really much simpler in the end.

A testing note: I opted not to implement actual code patching in the
`cranelift-tools` filetest runner and test patching callsites in/out via
some actuation (e.g. a magic hostcall, like we do for throws) because
(i) that's a lot of new plumbing and (ii) we are going to test this very
shortly in Wasmtime anyway and (iii) the correctness (or not) of the
location-and-length metadata is easy enough to verify in the
disassemblies in the compile-tests.

* Review feedback: remove dependence on (and test for) NOP being the literal byte 0.

show more ...


Revision tags: v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5
# d55a5c8b 29-Oct-2025 geogrego <[email protected]>

docs: minor improvement for docs (#11952)

Signed-off-by: geogrego <[email protected]>


Revision tags: v38.0.3, v38.0.2, v38.0.1, v37.0.2
# a3d6e407 06-Oct-2025 Chris Fallin <[email protected]>

Cranelift: add debug tag infrastructure. (#11768)

* Cranelift: add debug tag infrastructure.

This PR adds *debug tags*, a kind of metadata that can attach to CLIF
instructions and be lowered to VCo

Cranelift: add debug tag infrastructure. (#11768)

* Cranelift: add debug tag infrastructure.

This PR adds *debug tags*, a kind of metadata that can attach to CLIF
instructions and be lowered to VCode instructions and as metadata on
the produced compiled code. It also adds opaque descriptor blobs
carried with stackslots. Together, these two features allow decorating
IR with first-class debug instrumentation that is properly preserved
by the compiler, including across optimizations and
inlining. (Wasmtime's use of these features will come in followup
PRs.)

The key idea of a "debug tag" is to allow the Cranelift embedder to
express whatever information it needs to, in a format that is opaque
to Cranelift itself, except for the parts that need translation during
lowering. In particular, the `DebugTag::StackSlot` variant gets
translated to a physical offset into the stackframe in the compiled
metadata output. So, for example, the embedder can emit a tag
referring to a stackslot, and another describing an offset in that
stackslot.

The debug tags exist as a *sequence* on any given instruction; the
meaning of the sequence is known only to the embedder, *except* that
during inlining, the tags for the inlining call instruction are
prepended to the tags of inlined instructions. In this way, a
canonical use-case of tags as describing original source-language
frames can preserve the source-language view even when multiple
functions are inlined into one.

The descriptor on a stackslot may look a little odd at first, but its
purpose is to allow serializing some description of
stackslot-contained runtime user-program data, in a way that is firmly
attached to the stackslot. In particular, in the face of inlining,
this descriptor is copied into the inlining (parent) function from the
inlined function when the stackslot entity is copied; no other
metadata outside Cranelift needs to track the identity of stackslots
and know about that motion. This fits nicely with the ability of tags
to refer to stackslots; together, the embedder can annotate
instructions as having certain state in stackslots, and describe the
format of that state per stackslot.

This infrastructure is tested with some compile-tests now;
testing of the interpretation of the metadata output will come with
end-to-end debug instrumentation tests in a followup PR.

* Review feedback: add back sequence points and enforce tags only on sequence points or calls.

* Use Vecs for debug metadata in MachBuffer to avoid SmallVec size penalty in not-used case.

* Review feedback: switch from inlined stackslot descriptor blobs to u64 keys.

show more ...


Revision tags: v37.0.1, v37.0.0, v36.0.2
# fa1d6867 21-Aug-2025 Chris Fallin <[email protected]>

Wasmtime/Cranelift: carry "FP to SP offset" in exception data, and use it in stackwalk. (#11500)

* Wasmtime/Cranelift: carry "FP to SP offset" in exception data, and use it in stackwalk.

Currently

Wasmtime/Cranelift: carry "FP to SP offset" in exception data, and use it in stackwalk. (#11500)

* Wasmtime/Cranelift: carry "FP to SP offset" in exception data, and use it in stackwalk.

Currently Wasmtime unwinds stack frames to look for exception handlers
by walking frames one-by-one, following the FP chain as usual, and
assuming that *these frames are contiguous*: that is, that the SP in
any given frame (bottom of that frame) is immediately above the FP of
the next lower frame, plus the FP/return address pair (e.g. 16 bytes).
This allows us to get the SP for any given frame in addition to FP. We
need SP for two reasons:

- To look up dynamic context, to match Wasm tag instances for handlers
against the thrown tag;
- To actually set SP when we resume, if we do resume to a handler in
this frame.

This logic *almost but not quite* worked: I had forgotten that in our
tail-call ABI, we need to clean up incoming stack args in the
callee (because only the final callee in a parade of tail-calling
functions that reuse the same stack frame location knows how many args
it has, not the original caller). This implies that there is an
"incoming args area" *above* the FP/return address pair. Thus, frames
are not necessarily contiguous by the above definition.

In #11489 we see a case where a function of signature `(func)`
tail-calls one of `(func (param i32 i32 i32 i32 i32))`, which on
x86-64 (with four arg registers left for Wasm) is sufficient to create
incoming stack args, which then trips up the unwinder, reading a bogus
vmctx and segfaulting.

The most reasonable solution seems to be to embed the SP-to-FP offset
in the exception metadata itself, so from only the FP (which is
totally robust -- we rely on the FP chain for multiple kinds of
stack-walking) we can get the SP, allowing us to read dynamic context
and to reset SP during resume.

This PR does just that. Technically, in our ABI, the SP-to-FP offset
is constant for an entire function, but it was simpler in the
exception metadata to encode this per callsite instead (there is no
other notion of "per-function" data, only "per-callsite", so it would
be a separate binary search).

Fixes #11489.

prtest:full

* Review feedback.

show more ...


Revision tags: v36.0.1, v36.0.0
# 4590076f 26-Jul-2025 Chris Fallin <[email protected]>

Cranelift: support dynamic contexts in exception-handler lists. (#11321)

In #11285, we realized that Wasm semantics require us to match on
dynamic instances of exception tags, rather than static tag

Cranelift: support dynamic contexts in exception-handler lists. (#11321)

In #11285, we realized that Wasm semantics require us to match on
dynamic instances of exception tags, rather than static tag types. This
fundamentally requires the unwinder to be able to resolve the current
Wasm instance for each Wasm frame on the stack that has any handlers,
and our frame format does not provide this today.

We discussed many options, some of which solve the more general problem
(Wasm vmctx for any frame), but ultimately landed on a notion of
"dynamic context for evaluating tags", specific to Cranelift's
exception-catch metadata; and storing that context and carrying it
through to a place that is named in the unwind metadata. The reasoning
is fairly straightforward: we cannot afford a more general approach that
stores vmctx in every frame (I measured this at 20% overhead for a
recursive-Fibonacci benchmark that is call-intensive); and inlining
means that we may have *multiple* contexts at any given program point,
each associated with a different slice of the handler tags; so we need a
mechanism that, *just for a try-call*, intersperses contexts with tags
(or puts a context on each tag) and stores these somewhere that the
exception-unwind ABI doesn't clobber (e.g., on the stack).

This PR implements "option 4" from that issue, namely, *dynamic
exception contexts*. The idea is that this is the dual to exception
payload: while payload lets the unwinder communicate state *to* the
catching code, context lets the unwinder take state *from* the catching
code that lets it decide whether the tag is a match. Because of
inlining, we need to either associate (optional) context with every tag,
or intersperse context-updates with handler tags. I've opted for the
latter for efficiency at the CLIF level (in most cases there will be
multiple tags per context), though they are isomorphic.

The new tag-matching semantics are: when walking up the stack, upon
reaching a `try_call`, evaluate catch-clauses in listed order. A
`context` clause sets the current context. A `tagN: block(...)` clause
attempts to match the throwing exception against `tagN`, *evaluated in
the current context*, and branches to the named block if it matches. A
`default: block(...)` always branches to the named block.

Note that this lets us assume less about tags than before, and this
particularly manifests in the changes to the inliner. Whereas before,
`tagN` is `tagN` and an inner handler for that tag shadows an outer
handler (that is, tags always alias if identical indices); and whereas
before, `tagN` is not `tagM` and so we can order the tags arbitrarily
(that is, tags never alias if non-identical indices); now any two static
tag indices may or may not alias depending on the dynamic context of
each. Or, even in the same context, two may alias, because we leave the
match-predicate as an unspecified (user-chosen) algorithm during
unwinding. (This mirrors the reality that, for example, a Wasm instance
may import two tags, and dynamically these tags may be equal or
different at runtime, even instantiation-to-instantiation.) Cranelift's
only job is to faithfully carry the list of contexts and tags through to
the compiled-code metadata; and to ensure that they remain in the order
they were specified in the CLIF.

This PR introduces the Cranelift-level feature, and it will be used in
a subsequent PR that introduces Wasm exception handling. Because of
that, I've opted not to update the clif-utils runtest "runtime" to read
out contexts and do something with them -- we will have plenty of test
coverage via a bunch of Wasm tests for corner cases such as the above.
This PR does include filetests that show that contexts are carried
through to spillslots and those appear in the metadata.

Fixes #11285.

show more ...


Revision tags: v35.0.0, v24.0.4, v33.0.2, v34.0.2
# 0854775b 08-Jul-2025 bjorn3 <[email protected]>

Couple of optimizations to the Cranelift incremental cache (#11186)

* Fix a couple of comments

* Remove flags.predicate_view()

It is a remenant of the old backend framework.

* Avoid string conver

Couple of optimizations to the Cranelift incremental cache (#11186)

* Fix a couple of comments

* Remove flags.predicate_view()

It is a remenant of the old backend framework.

* Avoid string conversions for hashing the TargetIsa

* Remove func_body_len

It is identical to buffer.data.len()

* Introduce IsaFlagsHashKey

show more ...


Revision tags: v34.0.1, v33.0.1, v24.0.3, v32.0.1, v34.0.0
# 703871a2 27-May-2025 Alex Crichton <[email protected]>

Enable the `useless_conversion` Clippy lint (#10838)

* Enable the `useless_conversion` Clippy lint

We've got lots of types in Wasmtime and convert between them quite a
lot, but often over time conv

Enable the `useless_conversion` Clippy lint (#10838)

* Enable the `useless_conversion` Clippy lint

We've got lots of types in Wasmtime and convert between them quite a
lot, but often over time conversions become unnecessary through
refactorings or similar. This will hopefully enable us to clean up some
conversions as they come up to try to have as few as possible ideally.

* Review comments

show more ...


Revision tags: v33.0.0
# 90ac295e 19-May-2025 Alex Crichton <[email protected]>

Update Wasmtime to the 2024 Rust Edition (#10806)

* Update Wasmtime to the 2024 Rust Edition

Now that our MSRV supports the 2024 edition it's possible to make this
switch. This commit moves Wasmtim

Update Wasmtime to the 2024 Rust Edition (#10806)

* Update Wasmtime to the 2024 Rust Edition

Now that our MSRV supports the 2024 edition it's possible to make this
switch. This commit moves Wasmtime to the 2024 Edition to keep
up-to-date with Rust idioms and access many of the edition features
exclusive to the 2024 edition.

prtest:full

* Reformat with the 2024 edition

show more ...


Revision tags: v32.0.0
# c9db233a 18-Apr-2025 Chris Fallin <[email protected]>

Cranelift: move exception-handler metadata into callsites. (#10609)

* Rework MachBuffer interface for exception_handlers

* Rework MachBuffer to store exception handler records in flattened vector.

Cranelift: move exception-handler metadata into callsites. (#10609)

* Rework MachBuffer interface for exception_handlers

* Rework MachBuffer to store exception handler records in flattened vector.

This commit updates the call-site metadata to refer to a range in a
flattened vector containing tuples of handler tags and labels (before
finalization) or code offsets (after finalization). It also provides an
iterator accessor `.call_sites()` on the finalized buffer that yields
this information in a safe way.

---------

Co-authored-by: bjorn3 <[email protected]>

show more ...


# 94ec88ea 08-Apr-2025 Chris Fallin <[email protected]>

Cranelift: initial try_call / try_call_indirect (exception) support. (#10510)

* Cranelift: initial try_call / try_call_indirect (exception) support.

This PR adds `try_call` and `try_call_indirect`

Cranelift: initial try_call / try_call_indirect (exception) support. (#10510)

* Cranelift: initial try_call / try_call_indirect (exception) support.

This PR adds `try_call` and `try_call_indirect` instructions, and
lowerings on four of five ISAs (x86-64, aarch64, riscv64, pulley; s390x
has its own non-shared ABI code that will need separate work).

It extends CLIF to support these instructions as new kinds of branches,
and extends block-calls to accept `retN` and `exnN` block-call args that
carry the normal return values or exception payloads (respectively) into
the appropriate successor blocks.

It wires up the "normal return path" so that it continues to work.
It updates the ABI so that unwinding is possible without an initial
register state at throw: specifically, as per our RFC, all registers are
clobbered. It also includes metadata in the `MachBuffer` that describes
exception-catch destinations. However, no unwinder exists to interpret
these catch-destinations yet, so they are untested.

* Add try_call_indirect lowering as well.

show more ...


Revision tags: v31.0.0
# 2af0a1f7 13-Mar-2025 bjorn3 <[email protected]>

Introduce log2_min_function_alignment flag (#10391)

* Remove function_alignment handling from cranelift-object and cranelift-jit

It is already handled by MachBuffer. The symbol_alignment could also

Introduce log2_min_function_alignment flag (#10391)

* Remove function_alignment handling from cranelift-object and cranelift-jit

It is already handled by MachBuffer. The symbol_alignment could also be
removed as no current backend has a symbol alignment bigger than the
function alignment, but keeping it around is a bit safer when new
backends are introduced.

* Introduce log2_min_function_alignment flag

This is required for cg_clif to implement -Zmin-function-alignment.

show more ...


Revision tags: v30.0.2, v30.0.1, v30.0.0
# b09b892c 27-Jan-2025 Andrew Brown <[email protected]>

refactor: unify how bits are accessed in `cranelift-entity` (#10126)

* refactor: unify how bits are accessed in `cranelift-entity`

While using `MachLabel`, a `cranelift-entity`-created type, I noti

refactor: unify how bits are accessed in `cranelift-entity` (#10126)

* refactor: unify how bits are accessed in `cranelift-entity`

While using `MachLabel`, a `cranelift-entity`-created type, I noticed
that there were three ways to access the contained bits: `.get()`,
`.as_u32()`, and `.as_bits()`. All performed essentially the same
function and it was unclear which to use.

This change removes `MachLabel::get()`, replacing it with `as_u32()`.
It also replaces all uses of `from_bits()` and `as_bits()` with
`from_u32()` and `as_u32()`. Why? I would have preferred the "bits"
naming since it seems more clear ("just unwrap this thing") and it could
avoid a large rename if the type were changed in the future, I realized
that there are vastly more uses of the "u32" naming that already
exist--it's just easier.

While this refactoring _should_ result in no functional change, you may
notice a couple of failing tests related to a pre-existing check on
`from_u32` that did not exist on `from_bits`. For some reason,
`from_u32` asserted that we would never pick `u32::MAX` for an entity
value; unfortunately, some parsing code, `decode_narrow_field`, does
just this. Why did we have such an assertion in the first place? Is it
still needed? Should `decode_narrow_field` do something else?

* Re-add `from_bits`, `as_bits` and uses

* doc: tweak doc comment

show more ...


Revision tags: v29.0.1, v29.0.0, v28.0.1
# f6f447b0 20-Dec-2024 Alex Crichton <[email protected]>

pulley: Add macro `CallN` instructions (#9874)

* pulley: Add macro `CallN` instructions

This commit adds new macro instructions to assist with speeding up calls
between functions. Pulley's previous

pulley: Add macro `CallN` instructions (#9874)

* pulley: Add macro `CallN` instructions

This commit adds new macro instructions to assist with speeding up calls
between functions. Pulley's previous `Call` instruction was similar to
native call instructions where arguments/results are implicitly in the
right location according to the ABI, but movement between registers is
more expensive with Pulley than with native architectures. The `CallN`
instructions here enable listing a few arguments (only integer
registers) in the opcode itself. This removes the need for individual `xmov`
instructions into individual registers and instead it can all be done
within the opcode handlers.

This additionally enables passing the same argument twice to a function
to reside only in one register. Finally parallel-copies between these
registers are supported as the interpreter loads all registers and then
stores all registers.

These new instructions participate in register allocation differently
from before where the first few arguments are allowed to be in any
register and no longer use `reg_fixed_use`. All other arguments (and all
float arguments for example) continue to use `reg_fixed_use`.

Locally sightglass reports this change speeding up `pulldown-cmark` by
2-10%. On a `fib(N)` micro-benchmark it didn't help as much as I was
hoping that it was going to.

* Fix MSRV

show more ...


Revision tags: v28.0.0
# 031a28a4 17-Dec-2024 ad hoc <[email protected]>

aarch64: support udiv for 32bit integers (#9798)

* emit 32bit udiv

* winch: aarch64 udiv/urem without extension

* remove stray dbg!

* fmt

* remove println

* fix formatting in ISLE

* Sized Trap

aarch64: support udiv for 32bit integers (#9798)

* emit 32bit udiv

* winch: aarch64 udiv/urem without extension

* remove stray dbg!

* fmt

* remove println

* fix formatting in ISLE

* Sized TrapIf

* move operand size into CondBrKind variant

* show_reg_sized fallback

show more ...


# 66989d9d 05-Dec-2024 Andrew Brown <[email protected]>

Fix minor formatting issues (#9748)

* format: fix typo

* format: wrap line length

* format: re-wrap comment

* format: organize crate dependencies


# 438fc938 25-Nov-2024 Alex Crichton <[email protected]>

pulley: Implement interpreter-to-host calls (#9665)

* pulley: Implement interpreter-to-host calls

This commit is an initial stab at implementing interpreter-to-host
communication in Pulley. The bas

pulley: Implement interpreter-to-host calls (#9665)

* pulley: Implement interpreter-to-host calls

This commit is an initial stab at implementing interpreter-to-host
communication in Pulley. The basic problem is that Pulley needs the
ability to call back into Wasmtime to implement tasks such as
`memory.grow`, imported functions, etc. For native platforms this is a
simple `call_indirect` operation in Cranelift but the story for Pulley
must be different because it's effectively switching from interpreted
code to native code.

The initial idea for this in #9651 is replaced here and looks mostly
similar but with a few changes. The overall structure of how this works
is:

* A new `call_indirect_host` opcode is added to Pulley.
* Function signatures that can be called from Pulley bytecode are
statically enumerated at build-time.
* This enables the implementation of `call_indirect_host` to take an
immediate of which signature is being used and cast the function
pointer to the right type.
* A new pulley-specific relocation is added to Cranelift for this opcode.
* `RelocDistance::Far` calls to a name trigger the use of
`call_indirect_host`.
* The relocation is filled in by Wasmtime after compilation where the
signature number is inserted.
* A new `NS_*` value for user-function namespaces is reserved in
`wasmtime-cranelift` for this new namespace of functions.
* Code generation for Pulley in `wasmtime-cranelift` now has
Pulley-specific handling of the wasm-to-host transition where all
previous `call_indirect` instructions are replaced with a call to a
"backend intrinsic" which gets lowered to a `call_indirect_host`.

Note that most of this still isn't hooked up everywhere in Wasmtime.
That means that the testing here is pretty light at this time. It'll
require a fair bit more work to get everything fully integrated from
Wasmtime in Pulley. This is expected to be one of the significant
remaining chunks of work and should help unblock future testing (or make
those diffs smaller ideally).

* Review comments

show more ...


Revision tags: v27.0.0, v26.0.1, v25.0.3, v24.0.2, v26.0.0, v21.0.2, v22.0.1, v23.0.3, v25.0.2, v24.0.1
# 9fc41bae 01-Oct-2024 Alex Crichton <[email protected]>

Convert `TrapCode` to a single byte (#9338)

* Convert `TrapCode` to a single byte

This commit refactors the representation of
`cranelift_codegen::ir::TrapCode` to be a single byte. The previous
enu

Convert `TrapCode` to a single byte (#9338)

* Convert `TrapCode` to a single byte

This commit refactors the representation of
`cranelift_codegen::ir::TrapCode` to be a single byte. The previous
enumeration is replaced with an opaque byte-sized structure. Previous
variants that Cranelift uses internally are now associated `const`
values on `TrapCode` itself. For example `TrapCode::IntegerOverflow` is
now `TrapCode::INTEGER_OVERFLOW`. All non-Cranelift traps are now
removed and exclusively live in the `wasmtime-cranelift` crate now.

The representation of a `TrapCode` is now:

* 0 - invalid, used in `MemFlags` for "no trap code"
* 1..256-N - user traps
* 256-N..256 - built-in Cranelift traps (it uses N of these)

This enables embedders to have 255-N trap codes which is more than
enough for Wasmtime for example. Cranelift reserves a few built-in codes
for itself which shouldn't eat too much into the trap space.
Additionally if Cranelift needs to grow a new trap it can do so pretty
easily too.

The overall intent of this commit is to reduce the coupling of Wasmtime
and Cranelift further and generally refactor Wasmtime to use user traps
more often. This additionally shrinks the size of `TrapCode` for storage
in various locations, notably it can now infallibly be represented
inside of a `MemFlags`.

Closes #9310

* Fix some more tests

* Fix more tests

* Fix even more tests

* Review comments

* Fix tests

* Fix rebase conflict

* Update test expectations

show more ...


Revision tags: v25.0.1, v25.0.0
# c0c3a68c 21-Aug-2024 Nick Fitzgerald <[email protected]>

Cranelift: Remove the old stack maps implementation (#9159)

They are superseded by the new user stack maps implementation.


Revision tags: v24.0.0
# b2025ead 19-Aug-2024 Nick Fitzgerald <[email protected]>

Switch to new "user" stack maps and use `i32` for GC refs in Wasmtime (#9082)

This moves Wasmtime over from the old, regalloc-based stack maps system to the
new "user" stack maps system.

Removing t

Switch to new "user" stack maps and use `i32` for GC refs in Wasmtime (#9082)

This moves Wasmtime over from the old, regalloc-based stack maps system to the
new "user" stack maps system.

Removing the old regalloc-based stack maps system is left for follow-up work.

show more ...


# b5268651 14-Aug-2024 Nick Fitzgerald <[email protected]>

Cranelift: Add a new backend for emitting Pulley bytecode (#9089)

* Cranelift: Add a new backend for emitting Pulley bytecode

This commit adds two new backends for Cranelift that emits 32- and 64-b

Cranelift: Add a new backend for emitting Pulley bytecode (#9089)

* Cranelift: Add a new backend for emitting Pulley bytecode

This commit adds two new backends for Cranelift that emits 32- and 64-bit Pulley
bytecode. The backends are both actually the same, with a common implementation
living in `cranelift/codegen/src/isa/pulley_shared`. Each backend configures an
ISA flag that determines the pointer size, and lowering inspects this flag's
value when lowering memory accesses.

To avoid multiple ISLE compilation units, and to avoid compiling duplicate
copies of Pulley's generated `MInst`, I couldn't use `MInst` as the `MachInst`
implementation directly. Instead, there is an `InstAndKind` type that is a
newtype over the generated `MInst` but which also carries a phantom type
parameter that implements the `PulleyTargetKind` trait. There are two
implementations of this trait, a 32- and 64-bit version. This is necessary
because there are various static trait methods for the mach backend which we
must implement, and which return the pointer width, but don't have access to any
`self`. Therefore, we are forced to monomorphize some amount of code. This type
parameter is fairly infectious, and all the "big" backend
types (`PulleyBackend<P>`, `PulleyABICallSite<P>`, etc...) are parameterized
over it. Nonetheless, not everything is parameterized over a `PulleyTargetKind`,
and we manage to avoid duplicate `MInst` definitions and lowering code.

Note that many methods are still stubbed out with `todo!`s. It is expected that
we will fill in those implementations as the work on Pulley progresses.

* Trust the `pulley-interpreter` crate, as it is part of our workspace

* fix some clippy warnings

* Fix a dead-code warning from inside generated code

* Use a helper for emitting br_if+comparison instructions

* Add a helper for converting `Reg` to `pulley_interpreter::XReg`

* Add version to pulley workspace dependency

* search the pulley directory for crates in the publish script

show more ...


12345