| c00e9ea2 | 02-Dec-2025 |
Chris Fallin <[email protected]> |
Cranelift: add patchable call instructions. (#12101)
* Cranelift: add patchable call instructions.
The new `patchable_call` CLIF instruction pairs with the `patchable` ABI, and emits a callsite with one new key property: the MachBuffer carries metadata that describes exactly which byte range to "NOP out" (overwrite with NOP instructions) to disable that callsite. Doing so is semantically valid and explicitly supported.
This enables patching of code at runtime to dynamically turn on and off features such as instrumentation or debugging hooks. We plan to use this to implement breakpoints in Wasmtime's guest debugging support.
As part of this change, I added a notion of "unit of NOP bytes" to the MachBuffer so that the consumer (e.g., Wasmtime's Cranelift-based code compilation pipeline and metadata-producing logic) can handle patchable callsites without any other special knowledge of the ISA.
For the "real metal" ISAs there are perfectly well-defined NOPs to use, but for Pulley, where all opcodes are assigned at compile time by macro magic, I explicitly defined NOP as opcode byte 0 by moving `Nop`'s definition to the top of the list and adding a unit test asserting its encoding.
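A minimal sketch of how a consumer might use this metadata, assuming it arrives as an (offset, length) pair plus the ISA's NOP unit. `PatchableCallSite`, `disable_call`, and `enable_call` are hypothetical names for illustration, not Cranelift's or Wasmtime's API. The only ISA-specific input is the NOP unit; the sketch only requires the patch range to be a whole multiple of it.

```rust
/// Hypothetical metadata for one patchable callsite: the byte range, relative
/// to the start of the function's code, that is overwritten with NOPs to
/// disable the call.
#[derive(Clone, Copy, Debug)]
struct PatchableCallSite {
    offset: usize,
    len: usize,
}

/// Overwrites the callsite's byte range with repeated copies of `nop_unit`
/// and returns the original bytes so the call can be re-enabled later.
/// Panics if the range is out of bounds or not a whole number of NOP units.
fn disable_call(code: &mut [u8], site: PatchableCallSite, nop_unit: &[u8]) -> Vec<u8> {
    assert!(!nop_unit.is_empty(), "NOP unit must be non-empty");
    assert_eq!(
        site.len % nop_unit.len(),
        0,
        "patch range must be a whole number of NOP units"
    );
    let range = site.offset..site.offset + site.len;
    let saved = code[range.clone()].to_vec();
    for unit in code[range].chunks_exact_mut(nop_unit.len()) {
        unit.copy_from_slice(nop_unit);
    }
    saved
}

/// Restores the bytes saved by `disable_call`, turning the call back on.
fn enable_call(code: &mut [u8], site: PatchableCallSite, saved: &[u8]) {
    code[site.offset..site.offset + site.len].copy_from_slice(saved);
}
```

For example, with the AArch64 NOP (`0xd503201f`, stored little-endian as `1f 20 03 d5`) a 4-byte `bl` is replaced by one NOP unit, while on x86-64 a 5-byte `call rel32` could be filled with five single-byte `0x90` NOPs. Patching live code also requires making the pages writable and, on AArch64, maintaining the instruction cache; the sketch ignores both.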
A design note: in principle it would be possible, as an alternative, to treat "patchability" as an orthogonal dimension of all callsites, and emit the metadata describing the instruction-offset range for any callsite with the flag set. The only truly necessary semantic restriction is that there are no return values (because if we turn the callsite off, nothing writes to them); we could support patchability for other ABIs and for the other kinds of call instructions. The `patchable` ABI would then be better described as something like the "no clobbers ABI". I opted not to generalize in this way because it creates some less-tested corners and the generalized form, at least at the MachInst level, is not really much simpler in the end.
A testing note: I opted not to implement actual code patching in the `cranelift-tools` filetest runner and test patching callsites in/out via some actuation (e.g. a magic hostcall, like we do for throws) because (i) that's a lot of new plumbing, (ii) we are going to test this very shortly in Wasmtime anyway, and (iii) the correctness (or not) of the location-and-length metadata is easy enough to verify in the disassemblies in the compile-tests.
* Review feedback: remove dependence on (and test for) NOP being the literal byte 0.
|
| 192f2fcd | 08-Sep-2025 |
Alex Crichton <[email protected]> |
Replace setjmp/longjmp usage in Wasmtime (#11592)
Since Wasmtime's inception it has used the `setjmp` and `longjmp` functions in C to implement handling of traps. While this solution was easy to implement, relatively portable, and performant enough, a number of downsides have emerged over time that make it an unattractive approach in the long run:
* Using `setjmp` fundamentally requires using C because Rust does not understand a function that returns twice. It's fundamentally unsound to invoke `setjmp` in Rust, meaning that Wasmtime has always needed a configured C compiler to build. Notably, this means that `cargo check` cannot easily check other targets.
* Using `longjmp` means that Rust function frames are unwound from the stack without running destructors. This is a dangerous operation, and the compiler offers no protection against it. Frames both entering and exiting wasm are skipped. Minimizing this as much as possible has been beneficial for portability to platforms such as Pulley.
* Currently the no_std implementation of Wasmtime requires embedders to provide `wasmtime_{setjmp,longjmp}`, which is a thorn in the side of what is otherwise an almost entirely independent implementation of Wasmtime.
* There is a performance floor to using `setjmp` and `longjmp`. Calling `setjmp` requires using C, but Wasmtime is otherwise written in Rust, meaning there's a Rust->C->Rust->Wasm boundary which fundamentally can't be inlined without cross-language LTO, and that is difficult to configure.
* With the implementation of the WebAssembly exceptions proposal, Wasmtime now has two means of unwinding the stack. Ideally Wasmtime would only have one, and exceptions are the more general of the two.
* Jumping out of a signal handler on Unix is tricky business. While we've made it work, it's generally most robust if the signal handler simply returns, which it now does.
With all of that in mind, the purpose of this commit is to replace the setjmp/longjmp mechanism of handling traps with the recently implemented support for exceptions in Cranelift. That is intended to resolve all of the above points in one fell swoop.
One point in particular that's nice about setjmp/longjmp, though, is that unwinding the stack on a trap is an O(1) operation. For situations such as stack overflow that's a particularly nice property to have, as we can guarantee embedders that traps are a constant-time (albeit somewhat expensive with signals) operation. Exceptions naively require unwinding the entire stack, and although frame pointers mean we're just traversing a linked list, I wanted to preserve the O(1) property here nonetheless. To achieve this, the array-to-wasm (host-to-wasm) trampolines set up state in `VMStoreContext` so that looking up the current trap handler frame is an O(1) operation. Namely, the sp/fp/pc values for a `Handler` are stored inline.
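A minimal sketch of the O(1) lookup described above, assuming the innermost handler's pc/sp/fp are kept inline in a context structure. `StoreContext`, `Handler`, and the method names are hypothetical simplifications of `VMStoreContext`; saving and restoring the previous handler on nested entry is an assumption of this sketch, not something the text states.

```rust
/// Hypothetical handler record: where execution resumes after a trap.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handler {
    pc: usize,
    sp: usize,
    fp: usize,
}

/// Hypothetical, heavily simplified stand-in for `VMStoreContext`: the
/// innermost handler is stored inline, so finding it needs no stack walk.
#[derive(Default)]
struct StoreContext {
    current_handler: Option<Handler>,
}

impl StoreContext {
    /// Models a host-to-wasm trampoline's entry: install its handler and
    /// hand back the previous one so nested host->wasm->host->wasm calls
    /// can restore it on exit (an assumption of this sketch).
    fn enter(&mut self, handler: Handler) -> Option<Handler> {
        self.current_handler.replace(handler)
    }

    /// Models the trampoline's exit path.
    fn exit(&mut self, prev: Option<Handler>) {
        self.current_handler = prev;
    }

    /// O(1): the trap path reads the handler directly, no matter how many
    /// wasm frames lie between it and the faulting instruction.
    fn handler_for_trap(&self) -> Option<Handler> {
        self.current_handler
    }
}
```

By contrast, a naive exception-style search would follow the frame-pointer chain outward from the trapping frame until it found a handler, which costs time proportional to stack depth.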
Implementing this feature required supporting relocations-to-offsets-in-functions which was not previously supported by Wasmtime. This required Cranelift refactorings such as #11570, #11585, and #11576. This then additionally required some more refactoring in this commit which was difficult to split out as it otherwise wouldn't be tested.
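To illustrate what "relocations-to-offsets-in-functions" means, here is a sketch assuming a table of function load addresses and absolute 64-bit little-endian relocations. `RelocTarget`, `resolve`, and `apply_abs64` are hypothetical; Cranelift's real relocation types differ. A plausible use, though the text does not say so directly, is recording a resume pc that lies partway through a trampoline rather than at its start.

```rust
/// Hypothetical relocation targets. `FuncStart` is the familiar kind;
/// `FuncOffset` names a byte offset inside a function's body.
#[derive(Clone, Copy, Debug)]
enum RelocTarget {
    FuncStart(usize),
    FuncOffset { func: usize, offset: u32 },
}

/// Resolves a target to an absolute address, given each function's load address.
fn resolve(target: RelocTarget, func_addrs: &[u64]) -> u64 {
    match target {
        RelocTarget::FuncStart(func) => func_addrs[func],
        RelocTarget::FuncOffset { func, offset } => func_addrs[func] + u64::from(offset),
    }
}

/// Applies an absolute 64-bit little-endian relocation at byte `at` in `code`.
fn apply_abs64(code: &mut [u8], at: usize, value: u64) {
    code[at..at + 8].copy_from_slice(&value.to_le_bytes());
}
```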
Apart from the relocation-related business, much of this change is about updating the platform signal handlers to use exceptions instead of longjmp to return. For example, on Unix this means updating the `ucontext_t` with register values that the handler specifies. Windows involves updating similar contexts, and macOS Mach ports ended up not needing many changes.
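The core of the new signal-handler strategy can be modeled without platform APIs: rather than `longjmp`ing out, the handler edits the interrupted register state and returns, and the OS resumes execution from the edited state. `SavedRegs`, `Handler`, and `redirect_to_handler` are hypothetical stand-ins; on x86-64 Linux the corresponding real fields live in the `ucontext_t`'s `uc_mcontext.gregs` array (e.g. the `REG_RIP` and `REG_RSP` slots).

```rust
/// Hypothetical, platform-neutral stand-in for the interrupted register
/// state a signal handler receives.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SavedRegs {
    pc: usize,
    sp: usize,
    fp: usize,
}

/// Hypothetical handler frame recorded when wasm was entered.
#[derive(Clone, Copy, Debug)]
struct Handler {
    pc: usize,
    sp: usize,
    fp: usize,
}

/// Instead of jumping out of the signal handler, rewrite the saved registers
/// so that, when the handler returns, execution resumes at the handler's pc
/// on the handler's stack. Any extra registers used to pass trap details to
/// the handler are omitted from this sketch.
fn redirect_to_handler(regs: &mut SavedRegs, handler: Handler) {
    regs.pc = handler.pc;
    regs.sp = handler.sp;
    regs.fp = handler.fp;
}
```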
In terms of overall performance, the relevant benchmark from this repository, compared to before this commit, is:
sync/no-hook/core - host-to-wasm - typed - nop
    time:   [10.552 ns 10.561 ns 10.571 ns]
    change: [−7.5238% −7.4011% −7.2786%] (p = 0.00 < 0.05)
    Performance has improved.
Closes #3927 cc #10923
prtest:full
|
| 3fe9c3c7 | 03-Sep-2025 |
Paul Nodet <[email protected]> |
fix: accurate leaf detection (#11581)
* feat: add is_call() method to MachInst trait and VCode analysis
Add is_call() method to the MachInst trait to enable accurate leaf function detection during register allocation. Update VCode compute_clobbers() to return a (clobbers, is_leaf) tuple by analyzing actual call instructions in machine code.
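A sketch of the shape of this change, assuming a pared-down `MachInst` trait with only `is_call()` plus a hypothetical `defs()` helper standing in for register-def collection; the `Toy` instruction set exists only for illustration. Leafness is derived from the same pass over the emitted machine instructions that collects clobbers, so any call that exists in the machine code is counted.

```rust
use std::collections::BTreeSet;

/// Hypothetical, pared-down version of the trait described above.
trait MachInst {
    fn is_call(&self) -> bool;
    /// Registers this instruction writes (simplified to register numbers).
    fn defs(&self) -> Vec<u8>;
}

/// Sketch of a `compute_clobbers`-style pass: collect clobbered registers
/// and, in the same walk, decide leafness from the actual machine code.
fn compute_clobbers<I: MachInst>(insts: &[I]) -> (BTreeSet<u8>, bool) {
    let mut clobbers = BTreeSet::new();
    let mut is_leaf = true;
    for inst in insts {
        clobbers.extend(inst.defs());
        if inst.is_call() {
            is_leaf = false;
        }
    }
    (clobbers, is_leaf)
}

/// Hypothetical toy instruction set for the usage example.
enum Toy {
    Mov { dst: u8 },
    Call,
}

impl MachInst for Toy {
    fn is_call(&self) -> bool {
        matches!(self, Toy::Call)
    }
    fn defs(&self) -> Vec<u8> {
        match self {
            Toy::Mov { dst } => vec![*dst],
            Toy::Call => vec![],
        }
    }
}
```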
* feat: implement is_call() method across all architectures
Implement is_call() method for all architecture-specific MachInst implementations:
- x64: Detects CallKnown, CallUnknown, ReturnCall variants, and TLS calls (ElfTlsGetAddr, MachOTlsGetAddr)
- aarch64: Detects Call, CallInd, ReturnCall variants, and TLS calls (ElfTlsGetAddr, MachOTlsGetAddr)
- riscv64: Detects Call, CallInd, ReturnCall variants, and ElfTlsGetAddr
- s390x: Detects CallKnown, CallUnknown, ReturnCall variants
- pulley: Detects Call, CallIndirect, ReturnCall variants
Co-authored-by: bjorn3 <[email protected]>
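Each backend's implementation is presumably a simple pattern match over its instruction enum. Here is a sketch for an x64-style enum with payload-free variants for brevity; `AluRR`, `Ret`, and the exact `ReturnCall*` variant names are assumptions for illustration, since the real Cranelift variants carry operands.

```rust
/// Hypothetical, payload-free subset of an x64-style instruction enum.
#[allow(dead_code)]
enum X64Inst {
    AluRR,
    CallKnown,
    CallUnknown,
    ReturnCallKnown,
    ReturnCallUnknown,
    ElfTlsGetAddr,
    MachOTlsGetAddr,
    Ret,
}

impl X64Inst {
    /// TLS-address pseudo-instructions count as calls because they expand
    /// to a call into a runtime helper (e.g. `__tls_get_addr` on ELF), so a
    /// function that uses them is not a leaf.
    fn is_call(&self) -> bool {
        matches!(
            self,
            X64Inst::CallKnown
                | X64Inst::CallUnknown
                | X64Inst::ReturnCallKnown
                | X64Inst::ReturnCallUnknown
                | X64Inst::ElfTlsGetAddr
                | X64Inst::MachOTlsGetAddr
        )
    }
}
```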
* feat: improve leaf function detection and pass is_leaf to FrameLayout
* test: add filetests for leaf detection
* test: update expected outputs for accurate leaf function detection
* test(riscv64): update filetests output
---------
Co-authored-by: bjorn3 <[email protected]>
|