History log of /wasmtime-44.0.1/winch/codegen/src/isa/reg.rs (Results 1 – 11 of 11)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7, v43.0.0, v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6, v41.0.3, v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1, v40.0.0, v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5, v38.0.3, v38.0.2, v38.0.1, v37.0.2, v37.0.1, v37.0.0, v36.0.2, v36.0.1, v36.0.0, v35.0.0, v24.0.4, v33.0.2, v34.0.2, v34.0.1, v33.0.1, v24.0.3, v32.0.1, v34.0.0, v33.0.0, v32.0.0, v31.0.0, v30.0.2, v30.0.1, v30.0.0, v29.0.1, v29.0.0, v28.0.1, v28.0.0, v27.0.0, v26.0.1, v25.0.3, v24.0.2, v26.0.0
# 6a5a4f27 21-Oct-2024 Saúl Cabrera <[email protected]>

winch: Remove some `#[allow(dead_code)]` directives (#9480)

This commit simply removes some `#[allow(dead_code)]` which are no
longer needed as well as the the notion of callee-saved registers.

The

winch: Remove some `#[allow(dead_code)]` directives (#9480)

This commit simply removes some `#[allow(dead_code)]` which are no
longer needed as well as the the notion of callee-saved registers.

The notion of callee-saved registers was primarily used in the early
days when Winch generated its own trampolines, however, trampolines are
now emitted through Cranelift.

show more ...


Revision tags: v21.0.2, v22.0.1, v23.0.3, v25.0.2, v24.0.1
# bdb834ba 02-Oct-2024 Saúl Cabrera <[email protected]>

winch: Propery define destination registers (#9354)

No functional changes are introduced as part of this change.

This commit introduces a formal mechanism for identifying destination
registers in W

winch: Propery define destination registers (#9354)

No functional changes are introduced as part of this change.

This commit introduces a formal mechanism for identifying destination
registers in Winch's MacroAssembler and Assembler layers.

Before this change, there was no standardized way to identify writable
registers, which made it challenging to:

* Audit register clobbering effectively.
* Establish a consistent approach for defining new MacroAssembler
methods, as the identification of writable registers was done ad-hoc
and varied from method to method.

This enhancement aims to improve code maintainability and reduce
potential errors related to register management.

This commit makes use of Cranelift's `Writable<T>` type to identify writable
registers.

show more ...


Revision tags: v25.0.1, v25.0.0, v24.0.0, v23.0.2, v23.0.1, v23.0.0, v22.0.0, v21.0.1, v21.0.0, v20.0.2, v20.0.1, v20.0.0
# 55742216 16-Apr-2024 Jamey Sharp <[email protected]>

cranelift: Represent RealReg using PReg, not VReg (#8387)

Semantically, a "real register" is supposed to be a physical register,
so let's use the type dedicated for that purpose.

This has the advan

cranelift: Represent RealReg using PReg, not VReg (#8387)

Semantically, a "real register" is supposed to be a physical register,
so let's use the type dedicated for that purpose.

This has the advantage that `PReg` is only one byte while `VReg` is four
bytes, so the arrays where we record collections of `RealReg` become
smaller.

There was an implementation of `From<VReg> for RealReg` which was not a
sensible conversion, because not all `VReg`s are valid `RealReg`s. I
could have replaced it with a `TryFrom` implementation but it wasn't
used anywhere important, so I'm just deleting it instead.

Winch was using that `VReg`->`RealReg` conversion, but only in the
implementation of another conversion that was itself unused, so I'm
deleting that conversion as well. It's easy to implement correctly (the
Winch `Reg` type is identical to `RealReg`, so all conversions for the
latter are readily available) but as far as I can tell Winch doesn't
need to use Cranelift's register wrappers or RA2's virtual register
type, so it's simpler to just delete those conversions.

The riscv64 backend was relying on quirks of the existing conversions
between `RealReg` and `VReg` when emitting clobber saves and restores.
Just using the generic conversions between `RealReg` and `Reg` is
simpler and works correctly with the rest of these changes.

show more ...


Revision tags: v17.0.3, v19.0.2, v18.0.4, v19.0.1, v19.0.0, v18.0.3, v18.0.2, v17.0.2, v18.0.1, v18.0.0, v17.0.1
# 0bcceda3 25-Jan-2024 Trevor Elliott <[email protected]>

winch: Emit unwind info in the x64 backend (#7798)

* Enable all winch tests on windows

prtest:mingw-x64

* Plumb through x64 unwind info creation

* Add the frame regs unwind info

* Emit UnwindInf

winch: Emit unwind info in the x64 backend (#7798)

* Enable all winch tests on windows

prtest:mingw-x64

* Plumb through x64 unwind info creation

* Add the frame regs unwind info

* Emit UnwindInfo::SaveReg instructions

* Review feedback

* Comment the offset_downward_to_clobbers value

show more ...


Revision tags: v17.0.0, v16.0.0, v15.0.1, v15.0.0, v14.0.4, v14.0.3, v14.0.2, v13.0.1, v14.0.1, v14.0.0
# 4b288ba8 29-Sep-2023 Saúl Cabrera <[email protected]>

winch(x64): Call indirect (#7100)

* winch(x64): Call indirect

This change adds support for the `call_indirect` instruction to Winch.

Libcalls are a pre-requisite for supporting `call_indirect` in

winch(x64): Call indirect (#7100)

* winch(x64): Call indirect

This change adds support for the `call_indirect` instruction to Winch.

Libcalls are a pre-requisite for supporting `call_indirect` in order to
lazily initialy funcrefs. This change adds support for libcalls to
Winch by introducing a `BuiltinFunctions` struct similar to Cranelift's
`BuiltinFunctionSignatures` struct.

In general, libcalls are handled like any other function call, with the
only difference that given that not all the information to fulfill the
function call might be known up-front, control is given to the caller
for finalizing the call.

The introduction of function references also involves dealing with
pointer-sized loads and stores, so this change also adds the required
functionality to `FuncEnv` and `MacroAssembler` to be pointer aware,
making it straight forward to derive an `OperandSize` or `WasmType` from
the target's pointer size.

Finally, given the complexity of the call_indirect instrunction, this
change bundles an improvement to the register allocator, allowing it to
track the allocatable vs non-allocatable registers, this is done to
avoid any mistakes when allocating/de-allocating registers that are not
alloctable.

--
prtest:full

* Address review comments

* Fix typos
* Better documentation for `new_unchecked`
* Introduce `max` for `BitSet`
* Make allocatable property `u64`

* winch(calls): Overhaul `FnCall`

This commit simplifies `FnCall`'s interface making its usage more
uniform throughout the compiler. In summary, this change:

* Avoids side effects in the `FnCall::new` constructor, and also makes
it the only constructor.
* Exposes `FnCall::save_live_registers` and
`FnCall::calculate_call_stack_space` to calculate the stack space
consumed by the call and so that the caller can decide which one to
use at callsites depending on their use-case.

* tests: Fix regset tests

show more ...


Revision tags: minimum-viable-wasi-proxy-serve, v13.0.0, v12.0.2, v11.0.2, v10.0.2
# 2da108df 25-Aug-2023 Saúl Cabrera <[email protected]>

winch: Add support for parametric instructions (#6912)

* winch: Add support for parametric instructions

This commit introduces support for the drop and select instructions.

Additionally, it refact

winch: Add support for parametric instructions (#6912)

* winch: Add support for parametric instructions

This commit introduces support for the drop and select instructions.

Additionally, it refactors the CodeGenContext::drop_last implementation,
enhancing flexibility for callers to determine the handling of elements to be
dropped. This refactoring simplifies scenarios where a Memory entry is at the
top of the stack.

* refactor: Use `cmov` instead of local control flow

show more ...


Revision tags: v12.0.1
# 14b39bc2 23-Aug-2023 Saúl Cabrera <[email protected]>

winch: Initial support for floats (#6860)

* winch: Initial support for floats

This change introuduces the necessary building blocks to support floats in
Winch as well as support for both `f32.const

winch: Initial support for floats (#6860)

* winch: Initial support for floats

This change introuduces the necessary building blocks to support floats in
Winch as well as support for both `f32.const` and `f64.const` instructions.

To achieve support for floats, this change adds several key enhancements to the
compiler:

* Constant pool: A constant pool is implemented, at the Assembler level, using the machinery
exposed by Cranelift's `VCode` and `MachBuffer`. Float immediates are stored
using their bit representation in the value stack, and whenever they are
used at the MacroAssembler level they are added to the constant
pool, from that point on, they are referenced through a `Constant` addressing
mode, which gets translated to a RIP-relative addressing mode during emission.

* More precise value tagging: aside from immediates, from which the type can
be easily inferred, all the other value stack entries (`Memory`, `Reg`, and `Local`) are
modified to explicitly contain a WebAssembly type. This allows for better
instruction selection.

--

prtest:full

* fix: Account for relative sp position when pushing float regs

This was an oversight of the initial implementation. When pushing float
registers, always return an address that is relative to the current position of
the stack pointer, essentially storing to (%rsp). The previous implementation
accounted for static addresses, which is not correct.

* fix: Introduce `stack_arg_slot_size_for_type`

To correctly calculate the stack argument slot sizes, instead of overallocating
for `word_bytes`, since for `f32` floating points we only need to worry about
loading/storing 4 bytes.

* fix: Correctly type the result register.

The previous version wrongly typed the register as a general purpose register.

* refactor: Re-write `add_constants` through `add_constant`

* docs: Replace old comment

* chore: Rust fmt

* refactor: Index regset per register class

This commit implements `std::ops::{Index, IndexMut}` for `RegSet` to index each
of the bitsets by class. This reduces boilerplate and repetition throuhg the
code generation context, register allocator and register set.

* refactor: Correctly size callee saved registers

To comply with the expectation of the underlying architecture: for example in
Aarch64, only the low 64 bits of VRegs are callee saved (the D-view) and in the
`fastcall` calling convention it's expected that the callee saves the entire 128
bits of the register xmm6-xmm15.

This change also fixes the the stores/loads of callee saved float registers in the
fastcall calling convention, as in the previous implementation only the low 64
bits were saved/restored.

* docs: Add comment regarding typed-based spills

show more ...


Revision tags: v12.0.0, v11.0.1, v11.0.0, v10.0.1, v10.0.0, v9.0.4, v9.0.3, v9.0.2, v9.0.1, v9.0.0, v6.0.2, v7.0.1, v8.0.1, v8.0.0, v7.0.0, v6.0.1, v5.0.1, v4.0.1, v6.0.0
# 426c49b8 02-Feb-2023 Saúl Cabrera <[email protected]>

winch: Use aarch64 backend for code emission. (#5652)

This patch introduces basic aarch64 code generation by using
`cranelift-codegen`'s backend.

This commit *does not*:

* Change the semantic

winch: Use aarch64 backend for code emission. (#5652)

This patch introduces basic aarch64 code generation by using
`cranelift-codegen`'s backend.

This commit *does not*:

* Change the semantics of the code generation
* Adds support for other Wasm instructions

The most notable change in this patch is how addressing modes are handled at the
MacroAssembler layer: instead of having a canonical address representation, this
patch introduces the addressing mode as an associated type in the
MacroAssembler trait. This approach has the advantage that gives each ISA enough
flexiblity to describe the addressing modes and their constraints in isolation
without having to worry on how a particular addressing mode is going to affect
other ISAs. In the case of Aarch64 this becomes useful to describe indexed
addressing modes (particularly from the stack pointer).

This patch uses the concept of a shadow stack pointer (x28) as a workaround to
Aarch64's stack pointer 16-byte alignment. This constraint is enforced by:

* Introducing specialized addressing modes when using the real stack pointer; this
enables auditing when the real stack pointer is used. As of this change, the
real stack pointer is only used in the function's prologue and epilogue.

* Asserting that the real stack pointer is not used as a base for addressing
modes.

* Ensuring that at any point during the code generation process where the stack
pointer changes (e.g. when stack space is allocated / deallocated) the value of
the real stack pointer is copied into the shadow stack pointer.

show more ...


# f5f517e8 02-Feb-2023 Saúl Cabrera <[email protected]>

winch: Small clean-up for x64 (#5691)

This commit contains a small set of clean up items for x64.

Notably:

* Adds filetests
* Documents why 16 for the arg base offset abi implementation, for

winch: Small clean-up for x64 (#5691)

This commit contains a small set of clean up items for x64.

Notably:

* Adds filetests
* Documents why 16 for the arg base offset abi implementation, for clarity.
* Fixes a bug in the spill implementation caught while anlyzing the
filetests results. The fix consists of emitting a load instead of a store into
the scratch register before spiiling its value.
* Remove dead code for pretty printing registers which is not needed anymore
since we now have proper disassembly.

show more ...


Revision tags: v5.0.0
# 94b51cdb 18-Jan-2023 Saúl Cabrera <[email protected]>

winch: Use cranelift-codegen x64 backend for emission. (#5581)

This change substitutes the string based emission mechanism with
cranelift-codegen's x64 backend.

This change _does not_:

* Intr

winch: Use cranelift-codegen x64 backend for emission. (#5581)

This change substitutes the string based emission mechanism with
cranelift-codegen's x64 backend.

This change _does not_:

* Introduce new functionality in terms of supported instructions.
* Change the semantics of the assembler/macroassembler in terms of the logic to
emit instructions.

The most notable differences between this change and the previous version are:

* Handling of shared flags and ISA-specific flags, which for now are left with
the default value.
* Simplification of instruction emission per operand size: previously the
assembler defined different methods depending on the operand size (e.g. `mov`
for 64 bits, and `movl` for 32 bits). This change updates such approach so that
each assembler method takes an operand size as a parameter, reducing duplication
and making the code more concise and easier to integrate with the x64's `Inst` enum.
* Introduction of a disassembler for testing purposes.

As of this change, Winch generates the following code for the following test
programs:

```wat
(module
(export "main" (func $main))

(func $main (result i32)
(i32.const 10)
(i32.const 20)
i32.add
))
```

```asm
0: 55 push rbp
1: 48 89 e5 mov rbp, rsp
4: b8 0a 00 00 00 mov eax, 0xa
9: 83 c0 14 add eax, 0x14
c: 5d pop rbp
d: c3 ret
```

```wat
(module
(export "main" (func $main))

(func $main (result i32)
(local $foo i32)
(local $bar i32)
(i32.const 10)
(local.set $foo)
(i32.const 20)
(local.set $bar)

(local.get $foo)
(local.get $bar)
i32.add
))
```

```asm
0: 55 push rbp
1: 48 89 e5 mov rbp, rsp
4: 48 83 ec 08 sub rsp, 8
8: 48 c7 04 24 00 00 00 00 mov qword ptr [rsp], 0
10: b8 0a 00 00 00 mov eax, 0xa
15: 89 44 24 04 mov dword ptr [rsp + 4], eax
19: b8 14 00 00 00 mov eax, 0x14
1e: 89 04 24 mov dword ptr [rsp], eax
21: 8b 04 24 mov eax, dword ptr [rsp]
24: 8b 4c 24 04 mov ecx, dword ptr [rsp + 4]
28: 01 c1 add ecx, eax
2a: 48 89 c8 mov rax, rcx
2d: 48 83 c4 08 add rsp, 8
31: 5d pop rbp
32: c3 ret
```

```wat
(module
(export "main" (func $main))

(func $main (param i32) (param i32) (result i32)
(local.get 0)
(local.get 1)
i32.add
))
```

```asm
0: 55 push rbp
1: 48 89 e5 mov rbp, rsp
4: 48 83 ec 08 sub rsp, 8
8: 89 7c 24 04 mov dword ptr [rsp + 4], edi
c: 89 34 24 mov dword ptr [rsp], esi
f: 8b 04 24 mov eax, dword ptr [rsp]
12: 8b 4c 24 04 mov ecx, dword ptr [rsp + 4]
16: 01 c1 add ecx, eax
18: 48 89 c8 mov rax, rcx
1b: 48 83 c4 08 add rsp, 8
1f: 5d pop rbp
20: c3 ret
```

show more ...


Revision tags: v4.0.0, v3.0.1, v3.0.0, v1.0.2, v2.0.2
# 835abbcd 28-Oct-2022 Saúl Cabrera <[email protected]>

Initial skeleton for Winch (#4907)

* Initial skeleton for Winch

This commit introduces the initial skeleton for Winch, the "baseline"
compiler.

This skeleton contains mostly setup code for th

Initial skeleton for Winch (#4907)

* Initial skeleton for Winch

This commit introduces the initial skeleton for Winch, the "baseline"
compiler.

This skeleton contains mostly setup code for the ISA, ABI, registers,
and compilation environment abstractions. It also includes the
calculation of function local slots.

As of this commit, the structure of these abstractions looks like the
following:

+------------------------+
| v
+----------+ +-----+ +-----------+-----+-----------------+
| Compiler | --> | ISA | --> | Registers | ABI | Compilation Env |
+----------+ +-----+ +-----------+-----+-----------------+
| ^
+------------------------------+

* Compilation environment will hold a reference to the function data

* Add basic documentation to the ABI trait

* Enable x86 and arm64 in cranelift-codegen

* Add reg_name function for x64

* Introduce the concept of a MacroAssembler and Assembler

This commit introduces the concept of a MacroAsesembler and
Assembler. The MacroAssembler trait will provide a high enough
interface across architectures so that each ISA implementation can use their own low-level
Assembler implementation to fulfill the interface. Each Assembler will
provide a 1-1 mapping to each ISA instruction.

As of this commit, only a partial debug implementation is provided for
the x64 Assembler.

* Add a newtype over PReg

Adds a newtype `Reg` over regalloc2::PReg; this ensures that Winch
will operate only on the concept of `Reg`. This change is temporary
until we have the necessary machinery to share a common Reg
abstraction via `cranelift_asm`

* Improvements to local calcuation

- Add `LocalSlot::addressed_from_sp`
- Use `u32` for local slot and local sizes calculation

* Add helper methods to ABIArg

Adds helper methods to retrieve register and type information from the argument

* Make locals_size public in frame

* Improve x64 register naming depending on size

* Add new methods to the masm interface

This commit introduces the ability for the MacroAssembler to reserve
stack space, get the address of a given local and perform a stack
store based on the concept of `Operand`s.

There are several motivating factors to introduce the concept of an
Operand:

- Make the translation between Winch and Cranelift easier;
- Make dispatching from the MacroAssembler to the underlying Assembler
- easier by minimizing the amount of functions that we need to define
- in order to satisfy the store/load combinations

This commit also introduces the concept of a memory address, which
essentially describes the addressing modes; as of this commit only one
addressing mode is supported. We'll also need to verify that this
structure will play nicely with arm64.

* Blank masm implementation for arm64

* Implementation of reserve_stack, local_address, store and fp_offset
for x64

* Implement function prologue and argument register spilling

* Add structopt and wat

* Fix debug instruction formatting

* Make TargetISA trait publicly accessible

* Modify the MacroAssembler finalize siganture to return a slice of strings

* Introduce a simple CLI for Winch

To be able to compile Wasm programs with Winch independently. Mostly
meant for testing / debugging

* Fix bug in x64 assembler mov_rm

* Remove unused import

* Move the stack slot calculation to the Frame

This commit moves the calculation of the stack slots to the frame
handler abstraction and also includes the calculation of the limits
for the function defined locals, which will be used to zero the locals
that are not associated to function arguments

* Add i32 and i64 constructors to local slots

* Introduce the concept of DefinedLocalsRange

This commit introduces `DefinedLocalsRange` to track the stack offset
at which the function-defined locals start and end; this is later used
to zero-out that stack region

* Add constructors for int and float registers

* Add a placeholder stack implementation

* Add a regset abstraction to track register availability

Adds a bit set abstraction to track register availability for register
allocation.

The bit set has no specific knowledge about physical registers, it
works on the register's hardware encoding as the source of truth.

Each RegSet is expected to be created with the universe of allocatable
registers per ISA when starting the compilation of a particular function.

* Add an abstraction over register and immediate

This is meant to be used as the source for stores.

* Add a way to zero local slots and an initial skeletion of regalloc

This commit introduces `zero_local_slots` to the MacroAssembler; which
ensures that function defined locals are zeroed out when starting the
function body.

The algorithm divides the defined function locals stack range
into 8 byte slots and stores a zero at each address. This process
relies on register allocation if the amount of slots that need to be
initialized is greater than 1. In such case, the next available
register is requested to the register set and it's used to store a 0,
which is then stored at every local slot

* Update to wasmparser 0.92

* Correctly track if the regset has registers available

* Add a result entry to the ABI signature

This commuit introduces ABIResult as part of the ABISignature;
this struct will track how function results are stored; initially it
will consiste of a single register that will be requested to the
register allocator at the end of the function; potentially causing a spill

* Move zero local slots and add more granular methods to the masm

This commit removes zeroing local slots from the MacroAssembler and
instead adds more granular methods to it (e.g `zero`, `add`).

This allows for better code sharing since most of the work done by the
algorithm for zeroing slots will be the same in all targets, except
for the binary emissions pieces, which is what gets delegated to the masm

* Use wasmparser's visitor API and add initial support for const and add

This commit adds initial support for the I32Const and I32
instructions; this involves adding a minimum for register
allocation. Note that some regalloc pieces are still incomplete, since
for the current set of supported instructions they are not needed.

* Make the ty field public in Local

* Add scratch_reg to the abi

* Add a method to get a particular local from the Frame

* Split the compilation environment abstraction

This commit splits the compilation environment into two more concise
abstractions:

1. CodeGen: the main abstraction for code generation
2. CodeGenContext: abstraction that shares the common pieces for
compilation; these pieces are shared between the code generator and
the register allocator

* Add `push` and `load` to the MacroAssembler

* Remove dead code warnings for unused paths

* Map ISA features to cranelift-codegen ISA features

* Apply formatting

* Fix Cargo.toml after a bad rebase

* Add component-compiler feature

* Use clap instead of structopt

* Add winch to publish.rs script

* Minor formatting

* Add tests to RegSet and fix two bugs when freeing and checking for
register availability

* Add tests to Stack

* Free source register after a non-constant i32 add

* Improve comments

- Remove unneeded comments
- And improve some of the TODO items

* Update default features

* Drop the ABI generic param and pass the word_size information directly

To avoid dealing with dead code warnings this commit passes the word
size information directly, since it's the only piece of information
needed from the ABI by Codegen until now

* Remove dead code

This piece of code will be put back once we start integrating Winch
with Wasmtime

* Remove unused enum variant

This variant doesn't get constructed; it should be added back once a
backend is added and not enabled by default or when Winch gets
integrated into Wasmtime

* Fix unused code in regset tests

* Update spec testsuite

* Switch the visitor pattern for a simpler operator match

This commit removes the usage of wasmparser's visitor pattern and
instead defaults to a simpler operator matching approach. This removes
the complexity of having to define all the visitor trait functions at once.

* Use wasmparser's Visitor trait with a different macro strategy

This commit puts back wasmparser's Visitor trait, with a sigle;
simpler macro, only used for unsupported operators.

* Restructure Winch

This commit restuructures Winch's parts. It divides the initial
approach into three main crates: `winch-codegen`,`wasmtime-winch` and `winch-tools`.

`wasmtime-winch` is reponsible for the Wasmtime-Winch integration.
`winch-codegen` is solely responsible for code generation.
`winch-tools` is CLI tool to compile Wasm programs, mainly for testing purposes.

* Refactor zero local slots

This commit moves the logic of zeroing local slots from the codegen
module into a method with a default implementation in the
MacroAssembler trait: `zero_mem_range`.

The refactored implementation is very similar to the previous
implementation with the only difference
that it doesn't allocates a general-purpose register; it instead uses
the register allocator to retrieve the scratch register and uses this
register to unroll the series of zero stores.

* Tie the codegen creation to the ISA ABI

This commit makes the relationship between the ISA ABI and the codegen
explicit. This allows us to pass down ABI-specific bit and pieces to
the codegeneration. In this case the only concrete piece that we need
is the ABI word size.

* Mark winch as publishable directory

* Revamp winch docs

This commit ensures that all the code comments in Winch are compliant
with the syle used in the rest of Wasmtime's codebase.

It also imptoves, generally the quality of the comments in some modules.

* Panic when using multi-value when the target is aarch64

Similar to x64, this commit ensures that the abi signature of the
current function doesn't use multi-value returns

* Document the usage of directives

* Use endianness instead of endianess in the ISA trait

* Introduce a three-argument form in the MacroAssembler

This commit introduces the usage of three-argument form for the
MacroAssembler interface. This allows for a natural mapping for
architectures like aarch64. In the case of x64, the implementation can
simply restrict the implementation asserting for equality in two of
the arguments of defaulting to a differnt set of instructions.

As of this commit, the implementation of `add` panics if the
destination and the first source arguments are not equal; internally
the x64 assembler implementation will ensure that all the allowed
combinations of `add` are satisfied. The reason for panicking and not
emitting a `mov` followed by an `add` for example is simply because register
allocation happens right before calling `add`, which ensures any
register-to-register moves, if needed.

This implementation will evolve in the future and this panic will be
lifted if needed.

* Improve the documentation for the MacroAssembler.

Documents the usage of three-arg form and the intention around the
high-level interface.

* Format comments in remaining modules

* Clean up Cargo.toml for winch pieces

This commit adds missing fields to each of Winch's Cargo.toml.

* Use `ModuleTranslation::get_types()` to derive the function type

* Assert that start range is always word-size aligned

show more ...