|
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7, v43.0.0, v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6, v41.0.3, v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1 |
|
| #
50431b45 |
| 06-Jan-2026 |
Nick Fitzgerald <[email protected]> |
Cranelift: Do not track expression depth in egraph cost function (#12248)
We don't have any rules that try to rebalance trees to make them shallower in the mid-end (we determined it was the wrong pl
Cranelift: Do not track expression depth in egraph cost function (#12248)
We don't have any rules that try to rebalance trees to make them shallower in the mid-end (we determined it was the wrong place to do that kind of thing, since we don't know register pressure at that point) so there is no need to track expression depth in the cost function.
show more ...
|
| #
0889323a |
| 03-Jan-2026 |
SSD <[email protected]> |
cranelift-codegen: rename most uses of std to core and alloc (#12237)
* rename most std uses to core and alloc
* cargo fmt
|
|
Revision tags: v40.0.0, v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5 |
|
| #
d5ef5285 |
| 07-Nov-2025 |
Nick Fitzgerald <[email protected]> |
Cranelift: Tweak cost function to make `imul` more expensive (#12006)
* Cranelift: Tweak cost function to pessimize `select` opcodes
`select`s cannot be speculated through on some of our targets (e
Cranelift: Tweak cost function to make `imul` more expensive (#12006)
* Cranelift: Tweak cost function to pessimize `select` opcodes
`select`s cannot be speculated through on some of our targets (e.g. x64) so strongly prefer not choosing them.
Using a target-specific cost function, so that we only pessimize `select` when it makes sense, is left for follow up work and is tracked in #12005.
* Don't pessimize `select`s that heavily
show more ...
|
|
Revision tags: v38.0.3, v38.0.2, v38.0.1, v37.0.2, v37.0.1, v37.0.0, v36.0.2, v36.0.1, v36.0.0, v35.0.0, v24.0.4, v33.0.2, v34.0.2, v34.0.1, v33.0.1, v24.0.3, v32.0.1, v34.0.0, v33.0.0, v32.0.0 |
|
| #
7bf31723 |
| 08-Apr-2025 |
Nick Fitzgerald <[email protected]> |
Cranelift: simplify some side-effectful instructions in ISLE (#10524)
* Cranelift: simplify some side-effectful instructions in ISLE
This commit adds a new top-level ISLE entrypoint specifically fo
Cranelift: simplify some side-effectful instructions in ISLE (#10524)
* Cranelift: simplify some side-effectful instructions in ISLE
This commit adds a new top-level ISLE entrypoint specifically for instructions in the side-effectful skeleton: `simplify_skeleton`. While these rewrites are processed during the egraph pass, values from skeleton instructions still do not get inserted into the egraph. Indeed, `simplify_skeleton` operates on *instructions* rather than *values* because we do not represent side effects as values; values do not have side effects in CLIF, instructions do. Therefore, rather than doing a whole dynamic-programming style extraction of the best candidate simplification like we do with the egraph, we take an eager and greedy approach.
Furthermore, `simplify_skeleton` is limited only to skeleton instructions that do not involve control-flow or terminators right now. This is because changing the control-flow graph can change whether a use is dominated by a def or not, and we do not currently have the machinery to track and fix up invalidated uses. Addressing this is left for future commits.
* fix `MIN / -1` cprop and add negative tests for things simplify_skeleton cannot handle yet
show more ...
|
|
Revision tags: v31.0.0 |
|
| #
1104a838 |
| 17-Mar-2025 |
Chris Fallin <[email protected]> |
Cranelift: avoid integer-shift optimization with invalid reduces on vector types. (#10413)
An optimization rule in our mid-end implemented the following equivalence:
``` (x << N) >> N == x as T
Cranelift: avoid integer-shift optimization with invalid reduces on vector types. (#10413)
An optimization rule in our mid-end implemented the following equivalence:
``` (x << N) >> N == x as T1 as T2 ```
where `T1` is a smaller integer type and `T2` is a larger integer type, and `N` is the difference in bitwidths between them. In other words, when a left-then-right-shift clears the upper bits (or sign-extends them), we can implement that by reducing to a smaller type then extending back to the original type.
Unfortunately, `ireduce` is not valid on vector-of-integer types. A fuzzbug reported in #10409 shows that this pattern results in invalid CLIF when optimizing this pattern with an `i64x2` as input.
This PR tightens the rule's LHS to only apply to integer types.
Note that there is another optimization rule that can also implement this behavior with a bitwise AND with a mask. Previously the two resulting expressions had the same cost in the egraph, and whatever perturbation occurred in the ISLE compilation of the rule (which is a multi-constructor because this is the mid-end, so is returning multiple results) caused that rule's result to be picked instead during extraction. I've also tweaked the costs to make reduces/extends cheaper than ordinary arithmetic as a result, to keep the same optimizer outputs after extraction. One x64 test in particular showed why this is correct: the mask-with-AND resulted in two instructions (move, AND-with-immediate) while a uextend can be done with one (zero-extending move, `movzbl`), and aarch64 has similar builtin extends in many cases.
show more ...
|
|
Revision tags: v30.0.2, v30.0.1, v30.0.0, v29.0.1, v29.0.0, v28.0.1, v28.0.0, v27.0.0, v26.0.1, v25.0.3, v24.0.2, v26.0.0, v21.0.2, v22.0.1, v23.0.3, v25.0.2, v24.0.1, v25.0.1, v25.0.0, v24.0.0, v23.0.2, v23.0.1, v23.0.0, v22.0.0, v21.0.1, v21.0.0, v20.0.2, v20.0.1, v20.0.0, v17.0.3, v19.0.2, v18.0.4, v19.0.1, v19.0.0, v18.0.3, v18.0.2, v17.0.2, v18.0.1, v18.0.0, v17.0.1 |
|
| #
5b2ae836 |
| 05-Feb-2024 |
Nick Fitzgerald <[email protected]> |
Cranelift: Use a fixpoint loop to compute the best value for each eclass (#7859)
* Cranelift: Use a fixpoint loop to compute the best value for each eclass
Fixes #7857
* Remove fixpoint loop early
Cranelift: Use a fixpoint loop to compute the best value for each eclass (#7859)
* Cranelift: Use a fixpoint loop to compute the best value for each eclass
Fixes #7857
* Remove fixpoint loop early-continue optimization
* Add document describing optimization rule invariants
* Make select optimizations use subsume
* Remove invalid debug assert
* Remove now-unused methods
* Add commutative adds to cost tests
show more ...
|
|
Revision tags: v17.0.0, v16.0.0, v15.0.1, v15.0.0 |
|
| #
b9f2a306 |
| 07-Nov-2023 |
Nick Fitzgerald <[email protected]> |
Cranelift: Break op cost ties with expression depth in egraphs (#7456)
* Cranelift: Switch egraph `Cost` to a struct with named fields
Mechanical change.
* Cranelift: Break op cost ties with expre
Cranelift: Break op cost ties with expression depth in egraphs (#7456)
* Cranelift: Switch egraph `Cost` to a struct with named fields
Mechanical change.
* Cranelift: Break op cost ties with expression depth in egraphs
This means that, when the opcode cost is the same, we prefer shallow and wide expressions to narrow and deep. For example, `(a + b) + (c + d)` is preferred to `((a + b) + c) + d`. This is beneficial because it exposes more instruction-level parallelism and shortens live ranges.
Co-Authored-By: Trevor Elliott <[email protected]>
* Cranelift: Bitpack the egraph `Cost` structure
Co-Authored-By: Chris Fallin <[email protected]> Co-Authored-By: Trevor Elliott <[email protected]>
* Make it so you can't construct `Cost::inifinity()` by accident
* Use fold to code golf
---------
Co-authored-by: Trevor Elliott <[email protected]> Co-authored-by: Chris Fallin <[email protected]>
show more ...
|
| #
e39c6b76 |
| 07-Nov-2023 |
Nick Fitzgerald <[email protected]> |
Cranelift: Fix union node bitpacking (#7465)
* Cranelift: Fix union node bitpacking
It turns out we have just been taking the newest rewrite's value for a eclass union and never actually comparing
Cranelift: Fix union node bitpacking (#7465)
* Cranelift: Fix union node bitpacking
It turns out we have just been taking the newest rewrite's value for a eclass union and never actually comparing costs and taking the value with the minimum cost. Whoops!
Fixing this made some test expectations fail, which we resolved by tweaking the cost function to give materializing constants nonzero cost. This way we prefer `-x` to `0 - x`.
We also made elaboration function break ties between values with the same cost with the value index. It prefers larger value indices, since the original value's index will be lower than all of its rewritten values' indices. This heuristically prefers rewritten values because we hope our rewrites are all improvements even when the cost function can't show that.
Co-Authored-By: Chris Fallin <[email protected]> Co-Authored-By: Trevor Elliott <[email protected]>
* Add more information to assertion message
* Fix off-by-one bug in assertion
* Limit number of matches consumed from ISLE
We generally want to clamp down and avoid runaway behavior here.
But there also seems to be some sort of rustc/llvm bug on Rust 1.71 that is causing iteration to wild here. This commit avoids that bug.
* Update test expectation
* prtest:full
---------
Co-authored-by: Chris Fallin <[email protected]> Co-authored-by: Trevor Elliott <[email protected]>
show more ...
|
|
Revision tags: v14.0.4, v14.0.3, v14.0.2, v13.0.1, v14.0.1, v14.0.0 |
|
| #
04fcb6a1 |
| 19-Oct-2023 |
Trevor Elliott <[email protected]> |
A couple of small refactorings to the egraph elaboration pass (#7304)
@jameysharp and I noticed a couple of refactoring opportunities while reading through the elaboration pass:
* The elaboration l
A couple of small refactorings to the egraph elaboration pass (#7304)
@jameysharp and I noticed a couple of refactoring opportunities while reading through the elaboration pass:
* The elaboration loop doesn't need to match on the top of the stack as a reference, because each case pops it immediately. Instead we can pop and match on the popped value. * Computing the cost of a `Result` value that's not in the DFG was using the block that contains the instruction to determine the level, but since the instruction is already known to not be in the DFG, this would default to `LoopLevel::root()` unconditionally. This also meant that the `Cost::at_level` function turned into an identity function on the cost given, making it unnecessary.
Co-authored-by: Jamey Sharp <[email protected]>
show more ...
|
|
Revision tags: minimum-viable-wasi-proxy-serve, v13.0.0, v12.0.2, v11.0.2, v10.0.2, v12.0.1, v12.0.0, v11.0.1, v11.0.0, v10.0.1, v10.0.0, v9.0.4, v9.0.3, v9.0.2, v9.0.1, v9.0.0, v6.0.2, v7.0.1, v8.0.1, v8.0.0, v7.0.0, v6.0.1, v5.0.1, v4.0.1, v6.0.0 |
|
| #
de0e0bea |
| 06-Feb-2023 |
Alex Crichton <[email protected]> |
Legalize `b{and,or,xor}_not` into component instructions (#5709)
* Remove trailing whitespace in `lower.isle` files
* Legalize the `band_not` instruction into simpler form
This commit legalize
Legalize `b{and,or,xor}_not` into component instructions (#5709)
* Remove trailing whitespace in `lower.isle` files
* Legalize the `band_not` instruction into simpler form
This commit legalizes the `band_not` instruction into `band`-of-`bnot`,
or two instructions. This is intended to assist with egraph-based
optimizations where the `band_not` instruction doesn't have to be
specifically included in other bit-operation-patterns.
Lowerings of the `band_not` instruction have been moved to a
specialization of the `band` instruction.
* Legalize `bor_not` into components
Same as prior commit, but for the `bor_not` instruction.
* Legalize bxor_not into bxor-of-bnot
Same as prior commits. I think this also ended up fixing a bug in the
s390x backend where `bxor_not x y` was actually translated as `bnot
(bxor x y)` by accident given the test update changes.
* Simplify not-fused operands for riscv64
Looks like some delegated-to rules have special-cases for "if this
feature is enabled use the fused instruction" so move the clause for
testing the feature up to the lowering phase to help trigger other rules
if the feature isn't enabled. This should make the riscv64 backend more
consistent with how other backends are implemented.
* Remove B{and,or,xor}Not from cost of egraph metrics
These shouldn't ever reach egraphs now that they're legalized away.
* Add an egraph optimization for `x^-1 => ~x`
This adds a simplification node to translate xor-against-minus-1 to a
`bnot` instruction. This helps trigger various other optimizations in
the egraph implementation and also various backend lowering rules for
instructions. This is chiefly useful as wasm doesn't have a `bnot`
equivalent, so it's encoded as `x^-1`.
* Add a wasm test for end-to-end bitwise lowerings
Test that end-to-end various optimizations are being applied for input
wasm modules.
* Specifically don't self-update rustup on CI
I forget why this was here originally, but this is failing on Windows
CI. In general there's no need to update rustup, so leave it as-is.
* Cleanup some aarch64 lowering rules
Previously a 32/64 split was necessary due to the `ALUOp` being different
but that's been refactored away no so there's no longer any need for
duplicate rules.
* Narrow a x64 lowering rule
This previously made more sense when it was `band_not` and rarely used,
but be more specific in the type-filter on this rule that it's only
applicable to SIMD types with lanes.
* Simplify xor-against-minus-1 rule
No need to have the commutative version since constants are already
shuffled right for egraphs
* Optimize band-of-bnot when bnot is on the left
Use some more rules in the egraph algebraic optimizations to
canonicalize band/bor/bxor with a `bnot` operand to put the operand on
the right. That way the lowerings in the backends only have to list the
rule once, with the operand on the right, to optimize both styles of
input.
* Add commutative lowering rules
* Update cranelift/codegen/src/isa/x64/lower.isle
Co-authored-by: Jamey Sharp <[email protected]>
---------
Co-authored-by: Jamey Sharp <[email protected]>
show more ...
|
| #
ffbcc67e |
| 28-Jan-2023 |
Nick Fitzgerald <[email protected]> |
Cranelift: Consider shifts as "simple" arithmetic in egraph cost model (#5646)
|
|
Revision tags: v5.0.0, v4.0.0 |
|
| #
f980defe |
| 06-Dec-2022 |
Chris Fallin <[email protected]> |
egraph support: rewrite to work in terms of CLIF data structures. (#5382)
* egraph support: rewrite to work in terms of CLIF data structures.
This work rewrites the "egraph"-based optimization f
egraph support: rewrite to work in terms of CLIF data structures. (#5382)
* egraph support: rewrite to work in terms of CLIF data structures.
This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.
The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one. This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have *multiple representations*: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.
Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).
Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.
The implementation performs two passes overall:
- One, a forward pass in RPO (to see defs before uses), that (i) removes
"pure" instructions from the layout and (ii) optimizes as it goes. As
before, we eagerly optimize, so we form the entire union of optimized
forms of a value before we see any uses of that value. This lets us
rewrite uses to use the most "up-to-date" form of the value and
canonicalize and optimize that form.
The eager rewriting and acyclic representation make each other work
(we could not eagerly rewrite if there were cycles; and acyclicity
does not miss optimization opportunities only because the first time
we introduce a value, we immediately produce its "best" form). This
design choice is also what allows us to avoid the "parent pointers"
and fixpoint loop of traditional egraphs.
This forward optimization pass keeps a scoped hashmap to "intern"
nodes (thus performing GVN), and also interleaves on a per-instruction
level with alias analysis. The interleaving with alias analysis allows
alias analysis to see the most optimized form of each address (so it
can see equivalences), and allows the next value to see any
equivalences (reuses of loads or stored values) that alias analysis
uncovers.
- Two, a forward pass in domtree preorder, that "elaborates" pure enodes
back into the layout, possibly in multiple places if needed. This
tracks the loop nest and hoists nodes as needed, performing LICM as it
goes. Note that by doing this in forward order, we avoid the
"fixpoint" that traditional LICM needs: we hoist a def before its
uses, so when we place a node, we place it in the right place the
first time rather than moving later.
This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.
On `spidermonkey.wasm` running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup).
Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:
Co-authored-by: Jamey Sharp <[email protected]>
* Review feedback.
* Review feedback.
* Review feedback.
* Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`.
Co-authored-by: Jamey Sharp <[email protected]>
show more ...
|