cost.rs - OpenGrok history log for /wasmtime-44.0.1/cranelift/codegen/src/egraph/cost.rs

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7, v43.0.0, v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6, v41.0.3, v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1
# 50431b45	06-Jan-2026	Nick Fitzgerald <[email protected]>	Cranelift: Do not track expression depth in egraph cost function (#12248) We don't have any rules that try to rebalance trees to make them shallower in the mid-end (we determined it was the wrong pl Cranelift: Do not track expression depth in egraph cost function (#12248) We don't have any rules that try to rebalance trees to make them shallower in the mid-end (we determined it was the wrong place to do that kind of thing, since we don't know register pressure at that point) so there is no need to track expression depth in the cost function. show more ...
# 0889323a	03-Jan-2026	SSD <[email protected]>	cranelift-codegen: rename most uses of std to core and alloc (#12237) * rename most std uses to core and alloc * cargo fmt
Revision tags: v40.0.0, v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5
# d5ef5285	07-Nov-2025	Nick Fitzgerald <[email protected]>	Cranelift: Tweak cost function to make `imul` more expensive (#12006) * Cranelift: Tweak cost function to pessimize `select` opcodes `select`s cannot be speculated through on some of our targets (e Cranelift: Tweak cost function to make `imul` more expensive (#12006) * Cranelift: Tweak cost function to pessimize `select` opcodes `select`s cannot be speculated through on some of our targets (e.g. x64) so strongly prefer not choosing them. Using a target-specific cost function, so that we only pessimize `select` when it makes sense, is left for follow up work and is tracked in #12005. * Don't pessimize `select`s that heavily show more ...
Revision tags: v38.0.3, v38.0.2, v38.0.1, v37.0.2, v37.0.1, v37.0.0, v36.0.2, v36.0.1, v36.0.0, v35.0.0, v24.0.4, v33.0.2, v34.0.2, v34.0.1, v33.0.1, v24.0.3, v32.0.1, v34.0.0, v33.0.0, v32.0.0
# 7bf31723	08-Apr-2025	Nick Fitzgerald <[email protected]>	Cranelift: simplify some side-effectful instructions in ISLE (#10524) * Cranelift: simplify some side-effectful instructions in ISLE This commit adds a new top-level ISLE entrypoint specifically fo Cranelift: simplify some side-effectful instructions in ISLE (#10524) * Cranelift: simplify some side-effectful instructions in ISLE This commit adds a new top-level ISLE entrypoint specifically for instructions in the side-effectful skeleton: `simplify_skeleton`. While these rewrites are processed during the egraph pass, values from skeleton instructions still do not get inserted into the egraph. Indeed, `simplify_skeleton` operates on instructions rather than values because we do not represent side effects as values; values do not have side effects in CLIF, instructions do. Therefore, rather than doing a whole dynamic-programming style extraction of the best candidate simplification like we do with the egraph, we take an eager and greedy approach. Furthermore, `simplify_skeleton` is limited only to skeleton instructions that do not involve control-flow or terminators right now. This is because changing the control-flow graph can change whether a use is dominated by a def or not, and we do not currently have the machinery to track and fix up invalidated uses. Addressing this is left for future commits. * fix `MIN / -1` cprop and add negative tests for things simplify_skeleton cannot handle yet show more ...
Revision tags: v31.0.0
# 1104a838	17-Mar-2025	Chris Fallin <[email protected]>	Cranelift: avoid integer-shift optimization with invalid reduces on vector types. (#10413) An optimization rule in our mid-end implemented the following equivalence: ``` (x << N) >> N == x as T Cranelift: avoid integer-shift optimization with invalid reduces on vector types. (#10413) An optimization rule in our mid-end implemented the following equivalence: ``` (x << N) >> N == x as T1 as T2 ``` where `T1` is a smaller integer type and `T2` is a larger integer type, and `N` is the difference in bitwidths between them. In other words, when a left-then-right-shift clears the upper bits (or sign-extends them), we can implement that by reducing to a smaller type then extending back to the original type. Unfortunately, `ireduce` is not valid on vector-of-integer types. A fuzzbug reported in #10409 shows that this pattern results in invalid CLIF when optimizing this pattern with an `i64x2` as input. This PR tightens the rule's LHS to only apply to integer types. Note that there is another optimization rule that can also implement this behavior with a bitwise AND with a mask. Previously the two resulting expressions had the same cost in the egraph, and whatever perturbation occurred in the ISLE compilation of the rule (which is a multi-constructor because this is the mid-end, so is returning multiple results) caused that rule's result to be picked instead during extraction. I've also tweaked the costs to make reduces/extends cheaper than ordinary arithmetic as a result, to keep the same optimizer outputs after extraction. One x64 test in particular showed why this is correct: the mask-with-AND resulted in two instructions (move, AND-with-immediate) while a uextend can be done with one (zero-extending move, `movzbl`), and aarch64 has similar builtin extends in many cases. show more ...
Revision tags: v30.0.2, v30.0.1, v30.0.0, v29.0.1, v29.0.0, v28.0.1, v28.0.0, v27.0.0, v26.0.1, v25.0.3, v24.0.2, v26.0.0, v21.0.2, v22.0.1, v23.0.3, v25.0.2, v24.0.1, v25.0.1, v25.0.0, v24.0.0, v23.0.2, v23.0.1, v23.0.0, v22.0.0, v21.0.1, v21.0.0, v20.0.2, v20.0.1, v20.0.0, v17.0.3, v19.0.2, v18.0.4, v19.0.1, v19.0.0, v18.0.3, v18.0.2, v17.0.2, v18.0.1, v18.0.0, v17.0.1
# 5b2ae836	05-Feb-2024	Nick Fitzgerald <[email protected]>	Cranelift: Use a fixpoint loop to compute the best value for each eclass (#7859) * Cranelift: Use a fixpoint loop to compute the best value for each eclass Fixes #7857 * Remove fixpoint loop early Cranelift: Use a fixpoint loop to compute the best value for each eclass (#7859) * Cranelift: Use a fixpoint loop to compute the best value for each eclass Fixes #7857 * Remove fixpoint loop early-continue optimization * Add document describing optimization rule invariants * Make select optimizations use subsume * Remove invalid debug assert * Remove now-unused methods * Add commutative adds to cost tests show more ...
Revision tags: v17.0.0, v16.0.0, v15.0.1, v15.0.0
# b9f2a306	07-Nov-2023	Nick Fitzgerald <[email protected]>	Cranelift: Break op cost ties with expression depth in egraphs (#7456) * Cranelift: Switch egraph `Cost` to a struct with named fields Mechanical change. * Cranelift: Break op cost ties with expre Cranelift: Break op cost ties with expression depth in egraphs (#7456) * Cranelift: Switch egraph `Cost` to a struct with named fields Mechanical change. * Cranelift: Break op cost ties with expression depth in egraphs This means that, when the opcode cost is the same, we prefer shallow and wide expressions to narrow and deep. For example, `(a + b) + (c + d)` is preferred to `((a + b) + c) + d`. This is beneficial because it exposes more instruction-level parallelism and shortens live ranges. Co-Authored-By: Trevor Elliott <[email protected]> * Cranelift: Bitpack the egraph `Cost` structure Co-Authored-By: Chris Fallin <[email protected]> Co-Authored-By: Trevor Elliott <[email protected]> * Make it so you can't construct `Cost::inifinity()` by accident * Use fold to code golf --------- Co-authored-by: Trevor Elliott <[email protected]> Co-authored-by: Chris Fallin <[email protected]> show more ...
# e39c6b76	07-Nov-2023	Nick Fitzgerald <[email protected]>	Cranelift: Fix union node bitpacking (#7465) * Cranelift: Fix union node bitpacking It turns out we have just been taking the newest rewrite's value for a eclass union and never actually comparing Cranelift: Fix union node bitpacking (#7465) * Cranelift: Fix union node bitpacking It turns out we have just been taking the newest rewrite's value for a eclass union and never actually comparing costs and taking the value with the minimum cost. Whoops! Fixing this made some test expectations fail, which we resolved by tweaking the cost function to give materializing constants nonzero cost. This way we prefer `-x` to `0 - x`. We also made elaboration function break ties between values with the same cost with the value index. It prefers larger value indices, since the original value's index will be lower than all of its rewritten values' indices. This heuristically prefers rewritten values because we hope our rewrites are all improvements even when the cost function can't show that. Co-Authored-By: Chris Fallin <[email protected]> Co-Authored-By: Trevor Elliott <[email protected]> * Add more information to assertion message * Fix off-by-one bug in assertion * Limit number of matches consumed from ISLE We generally want to clamp down and avoid runaway behavior here. But there also seems to be some sort of rustc/llvm bug on Rust 1.71 that is causing iteration to wild here. This commit avoids that bug. * Update test expectation * prtest:full --------- Co-authored-by: Chris Fallin <[email protected]> Co-authored-by: Trevor Elliott <[email protected]> show more ...
Revision tags: v14.0.4, v14.0.3, v14.0.2, v13.0.1, v14.0.1, v14.0.0
# 04fcb6a1	19-Oct-2023	Trevor Elliott <[email protected]>	A couple of small refactorings to the egraph elaboration pass (#7304) @jameysharp and I noticed a couple of refactoring opportunities while reading through the elaboration pass: * The elaboration l A couple of small refactorings to the egraph elaboration pass (#7304) @jameysharp and I noticed a couple of refactoring opportunities while reading through the elaboration pass: * The elaboration loop doesn't need to match on the top of the stack as a reference, because each case pops it immediately. Instead we can pop and match on the popped value. * Computing the cost of a `Result` value that's not in the DFG was using the block that contains the instruction to determine the level, but since the instruction is already known to not be in the DFG, this would default to `LoopLevel::root()` unconditionally. This also meant that the `Cost::at_level` function turned into an identity function on the cost given, making it unnecessary. Co-authored-by: Jamey Sharp <[email protected]> show more ...
Revision tags: minimum-viable-wasi-proxy-serve, v13.0.0, v12.0.2, v11.0.2, v10.0.2, v12.0.1, v12.0.0, v11.0.1, v11.0.0, v10.0.1, v10.0.0, v9.0.4, v9.0.3, v9.0.2, v9.0.1, v9.0.0, v6.0.2, v7.0.1, v8.0.1, v8.0.0, v7.0.0, v6.0.1, v5.0.1, v4.0.1, v6.0.0
# de0e0bea	06-Feb-2023	Alex Crichton <[email protected]>	Legalize `b{and,or,xor}_not` into component instructions (#5709) * Remove trailing whitespace in `lower.isle` files * Legalize the `band_not` instruction into simpler form This commit legalize Legalize `b{and,or,xor}_not` into component instructions (#5709) * Remove trailing whitespace in `lower.isle` files * Legalize the `band_not` instruction into simpler form This commit legalizes the `band_not` instruction into `band`-of-`bnot`, or two instructions. This is intended to assist with egraph-based optimizations where the `band_not` instruction doesn't have to be specifically included in other bit-operation-patterns. Lowerings of the `band_not` instruction have been moved to a specialization of the `band` instruction. * Legalize `bor_not` into components Same as prior commit, but for the `bor_not` instruction. * Legalize bxor_not into bxor-of-bnot Same as prior commits. I think this also ended up fixing a bug in the s390x backend where `bxor_not x y` was actually translated as `bnot (bxor x y)` by accident given the test update changes. * Simplify not-fused operands for riscv64 Looks like some delegated-to rules have special-cases for "if this feature is enabled use the fused instruction" so move the clause for testing the feature up to the lowering phase to help trigger other rules if the feature isn't enabled. This should make the riscv64 backend more consistent with how other backends are implemented. * Remove B{and,or,xor}Not from cost of egraph metrics These shouldn't ever reach egraphs now that they're legalized away. * Add an egraph optimization for `x^-1 => ~x` This adds a simplification node to translate xor-against-minus-1 to a `bnot` instruction. This helps trigger various other optimizations in the egraph implementation and also various backend lowering rules for instructions. This is chiefly useful as wasm doesn't have a `bnot` equivalent, so it's encoded as `x^-1`. * Add a wasm test for end-to-end bitwise lowerings Test that end-to-end various optimizations are being applied for input wasm modules. * Specifically don't self-update rustup on CI I forget why this was here originally, but this is failing on Windows CI. In general there's no need to update rustup, so leave it as-is. * Cleanup some aarch64 lowering rules Previously a 32/64 split was necessary due to the `ALUOp` being different but that's been refactored away no so there's no longer any need for duplicate rules. * Narrow a x64 lowering rule This previously made more sense when it was `band_not` and rarely used, but be more specific in the type-filter on this rule that it's only applicable to SIMD types with lanes. * Simplify xor-against-minus-1 rule No need to have the commutative version since constants are already shuffled right for egraphs * Optimize band-of-bnot when bnot is on the left Use some more rules in the egraph algebraic optimizations to canonicalize band/bor/bxor with a `bnot` operand to put the operand on the right. That way the lowerings in the backends only have to list the rule once, with the operand on the right, to optimize both styles of input. * Add commutative lowering rules * Update cranelift/codegen/src/isa/x64/lower.isle Co-authored-by: Jamey Sharp <[email protected]> --------- Co-authored-by: Jamey Sharp <[email protected]> show more ...
# ffbcc67e	28-Jan-2023	Nick Fitzgerald <[email protected]>	Cranelift: Consider shifts as "simple" arithmetic in egraph cost model (#5646)
Revision tags: v5.0.0, v4.0.0
# f980defe	06-Dec-2022	Chris Fallin <[email protected]>	egraph support: rewrite to work in terms of CLIF data structures. (#5382) * egraph support: rewrite to work in terms of CLIF data structures. This work rewrites the "egraph"-based optimization f egraph support: rewrite to work in terms of CLIF data structures. (#5382) * egraph support: rewrite to work in terms of CLIF data structures. This work rewrites the "egraph"-based optimization framework in Cranelift to operate on aegraphs (acyclic egraphs) represented in the CLIF itself rather than as a separate data structure to which and from which we translate the CLIF. The basic idea is to add a new kind of value, a "union", that is like an alias but refers to two other values rather than one. This allows us to represent an eclass of enodes (values) as a tree. The union node allows for a value to have multiple representations: either constituent value could be used, and (in well-formed CLIF produced by correct optimization rules) they must be equivalent. Like the old egraph infrastructure, we take advantage of acyclicity and eager rule application to do optimization in a single pass. Like before, we integrate GVN (during the optimization pass) and LICM (during elaboration). Unlike the old egraph infrastructure, everything stays in the DataFlowGraph. "Pure" enodes are represented as instructions that have values attached, but that are not placed into the function layout. When entering "egraph" form, we remove them from the layout while optimizing. When leaving "egraph" form, during elaboration, we can place an instruction back into the layout the first time we elaborate the enode; if we elaborate it more than once, we clone the instruction. The implementation performs two passes overall: - One, a forward pass in RPO (to see defs before uses), that (i) removes "pure" instructions from the layout and (ii) optimizes as it goes. As before, we eagerly optimize, so we form the entire union of optimized forms of a value before we see any uses of that value. This lets us rewrite uses to use the most "up-to-date" form of the value and canonicalize and optimize that form. The eager rewriting and acyclic representation make each other work (we could not eagerly rewrite if there were cycles; and acyclicity does not miss optimization opportunities only because the first time we introduce a value, we immediately produce its "best" form). This design choice is also what allows us to avoid the "parent pointers" and fixpoint loop of traditional egraphs. This forward optimization pass keeps a scoped hashmap to "intern" nodes (thus performing GVN), and also interleaves on a per-instruction level with alias analysis. The interleaving with alias analysis allows alias analysis to see the most optimized form of each address (so it can see equivalences), and allows the next value to see any equivalences (reuses of loads or stored values) that alias analysis uncovers. - Two, a forward pass in domtree preorder, that "elaborates" pure enodes back into the layout, possibly in multiple places if needed. This tracks the loop nest and hoists nodes as needed, performing LICM as it goes. Note that by doing this in forward order, we avoid the "fixpoint" that traditional LICM needs: we hoist a def before its uses, so when we place a node, we place it in the right place the first time rather than moving later. This PR replaces the old (a)egraph implementation. It removes both the cranelift-egraph crate and the logic in cranelift-codegen that uses it. On `spidermonkey.wasm` running a simple recursive Fibonacci microbenchmark, this work shows 5.5% compile-time reduction and 7.7% runtime improvement (speedup). Most of this implementation was done in (very productive) pair programming sessions with Jamey Sharp, thus: Co-authored-by: Jamey Sharp <[email protected]> * Review feedback. * Review feedback. * Review feedback. * Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`. Co-authored-by: Jamey Sharp <[email protected]> show more ...