|
Revision tags: dev, v36.0.9, v44.0.1, v43.0.2, v36.0.8, v24.0.8, v44.0.0, v43.0.1, v42.0.2, v36.0.7, v24.0.7, v43.0.0, v42.0.1, v41.0.4, v42.0.0, v40.0.4, v36.0.6, v24.0.6, v41.0.3, v41.0.2, v41.0.1, v36.0.5, v40.0.3, v41.0.0, v36.0.4, v39.0.2, v40.0.2, v40.0.1 |
|
| #
0889323a |
| 03-Jan-2026 |
SSD <[email protected]> |
cranelift-codegen: rename most uses of std to core and alloc (#12237)
* rename most std uses to core and alloc
* cargo fmt
|
|
Revision tags: v40.0.0, v39.0.1, v39.0.0, v38.0.4, v37.0.3, v36.0.3, v24.0.5, v38.0.3, v38.0.2, v38.0.1, v37.0.2, v37.0.1, v37.0.0, v36.0.2, v36.0.1, v36.0.0, v35.0.0, v24.0.4, v33.0.2, v34.0.2, v34.0.1, v33.0.1, v24.0.3, v32.0.1, v34.0.0, v33.0.0, v32.0.0 |
|
| #
82c0a09b |
| 26-Mar-2025 |
Chris Fallin <[email protected]> |
Simplify aegraphs by removing union-find and canonical eclass IDs. (#10471)
I was recently re-thinking through some of the core data structure design in our aegraph implementation, and wondered: do
Simplify aegraphs by removing union-find and canonical eclass IDs. (#10471)
I was recently re-thinking through some of the core data structure design in our aegraph implementation, and wondered: do we really need the union-find data structure, the notion of "canonical" ID for an eclass separate from its latest ID (root of union-node tree), and the hashcons-key canonicalization using all of this? It's an awful lot of complexity and has led to some fairly subtle bugs (e.g., #6126), and is generally unsatisfying.
I had the realization: the only case where the distinction between canonical and latest ID matters is when we expand an eclass after its initial (eager) rewriting, which happens before we see its uses. If we hypothesize that this happens rarely, then it should be fine to canonicalize based only on latest ID -- we shouldn't lose much (and we can measure this loss empirically).
The chief case where this kind of "late eclass expansion" still happens is: if we have some expression E1 that eventually rewrites to E2 via some simplification, and E2 already exists earlier in the program, then E1 will join the eclass. If we then have some `E3 := E1 + 1`, and later `E4 := E2 + 1`, we need the union-find canonicalization for E4 to GVN to E3. Otherwise, the latest ID for the eclass that eventually contains E1 and E2 is different at the time that E3 uses it (and is GVN'd and rewritten) and when E4 does. Put another way: E3 captures a snapshot of its operand's eclass before a new node joins it, and is never reprocessed when that happens, so E3 remains distinct.
But if the `E2 -> E1` rewrite is truly "directional" toward a better representation that we will always want to choose -- say, `x + 0 -> x`, or any constant-propagation in general -- then if the eager rewriting for E2 produces E1's eclass ID directly *without* adding E2 to its nodes, then all users will still canonicalize as before. This "only return the rewrite target, don't union with it" before is exactly our `subsume` operator.
Put another way: subsumption prevents growing eclasses later, so snapshots in time remain the latest, and everyone canonicalizes with the same ID. We move to a true immutable data structure, with simple hashconsing with no magic.
The rewrite semantics are then much simpler too: if any value is marked "subsume", we pick it (pick an arbitrary one if multiple) as our rewrite result; otherwise, create an eclass with the original rewrite input and all rewrite outputs (by creating union-nodes in the DFG). No auto-subsume or factoring in availability -- that's it. "Subsume" means: always pick the rewritten value, don't keep the original, and we use it whenever that's unambiguously true for better canonicalization.
Given that, it turns out that we can remove the union-find mechanism entirely. It also turns out that we can unify the scoped-hashmap for "effectful/idempotent ops" with the ordinary hashmap for pure ops; everything can be scoped. To get working LICM we do need to retain our "available block" mechanism, but only to insert values at a higher scope level in the scoped hashmap -- it is now heuristic, not load-bearing for correctness.
I suspect that the fixpoint loop computing analysis results can go away too, now that we truly don't update arguments -- we have a simple immutable data structure with everything snapshotted at creation -- but I haven't made that change yet.
This change appears to be "in the noise" in both runtime and compile time -- a Sightglass run on the default suite shows only
``` compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
Δ = 551234.50 ± 514580.62 (confidence = 99%)
new.so is 1.00x to 1.01x faster than old.so!
[61669181 72513567.85 98139932] new.so [60991071 73064802.35 120044089] old.so
execution :: cycles :: benchmarks/bz2/benchmark.wasm
Δ = 232827.80 ± 204621.12 (confidence = 99%)
old.so is 1.00x to 1.01x faster than new.so!
[67208140 72812782.32 89996076] new.so [69531172 72579954.52 80530142] old.so ```
which seem like suitably small swings that are fine. Spot-checking the aegraph stats on the same function before-and-after shows the same optimizations happening in all functions I examined, and we see the compile-tests showing no movement except for a value renumbering in one case. So: no effect objectively, but deletes code and significantly simplifies the core algorithm.
show more ...
|
|
Revision tags: v31.0.0, v30.0.2, v30.0.1, v30.0.0, v29.0.1, v29.0.0, v28.0.1 |
|
| #
f2fb00dd |
| 10-Jan-2025 |
Jonas Kruckenberg <[email protected]> |
chore: update `hashbrown` to 0.15 (#9974)
* chore: update `hashbrown` to 0.15
* fmt
|
|
Revision tags: v28.0.0, v27.0.0, v26.0.1, v25.0.3, v24.0.2, v26.0.0, v21.0.2, v22.0.1, v23.0.3, v25.0.2, v24.0.1, v25.0.1, v25.0.0, v24.0.0, v23.0.2, v23.0.1, v23.0.0, v22.0.0, v21.0.1, v21.0.0, v20.0.2, v20.0.1 |
|
| #
132ef1e4 |
| 29-Apr-2024 |
Kirpal Grewal <[email protected]> |
Fxhash to rustchash (#8498)
* move fx hash to workspace level dep
* change internal fxhash to use fxhash crate
* remove unneeded HashSet import
* change fxhash crate to rustc hash
* undo migrat
Fxhash to rustchash (#8498)
* move fx hash to workspace level dep
* change internal fxhash to use fxhash crate
* remove unneeded HashSet import
* change fxhash crate to rustc hash
* undo migration to rustc hash
* manually implement hash function from fxhash
* change to rustc hash
show more ...
|
|
Revision tags: v20.0.0, v17.0.3, v19.0.2, v18.0.4, v19.0.1, v19.0.0, v18.0.3, v18.0.2, v17.0.2 |
|
| #
9ce3ffe1 |
| 22-Feb-2024 |
Alex Crichton <[email protected]> |
Update some CI dependencies (#7983)
* Update some CI dependencies
* Update to the latest nightly toolchain * Update mdbook * Update QEMU for cross-compiled testing * Update `cargo nextest` for usag
Update some CI dependencies (#7983)
* Update some CI dependencies
* Update to the latest nightly toolchain * Update mdbook * Update QEMU for cross-compiled testing * Update `cargo nextest` for usage with MIRI
prtest:full
* Remove lots of unnecessary imports
* Downgrade qemu as 8.2.1 seems to segfault
* Remove more imports
* Remove unused winch trait method
* Fix warnings about unused trait methods
* More unused imports
* More unused imports
show more ...
|
|
Revision tags: v18.0.1, v18.0.0, v17.0.1, v17.0.0, v16.0.0, v15.0.1, v15.0.0, v14.0.4, v14.0.3, v14.0.2, v13.0.1, v14.0.1, v14.0.0, minimum-viable-wasi-proxy-serve, v13.0.0, v12.0.2, v11.0.2, v10.0.2, v12.0.1, v12.0.0, v11.0.1, v11.0.0, v10.0.1, v10.0.0, v9.0.4, v9.0.3, v9.0.2, v9.0.1, v9.0.0, v6.0.2, v7.0.1, v8.0.1, v8.0.0, v7.0.0, v6.0.1, v5.0.1, v4.0.1, v6.0.0, v5.0.0, v4.0.0 |
|
| #
f980defe |
| 06-Dec-2022 |
Chris Fallin <[email protected]> |
egraph support: rewrite to work in terms of CLIF data structures. (#5382)
* egraph support: rewrite to work in terms of CLIF data structures.
This work rewrites the "egraph"-based optimization f
egraph support: rewrite to work in terms of CLIF data structures. (#5382)
* egraph support: rewrite to work in terms of CLIF data structures.
This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.
The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one. This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have *multiple representations*: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.
Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).
Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.
The implementation performs two passes overall:
- One, a forward pass in RPO (to see defs before uses), that (i) removes
"pure" instructions from the layout and (ii) optimizes as it goes. As
before, we eagerly optimize, so we form the entire union of optimized
forms of a value before we see any uses of that value. This lets us
rewrite uses to use the most "up-to-date" form of the value and
canonicalize and optimize that form.
The eager rewriting and acyclic representation make each other work
(we could not eagerly rewrite if there were cycles; and acyclicity
does not miss optimization opportunities only because the first time
we introduce a value, we immediately produce its "best" form). This
design choice is also what allows us to avoid the "parent pointers"
and fixpoint loop of traditional egraphs.
This forward optimization pass keeps a scoped hashmap to "intern"
nodes (thus performing GVN), and also interleaves on a per-instruction
level with alias analysis. The interleaving with alias analysis allows
alias analysis to see the most optimized form of each address (so it
can see equivalences), and allows the next value to see any
equivalences (reuses of loads or stored values) that alias analysis
uncovers.
- Two, a forward pass in domtree preorder, that "elaborates" pure enodes
back into the layout, possibly in multiple places if needed. This
tracks the loop nest and hoists nodes as needed, performing LICM as it
goes. Note that by doing this in forward order, we avoid the
"fixpoint" that traditional LICM needs: we hoist a def before its
uses, so when we place a node, we place it in the right place the
first time rather than moving later.
This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.
On `spidermonkey.wasm` running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup).
Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:
Co-authored-by: Jamey Sharp <[email protected]>
* Review feedback.
* Review feedback.
* Review feedback.
* Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`.
Co-authored-by: Jamey Sharp <[email protected]>
show more ...
|