|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
33fb23f7 |
| 24-Feb-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Merge flat with global in the SILoadStoreOptimizer
Flat can be merged with flat global since address cast is a no-op. A combined memory operation needs to be promoted to flat.
Differential
[AMDGPU] Merge flat with global in the SILoadStoreOptimizer
Flat can be merged with flat global since address cast is a no-op. A combined memory operation needs to be promoted to flat.
Differential Revision: https://reviews.llvm.org/D120431
show more ...
|
| #
517171ce |
| 24-Feb-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Extend SILoadStoreOptimizer to handle flat load/stores
TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120351
|
| #
3279e440 |
| 22-Feb-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Extend SILoadStoreOptimizer to handle global stores
TODO: merge flat load/stores. TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120346
|
| #
cefa1c5c |
| 23-Feb-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Fix combined MMO in load-store merge
Loads and stores can be out of order in the SILoadStoreOptimizer. When combining MachineMemOperands of two instructions operands are sent in the IR orde
[AMDGPU] Fix combined MMO in load-store merge
Loads and stores can be out of order in the SILoadStoreOptimizer. When combining MachineMemOperands of two instructions operands are sent in the IR order into the combineKnownAdjacentMMOs. At the moment it picks the first operand and just replaces its offset and size. This essentially loses alignment information and may generally result in an incorrect base pointer to be used.
Use a base pointer in memory addresses order instead and only adjust size.
Differential Revision: https://reviews.llvm.org/D120370
show more ...
|
| #
9e055c0f |
| 21-Feb-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Extend SILoadStoreOptimizer to handle global saddr loads
This adds handling of the _SADDR forms to the GLOBAL_LOAD combining.
TODO: merge global stores. TODO: merge flat load/stores. TODO:
[AMDGPU] Extend SILoadStoreOptimizer to handle global saddr loads
This adds handling of the _SADDR forms to the GLOBAL_LOAD combining.
TODO: merge global stores. TODO: merge flat load/stores. TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120285
show more ...
|
| #
ba17bd26 |
| 21-Feb-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Extend SILoadStoreOptimizer to handle global loads
There can be situations where global and flat loads and stores are not combined by the vectorizer, in particular if their address space di
[AMDGPU] Extend SILoadStoreOptimizer to handle global loads
There can be situations where global and flat loads and stores are not combined by the vectorizer, in particular if their address space differ in the IR but they end up the same class instructions after selection. For example a divergent load from constant address space ends up being the same global_load as a load from global address space.
TODO: merge global stores. TODO: handle SADDR forms. TODO: merge flat load/stores. TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120279
show more ...
|
| #
dc098156 |
| 21-Feb-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Remove redundand check in the SILoadStoreOptimizer
Differential Revision: https://reviews.llvm.org/D120268
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
359a792f |
| 28-Jan-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: avoid unbounded register pressure increases
Previously when combining two loads this pass would sink the first one down to the second one, putting the combined load wh
[AMDGPU] SILoadStoreOptimizer: avoid unbounded register pressure increases
Previously when combining two loads this pass would sink the first one down to the second one, putting the combined load where the second one was. It would also sink any intervening instructions which depended on the first load down to just after the combined load.
For example, if we started with this sequence of instructions (code flowing from left to right):
X A B C D E F Y
After combining loads X and Y into XY we might end up with:
A B C D E F XY
But if B D and F depended on X, we would get:
A C E XY B D F
Now if the original code had some short disjoint live ranges from A to B, C to D and E to F, in the transformed code these live ranges will be long and overlapping. In this way a single merge of two loads could cause an unbounded increase in register pressure.
To fix this, change the way the way that loads are moved in order to merge them so that: - The second load is moved up to the first one. (But when merging stores, we still move the first store down to the second one.) - Intervening instructions are never moved. - Instead, if we find an intervening instruction that would need to be moved, give up on the merge. But this case should now be pretty rare because normal stores have no outputs, and normal loads only have address register inputs, but these will be identical for any pair of loads that we try to merge.
As well as fixing the unbounded register pressure increase problem, moving loads up and stores down seems like it should usually be a win for memory latency reasons.
Differential Revision: https://reviews.llvm.org/D119006
show more ...
|
| #
6527b2a4 |
| 18-Feb-2022 |
Sebastian Neubauer <[email protected]> |
[AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.
Differential Revision: https://reviews.llvm.org/D119235
|
| #
a456ace9 |
| 27-Jan-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: rewrite checkAndPrepareMerge. NFCI.
Separate the function clearly into: - Checks that can be done on CI and Paired before the loop. - The loop over all instructions be
[AMDGPU] SILoadStoreOptimizer: rewrite checkAndPrepareMerge. NFCI.
Separate the function clearly into: - Checks that can be done on CI and Paired before the loop. - The loop over all instructions between CI and Paired. - Checks that must be done on InstsToMove after the loop.
Previously these were mostly done inside the loop in a very confusing way.
Differential Revision: https://reviews.llvm.org/D118994
show more ...
|
| #
001cb431 |
| 04-Feb-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: fewer calls to offsetsCanBeCombined
Only call offsetsCanBeCombined with Modify = true in cases where it will really do something. NFC.
|
| #
00bbda07 |
| 28-Jan-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: simplify class/subclass checks
Also add a comment explaining the difference between class and subclass. NFCI.
|
| #
33ef8bdf |
| 04-Feb-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: simplify optimizeInstsWithSameBaseAddr
Common up all the calls to CI.setMI. NFCI.
|
| #
ca05edd9 |
| 04-Feb-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: simplify OptimizeListAgain test
At this point CI represents the combined access (original CI combined with Paired) so it doesn't make any sense to add in Paired.width
[AMDGPU] SILoadStoreOptimizer: simplify OptimizeListAgain test
At this point CI represents the combined access (original CI combined with Paired) so it doesn't make any sense to add in Paired.width again. NFCI.
show more ...
|
| #
68e39462 |
| 27-Jan-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: break lists on instructions with side effects
This just helps to keep the lists shorter and faster to sort. NFCI.
Differential Revision: https://reviews.llvm.org/D118
[AMDGPU] SILoadStoreOptimizer: break lists on instructions with side effects
This just helps to keep the lists shorter and faster to sort. NFCI.
Differential Revision: https://reviews.llvm.org/D118384
show more ...
|
| #
4b133cee |
| 27-Jan-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: reject AGPR DS_WRITE sooner
Rejecting AGPR DS_WRITE instructions before adding them to any mergeable list seems cleaner than adding them to the list and rejecting them
[AMDGPU] SILoadStoreOptimizer: reject AGPR DS_WRITE sooner
Rejecting AGPR DS_WRITE instructions before adding them to any mergeable list seems cleaner than adding them to the list and rejecting them later.
Differential Revision: https://reviews.llvm.org/D118368
show more ...
|
| #
94a4594c |
| 27-Jan-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: use separate lists for AGPR instructions
Using separate lists for AGPR and non-AGPR instructions seems like a cleaner solution than putting them all in the same list a
[AMDGPU] SILoadStoreOptimizer: use separate lists for AGPR instructions
Using separate lists for AGPR and non-AGPR instructions seems like a cleaner solution than putting them all in the same list and then later refusing to merge instructions of different AGPR-ness.
Differential Revision: https://reviews.llvm.org/D118367
show more ...
|
| #
8a52fef1 |
| 27-Jan-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: tweak API of CombineInfo::setMI. NFC.
Change CombineInfo::setMI to take a reference to the SILoadStoreOptimizer instance, for easy access to common fields like TII and
[AMDGPU] SILoadStoreOptimizer: tweak API of CombineInfo::setMI. NFC.
Change CombineInfo::setMI to take a reference to the SILoadStoreOptimizer instance, for easy access to common fields like TII and STM.
Differential Revision: https://reviews.llvm.org/D118366
show more ...
|
| #
185cb8e8 |
| 26-Jan-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access
Swizzled accesses are not merged, but there is no particular reason not to merge two instructions if any of the intervening inst
[AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access
Swizzled accesses are not merged, but there is no particular reason not to merge two instructions if any of the intervening instructions happens to be a swizzled access.
This moves the check for swizzled accesses out of checkAndPrepareMerge into collectMergeableInsts where I think it makes more sense.
Differential Revision: https://reviews.llvm.org/D118267
show more ...
|
| #
95857a70 |
| 26-Jan-2022 |
Jay Foad <[email protected]> |
[AMDGPU] SILoadStoreOptimizer: Remove redundant check for volatile
SILoadStoreOptimizer::collectMergeableInsts already ends the current block if it sees a volatile (or ordered) memory access, so the
[AMDGPU] SILoadStoreOptimizer: Remove redundant check for volatile
SILoadStoreOptimizer::collectMergeableInsts already ends the current block if it sees a volatile (or ordered) memory access, so there is no need to check for them again when scanning the instructions between two pairing candidates in a block.
Differential Revision: https://reviews.llvm.org/D118266
show more ...
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
63eea41d |
| 19-Jan-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Simplify SILoadStoreOptimizer::getSubRegIdxs. NFC.
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
5a667c0e |
| 28-Dec-2021 |
Kazu Hirata <[email protected]> |
[llvm] Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
|
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
654c89d8 |
| 06-Sep-2021 |
Christudasan Devadasan <[email protected]> |
[AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to
[AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to enable copy between VGPR and AGPR registers during regalloc.
Also, added the missing AV register classes from 192b to 1024b.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D109300
show more ...
|
| #
d1f45ed5 |
| 11-Nov-2021 |
Neubauer, Sebastian <[email protected]> |
[AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
|
| #
c5029023 |
| 02-Nov-2021 |
Martin Liska <[email protected]> |
Fix building with GCC 12:
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380
Differential Revision: https://reviews.llvm.org/D112990
|