SILoadStoreOptimizer.cpp - OpenGrok history log for /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp

Revision	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# 33fb23f7	24-Feb-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Merge flat with global in the SILoadStoreOptimizer Flat can be merged with flat global since address cast is a no-op. A combined memory operation needs to be promoted to flat. Differential Revision: https://reviews.llvm.org/D120431
# 517171ce	24-Feb-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Extend SILoadStoreOptimizer to handle flat load/stores TODO: merge flat with global promoting to flat. Differential Revision: https://reviews.llvm.org/D120351
# 3279e440	22-Feb-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Extend SILoadStoreOptimizer to handle global stores TODO: merge flat load/stores. TODO: merge flat with global promoting to flat. Differential Revision: https://reviews.llvm.org/D120346
# cefa1c5c	23-Feb-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Fix combined MMO in load-store merge Loads and stores can be out of order in the SILoadStoreOptimizer. When combining MachineMemOperands of two instructions operands are sent in the IR order into the combineKnownAdjacentMMOs. At the moment it picks the first operand and just replaces its offset and size. This essentially loses alignment information and may generally result in an incorrect base pointer to be used. Use a base pointer in memory addresses order instead and only adjust size. Differential Revision: https://reviews.llvm.org/D120370
# 9e055c0f	21-Feb-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Extend SILoadStoreOptimizer to handle global saddr loads This adds handling of the _SADDR forms to the GLOBAL_LOAD combining. TODO: merge global stores. TODO: merge flat load/stores. TODO: merge flat with global promoting to flat. Differential Revision: https://reviews.llvm.org/D120285
# ba17bd26	21-Feb-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Extend SILoadStoreOptimizer to handle global loads There can be situations where global and flat loads and stores are not combined by the vectorizer, in particular if their address space differ in the IR but they end up the same class instructions after selection. For example a divergent load from constant address space ends up being the same global_load as a load from global address space. TODO: merge global stores. TODO: handle SADDR forms. TODO: merge flat load/stores. TODO: merge flat with global promoting to flat. Differential Revision: https://reviews.llvm.org/D120279
# dc098156	21-Feb-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Remove redundand check in the SILoadStoreOptimizer Differential Revision: https://reviews.llvm.org/D120268
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# 359a792f	28-Jan-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: avoid unbounded register pressure increases Previously when combining two loads this pass would sink the first one down to the second one, putting the combined load where the second one was. It would also sink any intervening instructions which depended on the first load down to just after the combined load. For example, if we started with this sequence of instructions (code flowing from left to right): X A B C D E F Y After combining loads X and Y into XY we might end up with: A B C D E F XY But if B D and F depended on X, we would get: A C E XY B D F Now if the original code had some short disjoint live ranges from A to B, C to D and E to F, in the transformed code these live ranges will be long and overlapping. In this way a single merge of two loads could cause an unbounded increase in register pressure. To fix this, change the way the way that loads are moved in order to merge them so that: - The second load is moved up to the first one. (But when merging stores, we still move the first store down to the second one.) - Intervening instructions are never moved. - Instead, if we find an intervening instruction that would need to be moved, give up on the merge. But this case should now be pretty rare because normal stores have no outputs, and normal loads only have address register inputs, but these will be identical for any pair of loads that we try to merge. As well as fixing the unbounded register pressure increase problem, moving loads up and stores down seems like it should usually be a win for memory latency reasons. Differential Revision: https://reviews.llvm.org/D119006
# 6527b2a4	18-Feb-2022	Sebastian Neubauer <[email protected]>	[AMDGPU][NFC] Fix typos Fix some typos in the amdgpu backend. Differential Revision: https://reviews.llvm.org/D119235
# a456ace9	27-Jan-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: rewrite checkAndPrepareMerge. NFCI. Separate the function clearly into: - Checks that can be done on CI and Paired before the loop. - The loop over all instructions between CI and Paired. - Checks that must be done on InstsToMove after the loop. Previously these were mostly done inside the loop in a very confusing way. Differential Revision: https://reviews.llvm.org/D118994
# 001cb431	04-Feb-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: fewer calls to offsetsCanBeCombined Only call offsetsCanBeCombined with Modify = true in cases where it will really do something. NFC.
# 00bbda07	28-Jan-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: simplify class/subclass checks Also add a comment explaining the difference between class and subclass. NFCI.
# 33ef8bdf	04-Feb-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: simplify optimizeInstsWithSameBaseAddr Common up all the calls to CI.setMI. NFCI.
# ca05edd9	04-Feb-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: simplify OptimizeListAgain test At this point CI represents the combined access (original CI combined with Paired) so it doesn't make any sense to add in Paired.width again. NFCI.
# 68e39462	27-Jan-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: break lists on instructions with side effects This just helps to keep the lists shorter and faster to sort. NFCI. Differential Revision: https://reviews.llvm.org/D118384
# 4b133cee	27-Jan-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: reject AGPR DS_WRITE sooner Rejecting AGPR DS_WRITE instructions before adding them to any mergeable list seems cleaner than adding them to the list and rejecting them later. Differential Revision: https://reviews.llvm.org/D118368
# 94a4594c	27-Jan-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: use separate lists for AGPR instructions Using separate lists for AGPR and non-AGPR instructions seems like a cleaner solution than putting them all in the same list and then later refusing to merge instructions of different AGPR-ness. Differential Revision: https://reviews.llvm.org/D118367
# 8a52fef1	27-Jan-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: tweak API of CombineInfo::setMI. NFC. Change CombineInfo::setMI to take a reference to the SILoadStoreOptimizer instance, for easy access to common fields like TII and STM. Differential Revision: https://reviews.llvm.org/D118366
# 185cb8e8	26-Jan-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access Swizzled accesses are not merged, but there is no particular reason not to merge two instructions if any of the intervening instructions happens to be a swizzled access. This moves the check for swizzled accesses out of checkAndPrepareMerge into collectMergeableInsts where I think it makes more sense. Differential Revision: https://reviews.llvm.org/D118267
# 95857a70	26-Jan-2022	Jay Foad <[email protected]>	[AMDGPU] SILoadStoreOptimizer: Remove redundant check for volatile SILoadStoreOptimizer::collectMergeableInsts already ends the current block if it sees a volatile (or ordered) memory access, so there is no need to check for them again when scanning the instructions between two pairing candidates in a block. Differential Revision: https://reviews.llvm.org/D118266
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3
# 63eea41d	19-Jan-2022	Jay Foad <[email protected]>	[AMDGPU] Simplify SILoadStoreOptimizer::getSubRegIdxs. NFC.
Revision tags: llvmorg-13.0.1-rc2
# 5a667c0e	28-Dec-2021	Kazu Hirata <[email protected]>	[llvm] Use nullptr instead of 0 (NFC) Identified with modernize-use-nullptr.
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 654c89d8	06-Sep-2021	Christudasan Devadasan <[email protected]>	[AMDGPU] Make vector superclasses allocatable The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to enable copy between VGPR and AGPR registers during regalloc. Also, added the missing AV register classes from 192b to 1024b. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D109300
# d1f45ed5	11-Nov-2021	Neubauer, Sebastian <[email protected]>	[AMDGPU][NFC] Fix typos Differential Revision: https://reviews.llvm.org/D113672
# c5029023	02-Nov-2021	Martin Liska <[email protected]>	Fix building with GCC 12: Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380 Differential Revision: https://reviews.llvm.org/D112990
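
The instruction reordering described in the comment for revision 359a792f above can be illustrated with a toy model. This is only a sketch in Python, not the pass's actual C++ code: instructions are single-letter strings, and the function names `old_merge` and `new_merge`, as well as the `dependents` and `blockers` sets, are hypothetical names introduced here for illustration.

```python
def old_merge(seq, first, second, dependents):
    """Toy model of the old strategy: sink `first` down to `second`.

    Intervening instructions in `dependents` (those that use the result of
    `first`) are sunk to just after the combined load.
    """
    i, j = seq.index(first), seq.index(second)
    between = seq[i + 1:j]
    kept = [x for x in between if x not in dependents]
    moved = [x for x in between if x in dependents]
    return seq[:i] + kept + [first + second] + moved + seq[j + 1:]


def new_merge(seq, first, second, blockers):
    """Toy model of the new strategy for loads: hoist `second` up to `first`.

    Intervening instructions are never moved. If any of them is in
    `blockers` (hypothetical: instructions that would have to move for the
    merge to be legal), give up and return None.
    """
    i, j = seq.index(first), seq.index(second)
    between = seq[i + 1:j]
    if any(x in blockers for x in between):
        return None
    return seq[:i] + [first + second] + between + seq[j + 1:]


seq = list("XABCDEFY")
old_result = old_merge(seq, "X", "Y", dependents={"B", "D", "F"})
new_result = new_merge(seq, "X", "Y", blockers=set())
gave_up = new_merge(seq, "X", "Y", blockers={"C"})
print(" ".join(old_result))
print(" ".join(new_result))
print(gave_up)
```

With the sequence from the commit message, the old strategy yields `A C E XY B D F`, which stretches the A-to-B, C-to-D and E-to-F live ranges across the combined load, while the new strategy leaves A through F in their original order after `XY`.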