History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp (Results 1 – 25 of 134)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# 33fb23f7 24-Feb-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Merge flat with global in the SILoadStoreOptimizer

Flat can be merged with flat global since address cast is a no-op.
A combined memory operation needs to be promoted to flat.

Differential

[AMDGPU] Merge flat with global in the SILoadStoreOptimizer

Flat can be merged with flat global since address cast is a no-op.
A combined memory operation needs to be promoted to flat.

Differential Revision: https://reviews.llvm.org/D120431

show more ...


# 517171ce 24-Feb-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Extend SILoadStoreOptimizer to handle flat load/stores

TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120351


# 3279e440 22-Feb-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Extend SILoadStoreOptimizer to handle global stores

TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120346


# cefa1c5c 23-Feb-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Fix combined MMO in load-store merge

Loads and stores can be out of order in the SILoadStoreOptimizer.
When combining MachineMemOperands of two instructions operands are
sent in the IR orde

[AMDGPU] Fix combined MMO in load-store merge

Loads and stores can be out of order in the SILoadStoreOptimizer.
When combining MachineMemOperands of two instructions operands are
sent in the IR order into the combineKnownAdjacentMMOs. At the
moment it picks the first operand and just replaces its offset and
size. This essentially loses alignment information and may generally
result in an incorrect base pointer to be used.

Use a base pointer in memory addresses order instead and only adjust
size.

Differential Revision: https://reviews.llvm.org/D120370

show more ...


# 9e055c0f 21-Feb-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Extend SILoadStoreOptimizer to handle global saddr loads

This adds handling of the _SADDR forms to the GLOBAL_LOAD combining.

TODO: merge global stores.
TODO: merge flat load/stores.
TODO:

[AMDGPU] Extend SILoadStoreOptimizer to handle global saddr loads

This adds handling of the _SADDR forms to the GLOBAL_LOAD combining.

TODO: merge global stores.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120285

show more ...


# ba17bd26 21-Feb-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Extend SILoadStoreOptimizer to handle global loads

There can be situations where global and flat loads and stores are not
combined by the vectorizer, in particular if their address space
di

[AMDGPU] Extend SILoadStoreOptimizer to handle global loads

There can be situations where global and flat loads and stores are not
combined by the vectorizer, in particular if their address space
differ in the IR but they end up the same class instructions after
selection. For example a divergent load from constant address space
ends up being the same global_load as a load from global address space.

TODO: merge global stores.
TODO: handle SADDR forms.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120279

show more ...


# dc098156 21-Feb-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Remove redundand check in the SILoadStoreOptimizer

Differential Revision: https://reviews.llvm.org/D120268


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# 359a792f 28-Jan-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: avoid unbounded register pressure increases

Previously when combining two loads this pass would sink the
first one down to the second one, putting the combined load
wh

[AMDGPU] SILoadStoreOptimizer: avoid unbounded register pressure increases

Previously when combining two loads this pass would sink the
first one down to the second one, putting the combined load
where the second one was. It would also sink any intervening
instructions which depended on the first load down to just
after the combined load.

For example, if we started with this sequence of
instructions (code flowing from left to right):

X A B C D E F Y

After combining loads X and Y into XY we might end up with:

A B C D E F XY

But if B D and F depended on X, we would get:

A C E XY B D F

Now if the original code had some short disjoint live ranges
from A to B, C to D and E to F, in the transformed code
these live ranges will be long and overlapping. In this way
a single merge of two loads could cause an unbounded
increase in register pressure.

To fix this, change the way the way that loads are moved in
order to merge them so that:
- The second load is moved up to the first one. (But when
merging stores, we still move the first store down to the
second one.)
- Intervening instructions are never moved.
- Instead, if we find an intervening instruction that would
need to be moved, give up on the merge. But this case
should now be pretty rare because normal stores have no
outputs, and normal loads only have address register
inputs, but these will be identical for any pair of loads
that we try to merge.

As well as fixing the unbounded register pressure increase
problem, moving loads up and stores down seems like it
should usually be a win for memory latency reasons.

Differential Revision: https://reviews.llvm.org/D119006

show more ...


# 6527b2a4 18-Feb-2022 Sebastian Neubauer <[email protected]>

[AMDGPU][NFC] Fix typos

Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235


# a456ace9 27-Jan-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: rewrite checkAndPrepareMerge. NFCI.

Separate the function clearly into:
- Checks that can be done on CI and Paired before the loop.
- The loop over all instructions be

[AMDGPU] SILoadStoreOptimizer: rewrite checkAndPrepareMerge. NFCI.

Separate the function clearly into:
- Checks that can be done on CI and Paired before the loop.
- The loop over all instructions between CI and Paired.
- Checks that must be done on InstsToMove after the loop.

Previously these were mostly done inside the loop in a very
confusing way.

Differential Revision: https://reviews.llvm.org/D118994

show more ...


# 001cb431 04-Feb-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: fewer calls to offsetsCanBeCombined

Only call offsetsCanBeCombined with Modify = true in cases
where it will really do something. NFC.


# 00bbda07 28-Jan-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: simplify class/subclass checks

Also add a comment explaining the difference between class
and subclass. NFCI.


# 33ef8bdf 04-Feb-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: simplify optimizeInstsWithSameBaseAddr

Common up all the calls to CI.setMI. NFCI.


# ca05edd9 04-Feb-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: simplify OptimizeListAgain test

At this point CI represents the combined access (original CI combined
with Paired) so it doesn't make any sense to add in Paired.width

[AMDGPU] SILoadStoreOptimizer: simplify OptimizeListAgain test

At this point CI represents the combined access (original CI combined
with Paired) so it doesn't make any sense to add in Paired.width again.
NFCI.

show more ...


# 68e39462 27-Jan-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: break lists on instructions with side effects

This just helps to keep the lists shorter and faster to sort. NFCI.

Differential Revision: https://reviews.llvm.org/D118

[AMDGPU] SILoadStoreOptimizer: break lists on instructions with side effects

This just helps to keep the lists shorter and faster to sort. NFCI.

Differential Revision: https://reviews.llvm.org/D118384

show more ...


# 4b133cee 27-Jan-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: reject AGPR DS_WRITE sooner

Rejecting AGPR DS_WRITE instructions before adding them to any mergeable
list seems cleaner than adding them to the list and rejecting them

[AMDGPU] SILoadStoreOptimizer: reject AGPR DS_WRITE sooner

Rejecting AGPR DS_WRITE instructions before adding them to any mergeable
list seems cleaner than adding them to the list and rejecting them
later.

Differential Revision: https://reviews.llvm.org/D118368

show more ...


# 94a4594c 27-Jan-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: use separate lists for AGPR instructions

Using separate lists for AGPR and non-AGPR instructions seems like a
cleaner solution than putting them all in the same list a

[AMDGPU] SILoadStoreOptimizer: use separate lists for AGPR instructions

Using separate lists for AGPR and non-AGPR instructions seems like a
cleaner solution than putting them all in the same list and then later
refusing to merge instructions of different AGPR-ness.

Differential Revision: https://reviews.llvm.org/D118367

show more ...


# 8a52fef1 27-Jan-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: tweak API of CombineInfo::setMI. NFC.

Change CombineInfo::setMI to take a reference to the
SILoadStoreOptimizer instance, for easy access to common fields like
TII and

[AMDGPU] SILoadStoreOptimizer: tweak API of CombineInfo::setMI. NFC.

Change CombineInfo::setMI to take a reference to the
SILoadStoreOptimizer instance, for easy access to common fields like
TII and STM.

Differential Revision: https://reviews.llvm.org/D118366

show more ...


# 185cb8e8 26-Jan-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access

Swizzled accesses are not merged, but there is no particular reason not
to merge two instructions if any of the intervening inst

[AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access

Swizzled accesses are not merged, but there is no particular reason not
to merge two instructions if any of the intervening instructions happens
to be a swizzled access.

This moves the check for swizzled accesses out of checkAndPrepareMerge
into collectMergeableInsts where I think it makes more sense.

Differential Revision: https://reviews.llvm.org/D118267

show more ...


# 95857a70 26-Jan-2022 Jay Foad <[email protected]>

[AMDGPU] SILoadStoreOptimizer: Remove redundant check for volatile

SILoadStoreOptimizer::collectMergeableInsts already ends the current
block if it sees a volatile (or ordered) memory access, so the

[AMDGPU] SILoadStoreOptimizer: Remove redundant check for volatile

SILoadStoreOptimizer::collectMergeableInsts already ends the current
block if it sees a volatile (or ordered) memory access, so there is no
need to check for them again when scanning the instructions between two
pairing candidates in a block.

Differential Revision: https://reviews.llvm.org/D118266

show more ...


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3
# 63eea41d 19-Jan-2022 Jay Foad <[email protected]>

[AMDGPU] Simplify SILoadStoreOptimizer::getSubRegIdxs. NFC.


Revision tags: llvmorg-13.0.1-rc2
# 5a667c0e 28-Dec-2021 Kazu Hirata <[email protected]>

[llvm] Use nullptr instead of 0 (NFC)

Identified with modernize-use-nullptr.


Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 654c89d8 06-Sep-2021 Christudasan Devadasan <[email protected]>

[AMDGPU] Make vector superclasses allocatable

The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to

[AMDGPU] Make vector superclasses allocatable

The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.

Also, added the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300

show more ...


# d1f45ed5 11-Nov-2021 Neubauer, Sebastian <[email protected]>

[AMDGPU][NFC] Fix typos

Differential Revision: https://reviews.llvm.org/D113672


# c5029023 02-Nov-2021 Martin Liska <[email protected]>

Fix building with GCC 12:

Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380

Differential Revision: https://reviews.llvm.org/D112990


123456