History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (Results 1 – 25 of 66)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 7ca9e471 14-Jul-2022 Austin Kerbow <[email protected]>

[AMDGPU] Start refactoring GCNSchedStrategy

Tries to make the different scheduling stages a bit more self contained and
modifiable. Intended to be NFC. Preface to other changes.

Reviewed By: rampit

[AMDGPU] Start refactoring GCNSchedStrategy

Tries to make the different scheduling stages a bit more self contained and
modifiable. Intended to be NFC. Preface to other changes.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130147

show more ...


# 8d0383eb 24-Jun-2022 Matt Arsenault <[email protected]>

CodeGen: Remove AliasAnalysis from regalloc

This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is

CodeGen: Remove AliasAnalysis from regalloc

This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.

Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.

show more ...


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 311edc6b 25-Mar-2022 Vang Thao <[email protected]>

[AMDGPU] Enable PreRARematerialize scheduling pass with multiple high RP regions

Enable the PreRARematerialize pass when there are multiple high RP scheduling
regions present. Require the occupancy

[AMDGPU] Enable PreRARematerialize scheduling pass with multiple high RP regions

Enable the PreRARematerialize pass when there are multiple high RP scheduling
regions present. Require the occupancy in all high RP regions be improved
before finalizing sinking. If any high RP region did not improve in occupancy
then un-do all sinking and restore the state to before the pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D122501

show more ...


# cd107117 08-Apr-2022 Vang Thao <[email protected]>

[AMDGPU] Fix inline asm causing assert during PreRARematerialize stage in scheduler pass

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D123348


# 45c2371c 04-Apr-2022 Vang Thao <[email protected]>

[AMDGPU] Ignore debug use during PreRARematerialize stage in scheduling pass

Ignore all debug uses when collecting trivially rematerializable defs. This fixes an issue with difference in codegen whe

[AMDGPU] Ignore debug use during PreRARematerialize stage in scheduling pass

Ignore all debug uses when collecting trivially rematerializable defs. This fixes an issue with difference in codegen when enabling debug info.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D123048

show more ...


# 27e19315 17-Mar-2022 Vang Thao <[email protected]>

[AMDGPU] Fix PreRARematerialize scheduler pass sinking subreg defs

When collecting trivially rematerializable defs, skip any subreg defs. We do not want to sink these.

Differential Revision: https:

[AMDGPU] Fix PreRARematerialize scheduler pass sinking subreg defs

When collecting trivially rematerializable defs, skip any subreg defs. We do not want to sink these.

Differential Revision: https://reviews.llvm.org/D121874

show more ...


# 989f1c72 15-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# a278250b 10-Mar-2022 Nico Weber <[email protected]>

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https:/

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169

show more ...


# 7f230fee 07-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

after: 1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169


Revision tags: llvmorg-14.0.0-rc2
# 28322c25 10-Feb-2022 Vang Thao <[email protected]>

[AMDGPU] Add scheduler pass to rematerialize trivial defs

Add a new pass in the pre-ra AMDGPU scheduler to check if sinking trivially rematerializable defs that only has one use outside of the defin

[AMDGPU] Add scheduler pass to rematerialize trivial defs

Add a new pass in the pre-ra AMDGPU scheduler to check if sinking trivially rematerializable defs that only has one use outside of the defining block will increase occupancy. If we can determine that occupancy can be increased, then rematerialize only the minimum amount of defs required to increase occupancy. Also re-schedule all regions that had occupancy matching the previous min occupancy using the new occupancy.

This is based off of the discussion in https://reviews.llvm.org/D117562.

The logic to determine the defs we should collect and determining if sinking would be beneficial is mostly the same. Main differences is that we are no longer limiting it to immediate defs and the def and use does not have to be part of a loop.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D119475

show more ...


Revision tags: llvmorg-14.0.0-rc1
# 57047119 04-Feb-2022 Vang Thao <[email protected]>

[AMDGPU] Fix debug values in scheduler not placed correctly when reverting

Debug position data is cleared after ScheduleDAGMILive::schedule() due to it also calling placeDebugValues(). Make it so th

[AMDGPU] Fix debug values in scheduler not placed correctly when reverting

Debug position data is cleared after ScheduleDAGMILive::schedule() due to it also calling placeDebugValues(). Make it so the data is not cleared after initial call to placeDebugValues since we will call it again after reverting a schedule.

Secondly, since we skip debug instructions when reverting the schedule on AMDGPU, all debug instructions are now moved to the end of the scheduling region. RegionEnd points to the beginning of this chunk of debug instructions since it was not incremented when a debug instruction was skipped. RegionBegin may also point to the same debug instruction if Unsched.front() is a debug instruction thus shrinking the region to 1. Fix RegionBegin and RegionEnd so that they point to the current beginning and ending before calling placeDebugValues() since both vars will be used as reference points to move debug instructions back.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D119022

show more ...


# 2ca194ff 02-Feb-2022 Vang Thao <[email protected]>

[AMDGPU] Fix scheduler live-ins with debug inst at start of block

GCNDownwardRPTracker RPTracker.reset() skips debug instructions for NextMI so RPTracker.getNext() will never give the beginning of a

[AMDGPU] Fix scheduler live-ins with debug inst at start of block

GCNDownwardRPTracker RPTracker.reset() skips debug instructions for NextMI so RPTracker.getNext() will never give the beginning of a sched region if it is a debug value. In this case we will never set the live-ins for that block.

Add check to see if getNext also equals the MI after skipping debug instructions.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D118853

show more ...


Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# d1f45ed5 11-Nov-2021 Neubauer, Sebastian <[email protected]>

[AMDGPU][NFC] Fix typos

Differential Revision: https://reviews.llvm.org/D113672


# 02e60f2e 23-Oct-2021 Austin Kerbow <[email protected]>

[AMDGPU] Use max waves for scheduler's initial occupancy target

The scheduler should set critical/excess register usage thresholds that
are guided by the maximum possible occupancy for the function.

[AMDGPU] Use max waves for scheduler's initial occupancy target

The scheduler should set critical/excess register usage thresholds that
are guided by the maximum possible occupancy for the function. This
change is focused on setting proper lower bounds on register usage which
we would typically only see when a specific number of maximum waves is
requested with the "waves-per-eu" attribute, or by setting
"amdgpu-num-vgpr|sgpr" directly. This was broken previously. I have a
follow-on patch that will address issues with the scheduler not
targeting correct upper bounds on register usage which is typical with
launch bounds and min "waves-per-eu".

Changes by this patch:

Set the initial critical register usage thresholds to minimum values
that are determined by the maximum possible occupancy for the function,
or the number of allocatable registers, whichever is lower.

Avoid unisgned overflow if register limits are lower than the register
tracking "ErrorMargin", I.e. when using stress-regalloc=2.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112373

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3
# 799c50fe 25-Feb-2021 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Avoid second rescheduling for some regions

If a region was not constrained by a high register pressure
and was not rescheduled without clustering we can skip
rescheduling it ClusteredLowOcc

[AMDGPU] Avoid second rescheduling for some regions

If a region was not constrained by a high register pressure
and was not rescheduled without clustering we can skip
rescheduling it ClusteredLowOccupancyReschedule stage.

This improves scheduling speed by 25% on some kernels.

Differential Revision: https://reviews.llvm.org/D97506

show more ...


Revision tags: llvmorg-12.0.0-rc2
# 635993f0 23-Feb-2021 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Skip unclusterd rescheduling w/o ld/st

We are attempting rescheduling without load store clustering
if occupancy limits were not met with clustering. Skip this
for regions which do not have

[AMDGPU] Skip unclusterd rescheduling w/o ld/st

We are attempting rescheduling without load store clustering
if occupancy limits were not met with clustering. Skip this
for regions which do not have any loads or stores at all.

In a set of kernels I am experimenting with this improves
scheduling time by ~30%.

Differential Revision: https://reviews.llvm.org/D97342

show more ...


# bb16efe2 22-Feb-2021 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Move RPT::getLiveRegs() check under EXPENSIVE_CHECKS

This is too expensive even for debug builds. It doubles
scheduling time if enabled.

Differential Revision: https://reviews.llvm.org/D97

[AMDGPU] Move RPT::getLiveRegs() check under EXPENSIVE_CHECKS

This is too expensive even for debug builds. It doubles
scheduling time if enabled.

Differential Revision: https://reviews.llvm.org/D97232

show more ...


# a8d9d507 17-Feb-2021 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] gfx90a support

Differential Revision: https://reviews.llvm.org/D96906


Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2
# 560d7e04 20-Jan-2021 dfukalov <[email protected]>

[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets

... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036


Revision tags: llvmorg-11.1.0-rc1
# 6a87e9b0 25-Dec-2020 dfukalov <[email protected]>

[NFC][AMDGPU] Reduce include files dependency.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2
# 04bd5b52 07-Aug-2020 Vang Thao <[email protected]>

[AMDGPU] Fix not rescheduling without clustering

Regions are sometimes skipped which should be rescheduled without memory op
clustering. RegionIdx is not incremented when iterating over regions that

[AMDGPU] Fix not rescheduling without clustering

Regions are sometimes skipped which should be rescheduled without memory op
clustering. RegionIdx is not incremented when iterating over regions that
are flagged to be skipped, causing the index to be incorrect.

Thanks to Vang Thao for discovering this bug!

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D85498

show more ...


Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# 43830790 07-Oct-2019 Jay Foad <[email protected]>

[AMDGPU] Remove dubious logic in bidirectional list scheduler

Summary:
pickNodeBidirectional tried to compare the best top candidate and the
best bottom candidate by examining TopCand.Reason and Bot

[AMDGPU] Remove dubious logic in bidirectional list scheduler

Summary:
pickNodeBidirectional tried to compare the best top candidate and the
best bottom candidate by examining TopCand.Reason and BotCand.Reason.
This is unsound because, after calling pickNodeFromQueue, Cand.Reason
does not reflect the most important reason why Cand was chosen. Rather
it reflects the most recent reason why it beat some other potential
candidate, which could have been for some low priority tie breaker
reason.

I have seen this cause problems where TopCand is a good candidate, but
because TopCand.Reason is ORDER (which is very low priority) it is
repeatedly ignored in favour of a mediocre BotCand. This is not how
bidirectional scheduling is supposed to work.

To fix this I changed the code to always compare TopCand and BotCand
directly, like the generic implementation of pickNodeBidirectional does.
This removes some uncommented AMDGPU-specific logic; if this logic turns
out to be important then perhaps it could be moved into an override of
tryCandidate instead.

Graphics shader benchmarking on gfx10 shows a lot more positive than
negative effects from this change.

Reviewers: arsenm, tstellar, rampitec, kzhuravl, vpykhtin, dstuttard, tpr, atrick, MatzeB

Subscribers: jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68338

show more ...


# dd476645 18-Feb-2020 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Use generated RegisterPressureSets enum

Differential Revision: https://reviews.llvm.org/D74671


# 53eb0f8c 24-Jan-2020 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Attempt to reschedule withou clustering

We want to have more load/store clustering but we also want
to maintain low register pressure which are oposit targets.
Allow scheduler to reschedule

[AMDGPU] Attempt to reschedule withou clustering

We want to have more load/store clustering but we also want
to maintain low register pressure which are oposit targets.
Allow scheduler to reschedule regions without mutations
applied if we hit a register limit.

Differential Revision: https://reviews.llvm.org/D73386

show more ...


# 4aa7fb77 03-Jan-2020 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Revert scheduling to reduce spilling

We can revert region schedule if new schedule decreases occupancy.
However, if we already have only one wave we would accept any new
schedule even if it

[AMDGPU] Revert scheduling to reduce spilling

We can revert region schedule if new schedule decreases occupancy.
However, if we already have only one wave we would accept any new
schedule even if it blows up register pressure. Such schedule may
result in quite heavy spilling which can be avoided if we reject
this new schedule.

Differential Revision: https://reviews.llvm.org/D72181

show more ...


123