|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
7ca9e471 |
| 14-Jul-2022 |
Austin Kerbow <[email protected]> |
[AMDGPU] Start refactoring GCNSchedStrategy
Tries to make the different scheduling stages a bit more self contained and modifiable. Intended to be NFC. Preface to other changes.
Reviewed By: rampit
[AMDGPU] Start refactoring GCNSchedStrategy
Tries to make the different scheduling stages a bit more self contained and modifiable. Intended to be NFC. Preface to other changes.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D130147
show more ...
|
| #
8d0383eb |
| 24-Jun-2022 |
Matt Arsenault <[email protected]> |
CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is
CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.
show more ...
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
311edc6b |
| 25-Mar-2022 |
Vang Thao <[email protected]> |
[AMDGPU] Enable PreRARematerialize scheduling pass with multiple high RP regions
Enable the PreRARematerialize pass when there are multiple high RP scheduling regions present. Require the occupancy
[AMDGPU] Enable PreRARematerialize scheduling pass with multiple high RP regions
Enable the PreRARematerialize pass when there are multiple high RP scheduling regions present. Require the occupancy in all high RP regions be improved before finalizing sinking. If any high RP region did not improve in occupancy then un-do all sinking and restore the state to before the pass.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D122501
show more ...
|
| #
cd107117 |
| 08-Apr-2022 |
Vang Thao <[email protected]> |
[AMDGPU] Fix inline asm causing assert during PreRARematerialize stage in scheduler pass
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D123348
|
| #
45c2371c |
| 04-Apr-2022 |
Vang Thao <[email protected]> |
[AMDGPU] Ignore debug use during PreRARematerialize stage in scheduling pass
Ignore all debug uses when collecting trivially rematerializable defs. This fixes an issue with difference in codegen whe
[AMDGPU] Ignore debug use during PreRARematerialize stage in scheduling pass
Ignore all debug uses when collecting trivially rematerializable defs. This fixes an issue with difference in codegen when enabling debug info.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D123048
show more ...
|
| #
27e19315 |
| 17-Mar-2022 |
Vang Thao <[email protected]> |
[AMDGPU] Fix PreRARematerialize scheduler pass sinking subreg defs
When collecting trivially rematerializable defs, skip any subreg defs. We do not want to sink these.
Differential Revision: https:
[AMDGPU] Fix PreRARematerialize scheduler pass sinking subreg defs
When collecting trivially rematerializable defs, skip any subreg defs. We do not want to sink these.
Differential Revision: https://reviews.llvm.org/D121874
show more ...
|
| #
989f1c72 |
| 15-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
a278250b |
| 10-Mar-2022 |
Nico Weber <[email protected]> |
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https:/
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
show more ...
|
| #
7f230fee |
| 07-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
after: 1061034926 before: 1063332844
Differential Revision: https://reviews.llvm.org/D121169
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
28322c25 |
| 10-Feb-2022 |
Vang Thao <[email protected]> |
[AMDGPU] Add scheduler pass to rematerialize trivial defs
Add a new pass in the pre-ra AMDGPU scheduler to check if sinking trivially rematerializable defs that only has one use outside of the defin
[AMDGPU] Add scheduler pass to rematerialize trivial defs
Add a new pass in the pre-ra AMDGPU scheduler to check if sinking trivially rematerializable defs that only has one use outside of the defining block will increase occupancy. If we can determine that occupancy can be increased, then rematerialize only the minimum amount of defs required to increase occupancy. Also re-schedule all regions that had occupancy matching the previous min occupancy using the new occupancy.
This is based off of the discussion in https://reviews.llvm.org/D117562.
The logic to determine the defs we should collect and determining if sinking would be beneficial is mostly the same. Main differences is that we are no longer limiting it to immediate defs and the def and use does not have to be part of a loop.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D119475
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1 |
|
| #
57047119 |
| 04-Feb-2022 |
Vang Thao <[email protected]> |
[AMDGPU] Fix debug values in scheduler not placed correctly when reverting
Debug position data is cleared after ScheduleDAGMILive::schedule() due to it also calling placeDebugValues(). Make it so th
[AMDGPU] Fix debug values in scheduler not placed correctly when reverting
Debug position data is cleared after ScheduleDAGMILive::schedule() due to it also calling placeDebugValues(). Make it so the data is not cleared after initial call to placeDebugValues since we will call it again after reverting a schedule.
Secondly, since we skip debug instructions when reverting the schedule on AMDGPU, all debug instructions are now moved to the end of the scheduling region. RegionEnd points to the beginning of this chunk of debug instructions since it was not incremented when a debug instruction was skipped. RegionBegin may also point to the same debug instruction if Unsched.front() is a debug instruction thus shrinking the region to 1. Fix RegionBegin and RegionEnd so that they point to the current beginning and ending before calling placeDebugValues() since both vars will be used as reference points to move debug instructions back.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D119022
show more ...
|
| #
2ca194ff |
| 02-Feb-2022 |
Vang Thao <[email protected]> |
[AMDGPU] Fix scheduler live-ins with debug inst at start of block
GCNDownwardRPTracker RPTracker.reset() skips debug instructions for NextMI so RPTracker.getNext() will never give the beginning of a
[AMDGPU] Fix scheduler live-ins with debug inst at start of block
GCNDownwardRPTracker RPTracker.reset() skips debug instructions for NextMI so RPTracker.getNext() will never give the beginning of a sched region if it is a debug value. In this case we will never set the live-ins for that block.
Add check to see if getNext also equals the MI after skipping debug instructions.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D118853
show more ...
|
|
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
d1f45ed5 |
| 11-Nov-2021 |
Neubauer, Sebastian <[email protected]> |
[AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
|
| #
02e60f2e |
| 23-Oct-2021 |
Austin Kerbow <[email protected]> |
[AMDGPU] Use max waves for scheduler's initial occupancy target
The scheduler should set critical/excess register usage thresholds that are guided by the maximum possible occupancy for the function.
[AMDGPU] Use max waves for scheduler's initial occupancy target
The scheduler should set critical/excess register usage thresholds that are guided by the maximum possible occupancy for the function. This change is focused on setting proper lower bounds on register usage which we would typically only see when a specific number of maximum waves is requested with the "waves-per-eu" attribute, or by setting "amdgpu-num-vgpr|sgpr" directly. This was broken previously. I have a follow-on patch that will address issues with the scheduler not targeting correct upper bounds on register usage which is typical with launch bounds and min "waves-per-eu".
Changes by this patch:
Set the initial critical register usage thresholds to minimum values that are determined by the maximum possible occupancy for the function, or the number of allocatable registers, whichever is lower.
Avoid unisgned overflow if register limits are lower than the register tracking "ErrorMargin", I.e. when using stress-regalloc=2.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D112373
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3 |
|
| #
799c50fe |
| 25-Feb-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Avoid second rescheduling for some regions
If a region was not constrained by a high register pressure and was not rescheduled without clustering we can skip rescheduling it ClusteredLowOcc
[AMDGPU] Avoid second rescheduling for some regions
If a region was not constrained by a high register pressure and was not rescheduled without clustering we can skip rescheduling it ClusteredLowOccupancyReschedule stage.
This improves scheduling speed by 25% on some kernels.
Differential Revision: https://reviews.llvm.org/D97506
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc2 |
|
| #
635993f0 |
| 23-Feb-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Skip unclusterd rescheduling w/o ld/st
We are attempting rescheduling without load store clustering if occupancy limits were not met with clustering. Skip this for regions which do not have
[AMDGPU] Skip unclusterd rescheduling w/o ld/st
We are attempting rescheduling without load store clustering if occupancy limits were not met with clustering. Skip this for regions which do not have any loads or stores at all.
In a set of kernels I am experimenting with this improves scheduling time by ~30%.
Differential Revision: https://reviews.llvm.org/D97342
show more ...
|
| #
bb16efe2 |
| 22-Feb-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Move RPT::getLiveRegs() check under EXPENSIVE_CHECKS
This is too expensive even for debug builds. It doubles scheduling time if enabled.
Differential Revision: https://reviews.llvm.org/D97
[AMDGPU] Move RPT::getLiveRegs() check under EXPENSIVE_CHECKS
This is too expensive even for debug builds. It doubles scheduling time if enabled.
Differential Revision: https://reviews.llvm.org/D97232
show more ...
|
| #
a8d9d507 |
| 17-Feb-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
560d7e04 |
| 20-Jan-2021 |
dfukalov <[email protected]> |
[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D95036
|
|
Revision tags: llvmorg-11.1.0-rc1 |
|
| #
6a87e9b0 |
| 25-Dec-2020 |
dfukalov <[email protected]> |
[NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2 |
|
| #
04bd5b52 |
| 07-Aug-2020 |
Vang Thao <[email protected]> |
[AMDGPU] Fix not rescheduling without clustering
Regions are sometimes skipped which should be rescheduled without memory op clustering. RegionIdx is not incremented when iterating over regions that
[AMDGPU] Fix not rescheduling without clustering
Regions are sometimes skipped which should be rescheduled without memory op clustering. RegionIdx is not incremented when iterating over regions that are flagged to be skipped, causing the index to be incorrect.
Thanks to Vang Thao for discovering this bug!
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D85498
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
| #
43830790 |
| 07-Oct-2019 |
Jay Foad <[email protected]> |
[AMDGPU] Remove dubious logic in bidirectional list scheduler
Summary: pickNodeBidirectional tried to compare the best top candidate and the best bottom candidate by examining TopCand.Reason and Bot
[AMDGPU] Remove dubious logic in bidirectional list scheduler
Summary: pickNodeBidirectional tried to compare the best top candidate and the best bottom candidate by examining TopCand.Reason and BotCand.Reason. This is unsound because, after calling pickNodeFromQueue, Cand.Reason does not reflect the most important reason why Cand was chosen. Rather it reflects the most recent reason why it beat some other potential candidate, which could have been for some low priority tie breaker reason.
I have seen this cause problems where TopCand is a good candidate, but because TopCand.Reason is ORDER (which is very low priority) it is repeatedly ignored in favour of a mediocre BotCand. This is not how bidirectional scheduling is supposed to work.
To fix this I changed the code to always compare TopCand and BotCand directly, like the generic implementation of pickNodeBidirectional does. This removes some uncommented AMDGPU-specific logic; if this logic turns out to be important then perhaps it could be moved into an override of tryCandidate instead.
Graphics shader benchmarking on gfx10 shows a lot more positive than negative effects from this change.
Reviewers: arsenm, tstellar, rampitec, kzhuravl, vpykhtin, dstuttard, tpr, atrick, MatzeB
Subscribers: jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68338
show more ...
|
| #
dd476645 |
| 18-Feb-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Use generated RegisterPressureSets enum
Differential Revision: https://reviews.llvm.org/D74671
|
| #
53eb0f8c |
| 24-Jan-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Attempt to reschedule withou clustering
We want to have more load/store clustering but we also want to maintain low register pressure which are oposit targets. Allow scheduler to reschedule
[AMDGPU] Attempt to reschedule withou clustering
We want to have more load/store clustering but we also want to maintain low register pressure which are oposit targets. Allow scheduler to reschedule regions without mutations applied if we hit a register limit.
Differential Revision: https://reviews.llvm.org/D73386
show more ...
|
| #
4aa7fb77 |
| 03-Jan-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Revert scheduling to reduce spilling
We can revert region schedule if new schedule decreases occupancy. However, if we already have only one wave we would accept any new schedule even if it
[AMDGPU] Revert scheduling to reduce spilling
We can revert region schedule if new schedule decreases occupancy. However, if we already have only one wave we would accept any new schedule even if it blows up register pressure. Such schedule may result in quite heavy spilling which can be avoided if we reject this new schedule.
Differential Revision: https://reviews.llvm.org/D72181
show more ...
|