|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
da067ed5 |
| 10-Nov-2021 |
Austin Kerbow <[email protected]> |
[AMDGPU] Set most sched model resource's BufferSize to one
Using a BufferSize of one for memory ProcResources will result in better ILP since it more accurately models the dependencies between memor
[AMDGPU] Set most sched model resource's BufferSize to one
Using a BufferSize of one for memory ProcResources will result in better ILP since it more accurately models the dependencies between memory ops and their consumers on an in-order processor. After this change, the scheduler will treat the data edges from loads as blocking so that stalls are guaranteed when waiting for data to be retreaved from memory. Since we don't actually track waitcnt here, this should do a better job at modeling their behavior.
Practically, this means that the scheduler will trigger the 'STALL' heuristic more often.
This type of change needs to be evaluated experimentally. Preliminary results are positive.
Fixes: SWDEV-282962
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114777
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
3ce1b963 |
| 08-Sep-2021 |
Joe Nash <[email protected]> |
[AMDGPU] Switch PostRA sched to MachineSched
Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D1095
[AMDGPU] Switch PostRA sched to MachineSched
Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D109536
Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
ed0f4415 |
| 15-Jul-2021 |
alex-t <[email protected]> |
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode dive
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode divergence flag.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D106079
show more ...
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
| #
2f499b9a |
| 19-Dec-2020 |
Tony <[email protected]> |
[AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant cache
[AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant caches will be bypassed.
A volatile atomic is not changed and still only bypasses caches upto the level specified by the SyncScope operand.
Differential Revision: https://reviews.llvm.org/D94214
show more ...
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
| #
9bb2b4f0 |
| 29-Oct-2020 |
Christudasan Devadasan <[email protected]> |
[AMDGPU] Add alignment check for v3 to v4 load type promotion
It should be enabled only when the load alignment is at least 8-byte.
Fixes: SWDEV-256824
Reviewed By: foad
Differential Revision: ht
[AMDGPU] Add alignment check for v3 to v4 load type promotion
It should be enabled only when the load alignment is at least 8-byte.
Fixes: SWDEV-256824
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D90404
show more ...
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init |
|
| #
62fd7f76 |
| 07-Jan-2020 |
Jay Foad <[email protected]> |
[MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers the one with lesser depth, but only if that depth is grea
[MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers the one with lesser depth, but only if that depth is greater than the total latency of the instructions we've already scheduled -- otherwise its latency would be hidden and there would be no stall.
Unfortunately it only tests the depth of one of the candidates. This can lead to situations where the TopDepthReduce heuristic does not kick in, but a lower priority heuristic chooses the other candidate, whose depth *is* greater than the already scheduled latency, which causes a stall.
The fix is to apply the heuristic if the depth of *either* candidate is greater than the already scheduled latency.
All this also applies to the BotHeightReduce heuristic in the bottom zone.
Differential Revision: https://reviews.llvm.org/D72392
show more ...
|
| #
778351df |
| 24-Jun-2020 |
Matt Arsenault <[email protected]> |
Revert "[AMDGPU] Enable compare operations to be selected by divergence"
This reverts commit 521ac0b5cea02f629d035f807460affbb65ae7ad.
Reported to break thousands of piglit tests.
|
| #
521ac0b5 |
| 19-Jun-2020 |
alex-t <[email protected]> |
[AMDGPU] Enable compare operations to be selected by divergence
Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent.
Reviewers: rampitec, arsenm
[AMDGPU] Enable compare operations to be selected by divergence
Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82194
show more ...
|
| #
7e626842 |
| 08-Apr-2020 |
Simon Pilgrim <[email protected]> |
[AMDGPU] Regenerate vector-extract-insert test checks to fix issue reported on D77354
|
|
Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3 |
|
| #
054f8101 |
| 19-Nov-2018 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Convert insert_vector_elt into set of selects
This allows to avoid scratch use or indirect VGPR addressing for small vectors.
Differential Revision: https://reviews.llvm.org/D54606
llvm-s
[AMDGPU] Convert insert_vector_elt into set of selects
This allows to avoid scratch use or indirect VGPR addressing for small vectors.
Differential Revision: https://reviews.llvm.org/D54606
llvm-svn: 347231
show more ...
|
| #
bcb34ac2 |
| 13-Nov-2018 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] combine extractelement into several selects
An extractelement with non-constant index will be lowered either to scratch or movrel loop in most cases. This patch converts such instruction in
[AMDGPU] combine extractelement into several selects
An extractelement with non-constant index will be lowered either to scratch or movrel loop in most cases. This patch converts such instruction into a set of selects if vector size is not too big.
Differential Revision: https://reviews.llvm.org/D54351
llvm-svn: 346800
show more ...
|
|
Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2 |
|
| #
8728c5f2 |
| 07-Aug-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Cleanup subtarget features
Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are in
AMDGPU: Cleanup subtarget features
Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are incompatible between feature strings.
Most of the test changes are due to random scheduling changes from not having a default fullspeed model.
llvm-svn: 310258
show more ...
|
|
Revision tags: llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1 |
|
| #
3dbeefa9 |
| 21-Mar-2017 |
Matt Arsenault <[email protected]> |
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default ca
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel).
llvm-svn: 298444
show more ...
|
|
Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1, llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1, llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1 |
|
| #
3fb8f9ea |
| 08-Jul-2016 |
Matt Arsenault <[email protected]> |
Reapply r274829 with fix for FP vectors
llvm-svn: 274937
|
| #
2d793895 |
| 05-Jul-2016 |
Matt Arsenault <[email protected]> |
DAGCombiner: Fold away vector extract of insert with the same index
This only really matters when the index is non-constant since the constant case already gets taken care of by other combines.
llv
DAGCombiner: Fold away vector extract of insert with the same index
This only really matters when the index is non-constant since the constant case already gets taken care of by other combines.
llvm-svn: 274569
show more ...
|