|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
8d0383eb |
| 24-Jun-2022 |
Matt Arsenault <[email protected]> |
CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is
CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.
show more ...
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
3eb2281b |
| 16-May-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Aggressively fold immediates in SIFoldOperands
Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by ad
[AMDGPU] Aggressively fold immediates in SIFoldOperands
Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions.
This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is: - It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.) - It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it. - The above benefits are expected to outweigh any increase in code size.
Differential Revision: https://reviews.llvm.org/D114643
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
18ba57a5 |
| 18-Nov-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Regenerate test checks in splitkit-getsubrangeformask.ll
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
ed0f4415 |
| 15-Jul-2021 |
alex-t <[email protected]> |
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode dive
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode divergence flag.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D106079
show more ...
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3 |
|
| #
40e3df2a |
| 17-Jun-2021 |
Ruiling Song <[email protected]> |
[RegisterCoalescer] Resolve conflict based on liveness of subregister
Currently we are resolving lane/subregister conflict by visiting instructions sequentially in current block to see whether there
[RegisterCoalescer] Resolve conflict based on liveness of subregister
Currently we are resolving lane/subregister conflict by visiting instructions sequentially in current block to see whether there is any use of the tainted lanes. To save compile time, we are not doing further check in successor blocks. This sounds reasonable without subgregister liveness.
But since we have added subregister liveness tracking capability to register coalescer, we can easily determine whether we have subregister liveness conflict by checking subranges. This would help coalescing more COPYs for target that enables subregister liveness tracking.
Reviewed by: arsenm, qcolombet
Differential Revision: https://reviews.llvm.org/D104509
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
fae05692 |
| 20-May-2021 |
Matt Arsenault <[email protected]> |
CodeGen: Print/parse LLTs in MachineMemOperands
This will currently accept the old number of bytes syntax, and convert it to a scalar. This should be removed in the near future (I think I converted
CodeGen: Print/parse LLTs in MachineMemOperands
This will currently accept the old number of bytes syntax, and convert it to a scalar. This should be removed in the near future (I think I converted all of the tests already, but likely missed a few).
Not sure what the exact syntax and policy should be. We can continue printing the number of bytes for non-generic instructions to avoid test churn and only allow non-scalar types for generic instructions.
This will currently print the LLT in parentheses, but accept parsing the existing integers and implicitly converting to scalar. The parentheses are a bit ugly, but the parser logic seems unable to deal without either parentheses or some keyword to indicate the start of a type.
show more ...
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
| #
3bffb1cd |
| 09-Feb-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amoun
[AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway.
Additional advantage that parser will accept these flags in any order unlike now.
Differential Revision: https://reviews.llvm.org/D96469
show more ...
|
| #
78b8ce40 |
| 13-Mar-2021 |
Roman Lebedev <[email protected]> |
Reland [SCEV] Improve modelling for (null) pointer constants
This reverts commit 329aeb5db43f5e69df038fb20d2def77fe6f8595, and relands commit 61f006ac655431bd44b9e089f74c73bec0c1a48c.
This is a con
Reland [SCEV] Improve modelling for (null) pointer constants
This reverts commit 329aeb5db43f5e69df038fb20d2def77fe6f8595, and relands commit 61f006ac655431bd44b9e089f74c73bec0c1a48c.
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`, we can try to improve SCEV's pointer handling. In particular, i believe, i will need this in the future to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull` from `ScalarEvolution::createSCEV()`, and add constant folding into `ScalarEvolution::getPtrToIntExpr()`. This way, `null` constants stay as such in SCEV's, but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147
show more ...
|
| #
329aeb5d |
| 13-Mar-2021 |
Roman Lebedev <[email protected]> |
Temporairly evert "[SCEV] Improve modelling for (null) pointer constants"
This appears to have broken ubsan bot: https://lab.llvm.org/buildbot/#/builders/85/builds/3062 https://reviews.llvm.org/D981
Temporairly evert "[SCEV] Improve modelling for (null) pointer constants"
This appears to have broken ubsan bot: https://lab.llvm.org/buildbot/#/builders/85/builds/3062 https://reviews.llvm.org/D98147#2623549
It looks like LSR needs some kind of a change around insertion point handling. Reverting until i have a fix.
This reverts commit 61f006ac655431bd44b9e089f74c73bec0c1a48c.
show more ...
|
| #
61f006ac |
| 12-Mar-2021 |
Roman Lebedev <[email protected]> |
[SCEV] Improve modelling for (null) pointer constants
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`, we can try to improve SCEV's pointer handling. In
[SCEV] Improve modelling for (null) pointer constants
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`, we can try to improve SCEV's pointer handling. In particular, i believe, i will need this in the future to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull` from `ScalarEvolution::createSCEV()`, and add constant folding into `ScalarEvolution::getPtrToIntExpr()`. This way, `null` constants stay as such in SCEV's, but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147
show more ...
|
| #
81b2c23b |
| 12-Feb-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Use kill instruction to hint soft clause live ranges
Previously we would use a bundle to hint the register allocator to not overwrite the pointers in a sequence of loads to avoid breaking so
AMDGPU: Use kill instruction to hint soft clause live ranges
Previously we would use a bundle to hint the register allocator to not overwrite the pointers in a sequence of loads to avoid breaking soft clauses. This bundling was based on a fuzzy register pressure heuristic, so we could not guarantee using more registers than are really available. This would result in register allocator failing on unsatisfiable bundles. Use a kill to artificially extend the live ranges, so we can always succeed at register allocation even if it means extra spills in the worst case.
This seems to capture most of the benefit of the bundle while avoiding most of the risk presented by the bundle. However the lit tests do show a handful of regressions. In some cases with sequences of volatile loads, unused load components end up getting reallocated to the next load which forces a wait between. There are also a few small scheduling regressions where a hazard used to be avoided, and one spill torture test which for some reason nearly doubles the stack usage. There is also a bit of noise from leftover kills (it may make sense for post-RA pseudos to strip all of these out).
show more ...
|
| #
a8d9d507 |
| 17-Feb-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
e3c6fa36 |
| 14-Jan-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Restrict soft clause bundling at half of the available regs
Fixes a testcase that was overcommitting large register tuples to a bundle, which the register allocator could not possibly satisf
AMDGPU: Restrict soft clause bundling at half of the available regs
Fixes a testcase that was overcommitting large register tuples to a bundle, which the register allocator could not possibly satisfy. This was producing a bundle which used nearly all of the available SGPRs with a series of 16-dword loads (not all of which are freely available to use).
This is a quick hack for some deeper issues with how the clause bundler tracks register pressure.
Overall the pressure tracking used here doesn't make sense and is too imprecise for what it needs to avoid the allocator failing. The pressure estimate does not account for the alignment requirements of large SGPR tuples, so this was really underestimating the pressure impact. This also ignores the impact of the extended live range of the use registers after the bundle is introduced. Additionally, it didn't account for some wide tuples not being available due to reserved registers.
This regresses a few cases. These end up introducing more spilling. This is also a function of the global pressure being used in the decision to bundle, not the local pressure impact of the bundle itself.
show more ...
|
|
Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
2291bd13 |
| 30-Nov-2020 |
Austin Kerbow <[email protected]> |
[AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able to differentiate between different scenarios for these dynamic
[AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able to differentiate between different scenarios for these dynamic subtarget features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC. - Any: Preference is unspecified. Use conservative settings that can run anywhere. - Off: Request support for XNACK/SRAMECC Off - On: Request support for XNACK/SRAMECC On
GCNSubtarget will track the four options based on the following criteria. If the subtarget does not support XNACK/SRAMECC we say the setting is "Unsupported". If no subtarget features for XNACK/SRAMECC are requested we must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting for XNACK or SRAMECC is explicitly requested, defaults will be used which generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882
show more ...
|
| #
8214982b |
| 21-Jan-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names.
Relands ba7dc
[AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names.
Relands ba7dcd8542ab, which had memory leaks.
Differential Revision: https://reviews.llvm.org/D95215
show more ...
|
| #
4dbdff66 |
| 21-Jan-2021 |
Sebastian Neubauer <[email protected]> |
Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue"
This reverts commit ba7dcd8542abfc784255efcb0767701dec42fe83.
(caused memory leaks)
|
| #
ba7dcd85 |
| 15-Jan-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names.
Differential
[AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names.
Differential Revision: https://reviews.llvm.org/D94768
show more ...
|
| #
d28624a2 |
| 01-Dec-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Stop adding an implicit def of vcc_hi for wave32
This doesn't seem to be needed for anything.
Differential Revision: https://reviews.llvm.org/D92400
|
|
Revision tags: llvmorg-11.0.1-rc1 |
|
| #
5ab17021 |
| 10-Nov-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Remove scratch rsrc from spill pseudos
Differential Revision: https://reviews.llvm.org/D91110
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
| #
b34ddfcc |
| 21-Sep-2020 |
Jay Foad <[email protected]> |
[SplitKit] In addDeadDef tolerate parent range that defines more lanes
Following on from D87757 "[SplitKit] Only copy live lanes", in SplitEditor::addDeadDef, when we're checking whether the parent
[SplitKit] In addDeadDef tolerate parent range that defines more lanes
Following on from D87757 "[SplitKit] Only copy live lanes", in SplitEditor::addDeadDef, when we're checking whether the parent live interval has a subrange defining the same lanes, tolerate the case where the parent subrange defines a superset of the lanes. This can happen when the child subrange comes from SplitEditor::buildCopy decomposing a partial copy into a sequence of subreg copies that cover the required lanes.
Differential Revision: https://reviews.llvm.org/D88020
show more ...
|