|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
5cae8816 |
| 06-Jul-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Add GFX11 test coverage
Add GFX11 test coverage to a bunch of tests where it was easy to do so, mostly because the checks are autogenerated and/or GFX11 can share the same checks as GFX10.
[AMDGPU] Add GFX11 test coverage
Add GFX11 test coverage to a bunch of tests where it was easy to do so, mostly because the checks are autogenerated and/or GFX11 can share the same checks as GFX10.
Differential Revision: https://reviews.llvm.org/D129295
show more ...
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
77851cc1 |
| 15-Jun-2022 |
David Stuttard <[email protected]> |
[AMDGPU] Change use null for dead sdst to be gfx1030+
Pre gfx1030 null for sdst is different. c97436f8b6e2 [AMDGPU] Use null for dead sdst operand - requires a change to make it not apply to pre gfx
[AMDGPU] Change use null for dead sdst to be gfx1030+
Pre gfx1030 null for sdst is different. c97436f8b6e2 [AMDGPU] Use null for dead sdst operand - requires a change to make it not apply to pre gfx1030
Differential Revision: https://reviews.llvm.org/D127869
show more ...
|
| #
c97436f8 |
| 10-Jun-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Use null for dead sdst operand
Differential Revision: https://reviews.llvm.org/D127542
|
|
Revision tags: llvmorg-14.0.5 |
|
| #
23db8e4b |
| 06-Jun-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Use v_mad_u64_u32 for IMAD32
Nic Curtis done the experiments to prove it is faster than a separate mul and add.
Fixes: SWDEV-332806
Differential Revision: https://reviews.llvm.org/D127253
|
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
a5d4f82b |
| 11-Feb-2022 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Make enable-flat-scratch a subtarget feature
Use a subtarget feature instead of a command line argument to reduce global state. We want to enable flat scratch for graphics in some cases and
[AMDGPU] Make enable-flat-scratch a subtarget feature
Use a subtarget feature instead of a command line argument to reduce global state. We want to enable flat scratch for graphics in some cases and this doesn't work well with command line options.
Differential Revision: https://reviews.llvm.org/D119425
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
10ed1eca |
| 18-Jan-2022 |
Vang Thao <[email protected]> |
[MachineSink] Allow sinking of constant or ignorable physreg uses
For AMDGPU, any use of the physical register EXEC prevents sinking even if it is not a real physical register read. Add check to see
[MachineSink] Allow sinking of constant or ignorable physreg uses
For AMDGPU, any use of the physical register EXEC prevents sinking even if it is not a real physical register read. Add check to see if a physical register use can be ignored for sinking.
Also perform same constant and ignorable physical register check when considering sinking in loops.
https://reviews.llvm.org/D116053
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
729bf9b2 |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on
AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway.
Also have the option stop implying all inputs need to be passed. This will no rely on the amdgpu-no-* attributes to avoid passing unnecessary values.
show more ...
|
| #
18f93512 |
| 19-Nov-2021 |
RamNalamothu <[email protected]> |
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slow
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slows down the loader, and is only used to simplify disassembly.
Use '--symbolize-operands' with llvm-objdump to improve readability of the branch target operands in disassembly.
Fixes: SWDEV-312223
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D114273
show more ...
|
| #
722b8e0e |
| 14-Aug-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Invert ABI attribute handling
Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary.
AMDGPU: Invert ABI attribute handling
Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary. Requiring attributes for correctness is pretty ugly, and it makes supporting indirect and external calls more complicated.
This inverts the direction of the attributes, so an undecorated function is assumed to need all implicit imputs. This enables AMDGPUAttributor by default to mark when functions are proven to not need a given input. This strips the equivalent functionality from the legacy AMDGPUAnnotateKernelFeatures pass.
However, AMDGPUAnnotateKernelFeatures is not fully removed at this point although it should be in the future. It is still necessary for the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which would be better served by a trivial analysis on the IR during selection. Additionally, AMDGPUAnnotateKernelFeatures still redundantly handles the uniform-work-group-size attribute to be removed in a future commit.
At this point when not using -amdgpu-fixed-function-abi, we are still modifying the ABI based on these newly negated attributes. In the future, this option will be removed and the locations for implicit inputs will always be fixed. We will then use the new attributes to avoid passing the values when unnecessary.
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3 |
|
| #
ed745839 |
| 03-Mar-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Don't check for VMEM hazards on GFX10
The hazard where a VMEM reads an SGPR written by a VALU counts as a data dependency hazard, so no nops are required on GFX10. Tested with Vulkan CTS on
[AMDGPU] Don't check for VMEM hazards on GFX10
The hazard where a VMEM reads an SGPR written by a VALU counts as a data dependency hazard, so no nops are required on GFX10. Tested with Vulkan CTS on GFX10.1 and GFX10.3.
Differential Revision: https://reviews.llvm.org/D97926
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
ff8a1cae |
| 15-Jan-2021 |
Christudasan Devadasan <[email protected]> |
[AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.
During instruction selection, there is an inconsistency in choosing the initial soffset value. With certain early passes, this val
[AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.
During instruction selection, there is an inconsistency in choosing the initial soffset value. With certain early passes, this value is getting modified and that brought additional fixup during eliminateFrameIndex to work for all cases. This whole transformation looks trivial and can be handled better.
This patch clearly defines the initial value for soffset and keeps it unchanged before eliminateFrameIndex. The initial value must be zero for MUBUF with a frame index. The non-frame index MUBUF forms that use a raw offset from SP will have the stack register for soffset. During frame elimination, the soffset remains zero for entry functions with zero dynamic allocas and no callsites, or else is updated to the appropriate frame/stack register.
Also, did some code clean up and made all asserts around soffset stricter to match.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D95071
show more ...
|
|
Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
d28624a2 |
| 01-Dec-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Stop adding an implicit def of vcc_hi for wave32
This doesn't seem to be needed for anything.
Differential Revision: https://reviews.llvm.org/D92400
|
|
Revision tags: llvmorg-11.0.1-rc1 |
|
| #
cf6565f6 |
| 12-Nov-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
|
| #
d5a46586 |
| 06-Nov-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Omit buffer resource with flat scratch.
Differential Revision: https://reviews.llvm.org/D90979
|
| #
038d884a |
| 21-Oct-2020 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled
[AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access.
At the very least missing:
- GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address;
Differential Revision: https://reviews.llvm.org/D89170
show more ...
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4 |
|
| #
a343b9b0 |
| 23-Sep-2020 |
Sebastian Neubauer <[email protected]> |
Revert "[AMDGPU] Insert waitcnt after returning from call"
This reverts commit ca907bfb57d8ad3ec3bcc2cff2abab7b1b933af6.
According to michel.daenzer, > This completely broke the Mesa radeonsi drive
Revert "[AMDGPU] Insert waitcnt after returning from call"
This reverts commit ca907bfb57d8ad3ec3bcc2cff2abab7b1b933af6.
According to michel.daenzer, > This completely broke the Mesa radeonsi driver on Navi 14. Xorg + > xterm come up with major corruption & psychedelic colours.
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc3 |
|
| #
ca907bfb |
| 04-Sep-2020 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Insert waitcnt after returning from call
When memory operations are outstanding on function calls, either the caller or the callee can insert a waitcnt to ensure that all reads are finished
[AMDGPU] Insert waitcnt after returning from call
When memory operations are outstanding on function calls, either the caller or the callee can insert a waitcnt to ensure that all reads are finished. Calls need some time to be executed, so if the callee inserts the waitcnt, filling the instruction buffer and waiting for memory will be interleaved, hiding some latency. This comes at the cost of having a waitcnt inside functions that may not be needed as no memory operations are outstanding.
For function calls, this is already implemented. The same principal applies to returns: If the caller inserts a waitcnt after the call, the callee does not have to wait and the return and memory operation can be run in parallel.
This commit implements waiting in the caller after returning from a function call.
Differential Revision: https://reviews.llvm.org/D87674
show more ...
|
| #
36724895 |
| 10-Sep-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Clear offset register when using local stack area
eliminateFrameIndex won't fix up the offset register when the direct frame index reference is moved to a separate move instruction. Switch t
AMDGPU: Clear offset register when using local stack area
eliminateFrameIndex won't fix up the offset register when the direct frame index reference is moved to a separate move instruction. Switch the offset to a base 0 (which it probably should be to begin with).
show more ...
|
| #
4bdab2e8 |
| 01-Sep-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Fix offset for REL32_HI relocs
The addend in a REL32 reloc needs to be adjusted to account for the offset from the PC value returned by the s_getpc instruction to the point where the reloc
[AMDGPU] Fix offset for REL32_HI relocs
The addend in a REL32 reloc needs to be adjusted to account for the offset from the PC value returned by the s_getpc instruction to the point where the reloc is applied. This was being done correctly for (GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a difference if the target symbol happens to get loaded almost exactly a multiple of 4G away from the relocated instructions.
Differential Revision: https://reviews.llvm.org/D86938
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4 |
|
| #
590964c8 |
| 11-Mar-2020 |
Jay Foad <[email protected]> |
[AMDGPU] More accurate gfx10 latencies
Differential Revision: https://reviews.llvm.org/D81012
|
| #
c9c930ae |
| 06-May-2020 |
Eli Friedman <[email protected]> |
[SelectionDAG] Don't promote the alignment of allocas beyond the stack alignment.
allocas in LLVM IR have a specified alignment. When that alignment is specified, the alloca has at least that alignm
[SelectionDAG] Don't promote the alignment of allocas beyond the stack alignment.
allocas in LLVM IR have a specified alignment. When that alignment is specified, the alloca has at least that alignment at runtime.
If the specified type of the alloca has a higher preferred alignment, SelectionDAG currently ignores that specified alignment, and increases the alignment. It does this even if it would trigger stack realignment. I don't think this makes sense, so this patch changes that.
I was looking into this for SVE in particular: for SVE, overaligning vscale'ed types is extra expensive because it requires realigning the stack multiple times, or using dynamic allocation. (This currently isn't implemented.)
I updated the expected assembly for a couple tests; in particular, for arg-copy-elide.ll, the optimization in question does not increase the alignment the way SelectionDAG normally would. For the rest, I just increased the specified alignment on the allocas to match what SelectionDAG was inferring.
Differential Revision: https://reviews.llvm.org/D79532
show more ...
|
| #
375cec4b |
| 27-Mar-2020 |
Christudasan Devadasan <[email protected]> |
[AMDGPU] Introduce more scratch registers in the ABI.
The AMDGPU target has a convention that defined all VGPRs (execept the initial 32 argument registers) as callee-saved. This convention is not ef
[AMDGPU] Introduce more scratch registers in the ABI.
The AMDGPU target has a convention that defined all VGPRs (execept the initial 32 argument registers) as callee-saved. This convention is not efficient always, esp. when the callee requiring more registers, ended up emitting a large number of spills, even though its caller requires only a few.
This patch revises the ABI by introducing more scratch registers that a callee can freely use. The 256 vgpr registers now become: 32 argument registers 112 scratch registers and 112 callee saved registers. The scratch registers and the CSRs are intermixed at regular intervals (a split boundary of 8) to obtain a better occupancy.
Reviewers: arsenm, t-tye, rampitec, b-sumner, mjbedy, tpr
Reviewed By: arsenm, t-tye
Differential Revision: https://reviews.llvm.org/D76356
show more ...
|
|
Revision tags: llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
| #
60b1967c |
| 21-Jan-2020 |
Scott Linder <[email protected]> |
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us t
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
show more ...
|
| #
e53a9d96 |
| 22-Jan-2020 |
cdevadas <[email protected]> |
Resubmit: [AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to h
Resubmit: [AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary.
This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass.
Differential revision: https://reviews.llvm.org/D68092
show more ...
|
| #
a80291ce |
| 21-Jan-2020 |
Nicolai Hähnle <[email protected]> |
Revert "[AMDGPU] Invert the handling of skip insertion."
This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.
The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Me
Revert "[AMDGPU] Invert the handling of skip insertion."
This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.
The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa.
show more ...
|