History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (Results 1 – 25 of 355)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 86bd7e20 04-Jul-2022 Thomas Symalla <[email protected]>

[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass.

This patch removes a bit of code duplication and
moves the v_cmpx optimization out of the
runOnMachineFunction pass.

Reviewed By: foad

Differe

[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass.

This patch removes a bit of code duplication and
moves the v_cmpx optimization out of the
runOnMachineFunction pass.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D129086

show more ...


Revision tags: llvmorg-14.0.6
# cb9ae937 10-Jun-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Define SGPR_NULL64 register. NFCI.

On gfx10+ null register can be used as both 32 and 64 bit operand.
Define a 64 bit version of the register to use during codegen.

Differential Revision:

[AMDGPU] Define SGPR_NULL64 register. NFCI.

On gfx10+ null register can be used as both 32 and 64 bit operand.
Define a 64 bit version of the register to use during codegen.

Differential Revision: https://reviews.llvm.org/D127527

show more ...


Revision tags: llvmorg-14.0.5
# dd7e407d 02-Jun-2022 Matt Arsenault <[email protected]>

AMDGPU: Move SpilledReg from MFI to SIRegisterInfo

This isn't the most natural place for it, but it avoids a circular
include dependency in an out of tree patch.


# 2d43955c 25-May-2022 Scott Linder <[email protected]>

[AMDGPU][NFC] Refactor AMDGPUCallingConv.td

Rename CalleeSavedRegs defs to avoid being overly specific:

* CSR_AMDGPU_AGPRs_32_255 => CSR_AMDGPU_AGPRs
* CSR_AMDGPU_SGPRs_30_31 + CSR_AMDGPU_SGPRs_32_

[AMDGPU][NFC] Refactor AMDGPUCallingConv.td

Rename CalleeSavedRegs defs to avoid being overly specific:

* CSR_AMDGPU_AGPRs_32_255 => CSR_AMDGPU_AGPRs
* CSR_AMDGPU_SGPRs_30_31 + CSR_AMDGPU_SGPRs_32_105 => CSR_AMDGPU_SGPRs
* CSR_AMDGPU_SI_Gfx_SGPRs_4_29 + CSR_AMDGPU_SI_Gfx_SGPRs_64_105 =>
CSR_AMDGPU_SI_Gfx_SGPRs
* CSR_AMDGPU_HighRegs => CSR_AMDGPU
* CSR_AMDGPU_HighRegs_With_AGPRs => CSR_AMDGPU_GFX90AInsts
* CSR_AMDGPU_SI_Gfx_With_AGPRs => CSR_AMDGPU_SI_Gfx_GFX90AInsts

Introduce a class RegMask to mark the cases where we use the
CalleeSavedRegs class purely as an expedient way to produce a mask.
Update the names of these masks to not mention "CSR". Other targets also
seem to do this, so a reasonable alternative is to actually update
table-gen to include a new class to do this explicitly, but the current
approach seems harmless so I opted to just make it more explicit.

Reviewed By: arsenm, sebastian-ne

Differential Revision: https://reviews.llvm.org/D109008

show more ...


Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# e0d585d7 16-Apr-2022 Matt Arsenault <[email protected]>

AMDGPU: Defer creation of WWM VGPR spill slots

There's no reason to create these immediately. They can be created in
the prolog/epilog code like CSR spills. There's probably a cleaner way
to do this

AMDGPU: Defer creation of WWM VGPR spill slots

There's no reason to create these immediately. They can be created in
the prolog/epilog code like CSR spills. There's probably a cleaner way
to do this by utilizing the CSR spill code.

This makes the frame index used transient state for
PrologEpilogInserter, and thus makes serialization easier. Really this
doesn't need to be saved here but there isn't really a better place
for it.

show more ...


# ea47373a 14-Apr-2022 hsmahesha <[email protected]>

[AMDGPU][NFC] Organize code around reserving VGPR32 for AGPR copy.

This is an NFC patch in preparation to fix a bug related to always
reserving VGPR32 for AGPR copy.

Reviewed By: rampitec

Differen

[AMDGPU][NFC] Organize code around reserving VGPR32 for AGPR copy.

This is an NFC patch in preparation to fix a bug related to always
reserving VGPR32 for AGPR copy.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D123651

show more ...


Revision tags: llvmorg-14.0.1
# 463bc93e 09-Apr-2022 Matt Arsenault <[email protected]>

AMDGPU/GlobalISel: Remove unused parameter


# f014303e 23-Mar-2022 hsmahesha <[email protected]>

[AMDGPU] [NFC]: Organize the code around reserving registers.

First, add code to reserve all required special purpose registers,
followed by code to reserve SGPRs, followed by code to reserve
VGPRs/

[AMDGPU] [NFC]: Organize the code around reserving registers.

First, add code to reserve all required special purpose registers,
followed by code to reserve SGPRs, followed by code to reserve
VGPRs/AGPRs.

This patch is prepared as a pre-requisite to fix an issue related to
GFX90A hardware.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122219

show more ...


# 7636c9a9 22-Mar-2022 alex-t <[email protected]>

[AMDGPU] use scalar shift for SALU users in frame index elimination

In the frame index lowering we have to insert shift and add
instructions to adjust stack object access. We need to take care of t

[AMDGPU] use scalar shift for SALU users in frame index elimination

In the frame index lowering we have to insert shift and add
instructions to adjust stack object access. We need to take care of the stack
object user kind and use scalar shift/add for scalar users.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D121524

show more ...


Revision tags: llvmorg-14.0.0
# 0a488cba 12-Mar-2022 alex-t <[email protected]>

[AMDGPU] use scalar shift for SALU users in frame index elimination

In the frame index lowering we have to insert shift and add
instructions to adjust stack object access. We need to take care of t

[AMDGPU] use scalar shift for SALU users in frame index elimination

In the frame index lowering we have to insert shift and add
instructions to adjust stack object access. We need to take care of the stack
object user kind and use scalar shift/add for scalar users.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D121524

show more ...


# 37b37838 16-Mar-2022 Shengchen Kan <[email protected]>

[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments


# 989f1c72 15-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681

show more ...


Revision tags: llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 36fe3f13 08-Mar-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] flat scratch SVS addressing mode for gfx940

Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254


# 72a9e5f8 11-Mar-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Restrict machine copy propagation from creating unaligned classes

Fixes: SWDEV-326366

Differential Revision: https://reviews.llvm.org/D121491


# a278250b 10-Mar-2022 Nico Weber <[email protected]>

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https:/

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169

show more ...


# 7f230fee 07-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

after: 1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169


Revision tags: llvmorg-14.0.0-rc2
# bf60a1c5 25-Feb-2022 Aakanksha <[email protected]>

Avoid comparisons between types of different widths in a loop condition to prevent the loop from behaving unexpectedly

This change fixes the code violations flagged in AMD compute CodeQL scan -
Quer

Avoid comparisons between types of different widths in a loop condition to prevent the loop from behaving unexpectedly

This change fixes the code violations flagged in AMD compute CodeQL scan -
Query Description: "Comparisons between types of different widths in a loop condition can cause the loop to behave unexpectedly."

Differential Revision: https://reviews.llvm.org/D120355

show more ...


# 3884cb92 14-Feb-2022 Matt Arsenault <[email protected]>

AMDGPU: Always reserve VGPR for AGPR copies on gfx908

Just because there aren't AGPRs in the original program doesn't mean
the register allocator can't choose to use them (unless we were to
forcibly

AMDGPU: Always reserve VGPR for AGPR copies on gfx908

Just because there aren't AGPRs in the original program doesn't mean
the register allocator can't choose to use them (unless we were to
forcibly reserve all AGPRs if there weren't any uses). This happens in
high pressure situations and introduces copies to avoid spills.

In this test, the allocator ends up introducing a copy from SGPR to
AGPR which requires an intermediate VGPR. I don't believe it would
introduce a copy from AGPR to AGPR in this situation, since it would
be trying to use an intermediate with a different class.

Theoretically this is also broken on gfx90a, but I have been unable to
come up with a testcase.

show more ...


# 898dc8a4 15-Feb-2022 Matt Arsenault <[email protected]>

AMDGPU: Use subtarget in class instead of querying function


Revision tags: llvmorg-14.0.0-rc1
# f2c99ea4 04-Feb-2022 Matt Arsenault <[email protected]>

AMDGPU: Use reserved VGPR for AGPR spills to memory

Previously would reuse the VGPR used for large frame offsets with the
one needed for copying from the AGPR. Fix this by reusing the register
we al

AMDGPU: Use reserved VGPR for AGPR spills to memory

Previously would reuse the VGPR used for large frame offsets with the
one needed for copying from the AGPR. Fix this by reusing the register
we already reserved for handling AGPR to AGPR copies.

show more ...


Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 8b2ca766 15-Dec-2021 Matt Arsenault <[email protected]>

AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908

We need to guarantee cheap copies between AGPRs, and unfortunately
gfx908 cannot directly do this. Theoretically we could set the
s

AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908

We need to guarantee cheap copies between AGPRs, and unfortunately
gfx908 cannot directly do this. Theoretically we could set the
scavenger up with an emergency spill slot, but it also feels
unreasonable to pay that cost for what was assumed to be a simple and
cheap copy. Pick a register that doesn't conflict with any ABI
registers.

This does not address the same issue when copying from SGPR to AGPR
for gfx90a (this coincidentally fixes it for gfx908), but that's less
interesting since the register allocator shouldn't be proactively
introducing such copies.

One edge case I'm worried about is respecting the VGPR budget implied
by amdgpu-waves-per-eu. If the theoretical upper bound of a function
is 32 VGPRs, this will force the actual count to be 33.

This is also broken if inline assembly uses/defs something in v32. The
coalescer will eliminate the intermediate vreg between the def and
use, and the introduced copy will clobber the user value.

(cherry picked from commit 3335784ac2d587ff4eac04586e189532ae8b2607)

show more ...


Revision tags: llvmorg-13.0.1-rc1
# 31973062 01-Nov-2021 Matt Arsenault <[email protected]>

AMDGPU: Fix clobbering SCC when expanding large offset spill pseudos

If we had a large offset which required materializing in a register,
we would emit an s_add_i32, clobbering SCC. Start checking i

AMDGPU: Fix clobbering SCC when expanding large offset spill pseudos

If we had a large offset which required materializing in a register,
we would emit an s_add_i32, clobbering SCC. Start checking if SCC is
live, and instead use a VGPR offset. For MUBUF, we switch to using
offen. We would do this anyway in a normal load/store with a frame
index, but not for spills.

The same problem still exists in other contexts where we expand frame
indices.

The nasty edge case is when SGPRs are spilled to memory at a large
frame offset where SCC is also clobbered. This requires a second
scavenging index, and also required several patches in the scavenger
to correctly handle multiple recursive scavenge indexes.

An even nastier edge case we still don't support is if we don't have
any free SGPRs. If SCC is live and we don't have any free SGPRs to
save exec, we have no way of flipping exec back and forth without also
clobbering SCC.

Fixes: SWDEV-309419

show more ...


# 245e25f9 02-Feb-2022 Matt Arsenault <[email protected]>

AMDGPU: Implement isAsmClobberable

Warn on inline assembly clobbering reserved registers. It should also
warn on at least some reserved register defs, but that isn't happening
right now. If you have

AMDGPU: Implement isAsmClobberable

Warn on inline assembly clobbering reserved registers. It should also
warn on at least some reserved register defs, but that isn't happening
right now. If you have a def and re-use of a register we reserve, the
register coalescer will eliminate the intermediate virtual
register. When the reserved reg def is introduced later by the
backend, it will end up clobbering the value the register coalescer
assumed was live through the range.

There is also isInlineAsmReadOnlyReg, although I don't understand what
the distinction really is. It's called in SelectionDAGBuilder, long
before the set of reserved registers is frozen so I'm not sure how
that can possibly work reliably.

Unfortunately this is also using the ugly tablegenerated names for the
registers.

show more ...


# cf58b9ce 09-Dec-2021 Christudasan Devadasan <[email protected]>

[AMDGPU] Add AV class spill pseudo instructions

While enabling vector superclasses with D109301,
the AV spills are converted into VGPR spills by
introducing appropriate copies. The whole thing
ended

[AMDGPU] Add AV class spill pseudo instructions

While enabling vector superclasses with D109301,
the AV spills are converted into VGPR spills by
introducing appropriate copies. The whole thing
ended up adding two instructions per spill (a copy
+ vgpr spill pseudo) and caused an incorrect
liverange update during inline spiller.

This patch adds the pseudo instructions for all
AV spills from 32b to 1024b and handles them in
the way all other spills are lowered.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D115439

show more ...


# 017ef785 08-Dec-2021 Matt Arsenault <[email protected]>

AMDGPU: Mark scc defs dead in SGPR to VMEM path for no free SGPRs

This introduces verifier errors into this broken situation which we do
not handle correctly, which is better than being silently
mis

AMDGPU: Mark scc defs dead in SGPR to VMEM path for no free SGPRs

This introduces verifier errors into this broken situation which we do
not handle correctly, which is better than being silently
miscompiled. For the emergency stack slot, the scavenger likes to move
the restore instruction as late as possible, which ends up separating
the SCC def from the conditional branch.

show more ...


12345678910>>...15