History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp (Results 1 – 25 of 255)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 3a205977 19-Jul-2022 Jon Chesterfield <[email protected]>

[amdgpu] Implement lds kernel id intrinsic

Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS varia

[amdgpu] Implement lds kernel id intrinsic

Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.

There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.

It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.

The intent is to emit a __const array of LDS addresses and index into it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D125060

show more ...


# 8a12f20e 13-Jul-2022 jeff <[email protected]>

[AMDGPU] Update the mechanism used to check for cycles and add eges in power-sched mutation


# 6817031d 06-Jul-2022 Austin Kerbow <[email protected]>

[AMDGPU] Disable FillMFMAShadowMutation by default

Disable amdgpu mfma power sched.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D129172


Revision tags: llvmorg-14.0.6
# f1255186 18-Jun-2022 Guillaume Chatelet <[email protected]>

[NFC][Alignment] Remove max functions between Align and MaybeAlign

`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are
not used often enough to be required. They also make the code

[NFC][Alignment] Remove max functions between Align and MaybeAlign

`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are
not used often enough to be required. They also make the code more opaque.

Differential Revision: https://reviews.llvm.org/D128121

show more ...


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4
# 3732cd59 16-May-2022 Joe Nash <[email protected]>

[AMDGPU] gfx11 vop3 and inherited vop instructions

This patch includes MC layer support for VOP3 encoded instructions and generic VOP support
classes.
Some VOP1 and VOP2 instructions which share an

[AMDGPU] gfx11 vop3 and inherited vop instructions

This patch includes MC layer support for VOP3 encoded instructions and generic VOP support
classes.
Some VOP1 and VOP2 instructions which share an encoding with gfx10 and are using
the AssemblerPredicate = isGFX10Plus are also enabled. That predicate
will be changed to isGFX10Only in a later patch.

Patch 15/N for upstreaming of AMDGPU gfx11 architecture.

Depends on D126468

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D126475

show more ...


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2
# 8a53b25e 12-Apr-2022 Jay Foad <[email protected]>

[AMDGPU] Use default member initializers in Subtarget classes

Use default member initializers in AMDGPUSubtarget and subclasses. This
is to guard against adding a new feature boolean in AMDGPUSubtar

[AMDGPU] Use default member initializers in Subtarget classes

Use default member initializers in AMDGPUSubtarget and subclasses. This
is to guard against adding a new feature boolean in AMDGPUSubtarget.h
but forgetting to initialize it to false in AMDGPUSubtarget.cpp.

This was mostly autogenerated by:
clang-tidy -checks=-*,cppcoreguidelines-prefer-member-initializer,modernize-use-default-member-init -header-filter=Subtarget -fix lib/Target/AMDGPU/*Subtarget.cpp

Differential Revision: https://reviews.llvm.org/D123613

show more ...


Revision tags: llvmorg-14.0.1
# 6733590d 07-Apr-2022 Changpeng Fang <[email protected]>

AMDGPU: Set implicit kernarg size to be of 256 bytes for code object version 5

Summary:
If implicitarg_ptr intrinsic is not used, set implicit kernarg size to 0, otherwise
set it to 256 bytes for

AMDGPU: Set implicit kernarg size to be of 256 bytes for code object version 5

Summary:
If implicitarg_ptr intrinsic is not used, set implicit kernarg size to 0, otherwise
set it to 256 bytes for code object version 5 (and beyond).

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D123262

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 0c0636f7 07-Mar-2022 Austin Kerbow <[email protected]>

[AMDGPU] Fix uninitialized value after 8d0c34fd4f


Revision tags: llvmorg-14.0.0-rc2
# 35ec58d8 01-Mar-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] gfx940 removes all image instructions

Differential Revision: https://reviews.llvm.org/D120763


# 2e2e64df 28-Feb-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Add gfx940 target

This is target definition only.

Differential Revision: https://reviews.llvm.org/D120688


# a5d4f82b 11-Feb-2022 Sebastian Neubauer <[email protected]>

[AMDGPU] Make enable-flat-scratch a subtarget feature

Use a subtarget feature instead of a command line argument to reduce
global state.
We want to enable flat scratch for graphics in some cases and

[AMDGPU] Make enable-flat-scratch a subtarget feature

Use a subtarget feature instead of a command line argument to reduce
global state.
We want to enable flat scratch for graphics in some cases and this
doesn't work well with command line options.

Differential Revision: https://reviews.llvm.org/D119425

show more ...


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# 4e077c0a 25-Jan-2022 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Remove feature register-banking

Since RegBankReassign pass was removed this feature is not
use for anything.

Differential Revision: https://reviews.llvm.org/D118195


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3
# 0b1140e8 14-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch

This was approximating the entry point logic for flat_scratch_init,
which is not really the point. We need to account for whether we need
to r

AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch

This was approximating the entry point logic for flat_scratch_init,
which is not really the point. We need to account for whether we need
to reserve the SGPR pair used for flat_scratch, not whether we needed
the initialization kernel argument. If this was an arbitrary function,
we would end up over-reporting the number of potentially free
SGPRs. The logic for architected flat scratch also only applies to the
initialization in the kernel, not the reserved registers at the end.

Avoids compile failures in a future patch from allocating more SGPRs
than the subtarget supports.

show more ...


Revision tags: llvmorg-13.0.1-rc2
# 4db74227 14-Dec-2021 Jay Foad <[email protected]>

[AMDGPU] Improve zeroesHigh16BitsOfDest for GFX9 legacy opcodes

Pseudos like V_MAD_U16 and V_FMA_F16 map down to what GFX9 calls
v_mad_legacy_u16 and v_fma_legacy_f16, which are documented to have t

[AMDGPU] Improve zeroesHigh16BitsOfDest for GFX9 legacy opcodes

Pseudos like V_MAD_U16 and V_FMA_F16 map down to what GFX9 calls
v_mad_legacy_u16 and v_fma_legacy_f16, which are documented to have the
same zeroing behaviour as on GFX8.

Differential Revision: https://reviews.llvm.org/D115729

show more ...


Revision tags: llvmorg-13.0.1-rc1
# 2959e082 26-Oct-2021 Matt Arsenault <[email protected]>

AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default

Previously we would require adding an attribute to kernels to enable
the inputs passed in the kernarg segment, accessed by
llvm

AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default

Previously we would require adding an attribute to kernels to enable
the inputs passed in the kernarg segment, accessed by
llvm.amdgcn.implicitarg.ptr. This violates the principle of being
correct by default. Some OpenMP testcases were broken recently since
it wasn't correctly setting this attribute, and no known frontends are
setting this to anything other than the maximum.

Most of the test changes are from load widening of argument loads
since there now more implied dereferenceable bytes.

show more ...


# ae0ba7de 25-Oct-2021 Matt Arsenault <[email protected]>

AMDGPU: Optimize out implicit kernarg argument allocation if unused

We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be
unused. Start using it to avoid allocating the implicit arg

AMDGPU: Optimize out implicit kernarg argument allocation if unused

We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be
unused. Start using it to avoid allocating the implicit arguments if
unneeded.

show more ...


# 90ff1487 28-Oct-2021 Matt Arsenault <[email protected]>

AMDGPU: Account for implicit argument alignment for kernarg segment

If a kernel had no formal arguments but did have the implicit
arguments, we were reporting a required kernarg alignment of 4. For

AMDGPU: Account for implicit argument alignment for kernarg segment

If a kernel had no formal arguments but did have the implicit
arguments, we were reporting a required kernarg alignment of 4. For
some reason we require an 8-byte alignment for this, even though
there's no real advantage and I don't see where this is documented in
the ABI.

The code object header code also claims the minimum alignment is 16,
which is what I thought you always got at runtime anyway so I don't
know why this matters.

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 8d4b74ac 11-Sep-2021 Matt Arsenault <[email protected]>

AMDGPU: Don't consider whether amdgpu-flat-work-group-size was set

It should be semantically identical if it was set to the same value as
the default. Also improve the documentation.


# 3f34f75a 22-Oct-2021 Jay Foad <[email protected]>

[AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32

As described in the comment, the way we change vcc to vcc_lo in these
operands confuses addPhysRegDataDeps into treating them as imp

[AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32

As described in the comment, the way we change vcc to vcc_lo in these
operands confuses addPhysRegDataDeps into treating them as implicit
pseudo operands. Fix this by setting the correct latency from the
SchedModel after addPhysRegDataDeps wrongly set it to 0.

Differential Revision: https://reviews.llvm.org/D112317

show more ...


# 6fe949c4 22-Oct-2021 Kazu Hirata <[email protected]>

[Target, Transforms] Use StringRef::contains (NFC)


# 082e22f3 24-Sep-2021 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch

With architected flat scratch it becomes readonly. We must always
reserve SGPR pair for it even if we do not use scratch at all

[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch

With architected flat scratch it becomes readonly. We must always
reserve SGPR pair for it even if we do not use scratch at all since
an attempt to write to SGPRs mapped to FLAT_SCRATCH results in
memory violation.

This is not needed since GFX10 with architected flat scratch though
since special SGPRs are not carving space from normal SGPRs.

Differential Revision: https://reviews.llvm.org/D110376

show more ...


# ec55dced 16-Sep-2021 Matt Arsenault <[email protected]>

AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query

Add an overload to pass the flat workgroup range in separately. This
will allow the attributor to use the assumed value for
amdgp

AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query

Add an overload to pass the flat workgroup range in separately. This
will allow the attributor to use the assumed value for
amdgpu-flat-workgroup-sizes when inferring amdgpu-waves-per-eu.

show more ...


# dc6e8dfd 20-Sep-2021 Jacob Lambert <[email protected]>

[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.


# 3ce1b963 08-Sep-2021 Joe Nash <[email protected]>

[AMDGPU] Switch PostRA sched to MachineSched

Use GCNHazardRecognizer in postra sched.
Updated tests for the new schedules.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D1095

[AMDGPU] Switch PostRA sched to MachineSched

Use GCNHazardRecognizer in postra sched.
Updated tests for the new schedules.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109536

Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde

show more ...


Revision tags: llvmorg-13.0.0-rc2
# 48958d02 23-Aug-2021 Daniil Fukalov <[email protected]>

[NFC][AMDGPU] Reduce includes dependencies.

1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `G

[NFC][AMDGPU] Reduce includes dependencies.

1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()`
and `R600TargetMachine::getSubtargetImpl()` had different return value type
than base class.
4. Minor forward declarations cleanup.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108596

show more ...


1234567891011