History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (Results 1 – 25 of 277)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6
# d96361d7 17-Jun-2022 Abinav Puthan Purayil <[email protected]>

[AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map

This change introduces the dynamic stack boolean field to code-object-v3
and above under the code prope

[AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map

This change introduces the dynamic stack boolean field to code-object-v3
and above under the code properties of the kernel descriptor and under
the kernel metadata map of NT_AMDGPU_METADATA. This field corresponds to
the is_dynamic_callstack field of amd_kernel_code_t.

Differential Revision: https://reviews.llvm.org/D128344

show more ...


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4
# 67357739 11-Mar-2022 Vang Thao <[email protected]>

[AMDGPU] Add remarks to output some resource usage

Add analyis remarks to output kernel name, register usage, occupancy,
scratch usage, spills, and LDS information.

Reviewed By: arsenm

Differentia

[AMDGPU] Add remarks to output some resource usage

Add analyis remarks to output kernel name, register usage, occupancy,
scratch usage, spills, and LDS information.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D123878

show more ...


Revision tags: llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init
# 0bdaef38 24-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU: Add gfx11 feature to force initializing 16 input SGPRs

The total user+system SGPR count needs to be padded out to 16 if fewer
inputs are enabled.


# 929a8ad2 06-Apr-2022 Jay Foad <[email protected]>

[AMDGPU] Update SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE for GFX11

The granularity of SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE changed
in GFX11. It is now in units of 256 dwords instead of 128 dwords.

[AMDGPU] Update SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE for GFX11

The granularity of SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE changed
in GFX11. It is now in units of 256 dwords instead of 128 dwords.

COMPUTE_PGM_RSRC2.LDS_SIZE is unaffected. It is still in units of
128 dwords.

Differential Revision: https://reviews.llvm.org/D128179

show more ...


# 129b531c 19-Jun-2022 Kazu Hirata <[email protected]>

[llvm] Use value_or instead of getValueOr (NFC)


# adf4142f 11-Jun-2022 Fangrui Song <[email protected]>

[MC] De-capitalize SwitchSection. NFC

Add SwitchSection to return switchSection. The API will be removed soon.


# ff85d61a 05-Apr-2022 Jay Foad <[email protected]>

Update *_TMPRING_SIZE.WAVESIZE for GFX11

The encoding of COMPUTE_TMPRING_SIZE.WAVESIZE and
SPI_TMPRING_SIZE.WAVESIZE has changed in GFX11: it is now in units
of 64 dwords instead of 256 dwords, and

Update *_TMPRING_SIZE.WAVESIZE for GFX11

The encoding of COMPUTE_TMPRING_SIZE.WAVESIZE and
SPI_TMPRING_SIZE.WAVESIZE has changed in GFX11: it is now in units
of 64 dwords instead of 256 dwords, and the field has been widened
from 13 bits to 15 bits.

Depends on D126989

Reviewed By: rampitec, arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D127248

show more ...


# 15d82c62 07-Jun-2022 Fangrui Song <[email protected]>

[MC] De-capitalize MCStreamer functions

Follow-up to c031378ce01b8485ba0ef486654bc9393c4ac024 .
The class is mostly consistent now.


# bc7902f1 16-Apr-2022 Matt Arsenault <[email protected]>

AMDGPU: Remove unused MachineFunctionInfo fields

These were leftovers from a half-implement spill to LDS attempt.


# 989f1c72 15-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681

show more ...


# a278250b 10-Mar-2022 Nico Weber <[email protected]>

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https:/

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169

show more ...


# 7f230fee 07-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

after: 1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169


# 0f20a35b 09-Mar-2022 Changpeng Fang <[email protected]>

AMDGPU: Set up User SGPRs for queue_ptr only when necessary

Summary:
In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In curr

AMDGPU: Set up User SGPRs for queue_ptr only when necessary

Summary:
In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In current implementation,
user SGPRs are set up unnecessarily for some cases. If the target has aperture
registers, queue_ptr is not needed to reference aperture bases. For trap
handling, if target suppots getDoorbellID, queue_ptr is also not necessary.
Futher, code object version 5 introduces new kernel ABI which passes queue_ptr
as an implicit kernel argument, so user SGPRs are no longer necessary for
queue_ptr. Based on the trap handling document:
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table,
llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap
in the backend.

Reviewers: sameerds, arsenm

Fixes: SWDEV-307189

Differential Revision: https://reviews.llvm.org/D119762

show more ...


# 74702444 02-Feb-2022 Jacob Lambert <[email protected]>

[AMDGPU] Add agpr_count to metadata and AsmParser

gfx90a allows the number of ACC registers (AGPRs) to be set
independently to the VGPR registers. For both HSA and PAL metadata, we
now include an "a

[AMDGPU] Add agpr_count to metadata and AsmParser

gfx90a allows the number of ACC registers (AGPRs) to be set
independently to the VGPR registers. For both HSA and PAL metadata, we
now include an "agpr_count" key to report the number of AGPRs set for
supported devices (gfx90a, gfx908, as determined by hasMAIInsts()).
This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL.
The AsmParser also now recognizes ".kernel.agpr_count" for supported
devices.

Differential Revision: https://reviews.llvm.org/D116140

show more ...


# ef736a1c 08-Feb-2022 serge-sans-paille <[email protected]>

Cleanup LLVMMC headers

There's a few relevant forward declarations in there that may require downstream
adding explicit includes:

llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llv

Cleanup LLVMMC headers

There's a few relevant forward declarations in there that may require downstream
adding explicit includes:

llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h

Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after: 1049293745

Which is significant and backs up the change in addition to the usual benefits of
decreasing coupling between headers and compilation units.

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244

show more ...


# 4a025622 04-Feb-2022 Sebastian Neubauer <[email protected]>

[AMDGPU] Lazily init pal metadata on first function

Delay reading global metadata until the first function or the end of
the file is emitted. That way, earlier module passes can set metadata
that is

[AMDGPU] Lazily init pal metadata on first function

Delay reading global metadata until the first function or the end of
the file is emitted. That way, earlier module passes can set metadata
that is emitted in the ELF.

`emitStartOfAsmFile` gets called when the passes are initialized,
which prevented earlier passes from changing the metadata.

This fixes issues encountered after converting
AMDGPUResourceUsageAnalysis to a Module pass in D117504.

Differential Revision: https://reviews.llvm.org/D118492

show more ...


# 1194b9cd 01-Feb-2022 Changpeng Fang <[email protected]>

AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args

Summary:
Add code object v5 support (deafult is still v4)
Generate metadata for implicit kernel args for t

AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args

Summary:
Add code object v5 support (deafult is still v4)
Generate metadata for implicit kernel args for the new ABI
Set the metadata version to be 1.2

Reviewers:
t-tye, b-sumner, arsenm, and bcahoon

Fixes:
SWDEV-307188, SWDEV-307189

Differential Revision:
https://reviews.llvm.org/D118272

show more ...


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# 90ff1487 28-Oct-2021 Matt Arsenault <[email protected]>

AMDGPU: Account for implicit argument alignment for kernarg segment

If a kernel had no formal arguments but did have the implicit
arguments, we were reporting a required kernarg alignment of 4. For

AMDGPU: Account for implicit argument alignment for kernarg segment

If a kernel had no formal arguments but did have the implicit
arguments, we were reporting a required kernarg alignment of 4. For
some reason we require an 8-byte alignment for this, even though
there's no real advantage and I don't see where this is documented in
the ABI.

The code object header code also claims the minimum alignment is 16,
which is what I thought you always got at runtime anyway so I don't
know why this matters.

show more ...


# 89b57061 08-Oct-2021 Reid Kleckner <[email protected]>

Move TargetRegistry.(h|cpp) from Support to MC

This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually us

Move TargetRegistry.(h|cpp) from Support to MC

This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.

This allows us to ensure that Support doesn't have includes from MC/*.

Differential Revision: https://reviews.llvm.org/D111454

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 69f7d81d 15-Apr-2021 David Stuttard <[email protected]>

[AMDGPU] Set number vgprs used in PS shaders based on input registers actually used

For PS shaders we can use the input SPI_PS_INPUT_ENA and SPI_PS_INPUT_ADDR
registers

Calculate the number of VGPR

[AMDGPU] Set number vgprs used in PS shaders based on input registers actually used

For PS shaders we can use the input SPI_PS_INPUT_ENA and SPI_PS_INPUT_ADDR
registers

Calculate the number of VGPR registers used as input VGPRs based on these
registers rather than the arguments passed in (this conservatively always
allocates the maximum).

Differential Revision: https://reviews.llvm.org/D101633

Change-Id: Idf7c060cbbd5f7e3300102c55ecee3c07f209de6

show more ...


# e4223438 21-Sep-2021 Arthur Eubanks <[email protected]>

Make DiagnosticInfoResourceLimit's limit param required

And always print it.

This makes some LLVM diagnostics match up better with Clang's diagnostics.

Updated some AMDGPU uses of DiagnosticInfoRe

Make DiagnosticInfoResourceLimit's limit param required

And always print it.

This makes some LLVM diagnostics match up better with Clang's diagnostics.

Updated some AMDGPU uses of DiagnosticInfoResourceLimit and now we print
better diagnostics for those.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D110204

show more ...


# 2b08f6af 19-Jul-2021 Sebastian Neubauer <[email protected]>

[AMDGPU] Improve register computation for indirect calls

First, collect the register usage in each function, then apply the
maximum register usage of all functions to functions with indirect
calls.

[AMDGPU] Improve register computation for indirect calls

First, collect the register usage in each function, then apply the
maximum register usage of all functions to functions with indirect
calls.

This is more accurate than guessing the maximum register usage without
looking at the actual usage.

As before, assume that indirect calls will hit a function in the
current module.

Differential Revision: https://reviews.llvm.org/D105839

show more ...


# db646de3 29-Jun-2021 Sebastian Neubauer <[email protected]>

[AMDGPU] Set optional PAL metadata

Set informational fields in the .shader_functions table.

Also correct the documentation, .scratch_memory_size and .lds_size are
integers.

Differential Revision:

[AMDGPU] Set optional PAL metadata

Set informational fields in the .shader_functions table.

Also correct the documentation, .scratch_memory_size and .lds_size are
integers.

Differential Revision: https://reviews.llvm.org/D105116

show more ...


# 98f48723 24-Jun-2021 Carl Ritson <[email protected]>

[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs

Add SReg_224, VReg_224, AReg_224, etc.
Link 224-bit types with v7i32/v7f32.
Link existing 192-bit types to newly added v3i64/v3f64/v6

[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs

Add SReg_224, VReg_224, AReg_224, etc.
Link 224-bit types with v7i32/v7f32.
Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D104622

show more ...


# ad4a1825 14-Jun-2021 Matt Arsenault <[email protected]>

AMDGPU: Fix assert on m0_lo16/m0_hi16

These get added (redundantly) to the bundle expanded for indirect
register accesses. We hit this path only when there is a call in the
function.


12345678910>>...12