|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
| #
d96361d7 |
| 17-Jun-2022 |
Abinav Puthan Purayil <[email protected]> |
[AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map
This change introduces the dynamic stack boolean field to code-object-v3 and above under the code prope
[AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map
This change introduces the dynamic stack boolean field to code-object-v3 and above under the code properties of the kernel descriptor and under the kernel metadata map of NT_AMDGPU_METADATA. This field corresponds to the is_dynamic_callstack field of amd_kernel_code_t.
Differential Revision: https://reviews.llvm.org/D128344
show more ...
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4 |
|
| #
67357739 |
| 11-Mar-2022 |
Vang Thao <[email protected]> |
[AMDGPU] Add remarks to output some resource usage
Add analyis remarks to output kernel name, register usage, occupancy, scratch usage, spills, and LDS information.
Reviewed By: arsenm
Differentia
[AMDGPU] Add remarks to output some resource usage
Add analyis remarks to output kernel name, register usage, occupancy, scratch usage, spills, and LDS information.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D123878
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init |
|
| #
0bdaef38 |
| 24-Jan-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Add gfx11 feature to force initializing 16 input SGPRs
The total user+system SGPR count needs to be padded out to 16 if fewer inputs are enabled.
|
| #
929a8ad2 |
| 06-Apr-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Update SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE for GFX11
The granularity of SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE changed in GFX11. It is now in units of 256 dwords instead of 128 dwords.
[AMDGPU] Update SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE for GFX11
The granularity of SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE changed in GFX11. It is now in units of 256 dwords instead of 128 dwords.
COMPUTE_PGM_RSRC2.LDS_SIZE is unaffected. It is still in units of 128 dwords.
Differential Revision: https://reviews.llvm.org/D128179
show more ...
|
| #
129b531c |
| 19-Jun-2022 |
Kazu Hirata <[email protected]> |
[llvm] Use value_or instead of getValueOr (NFC)
|
| #
adf4142f |
| 11-Jun-2022 |
Fangrui Song <[email protected]> |
[MC] De-capitalize SwitchSection. NFC
Add SwitchSection to return switchSection. The API will be removed soon.
|
| #
ff85d61a |
| 05-Apr-2022 |
Jay Foad <[email protected]> |
Update *_TMPRING_SIZE.WAVESIZE for GFX11
The encoding of COMPUTE_TMPRING_SIZE.WAVESIZE and SPI_TMPRING_SIZE.WAVESIZE has changed in GFX11: it is now in units of 64 dwords instead of 256 dwords, and
Update *_TMPRING_SIZE.WAVESIZE for GFX11
The encoding of COMPUTE_TMPRING_SIZE.WAVESIZE and SPI_TMPRING_SIZE.WAVESIZE has changed in GFX11: it is now in units of 64 dwords instead of 256 dwords, and the field has been widened from 13 bits to 15 bits.
Depends on D126989
Reviewed By: rampitec, arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D127248
show more ...
|
| #
15d82c62 |
| 07-Jun-2022 |
Fangrui Song <[email protected]> |
[MC] De-capitalize MCStreamer functions
Follow-up to c031378ce01b8485ba0ef486654bc9393c4ac024 . The class is mostly consistent now.
|
| #
bc7902f1 |
| 16-Apr-2022 |
Matt Arsenault <[email protected]> |
AMDGPU: Remove unused MachineFunctionInfo fields
These were leftovers from a half-implement spill to LDS attempt.
|
| #
989f1c72 |
| 15-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
show more ...
|
| #
a278250b |
| 10-Mar-2022 |
Nico Weber <[email protected]> |
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https:/
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
show more ...
|
| #
7f230fee |
| 07-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
after: 1061034926 before: 1063332844
Differential Revision: https://reviews.llvm.org/D121169
|
| #
0f20a35b |
| 09-Mar-2022 |
Changpeng Fang <[email protected]> |
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In curr
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if target suppots getDoorbellID, queue_ptr is also not necessary. Futher, code object version 5 introduces new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap in the backend.
Reviewers: sameerds, arsenm
Fixes: SWDEV-307189
Differential Revision: https://reviews.llvm.org/D119762
show more ...
|
| #
74702444 |
| 02-Feb-2022 |
Jacob Lambert <[email protected]> |
[AMDGPU] Add agpr_count to metadata and AsmParser
gfx90a allows the number of ACC registers (AGPRs) to be set independently to the VGPR registers. For both HSA and PAL metadata, we now include an "a
[AMDGPU] Add agpr_count to metadata and AsmParser
gfx90a allows the number of ACC registers (AGPRs) to be set independently to the VGPR registers. For both HSA and PAL metadata, we now include an "agpr_count" key to report the number of AGPRs set for supported devices (gfx90a, gfx908, as determined by hasMAIInsts()). This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL. The AsmParser also now recognizes ".kernel.agpr_count" for supported devices.
Differential Revision: https://reviews.llvm.org/D116140
show more ...
|
| #
ef736a1c |
| 08-Feb-2022 |
serge-sans-paille <[email protected]> |
Cleanup LLVMMC headers
There's a few relevant forward declarations in there that may require downstream adding explicit includes:
llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llv
Cleanup LLVMMC headers
There's a few relevant forward declarations in there that may require downstream adding explicit includes:
llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h
Counting preprocessed lines required to rebuild llvm-project on my setup: before: 1052436830 after: 1049293745
Which is significant and backs up the change in addition to the usual benefits of decreasing coupling between headers and compilation units.
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D119244
show more ...
|
| #
4a025622 |
| 04-Feb-2022 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Lazily init pal metadata on first function
Delay reading global metadata until the first function or the end of the file is emitted. That way, earlier module passes can set metadata that is
[AMDGPU] Lazily init pal metadata on first function
Delay reading global metadata until the first function or the end of the file is emitted. That way, earlier module passes can set metadata that is emitted in the ELF.
`emitStartOfAsmFile` gets called when the passes are initialized, which prevented earlier passes from changing the metadata.
This fixes issues encountered after converting AMDGPUResourceUsageAnalysis to a Module pass in D117504.
Differential Revision: https://reviews.llvm.org/D118492
show more ...
|
| #
1194b9cd |
| 01-Feb-2022 |
Changpeng Fang <[email protected]> |
AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary: Add code object v5 support (deafult is still v4) Generate metadata for implicit kernel args for t
AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary: Add code object v5 support (deafult is still v4) Generate metadata for implicit kernel args for the new ABI Set the metadata version to be 1.2
Reviewers: t-tye, b-sumner, arsenm, and bcahoon
Fixes: SWDEV-307188, SWDEV-307189
Differential Revision: https://reviews.llvm.org/D118272
show more ...
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
90ff1487 |
| 28-Oct-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Account for implicit argument alignment for kernarg segment
If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For
AMDGPU: Account for implicit argument alignment for kernarg segment
If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For some reason we require an 8-byte alignment for this, even though there's no real advantage and I don't see where this is documented in the ABI.
The code object header code also claims the minimum alignment is 16, which is what I thought you always got at runtime anyway so I don't know why this matters.
show more ...
|
| #
89b57061 |
| 08-Oct-2021 |
Reid Kleckner <[email protected]> |
Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack. Every client of the target registry needs to link against MC anyway to actually us
Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack. Every client of the target registry needs to link against MC anyway to actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
69f7d81d |
| 15-Apr-2021 |
David Stuttard <[email protected]> |
[AMDGPU] Set number vgprs used in PS shaders based on input registers actually used
For PS shaders we can use the input SPI_PS_INPUT_ENA and SPI_PS_INPUT_ADDR registers
Calculate the number of VGPR
[AMDGPU] Set number vgprs used in PS shaders based on input registers actually used
For PS shaders we can use the input SPI_PS_INPUT_ENA and SPI_PS_INPUT_ADDR registers
Calculate the number of VGPR registers used as input VGPRs based on these registers rather than the arguments passed in (this conservatively always allocates the maximum).
Differential Revision: https://reviews.llvm.org/D101633
Change-Id: Idf7c060cbbd5f7e3300102c55ecee3c07f209de6
show more ...
|
| #
e4223438 |
| 21-Sep-2021 |
Arthur Eubanks <[email protected]> |
Make DiagnosticInfoResourceLimit's limit param required
And always print it.
This makes some LLVM diagnostics match up better with Clang's diagnostics.
Updated some AMDGPU uses of DiagnosticInfoRe
Make DiagnosticInfoResourceLimit's limit param required
And always print it.
This makes some LLVM diagnostics match up better with Clang's diagnostics.
Updated some AMDGPU uses of DiagnosticInfoResourceLimit and now we print better diagnostics for those.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D110204
show more ...
|
| #
2b08f6af |
| 19-Jul-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Improve register computation for indirect calls
First, collect the register usage in each function, then apply the maximum register usage of all functions to functions with indirect calls.
[AMDGPU] Improve register computation for indirect calls
First, collect the register usage in each function, then apply the maximum register usage of all functions to functions with indirect calls.
This is more accurate than guessing the maximum register usage without looking at the actual usage.
As before, assume that indirect calls will hit a function in the current module.
Differential Revision: https://reviews.llvm.org/D105839
show more ...
|
| #
db646de3 |
| 29-Jun-2021 |
Sebastian Neubauer <[email protected]> |
[AMDGPU] Set optional PAL metadata
Set informational fields in the .shader_functions table.
Also correct the documentation, .scratch_memory_size and .lds_size are integers.
Differential Revision:
[AMDGPU] Set optional PAL metadata
Set informational fields in the .shader_functions table.
Also correct the documentation, .scratch_memory_size and .lds_size are integers.
Differential Revision: https://reviews.llvm.org/D105116
show more ...
|
| #
98f48723 |
| 24-Jun-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc. Link 224-bit types with v7i32/v7f32. Link existing 192-bit types to newly added v3i64/v3f64/v6
[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc. Link 224-bit types with v7i32/v7f32. Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D104622
show more ...
|
| #
ad4a1825 |
| 14-Jun-2021 |
Matt Arsenault <[email protected]> |
AMDGPU: Fix assert on m0_lo16/m0_hi16
These get added (redundantly) to the bundle expanded for indirect register accesses. We hit this path only when there is a call in the function.
|