AMDGPUSubtarget.cpp - OpenGrok history log for /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 3a205977	19-Jul-2022	Jon Chesterfield <[email protected]>	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS varia [amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060 show more ...
# 8a12f20e	13-Jul-2022	jeff <[email protected]>	[AMDGPU] Update the mechanism used to check for cycles and add eges in power-sched mutation
# 6817031d	06-Jul-2022	Austin Kerbow <[email protected]>	[AMDGPU] Disable FillMFMAShadowMutation by default Disable amdgpu mfma power sched. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D129172
Revision tags: llvmorg-14.0.6
# f1255186	18-Jun-2022	Guillaume Chatelet <[email protected]>	[NFC][Alignment] Remove max functions between Align and MaybeAlign `llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are not used often enough to be required. They also make the code [NFC][Alignment] Remove max functions between Align and MaybeAlign `llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are not used often enough to be required. They also make the code more opaque. Differential Revision: https://reviews.llvm.org/D128121 show more ...
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4
# 3732cd59	16-May-2022	Joe Nash <[email protected]>	[AMDGPU] gfx11 vop3 and inherited vop instructions This patch includes MC layer support for VOP3 encoded instructions and generic VOP support classes. Some VOP1 and VOP2 instructions which share an [AMDGPU] gfx11 vop3 and inherited vop instructions This patch includes MC layer support for VOP3 encoded instructions and generic VOP support classes. Some VOP1 and VOP2 instructions which share an encoding with gfx10 and are using the AssemblerPredicate = isGFX10Plus are also enabled. That predicate will be changed to isGFX10Only in a later patch. Patch 15/N for upstreaming of AMDGPU gfx11 architecture. Depends on D126468 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126475 show more ...
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2
# 8a53b25e	12-Apr-2022	Jay Foad <[email protected]>	[AMDGPU] Use default member initializers in Subtarget classes Use default member initializers in AMDGPUSubtarget and subclasses. This is to guard against adding a new feature boolean in AMDGPUSubtar [AMDGPU] Use default member initializers in Subtarget classes Use default member initializers in AMDGPUSubtarget and subclasses. This is to guard against adding a new feature boolean in AMDGPUSubtarget.h but forgetting to initialize it to false in AMDGPUSubtarget.cpp. This was mostly autogenerated by: clang-tidy -checks=-,cppcoreguidelines-prefer-member-initializer,modernize-use-default-member-init -header-filter=Subtarget -fix lib/Target/AMDGPU/Subtarget.cpp Differential Revision: https://reviews.llvm.org/D123613 show more ...
Revision tags: llvmorg-14.0.1
# 6733590d	07-Apr-2022	Changpeng Fang <[email protected]>	AMDGPU: Set implicit kernarg size to be of 256 bytes for code object version 5 Summary: If implicitarg_ptr intrinsic is not used, set implicit kernarg size to 0, otherwise set it to 256 bytes for AMDGPU: Set implicit kernarg size to be of 256 bytes for code object version 5 Summary: If implicitarg_ptr intrinsic is not used, set implicit kernarg size to 0, otherwise set it to 256 bytes for code object version 5 (and beyond). Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D123262 show more ...
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 0c0636f7	07-Mar-2022	Austin Kerbow <[email protected]>	[AMDGPU] Fix uninitialized value after 8d0c34fd4f
Revision tags: llvmorg-14.0.0-rc2
# 35ec58d8	01-Mar-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] gfx940 removes all image instructions Differential Revision: https://reviews.llvm.org/D120763
# 2e2e64df	28-Feb-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Add gfx940 target This is target definition only. Differential Revision: https://reviews.llvm.org/D120688
# a5d4f82b	11-Feb-2022	Sebastian Neubauer <[email protected]>	[AMDGPU] Make enable-flat-scratch a subtarget feature Use a subtarget feature instead of a command line argument to reduce global state. We want to enable flat scratch for graphics in some cases and [AMDGPU] Make enable-flat-scratch a subtarget feature Use a subtarget feature instead of a command line argument to reduce global state. We want to enable flat scratch for graphics in some cases and this doesn't work well with command line options. Differential Revision: https://reviews.llvm.org/D119425 show more ...
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# 4e077c0a	25-Jan-2022	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Remove feature register-banking Since RegBankReassign pass was removed this feature is not use for anything. Differential Revision: https://reviews.llvm.org/D118195
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3
# 0b1140e8	14-Jan-2022	Matt Arsenault <[email protected]>	AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch This was approximating the entry point logic for flat_scratch_init, which is not really the point. We need to account for whether we need to r AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch This was approximating the entry point logic for flat_scratch_init, which is not really the point. We need to account for whether we need to reserve the SGPR pair used for flat_scratch, not whether we needed the initialization kernel argument. If this was an arbitrary function, we would end up over-reporting the number of potentially free SGPRs. The logic for architected flat scratch also only applies to the initialization in the kernel, not the reserved registers at the end. Avoids compile failures in a future patch from allocating more SGPRs than the subtarget supports. show more ...
Revision tags: llvmorg-13.0.1-rc2
# 4db74227	14-Dec-2021	Jay Foad <[email protected]>	[AMDGPU] Improve zeroesHigh16BitsOfDest for GFX9 legacy opcodes Pseudos like V_MAD_U16 and V_FMA_F16 map down to what GFX9 calls v_mad_legacy_u16 and v_fma_legacy_f16, which are documented to have t [AMDGPU] Improve zeroesHigh16BitsOfDest for GFX9 legacy opcodes Pseudos like V_MAD_U16 and V_FMA_F16 map down to what GFX9 calls v_mad_legacy_u16 and v_fma_legacy_f16, which are documented to have the same zeroing behaviour as on GFX8. Differential Revision: https://reviews.llvm.org/D115729 show more ...
Revision tags: llvmorg-13.0.1-rc1
# 2959e082	26-Oct-2021	Matt Arsenault <[email protected]>	AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default Previously we would require adding an attribute to kernels to enable the inputs passed in the kernarg segment, accessed by llvm AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default Previously we would require adding an attribute to kernels to enable the inputs passed in the kernarg segment, accessed by llvm.amdgcn.implicitarg.ptr. This violates the principle of being correct by default. Some OpenMP testcases were broken recently since it wasn't correctly setting this attribute, and no known frontends are setting this to anything other than the maximum. Most of the test changes are from load widening of argument loads since there now more implied dereferenceable bytes. show more ...
# ae0ba7de	25-Oct-2021	Matt Arsenault <[email protected]>	AMDGPU: Optimize out implicit kernarg argument allocation if unused We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be unused. Start using it to avoid allocating the implicit arg AMDGPU: Optimize out implicit kernarg argument allocation if unused We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be unused. Start using it to avoid allocating the implicit arguments if unneeded. show more ...
# 90ff1487	28-Oct-2021	Matt Arsenault <[email protected]>	AMDGPU: Account for implicit argument alignment for kernarg segment If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For AMDGPU: Account for implicit argument alignment for kernarg segment If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For some reason we require an 8-byte alignment for this, even though there's no real advantage and I don't see where this is documented in the ABI. The code object header code also claims the minimum alignment is 16, which is what I thought you always got at runtime anyway so I don't know why this matters. show more ...
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 8d4b74ac	11-Sep-2021	Matt Arsenault <[email protected]>	AMDGPU: Don't consider whether amdgpu-flat-work-group-size was set It should be semantically identical if it was set to the same value as the default. Also improve the documentation.
# 3f34f75a	22-Oct-2021	Jay Foad <[email protected]>	[AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32 As described in the comment, the way we change vcc to vcc_lo in these operands confuses addPhysRegDataDeps into treating them as imp [AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32 As described in the comment, the way we change vcc to vcc_lo in these operands confuses addPhysRegDataDeps into treating them as implicit pseudo operands. Fix this by setting the correct latency from the SchedModel after addPhysRegDataDeps wrongly set it to 0. Differential Revision: https://reviews.llvm.org/D112317 show more ...
# 6fe949c4	22-Oct-2021	Kazu Hirata <[email protected]>	[Target, Transforms] Use StringRef::contains (NFC)
# 082e22f3	24-Sep-2021	Stanislav Mekhanoshin <[email protected]>	[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch With architected flat scratch it becomes readonly. We must always reserve SGPR pair for it even if we do not use scratch at all [AMDGPU] Always reserve flat scratch SGPR for architected flat scratch With architected flat scratch it becomes readonly. We must always reserve SGPR pair for it even if we do not use scratch at all since an attempt to write to SGPRs mapped to FLAT_SCRATCH results in memory violation. This is not needed since GFX10 with architected flat scratch though since special SGPRs are not carving space from normal SGPRs. Differential Revision: https://reviews.llvm.org/D110376 show more ...
# ec55dced	16-Sep-2021	Matt Arsenault <[email protected]>	AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query Add an overload to pass the flat workgroup range in separately. This will allow the attributor to use the assumed value for amdgp AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query Add an overload to pass the flat workgroup range in separately. This will allow the attributor to use the assumed value for amdgpu-flat-workgroup-sizes when inferring amdgpu-waves-per-eu. show more ...
# dc6e8dfd	20-Sep-2021	Jacob Lambert <[email protected]>	[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.
# 3ce1b963	08-Sep-2021	Joe Nash <[email protected]>	[AMDGPU] Switch PostRA sched to MachineSched Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D1095 [AMDGPU] Switch PostRA sched to MachineSched Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109536 Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde show more ...
Revision tags: llvmorg-13.0.0-rc2
# 48958d02	23-Aug-2021	Daniil Fukalov <[email protected]>	[NFC][AMDGPU] Reduce includes dependencies. 1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `G [NFC][AMDGPU] Reduce includes dependencies. 1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had different return value type than base class. 4. Minor forward declarations cleanup. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108596 show more ...
12 3 4 5 6 7 8 9 10 11