History log of /llvm-project-15.0.7/llvm/lib/Target/AMDGPU/AMDGPU.h (Results 1 – 25 of 155)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# d1af09ad 23-Jun-2022 Joe Nash <[email protected]>

[AMDGPU] gfx11 Generate VOPD Instructions

We form VOPD instructions in the GCNCreateVOPD pass by combining
back-to-back component instructions. There are strict register
constraints for creating a

[AMDGPU] gfx11 Generate VOPD Instructions

We form VOPD instructions in the GCNCreateVOPD pass by combining
back-to-back component instructions. There are strict register
constraints for creating a legal VOPD, namely that the matching operands
(e.g. src0x and src0y, src1x and src1y) must be in different register
banks. We add a PostRA scheduler
mutation to put possible VOPD components back-to-back.

Depends on D128442, D128270

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D128656

show more ...


# 0f94d2b3 30-Jun-2022 Jay Foad <[email protected]>

[AMDGPU] GFX11: automatically release VGPRs at the end of the shader

GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to
release a shader's VGPRs. Sending this at the end of a shader

[AMDGPU] GFX11: automatically release VGPRs at the end of the shader

GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to
release a shader's VGPRs. Sending this at the end of a shader (just
before the s_endpgm) can help overall system performance in cases where
the s_endpgm would have to wait for outstanding VMEM stores to complete
before releasing the VGPRs.

Differential Revision: https://reviews.llvm.org/D128442

show more ...


Revision tags: llvmorg-14.0.6
# cfb7ffde 21-Jun-2022 Jay Foad <[email protected]>

[AMDGPU] New AMDGPUInsertDelayAlu pass

Differential Revision: https://reviews.llvm.org/D128270


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3
# 6ddf2a82 27-Apr-2022 Ivan Kosarev <[email protected]>

[AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling.

As older waves execute long sequences of VALU instructions, this may
prevent younger waves from address calculation an

[AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling.

As older waves execute long sequences of VALU instructions, this may
prevent younger waves from address calculation and then issuing their
VMEM loads, which in turn leads the VALU unit to idle. This patch tries
to prevent this by temporarily raising the wave's priority.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D124246

show more ...


Revision tags: llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 203a1e36 17-Dec-2021 Matt Arsenault <[email protected]>

Reapply "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass"

This reverts commit 8a85be807bd453eb9c88d0126c75fd5ea393f60d.

The unrelated failure this exposed was fixed.


# e188aae4 31-Jan-2022 serge-sans-paille <[email protected]>

Cleanup header dependencies in LLVMCore

Based on the output of include-what-you-use.

This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avo

Cleanup header dependencies in LLVMCore

Based on the output of include-what-you-use.

This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden ehader dependencies, something
the LLVM codebase doesn't do that well :-/

I've tried to summarize the biggest change below:

- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h

And the usual count of preprocessed lines:
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after: 6189948

200k lines less to process is no that bad ;-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup

Differential Revision: https://reviews.llvm.org/D118652

show more ...


# 8a85be80 16-Dec-2021 Ron Lieberman <[email protected]>

Revert "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass"

Offload abort in Nekbone

This reverts commit 2b4876157562bc76e86f193d371348993905bc61.


# 2b487615 14-Dec-2021 Matt Arsenault <[email protected]>

AMDGPU: Remove AMDGPUFixFunctionBitcasts pass

This was a workaround for not supporting indirect calls when
instcombine didn't eliminate constant expression casts of the callee
at -O0. Indirect calls

AMDGPU: Remove AMDGPUFixFunctionBitcasts pass

This was a workaround for not supporting indirect calls when
instcombine didn't eliminate constant expression casts of the callee
at -O0. Indirect calls are supposed to work now, so drop the hack.

show more ...


Revision tags: llvmorg-13.0.1-rc1
# 9cf995be 08-Oct-2021 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Promote generic pointer kernel arguments into global

The new pass walks kernel's pointer arguments, then loads from them.
If a loaded value is a pointer and loaded pointer is unmodified in

[AMDGPU] Promote generic pointer kernel arguments into global

The new pass walks kernel's pointer arguments, then loads from them.
If a loaded value is a pointer and loaded pointer is unmodified in
the kernel before the load, then promote loaded pointer to global.
Then recursively continue.

Differential Revision: https://reviews.llvm.org/D111464

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 4c1023b4 14-Sep-2021 Jacob Lambert <[email protected]>

[AMDGPU] NFC: Fixing small spelling errors in AMDGPU header files

Nonfunctional commit fixing several minor spelling errors in llvm/lib/Target/AMDGPU header files.
Testing workflow as a new contribu

[AMDGPU] NFC: Fixing small spelling errors in AMDGPU header files

Nonfunctional commit fixing several minor spelling errors in llvm/lib/Target/AMDGPU header files.
Testing workflow as a new contributor.

Differential Revision: https://reviews.llvm.org/D109733

show more ...


Revision tags: llvmorg-13.0.0-rc2
# 48958d02 23-Aug-2021 Daniil Fukalov <[email protected]>

[NFC][AMDGPU] Reduce includes dependencies.

1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `G

[NFC][AMDGPU] Reduce includes dependencies.

1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()`
and `R600TargetMachine::getSubtargetImpl()` had different return value type
than base class.
4. Minor forward declarations cleanup.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108596

show more ...


# 5173854f 06-Aug-2021 Reshabh Sharma <[email protected]>

[AMDGPU] Handle functions in llvm's global ctors and dtors list

This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer w

[AMDGPU] Handle functions in llvm's global ctors and dtors list

This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682

show more ...


# dce35ef1 04-Aug-2021 Reshabh Sharma <[email protected]>

Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list"

This reverts commit d42e70b3d315645e37f3b1455d39e68678e69525.


# d42e70b3 04-Aug-2021 Reshabh Sharma <[email protected]>

[AMDGPU] Handle functions in llvm's global ctors and dtors list

This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer w

[AMDGPU] Handle functions in llvm's global ctors and dtors list

This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682

show more ...


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4
# 96709823 27-Jun-2021 Kuter Dinel <[email protected]>

[AMDGPU] Deduce attributes with the Attributor

This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Reviewed By: jdoerfert, arsenm

Differential Revision: htt

[AMDGPU] Deduce attributes with the Attributor

This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Reviewed By: jdoerfert, arsenm

Differential Revision: https://reviews.llvm.org/D104997

show more ...


# 2b08f6af 19-Jul-2021 Sebastian Neubauer <[email protected]>

[AMDGPU] Improve register computation for indirect calls

First, collect the register usage in each function, then apply the
maximum register usage of all functions to functions with indirect
calls.

[AMDGPU] Improve register computation for indirect calls

First, collect the register usage in each function, then apply the
maximum register usage of all functions to functions with indirect
calls.

This is more accurate than guessing the maximum register usage without
looking at the actual usage.

As before, assume that indirect calls will hit a function in the
current module.

Differential Revision: https://reviews.llvm.org/D105839

show more ...


# 381ded34 28-Jun-2021 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants

This is to allow 64 bit constant rematerialization. If a constant
is split into two separate moves initializing sub0 and sub1 like
now RA cannot

[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants

This is to allow 64 bit constant rematerialization. If a constant
is split into two separate moves initializing sub0 and sub1 like
now RA cannot rematerizalize a 64 bit register.

This gives 10-20% uplift in a set of huge apps heavily using double
precession math.

Fixes: SWDEV-292645

Differential Revision: https://reviews.llvm.org/D104874

show more ...


Revision tags: llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 208332de 19-Apr-2021 Ruiling Song <[email protected]>

[AMDGPU] Add Optimize VGPR LiveRange Pass.

This pass aims to optimize VGPR live-range in a typical divergent if-else
control flow. For example:

def(a)
if(cond)
use(a)
... // A
else
use(a)

As

[AMDGPU] Add Optimize VGPR LiveRange Pass.

This pass aims to optimize VGPR live-range in a typical divergent if-else
control flow. For example:

def(a)
if(cond)
use(a)
... // A
else
use(a)

As AMDGPU access vgpr with respect to active-mask, we can mark `a` as
dead in region A. For details, please refer to the comments in
implementation file.

The pass is enabled by default, the frontend can disable it through
"-amdgpu-opt-vgpr-liverange=false".

Differential Revision: https://reviews.llvm.org/D102212

show more ...


# 80fd5fa5 21-Jun-2021 hsmahesha <[email protected]>

[AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

The main motivation behind pointer replacement of LDS use within non-kernel
functions is - to *avoid* subsequent LDS lowering pa

[AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

The main motivation behind pointer replacement of LDS use within non-kernel
functions is - to *avoid* subsequent LDS lowering pass from directly packing
LDS (assume large LDS) into a struct type which would otherwise cause allocating
huge memory for struct instance within every kernel.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103225

show more ...


# caf1294d 26-Apr-2021 Baptiste Saleil <[email protected]>

[AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts
the compilation time and there is no case for which we see any improvement in
performance. This patch removes this pas

[AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts
the compilation time and there is no case for which we see any improvement in
performance. This patch removes this pass and its associated test cases from
the tree.

Differential Revision: https://reviews.llvm.org/D101313

Change-Id: I0599169a7609c19a887f8d847a71e664030cc141

show more ...


# d5d412f2 07-Apr-2021 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Split GCNRegBankReassign

Allow pass to work separately with SGPR, VGPR registers or both.
This is NFC now but will be needed to split RA for separate
SGPR and VGPR passes.

Differential Rev

[AMDGPU] Split GCNRegBankReassign

Allow pass to work separately with SGPR, VGPR registers or both.
This is NFC now but will be needed to split RA for separate
SGPR and VGPR passes.

Differential Revision: https://reviews.llvm.org/D100063

show more ...


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5
# fdc4f19e 01-Apr-2021 Jay Foad <[email protected]>

[AMDGPU] Remove SIAddIMGInit pass which is now unused

Differential Revision: https://reviews.llvm.org/D99748


Revision tags: llvmorg-12.0.0-rc4
# fe5f4c39 20-Mar-2021 Carl Ritson <[email protected]>

[AMDGPU] Rename SIInsertSkips Pass

Pass no longer handles skips. Pass now removes unnecessary
unconditional branches and lowers early termination branches.
Hence rename to SILateBranchLowering.

Mo

[AMDGPU] Rename SIInsertSkips Pass

Pass no longer handles skips. Pass now removes unnecessary
unconditional branches and lowers early termination branches.
Hence rename to SILateBranchLowering.

Move code to handle returns to epilog from SIPreEmitPeephole
into SILateBranchLowering. This means SIPreEmitPeephole only
contains optional optimisations, and all required transforms
are in SILateBranchLowering.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D98915

show more ...


# 5df2af8b 20-Mar-2021 Carl Ritson <[email protected]>

[AMDGPU] Merge SIRemoveShortExecBranches into SIPreEmitPeephole

SIRemoveShortExecBranches is an optimisation so fits well in the
context of SIPreEmitPeephole.

Test changes relate to early terminati

[AMDGPU] Merge SIRemoveShortExecBranches into SIPreEmitPeephole

SIRemoveShortExecBranches is an optimisation so fits well in the
context of SIPreEmitPeephole.

Test changes relate to early termination from kills which have now
been lowered prior to considering branches for removal.
As these use s_cbranch the execz skips are now retained instead.
Currently either behaviour is valid as kill with EXEC=0 is a nop;
however, if early termination is used differently in future then
the new behaviour is the correct one.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D98917

show more ...


# 13e49dce 15-Mar-2021 Jon Chesterfield <[email protected]>

[amdgpu] Implement lower function LDS pass

[amdgpu] Implement lower function LDS pass

Local variables are allocated at kernel launch. This pass collects global
variables that are used from non-kern

[amdgpu] Implement lower function LDS pass

[amdgpu] Implement lower function LDS pass

Local variables are allocated at kernel launch. This pass collects global
variables that are used from non-kernel functions, moves them into a new struct
type, and allocates an instance of that type in every kernel. Uses are then
replaced with a constantexpr offset.

Prior to this pass, accesses from a function are compiled to trap. With this
pass, most such accesses are removed before reaching codegen. The trap logic
is left unchanged by this pass. It is still reachable for the cases this pass
misses, notably the extern shared construct from hip and variables marked
constant which survive the optimizer.

This is of interest to the openmp project because the deviceRTL runtime library
uses cuda shared variables from functions that cannot be inlined. Trunk llvm
therefore cannot compile some openmp kernels for amdgpu. In addition to the
unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled
and the function pointer hashing scheme deleted passes the openmp suite.

This lowering will use more LDS than strictly necessary. It is intended to be
a functionally correct fallback for cases that are difficult to target from
future optimisation passes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94648

show more ...


1234567