Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4
# affa1b1c 10-May-2022 Nicolai Hähnle <[email protected]>

AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane

A later change will add a 3rd user, so factoring out the common code
seems useful.

Reorganizing the executeInWaterfallLoop causes

AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane

A later change will add a 3rd user, so factoring out the common code
seems useful.

Reorganizing the executeInWaterfallLoop causes some more COPYs to be
generated, but those all fold away during instruction selection.
Generating the comparisons uses generic instructions over machine
instructions now which admittedly shouldn't make a difference
(though it should make it easier to move the waterfall loop generation
to another place).

(Resubmit with missing test added.)

Differential Revision: https://reviews.llvm.org/D125324

show more ...


# afc90101 25-May-2022 Nicolai Hähnle <[email protected]>

Revert "AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane"

This reverts commit 2a28467e5389c4d741d1825fadd39ae84ecaa5dc.


# 2a28467e 10-May-2022 Nicolai Hähnle <[email protected]>

AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane

A later change will add a 3rd user, so factoring out the common code
seems useful.

Reorganizing the executeInWaterfallLoop causes

AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane

A later change will add a 3rd user, so factoring out the common code
seems useful.

Reorganizing the executeInWaterfallLoop causes some more COPYs to be
generated, but those all fold away during instruction selection.
Generating the comparisons uses generic instructions over machine
instructions now which admittedly shouldn't make a difference
(though it should make it easier to move the waterfall loop generation
to another place).

Differential Revision: https://reviews.llvm.org/D125324

show more ...


Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 1f52d02c 28-Mar-2022 Carl Ritson <[email protected]>

[AMDGPU] Split waterfall loop exec manipulation

Split waterfall loops into multiple blocks so that exec mask
manipulation (s_and_saveexec) does not occur in the middle of
a block.

VGPR live range o

[AMDGPU] Split waterfall loop exec manipulation

Split waterfall loops into multiple blocks so that exec mask
manipulation (s_and_saveexec) does not occur in the middle of
a block.

VGPR live range optimizer is updated to handle waterfall loops
spanning multiple blocks.

Reviewed By: ruiling

Differential Revision: https://reviews.llvm.org/D122200

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# bd2c01e9 11-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU/GlobalISel: Do not use terminator copy before waterfall loops

Stop using the _term variants of the mov to save the initial exec
value before the waterfall loop. This cannot be glued to the bo

AMDGPU/GlobalISel: Do not use terminator copy before waterfall loops

Stop using the _term variants of the mov to save the initial exec
value before the waterfall loop. This cannot be glued to the bottom of
the block because we may need to spill the result register. Just use a
regular mov, like the loops produced on the DAG path. Fixes some
verification errors with regalloc fast.

show more ...


# 0cf860ec 11-Jan-2022 Matt Arsenault <[email protected]>

AMDGPU/GlobalISel: Regenerate baseline checks to include -NEXT


Revision tags: llvmorg-13.0.1-rc1
# fd1cfc90 28-Oct-2021 Sebastian Neubauer <[email protected]>

[AMDGPU][GlobalISel] Fix waterfall loops

- Move the `s_and exec` to its correct position before the content of
the waterfall loop
- Use the SI_WATERFALL pseudo instruction, like for sdag, to benef

[AMDGPU][GlobalISel] Fix waterfall loops

- Move the `s_and exec` to its correct position before the content of
the waterfall loop
- Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit
from optimizations
- Add support for indirect function calls

To support indirect calls, add a G_SI_CALL instruction without register
class restrictions and insert a waterfall loop when applying register
banks.

Differential Revision: https://reviews.llvm.org/D109052

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# 59f68652 21-Jul-2021 Jay Foad <[email protected]>

[AMDGPU][GISel] Fix MMO for raw/struct buffer access with non-constant offset

Codegen for the raw/struct buffer access intrinsics would update the
offset in the MMO to reflect the combined offset, i

[AMDGPU][GISel] Fix MMO for raw/struct buffer access with non-constant offset

Codegen for the raw/struct buffer access intrinsics would update the
offset in the MMO to reflect the combined offset, if it was known to be
constant. If the combined offset was not known to be constant, or if
there was an index, it would set the offset in the MMO to 0. This is
unsafe because it makes it look like the access does not alias with
another access with a fixed non-zero offset.

Fix these cases by setting the pointer in the MMO to null, to reflect
the fact that we do not have any known IR value pointer + constant
offset for the access.

D106284 did this for SelectionDAG. This is the corresponding fix for
GlobalISel.

Differential Revision: https://reviews.llvm.org/D106451

show more ...


Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# fae05692 20-May-2021 Matt Arsenault <[email protected]>

CodeGen: Print/parse LLTs in MachineMemOperands

This will currently accept the old number of bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted

CodeGen: Print/parse LLTs in MachineMemOperands

This will currently accept the old number of bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted all of the tests already, but likely missed a few).

Not sure what the exact syntax and policy should be. We can continue
printing the number of bytes for non-generic instructions to avoid
test churn and only allow non-scalar types for generic instructions.

This will currently print the LLT in parentheses, but accept parsing
the existing integers and implicitly converting to scalar. The
parentheses are a bit ugly, but the parser logic seems unable to deal
without either parentheses or some keyword to indicate the start of a
type.

show more ...


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2
# 3bffb1cd 09-Feb-2021 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] Use single cache policy operand

Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amoun

[AMDGPU] Use single cache policy operand

Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469

show more ...


# 62d946e1 07-Feb-2021 Matt Arsenault <[email protected]>

GlobalISel: Merge some AMDGPU ABI lowering code to generic code

AMDGPU currently has a lot of pre-processing code to pre-split
argument types into 32-bit pieces before passing it to the generic
code

GlobalISel: Merge some AMDGPU ABI lowering code to generic code

AMDGPU currently has a lot of pre-processing code to pre-split
argument types into 32-bit pieces before passing it to the generic
code in handleAssignments. This is a bit sloppy and also requires some
overly fancy iterator work when building the calls. It's better if all
argument marshalling code is handled directly in
handleAssignments. This handles more situations like decomposing large
element vectors into sub-element sized pieces.

This should mostly be NFC, but does change the generated code by
shifting where the initial argument packing instructions are placed. I
think this is nicer looking, since it now emits the packing code
directly after the relevant copies, rather than after the copies for
the remaining arguments.

This doubles down on gfx6/gfx7 using the gfx8+ ABI for 16-bit
types. This is ultimately the better option, but incompatible with the
DAG. Fixing this requires more work, especially for f16.

show more ...


# a8d9d507 17-Feb-2021 Stanislav Mekhanoshin <[email protected]>

[AMDGPU] gfx90a support

Differential Revision: https://reviews.llvm.org/D96906


Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2
# 8214982b 21-Jan-2021 Sebastian Neubauer <[email protected]>

[AMDGPU] Implement mir parseCustomPseudoSourceValue

Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Relands ba7dc

[AMDGPU] Implement mir parseCustomPseudoSourceValue

Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Relands ba7dcd8542ab, which had memory leaks.

Differential Revision: https://reviews.llvm.org/D95215

show more ...


# 4dbdff66 21-Jan-2021 Sebastian Neubauer <[email protected]>

Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue"

This reverts commit ba7dcd8542abfc784255efcb0767701dec42fe83.

(caused memory leaks)


# ba7dcd85 15-Jan-2021 Sebastian Neubauer <[email protected]>

[AMDGPU] Implement mir parseCustomPseudoSourceValue

Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Differential

[AMDGPU] Implement mir parseCustomPseudoSourceValue

Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Differential Revision: https://reviews.llvm.org/D94768

show more ...


Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4
# d07f9e73 10-Mar-2020 Carl Ritson <[email protected]>

[AMDGPU] Allow struct.buffer.*.format intrinsics to accept i32

Summary:
In the same manner as struct.buffer.load / struct.buffer.store,
allow struct.buffer.load.format / struct.buffer.store.format t

[AMDGPU] Allow struct.buffer.*.format intrinsics to accept i32

Summary:
In the same manner as struct.buffer.load / struct.buffer.store,
allow struct.buffer.load.format / struct.buffer.store.format to
return / accept any type. This simplifies front-end code gen.

Reviewers: tpr, arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75789

show more ...


Revision tags: llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init
# 97711228 13-Jan-2020 Matt Arsenault <[email protected]>

AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load.format