|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
aaaf9ced |
| 27-May-2022 |
Luo, Yuanke <[email protected]> |
[X86][AMX] Replace LDTILECFG with PLDTILECFGV on auto-config.
There is intrinsic `@llvm.x86.ldtilecfg` which is lowered to LDTILECFG. This intrinsic is open for user to configure tile registers by t
[X86][AMX] Replace LDTILECFG with PLDTILECFGV on auto-config.
There is intrinsic `@llvm.x86.ldtilecfg` which is lowered to LDTILECFG. This intrinsic is open for user to configure tile registers by themselves. There is a chance that `@llvm.x86.ldtilecfg` would be mixed with the new AMX intrinsics which depend on compiler to configure tile registers. Separate pusedo instruction PLDTILECFGV would avoid unexpected behavious when `@llvm.x86.ldtilecfg` is mixed with new AMX intrinsics. Though user should not mix the two programming model, compiler should avoid crash or UB when they are mixed.
Differential Revision: https://reviews.llvm.org/D126519
show more ...
|
|
Revision tags: llvmorg-14.0.4 |
|
| #
373ce147 |
| 04-May-2022 |
Luo, Yuanke <[email protected]> |
[X86][AMX] Replace PXOR instruction with SET0 in AMX pre config.
To generate zero value, the PXOR instruction need 3 operands that is tied to the same vreg. If is not good in SSA form and with undef
[X86][AMX] Replace PXOR instruction with SET0 in AMX pre config.
To generate zero value, the PXOR instruction need 3 operands that is tied to the same vreg. If is not good in SSA form and with undef value two address instruction pass may convert `%0:vr128 = PXORrr undef %0, undef %0` to `%1:vr128 = PXORrr undef %1:vr128(tied-def 0), undef %0:vr128`. It is not expected. It can be simplified to SET0 instruction which only take 1 destination operand. It should be more friendly to two address instruction pass and register allocation pass. `%0:vr128 = V_SET0` Also add AVX1 code path so that it is consistant to other code.
Differential Revision: https://reviews.llvm.org/D124903
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
f3ad7ea0 |
| 24-Apr-2022 |
Luo, Yuanke <[email protected]> |
[X86][AMX] Report error when shapes are not pre-defined.
Instead of report fatal error, this patch emit error message and exit when shapes are not pre-defined. This would cause the compiling fail bu
[X86][AMX] Report error when shapes are not pre-defined.
Instead of report fatal error, this patch emit error message and exit when shapes are not pre-defined. This would cause the compiling fail but not crash.
Differential Revision: https://reviews.llvm.org/D124342
show more ...
|
|
Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
c4dba471 |
| 17-Nov-2021 |
Luo, Yuanke <[email protected]> |
[X86][AMX] Don't emit tilerelease for old AMX instrisic.
We should avoid mixing old AMX instrinsic with new AMX intrinsic. For old AMX intrinsic, user is responsible for invoking tile release. This
[X86][AMX] Don't emit tilerelease for old AMX instrisic.
We should avoid mixing old AMX instrinsic with new AMX intrinsic. For old AMX intrinsic, user is responsible for invoking tile release. This patch is to check if there is any tile config generated by compiler. If so it emit tilerelease instruction, otherwise it don't emit the instruction.
Differential Revision: https://reviews.llvm.org/D114066
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
db23f277 |
| 17-Sep-2021 |
Simon Pilgrim <[email protected]> |
[X86] X86PreTileConfig - Use const-ref iterator in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
|
|
Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2 |
|
| #
4ed2b6cc |
| 26-May-2021 |
Luo, Yuanke <[email protected]> |
[X86][AMX] Fix a bug on tile config.
The previous code detect if a MBB is bottom block to determine if it is a backedge of a loop. We should check latch block instead of bottom block and we should c
[X86][AMX] Fix a bug on tile config.
The previous code detect if a MBB is bottom block to determine if it is a backedge of a loop. We should check latch block instead of bottom block and we should check the header and the bottom block are in the same loop.
Differential Revision: https://reviews.llvm.org/D103145
show more ...
|
|
Revision tags: llvmorg-12.0.1-rc1 |
|
| #
f69adfb8 |
| 28-Apr-2021 |
Wang, Pengfei <[email protected]> |
[X86][AMX][NFC] Add more comments and remove unnecessary check found by Clocwork
|
| #
016092d7 |
| 27-Apr-2021 |
Wang, Pengfei <[email protected]> |
Reapply "[X86][AMX] Try to hoist AMX shapes' def"
We request no intersections between AMX instructions and their shapes' def when we insert ldtilecfg. However, this is not always ture resulting from
Reapply "[X86][AMX] Try to hoist AMX shapes' def"
We request no intersections between AMX instructions and their shapes' def when we insert ldtilecfg. However, this is not always ture resulting from not only users don't follow AMX API model, but also optimizations.
This patch adds a mechanism that tries to hoist AMX shapes' def as well. It only hoists shapes inside a BB, we can improve it for cases across BBs in future. Currently, it only hoists shapes of which all sources' def above the first AMX instruction. We can improve for the case that only source that moves an immediate value to a register below AMX instruction.
Reviewed By: xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D101067
show more ...
|
| #
caea37b3 |
| 23-Apr-2021 |
Mitch Phillips <[email protected]> |
Revert "[X86][AMX] Try to hoist AMX shapes' def"
This reverts commit 90118563ad0f133c696e070ad72761fa0daa4517.
Reason: Broke the MSan buildbots. https://lab.llvm.org/buildbot/#/builders/5/builds/69
Revert "[X86][AMX] Try to hoist AMX shapes' def"
This reverts commit 90118563ad0f133c696e070ad72761fa0daa4517.
Reason: Broke the MSan buildbots. https://lab.llvm.org/buildbot/#/builders/5/builds/6967/steps/9/logs/stdio
More details can be found in the original phabricator review: https://reviews.llvm.org/D101067
show more ...
|
| #
151e244f |
| 23-Apr-2021 |
Wang, Pengfei <[email protected]> |
[X86][AMX][NFC] Make comparison operators to be complete
The previous D101039 didn't fix the SmallSet insertion issue, due to we always return false for the comparison between 2 different nonnull BB
[X86][AMX][NFC] Make comparison operators to be complete
The previous D101039 didn't fix the SmallSet insertion issue, due to we always return false for the comparison between 2 different nonnull BBs. This patch makes the the comparison to be complete by comparing `MBB` first, so that we can always get the invariant order by a single operator.
show more ...
|
| #
90118563 |
| 22-Apr-2021 |
Wang, Pengfei <[email protected]> |
[X86][AMX] Try to hoist AMX shapes' def
We request no intersections between AMX instructions and their shapes' def when we insert ldtilecfg. However, this is not always ture resulting from not only
[X86][AMX] Try to hoist AMX shapes' def
We request no intersections between AMX instructions and their shapes' def when we insert ldtilecfg. However, this is not always ture resulting from not only users don't follow AMX API model, but also optimizations.
This patch adds a mechanism that tries to hoist AMX shapes' def as well. It only hoists shapes inside a BB, we can improve it for cases across BBs in future. Currently, it only hoists shapes of which all sources' def above the first AMX instruction. We can improve for the case that only source that moves an immediate value to a register below AMX instruction.
Differential Revision: https://reviews.llvm.org/D101067
show more ...
|
| #
aafb6d81 |
| 22-Apr-2021 |
Wang, Pengfei <[email protected]> |
[X86][AMX][NFC] Remove assert for comparison between different BBs.
SmallSet may use operator `<` when we insert MIRef elements, so we cannot limit the comparison between different BBs.
We allow MI
[X86][AMX][NFC] Remove assert for comparison between different BBs.
SmallSet may use operator `<` when we insert MIRef elements, so we cannot limit the comparison between different BBs.
We allow MIRef() to be less that any initialized MIRef object, otherwise, we always reture false when compare between different BBs.
Differential Revision: https://reviews.llvm.org/D101039
show more ...
|
| #
a3b52a9d |
| 14-Apr-2021 |
Wang, Pengfei <[email protected]> |
[X86][AMX] Refactor for PostRA ldtilecfg pass.
This is a follow up of D99010. We didn't consider the live range of shape registers when hoist ldtilecfg. There maybe risks, e.g. we happen to insert i
[X86][AMX] Refactor for PostRA ldtilecfg pass.
This is a follow up of D99010. We didn't consider the live range of shape registers when hoist ldtilecfg. There maybe risks, e.g. we happen to insert it to an invalid range of some registers and get unexpected error.
This patch fixes this problem by storing the value to corresponding stack place of ldtilecfg after all its definition immediately.
This patch also fix a problem in previous code: If we don't have a ldtilecfg which dominates all AMX instructions, we cannot initialize shapes for other ldtilecfg.
There're still some optimization points left. E.g. eliminate unused mov instructions, break the def-use dependency before RA etc.
Reviewed By: LuoYuanke, xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D99966
show more ...
|
| #
4cbaaf4a |
| 12-Apr-2021 |
Wang, Pengfei <[email protected]> |
[X86][AMX] Hoist ldtilecfg
The previous code calculated the first ldtilecfg by dominating all AMX registers' def. This may result in the ldtilecfg being inserted into a loop.
This patch try to calc
[X86][AMX] Hoist ldtilecfg
The previous code calculated the first ldtilecfg by dominating all AMX registers' def. This may result in the ldtilecfg being inserted into a loop.
This patch try to calculate the nearest point where all shapes of AMX registers are reachable.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D99010
show more ...
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
2327513b |
| 20-Mar-2021 |
Wang, Pengfei <[email protected]> |
[X86] Fix a bug when calculating the ldtilecfg insertion points.
The BB we initialized the ldtilecfg is special. We don't need to check if its predecessor BBs need to insert ldtilecfg for calls.
We
[X86] Fix a bug when calculating the ldtilecfg insertion points.
The BB we initialized the ldtilecfg is special. We don't need to check if its predecessor BBs need to insert ldtilecfg for calls.
We reused the flag HasCallBeforeAMX, so that the predecessors won't be added to CfgNeedInsert.
This case happens only when the entry BB is in a loop. We need to hoist the first tile config point out of the loop in future.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D98845
show more ...
|
| #
0002d4bf |
| 18-Mar-2021 |
Bing1 Yu <[email protected]> |
[X86][AMX][NFC] Give correct Passname for Tile Register Pre-configure
|
|
Revision tags: llvmorg-12.0.0-rc3 |
|
| #
4bc7c863 |
| 24-Feb-2021 |
Liu, Chen3 <[email protected]> |
[X86] Support amx-bf16 intrinsic.
Adding support for intrinsics of AMX-BF16. This patch alse fix a bug that AMX-INT8 instructions will be selected with wrong predicate.
Differential Revision: https
[X86] Support amx-bf16 intrinsic.
Adding support for intrinsics of AMX-BF16. This patch alse fix a bug that AMX-INT8 instructions will be selected with wrong predicate.
Differential Revision: https://reviews.llvm.org/D97358
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc2 |
|
| #
f8b9035a |
| 23-Feb-2021 |
Liu, Chen3 <[email protected]> |
[X86] Support amx-int8 intrinsic.
Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD.
Differential Revision: https://reviews.llvm.org/D97259
|
| #
e9c11c19 |
| 18-Feb-2021 |
Wang, Pengfei <[email protected]> |
[X86] Zero AMX config buffer for non AVX512 cases.
Zero AMX config buffer for non AVX512 cases.
Differential Revision: https://reviews.llvm.org/D96927
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3 |
|
| #
a5d9e0c7 |
| 30-Jan-2021 |
Wang, Pengfei <[email protected]> |
[X86] Fix tile config register spill issue.
This is an optimized approach for D94155.
Previous code build the model that tile config register is the user of each AMX instruction. There is a problem
[X86] Fix tile config register spill issue.
This is an optimized approach for D94155.
Previous code build the model that tile config register is the user of each AMX instruction. There is a problem for the tile config register spill. When across function, the ldtilecfg instruction may be inserted on each AMX instruction which use tile config register. This cause all tile data register clobber.
To fix this issue, we remove the model of tile config register. Instead, we analyze the AMX instructions between one call to another. We will insert ldtilecfg after the first call if we find any AMX instructions.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D95136
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
64132f54 |
| 21-Jan-2021 |
Luo, Yuanke <[email protected]> |
Revert "[X86][AMX] Fix tile config register spill issue."
This reverts commit 20013d02f3352a88d0838eed349abc9a2b0e9cc0.
|
|
Revision tags: llvmorg-11.1.0-rc1 |
|
| #
20013d02 |
| 05-Jan-2021 |
Luo, Yuanke <[email protected]> |
[X86][AMX] Fix tile config register spill issue.
Previous code build the model that tile config register is the user of each AMX instruction. There is a problem for the tile config register spill. W
[X86][AMX] Fix tile config register spill issue.
Previous code build the model that tile config register is the user of each AMX instruction. There is a problem for the tile config register spill. When across function, the ldtilecfg instruction may be inserted on each AMX instruction which use tile config register. This cause all tile data register clobber. To fix this issue, we remove the model of tile config register. We analyze the regmask of call instruction and insert ldtilecfg if there is any tile data register live across the call. Inserting the sttilecfg before the call is unneccessary, because the tile config doesn't change and we can just reload the config. Besides we also need check tile config register interference. Since we don't model the config register we should check interference from the ldtilecfg to each tile data register def. ldtilecfg / \ BB1 BB2 / \ call BB3 / \ %1=tileload %2=tilezero We can start from the instruction of each tile def, and backward to ldtilecfg. If there is any call instruction, and tile data register is not preserved, we should insert ldtilecfg after the call instruction.
Differential Revision: https://reviews.llvm.org/D94155
show more ...
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
08665b18 |
| 08-Dec-2020 |
Luo, Yuanke <[email protected]> |
Support tilezero intrinsic and c interface for AMX.
Differential Revision: https://reviews.llvm.org/D92837
|
|
Revision tags: llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
| #
f80b2987 |
| 06-Sep-2020 |
Luo, Yuanke <[email protected]> |
[X86] AMX programming model. This patch implements amx programming model that discussed in llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thank Hal for the good sugge
[X86] AMX programming model. This patch implements amx programming model that discussed in llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thank Hal for the good suggestion in the RA. The fast RA is not in the patch yet. This patch implemeted 7 components.
1. The c interface to end user. 2. The AMX intrinsics in LLVM IR. 3. Transform load/store <256 x i32> to AMX intrinsics or split the type into two <128 x i32>. 4. The Lowering from AMX intrinsics to AMX pseudo instruction. 5. Insert psuedo ldtilecfg and build the def-use between ldtilecfg to amx intruction. 6. The register allocation for tile register. 7. Morph AMX pseudo instruction to AMX real instruction.
Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0
Differential Revision: https://reviews.llvm.org/D87981
show more ...
|