AVXTranspose.cpp - OpenGrok history log for /llvm-project-15.0.7/mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp

Revision	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# eda6f907	22-Apr-2022	River Riddle	[mlir][NFC] Shift a bunch of dialect includes from the .h to the .cpp. Now that dialect constructors are generated in the .cpp file, we can drop all of the dependent dialect includes from the .h file. Differential Revision: https://reviews.llvm.org/D124298
Revision tags: llvmorg-14.0.1
# 7c38fd60	28-Mar-2022	Jacques Pienaar	[mlir] Flip Vector dialect accessors used to prefixed form. This has been on _Both for a couple of weeks. Flip usages in core with the intention of flipping the flag to _Prefixed in a follow-up. Needed to add a couple of helper methods in AffineOps and Linalg to facilitate a pure flag flip in the follow-up, as some of these classes are used in templates and are therefore sensitive to Vector dialect changes. Differential Revision: https://reviews.llvm.org/D122151
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# 875bbce9	25-Feb-2022	Diego Caballero	[mlir][Vector] Prevent AVX2 lowering for non-f32 transpose ops. The AVX2 lowering for transpose operations is only applicable to f32 vector types. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D120427
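A minimal sketch of the applicability gate this commit describes, written in plain C++ rather than against MLIR's type system. `ElemKind` and `isAvx2TransposeCandidate` are hypothetical names. The accepted shapes (4x8 and 8x8 source vectors) are an assumption taken from commit 34ff8573, and the n-D generalization from d7e0a084 is ignored here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for an MLIR element type.
enum class ElemKind { F16, F32, F64, I32 };

// Sketch: only f32 transposes of a plain 2-D 4x8 or 8x8 vector are handed to
// the specialized AVX2 kernels; anything else is assumed to fall back to the
// generic vector.transpose lowering.
bool isAvx2TransposeCandidate(ElemKind elem, const std::vector<int64_t>& shape) {
  if (elem != ElemKind::F32) return false;  // non-f32 is rejected (D120427)
  if (shape.size() != 2) return false;      // n-D case not modeled here
  const int64_t rows = shape[0], cols = shape[1];
  return (rows == 4 && cols == 8) || (rows == 8 && cols == 8);
}
```

Note that a 32-bit integer element type is rejected too, even though it has the same lane width as f32; the gate is on the element type, not just its bit width.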
# d7e0a084	25-Feb-2022	Diego Caballero	[mlir][Vector] Generalize AVX2 transpose lowering to n-D vectors. The existing AVX2 lowering patterns for the transpose op only trigger if the input vector is 2-D. This patch extends the patterns to trigger for n-D vectors that are effectively 2-D vectors (e.g., vector<1x4x1x8x1). The main constraint for the generalized AVX2 patterns to be applicable to these vectors is that the dimensions greater than one must be the ones being transposed. Otherwise, the existing patterns are not applicable. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D119505
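The "effectively 2-D" condition can be sketched as a check on the shape and the permutation. This assumes `vector.transpose` semantics where result dimension `i` is source dimension `perm[i]`; `effectively2DTranspose` is a hypothetical helper, not the function MLIR uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper, not the MLIR API. Uses vector.transpose semantics:
// result dimension i is source dimension perm[i].
// Returns {rows, cols} of the equivalent 2-D transpose, or std::nullopt when
// the transpose is not effectively 2-D with both non-unit dims swapped.
// Indices in perm are range-checked; duplicates are not diagnosed.
std::optional<std::pair<int64_t, int64_t>>
effectively2DTranspose(const std::vector<int64_t>& shape,
                       const std::vector<int64_t>& perm) {
  if (perm.size() != shape.size()) return std::nullopt;

  // Non-unit source dimensions, in source order.
  std::vector<int64_t> srcNonUnit;
  for (std::size_t d = 0; d < shape.size(); ++d)
    if (shape[d] > 1) srcNonUnit.push_back(static_cast<int64_t>(d));
  if (srcNonUnit.size() != 2) return std::nullopt;

  // The same non-unit dimensions, in the order they appear in the result.
  std::vector<int64_t> dstNonUnit;
  for (int64_t p : perm) {
    if (p < 0 || p >= static_cast<int64_t>(shape.size())) return std::nullopt;
    if (shape[p] > 1) dstNonUnit.push_back(p);
  }

  // Unit dimensions may move freely; the two non-unit ones must be swapped.
  if (dstNonUnit.size() != 2 || dstNonUnit[0] != srcNonUnit[1] ||
      dstNonUnit[1] != srcNonUnit[0])
    return std::nullopt;
  return std::make_pair(shape[srcNonUnit[0]], shape[srcNonUnit[1]]);
}
```

For a `1x4x1x8x1` shape with permutation `[0, 3, 2, 1, 4]` the check reports a 4x8 transpose, while the identity permutation is rejected because the two non-unit dimensions are not swapped.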
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# 42398b51	26-Jan-2022	Nicolas Vasilache	[mlir][LLVM] Add support for operand_attrs to InlineAsmOp. This revision adds enough support to allow InlineAsmOp to work properly with indirect memory constraints "m". These require an explicit "elementtype" TypeAttr on the operands, which must be provided to pass LLVM verification. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D118006
# 99ef9eeb	31-Jan-2022	Matthias Springer	[mlir][vector][NFC] Split into IR, Transforms and Utils. This reduces the dependencies of the MLIRVector target and makes the dialect consistent with other dialects. Differential Revision: https://reviews.llvm.org/D118533
# 7ebd22c5	26-Jan-2022	Mehdi Amini	Revert "[mlir][LLVM] Add support for operand_attrs to InlineAsmOp". This reverts commit e6ce2c0b8d5f8253791bf87145669c58328c30db. The test is failing in CI right now.
# e6ce2c0b	26-Jan-2022	Nicolas Vasilache	[mlir][LLVM] Add support for operand_attrs to InlineAsmOp. This revision adds enough support to allow InlineAsmOp to work properly with indirect memory constraints "m". These require an explicit "elementtype" TypeAttr on the operands, which must be provided to pass LLVM verification. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D118006
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 02b6fb21	20-Dec-2021	Mehdi Amini	Fix clang-tidy issues in mlir/ (NFC). Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D115956
Revision tags: llvmorg-13.0.1-rc1
# b2729fda	22-Nov-2021	Nicolas Vasilache	[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm). This revision follows up on the conversation titled: ```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths``` The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between an intrinsics implementation and an inline_asm implementation. This results in roughly 17% fewer cycles (2415 vs. 2015 total cycles, about a 1.2x speedup) as reported by llvm-mca:
After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```
After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```
An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).
Differential Revision: https://reviews.llvm.org/D114393
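The blend trick can be checked with a scalar emulation of the intrinsics. The sketch below assumes the commonly used formulation: one `vshufps` with immediate `0x4E` plus two `vblendps` replace the pair of `vshufps` with immediates `0x44`/`0xEE` in the middle stage of the 8x8 transpose. The function names are hypothetical and this is not the MLIR code; it only shows that both forms compute the same values for any inputs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <utility>

using V8 = std::array<float, 8>;  // one emulated 256-bit ymm register of f32

// Scalar model of _mm256_shuffle_ps: within each 128-bit lane, two elements
// from a then two from b, selected by 2-bit fields of imm.
V8 shufflePs(const V8& a, const V8& b, uint8_t imm) {
  V8 r{};
  for (int base : {0, 4}) {
    r[base + 0] = a[base + (imm & 3)];
    r[base + 1] = a[base + ((imm >> 2) & 3)];
    r[base + 2] = b[base + ((imm >> 4) & 3)];
    r[base + 3] = b[base + ((imm >> 6) & 3)];
  }
  return r;
}

// Scalar model of _mm256_blend_ps: bit i of imm selects b[i] instead of a[i].
V8 blendPs(const V8& a, const V8& b, uint8_t imm) {
  V8 r{};
  for (int i = 0; i < 8; ++i) r[i] = ((imm >> i) & 1) ? b[i] : a[i];
  return r;
}

// Shuffle-only form: two vshufps per pair of inputs.
std::pair<V8, V8> shuffleOnlyStage(const V8& x, const V8& y) {
  return {shufflePs(x, y, 0x44),   // per lane: x0 x1 y0 y1
          shufflePs(x, y, 0xEE)};  // per lane: x2 x3 y2 y3
}

// Blend-based form: one vshufps plus two vblendps.
std::pair<V8, V8> blendStage(const V8& x, const V8& y) {
  V8 v = shufflePs(x, y, 0x4E);    // per lane: x2 x3 y0 y1
  return {blendPs(x, v, 0xCC),     // keep x0 x1, take y0 y1 from v
          blendPs(y, v, 0x33)};    // take x2 x3 from v, keep y2 y3
}
```

The payoff is in port pressure rather than instruction count (which grows, 5900 vs. 6300 in the llvm-mca runs): on Skylake-class cores `vshufps` issues only on port 5, while `vblendps` can also use ports 0 and 1. That reading is consistent with the llvm-mca output in this commit, where SKXPort5 is the dominant bottleneck in the shuffle-only version.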
# e0b7bee7	22-Nov-2021	Mehdi Amini	Revert "[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)". This reverts commit a9e236bed835c58be381dadb973a1db0681e4795. This broke the Windows build: mlir\include\mlir/Dialect/X86Vector/Transforms.h(28): error C2061: syntax error: identifier 'uint'
# a9e236be	22-Nov-2021	Nicolas Vasilache	[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm). This revision follows up on the conversation titled: ```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths``` The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between an intrinsics implementation and an inline_asm implementation. This results in roughly 17% fewer cycles (2415 vs. 2015 total cycles, about a 1.2x speedup) as reported by llvm-mca:
After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```
After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```
An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).
Reviewed By: ftynse, dcaballe Differential Revision: https://reviews.llvm.org/D114335
# f04a1237	11-Nov-2021	Benjamin Kramer	[mlir][X86Vector] Fix unused variable warning
# a085c4b5	11-Nov-2021	Nicolas Vasilache	[mlir][Vector] Silence recently introduced warnings
# 34ff8573	10-Nov-2021	Nicolas Vasilache	[mlir][X86Vector] Add specialized vector.transpose lowering patterns for AVX2. This revision adds an implementation of 2-D vector.transpose for 4x8 and 8x8 for AVX2 and surfaces it to the Linalg level of control. Reviewed By: dcaballe Differential Revision: https://reviews.llvm.org/D113347
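For readers unfamiliar with the 8x8 case, the classic AVX recipe for transposing an 8x8 f32 tile has three stages: `unpacklo/hi_ps`, `shuffle_ps`, and `permute2f128_ps`. The sketch emulates those `_mm256_*` intrinsics on `std::array<float, 8>` so it runs without AVX hardware. We assume, but this log does not state, that the MLIR lowering follows this well-known sequence; all function names here are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>

using V8 = std::array<float, 8>;  // one emulated 256-bit ymm register of f32
using Mat8 = std::array<V8, 8>;   // 8 rows, one register per row

// _mm256_unpacklo_ps: per 128-bit lane, interleave elements 0 and 1 of a, b.
V8 unpackLoPs(const V8& a, const V8& b) {
  V8 r{};
  for (int base : {0, 4}) {
    r[base + 0] = a[base + 0]; r[base + 1] = b[base + 0];
    r[base + 2] = a[base + 1]; r[base + 3] = b[base + 1];
  }
  return r;
}

// _mm256_unpackhi_ps: per 128-bit lane, interleave elements 2 and 3 of a, b.
V8 unpackHiPs(const V8& a, const V8& b) {
  V8 r{};
  for (int base : {0, 4}) {
    r[base + 0] = a[base + 2]; r[base + 1] = b[base + 2];
    r[base + 2] = a[base + 3]; r[base + 3] = b[base + 3];
  }
  return r;
}

// _mm256_shuffle_ps: per lane, two elements of a then two of b, chosen by imm.
V8 shufflePs(const V8& a, const V8& b, uint8_t imm) {
  V8 r{};
  for (int base : {0, 4}) {
    r[base + 0] = a[base + (imm & 3)];
    r[base + 1] = a[base + ((imm >> 2) & 3)];
    r[base + 2] = b[base + ((imm >> 4) & 3)];
    r[base + 3] = b[base + ((imm >> 6) & 3)];
  }
  return r;
}

// _mm256_permute2f128_ps without the zeroing bits: each result half picks
// a.lo (0), a.hi (1), b.lo (2) or b.hi (3).
V8 permute2f128Ps(const V8& a, const V8& b, uint8_t imm) {
  V8 r{};
  for (int half = 0; half < 2; ++half) {
    int sel = (imm >> (4 * half)) & 3;
    const V8& src = sel < 2 ? a : b;
    for (int i = 0; i < 4; ++i) r[4 * half + i] = src[4 * (sel & 1) + i];
  }
  return r;
}

Mat8 transpose8x8(const Mat8& r) {
  Mat8 t{}, s{}, out{};
  // Stage 1: interleave row pairs (r0,r1), (r2,r3), (r4,r5), (r6,r7).
  for (int i = 0; i < 4; ++i) {
    t[2 * i] = unpackLoPs(r[2 * i], r[2 * i + 1]);
    t[2 * i + 1] = unpackHiPs(r[2 * i], r[2 * i + 1]);
  }
  // Stage 2: build 4-element column fragments inside each 128-bit lane.
  for (int g = 0; g < 2; ++g) {  // g = 0: rows 0-3, g = 1: rows 4-7
    const int o = 4 * g;
    s[o + 0] = shufflePs(t[o + 0], t[o + 2], 0x44);
    s[o + 1] = shufflePs(t[o + 0], t[o + 2], 0xEE);
    s[o + 2] = shufflePs(t[o + 1], t[o + 3], 0x44);
    s[o + 3] = shufflePs(t[o + 1], t[o + 3], 0xEE);
  }
  // Stage 3: stitch 128-bit halves across the two row groups.
  for (int i = 0; i < 4; ++i) {
    out[i] = permute2f128Ps(s[i], s[i + 4], 0x20);      // columns 0-3
    out[i + 4] = permute2f128Ps(s[i], s[i + 4], 0x31);  // columns 4-7
  }
  return out;
}

// Test helpers: m[i][j] = 8 * i + j, and a check that t[j][i] == m[i][j].
Mat8 iotaMatrix() {
  Mat8 m{};
  for (int i = 0; i < 8; ++i)
    for (int j = 0; j < 8; ++j) m[i][j] = static_cast<float>(8 * i + j);
  return m;
}

bool isTransposeOf(const Mat8& t, const Mat8& m) {
  for (int i = 0; i < 8; ++i)
    for (int j = 0; j < 8; ++j)
      if (t[j][i] != m[i][j]) return false;
  return true;
}
```

Each output row `i` holds column `i` of the input. The final `permute2f128` stage is needed because `shuffle_ps` and the unpack instructions only move data within a 128-bit lane.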