NvGpuSupport.cpp - OpenGrok history log for /llvm-project-15.0.7/mlir/lib/Conversion/VectorToGPU/NvGpuSupport.cpp

Revision	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# d2c0572b	19-Jul-2022	Jacques Pienaar	[mlir] Flip LinAlg dialect to _Both. This one required more changes than ideal due to overlapping generated name with different return types. Changed getIndexingMaps to getIndexingMapsArray to move it out of the way/highlight that it returns (more expensively) a SmallVector and uses the prefixed name for the Attribute. Differential Revision: https://reviews.llvm.org/D129919
# 8dfdb80f	17-Jul-2022	Kazu Hirata	Ensure newlines at the end of files (NFC)
Revision tags: llvmorg-14.0.6
# 670eee08	17-Jun-2022	Christopher Bate	[mlir][VectorToGPU] Fix support for i4, col-major operand support. For the conversion to nvgpu `mma.sync` and `ldmatrix` pathways, the code was missing support for the `i4` data type. While fixing this, another bug was discovered that caused the number of ldmatrix tiles calculated for certain operand types and configurations to be incorrect. This change fixes both issues and adds additional tests. Differential Revision: https://reviews.llvm.org/D128074
Revision tags: llvmorg-14.0.5
# 51b925df	07-Jun-2022	Christopher Bate	[mlir][nvgpu] shared memory access optimization pass. This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last dimension index values such that element locations in the final (col) dimension are given by `newColIdx = col % vecSize + perm[row](col/vecSize,row)` where `perm` is a permutation function indexed by `row` and `vecSize` is the vector access size in elements (currently assumes 128-bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory. Differential Revision: https://reviews.llvm.org/D127457
Revision tags: llvmorg-14.0.4
# 1ca772ed	17-May-2022	Christopher Bate	[MLIR][GPU] Add NvGpu mma.sync path to the VectorToGPU pass. This change adds the option to lower to NvGpu dialect ops during the VectorToGPU conversion pass. Because this transformation reuses existing VectorToGPU logic, a separate VectorToNvGpu conversion pass is not created. The option `use-nvgpu` is added to the VectorToGPU pass. When this is true, the pass will attempt to convert slices rooted at `vector.contract` operations into `nvgpu.mma.sync` ops, and `vector.transfer_read` ops are converted to either `nvgpu.ldmatrix` or one or more `vector.load` operations. The specific data loaded will depend on the thread id within a subgroup (warp). These index calculations depend on data type and shape of the MMA op according to the downstream PTX specification. The code for supporting these details is separated into `NvGpuSupport.cpp|h`. Differential Revision: https://reviews.llvm.org/D122940
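
The column-index rewrite quoted in the 51b925df entry, `newColIdx = col % vecSize + perm[row](col/vecSize,row)`, can be illustrated with a small standalone sketch. This is not the code from that commit. The log does not say what `perm` is, so the XOR-based permutation below, the scaling of its result by `vecSize`, the power-of-two row layout, and the names `permuteVectorIndex` and `swizzledColumn` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for `perm[row]`: XOR the index of the vector within
// the row with the row number, wrapped to the number of vectors per row.
// Assumes vecsPerRow is a power of two, so the result stays in range and the
// mapping is a bijection for any fixed row.
inline int64_t permuteVectorIndex(int64_t vecIdx, int64_t row,
                                  int64_t vecsPerRow) {
  return vecIdx ^ (row % vecsPerRow);
}

// Computes the rewritten last-dimension index for element (row, col).
// vecSize is the vector access size in elements; rowWidth is the number of
// elements in the last dimension (assumed to be a multiple of vecSize).
inline int64_t swizzledColumn(int64_t row, int64_t col, int64_t vecSize,
                              int64_t rowWidth) {
  assert(vecSize > 0 && rowWidth % vecSize == 0);
  int64_t vecsPerRow = rowWidth / vecSize;
  // The offset inside the vector is preserved; only whole vectors move.
  int64_t offsetInVector = col % vecSize;
  int64_t permuted = permuteVectorIndex(col / vecSize, row, vecsPerRow);
  // Assumption: the permuted vector index is scaled back to elements.
  return offsetInVector + permuted * vecSize;
}
```

With `vecSize = 8` (a 128-bit access of 16-bit elements) and a 64-element row, row 0 is left unchanged, while in row 1 each pair of neighboring vectors swaps places, so threads reading the same column in consecutive rows touch different 128-bit slots.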