|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
d2c0572b |
| 19-Jul-2022 |
Jacques Pienaar <[email protected]> |
[mlir] Flip LinAlg dialect to _Both
This one required more changes than ideal due to overlapping generated name with different return types. Changed getIndexingMaps to getIndexingMapsArray to move i
[mlir] Flip LinAlg dialect to _Both
This one required more changes than ideal due to overlapping generated name with different return types. Changed getIndexingMaps to getIndexingMapsArray to move it out of the way/highlight that it returns (more expensively) a SmallVector and uses the prefixed name for the Attribute.
Differential Revision: https://reviews.llvm.org/D129919
show more ...
|
| #
8dfdb80f |
| 17-Jul-2022 |
Kazu Hirata <[email protected]> |
Ensure newlines at the end of files (NFC)
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
670eee08 |
| 17-Jun-2022 |
Christopher Bate <[email protected]> |
[mlir][VectorToGPU] Fix support for i4, col-major operand support
For the conversion to nvgpu `mma.sync` and `ldmatrix` pathways, the code was missing support for the `i4` data type. While fixing th
[mlir][VectorToGPU] Fix support for i4, col-major operand support
For the conversion to nvgpu `mma.sync` and `ldmatrix` pathways, the code was missing support for the `i4` data type. While fixing this, another bug was discoverd that caused the number of ldmatrix tiles calculated for certain operand types and configurations to be incorrect. This change fixes both issues and adds additional tests.
Differential Revision: https://reviews.llvm.org/D128074
show more ...
|
|
Revision tags: llvmorg-14.0.5 |
|
| #
51b925df |
| 07-Jun-2022 |
Christopher Bate <[email protected]> |
[mlir][nvgpu] shared memory access optimization pass
This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared mem
[mlir][nvgpu] shared memory access optimization pass
This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last dimension index values such that element locations in the final (col) dimension are given by `newColIdx = col % vecSize + perm[row](col/vecSize,row)` where `perm` is a permutation function indexed by `row` and `vecSize` is the vector access size in elements (currently assumes 128bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory.
Differential Revision: https://reviews.llvm.org/D127457
show more ...
|
|
Revision tags: llvmorg-14.0.4 |
|
| #
1ca772ed |
| 17-May-2022 |
Christopher Bate <[email protected]> |
[MLIR][GPU] Add NvGpu mma.sync path to the VectorToGPU pass
This changes adds the option to lower to NvGpu dialect ops during the VectorToGPU convsersion pass. Because this transformation reuses exi
[MLIR][GPU] Add NvGpu mma.sync path to the VectorToGPU pass
This changes adds the option to lower to NvGpu dialect ops during the VectorToGPU convsersion pass. Because this transformation reuses existing VectorToGPU logic, a seperate VectorToNvGpu conversion pass is not created. The option `use-nvgpu` is added to the VectorToGPU pass. When this is true, the pass will attempt to convert slices rooted at `vector.contract` operations into `nvgpu.mma.sync` ops, and `vector.transfer_read` ops are converted to either `nvgpu.ldmatrix` or one or more `vector.load` operations. The specific data loaded will depend on the thread id within a subgroup (warp). These index calculations depend on data type and shape of the MMA op according to the downstream PTX specification. The code for supporting these details is separated into `NvGpuSupport.cpp|h`.
Differential Revision: https://reviews.llvm.org/D122940
show more ...
|