[mlir][VectorToGPU] Fix support for i4, col-major operand support

For the conversion to nvgpu `mma.sync` and `ldmatrix` pathways, the code was missing support for the `i4` data type. While fixing this, another bug was discovered that caused the number of ldmatrix tiles calculated for certain operand types and configurations to be incorrect. This change fixes both issues and adds additional tests.

Differential Revision: https://reviews.llvm.org/D128074
[mlir][VectorToGPU] Fix bug generating incorrect ldmatrix ops

ldmatrix transpose can only be used with types that are 16 bits wide.

Differential Revision: https://reviews.llvm.org/D126846
[MLIR][GPU] Add NvGpu mma.sync path to the VectorToGPU pass

This change adds the option to lower to NvGpu dialect ops during the VectorToGPU conversion pass. Because this transformation reuses existing VectorToGPU logic, a separate VectorToNvGpu conversion pass is not created. The option `use-nvgpu` is added to the VectorToGPU pass. When this is true, the pass will attempt to convert slices rooted at `vector.contract` operations into `nvgpu.mma.sync` ops, and `vector.transfer_read` ops are converted to either `nvgpu.ldmatrix` or one or more `vector.load` operations. The specific data loaded will depend on the thread id within a subgroup (warp). These index calculations depend on the data type and shape of the MMA op according to the downstream PTX specification. The code supporting these details is separated into `NvGpuSupport.cpp|h`.

Differential Revision: https://reviews.llvm.org/D122940
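As a rough illustration of the path this commit adds: the rewrite turns a warp-level `vector.contract` into an `nvgpu.mma.sync` on per-thread fragments. The snippet below is a hand-written sketch, not actual output of the pass — the indexing maps are elided, and the fragment shapes and attribute names (`mmaShape`, `numTiles`) follow the NVGPU dialect op documentation around the time of this change and may differ in other MLIR versions.

```mlir
// Before: a warp-level matmul-accumulate expressed on whole-tile vectors
// (indexing_maps elided for brevity).
%d = vector.contract {iterator_types = ["parallel", "parallel", "reduction"],
                      kind = #vector.kind<add>}
     %a, %b, %c : vector<16x16xf16>, vector<16x8xf16> into vector<16x8xf16>

// After --convert-vector-to-gpu="use-nvgpu=true": operands become per-thread
// fragments, loaded from shared memory via ldmatrix where the layout allows...
%fragA = nvgpu.ldmatrix %shmem[%c0, %c0] {numTiles = 4 : i32, transpose = false}
         : memref<64x16xf16, 3> -> vector<4x2xf16>

// ...and the contraction lowers to a single warp-synchronous MMA op.
%d2 = nvgpu.mma.sync(%fragA, %fragB, %fragC) {mmaShape = [16, 8, 16]}
      : (vector<4x2xf16>, vector<2x2xf16>, vector<2x2xf16>) -> vector<2x2xf16>
```

The per-thread fragment shapes (e.g. `vector<4x2xf16>` for the A operand of an m16n8k16 f16 MMA) are dictated by the PTX `mma.sync` register layout, which is why the index calculations mentioned above depend on both the element type and the MMA shape.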
[mlir][NFC] Update textual references of `func` to `func.func` in Conversion/ tests

The special case parsing of `func` operations is being removed.
[mlir][gpu] Relax restriction on mma load/store op

Those ops can support more complex layouts as long as the innermost dimension is contiguous.

Differential Revision: https://reviews.llvm.org/D122452
[mlir] Move the Builtin FuncOp to the Func dialect

This commit moves FuncOp out of the builtin dialect, and into the Func dialect. This move has been planned in some capacity from the moment we made FuncOp an operation (years ago). This commit handles the functional aspects of the move, but various aspects are left untouched to ease migration: func::FuncOp is re-exported into mlir to reduce the actual API churn, and the assembly format still accepts the unqualified `func`. These temporary measures will remain for a little while to simplify migration before being removed.

Differential Revision: https://reviews.llvm.org/D121266
[mlir] Update FuncOp conversion passes to Pass/InterfacePass<FunctionOpInterface>

These passes generally don't rely on any special aspects of FuncOp, and moving allows for these passes to be used in many more situations. The passes that obviously weren't relying on invariants guaranteed by a "function" were updated to be generic passes; the rest were updated to be FunctionOpInterface InterfacePasses.

The test updates are NFC, switching from the implicit nesting (-pass -pass2) form to the -pass-pipeline form (generic passes do not implicitly nest as op-specific passes do).

Differential Revision: https://reviews.llvm.org/D121190
[VectorToGPU] Fix horizontal stride calculation for N-D memref

Fix a bug in how we calculate the stride of mma load/store ops for N-D memrefs.

Differential Revision: https://reviews.llvm.org/D118378
[mlir] Replace StrEnumAttr -> EnumAttr in core dialects

Removes uses of `StrEnumAttr` in core dialects.

Reviewed By: mehdi_amini, rriddle

Differential Revision: https://reviews.llvm.org/D117514
[mlir][VectorToGPU] Support more cases in conversion to MMA ops

Support load with broadcast and the elementwise divf op, and remove the hardcoded restriction on the vector size. Picking the right size should be enforced by the user; conversion to llvm/spirv will fail if it is not supported.

Differential Revision: https://reviews.llvm.org/D113618
[mlir][VectorToGPU] Add support for elementwise mma to vector to GPU

Differential Revision: https://reviews.llvm.org/D112960
[MLIR] Replace std ops with arith dialect ops

Precursor: https://reviews.llvm.org/D110200

Removed redundant ops from the standard dialect that were moved to the `arith` or `math` dialects. Renamed all instances of operations in the codebase and in tests.

Reviewed By: rriddle, jpienaar

Differential Revision: https://reviews.llvm.org/D110797
[mlir][VectorToGPU] Support converting vector.broadcast to MMA op

Differential Revision: https://reviews.llvm.org/D105175
[mlir][VectorToGPU] Add conversion for scf::For op with Matrix operands

Differential Revision: https://reviews.llvm.org/D104134
[mlir][VectorToGPU] Add conversion for splat constant to MMA const matrix

Differential Revision: https://reviews.llvm.org/D104133
[mlir][VectorToGPU] First step to convert vector ops to GPU MMA ops

This is the first step to convert vector ops to MMA operations in order to target GPUs' tensor core ops. This currently only supports simple cases; transpose and element-wise operations will be added later.

Differential Revision: https://reviews.llvm.org/D102962