[mlir][AMDGPU] Add lds_barrier op

The lds_barrier op allows workgroups to wait at a barrier for
operations to/from their local data store (LDS) to complete without
incurring the performance penalties of a full memory fence.

Reviewed By: nirvedhmeshram

Differential Revision: https://reviews.llvm.org/D129522
[mlir][AMDGPU] Add --chipset option to AMDGPUToROCDL

Because the buffer descriptor structure (the V#) has no backwards-compatibility
guarantees, and since backwards compatibility has in fact been broken in practice
(see https://github.com/llvm/llvm-project/issues/56323 ), and since
the `targetIsRDNA` attribute isn't something that higher-level clients can set
in general, make the lowering of the amdgpu dialect to rocdl take a --chipset
option.

Note that this option is a string because adding a parser for the Chipset
struct to llvm::cl wasn't working out.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D129228
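Since the --chipset option above is passed as a plain string, the lowering has
to turn that string into version numbers at some point. The following is only a
minimal sketch of what such a conversion could look like, not the code from the
patch. It assumes chipset names of the form "gfx" + decimal major version + one
hex digit for the minor version + one hex digit for the stepping (e.g.
"gfx90a", "gfx1030"); `ChipsetSketch`, `hexDigit`, and `parseChipsetSketch` are
hypothetical names introduced for illustration.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for a chipset description; NOT the MLIR Chipset struct.
struct ChipsetSketch {
  unsigned majorVersion;
  unsigned minorVersion;
  unsigned stepping;
};

// Converts one lowercase hexadecimal digit to its value, or nullopt if invalid.
static std::optional<unsigned> hexDigit(char c) {
  if (c >= '0' && c <= '9')
    return static_cast<unsigned>(c - '0');
  if (c >= 'a' && c <= 'f')
    return static_cast<unsigned>(10 + (c - 'a'));
  return std::nullopt;
}

// Parses names assumed to look like "gfx<major><minor><stepping>", where the
// last two characters are single hex digits and the rest is a decimal major.
std::optional<ChipsetSketch> parseChipsetSketch(const std::string &name) {
  const std::string prefix = "gfx";
  // Need the prefix plus at least one major digit, a minor, and a stepping.
  if (name.size() < prefix.size() + 3 ||
      name.compare(0, prefix.size(), prefix) != 0)
    return std::nullopt;

  const std::string digits = name.substr(prefix.size());
  const std::string majorPart = digits.substr(0, digits.size() - 2);

  unsigned majorVersion = 0;
  for (char c : majorPart) {
    if (c < '0' || c > '9')
      return std::nullopt;
    majorVersion = majorVersion * 10 + static_cast<unsigned>(c - '0');
  }

  const std::optional<unsigned> minorVersion = hexDigit(digits[digits.size() - 2]);
  const std::optional<unsigned> stepping = hexDigit(digits[digits.size() - 1]);
  if (!minorVersion || !stepping)
    return std::nullopt;

  return ChipsetSketch{majorVersion, *minorVersion, *stepping};
}
```

Under these assumptions, a lowering could compare `majorVersion` against a
threshold to decide how to fill in the buffer descriptor bits, instead of
relying on a per-op attribute such as `targetIsRDNA`.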
[MLIR][AMDGPU] Add AMDGPU dialect, wrappers around raw buffer intrinsics

By analogy with the NVGPU dialect, introduce an AMDGPU dialect for
AMD-specific intrinsic wrappers.

The dialect initially includes wrappers around the raw buffer intrinsics.

On AMD GPUs, a memref can be converted to a "buffer descriptor" that
allows more precise control of memory access, such as by allowing
out-of-bounds loads to return 0 and out-of-bounds stores to be ignored
without adding additional conditional logic, which is important for
performance.

The repository currently contains a limited conversion from
transfer_read/transfer_write to Mubuf intrinsics, which are older,
deprecated intrinsics for the same functionality.

The new amdgpu.raw_buffer_* ops allow these operations to be used
explicitly and to carry metadata such as whether the target
chipset is an RDNA chip or not (which impacts the interpretation of
some bits in the buffer descriptor), while still maintaining an
MLIR-like interface.

(This change also exposes the floating-point atomic add intrinsic.)

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D122765