[mlir] replace 'emit_c_wrappers' func->llvm conversion option with a passThe 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interfacewrappers to be emitted for every builtin func
[mlir] replace 'emit_c_wrappers' func->llvm conversion option with a passThe 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interfacewrappers to be emitted for every builtin function in the module. While this hasbeen useful to bootstrap the interface, it is problematic in the longer term asit may unintentionally affect the functions that should retain their existinginterface, e.g., libm functions obtained by lowering math operations (seeD126964 for an example). Since D77314, we have a finer-grain control overinterface generation via an attribute that avoids the problem entirely. Removethe 'emit_c_wrappers' option. Introduce the '-llvm-request-c-wrappers' passthat can be run in any pipeline that needs blanket emission of functions toannotate all builtin functions with the attribute before performing the usuallowering that accounts for the attribute.Reviewed By: cheliniDifferential Revision: https://reviews.llvm.org/D127952
show more ...
[mlir] (NFC) Clean up bazel and CMake target namesAll dialect targets in bazel have been named *Dialect and all dialecttargets in CMake have been named MLIR*Dialect.
Recommit: "[MLIR][NVVM] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."This change rolls bcfc0a9051014437b55ab932d9aca5ecdca6776b
Recommit: "[MLIR][NVVM] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."This change rolls bcfc0a9051014437b55ab932d9aca5ecdca6776b forward (i.e., reverting 369ce54bb302f209239b8ebc77ad824add9df089) with fixed CMakeLists.txt.
Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."This reverts commit bcfc0a9051014437b55ab932d9aca5ecdca6776b.
Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."This reverts commit bcfc0a9051014437b55ab932d9aca5ecdca6776b.The build is broken with shared library enabled.
[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.This is correct for all values, i.e. the same as promoting the division
[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% in average, sometimes more) because:- it performs less Newton iterations- it avoids the slow path for e.g. denormals- it allows reuse of the reciprocal for multiple divisions by the same divisorTest program:```#include <stdio.h>#include "cuda_fp16.h"// This is a variant of CUDA's own __hdiv which is fast than hdiv_promote below// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.__device__ half hdiv_newton(half a, half b) { float fa = __half2float(a); float fb = __half2float(b); float rcp; asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb)); float result = fa * rcp; auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000; if (exponent != 0 && exponent != 0x7f800000) { float err = __fmaf_rn(-fb, result, fa); result = __fmaf_rn(rcp, err, result); } return __float2half(result);}// Surprisingly, this is faster than CUDA's own __hdiv.__device__ half hdiv_promote(half a, half b) { return __float2half(__half2float(a) / __half2float(b));}// This is an approximation that is accurate up to 1 ulp.__device__ half hdiv_approx(half a, half b) { float fa = __half2float(a); float fb = __half2float(b); float result; asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb)); return __float2half(result);}__global__ void CheckCorrectness() { int i = threadIdx.x + blockIdx.x * blockDim.x; half x = reinterpret_cast<const half&>(i); for (int j = 0; j < 65536; ++j) { half y = reinterpret_cast<const half&>(j); half d1 = hdiv_newton(x, y); half d2 = hdiv_promote(x, y); auto s1 = reinterpret_cast<const short&>(d1); auto s2 = reinterpret_cast<const short&>(d2); if (s1 != s2) { printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n", __half2float(x), i, __half2float(y), j, __half2float(d1), s1, __half2float(d2), s2); //__trap(); } }}__device__ half dst;__global__ void ProfileBuiltin(half x) { #pragma unroll 1 for (int i = 0; i < 10000000; ++i) { x = x / x; } dst = x;}__global__ void ProfilePromote(half x) { #pragma unroll 1 for (int i = 0; i < 10000000; ++i) { x = hdiv_promote(x, x); } dst = x;}__global__ void ProfileNewton(half x) { #pragma unroll 1 for (int i = 0; i < 10000000; ++i) { x = hdiv_newton(x, x); } dst = x;}__global__ void ProfileApprox(half x) { #pragma unroll 1 for (int i = 0; i < 10000000; ++i) { x = hdiv_approx(x, x); } dst = x;}int main() { CheckCorrectness<<<256, 256>>>(); half one = __float2half(1.0f); ProfileBuiltin<<<1, 1>>>(one); // 1.001s ProfilePromote<<<1, 1>>>(one); // 0.560s ProfileNewton<<<1, 1>>>(one); // 0.508s ProfileApprox<<<1, 1>>>(one); // 0.304s auto status = cudaDeviceSynchronize(); printf("%s\n", cudaGetErrorString(status));}```Reviewed By: herhutDifferential Revision: https://reviews.llvm.org/D126158
[mlir] Make locations required when adding/creating block argumentsBlockArguments gained the ability to have locations attached a while ago, but theyhave always been optional. This goes against th
[mlir] Make locations required when adding/creating block argumentsBlockArguments gained the ability to have locations attached a while ago, but theyhave always been optional. This goes against the core tenant of MLIR where locationinformation is a requirement, so this commit updates the API to require locations.Fixes #53279Differential Revision: https://reviews.llvm.org/D117633
[mlir] handle nested regions in llvm-legalize-for-exportThe translation from the MLIR LLVM dialect to LLVM IR includes a mechanism thatensures the successors of a block to be different blocks in c
[mlir] handle nested regions in llvm-legalize-for-exportThe translation from the MLIR LLVM dialect to LLVM IR includes a mechanism thatensures the successors of a block to be different blocks in case blockarguments are passed to them since the opposite cannot be expressed in LLVM IR.This mechanism previously only worked for functions because it was writtenprior to the introduction of other region-carrying operations such as theOpenMP dialect, which also translates directly to LLVM IR. Modify thismechanism to handle all regions in the module and not only functions.Reviewed By: wsmosesDifferential Revision: https://reviews.llvm.org/D117548
Adjust "end namespace" comment in MLIR to match new agree'd coding styleSee D115115 and this mailing list discussion:https://lists.llvm.org/pipermail/llvm-dev/2021-December/154199.htmlDifferenti
Adjust "end namespace" comment in MLIR to match new agree'd coding styleSee D115115 and this mailing list discussion:https://lists.llvm.org/pipermail/llvm-dev/2021-December/154199.htmlDifferential Revision: https://reviews.llvm.org/D115309
[mlir][BuiltinDialect] Resolve comments from D91571* Move ops to a BuiltinOps.h* Add file comments
[mlir][NFC] Remove references to Module.h and Function.hThese includes have been deprecated in favor of BuiltinDialect.h, which contains the definitions of ModuleOp and FuncOp.Differential Revisi
[mlir][NFC] Remove references to Module.h and Function.hThese includes have been deprecated in favor of BuiltinDialect.h, which contains the definitions of ModuleOp and FuncOp.Differential Revision: https://reviews.llvm.org/D91572
[MLIR][NFC] Eliminate .getBlocks() when not neededDifferential Revision: https://reviews.llvm.org/D82229
[MLIR] Reapply: Adjust libMLIR building to more closely follow libClangThis reverts commit ab1ca6e60fc58b857cc5030ca6e024d20d919cb9.
Revert "[MLIR] Adjust libMLIR building to more closely follow libClang"This reverts commit 4f0f436749c264c16eb226c9b9b132e07e3650a6.This seems to show some compile dependence problems, and also b
Revert "[MLIR] Adjust libMLIR building to more closely follow libClang"This reverts commit 4f0f436749c264c16eb226c9b9b132e07e3650a6.This seems to show some compile dependence problems, and also breaks flang.
[MLIR] Adjust libMLIR building to more closely follow libClang- Exports MLIR targets to be used out-of-tree.- mimicks `add_clang_library` and `add_flang_library`.- Fixes libMLIR.soAfter https:/
[MLIR] Adjust libMLIR building to more closely follow libClang- Exports MLIR targets to be used out-of-tree.- mimicks `add_clang_library` and `add_flang_library`.- Fixes libMLIR.soAfter https://reviews.llvm.org/D77515 libMLIR.so was no longer containingany object files. We originally had a cludge there that made it work withthe static initalizers and when switchting away from that to the way theclang shlib does it, I noticed that MLIR doesn't create a `obj.{name}` target,and doesn't export it's targets to `lib/cmake/mlir`.This is due to MLIR using `add_llvm_library` under the hood, which addsthe target to `llvmexports`.Differential Revision: https://reviews.llvm.org/D78773[MLIR] Fix libMLIR.so and LLVM_LINK_LLVM_DYLIBPrimarily, this patch moves all mlir references to LLVM libraries intoeither LLVM_LINK_COMPONENTS or LINK_COMPONENTS. This enables magic inthe llvm cmake files to automatically replace reference to LLVM componentswith references to libLLVM.so when necessary. Among other things, thiscompletes fixing libMLIR.so, which has been broken for some configurationssince D77515.Unlike previously, the pattern is now that mlir libraries should almostalways use add_mlir_library. Previously, some libraries still usedadd_llvm_library. However, this confuses the export of targets for useout of tree because libraries specified with add_llvm_library are exportedby LLVM. Instead users which don't need/can't be linked into libMLIR.socan specify EXCLUDE_FROM_LIBMLIRA common error mode is linking with LLVM libraries outside of LINK_COMPONENTS.This almost always results in symbol confusion or multiply defined optionsin LLVM when the same object file is included as a static library andas part of libLLVM.so. To catch these errors more directly, there's nowmlir_check_all_link_libraries.To simplify usage of add_mlir_library, we assume that all mlirlibraries depend on LLVMSupport, so it's not necessary to separately specifyit.tested with:BUILD_SHARED_LIBS=on,BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB,BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB + LLVM_LINK_LLVM_DYLIB.By: Stephen Neuendorffer <[email protected]>Differential Revision: https://reviews.llvm.org/D79067[MLIR] Move from using target_link_libraries to LINK_LIBSThis allows us to correctly generate dependencies for derived targets,such as targets which are created for object libraries.By: Stephen Neuendorffer <[email protected]>Differential Revision: https://reviews.llvm.org/D79243Three commits have been squashed to avoid intermediate build breakage.
[mlir][Pass] Update the PassGen to generate base classes instead of utilitiesSummary:This is much cleaner, and fits the same structure as many other tablegen backends. This was not done originally
[mlir][Pass] Update the PassGen to generate base classes instead of utilitiesSummary:This is much cleaner, and fits the same structure as many other tablegen backends. This was not done originally as the CRTP in the pass classes made it overly verbose/complex.Differential Revision: https://reviews.llvm.org/D77367
[mlir][Pass] Remove the use of CRTP from the Pass classesThis revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cle
[mlir][Pass] Remove the use of CRTP from the Pass classesThis revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cleaner interface in the C++ code, and naturally fits with the rest of the infrastructure. A new utility class, PassWrapper, is added to replicate the existing behavior for passes not suitable for using the tablegen backend.Differential Revision: https://reviews.llvm.org/D77350
[mlir][Pass] Add support for generating pass utilities via tablegenThis revision adds support for generating utilities for passes such as options/statistics/etc. that can be inferred from the table
[mlir][Pass] Add support for generating pass utilities via tablegenThis revision adds support for generating utilities for passes such as options/statistics/etc. that can be inferred from the tablegen definition. This removes additional boilerplate from the pass, and also makes it easier to remove the reliance on the pass registry to provide certain things(e.g. the pass argument).Differential Revision: https://reviews.llvm.org/D76659
[mlir][Pass] Move the registration of dialect passes to tablegenThis generates a Passes.td for all of the dialects that have transformation passes. This removes the need for global registration for
[mlir][Pass] Move the registration of dialect passes to tablegenThis generates a Passes.td for all of the dialects that have transformation passes. This removes the need for global registration for all of the dialect passes.Differential Revision: https://reviews.llvm.org/D76657
[mlir] LLVM dialect: move ensureDistinctSuccessors out of std->LLVM conversionMLIR supports terminators that have the same successor block with differentblock operands, which cannot be expressed i
[mlir] LLVM dialect: move ensureDistinctSuccessors out of std->LLVM conversionMLIR supports terminators that have the same successor block with differentblock operands, which cannot be expressed in the LLVM's phi-notation as theblock identifier is used to tell apart the predecessors. This limitation can beworked around by branching to a new block instead, with this new blockunconditionally branching to the original successor and forwarding theargument. Until now, this transformation was performed during the conversionfrom the Standard to the LLVM dialect. This does not scale well to multipledialects targeting the LLVM dialect as all of them would have to be aware ofthis limitation and perform the preparatory transformation. Instead, do it as aseparate pass and run it immediately before the translation.Differential Revision: https://reviews.llvm.org/D75619