History log of /llvm-project-15.0.7/llvm/test/CodeGen/X86/atomic-non-integer.ll (Results 1 – 25 of 38)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6
# 2f448bf5 22-Jun-2022 Nikita Popov <[email protected]>

[X86] Migrate tests to use opaque pointers (NFC)

Test updates were performed using:
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

These are only the test updates where the test pas

[X86] Migrate tests to use opaque pointers (NFC)

Test updates were performed using:
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

These are only the test updates where the test passed without
further modification (which is almost all of them, as the backend
is largely pointer-type agnostic).

show more ...


# 655ba9c8 17-Jun-2022 Phoebe Wang <[email protected]>

Reland "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""""

This resolves problems reported in commit 1a20252978c76cf2518aa45b175a9e5d6d36c4f0.
1. Promot

Reland "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""""

This resolves problems reported in commit 1a20252978c76cf2518aa45b175a9e5d6d36c4f0.
1. Promote to float lowering for nodes XINT_TO_FP
2. Bail out f16 from shuffle combine due to vector type is not legal in the version

show more ...


# 1a202529 17-Jun-2022 Benjamin Kramer <[email protected]>

Revert "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""""

This reverts commit 04a3d5f3a1193fb87576425a385aa0a6115b1e7c.

I see two more issues:

- uito

Revert "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""""

This reverts commit 04a3d5f3a1193fb87576425a385aa0a6115b1e7c.

I see two more issues:

- uitofp/sitofp from i32/i64 to half now generates
__floatsihf/__floatdihf, which exists in neither compiler-rt nor
libgcc

- This crashes when legalizing the bitcast:
```
; RUN: llc < %s -mcpu=skx
define void @main.45(ptr nocapture readnone %retval, ptr noalias nocapture readnone %run_options, ptr noalias nocapture readnone %params, ptr noalias nocapture readonly %buffer_table, ptr noalias nocapture readnone %status, ptr noalias nocapture readnone %prof_counters) local_unnamed_addr {
entry:
%fusion = load ptr, ptr %buffer_table, align 8
%0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1
%Arg_1.2 = load ptr, ptr %0, align 8
%1 = getelementptr inbounds ptr, ptr %buffer_table, i64 2
%Arg_0.1 = load ptr, ptr %1, align 8
%2 = load half, ptr %Arg_0.1, align 8
%3 = bitcast half %2 to i16
%4 = and i16 %3, 32767
%5 = icmp eq i16 %4, 0
%6 = and i16 %3, -32768
%broadcast.splatinsert = insertelement <4 x half> poison, half %2, i64 0
%broadcast.splat = shufflevector <4 x half> %broadcast.splatinsert, <4 x half> poison, <4 x i32> zeroinitializer
%broadcast.splatinsert9 = insertelement <4 x i16> poison, i16 %4, i64 0
%broadcast.splat10 = shufflevector <4 x i16> %broadcast.splatinsert9, <4 x i16> poison, <4 x i32> zeroinitializer
%broadcast.splatinsert11 = insertelement <4 x i16> poison, i16 %6, i64 0
%broadcast.splat12 = shufflevector <4 x i16> %broadcast.splatinsert11, <4 x i16> poison, <4 x i32> zeroinitializer
%broadcast.splatinsert13 = insertelement <4 x i16> poison, i16 %3, i64 0
%broadcast.splat14 = shufflevector <4 x i16> %broadcast.splatinsert13, <4 x i16> poison, <4 x i32> zeroinitializer
%wide.load = load <4 x half>, ptr %Arg_1.2, align 8
%7 = fcmp uno <4 x half> %broadcast.splat, %wide.load
%8 = fcmp oeq <4 x half> %broadcast.splat, %wide.load
%9 = bitcast <4 x half> %wide.load to <4 x i16>
%10 = and <4 x i16> %9, <i16 32767, i16 32767, i16 32767, i16 32767>
%11 = icmp eq <4 x i16> %10, zeroinitializer
%12 = and <4 x i16> %9, <i16 -32768, i16 -32768, i16 -32768, i16 -32768>
%13 = or <4 x i16> %12, <i16 1, i16 1, i16 1, i16 1>
%14 = select <4 x i1> %11, <4 x i16> %9, <4 x i16> %13
%15 = icmp ugt <4 x i16> %broadcast.splat10, %10
%16 = icmp ne <4 x i16> %broadcast.splat12, %12
%17 = or <4 x i1> %15, %16
%18 = select <4 x i1> %17, <4 x i16> <i16 -1, i16 -1, i16 -1, i16 -1>, <4 x i16> <i16 1, i16 1, i16 1, i16 1>
%19 = add <4 x i16> %18, %broadcast.splat14
%20 = select i1 %5, <4 x i16> %14, <4 x i16> %19
%21 = select <4 x i1> %8, <4 x i16> %9, <4 x i16> %20
%22 = bitcast <4 x i16> %21 to <4 x half>
%23 = select <4 x i1> %7, <4 x half> <half 0xH7E00, half 0xH7E00, half 0xH7E00, half 0xH7E00>, <4 x half> %22
store <4 x half> %23, ptr %fusion, align 16
ret void
}
```

llc: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:977: void (anonymous namespace)::SelectionDAGLegalize::LegalizeOp(llvm::SDNode *): Assertion `(TLI.getTypeAction(*DAG.getContext(), Op.getValueType()) == TargetLowering::TypeLegal || Op.getOpcode() == ISD::TargetConstant || Op.getOpcode() == ISD::Register) && "Unexpected illegal type!"' failed.

show more ...


# 04a3d5f3 17-Jun-2022 Phoebe Wang <[email protected]>

Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""

Fix the crash on lowering X86ISD::FCMP.


# 3cd5696a 15-Jun-2022 Frederik Gossen <[email protected]>

Revert "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""

This reverts commit e1c5afa47d37012499467b5061fc42e50884d129.

This introduces crashes in the JAX back

Revert "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""

This reverts commit e1c5afa47d37012499467b5061fc42e50884d129.

This introduces crashes in the JAX backend on CPU. A reproducer in LLVM is
below. Let me know if you have trouble reproducing this.

; ModuleID = '__compute_module'
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

@0 = private unnamed_addr constant [4 x i8] c"\00\00\00?"
@1 = private unnamed_addr constant [4 x i8] c"\1C}\908"
@2 = private unnamed_addr constant [4 x i8] c"?\00\\4"
@3 = private unnamed_addr constant [4 x i8] c"%ci1"
@4 = private unnamed_addr constant [4 x i8] zeroinitializer
@5 = private unnamed_addr constant [4 x i8] c"\00\00\00\C0"
@6 = private unnamed_addr constant [4 x i8] c"\00\00\00B"
@7 = private unnamed_addr constant [4 x i8] c"\94\B4\C22"
@8 = private unnamed_addr constant [4 x i8] c"^\09B6"
@9 = private unnamed_addr constant [4 x i8] c"\15\F3M?"
@10 = private unnamed_addr constant [4 x i8] c"e\CC\\;"
@11 = private unnamed_addr constant [4 x i8] c"d\BD/>"
@12 = private unnamed_addr constant [4 x i8] c"V\F4I="
@13 = private unnamed_addr constant [4 x i8] c"\10\CB,<"
@14 = private unnamed_addr constant [4 x i8] c"\AC\E3\D6:"
@15 = private unnamed_addr constant [4 x i8] c"\DC\A8E9"
@16 = private unnamed_addr constant [4 x i8] c"\C6\FA\897"
@17 = private unnamed_addr constant [4 x i8] c"%\F9\955"
@18 = private unnamed_addr constant [4 x i8] c"\B5\DB\813"
@19 = private unnamed_addr constant [4 x i8] c"\B4W_\B2"
@20 = private unnamed_addr constant [4 x i8] c"\1Cc\8F\B4"
@21 = private unnamed_addr constant [4 x i8] c"~3\94\B6"
@22 = private unnamed_addr constant [4 x i8] c"3Yq\B8"
@23 = private unnamed_addr constant [4 x i8] c"\E9\17\17\BA"
@24 = private unnamed_addr constant [4 x i8] c"\F1\B2\8D\BB"
@25 = private unnamed_addr constant [4 x i8] c"\F8t\C2\BC"
@26 = private unnamed_addr constant [4 x i8] c"\82[\C2\BD"
@27 = private unnamed_addr constant [4 x i8] c"uB-?"
@28 = private unnamed_addr constant [4 x i8] c"^\FF\9B\BE"
@29 = private unnamed_addr constant [4 x i8] c"\00\00\00A"

; Function Attrs: uwtable
define void @main.158(ptr %retval, ptr noalias %run_options, ptr noalias %params, ptr noalias %buffer_table, ptr noalias %status, ptr noalias %prof_counters) #0 {
entry:
%fusion.invar_address.dim.1 = alloca i64, align 8
%fusion.invar_address.dim.0 = alloca i64, align 8
%0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1
%Arg_0.1 = load ptr, ptr %0, align 8, !invariant.load !0, !dereferenceable !1, !align !2
%1 = getelementptr inbounds ptr, ptr %buffer_table, i64 0
%fusion = load ptr, ptr %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2
store i64 0, ptr %fusion.invar_address.dim.0, align 8
br label %fusion.loop_header.dim.0

return: ; preds = %fusion.loop_exit.dim.0
ret void

fusion.loop_header.dim.0: ; preds = %fusion.loop_exit.dim.1, %entry
%fusion.indvar.dim.0 = load i64, ptr %fusion.invar_address.dim.0, align 8
%2 = icmp uge i64 %fusion.indvar.dim.0, 3
br i1 %2, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0

fusion.loop_body.dim.0: ; preds = %fusion.loop_header.dim.0
store i64 0, ptr %fusion.invar_address.dim.1, align 8
br label %fusion.loop_header.dim.1

fusion.loop_header.dim.1: ; preds = %fusion.loop_body.dim.1, %fusion.loop_body.dim.0
%fusion.indvar.dim.1 = load i64, ptr %fusion.invar_address.dim.1, align 8
%3 = icmp uge i64 %fusion.indvar.dim.1, 1
br i1 %3, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1

fusion.loop_body.dim.1: ; preds = %fusion.loop_header.dim.1
%4 = getelementptr inbounds [3 x [1 x half]], ptr %Arg_0.1, i64 0, i64 %fusion.indvar.dim.0, i64 0
%5 = load half, ptr %4, align 2, !invariant.load !0, !noalias !3
%6 = fpext half %5 to float
%7 = call float @llvm.fabs.f32(float %6)
%constant.121 = load float, ptr @29, align 4
%compare.2 = fcmp ole float %7, %constant.121
%8 = zext i1 %compare.2 to i8
%constant.120 = load float, ptr @0, align 4
%multiply.95 = fmul float %7, %constant.120
%constant.119 = load float, ptr @5, align 4
%add.82 = fadd float %multiply.95, %constant.119
%constant.118 = load float, ptr @4, align 4
%multiply.94 = fmul float %add.82, %constant.118
%constant.117 = load float, ptr @19, align 4
%add.81 = fadd float %multiply.94, %constant.117
%multiply.92 = fmul float %add.82, %add.81
%constant.116 = load float, ptr @18, align 4
%add.79 = fadd float %multiply.92, %constant.116
%multiply.91 = fmul float %add.82, %add.79
%subtract.87 = fsub float %multiply.91, %add.81
%constant.115 = load float, ptr @20, align 4
%add.78 = fadd float %subtract.87, %constant.115
%multiply.89 = fmul float %add.82, %add.78
%subtract.86 = fsub float %multiply.89, %add.79
%constant.114 = load float, ptr @17, align 4
%add.76 = fadd float %subtract.86, %constant.114
%multiply.88 = fmul float %add.82, %add.76
%subtract.84 = fsub float %multiply.88, %add.78
%constant.113 = load float, ptr @21, align 4
%add.75 = fadd float %subtract.84, %constant.113
%multiply.86 = fmul float %add.82, %add.75
%subtract.83 = fsub float %multiply.86, %add.76
%constant.112 = load float, ptr @16, align 4
%add.73 = fadd float %subtract.83, %constant.112
%multiply.85 = fmul float %add.82, %add.73
%subtract.81 = fsub float %multiply.85, %add.75
%constant.111 = load float, ptr @22, align 4
%add.72 = fadd float %subtract.81, %constant.111
%multiply.83 = fmul float %add.82, %add.72
%subtract.80 = fsub float %multiply.83, %add.73
%constant.110 = load float, ptr @15, align 4
%add.70 = fadd float %subtract.80, %constant.110
%multiply.82 = fmul float %add.82, %add.70
%subtract.78 = fsub float %multiply.82, %add.72
%constant.109 = load float, ptr @23, align 4
%add.69 = fadd float %subtract.78, %constant.109
%multiply.80 = fmul float %add.82, %add.69
%subtract.77 = fsub float %multiply.80, %add.70
%constant.108 = load float, ptr @14, align 4
%add.68 = fadd float %subtract.77, %constant.108
%multiply.79 = fmul float %add.82, %add.68
%subtract.75 = fsub float %multiply.79, %add.69
%constant.107 = load float, ptr @24, align 4
%add.67 = fadd float %subtract.75, %constant.107
%multiply.77 = fmul float %add.82, %add.67
%subtract.74 = fsub float %multiply.77, %add.68
%constant.106 = load float, ptr @13, align 4
%add.66 = fadd float %subtract.74, %constant.106
%multiply.76 = fmul float %add.82, %add.66
%subtract.72 = fsub float %multiply.76, %add.67
%constant.105 = load float, ptr @25, align 4
%add.65 = fadd float %subtract.72, %constant.105
%multiply.74 = fmul float %add.82, %add.65
%subtract.71 = fsub float %multiply.74, %add.66
%constant.104 = load float, ptr @12, align 4
%add.64 = fadd float %subtract.71, %constant.104
%multiply.73 = fmul float %add.82, %add.64
%subtract.69 = fsub float %multiply.73, %add.65
%constant.103 = load float, ptr @26, align 4
%add.63 = fadd float %subtract.69, %constant.103
%multiply.71 = fmul float %add.82, %add.63
%subtract.67 = fsub float %multiply.71, %add.64
%constant.102 = load float, ptr @11, align 4
%add.62 = fadd float %subtract.67, %constant.102
%multiply.70 = fmul float %add.82, %add.62
%subtract.66 = fsub float %multiply.70, %add.63
%constant.101 = load float, ptr @28, align 4
%add.61 = fadd float %subtract.66, %constant.101
%multiply.68 = fmul float %add.82, %add.61
%subtract.65 = fsub float %multiply.68, %add.62
%constant.100 = load float, ptr @27, align 4
%add.60 = fadd float %subtract.65, %constant.100
%subtract.64 = fsub float %add.60, %add.62
%multiply.66 = fmul float %subtract.64, %constant.120
%constant.99 = load float, ptr @6, align 4
%divide.4 = fdiv float %constant.99, %7
%add.59 = fadd float %divide.4, %constant.119
%multiply.65 = fmul float %add.59, %constant.118
%constant.98 = load float, ptr @3, align 4
%add.58 = fadd float %multiply.65, %constant.98
%multiply.64 = fmul float %add.59, %add.58
%constant.97 = load float, ptr @7, align 4
%add.57 = fadd float %multiply.64, %constant.97
%multiply.63 = fmul float %add.59, %add.57
%subtract.63 = fsub float %multiply.63, %add.58
%constant.96 = load float, ptr @2, align 4
%add.56 = fadd float %subtract.63, %constant.96
%multiply.62 = fmul float %add.59, %add.56
%subtract.62 = fsub float %multiply.62, %add.57
%constant.95 = load float, ptr @8, align 4
%add.55 = fadd float %subtract.62, %constant.95
%multiply.61 = fmul float %add.59, %add.55
%subtract.61 = fsub float %multiply.61, %add.56
%constant.94 = load float, ptr @1, align 4
%add.54 = fadd float %subtract.61, %constant.94
%multiply.60 = fmul float %add.59, %add.54
%subtract.60 = fsub float %multiply.60, %add.55
%constant.93 = load float, ptr @10, align 4
%add.53 = fadd float %subtract.60, %constant.93
%multiply.59 = fmul float %add.59, %add.53
%subtract.59 = fsub float %multiply.59, %add.54
%constant.92 = load float, ptr @9, align 4
%add.52 = fadd float %subtract.59, %constant.92
%subtract.58 = fsub float %add.52, %add.54
%multiply.58 = fmul float %subtract.58, %constant.120
%9 = call float @llvm.sqrt.f32(float %7)
%10 = fdiv float 1.000000e+00, %9
%multiply.57 = fmul float %multiply.58, %10
%11 = trunc i8 %8 to i1
%12 = select i1 %11, float %multiply.66, float %multiply.57
%13 = fptrunc float %12 to half
%14 = getelementptr inbounds [3 x [1 x half]], ptr %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 0
store half %13, ptr %14, align 2, !alias.scope !3
%invar.inc1 = add nuw nsw i64 %fusion.indvar.dim.1, 1
store i64 %invar.inc1, ptr %fusion.invar_address.dim.1, align 8
br label %fusion.loop_header.dim.1

fusion.loop_exit.dim.1: ; preds = %fusion.loop_header.dim.1
%invar.inc = add nuw nsw i64 %fusion.indvar.dim.0, 1
store i64 %invar.inc, ptr %fusion.invar_address.dim.0, align 8
br label %fusion.loop_header.dim.0

fusion.loop_exit.dim.0: ; preds = %fusion.loop_header.dim.0
br label %return
}

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.fabs.f32(float %0) #1

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.sqrt.f32(float %0) #1

attributes #0 = { uwtable "denormal-fp-math"="preserve-sign" "no-frame-pointer-elim"="false" }
attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

!0 = !{}
!1 = !{i64 6}
!2 = !{i64 8}
!3 = !{!4}
!4 = !{!"buffer: {index:0, offset:0, size:6}", !5}
!5 = !{!"XLA global AA domain"}

show more ...


# e1c5afa4 15-Jun-2022 Phoebe Wang <[email protected]>

Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""

Fixed the missing SQRT promotion. Adding several missing operations too.


# 37455b1f 15-Jun-2022 Thomas Joerg <[email protected]>

Revert "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""

This reverts commit 6e02e27536b9de25a651cfc9c2966ce471169355.

This introduces a crash in the backend. Reproduc

Revert "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""

This reverts commit 6e02e27536b9de25a651cfc9c2966ce471169355.

This introduces a crash in the backend. Reproducer in MLIR's LLVM
dialect follows. Let me know if you have trouble reproducing this.

module {
llvm.func @malloc(i64) -> !llvm.ptr<i8>
llvm.func @_mlir_ciface_tf_report_error(!llvm.ptr<i8>, i32, !llvm.ptr<i8>)
llvm.mlir.global internal constant @error_message_2208944672953921889("failed to allocate memory at loc(\22-\22:3:8)\00")
llvm.func @_mlir_ciface_tf_alloc(!llvm.ptr<i8>, i64, i64, i32, i32, !llvm.ptr<i32>) -> !llvm.ptr<i8>
llvm.func @Rsqrt_CPU_DT_HALF_DT_HALF(%arg0: !llvm.ptr<i8>, %arg1: i64, %arg2: !llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)> attributes {llvm.emit_c_interface, tf_entry} {
%0 = llvm.mlir.constant(8 : i32) : i32
%1 = llvm.mlir.constant(8 : index) : i64
%2 = llvm.mlir.constant(2 : index) : i64
%3 = llvm.mlir.constant(dense<0.000000e+00> : vector<4xf16>) : vector<4xf16>
%4 = llvm.mlir.constant(dense<[0, 1, 2, 3]> : vector<4xi32>) : vector<4xi32>
%5 = llvm.mlir.constant(dense<1.000000e+00> : vector<4xf16>) : vector<4xf16>
%6 = llvm.mlir.constant(false) : i1
%7 = llvm.mlir.constant(1 : i32) : i32
%8 = llvm.mlir.constant(0 : i32) : i32
%9 = llvm.mlir.constant(4 : index) : i64
%10 = llvm.mlir.constant(0 : index) : i64
%11 = llvm.mlir.constant(1 : index) : i64
%12 = llvm.mlir.constant(-1 : index) : i64
%13 = llvm.mlir.null : !llvm.ptr<f16>
%14 = llvm.getelementptr %13[%9] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
%15 = llvm.ptrtoint %14 : !llvm.ptr<f16> to i64
%16 = llvm.alloca %15 x f16 {alignment = 32 : i64} : (i64) -> !llvm.ptr<f16>
%17 = llvm.alloca %15 x f16 {alignment = 32 : i64} : (i64) -> !llvm.ptr<f16>
%18 = llvm.mlir.null : !llvm.ptr<i64>
%19 = llvm.getelementptr %18[%arg1] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
%20 = llvm.ptrtoint %19 : !llvm.ptr<i64> to i64
%21 = llvm.alloca %20 x i64 : (i64) -> !llvm.ptr<i64>
llvm.br ^bb1(%10 : i64)
^bb1(%22: i64): // 2 preds: ^bb0, ^bb2
%23 = llvm.icmp "slt" %22, %arg1 : i64
llvm.cond_br %23, ^bb2, ^bb3
^bb2: // pred: ^bb1
%24 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>
%25 = llvm.getelementptr %24[%10, 2] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>, i64) -> !llvm.ptr<i64>
%26 = llvm.add %22, %11 : i64
%27 = llvm.getelementptr %25[%26] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
%28 = llvm.load %27 : !llvm.ptr<i64>
%29 = llvm.getelementptr %21[%22] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
llvm.store %28, %29 : !llvm.ptr<i64>
llvm.br ^bb1(%26 : i64)
^bb3: // pred: ^bb1
llvm.br ^bb4(%10, %11 : i64, i64)
^bb4(%30: i64, %31: i64): // 2 preds: ^bb3, ^bb5
%32 = llvm.icmp "slt" %30, %arg1 : i64
llvm.cond_br %32, ^bb5, ^bb6
^bb5: // pred: ^bb4
%33 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>
%34 = llvm.getelementptr %33[%10, 2] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>, i64) -> !llvm.ptr<i64>
%35 = llvm.add %30, %11 : i64
%36 = llvm.getelementptr %34[%35] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
%37 = llvm.load %36 : !llvm.ptr<i64>
%38 = llvm.mul %37, %31 : i64
llvm.br ^bb4(%35, %38 : i64, i64)
^bb6: // pred: ^bb4
%39 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>>
%40 = llvm.getelementptr %39[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
%41 = llvm.load %40 : !llvm.ptr<ptr<f16>>
%42 = llvm.getelementptr %13[%11] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
%43 = llvm.ptrtoint %42 : !llvm.ptr<f16> to i64
%44 = llvm.alloca %7 x i32 : (i32) -> !llvm.ptr<i32>
llvm.store %8, %44 : !llvm.ptr<i32>
%45 = llvm.call @_mlir_ciface_tf_alloc(%arg0, %31, %43, %8, %7, %44) : (!llvm.ptr<i8>, i64, i64, i32, i32, !llvm.ptr<i32>) -> !llvm.ptr<i8>
%46 = llvm.bitcast %45 : !llvm.ptr<i8> to !llvm.ptr<f16>
%47 = llvm.icmp "eq" %31, %10 : i64
%48 = llvm.or %6, %47 : i1
%49 = llvm.mlir.null : !llvm.ptr<i8>
%50 = llvm.icmp "ne" %45, %49 : !llvm.ptr<i8>
%51 = llvm.or %50, %48 : i1
llvm.cond_br %51, ^bb7, ^bb13
^bb7: // pred: ^bb6
%52 = llvm.urem %31, %9 : i64
%53 = llvm.sub %31, %52 : i64
llvm.br ^bb8(%10 : i64)
^bb8(%54: i64): // 2 preds: ^bb7, ^bb9
%55 = llvm.icmp "slt" %54, %53 : i64
llvm.cond_br %55, ^bb9, ^bb10
^bb9: // pred: ^bb8
%56 = llvm.mul %54, %11 : i64
%57 = llvm.add %56, %10 : i64
%58 = llvm.add %57, %10 : i64
%59 = llvm.getelementptr %41[%58] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
%60 = llvm.bitcast %59 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
%61 = llvm.load %60 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
%62 = "llvm.intr.sqrt"(%61) : (vector<4xf16>) -> vector<4xf16>
%63 = llvm.fdiv %5, %62 : vector<4xf16>
%64 = llvm.getelementptr %46[%58] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
%65 = llvm.bitcast %64 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
llvm.store %63, %65 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
%66 = llvm.add %54, %9 : i64
llvm.br ^bb8(%66 : i64)
^bb10: // pred: ^bb8
%67 = llvm.icmp "ult" %53, %31 : i64
llvm.cond_br %67, ^bb11, ^bb12
^bb11: // pred: ^bb10
%68 = llvm.mul %53, %12 : i64
%69 = llvm.add %31, %68 : i64
%70 = llvm.mul %53, %11 : i64
%71 = llvm.add %70, %10 : i64
%72 = llvm.trunc %69 : i64 to i32
%73 = llvm.mlir.undef : vector<4xi32>
%74 = llvm.insertelement %72, %73[%8 : i32] : vector<4xi32>
%75 = llvm.shufflevector %74, %73 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<4xi32>, vector<4xi32>
%76 = llvm.icmp "slt" %4, %75 : vector<4xi32>
%77 = llvm.add %71, %10 : i64
%78 = llvm.getelementptr %41[%77] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
%79 = llvm.bitcast %78 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
%80 = llvm.intr.masked.load %79, %76, %3 {alignment = 2 : i32} : (!llvm.ptr<vector<4xf16>>, vector<4xi1>, vector<4xf16>) -> vector<4xf16>
%81 = llvm.bitcast %16 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
llvm.store %80, %81 : !llvm.ptr<vector<4xf16>>
%82 = llvm.load %81 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
%83 = "llvm.intr.sqrt"(%82) : (vector<4xf16>) -> vector<4xf16>
%84 = llvm.fdiv %5, %83 : vector<4xf16>
%85 = llvm.bitcast %17 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
llvm.store %84, %85 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
%86 = llvm.load %85 : !llvm.ptr<vector<4xf16>>
%87 = llvm.getelementptr %46[%77] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
%88 = llvm.bitcast %87 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
llvm.intr.masked.store %86, %88, %76 {alignment = 2 : i32} : vector<4xf16>, vector<4xi1> into !llvm.ptr<vector<4xf16>>
llvm.br ^bb12
^bb12: // 2 preds: ^bb10, ^bb11
%89 = llvm.mul %2, %1 : i64
%90 = llvm.mul %arg1, %2 : i64
%91 = llvm.add %90, %11 : i64
%92 = llvm.mul %91, %1 : i64
%93 = llvm.add %89, %92 : i64
%94 = llvm.alloca %93 x i8 : (i64) -> !llvm.ptr<i8>
%95 = llvm.bitcast %94 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>>
llvm.store %46, %95 : !llvm.ptr<ptr<f16>>
%96 = llvm.getelementptr %95[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
llvm.store %46, %96 : !llvm.ptr<ptr<f16>>
%97 = llvm.getelementptr %95[%2] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
%98 = llvm.bitcast %97 : !llvm.ptr<ptr<f16>> to !llvm.ptr<i64>
llvm.store %10, %98 : !llvm.ptr<i64>
%99 = llvm.bitcast %94 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64, i64)>>
%100 = llvm.getelementptr %99[%10, 3] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64, i64)>>, i64) -> !llvm.ptr<i64>
%101 = llvm.getelementptr %100[%arg1] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
%102 = llvm.sub %arg1, %11 : i64
llvm.br ^bb14(%102, %11 : i64, i64)
^bb13: // pred: ^bb6
%103 = llvm.mlir.addressof @error_message_2208944672953921889 : !llvm.ptr<array<42 x i8>>
%104 = llvm.getelementptr %103[%10, %10] : (!llvm.ptr<array<42 x i8>>, i64, i64) -> !llvm.ptr<i8>
llvm.call @_mlir_ciface_tf_report_error(%arg0, %0, %104) : (!llvm.ptr<i8>, i32, !llvm.ptr<i8>) -> ()
%105 = llvm.mul %2, %1 : i64
%106 = llvm.mul %2, %10 : i64
%107 = llvm.add %106, %11 : i64
%108 = llvm.mul %107, %1 : i64
%109 = llvm.add %105, %108 : i64
%110 = llvm.alloca %109 x i8 : (i64) -> !llvm.ptr<i8>
%111 = llvm.bitcast %110 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>>
llvm.store %13, %111 : !llvm.ptr<ptr<f16>>
%112 = llvm.getelementptr %111[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
llvm.store %13, %112 : !llvm.ptr<ptr<f16>>
%113 = llvm.getelementptr %111[%2] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
%114 = llvm.bitcast %113 : !llvm.ptr<ptr<f16>> to !llvm.ptr<i64>
llvm.store %10, %114 : !llvm.ptr<i64>
%115 = llvm.call @malloc(%109) : (i64) -> !llvm.ptr<i8>
"llvm.intr.memcpy"(%115, %110, %109, %6) : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> ()
%116 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
%117 = llvm.insertvalue %10, %116[0] : !llvm.struct<(i64, ptr<i8>)>
%118 = llvm.insertvalue %115, %117[1] : !llvm.struct<(i64, ptr<i8>)>
llvm.return %118 : !llvm.struct<(i64, ptr<i8>)>
^bb14(%119: i64, %120: i64): // 2 preds: ^bb12, ^bb15
%121 = llvm.icmp "sge" %119, %10 : i64
llvm.cond_br %121, ^bb15, ^bb16
^bb15: // pred: ^bb14
%122 = llvm.getelementptr %21[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
%123 = llvm.load %122 : !llvm.ptr<i64>
%124 = llvm.getelementptr %100[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
llvm.store %123, %124 : !llvm.ptr<i64>
%125 = llvm.getelementptr %101[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
llvm.store %120, %125 : !llvm.ptr<i64>
%126 = llvm.mul %120, %123 : i64
%127 = llvm.sub %119, %11 : i64
llvm.br ^bb14(%127, %126 : i64, i64)
^bb16: // pred: ^bb14
%128 = llvm.call @malloc(%93) : (i64) -> !llvm.ptr<i8>
"llvm.intr.memcpy"(%128, %94, %93, %6) : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> ()
%129 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
%130 = llvm.insertvalue %arg1, %129[0] : !llvm.struct<(i64, ptr<i8>)>
%131 = llvm.insertvalue %128, %130[1] : !llvm.struct<(i64, ptr<i8>)>
llvm.return %131 : !llvm.struct<(i64, ptr<i8>)>
}
llvm.func @_mlir_ciface_Rsqrt_CPU_DT_HALF_DT_HALF(%arg0: !llvm.ptr<struct<(i64, ptr<i8>)>>, %arg1: !llvm.ptr<i8>, %arg2: !llvm.ptr<struct<(i64, ptr<i8>)>>) attributes {llvm.emit_c_interface, tf_entry} {
%0 = llvm.load %arg2 : !llvm.ptr<struct<(i64, ptr<i8>)>>
%1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)>
%2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)>
%3 = llvm.call @Rsqrt_CPU_DT_HALF_DT_HALF(%arg1, %1, %2) : (!llvm.ptr<i8>, i64, !llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)>
llvm.store %3, %arg0 : !llvm.ptr<struct<(i64, ptr<i8>)>>
llvm.return
}
}

show more ...


# 6e02e275 15-Jun-2022 Phoebe Wang <[email protected]>

Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"

Disabled 2 mlir tests due to the runtime doesn't support `_Float16`, see
the issue here https://github.com/llvm/llvm-pro

Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"

Disabled 2 mlir tests due to the runtime doesn't support `_Float16`, see
the issue here https://github.com/llvm/llvm-project/issues/55992

show more ...


# 5d8298a7 12-Jun-2022 Mehdi Amini <[email protected]>

Revert "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"

This reverts commit 2d2da259c8726fd5c974c01122a9689981a12196.

This breaks MLIR integration test (JIT crashing), reverti

Revert "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"

This reverts commit 2d2da259c8726fd5c974c01122a9689981a12196.

This breaks MLIR integration test (JIT crashing), reverting in the
meantime.

show more ...


# 2d2da259 11-Jun-2022 Phoebe Wang <[email protected]>

[X86][RFC] Enable `_Float16` type support on X86 following the psABI

GCC and Clang/LLVM will support `_Float16` on X86 in C/C++, following
the latest X86 psABI. (https://gitlab.com/x86-psABIs)

_Flo

[X86][RFC] Enable `_Float16` type support on X86 following the psABI

GCC and Clang/LLVM will support `_Float16` on X86 in C/C++, following
the latest X86 psABI. (https://gitlab.com/x86-psABIs)

_Float16 arithmetic will be performed using native half-precision. If
native arithmetic instructions are not available, it will be performed
at a higher precision (currently always float) and then truncated down
to _Float16 immediately after each single arithmetic operation.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D107082

show more ...


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 7f7dac71 25-Nov-2021 Zarko Todorovski <[email protected]>

[NFC][llvm] Inclusive language: reword uses of sanity test and check

Part of continuing work to use more inclusive language. Reworded uses
of sanity check and sanity test in llvm/test/


Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# 0aef747b 11-Jun-2021 Roman Lebedev <[email protected]>

[NFC][X86][Codegen] Megacommit: mass-regenerate all check lines that were already autogenerated

The motivation is that the update script has at least two deviations
(`<...>@GOT`/`<...>@PLT`/ and not

[NFC][X86][Codegen] Megacommit: mass-regenerate all check lines that were already autogenerated

The motivation is that the update script has at least two deviations
(`<...>@GOT`/`<...>@PLT`/ and not hiding pointer arithmetics) from
what pretty much all the checklines were generated with,
and most of the tests are still not updated, so each time one of the
non-up-to-date tests is updated to see the effect of the code change,
there is a lot of noise. Instead of having to deal with that each
time, let's just deal with everything at once.

This has been done via:
```
cd llvm-project/llvm/test/CodeGen/X86
grep -rl "; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py" | xargs -L1 <...>/llvm-project/llvm/utils/update_llc_test_checks.py --llc-binary <...>/llvm-project/build/bin/llc
```

Not all tests were regenerated, however.

show more ...


Revision tags: llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1
# c22dc71b 16-Nov-2020 Wang, Pengfei <[email protected]>

[CodeGen][X86] Remove unused trivial check-prefixes from all CodeGen/X86 directory.

I had manually removed unused prefixes from CodeGen/X86 directory for more than 100 tests.
I checked the change hi

[CodeGen][X86] Remove unused trivial check-prefixes from all CodeGen/X86 directory.

I had manually removed unused prefixes from CodeGen/X86 directory for more than 100 tests.
I checked the change history for each of them at the beginning, and then I mainly focused on the format since I found all of the unused prefixes were result from either insensible copy or residuum after functional update.
I think it's OK to remove the remaining X86 tests by script now. I wrote a rough script which works for me in most tests. I put it in llvm/utils temporarily for review and hope it may help other components owners.
The tests in this patch are all generated by the tool and checked by update tool for the autogenerated tests. I skimmed them and checked about 30 tests and didn't find any unexpected changes.

Reviewed By: mtrofin, MaskRay

Differential Revision: https://reviews.llvm.org/D91496

show more ...


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2
# 0c005be6 29-Jul-2020 Simon Pilgrim <[email protected]>

[X86][SSE] getV4X86ShuffleImm8 - canonicalize broadcast masks

If the mask input to getV4X86ShuffleImm8 only refers to a single source element (+ undefs) then canonicalize to a full broadcast.

getV4

[X86][SSE] getV4X86ShuffleImm8 - canonicalize broadcast masks

If the mask input to getV4X86ShuffleImm8 only refers to a single source element (+ undefs) then canonicalize to a full broadcast.

getV4X86ShuffleImm8 defaults to inline values for undefs, which can be useful for shuffle widening/narrowing but does leave SimplifyDemanded* calls thinking the shuffle depends on unnecessary elements.

I'm still investigating what we should do more generally to avoid these undemanded elements, but broadcast cases was a simpler win.

show more ...


Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3
# 15b6aa74 23-Feb-2020 Craig Topper <[email protected]>

[X86] Enable the use of movlps for i64 atomic load on 32-bit targets with sse1.

Still a little room for improvement by using movlps to store to
the stack temporary needed to move data out of the xmm

[X86] Enable the use of movlps for i64 atomic load on 32-bit targets with sse1.

Still a little room for improvement by using movlps to store to
the stack temporary needed to move data out of the xmm register
after the load.

show more ...


# 2a10f801 23-Feb-2020 Craig Topper <[email protected]>

[X86] Use FIST for i64 atomic stores on 32-bit targets without SSE.


# bdb1729c 23-Feb-2020 Craig Topper <[email protected]>

[X86] Teach EltsFromConsecutiveLoads that it's ok to form a v4f32 VZEXT_LOAD with a 64 bit memory size on SSE1 targets.

We can use MOVLPS which will load 64 bits, but we need a v4f32
result type. We

[X86] Teach EltsFromConsecutiveLoads that it's ok to form a v4f32 VZEXT_LOAD with a 64 bit memory size on SSE1 targets.

We can use MOVLPS which will load 64 bits, but we need a v4f32
result type. We already have isel patterns for this.

The code here is a little hacky. We can probably improve it with
more isel patterns.

show more ...


# e7a184fc 23-Feb-2020 Craig Topper <[email protected]>

[X86] Use movlps for i64 atomic stores on 32-targets with sse1.

This is similar to using movd which we do for sse2 targets.

I've added a DAG combine for VEXTRACT_STORE to use SimplifyDemandedVector

[X86] Use movlps for i64 atomic stores on 32-targets with sse1.

This is similar to using movd which we do for sse2 targets.

I've added a DAG combine for VEXTRACT_STORE to use SimplifyDemandedVectorElts
to clean up some artifacts from type legalization.

show more ...


Revision tags: llvmorg-10.0.0-rc2
# 943b5561 01-Feb-2020 Craig Topper <[email protected]>

[LegalizeTypes][X86] Add a new strategy for type legalizing f16 type that softens it to i16, but promotes to f32 around arithmetic ops.

This is based on this llvm-dev thread http://lists.llvm.org/pi

[LegalizeTypes][X86] Add a new strategy for type legalizing f16 type that softens it to i16, but promotes to f32 around arithmetic ops.

This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html

The current strategy for f16 is to promote type to float every except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes.

It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from.

This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise its passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle.

Differential Revision: https://reviews.llvm.org/D73749

show more ...


Revision tags: llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# 027aa27d 05-Nov-2019 Philip Reames <[email protected]>

[X86/Atomics] (Semantically) revert G246098, switch back to the old atomic example

When writing an email for a follow up proposal, I realized one of the diffs in the committed change was incorrect.

[X86/Atomics] (Semantically) revert G246098, switch back to the old atomic example

When writing an email for a follow up proposal, I realized one of the diffs in the committed change was incorrect. Digging into it revealed that the fix is complicated enough to require some thought, so reverting in the meantime.

The problem is visible in this diff (from the revert):
; X64-SSE-LABEL: store_fp128:
; X64-SSE: # %bb.0:
-; X64-SSE-NEXT: movaps %xmm0, (%rdi)
+; X64-SSE-NEXT: subq $24, %rsp
+; X64-SSE-NEXT: .cfi_def_cfa_offset 32
+; X64-SSE-NEXT: movaps %xmm0, (%rsp)
+; X64-SSE-NEXT: movq (%rsp), %rsi
+; X64-SSE-NEXT: movq {{[0-9]+}}(%rsp), %rdx
+; X64-SSE-NEXT: callq __sync_lock_test_and_set_16
+; X64-SSE-NEXT: addq $24, %rsp
+; X64-SSE-NEXT: .cfi_def_cfa_offset 8
; X64-SSE-NEXT: retq
store atomic fp128 %v, fp128* %fptr unordered, align 16
ret void

The problem here is three fold:
1) x86-64 doesn't guarantee atomicity of anything larger than 8 bytes. Some platforms observably break this guarantee, others don't, but the codegen isn't considering this, so it's wrong on at least some platforms.
2) When I started to track down the problem, I discovered that DAGCombiner had stripped the atomicity off the store entirely. This comes down to idiomatic usage of DAG.getStore passing all MMO components separately as opposed to just passing the MMO.
3) On x86 (not -64), there are cases where 8 byte atomiciy is supported, but only for floating point operations. This would seem to imply that operation typing matters for correctness, and DAGCombine happily folds away bitcasts. I'm not 100% sure there's a problem here, but I'm not entirely sure there isn't either.

I plan on returning to each issue in turn; sorry for the churn here.

show more ...


# 2460989e 29-Oct-2019 Philip Reames <[email protected]>

[SelectionDAG] Enable lowering unordered atomics loads w/LoadSDNode (and stores w/StoreSDNode) by default

Enable the new SelectionDAG representation for unordered loads and stores introduced in r371

[SelectionDAG] Enable lowering unordered atomics loads w/LoadSDNode (and stores w/StoreSDNode) by default

Enable the new SelectionDAG representation for unordered loads and stores introduced in r371441 by default. As a reminder, the new lowering changes the representation of an unordered atomic load from an AtomicSDNode - which is essentially a black box which gets passed through without combines messing with it - to a LoadSDNode w/a atomic marker on the MMO. The later parallels the way we handle volatiles, and I've audited the code to ensure that every location which checks one checks the other.

This has been fairly heavily fuzzed, and I examined diffs in a reasonable large corpus of assembly by hand, so I'm reasonable sure this is correct for the common case. Late in the review for this, it was discovered that I hadn't correctly handled cases which could be legalized into CAS operations. This points out that there's a strong bias in the IR of the frontend I'm working with towards only legal atomics. If there are problems with this patch, the most likely area will be legalization.

Differential Revision: https://reviews.llvm.org/D69219

show more ...


Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4
# 45cd1851 02-Sep-2019 Craig Topper <[email protected]>

[X86] Enable fp128 as a legal type with SSE1 rather than with MMX.

FP128 values are passed in xmm registers so should be asssociated
with an SSE feature rather than MMX which uses a different set
of

[X86] Enable fp128 as a legal type with SSE1 rather than with MMX.

FP128 values are passed in xmm registers so should be asssociated
with an SSE feature rather than MMX which uses a different set
of registers.

llc enables sse1 and sse2 by default with x86_64. But does not
enable mmx. Clang enables all 3 features by default.

I've tried to add command lines to test with -sse
where possible, but any test that returns a value in an xmm
register fails with a fatal error with -sse since we have no
defined ABI for that scenario.

llvm-svn: 370682

show more ...


Revision tags: llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# 3098e44d 14-May-2019 Philip Reames <[email protected]>

[X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit targets

This is a follow on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't ne

[X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit targets

This is a follow on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence.

Differential Revision: https://reviews.llvm.org/D61863

llvm-svn: 360649

show more ...


# 063b471f 27-Apr-2019 Craig Topper <[email protected]>

[X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled

Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to in

[X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled

Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to insert an mfence after the store.

Reviewers: spatel, RKSimon, reames, jfb, efriedma

Reviewed By: RKSimon

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60546

llvm-svn: 359368

show more ...


# 586fad50 11-Apr-2019 Craig Topper <[email protected]>

[X86] Add patterns for using movss/movsd for atomic load/store of f32/64. Remove atomic fadd pseudos use isel patterns instead.

This patch adds patterns for turning bitcasted atomic load/store into

[X86] Add patterns for using movss/movsd for atomic load/store of f32/64. Remove atomic fadd pseudos use isel patterns instead.

This patch adds patterns for turning bitcasted atomic load/store into movss/sd.

It also removes the pseudo instructions for atomic RMW fadd. Instead just adding isel patterns for folding an atomic load into addss/sd. And relying on the new movss/sd store pattern to handle the write part.

This also makes the fadd patterns use VEX and EVEX instructions when AVX or AVX512F are enabled.

Differential Revision: https://reviews.llvm.org/D60394

llvm-svn: 358215

show more ...


12