1; 2; Here we have 5-way unswitchable switch with each successor also having an unswitchable 3; exiting branch in it. If we start unswitching those branches we start duplicating the 4; whole switch. This can easily lead to exponential behavior w/o proper control. 5; On a real-life testcase there was 16-way switch and that took forever to compile w/o 6; a cost control. 7; 8; 9; When we use the stricted multiplier candidates formula (unscaled candidates == 0) 10; we should be getting just a single loop. 11; 12; RUN: opt < %s -enable-unswitch-cost-multiplier=true \ 13; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=1 \ 14; RUN: -passes='loop(unswitch<nontrivial>),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1 15; 16; RUN: opt < %s -enable-unswitch-cost-multiplier=true \ 17; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=16 \ 18; RUN: -passes='loop(unswitch<nontrivial>),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1 19; 20; 21; With relaxed candidates multiplier (unscaled candidates == 8) we should allow 22; some unswitches to happen until siblings multiplier starts kicking in: 23; 24; RUN: opt < %s -enable-unswitch-cost-multiplier=true \ 25; RUN: -unswitch-num-initial-unscaled-candidates=8 -unswitch-siblings-toplevel-div=1 \ 26; RUN: -passes='loop(unswitch<nontrivial>),print<loops>' -disable-output 2>&1 | \ 27; RUN: sort -b -k 1 | FileCheck %s --check-prefixes=LOOP-RELAX 28; 29; With relaxed candidates multiplier (unscaled candidates == 8) and with relaxed 30; siblings multiplier for top-level loops (toplevel-div == 8) we should get 31; considerably more copies of the loop (especially top-level ones). 32; 33; RUN: opt < %s -enable-unswitch-cost-multiplier=true \ 34; RUN: -unswitch-num-initial-unscaled-candidates=8 -unswitch-siblings-toplevel-div=8 \ 35; RUN: -passes='loop(unswitch<nontrivial>),print<loops>' -disable-output 2>&1 | \ 36; RUN: sort -b -k 1 | FileCheck %s --check-prefixes=LOOP-RELAX2 37; 38; We get hundreds of copies of the loop when cost multiplier is disabled: 39; 40; RUN: opt < %s -enable-unswitch-cost-multiplier=false \ 41; RUN: -passes='loop(unswitch<nontrivial>),print<loops>' -disable-output 2>&1 | \ 42; RUN: sort -b -k 1 | FileCheck %s --check-prefixes=LOOP-MAX 43; 44 45; Single loop nest, not unswitched 46; LOOP1: Loop at depth 1 containing: 47; LOOP1-NOT: Loop at depth 1 containing: 48; LOOP1: Loop at depth 2 containing: 49; LOOP1-NOT: Loop at depth 2 containing: 50; 51; Somewhat relaxed restrictions on candidates: 52; LOOP-RELAX-COUNT-5: Loop at depth 1 containing: 53; LOOP-RELAX-NOT: Loop at depth 1 containing: 54; LOOP-RELAX-COUNT-32: Loop at depth 2 containing: 55; LOOP-RELAX-NOT: Loop at depth 2 containing: 56; 57; Even more relaxed restrictions on candidates and siblings. 58; LOOP-RELAX2-COUNT-11: Loop at depth 1 containing: 59; LOOP-RELAX2-NOT: Loop at depth 1 containing: 60; LOOP-RELAX2-COUNT-40: Loop at depth 2 containing: 61; LOOP-RELAX-NOT: Loop at depth 2 containing: 62; 63; Unswitched as much as it could (with multiplier disabled). 64; LOOP-MAX-COUNT-56: Loop at depth 1 containing: 65; LOOP-MAX-NOT: Loop at depth 1 containing: 66; LOOP-MAX-COUNT-111: Loop at depth 2 containing: 67; LOOP-MAX-NOT: Loop at depth 2 containing: 68 69define i32 @loop_switch(i32* %addr, i32 %c1, i32 %c2) { 70entry: 71 %addr1 = getelementptr i32, i32* %addr, i64 0 72 %addr2 = getelementptr i32, i32* %addr, i64 1 73 %check0 = icmp eq i32 %c2, 0 74 %check1 = icmp eq i32 %c2, 31 75 %check2 = icmp eq i32 %c2, 32 76 %check3 = icmp eq i32 %c2, 33 77 %check4 = icmp eq i32 %c2, 34 78 br label %outer_loop 79 80outer_loop: 81 %iv1 = phi i32 [0, %entry], [%iv1.next, %outer_latch] 82 %iv1.next = add i32 %iv1, 1 83 br label %inner_loop 84inner_loop: 85 %iv2 = phi i32 [0, %outer_loop], [%iv2.next, %inner_latch] 86 %iv2.next = add i32 %iv2, 1 87 switch i32 %c1, label %inner_latch [ 88 i32 0, label %case0 89 i32 1, label %case1 90 i32 2, label %case2 91 i32 3, label %case3 92 i32 4, label %case4 93 ] 94 95case4: 96 br i1 %check4, label %exit, label %inner_latch 97case3: 98 br i1 %check3, label %exit, label %inner_latch 99case2: 100 br i1 %check2, label %exit, label %inner_latch 101case1: 102 br i1 %check1, label %exit, label %inner_latch 103case0: 104 br i1 %check0, label %exit, label %inner_latch 105 106inner_latch: 107 store volatile i32 0, i32* %addr1 108 %test_inner = icmp slt i32 %iv2, 50 109 br i1 %test_inner, label %inner_loop, label %outer_latch 110 111outer_latch: 112 store volatile i32 0, i32* %addr2 113 %test_outer = icmp slt i32 %iv1, 50 114 br i1 %test_outer, label %outer_loop, label %exit 115 116exit: ; preds = %bci_0 117 ret i32 1 118} 119