| #
52735fc4 |
| 14-Jul-2016 |
Dean Michael Berris <[email protected]> |
XRay: Add entry and exit sleds
Summary: In this patch we implement the following parts of XRay:
- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-alw
XRay: Add entry and exit sleds
Summary: In this patch we implement the following parts of XRay:
- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches. - Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts). - X86-specific nop sleds as described in the white paper. - A machine function pass that adds the different instrumentation marker instructions at a very late stage. - A way of identifying which return opcode is considered "normal" for each architecture.
There are some caveats here:
1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet.
2) The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library.
Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk
Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D19904
llvm-svn: 275367
show more ...
|
| #
90d195a5 |
| 08-Jul-2016 |
Wei Mi <[email protected]> |
[PM] Port UnreachableBlockElim to the new Pass Manager
Differential Revision: http://reviews.llvm.org/D22124
llvm-svn: 274824
|
| #
82d5da5a |
| 24-Jun-2016 |
Michael Kuperstein <[email protected]> |
[PM] Port PreISelIntrinsicLowering to the new PM
llvm-svn: 273713
|
|
Revision tags: llvmorg-3.8.1, llvmorg-3.8.1-rc1 |
|
| #
f9acacaa |
| 31-May-2016 |
Matthias Braun <[email protected]> |
CodeGen: Refactor renameDisconnectedComponents() as a pass
Refactor LiveIntervals::renameDisconnectedComponents() to be a pass. Also change the name to "RenameIndependentSubregs":
- renameDisconnec
CodeGen: Refactor renameDisconnectedComponents() as a pass
Refactor LiveIntervals::renameDisconnectedComponents() to be a pass. Also change the name to "RenameIndependentSubregs":
- renameDisconnectedComponents() worked on a MachineFunction at a time so it is a natural candidate for a machine function pass.
- The algorithm is testable with a .mir test now.
- This also fixes a problem where the lazy renaming as part of the MachineScheduler introduced IMPLICIT_DEF instructions after the number of a nodes in a region were counted leading to a mismatch.
Differential Revision: http://reviews.llvm.org/D20507
llvm-svn: 271345
show more ...
|
| #
330a1255 |
| 19-May-2016 |
Matthew Simpson <[email protected]> |
[ARM, AArch64] Properly initialize InterleavedAccessPass
InterleavedAccessPass is an IR-level pass, so this change will enable testing it with opt. This is part of D20250.
llvm-svn: 270101
|
| #
fbe85ae1 |
| 28-Apr-2016 |
Matthias Braun <[email protected]> |
CodeGen: Add DetectDeadLanes pass.
The DetectDeadLanes pass performs a dataflow analysis of used/defined subregister lanes across COPY instructions and instructions that will get lowered to copies.
CodeGen: Add DetectDeadLanes pass.
The DetectDeadLanes pass performs a dataflow analysis of used/defined subregister lanes across COPY instructions and instructions that will get lowered to copies. It detects dead definitions and uses reading undefined values which are obscured by COPY and subregister usage.
These dead definitions cause trouble in the register coalescer which cannot deal with definitions suddenly becoming dead after coalescing COPY instructions.
For now the pass only adds dead and undef flags to machine operands. It should be possible to extend it in the future to remove the dead instructions and redo the analysis for the affected virtual registers.
Differential Revision: http://reviews.llvm.org/D18427
llvm-svn: 267851
show more ...
|
| #
7dd8dbf4 |
| 22-Apr-2016 |
Peter Collingbourne <[email protected]> |
Introduce llvm.load.relative intrinsic.
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that value and ret
Introduce llvm.load.relative intrinsic.
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that value and returns it. The constant folder specifically recognizes the form of this intrinsic and the constant initializers it may load from; if a loaded constant initializer is known to have the form ``i32 trunc(x - %ptr)``, the intrinsic call is folded to ``x``.
LLVM provides that the calculation of such a constant initializer will not overflow at link time under the medium code model if ``x`` is an ``unnamed_addr`` function. However, it does not provide this guarantee for a constant initializer folded into a function body. This intrinsic can be used to avoid the possibility of overflows when loading from such a constant.
Differential Revision: http://reviews.llvm.org/D18367
llvm-svn: 267223
show more ...
|
| #
ee34680b |
| 22-Apr-2016 |
Tom Stellard <[email protected]> |
CodeGen: Add a stand-alone hazard recognizer pass
Summary: This new pass allows targets to use the hazard recognizer without having to also run one of the schedulers. This is useful when compiling
CodeGen: Add a stand-alone hazard recognizer pass
Summary: This new pass allows targets to use the hazard recognizer without having to also run one of the schedulers. This is useful when compiling with optimizations disabled for targets that still need noop hazards to be handled correctly.
Reviewers: hfinkel, atrick
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18594
llvm-svn: 267156
show more ...
|
| #
c0441c29 |
| 19-Apr-2016 |
Sanjoy Das <[email protected]> |
Introduce a "patchable-function" function attribute
Summary: The `"patchable-function"` attribute can be used by an LLVM client to influence LLVM's code generation in ways that makes the generated c
Introduce a "patchable-function" function attribute
Summary: The `"patchable-function"` attribute can be used by an LLVM client to influence LLVM's code generation in ways that makes the generated code easily patchable at runtime (for instance, to redirect control). Right now only one patchability scheme is supported, `"prologue-short-redirect"`, but this can be expanded in the future.
Reviewers: joker.eph, rnk, echristo, dberris
Subscribers: joker.eph, echristo, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19046
llvm-svn: 266715
show more ...
|
|
Revision tags: llvmorg-3.8.0, llvmorg-3.8.0-rc3, llvmorg-3.8.0-rc2 |
|
| #
390c33cd |
| 27-Jan-2016 |
Benjamin Kramer <[email protected]> |
Move SafeStack to CodeGen.
It depends on the target machinery, that's not available for instrumentation passes.
llvm-svn: 258942
|
|
Revision tags: llvmorg-3.8.0-rc1 |
|
| #
859ad29b |
| 16-Dec-2015 |
Vikram TV <[email protected]> |
Recommit LiveDebugValues pass after fixing a couple of minor issues.
llvm-svn: 255759
|
| #
ceca9715 |
| 09-Dec-2015 |
Mehdi Amini <[email protected]> |
Revert "Implement a new pass - LiveDebugValues - to compute the set of live DEBUG_VALUEs at each basic block and insert them. Reviewed and accepted at: http://reviews.llvm.org/D11933"
This reverts c
Revert "Implement a new pass - LiveDebugValues - to compute the set of live DEBUG_VALUEs at each basic block and insert them. Reviewed and accepted at: http://reviews.llvm.org/D11933"
This reverts commit r255096.
Break the bots: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/16378/
From: Mehdi Amini <[email protected]> llvm-svn: 255101
show more ...
|
| #
0876d2d5 |
| 09-Dec-2015 |
Vikram TV <[email protected]> |
Implement a new pass - LiveDebugValues - to compute the set of live DEBUG_VALUEs at each basic block and insert them. Reviewed and accepted at: http://reviews.llvm.org/D11933
llvm-svn: 255096
|
|
Revision tags: llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1 |
|
| #
97890230 |
| 17-Sep-2015 |
David Majnemer <[email protected]> |
[WinEH] Add a funclet layout pass
Windows EH funclets need to be contiguous. The FuncletLayout pass will ensure that the funclets are together and begin with a funclet entry MBB.
Differential Revi
[WinEH] Add a funclet layout pass
Windows EH funclets need to be contiguous. The FuncletLayout pass will ensure that the funclets are together and begin with a funclet entry MBB.
Differential Revision: http://reviews.llvm.org/D12943
llvm-svn: 247937
show more ...
|
|
Revision tags: llvmorg-3.7.0, llvmorg-3.7.0-rc4, llvmorg-3.7.0-rc3, llvmorg-3.7.0-rc2, llvmorg-3.7.0-rc1, llvmorg-3.6.2, llvmorg-3.6.2-rc1 |
|
| #
69fad079 |
| 15-Jun-2015 |
Sanjoy Das <[email protected]> |
[CodeGen] Add a pass to fold null checks into nearby memory operations.
Summary: This change adds an "ImplicitNullChecks" target dependent pass. This pass folds null checks into memory operation us
[CodeGen] Add a pass to fold null checks into nearby memory operations.
Summary: This change adds an "ImplicitNullChecks" target dependent pass. This pass folds null checks into memory operation using the FAULTING_LOAD pseudo-op introduced in previous patches.
Depends on D10197 Depends on D10199 Depends on D10200
Reviewers: reames, rnk, pgavlin, JosephTremoulet, atrick
Reviewed By: atrick
Subscribers: ab, JosephTremoulet, llvm-commits
Differential Revision: http://reviews.llvm.org/D10201
llvm-svn: 239743
show more ...
|
|
Revision tags: llvmorg-3.6.1, llvmorg-3.6.1-rc1 |
|
| #
61b305ed |
| 05-May-2015 |
Quentin Colombet <[email protected]> |
[ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find
[ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exits blocks.
As an example and to avoid regressions to be introduce, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the entry and exits blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of a if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false
true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false
false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 }
On AArch64 this code generates (removing the cfi directives to ease readabilities): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret
With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret
Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore point into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need hack to properly avoid frequently executed point. Instead, it relies on dominance and loop properties.
The pass is off by default and each target can opt-in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overwritten on the command line by using -enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save point and restore point, the impacted component would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore point instead of one pointer. - PEI: Should loop over the save point and restore point. Anyhow, at least for this first iteration, I do not believe this is interesting to support the complex cases. We should revisit that when we motivating examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
llvm-svn: 236507
show more ...
|
|
Revision tags: llvmorg-3.5.2, llvmorg-3.5.2-rc1 |
|
| #
be0a0506 |
| 09-Mar-2015 |
Reid Kleckner <[email protected]> |
Reland r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
Fix the double-deletion of AnalysisResolver when delegating through to Dwarf EH preparation by creating one from
Reland r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
Fix the double-deletion of AnalysisResolver when delegating through to Dwarf EH preparation by creating one from scratch. Hopefully the new pass manager simplifies this.
This reverts commit r229952.
llvm-svn: 231719
show more ...
|
|
Revision tags: llvmorg-3.6.0 |
|
| #
301ed0c3 |
| 20-Feb-2015 |
Chandler Carruth <[email protected]> |
Revert r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
This doesn't pass 'ninja check-llvm' for me. Lots of tests, including the ones updated, fail with crashes and ot
Revert r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
This doesn't pass 'ninja check-llvm' for me. Lots of tests, including the ones updated, fail with crashes and other explosions.
llvm-svn: 229952
show more ...
|
| #
0b647e6c |
| 20-Feb-2015 |
Reid Kleckner <[email protected]> |
EH: Prune unreachable resume instructions during Dwarf EH preparation
Today a simple function that only catches exceptions and doesn't run destructor cleanups ends up containing a dead call to _Unwi
EH: Prune unreachable resume instructions during Dwarf EH preparation
Today a simple function that only catches exceptions and doesn't run destructor cleanups ends up containing a dead call to _Unwind_Resume (PR20300). We can't remove these dead resume instructions during normal optimization because inlining might introduce additional landingpads that do have cleanups to run. Instead we can do this during EH preparation, which is guaranteed to run after inlining.
Fixes PR20300.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D7744
llvm-svn: 229944
show more ...
|
|
Revision tags: llvmorg-3.6.0-rc4, llvmorg-3.6.0-rc3 |
|
| #
705b185f |
| 31-Jan-2015 |
Chandler Carruth <[email protected]> |
[PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group.
The end result is that the TTI
[PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it.
Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below.
The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to *just* do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
show more ...
|
|
Revision tags: llvmorg-3.6.0-rc2, llvmorg-3.6.0-rc1 |
|
| #
eeea8970 |
| 14-Jan-2015 |
JF Bastien <[email protected]> |
Revert "Insert random noops to increase security against ROP attacks (llvm)"
This reverts commit: http://reviews.llvm.org/D3392
llvm-svn: 225948
|
| #
dcdd5ad2 |
| 14-Jan-2015 |
JF Bastien <[email protected]> |
Insert random noops to increase security against ROP attacks (llvm)
A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-orie
Insert random noops to increase security against ROP attacks (llvm)
A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks.
Command line options: -noop-insertion // Enable noop insertion. -noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion) -max-noops-per-instruction=X // Randomly generate X noops per instruction. ie. roll the dice X times with probability set above (default: 1). This doesn't guarantee X noop instructions.
In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future). -fdiversify
This is the llvm part of the patch. clang part: D3393
http://reviews.llvm.org/D3392 Patch by Stephen Crane (@rinon)
llvm-svn: 225908
show more ...
|
|
Revision tags: llvmorg-3.5.1, llvmorg-3.5.1-rc2, llvmorg-3.5.1-rc1, llvmorg-3.5.0, llvmorg-3.5.0-rc4 |
|
| #
59c23cd9 |
| 21-Aug-2014 |
Robin Morisset <[email protected]> |
Rename AtomicExpandLoadLinked into AtomicExpand
AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of a group that aim at making it more target-independent. See http://
Rename AtomicExpandLoadLinked into AtomicExpand
AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of a group that aim at making it more target-independent. See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075873.html for details
The command line option is "atomic-expand"
llvm-svn: 216231
show more ...
|
|
Revision tags: llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2 |
|
| #
5e1207e5 |
| 03-Aug-2014 |
Gerolf Hoflehner <[email protected]> |
MachineCombiner Pass for selecting faster instruction sequence - target independent framework
When the DAGcombiner selects instruction sequences it could increase the critical path or resource l
MachineCombiner Pass for selecting faster instruction sequence - target independent framework
When the DAGcombiner selects instruction sequences it could increase the critical path or resource len.
For example, on arm64 there are multiply-accumulate instructions (madd, msub). If e.g. the equivalent multiply-add sequence is not on the crictial path it makes sense to select it instead of the combined, single accumulate instruction (madd/msub). The reason is that the conversion from add+mul to the madd could lengthen the critical path by the latency of the multiply.
But the DAGCombiner would always combine and select the madd/msub instruction.
This patch uses machine trace metrics to estimate critical path length and resource length of an original instruction sequence vs a combined instruction sequence and picks the faster code based on its estimates.
This patch only commits the target independent framework that evaluates and selects code sequences. The machine instruction combiner is turned off for all targets and expected to evolve over time by gradually handling DAGCombiner pattern in the target specific code.
This framework lays the groundwork for fixing rdar://16319955
llvm-svn: 214666
show more ...
|
|
Revision tags: llvmorg-3.5.0-rc1, llvmorg-3.4.2, llvmorg-3.4.2-rc1, llvmorg-3.4.1, llvmorg-3.4.1-rc2 |
|
| #
037f26f2 |
| 17-Apr-2014 |
Tim Northover <[email protected]> |
Atomics: promote ARM's IR-based atomics pass to CodeGen.
Still only 32-bit ARM using it at this stage, but the promotion allows direct testing via opt and is a reasonably self-contained patch on the
Atomics: promote ARM's IR-based atomics pass to CodeGen.
Still only 32-bit ARM using it at this stage, but the promotion allows direct testing via opt and is a reasonably self-contained patch on the way to switching ARM64.
At this point, other targets should be able to make use of it without too much difficulty if they want. (See ARM64 commit coming soon for an example).
llvm-svn: 206485
show more ...
|