|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
9e6d1f4b |
| 17-Jul-2022 |
Kazu Hirata <[email protected]> |
[CodeGen] Qualify auto variables in for loops (NFC)
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
1e67385d |
| 17-Jun-2022 |
Mingming Liu <[email protected]> |
[MachineBlockPlacementStats] Added check for "-filter-print-funcs" option to the machine-block-placement-stats.
Differential Revision: https://reviews.llvm.org/D128019
|
| #
b7d09557 |
| 17-Jun-2022 |
Mingming Liu <[email protected]> |
Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats."
This reverts commit 46d45df4516e9a5bc43460429cd02cd04a85db1a. Going to add differential revision link to commit message and re-commit.
|
| #
46d45df4 |
| 17-Jun-2022 |
Mingming Liu <[email protected]> |
[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats.
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
989f1c72 |
| 15-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
a278250b |
| 10-Mar-2022 |
Nico Weber <[email protected]> |
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
|
| #
7f230fee |
| 07-Mar-2022 |
serge-sans-paille <[email protected]> |
Cleanup codegen includes
after: 1061034926 before: 1063332844
Differential Revision: https://reviews.llvm.org/D121169
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
bcdc0477 |
| 01-Mar-2022 |
spupyrev <[email protected]> |
speeding up ext-tsp for huge instances
Differential Revision: https://reviews.llvm.org/D120780
|
|
Revision tags: llvmorg-14.0.0-rc1 |
|
| #
dee058c6 |
| 05-Feb-2022 |
Hongtao Yu <[email protected]> |
[CSSPGO] Turn on ext-tsp by default for CSSPGO.
I'm seeing that ext-tsp helps CSSPGO on our large internal benchmarks, so I'm turning it on for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of the lower quality of its profile counts.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D119048
|
|
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
73d92faa |
| 01-Dec-2021 |
Nicholas Guy <[email protected]> |
[CodeGen] Emit alignment "Max Skip" operand
The current AsmPrinter supports emitting the "Max Skip" operand (the third operand of .p2align), but provides no way to actually specify it. Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a per-block basis. Leaving the value at its default (0) causes no observable difference in behaviour.
Differential Revision: https://reviews.llvm.org/D114590
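To illustrate what the "Max Skip" operand controls, the hypothetical helper below models how an assembler treats `.p2align p, , max`: it pads up to the next 2^p boundary, unless that would take more than `max` bytes, in which case no alignment is done at all. This is a sketch, not LLVM code; treating a Max Skip of 0 as "no limit" is an assumption that mirrors the commit's statement that the default of 0 leaves behaviour unchanged.

```python
def p2align_padding(offset: int, log2_align: int, max_skip: int = 0) -> int:
    """Hypothetical model: padding bytes emitted by `.p2align log2_align, , max_skip`
    when the current position is `offset`."""
    alignment = 1 << log2_align
    padding = (-offset) % alignment  # bytes needed to reach the next boundary
    # Assumption: a Max Skip of 0 means "unbounded", i.e. a plain .p2align.
    if max_skip and padding > max_skip:
        return 0  # aligning would skip too many bytes, so it is not done
    return padding
```

For example, a block at offset 5 that wants 16-byte alignment needs 11 bytes of padding; with a Max Skip of 8 the alignment is dropped, while a block at offset 12 (4 bytes short of the boundary) is still aligned.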
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
f573f686 |
| 08-Nov-2021 |
spupyrev <[email protected]> |
ext-tsp basic block layout
A new basic block ordering that improves on the existing MachineBlockPlacement.
The algorithm tries to find a layout of nodes (basic blocks) of a given CFG that optimizes jump locality and thus processor I-cache utilization. This is achieved by increasing the number of fall-through jumps and co-locating frequently executed nodes. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of the classical (maximum) Traveling Salesman Problem.
The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order.
An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast.
Differential Revision: https://reviews.llvm.org/D113424
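The greedy chain-merging procedure described above can be sketched as a toy Python model. This is not the LLVM implementation: the scoring weights and distance cutoffs are assumptions chosen for illustration (a jump scores fully as a fall-through and partially when it is a short forward or backward jump), block 0 is assumed to be the entry block and is kept first, and every candidate merge is re-scored from scratch, which is far slower than a production version.

```python
from itertools import permutations

# Assumed ExtTSP parameters; illustrative values, not taken from the commit.
FALLTHROUGH_WEIGHT = 1.0
FORWARD_WEIGHT, FORWARD_DISTANCE = 0.1, 1024
BACKWARD_WEIGHT, BACKWARD_DISTANCE = 0.1, 640


def jump_score(src_end, dst_start, count):
    """Score a jump taken `count` times from the end of one block to the start of another."""
    if src_end == dst_start:  # target directly follows the source: fall-through
        return FALLTHROUGH_WEIGHT * count
    if src_end < dst_start:  # short forward jumps earn a reduced score
        dist = dst_start - src_end
        if dist <= FORWARD_DISTANCE:
            return FORWARD_WEIGHT * count * (1 - dist / FORWARD_DISTANCE)
    else:  # short backward jumps earn a reduced score
        dist = src_end - dst_start
        if dist <= BACKWARD_DISTANCE:
            return BACKWARD_WEIGHT * count * (1 - dist / BACKWARD_DISTANCE)
    return 0.0


def chain_score(chain, sizes, jumps):
    """ExtTSP value of all jumps whose source and target both lie in `chain`."""
    start, addr = {}, 0
    for block in chain:
        start[block] = addr
        addr += sizes[block]
    return sum(
        jump_score(start[s] + sizes[s], start[t], count)
        for (s, t), count in jumps.items()
        if s in start and t in start
    )


def merge_candidates(x, y):
    """Split X into X1 and X2 at every position and try every gluing with Y."""
    for i in range(len(x) + 1):
        for parts in permutations((x[:i], x[i:], y)):
            yield [block for part in parts for block in part]


def ext_tsp_layout(sizes, counts, jumps, entry=0):
    chains = [[block] for block in range(len(sizes))]  # start from isolated blocks
    while len(chains) > 1:
        best = None  # (gain, index of X, index of Y, merged chain)
        for i, x in enumerate(chains):
            for j, y in enumerate(chains):
                if i == j:
                    continue
                base = chain_score(x, sizes, jumps) + chain_score(y, sizes, jumps)
                for merged in merge_candidates(x, y):
                    if entry in merged and merged[0] != entry:
                        continue  # the entry block must stay first
                    gain = chain_score(merged, sizes, jumps) - base
                    if best is None or gain > best[0]:
                        best = (gain, i, j, merged)
        if best is None or best[0] <= 0:
            break  # no merge increases ExtTSP
        _, i, j, merged = best
        chains = [c for k, c in enumerate(chains) if k not in (i, j)] + [merged]

    def density(chain):
        return sum(counts[b] for b in chain) / sum(sizes[b] for b in chain)

    # Entry chain first, then the remaining chains by decreasing density.
    chains.sort(key=lambda c: (entry not in c, -density(c)))
    return [block for chain in chains for block in chain]
```

On a diamond CFG where 0→1→3 is hot (count 90) and 0→2→3 is cold (count 10), with every block 16 bytes, the sketch first forms the fall-through chain 0, 1, 3 and then appends the cold block, yielding `[0, 1, 3, 2]`.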
|
| #
3678326d |
| 07-Dec-2021 |
Nico Weber <[email protected]> |
Revert "ext-tsp basic block layout"
This reverts commit c68f71eb37c2b6ffcf29e865d443a910e73083bd.
Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424
|
| #
c68f71eb |
| 08-Nov-2021 |
spupyrev <[email protected]> |
ext-tsp basic block layout
A new basic block ordering that improves on the existing MachineBlockPlacement.
The algorithm tries to find a layout of nodes (basic blocks) of a given CFG that optimizes jump locality and thus processor I-cache utilization. This is achieved by increasing the number of fall-through jumps and co-locating frequently executed nodes. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of the classical (maximum) Traveling Salesman Problem.
The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order.
An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast.
Differential Revision: https://reviews.llvm.org/D113424
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
c9fca53a |
| 10-Sep-2021 |
Kazu Hirata <[email protected]> |
[CodeGen, Target] Use pred_empty and succ_empty (NFC)
|
|
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1 |
|
| #
50b62731 |
| 29-Jul-2021 |
Guozhi Wei <[email protected]> |
[MBP] findBestLoopTopHelper should exit if OldTop is not a chain header
Function findBestLoopTopHelper tries to find a new loop top block that can also fall through to OldTop, but that is impossible if OldTop is not a chain header, so it should exit immediately.
Differential Revision: https://reviews.llvm.org/D106329
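The reasoning behind the early exit can be modeled with chains as plain lists of blocks that are always laid out contiguously. The helper below is hypothetical, not the LLVM function: it only shows why a block that is not at the head of its chain already has a fixed layout predecessor, so no new top can be placed to fall through to it.

```python
def can_place_new_top_before(old_top, block_to_chain):
    """Hypothetical check: can any other block be laid out to fall through to old_top?

    block_to_chain maps each block to the list of blocks in its chain.
    """
    chain = block_to_chain[old_top]
    # Blocks in a chain stay contiguous. If old_top is not the chain header,
    # the block before it is already fixed, so no candidate top can precede it.
    return chain[0] == old_top
```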
|
|
Revision tags: llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
d8aba75a |
| 07-May-2021 |
Fangrui Song <[email protected]> |
Internalize some cl::opt global variables or move them under namespace llvm
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3 |
|
| #
cd880442 |
| 28-Jan-2021 |
Nicholas Guy <[email protected]> |
[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold
Different targets might handle branch performance differently, so this patch allows targets to specify the TailDuplicateSize threshold. This threshold defines how small a branch can be and still be duplicated to generate straight-line code instead. This patch also specifies override values for the AArch64 subtarget.
Differential Revision: https://reviews.llvm.org/D95631
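The hook pattern can be sketched as a generic default that a target subclass overrides, plus a size test that decides whether a block is small enough to duplicate. Everything here is hypothetical: the class names, the generic default of 2, the override values, and the "no larger than the threshold" comparison are placeholders and are not the values or code from the patch.

```python
class TargetInstrInfoModel:
    """Hypothetical stand-in for a target's instruction-info object."""

    def get_tail_duplicate_size(self, opt_level: int) -> int:
        return 2  # placeholder generic default


class ExampleTargetInstrInfo(TargetInstrInfoModel):
    """Hypothetical target that tolerates more duplication at higher opt levels."""

    def get_tail_duplicate_size(self, opt_level: int) -> int:
        return 6 if opt_level >= 3 else 2  # placeholder values


def should_tail_duplicate(block_size: int, tii: TargetInstrInfoModel, opt_level: int) -> bool:
    # Assumption: a block is a duplication candidate if it is no larger than the threshold.
    return block_size <= tii.get_tail_duplicate_size(opt_level)
```

Putting the threshold behind an overridable method keeps the placement logic target-independent while letting each target tune how much code growth it accepts.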
|
|
Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
| #
7bc76fd0 |
| 31-Dec-2020 |
Kazu Hirata <[email protected]> |
[CodeGen] Construct SmallVector with iterator ranges (NFC)
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
687e80be |
| 16-Dec-2020 |
Guozhi Wei <[email protected]> |
[MBP] Add whole chain to BlockFilterSet instead of individual BB
Currently we add an individual BB to BlockFilterSet if its frequency satisfies
LoopFreq / Freq <= LoopToColdBlockRatio
LoopFreq is the edge frequency from outside the loop to the loop header. LoopToColdBlockRatio is a command-line parameter.
This doesn't make sense, since we always lay out whole chains, not individual BBs.
It may also cause a tricky problem. The LoopFreq of an inner loop can be smaller than the LoopFreq of the outer loop, so a BB can be in the BlockFilterSet of the inner loop but not in the BlockFilterSet of the outer loop, like .cold in the test case. It is therefore added to the chain of the inner loop. When working on the outer loop, .cold is not added to BlockFilterSet, so its edge to the successor .problem is not counted in the UnscheduledPredecessors of the .problem chain. The other blocks in the inner loop are added to BlockFilterSet, however, so the whole inner-loop chain can be laid out, and markChainSuccessors is called to decrease the UnscheduledPredecessors of the following chains. markChainSuccessors calls markBlockSuccessors for every BB, even one that is not in BlockFilterSet such as .cold, so the .problem chain's UnscheduledPredecessors is decreased for an edge that was never counted in fillWorkLists. As a result, the .problem chain's UnscheduledPredecessors becomes 0 while it still has an unscheduled predecessor, .pred, which causes problems in the various successor BB selection algorithms that follow.
Differential Revision: https://reviews.llvm.org/D89088
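The difference between the old per-block rule and the whole-chain rule can be sketched as a hypothetical model of building the filter set for one loop. The function and parameter names are illustrative, not LLVM's; the frequency test is the condition quoted above, treating zero-frequency blocks as cold is an assumption, and the commit message does not spell out how chains are selected, so this sketch assumes a chain is added whenever one of its blocks passes the test.

```python
def loop_block_filter(loop_blocks, freq, loop_freq, ratio, block_to_chain, whole_chain):
    """Hypothetical model of BlockFilterSet construction for one loop.

    freq: block -> frequency; block_to_chain: block -> list of blocks in its chain.
    """
    selected = set()
    for block in loop_blocks:
        f = freq[block]
        # Keep blocks satisfying LoopFreq / Freq <= LoopToColdBlockRatio.
        # Assumption: zero-frequency blocks are cold (this also avoids dividing by zero).
        if f == 0 or loop_freq / f > ratio:
            continue
        if whole_chain:
            selected.update(block_to_chain[block])  # new rule: the block's whole chain
        else:
            selected.add(block)  # old rule: only the individual block
    return selected
```

With a hot block and a cold block in the same chain, the old rule leaves the cold block out of the set even though it will be laid out with its chain, which is the mismatch the commit message describes.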
|
| #
d50d7c37 |
| 14-Dec-2020 |
Guozhi Wei <[email protected]> |
[MBP] Prevent rotating a chain contains entry block
The entry block should always be the first BB in a function, so we should not rotate a chain that contains the entry block.
Differential Revision: https://reviews.llvm.org/D92882
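A toy version of chain rotation that honours this rule is shown below. `rotate_chain` and its arguments are hypothetical: a chain is a list of block names, and rotating means moving `new_top` to the front while preserving the cyclic order of the other blocks.

```python
def rotate_chain(chain, new_top, entry):
    """Hypothetical rotation: bring new_top to the front of chain, unless the
    chain holds the function's entry block, which must stay first."""
    if entry in chain:
        return list(chain)  # refuse to rotate: the entry block keeps its position
    i = chain.index(new_top)
    return chain[i:] + chain[:i]
```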
|
| #
ee5b5b7a |
| 14-Dec-2020 |
Kazu Hirata <[email protected]> |
[CodeGen] Use llvm::erase_value (NFC)
|
| #
a553ac97 |
| 05-Dec-2020 |
Kazu Hirata <[email protected]> |
[CodeGen] llvm::erase_if (NFC)
|
|
Revision tags: llvmorg-11.0.1-rc1 |
|
| #
68403af0 |
| 22-Nov-2020 |
Kazu Hirata <[email protected]> |
[MBP] Remove unused declaration shouldPredBlockBeOutlined (NFC)
The function was introduced on Jun 12, 2016 in commit 071d0f180794f7819c44026815614ce8fa00a3bd. Its definition was removed on Mar 2, 2017 in commit 1393761e0ca3fe8271245762f78daf4d5208cd77.
|
| #
e42f6c0a |
| 23-Oct-2020 |
Han Shen <[email protected]> |
Revert "[MBP] Add whole chain to BlockFilterSet instead of individual BB"
This reverts commit adfb5415010fbbc009a4a6298cfda7a6ed4fa6d4.
This is reverted because it caused a Chrome error: https://crbug.com/1140168
|
| #
adfb5415 |
| 14-Oct-2020 |
Guozhi Wei <[email protected]> |
[MBP] Add whole chain to BlockFilterSet instead of individual BB
Currently we add an individual BB to BlockFilterSet if its frequency satisfies
LoopFreq / Freq <= LoopToColdBlockRatio
LoopFreq is the edge frequency from outside the loop to the loop header. LoopToColdBlockRatio is a command-line parameter.
This doesn't make sense, since we always lay out whole chains, not individual BBs.
It may also cause a tricky problem. The LoopFreq of an inner loop can be smaller than the LoopFreq of the outer loop, so a BB can be in the BlockFilterSet of the inner loop but not in the BlockFilterSet of the outer loop, like .cold in the test case. It is therefore added to the chain of the inner loop. When working on the outer loop, .cold is not added to BlockFilterSet, so its edge to the successor .problem is not counted in the UnscheduledPredecessors of the .problem chain. The other blocks in the inner loop are added to BlockFilterSet, however, so the whole inner-loop chain can be laid out, and markChainSuccessors is called to decrease the UnscheduledPredecessors of the following chains. markChainSuccessors calls markBlockSuccessors for every BB, even one that is not in BlockFilterSet such as .cold, so the .problem chain's UnscheduledPredecessors is decreased for an edge that was never counted in fillWorkLists. As a result, the .problem chain's UnscheduledPredecessors becomes 0 while it still has an unscheduled predecessor, .pred, which causes problems in the various successor BB selection algorithms that follow.
Differential Revision: https://reviews.llvm.org/D89088
|