History log of /llvm-project-15.0.7/llvm/lib/CodeGen/MachineBlockPlacement.cpp (Results 1 – 25 of 295)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 9e6d1f4b 17-Jul-2022 Kazu Hirata <[email protected]>

[CodeGen] Qualify auto variables in for loops (NFC)


Revision tags: llvmorg-14.0.6
# 1e67385d 17-Jun-2022 Mingming Liu <[email protected]>

[MachineBlockPlacementStats] Added check for "-filter-print-funcs"
option to the machine-block-placement-stats.

Differential Revision: https://reviews.llvm.org/D128019


# b7d09557 17-Jun-2022 Mingming Liu <[email protected]>

Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats."

This reverts commit 46d45df4516e9a5bc43460429cd02cd04a85db1a. Going to
add differen

Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats."

This reverts commit 46d45df4516e9a5bc43460429cd02cd04a85db1a. Going to
add differential revision link to commit message and re-commit.

show more ...


# 46d45df4 17-Jun-2022 Mingming Liu <[email protected]>

[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats.


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 989f1c72 15-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# a278250b 10-Mar-2022 Nico Weber <[email protected]>

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https:/

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169

show more ...


# 7f230fee 07-Mar-2022 serge-sans-paille <[email protected]>

Cleanup codegen includes

after: 1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169


Revision tags: llvmorg-14.0.0-rc2
# bcdc0477 01-Mar-2022 spupyrev <[email protected]>

speeding up ext-tsp for huge instances

Differential Revision: https://reviews.llvm.org/D120780


Revision tags: llvmorg-14.0.0-rc1
# dee058c6 05-Feb-2022 Hongtao Yu <[email protected]>

[CSSPGO] Turn on ext-tsp by default for CSSPGO.

I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, pro

[CSSPGO] Turn on ext-tsp by default for CSSPGO.

I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D119048

show more ...


Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 73d92faa 01-Dec-2021 Nicholas Guy <[email protected]>

[CodeGen] Emit alignment "Max Skip" operand

The current AsmPrinter has support to emit the "Max Skip" operand
(the 3rd of .p2align), however has no support for it to actually be specified.
Adding Ma

[CodeGen] Emit alignment "Max Skip" operand

The current AsmPrinter has support to emit the "Max Skip" operand
(the 3rd of .p2align), however has no support for it to actually be specified.
Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a
per-block basis. Leaving the value as default (0) causes no observable differences
in behaviour.

Differential Revision: https://reviews.llvm.org/D114590

show more ...


Revision tags: llvmorg-13.0.1-rc1
# f573f686 08-Nov-2021 spupyrev <[email protected]>

ext-tsp basic block layout

A new basic block ordering improving existing MachineBlockPlacement.

The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality

ext-tsp basic block layout

A new basic block ordering improving existing MachineBlockPlacement.

The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality and thus processor I-cache utilization. This is
achieved via increasing the number of fall-through jumps and co-locating
frequently executed nodes together. The name follows the underlying
optimization problem, Extended-TSP, which is a generalization of classical
(maximum) Traveling Salesmen Problem.

The algorithm is a greedy heuristic that works with chains (ordered lists)
of basic blocks. Initially all chains are isolated basic blocks. On every
iteration, we pick a pair of chains whose merging yields the biggest increase
in the ExtTSP value, which models how i-cache "friendly" a specific chain is.
A pair of chains giving the maximum gain is merged into a new chain. The
procedure stops when there is only one chain left, or when merging does not
increase ExtTSP. In the latter case, the remaining chains are sorted by
density in decreasing order.

An important aspect is the way two chains are merged. Unlike earlier
algorithms (e.g., based on the approach of Pettis-Hansen), two
chains, X and Y, are first split into three, X1, X2, and Y. Then we
consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y,
X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score.
This improves the quality of the final result (the search space is larger)
while keeping the implementation sufficiently fast.

Differential Revision: https://reviews.llvm.org/D113424

show more ...


# 3678326d 07-Dec-2021 Nico Weber <[email protected]>

Revert "ext-tsp basic block layout"

This reverts commit c68f71eb37c2b6ffcf29e865d443a910e73083bd.

Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424


# c68f71eb 08-Nov-2021 spupyrev <[email protected]>

ext-tsp basic block layout

A new basic block ordering improving existing MachineBlockPlacement.

The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality

ext-tsp basic block layout

A new basic block ordering improving existing MachineBlockPlacement.

The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality and thus processor I-cache utilization. This is
achieved via increasing the number of fall-through jumps and co-locating
frequently executed nodes together. The name follows the underlying
optimization problem, Extended-TSP, which is a generalization of classical
(maximum) Traveling Salesmen Problem.

The algorithm is a greedy heuristic that works with chains (ordered lists)
of basic blocks. Initially all chains are isolated basic blocks. On every
iteration, we pick a pair of chains whose merging yields the biggest increase
in the ExtTSP value, which models how i-cache "friendly" a specific chain is.
A pair of chains giving the maximum gain is merged into a new chain. The
procedure stops when there is only one chain left, or when merging does not
increase ExtTSP. In the latter case, the remaining chains are sorted by
density in decreasing order.

An important aspect is the way two chains are merged. Unlike earlier
algorithms (e.g., based on the approach of Pettis-Hansen), two
chains, X and Y, are first split into three, X1, X2, and Y. Then we
consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y,
X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score.
This improves the quality of the final result (the search space is larger)
while keeping the implementation sufficiently fast.

Differential Revision: https://reviews.llvm.org/D113424

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# c9fca53a 10-Sep-2021 Kazu Hirata <[email protected]>

[CodeGen, Target] Use pred_empty and succ_empty (NFC)


Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1
# 50b62731 29-Jul-2021 Guozhi Wei <[email protected]>

[MBP] findBestLoopTopHelper should exit if OldTop is not a chain header

Function findBestLoopTopHelper tries to find a new loop top block which can also
fall through to OldTop, but it's impossible i

[MBP] findBestLoopTopHelper should exit if OldTop is not a chain header

Function findBestLoopTopHelper tries to find a new loop top block which can also
fall through to OldTop, but it's impossible if OldTop is not a chain header, so
it should exit immediately.

Differential Revision: https://reviews.llvm.org/D106329

show more ...


Revision tags: llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# d8aba75a 07-May-2021 Fangrui Song <[email protected]>

Internalize some cl::opt global variables or move them under namespace llvm


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3
# cd880442 28-Jan-2021 Nicholas Guy <[email protected]>

[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold

Different targets might handle branch performance differently, so this patch allows for
targets to speci

[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold

Different targets might handle branch performance differently, so this patch allows for
targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch
can be and still be duplicated to generate straight-line code instead.
This patch also specifies said override values for the AArch64 subtarget.

Differential Revision: https://reviews.llvm.org/D95631

show more ...


Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 7bc76fd0 31-Dec-2020 Kazu Hirata <[email protected]>

[CodeGen] Construct SmallVector with iterator ranges (NFC)


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2
# 687e80be 16-Dec-2020 Guozhi Wei <[email protected]>

[MBP] Add whole chain to BlockFilterSet instead of individual BB

Currently we add individual BB to BlockFilterSet if its frequency satisfies

LoopFreq / Freq <= LoopToColdBlockRatio

LoopFreq is e

[MBP] Add whole chain to BlockFilterSet instead of individual BB

Currently we add individual BB to BlockFilterSet if its frequency satisfies

LoopFreq / Freq <= LoopToColdBlockRatio

LoopFreq is edge frequency from outside to loop header.
LoopToColdBlockRatio is a command line parameter.

It doesn't make sense since we always layout whole chain, not individual BBs.

It may also cause a tricky problem. Sometimes it is possible that the LoopFreq
of an inner loop is smaller than LoopFreq of outer loop. So a BB can be in
BlockFilterSet of inner loop, but not in BlockFilterSet of outer loop,
like .cold in the test case. So it is added to the chain of inner loop. When
work on the outer loop, .cold is not added to BlockFilterSet, so the edge to
successor .problem is not counted in UnscheduledPredecessors of .problem chain.
But other blocks in the inner loop are added BlockFilterSet, so the whole inner
loop chain can be layout, and markChainSuccessors is called to decrease
UnscheduledPredecessors of following chains. markChainSuccessors calls
markBlockSuccessors for every BB, even it is not in BlockFilterSet, like .cold,
so .problem chain's UnscheduledPredecessors is decreased, but this edge was not
counted on in fillWorkLists, so .problem chain's UnscheduledPredecessors
becomes 0 when it still has an unscheduled predecessor .pred! And it causes
problems in following various successor BB selection algorithms.

Differential Revision: https://reviews.llvm.org/D89088

show more ...


# d50d7c37 14-Dec-2020 Guozhi Wei <[email protected]>

[MBP] Prevent rotating a chain contains entry block

The entry block should always be the first BB in a function.
So we should not rotate a chain contains the entry block.

Differential Revision: htt

[MBP] Prevent rotating a chain contains entry block

The entry block should always be the first BB in a function.
So we should not rotate a chain contains the entry block.

Differential Revision: https://reviews.llvm.org/D92882

show more ...


# ee5b5b7a 14-Dec-2020 Kazu Hirata <[email protected]>

[CodeGen] Use llvm::erase_value (NFC)


# a553ac97 05-Dec-2020 Kazu Hirata <[email protected]>

[CodeGen] llvm::erase_if (NFC)


Revision tags: llvmorg-11.0.1-rc1
# 68403af0 22-Nov-2020 Kazu Hirata <[email protected]>

[MBP] Remove unused declaration shouldPredBlockBeOutlined (NFC)

The function was introduced on Jun 12, 2016 in commit
071d0f180794f7819c44026815614ce8fa00a3bd. Its definition was removed
on Mar 2,

[MBP] Remove unused declaration shouldPredBlockBeOutlined (NFC)

The function was introduced on Jun 12, 2016 in commit
071d0f180794f7819c44026815614ce8fa00a3bd. Its definition was removed
on Mar 2, 2017 in commit 1393761e0ca3fe8271245762f78daf4d5208cd77.

show more ...


# e42f6c0a 23-Oct-2020 Han Shen <[email protected]>

Revert "[MBP] Add whole chain to BlockFilterSet instead of individual BB"

This reverts commit adfb5415010fbbc009a4a6298cfda7a6ed4fa6d4.

This is reverted because it caused an chrome error: https://c

Revert "[MBP] Add whole chain to BlockFilterSet instead of individual BB"

This reverts commit adfb5415010fbbc009a4a6298cfda7a6ed4fa6d4.

This is reverted because it caused an chrome error: https://crbug.com/1140168

show more ...


# adfb5415 14-Oct-2020 Guozhi Wei <[email protected]>

[MBP] Add whole chain to BlockFilterSet instead of individual BB

Currently we add individual BB to BlockFilterSet if its frequency satisfies

LoopFreq / Freq <= LoopToColdBlockRatio

LoopFreq is edg

[MBP] Add whole chain to BlockFilterSet instead of individual BB

Currently we add individual BB to BlockFilterSet if its frequency satisfies

LoopFreq / Freq <= LoopToColdBlockRatio

LoopFreq is edge frequency from outside to loop header.
LoopToColdBlockRatio is a command line parameter.

It doesn't make sense since we always layout whole chain, not individual BBs.

It may also cause a tricky problem. Sometimes it is possible that the LoopFreq
of an inner loop is smaller than LoopFreq of outer loop. So a BB can be in
BlockFilterSet of inner loop, but not in BlockFilterSet of outer loop,
like .cold in the test case. So it is added to the chain of inner loop. When
work on the outer loop, .cold is not added to BlockFilterSet, so the edge to
successor .problem is not counted in UnscheduledPredecessors of .problem chain.
But other blocks in the inner loop are added BlockFilterSet, so the whole inner
loop chain can be layout, and markChainSuccessors is called to decrease
UnscheduledPredecessors of following chains. markChainSuccessors calls
markBlockSuccessors for every BB, even it is not in BlockFilterSet, like .cold,
so .problem chain's UnscheduledPredecessors is decreased, but this edge was not
counted on in fillWorkLists, so .problem chain's UnscheduledPredecessors
becomes 0 when it still has an unscheduled predecessor .pred! And it causes
problems in following various successor BB selection algorithms.

Differential Revision: https://reviews.llvm.org/D89088

show more ...


12345678910>>...12