History log of /llvm-project-15.0.7/llvm/lib/MC/MCObjectStreamer.cpp (Results 1 – 25 of 282)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# eecd41aa 11-Jul-2022 spupyrev <[email protected]>

Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type"

This reverts commit 6d0528636ae54fba75938a79ae7a98dfcc949f72.


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 6d052863 05-Aug-2021 Rafael Auler <[email protected]>

Rebase: [Facebook] [MC] Introduce NeverAlign fragment type

Summary:
Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible

Rebase: [Facebook] [MC] Introduce NeverAlign fragment type

Summary:
Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.

In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).

This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.

The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to reliance on relaxation as
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of NeverAlign fragment is to prevent
the first instruction in a pair ending at a given alignment boundary, by
inserting at most one minimum size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.

LLVM: https://reviews.llvm.org/D97982

Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613

Test Plan: sandcastle

Reviewers: #llvm-bolt

Subscribers: phabricatorlinter

Differential Revision: https://phabricator.intern.facebook.com/D31361547

show more ...


# e0e687a6 20-Jun-2022 Kazu Hirata <[email protected]>

[llvm] Don't use Optional::hasValue (NFC)


# 412c788a 14-Jun-2022 Guillaume Chatelet <[email protected]>

[NFC][Alignment] Use Align in MCAlignFragment


# adf4142f 11-Jun-2022 Fangrui Song <[email protected]>

[MC] De-capitalize SwitchSection. NFC

Add SwitchSection to return switchSection. The API will be removed soon.


# ef736a1c 08-Feb-2022 serge-sans-paille <[email protected]>

Cleanup LLVMMC headers

There's a few relevant forward declarations in there that may require downstream
adding explicit includes:

llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llv

Cleanup LLVMMC headers

There's a few relevant forward declarations in there that may require downstream
adding explicit includes:

llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h

Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after: 1049293745

Which is significant and backs up the change in addition to the usual benefits of
decreasing coupling between headers and compilation units.

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244

show more ...


# 3a3cb929 07-Feb-2022 Kazu Hirata <[email protected]>

[llvm] Use = default (NFC)


# d606e233 14-Jan-2022 luxufan <[email protected]>

[MC] Support constant offset for symbol PendingFixup

This patch add support relocation offset of sym+constant(like `foo+4`) form for pending fixup.

In the past, llvm-mc ignored the constant in sym+

[MC] Support constant offset for symbol PendingFixup

This patch add support relocation offset of sym+constant(like `foo+4`) form for pending fixup.

In the past, llvm-mc ignored the constant in sym+constant form, for `foo+4`, `4` would be ignored. And test case
```
.text
ret
nop
nop
.reloc foo+4, R_RISCV_32, 6

.data
.globl foo
foo:
.word 0
.word 0
.word 0
```
when run `llvm-mc -filetype=obj -triple=riscv64 %s | llvm-readobj -r`
The output is
```
Relocations [
Section (3) .rela.text {
0x0 R_RISCV_32 - 0x6
}
]
```

After applying this patch, the output is
```
Relocations [
Section (3) .rela.text {
0x4 R_RISCV_32 - 0x6
}
]
```

Differential Revision: https://reviews.llvm.org/D117316

show more ...


# 08b29b17 14-Jan-2022 luxufan <[email protected]>

[MC] Put the Pending Fixups into location symbol's fragment

Differential Revision: https://reviews.llvm.org/D117317


# 9a74c753 10-Dec-2021 Sami Tolvanen <[email protected]>

[ThinLTO][MC] Use conditional assignments for promotion aliases

Inline assembly refererences to static functions with ThinLTO+CFI were
fixed in D104058 by creating aliases for promoted functions. Cr

[ThinLTO][MC] Use conditional assignments for promotion aliases

Inline assembly refererences to static functions with ThinLTO+CFI were
fixed in D104058 by creating aliases for promoted functions. Creating
the aliases unconditionally resulted in an unexpected size increase in
a Chrome helper binary:

https://bugs.chromium.org/p/chromium/issues/detail?id=1261715

This is caused by the compiler being unable to drop unused code now
referenced by the alias in module-level inline assembly. This change
adds a .set_conditional assembly extension, which emits an assignment
only if the target symbol is also emitted, avoiding phantom references
to functions that could have otherwise been dropped.

This is an alternative to the solution proposed in D112761.

Reviewed By: pcc, nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D113613

show more ...


# e63455d5 09-Aug-2021 Peter Smith <[email protected]>

[MC] Use local MCSubtargetInfo in writeNops

On some architectures such as Arm and X86 the encoding for a nop may
change depending on the subtarget in operation at the time of
encoding. This change r

[MC] Use local MCSubtargetInfo in writeNops

On some architectures such as Arm and X86 the encoding for a nop may
change depending on the subtarget in operation at the time of
encoding. This change replaces the per module MCSubtargetInfo retained
by the targets AsmBackend in favour of passing through the local
MCSubtargetInfo in operation at the time.

On Arm using the architectural NOP instruction can have a performance
benefit on some implementations.

For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to
limit the chances of this causing problems in the future. I've not
done this for other targets such as X86 as there is more frequent use
of the MCSubtargetInfo and it looks to be for stable properties that
we would not expect to vary per function.

This change required threading STI through MCNopsFragment and
MCBoundaryAlignFragment.

I've attempted to take into account the in tree experimental backends.

Differential Revision: https://reviews.llvm.org/D45962

show more ...


# 5e71839f 06-Aug-2021 Peter Smith <[email protected]>

[MC] Add MCSubtargetInfo to MCAlignFragment

In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI i

[MC] Add MCSubtargetInfo to MCAlignFragment

In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when a MCAlignFragment may write nops as padding. The
STI is currently unused, a further patch will pass it through to
writeNops.

There are many places that can create an MCAlignFragment, in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from, when none is
available use the per module STI.

For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.

This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI. Which can differ per fragment.

Note that involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.

Differential Revision: https://reviews.llvm.org/D45961

show more ...


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init
# 28fb69e0 12-Jul-2021 Jinsong Ji <[email protected]>

[AIX] Emit version string in .file directive

AIX .file directive support including compiler version string.
https://www.ibm.com/docs/en/aix/7.2?topic=ops-file-pseudo-op

This patch adds the support

[AIX] Emit version string in .file directive

AIX .file directive support including compiler version string.
https://www.ibm.com/docs/en/aix/7.2?topic=ops-file-pseudo-op

This patch adds the support so that it will be easier to identify build
compiler in objects.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D105743

show more ...


Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# bbea6425 26-May-2021 Saleem Abdulrasool <[email protected]>

RISCV: adjust handling of relocation emission for RISCV

This re-architects the RISCV relocation handling to bring the
implementation closer in line with the implementation in binutils. We
would pre

RISCV: adjust handling of relocation emission for RISCV

This re-architects the RISCV relocation handling to bring the
implementation closer in line with the implementation in binutils. We
would previously aggressively resolve the relocation. With this
restructuring, we always will emit a paired relocation for any symbolic
difference of the type of S±T[±C] where S and T are labels and C is a
constant.

GAS has a special target hook controlled by `RELOC_EXPANSION_POSSIBLE`
which indicates that a fixup may be expanded into multiple relocations.
This is used by the RISCV backend to always emit a paired relocation -
either ADD[WIDTH] + SUB[WIDTH] for text relocations or SET[WIDTH] +
SUB[WIDTH] for a debug info relocation. Irrespective of whether linker
relaxation support is enabled, symbolic difference is always emitted as
a paired relocation.

This change also sinks the target specific behaviour down into the
target specific area rather than exposing it to the shared relocation
handling. In the process, we also sink the "special" handling for debug
information down into the RISCV target. Although this improves the path
for the other targets, this is not necessarily entirely ideal either.
The changes in the debug info emission could be done through another
type of hook as this functionality would be required by any other target
which wishes to do linker relaxation. However, as there are no other
targets in LLVM which currently do this, this is a reasonable thing to
do until such time as the code needs to be shared.

Improve the handling of the relocation (and add a reduced test case from
the Linux kernel) to ensure that we handle complex expressions for
symbolic difference. This ensures that we correct relocate symbols with
the adddends normalized and associated with the addition portion of the
paired relocation.

This change also addresses some review comments from Alex Bradbury about
the relocations meant for use in the DWARF CFA being named incorrectly
(using ADD6 instead of SET6) in the original change which introduced the
relocation type.

This resolves the issues with the symbolic difference emission
sufficiently to enable building the Linux kernel with clang+IAS+lld
(without linker relaxation).

Resolves PR50153, PR50156!
Fixes: ClangBuiltLinux/linux#1023, ClangBuiltLinux/linux#1143

Reviewed By: nickdesaulniers, maskray

Differential Revision: https://reviews.llvm.org/D103539

show more ...


Revision tags: llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3
# 87bbf3d1 05-Mar-2021 Chen Zheng <[email protected]>

[XCOFF][DebugInfo] support DWARF for XCOFF for assembly output.

Reviewed By: jasonliu

Differential Revision: https://reviews.llvm.org/D95518


Revision tags: llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# d9a0c40b 21-Dec-2020 Fangrui Song <[email protected]>

[MC] Split MCContext::createTempSymbol, default AlwaysAddSuffix to true, and add comments

CanBeUnnamed is rarely false. Splitting to a createNamedTempSymbol makes the
intention clearer and matches t

[MC] Split MCContext::createTempSymbol, default AlwaysAddSuffix to true, and add comments

CanBeUnnamed is rarely false. Splitting to a createNamedTempSymbol makes the
intention clearer and matches the direction of reverted r240130 (to drop the
unneeded parameters).

No behavior change.

show more ...


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2
# 705a4c14 08-Dec-2020 Hongtao Yu <[email protected]>

[CSSPGO] Pseudo probe encoding and emission.

This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdY

[CSSPGO] Pseudo probe encoding and emission.

This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389 // Func GUID
.quad 4294967295 // Func Hash
.byte 9 // Length of func name
.ascii "_Z5funcAi" // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
GUID (uint64)
GUID of the function
NPROBES (ULEB128)
Number of probes originating from this function.
NUM_INLINED_FUNCTIONS (ULEB128)
Number of callees inlined into this function, aka number of
first-level inlinees
PROBE RECORDS
A list of NPROBES entries. Each entry contains:
INDEX (ULEB128)
TYPE (uint4)
0 - block probe, 1 - indirect call, 2 - direct call
ATTRIBUTE (uint3)
reserved
ADDRESS_TYPE (uint1)
0 - code address, 1 - address delta
CODE_ADDRESS (uint64 or ULEB128)
code address or address delta, depending on ADDRESS_TYPE
INLINED FUNCTION RECORDS
A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
callees. Each record contains:
INLINE SITE
GUID of the inlinee (uint64)
ID of the callsite probe (ULEB128)
FUNCTION BODY
A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 3 0
.pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe 837061429793323041 4 0
popq %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
2011a0: 50 push rax
2011a1: 85 ff test edi,edi
[Probe]: FUNC: foo2 Index: 1 Type: Block
2011a3: 74 02 je 2011a7 <foo2+0x7>
[Probe]: FUNC: foo2 Index: 3 Type: Block
[Probe]: FUNC: foo2 Index: 4 Type: Block
[Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
2011a5: 58 pop rax
2011a6: c3 ret
[Probe]: FUNC: foo2 Index: 2 Type: Block
2011a7: bf 01 00 00 00 mov edi,0x1
[Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
2011ac: ff d6 call rsi
[Probe]: FUNC: foo2 Index: 4 Type: Block
2011ae: 58 pop rax
2011af: c3 ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878

show more ...


# 7ead5f5a 10-Dec-2020 Mitch Phillips <[email protected]>

Revert "[CSSPGO] Pseudo probe encoding and emission."

This reverts commit b035513c06d1cba2bae8f3e88798334e877523e1.

Reason: Broke the ASan buildbots:
http://lab.llvm.org:8011/#/builders/5/builds/

Revert "[CSSPGO] Pseudo probe encoding and emission."

This reverts commit b035513c06d1cba2bae8f3e88798334e877523e1.

Reason: Broke the ASan buildbots:
http://lab.llvm.org:8011/#/builders/5/builds/2269

show more ...


# b035513c 08-Dec-2020 Hongtao Yu <[email protected]>

[CSSPGO] Pseudo probe encoding and emission.

This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdY

[CSSPGO] Pseudo probe encoding and emission.

This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389 // Func GUID
.quad 4294967295 // Func Hash
.byte 9 // Length of func name
.ascii "_Z5funcAi" // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
GUID (uint64)
GUID of the function
NPROBES (ULEB128)
Number of probes originating from this function.
NUM_INLINED_FUNCTIONS (ULEB128)
Number of callees inlined into this function, aka number of
first-level inlinees
PROBE RECORDS
A list of NPROBES entries. Each entry contains:
INDEX (ULEB128)
TYPE (uint4)
0 - block probe, 1 - indirect call, 2 - direct call
ATTRIBUTE (uint3)
reserved
ADDRESS_TYPE (uint1)
0 - code address, 1 - address delta
CODE_ADDRESS (uint64 or ULEB128)
code address or address delta, depending on ADDRESS_TYPE
INLINED FUNCTION RECORDS
A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
callees. Each record contains:
INLINE SITE
GUID of the inlinee (uint64)
ID of the callsite probe (ULEB128)
FUNCTION BODY
A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 3 0
.pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe 837061429793323041 4 0
popq %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
2011a0: 50 push rax
2011a1: 85 ff test edi,edi
[Probe]: FUNC: foo2 Index: 1 Type: Block
2011a3: 74 02 je 2011a7 <foo2+0x7>
[Probe]: FUNC: foo2 Index: 3 Type: Block
[Probe]: FUNC: foo2 Index: 4 Type: Block
[Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
2011a5: 58 pop rax
2011a6: c3 ret
[Probe]: FUNC: foo2 Index: 2 Type: Block
2011a7: bf 01 00 00 00 mov edi,0x1
[Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
2011ac: ff d6 call rsi
[Probe]: FUNC: foo2 Index: 4 Type: Block
2011ae: 58 pop rax
2011af: c3 ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878

show more ...


Revision tags: llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2
# c6334db5 31-Jul-2020 Jian Cai <[email protected]>

[X86] support .nops directive

Add support of .nops on X86. This addresses llvm.org/PR45788.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D82826


Revision tags: llvmorg-11.0.0-rc1
# e0a372ff 21-Jul-2020 Stefan Pintilie <[email protected]>

[PowerPC] Extend .reloc directive on PowerPC

When the compiler generates a GOT indirect load it must generate two loads. One
that loads the address of the element from the GOT and a second to load t

[PowerPC] Extend .reloc directive on PowerPC

When the compiler generates a GOT indirect load it must generate two loads. One
that loads the address of the element from the GOT and a second to load the
actual element based on the address just loaded from the GOT. However, the
linker can optimize these two loads into one load if it knows that it is safe
to do so. The compiler can tell the linker that the optimization is safe
by using the R_PPC64_PCREL_OPT relocation.

This patch extends the .reloc directive to allow the following setup

pld 3, vec@got@pcrel(0), 1
.Lpcrel1=.-8
... More instructions possible here ...
.reloc .Lpcrel1,R_PPC64_PCREL_OPT,.-.Lpcrel1
lwa 3, 4(3)

Reviewers: nemanjai, lei, hfinkel, sfertile, efriedma, tstellar, grosbach, MaskRay

Reviewed By: nemanjai, MaskRay

Differential Revision: https://reviews.llvm.org/D79625

show more ...


Revision tags: llvmorg-12-init
# b71ef0c5 14-Jul-2020 Fangrui Song <[email protected]>

[MC] Support .reloc sym+constant, *, *

For `.reloc offset, *, *`, currently offset can be a constant or symbol.
This patch makes it support any expression which can be folded to sym+constant.

Revie

[MC] Support .reloc sym+constant, *, *

For `.reloc offset, *, *`, currently offset can be a constant or symbol.
This patch makes it support any expression which can be folded to sym+constant.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D83751

show more ...


Revision tags: llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1
# 99ac9ce7 09-May-2020 Shengchen Kan <[email protected]>

[NFC] Clean up in MCObjectStreamer and X86AsmBackend


# fd773e8a 26-Apr-2020 Alexandre Ganea <[email protected]>

Re-land [MC] Fix quadratic behavior in addPendingLabel

This was discovered when compiling large unity/blob/jumbo files.

Differential Revision: https://reviews.llvm.org/D78775


# 65fe71be 24-Apr-2020 Alexandre Ganea <[email protected]>

Revert "[MC] Fix quadratic behavior in addPendingLabel()"

This reverts commit e98f73a629075ae3b9c4d5317bead5a122d69865.


12345678910>>...12