|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
e0612c91 |
| 06-Jul-2022 |
Fangrui Song <[email protected]> |
[ELF] Optimize getInputSections. NFC
In the majority of cases (e.g. orphan sections), an OutputSection has at most one InputSectionDescription (isd). By changing the return type to ArrayRef<InputSec
[ELF] Optimize getInputSections. NFC
In the majority of cases (e.g. orphan sections), an OutputSection has at most one InputSectionDescription (isd). By changing the return type to ArrayRef<InputSection *> we can just reference the isd->sections. For OutputSections with more than one InputSectionDescription we use a caller provided SmallVector to copy the elements as before.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D129111
show more ...
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
7effcbda |
| 19-Jun-2022 |
Nico Weber <[email protected]> |
Rename parallelForEachN to just parallelFor
Patch created by running:
rg -l parallelForEachN | xargs sed -i '' -c 's/parallelForEachN/parallelFor/'
No behavior change.
Differential Revision: ht
Rename parallelForEachN to just parallelFor
Patch created by running:
rg -l parallelForEachN | xargs sed -i '' -c 's/parallelForEachN/parallelFor/'
No behavior change.
Differential Revision: https://reviews.llvm.org/D128140
show more ...
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
b3d5bb3b |
| 06-May-2022 |
Fangrui Song <[email protected]> |
[ELF] Change (NOLOAD) type mismatch to use SHT_NOBITS instead of SHT_PROGBITS
Placing a non-SHT_NOBITS input section in an output section specified with (NOLOAD) is fishy but used by some projects.
[ELF] Change (NOLOAD) type mismatch to use SHT_NOBITS instead of SHT_PROGBITS
Placing a non-SHT_NOBITS input section in an output section specified with (NOLOAD) is fishy but used by some projects. D118840 changed the output type to SHT_PROGBITS, but using the specified type seems to make more sense and improve GNU ld compatibility: `(NOLOAD)` seems to change the output section type regardless of input.
I think we should keep the current type mismatch warning as it does indicate an error-prone usage.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D125074
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
6c814931 |
| 08-Mar-2022 |
Fangrui Song <[email protected]> |
[ELF] Don't use multiple inheritance for OutputSection. NFC
Add an OutputDesc class inheriting from SectionCommand. An OutputDesc wraps an OutputSection. This change allows InputSection::getParent t
[ELF] Don't use multiple inheritance for OutputSection. NFC
Add an OutputDesc class inheriting from SectionCommand. An OutputDesc wraps an OutputSection. This change allows InputSection::getParent to be inlined.
Differential Revision: https://reviews.llvm.org/D120650
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc2 |
|
| #
4976d1fe |
| 28-Feb-2022 |
Fangrui Song <[email protected]> |
[ELF] Move SyntheticSection check from InputSection::writeTo to OutputSection::writeTo. NFC
Simplify code and make the heavyweight operation to the call site so that it is clearer how to improve the
[ELF] Move SyntheticSection check from InputSection::writeTo to OutputSection::writeTo. NFC
Simplify code and make the heavyweight operation to the call site so that it is clearer how to improve the inefficient scheduling in the future.
show more ...
|
| #
b01430a0 |
| 24-Feb-2022 |
Fangrui Song <[email protected]> |
[ELF] Don't rely on Symbols.h's transitive inclusion of InputFiles.h. NFC
|
| #
cb0a4bb5 |
| 18-Feb-2022 |
Fangrui Song <[email protected]> |
[ELF] Change (NOLOAD) section type mismatch error to warning
Making a (NOLOAD) section SHT_PROGBITS is fishy (the user may expect all-zero content, but the linker does not check that), but some proj
[ELF] Change (NOLOAD) section type mismatch error to warning
Making a (NOLOAD) section SHT_PROGBITS is fishy (the user may expect all-zero content, but the linker does not check that), but some projects (e.g. Linux kernel https://github.com/ClangBuiltLinux/linux/issues/1597) traditionally rely on the behavior. Issue a warning to not break them.
show more ...
|
| #
66f8ac8d |
| 17-Feb-2022 |
Fangrui Song <[email protected]> |
[ELF] Support (TYPE=<value>) to customize the output section type
The current output section type allows to set the ELF section type to SHT_PROGBITS or SHT_NOLOAD. This patch allows an arbitrary sec
[ELF] Support (TYPE=<value>) to customize the output section type
The current output section type allows to set the ELF section type to SHT_PROGBITS or SHT_NOLOAD. This patch allows an arbitrary section value to be specified. Some common SHT_* literal names are supported as well.
``` SECTIONS { note (TYPE=SHT_NOTE) : { BYTE(8) *(note) } init_array ( TYPE=14 ) : { QUAD(14) } fini_array (TYPE = SHT_FINI_ARRAY) : { QUAD(15) } } ```
When `sh_type` is specified, it is an error if an input section has a different type.
Our syntax is compatible with GNU ld 2.39 (https://sourceware.org/bugzilla/show_bug.cgi?id=28841).
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D118840
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1 |
|
| #
27bb7990 |
| 08-Feb-2022 |
Fangrui Song <[email protected]> |
[ELF] Clean up headers. NFC
|
| #
e8bff9ae |
| 07-Feb-2022 |
Mariusz Ceier <[email protected]> |
Fix lld standalone build
lld/ELF/OutputSections.cpp includes llvm/Config/config.h for LLVM_ENABLE_ZLIB definition, but llvm/Config/config.h doesn't exist in standalone build.
To fix this, this patc
Fix lld standalone build
lld/ELF/OutputSections.cpp includes llvm/Config/config.h for LLVM_ENABLE_ZLIB definition, but llvm/Config/config.h doesn't exist in standalone build.
To fix this, this patch moves LLVM_ENABLE_ZLIB from config.h to llvm-config.h and updates OutputSections.cpp to include llvm-config.h instead of config.h
Reviewed By: MaskRay, mgorny
Differential Revision: https://reviews.llvm.org/D119058
show more ...
|
|
Revision tags: llvmorg-15-init |
|
| #
5a2020d0 |
| 30-Jan-2022 |
Fangrui Song <[email protected]> |
[ELF] copyShtGroup: replace unordered_set<uint32_t> with DenseSet<uint32_t>. NFC
We don't need to support the empty/tombstone key section index.
|
| #
f318fd9b |
| 30-Jan-2022 |
Fangrui Song <[email protected]> |
[ELF] crtbegin/crtend test: replace std::regex with hand-written matcher. NFC
My x86-64 lld executable is 18KiB smaller.
|
| #
fcd8817d |
| 30-Jan-2022 |
Fangrui Song <[email protected]> |
[ELF] Simplify maybeCompress with lld::split. NFC
|
| #
913914f0 |
| 26-Jan-2022 |
Fangrui Song <[email protected]> |
[ELF] Simplify writing the Elf_Chdr header. NFC
And avoiding changing `size` in `writeTo`.
|
| #
2a80c3db |
| 26-Jan-2022 |
Fangrui Song <[email protected]> |
[ELF] Clarify that Z_BEST_SPEED==1 in a comment. NFC
|
| #
7438dbe0 |
| 26-Jan-2022 |
Fangrui Song <[email protected]> |
[ELF] Cast size to size_t. NFC
To fix
../../chromeclang/bin/../include/c++/v1/__algorithm/min.h:39:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('unsigned long
[ELF] Cast size to size_t. NFC
To fix
../../chromeclang/bin/../include/c++/v1/__algorithm/min.h:39:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('unsigned long' vs. 'unsigned long long')
on macOS arm64.
show more ...
|
| #
223f9dea |
| 26-Jan-2022 |
Fangrui Song <[email protected]> |
[ELF] maybeCompress: replace vector<uint8_t> with unique_ptr<uint8_t[]>. NFC
And mention that it is zero-initialized. I do not notice a speed-up if changed to be uninitialized by forcing the zero fi
[ELF] maybeCompress: replace vector<uint8_t> with unique_ptr<uint8_t[]>. NFC
And mention that it is zero-initialized. I do not notice a speed-up if changed to be uninitialized by forcing the zero filler in writeTo.
show more ...
|
| #
4cdc4416 |
| 25-Jan-2022 |
Fangrui Song <[email protected]> |
[ELF] Parallelize --compress-debug-sections=zlib
When linking a Debug build clang (265MiB SHF_ALLOC sections, 920MiB uncompressed debug info), in a --threads=1 link "Compress debug sections" takes 2
[ELF] Parallelize --compress-debug-sections=zlib
When linking a Debug build clang (265MiB SHF_ALLOC sections, 920MiB uncompressed debug info), in a --threads=1 link "Compress debug sections" takes 2/3 time and in a --threads=8 link "Compress debug sections" takes ~70% time.
This patch splits a section into 1MiB shards and calls zlib `deflake` parallelly.
DEFLATE blocks are a bit sequence. We need to ensure every shard starts at a byte boundary for concatenation. We use Z_SYNC_FLUSH for all shards but the last to flush the output to a byte boundary. (Z_FULL_FLUSH can be used as well, but Z_FULL_FLUSH clears the hash table which just wastes time.)
The last block requires the BFINAL flag. We call deflate with Z_FINISH to set the flag as well as flush the output to a byte boundary. Under the hood, all of Z_SYNC_FLUSH, Z_FULL_FLUSH, and Z_FINISH emit a non-compressed block (called stored block in zlib). RFC1951 says "Any bits of input up to the next byte boundary are ignored."
In a --threads=8 link, "Compress debug sections" is 5.7x as fast and the total speed is 2.54x. Because the hash table for one shard is not shared with the next shard, the output is slightly larger. Better compression ratio can be achieved by preloading the window size from the previous shard as dictionary (`deflateSetDictionary`), but that is overkill.
``` # 1MiB shards % bloaty clang.new -- clang.old FILE SIZE VM SIZE -------------- -------------- +0.3% +129Ki [ = ] 0 .debug_str +0.1% +105Ki [ = ] 0 .debug_info +0.3% +101Ki [ = ] 0 .debug_line +0.2% +2.66Ki [ = ] 0 .debug_abbrev +0.0% +1.19Ki [ = ] 0 .debug_ranges +0.1% +341Ki [ = ] 0 TOTAL
# 2MiB shards % bloaty clang.new -- clang.old FILE SIZE VM SIZE -------------- -------------- +0.2% +74.2Ki [ = ] 0 .debug_line +0.1% +72.3Ki [ = ] 0 .debug_str +0.0% +69.9Ki [ = ] 0 .debug_info +0.1% +976 [ = ] 0 .debug_abbrev +0.0% +882 [ = ] 0 .debug_ranges +0.0% +218Ki [ = ] 0 TOTAL ```
Bonus in not using zlib::compress
* we can compress a debug section larger than 4GiB * peak memory usage is lower because for most shards the output size is less than 50% input size (all less than 55% for a large binary I tested, but decreasing the initial output size does not decrease memory usage)
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D117853
show more ...
|
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
a1c2ee01 |
| 26-Dec-2021 |
Fangrui Song <[email protected]> |
[ELF] LinkerScript/OutputSection: change other std::vector members to SmallVector
11+KiB smaller .text with both libc++ and libstdc++ builds.
|
| #
bf7f3dd7 |
| 26-Dec-2021 |
Fangrui Song <[email protected]> |
[ELF] Move outSecOff addition from InputSection::writeTo to the caller
Simplify the code a bit and improve consistency with SyntheticSection::writeTo.
|
| #
ba948c5a |
| 23-Dec-2021 |
Fangrui Song <[email protected]> |
[ELF] Use SmallVector for some global variables (*Files and *Sections). NFC
My lld executable is 26+KiB smaller.
|
| #
6683099a |
| 21-Dec-2021 |
Fangrui Song <[email protected]> |
[ELF] Optimize RelocationSection<ELFT>::writeTo
When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured
``` 9.131462 Total ExecuteLinker 1.449
[ELF] Optimize RelocationSection<ELFT>::writeTo
When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured
``` 9.131462 Total ExecuteLinker 1.449913 Total Write output file 1.445784 Total Write sections 0.657152 Write sections {"detail":".rela.dyn"} ```
This change decreases the .rela.dyn time to 0.25, leading to 4% speed up in the total time.
* The parallelSort is slow because of expensive r_sym/r_offset computation. Cache the values. * The iteration is slow. Move r_sym/r_addend computation ahead of time and parallelize it.
With the change, the new encodeDynamicReloc is cheap (0.05s). So no need to parallelize it.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D115993
show more ...
|
| #
8825ffdb |
| 20-Dec-2021 |
Fangrui Song <[email protected]> |
[ELF] --time-trace: Trace "Write sections"
writeSections is typically a bottleneck. This was used to track down the following bottlenecks:
* Output section .rela.dyn (9115d75117b57115fe45153e5f38f2
[ELF] --time-trace: Trace "Write sections"
writeSections is typically a bottleneck. This was used to track down the following bottlenecks:
* Output section .rela.dyn (9115d75117b57115fe45153e5f38f2c444c0cd91) * Output section .debug_str (3aae04c744b03eb3eec7376f9d34fa3e42f8d108) * posix_fallocate is slow for Linux tmpfs: D115957
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D115984
show more ...
|
| #
93558e57 |
| 17-Dec-2021 |
Fangrui Song <[email protected]> |
[ELF] Internalize createMergeSynthetic. NFC
Only called once. Moving to OutputSections.cpp can make it inlined. finalizeInputSections can be very hot, especially in -O1 links with much debug info.
|
| #
d060cc1f |
| 28-Nov-2021 |
Fangrui Song <[email protected]> |
[ELF] Fix out-of-bounds write in memset(&Out::first, ...)
Fix r285764: there is no guarantee that Out::first is placed before other static data members of `struct Out`. After `bufferStart` was intro
[ELF] Fix out-of-bounds write in memset(&Out::first, ...)
Fix r285764: there is no guarantee that Out::first is placed before other static data members of `struct Out`. After `bufferStart` was introduced, this out-of-bounds write is destined in many compilers. It is likely benign, though.
And move `Out::elfHeader->size` assignment beside `Out::elfHeader->sectionIndex`
show more ...
|