|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3 |
|
| #
ac954145 |
| 10-Feb-2025 |
Miguel Ojeda <[email protected]> |
kbuild: rust: add rustc-min-version support function
Introduce `rustc-min-version` support function that mimics `{gcc,clang}-min-version` ones, following commit 88b61e3bff93 ("Makefile.compiler: rep
kbuild: rust: add rustc-min-version support function
Introduce `rustc-min-version` support function that mimics `{gcc,clang}-min-version` ones, following commit 88b61e3bff93 ("Makefile.compiler: replace cc-ifversion with compiler-specific macros").
In addition, use it in the first use case we have in the kernel (which was done independently to minimize the changes needed for the fix).
Signed-off-by: Miguel Ojeda <[email protected]> Reviewed-by: Fiona Behrens <[email protected]> Reviewed-by: Nicolas Schier <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
show more ...
|
| #
446a8351 |
| 10-Feb-2025 |
Miguel Ojeda <[email protected]> |
arm64: rust: clean Rust 1.85.0 warning using softfloat target
Starting with Rust 1.85.0 (to be released 2025-02-20), `rustc` warns [1] about disabling neon in the aarch64 hardfloat target:
warn
arm64: rust: clean Rust 1.85.0 warning using softfloat target
Starting with Rust 1.85.0 (to be released 2025-02-20), `rustc` warns [1] about disabling neon in the aarch64 hardfloat target:
warning: target feature `neon` cannot be toggled with `-Ctarget-feature`: unsound on hard-float targets because it changes float ABI | = note: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release! = note: for more information, see issue #116344 <https://github.com/rust-lang/rust/issues/116344>
Thus, instead, use the softfloat target instead.
While trying it out, I found that the kernel sanitizers were not enabled for that built-in target [2]. Upstream Rust agreed to backport the enablement for the current beta so that it is ready for the 1.85.0 release [3] -- thanks!
However, that still means that before Rust 1.85.0, we cannot switch since sanitizers could be in use. Thus conditionally do so.
Cc: [email protected] # Needed in 6.12.y and 6.13.y only (Rust is pinned in older LTSs). Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Matthew Maurer <[email protected]> Cc: Alice Ryhl <[email protected]> Cc: Ralf Jung <[email protected]> Cc: Jubilee Young <[email protected]> Link: https://github.com/rust-lang/rust/pull/133417 [1] Link: https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/arm64.20neon.20.60-Ctarget-feature.60.20warning/near/495358442 [2] Link: https://github.com/rust-lang/rust/pull/135905 [3] Link: https://github.com/rust-lang/rust/issues/116344 Signed-off-by: Miguel Ojeda <[email protected]> Reviewed-by: Trevor Gross <[email protected]> Tested-by: Matthew Maurer <[email protected]> Reviewed-by: Ralf Jung <[email protected]> Reviewed-by: Alice Ryhl <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7 |
|
| #
214c0eea |
| 10-Nov-2024 |
Masahiro Yamada <[email protected]> |
kbuild: add $(objtree)/ prefix to some in-kernel build artifacts
$(objtree) refers to the top of the output directory of kernel builds.
This commit adds the explicit $(objtree)/ prefix to build art
kbuild: add $(objtree)/ prefix to some in-kernel build artifacts
$(objtree) refers to the top of the output directory of kernel builds.
This commit adds the explicit $(objtree)/ prefix to build artifacts needed for building external modules.
This change has no immediate impact, as the top-level Makefile currently defines:
objtree := .
This commit prepares for supporting the building of external modules in a different directory.
Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Nicolas Schier <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
9abe390e |
| 27-Sep-2024 |
Mark Rutland <[email protected]> |
arm64: Force position-independent veneers
Certain portions of code always need to be position-independent regardless of CONFIG_RELOCATABLE, including code which is executed in an idmap or which is e
arm64: Force position-independent veneers
Certain portions of code always need to be position-independent regardless of CONFIG_RELOCATABLE, including code which is executed in an idmap or which is executed before relocations are applied. In some kernel configurations the LLD linker generates position-dependent veneers for such code, and when executed these result in early boot-time failures.
Marc Zyngier encountered a boot failure resulting from this when building a (particularly cursed) configuration with LLVM, as he reported to the list:
https://lore.kernel.org/linux-arm-kernel/[email protected]/
In Marc's kernel configuration, the .head.text and .rodata.text sections end up more than 128MiB apart, requiring a veneer to branch between the two:
| [mark@lakrids:~/src/linux]% usekorg 14.1.0 aarch64-linux-objdump -t vmlinux | grep -w _text | ffff800080000000 g .head.text 0000000000000000 _text | [mark@lakrids:~/src/linux]% usekorg 14.1.0 aarch64-linux-objdump -t vmlinux | grep -w primary_entry | ffff8000889df0e0 g .rodata.text 000000000000006c primary_entry,
... consequently, LLD inserts a position-dependent veneer for the branch from _stext (in .head.text) to primary_entry (in .rodata.text):
| ffff800080000000 <_text>: | ffff800080000000: fa405a4d ccmp x18, #0x0, #0xd, pl // pl = nfrst | ffff800080000004: 14003fff b ffff800080010000 <__AArch64AbsLongThunk_primary_entry> ... | ffff800080010000 <__AArch64AbsLongThunk_primary_entry>: | ffff800080010000: 58000050 ldr x16, ffff800080010008 <__AArch64AbsLongThunk_primary_entry+0x8> | ffff800080010004: d61f0200 br x16 | ffff800080010008: 889df0e0 .word 0x889df0e0 | ffff80008001000c: ffff8000 .word 0xffff8000
... and as this is executed early in boot before the kernel is mapped in TTBR1 this results in a silent boot failure.
Fix this by passing '--pic-veneer' to the linker, which will cause the linker to use position-independent veneers, e.g.
| ffff800080000000 <_text>: | ffff800080000000: fa405a4d ccmp x18, #0x0, #0xd, pl // pl = nfrst | ffff800080000004: 14003fff b ffff800080010000 <__AArch64ADRPThunk_primary_entry> ... | ffff800080010000 <__AArch64ADRPThunk_primary_entry>: | ffff800080010000: f004e3f0 adrp x16, ffff800089c8f000 <__idmap_text_start> | ffff800080010004: 91038210 add x16, x16, #0xe0 | ffff800080010008: d61f0200 br x16
I've opted to pass '--pic-veneer' unconditionally, as:
* In addition to solving the boot failure, these sequences are generally nicer as they require fewer instructions and don't need to perform data accesses.
* While the position-independent veneer sequences have a limited +/-2GiB range, this is not a new restriction. Even kernels built with CONFIG_RELOCATABLE=n are limited to 2GiB in size as we have several structues using 32-bit relative offsets and PPREL32 relocations, which are similarly limited to +/-2GiB in range. These include extable entries, jump table entries, and alt_instr entries.
* GNU LD defaults to using position-independent veneers, and supports the same '--pic-veneer' option, so this change is not expected to adversely affect GNU LD.
I've tested with GNU LD 2.30 to 2.42 inclusive and LLVM 13.0.1 to 19.1.0 inclusive, using the kernel.org binaries from:
* https://mirrors.edge.kernel.org/pub/tools/crosstool/ * https://mirrors.edge.kernel.org/pub/tools/llvm/
Signed-off-by: Mark Rutland <[email protected]> Reported-by: Marc Zyngier <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Will Deacon <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
show more ...
|
|
Revision tags: v6.11, v6.11-rc7, v6.11-rc6 |
|
| #
d077242d |
| 29-Aug-2024 |
Alice Ryhl <[email protected]> |
rust: support for shadow call stack sanitizer
Add all of the flags that are needed to support the shadow call stack (SCS) sanitizer with Rust, and updates Kconfig to allow only configurations that w
rust: support for shadow call stack sanitizer
Add all of the flags that are needed to support the shadow call stack (SCS) sanitizer with Rust, and updates Kconfig to allow only configurations that work.
The -Zfixed-x18 flag is required to use SCS on arm64, and requires rustc version 1.80.0 or greater. This restriction is reflected in Kconfig.
When CONFIG_DYNAMIC_SCS is enabled, the build will be configured to include unwind tables in the build artifacts. Dynamic SCS uses the unwind tables at boot to find all places that need to be patched. The -Cforce-unwind-tables=y flag ensures that unwind tables are available for Rust code.
In non-dynamic mode, the -Zsanitizer=shadow-call-stack flag is what enables the SCS sanitizer. Using this flag requires rustc version 1.82.0 or greater on the targets used by Rust in the kernel. This restriction is reflected in Kconfig.
It is possible to avoid the requirement of rustc 1.80.0 by using -Ctarget-feature=+reserve-x18 instead of -Zfixed-x18. However, this flag emits a warning during the build, so this patch does not add support for using it and instead requires 1.80.0 or greater.
The dependency is placed on `select HAVE_RUST` to avoid a situation where enabling Rust silently turns off the sanitizer. Instead, turning on the sanitizer results in Rust being disabled. We generally do not want changes to CONFIG_RUST to result in any mitigations being changed or turned off.
At the time of writing, rustc 1.82.0 only exists via the nightly release channel. There is a chance that the -Zsanitizer=shadow-call-stack flag will end up needing 1.83.0 instead, but I think it is small.
Reviewed-by: Sami Tolvanen <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Alice Ryhl <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Fixed indentation using spaces. - Miguel ] Signed-off-by: Miguel Ojeda <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1 |
|
| #
4c7be57f |
| 23-Jul-2024 |
Linus Torvalds <[email protected]> |
arm64: allow installing compressed image by default
On arm64 we build compressed images, but "make install" by default will install the old non-compressed one. To actually get the compressed image
arm64: allow installing compressed image by default
On arm64 we build compressed images, but "make install" by default will install the old non-compressed one. To actually get the compressed image install, you need to use "make zinstall", which is not the usual way to install a kernel.
Which may not sound like much of an issue, but when you deal with multiple architectures (and years of your fingers knowing the regular "make install" incantation), this inconsistency is pretty annoying.
But as Will Deacon says: "Sadly, bootloaders being as top quality as you might expect, I don't think we're in a position to rely on decompressor support across the board. Our Image.gz is literally just that -- we don't have a built-in decompressor (nor do I think we want to rush into that again after the fun we had on arm32) and the recent EFI zboot support solves that problem for platforms using EFI.
Changing the default 'install' target terrifies me. There are bound to be folks with embedded boards who've scripted this and we could really ruin their day if we quietly give them a compressed kernel that their bootloader doesn't know how to handle :/"
So make this conditional on a new "COMPRESSED_INSTALL" option.
Cc: Catalin Marinas <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2 |
|
| #
71883ae3 |
| 29-Mar-2024 |
Samuel Holland <[email protected]> |
arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
arm64 provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and
arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
arm64 provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and export CFLAGS adjustments as found in lib/raid6/Makefile.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Samuel Holland <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Russell King <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
7a23b027 |
| 29-Mar-2024 |
Simon Glass <[email protected]> |
arm64: boot: Support Flat Image Tree
Add a script which produces a Flat Image Tree (FIT), a single file containing the built kernel and associated devicetree files. Compression defaults to gzip whic
arm64: boot: Support Flat Image Tree
Add a script which produces a Flat Image Tree (FIT), a single file containing the built kernel and associated devicetree files. Compression defaults to gzip which gives a good balance of size and performance.
The files compress from about 86MB to 24MB using this approach.
The FIT can be used by bootloaders which support it, such as U-Boot and Linuxboot. It permits automatic selection of the correct devicetree, matching the compatible string of the running board with the closest compatible string in the FIT. There is no need for filenames or other workarounds.
Add a 'make image.fit' build target for arm64, as well.
The FIT can be examined using 'dumpimage -l'.
This uses the 'dtbs-list' file but processes only .dtb files, ignoring the overlay .dtbo files.
This features requires pylibfdt (use 'pip install libfdt'). It also requires compression utilities for the algorithm being used. Supported compression options are the same as the Image.xxx files. Use FIT_COMPRESSION to select an algorithm other than gzip.
While FIT supports a ramdisk / initrd, no attempt is made to support this here, since it must be built separately from the Linux build.
Signed-off-by: Simon Glass <[email protected]> Acked-by: Masahiro Yamada <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
| #
0dc1670b |
| 29-Mar-2024 |
Simon Glass <[email protected]> |
arm64: Add BOOT_TARGETS variable
Add a new variable containing a list of possible targets. Mark them as phony. This matches the approach taken for arch/arm
Signed-off-by: Simon Glass <sjg@chromium.
arm64: Add BOOT_TARGETS variable
Add a new variable containing a list of possible targets. Mark them as phony. This matches the approach taken for arch/arm
Signed-off-by: Simon Glass <[email protected]> Reviewed-by: Nicolas Schier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7 |
|
| #
724a75ac |
| 20-Oct-2023 |
Jamie Cunliffe <[email protected]> |
arm64: rust: Enable Rust support for AArch64
This commit provides the build flags for Rust for AArch64. The core Rust support already in the kernel does the rest. This enables the PAC ret and BTI op
arm64: rust: Enable Rust support for AArch64
This commit provides the build flags for Rust for AArch64. The core Rust support already in the kernel does the rest. This enables the PAC ret and BTI options in the Rust build flags to match the options that are used when building C.
The Rust samples have been tested with this commit.
Signed-off-by: Jamie Cunliffe <[email protected]> Acked-by: Will Deacon <[email protected]> Tested-by: Dirk Behme <[email protected]> Tested-by: Boqun Feng <[email protected]> Acked-by: Miguel Ojeda <[email protected]> Acked-by: Catalin Marinas <[email protected]> Tested-by: Alice Ryhl <[email protected]> Tested-by: Fabien Parent <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
show more ...
|
| #
c7767f5c |
| 29-Jan-2024 |
Kevin Brodsky <[email protected]> |
arm64: vdso32: Remove unused vdso32-offsets.h
Commit 2d071968a405 ("arm64: compat: Remove 32-bit sigreturn code from the vDSO") removed all VDSO_* symbols in the compat vDSO. As a result, vdso32-off
arm64: vdso32: Remove unused vdso32-offsets.h
Commit 2d071968a405 ("arm64: compat: Remove 32-bit sigreturn code from the vDSO") removed all VDSO_* symbols in the compat vDSO. As a result, vdso32-offsets.h is now empty and therefore unused. Time to remove it.
Signed-off-by: Kevin Brodsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
| #
a099bec7 |
| 17-Nov-2023 |
Masahiro Yamada <[email protected]> |
arm64: vdso32: rename 32-bit debug vdso to vdso32.so.dbg
'make vdso_install' renames arch/arm64/kernel/vdso32/vdso.so.dbg to vdso32.so during installation, which allows 64-bit and 32-bit vdso files
arm64: vdso32: rename 32-bit debug vdso to vdso32.so.dbg
'make vdso_install' renames arch/arm64/kernel/vdso32/vdso.so.dbg to vdso32.so during installation, which allows 64-bit and 32-bit vdso files to be installed in the same directory.
However, arm64 is the only architecture that requires this renaming.
To simplify the vdso_install logic, rename the in-tree vdso file so its base name matches the installed file name.
Signed-off-by: Masahiro Yamada <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
| #
c0a85742 |
| 19-Nov-2023 |
Masahiro Yamada <[email protected]> |
arm64: add dependency between vmlinuz.efi and Image
A common issue in Makefile is a race in parallel building.
You need to be careful to prevent multiple threads from writing to the same file simul
arm64: add dependency between vmlinuz.efi and Image
A common issue in Makefile is a race in parallel building.
You need to be careful to prevent multiple threads from writing to the same file simultaneously.
Commit 3939f3345050 ("ARM: 8418/1: add boot image dependencies to not generate invalid images") addressed such a bad scenario.
A similar symptom occurs with the following command:
$ make -j$(nproc) ARCH=arm64 Image vmlinuz.efi [ snip ] SORTTAB vmlinux OBJCOPY arch/arm64/boot/Image OBJCOPY arch/arm64/boot/Image AS arch/arm64/boot/zboot-header.o PAD arch/arm64/boot/vmlinux.bin GZIP arch/arm64/boot/vmlinuz OBJCOPY arch/arm64/boot/vmlinuz.o LD arch/arm64/boot/vmlinuz.efi.elf OBJCOPY arch/arm64/boot/vmlinuz.efi
The log "OBJCOPY arch/arm64/boot/Image" is displayed twice.
It indicates that two threads simultaneously enter arch/arm64/boot/ and write to arch/arm64/boot/Image.
It occasionally leads to a build failure:
$ make -j$(nproc) ARCH=arm64 Image vmlinuz.efi [ snip ] SORTTAB vmlinux OBJCOPY arch/arm64/boot/Image PAD arch/arm64/boot/vmlinux.bin truncate: Invalid number: 'arch/arm64/boot/vmlinux.bin' make[2]: *** [drivers/firmware/efi/libstub/Makefile.zboot:13: arch/arm64/boot/vmlinux.bin] Error 1 make[2]: *** Deleting file 'arch/arm64/boot/vmlinux.bin' make[1]: *** [arch/arm64/Makefile:163: vmlinuz.efi] Error 2 make[1]: *** Waiting for unfinished jobs.... make: *** [Makefile:234: __sub-make] Error 2
vmlinuz.efi depends on Image, but such a dependency is not specified in arch/arm64/Makefile.
Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Reviewed-by: SImon Glass <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc6 |
|
| #
56769ba4 |
| 14-Oct-2023 |
Masahiro Yamada <[email protected]> |
kbuild: unify vdso_install rules
Currently, there is no standard implementation for vdso_install, leading to various issues:
1. Code duplication
Many architectures duplicate similar code just
kbuild: unify vdso_install rules
Currently, there is no standard implementation for vdso_install, leading to various issues:
1. Code duplication
Many architectures duplicate similar code just for copying files to the install destination.
Some architectures (arm, sparc, x86) create build-id symlinks, introducing more code duplication.
2. Unintended updates of in-tree build artifacts
The vdso_install rule depends on the vdso files to install. It may update in-tree build artifacts. This can be problematic, as explained in commit 19514fc665ff ("arm, kbuild: make "make install" not depend on vmlinux").
3. Broken code in some architectures
Makefile code is often copied from one architecture to another without proper adaptation.
'make vdso_install' for parisc does not work.
'make vdso_install' for s390 installs vdso64, but not vdso32.
To address these problems, this commit introduces a generic vdso_install rule.
Architectures that support vdso_install need to define vdso-install-y in arch/*/Makefile. vdso-install-y lists the files to install.
For example, arch/x86/Makefile looks like this:
vdso-install-$(CONFIG_X86_64) += arch/x86/entry/vdso/vdso64.so.dbg vdso-install-$(CONFIG_X86_X32_ABI) += arch/x86/entry/vdso/vdsox32.so.dbg vdso-install-$(CONFIG_X86_32) += arch/x86/entry/vdso/vdso32.so.dbg vdso-install-$(CONFIG_IA32_EMULATION) += arch/x86/entry/vdso/vdso32.so.dbg
These files will be installed to $(MODLIB)/vdso/ with the .dbg suffix, if exists, stripped away.
vdso-install-y can optionally take the second field after the colon separator. This is needed because some architectures install a vdso file as a different base name.
The following is a snippet from arch/arm64/Makefile.
vdso-install-$(CONFIG_COMPAT_VDSO) += arch/arm64/kernel/vdso32/vdso.so.dbg:vdso32.so
This will rename vdso.so.dbg to vdso32.so during installation. If such architectures change their implementation so that the base names match, this workaround will go away.
Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Sven Schnelle <[email protected]> # s390 Reviewed-by: Nicolas Schier <[email protected]> Reviewed-by: Guo Ren <[email protected]> Acked-by: Helge Deller <[email protected]> # parisc Acked-by: Catalin Marinas <[email protected]> Acked-by: Russell King (Oracle) <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8 |
|
| #
c6cd63f5 |
| 10-Feb-2023 |
Mark Brown <[email protected]> |
arm64: configs: Add virtconfig
Provide a slimline configuration intended to be booted on virtual machines, with the goal of providing a light configuration which will boot on and enable features ava
arm64: configs: Add virtconfig
Provide a slimline configuration intended to be booted on virtual machines, with the goal of providing a light configuration which will boot on and enable features available in mach-virt. This is defined in terms of the standard defconfig, with an additional virt.config fragment which disables options unneeded in a virtual configuration.
As a first step we just disable all the ARCH_ configuration options, disabling the build of all the SoC specific drivers. This results in a kernel that builds about 25% faster in my testing, if this approach works for people we can add further options.
Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc7 |
|
| #
c68cf528 |
| 31-Jan-2023 |
Mark Rutland <[email protected]> |
arm64: pauth: don't sign leaf functions
Currently, when CONFIG_ARM64_PTR_AUTH_KERNEL=y (and CONFIG_UNWIND_PATCH_PAC_INTO_SCS=n), we enable pointer authentication for all functions, including leaf fu
arm64: pauth: don't sign leaf functions
Currently, when CONFIG_ARM64_PTR_AUTH_KERNEL=y (and CONFIG_UNWIND_PATCH_PAC_INTO_SCS=n), we enable pointer authentication for all functions, including leaf functions. This isn't necessary, and is unfortunate for a few reasons:
* Any PACIASP instruction is implicitly a `BTI C` landing pad, and forcing the addition of a PACIASP in every function introduces a larger set of BTI gadgets than is necessary.
* The PACIASP and AUTIASP instructions make leaf functions larger than necessary, bloating the kernel Image. For a defconfig v6.2-rc3 kernel, this appears to add ~64KiB relative to not signing leaf functions, which is unfortunate but not entirely onerous.
* The PACIASP and AUTIASP instructions potentially make leaf functions more expensive in terms of performance and/or power. For many trivial leaf functions, this is clearly unnecessary, e.g.
| <arch_local_save_flags>: | d503233f paciasp | d53b4220 mrs x0, daif | d50323bf autiasp | d65f03c0 ret
| <calibration_delay_done>: | d503233f paciasp | d50323bf autiasp | d65f03c0 ret | d503201f nop
* When CONFIG_UNWIND_PATCH_PAC_INTO_SCS=y we disable pointer authentication for leaf functions, so clearly this is not functionally necessary, indicates we have an inconsistent threat model, and convolutes the Makefile logic.
We've used pointer authentication in leaf functions since the introduction of in-kernel pointer authentication in commit:
74afda4016a7437e ("arm64: compile the kernel with ptrauth return address signing")
... but at the time we had no rationale for signing leaf functions.
Subsequently, we considered avoiding signing leaf functions:
https://lore.kernel.org/linux-arm-kernel/[email protected]/ https://lore.kernel.org/linux-arm-kernel/[email protected]/
... however at the time we didn't have an abundance of reasons to avoid signing leaf functions as above (e.g. the BTI case), we had no hardware to make performance measurements, and it was reasoned that this gave some level of protection against a limited set of code-reuse gadgets which would fall through to a RET. We documented this in commit:
717b938e22f8dbf0 ("arm64: Document why we enable PAC support for leaf functions")
Notably, this was before we supported any forward-edge CFI scheme (e.g. Arm BTI, or Clang CFI/kCFI), which would prevent jumping into the middle of a function.
In addition, even with signing forced for leaf functions, AUTIASP may be placed before a number of instructions which might constitute such a gadget, e.g.
| <user_regs_reset_single_step>: | f9400022 ldr x2, [x1] | d503233f paciasp | d50323bf autiasp | f9408401 ldr x1, [x0, #264] | 720b005f tst w2, #0x200000 | b26b0022 orr x2, x1, #0x200000 | 926af821 and x1, x1, #0xffffffffffdfffff | 9a820021 csel x1, x1, x2, eq // eq = none | f9008401 str x1, [x0, #264] | d65f03c0 ret
| <fpsimd_cpu_dead>: | 2a0003e3 mov w3, w0 | 9000ff42 adrp x2, ffff800009ffd000 <xen_dynamic_chip+0x48> | 9120e042 add x2, x2, #0x838 | 52800000 mov w0, #0x0 // #0 | d503233f paciasp | f000d041 adrp x1, ffff800009a20000 <this_cpu_vector> | d50323bf autiasp | 9102c021 add x1, x1, #0xb0 | f8635842 ldr x2, [x2, w3, uxtw #3] | f821685f str xzr, [x2, x1] | d65f03c0 ret | d503201f nop
So generally, trying to use AUTIASP to detect such gadgetization is not robust, and this is dealt with far better by forward-edge CFI (which is designed to prevent such cases). We should bite the bullet and stop pretending that AUTIASP is a mitigation for such forward-edge gadgetization.
For the above reasons, this patch has the kernel consistently sign non-leaf functions and avoid signing leaf functions.
Considering a defconfig v6.2-rc3 kernel built with LLVM 15.0.6:
* The vmlinux is ~43KiB smaller:
| [mark@lakrids:~/src/linux]% ls -al vmlinux-* | -rwxr-xr-x 1 mark mark 338547808 Jan 25 17:17 vmlinux-after | -rwxr-xr-x 1 mark mark 338591472 Jan 25 17:22 vmlinux-before
* The resulting Image is 64KiB smaller:
| [mark@lakrids:~/src/linux]% ls -al Image-* | -rwxr-xr-x 1 mark mark 32702976 Jan 25 17:17 Image-after | -rwxr-xr-x 1 mark mark 32768512 Jan 25 17:22 Image-before
* There are ~400 fewer BTI gadgets:
| [mark@lakrids:~/src/linux]% usekorg 12.1.0 aarch64-linux-objdump -d vmlinux-before 2> /dev/null | grep -ow 'paciasp\|bti\sc\?' | sort | uniq -c | 1219 bti c | 61982 paciasp
| [mark@lakrids:~/src/linux]% usekorg 12.1.0 aarch64-linux-objdump -d vmlinux-after 2> /dev/null | grep -ow 'paciasp\|bti\sc\?' | sort | uniq -c | 10099 bti c | 52699 paciasp
Which is +8880 BTIs, and -9283 PACIASPs, for -403 unnecessary BTI gadgets. While this is small relative to the total, distinguishing the two cases will make it easier to analyse and reduce this set further in future.
Signed-off-by: Mark Rutland <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Reviewed-by: Mark Brown <[email protected]> Cc: Amit Daniel Kachhap <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
show more ...
|
| #
1e249c41 |
| 31-Jan-2023 |
Mark Rutland <[email protected]> |
arm64: unify asm-arch manipulation
Assemblers will reject instructions not supported by a target architecture version, and so we must explicitly tell the assembler the latest architecture version fo
arm64: unify asm-arch manipulation
Assemblers will reject instructions not supported by a target architecture version, and so we must explicitly tell the assembler the latest architecture version for which we want to assemble instructions from.
We've added a few AS_HAS_ARMV8_<N> definitions for this, in addition to an inconsistently named AS_HAS_PAC definition, from which arm64's top-level Makefile determines the architecture version that we intend to target, and generates the `asm-arch` variable.
To make this a bit clearer and easier to maintain, this patch reworks the Makefile to determine asm-arch in a single if-else-endif chain. AS_HAS_PAC, which is defined when the assembler supports `-march=armv8.3-a`, is renamed to AS_HAS_ARMV8_3.
As the logic for armv8.3-a is lifted out of the block handling pointer authentication, `asm-arch` may now be set to armv8.3-a regardless of whether support for pointer authentication is selected. This means that it will be possible to assemble armv8.3-a instructions even if we didn't intend to, but this is consistent with our handling of other architecture versions, and the compiler won't generate armv8.3-a instructions regardless.
For the moment there's no need for an CONFIG_AS_HAS_ARMV8_1, as the code for LSE atomics and LDAPR use individual `.arch_extension` entries and do not require the baseline asm arch to be bumped to armv8.1-a. The other armv8.1-a features (e.g. PAN) do not require assembler support.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Reviewed-by: Mark Brown <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc6 |
|
| #
baaf553d |
| 23-Jan-2023 |
Mark Rutland <[email protected]> |
arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64. This allows each ftrace callsite to provide an ftrace_ops to the common ftrac
arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64. This allows each ftrace callsite to provide an ftrace_ops to the common ftrace trampoline, allowing each callsite to invoke distinct tracer functions without the need to fall back to list processing or to allocate custom trampolines for each callsite. This significantly speeds up cases where multiple distinct trace functions are used and callsites are mostly traced by a single tracer.
The main idea is to place a pointer to the ftrace_ops as a literal at a fixed offset from the function entry point, which can be recovered by the common ftrace trampoline. Using a 64-bit literal avoids branch range limitations, and permits the ops to be swapped atomically without special considerations that apply to code-patching. In future this will also allow for the implementation of DYNAMIC_FTRACE_WITH_DIRECT_CALLS without branch range limitations by using additional fields in struct ftrace_ops.
As noted in the core patch adding support for DYNAMIC_FTRACE_WITH_CALL_OPS, this approach allows for directly invoking ftrace_ops::func even for ftrace_ops which are dynamically-allocated (or part of a module), without going via ftrace_ops_list_func.
Currently, this approach is not compatible with CLANG_CFI, as the presence/absence of pre-function NOPs changes the offset of the pre-function type hash, and there's no existing mechanism to ensure a consistent offset for instrumented and uninstrumented functions. When CLANG_CFI is enabled, the existing scheme with a global ops->func pointer is used, and there should be no functional change. I am currently working with others to allow the two to work together in future (though this will liekly require updated compiler support).
I've benchamrked this with the ftrace_ops sample module [1], which is not currently upstream, but available at:
https://lore.kernel.org/lkml/[email protected] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git ftrace-ops-sample-20230109
Using that module I measured the total time taken for 100,000 calls to a trivial instrumented function, with a number of tracers enabled with relevant filters (which would apply to the instrumented function) and a number of tracers enabled with irrelevant filters (which would not apply to the instrumented function). I tested on an M1 MacBook Pro, running under a HVF-accelerated QEMU VM (i.e. on real hardware).
Before this patch:
Number of tracers || Total time | Per-call average time (ns) Relevant | Irrelevant || (ns) | Total | Overhead =========+============++=============+==============+============ 0 | 0 || 94,583 | 0.95 | - 0 | 1 || 93,709 | 0.94 | - 0 | 2 || 93,666 | 0.94 | - 0 | 10 || 93,709 | 0.94 | - 0 | 100 || 93,792 | 0.94 | - ---------+------------++-------------+--------------+------------ 1 | 1 || 6,467,833 | 64.68 | 63.73 1 | 2 || 7,509,708 | 75.10 | 74.15 1 | 10 || 23,786,792 | 237.87 | 236.92 1 | 100 || 106,432,500 | 1,064.43 | 1063.38 ---------+------------++-------------+--------------+------------ 1 | 0 || 1,431,875 | 14.32 | 13.37 2 | 0 || 6,456,334 | 64.56 | 63.62 10 | 0 || 22,717,000 | 227.17 | 226.22 100 | 0 || 103,293,667 | 1032.94 | 1031.99 ---------+------------++-------------+--------------+--------------
Note: per-call overhead is estimated relative to the baseline case with 0 relevant tracers and 0 irrelevant tracers.
After this patch
Number of tracers || Total time | Per-call average time (ns) Relevant | Irrelevant || (ns) | Total | Overhead =========+============++=============+==============+============ 0 | 0 || 94,541 | 0.95 | - 0 | 1 || 93,666 | 0.94 | - 0 | 2 || 93,709 | 0.94 | - 0 | 10 || 93,667 | 0.94 | - 0 | 100 || 93,792 | 0.94 | - ---------+------------++-------------+--------------+------------ 1 | 1 || 281,000 | 2.81 | 1.86 1 | 2 || 281,042 | 2.81 | 1.87 1 | 10 || 280,958 | 2.81 | 1.86 1 | 100 || 281,250 | 2.81 | 1.87 ---------+------------++-------------+--------------+------------ 1 | 0 || 280,959 | 2.81 | 1.86 2 | 0 || 6,502,708 | 65.03 | 64.08 10 | 0 || 18,681,209 | 186.81 | 185.87 100 | 0 || 103,550,458 | 1,035.50 | 1034.56 ---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case with 0 relevant tracers and 0 irrelevant tracers.
As can be seen from the above:
a) Whenever there is a single relevant tracer function associated with a tracee, the overhead of invoking the tracer is constant, and does not scale with the number of tracers which are *not* associated with that tracee.
b) The overhead for a single relevant tracer has dropped to ~1/7 of the overhead prior to this series (from 13.37ns to 1.86ns). This is largely due to permitting calls to dynamically-allocated ftrace_ops without going through ftrace_ops_list_func.
I've run the ftrace selftests from v6.2-rc3, which reports:
| # of passed: 110 | # of failed: 0 | # of unresolved: 3 | # of untested: 0 | # of unsupported: 0 | # of xfailed: 1 | # of undefined(test bug): 0
... where the unresolved entries were the tests for DIRECT functions (which are not supported), and the checkbashisms selftest (which is irrelevant here):
| [8] Test ftrace direct functions against tracers [UNRESOLVED] | [9] Test ftrace direct functions against kprobes [UNRESOLVED] | [62] Meta-selftest: Checkbashisms [UNRESOLVED]
... with all other tests passing (or failing as expected).
Signed-off-by: Mark Rutland <[email protected]> Cc: Florent Revest <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4 |
|
| #
26299b3f |
| 03-Nov-2022 |
Mark Rutland <[email protected]> |
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support for FTRACE_WITH_ARGS. This removes some overhead and complexity, and removes some latent
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support for FTRACE_WITH_ARGS. This removes some overhead and complexity, and removes some latent issues with inconsistent presentation of struct pt_regs (which can only be reliably saved/restored at exception boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer authentication, where it's necessary to snapshot/manipulate the LR before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need to hook function entry before an instrumented function manipulates the stack or argument registers. Practically speaking, we need to preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of registers which are live at function call/return boundaries. Our calling convention is defined by "Procedure Call Standard for the Arm® 64-bit Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in the following GPRs:
* X0 - X7 : parameter / result registers * X8 : indirect result location register * SP : stack pointer (AKA SP)
Additionally, ad function call boundaries, the following GPRs hold context/return information:
* X29 : frame pointer (AKA FP) * X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other arguments hold parameters or return values:
* X9 - X17 : temporaries, may be clobbered * X18 : shadow call stack pointer (or temorary) * X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring the minimal set of registers necessary. This is always sufficient to manipulate control flow (e.g. for live-patching) or to manipulate function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing up 188 bytes. This could be reduced further with changes to the unwinder.
As there is no longer a need to save different sets of registers for different features, we no longer need distinct `ftrace_caller` and `ftrace_regs_caller` trampolines. This allows the trampoline assembly to be simpler, and simplifies code which previously had to handle the two trampolines.
I've tested this with the ftrace selftests, where there are no unexpected failures.
Co-developed-by: Florent Revest <[email protected]> Signed-off-by: Mark Rutland <[email protected]> Signed-off-by: Florent Revest <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Will Deacon <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc3 |
|
| #
3b619e22 |
| 27-Oct-2022 |
Ard Biesheuvel <[email protected]> |
arm64: implement dynamic shadow call stack for Clang
Implement dynamic shadow call stack support on Clang, by parsing the unwind tables at init time to locate all occurrences of PACIASP/AUTIASP inst
arm64: implement dynamic shadow call stack for Clang
Implement dynamic shadow call stack support on Clang, by parsing the unwind tables at init time to locate all occurrences of PACIASP/AUTIASP instructions, and replacing them with the shadow call stack push and pop instructions, respectively.
This is useful because the overhead of the shadow call stack is difficult to justify on hardware that implements pointer authentication (PAC), and given that the PAC instructions are executed as NOPs on hardware that doesn't, we can just replace them without breaking anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to manipulations of the return address, replacing them 1:1 with shadow call stack pushes and pops is guaranteed to result in the desired behavior.
Signed-off-by: Ard Biesheuvel <[email protected]> Reviewed-by: Sami Tolvanen <[email protected]> Tested-by: Sami Tolvanen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
| #
68c76ad4 |
| 27-Oct-2022 |
Ard Biesheuvel <[email protected]> |
arm64: unwind: add asynchronous unwind tables to kernel and modules
Enable asynchronous unwind table generation for both the core kernel as well as modules, and emit the resulting .eh_frame sections
arm64: unwind: add asynchronous unwind tables to kernel and modules
Enable asynchronous unwind table generation for both the core kernel as well as modules, and emit the resulting .eh_frame sections as init code so we can use the unwind directives for code patching at boot or module load time.
This will be used by dynamic shadow call stack support, which will rely on code patching rather than compiler codegen to emit the shadow call stack push and pop instructions.
Signed-off-by: Ard Biesheuvel <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Reviewed-by: Sami Tolvanen <[email protected]> Tested-by: Sami Tolvanen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7 |
|
| #
ce697cce |
| 24-Sep-2022 |
Masahiro Yamada <[email protected]> |
kbuild: remove head-y syntax
Kbuild puts the objects listed in head-y at the head of vmlinux. Conventionally, we do this for head*.S, which contains the kernel entry point.
A counter approach is to
kbuild: remove head-y syntax
Kbuild puts the objects listed in head-y at the head of vmlinux. Conventionally, we do this for head*.S, which contains the kernel entry point.
A counter approach is to control the section order by the linker script. Actually, the code marked as __HEAD goes into the ".head.text" section, which is placed before the normal ".text" section.
I do not know if both of them are needed. From the build system perspective, head-y is not mandatory. If you can achieve the proper code placement by the linker script only, it would be cleaner.
I collected the current head-y objects into head-object-list.txt. It is a whitelist. My hope is it will be reduced in the long run.
Signed-off-by: Masahiro Yamada <[email protected]> Tested-by: Nick Desaulniers <[email protected]> Reviewed-by: Nicolas Schier <[email protected]>
show more ...
|
|
Revision tags: v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6 |
|
| #
c37b830f |
| 01-May-2022 |
Ard Biesheuvel <[email protected]> |
arm64: efi: enable generic EFI compressed boot
Wire up the generic EFI zboot support for arm64.
Signed-off-by: Ard Biesheuvel <[email protected]> Tested-by: Jeremy Linton <[email protected]> Acke
arm64: efi: enable generic EFI compressed boot
Wire up the generic EFI zboot support for arm64.
Signed-off-by: Ard Biesheuvel <[email protected]> Tested-by: Jeremy Linton <[email protected]> Acked-by: Catalin Marinas <[email protected]>
show more ...
|
| #
f774f5bb |
| 03-May-2022 |
Masahiro Yamada <[email protected]> |
kbuild: factor out the common installation code into scripts/install.sh
Many architectures have similar install.sh scripts.
The first half is really generic; it verifies that the kernel image and S
kbuild: factor out the common installation code into scripts/install.sh
Many architectures have similar install.sh scripts.
The first half is really generic; it verifies that the kernel image and System.map exist, then executes ~/bin/${INSTALLKERNEL} or /sbin/${INSTALLKERNEL} if available.
The second half is kind of arch-specific; it copies the kernel image and System.map to the destination, but the code is slightly different.
Factor out the generic part into scripts/install.sh.
Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Nicolas Schier <[email protected]>
show more ...
|
|
Revision tags: v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6 |
|
| #
2c54b423 |
| 13-Dec-2021 |
Ard Biesheuvel <[email protected]> |
arm64/xor: use EOR3 instructions when available
Use the EOR3 instruction to implement xor_blocks() if the instruction is available, which is the case if the CPU implements the SHA-3 extension. This
arm64/xor: use EOR3 instructions when available
Use the EOR3 instruction to implement xor_blocks() if the instruction is available, which is the case if the CPU implements the SHA-3 extension. This is about 20% faster on Apple M1 when using the 5-way version.
Signed-off-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
show more ...
|