|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5 |
|
| #
909639aa |
| 28-Feb-2025 |
H. Peter Anvin (Intel) <[email protected]> |
x86/cpufeatures: Rename X86_CMPXCHG64 to X86_CX8
Replace X86_CMPXCHG64 with X86_CX8, as CX8 is the name of the CPUID flag, thus to make it consistent with X86_FEATURE_CX8 defined in <asm/cpufeatures
x86/cpufeatures: Rename X86_CMPXCHG64 to X86_CX8
Replace X86_CMPXCHG64 with X86_CX8, as CX8 is the name of the CPUID flag, thus to make it consistent with X86_FEATURE_CX8 defined in <asm/cpufeatures.h>.
No functional change intended.
Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
b815f687 |
| 24-Feb-2025 |
Peter Zijlstra <[email protected]> |
x86/bhi: Add BHI stubs
Add an array of code thunks, to be called from the FineIBT preamble, clobbering the first 'n' argument registers for speculative execution.
Notably the 0th entry will clobber
x86/bhi: Add BHI stubs
Add an array of code thunks, to be called from the FineIBT preamble, clobbering the first 'n' argument registers for speculative execution.
Notably the 0th entry will clobber no argument registers and will never be used, it exists so the array can be naturally indexed, while the 7th entry will clobber all the 6 argument registers and also RSP in order to mess up stack based arguments.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.14-rc4, v6.14-rc3 |
|
| #
4ffd5086 |
| 10-Feb-2025 |
Eric Biggers <[email protected]> |
x86/crc64: implement crc64_be and crc64_nvme using new template
Add x86_64 [V]PCLMULQDQ optimized implementations of crc64_be() and crc64_nvme() by wiring them up to crc-pclmul-template.S.
crc64_be
x86/crc64: implement crc64_be and crc64_nvme using new template
Add x86_64 [V]PCLMULQDQ optimized implementations of crc64_be() and crc64_nvme() by wiring them up to crc-pclmul-template.S.
crc64_be() is used by bcache and bcachefs, and crc64_nvme() is used by blk-integrity. Both features can CRC large amounts of data, and the developers of both features have expressed interest in having these CRCs be optimized. So this optimization should be worthwhile. (See https://lore.kernel.org/r/v36sousjd5ukqlkpdxslvpu7l37zbu7d7slgc2trjjqwty2bny@qgzew34feo2r and https://lore.kernel.org/r/[email protected])
Benchmark results on AMD Ryzen 9 9950X (Zen 5) using crc_kunit:
crc64_be:
Length Before After ------ ------ ----- 1 633 MB/s 477 MB/s 16 717 MB/s 2517 MB/s 64 715 MB/s 7525 MB/s 127 714 MB/s 10002 MB/s 128 713 MB/s 13344 MB/s 200 715 MB/s 15752 MB/s 256 714 MB/s 22933 MB/s 511 715 MB/s 28025 MB/s 512 714 MB/s 49772 MB/s 1024 715 MB/s 65261 MB/s 3173 714 MB/s 78773 MB/s 4096 714 MB/s 83315 MB/s 16384 714 MB/s 89487 MB/s
crc64_nvme:
Length Before After ------ ------ ----- 1 716 MB/s 474 MB/s 16 717 MB/s 3303 MB/s 64 713 MB/s 7940 MB/s 127 715 MB/s 9867 MB/s 128 714 MB/s 13698 MB/s 200 715 MB/s 15995 MB/s 256 714 MB/s 23479 MB/s 511 714 MB/s 28013 MB/s 512 715 MB/s 51533 MB/s 1024 715 MB/s 66788 MB/s 3173 715 MB/s 79182 MB/s 4096 715 MB/s 83966 MB/s 16384 715 MB/s 89739 MB/s
Acked-by: Keith Busch <[email protected]> Reviewed-by: "Martin K. Petersen" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
show more ...
|
| #
dbdda1fd |
| 10-Feb-2025 |
Eric Biggers <[email protected]> |
x86/crc-t10dif: implement crc_t10dif using new template
Instantiate crc-pclmul-template.S for crc_t10dif and delete the original PCLMULQDQ optimized implementation. This has the following advantage
x86/crc-t10dif: implement crc_t10dif using new template
Instantiate crc-pclmul-template.S for crc_t10dif and delete the original PCLMULQDQ optimized implementation. This has the following advantages:
- Less CRC-variant-specific code. - VPCLMULQDQ support, greatly improving performance on sufficiently long messages on newer CPUs. - A faster reduction from 128 bits to the final CRC. - Support for i386.
Benchmark results on AMD Ryzen 9 9950X (Zen 5) using crc_kunit:
Length Before After ------ ------ ----- 1 440 MB/s 386 MB/s 16 1865 MB/s 2008 MB/s 64 4343 MB/s 6917 MB/s 127 5440 MB/s 8909 MB/s 128 5533 MB/s 12150 MB/s 200 5908 MB/s 14423 MB/s 256 15870 MB/s 21288 MB/s 511 14219 MB/s 25840 MB/s 512 18361 MB/s 37797 MB/s 1024 19941 MB/s 61374 MB/s 3173 20461 MB/s 74909 MB/s 4096 21310 MB/s 78919 MB/s 16384 21663 MB/s 85012 MB/s
Acked-by: Ard Biesheuvel <[email protected]> Acked-by: Keith Busch <[email protected]> Reviewed-by: "Martin K. Petersen" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2 |
|
| #
ed4bc981 |
| 02-Dec-2024 |
Eric Biggers <[email protected]> |
x86/crc-t10dif: expose CRC-T10DIF function through lib
Move the x86 CRC-T10DIF assembly code into the lib directory and wire it up to the library interface. This allows it to be used without going
x86/crc-t10dif: expose CRC-T10DIF function through lib
Move the x86 CRC-T10DIF assembly code into the lib directory and wire it up to the library interface. This allows it to be used without going through the crypto API. It remains usable via the crypto API too via the shash algorithms that use the library interface. Thus all the arch-specific "shash" code becomes unnecessary and is removed.
Reviewed-by: Ard Biesheuvel <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
show more ...
|
| #
55d1ecce |
| 02-Dec-2024 |
Eric Biggers <[email protected]> |
x86/crc32: expose CRC32 functions through lib
Move the x86 CRC32 assembly code into the lib directory and wire it up to the library interface. This allows it to be used without going through the cr
x86/crc32: expose CRC32 functions through lib
Move the x86 CRC32 assembly code into the lib directory and wire it up to the library interface. This allows it to be used without going through the crypto API. It remains usable via the crypto API too via the shash algorithms that use the library interface. Thus all the arch-specific "shash" code becomes unnecessary and is removed.
Reviewed-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
20516d6e |
| 11-Apr-2024 |
Jason Gunthorpe <[email protected]> |
x86: Stop using weak symbols for __iowrite32_copy()
Start switching iomap_copy routines over to use #define and arch provided inline/macro functions instead of weak symbols.
Inline functions allow
x86: Stop using weak symbols for __iowrite32_copy()
Start switching iomap_copy routines over to use #define and arch provided inline/macro functions instead of weak symbols.
Inline functions allow more compiler optimization and this is often a driver hot path.
x86 has the only weak implementation for __iowrite32_copy(), so replace it with a static inline containing the same single instruction inline assembly. The compiler will generate the "mov edx,ecx" in a more optimal way.
Remove iomap_copy_64.S
Link: https://lore.kernel.org/r/[email protected] Acked-by: Arnd Bergmann <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7 |
|
| #
cd0d9d92 |
| 27-Feb-2024 |
Ard Biesheuvel <[email protected]> |
x86/boot: Move mem_encrypt= parsing to the decompressor
The early SME/SEV code parses the command line very early, in order to decide whether or not memory encryption should be enabled, which needs
x86/boot: Move mem_encrypt= parsing to the decompressor
The early SME/SEV code parses the command line very early, in order to decide whether or not memory encryption should be enabled, which needs to occur even before the initial page tables are created.
This is problematic for a number of reasons: - this early code runs from the 1:1 mapping provided by the decompressor or firmware, which uses a different translation than the one assumed by the linker, and so the code needs to be built in a special way; - parsing external input while the entire kernel image is still mapped writable is a bad idea in general, and really does not belong in security minded code; - the current code ignores the built-in command line entirely (although this appears to be the case for the entire decompressor)
Given that the decompressor/EFI stub is an intrinsic part of the x86 bootable kernel image, move the command line parsing there and out of the core kernel. This removes the need to build lib/cmdline.o in a special way, or to use RIP-relative LEA instructions in inline asm blocks.
This involves a new xloadflag in the setup header to indicate that mem_encrypt=on appeared on the kernel command line.
Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Tom Lendacky <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3 |
|
| #
aefb2f2e |
| 21-Nov-2023 |
Breno Leitao <[email protected]> |
x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE
Step 5/10 of the namespace unification of CPU mitigations related Kconfig options.
[ mingo: Converted a few more uses in
x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE
Step 5/10 of the namespace unification of CPU mitigations related Kconfig options.
[ mingo: Converted a few more uses in comments/messages as well. ]
Suggested-by: Josh Poimboeuf <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Ariel Miculas <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5 |
|
| #
6d12c8d3 |
| 31-May-2023 |
Peter Zijlstra <[email protected]> |
percpu: Wire up cmpxchg128
In order to replace cmpxchg_double() with the newly minted cmpxchg128() family of functions, wire it up in this_cpu_cmpxchg().
Signed-off-by: Peter Zijlstra (Intel) <pete
percpu: Wire up cmpxchg128
In order to replace cmpxchg_double() with the newly minted cmpxchg128() family of functions, wire it up in this_cpu_cmpxchg().
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Mark Rutland <[email protected]> Tested-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3 |
|
| #
034ff37d |
| 20-Apr-2023 |
Linus Torvalds <[email protected]> |
x86: rewrite '__copy_user_nocache' function
I didn't really want to do this, but as part of all the other changes to the user copy loops, I've been looking at this horror.
I tried to clean it up mu
x86: rewrite '__copy_user_nocache' function
I didn't really want to do this, but as part of all the other changes to the user copy loops, I've been looking at this horror.
I tried to clean it up multiple times, but every time I just found more problems, and the way it's written, it's just too hard to fix them.
For example, the code is written to do quad-word alignment, and will use regular byte accesses to get to that point. That's fairly simple, but it means that any initial 8-byte alignment will be done with cached copies.
However, the code then is very careful to do any 4-byte _tail_ accesses using an uncached 4-byte write, and that was claimed to be relevant in commit a82eee742452 ("x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()").
So if you do a 4-byte copy using that function, it carefully uses a 4-byte 'movnti' for the destination. But if you were to do a 12-byte copy that is 4-byte aligned, it would _not_ do a 4-byte 'movnti' followed by a 8-byte 'movnti' to keep it all uncached.
Instead, it would align the destination to 8 bytes using a byte-at-a-time loop, and then do a 8-byte 'movnti' for the final 8 bytes.
The main caller that cares is __copy_user_flushcache(), which knows about this insanity, and has odd cases for it all. But I just can't deal with looking at this kind of "it does one case right, and another related case entirely wrong".
And the code really wasn't fixable without hard drugs, which I try to avoid.
So instead, rewrite it in a form that hopefully not only gets this right, but is a bit more maintainable. Knock wood.
Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2 |
|
| #
bce5a1e8 |
| 18-Oct-2022 |
Nick Desaulniers <[email protected]> |
x86/mem: Move memmove to out of line assembler
When building ARCH=i386 with CONFIG_LTO_CLANG_FULL=y, it's possible (depending on additional configs which I have not been able to isolate) to observe
x86/mem: Move memmove to out of line assembler
When building ARCH=i386 with CONFIG_LTO_CLANG_FULL=y, it's possible (depending on additional configs which I have not been able to isolate) to observe a failure during register allocation:
error: inline assembly requires more registers than available
when memmove is inlined into tcp_v4_fill_cb() or tcp_v6_fill_cb().
memmove is quite large and probably shouldn't be inlined due to size alone. A noinline function attribute would be the simplest fix, but there's a few things that stand out with the current definition:
In addition to having complex constraints that can't always be resolved, the clobber list seems to be missing %bx. By using numbered operands rather than symbolic operands, the constraints are quite obnoxious to refactor.
Having a large function be 99% inline asm is a code smell that this function should simply be written in stand-alone out-of-line assembler.
Moving this to out of line assembler guarantees that the compiler cannot inline calls to memmove.
This has been done previously for 64b: commit 9599ec0471de ("x86-64, mem: Convert memmove() to assembly file and fix return value bug")
That gives the opportunity for other cleanups like fixing the inconsistent use of tabs vs spaces and instruction suffixes, and the label 3 appearing twice. Symbolic operands, local labels, and additional comments would provide this code with a fresh coat of paint.
Finally, add a test that tickles the `rep movsl` implementation to test it for correctness, since it has implicit operands.
Suggested-by: Ingo Molnar <[email protected]> Suggested-by: David Laight <[email protected]> Signed-off-by: Nick Desaulniers <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Kees Cook <[email protected]> Tested-by: Kees Cook <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/all/20221018172155.287409-1-ndesaulniers%40google.com
show more ...
|
|
Revision tags: v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6 |
|
| #
d911c67e |
| 15-Sep-2022 |
Alexander Potapenko <[email protected]> |
x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN
This is needed to allow memory tools like KASAN and KMSAN see the memory accesses from the checksum code. Without CO
x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN
This is needed to allow memory tools like KASAN and KMSAN see the memory accesses from the checksum code. Without CONFIG_GENERIC_CSUM the tools can't see memory accesses originating from handwritten assembly code.
For KASAN it's a question of detecting more bugs, for KMSAN using the C implementation also helps avoid false positives originating from seemingly uninitialized checksum values.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2 |
|
| #
c6dbd3e5 |
| 15-Nov-2021 |
Peter Zijlstra <[email protected]> |
x86/mmx_32: Remove X86_USE_3DNOW
This code puts an exception table entry on the PREFETCH instruction to overwrite it with a JMP.d8 when it triggers an exception. Except of course, our code is no lon
x86/mmx_32: Remove X86_USE_3DNOW
This code puts an exception table entry on the PREFETCH instruction to overwrite it with a JMP.d8 when it triggers an exception. Except of course, our code is no longer writable, also SMP.
Instead of fixing this broken mess, simply take it out.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3 |
|
| #
fb6a0408 |
| 20-Jul-2021 |
Maciej W. Rozycki <[email protected]> |
x86: Add support for 0x22/0x23 port I/O configuration space
Define macros and accessors for the configuration space addressed indirectly with an index register and a data register at the port I/O
x86: Add support for 0x22/0x23 port I/O configuration space
Define macros and accessors for the configuration space addressed indirectly with an index register and a data register at the port I/O locations of 0x22 and 0x23 respectively.
This space is defined by the Intel MultiProcessor Specification for the IMCR register used to switch between the PIC and the APIC mode[1], by Cyrix processors for their configuration[2][3], and also some chipsets.
Given the lack of atomicity with the indirect addressing a spinlock is required to protect accesses, although for Cyrix processors it is enough if accesses are executed with interrupts locally disabled, because the registers are local to the accessing CPU, and IMCR is only ever poked at by the BSP and early enough for interrupts not to have been configured yet. Therefore existing code does not have to change or use the new spinlock and neither it does.
Put the spinlock in a library file then, so that it does not get pulled unnecessarily for configurations that do not refer it.
Convert Cyrix accessors to wrappers so as to retain the brevity and clarity of the `getCx86' and `setCx86' calls.
References:
[1] "MultiProcessor Specification", Version 1.4, Intel Corporation, Order Number: 242016-006, May 1997, Section 3.6.2.1 "PIC Mode", pp. 3-7, 3-8
[2] "5x86 Microprocessor", Cyrix Corporation, Order Number: 94192-00, July 1995, Section 2.3.2.4 "Configuration Registers", p. 2-23
[3] "6x86 Processor", Cyrix Corporation, Order Number: 94175-01, March 1996, Section 2.4.4 "6x86 Configuration Registers", p. 2-23
Signed-off-by: Maciej W. Rozycki <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9 |
|
| #
ec6347bb |
| 06-Oct-2020 |
Dan Williams <[email protected]> |
x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named
x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled.
Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case:
On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <[email protected]> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <[email protected]> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function.
Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel().
Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch.
One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks.
[ bp: Massage a bit. ]
Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Tony Luck <[email protected]> Acked-by: Michael Ellerman <[email protected]> Cc: <[email protected]> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
show more ...
|
|
Revision tags: v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4 |
|
| #
aef0148f |
| 03-Sep-2020 |
Arvind Sankar <[email protected]> |
x86/cmdline: Disable jump tables for cmdline.c
When CONFIG_RETPOLINE is disabled, Clang uses a jump table for the switch statement in cmdline_find_option (jump tables are disabled when CONFIG_RETPOL
x86/cmdline: Disable jump tables for cmdline.c
When CONFIG_RETPOLINE is disabled, Clang uses a jump table for the switch statement in cmdline_find_option (jump tables are disabled when CONFIG_RETPOLINE is enabled). This function is called very early in boot from sme_enable() if CONFIG_AMD_MEM_ENCRYPT is enabled. At this time, the kernel is still executing out of the identity mapping, but the jump table will contain virtual addresses.
Fix this by disabling jump tables for cmdline.c when AMD_MEM_ENCRYPT is enabled.
Signed-off-by: Arvind Sankar <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3 |
|
| #
893ab004 |
| 26-Jun-2020 |
Masahiro Yamada <[email protected]> |
kbuild: remove cc-option test of -fno-stack-protector
Some Makefiles already pass -fno-stack-protector unconditionally. For example, arch/arm64/kernel/vdso/Makefile, arch/x86/xen/Makefile.
No probl
kbuild: remove cc-option test of -fno-stack-protector
Some Makefiles already pass -fno-stack-protector unconditionally. For example, arch/arm64/kernel/vdso/Makefile, arch/x86/xen/Makefile.
No problem report so far about hard-coding this option. So, we can assume all supported compilers know -fno-stack-protector.
GCC 4.8 and Clang support this option (https://godbolt.org/z/_HDGzN)
Get rid of cc-option from -fno-stack-protector.
Remove CONFIG_CC_HAS_STACKPROTECTOR_NONE, which is always 'y'.
Note: arch/mips/vdso/Makefile adds -fno-stack-protector twice, first unconditionally, and second conditionally. I removed the second one.
Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]>
show more ...
|
|
Revision tags: v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2 |
|
| #
f5d2313b |
| 14-Feb-2020 |
Marco Elver <[email protected]> |
kcsan, trace: Make KCSAN compatible with tracing
Previously the system would lock up if ftrace was enabled together with KCSAN. This is due to recursion on reporting if the tracer code is instrument
kcsan, trace: Make KCSAN compatible with tracing
Previously the system would lock up if ftrace was enabled together with KCSAN. This is due to recursion on reporting if the tracer code is instrumented with KCSAN.
To avoid this for all types of tracing, disable KCSAN instrumentation for all of kernel/trace.
Furthermore, since KCSAN relies on udelay() to introduce delay, we have to disable ftrace for udelay() (currently done for x86) in case KCSAN is used together with lockdep and ftrace. The reason is that it may corrupt lockdep IRQ flags tracing state due to a peculiar case of recursion (details in Makefile comment).
Reported-by: Qian Cai <[email protected]> Tested-by: Qian Cai <[email protected]> Acked-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8 |
|
| #
40d04110 |
| 14-Nov-2019 |
Marco Elver <[email protected]> |
x86, kcsan: Enable KCSAN for x86
This patch enables KCSAN for x86, with updates to build rules to not use KCSAN for several incompatible compilation units.
Signed-off-by: Marco Elver <elver@google.
x86, kcsan: Enable KCSAN for x86
This patch enables KCSAN for x86, with updates to build rules to not use KCSAN for several incompatible compilation units.
Signed-off-by: Marco Elver <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
show more ...
|
|
Revision tags: v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1 |
|
| #
b51ce374 |
| 29-Apr-2019 |
Gary Hook <[email protected]> |
x86/mm/mem_encrypt: Disable all instrumentation for early SME setup
Enablement of AMD's Secure Memory Encryption feature is determined very early after start_kernel() is entered. Part of this proced
x86/mm/mem_encrypt: Disable all instrumentation for early SME setup
Enablement of AMD's Secure Memory Encryption feature is determined very early after start_kernel() is entered. Part of this procedure involves scanning the command line for the parameter 'mem_encrypt'.
To determine intended state, the function sme_enable() uses library functions cmdline_find_option() and strncmp(). Their use occurs early enough such that it cannot be assumed that any instrumentation subsystem is initialized.
For example, making calls to a KASAN-instrumented function before KASAN is set up will result in the use of uninitialized memory and a boot failure.
When AMD's SME support is enabled, conditionally disable instrumentation of these dependent functions in lib/string.c and arch/x86/lib/cmdline.c.
[ bp: Get rid of intermediary nostackp var and cleanup whitespace. ]
Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption") Reported-by: Li RongQing <[email protected]> Signed-off-by: Gary R Hook <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Coly Li <[email protected]> Cc: "[email protected]" <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: "[email protected]" <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: "[email protected]" <[email protected]> Cc: "[email protected]" <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: x86-ml <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2 |
|
| #
46ad0840 |
| 22-Mar-2019 |
Waiman Long <[email protected]> |
locking/rwsem: Remove arch specific rwsem files
As the generic rwsem-xadd code is using the appropriate acquire and release versions of the atomic operations, the arch specific rwsem.h files will no
locking/rwsem: Remove arch specific rwsem files
As the generic rwsem-xadd code is using the appropriate acquire and release versions of the atomic operations, the arch specific rwsem.h files will not be that much faster than the generic code as long as the atomic functions are properly implemented. So we can remove those arch specific rwsem.h and stop building asm/rwsem.h to reduce maintenance effort.
Currently, only x86, alpha and ia64 have implemented architecture specific fast paths. I don't have access to alpha and ia64 systems for testing, but they are legacy systems that are not likely to be updated to the latest kernel anyway.
By using a rwsem microbenchmark, the total locking rates on a 4-socket 56-core 112-thread x86-64 system before and after the patch were as follows (mixed means equal # of read and write locks):
Before Patch After Patch # of Threads wlock rlock mixed wlock rlock mixed ------------ ----- ----- ----- ----- ----- ----- 1 29,201 30,143 29,458 28,615 30,172 29,201 2 6,807 13,299 1,171 7,725 15,025 1,804 4 6,504 12,755 1,520 7,127 14,286 1,345 8 6,762 13,412 764 6,826 13,652 726 16 6,693 15,408 662 6,599 15,938 626 32 6,145 15,286 496 5,549 15,487 511 64 5,812 15,495 60 5,858 15,572 60
There were some run-to-run variations for the multi-thread tests. For x86-64, using the generic C code fast path seems to be a little bit faster than the assembly version with low lock contention. Looking at the assembly version of the fast paths, there are assembly to/from C code wrappers that save and restore all the callee-clobbered registers (7 registers on x86-64). The assembly generated from the generic C code doesn't need to do that. That may explain the slight performance gain here.
The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h with no code change as no other code other than those under kernel/locking needs to access the internal rwsem macros and functions.
Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tim Chen <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1 |
|
| #
172caf19 |
| 31-Dec-2018 |
Masahiro Yamada <[email protected]> |
kbuild: remove redundant target cleaning on failure
Since commit 9c2af1c7377a ("kbuild: add .DELETE_ON_ERROR special target"), the target file is automatically deleted on failure.
The boilerplate c
kbuild: remove redundant target cleaning on failure
Since commit 9c2af1c7377a ("kbuild: add .DELETE_ON_ERROR special target"), the target file is automatically deleted on failure.
The boilerplate code
... || { rm -f $@; false; }
is unneeded.
Signed-off-by: Masahiro Yamada <[email protected]>
show more ...
|
| #
170d13ca |
| 05-Jan-2019 |
Linus Torvalds <[email protected]> |
x86: re-introduce non-generic memcpy_{to,from}io
This has been broken forever, and nobody ever really noticed because it's purely a performance issue.
Long long ago, in commit 6175ddf06b61 ("x86: C
x86: re-introduce non-generic memcpy_{to,from}io
This has been broken forever, and nobody ever really noticed because it's purely a performance issue.
Long long ago, in commit 6175ddf06b61 ("x86: Clean up mem*io functions") Brian Gerst simplified the memory copies to and from iomem, since on x86, the instructions to access iomem are exactly the same as the regular instructions.
That is technically true, and things worked, and nobody said anything. Besides, back then the regular memcpy was pretty simple and worked fine.
Nobody noticed except for David Laight, that is. David has a testing a TLP monitor he was writing for an FPGA, and has been occasionally complaining about how memcpy_toio() writes things one byte at a time.
Which is completely unacceptable from a performance standpoint, even if it happens to technically work.
The reason it's writing one byte at a time is because while it's technically true that accesses to iomem are the same as accesses to regular memory on x86, the _granularity_ (and ordering) of accesses matter to iomem in ways that they don't matter to regular cached memory.
In particular, when ERMS is set, we default to using "rep movsb" for larger memory copies. That is indeed perfectly fine for real memory, since the whole point is that the CPU is going to do cacheline optimizations and executes the memory copy efficiently for cached memory.
With iomem? Not so much. With iomem, "rep movsb" will indeed work, but it will copy things one byte at a time. Slowly and ponderously.
Now, originally, back in 2010 when commit 6175ddf06b61 was done, we didn't use ERMS, and this was much less noticeable.
Our normal memcpy() was simpler in other ways too.
Because in fact, it's not just about using the string instructions. Our memcpy() these days does things like "read and write overlapping values" to handle the last bytes of the copy. Again, for normal memory, overlapping accesses isn't an issue. For iomem? It can be.
So this re-introduces the specialized memcpy_toio(), memcpy_fromio() and memset_io() functions. It doesn't particularly optimize them, but it tries to at least not be horrid, or do overlapping accesses. In fact, this uses the existing __inline_memcpy() function that we still had lying around that uses our very traditional "rep movsl" loop followed by movsw/movsb for the final bytes.
Somebody may decide to try to improve on it, but if we've gone almost a decade with only one person really ever noticing and complaining, maybe it's not worth worrying about further, once it's not _completely_ broken?
Reported-by: David Laight <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1, v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4, v4.16-rc3 |
|
| #
d1c99108 |
| 19-Feb-2018 |
David Woodhouse <[email protected]> |
Revert "x86/retpoline: Simplify vmexit_fill_RSB()"
This reverts commit 1dde7415e99933bb7293d6b2843752cbdb43ec11. By putting the RSB filling out of line and calling it, we waste one RSB slot for retu
Revert "x86/retpoline: Simplify vmexit_fill_RSB()"
This reverts commit 1dde7415e99933bb7293d6b2843752cbdb43ec11. By putting the RSB filling out of line and calling it, we waste one RSB slot for returning from the function itself, which means one fewer actual function call we can make if we're doing the Skylake abomination of call-depth counting.
It also changed the number of RSB stuffings we do on vmexit from 32, which was correct, to 16. Let's just stop with the bikeshedding; it didn't actually *fix* anything anyway.
Signed-off-by: David Woodhouse <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|