|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7 |
|
| #
b25eb5f5 |
| 12-Mar-2025 |
David Woodhouse <[email protected]> |
x86/kexec: Add relocate_kernel() debugging support: Load a GDT
There are some failure modes which lead to triple-faults in the relocate_kernel() function, which is fairly much undebuggable for norma
x86/kexec: Add relocate_kernel() debugging support: Load a GDT
There are some failure modes which lead to triple-faults in the relocate_kernel() function, which is fairly much undebuggable for normal mortals.
Adding a GDT in the relocate_kernel() environment is step 1 towards being able to catch faults and do something more useful.
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Brian Gerst <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7 |
|
| #
e5360575 |
| 09-Jan-2025 |
David Woodhouse <[email protected]> |
x86/kexec: Cope with relocate_kernel() not being at the start of the page
A few places in the kexec control code page make the assumption that the first instruction of relocate_kernel is at the very
x86/kexec: Cope with relocate_kernel() not being at the start of the page
A few places in the kexec control code page make the assumption that the first instruction of relocate_kernel is at the very start of the page.
To allow for Clang CFI information to be added to relocate_kernel(), as well as the general principle of removing unwarranted assumptions, fix them to use the external __relocate_kernel_start symbol that the linker adds. This means using a separate addq and subq for calculating offsets, as the assembler can no longer calculate the delta directly for itself and relocations aren't that versatile. But those values can at least be used relative to a local label to avoid absolute relocations.
Turn the jump from relocate_kernel() to identity_mapped() into a real indirect 'jmp *%rsi' too, while touching it. There was no real reason for it to be a push+ret in the first place, and adding Clang CFI info will also give objtool enough visibility to start complaining 'return with modified stack frame' about it.
[ bp: Massage commit message. ]
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
eeed9150 |
| 09-Jan-2025 |
Nathan Chancellor <[email protected]> |
x86/kexec: Fix location of relocate_kernel with -ffunction-sections
After commit
cb33ff9e063c ("x86/kexec: Move relocate_kernel to kernel .data section"),
kernels configured with an option that
x86/kexec: Fix location of relocate_kernel with -ffunction-sections
After commit
cb33ff9e063c ("x86/kexec: Move relocate_kernel to kernel .data section"),
kernels configured with an option that uses -ffunction-sections, such as CONFIG_LTO_CLANG, crash when kexecing because the value of relocate_kernel does not match the value of __relocate_kernel_start so incorrect code gets copied via machine_kexec_prepare().
$ llvm-nm good-vmlinux &| rg relocate_kernel ffffffff83280d41 T __relocate_kernel_end ffffffff83280b00 T __relocate_kernel_start ffffffff83280b00 T relocate_kernel
$ llvm-nm bad-vmlinux &| rg relocate_kernel ffffffff83266100 D __relocate_kernel_end ffffffff83266100 D __relocate_kernel_start ffffffff8120b0d8 T relocate_kernel
When -ffunction-sections is enabled, TEXT_MAIN matches on '.text.[0-9a-zA-Z_]*' to coalesce the function specific functions back into .text during link time after they have been optimized. Due to the placement of TEXT_TEXT before KEXEC_RELOCATE_KERNEL in the x86 linker script, the .text.relocate_kernel section ends up in .text instead of .data.
Use a second dot in the relocate_kernel section name to avoid matching on TEXT_MAIN, which matches a similar situation that happened in commit
79cd2a11224e ("x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG"),
which allows kexec to function properly.
While .data.relocate_kernel still ends up in the .data section via DATA_MAIN -> DATA_DATA, ensure it is located with the .text.relocate_kernel section as intended by performing the same transformation.
Fixes: cb33ff9e063c ("x86/kexec: Move relocate_kernel to kernel .data section") Fixes: 8dbec5c77bc3 ("x86/kexec: Add data section to relocate_kernel") Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
2cacf7f2 |
| 09-Jan-2025 |
David Woodhouse <[email protected]> |
x86/kexec: Fix stack and handling of re-entry point for ::preserve_context
A ::preserve_context kimage can be invoked more than once, and the entry point can be different every time. When the callee
x86/kexec: Fix stack and handling of re-entry point for ::preserve_context
A ::preserve_context kimage can be invoked more than once, and the entry point can be different every time. When the callee returns to the kernel, it leaves the address of its entry point for next time on the stack.
That being the case, one might reasonably assume that the caller would allocate space for it on the stack frame before actually performing the 'call' into the callee.
Apparently not, though. Ever since the kjump code was first added in 2009, it has set up a *new* stack at the top of the swap_page scratch page, then just performed the 'call' without allocating any space for the re-entry address to be returned. It then reads the re-entry point for next time from 0(%rsp) which is actually the first qword of the page *after* the swap page, which might not exist at all! And if the callee has written to that, then it will have corrupted memory it doesn't own.
Correct this by pushing the entry point of the callee onto the stack before calling it. The callee may then adjust it, or not, as it sees fit, and subsequent invocations should work correctly either way.
Remove a stray push of zero to the *relocate_kernel* stack, which may have been intended for this purpose, but which was actually just noise.
Also, loading the stack for the callee relied on the address of the swap page being in %r10 without ever documenting that fact. Recent code changes made that no longer true, so load it directly from the local kexec_pa_swap_page variable instead.
Fixes: b3adabae8a96 ("x86/kexec: Drop page_list argument from relocate_kernel()") Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
85d724df |
| 09-Jan-2025 |
David Woodhouse <[email protected]> |
x86/kexec: Use correct swap page in swap_pages function
The swap_pages function expects the swap page to be in %r10, but there was no documentation to that effect. Once upon a time the setup code us
x86/kexec: Use correct swap page in swap_pages function
The swap_pages function expects the swap page to be in %r10, but there was no documentation to that effect. Once upon a time the setup code used to load its value from a kernel virtual address and save it to an address which is accessible in the identity-mapped page tables, and *happened* to use %r10 to do so, with no comment that it was left there on *purpose* instead of just being a scratch register. Once that was no longer necessary, %r10 just holds whatever the kernel happened to leave in it.
Now that the original value passed by the kernel is accessible via %rip-relative addressing, load directly from there instead of using %r10 for it. But document the other parameters that the swap_pages function *does* expect in registers.
Fixes: b3adabae8a96 ("x86/kexec: Drop page_list argument from relocate_kernel()") Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
4d5f1da9 |
| 09-Jan-2025 |
David Woodhouse <[email protected]> |
x86/kexec: Ensure preserve_context flag is set on return to kernel
The swap_pages() function will only actually *swap*, as its name implies, if the preserve_context flag in the %r11 register is non-
x86/kexec: Ensure preserve_context flag is set on return to kernel
The swap_pages() function will only actually *swap*, as its name implies, if the preserve_context flag in the %r11 register is non-zero. On the way back from a ::preserve_context kexec, ensure that the %r11 register is non-zero so that the pages get swapped back.
Fixes: 9e5683e2d0b5 ("x86/kexec: Only swap pages for ::preserve_context mode") Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
d144d8a6 |
| 09-Jan-2025 |
David Woodhouse <[email protected]> |
x86/kexec: Disable global pages before writing to control page
The kernel switches to a new set of page tables during kexec. The global mappings (_PAGE_GLOBAL==1) can remain in the TLB after this sw
x86/kexec: Disable global pages before writing to control page
The kernel switches to a new set of page tables during kexec. The global mappings (_PAGE_GLOBAL==1) can remain in the TLB after this switch. This is generally not a problem because the new page tables use a different portion of the virtual address space than the normal kernel mappings.
The critical exception to that generalisation (and the only mapping which isn't an identity mapping) is the kexec control page itself — which was ROX in the original kernel mapping, but should be RWX in the new page tables. If there is a global TLB entry for that in its prior read-only state, it definitely needs to be flushed before attempting to write through that virtual mapping.
It would be possible to just avoid writing to the virtual address of the page and defer all writes until they can be done through the identity mapping. But there's no good reason to keep the old TLB entries around, as they can cause nothing but trouble.
Clear the PGE bit in %cr4 early, before storing data in the control page.
Fixes: 5a82223e0743 ("x86/kexec: Mark relocate_kernel page as ROX instead of RWX") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219592 Reported-by: Nathan Chancellor <[email protected]> Reported-by: "Ning, Hongyu" <[email protected]> Co-developed-by: Dave Hansen <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Tested-by: "Ning, Hongyu" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
aeb68937 |
| 08-Dec-2024 |
Damien Le Moal <[email protected]> |
x86: Fix build regression with CONFIG_KEXEC_JUMP enabled
Build 6.13-rc12 for x86_64 with gcc 14.2.1 fails with the error:
ld: vmlinux.o: in function `virtual_mapped': linux/arch/x86/kernel/relo
x86: Fix build regression with CONFIG_KEXEC_JUMP enabled
Build 6.13-rc12 for x86_64 with gcc 14.2.1 fails with the error:
ld: vmlinux.o: in function `virtual_mapped': linux/arch/x86/kernel/relocate_kernel_64.S:249:(.text+0x5915b): undefined reference to `saved_context_gdt_desc'
when CONFIG_KEXEC_JUMP is enabled.
This was introduced by commit 07fa619f2a40 ("x86/kexec: Restore GDT on return from ::preserve_context kexec") which introduced a use of saved_context_gdt_desc without a declaration for it.
Fix that by including asm/asm-offsets.h where saved_context_gdt_desc is defined (indirectly in include/generated/asm-offsets.h which asm/asm-offsets.h includes).
Fixes: 07fa619f2a40 ("x86/kexec: Restore GDT on return from ::preserve_context kexec") Signed-off-by: Damien Le Moal <[email protected]> Acked-by: Borislav Petkov (AMD) <[email protected]> Acked-by: David Woodhouse <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc2 |
|
| #
93e489ad |
| 05-Dec-2024 |
David Woodhouse <[email protected]> |
x86/kexec: Clean up register usage in relocate_kernel()
The memory encryption flag is passed in %r8 because that's where the calling convention puts it. Instead of moving it to %r12 and then using %
x86/kexec: Clean up register usage in relocate_kernel()
The memory encryption flag is passed in %r8 because that's where the calling convention puts it. Instead of moving it to %r12 and then using %r8 for other things, just leave it in %r8 and use other registers instead.
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
b7155dfd |
| 05-Dec-2024 |
David Woodhouse <[email protected]> |
x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page
All writes to the relocate_kernel control page are now done *after* the %cr3 switch via simple %rip-relative addressing, wh
x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page
All writes to the relocate_kernel control page are now done *after* the %cr3 switch via simple %rip-relative addressing, which means the DATA() macro with its pointer arithmetic can also now be removed.
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
b3adabae |
| 05-Dec-2024 |
David Woodhouse <[email protected]> |
x86/kexec: Drop page_list argument from relocate_kernel()
The kernel's virtual mapping of the relocate_kernel page currently needs to be RWX because it is written to before the %cr3 switch.
Now tha
x86/kexec: Drop page_list argument from relocate_kernel()
The kernel's virtual mapping of the relocate_kernel page currently needs to be RWX because it is written to before the %cr3 switch.
Now that the relocate_kernel page has its own .data section and local variables, it can also have *global* variables. So eliminate the separate page_list argument, and write the same information directly to variables in the relocate_kernel page instead. This way, the relocate_kernel code itself doesn't need to copy it.
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
8dbec5c7 |
| 05-Dec-2024 |
David Woodhouse <[email protected]> |
x86/kexec: Add data section to relocate_kernel
Now that the relocate_kernel page is handled sanely by a linker script we can have actual data, and just use %rip-relative addressing to access it.
Si
x86/kexec: Add data section to relocate_kernel
Now that the relocate_kernel page is handled sanely by a linker script we can have actual data, and just use %rip-relative addressing to access it.
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
cb33ff9e |
| 05-Dec-2024 |
David Woodhouse <[email protected]> |
x86/kexec: Move relocate_kernel to kernel .data section
Now that the copy is executed instead of the original, the relocate_kernel page can live in the kernel's .text section. This will allow subseq
x86/kexec: Move relocate_kernel to kernel .data section
Now that the copy is executed instead of the original, the relocate_kernel page can live in the kernel's .text section. This will allow subsequent commits to actually add real data to it and clean up the code somewhat as well as making the control page ROX.
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
eeebbde5 |
| 05-Dec-2024 |
David Woodhouse <[email protected]> |
x86/kexec: Invoke copy of relocate_kernel() instead of the original
This currently calls set_memory_x() from machine_kexec_prepare() just like the 32-bit version does. That's actually a bit earlier
x86/kexec: Invoke copy of relocate_kernel() instead of the original
This currently calls set_memory_x() from machine_kexec_prepare() just like the 32-bit version does. That's actually a bit earlier than I'd like, as it leaves the page RWX all the time the image is even *loaded*.
Subsequent commits will eliminate all the writes to the page between the point it's marked executable in machine_kexec_prepare() the time that relocate_kernel() is running and has switched to the identmap %cr3, so that it can be ROX. But that can't happen until it's moved to the .data section of the kernel, and *that* can't happen until we start executing the copy instead of executing it in place in the kernel .text. So break the circular dependency in those commits by letting it be RWX for now.
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
9e5683e2 |
| 05-Dec-2024 |
David Woodhouse <[email protected]> |
x86/kexec: Only swap pages for ::preserve_context mode
There's no need to swap pages (which involves three memcopies for each page) in the plain kexec case. Just do a single copy from source to dest
x86/kexec: Only swap pages for ::preserve_context mode
There's no need to swap pages (which involves three memcopies for each page) in the plain kexec case. Just do a single copy from source to destination page.
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
46d4e205 |
| 05-Dec-2024 |
David Woodhouse <[email protected]> |
x86/kexec: Use named labels in swap_pages in relocate_kernel_64.S
Make the code a little more readable.
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]
x86/kexec: Use named labels in swap_pages in relocate_kernel_64.S
Make the code a little more readable.
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Kai Huang <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
207bdf7f |
| 05-Dec-2024 |
David Woodhouse <[email protected]> |
x86/kexec: Clean up and document register use in relocate_kernel_64.S
Add more comments explaining what each register contains, and save the preserve_context flag to a non-clobbered register sooner,
x86/kexec: Clean up and document register use in relocate_kernel_64.S
Add more comments explaining what each register contains, and save the preserve_context flag to a non-clobbered register sooner, to keep things simpler.
No change in behavior intended.
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Kai Huang <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
07fa619f |
| 05-Dec-2024 |
David Woodhouse <[email protected]> |
x86/kexec: Restore GDT on return from ::preserve_context kexec
The restore_processor_state() function explicitly states that "the asm code that gets us here will have restored a usable GDT". That wa
x86/kexec: Restore GDT on return from ::preserve_context kexec
The restore_processor_state() function explicitly states that "the asm code that gets us here will have restored a usable GDT". That wasn't true in the case of returning from a ::preserve_context kexec. Make it so.
Without this, the kernel was depending on the called function to reload a GDT which is appropriate for the kernel before returning.
Test program:
#include <unistd.h> #include <errno.h> #include <stdio.h> #include <stdlib.h> #include <linux/kexec.h> #include <linux/reboot.h> #include <sys/reboot.h> #include <sys/syscall.h>
int main (void) { struct kexec_segment segment = {}; unsigned char purgatory[] = { 0x66, 0xba, 0xf8, 0x03, // mov $0x3f8, %dx 0xb0, 0x42, // mov $0x42, %al 0xee, // outb %al, (%dx) 0xc3, // ret }; int ret;
segment.buf = &purgatory; segment.bufsz = sizeof(purgatory); segment.mem = (void *)0x400000; segment.memsz = 0x1000; ret = syscall(__NR_kexec_load, 0x400000, 1, &segment, KEXEC_PRESERVE_CONTEXT); if (ret) { perror("kexec_load"); exit(1); }
ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC); if (ret) { perror("kexec reboot"); exit(1); } printf("Success\n"); return 0; }
Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6 |
|
| #
ea49cdb2 |
| 25-Aug-2024 |
Kai Huang <[email protected]> |
x86/kexec: Add comments around swap_pages() assembly to improve readability
The current assembly around swap_pages() in the relocate_kernel() takes some time to follow because the use of registers c
x86/kexec: Add comments around swap_pages() assembly to improve readability
The current assembly around swap_pages() in the relocate_kernel() takes some time to follow because the use of registers can be easily lost when the line of assembly goes long. Add a couple of comments to clarify the code around swap_pages() to improve readability.
Signed-off-by: Kai Huang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Link: https://lore.kernel.org/all/8b52b0b8513a34b2a02fb4abb05c6700c2821475.1724573384.git.kai.huang@intel.com
show more ...
|
| #
3c41ad39 |
| 25-Aug-2024 |
Kai Huang <[email protected]> |
x86/kexec: Fix a comment of swap_pages() assembly
When relocate_kernel() gets called, %rdi holds 'indirection_page' and %rsi holds 'page_list'. And %rdi always holds 'indirection_page' when swap_pa
x86/kexec: Fix a comment of swap_pages() assembly
When relocate_kernel() gets called, %rdi holds 'indirection_page' and %rsi holds 'page_list'. And %rdi always holds 'indirection_page' when swap_pages() is called.
Therefore the comment of the first line code of swap_pages()
movq %rdi, %rcx /* Put the page_list in %rcx */
.. isn't correct because it actually moves the 'indirection_page' to the %rcx. Fix it.
Signed-off-by: Kai Huang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Link: https://lore.kernel.org/all/adafdfb1421c88efce04420fc9a996c0e2ca1b34.1724573384.git.kai.huang@intel.com
show more ...
|
|
Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4 |
|
| #
de606131 |
| 14-Jun-2024 |
Kirill A. Shutemov <[email protected]> |
x86/kexec: Keep CR4.MCE set during kexec for TDX guest
TDX guests run with MCA enabled (CR4.MCE=1b) from the very start. If that bit is cleared during CR4 register reprogramming during boot or kexec
x86/kexec: Keep CR4.MCE set during kexec for TDX guest
TDX guests run with MCA enabled (CR4.MCE=1b) from the very start. If that bit is cleared during CR4 register reprogramming during boot or kexec flows, a #VE exception will be raised which the guest kernel cannot handle.
Therefore, make sure the CR4.MCE setting is preserved over kexec too and avoid raising any #VEs.
Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
7b46a899 |
| 14-Jun-2024 |
Borislav Petkov <[email protected]> |
x86/relocate_kernel: Use named labels for less confusion
That identity_mapped() function was loving that "1" label to the point of completely confusing its readers.
Use named labels in each place f
x86/relocate_kernel: Use named labels for less confusion
That identity_mapped() function was loving that "1" label to the point of completely confusing its readers.
Use named labels in each place for clarity.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.10-rc3, v6.10-rc2 |
|
| #
54183d10 |
| 29-May-2024 |
Nikolay Borisov <[email protected]> |
x86/kexec: Remove spurious unconditional JMP from from identity_mapped()
This seemingly straightforward JMP was introduced in the initial version of the the 64bit kexec code without any explanation.
x86/kexec: Remove spurious unconditional JMP from from identity_mapped()
This seemingly straightforward JMP was introduced in the initial version of the the 64bit kexec code without any explanation.
It turns out (check accompanying Link) it's likely a copy/paste artefact from 32-bit code, where such a JMP could be used as a serializing instruction for the 486's prefetch queue. On x86_64 that's not needed because there's already a preceding write to cr4 which itself is a serializing operation.
[ bp: Typos. Let's try this and see what cries out. If it does, reverting it is trivial. ]
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/all/[email protected]/
show more ...
|
|
Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1 |
|
| #
fb799447 |
| 01-Mar-2023 |
Josh Poimboeuf <[email protected]> |
x86,objtool: Split UNWIND_HINT_EMPTY in two
Mark reported that the ORC unwinder incorrectly marks an unwind as reliable when the unwind terminates prematurely in the dark corners of return_to_handle
x86,objtool: Split UNWIND_HINT_EMPTY in two
Mark reported that the ORC unwinder incorrectly marks an unwind as reliable when the unwind terminates prematurely in the dark corners of return_to_handler() due to lack of information about the next frame.
The problem is UNWIND_HINT_EMPTY is used in two different situations:
1) The end of the kernel stack unwind before hitting user entry, boot code, or fork entry
2) A blind spot in ORC coverage where the unwinder has to bail due to lack of information about the next frame
The ORC unwinder has no way to tell the difference between the two. When it encounters an undefined stack state with 'end=1', it blindly marks the stack reliable, which can break the livepatch consistency model.
Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and UNWIND_HINT_END_OF_STACK.
Reported-by: Mark Rutland <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org
show more ...
|
|
Revision tags: v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6 |
|
| #
e81dc127 |
| 15-Sep-2022 |
Thomas Gleixner <[email protected]> |
x86/callthunks: Add call patching for call depth tracking
Mitigating the Intel SKL RSB underflow issue in software requires to track the call depth. That is every CALL and every RET need to be inter
x86/callthunks: Add call patching for call depth tracking
Mitigating the Intel SKL RSB underflow issue in software requires to track the call depth. That is every CALL and every RET need to be intercepted and additional code injected.
The existing retbleed mitigations already include means of redirecting RET to __x86_return_thunk; this can be re-purposed and RET can be redirected to another function doing RET accounting.
CALL accounting will use the function padding introduced in prior patches. For each CALL instruction, the destination symbol's padding is rewritten to do the accounting and the CALL instruction is adjusted to call into the padding.
This ensures only affected CPUs pay the overhead of this accounting. Unaffected CPUs will leave the padding unused and have their 'JMP __x86_return_thunk' replaced with an actual 'RET' instruction.
Objtool has been modified to supply a .call_sites section that lists all the 'CALL' instructions. Additionally the paravirt instruction sites are iterated since they will have been patched from an indirect call to direct calls (or direct instructions in which case it'll be ignored).
Module handling and the actual thunk code for SKL will be added in subsequent steps.
Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|