History log of /linux-6.15/arch/x86/kernel/relocate_kernel_64.S (Results 1 – 25 of 51)
Revision Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7
# b25eb5f5 12-Mar-2025 David Woodhouse <[email protected]>

x86/kexec: Add relocate_kernel() debugging support: Load a GDT

There are some failure modes which lead to triple-faults in the
relocate_kernel() function, which is fairly much undebuggable
for normal mortals.

Adding a GDT in the relocate_kernel() environment is step 1 towards
being able to catch faults and do something more useful.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
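
For readers unfamiliar with the mechanics, a minimal sketch of what loading
a bare GDT involves, assuming ring-0 execution. This is illustrative C, not
the kernel's actual relocate_kernel assembly; the names (gdt_ptr, tiny_gdt,
load_tiny_gdt) are hypothetical:

  #include <stdint.h>

  struct gdt_ptr {
          uint16_t limit;
          uint64_t base;
  } __attribute__((packed));

  /* Three classic descriptors: null, 64-bit code (L=1), data. */
  static uint64_t tiny_gdt[] = {
          0x0000000000000000ULL,        /* null descriptor */
          0x00af9a000000ffffULL,        /* ring-0 64-bit code segment */
          0x00cf92000000ffffULL,        /* ring-0 writable data segment */
  };

  static inline void load_tiny_gdt(void)
  {
          struct gdt_ptr ptr = {
                  .limit = sizeof(tiny_gdt) - 1,
                  .base  = (uint64_t)tiny_gdt,
          };

          asm volatile("lgdt %0" : : "m"(ptr));
  }

With a valid GDT in place, later steps can install an IDT whose handlers
actually run, instead of triple-faulting on the first exception.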



Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7
# e5360575 09-Jan-2025 David Woodhouse <[email protected]>

x86/kexec: Cope with relocate_kernel() not being at the start of the page

A few places in the kexec control code page make the assumption that the first
instruction of relocate_kernel is at the very start of the page.

To allow for Clang CFI information to be added to relocate_kernel(), as well
as the general principle of removing unwarranted assumptions, fix them to use
the external __relocate_kernel_start symbol that the linker adds. This means
using a separate addq and subq for calculating offsets, as the assembler can
no longer calculate the delta directly for itself and relocations aren't that
versatile. But those values can at least be used relative to a local label to
avoid absolute relocations.

Turn the jump from relocate_kernel() to identity_mapped() into a real indirect
'jmp *%rsi' too, while touching it. There was no real reason for it to be
a push+ret in the first place, and adding Clang CFI info will also give
objtool enough visibility to start complaining 'return with modified stack
frame' about it.

[ bp: Massage commit message. ]

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
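
A hedged sketch of the addressing change being described, written in C for
clarity. Rather than assuming relocate_kernel() sits at offset zero of the
control page, compute its offset from the linker-provided
__relocate_kernel_start and apply that to the control page address. The
helper name control_page_entry() is hypothetical; this mirrors the idea,
not the actual machine_kexec code:

  extern const char __relocate_kernel_start[];
  void relocate_kernel(void);

  /* Entry point inside the copied control page; no longer assumes the
   * function sits at the very start of the page. */
  static void *control_page_entry(void *control_page)
  {
          unsigned long off = (unsigned long)relocate_kernel -
                              (unsigned long)__relocate_kernel_start;

          return (char *)control_page + off;
  }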



# eeed9150 09-Jan-2025 Nathan Chancellor <[email protected]>

x86/kexec: Fix location of relocate_kernel with -ffunction-sections

After commit

cb33ff9e063c ("x86/kexec: Move relocate_kernel to kernel .data section"),

kernels configured with an option that uses -ffunction-sections, such as
CONFIG_LTO_CLANG, crash when kexecing because the value of relocate_kernel
does not match the value of __relocate_kernel_start so incorrect code gets
copied via machine_kexec_prepare().

$ llvm-nm good-vmlinux &| rg relocate_kernel
ffffffff83280d41 T __relocate_kernel_end
ffffffff83280b00 T __relocate_kernel_start
ffffffff83280b00 T relocate_kernel

$ llvm-nm bad-vmlinux &| rg relocate_kernel
ffffffff83266100 D __relocate_kernel_end
ffffffff83266100 D __relocate_kernel_start
ffffffff8120b0d8 T relocate_kernel

When -ffunction-sections is enabled, TEXT_MAIN matches on
'.text.[0-9a-zA-Z_]*' to coalesce the function-specific sections back
into .text during link time after they have been optimized. Due to the
placement of TEXT_TEXT before KEXEC_RELOCATE_KERNEL in the x86 linker
script, the .text.relocate_kernel section ends up in .text instead of
.data.

Use a second dot in the relocate_kernel section name to avoid matching
on TEXT_MAIN, mirroring a similar situation that happened in
commit

79cd2a11224e ("x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG"),

which allows kexec to function properly.

While .data.relocate_kernel still ends up in the .data section via
DATA_MAIN -> DATA_DATA, ensure it is located with the
.text.relocate_kernel section as intended by performing the same
transformation.

Fixes: cb33ff9e063c ("x86/kexec: Move relocate_kernel to kernel .data section")
Fixes: 8dbec5c77bc3 ("x86/kexec: Add data section to relocate_kernel")
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
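
To make the wildcard behaviour concrete, here is an illustrative C
fragment (both function names are hypothetical): with -ffunction-sections
each function gets its own '.text.<name>' section, which TEXT_MAIN's
'.text.[0-9a-zA-Z_]*' pattern folds back into .text; a second dot in the
name falls outside that character class, so the linker script keeps
control of its placement.

  /* Matches '.text.[0-9a-zA-Z_]*' -> coalesced into .text by TEXT_MAIN: */
  __attribute__((section(".text.my_helper")))
  void my_helper(void) { }

  /* The extra dot does not match the wildcard, so the linker script can
   * place this section explicitly (as KEXEC_RELOCATE_KERNEL now does): */
  __attribute__((section(".text..relocate_kernel_demo")))
  void relocate_kernel_demo(void) { }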



# 2cacf7f2 09-Jan-2025 David Woodhouse <[email protected]>

x86/kexec: Fix stack and handling of re-entry point for ::preserve_context

A ::preserve_context kimage can be invoked more than once, and the entry point
can be different every time. When the callee returns to the kernel, it leaves
the address of its entry point for next time on the stack.

That being the case, one might reasonably assume that the caller would
allocate space for it on the stack frame before actually performing the 'call'
into the callee.

Apparently not, though. Ever since the kjump code was first added in 2009, it
has set up a *new* stack at the top of the swap_page scratch page, then just
performed the 'call' without allocating any space for the re-entry address to
be returned. It then reads the re-entry point for next time from 0(%rsp) which
is actually the first qword of the page *after* the swap page, which might not
exist at all! And if the callee has written to that, then it will have
corrupted memory it doesn't own.

Correct this by pushing the entry point of the callee onto the stack before
calling it. The callee may then adjust it, or not, as it sees fit, and
subsequent invocations should work correctly either way.

Remove a stray push of zero to the *relocate_kernel* stack, which may have
been intended for this purpose, but which was actually just noise.

Also, loading the stack for the callee relied on the address of the swap page
being in %r10 without ever documenting that fact. Recent code changes made
that no longer true, so load it directly from the local kexec_pa_swap_page
variable instead.

Fixes: b3adabae8a96 ("x86/kexec: Drop page_list argument from relocate_kernel()")
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# 85d724df 09-Jan-2025 David Woodhouse <[email protected]>

x86/kexec: Use correct swap page in swap_pages function

The swap_pages function expects the swap page to be in %r10, but there
was no documentation to that effect. Once upon a time the setup code
used to load its value from a kernel virtual address and save it to an
address which is accessible in the identity-mapped page tables, and
*happened* to use %r10 to do so, with no comment that it was left there
on *purpose* instead of just being a scratch register. Once that was no
longer necessary, %r10 just holds whatever the kernel happened to leave
in it.

Now that the original value passed by the kernel is accessible via
%rip-relative addressing, load directly from there instead of using %r10
for it. But document the other parameters that the swap_pages function
*does* expect in registers.

Fixes: b3adabae8a96 ("x86/kexec: Drop page_list argument from relocate_kernel()")
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# 4d5f1da9 09-Jan-2025 David Woodhouse <[email protected]>

x86/kexec: Ensure preserve_context flag is set on return to kernel

The swap_pages() function will only actually *swap*, as its name implies, if
the preserve_context flag in the %r11 register is non-zero. On the way back
from a ::preserve_context kexec, ensure that the %r11 register is non-zero so
that the pages get swapped back.

Fixes: 9e5683e2d0b5 ("x86/kexec: Only swap pages for ::preserve_context mode")
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# d144d8a6 09-Jan-2025 David Woodhouse <[email protected]>

x86/kexec: Disable global pages before writing to control page

The kernel switches to a new set of page tables during kexec. The global
mappings (_PAGE_GLOBAL==1) can remain in the TLB after this switch. This
is generally not a problem because the new page tables use a different
portion of the virtual address space than the normal kernel mappings.

The critical exception to that generalisation (and the only mapping
which isn't an identity mapping) is the kexec control page itself —
which was ROX in the original kernel mapping, but should be RWX in the
new page tables. If there is a global TLB entry for that in its prior
read-only state, it definitely needs to be flushed before attempting to
write through that virtual mapping.

It would be possible to just avoid writing to the virtual address of the
page and defer all writes until they can be done through the identity
mapping. But there's no good reason to keep the old TLB entries around,
as they can cause nothing but trouble.

Clear the PGE bit in %cr4 early, before storing data in the control page.

Fixes: 5a82223e0743 ("x86/kexec: Mark relocate_kernel page as ROX instead of RWX")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219592
Reported-by: Nathan Chancellor <[email protected]>
Reported-by: "Ning, Hongyu" <[email protected]>
Co-developed-by: Dave Hansen <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Tested-by: "Ning, Hongyu" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
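
For illustration, a minimal ring-0 sketch of the mechanism. The kernel has
its own CR4 accessors; raw inline asm is used here only to make the TLB
side effect visible, and the function name is hypothetical. Clearing
CR4.PGE both disables global pages and flushes the global TLB entries:

  #define X86_CR4_PGE (1UL << 7)        /* architectural bit position */

  static inline void disable_global_pages(void)
  {
          unsigned long cr4;

          asm volatile("mov %%cr4, %0" : "=r"(cr4));
          cr4 &= ~X86_CR4_PGE;
          /* Writing CR4 with PGE toggled flushes all TLB entries,
           * including global ones. */
          asm volatile("mov %0, %%cr4" : : "r"(cr4) : "memory");
  }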



Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3
# aeb68937 08-Dec-2024 Damien Le Moal <[email protected]>

x86: Fix build regression with CONFIG_KEXEC_JUMP enabled

Build 6.13-rc1 for x86_64 with gcc 14.2.1 fails with the error:

ld: vmlinux.o: in function `virtual_mapped':
linux/arch/x86/kernel/relocate_kernel_64.S:249:(.text+0x5915b): undefined reference to `saved_context_gdt_desc'

when CONFIG_KEXEC_JUMP is enabled.

This was introduced by commit 07fa619f2a40 ("x86/kexec: Restore GDT on
return from ::preserve_context kexec") which introduced a use of
saved_context_gdt_desc without a declaration for it.

Fix that by including asm/asm-offsets.h where saved_context_gdt_desc
is defined (indirectly in include/generated/asm-offsets.h which
asm/asm-offsets.h includes).

Fixes: 07fa619f2a40 ("x86/kexec: Restore GDT on return from ::preserve_context kexec")
Signed-off-by: Damien Le Moal <[email protected]>
Acked-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: David Woodhouse <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Linus Torvalds <[email protected]>



Revision tags: v6.13-rc2
# 93e489ad 05-Dec-2024 David Woodhouse <[email protected]>

x86/kexec: Clean up register usage in relocate_kernel()

The memory encryption flag is passed in %r8 because that's where the
calling convention puts it. Instead of moving it to %r12 and then using
%r8 for other things, just leave it in %r8 and use other registers
instead.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# b7155dfd 05-Dec-2024 David Woodhouse <[email protected]>

x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page

All writes to the relocate_kernel control page are now done *after* the
%cr3 switch via simple %rip-relative addressing, which means the DATA()
macro with its pointer arithmetic can also now be removed.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# b3adabae 05-Dec-2024 David Woodhouse <[email protected]>

x86/kexec: Drop page_list argument from relocate_kernel()

The kernel's virtual mapping of the relocate_kernel page currently needs
to be RWX because it is written to before the %cr3 switch.

Now that the relocate_kernel page has its own .data section and local
variables, it can also have *global* variables. So eliminate the separate
page_list argument, and write the same information directly to variables
in the relocate_kernel page instead. This way, the relocate_kernel code
itself doesn't need to copy it.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# 8dbec5c7 05-Dec-2024 David Woodhouse <[email protected]>

x86/kexec: Add data section to relocate_kernel

Now that the relocate_kernel page is handled sanely by a linker script
we can have actual data, and just use %rip-relative addressing to access
it.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# cb33ff9e 05-Dec-2024 David Woodhouse <[email protected]>

x86/kexec: Move relocate_kernel to kernel .data section

Now that the copy is executed instead of the original, the relocate_kernel
page can live in the kernel's .data section. This will allow subsequent
commits to actually add real data to it and clean up the code somewhat as
well as making the control page ROX.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# eeebbde5 05-Dec-2024 David Woodhouse <[email protected]>

x86/kexec: Invoke copy of relocate_kernel() instead of the original

This currently calls set_memory_x() from machine_kexec_prepare() just
like the 32-bit version does. That's actually a bit earlier than I'd
like, as it leaves the page RWX all the time the image is even *loaded*.

Subsequent commits will eliminate all the writes to the page between the
point it's marked executable in machine_kexec_prepare() and the time that
relocate_kernel() is running and has switched to the identmap %cr3, so
that it can be ROX. But that can't happen until it's moved to the .data
section of the kernel, and *that* can't happen until we start executing
the copy instead of executing it in place in the kernel .text. So break
the circular dependency in those commits by letting it be RWX for now.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# 9e5683e2 05-Dec-2024 David Woodhouse <[email protected]>

x86/kexec: Only swap pages for ::preserve_context mode

There's no need to swap pages (which involves three memcopies for each
page) in the plain kexec case. Just do a single copy from source to
destination page.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
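
A hedged C sketch of the two paths being distinguished; the function name
and PAGE_SIZE constant are illustrative, not the assembly's actual
interface. ::preserve_context needs a three-way swap through a scratch
page, while plain kexec only needs the single copy:

  #include <string.h>

  #define PAGE_SIZE 4096

  static void copy_or_swap_page(void *dst, void *src, void *scratch,
                                int preserve_context)
  {
          if (preserve_context) {
                  memcpy(scratch, dst, PAGE_SIZE); /* save old destination */
                  memcpy(dst, src, PAGE_SIZE);     /* install new contents */
                  memcpy(src, scratch, PAGE_SIZE); /* old contents back */
          } else {
                  memcpy(dst, src, PAGE_SIZE);     /* plain kexec: one copy */
          }
  }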



# 46d4e205 05-Dec-2024 David Woodhouse <[email protected]>

x86/kexec: Use named labels in swap_pages in relocate_kernel_64.S

Make the code a little more readable.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Kai Huang <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# 207bdf7f 05-Dec-2024 David Woodhouse <[email protected]>

x86/kexec: Clean up and document register use in relocate_kernel_64.S

Add more comments explaining what each register contains, and save the
preserve_context flag to a non-clobbered register sooner, to keep things
simpler.

No change in behavior intended.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Kai Huang <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



# 07fa619f 05-Dec-2024 David Woodhouse <[email protected]>

x86/kexec: Restore GDT on return from ::preserve_context kexec

The restore_processor_state() function explicitly states that "the asm code
that gets us here will have restored a usable GDT". That wasn't true in the
case of returning from a ::preserve_context kexec. Make it so.

Without this, the kernel was depending on the called function to reload a
GDT which is appropriate for the kernel before returning.

Test program:

#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <linux/kexec.h>
#include <linux/reboot.h>
#include <sys/reboot.h>
#include <sys/syscall.h>

int main (void)
{
        struct kexec_segment segment = {};
        unsigned char purgatory[] = {
                0x66, 0xba, 0xf8, 0x03,  // mov $0x3f8, %dx
                0xb0, 0x42,              // mov $0x42, %al
                0xee,                    // outb %al, (%dx)
                0xc3,                    // ret
        };
        int ret;

        segment.buf = &purgatory;
        segment.bufsz = sizeof(purgatory);
        segment.mem = (void *)0x400000;
        segment.memsz = 0x1000;
        ret = syscall(__NR_kexec_load, 0x400000, 1, &segment,
                      KEXEC_PRESERVE_CONTEXT);
        if (ret) {
                perror("kexec_load");
                exit(1);
        }

        ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2,
                      LINUX_REBOOT_CMD_KEXEC);
        if (ret) {
                perror("kexec reboot");
                exit(1);
        }
        printf("Success\n");
        return 0;
}
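
(For reference: the purgatory blob writes 0x42, ASCII 'B', to the COM1
UART at I/O port 0x3f8 and returns, so a successful ::preserve_context
round trip emits a 'B' on the serial console before the program prints
"Success". Running it requires root and a kernel built with
CONFIG_KEXEC_JUMP.)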

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]



Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6
# ea49cdb2 25-Aug-2024 Kai Huang <[email protected]>

x86/kexec: Add comments around swap_pages() assembly to improve readability

The current assembly around swap_pages() in relocate_kernel() takes some
time to follow, because it is easy to lose track of which register holds
what when the assembly lines grow long. Add a couple of comments to clarify
the code around swap_pages() and improve readability.

Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/all/8b52b0b8513a34b2a02fb4abb05c6700c2821475.1724573384.git.kai.huang@intel.com



# 3c41ad39 25-Aug-2024 Kai Huang <[email protected]>

x86/kexec: Fix a comment of swap_pages() assembly

When relocate_kernel() gets called, %rdi holds 'indirection_page' and
%rsi holds 'page_list'. And %rdi always holds 'indirection_page' when
swap_pages() is called.

Therefore the comment on the first line of code in swap_pages()

movq %rdi, %rcx /* Put the page_list in %rcx */

... isn't correct, because it actually moves the 'indirection_page' into
%rcx. Fix it.

Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/all/adafdfb1421c88efce04420fc9a996c0e2ca1b34.1724573384.git.kai.huang@intel.com



Revision tags: v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4
# de606131 14-Jun-2024 Kirill A. Shutemov <[email protected]>

x86/kexec: Keep CR4.MCE set during kexec for TDX guest

TDX guests run with MCA enabled (CR4.MCE=1b) from the very start. If
that bit is cleared during CR4 register reprogramming during boot or kexec
flows, a #VE exception will be raised which the guest kernel cannot handle.

Therefore, make sure the CR4.MCE setting is preserved over kexec too and avoid
raising any #VEs.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
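
As a hedged illustration of the constraint (the function name and
structure are hypothetical, not the kernel's actual code): any minimal CR4
value programmed during the transition must inherit the MCE bit from the
old value, so a TDX guest never observes CR4.MCE=0.

  #define X86_CR4_PAE (1UL << 5)
  #define X86_CR4_MCE (1UL << 6)

  static inline unsigned long kexec_transition_cr4(unsigned long old_cr4)
  {
          unsigned long cr4 = X86_CR4_PAE;   /* long mode requires PAE */

          cr4 |= old_cr4 & X86_CR4_MCE;      /* keep MCE set for TDX guests */
          return cr4;
  }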



# 7b46a899 14-Jun-2024 Borislav Petkov <[email protected]>

x86/relocate_kernel: Use named labels for less confusion

That identity_mapped() function was loving that "1" label to the point of
completely confusing its readers.

Use named labels in each place for clarity.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]



Revision tags: v6.10-rc3, v6.10-rc2
# 54183d10 29-May-2024 Nikolay Borisov <[email protected]>

x86/kexec: Remove spurious unconditional JMP from identity_mapped()

This seemingly straightforward JMP was introduced in the initial version
of the 64-bit kexec code without any explanation.

It turns out (check accompanying Link) it's likely a copy/paste artefact
from 32-bit code, where such a JMP could be used as a serializing
instruction for the 486's prefetch queue. On x86_64 that's not needed
because there's already a preceding write to cr4 which itself is
a serializing operation.

[ bp: Typos. Let's try this and see what cries out. If it does,
reverting it is trivial. ]

Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/



Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1
# fb799447 01-Mar-2023 Josh Poimboeuf <[email protected]>

x86,objtool: Split UNWIND_HINT_EMPTY in two

Mark reported that the ORC unwinder incorrectly marks an unwind as
reliable when the unwind terminates prematurely in the dark corners of
return_to_handler() due to lack of information about the next frame.

The problem is UNWIND_HINT_EMPTY is used in two different situations:

1) The end of the kernel stack unwind before hitting user entry, boot
code, or fork entry

2) A blind spot in ORC coverage where the unwinder has to bail due to
lack of information about the next frame

The ORC unwinder has no way to tell the difference between the two.
When it encounters an undefined stack state with 'end=1', it blindly
marks the stack reliable, which can break the livepatch consistency
model.

Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and
UNWIND_HINT_END_OF_STACK.

Reported-by: Mark Rutland <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org



Revision tags: v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6
# e81dc127 15-Sep-2022 Thomas Gleixner <[email protected]>

x86/callthunks: Add call patching for call depth tracking

Mitigating the Intel SKL RSB underflow issue in software requires tracking
the call depth. That is, every CALL and every RET need to be intercepted
and additional code injected.

The existing retbleed mitigations already include means of redirecting
RET to __x86_return_thunk; this can be re-purposed and RET can be
redirected to another function doing RET accounting.

CALL accounting will use the function padding introduced in prior
patches. For each CALL instruction, the destination symbol's padding
is rewritten to do the accounting and the CALL instruction is adjusted
to call into the padding.

This ensures only affected CPUs pay the overhead of this accounting.
Unaffected CPUs will leave the padding unused and have their 'JMP
__x86_return_thunk' replaced with an actual 'RET' instruction.

Objtool has been modified to supply a .call_sites section that lists
all the 'CALL' instructions. Additionally the paravirt instruction
sites are iterated since they will have been patched from an indirect
call to direct calls (or direct instructions in which case it'll be
ignored).

Module handling and the actual thunk code for SKL will be added in
subsequent steps.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


