|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
a1e4cc01 |
| 03-Mar-2025 |
Brian Gerst <[email protected]> |
x86/percpu: Move current_task to percpu hot section
No functional change.
Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Uros Bizjak <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
385f72c8 |
| 03-Mar-2025 |
Brian Gerst <[email protected]> |
x86/percpu: Move top_of_stack to percpu hot section
No functional change.
Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Uros Bizjak <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
c6a09180 |
| 03-Mar-2025 |
Brian Gerst <[email protected]> |
x86/irq: Move irq stacks to percpu hot section
No functional change.
Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Uros Bizjak <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
399fd7a2 |
| 03-Mar-2025 |
Brian Gerst <[email protected]> |
x86/asm: Merge KSTK_ESP() implementations
Commit:
263042e4630a ("Save user RSP in pt_regs->sp on SYSCALL64 fastpath")
simplified the 64-bit implementation of KSTK_ESP() which is now identical to 32-bit. Merge them into a common definition.
No functional change.
Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
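For reference, a minimal sketch of what the merged definition ends up looking like (illustrative, not a verbatim copy of the header):

	/* Both 32-bit and 64-bit read the saved user stack pointer from pt_regs */
	#define KSTK_ESP(task)	(task_pt_regs(task)->sp)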
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
684b1291 |
| 02-Feb-2025 |
Brian Gerst <[email protected]> |
x86/arch_prctl/64: Clean up ARCH_MAP_VDSO_32
process_64.c is not built on native 32-bit, so CONFIG_X86_32 will never be set.
No change in functionality intended.
Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
2df1ad0d |
| 02-Feb-2025 |
Brian Gerst <[email protected]> |
x86/arch_prctl: Simplify sys_arch_prctl()
Use in_ia32_syscall() instead of a compat syscall entry.
No change in functionality intended.
Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
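A rough sketch of the shape of this change, assuming the existing do_arch_prctl_64()/do_arch_prctl_common() helpers; the exact dispatch in the kernel may differ, this only illustrates dropping the separate compat entry in favour of an in_ia32_syscall() check:

	SYSCALL_DEFINE2(arch_prctl, int, option, unsigned long, arg2)
	{
		long ret = -EINVAL;

		/* 64-bit-only options are skipped for 32-bit (compat) callers */
		if (!in_ia32_syscall())
			ret = do_arch_prctl_64(current, option, arg2);
		if (ret == -EINVAL)
			ret = do_arch_prctl_common(option, arg2);

		return ret;
	}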
|
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7 |
|
| #
b7c35279 |
| 02-Jul-2024 |
Yosry Ahmed <[email protected]> |
x86/mm: Cleanup prctl_enable_tagged_addr() nr_bits error checking
There are two separate checks in prctl_enable_tagged_addr() that nr_bits is in the correct range. The checks are arranged such the correct case is sandwiched between both error cases, which do exactly the same thing.
Simplify the if condition and pull the correct case outside with the rest of the success code path.
Signed-off-by: Yosry Ahmed <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Link: https://lore.kernel.org/all/20240702132139.3332013-4-yosryahmed%40google.com
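A minimal before/after sketch of the refactor described above (variable and limit names are assumptions for illustration):

	/* Before: the valid case is sandwiched between two identical error paths */
	if (!nr_bits) {
		return -EINVAL;
	} else if (nr_bits <= LAM_U57_BITS) {
		/* set up the LAM masks */
	} else {
		return -EINVAL;
	}

	/* After: one combined range check, success path pulled outside */
	if (!nr_bits || nr_bits > LAM_U57_BITS)
		return -EINVAL;

	/* set up the LAM masks with the rest of the success code path */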
|
| #
ec225f8c |
| 02-Jul-2024 |
Yosry Ahmed <[email protected]> |
x86/mm: Fix LAM inconsistency during context switch
LAM can only be enabled when a process is single-threaded. But _kernel_ threads can temporarily use a single-threaded process's mm. That means that a context-switching kernel thread can race and observe the mm's LAM metadata (mm->context.lam_cr3_mask) change.
The context switch code does two logical things with that metadata: populate CR3 and populate 'cpu_tlbstate.lam'. If it hits this race, 'cpu_tlbstate.lam' and CR3 can end up out of sync.
This de-synchronization is currently harmless. But it is confusing and might lead to warnings or real bugs.
Update set_tlbstate_lam_mode() to take in the LAM mask and untag mask instead of an mm_struct pointer, and while we are at it, rename it to cpu_tlbstate_update_lam(). This should also make it clearer that we are updating cpu_tlbstate. In switch_mm_irqs_off(), read the LAM mask once and use it for both the cpu_tlbstate update and the CR3 update.
Signed-off-by: Yosry Ahmed <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Link: https://lore.kernel.org/all/20240702132139.3332013-3-yosryahmed%40google.com
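A sketch of the reworked helper as described above; the per-CPU variable and field names are assumptions, the point being that it takes the masks directly instead of an mm_struct:

	static inline void cpu_tlbstate_update_lam(unsigned long lam, u64 untag_mask)
	{
		this_cpu_write(cpu_tlbstate.lam, lam >> X86_CR3_LAM_U57_BIT);
		this_cpu_write(tlbstate_untag_mask, untag_mask);
	}

In switch_mm_irqs_off() the LAM mask is then read once and used both for this call and for the CR3 value being built.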
|
| #
3b299b99 |
| 02-Jul-2024 |
Yosry Ahmed <[email protected]> |
x86/mm: Use IPIs to synchronize LAM enablement
LAM can only be enabled when a process is single-threaded. But _kernel_ threads can temporarily use a single-threaded process's mm.
If LAM is enabled by a userspace process while a kthread is using its mm, the kthread will not observe LAM enablement (i.e. LAM will be disabled in CR3). This could be fine for the kthread itself, as LAM only affects userspace addresses. However, if the kthread context switches to a thread in the same userspace process, CR3 may or may not be updated because the mm_struct doesn't change (based on pending TLB flushes). If CR3 is not updated, the userspace thread will run incorrectly with LAM disabled, which may cause page faults when using tagged addresses. Example scenario:
CPU 1                                    CPU 2
/* kthread */
kthread_use_mm()
                                         /* user thread */
                                         prctl_enable_tagged_addr()
                                         /* LAM enabled on CPU 2 */
/* LAM disabled on CPU 1 */
context_switch() /* to CPU 1 */
/* Switching to user thread */
switch_mm_irqs_off()
/* CR3 not updated */
/* LAM is still disabled on CPU 1 */
Synchronize LAM enablement by sending an IPI to all CPUs running with the mm_struct to enable LAM. This makes sure LAM is enabled on CPU 1 in the above scenario before prctl_enable_tagged_addr() returns and userspace starts using tagged addresses, and before it's possible to run the userspace process on CPU 1.
In switch_mm_irqs_off(), move reading the LAM mask until after mm_cpumask() is updated. This ensures that if an outdated LAM mask is written to CR3, an IPI is received to update it right after IRQs are re-enabled.
[ dhansen: Add a LAM enabling helper and comment it ]
Fixes: 82721d8b25d7 ("x86/mm: Handle LAM on context switch") Suggested-by: Andy Lutomirski <[email protected]> Signed-off-by: Yosry Ahmed <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Link: https://lore.kernel.org/all/20240702132139.3332013-2-yosryahmed%40google.com
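A sketch of the synchronization described above, using helper names assumed from the surrounding LAM code (illustrative, not the literal patch):

	/* Runs on every CPU that currently has the mm loaded: update CR3 and
	 * the per-CPU LAM state so the new mask takes effect immediately. */
	static void enable_lam_func(void *__mm)
	{
		struct mm_struct *mm = __mm;
		unsigned long lam;

		if (this_cpu_read(cpu_tlbstate.loaded_mm) == mm) {
			lam = mm_lam_cr3_mask(mm);
			write_cr3(__read_cr3() | lam);
			cpu_tlbstate_update_lam(lam, mm_untag_mask(mm));
		}
	}

	/* In prctl_enable_tagged_addr(), after publishing mm->context.lam_cr3_mask: */
	on_each_cpu_mask(mm_cpumask(mm), enable_lam_func, mm, true);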
|
|
Revision tags: v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5 |
|
| #
b53c6bd5 |
| 21-Apr-2024 |
David Kaplan <[email protected]> |
x86/cpu: Fix check for RDPKRU in __show_regs()
cpu_feature_enabled(X86_FEATURE_OSPKE) does not necessarily reflect whether CR4.PKE is set on the CPU. In particular, they may differ on non-BSP CPUs before setup_pku() is executed. In this scenario, RDPKRU will #UD causing the system to hang.
Fix by checking CR4 for PKE enablement which is always correct for the current CPU.
The scenario can be triggered by inserting a WARN* before setup_pku() in identify_cpu(), or by some other diagnostic which would lead to calling __show_regs().
[ bp: Massage commit message. ]
Signed-off-by: David Kaplan <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
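A sketch of the fix as described - gate RDPKRU on the current CPU's CR4.PKE bit instead of only the global feature flag (illustrative; __show_regs() already reads CR4 into a local):

	/* Before: may #UD on an AP that has not run setup_pku() yet */
	if (cpu_feature_enabled(X86_FEATURE_OSPKE))
		printk("%sPKRU: %08x\n", log_lvl, read_pkru());

	/* After: CR4.PKE reflects the state of the CPU we are running on */
	if (cr4 & X86_CR4_PKE)
		printk("%sPKRU: %08x\n", log_lvl, read_pkru());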
|
|
Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5 |
|
| #
ad41a14c |
| 05-Dec-2023 |
H. Peter Anvin (Intel) <[email protected]> |
x86/fred: Allow single-step trap and NMI when starting a new task
Entering a new task is logically speaking a return from a system call (exec, fork, clone, etc.). As such, if ptrace enables single stepping, a single-step exception should be allowed to trigger immediately upon entering user space. This is not optional.
NMI should *never* be disabled in user space. As such, this is an optional, opportunistic way to catch errors.
Allow single-step trap and NMI when starting a new task, thus once the new task enters user space, single-step trap and NMI are both enabled immediately.
Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
09794f68 |
| 05-Dec-2023 |
H. Peter Anvin (Intel) <[email protected]> |
x86/fred: Disallow the swapgs instruction when FRED is enabled
SWAPGS is no longer needed thus NOT allowed with FRED because FRED transitions ensure that an operating system can _always_ operate with its own GS base address:
- For events that occur in ring 3, FRED event delivery swaps the GS base address with the IA32_KERNEL_GS_BASE MSR.
- ERETU (the FRED transition that returns to ring 3) also swaps the GS base address with the IA32_KERNEL_GS_BASE MSR.
And the operating system can still set up the GS segment for a user thread without needing to load a user thread GS, by:
- Using LKGS, available with FRED, to modify other attributes of the GS segment without compromising its ability always to operate with its own GS base address.
- Accessing the GS segment base address for a user thread as before using RDMSR or WRMSR on the IA32_KERNEL_GS_BASE MSR.
Note, LKGS loads the GS base address into the IA32_KERNEL_GS_BASE MSR instead of the GS segment's descriptor cache. As such, the operating system never changes its runtime GS base address.
Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
ee63291a |
| 05-Dec-2023 |
Xin Li <[email protected]> |
x86/ptrace: Cleanup the definition of the pt_regs structure
struct pt_regs is hard to read because the member or section related comments are not aligned with the members.
The 'cs' and 'ss' members of pt_regs are of type 'unsigned long' while in reality they are only 16 bits wide. This works so far as the remaining space is unused, but FRED will use the remaining bits for other purposes.
To prepare for FRED:
- Cleanup the formatting - Convert 'cs' and 'ss' to u16 and embed them into an union with a u64 - Fixup the related printk() format strings
Suggested-by: Thomas Gleixner <[email protected]> Originally-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
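A sketch of the resulting selector layout in struct pt_regs (abbreviated and illustrative; the intervening members are unchanged):

	union {
		u64	csx;	/* cs plus the bits FRED will use */
		u16	cs;
	};
	u64	flags;
	u64	sp;
	union {
		u64	ssx;	/* ss plus the bits FRED will use */
		u16	ss;
	};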
|
|
Revision tags: v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7 |
|
| #
24b8a236 |
| 18-Oct-2023 |
Linus Torvalds <[email protected]> |
x86/fpu: Clean up FPU switching in the middle of task switching
It happens to work, but it's very very wrong, because our 'current' macro is magic that is supposedly loading a stable value.
It just happens to be not quite stable enough and the compilers re-load the value enough for this code to work. But it's wrong.
The whole
struct fpu *prev_fpu = &prev->fpu;
thing in __switch_to() is pretty ugly. There's no reason why we should look at that 'prev_fpu' pointer there, or pass it down.
And it only generates worse code, in how it loads 'current' when __switch_to() has the right task pointers.
The attached patch not only cleans this up, it actually generates better code too:
(a) it removes one push/pop pair at entry/exit because there's one less register used (no 'current')
(b) it removes that pointless load of 'current' because it just uses the right argument:
- movq %gs:pcpu_hot(%rip), %r12
- testq $16384, (%r12)
+ testq $16384, (%rdi)
Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7 |
|
| #
67840ad0 |
| 13-Jun-2023 |
Rick Edgecombe <[email protected]> |
x86/shstk: Add ARCH_SHSTK_STATUS
CRIU and GDB need to get the current shadow stack and WRSS enablement status. This information is already available via /proc/pid/status, but this is inconvenient for CRIU because it involves parsing the text output in an area of the code where this is difficult. Provide a status arch_prctl(), ARCH_SHSTK_STATUS for retrieving the status. Have arg2 be a userspace address, and make the new arch_prctl simply copy the features out to userspace.
Suggested-by: Mike Rapoport <[email protected]> Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Tested-by: Pengfei Xu <[email protected]> Tested-by: John Allen <[email protected]> Tested-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/all/20230613001108.3040476-43-rick.p.edgecombe%40intel.com
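A hedged userspace sketch of querying the status as described (the prctl constant is an assumed value from the uapi header):

	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef ARCH_SHSTK_STATUS
	#define ARCH_SHSTK_STATUS	0x5005	/* assumed value, see asm/prctl.h */
	#endif

	/* Returns the shadow stack / WRSS feature bits, or 0 on error. */
	static unsigned long shstk_status(void)
	{
		unsigned long features = 0;

		/* arg2 is a user pointer; the kernel copies the feature bits out */
		if (syscall(SYS_arch_prctl, ARCH_SHSTK_STATUS, &features))
			return 0;
		return features;
	}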
|
| #
680ed2f1 |
| 13-Jun-2023 |
Mike Rapoport <[email protected]> |
x86/shstk: Add ARCH_SHSTK_UNLOCK
Userspace loaders may lock features before a CRIU restore operation has the chance to set them to whatever state is required by the process being restored. Allow a way for CRIU to unlock features. Add it as an arch_prctl() like the other shadow stack operations, but restrict it to being called via the ptrace arch_prctl() interface.
[Merged into recent API changes, added commit log and docs]
Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Tested-by: Pengfei Xu <[email protected]> Tested-by: John Allen <[email protected]> Tested-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/all/20230613001108.3040476-42-rick.p.edgecombe%40intel.com
|
| #
98cfa463 |
| 13-Jun-2023 |
Rick Edgecombe <[email protected]> |
x86: Introduce userspace API for shadow stack
Add three new arch_prctl() handles:
- ARCH_SHSTK_ENABLE/DISABLE enables or disables the specified feature. Returns 0 on success or a negative value on error.
- ARCH_SHSTK_LOCK prevents future disabling or enabling of the specified feature. Returns 0 on success or a negative value on error.
The features are handled per-thread and inherited over fork(2)/clone(2), but reset on exec().
Co-developed-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Tested-by: Pengfei Xu <[email protected]> Tested-by: John Allen <[email protected]> Tested-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/all/20230613001108.3040476-27-rick.p.edgecombe%40intel.com
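A hedged userspace sketch of the enable-then-lock flow these handles allow (constant values are assumptions from the uapi header; error handling trimmed):

	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef ARCH_SHSTK_ENABLE
	#define ARCH_SHSTK_ENABLE	0x5001	/* assumed values, see asm/prctl.h */
	#define ARCH_SHSTK_LOCK		0x5003
	#define ARCH_SHSTK_SHSTK	(1ULL << 0)
	#endif

	int main(void)
	{
		/* Enable shadow stack for this thread, then lock the choice so it
		 * cannot be changed later (inherited over fork/clone, reset on exec). */
		if (syscall(SYS_arch_prctl, ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK))
			return 1;
		if (syscall(SYS_arch_prctl, ARCH_SHSTK_LOCK, ARCH_SHSTK_SHSTK))
			return 1;
		return 0;
	}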
|
|
Revision tags: v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6 |
|
| #
97740266 |
| 03-Apr-2023 |
Kirill A. Shutemov <[email protected]> |
x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outside
arch_prctl(ARCH_FORCE_TAGGED_SVA) overrides the default and allows LAM and SVA to co-exist in the process. It is expected to be called by the process when it knows what it is doing.
arch_prctl() operates on the current process, but the same code is reachable from ptrace where it can be called on an arbitrary task.
Make it strict and only allow setting MM_CONTEXT_FORCE_TAGGED_SVA for the current process.
Fixes: 23e5d9ec2bab ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive") Suggested-by: Dmitry Vyukov <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Dmitry Vyukov <[email protected]> Link: https://lore.kernel.org/all/20230403111020.3136-3-kirill.shutemov%40linux.intel.com
|
| #
fca1fdd2 |
| 03-Apr-2023 |
Kirill A. Shutemov <[email protected]> |
x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVA
Normally, LAM and SVA are mutually exclusive. LAM enabling will fail if SVA is already in use.
Correct error code for the failure. EINTR is nonsensical there.
Fixes: 23e5d9ec2bab ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive") Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Dmitry Vyukov <[email protected]> Link: https://lore.kernel.org/all/CACT4Y+YfqSMsZArhh25TESmG-U4jO5Hjphz87wKSnTiaw2Wrfw@mail.gmail.com Link: https://lore.kernel.org/all/20230403111020.3136-2-kirill.shutemov%40linux.intel.com
|
|
Revision tags: v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2 |
|
| #
23e5d9ec |
| 12-Mar-2023 |
Kirill A. Shutemov <[email protected]> |
x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
IOMMU and SVA-capable devices know nothing about LAM and only expect canonical addresses. An attempt to pass down tagged pointer will lead to ad
x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
IOMMU and SVA-capable devices know nothing about LAM and only expect canonical addresses. An attempt to pass down a tagged pointer will lead to an address translation failure.
By default, do not allow enabling LAM and using SVA in the same process.
The new ARCH_FORCE_TAGGED_SVA arch_prctl() overrides the limitation. By using the arch_prctl(), userspace takes responsibility for never passing a tagged address to the device.
Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Reviewed-by: Jacob Pan <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/20230312112612.31869-12-kirill.shutemov%40linux.intel.com
|
| #
2f8794bd |
| 12-Mar-2023 |
Kirill A. Shutemov <[email protected]> |
x86/mm: Provide arch_prctl() interface for LAM
Add a few arch_prctl() handles:
- ARCH_ENABLE_TAGGED_ADDR enables LAM. The argument is the required number of tag bits. It is rounded up to the nearest LAM mode that can provide it. For now only LAM_U57 is supported, with 6 tag bits.
- ARCH_GET_UNTAG_MASK returns the untag mask. It indicates where the tag bits are located in the address.
- ARCH_GET_MAX_TAG_BITS returns the maximum number of tag bits a user can request. Zero if LAM is not supported.
Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Alexander Potapenko <[email protected]> Link: https://lore.kernel.org/all/20230312112612.31869-9-kirill.shutemov%40linux.intel.com
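A hedged userspace sketch tying the three handles together (constant values are assumptions taken from asm/prctl.h; treat as illustrative):

	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef ARCH_ENABLE_TAGGED_ADDR
	#define ARCH_GET_UNTAG_MASK	0x4001	/* assumed values */
	#define ARCH_ENABLE_TAGGED_ADDR	0x4002
	#define ARCH_GET_MAX_TAG_BITS	0x4003
	#endif

	int main(void)
	{
		unsigned long max_bits = 0, untag_mask = 0;

		if (syscall(SYS_arch_prctl, ARCH_GET_MAX_TAG_BITS, &max_bits) || !max_bits)
			return 1;	/* LAM not supported */

		/* Ask for the maximum; the kernel rounds up to a supported LAM mode */
		if (syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, max_bits))
			return 1;

		syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, &untag_mask);
		printf("untag mask: 0x%lx\n", untag_mask);
		return 0;
	}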
|
| #
5ef495e5 |
| 12-Mar-2023 |
Kirill A. Shutemov <[email protected]> |
x86: Allow atomic MM_CONTEXT flags setting
So far there has been no need for atomic setting of MM context flags in mm_context_t::flags: the flags are set early in exec and never change after that.
LAM enabling requires atomic flag setting. The upcoming flag MM_CONTEXT_FORCE_TAGGED_SVA can be set much later in the process lifetime where multiple threads exist.
Convert the field to unsigned long and do MM_CONTEXT_* accesses with __set_bit() and test_bit().
No functional changes.
Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Alexander Potapenko <[email protected]> Link: https://lore.kernel.org/all/20230312112612.31869-3-kirill.shutemov%40linux.intel.com
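A minimal sketch of the access pattern after the conversion (the caller shown is hypothetical):

	/* mm_context_t::flags is now an unsigned long used as a bitmap;
	 * MM_CONTEXT_* flags are accessed with __set_bit()/test_bit() */
	__set_bit(MM_CONTEXT_FORCE_TAGGED_SVA, &mm->context.flags);

	if (test_bit(MM_CONTEXT_FORCE_TAGGED_SVA, &mm->context.flags))
		allow_sva_with_lam();	/* hypothetical caller */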
|
| #
7fef0997 |
| 07-Mar-2023 |
Linus Torvalds <[email protected]> |
x86/resctl: fix scheduler confusion with 'current'
The implementation of 'current' on x86 is very intentionally special: it is a very common thing to look up, and it uses 'this_cpu_read_stable()' to get the current thread pointer efficiently from per-cpu storage.
And the keyword in there is 'stable': the current thread pointer never changes as far as a single thread is concerned. Even when a thread is preempted, or moved to another CPU, or even across an explicit call to 'schedule()', that thread will still have the same value for 'current'.
It is, after all, the kernel base pointer to thread-local storage. That's why it's stable to begin with, but it's also why it's important enough that we have that special 'this_cpu_read_stable()' access for it.
So this is all done very intentionally to allow the compiler to treat 'current' as a value that never visibly changes, so that the compiler can do CSE and combine multiple different 'current' accesses into one.
However, there is obviously one very special situation when the currently running thread does actually change: inside the scheduler itself.
So the scheduler code paths are special, and do not have a 'current' thread at all. Instead there are _two_ threads: the previous and the next thread - typically called 'prev' and 'next' (or prev_p/next_p) internally.
So this is all actually quite straightforward and simple, and not all that complicated.
Except for when you then have special code that is run in scheduler context, that code then has to be aware that 'current' isn't really a valid thing. Did you mean 'prev'? Did you mean 'next'?
In fact, even if you then look at the code, and you use 'current' after the new value has been assigned to the percpu variable, we have explicitly told the compiler that 'current' is magical and always stable. So the compiler is quite free to use an older (or newer) value of 'current', and the actual assignment to the percpu storage is not relevant even if it might look that way.
Which is exactly what happened in the resctl code, that blithely used 'current' in '__resctrl_sched_in()' when it really wanted the new process state (as implied by the name: we're scheduling 'into' that new resctl state). And clang would end up just using the old thread pointer value at least in some configurations.
This could have happened with gcc too, and purely depends on random compiler details. Clang just seems to have been more aggressive about moving the read of the per-cpu current_task pointer around.
The fix is trivial: just make the resctl code adhere to the scheduler rules of using the prev/next thread pointer explicitly, instead of using 'current' in a situation where it just wasn't valid.
That same code is then also used outside of the scheduler context (when a thread resctl state is explicitly changed), and then we will just pass in 'current' as that pointer, of course. There is no ambiguity in that case.
The fix may be trivial, but noticing and figuring out what went wrong was not. The credit for that goes to Stephane Eranian.
Reported-by: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/lkml/[email protected]/ Reviewed-by: Nick Desaulniers <[email protected]> Tested-by: Tony Luck <[email protected]> Tested-by: Stephane Eranian <[email protected]> Tested-by: Babu Moger <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
|
|
Revision tags: v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4 |
|
| #
6007878a |
| 04-Nov-2022 |
Juergen Gross <[email protected]> |
x86/cpu: Switch to cpu_feature_enabled() for X86_FEATURE_XENPV
Convert the remaining cases of static_cpu_has(X86_FEATURE_XENPV) and boot_cpu_has(X86_FEATURE_XENPV) to use cpu_feature_enabled(), allowing more efficient code in case the kernel is configured without CONFIG_XEN_PV.
Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Dave Hansen <[email protected]> Link: https://lore.kernel.org/r/[email protected]
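For illustration, the shape of the conversion at a call site (the callee name is hypothetical):

	/* before: runtime test even when CONFIG_XEN_PV is compiled out */
	if (boot_cpu_has(X86_FEATURE_XENPV))
		do_xen_pv_setup();

	/* after: without CONFIG_XEN_PV the feature is compile-time disabled,
	 * so the compiler can drop the branch entirely */
	if (cpu_feature_enabled(X86_FEATURE_XENPV))
		do_xen_pv_setup();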
|
|
Revision tags: v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6 |
|
| #
d7b6d709 |
| 15-Sep-2022 |
Thomas Gleixner <[email protected]> |
x86/percpu: Move irq_stack variables next to current_task
Further extend struct pcpu_hot with the hard and soft irq stack pointers.
Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|