History log of /linux-6.15/arch/arm64/kernel/process.c (Results 1 – 25 of 176)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1
# 1751f872 28-Jan-2025 Joel Granados <[email protected]>

treewide: const qualify ctl_tables where applicable

Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysc

treewide: const qualify ctl_tables where applicable

Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
virtual patch

@
depends on !(file in "net")
disable optional_qualifier
@

identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@

+ const
struct ctl_table table_name [] = { ... };

sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c

Reviewed-by: Song Liu <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/
Reviewed-by: Martin K. Petersen <[email protected]> # SCSI
Reviewed-by: Darrick J. Wong <[email protected]> # xfs
Acked-by: Jani Nikula <[email protected]>
Acked-by: Corey Minyard <[email protected]>
Acked-by: Wei Liu <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Reviewed-by: Bill O'Donnell <[email protected]>
Acked-by: Baoquan He <[email protected]>
Acked-by: Ashutosh Dixit <[email protected]>
Acked-by: Anna Schumaker <[email protected]>
Signed-off-by: Joel Granados <[email protected]>

show more ...


Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12
# 67ab51cb 14-Nov-2024 Will Deacon <[email protected]>

arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled

Commit 18011eac28c7 ("arm64: tls: Avoid unconditional zeroing of
tpidrro_el0 for native tasks") tried to optimise the context sw

arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled

Commit 18011eac28c7 ("arm64: tls: Avoid unconditional zeroing of
tpidrro_el0 for native tasks") tried to optimise the context switching
of tpidrro_el0 by eliding the clearing of the register when switching
to a native task with kpti enabled, on the erroneous assumption that
the kpti trampoline entry code would already have taken care of the
write.

Although the kpti trampoline does zero the register on entry from a
native task, the check in tls_thread_switch() is on the *next* task and
so we can end up leaving a stale, non-zero value in the register if the
previous task was 32-bit.

Drop the broken optimisation and zero tpidrro_el0 unconditionally when
switching to a native 64-bit task.

Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Fixes: 18011eac28c7 ("arm64: tls: Avoid unconditional zeroing of tpidrro_el0 for native tasks")
Signed-off-by: Will Deacon <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

show more ...


Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4
# c2c6b27b 17-Oct-2024 Mark Rutland <[email protected]>

arm64: stacktrace: unwind exception boundaries

When arm64's stack unwinder encounters an exception boundary, it uses
the pt_regs::stackframe created by the entry code, which has a copy of
the PC and

arm64: stacktrace: unwind exception boundaries

When arm64's stack unwinder encounters an exception boundary, it uses
the pt_regs::stackframe created by the entry code, which has a copy of
the PC and FP at the time the exception was taken. The unwinder doesn't
know anything about pt_regs, and reports the PC from the stackframe, but
does not report the LR.

The LR is only guaranteed to contain the return address at function call
boundaries, and can be used as a scratch register at other times, so the
LR at an exception boundary may or may not be a legitimate return
address. It would be useful to report the LR value regardless, as it can
be helpful when debugging, and in future it will be helpful for reliable
stacktrace support.

This patch changes the way we unwind across exception boundaries,
allowing both the PC and LR to be reported. The entry code creates a
frame_record_meta structure embedded within pt_regs, which the unwinder
uses to find the pt_regs. The unwinder can then extract pt_regs::pc and
pt_regs::lr as two separate unwind steps before continuing with a
regular walk of frame records.

When a PC is unwound from pt_regs::lr, dump_backtrace() will log this
with an "L" marker so that it can be identified easily. For example,
an unwind across an exception boundary will appear as follows:

| el1h_64_irq+0x6c/0x70
| _raw_spin_unlock_irqrestore+0x10/0x60 (P)
| __aarch64_insn_write+0x6c/0x90 (L)
| aarch64_insn_patch_text_nosync+0x28/0x80

... with a (P) entry for pt_regs::pc, and an (L) entry for pt_regs:lr.

Note that the LR may be stale at the point of the exception, for example,
shortly after a return:

| el1h_64_irq+0x6c/0x70
| default_idle_call+0x34/0x180 (P)
| default_idle_call+0x28/0x180 (L)
| do_idle+0x204/0x268

... where the LR points a few instructions before the current PC.

This plays nicely with all the other unwind metadata tracking. With the
ftrace_graph profiler enabled globally, and kretprobes installed on
generic_handle_domain_irq() and do_interrupt_handler(), a backtrace triggered
by magic-sysrq + L reports:

| Call trace:
| show_stack+0x20/0x40 (CF)
| dump_stack_lvl+0x60/0x80 (F)
| dump_stack+0x18/0x28
| nmi_cpu_backtrace+0xfc/0x140
| nmi_trigger_cpumask_backtrace+0x1c8/0x200
| arch_trigger_cpumask_backtrace+0x20/0x40
| sysrq_handle_showallcpus+0x24/0x38 (F)
| __handle_sysrq+0xa8/0x1b0 (F)
| handle_sysrq+0x38/0x50 (F)
| pl011_int+0x460/0x5a8 (F)
| __handle_irq_event_percpu+0x60/0x220 (F)
| handle_irq_event+0x54/0xc0 (F)
| handle_fasteoi_irq+0xa8/0x1d0 (F)
| generic_handle_domain_irq+0x34/0x58 (F)
| gic_handle_irq+0x54/0x140 (FK)
| call_on_irq_stack+0x24/0x58 (F)
| do_interrupt_handler+0x88/0xa0
| el1_interrupt+0x34/0x68 (FK)
| el1h_64_irq_handler+0x18/0x28
| el1h_64_irq+0x6c/0x70
| default_idle_call+0x34/0x180 (P)
| default_idle_call+0x28/0x180 (L)
| do_idle+0x204/0x268
| cpu_startup_entry+0x3c/0x50 (F)
| rest_init+0xe4/0xf0
| start_kernel+0x744/0x750
| __primary_switched+0x88/0x98

Signed-off-by: Mark Rutland <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Reviewed-by: Puranjay Mohan <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Madhavan T. Venkataraman <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

show more ...


# 886c2b0b 17-Oct-2024 Mark Rutland <[email protected]>

arm64: use a common struct frame_record

Currently the signal handling code has its own struct frame_record,
the definition of struct pt_regs open-codes a frame record as an array,
and the kernel unw

arm64: use a common struct frame_record

Currently the signal handling code has its own struct frame_record,
the definition of struct pt_regs open-codes a frame record as an array,
and the kernel unwinder hard-codes frame record offsets.

Move to a common struct frame_record that can be used throughout the
kernel.

Signed-off-by: Mark Rutland <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Reviewed-by: Puranjay Mohan <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Madhavan T. Venkataraman <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

show more ...


# 14543630 17-Oct-2024 Mark Rutland <[email protected]>

arm64: pt_regs: swap 'unused' and 'pmr' fields

In subsequent patches we'll want to add an additional u64 to struct
pt_regs. To make space, this patch swaps the 'unused' and 'pmr' fields,
as the 'pmr

arm64: pt_regs: swap 'unused' and 'pmr' fields

In subsequent patches we'll want to add an additional u64 to struct
pt_regs. To make space, this patch swaps the 'unused' and 'pmr' fields,
as the 'pmr' value only requires bits[7:0] and can safely fit into a
u32, which frees up a 64-bit unused field.

The 'lockdep_hardirqs' and 'exit_rcu' fields should eventually be moved
out of pt_regs and managed locally within entry-common.c, so I've left
those as-is for the moment.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Reviewed-by: Puranjay Mohan <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Madhavan T. Venkataraman <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

show more ...


# 00d95979 17-Oct-2024 Mark Rutland <[email protected]>

arm64: pt_regs: rename "pmr_save" -> "pmr"

The pt_regs::pmr_save field is weirdly named relative to all other
pt_regs fields, with a '_save' suffix that doesn't make anything clearer
and only leads

arm64: pt_regs: rename "pmr_save" -> "pmr"

The pt_regs::pmr_save field is weirdly named relative to all other
pt_regs fields, with a '_save' suffix that doesn't make anything clearer
and only leads to more typing to access the field.

Remove the '_save' suffix.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Reviewed-by: Puranjay Mohan <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Madhavan T. Venkataraman <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

show more ...


Revision tags: v6.12-rc3, v6.12-rc2
# e3e85271 01-Oct-2024 Joey Gouly <[email protected]>

arm64: set POR_EL0 for kernel threads

Restrict kernel threads to only have RWX overlays for pkey 0. This matches
what arch/x86 does, by defaulting to a restrictive PKRU.

Signed-off-by: Joey Gouly

arm64: set POR_EL0 for kernel threads

Restrict kernel threads to only have RWX overlays for pkey 0. This matches
what arch/x86 does, by defaulting to a restrictive PKRU.

Signed-off-by: Joey Gouly <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Reviewed-by: Kevin Brodsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

show more ...


# 506496bc 01-Oct-2024 Mark Brown <[email protected]>

arm64/gcs: Ensure that new threads have a GCS

When a new thread is created by a thread with GCS enabled the GCS needs
to be specified along with the regular stack.

Unfortunately plain clone() is no

arm64/gcs: Ensure that new threads have a GCS

When a new thread is created by a thread with GCS enabled the GCS needs
to be specified along with the regular stack.

Unfortunately plain clone() is not extensible and existing clone3()
users will not specify a stack so all existing code would be broken if
we mandated specifying the stack explicitly. For compatibility with
these cases and also x86 (which did not initially implement clone3()
support for shadow stacks) if no GCS is specified we will allocate one
so when a thread is created which has GCS enabled allocate one for it.
We follow the extensively discussed x86 implementation and allocate
min(RLIMIT_STACK/2, 2G). Since the GCS only stores the call stack and not
any variables this should be more than sufficient for most applications.

GCSs allocated via this mechanism will be freed when the thread exits.

Reviewed-by: Thiago Jung Bauermann <[email protected]>
Acked-by: Yury Khrustalev <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

show more ...


# fc84bc53 01-Oct-2024 Mark Brown <[email protected]>

arm64/gcs: Context switch GCS state for EL0

There are two registers controlling the GCS state of EL0, GCSPR_EL0 which
is the current GCS pointer and GCSCRE0_EL1 which has enable bits for the
specifi

arm64/gcs: Context switch GCS state for EL0

There are two registers controlling the GCS state of EL0, GCSPR_EL0 which
is the current GCS pointer and GCSCRE0_EL1 which has enable bits for the
specific GCS functionality enabled for EL0. Manage these on context switch
and process lifetime events, GCS is reset on exec(). Also ensure that
any changes to the GCS memory are visible to other PEs and that changes
from other PEs are visible on this one by issuing a GCSB DSYNC when
moving to or from a thread with GCS.

Since the current GCS configuration of a thread will be visible to
userspace we store the configuration in the format used with userspace
and provide a helper which configures the system register as needed.

On systems that support GCS we always allow access to GCSPR_EL0, this
facilitates reporting of GCS faults if userspace implements disabling of
GCS on error - the GCS can still be discovered and examined even if GCS
has been disabled.

Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

show more ...


Revision tags: v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5
# 160a8e13 22-Aug-2024 Joey Gouly <[email protected]>

arm64: context switch POR_EL0 register

POR_EL0 is a register that can be modified by userspace directly,
so it must be context switched.

Signed-off-by: Joey Gouly <[email protected]>
Cc: Catalin M

arm64: context switch POR_EL0 register

POR_EL0 is a register that can be modified by userspace directly,
so it must be context switched.

Signed-off-by: Joey Gouly <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[will: Dropped unnecessary isb()s]
Signed-off-by: Will Deacon <[email protected]>

show more ...


# 3e9e67e1 24-Aug-2024 Peter Collingbourne <[email protected]>

arm64: Implement prctl(PR_{G,S}ET_TSC)

On arm64, this prctl controls access to CNTVCT_EL0, CNTVCTSS_EL0 and
CNTFRQ_EL0 via CNTKCTL_EL1.EL0VCTEN. Since this bit is also used to
implement various erra

arm64: Implement prctl(PR_{G,S}ET_TSC)

On arm64, this prctl controls access to CNTVCT_EL0, CNTVCTSS_EL0 and
CNTFRQ_EL0 via CNTKCTL_EL1.EL0VCTEN. Since this bit is also used to
implement various erratum workarounds, check whether the CPU needs
a workaround whenever we potentially need to change it.

This is needed for a correct implementation of non-instrumenting
record-replay debugging on arm64 (i.e. rr; https://rr-project.org/).
rr must trap and record any sources of non-determinism from the
userspace program's perspective so it can be replayed later. This
includes the results of syscalls as well as the results of access
to architected timers exposed directly to the program. This prctl
was originally added for x86 by commit 8fb402bccf20 ("generic, x86:
add prctl commands PR_GET_TSC and PR_SET_TSC"), and rr uses it to
trap RDTSC on x86 for the same reason.

We also considered exposing this as a PTRACE_EVENT. However, prctl
seems like a better choice for these reasons:

1) In general an in-process control seems more useful than an
out-of-process control, since anything that you would be able to
do with ptrace could also be done with prctl (tracer can inject a
call to the prctl and handle signal-delivery-stops), and it avoids
needing an additional process (which will complicate debugging
of the ptraced process since it cannot have more than one tracer,
and will be incompatible with ptrace_scope=3) in cases where that
is not otherwise necessary.

2) Consistency with x86_64. Note that on x86_64, RDTSC has been there
since the start, so it's the same situation as on arm64.

Signed-off-by: Peter Collingbourne <[email protected]>
Link: https://linux-review.googlesource.com/id/I233a1867d1ccebe2933a347552e7eae862344421
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

show more ...


Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3
# bce79b0c 02-Feb-2024 Dawei Li <[email protected]>

arm64: remove unneeded BUILD_BUG_ON assertion

Since commit c02433dd6de3 ("arm64: split thread_info from task stack"),
CONFIG_THREAD_INFO_IN_TASK is enabled unconditionally for arm64. So
remove this

arm64: remove unneeded BUILD_BUG_ON assertion

Since commit c02433dd6de3 ("arm64: split thread_info from task stack"),
CONFIG_THREAD_INFO_IN_TASK is enabled unconditionally for arm64. So
remove this always-true assertion from arch_dup_task_struct.

Signed-off-by: Dawei Li <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

show more ...


Revision tags: v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7
# bc75d0c0 16-Oct-2023 Mark Rutland <[email protected]>

arm64: Avoid cpus_have_const_cap() for ARM64_SSBS

In ssbs_thread_switch() we use cpus_have_const_cap() to check for
ARM64_SSBS, but this is not necessary and alternative_has_cap_*() would
be prefera

arm64: Avoid cpus_have_const_cap() for ARM64_SSBS

In ssbs_thread_switch() we use cpus_have_const_cap() to check for
ARM64_SSBS, but this is not necessary and alternative_has_cap_*() would
be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The cpus_have_const_cap() check in ssbs_thread_switch() is an
optimization to avoid the overhead of
spectre_v4_enable_task_mitigation() where all CPUs implement SSBS and
naturally preserve the SSBS bit in SPSR_ELx. In the window between
detecting the ARM64_SSBS system-wide and patching alternative branches
it is benign to continue to call spectre_v4_enable_task_mitigation().

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.

Signed-off-by: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

show more ...


Revision tags: v6.6-rc6, v6.6-rc5
# de8a660b 02-Oct-2023 Joel Granados <[email protected]>

arm: Remove now superfluous sentinel elem from ctl_table arrays

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) whic

arm: Remove now superfluous sentinel elem from ctl_table arrays

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/)

Removed the sentinel as well as the explicit size from ctl_isa_vars. The
size is redundant as the initialization sets it. Changed
insn_emulation->sysctl from a 2 element array of struct ctl_table to a
simple struct. This has no consequence for the sysctl registration as it
is forwarded as a pointer. Removed sentinel from sve_defatul_vl_table,
sme_default_vl_table, tagged_addr_sysctl_table and
armv8_pmu_sysctl_table.

This removal is safe because register_sysctl_sz and register_sysctl use
the array size in addition to checking for the sentinel.

Signed-off-by: Joel Granados <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>

show more ...


Revision tags: v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7
# ca708599 12-Apr-2023 Mark Rutland <[email protected]>

arm64: use XPACLRI to strip PAC

Currently we strip the PAC from pointers using C code, which requires
generating bitmasks, and conditionally clearing/setting bits depending
on bit 55. We can do bett

arm64: use XPACLRI to strip PAC

Currently we strip the PAC from pointers using C code, which requires
generating bitmasks, and conditionally clearing/setting bits depending
on bit 55. We can do better by using XPACLRI directly.

When the logic was originally written to strip PACs from user pointers,
contemporary toolchains used for the kernel had assemblers which were
unaware of the PAC instructions. As stripping the PAC from userspace
pointers required unconditional clearing of a fixed set of bits (which
could be performed with a single instruction), it was simpler to
implement the masking in C than it was to make use of XPACI or XPACLRI.

When support for in-kernel pointer authentication was added, the
stripping logic was extended to cover TTBR1 pointers, requiring several
instructions to handle whether to clear/set bits dependent on bit 55 of
the pointer.

This patch simplifies the stripping of PACs by using XPACLRI directly,
as contemporary toolchains do within __builtin_return_address(). This
saves a number of instructions, especially where
__builtin_return_address() does not implicitly strip the PAC but is
heavily used (e.g. with tracepoints). As the kernel might be compiled
with an assembler without knowledge of XPACLRI, it is assembled using
the 'HINT #7' alias, which results in an identical opcode.

At the same time, I've split ptrauth_strip_insn_pac() into
ptrauth_strip_user_insn_pac() and ptrauth_strip_kernel_insn_pac()
helpers so that we can avoid unnecessary PAC stripping when pointer
authentication is not in use in userspace or kernel respectively.

The underlying xpaclri() macro uses inline assembly which clobbers x30.
The clobber causes the compiler to save/restore the original x30 value
in a frame record (protected with PACIASP and AUTIASP when in-kernel
authentication is enabled), so this does not provide a gadget to alter
the return address. Similarly this does not adversely affect unwinding
due to the presence of the frame record.

The ptrauth_user_pac_mask() and ptrauth_kernel_pac_mask() are exported
from the kernel in ptrace and core dumps, so these are retained. A
subsequent patch will move them out of <asm/compiler.h>.

Signed-off-by: Mark Rutland <[email protected]>
Cc: Amit Daniel Kachhap <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Kristina Martsenko <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

show more ...


Revision tags: v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2
# 071c44e4 14-Feb-2023 Josh Poimboeuf <[email protected]>

sched/idle: Mark arch_cpu_idle_dead() __noreturn

Before commit 076cbf5d2163 ("x86/xen: don't let xen_pv_play_dead()
return"), in Xen, when a previously offlined CPU was brought back
online, it unexp

sched/idle: Mark arch_cpu_idle_dead() __noreturn

Before commit 076cbf5d2163 ("x86/xen: don't let xen_pv_play_dead()
return"), in Xen, when a previously offlined CPU was brought back
online, it unexpectedly resumed execution where it left off in the
middle of the idle loop.

There were some hacks to make that work, but the behavior was surprising
as do_idle() doesn't expect an offlined CPU to return from the dead (in
arch_cpu_idle_dead()).

Now that Xen has been fixed, and the arch-specific implementations of
arch_cpu_idle_dead() also don't return, give it a __noreturn attribute.

This will cause the compiler to complain if an arch-specific
implementation might return. It also improves code generation for both
caller and callee.

Also fixes the following warning:

vmlinux.o: warning: objtool: do_idle+0x25f: unreachable instruction

Reported-by: Paul E. McKenney <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Link: https://lore.kernel.org/r/60d527353da8c99d4cf13b6473131d46719ed16d.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <[email protected]>

show more ...


Revision tags: v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5
# d6138b4a 16-Jan-2023 Mark Brown <[email protected]>

arm64/sme: Provide storage for ZT0

When the system supports SME2 there is an additional register ZT0 which
we must store when the task is using SME. Since ZT0 is accessible only
when PSTATE.ZA is se

arm64/sme: Provide storage for ZT0

When the system supports SME2 there is an additional register ZT0 which
we must store when the task is using SME. Since ZT0 is accessible only
when PSTATE.ZA is set just like ZA we allocate storage for it along with
ZA, increasing the allocation size for the memory region where we store
ZA and storing the data for ZT after that for ZA.

Signed-off-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

show more ...


# ce514000 16-Jan-2023 Mark Brown <[email protected]>

arm64/sme: Rename za_state to sme_state

In preparation for adding support for storage for ZT0 to the thread_struct
rename za_state to sme_state. Since ZT0 is accessible when PSTATE.ZA is
set just li

arm64/sme: Rename za_state to sme_state

In preparation for adding support for storage for ZT0 to the thread_struct
rename za_state to sme_state. Since ZT0 is accessible when PSTATE.ZA is
set just like ZA itself we will extend the allocation done for ZA to
cover it, avoiding the need to further expand task_struct for non-SME
tasks.

No functional changes.

Signed-off-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

show more ...


Revision tags: v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6
# baa85152 15-Nov-2022 Mark Brown <[email protected]>

arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE

When we save the state for the floating point registers this can be done
in the form visible through either the FPSIMD V registe

arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE

When we save the state for the floating point registers this can be done
in the form visible through either the FPSIMD V registers or the SVE Z and
P registers. At present we track which format is currently used based on
TIF_SVE and the SME streaming mode state but particularly in the SVE case
this limits our options for optimising things, especially around syscalls.
Introduce a new enum which we place together with saved floating point
state in both thread_struct and the KVM guest state which explicitly
states which format is active and keep it up to date when we change it.

At present we do not use this state except to verify that it has the
expected value when loading the state, future patches will introduce
functional changes.

Signed-off-by: Mark Brown <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

show more ...


Revision tags: v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1
# 8032bf12 10-Oct-2022 Jason A. Donenfeld <[email protected]>

treewide: use get_random_u32_below() instead of deprecated function

This is a simple mechanical transformation done by:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)

Reviewed-

treewide: use get_random_u32_below() instead of deprecated function

This is a simple mechanical transformation done by:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)

Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Acked-by: Darrick J. Wong <[email protected]> # for xfs
Reviewed-by: SeongJae Park <[email protected]> # for damon
Reviewed-by: Jason Gunthorpe <[email protected]> # for infiniband
Reviewed-by: Russell King (Oracle) <[email protected]> # for arm
Acked-by: Ulf Hansson <[email protected]> # for mmc
Signed-off-by: Jason A. Donenfeld <[email protected]>

show more ...


# 81895a65 05-Oct-2022 Jason A. Donenfeld <[email protected]>

treewide: use prandom_u32_max() when possible, part 1

Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
t

treewide: use prandom_u32_max() when possible, part 1

Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

- RAND = get_random_u32();
... when != RAND
- RAND %= (E);
+ RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
print("Skipping 0x%x for cleanup elsewhere" % (value))
cocci.include_match(False)
elif value & (value + 1) != 0:
print("Skipping 0x%x because it's not a power of two minus one" % (value))
cocci.include_match(False)
elif literal.startswith('0x'):
coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

- (FUNC()@p & (LITERAL))
+ prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

{
- T VAR;
- VAR = (E);
- return VAR;
+ return E;
}

@drop_var@
type T;
identifier VAR;
@@

{
- T VAR;
... when != VAR
}

Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Yury Norov <[email protected]>
Reviewed-by: KP Singh <[email protected]>
Reviewed-by: Jan Kara <[email protected]> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <[email protected]> # for drbd
Acked-by: Jakub Kicinski <[email protected]>
Acked-by: Heiko Carstens <[email protected]> # for s390
Acked-by: Ulf Hansson <[email protected]> # for mmc
Acked-by: Darrick J. Wong <[email protected]> # for xfs
Signed-off-by: Jason A. Donenfeld <[email protected]>

show more ...


Revision tags: v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2
# 2be9880d 19-Aug-2022 Kefeng Wang <[email protected]>

kernel: exit: cleanup release_thread()

Only x86 has own release_thread(), introduce a new weak release_thread()
function to clean empty definitions in other ARCHs.

Link: https://lkml.kernel.org/r/2

kernel: exit: cleanup release_thread()

Only x86 has own release_thread(), introduce a new weak release_thread()
function to clean empty definitions in other ARCHs.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: Guo Ren <[email protected]> [csky]
Acked-by: Russell King (Oracle) <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Acked-by: Brian Cain <[email protected]>
Acked-by: Michael Ellerman <[email protected]> [powerpc]
Acked-by: Stafford Horne <[email protected]> [openrisc]
Acked-by: Catalin Marinas <[email protected]> [arm64]
Acked-by: Huacai Chen <[email protected]> [LoongArch]
Cc: Alexander Gordeev <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Guo Ren <[email protected]> [csky]
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xuerui Wang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7
# 0c649914 09-May-2022 Dmitry Osipenko <[email protected]>

arm64: Use do_kernel_power_off()

Kernel now supports chained power-off handlers. Use do_kernel_power_off()
that invokes chained power-off handlers. It also invokes legacy
pm_power_off() for now, whi

arm64: Use do_kernel_power_off()

Kernel now supports chained power-off handlers. Use do_kernel_power_off()
that invokes chained power-off handlers. It also invokes legacy
pm_power_off() for now, which will be removed once all drivers will
be converted to the new sys-off API.

Acked-by: Catalin Marinas <[email protected]>
Reviewed-by: Michał Mirosław <[email protected]>
Signed-off-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

show more ...


Revision tags: v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3
# 5bd2e97c 12-Apr-2022 Eric W. Biederman <[email protected]>

fork: Generalize PF_IO_WORKER handling

Add fn and fn_arg members into struct kernel_clone_args and test for
them in copy_thread (instead of testing for PF_KTHREAD | PF_IO_WORKER).
This allows any ta

fork: Generalize PF_IO_WORKER handling

Add fn and fn_arg members into struct kernel_clone_args and test for
them in copy_thread (instead of testing for PF_KTHREAD | PF_IO_WORKER).
This allows any task that wants to be a user space task that only runs
in kernel mode to use this functionality.

The code on x86 is an exception and still retains a PF_KTHREAD test
because x86 unlikely everything else handles kthreads slightly
differently than user space tasks that start with a function.

The functions that created tasks that start with a function
have been updated to set ".fn" and ".fn_arg" instead of
".stack" and ".stack_size". These functions are fork_idle(),
create_io_thread(), kernel_thread(), and user_mode_thread().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>

show more ...


Revision tags: v5.18-rc2
# c5febea0 08-Apr-2022 Eric W. Biederman <[email protected]>

fork: Pass struct kernel_clone_args into copy_thread

With io_uring we have started supporting tasks that are for most
purposes user space tasks that exclusively run code in kernel mode.

The kernel

fork: Pass struct kernel_clone_args into copy_thread

With io_uring we have started supporting tasks that are for most
purposes user space tasks that exclusively run code in kernel mode.

The kernel task that exec's init and tasks that exec user mode
helpers are also user mode tasks that just run kernel code
until they call kernel execve.

Pass kernel_clone_args into copy_thread so these oddball
tasks can be supported more cleanly and easily.

v2: Fix spelling of kenrel_clone_args on h8300
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>

show more ...


12345678