History log of /linux-6.15/fs/proc/array.c (Results 1 – 25 of 180)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6
# ab251dac 02-Jan-2025 Nam Cao <[email protected]>

fs/proc: do_task_stat: Fix ESP not readable during coredump

The field "eip" (instruction pointer) and "esp" (stack pointer) of a task
can be read from /proc/PID/stat. These fields can be interesting

fs/proc: do_task_stat: Fix ESP not readable during coredump

The field "eip" (instruction pointer) and "esp" (stack pointer) of a task
can be read from /proc/PID/stat. These fields can be interesting for
coredump.

However, these fields were disabled by commit 0a1eb2d474ed ("fs/proc: Stop
reporting eip and esp in /proc/PID/stat"), because it is generally unsafe
to do so. But it is safe for a coredumping process, and therefore
exceptions were made:

- for a coredumping thread by commit fd7d56270b52 ("fs/proc: Report
eip/esp in /prod/PID/stat for coredumping").

- for all other threads in a coredumping process by commit cb8f381f1613
("fs/proc/array.c: allow reporting eip/esp for all coredumping
threads").

The above two commits check the PF_DUMPCORE flag to determine a coredump thread
and the PF_EXITING flag for the other threads.

Unfortunately, commit 92307383082d ("coredump: Don't perform any cleanups
before dumping core") moved coredump to happen earlier and before PF_EXITING is
set. Thus, checking PF_EXITING is no longer the correct way to determine
threads in a coredumping process.

Instead of PF_EXITING, use PF_POSTCOREDUMP to determine the other threads.

Checking of PF_EXITING was added for coredumping, so it probably can now be
removed. But it doesn't hurt to keep.

Fixes: 92307383082d ("coredump: Don't perform any cleanups before dumping core")
Cc: [email protected]
Cc: Eric W. Biederman <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: Nam Cao <[email protected]>
Link: https://lore.kernel.org/r/d89af63d478d6c64cc46a01420b46fd6eb147d6f.1735805772.git.namcao@linutronix.de
Signed-off-by: Christian Brauner <[email protected]>

show more ...


Revision tags: v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3
# 4cc0473d 07-Oct-2024 Yafang Shao <[email protected]>

get rid of __get_task_comm()

Patch series "Improve the copy of task comm", v8.

Using {memcpy,strncpy,strcpy,kstrdup} to copy the task comm relies on the
length of task comm. Changes in the task co

get rid of __get_task_comm()

Patch series "Improve the copy of task comm", v8.

Using {memcpy,strncpy,strcpy,kstrdup} to copy the task comm relies on the
length of task comm. Changes in the task comm could result in a
destination string that is overflow. Therefore, we should explicitly
ensure the destination string is always NUL-terminated, regardless of the
task comm. This approach will facilitate future extensions to the task
comm.

As suggested by Linus [0], we can identify all relevant code with the
following git grep command:

git grep 'memcpy.*->comm\>'
git grep 'kstrdup.*->comm\>'
git grep 'strncpy.*->comm\>'
git grep 'strcpy.*->comm\>'

PATCH #2~#4: memcpy
PATCH #5~#6: kstrdup
PATCH #7: strcpy

Please note that strncpy() is not included in this series as it is being
tracked by another effort. [1]


This patch (of 7):

We want to eliminate the use of __get_task_comm() for the following
reasons:

- The task_lock() is unnecessary
Quoted from Linus [0]:
: Since user space can randomly change their names anyway, using locking
: was always wrong for readers (for writers it probably does make sense
: to have some lock - although practically speaking nobody cares there
: either, but at least for a writer some kind of race could have
: long-term mixed results

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/
Link: https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/ [0]
Link: https://github.com/KSPP/linux/issues/90 [1]
Signed-off-by: Yafang Shao <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Matus Jokay <[email protected]>
Cc: Alejandro Colomar <[email protected]>
Cc: "Serge E. Hallyn" <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Justin Stitt <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: James Morris <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Ondrej Mosnacek <[email protected]>
Cc: Paul Moore <[email protected]>
Cc: Quentin Monnet <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Stephen Smalley <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2
# 7601df80 23-Jan-2024 Oleg Nesterov <[email protected]>

fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats

lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
do_task_stat() at the same time and the proces

fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats

lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
do_task_stat() at the same time and the process has NR_THREADS, it will
spin with irqs disabled O(NR_CPUS * NR_THREADS) time.

Change do_task_stat() to use sig->stats_lock to gather the statistics
outside of ->siglock protected section, in the likely case this code will
run lockless.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Dylan Hatch <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 60f92acb 23-Jan-2024 Oleg Nesterov <[email protected]>

fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()

Patch series "fs/proc: do_task_stat: use sig->stats_".

do_task_stat() has the same problem as getrusage()

fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()

Patch series "fs/proc: do_task_stat: use sig->stats_".

do_task_stat() has the same problem as getrusage() had before "getrusage:
use sig->stats_lock rather than lock_task_sighand()": a hard lockup. If
NR_CPUS threads call lock_task_sighand() at the same time and the process
has NR_THREADS, spin_lock_irq will spin with irqs disabled O(NR_CPUS *
NR_THREADS) time.


This patch (of 3):

thread_group_cputime() does its own locking, we can safely shift
thread_group_cputime_adjusted() which does another for_each_thread loop
outside of ->siglock protected section.

Not only this removes for_each_thread() from the critical section with
irqs disabled, this removes another case when stats_lock is taken with
siglock held. We want to remove this dependency, then we can change the
users of stats_lock to not disable irqs.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Dylan Hatch <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1
# 7904e53e 09-Sep-2023 Oleg Nesterov <[email protected]>

fs/proc: do_task_stat: use __for_each_thread()

do/while_each_thread should be avoided when possible.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <

fs/proc: do_task_stat: use __for_each_thread()

do/while_each_thread should be avoided when possible.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7
# 0ee44885 13-Jun-2023 Rick Edgecombe <[email protected]>

x86: Expose thread features in /proc/$PID/status

Applications and loaders can have logic to decide whether to enable
shadow stack. They usually don't report whether shadow stack has been
enabled or

x86: Expose thread features in /proc/$PID/status

Applications and loaders can have logic to decide whether to enable
shadow stack. They usually don't report whether shadow stack has been
enabled or not, so there is no way to verify whether an application
actually is protected by shadow stack.

Add two lines in /proc/$PID/status to report enabled and locked features.

Since, this involves referring to arch specific defines in asm/prctl.h,
implement an arch breakout to emit the feature lines.

[Switched to CET, added to commit log]

Co-developed-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Tested-by: John Allen <[email protected]>
Tested-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/all/20230613001108.3040476-37-rick.p.edgecombe%40intel.com

show more ...


Revision tags: v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7
# 522dc4e5 16-Apr-2023 Chunguang Wu <[email protected]>

fs/proc: add Kthread flag to /proc/$pid/status

The command `ps -ef ` and `top -c` mark kernel thread by '[' and ']', but
sometimes the result is not correct. The task->flags in /proc/$pid/stat
is g

fs/proc: add Kthread flag to /proc/$pid/status

The command `ps -ef ` and `top -c` mark kernel thread by '[' and ']', but
sometimes the result is not correct. The task->flags in /proc/$pid/stat
is good, but we need remember the value of PF_KTHREAD is 0x00200000 and
convert dec to hex. If we have no binary program and shell script which
read /proc/$pid/stat, we can know it directly by `cat /proc/$pid/status`.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Chunguang Wu <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2
# f7d30434 12-Mar-2023 Kirill A. Shutemov <[email protected]>

mm: Expose untagging mask in /proc/$PID/status

Add a line in /proc/$PID/status to report untag_mask. It can be
used to find out LAM status of the process from the outside. It is
useful for debuggers

mm: Expose untagging mask in /proc/$PID/status

Add a line in /proc/$PID/status to report untag_mask. It can be
used to find out LAM status of the process from the outside. It is
useful for debuggers.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Alexander Potapenko <[email protected]>
Link: https://lore.kernel.org/all/20230312112612.31869-10-kirill.shutemov%40linux.intel.com

show more ...


Revision tags: v6.3-rc1
# f122a08b 28-Feb-2023 Linus Torvalds <[email protected]>

capability: just use a 'u64' instead of a 'u32[2]' array

Back in 2008 we extended the capability bits from 32 to 64, and we did
it by extending the single 32-bit capability word from one word to an

capability: just use a 'u64' instead of a 'u32[2]' array

Back in 2008 we extended the capability bits from 32 to 64, and we did
it by extending the single 32-bit capability word from one word to an
array of two words. It was then obfuscated by hiding the "2" behind two
macro expansions, with the reasoning being that maybe it gets extended
further some day.

That reasoning may have been valid at the time, but the last thing we
want to do is to extend the capability set any more. And the array of
values not only causes source code oddities (with loops to deal with
it), but also results in worse code generation. It's a lose-lose
situation.

So just change the 'u32[2]' into a 'u64' and be done with it.

We still have to deal with the fact that the user space interface is
designed around an array of these 32-bit values, but that was the case
before too, since the array layouts were different (ie user space
doesn't use an array of 32-bit values for individual capability masks,
but an array of 32-bit slices of multiple masks).

So that marshalling of data is actually simplified too, even if it does
remain somewhat obscure and odd.

This was all triggered by my reaction to the new "cap_isidentical()"
introduced recently. By just using a saner data structure, it went from

unsigned __capi;
CAP_FOR_EACH_U32(__capi) {
if (a.cap[__capi] != b.cap[__capi])
return false;
}
return true;

to just being

return a.val == b.val;

instead. Which is rather more obvious both to humans and to compilers.

Cc: Mateusz Guzik <[email protected]>
Cc: Casey Schaufler <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Paul Moore <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

show more ...


Revision tags: v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8
# ed8fb78d 23-Jul-2022 Alexey Dobriyan <[email protected]>

proc: add some (hopefully) insightful comments

* /proc/${pid}/net status
* removing PDE vs last close stuff (again!)
* random small stuff

Link: https://lkml.kernel.org/r/YtwrM6sDC0OQ53YB@localhost.

proc: add some (hopefully) insightful comments

* /proc/${pid}/net status
* removing PDE vs last close stuff (again!)
* random small stuff

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3
# 376b0c26 15-Jun-2022 Alexey Dobriyan <[email protected]>

proc: delete unused <linux/uaccess.h> includes

Those aren't necessary after seq files won.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexey Dobriyan <ado

proc: delete unused <linux/uaccess.h> includes

Those aren't necessary after seq files won.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v5.19-rc2, v5.19-rc1, v5.18
# de399236 18-May-2022 Alexey Gladkov <[email protected]>

ucounts: Split rlimit and ucount values and max values

Since the semantics of maximum rlimit values are different, it would be
better not to mix ucount and rlimit values. This will prevent the error

ucounts: Split rlimit and ucount values and max values

Since the semantics of maximum rlimit values are different, it would be
better not to mix ucount and rlimit values. This will prevent the error
of using inc_count/dec_ucount for rlimit parameters.

This patch also renames the functions to emphasize the lack of
connection between rlimit and ucount.

v3:
- Fix BUG:KASAN:use-after-free_in_dec_ucount.

v2:
- Fix the array-index-out-of-bounds that was found by the lkp project.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>

show more ...


Revision tags: v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4
# 355f841a 09-Feb-2022 Eric W. Biederman <[email protected]>

tracehook: Remove tracehook.h

Now that all of the definitions have moved out of tracehook.h into
ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in
tracehook.h so remove it.

Upda

tracehook: Remove tracehook.h

Now that all of the definitions have moved out of tracehook.h into
ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in
tracehook.h so remove it.

Update the few files that were depending upon tracehook.h to bring in
definitions to use the headers they need directly.

Reviewed-by: Kees Cook <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>

show more ...


Revision tags: v5.17-rc3, v5.17-rc2, v5.17-rc1
# d6986ce2 20-Jan-2022 Yafang Shao <[email protected]>

kthread: dynamically allocate memory to store kthread's full name

When I was implementing a new per-cpu kthread cfs_migration, I found the
comm of it "cfs_migration/%u" is truncated due to the limit

kthread: dynamically allocate memory to store kthread's full name

When I was implementing a new per-cpu kthread cfs_migration, I found the
comm of it "cfs_migration/%u" is truncated due to the limitation of
TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19
all have the same name "cfs_migration/1", which will confuse the user.
This issue is not critical, because we can get the corresponding CPU
from the task's Cpus_allowed. But for kthreads corresponding to other
hardware devices, it is not easy to get the detailed device info from
task comm, for example,

jbd2/nvme0n1p2-
xfs-reclaim/sdf

Currently there are so many truncated kthreads:

rcu_tasks_kthre
rcu_tasks_rude_
rcu_tasks_trace
poll_mpt3sas0_s
ext4-rsv-conver
xfs-reclaim/sd{a, b, c, ...}
xfs-blockgc/sd{a, b, c, ...}
xfs-inodegc/sd{a, b, c, ...}
audit_send_repl
ecryptfs-kthrea
vfio-irqfd-clea
jbd2/nvme0n1p2-
...

We can shorten these names to work around this problem, but it may be
not applied to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-'
for example, it is a nice name, and it is not a good idea to shorten it.

One possible way to fix this issue is extending the task comm size, but
as task->comm is used in lots of places, that may cause some potential
buffer overflows. Another more conservative approach is introducing a
new pointer to store kthread's full name if it is truncated, which won't
introduce too much overhead as it is in the non-critical path. Finally
we make a dicision to use the second approach. See also the discussions
in this thread:
https://lore.kernel.org/lkml/[email protected]/

After this change, the full name of these truncated kthreads will be
displayed via /proc/[pid]/comm:

rcu_tasks_kthread
rcu_tasks_rude_kthread
rcu_tasks_trace_kthread
poll_mpt3sas0_statu
ext4-rsv-conversion
xfs-reclaim/sdf1
xfs-blockgc/sdf1
xfs-inodegc/sdf1
audit_send_reply
ecryptfs-kthread
vfio-irqfd-cleanup
jbd2/nvme0n1p2-8

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Michal Miroslaw <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

show more ...


Revision tags: v5.16, v5.16-rc8, v5.16-rc7
# 2d18f7f4 21-Dec-2021 Eric W. Biederman <[email protected]>

exit: Use the correct exit_code in /proc/<pid>/stat

Since do_proc_statt was modified to return process wide values instead
of per task values the exit_code calculation has never been updated.
Update

exit: Use the correct exit_code in /proc/<pid>/stat

Since do_proc_statt was modified to return process wide values instead
of per task values the exit_code calculation has never been updated.
Update it now to return the process wide exit_code when it is requested
and available.

History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: bf719d26a5c1 ("[PATCH] distinct tgid/tid CPU usage")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>

show more ...


Revision tags: v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4
# 4e046156 29-Sep-2021 Kees Cook <[email protected]>

proc: Use task_is_running() for wchan in /proc/$pid/stat

The implementations of get_wchan() can be expensive. The only information
imparted here is whether or not a process is currently blocked in t

proc: Use task_is_running() for wchan in /proc/$pid/stat

The implementations of get_wchan() can be expensive. The only information
imparted here is whether or not a process is currently blocked in the
scheduler (and even this doesn't need to be exact). Avoid doing the
heavy lifting of stack walking and just report that information by using
task_is_running().

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

show more ...


Revision tags: v5.15-rc3
# 0258b5fd 22-Sep-2021 Eric W. Biederman <[email protected]>

coredump: Limit coredumps to a single thread group

Today when a signal is delivered with a handler of SIG_DFL whose
default behavior is to generate a core dump not only that process but
every proces

coredump: Limit coredumps to a single thread group

Today when a signal is delivered with a handler of SIG_DFL whose
default behavior is to generate a core dump not only that process but
every process that shares the mm is killed.

In the case of vfork this looks like a real world problem. Consider
the following well defined sequence.

if (vfork() == 0) {
execve(...);
_exit(EXIT_FAILURE);
}

If a signal that generates a core dump is received after vfork but
before the execve changes the mm the process that called vfork will
also be killed (as the mm is shared).

Similarly if the execve fails after the point of no return the kernel
delivers SIGSEGV which will kill both the exec'ing process and because
the mm is shared the process that called vfork as well.

As far as I can tell this behavior is a violation of people's
reasonable expectations, POSIX, and is unnecessarily fragile when the
system is low on memory.

Solve this by making a userspace visible change to only kill a single
process/thread group. This is possible because Jann Horn recently
modified[1] the coredump code so that the mm can safely be modified
while the coredump is happening. With LinuxThreads long gone I don't
expect anyone to have a notice this behavior change in practice.

To accomplish this move the core_state pointer from mm_struct to
signal_struct, which allows different thread groups to coredump
simultatenously.

In zap_threads remove the work to kill anything except for the current
thread group.

v2: Remove core_state from the VM_BUG_ON_MM print to fix
compile failure when CONFIG_DEBUG_VM is enabled.
Reported-by: Stephen Rothwell <[email protected]>

[1] a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot")
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133
Link: https://lkml.kernel.org/r/[email protected]
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>

show more ...


Revision tags: v5.15-rc2, v5.15-rc1
# 8d23b208 08-Sep-2021 Christoph Hellwig <[email protected]>

proc: stop using seq_get_buf in proc_task_name

Use seq_escape_str and seq_printf instead of poking holes into the
seq_file abstraction.

Link: https://lkml.kernel.org/r/20210810151945.1795567-1-hch@

proc: stop using seq_get_buf in proc_task_name

Use seq_escape_str and seq_printf instead of poking holes into the
seq_file abstraction.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

show more ...


Revision tags: v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12
# d6469690 22-Apr-2021 Alexey Gladkov <[email protected]>

Reimplement RLIMIT_SIGPENDING on top of ucounts

The rlimit counter is tied to uid in the user_namespace. This allows
rlimit values to be specified in userns even if they are already
globally exceede

Reimplement RLIMIT_SIGPENDING on top of ucounts

The rlimit counter is tied to uid in the user_namespace. This allows
rlimit values to be specified in userns even if they are already
globally exceeded by the user. However, the value of the previous
user_namespaces cannot be exceeded.

Changelog

v11:
* Revert most of changes to fix performance issues.

v10:
* Fix memory leak on get_ucounts failure.

Signed-off-by: Alexey Gladkov <[email protected]>
Link: https://lkml.kernel.org/r/df9d7764dddd50f28616b7840de74ec0f81711a8.1619094428.git.legion@kernel.org
Signed-off-by: Eric W. Biederman <[email protected]>

show more ...


Revision tags: v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4
# 64bdc024 21-Mar-2021 [email protected] <[email protected]>

seccomp: Fix CONFIG tests for Seccomp_filters

Strictly speaking, seccomp filters are only used
when CONFIG_SECCOMP_FILTER.
This patch fixes the condition to enable "Seccomp_filters"
in /proc/$pid/st

seccomp: Fix CONFIG tests for Seccomp_filters

Strictly speaking, seccomp filters are only used
when CONFIG_SECCOMP_FILTER.
This patch fixes the condition to enable "Seccomp_filters"
in /proc/$pid/status.

Signed-off-by: Kenta Tada <[email protected]>
Fixes: c818c03b661c ("seccomp: Report number of loaded filters in /proc/$pid/status")
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/OSBPR01MB26772D245E2CF4F26B76A989F5669@OSBPR01MB2677.jpnprd01.prod.outlook.com

show more ...


Revision tags: v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1
# fe719888 16-Dec-2020 Anand K Mistry <[email protected]>

proc: provide details on indirect branch speculation

Similar to speculation store bypass, show information about the indirect
branch speculation mode of a task in /proc/$pid/status.

For testing/ben

proc: provide details on indirect branch speculation

Similar to speculation store bypass, show information about the indirect
branch speculation mode of a task in /proc/$pid/status.

For testing/benchmarking, I needed to see whether IB (Indirect Branch)
speculation (see Spectre-v2) is enabled on a task, to see whether an
IBPB instruction should be executed on an address space switch.
Unfortunately, this information isn't available anywhere else and
currently the only way to get it is to hack the kernel to expose it
(like this change). It also helped expose a bug with conditional IB
speculation on certain CPUs.

Another place this could be useful is to audit the system when using
sanboxing. With this change, I can confirm that seccomp-enabled
process have IB speculation force disabled as expected when the kernel
command line parameter `spectre_v2_user=seccomp`.

Since there's already a 'Speculation_Store_Bypass' field, I used that
as precedent for adding this one.

[[email protected]: remove underscores from field name to workaround documentation issue]
Link: https://lkml.kernel.org/r/20201106131015.v2.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid

Link: https://lkml.kernel.org/r/20201030172731.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid
Signed-off-by: Anand K Mistry <[email protected]>
Cc: Anthony Steinhauser <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Anand K Mistry <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Alexey Gladkov <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

show more ...


Revision tags: v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9
# 86fbcd3b 05-Oct-2020 Peter Zijlstra <[email protected]>

sched/proc: Print accurate cpumask vs migrate_disable()

Ensure /proc/*/status doesn't print 'random' cpumasks due to
migrate_disable().

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>

sched/proc: Print accurate cpumask vs migrate_disable()

Ensure /proc/*/status doesn't print 'random' cpumasks due to
migrate_disable().

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Reviewed-by: Daniel Bristot de Oliveira <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

show more ...


# 3ae700ec 27-Oct-2020 Michael Weiß <[email protected]>

fs/proc: apply the time namespace offset to /proc/stat btime

'/proc/stat' provides the field 'btime' which states the time stamp of
system boot in seconds. In case of time namespaces, the offset to

fs/proc: apply the time namespace offset to /proc/stat btime

'/proc/stat' provides the field 'btime' which states the time stamp of
system boot in seconds. In case of time namespaces, the offset to the
boot time stamp was not applied earlier.
This confuses tasks which are in another time universe, e.g., in a
container of a container runtime which utilize time namespaces to
virtualize boottime.

Therefore, we make procfs to virtualize also the btime field by
subtracting the offset of the timens boottime from 'btime' before
printing the stats.

Since start_boottime of processes are seconds since boottime and the
boottime stamp is now shifted according to the timens offset, the
offset of the time namespace also needs to be applied before the
process stats are given to userspace.

This avoids that processes shown, e.g., by 'ps' appear as time
travelers in the corresponding time namespace.

Signed-off-by: Michael Weiß <[email protected]>
Reviewed-by: Andrei Vagin <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6
# c818c03b 13-May-2020 Kees Cook <[email protected]>

seccomp: Report number of loaded filters in /proc/$pid/status

A common question asked when debugging seccomp filters is "how many
filters are attached to your process?" Provide a way to easily answe

seccomp: Report number of loaded filters in /proc/$pid/status

A common question asked when debugging seccomp filters is "how many
filters are attached to your process?" Provide a way to easily answer
this question through /proc/$pid/status with a "Seccomp_filters" line.

Signed-off-by: Kees Cook <[email protected]>

show more ...


# e31cf2f4 09-Jun-2020 Mike Rapoport <[email protected]>

mm: don't include asm/pgtable.h if linux/mm.h is already included

Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset

mm: don't include asm/pgtable.h if linux/mm.h is already included

Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once. For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g. pte_alloc() and
pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are remove with a simple loop:

for f in $(git grep -l "include <linux/mm.h>") ; do
sed -i -e '/include <asm\/pgtable.h>/ d' $f
done

Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

show more ...


12345678