|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6 |
|
| #
ab251dac |
| 02-Jan-2025 |
Nam Cao <[email protected]> |
fs/proc: do_task_stat: Fix ESP not readable during coredump
The field "eip" (instruction pointer) and "esp" (stack pointer) of a task can be read from /proc/PID/stat. These fields can be interesting for coredump.
However, these fields were disabled by commit 0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in /proc/PID/stat"), because it is generally unsafe to do so. But it is safe for a coredumping process, and therefore exceptions were made:
- for a coredumping thread by commit fd7d56270b52 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping").
- for all other threads in a coredumping process by commit cb8f381f1613 ("fs/proc/array.c: allow reporting eip/esp for all coredumping threads").
The above two commits check the PF_DUMPCORE flag to determine a coredump thread and the PF_EXITING flag for the other threads.
Unfortunately, commit 92307383082d ("coredump: Don't perform any cleanups before dumping core") moved coredump to happen earlier and before PF_EXITING is set. Thus, checking PF_EXITING is no longer the correct way to determine threads in a coredumping process.
Instead of PF_EXITING, use PF_POSTCOREDUMP to determine the other threads.
Checking of PF_EXITING was added for coredumping, so it probably can now be removed. But it doesn't hurt to keep.
Fixes: 92307383082d ("coredump: Don't perform any cleanups before dumping core") Cc: [email protected] Cc: Eric W. Biederman <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Acked-by: Kees Cook <[email protected]> Signed-off-by: Nam Cao <[email protected]> Link: https://lore.kernel.org/r/d89af63d478d6c64cc46a01420b46fd6eb147d6f.1735805772.git.namcao@linutronix.de Signed-off-by: Christian Brauner <[email protected]>
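As a rough illustration of the flag check described above, the following userspace sketch models the decision as a plain mask test. The helper name may_report_regs and the two mask macros are hypothetical. The flag values are copied from include/linux/sched.h as found in recent kernels and should be treated as illustrative. The real do_task_stat() performs further permission checks that are not modelled here.

```c
#include <stdbool.h>

/* Task flag bits as found in include/linux/sched.h in recent kernels
 * (illustrative; verify against the tree you are reading). */
#define PF_EXITING      0x00000004u
#define PF_POSTCOREDUMP 0x00000008u
#define PF_DUMPCORE     0x00000200u

/* Hypothetical masks modelling the check before and after the fix. */
#define REGS_MASK_OLD (PF_EXITING | PF_DUMPCORE)
#define REGS_MASK_NEW (PF_EXITING | PF_DUMPCORE | PF_POSTCOREDUMP)

/* Returns true when eip/esp may be reported for a task with these flags. */
static bool may_report_regs(unsigned int task_flags, unsigned int mask)
{
	return (task_flags & mask) != 0;
}
```

Under this model, a thread that is parked for the dump and carries only PF_POSTCOREDUMP is rejected by the old mask and accepted by the new one, while the dumping thread (PF_DUMPCORE) is accepted by both.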
|
|
Revision tags: v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
4cc0473d |
| 07-Oct-2024 |
Yafang Shao <[email protected]> |
get rid of __get_task_comm()
Patch series "Improve the copy of task comm", v8.
Using {memcpy,strncpy,strcpy,kstrdup} to copy the task comm relies on the length of task comm. Changes in the task comm could result in a destination string that overflows. Therefore, we should explicitly ensure the destination string is always NUL-terminated, regardless of the task comm. This approach will facilitate future extensions to the task comm.
As suggested by Linus [0], we can identify all relevant code with the following git grep command:
  git grep 'memcpy.*->comm\>'
  git grep 'kstrdup.*->comm\>'
  git grep 'strncpy.*->comm\>'
  git grep 'strcpy.*->comm\>'

  PATCH #2~#4: memcpy
  PATCH #5~#6: kstrdup
  PATCH #7: strcpy
Please note that strncpy() is not included in this series as it is being tracked by another effort. [1]
This patch (of 7):
We want to eliminate the use of __get_task_comm() for the following reasons:
- The task_lock() is unnecessary
  Quoted from Linus [0]:
  : Since user space can randomly change their names anyway, using locking
  : was always wrong for readers (for writers it probably does make sense
  : to have some lock - although practically speaking nobody cares there
  : either, but at least for a writer some kind of race could have
  : long-term mixed results
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0] Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/ Link: https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/ [0] Link: https://github.com/KSPP/linux/issues/90 [1] Signed-off-by: Yafang Shao <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jan Kara <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Kees Cook <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Matus Jokay <[email protected]> Cc: Alejandro Colomar <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Airlie <[email protected]> Cc: Eric Paris <[email protected]> Cc: James Morris <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Ondrej Mosnacek <[email protected]> Cc: Paul Moore <[email protected]> Cc: Quentin Monnet <[email protected]> Cc: Simon Horman <[email protected]> Cc: Stephen Smalley <[email protected]> Cc: Thomas Zimmermann <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
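The goal stated above, a destination that is always NUL-terminated whatever the comm contains, can be sketched in userspace as follows. This is a minimal illustration and not the kernel helper: copy_comm is a hypothetical name, and TASK_COMM_LEN is assumed to be 16 as in current kernels.

```c
#include <stddef.h>
#include <string.h>

#define TASK_COMM_LEN 16 /* assumed value, matches current kernels */

/*
 * Copy at most TASK_COMM_LEN bytes of comm into dst and always terminate
 * dst with a NUL, even if comm itself is not terminated or dst is shorter
 * than comm. Does nothing when dst has no room at all.
 */
static void copy_comm(char *dst, size_t dst_len, const char *comm)
{
	size_t n = 0;

	if (dst_len == 0)
		return;
	while (n < TASK_COMM_LEN && n < dst_len - 1 && comm[n] != '\0')
		n++;
	memcpy(dst, comm, n);
	dst[n] = '\0';
}
```

Because the copy length is bounded by both the source and the destination size, a future change to the comm length would truncate the result instead of overflowing the destination.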
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2 |
|
| #
7601df80 |
| 23-Jan-2024 |
Oleg Nesterov <[email protected]> |
fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call do_task_stat() at the same time and the process has NR_THREADS, it will spin with irqs disabled O(NR_CPUS * NR_THREADS) time.
Change do_task_stat() to use sig->stats_lock to gather the statistics outside of the ->siglock protected section. In the likely case this code will run lockless.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Dylan Hatch <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
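The lockless read that a sequence lock such as sig->stats_lock permits can be sketched in userspace C11. This is a minimal single-writer model under stated assumptions, not the kernel's seqlock_t: the names stats, stats_init, stats_write and stats_read are hypothetical, all accesses use sequentially consistent atomics for simplicity, and any fallback to taking the lock under contention is not modelled.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical statistics block guarded by a sequence counter. */
struct stats {
	atomic_uint seq;          /* odd while a write is in progress */
	_Atomic uint64_t utime;
	_Atomic uint64_t stime;
};

static void stats_init(struct stats *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->utime, 0);
	atomic_init(&s->stime, 0);
}

/* Single writer: bump seq to odd, update the data, bump seq back to even. */
static void stats_write(struct stats *s, uint64_t utime, uint64_t stime)
{
	atomic_fetch_add(&s->seq, 1);
	atomic_store(&s->utime, utime);
	atomic_store(&s->stime, stime);
	atomic_fetch_add(&s->seq, 1);
}

/* Reader: retry until a snapshot is taken with no writer in between. */
static void stats_read(struct stats *s, uint64_t *utime, uint64_t *stime)
{
	unsigned int begin, end;

	do {
		begin = atomic_load(&s->seq);
		*utime = atomic_load(&s->utime);
		*stime = atomic_load(&s->stime);
		end = atomic_load(&s->seq);
	} while ((begin & 1u) || begin != end);
}
```

The reader never blocks the writer and never disables interrupts, which is the property the commit relies on to avoid spinning on ->siglock.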
|
| #
60f92acb |
| 23-Jan-2024 |
Oleg Nesterov <[email protected]> |
fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
Patch series "fs/proc: do_task_stat: use sig->stats_".
do_task_stat() has the same problem as getrusage() had before "getrusage: use sig->stats_lock rather than lock_task_sighand()": a hard lockup. If NR_CPUS threads call lock_task_sighand() at the same time and the process has NR_THREADS, spin_lock_irq will spin with irqs disabled O(NR_CPUS * NR_THREADS) time.
This patch (of 3):
thread_group_cputime() does its own locking, so we can safely move thread_group_cputime_adjusted(), which does another for_each_thread loop, outside of the ->siglock protected section.
Not only does this remove for_each_thread() from the critical section with irqs disabled, it also removes another case where stats_lock is taken with siglock held. We want to remove this dependency so that we can then change the users of stats_lock to not disable irqs.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Dylan Hatch <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1 |
|
| #
7904e53e |
| 09-Sep-2023 |
Oleg Nesterov <[email protected]> |
fs/proc: do_task_stat: use __for_each_thread()
do/while_each_thread should be avoided when possible.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Eric W. Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7 |
|
| #
0ee44885 |
| 13-Jun-2023 |
Rick Edgecombe <[email protected]> |
x86: Expose thread features in /proc/$PID/status
Applications and loaders can have logic to decide whether to enable shadow stack. They usually don't report whether shadow stack has been enabled or not, so there is no way to verify whether an application actually is protected by shadow stack.
Add two lines in /proc/$PID/status to report enabled and locked features.
Since this involves referring to arch specific defines in asm/prctl.h, implement an arch breakout to emit the feature lines.
[Switched to CET, added to commit log]
Co-developed-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Tested-by: Pengfei Xu <[email protected]> Tested-by: John Allen <[email protected]> Tested-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/all/20230613001108.3040476-37-rick.p.edgecombe%40intel.com
|
|
Revision tags: v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7 |
|
| #
522dc4e5 |
| 16-Apr-2023 |
Chunguang Wu <[email protected]> |
fs/proc: add Kthread flag to /proc/$pid/status
The commands `ps -ef` and `top -c` mark kernel threads with '[' and ']', but sometimes the result is not correct. The task->flags value in /proc/$pid/stat is reliable, but to use it we need to remember that the value of PF_KTHREAD is 0x00200000 and convert the decimal field to hex. With this change, even without a binary program or shell script that reads /proc/$pid/stat, we can tell directly with `cat /proc/$pid/status`.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chunguang Wu <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
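The manual procedure mentioned above (read the decimal flags field from /proc/$pid/stat and test the PF_KTHREAD bit) can be sketched in userspace like this. The helper names stat_line_flags and stat_line_is_kthread are hypothetical. The field order assumed is the one documented in proc(5), where flags is the ninth field, and the parser looks for the last ')' because the comm field may itself contain spaces and parentheses.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PF_KTHREAD 0x00200000u /* value quoted in the commit message */

/* Extract the flags field (9th field) from one /proc/<pid>/stat line. */
static bool stat_line_flags(const char *line, unsigned long *flags)
{
	const char *p = strrchr(line, ')'); /* end of the comm field */

	if (p == NULL)
		return false;
	/* Skip state, ppid, pgrp, session, tty_nr and tpgid, then read flags. */
	return sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %lu", flags) == 1;
}

/* Returns true when the line describes a kernel thread. */
static bool stat_line_is_kthread(const char *line)
{
	unsigned long flags;

	return stat_line_flags(line, &flags) && (flags & PF_KTHREAD) != 0;
}
```

With the Kthread line in /proc/$pid/status, none of this parsing is needed; the sketch only shows what the commit saves the reader from doing.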
|
|
Revision tags: v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2 |
|
| #
f7d30434 |
| 12-Mar-2023 |
Kirill A. Shutemov <[email protected]> |
mm: Expose untagging mask in /proc/$PID/status
Add a line in /proc/$PID/status to report untag_mask. It can be used to find out LAM status of the process from the outside. It is useful for debuggers.
Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Alexander Potapenko <[email protected]> Link: https://lore.kernel.org/all/20230312112612.31869-10-kirill.shutemov%40linux.intel.com
|
|
Revision tags: v6.3-rc1 |
|
| #
f122a08b |
| 28-Feb-2023 |
Linus Torvalds <[email protected]> |
capability: just use a 'u64' instead of a 'u32[2]' array
Back in 2008 we extended the capability bits from 32 to 64, and we did it by extending the single 32-bit capability word from one word to an array of two words. It was then obfuscated by hiding the "2" behind two macro expansions, with the reasoning being that maybe it gets extended further some day.
That reasoning may have been valid at the time, but the last thing we want to do is to extend the capability set any more. And the array of values not only causes source code oddities (with loops to deal with it), but also results in worse code generation. It's a lose-lose situation.
So just change the 'u32[2]' into a 'u64' and be done with it.
We still have to deal with the fact that the user space interface is designed around an array of these 32-bit values, but that was the case before too, since the array layouts were different (ie user space doesn't use an array of 32-bit values for individual capability masks, but an array of 32-bit slices of multiple masks).
So that marshalling of data is actually simplified too, even if it does remain somewhat obscure and odd.
This was all triggered by my reaction to the new "cap_isidentical()" introduced recently. By just using a saner data structure, it went from
	unsigned __capi;
	CAP_FOR_EACH_U32(__capi) {
		if (a.cap[__capi] != b.cap[__capi])
			return false;
	}
	return true;
to just being
return a.val == b.val;
instead. Which is rather more obvious both to humans and to compilers.
Cc: Mateusz Guzik <[email protected]> Cc: Casey Schaufler <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Al Viro <[email protected]> Cc: Paul Moore <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
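The marshalling described above, from one 64-bit value per capability mask to an array of 32-bit slices of several masks, can be sketched as follows. This is an illustrative userspace model: struct cap_set, struct user_cap_slice, cap_identical and caps_to_user are hypothetical names chosen for this example, not the kernel's own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a capability set stored as a single u64. */
struct cap_set {
	uint64_t val;
};

/* One 32-bit slice of each mask, as in the user space layout described. */
struct user_cap_slice {
	uint32_t effective;
	uint32_t permitted;
	uint32_t inheritable;
};

/* With a single u64, comparing two sets is one comparison. */
static bool cap_identical(struct cap_set a, struct cap_set b)
{
	return a.val == b.val;
}

/* Split three 64-bit masks into two slices: low 32 bits, then high 32 bits. */
static void caps_to_user(struct cap_set eff, struct cap_set perm,
			 struct cap_set inh, struct user_cap_slice out[2])
{
	for (int i = 0; i < 2; i++) {
		unsigned int shift = 32u * (unsigned int)i;

		out[i].effective = (uint32_t)(eff.val >> shift);
		out[i].permitted = (uint32_t)(perm.val >> shift);
		out[i].inheritable = (uint32_t)(inh.val >> shift);
	}
}
```

The split happens only at the user space boundary, so the rest of the code can work with plain 64-bit values.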
|
|
Revision tags: v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8 |
|
| #
ed8fb78d |
| 23-Jul-2022 |
Alexey Dobriyan <[email protected]> |
proc: add some (hopefully) insightful comments
* /proc/${pid}/net status
* removing PDE vs last close stuff (again!)
* random small stuff
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3 |
|
| #
376b0c26 |
| 15-Jun-2022 |
Alexey Dobriyan <[email protected]> |
proc: delete unused <linux/uaccess.h> includes
Those aren't necessary after seq files won.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
|
Revision tags: v5.19-rc2, v5.19-rc1, v5.18 |
|
| #
de399236 |
| 18-May-2022 |
Alexey Gladkov <[email protected]> |
ucounts: Split rlimit and ucount values and max values
Since the semantics of maximum rlimit values are different, it would be better not to mix ucount and rlimit values. This will prevent the error of using inc_ucount/dec_ucount for rlimit parameters.
This patch also renames the functions to emphasize the lack of connection between rlimit and ucount.
v3: - Fix BUG:KASAN:use-after-free_in_dec_ucount.
v2: - Fix the array-index-out-of-bounds that was found by the lkp project.
Reported-by: kernel test robot <[email protected]> Signed-off-by: Alexey Gladkov <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
|
|
Revision tags: v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4 |
|
| #
355f841a |
| 09-Feb-2022 |
Eric W. Biederman <[email protected]> |
tracehook: Remove tracehook.h
Now that all of the definitions have moved out of tracehook.h into ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in tracehook.h so remove it.
Update the few files that were depending upon tracehook.h to bring in definitions to use the headers they need directly.
Reviewed-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
Revision tags: v5.17-rc3, v5.17-rc2, v5.17-rc1 |
|
| #
d6986ce2 |
| 20-Jan-2022 |
Yafang Shao <[email protected]> |
kthread: dynamically allocate memory to store kthread's full name
When I was implementing a new per-cpu kthread cfs_migration, I found the comm of it "cfs_migration/%u" is truncated due to the limitation of TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 all have the same name "cfs_migration/1", which will confuse the user. This issue is not critical, because we can get the corresponding CPU from the task's Cpus_allowed. But for kthreads corresponding to other hardware devices, it is not easy to get the detailed device info from task comm, for example,
	jbd2/nvme0n1p2-
	xfs-reclaim/sdf
Currently there are so many truncated kthreads:
	rcu_tasks_kthre
	rcu_tasks_rude_
	rcu_tasks_trace
	poll_mpt3sas0_s
	ext4-rsv-conver
	xfs-reclaim/sd{a, b, c, ...}
	xfs-blockgc/sd{a, b, c, ...}
	xfs-inodegc/sd{a, b, c, ...}
	audit_send_repl
	ecryptfs-kthrea
	vfio-irqfd-clea
	jbd2/nvme0n1p2-
	...
We can shorten these names to work around this problem, but that may not be applicable to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-' for example: it is a nice name, and it is not a good idea to shorten it.
One possible way to fix this issue is extending the task comm size, but as task->comm is used in lots of places, that may cause some potential buffer overflows. Another more conservative approach is introducing a new pointer to store the kthread's full name if it is truncated, which won't introduce too much overhead as it is in the non-critical path. Finally we made the decision to use the second approach. See also the discussions in this thread: https://lore.kernel.org/lkml/[email protected]/
After this change, the full name of these truncated kthreads will be displayed via /proc/[pid]/comm:
	rcu_tasks_kthread
	rcu_tasks_rude_kthread
	rcu_tasks_trace_kthread
	poll_mpt3sas0_statu
	ext4-rsv-conversion
	xfs-reclaim/sdf1
	xfs-blockgc/sdf1
	xfs-inodegc/sdf1
	audit_send_reply
	ecryptfs-kthread
	vfio-irqfd-cleanup
	jbd2/nvme0n1p2-8
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yafang Shao <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Suggested-by: Petr Mladek <[email protected]> Suggested-by: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Michal Miroslaw <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Al Viro <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
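The truncation described above can be reproduced in userspace with a bounded format into a comm-sized buffer. This is only a sketch: format_comm is a hypothetical helper, and TASK_COMM_LEN is assumed to be 16 (15 characters plus the terminating NUL), which is consistent with the truncated names listed in the commit message.

```c
#include <stdio.h>

#define TASK_COMM_LEN 16 /* assumed: 15 characters plus the NUL */

/* Format the per-cpu kthread name into a comm-sized buffer. */
static void format_comm(char comm[TASK_COMM_LEN], unsigned int cpu)
{
	/* snprintf() truncates the output to fit and always NUL-terminates. */
	snprintf(comm, TASK_COMM_LEN, "cfs_migration/%u", cpu);
}
```

The prefix "cfs_migration/" already takes 14 characters, so only one digit of the CPU number fits. CPUs 10 through 19 therefore all end up with the same comm as CPU 1.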
|
|
Revision tags: v5.16, v5.16-rc8, v5.16-rc7 |
|
| #
2d18f7f4 |
| 21-Dec-2021 |
Eric W. Biederman <[email protected]> |
exit: Use the correct exit_code in /proc/<pid>/stat
Since do_task_stat() was modified to return process-wide values instead of per-task values, the exit_code calculation has never been updated. Update it now to return the process-wide exit_code when it is requested and available.
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: bf719d26a5c1 ("[PATCH] distinct tgid/tid CPU usage") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
Revision tags: v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4 |
|
| #
4e046156 |
| 29-Sep-2021 |
Kees Cook <[email protected]> |
proc: Use task_is_running() for wchan in /proc/$pid/stat
The implementations of get_wchan() can be expensive. The only information imparted here is whether or not a process is currently blocked in the scheduler (and even this doesn't need to be exact). Avoid doing the heavy lifting of stack walking and just report that information by using task_is_running().
Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
|
Revision tags: v5.15-rc3 |
|
| #
0258b5fd |
| 22-Sep-2021 |
Eric W. Biederman <[email protected]> |
coredump: Limit coredumps to a single thread group
Today when a signal is delivered with a handler of SIG_DFL whose default behavior is to generate a core dump not only that process but every process that shares the mm is killed.
In the case of vfork this looks like a real world problem. Consider the following well defined sequence.
	if (vfork() == 0) {
		execve(...);
		_exit(EXIT_FAILURE);
	}
If a signal that generates a core dump is received after vfork but before the execve changes the mm the process that called vfork will also be killed (as the mm is shared).
Similarly if the execve fails after the point of no return the kernel delivers SIGSEGV which will kill both the exec'ing process and because the mm is shared the process that called vfork as well.
As far as I can tell this behavior is a violation of people's reasonable expectations, POSIX, and is unnecessarily fragile when the system is low on memory.
Solve this by making a userspace visible change to only kill a single process/thread group. This is possible because Jann Horn recently modified[1] the coredump code so that the mm can safely be modified while the coredump is happening. With LinuxThreads long gone I don't expect anyone to notice this behavior change in practice.
To accomplish this move the core_state pointer from mm_struct to signal_struct, which allows different thread groups to coredump simultaneously.
In zap_threads remove the work to kill anything except for the current thread group.
v2: Remove core_state from the VM_BUG_ON_MM print to fix compile failure when CONFIG_DEBUG_VM is enabled. Reported-by: Stephen Rothwell <[email protected]>
[1] a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3") History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133 Link: https://lkml.kernel.org/r/[email protected] Reviewed-by: Kees Cook <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
Revision tags: v5.15-rc2, v5.15-rc1 |
|
| #
8d23b208 |
| 08-Sep-2021 |
Christoph Hellwig <[email protected]> |
proc: stop using seq_get_buf in proc_task_name
Use seq_escape_str and seq_printf instead of poking holes into the seq_file abstraction.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Christian Brauner <[email protected]> Cc: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
|
Revision tags: v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12 |
|
| #
d6469690 |
| 22-Apr-2021 |
Alexey Gladkov <[email protected]> |
Reimplement RLIMIT_SIGPENDING on top of ucounts
The rlimit counter is tied to uid in the user_namespace. This allows rlimit values to be specified in userns even if they are already globally exceeded by the user. However, the value of the previous user_namespaces cannot be exceeded.
Changelog
v11: * Revert most of changes to fix performance issues.
v10: * Fix memory leak on get_ucounts failure.
Signed-off-by: Alexey Gladkov <[email protected]> Link: https://lkml.kernel.org/r/df9d7764dddd50f28616b7840de74ec0f81711a8.1619094428.git.legion@kernel.org Signed-off-by: Eric W. Biederman <[email protected]>
|
|
Revision tags: v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4 |
|
| #
64bdc024 |
| 21-Mar-2021 |
Kenta Tada <[email protected]> |
seccomp: Fix CONFIG tests for Seccomp_filters
Strictly speaking, seccomp filters are only used when CONFIG_SECCOMP_FILTER. This patch fixes the condition to enable "Seccomp_filters" in /proc/$pid/status.
Signed-off-by: Kenta Tada <[email protected]> Fixes: c818c03b661c ("seccomp: Report number of loaded filters in /proc/$pid/status") Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/OSBPR01MB26772D245E2CF4F26B76A989F5669@OSBPR01MB2677.jpnprd01.prod.outlook.com
|
|
Revision tags: v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1 |
|
| #
fe719888 |
| 16-Dec-2020 |
Anand K Mistry <[email protected]> |
proc: provide details on indirect branch speculation
Similar to speculation store bypass, show information about the indirect branch speculation mode of a task in /proc/$pid/status.
For testing/benchmarking, I needed to see whether IB (Indirect Branch) speculation (see Spectre-v2) is enabled on a task, to see whether an IBPB instruction should be executed on an address space switch. Unfortunately, this information isn't available anywhere else and currently the only way to get it is to hack the kernel to expose it (like this change). It also helped expose a bug with conditional IB speculation on certain CPUs.
Another place this could be useful is to audit the system when using sandboxing. With this change, I can confirm that seccomp-enabled processes have IB speculation force-disabled as expected when the kernel command line parameter `spectre_v2_user=seccomp` is set.
Since there's already a 'Speculation_Store_Bypass' field, I used that as precedent for adding this one.
[[email protected]: remove underscores from field name to workaround documentation issue] Link: https://lkml.kernel.org/r/20201106131015.v2.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid
Link: https://lkml.kernel.org/r/20201030172731.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid Signed-off-by: Anand K Mistry <[email protected]> Cc: Anthony Steinhauser <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Anand K Mistry <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Alexey Gladkov <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kees Cook <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: NeilBrown <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
|
|
Revision tags: v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9 |
|
| #
86fbcd3b |
| 05-Oct-2020 |
Peter Zijlstra <[email protected]> |
sched/proc: Print accurate cpumask vs migrate_disable()
Ensure /proc/*/status doesn't print 'random' cpumasks due to migrate_disable().
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Reviewed-by: Daniel Bristot de Oliveira <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
| #
3ae700ec |
| 27-Oct-2020 |
Michael Weiß <[email protected]> |
fs/proc: apply the time namespace offset to /proc/stat btime
'/proc/stat' provides the field 'btime' which states the time stamp of system boot in seconds. In case of time namespaces, the offset to the boot time stamp was not applied earlier. This confuses tasks which are in another time universe, e.g., in a container of a container runtime which utilizes time namespaces to virtualize boottime.
Therefore, we make procfs also virtualize the btime field by subtracting the offset of the timens boottime from 'btime' before printing the stats.
Since the start_boottime of a process is given in seconds since boottime and the boottime stamp is now shifted according to the timens offset, the offset of the time namespace also needs to be applied before the process stats are given to userspace.
This prevents processes shown, e.g., by 'ps' from appearing as time travelers in the corresponding time namespace.
Signed-off-by: Michael Weiß <[email protected]> Reviewed-by: Andrei Vagin <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
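The arithmetic described in this commit message can be sketched in userspace C. This is only an illustration, not the kernel code: the function names `ns_btime` and `ns_start` are hypothetical, all values are whole seconds, and it is an assumption here that the offset is added to the process start time while it is subtracted from 'btime', so that the two adjustments cancel when a tool such as 'ps' sums them.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; these are not kernel APIs.
 * All arguments and results are in whole seconds.
 */

/* Boot time stamp as seen inside the time namespace: the host value
 * with the namespace's boottime offset subtracted. */
static int64_t ns_btime(int64_t host_btime, int64_t boottime_offset)
{
    return host_btime - boottime_offset;
}

/* Process start time (seconds since boot) as seen inside the namespace.
 * Assumption: the same offset is added here, mirroring the subtraction
 * applied to btime. */
static int64_t ns_start(int64_t start_since_boot, int64_t boottime_offset)
{
    return start_since_boot + boottime_offset;
}
```

With both adjustments in place, `ns_btime(b, off) + ns_start(s, off)` equals `b + s` for any offset, so the wall-clock start time computed from the two fields is the same inside and outside the namespace.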
|
|
Revision tags: v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6 |
|
| #
c818c03b |
| 13-May-2020 |
Kees Cook <[email protected]> |
seccomp: Report number of loaded filters in /proc/$pid/status
A common question asked when debugging seccomp filters is "how many filters are attached to your process?" Provide a way to easily answer this question through /proc/$pid/status with a "Seccomp_filters" line.
Signed-off-by: Kees Cook <[email protected]>
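As a rough illustration of how a userspace tool might consume the new line, the sketch below scans status-style text ("Key:<tab>value" lines) for a named field. The helper `status_field` is hypothetical and not part of any kernel or libc API; reading the text from /proc/$pid/status is left to the caller.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or libc API): scan status-style text,
 * i.e. "Key:\tvalue" lines, and store the numeric value of `key` in *out.
 * Returns 0 on success, -1 if the key is absent or has no numeric value.
 */
static int status_field(const char *text, const char *key, long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Require the colon right after the key so that "Seccomp"
         * does not match the "Seccomp_filters" line. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            const char *val = line + klen + 1;
            char *end;
            long v = strtol(val, &end, 10); /* skips the leading tab */

            if (end == val)
                return -1;                  /* no digits after the colon */
            *out = v;
            return 0;
        }
        line = strchr(line, '\n');          /* advance to the next line */
        if (line)
            line++;
    }
    return -1;
}
```

Given made-up sample text such as "Seccomp:\t2\nSeccomp_filters:\t3\n", calling `status_field(text, "Seccomp_filters", &n)` returns 0 and sets `n` to 3.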
|
| #
e31cf2f4 |
| 09-Jun-2020 |
Mike Rapoport <[email protected]> |
mm: don't include asm/pgtable.h if linux/mm.h is already included
Patch series "mm: consolidate definitions of page table accessors", v2.
The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definitions of pgd_offset() for 25 supported architectures.
Most of these definitions are actually identical and typically it boils down to, e.g.
    static inline unsigned long pmd_index(unsigned long address)
    {
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }
These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
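To make the index computation concrete, here is a small userspace sketch of pmd_index() with illustrative constants. The values PMD_SHIFT = 21 and PTRS_PER_PMD = 512 are assumptions chosen for the example (they correspond to a layout with 2 MiB per PMD entry and 512 entries per table); the real values are architecture and configuration specific.

```c
/* Illustrative constants, assumed for this example only. */
#define PMD_SHIFT    21   /* each PMD entry covers 2^21 bytes (2 MiB) */
#define PTRS_PER_PMD 512  /* entries per PMD table; a power of two    */

/* Same expression as in the commit message: drop the low PMD_SHIFT bits,
 * then keep only the bits that select an entry within one PMD table. */
static unsigned long pmd_index(unsigned long address)
{
    return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}
```

Under these assumed constants, address 0x40200000 shifts down to 513, and masking with 511 wraps it to index 1 of the table.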
For architectures that really need a custom version there is always the possibility of overriding the generic version with the usual ifdef magic.
These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header.
This patch (of 12):
The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>.
The include statements in such cases are removed with a simple loop:
    for f in $(git grep -l "include <linux/mm.h>") ; do
        sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done
Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Ungerer <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nick Hu <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
|