base.c - OpenGrok history log for /linux-6.15/fs/proc/base.c

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14
# 6287fbad	19-Mar-2025	Bart Van Assche <[email protected]>	fs/procfs: fix the comment above proc_pid_wchan() proc_pid_wchan() used to report kernel addresses to user space but that is no longer the case today. Bring the comment above proc_pid_wchan() in sy fs/procfs: fix the comment above proc_pid_wchan() proc_pid_wchan() used to report kernel addresses to user space but that is no longer the case today. Bring the comment above proc_pid_wchan() in sync with the implementation. Link: https://lkml.kernel.org/r/[email protected] Fixes: b2f73922d119 ("fs/proc, core/debug: Don't expose absolute kernel addresses via wchan") Signed-off-by: Bart Van Assche <[email protected]> Cc: Kees Cook <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# dd5bdaf2	17-Mar-2025	Ingo Molnar <[email protected]>	sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional All the big Linux distros enable CONFIG_SCHED_DEBUG, because the various features it provides help not just with kernel development, sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional All the big Linux distros enable CONFIG_SCHED_DEBUG, because the various features it provides help not just with kernel development, but with system administration and user-space software development as well. Reflect this reality and enable this functionality unconditionally. Signed-off-by: Ingo Molnar <[email protected]> Tested-by: Shrikanth Hegde <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ben Segall <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected] show more ...
Revision tags: v6.14-rc7, v6.14-rc6
# 2dc4dbf8	08-Mar-2025	Thomas Gleixner <[email protected]>	posix-timers: Dont iterate /proc/$PID/timers with sighand:: Siglock held The readout of /proc/$PID/timers holds sighand::siglock with interrupts disabled. That is required to protect against concurr posix-timers: Dont iterate /proc/$PID/timers with sighand:: Siglock held The readout of /proc/$PID/timers holds sighand::siglock with interrupts disabled. That is required to protect against concurrent modifications of the task::signal::posix_timers list because the list is not RCU safe. With the conversion of the timer storage to a RCU protected hlist, this is not longer required. The only requirement is to protect the returned entry against a concurrent free, which is trivial as the timers are RCU protected. Removing the trylock of sighand::siglock is benign because the life time of task_struct::signal is bound to the life time of the task_struct itself. There are two scenarios where this matters: 1) The process is life and not about to be checkpointed 2) The process is stopped via ptrace for checkpointing #1 is a racy snapshot of the armed timers and nothing can rely on it. It's not more than debug information and it has been that way before because sighand lock is dropped when the buffer is full and the restart of the iteration might find a completely different set of timers. The task and therefore task::signal cannot be freed as timers_start() acquired a reference count via get_pid_task(). #2 the process is stopped for checkpointing so nothing can delete or create timers at this point. Neither can the process exit during the traversal. If CRIU fails to observe an exit in progress prior to the dissimination of the timers, then there are more severe problems to solve in the CRIU mechanics as they can't rely on posix timers being enabled in the first place. Therefore replace the lock acquisition with rcu_read_lock() and switch the timer storage traversal over to seq_hlist_*_rcu(). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/all/[email protected] show more ...
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2
# 5be1fa8a	08-Dec-2024	Al Viro <[email protected]>	Pass parent directory inode and expected name to ->d_revalidate() ->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies f Pass parent directory inode and expected name to ->d_revalidate() ->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies from caller to caller. We are not guaranteed that dentry in question will not be moved right under us - not unless the filesystem is such that nothing on it ever gets renamed. It can be dealt with, but that results in boilerplate code that isn't even needed - the callers normally have just found the dentry via dcache lookup and want to verify that it's in the right place; they already have the values of ->d_parent and ->d_name stable. There is a couple of exceptions (overlayfs and, to less extent, ecryptfs), but for the majority of calls that song and dance is not needed at all. It's easier to make ecryptfs and overlayfs find and pass those values if there's a ->d_revalidate() instance to be called, rather than doing that in the instances. This commit only changes the calling conventions; making use of supplied values is left to followups. NOTE: some instances need more than just the parent - things like CIFS may need to build an entire path from filesystem root, so they need more precautions than the usual boilerplate. This series doesn't do anything to that need - these filesystems have to keep their locking mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem a-la v9fs). One thing to keep in mind when using name is that name->name will normally point into the pathname being resolved; the filename in question occupies name->len bytes starting at name->name, and there is NUL somewhere after it, but it the next byte might very well be '/' rather than '\0'. Do not ignore name->len. Reviewed-by: Jeff Layton <[email protected]> Reviewed-by: Gabriel Krisman Bertazi <[email protected]> Signed-off-by: Al Viro <[email protected]> show more ...
# 3ab76c76	10-Jan-2025	xu xin <[email protected]>	ksm: add ksm involvement information for each process In /proc/<pid>/ksm_stat, add two extra ksm involvement items including KSM_mergeable and KSM_merge_any. It helps administrators to better know ksm: add ksm involvement information for each process In /proc/<pid>/ksm_stat, add two extra ksm involvement items including KSM_mergeable and KSM_merge_any. It helps administrators to better know the system's KSM behavior at process level. ksm_merge_any: yes/no whether the process'mm is added by prctl() into the candidate list of KSM or not, and fully enabled at process level. ksm_mergeable: yes/no whether any VMAs of the process'mm are currently applicable to KSM. Purpose ======= These two items are just to improve the observability of KSM at process level, so that users can know if a certain process has enabled KSM. For example, if without these two items, when we look at /proc/<pid>/ksm_stat and there's no merging pages found, We are not sure whether it is because KSM was not enabled or because KSM did not successfully merge any pages. Although "mg" in /proc/<pid>/smaps indicate VM_MERGEABLE, it's opaque and not very obvious for non professionals. [[email protected]: wording tweaks, per David and akpm] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: xu xin <[email protected]> Acked-by: David Hildenbrand <[email protected]> Tested-by: Mario Casquero <[email protected]> Cc: Wang Yaxin <[email protected]> Cc: Yang Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
Revision tags: v6.13-rc1, v6.12, v6.12-rc7
# 6017a158	05-Nov-2024	Thomas Gleixner <[email protected]>	posix-timers: Embed sigqueue in struct k_itimer To cure the SIG_IGN handling for posix interval timers, the preallocated sigqueue needs to be embedded into struct k_itimer to prevent life time races posix-timers: Embed sigqueue in struct k_itimer To cure the SIG_IGN handling for posix interval timers, the preallocated sigqueue needs to be embedded into struct k_itimer to prevent life time races of all sorts. Now that the prerequisites are in place, embed the sigqueue into struct k_itimer and fixup the relevant usage sites. Aside of preparing for proper SIG_IGN handling, this spares an extra allocation. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected] show more ...
Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1
# cd3f8467	24-Sep-2024	Lorenzo Stoakes <[email protected]>	mm: refactor mm_access() to not return NULL mm_access() can return NULL if the mm is not found, but this is handled the same as an error in all callers, with some translating this into an -ESRCH err mm: refactor mm_access() to not return NULL mm_access() can return NULL if the mm is not found, but this is handled the same as an error in all callers, with some translating this into an -ESRCH error. Only proc_mem_open() returns NULL if no mm is found, however in this case it is clearer and makes more sense to explicitly handle the error. Additionally we take the opportunity to refactor the function to eliminate unnecessary nesting. Simplify things by simply returning -ESRCH if no mm is found - this both eliminates confusing use of the IS_ERR_OR_NULL() macro, and simplifies callers which would return -ESRCH by returning this error directly. [[email protected]: prefer neater pointer error comparison] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Lorenzo Stoakes <[email protected]> Suggested-by: Arnd Bergmann <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3
# be5498ca	03-Jun-2024	Al Viro <[email protected]>	remove pointless includes of <linux/fdtable.h> some of those used to be needed, some had been cargo-culted for no reason... Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Vir remove pointless includes of <linux/fdtable.h> some of those used to be needed, some had been cargo-culted for no reason... Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]> show more ...
# b4dba2ef	30-Aug-2024	Christian Brauner <[email protected]>	proc: store cookie in private data Store the cookie to detect concurrent seeks on directories in file->private_data. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-14-6d3e4816aa7b@k proc: store cookie in private data Store the cookie to detect concurrent seeks on directories in file->private_data. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Christian Brauner <[email protected]> show more ...
# 641bb439	09-Aug-2024	Christian Brauner <[email protected]>	fs: move FMODE_UNSIGNED_OFFSET to fop_flags This is another flag that is statically set and doesn't need to use up an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit. (1) mem_op fs: move FMODE_UNSIGNED_OFFSET to fop_flags This is another flag that is statically set and doesn't need to use up an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit. (1) mem_open() used from proc_mem_operations (2) adi_open() used from adi_fops (3) drm_open_helper(): (3.1) accel_open() used from DRM_ACCEL_FOPS (3.2) drm_open() used from (3.2.1) amdgpu_driver_kms_fops (3.2.2) psb_gem_fops (3.2.3) i915_driver_fops (3.2.4) nouveau_driver_fops (3.2.5) panthor_drm_driver_fops (3.2.6) radeon_driver_kms_fops (3.2.7) tegra_drm_fops (3.2.8) vmwgfx_driver_fops (3.2.9) xe_driver_fops (3.2.10) DRM_GEM_FOPS (3.2.11) DEFINE_DRM_GEM_DMA_FOPS (4) struct memdev sets fmode flags based on type of device opened. For devices using struct mem_fops unsigned offset is used. Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts into the open helper to ensure that the flag is always set. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jeff Layton <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]> show more ...
# 3836b31c	06-Aug-2024	Christian Brauner <[email protected]>	proc: block mounting on top of /proc/<pid>/map_files/* Entries under /proc/<pid>/map_files/* are ephemeral and may go away before the process dies. As such allowing them to be used as mount points c proc: block mounting on top of /proc/<pid>/map_files/* Entries under /proc/<pid>/map_files/* are ephemeral and may go away before the process dies. As such allowing them to be used as mount points creates the ability to leak mounts that linger until the process dies with no ability to unmount them until then. Don't allow using them as mountpoints. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Christian Brauner <[email protected]> show more ...
# 41e8149c	02-Aug-2024	Adrian Ratiu <[email protected]>	proc: add config & param to block forcing mem writes This adds a Kconfig option and boot param to allow removing the FOLL_FORCE flag from /proc/pid/mem write calls because it can be abused. The tra proc: add config & param to block forcing mem writes This adds a Kconfig option and boot param to allow removing the FOLL_FORCE flag from /proc/pid/mem write calls because it can be abused. The traditional forcing behavior is kept as default because it can break GDB and some other use cases. Previously we tried a more sophisticated approach allowing distributions to fine-tune /proc/pid/mem behavior, however that got NAK-ed by Linus [1], who prefers this simpler approach with semantics also easier to understand for users. Link: https://lore.kernel.org/lkml/CAHk-=wiGWLChxYmUA5HrT5aopZrB7_2VTa0NLZcxORgkUe5tEQ@mail.gmail.com/ [1] Cc: Doug Anderson <[email protected]> Cc: Jeff Xu <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kees Cook <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Christian Brauner <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Adrian Ratiu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]> show more ...
# ed4fb6d7	14-Aug-2024	Felix Moessbauer <[email protected]>	hrtimer: Use and report correct timerslack values for realtime tasks The timerslack_ns setting is used to specify how much the hardware timers should be delayed, to potentially dispatch multiple tim hrtimer: Use and report correct timerslack values for realtime tasks The timerslack_ns setting is used to specify how much the hardware timers should be delayed, to potentially dispatch multiple timers in a single interrupt. This is a performance optimization. Timers of realtime tasks (having a realtime scheduling policy) should not be delayed. This logic was inconsitently applied to the hrtimers, leading to delays of realtime tasks which used timed waits for events (e.g. condition variables). Due to the downstream override of the slack for rt tasks, the procfs reported incorrect (non-zero) timerslack_ns values. This is changed by setting the timer_slack_ns task attribute to 0 for all tasks with a rt policy. By that, downstream users do not need to specially handle rt tasks (w.r.t. the slack), and the procfs entry shows the correct value of "0". Setting non-zero slack values (either via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored, as stated in "man 2 PR_SET_TIMERSLACK": Timer slack is not applied to threads that are scheduled under a real-time scheduling policy (see sched_setscheduler(2)). The special handling of timerslack on rt tasks in downstream users is removed as well. Signed-off-by: Felix Moessbauer <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] show more ...
# 52dea0a1	10-Jun-2024	Thomas Gleixner <[email protected]>	posix-timers: Convert timer list to hlist No requirement for a real list. Spare a few bytes. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <frederic@kernel. posix-timers: Convert timer list to hlist No requirement for a real list. Spare a few bytes. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> show more ...
Revision tags: v6.10-rc2
# c2dc78b8	28-May-2024	Chengming Zhou <[email protected]>	mm/ksm: fix ksm_zero_pages accounting We normally ksm_zero_pages++ in ksmd when page is merged with zero page, but ksm_zero_pages-- is done from page tables side, where there is no any accessing pro mm/ksm: fix ksm_zero_pages accounting We normally ksm_zero_pages++ in ksmd when page is merged with zero page, but ksm_zero_pages-- is done from page tables side, where there is no any accessing protection of ksm_zero_pages. So we can read very exceptional value of ksm_zero_pages in rare cases, such as -1, which is very confusing to users. Fix it by changing to use atomic_long_t, and the same case with the mm->ksm_zero_pages. Link: https://lkml.kernel.org/r/[email protected] Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM") Fixes: 6080d19f0704 ("ksm: add ksm zero pages for each process") Signed-off-by: Chengming Zhou <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ran Xiaokai <[email protected]> Cc: Stefan Roesch <[email protected]> Cc: xu xin <[email protected]> Cc: Yang Yang <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3
# 47458802	20-Sep-2023	Al Viro <[email protected]>	procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() that keeps both around until struct inode is freed, making access to them safe from rcu-pathwalk Acked-by: Christian Brauner procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() that keeps both around until struct inode is freed, making access to them safe from rcu-pathwalk Acked-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]> show more ...
Revision tags: v6.6-rc2
# 267c068e	12-Sep-2023	Casey Schaufler <[email protected]>	proc: Use lsmids instead of lsm names for attrs Use the LSM ID number instead of the LSM name to identify which security module's attibute data should be shown in /proc/self/attr. The security_[gs]e proc: Use lsmids instead of lsm names for attrs Use the LSM ID number instead of the LSM name to identify which security module's attibute data should be shown in /proc/self/attr. The security_[gs]etprocattr() functions have been changed to expect the LSM ID. The change from a string comparison to an integer comparison in these functions will provide a minor performance improvement. Cc: [email protected] Signed-off-by: Casey Schaufler <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Serge Hallyn <[email protected]> Reviewed-by: Mickael Salaun <[email protected]> Reviewed-by: John Johansen <[email protected]> Signed-off-by: Paul Moore <[email protected]> show more ...
# 63993102	26-Oct-2023	Yang Li <[email protected]>	fs/proc/base.c: remove unneeded semicolon ./fs/proc/base.c:3829:2-3: Unneeded semicolon Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Li <yang fs/proc/base.c: remove unneeded semicolon ./fs/proc/base.c:3829:2-3: Unneeded semicolon Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Li <[email protected]> Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7057 Acked-by: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# 1df4bd83	23-Oct-2023	Oleg Nesterov <[email protected]>	do_io_accounting: use sig->stats_lock Rather than lock_task_sighand(), sig->stats_lock was specifically designed for this type of use. This way the "if (whole)" branch runs lockless in the likely c do_io_accounting: use sig->stats_lock Rather than lock_task_sighand(), sig->stats_lock was specifically designed for this type of use. This way the "if (whole)" branch runs lockless in the likely case. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# 23202220	23-Oct-2023	Oleg Nesterov <[email protected]>	do_io_accounting: use __for_each_thread() Rather than while_each_thread() which should be avoided when possible. This makes the code more clear and allows the next change. Link: https://lkml.kerne do_io_accounting: use __for_each_thread() Rather than while_each_thread() which should be avoided when possible. This makes the code more clear and allows the next change. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# 08582d67	09-Oct-2023	Amir Goldstein <[email protected]>	fs: create helper file_user_path() for user displayed mapped file path Overlayfs uses backing files with "fake" overlayfs f_path and "real" underlying f_inode, in order to use underlying inode aops fs: create helper file_user_path() for user displayed mapped file path Overlayfs uses backing files with "fake" overlayfs f_path and "real" underlying f_inode, in order to use underlying inode aops for mapped files and to display the overlayfs path in /proc/<pid>/maps. In preparation for storing the overlayfs "fake" path instead of the underlying "real" path in struct backing_file, define a noop helper file_user_path() that returns f_path for now. Use the new helper in procfs and kernel logs whenever a path of a mapped file is displayed to users. Signed-off-by: Amir Goldstein <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]> show more ...
# 860a2e7f	29-Sep-2023	Alexey Dobriyan <[email protected]>	proc: use initializer for clearing some buffers Save LOC by using dark magic of initialisation instead of memset(). Those buffer aren't passed to userspace directly so padding is not an issue. Lin proc: use initializer for clearing some buffers Save LOC by using dark magic of initialisation instead of memset(). Those buffer aren't passed to userspace directly so padding is not an issue. Link: https://lkml.kernel.org/r/3821d3a2-6e10-4629-b0d5-9519d828ab72@p183 Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# 200d9421	04-Oct-2023	Jeff Layton <[email protected]>	proc: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <[email protected]> Link: https://lore.kernel.org/r/20231004185347. proc: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]> show more ...
Revision tags: v6.6-rc1, v6.5
# 33a98138	24-Aug-2023	Oleg Nesterov <[email protected]>	introduce __next_thread(), fix next_tid() vs exec() race Patch series "introduce __next_thread(), change next_thread()". After commit dce8f8ed1de1 ("document while_each_thread(), change first_tid() introduce __next_thread(), fix next_tid() vs exec() race Patch series "introduce __next_thread(), change next_thread()". After commit dce8f8ed1de1 ("document while_each_thread(), change first_tid() to use for_each_thread()") + this series 1. We have only one lockless user of next_thread(), task_group_seq_get_next(). I think it should be changed too. 2. We have only one user of task_struct->thread_group, thread_group_empty(). The next patches will change thread_group_empty() and kill ->thread_group. This patch (of 2): next_tid(start) does: rcu_read_lock(); if (pid_alive(start)) { pos = next_thread(start); if (thread_group_leader(pos)) pos = NULL; else get_task_struct(pos); it should return pos = NULL when next_thread() wraps to the 1st thread in the thread group, group leader, and the thread_group_leader() check tries to detect this case. But this can race with exec. To simplify, suppose we have a main thread M and a single sub-thread T, next_tid(T) should return NULL. Now suppose that T execs. If next_tid(T) is called after T changes the leadership and before it does release_task() which removes the old leader from list, then next_thread() returns M and thread_group_leader(M) = F. Lockless use of next_thread() should be avoided. After this change only task_group_seq_get_next() does this, and I believe it should be changed as well. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
# dce8f8ed	23-Aug-2023	Oleg Nesterov <[email protected]>	document while_each_thread(), change first_tid() to use for_each_thread() Add the comment to explain that while_each_thread(g,t) is not rcu-safe unless g is stable (e.g. current). Even if g is a g document while_each_thread(), change first_tid() to use for_each_thread() Add the comment to explain that while_each_thread(g,t) is not rcu-safe unless g is stable (e.g. current). Even if g is a group leader and thus can't exit before t, t or another sub-thread can exec and remove g from the thread_group list. The only lockless user of while_each_thread() is first_tid() and it is fine in that it can't loop forever, yet for_each_thread() looks better and I am going to change while_each_thread/next_thread. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Eric W. Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> show more ...
12 3 4 5 6 7 8 9 10 >>...25