Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7
ec2d0c04 | 11-Mar-2025 | Thomas Gleixner <[email protected]>
posix-timers: Provide a mechanism to allocate a given timer ID
Checkpoint/Restore in Userspace (CRIU) requires reconstructing posix timers with the same timer ID on restore. It uses sys_timer_create() and relies on the monotonically increasing timer ID provided by this syscall: it creates and deletes timers until the desired ID is reached. This can loop for a long time when the checkpointed process had a very sparse timer ID range.
Implementing a new syscall that allows the creation of timers with a given timer ID has been debated, but that's tedious due to the 32/64-bit compat issues of sigevent_t and of dubious value.
The restore mechanism of CRIU creates the timers in a state where all threads of the restored process are held on a barrier and cannot issue syscalls. That means the restorer task has exclusive control.
This allows addressing the issue with a prctl(), so that the restorer thread can do:
	if (prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_ON))
		goto linear_mode;
	create_timers_with_explicit_ids();
	prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_OFF);
This is backwards compatible because the prctl() fails on older kernels and CRIU can fall back to the linear timer ID mechanism. CRIU versions which do not know about the prctl() just work as before.
Implement the prctl() and modify timer_create() so that it copies the requested timer ID from userspace by utilizing the existing timer_t pointer, which is used to copy out the allocated timer ID on success.
If the prctl() is disabled, which it is by default, timer_create() works as before and does not try to read from the userspace pointer.
There is no problem when a broken or rogue user space application enables the prctl(). If the user space pointer does not contain a valid ID, then timer_create() fails. If the data is not initialized but contains a random valid ID, timer_create() will create that random timer ID, or fail if the ID is already given out. As CRIU must use the raw syscall to avoid manipulating the internal state of the restored process, this has no library dependencies and can be adopted by CRIU right away.
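A minimal user-space model of the ID choice described above may help. It assumes a tiny fixed ID space and uses hypothetical names (sk_id_space, sk_alloc_id, sk_linear_restore); the -EINVAL/-EBUSY return codes are assumptions of the sketch, not necessarily what the kernel returns. It contrasts the old create/delete loop with the explicit-ID restore mode.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SK_MAX_IDS 4096 /* assumption: tiny ID space for the sketch */

/* Hypothetical per-process ID state; not a kernel data structure. */
struct sk_id_space {
	bool used[SK_MAX_IDS];
	int next_id;      /* cursor for the default, monotonic mode */
	bool restore_ids; /* models PR_TIMER_CREATE_RESTORE_IDS_ON/_OFF */
};

/*
 * Models timer_create()'s ID choice. In restore mode, 'requested' stands
 * in for the value copied from the user's timer_t pointer.
 */
static int sk_alloc_id(struct sk_id_space *s, int requested)
{
	if (s->restore_ids) {
		if (requested < 0 || requested >= SK_MAX_IDS)
			return -EINVAL; /* garbage in the user buffer */
		if (s->used[requested])
			return -EBUSY;  /* ID already given out (errno assumed) */
		s->used[requested] = true;
		return requested;
	}
	for (int i = 0; i < SK_MAX_IDS; i++) {
		int id = (s->next_id + i) % SK_MAX_IDS;
		if (!s->used[id]) {
			s->used[id] = true;
			s->next_id = id + 1;
			return id;
		}
	}
	return -EAGAIN;
}

/* The old approach: create and delete until the wanted ID comes up. */
static long sk_linear_restore(struct sk_id_space *s, int wanted)
{
	assert(!s->restore_ids);
	for (long attempts = 1; attempts <= SK_MAX_IDS; attempts++) {
		int id = sk_alloc_id(s, 0);
		if (id == wanted)
			return attempts;
		if (id >= 0)
			s->used[id] = false; /* models timer_delete() */
	}
	return -1;
}
```

Reaching ID 100 linearly takes 101 create calls in this model, while restore mode takes one.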
Recreating two timers with IDs 1000000 and 2000000 takes 1.5 seconds with the create/delete method. With the prctl() it takes 3 microseconds.
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Tested-by: Cyrill Gorcunov <[email protected]> Link: https://lore.kernel.org/all/87jz8vz0en.ffs@tglx
Revision tags: v6.14-rc6
5fa75a43 | 08-Mar-2025 | Thomas Gleixner <[email protected]>
posix-timers: Avoid false cacheline sharing
struct k_itimer has the hlist_node, which is used for lookup in the hash bucket, and the timer lock in the same cache line.
That's obviously bad if one CPU fiddles with a timer while another CPU walks the hash bucket on which that timer is queued.
Avoid this by restructuring struct k_itimer so that the read-mostly fields (only modified during setup and teardown) are in the first cache line, and the lock and the rest of the fields which get written to are in cache lines 2-N.
This reduces cacheline contention in a test case of 64 processes, each creating and accessing 20000 timers, by almost 30% according to perf.
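A rough C11 illustration of the layout idea, assuming 64-byte cache lines. struct sk_itimer and its fields are hypothetical stand-ins, not the real struct k_itimer; _Alignas plays the role of the kernel's cacheline alignment annotations, and the static assertions document the intended separation.

```c
#include <assert.h>
#include <stddef.h>

#define SK_CACHELINE 64 /* assumption: 64-byte cache lines */

struct sk_hnode { struct sk_hnode *next, **pprev; };

/* Hypothetical, heavily reduced layout; not the real struct k_itimer. */
struct sk_itimer {
	/* Cache line 1: read-mostly, written only at setup and teardown. */
	struct sk_hnode t_hash;   /* walked by lookups in the hash bucket */
	int id;
	void *signal;
	/* Cache lines 2..N: fields written while the timer is in use. */
	_Alignas(SK_CACHELINE) int lock;
	long overrun;
	unsigned int signal_seq;
};

/* Lookups touching t_hash no longer share a cache line with the lock. */
_Static_assert(offsetof(struct sk_itimer, lock) % SK_CACHELINE == 0,
	       "lock must start a new cache line");
_Static_assert(offsetof(struct sk_itimer, t_hash) / SK_CACHELINE !=
	       offsetof(struct sk_itimer, lock) / SK_CACHELINE,
	       "hash linkage and lock must not share a cache line");
```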
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/all/[email protected]
1d25bdd3 | 10-Mar-2025 | Thomas Gleixner <[email protected]>
posix-timers: Rework timer removal
sys_timer_delete() and the do_exit() cleanup function itimer_delete() are doing the same thing, but have needlessly different implementations instead of sharing the code.
The other oddity of timer deletion is the fact that the timer is not invalidated before the actual deletion happens, which allows concurrent lookups to succeed.
That's wrong because a timer which is in the process of being deleted should not be visible, and actions like signal queueing, delivery and rearming should not happen once the task that invoked timer_delete() has the timer locked.
Rework the code so that:
1) The signal queueing and delivery code ignores timers which are marked invalid
2) The deletion implementation between sys_timer_delete() and itimer_delete() is shared
3) The timer is invalidated and removed from the linked lists before the deletion callback of the relevant clock is invoked.
That requires reworking timer_wait_running(), as it looks up the timer again when relocking it at the end. In case of deletion this lookup would fail due to the preceding invalidation and the wait loop would terminate prematurely.
But due to the preceding invalidation the timer cannot be accessed by other tasks anymore, so there is no way that the timer has been freed after the timer lock has been dropped.
Move the re-validation out of timer_wait_running() and handle it at the only other usage site, timer_settime().
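The ordering in point 3) can be sketched as a single-threaded model. All names (sk_timer, sk_lookup, sk_delete_common and so on) are hypothetical, locking is omitted, and the assert in sk_clock_delete() only documents that the timer is already invisible when the clock-specific callback runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_NTIMERS 8

/* Hypothetical model; names are illustrative, not kernel symbols. */
struct sk_timer {
	int id;
	bool valid;        /* visible to lookups and to the signal code */
	bool linked;       /* on the hash and per-process lists */
	int del_callbacks; /* times the clock-specific delete ran */
};

static struct sk_timer sk_table[SK_NTIMERS];

static struct sk_timer *sk_lookup(int id)
{
	for (int i = 0; i < SK_NTIMERS; i++) {
		struct sk_timer *t = &sk_table[i];
		if (t->linked && t->valid && t->id == id)
			return t;
	}
	return NULL;
}

/* Point 1): signal queueing and delivery ignore invalidated timers. */
static bool sk_may_queue_signal(const struct sk_timer *t)
{
	return t->valid;
}

static void sk_clock_delete(struct sk_timer *t)
{
	/* By now no other path can find the timer. */
	assert(!t->valid && !t->linked);
	t->del_callbacks++;
}

/* Point 2): shared by the syscall and the exit path. */
static void sk_delete_common(struct sk_timer *t)
{
	t->valid = false;   /* invalidate first: lookups now fail */
	t->linked = false;  /* then remove from the lists */
	sk_clock_delete(t); /* only then run the clock's delete callback */
}

static int sk_sys_timer_delete(int id)
{
	struct sk_timer *t = sk_lookup(id);

	if (!t)
		return -1;
	sk_delete_common(t);
	return 0;
}

static void sk_exit_itimers(void)
{
	for (int i = 0; i < SK_NTIMERS; i++)
		if (sk_table[i].linked)
			sk_delete_common(&sk_table[i]);
}
```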
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/all/87zfht1exf.ffs@tglx
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7
7a66f72b | 05-Nov-2024 | Thomas Gleixner <[email protected]>
posix-timers: Cleanup SIG_IGN workaround leftovers
Now that ignored posix timer signals are requeued and the timers are rearmed on signal delivery, the workaround to keep such timers alive and self-rearm them is no longer required.
Remove the relevant hacks and the no longer required return values from the related functions. The alarm timer workarounds will be cleaned up in a separate step.
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
df7a996b | 05-Nov-2024 | Thomas Gleixner <[email protected]>
signal: Queue ignored posixtimers on ignore list
Queue posixtimers which have their signal ignored on the ignored list:
1) When the timer fires and the signal has SIG_IGN set
2) When SIG_IGN is installed via sigaction() and a timer signal is already queued
This only happens when the signal is for a valid timer, which delivered the signal in periodic mode. One-shot timer signals are correctly dropped.
Due to the lock order constraints (sighand::siglock nests inside timer::lock) the signal code cannot access any of the timer fields which are relevant to make this decision, e.g. timer::it_status.
This is addressed by establishing a protection scheme which requires holding both locks on the timer side when modifying decision fields in the timer struct, and therefore makes it possible for the signal delivery code to evaluate them with only sighand::siglock held:
1) Move the NULLification of timer->it_signal into the sighand::siglock protected section of timer_delete() and check timer::it_signal in the code path which determines whether the signal is dropped or queued on the ignore list.
This ensures that a deleted timer cannot be moved onto the ignore list, which would prevent it from being freed on exit() as it is no longer in the process' posix timer list.
If the timer got moved to the ignored list before deletion then it is removed from the ignored list under sighand lock in timer_delete().
2) Provide a new timer::it_sig_periodic flag, which gets set in the signal queue path with both timer and sighand locks held if the timer is actually in periodic mode at expiry time.
The ignore list code checks this flag under sighand::siglock and drops the signal when it is not set.
If it is set, then the signal is moved to the ignored list independent of the actual state of the timer.
When the signal is un-ignored later, the signal is moved back to the signal queue. On signal delivery the posix timer side decides, based on the signal sequence counter check, whether to drop the signal because the timer was re-armed, disarmed or deleted.
If the thread/process exits, not yet delivered signals are discarded, which drops the reference to the timer containing the sigqueue and thereby frees the timer.
This is way cheaper than requiring all code paths to lock sighand::siglock of the target thread/process on any modification of timer::it_status or going all the way and removing pending signals from the signal queues on every rearm, disarm or delete operation.
So the protection scheme here is that on the timer side both timer::lock and sighand::siglock have to be held for modifying
	timer::it_signal
	timer::it_sig_periodic
which means that on the signal side holding sighand::siglock is enough to evaluate these fields. In posixtimer_deliver_signal() holding timer::lock is sufficient to do the sequence validation against timer::it_signal_seq because a concurrent expiry is waiting on timer::lock to be released.
This completes the SIG_IGN handling, and such timers are no longer self-rearmed, which avoids pointless wakeups.
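The signal-side decision can be summarised as a small function. This is a sketch under assumptions: struct sk_timer only mirrors the two fields protected by both locks, the enum and function names are invented for the example, and the locking itself is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sk_action { SK_QUEUE, SK_IGNORE_LIST, SK_DROP };

/* Hypothetical view of the two fields guarded by both locks. */
struct sk_timer {
	void *it_signal;      /* NULL once timer_delete() ran */
	bool it_sig_periodic; /* periodic at expiry when the signal was queued */
};

/*
 * Decision taken on the signal side with only sighand::siglock held.
 * Non-ignored signals are queued; a stale one is weeded out later by
 * the sequence check at delivery time.
 */
static enum sk_action sk_timer_signal_action(const struct sk_timer *t,
					     bool sig_ignored)
{
	if (!sig_ignored)
		return SK_QUEUE;
	if (!t->it_signal)
		return SK_DROP;    /* deleted: must not reach the ignore list */
	if (!t->it_sig_periodic)
		return SK_DROP;    /* one-shot timer signals are dropped */
	return SK_IGNORE_LIST;     /* requeued when SIG_IGN is lifted */
}
```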
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
0e20cd33 | 05-Nov-2024 | Thomas Gleixner <[email protected]>
posix-timers: Handle ignored list on delete and exit
To handle posix timer signals on sigaction(SIG_IGN) properly, the timers will be queued on a separate ignored list.
Add the necessary cleanup code for timer_delete() and exit_itimers().
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
647da5f7 | 05-Nov-2024 | Thomas Gleixner <[email protected]>
posix-timers: Move sequence logic into struct k_itimer
The posix timer signal handling uses siginfo::si_sys_private for handling the sequence counter check. That indirection is no longer required, and the sequence count value at signal queueing time can be stored in struct k_itimer itself.
This removes the requirement of treating siginfo::si_sys_private special as it's now always zero as the kernel does not touch it anymore.
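A sketch of the sequence check with the snapshot kept in the timer instead of siginfo::si_sys_private. The field names signal_seq and queued_seq and the helper functions are hypothetical; the model ignores locking and overrun accounting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the sequence check; names are illustrative. */
struct sk_timer {
	bool exists;             /* still present in the hash table */
	unsigned int signal_seq; /* bumped on rearm, disarm and delete */
	unsigned int queued_seq; /* snapshot taken when the signal is queued */
};

static void sk_queue_signal(struct sk_timer *t)
{
	t->queued_seq = t->signal_seq; /* stored in the timer, not in siginfo */
}

static void sk_rearm(struct sk_timer *t)
{
	t->signal_seq++;               /* makes any pending signal stale */
}

/* Returns true if the pending signal should be delivered. */
static bool sk_deliver_signal(const struct sk_timer *t)
{
	if (!t->exists)
		return false;
	return t->queued_seq == t->signal_seq;
}
```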
Suggested-by: Eric W. Biederman <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Link: https://lore.kernel.org/all/[email protected]
6017a158 | 05-Nov-2024 | Thomas Gleixner <[email protected]>
posix-timers: Embed sigqueue in struct k_itimer
To cure the SIG_IGN handling for posix interval timers, the preallocated sigqueue needs to be embedded into struct k_itimer to prevent lifetime races of all sorts.
Now that the prerequisites are in place, embed the sigqueue into struct k_itimer and fix up the relevant usage sites.
Aside from preparing for proper SIG_IGN handling, this spares an extra allocation.
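With the sigqueue embedded, code that only holds the sigqueue can get back to its timer through pointer arithmetic instead of a separate allocation. The sketch below uses a local stand-in for the kernel's container_of() and hypothetical structure names.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's container_of() macro. */
#define sk_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sk_sigqueue {
	struct sk_sigqueue *next; /* pending-list linkage */
	int signo;
};

/* Hypothetical timer with the sigqueue embedded instead of allocated. */
struct sk_timer {
	int id;
	struct sk_sigqueue sigq;  /* lives and dies with the timer */
};

/* The signal code only sees the sigqueue; recover the owning timer. */
static struct sk_timer *sk_sigq_to_timer(struct sk_sigqueue *q)
{
	return sk_container_of(q, struct sk_timer, sigq);
}
```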
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
11629b98 | 05-Nov-2024 | Thomas Gleixner <[email protected]>
signal: Replace resched_timer logic
In preparation for handling ignored posix timer signals correctly and embedding the sigqueue struct into struct k_itimer, hand down a pointer to the sigqueue struct into posix_timer_deliver_signal() instead of just having a boolean flag.
No functional change.
Suggested-by: Eric W. Biederman <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Link: https://lore.kernel.org/all/[email protected]
0360ed14 | 05-Nov-2024 | Thomas Gleixner <[email protected]>
signal: Refactor send_sigqueue()
To handle posix timers which have their signal ignored via SIG_IGN properly, it is required to requeue an ignored signal for delivery when SIG_IGN is lifted, so that the timer gets rearmed.
Split the required code out of send_sigqueue() so it can be reused in context of sigaction().
While at it, rename send_sigqueue() to posixtimer_send_sigqueue() so it's clear what this is about.
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
ef1c5bcd | 05-Nov-2024 | Thomas Gleixner <[email protected]>
posix-timers: Store PID type in the timer
instead of re-evaluating the signal delivery mode everywhere.
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
54f1dd64 | 05-Nov-2024 | Thomas Gleixner <[email protected]>
signal: Provide posixtimer_sigqueue_init()
To cure the SIG_IGN handling for posix interval timers, the preallocated sigqueue needs to be embedded into struct k_itimer to prevent lifetime races of all sorts.
Provide a new function to initialize the embedded sigqueue to prepare for that.
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
5d916a09 | 05-Nov-2024 | Thomas Gleixner <[email protected]>
posix-timers: Add a refcount to struct k_itimer
To cure the SIG_IGN handling for posix interval timers, the preallocated sigqueue needs to be embedded into struct k_itimer to prevent lifetime races of all sorts.
To make that work correctly it needs reference counting, so that timer deletion does not free the timer prematurely when there is a signal queued or delivered concurrently.
Add a rcuref to the posix timer part.
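The lifetime rule can be modelled with a plain C11 atomic counter standing in for the rcuref: the creator holds one reference, a queued signal holds another, and whoever drops the last one frees the timer. Names are hypothetical and RCU semantics are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

static int sk_frees; /* counts timers actually freed, for illustration */

struct sk_timer {
	atomic_int ref;  /* stands in for the rcuref in struct k_itimer */
	bool sig_queued; /* embedded sigqueue is on a pending list */
};

static struct sk_timer *sk_timer_create(void)
{
	struct sk_timer *t = calloc(1, sizeof(*t));

	if (t)
		atomic_init(&t->ref, 1); /* the creator's reference */
	return t;
}

static void sk_timer_put(struct sk_timer *t)
{
	if (atomic_fetch_sub(&t->ref, 1) == 1) { /* dropped the last one */
		free(t);
		sk_frees++;
	}
}

static void sk_queue_signal(struct sk_timer *t)
{
	atomic_fetch_add(&t->ref, 1); /* a pending signal pins the timer */
	t->sig_queued = true;
}

static void sk_dequeue_signal(struct sk_timer *t)
{
	t->sig_queued = false;
	sk_timer_put(t);              /* frees the timer if already deleted */
}

static void sk_timer_delete(struct sk_timer *t)
{
	sk_timer_put(t);              /* drop the creator's reference */
}
```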
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
4cf7bf2a | 05-Nov-2024 | Thomas Gleixner <[email protected]>
posix-cpu-timers: Use dedicated flag for CPU timer nanosleep
POSIX CPU timer nanosleep creates a k_itimer on stack and uses the sigq pointer to detect the nanosleep case in the expiry function.
Prepare for embedding sigqueue into struct k_itimer by using a dedicated flag for nanosleep.
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
bf635681 | 05-Nov-2024 | Thomas Gleixner <[email protected]>
posix-cpu-timers: Cleanup the firing logic
The firing flag of a posix CPU timer is tristate:
0: when the timer is not about to deliver a signal
1: when the timer has expired, but the signal has not been delivered yet
-1: when the timer was queued for signal delivery and a rearm operation raced against it and suppressed the signal delivery.
This is a pointless exercise as this can be simply expressed with a boolean. Only if set, the signal is delivered. This makes delete and rearm consistent with the rest of the posix timers.
Convert firing to bool, fix up the usage sites accordingly and add comments explaining why the timer cannot be dequeued right away.
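With a boolean, a racing rearm simply clears the flag and the pending delivery is skipped. A minimal model with hypothetical names and no locking:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU timer state after the conversion to a boolean. */
struct sk_cpu_timer {
	bool firing;      /* expired and queued for signal delivery */
	int signals_sent; /* counts delivered signals in this model */
};

/* Expiry collection marks the timer as about to fire. */
static void sk_collect_expired(struct sk_cpu_timer *t)
{
	t->firing = true;
}

/* A racing rearm or delete suppresses the pending delivery. */
static void sk_rearm(struct sk_cpu_timer *t)
{
	t->firing = false;
}

/* Deliver only if nobody cleared the flag in the meantime. */
static void sk_fire(struct sk_cpu_timer *t)
{
	if (t->firing) {
		t->firing = false;
		t->signals_sent++;
	}
}
```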
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/all/[email protected]
513793bc | 05-Nov-2024 | Thomas Gleixner <[email protected]>
posix-timers: Make signal delivery consistent
Timers which are reprogrammed, disarmed or deleted can deliver signals related to the past. The POSIX spec is blurry about this:
- "The effect of disarming or resetting a timer with pending expiration notifications is unspecified."
- "The disposition of pending signals for the deleted timer is unspecified."
In both cases it is reasonable to expect that pending signals are discarded. Especially in the reprogramming case it does not make sense to account for previous overruns or to deliver a signal for a timer which has been disarmed. This makes the behaviour consistent and understandable.
Remove the si_sys_private check from the signal delivery code and invoke posix_timer_deliver_signal() unconditionally for posix timer related signals.
Change posix_timer_deliver_signal() so it controls the actual signal delivery via the return value. It now instructs the signal code to drop the signal when:
1) The timer no longer exists in the hash table
2) The timer signal_seq value is not the same as the si_sys_private value which was set when the signal was queued.
This is also a preparatory change to embed the sigqueue into the k_itimer structure, which in turn allows removing the si_sys_private magic.
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/all/[email protected]
Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2
1550dde8 | 01-Oct-2024 | Thomas Gleixner <[email protected]>
posix-timers: Add proper state tracking
Right now the state tracking is done by two struct members:
- it_active: A boolean which tracks armed/disarmed state
- it_signal_seq: A sequence counter which is used to invalidate settings and prevent rearming
Replace it_active with it_status and track the states properly in one place.
This allows reusing it_signal_seq to track reprogramming, disarm and delete operations in order to drop signals which relate to the state prior to those operations.
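A hypothetical sketch of a status field combined with the sequence counter. The state names, including a requeue-pending state for a periodic timer waiting for signal delivery, are assumptions for illustration and may not match the kernel's enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names; the kernel's enum may differ. */
enum sk_status { SK_DISARMED, SK_ARMED, SK_REQUEUE_PENDING };

struct sk_timer {
	enum sk_status status;   /* replaces a plain armed/disarmed boolean */
	unsigned int signal_seq; /* bumped by reprogram, disarm and delete */
};

static void sk_settime(struct sk_timer *t, bool arm)
{
	t->signal_seq++; /* signals from the previous setting become stale */
	t->status = arm ? SK_ARMED : SK_DISARMED;
}

static void sk_delete(struct sk_timer *t)
{
	t->signal_seq++;
	t->status = SK_DISARMED;
}

/* Periodic expiry: wait for signal delivery before rearming. */
static void sk_expire_periodic(struct sk_timer *t)
{
	if (t->status == SK_ARMED)
		t->status = SK_REQUEUE_PENDING;
}
```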
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
cd1e93ae | 01-Oct-2024 | Thomas Gleixner <[email protected]>
posix-timers: Rename k_itimer::it_requeue_pending
Prepare for using this struct member to do a proper reprogramming and deletion accounting so that stale signals can be dropped.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
c775ea28 | 01-Oct-2024 | Thomas Gleixner <[email protected]>
signal: Allow POSIX timer signals to be dropped
If a timer was reprogrammed or deleted, an already pending signal is obsolete. Right now such signals are kept around and eventually delivered. While POSIX is blurry about this:
- "The effect of disarming or resetting a timer with pending expiration notifications is unspecified."
- "The disposition of pending signals for the deleted timer is unspecified."
it is reasonable in both cases to expect that pending signals are discarded as they have no meaning anymore.
Prepare the signal code to allow dropping posix timer signals.
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
68f99be2 | 01-Oct-2024 | Thomas Gleixner <[email protected]>
signal: Confine POSIX_TIMERS properly
Move the itimer rearming out of the signal code and consolidate all posix timer related functions in the signal code under one ifdef.
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/[email protected]
Revision tags: v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4
52dea0a1 | 10-Jun-2024 | Thomas Gleixner <[email protected]>
posix-timers: Convert timer list to hlist
No requirement for a real list. Spare a few bytes.
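The saving comes from the head: an hlist head is a single pointer instead of two, while nodes can still unlink themselves through pprev without knowing the head. The structures below are minimal local re-implementations for illustration, not the kernel's list helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal local re-implementations; not the kernel's definitions. */
struct sk_list_head { struct sk_list_head *next, *prev; };
struct sk_hlist_head { struct sk_hlist_node *first; };
struct sk_hlist_node { struct sk_hlist_node *next, **pprev; };

static void sk_hlist_add_head(struct sk_hlist_node *n, struct sk_hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink without access to the head: pprev points at whatever links to n. */
static void sk_hlist_del(struct sk_hlist_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}
```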
Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]>
Revision tags: v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6
53d31ba8 | 11-Dec-2023 | Kent Overstreet <[email protected]>
posix-cpu-timers: Split out posix-timers_types.h
Trimming down sched.h dependencies: we don't want to include more than the base types.
Cc: Thomas Gleixner <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
Revision tags: v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3
f7abf14f | 17-Apr-2023 | Thomas Gleixner <[email protected]>
posix-cpu-timers: Implement the missing timer_wait_running callback
For some unknown reason the introduction of the timer_wait_running callback failed to fix up posix CPU timers, which went unnoticed for almost four years. Marco recently reported that the WARN_ON() in timer_wait_running() triggers with a posix CPU timer test case.
Posix CPU timers have two execution models for expiring timers depending on CONFIG_POSIX_CPU_TIMERS_TASK_WORK:
1) If not enabled, the expiry happens in hard interrupt context so spin waiting on the remote CPU is reasonably time bound.
Implement an empty stub function for that case.
2) If enabled, the expiry happens in task work before returning to user space or guest mode. The expired timers are marked as firing and moved from the timer queue to a local list head with sighand lock held. Once the timers are moved, sighand lock is dropped and the expiry happens in fully preemptible context. That means the expiring task can be scheduled out, migrated, interrupted etc. So spin waiting on it is more than suboptimal.
The timer wheel has a timer_wait_running() mechanism for RT, which uses a per CPU timer-base expiry lock which is held by the expiry code and the task waiting for the timer function to complete blocks on that lock.
This does not work in the same way for posix CPU timers as there is no timer base and expiry for process wide timers can run on any task belonging to that process, but the concept of waiting on an expiry lock can be used too in a slightly different way:
- Add a mutex to struct posix_cputimers_work. This struct is per task and used to schedule the expiry task work from the timer interrupt.
- Add a task_struct pointer to struct cpu_timer which is used to store the task which runs the expiry. That's filled in when the task moves the expired timers to the local expiry list. That does not affect the size of the k_itimer union, as there are bigger union members already.
- Let the task take the expiry mutex around the expiry function
- Let the waiter acquire a task reference with rcu_read_lock() held and block on the expiry mutex
This avoids spin-waiting on a task which might not even be on a CPU and works nicely for RT too.
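The wait-on-mutex idea can be illustrated with POSIX threads. This is a user-space analogy under assumptions: the struct and function names are hypothetical, the task reference taken under rcu_read_lock() is not modelled, and a real waiter would only block after observing that the timer is firing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-task expiry state, loosely following the commit. */
struct sk_expiry {
	pthread_mutex_t mutex; /* held by the expiring task around expiry */
	int expired;           /* stands in for the work of firing timers */
};

/* Expiring task: runs in preemptible context and holds the mutex. */
static void *sk_expiry_task(void *arg)
{
	struct sk_expiry *e = arg;

	pthread_mutex_lock(&e->mutex);
	e->expired++;
	pthread_mutex_unlock(&e->mutex);
	return NULL;
}

/*
 * Waiter: instead of spinning on a task that may be scheduled out, block
 * on its expiry mutex; once acquired, the expiry function has finished.
 */
static void sk_timer_wait_running(struct sk_expiry *e)
{
	pthread_mutex_lock(&e->mutex);
	pthread_mutex_unlock(&e->mutex);
}
```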
Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT") Reported-by: Marco Elver <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Marco Elver <[email protected]> Tested-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx
Revision tags: v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2
8ca07e17 | 28-Jan-2022 | Eric W. Biederman <[email protected]>
task_work: Remove unnecessary include from posix_timers.h
Break a header file circular dependency by removing the unnecessary include of task_work.h from posix_timers.h.
	sched.h        -> posix-timers.h
	posix-timers.h -> task_work.h
	task_work.h    -> sched.h
Add missing includes of task_work.h to:
	arch/x86/mm/tlb.c
	kernel/time/posix-cpu-timers.c
Reviewed-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Eric W. Biederman" <[email protected]>
Revision tags: v5.17-rc1, v5.16
18c91bb2 | 06-Jan-2022 | Barret Rhoden <[email protected]>
prlimit: do not grab the tasklist_lock
Unnecessarily grabbing the tasklist_lock can be a scalability bottleneck for workloads that also must grab the tasklist_lock for waiting, killing, and cloning.
The tasklist_lock was grabbed to protect tsk->sighand from disappearing (becoming NULL). tsk->signal was already protected by holding a reference to tsk.
update_rlimit_cpu() assumed tsk->sighand != NULL. With this commit, it attempts to lock_task_sighand(). However, this means that update_rlimit_cpu() can fail. This only happens when a task is exiting. Note that during exec, sighand may *change*, but it will not be NULL.
Prior to this commit, the do_prlimit() ensured that update_rlimit_cpu() would not fail by read locking the tasklist_lock and checking tsk->sighand != NULL.
If update_rlimit_cpu() fails, there may be other tasks that are not exiting that share tsk->signal. However, the group_leader is the last task to be released, so if we cannot update_rlimit_cpu(group_leader), then the entire process is exiting.
The only other caller of update_rlimit_cpu() is selinux_bprm_committing_creds(). It has tsk == current, so update_rlimit_cpu() cannot fail (current->sighand cannot disappear until current exits).
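The new failure path of update_rlimit_cpu() can be sketched as follows. struct sk_task and the helpers are hypothetical; sk_lock_task_sighand() merely stands in for lock_task_sighand(), and the -ESRCH return value is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical task model: sighand becomes NULL only once the task exits. */
struct sk_task {
	void *sighand;
	unsigned long cpu_limit;
};

/* Stands in for lock_task_sighand(): fails for an exiting task. */
static void *sk_lock_task_sighand(struct sk_task *t)
{
	return t->sighand;
}

/* Models update_rlimit_cpu() after the change: it can now fail. */
static int sk_update_rlimit_cpu(struct sk_task *t, unsigned long secs)
{
	if (!sk_lock_task_sighand(t))
		return -ESRCH; /* errno chosen for the sketch */
	t->cpu_limit = secs;
	return 0;
}
```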
This change resulted in a 14% speedup on a microbenchmark where parents kill and wait on their children, and children getpriority, setpriority, and getrlimit.
Signed-off-by: Barret Rhoden <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>