History log of /linux-6.15/include/linux/posix-timers.h (Results 1 – 25 of 95)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7
# ec2d0c04 11-Mar-2025 Thomas Gleixner <[email protected]>

posix-timers: Provide a mechanism to allocate a given timer ID

Checkpoint/Restore in Userspace (CRIU) requires to reconstruct posix timers
with the same timer ID on restore. It uses sys_timer_create

posix-timers: Provide a mechanism to allocate a given timer ID

Checkpoint/Restore in Userspace (CRIU) requires to reconstruct posix timers
with the same timer ID on restore. It uses sys_timer_create() and relies on
the monotonic increasing timer ID provided by this syscall. It creates and
deletes timers until the desired ID is reached. This is can loop for a long
time, when the checkpointed process had a very sparse timer ID range.

It has been debated to implement a new syscall to allow the creation of
timers with a given timer ID, but that's tideous due to the 32/64bit compat
issues of sigevent_t and of dubious value.

The restore mechanism of CRIU creates the timers in a state where all
threads of the restored process are held on a barrier and cannot issue
syscalls. That means the restorer task has exclusive control.

This allows to address this issue with a prctl() so that the restorer
thread can do:

if (prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_ON))
goto linear_mode;
create_timers_with_explicit_ids();
prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_OFF);

This is backwards compatible because the prctl() fails on older kernels and
CRIU can fall back to the linear timer ID mechanism. CRIU versions which do
not know about the prctl() just work as before.

Implement the prctl() and modify timer_create() so that it copies the
requested timer ID from userspace by utilizing the existing timer_t
pointer, which is used to copy out the allocated timer ID on success.

If the prctl() is disabled, which it is by default, timer_create() works as
before and does not try to read from the userspace pointer.

There is no problem when a broken or rogue user space application enables
the prctl(). If the user space pointer does not contain a valid ID, then
timer_create() fails. If the data is not initialized, but constains a
random valid ID, timer_create() will create that random timer ID or fail if
the ID is already given out.

As CRIU must use the raw syscall to avoid manipulating the internal state
of the restored process, this has no library dependencies and can be
adopted by CRIU right away.

Recreating two timers with IDs 1000000 and 2000000 takes 1.5 seconds with
the create/delete method. With the prctl() it takes 3 microseconds.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Tested-by: Cyrill Gorcunov <[email protected]>
Link: https://lore.kernel.org/all/87jz8vz0en.ffs@tglx

show more ...


Revision tags: v6.14-rc6
# 5fa75a43 08-Mar-2025 Thomas Gleixner <[email protected]>

posix-timers: Avoid false cacheline sharing

struct k_itimer has the hlist_node, which is used for lookup in the hash
bucket, and the timer lock in the same cache line.

That's obviously bad, if one

posix-timers: Avoid false cacheline sharing

struct k_itimer has the hlist_node, which is used for lookup in the hash
bucket, and the timer lock in the same cache line.

That's obviously bad, if one CPU fiddles with a timer and the other is
walking the hash bucket on which that timer is queued.

Avoid this by restructuring struct k_itimer, so that the read mostly (only
modified during setup and teardown) fields are in the first cache line and
the lock and the rest of the fields which get written to are in cacheline
2-N.

Reduces cacheline contention in a test case of 64 processes creating and
accessing 20000 timers each by almost 30% according to perf.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 1d25bdd3 10-Mar-2025 Thomas Gleixner <[email protected]>

posix-timers: Rework timer removal

sys_timer_delete() and the do_exit() cleanup function itimer_delete() are
doing the same thing, but have needlessly different implementations instead
of sharing th

posix-timers: Rework timer removal

sys_timer_delete() and the do_exit() cleanup function itimer_delete() are
doing the same thing, but have needlessly different implementations instead
of sharing the code.

The other oddity of timer deletion is the fact that the timer is not
invalidated before the actual deletion happens, which allows concurrent
lookups to succeed.

That's wrong because a timer which is in the process of being deleted
should not be visible and any actions like signal queueing, delivery and
rearming should not happen once the task, which invoked timer_delete(), has
the timer locked.

Rework the code so that:

1) The signal queueing and delivery code ignore timers which are marked
invalid

2) The deletion implementation between sys_timer_delete() and
itimer_delete() is shared

3) The timer is invalidated and removed from the linked lists before
the deletion callback of the relevant clock is invoked.

That requires to rework timer_wait_running() as it does a lookup of
the timer when relocking it at the end. In case of deletion this
lookup would fail due to the preceding invalidation and the wait loop
would terminate prematurely.

But due to the preceding invalidation the timer cannot be accessed by
other tasks anymore, so there is no way that the timer has been freed
after the timer lock has been dropped.

Move the re-validation out of timer_wait_running() and handle it at
the only other usage site, timer_settime().

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Link: https://lore.kernel.org/all/87zfht1exf.ffs@tglx

show more ...


Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7
# 7a66f72b 05-Nov-2024 Thomas Gleixner <[email protected]>

posix-timers: Cleanup SIG_IGN workaround leftovers

Now that ignored posix timer signals are requeued and the timers are
rearmed on signal delivery the workaround to keep such timers alive and
self r

posix-timers: Cleanup SIG_IGN workaround leftovers

Now that ignored posix timer signals are requeued and the timers are
rearmed on signal delivery the workaround to keep such timers alive and
self rearm them is not longer required.

Remove the relevant hacks and the not longer required return values from
the related functions. The alarm timer workarounds will be cleaned up in a
separate step.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# df7a996b 05-Nov-2024 Thomas Gleixner <[email protected]>

signal: Queue ignored posixtimers on ignore list

Queue posixtimers which have their signal ignored on the ignored list:

1) When the timer fires and the signal has SIG_IGN set

2) When SIG_IGN

signal: Queue ignored posixtimers on ignore list

Queue posixtimers which have their signal ignored on the ignored list:

1) When the timer fires and the signal has SIG_IGN set

2) When SIG_IGN is installed via sigaction() and a timer signal
is already queued

This only happens when the signal is for a valid timer, which delivered the
signal in periodic mode. One-shot timer signals are correctly dropped.

Due to the lock order constraints (sighand::siglock nests inside
timer::lock) the signal code cannot access any of the timer fields which
are relevant to make this decision, e.g. timer::it_status.

This is addressed by establishing a protection scheme which requires to
lock both locks on the timer side for modifying decision fields in the
timer struct and therefore makes it possible for the signal delivery to
evaluate with only sighand:siglock being held:

1) Move the NULLification of timer->it_signal into the sighand::siglock
protected section of timer_delete() and check timer::it_signal in the
code path which determines whether the signal is dropped or queued on
the ignore list.

This ensures that a deleted timer cannot be moved onto the ignore
list, which would prevent it from being freed on exit() as it is not
longer in the process' posix timer list.

If the timer got moved to the ignored list before deletion then it is
removed from the ignored list under sighand lock in timer_delete().

2) Provide a new timer::it_sig_periodic flag, which gets set in the
signal queue path with both timer and sighand locks held if the timer
is actually in periodic mode at expiry time.

The ignore list code checks this flag under sighand::siglock and drops
the signal when it is not set.

If it is set, then the signal is moved to the ignored list independent
of the actual state of the timer.

When the signal is un-ignored later then the signal is moved back to
the signal queue. On signal delivery the posix timer side decides
about dropping the signal if the timer was re-armed, dis-armed or
deleted based on the signal sequence counter check.

If the thread/process exits then not yet delivered signals are
discarded which means the reference of the timer containing the
sigqueue is dropped and frees the timer.

This is way cheaper than requiring all code paths to lock
sighand::siglock of the target thread/process on any modification of
timer::it_status or going all the way and removing pending signals
from the signal queues on every rearm, disarm or delete operation.

So the protection scheme here is that on the timer side both timer::lock
and sighand::siglock have to be held for modifying

timer::it_signal
timer::it_sig_periodic

which means that on the signal side holding sighand::siglock is enough to
evaluate these fields.

In posixtimer_deliver_signal() holding timer::lock is sufficient to do the
sequence validation against timer::it_signal_seq because a concurrent
expiry is waiting on timer::lock to be released.

This completes the SIG_IGN handling and such timers are not longer self
rearmed which avoids pointless wakeups.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 0e20cd33 05-Nov-2024 Thomas Gleixner <[email protected]>

posix-timers: Handle ignored list on delete and exit

To handle posix timer signals on sigaction(SIG_IGN) properly, the timers
will be queued on a separate ignored list.

Add the necessary cleanup co

posix-timers: Handle ignored list on delete and exit

To handle posix timer signals on sigaction(SIG_IGN) properly, the timers
will be queued on a separate ignored list.

Add the necessary cleanup code for timer_delete() and exit_itimers().

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 647da5f7 05-Nov-2024 Thomas Gleixner <[email protected]>

posix-timers: Move sequence logic into struct k_itimer

The posix timer signal handling uses siginfo::si_sys_private for handling
the sequence counter check. That indirection is not longer required a

posix-timers: Move sequence logic into struct k_itimer

The posix timer signal handling uses siginfo::si_sys_private for handling
the sequence counter check. That indirection is not longer required and the
sequence count value at signal queueing time can be stored in struct
k_itimer itself.

This removes the requirement of treating siginfo::si_sys_private special as
it's now always zero as the kernel does not touch it anymore.

Suggested-by: Eric W. Biederman <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 6017a158 05-Nov-2024 Thomas Gleixner <[email protected]>

posix-timers: Embed sigqueue in struct k_itimer

To cure the SIG_IGN handling for posix interval timers, the preallocated
sigqueue needs to be embedded into struct k_itimer to prevent life time
races

posix-timers: Embed sigqueue in struct k_itimer

To cure the SIG_IGN handling for posix interval timers, the preallocated
sigqueue needs to be embedded into struct k_itimer to prevent life time
races of all sorts.

Now that the prerequisites are in place, embed the sigqueue into struct
k_itimer and fixup the relevant usage sites.

Aside of preparing for proper SIG_IGN handling, this spares an extra
allocation.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 11629b98 05-Nov-2024 Thomas Gleixner <[email protected]>

signal: Replace resched_timer logic

In preparation for handling ignored posix timer signals correctly and
embedding the sigqueue struct into struct k_itimer, hand down a pointer to
the sigqueue stru

signal: Replace resched_timer logic

In preparation for handling ignored posix timer signals correctly and
embedding the sigqueue struct into struct k_itimer, hand down a pointer to
the sigqueue struct into posix_timer_deliver_signal() instead of just
having a boolean flag.

No functional change.

Suggested-by: Eric W. Biederman <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 0360ed14 05-Nov-2024 Thomas Gleixner <[email protected]>

signal: Refactor send_sigqueue()

To handle posix timers which have their signal ignored via SIG_IGN properly
it is required to requeue a ignored signal for delivery when SIG_IGN is
lifted so the tim

signal: Refactor send_sigqueue()

To handle posix timers which have their signal ignored via SIG_IGN properly
it is required to requeue a ignored signal for delivery when SIG_IGN is
lifted so the timer gets rearmed.

Split the required code out of send_sigqueue() so it can be reused in
context of sigaction().

While at it rename send_sigqueue() to posixtimer_send_sigqueue() so its
clear what this is about.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# ef1c5bcd 05-Nov-2024 Thomas Gleixner <[email protected]>

posix-timers: Store PID type in the timer

instead of re-evaluating the signal delivery mode everywhere.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <frederi

posix-timers: Store PID type in the timer

instead of re-evaluating the signal delivery mode everywhere.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 54f1dd64 05-Nov-2024 Thomas Gleixner <[email protected]>

signal: Provide posixtimer_sigqueue_init()

To cure the SIG_IGN handling for posix interval timers, the preallocated
sigqueue needs to be embedded into struct k_itimer to prevent life time
races of a

signal: Provide posixtimer_sigqueue_init()

To cure the SIG_IGN handling for posix interval timers, the preallocated
sigqueue needs to be embedded into struct k_itimer to prevent life time
races of all sorts.

Provide a new function to initialize the embedded sigqueue to prepare for
that.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 5d916a09 05-Nov-2024 Thomas Gleixner <[email protected]>

posix-timers: Add a refcount to struct k_itimer

To cure the SIG_IGN handling for posix interval timers, the preallocated
sigqueue needs to be embedded into struct k_itimer to prevent life time
races

posix-timers: Add a refcount to struct k_itimer

To cure the SIG_IGN handling for posix interval timers, the preallocated
sigqueue needs to be embedded into struct k_itimer to prevent life time
races of all sorts.

To make that work correctly it needs reference counting so that timer
deletion does not free the timer prematuraly when there is a signal queued
or delivered concurrently.

Add a rcuref to the posix timer part.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 4cf7bf2a 05-Nov-2024 Thomas Gleixner <[email protected]>

posix-cpu-timers: Use dedicated flag for CPU timer nanosleep

POSIX CPU timer nanosleep creates a k_itimer on stack and uses the sigq
pointer to detect the nanosleep case in the expiry function.

Pre

posix-cpu-timers: Use dedicated flag for CPU timer nanosleep

POSIX CPU timer nanosleep creates a k_itimer on stack and uses the sigq
pointer to detect the nanosleep case in the expiry function.

Prepare for embedding sigqueue into struct k_itimer by using a dedicated
flag for nanosleep.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# bf635681 05-Nov-2024 Thomas Gleixner <[email protected]>

posix-cpu-timers: Cleanup the firing logic

The firing flag of a posix CPU timer is tristate:

0: when the timer is not about to deliver a signal

1: when the timer has expired, but the signal ha

posix-cpu-timers: Cleanup the firing logic

The firing flag of a posix CPU timer is tristate:

0: when the timer is not about to deliver a signal

1: when the timer has expired, but the signal has not been delivered yet

-1: when the timer was queued for signal delivery and a rearm operation
raced against it and supressed the signal delivery.

This is a pointless exercise as this can be simply expressed with a
boolean. Only if set, the signal is delivered. This makes delete and rearm
consistent with the rest of the posix timers.

Convert firing to bool and fixup the usage sites accordingly and add
comments why the timer cannot be dequeued right away.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 513793bc 05-Nov-2024 Thomas Gleixner <[email protected]>

posix-timers: Make signal delivery consistent

Signals of timers which are reprogammed, disarmed or deleted can deliver
signals related to the past. The POSIX spec is blury about this:

- "The effec

posix-timers: Make signal delivery consistent

Signals of timers which are reprogammed, disarmed or deleted can deliver
signals related to the past. The POSIX spec is blury about this:

- "The effect of disarming or resetting a timer with pending expiration
notifications is unspecified."

- "The disposition of pending signals for the deleted timer is
unspecified."

In both cases it is reasonable to expect that pending signals are
discarded. Especially in the reprogramming case it does not make sense to
account for previous overruns or to deliver a signal for a timer which has
been disarmed. This makes the behaviour consistent and understandable.

Remove the si_sys_private check from the signal delivery code and invoke
posix_timer_deliver_signal() unconditionally for posix timer related
signals.

Change posix_timer_deliver_signal() so it controls the actual signal
delivery via the return value. It now instructs the signal code to drop the
signal when:

1) The timer does not longer exist in the hash table

2) The timer signal_seq value is not the same as the si_sys_private value
which was set when the signal was queued.

This is also a preparatory change to embed the sigqueue into the k_itimer
structure, which in turn allows to remove the si_sys_private magic.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2
# 1550dde8 01-Oct-2024 Thomas Gleixner <[email protected]>

posix-timers: Add proper state tracking

Right now the state tracking is done by two struct members:

- it_active:
A boolean which tracks armed/disarmed state

- it_signal_seq:
A sequence

posix-timers: Add proper state tracking

Right now the state tracking is done by two struct members:

- it_active:
A boolean which tracks armed/disarmed state

- it_signal_seq:
A sequence counter which is used to invalidate settings
and prevent rearming

Replace it_active with it_status and keep properly track about the states
in one place.

This allows to reuse it_signal_seq to track reprogramming, disarm and
delete operations in order to drop signals which are related to the state
previous of those operations.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# cd1e93ae 01-Oct-2024 Thomas Gleixner <[email protected]>

posix-timers: Rename k_itimer:: It_requeue_pending

Prepare for using this struct member to do a proper reprogramming and
deletion accounting so that stale signals can be dropped.

No functional chan

posix-timers: Rename k_itimer:: It_requeue_pending

Prepare for using this struct member to do a proper reprogramming and
deletion accounting so that stale signals can be dropped.

No functional change.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# c775ea28 01-Oct-2024 Thomas Gleixner <[email protected]>

signal: Allow POSIX timer signals to be dropped

In case that a timer was reprogrammed or deleted an already pending signal
is obsolete. Right now such signals are kept around and eventually
delivere

signal: Allow POSIX timer signals to be dropped

In case that a timer was reprogrammed or deleted an already pending signal
is obsolete. Right now such signals are kept around and eventually
delivered. While POSIX is blury about this:

- "The effect of disarming or resetting a timer with pending expiration
notifications is unspecified."

- "The disposition of pending signals for the deleted timer is
unspecified."

it is reasonable in both cases to expect that pending signals are discarded
as they have no meaning anymore.

Prepare the signal code to allow dropping posix timer signals.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 68f99be2 01-Oct-2024 Thomas Gleixner <[email protected]>

signal: Confine POSIX_TIMERS properly

Move the itimer rearming out of the signal code and consolidate all posix
timer related functions in the signal code under one ifdef.

Signed-off-by: Thomas Gle

signal: Confine POSIX_TIMERS properly

Move the itimer rearming out of the signal code and consolidate all posix
timer related functions in the signal code under one ifdef.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


Revision tags: v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4
# 52dea0a1 10-Jun-2024 Thomas Gleixner <[email protected]>

posix-timers: Convert timer list to hlist

No requirement for a real list. Spare a few bytes.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Frederic Weisbecker <frederic@kernel.

posix-timers: Convert timer list to hlist

No requirement for a real list. Spare a few bytes.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>

show more ...


Revision tags: v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6
# 53d31ba8 11-Dec-2023 Kent Overstreet <[email protected]>

posix-cpu-timers: Split out posix-timers_types.h

Trimming down sched.h dependencies: we don't want to include more than
the base types.

Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Kent

posix-cpu-timers: Split out posix-timers_types.h

Trimming down sched.h dependencies: we don't want to include more than
the base types.

Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>

show more ...


Revision tags: v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3
# f7abf14f 17-Apr-2023 Thomas Gleixner <[email protected]>

posix-cpu-timers: Implement the missing timer_wait_running callback

For some unknown reason the introduction of the timer_wait_running callback
missed to fixup posix CPU timers, which went unnoticed

posix-cpu-timers: Implement the missing timer_wait_running callback

For some unknown reason the introduction of the timer_wait_running callback
missed to fixup posix CPU timers, which went unnoticed for almost four years.
Marco reported recently that the WARN_ON() in timer_wait_running()
triggers with a posix CPU timer test case.

Posix CPU timers have two execution models for expiring timers depending on
CONFIG_POSIX_CPU_TIMERS_TASK_WORK:

1) If not enabled, the expiry happens in hard interrupt context so
spin waiting on the remote CPU is reasonably time bound.

Implement an empty stub function for that case.

2) If enabled, the expiry happens in task work before returning to user
space or guest mode. The expired timers are marked as firing and moved
from the timer queue to a local list head with sighand lock held. Once
the timers are moved, sighand lock is dropped and the expiry happens in
fully preemptible context. That means the expiring task can be scheduled
out, migrated, interrupted etc. So spin waiting on it is more than
suboptimal.

The timer wheel has a timer_wait_running() mechanism for RT, which uses
a per CPU timer-base expiry lock which is held by the expiry code and the
task waiting for the timer function to complete blocks on that lock.

This does not work in the same way for posix CPU timers as there is no
timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry lock
can be used too in a slightly different way:

- Add a mutex to struct posix_cputimers_work. This struct is per task
and used to schedule the expiry task work from the timer interrupt.

- Add a task_struct pointer to struct cpu_timer which is used to store
a the task which runs the expiry. That's filled in when the task
moves the expired timers to the local expiry list. That's not
affecting the size of the k_itimer union as there are bigger union
members already

- Let the task take the expiry mutex around the expiry function

- Let the waiter acquire a task reference with rcu_read_lock() held and
block on the expiry mutex

This avoids spin-waiting on a task which might not even be on a CPU and
works nicely for RT too.

Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
Reported-by: Marco Elver <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Marco Elver <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx

show more ...


Revision tags: v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2
# 8ca07e17 28-Jan-2022 Eric W. Biederman <[email protected]>

task_work: Remove unnecessary include from posix_timers.h

Break a header file circular dependency by removing the unnecessary
include of task_work.h from posix_timers.h.

sched.h -> posix-timers.h
p

task_work: Remove unnecessary include from posix_timers.h

Break a header file circular dependency by removing the unnecessary
include of task_work.h from posix_timers.h.

sched.h -> posix-timers.h
posix-timers.h -> task_work.h
task_work.h -> sched.h

Add missing includes of task_work.h to:
arch/x86/mm/tlb.c
kernel/time/posix-cpu-timers.c

Reviewed-by: Kees Cook <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>

show more ...


Revision tags: v5.17-rc1, v5.16
# 18c91bb2 06-Jan-2022 Barret Rhoden <[email protected]>

prlimit: do not grab the tasklist_lock

Unnecessarily grabbing the tasklist_lock can be a scalability bottleneck
for workloads that also must grab the tasklist_lock for waiting,
killing, and cloning.

prlimit: do not grab the tasklist_lock

Unnecessarily grabbing the tasklist_lock can be a scalability bottleneck
for workloads that also must grab the tasklist_lock for waiting,
killing, and cloning.

The tasklist_lock was grabbed to protect tsk->sighand from disappearing
(becoming NULL). tsk->signal was already protected by holding a
reference to tsk.

update_rlimit_cpu() assumed tsk->sighand != NULL. With this commit, it
attempts to lock_task_sighand(). However, this means that
update_rlimit_cpu() can fail. This only happens when a task is exiting.
Note that during exec, sighand may *change*, but it will not be NULL.

Prior to this commit, the do_prlimit() ensured that update_rlimit_cpu()
would not fail by read locking the tasklist_lock and checking tsk->sighand
!= NULL.

If update_rlimit_cpu() fails, there may be other tasks that are not
exiting that share tsk->signal. However, the group_leader is the last
task to be released, so if we cannot update_rlimit_cpu(group_leader),
then the entire process is exiting.

The only other caller of update_rlimit_cpu() is
selinux_bprm_committing_creds(). It has tsk == current, so
update_rlimit_cpu() cannot fail (current->sighand cannot disappear
until current exits).

This change resulted in a 14% speedup on a microbenchmark where parents
kill and wait on their children, and children getpriority, setpriority,
and getrlimit.

Signed-off-by: Barret Rhoden <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>

show more ...


1234