History log of /linux-6.15/kernel/seccomp.c (Results 1 – 25 of 162)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3
# 8f193313 13-Feb-2025 Mateusz Guzik <[email protected]>

seccomp: avoid the lock trip seccomp_filter_release in common case

Vast majority of threads don't have any seccomp filters, all while the
lock taken here is shared between all threads in given proce

seccomp: avoid the lock trip seccomp_filter_release in common case

Vast majority of threads don't have any seccomp filters, all while the
lock taken here is shared between all threads in given process and
frequently used.

Safety of the check relies on the following:
- seccomp_filter_release is only legally called for PF_EXITING threads
- SIGNAL_GROUP_EXIT is only ever set with the sighand lock held
- PF_EXITING is only ever set with the sighand lock held *or* after
SIGNAL_GROUP_EXIT is set *or* the process is single-threaded
- seccomp_sync_threads holds the sighand lock and skips all threads if
SIGNAL_GROUP_EXIT is set, PF_EXITING threads if not

Resulting reduction of contention gives me a 5% boost in a
microbenchmark spawning and killing threads within the same process.

Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>

show more ...


Revision tags: v6.14-rc2, v6.14-rc1
# e1cec510 28-Jan-2025 Oleg Nesterov <[email protected]>

seccomp: remove the 'sd' argument from __seccomp_filter()

After the previous change 'sd' is always NULL.

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link

seccomp: remove the 'sd' argument from __seccomp_filter()

After the previous change 'sd' is always NULL.

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>

show more ...


# 1027cd80 28-Jan-2025 Oleg Nesterov <[email protected]>

seccomp: remove the 'sd' argument from __secure_computing()

After the previous changes 'sd' is always NULL.

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
L

seccomp: remove the 'sd' argument from __secure_computing()

After the previous changes 'sd' is always NULL.

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>

show more ...


# b37778be 28-Jan-2025 Oleg Nesterov <[email protected]>

seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER

Depending on CONFIG_HAVE_ARCH_SECCOMP_FILTER, __secure_computing(NULL)
will crash or not. This is not consistent/safe, especi

seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER

Depending on CONFIG_HAVE_ARCH_SECCOMP_FILTER, __secure_computing(NULL)
will crash or not. This is not consistent/safe, especially considering
that after the previous change __secure_computing(sd) is always called
with sd == NULL.

Fortunately, if CONFIG_HAVE_ARCH_SECCOMP_FILTER=n, __secure_computing()
has no callers, these architectures use secure_computing_strict(). Yet
it make sense make __secure_computing(NULL) safe in this case.

Note also that with this change we can unexport secure_computing_strict()
and change the current callers to use __secure_computing(NULL).

Fixes: 8cf8dfceebda ("seccomp: Stub for !HAVE_ARCH_SECCOMP_FILTER")
Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>

show more ...


# cf6cb56e 02-Feb-2025 Eyal Birger <[email protected]>

seccomp: passthrough uretprobe systemcall without filtering

When attaching uretprobes to processes running inside docker, the attached
process is segfaulted when encountering the retprobe.

The reas

seccomp: passthrough uretprobe systemcall without filtering

When attaching uretprobes to processes running inside docker, the attached
process is segfaulted when encountering the retprobe.

The reason is that now that uretprobe is a system call the default seccomp
filters in docker block it as they only allow a specific set of known
syscalls. This is true for other userspace applications which use seccomp
to control their syscall surface.

Since uretprobe is a "kernel implementation detail" system call which is
not used by userspace application code directly, it is impractical and
there's very little point in forcing all userspace applications to
explicitly allow it in order to avoid crashing tracked processes.

Pass this systemcall through seccomp without depending on configuration.

Note: uretprobe is currently only x86_64 and isn't expected to ever be
supported in i386.

Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe")
Reported-by: Rafael Buchbinder <[email protected]>
Closes: https://lore.kernel.org/lkml/CAHsH6Gs3Eh8DFU0wq58c_LF8A4_+o6z456J7BidmcVY2AqOnHQ@mail.gmail.com/
Link: https://lore.kernel.org/lkml/[email protected]/T/#me2676c378eff2d6a33f3054fed4a5f3afa64e65b
Link: https://lore.kernel.org/lkml/[email protected]/
Cc: [email protected]
Signed-off-by: Eyal Birger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[kees: minimized changes for easier backporting, tweaked commit log]
Signed-off-by: Kees Cook <[email protected]>

show more ...


# 1751f872 28-Jan-2025 Joel Granados <[email protected]>

treewide: const qualify ctl_tables where applicable

Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysc

treewide: const qualify ctl_tables where applicable

Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
virtual patch

@
depends on !(file in "net")
disable optional_qualifier
@

identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@

+ const
struct ctl_table table_name [] = { ... };

sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c

Reviewed-by: Song Liu <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/
Reviewed-by: Martin K. Petersen <[email protected]> # SCSI
Reviewed-by: Darrick J. Wong <[email protected]> # xfs
Acked-by: Jani Nikula <[email protected]>
Acked-by: Corey Minyard <[email protected]>
Acked-by: Wei Liu <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Reviewed-by: Bill O'Donnell <[email protected]>
Acked-by: Baoquan He <[email protected]>
Acked-by: Ashutosh Dixit <[email protected]>
Acked-by: Anna Schumaker <[email protected]>
Signed-off-by: Joel Granados <[email protected]>

show more ...


Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1
# 78eb4ea2 24-Jul-2024 Joel Granados <[email protected]>

sysctl: treewide: constify the ctl_table argument of proc_handlers

const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ct

sysctl: treewide: constify the ctl_table argument of proc_handlers

const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.

This patch has been generated by the following coccinelle script:

```
virtual patch

@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos);

@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }

@r3@
identifier func;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
,int , void *, size_t *, loff_t *);

@r4@
identifier func, ctl;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int , void *, size_t *, loff_t *);

@r5@
identifier func, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
,int write, void *buffer, size_t *lenp, loff_t *ppos);

```

* Code formatting was adjusted in xfs_sysctl.c to comply with code
conventions. The xfs_stats_clear_proc_handler,
xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
adjusted.

* The ctl_table argument in proc_watchdog_common was const qualified.
This is called from a proc_handler itself and is calling back into
another proc_handler, making it necessary to change it as part of the
proc_handler migration.

Co-developed-by: Thomas Weißschuh <[email protected]>
Signed-off-by: Thomas Weißschuh <[email protected]>
Co-developed-by: Joel Granados <[email protected]>
Signed-off-by: Joel Granados <[email protected]>

show more ...


Revision tags: v6.10, v6.10-rc7, v6.10-rc6
# bfafe5ef 28-Jun-2024 Andrei Vagin <[email protected]>

seccomp: release task filters when the task exits

Previously, seccomp filters were released in release_task(), which
required the process to exit and its zombie to be collected. However,
exited thre

seccomp: release task filters when the task exits

Previously, seccomp filters were released in release_task(), which
required the process to exit and its zombie to be collected. However,
exited threads/processes can't trigger any seccomp events, making it
more logical to release filters upon task exits.

This adjustment simplifies scenarios where a parent is tracing its child
process. The parent process can now handle all events from a seccomp
listening descriptor and then call wait to collect a child zombie.

seccomp_filter_release takes the siglock to avoid races with
seccomp_sync_threads. There was an idea to bypass taking the lock by
checking PF_EXITING, but it can be set without holding siglock if
threads have SIGNAL_GROUP_EXIT. This means it can happen concurently
with seccomp_filter_release.

This change also fixes another minor problem. Suppose that a group
leader installs the new filter without SECCOMP_FILTER_FLAG_TSYNC, exits,
and becomes a zombie. Without this change, SECCOMP_FILTER_FLAG_TSYNC
from any other thread can never succeed, seccomp_can_sync_threads() will
check a zombie leader and is_ancestor() will fail.

Reviewed-by: Oleg Nesterov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Tycho Andersen <[email protected]>
Signed-off-by: Kees Cook <[email protected]>

show more ...


# 95036a79 28-Jun-2024 Andrei Vagin <[email protected]>

seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited

SECCOMP_IOCTL_NOTIF_RECV promptly returns when a seccomp filter becomes
unused, as a filter without users can't trigger any eve

seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited

SECCOMP_IOCTL_NOTIF_RECV promptly returns when a seccomp filter becomes
unused, as a filter without users can't trigger any events.

Previously, event listeners had to rely on epoll to detect when all
processes had exited.

The change is based on the 'commit 99cdb8b9a573 ("seccomp: notify about
unused filter")' which implemented (E)POLLHUP notifications.

Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Tycho Andersen <[email protected]>
Signed-off-by: Kees Cook <[email protected]>

show more ...


Revision tags: v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9
# e406737b 08-May-2024 Kees Cook <[email protected]>

seccomp: Constify sysctl subhelpers

The read_actions_logged() and write_actions_logged() helpers called by the
sysctl proc handler seccomp_actions_logged_handler() are already expecting
their sysctl

seccomp: Constify sysctl subhelpers

The read_actions_logged() and write_actions_logged() helpers called by the
sysctl proc handler seccomp_actions_logged_handler() are already expecting
their sysctl table argument to be read-only. Actually mark the argument
as const in preparation[1] for global constification of the sysctl tables.

Suggested-by: Thomas Weißschuh <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/ [1]
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Thomas Weißschuh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>

show more ...


Revision tags: v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1
# e822582e 27-Jun-2023 Joel Granados <[email protected]>

seccomp: Remove the now superfluous sentinel elements from ctl_table array

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sent

seccomp: Remove the now superfluous sentinel elements from ctl_table array

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/)

Remove sentinel element from seccomp_sysctl_table.

Acked-by: Kees Cook <[email protected]>
Signed-off-by: Joel Granados <[email protected]>

show more ...


# 4e94ddfe 30-Nov-2023 Christian Brauner <[email protected]>

file: remove __receive_fd()

Honestly, there's little value in having a helper with and without that
int __user *ufd argument. It's just messy and doesn't really give us
anything. Just expose receive

file: remove __receive_fd()

Honestly, there's little value in having a helper with and without that
int __user *ufd argument. It's just messy and doesn't really give us
anything. Just expose receive_fd() with that argument and get rid of
that helper.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

show more ...


# 46822860 17-Aug-2023 Kees Cook <[email protected]>

seccomp: Add missing kerndoc notations

The kerndoc for some struct member and function arguments were missing.
Add them.

Cc: Andy Lutomirski <[email protected]>
Cc: Will Drewry <[email protected]>

seccomp: Add missing kerndoc notations

The kerndoc for some struct member and function arguments were missing.
Add them.

Cc: Andy Lutomirski <[email protected]>
Cc: Will Drewry <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Kees Cook <[email protected]>

show more ...


Revision tags: v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2
# 48a1084a 08-Mar-2023 Andrei Vagin <[email protected]>

seccomp: add the synchronous mode for seccomp_unotify

seccomp_unotify allows more privileged processes do actions on behalf
of less privileged processes.

In many cases, the workflow is fully synchr

seccomp: add the synchronous mode for seccomp_unotify

seccomp_unotify allows more privileged processes do actions on behalf
of less privileged processes.

In many cases, the workflow is fully synchronous. It means a target
process triggers a system call and passes controls to a supervisor
process that handles the system call and returns controls to the target
process. In this context, "synchronous" means that only one process is
running and another one is waiting.

There is the WF_CURRENT_CPU flag that is used to advise the scheduler to
move the wakee to the current CPU. For such synchronous workflows, it
makes context switches a few times faster.

Right now, each interaction takes 12µs. With this patch, it takes about
3µs.

This change introduce the SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP flag that
it used to enable the sync mode.

Signed-off-by: Andrei Vagin <[email protected]>
Acked-by: "Peter Zijlstra (Intel)" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>

show more ...


# 4943b66d 08-Mar-2023 Andrei Vagin <[email protected]>

seccomp: don't use semaphore and wait_queue together

The main reason is to use new wake_up helpers that will be added in the
following patches. But here are a few other reasons:

* if we use two dif

seccomp: don't use semaphore and wait_queue together

The main reason is to use new wake_up helpers that will be added in the
following patches. But here are a few other reasons:

* if we use two different ways, we always need to call them both. This
patch fixes seccomp_notify_recv where we forgot to call wake_up_poll
in the error path.

* If we use one primitive, we can control how many waiters are woken up
for each request. Our goal is to wake up just one that will handle a
request. Right now, wake_up_poll can wake up one waiter and
up(&match->notif->request) can wake up one more.

Signed-off-by: Andrei Vagin <[email protected]>
Acked-by: "Peter Zijlstra (Intel)" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>

show more ...


Revision tags: v6.3-rc1
# 02a6b455 02-Mar-2023 Luis Chamberlain <[email protected]>

seccomp: simplify sysctls with register_sysctl_init()

register_sysctl_paths() is only needed if you have childs (directories)
with entries. Just use register_sysctl_init() as it also does the
kmemle

seccomp: simplify sysctls with register_sysctl_init()

register_sysctl_paths() is only needed if you have childs (directories)
with entries. Just use register_sysctl_init() as it also does the
kmemleak check for you.

Acked-by: Kees Cook <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>

show more ...


Revision tags: v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3
# 0fb0624b 08-Jan-2023 Randy Dunlap <[email protected]>

seccomp: fix kernel-doc function name warning

Move the ACTION_ONLY() macro so that it is not between the kernel-doc
notation and the function definition for seccomp_run_filters(),
eliminating a kern

seccomp: fix kernel-doc function name warning

Move the ACTION_ONLY() macro so that it is not between the kernel-doc
notation and the function definition for seccomp_run_filters(),
eliminating a kernel-doc warning:

kernel/seccomp.c:400: warning: expecting prototype for seccomp_run_filters(). Prototype was for ACTION_ONLY() instead

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Will Drewry <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6
# c2aa2dfe 03-May-2022 Sargun Dhillon <[email protected]>

seccomp: Add wait_killable semantic to seccomp user notifier

This introduces a per-filter flag (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
that makes it so that when notifications are received by the s

seccomp: Add wait_killable semantic to seccomp user notifier

This introduces a per-filter flag (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
that makes it so that when notifications are received by the supervisor the
notifying process will transition to wait killable semantics. Although wait
killable isn't a set of semantics formally exposed to userspace, the
concept is searchable. If the notifying process is signaled prior to the
notification being received by the userspace agent, it will be handled as
normal.

One quirk about how this is handled is that the notifying process
only switches to TASK_KILLABLE if it receives a wakeup from either
an addfd or a signal. This is to avoid an unnecessary wakeup of
the notifying task.

The reasons behind switching into wait_killable only after userspace
receives the notification are:
* Avoiding unncessary work - Often, workloads will perform work that they
may abort (request racing comes to mind). This allows for syscalls to be
aborted safely prior to the notification being received by the
supervisor. In this, the supervisor doesn't end up doing work that the
workload does not want to complete anyways.
* Avoiding side effects - We don't want the syscall to be interruptible
once the supervisor starts doing work because it may not be trivial
to reverse the operation. For example, unmounting a file system may
take a long time, and it's hard to rollback, or treat that as
reentrant.
* Avoid breaking runtimes - Various runtimes do not GC when they are
during a syscall (or while running native code that subsequently
calls a syscall). If many notifications are blocked, and not picked
up by the supervisor, this can get the application into a bad state.

Signed-off-by: Sargun Dhillon <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v5.18-rc5
# 4cbf6f62 28-Apr-2022 Sargun Dhillon <[email protected]>

seccomp: Use FIFO semantics to order notifications

Previously, the seccomp notifier used LIFO semantics, where each
notification would be added on top of the stack, and notifications
were popped off

seccomp: Use FIFO semantics to order notifications

Previously, the seccomp notifier used LIFO semantics, where each
notification would be added on top of the stack, and notifications
were popped off the top of the stack. This could result one process
that generates a large number of notifications preventing other
notifications from being handled. This patch moves from LIFO (stack)
semantics to FIFO (queue semantics).

Signed-off-by: Sargun Dhillon <[email protected]>
Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4
# 355f841a 09-Feb-2022 Eric W. Biederman <[email protected]>

tracehook: Remove tracehook.h

Now that all of the definitions have moved out of tracehook.h into
ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in
tracehook.h so remove it.

Upda

tracehook: Remove tracehook.h

Now that all of the definitions have moved out of tracehook.h into
ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in
tracehook.h so remove it.

Update the few files that were depending upon tracehook.h to bring in
definitions to use the headers they need directly.

Reviewed-by: Kees Cook <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>

show more ...


# 495ac306 08-Feb-2022 Kees Cook <[email protected]>

seccomp: Invalidate seccomp mode to catch death failures

If seccomp tries to kill a process, it should never see that process
again. To enforce this proactively, switch the mode to something
impossi

seccomp: Invalidate seccomp mode to catch death failures

If seccomp tries to kill a process, it should never see that process
again. To enforce this proactively, switch the mode to something
impossible. If encountered: WARN, reject all syscalls, and attempt to
kill the process again even harder.

Cc: Andy Lutomirski <[email protected]>
Cc: Will Drewry <[email protected]>
Fixes: 8112c4f140fa ("seccomp: remove 2-phase API")
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>

show more ...


Revision tags: v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13
# d21918e5 23-Jun-2021 Eric W. Biederman <[email protected]>

signal/seccomp: Dump core when there is only one live thread

Replace get_nr_threads with atomic_read(&current->signal->live) as
that is a more accurate number that is decremented sooner.

Acked-by:

signal/seccomp: Dump core when there is only one live thread

Replace get_nr_threads with atomic_read(&current->signal->live) as
that is a more accurate number that is decremented sooner.

Acked-by: Kees Cook <[email protected]>
Link: https://lkml.kernel.org/r/87lf6z6qbd.fsf_-_@disp2133
Signed-off-by: "Eric W. Biederman" <[email protected]>

show more ...


# 307d522f 23-Jun-2021 Eric W. Biederman <[email protected]>

signal/seccomp: Refactor seccomp signal and coredump generation

Factor out force_sig_seccomp from the seccomp signal generation and
place it in kernel/signal.c. The function force_sig_seccomp takes

signal/seccomp: Refactor seccomp signal and coredump generation

Factor out force_sig_seccomp from the seccomp signal generation and
place it in kernel/signal.c. The function force_sig_seccomp takes a
parameter force_coredump to indicate that the sigaction field should
be reset to SIGDFL so that a coredump will be generated when the
signal is delivered.

force_sig_seccomp is then used to replace both seccomp_send_sigsys
and seccomp_init_siginfo.

force_sig_info_to_task gains an extra parameter to force using
the default signal action.

With this change seccomp is no longer a special case and there
becomes exactly one place do_coredump is called from.

Further it no longer becomes necessary for __seccomp_filter
to call do_group_exit.

Acked-by: Kees Cook <[email protected]>
Link: https://lkml.kernel.org/r/87r1gr6qc4.fsf_-_@disp2133
Signed-off-by: "Eric W. Biederman" <[email protected]>

show more ...


Revision tags: v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2
# b4d8a58f 04-Mar-2021 Hsuan-Chi Kuo <[email protected]>

seccomp: Fix setting loaded filter count during TSYNC

The desired behavior is to set the caller's filter count to thread's.
This value is reported via /proc, so this fixes the inaccurate count
expos

seccomp: Fix setting loaded filter count during TSYNC

The desired behavior is to set the caller's filter count to thread's.
This value is reported via /proc, so this fixes the inaccurate count
exposed to userspace; it is not used for reference counting, etc.

Signed-off-by: Hsuan-Chi Kuo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Co-developed-by: Wiktor Garbacz <[email protected]>
Signed-off-by: Wiktor Garbacz <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Cc: [email protected]
Fixes: c818c03b661c ("seccomp: Report number of loaded filters in /proc/$pid/status")

show more ...


# 0ae71c77 17-May-2021 Rodrigo Campos <[email protected]>

seccomp: Support atomic "addfd + send reply"

Alban Crequy reported a race condition userspace faces when we want to
add some fds and make the syscall return them[1] using seccomp notify.

The proble

seccomp: Support atomic "addfd + send reply"

Alban Crequy reported a race condition userspace faces when we want to
add some fds and make the syscall return them[1] using seccomp notify.

The problem is that currently two different ioctl() calls are needed by
the process handling the syscalls (agent) for another userspace process
(target): SECCOMP_IOCTL_NOTIF_ADDFD to allocate the fd and
SECCOMP_IOCTL_NOTIF_SEND to return that value. Therefore, it is possible
for the agent to do the first ioctl to add a file descriptor but the
target is interrupted (EINTR) before the agent does the second ioctl()
call.

This patch adds a flag to the ADDFD ioctl() so it adds the fd and
returns that value atomically to the target program, as suggested by
Kees Cook[2]. This is done by simply allowing
seccomp_do_user_notification() to add the fd and return it in this case.
Therefore, in this case the target wakes up from the wait in
seccomp_do_user_notification() either to interrupt the syscall or to add
the fd and return it.

This "allocate an fd and return" functionality is useful for syscalls
that return a file descriptor only, like connect(2). Other syscalls that
return a file descriptor but not as return value (or return more than
one fd), like socketpair(), pipe(), recvmsg with SCM_RIGHTs, will not
work with this flag.

This effectively combines SECCOMP_IOCTL_NOTIF_ADDFD and
SECCOMP_IOCTL_NOTIF_SEND into an atomic opteration. The notification's
return value, nor error can be set by the user. Upon successful invocation
of the SECCOMP_IOCTL_NOTIF_ADDFD ioctl with the SECCOMP_ADDFD_FLAG_SEND
flag, the notifying process's errno will be 0, and the return value will
be the file descriptor number that was installed.

[1]: https://lore.kernel.org/lkml/CADZs7q4sw71iNHmV8EOOXhUKJMORPzF7thraxZYddTZsxta-KQ@mail.gmail.com/
[2]: https://lore.kernel.org/lkml/202012011322.26DCBC64F2@keescook/

Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Sargun Dhillon <[email protected]>
Acked-by: Tycho Andersen <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


1234567