|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
1027cd80 |
| 28-Jan-2025 |
Oleg Nesterov <[email protected]> |
seccomp: remove the 'sd' argument from __secure_computing()
After the previous changes 'sd' is always NULL.
Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Kees Cook <[email protected]> L
seccomp: remove the 'sd' argument from __secure_computing()
After the previous changes 'sd' is always NULL.
Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
show more ...
|
| #
b37778be |
| 28-Jan-2025 |
Oleg Nesterov <[email protected]> |
seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER
Depending on CONFIG_HAVE_ARCH_SECCOMP_FILTER, __secure_computing(NULL) will crash or not. This is not consistent/safe, especi
seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER
Depending on CONFIG_HAVE_ARCH_SECCOMP_FILTER, __secure_computing(NULL) will crash or not. This is not consistent/safe, especially considering that after the previous change __secure_computing(sd) is always called with sd == NULL.
Fortunately, if CONFIG_HAVE_ARCH_SECCOMP_FILTER=n, __secure_computing() has no callers, these architectures use secure_computing_strict(). Yet it make sense make __secure_computing(NULL) safe in this case.
Note also that with this change we can unexport secure_computing_strict() and change the current callers to use __secure_computing(NULL).
Fixes: 8cf8dfceebda ("seccomp: Stub for !HAVE_ARCH_SECCOMP_FILTER") Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v6.13, v6.13-rc7 |
|
| #
f90877dd |
| 08-Jan-2025 |
Linus Walleij <[email protected]> |
seccomp: Stub for !CONFIG_SECCOMP
When using !CONFIG_SECCOMP with CONFIG_GENERIC_ENTRY, the randconfig bots found the following snag:
kernel/entry/common.c: In function 'syscall_trace_enter': >>
seccomp: Stub for !CONFIG_SECCOMP
When using !CONFIG_SECCOMP with CONFIG_GENERIC_ENTRY, the randconfig bots found the following snag:
kernel/entry/common.c: In function 'syscall_trace_enter': >> kernel/entry/common.c:52:23: error: implicit declaration of function '__secure_computing' [-Wimplicit-function-declaration] 52 | ret = __secure_computing(NULL); | ^~~~~~~~~~~~~~~~~~
Since generic entry calls __secure_computing() unconditionally, fix this by moving the stub out of the ifdef clause for CONFIG_HAVE_ARCH_SECCOMP_FILTER so it's always available.
Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Linus Walleij <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5 |
|
| #
8cf8dfce |
| 22-Oct-2024 |
Linus Walleij <[email protected]> |
seccomp: Stub for !HAVE_ARCH_SECCOMP_FILTER
If we have CONFIG_SECCOMP but not CONFIG_HAVE_ARCH_SECCOMP_FILTER we get a compilation error:
../kernel/entry/common.c: In function 'syscall_trace_enter'
seccomp: Stub for !HAVE_ARCH_SECCOMP_FILTER
If we have CONFIG_SECCOMP but not CONFIG_HAVE_ARCH_SECCOMP_FILTER we get a compilation error:
../kernel/entry/common.c: In function 'syscall_trace_enter': ../kernel/entry/common.c:55:23: error: implicit declaration of function '__secure_computing' [-Werror=implicit-function-declaration] 55 | ret = __secure_computing(NULL); | ^~~~~~~~~~~~~~~~~~
This is because generic entry calls __secure_computing() unconditionally.
Provide the needed stub similar to how current ARM does this by calling secure_computing_strict() in the absence of secure_computing(). This is similar to what is done for ARM in arch/arm/kernel/ptrace.c.
Signed-off-by: Linus Walleij <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6 |
|
| #
a6e1420c |
| 11-Dec-2023 |
Kent Overstreet <[email protected]> |
seccomp: Split out seccomp_types.h
More pruning of sched.h dependencies.
Signed-off-by: Kent Overstreet <[email protected]>
|
| #
6d5e9d63 |
| 11-Dec-2023 |
Kent Overstreet <[email protected]> |
pid: Split out pid_types.h
Trimming down sched.h dependencies: we dont't want to include more than the base types.
Cc: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc
pid: Split out pid_types.h
Trimming down sched.h dependencies: we dont't want to include more than the base types.
Cc: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Will Drewry <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8 |
|
| #
b9069728 |
| 02-Dec-2022 |
Randy Dunlap <[email protected]> |
seccomp: document the "filter_count" field
Add missing kernel-doc for the 'filter_count' field in struct seccomp.
seccomp.h:40: warning: Function parameter or member 'filter_count' not described in
seccomp: document the "filter_count" field
Add missing kernel-doc for the 'filter_count' field in struct seccomp.
seccomp.h:40: warning: Function parameter or member 'filter_count' not described in 'seccomp'
Fixes: c818c03b661c ("seccomp: Report number of loaded filters in /proc/$pid/status") Signed-off-by: Randy Dunlap <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6 |
|
| #
c2aa2dfe |
| 03-May-2022 |
Sargun Dhillon <[email protected]> |
seccomp: Add wait_killable semantic to seccomp user notifier
This introduces a per-filter flag (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) that makes it so that when notifications are received by the s
seccomp: Add wait_killable semantic to seccomp user notifier
This introduces a per-filter flag (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) that makes it so that when notifications are received by the supervisor the notifying process will transition to wait killable semantics. Although wait killable isn't a set of semantics formally exposed to userspace, the concept is searchable. If the notifying process is signaled prior to the notification being received by the userspace agent, it will be handled as normal.
One quirk about how this is handled is that the notifying process only switches to TASK_KILLABLE if it receives a wakeup from either an addfd or a signal. This is to avoid an unnecessary wakeup of the notifying task.
The reasons behind switching into wait_killable only after userspace receives the notification are: * Avoiding unncessary work - Often, workloads will perform work that they may abort (request racing comes to mind). This allows for syscalls to be aborted safely prior to the notification being received by the supervisor. In this, the supervisor doesn't end up doing work that the workload does not want to complete anyways. * Avoiding side effects - We don't want the syscall to be interruptible once the supervisor starts doing work because it may not be trivial to reverse the operation. For example, unmounting a file system may take a long time, and it's hard to rollback, or treat that as reentrant. * Avoid breaking runtimes - Various runtimes do not GC when they are during a syscall (or while running native code that subsequently calls a syscall). If many notifications are blocked, and not picked up by the supervisor, this can get the application into a bad state.
Signed-off-by: Sargun Dhillon <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4 |
|
| #
0d8315dd |
| 11-Nov-2020 |
YiFei Zhu <[email protected]> |
seccomp/cache: Report cache data through /proc/pid/seccomp_cache
Currently the kernel does not provide an infrastructure to translate architecture numbers to a human-readable name. Translating sysca
seccomp/cache: Report cache data through /proc/pid/seccomp_cache
Currently the kernel does not provide an infrastructure to translate architecture numbers to a human-readable name. Translating syscall numbers to syscall names is possible through FTRACE_SYSCALL infrastructure but it does not provide support for compat syscalls.
This will create a file for each PID as /proc/pid/seccomp_cache. The file will be empty when no seccomp filters are loaded, or be in the format of: <arch name> <decimal syscall number> <ALLOW | FILTER> where ALLOW means the cache is guaranteed to allow the syscall, and filter means the cache will pass the syscall to the BPF filter.
For the docker default profile on x86_64 it looks like: x86_64 0 ALLOW x86_64 1 ALLOW x86_64 2 ALLOW x86_64 3 ALLOW [...] x86_64 132 ALLOW x86_64 133 ALLOW x86_64 134 FILTER x86_64 135 FILTER x86_64 136 FILTER x86_64 137 ALLOW x86_64 138 ALLOW x86_64 139 FILTER x86_64 140 ALLOW x86_64 141 ALLOW [...]
This file is guarded by CONFIG_SECCOMP_CACHE_DEBUG with a default of N because I think certain users of seccomp might not want the application to know which syscalls are definitely usable. For the same reason, it is also guarded by CAP_SYS_ADMIN.
Suggested-by: Jann Horn <[email protected]> Link: https://lore.kernel.org/lkml/CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@mail.gmail.com/ Signed-off-by: YiFei Zhu <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/94e663fa53136f5a11f432c661794d1ee7060779.1605101222.git.yifeifz2@illinois.edu
show more ...
|
| #
23d67a54 |
| 16-Nov-2020 |
Gabriel Krisman Bertazi <[email protected]> |
seccomp: Migrate to use SYSCALL_WORK flag
On architectures using the generic syscall entry code the architecture independent syscall work is moved to flags in thread_info::syscall_work. This removes
seccomp: Migrate to use SYSCALL_WORK flag
On architectures using the generic syscall entry code the architecture independent syscall work is moved to flags in thread_info::syscall_work. This removes architecture dependencies and frees up TIF bits.
Define SYSCALL_WORK_SECCOMP, use it in the generic entry code and convert the code which uses the TIF specific helper functions to use the new *_syscall_work() helpers which either resolve to the new mode for users of the generic entry code or to the TIF based functions for the other architectures.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7 |
|
| #
3135f5b7 |
| 26-Jul-2020 |
Thomas Gleixner <[email protected]> |
entry: Correct __secure_computing() stub
The original version of that used secure_computing() which has no arguments. Review requested to switch to __secure_computing() which has one. The function n
entry: Correct __secure_computing() stub
The original version of that used secure_computing() which has no arguments. Review requested to switch to __secure_computing() which has one. The function name was correct, but no argument added and of course compiling without SECCOMP was deemed overrated.
Add the missing function argument.
Fixes: 6823ecabf030 ("seccomp: Provide stub for __secure_computing()") Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
show more ...
|
| #
6823ecab |
| 22-Jul-2020 |
Thomas Gleixner <[email protected]> |
seccomp: Provide stub for __secure_computing()
To avoid #ifdeffery in the upcoming generic syscall entry work code provide a stub for __secure_computing() as this is preferred over secure_computing(
seccomp: Provide stub for __secure_computing()
To avoid #ifdeffery in the upcoming generic syscall entry work code provide a stub for __secure_computing() as this is preferred over secure_computing() because the TIF flag is already evaluated.
Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1 |
|
| #
7cf97b12 |
| 03-Jun-2020 |
Sargun Dhillon <[email protected]> |
seccomp: Introduce addfd ioctl to seccomp user notifier
The current SECCOMP_RET_USER_NOTIF API allows for syscall supervision over an fd. It is often used in settings where a supervising task emulat
seccomp: Introduce addfd ioctl to seccomp user notifier
The current SECCOMP_RET_USER_NOTIF API allows for syscall supervision over an fd. It is often used in settings where a supervising task emulates syscalls on behalf of a supervised task in userspace, either to further restrict the supervisee's syscall abilities or to circumvent kernel enforced restrictions the supervisor deems safe to lift (e.g. actually performing a mount(2) for an unprivileged container).
While SECCOMP_RET_USER_NOTIF allows for the interception of any syscall, only a certain subset of syscalls could be correctly emulated. Over the last few development cycles, the set of syscalls which can't be emulated has been reduced due to the addition of pidfd_getfd(2). With this we are now able to, for example, intercept syscalls that require the supervisor to operate on file descriptors of the supervisee such as connect(2).
However, syscalls that cause new file descriptors to be installed can not currently be correctly emulated since there is no way for the supervisor to inject file descriptors into the supervisee. This patch adds a new addfd ioctl to remove this restriction by allowing the supervisor to install file descriptors into the intercepted task. By implementing this feature via seccomp the supervisor effectively instructs the supervisee to install a set of file descriptors into its own file descriptor table during the intercepted syscall. This way it is possible to intercept syscalls such as open() or accept(), and install (or replace, like dup2(2)) the supervisor's resulting fd into the supervisee. One replacement use-case would be to redirect the stdout and stderr of a supervisee into log file descriptors opened by the supervisor.
The ioctl handling is based on the discussions[1] of how Extensible Arguments should interact with ioctls. Instead of building size into the addfd structure, make it a function of the ioctl command (which is how sizes are normally passed to ioctls). To support forward and backward compatibility, just mask out the direction and size, and match everything. The size (and any future direction) checks are done along with copy_struct_from_user() logic.
As a note, the seccomp_notif_addfd structure is laid out based on 8-byte alignment without requiring packing as there have been packing issues with uapi highlighted before[2][3]. Although we could overload the newfd field and use -1 to indicate that it is not to be used, doing so requires changing the size of the fd field, and introduces struct packing complexity.
[1]: https://lore.kernel.org/lkml/[email protected]/ [2]: https://lore.kernel.org/lkml/[email protected]/ [3]: https://lore.kernel.org/lkml/[email protected]
Cc: Christoph Hellwig <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Tycho Andersen <[email protected]> Cc: Jann Horn <[email protected]> Cc: Robert Sesek <[email protected]> Cc: Chris Palmer <[email protected]> Cc: Al Viro <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Suggested-by: Matt Denton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sargun Dhillon <[email protected]> Reviewed-by: Will Drewry <[email protected]> Co-developed-by: Kees Cook <[email protected]> Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v5.7 |
|
| #
3a15fb6e |
| 31-May-2020 |
Christian Brauner <[email protected]> |
seccomp: release filter after task is fully dead
The seccomp filter used to be released in free_task() which is called asynchronously via call_rcu() and assorted mechanisms. Since we need to inform
seccomp: release filter after task is fully dead
The seccomp filter used to be released in free_task() which is called asynchronously via call_rcu() and assorted mechanisms. Since we need to inform tasks waiting on the seccomp notifier when a filter goes empty we will notify them as soon as a task has been marked fully dead in release_task(). To not split seccomp cleanup into two parts, move filter release out of free_task() and into release_task() after we've unhashed struct task from struct pid, exited signals, and unlinked it from the threadgroups' thread list. We'll put the empty filter notification infrastructure into it in a follow up patch.
This also renames put_seccomp_filter() to seccomp_filter_release() which is a more descriptive name of what we're doing here especially once we've added the empty filter notification mechanism in there.
We're also NULL-ing the task's filter tree entrypoint which seems cleaner than leaving a dangling pointer in there. Note that this shouldn't need any memory barriers since we're calling this when the task is in release_task() which means it's EXIT_DEAD. So it can't modify its seccomp filters anymore. You can also see this from the point where we're calling seccomp_filter_release(). It's after __exit_signal() and at this point, tsk->sighand will already have been NULLed which is required for thread-sync and filter installation alike.
Cc: Tycho Andersen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Matt Denton <[email protected]> Cc: Sargun Dhillon <[email protected]> Cc: Jann Horn <[email protected]> Cc: Chris Palmer <[email protected]> Cc: Aleksa Sarai <[email protected]> Cc: Robert Sesek <[email protected]> Cc: Jeffrey Vander Stoep <[email protected]> Cc: Linux Containers <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v5.7-rc7, v5.7-rc6 |
|
| #
c818c03b |
| 13-May-2020 |
Kees Cook <[email protected]> |
seccomp: Report number of loaded filters in /proc/$pid/status
A common question asked when debugging seccomp filters is "how many filters are attached to your process?" Provide a way to easily answe
seccomp: Report number of loaded filters in /proc/$pid/status
A common question asked when debugging seccomp filters is "how many filters are attached to your process?" Provide a way to easily answer this question through /proc/$pid/status with a "Seccomp_filters" line.
Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5 |
|
| #
51891498 |
| 04-Mar-2020 |
Tycho Andersen <[email protected]> |
seccomp: allow TSYNC and USER_NOTIF together
The restriction introduced in 7a0df7fbc145 ("seccomp: Make NEW_LISTENER and TSYNC flags exclusive") is mostly artificial: there is enough information in
seccomp: allow TSYNC and USER_NOTIF together
The restriction introduced in 7a0df7fbc145 ("seccomp: Make NEW_LISTENER and TSYNC flags exclusive") is mostly artificial: there is enough information in a seccomp user notification to tell which thread triggered a notification. The reason it was introduced is because TSYNC makes the syscall return a thread-id on failure, and NEW_LISTENER returns an fd, and there's no way to distinguish between these two cases (well, I suppose the caller could check all fds it has, then do the syscall, and if the return value was an fd that already existed, then it must be a thread id, but bleh).
Matthew would like to use these two flags together in the Chrome sandbox which wants to use TSYNC for video drivers and NEW_LISTENER to proxy syscalls.
So, let's fix this ugliness by adding another flag, TSYNC_ESRCH, which tells the kernel to just return -ESRCH on a TSYNC error. This way, NEW_LISTENER (and any subsequent seccomp() commands that want to return positive values) don't conflict with each other.
Suggested-by: Matthew Denton <[email protected]> Signed-off-by: Tycho Andersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1 |
|
| #
fefad9ef |
| 24-Sep-2019 |
Christian Brauner <[email protected]> |
seccomp: simplify secure_computing()
Afaict, the struct seccomp_data argument to secure_computing() is unused by all current callers. So let's remove it. The argument was added in [1]. It was added
seccomp: simplify secure_computing()
Afaict, the struct seccomp_data argument to secure_computing() is unused by all current callers. So let's remove it. The argument was added in [1]. It was added because having the arch supply the syscall arguments used to be faster than having it done by secure_computing() (cf. Andy's comment in [2]). This is not true anymore though.
/* References */ [1]: 2f275de5d1ed ("seccomp: Add a seccomp_data parameter secure_computing()") [2]: https://lore.kernel.org/r/CALCETrU_fs_At-hTpr231kpaAd0z7xJN4ku-DvzhRU6cvcJA_w@mail.gmail.com
Signed-off-by: Christian Brauner <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Drewry <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Acked-by: Borislav Petkov <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6 |
|
| #
6a21cc50 |
| 09-Dec-2018 |
Tycho Andersen <[email protected]> |
seccomp: add a return code to trap to userspace
This patch introduces a means for syscalls matched in seccomp to notify some other task that a particular filter has been triggered.
The motivation f
seccomp: add a return code to trap to userspace
This patch introduces a means for syscalls matched in seccomp to notify some other task that a particular filter has been triggered.
The motivation for this is primarily for use with containers. For example, if a container does an init_module(), we obviously don't want to load this untrusted code, which may be compiled for the wrong version of the kernel anyway. Instead, we could parse the module image, figure out which module the container is trying to load and load it on the host.
As another example, containers cannot mount() in general since various filesystems assume a trusted image. However, if an orchestrator knows that e.g. a particular block device has not been exposed to a container for writing, it want to allow the container to mount that block device (that is, handle the mount for it).
This patch adds functionality that is already possible via at least two other means that I know about, both of which involve ptrace(): first, one could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL. Unfortunately this is slow, so a faster version would be to install a filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP. Since ptrace allows only one tracer, if the container runtime is that tracer, users inside the container (or outside) trying to debug it will not be able to use ptrace, which is annoying. It also means that older distributions based on Upstart cannot boot inside containers using ptrace, since upstart itself uses ptrace to monitor services while starting.
The actual implementation of this is fairly small, although getting the synchronization right was/is slightly complex.
Finally, it's worth noting that the classic seccomp TOCTOU of reading memory data from the task still applies here, but can be avoided with careful design of the userspace handler: if the userspace handler reads all of the task memory that is necessary before applying its security policy, the tracee's subsequent memory edits will not be read by the tracer.
Signed-off-by: Tycho Andersen <[email protected]> CC: Kees Cook <[email protected]> CC: Andy Lutomirski <[email protected]> CC: Oleg Nesterov <[email protected]> CC: Eric W. Biederman <[email protected]> CC: "Serge E. Hallyn" <[email protected]> Acked-by: Serge Hallyn <[email protected]> CC: Christian Brauner <[email protected]> CC: Tyler Hicks <[email protected]> CC: Akihiro Suda <[email protected]> Signed-off-by: Kees Cook <[email protected]>
show more ...
|
| #
a5662e4d |
| 09-Dec-2018 |
Tycho Andersen <[email protected]> |
seccomp: switch system call argument type to void *
The const qualifier causes problems for any code that wants to write to the third argument of the seccomp syscall, as we will do in a future patch
seccomp: switch system call argument type to void *
The const qualifier causes problems for any code that wants to write to the third argument of the seccomp syscall, as we will do in a future patch in this series.
The third argument to the seccomp syscall is documented as void *, so rather than just dropping the const, let's switch everything to use void * as well.
I believe this is safe because of 1. the documentation above, 2. there's no real type information exported about syscalls anywhere besides the man pages.
Signed-off-by: Tycho Andersen <[email protected]> CC: Kees Cook <[email protected]> CC: Andy Lutomirski <[email protected]> CC: Oleg Nesterov <[email protected]> CC: Eric W. Biederman <[email protected]> CC: "Serge E. Hallyn" <[email protected]> Acked-by: Serge Hallyn <[email protected]> CC: Christian Brauner <[email protected]> CC: Tyler Hicks <[email protected]> CC: Akihiro Suda <[email protected]> Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1, v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4 |
|
| #
00a02d0c |
| 03-May-2018 |
Kees Cook <[email protected]> |
seccomp: Add filter flag to opt-out of SSB mitigation
If a seccomp user is not interested in Speculative Store Bypass mitigation by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag wh
seccomp: Add filter flag to opt-out of SSB mitigation
If a seccomp user is not interested in Speculative Store Bypass mitigation by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when adding filters.
Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
show more ...
|
|
Revision tags: v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4, v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2, v4.15-rc1, v4.14, v4.14-rc8, v4.14-rc7, v4.14-rc6, v4.14-rc5 |
|
| #
26500475 |
| 11-Oct-2017 |
Tycho Andersen <[email protected]> |
ptrace, seccomp: add support for retrieving seccomp metadata
With the new SECCOMP_FILTER_FLAG_LOG, we need to be able to extract these flags for checkpoint restore, since they describe the state of
ptrace, seccomp: add support for retrieving seccomp metadata
With the new SECCOMP_FILTER_FLAG_LOG, we need to be able to extract these flags for checkpoint restore, since they describe the state of a filter.
So, let's add PTRACE_SECCOMP_GET_METADATA, similar to ..._GET_FILTER, which returns the metadata of the nth filter (right now, just the flags). Hopefully this will be future proof, and new per-filter metadata can be added to this struct.
Signed-off-by: Tycho Andersen <[email protected]> CC: Kees Cook <[email protected]> CC: Andy Lutomirski <[email protected]> CC: Oleg Nesterov <[email protected]> Signed-off-by: Kees Cook <[email protected]>
show more ...
|
| #
b2441318 |
| 01-Nov-2017 |
Greg Kroah-Hartman <[email protected]> |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license identifiers to apply.
- when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary:
SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became the concluded license(s).
- when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time.
In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related.
Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches.
Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Philippe Ombredanne <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
show more ...
|
|
Revision tags: v4.14-rc4, v4.14-rc3, v4.14-rc2, v4.14-rc1, v4.13, v4.13-rc7, v4.13-rc6, v4.13-rc5 |
|
| #
e66a3997 |
| 11-Aug-2017 |
Tyler Hicks <[email protected]> |
seccomp: Filter flag to log all actions except SECCOMP_RET_ALLOW
Add a new filter flag, SECCOMP_FILTER_FLAG_LOG, that enables logging for all actions except for SECCOMP_RET_ALLOW for the given filte
seccomp: Filter flag to log all actions except SECCOMP_RET_ALLOW
Add a new filter flag, SECCOMP_FILTER_FLAG_LOG, that enables logging for all actions except for SECCOMP_RET_ALLOW for the given filter.
SECCOMP_RET_KILL actions are always logged, when "kill" is in the actions_logged sysctl, and SECCOMP_RET_ALLOW actions are never logged, regardless of this flag.
This flag can be used to create noisy filters that result in all non-allowed actions to be logged. A process may have one noisy filter, which is loaded with this flag, as well as a quiet filter that's not loaded with this flag. This allows for the actions in a set of filters to be selectively conveyed to the admin.
Since a system could have a large number of allocated seccomp_filter structs, struct packing was taken in consideration. On 64 bit x86, the new log member takes up one byte of an existing four byte hole in the struct. On 32 bit x86, the new log member creates a new four byte hole (unavoidable) and consumes one of those bytes.
Unfortunately, the tests added for SECCOMP_FILTER_FLAG_LOG are not capable of inspecting the audit log to verify that the actions taken in the filter were logged.
With this patch, the logic for deciding if an action will be logged is:
if action == RET_ALLOW: do not log else if action == RET_KILL && RET_KILL in actions_logged: log else if filter-requests-logging && action in actions_logged: log else if audit_enabled && process-is-being-audited: log else: do not log
Signed-off-by: Tyler Hicks <[email protected]> Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v4.13-rc4, v4.13-rc3, v4.13-rc2, v4.13-rc1, v4.12, v4.12-rc7, v4.12-rc6, v4.12-rc5, v4.12-rc4, v4.12-rc3, v4.12-rc2, v4.12-rc1, v4.11, v4.11-rc8, v4.11-rc7, v4.11-rc6, v4.11-rc5, v4.11-rc4, v4.11-rc3, v4.11-rc2, v4.11-rc1, v4.10, v4.10-rc8, v4.10-rc7, v4.10-rc6, v4.10-rc5, v4.10-rc4, v4.10-rc3, v4.10-rc2, v4.10-rc1, v4.9, v4.9-rc8, v4.9-rc7, v4.9-rc6, v4.9-rc5, v4.9-rc4, v4.9-rc3, v4.9-rc2, v4.9-rc1, v4.8, v4.8-rc8, v4.8-rc7, v4.8-rc6, v4.8-rc5, v4.8-rc4, v4.8-rc3, v4.8-rc2, v4.8-rc1, v4.7, v4.7-rc7, v4.7-rc6, v4.7-rc5, v4.7-rc4, v4.7-rc3, v4.7-rc2 |
|
| #
8112c4f1 |
| 01-Jun-2016 |
Kees Cook <[email protected]> |
seccomp: remove 2-phase API
Since nothing is using the 2-phase API, and it adds more complexity than benefit, remove it.
Signed-off-by: Kees Cook <[email protected]> Cc: Andy Lutomirski <luto@k
seccomp: remove 2-phase API
Since nothing is using the 2-phase API, and it adds more complexity than benefit, remove it.
Signed-off-by: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]>
show more ...
|
|
Revision tags: v4.7-rc1 |
|
| #
2f275de5 |
| 27-May-2016 |
Andy Lutomirski <[email protected]> |
seccomp: Add a seccomp_data parameter secure_computing()
Currently, if arch code wants to supply seccomp_data directly to seccomp (which is generally much faster than having seccomp do it using the
seccomp: Add a seccomp_data parameter secure_computing()
Currently, if arch code wants to supply seccomp_data directly to seccomp (which is generally much faster than having seccomp do it using the syscall_get_xyz() API), it has to use the two-phase seccomp hooks. Add it to the easy hooks, too.
Cc: [email protected] Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Kees Cook <[email protected]>
show more ...
|