|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5 |
|
| #
cc9554e6 |
| 23-Feb-2025 |
Yonatan Goldschmidt <[email protected]> |
binfmt: Remove loader from linux_binprm struct
Commit 987f20a9dcce ("a.out: Remove the a.out implementation") removed the last in-tree user of the loader field, and as far as I can tell, it was the
binfmt: Remove loader from linux_binprm struct
Commit 987f20a9dcce ("a.out: Remove the a.out implementation") removed the last in-tree user of the loader field, and as far as I can tell, it was the only one historically.
Signed-off-by: Yonatan Goldschmidt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
a5874fde |
| 12-Dec-2024 |
Mickaël Salaün <[email protected]> |
exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would be allowed for execution. The main use case is for script interpreters and
exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would be allowed for execution. The main use case is for script interpreters and dynamic linkers to check execution permission according to the kernel's security policy. Another use case is to add context to access logs e.g., which script (instead of interpreter) accessed a file. As any executable code, scripts could also use this check [1].
This is different from faccessat(2) + X_OK which only checks a subset of access rights (i.e. inode permission and mount options for regular files), but not the full context (e.g. all LSM access checks). The main use case for access(2) is for SUID processes to (partially) check access on behalf of their caller. The main use case for execveat(2) + AT_EXECVE_CHECK is to check if a script execution would be allowed, according to all the different restrictions in place. Because the use of AT_EXECVE_CHECK follows the exact kernel semantic as for a real execution, user space gets the same error codes.
An interesting point of using execveat(2) instead of openat2(2) is that it decouples the check from the enforcement. Indeed, the security check can be logged (e.g. with audit) without blocking an execution environment not yet ready to enforce a strict security policy.
LSMs can control or log execution requests with security_bprm_creds_for_exec(). However, to enforce a consistent and complete access control (e.g. on binary's dependencies) LSMs should restrict file executability, or measure executed files, with security_file_open() by checking file->f_flags & __FMODE_EXEC.
Because AT_EXECVE_CHECK is dedicated to user space interpreters, it doesn't make sense for the kernel to parse the checked files, look for interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC if the format is unknown. Because of that, security_bprm_check() is never called when AT_EXECVE_CHECK is used.
It should be noted that script interpreters cannot directly use execveat(2) (without this new AT_EXECVE_CHECK flag) because this could lead to unexpected behaviors e.g., `python script.sh` could lead to Bash being executed to interpret the script. Unlike the kernel, script interpreters may just interpret the shebang as a simple comment, which should not change for backward compatibility reasons.
Because scripts or libraries files might not currently have the executable permission set, or because we might want specific users to be allowed to run arbitrary scripts, the following patch provides a dynamic configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE securebits.
This is a redesign of the CLIP OS 4's O_MAYEXEC: https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch This patch has been used for more than a decade with customized script interpreters. Some examples can be found here: https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
Cc: Al Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Kees Cook <[email protected]> Acked-by: Paul Moore <[email protected]> Reviewed-by: Serge Hallyn <[email protected]> Reviewed-by: Jeff Xu <[email protected]> Tested-by: Jeff Xu <[email protected]> Link: https://docs.python.org/3/library/io.html#io.open_code [1] Signed-off-by: Mickaël Salaün <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc2, v6.13-rc1 |
|
| #
543841d1 |
| 21-Nov-2024 |
Kees Cook <[email protected]> |
exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
Zbigniew mentioned at Linux Plumber's that systemd is interested in switching to execveat() for service execution, but can't, because
exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
Zbigniew mentioned at Linux Plumber's that systemd is interested in switching to execveat() for service execution, but can't, because the contents of /proc/pid/comm are the file descriptor which was used, instead of the path to the binary[1]. This makes the output of tools like top and ps useless, especially in a world where most fds are opened CLOEXEC so the number is truly meaningless.
When the filename passed in is empty (e.g. with AT_EMPTY_PATH), use the dentry's filename for "comm" instead of using the useless numeral from the synthetic fdpath construction. This way the actual exec machinery is unchanged, but cosmetically the comm looks reasonable to admins investigating things.
Instead of adding TASK_COMM_LEN more bytes to bprm, use one of the unused flag bits to indicate that we need to set "comm" from the dentry.
Suggested-by: Zbigniew Jędrzejewski-Szmek <[email protected]> Suggested-by: Tycho Andersen <[email protected]> Suggested-by: Al Viro <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec [1] Reviewed-by: Aleksa Sarai <[email protected]> Tested-by: Zbigniew Jędrzejewski-Szmek <[email protected]> Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5 |
|
| #
084ebf7c |
| 21-Jun-2024 |
Kees Cook <[email protected]> |
execve: Keep bprm->argmin behind CONFIG_MMU
When argmin was added in commit 655c16a8ce9c ("exec: separate MM_ANONPAGES and RLIMIT_STACK accounting"), it was intended only for validating stack limits
execve: Keep bprm->argmin behind CONFIG_MMU
When argmin was added in commit 655c16a8ce9c ("exec: separate MM_ANONPAGES and RLIMIT_STACK accounting"), it was intended only for validating stack limits on CONFIG_MMU[1]. All checking for reaching the limit (argmin) is wrapped in CONFIG_MMU ifdef checks, though setting argmin was not. That argmin is only supposed to be used under CONFIG_MMU was rediscovered recently[2], and I don't want to trip over this again.
Move argmin's declaration into the existing CONFIG_MMU area, and add helpers functions so the MMU tests can be consolidated.
Link: https://lore.kernel.org/all/[email protected] [1] Link: https://lore.kernel.org/all/202406211253.7037F69@keescook/ [2] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15 |
|
| #
21ca59b3 |
| 28-Oct-2021 |
Christian Brauner <[email protected]> |
binfmt_misc: enable sandboxed mounts
Enable unprivileged sandboxes to create their own binfmt_misc mounts. This is based on Laurent's work in [1] but has been significantly reworked to fix various i
binfmt_misc: enable sandboxed mounts
Enable unprivileged sandboxes to create their own binfmt_misc mounts. This is based on Laurent's work in [1] but has been significantly reworked to fix various issues we identified in earlier versions.
While binfmt_misc can currently only be mounted in the initial user namespace, binary types registered in this binfmt_misc instance are available to all sandboxes (Either by having them installed in the sandbox or by registering the binary type with the F flag causing the interpreter to be opened right away). So binfmt_misc binary types are already delegated to sandboxes implicitly.
However, while a sandbox has access to all registered binary types in binfmt_misc a sandbox cannot currently register its own binary types in binfmt_misc. This has prevented various use-cases some of which were already outlined in [1] but we have a range of issues associated with this (cf. [3]-[5] below which are just a small sample).
Extend binfmt_misc to be mountable in non-initial user namespaces. Similar to other filesystem such as nfsd, mqueue, and sunrpc we use keyed superblock management. The key determines whether we need to create a new superblock or can reuse an already existing one. We use the user namespace of the mount as key. This means a new binfmt_misc superblock is created once per user namespace creation. Subsequent mounts of binfmt_misc in the same user namespace will mount the same binfmt_misc instance. We explicitly do not create a new binfmt_misc superblock on every binfmt_misc mount as the semantics for load_misc_binary() line up with the keying model. This also allows us to retrieve the relevant binfmt_misc instance based on the caller's user namespace which can be done in a simple (bounded to 32 levels) loop.
Similar to the current binfmt_misc semantics allowing access to the binary types in the initial binfmt_misc instance we do allow sandboxes access to their parent's binfmt_misc mounts if they do not have created a separate binfmt_misc instance.
Overall, this will unblock the use-cases mentioned below and in general will also allow to support and harden execution of another architecture's binaries in tight sandboxes. For instance, using the unshare binary it possible to start a chroot of another architecture and configure the binfmt_misc interpreter without being root to run the binaries in this chroot and without requiring the host to modify its binary type handlers.
Henning had already posted a few experiments in the cover letter at [1]. But here's an additional example where an unprivileged container registers qemu-user-static binary handlers for various binary types in its separate binfmt_misc mount and is then seamlessly able to start containers with a different architecture without affecting the host:
root [lxc monitor] /var/snap/lxd/common/lxd/containers f1 1000000 \_ /sbin/init 1000000 \_ /lib/systemd/systemd-journald 1000000 \_ /lib/systemd/systemd-udevd 1000100 \_ /lib/systemd/systemd-networkd 1000101 \_ /lib/systemd/systemd-resolved 1000000 \_ /usr/sbin/cron -f 1000103 \_ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only 1000000 \_ /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers 1000104 \_ /usr/sbin/rsyslogd -n -iNONE 1000000 \_ /lib/systemd/systemd-logind 1000000 \_ /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220 1000107 \_ dnsmasq --conf-file=/dev/null -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --liste 1000000 \_ [lxc monitor] /var/lib/lxc f1-s390x 1100000 \_ /usr/bin/qemu-s390x-static /sbin/init 1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-journald 1100000 \_ /usr/bin/qemu-s390x-static /usr/sbin/cron -f 1100103 \_ /usr/bin/qemu-s390x-static /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-ac 1100000 \_ /usr/bin/qemu-s390x-static /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers 1100104 \_ /usr/bin/qemu-s390x-static /usr/sbin/rsyslogd -n -iNONE 1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-logind 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/0 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/1 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/2 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/3 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-udevd
[1]: https://lore.kernel.org/all/[email protected] [2]: https://discuss.linuxcontainers.org/t/binfmt-misc-permission-denied [3]: https://discuss.linuxcontainers.org/t/lxd-binfmt-support-for-qemu-static-interpreters [4]: https://discuss.linuxcontainers.org/t/3-1-0-binfmt-support-service-in-unprivileged-guest-requires-write-access-on-hosts-proc-sys-fs-binfmt-misc [5]: https://discuss.linuxcontainers.org/t/qemu-user-static-not-working-4-11
Link: https://lore.kernel.org/r/[email protected] (origin) Link: https://lore.kernel.org/r/[email protected] (v1) Cc: Sargun Dhillon <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Jann Horn <[email protected]> Cc: Henning Schild <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Al Viro <[email protected]> Cc: Laurent Vivier <[email protected]> Cc: [email protected] Signed-off-by: Laurent Vivier <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Kees Cook <[email protected]> --- /* v2 */ - Serge Hallyn <[email protected]>: - Use GFP_KERNEL_ACCOUNT for userspace triggered allocations when a new binary type handler is registered. - Christian Brauner <[email protected]>: - Switch authorship to me. I refused to do that earlier even though Laurent said I should do so because I think it's genuinely bad form. But by now I have changed so many things that it'd be unfair to blame Laurent for any potential bugs in here. - Add more comments that explain what's going on. - Rename functions while changing them to better reflect what they are doing to make the code easier to understand. - In the first version when a specific binary type handler was removed either through a write to the entry's file or all binary type handlers were removed by a write to the binfmt_misc mount's status file all cleanup work happened during inode eviction. That includes removal of the relevant entries from entry list. While that works fine I disliked that model after thinking about it for a bit. Because it means that there was a window were someone has already removed a or all binary handlers but they could still be safely reached from load_misc_binary() when it has managed to take the read_lock() on the entries list while inode eviction was already happening. Again, that perfectly benign but it's cleaner to remove the binary handler from the list immediately meaning that ones the write to then entry's file or the binfmt_misc status file returns the binary type cannot be executed anymore. That gives stronger guarantees to the user.
show more ...
|
| #
9f4beead |
| 29-Sep-2022 |
Lukas Bulwahn <[email protected]> |
binfmt: remove taso from linux_binprm struct
With commit 987f20a9dcce ("a.out: Remove the a.out implementation"), the use of the special taso flag for alpha architectures in the linux_binprm struct
binfmt: remove taso from linux_binprm struct
With commit 987f20a9dcce ("a.out: Remove the a.out implementation"), the use of the special taso flag for alpha architectures in the linux_binprm struct is gone.
Remove the definition of taso in the linux_binprm struct.
No functional change.
Signed-off-by: Lukas Bulwahn <[email protected]> Reviewed-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
a99a3e2e |
| 31-Jan-2022 |
Eric W. Biederman <[email protected]> |
coredump: Move definition of struct coredump_params into coredump.h
Move the definition of struct coredump_params into coredump.h where it belongs.
Remove the slightly errorneous comment explaining
coredump: Move definition of struct coredump_params into coredump.h
Move the definition of struct coredump_params into coredump.h where it belongs.
Remove the slightly errorneous comment explaining why struct coredump_params was declared in binfmts.h.
Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
d65bc29b |
| 13-Feb-2022 |
Alexey Dobriyan <[email protected]> |
binfmt: move more stuff undef CONFIG_COREDUMP
struct linux_binfmt::core_dump and struct min_coredump::min_coredump are used under CONFIG_COREDUMP only. Shrink those embedded configs a bit.
Signed-o
binfmt: move more stuff undef CONFIG_COREDUMP
struct linux_binfmt::core_dump and struct min_coredump::min_coredump are used under CONFIG_COREDUMP only. Shrink those embedded configs a bit.
Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5 |
|
| #
d0f1088b |
| 08-Mar-2020 |
Al Viro <[email protected]> |
coredump: don't bother with do_truncate()
have dump_skip() just remember how much needs to be skipped, leave actual seeks/writing zeroes to the next dump_emit() or the end of coredump output, whiche
coredump: don't bother with do_truncate()
have dump_skip() just remember how much needs to be skipped, leave actual seeks/writing zeroes to the next dump_emit() or the end of coredump output, whichever comes first. And instead of playing with do_truncate() in the end, just write one NUL at the end of the last gap (if any).
Signed-off-by: Al Viro <[email protected]>
show more ...
|
|
Revision tags: v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1 |
|
| #
2347961b |
| 28-Jan-2020 |
Laurent Vivier <[email protected]> |
binfmt_misc: pass binfmt_misc flags to the interpreter
It can be useful to the interpreter to know which flags are in use.
For instance, knowing if the preserve-argv[0] is in use would allow to ski
binfmt_misc: pass binfmt_misc flags to the interpreter
It can be useful to the interpreter to know which flags are in use.
For instance, knowing if the preserve-argv[0] is in use would allow to skip the pathname argument.
This patch uses an unused auxiliary vector, AT_FLAGS, to add a flag to inform interpreter if the preserve-argv[0] is enabled.
Note by Helge Deller: The real-world user of this patch is qemu-user, which needs to know if it has to preserve the argv[0]. See Debian bug #970460.
Signed-off-by: Laurent Vivier <[email protected]> Reviewed-by: YunQiang Su <[email protected]> URL: http://bugs.debian.org/970460 Signed-off-by: Helge Deller <[email protected]>
show more ...
|
| #
be619f7f |
| 13-Jul-2020 |
Eric W. Biederman <[email protected]> |
exec: Implement kernel_execve
To allow the kernel not to play games with set_fs to call exec implement kernel_execve. The function kernel_execve takes pointers into kernel memory and copies the val
exec: Implement kernel_execve
To allow the kernel not to play games with set_fs to call exec implement kernel_execve. The function kernel_execve takes pointers into kernel memory and copies the values pointed to onto the new userspace stack.
The calls with arguments from kernel space of do_execve are replaced with calls to kernel_execve.
The calls do_execve and do_execveat are made static as there are now no callers outside of exec.
The comments that mention do_execve are updated to refer to kernel_execve or execve depending on the circumstances. In addition to correcting the comments, this makes it easy to grep for do_execve and verify it is not used.
Inspired-by: https://lkml.kernel.org/r/[email protected] Reviewed-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
60d9ad1d |
| 11-Jul-2020 |
Eric W. Biederman <[email protected]> |
exec: Move initialization of bprm->filename into alloc_bprm
Currently it is necessary for the usermode helper code and the code that launches init to use set_fs so that pages coming from the kernel
exec: Move initialization of bprm->filename into alloc_bprm
Currently it is necessary for the usermode helper code and the code that launches init to use set_fs so that pages coming from the kernel look like they are coming from userspace.
To allow that usage of set_fs to be removed cleanly the argument copying from userspace needs to happen earlier. Move the computation of bprm->filename and possible allocation of a name in the case of execveat into alloc_bprm to make that possible.
The exectuable name, the arguments, and the environment are copied into the new usermode stack which is stored in bprm until exec passes the point of no return.
As the executable name is copied first onto the usermode stack it needs to be known. As there are no dependencies to computing the executable name, compute it early in alloc_bprm.
As an implementation detail if the filename needs to be generated because it embeds a file descriptor store that filename in a new field bprm->fdpath, and free it in free_bprm. Previously this was done in an independent variable pathbuf. I have renamed pathbuf fdpath because fdpath is more suggestive of what kind of path is in the variable. I moved fdpath into struct linux_binprm because it is tightly tied to the other variables in struct linux_binprm, and as such is needed to allow the call alloc_binprm to move.
Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
9746c9be |
| 11-Jul-2020 |
Eric W. Biederman <[email protected]> |
exec: Remove unnecessary spaces from binfmts.h
The general convention in the linux kernel is to define a pointer member as "type *name". The declaration of struct linux_binprm has several pointer d
exec: Remove unnecessary spaces from binfmts.h
The general convention in the linux kernel is to define a pointer member as "type *name". The declaration of struct linux_binprm has several pointer defined as "type * name". Update them to the form of "type *name" for consistency.
Suggested-by: Kees Cook <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
25cf336d |
| 25-Jun-2020 |
Eric W. Biederman <[email protected]> |
exec: Remove do_execve_file
Now that the last callser has been removed remove this code from exec.
For anyone thinking of resurrecing do_execve_file please note that the code was buggy in several f
exec: Remove do_execve_file
Now that the last callser has been removed remove this code from exec.
For anyone thinking of resurrecing do_execve_file please note that the code was buggy in several fundamental ways.
- It did not ensure the file it was passed was read-only and that deny_write_access had been called on it. Which subtlely breaks invaniants in exec.
- The caller of do_execve_file was expected to hold and put a reference to the file, but an extra reference for use by exec was not taken so that when exec put it's reference to the file an underflow occured on the file reference count.
- The point of the interface was so that a pathname did not need to exist. Which breaks pathname based LSMs.
Tetsuo Handa originally reported these issues[1]. While it was clear that deny_write_access was missing the fundamental incompatibility with the passed in O_RDWR filehandle was not immediately recognized.
All of these issues were fixed by modifying the usermode driver code to have a path, so it did not need this hack.
Reported-by: Tetsuo Handa <[email protected]> [1] https://lore.kernel.org/linux-fsdevel/[email protected]/ v1: https://lkml.kernel.org/r/[email protected] v2: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Reviewed-by: Greg Kroah-Hartman <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Tested-by: Alexei Starovoitov <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
986db2d1 |
| 04-Jun-2020 |
Christoph Hellwig <[email protected]> |
exec: simplify the copy_strings_kernel calling convention
copy_strings_kernel is always used with a single argument, adjust the calling convention to that.
Signed-off-by: Christoph Hellwig <hch@lst
exec: simplify the copy_strings_kernel calling convention
copy_strings_kernel is always used with a single argument, adjust the calling convention to that.
Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Alexander Viro <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
56305aa9 |
| 30-May-2020 |
Eric W. Biederman <[email protected]> |
exec: Compute file based creds only once
Move the computation of creds from prepare_binfmt into begin_new_exec so that the creds need only be computed once. This is just code reorganization no sema
exec: Compute file based creds only once
Move the computation of creds from prepare_binfmt into begin_new_exec so that the creds need only be computed once. This is just code reorganization no semantic changes of any kind are made.
Moving the computation is safe. I have looked through the kernel and verified none of the binfmts look at bprm->cred directly, and that there are no helpers that look at bprm->cred indirectly. Which means that it is not a problem to compute the bprm->cred later in the execution flow as it is not used until it becomes current->cred.
A new function bprm_creds_from_file is added to contain the work that needs to be done. bprm_creds_from_file first computes which file bprm->executable or most likely bprm->file that the bprm->creds will be computed from.
The funciton bprm_fill_uid is updated to receive the file instead of accessing bprm->file. The now unnecessary work needed to reset the bprm->cred->euid, and bprm->cred->egid is removed from brpm_fill_uid. A small comment to document that bprm_fill_uid now only deals with the work to handle suid and sgid files. The default case is already heandled by prepare_exec_creds.
The function security_bprm_repopulate_creds is renamed security_bprm_creds_from_file and now is explicitly passed the file from which to compute the creds. The documentation of the bprm_creds_from_file security hook is updated to explain when the hook is called and what it needs to do. The file is passed from cap_bprm_creds_from_file into get_file_caps so that the caps are computed for the appropriate file. The now unnecessary work in cap_bprm_creds_from_file to reset the ambient capabilites has been removed. A small comment to document that the work of cap_bprm_creds_from_file is to read capabilities from the files secureity attribute and derive capabilities from the fact the user had uid 0 has been added.
Reviewed-by: Kees Cook <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
a7868323 |
| 29-May-2020 |
Eric W. Biederman <[email protected]> |
exec: Add a per bprm->file version of per_clear
There is a small bug in the code that recomputes parts of bprm->cred for every bprm->file. The code never recomputes the part of clear_dangerous_pers
exec: Add a per bprm->file version of per_clear
There is a small bug in the code that recomputes parts of bprm->cred for every bprm->file. The code never recomputes the part of clear_dangerous_personality_flags it is responsible for.
Which means that in practice if someone creates a sgid script the interpreter will not be able to use any of: READ_IMPLIES_EXEC ADDR_NO_RANDOMIZE ADDR_COMPAT_LAYOUT MMAP_PAGE_ZERO.
This accentially clearing of personality flags probably does not matter in practice because no one has complained but it does make the code more difficult to understand.
Further remaining bug compatible prevents the recomputation from being removed and replaced by simply computing bprm->cred once from the final bprm->file.
Making this change removes the last behavior difference between computing bprm->creds from the final file and recomputing bprm->cred several times. Which allows this behavior change to be justified for it's own reasons, and for any but hunts looking into why the behavior changed to wind up here instead of in the code that will follow that computes bprm->cred from the final bprm->file.
This small logic bug appears to have existed since the code started clearing dangerous personality bits.
History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support") Reviewed-by: Kees Cook <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
bc2bf338 |
| 18-May-2020 |
Eric W. Biederman <[email protected]> |
exec: Remove recursion from search_binary_handler
Recursion in kernel code is generally a bad idea as it can overflow the kernel stack. Recursion in exec also hides that the code is looping and tha
exec: Remove recursion from search_binary_handler
Recursion in kernel code is generally a bad idea as it can overflow the kernel stack. Recursion in exec also hides that the code is looping and that the loop changes bprm->file.
Instead of recursing in search_binary_handler have the methods that would recurse set bprm->interpreter and return 0. Modify exec_binprm to loop when bprm->interpreter is set. Consolidate all of the reassignments of bprm->file in that loop to make it clear what is going on.
The structure of the new loop in exec_binprm is that all errors return immediately, while successful completion (ret == 0 && !bprm->interpreter) just breaks out of the loop and runs what exec_bprm has always run upon successful completion.
Fail if the an interpreter is being call after execfd has been set. The code has never properly handled an interpreter being called with execfd being set and with reassignments of bprm->file and the assignment of bprm->executable in generic code it has finally become possible to test and fail when if this problematic condition happens.
With the reassignments of bprm->file and the assignment of bprm->executable moved into the generic code add a test to see if bprm->executable is being reassigned.
In search_binary_handler remove the test for !bprm->file. With all reassignments of bprm->file moved to exec_binprm bprm->file can never be NULL in search_binary_handler.
Link: https://lkml.kernel.org/r/[email protected] Acked-by: Linus Torvalds <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
b8a61c9e |
| 14-May-2020 |
Eric W. Biederman <[email protected]> |
exec: Generic execfd support
Most of the support for passing the file descriptor of an executable to an interpreter already lives in the generic code and in binfmt_elf. Rework the fields in binfmt_e
exec: Generic execfd support
Most of the support for passing the file descriptor of an executable to an interpreter already lives in the generic code and in binfmt_elf. Rework the fields in binfmt_elf that deal with executable file descriptor passing to make executable file descriptor passing a first class concept.
Move the fd_install from binfmt_misc into begin_new_exec after the new creds have been installed. This means that accessing the file through /proc/<pid>/fd/N is able to see the creds for the new executable before allowing access to the new executables files.
Performing the install of the executables file descriptor after the point of no return also means that nothing special needs to be done on error. The exiting of the process will close all of it's open files.
Move the would_dump from binfmt_misc into begin_new_exec right after would_dump is called on the bprm->file. This makes it obvious this case exists and that no nesting of bprm->file is currently supported.
In binfmt_misc the movement of fd_install into generic code means that it's special error exit path is no longer needed.
Link: https://lkml.kernel.org/r/[email protected] Acked-by: Linus Torvalds <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
8b72ca90 |
| 14-May-2020 |
Eric W. Biederman <[email protected]> |
exec: Move the call of prepare_binprm into search_binary_handler
The code in prepare_binary_handler needs to be run every time search_binary_handler is called so move the call into search_binary_han
exec: Move the call of prepare_binprm into search_binary_handler
The code in prepare_binary_handler needs to be run every time search_binary_handler is called so move the call into search_binary_handler itself to make the code simpler and easier to understand.
Link: https://lkml.kernel.org/r/[email protected] Acked-by: Linus Torvalds <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: James Morris <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
a16b3357 |
| 16-May-2020 |
Eric W. Biederman <[email protected]> |
exec: Allow load_misc_binary to call prepare_binprm unconditionally
Add a flag preserve_creds that binfmt_misc can set to prevent credentials from being updated. This allows binfmt_misc to always c
exec: Allow load_misc_binary to call prepare_binprm unconditionally
Add a flag preserve_creds that binfmt_misc can set to prevent credentials from being updated. This allows binfmt_misc to always call prepare_binprm. Allowing the credential computation logic to be consolidated.
Not replacing the credentials with the interpreters credentials is safe because because an open file descriptor to the executable is passed to the interpreter. As the interpreter does not need to reopen the executable it is guaranteed to see the same file that exec sees.
Ref: c407c033de84 ("[PATCH] binfmt_misc: improve calculation of interpreter's credentials") Link: https://lkml.kernel.org/r/[email protected] Acked-by: Linus Torvalds <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
112b7147 |
| 14-May-2020 |
Eric W. Biederman <[email protected]> |
exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
Rename bprm->cap_elevated to bprm->active_secureexec and initialize it in prepare_binprm instead of in cap_bprm_set_creds.
exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
Rename bprm->cap_elevated to bprm->active_secureexec and initialize it in prepare_binprm instead of in cap_bprm_set_creds. Initializing bprm->active_secureexec in prepare_binprm allows multiple implementations of security_bprm_repopulate_creds to play nicely with each other.
Rename security_bprm_set_creds to security_bprm_reopulate_creds to emphasize that this path recomputes part of bprm->cred. This recomputation avoids the time of check vs time of use problems that are inherent in unix #! interpreters.
In short two renames and a move in the location of initializing bprm->active_secureexec.
Link: https://lkml.kernel.org/r/[email protected] Acked-by: Linus Torvalds <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
b8bff599 |
| 22-Mar-2020 |
Eric W. Biederman <[email protected]> |
exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
Today security_bprm_set_creds has several implementations: apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_cred
exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
Today security_bprm_set_creds has several implementations: apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds, smack_bprm_set_creds, and tomoyo_bprm_set_creds.
Except for cap_bprm_set_creds they all test bprm->called_set_creds and return immediately if it is true. The function cap_bprm_set_creds ignores bprm->calld_sed_creds entirely.
Create a new LSM hook security_bprm_creds_for_exec that is called just before prepare_binprm in __do_execve_file, resulting in a LSM hook that is called exactly once for the entire of exec. Modify the bits of security_bprm_set_creds that only want to be called once per exec into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds behind.
Remove bprm->called_set_creds all of it's former users have been moved to security_bprm_creds_for_exec.
Add or upate comments a appropriate to bring them up to date and to reflect this change.
Link: https://lkml.kernel.org/r/[email protected] Acked-by: Linus Torvalds <[email protected]> Acked-by: Casey Schaufler <[email protected]> # For the LSM and Smack bits Reviewed-by: Kees Cook <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
2388777a |
| 03-May-2020 |
Eric W. Biederman <[email protected]> |
exec: Rename flush_old_exec begin_new_exec
There is and has been for a very long time been a lot more going on in flush_old_exec than just flushing the old state. After the movement of code from se
exec: Rename flush_old_exec begin_new_exec
There is and has been for a very long time been a lot more going on in flush_old_exec than just flushing the old state. After the movement of code from setup_new_exec there is a whole lot more going on than just flushing the old executables state.
Rename flush_old_exec to begin_new_exec to more accurately reflect what this function does.
Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Greg Ungerer <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|
| #
96ecee29 |
| 03-May-2020 |
Eric W. Biederman <[email protected]> |
exec: Merge install_exec_creds into setup_new_exec
The two functions are now always called one right after the other so merge them together to make future maintenance easier.
Reviewed-by: Kees Cook
exec: Merge install_exec_creds into setup_new_exec
The two functions are now always called one right after the other so merge them together to make future maintenance easier.
Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Greg Ungerer <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
show more ...
|