|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14 |
|
| #
983e0e4e |
| 18-Mar-2025 |
Pauli Virtanen <[email protected]> |
net-timestamp: COMPLETION timestamp on packet tx completion
Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp when hardware reports a packet completed.
Completion tstamp is us
net-timestamp: COMPLETION timestamp on packet tx completion
Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp when hardware reports a packet completed.
Completion tstamp is useful for Bluetooth, as hardware timestamps do not exist in the HCI specification except for ISO packets, and the hardware has a queue where packets may wait. In this case the software SND timestamp only reflects the kernel-side part of the total latency (usually small) and queue length (usually 0 unless HW buffers congested), whereas the completion report time is more informative of the true latency.
It may also be useful in other cases where HW TX timestamps cannot be obtained and user wants to estimate an upper bound to when the TX probably happened.
Signed-off-by: Pauli Virtanen <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc7 |
|
| #
ed3ba9b6 |
| 16-Mar-2025 |
Kuniyuki Iwashima <[email protected]> |
net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.
SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to br_ioctl_call(), which causes unnecessary RTNL dance and the splat below [0]
net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.
SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to br_ioctl_call(), which causes unnecessary RTNL dance and the splat below [0] under RTNL pressure.
Let's say Thread A is trying to detach a device from a bridge and Thread B is trying to remove the bridge.
In dev_ioctl(), Thread A bumps the bridge device's refcnt by netdev_hold() and releases RTNL because the following br_ioctl_call() also re-acquires RTNL.
In the race window, Thread B could acquire RTNL and try to remove the bridge device. Then, rtnl_unlock() by Thread B will release RTNL and wait for netdev_put() by Thread A.
Thread A, however, must hold RTNL after the unlock in dev_ifsioc(), which may take long under RTNL pressure, resulting in the splat by Thread B.
Thread A (SIOCBRDELIF) Thread B (SIOCBRDELBR) ---------------------- ---------------------- sock_ioctl sock_ioctl `- sock_do_ioctl `- br_ioctl_call `- dev_ioctl `- br_ioctl_stub |- rtnl_lock | |- dev_ifsioc ' ' |- dev = __dev_get_by_name(...) |- netdev_hold(dev, ...) . / |- rtnl_unlock ------. | | |- br_ioctl_call `---> |- rtnl_lock Race | | `- br_ioctl_stub |- br_del_bridge Window | | | |- dev = __dev_get_by_name(...) | | | May take long | `- br_dev_delete(dev, ...) | | | under RTNL pressure | `- unregister_netdevice_queue(dev, ...) | | | | `- rtnl_unlock \ | |- rtnl_lock <-' `- netdev_run_todo | |- ... `- netdev_run_todo | `- rtnl_unlock |- __rtnl_unlock | |- netdev_wait_allrefs_any |- netdev_put(dev, ...) <----------------' Wait refcnt decrement and log splat below
To avoid blocking SIOCBRDELBR unnecessarily, let's not call dev_ioctl() for SIOCBRADDIF and SIOCBRDELIF.
In the dev_ioctl() path, we do the following:
1. Copy struct ifreq by get_user_ifreq in sock_do_ioctl() 2. Check CAP_NET_ADMIN in dev_ioctl() 3. Call dev_load() in dev_ioctl() 4. Fetch the master dev from ifr.ifr_name in dev_ifsioc()
3. can be done by request_module() in br_ioctl_call(), so we move 1., 2., and 4. to br_ioctl_stub().
Note that 2. is also checked later in add_del_if(), but it's better performed before RTNL.
SIOCBRADDIF and SIOCBRDELIF have been processed in dev_ioctl() since the pre-git era, and there seems to be no specific reason to process them there.
[0]: unregister_netdevice: waiting for wpan3 to become free. Usage count = 2 ref_tracker: wpan3@ffff8880662d8608 has 1/1 users at __netdev_tracker_alloc include/linux/netdevice.h:4282 [inline] netdev_hold include/linux/netdevice.h:4311 [inline] dev_ifsioc+0xc6a/0x1160 net/core/dev_ioctl.c:624 dev_ioctl+0x255/0x10c0 net/core/dev_ioctl.c:826 sock_do_ioctl+0x1ca/0x260 net/socket.c:1213 sock_ioctl+0x23a/0x6c0 net/socket.c:1318 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl fs/ioctl.c:892 [inline] __x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 893b19587534 ("net: bridge: fix ioctl locking") Reported-by: syzkaller <[email protected]> Reported-by: yan kang <[email protected]> Reported-by: yue sun <[email protected]> Closes: https://lore.kernel.org/netdev/SY8P300MB0421225D54EB92762AE8F0F2A1D32@SY8P300MB0421.AUSP300.PROD.OUTLOOK.COM/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc6, v6.14-rc5 |
|
| #
e6116fc6 |
| 25-Feb-2025 |
Willem de Bruijn <[email protected]> |
net: skb: free up one bit in tx_flags
The linked series wants to add skb tx completion timestamps. That needs a bit in skb_shared_info.tx_flags, but all are in use.
A per-skb bit is only needed for
net: skb: free up one bit in tx_flags
The linked series wants to add skb tx completion timestamps. That needs a bit in skb_shared_info.tx_flags, but all are in use.
A per-skb bit is only needed for features that are configured on a per packet basis. Per socket features can be read from sk->sk_tsflags.
Per packet tsflags can be set in sendmsg using cmsg, but only those in SOF_TIMESTAMPING_TX_RECORD_MASK.
Per packet tsflags can also be set without cmsg by sandwiching a send inbetween two setsockopts:
val |= SOF_TIMESTAMPING_$FEATURE; setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)); write(fd, buf, sz); val &= ~SOF_TIMESTAMPING_$FEATURE; setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
Changing a datapath test from skb_shinfo(skb)->tx_flags to skb->sk->sk_tsflags can change behavior in that case, as the tx_flags is written before the second setsockopt updates sk_tsflags.
Therefore, only bits can be reclaimed that cannot be set by cmsg and are also highly unlikely to be used to target individual packets otherwise.
Free up the bit currently used for SKBTX_HW_TSTAMP_USE_CYCLES. This selects between clock and free running counter source for HW TX timestamps. It is probable that all packets of the same socket will always use the same source.
Link: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Willem de Bruijn <[email protected]> Reviewed-by: Jason Xing <[email protected]> Reviewed-by: Gerhard Engleder <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4 |
|
| #
2deaf7f4 |
| 20-Feb-2025 |
Jason Xing <[email protected]> |
bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
Support hw SCM_TSTAMP_SND case for bpf timestamping.
Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_HW_CB. This callback will occur at the same
bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
Support hw SCM_TSTAMP_SND case for bpf timestamping.
Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_HW_CB. This callback will occur at the same timestamping point as the user space's hardware SCM_TSTAMP_SND. The BPF program can use it to get the same SCM_TSTAMP_SND timestamp without modifying the user-space application.
To avoid increasing the code complexity, replace SKBTX_HW_TSTAMP with SKBTX_HW_TSTAMP_NOBPF instead of changing numerous callers from driver side using SKBTX_HW_TSTAMP. The new definition of SKBTX_HW_TSTAMP means the combination tests of socket timestamping and bpf timestamping. After this patch, drivers can work under the bpf timestamping.
Considering some drivers don't assign the skb with hardware timestamp, this patch does the assignment and then BPF program can acquire the hwstamp from skb directly.
Signed-off-by: Jason Xing <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected]
show more ...
|
|
Revision tags: v6.14-rc3, v6.14-rc2 |
|
| #
2a42754b |
| 03-Feb-2025 |
Amir Goldstein <[email protected]> |
fsnotify: disable notification by default for all pseudo files
Most pseudo files are not applicable for fsnotify events at all, let alone to the new pre-content events.
Disable notifications to all
fsnotify: disable notification by default for all pseudo files
Most pseudo files are not applicable for fsnotify events at all, let alone to the new pre-content events.
Disable notifications to all files allocated with alloc_file_pseudo() and enable legacy inotify events for the specific cases of pipe and socket, which have known users of inotify events.
Pre-content events are also kept disabled for sockets and pipes.
Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches") Reported-by: Alex Williamson <[email protected]> Closes: https://lore.kernel.org/linux-fsdevel/[email protected]/ Suggested-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wi2pThSVY=zhO=ZKxViBj5QCRX-=AS2+rVknQgJnHXDFg@mail.gmail.com/ Tested-by: Alex Williamson <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc1, v6.13, v6.13-rc7 |
|
| #
b6be5ba8 |
| 12-Jan-2025 |
Dr. David Alan Gilbert <[email protected]> |
socket: Remove unused kernel_sendmsg_locked
The last use of kernel_sendmsg_locked() was removed in 2023 by commit dc97391e6610 ("sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)")
socket: Remove unused kernel_sendmsg_locked
The last use of kernel_sendmsg_locked() was removed in 2023 by commit dc97391e6610 ("sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Joe Damato <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
21520e74 |
| 10-Jan-2025 |
Jakub Kicinski <[email protected]> |
net: hide the definition of dev_get_by_napi_id()
There are no module callers of dev_get_by_napi_id(), and commit d1cacd747768 ("netdev: prevent accessing NAPI instances from another namespace") prov
net: hide the definition of dev_get_by_napi_id()
There are no module callers of dev_get_by_napi_id(), and commit d1cacd747768 ("netdev: prevent accessing NAPI instances from another namespace") proves that getting NAPI by id needs to be done with care. So hide dev_get_by_napi_id().
Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Reviewed-by: Joe Damato <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
e45469e5 |
| 13-Dec-2024 |
Anna Emese Nyiri <[email protected]> |
sock: Introduce SO_RCVPRIORITY socket option
Add new socket option, SO_RCVPRIORITY, to include SO_PRIORITY in the ancillary data returned by recvmsg(). This is analogous to the existing support for
sock: Introduce SO_RCVPRIORITY socket option
Add new socket option, SO_RCVPRIORITY, to include SO_PRIORITY in the ancillary data returned by recvmsg(). This is analogous to the existing support for SO_RCVMARK, as implemented in commit 6fd1d51cfa253 ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()").
Reviewed-by: Willem de Bruijn <[email protected]> Suggested-by: Ferenc Fejes <[email protected]> Signed-off-by: Anna Emese Nyiri <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1 |
|
| #
53c0a58b |
| 26-May-2024 |
Al Viro <[email protected]> |
net/socket.c: switch to CLASS(fd)
The important part in sockfd_lookup_light() is avoiding needless file refcount operations, not the marginal reduction of the register pressure from not keeping a s
net/socket.c: switch to CLASS(fd)
The important part in sockfd_lookup_light() is avoiding needless file refcount operations, not the marginal reduction of the register pressure from not keeping a struct file pointer in the caller.
Switch to use fdget()/fdpu(); with sane use of CLASS(fd) we can get a better code generation...
Would be nice if somebody tested it on networking test suites (including benchmarks)...
sockfd_lookup_light() does fdget(), uses sock_from_file() to get the associated socket and returns the struct socket reference to the caller, along with "do we need to fput()" flag. No matching fdput(), the caller does its equivalent manually, using the fact that sock->file points to the struct file the socket has come from.
Get rid of that - have the callers do fdget()/fdput() and use sock_from_file() directly. That kills sockfd_lookup_light() and fput_light() (no users left).
What's more, we can get rid of explicit fdget()/fdput() by switching to CLASS(fd, ...) - code generation does not suffer, since now fdput() inserted on "descriptor is not opened" failure exit is recognized to be a no-op by compiler.
[folded a fix for braino in do_recvmmsg() caught by Simon Horman]
Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
4bbd360a |
| 24-Oct-2024 |
Kuniyuki Iwashima <[email protected]> |
socket: Print pf->create() when it does not clear sock->sk on failure.
I suggested to put DEBUG_NET_WARN_ON_ONCE() in __sock_create() to catch possible use-after-free.
But the warning itself was no
socket: Print pf->create() when it does not clear sock->sk on failure.
I suggested to put DEBUG_NET_WARN_ON_ONCE() in __sock_create() to catch possible use-after-free.
But the warning itself was not useful because our interest is in the callee than the caller.
Let's define DEBUG_NET_WARN_ONCE() and print the name of pf->create() and the socket identifier.
While at it, we enclose DEBUG_NET_WARN_ON_ONCE() in parentheses too to avoid a checkpatch error.
Note that %pf or %pF were obsoleted and will be removed later as per comment in lib/vsprintf.c.
Link: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
48156296 |
| 14-Oct-2024 |
Ignat Korchagin <[email protected]> |
net: warn, if pf->create does not clear sock->sk on error
All pf->create implementations have been fixed now to clear sock->sk on error, when they deallocate the allocated sk object.
Put a warning
net: warn, if pf->create does not clear sock->sk on error
All pf->create implementations have been fixed now to clear sock->sk on error, when they deallocate the allocated sk object.
Put a warning in place to make sure we don't break this promise in the future.
Suggested-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Ignat Korchagin <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
63108314 |
| 03-Oct-2024 |
Ignat Korchagin <[email protected]> |
net: explicitly clear the sk pointer, when pf->create fails
We have recently noticed the exact same KASAN splat as in commit 6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket creat
net: explicitly clear the sk pointer, when pf->create fails
We have recently noticed the exact same KASAN splat as in commit 6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket creation fails"). The problem is that commit did not fully address the problem, as some pf->create implementations do not use sk_common_release in their error paths.
For example, we can use the same reproducer as in the above commit, but changing ping to arping. arping uses AF_PACKET socket and if packet_create fails, it will just sk_free the allocated sk object.
While we could chase all the pf->create implementations and make sure they NULL the freed sk object on error from the socket, we can't guarantee future protocols will not make the same mistake.
So it is easier to just explicitly NULL the sk pointer upon return from pf->create in __sock_create. We do know that pf->create always releases the allocated sk object on error, so if the pointer is not NULL, it is definitely dangling.
Fixes: 6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket creation fails") Signed-off-by: Ignat Korchagin <[email protected]> Cc: [email protected] Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
822b5bc6 |
| 01-Oct-2024 |
Vadim Fedorenko <[email protected]> |
net_tstamp: add SCM_TS_OPT_ID for RAW sockets
The last type of sockets which supports SOF_TIMESTAMPING_OPT_ID is RAW sockets. To add new option this patch converts all callers (direct and indirect)
net_tstamp: add SCM_TS_OPT_ID for RAW sockets
The last type of sockets which supports SOF_TIMESTAMPING_OPT_ID is RAW sockets. To add new option this patch converts all callers (direct and indirect) of _sock_tx_timestamp to provide sockcm_cookie instead of tsflags. And while here fix __sock_tx_timestamp to receive tsflags as __u32 instead of __u16.
Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Jason Xing <[email protected]> Signed-off-by: Vadim Fedorenko <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
cb787f4a |
| 27-Sep-2024 |
Al Viro <[email protected]> |
[tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical
[tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe.
Signed-off-by: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
be8e9eb3 |
| 09-Sep-2024 |
Jason Xing <[email protected]> |
net-timestamp: introduce SOF_TIMESTAMPING_OPT_RX_FILTER flag
introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter out rx
net-timestamp: introduce SOF_TIMESTAMPING_OPT_RX_FILTER flag
introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter out rx software timestamp report, especially after a process turns on netstamp_needed_key which can time stamp every incoming skb.
Previously, we found out if an application starts first which turns on netstamp_needed_key, then another one only passing SOF_TIMESTAMPING_SOFTWARE could also get rx timestamp. Now we handle this case by introducing this new flag without breaking users.
Quoting Willem to explain why we need the flag: "why a process would want to request software timestamp reporting, but not receive software timestamp generation. The only use I see is when the application does request SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE."
Similarly, this new flag could also be used for hardware case where we can set it with SOF_TIMESTAMPING_RAW_HARDWARE, then we won't receive hardware receive timestamp.
Another thing about errqueue in this patch I have a few words to say: In this case, we need to handle the egress path carefully, or else reporting the tx timestamp will fail. Egress path and ingress path will finally call sock_recv_timestamp(). We have to distinguish them. Errqueue is a good indicator to reflect the flow direction.
Suggested-by: Willem de Bruijn <[email protected]> Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
33f339a1 |
| 30-Aug-2024 |
Tze-nan Wu <[email protected]> |
bpf, net: Fix a potential race in do_sock_getsockopt()
There's a potential race when `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` is false during the execution of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`, but
bpf, net: Fix a potential race in do_sock_getsockopt()
There's a potential race when `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` is false during the execution of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`, but becomes true when `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is called. This inconsistency can lead to `BPF_CGROUP_RUN_PROG_GETSOCKOPT` receiving an "-EFAULT" from `__cgroup_bpf_run_filter_getsockopt(max_optlen=0)`. Scenario shown as below:
`process A` `process B` ----------- ------------ BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN enable CGROUP_GETSOCKOPT BPF_CGROUP_RUN_PROG_GETSOCKOPT (-EFAULT)
To resolve this, remove the `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` macro and directly uses `copy_from_sockptr` to ensure that `max_optlen` is always set before `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is invoked.
Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks") Co-developed-by: Yanghui Li <[email protected]> Signed-off-by: Yanghui Li <[email protected]> Co-developed-by: Cheng-Jui Wang <[email protected]> Signed-off-by: Cheng-Jui Wang <[email protected]> Signed-off-by: Tze-nan Wu <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
88a2f646 |
| 31-May-2024 |
Al Viro <[email protected]> |
struct fd: representation change
We want the compiler to see that fdput() on empty instance is a no-op. The emptiness check is that file reference is NULL, while fdput() is "fput() if FDPUT_FPUT i
struct fd: representation change
We want the compiler to see that fdput() on empty instance is a no-op. The emptiness check is that file reference is NULL, while fdput() is "fput() if FDPUT_FPUT is present in flags". The reason why fdput() on empty instance is a no-op is something compiler can't see - it's that we never generate instances with NULL file reference combined with non-zero flags.
It's not that hard to deal with - the real primitives behind fdget() et.al. are returning an unsigned long value, unpacked by (inlined) __to_fd() into the current struct file * + int. The lower bits are used to store flags, while the rest encodes the pointer. Linus suggested that keeping this unsigned long around with the extractions done by inlined accessors should generate a sane code and that turns out to be the case. Namely, turning struct fd into a struct-wrapped unsinged long, with fd_empty(f) => unlikely(f.word == 0) fd_file(f) => (struct file *)(f.word & ~3) fdput(f) => if (f.word & 1) fput(fd_file(f)) ends up with compiler doing the right thing. The cost is the patch footprint, of course - we need to switch f.file to fd_file(f) all over the tree, and it's not doable with simple search and replace; there are false positives, etc.
Note that the sole member of that structure is an opaque unsigned long - all accesses should be done via wrappers and I don't want to use a name that would invite manual casts to file pointers, etc. The value of that member is equal either to (unsigned long)p | flags, p being an address of some struct file instance, or to 0 for an empty fd.
For now the new predicate (fd_empty(f)) has no users; all the existing checks have form (!fd_file(f)). We will convert to fd_empty() use later; here we only define it (and tell the compiler that it's unlikely to return true).
This commit only deals with representation change; there will be followups.
Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
1da91ea8 |
| 31-May-2024 |
Al Viro <[email protected]> |
introduce fd_file(), convert all accessors to it.
For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are ve
introduce fd_file(), convert all accessors to it.
For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()).
NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...).
[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict]
Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
| #
bb6aaf73 |
| 14-Jun-2024 |
Gabriel Krisman Bertazi <[email protected]> |
net: Split a __sys_listen helper for io_uring
io_uring holds a reference to the file and maintains a sockaddr_storage address. Similarly to what was done to __sys_connect_file, split an internal he
net: Split a __sys_listen helper for io_uring
io_uring holds a reference to the file and maintains a sockaddr_storage address. Similarly to what was done to __sys_connect_file, split an internal helper for __sys_listen in preparation to support an io_uring listen command.
Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
| #
dc2e7797 |
| 14-Jun-2024 |
Gabriel Krisman Bertazi <[email protected]> |
net: Split a __sys_bind helper for io_uring
io_uring holds a reference to the file and maintains a sockaddr_storage address. Similarly to what was done to __sys_connect_file, split an internal help
net: Split a __sys_bind helper for io_uring
io_uring holds a reference to the file and maintains a sockaddr_storage address. Similarly to what was done to __sys_connect_file, split an internal helper for __sys_bind in preparation to supporting an io_uring bind command.
Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.9 |
|
| #
0645fbe7 |
| 09-May-2024 |
Jens Axboe <[email protected]> |
net: have do_accept() take a struct proto_accept_arg argument
In preparation for passing in more information via this API, change do_accept() to take a proto_accept_arg struct pointer rather than ju
net: have do_accept() take a struct proto_accept_arg argument
In preparation for passing in more information via this API, change do_accept() to take a proto_accept_arg struct pointer rather than just the file flags separately.
No functional changes in this patch.
Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
| #
92ef0fd5 |
| 09-May-2024 |
Jens Axboe <[email protected]> |
net: change proto and proto_ops accept type
Rather than pass in flags, error pointer, and whether this is a kernel invocation or not, add a struct proto_accept_arg struct as the argument. This then
net: change proto and proto_ops accept type
Rather than pass in flags, error pointer, and whether this is a kernel invocation or not, add a struct proto_accept_arg struct as the argument. This then holds all of these arguments, and prepares accept for being able to pass back more information.
No functional changes in this patch.
Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
8c9a6f54 |
| 09-Apr-2024 |
Pavel Begunkov <[email protected]> |
io_uring: separate header for exported net bits
We're exporting some io_uring bits to networking, e.g. for implementing a net callback for io_uring cmds, but we don't want to expose more than needed
io_uring: separate header for exported net bits
We're exporting some io_uring bits to networking, e.g. for implementing a net callback for io_uring cmds, but we don't want to expose more than needed. Add a separate header for networking.
Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: David Wei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1 |
|
| #
e54e09c0 |
| 12-Mar-2024 |
Jens Axboe <[email protected]> |
net: remove {revc,send}msg_copy_msghdr() from exports
The only user of these was io_uring, and it's not using them anymore. Make them static and remove them from the socket header file.
Signed-off-
net: remove {revc,send}msg_copy_msghdr() from exports
The only user of these was io_uring, and it's not using them anymore. Make them static and remove them from the socket header file.
Signed-off-by: Jens Axboe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.8, v6.8-rc7 |
|
| #
d4f01c5e |
| 28-Feb-2024 |
Chengming Zhou <[email protected]> |
net: remove SLAB_MEM_SPREAD flag usage
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1, so it became a dead flag since the commit 16a1d968358a ("mm/slab: re
net: remove SLAB_MEM_SPREAD flag usage
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1, so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete to avoid confusion for users. Here we can just remove all its users, which has no functional change.
[1] https://lore.kernel.org/all/[email protected]/
Signed-off-by: Chengming Zhou <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|