|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
| #
f1a69a94 |
| 04-Apr-2025 |
Ricardo Cañuelo Navarro <[email protected]> |
sctp: detect and prevent references to a freed transport in sendmsg
sctp_sendmsg() re-uses associations and transports when possible by doing a lookup based on the socket endpoint and the message de
sctp: detect and prevent references to a freed transport in sendmsg
sctp_sendmsg() re-uses associations and transports when possible by doing a lookup based on the socket endpoint and the message destination address, and then sctp_sendmsg_to_asoc() sets the selected transport in all the message chunks to be sent.
There's a possible race condition if another thread triggers the removal of that selected transport, for instance, by explicitly unbinding an address with setsockopt(SCTP_SOCKOPT_BINDX_REM), after the chunks have been set up and before the message is sent. This can happen if the send buffer is full, during the period when the sender thread temporarily releases the socket lock in sctp_wait_for_sndbuf().
This causes the access to the transport data in sctp_outq_select_transport(), when the association outqueue is flushed, to result in a use-after-free read.
This change avoids this scenario by having sctp_transport_free() signal the freeing of the transport, tagging it as "dead". In order to do this, the patch restores the "dead" bit in struct sctp_transport, which was removed in commit 47faa1e4c50e ("sctp: remove the dead field of sctp_transport").
Then, in the scenario where the sender thread has released the socket lock in sctp_wait_for_sndbuf(), the bit is checked again after re-acquiring the socket lock to detect the deletion. This is done while holding a reference to the transport to prevent it from being freed in the process.
If the transport was deleted while the socket lock was relinquished, sctp_sendmsg_to_asoc() will return -EAGAIN to let userspace retry the send.
The bug was found by a private syzbot instance (see the error report [1] and the C reproducer that triggers it [2]).
Link: https://people.igalia.com/rcn/kernel_logs/20250402__KASAN_slab-use-after-free_Read_in_sctp_outq_select_transport.txt [1] Link: https://people.igalia.com/rcn/kernel_logs/20250402__KASAN_slab-use-after-free_Read_in_sctp_outq_select_transport__repro.c [2] Cc: [email protected] Fixes: df132eff4638 ("sctp: clear the transport of some out_chunk_list chunks in sctp_assoc_rm_peer") Suggested-by: Xin Long <[email protected]> Signed-off-by: Ricardo Cañuelo Navarro <[email protected]> Acked-by: Xin Long <[email protected]> Link: https://patch.msgid.link/20250404-kasan_slab-use-after-free_read_in_sctp_outq_select_transport__20250404-v1-1-5ce4a0b78ef2@igalia.com Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5 |
|
| #
7f47fcea |
| 22-Aug-2024 |
Simon Horman <[email protected]> |
sctp: Correct spelling in headers
Correct spelling in sctp.h and structs.h. As reported by codespell.
Cc: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: Simon Horman <horms@kern
sctp: Correct spelling in headers
Correct spelling in sctp.h and structs.h. As reported by codespell.
Cc: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: Simon Horman <[email protected]> Acked-by: Xin Long <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3 |
|
| #
89304f91 |
| 02-Feb-2024 |
Eric Dumazet <[email protected]> |
sctp: preserve const qualifier in sctp_sk()
We can change sctp_sk() to propagate its argument const qualifier, thanks to container_of_const().
Signed-off-by: Eric Dumazet <[email protected]> Cc:
sctp: preserve const qualifier in sctp_sk()
We can change sctp_sk() to propagate its argument const qualifier, thanks to container_of_const().
Signed-off-by: Eric Dumazet <[email protected]> Cc: Marcelo Ricardo Leitner <[email protected]> Cc: Xin Long <[email protected]> Acked-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5 |
|
| #
49c467dc |
| 31-Jul-2023 |
Yue Haibing <[email protected]> |
sctp: Remove unused function declarations
These declarations are never implemented since beginning of git history.
Signed-off-by: Yue Haibing <[email protected]> Reviewed-by: Simon Horman <horm
sctp: Remove unused function declarations
These declarations are never implemented since beginning of git history.
Signed-off-by: Yue Haibing <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Xin Long <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3 |
|
| #
f97278ff |
| 19-Apr-2023 |
Xin Long <[email protected]> |
sctp: delete the nested flexible array peer_init
This patch deletes the flexible-array peer_init[] from the structure sctp_cookie to avoid some sparse warnings:
# make C=2 CF="-Wflexible-array-ne
sctp: delete the nested flexible array peer_init
This patch deletes the flexible-array peer_init[] from the structure sctp_cookie to avoid some sparse warnings:
# make C=2 CF="-Wflexible-array-nested" M=./net/sctp/ net/sctp/sm_make_chunk.c: note: in included file (through include/net/sctp/sctp.h): ./include/net/sctp/structs.h:1588:28: warning: nested flexible array ./include/net/sctp/structs.h:343:28: warning: nested flexible array
Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc7 |
|
| #
bd4b2818 |
| 14-Apr-2023 |
Xin Long <[email protected]> |
sctp: delete the obsolete code for the host name address param
In the latest RFC9260, the Host Name Address param has been deprecated. For INIT chunk:
Note 3: An INIT chunk MUST NOT contain the H
sctp: delete the obsolete code for the host name address param
In the latest RFC9260, the Host Name Address param has been deprecated. For INIT chunk:
Note 3: An INIT chunk MUST NOT contain the Host Name Address parameter. The receiver of an INIT chunk containing a Host Name Address parameter MUST send an ABORT chunk and MAY include an "Unresolvable Address" error cause.
For Supported Address Types:
The value indicating the Host Name Address parameter MUST NOT be used when sending this parameter and MUST be ignored when receiving this parameter.
Currently Linux SCTP doesn't really support Host Name Address param, but only saves some flag and print debug info, which actually won't even be triggered due to the verification in sctp_verify_param(). This patch is to delete those dead code.
Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2 |
|
| #
42d452e7 |
| 07-Mar-2023 |
Xin Long <[email protected]> |
sctp: add weighted fair queueing stream scheduler
As it says in rfc8260#section-3.6 about the weighted fair queueing scheduler:
A Weighted Fair Queueing scheduler between the streams is used. T
sctp: add weighted fair queueing stream scheduler
As it says in rfc8260#section-3.6 about the weighted fair queueing scheduler:
A Weighted Fair Queueing scheduler between the streams is used. The weight is configurable per outgoing SCTP stream. This scheduler considers the lengths of the messages of each stream and schedules them in a specific way to use the capacity according to the given weights. If the weight of stream S1 is n times the weight of stream S2, the scheduler should assign to stream S1 n times the capacity it assigns to stream S2. The details are implementation dependent. Interleaving user messages allows for a better realization of the capacity usage according to the given weights.
This patch adds Weighted Fair Queueing Scheduler actually based on the code of Fair Capacity Scheduler by adding fc_weight into struct sctp_stream_out_ext and taking it into account when sorting stream-> fc_list in sctp_sched_fc_sched() and sctp_sched_fc_dequeue_done().
Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| #
4821a076 |
| 07-Mar-2023 |
Xin Long <[email protected]> |
sctp: add fair capacity stream scheduler
As it says in rfc8260#section-3.5 about the fair capacity scheduler:
A fair capacity distribution between the streams is used. This scheduler conside
sctp: add fair capacity stream scheduler
As it says in rfc8260#section-3.5 about the fair capacity scheduler:
A fair capacity distribution between the streams is used. This scheduler considers the lengths of the messages of each stream and schedules them in a specific way to maintain an equal capacity for all streams. The details are implementation dependent. interleaving user messages allows for a better realization of the fair capacity usage.
This patch adds Fair Capacity Scheduler based on the foundations added by commit 5bbbbe32a431 ("sctp: introduce stream scheduler foundations"):
A fc_list and a fc_length are added into struct sctp_stream_out_ext and a fc_list is added into struct sctp_stream. In .enqueue, when there are chunks enqueued into a stream, this stream will be linked into stream-> fc_list by its fc_list ordered by its fc_length. In .dequeue, it always picks up the 1st skb from stream->fc_list. In .dequeue_done, fc_length is increased by chunk's len and update its location in stream->fc_list according to the its new fc_length.
Note that when the new fc_length overflows in .dequeue_done, instead of resetting all fc_lengths to 0, we only reduced them by U32_MAX / 4 to avoid a moment of imbalance in the scheduling, as Marcelo suggested.
Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc1 |
|
| #
68ba4463 |
| 22-Feb-2023 |
Xin Long <[email protected]> |
sctp: add a refcnt in sctp_stream_priorities to avoid a nested loop
With this refcnt added in sctp_stream_priorities, we don't need to traverse all streams to check if the prio is used by other stre
sctp: add a refcnt in sctp_stream_priorities to avoid a nested loop
With this refcnt added in sctp_stream_priorities, we don't need to traverse all streams to check if the prio is used by other streams when freeing one stream's prio in sctp_sched_prio_free_sid(). This can avoid a nested loop (up to 65535 * 65535), which may cause a stuck as Ying reported:
watchdog: BUG: soft lockup - CPU#23 stuck for 26s! [ksoftirqd/23:136] Call Trace: <TASK> sctp_sched_prio_free_sid+0xab/0x100 [sctp] sctp_stream_free_ext+0x64/0xa0 [sctp] sctp_stream_free+0x31/0x50 [sctp] sctp_association_free+0xa5/0x200 [sctp]
Note that it doesn't need to use refcount_t type for this counter, as its accessing is always protected under the sock lock.
v1->v2: - add a check in sctp_sched_prio_set to avoid the possible prio_head refcnt overflow.
Fixes: 9ed7bfc79542 ("sctp: fix memory leak in sctp_stream_outq_migrate()") Reported-by: Ying Xu <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: Xin Long <[email protected]> Link: https://lore.kernel.org/r/825eb0c905cb864991eba335f4a2b780e543f06b.1677085641.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6 |
|
| #
0af03170 |
| 16-Nov-2022 |
Xin Long <[email protected]> |
sctp: add dif and sdif check in asoc and ep lookup
This patch at first adds a pernet global l3mdev_accept to decide if it accepts the packets from a l3mdev when a SCTP socket doesn't bind to any int
sctp: add dif and sdif check in asoc and ep lookup
This patch at first adds a pernet global l3mdev_accept to decide if it accepts the packets from a l3mdev when a SCTP socket doesn't bind to any interface. It's set to 1 to avoid any possible incompatible issue, and in next patch, a sysctl will be introduced to allow to change it.
Then similar to inet/udp_sk_bound_dev_eq(), sctp_sk_bound_dev_eq() is added to check either dif or sdif is equal to sk_bound_dev_if, and to check sid is 0 or l3mdev_accept is 1 if sk_bound_dev_if is not set. This function is used to match a association or a endpoint, namely called by sctp_addrs_lookup_transport() and sctp_endpoint_is_match(). All functions that needs updating are:
sctp_rcv(): asoc: __sctp_rcv_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport() __sctp_rcv_lookup_harder() __sctp_rcv_init_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport() __sctp_rcv_walk_lookup() __sctp_rcv_asconf_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport()
ep: __sctp_rcv_lookup_endpoint() -> sctp_endpoint_is_match()
sctp_connect(): sctp_endpoint_is_peeled_off() __sctp_lookup_association() sctp_has_association() sctp_lookup_association() __sctp_lookup_association() -> sctp_addrs_lookup_transport()
sctp_diag_dump_one(): sctp_transport_lookup_process() -> sctp_addrs_lookup_transport()
Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
33e93ed2 |
| 16-Nov-2022 |
Xin Long <[email protected]> |
sctp: add skb_sdif in struct sctp_af
Add skb_sdif function in struct sctp_af to get the enslaved device for both ipv4 and ipv6 when adding SCTP VRF support in sctp_rcv in the next patch.
Signed-off
sctp: add skb_sdif in struct sctp_af
Add skb_sdif function in struct sctp_af to get the enslaved device for both ipv4 and ipv6 when adding SCTP VRF support in sctp_rcv in the next patch.
Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7 |
|
| #
5ec7d18d |
| 23-Dec-2021 |
Xin Long <[email protected]> |
sctp: use call_rcu to free endpoint
This patch is to delay the endpoint free by calling call_rcu() to fix another use-after-free issue in sctp_sock_dump():
BUG: KASAN: use-after-free in __lock_ac
sctp: use call_rcu to free endpoint
This patch is to delay the endpoint free by calling call_rcu() to fix another use-after-free issue in sctp_sock_dump():
BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20 Call Trace: __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218 lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:334 [inline] __lock_sock+0x203/0x350 net/core/sock.c:2253 lock_sock_nested+0xfe/0x120 net/core/sock.c:2774 lock_sock include/net/sock.h:1492 [inline] sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324 sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091 sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527 __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049 inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065 netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244 __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352 netlink_dump_start include/linux/netlink.h:216 [inline] inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170 __sock_diag_cmd net/core/sock_diag.c:232 [inline] sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274
This issue occurs when asoc is peeled off and the old sk is freed after getting it by asoc->base.sk and before calling lock_sock(sk).
To prevent the sk free, as a holder of the sk, ep should be alive when calling lock_sock(). This patch uses call_rcu() and moves sock_put and ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to hold the ep under rcu_read_lock in sctp_transport_traverse_process().
If sctp_endpoint_hold() returns true, it means this ep is still alive and we have held it and can continue to dump it; If it returns false, it means this ep is dead and can be freed after rcu_read_unlock, and we should skip it.
In sctp_sock_dump(), after locking the sk, if this ep is different from tsp->asoc->ep, it means during this dumping, this asoc was peeled off before calling lock_sock(), and the sk should be skipped; If this ep is the same with tsp->asoc->ep, it means no peeloff happens on this asoc, and due to lock_sock, no peeloff will happen either until release_sock.
Note that delaying endpoint free won't delay the port release, as the port release happens in sctp_endpoint_destroy() before calling call_rcu(). Also, freeing endpoint by call_rcu() makes it safe to access the sk by asoc->base.sk in sctp_assocs_seq_show() and sctp_rcv().
Thanks Jones to bring this issue up.
v1->v2: - improve the changelog. - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed.
Reported-by: [email protected] Reported-by: Lee Jones <[email protected]> Fixes: d25adbeb0cdb ("sctp: fix an use-after-free issue in sctp_sock_dump") Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
3d3b2f57 |
| 21-Dec-2021 |
Xin Long <[email protected]> |
sctp: move hlist_node and hashent out of sctp_ep_common
Struct sctp_ep_common is included in both asoc and ep, but hlist_node and hashent are only needed by ep after asoc_hashtable was dropped by Co
sctp: move hlist_node and hashent out of sctp_ep_common
Struct sctp_ep_common is included in both asoc and ep, but hlist_node and hashent are only needed by ep after asoc_hashtable was dropped by Commit b5eff7128366 ("sctp: drop the old assoc hashtable of sctp").
So it is better to move hlist_node and hashent from sctp_ep_common to sctp_endpoint, and it saves some space for each asoc.
Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3 |
|
| #
70331909 |
| 24-Nov-2021 |
Xin Long <[email protected]> |
sctp: make the raise timer more simple and accurate
Currently, the probe timer is reused as the raise timer when PLPMTUD is in the Search Complete state. raise_count was introduced to count how many
sctp: make the raise timer more simple and accurate
Currently, the probe timer is reused as the raise timer when PLPMTUD is in the Search Complete state. raise_count was introduced to count how many times the probe timer has timed out. When raise_count reaches to 30, the raise timer handler will be triggered.
During the whole processing above, the timer keeps timing out every probe_ interval. It is a waste for the Search Complete state, as the raise timer only needs to time out after 30 * probe_interval.
Since the raise timer and probe timer are never used at the same time, it is no need to keep probe timer 'alive' in the Search Complete state. This patch to introduce sctp_transport_reset_raise_timer() to start the timer as the raise timer when entering the Search Complete state. When entering the other states, sctp_transport_reset_probe_timer() will still be called to reset the timer to the probe timer.
raise_count can be removed from sctp_transport as no need to count probe timer timeout for raise timer timeout. last_rtx_chunks can be removed as sctp_transport_reset_probe_timer() can be called in the place where asoc rtx_data_chunks is changed.
Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Link: https://lore.kernel.org/r/edb0e48988ea85997488478b705b11ddc1ba724a.1637781974.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v5.16-rc2, v5.16-rc1 |
|
| #
c081d53f |
| 02-Nov-2021 |
Xin Long <[email protected]> |
security: pass asoc to sctp_assoc_request and sctp_sk_clone
This patch is to move secid and peer_secid from endpoint to association, and pass asoc to sctp_assoc_request and sctp_sk_clone instead of
security: pass asoc to sctp_assoc_request and sctp_sk_clone
This patch is to move secid and peer_secid from endpoint to association, and pass asoc to sctp_assoc_request and sctp_sk_clone instead of ep. As ep is the local endpoint and asoc represents a connection, and in SCTP one sk/ep could have multiple asoc/connection, saving secid/peer_secid for new asoc will overwrite the old asoc's.
Note that since asoc can be passed as NULL, security_sctp_assoc_request() is moved to the place right after the new_asoc is created in sctp_sf_do_5_1B_init() and sctp_sf_do_unexpected_init().
v1->v2: - fix the description of selinux_netlbl_skbuff_setsid(), as Jakub noticed. - fix the annotation in selinux_sctp_assoc_request(), as Richard Noticed.
Fixes: 72e89f50084c ("security: Add support for SCTP security hooks") Reported-by: Prashanth Prahlad <[email protected]> Reviewed-by: Richard Haines <[email protected]> Tested-by: Richard Haines <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3 |
|
| #
eacf078c |
| 25-Jul-2021 |
Xin Long <[email protected]> |
sctp: send pmtu probe only if packet loss in Search Complete state
This patch is to introduce last_rtx_chunks into sctp_transport to detect if there's any packet retransmission/loss happened by chec
sctp: send pmtu probe only if packet loss in Search Complete state
This patch is to introduce last_rtx_chunks into sctp_transport to detect if there's any packet retransmission/loss happened by checking against asoc's rtx_data_chunks in sctp_transport_pl_send().
If there is, namely, transport->last_rtx_chunks != asoc->rtx_data_chunks, the pmtu probe will be sent out. Otherwise, increment the pl.raise_count and return when it's in Search Complete state.
With this patch, if in Search Complete state, which is a long period, it doesn't need to keep probing the current pmtu unless there's data packet loss. This will save quite some traffic.
v1->v2: - add the missing Fixes tag.
Fixes: 0dac127c0557 ("sctp: do black hole detection in search complete state") Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
058e6e0e |
| 25-Jul-2021 |
Xin Long <[email protected]> |
sctp: improve the code for pmtu probe send and recv update
This patch does 3 things:
- make sctp_transport_pl_send() and sctp_transport_pl_recv() return bool type to decide if more probe is n
sctp: improve the code for pmtu probe send and recv update
This patch does 3 things:
- make sctp_transport_pl_send() and sctp_transport_pl_recv() return bool type to decide if more probe is needed to send.
- pr_debug() only when probe is really needed to send.
- count pl.raise_count in sctp_transport_pl_send() instead of sctp_transport_pl_recv(), and it's only incremented for the 1st probe for the same size.
These are preparations for the next patch to make probes happen only when there's packet loss in Search Complete state.
Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.14-rc2, v5.14-rc1 |
|
| #
0c5dc070 |
| 28-Jun-2021 |
Marcelo Ricardo Leitner <[email protected]> |
sctp: validate from_addr_param return
Ilja reported that, simply putting it, nothing was validating that from_addr_param functions were operating on initialized memory. That is, the parameter itself
sctp: validate from_addr_param return
Ilja reported that, simply putting it, nothing was validating that from_addr_param functions were operating on initialized memory. That is, the parameter itself was being validated by sctp_walk_params, but it doesn't check for types and their specific sizes and it could be a 0-length one, causing from_addr_param to potentially work over the next parameter or even uninitialized memory.
The fix here is to, in all calls to from_addr_param, check if enough space is there for the wanted IP address type.
Reported-by: Ilja Van Sprundel <[email protected]> Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.13 |
|
| #
0dac127c |
| 24-Jun-2021 |
Xin Long <[email protected]> |
sctp: do black hole detection in search complete state
Currently the PLPMUTD probe will stop for a long period (interval * 30) after it enters search complete state. If there's a pmtu change on the
sctp: do black hole detection in search complete state
Currently the PLPMUTD probe will stop for a long period (interval * 30) after it enters search complete state. If there's a pmtu change on the route path, it takes a long time to be aware if the ICMP TooBig packet is lost or filtered.
As it says in rfc8899#section-4.3:
"A DPLPMTUD method MUST NOT rely solely on this method." (ICMP PTB message).
This patch is to enable the other method for search complete state:
"A PL can use the DPLPMTUD probing mechanism to periodically generate probe packets of the size of the current PLPMTU."
With this patch, the probe will continue with the current pmtu every 'interval' until the PMTU_RAISE_TIMER 'timeout', which we implement by adding raise_count to raise the probe size when it counts to 30 and removing the SCTP_PL_COMPLETE check for PMTU_RAISE_TIMER.
Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
b87641af |
| 22-Jun-2021 |
Xin Long <[email protected]> |
sctp: do state transition when a probe succeeds on HB ACK recv path
As described in rfc8899#section-5.2, when a probe succeeds, there might be the following state transitions:
- Base -> Search, o
sctp: do state transition when a probe succeeds on HB ACK recv path
As described in rfc8899#section-5.2, when a probe succeeds, there might be the following state transitions:
- Base -> Search, occurs when probe succeeds with BASE_PLPMTU, pl.pmtu is not changing, pl.probe_size increases by SCTP_PL_BIG_STEP,
- Error -> Search, occurs when probe succeeds with BASE_PLPMTU, pl.pmtu is changed from SCTP_MIN_PLPMTU to SCTP_BASE_PLPMTU, pl.probe_size increases by SCTP_PL_BIG_STEP.
- Search -> Search Complete, occurs when probe succeeds with the probe size SCTP_MAX_PLPMTU less than pl.probe_high, pl.pmtu is not changing, but update *pathmtu* with it, pl.probe_size is set back to pl.pmtu to double check it.
- Search Complete -> Search, occurs when probe succeeds with the probe size equal to pl.pmtu, pl.pmtu is not changing, pl.probe_size increases by SCTP_PL_MIN_STEP.
So search process can be described as:
1. When it just enters 'Search' state, *pathmtu* is not updated with pl.pmtu, and probe_size increases by a big step (SCTP_PL_BIG_STEP) each round.
2. Until pl.probe_high is set when a probe fails, and probe_size decreases back to pl.pmtu, as described in the last patch.
3. When the probe with the new size succeeds, probe_size changes to increase by a small step (SCTP_PL_MIN_STEP) due to pl.probe_high is set.
4. Until probe_size is next to pl.probe_high, the searching finishes and it goes to 'Complete' state and updates *pathmtu* with pl.pmtu, and then probe_size is set to pl.pmtu to confirm by once more probe.
5. This probe occurs after "30 * probe_inteval", a much longer time than that in Search state. Once it is done it goes to 'Search' state again with probe_size increased by SCTP_PL_MIN_STEP.
As we can see above, during the searching, pl.pmtu changes while *pathmtu* doesn't. *pathmtu* is only updated when the search finishes by which it gets an optimal value for it. A big step is used at the beginning until it gets close to the optimal value, then it changes to a small step until it has this optimal value.
The small step is also used in 'Complete' until it goes to 'Search' state again and the probe with 'pmtu + the small step' succeeds, which means a higher size could be used. Then probe_size changes to increase by a big step again until it gets close to the next optimal value.
Note that anytime when black hole is detected, it goes directly to 'Base' state with pl.pmtu set to SCTP_BASE_PLPMTU, as described in the last patch.
Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
1dc68c19 |
| 22-Jun-2021 |
Xin Long <[email protected]> |
sctp: do state transition when PROBE_COUNT == MAX_PROBES on HB send path
The state transition is described in rfc8899#section-5.2, PROBE_COUNT == MAX_PROBES means the probe fails for MAX times, and
sctp: do state transition when PROBE_COUNT == MAX_PROBES on HB send path
The state transition is described in rfc8899#section-5.2, PROBE_COUNT == MAX_PROBES means the probe fails for MAX times, and the state transition includes:
- Base -> Error, occurs when BASE_PLPMTU Confirmation Fails, pl.pmtu is set to SCTP_MIN_PLPMTU, probe_size is still SCTP_BASE_PLPMTU;
- Search -> Base, occurs when Black Hole Detected, pl.pmtu is set to SCTP_BASE_PLPMTU, probe_size is set back to SCTP_BASE_PLPMTU;
- Search Complete -> Base, occurs when Black Hole Detected pl.pmtu is set to SCTP_BASE_PLPMTU, probe_size is set back to SCTP_BASE_PLPMTU;
Note a black hole is encountered when a sender is unaware that packets are not being delivered to the destination endpoint. So it includes the probe failures with equal probe_size to pl.pmtu, and definitely not include that with greater probe_size than pl.pmtu. The later one is the normal probe failure where probe_size should decrease back to pl.pmtu and pl.probe_high is set. pl.probe_high would be used on HB ACK recv path in the next patch.
Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
fe59379b |
| 22-Jun-2021 |
Xin Long <[email protected]> |
sctp: do the basic send and recv for PLPMTUD probe
This patch does exactly what rfc8899#section-6.2.1.2 says:
The SCTP sender needs to be able to determine the total size of a probe packet.
sctp: do the basic send and recv for PLPMTUD probe
This patch does exactly what rfc8899#section-6.2.1.2 says:
The SCTP sender needs to be able to determine the total size of a probe packet. The HEARTBEAT chunk could carry a Heartbeat Information parameter that includes, besides the information suggested in [RFC4960], the probe size to help an implementation associate a HEARTBEAT ACK with the size of probe that was sent. The sender could also use other methods, such as sending a nonce and verifying the information returned also contains the corresponding nonce. The length of the PAD chunk is computed by reducing the probing size by the size of the SCTP common header and the HEARTBEAT chunk.
Note that HB ACK chunk will carry back whatever HB chunk carried, including the probe_size we put it in; We also check hbinfo->probe_size in the HB ACK against link->pl.probe_size to validate this HB ACK chunk.
v1->v2: - Remove the unused 'sp' and add static for sctp_packet_bundle_pad().
Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
92548ec2 |
| 22-Jun-2021 |
Xin Long <[email protected]> |
sctp: add the probe timer in transport for PLPMTUD
There are 3 timers described in rfc8899#section-5.1.1:
PROBE_TIMER, PMTU_RAISE_TIMER, CONFIRMATION_TIMER
This patches adds a 'probe_timer' in t
sctp: add the probe timer in transport for PLPMTUD
There are 3 timers described in rfc8899#section-5.1.1:
PROBE_TIMER, PMTU_RAISE_TIMER, CONFIRMATION_TIMER
This patches adds a 'probe_timer' in transport, and it works as either PROBE_TIMER or PMTU_RAISE_TIMER. At most time, it works as PROBE_TIMER and expires every a 'probe_interval' time to send the HB probe packet. When transport pl enters COMPLETE state, it works as PMTU_RAISE_TIMER and expires in 'probe_interval * 30' time to go back to SEARCH state and do searching again.
SCTP HB is an acknowledged packet, CONFIRMATION_TIMER is not needed.
The timer will start when transport pl enters BASE state and stop when it enters DISABLED state.
Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
d9e2e410 |
| 22-Jun-2021 |
Xin Long <[email protected]> |
sctp: add the constants/variables and states and some APIs for transport
These are 4 constants described in rfc8899#section-5.1.2:
MAX_PROBES, MIN_PLPMTU, MAX_PLPMTU, BASE_PLPMTU;
And 2 variable
sctp: add the constants/variables and states and some APIs for transport
These are 4 constants described in rfc8899#section-5.1.2:
MAX_PROBES, MIN_PLPMTU, MAX_PLPMTU, BASE_PLPMTU;
And 2 variables described in rfc8899#section-5.1.3:
PROBED_SIZE, PROBE_COUNT;
And 5 states described in rfc8899#section-5.2:
DISABLED, BASE, SEARCH, SEARCH_COMPLETE, ERROR;
And these 4 APIs are used to reset/update PLPMTUD, check if PLPMTUD is enabled, and calculate the additional headers length for a transport.
Note the member 'probe_high' in transport will be set to the probe size when a probe fails with this probe size in the next patches.
Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
d1e462a7 |
| 22-Jun-2021 |
Xin Long <[email protected]> |
sctp: add probe_interval in sysctl and sock/asoc/transport
PLPMTUD can be enabled by doing 'sysctl -w net.sctp.probe_interval=n'. 'n' is the interval for PLPMTUD probe timer in milliseconds, and it
sctp: add probe_interval in sysctl and sock/asoc/transport
PLPMTUD can be enabled by doing 'sysctl -w net.sctp.probe_interval=n'. 'n' is the interval for PLPMTUD probe timer in milliseconds, and it can't be less than 5000 if it's not 0.
All asoc/transport's PLPMTUD in a new socket will be enabled by default.
Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|