|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
| #
f1e30061 |
| 24-Mar-2025 |
Eric Dumazet <[email protected]> |
tcp/dccp: remove icsk->icsk_ack.timeout
icsk->icsk_ack.timeout can be replaced by icsk->icsk_delack_timer.expires
This saves 8 bytes in TCP/DCCP sockets and improves cache locality.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
a7c428ee |
| 24-Mar-2025 |
Eric Dumazet <[email protected]> |
tcp/dccp: remove icsk->icsk_timeout
icsk->icsk_timeout can be replaced by icsk->icsk_retransmit_timer.expires
This saves 8 bytes in TCP/DCCP sockets and improves cache locality.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2 |
|
| #
553f9a8b |
| 05-Feb-2025 |
Nam Cao <[email protected]> |
tcp: Switch to use hrtimer_setup()
hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely.
Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism.
Patch was created by using Coccinelle.
Signed-off-by: Nam Cao <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/a16c227cc6882d8aecf658e6a7e38b74e7fd7573.1738746872.git.namcao@linutronix.de
|
| #
6dc4c252 |
| 12-Feb-2025 |
Eric Dumazet <[email protected]> |
tcp: use EXPORT_IPV6_MOD[_GPL]()
Use EXPORT_IPV6_MOD[_GPL]() for symbols that don't need to be exported unless CONFIG_IPV6=m
tcp_hashinfo and tcp_openreq_init_rwin() are no longer used from any module anyway.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Mateusz Polchlopek <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
54a378f4 |
| 07-Feb-2025 |
Eric Dumazet <[email protected]> |
tcp: add the ability to control max RTO
Currently, the TCP stack uses a constant (120 seconds) to limit the exponential growth of the RTO value.
Some applications want to set a lower value.
Add TCP_RTO_MAX_MS socket option to set a value (in ms) between 1 and 120 seconds.
Changing the socket rto max on a live socket is discouraged, as it might lead to unexpected disconnects.
A following patch adds a netns sysctl to control the default value at socket creation time.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jason Xing <[email protected]> Reviewed-by: Neal Cardwell <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
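As a rough userspace illustration of the option described above, the sketch below wraps setsockopt() in a hypothetical helper, set_rto_max_ms(). The fallback value 44 for TCP_RTO_MAX_MS is an assumption for systems whose libc headers predate the option and should be checked against the installed linux/tcp.h. The range check mirrors the "between 1 and 120 seconds" wording of the commit message.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Assumption: fallback for older headers; verify the value against
 * the installed <linux/tcp.h> before relying on it. */
#ifndef TCP_RTO_MAX_MS
#define TCP_RTO_MAX_MS 44
#endif

/* Hypothetical helper: returns 0 on success, -1 with errno set. */
static int set_rto_max_ms(int fd, int rto_max_ms)
{
    /* The commit message gives the accepted range as 1 to 120 seconds. */
    if (rto_max_ms < 1000 || rto_max_ms > 120000) {
        errno = EINVAL;
        return -1;
    }
    return setsockopt(fd, IPPROTO_TCP, TCP_RTO_MAX_MS,
                      &rto_max_ms, sizeof(rto_max_ms));
}
```

Per the commit message, this is best called before the connection is established, for example set_rto_max_ms(fd, 30000) right after socket(). On kernels without the option the call is expected to fail.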
|
| #
48b69b4c |
| 07-Feb-2025 |
Eric Dumazet <[email protected]> |
tcp: use tcp_reset_xmit_timer()
In order to reduce TCP_RTO_MAX occurrences, replace:
inet_csk_reset_xmit_timer(sk, what, when, TCP_RTO_MAX)
With:
tcp_reset_xmit_timer(sk, what, when, false);
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jason Xing <[email protected]> Reviewed-by: Neal Cardwell <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
|
| #
be258f65 |
| 06-Feb-2025 |
Eric Dumazet <[email protected]> |
tcp: rename inet_csk_{delete|reset}_keepalive_timer()
inet_csk_delete_keepalive_timer() and inet_csk_reset_keepalive_timer() are only used from core TCP, so there is no need to export them.
Replace their prefix with tcp.
Move them to net/ipv4/tcp_timer.c and make tcp_delete_keepalive_timer() static.
Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jason Xing <[email protected]> Reviewed-by: Joe Damato <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2 |
|
| #
269084f7 |
| 03-Oct-2024 |
Menglong Dong <[email protected]> |
net: tcp: refresh tcp_mstamp for compressed ack in timer
For now, we refresh the tcp_mstamp for delayed acks and keepalives, but not for the compressed ack in tcp_compressed_ack_kick().
I have not found out the effect of the tcp_mstamp when sending an ack, but we can still refresh it for the compressed ack to stay consistent.
Signed-off-by: Menglong Dong <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
81df4fa9 |
| 02-Oct-2024 |
Eric Dumazet <[email protected]> |
tcp: add a fast path in tcp_delack_timer()
delack timer is not stopped from inet_csk_clear_xmit_timer() because we do not define INET_CSK_CLEAR_TIMERS.
This is a conscious choice: inet_csk_clear_xmit_timer() is often called from another cpu. Calling del_timer() would cause false sharing and lock contention.
This means that very often, tcp_delack_timer() is called at the timer expiration, while there is no ACK to transmit.
This can be detected very early, avoiding the socket spinlock.
Notes:
- The test on tp->compressed_ack is racy, but in the unlikely case there is a race, the dedicated compressed_ack_timer hrtimer would close it.
- Even if the fast path is not taken, reading icsk->icsk_ack.pending and tp->compressed_ack before acquiring the socket spinlock reduces acquisition time and chances of contention.
Signed-off-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
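The fast path described above can be modelled in plain C. This is a simplified sketch, not the kernel code: struct delack_model, ACK_TIMER_MODEL and the counters are hypothetical stand-ins, and the lock is represented by a counter so the effect of the early return is visible.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's "delayed ACK timer armed" flag. */
#define ACK_TIMER_MODEL 0x2

struct delack_model {
    uint8_t  ack_pending;        /* models icsk->icsk_ack.pending */
    uint8_t  compressed_ack;     /* models tp->compressed_ack */
    unsigned lock_acquisitions;  /* counts simulated spinlock grabs */
    unsigned acks_sent;
};

static void delack_timer_model(struct delack_model *sk)
{
    /* Fast path: a lockless peek shows there is no ACK to send,
     * so return without touching the socket lock. */
    if (!(sk->ack_pending & ACK_TIMER_MODEL) && !sk->compressed_ack)
        return;

    /* Slow path: take the lock (simulated) and send the ACK. */
    sk->lock_acquisitions++;
    sk->ack_pending &= (uint8_t)~ACK_TIMER_MODEL;
    sk->compressed_ack = 0;
    sk->acks_sent++;
}
```

In this model an idle socket whose timer fires costs two loads and no lock acquisition, which is the saving the commit message describes.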
|
| #
3b784293 |
| 02-Oct-2024 |
Eric Dumazet <[email protected]> |
tcp: add a fast path in tcp_write_timer()
retransmit timer is not stopped from inet_csk_clear_xmit_timer() because we do not define INET_CSK_CLEAR_TIMERS.
This is a conscious choice: for active TCP flows, it is better to only call mod_timer(), because there are more chances of keeping the timer unchanged. Also, inet_csk_clear_xmit_timer() is often called from another cpu, and calling del_timer() would cause false sharing and lock contention.
This means that very often, tcp_write_timer() is called at the timer expiration, while there is nothing to retransmit.
This can be detected very early, avoiding the socket spinlock.
Signed-off-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
| #
5a9071a7 |
| 02-Oct-2024 |
Eric Dumazet <[email protected]> |
tcp: annotate data-races around icsk->icsk_pending
icsk->icsk_pending can be read locklessly already.
A following patch in the series will add another lockless read.
Add smp_load_acquire() and smp_store_release() annotations because that patch will add a test in tcp_write_timer(), and READ_ONCE()/WRITE_ONCE() alone would possibly lead to races.
Signed-off-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
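A userspace analogue of the acquire/release pairing can be sketched with C11 atomics. This is only an illustration of the general pattern and one possible reading of why plain READ_ONCE()/WRITE_ONCE() would not be enough; struct timer_model, arm_timer() and timer_expired() are hypothetical names, and the kernel primitives are replaced by their closest stdatomic counterparts.

```c
#include <stdatomic.h>
#include <stdint.h>

struct timer_model {
    uint64_t     timeout;  /* payload, written before publishing */
    atomic_uchar pending;  /* models icsk->icsk_pending */
};

/* Writer: store the payload first, then publish the pending flag with
 * release semantics (analogue of smp_store_release()). */
static void arm_timer(struct timer_model *t, unsigned char what, uint64_t when)
{
    t->timeout = when;
    atomic_store_explicit(&t->pending, what, memory_order_release);
}

/* Lockless reader: the acquire load (analogue of smp_load_acquire())
 * guarantees that a non-zero flag implies the matching timeout store
 * is visible as well. */
static int timer_expired(struct timer_model *t, uint64_t now)
{
    if (!atomic_load_explicit(&t->pending, memory_order_acquire))
        return 0;
    return (int64_t)(now - t->timeout) >= 0;
}
```

With relaxed accesses only, a reader on another CPU could in principle observe the flag as set while still seeing a stale timeout value.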
|
|
Revision tags: v6.12-rc1, v6.11 |
|
| #
6982826f |
| 09-Sep-2024 |
Matthieu Baerts (NGI0) <[email protected]> |
mptcp: fallback to TCP after SYN+MPC drops
Some middleboxes might be nasty with MPTCP, and decide to drop packets with MPTCP options, instead of just dropping the MPTCP options (or letting them pass...).
In this case, it sounds better to fall back to "plain" TCP after 2 retransmissions, and try again.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/477 Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-2-da7ebb4cd2a3@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2 |
|
| #
0a399892 |
| 02-Aug-2024 |
Jason Xing <[email protected]> |
tcp: rstreason: introduce SK_RST_REASON_TCP_KEEPALIVE_TIMEOUT for active reset
Introduce this reason to show users that an active reset was caused by a keepalive timeout.
Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
| #
edefba66 |
| 02-Aug-2024 |
Jason Xing <[email protected]> |
tcp: rstreason: introduce SK_RST_REASON_TCP_STATE for active reset
Introduce a new type TCP_STATE to handle some reset conditions appearing in RFC 793 due to the socket state. RFC 9293 can also be consulted, as it has no discrepancy on this part.
Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
| #
8407994f |
| 02-Aug-2024 |
Jason Xing <[email protected]> |
tcp: rstreason: introduce SK_RST_REASON_TCP_ABORT_ON_MEMORY for active reset
Introduce a new type TCP_ABORT_ON_MEMORY for the tcp reset reason to handle the out-of-memory case.
Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v6.11-rc1, v6.10 |
|
| #
97a90635 |
| 10-Jul-2024 |
Eric Dumazet <[email protected]> |
tcp: avoid too many retransmit packets
If a TCP socket is using TCP_USER_TIMEOUT, and the other peer retracted its window to zero, tcp_retransmit_timer() can retransmit a packet every two jiffies (2 ms for HZ=1000), for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
The fix is to make sure tcp_rtx_probe0_timed_out() takes icsk->icsk_user_timeout into account.
Before the blamed commit, the socket would not time out after icsk->icsk_user_timeout, but would use standard exponential backoff for the retransmits.
Also worth noting that before commit e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0"), the issue would last 2 minutes instead of 4.
Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy") Signed-off-by: Eric Dumazet <[email protected]> Cc: Neal Cardwell <[email protected]> Reviewed-by: Jason Xing <[email protected]> Reviewed-by: Jon Maxwell <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revision tags: v6.10-rc7 |
|
| #
0ec986ed |
| 03-Jul-2024 |
Neal Cardwell <[email protected]> |
tcp: fix incorrect undo caused by DSACK of TLP retransmit
Loss recovery undo_retrans bookkeeping had a long-standing bug where a DSACK from a spurious TLP retransmit packet could cause an erroneous undo of a fast recovery or RTO recovery that repaired a single really-lost packet (in a sequence range outside that of the TLP retransmit). Basically, because the loss recovery state machine didn't account for the fact that it sent a TLP retransmit, the DSACK for the TLP retransmit could erroneously be implicitly interpreted as corresponding to the normal fast recovery or RTO recovery retransmit that plugged a real hole, thus resulting in an improper undo.
For example, consider the following buggy scenario where there is a real packet loss but the congestion control response is improperly undone because of this bug:
+ send packets P1, P2, P3, P4
+ P1 is really lost
+ send TLP retransmit of P4
+ receive SACK for original P2, P3, P4
+ enter fast recovery, fast-retransmit P1, increment undo_retrans to 1
+ receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!)
+ receive cumulative ACK for P1-P4 (fast retransmit plugged real hole)
The fix: when we initialize undo machinery in tcp_init_undo(), if there is a TLP retransmit in flight, then increment tp->undo_retrans so that we make sure that we receive a DSACK corresponding to the TLP retransmit, as well as DSACKs for all later normal retransmits, before triggering a loss recovery undo. Note that we also have to move the line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO we remember the tp->tlp_high_seq value until tcp_init_undo() and clear it only afterward.
Also note that the bug dates back to the original 2013 TLP implementation, commit 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)").
However, this patch will only compile and work correctly with kernels that have tp->tlp_retrans, which was added only in v5.8 in 2020 in commit 76be93fc0702 ("tcp: allow at most one TLP probe per flight"). So we associate this fix with that later commit.
Fixes: 76be93fc0702 ("tcp: allow at most one TLP probe per flight") Signed-off-by: Neal Cardwell <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Cc: Yuchung Cheng <[email protected]> Cc: Kevin Yang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
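The counter bookkeeping in the scenario above can be replayed with a small model. This is a deliberately simplified sketch: struct undo_model and its helpers are hypothetical, and the real tcp_init_undo() handles more state than the single counter shown here. The account_for_tlp flag toggles between the buggy and the fixed initialisation.

```c
#include <stdbool.h>

struct undo_model {
    int  undo_retrans;   /* retransmits still awaiting a DSACK */
    bool tlp_in_flight;  /* models a TLP retransmit being outstanding */
};

/* Entering recovery: the fix adds one for an in-flight TLP retransmit. */
static void init_undo(struct undo_model *u, bool account_for_tlp)
{
    u->undo_retrans = 0;
    if (account_for_tlp && u->tlp_in_flight)
        u->undo_retrans++;
}

static void on_retransmit(struct undo_model *u)
{
    u->undo_retrans++;
}

/* Returns true when the DSACK drives the counter to zero, i.e. when the
 * model would undo the congestion response. */
static bool on_dsack(struct undo_model *u)
{
    return --u->undo_retrans == 0;
}

/* Replays the scenario from the commit message. */
static bool scenario_undoes(bool account_for_tlp)
{
    struct undo_model u = { .tlp_in_flight = true };  /* TLP copy of P4 sent */

    init_undo(&u, account_for_tlp);  /* enter fast recovery */
    on_retransmit(&u);               /* fast-retransmit P1 */
    return on_dsack(&u);             /* DSACK for the TLP copy of P4 */
}
```

Without the extra increment the counter goes 0, 1, 0 and the undo fires on the TLP's DSACK. With it the counter goes 1, 2, 1 and the undo has to wait for a DSACK that actually covers the P1 retransmit.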
|
|
Revision tags: v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2 |
|
| #
853c3bd7 |
| 28-May-2024 |
Eric Dumazet <[email protected]> |
tcp: fix race in tcp_write_err()
I noticed flakes in a packetdrill test, expecting an epoll_wait() to return EPOLLERR | EPOLLHUP on a failed connect() attempt, after multiple SYN retransmits. It sometimes returns EPOLLERR only.
The issue is that tcp_write_err():
1) writes an error in sk->sk_err,
2) calls sk_error_report(),
3) then calls tcp_done().
tcp_done() is writing SHUTDOWN_MASK into sk->sk_shutdown, among other things.
The problem is that the user thread awakened by step 2) (sk_error_report()) might call tcp_poll() before tcp_done() has written sk->sk_shutdown.
tcp_poll() only sees a non zero sk->sk_err and returns EPOLLERR.
This patch fixes the issue by making sure to call sk_error_report() after tcp_done().
tcp_write_err() also lacks an smp_wmb().
We can reuse tcp_done_with_error() to factor out the details, as Neal suggested.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Neal Cardwell <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
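The ordering problem can be reproduced in a single-threaded model by letting the "woken" thread poll at the exact moment the error is reported, which is the worst-case interleaving. Everything here is a hypothetical stand-in (struct err_model, the POLL_*_MODEL bits, the helper names). The sketch only shows why swapping steps 2 and 3 changes what the poller sees, and it does not model the memory barrier the commit also mentions.

```c
#define POLL_ERR_MODEL      0x1u
#define POLL_HUP_MODEL      0x2u
#define SHUTDOWN_MASK_MODEL 0x3

struct err_model {
    int      err;             /* models sk->sk_err */
    int      shutdown;        /* models sk->sk_shutdown */
    unsigned seen_by_waiter;  /* poll mask observed by the woken thread */
};

static unsigned poll_model(const struct err_model *sk)
{
    unsigned mask = 0;

    if (sk->err)
        mask |= POLL_ERR_MODEL;
    if (sk->shutdown == SHUTDOWN_MASK_MODEL)
        mask |= POLL_HUP_MODEL;
    return mask;
}

/* The woken thread polls immediately: the worst-case interleaving. */
static void error_report_model(struct err_model *sk)
{
    sk->seen_by_waiter = poll_model(sk);
}

static void done_model(struct err_model *sk)
{
    sk->shutdown = SHUTDOWN_MASK_MODEL;
}

/* Old order: report first, then mark the socket as shut down. */
static unsigned write_err_old(struct err_model *sk, int err)
{
    sk->err = err;
    error_report_model(sk);
    done_model(sk);
    return sk->seen_by_waiter;
}

/* Fixed order: shut down first, then report. */
static unsigned write_err_fixed(struct err_model *sk, int err)
{
    sk->err = err;
    done_model(sk);
    error_report_model(sk);
    return sk->seen_by_waiter;
}
```

In the old order the waiter sees only the error bit. In the fixed order it sees both the error and hang-up bits, matching what the packetdrill test expected.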
|
| #
36534d3c |
| 07-Jun-2024 |
Eric Dumazet <[email protected]> |
tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
Due to the timer wheel implementation, a timer will usually fire after its scheduled time.
For instance, for HZ=1000, a timeout between 512ms and 4s has a granularity of 64ms. For this range of values, the extra delay could be up to 63ms.
For TCP, this means that tp->rcv_tstamp may be after inet_csk(sk)->icsk_timeout whenever the timer interrupt finally triggers, if one packet came during the extra delay.
We need to make sure tcp_rtx_probe0_timed_out() handles this case.
Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0") Signed-off-by: Eric Dumazet <[email protected]> Cc: Menglong Dong <[email protected]> Acked-by: Neal Cardwell <[email protected]> Reviewed-by: Jason Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
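The effect of the signedness can be shown with two small comparison helpers. This is a sketch of the arithmetic only, with hypothetical function names; timestamps are 32-bit jiffies-like counters and limit is the allowed gap between the last received packet and the scheduled timeout. The cast to int32_t assumes the usual two's complement conversion that GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned difference: wraps to a huge value when rcv_tstamp is later
 * than timeout, so the socket looks timed out when it is not. */
static bool timed_out_unsigned(uint32_t timeout, uint32_t rcv_tstamp,
                               uint32_t limit)
{
    return timeout - rcv_tstamp > limit;
}

/* Signed difference: a packet received after the scheduled timeout
 * yields a small negative delta, which is not a timeout. */
static bool timed_out_signed(uint32_t timeout, uint32_t rcv_tstamp,
                             uint32_t limit)
{
    int32_t delta = (int32_t)(timeout - rcv_tstamp);

    return delta > (int32_t)limit;
}
```

For example, with a timeout scheduled at tick 1000 and a packet received at tick 1040 during the timer wheel's extra delay, the unsigned version reports a timeout while the signed version does not.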
|
|
Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6 |
|
| #
5691276b |
| 25-Apr-2024 |
Jason Xing <[email protected]> |
rstreason: prepare for active reset
As was done for passive resets, pass only the possible reset reason in each active reset path.
No functional changes.
Signed-off-by: Jason Xing <[email protected]> Acked-by: Matthieu Baerts (NGI0) <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
|
|
Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2 |
|
| #
58169ec9 |
| 29-Mar-2024 |
Eric Dumazet <[email protected]> |
inet: preserve const qualifier in inet_csk()
We can change inet_csk() to propagate its argument const qualifier, thanks to container_of_const().
We have to fix a few places that had mistakes, like tcp_bound_rto().
Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
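The idea of propagating the const qualifier through a container lookup can be sketched in standard C11 with _Generic. This is not the kernel's container_of_const() macro, only an illustration of the same technique on hypothetical types (struct sock_model, struct csk_model) and a hypothetical macro name, csk_of().

```c
#include <stddef.h>

struct sock_model {
    int state;
};

struct csk_model {
    int rto;
    struct sock_model sk;  /* embedded member we hold pointers to */
};

/* Select the result type from the argument type: a const pointer in
 * gives a const pointer out, a mutable pointer in gives a mutable one. */
#define csk_of(ptr)                                                   \
    _Generic((ptr),                                                   \
        const struct sock_model *:                                    \
            (const struct csk_model *)((const char *)(ptr) -          \
                offsetof(struct csk_model, sk)),                      \
        struct sock_model *:                                          \
            (struct csk_model *)((char *)(ptr) -                      \
                offsetof(struct csk_model, sk)))
```

With this shape, code such as csk_of(const_ptr)->rto = 1 fails to compile, which is presumably how the mistakes mentioned in the commit message were surfaced.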
|
|
Revision tags: v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2 |
|
| #
14dd92d0 |
| 14-Nov-2023 |
Eric Dumazet <[email protected]> |
tcp: use tp->total_rto to track number of linear timeouts in SYN_SENT state
In commit ccce324dabfe ("tcp: make the first N SYN RTO backoffs linear") David used icsk->icsk_backoff field to track the number of linear timeouts.
Since then, tp->total_rto has been added.
This commit uses tp->total_rto instead of icsk->icsk_backoff so that tcp_ld_RTO_revert() no longer can trigger an overflow in inet_csk_rto_backoff(). Other than the potential UBSAN report, there was no issue because receiving an ICMP message currently aborts the connect().
In the following patch, we want to adhere to RFC 6069 and RFC 1122 4.2.3.9.
Signed-off-by: Eric Dumazet <[email protected]> Cc: David Morley <[email protected]> Cc: Neal Cardwell <[email protected]> Cc: Yuchung Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
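A simplified model of "first N SYN timeouts linear, then exponential" driven by a running timeout count is sketched below. The function and parameter names are hypothetical, and the kernel's actual computation differs in detail (it updates the RTO incrementally on each timer expiry). The point is that keying off a total count, rather than a backoff shift that stays at zero during the linear phase, keeps the shift bounded.

```c
#include <stdint.h>

/* Timeout in ms to arm after the total_rto-th SYN timeout (counted from 1).
 * linear_timeouts models the configured number of linear retries. */
static uint32_t syn_rto_ms(uint32_t base_rto_ms, uint32_t total_rto,
                           uint32_t linear_timeouts, uint32_t rto_max_ms)
{
    uint64_t rto;
    uint32_t shift;

    /* Linear phase: keep using the base RTO. */
    if (total_rto <= linear_timeouts)
        return base_rto_ms < rto_max_ms ? base_rto_ms : rto_max_ms;

    /* Exponential phase: double once per timeout past the linear ones. */
    shift = total_rto - linear_timeouts;
    if (shift >= 31)  /* avoid oversized shifts; the cap applies anyway */
        return rto_max_ms;

    rto = (uint64_t)base_rto_ms << shift;
    return rto > rto_max_ms ? rto_max_ms : (uint32_t)rto;
}
```

With a 1000 ms base, 4 linear timeouts and a 120000 ms cap, the model yields 1000 ms for the first four timeouts, then 2000, 4000 and so on until the cap is reached.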
|
|
Revision tags: v6.7-rc1, v6.6, v6.6-rc7 |
|
| #
614e8316 |
| 20-Oct-2023 |
Eric Dumazet <[email protected]> |
tcp: add support for usec resolution in TCP TS values
Back in 2015, Van Jacobson suggested using usec resolution in TCP TS values. This has been implemented in our private kernels.
Goals were:
1) better observability of delays in networking stacks.
2) better disambiguation of events based on TSval/ecr values.
3) building block for congestion control modules needing usec resolution.
Back then we implemented a scheme based on private SYN options to negotiate the feature.
For upstream submission, we chose to use a route attribute, because this feature is probably going to be used in private networks [1] [2].
ip route add 10/8 ... features tcp_usec_ts
Note that RFC 7323 recommends a "timestamp clock frequency in the range 1 ms to 1 sec per tick.", but also mentions "the maximum acceptable clock frequency is one tick every 59 ns."
[1] Unfortunately RFC 7323 5.5 (Outdated Timestamps) suggests invalidating TS.Recent values after a flow has been idle for more than 24 days. This is what makes usec_ts a problem for peers following this recommendation on long-lived idle flows.
[2] Attempts to standardize usec ts went nowhere:
https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf
https://datatracker.ietf.org/doc/draft-wang-tcpm-low-latency-opt/
Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
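To put numbers on the concern in footnote [1], the following arithmetic sketch computes how long a 32-bit TSval takes to advance by 2^31 ticks, the point at which a signed comparison between two timestamps changes sign. The helper name is hypothetical and the calculation is an illustration added here, not part of the commit.

```c
/* Seconds for a 32-bit timestamp to advance by 2^31 ticks at the given
 * tick rate (ticks per second). */
static double half_wrap_seconds(double ticks_per_second)
{
    return 2147483648.0 / ticks_per_second;
}
```

At 1000 ticks per second (ms resolution) this gives about 24.9 days, which lines up with the 24-day figure in RFC 7323. At 1,000,000 ticks per second (usec resolution) it gives roughly 36 minutes, so an idle flow reaches the ambiguous range much sooner.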
|
| #
9d0c00f5 |
| 20-Oct-2023 |
Eric Dumazet <[email protected]> |
tcp: rename tcp_time_stamp() to tcp_time_stamp_ts()
This helper returns a TSval from a TCP socket.
It currently calls tcp_time_stamp_ms() but will soon be able to return a usec based TSval, depending on an upcoming tp->tcp_usec_ts field.
Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
| #
d1a02ed6 |
| 20-Oct-2023 |
Eric Dumazet <[email protected]> |
tcp: rename tcp_skb_timestamp()
This helper returns a 32bit TCP TSval from skb->tstamp.
As we are going to support usec or ms units soon, rename it to tcp_skb_timestamp_ts() and add a boolean to select the unit.
Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|