|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2 |
|
| #
72671b35 |
| 30-Oct-2020 |
Jon Maloy <[email protected]> |
tipc: add stricter control of reserved service types
TIPC reserves 64 service types for current and future internal use. Therefore, the bind() function is meant to block regular user sockets from be
tipc: add stricter control of reserved service types
TIPC reserves 64 service types for current and future internal use. Therefore, the bind() function is meant to block regular user sockets from being bound to these values, while it should let through such bindings from internal users.
However, since we at the design moment saw no way to distinguish between regular and internal users the filter function ended up with allowing all bindings of the reserved types which were really in use ([0,1]), and block all the rest ([2,63]).
This is risky, since a regular user may bind to the service type representing the topology server (TIPC_TOP_SRV == 1) or the one used for indicating neighboring node status (TIPC_CFG_SRV == 0), and wreak havoc for users of those services, i.e., most users.
The reality is however that TIPC_CFG_SRV never is bound through the bind() function, since it doesn't represent a regular socket, and TIPC_TOP_SRV can also be made to bypass the checks in tipc_bind() by introducing a different entry function, tipc_sk_bind().
It should be noted that although this is a change of the API semantics, there is no risk we will break any currently working applications by doing this. Any application trying to bind to the values in question would be badly broken from the outset, so there is no chance we would find any such applications in real-world production systems.
v2: Added warning printout when a user is blocked from binding, as suggested by Jakub Kicinski
Acked-by: Yung Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7 |
|
| #
095ae612 |
| 28-May-2020 |
Christoph Hellwig <[email protected]> |
tipc: call tsk_set_importance from tipc_topsrv_create_listener
Avoid using kernel_setsockopt for the TIPC_IMPORTANCE option when we can just use the internal helper. The only change needed is to pa
tipc: call tsk_set_importance from tipc_topsrv_create_listener
Avoid using kernel_setsockopt for the TIPC_IMPORTANCE option when we can just use the internal helper. The only change needed is to pass a struct sock instead of tipc_sock, which is private to socket.c
Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20 |
|
| #
01e661eb |
| 19-Dec-2018 |
Tuong Lien <[email protected]> |
tipc: add trace_events for tipc socket
The commit adds the new trace_events for TIPC socket object:
trace_tipc_sk_create() trace_tipc_sk_poll() trace_tipc_sk_sendmsg() trace_tipc_sk_sendmcast() tra
tipc: add trace_events for tipc socket
The commit adds the new trace_events for TIPC socket object:
trace_tipc_sk_create() trace_tipc_sk_poll() trace_tipc_sk_sendmsg() trace_tipc_sk_sendmcast() trace_tipc_sk_sendstream() trace_tipc_sk_filter_rcv() trace_tipc_sk_advance_rx() trace_tipc_sk_rej_msg() trace_tipc_sk_drop_msg() trace_tipc_sk_release() trace_tipc_sk_shutdown() trace_tipc_sk_overlimit1() trace_tipc_sk_overlimit2()
Also, enables the traces for the following cases: - When user creates a TIPC socket; - When user calls poll() on TIPC socket; - When user sends a dgram/mcast/stream message. - When a message is put into the socket 'sk_receive_queue'; - When a message is released from the socket 'sk_receive_queue'; - When a message is rejected (e.g. due to no port, invalid, etc.); - When a message is dropped (e.g. due to wrong message type); - When socket is released; - When socket is shutdown; - When socket rcvq's allocation is overlimit (> 90%); - When socket rcvq + bklq's allocation is overlimit (> 90%); - When the 'TIPC_ERR_OVERLOAD/2' issue happens;
Note: a) All the socket traces are designed to be able to trace on a specific socket by either using the 'event filtering' feature on a known socket 'portid' value or the sysctl file:
/proc/sys/net/tipc/sk_filter
The file determines a 'tuple' for what socket should be traced:
(portid, sock type, name type, name lower, name upper)
where: + 'portid' is the socket portid generated at socket creating, can be found in the trace outputs or the 'tipc socket list' command printouts; + 'sock type' is the socket type (1 = SOCK_TREAM, ...); + 'name type', 'name lower' and 'name upper' are the service name being connected to or published by the socket.
Value '0' means 'ANY', the default tuple value is (0, 0, 0, 0, 0) i.e. the traces happen for every sockets with no filter.
b) The 'tipc_sk_overlimit1/2' event is also a conditional trace_event which happens when the socket receive queue (and backlog queue) is about to be overloaded, when the queue allocation is > 90%. Then, when the trace is enabled, the last skbs leading to the TIPC_ERR_OVERLOAD/2 issue can be traced.
The trace event is designed as an 'upper watermark' notification that the other traces (e.g. 'tipc_sk_advance_rx' vs 'tipc_sk_filter_rcv') or actions can be triggerred in the meanwhile to see what is going on with the socket queue.
In addition, the 'trace_tipc_sk_dump()' is also placed at the 'TIPC_ERR_OVERLOAD/2' case, so the socket and last skb can be dumped for post-analysis.
Acked-by: Ying Xue <[email protected]> Tested-by: Ying Xue <[email protected]> Acked-by: Jon Maloy <[email protected]> Signed-off-by: Tuong Lien <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
b4b9771b |
| 19-Dec-2018 |
Tuong Lien <[email protected]> |
tipc: enable tracepoints in tipc
As for the sake of debugging/tracing, the commit enables tracepoints in TIPC along with some general trace_events as shown below. It also defines some 'tipc_*_dump()
tipc: enable tracepoints in tipc
As for the sake of debugging/tracing, the commit enables tracepoints in TIPC along with some general trace_events as shown below. It also defines some 'tipc_*_dump()' functions that allow to dump TIPC object data whenever needed, that is, for general debug purposes, ie. not just for the trace_events.
The following trace_events are now available:
- trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data, e.g. message type, user, droppable, skb truesize, cloned skb, etc.
- trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or queues, e.g. TIPC link transmq, socket receive queue, etc.
- trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g. sk state, sk type, connection type, rmem_alloc, socket queues, etc.
- trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g. link state, silent_intv_cnt, gap, bc_gap, link queues, etc.
- trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g. node state, active links, capabilities, link entries, etc.
How to use: Put the trace functions at any places where we want to dump TIPC data or events.
Note: a) The dump functions will generate raw data only, that is, to offload the trace event's processing, it can require a tool or script to parse the data but this should be simple.
b) The trace_tipc_*_dump() should be reserved for a failure cases only (e.g. the retransmission failure case) or where we do not expect to happen too often, then we can consider enabling these events by default since they will almost not take any effects under normal conditions, but once the rare condition or failure occurs, we get the dumped data fully for post-analysis.
For other trace purposes, we can reuse these trace classes as template but different events.
c) A trace_event is only effective when we enable it. To enable the TIPC trace_events, echo 1 to 'enable' files in the events/tipc/ directory in the 'debugfs' file system. Normally, they are located at:
/sys/kernel/debug/tracing/events/tipc/
For example:
To enable the tipc_link_dump event:
echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable
To enable all the TIPC trace_events:
echo 1 > /sys/kernel/debug/tracing/events/tipc/enable
To collect the trace data:
cat trace
or
cat trace_pipe > /trace.out &
To disable all the TIPC trace_events:
echo 0 > /sys/kernel/debug/tracing/events/tipc/enable
To clear the trace buffer:
echo > trace
d) Like the other trace_events, the feature like 'filter' or 'trigger' is also usable for the tipc trace_events. For more details, have a look at:
Documentation/trace/ftrace.txt
MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc
Acked-by: Ying Xue <[email protected]> Tested-by: Ying Xue <[email protected]> Acked-by: Jon Maloy <[email protected]> Signed-off-by: Tuong Lien <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3 |
|
| #
8f5c5fcf |
| 04-Sep-2018 |
Cong Wang <[email protected]> |
tipc: call start and done ops directly in __tipc_nl_compat_dumpit()
__tipc_nl_compat_dumpit() uses a netlink_callback on stack, so the only way to align it with other ->dumpit() call path is calling
tipc: call start and done ops directly in __tipc_nl_compat_dumpit()
__tipc_nl_compat_dumpit() uses a netlink_callback on stack, so the only way to align it with other ->dumpit() call path is calling tipc_dump_start() and tipc_dump_done() directly inside it. Otherwise ->dumpit() would always get NULL from cb->args[].
But tipc_dump_start() uses sock_net(cb->skb->sk) to retrieve net pointer, the cb->skb here doesn't set skb->sk, the net pointer is saved in msg->net instead, so introduce a helper function __tipc_dump_start() to pass in msg->net.
Ying pointed out cb->args[0...3] are already used by other callbacks on this call path, so we can't use cb->args[0] any more, use cb->args[4] instead.
Fixes: 9a07efa9aea2 ("tipc: switch to rhashtable iterator") Reported-and-tested-by: [email protected] Cc: Jon Maloy <[email protected]> Cc: Ying Xue <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v4.19-rc2, v4.19-rc1 |
|
| #
9a07efa9 |
| 24-Aug-2018 |
Cong Wang <[email protected]> |
tipc: switch to rhashtable iterator
syzbot reported a use-after-free in tipc_group_fill_sock_diag(), where tipc_group_fill_sock_diag() still reads tsk->group meanwhile tipc_group_delete() just delet
tipc: switch to rhashtable iterator
syzbot reported a use-after-free in tipc_group_fill_sock_diag(), where tipc_group_fill_sock_diag() still reads tsk->group meanwhile tipc_group_delete() just deletes it in tipc_release().
tipc_nl_sk_walk() aims to lock this sock when walking each sock in the hash table to close race conditions with sock changes like this one, by acquiring tsk->sk.sk_lock.slock spinlock, unfortunately this doesn't work at all. All non-BH call path should take lock_sock() instead to make it work.
tipc_nl_sk_walk() brutally iterates with raw rht_for_each_entry_rcu() where RCU read lock is required, this is the reason why lock_sock() can't be taken on this path. This could be resolved by switching to rhashtable iterator API's, where taking a sleepable lock is possible. Also, the iterator API's are friendly for restartable calls like diag dump, the last position is remembered behind the scence, all we need to do here is saving the iterator into cb->args[].
I tested this with parallel tipc diag dump and thousands of tipc socket creation and release, no crash or memory leak.
Reported-by: [email protected] Cc: Jon Maloy <[email protected]> Cc: Ying Xue <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1 |
|
| #
e41f0548 |
| 07-Apr-2018 |
Cong Wang <[email protected]> |
tipc: use the right skb in tipc_sk_fill_sock_diag()
Commit 4b2e6877b879 ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag") tried to fix the crash but failed, the crash is still 100% reprodu
tipc: use the right skb in tipc_sk_fill_sock_diag()
Commit 4b2e6877b879 ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag") tried to fix the crash but failed, the crash is still 100% reproducible with it.
In tipc_sk_fill_sock_diag(), skb is the diag dump we are filling, it is not correct to retrieve its NETLINK_CB(), instead, like other protocol diag, we should use NETLINK_CB(cb->skb).sk here.
Reported-by: <[email protected]> Fixes: 4b2e6877b879 ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag") Fixes: c30b70deb5f4 (tipc: implement socket diagnostics for AF_TIPC) Cc: GhantaKrishnamurthy MohanKrishna <[email protected]> Cc: Jon Maloy <[email protected]> Cc: Ying Xue <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v4.16, v4.16-rc7 |
|
| #
c30b70de |
| 21-Mar-2018 |
GhantaKrishnamurthy MohanKrishna <[email protected]> |
tipc: implement socket diagnostics for AF_TIPC
This commit adds socket diagnostics capability for AF_TIPC in netlink family NETLINK_SOCK_DIAG in a new kernel module (diag.ko).
The following are key
tipc: implement socket diagnostics for AF_TIPC
This commit adds socket diagnostics capability for AF_TIPC in netlink family NETLINK_SOCK_DIAG in a new kernel module (diag.ko).
The following are key design considerations: - config TIPC_DIAG has default y, like INET_DIAG. - only requests with flag NLM_F_DUMP is supported (dump all). - tipc_sock_diag_req message is introduced to send filter parameters. - the response attributes are of TLV, some nested.
To avoid exposing data structures between diag and tipc modules and avoid code duplication, the following additions are required: - export tipc_nl_sk_walk function to reuse socket iterator. - export tipc_sk_fill_sock_diag to fill the tipc diag attributes. - create a sock_diag response message in __tipc_add_sock_diag defined in diag.c and use the above exported tipc_sk_fill_sock_diag to fill response.
Acked-by: Jon Maloy <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: GhantaKrishnamurthy MohanKrishna <[email protected]> Signed-off-by: Parthasarathy Bhuvaragan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v4.16-rc6, v4.16-rc5, v4.16-rc4, v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2, v4.15-rc1, v4.14, v4.14-rc8, v4.14-rc7, v4.14-rc6, v4.14-rc5, v4.14-rc4, v4.14-rc3, v4.14-rc2, v4.14-rc1, v4.13, v4.13-rc7, v4.13-rc6, v4.13-rc5, v4.13-rc4, v4.13-rc3, v4.13-rc2, v4.13-rc1, v4.12, v4.12-rc7, v4.12-rc6, v4.12-rc5, v4.12-rc4, v4.12-rc3, v4.12-rc2, v4.12-rc1, v4.11, v4.11-rc8, v4.11-rc7, v4.11-rc6, v4.11-rc5, v4.11-rc4, v4.11-rc3, v4.11-rc2, v4.11-rc1, v4.10, v4.10-rc8, v4.10-rc7, v4.10-rc6, v4.10-rc5, v4.10-rc4, v4.10-rc3, v4.10-rc2, v4.10-rc1, v4.9, v4.9-rc8, v4.9-rc7, v4.9-rc6, v4.9-rc5, v4.9-rc4, v4.9-rc3, v4.9-rc2, v4.9-rc1, v4.8, v4.8-rc8, v4.8-rc7, v4.8-rc6, v4.8-rc5, v4.8-rc4, v4.8-rc3, v4.8-rc2, v4.8-rc1, v4.7, v4.7-rc7, v4.7-rc6, v4.7-rc5, v4.7-rc4, v4.7-rc3, v4.7-rc2, v4.7-rc1, v4.6, v4.6-rc7 |
|
| #
10724cc7 |
| 02-May-2016 |
Jon Paul Maloy <[email protected]> |
tipc: redesign connection-level flow control
There are two flow control mechanisms in TIPC; one at link level that handles network congestion, burst control, and retransmission, and one at connectio
tipc: redesign connection-level flow control
There are two flow control mechanisms in TIPC; one at link level that handles network congestion, burst control, and retransmission, and one at connection level which' only remaining task is to prevent overflow in the receiving socket buffer. In TIPC, the latter task has to be solved end-to-end because messages can not be thrown away once they have been accepted and delivered upwards from the link layer, i.e, we can never permit the receive buffer to overflow.
Currently, this algorithm is message based. A counter in the receiving socket keeps track of number of consumed messages, and sends a dedicated acknowledge message back to the sender for each 256 consumed message. A counter at the sending end keeps track of the sent, not yet acknowledged messages, and blocks the sender if this number ever reaches 512 unacknowledged messages. When the missing acknowledge arrives, the socket is then woken up for renewed transmission. This works well for keeping the message flow running, as it almost never happens that a sender socket is blocked this way.
A problem with the current mechanism is that it potentially is very memory consuming. Since we don't distinguish between small and large messages, we have to dimension the socket receive buffer according to a worst-case of both. I.e., the window size must be chosen large enough to sustain a reasonable throughput even for the smallest messages, while we must still consider a scenario where all messages are of maximum size. Hence, the current fix window size of 512 messages and a maximum message size of 66k results in a receive buffer of 66 MB when truesize(66k) = 131k is taken into account. It is possible to do much better.
This commit introduces an algorithm where we instead use 1024-byte blocks as base unit. This unit, always rounded upwards from the actual message size, is used when we advertise windows as well as when we count and acknowledge transmitted data. The advertised window is based on the configured receive buffer size in such a way that even the worst-case truesize/msgsize ratio always is covered. Since the smallest possible message size (from a flow control viewpoint) now is 1024 bytes, we can safely assume this ratio to be less than four, which is the value we are now using.
This way, we have been able to reduce the default receive buffer size from 66 MB to 2 MB with maintained performance.
In order to keep this solution backwards compatible, we introduce a new capability bit in the discovery protocol, and use this throughout the message sending/reception path to always select the right unit.
Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v4.6-rc6, v4.6-rc5, v4.6-rc4, v4.6-rc3, v4.6-rc2, v4.6-rc1, v4.5, v4.5-rc7, v4.5-rc6, v4.5-rc5, v4.5-rc4, v4.5-rc3, v4.5-rc2, v4.5-rc1, v4.4, v4.4-rc8, v4.4-rc7, v4.4-rc6, v4.4-rc5, v4.4-rc4, v4.4-rc3, v4.4-rc2, v4.4-rc1, v4.3, v4.3-rc7, v4.3-rc6, v4.3-rc5, v4.3-rc4, v4.3-rc3, v4.3-rc2, v4.3-rc1, v4.2, v4.2-rc8, v4.2-rc7, v4.2-rc6, v4.2-rc5, v4.2-rc4 |
|
| #
cda3696d |
| 22-Jul-2015 |
Jon Paul Maloy <[email protected]> |
tipc: clean up socket layer message reception
When a message is received in a socket, one of the call chains tipc_sk_rcv()->tipc_sk_enqueue()->filter_rcv()(->tipc_sk_proto_rcv()) or tipc_sk_backlog_
tipc: clean up socket layer message reception
When a message is received in a socket, one of the call chains tipc_sk_rcv()->tipc_sk_enqueue()->filter_rcv()(->tipc_sk_proto_rcv()) or tipc_sk_backlog_rcv()->filter_rcv()(->tipc_sk_proto_rcv()) are followed. At each of these levels we may encounter situations where the message may need to be rejected, or a new message produced for transfer back to the sender. Despite recent improvements, the current code for doing this is perceived as awkward and hard to follow.
Leveraging the two previous commits in this series, we now introduce a more uniform handling of such situations. We let each of the functions in the chain itself produce/reverse the message to be returned to the sender, but also perform the actual forwarding. This simplifies the necessary logics within each function.
Reviewed-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v4.2-rc3, v4.2-rc2, v4.2-rc1, v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4, v4.1-rc3, v4.1-rc2, v4.1-rc1, v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5 |
|
| #
76100a8a |
| 18-Mar-2015 |
Ying Xue <[email protected]> |
tipc: fix netns refcnt leak
When the TIPC module is loaded, we launch a topology server in kernel space, which in its turn is creating TIPC sockets for communication with topology server users. Beca
tipc: fix netns refcnt leak
When the TIPC module is loaded, we launch a topology server in kernel space, which in its turn is creating TIPC sockets for communication with topology server users. Because both the socket's creator and provider reside in the same module, it is necessary that the TIPC module's reference count remains zero after the server is started and the socket created; otherwise it becomes impossible to perform "rmmod" even on an idle module.
Currently, we achieve this by defining a separate "tipc_proto_kern" protocol struct, that is used only for kernel space socket allocations. This structure has the "owner" field set to NULL, which restricts the module reference count from being be bumped when sk_alloc() for local sockets is called. Furthermore, we have defined three kernel-specific functions, tipc_sock_create_local(), tipc_sock_release_local() and tipc_sock_accept_local(), to avoid the module counter being modified when module local sockets are created or deleted. This has worked well until we introduced name space support.
However, after name space support was introduced, we have observed that a reference count leak occurs, because the netns counter is not decremented in tipc_sock_delete_local().
This commit remedies this problem. But instead of just modifying tipc_sock_delete_local(), we eliminate the whole parallel socket handling infrastructure, and start using the regular sk_create_kern(), kernel_accept() and sk_release_kernel() calls. Since those functions manipulate the module counter, we must now compensate for that by explicitly decrementing the counter after module local sockets are created, and increment it just before calling sk_release_kernel().
Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace") Signed-off-by: Ying Xue <[email protected]> Reviewed-by: Jon Maloy <[email protected]> Reviewed-by: Erik Hugne <[email protected]> Reported-by: Cong Wang <[email protected]> Tested-by: Erik Hugne <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1 |
|
| #
487d2a3a |
| 09-Feb-2015 |
Richard Alpe <[email protected]> |
tipc: convert legacy nl socket dump to nl compat
Convert socket (port) listing to compat dumpit call. If a socket (port) has publications a second dumpit call is issued to collect them and format th
tipc: convert legacy nl socket dump to nl compat
Convert socket (port) listing to compat dumpit call. If a socket (port) has publications a second dumpit call is issued to collect them and format then into the legacy buffer before continuing to process the sockets (ports).
Command converted in this patch: TIPC_CMD_SHOW_PORTS
Signed-off-by: Richard Alpe <[email protected]> Reviewed-by: Erik Hugne <[email protected]> Reviewed-by: Ying Xue <[email protected]> Reviewed-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v3.19 |
|
| #
cb1b7280 |
| 05-Feb-2015 |
Jon Paul Maloy <[email protected]> |
tipc: eliminate race condition at multicast reception
In a previous commit in this series we resolved a race problem during unicast message reception.
Here, we resolve the same problem at multicast
tipc: eliminate race condition at multicast reception
In a previous commit in this series we resolved a race problem during unicast message reception.
Here, we resolve the same problem at multicast reception. We apply the same technique: an input queue serializing the delivery of arriving buffers. The main difference is that here we do it in two steps. First, the broadcast link feeds arriving buffers into the tail of an arrival queue, which head is consumed at the socket level, and where destination lookup is performed. Second, if the lookup is successful, the resulting buffer clones are fed into a second queue, the input queue. This queue is consumed at reception in the socket just like in the unicast case. Both queues are protected by the same lock, -the one of the input queue.
Reviewed-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
3c724acd |
| 05-Feb-2015 |
Jon Paul Maloy <[email protected]> |
tipc: simplify socket multicast reception
The structure 'tipc_port_list' is used to collect port numbers representing multicast destination socket on a receiving node. The list is not based on a sta
tipc: simplify socket multicast reception
The structure 'tipc_port_list' is used to collect port numbers representing multicast destination socket on a receiving node. The list is not based on a standard linked list, and is in reality optimized for the uncommon case that there are more than one multicast destinations per node. This makes the list handling unecessarily complex, and as a consequence, even the socket multicast reception becomes more complex.
In this commit, we replace 'tipc_port_list' with a new 'struct tipc_plist', which is based on a standard list. We give the new list stack (push/pop) semantics, someting that simplifies the implementation of the function tipc_sk_mcast_rcv().
Reviewed-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
c637c103 |
| 05-Feb-2015 |
Jon Paul Maloy <[email protected]> |
tipc: resolve race problem at unicast message reception
TIPC handles message cardinality and sequencing at the link layer, before passing messages upwards to the destination sockets. During the upca
tipc: resolve race problem at unicast message reception
TIPC handles message cardinality and sequencing at the link layer, before passing messages upwards to the destination sockets. During the upcall from link to socket no locks are held. It is therefore possible, and we see it happen occasionally, that messages arriving in different threads and delivered in sequence still bypass each other before they reach the destination socket. This must not happen, since it violates the sequentiality guarantee.
We solve this by adding a new input buffer queue to the link structure. Arriving messages are added safely to the tail of that queue by the link, while the head of the queue is consumed, also safely, by the receiving socket. Sequentiality is secured per socket by only allowing buffers to be dequeued inside the socket lock. Since there may be multiple simultaneous readers of the queue, we use a 'filter' parameter to reduce the risk that they peek the same buffer from the queue, hence also reducing the risk of contention on the receiving socket locks.
This solves the sequentiality problem, and seems to cause no measurable performance degradation.
A nice side effect of this change is that lock handling in the functions tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that will enable future simplifications of those functions.
Reviewed-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v3.19-rc7, v3.19-rc6, v3.19-rc5, v3.19-rc4 |
|
| #
a62fbcce |
| 09-Jan-2015 |
Ying Xue <[email protected]> |
tipc: make subscriber server support net namespace
TIPC establishes one subscriber server which allows users to subscribe their interesting name service status. After tipc supports namespace, one de
tipc: make subscriber server support net namespace
TIPC establishes one subscriber server which allows users to subscribe their interesting name service status. After tipc supports namespace, one dedicated tipc stack instance is created for each namespace, and each instance can be deemed as one independent TIPC node. As a result, subscriber server must be built for each namespace.
Signed-off-by: Ying Xue <[email protected]> Tested-by: Tero Aho <[email protected]> Reviewed-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
e05b31f4 |
| 09-Jan-2015 |
Ying Xue <[email protected]> |
tipc: make tipc socket support net namespace
Now tipc socket table is statically allocated as a global variable. Through it, we can look up one socket instance with port ID, insert a new socket inst
tipc: make tipc socket support net namespace
Now tipc socket table is statically allocated as a global variable. Through it, we can look up one socket instance with port ID, insert a new socket instance to the table, and delete a socket from the table. But when tipc supports net namespace, each namespace must own its specific socket table. So the global variable of socket table must be redefined in tipc_net structure. As a concequence, a new socket table will be allocated when a new namespace is created, and a socket table will be deallocated when namespace is destroyed.
Signed-off-by: Ying Xue <[email protected]> Tested-by: Tero Aho <[email protected]> Reviewed-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
f2f9800d |
| 09-Jan-2015 |
Ying Xue <[email protected]> |
tipc: make tipc node table aware of net namespace
Global variables associated with node table are below: - node table list (node_htable) - node hash table list (tipc_node_list) - node table lock (no
tipc: make tipc node table aware of net namespace
Global variables associated with node table are below: - node table list (node_htable) - node hash table list (tipc_node_list) - node table lock (node_list_lock) - node number counter (tipc_num_nodes) - node link number counter (tipc_num_links)
To make node table support namespace, above global variables must be moved to tipc_net structure in order to keep secret for different namespaces. As a consequence, these variables are allocated and initialized when namespace is created, and deallocated when namespace is destroyed. After the change, functions associated with these variables have to utilize a namespace pointer to access them. So adding namespace pointer as a parameter of these functions is the major change made in the commit.
Signed-off-by: Ying Xue <[email protected]> Tested-by: Tero Aho <[email protected]> Reviewed-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
859fc7c0 |
| 09-Jan-2015 |
Ying Xue <[email protected]> |
tipc: cleanup core.c and core.h files
Only the works of initializing and shutting down tipc module are done in core.h and core.c files, so all stuffs which are not closely associated with the two ta
tipc: cleanup core.c and core.h files
Only the works of initializing and shutting down tipc module are done in core.h and core.c files, so all stuffs which are not closely associated with the two tasks should be moved to appropriate places.
Signed-off-by: Ying Xue <[email protected]> Tested-by: Tero Aho <[email protected]> Reviewed-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
07f6c4bc |
| 07-Jan-2015 |
Ying Xue <[email protected]> |
tipc: convert tipc reference table to use generic rhashtable
As tipc reference table is statically allocated, its memory size requested on stack initialization stage is quite big even if the maximum
tipc: convert tipc reference table to use generic rhashtable
As tipc reference table is statically allocated, its memory size requested on stack initialization stage is quite big even if the maximum port number is just restricted to 8191 currently, however, the number already becomes insufficient in practice. But if the maximum ports is allowed to its theory value - 2^32, its consumed memory size will reach a ridiculously unacceptable value. Apart from this, heavy tipc users spend a considerable amount of time in tipc_sk_get() due to the read-lock on ref_table_lock.
If tipc reference table is converted with generic rhashtable, above mentioned both disadvantages would be resolved respectively: making use of the new resizable hash table can avoid locking on the lookup; smaller memory size is required at initial stage, for example, 256 hash bucket slots are requested at the beginning phase instead of allocating the entire 8191 slots in old mode. The hash table will grow if entries exceeds 75% of table size up to a total table size of 1M, and it will automatically shrink if usage falls below 30%, but the minimum table size is allowed down to 256.
Also converts ref_table_lock to a separate mutex to protect hash table mutations on write side. Lastly defers the release of the socket reference using call_rcu() to allow using an RCU read-side protected call to rhashtable_lookup().
Signed-off-by: Ying Xue <[email protected]> Acked-by: Jon Maloy <[email protected]> Acked-by: Erik Hugne <[email protected]> Cc: Thomas Graf <[email protected]> Acked-by: Thomas Graf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v3.19-rc3, v3.19-rc2, v3.19-rc1, v3.18, v3.18-rc7, v3.18-rc6 |
|
| #
1a1a143d |
| 20-Nov-2014 |
Richard Alpe <[email protected]> |
tipc: add publication dump to new netlink api
Add TIPC_NL_PUBL_GET command to the new tipc netlink API.
This command supports dumping of all publications for a specific socket.
Netlink logical lay
tipc: add publication dump to new netlink api
Add TIPC_NL_PUBL_GET command to the new tipc netlink API.
This command supports dumping of all publications for a specific socket.
Netlink logical layout of request message: -> socket -> reference
Netlink logical layout of response message: -> publication -> type -> lower -> upper
Signed-off-by: Richard Alpe <[email protected]> Reviewed-by: Erik Hugne <[email protected]> Reviewed-by: Jon Maloy <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
34b78a12 |
| 20-Nov-2014 |
Richard Alpe <[email protected]> |
tipc: add sock dump to new netlink api
Add TIPC_NL_SOCK_GET command to the new tipc netlink API.
This command supports dumping of all available sockets with their associated connection or publicati
tipc: add sock dump to new netlink api
Add TIPC_NL_SOCK_GET command to the new tipc netlink API.
This command supports dumping of all available sockets with their associated connection or publication(s). It could be extended to reply with a single socket if the NLM_F_DUMP isn't set.
The information about a socket includes reference, address, connection information / publication information.
Netlink logical layout of response message: -> socket -> reference -> address [ -> connection -> node -> socket [ -> connected flag -> type -> instance ] ] [ -> publication flag ]
Signed-off-by: Richard Alpe <[email protected]> Reviewed-by: Erik Hugne <[email protected]> Reviewed-by: Jon Maloy <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v3.18-rc5, v3.18-rc4, v3.18-rc3, v3.18-rc2, v3.18-rc1, v3.17, v3.17-rc7, v3.17-rc6, v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2 |
|
| #
301bae56 |
| 22-Aug-2014 |
Jon Paul Maloy <[email protected]> |
tipc: merge struct tipc_port into struct tipc_sock
We complete the merging of the port and socket layer by aggregating the fields of struct tipc_port directly into struct tipc_sock, and moving the c
tipc: merge struct tipc_port into struct tipc_sock
We complete the merging of the port and socket layer by aggregating the fields of struct tipc_port directly into struct tipc_sock, and moving the combined structure into socket.c.
We also move all functions and macros that are not any longer exposed to the rest of the stack into socket.c, and rename them accordingly.
Despite the size of this commit, there are no functional changes. We have only made such changes that are necessary due of the removal of struct tipc_port.
Signed-off-by: Jon Maloy <[email protected]> Reviewed-by: Erik Hugne <[email protected]> Reviewed-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
808d90f9 |
| 22-Aug-2014 |
Jon Paul Maloy <[email protected]> |
tipc: remove files ref.h and ref.c
The reference table is now 'socket aware' instead of being generic, and has in reality become a socket internal table. In order to be able to minimize the API expo
tipc: remove files ref.h and ref.c
The reference table is now 'socket aware' instead of being generic, and has in reality become a socket internal table. In order to be able to minimize the API exposed by the socket layer towards the rest of the stack, we now move the reference table definitions and functions into the file socket.c, and rename the functions accordingly.
There are no functional changes in this commit.
Signed-off-by: Jon Maloy <[email protected]> Reviewed-by: Erik Hugne <[email protected]> Reviewed-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
2e84c60b |
| 22-Aug-2014 |
Jon Paul Maloy <[email protected]> |
tipc: remove include file port.h
We move the inline functions in the file port.h to socket.c, and modify their names accordingly.
We move struct tipc_port and some macros to socket.h.
Finally, we
tipc: remove include file port.h
We move the inline functions in the file port.h to socket.c, and modify their names accordingly.
We move struct tipc_port and some macros to socket.h.
Finally, we remove the file port.h.
Signed-off-by: Jon Maloy <[email protected]> Reviewed-by: Erik Hugne <[email protected]> Reviewed-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|