|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
| #
b912d599 |
| 01-Apr-2025 |
Stanislav Fomichev <[email protected]> |
net: rename rtnl_net_debug to lock_debug
And make it selected by CONFIG_DEBUG_NET. Don't rename any of the structs/functions. Next patch will use rtnl_net_debug_event in netdevsim.
Reviewed-by: Jak
net: rename rtnl_net_debug to lock_debug
And make it selected by CONFIG_DEBUG_NET. Don't rename any of the structs/functions. Next patch will use rtnl_net_debug_event in netdevsim.
Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
7e4d784f |
| 05-Mar-2025 |
Stanislav Fomichev <[email protected]> |
net: hold netdev instance lock during rtnetlink operations
To preserve the atomicity, hold the lock while applying multiple attributes. The major issue with a full conversion to the instance lock ar
net: hold netdev instance lock during rtnetlink operations
To preserve the atomicity, hold the lock while applying multiple attributes. The major issue with a full conversion to the instance lock are software nesting devices (bonding/team/vrf/etc). Those devices call into the core stack for their lower (potentially real hw) devices. To avoid explicitly wrapping all those places into instance lock/unlock, introduce new API boundaries:
- (some) existing dev_xxx calls are now considered "external" (to drivers) APIs and they transparently grab the instance lock if needed (dev_api.c) - new netif_xxx calls are internal core stack API (naming is sketchy, I've tried netdev_xxx_locked per Jakub's suggestion, but it feels a bit verbose; but happy to get back to this naming scheme if this is the preference)
This avoids touching most of the existing ioctl/sysfs/drivers paths.
Note the special handling of ndo_xxx_slave operations: I exploit the fact that none of the drivers that call these functions need/use instance lock. At the same time, they use dev_xxx APIs, so the lower device has to be unlocked.
Changes in unregister_netdevice_many_notify (to protect dev->state with instance lock) trigger lockdep - the loop over close_list (mostly from cleanup_net) introduces spurious ordering issues. netdev_lock_cmp_fn has a justification on why it's ok to suppress for now.
Cc: Saeed Mahameed <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7 |
|
| #
12079a59 |
| 07-Nov-2024 |
Breno Leitao <[email protected]> |
net: Implement fault injection forcing skb reallocation
Introduce a fault injection mechanism to force skb reallocation. The primary goal is to catch bugs related to pointer invalidation after poten
net: Implement fault injection forcing skb reallocation
Introduce a fault injection mechanism to force skb reallocation. The primary goal is to catch bugs related to pointer invalidation after potential skb reallocation.
The fault injection mechanism aims to identify scenarios where callers retain pointers to various headers in the skb but fail to reload these pointers after calling a function that may reallocate the data. This type of bug can lead to memory corruption or crashes if the old, now-invalid pointers are used.
By forcing reallocation through fault injection, we can stress-test code paths and ensure proper pointer management after potential skb reallocations.
Add a hook for fault injection in the following functions:
* pskb_trim_rcsum() * pskb_may_pull_reason() * pskb_trim()
As the other fault injection mechanism, protect it under a debug Kconfig called CONFIG_FAIL_SKB_REALLOC.
This patch was *heavily* inspired by Jakub's proposal from: https://lore.kernel.org/all/[email protected]/
CC: Akinobu Mita <[email protected]> Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Akinobu Mita <[email protected]> Acked-by: Paolo Abeni <[email protected]> Acked-by: Guillaume Nault <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2 |
|
| #
03fa5348 |
| 04-Oct-2024 |
Kuniyuki Iwashima <[email protected]> |
rtnetlink: Add ASSERT_RTNL_NET() placeholder for netdev notifier.
The global and per-netns netdev notifier depend on RTNL, and its dependency is not so clear due to nested calls.
Let's add a placeh
rtnetlink: Add ASSERT_RTNL_NET() placeholder for netdev notifier.
The global and per-netns netdev notifier depend on RTNL, and its dependency is not so clear due to nested calls.
Let's add a placeholder to place ASSERT_RTNL_NET() for each event.
Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc1, v6.11 |
|
| #
170aafe3 |
| 10-Sep-2024 |
Mina Almasry <[email protected]> |
netdev: support binding dma-buf to netdevice
Add a netdev_dmabuf_binding struct which represents the dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to rx queues on the netdevice
netdev: support binding dma-buf to netdevice
Add a netdev_dmabuf_binding struct which represents the dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to rx queues on the netdevice. On the binding, the dma_buf_attach & dma_buf_map_attachment will occur. The entries in the sg_table from mapping will be inserted into a genpool to make it ready for allocation.
The chunks in the genpool are owned by a dmabuf_chunk_owner struct which holds the dma-buf offset of the base of the chunk and the dma_addr of the chunk. Both are needed to use allocations that come from this chunk.
We create a new type that represents an allocation from the genpool: net_iov. We setup the net_iov allocation size in the genpool to PAGE_SIZE for simplicity: to match the PAGE_SIZE normally allocated by the page pool and given to the drivers.
The user can unbind the dmabuf from the netdevice by closing the netlink socket that established the binding. We do this so that the binding is automatically unbound even if the userspace process crashes.
The binding and unbinding leaves an indicator in struct netdev_rx_queue that the given queue is bound, and the binding is actuated by resetting the rx queue using the queue API.
The netdev_dmabuf_binding struct is refcounted, and releases its resources only when all the refs are released.
Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: Kaiyuan Zhang <[email protected]> Signed-off-by: Mina Almasry <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> # excluding netlink Acked-by: Daniel Vetter <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
7c88f865 |
| 10-Sep-2024 |
Mina Almasry <[email protected]> |
netdev: add netdev_rx_queue_restart()
Add netdev_rx_queue_restart(), which resets an rx queue using the queue API recently merged[1].
The queue API was merged to enable the core net stack to reset
netdev: add netdev_rx_queue_restart()
Add netdev_rx_queue_restart(), which resets an rx queue using the queue API recently merged[1].
The queue API was merged to enable the core net stack to reset individual rx queues to actuate changes in the rx queue's configuration. In later patches in this series, we will use netdev_rx_queue_restart() to reset rx queues after binding or unbinding dmabuf configuration, which will cause reallocation of the page_pool to repopulate its memory using the new configuration.
[1] https://lore.kernel.org/netdev/[email protected]/T/
Signed-off-by: David Wei <[email protected]> Signed-off-by: Mina Almasry <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
768cf841 |
| 03-May-2024 |
Oleksij Rempel <[email protected]> |
net: add IEEE 802.1q specific helpers
IEEE 802.1q specification provides recommendation and examples which can be used as good default values for different drivers.
This patch implements mapping ex
net: add IEEE 802.1q specific helpers
IEEE 802.1q specification provides recommendation and examples which can be used as good default values for different drivers.
This patch implements mapping examples documented in IEEE 802.1Q-2022 in Annex I "I.3 Traffic type to traffic class mapping" and IETF DSCP naming and mapping DSCP to Traffic Type inspired by RFC8325.
This helpers will be used in followup patches for dsa/microchip DCB implementation.
Signed-off-by: Oleksij Rempel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2 |
|
| #
5b2be2ab |
| 27-Mar-2024 |
Alexander Lobakin <[email protected]> |
net: net_test: add tests for IP tunnel flags conversion helpers
Now that there are helpers for converting IP tunnel flags between the old __be16 format and the bitmap format, make sure they work as
net: net_test: add tests for IP tunnel flags conversion helpers
Now that there are helpers for converting IP tunnel flags between the old __be16 format and the bitmap format, make sure they work as expected by adding a couple of tests to the networking testing suite. The helpers are all inline, so no dependencies on the related CONFIG_* (or a standalone module) are needed.
Cover three possible cases:
1. No bits past BIT(15) are set, VTI/SIT bits are not set. This conversion is almost a direct assignment. 2. No bits past BIT(15) are set, but VTI/SIT bit is set. During the conversion, it must be transformed into BIT(16) in the bitmap, but still compatible with the __be16 format. 3. The bitmap has bits past BIT(15) set (not the VTI/SIT one). The result will be truncated. Note that currently __IP_TUNNEL_FLAG_NUM is 17 (incl. special), which means that the result of this case is currently semi-false-positive. When BIT(17) is finally here, it will be adjusted accordingly.
Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc1, v6.8 |
|
| #
2658b5a8 |
| 06-Mar-2024 |
Eric Dumazet <[email protected]> |
net: introduce struct net_hotdata
Instead of spreading networking critical fields all over the places, add a custom net_hotdata structure so that we can precisely control its layout.
In this first
net: introduce struct net_hotdata
Instead of spreading networking critical fields all over the places, add a custom net_hotdata structure so that we can precisely control its layout.
In this first patch, move :
- gro_normal_batch used in rx (GRO stack) - offload_base used in rx and tx (GRO and TSO stacks)
Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3 |
|
| #
f17c6964 |
| 26-Nov-2023 |
Jakub Kicinski <[email protected]> |
net: page_pool: id the page pools
To give ourselves the flexibility of creating netlink commands and ability to refer to page pool instances in uAPIs create IDs for page pools.
Reviewed-by: Ilias A
net: page_pool: id the page pools
To give ourselves the flexibility of creating netlink commands and ability to refer to page pool instances in uAPIs create IDs for page pools.
Reviewed-by: Ilias Apalodimas <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6 |
|
| #
b3098d32 |
| 09-Oct-2023 |
Willem de Bruijn <[email protected]> |
net: add skb_segment kunit test
Add unit testing for skb segment. This function is exercised by many different code paths, such as GSO_PARTIAL or GSO_BY_FRAGS, linear (with or without head_frag), fr
net: add skb_segment kunit test
Add unit testing for skb segment. This function is exercised by many different code paths, such as GSO_PARTIAL or GSO_BY_FRAGS, linear (with or without head_frag), frags or frag_list skbs, etc.
It is infeasible to manually run tests that cover all code paths when making changes. The long and complex function also makes it hard to establish through analysis alone that a patch has no unintended side-effects.
Add code coverage through kunit regression testing. Introduce kunit infrastructure for tests under net/core, and add this first test.
This first skb_segment test exercises a simple case: a linear skb. Follow-on patches will parametrize the test and add more variants.
Tested: Built and ran the test with
make ARCH=um mrproper
./tools/testing/kunit/kunit.py run \ --kconfig_add CONFIG_NET=y \ --kconfig_add CONFIG_DEBUG_KERNEL=y \ --kconfig_add CONFIG_DEBUG_INFO=y \ --kconfig_add=CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y \ net_core_gso
Signed-off-by: Willem de Bruijn <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6 |
|
| #
d457a0e3 |
| 08-Jun-2023 |
Eric Dumazet <[email protected]> |
net: move gso declarations and functions to their own files
Move declarations into include/net/gso.h and code into net/core/gso.c
Signed-off-by: Eric Dumazet <[email protected]> Cc: Stanislav Fom
net: move gso declarations and functions to their own files
Move declarations into include/net/gso.h and code into net/core/gso.c
Signed-off-by: Eric Dumazet <[email protected]> Cc: Stanislav Fomichev <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7 |
|
| #
d3d854fd |
| 01-Feb-2023 |
Jakub Kicinski <[email protected]> |
netdev-genl: create a simple family for netdev stuff
Add a Netlink spec-compatible family for netdevs. This is a very simple implementation without much thought going into it.
It allows us to reap
netdev-genl: create a simple family for netdev stuff
Add a Netlink spec-compatible family for netdevs. This is a very simple implementation without much thought going into it.
It allows us to reap all the benefits of Netlink specs, one can use the generic client to issue the commands:
$ ./cli.py --spec netdev.yaml --dump dev_get [{'ifindex': 1, 'xdp-features': set()}, {'ifindex': 2, 'xdp-features': {'basic', 'ndo-xmit', 'redirect'}}, {'ifindex': 3, 'xdp-features': {'rx-sg'}}]
the generic python library does not have flags-by-name support, yet, but we also don't have to carry strings in the messages, as user space can get the names from the spec.
Acked-by: Jesper Dangaard Brouer <[email protected]> Co-developed-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Co-developed-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Co-developed-by: Marek Majtyka <[email protected]> Signed-off-by: Marek Majtyka <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/327ad9c9868becbe1e601b580c962549c8cd81f2.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3 |
|
| #
f05bd8eb |
| 05-Jan-2023 |
Jakub Kicinski <[email protected]> |
devlink: move code to a dedicated directory
The devlink code is hard to navigate with 13kLoC in one file. I really like the way Michal split the ethtool into per-command files and core. It'd probabl
devlink: move code to a dedicated directory
The devlink code is hard to navigate with 13kLoC in one file. I really like the way Michal split the ethtool into per-command files and core. It'd probably be too much to split it all up, but we can at least separate the core parts out of the per-cmd implementations and put it in a directory so that new commands can be separate files.
Move the code, subsequent commit will do a partial split.
Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5 |
|
| #
9cb252c4 |
| 05-Sep-2022 |
Menglong Dong <[email protected]> |
net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
As Eric reported, the 'reason' field is not presented when trace the kfree_skb event by perf:
$ perf record -e skb:kfree_skb -a sleep 1
net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
As Eric reported, the 'reason' field is not presented when trace the kfree_skb event by perf:
$ perf record -e skb:kfree_skb -a sleep 10 $ perf script ip_defrag 14605 [021] 221.614303: skb:kfree_skb: skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 reason:
The cause seems to be passing kernel address directly to TP_printk(), which is not right. As the enum 'skb_drop_reason' is not exported to user space through TRACE_DEFINE_ENUM(), perf can't get the drop reason string from the 'reason' field, which is a number.
Therefore, we introduce the macro DEFINE_DROP_REASON(), which is used to define the trace enum by TRACE_DEFINE_ENUM(). With the help of DEFINE_DROP_REASON(), now we can remove the auto-generate that we introduced in the commit ec43908dd556 ("net: skb: use auto-generation to convert skb drop reason to string"), and define the string array 'drop_reasons'.
Hmmmm...now we come back to the situation that have to maintain drop reasons in both enum skb_drop_reason and DEFINE_DROP_REASON. But they are both in dropreason.h, which makes it easier.
After this commit, now the format of kfree_skb is like this:
$ cat /tracing/events/skb/kfree_skb/format name: kfree_skb ID: 1524 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1;
field:void * skbaddr; offset:8; size:8; signed:0; field:void * location; offset:16; size:8; signed:0; field:unsigned short protocol; offset:24; size:2; signed:0; field:enum skb_drop_reason reason; offset:28; size:4; signed:0;
print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { 1, "NOT_SPECIFIED" }, { 2, "NO_SOCKET" } ......
Fixes: ec43908dd556 ("net: skb: use auto-generation to convert skb drop reason to string") Link: https://lore.kernel.org/netdev/CANn89i+bx0ybvE55iMYf5GJM48WwV1HNpdm9Q6t-HaEstqpCSA@mail.gmail.com/ Reported-by: Eric Dumazet <[email protected]> Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2 |
|
| #
ec43908d |
| 06-Jun-2022 |
Menglong Dong <[email protected]> |
net: skb: use auto-generation to convert skb drop reason to string
It is annoying to add new skb drop reasons to 'enum skb_drop_reason' and TRACE_SKB_DROP_REASON in trace/event/skb.h, and it's easy
net: skb: use auto-generation to convert skb drop reason to string
It is annoying to add new skb drop reasons to 'enum skb_drop_reason' and TRACE_SKB_DROP_REASON in trace/event/skb.h, and it's easy to forget to add the new reasons we added to TRACE_SKB_DROP_REASON.
TRACE_SKB_DROP_REASON is used to convert drop reason of type number to string. For now, the string we passed to user space is exactly the same as the name in 'enum skb_drop_reason' with a 'SKB_DROP_REASON_' prefix. Therefore, we can use 'auto-generation' to generate these drop reasons to string at build time.
The new source 'dropreason_str.c' will be auto generated during build time, which contains the string array 'const char * const drop_reasons[]'.
Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2 |
|
| #
2c193f2c |
| 19-Nov-2021 |
Jakub Kicinski <[email protected]> |
net: kunit: add a test for dev_addr_lists
Add a KUnit test for the dev_addr API.
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
|
| #
e456a18a |
| 15-Nov-2021 |
Eric Dumazet <[email protected]> |
net: gro: move skb_gro_receive into net/core/gro.c
net/core/gro.c will contain all core gro functions, to shrink net/core/skbuff.c and net/core/dev.c
Signed-off-by: Eric Dumazet <[email protected]
net: gro: move skb_gro_receive into net/core/gro.c
net/core/gro.c will contain all core gro functions, to shrink net/core/skbuff.c and net/core/dev.c
Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5 |
|
| #
e330fb14 |
| 07-Oct-2021 |
Jakub Kicinski <[email protected]> |
of: net: move of_net under net/
Rob suggests to move of_net.c from under drivers/of/ somewhere to the networking code.
Suggested-by: Rob Herring <[email protected]> Signed-off-by: Jakub Kicinski <kub
of: net: move of_net under net/
Rob suggests to move of_net.c from under drivers/of/ somewhere to the networking code.
Suggested-by: Rob Herring <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1 |
|
| #
17edea21 |
| 04-Jul-2021 |
Cong Wang <[email protected]> |
sock_map: Relax config dependency to CONFIG_NET
Currently sock_map still has Kconfig dependency on CONFIG_INET, but there is no actual functional dependency on it after we introduce ->psock_update_s
sock_map: Relax config dependency to CONFIG_NET
Currently sock_map still has Kconfig dependency on CONFIG_INET, but there is no actual functional dependency on it after we introduce ->psock_update_sk_prot().
We have to extend it to CONFIG_NET now as we are going to support AF_UNIX.
Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|
|
Revision tags: v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1 |
|
| #
4a52dd8f |
| 28-Apr-2021 |
Oleksij Rempel <[email protected]> |
net: selftest: fix build issue if INET is disabled
In case ethernet driver is enabled and INET is disabled, selftest will fail to build.
Reported-by: Randy Dunlap <[email protected]> Fixes: 3e1
net: selftest: fix build issue if INET is disabled
In case ethernet driver is enabled and INET is disabled, selftest will fail to build.
Reported-by: Randy Dunlap <[email protected]> Fixes: 3e1e58d64c3d ("net: add generic selftest support") Signed-off-by: Oleksij Rempel <[email protected]> Acked-by: Randy Dunlap <[email protected]> # build-tested Reviewed-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v5.12 |
|
| #
3e1e58d6 |
| 19-Apr-2021 |
Oleksij Rempel <[email protected]> |
net: add generic selftest support
Port some parts of the stmmac selftest and reuse it as basic generic selftest library. This patch was tested with following combinations: - iMX6DL FEC -> AT8035 - i
net: add generic selftest support
Port some parts of the stmmac selftest and reuse it as basic generic selftest library. This patch was tested with following combinations: - iMX6DL FEC -> AT8035 - iMX6DL FEC -> SJA1105Q switch -> KSZ8081 - iMX6DL FEC -> SJA1105Q switch -> KSZ9031 - AR9331 ag71xx -> AR9331 PHY - AR9331 ag71xx -> AR9331 switch -> AR9331 PHY
Signed-off-by: Oleksij Rempel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse |
|
| #
88759609 |
| 23-Feb-2021 |
Cong Wang <[email protected]> |
bpf: Clean up sockmap related Kconfigs
As suggested by John, clean up sockmap related Kconfigs:
Reduce the scope of CONFIG_BPF_STREAM_PARSER down to TCP stream parser, to reflect its name.
Make th
bpf: Clean up sockmap related Kconfigs
As suggested by John, clean up sockmap related Kconfigs:
Reduce the scope of CONFIG_BPF_STREAM_PARSER down to TCP stream parser, to reflect its name.
Make the rest sockmap code simply depend on CONFIG_BPF_SYSCALL and CONFIG_INET, the latter is still needed at this point because of TCP/UDP proto update. And leave CONFIG_NET_SOCK_MSG untouched, as it is used by non-sockmap cases.
Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Lorenz Bauer <[email protected]> Acked-by: John Fastabend <[email protected]> Acked-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|
|
Revision tags: v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2 |
|
| #
9ce48e5a |
| 11-Dec-2019 |
Michal Kubecek <[email protected]> |
ethtool: move to its own directory
The ethtool netlink interface is going to be split into multiple files so that it will be more convenient to put all of them in a separate directory net/ethtool. S
ethtool: move to its own directory
The ethtool netlink interface is going to be split into multiple files so that it will be more convenient to put all of them in a separate directory net/ethtool. Start by moving current ethtool.c with ioctl interface into this directory and renaming it to ioctl.c.
Signed-off-by: Michal Kubecek <[email protected]> Acked-by: Jiri Pirko <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7 |
|
| #
6ac99e8f |
| 26-Apr-2019 |
Martin KaFai Lau <[email protected]> |
bpf: Introduce bpf sk local storage
After allowing a bpf prog to - directly read the skb->sk ptr - get the fullsock bpf_sock by "bpf_sk_fullsock()" - get the bpf_tcp_sock by "bpf_tcp_sock()" - get t
bpf: Introduce bpf sk local storage
After allowing a bpf prog to - directly read the skb->sk ptr - get the fullsock bpf_sock by "bpf_sk_fullsock()" - get the bpf_tcp_sock by "bpf_tcp_sock()" - get the listener sock by "bpf_get_listener_sock()" - avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock" into different bpf running context.
this patch is another effort to make bpf's network programming more intuitive to do (together with memory and performance benefit).
When bpf prog needs to store data for a sk, the current practice is to define a map with the usual 4-tuples (src/dst ip/port) as the key. If multiple bpf progs require to store different sk data, multiple maps have to be defined. Hence, wasting memory to store the duplicated keys (i.e. 4 tuples here) in each of the bpf map. [ The smallest key could be the sk pointer itself which requires some enhancement in the verifier and it is a separate topic. ]
Also, the bpf prog needs to clean up the elem when sk is freed. Otherwise, the bpf map will become full and un-usable quickly. The sk-free tracking currently could be done during sk state transition (e.g. BPF_SOCK_OPS_STATE_CB).
The size of the map needs to be predefined which then usually ended-up with an over-provisioned map in production. Even the map was re-sizable, while the sk naturally come and go away already, this potential re-size operation is arguably redundant if the data can be directly connected to the sk itself instead of proxy-ing through a bpf map.
This patch introduces sk->sk_bpf_storage to provide local storage space at sk for bpf prog to use. The space will be allocated when the first bpf prog has created data for this particular sk.
The design optimizes the bpf prog's lookup (and then optionally followed by an inline update). bpf_spin_lock should be used if the inline update needs to be protected.
BPF_MAP_TYPE_SK_STORAGE: ----------------------- To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can be created to fit different bpf progs' needs. The map enforces BTF to allow printing the sk-local-storage during a system-wise sk dump (e.g. "ss -ta") in the future.
The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not for lookup/update/delete a "sk-local-storage" data from a particular sk. Think of the map as a meta-data (or "type") of a "sk-local-storage". This particular "type" of "sk-local-storage" data can then be stored in any sk.
The main purposes of this map are mostly: 1. Define the size of a "sk-local-storage" type. 2. Provide a similar syscall userspace API as the map (e.g. lookup/update, map-id, map-btf...etc.) 3. Keep track of all sk's storages of this "type" and clean them up when the map is freed.
sk->sk_bpf_storage: ------------------ The main lookup/update/delete is done on sk->sk_bpf_storage (which is a "struct bpf_sk_storage"). When doing a lookup, the "map" pointer is now used as the "key" to search on the sk_storage->list. The "map" pointer is actually serving as the "type" of the "sk-local-storage" that is being requested.
To allow very fast lookup, it should be as fast as looking up an array at a stable-offset. At the same time, it is not ideal to set a hard limit on the number of sk-local-storage "type" that the system can have. Hence, this patch takes a cache approach. The last search result from sk_storage->list is cached in sk_storage->cache[] which is a stable sized array. Each "sk-local-storage" type has a stable offset to the cache[] array. In the future, a map's flag could be introduced to do cache opt-out/enforcement if it became necessary.
The cache size is 16 (i.e. 16 types of "sk-local-storage"). Programs can share map. On the program side, having a few bpf_progs running in the networking hotpath is already a lot. The bpf_prog should have already consolidated the existing sock-key-ed map usage to minimize the map lookup penalty. 16 has enough runway to grow.
All sk-local-storage data will be removed from sk->sk_bpf_storage during sk destruction.
bpf_sk_storage_get() and bpf_sk_storage_delete(): ------------------------------------------------ Instead of using bpf_map_(lookup|update|delete)_elem(), the bpf prog needs to use the new helper bpf_sk_storage_get() and bpf_sk_storage_delete(). The verifier can then enforce the ARG_PTR_TO_SOCKET argument. The bpf_sk_storage_get() also allows to "create" new elem if one does not exist in the sk. It is done by the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE. The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together, it has eliminated the potential use cases for an equivalent bpf_map_update_elem() API (for bpf_prog) in this patch.
Misc notes: ---------- 1. map_get_next_key is not supported. From the userspace syscall perspective, the map has the socket fd as the key while the map can be shared by pinned-file or map-id.
Since btf is enforced, the existing "ss" could be enhanced to pretty print the local-storage.
Supporting a kernel defined btf with 4 tuples as the return key could be explored later also.
2. The sk->sk_lock cannot be acquired. Atomic operations is used instead. e.g. cmpxchg is done on the sk->sk_bpf_storage ptr. Please refer to the source code comments for the details in synchronization cases and considerations.
3. The mem is charged to the sk->sk_omem_alloc as the sk filter does.
Benchmark: --------- Here is the benchmark data collected by turning on the "kernel.bpf_stats_enabled" sysctl. Two bpf progs are tested:
One bpf prog with the usual bpf hashmap (max_entries = 8192) with the sk ptr as the key. (verifier is modified to support sk ptr as the key That should have shortened the key lookup time.)
Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE.
Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for each egress skb and then bump the cnt. netperf is used to drive data with 4096 connected UDP sockets.
BPF_MAP_TYPE_HASH with a modifier verifier (152ns per bpf run) 27: cgroup_skb name egress_sk_map tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633 loaded_at 2019-04-15T13:46:39-0700 uid 0 xlated 344B jited 258B memlock 4096B map_ids 16 btf_id 5
BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run) 30: cgroup_skb name egress_sk_stora tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739 loaded_at 2019-04-15T13:47:54-0700 uid 0 xlated 168B jited 156B memlock 4096B map_ids 17 btf_id 6
Here is a high-level picture on how are the objects organized:
sk ┌──────┐ │ │ │ │ │ │ │*sk_bpf_storage─────▶ bpf_sk_storage └──────┘ ┌───────┐ ┌───────────┤ list │ │ │ │ │ │ │ │ │ │ │ └───────┘ │ │ elem │ ┌────────┐ ├─▶│ snode │ │ ├────────┤ │ │ data │ bpf_map │ ├────────┤ ┌─────────┐ │ │map_node│◀─┬─────┤ list │ │ └────────┘ │ │ │ │ │ │ │ │ elem │ │ │ │ ┌────────┐ │ └─────────┘ └─▶│ snode │ │ ├────────┤ │ bpf_map │ data │ │ ┌─────────┐ ├────────┤ │ │ list ├───────▶│map_node│ │ │ │ └────────┘ │ │ │ │ │ │ elem │ └─────────┘ ┌────────┐ │ ┌─▶│ snode │ │ │ ├────────┤ │ │ │ data │ │ │ ├────────┤ │ │ │map_node│◀─┘ │ └────────┘ │ │ │ ┌───────┐ sk └──────────│ list │ ┌──────┐ │ │ │ │ │ │ │ │ │ │ │ │ └───────┘ │*sk_bpf_storage───────▶bpf_sk_storage └──────┘
Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
show more ...
|