|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6 |
|
| #
391bb659 |
| 29-Jun-2024 |
Lorenzo Bianconi <[email protected]> |
netfilter: Add bpf_xdp_flow_lookup kfunc
Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup of a given flowtable entry based on a fib tuple of incoming traffic. bpf_xdp_flow_lookup c
netfilter: Add bpf_xdp_flow_lookup kfunc
Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup of a given flowtable entry based on a fib tuple of incoming traffic. bpf_xdp_flow_lookup can be used as building block to offload in xdp the processing of sw flowtable when hw flowtable is not available.
Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Acked-by: Pablo Neira Ayuso <[email protected]> Link: https://lore.kernel.org/bpf/55d38a4e5856f6d1509d823ff4e98aaa6d356097.1719698275.git.lorenzo@kernel.org
show more ...
|
| #
89cc8f1c |
| 29-Jun-2024 |
Florian Westphal <[email protected]> |
netfilter: nf_tables: Add flowtable map for xdp offload
This adds a small internal mapping table so that a new bpf (xdp) kfunc can perform lookups in a flowtable.
As-is, xdp program has access to t
netfilter: nf_tables: Add flowtable map for xdp offload
This adds a small internal mapping table so that a new bpf (xdp) kfunc can perform lookups in a flowtable.
As-is, xdp program has access to the device pointer, but no way to do a lookup in a flowtable -- there is no way to obtain the needed struct without questionable stunts.
This allows to obtain an nf_flowtable pointer given a net_device structure.
In order to keep backward compatibility, the infrastructure allows the user to add a given device to multiple flowtables, but it will always return the first added mapping performing the lookup since it assumes the right configuration is 1:1 mapping between flowtables and net_devices.
Co-developed-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Pablo Neira Ayuso <[email protected]> Link: https://lore.kernel.org/bpf/9f20e2c36f494b3bf177328718367f636bb0b2ab.1719698275.git.lorenzo@kernel.org
show more ...
|
|
Revision tags: v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3 |
|
| #
aefb2f2e |
| 21-Nov-2023 |
Breno Leitao <[email protected]> |
x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE
Step 5/10 of the namespace unification of CPU mitigations related Kconfig options.
[ mingo: Converted a few more uses in
x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE
Step 5/10 of the namespace unification of CPU mitigations related Kconfig options.
[ mingo: Converted a few more uses in comments/messages as well. ]
Suggested-by: Josh Poimboeuf <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Ariel Miculas <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3 |
|
| #
84601d6e |
| 21-Apr-2023 |
Florian Westphal <[email protected]> |
bpf: add bpf_link support for BPF_NETFILTER programs
Add bpf_link support skeleton. To keep this reviewable, no bpf program can be invoked yet, if a program is attached only a c-stub is called and
bpf: add bpf_link support for BPF_NETFILTER programs
Add bpf_link support skeleton. To keep this reviewable, no bpf program can be invoked yet, if a program is attached only a c-stub is called and not the actual bpf program.
Defaults to 'y' if both netfilter and bpf syscall are enabled in kconfig.
Uapi example usage: union bpf_attr attr = { };
attr.link_create.prog_fd = progfd; attr.link_create.attach_type = 0; /* unused */ attr.link_create.netfilter.pf = PF_INET; attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN; attr.link_create.netfilter.priority = -128;
err = bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
... this would attach progfd to ipv4:input hook.
Such hook gets removed automatically if the calling program exits.
BPF_NETFILTER program invocation is added in followup change.
NF_HOOK_OP_BPF enum will eventually be read from nfnetlink_hook, it allows to tell userspace which program is attached at the given hook when user runs 'nft hook list' command rather than just the priority and not-very-helpful 'this hook runs a bpf prog but I can't tell which one'.
Will also be used to disallow registration of two bpf programs with same priority in a followup patch.
v4: arm32 cmpxchg only supports 32bit operand s/prio/priority/ v3: restrict prog attachment to ip/ip6 for now, lets lift restrictions if more use cases pop up (arptables, ebtables, netdev ingress/egress etc).
Signed-off-by: Florian Westphal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8 |
|
| #
c0c3ab63 |
| 07-Feb-2023 |
Xin Long <[email protected]> |
net: create nf_conntrack_ovs for ovs and tc use
Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc"), this patch is to create nf_conntrack
net: create nf_conntrack_ovs for ovs and tc use
Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc"), this patch is to create nf_conntrack_ovs to get these functions shared by OVS and TC only.
There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper in this patch, and will be more in the following patches.
Signed-off-by: Xin Long <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Aaron Conole <[email protected]> Acked-by: Florian Westphal <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3 |
|
| #
d9e78914 |
| 03-Jan-2023 |
Florian Westphal <[email protected]> |
netfilter: nf_tables: avoid retpoline overhead for some ct expression calls
nft_ct expression cannot be made builtin to nf_tables without also forcing the conntrack itself to be builtin.
However, t
netfilter: nf_tables: avoid retpoline overhead for some ct expression calls
nft_ct expression cannot be made builtin to nf_tables without also forcing the conntrack itself to be builtin.
However, this can be avoided by splitting retrieval of a few selector keys that only need to access the nf_conn structure, i.e. no function calls to nf_conntrack code.
Many rulesets start with something like "ct status established,related accept"
With this change, this no longer requires an indirect call, which gives about 1.8% more throughput with a simple conntrack-enabled forwarding test (retpoline thunk used).
Signed-off-by: Florian Westphal <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc2, v6.2-rc1, v6.1 |
|
| #
ebddb140 |
| 08-Dec-2022 |
Xin Long <[email protected]> |
net: move the nat function to nf_nat_ovs for ovs and tc
There are two nat functions are nearly the same in both OVS and TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
This patch c
net: move the nat function to nf_nat_ovs for ovs and tc
There are two nat functions are nearly the same in both OVS and TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
This patch creates nf_nat_ovs.c under netfilter and moves them there then exports nf_ct_nat() so that it can be shared by both OVS and TC, and keeps the nat (type) check and nat flag update in OVS and TC's own place, as these parts are different between OVS and TC.
Note that in OVS nat function it was using skb->protocol to get the proto as it already skips vlans in key_extract(), while it doesn't in TC, and TC has to call skb_protocol() to get proto. So in nf_ct_nat_execute(), we keep using skb_protocol() which works for both OVS and TC contrack.
Signed-off-by: Xin Long <[email protected]> Acked-by: Aaron Conole <[email protected]> Acked-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3 |
|
| #
3a07327d |
| 25-Oct-2022 |
Pablo Neira Ayuso <[email protected]> |
netfilter: nft_inner: support for inner tunnel header matching
This new expression allows you to match on the inner headers that are encapsulated by any of the existing tunneling protocols.
This ex
netfilter: nft_inner: support for inner tunnel header matching
This new expression allows you to match on the inner headers that are encapsulated by any of the existing tunneling protocols.
This expression parses the inner packet to set the link, network and transport offsets, so the existing expressions (with a few updates) can be reused to match on the inner headers.
The inner expression supports for different tunnel combinations such as:
- ethernet frame over IPv4/IPv6 packet, eg. VxLAN. - IPv4/IPv6 packet over IPv4/IPv6 packet, eg. IPIP. - IPv4/IPv6 packet over IPv4/IPv6 + transport header, eg. GRE. - transport header (ESP or SCTP) over transport header (usually UDP)
The following fields are used to describe the tunnel protocol:
- flags, which describe how to parse the inner headers:
NFT_PAYLOAD_CTX_INNER_TUN, the tunnel provides its own header. NFT_PAYLOAD_CTX_INNER_ETHER, the ethernet frame is available as inner header. NFT_PAYLOAD_CTX_INNER_NH, the network header is available as inner header. NFT_PAYLOAD_CTX_INNER_TH, the transport header is available as inner header.
For example, VxLAN sets on all of these flags. While GRE only sets on NFT_PAYLOAD_CTX_INNER_NH and NFT_PAYLOAD_CTX_INNER_TH. Then, ESP over UDP only sets on NFT_PAYLOAD_CTX_INNER_TH.
The tunnel description is composed of the following attributes:
- header size: in case the tunnel comes with its own header, eg. VxLAN.
- type: this provides a hint to userspace on how to delinearize the rule. This is useful for VxLAN and Geneve since they run over UDP, since transport does not provide a hint. This is also useful in case hardware offload is ever supported. The type is not currently interpreted by the kernel.
- expression: currently only payload supported. Follow up patch adds also inner meta support which is required by autogenerated dependencies. The exthdr expression should be supported too at some point. There is a new inner_ops operation that needs to be set on to allow to use an existing expression from the inner expression.
This patch adds a new NFT_PAYLOAD_TUN_HEADER base which allows to match on the tunnel header fields, eg. vxlan vni.
The payload expression is embedded into nft_inner private area and this private data area is passed to the payload inner eval function via direct call.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc2 |
|
| #
d037abc2 |
| 21-Oct-2022 |
Florian Westphal <[email protected]> |
netfilter: nft_objref: make it builtin
nft_objref is needed to reference named objects, it makes no sense to disable it.
Before: text data bss dec filename 4014 424 0
netfilter: nft_objref: make it builtin
nft_objref is needed to reference named objects, it makes no sense to disable it.
Before: text data bss dec filename 4014 424 0 4438 nft_objref.o 4174 1128 0 5302 nft_objref.ko 359351 15276 864 375491 nf_tables.ko After: text data bss dec filename 3815 408 0 4223 nft_objref.o 363161 15692 864 379717 nf_tables.ko
Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc1, v6.0 |
|
| #
820dc052 |
| 29-Sep-2022 |
Lorenzo Bianconi <[email protected]> |
net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c
Remove circular dependency between nf_nat module and nf_conntrack one moving bpf_ct_set_nat_info kfunc in nf_nat_bpf.c
Fixes: 0fabd2aa
net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c
Remove circular dependency between nf_nat module and nf_conntrack one moving bpf_ct_set_nat_info kfunc in nf_nat_bpf.c
Fixes: 0fabd2aa199f ("net: netfilter: add bpf_ct_set_nat_info kfunc helper") Suggested-by: Kumar Kartikeya Dwivedi <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Tested-by: Yauheni Kaliuta <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/r/51a65513d2cda3eeb0754842e8025ab3966068d8.1664490511.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <[email protected]>
show more ...
|
|
Revision tags: v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3 |
|
| #
b0381776 |
| 15-Jun-2022 |
Vlad Buslov <[email protected]> |
netfilter: nf_flow_table: count pending offload workqueue tasks
To improve hardware offload debuggability count pending 'add', 'del' and 'stats' flow_table offload workqueue tasks. Counters are incr
netfilter: nf_flow_table: count pending offload workqueue tasks
To improve hardware offload debuggability count pending 'add', 'del' and 'stats' flow_table offload workqueue tasks. Counters are incremented before scheduling new task and decremented when workqueue handler finishes executing. These counters allow user to diagnose congestion on hardware offload workqueues that can happen when either CPU is starved and workqueue jobs are executed at lower rate than new ones are added or when hardware/driver can't keep up with the rate.
Implement the described counters as percpu counters inside new struct netns_ft which is stored inside struct net. Expose them via new procfs file '/proc/net/stats/nf_flowtable' that is similar to existing 'nf_conntrack' file.
Signed-off-by: Vlad Buslov <[email protected]> Signed-off-by: Oz Shlomo <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
|
Revision tags: v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1 |
|
| #
b4c2b959 |
| 14-Jan-2022 |
Kumar Kartikeya Dwivedi <[email protected]> |
net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF
This change adds conntrack lookup helpers using the unstable kfunc call interface for the XDP and TC-BPF hooks. The primary usecase i
net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF
This change adds conntrack lookup helpers using the unstable kfunc call interface for the XDP and TC-BPF hooks. The primary usecase is implementing a synproxy in XDP, see Maxim's patchset [0].
Export get_net_ns_by_id as nf_conntrack_bpf.c needs to call it.
This object is only built when CONFIG_DEBUG_INFO_BTF_MODULES is enabled.
[0]: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
show more ...
|
|
Revision tags: v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6 |
|
| #
023223df |
| 17-Dec-2021 |
Pablo Neira Ayuso <[email protected]> |
netfilter: nf_tables: make counter support built-in
Make counter support built-in to allow for direct call in case of CONFIG_RETPOLINE.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Revision tags: v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7 |
|
| #
7a3f5b0d |
| 17-Aug-2021 |
Ryoga Saito <[email protected]> |
netfilter: add netfilter hooks to SRv6 data plane
This patch introduces netfilter hooks for solving the problem that conntrack couldn't record both inner flows and outer flows.
This patch also intr
netfilter: add netfilter hooks to SRv6 data plane
This patch introduces netfilter hooks for solving the problem that conntrack couldn't record both inner flows and outer flows.
This patch also introduces a new sysctl toggle for enabling lightweight tunnel netfilter hooks.
Signed-off-by: Ryoga Saito <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
|
Revision tags: v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7 |
|
| #
836382dc |
| 16-Jun-2021 |
Pablo Neira Ayuso <[email protected]> |
netfilter: nf_tables: add last expression
Add a new optional expression that tells you when last matching on a given rule / set element element has happened.
Signed-off-by: Pablo Neira Ayuso <pablo
netfilter: nf_tables: add last expression
Add a new optional expression that tells you when last matching on a given rule / set element element has happened.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
|
Revision tags: v5.13-rc6, v5.13-rc5 |
|
| #
e2cf17d3 |
| 04-Jun-2021 |
Florian Westphal <[email protected]> |
netfilter: add new hook nfnl subsystem
This nfnl subsystem allows to dump the list of all active netfiler hooks, e.g. defrag, conntrack, nf/ip/arp/ip6tables and so on.
This helps to see what kind o
netfilter: add new hook nfnl subsystem
This nfnl subsystem allows to dump the list of all active netfiler hooks, e.g. defrag, conntrack, nf/ip/arp/ip6tables and so on.
This helps to see what kind of features are currently enabled in the network stack.
Sample output from nft tool using this infra:
$ nft list hook ip input family ip hook input { +0000000010 nft_do_chain_inet [nf_tables] # nft table firewalld INPUT +0000000100 nf_nat_ipv4_local_in [nf_nat] +2147483647 ipv4_confirm [nf_conntrack] }
Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
|
Revision tags: v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5 |
|
| #
e465cccd |
| 25-Mar-2021 |
Florian Westphal <[email protected]> |
netfilter: nf_log_common: merge with nf_log_syslog
Remove nf_log_common. Now that all per-af modules have been merged there is no longer a need to provide a helper module.
Signed-off-by: Florian W
netfilter: nf_log_common: merge with nf_log_syslog
Remove nf_log_common. Now that all per-af modules have been merged there is no longer a need to provide a helper module.
Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
| #
1510618e |
| 25-Mar-2021 |
Florian Westphal <[email protected]> |
netfilter: nf_log_netdev: merge with nf_log_syslog
Provide netdev family support from the nf_log_syslog module.
Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <pabl
netfilter: nf_log_netdev: merge with nf_log_syslog
Provide netdev family support from the nf_log_syslog module.
Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
| #
db3187ae |
| 25-Mar-2021 |
Florian Westphal <[email protected]> |
netfilter: nf_log_ipv4: rename to nf_log_syslog
Netfilter has multiple log modules: nf_log_arp nf_log_bridge nf_log_ipv4 nf_log_ipv6 nf_log_netdev nfnetlink_log nf_log_common
With the except
netfilter: nf_log_ipv4: rename to nf_log_syslog
Netfilter has multiple log modules: nf_log_arp nf_log_bridge nf_log_ipv4 nf_log_ipv6 nf_log_netdev nfnetlink_log nf_log_common
With the exception of nfnetlink_log (packet is sent to userspace for dissection/logging), all of them log to the kernel ringbuffer.
This is the first part of a series to merge all modules except nfnetlink_log into a single module: nf_log_syslog.
This allows to reduce code. After the series, only two log modules remain: nfnetlink_log and nf_log_syslog. The latter provides the same functionality as the old per-af log modules.
This renames nf_log_ipv4 to nf_log_syslog.
Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
|
Revision tags: v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1 |
|
| #
6bbb9ad3 |
| 22-Oct-2020 |
Jose M. Guisado Gomez <[email protected]> |
netfilter: nft_reject: add reject verdict support for netdev
Adds support for reject from ingress hook in netdev family. Both stacks ipv4 and ipv6. With reject packets supporting ICMP and TCP RST.
netfilter: nft_reject: add reject verdict support for netdev
Adds support for reject from ingress hook in netdev family. Both stacks ipv4 and ipv6. With reject packets supporting ICMP and TCP RST.
This ability is required in devices that need to REJECT legitimate clients which traffic is forwarded from the ingress hook.
Joint work with Laura Garcia.
Signed-off-by: Jose M. Guisado Gomez <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
|
Revision tags: v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6 |
|
| #
e6abef61 |
| 26-Mar-2020 |
Jason A. Donenfeld <[email protected]> |
x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2
Now that the kernel specifies binutils 2.23 as the minimum version, we can remove ifdefs for AVX2 and ADX throughout.
Signed-off-
x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2
Now that the kernel specifies binutils 2.23 as the minimum version, we can remove ifdefs for AVX2 and ADX throughout.
Signed-off-by: Jason A. Donenfeld <[email protected]> Acked-by: Ingo Molnar <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
show more ...
|
| #
5e8ebd84 |
| 26-Mar-2020 |
Jason A. Donenfeld <[email protected]> |
x86: probe assembler capabilities via kconfig instead of makefile
Doing this probing inside of the Makefiles means we have a maze of ifdefs inside the source code and child Makefiles that need to ma
x86: probe assembler capabilities via kconfig instead of makefile
Doing this probing inside of the Makefiles means we have a maze of ifdefs inside the source code and child Makefiles that need to make proper decisions on this too. Instead, we do it at Kconfig time, like many other compiler and assembler options, which allows us to set up the dependencies normally for full compilation units. In the process, the ADX test changes to use %eax instead of %r10 so that it's valid in both 32-bit and 64-bit mode.
Signed-off-by: Jason A. Donenfeld <[email protected]> Acked-by: Ingo Molnar <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
show more ...
|
|
Revision tags: v5.6-rc7, v5.6-rc6, v5.6-rc5 |
|
| #
7400b063 |
| 07-Mar-2020 |
Stefano Brivio <[email protected]> |
nft_set_pipapo: Introduce AVX2-based lookup implementation
If the AVX2 set is available, we can exploit the repetitive characteristic of this algorithm to provide a fast, vectorised version by using
nft_set_pipapo: Introduce AVX2-based lookup implementation
If the AVX2 set is available, we can exploit the repetitive characteristic of this algorithm to provide a fast, vectorised version by using 256-bit wide AVX2 operations for bucket loads and bitwise intersections.
In most cases, this implementation consistently outperforms rbtree set instances despite the fact they are configured to use a given, single, ranged data type out of the ones used for performance measurements by the nft_concat_range.sh kselftest.
That script, injecting packets directly on the ingoing device path with pktgen, reports, averaged over five runs on a single AMD Epyc 7402 thread (3.35GHz, 768 KiB L1D$, 12 MiB L2$), the figures below. CONFIG_RETPOLINE was not set here.
Note that this is not a fair comparison over hash and rbtree set types: non-ranged entries (used to have a reference for hash types) would be matched faster than this, and matching on a single field only (which is the case for rbtree) is also significantly faster.
However, it's not possible at the moment to choose this set type for non-ranged entries, and the current implementation also needs a few minor adjustments in order to match on less than two fields.
---------------.-----------------------------------.------------. AMD Epyc 7402 | baselines, Mpps | this patch | 1 thread |___________________________________|____________| 3.35GHz | | | | | | 768KiB L1D$ | netdev | hash | rbtree | | | ---------------| hook | no | single | | pipapo | type entries | drop | ranges | field | pipapo | AVX2 | ---------------|--------|--------|--------|--------|------------| net,port | | | | | | 1000 | 19.0 | 10.4 | 3.8 | 4.0 | 7.5 +87% | ---------------|--------|--------|--------|--------|------------| port,net | | | | | | 100 | 18.8 | 10.3 | 5.8 | 6.3 | 8.1 +29% | ---------------|--------|--------|--------|--------|------------| net6,port | | | | | | 1000 | 16.4 | 7.6 | 1.8 | 2.1 | 4.8 +128% | ---------------|--------|--------|--------|--------|------------| port,proto | | | | | | 30000 | 19.6 | 11.6 | 3.9 | 0.5 | 2.6 +420% | ---------------|--------|--------|--------|--------|------------| net6,port,mac | | | | | | 10 | 16.5 | 5.4 | 4.3 | 3.4 | 4.7 +38% | ---------------|--------|--------|--------|--------|------------| net6,port,mac, | | | | | | proto 1000 | 16.5 | 5.7 | 1.9 | 1.4 | 3.6 +26% | ---------------|--------|--------|--------|--------|------------| net,mac | | | | | | 1000 | 19.0 | 8.4 | 3.9 | 2.5 | 6.4 +156% | ---------------'--------'--------'--------'--------'------------'
A similar strategy could be easily reused to implement specialised versions for other SIMD sets, and I plan to post at least a NEON version at a later time.
Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
|
Revision tags: v5.6-rc4, v5.6-rc3 |
|
| #
e32a4dc6 |
| 18-Feb-2020 |
Florian Westphal <[email protected]> |
netfilter: nf_tables: make sets built-in
Placing nftables set support in an extra module is pointless:
1. nf_tables needs dynamic registeration interface for sake of one module 2. nft heavily relie
netfilter: nf_tables: make sets built-in
Placing nftables set support in an extra module is pointless:
1. nf_tables needs dynamic registeration interface for sake of one module 2. nft heavily relies on sets, e.g. even simple rule like "nft ... tcp dport { 80, 443 }" will not work with _SETS=n.
IOW, either nftables isn't used or both nf_tables and nf_tables_set modules are needed anyway.
With extra module: 307K net/netfilter/nf_tables.ko 79K net/netfilter/nf_tables_set.ko
text data bss dec filename 146416 3072 545 150033 nf_tables.ko 35496 1817 0 37313 nf_tables_set.ko
This patch: 373K net/netfilter/nf_tables.ko
178563 4049 545 183157 nf_tables.ko
Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
|
Revision tags: v5.6-rc2, v5.6-rc1, v5.5 |
|
| #
3c4287f6 |
| 21-Jan-2020 |
Stefano Brivio <[email protected]> |
nf_tables: Add set type for arbitrary concatenation of ranges
This new set type allows for intervals in concatenated fields, which are expressed in the usual way, that is, simple byte concatenation
nf_tables: Add set type for arbitrary concatenation of ranges
This new set type allows for intervals in concatenated fields, which are expressed in the usual way, that is, simple byte concatenation with padding to 32 bits for single fields, and given as ranges by specifying start and end elements containing, each, the full concatenation of start and end values for the single fields.
Ranges are expanded to composing netmasks, for each field: these are inserted as rules in per-field lookup tables. Bits to be classified are divided in 4-bit groups, and for each group, the lookup table contains 4^2 buckets, representing all the possible values of a bit group. This approach was inspired by the Grouper algorithm: http://www.cse.usf.edu/~ligatti/projects/grouper/
Matching is performed by a sequence of AND operations between bucket values, with buckets selected according to the value of packet bits, for each group. The result of this sequence tells us which rules matched for a given field.
In order to concatenate several ranged fields, per-field rules are mapped using mapping arrays, one per field, that specify which rules should be considered while matching the next field. The mapping array for the last field contains a reference to the element originally inserted.
The notes in nft_set_pipapo.c cover the algorithm in deeper detail.
A pure hash-based approach is of no use here, as ranges need to be classified. An implementation based on "proxying" the existing red-black tree set type, creating a tree for each field, was considered, but deemed impractical due to the fact that elements would need to be shared between trees, at least as long as we want to keep UAPI changes to a minimum.
A stand-alone implementation of this algorithm is available at: https://pipapo.lameexcu.se together with notes about possible future optimisations (in pipapo.c).
This algorithm was designed with data locality in mind, and can be highly optimised for SIMD instruction sets, as the bulk of the matching work is done with repetitive, simple bitwise operations.
At this point, without further optimisations, nft_concat_range.sh reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB L2$):
TEST: performance net,port [ OK ] baseline (drop from netdev hook): 10190076pps baseline hash (non-ranged entries): 6179564pps baseline rbtree (match on first field only): 2950341pps set with 1000 full, ranged entries: 2304165pps port,net [ OK ] baseline (drop from netdev hook): 10143615pps baseline hash (non-ranged entries): 6135776pps baseline rbtree (match on first field only): 4311934pps set with 100 full, ranged entries: 4131471pps net6,port [ OK ] baseline (drop from netdev hook): 9730404pps baseline hash (non-ranged entries): 4809557pps baseline rbtree (match on first field only): 1501699pps set with 1000 full, ranged entries: 1092557pps port,proto [ OK ] baseline (drop from netdev hook): 10812426pps baseline hash (non-ranged entries): 6929353pps baseline rbtree (match on first field only): 3027105pps set with 30000 full, ranged entries: 284147pps net6,port,mac [ OK ] baseline (drop from netdev hook): 9660114pps baseline hash (non-ranged entries): 3778877pps baseline rbtree (match on first field only): 3179379pps set with 10 full, ranged entries: 2082880pps net6,port,mac,proto [ OK ] baseline (drop from netdev hook): 9718324pps baseline hash (non-ranged entries): 3799021pps baseline rbtree (match on first field only): 1506689pps set with 1000 full, ranged entries: 783810pps net,mac [ OK ] baseline (drop from netdev hook): 10190029pps baseline hash (non-ranged entries): 5172218pps baseline rbtree (match on first field only): 2946863pps set with 1000 full, ranged entries: 1279122pps
v4: - fix build for 32-bit architectures: 64-bit division needs div_u64() (kbuild test robot <[email protected]>) v3: - rework interface for field length specification, NFT_SET_SUBKEY disappears and information is stored in description - remove scratch area to store closing element of ranges, as elements now come with an actual attribute to specify the upper range limit (Pablo Neira Ayuso) - also remove pointer to 'start' element from mapping table, closing key is now accessible via extension data - use bytes right away instead of bits for field lengths, this way we can also double the inner loop of the lookup function to take care of upper and lower bits in a single iteration (minor performance improvement) - make it clearer that set operations are actually atomic API-wise, but we can't e.g. implement flush() as one-shot action - fix type for 'dup' in nft_pipapo_insert(), check for duplicates only in the next generation, and in general take care of differentiating generation mask cases depending on the operation (Pablo Neira Ayuso) - report C implementation matching rate in commit message, so that AVX2 implementation can be compared (Pablo Neira Ayuso) v2: - protect access to scratch maps in nft_pipapo_lookup() with local_bh_disable/enable() (Florian Westphal) - drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's already implied (Florian Westphal) - explain why partial allocation failures don't need handling in pipapo_realloc_scratch(), rename 'm' to clone and update related kerneldoc to make it clear we're not operating on the live copy (Florian Westphal) - add expicit check for priv->start_elem in nft_pipapo_insert() to avoid ending up in nft_pipapo_walk() with a NULL start element, and also zero it out in every operation that might make it invalid, so that insertion doesn't proceed with an invalid element (Florian Westphal)
Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|