|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
0bfcb7b7 |
| 07-Oct-2024 |
Florian Westphal <[email protected]> |
netfilter: xtables: avoid NFPROTO_UNSPEC where needed
syzbot managed to call xt_cluster match via ebtables:
WARNING: CPU: 0 PID: 11 at net/netfilter/xt_cluster.c:72 xt_cluster_mt+0x196/0x780 [..] ebt_do_table+0x174b/0x2a40
The module registers to NFPROTO_UNSPEC, but it assumes ipv4/ipv6 packet processing. As this is only useful to restrict locally terminating TCP/UDP traffic, register it for the ipv4 and ipv6 families only.
Pablo points out that this is a general issue: direct users of the set/getsockopt interface can call into targets/matches that were only intended for use with ip(6)tables.
Check all UNSPEC matches and targets for similar issues:
- matches and targets are fine except if they assume skb_network_header() is valid -- this is only true when called from the inet layer: the ip(6) stack pulls the ip/ipv6 header into the linear data area.
- targets that return XT_CONTINUE or other xtables verdicts must be restricted too; they are incompatible with the ebtables traverser, e.g. EBT_CONTINUE is a completely different value than XT_CONTINUE.
Most matches/targets are changed to register for NFPROTO_IPV4/IPV6, as they are provided for use by ip(6)tables.
The MARK target is also used by arptables, so register for NFPROTO_ARP too.
While at it, bail out if connbytes fails to enable the corresponding conntrack family.
This change passes the selftests in iptables.git.
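A minimal sketch of the resulting registration shape, with one xt_match slot per family instead of a single NFPROTO_UNSPEC entry (field values illustrative, not copied from the patch):

    static struct xt_match xt_cluster_match[] __read_mostly = {
        {
            .name       = "cluster",
            .family     = NFPROTO_IPV4,    /* was NFPROTO_UNSPEC */
            .match      = xt_cluster_mt,
            .checkentry = xt_cluster_mt_checkentry,
            .matchsize  = sizeof(struct xt_cluster_match_info),
            .me         = THIS_MODULE,
        },
        {
            .name       = "cluster",
            .family     = NFPROTO_IPV6,
            .match      = xt_cluster_mt,
            .checkentry = xt_cluster_mt_checkentry,
            .matchsize  = sizeof(struct xt_cluster_match_info),
            .me         = THIS_MODULE,
        },
    };

    /* registered via xt_register_matches(xt_cluster_match, ARRAY_SIZE(xt_cluster_match)) */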
Reported-by: [email protected]
Closes: https://lore.kernel.org/netfilter-devel/[email protected]/
Fixes: 0269ea493734 ("netfilter: xtables: add cluster match")
Signed-off-by: Florian Westphal <[email protected]>
Co-developed-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2 |
|
| #
8556bceb |
| 18-Aug-2022 |
Wolfram Sang <[email protected]> |
netfilter: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script.
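The conversion at each site is a one-token change; since the return value is unused, behavior is identical. An illustrative hunk:

    -	strlcpy(est->name, info->name, sizeof(est->name));
    +	strscpy(est->name, info->name, sizeof(est->name));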
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
|
|
Revision tags: v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6 |
|
| #
50dc9a85 |
| 16-Oct-2021 |
Ahmed S. Darwish <[email protected]> |
net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types
The only factor differentiating per-CPU bstats data type (struct gnet_stats_basic_cpu) from the packed non-per-CPU one (struct gnet_stats_basic_packed) was a u64_stats sync point inside the former. The two data types are now equivalent: earlier commits added a u64_stats sync point to the latter.
Combine both data types into "struct gnet_stats_basic_sync". This eliminates redundancy and simplifies the bstats read/write APIs.
Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit architectures, u64_stats sync points do not use sequence counter protection.
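A sketch of the merged type, consistent with the description above (u64_stats_t counters plus an embedded sync point):

    struct gnet_stats_basic_sync {
        u64_stats_t bytes;
        u64_stats_t packets;
        struct u64_stats_sync syncp;
    } __aligned(2 * sizeof(u64));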
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
| #
67c9e627 |
| 16-Oct-2021 |
Ahmed S. Darwish <[email protected]> |
net: sched: Protect Qdisc::bstats with u64_stats
The not-per-CPU variant of qdisc tc (traffic control) statistics, Qdisc::gnet_stats_basic_packed bstats, is protected with Qdisc::running sequence counter.
This sequence counter is used for reliably protecting bstats reads from parallel writes. Meanwhile, the seqcount's write section covers a much wider area than bstats update: qdisc_run_begin() => qdisc_run_end().
That read/write section asymmetry can lead to needless retries of the read section. To prepare for removing the Qdisc::running sequence counter altogether, introduce a u64_stats sync point inside bstats instead.
Modify _bstats_update() to start/end the bstats u64_stats write section.
For bisectability, and finer commits granularity, the bstats read section is still protected with a Qdisc::running read/retry loop and qdisc_run_begin/end() still starts/ends that seqcount write section. Once all call sites are modified to use _bstats_update(), the Qdisc::running seqcount will be removed and bstats read/retry loop will be modified to utilize the internal u64_stats sync point.
Note, using u64_stats implies no sequence counter protection for 64-bit architectures. This can lead to the statistics "packets" vs. "bytes" values getting out of sync on rare occasions. The individual values will still be valid.
[bigeasy: Minor commit message edits, init all gnet_stats_basic_packed.]
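A minimal sketch of the _bstats_update() shape described above; at this stage the counters are still plain fields of struct gnet_stats_basic_packed:

    static inline void _bstats_update(struct gnet_stats_basic_packed *bstats,
                                      __u64 bytes, __u32 packets)
    {
        u64_stats_update_begin(&bstats->syncp);
        bstats->bytes += bytes;
        bstats->packets += packets;
        u64_stats_update_end(&bstats->syncp);
    }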
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1 |
|
| #
6cb56218 |
| 22-Dec-2020 |
Florian Westphal <[email protected]> |
netfilter: xt_RATEEST: reject non-null terminated string from userspace
syzbot reports: detected buffer overflow in strlen [..] Call Trace: strlen include/linux/string.h:325 [inline] strlcpy include/linux/string.h:348 [inline] xt_rateest_tg_checkentry+0x2a5/0x6b0 net/netfilter/xt_RATEEST.c:143
strlcpy assumes src is a C string. Check info->name before it is used.
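One way to implement the check, sketched on the assumption that the destination buffer is est->name (the exact condition in the patch may differ):

    /* info->name must be NUL-terminated within the destination size */
    if (strnlen(info->name, sizeof(est->name)) >= sizeof(est->name))
        return -ENAMETOOLONG;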
Reported-by: [email protected]
Fixes: 5859034d7eb8793 ("[NETFILTER]: x_tables: add RATEEST target")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Revision tags: v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2 |
|
| #
c593642c |
| 09-Dec-2019 |
Pankaj Bharadiya <[email protected]> |
treewide: Use sizeof_field() macro
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF().
This patch is generated using following script:
EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do
    if [[ "$file" =~ $EXCLUDE_FILES ]]; then
        continue
    fi
    sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done
Signed-off-by: Pankaj Bharadiya <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Co-developed-by: Kees Cook <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Acked-by: David Miller <[email protected]> # for net
|
|
Revision tags: v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4 |
|
| #
d2912cb1 |
| 04-Jun-2019 |
Thomas Gleixner <[email protected]> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Enrico Weigelt <[email protected]>
Reviewed-by: Kate Stewart <[email protected]>
Reviewed-by: Allison Randal <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Revision tags: v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19 |
|
| #
0fbcc5b5 |
| 18-Oct-2018 |
Taehee Yoo <[email protected]> |
netfilter: xt_RATEEST: remove netns exit routine
xt_rateest_net_exit() was added to check whether rules are flushed successfully. But the ->net_exit() callback is called earlier than the ->destroy() callback, so ->net_exit() can't perform that check.
test commands:
   %ip netns add vm1
   %ip netns exec vm1 iptables -t mangle -I PREROUTING -p udp \
	--dport 1111 -j RATEEST --rateest-name ap \
	--rateest-interval 250ms --rateest-ewma 0.5s
   %ip netns del vm1
splat looks like:
[ 668.813518] WARNING: CPU: 0 PID: 87 at net/netfilter/xt_RATEEST.c:210 xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
[ 668.813518] Modules linked in: xt_RATEEST xt_tcpudp iptable_mangle bpfilter ip_tables x_tables
[ 668.813518] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc7+ #21
[ 668.813518] Workqueue: netns cleanup_net
[ 668.813518] RIP: 0010:xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
[ 668.813518] Code: 00 48 8b 85 30 ff ff ff 4c 8b 23 80 38 00 0f 85 24 01 00 00 48 8b 85 30 ff ff ff 4d 85 e4 4c 89 a5 58 ff ff ff c6 00 f8 74 b2 <0f> 0b 48 83 c3 08 4c 39 f3 75 b0 48 b8 00 00 00 00 00 fc ff df 49
[ 668.813518] RSP: 0018:ffff8801156c73f8 EFLAGS: 00010282
[ 668.813518] RAX: ffffed0022ad8e85 RBX: ffff880118928e98 RCX: 5db8012a00000000
[ 668.813518] RDX: ffff8801156c7428 RSI: 00000000cb1d185f RDI: ffff880115663b74
[ 668.813518] RBP: ffff8801156c74d0 R08: ffff8801156633c0 R09: 1ffff100236440be
[ 668.813518] R10: 0000000000000001 R11: ffffed002367d852 R12: ffff880115142b08
[ 668.813518] R13: 1ffff10022ad8e81 R14: ffff880118928ea8 R15: dffffc0000000000
[ 668.813518] FS: 0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
[ 668.813518] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 668.813518] CR2: 0000563aa69f4f28 CR3: 0000000105a16000 CR4: 00000000001006f0
[ 668.813518] Call Trace:
[ 668.813518]  ? unregister_netdevice_many+0xe0/0xe0
[ 668.813518]  ? xt_rateest_net_init+0x2c0/0x2c0 [xt_RATEEST]
[ 668.813518]  ? default_device_exit+0x1ca/0x270
[ 668.813518]  ? remove_proc_entry+0x1cd/0x390
[ 668.813518]  ? dev_change_net_namespace+0xd00/0xd00
[ 668.813518]  ? __init_waitqueue_head+0x130/0x130
[ 668.813518]  ops_exit_list.isra.10+0x94/0x140
[ 668.813518]  cleanup_net+0x45b/0x900
[ 668.813518]  ? net_drop_ns+0x110/0x110
[ 668.813518]  ? swapgs_restore_regs_and_return_to_usermode+0x3c/0x80
[ 668.813518]  ? save_trace+0x300/0x300
[ 668.813518]  ? lock_acquire+0x196/0x470
[ 668.813518]  ? lock_acquire+0x196/0x470
[ 668.813518]  ? process_one_work+0xb60/0x1de0
[ 668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
[ 668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
[ 668.813518]  ? __lock_acquire+0x4500/0x4500
[ 668.813518]  ? __lock_is_held+0xb4/0x140
[ 668.813518]  process_one_work+0xc13/0x1de0
[ 668.813518]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[ 668.813518]  ? set_load_weight+0x270/0x270
[ ... ]
Fixes: 3427b2ab63fa ("netfilter: make xt_rateest hash table per net")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Revision tags: v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1, v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4 |
|
| #
3427b2ab |
| 02-Mar-2018 |
Cong Wang <[email protected]> |
netfilter: make xt_rateest hash table per net
As suggested by Eric, we need to make the xt_rateest hash table and its lock per netns to reduce lock contention.
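The usual shape of such a conversion is per-netns state reached via net_generic(); the struct and field names below are a sketch, not the exact ones from the patch:

    struct xt_rateest_net {
        struct mutex hash_lock;
        struct hlist_head hash_table[RATEEST_HSIZE];
    };

    static unsigned int xt_rateest_id;

    static struct pernet_operations xt_rateest_net_ops = {
        .id   = &xt_rateest_id,
        .size = sizeof(struct xt_rateest_net),
    };
    /* registered with register_pernet_subsys(&xt_rateest_net_ops) */

    /* in lookup/insert paths: */
    struct xt_rateest_net *xn = net_generic(net, xt_rateest_id);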
Cc: Florian Westphal <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Revision tags: v4.16-rc3, v4.16-rc2, v4.16-rc1 |
|
| #
7dc68e98 |
| 05-Feb-2018 |
Cong Wang <[email protected]> |
netfilter: xt_RATEEST: acquire xt_rateest_mutex for hash insert
rateest_hash is supposed to be protected by xt_rateest_mutex, and, as suggested by Eric, lookup and insert should be atomic, so we should acquire the xt_rateest_mutex once for both.
So introduce a non-locking helper for internal use and keep the locking one for external.
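A sketch of the resulting control flow: one mutex acquisition covers both the lookup and the insert, with an unlocked helper for internal use (names illustrative):

    mutex_lock(&xt_rateest_mutex);
    est = __xt_rateest_lookup(info->name);  /* unlocked helper, mutex held */
    if (est) {
        est->refcnt++;                      /* reuse the existing estimator */
        mutex_unlock(&xt_rateest_mutex);
        return 0;
    }
    est = kzalloc(sizeof(*est), GFP_KERNEL);
    if (!est)
        goto err_unlock;
    /* ... initialize est ... */
    xt_rateest_hash_insert(est);            /* still under the same mutex */
    mutex_unlock(&xt_rateest_mutex);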
Reported-by: <[email protected]> Fixes: 5859034d7eb8 ("[NETFILTER]: x_tables: add RATEEST target") Signed-off-by: Cong Wang <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
show more ...
|
|
Revision tags: v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2, v4.15-rc1, v4.14, v4.14-rc8, v4.14-rc7, v4.14-rc6, v4.14-rc5, v4.14-rc4, v4.14-rc3, v4.14-rc2, v4.14-rc1, v4.13, v4.13-rc7, v4.13-rc6, v4.13-rc5, v4.13-rc4, v4.13-rc3, v4.13-rc2, v4.13-rc1, v4.12, v4.12-rc7, v4.12-rc6, v4.12-rc5, v4.12-rc4, v4.12-rc3, v4.12-rc2, v4.12-rc1, v4.11, v4.11-rc8, v4.11-rc7, v4.11-rc6, v4.11-rc5, v4.11-rc4, v4.11-rc3, v4.11-rc2, v4.11-rc1, v4.10, v4.10-rc8, v4.10-rc7, v4.10-rc6, v4.10-rc5, v4.10-rc4, v4.10-rc3 |
|
| #
ec231890 |
| 02-Jan-2017 |
Willem de Bruijn <[email protected]> |
xtables: extend matches and targets with .usersize
In matches and targets that define a kernel-only tail to their xt_match and xt_target data structs, add a field .usersize that specifies up to where data is to be shared with userspace.
Performed a search for comment "Used internally by the kernel" to find relevant matches and targets. Manually inspected the structs to derive a valid offsetof.
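For this file the kernel-only tail is the struct xt_rateest pointer at the end of the target info, so the annotation plausibly reads:

    static struct xt_target xt_rateest_tg_reg __read_mostly = {
        .name       = "RATEEST",
        .targetsize = sizeof(struct xt_rateest_target_info),
        /* copy only the user-visible head back to userspace */
        .usersize   = offsetof(struct xt_rateest_target_info, est),
        .me         = THIS_MODULE,
    };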
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Revision tags: v4.10-rc2, v4.10-rc1, v4.9, v4.9-rc8 |
|
| #
1c0d32fd |
| 04-Dec-2016 |
Eric Dumazet <[email protected]> |
net_sched: gen_estimator: complete rewrite of rate estimators
1) Old code was hard to maintain, due to complex lock chains. (We probably will be able to remove some kfree_rcu() in callers)
2) Using a single timer to update all estimators does not scale.
3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity is not supposed to work well)
In this rewrite:
- I removed the RB tree that had to be scanned in gen_estimator_active(). qdisc dumps should be much faster.
- Each estimator has its own timer.
- Estimations are maintained in net_rate_estimator structure, instead of dirtying the qdisc. Minor, but part of the simplification.
- Reading the estimator uses RCU and a seqcount to provide proper support for 32bit kernels.
- We reduce memory need when estimators are not used, since we store a pointer, instead of the bytes/packets counters.
- xt_rateest_mt() no longer has to grab a spinlock. (In the future, xt_rateest_tg() could be switched to per cpu counters)
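The lockless read side this enables looks roughly like the following sketch (struct and field names assumed from the description above):

    struct net_rate_estimator *est;
    unsigned int seq;
    u64 bps, pps;

    rcu_read_lock();
    est = rcu_dereference(ptr->rate_est);      /* ptr: whoever owns the estimator */
    if (est) {
        do {
            seq = read_seqcount_begin(&est->seq);
            bps = est->avbps;                  /* bytes/sec, scaled */
            pps = est->avpps;                  /* packets/sec, scaled */
        } while (read_seqcount_retry(&est->seq, seq));
    }
    rcu_read_unlock();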
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v4.9-rc7, v4.9-rc6, v4.9-rc5, v4.9-rc4, v4.9-rc3, v4.9-rc2, v4.9-rc1, v4.8, v4.8-rc8, v4.8-rc7 |
|
| #
7bdc6624 |
| 18-Sep-2016 |
Gao Feng <[email protected]> |
netfilter: Enhance the codes used to get random once
There are several places in netfilter that fetch a random value once. We could use net_get_random_once to simplify this code.
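The resulting pattern replaces an open-coded "seed once" flag plus get_random_bytes() with a single helper call, roughly:

    static u32 jhash_rnd __read_mostly;

    static unsigned int xt_rateest_hash(const char *name)
    {
        /* seeds jhash_rnd exactly once, on first use */
        net_get_random_once(&jhash_rnd, sizeof(jhash_rnd));
        return jhash(name, strlen(name), jhash_rnd) & (RATEEST_HSIZE - 1);
    }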
Signed-off-by: Gao Feng <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Revision tags: v4.8-rc6, v4.8-rc5, v4.8-rc4, v4.8-rc3, v4.8-rc2, v4.8-rc1, v4.7, v4.7-rc7, v4.7-rc6, v4.7-rc5, v4.7-rc4, v4.7-rc3 |
|
| #
edb09eb1 |
| 06-Jun-2016 |
Eric Dumazet <[email protected]> |
net: sched: do not acquire qdisc spinlock in qdisc/class stats dump
Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host agent [1] are problematic at scale:
For each qdisc/class found in the dump, we currently lock the root qdisc spinlock in order to get stats. Sampling stats every 5 seconds from thousands of HTB classes is a challenge when the root qdisc spinlock is under high pressure. Not only do the dumps take time, they also slow down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.
An audit of existing qdiscs showed that sch_fq_codel is the only qdisc that might need the qdisc lock in fq_codel_dump_stats() and fq_codel_dump_class_stats().
In v2 of this patch, I now use the Qdisc running seqcount to provide consistent reads of packets/bytes counters, regardless of 32/64 bit arches.
I also changed rate estimators to use the same infrastructure so that they no longer need to lock root qdisc lock.
[1] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kevin Athey <[email protected]>
Cc: Xiaotian Pei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v4.7-rc2, v4.7-rc1, v4.6, v4.6-rc7, v4.6-rc6, v4.6-rc5, v4.6-rc4, v4.6-rc3, v4.6-rc2, v4.6-rc1, v4.5, v4.5-rc7, v4.5-rc6, v4.5-rc5, v4.5-rc4, v4.5-rc3, v4.5-rc2, v4.5-rc1, v4.4, v4.4-rc8, v4.4-rc7, v4.4-rc6, v4.4-rc5, v4.4-rc4, v4.4-rc3, v4.4-rc2, v4.4-rc1, v4.3, v4.3-rc7, v4.3-rc6, v4.3-rc5, v4.3-rc4, v4.3-rc3, v4.3-rc2, v4.3-rc1, v4.2, v4.2-rc8, v4.2-rc7, v4.2-rc6, v4.2-rc5, v4.2-rc4, v4.2-rc3, v4.2-rc2, v4.2-rc1, v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4, v4.1-rc3, v4.1-rc2, v4.1-rc1, v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5, v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1, v3.19, v3.19-rc7, v3.19-rc6, v3.19-rc5, v3.19-rc4, v3.19-rc3, v3.19-rc2, v3.19-rc1, v3.18, v3.18-rc7, v3.18-rc6, v3.18-rc5, v3.18-rc4, v3.18-rc3, v3.18-rc2, v3.18-rc1, v3.17, v3.17-rc7 |
|
| #
22e0f8b9 |
| 28-Sep-2014 |
John Fastabend <[email protected]> |
net: sched: make bstats per cpu and estimator RCU safe
In order to run qdiscs without locking, statistics and estimators need to be handled correctly.
To resolve bstats, make the statistics per cpu. And because this is only needed for qdiscs that run without locks, which is not the case for most qdiscs in the near future, only create percpu stats when a qdisc sets the TCQ_F_CPUSTATS flag.
Next, because estimators use the bstats to calculate packets per second and bytes per second, the estimator code paths are updated to use the per cpu statistics.
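An update path then picks the per-CPU counters only when the flag is set, along these lines (a sketch; helper names from the stats code, not necessarily this patch):

    if (qdisc_is_percpu_stats(sch)) {          /* TCQ_F_CPUSTATS is set */
        struct gnet_stats_basic_cpu *b = this_cpu_ptr(sch->cpu_bstats);

        u64_stats_update_begin(&b->syncp);
        b->bstats.bytes += qdisc_pkt_len(skb);
        b->bstats.packets++;
        u64_stats_update_end(&b->syncp);
    } else {
        qdisc_bstats_update(sch, skb);         /* plain counters, qdisc lock held */
    }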
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v3.17-rc6, v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2, v3.17-rc1, v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3, v3.16-rc2, v3.16-rc1, v3.15, v3.15-rc8, v3.15-rc7, v3.15-rc6, v3.15-rc5, v3.15-rc4, v3.15-rc3, v3.15-rc2, v3.15-rc1, v3.14, v3.14-rc8, v3.14-rc7, v3.14-rc6, v3.14-rc5, v3.14-rc4, v3.14-rc3, v3.14-rc2, v3.14-rc1, v3.13, v3.13-rc8 |
|
| #
2a50d805 |
| 06-Jan-2014 |
Pablo Neira Ayuso <[email protected]> |
Revert "netfilter: avoid get_random_bytes calls"
This reverts commit a42b99a6e329654d376b330de057eff87686d890.
Hannes Frederic Sowa reported some problems with this patch, more specifically that prandom_u32() may not be ready at boot time, see:
http://marc.info/?l=linux-netdev&m=138896532403533&w=2
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Revision tags: v3.13-rc7, v3.13-rc6, v3.13-rc5 |
|
| #
a42b99a6 |
| 19-Dec-2013 |
Florian Westphal <[email protected]> |
netfilter: avoid get_random_bytes calls
All these users need an initial seed value for jhash; prandom is perfectly fine. This avoids draining the entropy pool where it's not strictly required.
nfnetlink_log did not use the random value at all.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Revision tags: v3.13-rc4, v3.13-rc3, v3.13-rc2, v3.13-rc1, v3.12, v3.12-rc7, v3.12-rc6, v3.12-rc5, v3.12-rc4, v3.12-rc3, v3.12-rc2, v3.12-rc1, v3.11, v3.11-rc7, v3.11-rc6, v3.11-rc5, v3.11-rc4, v3.11-rc3, v3.11-rc2, v3.11-rc1, v3.10, v3.10-rc7, v3.10-rc6, v3.10-rc5, v3.10-rc4, v3.10-rc3, v3.10-rc2, v3.10-rc1, v3.9, v3.9-rc8, v3.9-rc7, v3.9-rc6, v3.9-rc5, v3.9-rc4, v3.9-rc3, v3.9-rc2, v3.9-rc1 |
|
| #
b67bfe0d |
| 28-Feb-2013 |
Sasha Levin <[email protected]> |
hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only do they not really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate.
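A typical call-site hunk from this conversion (illustrative):

    -	hlist_for_each_entry(est, pos, &rateest_hash[h], list)
    +	hlist_for_each_entry(est, &rateest_hash[h], list)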
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
<+... when != b
(
hlist_for_each_entry(a, - b, c, d) S
|
hlist_for_each_entry_continue(a, - b, c) S
|
hlist_for_each_entry_from(a, - b, c) S
|
hlist_for_each_entry_rcu(a, - b, c, d) S
|
hlist_for_each_entry_rcu_bh(a, - b, c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a, - b, c) S
|
for_each_busy_worker(a, c, - b, d) S
|
ax25_uid_for_each(a, - b, c) S
|
ax25_for_each(a, - b, c) S
|
inet_bind_bucket_for_each(a, - b, c) S
|
sctp_for_each_hentry(a, - b, c) S
|
sk_for_each(a, - b, c) S
|
sk_for_each_rcu(a, - b, c) S
|
sk_for_each_from -(a, b) +(a) S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a, - b, c, d) S
|
sk_for_each_bound(a, - b, c) S
|
hlist_for_each_entry_safe(a, - b, c, d, e) S
|
hlist_for_each_entry_continue_rcu(a, - b, c) S
|
nr_neigh_for_each(a, - b, c) S
|
nr_neigh_for_each_safe(a, - b, c, d) S
|
nr_node_for_each(a, - b, c) S
|
nr_node_for_each_safe(a, - b, c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a, - b, c) S
|
for_each_host_safe(a, - b, c, d) S
|
for_each_mesh_entry(a, - b, c, d) S
)
...+>
[[email protected]: drop bogus change from net/ipv4/raw.c]
[[email protected]: drop bogus hunk from net/ipv6/raw.c]
[[email protected]: checkpatch fixes]
[[email protected]: fix warnings]
[[email protected]: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Cc: Wu Fengguang <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Gleb Natapov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Revision tags: v3.8, v3.8-rc7, v3.8-rc6, v3.8-rc5, v3.8-rc4, v3.8-rc3, v3.8-rc2, v3.8-rc1, v3.7, v3.7-rc8, v3.7-rc7, v3.7-rc6, v3.7-rc5, v3.7-rc4, v3.7-rc3, v3.7-rc2, v3.7-rc1, v3.6, v3.6-rc7, v3.6-rc6, v3.6-rc5, v3.6-rc4, v3.6-rc3, v3.6-rc2, v3.6-rc1, v3.5, v3.5-rc7, v3.5-rc6, v3.5-rc5, v3.5-rc4, v3.5-rc3, v3.5-rc2, v3.5-rc1, v3.4, v3.4-rc7, v3.4-rc6, v3.4-rc5, v3.4-rc4, v3.4-rc3, v3.4-rc2, v3.4-rc1, v3.3, v3.3-rc7, v3.3-rc6, v3.3-rc5, v3.3-rc4, v3.3-rc3, v3.3-rc2, v3.3-rc1, v3.2, v3.2-rc7, v3.2-rc6, v3.2-rc5, v3.2-rc4, v3.2-rc3, v3.2-rc2, v3.2-rc1, v3.1, v3.1-rc10, v3.1-rc9, v3.1-rc8, v3.1-rc7, v3.1-rc6, v3.1-rc5, v3.1-rc4, v3.1-rc3, v3.1-rc2, v3.1-rc1, v3.0, v3.0-rc7, v3.0-rc6, v3.0-rc5, v3.0-rc4, v3.0-rc3, v3.0-rc2, v3.0-rc1, v2.6.39, v2.6.39-rc7, v2.6.39-rc6 |
|
| #
cefcb602 |
| 02-May-2011 |
Paul E. McKenney <[email protected]> |
net,rcu: Convert call_rcu(xt_rateest_free_rcu) to kfree_rcu()
The RCU callback xt_rateest_free_rcu() just calls kfree(), so we can use kfree_rcu() instead of call_rcu(). This also allows us to dispense with an rcu_barrier() call, speeding up unloading of this module.
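The change collapses the callback and the call_rcu() call into one line, sketched here assuming the rcu_head member is named 'rcu':

    -	call_rcu(&est->rcu, xt_rateest_free_rcu);
    +	kfree_rcu(est, rcu);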
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Patrick McHardy <[email protected]>
Reviewed-by: Josh Triplett <[email protected]>
|
|
Revision tags: v2.6.39-rc5, v2.6.39-rc4, v2.6.39-rc3, v2.6.39-rc2, v2.6.39-rc1, v2.6.38, v2.6.38-rc8, v2.6.38-rc7, v2.6.38-rc6, v2.6.38-rc5, v2.6.38-rc4, v2.6.38-rc3, v2.6.38-rc2, v2.6.38-rc1, v2.6.37, v2.6.37-rc8, v2.6.37-rc7, v2.6.37-rc6, v2.6.37-rc5, v2.6.37-rc4, v2.6.37-rc3, v2.6.37-rc2, v2.6.37-rc1, v2.6.36, v2.6.36-rc8, v2.6.36-rc7, v2.6.36-rc6, v2.6.36-rc5, v2.6.36-rc4, v2.6.36-rc3, v2.6.36-rc2, v2.6.36-rc1, v2.6.35, v2.6.35-rc6, v2.6.35-rc5, v2.6.35-rc4, v2.6.35-rc3 |
|
| #
c7de2cf0 |
| 09-Jun-2010 |
Eric Dumazet <[email protected]> |
pkt_sched: gen_kill_estimator() rcu fixes
gen_kill_estimator() API is incomplete or not well documented, since caller should make sure an RCU grace period is respected before freeing stats_lock.
This was partially addressed in commit 5d944c640b4 (gen_estimator: deadlock fix), but same problem exist for all gen_kill_estimator() users, if lock they use is not already RCU protected.
A code review shows xt_RATEEST.c, act_api.c, act_police.c have this problem. Other are ok because they use qdisc lock, already RCU protected.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Revision tags: v2.6.35-rc2, v2.6.35-rc1, v2.6.34, v2.6.34-rc7, v2.6.34-rc6, v2.6.34-rc5, v2.6.34-rc4, v2.6.34-rc3, v2.6.34-rc2, v2.6.34-rc1, v2.6.33, v2.6.33-rc8, v2.6.33-rc7, v2.6.33-rc6, v2.6.33-rc5, v2.6.33-rc4, v2.6.33-rc3, v2.6.33-rc2, v2.6.33-rc1, v2.6.32, v2.6.32-rc8, v2.6.32-rc7, v2.6.32-rc6, v2.6.32-rc5, v2.6.32-rc4, v2.6.32-rc3, v2.6.32-rc1, v2.6.32-rc2, v2.6.31, v2.6.31-rc9, v2.6.31-rc8, v2.6.31-rc7, v2.6.31-rc6, v2.6.31-rc5, v2.6.31-rc4, v2.6.31-rc3 |
|
| #
4b560b44 |
| 05-Jul-2009 |
Jan Engelhardt <[email protected]> |
netfilter: xtables: substitute temporary defines by final name
Signed-off-by: Jan Engelhardt <[email protected]>
|
| #
5a0e3ad6 |
| 24-Mar-2010 |
Tejun Heo <[email protected]> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files.
2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files.
3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable on most builds of the specific arch.
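For an affected source file the net change is typically a single include line, e.g.:

     #include <linux/module.h>
    +#include <linux/slab.h>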
Signed-off-by: Tejun Heo <[email protected]>
Guess-its-ok-by: Christoph Lameter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
|
| #
4a5a5c73 |
| 19-Mar-2010 |
Jan Engelhardt <[email protected]> |
netfilter: xtables: slightly better error reporting
When extended status codes are available, such as ENOMEM on failed allocations or errors from subsequent functions (e.g. nf_ct_get_l3proto), passing them up to userspace seems like a better idea than always returning EINVAL.
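Concretely, a checkentry path can forward the real cause instead of flattening it, e.g. (sketch):

    est = kzalloc(sizeof(*est), GFP_KERNEL);
    if (!est)
        return -ENOMEM;    /* rather than a blanket -EINVAL */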
Signed-off-by: Jan Engelhardt <[email protected]>
|
| #
d6b00a53 |
| 25-Mar-2010 |
Jan Engelhardt <[email protected]> |
netfilter: xtables: change targets to return error code
Part of the transition done by this semantic patch:
// <smpl>
@ rule1 @
struct xt_target ops;
identifier check;
@@
ops.checkentry = check;

@@
identifier rule1.check;
@@
check(...) { <...
-return true;
+return 0;
...> }

@@
identifier rule1.check;
@@
check(...) { <...
-return false;
+return -EINVAL;
...> }
// </smpl>
Signed-off-by: Jan Engelhardt <[email protected]>
|
| #
135367b8 |
| 19-Mar-2010 |
Jan Engelhardt <[email protected]> |
netfilter: xtables: change xt_target.checkentry return type
Restore function signatures from bool to int so that we can report memory allocation failures or similar using -ENOMEM rather than always having to pass -EINVAL back.
// <smpl>
@@
type bool;
identifier check, par;
@@
-bool check
+int check
(struct xt_tgchk_param *par)
{ ... }
// </smpl>
Minus the change it does to xt_ct_find_proto.
Signed-off-by: Jan Engelhardt <[email protected]>
|