| f1b2991c | 03-Dec-2021 |
Markus Theil <[email protected]> |
kni: fix ioctl signature
Fix kni's ioctl signature to match the prototype expected by the kernel's struct file_operations. This removes the (void*) casts and uses struct file* instead of struct inode*. With the correct signature, control flow integrity checkers are no longer confused at this point.
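For illustration, here is a userspace sketch of why a matching prototype removes the need for a cast. It assumes the kernel's ioctl member of struct file_operations has the shape `long (*)(struct file *, unsigned int, unsigned long)`. `struct demo_fops`, `demo_ioctl()`, `demo_file_ops` and `DEMO_IOCTL_ECHO` are hypothetical stand-ins, not KNI code.

```c
#include <errno.h>
#include <stddef.h>

/* Opaque, as in the kernel: only pointers to struct file are passed around. */
struct file;

/* Hypothetical stand-in mirroring the ioctl member of struct file_operations. */
struct demo_fops {
	long (*unlocked_ioctl)(struct file *filp, unsigned int cmd,
			       unsigned long arg);
};

#define DEMO_IOCTL_ECHO 1u /* hypothetical command number */

/* Takes struct file * (not struct inode *), so its type matches the member
 * exactly and the assignment below needs no (void *) cast. */
static long demo_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	(void)filp;
	if (cmd == DEMO_IOCTL_ECHO)
		return (long)arg; /* echo the argument back */
	return -ENOTTY;           /* unknown command */
}

static const struct demo_fops demo_file_ops = {
	.unlocked_ioctl = demo_ioctl, /* compiles cleanly without a cast */
};
```

Indirect-call CFI schemes such as Clang's `-fsanitize=cfi-icall` check that the function type at the call site matches the type of the target; a handler reached through a cast from a different prototype is exactly what such a check flags.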
Signed-off-by: Markus Theil <[email protected]>
Tested-by: Michael Pfeiffer <[email protected]>
Acked-by: Stephen Hemminger <[email protected]>
| 5569dd7d | 20-Jan-2022 |
Tudor Cornea <[email protected]> |
kni: allow configuring thread granularity
The KNI kthreads currently seem to be rescheduled at a granularity of roughly 1 millisecond, which appears insufficient for tests involving a lot of control plane traffic.
Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, the existing code cannot reschedule at the desired granularity, due to the precision constraints of schedule_timeout_interruptible().
In our use case, we leverage the Linux Kernel for control plane, and it is not uncommon to have 60K - 100K pps for some signaling protocols.
Since we are not in atomic context, usleep_range() seems more appropriate for introducing smaller, controlled delays in the range of 5-10 microseconds [1]. Reading the existing code, this appears to have been the original intent. Sub-millisecond delays seem unfeasible with a call to schedule_timeout_interruptible():
    #define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */

    schedule_timeout_interruptible(
            usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
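A rough model of the precision problem: usecs_to_jiffies() rounds up to whole jiffies, so the smallest non-zero timeout is one timer tick (1 ms at HZ=1000, 4 ms at HZ=250), which is consistent with the ~1 ms granularity observed above. This userspace re-implementation is for illustration only. `demo_usecs_to_jiffies()` and `demo_jiffies_to_usecs()` are hypothetical names, and the formula assumes HZ divides 1,000,000 evenly.

```c
#define DEMO_USEC_PER_SEC 1000000UL

/* Round a microsecond delay up to whole timer ticks, modelled on the
 * kernel's usecs_to_jiffies() when HZ divides DEMO_USEC_PER_SEC evenly. */
static unsigned long demo_usecs_to_jiffies(unsigned long usecs, unsigned long hz)
{
	unsigned long usecs_per_tick = DEMO_USEC_PER_SEC / hz;

	return (usecs + usecs_per_tick - 1) / usecs_per_tick;
}

/* Length in microseconds of a timeout expressed in ticks. */
static unsigned long demo_jiffies_to_usecs(unsigned long jiffies, unsigned long hz)
{
	return jiffies * (DEMO_USEC_PER_SEC / hz);
}
```

For example, a 5 µs request at HZ=1000 becomes 1 jiffy, i.e. a 1000 µs timeout, a factor of 200 coarser than requested.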
Below is a brief comparison between the existing implementation, which uses schedule_timeout_interruptible(), and usleep_range().
We measure the CPU usage and the RTT between two KNI interfaces created on top of vmxnet3 adapters connected by a vSwitch.
    insmod rte_kni.ko kthread_mode=single carrier=on
schedule_timeout_interruptible(usecs_to_jiffies(5)), kni_single CPU usage: 2-4%
    [root@localhost ~]# ping 1.1.1.2 -I eth1
    PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
    64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
    64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
    64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
    64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
    64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms
usleep_range(5, 10), kni_single CPU usage: 50%
    64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
    64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
    64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
    64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
    64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms
usleep_range(20, 50), kni_single CPU usage: 24%
    64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
    64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
    64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
    64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
    64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms
usleep_range(50, 100), kni_single CPU usage: 13%
    64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
    64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
    64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
    64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
    64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms
usleep_range(100, 200), kni_single CPU usage: 7%
    64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
    64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
    64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
    64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
    64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms
usleep_range(1000, 1100), kni_single CPU usage: 2%
    64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
    64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
    64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
    64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
    64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms
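To compare the runs at a glance, the five RTTs of each variant can be averaged. This sketch only copies the numbers from the ping output above into a table; `struct demo_run`, `demo_runs` and `demo_mean_ms()` are hypothetical names used for this illustration.

```c
#include <stddef.h>

#define DEMO_PINGS 5

/* RTTs in ms for icmp_seq=1..5, copied from the measurements above. */
struct demo_run {
	const char *variant;
	double rtt_ms[DEMO_PINGS];
};

static const struct demo_run demo_runs[] = {
	{ "schedule_timeout_interruptible(usecs_to_jiffies(5))",
	  { 2.70, 1.00, 1.99, 0.985, 1.00 } },
	{ "usleep_range(5, 10)",      { 0.338, 0.150, 0.123, 0.139, 0.159 } },
	{ "usleep_range(20, 50)",     { 0.202, 0.170, 0.171, 0.248, 0.185 } },
	{ "usleep_range(50, 100)",    { 0.537, 0.257, 0.231, 0.143, 0.200 } },
	{ "usleep_range(100, 200)",   { 0.716, 0.167, 0.459, 0.455, 0.252 } },
	{ "usleep_range(1000, 1100)", { 2.22, 1.17, 1.17, 1.17, 1.15 } },
};

/* Arithmetic mean of n round-trip times. */
static double demo_mean_ms(const double *rtt, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += rtt[i];
	return sum / (double)n;
}
```

Averaged this way, the means are roughly 1.54 ms for schedule_timeout_interruptible(), 0.18 ms for usleep_range(5, 10), 0.41 ms for usleep_range(100, 200) and 1.38 ms for usleep_range(1000, 1100). Five pings per variant is a small sample, so these figures are indicative only.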
Upon testing, usleep_range(1000, 1100) seems roughly equivalent in latency and CPU usage to the schedule_timeout_interruptible() variant, whereas usleep_range(100, 200) seems to give a decent trade-off between latency and CPU usage, and users can still tweak the limits for improved precision if their use cases require it.
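Since the limits are meant to be user-tunable, one way they could be sanitised before reaching usleep_range(min, max), which expects min <= max, is sketched below. The struct, macro and function names are hypothetical, and the fallback policy is an assumption; only the 100/200 µs defaults reflect the trade-off suggested above.

```c
/* Hypothetical user-tunable sleep window for the KNI kthread, in microseconds. */
struct demo_sched_interval {
	unsigned long min_us;
	unsigned long max_us;
};

/* Defaults matching the usleep_range(100, 200) trade-off. */
#define DEMO_DEFAULT_MIN_US 100UL
#define DEMO_DEFAULT_MAX_US 200UL

/* Return a window that is safe to pass to usleep_range(min, max).
 * Assumed policy: a zero minimum falls back to the default minimum, and a
 * maximum below the minimum is raised to the minimum. */
static struct demo_sched_interval
demo_sanitize_interval(unsigned long min_us, unsigned long max_us)
{
	struct demo_sched_interval iv = { min_us, max_us };

	if (iv.min_us == 0)
		iv.min_us = DEMO_DEFAULT_MIN_US;
	if (iv.max_us < iv.min_us)
		iv.max_us = iv.min_us;
	return iv;
}
```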
Interestingly, disabling RTE_KNI_PREEMPT_DEFAULT seems to lead to a soft lockup on my kernel:
    Kernel panic - not syncing: softlockup: hung tasks
    CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
    <IRQ>
    [<ffffffff814f84de>] dump_stack+0x19/0x1b
    [<ffffffff814f7891>] panic+0xcd/0x1e0
    [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
    [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
    [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
    [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
    [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80
This patch also attempts to remove this option.
References: [1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt
Signed-off-by: Tudor Cornea <[email protected]>
Acked-by: Padraig Connolly <[email protected]>
Reviewed-by: Ferruh Yigit <[email protected]>
| e16b972b | 24-Jan-2022 |
Bruce Richardson <[email protected]> |
build: remove deprecated Meson functions
Starting in meson 0.56, the functions meson.source_root() and meson.build_root() are deprecated and to be replaced by the [more descriptive] functions project_source_root()/global_source_root() and project_build_root()/global_build_root(). Unfortunately, these replacement functions were only added in the 0.56 release as well, so using them would require version checks selecting the old or new functions to avoid the deprecation warnings.
However, the functions "current_build_dir()" and "current_source_dir()" are unaffected by all this, so we can bypass the versioning problem by saving these values to "dpdk_build_root" and "dpdk_source_root" in the top-level meson.build file.
Bugzilla ID: 926
Cc: [email protected]
Signed-off-by: Bruce Richardson <[email protected]>
Tested-by: Jerin Jacob <[email protected]>
| 631217c7 | 29-Mar-2021 |
Elad Nachman <[email protected]> |
kni: fix kernel deadlock with bifurcated device
KNI runs the userspace callback with the rtnl lock held. This does not work with some devices that need to interact with the kernel interface in the callback, such as Mellanox devices.
The solution is to release the rtnl lock before calling the userspace callback, but this requires two considerations:
1. The rtnl lock needs to be released before taking 'kni->sync_lock'; otherwise multiple KNI devices can deadlock. See A. below for the details of the deadlock condition.
2. Releasing the rtnl lock for the interface down event causes a regression and a deadlock, so the rtnl lock cannot be released for that event. See B. below for the details.
As a solution, the interface down event is handled asynchronously, and for all other events the rtnl lock is released before processing the callback.
A. The KNI sync lock is taken while rtnl is held. If two threads call kni_net_process_request(), the first one takes the sync lock, releases the rtnl lock and then sleeps. The second thread tries to take the sync lock while holding rtnl. When the first thread wakes and tries to take rtnl, the result is a deadlock. The remedy is to release rtnl before locking the KNI sync lock. Since nothing in between touches the Linux networking stack, no rtnl locking is needed for that section.
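The lock ordering in A. can be modelled in userspace with two POSIX mutexes standing in for rtnl and kni->sync_lock. This is a sketch, not the KNI code: `demo_rtnl`, `demo_sync_lock`, `demo_process_request()` and `demo_caller()` are hypothetical, and the request/response exchange is reduced to a counter increment.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's rtnl lock and kni->sync_lock. */
static pthread_mutex_t demo_rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_sync_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_requests_done; /* protected by demo_sync_lock */

/* Entered with demo_rtnl held and returns with it held, like a netdev
 * callback. "rtnl" is dropped BEFORE the sync lock is taken, so a thread
 * holding the sync lock never waits for rtnl. */
static int demo_process_request(void)
{
	pthread_mutex_unlock(&demo_rtnl);

	pthread_mutex_lock(&demo_sync_lock);
	demo_requests_done++; /* stands in for: send request, wait for reply */
	pthread_mutex_unlock(&demo_sync_lock);

	pthread_mutex_lock(&demo_rtnl); /* reacquire for the caller */
	return 0;
}

/* Thread body: a caller that enters with rtnl held. */
static void *demo_caller(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&demo_rtnl);
	demo_process_request();
	pthread_mutex_unlock(&demo_rtnl);
	return NULL;
}
```

With this order no thread holds demo_rtnl while waiting for demo_sync_lock, and no thread holds demo_sync_lock while waiting for demo_rtnl, so the cycle described in A. cannot form. Build with `-pthread`.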
B. There is a race condition in __dev_close_many() processing the close_list while the application terminates. It appears that if two KNI interfaces are terminating and one releases the rtnl lock, the other takes it and updates the close_list while it is in an unstable state. The close_list then becomes a circular linked list, so list_for_each_entry() loops endlessly inside __dev_close_many().
To summarize:
request != interface down: unlock rtnl, send the request to user space, wait for the response, then return the response error code to the caller in user space.
request == interface down: send the request to user space and return immediately with error code 0 (success) to user space.
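The two paths in the summary can be sketched as a small dispatch function. `enum demo_req`, `struct demo_trace`, `demo_step()` and `demo_handle()` are hypothetical; the trace only records the order of steps. Re-taking rtnl before returning is an assumption based on the lock discussion in A., not something the summary states.

```c
/* Hypothetical request kinds; only "interface down" is handled specially. */
enum demo_req { DEMO_REQ_IF_DOWN, DEMO_REQ_OTHER };

/* Records the steps taken so the control flow can be inspected. */
struct demo_trace {
	const char *steps[4];
	int n;
};

static void demo_step(struct demo_trace *t, const char *s)
{
	if (t->n < 4)
		t->steps[t->n++] = s;
}

/* user_rc is the error code the (simulated) userspace handler replies with. */
static int demo_handle(enum demo_req req, int user_rc, struct demo_trace *t)
{
	if (req == DEMO_REQ_IF_DOWN) {
		/* rtnl stays held; post the request and do not wait for a reply. */
		demo_step(t, "send");
		return 0;
	}
	demo_step(t, "unlock_rtnl");
	demo_step(t, "send");
	demo_step(t, "wait");
	demo_step(t, "lock_rtnl"); /* assumed: caller expects rtnl held again */
	return user_rc;
}
```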
Fixes: 3fc5ca2f6352 ("kni: initial import")
Cc: [email protected]
Signed-off-by: Elad Nachman <[email protected]>