|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6 |
|
| #
6010d245 |
| 30-Oct-2024 |
Waiman Long <[email protected]> |
sched/isolation: Consolidate housekeeping cpumasks that are always identical
The housekeeping cpumasks are only set by two boot commandline parameters: "nohz_full" and "isolcpus". When there is more
sched/isolation: Consolidate housekeeping cpumasks that are always identical
The housekeeping cpumasks are only set by two boot commandline parameters: "nohz_full" and "isolcpus". When there is more than one of "nohz_full" or "isolcpus", the extra ones must have the same CPU list or the setup will fail partially.
The HK_TYPE_DOMAIN and HK_TYPE_MANAGED_IRQ types are settable by "isolcpus" only and their settings can be independent of the other types. The other housekeeping types are all set by "nohz_full" or "isolcpus=nohz" without a way to set them individually. So they all have identical cpumasks.
There is actually no point in having different cpumasks for these "nohz_full" only housekeeping types. Consolidate these types to use the same cpumask by aliasing them to the same value. If there is a need to set any of them independently in the future, we can break them out to their own cpumasks again.
With this change, the number of cpumasks in the housekeeping structure drops from 9 to 3. Other than that, there should be no other functional change.
Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
1174b934 |
| 30-Oct-2024 |
Waiman Long <[email protected]> |
sched/isolation: Make "isolcpus=nohz" equivalent to "nohz_full"
The "isolcpus=nohz" boot parameter and flag were used to disable tick when running a single task. Nowsdays, this "nohz" flag is seldo
sched/isolation: Make "isolcpus=nohz" equivalent to "nohz_full"
The "isolcpus=nohz" boot parameter and flag were used to disable tick when running a single task. Nowsdays, this "nohz" flag is seldomly used as it is included as part of the "nohz_full" parameter. Extend this flag to cover other kernel noises disabled by the "nohz_full" parameter to make them equivalent. This also eliminates the need to use both the "isolcpus" and the "nohz_full" parameters to fully isolated a given set of CPUs.
Suggested-by: Frederic Weisbecker <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
ae5c6777 |
| 30-Oct-2024 |
Waiman Long <[email protected]> |
sched/core: Remove HK_TYPE_SCHED
The HK_TYPE_SCHED housekeeping type is defined but not set anywhere. So any code that try to use HK_TYPE_SCHED are essentially dead code. So remove HK_TYPE_SCHED and
sched/core: Remove HK_TYPE_SCHED
The HK_TYPE_SCHED housekeeping type is defined but not set anywhere. So any code that try to use HK_TYPE_SCHED are essentially dead code. So remove HK_TYPE_SCHED and any code that use it.
Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
257bf89d |
| 13-Apr-2024 |
Oleg Nesterov <[email protected]> |
sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
housekeeping_setup() checks cpumask_intersects(present, online) to ensure that the kernel will have at least one housekeeping CP
sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
housekeeping_setup() checks cpumask_intersects(present, online) to ensure that the kernel will have at least one housekeeping CPU after smp_init(), but this doesn't work if the maxcpus= kernel parameter limits the number of processors available after bootup.
For example, a kernel with "maxcpus=2 nohz_full=0-2" parameters crashes at boot time on a virtual machine with 4 CPUs.
Change housekeeping_setup() to use cpumask_first_and() and check that the returned CPU number is valid and less than setup_max_cpus.
Another corner case is "nohz_full=0" on a machine with a single CPU or with the maxcpus=1 kernel argument. In this case non_housekeeping_mask is empty and tick_nohz_full_setup() makes no sense. And indeed, the kernel hits the WARN_ON(tick_nohz_full_running) in tick_sched_do_timer().
And how should the kernel interpret the "nohz_full=" parameter? It should be silently ignored, but currently cpulist_parse() happily returns the empty cpumask and this leads to the same problem.
Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask) and do nothing in this case.
Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Phil Auld <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
5097cbcb |
| 11-Apr-2024 |
Oleg Nesterov <[email protected]> |
sched/isolation: Prevent boot crash when the boot CPU is nohz_full
Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not include the boot CPU, which is no longer true after:
0
sched/isolation: Prevent boot crash when the boot CPU is nohz_full
Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not include the boot CPU, which is no longer true after:
08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full").
However after:
aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
the kernel will crash at boot time in this case; housekeeping_any_cpu() returns an invalid CPU number until smp_init() brings the first housekeeping CPU up.
Change housekeeping_any_cpu() to check the result of cpumask_any_and() and return smp_processor_id() in this case.
This is just the simple and backportable workaround which fixes the symptom, but smp_processor_id() at boot time should be safe at least for type == HK_TYPE_TIMER, this more or less matches the tick_do_timer_boot_cpu logic.
There is no worry about cpu_down(); tick_nohz_cpu_down() will not allow to offline tick_do_timer_cpu (the 1st online housekeeping CPU).
Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work") Reported-by: Chris von Recklinghausen <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Phil Auld <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected] Closes: https://lore.kernel.org/all/[email protected]/
show more ...
|
|
Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6 |
|
| #
801c1419 |
| 22-Feb-2022 |
Ingo Molnar <[email protected]> |
sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there
Collect all utility functionality source code files into a single kernel/sched/build_utility.c file, via #incl
sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there
Collect all utility functionality source code files into a single kernel/sched/build_utility.c file, via #include-ing the .c files:
kernel/sched/clock.c kernel/sched/completion.c kernel/sched/loadavg.c kernel/sched/swait.c kernel/sched/wait_bit.c kernel/sched/wait.c
CONFIG_CPU_FREQ: kernel/sched/cpufreq.c
CONFIG_CPU_FREQ_GOV_SCHEDUTIL: kernel/sched/cpufreq_schedutil.c
CONFIG_CGROUP_CPUACCT: kernel/sched/cpuacct.c
CONFIG_SCHED_DEBUG: kernel/sched/debug.c
CONFIG_SCHEDSTATS: kernel/sched/stats.c
CONFIG_SMP: kernel/sched/cpupri.c kernel/sched/stop_task.c kernel/sched/topology.c
CONFIG_SCHED_CORE: kernel/sched/core_sched.c
CONFIG_PSI: kernel/sched/psi.c
CONFIG_MEMBARRIER: kernel/sched/membarrier.c
CONFIG_CPU_ISOLATION: kernel/sched/isolation.c
CONFIG_SCHED_AUTOGROUP: kernel/sched/autogroup.c
The goal is to amortize the 60+ KLOC header bloat from over a dozen build units into a single build unit.
The build time of build_utility.c also roughly matches the build time of core.c and fair.c - allowing better load-balancing of scheduler-only rebuilds.
Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Peter Zijlstra <[email protected]>
show more ...
|
|
Revision tags: v5.17-rc5, v5.17-rc4 |
|
| #
ed3b362d |
| 07-Feb-2022 |
Frederic Weisbecker <[email protected]> |
sched/isolation: Split housekeeping cpumask per isolation features
To prepare for supporting each housekeeping feature toward cpuset, split the global housekeeping cpumask per HK_TYPE_* entry.
This
sched/isolation: Split housekeeping cpumask per isolation features
To prepare for supporting each housekeeping feature toward cpuset, split the global housekeeping cpumask per HK_TYPE_* entry.
This will later allow, for example, to runtime modify the cpulist passed through "isolcpus=", "nohz_full=" and "rcu_nocbs=" kernel boot parameters.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Juri Lelli <[email protected]> Reviewed-by: Phil Auld <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
65e53f86 |
| 07-Feb-2022 |
Frederic Weisbecker <[email protected]> |
sched/isolation: Fix housekeeping_mask memory leak
If "nohz_full=" or "isolcpus=nohz" are called with CONFIG_NO_HZ_FULL=n, housekeeping_mask doesn't get freed despite it being unused if housekeeping
sched/isolation: Fix housekeeping_mask memory leak
If "nohz_full=" or "isolcpus=nohz" are called with CONFIG_NO_HZ_FULL=n, housekeeping_mask doesn't get freed despite it being unused if housekeeping_setup() is called for the first time.
Check this scenario first to fix this, so that no useless allocation is performed.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Juri Lelli <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Reviewed-by: Phil Auld <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
0cd3e59d |
| 07-Feb-2022 |
Frederic Weisbecker <[email protected]> |
sched/isolation: Consolidate error handling
Centralize the mask freeing and return value for the error path. This makes potential leaks more visible.
Signed-off-by: Frederic Weisbecker <frederic@ke
sched/isolation: Consolidate error handling
Centralize the mask freeing and return value for the error path. This makes potential leaks more visible.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Juri Lelli <[email protected]> Reviewed-by: Phil Auld <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
6367b600 |
| 07-Feb-2022 |
Frederic Weisbecker <[email protected]> |
sched/isolation: Consolidate check for housekeeping minimum service
There can be two subsequent calls to housekeeping_setup() due to "nohz_full=" and "isolcpus=" that can mix up. The two passes eac
sched/isolation: Consolidate check for housekeeping minimum service
There can be two subsequent calls to housekeeping_setup() due to "nohz_full=" and "isolcpus=" that can mix up. The two passes each have their own way to deal with an empty housekeeping set of CPUs. Consolidate this part and remove the awful "tmp" based naming.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Juri Lelli <[email protected]> Reviewed-by: Phil Auld <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
04d4e665 |
| 07-Feb-2022 |
Frederic Weisbecker <[email protected]> |
sched/isolation: Use single feature type while referring to housekeeping cpumask
Refer to housekeeping APIs using single feature types instead of flags. This prevents from passing multiple isolation
sched/isolation: Use single feature type while referring to housekeeping cpumask
Refer to housekeeping APIs using single feature types instead of flags. This prevents from passing multiple isolation features at once to housekeeping interfaces, which soon won't be possible anymore as each isolation features will have their own cpumask.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Juri Lelli <[email protected]> Reviewed-by: Phil Auld <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12 |
|
| #
915a2bc3 |
| 19-Apr-2021 |
Paul Gortmaker <[email protected]> |
sched/isolation: Reconcile rcu_nocbs= and nohz_full=
We have a mismatch between RCU and isolation -- in relation to what is considered the maximum valid CPU number.
This matters because nohz_full=
sched/isolation: Reconcile rcu_nocbs= and nohz_full=
We have a mismatch between RCU and isolation -- in relation to what is considered the maximum valid CPU number.
This matters because nohz_full= and rcu_nocbs= are joined at the hip; in fact the former will enforce the latter. So we don't want a CPU mask to be valid for one and denied for the other.
The difference 1st appeared as of v4.15; further details are below.
As it is confusing to anyone who isn't looking at the code regularly, a reminder is in order; three values exist here:
CONFIG_NR_CPUS - compiled in maximum cap on number of CPUs supported. nr_cpu_ids - possible # of CPUs (typically reflects what ACPI says) cpus_present - actual number of present/detected/installed CPUs.
For this example, I'll refer to NR_CPUS=64 from "make defconfig" and nr_cpu_ids=6 for ACPI reporting on a board that could run a six core, and present=4 for a quad that is physically in the socket. From dmesg:
smpboot: Allowing 6 CPUs, 2 hotplug CPUs setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:6 nr_node_ids:1 rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=6. smp: Brought up 1 node, 4 CPUs
And from userspace, see:
paul@trash:/sys/devices/system/cpu$ cat present 0-3 paul@trash:/sys/devices/system/cpu$ cat possible 0-5 paul@trash:/sys/devices/system/cpu$ cat kernel_max 63
Everything is fine if we boot 5x5 for rcu/nohz:
Command line: BOOT_IMAGE=/boot/bzImage nohz_full=2-5 rcu_nocbs=2-5 root=/dev/sda1 ro NO_HZ: Full dynticks CPUs: 2-5. rcu: Offload RCU callbacks from CPUs: 2-5.
..even though there is no CPU 4 or 5. Both RCU and nohz_full are OK. Now we push that > 6 but less than NR_CPU and with 15x15 we get:
Command line: BOOT_IMAGE=/boot/bzImage rcu_nocbs=2-15 nohz_full=2-15 root=/dev/sda1 ro rcu: Note: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs. rcu: Offload RCU callbacks from CPUs: 2-5.
These are both functionally equivalent, as we are only changing flags on phantom CPUs that don't exist, but note the kernel interpretation changes. And worse, it only changes for one of the two - which is the problem.
RCU doesn't care if you want to restrict the flags on phantom CPUs but clearly nohz_full does after this change from v4.15.
edb9382175c3: ("sched/isolation: Move isolcpus= handling to the housekeeping code")
- if (cpulist_parse(str, non_housekeeping_mask) < 0) { - pr_warn("Housekeeping: Incorrect nohz_full cpumask\n"); + err = cpulist_parse(str, non_housekeeping_mask); + if (err < 0 || cpumask_last(non_housekeeping_mask) >= nr_cpu_ids) { + pr_warn("Housekeeping: nohz_full= or isolcpus= incorrect CPU range\n");
To be clear, the sanity check on "possible" (nr_cpu_ids) is new here.
The goal was reasonable ; not wanting housekeeping to land on a not-possible CPU, but note two things:
1) this is an exclusion list, not an inclusion list; we are tracking non_housekeeping CPUs; not ones who are explicitly assigned housekeeping
2) we went one further in 9219565aa890 ("sched/isolation: Require a present CPU in housekeeping mask") - ensuring that housekeeping was sanity checking against present and not just possible CPUs.
To be clear, this means the check added in v4.15 is doubly redundant. And more importantly, overly strict/restrictive.
We care now, because the bitmap boot arg parsing now knows that a value of "N" is NR_CPUS; the size of the bitmap, but the bitmap code doesn't know anything about the subtleties of our max/possible/present CPU specifics as outlined above.
So drop the check added in v4.15 (edb9382175c3) and make RCU and nohz_full both in alignment again on NR_CPUS so "N" works for both, and then they can fall back to nr_cpu_ids internally just as before.
Command line: BOOT_IMAGE=/boot/bzImage nohz_full=2-N rcu_nocbs=2-N root=/dev/sda1 ro NO_HZ: Full dynticks CPUs: 2-5. rcu: Offload RCU callbacks from CPUs: 2-5.
As shown above, with this change, RCU and nohz_full are in sync, even with the use of the "N" placeholder. Same result is achieved with "15".
Signed-off-by: Paul Gortmaker <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7 |
|
| #
9cc5b865 |
| 27-May-2020 |
Marcelo Tosatti <[email protected]> |
isolcpus: Affine unbound kernel threads to housekeeping cpus
This is a kernel enhancement that configures the cpu affinity of kernel threads via kernel boot option nohz_full=.
When this option is s
isolcpus: Affine unbound kernel threads to housekeeping cpus
This is a kernel enhancement that configures the cpu affinity of kernel threads via kernel boot option nohz_full=.
When this option is specified, the cpumask is immediately applied upon kthread launch. This does not affect kernel threads that specify cpu and node.
This allows CPU isolation (that is not allowing certain threads to execute on certain CPUs) without using the isolcpus=domain parameter, making it possible to enable load balancing on such CPUs during runtime (see kernel-parameters.txt).
Note-1: this is based off on Wind River's patch at https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch
Difference being that this patch is limited to modifying kernel thread cpumask. Behaviour of other threads can be controlled via cgroups or sched_setaffinity.
Note-2: Wind River's patch was based off Christoph Lameter's patch at https://lwn.net/Articles/565932/ with the only difference being the kernel parameter changed from kthread to kthread_cpus.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1 |
|
| #
3662daf0 |
| 03-Apr-2020 |
Peter Xu <[email protected]> |
sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters
The "isolcpus=" parameter allows sub-parameters before the cpulist is specified, and if the parser detects an unknown sub-parameters
sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters
The "isolcpus=" parameter allows sub-parameters before the cpulist is specified, and if the parser detects an unknown sub-parameters the whole parameter will be ignored.
This design is incompatible with itself when new sub-parameters are added. An older kernel will not recognize the new sub-parameter and will invalidate the whole parameter so the CPU isolation will not take effect. It emits a warning:
isolcpus: Error, unknown flag
The better and compatible way is to allow "isolcpus=" to skip unknown sub-parameters, so that even if new sub-parameters are added an older kernel will still be able to behave as usual even if with the new sub-parameter specified on the command line.
Ideally this should have been there when the first sub-parameter for "isolcpus=" was introduced.
Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5 |
|
| #
11ea68f5 |
| 20-Jan-2020 |
Ming Lei <[email protected]> |
genirq, sched/isolation: Isolate from handling managed interrupts
The affinity of managed interrupts is completely handled in the kernel and cannot be changed via the /proc/irq/* interfaces from use
genirq, sched/isolation: Isolate from handling managed interrupts
The affinity of managed interrupts is completely handled in the kernel and cannot be changed via the /proc/irq/* interfaces from user space. As the kernel tries to spread out interrupts evenly accross CPUs on x86 to prevent vector exhaustion, it can happen that a managed interrupt whose affinity mask contains both isolated and housekeeping CPUs is routed to an isolated CPU. As a consequence IO submitted on a housekeeping CPU causes interrupts on the isolated CPU.
Add a new sub-parameter 'managed_irq' for 'isolcpus' and the corresponding logic in the interrupt affinity selection code.
The subparameter indicates to the interrupt affinity selection logic that it should try to avoid the above scenario.
This isolation is best effort and only effective if the automatically assigned interrupt mask of a device queue contains isolated and housekeeping CPUs. If housekeeping CPUs are online then such interrupts are directed to the housekeeping CPU so that IO submitted on the housekeeping CPU cannot disturb the isolated CPU.
If a queue's affinity mask contains only isolated CPUs then this parameter has no effect on the interrupt routing decision, though interrupts are only happening when tasks running on those isolated CPUs submit IO. IO submitted on housekeeping CPUs has no influence on those queues.
If the affinity mask contains both housekeeping and isolated CPUs, but none of the contained housekeeping CPUs is online, then the interrupt is also routed to an isolated CPU. Interrupts are only delivered when one of the isolated CPUs in the affinity mask submits IO. If one of the contained housekeeping CPUs comes online, the CPU hotplug logic migrates the interrupt automatically back to the upcoming housekeeping CPU. Depending on the type of interrupt controller, this can require that at least one interrupt is delivered to the isolated CPU in order to complete the migration.
[ tglx: Removed unused parameter, added and edited comments/documentation and rephrased the changelog so it contains more details. ]
Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7 |
|
| #
e0e8d491 |
| 28-Jun-2019 |
Wanpeng Li <[email protected]> |
sched/isolation: Prefer housekeeping CPU in local node
In real product setup, there will be houseeking CPUs in each nodes, it is prefer to do housekeeping from local node, fallback to global online
sched/isolation: Prefer housekeeping CPU in local node
In real product setup, there will be houseeking CPUs in each nodes, it is prefer to do housekeeping from local node, fallback to global online cpumask if failed to find houseeking CPU from local node.
Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Reviewed-by: Srikar Dronamraju <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
| #
0c5f81da |
| 06-Jul-2019 |
Wanpeng Li <[email protected]> |
KVM: LAPIC: Inject timer interrupt via posted interrupt
Dedicated instances are currently disturbed by unnecessary jitter due to the emulated lapic timers firing on the same pCPUs where the vCPUs re
KVM: LAPIC: Inject timer interrupt via posted interrupt
Dedicated instances are currently disturbed by unnecessary jitter due to the emulated lapic timers firing on the same pCPUs where the vCPUs reside. There is no hardware virtual timer on Intel for guest like ARM, so both programming timer in guest and the emulated timer fires incur vmexits. This patch tries to avoid vmexit when the emulated timer fires, at least in dedicated instance scenario when nohz_full is enabled.
In that case, the emulated timers can be offload to the nearest busy housekeeping cpus since APICv has been found for several years in server processors. The guest timer interrupt can then be injected via posted interrupts, which are delivered by the housekeeping cpu once the emulated timer fires.
The host should tuned so that vCPUs are placed on isolated physical processors, and with several pCPUs surplus for busy housekeeping. If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode, ~3% redis performance benefit can be observed on Skylake server, and the number of external interrupt vmexits drops substantially. Without patch
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
While with patch:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Marcelo Tosatti <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
show more ...
|
|
Revision tags: v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1 |
|
| #
457c8996 |
| 19-May-2019 |
Thomas Gleixner <[email protected]> |
treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was use
treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
show more ...
|
|
Revision tags: v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5 |
|
| #
9219565a |
| 11-Apr-2019 |
Nicholas Piggin <[email protected]> |
sched/isolation: Require a present CPU in housekeeping mask
During housekeeping mask setup, currently a possible CPU is required. That does not guarantee the CPU would be available at boot time, so
sched/isolation: Require a present CPU in housekeeping mask
During housekeeping mask setup, currently a possible CPU is required. That does not guarantee the CPU would be available at boot time, so check to ensure that at least one present CPU is in the mask.
Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7 |
|
| #
c89d92ed |
| 12-Feb-2019 |
Viresh Kumar <[email protected]> |
sched/fair: Use non-atomic cpumask_{set,clear}_cpu()
The cpumasks updated here are not subject to concurrency and using atomic bitops for them is pointless and expensive. Use the non-atomic variants
sched/fair: Use non-atomic cpumask_{set,clear}_cpu()
The cpumasks updated here are not subject to concurrency and using atomic bitops for them is pointless and expensive. Use the non-atomic variants instead.
Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Viresh Kumar <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vincent Guittot <[email protected]> Link: http://lkml.kernel.org/r/2e2a10f84b9049a81eef94ed6d5989447c21e34a.1549963617.git.viresh.kumar@linaro.org Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6 |
|
| #
dfcb245e |
| 03-Dec-2018 |
Ingo Molnar <[email protected]> |
sched: Fix various typos in comments
Go over the scheduler source code and fix common typos in comments - and a typo in an actual variable name.
No change in functionality intended.
Cc: Peter Zijl
sched: Fix various typos in comments
Go over the scheduler source code and fix common typos in comments - and a typo in an actual variable name.
No change in functionality intended.
Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1, v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4 |
|
| #
325ea10c |
| 03-Mar-2018 |
Ingo Molnar <[email protected]> |
sched/headers: Simplify and clean up header usage in the scheduler
Do the following cleanups and simplifications:
- sched/sched.h already includes <asm/paravirt.h>, so no need to include it in
sched/headers: Simplify and clean up header usage in the scheduler
Do the following cleanups and simplifications:
- sched/sched.h already includes <asm/paravirt.h>, so no need to include it in sched/core.c again.
- order the <linux/sched/*.h> headers alphabetically
- add all <linux/sched/*.h> headers to kernel/sched/sched.h
- remove all unnecessary includes from the .c files that are already included in kernel/sched/sched.h.
Finally, make all scheduler .c files use a single common header:
#include "sched.h"
... which now contains a union of the relied upon headers.
This makes the various .c files easier to read and easier to handle.
Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
| #
97fb7a0a |
| 03-Mar-2018 |
Ingo Molnar <[email protected]> |
sched: Clean up and harmonize the coding style of the scheduler code base
A good number of small style inconsistencies have accumulated in the scheduler core, so do a pass over them to harmonize all
sched: Clean up and harmonize the coding style of the scheduler code base
A good number of small style inconsistencies have accumulated in the scheduler core, so do a pass over them to harmonize all these details:
- fix speling in comments,
- use curly braces for multi-line statements,
- remove unnecessary parentheses from integer literals,
- capitalize consistently,
- remove stray newlines,
- add comments where necessary,
- remove invalid/unnecessary comments,
- align structure definitions and other data types vertically,
- add missing newlines for increased readability,
- fix vertical tabulation where it's misaligned,
- harmonize preprocessor conditional block labeling and vertical alignment,
- remove line-breaks where they uglify the code,
- add newline after local variable definitions,
No change in functionality:
md5: 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.before.asm 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.after.asm
Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v4.16-rc3 |
|
| #
d84b3131 |
| 21-Feb-2018 |
Frederic Weisbecker <[email protected]> |
sched/isolation: Offload residual 1Hz scheduler tick
When a CPU runs in full dynticks mode, a 1Hz tick remains in order to keep the scheduler stats alive. However this residual tick is a burden for
sched/isolation: Offload residual 1Hz scheduler tick
When a CPU runs in full dynticks mode, a 1Hz tick remains in order to keep the scheduler stats alive. However this residual tick is a burden for bare metal tasks that can't stand any interruption at all, or want to minimize them.
The usual boot parameters "nohz_full=" or "isolcpus=nohz" will now outsource these scheduler ticks to the global workqueue so that a housekeeping CPU handles those remotely. The sched_class::task_tick() implementations have been audited and look safe to be called remotely as the target runqueue and its current task are passed in parameter and don't seem to be accessed locally.
Note that in the case of using isolcpus, it's still up to the user to affine the global workqueues to the housekeeping CPUs through /sys/devices/virtual/workqueue/cpumask or domains isolation "isolcpus=nohz,domain".
Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Wanpeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
| #
1bda3f80 |
| 21-Feb-2018 |
Frederic Weisbecker <[email protected]> |
sched/isolation: Isolate workqueues when "nohz_full=" is set
As we prepare for offloading the residual 1hz scheduler ticks to workqueue, let's affine those to housekeepers so that they don't interru
sched/isolation: Isolate workqueues when "nohz_full=" is set
As we prepare for offloading the residual 1hz scheduler ticks to workqueue, let's affine those to housekeepers so that they don't interrupt the CPUs that don't want to be disturbed.
Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Wanpeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|