=========================
CPU hotplug in the Kernel
=========================

:Date: September, 2021
:Author: Sebastian Andrzej Siewior <[email protected]>,
         Rusty Russell <[email protected]>,
         Srivatsa Vaddagiri <[email protected]>,
         Ashok Raj <[email protected]>,
         Joel Schopp <[email protected]>,
         Thomas Gleixner <[email protected]>

Introduction
============

Modern advances in system architectures have introduced advanced error
reporting and correction capabilities in processors. A couple of OEMs
support NUMA hardware which is hot pluggable as well, where physical node
insertion and removal require support for CPU hotplug.

Such advances require CPUs available to a kernel to be removed either for
provisioning reasons, or for RAS purposes to keep an offending CPU off
the system execution path. Hence the need for CPU hotplug support in the
Linux kernel.

A more novel use of CPU-hotplug support is its use today in suspend/resume
support for SMP.
Dual-core and HT support means that even a laptop runs SMP kernels,
which previously did not support these methods.


Command Line Switches
=====================
``maxcpus=n``
  Restrict boot time CPUs to *n*. For example, if you have four CPUs, using
  ``maxcpus=2`` will only boot two. You can choose to bring the
  other CPUs online later.

``nr_cpus=n``
  Restrict the total number of CPUs the kernel will support. If the number
  supplied here is lower than the number of physically available CPUs, then
  the excess CPUs cannot be brought online later.

``possible_cpus=n``
  This option sets ``possible_cpus`` bits in ``cpu_possible_mask``.

  This option is limited to the X86 and S390 architectures.

``cpu0_hotplug``
  Allow CPU0 to be shut down.

  This option is limited to the X86 architecture.

CPU maps
========

``cpu_possible_mask``
  Bitmap of possible CPUs that can ever be available in the
  system.
  This is used to allocate some boot time memory for per_cpu variables
  that aren't designed to grow/shrink as CPUs are made available or removed.
  Once set during boot time discovery phase, the map is static, i.e. no bits
  are added or removed at any time. Trimming it accurately for your system
  needs upfront can save some boot time memory.

``cpu_online_mask``
  Bitmap of all CPUs currently online. It is set in ``__cpu_up()``
  after a CPU is available for kernel scheduling and ready to receive
  interrupts from devices. It is cleared when a CPU is brought down using
  ``__cpu_disable()``, before which all OS services including interrupts are
  migrated to another target CPU.

``cpu_present_mask``
  Bitmap of CPUs currently present in the system. Not all
  of them may be online. When physical hotplug is processed by the relevant
  subsystem (e.g. ACPI), the map can change: a bit is either added to or
  removed from the map depending on whether the event is a hot-add or a
  hot-remove. There are currently no locking rules. Typical usage is to
  init topology during boot, at which time hotplug is disabled.

You really don't need to manipulate any of the system CPU maps. They should
be read-only for most use.
When setting up per-cpu resources almost always use
``cpu_possible_mask`` or ``for_each_possible_cpu()`` to iterate. The macro
``for_each_cpu()`` can be used to iterate over a custom CPU mask.

Never use anything other than ``cpumask_t`` to represent a bitmap of CPUs.


Using CPU hotplug
=================

The kernel option *CONFIG_HOTPLUG_CPU* needs to be enabled. It is currently
available on multiple architectures including ARM, MIPS, PowerPC and X86. The
configuration is done via the sysfs interface::

 $ ls -lh /sys/devices/system/cpu
 total 0
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu0
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu1
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu2
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu3
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu4
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu5
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu6
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu7
 drwxr-xr-x  2 root root    0 Dec 21 16:33 hotplug
 -r--r--r--  1 root root 4.0K Dec 21 16:33 offline
 -r--r--r--  1 root root 4.0K Dec 21 16:33 online
 -r--r--r--  1 root root 4.0K Dec 21 16:33 possible
 -r--r--r--  1 root root 4.0K Dec 21 16:33 present

The files *offline*, *online*, *possible*, *present* represent the CPU masks.
Each CPU folder contains an *online* file which controls the logical on (1) and
off (0) state. To logically shut down CPU4::

 $ echo 0 > /sys/devices/system/cpu/cpu4/online
 smpboot: CPU 4 is now offline

Once the CPU is shut down, it will be removed from */proc/interrupts* and
*/proc/cpuinfo* and should no longer be visible in the *top* command. To
bring CPU4 back online::

 $ echo 1 > /sys/devices/system/cpu/cpu4/online
 smpboot: Booting Node 0 Processor 4 APIC 0x1

The CPU is usable again. This should work on all CPUs, but CPU0 is often special
and excluded from CPU hotplug.
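
The mask files use the kernel's CPU list notation, for example ``0-7`` or
``0-3,5-7``. As a rough illustration, the following userspace C sketch counts
the CPUs named in such a string. The function names ``count_cpus_in_list()``
and ``count_cpus_in_file()`` are made up for this example and are not part of
any kernel or libc interface.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Count the CPUs named in a CPU list string such as "0-3,5-7".
 * An empty string (or a lone newline) counts as zero CPUs.
 * Returns -1 if the string is malformed.
 */
int count_cpus_in_list(const char *list)
{
	const char *p = list;
	int count = 0;

	assert(list != NULL);

	while (*p && *p != '\n') {
		char *end;
		long first = strtol(p, &end, 10);
		long last = first;

		if (end == p || first < 0)
			return -1;
		p = end;

		/* Optional "-last" part of a range */
		if (*p == '-') {
			p++;
			last = strtol(p, &end, 10);
			if (end == p || last < first)
				return -1;
			p = end;
		}
		count += (int)(last - first + 1);

		if (*p == ',')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}
	return count;
}

/*
 * Read one mask file, e.g. /sys/devices/system/cpu/online, and count
 * the CPUs listed in it. Returns -1 if the file cannot be read.
 */
int count_cpus_in_file(const char *path)
{
	char buf[4096];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = count_cpus_in_list(buf);
	else if (!ferror(f))
		ret = 0;	/* empty file: no CPU in this mask */
	fclose(f);
	return ret;
}
```

On the eight-CPU system shown above, ``count_cpus_in_file()`` would be
expected to report eight for the *online* file before CPU4 is shut down and
seven afterwards, assuming the file then reads ``0-3,5-7``.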

The CPU hotplug coordination
============================

The offline case
----------------

Once a CPU has been logically shut down, the teardown callbacks of registered
hotplug states will be invoked, starting with ``CPUHP_ONLINE`` and terminating
at state ``CPUHP_OFFLINE``. This includes:

* If tasks are frozen due to a suspend operation then *cpuhp_tasks_frozen*
  will be set to true.
* All processes are migrated away from this outgoing CPU to new CPUs.
  The new CPU is chosen from each process' current cpuset, which may be
  a subset of all online CPUs.
* All interrupts targeted to this CPU are migrated to a new CPU.
* Timers are also migrated to a new CPU.
* Once all services are migrated, the kernel calls an arch specific routine
  ``__cpu_disable()`` to perform arch specific cleanup.


The CPU hotplug API
===================

CPU hotplug state machine
-------------------------

CPU hotplug uses a trivial state machine with a linear state space from
CPUHP_OFFLINE to CPUHP_ONLINE. Each state has a startup and a teardown
callback.

When a CPU is onlined, the startup callbacks are invoked sequentially until
the state CPUHP_ONLINE is reached. They can also be invoked when the
callbacks of a state are set up or an instance is added to a multi-instance
state.

When a CPU is offlined, the teardown callbacks are invoked sequentially in
reverse order until the state CPUHP_OFFLINE is reached. They can also
be invoked when the callbacks of a state are removed or an instance is
removed from a multi-instance state.

If a usage site requires only a callback in one direction of the hotplug
operations (CPU online or CPU offline) then the other not-required callback
can be set to NULL when the state is set up.
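
The linear walk described above, including the skipping of NULL callbacks,
can be modeled in a few lines of userspace C. This is only a sketch under
simplifying assumptions: the five-entry state table, the ``hp_`` names and the
failure injection on CPU 3 are invented for illustration, and the model
ignores teardown failures. It is not the implementation in kernel/cpu.c. The
rollback in ``hp_bring_up()`` mirrors the behaviour this document describes
for a failing startup callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature state space standing in for
 * CPUHP_OFFLINE .. CPUHP_ONLINE. */
enum { HP_OFFLINE = 0, HP_ONLINE = 4 };

typedef int (*hp_cb)(unsigned int cpu);

struct hp_step {
	hp_cb startup;	/* NULL: nothing to do on the way up */
	hp_cb teardown;	/* NULL: nothing to do on the way down */
};

int hp_ups;	/* number of startup callbacks invoked */
int hp_downs;	/* number of teardown callbacks invoked */

static int up_ok(unsigned int cpu) { (void)cpu; hp_ups++; return 0; }
static int down_ok(unsigned int cpu) { (void)cpu; hp_downs++; return 0; }

/* Failure injection for the demo: this startup fails on CPU 3 only. */
static int up_fail_cpu3(unsigned int cpu)
{
	hp_ups++;
	return cpu == 3 ? -1 : 0;
}

/* Entry i holds the callbacks of state i; unset entries are NULL/NULL. */
const struct hp_step hp_demo_steps[HP_ONLINE + 1] = {
	[1] = { up_ok, down_ok },
	[2] = { NULL, down_ok },
	[3] = { up_fail_cpu3, NULL },
};

/*
 * Walk from state cur up to target, invoking the startup callbacks in
 * order. If one fails, the states passed so far are torn down again in
 * reverse order. Returns the state the CPU ends up in; *err holds the
 * callback's error code or 0.
 */
int hp_bring_up(const struct hp_step *steps, unsigned int cpu,
		int cur, int target, int *err)
{
	int start = cur;

	assert(cur <= target);
	*err = 0;

	while (cur < target) {
		const struct hp_step *s = &steps[cur + 1];

		if (s->startup) {
			*err = s->startup(cpu);
			if (*err)
				break;
		}
		cur++;
	}

	if (*err) {
		/* Roll back to where the operation started. */
		while (cur > start) {
			if (steps[cur].teardown)
				steps[cur].teardown(cpu);
			cur--;
		}
	}
	return cur;
}

/* Walk from state cur down to target in reverse order. */
int hp_bring_down(const struct hp_step *steps, unsigned int cpu,
		  int cur, int target)
{
	assert(cur >= target);

	while (cur > target) {
		if (steps[cur].teardown)
			steps[cur].teardown(cpu);
		cur--;
	}
	return cur;
}
```

Bringing up CPU 1 with ``hp_demo_steps`` reaches ``HP_ONLINE``. Bringing up
CPU 3 fails in state 3, so the teardown callbacks of states 2 and 1 run and
the CPU ends up in ``HP_OFFLINE`` again.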

The state space is divided into three sections:

* The PREPARE section

  The PREPARE section covers the state space from CPUHP_OFFLINE to
  CPUHP_BRINGUP_CPU.

  The startup callbacks in this section are invoked before the CPU is
  started during a CPU online operation. The teardown callbacks are invoked
  after the CPU has become dysfunctional during a CPU offline operation.

  The callbacks are invoked on a control CPU as they obviously can't run on
  the hotplugged CPU which is either not yet started or has become
  dysfunctional already.

  The startup callbacks are used to set up resources which are required to
  bring a CPU successfully online. The teardown callbacks are used to free
  resources or to move pending work to an online CPU after the hotplugged
  CPU became dysfunctional.

  The startup callbacks are allowed to fail. If a callback fails, the CPU
  online operation is aborted and the CPU is brought down to the previous
  state (usually CPUHP_OFFLINE) again.

  The teardown callbacks in this section are not allowed to fail.

* The STARTING section

  The STARTING section covers the state space between CPUHP_BRINGUP_CPU + 1
  and CPUHP_AP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  with interrupts disabled during a CPU online operation in the early CPU
  setup code. The teardown callbacks are invoked with interrupts disabled
  on the hotplugged CPU during a CPU offline operation shortly before the
  CPU is completely shut down.

  The callbacks in this section are not allowed to fail.

  The callbacks are used for low level hardware initialization/shutdown and
  for core subsystems.

* The ONLINE section

  The ONLINE section covers the state space between CPUHP_AP_ONLINE + 1 and
  CPUHP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  during a CPU online operation. The teardown callbacks are invoked on the
  hotplugged CPU during a CPU offline operation.

  The callbacks are invoked in the context of the per CPU hotplug thread,
  which is pinned on the hotplugged CPU. The callbacks are invoked with
  interrupts and preemption enabled.

  The callbacks are allowed to fail. When a callback fails the hotplug
  operation is aborted and the CPU is brought back to the previous state.

CPU online/offline operations
-----------------------------

A successful online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ...
  [CPUHP_ONLINE - 1]->startup()        -> success
  [CPUHP_ONLINE]

A successful offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()    -> success
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()        -> success
  ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
  ...
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]

A failed online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ...
  [CPUHP_AP_ONLINE + N]->startup()     -> fail
  [CPUHP_AP_ONLINE + (N - 1)]->teardown()
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()
  ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
  ...
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]

A failed offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()
  ...
  [CPUHP_ONLINE - 1]->startup()
  [CPUHP_ONLINE]

Recursive failures cannot be handled sensibly. Look at the following
example of a recursive fail due to a failed offline operation::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail

The CPU hotplug state machine stops right here and does not try to go back
down again because that would likely result in an endless loop::

  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail
  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail

Lather, rinse and repeat. In this case the CPU is left in state::

  [CPUHP_ONLINE - (N - 1)]

which at least lets the system make progress and gives the user a chance to
debug or even resolve the situation.

Allocating a state
------------------

There are two ways to allocate a CPU hotplug state:

* Static allocation

  Static allocation has to be used when the subsystem or driver has
  ordering requirements versus other CPU hotplug states. E.g. the PERF core
  startup callback has to be invoked before the PERF driver startup
  callbacks during a CPU online operation.
  During a CPU offline operation
  the driver teardown callbacks have to be invoked before the core teardown
  callback. The statically allocated states are described by constants in
  the cpuhp_state enum which can be found in include/linux/cpuhotplug.h.

  Insert the state into the enum at the proper place so the ordering
  requirements are fulfilled. The state constant has to be used for state
  setup and removal.

  Static allocation is also required when the state callbacks are not set
  up at runtime and are part of the initializer of the CPU hotplug state
  array in kernel/cpu.c.

* Dynamic allocation

  When there are no ordering requirements for the state callbacks then
  dynamic allocation is the preferred method. The state number is allocated
  by the setup function and returned to the caller on success.

  Only the PREPARE and ONLINE sections provide a dynamic allocation
  range. The STARTING section does not as most of the callbacks in that
  section have explicit ordering requirements.
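
To make the idea of a dynamic allocation range more concrete, here is a
small userspace C model of it. Everything in it is an assumption made for
illustration: the range ``DYN_FIRST`` .. ``DYN_LAST``, the ``dyn_state_*()``
helpers and the first-free-slot policy are hypothetical and do not reproduce
the real state numbers or the code in kernel/cpu.c. The only properties taken
from the text are that the allocated state number is returned to the caller
on success and that a freed number becomes available again.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical layout: states 10..13 form the dynamic range. */
enum {
	DYN_FIRST = 10,
	DYN_LAST = 13,
	NR_DEMO_STATES = 20,
};

/* A slot is free while its name is NULL. */
static const char *demo_state_names[NR_DEMO_STATES];

/*
 * Reserve the first free slot of the dynamic range.
 * Returns the allocated state number (> 0) or -ENOSPC when the
 * range is exhausted.
 */
int dyn_state_alloc(const char *name)
{
	int i;

	assert(name != NULL);

	for (i = DYN_FIRST; i <= DYN_LAST; i++) {
		if (!demo_state_names[i]) {
			demo_state_names[i] = name;
			return i;
		}
	}
	return -ENOSPC;
}

/* Release a dynamically allocated state so it can be handed out again. */
void dyn_state_free(int state)
{
	assert(state >= DYN_FIRST && state <= DYN_LAST);
	demo_state_names[state] = NULL;
}
```

A caller in this model has to remember the returned number, because it is
the only handle for releasing the slot later. The same applies to the value
returned by the real setup functions for dynamically allocated states.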

Setup of a CPU hotplug state
----------------------------

The core code provides the following functions to set up a state:

* cpuhp_setup_state(state, name, startup, teardown)
* cpuhp_setup_state_nocalls(state, name, startup, teardown)
* cpuhp_setup_state_cpuslocked(state, name, startup, teardown)
* cpuhp_setup_state_nocalls_cpuslocked(state, name, startup, teardown)

For cases where a driver or a subsystem has multiple instances and the same
CPU hotplug state callbacks need to be invoked for each instance, the CPU
hotplug core provides multi-instance support. The advantage over driver
specific instance lists is that the instance related functions are fully
serialized against CPU hotplug operations and provide the automatic
invocations of the state callbacks on add and removal. To set up such a
multi-instance state the following function is available:

* cpuhp_setup_state_multi(state, name, startup, teardown)

The @state argument is either a statically allocated state or one of the
constants for dynamically allocated states - CPUHP_BP_PREPARE_DYN,
CPUHP_AP_ONLINE_DYN - depending on the state section (PREPARE, ONLINE) for
which a dynamic state should be allocated.

The @name argument is used for sysfs output and for instrumentation.
The naming convention is "subsys:mode" or "subsys/driver:mode",
e.g. "perf:mode" or "perf/x86:mode". The common mode names are:

======== =======================================================
prepare  For states in the PREPARE section

dead     For states in the PREPARE section which do not provide
         a startup callback

starting For states in the STARTING section

dying    For states in the STARTING section which do not provide
         a startup callback

online   For states in the ONLINE section

offline  For states in the ONLINE section which do not provide
         a startup callback
======== =======================================================

As the @name argument is only used for sysfs and instrumentation other mode
descriptors can be used as well if they describe the nature of the state
better than the common ones.

Examples for @name arguments: "perf/online", "perf/x86:prepare",
"RCU/tree:dying", "sched/waitempty"

The @startup argument is a function pointer to the callback which should be
invoked during a CPU online operation. If the usage site does not require a
startup callback set the pointer to NULL.

The @teardown argument is a function pointer to the callback which should
be invoked during a CPU offline operation. If the usage site does not
require a teardown callback set the pointer to NULL.

The functions differ in how the installed callbacks are treated:

  * cpuhp_setup_state_nocalls(), cpuhp_setup_state_nocalls_cpuslocked()
    and cpuhp_setup_state_multi() only install the callbacks

  * cpuhp_setup_state() and cpuhp_setup_state_cpuslocked() install the
    callbacks and invoke the @startup callback (if not NULL) for all online
    CPUs which currently have a state greater than the newly installed
    state. Depending on the state section the callback is either invoked on
    the current CPU (PREPARE section) or on each online CPU (ONLINE
    section) in the context of the CPU's hotplug thread.

    If a callback fails for CPU N then the teardown callback for CPU
    0 .. N-1 is invoked to rollback the operation. The state setup fails,
    the callbacks for the state are not installed and in case of dynamic
    allocation the allocated state is freed.

The state setup and the callback invocations are serialized against CPU
hotplug operations. If the setup function has to be called from a CPU
hotplug read locked region, then the _cpuslocked() variants have to be
used.
These functions cannot be used from within CPU hotplug callbacks.

The function return values:
  ======== ===================================================================
  0        Statically allocated state was successfully set up

  >0       Dynamically allocated state was successfully set up.

           The returned number is the state number which was allocated. If
           the state callbacks have to be removed later, e.g. module
           removal, then this number has to be saved by the caller and used
           as @state argument for the state remove function. For
           multi-instance states the dynamically allocated state number is
           also required as @state argument for the instance add/remove
           operations.

  <0       Operation failed
  ======== ===================================================================

Removal of a CPU hotplug state
------------------------------

To remove a previously set up state, the following functions are provided:

* cpuhp_remove_state(state)
* cpuhp_remove_state_nocalls(state)
* cpuhp_remove_state_nocalls_cpuslocked(state)
* cpuhp_remove_multi_state(state)

The @state argument is either a statically allocated state or the state
number which was allocated in the dynamic range by cpuhp_setup_state*(). If
the state is in the dynamic range, then the state number is freed and
available for dynamic allocation again.

The functions differ in how the installed callbacks are treated:

  * cpuhp_remove_state_nocalls(), cpuhp_remove_state_nocalls_cpuslocked()
    and cpuhp_remove_multi_state() only remove the callbacks.

  * cpuhp_remove_state() removes the callbacks and invokes the teardown
    callback (if not NULL) for all online CPUs which currently have a state
    greater than the removed state.
Depending on the state section the 494c9871c80SThomas Gleixner callback is either invoked on the current CPU (PREPARE section) or on 495c9871c80SThomas Gleixner each online CPU (ONLINE section) in the context of the CPU's hotplug 496c9871c80SThomas Gleixner thread. 497c9871c80SThomas Gleixner 498c9871c80SThomas Gleixner In order to complete the removal, the teardown callback should not fail. 499c9871c80SThomas Gleixner 500c9871c80SThomas GleixnerThe state removal and the callback invocations are serialized against CPU 501c9871c80SThomas Gleixnerhotplug operations. If the remove function has to be called from a CPU 502c9871c80SThomas Gleixnerhotplug read locked region, then the _cpuslocked() variants have to be 503c9871c80SThomas Gleixnerused. These functions cannot be used from within CPU hotplug callbacks. 504c9871c80SThomas Gleixner 505c9871c80SThomas GleixnerIf a multi-instance state is removed then the caller has to remove all 506c9871c80SThomas Gleixnerinstances first. 507c9871c80SThomas Gleixner 508c9871c80SThomas GleixnerMulti-Instance state instance management 509c9871c80SThomas Gleixner---------------------------------------- 510c9871c80SThomas Gleixner 511c9871c80SThomas GleixnerOnce the multi-instance state is set up, instances can be added to the 512c9871c80SThomas Gleixnerstate: 513c9871c80SThomas Gleixner 514c9871c80SThomas Gleixner * cpuhp_state_add_instance(state, node) 515c9871c80SThomas Gleixner * cpuhp_state_add_instance_nocalls(state, node) 516c9871c80SThomas Gleixner 517c9871c80SThomas GleixnerThe @state argument is either a statically allocated state or the state 518c9871c80SThomas Gleixnernumber which was allocated in the dynamic range by cpuhp_setup_state_multi(). 519c9871c80SThomas Gleixner 520c9871c80SThomas GleixnerThe @node argument is a pointer to an hlist_node which is embedded in the 521c9871c80SThomas Gleixnerinstance's data structure. 
The pointer is handed to the multi-instance 522c9871c80SThomas Gleixnerstate callbacks and can be used by the callback to retrieve the instance 523c9871c80SThomas Gleixnervia container_of(). 524c9871c80SThomas Gleixner 525c9871c80SThomas GleixnerThe functions differ in the way how the installed callbacks are treated: 526c9871c80SThomas Gleixner 527c9871c80SThomas Gleixner * cpuhp_state_add_instance_nocalls() and only adds the instance to the 528c9871c80SThomas Gleixner multi-instance state's node list. 529c9871c80SThomas Gleixner 530c9871c80SThomas Gleixner * cpuhp_state_add_instance() adds the instance and invokes the startup 531c9871c80SThomas Gleixner callback (if not NULL) associated with @state for all online CPUs which 532c9871c80SThomas Gleixner have currently a state greater than @state. The callback is only 533c9871c80SThomas Gleixner invoked for the to be added instance. Depending on the state section 534c9871c80SThomas Gleixner the callback is either invoked on the current CPU (PREPARE section) or 535c9871c80SThomas Gleixner on each online CPU (ONLINE section) in the context of the CPU's hotplug 536c9871c80SThomas Gleixner thread. 537c9871c80SThomas Gleixner 538c9871c80SThomas Gleixner If a callback fails for CPU N then the teardown callback for CPU 539c9871c80SThomas Gleixner 0 .. N-1 is invoked to rollback the operation, the function fails and 540c9871c80SThomas Gleixner the instance is not added to the node list of the multi-instance state. 541c9871c80SThomas Gleixner 542c9871c80SThomas GleixnerTo remove an instance from the state's node list these functions are 543c9871c80SThomas Gleixneravailable: 544c9871c80SThomas Gleixner 545c9871c80SThomas Gleixner * cpuhp_state_remove_instance(state, node) 546c9871c80SThomas Gleixner * cpuhp_state_remove_instance_nocalls(state, node) 547c9871c80SThomas Gleixner 548d2bef8e1SAkhil RajThe arguments are the same as for the cpuhp_state_add_instance*() 549c9871c80SThomas Gleixnervariants above. 

The functions differ in how the installed callbacks are treated:

  * cpuhp_state_remove_instance_nocalls() only removes the instance from the
    state's node list.

  * cpuhp_state_remove_instance() removes the instance and invokes the
    teardown callback (if not NULL) associated with @state for all online
    CPUs which currently have a state greater than @state. The callback is
    only invoked for the instance being removed. Depending on the state
    section the callback is either invoked on the current CPU (PREPARE
    section) or on each online CPU (ONLINE section) in the context of the
    CPU's hotplug thread.

    In order to complete the removal, the teardown callback should not fail.

The node list add/remove operations and the callback invocations are
serialized against CPU hotplug operations. These functions cannot be used
from within CPU hotplug callbacks and CPU hotplug read locked regions.
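
The instance handling described in this section can be illustrated with a
small user space sketch. It shows how a multi-instance callback might recover
its instance from the @node pointer via container_of(), and how a failed
startup on CPU N is rolled back on CPUs 0 .. N-1. This is only a simplified
model under stated assumptions, not the kernel implementation:
``struct subsys_inst``, its ``fail_on_cpu`` field, ``DEMO_NR_CPUS`` and
``demo_add_instance()`` are hypothetical, and ``struct hlist_node`` and
``container_of()`` are redefined locally so that the code builds outside the
kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the kernel definitions (assumption of this sketch). */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define DEMO_NR_CPUS 4	/* hypothetical number of online CPUs */

/* Hypothetical instance data with the embedded hlist_node. */
struct subsys_inst {
	int id;
	int fail_on_cpu;		/* CPU on which startup fails, -1 for none */
	int active[DEMO_NR_CPUS];	/* 1 while set up on that CPU */
	struct hlist_node node;
};

/* Startup callback: retrieves the instance from the node pointer. */
static int subsys_cpu_online(unsigned int cpu, struct hlist_node *node)
{
	struct subsys_inst *inst = container_of(node, struct subsys_inst, node);

	assert(cpu < DEMO_NR_CPUS);
	if ((int)cpu == inst->fail_on_cpu)
		return -1;
	inst->active[cpu] = 1;
	return 0;
}

/* Teardown callback: undoes what the startup callback did. */
static int subsys_cpu_offline(unsigned int cpu, struct hlist_node *node)
{
	struct subsys_inst *inst = container_of(node, struct subsys_inst, node);

	assert(cpu < DEMO_NR_CPUS);
	inst->active[cpu] = 0;
	return 0;
}

/*
 * Model of adding an instance: invoke startup for each CPU in turn. If it
 * fails on CPU N, invoke teardown for CPUs N-1 .. 0 and report the error.
 */
int demo_add_instance(struct hlist_node *node)
{
	unsigned int cpu;
	int ret;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		ret = subsys_cpu_online(cpu, node);
		if (ret) {
			while (cpu-- > 0)
				subsys_cpu_offline(cpu, node);
			return ret;
		}
	}
	return 0;
}
```

In this model a caller would pass ``&inst->node`` to ``demo_add_instance()``,
in the same way as the @node argument is passed to the add/remove functions
above.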

Examples
--------

Setup and teardown a statically allocated state in the STARTING section for
notifications on online and offline operations::

   ret = cpuhp_setup_state(CPUHP_SUBSYS_STARTING, "subsys:starting", subsys_cpu_starting, subsys_cpu_dying);
   if (ret < 0)
        return ret;
   ....
   cpuhp_remove_state(CPUHP_SUBSYS_STARTING);

Setup and teardown a dynamically allocated state in the ONLINE section
for notifications on offline operations::

   state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline);
   if (state < 0)
       return state;
   ....
   cpuhp_remove_state(state);

Setup and teardown a dynamically allocated state in the ONLINE section
for notifications on online operations without invoking the callbacks::

   state = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, NULL);
   if (state < 0)
       return state;
   ....
   cpuhp_remove_state_nocalls(state);

Setup, use and teardown a dynamically allocated multi-instance state in the
ONLINE section for notifications on online and offline operation::

   state = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline);
   if (state < 0)
       return state;
   ....
   ret = cpuhp_state_add_instance(state, &inst1->node);
   if (ret)
        return ret;
   ....
   ret = cpuhp_state_add_instance(state, &inst2->node);
   if (ret)
        return ret;
   ....
   cpuhp_state_remove_instance(state, &inst1->node);
   ....
   cpuhp_state_remove_instance(state, &inst2->node);
   ....
   cpuhp_remove_multi_state(state);


Testing of hotplug states
=========================

One way to verify whether a custom state is working as expected or not is to
shut down a CPU and then put it online again. It is also possible to put the
CPU into a certain state (for instance *CPUHP_AP_ONLINE*) and then go back to
*CPUHP_ONLINE*. This would simulate an error one state after *CPUHP_AP_ONLINE*
which would lead to rollback to the online state.

All registered states are enumerated in ``/sys/devices/system/cpu/hotplug/states`` ::

 $ tail /sys/devices/system/cpu/hotplug/states
 138: mm/vmscan:online
 139: mm/vmstat:online
 140: lib/percpu_cnt:online
 141: acpi/cpu-drv:online
 142: base/cacheinfo:online
 143: virtio/net:online
 144: x86/mce:online
 145: printk:online
 168: sched:active
 169: online

To rollback CPU4 to ``lib/percpu_cnt:online`` and back online just issue::

  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169
  $ echo 140 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  140

It is important to note that the teardown callback of state 140 has been
invoked. And now get back online::

  $ echo 169 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169

With trace events enabled, the individual steps are visible, too::

  #  TASK-PID   CPU#    TIMESTAMP  FUNCTION
  #     | |       |        |         |
      bash-394  [001]  22.976: cpuhp_enter: cpu: 0004 target: 140 step: 169 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  22.977: cpuhp_enter: cpu: 0004 target: 140 step: 168 (sched_cpu_deactivate)
   cpuhp/4-31   [004]  22.990: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
   cpuhp/4-31   [004]  22.991: cpuhp_enter: cpu: 0004 target: 140 step: 144 (mce_cpu_pre_down)
   cpuhp/4-31   [004]  22.992: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  22.993: cpuhp_multi_enter: cpu: 0004 target: 140 step: 143 (virtnet_cpu_down_prep)
   cpuhp/4-31   [004]  22.994: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  22.995: cpuhp_enter: cpu: 0004 target: 140 step: 142 (cacheinfo_cpu_pre_down)
   cpuhp/4-31   [004]  22.996: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
      bash-394  [001]  22.997: cpuhp_exit:  cpu: 0004  state: 140 step: 169 ret: 0
      bash-394  [005]  95.540: cpuhp_enter: cpu: 0004 target: 169 step: 140 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  95.541: cpuhp_enter: cpu: 0004 target: 169 step: 141 (acpi_soft_cpu_online)
   cpuhp/4-31   [004]  95.542: cpuhp_exit:  cpu: 0004  state: 141 step: 141 ret: 0
   cpuhp/4-31   [004]  95.543: cpuhp_enter: cpu: 0004 target: 169 step: 142 (cacheinfo_cpu_online)
   cpuhp/4-31   [004]  95.544: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
   cpuhp/4-31   [004]  95.545: cpuhp_multi_enter: cpu: 0004 target: 169 step: 143 (virtnet_cpu_online)
   cpuhp/4-31   [004]  95.546: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  95.547: cpuhp_enter: cpu: 0004 target: 169 step: 144 (mce_cpu_online)
   cpuhp/4-31   [004]  95.548: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  95.549: cpuhp_enter: cpu: 0004 target: 169 step: 145 (console_cpu_notify)
   cpuhp/4-31   [004]  95.550: cpuhp_exit:  cpu: 0004  state: 145 step: 145 ret: 0
   cpuhp/4-31   [004]  95.551: cpuhp_enter: cpu: 0004 target: 169 step: 168 (sched_cpu_activate)
   cpuhp/4-31   [004]  95.552: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
      bash-394  [005]  95.553: cpuhp_exit:  cpu: 0004  state: 169 step: 140 ret: 0

As can be seen, CPU4 went down until timestamp 22.996 and then back up until
95.552. All invoked callbacks including their return codes are visible in the
trace.
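
The ordering of the steps in the trace can be mimicked with a small user space
model. The following is a sketch only and makes no claim about how the kernel
implements the walk: it follows the order visible in the trace, where the
steps on the way down run from the highest state towards the target and the
steps on the way up run in ascending order. ``demo_set_target()``,
``demo_record()`` and the log array are hypothetical names, and every state is
assumed to have both callbacks installed.

```c
#include <assert.h>

#define DEMO_LOG_MAX 64

/* Invocation log: +N records a startup step for state N, -N a teardown step. */
static int demo_log[DEMO_LOG_MAX];
static int demo_log_len;

static void demo_record(int step)
{
	assert(demo_log_len < DEMO_LOG_MAX);
	demo_log[demo_log_len++] = step;
}

/*
 * Move a modelled CPU from *state to target one step at a time.
 * Going down records a teardown step for each state above the target,
 * highest first. Going up records a startup step for each state up to
 * and including the target, lowest first.
 */
int demo_set_target(int *state, int target)
{
	if (target < 0)
		return -1;

	while (*state > target) {
		demo_record(-*state);
		(*state)--;
	}
	while (*state < target) {
		(*state)++;
		demo_record(*state);
	}
	return 0;
}
```

With a modelled CPU in state 7, setting the target to 4 logs the steps -7, -6
and -5, and setting the target back to 7 logs 5, 6 and 7.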

Architecture's requirements
===========================

The following functions and configurations are required:

``CONFIG_HOTPLUG_CPU``
  This entry needs to be enabled in Kconfig

``__cpu_up()``
  Arch interface to bring up a CPU

``__cpu_disable()``
  Arch interface to shut down a CPU. No more interrupts can be handled by the
  kernel after the routine returns. This includes the shutdown of the timer.

``__cpu_die()``
  This is supposed to ensure the death of the CPU. Look at some example code
  in other architectures that implement CPU hotplug. The processor is taken
  down from the ``idle()`` loop for that specific architecture. ``__cpu_die()``
  typically waits for some per_cpu state to be set, to positively ensure that
  the processor dead routine has been called.

User Space Notification
=======================

After a CPU has been successfully onlined or offlined, udev events are sent.
A udev rule like::

  SUBSYSTEM=="cpu", DRIVERS=="processor", DEVPATH=="/devices/system/cpu/*", RUN+="the_hotplug_receiver.sh"

will receive all events. A script like::

  #!/bin/sh

  if [ "${ACTION}" = "offline" ]
  then
      echo "CPU ${DEVPATH##*/} offline"

  elif [ "${ACTION}" = "online" ]
  then
      echo "CPU ${DEVPATH##*/} online"

  fi

can process the event further.

When changes to the CPUs in the system occur, the sysfs file
/sys/devices/system/cpu/crash_hotplug contains '1' if the kernel
updates the kdump capture kernel list of CPUs itself (via elfcorehdr and
other relevant kexec segments), or '0' if userspace must update the kdump
capture kernel list of CPUs.

The availability depends on the CONFIG_HOTPLUG_CPU kernel configuration
option.

To skip userspace processing of CPU hot un/plug events for kdump
(i.e. the unload-then-reload to obtain a current list of CPUs), this sysfs
file can be used in a udev rule as follows::

 SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

For a CPU hot un/plug event, if the architecture supports kernel updates
of the elfcorehdr (which contains the list of CPUs) and other relevant
kexec segments, then the rule skips the unload-then-reload of the kdump
capture kernel.

Kernel Inline Documentations Reference
======================================

.. kernel-doc:: include/linux/cpuhotplug.h