=========================
CPU hotplug in the Kernel
=========================

:Date: September, 2021
:Author: Sebastian Andrzej Siewior,
         Rusty Russell,
         Srivatsa Vaddagiri,
         Ashok Raj,
         Joel Schopp,
         Thomas Gleixner

Introduction
============

Modern advances in system architectures have introduced advanced error
reporting and correction capabilities in processors. A couple of OEMs also
support hot-pluggable NUMA hardware, where physical node insertion and
removal require support for CPU hotplug.

Such advances require CPUs available to a kernel to be removed either for
provisioning reasons, or for RAS purposes to keep an offending CPU off the
system execution path. Hence the need for CPU hotplug support in the
Linux kernel.

A more novel use of CPU hotplug support is in suspend/resume support for
SMP. Dual-core and HT support means that even laptops now run SMP kernels,
which previously did not support suspend/resume.


Command Line Switches
=====================

``maxcpus=n``
  Restrict boot time CPUs to *n*. For example, if you have four CPUs, using
  ``maxcpus=2`` will only boot two. You can choose to bring the
  other CPUs online later.

``nr_cpus=n``
  Restrict the total number of CPUs the kernel will support. If the number
  supplied here is lower than the number of physically available CPUs, then
  the excess CPUs cannot be brought online later.

``possible_cpus=n``
  This option sets ``possible_cpus`` bits in ``cpu_possible_mask``.

  This option is limited to the X86 and S390 architectures.

``cpu0_hotplug``
  Allow CPU0 to be shut down.

  This option is limited to the X86 architecture.

CPU maps
========

``cpu_possible_mask``
  Bitmap of possible CPUs that can ever be available in the
  system. This is used to allocate some boot time memory for per_cpu variables
  that aren't designed to grow/shrink as CPUs are made available or removed.
  Once set during the boot time discovery phase, the map is static, i.e. no
  bits are added or removed at any time. Trimming it accurately for your
  system needs upfront can save some boot time memory.

``cpu_online_mask``
  Bitmap of all CPUs currently online. A CPU's bit is set in ``__cpu_up()``
  after the CPU is available for kernel scheduling and ready to receive
  interrupts from devices. It is cleared when the CPU is brought down using
  ``__cpu_disable()``, before which all OS services including interrupts are
  migrated to another target CPU.

``cpu_present_mask``
  Bitmap of CPUs currently present in the system. Not all
  of them may be online. When physical hotplug is processed by the relevant
  subsystem (e.g. ACPI), the map can change: a bit is either added to or
  removed from it, depending on whether the event is a hot-add or a
  hot-remove. There are currently no locking rules. Typical usage is to
  init topology during boot, at which time hotplug is disabled.

You really don't need to manipulate any of the system CPU maps. They should
be read-only for most uses. When setting up per-cpu resources almost always use
``cpu_possible_mask`` or ``for_each_possible_cpu()`` to iterate. The macro
``for_each_cpu()`` can be used to iterate over a custom CPU mask.

Never use anything other than ``cpumask_t`` to represent a bitmap of CPUs.


Using CPU hotplug
=================

The kernel option *CONFIG_HOTPLUG_CPU* needs to be enabled. It is currently
available on multiple architectures including ARM, MIPS, PowerPC and X86. The
configuration is done via the sysfs interface::

 $ ls -lh /sys/devices/system/cpu
 total 0
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu0
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu1
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu2
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu3
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu4
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu5
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu6
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu7
 drwxr-xr-x  2 root root    0 Dec 21 16:33 hotplug
 -r--r--r--  1 root root 4.0K Dec 21 16:33 offline
 -r--r--r--  1 root root 4.0K Dec 21 16:33 online
 -r--r--r--  1 root root 4.0K Dec 21 16:33 possible
 -r--r--r--  1 root root 4.0K Dec 21 16:33 present

The files *offline*, *online*, *possible*, *present* represent the CPU masks.
Each CPU folder contains an *online* file which controls the logical on (1) and
off (0) state. To logically shut down CPU4::

 $ echo 0 > /sys/devices/system/cpu/cpu4/online
 smpboot: CPU 4 is now offline

Once the CPU is shut down, it will be removed from */proc/interrupts* and
*/proc/cpuinfo*, and should no longer be visible in the *top* command. To
bring CPU4 back online::

 $ echo 1 > /sys/devices/system/cpu/cpu4/online
 smpboot: Booting Node 0 Processor 4 APIC 0x1

The CPU is usable again. This should work on all CPUs, but CPU0 is often special
and excluded from CPU hotplug.

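The mask files in ``/sys/devices/system/cpu`` print their content as a CPU
list such as ``0-3,5``. The following userspace sketch shows one way a tool
might turn such a string into a bitmask. The function name
``parse_cpulist()`` and the 64-CPU limit are assumptions made for this
illustration; this is not a kernel interface.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-3,5" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or a CPU number >= 64.
 * Hypothetical helper for illustration only.
 */
int parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	assert(s && mask);
	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		lo = hi = strtoul(s, &end, 10);
		if (end == s)
			return -1;	/* no digits where a number was expected */
		s = end;
		if (*s == '-') {	/* a range "lo-hi" */
			hi = strtoul(s + 1, &end, 10);
			if (end == s + 1)
				return -1;
			s = end;
		}
		if (lo > hi || hi >= 64)
			return -1;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			m |= UINT64_C(1) << cpu;
		if (*s == ',')
			s++;		/* next list element */
		else if (*s && *s != '\n')
			return -1;	/* unexpected character */
	}
	*mask = m;
	return 0;
}
```

A caller would read one line from, for example, the *online* file and pass it
to the helper. An empty line, as the *offline* file may contain when every CPU
is online, yields an empty mask.
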
The CPU hotplug coordination
============================

The offline case
----------------

Once a CPU has been logically shut down, the teardown callbacks of registered
hotplug states will be invoked, starting with ``CPUHP_ONLINE`` and terminating
at state ``CPUHP_OFFLINE``. This includes:

* If tasks are frozen due to a suspend operation then *cpuhp_tasks_frozen*
  will be set to true.
* All processes are migrated away from this outgoing CPU to new CPUs.
  The new CPU is chosen from each process' current cpuset, which may be
  a subset of all online CPUs.
* All interrupts targeted to this CPU are migrated to a new CPU.
* Timers are also migrated to a new CPU.
* Once all services are migrated, the kernel calls an arch specific routine
  ``__cpu_disable()`` to perform arch specific cleanup.


The CPU hotplug API
===================

CPU hotplug state machine
-------------------------

CPU hotplug uses a trivial state machine with a linear state space from
CPUHP_OFFLINE to CPUHP_ONLINE. Each state has a startup and a teardown
callback.

When a CPU is onlined, the startup callbacks are invoked sequentially until
the state CPUHP_ONLINE is reached. They can also be invoked when the
callbacks of a state are set up or an instance is added to a multi-instance
state.

When a CPU is offlined the teardown callbacks are invoked in the reverse
order sequentially until the state CPUHP_OFFLINE is reached. They can also
be invoked when the callbacks of a state are removed or an instance is
removed from a multi-instance state.

If a usage site requires only a callback in one direction of the hotplug
operations (CPU online or CPU offline) then the other not-required callback
can be set to NULL when the state is set up.

The state space is divided into three sections:

* The PREPARE section

  The PREPARE section covers the state space from CPUHP_OFFLINE to
  CPUHP_BRINGUP_CPU.

  The startup callbacks in this section are invoked before the CPU is
  started during a CPU online operation. The teardown callbacks are invoked
  after the CPU has become dysfunctional during a CPU offline operation.

  The callbacks are invoked on a control CPU as they obviously can't run on
  the hotplugged CPU which is either not yet started or has become
  dysfunctional already.

  The startup callbacks are used to set up resources which are required to
  bring a CPU successfully online. The teardown callbacks are used to free
  resources or to move pending work to an online CPU after the hotplugged
  CPU became dysfunctional.

  The startup callbacks are allowed to fail. If a callback fails, the CPU
  online operation is aborted and the CPU is brought down to the previous
  state (usually CPUHP_OFFLINE) again.

  The teardown callbacks in this section are not allowed to fail.

* The STARTING section

  The STARTING section covers the state space between CPUHP_BRINGUP_CPU + 1
  and CPUHP_AP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  with interrupts disabled during a CPU online operation in the early CPU
  setup code. The teardown callbacks are invoked with interrupts disabled
  on the hotplugged CPU during a CPU offline operation shortly before the
  CPU is completely shut down.

  The callbacks in this section are not allowed to fail.

  The callbacks are used for low level hardware initialization/shutdown and
  for core subsystems.

* The ONLINE section

  The ONLINE section covers the state space between CPUHP_AP_ONLINE + 1 and
  CPUHP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  during a CPU online operation. The teardown callbacks are invoked on the
  hotplugged CPU during a CPU offline operation.

  The callbacks are invoked in the context of the per CPU hotplug thread,
  which is pinned on the hotplugged CPU. The callbacks are invoked with
  interrupts and preemption enabled.

  The callbacks are allowed to fail. When a callback fails the hotplug
  operation is aborted and the CPU is brought back to the previous state.

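The rules of the three sections can be condensed into a small lookup, shown
here as a userspace sketch. All ``toy_`` names and the numeric state values
are invented for this illustration. The real boundaries are the constants
CPUHP_OFFLINE, CPUHP_BRINGUP_CPU, CPUHP_AP_ONLINE and CPUHP_ONLINE.

```c
#include <assert.h>

/* Toy state numbers (assumed values, chosen only for this example). */
enum toy_state {
	TOY_OFFLINE	= 0,
	TOY_BRINGUP_CPU	= 4,
	TOY_AP_ONLINE	= 8,
	TOY_ONLINE	= 12,
};

enum toy_section { TOY_PREPARE, TOY_STARTING, TOY_ONLINE_SECTION };

/* Properties of the callbacks in one section, as described in the text. */
struct toy_rules {
	int on_hotplugged_cpu;	/* 0: callbacks run on a control CPU */
	int startup_may_fail;
	int teardown_may_fail;
};

static const struct toy_rules toy_rule_table[] = {
	[TOY_PREPARE]		= { 0, 1, 0 },
	[TOY_STARTING]		= { 1, 0, 0 },	/* runs with interrupts disabled */
	[TOY_ONLINE_SECTION]	= { 1, 1, 1 },	/* runs in the hotplug thread */
};

/* Map a state number to the section it belongs to. */
enum toy_section toy_section_of(int state)
{
	assert(state >= TOY_OFFLINE && state <= TOY_ONLINE);
	if (state <= TOY_BRINGUP_CPU)
		return TOY_PREPARE;
	if (state <= TOY_AP_ONLINE)
		return TOY_STARTING;
	return TOY_ONLINE_SECTION;
}

/* Look up the callback rules which apply to a state. */
const struct toy_rules *toy_rules_of(int state)
{
	return &toy_rule_table[toy_section_of(state)];
}
```

With these toy values, ``toy_rules_of(TOY_BRINGUP_CPU + 1)`` describes a
STARTING state: its callbacks run on the hotplugged CPU and neither direction
may fail.
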
CPU online/offline operations
-----------------------------

A successful online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ...
  [CPUHP_ONLINE - 1]->startup()        -> success
  [CPUHP_ONLINE]

A successful offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()    -> success
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()        -> success
  ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  ...
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]

A failed online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ...
  [CPUHP_AP_ONLINE + N]->startup()     -> fail
  [CPUHP_AP_ONLINE + (N - 1)]->teardown()
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()
  ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  ...
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]

A failed offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()
  ...
  [CPUHP_ONLINE - 1]->startup()
  [CPUHP_ONLINE]

Recursive failures cannot be handled sensibly. Look at the following
example of a recursive fail due to a failed offline operation::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail

The CPU hotplug state machine stops right here and does not try to go back
down again because that would likely result in an endless loop::

  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail
  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail

Lather, rinse and repeat. In this case the CPU is left in state::

  [CPUHP_ONLINE - (N - 1)]

which at least lets the system make progress and gives the user a chance to
debug or even resolve the situation.

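The walk through the state space, including the single rollback attempt, can
be modelled in a few lines of userspace C. This is a toy sketch of the
behaviour described above, not the code in kernel/cpu.c. All ``toy_`` names
are invented for the illustration, and callbacks take no arguments to keep it
short.

```c
#include <assert.h>
#include <stddef.h>

/* Callbacks of one state; a NULL callback means the step is skipped. */
struct toy_step {
	int (*startup)(void);
	int (*teardown)(void);
};

static int toy_calls;	/* demo bookkeeping: number of callbacks invoked */
static int toy_ok(void)   { toy_calls++; return 0; }
static int toy_fail(void) { toy_calls++; return -1; }

/* Invoke one callback; a NULL callback counts as success. */
static int toy_invoke(int (*cb)(void))
{
	return cb ? cb() : 0;
}

/*
 * Online direction: steps[i].startup moves the CPU from state i - 1 to i.
 * On failure the states already passed are torn down again. If that
 * rollback fails too, stop and report the state reached so far.
 */
int toy_cpu_up(const struct toy_step *steps, int cur, int target)
{
	int start = cur;

	assert(steps && cur <= target);
	while (cur < target) {
		if (toy_invoke(steps[cur + 1].startup)) {
			while (cur > start) {
				if (toy_invoke(steps[cur].teardown))
					return cur;	/* recursive failure */
				cur--;
			}
			return cur;
		}
		cur++;
	}
	return cur;
}

/* Offline direction: steps[i].teardown moves the CPU from state i to i - 1. */
int toy_cpu_down(const struct toy_step *steps, int cur, int target)
{
	int start = cur;

	assert(steps && cur >= target);
	while (cur > target) {
		if (toy_invoke(steps[cur].teardown)) {
			while (cur < start) {
				if (toy_invoke(steps[cur + 1].startup))
					return cur;	/* recursive failure */
				cur++;
			}
			return cur;
		}
		cur--;
	}
	return cur;
}
```

Both functions return the state in which the CPU ends up. A caller compares
that value with the requested target to tell success from a rollback, and with
the starting state to detect the recursive failure case in which the CPU is
left in an intermediate state.
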
Allocating a state
------------------

There are two ways to allocate a CPU hotplug state:

* Static allocation

  Static allocation has to be used when the subsystem or driver has
  ordering requirements versus other CPU hotplug states. E.g. the PERF core
  startup callback has to be invoked before the PERF driver startup
  callbacks during a CPU online operation. During a CPU offline operation
  the driver teardown callbacks have to be invoked before the core teardown
  callback. The statically allocated states are described by constants in
  the cpuhp_state enum which can be found in include/linux/cpuhotplug.h.

  Insert the state into the enum at the proper place so the ordering
  requirements are fulfilled. The state constant has to be used for state
  setup and removal.

  Static allocation is also required when the state callbacks are not set
  up at runtime and are part of the initializer of the CPU hotplug state
  array in kernel/cpu.c.

* Dynamic allocation

  When there are no ordering requirements for the state callbacks then
  dynamic allocation is the preferred method. The state number is allocated
  by the setup function and returned to the caller on success.

  Only the PREPARE and ONLINE sections provide a dynamic allocation
  range. The STARTING section does not as most of the callbacks in that
  section have explicit ordering requirements.

Setup of a CPU hotplug state
----------------------------

The core code provides the following functions to set up a state:

* cpuhp_setup_state(state, name, startup, teardown)
* cpuhp_setup_state_nocalls(state, name, startup, teardown)
* cpuhp_setup_state_cpuslocked(state, name, startup, teardown)
* cpuhp_setup_state_nocalls_cpuslocked(state, name, startup, teardown)

For cases where a driver or a subsystem has multiple instances and the same
CPU hotplug state callbacks need to be invoked for each instance, the CPU
hotplug core provides multi-instance support. The advantage over driver
specific instance lists is that the instance related functions are fully
serialized against CPU hotplug operations and provide the automatic
invocations of the state callbacks on add and removal. To set up such a
multi-instance state the following function is available:

* cpuhp_setup_state_multi(state, name, startup, teardown)

The @state argument is either a statically allocated state or one of the
constants for dynamically allocated states - CPUHP_BP_PREPARE_DYN,
CPUHP_AP_ONLINE_DYN - depending on the state section (PREPARE, ONLINE) for
which a dynamic state should be allocated.

The @name argument is used for sysfs output and for instrumentation. The
naming convention is "subsys:mode" or "subsys/driver:mode",
e.g. "perf:mode" or "perf/x86:mode". The common mode names are:

======== =======================================================
prepare  For states in the PREPARE section

dead     For states in the PREPARE section which do not provide
         a startup callback

starting For states in the STARTING section

dying    For states in the STARTING section which do not provide
         a startup callback

online   For states in the ONLINE section

offline  For states in the ONLINE section which do not provide
         a startup callback
======== =======================================================

As the @name argument is only used for sysfs and instrumentation, other mode
descriptors can be used as well if they describe the nature of the state
better than the common ones.

Examples for @name arguments: "perf/online", "perf/x86:prepare",
"RCU/tree:dying", "sched/waitempty"

The @startup argument is a function pointer to the callback which should be
invoked during a CPU online operation. If the usage site does not require a
startup callback, set the pointer to NULL.

The @teardown argument is a function pointer to the callback which should
be invoked during a CPU offline operation. If the usage site does not
require a teardown callback, set the pointer to NULL.

The functions differ in how the installed callbacks are treated:

  * cpuhp_setup_state_nocalls(), cpuhp_setup_state_nocalls_cpuslocked()
    and cpuhp_setup_state_multi() only install the callbacks.

  * cpuhp_setup_state() and cpuhp_setup_state_cpuslocked() install the
    callbacks and invoke the @startup callback (if not NULL) for all online
    CPUs which currently have a state greater than the newly installed
    state. Depending on the state section the callback is either invoked on
    the current CPU (PREPARE section) or on each online CPU (ONLINE
    section) in the context of the CPU's hotplug thread.

    If a callback fails for CPU N then the teardown callback for CPU
    0 .. N-1 is invoked to roll back the operation. The state setup fails,
    the callbacks for the state are not installed and in case of dynamic
    allocation the allocated state is freed.

The state setup and the callback invocations are serialized against CPU
hotplug operations. If the setup function has to be called from a CPU
hotplug read locked region, then the _cpuslocked() variants have to be
used. These functions cannot be used from within CPU hotplug callbacks.

The function return values:

  ======== ===================================================================
  0        Statically allocated state was successfully set up

  >0       Dynamically allocated state was successfully set up.

           The returned number is the state number which was allocated. If
           the state callbacks have to be removed later, e.g. module
           removal, then this number has to be saved by the caller and used
           as @state argument for the state remove function. For
           multi-instance states the dynamically allocated state number is
           also required as @state argument for the instance add/remove
           operations.

  <0       Operation failed
  ======== ===================================================================

Removal of a CPU hotplug state
------------------------------

To remove a previously set up state, the following functions are provided:

* cpuhp_remove_state(state)
* cpuhp_remove_state_nocalls(state)
* cpuhp_remove_state_nocalls_cpuslocked(state)
* cpuhp_remove_multi_state(state)

The @state argument is either a statically allocated state or the state
number which was allocated in the dynamic range by cpuhp_setup_state*(). If
the state is in the dynamic range, then the state number is freed and
available for dynamic allocation again.

The functions differ in how the installed callbacks are treated:

  * cpuhp_remove_state_nocalls(), cpuhp_remove_state_nocalls_cpuslocked()
    and cpuhp_remove_multi_state() only remove the callbacks.

  * cpuhp_remove_state() removes the callbacks and invokes the teardown
    callback (if not NULL) for all online CPUs which currently have a state
    greater than the removed state. Depending on the state section the
    callback is either invoked on the current CPU (PREPARE section) or on
    each online CPU (ONLINE section) in the context of the CPU's hotplug
    thread.

    In order to complete the removal, the teardown callback should not fail.

The state removal and the callback invocations are serialized against CPU
hotplug operations. If the remove function has to be called from a CPU
hotplug read locked region, then the _cpuslocked() variants have to be
used. These functions cannot be used from within CPU hotplug callbacks.

Before a multi-instance state is removed, the caller has to remove all
instances.

Multi-Instance state instance management
----------------------------------------

Once the multi-instance state is set up, instances can be added to the
state:

  * cpuhp_state_add_instance(state, node)
  * cpuhp_state_add_instance_nocalls(state, node)

The @state argument is either a statically allocated state or the state
number which was allocated in the dynamic range by cpuhp_setup_state_multi().

The @node argument is a pointer to an hlist_node which is embedded in the
instance's data structure. The pointer is handed to the multi-instance
state callbacks and can be used by the callback to retrieve the instance
via container_of().

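The container_of() lookup can be tried out in userspace. The sketch below is
for illustration only: ``struct hlist_node`` and ``container_of()`` are
simplified stand-ins for the kernel definitions, and ``struct subsys_inst``
with its ``online_calls`` counter is a hypothetical instance structure.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel definitions. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical per-instance data of a subsystem. */
struct subsys_inst {
	int id;
	struct hlist_node node;		/* the @node handed to the add function */
	unsigned int online_calls;
};

/* Shaped like a multi-instance startup callback: (cpu, node). */
static int subsys_cpu_online(unsigned int cpu, struct hlist_node *node)
{
	struct subsys_inst *inst;

	assert(node != NULL);
	/* Step back from the embedded node to the enclosing instance. */
	inst = container_of(node, struct subsys_inst, node);

	(void)cpu;			/* unused in this sketch */
	inst->online_calls++;
	return 0;
}
```

Calling ``subsys_cpu_online(4, &inst->node)`` only touches the instance which
embeds that node. This is how one callback can serve several instances.
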
The functions differ in how the installed callbacks are treated:

  * cpuhp_state_add_instance_nocalls() only adds the instance to the
    multi-instance state's node list.

  * cpuhp_state_add_instance() adds the instance and invokes the startup
    callback (if not NULL) associated with @state for all online CPUs which
    currently have a state greater than @state. The callback is only
    invoked for the instance being added. Depending on the state section
    the callback is either invoked on the current CPU (PREPARE section) or
    on each online CPU (ONLINE section) in the context of the CPU's hotplug
    thread.

    If a callback fails for CPU N then the teardown callback for CPU
    0 .. N-1 is invoked to roll back the operation, the function fails and
    the instance is not added to the node list of the multi-instance state.

To remove an instance from the state's node list these functions are
available:

  * cpuhp_state_remove_instance(state, node)
  * cpuhp_state_remove_instance_nocalls(state, node)

The arguments are the same as for the cpuhp_state_add_instance*()
variants above.

The functions differ in how the installed callbacks are treated:

  * cpuhp_state_remove_instance_nocalls() only removes the instance from the
    state's node list.

  * cpuhp_state_remove_instance() removes the instance and invokes the
    teardown callback (if not NULL) associated with @state for all online
    CPUs which currently have a state greater than @state. The callback is
    only invoked for the instance being removed. Depending on the state
    section the callback is either invoked on the current CPU (PREPARE
    section) or on each online CPU (ONLINE section) in the context of the
    CPU's hotplug thread.

    In order to complete the removal, the teardown callback should not fail.

The node list add/remove operations and the callback invocations are
serialized against CPU hotplug operations. These functions cannot be used
from within CPU hotplug callbacks or CPU hotplug read locked regions.

Examples
--------

Set up and tear down a statically allocated state in the STARTING section for
notifications on online and offline operations::

   ret = cpuhp_setup_state(CPUHP_SUBSYS_STARTING, "subsys:starting", subsys_cpu_starting, subsys_cpu_dying);
   if (ret < 0)
       return ret;
   ....
   cpuhp_remove_state(CPUHP_SUBSYS_STARTING);

Set up and tear down a dynamically allocated state in the ONLINE section
for notifications on offline operations::

   state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline);
   if (state < 0)
       return state;
   ....
   cpuhp_remove_state(state);

Set up and tear down a dynamically allocated state in the ONLINE section
for notifications on online operations without invoking the callbacks::

   state = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, NULL);
   if (state < 0)
       return state;
   ....
   cpuhp_remove_state_nocalls(state);

Set up, use and tear down a dynamically allocated multi-instance state in the
ONLINE section for notifications on online and offline operations::

   state = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline);
   if (state < 0)
       return state;
   ....
   ret = cpuhp_state_add_instance(state, &inst1->node);
   if (ret)
       return ret;
   ....
   ret = cpuhp_state_add_instance(state, &inst2->node);
   if (ret)
       return ret;
   ....
   cpuhp_state_remove_instance(state, &inst1->node);
   ....
   cpuhp_state_remove_instance(state, &inst2->node);
   ....
   cpuhp_remove_multi_state(state);


Testing of hotplug states
=========================

One way to verify whether a custom state is working as expected is to shut
down a CPU and then put it online again. It is also possible to put the CPU
into a certain state (for instance *CPUHP_AP_ONLINE*) and then go back to
*CPUHP_ONLINE*. This would simulate an error one state after *CPUHP_AP_ONLINE*
which would lead to a rollback to the online state.

All registered states are enumerated in ``/sys/devices/system/cpu/hotplug/states`` ::

 $ tail /sys/devices/system/cpu/hotplug/states
 138: mm/vmscan:online
 139: mm/vmstat:online
 140: lib/percpu_cnt:online
 141: acpi/cpu-drv:online
 142: base/cacheinfo:online
 143: virtio/net:online
 144: x86/mce:online
 145: printk:online
 168: sched:active
 169: online

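State numbers differ between kernels and configurations, so a test tool may
want to look up the number for a state name instead of hard coding it. The
following userspace sketch assumes only the ``<number>: <name>`` line format
shown in the listing. ``find_state()`` is a hypothetical helper and not part
of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the state number whose name equals @name in @listing, which holds
 * lines of the form "<number>: <name>", or -1 if the name is not listed.
 */
static int find_state(const char *listing, const char *name)
{
	const char *line = listing;

	while (*line) {
		int num, off = 0;
		size_t len = strcspn(line, "\n");	/* length of this line */

		/* %n records where the name starts, if "<number>: " matched. */
		if (sscanf(line, " %d: %n", &num, &off) == 1 && off > 0 &&
		    (size_t)off <= len &&
		    strlen(name) == len - (size_t)off &&
		    strncmp(line + off, name, len - (size_t)off) == 0)
			return num;

		/* Advance to the start of the next line. */
		line += len;
		if (*line == '\n')
			line++;
	}
	return -1;
}
```

A caller would read the contents of the states file into a buffer and pass
it as @listing. With the listing above, looking up ``lib/percpu_cnt:online``
yields 140.
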
To roll back CPU4 to ``lib/percpu_cnt:online`` and back online just issue::

  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169
  $ echo 140 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  140

It is important to note that the teardown callbacks of all states above 140
have been invoked at this point. And now get back online::

  $ echo 169 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169

With trace events enabled, the individual steps are visible, too::

  #  TASK-PID   CPU#    TIMESTAMP  FUNCTION
  #     | |       |        |         |
      bash-394  [001]  22.976: cpuhp_enter: cpu: 0004 target: 140 step: 169 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  22.977: cpuhp_enter: cpu: 0004 target: 140 step: 168 (sched_cpu_deactivate)
   cpuhp/4-31   [004]  22.990: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
   cpuhp/4-31   [004]  22.991: cpuhp_enter: cpu: 0004 target: 140 step: 144 (mce_cpu_pre_down)
   cpuhp/4-31   [004]  22.992: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  22.993: cpuhp_multi_enter: cpu: 0004 target: 140 step: 143 (virtnet_cpu_down_prep)
   cpuhp/4-31   [004]  22.994: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  22.995: cpuhp_enter: cpu: 0004 target: 140 step: 142 (cacheinfo_cpu_pre_down)
   cpuhp/4-31   [004]  22.996: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
      bash-394  [001]  22.997: cpuhp_exit:  cpu: 0004  state: 140 step: 169 ret: 0
      bash-394  [005]  95.540: cpuhp_enter: cpu: 0004 target: 169 step: 140 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  95.541: cpuhp_enter: cpu: 0004 target: 169 step: 141 (acpi_soft_cpu_online)
   cpuhp/4-31   [004]  95.542: cpuhp_exit:  cpu: 0004  state: 141 step: 141 ret: 0
   cpuhp/4-31   [004]  95.543: cpuhp_enter: cpu: 0004 target: 169 step: 142 (cacheinfo_cpu_online)
   cpuhp/4-31   [004]  95.544: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
   cpuhp/4-31   [004]  95.545: cpuhp_multi_enter: cpu: 0004 target: 169 step: 143 (virtnet_cpu_online)
   cpuhp/4-31   [004]  95.546: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  95.547: cpuhp_enter: cpu: 0004 target: 169 step: 144 (mce_cpu_online)
   cpuhp/4-31   [004]  95.548: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  95.549: cpuhp_enter: cpu: 0004 target: 169 step: 145 (console_cpu_notify)
   cpuhp/4-31   [004]  95.550: cpuhp_exit:  cpu: 0004  state: 145 step: 145 ret: 0
   cpuhp/4-31   [004]  95.551: cpuhp_enter: cpu: 0004 target: 169 step: 168 (sched_cpu_activate)
   cpuhp/4-31   [004]  95.552: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
      bash-394  [005]  95.553: cpuhp_exit:  cpu: 0004  state: 169 step: 140 ret: 0

As can be seen, CPU4 went down until timestamp 22.996 and then back up until
95.552. All invoked callbacks including their return codes are visible in the
trace.

Architecture's requirements
===========================

The following functions and configurations are required:

``CONFIG_HOTPLUG_CPU``
  This entry needs to be enabled in Kconfig.

``__cpu_up()``
  Arch interface to bring up a CPU.

``__cpu_disable()``
  Arch interface to shut down a CPU. No more interrupts can be handled by the
  kernel after the routine returns. This includes the shutdown of the timer.

``__cpu_die()``
  This is supposed to ensure the death of the CPU. Look at example code in
  other architectures that implement CPU hotplug. The processor is taken down
  from the ``idle()`` loop for that specific architecture. ``__cpu_die()``
  typically waits for some per_cpu state to be set, to positively ensure that
  the processor dead routine has been called.

User Space Notification
=======================

After a CPU has been successfully onlined or offlined, udev events are sent.
A udev rule like::

  SUBSYSTEM=="cpu", DRIVERS=="processor", DEVPATH=="/devices/system/cpu/*", RUN+="the_hotplug_receiver.sh"

will receive all events. A script like::

  #!/bin/sh

  if [ "${ACTION}" = "offline" ]
  then
      echo "CPU ${DEVPATH##*/} offline"

  elif [ "${ACTION}" = "online" ]
  then
      echo "CPU ${DEVPATH##*/} online"

  fi

can process the event further.

When changes to the CPUs in the system occur, the sysfs file
``/sys/devices/system/cpu/crash_hotplug`` contains '1' if the kernel
updates the kdump capture kernel list of CPUs itself (via elfcorehdr and
other relevant kexec segments), or '0' if userspace must update the kdump
capture kernel list of CPUs.

The availability depends on the CONFIG_HOTPLUG_CPU kernel configuration
option.

To skip userspace processing of CPU hot un/plug events for kdump
(i.e. the unload-then-reload to obtain a current list of CPUs), this sysfs
file can be used in a udev rule as follows::

  SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

For a CPU hot un/plug event, if the architecture supports kernel updates
of the elfcorehdr (which contains the list of CPUs) and other relevant
kexec segments, then the rule skips the unload-then-reload of the kdump
capture kernel.

Kernel Inline Documentations Reference
======================================

.. kernel-doc:: include/linux/cpuhotplug.h