.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Hotplugging events are sent to a notification queue.

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available in order to be able to
  prepare subsystems to handle memory. The page allocator is still unable
  to allocate from the new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has been successfully brought online. The callback may
  allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations are no
  longer possible from the memory but some of the memory to be offlined
  is still in use. The callback can be used to free memory known to a
  subsystem from the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory is available again from
  the memory block that we attempted to offline.

MEM_OFFLINE
  Generated after offlining memory is complete.

A callback routine can be registered by calling::

  hotplug_memory_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.
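
The priority ordering can be illustrated with a small userspace model. This is
only a sketch, not the kernel implementation: the struct is a simplified
stand-in for the kernel's struct notifier_block, and chain_register(),
chain_call(), record_cb() and run_demo() are hypothetical helpers invented for
this example::

	#include <stddef.h>

	/* Userspace stand-in for the kernel's struct notifier_block. */
	struct notifier_block {
		int (*notifier_call)(struct notifier_block *self,
				     unsigned long action, void *arg);
		struct notifier_block *next;
		int priority;
	};

	/* Hypothetical chain head; kept sorted, highest priority first. */
	static struct notifier_block *chain_head;

	/* Priorities of the callbacks in the order they were invoked. */
	static int call_order[8];
	static int call_count;

	/* Insert nb so that higher priority values come first. */
	static void chain_register(struct notifier_block *nb)
	{
		struct notifier_block **pos = &chain_head;

		while (*pos && (*pos)->priority >= nb->priority)
			pos = &(*pos)->next;
		nb->next = *pos;
		*pos = nb;
	}

	/* Invoke every callback in chain order; return how many were called. */
	static int chain_call(unsigned long action, void *arg)
	{
		struct notifier_block *nb;
		int calls = 0;

		for (nb = chain_head; nb; nb = nb->next, calls++)
			nb->notifier_call(nb, action, arg);
		return calls;
	}

	/* Example callback: only records the priority of its own block. */
	static int record_cb(struct notifier_block *self, unsigned long action,
			     void *arg)
	{
		(void)action;
		(void)arg;
		if (call_count < 8)
			call_order[call_count++] = self->priority;
		return 0;
	}

	static struct notifier_block low_nb = { record_cb, NULL, 10 };
	static struct notifier_block high_nb = { record_cb, NULL, 100 };

	/* Register low before high, then deliver one (dummy) event. */
	static int run_demo(void)
	{
		chain_register(&low_nb);
		chain_register(&high_nb);
		return chain_call(0, NULL);
	}

Calling run_demo() once invokes both callbacks, and the block with priority 100
runs before the block with priority 10 even though it was registered later.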

A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) is a pointer to a struct memory_notify::

	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
		int status_change_nid_normal;
		int status_change_nid;
	};

- start_pfn is the first page frame number of the memory being
  onlined/offlined.
- nr_pages is the number of pages being onlined/offlined.
- status_change_nid_normal is set to the node id when the N_NORMAL_MEMORY bit
  of the nodemask is (will be) set/cleared. If it is -1, the nodemask status
  is not changed.
- status_change_nid is set to the node id when the N_MEMORY bit of the nodemask
  is (will be) set/cleared. This means that a new (memoryless) node gets new
  memory by onlining, or that a node loses all of its memory. If it is -1, the
  nodemask status is not changed.

  If status_change_nid* >= 0, the callback should create/discard structures for
  the node if necessary.

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used as a response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.
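
How a callback can veto an offline request may be sketched in userspace as
follows. The constants are redefined to mirror the kernel headers so that the
example builds outside the kernel; PINNED_PFN, pinned_cb(), later_cb(),
call_chain() and try_offline() are hypothetical names used only for
illustration, and the chain is a fixed array rather than a real notifier
chain::

	#include <stddef.h>

	/* Mirrors include/linux/notifier.h, redefined for userspace. */
	#define NOTIFY_DONE		0x0000
	#define NOTIFY_OK		0x0001
	#define NOTIFY_STOP_MASK	0x8000
	#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)
	#define NOTIFY_STOP		(NOTIFY_OK | NOTIFY_STOP_MASK)

	/* Mirrors include/linux/memory.h (only the event used here). */
	#define MEM_GOING_OFFLINE	(1 << 1)

	/* Hypothetical page frame that a subsystem cannot give up. */
	#define PINNED_PFN		0x1234UL

	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
		int status_change_nid_normal;
		int status_change_nid;
	};

	struct notifier_block {
		int (*notifier_call)(struct notifier_block *self,
				     unsigned long action, void *arg);
	};

	/* How often the lower-priority callback was reached. */
	static int later_calls;

	/* Refuse to offline a range that contains the pinned page frame. */
	static int pinned_cb(struct notifier_block *self, unsigned long action,
			     void *arg)
	{
		struct memory_notify *mn = arg;

		(void)self;
		if (action != MEM_GOING_OFFLINE)
			return NOTIFY_DONE;
		if (PINNED_PFN >= mn->start_pfn &&
		    PINNED_PFN < mn->start_pfn + mn->nr_pages)
			return NOTIFY_BAD;
		return NOTIFY_OK;
	}

	/* A callback further down the queue; it never objects. */
	static int later_cb(struct notifier_block *self, unsigned long action,
			    void *arg)
	{
		(void)self;
		(void)action;
		(void)arg;
		later_calls++;
		return NOTIFY_OK;
	}

	/* The queue, already in priority order. */
	static struct notifier_block chain[] = { { pinned_cb }, { later_cb } };

	/* Walk the queue; stop as soon as a callback sets NOTIFY_STOP_MASK. */
	static int call_chain(unsigned long action, void *arg)
	{
		int ret = NOTIFY_DONE;
		size_t i;

		for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++) {
			ret = chain[i].notifier_call(&chain[i], action, arg);
			if (ret & NOTIFY_STOP_MASK)
				break;
		}
		return ret;
	}

	/* Returns 0 if offlining may proceed, a negative value if vetoed. */
	static int try_offline(unsigned long start_pfn, unsigned long nr_pages)
	{
		struct memory_notify mn = {
			.start_pfn = start_pfn,
			.nr_pages = nr_pages,
			.status_change_nid_normal = -1,
			.status_change_nid = -1,
		};
		int ret = call_chain(MEM_GOING_OFFLINE, &mn);

		ret &= ~NOTIFY_STOP_MASK;
		return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
	}

In this model, try_offline(0x1000, 0x1000) is vetoed by pinned_cb() and
later_cb() is never reached, whereas try_offline(0x4000, 0x1000) passes through
both callbacks and succeeds.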

Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs). This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. And when removing memory, we
  know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)

In particular, using device_hotplug_lock avoids a possible lock inversion when
memory is added and user space tries to online that memory faster than
expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.

Onlining/offlining of memory should be done via device_online()/
device_offline() to make sure it is properly synchronized to actions
via sysfs. Holding device_hotplug_lock is advised (e.g. to protect online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect itself against that
memory vanishing.
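
The role of device_hotplug_lock as an outer lock can be modelled in userspace
with plain mutexes. This is an analogy under simplifying assumptions, not
kernel code: the three mutexes merely borrow the names of the kernel locks
(mem_hotplug_lock is not a mutex in the kernel), and model_add_memory() and
model_online_memory() are hypothetical functions. The two paths take the inner
locks in opposite order, which is only safe because both take the outer lock
first::

	#include <pthread.h>

	/* Userspace stand-ins named after the kernel locks they model. */
	static pthread_mutex_t device_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t mem_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

	static int blocks_added;	/* memory blocks that exist */
	static int blocks_online;	/* memory blocks that are online */

	/* Add path: mem_hotplug_lock first, then the device lock. */
	static void model_add_memory(void)
	{
		pthread_mutex_lock(&device_hotplug_lock);
		pthread_mutex_lock(&mem_hotplug_lock);
		pthread_mutex_lock(&device_lock);
		blocks_added++;
		pthread_mutex_unlock(&device_lock);
		pthread_mutex_unlock(&mem_hotplug_lock);
		pthread_mutex_unlock(&device_hotplug_lock);
	}

	/*
	 * Online path: device lock first, then mem_hotplug_lock. Returns 0 on
	 * success and -1 if there is no added block left to online.
	 */
	static int model_online_memory(void)
	{
		int ret = -1;

		pthread_mutex_lock(&device_hotplug_lock);
		pthread_mutex_lock(&device_lock);
		pthread_mutex_lock(&mem_hotplug_lock);
		if (blocks_online < blocks_added) {
			blocks_online++;
			ret = 0;
		}
		pthread_mutex_unlock(&mem_hotplug_lock);
		pthread_mutex_unlock(&device_lock);
		pthread_mutex_unlock(&device_hotplug_lock);
		return ret;
	}

Without the outer mutex, two threads running these functions concurrently could
each hold one inner lock while waiting for the other. With it, only one path
can be inside the inner locks at any time.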
123