In update_lmb_associativity_index() we lookup dr_node using
of_find_node_by_path() which takes a reference for us. In the
non-error case we forget to drop the reference. Note that
find_aa_index() does modify properties of the node, but doesn't need
an extra reference held once it's returned.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For fadump to work successfully there should not be any holes in reserved
memory ranges where kernel has asked firmware to move the content of old
kernel memory in event of crash. Now that fadump uses CMA for reserved
area, this memory area is now not protected from hot-remove operations
unless it is cma allocated. Hence, fadump service can fail to re-register
after the hot-remove operation, if hot-removed memory belongs to fadump
reserved region. To avoid this make sure that memory from fadump reserved
area is not hot-removable if fadump is registered.
However, if user still wants to remove that memory, he can do so by
manually stopping fadump service before hot-remove operation.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove directly accessing device_node.type pointer and use the
accessors instead. This will eventually allow removing the type
pointer.
Replace the open coded iterating over child nodes with
for_each_child_of_node() while we're here.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
add_memory() currently does not take the device_hotplug_lock, however
is aleady called under the lock from
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.
In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline request (e.g. from user space) - which
already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3b ("mm, hotplug: fix concurrent memory
hot-add deadlock"). add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.
Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.
The lock is not held yet in
drivers/xen/balloon.c
arch/powerpc/platforms/powernv/memtrace.c
drivers/s390/char/sclp_cmd.c
drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.
Don't export add_memory_resource(), as it once was exported to be used by
XEN, which is never built as a module. If somebody requires it, we also
have to export a locked variant (as device_hotplug_lock is never
exported).
Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: online/offline_pages called w.o. mem_hotplug_lock", v3.
Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.
While e.g.
echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
echo 1 > /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.
E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). We e.g. touch in
online_pages() basically unprotected zone->present_pages then.
Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.
Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.
I propose the general rules (documentation added in patch 6):
1. add_memory/add_memory_resource() must only be called with
device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
device_hotplug_lock. This is already documented and true for now in core
code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
online_pages/offline_pages.
To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.
This patch (of 6):
remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported. So let's provide a variant
that takes the lock and only export that one.
The lock is already held in
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
arch/powerpc/platforms/powernv/memtrace.c
Apart from that, there are not other users in the tree.
Link: http://lkml.kernel.org/r/20180925091457.28651-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The variable 'aa_index' is defined as an unsigned value in
update_lmb_associativity_index(), but find_aa_index() may return -1
when dlpar_clone_property() fails. So change find_aa_index() to return
a bool, which indicates whether 'aa_index' was found or not.
Fixes: c05a5a4096 ("powerpc/pseries: Dynamic add entires to associativity lookup array")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Nathan Fontenot nfont@linux.vnet.ibm.com>
[mpe: Tweak changelog, rename is_found to just found]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The updates to powerpc numa and memory hotplug code now use the
in-kernel LMB array instead of the device tree. This change allows the
pseries memory DLPAR code to only update the device tree once after
successfully handling a DLPAR request.
Prior to the in-kernel LMB array, the numa code looked up the affinity
for memory being added in the device tree, the code now looks this up
in the LMB array. This change means the memory hotplug code can just
update the affinity for an LMB in the LMB array instead of updating
the device tree.
This also provides a savings in kernel memory. When updating the
device tree old properties are never free'ed since there is no
usecount on properties. This behavior leads to a new copy of the
property being allocated every time a LMB is added or removed (i.e. a
request to add 100 LMBs creates 100 new copies of the property). With
this update only a single new property is created when a DLPAR request
completes successfully.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that the powerpc code parses dynamic reconfiguration memory
LMB information from the LMB array and not the device tree
directly we can move the of_drconf_cell struct to drmem.h where
it fits better.
In addition, the struct is renamed to of_drconf_cell_v1 in
anticipation of upcoming support for version 2 of the dynamic
reconfiguration property and the members are typed as __be*
values to reflect how they exist in the device tree.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Update the pseries memory hotplug code to use the newly added
dynamic reconfiguration LMB array. Doing this is required for the
upcoming support of version 2 of the dynamic reconfiguration
device tree property.
In addition, making this change cleans up the code that parses the
LMB information as we no longer need to worry about device tree
format. This allows us to discard one of the first steps on memory
hotplug where we make a working copy of the device tree property and
convert the entire property to cpu format. Instead we just use the
LMB array directly while holding the memory hotplug lock.
This patch also moves the updating of the device tree property to
powerpc/mm/drmem.c. This allows to the hotplug code to work without
needing to know the device tree format and provides a single
routine for updating the device tree property. This new routine
will handle determination of the proper device tree format and
generate a properly formatted device tree property.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Check if an LMB is assigned before attempting to call dlpar_acquire_drc
in order to avoid any unnecessary rtas calls. This substantially
reduces the running time of memory hot add on lpars with large amounts
of memory.
[mpe: We need to explicitly set rc to 0 in the success case, otherwise
the compiler might think we use rc without initialising it.]
Fixes: c21f515c74 ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When DLPAR adding or removing memory we need to check the device
offline status before trying to online/offline the memory. This is
needed because calls to device_online() and device_offline() will
return non-zero for memory that is already online and offline
respectively.
This update resolves two scenarios. First, for a kernel built with
auto-online memory enabled (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y),
memory will be onlined as part of calls to add_memory(). After adding
the memory the pseries DLPAR code tries to online it and fails since
the memory is already online. The DLPAR code then tries to remove the
memory which produces the oops message below because the memory is not
offline.
The second scenario occurs when removing memory that is already
offline, i.e. marking memory offline (via sysfs) and then trying to
remove that memory. This doesn't work because offlining the already
offline memory does not succeed and the DLPAR code then fails the
DLPAR remove operation.
The fix for both scenarios is to check the device.offline status
before making the calls to device_online() or device_offline().
kernel BUG at mm/memory_hotplug.c:1936!
...
NIP [c0000000002ca428] .remove_memory+0xb8/0xc0
LR [c0000000002ca3cc] .remove_memory+0x5c/0xc0
Call Trace:
.remove_memory+0x5c/0xc0 (unreliable)
.dlpar_add_lmb+0x384/0x400
.dlpar_memory+0x5dc/0xca0
.handle_dlpar_errorlog+0x74/0xe0
.pseries_hp_work_fn+0x2c/0x90
.process_one_work+0x17c/0x460
.worker_thread+0x88/0x500
.kthread+0x15c/0x1a0
.ret_from_kernel_thread+0x58/0xc0
Fixes: 943db62c31 ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
[mpe: Use bool, add explicit rc=0 case, change log typos & formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Merge our fixes branch, a few of them are tripping people up while
working on top of next, and we also have a dependency between the CXL
fixes and new CXL code we want to merge into next.
To register fadump, boot memory area - the size of low memory chunk that
is required for a kernel to boot successfully when booted with restricted
memory, is assumed to have no holes. But this memory area is currently
not protected from hot-remove operations. So, fadump could fail to
re-register after a memory hot-remove operation, if memory is removed
from boot memory area. To avoid this, ensure that memory from boot
memory area is not hot-removed when fadump is registered.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When adding or removing memory, the aa_index (affinity value) for the
memblock must also be converted to match the endianness of the rest
of the 'ibm,dynamic-memory' property. Otherwise, subsequent retrieval
of the attribute will likely lead to non-existent nodes, followed by
using the default node in the code inappropriately.
Fixes: 5f97b2a0d1 ("powerpc/pseries: Implement memory hotplug add in the kernel")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This reverts commit ec99907244 ("powerpc/pseries: Auto-online
hotplugged memory"), and 9dc512819e ("powerpc: Fix unused function
warning 'lmb_to_memblock'").
Using the auto-online acpability does online added memory but does not
update the associated device struct to indicate that the memory is
online. This causes the pseries memory DLPAR code to fail when trying to
remove a LMB that was previously removed and added back. This happens
when validating that the LMB is removable.
This patch reverts to the previous behavior of calling device_online()
to online the LMB when it is DLPAR added and moves the lmb_to_memblock()
routine out of CONFIG_MEMORY_HOTREMOVE now that we call it for add.
Fixes: ec99907244 ("powerpc/pseries: Auto-online hotplugged memory")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Indexed-count remove for memory hotplug guarantees that a contiguous block
of <count> lmbs beginning at a specified <index> will be unassigned (NOT
that <count> lmbs will be removed). Because of Qemu's per-DIMM memory
management, the removal of a contiguous block of memory currently
requires a series of individual calls. Indexed-count remove reduces
this series into a single call.
Signed-off-by: Sahil Mehta <sahilmehta17@gmail.com>
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Indexed-count add for memory hotplug guarantees that a contiguous block
of <count> lmbs beginning at a specified <drc index> will be assigned,
any LMBs in this range that are not already assigned will be DLPAR added.
Because of Qemu's per-DIMM memory management, the addition of a contiguous
block of memory currently requires a series of individual calls to add
each LMB in the block. Indexed-count add reduces this series of calls to
a single call for the entire block.
Signed-off-by: Sahil Mehta <sahilmehta17@gmail.com>
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We broke the build when CONFIG_MEMORY_HOTREMOVE=n:
arch/powerpc/platforms/pseries/hotplug-memory.c:821:8: error: implicit
declaration of function 'dlpar_memory_readd_by_index'
Add a dummy to fix it.
Fixes: e70d59700f ("powerpc/pseries: Introduce memory hotplug READD operation")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, memory must be hot removed and subsequently re-added in order
to dynamically update the affinity of LMBs specified by a PRRN event.
Earlier implementations of the PRRN event handler ran into issues in which
the hot remove would occur successfully, but a hotplug event would be
initiated from another source and grab the hotplug lock preventing the hot
add from occurring. To prevent this situation, this patch introduces the
notion of a hot "readd" action for memory which atomizes a hot remove and
a hot add into a single, serialized operation on the hotplug queue.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When adding and removing LMBs we should make the acquire/release of
the DRC a separate step to allow for a few improvements. First
this will ensure that LMBs removed during a remove by count operation
are all available if a error occurs and we need to add them back. By
first removeing all the LMBs from the kernel before releasing their
DRCs the LMBs are available to add back should an error occur.
Also, this will allow for faster re-add operations of memory for
PRRN event handling since we can skip the unneeded step of having
to release the DRC and the acquire it back.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We should be using lmb_is_removable() to validate that enough LMBs
are available to remove when doing a remove by count. This will check
that the LMB is owned by the system and it is considered removable.
This patch also adds a pr_info() notification to report the LMB count
to remove was not satisfied.
What we do now is just check that there are enough LMBs owned by the
system when validating there are enough LMBs to remove. This can
lead to situations where there are enough LMBs owned by the system
but not enough that are considered removable. This results in having
to bail out of the remove operation instead of just failing the request
that we should have known wouldn't succeed.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch fixes the following warning:
arch/powerpc/platforms/pseries/hotplug-memory.c:323:29: error: 'lmb_to_memblock' defined but not used [-Werror=unused-function]
static struct memory_block *lmb_to_memblock(struct of_drconf_cell *lmb)
^~~~~~~~~~~~~~~
The only consumer of this function is 'dlpar_remove_lmb', which is
enabled with CONFIG_MEMORY_HOTREMOVE, so move it into the same
ifdef block.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The call to memblock_add is not needed, this is already done by
memory_add(). This patch removes this call which shrinks
dlpar_add_lmb_memory() enough that it can be merged into dlpar_add_lmb().
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A recent update (commit id 31bc3858ea) allows for automatically
onlining memory that is added. This patch sets the config option
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y for pseries and updates the
pseries memory hotplug code so that DLPAR added memory can be
automatically onlined instead of explicitly onlining the memory.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Dynamically add entries to the associativity lookup array
The ibm,associativity-lookup-arrays property may only contain
associativity arrays for LMBs present at boot time. When hotplug
adding a LMB its associativity array may not be in the associativity
lookup array, this patch adds the ability to add new entries to the
associativity lookup array.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move property cloning code into its own routine
Split the pieces of dlpar_clone_drconf_property() that create a copy of
the property struct into its own routine. This allows for creating
clones of more than just the ibm,dynamic-memory property used in memory
hotplug.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The associativity array index specified for a LMB in the device tree,
/ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory, needs to be updated
prior to DLPAR adding a LMB and after DLPAR removing a LMB.
Without doing this step in the DLPAR add process a LMB could be configured
with the incorrect affinity. For a LMB that was not present at boot the
affinity index is set to 0xffffffff, which defaults to adding the LMB to
the first online node since the index is not a valid value. Or, the
affinity index could contain a stale value if the LMB was present at boot
but later DLPAR removed and is being DLPAR added back to the system.
This patch adds a step in the DLPAR add flow to look up the associativity
index for a LMB prior to adding a LMB and setting the associativity to
0xffffffff when a LMB is removed.
This patch also modifies the DLPAR add/remove flow to no longer do a single
update of the device tree property after all of the requested DLPAR
operations are complete and now does a property update during the add
or remove of each LMB.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Re-factor dlpar_lmb_add() routine by moving the validation of the lmb
flags and the acquireing of the DRC to a wrapper around the work to add
the memory to the system. This is done to make handling of errors
during the addition of the memory easier and to facilitate the upcoming
addition of updating the lmb's affinity prior to adding the memory.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch was generated using fixed coccinelle semantic patch
scripts/coccinelle/api/memdup.cocci [1].
[1]: http://permalink.gmane.org/gmane.linux.kernel/2014320
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
51925fb3c5 "powerpc/pseries: Implement memory hotplug remove in the kernel"
broke compile when CONFIG_MEMORY_HOTREMOVE is not defined due to missing
symbols. This fixes the issue by adding the missing symbols.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Memory dlpar handling can return from dlpar_memory() without releasing the
device_hotplug lock. Correct this routine to ensure the lock is released.
Fixes: 5f97b2a0d1 ("powerpc/pseries: Implement memory hotplug add in the kernel")
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch adds the ability to do memory hotplug remove in the kernel.
Currently the operation to hotplug remove memory is handled by the drmgr
command which performs the operation by performing some work in user-space
and making requests to the kernel to handle other pieces. By moving all
of the work to the kernel we can do the remove faster, and provide a common
code path to do memory hotplug for both the PowerVM and PowerKVM environments.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds the ability to do memory hotplug add in the kernel.
Currently the operation to hotplug add memory is handled by the drmgr
command which performs the operation by performing some work in user-space
and making requests to the kernel to handle other pieces. By moving all
of the work to the kernel we can do the add faster, and provide a common
code path to do memory hotplug for both the PowerVM and PowerKVM environments.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The current hotplug (or dlpar) of devices (the process is generally the
same for memory, cpu, and pci) on PowerVM systems is initiated
from the HMC, which communicates the request to the partitions through
the RSCT framework. The RSCT framework then invokes the drmgr command.
The drmgr command performs the hotplug operation by doing some pieces,
such as most of the rtas calls and device tree parsing, in userspace
and make requests to the kernel to online/offline the device, update the
device tree and add/remove the device.
For PowerKVM the approach for device hotplug is to follow what is currently
being done for pci hotplug. A hotplug request is initiated from the host.
QEMU then generates an EPOW interrupt to the guest which causes the guest
to make the rtas,check-exception call. In QEMU, the rtas,check-exception call
returns a rtas hotplug event to the guest.
Please note that the current pci hotplug path for PowerKVM involves the
kernel receiving the rtas hotplug event, passing it to rtas_errd in
userspace, and having rtas_errd invoke drmgr. The drmgr command then
handles the request as described above for PowerVM systems.
There is no need for this circuitous route, we should just handle the entire
hotplug of devices in the kernel. What I am planning is to enable this
by moving the code to handle hotplug from drmgr into the kernel to
provide a single path for handling device hotplug for both PowerVM and
PowerKVM systems. This patch provides the common iframework and entry point.
For PowerKVM a future update to the kernel rtas code will recognize rtas
hotplug events returned from rtas,check-exception calls and use the common
entry point to handle hotplug of the device.
For PowerVM systems, This patch creates /sys/kernel/dlpar that can be
used by the drmgr command to initiate hotplug requests. In order to do
this a string of the format "<resource> <action> <id_type> <id>" is
written to this file. The string consists of a resource (cpu, memory, pci,
phb), an action (add or remove), an id_type (count, drc index, drc name),
and the corresponding id. The kernel will parse the string and create a
rtas hotplug section that can be passed to the common entry point for
handling hotplug requests.
It should be noted that there is no chance of updating how we receive
hotplug (dlpar) requests from the HMC on PowerVM systems.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some nice cleanups like removing bootmem, and removal of __get_cpu_var().
There is one patch to mm/gup.c. This is the generic GUP implementation, but is
only used by us and arm(64). We have an ack from Steve Capper, and although we
didn't get an ack from Andrew he told us to take the patch through the powerpc
tree.
There's one cxl patch. This is in drivers/misc, but Greg said he was happy for
us to manage fixes for it.
There is an infrastructure patch to support an IPMI driver for OPAL. That patch
also appears in Corey Minyard's IPMI tree, you may see a conflict there.
There is also an RTC driver for OPAL. We weren't able to get any response from
the RTC maintainer, Alessandro Zummo, so in the end we just merged the driver.
The usual batch of Freescale updates from Scott.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUiSTSAAoJEFHr6jzI4aWAirQP/3rIEng0LzLu5kW2zkGylIaM
SNDum1vze3mHiTFl+CFcSIGpC1UEULoB49HA+2oE/ExKpIceG6lpL2LP+wNh2FW5
mozjMjS6mZt4w1Fu1D2ZtgQc3O1T1pxkqsnZmPa8gVf5k5d5IQNPY6yB0pgVWwbV
gwBKxe4VwPAzJjppE9i9MDhNTJwmHZq0lI8XuoTXOOU/f+4G1WxmjrbyveQ7cRP5
i/sq2cKjxpWA+KDeIXo0GR0DpXR7qMeAvFX5xXY7oKuUJIFDM4kSHfmMYP6qLf5c
2vlsJqHVqfOgQdve41z1ooaPzNtg7ezVo+VqqguSgtSgwy2JUo/uHpnzz3gD1Olo
AP5+6xj8LZac0rTPxF4n4Hoyrp7AaaFjEFt1zqT9PWniZW4B41wtia0QORBNUf1S
UEmKAC9T3WZJ47mH7WMSadtOPF9E3Yd/zuiPD4udtptCNKPbr6/k1MpJPIW2D4Rn
BJ0QZTRd7V0yRofXxZtHxaMxq8pWd/Tip7J/zr/ghz+ulnH8BuFamuhCCLuJlESU
+A2PMfuseyTMpH9sMAmmTwSGPDKjaUFWvmFvY/n88NZL7r2LlomNrDWFSSQOIHUP
FxjYmjUMpZeexsfyRdgFV/INhYC3o3cso2fRGO45YK6nkxNnjNFEBS6WhQLvNLBu
sknd1WjXkuJtoMC15SrQ
=jvyT
-----END PGP SIGNATURE-----
Merge tag 'powerpc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc updates from Michael Ellerman:
"Some nice cleanups like removing bootmem, and removal of
__get_cpu_var().
There is one patch to mm/gup.c. This is the generic GUP
implementation, but is only used by us and arm(64). We have an ack
from Steve Capper, and although we didn't get an ack from Andrew he
told us to take the patch through the powerpc tree.
There's one cxl patch. This is in drivers/misc, but Greg said he was
happy for us to manage fixes for it.
There is an infrastructure patch to support an IPMI driver for OPAL.
There is also an RTC driver for OPAL. We weren't able to get any
response from the RTC maintainer, Alessandro Zummo, so in the end we
just merged the driver.
The usual batch of Freescale updates from Scott"
* tag 'powerpc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (101 commits)
powerpc/powernv: Return to cpu offline loop when finished in KVM guest
powerpc/book3s: Fix partial invalidation of TLBs in MCE code.
powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault
powerpc/xmon: Cleanup the breakpoint flags
powerpc/xmon: Enable HW instruction breakpoint on POWER8
powerpc/mm/thp: Use tlbiel if possible
powerpc/mm/thp: Remove code duplication
powerpc/mm/hugetlb: Sanity check gigantic hugepage count
powerpc/oprofile: Disable pagefaults during user stack read
powerpc/mm: Check for matching hpte without taking hpte lock
powerpc: Drop useless warning in eeh_init()
powerpc/powernv: Cleanup unused MCE definitions/declarations.
powerpc/eeh: Dump PHB diag-data early
powerpc/eeh: Recover EEH error on ownership change for BCM5719
powerpc/eeh: Set EEH_PE_RESET on PE reset
powerpc/eeh: Refactor eeh_reset_pe()
powerpc: Remove more traces of bootmem
powerpc/pseries: Initialise nvram_pstore_info's buf_lock
cxl: Name interrupts in /proc/interrupt
cxl: Return error to PSL if IRQ demultiplexing fails & print clearer warning
...
The OF_RECONFIG notifier callback uses a different structure depending
on whether it is a node change or a property change. This is silly, and
not very safe. Rework the code to use the same data structure regardless
of the type of notifier.
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
We have an extra level of indirection on memory hot remove which is not
matched on memory hot add. Memory hotplug is book3s only, so there is
no need for it.
This also enables means remove_memory() (ie memory hot unplug) works
on powernv.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull powerpc updates from Michael Ellerman:
"Here's a first pull request for powerpc updates for 3.18.
The bulk of the additions are for the "cxl" driver, for IBM's Coherent
Accelerator Processor Interface (CAPI). Most of it's in drivers/misc,
which Greg & Arnd maintain, Greg said he was happy for us to take it
through our tree.
There's the usual minor cleanups and fixes, including a bit of noise
in drivers from some of those. A bunch of updates to our EEH code,
which has been getting more testing. Several nice speedups from
Anton, including 20% in clear_page().
And a bunch of updates for freescale from Scott"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (130 commits)
cxl: Fix afu_read() not doing finish_wait() on signal or non-blocking
cxl: Add documentation for userspace APIs
cxl: Add driver to Kbuild and Makefiles
cxl: Add userspace header file
cxl: Driver code for powernv PCIe based cards for userspace access
cxl: Add base builtin support
powerpc/mm: Add hooks for cxl
powerpc/opal: Add PHB to cxl mode call
powerpc/mm: Add new hash_page_mm()
powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts
cxl: Add new header for call backs and structs
powerpc/powernv: Split out set MSI IRQ chip code
powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize
powerpc/msi: Improve IRQ bitmap allocator
powerpc/cell: Make spu_flush_all_slbs() generic
powerpc/cell: Move data segment faulting code out of cell platform
powerpc/cell: Move spu_handle_mm_fault() out of cell platform
powerpc/pseries: Use new defines when calling H_SET_MODE
powerpc: Update contact info in Documentation files
powerpc/perf/hv-24x7: Simplify catalog_read()
...
Fix a number of places where global functions were not including
their prototype. This ensures the prototype and the function match.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Values acquired from Open Firmware are in 32-bit big endian format
and need to be handled on little endian architectures. This patch
ensures values are in cpu endian when hotplugging memory.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Pull more powerpc updates from Ben Herrenschmidt:
"Here are some more powerpc bits for 3.17, essentially fixes.
The biggest series, also aimed at -stable, is from Aneesh and is the
result of weeks and weeks of debugging to find out why the heck or THP
implementation was occasionally triggering multi-hit errors in our
level 1 TLB. It ended up being a combination of issues including
subtleties as to how we should invalidate those special 'MPSS' pages
we use to allow the use of 16M pages inside 4K/64K "base page size"
segments (you really have to love our MMU !)
Another interesting one in the "OMG" category is the series from
Michael adding memory barriers to spin_is_locked(). That's also the
result of many days of debugging to figure out why the semaphore code
would occasionally crash in ways that made no sense. It ended up
being some creative lock stacking that was defeated by the fact that
our locks allow a load inside the locked section to be re-ordered with
the load of the lock value itself (I'm still of two mind about whether
to kill that once and for all by putting a heavier barrier back into
our lock implementation...). The fixes come with a long explanation
in the cset comments, feel free to read it if you feel like having a
headache today"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
powerpc/thp: Add tracepoints to track hugepage invalidate
powerpc/mm: Use read barrier when creating real_pte
powerpc/thp: Use ACCESS_ONCE when loading pmdp
powerpc/thp: Invalidate with vpn in loop
powerpc/thp: Handle combo pages in invalidate
powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte
powerpc/thp: Don't recompute vsid and ssize in loop on invalidate
powerpc/thp: Add write barrier after updating the valid bit
powerpc: reorder per-cpu NUMA information's initialization
powerpc/perf/hv-24x7: Use kmem_cache_free
powerpc/pseries/hvcserver: Fix endian issue in hvcs_get_partner_info
powerpc: Hard disable interrupts in xmon
powerpc: remove duplicate definition of TEXASR_FS
powerpc/pseries: Avoid deadlock on removing ddw
powerpc/pseries: Failure on removing device node
powerpc/boot: Use correct zlib types for comparison
powerpc/powernv: Interface to register/unregister opal dump region
printk: Add function to return log buffer address and size
powerpc: Add POWER8 features to CPU_FTRS_POSSIBLE/ALWAYS
powerpc/ppc476: Disable BTAC
...
Currently, devicetree reconfig notifiers get emitted before the change
is applied to the tree, but that behaviour is problematic if the
receiver wants the determine the new state of the tree. The current
users don't care, but the changeset code to follow will be making
multiple changes at once. Reorder notifiers to get emitted after the
change has been applied to the tree so that callbacks see the new tree
state.
At the same time, fixup the existing callbacks to expect the new order.
There are a few callbacks that compare the old and new values of a
changed property. Put both property pointers into the of_prop_reconfig
structure.
The current notifiers also allow the notifier callback to fail and
cancel the change to the tree, but that feature isn't actually used.
It really isn't valid to ignore a tree modification provided by firmware
anyway, so remove the ability to cancel a change to the tree.
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Nathan Fontenot <nfont@austin.ibm.com>
The pseries platform code unconditionally overrides
memory_block_size_bytes regardless of the running platform.
Create a ppc_md hook that so each platform can choose to
do what it wants.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While testing memory hot-remove, I found following dead lock:
Process #1141 is drmgr, trying to remove some memory, i.e. memory499.
It holds the memory_hotplug_mutex, and blocks when trying to remove file
"online" under dir memory499, in kernfs_drain(), at
wait_event(root->deactivate_waitq,
atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
Process #1120 is trying to online memory499 by
echo 1 > memory499/online
In .kernfs_fop_write, it uses kernfs_get_active() to increase
&kn->active, thus blocking process #1141. While itself is blocked later
when trying to acquire memory_hotplug_mutex, which is held by process
The backtrace of both processes are shown below:
[<c000000001b18600>] 0xc000000001b18600
[<c000000000015044>] .__switch_to+0x144/0x200
[<c000000000263ca4>] .online_pages+0x74/0x7b0
[<c00000000055b40c>] .memory_subsys_online+0x9c/0x150
[<c00000000053cbe8>] .device_online+0xb8/0x120
[<c00000000053cd04>] .online_store+0xb4/0xc0
[<c000000000538ce4>] .dev_attr_store+0x64/0xa0
[<c00000000030f4ec>] .sysfs_kf_write+0x7c/0xb0
[<c00000000030e574>] .kernfs_fop_write+0x154/0x1e0
[<c000000000268450>] .vfs_write+0xe0/0x260
[<c000000000269144>] .SyS_write+0x64/0x110
[<c000000000009ffc>] syscall_exit+0x0/0x7c
[<c000000001b18600>] 0xc000000001b18600
[<c000000000015044>] .__switch_to+0x144/0x200
[<c00000000030be14>] .__kernfs_remove+0x204/0x300
[<c00000000030d428>] .kernfs_remove_by_name_ns+0x68/0xf0
[<c00000000030fb38>] .sysfs_remove_file_ns+0x38/0x60
[<c000000000539354>] .device_remove_attrs+0x54/0xc0
[<c000000000539fd8>] .device_del+0x158/0x250
[<c00000000053a104>] .device_unregister+0x34/0xa0
[<c00000000055bc14>] .unregister_memory_section+0x164/0x170
[<c00000000024ee18>] .__remove_pages+0x108/0x4c0
[<c00000000004b590>] .arch_remove_memory+0x60/0xc0
[<c00000000026446c>] .remove_memory+0x8c/0xe0
[<c00000000007f9f4>] .pseries_remove_memblock+0xd4/0x160
[<c00000000007fcfc>] .pseries_memory_notifier+0x27c/0x290
[<c0000000008ae6cc>] .notifier_call_chain+0x8c/0x100
[<c0000000000d858c>] .__blocking_notifier_call_chain+0x6c/0xe0
[<c00000000071ddec>] .of_property_notify+0x7c/0xc0
[<c00000000071ed3c>] .of_update_property+0x3c/0x1b0
[<c0000000000756cc>] .ofdt_write+0x3dc/0x740
[<c0000000002f60fc>] .proc_reg_write+0xac/0x110
[<c000000000268450>] .vfs_write+0xe0/0x260
[<c000000000269144>] .SyS_write+0x64/0x110
[<c000000000009ffc>] syscall_exit+0x0/0x7c
This patch uses lock_device_hotplug() to protect remove_memory() called
in pseries_remove_memblock(), which is also stated before function
remove_memory():
* NOTE: The caller must call lock_device_hotplug() to serialize hotplug
* and online/offline operations before this call, as required by
* try_offline_node().
*/
void __ref remove_memory(int nid, u64 start, u64 size)
With this lock held, the other process(#1120 above) trying to online the
memory block will retry the system call when calling
lock_device_hotplug_sysfs(), and finally find No such device error.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The memory remove code for powerpc/pseries should call remove_memory()
so that we are holding the hotplug_memory lock during memory remove
operations.
This patch updates the memory node remove handler to call remove_memory()
and adds a ppc_md.remove_memory() entry to handle pseries specific work
that is called from arch_remove_memory().
During memory remove in pseries_remove_memblock() we have to stay with
removing memory one section at a time. This is needed because of how memory
resources are handled. During memory add for pseries (via the probe file in
sysfs) we add memory one section at a time which gives us a memory resource
for each section. Future patches will aim to address this so will not have
to remove memory one section at a time.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When removing prom.h include by of.h, several OF headers will no longer
be implicitly included. Add explicit includes of of_*.h as needed.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: linuxppc-dev@lists.ozlabs.org
__remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC
pseries will return -EOPNOTSUPP if unsupported.
Adding an #ifdef causes several other functions it depends on to also
become unnecessary, which saves in .text when disabled (it's disabled in
most defconfigs besides powerpc, including x86). remove_memory_block()
becomes static since it is not referenced outside of
drivers/base/memory.c.
Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
and disabled.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>