From 30467e0b3be83c286d60039f8267dd421128ca74 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Tue, 14 Apr 2015 15:45:11 -0700 Subject: mm, hotplug: fix concurrent memory hot-add deadlock There's a deadlock when concurrently hot-adding memory through the probe interface and switching a memory block from offline to online. When hot-adding memory via the probe interface, add_memory() first takes mem_hotplug_begin() and then device_lock() is later taken when registering the newly initialized memory block. This creates a lock dependency of (1) mem_hotplug.lock (2) dev->mutex. When switching a memory block from offline to online, dev->mutex is first grabbed in device_online() when the write(2) transitions an existing memory block from offline to online, and then online_pages() will take mem_hotplug_begin(). This creates a lock inversion between mem_hotplug.lock and dev->mutex. Vitaly reports that this deadlock can happen when kworker handling a probe event races with systemd-udevd switching a memory block's state. This patch requires the state transition to take mem_hotplug_begin() before dev->mutex. Hot-adding memory via the probe interface creates a memory block while holding mem_hotplug_begin(), there is no way to take dev->mutex first in this case. online_pages() and offline_pages() are only called when transitioning memory block state. We now require that mem_hotplug_begin() is taken before calling them -- this requires exporting the mem_hotplug_begin() and mem_hotplug_done() to generic code. In all hot-add and hot-remove cases, mem_hotplug_begin() is done prior to device_online(). This is all that is needed to avoid the deadlock. Signed-off-by: David Rientjes Reported-by: Vitaly Kuznetsov Tested-by: Vitaly Kuznetsov Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: "K. Y. Srinivasan" Cc: Yasuaki Ishimatsu Cc: Tang Chen Cc: Vlastimil Babka Cc: Zhang Zhen Cc: Vladimir Davydov Cc: Wang Nan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/base/memory.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) (limited to 'drivers/base') diff --git a/drivers/base/memory.c b/drivers/base/memory.c index ab5c9a7..2804aed 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -219,6 +219,7 @@ static bool pages_correctly_reserved(unsigned long start_pfn) /* * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is * OK to have direct references to sparsemem variables in here. + * Must already be protected by mem_hotplug_begin(). */ static int memory_block_action(unsigned long phys_index, unsigned long action, int online_type) @@ -286,6 +287,7 @@ static int memory_subsys_online(struct device *dev) if (mem->online_type < 0) mem->online_type = MMOP_ONLINE_KEEP; + /* Already under protection of mem_hotplug_begin() */ ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE); /* clear online_type */ @@ -328,17 +330,19 @@ store_mem_state(struct device *dev, goto err; } + /* + * Memory hotplug needs to hold mem_hotplug_begin() for probe to find + * the correct memory block to online before doing device_online(dev), + * which will take dev->mutex. Take the lock early to prevent an + * inversion, memory_subsys_online() callbacks will be implemented by + * assuming it's already protected. + */ + mem_hotplug_begin(); + switch (online_type) { case MMOP_ONLINE_KERNEL: case MMOP_ONLINE_MOVABLE: case MMOP_ONLINE_KEEP: - /* - * mem->online_type is not protected so there can be a - * race here. However, when racing online, the first - * will succeed and the second will just return as the - * block will already be online. The online type - * could be either one, but that is expected. - */ mem->online_type = online_type; ret = device_online(&mem->dev); break; @@ -349,6 +353,7 @@ store_mem_state(struct device *dev, ret = -EINVAL; /* should never happen */ } + mem_hotplug_done(); err: unlock_device_hotplug(); -- cgit v1.1