summaryrefslogtreecommitdiffstats
path: root/mm/memory_hotplug.c
Commit message (Collapse)AuthorAgeFilesLines
* mem-hotplug: introduce movable_node boot optionTang Chen2013-11-131-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hot-Pluggable field in SRAT specifies which memory is hotpluggable. As we mentioned before, if hotpluggable memory is used by the kernel, it cannot be hot-removed. So memory hotplug users may want to set all hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it. Memory hotplug users may also set a node as movable node, which has ZONE_MOVABLE only, so that the whole node can be hot-removed. But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the kernel cannot use memory in movable nodes. This will cause NUMA performance down. And other users may be unhappy. So we need a way to allow users to enable and disable this functionality. In this patch, we introduce movable_node boot option to allow users to choose to not to consume hotpluggable memory at early boot time and later we can set it as ZONE_MOVABLE. To achieve this, the movable_node boot option will control the memblock allocation direction. That said, after memblock is ready, before SRAT is parsed, we should allocate memory near the kernel image as we explained in the previous patches. So if movable_node boot option is set, the kernel does the following: 1. After memblock is ready, make memblock allocate memory bottom up. 2. After SRAT is parsed, make memblock behave as default, allocate memory top down. Users can specify "movable_node" in kernel commandline to enable this functionality. For those who don't use memory hotplug or who don't want to lose their NUMA performance, just don't specify anything. The kernel will work as before. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Suggested-by: Ingo Molnar <mingo@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/sparsemem: use PAGES_PER_SECTION to remove redundant nr_pages parameterZhang Yanfei2013-11-131-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | For below functions, - sparse_add_one_section() - kmalloc_section_memmap() - __kmalloc_section_memmap() - __kfree_section_memmap() they are always invoked to operate on one memory section, so it is redundant to always pass a nr_pages parameter, which is the page numbers in one section. So we can directly use predefined macro PAGES_PER_SECTION instead of passing the parameter. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpu/mem hotplug: add try_online_node() for cpu_up()Toshi Kani2013-11-131-2/+14
| | | | | | | | | | | | | | | | | | | | | | cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call mem_online_node() to put its node online if offlined and then call build_all_zonelists() to initialize the zone list. These steps are specific to memory hotplug, and should be managed in mm/memory_hotplug.c. lock_memory_hotplug() should also be held for the whole steps. For this reason, this patch replaces mem_online_node() with try_online_node(), which performs the whole steps with lock_memory_hotplug() held. try_online_node() is named after try_offline_node() as they have similar purpose. There is no functional change in this patch. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/memory_hotplug.c: use pfn_to_nid() instead of page_to_nid(pfn_to_page())Xishi Qiu2013-11-131-1/+1
| | | | | | | | | Use "pfn_to_nid(pfn)" instead of "page_to_nid(pfn_to_page(pfn))". Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/memory_hotplug.c: rename the function is_memblock_offlined_cb()Xishi Qiu2013-11-131-2/+2
| | | | | | | | | | | A is_memblock_offlined() return or 1 means memory block is offlined, but is_memblock_offlined_cb() returning 1 means memory block is not offlined, this will confuse somebody, so rename the function. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: use pgdat_end_pfn() to simplify the code in othersXishi Qiu2013-11-131-5/+4
| | | | | | | | | Use "pgdat_end_pfn()" instead of "pgdat->node_start_pfn + pgdat->node_spanned_pages". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'pm+acpi-fixes-3.12-rc1' of ↵Linus Torvalds2013-09-121-4/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "All of these commits are fixes that have emerged recently and some of them fix bugs introduced during this merge window. Specifics: 1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events After the recent ACPIPHP changes we've seen some interesting breakage on a system that triggers device check notifications during boot for non-existing devices. Although those notifications are really spurious, we should be able to deal with them nevertheless and that shouldn't introduce too much overhead. Four commits to make that work properly. 2) Memory hotplug and hibernation mutual exclusion rework This was maent to be a cleanup, but it happens to fix a classical ABBA deadlock between system suspend/hibernation and ACPI memory hotplug which is possible if they are started roughly at the same time. Three commits rework memory hotplug so that it doesn't acquire pm_mutex and make hibernation use device_hotplug_lock which prevents it from racing with memory hotplug. 3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix The ACPI LPSS driver crashes during boot on Apple Macbook Air with Haswell that has slightly unusual BIOS configuration in which one of the LPSS device's _CRS method doesn't return all of the information expected by the driver. Fix from Mika Westerberg, for stable. 4) ACPICA fix related to Store->ArgX operation AML interpreter fix for obscure breakage that causes AML to be executed incorrectly on some machines (observed in practice). From Bob Moore. 5) ACPI core fix for PCI ACPI device objects lookup There still are cases in which there is more than one ACPI device object matching a given PCI device and we don't choose the one that the BIOS expects us to choose, so this makes the lookup take more criteria into account in those cases. 6) Fix to prevent cpuidle from crashing in some rare cases If the result of cpuidle_get_driver() is NULL, which can happen on some systems, cpuidle_driver_ref() will crash trying to use that pointer and the Daniel Fu's fix prevents that from happening. 7) cpufreq fixes related to CPU hotplug Stephen Boyd reported a number of concurrency problems with cpufreq related to CPU hotplug which are addressed by a series of fixes from Srivatsa S Bhat and Viresh Kumar. 8) cpufreq fix for time conversion in time_in_state attribute Time conversion carried out by cpufreq when user space attempts to read /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state won't work correcty if cputime_t doesn't map directly to jiffies. Fix from Andreas Schwab. 9) Revert of a troublesome cpufreq commit Commit 7c30ed5 (cpufreq: make sure frequency transitions are serialized) was intended to address some known concurrency problems in cpufreq related to the ordering of transitions, but unfortunately it introduced several problems of its own, so I decided to revert it now and address the original problems later in a more robust way. 10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle. 11) cpufreq fixes related to system suspend/resume The recent cpufreq changes that made it preserve CPU sysfs attributes over suspend/resume cycles introduced a possible NULL pointer dereference that caused it to crash during the second attempt to suspend. Three commits from Srivatsa S Bhat fix that problem and a couple of related issues. 12) cpufreq locking fix cpufreq_policy_restore() should acquire the lock for reading, but it acquires it for writing. Fix from Lan Tianyu" * tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits) cpufreq: Acquire the lock in cpufreq_policy_restore() for reading cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu cpufreq: Restructure if/else block to avoid unintended behavior cpufreq: Fix crash in cpufreq-stats during suspend/resume intel_pstate: Add Haswell CPU models Revert "cpufreq: make sure frequency transitions are serialized" cpufreq: Use signed type for 'ret' variable, to store negative error values cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock cpufreq: Split __cpufreq_remove_dev() into two parts cpufreq: Fix wrong time unit conversion cpufreq: serialize calls to __cpufreq_governor() cpufreq: don't allow governor limits to be changed when it is disabled ACPI / bind: Prefer device objects with _STA to those without it ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks ACPI / hotplug / PCI: Use _OST to notify firmware about notify status ACPI / hotplug / PCI: Avoid doing too much for spurious notifies ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field. ACPI / hotplug / PCI: Don't trim devices before scanning the namespace ...
| * PM / hibernate / memory hotplug: Rework mutual exclusionRafael J. Wysocki2013-08-311-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since all of the memory hotplug operations have to be carried out under device_hotplug_lock, they won't need to acquire pm_mutex if device_hotplug_lock is held around hibernation. For this reason, make the hibernation code acquire device_hotplug_lock after freezing user space processes and release it before thawing them. At the same tim drop the lock_system_sleep() and unlock_system_sleep() calls from lock_memory_hotplug() and unlock_memory_hotplug(), respectively. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Toshi Kani <toshi.kani@hp.com>
* | mm: memory-hotplug: enable memory hotplug to handle hugepageNaoya Horiguchi2013-09-111-7/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now we can't offline memory blocks which contain hugepages because a hugepage is considered as an unmovable page. But now with this patch series, a hugepage has become movable, so by using hugepage migration we can offline such memory blocks. What's different from other users of hugepage migration is that we need to decompose all the hugepages inside the target memory block into free buddy pages after hugepage migration, because otherwise free hugepages remaining in the memory block intervene the memory offlining. For this reason we introduce new functions dissolve_free_huge_page() and dissolve_free_huge_pages(). Other than that, what this patch does is straightforwardly to add hugepage migration code, that is, adding hugepage code to the functions which scan over pfn and collect hugepages to be migrated, and adding a hugepage allocation function to alloc_migrate_target(). As for larger hugepages (1GB for x86_64), it's not easy to do hotremove over them because it's larger than memory block. So we now simply leave it to fail as it is. [yongjun_wei@trendmicro.com.cn: remove duplicated include] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm/hotplug: remove stop_machine() from try_offline_node()Toshi Kani2013-09-111-9/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lock_device_hotplug() serializes hotplug & online/offline operations. The lock is held in common sysfs online/offline interfaces and ACPI hotplug code paths. And here are the code paths: - CPU & Mem online/offline via sysfs online store_online()->lock_device_hotplug() - Mem online via sysfs state: store_mem_state()->lock_device_hotplug() - ACPI CPU & Mem hot-add: acpi_scan_bus_device_check()->lock_device_hotplug() - ACPI CPU & Mem hot-delete: acpi_scan_hot_remove()->lock_device_hotplug() try_offline_node() off-lines a node if all memory sections and cpus are removed on the node. It is called from acpi_processor_remove() and acpi_memory_remove_memory()->remove_memory() paths, both of which are in the ACPI hotplug code. try_offline_node() calls stop_machine() to stop all cpus while checking all cpu status with the assumption that the caller is not protected from CPU hotplug or CPU online/offline operations. However, the caller is always serialized with lock_device_hotplug(). Also, the code needs to be properly serialized with a lock, not by stopping all cpus at a random place with stop_machine(). This patch removes the use of stop_machine() in try_offline_node() and adds comments to try_offline_node() and remove_memory() that lock_device_hotplug() is required. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm/hotplug: verify hotplug memory rangeToshi Kani2013-09-111-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | add_memory() and remove_memory() can only handle a memory range aligned with section. There are problems when an unaligned range is added and then deleted as follows: - add_memory() with an unaligned range succeeds, but __add_pages() called from add_memory() adds a whole section of pages even though a given memory range is less than the section size. - remove_memory() to the added unaligned range hits BUG_ON() in __remove_pages(). This patch changes add_memory() and remove_memory() to check if a given memory range is aligned with section at the beginning. As the result, add_memory() fails with -EINVAL when a given range is unaligned, and does not add such memory range. This prevents remove_memory() to be called with an unaligned range as well. Note that remove_memory() has to use BUG_ON() since this function cannot fail. [akpm@linux-foundation.org: avoid printk warnings] Signed-off-by: Toshi Kani <toshi.kani@hp.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm: use zone_is_initialized() instead of if(zone->wait_table)Xishi Qiu2013-09-111-1/+1
| | | | | | | | | | | | | | | | | | | | Use "zone_is_initialized()" instead of "if (zone->wait_table)". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm: use zone_is_empty() instead of if(zone->spanned_pages)Xishi Qiu2013-09-111-3/+3
| | | | | | | | | | | | | | | | | | | | Use "zone_is_empty()" instead of "if (zone->spanned_pages)". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm: use zone_end_pfn() instead of zone_start_pfn+spanned_pagesXishi Qiu2013-09-111-3/+4
| | | | | | | | | | | | | | | | | | | | | | Use "zone_end_pfn()" instead of "zone->zone_start_pfn + zone->spanned_pages". Simplify the code, no functional change. [akpm@linux-foundation.org: fix build] Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm/hotplug: remove unnecessary BUG_ON in __offline_pages()Xishi Qiu2013-09-111-1/+0
|/ | | | | | | | | | | | | | | | | | | | | | | | | I think we can remove "BUG_ON(start_pfn >= end_pfn)" in __offline_pages(), because in memory_block_action() "nr_pages = PAGES_PER_SECTION * sections_per_block" is always greater than 0. memory_block_action() offline_pages() __offline_pages() BUG_ON(start_pfn >= end_pfn) In v2.6.32, If info->length==0, this way may hit this BUG_ON(). acpi_memory_disable_device() remove_memory(info->start_addr, info->length) offline_pages() A later Fujitsu patch renamed this function and the BUG_ON() is unnecessary. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/memory_hotplug.c: fix return value of online_pages()Toshi Kani2013-07-091-3/+3
| | | | | | | | | | | | online_pages() is called from memory_block_action() when a user requests to online a memory block via sysfs. This function needs to return a proper error value in case of error. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/memory_hotplug.c: fix a comment typo in register_page_bootmem_info_node()Tang Chen2013-07-091-2/+2
| | | | | | Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'akpm' (updates from Andrew Morton)Linus Torvalds2013-07-031-32/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge first patch-bomb from Andrew Morton: - various misc bits - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been distracted. There has been quite a bit of activity. - About half the MM queue - Some backlight bits - Various lib/ updates - checkpatch updates - zillions more little rtc patches - ptrace - signals - exec - procfs - rapidio - nbd - aoe - pps - memstick - tools/testing/selftests updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits) tools/testing/selftests: don't assume the x bit is set on scripts selftests: add .gitignore for kcmp selftests: fix clean target in kcmp Makefile selftests: add .gitignore for vm selftests: add hugetlbfstest self-test: fix make clean selftests: exit 1 on failure kernel/resource.c: remove the unneeded assignment in function __find_resource aio: fix wrong comment in aio_complete() drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode drivers/memstick/host/r592.c: convert to module_pci_driver drivers/memstick/host/jmb38x_ms: convert to module_pci_driver pps-gpio: add device-tree binding and support drivers/pps/clients/pps-gpio.c: convert to module_platform_driver drivers/pps/clients/pps-gpio.c: convert to devm_* helpers drivers/parport/share.c: use kzalloc Documentation/accounting/getdelays.c: avoid strncpy in accounting tool aoe: update internal version number to v83 aoe: update copyright date aoe: perform I/O completions in parallel ...
| * mm/hotplug: prepare for removing num_physpagesJiang Liu2013-07-031-4/+0
| | | | | | | | | | | | | | | | | | | | | | Prepare for removing num_physpages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * mm: correctly update zone->managed_pagesJiang Liu2013-07-031-13/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enhance adjust_managed_page_count() to adjust totalhigh_pages for highmem pages. And change code which directly adjusts totalram_pages to use adjust_managed_page_count() because it adjusts totalram_pages, totalhigh_pages and zone->managed_pages altogether in a safe way. Remove inc_totalhigh_pages() and dec_totalhigh_pages() from xen/balloon driver bacause adjust_managed_page_count() has already adjusted totalhigh_pages. This patch also fixes two bugs: 1) enhances virtio_balloon driver to adjust totalhigh_pages when reserve/unreserve pages. 2) enhance memory_hotplug.c to adjust totalhigh_pages when hot-removing memory. We still need to deal with modifications of totalram_pages in file arch/powerpc/platforms/pseries/cmm.c, but need help from PPC experts. [akpm@linux-foundation.org: remove ifdef, per Wanpeng Li, virtio_balloon.c cleanup, per Sergei] [akpm@linux-foundation.org: export adjust_managed_page_count() to modules, for drivers/virtio/virtio_balloon.c] Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * mm: make __free_pages_bootmem() only available at boot timeJiang Liu2013-07-031-14/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to simpilify management of totalram_pages and zone->managed_pages, make __free_pages_bootmem() only available at boot time. With this change applied, __free_pages_bootmem() will only be used by bootmem.c and nobootmem.c at boot time, so mark it as __init. Other callers of __free_pages_bootmem() have been converted to use free_reserved_page(), which handles totalram_pages and zone->managed_pages in a safer way. This patch also fix a bug in free_pagetable() for x86_64, which should increase zone->managed_pages instead of zone->present_pages when freeing reserved pages. And now we have managed_pages_count_lock to protect totalram_pages and zone->managed_pages, so remove the redundant ppb_lock lock in put_page_bootmem(). This greatly simplifies the locking rules. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * mm: fix some trivial typos in commentsJiang Liu2013-07-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix some trivial typos in comments. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan@kernel.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * mm/memory_hotplug.c: change normal message to use pr_debugToshi Kani2013-07-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During early boot-up, iomem_resource is set up from the boot descriptor table, such as EFI Memory Table and e820. Later, acpi_memory_device_add() calls add_memory() for each ACPI memory device object as it enumerates ACPI namespace. This add_memory() call is expected to fail in register_memory_resource() at boot since iomem_resource has been set up from EFI/e820. As a result, add_memory() returns -EEXIST, which acpi_memory_device_add() handles as the normal case. This scheme works fine, but the following error message is logged for every ACPI memory device object during boot-up. "System RAM resource %pR cannot be added\n" This patch changes register_memory_resource() to use pr_debug() for the message as it shows up under the normal case. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * memory_hotplug: use pgdat_resize_lock() in __offline_pages()Cody P Schafer2013-07-031-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mmzone.h documents node_size_lock (which pgdat_resize_lock() locks) as follows: * Must be held any time you expect node_start_pfn, node_present_pages * or node_spanned_pages stay constant. [...] So actually hold it when we update node_present_pages in __offline_pages(). [akpm@linux-foundation.org: fix build] Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * memory_hotplug: use pgdat_resize_lock() in online_pages()Cody P Schafer2013-07-031-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mmzone.h documents node_size_lock (which pgdat_resize_lock() locks) as follows: * Must be held any time you expect node_start_pfn, node_present_pages * or node_spanned_pages stay constant. [...] So actually hold it when we update node_present_pages in online_pages(). Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'acpi-hotplug'Rafael J. Wysocki2013-06-281-70/+11
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * acpi-hotplug: ACPI: Do not use CONFIG_ACPI_HOTPLUG_MEMORY_MODULE ACPI / cpufreq: Add ACPI processor device IDs to acpi-cpufreq Memory hotplug: Move alternative function definitions to header ACPI / processor: Fix potential NULL pointer dereference in acpi_processor_add() Memory hotplug / ACPI: Simplify memory removal ACPI / scan: Add second pass of companion offlining to hot-remove code Driver core / MM: Drop offline_memory_block() ACPI / processor: Pass processor object handle to acpi_bind_one() ACPI: Drop removal_type field from struct acpi_device Driver core / memory: Simplify __memory_block_change_state() ACPI / processor: Initialize per_cpu(processors, pr->id) properly CPU: Fix sysfs cpu/online of offlined CPUs Driver core: Introduce offline/online callbacks for memory blocks ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes ACPI / processor: Use common hotplug infrastructure ACPI / hotplug: Use device offline/online for graceful hot-removal Driver core: Use generic offline/online for CPU offline/online Driver core: Add offline/online device operations
| * Memory hotplug: Move alternative function definitions to headerRafael J. Wysocki2013-06-011-7/+1
| | | | | | | | | | | | | | | | Move the definitions of offline_pages() and remove_memory() for CONFIG_MEMORY_HOTREMOVE to memory_hotplug.h, where they belong, and make them static inline. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * Memory hotplug / ACPI: Simplify memory removalRafael J. Wysocki2013-06-011-63/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the memory offlining should be taken care of by the companion device offlining code in acpi_scan_hot_remove(), the ACPI memory hotplug driver doesn't need to offline it in remove_memory() any more. Moreover, since the return value of remove_memory() is not used, it's better to make it be a void function and trigger a BUG() if the memory scheduled for removal is not offline. Change the code in accordance with the above observations. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
| * Driver core / MM: Drop offline_memory_block()Rafael J. Wysocki2013-06-011-1/+1
| | | | | | | | | | | | | | | | | | | | Since offline_memory_block(mem) is functionally equivalent to device_offline(&mem->dev), make the only caller of the former use the latter instead and drop offline_memory_block() entirely. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
| * ACPI / memhotplug: Bind removable memory blocks to ACPI device nodesRafael J. Wysocki2013-05-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During ACPI memory hotplug configuration bind memory blocks residing in modules removable through the standard ACPI mechanism to struct acpi_device objects associated with ACPI namespace objects representing those modules. Accordingly, unbind those memory blocks from the struct acpi_device objects when the memory modules in question are being removed. When "offline" operation for devices representing memory blocks is introduced, this will allow the ACPI core's device hot-remove code to use it to carry out remove_memory() for those memory blocks and check the results of that before it actually removes the modules holding them from the system. Since walk_memory_range() is used for accessing all memory blocks corresponding to a given ACPI namespace object, it is exported from memory_hotplug.c so that the code in acpi_memhotplug.c can use it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
* | mm/memory_hotplug.c: fix printk format warningsRandy Dunlap2013-05-241-3/+6
|/ | | | | | | | | | | | Fix printk format warnings in mm/memory_hotplug.c by using "%pa": mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'resource_size_t' [-Wformat] mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'resource_size_t' [-Wformat] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: fix memory_hotplug.c printk format warningRandy Dunlap2013-04-291-4/+8
| | | | | | | | | | | | | PFN_PHYS() is a phys_addr_t, which can be u32 or u64. Fix the build warning when phys_addr_t is u32. mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'unsigned int' [-Wformat]: => 1685:3 mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'unsigned int' [-Wformat]: => 1685:3 Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, hotplug: avoid compiling memory hotremove functions when disabledDavid Rientjes2013-04-291-33/+35
| | | | | | | | | | | | | | | | | | | | | | | | | __remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC pseries will return -EOPNOTSUPP if unsupported. Adding an #ifdef causes several other functions it depends on to also become unnecessary, which saves in .text when disabled (it's disabled in most defconfigs besides powerpc, including x86). remove_memory_block() becomes static since it is not referenced outside of drivers/base/memory.c. Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled and disabled. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: change __remove_pages() to call release_mem_region_adjustable()Toshi Kani2013-04-291-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Change __remove_pages() to call release_mem_region_adjustable(). This allows a requested memory range to be released from the iomem_resource table even if it does not match exactly to an resource entry but still fits into. The resource entries initialized at bootup usually cover the whole contiguous memory ranges and may not necessarily match with the size of memory hot-delete requests. If release_mem_region_adjustable() failed, __remove_pages() emits a warning message and continues to proceed as it was the case with release_mem_region(). release_mem_region(), which is defined to __release_region(), emits a warning message and returns no error since a void function. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by : Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: T Makphaibulchoke <tmac@hp.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: walk_memory_range(): fix typo in commentToshi Kani2013-04-291-1/+1
| | | | | | | | Fix a typo "end_pft" in the comment of walk_memory_range(). Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/hotplug: only free wait_table if it's allocated by vmallocJianguo Wu2013-03-221-1/+5
| | | | | | | | | | | zone->wait_table may be allocated from bootmem, it can not be freed. Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove_memory(): fix end_pfn settingToshi Kani2013-03-131-1/+1
| | | | | | | | | | | | | | | | | | | remove_memory() calls walk_memory_range() with [start_pfn, end_pfn), where end_pfn is exclusive in this range. Therefore, end_pfn needs to be set to the next page of the end address. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/memory_hotplug: use pgdat_end_pfn() instead of open coding the same.Cody P Schafer2013-02-231-1/+1
| | | | | | | | | | | | Replace open coded pgdat_end_pfn() with helper function. Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: David Hansen <dave@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/memory_hotplug: use ensure_zone_is_initialized()Cody P Schafer2013-02-231-19/+10
| | | | | | | | | | | | Remove open coding of ensure_zone_is_initialzied(). Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: David Hansen <dave@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add helper ensure_zone_is_initialized()Cody P Schafer2013-02-231-0/+11
| | | | | | | | | | | | | | | ensure_zone_is_initialized() checks if a zone is in a empty & not initialized state (typically occuring after it is created in memory hotplugging), and, if so, calls init_currently_empty_zone() to initialize the zone. Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: David Hansen <dave@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add & use zone_end_pfn() and zone_spans_pfn()Cody P Schafer2013-02-231-5/+5
| | | | | | | | | | | | | | | | | | | | | | Add 2 helpers (zone_end_pfn() and zone_spans_pfn()) to reduce code duplication. This also switches to using them in compaction (where an additional variable needed to be renamed), page_alloc, vmstat, memory_hotplug, and kmemleak. Note that in compaction.c I avoid calling zone_end_pfn() repeatedly because I expect at some point the sycronization issues with start_pfn & spanned_pages will need fixing, either by actually using the seqlock or clever memory barrier usage. Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: David Hansen <dave@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove offlining arg to migrate_pagesHugh Dickins2013-02-231-2/+1
| | | | | | | | | | | | | | | | | No functional change, but the only purpose of the offlining argument to migrate_pages() etc, was to ensure that __unmap_and_move() could migrate a KSM page for memory hotremove (which took ksm_thread_mutex) but not for other callers. Now all cases are safe, remove the arg. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Petr Holasek <pholasek@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: increase totalram_pages when free pages allocated by bootmem allocatorJiang Liu2013-02-231-0/+1
| | | | | | | | | | | | | | | | | | | | | Function put_page_bootmem() is used to free pages allocated by bootmem allocator, so it should increase totalram_pages when freeing pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Chris Clayton <chris2553@googlemail.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpu-hotplug,memory-hotplug: clear cpu_to_node() when offlining the nodeWen Congyang2013-02-231-1/+29
| | | | | | | | | | | | | | | | | | | | | | When the node is offlined, there is no memory/cpu on the node. If a sleep task runs on a cpu of this node, it will be migrated to the cpu on the other node. So we can clear cpu-to-node mapping. [akpm@linux-foundation.org: numa_clear_node() and numa_set_node() can no longer be __cpuinit] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memory-hotplug: export the function try_offline_node()Wen Congyang2013-02-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | try_offline_node() will be needed in the tristate drivers/acpi/processor_driver.c. The node will be offlined when all memory/cpu on the node have been hotremoved. So we need the function try_offline_node() in cpu-hotplug path. If the memory-hotplug is disabled, and cpu-hotplug is enabled 1. no memory no the node we don't online the node, and cpu's node is the nearest node. 2. the node contains some memory the node has been onlined, and cpu's node is still needed to migrate the sleep task on the cpu to the same node. So we do nothing in try_offline_node() in this case. [rientjes@google.com: export the function try_offline_node() fix] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <lenb@kernel.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memory-hotplug: do not allocate pgdat if it was not freed when offline.Tang Chen2013-02-231-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since there is no way to guarentee the address of pgdat/zone is not on stack of any kernel threads or used by other kernel objects without reference counting or other symchronizing method, we cannot reset node_data and free pgdat when offlining a node. Just reset pgdat to 0 and reuse the memory when the node is online again. The problem is suggested by Kamezawa Hiroyuki. The idea is from Wen Congyang. NOTE: If we don't reset pgdat to 0, the WARN_ON in free_area_init_node() will be triggered. [akpm@linux-foundation.org: fix warning when CONFIG_NEED_MULTIPLE_NODES=n] [akpm@linux-foundation.org: fix the warning again again] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memory-hotplug: free node_data when a node is offlinedWen Congyang2013-02-231-3/+27
| | | | | | | | | | | | | | | | | | | | We call hotadd_new_pgdat() to allocate memory to store node_data. So we should free it when removing a node. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memory-hotplug: remove sysfs file of nodeTang Chen2013-02-231-2/+56
| | | | | | | | | | | | | | | | | | | | | Introduce a new function try_offline_node() to remove sysfs file of node when all memory sections of this node are removed. If some memory sections of this node are not removed, this function does nothing. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memory_hotplug: clear zone when removing the memoryYasuaki Ishimatsu2013-02-231-0/+207
| | | | | | | | | | | | | | | | | | | | | | | When memory is added, we update zone's and pgdat's start_pfn and spanned_pages in __add_zone(). So we should revert them when the memory is removed. The patch adds a new function __remove_zone() to do this. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memory-hotplug: integrated __remove_section() of CONFIG_SPARSEMEM_VMEMMAP.Tang Chen2013-02-231-11/+0
| | | | | | | | | | | | | | | | | | | | Currently __remove_section for SPARSEMEM_VMEMMAP does nothing. But even if we use SPARSEMEM_VMEMMAP, we can unregister the memory_section. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OpenPOWER on IntegriCloud