op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	mm: disable zone_reclaim_mode by default	Mel Gorman	2014-06-04	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When it was introduced, zone_reclaim_mode made sense as NUMA distances punished and workloads were generally partitioned to fit into a NUMA node. NUMA machines are now common but few of the workloads are NUMA-aware and it's routine to see major performance degradation due to zone_reclaim_mode being enabled but relatively few can identify the problem. Those that require zone_reclaim_mode are likely to be able to detect when it needs to be enabled and tune appropriately so lets have a sensible default for the bulk of users. This patch (of 2): zone_reclaim_mode causes processes to prefer reclaiming memory from local node instead of spilling over to other nodes. This made sense initially when NUMA machines were almost exclusively HPC and the workload was partitioned into nodes. The NUMA penalties were sufficiently high to justify reclaiming the memory. On current machines and workloads it is often the case that zone_reclaim_mode destroys performance but not all users know how to detect this. Favour the common case and disable it by default. Users that are sophisticated enough to know they need zone_reclaim_mode will detect it. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	sched: Remove unused mc_capable() and smt_capable()	Bjorn Helgaas	2014-03-11	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove mc_capable() and smt_capable(). Neither is used. Both were added by 5c45bf279d37 ("sched: mc/smt power savings sched policy"). Uses of both were removed by 8e7fbcbc22c1 ("sched: Remove stale power aware scheduling remnants and dysfunctional knobs"). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: http://lkml.kernel.org/r/20140304210737.16893.54289.stgit@bhelgaas-glaptop.roam.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	powerpc: Fix the setup of CPU-to-Node mappings during CPU online	Srivatsa S. Bhat	2014-01-15	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On POWER platforms, the hypervisor can notify the guest kernel about dynamic changes in the cpu-numa associativity (VPHN topology update). Hence the cpu-to-node mappings that we got from the firmware during boot, may no longer be valid after such updates. This is handled using the arch_update_cpu_topology() hook in the scheduler, and the sched-domains are rebuilt according to the new mappings. But unfortunately, at the moment, CPU hotplug ignores these updated mappings and instead queries the firmware for the cpu-to-numa relationships and uses them during CPU online. So the kernel can end up assigning wrong NUMA nodes to CPUs during subsequent CPU hotplug online operations (after booting). Further, a particularly problematic scenario can result from this bug: On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8) threads per core. The switch to Single-Threaded (ST) mode is performed by offlining all except the first CPU thread in each core. Switching back to SMT mode involves onlining those other threads back, in each core. Now consider this scenario: 1. During boot, the kernel gets the cpu-to-node mappings from the firmware and assigns the CPUs to NUMA nodes appropriately, during CPU online. 2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and communicates this update to the kernel. The kernel in turn updates its cpu-to-node associations and rebuilds its sched domains. Everything is fine so far. 3. Now, the user switches the machine from SMT to ST mode (say, by running ppc64_cpu --smt=1). This involves offlining all except 1 thread in each core. 4. The user then tries to switch back from ST to SMT mode (say, by running ppc64_cpu --smt=4), and this involves onlining those threads back. Since CPU hotplug ignores the new mappings, it queries the firmware and tries to associate the newly onlined sibling threads to the old NUMA nodes. This results in sibling threads within the same core getting associated with different NUMA nodes, which is incorrect. The scheduler's build-sched-domains code gets thoroughly confused with this and enters an infinite loop and causes soft-lockups, as explained in detail in commit 3be7db6ab (powerpc: VPHN topology change updates all siblings). So to fix this, use the numa_cpu_lookup_table to remember the updated cpu-to-node mappings, and use them during CPU hotplug online operations. Further, we also need to ensure that all threads in a core are assigned to a common NUMA node, irrespective of whether all those threads were online during the topology update. To achieve this, we take care not to use cpu_sibling_mask() since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread() and set up the mappings manually using the 'threads_per_core' value for that particular platform. This helps us ensure that we don't hit this bug with any combination of CPU hotplug and SMT mode switching. Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Make chip-id information available to userspace	Vasant Hegde	2013-08-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	So far "/sys/devices/system/cpu/cpuX/topology/physical_package_id" was always default (-1) on ppc64 architecture. Now, some systems have an ibm,chip-id property in the cpu nodes in the device tree. On these systems, we now use this information to display physical_package_id. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pseries: Add /proc interface to control topology updates	Nathan Fontenot	2013-04-26	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \|	There are instances in which we do not want topology updates to occur. In order to allow this a /proc interface (/proc/powerpc/topology_updates) is introduced so that topology updates can be enabled and disabled. This patch also adds a prrn_is_enabled() call so that PRRN events are handled in the kernel only if topology updating is enabled. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	sched/numa: Rewrite the CONFIG_NUMA sched domain support	Peter Zijlstra	2012-05-09	1	-36/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current code groups up to 16 nodes in a level and then puts an ALLNODES domain spanning the entire tree on top of that. This doesn't reflect the numa topology and esp for the smaller not-fully-connected machines out there today this might make a difference. Therefore, build a proper numa topology based on node_distance(). Since there's no fixed numa layers anymore, the static SD_NODE_INIT and SD_ALLNODES_INIT aren't usable anymore, the new code tries to construct something similar and scales some values either on the number of cpus in the domain and/or the node_distance() ratio. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Anton Blanchard <anton@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: linux-alpha@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-sh@vger.kernel.org Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: sparclinux@vger.kernel.org Cc: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Greg Pearson <greg.pearson@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: bob.picco@oracle.com Cc: chris.mason@oracle.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-r74n3n8hhuc2ynbrnp3vt954@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem	Kay Sievers	2011-12-21	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Userspace relies on events and generic sysfs subsystem infrastructure from sysdev devices, which are made available with this conversion. Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@amd64.org> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Len Brown <lenb@kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	powerpc/numa: Remove duplicate RECLAIM_DISTANCE definition	Anton Blanchard	2011-09-20	1	-10/+0
\| \| \| \| \| \| \| \|	We have two identical definitions of RECLAIM_DISTANCE, looks like the patch got applied twice. Remove one. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/numa: Disable NEWIDLE balancing at node level	Anton Blanchard	2011-09-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	On big POWER7 boxes we see large amounts of CPU time in system processes like workqueue and watchdog kernel threads. We currently rebalance the entire machine each time a task goes idle and this is very expensive on large machines. Disable newidle balancing at the node level and rely on the scheduler tick to rebalance across nodes. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/numa: Increase SD_NODES_PER_DOMAIN to 32.	Anton Blanchard	2011-09-20	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \|	The largest POWER7 boxes have 32 nodes. SD_NODES_PER_DOMAIN groups nodes into chunks of 16 and adds a global balancing domain (SD_ALLNODES) above it. If we bump SD_NODES_PER_DOMAIN to 32, then we avoid this extra level of balancing on our largest boxes. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/numa: Enable SD_WAKE_AFFINE in node definition	Anton Blanchard	2011-09-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When chasing a performance issue on ppc64, I noticed tasks communicating via a pipe would often end up on different nodes. It turns out SD_WAKE_AFFINE is not set in our node defition. Commit 9fcd18c9e63e (sched: re-tune balancing) enabled SD_WAKE_AFFINE in the node definition for x86 and we need a similar change for ppc64. I used lmbench lat_ctx and perf bench pipe to verify this fix. Each benchmark was run 10 times and the average taken. lmbench lat_ctx: before: 66565 ops/sec after: 204700 ops/sec 3.1x faster perf bench pipe: before: 5.6570 usecs after: 1.3470 usecs 4.2x faster Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pseries: Fix build of topology stuff without CONFIG_NUMA	Benjamin Herrenschmidt	2011-01-12	1	-13/+14
\| \| \| \|	Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pseries: Fix VPHN build errors on non-SMP systems	Jesse Larrew	2011-01-11	1	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The header asm/hvcall.h was previously included indirectly via smp.h. On non-SMP systems, however, these declarations are excluded and the build breaks. This is easily fixed by including asm/hvcall.h directly. The VPHN feature is only meaningful on NUMA systems that implement the SPLPAR option, so exclude the VPHN code on systems without SPLPAR enabled. Also, expose unmap_cpu_from_node() on systems with SPLPAR enabled, even if CONFIG_HOTPLUG_CPU is disabled. Lastly, map_cpu_to_node() is now needed by VPHN to manipulate the node masks after boot time, so remove the __cpuinit annotation to fix a section mismatch. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
*	powerpc: Disable VPHN polling during a suspend operation	Jesse Larrew	2010-12-09	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \|	Tie the polling mechanism into the ibm,suspend-me rtas call to stop/restart polling before/after a suspend, hibernate, migrate, or checkpoint restart operation. This ensures that the system has a chance to disable the polling if the partition is migrated to a system that does not support VPHN (and vice versa). Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	Merge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6	Linus Torvalds	2010-08-05	1	-7/+0
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6: (63 commits) of/platform: Register of_platform_drivers with an "of:" prefix of/address: Clean up function declarations of/spi: call of_register_spi_devices() from spi core code of: Provide default of_node_to_nid() implementation. of/device: Make of_device_make_bus_id() usable by other code. of/irq: Fix endian issues in parsing interrupt specifiers of: Fix phandle endian issues of/flattree: fix of_flat_dt_is_compatible() to match the full compatible string of: remove of_default_bus_ids of: make of_find_device_by_node generic microblaze: remove references to of_device and to_of_device sparc: remove references to of_device and to_of_device powerpc: remove references to of_device and to_of_device of/device: Replace of_device with platform_device in includes and core code of/device: Protect against binding of_platform_drivers to non-OF devices of: remove asm/of_device.h of: remove asm/of_platform.h of/platform: remove all of_bus_type and of_platform_bus_type references of: Merge of_platform_bus_type with platform_bus_type drivercore/of: Add OF style matching to platform bus ... Fix up trivial conflicts in arch/microblaze/kernel/Makefile due to just some obj-y removals by the devicetree branch, while the microblaze updates added a new file.
\| *	of: Provide default of_node_to_nid() implementation.	Grant Likely	2010-07-30	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	of_node_to_nid() is only relevant in a few architectures. Don't force everyone to implement it anyway. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* \|	powerpc/numa: Use form 1 affinity to setup node distance	Anton Blanchard	2010-07-09	1	-0/+3
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Form 1 affinity allows multiple entries in ibm,associativity-reference-points which represent affinity domains in decreasing order of importance. The Linux concept of a node is always the first entry, but using the other values as an input to node_distance() allows the memory allocator to make better decisions on which node to go first when local memory has been exhausted. We keep things simple and create an array indexed by NUMA node, capped at 4 entries. Each time we lookup an associativity property we initialise the array which is overkill, but since we should only hit this path during boot it didn't seem worth adding a per node valid bit. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to enable zone reclaim	Anton Blanchard	2010-05-21	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I noticed /proc/sys/vm/zone_reclaim_mode was 0 on a ppc64 NUMA box. It gets enabled via this: /* * If another node is sufficiently far away then it is better * to reclaim pages in a zone before going off node. */ if (distance > RECLAIM_DISTANCE) zone_reclaim_mode = 1; Since we use the default value of 20 for REMOTE_DISTANCE and 20 for RECLAIM_DISTANCE it never kicks in. The local to remote bandwidth ratios can be quite large on System p machines so it makes sense for us to reclaim clean pagecache locally before going off node. The patch below sets a smaller value for RECLAIM_DISTANCE and thus enables zone reclaim. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/cpumask: Convert NUMA code to new cpumask API	Anton Blanchard	2010-05-06	1	-1/+1
\| \| \| \| \| \| \| \| \|	Convert NUMA code to new cpumask API. We shift the node to cpumask setup code until after we complete bootmem allocation so we can dynamically allocate the cpumasks. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/cpumask: Dynamically allocate cpu_sibling_map and cpu_core_map cpumasks	Anton Blanchard	2010-05-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dynamically allocate cpu_sibling_map and cpu_core_map cpumasks. We don't need to set_cpu_online() the boot cpu in smp_prepare_boot_cpu, init/main.c does it for us. We also postpone setting of the boot cpu in cpu_sibling_map and cpu_core_map until when the memory allocator is available (smp_prepare_cpus), similar to x86. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to enable zone reclaim	Anton Blanchard	2010-04-07	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I noticed /proc/sys/vm/zone_reclaim_mode was 0 on a ppc64 NUMA box. It gets enabled via this: /* * If another node is sufficiently far away then it is better * to reclaim pages in a zone before going off node. */ if (distance > RECLAIM_DISTANCE) zone_reclaim_mode = 1; Since we use the default value of 20 for REMOTE_DISTANCE and 20 for RECLAIM_DISTANCE it never kicks in. The local to remote bandwidth ratios can be quite large on System p machines so it makes sense for us to reclaim clean pagecache locally before going off node. The patch below sets a smaller value for RECLAIM_DISTANCE and thus enables zone reclaim. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Reformat SD_NODE_INIT to match x86	Anton Blanchard	2010-02-09	1	-21/+27
\| \| \| \| \| \| \| \|	Clean up SD_NODE_INITS so we can easily compare it to x86. Similar to the work in 47734f89be0614b5acbd6a532390f9c72f019648 (sched: Clean up topology.h) Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: cpumask_of_node() should handle -1 as a node	Anton Blanchard	2010-01-15	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pcibus_to_node can return -1 if we cannot determine which node a pci bus is on. If passed -1, cpumask_of_node will negatively index the lookup array and pull in random data: # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpus 00000000,00000003,00000000,00000000 # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpulist 64-65 Change cpumask_of_node to check for -1 and return cpu_all_mask in this case: # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpus ffffffff,ffffffff,ffffffff,ffffffff # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpulist 0-127 Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	cpumask: remove obsolete topology_core_siblings and ↵	Rusty Russell	2009-09-24	1	-2/+0
\| \| \| \| \| \| \| \|	topology_thread_siblings: powerpc There were replaced by topology_core_cpumask and topology_thread_cpumask. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
*	cpumask: remove obsolete node_to_cpumask now everyone uses cpumask_of_node	Rusty Russell	2009-09-24	1	-5/+0
\| \| \| \|	Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
*	cpumask: remove the now-obsoleted pcibus_to_cpumask(): powerpc	Rusty Russell	2009-09-24	1	-5/+0
\| \| \| \| \| \|	cpumask_of_pcibus() is the new version. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
*	sched: Disable wakeup balancing	Peter Zijlstra	2009-09-16	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sysbench thinks SD_BALANCE_WAKE is too agressive and kbuild doesn't really mind too much, SD_BALANCE_NEWIDLE picks up most of the slack. On a dual socket, quad core, dual thread nehalem system: sysbench (--num_threads=16): SD_BALANCE_WAKE-: 13982 tx/s SD_BALANCE_WAKE+: 15688 tx/s kbuild (-j16): SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% ) SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% ) (same within noise) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: Improve latencies and throughput	Mike Galbraith	2009-09-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the idle balancer more agressive, to improve a x264 encoding workload provided by Jason Garrett-Glaser: NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 252.82 fps, 22096.60 kb/s encoded 600 frames, 250.69 fps, 22096.60 kb/s encoded 600 frames, 245.76 fps, 22096.60 kb/s NO_NEXT_BUDDY LB_BIAS encoded 600 frames, 344.44 fps, 22096.60 kb/s encoded 600 frames, 346.66 fps, 22096.60 kb/s encoded 600 frames, 352.59 fps, 22096.60 kb/s NO_NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 425.75 fps, 22096.60 kb/s encoded 600 frames, 425.45 fps, 22096.60 kb/s encoded 600 frames, 422.49 fps, 22096.60 kb/s Peter pointed out that this is better done via newidle_idx, not via LB_BIAS, newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago. Worst-case latencies are improved as well as no buddies means less vruntime spread. (as per prior lkml discussions) This change improves kbuild-peak parallelism as well. Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1253011667.9128.16.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: Tweak wake_idx	Peter Zijlstra	2009-09-15	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx, restore that and set them to 0 to make wake balancing more aggressive. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: Merge select_task_rq_fair() and sched_balance_self()	Peter Zijlstra	2009-09-15	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The problem with wake_idle() is that is doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction. To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self(). Modify sched_balance_self() to: - update_shares() when walking up the domain tree, (it only called it for the top domain, but it should have done this anyway), which allows us to remove this ugly bit from try_to_wake_up(). - do wake_affine() on the smallest domain that contains both this (the waking) and the prev (the wakee) cpu for WAKE invocations. Then use the top-down balance steps it had to replace wake_idle(). This leads to the dissapearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective. Touch all topology bits to replace the old with new SD flags -- platforms might need re-tuning, enabling SD_BALANCE_WAKE conditionally on a NUMA distance seems like a good additional feature, magny-core and small nehalem systems would want this enabled, systems with slow interconnects would not. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	cpumask: remove node_to_first_cpu	Rusty Russell	2009-03-30	1	-5/+0
\| \| \| \| \| \| \| \|	Everyone defines it, and only one person uses it (arch/mips/sgi-ip27/ip27-nmi.c). So just open code it there. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: linux-mips@linux-mips.org
*	Merge branch 'master' of ↵	Mike Travis	2009-01-03	1	-3/+9
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask into merge-rr-cpumask Conflicts: arch/x86/kernel/io_apic.c kernel/rcuclassic.c kernel/sched.c kernel/time/tick-sched.c Signed-off-by: Mike Travis <travis@sgi.com> [ mingo@elte.hu: backmerged typo fix for io_apic.c ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| *	cpumask: Introduce topology_core_cpumask()/topology_thread_cpumask(): powerpc	Rusty Russell	2009-01-01	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Impact: New API The old topology_core_siblings() and topology_thread_siblings() return a cpumask_t; these new ones return a (const) struct cpumask *. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>
\| *	cpumask: powerpc: Introduce cpumask_of_{node,pcibus} to replace ↵	Rusty Russell	2008-12-26	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	{node,pcibus}_to_cpumask Impact: New APIs The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these return a pointer to a struct cpumask. Part of removing cpumasks from the stack. (Also replaces powerpc internal uses of node_to_cpumask). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* \|	sched: convert struct root_domain to cpumask_var_t, fix	Ingo Molnar	2008-11-26	1	-1/+0
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mathieu Desnoyers reported this build failure on powerpc: kernel/sched.c: In function 'sd_init_NODE': kernel/sched.c:7319: error: non-static initialization of a flexible array member kernel/sched.c:7319: error: (near initialization for '(anonymous)') this happens because .span changed to cpumask_var_t, hence the static CPU_MASK_NONE initializers in the SD_*_INIT templates are not type-correct anymore. Remove them, as they default to empty anyway. Also remove them from IA64, MIPS and SH. Reported-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	powerpc: Move include files to arch/powerpc/include/asm	Stephen Rothwell	2008-08-04	1	-0/+117
	from include/asm-powerpc. This is the result of a mkdir arch/powerpc/include/asm git mv include/asm-powerpc/* arch/powerpc/include/asm Followed by a few documentation/comment fixups and a couple of places where <asm-powepc/...> was being used explicitly. Of the latter only one was outside the arch code and it is a driver only built for powerpc. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>