| Commit message | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a cpu is hotplug-onlined, if we don't set per_cpu(last_jiffy) to
something sane, timer_interrupt will execute its while loop for every
tick missed since the cpu was last online (or since the system was
booted, if we're adding a new cpu). This can cause weird hangs, ssh
sessions dropping, and we can even drop into xmon if we take a global IPI at
the wrong time.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
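A minimal userspace sketch of the catch-up behaviour described above, not the kernel code itself. The struct, the function names and the tb_per_jiffy parameter are hypothetical; the point is only that a stale last_jiffy makes the loop replay every missed tick, while initialising it to "now" at online time leaves nothing to replay.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-cpu last_jiffy bookkeeping. */
struct cpu_timer {
    uint64_t last_jiffy;   /* timebase value of the last tick handled */
};

/* Replay every tick between last_jiffy and now; returns ticks replayed. */
static uint64_t catch_up(struct cpu_timer *t, uint64_t now, uint64_t tb_per_jiffy)
{
    uint64_t ticks = 0;

    assert(tb_per_jiffy > 0 && now >= t->last_jiffy);
    while (now - t->last_jiffy >= tb_per_jiffy) {
        t->last_jiffy += tb_per_jiffy;
        ticks++;            /* a real handler would do per-tick work here */
    }
    return ticks;
}

/* What the fix amounts to: start the cpu's clock at "now" when it comes online. */
static void cpu_online_init(struct cpu_timer *t, uint64_t now)
{
    t->last_jiffy = now;
}
```

With last_jiffy left at zero the loop count grows with the time the cpu was offline; after cpu_online_init() the first interrupt has at most one tick to account for.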
|
|
|
|
|
|
|
|
|
|
|
| |
Since 404849bbd2bfd62e05b36f4753f6e1af6050a824 we've been using
LOAD_REG_ADDRBASE, which uses the toc pointer, in decrementer_iSeries_masked.
This can explode if we take the decrementer interrupt while we're in a module,
because the toc pointer in r2 will be the module's toc pointer.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
|
|
|
| |
[arch/powerpc/kernel/rtas_flash.c]
Checking a pointer for NULL before passing it to kfree is pointless; kfree
does its own NULL checking of its input.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
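The same cleanup can be shown in userspace C, where free(NULL) is defined to do nothing. This is only an analogy to kfree; the flash_buf structure and flash_buf_release() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer holder, loosely modelled on a flash-update buffer. */
struct flash_buf {
    char *data;
};

/* Release the buffer. No NULL test on data is needed: free(NULL) is a no-op. */
static void flash_buf_release(struct flash_buf *b)
{
    assert(b != NULL);
    free(b->data);
    b->data = NULL;   /* makes a second release harmless */
}
```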
|
|
|
|
|
|
|
|
| |
arch/powerpc/kernel/udbg_16550.c: In function `udbg_init_maple_realmode':
arch/powerpc/kernel/udbg_16550.c:162: warning: assignment from incompatible pointer type
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
| |
setup_peg2 must do some refcounting.
of_get_pci_address may need to drop the node
Pegasos l2cr : L2 cache was not active, activating
PCI bus 0 controlled by pci at 80000000
Badness in kref_get at /home/olaf/kernel/olh/ppc64/linux-2.6.16-rc2-olh/lib/kref.c:32
Call Trace:
[C037BD00] [C0007934] show_stack+0x5c/0x184 (unreliable)
[C037BD30] [C000E068] program_check_exception+0x184/0x584
[C037BD90] [C000F5F0] ret_from_except_full+0x0/0x4c
--- Exception: 700 at kref_get+0xc/0x24
LR = of_node_get+0x24/0x3c
[C037BE50] [C004FD94] __pte_alloc_kernel+0x64/0x80 (unreliable)
[C037BE70] [C000CA18] of_get_parent+0x34/0x58
[C037BE90] [C0009B18] of_get_address+0x24/0x174
[C037BED0] [C000A108] of_address_to_resource+0x24/0x68
[C037BF00] [C038B128] chrp_find_bridges+0x114/0x470
[C037BF90] [C038AE48] chrp_setup_arch+0x1fc/0x32c
[C037BFB0] [C03849B0] setup_arch+0x144/0x188
[C037BFD0] [C037C45C] start_kernel+0x34/0x1a8
[C037BFF0] [000037A0] 0x37a0
Badness in kref_get at /home/olaf/kernel/olh/ppc64/linux-2.6.16-rc2-olh/lib/kref.c:32
Call Trace:
[C037BC90] [C0007934] show_stack+0x5c/0x184 (unreliable)
[C037BCC0] [C000E068] program_check_exception+0x184/0x584
[C037BD20] [C000F5F0] ret_from_except_full+0x0/0x4c
--- Exception: 700 at kref_get+0xc/0x24
LR = of_node_get+0x24/0x3c
[C037BDE0] [00000000] 0x0 (unreliable)
[C037BE00] [C000CA18] of_get_parent+0x34/0x58
[C037BE20] [C0009CE8] of_translate_address+0x2c/0x2fc
[C037BEA0] [C0009FE8] __of_address_to_resource+0x30/0xc4
[C037BED0] [C000A130] of_address_to_resource+0x4c/0x68
[C037BF00] [C038B128] chrp_find_bridges+0x114/0x470
[C037BF90] [C038AE48] chrp_setup_arch+0x1fc/0x32c
[C037BFB0] [C03849B0] setup_arch+0x144/0x188
[C037BFD0] [C037C45C] start_kernel+0x34/0x1a8
[C037BFF0] [000037A0] 0x37a0
PCI bus 0 controlled by pci at c0000000
Top of RAM: 0x10000000, Total RAM: 0x10000000
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
| |
remove pointer/integer confusion
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
| |
remove pointer/integer confusion
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's possible for prom_init to allocate the flat device tree inside the
kdump crash kernel region. If this happens, when we load the kdump kernel we
overwrite the flattened device tree, which is bad.
We could make prom_init try and avoid allocating inside the crash kernel
region, but then we run into issues if the crash kernel region uses all the
space inside the RMO. The easiest solution is to move the flat device tree
once we're running in the kernel.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out that we can't stop the watchdog from
triggering here. If we touch the timer (which just uses the current jiffies
value) before we enable interrupts, it does nothing because jiffies
are not mass-updated until after we enable interrupts. If we touch the
timer after we enable interrupts, it's too late because the softlockup
watchdog will already have triggered. The touch_softlockup_watchdog
call removed below does nothing.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
|
| |
We need to prod everyone here since this is the only CPU that is
guaranteed to be running after the ibm,suspend-me RTAS call returns.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
|
| |
Correctly return the status from the RTAS call. rtas_call is expected
to return the status as its return value.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
|
| |
arch/powerpc/kernel/rtas.c is getting hvcall.h via spinlock.h, but when we're
building for UP we don't include spinlock.h.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This addresses two items, which are unlikely to be hit if we
trust drivers.
The first is moving a memory barrier below where the vmerged SG count
is passed back, but before the list is set to end. If those
instructions were reordered, there could be an issue in iommu_unmap_sg().
The second is making sure we terminate the list on the failure case of
iommu_map_sg(). If a driver does not look at the failure return code,
it could pass an ill-formed SG list to iommu_unmap_sg().
Signed-off-by: Jake Moilanen <moilanen@austin.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
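A simplified sketch of the second item, assuming a list that is terminated by an entry with dma_length == 0. The sg_entry layout, map_sg(), unmap_sg() and the fail_at parameter are all hypothetical; merging, barriers and the real unmapping of partially mapped entries are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified scatter/gather entry; dma_length == 0 ends the list. */
struct sg_entry {
    unsigned long dma_address;
    unsigned int  dma_length;
};

/* Map up to n entries; fail_at < n simulates running out of IOMMU space. */
static int map_sg(struct sg_entry *sg, int n, int fail_at)
{
    int i;

    assert(sg != NULL && n > 0);
    for (i = 0; i < n; i++) {
        if (i == fail_at) {
            /* Failure: terminate the list at its head so a caller that
             * ignores the return value cannot unmap stale entries. */
            sg[0].dma_length = 0;
            return 0;
        }
        sg[i].dma_address = 0x1000ul * (unsigned long)(i + 1);
        sg[i].dma_length = 4096;
    }
    return n;
}

/* Walk until the terminator and count what would be unmapped. */
static int unmap_sg(const struct sg_entry *sg, int n)
{
    int i, unmapped = 0;

    for (i = 0; i < n && sg[i].dma_length != 0; i++)
        unmapped++;
    return unmapped;
}
```

A driver that calls unmap_sg() after a failed map_sg() then walks an empty list instead of half-initialised entries.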
|
|
|
|
|
|
|
|
|
|
|
|
| |
You can't boot a kdump kernel via OF, not reliably anyway, the kernel being at
32 MB conflicts with the zImage wrapper etc. and it blows up.
It's trivial to check in prom_init though, and this is early enough that we can
actually drop back to OF where a reset-all will get you going again, which is
kinda nice. I think this should go in for 2.6.16.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In prom.c we run finish_node() on allnodes twice. The first time we just
calculate how much memory we'll need, the second time we do the actual work.
If the calculation stage determines that we need 0 bytes, then we should skip
the lmb allocation. Although an alloc of zero will work, it has been seen to
lead to a BUG_ON() in reserve_bootmem() on at least one machine.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
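The measure-then-allocate pattern described above might look roughly like this in userspace C. The node layout and both function names are hypothetical, and malloc stands in for the lmb allocator; the only point is that a computed size of zero never reaches the allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node: each one needs `need` bytes of extra storage. */
struct node {
    size_t need;
    void  *extra;
};

/* With mem == NULL this only measures; otherwise it hands out slices of mem. */
static size_t finish_nodes(struct node *nodes, int n, char *mem)
{
    size_t used = 0;
    int i;

    assert(nodes != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (mem && nodes[i].need)
            nodes[i].extra = mem + used;
        used += nodes[i].need;
    }
    return used;
}

/* Returns the arena, or NULL when nothing was needed or allocation failed. */
static char *finish_all(struct node *nodes, int n)
{
    size_t size = finish_nodes(nodes, n, NULL);   /* pass 1: measure */
    char *mem = NULL;

    if (size != 0)
        mem = malloc(size);   /* only allocate when something is needed */
    if (mem)
        finish_nodes(nodes, n, mem);              /* pass 2: do the work */
    return mem;
}
```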
|
|
|
|
|
|
|
|
|
| |
When loading up the FPU, we were using a 'ld' (load doubleword)
instruction to get the FP exception mode from the thread_struct,
but it's only an int field. This changes the ld to lwz (load
word and zero-extend).
Signed-off-by: Paul Mackerras <paulus@samba.org>
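To see why the load width matters, here is a small C model of the two loads. The thread_bits structure and its neighbour field are made up for the illustration; memcpy is used so the over-wide read stays well defined in C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slice of a thread structure: a 32-bit mode word followed
 * directly by an unrelated 32-bit field. */
struct thread_bits {
    uint32_t fpexc_mode;
    uint32_t neighbour;
};

/* Like lwz: load exactly 32 bits and zero-extend to 64. */
static uint64_t load_word_zero_extend(const struct thread_bits *t)
{
    uint32_t w;

    memcpy(&w, &t->fpexc_mode, sizeof(w));
    return (uint64_t)w;
}

/* Like the mistaken ld: load 64 bits starting at the 32-bit field. */
static uint64_t load_doubleword(const struct thread_bits *t)
{
    uint64_t d;

    assert(sizeof(*t) >= sizeof(d));
    memcpy(&d, t, sizeof(d));
    return d;
}
```

Whenever the neighbouring field is non-zero, the 64-bit load returns a value that differs from the mode word, on either byte order.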
|
|
|
|
|
|
|
|
|
|
| |
Better save the sigmask instead of throwing it away so it can be restored.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Older pSeries systems with serial ports don't get any console output after
recent changes. CONFIG_ISA does not make sense for CONFIG_PPC_PSERIES
because it enables lots of old drivers. Instead, remove the dependency on
CONFIG_ISA from the serial port discovery code.
Signed-off-by: Olaf Hering <olh@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Implement the TIF_RESTORE_SIGMASK flag in the new arch/powerpc kernel, for
both 32-bit and 64-bit system call paths.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The TIF_RESTORE_SIGMASK flag allows us to have a generic implementation of
sys_rt_sigsuspend() instead of duplicating it for each architecture. This
provides such an implementation and makes arch/powerpc use it.
It also tidies up the ppc32 sys_sigsuspend() to use TIF_RESTORE_SIGMASK.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
|
| |
When I removed the powermac support from arch/ppc/kernel/pci.c,
I overlooked the fact that that file is used in 32-bit ARCH=powerpc
builds. To prevent problems in future, restore the original version
of that file as arch/powerpc/kernel/pci_32.c, and use that.
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- This contains the arch-specific changes for the following generic
kdump fixes, which were already accepted upstream.
. Capturing CPU registers (for the case of 'panic' and invoking
the dump using 'sysrq-trigger') from a function (stack frame) which will
not be available during the kdump boot. Hence, it might result in an
invalid stack trace.
. Dynamically allocating per cpu ELF notes section instead of
statically for NR_CPUS.
- Fix the compiler warning in prom_init.c.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|\ |
|
| |
| |
| |
| |
| | |
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
| |
| |
| |
| |
| | |
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The first generation of PCI powermacs had a host bridge called /chaos
which was for all intents and purposes a PCI host bridge, but has a
device_type of "vci" in the device tree (presumably it's not really
PCI at the hardware level or something).
The OF parsing stuff in arch/powerpc/kernel/prom_parse.c currently
doesn't recognize it as a PCI bridge, which means that controlfb.c
can't get its device addresses.
This makes prom_parse.c recognize a device_type of "vci" as indicating
a PCI host bridge. With this, controlfb works again.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Handle the ibm,suspend-me RTAS call specially. It needs
to be wrapped in a set of synchronization hypervisor calls
(H_Join). When the H_Join calls are made on all CPUs, the
intent is that only one will return with H_Continue, meaning
that he is the "last man standing". That CPU then issues the
ibm,suspend-me call. What is interesting, of course, is that
the CPU running when the rtas syscall is made, may NOT be the
CPU that ultimately executes the ibm,suspend-me rtas call.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
| |
| |
| |
| |
| |
| |
| | |
Add the first MPC83xx board that uses a flat device tree to arch/powerpc.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
On a number of embedded reference boards there isn't always a
way to reset, power_off, or halt the board. Rather than having
each board implement a spin loop just let the generic code do
it.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In 2.6.15-git6 a change was committed in the oprofile support in
the powerpc architecture. It introduced the powerpc_oprofile_type
which contains the define G4. This causes a name clash with the
existing wacom usb tablet driver.
CC [M] drivers/usb/input/wacom.o
drivers/usb/input/wacom.c:98: error: conflicting types for `G4'
include/asm/cputable.h:37: error: previous declaration of `G4'
CC [M] drivers/usb/mon/mon_text.o
make[3]: *** [drivers/usb/input/wacom.o] Error 1
make[2]: *** [drivers/usb/input] Error 2
The elements of an enum declared in global scope are effectively
global identifiers themselves. As such we need to ensure the names
are unique. This patch updates the later oprofile support to use
unique names.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
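A small sketch of the naming problem and the usual cure, prefixing the enumerators. The enumerator names, their values and the tablet_model enum below are illustrative assumptions, not quoted from the patch or from the wacom driver.

```c
#include <assert.h>

/* Hypothetical arch header: enumerators carry a subsystem prefix so they
 * cannot collide with identifiers elsewhere in the tree. */
enum powerpc_oprofile_type {
    PPC_OPROFILE_INVALID = 0,
    PPC_OPROFILE_G4 = 3
};

/* Hypothetical driver-local enum that also wants the short name G4.
 * With an unprefixed G4 in the header above, this would not compile. */
enum tablet_model {
    PENPARTNER,
    G4
};

/* True when the cpu's oprofile type is the G4 one. */
static int is_g4_cpu(enum powerpc_oprofile_type t)
{
    return t == PPC_OPROFILE_G4;
}
```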
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The glibc folks want to use AT_PLATFORM to select between possible
alternative versions of shared libraries. This commit makes the kernel
supply an AT_PLATFORM string that indicates what class of processor
we are running on. Processors with the same set of user-level
instructions and roughly the same instruction scheduling characteristics
are given the same AT_PLATFORM value; for example, 821, 823 and 860
are all reported as "ppc823", and 7447, 7447A, 7448, 7450, 7451, 7455
are all called "ppc7450".
The intention is that the AT_PLATFORM values match the values that
gcc accepts for the -mcpu= option. For values which are numeric
(e.g. -mcpu=750), "ppc" has been prepended.
This also adds a PPC_FEATURE_BOOKE bit to the AT_HWCAP value and sets
it for the 440 family and the Freescale 85xx family.
Signed-off-by: Paul Mackerras <paulus@samba.org>
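The naming rule in the second paragraph can be sketched as a tiny helper. platform_name() is a hypothetical function, and treating "starts with a digit" as "numeric" is an assumption made for the sketch; the kernel itself simply stores the final strings in its cpu table.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build a platform string from a gcc -mcpu= value: names that start with
 * a digit get "ppc" prepended, everything else is copied unchanged. */
static const char *platform_name(const char *mcpu, char *buf, size_t len)
{
    assert(mcpu != NULL && buf != NULL && len > 0);
    if (isdigit((unsigned char)mcpu[0]))
        snprintf(buf, len, "ppc%s", mcpu);
    else
        snprintf(buf, len, "%s", mcpu);
    return buf;
}
```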
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
At present the lppaca - the structure shared with the iSeries
hypervisor and phyp - is contained within the PACA, our own low-level
per-cpu structure. This doesn't have to be so, the patch below
removes it, making a separate array of lppaca structures.
This saves approximately 500*NR_CPUS bytes of image size and kernel
memory, because we don't need an aligning gap between the Linux and
hypervisor portions of every PACA. On the other hand it means an
extra level of dereference in many accesses to the lppaca.
The patch also gets rid of several places where we assign the paca
address to a local variable for no particular reason.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch consolidates the variety of macros used for loading 32 or
64-bit constants in assembler (LOADADDR, LOADBASE, SET_REG_TO_*). The
idea is to make the set of macros consistent across 32 and 64 bit and
to make it more obvious which is the appropriate one to use in a given
situation. The new macros and their semantics are described in the
comments in ppc_asm.h.
In the process, we change several places that were unnecessarily using
immediate loads on ppc64 to use the GOT/TOC. Likewise we clean up a
couple of places where we were clumsily subtracting PAGE_OFFSET with
asm instructions to use assemble-time arithmetic or the toreal() macro
instead.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add an of_find_property function that returns a struct property
given a property name. Then change the get_property function to
use that routine internally.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
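In outline, the split between a record lookup and a value lookup could look like this. The structures are cut-down, hypothetical versions of the device tree types, the function names are placeholders, and locking is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical versions of the device tree structures. */
struct property {
    const char *name;
    int length;
    const void *value;
    struct property *next;
};

struct device_node {
    struct property *properties;
};

/* Return the whole property record, or NULL if the name is not present. */
static struct property *find_property(const struct device_node *np,
                                      const char *name, int *lenp)
{
    struct property *pp;

    assert(np != NULL && name != NULL);
    for (pp = np->properties; pp != NULL; pp = pp->next) {
        if (strcmp(pp->name, name) == 0) {
            if (lenp)
                *lenp = pp->length;
            return pp;
        }
    }
    return NULL;
}

/* The older value-only lookup becomes a thin wrapper. */
static const void *get_prop_value(const struct device_node *np,
                                  const char *name, int *lenp)
{
    struct property *pp = find_property(np, name, lenp);

    return pp ? pp->value : NULL;
}
```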
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for updating and removing device tree
properties. Since we hand out pointers to properties with gay
abandon, we can't just free the property storage. Instead we
move deleted, or the old copy of an updated property, to a
"dead properties" list.
Also note, it's not feasible to kref device tree properties.
We call get_property() all over the kernel in a wild variety
of contexts.
One consequence of this change is that we now take a
read_lock(&devtree_lock) when doing get_property().
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
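A rough model of the "dead properties" idea: a removed property is unlinked from the live list and parked on a second list, so pointers handed out earlier stay valid. The structure and function names are hypothetical and the devtree_lock handling is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct prop {
    const char *name;
    struct prop *next;
};

struct node {
    struct prop *properties;  /* live properties */
    struct prop *deadprops;   /* removed, but never freed */
};

/* Unlink p from the live list and park it on the dead list.
 * Returns 0 on success, -1 if p is not a live property of np. */
static int remove_prop(struct node *np, struct prop *p)
{
    struct prop **link;

    assert(np != NULL && p != NULL);
    for (link = &np->properties; *link != NULL; link = &(*link)->next) {
        if (*link == p) {
            *link = p->next;            /* unlink from the live list */
            p->next = np->deadprops;    /* push onto the dead list */
            np->deadprops = p;
            return 0;
        }
    }
    return -1;
}
```

An update can be built the same way: link in the new copy, then move the old one to the dead list.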
|
|\
| |
| |
| |
| |
| |
| | |
Fix up delete/modify conflict of arch/ppc/kernel/process.c by hand (it's
gone, gone, gone).
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit 5388fb1025443ec223ba556b10efc4c5f83f8682 made signal_32.c
use discard_lazy_cpu_state, which broke ARCH=ppc because that
uses the common signal_32.c but has its own process.c. Make ARCH=ppc
use the common process.c to fix this and to reduce the amount
of duplicated code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
pcibios_claim_one_bus is not needed on iSeries and phbs_remap_io can be
made static.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
powerpc: Fixed memory reserve map layout
The memory reserve map is supposed to be a pair of 64-bit integers
to represent each region. On ppc32 the code was treating the
pair as two 32-bit integers. Additionally, the prom_init code was
producing the wrong layout on ppc32.
Added a simple check to try to provide backwards compatibility.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
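For illustration, a reader of the correct layout could walk the map as pairs of 64-bit big-endian values regardless of the word size of the machine. be64() and count_reserved() are hypothetical helpers, stopping at the first entry with size 0 is an assumption of the sketch, and the backwards-compatibility check mentioned above is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 64-bit value from raw bytes. */
static uint64_t be64(const unsigned char *p)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Count the regions in a reserve map held in `map` (len bytes).
 * Each entry is an (address, size) pair of 64-bit values; an entry
 * with size 0 ends the map. The summed size is stored in *total. */
static size_t count_reserved(const unsigned char *map, size_t len, uint64_t *total)
{
    size_t off, n = 0;

    assert(map != NULL && total != NULL);
    *total = 0;
    for (off = 0; off + 16 <= len; off += 16) {
        uint64_t size = be64(map + off + 8);

        if (size == 0)
            break;
        *total += size;
        n++;
    }
    return n;
}
```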
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Heikki Lindholm pointed out that there was a potential race with the
lazy CPU state (FP, VR, EVR) stuff if preempt is enabled. The race
is that in the process of restoring FP state on sigreturn, the task
gets preempted by a user task that wants to use the FPU. It will take
an FP unavailable exception, which will write the current FPU state
to the thread_struct, overwriting the values which sigreturn has
stored. Note that this can only happen on UP since we don't implement
lazy CPU state on SMP.
The fix is to flush the lazy CPU state before updating the
thread_struct. To do this we re-use the flush_lazy_cpu_state()
function from process.c.
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
| |
| |
| |
| |
| |
| | |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|/
|
|
|
|
| |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
| |
arch: Use <linux/capability.h> where capable() is used.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a window where a probe gets removed right after the probe is hit
on some different cpu. In this case probe handlers can't find a matching
probe instance related to break address. In this case we need to read the
original instruction at break address to see if that is not a break/int3
instruction and recover safely.
Previous code had a bug where we were not checking for the above race in
case of reentrant probes and the below patch fixes this race.
Tested on IA64, Powerpc, x86_64.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
|
|
| |
arch/powerpc/kernel/crash.c isn't safe for PPC32 (yet?), so don't build it.
Built with CONFIG_KEXEC=y for pmac32_defconfig, pseries_defconfig,
and g5_defconfig.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
| |
ignore generated files under arch/powerpc
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes the compilation error (shown below) when CONFIG_SMP=n.
arch/powerpc/kernel/crash.c: In function `crash_kexec_prepare_cpus':
arch/powerpc/kernel/crash.c:236: error: implicit declaration of
function `smp_release_cpus'
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
| |
We were getting elfcorehdr_addr undefined in this case.
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
| |
The current ppc64 per cpu data implementation is quite slow, e.g.:
lhz 11,18(13) /* smp_processor_id() */
ld 9,.LC63-.LCTOC1(30) /* per_cpu__variable_name */
ld 8,.LC61-.LCTOC1(30) /* __per_cpu_offset */
sldi 11,11,3 /* form index into __per_cpu_offset */
mr 10,9
ldx 9,11,8 /* __per_cpu_offset[smp_processor_id()] */
ldx 0,10,9 /* load per cpu data */
5 loads for something that is supposed to be fast, pretty awful. One
reason for the large number of loads is that we have to synthesize 2
64bit constants (per_cpu__variable_name and __per_cpu_offset).
By putting __per_cpu_offset into the paca we can avoid the 2 loads
associated with it:
ld 11,56(13) /* paca->data_offset */
ld 9,.LC59-.LCTOC1(30) /* per_cpu__variable_name */
ldx 0,9,11 /* load per cpu data */
Longer term we should be able to do even better than 3 loads.
If per_cpu__variable_name wasn't a 64bit constant and paca->data_offset
was in a register we could cut it down to one load. A suggestion from
Rusty is to use gcc's __thread extension here. In order to do this we
would need to free up r13 (the __thread register and where the paca
currently is). So far I've had a few unsuccessful attempts at doing that :)
The patch also allocates per cpu memory node local on NUMA machines.
This patch from Rusty has been sitting in my queue _forever_ but stalled
when I hit the compiler bug. Sorry about that.
Finally I also only allocate per cpu data for possible cpus, which comes
straight out of the x86-64 port. On a pseries kernel (with NR_CPUS == 128)
and 4 possible cpus we see some nice gains:
             total       used       free     shared    buffers     cached
Mem:       4012228     212860    3799368          0          0     162424
             total       used       free     shared    buffers     cached
Mem:       4016200     212984    3803216          0          0     162424
A saving of 3.75MB. Quite nice for smaller machines. Note: we now have
to be careful of per cpu users that touch data for !possible cpus.
At this stage it might be worth making the NUMA and possible cpu
optimisations generic, but per cpu init is done so early we have to be
careful that all architectures have their possible map setup correctly.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
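The difference between the two lookup schemes can be modelled in plain C. Everything here is hypothetical (the area size, the paca layout, the function names); the sketch only shows that caching the offset in the per-cpu descriptor removes the table lookup while yielding the same address.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU      4
#define AREA_SIZE 64

/* One copy of the per-cpu data area per cpu, laid out back to back. */
static char percpu_area[NCPU * AREA_SIZE];

/* Old scheme: a global table of offsets indexed by cpu number. */
static size_t per_cpu_offset[NCPU];

/* New scheme: each cpu's own descriptor (the paca here) caches its offset. */
struct paca {
    int    cpu;
    size_t data_offset;
};
static struct paca pacas[NCPU];

static void setup_per_cpu(void)
{
    int cpu;

    for (cpu = 0; cpu < NCPU; cpu++) {
        per_cpu_offset[cpu] = (size_t)cpu * AREA_SIZE;
        pacas[cpu].cpu = cpu;
        pacas[cpu].data_offset = per_cpu_offset[cpu];
    }
}

/* `var` points at the variable's slot in cpu 0's area. */
static char *via_table(char *var, const struct paca *p)
{
    return var + per_cpu_offset[p->cpu];   /* load cpu id, then table entry */
}

static char *via_paca(char *var, const struct paca *p)
{
    return var + p->data_offset;           /* single load from the descriptor */
}
```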
|