summaryrefslogtreecommitdiffstats
path: root/include/asm-x86_64
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] add sem_is_read/write_locked()Rik Van Riel2005-10-291-0/+5
| | | | | | | | | | Add sem_is_read/write_locked functions to the read/write semaphores, along the same lines of the *_is_locked spinlock functions. The swap token tuning patch uses sem_is_read_locked; sem_is_write_locked is added for completeness. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] gfp_t: dma-mapping (amd64)Al Viro2005-10-281-1/+1
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] gfp_t: dma-mapping (ia64)Al Viro2005-10-281-1/+1
| | | | | | | | ... and related annotations for amd64 - swiotlb code is shared, but prototypes are not. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Revert "x86-64: Avoid unnecessary double bouncing for swiotlb"Linus Torvalds2005-10-271-3/+3
| | | | | | | | | Commit id 6142891a0c0209c91aa4a98f725de0d6e2ed4918 Andi Kleen reports that it seems to break things for some people, and since it's purely a small optimization, revert it for now. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86_64: Allocate cpu local data for all possible CPUsAndi Kleen2005-10-101-0/+1
| | | | | | | | | CPU hotplug fills up the possible map to NR_CPUs, but it did that after setting up per CPU data. This lead to CPU data not getting allocated for all possible CPUs, which lead to various side effects. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86_64: Add missing () around arguments of pte_index macroKirill Korotaev2005-09-301-1/+1
| | | | | | | | x86-64: Add missing () around arguments of pte_index macro Signed-Off-By: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-Off-By: Kirill Korotaev <dev@sw.ru> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix up TLB flush filter disablingAndi Kleen2005-09-291-0/+1
| | | | | | | | I checked with AMD and they requested to only disable it for family 15. Also disable it for i386 too. And some style fixes. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86_64: desc.h-needs smp.hAndrew Morton2005-09-171-0/+2
| | | | | | | | | | include/asm/desc.h: In function `load_LDT': include/asm/desc.h:209: warning: implicit declaration of function `get_cpu' include/asm/desc.h:211: warning: implicit declaration of function `put_cpu' Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] feature removal of io_remap_page_range()Randy Dunlap2005-09-131-3/+0
| | | | | | | | | As written in Documentation/feature-removal-schedule.txt, remove the io_remap_page_range() kernel API. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: clean up local_add/sub argumentsAndi Kleen2005-09-121-2/+2
| | | | Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: i386/x86-64: Fix time going twice as fast problem on ATI ↵Chuck Ebbert2005-09-121-0/+2
| | | | | | | | | | | Xpress chipsets Original patch from Bertro Simul This is probably still not quite correct, but seems to be the best solution so far. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: reduce x86-64 bug frame by 4 bytesJan Beulich2005-09-121-6/+4
| | | | | | | | As mentioned before, the size of the bug frame can be further reduced while continuing to use instructions to encode the information. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Lose constraints on cmpxchgJan Beulich2005-09-121-3/+3
| | | | | | | | | While only cosmetic for x86-64, this adjusts the cmpxchg code appearantly inherited from i386 to use more generic constraints. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Declare NMI_VECTOR and handle it in the IPI sending code.Jan Beulich2005-09-122-3/+15
| | | | | Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Remove unused vxtime.hz fieldAndi Kleen2005-09-121-1/+0
| | | | | Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Set the stack pointer correctly in init_thread and init_tssAndi Kleen2005-09-121-1/+7
| | | | | Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Safe interrupts in oops_begin/endJan Beulich2005-09-122-5/+2
| | | | | | | | | Rather than blindly re-enabling interrupts in oops_end(), save their state in oope_begin() and then restore that state. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Merge msr.c with i386 versionAndi Kleen2005-09-121-12/+27
| | | | | | | | | | The only difference was the inline assembly, so move that into asm/msr.h and merge with the i386 version. This adds some missing sysfs support code to x86-64. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Fix CFI informationJan Beulich2005-09-122-8/+23
| | | | | | | | | | | | Being the foundation for reliable stack unwinding, this fixes CFI unwind annotations in many low-level x86_64 routines, plus a config option (available to all architectures, and also present in the previously sent patch adding such annotations to i386 code) to enable them separatly rather than only along with adding full debug information. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Add dma_sync_single_range_for_{cpu,device}Andi Kleen2005-09-121-0/+5
| | | | | | | | | Currently just defined to their non range parts. Pointed out by John Linville Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Replace extern inline with static inline in asm-x86_64/*Adrian Bunk2005-09-1210-30/+30
| | | | | | | | | They should be identical in the kernel now, but this makes it consistent with other code. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Increase nodemap hash.Nakul Saraiya2005-09-121-1/+1
| | | | | | | | | Needed for some newer Opteron systems with E stepping and memory relocation enabled. The node addresses are different in lower bits now so the nodemap hash function needs to be enlarged. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Fix off by one in pfn_validJim Paradis2005-09-121-1/+1
| | | | | | | When I gave proposed the fix to pfn_valid() for RHEL4, Stephen Tweedie's sharp eyes caught this: Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Increase TLB flush array sizeAndi Kleen2005-09-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | The generic TLB flush functions kept upto 506 pages per CPU to avoid too frequent IPIs. This value was done for the L1 cache of older x86 CPUs, but with modern CPUs it does not make much sense anymore. TLB flushing is slow enough that using the L2 cache is fine. This patch increases the flush array on x86-64 to cache 5350 pages. That is roughly 20MB with 4K pages. It speeds up large munmaps in multithreaded processes on SMP considerably. The cost is roughly 42k of memory per CPU, which is reasonable. I only increased it on x86-64 for now, but it would probably make sense to increase it everywhere. Embedded architectures with SMP may keep it smaller to save some memory per CPU. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Don't include config.h in asm/timex.hAndi Kleen2005-09-121-1/+0
| | | | | | | | | asm-x86-64/timex.h does not reference CONFIG constants. Do not need to include config.h. Signed-off-by: Grant Grundler <iod00d@hp.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Some cleanup and optimization to the processor data area.Andi Kleen2005-09-122-10/+11
| | | | | | | | | - Remove unused irqrsp field - Remove pda->me - Optimize set_softirq_pending slightly Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Make remote TLB flush more scalableAndi Kleen2005-09-121-8/+9
| | | | | | | | | | | | | | | | | | | | | | | Instead of using a global spinlock to protect the state of the remote TLB flush use a lock and state for each sending CPU. To tell the receiver where to look for the state use 8 different call vectors. Each CPU uses a specific vector to trigger flushes on other CPUs. Depending on the received vector the target CPUs look into the right per cpu variable for the flush data. When the system has more than 8 CPUs they are hashed to the 8 available vectors. The limited global vector space forces us to this right now. In future when interrupts are split into per CPU domains this could be fixed, at the cost of needing more IPIs in flat mode. Also some minor cleanup in the smp flush code and remove some outdated debug code. Requires patch to move cpu_possible_map setup earlier. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Use ACPI PXM to parse PCI<->node assignmentsAndi Kleen2005-09-122-2/+2
| | | | | | | Since this is shared code I had to implement it for i386 too Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Remove redundant max_mapnr and replace with end_pfnAndi Kleen2005-09-122-3/+3
| | | | | | | | The FLATMEM people added it, but there doesn't seem a good reason because end_pfn is identical. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Avoid unnecessary double bouncing for swiotlbAndi Kleen2005-09-121-3/+3
| | | | | | | | | PCI_DMA_BUS_IS_PHYS has to be zero even when the GART IOMMU is disabled and the swiotlb is used. Otherwise the block layer does unnecessary double bouncing. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Support dualcore and 8 socket systems in k8 fallback node ↵Andi Kleen2005-09-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | parsing In particular on systems where the local APIC space and node space is very different from the Linux CPU number space. Previously the older NUMA setup code directly parsing the K8 northbridge registers had some issues on 8 socket or dual core systems. This patch fixes them. This is mainly done by fixing some confusion between Linux CPU numbers and local APIC ids. We now pass the local APIC IDs to later code, which avoids mismatches. Also add some heuristics to detect cases where the Hypertransport nodeids and the local APIC IDs don't match, but are shifted by a constant offset. This is still all quite hackish, hopefully BIOS writers fill in correct SRATs instead. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Don't cache align PDA on UP buildsAndi Kleen2005-09-121-1/+1
| | | | | | | Suggested by someone I forgot who sorry. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Don't assign CPU numbers in SRAT parsingAndi Kleen2005-09-121-0/+2
| | | | | | | | | Do that later when the CPU boots. SRAT just stores the APIC<->Node mapping node. This fixes problems on systems where the order of SRAT entries does not match the MADT. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Remove esr disable hack in APIC codeAndi Kleen2005-09-121-1/+0
| | | | | | | | This was just needed for the Numasaurus, which fortunately doesn't support x86-64 CPUs. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86-64: Remove obsolete APIC "write around" bug workaroundAndi Kleen2005-09-121-3/+3
| | | | | | | | | No x86-64 chipset has this bug Generated code doesn't change because it was always disabled. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386/x86_64: make get_cpu_vendor() staticAdrian Bunk2005-09-101-1/+0
| | | | | | | | get_cpu_vendor() no longer has any users in other files. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] spinlock consolidationIngo Molnar2005-09-102-122/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch (written by me and also containing many suggestions of Arjan van de Ven) does a major cleanup of the spinlock code. It does the following things: - consolidates and enhances the spinlock/rwlock debugging code - simplifies the asm/spinlock.h files - encapsulates the raw spinlock type and moves generic spinlock features (such as ->break_lock) into the generic code. - cleans up the spinlock code hierarchy to get rid of the spaghetti. Most notably there's now only a single variant of the debugging code, located in lib/spinlock_debug.c. (previously we had one SMP debugging variant per architecture, plus a separate generic one for UP builds) Also, i've enhanced the rwlock debugging facility, it will now track write-owners. There is new spinlock-owner/CPU-tracking on SMP builds too. All locks have lockup detection now, which will work for both soft and hard spin/rwlock lockups. The arch-level include files now only contain the minimally necessary subset of the spinlock code - all the rest that can be generalized now lives in the generic headers: include/asm-i386/spinlock_types.h | 16 include/asm-x86_64/spinlock_types.h | 16 I have also split up the various spinlock variants into separate files, making it easier to see which does what. The new layout is: SMP | UP ----------------------------|----------------------------------- asm/spinlock_types_smp.h | linux/spinlock_types_up.h linux/spinlock_types.h | linux/spinlock_types.h asm/spinlock_smp.h | linux/spinlock_up.h linux/spinlock_api_smp.h | linux/spinlock_api_up.h linux/spinlock.h | linux/spinlock.h /* * here's the role of the various spinlock/rwlock related include files: * * on SMP builds: * * asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the * initializers * * linux/spinlock_types.h: * defines the generic type and initializers * * asm/spinlock.h: contains the __raw_spin_*()/etc. lowlevel * implementations, mostly inline assembly code * * (also included on UP-debug builds:) * * linux/spinlock_api_smp.h: * contains the prototypes for the _spin_*() APIs. * * linux/spinlock.h: builds the final spin_*() APIs. * * on UP builds: * * linux/spinlock_type_up.h: * contains the generic, simplified UP spinlock type. * (which is an empty structure on non-debug builds) * * linux/spinlock_types.h: * defines the generic type and initializers * * linux/spinlock_up.h: * contains the __raw_spin_*()/etc. version of UP * builds. (which are NOPs on non-debug, non-preempt * builds) * * (included on UP-non-debug builds:) * * linux/spinlock_api_up.h: * builds the _spin_*() APIs. * * linux/spinlock.h: builds the final spin_*() APIs. */ All SMP and UP architectures are converted by this patch. arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via crosscompilers. m32r, mips, sh, sparc, have not been tested yet, but should be mostly fine. From: Grant Grundler <grundler@parisc-linux.org> Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU). Builds 32-bit SMP kernel (not booted or tested). I did not try to build non-SMP kernels. That should be trivial to fix up later if necessary. I converted bit ops atomic_hash lock to raw_spinlock_t. Doing so avoids some ugly nesting of linux/*.h and asm/*.h files. Those particular locks are well tested and contained entirely inside arch specific code. I do NOT expect any new issues to arise with them. If someone does ever need to use debug/metrics with them, then they will need to unravel this hairball between spinlocks, atomic ops, and bit ops that exist only because parisc has exactly one atomic instruction: LDCW (load and clear word). From: "Luck, Tony" <tony.luck@intel.com> ia64 fix Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjanv@infradead.org> Signed-off-by: Grant Grundler <grundler@parisc-linux.org> Cc: Matthew Wilcox <willy@debian.org> Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se> Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild Linus Torvalds2005-09-091-1/+1
|\
| * kbuild: alpha,x86_64 use generic asm-offsets.h supportSam Ravnborg2005-09-091-1/+1
| | | | | | | | | | | | | | Delete obsolete stuff from arch makefiles Rename .h file to asm-offsets.h Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* | [PATCH] remove unnecessary handle_IRQ_event() prototypesKenji Kaneshige2005-09-091-4/+0
|/ | | | | | | | | The function prototype for handle_IRQ_event() in a few architctures is not needed because they use GENERIC_HARDIRQ. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge linux-2.6 with linux-acpi-2.6Len Brown2005-09-0810-114/+138
|\
| * [PATCH] Clean up struct flock definitionsStephen Rothwell2005-09-071-13/+0
| | | | | | | | | | | | | | | | | | This patch just gathers together all the struct flock definitions except xtensa into asm-generic/fcntl.h. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] Clean up the fcntl operationsStephen Rothwell2005-09-071-21/+0
| | | | | | | | | | | | | | | | | | This patch puts the most popular of each fcntl operation/flag into asm-generic/fcntl.h and cleans up the arch files. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] Clean up the open flagsStephen Rothwell2005-09-071-17/+0
| | | | | | | | | | | | | | | | | | This patch puts the most popular of each open flag into asm-generic/fcntl.h and cleans up the arch files. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] Create asm-generic/fcntl.hStephen Rothwell2005-09-071-25/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | This set of patches creates asm-generic/fcntl.h and consolidates as much as possible from the asm-*/fcntl.h files into it. This patch just gathers all the identical bits of the asm-*/fcntl.h files into asm-generic/fcntl.h. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] remove verify_area(): remove verify_area() from various uaccess.h ↵Jesper Juhl2005-09-071-7/+0
| | | | | | | | | | | | | | | | | | | | | | headers Remove the deprecated (and unused) verify_area() from various uaccess.h headers. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] remove asm-*/hdreg.hChristoph Hellwig2005-09-071-1/+0
| | | | | | | | | | | | | | | | unused and useless.. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] auxiliary vector cleanupsH. J. Lu2005-09-071-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | The size of auxiliary vector is fixed at 42 in linux/sched.h. But it isn't very obvious when looking at linux/elf.h. This patch adds AT_VECTOR_SIZE so that we can change it if necessary when a new vector is added. Because of include file ordering problems, doing this necessitated the extraction of the AT_* symbols into a standalone header file. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] compat: be more consistent about [ug]id_tStephen Rothwell2005-09-071-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I first wrote the compat layer patches, I was somewhat cavalier about the definition of compat_uid_t and compat_gid_t (or maybe I just misunderstood :-)). This patch makes the compat types much more consistent with the types we are being compatible with and hopefully will fix a few bugs along the way. compat type type in compat arch __compat_[ug]id_t __kernel_[ug]id_t __compat_[ug]id32_t __kernel_[ug]id32_t compat_[ug]id_t [ug]id_t The difference is that compat_uid_t is always 32 bits (for the archs we care about) but __compat_uid_t may be 16 bits on some. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] FUTEX_WAKE_OP: pthread_cond_signal() speedupJakub Jelinek2005-09-071-0/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter (which at least on UP usually means an immediate context switch to one of the waiter threads). This waiter wakes up and after a few instructions it attempts to acquire the cv internal lock, but that lock is still held by the thread calling pthread_cond_signal. So it goes to sleep and eventually the signalling thread is scheduled in, unlocks the internal lock and wakes the waiter again. Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal to avoid this performance issue, but it was removed when locks were redesigned to the 3 state scheme (unlocked, locked uncontended, locked contended). Following scenario shows why simply using FUTEX_REQUEUE in pthread_cond_signal together with using lll_mutex_unlock_force in place of lll_mutex_unlock is not enough and probably why it has been disabled at that time: The number is value in cv->__data.__lock. thr1 thr2 thr3 0 pthread_cond_wait 1 lll_mutex_lock (cv->__data.__lock) 0 lll_mutex_unlock (cv->__data.__lock) 0 lll_futex_wait (&cv->__data.__futex, futexval) 0 pthread_cond_signal 1 lll_mutex_lock (cv->__data.__lock) 1 pthread_cond_signal 2 lll_mutex_lock (cv->__data.__lock) 2 lll_futex_wait (&cv->__data.__lock, 2) 2 lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock) # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE 2 lll_mutex_unlock_force (cv->__data.__lock) 0 cv->__data.__lock = 0 0 lll_futex_wake (&cv->__data.__lock, 1) 1 lll_mutex_lock (cv->__data.__lock) 0 lll_mutex_unlock (cv->__data.__lock) # Here, lll_mutex_unlock doesn't know there are threads waiting # on the internal cv's lock Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal, but it will cost us not one, but 2 extra syscalls and, what's worse, one of these extra syscalls will be done for every single waiting loop in pthread_cond_*wait. We would need to use lll_mutex_unlock_force in pthread_cond_signal after requeue and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait. Another alternative is to do the unlocking pthread_cond_signal needs to do (the lock can't be unlocked before lll_futex_wake, as that is racy) in the kernel. I have implemented both variants, futex-requeue-glibc.patch is the first one and futex-wake_op{,-glibc}.patch is the unlocking inside of the kernel. The kernel interface allows userland to specify how exactly an unlocking operation should look like (some atomic arithmetic operation with optional constant argument and comparison of the previous futex value with another constant). It has been implemented just for ppc*, x86_64 and i?86, for other architectures I'm including just a stub header which can be used as a starting point by maintainers to write support for their arches and ATM will just return -ENOSYS for FUTEX_WAKE_OP. The requeue patch has been (lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running 32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL. With the following benchmark on UP x86-64 I get: for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \ for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done time elf/ld.so --library-path .:nptl-orig /tmp/bench real 0m0.655s user 0m0.253s sys 0m0.403s real 0m0.657s user 0m0.269s sys 0m0.388s time elf/ld.so --library-path .:nptl-requeue /tmp/bench real 0m0.496s user 0m0.225s sys 0m0.271s real 0m0.531s user 0m0.242s sys 0m0.288s time elf/ld.so --library-path .:nptl-wake_op /tmp/bench real 0m0.380s user 0m0.176s sys 0m0.204s real 0m0.382s user 0m0.175s sys 0m0.207s The benchmark is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt Older futex-requeue-glibc.patch version is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt Older futex-wake_op-glibc.patch version is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt Will post a new version (just x86-64 fixes so that the patch applies against pthread_cond_signal.S) to libc-hacker ml soon. Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded testcase that will not test the atomicity of the operation, but at least check if the threads that should have been woken up are woken up and whether the arithmetic operation in the kernel gave the expected results. Acked-by: Ingo Molnar <mingo@redhat.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Jamie Lokier <jamie@shareable.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
OpenPOWER on IntegriCloud