summaryrefslogtreecommitdiffstats
path: root/arch/x86
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2010-05-274-23/+29
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits) tracing: Add __used annotation to event variable perf, trace: Fix !x86 build bug perf report: Support multiple events on the TUI perf annotate: Fix up usage of the build id cache x86/mmiotrace: Remove redundant instruction prefix checks perf annotate: Add TUI interface perf tui: Remove annotate from popup menu after failure perf report: Don't start the TUI if -D is used perf: Fix getline undeclared perf: Optimize perf_tp_event_match() perf: Remove more code from the fastpath perf: Optimize the !vmalloc backed buffer perf: Optimize perf_output_copy() perf: Fix wakeup storm for RO mmap()s perf-record: Share per-cpu buffers perf-record: Remove -M perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig ...
| * perf, trace: Fix !x86 build bugPeter Zijlstra2010-05-251-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch b7e2ecef92 (perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction) made the unfortunate mistake of assuming the world is x86 only, correct this. The problem was that perf_fetch_caller_regs() did local_save_flags() into regs->flags, and I re-used that to remove another local_save_flags(), forgetting !x86 doesn't have regs->flags. Do the reverse, remove the local_save_flags() from perf_fetch_caller_regs() and let the ftrace site do the local_save_flags() instead. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Cc: acme@redhat.com Cc: efault@gmx.de Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org LKML-Reference: <1274778175.5882.623.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * x86/mmiotrace: Remove redundant instruction prefix checksAkinobu Mita2010-05-231-1/+1
| | | | | | | | | | | | | | | | | | | | Get rid of the duplicated entries in prefix_codes[] to eliminate redundant checks by skip_prefix(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Pekka Paalanen <pq@iki.fi> LKML-Reference: <1274140110-5841-1-git-send-email-akinobu.mita@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * Merge branch 'perf/urgent' of ↵Ingo Molnar2010-05-20100-2531/+4006
| |\ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core
| * | perf, x86: P4 PMU -- add missing bit in CCCR maskCyrill Gorcunov2010-05-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Should be there for the sake of RAW events. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Lin Ming <ming.m.lin@intel.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100518212439.354345151@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf, x86: P4_pmu_schedule_events -- use smp_processor_id instead of raw_Cyrill Gorcunov2010-05-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This snippet somehow escaped the commit: | commit 137351e0feeb9f25d99488ee1afc1c79f5499a9a | Author: Cyrill Gorcunov <gorcunov@openvz.org> | Date: Sat May 8 15:25:52 2010 +0400 | | x86, perf: P4 PMU -- protect sensible procedures from preemption so bring it eventually back. It helps to catch preemption issue (if there will be, rule of thumb -- don't use raw_ if you can). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100518212439.167259349@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf, x86: P4 PMU -- do a real check for ESCR address being in hashCyrill Gorcunov2010-05-191-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To prevent from clashes in future code modifications do a real check for ESCR address being in hash. At moment the callers are known to pass sane values but better to be on a safe side. And comment fix. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Lin Ming <ming.m.lin@intel.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100518212439.004503600@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf, x86: P4 PMU -- fix typo in unflagged NMI handlingCyrill Gorcunov2010-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <1274174954.22793.17.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf, x86: P4 PMU -- handle unflagged eventsCyrill Gorcunov2010-05-181-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It might happen that an event can overflow without the proper overflow flag set. Check the sign bit in the raw counter value to solve this problem. Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: fweisbec@gmail.com Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1274083984.6540.15.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | numa: x86_64: use generic percpu var numa_node_id() implementationLee Schermerhorn2010-05-275-24/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | x86 arch specific changes to use generic numa_node_id() based on generic percpu variable infrastructure. Back out x86's custom version of numa_node_id() Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | numa: add generic percpu var numa_node_id() implementationLee Schermerhorn2010-05-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rework the generic version of the numa_node_id() function to use the new generic percpu variable infrastructure. Guard the new implementation with a new config option: CONFIG_USE_PERCPU_NUMA_NODE_ID. Archs which support this new implemention will default this option to 'y' when NUMA is configured. This config option could be removed if/when all archs switch over to the generic percpu implementation of numa_node_id(). Arch support involves: 1) converting any existing per cpu variable implementations to use this implementation. x86_64 is an instance of such an arch. 2) archs that don't use a per cpu variable for numa_node_id() will need to initialize the new per cpu variable "numa_node" as cpus are brought on-line. ia64 is an example. 3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g., when NUMA is configured. This is required because I have retained the old implementation by default to allow archs to be modified incrementally, as desired. Subsequent patches will convert x86_64 and ia64 to use this implemenation. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | asm-generic: remove ARCH_HAS_SG_CHAIN in scatterlist.hFUJITA Tomonori2010-05-271-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are more architectures that don't support ARCH_HAS_SG_CHAIN than those that support it. This removes removes ARCH_HAS_SG_CHAIN in asm-generic/scatterlist.h and lets arhictectures to define it. It's clearer than defining ARCH_HAS_SG_CHAIN asm-generic/scatterlist.h and undefing it in arhictectures that don't support it. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | x86_32: use asm-generic/scatterlist.hAndrew Morton2010-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | asm-generic: add NEED_SG_DMA_LENGTH to define sg_dma_len()FUJITA Tomonori2010-05-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are only two ways to define sg_dma_len(); use sg->dma_length or sg->length. This patch introduces NEED_SG_DMA_LENGTH that enables architectures to choose sg->dma_length or sg->length. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | x86: remove unnecessary sync_single_range_* in swiotlb_dma_opsFUJITA Tomonori2010-05-271-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sync_single_range_for_cpu and sync_single_range_for_device hooks in swiotlb_dma_ops are unnecessary because sync_single_for_cpu and sync_single_for_device are used there. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | x86: convert cpu notifier to return encapsulate errno valueAkinobu Mita2010-05-273-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for msr, cpuid, and therm_throt. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | cpusets: randomize node rotor used in cpuset_mem_spread_node()Jack Steiner2010-05-271-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some workloads that create a large number of small files tend to assign too many pages to node 0 (multi-node systems). Part of the reason is that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at node 0 for newly created tasks. This patch changes the rotor to be initialized to a random node number of the cpuset. [akpm@linux-foundation.org: fix layout] [Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration] Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Paul Menage <menage@google.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Revert "endian: #define __BYTE_ORDER"Linus Torvalds2010-05-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit b3b77c8caef1750ebeea1054e39e358550ea9f55, which was also totally broken (see commit 0d2daf5cc858 that reverted the crc32 version of it). As reported by Stephen Rothwell, it causes problems on big-endian machines: > In file included from fs/jfs/jfs_types.h:33, > from fs/jfs/jfs_incore.h:26, > from fs/jfs/file.c:22: > fs/jfs/endian24.h:36:101: warning: "__LITTLE_ENDIAN" is not defined The kernel has never had that crazy "__BYTE_ORDER == __LITTLE_ENDIAN" model. It's not how we do things, and it isn't how we _should_ do things. So don't go there. Requested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | driver core: add devname module aliases to allow module on-demand auto-loadingKay Sievers2010-05-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | | drivers/hwmon/coretemp.c: get TjMax value from MSRCarsten Emde2010-05-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MSR IA32_TEMPERATURE_TARGET contains the TjMax value in the newer Intel processors. Signed-off-by: Huaxu Wan <huaxu.wan@linux.intel.com> Signed-off-by: Carsten Emde <C.Emde@osadl.org> Cc: Jean Delvare <khali@linux-fr.org> Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Yong Wang <yong.y.wang@linux.intel.com> Cc: Rudolf Marek <r.marek@assembler.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | endian: #define __BYTE_ORDERJoakim Tjernlund2010-05-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux does not define __BYTE_ORDER in its endian header files which makes some header files bend backwards to get at the current endian. Lets #define __BYTE_ORDER in big_endian.h/litte_endian.h to make it easier for header files that are used in user space too. In userspace the convention is that 1. _both_ __LITTLE_ENDIAN and __BIG_ENDIAN are defined, 2. you have to test for e.g. __BYTE_ORDER == __BIG_ENDIAN. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'linux-next' of ↵Linus Torvalds2010-05-2111-32/+149
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (36 commits) PCI: hotplug: pciehp: Removed check for hotplug of display devices PCI: read memory ranges out of Broadcom CNB20LE host bridge PCI: Allow manual resource allocation for PCI hotplug bridges x86/PCI: make ACPI MCFG reserved error messages ACPI specific PCI hotplug: Use kmemdup PM/PCI: Update PCI power management documentation PCI: output FW warning in pci_read/write_vpd PCI: fix typos pci_device_dis/enable to pci_dis/enable_device in comments PCI quirks: disable msi on AMD rs4xx internal gfx bridges PCI: Disable MSI for MCP55 on P5N32-E SLI x86/PCI: irq and pci_ids patch for additional Intel Cougar Point DeviceIDs PCI: aerdrv: trivial cleanup for aerdrv_core.c PCI: aerdrv: trivial cleanup for aerdrv.c PCI: aerdrv: introduce default_downstream_reset_link PCI: aerdrv: rework find_aer_service PCI: aerdrv: remove is_downstream PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS PCI: aerdrv: redefine PCI_ERR_ROOT_*_SRC PCI: aerdrv: rework do_recovery PCI: aerdrv: rework get_e_source() ...
| * | | PCI: read memory ranges out of Broadcom CNB20LE host bridgeIra W. Snyder2010-05-213-0/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Read the memory ranges behind the Broadcom CNB20LE host bridge out of the hardware. This allows PCI hotplugging to work, since we know which memory range to allocate PCI BAR's from. The x86 PCI code automatically prefers the ACPI _CRS information when it is available. In that case, this information is not used. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | x86/PCI: make ACPI MCFG reserved error messages ACPI specificFeng Tang2010-05-181-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both ACPI and SFI firmwares will have MCFG space, but the error message isn't valid on SFI, so don't print the message in that case. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | x86/PCI: irq and pci_ids patch for additional Intel Cougar Point DeviceIDsSeth Heasley2010-05-111-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds additional LPC Controller DeviceIDs for the Intel Cougar Point PCH. The DeviceIDs are defined and referenced as a range of values, the same way Ibex Peak was implemented. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | x86/PCI: Convert pci_config_lock to raw_spinlockThomas Gleixner2010-05-116-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pci_config_lock must be a real spinlock in preempt-rt. Convert it to raw_spinlock. No change for !RT kernels. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* | | | Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2010-05-2125-2050/+2987
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits) KVM: x86: Add missing locking to arch specific vcpu ioctls KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls KVM: MMU: Segregate shadow pages with different cr0.wp KVM: x86: Check LMA bit before set_efer KVM: Don't allow lmsw to clear cr0.pe KVM: Add cpuid.txt file KVM: x86: Tell the guest we'll warn it about tsc stability x86, paravirt: don't compute pvclock adjustments if we trust the tsc x86: KVM guest: Try using new kvm clock msrs KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID KVM: x86: add new KVMCLOCK cpuid feature KVM: x86: change msr numbers for kvmclock x86, paravirt: Add a global synchronization point for pvclock x86, paravirt: Enable pvclock flags in vcpu_time_info structure KVM: x86: Inject #GP with the right rip on efer writes KVM: SVM: Don't allow nested guest to VMMCALL into host KVM: x86: Fix exception reinjection forced to true KVM: Fix wallclock version writing race KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) ...
| * | | | KVM: x86: Add missing locking to arch specific vcpu ioctlsAvi Kivity2010-05-191-0/+6
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | KVM: MMU: Segregate shadow pages with different cr0.wpAvi Kivity2010-05-192-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When cr0.wp=0, we may shadow a gpte having u/s=1 and r/w=0 with an spte having u/s=0 and r/w=1. This allows excessive access if the guest sets cr0.wp=1 and accesses through this spte. Fix by making cr0.wp part of the base role; we'll have different sptes for the two cases and the problem disappears. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | KVM: x86: Check LMA bit before set_eferSheng Yang2010-05-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_x86_ops->set_efer() would execute vcpu->arch.efer = efer, so the checking of LMA bit didn't work. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | KVM: Don't allow lmsw to clear cr0.peAvi Kivity2010-05-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current lmsw implementation allows the guest to clear cr0.pe, contrary to the manual, which breaks EMM386.EXE. Fix by ORing the old cr0.pe with lmsw's operand. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | KVM: x86: Tell the guest we'll warn it about tsc stabilityGlauber Costa2010-05-191-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch puts up the flag that tells the guest that we'll warn it about the tsc being trustworthy or not. By now, we also say it is not. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | x86, paravirt: don't compute pvclock adjustments if we trust the tscGlauber Costa2010-05-194-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the HV told us we can fully trust the TSC, skip any correction Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | x86: KVM guest: Try using new kvm clock msrsGlauber Costa2010-05-191-21/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We now added a new set of clock-related msrs in replacement of the old ones. In theory, we could just try to use them and get a return value indicating they do not exist, due to our use of kvm_write_msr_save. However, kvm clock registration happens very early, and if we ever try to write to a non-existant MSR, we raise a lethal #GP, since our idt handlers are not in place yet. So this patch tests for a cpuid feature exported by the host to decide which set of msrs are supported. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUIDGlauber Costa2010-05-191-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, we were using individual KVM_CAP entities to communicate userspace about which cpuids we support. This is suboptimal, since it generates a delay between the feature arriving in the host, and being available at the guest. A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID. This makes userspace automatically aware of what we provide. And if we ever add a new cpuid bit in the future, we have to do that again, which create some complexity and delay in feature adoption. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | KVM: x86: add new KVMCLOCK cpuid featureGlauber Costa2010-05-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This cpuid, KVM_CPUID_CLOCKSOURCE2, will indicate to the guest that kvmclock is available through a new set of MSRs. The old ones are deprecated. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | KVM: x86: change msr numbers for kvmclockGlauber Costa2010-05-192-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avi pointed out a while ago that those MSRs falls into the pentium PMU range. So the idea here is to add new ones, and after a while, deprecate the old ones. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | x86, paravirt: Add a global synchronization point for pvclockGlauber Costa2010-05-191-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In recent stress tests, it was found that pvclock-based systems could seriously warp in smp systems. Using ingo's time-warp-test.c, I could trigger a scenario as bad as 1.5mi warps a minute in some systems. (to be fair, it wasn't that bad in most of them). Investigating further, I found out that such warps were caused by the very offset-based calculation pvclock is based on. This happens even on some machines that report constant_tsc in its tsc flags, specially on multi-socket ones. Two reads of the same kernel timestamp at approx the same time, will likely have tsc timestamped in different occasions too. This means the delta we calculate is unpredictable at best, and can probably be smaller in a cpu that is legitimately reading clock in a forward ocasion. Some adjustments on the host could make this window less likely to happen, but still, it pretty much poses as an intrinsic problem of the mechanism. A while ago, I though about using a shared variable anyway, to hold clock last state, but gave up due to the high contention locking was likely to introduce, possibly rendering the thing useless on big machines. I argue, however, that locking is not necessary. We do a read-and-return sequence in pvclock, and between read and return, the global value can have changed. However, it can only have changed by means of an addition of a positive value. So if we detected that our clock timestamp is less than the current global, we know that we need to return a higher one, even though it is not exactly the one we compared to. OTOH, if we detect we're greater than the current time source, we atomically replace the value with our new readings. This do causes contention on big boxes (but big here means *BIG*), but it seems like a good trade off, since it provide us with a time source guaranteed to be stable wrt time warps. After this patch is applied, I don't see a single warp in time during 5 days of execution, in any of the machines I saw them before. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> CC: Jeremy Fitzhardinge <jeremy@goop.org> CC: Avi Kivity <avi@redhat.com> CC: Marcelo Tosatti <mtosatti@redhat.com> CC: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | x86, paravirt: Enable pvclock flags in vcpu_time_info structureGlauber Costa2010-05-193-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes one padding byte and transform it into a flags field. New versions of guests using pvclock will query these flags upon each read. Flags, however, will only be interpreted when the guest decides to. It uses the pvclock_valid_flags function to signal that a specific set of flags should be taken into consideration. Which flags are valid are usually devised via HV negotiation. Signed-off-by: Glauber Costa <glommer@redhat.com> CC: Jeremy Fitzhardinge <jeremy@goop.org> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | KVM: x86: Inject #GP with the right rip on efer writesRoedel, Joerg2010-05-191-19/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a bug in the KVM efer-msr write path. If a guest writes to a reserved efer bit the set_efer function injects the #GP directly. The architecture dependent wrmsr function does not see this, assumes success and advances the rip. This results in a #GP in the guest with the wrong rip. This patch fixes this by reporting efer write errors back to the architectural wrmsr function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | KVM: SVM: Don't allow nested guest to VMMCALL into hostJoerg Roedel2010-05-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch disables the possibility for a l2-guest to do a VMMCALL directly into the host. This would happen if the l1-hypervisor doesn't intercept VMMCALL and the l2-guest executes this instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | KVM: x86: Fix exception reinjection forced to trueJoerg Roedel2010-05-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch merged recently which allowed to mark an exception as reinjected has a bug as it always marks the exception as reinjected. This breaks nested-svm shadow-on-shadow implementation. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | KVM: Fix wallclock version writing raceAvi Kivity2010-05-191-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wallclock writing uses an unprotected global variable to hold the version; this can cause one guest to interfere with another if both write their wallclock at the same time. Acked-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_rootsAvi Kivity2010-05-191-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On svm, kvm_read_pdptr() may require reading guest memory, which can sleep. Push the spinlock into mmu_alloc_roots(), and only take it after we've read the pdptr. Tested-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)Shane Wang2010-05-193-13/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Per document, for feature control MSR: Bit 1 enables VMXON in SMX operation. If the bit is clear, execution of VMXON in SMX operation causes a general-protection exception. Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution of VMXON outside SMX operation causes a general-protection exception. This patch is to enable this kind of check with SMX for VMXON in KVM. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | KVM: x86: properly update ready_for_interrupt_injectionMarcelo Tosatti2010-05-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent changes to emulate string instructions without entering guest mode exposed a bug where pending interrupts are not properly reflected in ready_for_interrupt_injection. The result is that userspace overwrites a previously queued interrupt, when irqchip's are emulated in userspace. Fix by always updating state before returning to userspace. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | KVM: VMX: Atomically switch efer if EPT && !EFER.NXAvi Kivity2010-05-191-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When EPT is enabled, we cannot emulate EFER.NX=0 through the shadow page tables. This causes accesses through ptes with bit 63 set to succeed instead of failing a reserved bit check. Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | KVM: VMX: Add facility to atomically switch MSRs on guest entry/exitAvi Kivity2010-05-191-0/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some guest msr values cannot be used on the host (for example. EFER.NX=0), so we need to switch them atomically during guest entry or exit. Add a facility to program the vmx msr autoload registers accordingly. Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | KVM: VMX: Add definitions for guest and host EFER autoswitch vmcs entriesAvi Kivity2010-05-191-0/+4
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | KVM: VMX: Add definition for msr autoload entryAvi Kivity2010-05-191-1/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Avi Kivity <avi@redhat.com>
OpenPOWER on IntegriCloud