op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	KVM: Prevent guest fpu state from leaking into the host	Avi Kivity	2007-06-15	3	-9/+28
\| \| \| \| \| \| \| \| \|	The lazy fpu changes did not take into account that some vmexit handlers can sleep. Move loading the guest state into the inner loop so that it can be reloaded if necessary, and move loading the host state into vmx_vcpu_put() so it can be performed whenever we relinquish the vcpu. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	kvm: fix section mismatch warning in kvm-intel.o	Sam Ravnborg	2007-06-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix following section mismatch warning in kvm-intel.o: WARNING: o-i386/drivers/kvm/kvm-intel.o(.init.text+0xbd): Section mismatch: reference to .exit.text: (between 'hardware_setup' and 'vmx_disabled_by_bios') The function free_kvm_area is used in the function alloc_kvm_area which is marked __init. The __exit area is discarded by some archs during link-time if a module is built-in resulting in an oops. Note: This warning is only seen by my local copy of modpost but the change will soon hit upstream. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Detach sched.h from mm.h	Alexey Dobriyan	2007-05-21	4	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[S390] Kconfig: refine depends statements.	Martin Schwidefsky	2007-05-10	1	-0/+1
\| \| \| \| \| \| \|	Refine some depends statements to limit their visibility to the environments that are actually supported. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
*	Add suspend-related notifications for CPU hotplug	Rafael J. Wysocki	2007-05-09	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	KVM: Remove unused 'instruction_length'	Avi Kivity	2007-05-03	1	-1/+0
\| \| \| \| \| \| \|	As we no longer emulate in userspace, this is meaningless. We don't compute it on SVM anyway. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Don't require explicit indication of completion of mmio or pio	Avi Kivity	2007-05-03	1	-22/+22
\| \| \| \| \| \| \| \|	It is illegal to return from a pio or mmio request without completing it, as mmio or pio is an atomic operation. Therefore, we can simplify the userspace interface by avoiding the completion indication. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Remove extraneous guest entry on mmio read	Avi Kivity	2007-05-03	2	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	When emulating an mmio read, we actually emulate twice: once to determine the physical address of the mmio, and, after we've exited to userspace to get the mmio value, we emulate again to place the value in the result register and update any flags. But we don't really need to enter the guest again for that, only to take an immediate vmexit. So, if we detect that we're doing an mmio read, emulate a single instruction before entering the guest again. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: SVM: Only save/restore MSRs when needed	Anthony Liguori	2007-05-03	2	-17/+20
\| \| \| \| \| \| \| \| \| \| \| \|	We only have to save/restore MSR_GS_BASE on every VMEXIT. The rest can be saved/restored when we leave the VCPU. Since we don't emulate the DEBUGCTL MSRs and the guest cannot write to them, we don't have to worry about saving/restoring them at all. This shaves a whopping 40% off raw vmexit costs on AMD. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: fix an if() condition	Adrian Bunk	2007-05-03	1	-1/+1
\| \| \| \| \| \| \| \|	It might have worked in this case since PT_PRESENT_MASK is 1, but let's express this correctly. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: VMX: Add lazy FPU support for VT	Anthony Liguori	2007-05-03	1	-5/+56
\| \| \| \| \| \| \| \|	Only save/restore the FPU host state when the guest is actually using the FPU. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: VMX: Properly shadow the CR0 register in the vcpu struct	Anthony Liguori	2007-05-03	4	-14/+14
\| \| \| \| \| \| \| \|	Set all of the host mask bits for CR0 so that we can maintain a proper shadow of CR0. This exposes CR0.TS, paving the way for lazy fpu handling. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Don't complain about cpu erratum AA15	Avi Kivity	2007-05-03	1	-2/+0
\| \| \| \| \| \|	It slows down Windows x64 horribly. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Lazy FPU support for SVM	Anthony Liguori	2007-05-03	2	-4/+33
\| \| \| \| \| \| \| \|	Avoid saving and restoring the guest fpu state on every exit. This shaves ~100 cycles off the guest/host switch. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Allow passing 64-bit values to the emulated read/write API	Avi Kivity	2007-05-03	3	-99/+24
\| \| \| \| \| \| \|	This simplifies the API somewhat (by eliminating the special-case cmpxchg8b on i386). Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Per-vcpu statistics	Avi Kivity	2007-05-03	6	-53/+79
\| \| \| \| \| \| \| \|	Make the exit statistics per-vcpu instead of global. This gives a 3.5% boost when running one virtual machine per core on my two socket dual core (4 cores total) machine. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: VMX: Avoid unnecessary vcpu_load()/vcpu_put() cycles	Yaozu Dong	2007-05-03	1	-0/+2
\| \| \| \| \| \| \| \|	By checking if a reschedule is needed, we avoid dropping the vcpu. [With changes by me, based on Anthony Liguori's observations] Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: MMU: Avoid heavy ASSERT at non debug mode.	Yaozu Dong	2007-05-03	1	-0/+6
\| \| \| \|	Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: VMX: Only save/restore MSR_K6_STAR if necessary	Avi Kivity	2007-05-03	1	-0/+16
\| \| \| \| \| \| \| \| \| \|	Intel hosts only support syscall/sysret in long mode (and only if efer.sce is enabled), so only reload the related MSR_K6_STAR if the guest will actually be able to use it. This reduces vmexit cost by about 500 cycles (6400 -> 5870) on my setup. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Fold drivers/kvm/kvm_vmx.h into drivers/kvm/vmx.c	Avi Kivity	2007-05-03	2	-15/+6
\| \| \| \| \| \|	No meat in that file. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: VMX: Don't switch 64-bit msrs for 32-bit guests	Avi Kivity	2007-05-03	1	-16/+42
\| \| \| \| \| \| \| \| \|	Some msrs are only used by x86_64 instructions, and are therefore not needed when the guest is in legacy mode. By not bothering to switch them, we reduce vmexit latency by 2400 cycles (from about 8800) when running a 32-bit guest on a 64-bit host. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: VMX: Reduce unnecessary saving of host msrs	Avi Kivity	2007-05-03	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \|	The automatically switched msrs are never changed on the host (with the exception of MSR_KERNEL_GS_BASE) and thus there is no need to save them on every vm entry. This reduces vmexit latency by ~400 cycles on i386 and by ~900 cycles (10%) on x86_64. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Handle guest page faults when emulating mmio	Avi Kivity	2007-05-03	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Usually, guest page faults are detected by the kvm page fault handler, which detects if they are shadow faults, mmio faults, pagetable faults, or normal guest page faults. However, in certain circumstances, we can detect a page fault much later. One of these events is the following combination: - A two memory operand instruction (e.g. movsb) is executed. - The first operand is in mmio space (which is the fault reported to kvm) - The second operand is in an unmapped address (i.e. a guest page fault) The Windows 2000 installer does such an access, and promptly hangs. Fix by adding the missing page fault injection on that path. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: SVM: Report hardware exit reason to userspace instead of dmesg	Avi Kivity	2007-05-03	1	-6/+1
\| \| \| \|	Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Retry sleeping allocation if atomic allocation fails	Avi Kivity	2007-05-03	1	-5/+21
\| \| \| \| \| \|	This avoids -ENOMEM under memory pressure. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Use slab caches to allocate mmu data structures	Avi Kivity	2007-05-03	3	-4/+45
\| \| \| \| \| \| \|	Better leak detection, statistics, memory use, speed -- goodness all around. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Handle partial pae pdptr	Avi Kivity	2007-05-03	1	-6/+12
\| \| \| \| \| \| \| \| \| \| \|	Some guests (Solaris) do not set up all four pdptrs, but leave some invalid. kvm incorrectly treated these as valid page directories, pinning the wrong pages and causing general confusion. Fix by checking the valid bit of a pae pdpte. This closes sourceforge bug 1698922. Signed-off-by: Avi Kivity <avi@qumranet.com>
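The valid-bit check described above can be illustrated with a small standalone sketch. It is not the kvm code: the helper names are hypothetical, and it relies only on the architectural fact that bit 0 of a PAE page-directory-pointer-table entry is the present bit.

```c
#include <stdint.h>

/* Bit 0 of a PAE pdpte is the present (valid) bit. */
#define PDPTE_PRESENT 0x1ULL

/* Hypothetical helper: only present entries point at a page directory. */
static int pdpte_is_valid(uint64_t pdpte)
{
	return (pdpte & PDPTE_PRESENT) != 0;
}

/* Count how many of the four pdptrs a guest actually set up. */
static unsigned int count_valid_pdptrs(const uint64_t pdptrs[4])
{
	unsigned int i, n = 0;

	for (i = 0; i < 4; i++)
		if (pdpte_is_valid(pdptrs[i]))
			n++;
	return n;
}
```

A guest that leaves two entries zeroed would report two valid pdptrs here; entries that fail the check should be skipped rather than treated as page directories.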
*	KVM: Initialize cr0 to indicate an fpu is present	Avi Kivity	2007-05-03	1	-0/+1
\| \| \| \| \| \| \|	Solaris panics if it sees a cpu with no fpu, and it seems to rely on this bit. Closes sourceforge bug 1698920. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Fix overflow bug in overflow detection code	Eric Sesterhenn / Snakebyte	2007-05-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The expression sp - 6 < sp where sp is a u16 is undefined in C since 'sp - 6' is promoted to int, and signed overflow is undefined in C. gcc 4.2 actually warns about it. Replace with a simpler test. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Avi Kivity <avi@qumranet.com>
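The effect of integer promotion on such a test can be seen in a standalone sketch. The function names are hypothetical and this is not the code touched by the patch; it assumes a platform where int is wider than 16 bits.

```c
#include <stdint.h>

/*
 * sp is promoted to int before the subtraction, so the result is never
 * reduced modulo 2^16 and the comparison cannot detect a 16-bit wrap:
 * it holds for every value of sp.
 */
static int wrap_check_promoted(uint16_t sp)
{
	return sp - 6 < sp;
}

/* A direct test: pushing 6 bytes below a 16-bit sp wraps iff sp < 6. */
static int would_wrap(uint16_t sp)
{
	return sp < 6;
}
```

Comparing against the constant avoids relying on wraparound behaviour in the first place.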
*	KVM: Use kernel-standard types	Avi Kivity	2007-05-03	1	-3/+3
\| \| \| \| \| \|	Noted by Joerg Roedel. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: SVM: enable LBRV virtualization if available	Joerg Roedel	2007-05-03	1	-0/+13
\| \| \| \| \| \| \| \| \|	This patch enables the virtualization of the last branch record MSRs on SVM if this feature is available in hardware. It also introduces a small and simple check feature for specific SVM extensions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Add fpu get/set operations	Avi Kivity	2007-05-03	1	-0/+86
\| \| \| \| \| \| \|	These are really helpful when migrating a floating point app to another machine. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Add physical memory aliasing feature	Avi Kivity	2007-05-03	2	-3/+95
\| \| \| \| \| \| \| \|	With this, we can specify that accesses to one physical memory range will be remapped to another. This is useful for the vga window at 0xa0000 which is used as a movable window into the (much larger) framebuffer. Signed-off-by: Avi Kivity <avi@qumranet.com>
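As a rough sketch of the remapping idea (not the kvm data structures), an alias can be modelled as a range of guest frame numbers that is redirected to a target range. The struct layout and the name resolve_alias are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alias record: [base_gfn, base_gfn + npages) -> target_gfn. */
struct mem_alias {
	uint64_t base_gfn;
	uint64_t npages;
	uint64_t target_gfn;
};

/* Return the frame number an access really lands on. */
static uint64_t resolve_alias(const struct mem_alias *aliases, size_t n,
			      uint64_t gfn)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct mem_alias *a = &aliases[i];

		if (gfn >= a->base_gfn && gfn - a->base_gfn < a->npages)
			return a->target_gfn + (gfn - a->base_gfn);
	}
	return gfn;	/* not aliased */
}
```

For the vga case, a single entry covering the frames of the window at 0xa0000 would point at whichever part of the framebuffer is currently selected.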
*	KVM: Simply gfn_to_page()	Avi Kivity	2007-05-03	4	-42/+33
\| \| \| \| \| \| \| \| \| \| \| \|	Mapping a guest page to a host page is a common operation. Currently, one has first to find the memory slot where the page belongs (gfn_to_memslot), then locate the page itself (gfn_to_page()). This is clumsy, and also won't work well with memory aliases. So simplify gfn_to_page() not to require memory slot translation first, and instead do it internally. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Add mmu cache clear function	Dor Laor	2007-05-03	2	-0/+18
\| \| \| \| \| \| \| \| \| \|	Functions that play around with the physical memory map need a way to clear mappings to possibly nonexistent or invalid memory. Both the mmu cache and the processor tlb are cleared. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: x86 emulator: fix bit string operations operand size	Avi Kivity	2007-05-03	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On x86, bit operations operate on a string of bits that can reside in multiple words. For example, 'btsl %eax, (blah)' will touch the word at blah+4 if %eax is between 32 and 63. The x86 emulator compensates for that by advancing the operand address by (bit offset / BITS_PER_LONG) and truncating the bit offset to the range (0..BITS_PER_LONG-1). This has a side effect of forcing the operand size to 8 bytes on 64-bit hosts. Now, a 32-bit guest goes and fork()s a process. It write protects a stack page at 0xbffff000 using the 'btr' instruction, at offset 0xffc in the page table, with bit offset 1 (for the write permission bit). The emulator now forces the operand size to 8 bytes as previously described, and an innocent page table update turns into a cross-page-boundary write, which is assumed by the mmu code not to be a page table, so it doesn't actually clear the corresponding shadow page table entry. The guest and host permissions are out of sync and guest memory is corrupted soon afterwards, leading to guest failure. Fix by not using BITS_PER_LONG as the word size; instead use the actual operand size, so we get a 32-bit write in that case. Note we still have to teach the mmu to handle cross-page-boundary writes to guest page table; but for now this allows Damn Small Linux 0.4 (2.4.20) to boot. Signed-off-by: Avi Kivity <avi@qumranet.com>
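The address adjustment described above can be sketched as follows. This is an illustration rather than the emulator's code: the names are hypothetical, only non-negative bit offsets are handled, and a 4 KiB page size is assumed.

```c
#include <stdint.h>

struct bit_operand {
	uint64_t addr;		/* address of the word holding the bit */
	unsigned int bit;	/* bit index within that word */
};

/* Advance the address by whole words and keep the remainder as the bit. */
static struct bit_operand locate_bit(uint64_t base, uint64_t bit_offset,
				     unsigned int op_bytes)
{
	uint64_t word_bits = (uint64_t)op_bytes * 8;
	struct bit_operand op;

	op.addr = base + (bit_offset / word_bits) * op_bytes;
	op.bit = (unsigned int)(bit_offset % word_bits);
	return op;
}

/* Does an access of op_bytes at addr spill into the next 4 KiB page? */
static int crosses_page(uint64_t addr, unsigned int op_bytes)
{
	return (addr & 0xfff) + op_bytes > 0x1000;
}
```

With the guest's 4-byte operand size, the page table update at offset 0xffc stays inside the page; forcing the size to 8 bytes turns the same update into a cross-page write.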
*	KVM: Remove debug message	Avi Kivity	2007-05-03	1	-1/+0
\| \| \| \| \| \|	No longer interesting. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Use list_move()	Avi Kivity	2007-05-03	1	-8/+4
\| \| \| \| \| \|	Use list_move() where possible. Noticed by Dor Laor. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Remove unused function	Michal Piotrowski	2007-05-03	1	-7/+0
\| \| \| \| \| \| \| \| \| \|	Remove unused function CC drivers/kvm/svm.o drivers/kvm/svm.c:207: warning: ‘inject_db’ defined but not used Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: SVM: Ensure timestamp counter monotonicity	Avi Kivity	2007-05-03	2	-4/+18
\| \| \| \| \| \| \| \| \| \| \| \|	When a vcpu is migrated from one cpu to another, its timestamp counter may lose its monotonic property if the host has unsynced timestamp counters. This can confuse the guest, sometimes to the point of refusing to boot. As the rdtsc instruction is rather fast on AMD processors (7-10 cycles), we can simply record the last host tsc when we drop the cpu, and adjust the vcpu tsc offset when we detect that we've migrated to a different cpu. Signed-off-by: Avi Kivity <avi@qumranet.com>
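The bookkeeping can be sketched in plain C. This is a simplified model with hypothetical names, not the SVM code: the guest tsc is taken to be the host tsc plus a per-vcpu offset, and all arithmetic is modulo 2^64.

```c
#include <stdint.h>

struct vcpu_tsc {
	int last_cpu;		/* host cpu the vcpu last ran on */
	uint64_t host_tsc;	/* host tsc recorded when the vcpu was dropped */
	uint64_t tsc_offset;	/* guest tsc = host tsc + tsc_offset */
};

/* Called when the vcpu is dropped from a host cpu. */
static void vcpu_put_tsc(struct vcpu_tsc *v, int cpu, uint64_t host_now)
{
	v->last_cpu = cpu;
	v->host_tsc = host_now;
}

/* Called when the vcpu is loaded; compensate if the host cpu changed. */
static void vcpu_load_tsc(struct vcpu_tsc *v, int cpu, uint64_t host_now)
{
	if (cpu != v->last_cpu) {
		v->tsc_offset += v->host_tsc - host_now;
		v->last_cpu = cpu;
	}
}

static uint64_t guest_tsc(const struct vcpu_tsc *v, uint64_t host_now)
{
	return host_now + v->tsc_offset;
}
```

If the new cpu's counter is behind the old one, the offset grows by the difference, so the guest resumes from the value it last saw instead of jumping backwards.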
*	KVM: MMU: Fix hugepage pdes mapping same physical address with different access	Avi Kivity	2007-05-03	3	-4/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The kvm mmu keeps a shadow page for hugepage pdes; if several such pdes map the same physical address, they share the same shadow page. This is a fairly common case (kernel mappings on i386 nonpae Linux, for example). However, if the two pdes map the same memory but with different permissions, kvm will happily use the cached shadow page. If the access through the more permissive pde will occur after the access to the strict pde, an endless pagefault loop will be generated and the guest will make no progress. Fix by making the access permissions part of the cache lookup key. The fix allows Xen pae to boot on kvm and run guest domains. Thanks to Jeremy Fitzhardinge for reporting the bug and testing the fix. Signed-off-by: Avi Kivity <avi@qumranet.com>
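Making the permissions part of the lookup key can be illustrated with a toy cache. The structure and function below are hypothetical and far simpler than the kvm mmu; they only show that two pdes mapping the same frame with different access bits no longer hit the same entry.

```c
#include <stddef.h>
#include <stdint.h>

struct shadow_page {
	uint64_t gfn;		/* guest frame the shadow page describes */
	unsigned int access;	/* access permissions it was built with */
	int in_use;
};

/* A hit requires both the frame number and the access bits to match. */
static struct shadow_page *shadow_lookup(struct shadow_page *cache, size_t n,
					 uint64_t gfn, unsigned int access)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (cache[i].in_use && cache[i].gfn == gfn &&
		    cache[i].access == access)
			return &cache[i];
	return NULL;
}
```

A miss for the more permissive mapping forces a separate shadow page to be built, which is what breaks the pagefault loop described in the message.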
*	KVM: SVM: forbid guest to execute monitor/mwait	Joerg Roedel	2007-05-03	2	-1/+11
\| \| \| \| \| \| \| \| \| \|	This patch forbids the guest to execute monitor/mwait instructions on SVM. This is necessary because the guest can execute these instructions if they are available even if the kvm cpuid doesn't report their existence. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Handle writes to MCG_STATUS msr	Sergey Kiselev	2007-05-03	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	Some older (~2.6.7) kernels write MCG_STATUS register during kernel boot (mce_clear_all() function, called from mce_init()). It's not currently handled by kvm and will cause it to inject a GPF. Following patch adds a "nop" handler for this. Signed-off-by: Sergey Kiselev <sergey.kiselev@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Remove unused and write-only variables	Avi Kivity	2007-05-03	2	-4/+0
\| \| \| \| \| \|	Trivial cleanup. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Don't allow the guest to turn off the cpu cache	Avi Kivity	2007-05-03	1	-1/+3
\| \| \| \| \| \| \|	The cpu cache is a host resource; the guest should not be able to turn it off (even for itself). Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Hack real-mode segments on vmx from KVM_SET_SREGS	Avi Kivity	2007-05-03	1	-1/+8
\| \| \| \| \| \| \| \| \| \|	As usual, we need to mangle segment registers when emulating real mode as vm86 has specific constraints. We special case the reset segment base, and set the "access rights" (or descriptor flags) to vm86 compatible values. This fixes reboot on vmx. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Modify guest segments after potentially switching modes	Avi Kivity	2007-05-03	1	-10/+10
\| \| \| \| \| \| \| \| \|	The SET_SREGS ioctl modifies both cr0.pe (real mode/protected mode) and guest segment registers. Since segment handling is modified by the mode on Intel processors, update the segment registers after the mode switch has taken place. Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Remove set_cr0_no_modeswitch() arch op	Avi Kivity	2007-05-03	4	-21/+1
\| \| \| \| \| \| \| \| \|	set_cr0_no_modeswitch() was a hack to avoid corrupting segment registers. As we now cache the protected mode values on entry to real mode, this isn't an issue anymore, and it interferes with reboot (which usually _is_ a modeswitch). Signed-off-by: Avi Kivity <avi@qumranet.com>
*	KVM: Workaround vmx inability to virtualize the reset state	Avi Kivity	2007-05-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The reset state has cs.selector == 0xf000 and cs.base == 0xffff0000, which aren't compatible with vm86 mode, which is used for real mode virtualization. When we create a vcpu, we set cs.base to 0xf0000, but if we get there by way of a reset, the values are inconsistent and vmx refuses to enter guest mode. Workaround by detecting the state and munging it appropriately. Signed-off-by: Avi Kivity <avi@qumranet.com>
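The inconsistency can be shown with a minimal sketch. It assumes the vm86 constraint that a segment base equals the selector shifted left by four bits; the struct and function names are hypothetical and this is not the vmx.c code.

```c
#include <stdint.h>

struct segment {
	uint16_t selector;
	uint32_t base;
};

/* In vm86 mode the base is always derived from the selector. */
static int vm86_compatible(const struct segment *s)
{
	return s->base == ((uint32_t)s->selector << 4);
}

/* Detect the architectural reset value of cs and munge it for vm86. */
static void fixup_reset_cs(struct segment *cs)
{
	if (cs->selector == 0xf000 && cs->base == 0xffff0000)
		cs->base = 0xf0000;
}
```

The fixup yields the same cs.base of 0xf0000 that the message says is used when a vcpu is created.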
*	KVM: MMU: Remove global pte tracking	Avi Kivity	2007-05-03	2	-10/+0
\| \| \| \| \| \| \| \| \| \| \|	The initial, noncaching, version of the kvm mmu flushed all nonglobal shadow page table translations (much like a native tlb flush). The new implementation flushes translations only when they change, rendering global pte tracking superfluous. This removes the unused tracking mechanism and storage space. Signed-off-by: Avi Kivity <avi@qumranet.com>