summaryrefslogtreecommitdiffstats
path: root/drivers/lguest/core.c
Commit message (Collapse)AuthorAgeFilesLines
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* lguest: cleanup for map_switcher()Xiao Guangrong2009-09-231-3/+2
| | | | | | | We can use alloc_page() instead of get_zeroed_page() and virt_to_page() Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: update commentryRusty Russell2009-07-301-1/+6
| | | | | | | | | | Every so often, after code shuffles, I need to go through and unbitrot the Lguest Journey (see drivers/lguest/README). Since we now use RCU in a simple form in one place I took the opportunity to expand that explanation. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
* lguest: fix comment styleRusty Russell2009-07-301-39/+75
| | | | | | | | | I don't really notice it (except to begrudge the extra vertical space), but Ingo does. And he pointed out that one excuse of lguest is as a teaching tool, it should set a good example. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com>
* lguest: remove obsolete LHREQ_BREAK callRusty Russell2009-06-121-8/+3
| | | | | | | | We no longer need an efficient mechanism to force the Guest back into host userspace, as each device is serviced without bothering the main Guest process (aka. the Launcher). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: use eventfds for device notificationRusty Russell2009-06-121-3/+5
| | | | | | | | | | | | Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with an address: the main Launcher process returns with this address, and figures out what device to run. A far nicer model is to let processes bind an eventfd to an address: if we find one, we simply signal the eventfd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Davide Libenzi <davidel@xmailserver.org>
* lguest: map switcher with executable page table entriesMatias Zabaljauregui2009-06-121-1/+1
| | | | | | | | Map switcher with executable page table entries. (This bug didn't matter before PAE and hence NX support -- RR) Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: improve interrupt handling, speed up stream networkingRusty Russell2009-06-121-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | lguest never checked for pending interrupts when enabling interrupts, and things still worked. However, it makes a significant difference to TCP performance, so it's time we fixed it by introducing a pending_irq flag and checking it on irq_restore and irq_enable. These two routines are now too big to patch into the 8/10 bytes patch space, so we drop that code. Note: The high latency on interrupt delivery had a very curious effect: once everything else was optimized, networking without GSO was faster than networking with GSO, since more interrupts were sent and hence a greater chance of one getting through to the Guest! Note2: (Almost) Closing the same loophole for iret doesn't have any measurable effect, so I'm leaving that patch for the moment. Before: 1GB tcpblast Guest->Host: 30.7 seconds 1GB tcpblast Guest->Host (no GSO): 76.0 seconds After: 1GB tcpblast Guest->Host: 6.8 seconds 1GB tcpblast Guest->Host (no GSO): 27.8 seconds Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: fix race in halt codeRusty Russell2009-06-121-2/+12
| | | | | | | | | | | | | When the Guest does the LHCALL_HALT hypercall, we go to sleep, expecting that a timer or the Waker will wake_up_process() us. But we do it in a stupid way, leaving a classic missing wakeup race. So split maybe_do_interrupt() into interrupt_pending() and try_deliver_interrupt(), and check maybe_do_interrupt() and the "break_out" flag before calling schedule. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: use bool instead of intMatias Zabaljauregui2009-03-301-2/+2
| | | | | | | | | | Impact: clean up Rusty told me, some time ago, that he had become a fan of "bool". So, here are some replacements. Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: typos fixAtsushi SAKAI2009-01-301-1/+1
| | | | | | | | | | | 3 points lguest_asm.S => i386_head.S LHCALL_BREAK => LHREQ_BREAK perferred => preferred Signed-off-by: Atsushi SAKAI <sakaia@jp.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: fix switcher_page leak on unloadJohannes Weiner2008-07-291-0/+1
| | | | | | | | map_switcher allocates the array, unmap_switcher has to free it accordingly. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: comment documentation update.Rusty Russell2008-03-281-9/+9
| | | | | | | | | Took some cycles to re-read the Lguest Journey end-to-end, fix some rot and tighten some phrases. Only comments change. No new jokes, but a couple of recycled old jokes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: fix __get_vm_area usage.Rusty Russell2008-03-111-2/+13
| | | | | | | | | | | | | | | Robert Bragg's 5dc331852848a38ca00a2817e5b98a1d0561b116 tightened (ie. fixed) the checking in __get_vm_area, and it broke lguest. lguest should pass the exact "end" it wants, not some random constant (it was possible previously that it would actually get an address different from SWITCHER_ADDR). Also, Fabio Checconi pointed out that we should make sure we're not hitting the fixmap area. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Robert Bragg <robert@sixbynine.org>
* lguest: get rid of lg variable assignmentsGlauber de Oliveira Costa2008-01-301-13/+11
| | | | | | | | We can save some lines of code by getting rid of *lg = cpu... lines of code spread everywhere by now. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: make pending notifications per-vcpuGlauber de Oliveira Costa2008-01-301-3/+3
| | | | | | | | this patch makes the pending_notify field, used to control pending notifications, per-vcpu, instead of per-guest Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: per-vcpu lguest task managementGlauber de Oliveira Costa2008-01-301-2/+2
| | | | | | | | | | lguest uses tasks to control its running behaviour (like sending breaks, controlling halted state, etc). In a per-vcpu environment, each vcpu will have its own underlying task. So this patch makes the infrastructure for that possible Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: per-vcpu interrupt processing.Glauber de Oliveira Costa2008-01-301-1/+1
| | | | | | | This patch adapts interrupt processing for using the vcpu struct. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: make hypercalls use the vcpu structGlauber de Oliveira Costa2008-01-301-3/+3
| | | | | | | | | this patch changes do_hcall() and do_async_hcall() interfaces (and obviously their callers) to get a vcpu struct. Again, a vcpu services the hypercall, not the whole guest Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: per-cpu run guestGlauber de Oliveira Costa2008-01-301-2/+4
| | | | | | | | | This patch makes the run_guest() routine use the lg_cpu struct. This is required since in a smp guest environment, there's no more the notion of "running the guest", but rather, it is "running the vcpu" Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: Reboot supportBalaji Rao2008-01-301-0/+2
| | | | | | | | | Reboot Implemented (Prevent fd leak, fix style and fix documentation --RR) Signed-off-by: Balaji Rao <balajirrao@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: remove pv_info dependencyGlauber de Oliveira Costa2008-01-301-1/+1
| | | | | | | | | | | | Currently, lguest module can't be compiled without the PARAVIRT flag being on. This is a fake dependency, since the module itself shouldn't need any paravirt override. Reason for that is the reference to pv_info structure in initial loading tests. This patch removes it in favour of a more generic error message. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest: documentation updateRusty Russell2007-10-251-1/+4
| | | | | | | | Went through the documentation doing typo and content fixes. This patch contains only comment and whitespace changes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* generalize lgread_u32/lgwrite_u32.Rusty Russell2007-10-231-31/+8
| | | | | | | | | | | | | | Jes complains that page table code still uses lgread_u32 even though it now uses general kernel pte types. The best thing to do is to generalize lgread_u32 and lgwrite_u32. This means we lose the efficiency of getuser(). We could potentially regain it if we used __copy_from_user instead of copy_from_user, but I'm not certain that our range check is equivalent to access_ok() on all platforms. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Jes Sorensen <jes@sgi.com>
* Remove old lguest I/O infrrasructure.Rusty Russell2007-10-231-8/+4
| | | | | | | | | | | This patch gets rid of the old lguest host I/O infrastructure and replaces it with a single hypercall "LHCALL_NOTIFY" which takes an address. The main change is the removal of io.c: that mainly did inter-guest I/O, which virtio doesn't yet support. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Allow guest to specify syscall vector to use.Rusty Russell2007-10-231-10/+20
| | | | | | | | | | (Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch). This patch allows Guests to specify what system call vector they want, and we try to reserve it. We only allow one non-Linux system call vector, to try to avoid DoS on the Host. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Introduce "hcall" pointer to indicate pending hypercall.Rusty Russell2007-10-231-4/+4
| | | | | | | | | | | | | | | | | | | Currently we look at the "trapnum" to see if the Guest wants a hypercall. But once the hypercall is done we have to reset trapnum to a bogus value, otherwise if we exit to userspace and return, we'd run the same hypercall twice (that was a nasty bug to find!). This has two main effects: 1) When Jes's patch changes the hypercall args to be a generic "struct hcall_args" we simply change the type of "lg->hcall". It's set by arch code, so if it has to copy args or something it can do so, and point "hcall" into lg->arch somewhere. 2) Async hypercalls only get run when an actual hypercall is pending. This simplfies the code a little and is a more logical semantic. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Move i386 part of core.c to x86/core.c.Jes Sorensen2007-10-231-446/+13
| | | | | | | | Separate i386 architecture specific from core.c and move it to x86/core.c and add x86/lguest.h header file to match. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Remove fixed limit on number of guests, and lguests array.Rusty Russell2007-10-231-14/+0
| | | | | | | | | | | Back when we had all the Guest state in the switcher, we had a fixed array of them. This is no longer necessary. If we switch the network code to using random_ether_addr (46 bits is enough to avoid clashes), we can get rid of the concept of "guest id" altogether. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Introduce guest mem offset, static link example launcherRusty Russell2007-10-231-12/+10
| | | | | | | | | | | In order to avoid problematic special linking of the Launcher, we give the Host an offset: this means we can use any memory region in the Launcher as Guest memory rather than insisting on mmap() at 0. The result is quite pleasing: a number of casts are replaced with simple additions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* paravirt: refactor struct paravirt_ops into smaller pv_*_opsJeremy Fitzhardinge2007-10-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch refactors the paravirt_ops structure into groups of functionally related ops: pv_info - random info, rather than function entrypoints pv_init_ops - functions used at boot time (some for module_init too) pv_misc_ops - lazy mode, which didn't fit well anywhere else pv_time_ops - time-related functions pv_cpu_ops - various privileged instruction ops pv_irq_ops - operations for managing interrupt state pv_apic_ops - APIC operations pv_mmu_ops - operations for managing pagetables There are several motivations for this: 1. Some of these ops will be general to all x86, and some will be i386/x86-64 specific. This makes it easier to share common stuff while allowing separate implementations where needed. 2. At the moment we must export all of paravirt_ops, but modules only need selected parts of it. This allows us to export on a case by case basis (and also choose which export license we want to apply). 3. Functional groupings make things a bit more readable. Struct paravirt_ops is now only used as a template to generate patch-site identifiers, and to extract function pointers for inserting into jmp/calls when patching. It is only instantiated when needed. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Anthony Liguory <aliguori@us.ibm.com> Cc: "Glauber de Oliveira Costa" <glommer@gmail.com> Cc: Jun Nakajima <jun.nakajima@intel.com>
* lguest: Fix Malicious Guest GDT Host CrashRusty Russell2007-08-091-0/+5
| | | | | | | | | | | | | | | | | | | If a Guest makes hypercall which sets a GDT entry to not present, we currently set any segment registers using that GDT entry to 0. Unfortunately, this is not sufficient: there are other ways of altering GDT entries which will cause a fault. The correct solution to do what Linux does: let them set any GDT value they want and handle the #GP when popping causes a fault. This has the added benefit of making our Switcher slightly more robust in the case of any other bugs which cause it to fault. We kill the Guest if it causes a fault in the Switcher: it's the Guest's responsibility to make sure it's not using segments when it changes them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lguest: documentation VI: SwitcherRusty Russell2007-07-261-4/+47
| | | | | | | | Documentation: The Switcher Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lguest: documentation V: HostRusty Russell2007-07-261-15/+258
| | | | | | | | Documentation: The Host Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lguest: documentation IV: LauncherRusty Russell2007-07-261-2/+22
| | | | | | | | Documentation: The Launcher Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lguest: documentation I: PreparationRusty Russell2007-07-261-2/+5
| | | | | | | | | | | The netfilter code had very good documentation: the Netfilter Hacking HOWTO. Noone ever read it. So this time I'm trying something different, using a bit of Knuthiness. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lguest: the host codeRusty Russell2007-07-191-0/+462
This is the code for the "lg.ko" module, which allows lguest guests to be launched. [akpm@linux-foundation.org: update for futex-new-private-futexes] [akpm@linux-foundation.org: build fix] [jmorris@namei.org: lguest: use hrtimers] [akpm@linux-foundation.org: x86_64 build fix] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OpenPOWER on IntegriCloud