From fc16123937ccfc2f2bbe98a5e41c4f6d26a22b24 Mon Sep 17 00:00:00 2001 From: Raghavendra K T Date: Tue, 7 Aug 2012 13:09:59 +0530 Subject: KVM: Add documentation on hypercalls Thanks Alex for KVM_HC_FEATURES inputs and Jan for VAPIC_POLL_IRQ, and Peter (HPA) for suggesting hypercall ABI addition. Signed-off-by: Raghavendra K T Signed-off-by: Marcelo Tosatti --- Documentation/virtual/kvm/hypercalls.txt | 66 ++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) create mode 100644 Documentation/virtual/kvm/hypercalls.txt (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt new file mode 100644 index 0000000..ea113b5 --- /dev/null +++ b/Documentation/virtual/kvm/hypercalls.txt @@ -0,0 +1,66 @@ +Linux KVM Hypercall: +=================== +X86: + KVM Hypercalls have a three-byte sequence of either the vmcall or the vmmcall + instruction. The hypervisor can replace it with instructions that are + guaranteed to be supported. + + Up to four arguments may be passed in rbx, rcx, rdx, and rsi respectively. + The hypercall number should be placed in rax and the return value will be + placed in rax. No other registers will be clobbered unless explicitly stated + by the particular hypercall. + +S390: + R2-R7 are used for parameters 1-6. In addition, R1 is used for hypercall + number. The return value is written to R2. + + S390 uses diagnose instruction as hypercall (0x500) along with hypercall + number in R1. + + PowerPC: + It uses R3-R10 and hypercall number in R11. R4-R11 are used as output registers. + Return value is placed in R3. + + KVM hypercalls uses 4 byte opcode, that are patched with 'hypercall-instructions' + property inside the device tree's /hypervisor node. + For more information refer to Documentation/virtual/kvm/ppc-pv.txt + +KVM Hypercalls Documentation +=========================== +The template for each hypercall is: +1. Hypercall name. +2. Architecture(s) +3. Status (deprecated, obsolete, active) +4. Purpose + +1. KVM_HC_VAPIC_POLL_IRQ +------------------------ +Architecture: x86 +Status: active +Purpose: Trigger guest exit so that the host can check for pending +interrupts on reentry. + +2. KVM_HC_MMU_OP +------------------------ +Architecture: x86 +Status: deprecated. +Purpose: Support MMU operations such as writing to PTE, +flushing TLB, release PT. + +3. KVM_HC_FEATURES +------------------------ +Architecture: PPC +Status: active +Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid +used to enumerate which hypercalls are available. On PPC, either device tree +based lookup ( which is also what EPAPR dictates) OR KVM specific enumeration +mechanism (which is this hypercall) can be used. + +4. KVM_HC_PPC_MAP_MAGIC_PAGE +------------------------ +Architecture: PPC +Status: active +Purpose: To enable communication between the hypervisor and guest there is a +shared page that contains parts of supervisor visible register state. +The guest can map this shared page to access its supervisor register through +memory using this hypercall. -- cgit v1.1 From 6024f1a4dd066d54213cc17f821e19e3062298ca Mon Sep 17 00:00:00 2001 From: Alexander Graf Date: Tue, 7 Aug 2012 13:10:26 +0530 Subject: KVM: Add ppc hypercall documentation Signed-off-by: Alexander Graf Signed-off-by: Raghavendra K T Signed-off-by: Marcelo Tosatti --- Documentation/virtual/kvm/ppc-pv.txt | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/ppc-pv.txt b/Documentation/virtual/kvm/ppc-pv.txt index 4911cf9..4cd076f 100644 --- a/Documentation/virtual/kvm/ppc-pv.txt +++ b/Documentation/virtual/kvm/ppc-pv.txt @@ -174,3 +174,25 @@ following: That way we can inject an arbitrary amount of code as replacement for a single instruction. This allows us to check for pending interrupts when setting EE=1 for example. + +Hypercall ABIs in KVM on PowerPC +================================= +1) KVM hypercalls (ePAPR) + +These are ePAPR compliant hypercall implementation (mentioned above). Even +generic hypercalls are implemented here, like the ePAPR idle hcall. These are +available on all targets. + +2) PAPR hypercalls + +PAPR hypercalls are needed to run server PowerPC PAPR guests (-M pseries in QEMU). +These are the same hypercalls that pHyp, the POWER hypervisor implements. Some of +them are handled in the kernel, some are handled in user space. This is only +available on book3s_64. + +3) OSI hypercalls + +Mac-on-Linux is another user of KVM on PowerPC, which has its own hypercall (long +before KVM). This is supported to maintain compatibility. All these hypercalls get +forwarded to user space. This is only useful on book3s_32, but can be used with +book3s_64 as well. -- cgit v1.1 From 4d8b81abc47b83a1939e59df2fdb0e98dfe0eedd Mon Sep 17 00:00:00 2001 From: Xiao Guangrong Date: Tue, 21 Aug 2012 11:02:51 +0800 Subject: KVM: introduce readonly memslot In current code, if we map a readonly memory space from host to guest and the page is not currently mapped in the host, we will get a fault pfn and async is not allowed, then the vm will crash We introduce readonly memory region to map ROM/ROMD to the guest, read access is happy for readonly memslot, write access on readonly memslot will cause KVM_EXIT_MMIO exit Signed-off-by: Xiao Guangrong Signed-off-by: Avi Kivity --- Documentation/virtual/kvm/api.txt | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index bf33aaa..b91bfd4 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -857,7 +857,8 @@ struct kvm_userspace_memory_region { }; /* for kvm_memory_region::flags */ -#define KVM_MEM_LOG_DIRTY_PAGES 1UL +#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) +#define KVM_MEM_READONLY (1UL << 1) This ioctl allows the user to create or modify a guest physical memory slot. When changing an existing slot, it may be moved in the guest @@ -873,9 +874,12 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr be identical. This allows large pages in the guest to be backed by large pages in the host. -The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which +The flags field supports two flag, KVM_MEM_LOG_DIRTY_PAGES, which instructs kvm to keep track of writes to memory within the slot. See -the KVM_GET_DIRTY_LOG ioctl. +the KVM_GET_DIRTY_LOG ioctl. Another flag is KVM_MEM_READONLY when the +KVM_CAP_READONLY_MEM capability, it indicates the guest memory is read-only, +that means, guest is only allowed to read it. Writes will be posted to +userspace as KVM_EXIT_MMIO exits. When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory region are automatically reflected into the guest. For example, an mmap() -- cgit v1.1 From 7efd8fa145c340a68a9cd758012b6e0ae4883c93 Mon Sep 17 00:00:00 2001 From: Jan Kiszka Date: Fri, 7 Sep 2012 13:17:47 +0200 Subject: KVM: Improve wording of KVM_SET_USER_MEMORY_REGION documentation Signed-off-by: Jan Kiszka Signed-off-by: Avi Kivity --- Documentation/virtual/kvm/api.txt | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index b91bfd4..36befa7 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -874,17 +874,17 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr be identical. This allows large pages in the guest to be backed by large pages in the host. -The flags field supports two flag, KVM_MEM_LOG_DIRTY_PAGES, which -instructs kvm to keep track of writes to memory within the slot. See -the KVM_GET_DIRTY_LOG ioctl. Another flag is KVM_MEM_READONLY when the -KVM_CAP_READONLY_MEM capability, it indicates the guest memory is read-only, -that means, guest is only allowed to read it. Writes will be posted to -userspace as KVM_EXIT_MMIO exits. - -When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory -region are automatically reflected into the guest. For example, an mmap() -that affects the region will be made visible immediately. Another example -is madvise(MADV_DROP). +The flags field supports two flag, KVM_MEM_LOG_DIRTY_PAGES, which instructs +kvm to keep track of writes to memory within the slot. See KVM_GET_DIRTY_LOG +ioctl. The KVM_CAP_READONLY_MEM capability indicates the availability of the +KVM_MEM_READONLY flag. When this flag is set for a memory region, KVM only +allows read accesses. Writes will be posted to userspace as KVM_EXIT_MMIO +exits. + +When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of +the memory region are automatically reflected into the guest. For example, an +mmap() that affects the region will be made visible immediately. Another +example is madvise(MADV_DROP). It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl. The KVM_SET_MEMORY_REGION does not allow fine grained control over memory -- cgit v1.1 From 879238fecc051d95037ae76332916209a7770709 Mon Sep 17 00:00:00 2001 From: Stefan Fritsch Date: Sun, 16 Sep 2012 12:55:40 +0200 Subject: KVM: clarify kvmclock documentation - mention that system time needs to be added to wallclock time - positive tsc_shift means left shift, not right - mention additional 32bit right shift Signed-off-by: Stefan Fritsch Signed-off-by: Marcelo Tosatti --- Documentation/virtual/kvm/msr.txt | 32 ++++++++++++++++++++------------ 1 file changed, 20 insertions(+), 12 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt index 7304710..6d470ae 100644 --- a/Documentation/virtual/kvm/msr.txt +++ b/Documentation/virtual/kvm/msr.txt @@ -34,9 +34,12 @@ MSR_KVM_WALL_CLOCK_NEW: 0x4b564d00 time information and check that they are both equal and even. An odd version indicates an in-progress update. - sec: number of seconds for wallclock. + sec: number of seconds for wallclock at time of boot. - nsec: number of nanoseconds for wallclock. + nsec: number of nanoseconds for wallclock at time of boot. + + In order to get the current wallclock time, the system_time from + MSR_KVM_SYSTEM_TIME_NEW needs to be added. Note that although MSRs are per-CPU entities, the effect of this particular MSR is global. @@ -82,20 +85,25 @@ MSR_KVM_SYSTEM_TIME_NEW: 0x4b564d01 time at the time this structure was last updated. Unit is nanoseconds. - tsc_to_system_mul: a function of the tsc frequency. One has - to multiply any tsc-related quantity by this value to get - a value in nanoseconds, besides dividing by 2^tsc_shift + tsc_to_system_mul: multiplier to be used when converting + tsc-related quantity to nanoseconds - tsc_shift: cycle to nanosecond divider, as a power of two, to - allow for shift rights. One has to shift right any tsc-related - quantity by this value to get a value in nanoseconds, besides - multiplying by tsc_to_system_mul. + tsc_shift: shift to be used when converting tsc-related + quantity to nanoseconds. This shift will ensure that + multiplication with tsc_to_system_mul does not overflow. + A positive value denotes a left shift, a negative value + a right shift. - With this information, guests can derive per-CPU time by - doing: + The conversion from tsc to nanoseconds involves an additional + right shift by 32 bits. With this information, guests can + derive per-CPU time by doing: time = (current_tsc - tsc_timestamp) - time = (time * tsc_to_system_mul) >> tsc_shift + if (tsc_shift >= 0) + time <<= tsc_shift; + else + time >>= -tsc_shift; + time = (time * tsc_to_system_mul) >> 32 time = time + system_time flags: bits in this field indicate extended capabilities -- cgit v1.1 From 7a84428af7ca6a847f058c9ff244a18a2664fd1b Mon Sep 17 00:00:00 2001 From: Alex Williamson Date: Fri, 21 Sep 2012 11:58:03 -0600 Subject: KVM: Add resampling irqfds for level triggered interrupts To emulate level triggered interrupts, add a resample option to KVM_IRQFD. When specified, a new resamplefd is provided that notifies the user when the irqchip has been resampled by the VM. This may, for instance, indicate an EOI. Also in this mode, posting of an interrupt through an irqfd only asserts the interrupt. On resampling, the interrupt is automatically de-asserted prior to user notification. This enables level triggered interrupts to be posted and re-enabled from vfio with no userspace intervention. All resampling irqfds can make use of a single irq source ID, so we reserve a new one for this interface. Signed-off-by: Alex Williamson Signed-off-by: Avi Kivity --- Documentation/virtual/kvm/api.txt | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 36befa7..f6ec3a9 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1950,6 +1950,19 @@ the guest using the specified gsi pin. The irqfd is removed using the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd and kvm_irqfd.gsi. +With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify +mechanism allowing emulation of level-triggered, irqfd-based +interrupts. When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an +additional eventfd in the kvm_irqfd.resamplefd field. When operating +in resample mode, posting of an interrupt through kvm_irq.fd asserts +the specified gsi in the irqchip. When the irqchip is resampled, such +as from an EOI, the gsi is de-asserted and the user is notifed via +kvm_irqfd.resamplefd. It is the user's responsibility to re-queue +the interrupt if the device making use of it still requires service. +Note that closing the resamplefd is not sufficient to disable the +irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment +and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. + 4.76 KVM_PPC_ALLOCATE_HTAB Capability: KVM_CAP_PPC_ALLOC_HTAB -- cgit v1.1