summaryrefslogtreecommitdiffstats
path: root/src/docs/specs/ivshmem_device_spec.txt
diff options
context:
space:
mode:
Diffstat (limited to 'src/docs/specs/ivshmem_device_spec.txt')
-rw-r--r--src/docs/specs/ivshmem_device_spec.txt161
1 files changed, 161 insertions, 0 deletions
diff --git a/src/docs/specs/ivshmem_device_spec.txt b/src/docs/specs/ivshmem_device_spec.txt
new file mode 100644
index 0000000..d318d65
--- /dev/null
+++ b/src/docs/specs/ivshmem_device_spec.txt
@@ -0,0 +1,161 @@
+
+Device Specification for Inter-VM shared memory device
+------------------------------------------------------
+
+The Inter-VM shared memory device is designed to share a memory region (created
+on the host via the POSIX shared memory API) between multiple QEMU processes
+running different guests. In order for all guests to be able to pick up the
+shared memory area, it is modeled by QEMU as a PCI device exposing said memory
+to the guest as a PCI BAR.
+The memory region does not belong to any guest, but is a POSIX memory object on
+the host. The host can access this shared memory if needed.
+
+The device also provides an optional communication mechanism between guests
+sharing the same memory object. More details about that in the section 'Guest to
+guest communication' section.
+
+
+The Inter-VM PCI device
+-----------------------
+
+From the VM point of view, the ivshmem PCI device supports three BARs.
+
+- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is
+ not used.
+- BAR1 is used for MSI-X when it is enabled in the device.
+- BAR2 is used to access the shared memory object.
+
+It is your choice how to use the device but you must choose between two
+behaviors :
+
+- basically, if you only need the shared memory part, you will map BAR2.
+ This way, you have access to the shared memory in guest and can use it as you
+ see fit (memnic, for example, uses it in userland
+ http://dpdk.org/browse/memnic).
+
+- BAR0 and BAR1 are used to implement an optional communication mechanism
+ through interrupts in the guests. If you need an event mechanism between the
+ guests accessing the shared memory, you will most likely want to write a
+ kernel driver that will handle interrupts. See details in the section 'Guest
+ to guest communication' section.
+
+The behavior is chosen when starting your QEMU processes:
+- no communication mechanism needed, the first QEMU to start creates the shared
+ memory on the host, subsequent QEMU processes will use it.
+
+- communication mechanism needed, an ivshmem server must be started before any
+ QEMU processes, then each QEMU process connects to the server unix socket.
+
+For more details on the QEMU ivshmem parameters, see qemu-doc documentation.
+
+
+Guest to guest communication
+----------------------------
+
+This section details the communication mechanism between the guests accessing
+the ivhsmem shared memory.
+
+*ivshmem server*
+
+This server code is available in qemu.git/contrib/ivshmem-server.
+
+The server must be started on the host before any guest.
+It creates a shared memory object then waits for clients to connect on a unix
+socket. All the messages are little-endian int64_t integer.
+
+For each client (QEMU process) that connects to the server:
+- the server sends a protocol version, if client does not support it, the client
+ closes the communication,
+- the server assigns an ID for this client and sends this ID to him as the first
+ message,
+- the server sends a fd to the shared memory object to this client,
+- the server creates a new set of host eventfds associated to the new client and
+ sends this set to all already connected clients,
+- finally, the server sends all the eventfds sets for all clients to the new
+ client.
+
+The server signals all clients when one of them disconnects.
+
+The client IDs are limited to 16 bits because of the current implementation (see
+Doorbell register in 'PCI device registers' subsection). Hence only 65536
+clients are supported.
+
+All the file descriptors (fd to the shared memory, eventfds for each client)
+are passed to clients using SCM_RIGHTS over the server unix socket.
+
+Apart from the current ivshmem implementation in QEMU, an ivshmem client has
+been provided in qemu.git/contrib/ivshmem-client for debug.
+
+*QEMU as an ivshmem client*
+
+At initialisation, when creating the ivshmem device, QEMU first receives a
+protocol version and closes communication with server if it does not match.
+Then, QEMU gets its ID from the server then makes it available through BAR0
+IVPosition register for the VM to use (see 'PCI device registers' subsection).
+QEMU then uses the fd to the shared memory to map it to BAR2.
+eventfds for all other clients received from the server are stored to implement
+BAR0 Doorbell register (see 'PCI device registers' subsection).
+Finally, eventfds assigned to this QEMU process are used to send interrupts in
+this VM.
+
+*PCI device registers*
+
+From the VM point of view, the ivshmem PCI device supports 4 registers of
+32-bits each.
+
+enum ivshmem_registers {
+ IntrMask = 0,
+ IntrStatus = 4,
+ IVPosition = 8,
+ Doorbell = 12
+};
+
+The first two registers are the interrupt mask and status registers. Mask and
+status are only used with pin-based interrupts. They are unused with MSI
+interrupts.
+
+Status Register: The status register is set to 1 when an interrupt occurs.
+
+Mask Register: The mask register is bitwise ANDed with the interrupt status
+and the result will raise an interrupt if it is non-zero. However, since 1 is
+the only value the status will be set to, it is only the first bit of the mask
+that has any effect. Therefore interrupts can be masked by setting the first
+bit to 0 and unmasked by setting the first bit to 1.
+
+IVPosition Register: The IVPosition register is read-only and reports the
+guest's ID number. The guest IDs are non-negative integers. When using the
+server, since the server is a separate process, the VM ID will only be set when
+the device is ready (shared memory is received from the server and accessible
+via the device). If the device is not ready, the IVPosition will return -1.
+Applications should ensure that they have a valid VM ID before accessing the
+shared memory.
+
+Doorbell Register: To interrupt another guest, a guest must write to the
+Doorbell register. The doorbell register is 32-bits, logically divided into
+two 16-bit fields. The high 16-bits are the guest ID to interrupt and the low
+16-bits are the interrupt vector to trigger. The semantics of the value
+written to the doorbell depends on whether the device is using MSI or a regular
+pin-based interrupt. In short, MSI uses vectors while regular interrupts set
+the status register.
+
+Regular Interrupts
+
+If regular interrupts are used (due to either a guest not supporting MSI or the
+user specifying not to use them on startup) then the value written to the lower
+16-bits of the Doorbell register results is arbitrary and will trigger an
+interrupt in the destination guest.
+
+Message Signalled Interrupts
+
+An ivshmem device may support multiple MSI vectors. If so, the lower 16-bits
+written to the Doorbell register must be between 0 and the maximum number of
+vectors the guest supports. The lower 16 bits written to the doorbell is the
+MSI vector that will be raised in the destination guest. The number of MSI
+vectors is configurable but it is set when the VM is started.
+
+The important thing to remember with MSI is that it is only a signal, no status
+is set (since MSI interrupts are not shared). All information other than the
+interrupt itself should be communicated via the shared memory region. Devices
+supporting multiple MSI vectors can use different vectors to indicate different
+events have occurred. The semantics of interrupt vectors are left to the
+user's discretion.
OpenPOWER on IntegriCloud