diff options
Diffstat (limited to 'src/docs/specs/ivshmem_device_spec.txt')
-rw-r--r-- | src/docs/specs/ivshmem_device_spec.txt | 161 |
1 files changed, 161 insertions, 0 deletions
diff --git a/src/docs/specs/ivshmem_device_spec.txt b/src/docs/specs/ivshmem_device_spec.txt new file mode 100644 index 0000000..d318d65 --- /dev/null +++ b/src/docs/specs/ivshmem_device_spec.txt @@ -0,0 +1,161 @@ + +Device Specification for Inter-VM shared memory device +------------------------------------------------------ + +The Inter-VM shared memory device is designed to share a memory region (created +on the host via the POSIX shared memory API) between multiple QEMU processes +running different guests. In order for all guests to be able to pick up the +shared memory area, it is modeled by QEMU as a PCI device exposing said memory +to the guest as a PCI BAR. +The memory region does not belong to any guest, but is a POSIX memory object on +the host. The host can access this shared memory if needed. + +The device also provides an optional communication mechanism between guests +sharing the same memory object. More details about that in the section 'Guest to +guest communication' section. + + +The Inter-VM PCI device +----------------------- + +From the VM point of view, the ivshmem PCI device supports three BARs. + +- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is + not used. +- BAR1 is used for MSI-X when it is enabled in the device. +- BAR2 is used to access the shared memory object. + +It is your choice how to use the device but you must choose between two +behaviors : + +- basically, if you only need the shared memory part, you will map BAR2. + This way, you have access to the shared memory in guest and can use it as you + see fit (memnic, for example, uses it in userland + http://dpdk.org/browse/memnic). + +- BAR0 and BAR1 are used to implement an optional communication mechanism + through interrupts in the guests. If you need an event mechanism between the + guests accessing the shared memory, you will most likely want to write a + kernel driver that will handle interrupts. See details in the section 'Guest + to guest communication' section. + +The behavior is chosen when starting your QEMU processes: +- no communication mechanism needed, the first QEMU to start creates the shared + memory on the host, subsequent QEMU processes will use it. + +- communication mechanism needed, an ivshmem server must be started before any + QEMU processes, then each QEMU process connects to the server unix socket. + +For more details on the QEMU ivshmem parameters, see qemu-doc documentation. + + +Guest to guest communication +---------------------------- + +This section details the communication mechanism between the guests accessing +the ivhsmem shared memory. + +*ivshmem server* + +This server code is available in qemu.git/contrib/ivshmem-server. + +The server must be started on the host before any guest. +It creates a shared memory object then waits for clients to connect on a unix +socket. All the messages are little-endian int64_t integer. + +For each client (QEMU process) that connects to the server: +- the server sends a protocol version, if client does not support it, the client + closes the communication, +- the server assigns an ID for this client and sends this ID to him as the first + message, +- the server sends a fd to the shared memory object to this client, +- the server creates a new set of host eventfds associated to the new client and + sends this set to all already connected clients, +- finally, the server sends all the eventfds sets for all clients to the new + client. + +The server signals all clients when one of them disconnects. + +The client IDs are limited to 16 bits because of the current implementation (see +Doorbell register in 'PCI device registers' subsection). Hence only 65536 +clients are supported. + +All the file descriptors (fd to the shared memory, eventfds for each client) +are passed to clients using SCM_RIGHTS over the server unix socket. + +Apart from the current ivshmem implementation in QEMU, an ivshmem client has +been provided in qemu.git/contrib/ivshmem-client for debug. + +*QEMU as an ivshmem client* + +At initialisation, when creating the ivshmem device, QEMU first receives a +protocol version and closes communication with server if it does not match. +Then, QEMU gets its ID from the server then makes it available through BAR0 +IVPosition register for the VM to use (see 'PCI device registers' subsection). +QEMU then uses the fd to the shared memory to map it to BAR2. +eventfds for all other clients received from the server are stored to implement +BAR0 Doorbell register (see 'PCI device registers' subsection). +Finally, eventfds assigned to this QEMU process are used to send interrupts in +this VM. + +*PCI device registers* + +From the VM point of view, the ivshmem PCI device supports 4 registers of +32-bits each. + +enum ivshmem_registers { + IntrMask = 0, + IntrStatus = 4, + IVPosition = 8, + Doorbell = 12 +}; + +The first two registers are the interrupt mask and status registers. Mask and +status are only used with pin-based interrupts. They are unused with MSI +interrupts. + +Status Register: The status register is set to 1 when an interrupt occurs. + +Mask Register: The mask register is bitwise ANDed with the interrupt status +and the result will raise an interrupt if it is non-zero. However, since 1 is +the only value the status will be set to, it is only the first bit of the mask +that has any effect. Therefore interrupts can be masked by setting the first +bit to 0 and unmasked by setting the first bit to 1. + +IVPosition Register: The IVPosition register is read-only and reports the +guest's ID number. The guest IDs are non-negative integers. When using the +server, since the server is a separate process, the VM ID will only be set when +the device is ready (shared memory is received from the server and accessible +via the device). If the device is not ready, the IVPosition will return -1. +Applications should ensure that they have a valid VM ID before accessing the +shared memory. + +Doorbell Register: To interrupt another guest, a guest must write to the +Doorbell register. The doorbell register is 32-bits, logically divided into +two 16-bit fields. The high 16-bits are the guest ID to interrupt and the low +16-bits are the interrupt vector to trigger. The semantics of the value +written to the doorbell depends on whether the device is using MSI or a regular +pin-based interrupt. In short, MSI uses vectors while regular interrupts set +the status register. + +Regular Interrupts + +If regular interrupts are used (due to either a guest not supporting MSI or the +user specifying not to use them on startup) then the value written to the lower +16-bits of the Doorbell register results is arbitrary and will trigger an +interrupt in the destination guest. + +Message Signalled Interrupts + +An ivshmem device may support multiple MSI vectors. If so, the lower 16-bits +written to the Doorbell register must be between 0 and the maximum number of +vectors the guest supports. The lower 16 bits written to the doorbell is the +MSI vector that will be raised in the destination guest. The number of MSI +vectors is configurable but it is set when the VM is started. + +The important thing to remember with MSI is that it is only a signal, no status +is set (since MSI interrupts are not shared). All information other than the +interrupt itself should be communicated via the shared memory region. Devices +supporting multiple MSI vectors can use different vectors to indicate different +events have occurred. The semantics of interrupt vectors are left to the +user's discretion. |