summaryrefslogtreecommitdiffstats
path: root/hw
Commit message (Collapse)AuthorAgeFilesLines
...
| * | virtio: right size for virtio_queue_get_avail_sizePierre Morel2015-09-241-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Being working on dataplane I notice something strange: virtio_queue_get_avail_size() used a 64bit size index for the calculation of the available ring size. It is quite strange but it did work with the old calculation of the avail ring, at most with performance penalty, and I wonder where I missed something. This patch let use a 16bit size as defined in virtio_ring.h Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | Merge remote-tracking branch 'remotes/elmarco/tags/rm-libcacard' into stagingPeter Maydell2015-09-242-4/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove libcacard # gpg: Signature made Wed 23 Sep 2015 22:37:11 BST using RSA key ID 75969CE5 # gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" # gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5 * remotes/elmarco/tags/rm-libcacard: libcacard: use the standalone project Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * | libcacard: use the standalone projectMarc-André Lureau2015-09-232-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | libcacard is now a standalone project hosted with the Spice project (see the 2.5.0 release announcement), remove it from qemu tree. Use the library if found during configure or if --enable-smartcard. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com>
* | | hw/arm/virt-acpi-build: Fix wrong size of flash in ACPI tableShannon Zhao2015-09-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While virt machine creates two flash devices with total size 0x08000000, the ACPI table generation code was wrongly using this total size as the size of each flash device, so it would overlap other MMIO spaces. Make each device entry in the table half the total; this brings the ACPI table into line with the code which generates the device tree and which creates the flash devices themselves. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Wei Huang <wei@redhat.com> Tested-by: Graeme Gregory <graeme.gregory@linaro.org> Message-id: 1442455041-6596-1-git-send-email-shannon.zhao@linaro.org [PMM: edited commit message] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* | | hw/arm/virt: Add gic-version option to virt machinePavel Fedin2015-09-242-45/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add gic_version to VirtMachineState, set it to value of the option and pass it around where necessary. Instantiate devices and fdt nodes according to the choice. max_cpus for virt machine increased to 123 (calculated from redistributor space available in the memory map). GICv2 compatibility check happens inside arm_gic_common_realize(). ITS region is added to the memory map too, however currently it not used, just reserved. Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Tested-by: Ashok kumar <ashoks@broadcom.com> [PMM: Added missing cpu_to_le* calls, thanks to Shannon Zhao] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* | | hw/intc: Initial implementation of vGICv3Pavel Fedin2015-09-242-0/+150
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the initial version of KVM-accelerated GICv3 support. State load and save are not yet supported, live migration is not possible. In order to get correct class name in a simpler way, gicv3_class_name() function is implemented, similar to gic_class_name(). Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Tested-by: Ashok kumar <ashoks@broadcom.com> Message-id: 69d8f01d14994d7a1a140e96aef59fd332d02293.1441784344.git.p.fedin@samsung.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* | | intc/gic: Extract some reusable vGIC codePavel Fedin2015-09-242-65/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some functions previously used only by vGICv2 are useful also for vGICv3 implementation. Untie them from GICState and make accessible from within other modules: - kvm_arm_gic_set_irq() - kvm_gic_supports_attr() - moved to common code and renamed to kvm_device_check_attr() - kvm_gic_access() - turned into GIC-independent kvm_device_access(). Data pointer changed to void * because some GICv3 registers are 64-bit wide Some of these changes are not used right now, but they will be helpful for implementing live migration. Actually kvm_dist_get() and kvm_dist_put() could also be made reusable, but they would require two extra parameters (s->dev_fd and s->num_cpu) as well as lots of typecasts of 's' to DeviceState * and back to GICState *. This makes the code very ugly so i decided to stop at this point. I tried also an approach with making a base class for all possible GICs, but it would contain only three variables (dev_fd, cpu_num and irq_num), and accessing them through the rest of the code would be again tedious (either ugly casts or qemu-style separate object pointer). So i disliked it too. Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Tested-by: Ashok kumar <ashoks@broadcom.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 2ef56d1dd64ffb75ed02a10dcdaf605e5b8ff4f8.1441784344.git.p.fedin@samsung.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* | | hw/intc: Implement GIC-500 base classShlomo Pongratz2015-09-242-0/+141
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This class is to be used by both software and KVM implementations of GICv3 Currently it is mostly a placeholder, but in future it is supposed to hold qemu's representation of GICv3 state, which is necessary for migration. The interface of this class is fully compatible with GICv2 one. This is done in order to simplify integration with existing code. Signed-off-by: Shlomo Pongratz <shlomo.pongratz@huawei.com> Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Ashok kumar <ashoks@broadcom.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: aff8baaee493cdcab0694b4a1d4dd5ff27c37ed2.1441784344.git.p.fedin@samsung.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* | vfio/pci: Add emulated PCI IDsAlex Williamson2015-09-233-6/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Specifying an emulated PCI vendor/device ID can be useful for testing various quirk paths, even though the behavior and functionality of the device with bogus IDs is fully unsupportable. We need to use a uint32_t for the vendor/device IDs, even though the registers themselves are only 16-bit in order to be able to determine whether the value is valid and user set. The same support is added for subsystem vendor/device ID, though these have the possibility of being useful and supported for more than a testing tool. An emulated platform might want to impose their own subsystem IDs or at least hide the physical subsystem ID. Windows guests will often reinstall drivers due to a change in subsystem IDs, something that VM users may want to avoid. Of course careful attention would be required to ensure that guest drivers do not rely on the subsystem ID as a basis for device driver quirks. All of these options are added using the standard experimental option prefix and should not be considered stable. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Cache vendor and device IDAlex Williamson2015-09-233-19/+11
| | | | | | | | | | | | | | Simplify access to commonly referenced PCI vendor and device ID by caching it on the VFIOPCIDevice struct. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Move AMD device specific reset to quirksAlex Williamson2015-09-233-157/+170
| | | | | | | | | | | | | | This is just another quirk, for reset rather than affecting memory regions. Move it to our new quirks file. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Remove old config window and mirror quirksAlex Williamson2015-09-232-177/+0
| | | | | | | | | | | | These are now unused. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Config mirror quirkAlex Williamson2015-09-231-106/+124
| | | | | | | | | | | | Re-implement our mirror quirk using the new infrastructure. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Config window quirksAlex Williamson2015-09-231-88/+270
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Config windows make use of an address register and a data register. In VGA cards, these are often used to provide real mode code in the BIOS an easy way to access MMIO registers since the window often resides in an I/O port register. When the MMIO register has a mirror of PCI config space, we need to trap those accesses and redirect them to emulated config space. The previous version of this functionality made use of a single MemoryRegion and single match address. This version uses separate MemoryRegions for each of the address and data registers and allows for multiple match addresses. This is useful for Nvidia cards which have two ranges which index into PCI config space. The previous implementation is left for the follow-on patch for a more reviewable diff. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Rework RTL8168 quirkAlex Williamson2015-09-231-76/+101
| | | | | | | | | | | | | | | | | | Another rework of this quirk, this time to update to the new quirk structure. We can handle the address and data registers with separate MemoryRegions and a quirk specific data structure, making the code much more understandable. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Cleanup Nvidia 0x3d0 quirkAlex Williamson2015-09-231-65/+107
| | | | | | | | | | | | | | | | The Nvidia 0x3d0 quirk makes use of a two separate registers and gives us our first chance to make use of separate memory regions for each to simplify the code a bit. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Cleanup ATI 0x3c3 quirkAlex Williamson2015-09-231-17/+9
| | | | | | | | | | | | | | | | This is an easy quirk that really doesn't need a data structure if its own. We can pass vdev as the opaque data and access to the MemoryRegion isn't required. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Foundation for new quirk structureAlex Williamson2015-09-232-104/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VFIOQuirk hosts a single memory region and a fixed set of data fields that try to handle all the quirk cases, but end up making those that don't exactly match really confusing. This patch introduces a struct intended to provide more flexibility and simpler code. VFIOQuirk is stripped to its basics, an opaque data pointer for quirk specific data and a pointer to an array of MemoryRegions with a counter. This still allows us to have common teardown routines, but adds much greater flexibility to support multiple memory regions and quirk specific data structures that are easier to maintain. The existing VFIOQuirk is transformed into VFIOLegacyQuirk, which further patches will eliminate entirely. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Cleanup ROM blacklist quirkAlex Williamson2015-09-232-20/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | Create a vendor:device ID helper that we'll also use as we rework the rest of the quirks. Re-reading the config entries, even if we get more blacklist entries, is trivial overhead and only incurred during device setup. There's no need to typedef the blacklist structure, it's a static private data type used once. The elements get bumped up to uint32_t to avoid future maintenance issues if PCI_ANY_ID gets used for a blacklist entry (avoiding an actual hardware match). Our test loop is also crying out to be simplified as a for loop. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Split quirks to a separate fileAlex Williamson2015-09-234-882/+908
| | | | | | | | Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Extract PCI structures to a separate headerAlex Williamson2015-09-232-143/+159
| | | | | | | | Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio: Change polarity of our no-mmap optionAlex Williamson2015-09-233-3/+3
| | | | | | | | | | | | | | | | | | The default should be to allow mmap and new drivers shouldn't need to expose an option or set it to other than the allocation default in their initfn. Take advantage of the experimental flag to change this option to the correct polarity. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Make interrupt bypass runtime configurableAlex Williamson2015-09-231-7/+12
| | | | | | | | | | | | | | Tracing is more effective when we can completely disable all KVM bypass paths. Make these runtime rather than build-time configurable. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Rename MSI/X functions for easier tracingAlex Williamson2015-09-231-39/+34
| | | | | | | | | | | | | | | | | | This allows vfio_msi* tracing. The MSI/X interrupt tracing is also pulled out of #ifdef DEBUG_VFIO to avoid a recompile for tracing this path. A few cycles to read the message is hardly anything if we're already in QEMU. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Rename INTx functions for easier tracingAlex Williamson2015-09-231-24/+24
| | | | | | | | | | | | | | Rename functions and tracing callbacks so that we can trace vfio_intx* to see all the INTx related activities. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio/pci: Cleanup vfio_early_setup_msix() error pathAlex Williamson2015-09-231-14/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | With the addition of the Chelsio quirk we have an error path out of vfio_early_setup_msix() that doesn't free the allocated VFIOMSIXInfo struct. This doesn't introduce a leak as it still gets freed in the vfio_put_device() path, but it's complicated and sloppy to rely on that. Restructure to free the allocated data on error and only link it into the vdev on success. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
* | vfio/pci: Cleanup RTL8168 quirk and tracingAlex Williamson2015-09-231-51/+37
|/ | | | | | | | | | | | | | | | | | | | | | | | There's quite a bit of cleanup that can be done to the RTL8168 quirk, as well as the tracing to prevent a spew of uninteresting accesses for anything else the driver might choose to use the window registers for besides the MSI-X table. There should be no functional change, but it's now possible to get compact and useful traces by enabling vfio_rtl8168_quirk*, ex: vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f000 vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f000 vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0xfee0100c vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f004 vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f004 vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0x0 vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f008 vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f008 vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0x49b1 vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f00c vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f00c vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0x0 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* sPAPR: Enable EEH on VFIO PCI device onlyGavin Shan2015-09-231-1/+1
| | | | | | | | | This checks if the PCI device retrieved from the PCI device address is VFIO PCI device when enabling EEH functionality. If it's not VFIO PCI device, the EEH functonality isn't enabled. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* sPAPR: Revert don't enable EEH on emulated PCI devicesGavin Shan2015-09-231-7/+0
| | | | | | | | | | This reverts commit 7cb18007 ("sPAPR: Don't enable EEH on emulated PCI devices") as rtas_ibm_set_eeh_option() isn't the right place to check if there has the corresponding PCI device for the input address, which can be PE address, not PCI device address. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* ppc/spapr: Implement H_RANDOM hypercall in QEMUThomas Huth2015-09-233-1/+195
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PAPR interface defines a hypercall to pass high-quality hardware generated random numbers to guests. Recent kernels can already provide this hypercall to the guest if the right hardware random number generator is available. But in case the user wants to use another source like EGD, or QEMU is running with an older kernel, we should also have this call in QEMU, so that guests that do not support virtio-rng yet can get good random numbers, too. This patch now adds a new pseudo-device to QEMU that either directly provides this hypercall to the guest or is able to enable the in-kernel hypercall if available. The in-kernel hypercall can be enabled with the use-kvm property, e.g.: qemu-system-ppc64 -device spapr-rng,use-kvm=true For handling the hypercall in QEMU instead, a "RngBackend" is required since the hypercall should provide "good" random data instead of pseudo-random (like from a "simple" library function like rand() or g_random_int()). Since there are multiple RngBackends available, the user must select an appropriate back-end via the "rng" property of the device, e.g.: qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=gid0 \ -device spapr-rng,rng=gid0 ... See http://wiki.qemu-project.org/Features-Done/VirtIORNG for other example of specifying RngBackends. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* ppc/spapr: Fix buffer overflow in spapr_populate_drconf_memory()Thomas Huth2015-09-231-3/+6
| | | | | | | | | | | | | | | | The buffer that is allocated in spapr_populate_drconf_memory() is used for setting both, the "ibm,dynamic-memory" and the "ibm,associativity-lookup-arrays" property. However, only the size of the first one is taken into account when allocating the memory. So if the length of the second property is larger than the length of the first one, we run into a buffer overflow here! Fix it by taking the length of the second property into account, too. Fixes: "spapr: Support ibm,dynamic-reconfiguration-memory" patch Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Fix default NUMA node allocation for threadsDavid Gibson2015-09-231-0/+8
| | | | | | | | | | | | | | | | | At present, if guest numa nodes are requested, but the cpus in each node are not specified, spapr just uses the default behaviour or assigning each vcpu round-robin to nodes. If smp_threads != 1, that will assign adjacent threads in a core to different NUMA nodes. As well as being just weird, that's a configuration that can't be represented in the device tree we give to the guest, which means the guest and qemu end up with different ideas of the NUMA topology. This patch implements mc->cpu_index_to_socket_id in the spapr code to make sure vcpus get assigned to nodes only at the socket granularity. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
* spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT typeBharata B Rao2015-09-231-1/+1
| | | | | | | | | | | | | | Till now memory hotplug used RTAS_LOG_V6_HP_ID_DRC_INDEX hotplug type which meant that we generated one hotplug type of EPOW event for every 256MB (SPAPR_MEMORY_BLOCK_SIZE). This quickly overruns the kernel rtas log buffer thus resulting in loss of memory hotplug events. Switch to RTAS_LOG_V6_HP_ID_DRC_COUNT hotplug type for memory so that we generate only one event per hotplug request. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Support hotplug by specifying DRC countBharata B Rao2015-09-233-12/+41
| | | | | | | | | | | | | | | | Support hotplug identifier type RTAS_LOG_V6_HP_ID_DRC_COUNT that allows hotplugging of DRCs by specifying the DRC count. While we are here, rename spapr_hotplug_req_add_event() to spapr_hotplug_req_add_by_index() spapr_hotplug_req_remove_event() to spapr_hotplug_req_remove_by_index() so that they match with spapr_hotplug_req_add_by_count(). Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Revert to memory@XXXX representation for non-hotplugged memoryBharata B Rao2015-09-231-38/+9
| | | | | | | | | | | | | | Don't represent non-hotluggable memory under drconf node. With this we don't have to create DRC objects for them. The effect of this patch is that we revert back to memory@XXXX representation for all the memory specified with -m option and represent the cold plugged memory and hot-pluggable memory under ibm,dynamic-reconfiguration-memory. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Populate ibm,associativity-lookup-arrays correctly for non-NUMABharata B Rao2015-09-231-2/+3
| | | | | | | | | | | | When NUMA isn't configured explicitly, assume node 0 is present for the purpose of creating ibm,associativity-lookup-arrays property under ibm,dynamic-reconfiguration-memory DT node. This ensures that the associativity index property is correctly updated in ibm,dynamic-memory for the LMB that is hotplugged. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Provide better error message when slots exceed max allowedBharata B Rao2015-09-231-2/+2
| | | | | | | | | | | | | | | Currently when user specifies more slots than allowed max of SPAPR_MAX_RAM_SLOTS (32), we error out like this: qemu-system-ppc64: unsupported amount of memory slots: 64 Let the user know about the max allowed slots like this: qemu-system-ppc64: Specified number of memory slots 64 exceeds max supported 32 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Don't allow memory hotplug to memory less nodesBharata B Rao2015-09-231-1/+23
| | | | | | | | | | | | | | | | | | | Currently PowerPC kernel doesn't allow hot-adding memory to memory-less node, but instead will silently add the memory to the first node that has some memory. This causes two unexpected behaviours for the user. - Memory gets hotplugged to a different node than what the user specified. - Since pc-dimm subsystem in QEMU still thinks that memory belongs to memory-less node, a reboot will set things accordingly and the previously hotplugged memory now ends in the right node. This appears as if some memory moved from one node to another. So until kernel starts supporting memory hotplug to memory-less nodes, just prevent such attempts upfront in QEMU. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Memory hotplug supportBharata B Rao2015-09-232-3/+123
| | | | | | | | | Make use of pc-dimm infrastructure to support memory hotplug for PowerPC. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Make hash table size a factor of maxram_sizeBharata B Rao2015-09-231-1/+1
| | | | | | | | | | | | The hash table size is dependent on ram_size, but since with hotplug the memory can grow till maxram_size. Hence make hash table size dependent on maxram_size. This allows to hotplug huge amounts of memory to the guest. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Support ibm,dynamic-reconfiguration-memoryBharata B Rao2015-09-232-49/+212
| | | | | | | | | | | | | | | | | | | | Parse ibm,architecture.vec table obtained from the guest and enable memory node configuration via ibm,dynamic-reconfiguration-memory if guest supports it. This is in preparation to support memory hotplug for sPAPR guests. This changes the way memory node configuration is done. Currently all memory nodes are built upfront. But after this patch, only memory@0 node for RMA is built upfront. Guest kernel boots with just that and rest of the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory) are built when guest does ibm,client-architecture-support call. Note: This patch needs a SLOF enhancement which is already part of SLOF binary in QEMU. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Add LMB DR connectorsDavid Gibson2015-09-231-0/+87
| | | | | | | | | | | | | | | Enable memory hotplug for pseries 2.4 and add LMB DR connectors. With memory hotplug, enforce RAM size, NUMA node memory size and maxmem to be a multiple of SPAPR_MEMORY_BLOCK_SIZE (256M) since that's the granularity in which LMBs are represented and hot-added. LMB DR connectors will be used by the memory hotplug code. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> [spapr_drc_reset implementation] [since this missed the 2.4 cutoff, changing to only enable for 2.5] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Use QEMU limit for maximum CPUs numberAlexey Kardashevskiy2015-09-231-4/+2
| | | | | | | | | | | | | | | | | sPAPR uses hard coded limit of maximum 255 supported CPUs which is exactly the same as QEMU-wide limit which is MAX_CPUMASK_BITS and also defined as 255. This makes use of a global CPU number limit for the "pseries" machine. In order to anticipate future increase of the MAX_CPUMASK_BITS (or to help debugging large systems), this also bumps the FDT_MAX_SIZE limit from 256K to 1M assuming that 1 CPU core needs roughly 512 bytes in the device tree so the new limit can cover up to 2048 CPU cores. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Don't use QOM [*] syntax for DR connectors.David Gibson2015-09-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dynamic reconfiguration (hotplug) code for the pseries machine type uses a "DR connector" QOM object for each resource it will be possible to hotplug. Each of these is added to its owner using object_property_add_child(owner, "dr-connector[*], ...); That works ok, mostly, but it means that the property indices are arbitrary, depending on the order in which the connectors are constructed. That might line up to something useful, but it doesn't have to. It will get worse once we add hotplug RAM support. That will add a DR connector object for every 256MB of potential memory. So if maxmem=2T, for example, there are 8192 objects under the same parent. The QOM interfaces aren't really designed for this. In particular object_property_add() with [*] has O(n^2) time complexity (in the number of existing children): first it has a linear search through array indices to find a free slot, each of which is attempted to a recursive call to object_property_add() with a specific [N]. Those calls are O(n) because there's a linear search through all properties to check for duplicates. By using a meaningful index value, which we already know is unique we can avoid the [*] special behaviour. That lets us reduce the total time for creating the DR objects from O(n^3) to O(n^2). O(n^2) is still kind of crappy, but it's enough to reduce the startup time of qemu (with in-progress memory hotplug support) with maxmem=2T from ~20 minutes to ~4 seconds. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
* spapr_drc: use RTAS return codes for methods called by RTASMichael Roth2015-09-232-36/+41
| | | | | | | | | | | | | | | | | | | | | | | | | Certain methods in sPAPRDRConnector objects are only ever called by RTAS and in many cases are responsible for the logic that determines the RTAS return codes. Rather than having a level of indirection requiring RTAS code to re-interpret return values from such methods to determine the appropriate return code, just pass them through directly. This requires changing method return types to uint32_t to match the type of values currently passed to RTAS helpers. In the case of read accesses like drc->entity_sense() where we weren't previously reporting any errors, just the read value, we modify the function to return RTAS return code, and pass the read value back via reference. Suggested-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Suggested-by: David Gibson <david@gibson.dropbear.id.au> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Initialize hotplug memory address spaceBharata B Rao2015-09-231-0/+18
| | | | | | | | | | | | Initialize a hotplug memory region under which all the hotplugged memory is accommodated. Also enable memory hotplug by setting CONFIG_MEM_HOTPLUG. Modelled on i386 memory hotplug. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_drc: don't allow 'empty' DRCs to be unisolated or allocatedMichael Roth2015-09-231-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Logical resources start with allocation-state:UNUSABLE / isolation-state:ISOLATED. During hotplug, guests will transition them to allocation-state:USABLE, and then to isolation-state:UNISOLATED. For cases where we cannot transition to allocation-state:USABLE, in this case due to no device/resource being association with the logical DRC, we should return an error -3. For physical DRCs, we default to allocation-state:USABLE and stay there, so in this case we should report an error -3 when the guest attempts to make the isolation-state:ISOLATED transition for a DRC with no device associated. These are as documented in PAPR 2.7, 13.5.3.4. We also ensure allocation-state:USABLE when the guest attempts transition to isolation-state:UNISOLATED to deal with misbehaving guests attempting to bring online an unallocated logical resource. This is as documented in PAPR 2.7, 13.7. Currently we implement no such error logic. Fix this by handling these error cases as PAPR defines. Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_pci: fix device tree props for MSI/MSI-XMichael Roth2015-09-231-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PAPR requires ibm,req#msi and ibm,req#msi-x to be present in the device node to define the number of msi/msi-x interrupts the device supports, respectively. Currently we have ibm,req#msi-x hardcoded to a non-sensical constant that happens to be 2, and are missing ibm,req#msi entirely. The result of that is that msi-x capable devices get limited to 2 msi-x interrupts (which can impact performance), and msi-only devices likely wouldn't work at all. Additionally, if devices expect a minimum that exceeds 2, the guest driver may fail to load entirely. SLOF still owns the generation of these properties at boot-time (although other device properties have since been offloaded to QEMU), but for hotplugged devices we rely on the values generated by QEMU and thus hit the limitations above. Fix this by generating these properties in QEMU as expected by guests. In the future it may make sense to modify SLOF to pass through these values directly as we do with other props since we're duplicating SLOF code. Cc: qemu-ppc@nongnu.org Cc: qemu-stable@nongnu.org Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Enable in-kernel H_SET_MODE handlingAlexey Kardashevskiy2015-09-231-0/+1
| | | | | | | | | | | | | | | | | For setting debug watchpoints, sPAPR guests use H_SET_MODE hypercall. The existing QEMU H_SET_MODE handler does not support this but the KVM handler in HV KVM does. However it is not enabled. This enables the in-kernel H_SET_MODE handler which handles: - Completed Instruction Address Breakpoint Register - Watch point 0 registers. The rest is still handled in QEMU. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* pseries: Fix incorrect calculation of threads per socket for chip-idDavid Gibson2015-09-231-4/+2
| | | | | | | | | | | | | | | | | | | | | The device tree presented to pseries machine type guests includes an ibm,chip-id property which gives essentially the socket number of each vcpu core (individual vcpu threads don't get a node in the device tree). To calculate this, it uses a vcpus_per_socket variable computed as (smp_cpus / #sockets). This is correct for the usual case where smp_cpus == smp_threads * smp_cores * #sockets. However, you can start QEMU with the number of cores and threads mismatching the total number of vcpus (whether that _should_ be permitted is a topic for another day). It's a bit hard to say what the "real" number of vcpus per socket here is, but for most purposes (smp_threads * smp_cores) will more meaningfully match how QEMU behaves with respect to socket boundaries. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
OpenPOWER on IntegriCloud