path: root/sys/x86/iommu
Commit message | Author | Age | Files | Lines
* MFC r325530 (jeff), r325566 (kib), r325588 (kib):markj2018-02-211-8/+5
| | | | | | Replace many instances of VM_WAIT with blocking page allocation flags. (cherry picked from commit 2069f0080fbdcf49b623bc3c1eda76524a4d1a77)
* MFC r320125:kib2017-06-263-34/+25
| | | | | | Fix batched unload for DMAR busdma in qi mode. Approved by: re (marius)
* MFC r316851:kib2017-04-211-1/+1
| | | | | Correct calculation of the entry->free_down in the invariants-checking code.
* MFC r316011:kib2017-04-034-46/+124
| | | | Timeout DMAR commands.
* MFC r315968:kib2017-04-021-3/+23
| | | | Provide a less laborious way to enable busdma DMAR for only a short list of devices.
* MFC r315934:kib2017-04-011-2/+4
| | | | Avoid leaking allocated but unused context after creation race.
* MFC r315933:kib2017-04-011-5/+7
| | | | Do not create RMRR entries for identity-mapped domains.
* MFC r315932:kib2017-04-011-2/+2
| | | | Slight cleanup of the comment.
* MFC r309551:kib2017-03-311-0/+2
| | | | Release DMAR table after using it.
* MFC r309550:kib2016-12-122-2/+2
| | | | Rename fast taskqueues used by DMAR.
* MFC 303754: Add __printflike() to bus_describe_intr() to enable -Wformat checks.jhb2016-10-061-1/+1
| | | | | Fix a few places that were passing a raw string as the format to use a "%s" format string instead.
* MFC 303886: Add additional constants.jhb2016-09-301-0/+4
| | | | | | | | | | - Add constants for the fields in the root-entry table address register, namely the root table type (RTT) and root table address (RTA) mask. - Add macros for the bitmask of the domain ID field in the second word of context table entries as well as a helper macro (DMAR_CTX2_GET_DID) to extract the domain ID from a context table entry. Sponsored by: Chelsio Communications
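The domain ID helper described above can be pictured with a small userland sketch. The EX_* names are hypothetical and are not the driver's definitions; the sketch assumes the VT-d layout in which the domain ID occupies bits 8..23 of the second 64-bit word of a context entry.

```c
#include <stdint.h>

/*
 * Hypothetical names for illustration only.
 * Assumption: the domain ID sits in bits 8..23 of the second 64-bit word.
 */
#define EX_CTX2_DID_SHIFT	8
#define EX_CTX2_DID_MASK	((uint64_t)0xffff << EX_CTX2_DID_SHIFT)

/* Encode a domain ID into its field position. */
#define EX_CTX2_DID(did) \
	(((uint64_t)(did) << EX_CTX2_DID_SHIFT) & EX_CTX2_DID_MASK)

/* Extract the domain ID from the second word of a context entry. */
#define EX_CTX2_GET_DID(ctx2) \
	((uint16_t)(((ctx2) & EX_CTX2_DID_MASK) >> EX_CTX2_DID_SHIFT))
```

Masking before shifting keeps neighbouring fields of the word out of the result, so the extraction works even when other bits of the entry are set.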
* sys: use our roundup2/rounddown2() macros when param.h is available.pfg2016-04-211-2/+2
| | | | | | | | | | rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.
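For readers unfamiliar with the macros, the following is a minimal sketch of power-of-two rounding. The EX_* macros are local stand-ins with hypothetical names, written to mirror the usual mask-based form so the example builds in userland; the 4096-byte page size is an assumption.

```c
#include <stdint.h>

/*
 * Local stand-ins for power-of-two rounding macros (hypothetical names).
 * Both require y to be a power of two.
 */
#define EX_ROUNDDOWN2(x, y)	((x) & ~((y) - 1))
#define EX_ROUNDUP2(x, y)	(((x) + ((y) - 1)) & ~((y) - 1))

#define EX_PAGE_SIZE	((uint64_t)4096)	/* assumed page size */

/* Truncate an address to the start of its page. */
static uint64_t
ex_page_trunc(uint64_t addr)
{
	return (EX_ROUNDDOWN2(addr, EX_PAGE_SIZE));
}

/* Round an address up to the next page boundary. */
static uint64_t
ex_page_round(uint64_t addr)
{
	return (EX_ROUNDUP2(addr, EX_PAGE_SIZE));
}
```

The macro form names the intent, while the open-coded mask can be shorter on deeply indented lines, which is the trade-off the commit message describes.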
* Add hw.dmar.batch_coalesce tunable/sysctl, which specifies rate atkib2016-04-173-2/+20
| | | | | | | | | | | | | which queued invalidation completion interrupt is requested with regard to the queued invalidation requests. In other words, setting the value of the knob to N requests completion interrupt after N items are processed. Existing behaviour is restored by setting hw.dmar.batch_coalesce=1. The knob significantly decreases the DMAR qi interrupt rate at the cost of slightly longer DMAR map entries recycling. Sponsored by: The FreeBSD Foundation
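The coalescing rule can be sketched as a simple counter. This is an illustration under assumptions, not the driver's code: the structure and function names are hypothetical, and only the "interrupt after N items" behaviour is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for illustration. */
struct ex_qi_state {
	uint32_t batch_coalesce;	/* N: one interrupt request per N items */
	uint32_t pending;		/* items queued since the last request */
};

/* Return true when the item being queued should request an interrupt. */
static bool
ex_qi_want_intr(struct ex_qi_state *st)
{
	if (++st->pending < st->batch_coalesce)
		return (false);
	st->pending = 0;
	return (true);
}
```

With batch_coalesce set to 1 every queued item requests an interrupt, which matches the previous behaviour mentioned in the message; larger values trade interrupt rate for later recycling of map entries.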
* Remove taskqueue_enqueue_fast().jhb2016-03-012-2/+2
| | | | | | | | | | taskqueue_enqueue() was changed to support both fast and non-fast taskqueues 10 years ago in r154167. It has been a compat shim ever since. It's time for the compat shim to go. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: sephe Differential Revision: https://reviews.freebsd.org/D5131
* Some BIOSes ACPI bytecode needs to take (sleepable) acpi mutex forkib2016-02-201-6/+2
| | | | | | | | | | | | | | | acpi_GetInteger() execution. Intel DMAR interrupt remapping code needs to know UID of the HPET to properly route the FSB interrupts from the HPET, even when interrupt remapping is disabled, and the code is executed under some non-sleepable mutexes. Cache HPET UIDs in the device softc at the attach time and provide lock-less method to get UID, use the method from the dmar hpet handling code instead of calling GetInteger(). Reported and tested by: Larry Rosenman <ler@lerctr.org> Sponsored by: The FreeBSD Foundation MFC after: 1 week
* dmar_ctx_dtr() does not exist since r284869. Remove the static functionbz2015-09-221-1/+0
| | | | declaration to avoid a compile-time warning.
* Comment only change, fix grammar and somewhat clarify the action.kib2015-08-141-2/+3
| | | | | Sponsored by: The FreeBSD Foundation MFC after: 3 days
* Typo in comment.kib2015-07-201-1/+1
|
* Split the DMAR unit domains and contexts. Domains carry address spacekib2015-06-268-623/+839
| | | | | | | | | | | | | | | | | | | | and related data structures. Contexts attach requests initiators to domains. There is still 1:1 correspondence between contexts and domains on the running system, since only busdma currently allocates them, using dmar_get_ctx_for_dev(). Large part of the change is formal rename of the ctx to domain, but patch also reworks the context allocation and free to allow for independent domain creation. The helper dmar_move_ctx_to_domain() is introduced for future use, to reassign request initiator from one domain to another. The hard issue which is not yet resolved with the context move is proper handling (or reserving) RMRR entries in the destination domain as required by ACPI DMAR table for moved context. Tested by: pho Sponsored by: The FreeBSD Foundation
* Remove several write-only variables, all reported by the gcc 4.9kib2015-05-292-9/+3
| | | | | | | | | | | | | | | | buildkernel run. Some of them were write-only under some kernel options, e.g. variables keeping values only used by CTR() macros. It costs nothing to the code readability and correctness to eliminate the warnings in those cases too by removing the local cached values used only for single-access. Review: https://reviews.freebsd.org/D2665 Reviewed by: rodrigc Looked at by: bjk Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Explicitly enable queued invalidation completion interrupt when thekib2015-05-291-0/+2
| | | | | | | | queue is started, not relying on the interrupt remapping method to happen. Also disable interrupts when shooting down the queue. Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Account for the offset of the page run when allocating thekib2015-04-083-21/+32
| | | | | | | | | | | dmar_map_entry. Non-zero offset both increases the required mapping size, which is handled in dmar_bus_dmamap_load_something1(), and makes it possible that allocated range crosses boundary, which needs a check in dmar_gas_match_one(). Reported and tested by: jimharris Sponsored by: The FreeBSD Foundation MFC after: 1 week
* When mapping an allocated entry, use the entry size, instead of thekib2015-03-241-1/+1
| | | | | | | | | requested size. If tag restrictions caused the entry to be split, its size is less than requested. Hardware provided by: Michael Fuckner <michael@fuckner.net> Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Assert that the mapping loop makes progress.kib2015-03-241-0/+1
| | | | | Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Use VT-d interrupt remapping block (IR) to perform FSB messageskib2015-03-1912-19/+726
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | translation. In particular, despite IO-APICs only take 8bit apic id, IR translation structures accept 32bit APIC Id, which allows x2APIC mode to function properly. Extend msi_cpu of struct msi_intrsrc and io_cpu of ioapic_intsrc to full int from one byte. KPI of IR is isolated into the x86/iommu/iommu_intrmap.h, to avoid bringing all dmar headers into interrupt code. The non-PCI(e) devices which generate message interrupts on FSB require special handling. The HPET FSB interrupts are remapped, while DMAR interrupts are not. For each msi and ioapic interrupt source, the iommu cookie is added, which is in fact index of the IRE (interrupt remap entry) in the IR table. Cookie is made at the source allocation time, and then used at the map time to fill both IRE and device registers. The MSI address/data registers and IO-APIC redirection registers are programmed with the special values which are recognized by IR and used to restore the IRE index, to find proper delivery mode and target. Map all MSI interrupts in the block when msi_map() is called. Since an interrupt source setup and dismantle code are done in the non-sleepable context, flushing interrupt entries cache in the IR hardware, which is done async and ideally waits for the interrupt, requires busy-wait for queue to drain. The dmar_qi_wait_for_seq() is modified to take a boolean argument requesting busy-wait for the written sequence number instead of waiting for interrupt. Some interrupts are configured before IR is initialized, e.g. ACPI SCI. Add intr_reprogram() function to reprogram all already configured interrupts, and call it immediately before an IR unit is enabled. There is still a small window after the IO-APIC redirection entry is reprogrammed with cookie but before the unit is enabled, but to fix this properly, IR must be started much earlier. 
Add workarounds for 5500 and X58 northbridges, some revisions of which have severe flaws in handling IR. Use the same identification methods as employed by Linux. Review: https://reviews.freebsd.org/D1892 Reviewed by: neel Discussed with: jhb Tested by: glebius, pho (previous versions) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
* Provide definitions for all descriptors types in the DMAR invalidationkib2015-03-191-6/+21
| | | | | | | | | queue. They are for first-level translations and device TLB. Review: https://reviews.freebsd.org/D1892 Reviewed by: neel Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Fix syntax error.kib2015-03-191-1/+1
| | | | | | | Review: https://reviews.freebsd.org/D1892 Found by: neel Sponsored by: The FreeBSD Foundation MFC after: 3 days
* When initial placement of the new entry crosses the boundary,kib2015-03-171-2/+4
| | | | | | | | | | | | | | allocator tries to move the entry up, after the boundary. The new location may still fail to satisfy boundary requirement, for instance, if the boundary is set to page size, and allocation is of multiple pages. Recheck that boundary is not crossed after the move. If it is crossed, give up on allocating the whole entry and split it. Reported by: Michael Fuckner <michael@fuckner.net>, running nvme(4) Sponsored by: The FreeBSD Foundation MFC after: 1 week
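The recheck described above can be sketched as follows. The function names are hypothetical and the code is a simplified model of the idea, assuming a power-of-two boundary and a non-zero size; it is not the allocator's actual logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when [start, start + size) spans a multiple of boundary. */
static bool
ex_crosses_boundary(uint64_t start, uint64_t size, uint64_t boundary)
{
	return (((start ^ (start + size - 1)) & ~(boundary - 1)) != 0);
}

/*
 * Try to place size bytes at start. On success store the chosen address
 * in *out and return true. Return false when the entry cannot be placed
 * whole and the caller has to split it.
 */
static bool
ex_place(uint64_t start, uint64_t size, uint64_t boundary, uint64_t *out)
{
	if (ex_crosses_boundary(start, size, boundary)) {
		/* Move the entry up to the first address after the boundary. */
		start = (start + boundary) & ~(boundary - 1);
		/* Recheck: the moved entry may still cross a boundary. */
		if (ex_crosses_boundary(start, size, boundary))
			return (false);
	}
	*out = start;
	return (true);
}
```

A two-page request with a one-page boundary fails the recheck at any address, which is the case from the commit message where the allocation has to be split.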
* When inserting new entry into the address map, ensure that not onlykib2015-03-171-1/+2
| | | | | | | | next entry does not intersect with the tail of the new entry, but also that previous entry is also before new entry start. Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Revert r276949 and redo the fix for PCIe/PCI bridges, which do notkib2015-02-211-6/+26
| | | | | | | | | | | | | follow specification and do not provide PCIe capability. Verify if the port above such bridge is downstream PCIe (or root port) and treat the bridge as PCIe/PCI then. This allows to avoid maintaining the table of device ids for bridges without capability, while still calculate correct request originator for devices behind the bridge. Submitted by: Jason Harmening <jason.harmening@gmail.com> MFC after: 1 week
* Registers definitions for the new capabilities from the version 2.4 ofkib2015-02-112-4/+67
| | | | | | | | | | | VT-d specification. Also add definitions for the interrupt remapping table and IEC. Print new capabilities on boot, although there is no hardware which supports them. Sponsored by: The FreeBSD Foundation MFC after: 1 week
* vm_page_lookup() accepts read-locked object.kib2015-02-111-4/+2
| | | | | Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Right now, for non-coherent DMARs, page table update code flushes thekib2015-01-114-21/+54
| | | | | | | | | | | | | | | | | | | | | | cache for whole page containing modified pte, and more, only last page in the series of the consecutive pages is flushed (i.e. the affected mappings should be larger than 2MB). Avoid excessive flushing and do missed necessary flushing, by splitting invalidation and unmapping. For now, flush exactly the range of the changed pte. This is still somewhat bigger than necessary, since pte is 8 bytes, while cache flush line is at least 32 bytes. The originator of the issue reports that after the change, 'dmar_bus_dmamap_unload went from 13,288 cycles down to 3,257. dmar_bus_dmamap_load_buffer went from 9,686 cycles down to 3,517. and I am now able to get line 1GbE speed with Netperf TCP (even with 1K message size).' Diagnosed and tested by: Nadav Amit <nadav.amit@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week
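A rough sketch of why flushing only the changed ptes is cheaper: the byte range to flush is the pte range widened to cache-line boundaries, rather than the whole page. The names and the caller-supplied line size are assumptions for illustration; only the 8-byte pte size comes from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_PTE_SIZE	8u	/* a pte is 8 bytes, per the commit message */

/* Half-open byte range [start, end). */
struct ex_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Cache-line-aligned range covering ptes [first, first + cnt) of a page
 * table located at base. line must be a power of two.
 */
static struct ex_range
ex_pte_flush_range(uintptr_t base, size_t first, size_t cnt, uintptr_t line)
{
	struct ex_range r;

	r.start = (base + first * EX_PTE_SIZE) & ~(line - 1);
	r.end = (base + (first + cnt) * EX_PTE_SIZE + line - 1) & ~(line - 1);
	return (r);
}
```

For a single changed pte and a 64-byte line this yields one cache line instead of a 4096-byte page, still larger than the 8 bytes actually modified, as the message notes.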
* Fix calculation of requester for PCI device behind PCIe/PCI bridge.kib2015-01-101-25/+27
| | | | | | | | | | | | | | | | | | | In my case on the test machine, I have hierarchy of pcib2 (PCIe port on host bridge with PCIe capability) -> pci2 -> pcib3 (ITE PCIe/PCI bridge) -> pci3 -> em1 The device to check PCIe capability is pcib2 and not pcib3, as it is currently done in the code. Also, in case of the bridge, we shall step to pcib2 for the loop iteration, since pcib3 does not carry PCIe capability info and would force wrong recalculation of rid. Also change the returned requester to the PCIe bus which provides port for the bridge. This only results in changing hw.busdma.pciX.X.X.X.bounce tunable to force identity-mapped context for the device. Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Print rid when announcing DMAR context creation. Print sid when faultkib2015-01-102-4/+5
| | | | | | | | occurs. This allows to connect dots in case the requester is calculated erroneously. Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Fix DMAR context allocations for the devices behind PCIe->PCI bridgeskib2015-01-091-1/+1
| | | | | | | | | after dmar driver was converted to use rids. The bus component to calculate context page must be taken from the requestor rid, which is a bridge, and not from the device bus number. Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Follow up to r225617. In order to maximize the re-usability of kernel codedavide2014-10-161-1/+1
| | | | | | | | in userland rename in-kernel getenv()/setenv() to kern_getenv()/kern_setenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe
* Remove ia64.marcel2014-07-071-1/+1
| | | | | | | | | | | | | | | | | This includes: o All directories named *ia64* o All files named *ia64* o All ia64-specific code guarded by __ia64__ o All ia64-specific makefile logic o Mention of ia64 in comments and documentation This excludes: o Everything under contrib/ o Everything under crypto/ o sys/xen/interface o sys/sys/elf_common.h Discussed at: BSDcan
* Pull in r267961 and r267973 again. Fix for issues reported will follow.hselasky2014-06-281-5/+4
|
* Revert r267961, r267973:gjb2014-06-271-4/+5
| | | | | | | | | | These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
* Extend the meaning of the CTLFLAG_TUN flag to automatically check ifhselasky2014-06-271-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
* Re-implement the DMAR I/O MMU code in terms of PCI RIDsrstone2014-04-016-47/+64
| | | | | | | | | | | | | Under the hood the VT-d spec is really implemented in terms of PCI RIDs instead of bus/slot/function, even though the spec makes pains to convert back to bus/slot/function in examples. However working with bus/slot/function is not correct when PCI ARI is in use, so convert to using RIDs in most cases. bus/slot/function will only be used when reporting errors to a user. Reviewed by: kib MFC after: 2 months Sponsored by: Sandvine Inc.
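The point about ARI can be illustrated with a short sketch of how a 16-bit RID is composed. The function names are hypothetical; the field widths follow the conventional PCI layout (8-bit bus, 5-bit slot, 3-bit function) and the ARI layout (8-bit bus, 8-bit function).

```c
#include <stdint.h>

/* Conventional RID: bus in bits 15..8, slot in 7..3, function in 2..0. */
static uint16_t
ex_rid(unsigned bus, unsigned slot, unsigned func)
{
	return ((uint16_t)(((bus & 0xff) << 8) | ((slot & 0x1f) << 3) |
	    (func & 0x7)));
}

/* With ARI the slot field is dropped: bus in 15..8, function in 7..0. */
static uint16_t
ex_rid_ari(unsigned bus, unsigned func)
{
	return ((uint16_t)(((bus & 0xff) << 8) | (func & 0xff)));
}

/* The bus number is recoverable from the RID in both layouts. */
static unsigned
ex_rid_bus(uint16_t rid)
{
	return (rid >> 8);
}
```

The same RID value decodes to different slot/function pairs depending on whether ARI is in use, so carrying the RID itself avoids a lossy round trip through bus/slot/function.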
* Revert PCI RID changes.rstone2014-04-016-64/+47
| | | | | | | | My PCI RID changes somehow got intermixed with my PCI ARI patch when I committed it. I may have accidentally applied a patch to a non-clean working tree. Revert everything while I figure out what went wrong. Pointy hat to: rstone
* Re-implement the DMAR I/O MMU code in terms of PCI RIDsrstone2014-04-016-47/+64
| | | | | | | | | | | | Under the hood the VT-d spec is really implemented in terms of PCI RIDs instead of bus/slot/function, even though the spec makes pains to convert back to bus/slot/function in examples. However working with bus/slot/function is not correct when PCI ARI is in use, so convert to using RIDs in most cases. bus/slot/function will only be used when reporting errors to a user. Reviewed by: kib Sponsored by: Sandvine Inc.
* Add support for the PCI(e)-PCI bridges to the Intel VT-d driver. Thekib2014-03-183-17/+122
| | | | | | | | | | | | | | | | | bridge takes ownership of the transaction, so bsf of the requester is the bridge and not a device behind it. As result, code needs to walk the hierarchy up to use correct context. Note that PCIe->PCI-X bridges are not handled quite correctly since such bridges are allowed to only take ownership of some transactions. Also, weird but unrealistic cases of PCIe behind PCI bus are also not handled. Still, the patch provides significant step forward for the bridge handling. Submitted by: Jason Harmening <jason.harmening@gmail.com> MFC after: 1 week
* It is not uncommon for BIOSes to report wrong RMRR entries in DMARkib2014-03-181-0/+9
| | | | | | | | | | | | | table. Among them, some (old AMI ?) BIOSes report entries with range like (bf7ec000, bf7ebfff). Attempts to ignore the bogus entries result in faults, so the range must be covered somehow. Provide a workaround by identity mapping the 32 pages after the bogus entry start, which seems to be enough for the reported BIOS. Reported and tested by: Jason Harmening <jason.harmening@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week
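The workaround can be modelled as below. This is a simplified sketch, not the driver's code: the function name is hypothetical, the 4096-byte page size is an assumption, and the RMRR is taken as a (start, last) pair with an inclusive last address as in the range quoted above.

```c
#include <stdint.h>

#define EX_PAGE_SIZE	((uint64_t)4096)	/* assumed page size */
#define EX_BOGUS_PAGES	32			/* pages mapped for a bogus entry */

/*
 * Return the exclusive end of the region to identity-map for an RMRR
 * reported as (start, last), where last is the inclusive final address.
 * An entry whose end does not lie after its start is treated as bogus
 * and widened to EX_BOGUS_PAGES pages from start.
 */
static uint64_t
ex_rmrr_map_end(uint64_t start, uint64_t last)
{
	uint64_t end;

	end = last + 1;
	if (end <= start)
		end = start + EX_BOGUS_PAGES * EX_PAGE_SIZE;
	return (end);
}
```

For the reported range (bf7ec000, bf7ebfff) the computed end equals the start, so the sketch maps 32 pages from bf7ec000; a well-formed entry is left unchanged.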
* Trim at EOL.kib2014-03-181-2/+2
| | | | MFC after: 3 days
* Fix undefined behavior: (1 << 31) is not defined as 1 is an int and thiseadler2013-11-301-8/+8
| | | | | | | | | | | | | shifts into the sign bit. Instead use (1U << 31) which gets the expected result. This fix is not ideal as it assumes a 32 bit int, but does fix the issue for most cases. A similar change was made in OpenBSD. Discussed with: -arch, rdivacky Reviewed by: cperciva
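A minimal illustration of the fix, using a hypothetical flag name: with a 32-bit int, the signed constant 1 cannot be shifted left by 31 without reaching the sign bit, while the unsigned constant 1U can.

```c
#include <stdint.h>

/* Hypothetical flag name, for illustration only. */
#define EX_FLAG_TOP_BIT	(1U << 31)	/* unsigned: defined for 32-bit int */
/* #define EX_FLAG_TOP_BIT (1 << 31)	   signed: shifts into the sign bit */

/* Return nonzero when the top bit of a 32-bit register value is set. */
static int
ex_top_bit_set(uint32_t reg)
{
	return ((reg & EX_FLAG_TOP_BIT) != 0);
}
```

As the message says, this still assumes a 32-bit int; a fixed-width form such as UINT32_C(1) << 31 would drop that assumption.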
* Fix gcc warning about an uninitialized bool in sys/x86/iommu/intel_drv.c.dim2013-11-091-0/+2
| | | | Reviewed by: kib