Commit message  Author  Age  Files  Lines
* Move the definition of the struct unrhdr into a separate header file,kib2013-08-303-27/+70
| | | | | | | | to allow embedding the struct. Add init_unrhdr(9) initializer, which sets up preallocated unrhdr. Reviewed by: alc Tested by: pho, bf
* Reduce WARNS to 0 for dig, host, and nslookup to make themerwin2013-08-303-3/+3
| | | | | | | | | compile with the optional WITH_BIND_SIGCHASE. Submitted by: Andre Albsmeier <Andre.Albsmeier@siemens.com> Approved by: delphij (mentor, implicit) MFC after: 3 days Sponsored by: DK Hostmaster A/S
* Few more minor if_vmx tweaksbryanv2013-08-303-25/+92
| - Allow the Rx/Tx queue sizes to be configured by tunables
| - Bail out earlier if the Tx queue is unlikely to have enough free descriptors to hold the frame
| - Clean up some of the offloading capabilities handling
* Fix the sysctl that displays whether buffer packing is enablednp2013-08-301-7/+13
| | | | or not.
* If reading a virtual-device value fails, attempt to read a virtual-device-extcperciva2013-08-301-0/+3
| | | | | | | | | | | value. Some hosts do not publish "extended" disk IDs via virtual-device in an attempt to avoid confusing old blkfront drivers, and without this change we failed to attach such disks. In particular, this commit allows all 24 ephemeral disks on EC2 hs1.8xlarge instances to be used, instead of only the first 15. MFC after: 3 days
* Implement support for rx buffer packing. Enable it by default for T5np2013-08-302-142/+601
| cards. This is a T4 and T5 chip feature which lets the chip deliver multiple Ethernet frames in a single buffer. This is more efficient within the chip and in the driver, and it reduces wasted space in rx buffers.
|
| - Always allocate rx buffers from the jumbop zone, no matter what the MTU is. Do not use the normal cluster refcounting mechanism.
| - Reserve space for an mbuf and a refcount in the cluster itself and let the chip DMA multiple frames in the rest.
| - Use the embedded mbuf for the first frame and allocate mbufs on the fly for any additional frames delivered in the cluster. Each of these mbufs has a reference on the underlying cluster.
* - Fix LOCAL_MTREE so it properly handles multiple files and quotesbdrewery2013-08-301-3/+3
| | | | | | | | | | its value into submakes PR: conf/179466 Submitted by: Garrett Cooper <yaneurabeya@gmail.com> Approved by: bapt MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division
* Add a routine for attaching an mbuf to a buffer with an externalnp2013-08-291-0/+22
| | | | | | refcount. This one is willing to work with buffers that may already be referenced. MEXTADD/m_extadd are suitable only for the first attachment to a cluster -- they initialize the refcount to 1.
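The distinction drawn above, between a first attachment that initializes the refcount to 1 and a later attachment that only adds a reference, can be sketched as follows. This is a minimal illustration, not the committed routine: `ext_buf`, `mini_mbuf`, `attach_first`, `attach_shared`, and `detach` are hypothetical names, and plain integer arithmetic stands in for the atomic operations a kernel would use.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for illustration only; these are not
 * the kernel's struct mbuf or its real attach routines.
 */
struct ext_buf {
	char		*data;		/* start of the shared buffer */
	unsigned int	*refcnt;	/* counter stored outside the mbufs */
};

struct mini_mbuf {
	char		*ext_data;
	unsigned int	*ext_refcnt;
};

/* First attachment: the buffer is unreferenced, so the count starts at 1. */
static void
attach_first(struct mini_mbuf *m, struct ext_buf *b)
{
	*b->refcnt = 1;
	m->ext_data = b->data;
	m->ext_refcnt = b->refcnt;
}

/*
 * Later attachment: the buffer may already be referenced, so add one
 * reference instead of resetting the count. A real kernel routine would
 * use an atomic increment here.
 */
static void
attach_shared(struct mini_mbuf *m, struct ext_buf *b, size_t offset)
{
	(*b->refcnt)++;
	m->ext_data = b->data + offset;
	m->ext_refcnt = b->refcnt;
}

/* Drop one reference; returns 1 when the caller released the last one. */
static int
detach(struct mini_mbuf *m)
{
	assert(*m->ext_refcnt >= 1);
	return (--(*m->ext_refcnt) == 0);
}
```

Calling `attach_first` a second time on the same buffer would reset the count and lose the earlier reference, which is the situation the new routine is described as avoiding.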
* Introduce a new, HVM compatible, paravirtualized timer driver for Xen.gibbs2013-08-298-642/+615
| Use this new driver for both PV and HVM instances.
|
| This driver requires a Xen hypervisor that supports vector callbacks and VCPUOP hypercalls, and reports that it has a "safe PV clock".
|
| New timer driver:
|   Submitted by: will
|   Sponsored by: Spectra Logic Corporation
|
| PV port to new driver, and bug fixes:
|   Submitted by: Roger Pau Monné
|   Sponsored by: Citrix Systems R&D
|
| sys/dev/xen/timer/timer.c:
|   - Register a PV timer device driver which (currently) implements device_{identify,probe,attach} and stubs device_detach. The detach routine requires functionality not provided by timecounters(4). The suspend and resume routines need additional work (due to Xen requiring that the hypercalls be executed on the target VCPU), and aren't needed for our purposes.
|   - Make sure there can only be one device instance of this driver, and that it only registers one eventtimers(4) and one timecounters(4) device interface. Make both interfaces use PCPU data as needed.
|   - Match, with a few style cleanups & API differences, the Xen versions of the "fetch time" functions.
|   - Document the magic scale_delta() better for the i386 version.
|   - When registering the event timer, bind a separate event channel for the timer VIRQ to the device's event timer interrupt handler for each active VCPU. Describe each interrupt as "xen_et:c%d", so they can be identified per CPU in "vmstat -i" or "show intrcnt" in KDB.
|   - When scheduling a timer into the hypervisor, try up to 60 times if the hypervisor rejects the time as being in the past. In the common case, this retry shouldn't happen, and if it does, it should only happen once. This is because the event timer advertises a minimum period of 100usec, which is only less than the usual hypercall round trip time about 1 out of every 100 tries. (Unlike other similar drivers, this one actually checks whether the hypervisor accepted the singleshot timer set hypercall.)
|   - Implement an RTC PV clock based on the hypervisor wallclock.
|
| sys/conf/files:
|   - Add dev/xen/timer/timer.c if the kernel configuration includes either the XEN or XENHVM options.
|
| sys/conf/files.i386: sys/i386/include/xen/xen_clock_util.h: sys/i386/xen/clock.c: sys/i386/xen/xen_clock_util.c: sys/i386/xen/mp_machdep.c: sys/i386/xen/xen_rtc.c:
|   - Remove the previous PV timer used in i386 XEN PV kernels. The new timer introduced in this change is used instead (so we share the same code between PVHVM and PV).
|
| MFC after: 2 weeks
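The bounded retry described for timer scheduling (recompute the deadline and try again, up to 60 times, when the hypervisor rejects it as already past) can be sketched roughly as below. This is an assumption-laden model rather than the driver's code: `fake_hv`, `fake_set_singleshot`, `schedule_oneshot`, and the latency constants are all hypothetical, and the fake hypervisor exists only so the loop can run outside a kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	MAX_TRIES	60		/* retry bound quoted in the commit message */
#define	FAST_NS		1000ULL		/* hypothetical normal hypercall latency */
#define	SLOW_NS		1000000ULL	/* hypothetical unusually slow hypercall */

/* Hypothetical fake hypervisor, used only to make the sketch runnable. */
struct fake_hv {
	uint64_t now_ns;	/* current time as seen by the guest */
	int	 slow_calls;	/* how many upcoming hypercalls are slow */
};

/* Returns 0 if the deadline is accepted, -1 if it is already in the past. */
static int
fake_set_singleshot(struct fake_hv *hv, uint64_t deadline_ns)
{
	uint64_t cost = FAST_NS;

	if (hv->slow_calls > 0) {
		hv->slow_calls--;
		cost = SLOW_NS;
	}
	hv->now_ns += cost;	/* time passes while the call is in flight */
	return (deadline_ns > hv->now_ns ? 0 : -1);
}

/*
 * Arm a one-shot timer delta_ns from now. Each attempt recomputes the
 * deadline from the current time, so a rejected attempt is retried with a
 * later deadline. Returns the number of attempts used, or -1 on failure.
 */
static int
schedule_oneshot(struct fake_hv *hv, uint64_t delta_ns)
{
	assert(hv != NULL);
	for (int i = 1; i <= MAX_TRIES; i++) {
		uint64_t deadline = hv->now_ns + delta_ns;

		if (fake_set_singleshot(hv, deadline) == 0)
			return (i);
	}
	return (-1);
}
```

The key design point carried over from the commit message is that the return value of the set-timer call is checked at all; without that check a rejected deadline would silently leave no timer armed.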
* 'u_long' is consistently spelled 'unsigned long' in this file. Fix it.jkim2013-08-291-1/+1
|
* Partially revert r254880. The bitmap operations actually use long type now.jkim2013-08-293-8/+5
|
* Bump up the default timeouts for move commands in the ch(4) driverken2013-08-291-4/+4
| | | | | | | | | | to 15 minutes, and 5 minutes for things like READ ELEMENT STATUS. This is needed to account for the worst case scenarios on at least some Spectra Logic tape libraries. Sponsored by: Spectra Logic MFC after: 3 days
* Fix the incomplete conversion from atomic_t to long for test_bit().jkim2013-08-291-1/+1
|
* Clarify confusions between atomic_t and bitmap. Fix bitmap operationsjkim2013-08-292-13/+19
| | | | accordingly.
* Implement vector callback for PVHVM and unify event channel implementationsgibbs2013-08-2964-2654/+2433
| Re-structure Xen HVM support so that:
|   - Xen is detected and hypercalls can be performed very early in system startup.
|   - Xen interrupt services are implemented using FreeBSD's native interrupt delivery infrastructure.
|   - the Xen interrupt service implementation is shared between PV and HVM guests.
|   - Xen interrupt handlers can optionally use a filter handler in order to avoid the overhead of dispatch to an interrupt thread.
|   - interrupt load can be distributed among all available CPUs.
|   - the overhead of accessing the emulated local and I/O APICs on HVM is removed for event channel port events.
|   - a similar optimization can eventually, and fairly easily, be used to optimize MSI.
|
| Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure, and misc Xen cleanups:
|   Sponsored by: Spectra Logic Corporation
|
| Unification of PV & HVM interrupt infrastructure, bug fixes, and misc Xen cleanups:
|   Submitted by: Roger Pau Monné
|   Sponsored by: Citrix Systems R&D
|
| sys/x86/x86/local_apic.c: sys/amd64/include/apicvar.h: sys/i386/include/apicvar.h: sys/amd64/amd64/apic_vector.S: sys/i386/i386/apic_vector.s: sys/amd64/amd64/machdep.c: sys/i386/i386/machdep.c: sys/i386/xen/exception.s: sys/x86/include/segments.h:
|   Reserve IDT vector 0x93 for the Xen event channel upcall interrupt handler. On Hypervisors that support the direct vector callback feature, we can request that this vector be called directly by an injected HVM interrupt event, instead of a simulated PCI interrupt on the Xen platform PCI device. This avoids all of the overhead of dealing with the emulated I/O APIC and local APIC. It also means that the Hypervisor can inject these events on any CPU, allowing upcalls for different ports to be handled in parallel.
|
| sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c:
|   Map Xen per-vcpu area during AP startup.
|
| sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h:
|   Increase the FreeBSD IRQ vector table to include space for event channel interrupt sources.
|
| sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h:
|   Remove Xen HVM per-cpu variable data. These fields are now allocated via the dynamic per-cpu scheme. See xen_intr.c for details.
|
| sys/amd64/include/xen/hypercall.h: sys/dev/xen/blkback/blkback.c: sys/i386/include/xen/xenvar.h: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/xen/gnttab.c:
|   Prefer FreeBSD primitives to Linux ones in Xen support code.
|
| sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: sys/dev/xen/balloon/balloon.c: sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/console/xencons_ring.c: sys/dev/xen/control/control.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/dev/xen/xenpci/xenpci.c: sys/i386/i386/machdep.c: sys/i386/include/pmap.h: sys/i386/include/xen/xenfunc.h: sys/i386/isa/npx.c: sys/i386/xen/clock.c: sys/i386/xen/mp_machdep.c: sys/i386/xen/mptable.c: sys/i386/xen/xen_clock_util.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/xen_rtc.c: sys/xen/evtchn/evtchn_dev.c: sys/xen/features.c: sys/xen/gnttab.c: sys/xen/gnttab.h: sys/xen/hvm.h: sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbus_if.m: sys/xen/xenbus/xenbusb_front.c: sys/xen/xenbus/xenbusvar.h: sys/xen/xenstore/xenstore.c: sys/xen/xenstore/xenstore_dev.c: sys/xen/xenstore/xenstorevar.h:
|   Pull common Xen OS support functions/settings into xen/xen-os.h.
|
| sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h:
|   Remove constants, macros, and functions unused in FreeBSD's Xen support.
|
| sys/xen/xen-os.h: sys/i386/xen/xen_machdep.c: sys/x86/xen/hvm.c:
|   Introduce new functions xen_domain(), xen_pv_domain(), and xen_hvm_domain(). These are used in favor of #ifdefs so that FreeBSD can dynamically detect and adapt to the presence of a hypervisor. The goal is to have an HVM optimized GENERIC, but more is necessary before this is possible.
|
| sys/amd64/amd64/machdep.c: sys/dev/xen/xenpci/xenpcivar.h: sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/sys/kernel.h:
|   Refactor magic ioport, Hypercall table and Hypervisor shared information page setup, and move it to a dedicated HVM support module. HVM mode initialization is now triggered during the SI_SUB_HYPERVISOR phase of system startup. This currently occurs just after the kernel VM is fully set up, which is just enough infrastructure to allow the hypercall table and shared info page to be properly mapped.
|
| sys/xen/hvm.h: sys/x86/xen/hvm.c:
|   Add definitions and a method for configuring Hypervisor event delivery via a direct vector callback.
|
| sys/amd64/include/xen/xen-os.h: sys/x86/xen/hvm.c: sys/conf/files: sys/conf/files.amd64: sys/conf/files.i386:
|   Adjust kernel build to reflect the refactoring of early Xen startup code and Xen interrupt services.
|
| sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: sys/dev/xen/control/control.c: sys/dev/xen/evtchn/evtchn_dev.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/xen/xenstore/xenstore.c: sys/xen/evtchn/evtchn_dev.c: sys/dev/xen/console/console.c: sys/dev/xen/console/xencons_ring.c:
|   Adjust drivers to use new xen_intr_*() API.
|
| sys/dev/xen/blkback/blkback.c:
|   Since blkback defers all event handling to a taskqueue, convert this task queue to a "fast" taskqueue, and schedule it via an interrupt filter. This avoids an unnecessary ithread context switch.
|
| sys/xen/xenstore/xenstore.c:
|   The xenstore driver is MPSAFE. Indicate as much when registering its interrupt handler.
|
| sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbusvar.h:
|   Remove unused event channel APIs.
|
| sys/xen/evtchn.h:
|   Remove all kernel Xen interrupt service API definitions from this file. It is now only used for structure and ioctl definitions related to the event channel userland device driver. Update the definitions in this file to match those from NetBSD. Implementing this interface will be necessary for Dom0 support.
|
| sys/xen/evtchn/evtchnvar.h:
|   Add a header file for implementation internal APIs related to managing event channel event delivery. This is used to allow, for example, the event channel userland device driver to access low-level routines that typical kernel consumers of event channel services should never access.
|
| sys/xen/interface/event_channel.h: sys/xen/xen_intr.h:
|   Standardize on the evtchn_port_t type for referring to an event channel port id. In order to prevent low-level event channel APIs from leaking to kernel consumers who should not have access to this data, the type is defined twice: Once in the Xen provided event_channel.h, and again in xen/xen_intr.h. The double declaration is protected by __XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared twice within a given compilation unit.
|
| sys/xen/xen_intr.h: sys/xen/evtchn/evtchn.c: sys/x86/xen/xen_intr.c: sys/dev/xen/xenpci/evtchn.c: sys/dev/xen/xenpci/xenpcivar.h:
|   New implementation of Xen interrupt services. This is similar in many respects to the i386 PV implementation with the exception that events bound to event channel ports (i.e. not IPI, virtual IRQ, or physical IRQ) are further optimized to avoid mask/unmask operations that aren't necessary for these edge triggered events. Stubs exist for supporting physical IRQ binding, but will need additional work before this implementation can be fully shared between PV and HVM.
|
| sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: sys/i386/xen/mp_machdep.c: sys/x86/xen/hvm.c:
|   Add support for placing vcpu_info into an arbitrary memory page instead of using HYPERVISOR_shared_info->vcpu_info. This allows the creation of domains with more than 32 vcpus.
|
| sys/i386/i386/machdep.c: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/exception.s:
|   Add support for new event channel implementation.
* - Remove test_and_set_bit() macro. It is unused since r255037.jkim2013-08-291-5/+3
| - Relax atomic_read() and atomic_set() macros. Linux does not require any memory barrier. Also, these macros may even be reordered or optimized away according to the API documentation:
|   https://www.kernel.org/doc/Documentation/atomic_ops.txt
* Convert the if_lagg rwlock to an rmlock.adrian2013-08-292-33/+56
| | | | | | | | | | | | | We've been seeing lots of cache line contention (but not lock contention!) in our workloads between the various TX and RX threads going on. The write lock is only grabbed when configuration changes are made - which are infrequent. With this patch, the contention and cycles spent waiting for updates disappear. Sponsored by: Netflix, Inc.
* Fix atomic operations on context_flag without altering semantics.jkim2013-08-291-3/+3
|
* Add directories that are installed as part of bsdconfig.delphij2013-08-291-0/+74
| These are included unconditionally for now because bsdconfig is currently installed unconditionally. This fixes a 'make -j 17 installworld' failure caused by a race condition.
|
| MFC candidate.
* Add a few missing language directories for /usr.delphij2013-08-291-0/+8
|
* Update to 2013-08-29 NetBSD libexecinfo snapshotemaste2013-08-293-13/+27
| | | | | This adds my patch to use the kern.proc.pathname sysctl instead of relying on procfs(5).
* Fix some issues in change 254760 pointed out by Bruce Evans:ken2013-08-291-11/+8
| - Remove excessive parentheses
| - Use KNF continuation indentation
| - Cut down on excessive continuation lines
| - More consistent style in messages
| - Use uprintf() instead of printf()
|
| Submitted by: bde
* Work-around a timing problem with the ITE IT8513E now that the coremarcel2013-08-291-1/+13
| | | | | | | | | | | calls ns8250_bus_ipend() almost immediately after ns8250_bus_attach(). As it appears, a line break condition is being signalled for almost all received characters due to this. A delay of 150ms seems enough to allow the H/W to settle and to avoid the problem. More analysis is needed, but for now a regression has been addressed. Reported by: kevlo@ Tested by: kevlo@
* Don't return an error for socket timeouts that are too large. Justjhb2013-08-291-7/+2
| | | | | | | | cap them to INT_MAX ticks instead. PR: kern/181416 (r254699 really) Requested by: bde MFC after: 2 weeks
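Capping an oversized timeout instead of rejecting it can be sketched as below. This is only an illustration under stated assumptions: the function name `timeout_to_ticks`, its parameters, and the round-up of partial ticks are hypothetical choices, not the code that was committed.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Convert a (seconds, microseconds) timeout to scheduler ticks at hz ticks
 * per second, clamping to INT_MAX rather than failing when the value is too
 * large. Partial ticks are rounded up so a nonzero timeout never becomes 0.
 */
static int
timeout_to_ticks(int64_t sec, int64_t usec, int hz)
{
	int64_t ticks;

	assert(sec >= 0 && usec >= 0 && usec < 1000000 && hz > 0);

	/* Compare before multiplying so sec * hz cannot overflow. */
	if (sec > INT_MAX / hz)
		return (INT_MAX);

	ticks = sec * hz + (usec * hz + 999999) / 1000000;
	return (ticks > INT_MAX ? INT_MAX : (int)ticks);
}
```

With hz = 1000, a timeout of a few decades of seconds simply saturates at INT_MAX ticks, which is effectively "wait as long as the counter allows" rather than an error returned to the caller.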
* Fix after r255014antoine2013-08-292-4/+4
|
* Significantly reduce the cost, i.e., run time, of calls to madvise(...,alc2013-08-2913-11/+419
| MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE.
|
| Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2).
|
| Since our malloc(3) makes heavy use of madvise(2), this change can have a measurable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.
|
| Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise().
|
| Discussed with: jeff, jhb, kib
| Tested by: pho (i386), marcel (ia64)
| Sponsored by: EMC / Isilon Storage Division
* Migrate iwn(4) to use the new ieee80211_tx_complete() API.adrian2013-08-291-35/+23
| | | | | | Tested: * Intel 5100, STA mode
* Remove the duplicate LLC_MISS event and put it in the right order.adrian2013-08-291-3/+2
|
* Prevent the full restart cycle every time arge_start() is called. Onlyloos2013-08-291-1/+2
| (re)start the interface when it is down. This change fixes a race with BOOTP where the response packet is lost because the interface is being reset by a netmask change right after sending the packet.
|
| PR: 178318
| Approved by: adrian (mentor)
* Remove GNU_PATCH leftover.andreast2013-08-291-5/+0
|
* Merge r254736 from user/np/cxl_tuning.np2013-08-291-12/+14
| | | | | | Don't leak tags when M_NOFREE | M_PKTHDR mbufs are freed. Reviewed by: andre
* Merge r254386 from user/np/cxl_tuning. Add an INET|INET6 check missingnp2013-08-293-0/+17
| | | | | | | in said revision. r254386: Flush inactive LRO entries periodically.
* Drop build option switch for the older GNU patch.pfg2013-08-294-19/+0
| As promised, drop the option to make the older GNU patch the default. GNU patch is still being built but something drastic may happen to it before Release.
* Correct atomic operations in i915.jkim2013-08-283-11/+11
|
* Fix a compiler warning and add a couple of VM map types.jkim2013-08-281-3/+21
|
* Whitespace nit.np2013-08-281-1/+1
|
* Merge r254336 from user/np/cxl_tuning.np2013-08-282-1/+26
| Add a last-modified timestamp to each LRO entry and provide an interface to flush all inactive entries. Drivers decide when to flush and what the inactivity threshold should be.
|
| Network drivers that process an rx queue to completion can enter a livelock type situation when the rate at which packets are received reaches equilibrium with the rate at which the rx thread is processing them. When this happens the final LRO flush (normally when the rx routine is done) does not occur. Pure ACKs and segments with total payload < 64K can get stuck in an LRO entry. Symptoms are that TCP tx-mostly connections' performance falls off a cliff during heavy, unrelated rx on the interface.
|
| Flushing only inactive LRO entries works better than any of these alternatives that I tried:
| - don't LRO pure ACKs
| - flush _all_ LRO entries periodically (every 'x' microseconds or every 'y' descriptors)
| - stop rx processing in the driver periodically and schedule remaining work for later.
|
| Reviewed by: andre
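The idea of stamping each LRO entry with a last-modified time and flushing only those that have been idle past a driver-chosen threshold can be sketched like this. It is a simplified model, not the kernel's LRO interface: `lro_slot`, `flush_inactive`, and the microsecond time base are hypothetical, and "flushing" here just clears the slot instead of handing a merged segment to the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified LRO entry for illustration only. */
struct lro_slot {
	int	 in_use;		/* nonzero while segments are queued */
	uint64_t last_modified_us;	/* time of the most recent append */
	unsigned pending_bytes;		/* payload accumulated so far */
};

/*
 * Flush every in-use slot that has not been touched for at least
 * threshold_us. Active slots are left alone so they can keep merging.
 * Returns the number of slots flushed.
 */
static int
flush_inactive(struct lro_slot *tbl, int n, uint64_t now_us,
    uint64_t threshold_us)
{
	int flushed = 0;

	assert(tbl != NULL && n >= 0);
	for (int i = 0; i < n; i++) {
		if (!tbl[i].in_use)
			continue;
		/* Guard the subtraction against timestamps in the future. */
		if (now_us >= tbl[i].last_modified_us &&
		    now_us - tbl[i].last_modified_us >= threshold_us) {
			tbl[i].in_use = 0;	/* stand-in for delivery */
			tbl[i].pending_bytes = 0;
			flushed++;
		}
	}
	return (flushed);
}
```

A driver following the commit's description would call something like this periodically from its rx loop, choosing both the call frequency and the threshold itself.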
* Fix a compiler warning. With this fix, a negative time can be converted tojkim2013-08-281-1/+1
| | | | a struct timeval and back to the original nanoseconds correctly.
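The round-trip property mentioned here (a negative nanosecond count survives conversion to a seconds/microseconds pair and back) can be illustrated with the sketch below. It is not the committed fix, whose details are not shown in this log: `tv64`, `ns_to_tv`, and `tv_to_ns` are hypothetical names, and the normalization keeps the microsecond field in [0, 1000000) as struct timeval conventionally does. The round trip is exact only for values that are whole microseconds.

```c
#include <assert.h>
#include <stdint.h>

#define	NSEC_PER_SEC	1000000000LL
#define	NSEC_PER_USEC	1000LL

/* Hypothetical stand-in for struct timeval with fixed-width fields. */
struct tv64 {
	int64_t sec;
	int64_t usec;	/* always normalized to 0 .. 999999 */
};

static struct tv64
ns_to_tv(int64_t ns)
{
	struct tv64 tv;
	int64_t rem;

	tv.sec = ns / NSEC_PER_SEC;	/* C division truncates toward zero */
	rem = ns % NSEC_PER_SEC;	/* remainder takes the sign of ns */
	if (rem < 0) {
		/* Borrow one second so the fractional part is non-negative. */
		tv.sec--;
		rem += NSEC_PER_SEC;
	}
	tv.usec = rem / NSEC_PER_USEC;
	return (tv);
}

static int64_t
tv_to_ns(struct tv64 tv)
{
	assert(tv.usec >= 0 && tv.usec < 1000000);
	return (tv.sec * NSEC_PER_SEC + tv.usec * NSEC_PER_USEC);
}
```

For example, -1.5 seconds is represented as sec = -2, usec = 500000, and converting that pair back yields -1500000000 ns again.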
* Support storing 7 additional file flags in tmpfs:ken2013-08-281-3/+4
| | | | | | | | | | | UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY, and UF_HIDDEN. Sort the file flags tmpfs supports alphabetically. tmpfs now supports the same flags as UFS, with the exception of SF_SNAPSHOT. Reported by: bdrewery, antoine Sponsored by: Spectra Logic
* libutil: Use O_CLOEXEC for internal file descriptors from open().jilles2013-08-285-9/+12
|
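Opening a library-internal descriptor with O_CLOEXEC, so that it is not inherited across exec, looks roughly like the following sketch. The helper names `open_private_ro` and `has_cloexec` are hypothetical and are not libutil functions; only open(2), fcntl(2), and the O_CLOEXEC and FD_CLOEXEC flags are standard POSIX interfaces.

```c
#define _POSIX_C_SOURCE 200809L	/* expose O_CLOEXEC under strict modes */

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Open a file read-only for internal use. Setting O_CLOEXEC at open time
 * avoids the window between open() and a later fcntl(F_SETFD) in which
 * another thread could fork and exec with the descriptor still inheritable.
 */
static int
open_private_ro(const char *path)
{
	assert(path != NULL);
	return (open(path, O_RDONLY | O_CLOEXEC));
}

/* Returns 1 if the close-on-exec flag is set on fd, 0 otherwise. */
static int
has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	return (flags != -1 && (flags & FD_CLOEXEC) != 0);
}
```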
* Change t4_list_lock and t4_uld_list_lock from mutexes to sx'es.np2013-08-283-35/+34
| | | | | | | | - tom_uninit had to be reworked not to hold the adapter lock (a mutex) around t4_deactivate_uld, which acquires the uld_list_lock. - the ifc_match for the interface cloner that creates the tracer ifnet had to be reworked as the kernel calls ifc_match with the global if_cloners_mtx held.
* Add hooks in base cxgbe(4) for the iWARP upper-layer driver. Update anp2013-08-286-8/+30
| | | | couple of assertions in the TOE driver as well.
* Reduce diff against stable/9 slightly.jkim2013-08-281-3/+2
|
* ql_minidump() should be performed only by port 0.davidcs2013-08-281-2/+2
| | | | Submitted by: David C Somayajulu
* Xref capsicum(4) and procdesc(4) from pdfork(2).rwatson2013-08-281-4/+18
| | | | | Suggested by: sbruno MFC after: 3 days
* Add a simple procdesc(4) man page describing "options PROCDESC" and therwatson2013-08-283-5/+103
| | | | | | | | high-level facility, supplementing pdfork(2) and friends. Update capsicum.4 to xref. Suggested by: sbruno MFC after: 3 days
* Do not save/restore video memory if we are not using linear frame buffer.jkim2013-08-281-22/+15
| Note that this partially reverts r233896.
* Make sure to free stale buffer before allocating new one for safety.jkim2013-08-281-2/+6
|
* Avoid unnecessary signedness conversion.jkim2013-08-281-3/+3
|
* In looking at block layouts as part of fixing filesystem blockmckusick2013-08-281-2/+2
| allocations under low free-space conditions (-r254995), determined that the old block-preference search order used before -r249782 worked a bit better. This change reverts to that block-preference search order.
|
| MFC after: 2 weeks