path: root/sys/dev/e1000/if_em.c
Columns: commit message, author, age, files, lines
* Renato Botelho, 2016-02-28, 1 file, -9/+161
    Merge remote-tracking branch 'origin/stable/10' into devel
|\
| * marius, 2016-02-26, 1 file, -1/+3
      MFC: r295906

      Fix and clean up usage of DMA and TSO segments:
      - At Intel it is believed that most of their products support "only"
        40 DMA segments, so lower {EM,IGB}_MAX_SCATTER accordingly.
        Actually, 40 is more than plenty to handle full size TSO packets,
        so it doesn't make sense to further distinguish between MAC
        variants that really can do 64 DMA segments. Moreover, capping at
        40 DMA segments limits the stack usage of {em,igb}_xmit() that,
        given the rare use of more than these, previously was hardly
        justifiable, while still being sufficient to avoid the problems
        seen with em(4) and EM_MAX_SCATTER set to 32.
      - In igb(4), pass the actually supported TSO parameters up the stack.
        Previously, the defaults set in if_attach_internal() were applied,
        i.e. a maximum of 35 TSO segments, which made supporting more than
        these in the driver pointless. However, this might explain why no
        problems were seen with IGB_MAX_SCATTER at 64.
      - In em(4), take the 5 m_pullup(9) invocations performed by em_xmit()
        in the TSO case into account when reporting TSO parameters upwards.
        In the worst case, each of these calls will add another mbuf and,
        thus, the requirement for an additional DMA segment. So for best
        performance, it doesn't make sense to advertise a maximum of TSO
        segments that typically will require defragmentation in em_xmit().
        Again, this leaves enough room to handle full size TSO packets.
      - Drop TSO macros from if_lem.h given that the corresponding MACs
        don't support TSO in the first place.

      Reviewed by: erj, sbruno, jeffrey.e.pieper_intel.com
      Approved by: re (gjb)
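The em(4) arithmetic in the entry above can be sketched in a few lines of C. This is an illustration only: `EM_TSO_PULLUPS`, `em_tso_max_segs()` and `segs_needed()` are hypothetical names, and the 2048-byte buffer size used in the note below is an assumption made for the example, not something the commit states.

```c
#include <stddef.h>

#define EM_MAX_SCATTER  40  /* DMA segments per packet, per the commit */
#define EM_TSO_PULLUPS   5  /* hypothetical name: m_pullup(9) calls in em_xmit() */

/*
 * TSO segment count to advertise up the stack: keep one spare DMA
 * segment for each pullup that might add an mbuf.
 */
int
em_tso_max_segs(void)
{
	return (EM_MAX_SCATTER - EM_TSO_PULLUPS);
}

/* Number of buffers of buf_size bytes needed to hold len bytes. */
size_t
segs_needed(size_t len, size_t buf_size)
{
	return ((len + buf_size - 1) / buf_size);
}
```

Under these assumptions, a 65535-byte payload carried in 2048-byte buffers needs 32 segments, which stays below the advertised limit of 35.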
| * erj, 2016-02-25, 1 file, -8/+158
      MFC r295323:

      Update em(4) to 7.6.1; update igb(4) to 2.5.3.

      Major changes:
      - Add i219/i219(2) hardware support. (Found on Skylake generation
        and newer chipsets.)
      - Further to the last Skylake support diff, this one also includes
        support for the Lewisburg chipset (i219(3)).
      - Add a workaround to an igb hardware errata. All 1G server products
        need to have IPv6 extension header parsing turned off. This should
        be listed in the specification updates for current 1G server
        products, e.g. for i350 it's errata #37 in this document:
        http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/ethernet-controller-i350-spec-update.pdf
      - Avoton (i354) PHY errata workaround added

      And a bunch of minor fixes, as well as #defines for things that the
      current em(4)/igb(4) drivers don't implement.

      MFC r287465:

      igb(4): Update and fix HW errata
      - HW errata workaround for IPv6 offload w/ extension headers
      - Edited start of if_igb.c (Device IDs / #includes) to match
        ixgbe/ixl

      Approved by: re (gjb)
      Sponsored by: Intel Corporation
* | Renato Botelho, 2016-02-05, 1 file, -2/+9
      Merge remote-tracking branch 'origin/stable/10' into devel
|\ \ | |/
| * marius, 2016-02-04, 1 file, -2/+9
      MFC: r295133

      As it turns out, one of the more or less recent changes to em(4)
      causes watchdog timeouts when using TSO4 at link speeds below
      Gigabit, at least with 82573E. So disable the assist automatically
      when at lower speeds.

      Submitted by: jfv
      Approved by: re (kib), erj
      Obtained from: D3162
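A minimal sketch of the idea in the entry above, assuming the driver re-evaluates its offload flags whenever the link comes up. The flag values and the function name are hypothetical and only stand in for the driver's real hardware-assist bits.

```c
#include <stdint.h>

#define ASSIST_CSUM  0x1u  /* hypothetical flag: checksum offload */
#define ASSIST_TSO4  0x2u  /* hypothetical flag: TSO for IPv4 */

/*
 * Return the offload flags to use at the given link speed.
 * TSO4 is kept only at 1000 Mb/s or faster; other assists are untouched.
 */
uint32_t
assist_for_link(uint32_t wanted, unsigned speed_mbps)
{
	if (speed_mbps < 1000)
		return (wanted & ~ASSIST_TSO4);
	return (wanted);
}
```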
* | Renato Botelho, 2016-02-01, 1 file, -210/+265
      Merge remote-tracking branch 'origin/stable/10' into devel
|\ \ | |/
| * marius, 2016-01-27, 1 file, -210/+266
      Sync the e1000 drivers with what's in head as of r294327, modulo
      parts that don't apply to stable/10 (driver API, if_inc_counter(),
      RSS changes etc.) and modulo r287465 (which reportedly breaks
      igb(4)), i.e. assorted fixes and improvements only:

      o MFC r267385 (partial):
        - Don't compare bus_dma map pointers for static DMA allocations
          against NULL to determine if bus_dmamap_unload() or
          bus_dmamem_free() should be called. Instead, check the
          associated bus and virtual addresses.
        - Don't clear static DMA maps to NULL.
      o MFC r284933:
        Delete the reference to VLAN handling being disabled by default.
        This is no longer the case. [1]
      o MFC r285639:
        Add an adapter CORE lock in the DDB hook em_dump_queue to avoid
        WITNESS panic in em_init_locked() while debugging.
      o MFC r285879:
        - Remove unused txd_saved.
        - Initialize txd_upper, txd_lower and txd_used at declaration.
      o MFC r286162:
        Free mbufs when busdma loading fails.
      o MFC r286829:
        Add capability to disable CRC stripping as it breaks IPMI/BMC
        capabilities on certain adapters. [2]
      o MFC r286831: [3]
        - Increase EM_MAX_SCATTER to 64 such that the size of
          em_xmit()::segs[EM_MAX_SCATTER] doesn't get overrun by things
          like NFS that can and do shove more than 32 segs when being
          used with em(4) and TSO4.
        - Update tso handling code in em_xmit() with update from jhb@
        - Set if_hw_tsomax, if_hw_tsomaxsegcount and if_hw_tsomaxsegsize
          to appropriate values.
        - Define a TSO workaround "magic" number of 4 that is used to
          avoid an alignment issue in hardware.
        - Change a couple of integer values that were used as booleans
          to actual bool types.
        - Ensure that em_enable_intr() enables the appropriate mask of
          interrupts and not just a hardcoded define of values.
      o MFC r286832:
        e1000/if_lem.c bump to 1.1.0
      o MFC r286833:
        Bump all copyright dates to 2015.
      o MFC r287112:
        Style/whitespace cleanup in shared/common code.
      o MFC r293331:
        - Switch em(4) to the extended RX descriptor format.
        - Split rxbuffer and txbuffer apart to support the new RX
          descriptor format structures. Move rxbuffer manipulation to
          em_setup_rxdesc() to unify the new behavior changes.
        - Add a RSSKEYLEN macro for help in generating the RSSKEY data
          structures in the card.
        - Change em_receive_checksum() to process the new rxdescriptor
          format status bit.
      o MFC r293332:
        Disable the reuse of checksum offload context descriptors in the
        case of multiple queues in em(4). Document errata in the code.
      o MFC r293854:
        Given that em(4), lem(4) and igb(4) hardware doesn't require the
        alignment guarantees provided by m_defrag(9), use m_collapse(9)
        instead for performance reasons.
        While at it, sanitize the statistics softc members, i.e. retire
        unused ones and add SYSCTL nodes missing for actually used ones.

      PR: 118693 [1], 161277 [2], 195078 [3], 199174 [3], 200221 [3]
* | Renato Botelho, 2015-08-17, 1 file, -27/+27
      Importing pfSense patch iface_iftx_altq_hybrid.diff
|/
* sbruno, 2015-06-17, 1 file, -189/+441
    MFC r284179, r283959

    Implement multiqueue (max 2 tx/rx queues) for the 82574L chipset.
    Change default tuning parameters to handle this new configuration if
    EM_MULTIQUEUE is set in the kernel configuration. Off by default.

    See r283959 changelog for the scope of these changes.

    Relnotes: Yes
    Sponsored by: Limelight Networks
* sbruno, 2015-06-16, 1 file, -43/+34
    MFC r283923

    Simplify hang detection by stealing the techniques used in ixl(4) and
    applying them to em(4). Rely on iterations through the local timer
    and the tx queue state to determine if an actual hang has occurred.

    Any time a descriptor is used (packet sent), the tx queue is flagged
    as busy. Then when txeof runs, it clears the flag when all is clean,
    resets it to 1 if ANY are cleaned, or increments it if nothing is
    cleaned. The local timer simply checks whether busy ever reaches MAX
    (10, which is compile time configurable) and then sets it to HUNG. At
    that point there is one more timer cycle in which to have any cleans;
    if there are none, a watchdog reset will occur.
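The busy-counter scheme in the entry above can be sketched as a small state machine. This is a simplified model, not the driver code: all names and the `TXQ_HUNG` sentinel value are illustrative, and the choice to leave a non-idle counter alone on send is an assumption.

```c
#include <stdbool.h>

#define TXQ_IDLE      0
#define TXQ_BUSY      1
#define TXQ_MAXTRIES  10          /* "MAX" in the commit message */
#define TXQ_HUNG      0x7fffffff  /* illustrative sentinel value */

struct txq {
	int busy;
};

/* A descriptor was used: flag an idle queue as busy. */
void
txq_on_send(struct txq *q)
{
	if (q->busy == TXQ_IDLE)
		q->busy = TXQ_BUSY;
}

/* Completion processing: cleaned descriptors vs. those still outstanding. */
void
txq_on_txeof(struct txq *q, int cleaned, int outstanding)
{
	if (outstanding == 0)
		q->busy = TXQ_IDLE;      /* everything is clean */
	else if (cleaned > 0)
		q->busy = TXQ_BUSY;      /* progress: reset to 1 */
	else if (q->busy != TXQ_HUNG)
		q->busy++;               /* no progress this pass */
}

/* Local timer tick: returns true when a watchdog reset is due. */
bool
txq_on_timer(struct txq *q)
{
	if (q->busy == TXQ_HUNG)
		return (true);           /* the grace cycle has expired */
	if (q->busy >= TXQ_MAXTRIES)
		q->busy = TXQ_HUNG;      /* one more cycle to recover */
	return (false);
}
```

In this model a queue that makes no progress for nine consecutive completion passes is marked HUNG on the next timer tick and reset on the tick after that, unless a completion pass cleans something in between.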
* sbruno, 2015-05-25, 1 file, -0/+3
    MFC r283290

    Bump rx_overruns when indicated by the ICR mask.

    PR: 199716
    Sponsored by: Limelight Networks
* hselasky, 2014-10-27, 1 file, -1/+1
    MFC r263710, r273377, r273378, r273423 and r273455:
    - De-vnet hash sizes and hash masks.
    - Fix multiple issues related to arguments passed to SYSCTL macros.

    Sponsored by: Mellanox Technologies
* luigi, 2014-08-20, 1 file, -4/+4
    MFC 270063: update of netmap code
    (vtnet and cxgbe not merged yet because we need some other mfc first)
* rmacklem, 2014-07-29, 1 file, -1/+1
    MFC: r268726

    Move the "retry:" label so that the calls to m_pullup() are not done
    after the call to m_defrag(). This fixes a problem where m_pullup()
    would prepend an mbuf to the list created by m_defrag() making the
    chain greater than 32 again.
* jfv, 2014-07-28, 1 file, -2/+9
    MFC of R267935: Sync the E1000 shared code to Intel internal, and
    more importantly add new I218 adapter support to em.
* luigi, 2014-02-18, 1 file, -4/+5
    MFH: sync the netmap code with the one in HEAD
    (enhanced VALE switch, netmap pipes, emulated netmap mode).
    See details in the log for svn 261909.
* kib, 2013-12-17, 1 file, -2/+4
    MFC r257541:
    Fix several issues with the busdma(9) KPI use in the e1000 drivers.
* scottl, 2013-08-12, 1 file, -9/+1
    Update PCI drivers to no longer look at the MEMIO-enabled bit in the
    PCI command register. The lazy BAR allocation code in FreeBSD
    sometimes disables this bit when it detects a range conflict, and
    will re-enable it on demand when a driver allocates the BAR. Thus,
    the bit is no longer a reliable indication of capability, and should
    not be checked. This results in the elimination of a lot of code from
    drivers, and also gives the opportunity to simplify a lot of drivers
    to use a helper API to set the busmaster enable bit.

    This change fixes some recent reports of disk controllers and their
    associated drives/enclosures disappearing during boot.

    Submitted by: jhb
    Reviewed by: jfv, marius, achadd, achim
    MFC after: 1 day
* jfv, 2013-08-12, 1 file, -7/+14
    Improve the MSIX setup code in the drivers, thanks to Marius for
    the changes. Make sure that pci_alloc_msix() does give us the
    vectors we need and fall back to MSI when it doesn't, also release
    any that were allocated when insufficient.

    MFC after: 3 days
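The fallback order described in the entry above can be sketched against a pretend bus. This is a self-contained model under stated assumptions: `fake_bus` and the `fake_*` functions are hypothetical stand-ins written for this example, loosely following the count-in/count-out style of pci_alloc_msix(9); they are not the kernel API.

```c
enum intr_mode { INTR_LEGACY, INTR_MSI, INTR_MSIX };

/* Hypothetical bus model: how many vectors it can grant, and how many are held. */
struct fake_bus {
	int msix_free;
	int msi_free;
	int held;
};

/* Stand-in allocator: grants up to *count vectors and writes back the grant. */
int
fake_alloc_msix(struct fake_bus *b, int *count)
{
	if (b->msix_free == 0)
		return (-1);
	if (*count > b->msix_free)
		*count = b->msix_free;
	b->held = *count;
	return (0);
}

int
fake_alloc_msi(struct fake_bus *b, int *count)
{
	if (b->msi_free == 0)
		return (-1);
	if (*count > b->msi_free)
		*count = b->msi_free;
	b->held = *count;
	return (0);
}

void
fake_release(struct fake_bus *b)
{
	b->held = 0;
}

/* Try MSI-X with the full vector count, then a single MSI, then legacy. */
enum intr_mode
setup_interrupts(struct fake_bus *b, int wanted)
{
	int n = wanted;

	if (fake_alloc_msix(b, &n) == 0) {
		if (n == wanted)
			return (INTR_MSIX);
		fake_release(b);   /* got fewer than needed: give them back */
	}
	n = 1;
	if (fake_alloc_msi(b, &n) == 0 && n == 1)
		return (INTR_MSI);
	fake_release(b);
	return (INTR_LEGACY);
}
```

The point of the sketch is the release step: a partial MSI-X grant is returned before MSI is attempted, so no vectors stay allocated in a mode the driver will not use.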
* jfv, 2013-08-06, 1 file, -10/+9
    Make the various driver MSIX setup routines fall back to MSI more
    gracefully. This change was suggested by Marius Strobl, thank you.

    PR: kern/181016
    MFC after: ASAP
* jfv, 2013-07-12, 1 file, -2/+3
    Change the E1000 driver option header handling to match the
    ixgbe driver. As it was, when building them as a module INET
    and INET6 are not defined. In these drivers it does not cause
    a panic, however it does result in different behavior in the
    ioctl routine when you are using a module vs static, and I
    think the behavior should be the same.

    MFC after: 3 days
* luigi, 2013-05-09, 1 file, -2/+10
    if_lem.c: make sure that lem_rxeof() can drain the entire rx queue
    irrespective of the setting of lem_rx_process_limit, while giving
    a chance to the taskqueue scheduler to act after each chunk.
    This makes lem_rxeof similar to the one in if_em.c and if_igb.c.

    if_lem.c and if_em.c: add a sysctl to manually configure the 'itr'
    moderation register.

    Approved by: Jack Vogel
* luigi, 2013-05-09, 1 file, -10/+5
    simplify the code to initialize the RDT while in netmap mode.
|
* luigi, 2013-04-30, 1 file, -21/+4
    use netmap_rx_irq() / netmap_tx_irq() to handle interrupts in
    netmap mode, removing the logic from individual drivers.

    (note: if_lem.c not updated yet due to some other pending
    modifications)
* jfv, 2013-04-15, 1 file, -25/+18
    Corrections to the RX checksum code, make sure it's disabled as
    well as enabled when necessary. And simplify the checksum routine
    itself, adding UDP bit to the test. Thanks to Kevin Lo for pointing
    out the problems and code suggestions.
* jfv, 2013-04-03, 1 file, -4/+29
    Correct the multicast handling in the E1000 drivers as was
    done in ixgbe, thanks to Mike Karels for this fix. When exiting
    promiscuous mode the MPE bit was being unconditionally cleared;
    this should not be done if we are in MAX multicast groups.
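A minimal sketch of the corrected behavior from the entry above, assuming the receive control register is handled as a plain bitmask. The bit values, the function name, and the way the multicast limit is passed in are illustrative, not taken from the driver.

```c
#include <stdint.h>

#define RCTL_UPE  0x08u  /* illustrative bit: unicast promiscuous enable */
#define RCTL_MPE  0x10u  /* illustrative bit: multicast promiscuous enable */

/*
 * Leave promiscuous mode. UPE is always cleared; MPE is cleared only
 * while the multicast list still fits the hardware filter, because at
 * the limit MPE is what keeps multicast reception working.
 */
uint32_t
rctl_leave_promisc(uint32_t rctl, int mcast_count, int mcast_max)
{
	rctl &= ~RCTL_UPE;
	if (mcast_count < mcast_max)
		rctl &= ~RCTL_MPE;
	return (rctl);
}
```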
* jfv, 2013-02-21, 1 file, -18/+26
    Refresh on the shared code for the E1000 drivers.
    - bear with me, there are lots of white space changes, I would not
      do them, but I am a mere consumer of this stuff and if these
      drivers are to stay in shape they need to be taken.

    em driver changes: support for the new i217/i218 interfaces

    igb driver changes:
    - TX mq start has a quick turnaround to the stack
    - Link/media handling improvement
    - When link status changes happen the current flow control state
      will now be displayed.
    - A few white space/style changes.

    lem driver changes:
    - the shared code uncovered a bogus write to the RLPML register
      (which does not exist in this hardware) in the vlan code, this
      is removed.
* rrs, 2013-02-07, 1 file, -12/+13
    This fixes an out-of-order problem with several of the newer
    drivers. The basic problem was that the driver was pulling the mbuf
    off the drbr ring and then, when sending with xmit(), encountering a
    full transmit ring. Thus the lower layer xmit() function would return
    an error, and the drivers would then append the data back on to the
    ring. For TCP this is a horrible scenario sure to bring on a
    fast-retransmit.

    The fix is to use drbr_peek() to pull the data pointer but not
    remove it from the ring. If it fails then we either call the new
    drbr_putback or drbr_advance method. Advance moves it forward (we do
    this sometimes when the xmit() function frees the mbuf). When we
    succeed we always call advance. The putback will always copy the mbuf
    back to the top of the ring. Note that the putback *cannot* be used
    with a drbr_dequeue(), only with drbr_peek(). Most of the time, in
    putback, we would not need to copy it back since most likely the mbuf
    is still the same, but sometimes xmit() functions will change the
    mbuf via a pullup or other call. So the optimal case for the single
    consumer is to always copy it back. If we ever do a multiple_consumer
    (for lagg?) we will need a test and atomic in the put back, possibly
    a separate putback_mc() in the ring buf.

    Reviewed by: jhb@freebsd.org, jlv@freebsd.org
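The peek / advance / putback pattern in the entry above can be illustrated with a toy single-consumer ring. Everything here is a hypothetical stand-in written for this example (`toy_ring`, `ring_*`, `toy_xmit`, `tx_budget`); it models the ordering argument only and is not the drbr(9) implementation.

```c
#include <stddef.h>

#define RING_SIZE 8u

struct toy_ring {
	void *slot[RING_SIZE];
	unsigned head;   /* consumer index */
	unsigned tail;   /* producer index */
};

int
ring_enqueue(struct toy_ring *r, void *m)
{
	if (r->tail - r->head == RING_SIZE)
		return (-1);                    /* ring is full */
	r->slot[r->tail % RING_SIZE] = m;
	r->tail++;
	return (0);
}

/* Look at the head entry without removing it. */
void *
ring_peek(struct toy_ring *r)
{
	if (r->head == r->tail)
		return (NULL);
	return (r->slot[r->head % RING_SIZE]);
}

/* Consume the head entry. */
void
ring_advance(struct toy_ring *r)
{
	r->head++;
}

/* Store a (possibly replaced) pointer back into the head slot. */
void
ring_putback(struct toy_ring *r, void *m)
{
	r->slot[r->head % RING_SIZE] = m;
}

/* Pretend hardware: each send uses one unit of budget. */
int tx_budget;

int
toy_xmit(void **mp)
{
	(void)mp;
	if (tx_budget == 0)
		return (-1);                    /* transmit ring is full */
	tx_budget--;
	return (0);
}

/* Drain loop: entries leave the ring only after a successful send. */
int
ring_drain(struct toy_ring *r, int (*xmit)(void **))
{
	void *m;
	int sent = 0;

	while ((m = ring_peek(r)) != NULL) {
		if (xmit(&m) != 0) {
			if (m == NULL)
				ring_advance(r);    /* xmit consumed the buffer */
			else
				ring_putback(r, m); /* keep it at the head */
			break;
		}
		ring_advance(r);
		sent++;
	}
	return (sent);
}
```

Because a failed send leaves the entry at the head instead of re-appending it at the tail, the next drain resumes with the same packet and ordering is preserved.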
* sbz, 2013-01-30, 1 file, -1/+1
    Use DEVMETHOD_END macro defined in sys/bus.h instead of {0, 0}
    sentinel on device_method_t arrays

    Reviewed by: cognet
    Approved by: cognet
* glebius, 2012-12-04, 1 file, -5/+5
    Mechanically substitute flags from historic mbuf allocator with
    malloc(9) flags in sys/dev.
* eadler, 2012-10-22, 1 file, -0/+5
    This isn't functionally identical. In some cases a hint to disable
    unit 0 would in fact disable all units.

    This reverts r241856

    Approved by: cperciva (implicit)
* eadler, 2012-10-22, 1 file, -5/+0
    Now that device disabling is generic, remove extraneous code from the
    device drivers that used to provide this feature.

    Reviewed by: des
    Approved by: cperciva
    MFC after: 1 week
* glebius, 2012-09-28, 1 file, -1/+3
    The drbr(9) API appeared to be so unclear that most drivers in
    tree used it incorrectly, which led to inaccurate, overrated
    if_obytes accounting. The drbr(9) used to update ifnet stats on
    drbr_enqueue(), which is not accurate since enqueuing doesn't
    imply successful processing by the driver. Neither does dequeuing.
    Most drivers also called drbr_stats_update() which did accounting
    again, leading to doubled if_obytes statistics. And in case of severe
    transmitting, when a packet could be enqueued and dequeued several
    times, it could have been accounted several times.

    o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
      ALTQ queueing or buf_ring(9) queueing.
      - It doesn't touch the buf_ring stats any more.
      - It doesn't touch ifnet stats anymore.
      - drbr_stats_update() no longer exists.

    o buf_ring(9) handles its stats itself:
      - It handles br_drops itself.
      - br_prod_bytes stats are dropped. Rationale: no one ever
        reads them but update of a common counter on every packet
        negatively affects performance due to excessive cache
        invalidation.
      - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
        we no longer account bytes.

    o Drivers handle their stats themselves: if_obytes, if_omcasts.

    o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer
      use drbr_stats_update(), and update ifnet stats themselves.

    o bxe(4) was the most correct driver, it didn't call
      drbr_stats_update(), thus it was the only driver accurate under
      moderate load. Now it also maintains stats itself.

    o ixgbe(4) had already taken stats from hardware, so just
      - drop software stats updating.
      - take multicast packet count from hardware as well.

    o mxge(4) just no longer needs NO_SLOW_STATS define.

    o cxgb(4), cxgbe(4) need no change, since they obtain stats
      from hardware.

    Reviewed by: jfv, gnn
* sbruno, 2012-09-23, 1 file, -1/+1
    This patch fixes a nit in the em, lem, and igb driver statistics.
    Increment adapter->dropped_pkts instead of if_ierrors because
    if_ierrors is overwritten by hw stats collection.

    Submitted by: Andrew Boyer <aboyer@averesystems.com>
    Reviewed by: Jack F Vogel <jfv@freebsd.org>
    MFC after: 2 weeks
* gavin, 2012-09-19, 1 file, -1/+1
    Switch some PCI register reads from using magic numbers to using the
    names defined in pcireg.h

    MFC after: 1 week
* gavin, 2012-09-18, 1 file, -3/+3
    Align the PCI Express #defines with the style used for the PCI-X
    #defines. This also has the advantage that it makes the names more
    compact, and also allows us to correct the non-uniform naming of the
    PCIM_LINK_* defines, making them all consistent amongst themselves.

    This is a mostly mechanical rename:
      s/PCIR_EXPRESS_/PCIER_/g
      s/PCIM_EXP_/PCIEM_/g
      s/PCIM_LINK_/PCIEM_LINK_/g

    When this is MFC'd, #defines will be added for the old names to
    assist out-of-tree drivers.

    Discussed with: jhb
    MFC after: 1 week
* jfv, 2012-08-15, 1 file, -0/+2
    Customer report of a panic on boot due to the old
    "m_getjcl:invalid cluster type" that occurred some time back with
    the igb driver. This happens often when booting over the net. I
    believe the NIC hardware is left in a warm state when handed over
    to the driver, and a stray RX interrupt happens earlier than the
    code is prepared for it to happen. This change was verified to fix
    the problem, it's kind of a bandaid... but it is similar to what
    was done in the igb code.
* jfv, 2012-07-07, 1 file, -4/+29
    Change the interface to the Energy Efficient Ethernet (EEE)
    setting in the igb and em driver. This was necessitated by a
    shared code change that I was given late in the game, a data
    type changed from bool to int, in the last update I dealt with
    it by a cast, but it was pointed out (thanks jhb) that there
    was a potential problem with this. John suggested this safer
    approach, and it is fine with me...

    MFC after: 2 days (to catch the 9.1 update)
* jfv, 2012-07-05, 1 file, -1/+1
    Sync with Intel internal source:
    shared code update and small changes in core required

    Add support for new i210/i211 devices
    Improve queue calculation based on mac type

    MFC after: 5 days
* kevlo, 2012-05-11, 1 file, -1/+1
    Initialize "error" to zero when it's declared in
    em_setup_receive_ring()
|
* jhb, 2012-03-30, 1 file, -33/+63
    Fix a few issues with transmit handling in em(4) and igb(4):
    - Do not define the foo_start() methods or set if_start in the ifnet
      if multiq transmit is enabled. Also, set if_transmit and if_qflush
      before ether_ifattach rather than after when multiq transmit is
      enabled. This helps to ensure that the drivers never try to mix
      different transmit methods.
    - Properly restart transmit during resume. igb(4) was not restarting
      it at all, and em(4) was restarting even if the link was down and
      was calling the wrong method if multiq transmit was enabled.
    - Remove all the 'more' handling for transmit completions. Transmit
      completion processing does not have a processing limit, so it
      always runs to completion and never has more work to do when it
      returns. Instead, the previous code was returning 'true' anytime
      there were packets in the queue that weren't still in the process
      of being transmitted. The effect was that the driver would
      continuously reschedule a task to process TX completions, in effect
      running at 100% CPU polling the hardware until it finished
      transmitting all of the packets in the ring. Now it will just wait
      for the next TX completion interrupt.
    - Restart packet transmission when the link becomes active.
    - Fix the MSI-X queue interrupt handlers to restart packet
      transmission if there are pending packets in the relevant software
      queue (IFQ or buf_ring) after processing TX completions. This is
      the root cause for the OACTIVE hangs as if the MSI-X queue handler
      drained all the pending packets from the TX ring, nothing would
      ever restart it. As such, remove some previously-added workarounds
      to reschedule a task to poll the TX ring anytime OACTIVE was set.

    Tested by: sbruno
    Reviewed by: jfv
    MFC after: 1 week
* luigi, 2012-02-27, 1 file, -4/+5
    A bunch of netmap fixes:

    USERSPACE:
    1. add support for devices with different number of rx and tx queues;
    2. add better support for zero-copy operation, adding an extra field
       to the netmap ring to indicate how many buffers we have already
       processed but not yet released (with help from Eddie Kohler);
    3. The two changes above unfortunately require an API change, so
       while at it add a version field and some spares to the ioctl()
       argument to help detect mismatches.
    4. update the manual page for the two changes above;
    5. update sample applications in tools/tools/netmap

    KERNEL:
    1. simplify the internal structures moving the global wait queues
       to the 'struct netmap_adapter';
    2. simplify the functions that map kring<->nic ring indexes
    3. normalize device-specific code, helps maintenance;
    4. start exploring the impact of micro-optimizations (prefetch etc.)
       in the ixgbe driver. Use 'legacy' descriptors on the tx ring and
       prefetch slots gives about 20% speedup at 900 MHz. Another 7-10%
       would come from removing the explicit calls to bus_dmamap* in the
       core (they are effectively NOPs in this case, but it takes
       expensive load of the per-buffer dma maps to figure out that they
       are all NULL).

    Rx performance not investigated.

    I am postponing the MFC so i can import a few more improvements
    before merging.
* luigi, 2012-02-15, 1 file, -7/+2
    (This commit only touches code within the DEV_NETMAP blocks)

    Introduce some functions to map NIC ring indexes into netmap ring
    indexes and vice versa. This way we can implement the bound checks
    only in one place (and hopefully in a correct way).

    On passing, make the code and comments more uniform across the
    various drivers.
* luigi, 2012-01-12, 1 file, -0/+1
    clear the pointer after freeing the mbuf. Without that, we
    risk a double free if the subsequent mbuf allocation fails.
    This bug is not netmap-related and was introduced in rev. 228387
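The double-free hazard in the entry above can be shown with ordinary heap buffers. This is a user-space analogy, assuming `malloc`/`free` in place of the kernel mbuf routines; `rx_slot` and both functions are hypothetical names.

```c
#include <stdlib.h>

struct rx_slot {
	void *buf;
};

/* Free the buffer and clear the pointer so a later release is harmless. */
void
rx_slot_release(struct rx_slot *s)
{
	free(s->buf);
	s->buf = NULL;   /* without this, a second release would double free */
}

/*
 * Replace the slot's buffer. If the new allocation fails (or failure is
 * simulated), the slot is left empty rather than pointing at freed memory.
 */
int
rx_slot_refill(struct rx_slot *s, size_t len, int simulate_failure)
{
	rx_slot_release(s);
	if (simulate_failure)
		return (-1);
	s->buf = malloc(len);
	return (s->buf != NULL ? 0 : -1);
}
```

Since `free(NULL)` is defined to do nothing, teardown code can call `rx_slot_release()` on every slot after a failed refill without tracking which ones were already emptied.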
* luigi, 2012-01-12, 1 file, -62/+25
    fix the initialization of the rings when netmap is used,
    to adapt it to the changes in 228387.
    Now the code is similar to the one used in other drivers.

    Not applicable to stable/9 and stable/8
* luigi, 2012-01-10, 1 file, -14/+18
    small code cleanup in preparation for future modifications in
    the memory allocator used by netmap. No functional change,
    two small bug fixes:
    - in if_re.c add a missing bus_dmamap_sync()
    - in netmap.c comment out a spurious free() in an error handling
      block
* kevlo, 2012-01-07, 1 file, -1/+0
    ether_ifattach() sets if_mtu to ETHERMTU, don't bother setting it
    again

    Reviewed by: yongari
* rwatson, 2012-01-05, 1 file, -2/+1
    When extracting the VLAN tag from if_em and if_lem receive descriptor
    rings, copy the whole VLAN tag, not just the VLAN ID. This fixes a
    problem in which VLAN priority information was dropped when using
    offloaded VLAN processing with these drivers.

    Discussed with: jfv, rrs
    Sponsored by: ADARA Networks, Inc.
    MFC after: 3 days
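Why copying only the VLAN ID loses priority follows from the layout of the 16-bit 802.1Q tag: the top 3 bits carry the priority, one bit is the drop-eligible indicator, and the low 12 bits are the VLAN ID. The helper names below are hypothetical and the sketch is independent of the driver's descriptor handling.

```c
#include <stdint.h>

#define VLAN_VID_MASK  0x0FFFu  /* low 12 bits: VLAN ID */

/* VLAN ID portion of a 16-bit 802.1Q tag. */
uint16_t
vlan_vid(uint16_t tci)
{
	return ((uint16_t)(tci & VLAN_VID_MASK));
}

/* Priority code point: the top 3 bits of the tag. */
uint8_t
vlan_pcp(uint16_t tci)
{
	return ((uint8_t)(tci >> 13));
}
```

For a tag of `0xA064` (priority 5, VLAN 100), keeping only `vlan_vid()` yields `0x0064`, and reading the priority back from that value gives 0. Passing the whole tag up the stack keeps both fields.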
* jfv, 2011-12-11, 1 file, -3/+3
    Last change still had an issue, one more time...
|
* jfv, 2011-12-11, 1 file, -3/+7
    Correct LINT build issues in the ioctl code.
|