path: root/sys/dev/ath/if_ath.c
Commit message [Author, Age, Files, Lines]
...
* Add a new option to limit the maximum size of aggregates.
  [adrian, 2013-02-21, 1 file, -0/+1]

  The default is to limit them to what the hardware is capable of.

  Add sysctl twiddles for both the non-RTS and RTS protected aggregate generation.

  Whilst here, add some comments about stuff that I've discovered during my exploration of the TX aggregate / delimiter setup path from the reference driver.
* Enable TX FIFO underrun interrupts. This allows the TX FIFO threshold adjustment code to now run.
  [adrian, 2013-02-20, 1 file, -0/+1]

  Tested:
  * AR5416, STA

  TODO:
  * Much more thorough testing on the other chips, AR5210 -> AR9287
* oops, tab!
  [adrian, 2013-02-20, 1 file, -1/+1]
* Post interrupts in the ath alq trace.
  [adrian, 2013-02-20, 1 file, -0/+4]
* CFG_ERR, DATA_UNDERRUN and DELIM_UNDERRUN are all flags, rather than part of ts_status.
  [adrian, 2013-02-20, 1 file, -6/+13]

  Thus:
  * make sure we decode them from ts_flags, rather than ts_status;
  * make sure we decode them regardless of whether there's an error or not.

  This correctly exposes descriptor configuration errors, TX delimiter underruns and TX data underruns.
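The decoding rule in this commit can be sketched as follows. This is a minimal illustration, not the driver's code: only the field names ts_status and ts_flags come from the commit message, while the struct layouts, the EX_-prefixed flag values and the function name are hypothetical stand-ins for the real HAL definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real HAL definitions may differ. */
#define EX_TX_DESC_CFG_ERR   0x01u
#define EX_TX_DATA_UNDERRUN  0x02u
#define EX_TX_DELIM_UNDERRUN 0x04u

/* Cut-down, hypothetical stand-in for a TX completion status. */
struct ex_tx_status {
	uint8_t ts_status;	/* error code; 0 means no error */
	uint8_t ts_flags;	/* independent flag bits */
};

/* Hypothetical per-device counters. */
struct ex_tx_stats {
	unsigned int cfg_err;
	unsigned int data_underrun;
	unsigned int delim_underrun;
};

/* Count the three conditions from ts_flags only. */
static void
ex_count_tx_flags(const struct ex_tx_status *ts, struct ex_tx_stats *st)
{
	assert(ts != NULL && st != NULL);

	/* Deliberately not gated on ts_status: flags are decoded always. */
	if (ts->ts_flags & EX_TX_DESC_CFG_ERR)
		st->cfg_err++;
	if (ts->ts_flags & EX_TX_DATA_UNDERRUN)
		st->data_underrun++;
	if (ts->ts_flags & EX_TX_DELIM_UNDERRUN)
		st->delim_underrun++;
}
```

With this shape, a frame that completed without error (ts_status of 0) but saw an underrun is still counted.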
* Pull out the if_transmit() work and revert back to ath_start().
  [adrian, 2013-02-13, 1 file, -441/+100]

  My change had some rather significant behavioural changes to throughput.

  The two issues I noticed:
  * With if_start and the ifnet mbuf queue, any temporary latency would get eaten up by some mbufs being queued. With ath_transmit() queuing things to ath_buf's, I'd only get 512 TX buffers before I couldn't queue any further frames.
  * There's also some non-zero latency involved with TX being pushed into a taskqueue via direct dispatch. Any time the scheduler didn't immediately schedule the ath TX task would cause extra latency. Various 1ge/10ge drivers implement both direct dispatch (if the TX lock can be acquired) and deferred task transmission (if the TX lock can't be acquired), with frames being pushed into a drbr queue. I'll have to do this at some point, but until I figure out how to deal with 802.11 fragments, I'll have to wait a while longer.

  So what I saw:
  * lots of extra latency, especially under load - if the taskqueue wasn't immediately scheduled, things went pear shaped;
  * any extra latency would result in TX ath_buf's taking their sweet time being replenished, so any further calls to ath_transmit() would drop mbufs.
  * .. yes, there's no explicit backpressure here - things are just dropped. Eek.

  With this, the general performance has gone up, but those subtle if_start() related race conditions are back. For some reason, this is doubly-obvious with the AR5416 NIC and I don't quite understand why yet.

  There's an unrelated issue with AR5416 performance in STA mode (it's fine in AP mode when bridging frames, weirdly..) that requires a little further investigation. Specifically - it works fine on a Lenovo T40 (single core CPU) running a March 2012 9-STABLE kernel, but a Lenovo T60 (dual core) running an early November 2012 kernel behaves very poorly. The same hardware with an AR9160 or AR9280 behaves perfectly.
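The "direct dispatch if the TX lock can be acquired, otherwise defer to a task" pattern mentioned above can be sketched in a single-threaded model. Everything here is hypothetical (the ex_ names, the boolean standing in for a trylock, the counters standing in for a frame queue); it is not how ath(4) or any particular Ethernet driver implements it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a driver TX path. */
struct ex_txq {
	bool tx_locked;		/* models "another context holds the TX lock" */
	bool task_pending;	/* models a scheduled deferred TX task */
	int  queued;		/* frames waiting in the staging queue */
	int  sent;		/* frames handed to the hardware */
};

/* Drain the staging queue; the caller must hold the TX lock. */
static void
ex_txq_drain(struct ex_txq *q)
{
	assert(q->tx_locked);
	while (q->queued > 0) {
		q->queued--;
		q->sent++;
	}
}

/*
 * Queue one frame.  Returns true if it was transmitted by direct
 * dispatch, false if transmission was deferred to the TX task.
 */
static bool
ex_transmit(struct ex_txq *q)
{
	q->queued++;			/* always enqueue first to keep ordering */
	if (q->tx_locked) {		/* models a failed trylock */
		q->task_pending = true;
		return false;
	}
	q->tx_locked = true;
	ex_txq_drain(q);
	q->tx_locked = false;
	return true;
}

/* Body of the deferred TX task: take the lock and drain. */
static void
ex_tx_task(struct ex_txq *q)
{
	q->task_pending = false;
	q->tx_locked = true;
	ex_txq_drain(q);
	q->tx_locked = false;
}
```

Enqueueing before attempting the lock means a deferred frame is never overtaken by a later direct-dispatch frame; the model ignores queue limits, so the "frames are just dropped" behaviour described above is not represented.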
* Go back to direct-dispatch of the software queue and frame TX paths when they're being called from the TX completion handler.
  [adrian, 2013-02-11, 1 file, -7/+9]

  Going (back) through the taskqueue is just adding extra locking and latency to packet operations.

  This improves performance a little bit on most NICs. It still hasn't restored the original performance of the AR5416 NIC but the AR9160, AR9280 and later NICs behave very well with this.

  Tested:
  * AR5416 STA (still tops out at ~ 70mbit TCP, rather than 150mbit TCP..)
  * AR9160 hostap (good for both TX and RX)
  * AR9280 hostap (good for both TX and RX)
* Create a new TX lock specifically for queuing frames.
  [adrian, 2013-02-07, 1 file, -14/+8]

  This now separates out the act of queuing frames from the act of running TX and TX completion.
* Methodize the process of adding the software TX queue to the taskqueue.
  [adrian, 2013-02-07, 1 file, -2/+2]

  Move it (for now) to the TX taskqueue.
* Migrate the TX sending code out from under the ath0 taskq and into the separate ath0 TX taskq.
  [adrian, 2013-01-26, 1 file, -3/+19]

  Whilst here, make sure that the TX software scheduler is also running out of the TX task, rather than the ath0 taskqueue.

  Make sure that the tx taskqueue is blocked/unblocked as necessary.

  This allows for a little more parallelism on multi-core machines, as well as (eventually) supporting a higher task priority for TX tasks, allowing said TX task to preempt an already running RX or TX completion task.

  Tested:
  * AR5416, AR9280 hostap and STA modes
* Fix hangs (exposed by spectral scan activity) in STA mode when the chip hangs.
  [adrian, 2013-01-17, 1 file, -1/+32]

  * Always do a reset in ath_bmiss_proc(), regardless of whether the hardware is "hung" or not. Specifically, for spectral scan, there's likely a whole bunch of potential hangs that we don't (yet) recognise in the HAL. So to avoid staying RX deaf until the station disassociates, just do a no-loss reset.
  * Set sc_beacons=1 in STA mode. During a reset, the beacon programming isn't done. (It's likely I need to set sc_syncbeacons during a hang reset, but I digress.) Thus after a reset, there's no beacon timer programming to send a BMISS interrupt if beacons aren't heard .. thus if the AP disappears, you won't get notified and you'll have to reset your interface.

  This hasn't yet fixed all of the hangs that I've seen when debugging spectral scan, but it's certainly reduced the hang frequency and it should improve general STA stability in very noisy environments.

  Tested:
  * AR9280, STA mode, spectral scan off/on

  PR: kern/175227
* Implement frame (data) transmission using if_transmit(), rather than if_start().
  [adrian, 2013-01-15, 1 file, -98/+435]

  This removes the overlapping data path TX from occurring, which solves quite a number of the potential TX queue races in ath(4). It doesn't fix the net80211 layer TX queue races and it doesn't fix the raw TX path yet, but it's an important step towards this.

  This hasn't dropped the TX performance in my testing; primarily because now the TX path can quickly queue frames and continue along processing.

  This involves a few rather deep changes:
  * Use the ath_buf as a queue placeholder for now, as we need to be able to support queuing a list of mbufs (ie, when transmitting fragments) and m_nextpkt can't be used here (because it's what is joining the fragments together)
  * if_transmit() now simply allocates the ath_buf and queues it to a driver TX staging queue.
  * TX is now moved into a taskqueue function.
  * The TX taskqueue function now dequeues and transmits frames.
  * Fragments are handled correctly here - as the current API passes the fragment list as one mbuf list (joined with m_nextpkt) through to the driver if_transmit().
  * For the couple of places where ath_start() may be called (mostly from net80211 when starting the VAP up again), just reimplement it using the new enqueue and taskqueue methods.

  What I don't like (about this work and the TX code in general):
  * I'm using the same lock for the staging TX queue management and the actual TX. This isn't required; I'm just being slack.
  * I haven't yet moved TX to a separate taskqueue (but the taskqueue is created); it's easy enough to do this later if necessary. I just need to make sure it's a higher priority queue, so TX has the same behaviour as it used to (where it would preempt existing RX..)
  * I need to re-review the TX path a little more and make sure that ieee80211_node_*() functions aren't called within the TX lock. When queueing, I should just push failed frames into a queue and when I'm wrapping up the TX code, unlock the TX lock and call ieee80211_node_free() on each.
  * It would be nice if I could hold the TX lock for the entire TX and TX completion, rather than this release/re-acquire behaviour. But that requires that I shuffle around the TX completion code to handle actual ath_buf free and net80211 callback/free outside of the TX lock. That's one of my next projects.
  * the ic_raw_xmit() path doesn't use this yet - so it still has sequencing problems with parallel, overlapping calls to the data path. I'll fix this later.

  Tested:
  * Hostap - AR9280, AR9220
  * STA - AR5212, AR9280, AR5416
* Add a new (skeleton) spectral mode manager module.
  [adrian, 2013-01-02, 1 file, -0/+25]
* Fix typo in comment.
  [bapt, 2012-12-28, 1 file, -1/+1]

  Submitted by: Christoph Mallon <christoph.mallon@gmx.de>
* Delete the per-TXQ locks and replace them with a single TX lock.
  [adrian, 2012-12-02, 1 file, -21/+23]

  I couldn't think of a way to maintain the hardware TXQ locks _and_ layer on top of that per-TXQ software queuing and any other kind of fine-grained locks (eg per-TID, or per-node locks.)

  So for now, to facilitate some further code refactoring and development as part of the final push to get software queue ps-poll and u-apsd handling into this driver, just do away with them entirely.

  I may eventually bring them back at some point, when it looks slightly more architecturally cleaner to do so. But as it stands at the present, it's not really buying us much:
  * in order to properly serialise things and not get bitten by scheduling and locking interactions with things higher up in the stack, we need to wrap the whole TX path in a long held lock. Otherwise we can end up being pre-empted during frame handling, resulting in some out of order frame handling between sequence number allocation and encryption handling (ie, the seqno and the CCMP IV get out of sequence);
  * .. so whilst that's the case, holding the lock for that long means that we're acquiring and releasing the TXQ lock _inside_ that context;
  * And we also acquire it per-frame during frame completion, but we currently can't hold the lock for the duration of the TX completion as we need to call net80211 layer things with the locks _unheld_ to avoid LOR.
  * .. the other places where we grab that lock are reset/flush, which don't happen often.

  My eventual aim is to change the TX path so all rejected frame transmissions and all frame completions result in any ieee80211_free_node() calls to occur outside of the TX lock; then I can cut back on the amount of locking that goes on here.

  There may be some LORs that occur when ieee80211_free_node() is called when the TX queue path fails; I'll begin to address these in follow-up commits.
* Call if_free() with the correct vnet context if and only if ifp_vnet isn't NULL.
  [adrian, 2012-11-28, 1 file, -2/+7]

  If the attach fails prematurely and there's no if_vnet context, calling CURVNET_SET(ifp->if_vnet) is going to dereference a NULL pointer.
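One possible reading of this fix, sketched in a userland model: switch to the interface's vnet context around the free only when that context exists, and free without switching otherwise. The ex_ types and globals are hypothetical stand-ins; the assignments merely model what CURVNET_SET() and CURVNET_RESTORE() do conceptually, and the actual driver change may be shaped differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's vnet and ifnet types. */
struct ex_vnet { int id; };
struct ex_ifnet { struct ex_vnet *if_vnet; };

static struct ex_vnet *ex_curvnet;	/* models the current vnet context */
static struct ex_vnet *ex_seen_vnet;	/* context observed inside ex_if_free() */
static int ex_free_calls;

/* Stand-in for if_free(): records the context it ran under. */
static void
ex_if_free(struct ex_ifnet *ifp)
{
	assert(ifp != NULL);
	ex_seen_vnet = ex_curvnet;
	ex_free_calls++;
}

/* Free an interface, switching context only if one was ever assigned. */
static void
ex_free_ifnet(struct ex_ifnet *ifp)
{
	struct ex_vnet *saved;

	if (ifp == NULL)
		return;
	if (ifp->if_vnet != NULL) {
		saved = ex_curvnet;
		ex_curvnet = ifp->if_vnet;	/* models CURVNET_SET(ifp->if_vnet) */
		ex_if_free(ifp);
		ex_curvnet = saved;		/* models CURVNET_RESTORE() */
	} else {
		ex_if_free(ifp);		/* attach failed early: no context */
	}
}
```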
* ALQ logging enhancements:
  [adrian, 2012-11-16, 1 file, -0/+14]

  * upon setup, tell the alq code what the chip information is.
  * add TX/RX path logging for legacy chips.
  * populate the tx/rx descriptor length fields with a best-estimate. It's overly big (96 bytes when AH_SUPPORT_AR5416 is enabled) but it'll do for now.

  Whilst I'm here, add CURVNET_RESTORE() here during probe/attach as a partial solution to fixing crashes during attach when the attach fails. There are other attach failures that I have to deal with; those'll come later.
* Correctly fix the 'scan during STA mode' crash.
  [adrian, 2012-11-11, 1 file, -0/+7]
* Don't compile in my (not yet committed) ath_alq code unless ATH_DEBUG_ALQ is defined.
  [adrian, 2012-11-07, 1 file, -3/+3]

  This will unbreak ATH_DEBUG builds.
* Disable my software queue TIM and PS handling for now.
  [adrian, 2012-11-07, 1 file, -0/+37]

  ps-poll is totally broken in its current form. This should unbreak things enough to let people use PS-POLL devices, but leave it in place for me to finish PS-POLL handling.
* Add a new HAL call to extract out the HAL enterprise bits from the AR9300 HAL.
  [adrian, 2012-11-03, 1 file, -0/+7]
* I give up - introduce a TX lock to serialise TX operations.
  [adrian, 2012-10-31, 1 file, -1/+11]

  I've tried serialising TX using queues and such but unfortunately due to how this interacts with the locking going on elsewhere in the networking stack, the TX task gets delayed, resulting in quite a noticeable throughput loss:
  * baseline TCP for 2x2 11n HT40 is ~ 170mbit/sec;
  * TCP for TX task in the ath taskq, with the RX also going on - 80mbit/sec;
  * TCP for TX task in a separate, second taskq - 100mbit/sec.

  So for now I'm going with the Linux wireless stack approach - lock tx early. The Linux code does it in the wireless stack, before the 802.11 state stuff happens and before it's punted to the driver. But TX locking needs to also occur at the driver layer as the TX completion code _also_ begins to drain the ifnet TX queue.

  Whilst I'm here, add some KTR traces for the TX path.

  Note:
  * This really should be done at the net80211 layer (as well, at least.) But that'll have to wait for a little more thought to happen.
* Begin fleshing out some software queue awareness for TIM handling with the power save queue.
  [adrian, 2012-10-28, 1 file, -0/+244]

  * introduce some new ATH_NODE lock protected fields, tracking the net80211 psq and TIM state;
  * when doing buffer transitions - ie, when sending and completing buffers - check the state of the SWQ and update the TIM appropriately.
  * when clearing the TIM bit, if the SWQ is not empty then delay clearing it.

  This is racy, but it's no less racy than the current net80211 power save queue management code. Specifically, with multiple TX threads, it's quite plausible that parallel state updates will race and the TIM will be left in an inconsistent state. I'll address that in a follow-up commit.
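The "delay clearing the TIM bit while the software queue is non-empty" rule can be sketched as below. This is a simplified, single-threaded model under stated assumptions: the struct, its fields and both function names are hypothetical, the net80211 power save queue is not modelled, and no locking is shown (the commit itself notes the real code is racy).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-station state. */
struct ex_tim_node {
	int  swq_depth;		/* frames still in the driver software queue */
	bool tim_set;		/* current TIM bit state for this station */
	bool clear_pending;	/* a clear was requested but had to be delayed */
};

/* Request a TIM change.  Returns true if the TIM bit actually changed. */
static bool
ex_update_tim(struct ex_tim_node *an, bool enable)
{
	bool changed;

	assert(an != NULL);
	if (!enable && an->swq_depth > 0) {
		an->clear_pending = true;	/* delay: frames are still queued */
		return false;
	}
	an->clear_pending = false;
	changed = (an->tim_set != enable);
	an->tim_set = enable;
	return changed;
}

/* One frame left the software queue; apply a delayed clear once empty. */
static void
ex_swq_frame_done(struct ex_tim_node *an)
{
	assert(an != NULL && an->swq_depth > 0);
	an->swq_depth--;
	if (an->swq_depth == 0 && an->clear_pending)
		(void)ex_update_tim(an, false);
}
```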
* Add a temporary (for values of "temporary") work around for hotplug support with ath(4) and VIMAGE.
  [adrian, 2012-10-28, 1 file, -1/+9]

  Right now the VIMAGE code doesn't supply a default vnet context during:
  * hotplug attach;
  * any device detach.

  It special cases kldload/boot time probing (by setting the context to vnet0) but that doesn't occur when probing devices during a bus rescan - eg, adding a cardbus card.

  These will eventually go away when the VIMAGE support extends to providing default contexts to hotplug attach/detach.
* Push the actual TX processing into the ath taskqueue, rather than having it run out of multiple concurrent contexts.
  [adrian, 2012-10-14, 1 file, -14/+33]

  Right now the ath(4) TX processing is a bit hairy. Specifically:
  * It was running out of ath_start(), which could occur from multiple concurrent sending processes (as if_start() can be started from multiple sending threads nowadays.. sigh)
  * during RX if fast frames are enabled (so not really at the moment, not until I fix this particular feature again..)
  * during ath_reset() - so anything which calls that
  * during ath_tx_proc*() in the ath taskqueue - ie, TX is attempted again after TX completion, as there's now hopefully some ath_bufs available.
  * Then, the ic_raw_xmit() method can queue raw frames for transmission at any time, from any net80211 TX context. Ew.

  This has caused packet ordering issues in the past - specifically, there's absolutely no guarantee that preemption won't occur _during_ ath_start() by the TX completion processing, which will call ath_start() again. It's a mess - 802.11 really, really wants things to be in sequence or things go all kinds of loopy.

  So:
  * create a new task struct for TX'ing;
  * make the if_start method simply queue the task on the ath taskqueue;
  * make ath_start() just be called by the new TX task;
  * make ath_tx_kick() just schedule the ath TX task, rather than directly calling ath_start().

  Now yes, this means that I've taken a step backwards in terms of concurrency - TX -and- RX now occur in the same single-task taskqueue. But there's nothing stopping me from separating out the TX / TX completion code into a separate taskqueue which runs in parallel with the RX path, if that ends up being appropriate for some platforms.

  This fixes the CCMP/seqno concurrency issues that creep up when you transmit large amounts of uni-directional UDP traffic (>200MBit) on a FreeBSD STA -> AP, as now there's only one TX context no matter what's going on (TX completion->retry/software queue, userland->net80211->ath_start(), TX completion -> ath_start()); but it won't fix any concurrency issues between raw transmitted frames and non-raw transmitted frames (eg EAPOL frames on TID 16 and any other TID 16 multicast traffic that gets put on the CABQ.) That is going to require a bunch more re-architecture before it's feasible to fix.

  In any case, this is a big step towards making the majority of the TX path locking irrelevant, as now almost all TX activity occurs in the taskqueue.

  Phew.
* Initialise an uninitialised variable.
  [adrian, 2012-10-05, 1 file, -1/+2]
* Pause and unpause the software queues for a given node based on the net80211 node power save state.
  [adrian, 2012-10-03, 1 file, -0/+35]

  * Add an ATH_NODE_UNLOCK_ASSERT() check
  * Add a new node field - an_is_powersave
  * Pause/unpause the queue based on the node state
  * Attempt to handle net80211 concurrency issues so the queue doesn't get paused/unpaused more than once at a time from the net80211 power save code.

  Whilst here (and breaking my usual rule), set CLRDMASK when a queue is unpaused, regardless of whether the queue has some pending traffic. This means the first frame from that TID (now or later) will have CLRDMASK set.

  Also whilst here, bump the swretrymax counters whenever the filtered frames code expires a frame. Again, breaking my rule, but this is just a statistics thing rather than a functional change.

  This doesn't fix ps-poll (but it doesn't break it too much worse than it is at the present) or correcting the TID updates. That's next on the list.

  Tested:
  * AR9220 AP (Atheros AP96 reference design)
  * Macbook Pro and LG Optimus 1 Android phone, both setting and clearing power save state (but not using PS-POLL.)
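The "don't pause or unpause more than once at a time" guard can be sketched as follows. Only the field name an_is_powersave comes from the commit message; the struct, the pause counter and the function name are hypothetical, and the per-node locking that the real driver needs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down node state. */
struct ex_ps_node {
	bool an_is_powersave;	/* field name taken from the commit message */
	int  pause_count;	/* hypothetical: >0 means the queues are paused */
};

/*
 * Handle a power save notification from the stack.  Returns true if
 * the queue state changed, false for a duplicate notification.
 */
static bool
ex_node_powersave(struct ex_ps_node *an, bool enable)
{
	assert(an != NULL);
	if (an->an_is_powersave == enable)
		return false;	/* already in that state; do not pause twice */
	an->an_is_powersave = enable;
	if (enable)
		an->pause_count++;	/* pause the software queues */
	else
		an->pause_count--;	/* unpause the software queues */
	return true;
}
```

Tracking the last seen state means repeated "entering power save" notifications leave the pause count at one, so a single wake-up is enough to resume the queues.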
* Migrate the ath(4) KTR logging to use an ATH_KTR() macro.
  [adrian, 2012-09-24, 1 file, -8/+39]

  This should eventually be unified with ATH_DEBUG() so I can get both from one macro; that may take some time.

  Add some new probes for TX and TX completion.
* Remove TDMA #define entries from if_ath.c; they now exist in if_ath_tdma.h.
  [adrian, 2012-09-09, 1 file, -16/+0]
* There's no need to allocate a DMA map just before calling bus_dmamem_alloc().
  [adrian, 2012-08-29, 1 file, -11/+0]

  In fact, bus_dmamem_alloc() happily NULLs the dmat pointer passed in, before replacing it with its own.

  This fixes a MIPS crash when kldload'ing if_ath/if_ath_pci - bus_dmamap_destroy() was passed in a NULL dmat pointer and was doing all kinds of very bad things.

  Reviewed by: scottl
* Implement a sequential descriptor ID value and stuff it in the ath_buf.
  [adrian, 2012-08-15, 1 file, -0/+8]

  This will be used by the EDMA TX code to assign descriptor IDs in order to provide some debugging.
* Break out the TX completion code into a separate function, so it can be re-used by the upcoming EDMA TX completion code.
  [adrian, 2012-08-14, 1 file, -35/+62]

  Make ath_stoptxdma() public, again so the EDMA TX code can use it.

  Don't check for the TXQ bitmap in the ISR when doing EDMA work as it doesn't apply for EDMA.
* Revert the ath_tx_draintxq() method, and instead teach it the minimum necessary to "do" EDMA.
  [adrian, 2012-08-12, 1 file, -3/+24]

  It was just using the TX completion status for logging information about the descriptor completion. Since with EDMA we don't know this without checking the TX completion FIFO, we can't provide this information. So don't.
* Break out ath_draintxq() into a method and un-methodize ath_tx_processq().
  [adrian, 2012-08-12, 1 file, -5/+6]

  Now that I understand what's going on with this, I've realised that it's going to be quite difficult to implement a processq method in the EDMA case. Because there's a separate TX status FIFO, I can't just run processq() on each EDMA TXQ to see what's finished. I have to actually run the TX status queue and handle individual TXQs.

  So:
  * unmethodize ath_tx_processq();
  * leave ath_tx_draintxq() as a method, as it only uses the completion status for debugging rather than actively completing the frames (ie, all frames here are failed);
  * Methodize ath_draintxq().

  The EDMA ath_draintxq() will have to take care of running the TX completion FIFO before (potentially) freeing frames in the queue.

  The only two places where ath_tx_draintxq() (on a single TXQ) are used:
  * ath_draintxq(); and
  * the CABQ handling in the beacon setup code - it drains the CABQ before populating the CABQ with frames for a new beacon (when doing multi-VAP operation.)

  So it's quite possible that once I methodize the CABQ and beacon handling, I can just drop ath_tx_draintxq() in its entirety.

  Finally, it's also quite possible that I can remove ath_tx_draintxq() in the future and just "teach" it to not check the status when doing EDMA.
* Extend the beacon code slightly to support AP mode beaconing for the EDMA HAL hardware.
  [adrian, 2012-08-11, 1 file, -1/+1]

  * The EDMA HAL code assumes the nexttbtt and intval values are in TU/8 units, rather than TU. For now, just "hack" around that here, at least until I code up something to translate it in the HAL.
  * Setup some different TXQ flags for EDMA hardware.
  * The EDMA HAL doesn't support setting the first rate series via ath_hal_setuptxdesc() - instead, a call to ath_hal_set11nratescenario() is always required. So for now, just do an 11n rate series setup for EDMA beacon frames.

  This allows my AR9380 to successfully transmit beacon frames.

  However, CABQ TX and all normal data frame TX and TX completion is still not functional and will require some more significant code churn to make work.
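The TU versus TU/8 unit difference amounts to a factor of eight. The helpers below are a hypothetical sketch of that arithmetic only (the ex_ names are invented, and any flag bits the driver may carry in the same word as the beacon interval are ignored); they are not the code the driver uses.

```c
#include <assert.h>
#include <stdint.h>

#define EX_TU_USEC 1024u	/* one 802.11 time unit is 1024 microseconds */

/* Convert a value in TU into units of TU/8. */
static uint32_t
ex_tu_to_tu8(uint32_t tu)
{
	assert(tu <= UINT32_MAX / 8);	/* guard against overflow */
	return tu * 8;
}

/* Convert a value in TU/8 units into microseconds (128 us per unit). */
static uint32_t
ex_tu8_to_usec(uint32_t tu8)
{
	assert(tu8 <= UINT32_MAX / (EX_TU_USEC / 8));
	return tu8 * (EX_TU_USEC / 8);
}
```

For example, a 100 TU beacon interval becomes 800 in TU/8 units, and both represent the same 102400 microseconds.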
* Allow 802.11n hardware to support multi-rate retry when RTS/CTS is enabled.
  [adrian, 2012-07-31, 1 file, -0/+9]

  The legacy (pre-802.11n) hardware doesn't support this - although the AR5212 era hardware supports MRR, it doesn't have all the bits needed to support MRR + RTS/CTS. The AR5416 and later support a packet duration and RTS/CTS flags per rate scenario, so we should support it.

  Tested:
  * AR9280, STA

  PR: kern/170302
* Migrate some more TX side setup routines to be methods.
  [adrian, 2012-07-31, 1 file, -17/+30]
* Fix breakage introduced in r238824 - correctly calculate the descriptor wrapping.
  [adrian, 2012-07-29, 1 file, -1/+7]

  The previous code was only wrapping descriptor "block" boundaries rather than individual descriptors. It sounds equivalent but it isn't.

  r238824 changed the descriptor allocation to enforce that an individual descriptor doesn't wrap a 4KiB boundary rather than the whole block of descriptors. Eg, for TX descriptors, they're allocated in blocks of 10 descriptors for each ath_buf (for scatter/gather DMA.)
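The per-descriptor check can be illustrated with a small sketch. The helper names, the EX_PAGE constant and the 96-byte descriptor size used in the usage note are illustrative assumptions, not the driver's actual macros or sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_PAGE 4096u	/* the 4 KiB boundary a descriptor must not straddle */

/* True if a descriptor of `size` bytes at byte offset `off` crosses 4 KiB. */
static bool
ex_desc_crosses_4k(uint32_t off, uint32_t size)
{
	assert(size > 0 && size <= EX_PAGE);
	/* Compare the 4 KiB block of the first and the last byte. */
	return (off / EX_PAGE) != ((off + size - 1) / EX_PAGE);
}

/* Return the offset to use: unchanged, or bumped to the next boundary. */
static uint32_t
ex_place_desc(uint32_t off, uint32_t size)
{
	if (ex_desc_crosses_4k(off, size))
		return (off / EX_PAGE + 1) * EX_PAGE;
	return off;
}
```

With a hypothetical 96-byte descriptor, offset 4000 is fine (it ends at byte 4095), while offset 4032 would end at byte 4127 and is therefore moved to 4096. Checking only the start of each 10-descriptor block would miss a descriptor in the middle of the block that straddles the boundary, which is the distinction this commit draws.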
* Add a missing call to ath_txdma_teardown().
  [adrian, 2012-07-28, 1 file, -0/+1]
* Modify ath_descdma_cleanup() to handle ath_descdma instances with no buffers.
  [adrian, 2012-07-27, 1 file, -18/+23]

  ath_descdma is now being used for things other than the classical combination of ath_buf + ath_desc allocations. In this particular case, don't try to free and blank out the ath_buf list if it's not passed in.
* Migrate the descriptor allocation function to not care about the number of buffers, only the number of descriptors.
  [adrian, 2012-07-27, 1 file, -8/+8]

  This involves:
  * Change the allocation function to not use nbuf at all;
  * When calling it, pass in "nbuf * ndesc" to correctly update how many descriptors are being allocated.

  Whilst here, fix the descriptor allocation code to correctly allocate a larger buffer size if the Merlin 4KB WAR is required. It overallocates descriptors when allocating a block that doesn't ever have a 4KB boundary being crossed, but that can be fixed at a later stage.
* Refactor out the descriptor allocation code from the buffer allocation code.
  [adrian, 2012-07-27, 1 file, -10/+51]

  The TX EDMA completion path is going to need descriptors allocated but not any buffers. This code will form the basis for that.
* Modify ath_descdma_setup() to take a descriptor size parameter.
  [adrian, 2012-07-23, 1 file, -5/+6]

  The AR9300 and later descriptors are 128 bytes, however I'd like to make sure that isn't used for earlier chips.

  * Populate the TX descriptor length field in the softc with sizeof(ath_desc)
  * Use this field when allocating the TX descriptors
  * Pre-AR93xx TX/RX descriptors will use the ath_desc size; newer ones will query the HAL for these sizes.
* Begin separating out the TX DMA setup in preparation for TX EDMA support.
  [adrian, 2012-07-23, 1 file, -3/+18]

  * Introduce TX DMA setup/teardown methods, mirroring what's done in the RX path. Although the TX DMA descriptor is setup via ath_desc_alloc() / ath_desc_free(), the TX status descriptor ring will be allocated in this path.
  * Remove some of the TX EDMA capability probing from the RX path and push it into the new TX EDMA path.
* Begin modifying the descriptor allocation functions to support a variable sized TX descriptor.
  [adrian, 2012-07-23, 1 file, -9/+10]

  This is required for the AR93xx EDMA support which requires 128 byte TX descriptors (which is significantly larger than the earlier hardware.)
* Enable the basic node-based rate control statistics via an ioctl().
  [adrian, 2012-07-20, 1 file, -0/+40]
* Ensure that error is set.
  [adrian, 2012-07-14, 1 file, -0/+1]

  Noticed by: rui
* Don't free the descriptor allocation/map if it doesn't exist.
  [adrian, 2012-07-14, 1 file, -4/+6]

  I missed this in my previous commit.
* Fix EDMA RX to actually work without panicking the machine.
  [adrian, 2012-07-14, 1 file, -0/+61]

  I was setting up the RX EDMA buffer to be 4096 bytes rather than the RX data buffer portion. The hardware was likely getting very confused and DMAing descriptor portions into places it shouldn't, leading to memory corruption and occasional panics.

  Whilst here, don't bother allocating descriptors for the RX EDMA case. We don't use those descriptors. Instead, just allocate ath_buf entries.
* Flip on EDMA RX of both HP and LP queue frames.
  [adrian, 2012-07-10, 1 file, -1/+13]

  Yes, this is in the legacy interrupt path. The NIC does support MSI but I haven't yet sat down and written that code.