op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	tcp: Fix a connect() race with timewait sockets	Eric Dumazet	2009-12-08	3	-8/+26
	When we find a timewait connection in __inet_hash_connect() and reuse it for a new connection request, we have a race window: we release the bind list lock and reacquire it in __inet_twsk_kill() to remove the timewait socket from the list. Another thread might find the timewait socket we already chose, leading to list corruption and crashes. The fix is to remove the timewait socket from the bind list before releasing the bind lock. Note: This problem happens if sysctl_tcp_tw_reuse is set. Reported-by: kapil dakhane <kdakhane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
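The ordering this fix relies on (claim the reusable timewait entry and unlink it while still holding the bind lock, so no other thread can pick it in the window before teardown) can be sketched in userspace C. This is a minimal sketch, not the kernel code: `struct tw_entry`, `struct bind_bucket` and `claim_timewait()` are hypothetical names, and a C11 `atomic_flag` spinlock stands in for the bind-list lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a timewait entry on a bind list. */
struct tw_entry {
    int port;
    struct tw_entry *next;
};

/* Hypothetical bind bucket: a list of timewait entries plus its lock. */
struct bind_bucket {
    atomic_flag lock;          /* models the bind-list lock */
    struct tw_entry *head;
};

static void bucket_lock(struct bind_bucket *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void bucket_unlock(struct bind_bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Find a reusable entry for @port and unlink it BEFORE dropping the lock.
 * Once the lock is released, no other thread can find the same entry, so
 * the caller owns it exclusively and can tear it down later. */
struct tw_entry *claim_timewait(struct bind_bucket *b, int port)
{
    struct tw_entry **pp, *tw = NULL;

    bucket_lock(b);
    for (pp = &b->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->port == port) {
            tw = *pp;
            *pp = tw->next;    /* unlink while still holding the lock */
            tw->next = NULL;
            break;
        }
    }
    bucket_unlock(b);
    assert(tw == NULL || tw->next == NULL);
    return tw;
}
```

A second caller asking for the same port gets `NULL` instead of a pointer to an entry that is about to be freed, which is the property the commit restores.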
*	tcp: Fix a connect() race with timewait sockets	Eric Dumazet	2009-12-08	8	-17/+35
	This first patch changes __inet_hash_nolisten() and __inet6_hash() to take a timewait parameter, so the timewait socket can be unhashed from ehash at the same time the new socket is inserted into the hash. This makes sure the timewait socket won't be found by a concurrent writer in __inet_check_established(). Reported-by: kapil dakhane <kdakhane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ixgbe: add support for 82599 KR device 0x1517	Don Skidmore	2009-12-08	3	-0/+4
	Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ixgbe: Fix TX stats accounting	Eric Dumazet	2009-12-08	2	-16/+5
	Here is an updated version, because ixgbe_get_ethtool_stats() needs to call dev_get_stats() or "ethtool -S" won't give correct tx_bytes/tx_packets values. Several CPUs can update netdev->stats.tx_bytes & netdev->stats.tx_packets in parallel. In this case, TX stats are underestimated and false sharing takes place. After a pktgen session sending exactly 200000000 packets, "# ifconfig fiber0 | grep TX" gives: TX packets:198501982 errors:0 dropped:0 overruns:0 carrier:0. Multi queue devices should instead use txq->tx_bytes & txq->tx_packets in their xmit() method (appropriate txq lock already held by caller, no cache line miss), or use appropriate locking. After the patch, the same pktgen session and "# ifconfig fiber0 | grep TX" give: TX packets:200000000 errors:0 dropped:0 overruns:0 carrier:0. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
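A minimal sketch of the per-queue accounting the message recommends, assuming a fixed queue count and that the caller already holds the queue's xmit lock. `struct fake_netdev`, `account_tx()` and `sum_tx()` are hypothetical names, not ixgbe or netdev APIs: each queue's counters are written by only one CPU at a time, and a `get_stats`-style reader folds them together on demand.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TXQ 4   /* hypothetical queue count */

/* Per-queue counters: only the CPU holding this queue's xmit lock writes
 * them, so there is no lost update and no shared hot cache line. */
struct txq_stats {
    unsigned long tx_bytes;
    unsigned long tx_packets;
};

/* Hypothetical device with one stats slot per transmit queue. */
struct fake_netdev {
    struct txq_stats txq[NUM_TXQ];
};

/* Called from the xmit path; the caller is assumed to hold txq[q]'s lock. */
void account_tx(struct fake_netdev *dev, unsigned int q, unsigned int len)
{
    assert(q < NUM_TXQ);
    dev->txq[q].tx_bytes += len;
    dev->txq[q].tx_packets++;
}

/* get_stats-style reader: sum the per-queue counters when asked. */
void sum_tx(const struct fake_netdev *dev, unsigned long *bytes,
            unsigned long *packets)
{
    *bytes = 0;
    *packets = 0;
    for (size_t i = 0; i < NUM_TXQ; i++) {
        *bytes += dev->txq[i].tx_bytes;
        *packets += dev->txq[i].tx_packets;
    }
}
```

The design choice is to move the cost to the rare reader (a short loop over queues) instead of the hot transmit path.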
*	e1000e: only perform ESB2 MDIC workaround on certain configurations	Bruce Allan	2009-12-08	2	-33/+69
	A workaround added for all ESB2 devices (adds a delay for all MDIC accesses which resolves an issue with the MDIC ready bit being set prematurely) is applicable only to devices in which the MAC-PHY interconnect is not operating in a certain mode with in-band MDIO. Check the control register for the operating mode and enable the workaround accordingly. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	e1000e: replace incorrect use of GG82563_REG macro	Bruce Allan	2009-12-08	2	-3/+7
	The GG82563_REG() macro should not be used to determine the offset provided to the e1000e_[read|write]_kmrn_reg() functions since the first argument to the macro is already implied and gets masked off anyway in the functions. The resultant register reads/writes with this patch are functionally the same as before. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
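Why the macro's first argument "gets masked off anyway" can be shown with a little bit arithmetic. The real GG82563_REG() and Kumeran helper definitions are not in this log, so every constant below (`GG_PAGE_SHIFT`, `GG_REG_MASK`, `KMRN_OFFSET_SHIFT`, `KMRN_OFFSET_MASK`) and both helper names are assumptions chosen to illustrate the argument: the page bits sit above a 5-bit register field, and the access helper keeps only a 5-bit offset field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: page number above bit 5, register in bits 0..4. */
#define GG_PAGE_SHIFT     5
#define GG_REG_MASK       0x1Fu

/* Hypothetical Kumeran control word: 5-bit offset field at bits 16..20. */
#define KMRN_OFFSET_SHIFT 16
#define KMRN_OFFSET_MASK  0x001F0000u

/* GG82563_REG()-style encoding (illustrative, not the driver's macro). */
#define PAGED_REG(page, reg) \
    (((uint32_t)(page) << GG_PAGE_SHIFT) | ((uint32_t)(reg) & GG_REG_MASK))

/* Build the control word the way a masking access helper would: anything
 * above the 5-bit offset, such as the page bits, is discarded. */
uint32_t kmrn_ctrl_word(uint32_t offset)
{
    return (offset << KMRN_OFFSET_SHIFT) & KMRN_OFFSET_MASK;
}
```

Under these assumptions, passing `PAGED_REG(page, reg)` or plain `reg` yields the same control word, which matches the commit's claim that the resulting register accesses are functionally unchanged.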
*	e1000e: minor correction to name of bit in CTRL_EXT register	Bruce Allan	2009-12-08	2	-2/+2
	Bit 7 in the CTRL_EXT register is actually the Software Definable Pin 3, not the Software Definable Pin 7. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: Remove runtime check that can never be true.	David S. Miller	2009-12-08	1	-5/+0
	GCC even warns about it, as reported by Andrew Morton: net/ipv4/tcp.c: In function 'do_tcp_getsockopt': net/ipv4/tcp.c:2544: warning: comparison is always false due to limited range of data type Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	David S. Miller	2009-12-08	14	-201/+228
|\
| *	ath5k: add support for Dell Vostro A860 LED	Shahar Or	2009-12-07	1	-0/+2
	Adds support for the WiFi activity LED on the Dell Vostro A860 laptop. Signed-off-by: Shahar Or <shahar@shahar-or.co.il> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	orinoco: remove spare KERN_DEBUG	David Kilroy	2009-12-07	1	-1/+1
	A KERN_DEBUG didn't get removed when transitioning from printk to pr_debug. Signed-off-by: David Kilroy <kilroyd@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	rtl8187: Fix wrong rfkill switch mask for some models	Larry Finger	2009-12-07	3	-4/+17
	There are different bits used to convey the setting of the rfkill switch to the driver. The current driver only supports one of these possibilities. These changes were derived from the latest version of the vendor driver. This patch fixes the regression noted in kernel Bugzilla #14743. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-and-tested-by: Antti Kaijanmäki <antti@kaijanmaki.net> Tested-by: Hin-Tak Leung <hintak.leung@gmail.com> Cc: Stable <stable@kernel.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	ath9k: fix tx status reporting	Felix Fietkau	2009-12-07	1	-1/+1
	This patch fixes a bug in ath9k's tx status check, which caused mac80211 to consider regularly transmitted unicast frames as un-acked. When checking the ts_status field for errors, it needs to be masked with ATH9K_TXERR_FILT, because this field also contains other fields like ATH9K_TX_ACKED. Without this patch, AP mode is pretty much unusable, as hostapd checks the ACK status for the frames that it injects. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
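The general pitfall here (a status field that mixes error bits with non-error flags must be masked before it is tested) can be sketched as follows. The log does not show the old check, so the unmasked version is an assumed illustration of a whole-field test, and the flag values are hypothetical stand-ins for the `ATH9K_*` constants, which live in the ath9k headers and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; not the real ATH9K_* definitions. */
#define TXERR_XRETRY 0x01u   /* excessive retries */
#define TXERR_FILT   0x02u   /* frame filtered by hardware */
#define TX_ACKED     0x20u   /* non-error flag sharing the same field */

/* Unmasked check (assumed old pattern): any set bit, including the ACK
 * flag, is treated as an error, so acked frames look like failures. */
bool has_error_unmasked(uint8_t ts_status)
{
    return ts_status != 0;
}

/* Masked check: only the filtered-error bit is considered. */
bool has_filt_error(uint8_t ts_status)
{
    return (ts_status & TXERR_FILT) != 0;
}
```

With an ACK flag set and no error, the unmasked test reports a failure while the masked one does not, which is the kind of misreport the commit describes.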
| *	mac80211: Fix bug in computing crc over dynamic IEs in beacon	Vasanthakumar Thiagarajan	2009-12-07	1	-1/+1
	On a 32-bit machine, the BIT() macro does not give the required bit value if the bit index is more than 31. In ieee802_11_parse_elems_crc(), BIT() is supposed to produce bit values for indices above 31 (42 (id of ERP_INFO_IE), 37 (CHANNEL_SWITCH_IE), 32 (POWER_CONSTRAINT_IE), 45 (HT_CAP_IE), 61 (HT_INFO_IE)). As we do not get the required bit value for the above IEs, the crc over these IEs is never calculated, so any dynamic change in these IEs after the association is not really handled on 32-bit platforms. This patch fixes this issue. Cc: stable@kernel.org Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
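The underlying C issue: a BIT()-style macro built on `1UL` is only 32 bits wide on a 32-bit target, and shifting by 32 or more is undefined behaviour, so element IDs such as 42 or 61 cannot be tested that way. A minimal sketch of the portable alternative, assuming a 64-bit filter mask; `BIT64()` and `ie_in_filter()` are hypothetical names, not the mac80211 code.

```c
#include <assert.h>
#include <stdint.h>

/* Always 64 bits wide, so IDs 32..63 are representable on every target,
 * unlike a macro built on 1UL, which is 32 bits on 32-bit platforms. */
#define BIT64(nr) ((uint64_t)1 << (nr))

/* Return nonzero if element ID @id is selected in the 64-bit @filter. */
int ie_in_filter(uint64_t filter, unsigned int id)
{
    assert(id < 64);   /* IDs >= 64 cannot be tracked in this mask */
    return (filter & BIT64(id)) != 0;
}
```

For example, a filter built from the IDs listed above (42, 37, 32, 45, 61) reports all of them as selected on both 32-bit and 64-bit builds.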
| *	net/rfkill/core.c: work around gcc-4.0.2 silliness	Andrew Morton	2009-12-07	1	-2/+2
	net/rfkill/core.c: In function 'rfkill_type_show': net/rfkill/core.c:610: warning: control may reach end of non-void function 'rfkill_get_type_str' being inlined A gcc bug, but simple enough to squish. Cc: John W. Linville <linville@tuxdriver.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mwl8k: don't complain about oversized beacons in FINALIZE_JOIN	Lennert Buytenhek	2009-12-07	1	-21/+8
	The FINALIZE_JOIN firmware command only looks at the first couple of fields in the beacon, and therefore it's not necessary to complain if the beacon is longer than 128 bytes. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mwl8k: don't overwrite mwl8k_vif::bssid until after disassociation	Lennert Buytenhek	2009-12-07	1	-3/+2
	When disassociating, mac80211 zeroes vif->bss_info.bssid before calling our ->bss_info_changed(), but we need the BSSID to remove the hardware station database entry for our AP, so we can't clear our local copy of the BSSID until after we've done that. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mwl8k: struct ieee80211_rx_status::qual is deprecated	Lennert Buytenhek	2009-12-07	1	-1/+0
	Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mwl8k: don't forget to call pci_disable_device()	Lennert Buytenhek	2009-12-07	1	-1/+3
	Don't forget to call pci_disable_device() if pci_request_regions() fails during probe. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mwl8k: increase firmware loading timeouts	Lennert Buytenhek	2009-12-07	1	-3/+2
	The time between loading the helper image and starting to upload the main firmware image should be at least 5 ms or so. We were doing an msleep(1) before, and 1 ms appears not to be enough in almost all cases, but building with HZ=100 has always masked this so far. Bumping the msleep argument to 5 fixes firmware loading e.g. when HZ=1000. Some firmware images need more than 200ms to initialize. Bump the ready code timeout to 500ms to accommodate this. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
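Why HZ=100 masked the short delay can be shown with tick arithmetic. This is a simplified model and an assumption, not the kernel's exact `msleep()` logic: a sleep request is rounded up to whole timer ticks, so the shortest real sleep is `ceil(ms * hz / 1000)` ticks of `1000 / hz` milliseconds each. `min_sleep_ms()` is a hypothetical helper.

```c
#include <assert.h>

/* Simplified model: round a millisecond sleep request up to whole ticks
 * at tick rate @hz and return the resulting minimum sleep in ms. */
unsigned int min_sleep_ms(unsigned int ms, unsigned int hz)
{
    assert(hz > 0 && 1000 % hz == 0);            /* keep the division exact */
    unsigned int ticks = (ms * hz + 999) / 1000; /* ceiling division */
    return ticks * (1000 / hz);
}
```

Under this model, `msleep(1)` at HZ=100 lasts at least one 10 ms tick, comfortably above the roughly 5 ms the firmware needs, while at HZ=1000 it can be as short as 1 ms, which exposes the bug the commit fixes.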
| *	mwl8k: allow more time for transmit rings to drain	Lennert Buytenhek	2009-12-07	1	-60/+67
	Before issuing any firmware commands, we wait for the transmit rings to drain, to prevent control versus data path synchronization issues. In some cases, this can end up taking longer than the current hardcoded limit of 5 seconds, for example if the transmit rings are filled with packets for a host that has dropped off the air and we end up retransmitting every pending packet at the lowest rate a couple of times. This patch changes mwl8k_tx_wait_empty() to only bail out on timeout expiry if there was no change in the number of packets pending in the transmit rings during the waiting period. If at least one transmit ring entry was reclaimed while we were waiting, we are apparently still making progress, and we'll allow waiting for another timeout period. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mwl8k: allow more time for firmware commands to complete	Lennert Buytenhek	2009-12-07	1	-2/+11
	Some firmware commands can under some circumstances take more than 2 seconds to complete. This patch bumps the timeout up to 10 seconds, and prints a message whenever a command takes more than 2 seconds. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mwl8k: properly report rate on received 40MHz packets	Lennert Buytenhek	2009-12-07	1	-2/+8
	On 8366, bit 6 in the rx descriptor rate field indicates whether the packet was received on a 20MHz or 40MHz channel, and is not part of the MCS index. Handle this properly, which then prevents hitting the WARN_ON and being dropped in ieee80211_rx(). Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
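Splitting the channel-width flag out of the rate field, as described above, can be sketched as follows. Only the role of bit 6 comes from the commit message; treating all remaining bits as the MCS index, and the names `RX_RATE_40MHZ`, `struct rx_rate` and `decode_rx_rate()`, are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 6 flags a 40MHz channel (per the commit text); the rest of the
 * field is assumed to be the MCS index in this sketch. */
#define RX_RATE_40MHZ 0x40u

struct rx_rate {
    uint8_t mcs;
    bool is_40mhz;
};

struct rx_rate decode_rx_rate(uint8_t field)
{
    struct rx_rate r;

    r.is_40mhz = (field & RX_RATE_40MHZ) != 0;
    r.mcs = field & (uint8_t)~RX_RATE_40MHZ;   /* strip the width flag */
    return r;
}
```

Without the mask, a 40MHz frame at MCS 7 (field value 0x47) would be reported as MCS index 71.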
| *	mwl8k: fix addr4 zeroing and payload overwrite on DMA header creation	Lennert Buytenhek	2009-12-07	1	-14/+15
	When inserting a DMA header into a packet for transmission, mwl8k_add_dma_header() would blindly zero the addr4 field, which is not a good idea if the packet being transmitted is actually a 4-address packet. Also, if the transmitted packet was a 4-address with QoS packet, the memmove() to do the needed header reshuffling would inadvertently overwrite the first two bytes of the packet payload with the QoS field. This fixes both of these issues. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mwl8k: prevent corruption of QoS field on receive	Lennert Buytenhek	2009-12-07	1	-15/+31
	Packets exchanged between the mwl8k driver and the firmware always have a 4-address header without QoS field. For QoS packets, the QoS field is passed to/from the firmware via the tx/rx descriptors. We were handling this correctly on transmit, but not on receive -- if a QoS packet was received, we would leave garbage in the QoS field in the packet passed up to the stack, which is Bad(tm). Also, if the packet received on the air was a 4-address without QoS packet, we would forget to skb_pull the 2-byte DMA length prefix off. This patch adds an argument to the ->rxd_process() receive descriptor operation to retrieve the QoS field from the receive descriptor, and extends mwl8k_remove_dma_header() to insert this field back into the packet if the packet received is a QoS packet. It also fixes mwl8k_remove_dma_header() to strip off the length prefix in all cases. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mwl8k: fix UPDATE_STADB command struct legacy_rates array length	Lennert Buytenhek	2009-12-07	1	-38/+13
	There exist 12 802.11b/g rates, but mwl8k supports two additional (non-standard) rates, and includes those rates in rate bitmasks and in its internal rate table that hardware rate indices index. Commit "mwl8k: report rate and other information for received frames" added one of the nonstandard rates to the mwl8k_rates table to make the OFDM rates in the table line up with the rate indices that are reported in the receive descriptor (so that we can just simply copy the receive descriptor rate index into ieee80211_rx_status::rate_idx) and bumped MWL8K_IEEE_LEGACY_DATA_RATES from 12 to 13, but this screwed up the UPDATE_STADB command struct layout, as it also uses that define, for its legacy_rates array. To avoid having to convert rate indices and legacy rate bitmaps (e.g. ieee80211_bss_conf::basic_rates) between the 12-rate mac80211 format and the 14-rate mwl8k format, we'll report all 14 rates in our wiphy's band, but filter out the nonstandard ones e.g. in the case of the UPDATE_STADB command which only accepts 12 rates. In the commands that accept 14 rates (SET_AID, SET_RATE), replace the use of the MWL8K_RATE_INDEX_MAX_ARRAY define in the command struct by the constant 14, to make it clearer that these commands accept 14 rates. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mwl8k: fix MCS bitmap size in SET_RATE command	Lennert Buytenhek	2009-12-07	1	-4/+3
	The MCS bitmaps in the SET_RATE command structure were of the wrong size, due to use of the wrong define for the array length. Just hardcode the lengths as 16, and do the same for the MCS bitmaps in other command structures. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mac80211: Fix dynamic power save for scanning.	Vivek Natarajan	2009-12-07	2	-4/+17
	Not only ps_sdata but also IEEE80211_CONF_PS is to be considered before restoring PS in scan_ps_disable(). For instance, when ps_sdata is set but CONF_PS is not set just because the dynamic timer is still running, a sw scan leads to setting of CONF_PS in scan_ps_disable instead of restarting the dynamic PS timer. Also for the above case, a null data frame is to be sent after returning to operating channel which was not happening with the current implementation. This patch fixes this too. Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com> Reviewed-by: Kalle Valo <kalle.valo@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	mac80211: recalculate idle later in MLME	Johannes Berg	2009-12-07	1	-2/+8
	hwsim testing has revealed that when the MLME recalculates the idle state of the device, it sometimes does so before sending the final deauthentication or disassociation frame. This patch changes the place where the idle state is recalculated, but of course driver transmit is typically asynchronous while configuration is expected to be synchronous, so it doesn't fix all possible cases yet. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	wl1251: don't build null data template in wl1251_op_config()	Kalle Valo	2009-12-07	1	-4/+0
	The bssid can be zero when null data template is set in wl1251_op_config(). It's enough, and especially safe, to set it once after association. Signed-off-by: Kalle Valo <kalle.valo@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	wl1251: fix bssid handling	Kalle Valo	2009-12-07	1	-15/+15
	bssid needs to be copied first in wl1251_op_bss_info_changed(), otherwise templates will have incorrect bssid and power save will not work correctly. Signed-off-by: Kalle Valo <kalle.valo@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	wl1251: remove false warning messages	Kalle Valo	2009-12-07	1	-2/+0
	There was a warning from wl1251_op_bss_info_changed(): wl1251: WARNING Set ctsprotect failed 0 It was always printed, is completely false and can be removed. Signed-off-by: Kalle Valo <kalle.valo@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| *	iwlwifi: fix warning from ieee80211_stop_tx_ba_cb_irqsafe argument change	John W. Linville	2009-12-07	1	-1/+1
	CC [M] drivers/net/wireless/iwlwifi/iwl-tx.o drivers/net/wireless/iwlwifi/iwl-tx.c: In function ‘iwl_tx_agg_stop’: drivers/net/wireless/iwlwifi/iwl-tx.c:1356: warning: passing argument 1 of ‘ieee80211_stop_tx_ba_cb_irqsafe’ from incompatible pointer type include/net/mac80211.h:2128: note: expected ‘struct ieee80211_vif *’ but argument is of type ‘struct ieee80211_hw *’ Signed-off-by: John W. Linville <linville@tuxdriver.com>
* |	sctp: fix compile error due to sysctl mismerge	Linus Torvalds	2009-12-08	1	-1/+0
	I messed up the merge in d7fc02c7bae7b1cf69269992cf880a43a350cdaa, where the conflict in question wasn't just about CTL_UNNUMBERED being removed, but the 'strategy' field is too (sysctl handling is now done through the /proc interface, with no duplicate protocols for reading the data). Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* |	Merge branch 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block	Linus Torvalds	2009-12-08	107	-2132/+24813
|\ \	* 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits)
	  cfq-iosched: Do not access cfqq after freeing it
	  block: include linux/err.h to use ERR_PTR
	  cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit
	  blkio: Allow CFQ group IO scheduling even when CFQ is a module
	  blkio: Implement dynamic io controlling policy registration
	  blkio: Export some symbols from blkio as its user CFQ can be a module
	  block: Fix io_context leak after failure of clone with CLONE_IO
	  block: Fix io_context leak after clone with CLONE_IO
	  cfq-iosched: make nonrot check logic consistent
	  io controller: quick fix for blk-cgroup and modular CFQ
	  cfq-iosched: move IO controller declerations to a header file
	  cfq-iosched: fix compile problem with !CONFIG_CGROUP
	  blkio: Documentation
	  blkio: Wait on sync-noidle queue even if rq_noidle = 1
	  blkio: Implement group_isolation tunable
	  blkio: Determine async workload length based on total number of queues
	  blkio: Wait for cfq queue to get backlogged if group is empty
	  blkio: Propagate cgroup weight updation to cfq groups
	  blkio: Drop the reference to queue once the task changes cgroup
	  blkio: Provide some isolation between groups
	  ...
| * |	cfq-iosched: Do not access cfqq after freeing it	Vivek Goyal	2009-12-07	1	-3/+4
	Fix a crash during boot reported by Jeff Moyer. Fix the issue of accessing cfqq after freeing it. Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@carl.(none)>
| * |	block: include linux/err.h to use ERR_PTR	Stephen Rothwell	2009-12-07	1	-0/+1
	Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit	Jens Axboe	2009-12-06	1	-2/+8
	After the merge of the IO controller patches, booting on my megaraid box ran much slower. Vivek Goyal traced it down to megaraid discovery creating tons of devices, each suffering a grace period when they later kill that queue (if no device is found). So let's use call_rcu() to batch these deferred frees, instead of taking the grace period hit for each one. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	blkio: Allow CFQ group IO scheduling even when CFQ is a module	Vivek Goyal	2009-12-04	1	-1/+1
	o Now issues of blkio controller and CFQ in module mode should be fixed. Enable the cfq group scheduling support in module mode. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	blkio: Implement dynamic io controlling policy registration	Vivek Goyal	2009-12-04	4	-12/+69
	o One of the goals of block IO controller is that it should be able to support multiple io control policies, some of which may be operational at a higher level in the storage hierarchy. o To begin with, we had one io controlling policy implemented by CFQ, and I hard-coded the CFQ functions called by blkio. This created issues when CFQ is compiled as a module. o This patch implements a basic dynamic io controlling policy registration functionality in blkio. This is similar to elevator functionality where ioschedulers register the functions dynamically. o Now in future, when more IO controlling policies are implemented, these can dynamically register with block IO controller. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	blkio: Export some symbols from blkio as its user CFQ can be a module	Vivek Goyal	2009-12-04	3	-2/+27
	o blkio controller is inside the kernel and cfq makes use of interfaces exported by blkio. CFQ can be a module too, hence export symbols used by CFQ. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	block: Fix io_context leak after failure of clone with CLONE_IO	Louis Rilling	2009-12-04	4	-9/+11
	With CLONE_IO, parent's io_context->nr_tasks is incremented, but never decremented whenever copy_process() fails afterwards, which prevents exit_io_context() from calling IO schedulers exit functions. Give a task_struct to exit_io_context(), and call exit_io_context() instead of put_io_context() in copy_process() cleanup path. Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	block: Fix io_context leak after clone with CLONE_IO	Louis Rilling	2009-12-04	1	-1/+1
	With CLONE_IO, copy_io() increments both ioc->refcount and ioc->nr_tasks. However exit_io_context() only decrements ioc->refcount if ioc->nr_tasks reaches 0. Always call put_io_context() in exit_io_context(). Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
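The counting rule behind this fix (every reference taken when the context is shared must be dropped on every exit, not only when the last task leaves) can be modelled in a few lines. This is a simplified model, not the block layer code: `struct ioc`, `copy_ioc()`, `exit_ioc()` and `put_ioc()` are hypothetical stand-ins for the io_context, copy_io(), exit_io_context() and put_io_context() named in the message, and a `freed` flag stands in for the actual free.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical io_context with the two counters named in the commit. */
struct ioc {
    int refcount;   /* references to the structure */
    int nr_tasks;   /* tasks sharing it via CLONE_IO */
    bool freed;     /* stands in for freeing the object */
};

void put_ioc(struct ioc *ioc)
{
    assert(ioc->refcount > 0);
    if (--ioc->refcount == 0)
        ioc->freed = true;
}

/* CLONE_IO: the child shares the parent's context, taking both a
 * reference and a task count. */
void copy_ioc(struct ioc *ioc)
{
    ioc->refcount++;
    ioc->nr_tasks++;
}

/* Fixed exit path: drop the task count, then ALWAYS drop the reference.
 * Dropping it only when nr_tasks reaches 0 would leak one reference per
 * exiting CLONE_IO child, so the object would never be freed. */
void exit_ioc(struct ioc *ioc)
{
    assert(ioc->nr_tasks > 0);
    ioc->nr_tasks--;   /* at 0, IO scheduler exit hooks would run */
    put_ioc(ioc);
}
```

With a parent and one CLONE_IO child, the context survives the child's exit and is released exactly when the parent exits.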
| * |	cfq-iosched: make nonrot check logic consistent	Shaohua Li	2009-12-04	1	-1/+2
	cfq_arm_slice_timer() has logic to disable idle window for SSD device. The same thing should be done at cfq_select_queue() too, otherwise we will still see idle window. This makes the nonrot check logic consistent in cfq. In tests on an Intel SSD with the low_latency knob off, this patch can triple disk throughput for multi-threaded sequential reads. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	io controller: quick fix for blk-cgroup and modular CFQ	Jens Axboe	2009-12-04	1	-1/+1
	It's currently not an allowed configuration, so express that in Kconfig. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	cfq-iosched: move IO controller declerations to a header file	Jens Axboe	2009-12-04	3	-3/+9
	They should not be declared inside some other file that's not related to CFQ. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	cfq-iosched: fix compile problem with !CONFIG_CGROUP	Jens Axboe	2009-12-03	1	-0/+10
	Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	blkio: Documentation	Vivek Goyal	2009-12-03	1	-0/+135
	Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	blkio: Wait on sync-noidle queue even if rq_noidle = 1	Vivek Goyal	2009-12-03	1	-1/+2
	o rq_noidle() is supposed to tell cfq not to expect another request after this one, hence don't idle. But this does not seem to work very well. For example, for direct random readers rq_noidle = 1, but there is a next request coming after this one. Not idling leads to a group not getting its share even if group_isolation=1. o The right solution for this issue is to scan the higher layers and set the right flag (WRITE_SYNC or WRITE_ODIRECT). For the time being, this single line fix helps. This should not have any significant impact when we are not using cgroups. I will later figure out IO paths in higher layer and fix it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * |	blkio: Implement group_isolation tunable	Vivek Goyal	2009-12-03	1	-1/+36
	o If a group is running only a random reader, then it will not have enough traffic to keep disk busy and we will reduce overall throughput. This should result in better latencies for random reader though. If we don't idle on random reader service tree, then this random reader will experience large latencies if there are other groups present in system with sequential readers running in these. o One solution suggested by Corrado is that by default we keep the random readers or sync-noidle workload in root group so that during one dispatch round we idle only once on sync-noidle tree. This means that all the sync-idle workload queues will be in their respective group and we will see service differentiation in those but not on sync-noidle workload. o Provide a tunable group_isolation. If set, this will make sure that even sync-noidle queues go in their respective group and we wait on these. This provides stronger isolation between groups but at the expense of throughput if group does not have enough traffic to keep the disk busy. o By default group_isolation = 0 Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
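The placement rule this tunable controls can be sketched as a small decision function: with `group_isolation = 0`, sync-noidle queues (for example random readers) are serviced in the root group, and with `group_isolation = 1` they stay in their own cgroup. This is an illustrative model of the policy as the message describes it, not CFQ code; `enum workload`, `ROOT_GROUP` and `service_group()` are hypothetical, and async queues are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical workload classes for sync queues (async not modelled). */
enum workload { SYNC_IDLE, SYNC_NOIDLE };

#define ROOT_GROUP 0   /* hypothetical ID of the root cgroup */

/* Choose the group a queue is serviced in. Without isolation, all
 * sync-noidle queues share the root group, so the scheduler idles on
 * that tree only once per dispatch round (better throughput). With
 * isolation, each queue stays in its own group (stronger isolation). */
int service_group(enum workload wl, int task_group, bool group_isolation)
{
    assert(task_group >= ROOT_GROUP);
    if (!group_isolation && wl == SYNC_NOIDLE)
        return ROOT_GROUP;
    return task_group;
}
```

Sync-idle (for example sequential) queues always stay in their own group, which is where the service differentiation described above comes from.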