op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	net: fix route cache rebuilds	Eric Dumazet	2010-03-08	1	-12/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We added an automatic route cache rebuilding in commit 1080d709fb9d8cd43 but had to correct few bugs. One of the assumption of original patch, was that entries where kept sorted in a given way. This assumption is known to be wrong (commit 1ddbcb005c395518 gave an explanation of this and corrected a leak) and expensive to respect. Paweł Staszewski reported to me one of his machine got its routing cache disabled after few messages like : [ 2677.850065] Route hash chain too long! [ 2677.850080] Adjust your secret_interval! [82839.662993] Route hash chain too long! [82839.662996] Adjust your secret_interval! [155843.731650] Route hash chain too long! [155843.731664] Adjust your secret_interval! [155843.811881] Route hash chain too long! [155843.811891] Adjust your secret_interval! [155843.858209] vlan0811: 5 rebuilds is over limit, route caching disabled [155843.858212] Route hash chain too long! [155843.858213] Adjust your secret_interval! This is because rt_intern_hash() might be fooled when computing a chain length, because multiple entries with same keys can differ because of TOS (or mark/oif) bits. In the rare case the fast algorithm see a too long chain, and before taking expensive path, we call a helper function in order to not count duplicates of same routes, that only differ with tos/mark/oif bits. This helper works with data already in cpu cache and is not be very expensive, despite its O(N^2) implementation. Paweł Staszewski sucessfully tested this patch on his loaded router. Reported-and-tested-by: Paweł Staszewski <pstaszewski@itcare.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	qlcnic: remove extra space from board names	Amit Kumar Salecha	2010-03-08	1	-4/+4
\| \| \| \| \|	Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	qlcnic: fix bios version check	Amit Kumar Salecha	2010-03-08	1	-1/+1
\| \| \| \| \| \| \|	Bios sub version from unified fw image is calculated incorrect. Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	qlcnic: validate unified fw image	Sucheta Chakraborty	2010-03-08	1	-7/+139
\| \| \| \| \| \| \| \| \|	Validate all sections of unified fw image, before accessing them, to avoid seg fault. Signed-off-by: Sucheta Chakraborty <sucheta@dut6195.unminc.com> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	qlcnic: fix multicast handling	Sucheta Chakraborty	2010-03-08	1	-25/+6
\| \| \| \| \| \| \| \| \| \| \| \|	For promiscuous mode, driver send request to device for deleting multicast addresses and again it send request for adding them back while exiting from this mode, this is bad for performance. Just setting device in promiscuous mode is enough, no need to del/add multicast addresses. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	qlcnic: additional driver statistics.	Sucheta Chakraborty	2010-03-08	5	-2/+28
\| \| \| \| \| \| \| \| \|	Statistics added for lro/lso bytes, count for tx stop queue and wake queue and skb alloc failure count. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	qlcnic: fix tx csum status	Sucheta Chakraborty	2010-03-08	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	Kernel default tx csum function (ethtool_op_get_tx_csum) doesn't show correct csum status. It takes various FLAGS (NETIF_F_ALL_CSUM) in account to show tx csum status, which driver doesn't set while disabling tx csum. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	be2net: remove unused code in be_load_fw	Ajit Khaparde	2010-03-08	1	-9/+0
\| \| \| \| \| \| \|	This patch cleans up some unused code from be_load_fw(). Signed-off-by: Ajit Khaparde <ajitk@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	be2net: remove usage of be_pci_func	Ajit Khaparde	2010-03-08	3	-12/+1
\| \| \| \| \| \| \| \| \| \| \|	When PCI functions are virtuialized in applications by assigning PCI functions to VM (PCI passthrough), the be2net driver in the VM sees a different function number. So, use of PCI function number in any calculation will break existing code. This patch takes care of it. Signed-off-by: Ajit Khaparde <ajitk@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: Add SNMP counters for backlog and min_ttl drops	Eric Dumazet	2010-03-08	4	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 6b03a53a (tcp: use limited socket backlog) added the possibility of dropping frames when backlog queue is full. Commit d218d111 (tcp: Generalized TTL Security Mechanism) added the possibility of dropping frames when TTL is under a given limit. This patch adds new SNMP MIB entries, named TCPBacklogDrop and TCPMinTTLDrop, published in /proc/net/netstat in TcpExt: line netstat -s \| egrep "TCPBacklogDrop\|TCPMinTTLDrop" TCPBacklogDrop: 0 TCPMinTTLDrop: 0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: add __must_check to sk_add_backlog	Zhu Yi	2010-03-08	1	-1/+1
\| \| \| \| \| \| \| \|	Add the "__must_check" tag to sk_add_backlog() so that any failure to check and drop packets will be warned about. Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bridge: Fix RCU race in br_multicast_stop	Herbert Xu	2010-03-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Thanks to Paul McKenny for pointing out that it is incorrect to use synchronize_rcu_bh to ensure that pending callbacks have completed. Instead we should use rcu_barrier_bh. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bridge: Use RCU list primitive in __br_mdb_ip_get	Herbert Xu	2010-03-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	As Paul McKenney correctly pointed out, __br_mdb_ip_get needs to use the RCU list walking primitive in order to work correctly on platforms where data-dependency ordering is not guaranteed. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv6: Optmize translation between IPV6_PREFER_SRC_xxx and RT6_LOOKUP_F_xxx.	YOSHIFUJI Hideaki / 吉藤英明	2010-03-07	3	-18/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IPV6_PREFER_SRC_xxx definitions: \| #define IPV6_PREFER_SRC_TMP 0x0001 \| #define IPV6_PREFER_SRC_PUBLIC 0x0002 \| #define IPV6_PREFER_SRC_COA 0x0004 RT6_LOOKUP_F_xxx definitions: \| #define RT6_LOOKUP_F_SRCPREF_TMP 0x00000008 \| #define RT6_LOOKUP_F_SRCPREF_PUBLIC 0x00000010 \| #define RT6_LOOKUP_F_SRCPREF_COA 0x00000020 So, we can translate between these two groups by shift operation instead of multiple 'if's. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	cpmac: bump version to 0.5.2	Florian Fainelli	2010-03-07	1	-1/+1
\| \| \| \| \|	Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	cpmac: fallback to switch mode if no PHY chip found	Florian Fainelli	2010-03-07	1	-2/+3
\| \| \| \| \| \| \| \| \| \|	If we were unable to detect a PHY on any of the MDIO bus id we tried instead of bailing out with -ENODEV, assume the MAC is connected to a switch and use MDIO bus 0. This unbreaks quite a lot of devices out there whose switch cannot be detected. Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	cpmac: fix the receiving of 802.1q frames	Florian Fainelli	2010-03-07	1	-2/+3
\| \| \| \| \| \| \| \| \| \|	Despite what the comment above CPMAC_SKB_SIZE says, the hardware also needs to account for the FCS length in a received frame. This patch fix the receiving of 802.1q frames which have 4 more bytes. While at it unhardcode the definition and use the one from if_vlan.h. Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	MAINTAINER: Correct CAN Maintainer responsibilities and paths	Oliver Hartkopp	2010-03-07	1	-5/+13
\| \| \| \| \| \| \| \| \|	Update the CAN Maintainer responsibilities and add source paths. Additional the SocketCAN core ML is not subscribers-only anymore. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	another pegasus usb net device	Petko Manolov	2010-03-07	1	-2/+4
\| \| \| \| \| \| \| \| \|	This one removes trailing whitespace in pegasus.h and more importantly adds new Pegasus compatible device. Signed-off-by: Julian Brown <julian@codesourcery.com> Signed-off-by: Petko Manolov <petkan@nucleusys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	irda-usb: add error handling and fix leak	Dan Carpenter	2010-03-07	1	-0/+4
\| \| \| \| \| \| \|	If the call to kcalloc() fails then we should return -ENOMEM. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sock.c: potential null dereference	Dan Carpenter	2010-03-07	1	-1/+2
\| \| \| \| \| \| \| \|	We test that "prot->rsk_prot" is non-null right before we dereference it on this line. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ems_usb: cleanup: remove uneeded check	Dan Carpenter	2010-03-07	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \|	"skb" is alway non-null here, but even if it were null the check isn't needed because dev_kfree_skb() can handle it. This eliminates a smatch warning about dereferencing a variable before checking that it is non-null. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bridge: cleanup: remove unneed check	Dan Carpenter	2010-03-07	1	-2/+2
\| \| \| \| \| \| \| \| \|	We dereference "port" on the lines immediately before and immediately after the test so port should hopefully never be null here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	fix a race in ks8695_poll	Figo.zhang	2010-03-07	1	-1/+1
\| \| \| \| \| \| \| \|	fix a race at the end of NAPI processing in ks8695_poll() function. Signed-off-by:Figo.zhang <figo1802@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	s2io: Fixing debug message	Breno Leitao	2010-03-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently s2io is dumping debug messages using the interface name before it was allocated, showing a message like the following: s2io: eth%d: Ring Mem PHY: 0x7ef80000 s2io: s2io_reset: Resetting XFrame card eth%d This patch just fixes it, printing the pci bus information for the card instead of the interface name. Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	e1000e: fix packet corruption and tx hang during NFSv2	Jesse Brandeburg	2010-03-05	2	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	when receiving a particular type of NFS v2 UDP traffic, the hardware could DMA some bad data and then hang, possibly corrupting memory. Disable the NFS parsing in this hardware, verified to fix the bug. Originally reported and reproduced by RedHat's Neil Horman CC: nhorman@tuxdriver.com Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	typhoon: fix incorrect use of smp_wmb()	David Dillow	2010-03-05	1	-2/+4
\| \| \| \| \| \| \| \| \|	The typhoon driver was incorrectly using smp_wmb() to order memory accesses against IO to the NIC in a few instances. Use wmb() instead, which is required to actually order between memory types. Signed-off-by: David Dillow <dave@thedillows.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ethtool: Add direct access to ops->get_sset_count	Jeff Garzik	2010-03-05	2	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On 03/04/2010 09:26 AM, Ben Hutchings wrote: > On Thu, 2010-03-04 at 00:51 -0800, Jeff Kirsher wrote: >> From: Jeff Garzik<jgarzik@redhat.com> >> >> This patch is an alternative approach for accessing string >> counts, vs. the drvinfo indirect approach. This way the drvinfo >> space doesn't run out, and we don't break ABI later. > [...] >> --- a/net/core/ethtool.c >> +++ b/net/core/ethtool.c >> @@ -214,6 +214,10 @@ static noinline int ethtool_get_drvinfo(struct net_device dev, void __user use >> info.cmd = ETHTOOL_GDRVINFO; >> ops->get_drvinfo(dev,&info); >> >> + /* >> + * this method of obtaining string set info is deprecated; >> + * consider using ETHTOOL_GSSET_INFO instead >> + / > > This comment belongs on the interface (ethtool.h) not the > implementation. Debatable -- the current comment is located at the callsite of ops->get_sset_count(), which is where an implementor might think to add a new call. Not all the numeric fields in ethtool_drvinfo are obtained from ->get_sset_count(). Hence the "some" in the attached patch to include/linux/ethtool.h, addressing your comment. > [...] >> +static noinline int ethtool_get_sset_info(struct net_device dev, >> + void __user useraddr) >> +{ > [...] >> + / calculate size of return buffer */ >> + for (i = 0; i< 64; i++) >> + if (sset_mask& (1ULL<< i)) >> + n_bits++; > [...] > > We have a function for this: > > n_bits = hweight64(sset_mask); Agreed. I've attached a follow-up patch, which should enable my/Jeff's kernel patch to be applied, followed by this one. Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ethtool: Add direct access to ops->get_sset_count	Jeff Garzik	2010-03-05	2	-3/+86
\| \| \| \| \| \| \| \| \| \| \|	This patch is an alternative approach for accessing string counts, vs. the drvinfo indirect approach. This way the drvinfo space doesn't run out, and we don't break ABI later. Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: smc91x: Support Qualcomm MSM development boards.	David Brown	2010-03-05	1	-0/+14
\| \| \| \| \| \| \|	Signed-off-by: David Brown <davidb@quicinc.com> Signed-off-by: Daniel Walker <dwalker@codeaurora.org> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: backlog functions rename	Zhu Yi	2010-03-05	13	-17/+17
\| \| \| \| \| \| \| \| \|	sk_add_backlog -> __sk_add_backlog sk_add_backlog_limited -> sk_add_backlog Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	x25: use limited socket backlog	Zhu Yi	2010-03-05	1	-1/+1
\| \| \| \| \| \| \| \| \|	Make x25 adapt to the limited socket backlog change. Cc: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tipc: use limited socket backlog	Zhu Yi	2010-03-05	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \|	Make tipc adapt to the limited socket backlog change. Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sctp: use limited socket backlog	Zhu Yi	2010-03-05	2	-15/+30
\| \| \| \| \| \| \| \| \|	Make sctp adapt to the limited socket backlog change. Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	llc: use limited socket backlog	Zhu Yi	2010-03-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	Make llc adapt to the limited socket backlog change. Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	udp: use limited socket backlog	Zhu Yi	2010-03-05	2	-12/+22
\| \| \| \| \| \| \| \| \| \| \| \|	Make udp adapt to the limited socket backlog change. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: use limited socket backlog	Zhu Yi	2010-03-05	2	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Make tcp adapt to the limited socket backlog change. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: add limit for socket backlog	Zhu Yi	2010-03-05	2	-3/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We got system OOM while running some UDP netperf testing on the loopback device. The case is multiple senders sent stream UDP packets to a single receiver via loopback on local host. Of course, the receiver is not able to handle all the packets in time. But we surprisingly found that these packets were not discarded due to the receiver's sk->sk_rcvbuf limit. Instead, they are kept queuing to sk->sk_backlog and finally ate up all the memory. We believe this is a secure hole that a none privileged user can crash the system. The root cause for this problem is, when the receiver is doing __release_sock() (i.e. after userspace recv, kernel udp_recvmsg -> skb_free_datagram_locked -> release_sock), it moves skbs from backlog to sk_receive_queue with the softirq enabled. In the above case, multiple busy senders will almost make it an endless loop. The skbs in the backlog end up eat all the system memory. The issue is not only for UDP. Any protocols using socket backlog is potentially affected. The patch adds limit for socket backlog so that the backlog size cannot be expanded endlessly. Reported-by: Alex Shi <alex.shi@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: Patrick McHardy <kaber@trash.net> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Cc: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	rndis_wlan: correct multicast_list handling V3	Jiri Pirko	2010-03-04	1	-25/+41
\| \| \| \| \| \| \| \| \| \| \|	My previous patch (655ffee284dfcf9a24ac0343f3e5ee6db85b85c5) added locking in a bad way. Because rndis_set_oid can sleep, there is need to prepare multicast addresses into local buffer under netif_addr_lock first, then call rndis_set_oid outside. This caused reorganizing of the whole function. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reported-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
*	MAINTAINERS: Add netdev to bridge entry.	David S. Miller	2010-03-04	1	-0/+1
\| \| \| \| \| \|	Noticed by Ingo Molnar. Signed-off-by: David S. Miller <davem@davemloft.net>
*	cxgb3: fix hot plug removal crash	Divy Le Ray	2010-03-04	1	-0/+1
\| \| \| \| \| \| \| \|	queue restart tasklets need to be stopped after napi handlers are stopped since the latter can restart them. So stop them after stopping napi. Signed-off-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	gianfar: Fix TX ring processing on SMP machines	Anton Vorontsov	2010-03-04	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Starting with commit a3bc1f11e9b867a4f49505 ("gianfar: Revive SKB recycling") gianfar driver sooner or later stops transmitting any packets on SMP machines. start_xmit() prepares new skb for transmitting, generally it does three things: 1. sets up all BDs (marks them ready to send), except the first one. 2. stores skb into tx_queue->tx_skbuff so that clean_tx_ring() would cleanup it later. 3. sets up the first BD, i.e. marks it ready. Here is what clean_tx_ring() does: 1. reads skbs from tx_queue->tx_skbuff 2. checks if the last BD is ready. If it's still ready [to send] then it it isn't transmitted, so clean_tx_ring() returns. Otherwise it actually cleanups BDs. All is OK. Now, if there is just one BD, code flow: - start_xmit(): stores skb into tx_skbuff. Note that the first BD (which is also the last one) isn't marked as ready, yet. - clean_tx_ring(): sees that skb is not null, and its lstatus says that it is NOT ready (like if BD was sent), so it cleans it up (bad!) - start_xmit(): marks BD as ready [to send], but it's too late. We can fix this simply by reordering lstatus/tx_skbuff writes. Reported-by: Martyn Welch <martyn.welch@ge.com> Bisected-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com> Tested-by: Martyn Welch <martyn.welch@ge.com> Cc: Sandeep Gopalpet <Sandeep.Kumar@freescale.com> Cc: Stable <stable@vger.kernel.org> [2.6.33] Signed-off-by: David S. Miller <davem@davemloft.net>
*	r8169: use correct barrier between cacheable and non-cacheable memory	David Dillow	2010-03-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	r8169 needs certain writes to be visible to other CPUs or the NIC before touching the hardware, but was using smp_wmb() which is only required to order cacheable memory access. Switch to wmb() which is required to order both cacheable and non-cacheable memory. Noticed by Catalin Marinas and Paul Mackerras. Signed-off-by: David Dillow <dave@thedillows.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tipc: Fix oops on send prior to entering networked mode (v3)	Neil Horman	2010-03-04	3	-53/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix TIPC to disallow sending to remote addresses prior to entering NET_MODE user programs can oops the kernel by sending datagrams via AF_TIPC prior to entering networked mode. The following backtrace has been observed: ID: 13459 TASK: ffff810014640040 CPU: 0 COMMAND: "tipc-client" [exception RIP: tipc_node_select_next_hop+90] RIP: ffffffff8869d3c3 RSP: ffff81002d9a5ab8 RFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000001001001 RBP: 0000000001001001 R8: 0074736575716552 R9: 0000000000000000 R10: ffff81003fbd0680 R11: 00000000000000c8 R12: 0000000000000008 R13: 0000000000000001 R14: 0000000000000001 R15: ffff810015c6ca00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 RIP: 0000003cbd8d49a3 RSP: 00007fffc84e0be8 RFLAGS: 00010206 RAX: 000000000000002c RBX: ffffffff8005d116 RCX: 0000000000000000 RDX: 0000000000000008 RSI: 00007fffc84e0c00 RDI: 0000000000000003 RBP: 0000000000000000 R8: 00007fffc84e0c10 R9: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffc84e0d10 R14: 0000000000000000 R15: 00007fffc84e0c30 ORIG_RAX: 000000000000002c CS: 0033 SS: 002b What happens is that, when the tipc module in inserted it enters a standalone node mode in which communication to its own address is allowed <0.0.0> but not to other addresses, since the appropriate data structures have not been allocated yet (specifically the tipc_net pointer). There is nothing stopping a client from trying to send such a message however, and if that happens, we attempt to dereference tipc_net.zones while the pointer is still NULL, and explode. The fix is pretty straightforward. Since these oopses all arise from the dereference of global pointers prior to their assignment to allocated values, and since these allocations are small (about 2k total), lets convert these pointers to static arrays of the appropriate size. All the accesses to these bits consider 0/NULL to be a non match when searching, so all the lookups still work properly, and there is no longer a chance of a bad dererence anywhere. As a bonus, this lets us eliminate the setup/teardown routines for those pointers, and elimnates the need to preform any locking around them to prevent access while their being allocated/freed. I've updated the tipc_net structure to behave this way to fix the exact reported problem, and also fixed up the tipc_bearers and media_list arrays to fix an obvious simmilar problem that arises from issuing tipc-config commands to manipulate bearers/links prior to entering networked mode I've tested this for a few hours by running the sanity tests and stress test with the tipcutils suite, and nothing has fallen over. There have been a few lockdep warnings, but those were there before, and can be addressed later, as they didn't actually result in any deadlock. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Allan Stephens <allan.stephens@windriver.com> CC: David S. Miller <davem@davemloft.net> CC: tipc-discussion@lists.sourceforge.net bearer.c \| 37 ++++++------------------------------- bearer.h \| 2 +- net.c \| 25 ++++--------------------- 3 files changed, 11 insertions(+), 53 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
*	gre: fix hard header destination address checking	Timo Teräs	2010-03-04	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ipgre_header() can be called with zero daddr when the gre device is configured as multipoint tunnel and still has the NOARP flag set (which is typically cleared by the userspace arp daemon). If the NOARP packets are not dropped, ipgre_tunnel_xmit() will take rt->rt_gateway (= NBMA IP) and use that for route look up (and may lead to bogus xfrm acquires). The multicast address check is removed as sending to multicast group should be ok. In fact, if gre device has a multicast address as destination ipgre_header is always called with multicast address. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: add scheduler sync hint to tcp_prequeue().	Mike Galbraith	2010-03-04	1	-1/+1
\| \| \| \| \| \| \|	Decreases the odds wakee will suffer from frequent cache misses. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
*	IPv6: fix race between cleanup and add/delete address	stephen hemminger	2010-03-04	1	-5/+13
\| \| \| \| \| \| \| \| \| \| \| \|	This solves a potential race problem during the cleanup process. The issue is that addrconf_ifdown() needs to traverse address list, but then drop lock to call the notifier. The version in -next could get confused if add/delete happened during this window. Original code (2.6.32 and earlier) was okay because all addresses were always deleted. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	IPv6: addrconf notify when address is unavailable	stephen hemminger	2010-03-04	1	-17/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	My recent change in net-next to retain permanent addresses caused regression. Device refcount would not go to zero when device was unregistered because left over anycast reference would hold ipv6 dev reference which would hold device references... The correct procedure is to call notify chain when address is no longer available for use. When interface comes back DAD timer will notify back that address is available. Also, link local addresses should be purged when interface is brought down. The address might be changed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	IPv6: addrconf timer race	stephen hemminger	2010-03-04	1	-13/+15
\| \| \| \| \| \| \| \| \|	The Router Solicitation timer races with device state changes because it doesn't lock the device. Use local variable to avoid one repeated dereference. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	IPv6: addrconf dad timer unnecessary bh_disable	stephen hemminger	2010-03-04	1	-5/+5
\| \| \| \| \| \| \| \| \|	Timer code runs in bottom half, so there is no need for using _bh form of locking. Also check if device is not ready to avoid race with address that is no longer active. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>