op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	net: remove use of ndo_set_multicast_list in drivers	Jiri Pirko	2011-08-17	7	-10/+6
\| \| \| \| \| \| \|	replace it by ndo_set_rx_mode Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: introduce IFF_UNICAST_FLT private flag	Jiri Pirko	2011-08-17	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use IFF_UNICAST_FTL to find out if driver handles unicast address filtering. In case it does not, promisc mode is entered. Patch also fixes following drivers: stmmac, niu: support uc filtering and yet it propagated ndo_set_multicast_list bna, benet, pxa168_eth, ks8851, ks8851_mll, ksz884x : has set ndo_set_rx_mode but do not support uc filtering Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	rps: Inspect GRE encapsulated packets to get flow hash	Tom Herbert	2011-08-17	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Crack open GRE packets in __skb_get_rxhash to compute 4-tuple hash on in encapsulated packet. Note that this is used only when the __skb_get_rxhash is taken, in particular only when the device does not compute provide the rxhash (ie. feature is disabled). This was tested by creating a single GRE tunnel between two 16 core AMD machines. 200 netperf TCP_RR streams were ran with 1 byte request and response size. Without patch: 157497 tps, 50/90/99% latencies 1250/1292/1364 usecs With patch: 325896 tps, 50/90/99% latencies 603/848/1169 Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	rps: Infrastructure in __skb_get_rxhash for deep inspection	Tom Herbert	2011-08-17	1	-0/+7
\| \| \| \| \| \| \|	Basics for looking for ports in encapsulated packets in tunnels. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	rps: Add flag to skb to indicate rxhash is based on L4 tuple	Tom Herbert	2011-08-17	6	-13/+16
\| \| \| \| \| \| \| \| \| \| \|	The l4_rxhash flag was added to the skb structure to indicate that the rxhash value was computed over the 4 tuple for the packet which includes the port information in the encapsulated transport packet. This is used by the stack to preserve the rxhash value in __skb_rx_tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	rps: Some minor cleanup in get_rps_cpus	Tom Herbert	2011-08-17	1	-5/+7
\| \| \| \| \| \| \|	Use some variables for clarity and extensibility. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	af_iucv: add HiperSockets transport	Ursula Braun	2011-08-13	1	-72/+677
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current transport mechanism for af_iucv is the z/VM offered communications facility IUCV. To provide equivalent support when running Linux in an LPAR, HiperSockets transport is added to the AF_IUCV address family. It requires explicit binding of an AF_IUCV socket to a HiperSockets device. A new packet_type ETH_P_AF_IUCV is announced. An af_iucv specific transport header is defined preceding the skb data. A small protocol is implemented for connecting and for flow control/congestion management. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	af_iucv: cleanup - use iucv_sk(sk) early	Ursula Braun	2011-08-13	1	-21/+23
\| \| \| \| \| \| \| \|	Code cleanup making make use of local variable for struct iucv_sock. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	af_iucv: use loadable iucv interface	Frank Blaschka	2011-08-13	1	-45/+74
\| \| \| \| \| \| \| \|	For future af_iucv extensions the module should be able to run in LPAR mode too. For this we use the new dynamic loading iucv interface. Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	iucv: kernel option for z/VM IUCV and HiperSockets	Ursula Braun	2011-08-13	1	-6/+8
\| \| \| \| \| \| \| \| \| \|	When adding HiperSockets transport to AF_IUCV Sockets, af_iucv either depends on IUCV or QETH_L3 (or both). This patch introduces the necessary changes for kernel configuration. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	iucv: introduce loadable iucv interface	Frank Blaschka	2011-08-13	1	-0/+23
\| \| \| \| \| \| \|	This patch adds a symbol to dynamically load iucv functions. Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: cleanup some rcu_dereference_raw	Eric Dumazet	2011-08-12	10	-23/+22
\| \| \| \| \| \| \| \| \|	RCU api had been completed and rcu_access_pointer() or rcu_dereference_protected() are better than generic rcu_dereference_raw() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	neigh: reduce arp latency	Eric Dumazet	2011-08-12	1	-14/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the artificial HZ latency on arp resolution. Instead of firing a timer in one jiffy (up to 10 ms if HZ=100), lets send the ARP message immediately. Before patch : # arp -d 192.168.20.108 ; ping -c 3 192.168.20.108 PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data. 64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=9.91 ms 64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.065 ms 64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.061 ms After patch : $ arp -d 192.168.20.108 ; ping -c 3 192.168.20.108 PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data. 64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=0.152 ms 64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.064 ms 64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.074 ms Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	rtnetlink: remove initialization of dev->real_num_tx_queues	Jiri Pirko	2011-08-11	1	-1/+0
\| \| \| \| \| \| \|	dev->real_num_tx_queues is correctly set already in alloc_netdev_mqs. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: fix potential neighbour race in dst_ifdown()	Eric Dumazet	2011-08-09	1	-5/+10
\| \| \| \| \| \| \| \| \| \| \| \|	Followup of commit f2c31e32b378a (fix NULL dereferences in check_peer_redir()). We need to make sure dst neighbour doesnt change in dst_ifdown(). Fix some sparse errors. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net	David S. Miller	2011-08-07	41	-103/+353
\|\
\| *	ipv4: use dst with ref during bcast/mcast loopback	Julian Anastasov	2011-08-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make sure skb dst has reference when moving to another context. Currently, I don't see protocols that can hit it when sending broadcasts/multicasts to loopback using noref dsts, so it is just a precaution. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: route non-local sources for raw socket	Julian Anastasov	2011-08-07	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The raw sockets can provide source address for routing but their privileges are not considered. We can provide non-local source address, make sure the FLOWI_FLAG_ANYSRC flag is set if socket has privileges for this, i.e. based on hdrincl (IP_HDRINCL) and transparent flags. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	netfilter: TCP and raw fix for ip_route_me_harder	Julian Anastasov	2011-08-07	1	-10/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TCP in some cases uses different global (raw) socket to send RST and ACK. The transparent flag is not set there. Currently, it is a problem for rerouting after the previous change. Fix it by simplifying the checks in ip_route_me_harder and use FLOWI_FLAG_ANYSRC even for sockets. It looks safe because the initial routing allowed this source address to be used and now we just have to make sure the packet is rerouted. As a side effect this also allows rerouting for normal raw sockets that use spoofed source addresses which was not possible even before we eliminated the ip_route_input call. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: Fix ip_getsockopt for IP_PKTOPTIONS	Daniel Baluta	2011-08-07	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IP_PKTOPTIONS is broken for 32-bit applications running in COMPAT mode on 64-bit kernels. This happens because msghdr's msg_flags field is always set to zero. When running in COMPAT mode this should be set to MSG_CMSG_COMPAT instead. Signed-off-by: Tiberiu Szocs-Mihai <tszocs@ixiacom.com> Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: fix the reusing of routing cache entries	Julian Anastasov	2011-08-07	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	compare_keys and ip_route_input_common rely on rt_oif for distinguishing of input and output routes with same keys values. But sometimes the input route has also same hash chain (keyed by iif != 0) with the output routes (keyed by orig_oif=0). Problem visible if running with small number of rhash_entries. Fix them to use rt_route_iif instead. By this way input route can not be returned to users that request output route. The patch fixes the ip_rt_bug errors that were reported in ip_local_out context, mostly for 255.255.255.255 destinations. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	netfilter: avoid double free in nf_reinject	Julian Anastasov	2011-08-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	NF_STOLEN means skb was already freed Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: Compute protocol sequence numbers and fragment IDs using MD5.	David S. Miller	2011-08-06	11	-8/+195
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Computers have become a lot faster since we compromised on the partial MD4 hash which we use currently for performance reasons. MD5 is a much safer choice, and is inline with both RFC1948 and other ISS generators (OpenBSD, Solaris, etc.) Furthermore, only having 24-bits of the sequence number be truly unpredictable is a very serious limitation. So the periodic regeneration and 8-bit counter have been removed. We compute and use a full 32-bit sequence number. For ipv6, DCCP was found to use a 32-bit truncated initial sequence number (it needs 43-bits) and that is fixed here as well. Reported-by: Dan Kaminsky <dan@doxpara.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: check for IPv4 mapped addresses when connecting IPv6 sockets	Max Matveev	2011-08-05	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When support for binding to 'mapped INADDR_ANY (::ffff.0.0.0.0)' was added in 0f8d3c7ac3693d7b6c731bf2159273a59bf70e12 the rest of the code wasn't told so now it's possible to bind IPv6 datagram socket to ::ffff.0.0.0.0, connect it to another IPv4 address and it will all work except for getsockhame() which does not return the local address as expected. To give getsockname() something to work with check for 'mapped INADDR_ANY' when connecting and update the in-core source addresses appropriately. Signed-off-by: Max Matveev <makc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: Fix security_socket_sendmsg() bypass problem.	Tetsuo Handa	2011-08-05	1	-9/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The sendmmsg() introduced by commit 228e548e "net: Add sendmmsg socket system call" is capable of sending to multiple different destination addresses. SMACK is using destination's address for checking sendmsg() permission. However, security_socket_sendmsg() is called for only once even if multiple different destination addresses are passed to sendmmsg(). Therefore, we need to call security_socket_sendmsg() for each destination address rather than only the first destination address. Since calling security_socket_sendmsg() every time when only single destination address was passed to sendmmsg() is a waste of time, omit calling security_socket_sendmsg() unless destination address of previous datagram and that of current datagram differs. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Anton Blanchard <anton@samba.org> Cc: stable <stable@kernel.org> [3.0+] Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: Cap number of elements for sendmmsg	Anton Blanchard	2011-08-05	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To limit the amount of time we can spend in sendmmsg, cap the number of elements to UIO_MAXIOV (currently 1024). For error handling an application using sendmmsg needs to retry at the first unsent message, so capping is simpler and requires less application logic than returning EINVAL. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable <stable@kernel.org> [3.0+] Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: sendmmsg should only return an error if no messages were sent	Anton Blanchard	2011-08-05	1	-24/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sendmmsg uses a similar error return strategy as recvmmsg but it turns out to be a confusing way to communicate errors. The current code stores the error code away and returns it on the next sendmmsg call. This means a call with completely valid arguments could get an error from a previous call. Change things so we only return an error if no datagrams could be sent. If less than the requested number of messages were sent, the application must retry starting at the first failed one and if the problem is persistent the error will be returned. This matches the behaviour of other syscalls like read/write - it is not an error if less than the requested number of elements are sent. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable <stable@kernel.org> [3.0+] Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge branch 'for-davem' of ↵	David S. Miller	2011-08-03	1	-1/+1
\| \|\ \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
\| \| *	Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next ↵	John W. Linville	2011-08-03	1	-1/+1
\| \| \|\ \| \| \| \| \| \| \| \| \| \| \| \|	into for-davem
\| \| \| *	cfg80211: off by one in nl80211_trigger_scan()	Dan Carpenter	2011-08-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The test is off by one so we'd read past the end of the wiphy->bands[] array on the next line. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| * \| \|	net: fix NULL dereferences in check_peer_redir()	Eric Dumazet	2011-08-03	6	-22/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gergely Kalman reported crashes in check_peer_redir(). It appears commit f39925dbde778 (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	net: add kerneldoc to skb_copy_bits()	Eric Dumazet	2011-08-01	1	-2/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since skb_copy_bits() is called from assembly, add a fat comment to make clear we should think twice before changing its prototype. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	doc: Update the email address for Paul Moore in various source files	Paul Moore	2011-08-01	14	-15/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	My @hp.com will no longer be valid starting August 5, 2011 so an update is necessary. My new email address is employer independent so we don't have to worry about doing this again any time soon. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	atm: br2864: sent packets truncated in VC routed mode	Chas Williams	2011-08-01	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reported-by: Pascal Hambourg <pascal@plouf.fr.eu.org> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	sch_sfq: fix sfq_enqueue()	Eric Dumazet	2011-08-01	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 8efa88540635 (sch_sfq: avoid giving spurious NET_XMIT_CN signals) forgot to call qdisc_tree_decrease_qlen() to signal upper levels that a packet (from another flow) was dropped, leading to various problems. With help from Michal Soltys and Michal Pokrywka, who did a bisection. Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=39372 Debian ref: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945 Reported-by: Lucas Bocchi <lucas.bocchi@gmail.com> Reported-and-bisected-by: Michal Pokrywka <wolfmoon@o2.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Michal Soltys <soltys@ziu.info> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	net: adjust array index	Julia Lawall	2011-08-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert array index from the loop bound to the loop index. A simplified version of the semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression e1,e2,ar; @@ for(e1 = 0; e1 < e2; e1++) { <... ar[ - e2 + e1 ] ...> } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \| \|	rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER	Stephen Hemminger	2011-08-02	55	-193/+193
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When assigning a NULL value to an RCU protected pointer, no barrier is needed. The rcu_assign_pointer, used to handle that but will soon change to not handle the special case. Convert all rcu_assign_pointer of NULL value. //smpl @@ expression P; @@ - rcu_assign_pointer(P, NULL) + RCU_INIT_POINTER(P, NULL) // </smpl> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \| \|	ipv6: updates to privacy addresses per RFC 4941.	Lorenzo Colitti	2011-08-01	1	-21/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update the code to handle some of the differences between RFC 3041 and RFC 4941, which obsoletes it. Also a couple of janitorial fixes. - Allow router advertisements to increase the lifetime of temporary addresses. This was not allowed by RFC 3041, but is specified by RFC 4941. It is useful when RA lifetimes are lower than TEMP_{VALID,PREFERRED}_LIFETIME: in this case, the previous code would delete or deprecate addresses prematurely. - Change the default of MAX_RETRY to 3 per RFC 4941. - Add a comment to clarify that the preferred and valid lifetimes in inet6_ifaddr are relative to the timestamp. - Shorten lines to 80 characters in a couple of places. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \| \|	Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6	David S. Miller	2011-08-01	6	-24/+271
\|\ \ \ \
\| * \| \| \|	dccp ccid-2: check Ack Ratio when reducing cwnd	Samuel Jero	2011-08-01	1	-3/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch causes CCID-2 to check the Ack Ratio after reducing the congestion window. If the Ack Ratio is greater than the congestion window, it is reduced. This prevents timeouts caused by an Ack Ratio larger than the congestion window. In this situation, we choose to set the Ack Ratio to half the congestion window (or one if that's zero) so that if we loose one ack we don't trigger a timeout. Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
\| * \| \| \|	dccp ccid-2: increment cwnd correctly	Samuel Jero	2011-08-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes an issue where CCID-2 will not increase the congestion window for numerous RTTs after an idle period, application-limited period, or a loss once the algorithm is in Congestion Avoidance. What happens is that, when CCID-2 is in Congestion Avoidance mode, it will increase hc->tx_packets_acked by one for every packet and will increment cwnd every cwnd packets. However, if there is now an idle period in the connection, cwnd will be reduced, possibly below the slow start threshold. This will cause the connection to go into Slow Start. However, in Slow Start CCID-2 performs this test to increment cwnd every second ack: ++hc->tx_packets_acked == 2 Unfortunately, this will be incorrect, if cwnd previous to the idle period was larger than 2 and if tx_packets_acked was close to cwnd. For example: cwnd=50 and tx_packets_acked=45. In this case, the current code, will increment tx_packets_acked until it equals two, which will only be once tx_packets_acked (an unsigned 32-bit integer) overflows. My fix is simply to change that test for tx_packets_acked greater than or equal to two in slow start. Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
\| * \| \| \|	dccp ccid-2: prevent cwnd > Sequence Window	Samuel Jero	2011-08-01	2	-15/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a check to prevent CCID-2 from increasing the cwnd greater than the Sequence Window. When the congestion window becomes bigger than the Sequence Window, CCID-2 will attempt to keep more data in the network than the DCCP Sequence Window code considers possible. This results in the Sequence Window code issuing a Sync, thereby inducing needless overhead. Further, if this occurs at the sender, CCID-2 will never detect the problem because the Acks it receives will indicate no losses. I have seen this cause a drop of 1/3rd in throughput for a connection. Also add code to adjust the Sequence Window to be about 5 times the number of packets in the network (RFC 4340, 7.5.2) and to adjust the Ack Ratio so that the remote Sequence Window will hold about 5 times the number of packets in the network. This allows the congestion window to increase correctly without being limited by the Sequence Window. Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
\| * \| \| \|	dccp ccid-2: use feature-negotiation to report Ack Ratio changes	Gerrit Renker	2011-08-01	2	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This uses the new feature-negotiation framework to signal Ack Ratio changes, as required by RFC 4341, sec. 6.1.2. That raises some problems with CCID-2, which at the moment can not cope gracefully with Ack Ratios > 1. Since these issues are not directly related to feature negotiation, they are marked by a FIXME. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
\| * \| \| \|	dccp: send Confirm options only once	Samuel Jero	2011-08-01	1	-5/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a connection is in the OPEN state, remove feature negotiation Confirm options from the list of options after sending them once; as such options are NOT supposed to be retransmitted and are ONLY supposed to be sent in response to a Change option (RFC 4340 6.2). Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
\| * \| \| \|	dccp: support for exchanging of NN options in established state 2/2	Gerrit Renker	2011-08-01	1	-0/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the receiver side and the (fast-path) activation part for dynamic changes of non-negotiable (NN) parameters in (PART)OPEN state. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
\| * \| \| \|	dccp: support for the exchange of NN options in established state 1/2	Gerrit Renker	2011-08-01	3	-0/+67
\| \| \|/ / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In contrast to static feature negotiation at the begin of a connection, this patch introduces support for exchange of dynamically changing options. Such an update/exchange is necessary in at least two cases: * CCID-2's Ack Ratio (RFC 4341, 6.1.2) which changes during the connection; * Sequence Window values that, as per RFC 4340, 7.5.2, should be sent "as the connection progresses". Both are non-negotiable (NN) features, which means that no new capabilities are negotiated, but rather that changes in known parameters are brought up-to-date at either end. Thse characteristics are reflected by the implementation: * only NN options can be exchanged after connection setup; * an ack is scheduled directly after activation to speed up the update; * CCIDs may request changes to an NN feature even if a negotiation for that feature is already underway: this is required by CCID-2, where changes in cwnd necessitate Ack Ratio changes, such that the previous Ack Ratio (which is still being negotiated) would cause irrecoverable RTO timeouts (thanks to work by Samuel Jero). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
* \| \| \|	ip6tnl: avoid touching dst refcount in ip6_tnl_xmit2()	Eric Dumazet	2011-08-01	1	-13/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Even using percpu stats, we still hit tunnel dst_entry refcount in ip6_tnl_xmit2() Since we are in RCU locked section, we can use skb_dst_set_noref() and avoid these atomic operations, leaving dst shared on cpus. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \| \|	ipv6: avoid a dst_entry refcount change in ipv6_destopt_rcv()	Eric Dumazet	2011-08-01	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ipv6_destopt_rcv() runs with rcu_read_lock(), so there is no need to take a temporay reference on dst_entry, even if skb is freed by ip6_parse_tlv() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \| \|	ipv6: use RCU in inet6_csk_xmit()	Eric Dumazet	2011-08-01	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use RCU to avoid changing dst_entry refcount in fast path. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \| \|	ipv6: some RCU conversions	Eric Dumazet	2011-08-01	2	-35/+21
\| \|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ICMP and ND are not fast path, but still we can avoid changing idev refcount, using RCU. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>