op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	tcp: change tcp_adv_win_scale and tcp_rmem[2]	Eric Dumazet	2012-05-02	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_adv_win_scale default value is 2, meaning we expect a good citizen skb to have skb->len / skb->truesize ratio of 75% (3/4) In 2.6 kernels we (mis)accounted for typical MSS=1460 frame : 1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392. So these skbs were considered as not bloated. With recent truesize fixes, a typical MSS=1460 frame truesize is now the more precise : 2048 + 256 = 2304. But 2304 * 3/4 = 1728. So these skb are not good citizen anymore, because 1460 < 1728 (GRO can escape this problem because it build skbs with a too low truesize.) This also means tcp advertises a too optimistic window for a given allocated rcvspace : When receiving frames, sk_rmem_alloc can hit sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often, especially when application is slow to drain its receive queue or in case of losses (netperf is fast, scp is slow). This is a major latency source. We should adjust the len/truesize ratio to 50% instead of 75% This patch : 1) changes tcp_adv_win_scale default to 1 instead of 2 2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account better truesize tracking and to allow autotuning tcp receive window to reach same value than before. Note that same amount of kernel memory is consumed compared to 2.6 kernels. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	TCP: update ip_local_port_range documentation	Fernando Luis Vazquez Cao	2012-04-03	1	-9/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The explanation of ip_local_port_range in Documentation/networking/ip-sysctl.txt contains several factual errors: - The default value of ip_local_port_range does not depend on the amount of memory available in the system. - tcp_tw_recycle is not enabled by default. - 1024-4999 is not the default value. - Etc. Clean up the mess. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2011-12-06	1	-5/+5
\|\
\| *	ipv4:correct description for tcp_max_syn_backlog	Peter Pan(潘卫平)	2011-12-06	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit c5ed63d66f24(tcp: fix three tcp sysctls tuning), sysctl_max_syn_backlog is determined by tcp_hashinfo->ehash_mask, and the minimal value is 128, and it will increase in proportion to the memory of machine. The original description for tcp_max_syn_backlog and sysctl_max_syn_backlog are out of date. Changelog: V2: update description for sysctl_max_syn_backlog Signed-off-by: Weiping Pan <panweiping3@gmail.com> Reviewed-by: Shan Wei <shanwei88@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	tcp: inherit listener congestion control for passive cnx	Eric Dumazet	2011-11-30	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rick Jones reported that TCP_CONGESTION sockopt performed on a listener was ignored for its children sockets : right after accept() the congestion control for new socket is the system default one. This seems an oversight of the initial design (quoted from Stephen) Based on prior investigation and patch from Rick. Reported-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Yuchung Cheng <ycheng@google.com> Tested-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	neigh: new unresolved queue limits	Eric Dumazet	2011-11-14	1	-0/+10
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit : > From: David Miller <davem@davemloft.net> > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST) > > > From: Eric Dumazet <eric.dumazet@gmail.com> > > Date: Wed, 09 Nov 2011 12:14:09 +0100 > > > >> unres_qlen is the number of frames we are able to queue per unresolved > >> neighbour. Its default value (3) was never changed and is responsible > >> for strange drops, especially if IP fragments are used, or multiple > >> sessions start in parallel. Even a single tcp flow can hit this limit. > > ... > > > > Ok, I've applied this, let's see what happens :-) > > Early answer, build fails. > > Please test build this patch with DECNET enabled and resubmit. The > decnet neigh layer still refers to the removed ->queue_len member. > > Thanks. Ouch, this was fixed on one machine yesterday, but not the other one I used this morning, sorry. [PATCH V5 net-next] neigh: new unresolved queue limits unres_qlen is the number of frames we are able to queue per unresolved neighbour. Its default value (3) was never changed and is responsible for strange drops, especially if IP fragments are used, or multiple sessions start in parallel. Even a single tcp flow can hit this limit. $ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108 PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data. 8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: min_pmtu default is 552	Eric Dumazet	2011-11-08	1	-1/+1
\| \| \| \| \| \| \|	Small fix in Documentation, since min_pmtu is 512 + 20 + 20 = 552 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of github.com:davem330/net	David S. Miller	2011-10-07	1	-2/+2
\|\ \| \| \| \| \| \| \| \|	Conflicts: net/batman-adv/soft-interface.c
\| *	net: Documentation: Fix type of variables	Roy.Li	2011-09-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Roy.Li <rongqing.li@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'master' of github.com:davem330/net	David S. Miller	2011-09-22	1	-1/+1
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c
\| *	net: Documentation: RFC 2553bis is now RFC 3493	Geoffrey Thomas	2011-08-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Geoffrey Thomas <geofft@mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv6: Send ICMPv6 RSes only when RAs are accepted	Tore Anderson	2011-09-16	1	-9/+8
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch improves the logic determining when to send ICMPv6 Router Solicitations, so that they are 1) always sent when the kernel is accepting Router Advertisements, and 2) never sent when the kernel is not accepting RAs. In other words, the operational setting of the "accept_ra" sysctl is used. The change also makes the special "Hybrid Router" forwarding mode ("forwarding" sysctl set to 2) operate exactly the same as the standard Router mode (forwarding=1). The only difference between the two was that RSes was being sent in the Hybrid Router mode only. The sysctl documentation describing the special Hybrid Router mode has therefore been removed. Rationale for the change: Currently, the value of forwarding sysctl is the only thing determining whether or not to send RSes. If it has the value 0 or 2, they are sent, otherwise they are not. This leads to inconsistent behaviour in the following cases: * accept_ra=0, forwarding=0 * accept_ra=0, forwarding=2 * accept_ra=1, forwarding=2 * accept_ra=2, forwarding=1 In the first three cases, the kernel will send RSes, even though it will not accept any RAs received in reply. In the last case, it will not send any RSes, even though it will accept and process any RAs received. (Most routers will send unsolicited RAs periodically, so suppressing RSes in the last case will merely delay auto-configuration, not prevent it.) Also, it is my opinion that having the forwarding sysctl control RS sending behaviour (completely independent of whether RAs are being accepted or not) is simply not what most users would intuitively expect to be the case. Signed-off-by: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2011-07-14	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c
\| *	net: Fix default in docs for tcp_orphan_retries.	David S. Miller	2011-07-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Default should be listed at 8 instead of 7. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Update documented default values for various TCP/UDP tunables	Max Matveev	2011-07-04	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_rmem and tcp_wmem use 1 page as default value for the minimum amount of memory to be used, same as udp_wmem_min and udp_rmem_min. Pages are different size on different architectures - use the right units when describing the defaults. Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Max Matveev <makc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables	Max Matveev	2011-07-04	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sctp does not use second and third ("default" and "max") values of sctp_rmem tunable. The format is the same as tcp_rmem but the meaning is different so make the documentation explicit to avoid confusion. sctp_wmem is not used at all. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Max Matveev <makc@redhat.com> Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	inetpeer: remove unused list	Eric Dumazet	2011-06-08	1	-10/+0
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Andi Kleen and Tim Chen reported huge contention on inetpeer unused_peers.lock, on memcached workload on a 40 core machine, with disabled route cache. It appears we constantly flip peers refcnt between 0 and 1 values, and we must insert/remove peers from unused_peers.list, holding a contended spinlock. Remove this list completely and perform a garbage collection on-the-fly, at lookup time, using the expired nodes we met during the tree traversal. This removes a lot of code, makes locking more standard, and obsoletes two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also removes two pointers in inet_peer structure. There is still a false sharing effect because refcnt is in first cache line of object [were the links and keys used by lookups are located], we might move it at the end of inet_peer structure to let this first cache line mostly read by cpus. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <andi@firstfloor.org> CC: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: document tcp_max_ssthresh (Limited Slow-Start)	Ilpo Järvinen	2011-02-20	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Base on Ilpo's patch about documenting tcp_max_ssthresh. (see http://marc.info/?l=linux-netdev&m=117950581307310&w=2) According to errata of RFC3742, fix the number of segments increased during RTT time. Just to state the occasion to use this parameter, But about how to set parameter value, maybe some others can do it. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp_ecn is an integer not a boolean	Peter Chubb	2011-02-02	1	-1/+1
\| \| \| \| \| \| \| \| \|	There was some confusion at LCA as to why the sysctl tcp_ecn took one of three values when it was documented as a Boolean. This patch fixes the documentation. Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: change ip_default_ttl documentation	Eric Dumazet	2010-12-13	1	-1/+3
\| \| \| \| \|	Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2010-12-08	1	-0/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath/ath9k/ar9003_eeprom.c net/llc/af_llc.c
\| *	tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)	Alexey Dobriyan	2010-11-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_win_from_space() does the following: if (sysctl_tcp_adv_win_scale <= 0) return space >> (-sysctl_tcp_adv_win_scale); else return space - (space >> sysctl_tcp_adv_win_scale); "space" is int. As per C99 6.5.7 (3) shifting int for 32 or more bits is undefined behaviour. Indeed, if sysctl_tcp_adv_win_scale is exactly 32, space >> 32 equals space and function returns 0. Which means we busyloop in tcp_fixup_rcvbuf(). Restrict net.ipv4.tcp_adv_win_scale to [-31, 31]. Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312 Steps to reproduce: echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale wget www.kernel.org [softlockup] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	clarify documentation for net.ipv4.igmp_max_memberships	Jeremy Eder	2010-11-17	1	-3/+21
\|/ \| \| \| \| \| \| \| \| \| \|	This patch helps clarify documentation for net.ipv4.igmp_max_memberships by providing a formula for calculating the maximum number of multicast groups that can be subscribed to, plus defining the theoretical limit. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Jeremy Eder <jeder@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	docs: Add neigh/gc_thresh3 and route/max_size documentation.	Ben Greear	2010-11-12	1	-0/+9
\| \| \| \| \|	Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv6: Update ip-sysctl.txt documentation for recent changes to accept_ra and ↵	Thomas Graf	2010-09-03	1	-5/+22
\| \| \| \| \| \| \| \| \| \|	forwarding Documentation for recent changes to the tunables accept_ra and forwarding. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	arp_notify: document that a gratuitous ARP request is sent when this option ↵	Ian Campbell	2010-05-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	is enabled This option causes a gratuitous ARP request, not a reply as the documentation currently suggests. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: reserve ports for applications using fixed port numbers	Amerigo Wang	2010-05-15	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(Dropped the infiniband part, because Tetsuo modified the related code, I will send a separate patch for it once this is accepted.) This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which allows users to reserve ports for third-party applications. The reserved ports will not be used by automatic port assignments (e.g. when calling connect() or bind() with port number 0). Explicit port allocation behavior is unchanged. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2010-02-25	1	-4/+4
\|\ \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
\| *	IPv6: better document max_addresses parameter	Brian Haley	2010-02-23	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Andrew Morton wrote: >> >From ip-sysctl.txt file in kernel documentation I can see following description >> for max_addresses: >> max_addresses - INTEGER >> Number of maximum addresses per interface. 0 disables limitation. >> It is recommended not set too large value (or 0) because it would >> be too easy way to crash kernel to allow to create too much of >> autoconfigured addresses. ^^^^^^^^^^^^^^ >> If this parameter applies only for auto-configured IP addressed, please state >> it more clearly in docs or rename the parameter to show that it refers to >> auto-configuration. It did mention autoconfigured in the text, but the below makes it more obvious. More clearly document IPv6 max_addresses parameter. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: TCP thin dupack	Andreas Petlund	2010-02-18	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enables fast retransmissions after one dupACK for TCP if the stream is identified as thin. This will reduce latencies for thin streams that are not able to trigger fast retransmissions due to high packet interarrival time. This mechanism is only active if enabled by iocontrol or syscontrol and the stream is identified as thin. Signed-off-by: Andreas Petlund <apetlund@simula.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: TCP thin linear timeouts	Andreas Petlund	2010-02-18	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch will make TCP use only linear timeouts if the stream is thin. This will help to avoid the very high latencies that thin stream suffer because of exponential backoff. This mechanism is only active if enabled by iocontrol or syscontrol and the stream is identified as thin. A maximum of 6 linear timeouts is tried before exponential backoff is resumed. Signed-off-by: Andreas Petlund <apetlund@simula.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv4: allow warming up the ARP cache with request type gratuitous ARP	Octavian Purdila	2010-01-19	1	-3/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the per device ARP_ACCEPT option is enable, currently we only allow creating new ARP cache entries for response type gratuitous ARP. Allowing gratuitous ARP to create new ARP entries (not only to update existing ones) is useful when we want to avoid unnecessary delays for the first packet of a stream. This patch allows request type gratuitous ARP to create new ARP cache entries as well. This is useful when we want to populate the ARP cache entries for a large number of hosts on the same LAN. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: RFC3069, private VLAN proxy arp support	Jesper Dangaard Brouer	2010-01-07	1	-0/+19
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is to be used together with switch technologies, like RFC3069, that where the individual ports are not allowed to communicate with each other, but they are allowed to talk to the upstream router. As described in RFC 3069, it is possible to allow these hosts to communicate through the upstream router by proxy_arp'ing. This patch basically allow proxy arp replies back to the same interface (from which the ARP request/solicitation was received). Tunable per device via proc "proxy_arp_pvlan": /proc/sys/net/ipv4/conf/*/proxy_arp_pvlan This switch technology is known by different vendor names: - In RFC 3069 it is called VLAN Aggregation. - Cisco and Allied Telesyn call it Private VLAN. - Hewlett-Packard call it Source-Port filtering or port-isolation. - Ericsson call it MAC-Forced Forwarding (RFC Draft). Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4 05/05: add sysctl to accept packets with local source addresses	Patrick McHardy	2009-12-03	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 8ec1e0ebe26087bfc5c0394ada5feb5758014fc8 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:16:35 2009 +0100 ipv4: add sysctl to accept packets with local source addresses Change fib_validate_source() to accept packets with a local source address when the "accept_local" sysctl is set for the incoming inet device. Combined with the previous patches, this allows to communicate between multiple local interfaces over the wire. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS	William Allen Simpson	2009-12-02	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Define sysctl (tcp_cookie_size) to turn on and off the cookie option default globally, instead of a compiled configuration option. Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant data values, retrieving variable cookie values, and other facilities. Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h, near its corresponding struct tcp_options_received (prior to changes). This is a straightforward re-implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 These functions will also be used in subsequent patches that implement additional features. Requires: net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2009-12-02	1	-2/+2
\|\ \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
\| *	ip: update the description of rp_filter in ip-sysctl.txt	Shan Wei	2009-12-02	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The commit 27fed4175acf81ddd91d9a4ee2fd298981f60295 (ip: fix logic of reverse path filter sysctl) has changed the logic of rp_filter. The document about rp_filter is out of date. Now, setting conf/all/rp_filte with 0 can also enable source validation. Update the document according to the commit. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	make TLLAO option for NA packets configurable	Octavian Purdila	2009-10-07	1	-0/+18
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Friday 02 October 2009 20:53:51 you wrote: > This is good although I would have shortened the name. Ah, I knew I forgot something :) Here is v4. tavi >From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001 From: Octavian Purdila <opurdila@ixiacom.com> Date: Fri, 2 Oct 2009 00:51:15 +0300 Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs Neighbor advertisements responding to unicast neighbor solicitations did not include the target link-layer address option. This patch adds a new sysctl option (disabled by default) which controls whether this option should be sent even with unicast NAs. The need for this arose because certain routers expect the TLLAO in some situations even as a response to unicast NS packets. Moreover, RFC 2461 recommends sending this to avoid a race condition (section 4.4, Target link-layer address) Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sctp: Sysctl configuration for IPv4 Address Scoping	Bhaskar Dutta	2009-09-04	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces a new sysctl option to make IPv4 Address Scoping configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>. In networking environments where DNAT rules in iptables prerouting chains convert destination IP's to link-local/private IP addresses, SCTP connections fail to establish as the INIT chunk is dropped by the kernel due to address scope match failure. For example to support overlapping IP addresses (same IP address with different vlan id) a Layer-5 application listens on link local IP's, and there is a DNAT rule that maps the destination IP to a link local IP. Such applications never get the SCTP INIT if the address-scoping draft is strictly followed. This sysctl configuration allows SCTP to function in such unconventional networking environments. Sysctl options: 0 - Disable IPv4 address scoping draft altogether 1 - Enable IPv4 address scoping (default, current behavior) 2 - Enable address scoping but allow IPv4 private addresses in init/init-ack 3 - Enable address scoping but allow IPv4 link local address in init/init-ack Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
*	RTO connection timeout: sysctl documentation update	Damian Lukowski	2009-09-01	1	-11/+26
\| \| \| \| \| \| \| \|	This patch updates the sysctl documentation concerning the interpretation of tcp_retries{1,2} and tcp_orphan_retries. Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
*	IPv6: Add 'autoconf' and 'disable_ipv6' module parameters	Brian Haley	2009-06-01	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add 'autoconf' and 'disable_ipv6' parameters to the IPv6 module. The first controls if IPv6 addresses are autoconfigured from prefixes received in Router Advertisements. The IPv6 loopback (::1) and link-local addresses are still configured. The second controls if IPv6 addresses are desired at all. No IPv6 addresses will be added to any interfaces. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2009-05-18	1	-3/+12
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/scsi/fcoe/fcoe.c
\| *	Doc: fixed descriptions on /proc/sys/net/core/* and /proc/sys/net/unix/*	Wang Tinggong	2009-05-17	1	-3/+12
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Wang Tinggong <wangtinggong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	tcp: extend ECN sysctl to allow server-side only ECN	Ilpo Järvinen	2009-05-04	1	-1/+10
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This should be very safe compared with full enabled, so I see no reason why it shouldn't be done right away. As ECN can only be negotiated if the SYN sending party is also supporting it, somebody in the loop probably knows what he/she is doing. If SYN does not ask for ECN, the server side SYN-ACK is identical to what it is without ECN. Thus it's quite safe. The chosen value is safe w.r.t to existing configs which choose to currently set manually either 0 or 1 but silently upgrades those who have not explicitly requested ECN off. Whether to just enable both sides comes up time to time but unless that gets done now we can at least make the servers aware of ECN already. As there are some known problems to occur if ECN is enabled, it's currently questionable whether there's any real gain from enabling clients as servers mostly won't support it anyway (so we'd hit just the negative sides). After enabling the servers and getting that deployed, the client end enable really has some potential gain too. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv6: Fix incorrect disable_ipv6 behavior	Brian Haley	2009-03-18	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the behavior of allowing both sysctl and addrconf_dad_failure() to set the disable_ipv6 parameter without any bad side-effects. If DAD fails and accept_dad > 1, we will still set disable_ipv6=1, but then instead of allowing an RA to add an address then immediately fail DAD, we simply don't allow the address to be added in the first place. This also lets the user set this flag and disable all IPv6 addresses on the interface, or on the entire system. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Doc: Cleanup whitespaces in ip-sysctl.txt	Jesper Dangaard Brouer	2009-02-24	1	-59/+59
\| \| \| \| \| \| \|	Fix up whitespaces while going though ip-sysctl.txt anyway. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Doc: Fix typos in ip-sysctl.txt about rp_filter.	Jesper Dangaard Brouer	2009-02-24	1	-2/+2
\| \| \| \| \| \| \|	First fix a typo in Stephens patch ;-) Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ip: add loose reverse path filtering	Stephen Hemminger	2009-02-22	1	-9/+15
\| \| \| \| \| \| \| \| \| \| \|	Extend existing reverse path filter option to allow strict or loose filtering. (See http://en.wikipedia.org/wiki/Reverse_path_filtering). For compatibility with existing usage, the value 1 is chosen for strict mode and 2 for loose mode. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: add ARP notify option for devices	Stephen Hemminger	2009-02-01	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	This adds another inet device option to enable gratuitous ARP when device is brought up or address change. This is handy for clusters or virtualization. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: implement emergency route cache rebulds when gc_elasticity is exceeded	Neil Horman	2008-10-27	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a patch to provide on demand route cache rebuilding. Currently, our route cache is rebulid periodically regardless of need. This introduced unneeded periodic latency. This patch offers a better approach. Using code provided by Eric Dumazet, we compute the standard deviation of the average hash bucket chain length while running rt_check_expire. Should any given chain length grow to larger that average plus 4 standard deviations, we trigger an emergency hash table rebuild for that net namespace. This allows for the common case in which chains are well behaved and do not grow unevenly to not incur any latency at all, while those systems (which may be being maliciously attacked), only rebuild when the attack is detected. This patch take 2 other factors into account: 1) chains with multiple entries that differ by attributes that do not affect the hash value are only counted once, so as not to unduly bias system to rebuilding if features like QOS are heavily used 2) if rebuilding crosses a certain threshold (which is adjustable via the added sysctl in this patch), route caching is disabled entirely for that net namespace, since constant rebuilding is less efficient that no caching at all Tested successfully by me. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>