path: root/net
Commit message | Author | Age | Files | Lines
* net/hsr: Better variable names and update of contact info. | Arvid Brodin | 2014-07-08 | 8 | -305/+304
  Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* ipconfig: add static to local variable | Fabian Frederick | 2014-07-08 | 1 | -1/+1
  ic_dev_xid is only used in ipconfig.c
  Cc: "David S. Miller" <davem@davemloft.net>
  Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
  Cc: netdev@vger.kernel.org
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* mac802154: at86rf230: add hw flags and merge ops | Alexander Aring | 2014-07-07 | 1 | -14/+46
  This patch adds new mac802154 hw flags for transmit power, CSMA and listen
  before transmit (LBT). These flags indicate that the transceiver supports
  these features. If the flags are set but the driver doesn't implement the
  necessary functions, ieee802154_register_device returns -ENOSYS "Function
  not implemented". This patch also merges all at86rf230 operations into one
  operations structure and sets the right hw flags for the at86rf230
  transceivers.
  Signed-off-by: Alexander Aring <alex.aring@gmail.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Only do flow_dissector hash computation once per packet | Tom Herbert | 2014-07-07 | 1 | -0/+2
  Add an sw_hash flag to skbuff to indicate that skb->hash was computed from
  flow_dissector. This flag is checked in skb_get_hash to avoid repeatedly
  trying to compute the hash (i.e. in the case that no L4 hash can be
  computed).
  Signed-off-by: Tom Herbert <therbert@google.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
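  For reference, a minimal sketch of the check described above (sw_hash comes
  from this commit; the l4_hash bit and the exact skbuff layout are
  assumptions based on surrounding kernel code):

      /* Recompute the hash only when neither a device-provided L4 hash
       * nor a previously software-computed hash is present.
       */
      static inline __u32 skb_get_hash(struct sk_buff *skb)
      {
              if (!skb->l4_hash && !skb->sw_hash)
                      __skb_get_hash(skb);    /* runs the flow dissector once */

              return skb->hash;
      }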
* ipv6: Implement automatic flow label generation on transmit | Tom Herbert | 2014-07-07 | 6 | -5/+29
  Automatically generate flow labels for IPv6 packets on transmit. The flow
  label is computed based on skb_get_hash. The flow label will only be set
  automatically when it is otherwise zero (i.e. the flow label manager hasn't
  set one). This supports the transmit side functionality of RFC 6438.

  Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior system
  wide, and added an IPV6_AUTOFLOWLABEL socket option to enable this
  functionality per socket. By default, auto flowlabels are disabled to avoid
  possible conflicts with the flow label manager; however, if this feature
  proves useful we may want to enable it by default.

  It should also be noted that FreeBSD has already implemented automatic flow
  labels (including the sysctl and socket option). In FreeBSD, automatic flow
  labels default to enabled.

  Performance impact: running super_netperf with 200 flows for TCP_RR and
  UDP_RR for IPv6. Note that in the UDP case, __skb_get_hash will be called
  for every packet, which explains the slight regression. In the TCP case the
  hash is saved in the socket, so there is no regression.

  Automatic flow labels disabled:
    TCP_RR: 86.53% CPU utilization, 127/195/322 90/95/99% latencies, 1.40498e+06 tps
    UDP_RR: 90.70% CPU utilization, 118/168/243 90/95/99% latencies, 1.50309e+06 tps

  Automatic flow labels enabled:
    TCP_RR: 85.90% CPU utilization, 128/199/337 90/95/99% latencies, 1.40051e+06 tps
    UDP_RR: 92.61% CPU utilization, 115/164/236 90/95/99% latencies, 1.4687e+06 tps

  Signed-off-by: Tom Herbert <therbert@google.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
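  A hedged userspace example of turning on the per-socket option described
  above (the numeric fallback for IPV6_AUTOFLOWLABEL is an assumption for
  older headers; system wide, the same behavior is the net.ipv6.auto_flowlabels
  sysctl mentioned in the commit):

      #include <netinet/in.h>
      #include <sys/socket.h>
      #include <stdio.h>

      #ifndef IPV6_AUTOFLOWLABEL
      #define IPV6_AUTOFLOWLABEL 70   /* assumed uapi value if headers lack it */
      #endif

      int main(void)
      {
              int one = 1;
              int fd = socket(AF_INET6, SOCK_DGRAM, 0);

              if (fd < 0 || setsockopt(fd, IPPROTO_IPV6, IPV6_AUTOFLOWLABEL,
                                       &one, sizeof(one)) < 0) {
                      perror("IPV6_AUTOFLOWLABEL");
                      return 1;
              }
              /* Packets sent on fd now carry an auto-generated flow label
               * whenever no label has been set explicitly. */
              return 0;
      }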
* flow_dissector: Use IPv6 flow label in flow_dissector | Tom Herbert | 2014-07-07 | 1 | -0/+17
  This patch implements the receive side to support RFC 6438, which is to use
  the flow label as an ECMP hash. If an IPv6 flow label is set in a packet we
  can use this as input for computing an L4 hash. There should be no need to
  parse any transport headers in this case.
  Signed-off-by: Tom Herbert <therbert@google.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* vxlan: Call udp_flow_src_port | Tom Herbert | 2014-07-07 | 1 | -4/+1
  In vxlan and OVS vport-vxlan, call a common function to get the source port
  for a UDP tunnel. Removed vxlan_src_port since the functionality is now in
  udp_flow_src_port.
  Signed-off-by: Tom Herbert <therbert@google.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
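  A sketch of how a tunnel driver might use the shared helper (the
  udp_flow_src_port signature is reproduced from memory of include/net/udp.h
  and should be treated as an assumption; the wrapper name is hypothetical):

      #include <net/udp.h>

      /* Pick a UDP source port for an encapsulated packet from its flow
       * hash, constrained to the tunnel's configured [low, high] range.
       * A zero range means "use the kernel's default ephemeral range".
       */
      static __be16 tunnel_pick_src_port(struct net *net, struct sk_buff *skb,
                                         int low, int high)
      {
              return udp_flow_src_port(net, skb, low, high, true);
      }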
* net: Call skb_get_hash in get_xps_queue and __skb_tx_hash | Tom Herbert | 2014-07-07 | 1 | -24/+5
  Call the standard function to get a packet hash instead of taking this from
  skb->sk->sk_hash or only using skb->protocol.
  Signed-off-by: Tom Herbert <therbert@google.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Save TX flow hash in sock and set in skbuff on xmit | Tom Herbert | 2014-07-07 | 5 | -0/+10
  For a connected socket we can precompute the flow hash for setting in
  skb->hash on output. This is a performance advantage over calculating
  skb->hash for every packet on the connection. The computation is done using
  the common hash algorithm to be consistent with computations done for
  packets of the connection in other states where there is no socket (e.g.
  time-wait, syn-recv, syn-cookies).

  This patch adds sk_txhash to the sock structure. inet_set_txhash and
  ip6_set_txhash functions are added which are called from points in TCP and
  UDP where the socket moves to established state. skb_set_hash_from_sk is a
  function which sets skb->hash from the sock txhash value. This is called in
  the UDP and TCP transmit path when transmitting within the context of a
  socket.

  Tested: ran super_netperf with 200 TCP_RR streams over a vxlan interface
  (in this case skb_get_hash is called on every TX packet to create a UDP
  source port).

  Before fix:
    95.02% CPU utilization, 154/256/505 90/95/99% latencies, 1.13042e+06 tps
    Time in functions: 0.28% skb_flow_dissect, 0.21% __skb_get_hash

  After fix:
    94.95% CPU utilization, 156/254/485 90/95/99% latencies, 1.15447e+06 tps
    Neither __skb_get_hash nor skb_flow_dissect appear in perf

  Signed-off-by: Tom Herbert <therbert@google.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
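  A simplified sketch of the mechanism (skb_set_hash_from_sk and sk_txhash
  are named in the commit; the setter below is an illustrative stand-in for
  inet_set_txhash/ip6_set_txhash, which hash the real connection 4-tuple):

      #include <net/sock.h>
      #include <linux/skbuff.h>
      #include <linux/random.h>

      /* Pick a non-zero per-connection hash once (illustrative only;
       * the real setters derive it from the connection addresses/ports).
       */
      static inline void example_set_txhash(struct sock *sk)
      {
              sk->sk_txhash = prandom_u32();
              if (!sk->sk_txhash)
                      sk->sk_txhash = 1;
      }

      /* Copy the precomputed hash to every outgoing skb. */
      static inline void skb_set_hash_from_sk(struct sk_buff *skb,
                                              struct sock *sk)
      {
              if (sk->sk_txhash) {
                      skb->l4_hash = 1;
                      skb->hash = sk->sk_txhash;
              }
      }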
* flow_dissector: Abstract out hash computation | Tom Herbert | 2014-07-07 | 1 | -16/+28
  Move the hash computation located in __skb_get_hash to be a separate
  function which takes flow_keys as input. This will allow flow hash
  computation in other contexts where we only have addresses and ports.
  Signed-off-by: Tom Herbert <therbert@google.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* 6lowpan: mac802154: fix coding style issues | Varka Bhadram | 2014-07-07 | 15 | -145/+153
  This patch fixes the coding style issues reported by checkpatch.pl. The
  following issues are fixed:
    CHECK: Alignment should match open parenthesis
    WARNING: line over 80 characters
    CHECK: Blank lines aren't necessary before a close brace '}'
    WARNING: networking block comments don't use an empty /* line, use /* Comment...
    WARNING: Missing a blank line after declarations
    WARNING: networking block comments start with * on subsequent lines
    CHECK: braces {} should be used on all arms of this statement
  Signed-off-by: Varka Bhadram <varkab@cdac.in>
  Tested-by: Alexander Aring <alex.aring@gmail.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: Fix do_one_broadcast() prototype. | Rami Rosen | 2014-07-07 | 1 | -9/+6
  This patch changes the prototype of the do_one_broadcast() method so that
  it will return void.
  Signed-off-by: Rami Rosen <ramirose@gmail.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: fix link acknowledge logic in receive path | Erik Hugne | 2014-07-07 | 1 | -5/+5
  Link state acks triggered from the receive path are done before the last
  received packet has been processed by the link layer. The effect of this is
  that the last received packet will not be included in the ack. This causes
  problems if the link window is set to TIPC_MIN_LINK_WIN, where the ack
  interval will be equal to the link tolerance and the link enters a
  stop-and-go behavior. We move the ack logic to after link state processing,
  just before the packet is delivered to higher layers.
  Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
  Signed-off-by: Carl Sigurjonsson <carl.sigurjonsson@ericsson.com>
  Reviewed-by: Ying Xue <ying.xue@windriver.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: refactor message delivery out of tipc_rcv | Erik Hugne | 2014-07-07 | 1 | -44/+80
  This is a cosmetic change, separating message delivery from the link state
  processing.
  Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
  Reviewed-by: Ying Xue <ying.xue@windriver.com>
  Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: switch snt_synack back to measuring transmit time of first SYNACK | Neal Cardwell | 2014-07-07 | 2 | -4/+0
  Always store in snt_synack the time at which the server received the first
  client SYN and attempted to send the first SYNACK.

  Recent commit aa27fc501 ("tcp: tcp_v[46]_conn_request: fix snt_synack
  initialization") resolved an inconsistency between IPv4 and IPv6 in the
  initialization of snt_synack. This commit brings back the idea from
  843f4a55e ("tcp: use tcp_v4_send_synack on first SYN-ACK"), which was going
  for the original behavior of snt_synack from the commit where it was added
  in 9ad7c049f0f79 ("tcp: RFC2988bis + taking RTT sample from 3WHS for the
  passive open side") in v3.1.

  In addition to being simpler (and probably a tiny bit faster),
  unconditionally storing the time of the first SYNACK attempt has been
  useful because it allows calculating a performance metric quantifying how
  long it took to establish a passive TCP connection.

  Signed-off-by: Neal Cardwell <ncardwell@google.com>
  Signed-off-by: Yuchung Cheng <ycheng@google.com>
  Cc: Octavian Purdila <octavian.purdila@intel.com>
  Cc: Jerry Chu <hkchu@google.com>
  Acked-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* batman-adv: Use kasprintf | Himangi Saraogi | 2014-07-07 | 1 | -16/+8
  kasprintf combines kmalloc and sprintf, and takes care of the size
  calculation itself. The semantic patch that makes this change is as
  follows:

    // <smpl>
    @@
    expression a,flag;
    expression list args;
    statement S;
    @@

    a =
    -  \(kmalloc\|kzalloc\)(...,flag)
    +  kasprintf(flag,args)
      <... when != a
      if (a == NULL || ...) S
      ...>
    - sprintf(a,args);
    // </smpl>

  Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
  Acked-by: Julia Lawall <julia.lawall@lip6.fr>
  Signed-off-by: David S. Miller <davem@davemloft.net>
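  For illustration, a before/after sketch of the transformation the semantic
  patch performs (the helper name, format string and GFP flag are
  hypothetical):

      #include <linux/slab.h>
      #include <linux/string.h>

      static char *make_name(const char *prefix, unsigned int index)
      {
              /* Before: manual size calculation, allocation and formatting.
               *
               *      char *name = kmalloc(strlen(prefix) + 12, GFP_ATOMIC);
               *      if (!name)
               *              return NULL;
               *      sprintf(name, "%s-%u", prefix, index);
               *      return name;
               */

              /* After: kasprintf() sizes, allocates and formats in one call. */
              return kasprintf(GFP_ATOMIC, "%s-%u", prefix, index);
      }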
* vlan: Pass SIOC[SG]HWTSTAMP ioctls to real device | Stefan Sørensen | 2014-07-07 | 1 | -0/+2
  This allows applications to enable hardware timestamping without having to
  be aware that the interface is a vlan device and figure out the real device
  themselves.
  Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* ptp: Classify ptp over ip over vlan packets | Stefan Sørensen | 2014-07-07 | 1 | -6/+58
  This extends the ptp BPF to also match ptp-over-ip-over-vlan packets. The
  ptp classes are changed to orthogonal bitfields representing version,
  transport and vlan values to simplify matching.
  Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Simplify ptp class checks | Stefan Sørensen | 2014-07-07 | 1 | -37/+20
  Replace two switch statements enumerating all valid ptp classes with an if
  statement matching for not PTP_CLASS_NONE.
  Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sctp: only warn in proc_sctp_do_alpha_beta if write | Daniel Borkmann | 2014-07-02 | 1 | -2/+3
  Only warn if the value is written to alpha or beta. We don't care about
  emitting a one-time warning when only reading it.
  Reported-by: Jiri Pirko <jpirko@redhat.com>
  Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
  Reviewed-by: Jiri Pirko <jiri@resnulli.us>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sctp: improve timer slack calculation for transport HBs | Daniel Borkmann | 2014-07-02 | 1 | -8/+9
  RFC 4960, section 8.3 says:

    On an idle destination address that is allowed to heartbeat, it is
    recommended that a HEARTBEAT chunk is sent once per RTO of that
    destination address plus the protocol parameter 'HB.interval', with
    jittering of +/- 50% of the RTO value, and exponential backoff of the
    RTO if the previous HEARTBEAT is unanswered.

  Currently, we calculate jitter via the sctp_jitter() function first, and
  then add its result to the current RTO for the new timeout:

    TMO = RTO + (RAND() % RTO) - (RTO / 2)
                `--------------------------'
                       sctp_jitter()

  Instead, we can just simplify all this by directly calculating:

    TMO = (RTO / 2) + (RAND() % RTO)

  With the help of prandom_u32_max(), we don't need to open code our own
  global PRNG, but can instead just make use of the per-CPU implementation of
  prandom with better quality numbers. Also, we can now spare us the
  conditional for the divide-by-zero check since no div or mod operation
  needs to be used. Note that prandom_u32_max() won't emit the same result as
  a mod operation, but we really don't care here as we only want to have a
  random number scaled into the RTO interval.

  Note, exponential RTO backoff is handled elsewhere, namely in
  sctp_do_8_2_transport_strike().

  Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
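  A sketch of the new calculation described above, using prandom_u32_max() as
  the commit does (function and variable names are illustrative):

      #include <linux/random.h>

      /* Old scheme:  TMO = rto + (RAND() % rto) - (rto / 2)
       * New scheme:  TMO = (rto / 2) + prandom_u32_max(rto)
       * Both land in the interval [rto/2, 3*rto/2).
       */
      static unsigned long hb_timeout(u32 rto)
      {
              return (rto >> 1) + prandom_u32_max(rto);
      }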
* net/caif/caif_socket.c: remove unnecessary null test before debugfs_remove_recursive | Fabian Frederick | 2014-07-02 | 1 | -2/+1
  Based on checkpatch: "debugfs_remove_recursive(NULL) is safe this check is
  probably not required".
  Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
  Cc: "David S. Miller" <davem@davemloft.net>
  Cc: netdev@vger.kernel.org
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: move ipv6only in sock_common | Eric Dumazet | 2014-07-01 | 5 | -11/+8
  When a UDP application switches from AF_INET to AF_INET6 sockets, we have a
  small performance degradation for IPv4 communications because of extra
  cache line misses to access the ipv6only information.

  This can also be noticed for TCP listeners, as ipv6_only_sock() is also
  used from __inet_lookup_listener()->compute_score(). This is magnified when
  SO_REUSEPORT is used.

  Move ipv6only into struct sock_common so that it is available at no extra
  cost in lookups.

  Signed-off-by: Eric Dumazet <edumazet@google.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* pktgen: RCU-ify "if_list" to remove lock in next_to_run() | Jesper Dangaard Brouer | 2014-07-01 | 1 | -49/+52
  The if_lock()/if_unlock() in next_to_run() adds a significant overhead,
  because it is called for every packet in the busy loop of
  pktgen_thread_worker() (Thomas Graf originally pointed me at this lock
  problem). Removing these two "LOCK" operations should in theory save us
  approx. 16ns (8ns x 2); as illustrated below, we do save 16ns when removing
  the locks and introducing RCU protection.

  Performance data with CLONE_SKB==100000, TX-size=512, rx-usecs=30
  (single CPU performance, ixgbe 10Gbit/s, E5-2630):
    * Prev   : 5684009 pps --> 175.93ns (1/5684009*10^9)
    * RCU-fix: 6272204 pps --> 159.43ns (1/6272204*10^9)
    * Diff   : +588195 pps --> -16.50ns

  To understand this RCU patch, I describe the pktgen thread model below.

  In pktgen there are several kernel threads, but there is only one CPU
  running each kernel thread. Communication with the kernel threads is done
  through some thread control flags. This allows the thread to change data
  structures at a known synchronization point, see main thread func
  pktgen_thread_worker().

  Userspace changes are communicated through proc-file writes. There are
  three types of changes: general control changes "pgctrl"
  (func: pgctrl_write), thread changes "kpktgend_X"
  (func: pktgen_thread_write), and interface config changes "etcX@N"
  (func: pktgen_if_write).

  Userspace "pgctrl" and "thread" changes are synchronized via the mutex
  pktgen_thread_lock, thus only a single userspace instance can run. The
  mutex is taken while the packet generator is running, by pgctrl "start".
  Thus e.g. "add_device" cannot be invoked when pktgen is running/started.

  All "pgctrl" and all "thread" changes, except thread "add_device",
  communicate via the thread control flags. The main problem is the
  exception "add_device", which modifies the thread's "if_list" directly.

  Fortunately "add_device" cannot be invoked while pktgen is running. But
  there exists a race between "rem_device_all" and "add_device" (which
  normally doesn't occur, because "rem_device_all" waits 125ms before
  returning). Backgrounding "rem_device_all" and running "add_device"
  immediately allows the race to occur.

  The race affects the thread's (list of devices) "if_list". The if_lock is
  used for protecting this "if_list". Other readers are given lock-free
  access to the list under RCU read sections.

  Note, interface config changes (via proc) can occur while pktgen is
  running, which worries me a bit. I'm assuming proc_remove() takes
  appropriate locks, to assure no writers exist after proc_remove() finishes.

  I've been running a script exercising the race condition (leading me to fix
  the proc_remove order), without any issues. The script also exercises
  concurrent proc writes, while the interface config is getting removed.

  Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
  Reviewed-by: Florian Westphal <fw@strlen.de>
  Signed-off-by: David S. Miller <davem@davemloft.net>
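  A condensed sketch of the lock-free read side after the conversion (struct
  fields are reduced to essentials and the selection logic is simplified; the
  real code lives in net/core/pktgen.c):

      #include <linux/rcupdate.h>
      #include <linux/list.h>

      /* Walk the thread's device list under RCU instead of taking if_lock
       * for every packet. Device lifetime across the return is guaranteed
       * by the thread control flags that serialize add/remove.
       */
      static struct pktgen_dev *next_to_run(struct pktgen_thread *t)
      {
              struct pktgen_dev *pkt_dev, *best = NULL;

              rcu_read_lock();
              list_for_each_entry_rcu(pkt_dev, &t->if_list, list) {
                      if (!pkt_dev->running)
                              continue;
                      if (!best || pkt_dev->next_tx < best->next_tx)
                              best = pkt_dev;
              }
              rcu_read_unlock();
              return best;
      }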
* pktgen: avoid expensive set_current_state() call in loop | Jesper Dangaard Brouer | 2014-07-01 | 1 | -6/+3
  Avoid calling set_current_state() inside the busy-loop in
  pktgen_thread_worker(). In case of pkt_dev->delay, it is still used/enabled
  in pktgen_xmit() via the spin() call.

  The set_current_state(TASK_INTERRUPTIBLE) uses an xchg, which is implicitly
  LOCK prefixed. I've measured the asm LOCK operation to take approx. 8ns on
  this E5-2630 CPU. The performance increase correlates with this
  measurement.

  Performance data with CLONE_SKB==100000, rx-usecs=30
  (single CPU performance, ixgbe 10Gbit/s, E5-2630):
    * Prev: 5454050 pps --> 183.35ns (1/5454050*10^9)
    * Now : 5684009 pps --> 175.93ns (1/5684009*10^9)
    * Diff: +229959 pps --> -7.42ns

  Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* openvswitch: introduce rtnl ops stub | Jiri Pirko | 2014-07-01 | 3 | -1/+26
  This stub now allows userspace to see IFLA_INFO_KIND for an ovs master and
  IFLA_INFO_SLAVE_KIND for a slave.
  Signed-off-by: Jiri Pirko <jiri@resnulli.us>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnetlink: allow to register ops without ops->setup set | Jiri Pirko | 2014-07-01 | 2 | -3/+11
  So far, it is assumed that ops->setup is filled in. But there might be
  cases where ops make sense even without ->setup. In that case, forbid
  newlink and dellink. This allows registering simple rtnl link ops
  containing only ->kind, which gives a consistent way of passing the device
  kind (either device-kind or slave-kind) to userspace.
  Signed-off-by: Jiri Pirko <jiri@resnulli.us>
  Signed-off-by: David S. Miller <davem@davemloft.net>
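  A hedged sketch of the kind of minimal registration this enables (the
  "openvswitch" kind string comes from the ovs stub above; the variable and
  init function names are illustrative):

      #include <net/rtnetlink.h>

      /* No ->setup: newlink/dellink are refused, but the kernel can still
       * report IFLA_INFO_KIND / IFLA_INFO_SLAVE_KIND for these devices.
       */
      static struct rtnl_link_ops ovs_link_ops = {
              .kind = "openvswitch",
      };

      static int __init example_rtnl_init(void)
      {
              return rtnl_link_register(&ovs_link_ops);
      }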
* net: fix some typos in comment | Ying Xue | 2014-07-01 | 2 | -4/+4
  In commit 371121057607e3127e19b3fa094330181b5b031e ("net:
  QDISC_STATE_RUNNING dont need atomic bit ops") __QDISC_STATE_RUNNING was
  renamed to __QDISC___STATE_RUNNING, but the old name was not completely
  replaced in comments.
  Signed-off-by: Ying Xue <ying.xue@windriver.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Allow accepting RA from local IP addresses. | Ben Greear | 2014-07-01 | 2 | -8/+23
  This can be used in virtual networking applications, and may have other
  uses as well. The option is disabled by default.

  A specific use case is setting up virtual routers, bridges, and hosts on a
  single OS without the use of network namespaces or virtual machines. With
  proper use of ip rules, routing tables, veth interface pairs and/or other
  virtual interfaces, and applications that can bind to interfaces and/or IP
  addresses, it is possible to create one or more virtual routers with
  multiple hosts attached. The host interfaces can act as IPv6 systems, with
  radvd running on the ports in the virtual routers. With the option provided
  in this patch enabled, those hosts can now properly obtain IPv6 addresses
  from the radvd.
  Signed-off-by: Ben Greear <greearb@candelatech.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Add more debugging around accept-ra logic. | Ben Greear | 2014-07-01 | 1 | -8/+43
  This is disabled by default, just like similar debug info already in this
  module. But it makes it easier to find out why an RA is not being accepted
  when debugging strange behaviour.
  Signed-off-by: Ben Greear <greearb@candelatech.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: tcp_conn_request: fix build error when IPv6 is disabled | Octavian Purdila | 2014-06-29 | 1 | -1/+3
  Fixes build error introduced by commit 1fb6f159fd21c64 ("tcp: add
  tcp_conn_request"):

    net/ipv4/tcp_input.c: In function 'pr_drop_req':
    net/ipv4/tcp_input.c:5889:130: error: 'struct sock_common' has no member named 'skc_v6_daddr'

  Reported-by: kbuild test robot <fengguang.wu@intel.com>
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add tcp_conn_request | Octavian Purdila | 2014-06-27 | 3 | -244/+152
  Create tcp_conn_request and remove most of the code from
  tcp_v4_conn_request and tcp_v6_conn_request.
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add queue_add_hash to tcp_request_sock_ops | Octavian Purdila | 2014-06-27 | 2 | -2/+4
  Add queue_add_hash member to tcp_request_sock_ops so that we can later
  unify tcp_v4_conn_request and tcp_v6_conn_request.
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add mss_clamp to tcp_request_sock_ops | Octavian Purdila | 2014-06-27 | 2 | -2/+5
  Add mss_clamp member to tcp_request_sock_ops so that we can later unify
  tcp_v4_conn_request and tcp_v6_conn_request.
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: unify tcp_v4_rtx_synack and tcp_v6_rtx_synack | Octavian Purdila | 2014-06-27 | 3 | -27/+17
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add send_synack method to tcp_request_sock_ops | Octavian Purdila | 2014-06-27 | 2 | -9/+14
  Create a new tcp_request_sock_ops method to unify the IPv4/IPv6 signature
  for tcp_v[46]_send_synack. This allows us to later unify tcp_v4_rtx_synack
  with tcp_v6_rtx_synack and tcp_v4_conn_request with tcp_v6_conn_request.
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add init_seq method to tcp_request_sock_ops | Octavian Purdila | 2014-06-27 | 2 | -3/+5
  More work in preparation of unifying tcp_v4_conn_request and
  tcp_v6_conn_request: indirect the init sequence calls via the
  tcp_request_sock_ops.
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: move around a few calls in tcp_v6_conn_request | Octavian Purdila | 2014-06-27 | 1 | -11/+7
  Make the tcp_v6_conn_request call flow similar to that of
  tcp_v4_conn_request. Note that want_cookie can be true only if isn is zero,
  and that is why we can move the if (want_cookie) block out of the
  if (!isn) block.

  Moving security_inet_conn_request() has a couple of side effects: a missing
  inet_rsk(req)->ecn_ok update and a missing req->cookie_ts update. However,
  neither the SELinux nor the Smack security hooks seem to check them. This
  change should also avoid future different behaviour for IPv4 and IPv6 in
  the security hooks.
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Acked-by: Paul Moore <paul@paul-moore.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add route_req method to tcp_request_sock_ops | Octavian Purdila | 2014-06-27 | 2 | -13/+49
  Create wrappers with the same signature for the IPv4/IPv6 request routing
  calls and use these wrappers (via the route_req method from
  tcp_request_sock_ops) in tcp_v4_conn_request and tcp_v6_conn_request, with
  the purpose of unifying the two functions in a later patch.

  We can later drop the wrapper functions and modify inet_csk_route_req and
  inet6_csk_route_req to use the same signature.
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add init_cookie_seq method to tcp_request_sock_ops | Octavian Purdila | 2014-06-27 | 2 | -2/+8
  Move the specific IPv4/IPv6 cookie sequence initialization to a new method
  in tcp_request_sock_ops in preparation for unifying tcp_v4_conn_request and
  tcp_v6_conn_request.
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add init_req method to tcp_request_sock_ops | Octavian Purdila | 2014-06-27 | 2 | -35/+49
  Move the specific IPv4/IPv6 initializations to a new method in
  tcp_request_sock_ops in preparation for unifying tcp_v4_conn_request and
  tcp_v6_conn_request.
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
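  Taken together, this series grows tcp_request_sock_ops into a per-family
  operations table roughly like the following (member names are the ones
  mentioned in the commit messages above; the signatures are reconstructed
  from memory and should be treated as approximate):

      /* Approximate shape of tcp_request_sock_ops after the series. */
      struct tcp_request_sock_ops {
              u16     mss_clamp;
              /* ... pre-existing MD5 hooks ... */
              void (*init_req)(struct request_sock *req, struct sock *sk,
                               struct sk_buff *skb);
      #ifdef CONFIG_SYN_COOKIES
              __u32 (*cookie_init_seq)(struct sock *sk,
                                       const struct sk_buff *skb, __u16 *mss);
      #endif
              struct dst_entry *(*route_req)(struct sock *sk, struct flowi *fl,
                                             const struct request_sock *req,
                                             bool *strict);
              __u32 (*init_seq)(const struct sk_buff *skb);
              int (*send_synack)(struct sock *sk, struct dst_entry *dst,
                                 struct flowi *fl, struct request_sock *req,
                                 u16 queue_mapping,
                                 struct tcp_fastopen_cookie *foc);
              /* plus the request-queue hashing hook added above */
      };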
* net: remove inet6_reqsk_alloc | Octavian Purdila | 2014-06-27 | 3 | -3/+3
  Since pktopts is used only for IPv6 and opts is used only for IPv4, we can
  move these fields into a union, and this allows us to drop the
  inet6_reqsk_alloc function, as after this change it becomes equivalent to
  inet_reqsk_alloc.

  This patch also fixes a kmemcheck issue in the IPv6 stack: the flags field
  was not annotated after a request_sock was allocated.
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
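  A hedged sketch of the resulting layout (shown as an anonymous union inside
  struct inet_request_sock; the member types are reproduced from memory):

      /* IPv4 and IPv6 never need both members at once, so they can share
       * storage, and a single inet_reqsk_alloc() serves both families.
       */
      struct inet_request_sock {
              /* ... */
              union {
                      struct ip_options_rcu   *opt;     /* IPv4 options    */
                      struct sk_buff          *pktopts; /* IPv6 pktoptions */
              };
              /* ... */
      };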
* tcp: tcp_v[46]_conn_request: fix snt_synack initialization | Octavian Purdila | 2014-06-27 | 2 | -3/+2
  Commit 016818d07 ("tcp: TCP Fast Open Server - take SYNACK RTT after
  completing 3WHS") changes the code to only take a snt_synack timestamp when
  a SYNACK transmit or retransmit succeeds. This behaviour is later broken by
  commit 843f4a55e ("tcp: use tcp_v4_send_synack on first SYN-ACK"), as
  snt_synack is now updated even if tcp_v4_send_synack fails.

  Also, commit 3a19ce0ee ("tcp: IPv6 support for fastopen server") misses the
  required IPv6 updates for 016818d07.

  This patch makes sure that snt_synack is updated only when the SYNACK
  transmit/retransmit succeeds, for both IPv4 and IPv6.
  Cc: Cardwell <ncardwell@google.com>
  Cc: Daniel Lee <longinus00@gmail.com>
  Cc: Yuchung Cheng <ycheng@google.com>
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: cookie_v4_init_sequence: skb should be const | Octavian Purdila | 2014-06-27 | 1 | -1/+2
  Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: simplify connection congestion handling | Jon Paul Maloy | 2014-06-27 | 5 | -54/+43
  As a consequence of the recently introduced serialized access to the socket
  in commit 8d94168a761819d10252bab1f8de6d7b202c3baa ("tipc: same receive
  code path for connection protocol and data messages") we can make a number
  of simplifications in the detection and handling of connection congestion
  situations.

  - We don't need to keep two counters, one for sent messages and one for
    acked messages. There is no longer any risk for races between acknowledge
    messages arriving in BH and data message sending running in user context.
    So we merge this into one counter, 'sent_unacked', which is incremented
    at sending and subtracted from at acknowledge reception.

  - We don't need to set the 'congested' field in tipc_port to true before we
    send the message, and clear it when sending is successful. (As a matter
    of fact, it was never necessary; the field was set in
    link_schedule_port() before any wakeup could arrive anyway.)

  - We keep the conditions for link congestion and connection congestion
    separated. There would otherwise be a risk that an arriving acknowledge
    message may wake up a user sleeping because of link congestion.

  - We can simplify reception of acknowledge messages.

  We also make some cosmetic/structural changes:

  - We rename the 'congested' field to the more correct 'link_cong'.

  - We rename 'conn_unacked' to 'rcv_unacked'.

  - We move the above mentioned fields from struct tipc_port to struct
    tipc_sock.

  Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
  Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
  Reviewed-by: Ying Xue <ying.xue@windriver.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: clean up connection protocol reception function | Jon Paul Maloy | 2014-06-27 | 5 | -63/+68
  We simplify the code for receiving connection probes, leveraging the
  recently introduced tipc_msg_reverse() function. We also stick to the
  principle of sending a possible response message directly from the calling
  (tipc_sk_rcv or backlog_rcv) functions, hence making the call chain
  shallower and easier to follow.

  We make one small protocol change here, allowed according to the spec. If a
  protocol message arrives from a remote socket that is not the one we are
  connected to, we are currently generating a connection abort message and
  sending it to the source. This behavior is unnecessary, and might even be a
  security risk, so instead we now choose to only ignore the message. The
  consequence for the sender is that he will need longer time to discover his
  mistake (until the next timeout), but this is an extreme corner case, and
  may happen anyway under other circumstances, so we deem this change
  acceptable.
  Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
  Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
  Reviewed-by: Ying Xue <ying.xue@windriver.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: same receive code path for connection protocol and data messages | Jon Paul Maloy | 2014-06-27 | 6 | -55/+19
  As a preparation to eliminate port_lock we need to bring reception of
  connection protocol messages under proper protection of bh_lock_sock or
  socket owner. We fix this by letting those messages follow the same code
  path as incoming data messages.

  As a side effect of this change, the last reference to the function
  net_route_msg() disappears, and we can eliminate that function.
  Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
  Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
  Reviewed-by: Ying Xue <ying.xue@windriver.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: let port protocol senders use new link send function | Jon Paul Maloy | 2014-06-27 | 1 | -7/+23
  Several functions in port.c, related to the port protocol and connection
  shutdown, need to send messages. We now convert them to use the new link
  send function.
  Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
  Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
  Reviewed-by: Ying Xue <ying.xue@windriver.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: connection oriented transport uses new send functions | Jon Paul Maloy | 2014-06-27 | 4 | -513/+70
  We move the message sending across established connections to use the
  message preparation and send functions introduced earlier in this series.
  We now do the message preparation and call to the link send function
  directly from the socket, instead of going via the port layer.

  As a consequence of this change, the functions tipc_send(),
  tipc_port_iovec_rcv(), tipc_port_iovec_reject() and tipc_reject_msg()
  become unreferenced and can be eliminated from port.c. For the same reason,
  the functions tipc_link_xmit_fast(), tipc_link_iovec_xmit_long() and
  tipc_link_iovec_fast() can be eliminated from link.c.
  Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
  Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
  Reviewed-by: Ying Xue <ying.xue@windriver.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: RDM/DGRAM transport uses new fragmenting and sending functions | Jon Paul Maloy | 2014-06-27 | 3 | -149/+110
  We merge the code for sending port name and port identity addressed
  messages into the corresponding send functions in socket.c, and start using
  the new fragmenting and transmit functions we just have introduced. This
  saves a call level and quite a few code lines, as well as making this part
  of the code easier to follow.

  As a consequence, the functions tipc_send2name() and tipc_send2port() in
  port.c can be removed. For practical reasons, we break out the code for
  sending multicast messages from tipc_sendmsg() and move it into a separate
  function, tipc_sendmcast(), but we do not yet convert it into using the new
  build/send functions.
  Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
  Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
  Reviewed-by: Ying Xue <ying.xue@windriver.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>