summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* af_key: remove a duplicated skb_orphan()Cong Wang2013-01-281-1/+0
| | | | | | | | | | | skb_set_owner_r() will call skb_orphan(), I don't see any reason to call it twice. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* net: fix possible wrong checksum generationEric Dumazet2013-01-289-6/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pravin Shelar mentioned that GSO could potentially generate wrong TX checksum if skb has fragments that are overwritten by the user between the checksum computation and transmit. He suggested to linearize skbs but this extra copy can be avoided for normal tcp skbs cooked by tcp_sendmsg(). This patch introduces a new SKB_GSO_SHARED_FRAG flag, set in skb_shinfo(skb)->gso_type if at least one frag can be modified by the user. Typical sources of such possible overwrites are {vm}splice(), sendfile(), and macvtap/tun/virtio_net drivers. Tested: $ netperf -H 7.7.8.84 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3959.52 $ netperf -H 7.7.8.84 -t TCP_SENDFILE TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3216.80 Performance of the SENDFILE is impacted by the extra allocation and copy, and because we use order-0 pages, while the TCP_STREAM uses bigger pages. Reported-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-nextDavid S. Miller2013-01-284-27/+80
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Marc Kleine-Budde says: ==================== this is a pull-request for net-next/master. There is are 9 patches by Fabio Baltieri and Kurt Van Dijck which add LED infrastructure and support for CAN devices. Bernd Krumboeck adds a driver for the USB CAN adapter from 8 devices. Oliver Hartkopp improves the CAN gateway functionality. There are 4 patches by me, which clean up the CAN's Kconfig. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * can: gw: indicate and count deleted frames due to misconfigurationOliver Hartkopp2013-01-261-1/+11
| | | | | | | | | | | | | | | | Add a statistic counter to detect deleted frames due to misconfiguration with a new read-only CGW_DELETED netlink attribute for the CAN gateway. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: gw: add a variable limit for CAN frame routingsOliver Hartkopp2013-01-261-16/+40
| | | | | | | | | | | | | | | | | | | | | | To prevent a possible misconfiguration (e.g. circular CAN frame routings) limit the number of routings of a single CAN frame to a small variable value. The limit can be specified by the module parameter 'max_hops' (1..6). The default value is 1 (one hop), according to the original can-gw behaviour. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: gw: make routing to the incoming CAN interface configurableOliver Hartkopp2013-01-261-0/+8
| | | | | | | | | | | | | | | | | | Introduce new configuration flag CGW_FLAGS_CAN_IIF_TX_OK to configure if a CAN sk_buff that has been routed with can-gw is allowed to be send back to the originating CAN interface. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: add private data space for CAN sk_buffsOliver Hartkopp2013-01-262-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The struct can_skb_priv is used to transport additional information along with the stored struct can(fd)_frame that can not be contained in existing struct sk_buff elements. can_skb_priv is located in the skb headroom, which does not touch the existing CAN sk_buff usage with skb->data and skb->len, so that even out-of-tree CAN drivers can be used without changes. Btw. out-of-tree CAN drivers without can_skb_priv in the sk_buff headroom would not support features based on can_skb_priv. The can_skb_priv->ifindex contains the first interface where the CAN frame appeared on the local host. Unfortunately skb->skb_iif can not be used as this value is overwritten in every netif_receive_skb() call. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: Kconfig: switch on all CAN protocolls by defaultMarc Kleine-Budde2013-01-261-3/+3
| | | | | | | | | | | | This patch enables all basic CAN protocol by default. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * can: Kconfig: convert 'depends on CAN' into 'if CAN...endif' blockMarc Kleine-Budde2013-01-261-3/+4
| | | | | | | | | | | | | | This patch adds an 'if CAN...endif' Block around all CAN symbols in net/can/Kconfig. So the 'depends on CAN' dependencies can be removed. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
* | decnet: use correct RCU API to deref sk_dst_cache fieldCong Wang2013-01-283-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | sock->sk_dst_cache is protected by RCU, therefore we should use __sk_dst_get() to deref it once we lock the sock. This fixes several sparse warnings. Cc: linux-decnet-user@lists.sourceforge.net Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | irda: buffer overflow in irnet_ctrl_read()Dan Carpenter2013-01-271-59/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comments here say that the /* Max event is 61 char */ but in 2003 we changed the event format and now the max event size is 75. The longest event is: "Discovered %08x (%s) behind %08x {hints %02X-%02X}\n", 12345678901 23 456789012 34567890 1 2 3 +8 +21 +8 +2 +2 +1 = 75 characters. There was a check to return -EOVERFLOW if the user gave us a "count" value that was less than 64. Raising it to 75 might break backwards compatability. Instead I removed the check and now it returns a truncated string if "count" is too low. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | soreuseport: fix use of uid in tb->fastuidTom Herbert2013-01-271-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | Fix a reported compilation error where ia variable of type kuid_t was being set to zero. Eliminate two instances of setting tb->fastuid to zero. tb->fastuid is only used if tb->fastreuseport is set, so there should be no problem if tb->fastuid is not initialized (when tb->fastreuesport is zero). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of git://1984.lsi.us.es/nf-nextDavid S. Miller2013-01-2724-401/+950
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== This batch contains netfilter updates for you net-next tree, they are: * The new connlabel extension for x_tables, that allows us to attach labels to each conntrack flow. The kernel implementation uses a bitmask and there's a file in user-space that maps the bits with the corresponding string for each existing label. By now, you can attach up to 128 overlapping labels. From Florian Westphal. * A new round of improvements for the netns support for conntrack. Gao feng has moved many of the initialization code of each module of the netns init path. He also made several code refactoring, that code looks cleaner to me now. * Added documentation for all possible tweaks for nf_conntrack via sysctl, from Jiri Pirko. * Cisco 7941/7945 IP phone support for our SIP conntrack helper, from Kevin Cernekee. * Missing header file in the snmp helper, from Stephen Hemminger. * Finally, a couple of fixes to resolve minor issues with these changes, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nf_conntrack: fix compilation if sysctl are disabledPablo Neira Ayuso2013-01-231-2/+9
| | | | | | | | | | | | | | | | In (f94161c netfilter: nf_conntrack: move initialization out of pernet operations), some ifdefs were missing for sysctl dependent code. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_conntrack: refactor l4proto support for netnsGao feng2013-01-237-103/+197
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the code that register/unregister l4proto to the module_init/exit context. Given that we have to modify some interfaces to accomodate these changes, it is a good time to use shorter function names for this using the nf_ct_* prefix instead of nf_conntrack_*, that is: nf_ct_l4proto_register nf_ct_l4proto_pernet_register nf_ct_l4proto_unregister nf_ct_l4proto_pernet_unregister We same many line breaks with it. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_conntrack: refactor l3proto support for netnsGao feng2013-01-233-38/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the code that register/unregister l3proto to the module_init/exit context. Given that we have to modify some interfaces to accomodate these changes, it is a good time to use shorter function names for this using the nf_ct_* prefix instead of nf_conntrack_*, that is: nf_ct_l3proto_register nf_ct_l3proto_pernet_register nf_ct_l3proto_unregister nf_ct_l3proto_pernet_unregister We same many line breaks with it. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_proto: move initialization out of pernet_operationsGao feng2013-01-232-18/+29
| | | | | | | | | | | | | | Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_labels: move initialization out of pernet_operationsGao feng2013-01-232-16/+12
| | | | | | | | | | | | | | Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_helper: move initialization out of pernet_operationsGao feng2013-01-232-32/+36
| | | | | | | | | | | | | | Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_timeout: move initialization out of pernet_operationsGao feng2013-01-232-23/+15
| | | | | | | | | | | | | | Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_ecache: move initialization out of pernet_operationsGao feng2013-01-232-28/+24
| | | | | | | | | | | | | | Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_tstamp: move initialization out of pernet_operationsGao feng2013-01-232-28/+26
| | | | | | | | | | | | | | Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_acct: move initialization out of pernet_operationsGao feng2013-01-232-27/+24
| | | | | | | | | | | | | | Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_expect: move initialization out of pernet_operationsGao feng2013-01-232-31/+36
| | | | | | | | | | | | | | Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_conntrack: move initialization out of pernet operationsGao feng2013-01-232-81/+71
| | | | | | | | | | | | | | | | | | | | | | nf_conntrack initialization and cleanup codes happens in pernet operations function. This task should be done in module_init/exit. We can't use init_net to identify if it's the right time to initialize or cleanup since we cannot make assumption on the order netns are created/destroyed. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: x_tables: add xt_bpf matchWillem de Bruijn2013-01-213-0/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Support arbitrary linux socket filter (BPF) programs as x_tables match rules. This allows for very expressive filters, and on platforms with BPF JIT appears competitive with traditional hardcoded iptables rules using the u32 match. The size of the filter has been artificially limited to 64 instructions maximum to avoid bloating the size of each rule using this new match. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_snmp: add include filestephen hemminger2013-01-181-0/+1
| | | | | | | | | | | | | | | | Prototype for nf_nat_snmp_hook is in nf_conntrack_snmp.h therefore it should be included to get type checking. Found by sparse. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: allow userspace to modify labelsFlorian Westphal2013-01-182-0/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the ability to set/clear labels assigned to a conntrack via ctnetlink. To allow userspace to only alter specific bits, Pablo suggested to add a new CTA_LABELS_MASK attribute: The new set of active labels is then determined via active = (active & ~mask) ^ changeset i.e., the mask selects those bits in the existing set that should be changed. This follows the same method already used by MARK and CONNMARK targets. Omitting CTA_LABELS_MASK is the same as setting all bits in CTA_LABELS_MASK to 1: The existing set is replaced by the one from userspace. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: deliver labels to userspaceFlorian Westphal2013-01-182-1/+42
| | | | | | | | | | | | | | | | | | | | Introduce CTA_LABELS attribute to send a bit-vector of currently active labels to userspace. Future patch will permit userspace to also set/delete active labels. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: add connlabel conntrack extensionFlorian Westphal2013-01-186-0/+206
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | similar to connmarks, except labels are bit-based; i.e. all labels may be attached to a flow at the same time. Up to 128 labels are supported. Supporting more labels is possible, but requires increasing the ct offset delta from u8 to u16 type due to increased extension sizes. Mapping of bit-identifier to label name is done in userspace. The extension is enabled at run-time once "-m connlabel" netfilter rules are added. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_sip: support Cisco 7941/7945 IP phonesKevin Cernekee2013-01-172-3/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most SIP devices use a source port of 5060/udp on SIP requests, so the response automatically comes back to port 5060: phone_ip:5060 -> proxy_ip:5060 REGISTER proxy_ip:5060 -> phone_ip:5060 100 Trying The newer Cisco IP phones, however, use a randomly chosen high source port for the SIP request but expect the response on port 5060: phone_ip:49173 -> proxy_ip:5060 REGISTER proxy_ip:5060 -> phone_ip:5060 100 Trying Standard Linux NAT, with or without nf_nat_sip, will send the reply back to port 49173, not 5060: phone_ip:49173 -> proxy_ip:5060 REGISTER proxy_ip:5060 -> phone_ip:49173 100 Trying But the phone is not listening on 49173, so it will never see the reply. This patch modifies nf_*_sip to work around this quirk by extracting the SIP response port from the Via: header, iff the source IP in the packet header matches the source IP in the SIP request. Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | Merge branch 'testing' of ↵David S. Miller2013-01-234-72/+69
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Add a statistic counter for invalid output states and remove a superfluous state valid check, from Li RongQing. 2) Probe for asynchronous block ciphers instead of synchronous block ciphers to make the asynchronous variants available even if no synchronous block ciphers are found, from Jussi Kivilinna. 3) Make rfc3686 asynchronous block cipher and make use of the new asynchronous variant, from Jussi Kivilinna. 4) Replace some rwlocks by rcu, from Cong Wang. 5) Remove some unused defines. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | xfrm: use separated locks to protect pointers of struct xfrm_state_afinfoCong Wang2013-01-171-25/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | afinfo->type_map and afinfo->mode_map deserve separated locks, they are different things. We should just take RCU read lock to protect afinfo itself, but not for the inner pointers. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | xfrm: replace rwlock on xfrm_km_list with rcuCong Wang2013-01-161-28/+30
| | | | | | | | | | | | | | | | | | | | | | | | Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | xfrm: replace rwlock on xfrm_state_afinfo with rcuCong Wang2013-01-161-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to commit 418a99ac6ad487dc9c42e6b0e85f941af56330f2 (Replace rwlock on xfrm_policy_afinfo with rcu), the rwlock on xfrm_state_afinfo can be replaced by RCU too. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | xfrm_algo: probe asynchronous block ciphers instead of synchronousJussi Kivilinna2013-01-081-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPSEC uses block ciphers asynchronous, but probes only for synchronous block ciphers and makes ealg entries only available if synchronous block cipher is found. So with setup, where hardware crypto driver registers asynchronous block ciphers and software crypto module is not build, ealg is not marked as being available. Use crypto_has_ablkcipher instead and remove ASYNC mask. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | xfrm: removes a superfluous check and add a statisticLi RongQing2013-01-073-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the check if x->km.state equal to XFRM_STATE_VALID in xfrm_state_check_expire(), which will be done before call xfrm_state_check_expire(). add a LINUX_MIB_XFRMOUTSTATEINVALID statistic to record the outbound error due to invalid xfrm state. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | | soreuseport: UDP/IPv6 implementationTom Herbert2013-01-231-3/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Motivation for soreuseport would be something like a DNS server.  An alternative would be to recv on the same socket from multiple threads. As in the case of TCP, the load across these threads tends to be disproportionate and we also see a lot of contection on the socket lock. Note that SO_REUSEADDR already allows multiple UDP sockets to bind to the same port, however there is no provision to prevent hijacking and nothing to distribute packets across all the sockets sharing the same bound port.  This patch does not change the semantics of SO_REUSEADDR, but provides usable functionality of it for unicast. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | soreuseport: TCP/IPv6 implementationTom Herbert2013-01-233-9/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Motivation for soreuseport would be something like a web server binding to port 80 running with multiple threads, where each thread might have it's own listener socket. This could be done as an alternative to other models: 1) have one listener thread which dispatches completed connections to workers. 2) accept on a single listener socket from multiple threads. In case #1 the listener thread can easily become the bottleneck with high connection turn-over rate. In case #2, the proportion of connections accepted per thread tends to be uneven under high connection load (assuming simple event loop: while (1) { accept(); process() }, wakeup does not promote fairness among the sockets. We have seen the disproportion to be as high as 3:1 ratio between thread accepting most connections and the one accepting the fewest. With so_reusport the distribution is uniform. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | soreuseport: UDP/IPv4 implementationTom Herbert2013-01-231-18/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow multiple UDP sockets to bind to the same port. Motivation soreuseport would be something like a DNS server.  An alternative would be to recv on the same socket from multiple threads. As in the case of TCP, the load across these threads tends to be disproportionate and we also see a lot of contection on the socketlock. Note that SO_REUSEADDR already allows multiple UDP sockets to bind to the same port, however there is no provision to prevent hijacking and nothing to distribute packets across all the sockets sharing the same bound port.  This patch does not change the semantics of SO_REUSEADDR, but provides usable functionality of it for unicast. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | soreuseport: TCP/IPv4 implementationTom Herbert2013-01-233-18/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow multiple listener sockets to bind to the same port. Motivation for soresuseport would be something like a web server binding to port 80 running with multiple threads, where each thread might have it's own listener socket. This could be done as an alternative to other models: 1) have one listener thread which dispatches completed connections to workers. 2) accept on a single listener socket from multiple threads. In case #1 the listener thread can easily become the bottleneck with high connection turn-over rate. In case #2, the proportion of connections accepted per thread tends to be uneven under high connection load (assuming simple event loop: while (1) { accept(); process() }, wakeup does not promote fairness among the sockets. We have seen the disproportion to be as high as 3:1 ratio between thread accepting most connections and the one accepting the fewest. With so_reusport the distribution is uniform. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | soreuseport: infrastructureTom Herbert2013-01-231-0/+7
| | | | | | | | | | | | | | | | | | | | | Definitions and macros for implementing soreusport. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netpoll: fix an uninitialized variableCong Wang2013-01-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fengguang reported: net/core/netpoll.c: In function 'netpoll_setup': net/core/netpoll.c:1049:6: warning: 'err' may be used uninitialized in this function [-Wmaybe-uninitialized] in !CONFIG_IPV6 case, we may error out without initializing 'err'. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: remove duplicated declaration of ip6_fragment()Cong Wang2013-01-221-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is declared in: include/net/ip6_route.h:187:int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)); and net/ip6_route.h is already included. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netfilter: Use IS_ERR_OR_NULL().YOSHIFUJI Hideaki / 吉藤英明2013-01-223-15/+15
| | | | | | | | | | | | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: Use IS_ERR_OR_NULL().YOSHIFUJI Hideaki / 吉藤英明2013-01-221-2/+2
| | | | | | | | | | | | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Use IS_ERR_OR_NULL().YOSHIFUJI Hideaki / 吉藤英明2013-01-223-3/+3
| | | | | | | | | | | | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Use IS_ERR_OR_NULL().YOSHIFUJI Hideaki / 吉藤英明2013-01-221-1/+1
| | | | | | | | | | | | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | neigh: Keep neighbour cache entries if number of them is small enough.YOSHIFUJI Hideaki / 吉藤英明2013-01-221-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we have removed NCE (Neighbour Cache Entry) reference from routing entries, the only refcnt holders of an NCE are its timer (if running) and its owner table, in usual cases. As a result, neigh_periodic_work() purges NCEs over and over again even for gateways. It does not make sense to purge entries, if number of them is very small, so keep them. The minimum number of entries to keep is specified by gc_thresh1. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipmr: fix sparse warning when testing origin or groupNicolas Dichtel2013-01-221-11/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | mfc_mcastgrp and mfc_origin are __be32, thus we need to convert INADDR_ANY. Because INADDR_ANY is 0, this patch just fix sparse warnings. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud