summaryrefslogtreecommitdiffstats
path: root/net/netfilter
Commit message (Collapse)AuthorAgeFilesLines
* ipv6: make lookups simpler and fasterEric Dumazet2013-10-092-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP listener refactoring, part 4 : To speed up inet lookups, we moved IPv4 addresses from inet to struct sock_common Now is time to do the same for IPv6, because it permits us to have fast lookups for all kind of sockets, including upcoming SYN_RECV. Getting IPv6 addresses in TCP lookups currently requires two extra cache lines, plus a dereference (and memory stall). inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6 This patch is way bigger than its IPv4 counter part, because for IPv4, we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6, it's not doable easily. inet6_sk(sk)->daddr becomes sk->sk_v6_daddr inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr at the same offset. We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic macro. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2013-10-0426-1650/+2414
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter updates for your net-next tree, mostly ipset improvements and enhancements features, they are: * Don't call ip_nest_end needlessly in the error path from me, suggested by Pablo Neira Ayuso, from Jozsef Kadlecsik. * Fixed sparse warnings about shadowed variable and missing rcu annotation and fix of "may be used uninitialized" warnings, also from Jozsef. * Renamed simple macro names to avoid namespace issues, reported by David Laight, again from Jozsef. * Use fix sized type for timeout in the extension part, and cosmetic ordering of matches and targets separatedly in xt_set.c, from Jozsef. * Support package fragments for IPv4 protos without ports from Anders K. Pedersen. For example this allows a hash:ip,port ipset containing the entry 192.168.0.1,gre:0 to match all package fragments for PPTP VPN tunnels to/from the host. Without this patch only the first package fragment (with fragment offset 0) was matched. * Introduced a new operation to get both setname and family, from Jozsef. ip[6]tables set match and SET target need to know the family of the set in order to reject adding rules which refer to a set with a non-mathcing family. Currently such rules are silently accepted and then ignored instead of generating an error message to the user. * Reworked extensions support in ipset types from Jozsef. The approach of defining structures with all variations is not manageable as the number of extensions grows. Therefore a blob for the extensions is introduced, somewhat similar to conntrack. The support of extensions which need a per data destroy function is added as well. * When an element timed out in a list:set type of set, the garbage collector skipped the checking of the next element. So the purging was delayed to the next run of the gc, fixed by Jozsef. * A small Kconfig fix: NETFILTER_NETLINK cannot be selected and ipset requires it. * hash:net,net type from Oliver Smith. The type provides the ability to store pairs of subnets in a set. * Comment for ipset entries from Oliver Smith. This makes possible to annotate entries in a set with comments, for example: ipset n foo hash:net,net comment ipset a foo 10.0.0.0/21,192.168.1.0/24 comment "office nets A and B" * Fix of hash types resizing with comment extension from Jozsef. * Fix of new extensions for list:set type when an element is added into a slot from where another element was pushed away from Jozsef. * Introduction of a common function for the listing of the element extensions from Jozsef. * Net namespace support for ipset from Vitaly Lavrov. * hash:net,port,net type from Oliver Smith, which makes possible to store the triples of two subnets and a protocol, port pair in a set. * Get xt_TCPMSS working with net namespace, by Gao feng. * Use the proper net netnamespace to allocate skbs, also by Gao feng. * A couple of cleanups for the conntrack SIP helper, by Holger Eitzenberger. * Extend cttimeout to allow setting default conntrack timeouts via nfnetlink, so we can get rid of all our sysctl/proc interfaces in the future for timeout tuning, from me. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: cttimeout: allow to set/get default protocol timeoutsPablo Neira Ayuso2013-10-011-8/+153
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Default timeouts are currently set via proc/sysctl interface, the typical pattern is a file name like: /proc/sys/net/netfilter/nf_conntrack_PROTOCOL_timeout_STATE This results in one entry per default protocol state timeout. This patch simplifies this by allowing to set default protocol timeouts via cttimeout netlink interface. This should allow us to get rid of the existing proc/sysctl code in the midterm. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_sip: consolidate NAT hook functionsholger@eitzenberger.org2013-10-012-113/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are currently seven different NAT hooks used in both nf_conntrack_sip and nf_nat_sip, each of the hooks is exported in nf_conntrack_sip, then set from the nf_nat_sip NAT helper. And because each of them is exported there is quite some overhead introduced due of this. By introducing nf_nat_sip_hooks I am able to reduce both text/data somewhat. For nf_conntrack_sip e. g. I get text data bss dec old 15243 5256 32 20531 new 15010 5192 32 20234 Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nfnetlink_log: use proper net to allocate skbGao feng2013-10-011-5/+6
| | | | | | | | | | | | | | | | Use proper net struct to allocate skb, otherwise netlink mmap will be of no effect. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nfnetlink_queue: use proper net namespace to allocate skbGao feng2013-10-011-3/+3
| | | | | | | | | | | | | | | | Use proper net struct to allocate skb, otherwise netlink mmap will have no effect. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ipset: Add hash:net,port,net module to kernel.Oliver Smith2013-09-303-0/+598
| | | | | | | | | | | | | | | | | | This adds a new set that provides similar functionality to ip,port,net but permits arbitrary size subnets for both the first and last parameter. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfiler: ipset: Add net namespace for ipsetVitaly Lavrov2013-09-307-138/+232
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds netns support for ipset. Major changes were made in ip_set_core.c and ip_set.h. Global variables are moved to per net namespace. Added initialization code and the destruction of the network namespace ipset subsystem. In the prototypes of public functions ip_set_* added parameter "struct net*". The remaining corrections related to the change prototypes of public functions ip_set_*. The patch for git://git.netfilter.org/ipset.git commit 6a4ec96c0b8caac5c35474e40e319704d92ca347 Signed-off-by: Vitaly Lavrov <lve@guap.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Use a common function at listing the extensionsJozsef Kadlecsik2013-09-304-50/+12
| | | | | | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: For set:list types, replaced elements must be zeroed outJozsef Kadlecsik2013-09-301-1/+3
| | | | | | | | | | | | | | The new extensions require zero initialization for the new element to be added into a slot from where another element was pushed away. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Fix hash resizing with commentsJozsef Kadlecsik2013-09-301-5/+5
| | | | | | | | | | | | | | The destroy function must take into account that resizing doesn't create new extensions so those cannot be destroyed at resize. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Support comments in hash-type ipsets.Oliver Smith2013-09-309-13/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | This provides kernel support for creating ipsets with comment support. This does incur a penalty to flushing/destroying an ipset since all entries are walked in order to free the allocated strings, this penalty is of course less expensive than the operation of listing an ipset to userspace, so for general-purpose usage the overall impact is expected to be little to none. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Support comments in the list-type ipset.Oliver Smith2013-09-301-6/+12
| | | | | | | | | | | | | | | | This provides kernel support for creating list ipsets with the comment annotation extension. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Support comments in bitmap-type ipsets.Oliver Smith2013-09-304-9/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This provides kernel support for creating bitmap ipsets with comment support. As is the case for hashes, this incurs a penalty when flushing or destroying the entire ipset as the entries must first be walked in order to free the comment strings. This penalty is of course far less than the cost of listing an ipset to userspace. Any set created without support for comments will be flushed/destroyed as before. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Support comments for ipset entries in the core.Oliver Smith2013-09-301-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds the core support for having comments on ipset entries. The comments are stored as standard null-terminated strings in dynamically allocated memory after being passed to the kernel. As a result of this, code has been added to the generic destroy function to iterate all extensions and call that extension's destroy task if the set has that extension activated, and if such a task is defined. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Add hash:net,net module to kernel.Oliver Smith2013-09-304-9/+541
| | | | | | | | | | | | | | | | | | | | This adds a new set that provides the ability to configure pairs of subnets. A small amount of additional handling code has been added to the generic hash header file - this code is conditionally activated by a preprocessor definition. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Kconfig: ipset needs NETFILTER_NETLINKJozsef Kadlecsik2013-09-301-1/+1
| | | | | | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: list:set: make sure all elements are checked by the gcJozsef Kadlecsik2013-09-301-2/+5
| | | | | | | | | | | | | | When an element timed out, the next one was skipped by the garbage collector, fixed. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Support extensions which need a per data destroy functionJozsef Kadlecsik2013-09-303-38/+90
| | | | | | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Generalize extensions supportJozsef Kadlecsik2013-09-3013-749/+105
| | | | | | | | | | | | | | Get rid of the structure based extensions and introduce a blob for the extensions. Thus we can support more extension types easily. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Move extension data to set structureJozsef Kadlecsik2013-09-3013-278/+244
| | | | | | | | | | | | | | | | Default timeout and extension offsets are moved to struct set, because all set types supports all extensions and it makes possible to generalize extension support. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Rename extension offset ids to extension idsJozsef Kadlecsik2013-09-306-35/+35
| | | | | | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Prepare ipset to support multiple networks for hash typesJozsef Kadlecsik2013-09-305-46/+48
| | | | | | | | | | | | | | | | In order to support hash:net,net, hash:net,port,net etc. types, arrays are introduced for the book-keeping of existing cidr sizes and network numbers in a set. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Introduce new operation to get both setname and familyJozsef Kadlecsik2013-09-301-0/+17
| | | | | | | | | | | | | | | | | | | | ip[6]tables set match and SET target need to know the family of the set in order to reject adding rules which refer to a set with a non-mathcing family. Currently such rules are silently accepted and then ignored instead of generating a clear error message to the user, which is not helpful. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: order matches and targets separatedly in xt_set.cJozsef Kadlecsik2013-09-301-92/+96
| | | | | | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Support package fragments for IPv4 protos without portsAnders K. Pedersen2013-09-301-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable ipset port set types to match IPv4 package fragments for protocols that doesn't have ports (or the port information isn't supported by ipset). For example this allows a hash:ip,port ipset containing the entry 192.168.0.1,gre:0 to match all package fragments for PPTP VPN tunnels to/from the host. Without this patch only the first package fragment (with fragment offset 0) was matched, while subsequent fragments wasn't. This is not possible for IPv6, where the protocol is in the fragmented part of the package unlike IPv4, where the protocol is in the IP header. IPPROTO_ICMPV6 is deliberately not included, because it isn't relevant for IPv4. Signed-off-by: Anders K. Pedersen <akp@surftown.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Fix "may be used uninitialized" warningsJozsef Kadlecsik2013-09-309-12/+12
| | | | | | | | | | Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Rename simple macro names to avoid namespace issues.Jozsef Kadlecsik2013-09-3013-162/+166
| | | | | | | | | | Reported-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Fix sparse warnings due to missing rcu annotationsJozsef Kadlecsik2013-09-301-32/+55
| | | | | | | | | | Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Sparse warning about shadowed variable fixedJozsef Kadlecsik2013-09-301-1/+1
| | | | | | | | | | | | | | net/netfilter/ipset/ip_set_hash_ipportnet.c:275:20: warning: symbol 'cidr' shadows an earlier one Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: ipset: Don't call ip_nest_end needlessly in the error pathJozsef Kadlecsik2013-09-303-3/+3
| | | | | | | | | | Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * netfilter: xt_TCPMSS: lookup route from proper net namespaceGao feng2013-09-271-3/+5
| | | | | | | | | | | | | | Otherwise the pmtu will be incorrect. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: xt_TCPMSS: Get mtu only if clamp-mss-to-pmtu is specifiedGao feng2013-09-271-34/+36
| | | | | | | | | | | | | | | | This patch refactors the code to skip tcpmss_reverse_mtu if no clamp-mss-to-pmtu is specified. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_sip: extend RCU read lock in set_expected_rtp_rtcp()holger@eitzenberger.org2013-09-271-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently set_expected_rtp_rtcp() in the SIP helper uses rcu_dereference() two times to access two different NAT hook functions. However, only the first one is protected by the RCU reader lock, but the 2nd isn't. Fix it by extending the RCU protected area. This is more a cosmetic thing since we rely on all netfilter hooks being rcu_read_lock()ed by nf_hook_slow() in many places anyways, as Patrick McHardy clarified. Signed-off-by: Holger Eitzenberger <holger.eitzenberger@sophos.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2013-10-019-145/+125
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter/IPVS fixes for your net tree, they are: * Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from Patrick McHardy. * Fix possible weight overflow in lblc and lblcr schedulers due to 32-bits arithmetics, from Simon Kirby. * Fix possible memory access race in the lblc and lblcr schedulers, introduced when it was converted to use RCU, two patches from Julian Anastasov. * Fix hard dependency on CPU 0 when reading per-cpu stats in the rate estimator, from Julian Anastasov. * Fix race that may lead to object use after release, when invoking ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: synproxy: fix BUG_ON triggered by corrupt TCP packetsPatrick McHardy2013-09-301-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP packets hitting the SYN proxy through the SYNPROXY target are not validated by TCP conntrack. When th->doff is below 5, an underflow happens when calculating the options length, causing skb_header_pointer() to return NULL and triggering the BUG_ON(). Handle this case gracefully by checking for NULL instead of using BUG_ON(). Reported-by: Martin Topholm <mph@one.com> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * ipvs: stats should not depend on CPU 0Julian Anastasov2013-09-181-1/+3
| | | | | | | | | | | | | | | | When reading percpu stats we need to properly reset the sum when CPU 0 is not present in the possible mask. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: do not use dest after ip_vs_dest_put in LBLCRJulian Anastasov2013-09-181-30/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c5549571f975ab ("ipvs: convert lblcr scheduler to rcu") allows RCU readers to use dest after calling ip_vs_dest_put(). In the corner case it can race with ip_vs_dest_trash_expire() which can release the dest while it is being returned to the RCU readers as scheduling result. To fix the problem do not allow e->dest to be replaced and defer the ip_vs_dest_put() call by using RCU callback. Now e->dest does not need to be RCU pointer. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: do not use dest after ip_vs_dest_put in LBLCJulian Anastasov2013-09-181-37/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c2a4ffb70eef39 ("ipvs: convert lblc scheduler to rcu") allows RCU readers to use dest after calling ip_vs_dest_put(). In the corner case it can race with ip_vs_dest_trash_expire() which can release the dest while it is being returned to the RCU readers as scheduling result. To fix the problem do not allow en->dest to be replaced and defer the ip_vs_dest_put() call by using RCU callback. Now en->dest does not need to be RCU pointer. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: make the service replacement more robustJulian Anastasov2013-09-182-53/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added IP_VS_DEST_STATE_REMOVING flag and RCU callback named ip_vs_dest_wait_readers() to keep dests and services after removal for at least a RCU grace period. But we have the following corner cases: - we can not reuse the same dest if its service is removed while IP_VS_DEST_STATE_REMOVING is still set because another dest removal in the first grace period can not extend this period. It can happen when ipvsadm -C && ipvsadm -R is used. - dest->svc can be replaced but ip_vs_in_stats() and ip_vs_out_stats() have no explicit read memory barriers when accessing dest->svc. It can happen that dest->svc was just freed (replaced) while we use it to update the stats. We solve the problems as follows: - IP_VS_DEST_STATE_REMOVING is removed and we ensure a fixed idle period for the dest (IP_VS_DEST_TRASH_PERIOD). idle_start will remember when for first time after deletion we noticed dest->refcnt=0. Later, the connections can grab a reference while in RCU grace period but if refcnt becomes 0 we can safely free the dest and its svc. - dest->svc becomes RCU pointer. As result, we add explicit RCU locking in ip_vs_in_stats() and ip_vs_out_stats(). - __ip_vs_unbind_svc is renamed to __ip_vs_svc_put(), it now can free the service immediately or after a RCU grace period. dest->svc is not set to NULL anymore. As result, unlinked dests and their services are freed always after IP_VS_DEST_TRASH_PERIOD period, unused services are freed after a RCU grace period. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: fix overflow on dest weight multiplySimon Kirby2013-09-185-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Schedulers such as lblc and lblcr require the weight to be as high as the maximum number of active connections. In commit b552f7e3a9524abcbcdf ("ipvs: unify the formula to estimate the overhead of processing connections"), the consideration of inactconns and activeconns was cleaned up to always count activeconns as 256 times more important than inactconns. In cases where 3000 or more connections are expected, a weight of 3000 * 256 * 3000 connections overflows the 32-bit signed result used to determine if rescheduling is required. On amd64, this merely changes the multiply and comparison instructions to 64-bit. On x86, a 64-bit result is already present from imull, so only a few more comparison instructions are emitted. Signed-off-by: Simon Kirby <sim@hostway.ca> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* | ip: generate unique IP identificator if local fragmentation is allowedAnsis Atteka2013-09-191-1/+1
|/ | | | | | | | | | | | | | | | | If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: nfnetlink_queue: use network skb for sequence adjustmentGao feng2013-09-171-1/+1
| | | | | | | Instead of the netlink skb. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: ipset: Fix serious failure in CIDR trackingOliver Smith2013-09-161-12/+16
| | | | | | | | | | | | | | | | | | | | | | | | | This fixes a serious bug affecting all hash types with a net element - specifically, if a CIDR value is deleted such that none of the same size exist any more, all larger (less-specific) values will then fail to match. Adding back any prefix with a CIDR equal to or more specific than the one deleted will fix it. Steps to reproduce: ipset -N test hash:net ipset -A test 1.1.0.0/16 ipset -A test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS in set ipset -D test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS NOT in set This is due to the fact that the nets counter was unconditionally decremented prior to the iteration that shifts up the entries. Now, we first check if there is a proceeding entry and if not, decrement it and return. Otherwise, we proceed to iterate and then zero the last element, which, in most cases, will already be zero. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Validate the set family and not the set type family at ↵Jozsef Kadlecsik2013-09-161-1/+1
| | | | | | | | swapping This closes netfilter bugzilla #843, reported by Quentin Armitage. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Consistent userspace testing with nomatch flagJozsef Kadlecsik2013-09-165-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "nomatch" commandline flag should invert the matching at testing, similarly to the --return-nomatch flag of the "set" match of iptables. Until now it worked with the elements with "nomatch" flag only. From now on it works with elements without the flag too, i.e: # ipset n test hash:net # ipset a test 10.0.0.0/24 nomatch # ipset t test 10.0.0.1 10.0.0.1 is NOT in set test. # ipset t test 10.0.0.1 nomatch 10.0.0.1 is in set test. # ipset a test 192.168.0.0/24 # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is NOT in set test. Before the patch the results were ... # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is in set test. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Skip really non-first fragments for IPv6 when getting ↵Jozsef Kadlecsik2013-09-161-2/+2
| | | | | | port/protocol Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: Fix build errors with xt_socket.cDavid S. Miller2013-09-051-0/+1
| | | | | | | | | | | | | | | | | | | | | As reported by Randy Dunlap: ==================== when CONFIG_IPV6=m and CONFIG_NETFILTER_XT_MATCH_SOCKET=y: net/built-in.o: In function `socket_mt6_v1_v2': xt_socket.c:(.text+0x51b55): undefined reference to `udp6_lib_lookup' net/built-in.o: In function `socket_mt_init': xt_socket.c:(.init.text+0x1ef8): undefined reference to `nf_defrag_ipv6_enable' ==================== Like several other modules under net/netfilter/ we have to have a dependency "IPV6 disabled or set compatibly with this module" clause. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: xt_TCPMSS: correct return value in tcpmss_mangle_packetPhil Oester2013-09-041-1/+1
| | | | | | | | | | | In commit b396966c4 (netfilter: xt_TCPMSS: Fix missing fragmentation handling), I attempted to add safe fragment handling to xt_TCPMSS. However, Andy Padavan of Project N56U correctly points out that returning XT_CONTINUE in this function does not work. The callers (tcpmss_tg[46]) expect to receive a value of 0 in order to return XT_CONTINUE. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: synproxy_core: fix warning in __nf_ct_ext_add_length()Patrick McHardy2013-09-041-2/+2
| | | | | | | | | | | | | | | | With CONFIG_NETFILTER_DEBUG we get the following warning during SYNPROXY init: [ 80.558906] WARNING: CPU: 1 PID: 4833 at net/netfilter/nf_conntrack_extend.c:80 __nf_ct_ext_add_length+0x217/0x220 [nf_conntrack]() The reason is that the conntrack template is set to confirmed before adding the extension and it is invalid to add extensions to already confirmed conntracks. Fix by adding the extensions before setting the conntrack to confirmed. Reported-by: Jesper Dangaard Brouer <jesper.brouer@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
OpenPOWER on IntegriCloud