Commit message (Author, Age, Files, Lines)
* netfilter: nf_tables: wait for call_rcu completion on module removal (Pablo Neira Ayuso, 2014-10-02, 1 file, -0/+1)
| | | | | | | Make sure the objects have been released before the nf_tables module is removed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) (Pablo Neira Ayuso, 2014-10-02, 10 files, -20/+20)
| | | | | | | | | | In 34666d4 ("netfilter: bridge: move br_netfilter out of the core"), the bridge netfilter code has been modularized. Use IS_ENABLED instead of ifdef to cover the module case. Fixes: 34666d4 ("netfilter: bridge: move br_netfilter out of the core") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
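The difference between the two checks can be modelled in a few lines. Kconfig defines CONFIG_FOO for a built-in option (=y) and CONFIG_FOO_MODULE for a modular one (=m), so a plain ifdef on CONFIG_FOO misses the module case. The sketch below is a Python model of that behaviour, not kernel code; the helper names are made up for illustration.

```python
def kconfig_macros(option: str, value: str) -> set:
    """Return the preprocessor macros Kconfig would define for option=value."""
    if value == "y":
        return {option}               # built in: CONFIG_FOO is defined
    if value == "m":
        return {option + "_MODULE"}   # modular: only CONFIG_FOO_MODULE is defined
    return set()                      # "n" or unset: nothing is defined


def ifdef(macros: set, option: str) -> bool:
    """Model of `#ifdef CONFIG_FOO`: true only for the built-in case."""
    return option in macros


def is_enabled(macros: set, option: str) -> bool:
    """Model of IS_ENABLED(CONFIG_FOO): true for built-in or module."""
    return option in macros or (option + "_MODULE") in macros


if __name__ == "__main__":
    opt = "CONFIG_BRIDGE_NETFILTER"
    for value in ("y", "m", "n"):
        macros = kconfig_macros(opt, value)
        print(value, "ifdef:", ifdef(macros, opt), "IS_ENABLED:", is_enabled(macros, opt))
```

With the option set to "m", the ifdef model returns False while the IS_ENABLED model returns True, which is the module case the commit message refers to.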
* netfilter: move nf_send_resetX() code to nf_reject_ipvX modules (Pablo Neira Ayuso, 2014-10-02, 7 files, -117/+309)
| | | | | | | | Move nf_send_reset() and nf_send_reset6() to nf_reject_ipv4 and nf_reject_ipv6 respectively. This code is shared by x_tables and nf_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nft_reject: introduce icmp code abstraction for inet and bridge (Pablo Neira Ayuso, 2014-10-02, 7 files, -17/+241)
| This patch introduces the NFT_REJECT_ICMPX_UNREACH type, which provides an abstraction over the ICMP and ICMPv6 codes that you can use from the inet and bridge tables. The codes are:
|
| * NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable
| * NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable
| * NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable
| * NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratively prohibited
|
| You can still use the specific codes when restricting the rule to match the corresponding layer 3 protocol.
|
| I decided not to overload the existing NFT_REJECT_ICMP_UNREACH with different semantics depending on the table family, and to allow the user to specify ICMP family-specific codes if they restrict the rule to the corresponding family.
|
| Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
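As a rough illustration of the abstraction described above, the sketch below resolves each family-independent reject code to an ICMPv4 or ICMPv6 destination-unreachable code. The numeric values are the standard ones from RFC 792/RFC 1812 and RFC 4443; the specific pairing chosen here (for example, code 13 for the IPv4 "administratively prohibited" case) and all Python names are assumptions for illustration, not a copy of the kernel's table.

```python
from enum import Enum


class RejectIcmpx(Enum):
    NO_ROUTE = "no-route"
    PORT_UNREACH = "port-unreach"
    HOST_UNREACH = "host-unreach"
    ADMIN_PROHIBITED = "admin-prohibited"


# ICMPv4 destination unreachable (type 3) codes.
ICMPV4_CODE = {
    RejectIcmpx.NO_ROUTE: 0,           # network unreachable
    RejectIcmpx.PORT_UNREACH: 3,       # port unreachable
    RejectIcmpx.HOST_UNREACH: 1,       # host unreachable
    RejectIcmpx.ADMIN_PROHIBITED: 13,  # communication administratively prohibited
}

# ICMPv6 destination unreachable (type 1) codes, RFC 4443.
ICMPV6_CODE = {
    RejectIcmpx.NO_ROUTE: 0,           # no route to destination
    RejectIcmpx.PORT_UNREACH: 4,       # port unreachable
    RejectIcmpx.HOST_UNREACH: 3,       # address unreachable
    RejectIcmpx.ADMIN_PROHIBITED: 1,   # administratively prohibited
}


def resolve_icmp_code(l3proto: str, code: RejectIcmpx) -> int:
    """Pick the protocol-specific code for a packet seen in an inet/bridge table."""
    if l3proto == "ipv4":
        return ICMPV4_CODE[code]
    if l3proto == "ipv6":
        return ICMPV6_CODE[code]
    raise ValueError("unsupported layer 3 protocol: " + l3proto)
```

A single rule in an inet or bridge table can then name one abstract code and let the packet's layer 3 protocol select the concrete value.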
* ipv4: mentions skb_gro_postpull_rcsum() in inet_gro_receive() (Eric Dumazet, 2014-10-01, 1 file, -0/+3)
| | | | | | | | | | | | Proper CHECKSUM_COMPLETE support needs to adjust skb->csum when we remove one header. It's done using skb_gro_postpull_rcsum(). In the case of IPv4, we know that the adjustment is not really needed, because the checksum over the IPv4 header is 0. Let's add a comment to ease code comprehension and avoid copy/paste errors. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
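The statement that the checksum over the IPv4 header is 0 can be checked numerically. The sketch below builds a header with a valid checksum field and shows that its one's-complement sum folds to 0xFFFF (the one's-complement form of zero), so the Internet checksum over it is 0 and removing the header leaves the running sum of the rest of the packet unchanged. The header field values and function names are made up for illustration; this is plain Python arithmetic, not the kernel helper.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, with carries folded back in."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: complement of the one's-complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def build_ipv4_header(src: bytes, dst: bytes, payload_len: int) -> bytes:
    """Minimal 20-byte IPv4 header (no options) with a valid header checksum."""
    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, 20 + payload_len,  # version/IHL, TOS, total length
                         0, 0,                       # identification, flags/fragment
                         64, 6, 0,                   # TTL, protocol (TCP), checksum=0
                         src, dst)
    checksum = internet_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


if __name__ == "__main__":
    payload = b"hello world!"
    hdr = build_ipv4_header(bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]), len(payload))
    print(hex(ones_complement_sum(hdr)))
    print(internet_checksum(hdr))
    print(ones_complement_sum(hdr + payload) == ones_complement_sum(payload))
```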
* fm10k: using vmalloc requires including linux/vmalloc.h (Stephen Rothwell, 2014-10-01, 1 file, -0/+2)
| | | | | | Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ieee802154: fix __init functions (Fabian Frederick, 2014-10-01, 1 file, -2/+2)
| | | | | | | | | | | | Commit 3243acd37fd9 ("ieee802154: add __init to lowpan_frags_sysctl_register") added __init to lowpan_frags_ns_sysctl_register instead of lowpan_frags_sysctl_register Suggested-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'sunvnet-jumbograms' (David S. Miller, 2014-09-30, 5 files, -68/+366)
|\
| | David L Stevens says:
| |
| | ====================
| | sunvnet: add jumbo frames support
| |
| | This patch set updates the sunvnet driver to version 1.6 of the VIO protocol to support per-port exchange of MTU information and allow non-standard MTU sizes, including jumbo frames.
| |
| | Using large MTUs shows a nearly 5X throughput improvement Linux-Solaris and > 10X throughput improvement Linux-Linux.
| |
| | Changes from v8:
| | - add a short timeout to free pending skbs if a new transmit doesn't do it first per Dave Miller <davem@davemloft.net>
| | Changes from v7:
| | - handle skb allocation failures in vnet_skb_shape() per Dave Miller <davem@davemloft.net>
| | Changes from v6:
| | - made kernel transmit path zero-copy to remove memory n^2 scaling issue raised by Raghuram Kothakota <Raghuram.Kothakota@oracle.com>
| | Changes from v5:
| | - fixed comment per Sowmini Varadhan <sowmini.varadhan@oracle.com>
| | Changes from v4:
| | - changed VNET_MAXPACKET per David Laight <David.Laight@ACULAB.COM>
| | - added cookies to support non-contiguous buffers of max size
| | Changes from v3:
| | - added version functions per Dave Miller <davem@davemloft.net>
| | - moved rmtu to vnet_port per Dave Miller <davem@davemloft.net>
| | - explicitly set options bits and capability flags to 0 per Raghuram Kothakota <Raghuram.Kothakota@oracle.com>
| | Changes from v2:
| | - make checkpatch clean
| | Changes from v1:
| | - fix brace formatting per Dave Miller <davem@davemloft.net>
| | ====================
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * sunvnet: generate ICMP PTMUD messages for smaller port MTUs (David L Stevens, 2014-09-30, 1 file, -1/+36)
| | | | | | | | | | | | | | | | | | | | | | This patch sends ICMP and ICMPv6 messages for Path MTU Discovery when a remote port MTU is smaller than the device MTU. This allows mixing newer VIO protocol devices that support MTU negotiation with older devices that do not on the same vswitch. It also allows Linux-Linux LDOMs to use 64K-1 data packets even though Solaris vswitch is limited to <16K MTU. Signed-off-by: David L Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
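The decision described above can be sketched as a small function: when a packet does not fit the remote port's MTU, answer with the standard Path MTU Discovery message for its IP version. The ICMP type and code numbers are the standard ones (RFC 1191 and RFC 4443); the function name, the dictionary layout, and the simplification of comparing a bare packet length against the port MTU are assumptions for illustration, not the driver's logic.

```python
from typing import Optional


def pmtud_reply(ip_version: int, packet_len: int, port_mtu: int) -> Optional[dict]:
    """Return the ICMP error to send back, or None if the packet fits."""
    if packet_len <= port_mtu:
        return None  # packet fits the remote port MTU: transmit normally
    if ip_version == 4:
        # ICMP Destination Unreachable, "fragmentation needed and DF set"
        return {"proto": "icmp", "type": 3, "code": 4, "mtu": port_mtu}
    if ip_version == 6:
        # ICMPv6 Packet Too Big
        return {"proto": "icmpv6", "type": 2, "code": 0, "mtu": port_mtu}
    raise ValueError("unknown IP version")


if __name__ == "__main__":
    # Device MTU is large, but the remote port negotiated a smaller one.
    print(pmtud_reply(4, 9000, 1500))
    print(pmtud_reply(6, 9000, 1500))
    print(pmtud_reply(4, 1400, 1500))
```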
| * sunvnet: allow admin to set sunvnet MTU (David L Stevens, 2014-09-30, 3 files, -5/+10)
| | | | | | | | | | | | | | | | This patch allows an admin to set the MTU on a sunvnet device to arbitrary values between the minimum (68) and maximum (65535) IPv4 packet sizes. Signed-off-by: David L Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sunvnet: make transmit path zero-copy in the kernel (David L Stevens, 2014-09-30, 2 files, -45/+182)
| | | | | | | | | | | | | | | | | | | | | | | | This patch removes pre-allocated transmit buffers and instead directly maps pending packets on demand. This saves O(n^2) maximum-sized transmit buffers, for n hosts on a vswitch, as well as a copy to those buffers. Single-stream TCP throughput linux-solaris dropped ~5% for 1500-byte MTU, but linux-linux at 1500-bytes increased ~20%. Signed-off-by: David L Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sunvnet: upgrade to VIO protocol version 1.6 (David L Stevens, 2014-09-30, 4 files, -22/+143)
|/ | | | | | | | | This patch upgrades the sunvnet driver to support VIO protocol version 1.6. In particular, it adds per-port MTU negotiation, allowing MTUs other than ETH_FRAMELEN with ports using newer VIO protocol versions. Signed-off-by: David L Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: Change tcp_slow_start function to return void (Li RongQing, 2014-09-30, 2 files, -4/+2)
| | | | | | | No caller uses the return value, so make this function return void. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ieee802154: add __init to lowpan_frags_sysctl_register (Fabian Frederick, 2014-09-30, 1 file, -2/+2)
| | | | | | | | lowpan_frags_sysctl_register is only called by __init lowpan_net_frag_init (part of the lowpan module). Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
* irda: add __init to irlan_open (Fabian Frederick, 2014-09-30, 1 file, -2/+2)
| | | | | | | irlan_open is only called by __init irlan_init in same module. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
* next: mips: bpf: Fix build failure (Guenter Roeck, 2014-09-30, 1 file, -1/+0)
| Fix:
|
| arch/mips/net/bpf_jit.c: In function 'build_body':
| arch/mips/net/bpf_jit.c:762:6: error: unused variable 'tmp'
| cc1: all warnings being treated as errors
| make[2]: *** [arch/mips/net/bpf_jit.o] Error 1
|
| Seen when building mips:allmodconfig in -next since next-20140924.
|
| Signed-off-by: Guenter Roeck <linux@roeck-us.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'pxa168_eth' (David S. Miller, 2014-09-30, 5 files, -79/+199)
|\
| | Antoine Tenart says:
| |
| | ====================
| | ARM: Berlin: Ethernet support
| |
| | This series introduces support for the Ethernet controller on Berlin SoCs, using the existing pxa168 Ethernet driver. In order to do this, DT support is added to the driver alongside some other modifications and fixes.
| |
| | This has been tested on a Berlin BG2Q DMP board.
| |
| | Changes since v5:
| | - fixed the build when building the driver as a module
| | Changes since v4:
| | - removed the phy-addr property and added a phy subnode
| | - added COMPILE_TEST for the pxa168_eth driver
| | Changes since v3:
| | - moved the addition of pxa168_eth_get_mac_address() to the patch using it first
| | Changes since v2:
| | - reworked how the MAC address is configured
| | - made the clock anonymous
| | Changes since v1:
| | - removed custom Berlin Ethernet driver
| | - used the pxa168 Ethernet driver instead
| | - made modifications to the pxa168 driver (DT support, fixes)
| | ====================
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * ARM: dts: berlin: enable the Ethernet port on the BG2Q DMP (Antoine Ténart, 2014-09-30, 1 file, -0/+4)
| | | | | | | | | | | | | | | | This patch enables the Ethernet port on the Marvell Berlin2Q DMP board. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ARM: dts: berlin: add the Ethernet node (Antoine Ténart, 2014-09-30, 1 file, -0/+17)
| | | | | | | | | | | | | | | | | | This patch adds the Ethernet node, enabling the network unit on Berlin BG2Q SoCs. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: pxa168_eth: allow to compile the pxa168_eth driver for tests (Antoine Ténart, 2014-09-30, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | Add a dependency to COMPILE_TEST so that the driver can be compiled for test purposes. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: pxa168_eth: allow Berlin SoCs to use the pxa168_eth driver (Antoine Ténart, 2014-09-30, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | Berlin SoCs have an Ethernet controller compatible with the pxa168. Allow these SoCs to use the pxa168_eth driver. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: pxa168_eth: rework the MAC address setup (Antoine Ténart, 2014-09-30, 1 file, -2/+30)
| | This patch reworks the way the MAC address is retrieved. The MAC address can now, in addition to being random, be set in the device tree or retrieved from the Ethernet controller MAC address registers.
| |
| | The probing function will try to get a MAC address in the following order:
| | - From the device tree.
| | - From the Ethernet controller MAC address registers.
| | - Generate a random one.
| |
| | This patch also adds a function to read the MAC address from the Ethernet controller registers.
| |
| | Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
| | Acked-by: Arnd Bergmann <arnd@arndb.de>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
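The probing order listed above amounts to a simple fallback chain. The sketch below models it in Python under stated assumptions: an address counts as usable when it is six bytes, not all zeros, and not multicast, and the random fallback clears the multicast bit and sets the locally administered bit. The function names are hypothetical and the inputs stand in for whatever the device tree and registers actually return.

```python
import os
from typing import Optional, Tuple


def is_valid_ether_addr(addr: bytes) -> bool:
    """Usable unicast address: 6 bytes, not all zeros, multicast bit clear."""
    return len(addr) == 6 and any(addr) and not (addr[0] & 0x01)


def random_ether_addr() -> bytes:
    """Random unicast address with the locally administered bit set."""
    raw = bytearray(os.urandom(6))
    raw[0] &= 0xFE  # clear the multicast bit
    raw[0] |= 0x02  # set the locally administered bit
    return bytes(raw)


def choose_mac(dt_mac: Optional[bytes], reg_mac: Optional[bytes]) -> Tuple[str, bytes]:
    """Try the device tree, then the controller registers, then a random address."""
    for source, candidate in (("device-tree", dt_mac), ("registers", reg_mac)):
        if candidate is not None and is_valid_ether_addr(candidate):
            return source, candidate
    return "random", random_ether_addr()


if __name__ == "__main__":
    registers = bytes.fromhex("020000000001")
    print(choose_mac(None, registers))    # falls through to the registers
    print(choose_mac(None, bytes(6))[0])  # all-zero registers: random address
```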
| * net: pxa168_eth: set the mac address on the Ethernet controller (Antoine Ténart, 2014-09-30, 1 file, -0/+13)
| | | | | | | | | | | | When changing the MAC address, in addition to updating the dev_addr in the net_device structure, this patch also updates the MAC address registers (high and low) of the Ethernet controller with the new MAC. The address stored in these registers is used for IEEE 802.3x Ethernet flow control, which is already enabled. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: pxa168_eth: fix Ethernet flow control status (Antoine Ténart, 2014-09-30, 1 file, -2/+2)
| | | | | | | | | | | | | | | | | | | | IEEE 802.3x Ethernet flow control is disabled when bit (1 << 2) is set in the port status register. Fix the flow control detection in the link event handling function which was relying on the opposite assumption. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Documentation: bindings: net: add the Marvell PXA168 Ethernet controller (Antoine Ténart, 2014-09-30, 1 file, -0/+36)
| | | | | | | | | | | | | | | | | | This adds the binding documentation for the Marvell PXA168 Ethernet controller, following its DT support. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: pxa168_eth: add device tree support (Antoine Ténart, 2014-09-30, 1 file, -23/+47)
| | | | | | | | | | | | | | | | Add the device tree support to the pxa168_eth driver. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: pxa168_eth: clean up (Antoine Ténart, 2014-09-30, 1 file, -52/+50)
|/ | | | | | | | Clean up the pxa168_eth driver a bit before adding the device tree support. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'mlx4-next' (David S. Miller, 2014-09-30, 4 files, -185/+260)
|\ | | | | | | | | | | | | | | | | | | | | | | Or Gerlitz says: ==================== mlx4_core driver updates A series from Jack and Co of low-level fixes for the mlx4_core driver ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug (Jack Morgenstein, 2014-09-30, 1 file, -2/+12)
| | ConnectX-2 HCAs have max_mtu=4k and max_vl=8 VLs. However, with a 4K MTU only 4 VLs are supported.
| |
| | At startup, the driver attempts to set a 4K MTU using the max_vl value obtained from QUERY_PORT. Since that value is 8 VLs (which is supported only up to a 2K MTU), the first attempt to set the MTU/VL port value fails, generating the following error message in the log:
| |
| |   mlx4_core 0000:06:00.0: command 0xc failed: fw status = 0x40
| |
| | The driver then tries again, using mtu=4k, vls=4, and this succeeds.
| |
| | Since we do not want this error message displayed at every driver start when there are ConnectX-2 HCAs on the host, we demote the error message for this specific command/input_modifier/opcode_modifier/fw-status to debug level.
| |
| | Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
| | Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_core: Protect QUERY_PORT wrapper from untrusted guests (Jack Morgenstein, 2014-09-30, 1 file, -2/+7)
| | | | | | | | | | | | | | | | | | | | | | | | | | The function mlx4_QUERY_PORT_wrapper implements only the QUERY_PORT "general" case (opcode modifier = 0). Verify that the opcode modifier is zero, and also that the input modifier contains only the port number in bits 0..7 (all other bits should be zero). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
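The two checks described above reduce to a pair of bit tests. The sketch below expresses them in Python, assuming a 32-bit input modifier whose low 8 bits carry the port number; the function and constant names are illustrative and are not taken from the driver.

```python
PORT_BITS_MASK = 0x000000FF      # bits 0..7 carry the port number
RESERVED_BITS_MASK = 0xFFFFFF00  # every other bit of a 32-bit modifier must be zero


def query_port_args_valid(op_modifier: int, in_modifier: int) -> bool:
    """Accept only the 'general' QUERY_PORT form with a bare port number."""
    if op_modifier != 0:
        return False  # only opcode modifier 0 is implemented by the wrapper
    if in_modifier & RESERVED_BITS_MASK:
        return False  # a guest set bits outside the port field
    return True


def port_from_modifier(in_modifier: int) -> int:
    """Extract the port number once the modifier has been validated."""
    return in_modifier & PORT_BITS_MASK


if __name__ == "__main__":
    print(query_port_args_valid(0, 1))      # plain query for port 1
    print(query_port_args_valid(1, 1))      # unsupported opcode modifier
    print(query_port_args_valid(0, 0x101))  # stray bit above the port field
```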
| * net/mlx4_core: New init and exit flow for mlx4_core (Majd Dibbiny, 2014-09-30, 2 files, -174/+220)
| | In the new flow, we separate the pci initialization and teardown from the initialization and teardown of the other resources.
| |
| | __mlx4_init_one handles the pci resources initialization. It then calls mlx4_load_one to initialize the remainder of the resources.
| |
| | When removing a device, mlx4_remove_one is invoked. However, now mlx4_remove_one calls mlx4_unload_one to free all the resources except the pci resources. When mlx4_unload_one returns, mlx4_remove_one then frees the pci resources.
| |
| | The above separation will allow us to implement 'reset flow' in the future. It will also enable more EQs for VFs and is a pre-step to the modern API to enable/disable SRIOV.
| |
| | Also added nvfs, an integer array of size MLX4_MAX_PORTS + 1, to the mlx4_dev struct. This new field is used to avoid parsing the num_vfs module parameter each time mlx4_restart_one is called.
| |
| | Signed-off-by: Majd Dibbiny <majd@mellanox.com>
| | Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_core: Don't disable SRIOV if there are active VFs (Jack Morgenstein, 2014-09-30, 1 file, -7/+21)
|/ | | | | | | | | | | | | | | | When unloading the host driver while there are VFs active on VMs, the PF driver disabled sriov anyway, causing kernel crashes. We now leave SRIOV enabled, to avoid that. When the driver is reloaded, __mlx4_init_one is invoked on the PF. It now checks to see if SRIOV is already enabled on the PF -- and if so does not enable sriov again. Signed-off-by: Tal Alon <talal@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: bridge: build br_nf_core only if required (Florian Westphal, 2014-09-30, 2 files, -3/+4)
| | | | | | | | | | | | | | | | | | Eric reports a build failure with CONFIG_BRIDGE_NETFILTER=n. We insist on building br_nf_core.o unconditionally, but we must only do so if br_netfilter was enabled, else it fails to build due to functions being defined to empty stubs (and some structure members being defined out). Also, BRIDGE_NETFILTER=y|m makes no sense when BRIDGE=n. Fixes: 34666d467 (netfilter: bridge: move br_netfilter out of the core) Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'am335x' (David S. Miller, 2014-09-30, 5 files, -3/+57)
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Markus Pargmann says: ==================== net: cpsw: Support for am335x chip MACIDs This series adds support to the cpsw driver to read the MACIDs of the am335x chip and use them as fallback. These addresses are only used if there are no mac addresses in the devicetree, for example set by a bootloader. ==================== Acked-by: Mugunthan V N <mugunthanvnm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * arm: dts: am33xx, Add syscon phandle to cpsw node (Markus Pargmann, 2014-09-30, 1 file, -0/+1)
| | | | | | | | | | | | | | | | | | | | There are 2 MACIDs stored in the control module of the am33xx. These are read by the cpsw driver if no valid MACID was found in the devicetree. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Reviewed-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * am33xx: define syscon control module device node (Markus Pargmann, 2014-09-30, 1 file, -0/+5)
| | | | | | | | | | | | | | Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Reviewed-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cpsw: Add am33xx MACID readout (Markus Pargmann, 2014-09-30, 3 files, -1/+47)
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds a function to get the MACIDs from the am33xx SoC control module registers which hold unique vendor MACIDs. This is only used if of_get_mac_address() fails to get a valid mac address. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Reviewed-by: Wolfram Sang <wsa@the-dreams.de> Tested-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cpsw: Replace pr_err by dev_err (Markus Pargmann, 2014-09-30, 1 file, -1/+1)
| | | | | | | | | | | | | | | | Use dev_err instead of pr_err. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Reviewed-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cpsw: header, Add missing include (Markus Pargmann, 2014-09-30, 1 file, -0/+1)
| | | | | | | | | | | | | | | | | | "MII_BUS_ID_SIZE" is defined in linux/phy.h which is not included in the cpsw.h file. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Reviewed-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cpsw: Add missing return value (Markus Pargmann, 2014-09-30, 1 file, -0/+1)
| | | | | | | | | | | | | | | | | | | | ret is set 0 at this point, so jumping to that error label would result in a return value of 0. Set ret to -ENOMEM to return a proper error value. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Reviewed-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * DT doc: net: cpsw mac-address is optional (Markus Pargmann, 2014-09-30, 1 file, -1/+1)
|/ | | | | | | | | mac-address is an optional property. If no mac-address is set, a random mac-address will be generated. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Reviewed-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* bonding: make global bonding stats more reliable (Andy Gospodarek, 2014-09-30, 2 files, -28/+43)
| As the code stands today, bonding stats are based simply on the stats from the member interfaces. If a member was to be removed from a bond, the stats would instantly drop. This would be confusing to an admin who would suddenly see interface stats drop while traffic is still flowing.
|
| In addition to preventing the stats drops mentioned above, new members will now be added to the bond and only traffic received after the member was added to the bond will be counted as part of bonding stats. Bonding counters will also be updated when any slaves are dropped to make sure the reported stats are reliable.
|
| v2: Changes suggested by Nik to properly allocate/free stats memory.
| v3: Properly destroy workqueue and fix netlink configuration path.
| v4: Moved cached stats into bonding and slave structs as there does not seem to be a complexity/performance benefit to using alloc'd memory vs in-struct memory.
|
| Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
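One way to get the behaviour described above is to keep a running total on the bond and fold in per-member deltas instead of summing member counters on every read. The sketch below is a minimal Python model of that idea with a single counter; the class and method names are hypothetical and it is not the bonding driver's data structure.

```python
class BondStats:
    """Toy model: bond total = sum of deltas observed while members were enslaved."""

    def __init__(self) -> None:
        self.total = 0
        self._last_seen = {}  # member name -> counter value at the last sync

    def enslave(self, name: str, current: int) -> None:
        # Snapshot the counter so traffic from before joining is not counted.
        self._last_seen[name] = current

    def _sync(self, name: str, current: int) -> None:
        self.total += current - self._last_seen[name]
        self._last_seen[name] = current

    def release(self, name: str, current: int) -> None:
        # Fold in the final delta so the total does not drop on removal.
        self._sync(name, current)
        del self._last_seen[name]

    def get_stats(self, counters: dict) -> int:
        for name, current in counters.items():
            if name in self._last_seen:
                self._sync(name, current)
        return self.total


if __name__ == "__main__":
    bond = BondStats()
    bond.enslave("eth0", 1000)  # eth0 had already seen 1000 packets
    bond.enslave("eth1", 50)
    print(bond.get_stats({"eth0": 1500, "eth1": 150}))
    bond.release("eth1", 200)
    print(bond.get_stats({"eth0": 1500}))
```

In this model, removing eth1 leaves the bond total intact, and the 1000 packets eth0 saw before it joined are never counted.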
* net: sched: enable per cpu qstats (John Fastabend, 2014-09-30, 17 files, -25/+87)
| | | | | | | | After previous patches to simplify qstats the qstats can be made per cpu with a packed union in Qdisc struct. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: restrict use of qstats qlen (John Fastabend, 2014-09-30, 16 files, -34/+32)
| | | | | | | | | | | | | | | | | | | | This removes the use of the qstats->qlen variable from the classifiers and makes it an explicit argument to gnet_stats_copy_queue(). The qlen represents the qdisc queue length and is packed into the qstats at the last moment before passing to user space. By handling it explicitly we avoid, in the percpu stats case, having to figure out which per_cpu variable to put it in. It would probably be best to remove it from qstats completely but qstats is a user space ABI and can't be broken. A future patch could make an internal only qstats structure that would avoid having to allocate an additional u32 variable on the Qdisc struct. This would make the qstats struct 128bits instead of 128+32. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: implement qstat helper routines (John Fastabend, 2014-09-30, 25 files, -81/+108)
| | | | | | | | | This adds helpers to manipulate qstats logic and replaces locations that touch the counters directly. This simplifies future patches to push qstats onto per cpu counters. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: make bstats per cpu and estimator RCU safe (John Fastabend, 2014-09-30, 19 files, -51/+164)
| | | | | | | | | | | | | | | | | In order to run qdiscs without locking, statistics and estimators need to be handled correctly. To resolve bstats, make the statistics per cpu. Because this is only needed for qdiscs that are running without locks, which is not the case for most qdiscs in the near future, only create percpu stats when qdiscs set the TCQ_F_CPUSTATS flag. Next, because estimators use the bstats to calculate packets per second and bytes per second, the estimator code paths are updated to use the per cpu statistics. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
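The per-cpu idea can be pictured as one counter slot per CPU that only its owner updates, with readers such as a rate estimator summing the slots on demand. The sketch below models that layout in Python; it says nothing about the kernel's actual synchronisation, and the class and method names are invented for illustration.

```python
class PerCpuBstats:
    """Toy model of per-cpu byte/packet counters summed at read time."""

    def __init__(self, nr_cpus: int) -> None:
        self._bytes = [0] * nr_cpus
        self._packets = [0] * nr_cpus

    def update(self, cpu: int, packet_len: int) -> None:
        # Each CPU touches only its own slot, so writers never share a counter.
        self._bytes[cpu] += packet_len
        self._packets[cpu] += 1

    def read(self) -> tuple:
        # Readers fold all per-cpu slots into one (bytes, packets) pair.
        return sum(self._bytes), sum(self._packets)


def rate_per_second(before: tuple, after: tuple, interval_s: float) -> tuple:
    """Bytes/s and packets/s between two read() snapshots."""
    return ((after[0] - before[0]) / interval_s,
            (after[1] - before[1]) / interval_s)


if __name__ == "__main__":
    stats = PerCpuBstats(nr_cpus=4)
    first = stats.read()
    stats.update(0, 1500)
    stats.update(3, 500)
    second = stats.read()
    print(second, rate_per_second(first, second, 2.0))
```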
* macvlan: add source mode (Michael Braun, 2014-09-29, 3 files, -3/+314)
| This patch adds a new mode of operation to macvlan, called "source". It allows one to set a list of allowed MAC addresses, which is used to match against the source MAC address of frames received on the underlying interface. This enables creating MAC-based VLAN associations, instead of the standard port- or tag-based ones. The feature is useful for deploying 802.1x MAC-based behavior where the drivers of the underlying interfaces do not allow that.
|
| Configuration is done through the netlink interface using e.g.:
|
|   ip link add link eth0 name macvlan0 type macvlan mode source
|   ip link add link eth0 name macvlan1 type macvlan mode source
|   ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
|   ip link set link dev macvlan0 type macvlan macaddr add 00:22:22:22:22:22
|   ip link set link dev macvlan0 type macvlan macaddr add 00:33:33:33:33:33
|   ip link set link dev macvlan1 type macvlan macaddr add 00:33:33:33:33:33
|   ip link set link dev macvlan1 type macvlan macaddr add 00:44:44:44:44:44
|
| This allows clients with MAC addresses 00:11:11:11:11:11 and 00:22:22:22:22:22 to be part of only the VLAN associated with the macvlan0 interface. The client with MAC address 00:44:44:44:44:44 is part of only the VLAN associated with the macvlan1 interface. The client with MAC address 00:33:33:33:33:33 is associated with both VLANs.
|
| Based on work of Stefan Gula <steweg@gmail.com>
|
| v8: last version of Stefan Gula for Kernel 3.2.1
| v9: rework onto linux-next 2014-03-12 by Michael Braun; add MACADDR_SET command, enable to configure mac for source mode while creating interface
| v10:
|  - reduce indentation level
|  - rename source_list to source_entry
|  - use aligned 64bit ether address
|  - use hash_64 instead of addr[5]
| v11:
|  - rebase for 3.14 / linux-next 20.04.2014
| v12:
|  - rebase for linux-next 2014-09-25
|
| Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
| Signed-off-by: David S. Miller <davem@davemloft.net>
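The association described above is essentially a lookup from source MAC address to the set of macvlan devices that list it. The sketch below replays the example configuration from the commit message against a small Python table; the class and method names are hypothetical, and the real implementation's data structures are not shown here.

```python
from collections import defaultdict


class SourceModeTable:
    """Toy model: map a frame's source MAC to the macvlan devices allowing it."""

    def __init__(self) -> None:
        self._devices_by_source = defaultdict(set)

    def macaddr_add(self, dev: str, mac: str) -> None:
        self._devices_by_source[mac.lower()].add(dev)

    def receivers(self, src_mac: str) -> list:
        """Devices that should see a frame arriving with this source MAC."""
        return sorted(self._devices_by_source.get(src_mac.lower(), ()))


def example_table() -> SourceModeTable:
    # Same associations as the `ip link ... macaddr add` commands in the message.
    table = SourceModeTable()
    for mac in ("00:11:11:11:11:11", "00:22:22:22:22:22", "00:33:33:33:33:33"):
        table.macaddr_add("macvlan0", mac)
    for mac in ("00:33:33:33:33:33", "00:44:44:44:44:44"):
        table.macaddr_add("macvlan1", mac)
    return table


if __name__ == "__main__":
    table = example_table()
    for mac in ("00:11:11:11:11:11", "00:33:33:33:33:33", "00:55:55:55:55:55"):
        print(mac, table.receivers(mac))
```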
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next (David S. Miller, 2014-09-29, 72 files, -512/+1595)
|\
| | Pablo Neira Ayuso says:
| |
| | ====================
| | pull request: netfilter/ipvs updates for net-next
| |
| | The following patchset contains Netfilter/IPVS updates for net-next, most relevantly they are:
| |
| | 1) Four patches to make the new nf_tables masquerading support independent of the x_tables infrastructure. This also resolves a compilation breakage if the masquerade target is disabled but the nf_tables masq expression is enabled.
| |
| | 2) ipset updates via Jozsef Kadlecsik. This includes the addition of the skbinfo extension that allows you to store packet metainformation in the elements. This can be used to fetch and restore this to the packets through the iptables SET target, patches from Anton Danilov.
| |
| | 3) Add the hash:mac set type to ipset, from Jozsef Kadlecsik.
| |
| | 4) Add simple weighted fail-over scheduler via Simon Horman. This provides a fail-over IPVS scheduler (unlike existing load balancing schedulers). Connections are directed to the appropriate server based solely on highest weight value and server availability, patch from Kenny Mathis.
| |
| | 5) Support IPv6 real servers in IPv4 virtual-services and vice versa. Simon Horman informs that the motivation for this is to allow more flexibility in the choice of IP version offered by both virtual-servers and real-servers as they no longer need to match: An IPv4 connection from an end-user may be forwarded to a real-server using IPv6 and vice versa. No ip_vs_sync support yet though. Patches from Alex Gartrell and Julian Anastasov.
| |
| | 6) Add global generation ID to the nf_tables ruleset. When dumping from several different object lists, we need a way to identify that an update has occurred so userspace knows that it needs to refresh its lists. This also includes a new command to obtain the 32-bit generation ID. The least significant 16 bits of this ID are also exposed through the res_id field in the nfnetlink header to quickly detect the interference and retry when there is no risk of ID wraparound.
| |
| | 7) Move br_netfilter out of the bridge core. The br_netfilter code is built in the bridge core by default. This causes problems of different kinds to people that don't want this: Jesper reported a performance drop due to the unconditional hook registration, and I remember reading complaints on netdev from people regarding the unexpected behaviour of our bridging stack when br_netfilter is enabled (fragmentation handling, layer 3 and upper inspection). People that still need this should easily undo the damage by modprobing the new br_netfilter module.
| |
| | 8) Dump the nf_tables set policy, which allows set parameterization, so userspace can keep user-defined preferences when saving the ruleset. From Arturo Borrero.
| |
| | 9) Use __seq_open_private() helper function to reduce boilerplate code in x_tables, from Rob Jones.
| |
| | 10) Safer default behaviour in case that you forget to load the protocol tracker. Daniel Borkmann and Florian Westphal detected that if your ruleset is stateful, you allow traffic to at least one single SCTP port and the SCTP protocol tracker is not loaded, then any SCTP traffic may pass through unfiltered. After this patch, the connection tracking classifies SCTP/DCCP/UDPlite/GRE packets as invalid if your kernel has been compiled with support for these modules.
| | ====================
| |
| | Trivially resolved conflict in include/linux/skbuff.h, Eric moved some netfilter skbuff members around, and the netfilter tree adjusted the ifdef guards for the bridging info pointer.
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: conntrack: disable generic tracking for known protocols (Florian Westphal, 2014-09-29, 1 file, -1/+25)
| | Given following iptables ruleset:
| |
| |   -P FORWARD DROP
| |   -A FORWARD -m sctp --dport 9 -j ACCEPT
| |   -A FORWARD -p tcp --dport 80 -j ACCEPT
| |   -A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT
| |
| | One would assume that this allows SCTP on port 9 and TCP on port 80. Unfortunately, if the SCTP conntrack module is not loaded, this allows *all* SCTP communication to pass through, i.e. -p sctp -j ACCEPT, which we think is a security issue.
| |
| | This is because on the first SCTP packet on port 9, we create a dummy "generic l4" conntrack entry without any port information (since conntrack doesn't know how to extract this information).
| |
| | All subsequent packets that are unknown will then be in established state since they will fall back to proto_generic and will match the 'generic' entry.
| |
| | Our originally proposed version [1] completely disabled generic protocol tracking, but Jozsef suggests not tracking protocols for which a more suitable helper is available, hence we now mitigate the issue for in-tree known ct protocol helpers only, so that at least NAT and direction information will still be preserved for others.
| |
| | [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html
| |
| | Joint work with Daniel Borkmann.
| |
| | Signed-off-by: Florian Westphal <fw@strlen.de>
| | Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
| | Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| | Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
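The failure mode described above can be reproduced with a toy model: a generic tracker keys its entries on addresses and protocol only, so once one SCTP packet to port 9 is accepted, every later SCTP packet between the same hosts looks established. The sketch below is a deliberately simplified Python simulation with invented names; it models only the SCTP accept rule and an established-state shortcut, not real conntrack or iptables evaluation.

```python
KNOWN_PROTOCOLS = {"sctp", "dccp", "udplite", "gre"}  # have dedicated trackers


class ToyFirewall:
    def __init__(self, skip_generic_for_known: bool) -> None:
        self.skip_generic_for_known = skip_generic_for_known
        self.generic_entries = set()  # (src, dst, proto): no port information

    def forward(self, src: str, dst: str, proto: str, dport: int) -> bool:
        # With the mitigation, known protocols are not handled by the generic tracker.
        trackable = not (self.skip_generic_for_known and proto in KNOWN_PROTOCOLS)
        key = (src, dst, proto)
        if trackable and key in self.generic_entries:
            return True  # treated as an established flow, ports never checked
        if proto == "sctp" and dport == 9:
            if trackable:
                self.generic_entries.add(key)  # port-less "generic l4" entry
            return True  # explicit accept rule for SCTP port 9
        return False  # default policy: drop


if __name__ == "__main__":
    for mitigated in (False, True):
        fw = ToyFirewall(skip_generic_for_known=mitigated)
        first = fw.forward("10.0.0.1", "10.0.0.2", "sctp", 9)
        other = fw.forward("10.0.0.1", "10.0.0.2", "sctp", 1234)
        print("mitigated:", mitigated, "port 9:", first, "port 1234:", other)
```

Without the mitigation the port 1234 packet is forwarded; with it, only the explicitly allowed port gets through.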
| * netfilter: nf_tables: store and dump set policy (Arturo Borrero, 2014-09-29, 2 files, -0/+8)
| | | | | | | | | | | | | | | | We want to know in which cases the user explicitly sets the policy options. In that case, we also want to dump back the info. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>