op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO	Duan Jiong	2014-04-12	1	-3/+1
\| \| \| \| \| \| \| \|	This patch fixes coccinelle error regarding usage of IS_ERR and PTR_ERR instead of PTR_ERR_OR_ZERO. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: ipv6: Fix oif in TCP SYN+ACK route lookup.	Lorenzo Colitti	2014-04-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	net-next commit 9c76a11, ipv6: tcp_ipv6 policy route issue, had a boolean logic error that caused incorrect behaviour for TCP SYN+ACK when oif-based rules are in use. Specifically: 1. If a SYN comes in from a global address, and sk_bound_dev_if is not set, the routing lookup has oif set to the interface the SYN came in on. Instead, it should have oif unset, because for global addresses, the incoming interface doesn't necessarily have any bearing on the interface the SYN+ACK is sent out on. 2. If a SYN comes in from a link-local address, and sk_bound_dev_if is set, the routing lookup has oif set to the interface the SYN came in on. Instead, it should have oif set to sk_bound_dev_if, because that's what the application requested. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'for-davem' of ↵	David S. Miller	2014-04-11	12	-62/+66
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Please pull this batch of fixes intended for the 3.15 stream! Chun-Yeow Yeoh gives us an ath9k_htc fix so that mac80211 can report last_tx_rate correctly for those devices.. Fariya Fatima has a number of small fixes for things identified by the static analysis folks in the new rsi driver. Felix Fietkau brings an ath9k fix to better support some older chips, and a fix for a scheduling while atomic bug introduced by an earlier patch. Janusz Dziedzic produced an ath9k fix to only enable DFS when a related build option is selected. Paul Bolle removes some dead code in rtlwifi. Rafał Miłecki fixes some b43 code that was accessing some registers with operations for the wrong register width. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge branch 'master' of ↵	John W. Linville	2014-04-10	12	-62/+66
\| \|\ \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
\| \| *	ath9k: fix a scheduling while atomic bug in CSA handling	Felix Fietkau	2014-04-09	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit "ath9k: prepare for multi-interface CSA support" added a call to ieee80211_iterate_active_interfaces in atomic context (beacon tasklet), which is crashing. Use ieee80211_iterate_active_interfaces_atomic instead. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| \| *	ath9k_hw: reduce ANI firstep range for older chips	Felix Fietkau	2014-04-09	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use 0-8 instead of 0-16, which is closer to the old implementation. Also drop the overwrite of the firstep_low parameter to improve stability. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| \| *	ath9k: Enable DFS only when ATH9K_DFS_CERTIFIED	Janusz Dziedzic	2014-04-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add DFS interface combination only when CONFIG_ATH9K_DFS_CERTIFIED is set. In other case user can run CAC/beaconing without proper handling of pulse events (without radar detection activated). Reported-by: Cedric Voncken <cedric.voncken@acksys.fr> Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| \| *	b43: Fix machine check error due to improper access of B43_MMIO_PSM_PHY_HDR	Rafał Miłecki	2014-04-09	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Register B43_MMIO_PSM_PHY_HDR is 16 bit one, so accessing it with 32b functions isn't safe. On my machine it causes delayed (!) CPU exception: Disabling lock debugging due to kernel taint mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f mce: [Hardware Error]: TSC 164083803dc mce: [Hardware Error]: PROCESSOR 2:20fc2 TIME 1396650505 SOCKET 0 APIC 0 microcode 0 mce: [Hardware Error]: Run the above through 'mcelog --ascii' mce: [Hardware Error]: Machine check: Processor context corrupt Kernel panic - not syncing: Fatal machine check on current CPU Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Stable <stable@vger.kernel.org> [2.6.35+] Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| \| *	rtlwifi: btcoexist: remove undefined Kconfig macros	Paul Bolle	2014-04-09	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are references to four undefined Kconfig macros in the code. Commit 8542373dccd2 ("Staging: rtl8812ae: remove undefined Kconfig macros") removed identical references from that staging driver, but they resurfaced in rtlwifi. Remove these again as the checks for them still will always evaluate to false. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| \| *	ath9k_htc: set IEEE80211_TX_STAT_AMPDU for acked aggregated frames	Chun-Yeow Yeoh	2014-04-09	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Frame aggregation requires the IEEE80211_TX_STAT_AMPDU to be set so that mac80211 can report the last_tx_rate correctly. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| \| *	rsi: Fixed issue relating to doing dma on stack error.	Fariya Fatima	2014-04-09	1	-7/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Fariya Fatima <fariyaf@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| \| *	rsi: Fixed issue relating to index of q_num.	Fariya Fatima	2014-04-09	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Fariya Fatima <fariyaf@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| \| *	rsi: Fixed issue relating to return value.	Fariya Fatima	2014-04-09	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Fariya Fatima <fariyaf@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| \| *	rsi: Fixed issue relating to variable de-referenced before check 'adapter'	Fariya Fatima	2014-04-09	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Fariya Fatima <fariyaf@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| \| *	rsi: Fixed signedness bug reported by static code analyzer.	Fariya Fatima	2014-04-09	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Fariya Fatima <fariyaf@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| \| *	rsi: Potential null pointer derefernce issue fixed.	Fariya Fatima	2014-04-09	1	-19/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Fariya Fatima <fariyaf@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* \| \|	Merge branch 'cpsw'	David S. Miller	2014-04-11	1	-7/+7
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mugunthan V N says: ==================== This patch series fixes the cpsw issue with interface up/dpwn with high ethernet traffic. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	drivers: net: cpsw: enable interrupts after napi enable and clearing ↵	Mugunthan V N	2014-04-11	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	previous interrupts When the Ethernet interface is put down and up with heavy Ethernet traffic, then there is prossibility of an interrupt waiting in irq controller to be processed, so when the interface is brought up again just after enable interrupt, it goes to ISR due to the previous unhandled interrutp and in ISR napi is not scheduled as the napi is not enabled in ndo_open which results in disabled interrupt for CPSW and no packets are received in cpsw. So this patch moves enabling of interupts after napi_enable and clearing CPDMA interrupts. Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	drivers: net: cpsw: discard all packets received when interface is down	Mugunthan V N	2014-04-11	1	-1/+1
\|/ / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the Ethernet interface is brought down during high Ethernet traffic, then cpsw creates the following warn dump. When cpdma has already processed the packet then the status will be greater than 0, so the cpsw_rx_handler considers that the interface is up and try to resubmit one more rx buffer to cpdma which fails as the DMA is in teardown process. This can be avoided by checking the interface state and then process the received packet, if the interface is down just discard and free the skb and return. [ 2823.104591] WARNING: CPU: 0 PID: 1823 at drivers/net/ethernet/ti/cpsw.c:711 cpsw_rx_handler+0x148/0x164() [ 2823.114654] Modules linked in: [ 2823.117872] CPU: 0 PID: 1823 Comm: ifconfig Tainted: G W 3.14.0-11992-gf34c4a3 #11 [ 2823.126860] [<c0014b5c>] (unwind_backtrace) from [<c00117e4>] (show_stack+0x10/0x14) [ 2823.135030] [<c00117e4>] (show_stack) from [<c0533a9c>] (dump_stack+0x80/0x9c) [ 2823.142619] [<c0533a9c>] (dump_stack) from [<c003f0e0>] (warn_slowpath_common+0x6c/0x90) [ 2823.151141] [<c003f0e0>] (warn_slowpath_common) from [<c003f120>] (warn_slowpath_null+0x1c/0x24) [ 2823.160336] [<c003f120>] (warn_slowpath_null) from [<c03caeb0>] (cpsw_rx_handler+0x148/0x164) [ 2823.169314] [<c03caeb0>] (cpsw_rx_handler) from [<c03c730c>] (__cpdma_chan_free+0x90/0xa8) [ 2823.178028] [<c03c730c>] (__cpdma_chan_free) from [<c03c7418>] (__cpdma_chan_process+0xf4/0x134) [ 2823.187279] [<c03c7418>] (__cpdma_chan_process) from [<c03c7560>] (cpdma_chan_stop+0xb4/0x17c) [ 2823.196349] [<c03c7560>] (cpdma_chan_stop) from [<c03c766c>] (cpdma_ctlr_stop+0x44/0x9c) [ 2823.204872] [<c03c766c>] (cpdma_ctlr_stop) from [<c03cb708>] (cpsw_ndo_stop+0x154/0x188) [ 2823.213321] [<c03cb708>] (cpsw_ndo_stop) from [<c046f0ec>] (__dev_close_many+0x84/0xc8) [ 2823.221761] [<c046f0ec>] (__dev_close_many) from [<c046f158>] (__dev_close+0x28/0x3c) [ 2823.230012] [<c046f158>] (__dev_close) from [<c0474ca8>] (__dev_change_flags+0x88/0x160) [ 2823.238483] [<c0474ca8>] (__dev_change_flags) from [<c0474da0>] (dev_change_flags+0x18/0x48) [ 2823.247316] [<c0474da0>] (dev_change_flags) from [<c04d12c4>] (devinet_ioctl+0x61c/0x6e0) [ 2823.255884] [<c04d12c4>] (devinet_ioctl) from [<c045c660>] (sock_ioctl+0x68/0x2a4) [ 2823.263789] [<c045c660>] (sock_ioctl) from [<c0125fe4>] (do_vfs_ioctl+0x78/0x61c) [ 2823.271629] [<c0125fe4>] (do_vfs_ioctl) from [<c01265ec>] (SyS_ioctl+0x64/0x74) [ 2823.279284] [<c01265ec>] (SyS_ioctl) from [<c000e580>] (ret_fast_syscall+0x0/0x48) Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	net: Fix use after free by removing length arg from sk_data_ready callbacks.	David S. Miller	2014-04-11	58	-121/+112
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	Merge branch 'hyperv'	David S. Miller	2014-04-11	4	-4/+41
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	K. Y. Srinivasan says: ==================== Fix issues with Heper-V network offload code WS2008 R2 does not support udp checksum offload. Furthermore, ws2012 and ws2012 r2 have issues offloading udp checksum from Linux guests. This patch-set addresses these issues as well as other bug fixes. Please apply. In this version, I have addressed the comment from David Miller with reagards to COWing the skb prior to modifying the header (patch 3/3). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	Drivers: net: hyperv: Address UDP checksum issues	KY Srinivasan	2014-04-11	3	-2/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ws2008r2 does not support UDP checksum offload. Thus, we cannnot turn on UDP offload in the host. Also, on ws2012 and ws2012 r2, there appear to be an issue with UDP checksum offload. Fix this issue by computing the UDP checksum in the Hyper-V driver. Based on Dave Miller's comments, in this version, I have COWed the skb before modifying the UDP header (the checksum field). Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	Drivers: net: hyperv: Negotiate suitable ndis version for offload support	KY Srinivasan	2014-04-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ws2008R2 supports ndis_version 6.1 and 6.1 is the minimal version required for various offloads. Negotiate ndis_version 6.1 when on ws2008r2. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	Drivers: net: hyperv: Allocate memory for all possible per-pecket information	KY Srinivasan	2014-04-11	1	-1/+3
\|/ / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An outgoing packet can potentially need per-packet information for all the offloads and VLAN tagging. Fix this issue. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	bridge: Fix double free and memory leak around br_allowed_ingress	Toshiaki Makita	2014-04-11	2	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	br_allowed_ingress() has two problems. 1. If br_allowed_ingress() is called by br_handle_frame_finish() and vlan_untag() in br_allowed_ingress() fails, skb will be freed by both vlan_untag() and br_handle_frame_finish(). 2. If br_allowed_ingress() is called by br_dev_xmit() and br_allowed_ingress() fails, the skb will not be freed. Fix these two problems by freeing the skb in br_allowed_ingress() if it fails. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	bonding: Remove debug_fs files when module init fails	Thomas Richter	2014-04-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the bonding debug_fs entries when the module initialization fails. The debug_fs entries should be removed together with all other already allocated resources. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	net: core: don't account for udp header size when computing seglen	Florian Westphal	2014-04-10	1	-5/+7
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case of tcp, gso_size contains the tcpmss. For UFO (udp fragmentation offloading) skbs, gso_size is the fragment payload size, i.e. we must not account for udp header size. Otherwise, when using virtio drivers, a to-be-forwarded UFO GSO packet will be needlessly fragmented in the forward path, because we think its individual segments are too large for the outgoing link. Fixes: fe6cc55f3a9a053 ("net: ip, ipv6: handle gso skbs in forwarding path") Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	l2tp: take PMTU from tunnel UDP socket	Dmitry Petukhov	2014-04-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When l2tp driver tries to get PMTU for the tunnel destination, it uses the pointer to struct sock that represents PPPoX socket, while it should use the pointer that represents UDP socket of the tunnel. Signed-off-by: Dmitry Petukhov <dmgenp@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	drivers: net: cpsw: Add default vlan for dual emac case also	Mugunthan V N	2014-04-09	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dual EMAC works with VLAN segregation of the ports, so default vlan needs to be added in dual EMAC case else default vlan will be tagged for all egress packets and vlan unaware switches/servers will drop packets from the EVM. Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Tested-by: Yegor Yefremov <yegorslists@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net phylib: Remove unnecessary condition check in phy	Balakumaran Kannan	2014-04-09	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This condition check makes no difference in the code flow since 3.10 Signed-off-by: Balakumaran Kannan <kumaran.4353@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: sctp: test if association is dead in sctp_wake_up_waiters	Daniel Borkmann	2014-04-09	1	-0/+6
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In function sctp_wake_up_waiters(), we need to involve a test if the association is declared dead. If so, we don't have any reference to a possible sibling association anymore and need to invoke sctp_write_space() instead, and normally walk the socket's associations and notify them of new wmem space. The reason for special casing is that otherwise, we could run into the following issue when a sctp_primitive_SEND() call from sctp_sendmsg() fails, and tries to flush an association's outq, i.e. in the following way: sctp_association_free() `-> list_del(&asoc->asocs) <-- poisons list pointer asoc->base.dead = true sctp_outq_free(&asoc->outqueue) `-> __sctp_outq_teardown() `-> sctp_chunk_free() `-> consume_skb() `-> sctp_wfree() `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers if asoc->ep->sndbuf_policy=0 Therefore, only walk the list in an 'optimized' way if we find that the current association is still active. We could also use list_del_init() in addition when we call sctp_association_free(), but as Vlad suggests, we want to trap such bugs and thus leave it poisoned as is. Why is it safe to resolve the issue by testing for asoc->base.dead? Parallel calls to sctp_sendmsg() are protected under socket lock, that is lock_sock()/release_sock(). Only within that path under lock held, we're setting skb/chunk owner via sctp_set_owner_w(). Eventually, chunks are freed directly by an association still under that lock. So when traversing association list on destruction time from sctp_wake_up_waiters() via sctp_wfree(), a different CPU can't be running sctp_wfree() while another one calls sctp_association_free() as both happens under the same lock. Therefore, this can also not race with setting/testing against asoc->base.dead as we are guaranteed for this to happen in order, under lock. Further, Vlad says: the times we check asoc->base.dead is when we've cached an association pointer for later processing. In between cache and processing, the association may have been freed and is simply still around due to reference counts. We check asoc->base.dead under a lock, so it should always be safe to check and not race against sctp_association_free(). Stress-testing seems fine now, too. Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds	2014-04-08	38	-168/+320
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull more networking updates from David Miller: 1) If a VXLAN interface is created with no groups, we can crash on reception of packets. Fix from Mike Rapoport. 2) Missing includes in CPTS driver, from Alexei Starovoitov. 3) Fix string validations in isdnloop driver, from YOSHIFUJI Hideaki and Dan Carpenter. 4) Missing irq.h include in bnxw2x, enic, and qlcnic drivers. From Josh Boyer. 5) AF_PACKET transmit doesn't statistically count TX drops, from Daniel Borkmann. 6) Byte-Queue-Limit enabled drivers aren't handled properly in AF_PACKET transmit path, also from Daniel Borkmann. Same problem exists in pktgen, and Daniel fixed it there too. 7) Fix resource leaks in driver probe error paths of new sxgbe driver, from Francois Romieu. 8) Truesize of SKBs can gradually get more and more corrupted in NAPI packet recycling path, fix from Eric Dumazet. 9) Fix uniprocessor netfilter build, from Florian Westphal. In the longer term we should perhaps try to find a way for ARRAY_SIZE() to work even with zero sized array elements. 10) Fix crash in netfilter conntrack extensions due to mis-estimation of required extension space. From Andrey Vagin. 11) Since we commit table rule updates before trying to copy the counters back to userspace (it's the last action we perform), we really can't signal the user copy with an error as we are beyond the point from which we can unwind everything. This causes all kinds of use after free crashes and other mysterious behavior. From Thomas Graf. 12) Restore previous behvaior of div/mod by zero in BPF filter processing. From Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) net: sctp: wake up all assocs if sndbuf policy is per socket isdnloop: several buffer overflows netdev: remove potentially harmful checks pktgen: fix xmit test for BQL enabled devices net/at91_ether: avoid NULL pointer dereference tipc: Let tipc_release() return 0 at86rf230: fix MAX_CSMA_RETRIES parameter mac802154: fix duplicate #include headers sxgbe: fix duplicate #include headers net: filter: be more defensive on div/mod by X==0 netfilter: Can't fail and free after table replacement xen-netback: Trivial format string fix net: bcmgenet: Remove unnecessary version.h inclusion net: smc911x: Remove unused local variable bonding: Inactive slaves should keep inactive flag's value netfilter: nf_tables: fix wrong format in request_module() netfilter: nf_tables: set names cannot be larger than 15 bytes netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len netfilter: Add {ipt,ip6t}_osf aliases for xt_osf netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks ...
\| *	net: sctp: wake up all assocs if sndbuf policy is per socket	Daniel Borkmann	2014-04-08	1	-1/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SCTP charges chunks for wmem accounting via skb->truesize in sctp_set_owner_w(), and sctp_wfree() respectively as the reverse operation. If a sender runs out of wmem, it needs to wait via sctp_wait_for_sndbuf(), and gets woken up by a call to __sctp_write_space() mostly via sctp_wfree(). __sctp_write_space() is being called per association. Although we assign sk->sk_write_space() to sctp_write_space(), which is then being done per socket, it is only used if send space is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE is set and therefore not invoked in sock_wfree(). Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf again when transmitting packet") fixed an issue where in case sctp_packet_transmit() manages to queue up more than sndbuf bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is interrupted by a signal. However, a still remaining issue is that if net.sctp.sndbuf_policy=0, that is accounting per socket, and one-to-many sockets are in use, the reclaimed write space from sctp_wfree() is 'unfairly' handed back on the server to the association that is the lucky one to be woken up again via __sctp_write_space(), while the remaining associations are never be woken up again (unless by a signal). The effect disappears with net.sctp.sndbuf_policy=1, that is wmem accounting per association, as it guarantees a fair share of wmem among associations. Therefore, if we have reclaimed memory in case of per socket accounting, wake all related associations to a socket in a fair manner, that is, traverse the socket association list starting from the current neighbour of the association and issue a __sctp_write_space() to everyone until we end up waking ourselves. This guarantees that no association is preferred over another and even if more associations are taken into the one-to-many session, all receivers will get messages from the server and are not stalled forever on high load. This setting still leaves the advantage of per socket accounting in touch as an association can still use up global limits if unused by others. Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	isdnloop: several buffer overflows	Dan Carpenter	2014-04-08	1	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are three buffer overflows addressed in this patch. 1) In isdnloop_fake_err() we add an 'E' to a 60 character string and then copy it into a 60 character buffer. I have made the destination buffer 64 characters and I'm changed the sprintf() to a snprintf(). 2) In isdnloop_parse_cmd(), p points to a 6 characters into a 60 character buffer so we have 54 characters. The ->eazlist[] is 11 characters long. I have modified the code to return if the source buffer is too long. 3) In isdnloop_command() the cbuf[] array was 60 characters long but the max length of the string then can be up to 79 characters. I made the cbuf array 80 characters long and changed the sprintf() to snprintf(). I also removed the temporary "dial" buffer and changed it to use "p" directly. Unfortunately, we pass the "cbuf" string from isdnloop_command() to isdnloop_writecmd() which truncates anything over 60 characters to make it fit in card->omsg[]. (It can accept values up to 255 characters so long as there is a '\n' character every 60 characters). For now I have just fixed the memory corruption bug and left the other problems in this driver alone. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	netdev: remove potentially harmful checks	Veaceslav Falico	2014-04-07	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we're checking a variable for != NULL after actually dereferencing it, in netdev_lower_get_next_private*(). It's counter-intuitive at best, and can lead to faulty usage (as it implies that the variable can be NULL), so fix it by removing the useless checks. Reported-by: Daniel Borkmann <dborkman@redhat.com> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Jiri Pirko <jiri@resnulli.us> CC: stephen hemminger <stephen@networkplumber.org> CC: Jerry Chu <hkchu@google.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	pktgen: fix xmit test for BQL enabled devices	Daniel Borkmann	2014-04-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Similarly as in commit 8e2f1a63f221 ("packet: fix packet_direct_xmit for BQL enabled drivers"), we test for __QUEUE_STATE_STACK_XOFF bit in pktgen's xmit, which would not fully fill the device's TX ring for BQL drivers that use netdev_tx_sent_queue(). Fix is to use, similarly as we do in packet sockets, netif_xmit_frozen_or_drv_stopped() test. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net/at91_ether: avoid NULL pointer dereference	Gilles Chanteperdrix	2014-04-07	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The at91_ether driver calls macb_mii_init passing a 'struct macb' structure whose tx_clk member is initialized to 0. However, macb_handle_link_change() expects tx_clk to be the result of a call to clk_get, and so IS_ERR(tx_clk) to be true if the clock is invalid. This causes an oops when booting Linux 3.14 on the csb637 board. The following changes avoids this. Signed-off-by: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: Let tipc_release() return 0	Geert Uytterhoeven	2014-04-07	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	net/tipc/socket.c: In function ‘tipc_release’: net/tipc/socket.c:352: warning: ‘res’ is used uninitialized in this function Introduced by commit 24be34b5a0c9114541891d29dff1152bb1a8df34 ("tipc: eliminate upcall function pointers between port and socket"), which removed the sole initializer of "res". Just return 0 to fix it. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	at86rf230: fix MAX_CSMA_RETRIES parameter	Alexander Aring	2014-04-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fix a copy&paste failure for setting the MAX_CSMA_RETRIES value of the at86rf212 chip which was introduced by commit f2fdd67c6bc89de0100410efb37de69b1c98ac03 ("ieee802154: enable smart transmitter features of RF212") Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mac802154: fix duplicate #include headers	Jean Sacren	2014-04-07	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The commit e6278d92005e ("mac802154: use header operations to create/parse headers") included the header net/ieee802154_netdev.h which had been included by the commit b70ab2e87f17 ("ieee802154: enforce consistent endianness in the 802.15.4 stack"). Fix this duplicate #include by deleting the latter one as the required header has already been in place. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Cc: linux-zigbee-devel@lists.sourceforge.net Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sxgbe: fix duplicate #include headers	Jean Sacren	2014-04-07	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The commit 1edb9ca69e8a ("net: sxgbe: add basic framework for Samsung 10Gb ethernet driver") added support for Samsung 10Gb ethernet driver(sxgbe) with a minor issue of including linux/io.h header twice in sxgbe_dma.c file. Fix the duplicate #include by deleting the top one so that all the rest good #include headers would be preserved in the alphabetical order. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Byungho An <bh74.an@samsung.com> Cc: Girish K S <ks.giri@samsung.com> Cc: Siva Reddy Kallam <siva.kallam@samsung.com> Cc: Vipul Pandya <vipul.pandya@samsung.com> Acked-by: Byungho An <bh74.an@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: filter: be more defensive on div/mod by X==0	Daniel Borkmann	2014-04-07	1	-16/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The old interpreter behaviour was that we returned with 0 whenever we found a division by 0 would take place. In the new interpreter we would currently just skip that instead and continue execution. It's true that a value of 0 as return might not be appropriate in all cases, but current users (socket filters -> drop packet, seccomp -> SECCOMP_RET_KILL, cls_bpf -> unclassified, etc) seem fine with that behaviour. Better this than undefined BPF program behaviour as it's expected that A contains the result of the division. In future, as more use cases open up, we could further adapt this return value to our needs, if necessary. So reintroduce return of 0 for division by 0 as in the old interpreter. Also in case of K which is guaranteed to be 32bit wide, sk_chk_filter() already takes care of preventing division by 0 invoked through K, so we can generally spare us these tests. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Reviewed-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf	David S. Miller	2014-04-06	9	-24/+40
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for your net tree, they are: * Use 16-bits offset and length fields instead of 8-bits in the conntrack extension to avoid an overflow when many conntrack extension are used, from Andrey Vagin. * Allow to use cgroup match from LOCAL_IN, there is no apparent reason for not allowing this, from Alexey Perevalov. * Fix build of the connlimit match after recent changes to let it scale up that result in a divide by zero compilation error in UP, from Florian Westphal. * Move the lock out of the structure connlimit_data to avoid a false sharing spotted by Eric Dumazet and Jesper D. Brouer, this needed as part of the recent connlimit scalability improvements, also from Florian Westphal. * Add missing module aliases in xt_osf to fix loading of rules using this match, from Kirill Tkhai. * Restrict set names in nf_tables to 15 characters instead of silently trimming them off, from me. * Fix wrong format in nf_tables request module call for chain types, spotted by Florian Westphal, patch from me. * Fix crash in xtables when it fails to copy the counters back to userspace after having replaced the table already. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	netfilter: Can't fail and free after table replacement	Thomas Graf	2014-04-05	4	-9/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All xtables variants suffer from the defect that the copy_to_user() to copy the counters to user memory may fail after the table has already been exchanged and thus exposed. Return an error at this point will result in freeing the already exposed table. Any subsequent packet processing will result in a kernel panic. We can't copy the counters before exposing the new tables as we want provide the counter state after the old table has been unhooked. Therefore convert this into a silent error. Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| \| *	netfilter: nf_tables: fix wrong format in request_module()	Pablo Neira Ayuso	2014-04-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The intended format in request_module is %.s instead of %.s. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| \| *	netfilter: nf_tables: set names cannot be larger than 15 bytes	Pablo Neira Ayuso	2014-04-03	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, nf_tables trims off the set name if it exceeeds 15 bytes, so explicitly reject set names that are too large. Reported-by: Giuseppe Longo <giuseppelng@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| \| *	netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len	Andrey Vagin	2014-04-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst case it can contain all extensions. Bellow you can find sizes for all types of extensions. Their sum is definitely bigger than 256. nf_ct_ext_types[0]->len = 24 nf_ct_ext_types[1]->len = 32 nf_ct_ext_types[2]->len = 24 nf_ct_ext_types[3]->len = 32 nf_ct_ext_types[4]->len = 152 nf_ct_ext_types[5]->len = 2 nf_ct_ext_types[6]->len = 16 nf_ct_ext_types[7]->len = 8 I have seen "len" up to 280 and my host has crashes w/o this patch. The right way to fix this problem is reducing the size of the ecache extension (4) and Florian is going to do this, but these changes will be quite large to be appropriate for a stable tree. Fixes: 5b423f6a40a0 (netfilter: nf_conntrack: fix racy timer handling with reliable) Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| \| *	netfilter: Add {ipt,ip6t}_osf aliases for xt_osf	Kirill Tkhai	2014-04-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are no these aliases, so kernel can not request appropriate match table: $ iptables -I INPUT -p tcp -m osf --genre Windows --ttl 2 -j DROP iptables: No chain/target/match by that name. setsockopt() requests ipt_osf module, which is not present. Add the aliases. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| \| *	netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks	Alexey Perevalov	2014-04-03	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This simple modification allows iptables to work with INPUT chain in combination with cgroup module. It could be useful for counting ingress traffic per cgroup with nfacct netfilter module. There were no problems to count the egress traffic that way formerly. It's possible to get classified sk_buff after PREROUTING, due to socket lookup being done in early_demux (tcp_v4_early_demux). Also it works for udp as well. Trivial usage example, assuming we're in the same shell every step and we have enough permissions: 1) Classic net_cls cgroup initialization: mkdir /sys/fs/cgroup/net_cls mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls 2) Set up cgroup for interesting application: mkdir /sys/fs/cgroup/net_cls/wget echo 1 > /sys/fs/cgroup/net_cls/wget/net_cls.classid echo $BASHPID > /sys/fs/cgroup/net_cls/wget/cgroup.procs 3) Create kernel counters: nfacct add wget-cgroup-in iptables -A INPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-in nfacct add wget-cgroup-out iptables -A OUTPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-out 4) Network usage: wget https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.14-rc6.tar.xz 5) Check results: nfacct list Cgroup approach is being used for the DataUsage (counting & blocking traffic) feature for Samsung's modification of the Tizen OS. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| \| *	netfilter: connlimit: move lock array out of struct connlimit_data	Florian Westphal	2014-04-03	1	-9/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Eric points out that the locks can be global. Moreover, both Jesper and Eric note that using only 32 locks increases false sharing as only two cache lines are used. This increases locks to 256 (16 cache lines assuming 64byte cacheline and 4 bytes per spinlock). Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>