op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	rhashtable: add a note for grow and shrink decision functions	Ying Xue	2015-01-13	1	-0/+4
	Commit c0c09bfdc415 ("rhashtable: avoid unnecessary wakeup for worker queue") moved the checks for whether the hash table size exceeds its maximum threshold or has reached its minimum threshold from the resizing functions to the resizing decision functions. Add a note in rhashtable.h stating that any implementation of the grow and shrink decision functions must enforce min/max shift; otherwise the watermarks set by min/max shift do not take effect.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Cc: Thomas Graf <tgraf@suug.ch>
	Acked-by: Thomas Graf <tgraf@suug.ch>
	Signed-off-by: David S. Miller <davem@davemloft.net>
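As a rough userspace sketch of what "the decision function must enforce min/max shift" means: `struct table_sketch` and both functions below are hypothetical stand-ins, not the kernel's rhashtable API, and the 75% and 30% fill thresholds are assumptions chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified table state; not the kernel's struct rhashtable. */
struct table_sketch {
	size_t nelems;          /* number of stored entries */
	unsigned int shift;     /* table currently has 1 << shift buckets */
	unsigned int min_shift; /* lower watermark */
	unsigned int max_shift; /* upper watermark, 0 means "no limit" */
};

/* Grow when more than 75% full (assumed threshold), never beyond max_shift. */
static bool grow_decision(const struct table_sketch *t)
{
	size_t size = (size_t)1 << t->shift;

	if (t->max_shift && t->shift >= t->max_shift)
		return false; /* the decision function enforces the ceiling */
	return t->nelems > size / 4 * 3;
}

/* Shrink when less than 30% full (assumed threshold), never below min_shift. */
static bool shrink_decision(const struct table_sketch *t)
{
	size_t size = (size_t)1 << t->shift;

	if (t->shift <= t->min_shift)
		return false; /* the decision function enforces the floor */
	return t->nelems < size * 3 / 10;
}
```

If a custom decision function skipped the two early returns, the resize path itself would no longer stop the table at its watermarks, which is the situation the note warns about.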
*	netlink: eliminate nl_sk_hash_lock	Ying Xue	2015-01-13	3	-19/+25
	As rhashtable_lookup_compare_insert() guarantees that search and insertion are atomic, it is safe to eliminate nl_sk_hash_lock. After this, object insertion and removal are protected by the per-bucket lock on the write side, while object lookup is guarded by the RCU read lock on the read side.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Cc: Thomas Graf <tgraf@suug.ch>
	Acked-by: Thomas Graf <tgraf@suug.ch>
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	rhashtable: involve rhashtable_lookup_compare_insert routine	Ying Xue	2015-01-13	2	-2/+45
	Introduce a new function, rhashtable_lookup_compare_insert(), which is very similar to rhashtable_lookup_insert(). The new function uses a compare function supplied by the user to look for an object, and inserts the object into the hash table only if no match is found. As the entire search and insertion is protected by the per-bucket lock, users can avoid taking an extra lock.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Cc: Thomas Graf <tgraf@suug.ch>
	Acked-by: Thomas Graf <tgraf@suug.ch>
	Signed-off-by: David S. Miller <davem@davemloft.net>
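The idea of searching with a caller-supplied compare function and inserting under one hold of a per-bucket lock can be sketched in userspace as follows. Everything here is an assumption made for illustration: the types, the fixed bucket count, and the C11 `atomic_flag` spinlock are stand-ins, and the function is not the kernel's rhashtable_lookup_compare_insert().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUCKETS 8 /* hypothetical fixed table size */

struct node {
	struct node *next;
	unsigned int key;
};

struct bucket {
	atomic_flag lock;  /* stand-in for the per-bucket lock */
	struct node *head;
};

typedef bool (*compare_fn)(const struct node *candidate, const void *arg);

static void table_init(struct bucket *tbl)
{
	for (size_t i = 0; i < NBUCKETS; i++) {
		atomic_flag_clear(&tbl[i].lock);
		tbl[i].head = NULL;
	}
}

/*
 * Insert obj unless compare() matches an entry already in the same bucket.
 * Search and insertion share one lock hold, so no other writer can add a
 * duplicate in between. Returns true if obj was inserted.
 */
static bool lookup_compare_insert(struct bucket *tbl, struct node *obj,
				  compare_fn compare, const void *arg)
{
	struct bucket *b = &tbl[obj->key % NBUCKETS];
	bool inserted = false;
	struct node *n;

	while (atomic_flag_test_and_set(&b->lock))
		; /* spin until the bucket lock is ours */

	for (n = b->head; n; n = n->next)
		if (compare(n, arg))
			goto unlock; /* match found: do not insert */

	obj->next = b->head;
	b->head = obj;
	inserted = true;
unlock:
	atomic_flag_clear(&b->lock);
	return inserted;
}

/* Example compare callback: match on key. */
static bool key_equals(const struct node *candidate, const void *arg)
{
	return candidate->key == *(const unsigned int *)arg;
}
```

Because the duplicate check and the insertion cannot be separated by another writer, a caller does not need an outer lock around the pair, which is the property the netlink change relies on.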
*	Merge branch 'tuntap_queues'	David S. Miller	2015-01-12	2	-9/+11
	Pankaj Gupta says:

	====================
	Increase the limit of tuntap queues

	Networking under KVM works best if we allocate a per-vCPU rx and tx queue in a virtual NIC. This requires a per-vCPU queue on the host side. Modern physical NICs have multiqueue support for a large number of queues. To scale a vNIC to run as many parallel queues as there are vCPUs, we need to increase the number of queues supported in tuntap.

	Changes from v4:
	PATCH 2: Michael S. Tsirkin - Updated change comment message.

	Changes from v3:
	PATCH 1: Michael S. Tsirkin - Some cleanups and updated commit message. Perf numbers on 10 Gbs NIC.

	Changes from v2:
	PATCH 3: David Miller - flex array adds extra level of indirection for preallocated array (dropped, as flow array is allocated using kzalloc with fallback to vzalloc).

	Changes from v1:
	PATCH 2: David Miller - sysctl changes to limit number of queues not required for unprivileged users (dropped).

	Changes from RFC:
	PATCH 1: Sergei Shtylyov - Add an empty line after declarations.
	PATCH 2: Jiri Pirko - Do not introduce new module parameters.
	         Michael S. Tsirkin - We can use sysctl for limiting max number of queues.

	This series increases the number of tuntap queues. The original work was done by 'jasowang@redhat.com'. I am taking the patch series at 'https://lkml.org/lkml/2013/6/19/29' as a reference. As per the discussion in that patch series, two reasons prevented us from increasing the number of tun queues:

	- The netdev_queue array in netdevice was allocated through kmalloc, which may cause a high order memory allocation when we have several queues. E.g. sizeof(netdev_queue) is 320, which means a high order allocation would happen when the device has more than 16 queues.
	- We store the hash buckets in tun_struct, which results in a very large tun_struct; this high order memory allocation fails easily when memory is fragmented.

	Patch 60877a32bce00041528576e6b8df5abe9251fa73 increases the number of tx queues. Memory allocation falls back to vzalloc() when kmalloc() fails.

	This series tries to address the following issues:

	- Increase the number of netdev_queue queues for rx, as was done for tx queues, by falling back to vzalloc() when memory allocation with kmalloc() fails.
	- Increase the number of queues to 256; the maximum is equal to the maximum number of vCPUs allowed in a guest.

	I have also tested with multiple parallel Netperf sessions for different combinations of queues and CPUs. It seems to work fine, without much increase in CPU load as the number of queues increases. I also see a good increase in throughput as the number of queues increases, though I was limited to 8 physical CPUs.

	For this test: two hosts (Host1 & Host2) are directly connected with a cable. Host1 is running Guest1. Data is sent from Host2 to Guest1 via Host1.

	Host kernel: 3.19.0-rc2+, AMD Opteron(tm) Processor 6320
	NIC: Emulex Corporation OneConnect 10Gb NIC (be3)

	Patch Applied       %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle  throughput

	Single Queue, 2 vCPU's
	Before Patch :all   0.19   0.00  0.16     0.07  0.04   0.10    0.00    0.18    0.00  99.26    57864.18
	After Patch :all    0.99   0.00  0.64     0.69  0.07   0.26    0.00    1.58    0.00  95.77    57735.77

	With 2 Queues, 2 vCPU's
	Before Patch :all   0.19   0.00  0.19     0.10  0.04   0.11    0.00    0.28    0.00  99.08    63083.09
	After Patch :all    0.87   0.00  0.73     0.78  0.09   0.35    0.00    2.04    0.00  95.14    62917.03

	With 4 Queues, 4 vCPU's
	Before Patch :all   0.20   0.00  0.21     0.11  0.04   0.12    0.00    0.32    0.00  99.00    80865.06
	After Patch :all    0.71   0.00  0.93     0.85  0.11   0.51    0.00    2.62    0.00  94.27    86463.19

	With 8 Queues, 8 vCPU's
	Before Patch :all   0.19   0.00  0.18     0.09  0.04   0.11    0.00    0.23    0.00  99.17    86795.31
	After Patch :all    0.65   0.00  1.18     0.93  0.13   0.68    0.00    3.38    0.00  93.05    89459.93

	With 16 Queues, 8 vCPU's
	After Patch :all    0.61   0.00  1.59     0.97  0.18   0.92    0.00    4.32    0.00  91.41   120951.60
	====================

	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tuntap: Increase the number of queues in tun.	Pankaj Gupta	2015-01-12	1	-4/+3
	Networking under KVM works best if we allocate a per-vCPU RX and TX queue in a virtual NIC. This requires a per-vCPU queue on the host side.
	It is now safe to increase the maximum number of queues. The preceding patch, 'net: allow large number of rx queues', made sure this won't cause failures due to high order memory allocations. Increase it to 256: this is the maximum number of vCPUs KVM supports.
	The size of tun_struct changes from 8512 to 10496 bytes with this patch, so tun_struct still occupies 3 pages both before and after the patch.
	Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
	Reviewed-by: David Gibson <dgibson@redhat.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
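The page arithmetic behind the "3 pages before and after" statement can be checked with a few lines of C. The 4 KiB page size is an assumption (the message does not state it), and `pages_needed` is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

#define PAGE_SIZE_ASSUMED 4096u /* assumption: 4 KiB pages */

/* Number of whole pages needed to hold `bytes` bytes (rounds up). */
static size_t pages_needed(size_t bytes)
{
	return (bytes + PAGE_SIZE_ASSUMED - 1) / PAGE_SIZE_ASSUMED;
}
```

With these assumptions, both `pages_needed(8512)` and `pages_needed(10496)` evaluate to 3, since each size lies between 2 and 3 full pages.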
\| *	net: allow large number of rx queues	Pankaj Gupta	2015-01-12	1	-5/+8
	netif_alloc_rx_queues() uses kcalloc() to allocate memory for "struct netdev_queue *_rx" array. If we are doing large rx queue allocation kcalloc() might fail, so this patch does a fallback to vzalloc(). Similar implementation is done for tx queue allocation in netif_alloc_netdev_queues().
	We avoid failure of high order memory allocation with the help of vzalloc(), this allows us to do large rx and tx queue allocation which in turn helps us to increase the number of queues in tun.
	As vmalloc() adds overhead on a critical network path, __GFP_REPEAT flag is used with kzalloc() to do this fallback only when really needed.
	Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
	Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
	Reviewed-by: David Gibson <dgibson@redhat.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
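The "try the contiguous allocator first, fall back only on failure" pattern can be simulated in userspace. This is only a sketch: `contig_zalloc`, `virt_zalloc`, `alloc_queue_array` and `CONTIG_LIMIT` are hypothetical names that stand in for the kernel allocators, and the failure threshold is invented so that the fallback path can be exercised.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical: largest request the simulated contiguous path can satisfy. */
#define CONTIG_LIMIT (32 * 1024)

/* Stand-in for a physically contiguous allocator that fails for large sizes. */
static void *contig_zalloc(size_t size)
{
	if (size > CONTIG_LIMIT)
		return NULL; /* simulate a failed high-order allocation */
	return calloc(1, size);
}

/* Stand-in for a virtually contiguous allocator (slower, rarely fails). */
static void *virt_zalloc(size_t size)
{
	return calloc(1, size);
}

/*
 * Allocate a zeroed array of `count` elements of `elem_size` bytes.
 * The cheap contiguous path is tried first; the fallback is used only when
 * that fails. *used_fallback reports which path produced the memory.
 */
static void *alloc_queue_array(size_t count, size_t elem_size, bool *used_fallback)
{
	void *p;

	*used_fallback = false;
	if (elem_size && count > (size_t)-1 / elem_size)
		return NULL; /* count * elem_size would overflow */

	p = contig_zalloc(count * elem_size);
	if (!p) {
		p = virt_zalloc(count * elem_size);
		*used_fallback = (p != NULL);
	}
	return p;
}
```

In this simulation 16 queues of 320 bytes fit the contiguous path, while 256 queues (81920 bytes) take the fallback. In the kernel, the matching free path has to cope with memory from either allocator.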
*	team: Remove dead code	Kenneth Williams	2015-01-12	1	-7/+0
	The deleted lines are called from a function which is called:
	1) Only through __team_options_register via team_options_register, and
	2) Only during initialization / mode initialization, when there are no ports attached.
	Therefore the ports list is guaranteed to be empty and this code will never be executed.
	Signed-off-by: Kenneth Williams <ken@williamsclan.us>
	Acked-by: Jiri Pirko <jiri@resnulli.us>
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: bnx2x: avoid macro redefinition	David Decotigny	2015-01-12	1	-4/+0
	Signed-off-by: David Decotigny <decot@googlers.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: sched: sch_teql: Remove unused function	Rickard Strandqvist	2015-01-12	1	-7/+0
	Remove the function teql_neigh_release() that is not used anywhere.
	This was partially found by using a static code analysis program called cppcheck.
	Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: xfrm: xfrm_algo: Remove unused function	Rickard Strandqvist	2015-01-12	1	-5/+0
	Remove the function aead_entries() that is not used anywhere.
	This was partially found by using a static code analysis program called cppcheck.
	Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'bridge_vlan_ranges'	David S. Miller	2015-01-12	3	-57/+195
	Roopa Prabhu says:

	====================
	bridge: support for vlan range in setlink/dellink

	This series adds new flags in IFLA_BRIDGE_VLAN_INFO to indicate vlan range.

	Will post corresponding iproute2 patches if these get accepted.

	v1 -> v2
	- changed patches to use a nested list attribute IFLA_BRIDGE_VLAN_INFO_LIST as suggested by Scott Feldman
	- dropped notification changes from the series. Will post them separately after this range message is accepted.

	v2 -> v3
	- incorporated some review feedback
	- include patches to fill vlan ranges during getlink
	- dropped IFLA_BRIDGE_VLAN_INFO_LIST. I think it may get confusing to userspace if we introduce yet another way to send lists. With getlink already sending nested IFLA_BRIDGE_VLAN_INFO in IFLA_AF_SPEC, it seems better to use the existing format for lists and just use the flags from v2 to mark vlan ranges
	====================

	Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
	Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bridge: new function to pack vlans into ranges during gets	Roopa Prabhu	2015-01-12	1	-21/+124
	This patch adds a new function to pack vlans into ranges wherever applicable, using the flags BRIDGE_VLAN_INFO_RANGE_BEGIN and BRIDGE_VLAN_INFO_RANGE_END.
	The old vlan packing code is moved to a new function and continues to be called when filter_mask is RTEXT_FILTER_BRVLAN.
	Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
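Packing consecutive VLAN IDs into begin/end pairs might look roughly like the sketch below. The two flag names come from the commit message; their bit values, `struct vlan_entry` and `pack_vlan_ranges` are assumptions for illustration, and the kernel function also has to consider per-VLAN flags, which this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag names are from the commit message; the bit values are assumed. */
#define BRIDGE_VLAN_INFO_RANGE_BEGIN (1u << 3)
#define BRIDGE_VLAN_INFO_RANGE_END   (1u << 4)

/* Hypothetical output record: one VLAN ID plus its range flags. */
struct vlan_entry {
	uint16_t flags;
	uint16_t vid;
};

/*
 * Pack an ascending, duplicate-free list of VLAN IDs into `out`.
 * A run of consecutive IDs becomes a RANGE_BEGIN/RANGE_END pair; an isolated
 * ID becomes a single entry without range flags.
 * Returns the number of entries written; `out` needs room for `n` entries.
 */
static size_t pack_vlan_ranges(const uint16_t *vids, size_t n, struct vlan_entry *out)
{
	size_t i = 0, written = 0;

	while (i < n) {
		size_t j = i;

		/* Extend the run while the next ID is exactly one higher. */
		while (j + 1 < n && vids[j + 1] == vids[j] + 1)
			j++;

		if (j == i) {
			out[written++] = (struct vlan_entry){ 0, vids[i] };
		} else {
			out[written++] = (struct vlan_entry){ BRIDGE_VLAN_INFO_RANGE_BEGIN, vids[i] };
			out[written++] = (struct vlan_entry){ BRIDGE_VLAN_INFO_RANGE_END, vids[j] };
		}
		i = j + 1;
	}
	return written;
}
```

For the input 1, 2, 3, 10, 20, 21 this produces five entries instead of six: a begin/end pair for 1 to 3, a single entry for 10, and a begin/end pair for 20 to 21. The saving grows with the length of each run.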
\| *	rtnetlink: new filter RTEXT_FILTER_BRVLAN_COMPRESSED	Roopa Prabhu	2015-01-12	1	-0/+1
	This filter is the same as RTEXT_FILTER_BRVLAN except that it tries to compress consecutive vlans into ranges. This helps on systems with a large number of configured vlans.
	Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bridge: support for multiple vlans and vlan ranges in setlink and dellink requests	Roopa Prabhu	2015-01-12	2	-36/+70
	This patch changes the bridge IFLA_AF_SPEC netlink attribute parser to look for more than one IFLA_BRIDGE_VLAN_INFO attribute. This allows userspace to pack more than one vlan in the setlink msg.
	The dumps were already sending more than one vlan info in the getlink msg.
	This patch also adds bridge_vlan_info flags BRIDGE_VLAN_INFO_RANGE_BEGIN and BRIDGE_VLAN_INFO_RANGE_END to indicate the start and end of a vlan range.
	This patch also deletes the unused ifla_br_policy.
	Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	drivers: net: xen-netfront: remove residual dead code	Vincenzo Maffione	2015-01-12	1	-4/+0
	This patch removes some unused arrays from the netfront private data structures. These arrays were used in "flip" receive mode.
	Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
	Reviewed-by: David Vrabel <david.vrabel@citrix.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	Driver: Vmxnet3: Reinitialize vmxnet3 backend on wakeup from hibernate	Shrikrishna Khare	2015-01-12	2	-21/+27
	Failing to reinitialize on wakeup results in loss of network connectivity for the vmxnet3 interface.
	Signed-off-by: Srividya Murali <smurali@vmware.com>
	Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
	Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	bonding: cleanup bond_opts array	Jonathan Toppins	2015-01-12	1	-3/+2
	Remove the empty array element initializer and size the array with BOND_OPT_LAST so the compiler will complain if more elements are in there than should be.

	An interesting unwanted side effect of this initializer is that if one inserts new options into the middle of the array then this initializer will zero out the option that equals BOND_OPT_TLB_DYNAMIC_LB+1.

	Example:
	Extend the OPTS enum:
		enum {
		...
		BOND_OPT_TLB_DYNAMIC_LB,
		BOND_OPT_LACP_NEW1,
		BOND_OPT_LAST
		};

	Now insert into bond_opts array:
		static const struct bond_option bond_opts[] = {
		...
		[BOND_OPT_LACP_RATE] = { .... unchanged stuff .... },
		[BOND_OPT_LACP_NEW1] = { ... new stuff ... },
		...
		[BOND_OPT_TLB_DYNAMIC_LB] = { .... unchanged stuff ....},
		{ } // MARK A
		};

	Since BOND_OPT_LACP_NEW1 = BOND_OPT_TLB_DYNAMIC_LB+1, the last initializer (MARK A) will overwrite the contents of BOND_OPT_LACP_NEW1 and can be easily viewed with the crash utility.
	Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
	Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
	Cc: Nikolay Aleksandrov <nikolay@redhat.com>
	Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
	Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
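The overwrite described in this message can be reproduced in a few lines of standalone C. The enum and struct names below are shortened, hypothetical versions of the ones in the message; the behaviour shown is the standard C rule that a positional initializer continues from the index after the most recent designator. Compilers typically warn about the overridden initializer.

```c
#include <stddef.h>

enum {
	OPT_LACP_RATE,
	OPT_TLB_DYNAMIC_LB,
	OPT_LACP_NEW1, /* newly added, so it equals OPT_TLB_DYNAMIC_LB + 1 */
	OPT_LAST
};

/* Hypothetical, reduced option record. */
struct option_sketch {
	const char *name;
};

/*
 * The new entry is inserted in the middle of the initializer list. The
 * trailing positional initializer lands on OPT_TLB_DYNAMIC_LB + 1, which is
 * OPT_LACP_NEW1, and replaces the entry given earlier.
 */
static const struct option_sketch opts_with_terminator[] = {
	[OPT_LACP_RATE]      = { "lacp_rate" },
	[OPT_LACP_NEW1]      = { "lacp_new1" },
	[OPT_TLB_DYNAMIC_LB] = { "tlb_dynamic_lb" },
	{ NULL } /* MARK A: silently overrides [OPT_LACP_NEW1] */
};

/* Sizing the array and dropping the terminator keeps the new entry intact. */
static const struct option_sketch opts_fixed[OPT_LAST] = {
	[OPT_LACP_RATE]      = { "lacp_rate" },
	[OPT_LACP_NEW1]      = { "lacp_new1" },
	[OPT_TLB_DYNAMIC_LB] = { "tlb_dynamic_lb" },
};
```

In the first array the name of `OPT_LACP_NEW1` ends up as NULL, while in the second it is preserved. With the explicit size, an extra positional element past `OPT_LAST - 1` would be diagnosed by the compiler instead of being accepted.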
*	Merge branch 'tipc-namespaces'	David S. Miller	2015-01-12	32	-1277/+1503
	Ying Xue says:

	====================
	tipc: make tipc support namespace

	This patchset aims to add net namespace support to the TIPC stack.

	Currently the TIPC module declares the following global resources:
	- TIPC network identification number
	- TIPC node table
	- TIPC bearer list table
	- TIPC broadcast link
	- TIPC socket reference table
	- TIPC name service table
	- TIPC node address
	- TIPC service subscriber server
	- TIPC random value
	- TIPC netlink

	For TIPC to be namespace aware, each of the above resources must be allocated, initialized and destroyed per namespace. Therefore, the major work of this patchset is to isolate these global resources and make them private to each namespace. However, some preparation must be done first: convert the socket reference table to the generic rhashtable, clean up the core.c and core.h files, remove unnecessary wrapper functions for kernel timer interfaces, and so on.

	It should be noted that commit ##1 ("tipc: fix bug in broadcast retransmit code") was already submitted to the 'net' tree, so please see the link below:

	http://patchwork.ozlabs.org/patch/426717/

	Since it is a prerequisite for the rest of the series to apply, I prepend it to the series.
	====================

	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: make netlink support net namespace	Ying Xue	2015-01-12	1	-2/+5
	Currently the tipc module only allows users in the "init_net" namespace to configure it through the netlink interface. But now almost every tipc component is net namespace aware, so it's time to open the permission to users residing in other namespaces, allowing them to configure their own tipc stack instance through the netlink interface.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: make tipc random value aware of net namespace	Ying Xue	2015-01-12	4	-12/+4
	After namespace is supported, each namespace should own its private random value. So the global variable representing the random value must be moved to tipc_net structure.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: make subscriber server support net namespace	Ying Xue	2015-01-12	8	-65/+86
	TIPC establishes one subscriber server which allows users to subscribe to the status of the name services they are interested in. Once tipc supports namespaces, one dedicated tipc stack instance is created for each namespace, and each instance can be regarded as one independent TIPC node. As a result, a subscriber server must be built for each namespace.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: make tipc node address support net namespace	Ying Xue	2015-01-12	17	-194/+246
	If net namespace is supported in tipc, each namespace will be treated as a separate tipc node. Therefore, every namespace must own its private tipc node address. This means the "tipc_own_addr" global variable holding the node address must be moved to the tipc_net structure to satisfy the requirement. As a result, users can also assign a node address to every namespace.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: name tipc name table support net namespace	Ying Xue	2015-01-12	15	-116/+154
	The TIPC name table stores the mapping between TIPC service names and socket port IDs. When tipc supports namespaces, it allows users to publish service names owned only by a certain namespace. Therefore, every namespace must have its private name table, so that service names published in one namespace are not contaminated by service names in another namespace.
	Therefore, the name table global variable (i.e., nametbl) and its lock must be moved to the tipc_net structure, and a namespace parameter must be added to the relevant functions so that they can obtain the name table variable defined in the tipc_net structure.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: make tipc socket support net namespace	Ying Xue	2015-01-12	6	-33/+43
	Currently the tipc socket table is statically allocated as a global variable. Through it, we can look up a socket instance by port ID, insert a new socket instance into the table, and delete a socket from the table. But when tipc supports net namespaces, each namespace must own its specific socket table. So the global socket table variable must be redefined in the tipc_net structure. As a consequence, a new socket table is allocated when a new namespace is created, and the socket table is deallocated when the namespace is destroyed.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: make tipc broadcast link support net namespace	Ying Xue	2015-01-12	12	-205/+259
	The TIPC broadcast link is statically established and its relevant state is maintained in the global variables "bcbearer", "bclink" and "bcl". To allow different namespaces to own different broadcast link instances, these variables must be moved to the tipc_net structure, and broadcast link instances are allocated and initialized when a namespace is created.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: make bearer list support net namespace	Ying Xue	2015-01-12	10	-73/+110
	The bearer list, defined as a global variable, is used to store bearer instances. When tipc supports net namespaces, bearers created in one namespace must be isolated from those allocated in other namespaces, which requires that the bearer list (bearer_list) be moved to the tipc_net structure. As a result, a net namespace pointer has to be passed to functions which access the bearer list.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: make tipc node table aware of net namespace	Ying Xue	2015-01-12	21	-244/+329
	The global variables associated with the node table are:
	- node table list (node_htable)
	- node hash table list (tipc_node_list)
	- node table lock (node_list_lock)
	- node number counter (tipc_num_nodes)
	- node link number counter (tipc_num_links)

	To make the node table support namespaces, the above global variables must be moved to the tipc_net structure so that each namespace keeps its own private copy. As a consequence, these variables are allocated and initialized when a namespace is created, and deallocated when the namespace is destroyed. After the change, functions associated with these variables have to use a namespace pointer to access them. So adding a namespace pointer as a parameter of these functions is the major change made in this commit.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: involve namespace infrastructure	Ying Xue	2015-01-12	15	-86/+151
	Introduce the namespace infrastructure, make the "tipc_net_id" global variable per-namespace, and rename it to "net_id". For the conversion to work, a networking namespace instance must be passed to the relevant functions, allowing them to access the per-namespace "net_id" variable.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
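The general shape of this conversion, replacing a global with a field reached through a namespace argument, can be sketched as below. `struct net_sketch`, `struct tipc_net_sketch` and both accessors are hypothetical and greatly simplified; the kernel attaches per-namespace data through its own namespace infrastructure rather than by embedding a struct like this.

```c
/* Hypothetical per-namespace TIPC state; the real structure holds far more. */
struct tipc_net_sketch {
	int net_id;
};

/* Hypothetical stand-in for a network namespace carrying its own TIPC state. */
struct net_sketch {
	struct tipc_net_sketch tipc;
};

/*
 * Before the conversion a function like this would read a single global.
 * After it, the caller passes the namespace and the value is looked up there.
 */
static int tipc_net_id_of(const struct net_sketch *net)
{
	return net->tipc.net_id;
}

static void tipc_set_net_id(struct net_sketch *net, int id)
{
	net->tipc.net_id = id;
}
```

Two `struct net_sketch` instances can then hold different `net_id` values without interfering with each other, which is the isolation that the rest of the series extends to the node table, bearer list, socket table and the other resources.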
\| *	tipc: remove unused tipc_link_get_max_pkt routine	Ying Xue	2015-01-12	2	-28/+0
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: feed tipc sock pointer to tipc_sk_timeout routine	Ying Xue	2015-01-12	1	-17/+16
	To make the tipc socket table namespace aware, a networking namespace instance must be passed to tipc_sk_lookup(), allowing it to look up a tipc socket instance with a given port ID in a specific socket table. However, as tipc_sk_timeout() currently has only a port ID parameter and is not namespace aware, it cannot obtain the correct socket instance through tipc_sk_lookup() with just a port ID, especially once namespaces are fully supported.

	If the port ID is replaced with the socket instance as the parameter of tipc_sk_timeout(), there is no need to look up the socket table. But as the timer handler tipc_sk_timeout() runs asynchronously, a socket reference must be held before its timer is launched, and when the timer is terminated it must be carefully checked whether the socket reference needs to be put.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: cleanup core.c and core.h files	Ying Xue	2015-01-12	12	-89/+74
	Only the initialization and shutdown of the tipc module should be done in the core.h and core.c files, so everything not closely associated with these two tasks is moved to more appropriate places.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: remove unnecessary wrapper functions of kernel timer APIs	Ying Xue	2015-01-12	8	-118/+50
	Some wrapper functions, like k_term_timer(), are empty, and others, including k_start_timer() and k_cancel_timer(), don't return any value to their callers. What's more, no other component in the kernel wraps the timer interfaces in this way. Therefore, these timer interfaces defined in the tipc module should be purged.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: remove tipc_core_start/stop routines	Ying Xue	2015-01-12	1	-43/+23
	Remove redundant wrapper functions like tipc_core_start() and tipc_core_stop(), and move their bodies directly into their callers, tipc_init() and tipc_exit(), so that it is clear what is really done in the initialization and deinitialization functions.
	Signed-off-by: Ying Xue <ying.xue@windriver.com>
	Tested-by: Tero Aho <Tero.Aho@coriant.com>
	Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: fix bug in broadcast retransmit code	Jon Maloy	2015-01-12	1	-2/+3
	In commit 58dc55f25631178ee74cd27185956a8f7dcb3e32 ("tipc: use generic SKB list APIs to manage link transmission queue") we replace all list traversal loops with the macros skb_queue_walk() or skb_queue_walk_safe(). While the previous loops were based on the assumption that the list was NULL-terminated, the standard macros stop when the iterator reaches the list head, which is non-NULL.

	In the function bclink_retransmit_pkt() this macro replacement has led to a bug. When we receive a BCAST STATE_MSG we unconditionally call the function bclink_retransmit_pkt(), whether there really is anything to retransmit or not, assuming that the sequence number comparisons will lead to the correct behavior. However, if the transmission queue is empty, or if there are no eligible buffers in the transmission queue, we will by mistake pass the list head pointer to the function tipc_link_retransmit(). Since the list head is not a valid sk_buff, this leads to a crash.

	In this commit we fix this by only calling tipc_link_retransmit() if we actually found eligible buffers in the transmission queue.
	Reviewed-by: Ying Xue <ying.xue@windriver.com>
	Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
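The pitfall described here, an iterator that ends up pointing at the list head when nothing matches, can be shown with a minimal circular list. This is a userspace sketch with hypothetical names (`struct buf`, `first_to_retransmit`); it does not use the kernel's sk_buff API, and the plain `>` comparison ignores sequence number wraparound.

```c
#include <stddef.h>

/* Minimal circular list with a sentinel head, loosely modelled on an skb queue. */
struct buf {
	struct buf *next;
	struct buf *prev;
	unsigned int seqno;
};

static void queue_init(struct buf *head)
{
	head->next = head->prev = head;
}

static void queue_add_tail(struct buf *head, struct buf *b)
{
	b->prev = head->prev;
	b->next = head;
	head->prev->next = b;
	head->prev = b;
}

/*
 * Return the first buffer with seqno > after, or NULL if there is none.
 * The walk ends when the iterator reaches the head again. At that point the
 * iterator is the sentinel, not a buffer, so it must never be handed on.
 */
static struct buf *first_to_retransmit(struct buf *head, unsigned int after)
{
	struct buf *b;

	for (b = head->next; b != head; b = b->next)
		if (b->seqno > after)
			return b;
	return NULL;
}
```

A caller that only retransmits when the result is non-NULL follows the same rule as the fix: on an empty queue, or when no buffer qualifies, nothing is passed to the retransmit routine.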
*	Merge branch 'cxgb4-next'	David S. Miller	2015-01-12	11	-176/+356
	Anish Bhatt says:

	====================
	All Chelsio drivers : Cleanup CPL messages macros

	This patch series cleans up all register defines/MACROS defined in t4_msg.h and affected files as part of the continuing cleanup effort.

	The patch series is created against the 'net-next' tree and includes patches to the cxgb4, cxgb4vf, iw_cxgb4, cxgb4i and csiostor drivers.

	We have included all the maintainers of the respective drivers. Kindly review the change and let us know in case of any review comments.
	====================

	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	iw_cxgb4/cxgb4/cxgb4vf/cxgb4i/csiostor: Cleanup register defines/macros related to all other cpl messages	Hariprasad Shenai	2015-01-12	11	-89/+173
	This patch cleans up all other macros/register defines related to CPL messages that are defined in t4_msg.h and the affected files.
	Signed-off-by: Anish Bhatt <anish@chelsio.com>
	Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	iw_cxgb4/cxgb4/cxgb4i: Cleanup register defines/MACROS related to CM CPL messages	Hariprasad Shenai	2015-01-12	6	-87/+183
	This patch cleans up all macros/register defines related to connection management CPL messages that are defined in t4_msg.h and the affected files.
	Signed-off-by: Anish Bhatt <anish@chelsio.com>
	Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	bridge: Add ability to enable TSO	Toshiaki Makita	2015-01-12	1	-0/+1
	Currently a bridge device turns off the TSO feature if no bridge ports support it. We can always enable it, since packets can be segmented on ports by software as well as on the bridge device. This will reduce the number of packets processed in the bridge.
	Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'r8152-next'	David S. Miller	2015-01-12	1	-1/+7
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hayes Wang says: ==================== r8152: adjust r8152_submit_rx v2: Replace the patch #1 with "call rtl_start_rx after netif_carrier_on". For patch #2, replace checking tp->speed with netif_carrier_ok. v1: Avoid r8152_submit_rx() from submitting rx during unexpected moment. This could reduce the time of stopping rx. For patch #1, the tp->speed should be updated early. Then, the patch #2 could use it to check the current linking status. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	r8152: check the status before submitting rx	hayeswang	2015-01-12	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't submit the rx if the device is unplugged or stopped, or if the link is down. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	r8152: call rtl_start_rx after netif_carrier_on	hayeswang	2015-01-12	1	-1/+2
\|/ \| \| \| \| \| \| \|	Remove rtl_start_rx() from rtl_enable() and put it after calling netif_carrier_on(). Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	vxlan: Improve support for header flags	Tom Herbert	2015-01-12	2	-14/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch cleans up the header flags of VXLAN in anticipation of defining some new ones: - Move header related definitions from vxlan.c to vxlan.h - Change VXLAN_FLAGS to be VXLAN_HF_VNI (only currently defined flag) - Move check for unknown flags to after we find vxlan_sock, this assumes that some flags may be processed based on tunnel configuration - Add a comment about why the stack treats unknown set flags as an error instead of ignoring them Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	packet: make packet too small warning match condition	Willem de Bruijn	2015-01-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The expression in ll_header_truncated() tests less than or equal, but the warning prints less than. Update the warning. Reported-by: Jouni Malinen <jkmalinen@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
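The mismatch described in the commit above can be illustrated with a small userspace sketch. This is not the kernel code: `header_truncated()`, its parameters, and the message wording are hypothetical stand-ins, chosen only to show a warning whose printed operator agrees with the `<=` comparison that triggers it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for a "packet too small" check.
 * The test uses "<=", so the warning text prints "<=" as well;
 * printing "<" would misreport the case len == hard_header_len.
 */
static bool header_truncated(int len, int hard_header_len,
			     char *msg, size_t msg_size)
{
	if (len <= hard_header_len) {
		snprintf(msg, msg_size,
			 "packet size is too short (%d <= %d)",
			 len, hard_header_len);
		return true;
	}
	if (msg_size > 0)
		msg[0] = '\0';	/* no warning for an acceptable length */
	return false;
}
```

With a header length of 14, a 14-byte packet is rejected and the message reads "(14 <= 14)", which is a true statement, whereas "(14 < 14)" would not be.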
*	tg3: move init/deinit from open/close to probe/remove	Ivan Vecera	2015-01-12	1	-14/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Move init and deinit of PTP support from open/close functions to probe/remove funcs to avoid removing/re-adding of associated PTP device(s) during ifup/ifdown. v2: tg3_ptp_init call moved to correct place (thx. Prashant) Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: eth: xgene: devm_ioremap() returns NULL on error	Dan Carpenter	2015-01-12	1	-6/+6
\| \| \| \| \| \| \| \| \|	devm_ioremap() returns NULL on failure, it doesn't return an ERR_PTR. Fixes: de7b5b3d790a ('net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
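Why the distinction in the commit above matters can be sketched in userspace C. The helpers below are simplified models of the kernel's `IS_ERR()` and `ERR_PTR()` (which treat the top 4095 addresses as encoded errno values), not the kernel definitions themselves; `mapping_failed_wrong()` and `mapping_failed_right()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified model of IS_ERR(): true only for the top MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Simplified model of ERR_PTR(): encode a negative errno as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Wrong check for a function that reports failure by returning NULL. */
static inline bool mapping_failed_wrong(const void *base)
{
	return is_err(base);	/* NULL is not in the error range */
}

/* Right check for a NULL-on-failure function such as devm_ioremap(). */
static inline bool mapping_failed_right(const void *base)
{
	return base == NULL;
}
```

Under this model a NULL return slips past the `is_err()` test, so a failed mapping would be used as if it were valid; only the explicit NULL comparison catches it.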
*	csiostor:fix sparse warnings	Praveen Madhavan	2015-01-12	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes sparse warning reported by kbuild. Apply this on net-next since it depends on previous commit. drivers/scsi/csiostor/csio_hw.c:259:17: sparse: cast to restricted __le32 drivers/scsi/csiostor/csio_hw.c:536:31: sparse: incorrect type in assignment (different base types) drivers/scsi/csiostor/csio_hw.c:536:31: expected unsigned int [unsigned] [usertype] <noident> drivers/scsi/csiostor/csio_hw.c:536:31: got restricted __be32 [usertype] <noident> >> drivers/scsi/csiostor/csio_hw.c:2012:5: sparse: symbol 'csio_hw_prep_fw' was not declared. Should it be static? Signed-off-by: Praveen Madhavan <praveenm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	doc: fix the compile fix of txtimestamp.c	Willem de Bruijn	2015-01-11	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A fix to ipv6 structure definitions removed the now superfluous definition of in6_pktinfo in this file. But, use of the glibc definition requires defining _GNU_SOURCE (see also https://sourceware.org/bugzilla/show_bug.cgi?id=6775). Before this change, the following would fail for me: make make headers_install make M=Documentation/networking/timestamping with Documentation/networking/timestamping/txtimestamp.c: In function '__recv_errmsg_cmsg': Documentation/networking/timestamping/txtimestamp.c:205:33: error: dereferencing pointer to incomplete type Documentation/networking/timestamping/txtimestamp.c:206:23: error: dereferencing pointer to incomplete type After this patch compilation succeeded. Fixes: cd91cc5bdddf ("doc: fix the compile error of txtimestamp.c") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'irda-next'	David S. Miller	2015-01-11	15	-89/+34
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Chunyan Zhang says: ==================== irda: Use ktime_t instead of timeval This patch-set removed all uses of timeval and used ktime_t instead if needed, since 32-bit time types will break in the year 2038. This patch-set also used the ktime_xxx functions accordingly. e.g. * Used ktime_get to get the current time instead of do_gettimeofday. * And, used ktime_us_delta to get the elapsed time directly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	irda: vlsi_ir: Replace timeval with ktime_t	Chunyan Zhang	2015-01-11	2	-34/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The vlsi ir driver uses 'timeval', which we try to remove in the kernel because all 32-bit time types will break in the year 2038. This patch also changes do_gettimeofday() to ktime_get() accordingly, since ktime_get() returns a ktime_t while do_gettimeofday() returns a struct timeval; the other reason is that ktime_get() uses the monotonic clock. This patch uses ktime_us_delta() to get the elapsed time in microseconds, and uses div_s64_rem() to split the elapsed time into seconds and microseconds for printing. This patch also changes the function 'vlsi_hard_start_xmit' to do the same thing as the other drivers, that is, passing the remaining time into udelay() instead of looping until enough time has passed. Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	irda: stir4200: Replace timeval with ktime_t	Chunyan Zhang	2015-01-11	1	-9/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The stir4200 driver uses 'timeval', which we try to remove in the kernel because all 32-bit time types will break in the year 2038. This patch also changes do_gettimeofday() to ktime_get() accordingly, since ktime_get() returns a ktime_t while do_gettimeofday() returns a struct timeval; the other reason is that ktime_get() uses the monotonic clock. This patch uses ktime_us_delta() to get the elapsed time in microseconds. Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
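The pattern used by the irda patches above (read a monotonic clock, take a 64-bit microsecond delta, then split it into seconds and microseconds for printing) can be approximated in userspace. This is only a sketch: `clock_gettime(CLOCK_MONOTONIC)` stands in for ktime_get(), and the helper names `monotonic_ns()`, `us_delta()` and `split_us()` are hypothetical analogues of ktime_us_delta() and div_s64_rem(), not kernel APIs.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds; unaffected by wall-clock changes. */
static int64_t monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Elapsed microseconds between two nanosecond timestamps (64-bit safe). */
static int64_t us_delta(int64_t later_ns, int64_t earlier_ns)
{
	return (later_ns - earlier_ns) / 1000;
}

/* Split a microsecond count into whole seconds and leftover microseconds. */
static void split_us(int64_t us, int64_t *sec, int32_t *rem_us)
{
	*sec = us / 1000000;
	*rem_us = (int32_t)(us % 1000000);
}
```

Because every intermediate value is a 64-bit count rather than a struct with a 32-bit seconds field, the arithmetic does not depend on a time type that overflows in 2038, and the monotonic source keeps the delta non-negative even if the wall clock is adjusted.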