summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* net: ovs: use CRC32 accelerated flow hash if availableFrancesco Fusco2013-12-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently OVS uses jhash2() for calculating flow hashes in its internal flow_hash() function. The performance of the flow_hash() function is critical, as the input data can be hundreds of bytes long. OVS is largely deployed in x86_64 based datacenters. Therefore, we argue that the performance critical fast path of OVS should exploit underlying CPU features in order to reduce the per packet processing costs. We replace jhash2 with the hash implementation provided by the kernel hash lib, which exploits the crc32l instruction to achieve high performance Our patch greatly reduces the hash footprint from ~200 cycles of jhash2() to around ~90 cycles in case of ovs_flow_hash_crc() (measured with rdtsc over maximum length flow keys on an i7 Intel CPU). Additionally, we wrote a microbenchmark to stress the flow table performance. The benchmark inserts random flows into the flow hash and then performs lookups. Our hash deployed on a CRC32 capable CPU reduces the lookup for 1000 flows, 100 masks from ~10,100us to ~6,700us, for example. Thus, simply use the newly introduced arch_fast_hash2() as a drop-in replacement. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Thomas Graf <tgraf@redhat.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* lib: introduce arch optimized hash libraryFrancesco Fusco2013-12-177-2/+180
| | | | | | | | | | | | | | | | | | | | | | | We introduce a new hashing library that is meant to be used in the contexts where speed is more important than uniformity of the hashed values. The hash library leverages architecture specific implementation to achieve high performance and fall backs to jhash() for the generic case. On Intel-based x86 architectures, the library can exploit the crc32l instruction, part of the Intel SSE4.2 instruction set, if the instruction is supported by the processor. This implementation is twice as fast as the jhash() implementation on an i7 processor. Additional architectures, such as Arm64 provide instructions for accelerating the computation of CRC, so they could be added as well in follow-up work. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Thomas Graf <tgraf@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
* fddi: cleanup unsigned to unsigned int/shorttanxiaojun2013-12-164-73/+73
| | | | | | | | Use "unsigned int/short" instead of "unsigned", and change the type of iteration variable "i" to "unsigned int". Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: change lock_sock order in connect()wangweidong2013-12-161-4/+2
| | | | | | | | | | | Instead of reaquiring the socket lock and taking the normal exit path when a connection times out, we bail out early with a return -ETIMEDOUT. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: Use <linux/uaccess.h> instead of <asm/uaccess.h>wangweidong2013-12-161-1/+1
| | | | | | | | | | As warned by checkpatch.pl, use #include <linux/uaccess.h> instead of <asm/uaccess.h> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: kill unnecessary goto'swangweidong2013-12-161-8/+6
| | | | | | | | | | | | Remove a number of needless 'goto exit' in send_stream when the socket is in an unconnected state. This patch is cosmetic and does not alter the operation of TIPC in any way. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: remove unnecessary variables and conditionswangweidong2013-12-163-17/+8
| | | | | | | | | | | We remove a number of unnecessary variables and branches in TIPC. This patch is cosmetic and does not change the operation of TIPC in any way. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net-ipv6: Fix alleged compiler warning in ipv6_exthdrs_len()Jerry Chu2013-12-151-8/+4
| | | | | | | | | | | | | | | | | | | | It was reported that Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") triggered a compiler warning in ipv6_exthdrs_len(): net/ipv6/ip6_offload.c: In function ‘ipv6_gro_complete’: net/ipv6/ip6_offload.c:178:24: warning: ‘optlen’ may be used uninitialized in this function [-Wmaybe-u opth = (void *)opth + optlen; ^ net/ipv6/ip6_offload.c:164:22: note: ‘optlen’ was declared here int len = 0, proto, optlen; ^ Note that there was no real bug here - optlen was never uninitialized before use. (Was the version of gcc I used smarter to not complain?) Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-davem' of ↵David S. Miller2013-12-1410-217/+797
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next Ben Hutchings says: ==================== 1. Change PTP clock name to 'sfc'. 2. Complete support for hardware timestamping and PTP clock on the SFC9100 family. 3. Various cleanups for the PTP code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Remove unnecessary condition for processing the TX timestamp queueBen Hutchings2013-12-121-12/+4
| | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Don't clear timestamps in efx_ptp_rx()Ben Hutchings2013-12-121-6/+0
| | | | | | | | | | | | A freshly allocated skb starts with timestamps clear. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Enable PTP clock and timestamping for all functions on EF10Ben Hutchings2013-12-122-17/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The SFC9100 family has only one clock per controller, shared by all functions. Therefore only create a clock device under the primary function, and make all other functions refer to the primary's clock device. Since PTP functionality is limited to port 0 and PF 0 on the earlier SFN[56]322F boards, and we also set the primary flag for that function, we can make the creation of a clock device conditional only on this flag. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Associate primary and secondary functions of controllerBen Hutchings2013-12-124-0/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The primary function of an EF10 controller will share its clock device with other functions in the same domain (which we call secondary functions). To this end, we need to associate functions on the same controller. We do not control probe order, so allow primary and secondary functions to appear in any order. Maintain global lists of all primary functions and of unassociated secondary functions, and a list of secondary functions on each primary function. Use the VPD serial number to tell whether functions are part of the same controller. VPD will not be readable by virtual functions, so this may need to be revisited later. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Store VPD serial number at probe timeBen Hutchings2013-12-122-7/+34
| | | | | | | | | | | | Original version by Stuart Hodgson. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Add RX packet timestamping for EF10Jon Cooper2013-12-127-15/+277
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The EF10 firmware can optionally insert RX timestamps in the packet prefix. These only include the clock minor value. We must also enable periodic time sync events on each event queue which provide the high bits of the clock value. [bwh: Combined and rebased several changes. Added the above description and some sanity checks for inline vs separate timestamps. Changed efx_rx_skb_attach_timestamp() to read the packet prefix from the skb head area.] Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Copy RX prefix into skb head area in efx_rx_mk_skb()Ben Hutchings2013-12-122-12/+7
| | | | | | | | | | | | | | | | | | We can potentially pull the entire packet contents into the head area and then free the page it was in. In order to read an inline timestamp safely, we need to copy the prefix into the head area as well. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: split setup of hardware timestamping into NIC-type operationDaniel Pieczko2013-12-124-72/+78
| | | | | | | | | | | | | | | | | | I added efx_ptp_get_mode() to avoid moving the definition for efx_ptp_data, since the current PTP mode is needed for siena.c:siena_set_ptp_hwtstamp. [bwh: Also move the rx_filters mask, and add kernel-doc] Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Add support for SFC9100 timestamp formatLaurence Evans2013-12-121-25/+231
| | | | | | | | | | | | | | | | | | | | | | The clock minor tick on the SFC9100 family is 2^-27 s, not 1 ns. There are also various pipeline delays which we need to correct for when interpreting timestamps. We query the firmware for the clock format and corrections at run-time. [bwh: Combined and rebased several changes] Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Tidy up PTP synchronization codeLaurence Evans2013-12-121-20/+24
| | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: PTP - tidy up unused/useless variablesLaurence Evans2013-12-121-15/+1
| | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Remove kernel-doc for efx_ptp_data fields not present in this versionBen Hutchings2013-12-121-15/+0
| | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Initialise efx_ptp_data::phc_clock_info from a static templateBen Hutchings2013-12-121-14/+16
| | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Do not use MAC address as clock nameBen Hutchings2013-12-121-3/+2
| | | | | | | | | | | | | | | | | | We'll be sharing clocks between multiple functions with their own MAC addresses. The name field is now documented as 'A short "friendly name" to identify the clock ...' and '... not meant to be a unique id.' So use the name 'sfc'. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Store flags from MC_CMD_DRV_ATTACH for later useBen Hutchings2013-12-122-2/+18
| | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
* | ipv6: fix compiler warning in ipv6_exthdrs_lenHannes Frederic Sowa2013-12-141-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") used an uninitialized variable which leads to the following compiler warning: net/ipv6/ip6_offload.c: In function ‘ipv6_gro_complete’: net/ipv6/ip6_offload.c:178:24: warning: ‘optlen’ may be used uninitialized in this function [-Wmaybe-uninitialized] opth = (void *)opth + optlen; ^ net/ipv6/ip6_offload.c:164:22: note: ‘optlen’ was declared here int len = 0, proto, optlen; ^ Fix it up. Cc: Jerry Chu <hkchu@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'bonding_rcu'David S. Miller2013-12-148-147/+130
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ding Tianhong says: ==================== bonding: rebuild the lock use for bond monitor Now the bond slave list is not protected by bond lock, only by RTNL, but the monitor still use the bond lock to protect the slave list, it is useless, according to the Veaceslav's opinion, there were three way to fix the protect problem: 1. add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe to call call_netdevice_notifiers() in write lock. 2. remove unused bond->lock for monitor function, only use the exist rtnl lock(), it will take performance loss in fast path. 3. use RCU to protect the slave list, of course, performance is better, but in slow path, it is ignored. obviously the solution 1 is not fit here, I will consider the 2 and 3 solution. My principle is simple, if in fast path, RCU is better, otherwise in slow path, both is well, but according to the Jay Vosburgh's opinion, the monitor will loss performace if use RTNL to protect the all slave list, so remove the bond lock and replace with RCU. The second problem is the curr_slave_lock for bond, it is too old and unwanted in many place, because the curr_active_slave would only be changed in 3 place: 1. enslave slave. 2. release slave. 3. change active slave. all above were already holding bond lock, RTNL and curr_slave_lock together, it is tedious and no need to add so mach lock, when change the curr_active_slave, you have to hold the RTNL and curr_slave_lock together, and when you read the curr_active_slave, RTNL or curr_slave_lock, any one of them is no problem. for the stability, I did not change the logic for the monitor, all change is clear and simple, I have test the patch set for lockdep, it work well and stability. v2. accept the Jay Vosburgh's opinion, remove the RTNL and replace with RCU, also add some rcu function for bond use, so the patch set reach 10. v3. accept the Nikolay Aleksandrov's opinion, remove no needed bond_has_slave_rcu(), add protection for several 3ad mode handler functions and current_arp_slave. rebuild the bond_first_slave_rcu(), make it more clear. v4. because the struct netdev_adjacent should not be exist in netdevice.h, so I have to make a new function to support micro bond_first_slave_rcu(). also add a new patch to simplify the bond_resend_igmp_join_requests_delayed(). v5. according the Jay Vosburgh's opinion, in patch 2 and 6, the calling of notify peer is hardly to happen with the bond_xxx_commit() when the monitoring is running, so the performance impact about make two round trips to one trip on RTNL is minimal, no need to do that,the reason is very clear, so modify the patch 2 and 6, recover the notify peer in RTNL alone. ==================== Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: rebuild the bond_resend_igmp_join_requests_delayed()dingtianhong2013-12-141-16/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bond_resend_igmp_join_requests_delayed() and bond_resend_igmp_join_requests() should be integrated, because the bond_resend_igmp_join_requests_delayed() did nothing except bond_resend_igmp_join_requests(). The bond igmp_retrans could only be changed in bond_change_active_slave and here, bond_change_active_slave will be called in RTNL and curr_slave_lock, the bond_resend_igmp_join_requests already hold RTNL, so no need to free RTNL and hold curr_slave_lock again, it may be a small optimization, so move the igmp_retrans in RTNL and remove the curr_slave_lock. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: remove unwanted lock for bond_store_primaryxxx()dingtianhong2013-12-141-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bond_select_active_slave() will not release and acquire bond lock, so it is no need to read the bond lock for them, and the bond_store_primaryxxx() is already in RTNL, so remove the unwanted lock. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: remove unwanted lock for bond_option_active_slave_set()dingtianhong2013-12-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bond_option_active_slave_set() is always called in RTNL, the RTNL could protect bond slave list, so remove the unwanted bond lock. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: add RCU for bond_3ad_state_machine_handler()dingtianhong2013-12-142-26/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bond_3ad_state_machine_handler() use the bond lock to protect the bond slave list and slave port together, but it is not enough, the bond slave list was link and unlink in RTNL, not bond lock, so I add RCU to protect the slave list from leaving. The bond lock is still used here, because when the slave has been removed from the list by the time the state machine runs, it appears to be possible for both function to manupulate the same aggregator->lag_ports by finding the aggregator via two different ports that are both members of that aggregator (i.e., port A of the agg is being unbound, and port B of the agg is runing its state machine). If I remove the bond lock, there are nothing to mutex changes to aggregator->lag_ports between bond_3ad_state_machine_handler and bond_3ad_unbind_slave, So the bond lock is the simplest way to protect aggregator->lag_ports. There was a lot of function need RCU protect, I have two choice to make the function in RCU-safe, (1) create new similar functions and make the bond slave list in RCU. (2) modify the existed functions and make them in read-side critical section, because the RCU read-side critical sections may be nested. I choose (2) because it is no need to create more similar functions. The nots in the function is still too old, clean up the nots. Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: remove unwanted lock for bond enslave and releasedingtianhong2013-12-141-21/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bond_change_active_slave() and bond_select_active_slave() do't need bond lock anymore, so remove the unwanted bond lock for these two functions. The bond_select_active_slave() will release and acquire curr_slave_lock, so the curr_slave_lock need to protect the function. In bond enslave and bond release, the bond slave list is also protected by RTNL, so bond lock is no need to exist, remove the lock and clean the functions. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: rebuild the lock use for bond_activebackup_arp_mon()dingtianhong2013-12-141-25/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bond_activebackup_arp_mon() use the bond lock for read to protect the slave list, it is no effect, and the RTNL is only called for bond_ab_arp_commit() and peer notify, for the performance better, use RCU to replace with the bond lock, to the bond slave list need to called in RCU, add a new bond_first_slave_rcu() to get the first slave in RCU protection. In bond_ab_arp_probe(), the bond->current_arp_slave may changd if bond release slave, just like: bond_ab_arp_probe() bond_release() cpu 0 cpu 1 ... if (bond->current_arp_slave...) ... ... bond->current_arp_slave = NULl bond->current_arp_slave->dev->name ... So the current_arp_slave need to dereference in the section. Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: create bond_first_slave_rcu()dingtianhong2013-12-143-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bond_first_slave_rcu() will be used to instead of bond_first_slave() in rcu_read_lock(). According to the Jay Vosburgh's suggestion, the struct netdev_adjacent should hide from users who wanted to use it directly. so I package a new function to get the first slave of the bond. Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: rebuild the lock use for bond_loadbalance_arp_mon()dingtianhong2013-12-141-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bond_loadbalance_arp_mon() use the bond lock to protect the bond slave list, it is no effect, so I could use RTNL or RCU to replace it, considering the performance impact, the RCU is more better here, so the bond lock replace with the RCU. The bond_select_active_slave() need RTNL and curr_slave_lock together, but there is no RTNL lock here, so add a rtnl_rtylock. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: rebuild the lock use for bond_alb_monitor()dingtianhong2013-12-141-13/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bond_alb_monitor use bond lock to protect the bond slave list, it is no effect here, we need to use RTNL or RCU to replace bond lock, the bond_alb_monitor will called 10 times one second, RTNL may loss performance here, so I replace bond lock with RCU to protect the bond slave list, also the RTNL is preserved, the logic of the monitor did not changed. Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: rebuild the lock use for bond_mii_monitor()dingtianhong2013-12-141-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bond_mii_monitor() still use bond lock to protect bond slave list, it is no effect, I have 2 way to fix the problem, move the RTNL to the top of the function, or add RCU to protect the bond slave list, according to the Jay Vosburgh's opinion, 10 times one second is a truely big performance loss if use RTNL to protect the whole monitor, so I would take the advice and use RCU to protect the bond slave list. The bond_has_slave() will not protect by anything, there will no things happen if the slave list is be changed, unless the bond was free, but it will not happened before the monitor, the bond will closed before be freed. The peers notify for the bond will calling curr_active_slave, so derefence the slave to make sure we will accessing the same slave if the curr_active_slave changed, as the rcu dereference need in read-side critical sector and bond_change_active_slave() will call it with no RCU hold, so add peer notify in rcu_read_lock which will be nested in monitor. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: remove the no effect lock for bond_select_active_slave()dingtianhong2013-12-142-21/+6
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bond slave list was no longer protected by bond lock and only protected by RTNL or RCU, so anywhere that use bond lock to protect slave list is meaningless. remove the release and acquire bond lock for bond_select_active_slave(). The curr_active_slave could only be changed in 3 place: 1. enslave slave. 2. release slave. 3. change_active_slave. all above place were holding bond lock, RTNL and curr_slave_lock together, it is tedious and meaningless, obviously bond lock is no need here, but RTNL or curr_slave_lock is needed, so if you want to access active slave, you have to choose one lock, RTNL or curr_slave_lock, if RTNL is exist, no need to add curr_slave_lock, otherwise curr_slave_lock is better, because of the performance. there are several place calling bond_select_active_slave() and bond_change_active_slave(), the next step I will clean these place and remove the no effect lock. there are some document changed together when update the function. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pkt_sched: set root qdisc before change() in attach_default_qdiscs()Eric Dumazet2013-12-142-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 95dc19299f74 ("pkt_sched: give visibility to mq slave qdiscs") we call disc_list_add() while the device qdisc might be the noop_qdisc one. This shows up as duplicates in "tc qdisc show", as all inactive devices point to noop_qdisc. Fix this by setting dev->qdisc to the new qdisc before calling ops->change() in attach_default_qdiscs() Add a WARN_ON_ONCE() to catch any future similar problem. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'for-davem' of ↵David S. Miller2013-12-1420-421/+1418
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next Ben Hutchings says: ==================== An assortment of changes for Linux 3.14: 1. Merge the sfc fixes that you have already merged into net.git. (The branch point for those was such that this does not bring in any other changes.) 2. Reduce log level for a generally useless warning message, from Robert Stonehouse. 3. Include BISTs in ethtool offline self-test for EF10 and recover from BISTs initiated through other functions, from Jon Cooper. 4. Improve a sanity check on RX completions. 5. Avoid incrementing RX dropped count while the interface is down, from Jon Cooper. 6. Improve hardware sensor naming and log messages, from Edward Cree. 7. Log all unexpected errors returned by firmware, from Edward Cree. 8. Expose another NVRAM partition to userland. 9. Some refactoring of the PTP code in preparation for EF10 support. 10. Various minor cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Remove dependency of PTP on having a dedicated channelBen Hutchings2013-12-123-24/+41
| | | | | | | | | | | | | | | | | | | | | | We need a dedicated channel on Siena to ensure we can match up the separate RX and timestamp events for each PTP packet. We won't do this for EF10 as timestamps are delivered inline. Pass a channel index of 0 to MC_CMD_PTP_OP_ENABLE when there is no dedicated channel. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Split PTP multicast filter insertion/removal out of efx_ptp_{start,stop}()Ben Hutchings2013-12-121-17/+39
| | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Return EBUSY for filter insertion on EF10, matching Falcon/SienaBen Hutchings2013-12-121-0/+2
| | | | | | | | | | | | | | | | | | The MC firmware will return error MC_CMD_ERR_ENOSPC if filter insertion fails due to lack of resources. The net driver's filter implementation for Falcon-architecture returns EBUSY. They should behave consistently, so for EF10 change ENOSPC to EBUSY. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Expose NVRAM_PARTITION_TYPE_LICENSE on EF10Ben Hutchings2013-12-121-0/+1
| | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Fold efx_flush_all() into efx_stop_port() and update commentsBen Hutchings2013-12-121-18/+12
| | | | | | | | | | | | | | | | | | | | | | | | efx_flush_all() is a really misleading name - it has nothing to do with e.g. flushing DMA queues. Since it's called immediately after efx_stop_port() and is highly dependent on what that does, combine the two functions. Update comments to explain what this is doing a little better. Also update an related and erroneous comment in efx_start_port(). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Map MCDI error MC_CMD_ERR_ENOTSUP to Linux EOPNOTSUPPBen Hutchings2013-12-121-0/+2
| | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Log all unexpected MCDI errorsEdward Cree2013-12-125-229/+265
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split each of efx_mcdi_rpc, efx_mcdi_rpc_finish, and efx_mcdi_rpc_async into a normal and a _quiet version; made the former log MCDI errors with netif_err (and include the raw MCDI error code), and the latter never log them at all. Changed various callers; any where some errors are expected (but others are not) call the _quiet version and then if necessary log the MCDI error themselves. Said logging is done by new efx_mcdi_display_error. Callers of efx_mcdi_rpc*_quiet functions which may want to log the error need to ensure that their outbuf is big enough to hold an MCDI error; to this end, they now use MCDI_DECLARE_BUF_OUT_OR_ERR, which always allocates at least 8 bytes. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Add new sensor namesBen Hutchings2013-12-121-0/+3
| | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Revise sensor names to be more understandable and consistentEdward Cree2013-12-121-23/+26
| | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Report units in sensor warningsEdward Cree2013-12-121-4/+20
| | | | | | | | | | | | Add units to the "Sensor reports condition X for raw value Y" messages. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * sfc: Correct RX dropped count for drops while interface is downJon Cooper2013-12-129-11/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't directly control RX ingress on Siena or any later controllers, and so we cannot prevent packets from entering the RX datapath while the RX queues are not set up. This results in the hardware incrementing RX_NODESC_DROP_CNT, but it's not an error and we should not include it in error stats. When bringing an interface up or down, pull (or wait for) stats and count the number of packets that were dropped while the interface was down. Subtract this from the reported RX dropped count. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
OpenPOWER on IntegriCloud