summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* inet: ip early demux should avoid request socketsEric Dumazet2015-03-162-2/+2
| | | | | | | | | | When a request socket is created, we do not cache ip route dst entry, like for timewait sockets. Let's use sk_fullsock() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add sk_fullsock() helperEric Dumazet2015-03-161-0/+9
| | | | | | | | | We have many places where we want to check if a socket is not a timewait or request socket. Use a helper to avoid hard coding this. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'swdev_ops'David S. Miller2015-03-165-89/+105
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scott Feldman says: ==================== switchdev: add swdev ops v3: - Fix missing include for DSA build v2: - Per Simon's review, squash some of the dependent commits into one to make series git bisect safe. v1: Per discussions at netconf, move switchdev ndo ops to a new swdev_ops to keep ndo namespace clean and maintain switchdev-related ops into one place. There are no functional changes here; just shuffling ops around for better organization. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netdev: remove ndo ops for switchdevScott Feldman2015-03-161-38/+0
| | | | | | | | | | Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * switchdev: use new swdev opsScott Feldman2015-03-163-51/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move swdev wrappers over to new swdev ops (from previous ndo ops). No functional changes to the implementation. Signed-off-by: Scott Feldman <sfeldma@gmail.com> rocker: move to new swdev ops Signed-off-by: Scott Feldman <sfeldma@gmail.com> dsa: move to new swdev ops Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * switchdev: add swdev opsScott Feldman2015-03-162-0/+41
|/ | | | | | | | | As discussed at netconf, introduce swdev_ops as first step to move switchdev ops from ndo to swdev. This will keep switchdev from cluttering up ndo ops space. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'rhashtable-fixes-next'David S. Miller2015-03-151-13/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Herbert Xu says: ==================== rhashtable: Fix two bugs caused by multiple rehash preparation While testing some new patches over the weekend I discovered a couple of bugs in the series that had just been merged. These two patches fix them: 1) A use-after-free in the walker that can cause crashes when walking during a rehash. 2) When a second rehash starts during a single rhashtable_remove call the remove may fail when it shouldn't. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * rhashtable: Fix rhashtable_remove failuresHerbert Xu2015-03-151-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 9d901bc05153bbf33b5da2cd6266865e531f0545 ("rhashtable: Free bucket tables asynchronously after rehash") causes gratuitous failures in rhashtable_remove. The reason is that it inadvertently introduced multiple rehashing from the perspective of readers. IOW it is now possible to see more than two tables during a single RCU critical section. Fortunately the other reader rhashtable_lookup already deals with this correctly thanks to c4db8848af6af92f90462258603be844baeab44d ("rhashtable: rhashtable: Move future_tbl into struct bucket_table") so only rhashtable_remove is broken by this change. This patch fixes this by looping over every table from the first one to the last or until we find the element that we were trying to delete. Incidentally the simple test for detecting rehashing to prevent starting another shrinking no longer works. Since it isn't needed anyway (the work queue and the mutex serves as a natural barrier to unnecessary rehashes) I've simply killed the test. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rhashtable: Fix use-after-free in rhashtable_walk_stopHerbert Xu2015-03-151-3/+4
|/ | | | | | | | | | | | | | | The commit c4db8848af6af92f90462258603be844baeab44d ("rhashtable: Move future_tbl into struct bucket_table") introduced a use-after- free bug in rhashtable_walk_stop because it dereferences tbl after droping the RCU read lock. This patch fixes it by moving the RCU read unlock down to the bottom of rhashtable_walk_stop. In fact this was how I had it originally but it got dropped while rearranging patches because this one depended on the async freeing of bucket_table. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bcmgenet: add support for Hardware Filter BlockPetri Gynther2015-03-152-0/+176
| | | | | | | | Add support for Hardware Filter Block (HFB) so that incoming Rx traffic can be matched and directed to desired Rx queues. Signed-off-by: Petri Gynther <pgynther@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'ebpf_skb_fields'David S. Miller2015-03-1510-51/+335
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexei Starovoitov says: ==================== bpf: allow eBPF access skb fields V1->V2: - refactored field access converter into common helper convert_skb_access() used in both classic and extended BPF - added missing build_bug_on for field 'len' - added comment to uapi/linux/bpf.h as suggested by Daniel - dropped exposing 'ifindex' field for now classic BPF has a way to access skb fields, whereas extended BPF didn't. This patch introduces this ability. Classic BPF can access fields via negative SKF_AD_OFF offset. Positive bpf_ld_abs N is treated as load from packet, whereas bpf_ld_abs -0x1000 + N is treated as skb fields access. Many offsets were hard coded over years: SKF_AD_PROTOCOL, SKF_AD_PKTTYPE, etc. The problem with this approach was that for every new field classic bpf assembler had to be tweaked. I've considered doing the same for extended, but for every new field LLVM compiler would have to be modifed. Since it would need to add a new intrinsic. It could be done with single intrinsic and magic offset or use of inline assembler, but neither are clean from compiler backend point of view, since they look like calls but shouldn't scratch caller-saved registers. Another approach was to introduce a new helper functions like bpf_get_pkt_type() for every field that we want to access, but that is equally ugly for kernel and slow, since helpers are calls and they are slower then just loads. In theory helper calls can be 'inlined' inside kernel into direct loads, but since they were calls for user space, compiler would have to spill registers around such calls anyway. Teaching compiler to treat such helpers differently is even uglier. They were few other ideas considered. At the end the best seems to be to introduce a user accessible mirror of in-kernel sk_buff structure: struct __sk_buff { __u32 len; __u32 pkt_type; __u32 mark; __u32 queue_mapping; }; bpf programs will do: int bpf_prog1(struct __sk_buff *skb) { __u32 var = skb->pkt_type; which will be compiled to bpf assembler as: dst_reg = *(u32 *)(src_reg + 4) // 4 == offsetof(struct __sk_buff, pkt_type) bpf verifier will check validity of access and will convert it to: dst_reg = *(u8 *)(src_reg + offsetof(struct sk_buff, __pkt_type_offset)) dst_reg &= 7 since 'pkt_type' is a bitfield. No new instructions added. LLVM doesn't need to be modified. JITs don't change and verifier already knows when it accesses 'ctx' pointer. The only thing needed was to convert user visible offset within __sk_buff to kernel internal offset within sk_buff. For 'len' and other fields conversion is trivial. Converting 'pkt_type' takes 2 or 3 instructions depending on endianness. More fields can be exposed by adding to the end of the 'struct __sk_buff'. Like vlan_tci and others can be added later. When pkt_type field is moved around, goes into different structure, removed or its size changes, the function convert_skb_access() would need to updated and it will cover both classic and extended. Patch 2 updates examples to demonstrates how fields are accessed and adds new tests for verifier, since it needs to detect a corner case when attacker is using single bpf instruction in two branches with different register types. The 4 fields of __sk_buff are already exposed to user space via classic bpf and I believe they're useful in extended as well. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * samples: bpf: add skb->field examples and testsAlexei Starovoitov2015-03-155-16/+101
| | | | | | | | | | | | | | | | | | - modify sockex1 example to count number of bytes in outgoing packets - modify sockex2 example to count number of bytes and packets per flow - add 4 stress tests that exercise 'skb->field' code path of verifier Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: allow extended BPF programs access skb fieldsAlexei Starovoitov2015-03-155-35/+234
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | introduce user accessible mirror of in-kernel 'struct sk_buff': struct __sk_buff { __u32 len; __u32 pkt_type; __u32 mark; __u32 queue_mapping; }; bpf programs can do: int bpf_prog(struct __sk_buff *skb) { __u32 var = skb->pkt_type; which will be compiled to bpf assembler as: dst_reg = *(u32 *)(src_reg + 4) // 4 == offsetof(struct __sk_buff, pkt_type) bpf verifier will check validity of access and will convert it to: dst_reg = *(u8 *)(src_reg + offsetof(struct sk_buff, __pkt_type_offset)) dst_reg &= 7 since skb->pkt_type is a bitfield. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'ebpf_helpers'David S. Miller2015-03-155-0/+36
|\ | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== eBPF updates Two small eBPF helper additions to better match up with ancillary classic BPF functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * ebpf: add helper for obtaining current processor idDaniel Borkmann2015-03-155-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the possibility to obtain raw_smp_processor_id() in eBPF. Currently, this is only possible in classic BPF where commit da2033c28226 ("filter: add SKF_AD_RXHASH and SKF_AD_CPU") has added facilities for this. Perhaps most importantly, this would also allow us to track per CPU statistics with eBPF maps, or to implement a poor-man's per CPU data structure through eBPF maps. Example function proto-type looks like: u32 (*smp_processor_id)(void) = (void *)BPF_FUNC_get_smp_processor_id; Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ebpf: add prandom helper for packet samplingDaniel Borkmann2015-03-155-0/+19
|/ | | | | | | | | | | | | | | This work is similar to commit 4cd3675ebf74 ("filter: added BPF random opcode") and adds a possibility for packet sampling in eBPF. Currently, this is only possible in classic BPF and useful to combine sampling with f.e. packet sockets, possible also with tc. Example function proto-type looks like: u32 (*prandom_u32)(void) = (void *)BPF_FUNC_get_prandom_u32; Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'gianfar-next'David S. Miller2015-03-152-96/+138
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Claudiu Manoil says: ==================== gianfar: ARM port driver updates (2/2) The 2nd round of driver updates to make gianfar portable on ARM, for the ARM based SoC that integrates eTSEC - "ls1021a". The patches address the bulk of remaining endianess issues - handling DMA fields (BD and FCB), and device tree properties. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * gianfar: Consider dts property endianess on handlingJingchang Lu2015-03-151-30/+47
| | | | | | | | | | | | | | | | | | Use of_property_read*() to get arch endian consistent property values. Do some refactoring in the process. Signed-off-by: Jingchang Lu <jingchang.lu@freescale.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * gianfar: Make FCB access endian safeClaudiu Manoil2015-03-152-12/+13
| | | | | | | | | | | | | | | | Use conversion macros to correctly access the BE fields of the Rx and Tx Frame Control Block on LE CPUs. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * gianfar: Make BDs access endian safeClaudiu Manoil2015-03-152-54/+78
|/ | | | | | | | Use conversion macros to correctly access the BE fields of the Rx and Tx Buffer Descriptors on LE CPUs. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'rhashtable-next'David S. Miller2015-03-153-62/+80
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Herbert Xu says: ==================== rhashtable: Fixes + cleanups + preparation for multiple rehash Patch 1 fixes the walker so that it behaves properly even during a resize. Patch 2-3 are cleanups. Patch 4-6 lays some ground work for the upcoming multiple rehashing. This revision fixes the warning coming from the bucket_table->size downsize and improves its changelog. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * rhashtable: Move future_tbl into struct bucket_tableHerbert Xu2015-03-152-18/+14
| | | | | | | | | | | | | | | | This patch moves future_tbl to open up the possibility of having multiple rehashes on the same table. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rhashtable: Add rehash counter to bucket_tableHerbert Xu2015-03-153-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a rehash counter to bucket_table to indicate the last bucket that has been rehashed. This serves two purposes: 1. Any bucket that has been rehashed can never gain a new object. 2. If the rehash counter reaches the size of the table, the table will forever remain empty. This patch also downsizes bucket_table->size to an unsigned int since we do not support sizes greater than 32 bits yet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rhashtable: Free bucket tables asynchronously after rehashHerbert Xu2015-03-152-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | There is in fact no need to wait for an RCU grace period in the rehash function, since all insertions are guaranteed to go into the new table through spin locks. This patch uses call_rcu to free the old/rehashed table at our leisure. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rhashtable: Move seed init into bucket_table_allocHerbert Xu2015-03-151-10/+6
| | | | | | | | | | | | | | | | | | | | | | It seems that I have already made every rehash redo the random seed even though my commit message indicated otherwise :) Since we have already taken that step, this patch goes one step further and moves the seed initialisation into bucket_table_alloc. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rhashtable: Use SINGLE_DEPTH_NESTINGHerbert Xu2015-03-151-7/+2
| | | | | | | | | | | | | | | | We only nest one level deep there is no need to roll our own subclasses. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rhashtable: Fix walker behaviour during rehashHerbert Xu2015-03-152-27/+50
|/ | | | | | | | | | | | | | | | | | | | Previously whenever the walker encountered a resize it simply snaps back to the beginning and starts again. However, this only works if the rehash started and completed while the walker was idle. If the walker attempts to restart while the rehash is still ongoing, we may miss objects that we shouldn't have. This patch fixes this by making the walker walk the old table followed by the new table just like all other readers. If a rehash is detected we will still signal our caller of the fact so they can prepare for duplicates but we will simply continue the walk onto the new table after the old one is finished either by us or by the rehasher. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: do not use slave MII bus for fixed PHYsFlorian Fainelli2015-03-141-1/+2
| | | | | | | | | | | | | | | | Commit cd28a1a9baee7 ("net: dsa: fully divert PHY reads/writes if requested") introduced a check for particular PHYs that need to be accessed using the slave MII bus created by DSA, but this check was too inclusive. This would prevent fixed PHYs from being successfully registered because those should not go through the slave MII bus created by DSA. Make sure we check that the PHY is not a fixed PHY to prevent that from happening. Fixes: cd28a1a9baee7 ("net: dsa: fully divert PHY reads/writes if requested") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2015-03-1419-536/+636
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-03-13 This series contains updates to ixgbe and ixgbevf. Don adds additional support for X550 MAC types, which require additional steps around enabling and disabling Rx. Also cleans up variable type inconsistency. I provide a patch to allow relaxed ordering to be enabled on SPARC architectures. Also cleans up ixgbevf whitespace and code comments to align the driver with networking coding standard. Lastly cleaned up uses of memcpy() where ether_addr_copy() could have been used. Alex removes some dead code in the ixgbe cleanup patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * ixgbevf: Use ether_addr_copy() instead of memcpy()Jeff Kirsher2015-03-131-4/+4
| | | | | | | | | | | | | | Use the macro to copy the Ethernet address instead of memcpy(). Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
| * ixgbevf: Fix code comments and whitespaceJeff Kirsher2015-03-139-495/+498
| | | | | | | | | | | | | | | | | | | | | | Fix the code comments to align with drivers/net/ code commenting style, as well as whitespace issues. The whitespace issues resolve checkpatch errors, like lines exceeding 80 chars (except for strings) and the use of tabs where possible. CC: <kernel-team@fb.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
| * ixgbe: Remove IXGBE_FLAG_IN_NETPOLL since it doesn't do anythingAlexander Duyck2015-03-132-15/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes some dead code from the cleanup path for ixgbe. Setting and clearing the flag doesn't do anything since all we are doing is setting the flag, scheduling NAPI, clearing the flag and then letting netpoll do the polling cleanup. As such it doesn't make much sense to have it there. This patch also removes one minor white-space error. CC: <kernel-team@fb.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ixgbe: enable relaxed ordering for SPARCJeff Kirsher2015-03-132-4/+12
| | | | | | | | | | | | | | | | | | | | | | This patch makes sure that relaxed ordering is not disabled when on SPARC, where it helps with performance. CC: <kernel-team@fb.com> CC: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
| * ixgbe: cleanup make ixgbe_set_ethertype_anti_spoofing_X550 staticDon Skidmore2015-03-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | Correcting a mistake when I initial created this function. I should have made this static since it is only referenced where the function pointer is assigned. CC: <kernel-team@fb.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ixgbe: Clean up type inconsistencyDon Skidmore2015-03-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | Missed this when I created commit 6a14ee0cfb197 ("ixgbe: Add X550 support function pointers"). Use a the __be* type to be consistent with how the value is assigned. CC: <kernel-team@fb.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ixgbe: add new wrapper for X550 supportDon Skidmore2015-03-139-14/+114
| | | | | | | | | | | | | | | | | | | | | | For the X550 mac type we have to do additional steps around enabling/disabling Rx. This patch will add a layer of indirection around these support functions to enable this. CC: <kernel-team@fb.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | Merge branch 'listener_refactor_part_9'David S. Miller2015-03-145-104/+54
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eric Dumazet says: ==================== inet: tcp listener refactoring, part 9 This preliminary work pushes socket convergence a bit more: 1) request sock ir_iif is universally set 2) inet_diag can use common helpers to reduce LOC ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | inet_diag: factorize code in new inet_diag_msg_common_fill() helperEric Dumazet2015-03-141-101/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now the three type of sockets share a common base, we can factorize code in inet_diag_msg_common_fill(). inet_diag_entry no longer requires saddr_storage & daddr_storage and the extra copies. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | inet_diag: adjust inet_sk_diag_fill() bug conditionEric Dumazet2015-03-141-1/+1
| | | | | | | | | | | | | | | | | | | | | inet_sk_diag_fill() only copes with non timewait and non request socks Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | inet: fill request sock ir_iif for IPv4Eric Dumazet2015-03-145-3/+7
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once request socks will be in ehash table, they will need to have a valid ir_iff field. This is currently true only for IPv6. This patch extends support for IPv4 as well. This means inet_diag_fill_req() can now properly use ir_iif, which is better for IPv6 link locals anyway, as request sockets and established sockets will propagate consistent netlink idiag_if. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'tipc-next'David S. Miller2015-03-148-341/+300
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jon Maloy says: ==================== tipc: some optimizations and impovements The commits in this series contain some relatively simple changes that lead to better throughput across TIPC connections. We also make changes to the implementation of link transmission queueing and priority handling, in order to make the code more comprehensible and maintainable. v2: Commit #2: Redesigned tipc_msg_validate() to use pskb_may_pull(), as per feedback from David Miller. Commit #3: Some cosmetic changes to tipc_msg_extract(). I tried to replace the unconditional skb_linearize() with calls to pskb_may_pull() at selected locations, but I gave up. First, skb_trim() requires a fully linearized buffer. Second, it doesn't make much sense; the whole buffer will end up linearized, one way or another. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tipc: clean up handling of message prioritiesJon Paul Maloy2015-03-144-61/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Messages transferred by TIPC are assigned an "importance priority", -an integer value indicating how to treat the message when there is link or destination socket congestion. There is no separate header field for this value. Instead, the message user values have been chosen in ascending order according to perceived importance, so that the message user field can be used for this. This is not a good solution. First, we have many more users than the needed priority levels, so we end up with treating more priority levels than necessary. Second, the user field cannot always accurately reflect the priority of the message. E.g., a message fragment packet should really have the priority of the enveloped user data message, and not the priority of the MSG_FRAGMENTER user. Until now, we have been working around this problem in different ways, but it is now time to implement a consistent way of handling such priorities, although still within the constraint that we cannot allocate any more bits in the regular data message header for this. In this commit, we define a new priority level, TIPC_SYSTEM_IMPORTANCE, that will be the only one used apart from the four (lower) user data levels. All non-data messages map down to this priority. Furthermore, we take some free bits from the MSG_FRAGMENTER header and allocate them to store the priority of the enveloped message. We then adjust the functions msg_importance()/msg_set_importance() so that they read/set the correct header fields depending on user type. This small protocol change is fully compatible, because the code at the receiving end of a link currently reads the importance level only from user data messages, where there is no change. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tipc: split link outqueueJon Paul Maloy2015-03-147-167/+150
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct tipc_link contains one single queue for outgoing packets, where both transmitted and waiting packets are queued. This infrastructure is hard to maintain, because we need to keep a number of fields to keep track of which packets are sent or unsent, and the number of packets in each category. A lot of code becomes simpler if we split this queue into a transmission queue, where sent/unacknowledged packets are kept, and a backlog queue, where we keep the not yet sent packets. In this commit we do this separation. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tipc: eliminate unnecessary call to broadcast ack functionJon Paul Maloy2015-03-142-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The unicast packet header contains a broadcast acknowledge sequence number, that may need to be conveyed to the broadcast link for proper treatment. Currently, the function tipc_rcv(), which is on the most critical data path, calls the function tipc_bclink_acknowledge() to have this done. This call is made for each received packet, and results in the unconditional grabbing of the broadcast link spinlock. This is unnecessary, since we can see directly from tipc_rcv() if the acknowledged number differs from what has been previously acked from the node in question. In the vast majority of cases the numbers won't differ, and there is nothing to update. We now make the call to tipc_bclink_acknowledge() conditional to that the two ack values differ. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tipc: extract bundled buffers by cloning instead of copyingJon Paul Maloy2015-03-142-47/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we currently extract a bundled buffer from a message bundle in the function tipc_msg_extract(), we allocate a new buffer and explicitly copy the linear data area. This is unnecessary, since we can just clone the buffer and do skb_pull() on the clone to move the data pointer to the correct position. This is what we do in this commit. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tipc: eliminate unnecessary linearization of incoming buffersJon Paul Maloy2015-03-142-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, TIPC linearizes all incoming buffers directly at reception before passing them upwards in the stack. This is clearly a waste of CPU resources, and must be avoided. In this commit, we eliminate this unnecessary linearization. We still ensure that at least the message header is linear, and that the buffer is linearized where this is still needed, i.e. when unbundling and when reversing messages. In addition, we ensure that fragmented messages are validated after reassembly before delivering them upwards in the stack. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tipc: move message validation function to msg.cJon Paul Maloy2015-03-143-60/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function link_buf_validate() is in reality re-entrant and context independent, and will in later commits be called from several locations. Therefore, we move it to msg.c, make it outline and rename the it to tipc_msg_validate(). We also redesign the function to make proper use of pskb_may_pull() Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tipc: add framework for node capabilities exchangeJon Paul Maloy2015-03-143-2/+16
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TIPC protocol spec has defined a 13 bit capability bitmap in the neighbor discovery header, as a means to maintain compatibility between different code and protocol generations. Until now this field has been unused. We now introduce the basic framework for exchanging capabilities between nodes at first contact. After exchange, a peer node's capabilities are stored as a 16 bit bitmap in struct tipc_node. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'for-upstream' of ↵David S. Miller2015-03-1423-1386/+1888
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== Here's another set of Bluetooth & ieee802154 patches intended for 4.1: - Added support for QCA ROME chipset family in the btusb driver - at86rf230 driver fixes & cleanups - ieee802154 cleanups - Refactoring of Bluetooth mgmt API to allow new users - New setting for static Bluetooth address exposed to user space - Refactoring of hci_dev flags to remove limit of 32 - Remove unnecessary fast-connectable setting usage restrictions - Fix behavior to be consistent when trying to pair already paired device - Service discovery corner-case fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * ieee802154: don't export static symbolJulia Lawall2015-03-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ type T; identifier f; @@ static T f (...) { ... } @@ identifier r.f; declarer name EXPORT_SYMBOL; @@ -EXPORT_SYMBOL(f); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
OpenPOWER on IntegriCloud